## Comparison and Critique of Anomaly Combination Methods

Posted by Jeff Id on August 21, 2010

This post compares the first difference method (FDM) with other anomaly combination methods and explains why first differences work poorly for temperature series. The calculations here are relatively simple. First I built 10,000 artificial temperature series running from 1910 to 2010 and another 10,000 from 1960 to 2010, each having the same trendless AR1 noise as the King_Sejong temperature station in the Antarctic. Why the Antarctic? It was easy for me to access. I applied a uniform trend of 0.05 C/Decade to both the long and short series, with the shorter series offset -0.25 C from the first to mimic the offset created by temperature anomaly calculations. Simple version: two temperature stations of different lengths with a known trend of 0.05 C/Decade.
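The setup can be sketched in a few lines. The post's actual code (in R) appears at the end; this Python sketch assumes a noise standard deviation of 0.5 C and uses far fewer runs than the 10,000 in the post, just to show the mechanics:

```python
import numpy as np

def ar1_series(n, phi, sigma, rng):
    """Trendless AR(1) noise: x[t] = phi * x[t-1] + e[t]."""
    e = rng.normal(0.0, sigma, n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(0)
n = 1201                 # monthly values, 1910-2010
c = 0.05 / 120.0         # 0.05 C/Decade expressed per month
t = np.arange(n)
# 200 artificial "long station" records: AR(1) noise plus the known trend
sims = [ar1_series(n, 0.4987, 0.5, rng) + c * t for _ in range(200)]
slopes = [np.polyfit(t, s, 1)[0] * 120.0 for s in sims]  # back to C/Decade
print(np.mean(slopes))   # averages out near the true 0.05
```

The AR1 coefficient 0.4987 is the value fitted to King_Sejong in the post; the 0.5 C innovation sd and 200-run count are placeholders for illustration.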

I calculated the same stats for each series and each method below.

First, the 10,000 long series by itself 1910-2010.

Long Series Alone

Trend: 0.0496 C/Decade

Autocorrelation confidence interval from single trend Santer style: +/-0.04 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0399

Then the 10,000 shorter series 1960-2010. Both the long and short series have the same AR1 parameter.

Short Series Alone

Trend: 0.0503 C/Decade

Autocorrelation confidence interval from single trend Santer style: +/-0.1244 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.111
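For reference, the "Santer style" interval above is an ordinary OLS trend interval inflated by the Quenouille effective-sample-size correction for lag-1 autocorrelation. A minimal sketch in Python rather than the post's R (the function name is mine):

```python
import numpy as np

def santer_ci(resid, dt=1.0 / 12.0):
    """~95% CI half-width on an OLS trend (per year), widened for lag-1
    autocorrelation: effective sample size shrinks by q = (1 - r1) / (1 + r1)."""
    n = len(resid)
    t = np.arange(n) * dt
    sse = np.sum(resid ** 2) / (n - 2)              # residual variance estimate
    se_ols = np.sqrt(sse / np.sum((t - t.mean()) ** 2))
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # lag-1 autocorrelation
    q = (1.0 - r1) / (1.0 + r1)
    return 1.96 * se_ols / np.sqrt(q)

resid = np.random.default_rng(0).normal(0, 1, 1000)
print(santer_ci(resid))  # with white residuals r1 ~ 0, so nearly the plain OLS CI
```

With white residuals the interval reduces to the plain OLS one; the positive r1 of temperature noise widens it, which is part of why the short series above carries such a wide interval.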

As expected, the shorter series has wider confidence intervals. Next, the simplest anomaly combination, where the two anomalies are averaged together with no offset.

Simple Average

Trend: 0.0309 C/Decade

Autocorrelation confidence interval from single trend Santer style: +/-0.0351 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0346

Note that the trend is underestimated by this method. This is the reason that anomaly offsetting is critical to calculating a proper global trend. In climatology the offsetting problem is often handled by taking the anomaly of each series over a short window. Its advantages include ease of computation; its drawback is that it slightly underestimates the trend. For this post I chose a window of 1950–1970; note that the second series begins halfway through the window, at 1960.
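To make the mechanics concrete, here is a toy sketch of the window method (Python, not the post's R; the function and variable names are mine). Even with noiseless data it reproduces the small downward bias: the short series only overlaps half the base window, so its anomaly baseline is centered later, which leaves a small negative step at 1960:

```python
import numpy as np

def anomaly_combine(series, win_lo, win_hi):
    """Subtract each series' mean over the base window [win_lo, win_hi),
    then average the anomalies month by month, ignoring missing values."""
    stacked = np.vstack(series)
    base = np.nanmean(stacked[:, win_lo:win_hi], axis=1, keepdims=True)
    return np.nanmean(stacked - base, axis=0)

t = np.arange(1201)              # months, 1910-2010
c = 0.05 / 120.0                 # 0.05 C/Decade per month
long_s = c * t                   # noiseless long series
short_s = c * t - 0.25           # short series, offset -0.25 C
short_s[:600] = np.nan           # missing before 1960
combined = anomaly_combine([long_s, short_s], 480, 720)  # base window 1950-1970
slope = np.polyfit(t, combined, 1)[0] * 120.0
print(slope)                     # slightly below 0.05, as the post finds
```

The recovered slope comes out a little under 0.05 C/Decade purely from the half-overlapped base window, before any noise is added.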

Anomaly Offset by Window (Climatology)

Trend: 0.0485 C/Decade — slightly low.

Autocorrelation confidence interval from single trend Santer style: +/-0.0311 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0469

Note that the Monte Carlo confidence interval has expanded. This is due to noise in the offset calculation (an offset estimated from at most 120 months of data), which causes a *systematic error that is not included in confidence intervals for global datasets*. The Santer method only sees the noise in the final trend and doesn't pick up this methodological error, which is an important point for the first difference method we'll apply next.

First Difference Method

Trend: 0.0500 C/Decade – Perfect

Autocorrelation confidence interval from single trend Santer style: +/-0.0363 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.245

The trend using first differences over 10,000 series came out perfect in this run. It was slightly high or low in other runs, but we would expect the differences to center on the true trend, an advantage for FDM. The Santer confidence interval, calculated from only one run, was of reasonable size, but the more encompassing Monte Carlo two-sigma confidence interval *expanded to a whopping +/-0.245 C/Decade, 7X larger than normal.* This is why first differences don't really work for temperature series; I'll demonstrate what is going on later in the post. Next, though, we'll look at what Roman's seasonal offset method does.
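The first difference combination itself is simple. A sketch (Python, names mine) mirroring the diff/average/cumsum sequence in the post's R code:

```python
import numpy as np

def first_difference_combine(series):
    """Difference each series, average the monthly differences across series
    (ignoring missing values), then integrate back with a cumulative sum."""
    diffs = np.vstack([np.diff(s) for s in series])
    mean_diff = np.nanmean(diffs, axis=0)
    mean_diff = np.nan_to_num(mean_diff)   # months with no differences add 0
    return np.concatenate([[0.0], np.cumsum(mean_diff)])

t = np.arange(1201)
c = 0.05 / 120.0                 # 0.05 C/Decade per month
long_s = c * t
short_s = c * t - 0.25
short_s[:600] = np.nan           # short series starts in 1960
combined = first_difference_combine([long_s, short_s])
slope = np.polyfit(t, combined, 1)[0] * 120.0   # exact 0.05 on noiseless data
```

On noiseless data the trend comes back exact; the damage appears only when noise contaminates the differences at the join point, which is what the rest of the post demonstrates.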

Roman’s Seasonal Offset

Trend: 0.0498 C/Decade

Autocorrelation confidence interval from single trend Santer style: +/-0.035 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0385

**——–**

Of all the methods above, Roman’s is the only one to replicate the trend correctly while keeping a reasonable confidence interval. The climatology standard comes close, but only Roman’s least-squares method achieves the correct answer. Note that his trend is slightly higher than the climatology-standard anomaly window method, and closer to the true value.

### So what is happening to make first difference so bad?

Figure 1 is a plot of the two trends used in this post. Both have identical slopes of 0.05 C/Decade.

Figure 2 is a plot of the average of the two trend lines above. Note that the resulting trend matches the ‘simple anomaly’ combination above very closely – makes sense! Please ignore the confidence interval.

I created two random AR1 noise series and added in the above trends. The trend of 0.05 C/Decade isn’t visible under the noise.

The next plot is the two above series combined by first differences.

It’s hard to miss the big step in this plot at 1960 and the hugely significant trend. But what caused it? Figure 5 is a zoomed-in view of the point where the second series is introduced.

To explain further, the next plot is a bit different. I’ve taken the short series and offset it to make the first coexisting point equal to the long series value at that time.

It looks extreme, but that is exactly what first differences does. A zoomed-in plot of the above, with the intersection point highlighted in green, is below.

Figure 8 is the average of the two series in Figure 6. Note that it is the same curve as in Figure 4.

The difference between the two results is a straight line.

So there you have it. The first difference method is really just an offset of the anomaly based on the first coexisting point. Since there is so much microclimate noise on the temperature signal, the offsets from using a single point are prohibitively large. I chose this example to demonstrate the offset clearly, but in other runs the step goes negative, and sometimes the step isn’t visible at all. The average of all the steps over 10,000 series is about zero, but even over 50 series the average trend is dominated by this random microclimate noise. **That’s why FDM is a terrible method for combining temperature trends.** While the net trend using thousands of stations will be good, regional areas with even 20 stations will show complete nonsense trends with FDM.
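The claim that FDM is just a single-point offset can be checked numerically: combine two noisy series by first differences, then separately offset the short series so its first coexisting point matches the long one and average; the two results differ by only a constant. A sketch under the same toy assumptions as before (Python, names mine; plain white noise stands in for the AR1 noise):

```python
import numpy as np

def first_difference_combine(series):
    """diff each series, average the differences (ignoring NaNs), cumsum back."""
    diffs = np.vstack([np.diff(s) for s in series])
    mean_diff = np.nan_to_num(np.nanmean(diffs, axis=0))
    return np.concatenate([[0.0], np.cumsum(mean_diff)])

rng = np.random.default_rng(42)
t = np.arange(1201)
c = 0.05 / 120.0
long_s = rng.normal(0, 0.5, 1201) + c * t
short_s = rng.normal(0, 0.5, 1201) + c * t - 0.25
short_s[:600] = np.nan

fdm = first_difference_combine([long_s, short_s])

# offset the short series so its first coexisting point (month 600) matches
shifted = short_s - (short_s[600] - long_s[600])
offset_avg = np.nanmean(np.vstack([long_s, shifted]), axis=0)

# the two combinations are identical up to an additive constant
resid = fdm - offset_avg
print(resid.max() - resid.min())   # ~0: the difference is flat
```

This is the equivalence shown graphically in the figures above: the whole information content of the overlap, in FDM, collapses into that single matching point.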

**Now that we can see what FDM does, consider improving the offset of the shorter anomaly above based on more than one coexisting data point!** This was the topic of the following post.

Area Weighted Antarctic Reconstructions

The method used a window of overlapping points to determine the offset for the inserted series. The accuracy of this method falls between the climatology normal method and Roman’s seasonal offset, though very much closer to Roman’s. Figure 3 is reproduced below so you can see how much the trend from even 63 stations varied as we went from 1 month of overlap (equivalent to FDM) to 95. The proper trend for the Antarctic over the last 50 years is about 0.06 C/Decade, yet FDM returned negative values.
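That improved offset can be sketched the same way: instead of matching at one point, shift the short series by the mean long-minus-short difference over the coexisting months (Python, names mine). The noise on the offset then shrinks roughly as 1/sqrt(number of overlapping months), which is the stabilization visible in the reproduced figure:

```python
import numpy as np

def overlap_offset_combine(long_s, short_s, n_overlap=None):
    """Shift the short series by the mean long-short difference over the
    first n_overlap coexisting months (all of them by default), then average."""
    both = np.flatnonzero(~np.isnan(long_s) & ~np.isnan(short_s))
    if n_overlap is not None:
        both = both[:n_overlap]
    shifted = short_s + np.mean(long_s[both] - short_s[both])
    return np.nanmean(np.vstack([long_s, shifted]), axis=0)

t = np.arange(1201)
c = 0.05 / 120.0
long_s = c * t
short_s = c * t - 0.25
short_s[:600] = np.nan
combined = overlap_offset_combine(long_s, short_s)
slope = np.polyfit(t, combined, 1)[0] * 120.0   # exact 0.05 on noiseless data
```

With `n_overlap=1` this reduces to the FDM-equivalent single-point match; using the full overlap recovers the trend exactly on noiseless data.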

Note the stability of the trend once enough months of overlapping data are utilized.

```r
###################################################
## gridded global trend calculation with offset  ##
###################################################

#### load external functions - filtering used
source("http://www.climateaudit.info/scripts/utilities.txt")  # Steve McIntyre

###################################################
######## functions ################################
###################################################
# many of these functions were written by Ryan O, SteveM, NicL, RomanM

### Gaussian filter
ff = function(x)
{
  filter.combine.pad(x, truncated.gauss.weights(11))[, 2]  # 31
}

### Sum of squares
ssq = function(x) { sum(x^2) }

################################################
######## RomanM offset series combination ######
################################################
##### version 2.0

### subfunction to do pseudoinverse
psx.inv = function(mat, tol = NULL)
{
  if (NCOL(mat) == 1) return(mat / sum(mat^2))
  msvd = svd(mat)
  dind = msvd$d
  if (is.null(tol)) {
    tol = max(NROW(mat), NCOL(mat)) * max(dind) * .Machine$double.eps
  }
  dind[dind < tol] = 0
  dind[dind > 0] = 1 / dind[dind > 0]
  inv = msvd$v %*% diag(dind, length(dind)) %*% t(msvd$u)
  inv
}

### subfunction to do offsets
calcx.offset = function(tdat, wts)
{
  ## new version
  nr = length(wts)
  delt.mat = !is.na(tdat)
  delt.vec = rowSums(delt.mat)
  row.miss = (delt.vec == 0)
  delt2 = delt.mat / (delt.vec + row.miss)
  co.mat = diag(colSums(delt.mat)) - (t(delt.mat) %*% delt2)
  co.vec = colSums(delt.mat * tdat, na.rm = T) -
           colSums(rowSums(delt.mat * tdat, na.rm = T) * delt2)
  co.mat[nr, ] = wts
  co.vec[nr] = 0
  psx.inv(co.mat) %*% co.vec
}

### main routine
temp.combine = function(tsdat, wts = NULL, all = T)
{
  nr = nrow(tsdat)
  nc = ncol(tsdat)
  dims = dim(tsdat)
  if (is.null(wts)) wts = rep(1, nc)
  wts = wts / sum(wts)
  off.mat = matrix(NA, 12, nc)
  dat.tsp = tsp(tsdat)
  for (i in 1:12) {
    off.mat[i, ] = calcx.offset(window(tsdat, start = c(dat.tsp[1], i), deltat = 1), wts)
  }
  colnames(off.mat) = colnames(tsdat)
  rownames(off.mat) = month.abb
  matoff = matrix(NA, nr, nc)
  for (i in 1:nc) matoff[, i] = rep(off.mat[, i], length = nr)
  temp = rowMeans(tsdat - matoff, na.rm = T)
  pred = NULL
  residual = NULL
  if (all == T) {
    pred = c(temp) + matoff
    residual = tsdat - pred
  }
  list(temps = ts(temp, start = c(dat.tsp[1], 1), freq = 12),
       pred = pred, residual = residual, offsets = off.mat)
}

# pick out those series which have at least nn + 1 observations in every month
# outputs a logical vector with TRUE indicating that that series is OK
dat.check = function(tsdat, nn = 0)
{
  good = rep(NA, ncol(tsdat))
  for (i in 1:ncol(tsdat)) {
    good[i] = (min(rowSums(!is.na(matrix(tsdat[, i], nrow = 12)))) > nn)
  }
  good
}

######################################################################
##### slope plotting function with modifications for 12 month anomaly
######################################################################
get.slope = function(dat = dat)
{
  len = length(dat)
  month = factor(rep(1:12, as.integer(len / 12) + 1))
  month = month[1:len]
  # first month by this method is not necessarily Jan
  # the important bit is that there are 12 steps
  time2 = time(dat)
  coes = coef(lm(dat ~ 0 + month + time2))
  residual = ts(rep(NA, len), start = time2[1], deltat = 1 / 12)
  fitline = array(NA, dim = c(len, 12))
  for (i in 1:12) {
    fitline[, i] = coes[13] * time2 + coes[i]
  }
  anomaly = rep(NA, len)
  for (i in 1:len) {
    residual[i] = dat[i] - fitline[i, month[i]]
  }
  anomaly = residual + time2 * coes[13] - mean(time2 * coes[13], na.rm = TRUE)
  list(coes, residual, anomaly)
}

### plot average slope using Roman's methods
plt.avg2 = function(dat, st = 1850, en = 2011, y.pos = NA, x.pos = NA, main.t = "Untitled")
{
  ### Get trend
  fm = get.slope(window(dat, st, en))
  ### Initialize variables
  N = length(window(dat, st, en))
  I = seq(1:N) / frequency(dat)
  rmI = ssq(I - mean(I))
  ### Calculate sum of squared errors
  SSE = ssq(fm[[2]][!is.na(fm[[2]])]) / (N - 2)
  ### Calculate OLS standard errors
  seOLS = sqrt(SSE / rmI) * 10
  ### Calculate two-tailed 95% CI
  ci95 = seOLS * 1.964
  ### Get lag-1 ACF
  r1 = acf(fm[[2]], lag.max = 1, plot = FALSE, na.action = na.pass)$acf[[2]]
  ### Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
  Q = (1 - r1) / (1 + r1)
  ciQ = ci95 / sqrt(Q)
  maxx = max(time(fm[[3]]))
  minx = min(time(fm[[3]]))
  maxy = max(fm[[3]], na.rm = TRUE)
  miny = min(fm[[3]], na.rm = TRUE)
  ### Plot data
  plot(window(fm[[3]], st, en), main = main.t, ylab = "Anomaly (Deg C)")
  lfit = lm(window(fm[[3]], st, en) ~ I(time(window(fm[[3]], st, en))))
  ### Add trendline and x-axis
  abline(a = lfit[[1]][1], b = lfit[[1]][2], col = 2)
  abline(fm, col = 4, lwd = 2)
  grid()
  ### Annotate text
  tx = paste("Slope: ", round(fm[[1]][13] * 10, 4), "Deg C/Decade", "+/-", round(ciQ, 4),
             "Deg C/Decade\n(corrected for AR(1) serial correlation)\nLag-1 value:",
             round(r1, 3))
  text(x = ifelse(is.na(x.pos), (maxx - minx) * .2 + minx, x.pos),
       y = ifelse(is.na(y.pos), (maxy - miny) * .8 + miny, y.pos),
       tx, cex = .8, col = 4, pos = 4)
}

######## function plots monthly data with seasonal signal intact - 12 slope curves
plt.seasonal.slopes = function(dat, st = 1850, en = 2011, y.pos = NA, x.pos = NA, main.t = "Untitled")
{
  ### Get trend
  fm = get.slope(window(dat, st, en))
  ### Initialize variables
  N = length(window(dat, st, en))
  I = seq(1:N) / frequency(dat)
  rmI = ssq(I - mean(I))
  ### Calculate sum of squared errors
  SSE = ssq(fm[[2]][!is.na(fm[[2]])]) / (N - 2)
  ### Calculate OLS standard errors
  seOLS = sqrt(SSE / rmI) * 10
  ### Calculate two-tailed 95% CI
  ci95 = seOLS * 1.964
  ### Get lag-1 ACF
  r1 = acf(fm[[2]], lag.max = 1, plot = FALSE, na.action = na.pass)$acf[[2]]
  ### Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
  Q = (1 - r1) / (1 + r1)
  ciQ = ci95 / sqrt(Q)
  maxx = max(time(dat))
  minx = min(time(dat))
  maxy = max(dat, na.rm = TRUE)
  miny = min(dat, na.rm = TRUE)
  ### Plot data
  plot(window(dat, st, en), main = main.t, ylab = "Temperature (Deg C)")
  ### Add 12 trendlines and x-axis
  for (i in 1:12) {
    abline(a = fm[[1]][i], b = fm[[1]][13], col = 2)
  }
  grid()
  ### Annotate text
  tx = paste("Slope: ", round(fm[[1]][13] * 10, 4), "C/Dec", "+/-", round(ciQ, 4),
             "C/Dec (AR(1)) Lag-1 val:", round(r1, 3))
  text(x = ifelse(is.na(x.pos), (maxx - minx) * .2 + minx, x.pos),
       y = ifelse(is.na(y.pos), (maxy - miny) * .05 + miny, y.pos),
       tx, cex = .8, col = 4, pos = 4)
}

###################################################################
# temp stations - King_Sejong data from BAS
# You don't need to run this section because I only used the AR1 value for the
# next part
arm = arima(Yo[, 22], order = c(1, 0, 0))  # fit ARIMA model
sdv = sd(tdif, na.rm = TRUE)
# Call: arima(x = tdif, order = c(1, 0, 0))
# Coefficients:
#          ar1  intercept
#       0.4987     0.0023
# s.e.  0.0456     0.2191
###################################################################

tren = seq(0, .5, 1 / 2400)

# long series
slp = array(NA, 10000)
for (i in 1:10000) {
  sim = arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)) + tren
  sim = ts(sim, end = 2010, deltat = 1 / 12)
  slp[i] = lsfit(time(sim), sim)[[1]][2]
}
sd(slp) * 2 * 10   # -- +/- 0.0399 C/Dec
mean(slp) * 10     # 0.0496 C/Dec
plt.avg(dat = sim, st = 1900, en = 2010, y.pos = 2, x.pos = 1910)  # +/- 0.04 C/Dec

# short series
slp = array(NA, 10000)
for (i in 1:10000) {
  sim = arima.sim(n = 601, list(ar = 0.4987, ma = 0, sd = sdv)) + tren[1:601]
  sim = ts(sim, end = 2010, deltat = 1 / 12)
  slp[i] = lsfit(time(sim), sim)[[1]][2]
}
sd(slp) * 2 * 10   # -- +/- 0.111
mean(slp) * 10     # 0.0503 C/Dec
plt.avg(dat = sim, st = 1900, en = 2010, y.pos = 2, x.pos = 1970)  # +/- 0.1244 C/Dec

# simple anomaly combination
slp = array(NA, 10000)
for (i in 1:10000) {
  sim  = arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)) + tren
  sim2 = arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)) + tren - .25
  sld = cbind(sim, sim2)
  sld[1:600, 2] = NA
  sim3 = ts(rowMeans(sld, na.rm = TRUE), end = 2010, deltat = 1 / 12)
  slp[i] = lsfit(time(sim3), sim3)[[1]][2]
}
sd(slp) * 2 * 10   # -- +/- 0.0351
mean(slp) * 10     # 0.0309 C/Dec
plt.avg(dat = sim3, st = 1900, en = 2010, y.pos = 2, x.pos = 1970)  # +/- 0.0346 C/Dec

# window anomaly combination
slp = array(NA, 10000)
for (i in 1:10000) {
  sim  = arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)) + tren
  sim2 = arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)) + tren - .25
  sld = ts(cbind(sim, sim2), end = 2010, deltat = 1 / 12)
  sld[1:600, 2] = NA
  cm = colMeans(window(sld, start = 1950, end = 1970), na.rm = TRUE)
  sld = t(t(sld) - cm)
  sim3 = ts(rowMeans(sld, na.rm = TRUE), end = 2010, deltat = 1 / 12)
  slp[i] = lsfit(time(sim3), sim3)[[1]][2]
}
sd(slp) * 2 * 10   # -- +/- 0.0469
mean(slp) * 10     # 0.0485 C/Dec
plt.avg(dat = sim3, st = 1900, en = 2010, y.pos = 2, x.pos = 1970)  # +/- 0.0311 C/Dec

### first difference method
slp = array(NA, 10000)
for (i in 1:10000) {
  sim  = ts(arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)), deltat = 1 / 12, end = 2010) + tren
  sim2 = ts(arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)), deltat = 1 / 12, end = 2010) + tren - .25
  sld = cbind(sim, sim2)
  sld[1:600, 2] = NA
  dsld = diff(sld, 1)
  sim3 = ts(cumsum(rowMeans(dsld, na.rm = TRUE)), end = 2010, deltat = 1 / 12)
  slp[i] = lsfit(time(sim3), sim3)[[1]][2]
}
sd(slp) * 2 * 10   # -- +/- 0.245
mean(slp) * 10     # 0.0500 C/Dec
plt.avg(dat = sim3, st = 1900, en = 2010, y.pos = 2, x.pos = 1970)  # +/- 0.0363 C/Dec

### Roman seasonal offset method
slp = array(NA, 10000)
for (i in 1:10000) {
  sim  = ts(arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)), deltat = 1 / 12, end = 2010) + tren
  sim2 = ts(arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)), deltat = 1 / 12, end = 2010) + tren - .25
  sld = cbind(sim, sim2)
  sld[1:600, 2] = NA
  sld = ts(sld, end = 2010, deltat = 1 / 12)
  sim3 = temp.combine(sld)$temps
  slp[i] = lsfit(time(sim3), sim3)[[1]][2]
  print(i)
}
sd(slp) * 2 * 10   # -- +/- 0.0385
mean(slp) * 10     # 0.0498 C/Dec
plt.avg(dat = sim3, st = 1900, en = 2010, y.pos = 2, x.pos = 1970)  # +/- 0.035 C/Dec

#########
# plots
ta = ts(tren, end = 2010, deltat = 1 / 12)
tb = ta - .25
tb[1:600] = NA
plot(ta, main = "Artificial Trends Used Above", xlab = "Year", ylab = "Deg C")
lines(tb, col = "red")
smartlegend(x = "left", y = "top", inset = 0, fill = 1:2,
            legend = c("Long Trend", "Short Trend"), cex = .7)
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

# show slope of averaged trends
tcomb = ts(rowMeans(cbind(ta, tb), na.rm = TRUE), end = 2010, freq = 12)
plt.avg(tcomb, main.t = "Averaged Trends", st = 1900, en = 2010, x.pos = 1910)
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

# first difference plots
sim  = ts(arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)), deltat = 1 / 12, end = 2010) + tren
sim2 = ts(arima.sim(n = 1201, list(ar = 0.4987, ma = 0, sd = sdv)), deltat = 1 / 12, end = 2010) + tren - .25
sld = cbind(sim, sim2)
sld[1:600, 2] = NA
dsld = diff(sld, 1)
sim3 = ts(cumsum(rowMeans(dsld, na.rm = TRUE)), end = 2010, deltat = 1 / 12)
plot(sld[, 1], main = "Simulated Temperature Series With Artificial Trends",
     xlab = "Year", ylab = "Deg C")
lines(sld[, 2], col = "red")
smartlegend(x = "left", y = "top", inset = 0, fill = 1:2,
            legend = c("Long series with trend", "Short series with trend"), cex = .7)
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

plt.avg(sim3, main.t = "First Difference Combination of Series",
        st = 1900, en = 2010, x.pos = 1910, y.pos = 4)
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

plot(sim3, xlim = c(1959, 1961), main = "Zoomed in Interface Between Series 1 and 2")
abline(v = 1960, col = "red")
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

sldoffset = sld
sldoffset[, 2] = sldoffset[, 2] - (sldoffset[601, 2] - sldoffset[601, 1])  # offset temp series to match
plot(sldoffset[, 1], main = "Simulated Temperature Series With Artificial Trends \nOffset by First Coexisting Point",
     xlab = "Year", ylab = "Deg C", ylim = c(-3, 4))
lines(sldoffset[, 2], col = "red")
smartlegend(x = "left", y = "top", inset = 0, fill = 1:2,
            legend = c("Long series with trend", "Short series with trend"), cex = .7)
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

plot(sldoffset[, 1], main = "Simulated Temperature Series With Artificial Trends \nOffset by First Coexisting Point",
     xlab = "Year", ylab = "Deg C", ylim = c(-3, 4), xlim = c(1959, 1961))
lines(sldoffset[, 2], col = "red")
points(1960, sldoffset[601, 1], col = "green", pch = 19)
smartlegend(x = "left", y = "top", inset = 0, fill = 1:2,
            legend = c("Long series with trend", "Short series with trend"), cex = .7)
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

offsetmean = ts(rowMeans(sldoffset, na.rm = TRUE), end = 2010, freq = 12)
plt.avg(offsetmean, main.t = "Combination of Series Offset by First Point",
        st = 1900, en = 2010, x.pos = 1910, y.pos = 3)
#savePlot("c:/agw/plots/1.jpg", type = "jpg")

diff = sim3 - offsetmean
plot(diff - mean(diff), ylim = c(-1, 1),
     main = "Difference Plot FDM minus Offset Anomaly by First Coexisting Point",
     xlab = "Year", ylab = "Deg C")
#savePlot("c:/agw/plots/1.jpg", type = "jpg")
```

## Brian H said

Lesson: don’t use linear extrapolations of apparent short term trends?

## Kenneth Fritsch said

http://www.ncdc.noaa.gov/temp-and-precip/ghcn-gridded-temp.html

Data Set Development

These datasets were created from station data using the Anomaly Method, a method that uses station averages during a specified base period (1961-1990 in this case) from which the monthly/seasonal/annual departures can be calculated. Prior to March 25, 2005 the First Difference Method (Peterson et al. 1998) was used in producing the Mean Temperature grid, and prior to February 24, 2004, the FD Method was used in producing the Maximum and Minimum Temperature grids. Due to recent efforts to restructure the GHCN station data by removing duplicates (see Peterson et al. 1997 for a discussion of duplicates), it became possible to develop grids for all temperature data sets using the Anomaly Method. The Anomaly Method can be considered to be superior to the FD Method when calculating gridbox size averages. The FD Method has been found to introduce random errors that increase with the number of time gaps in the data and with decreasing number of stations (Free et al. 2004). While the FD Method works well for large-scale means, these issues become serious on 5X5 degree scales.

## RomanM said

You’re a busy man, Jeff. Good stuff!

You’ve become quite a data analyst in the past year or two. Maybe you should have become an academic instead of just a practical engineer. ;)

## Brian H said

RomanM;

Jeff’s doing good work so you want to demote him? WUWT? >:(

## Jeff Id said

Thanks Roman, unfortunately I didn’t think of this problem until after I saw FDM not working. That’s how it goes sometimes though.

## Nic L said

Excellent article, Jeff. It very clearly explains why FDM won’t work satisfactorily with noisy data. One has to compare stations using a longer period of overlap, as you say.

## Genghis said

Jeff I was wondering if you could do the reverse and see what happens when you take out a few inputs. They have dropped a lot of stations and I am wondering if that has an effect.

## David Jones said

Your anomaly by climatology example is not representative: The 20 year window is too short; the 10 year overlap is too short.

CRU requires 15 years in a 30 year window. Your short series would be rejected in a CRU-style analysis.

GISTEMP (Reference Station Method) when combining records (critically, Step 3 when station records are combined to produce a gridded product) combines using as many years are in the overlap, and requires 20 years.

## Jeff Id said

#8 An odd comment.

Of course if I widened the window centered on the start of the short series to 30 years, the series would pass the artificial CRU criteria and it would only affect the confidence interval so why does that matter?

## Steven Mosher said

#8.

David, what makes you think that the CRU criteria are “correct”? There is no CORRECT. I use 15 years, but I require the series to have ALL 12 months present. CRU only requires 10 months to make a year. Which is correct? Neither. Each gives you an estimate with different properties. The best one can do with CAM is look at a variety of criteria and study the effect DUE TO your choice.

Roman’s method has no such “analyst choices”: all the data is used and the best estimate is made from the data. Simply,

the trend changes as you go from 15 years to 10 to 20, or from 10-month years to 11-month to 12-month.. or how about

at least 12 full years, 5 years with 11 months and 3 years with 10 months? That variation is due to the “model specification” that the analyst defines arbitrarily. It puts undocumented uncertainty into the estimate.

## David Jones said

@Jeff #9. Doesn’t your observation in the article “Note that the Monte Carlo confidence has expanded.” deserve a note to say that had the overlap and window been larger then the “expanded confidence” would not have expanded so much? Aren’t we comparing the confidence intervals that different methods give?

@Steven #10. Who mentioned “CORRECT”?

FTR I have no problems with Roman’s method and I understand it perfectly. I don’t know why you mention it.

## Jeff Id said

#11 Yes, the confidence interval would tighten, as I mentioned. It would still underestimate the trend though. My intent wasn’t to replicate CRU or GISTEMP (we did that elsewhere) but to demonstrate the strengths and weaknesses of the different methods.

## David Jones said

Why are you regressing monthly anomalies? Didn’t Roman M tell us that we have to compute yearly anomalies first? (which is what GISTEMP does).

## RomanM said

#13

Did I miss something in my own post? That is not what I said.

In fact, calculating yearly anomalies is a problem when some months are missing. Infilling may increase uncertainty and skipping years with some missing monthly values results in the loss of information from the months which are present. The methods I describe can be applied to the original non-anomaly measurements using all of the information which is available without risk of distortion due to seasonal starting points.

## David Jones said

@RomanM: Ah yes, you’re right. You said “Several fixes are possible. The simplest is to use year rather than time as the independent variable in the regression.”

Which in this toy example amounts to the same thing as computing yearly anomalies.

Of course I agree with your following qualification: “this solution can still be problematical if the number of available anomalies differs from year to year” if by “problematical” you mean “introduces distortions that are much smaller than a dot of printer’s ink”.

Do you agree that Jeff is introducing an error by drawing trend lines through monthly anomalies?

## Jeff Id said

#15, In this example there is no stairstep from the monthly anomaly. The data is random AR1 noise with a Gaussian distribution. Think of it like a correct anomaly, rather than the climate standard.

## David Jones said

So the mean of your July “anomalies” isn’t 0, it’s whatever the trend line is when it hits July (presumably, at 0.05C per decade, some small value).

## Jeff Id said

#17, Yes. Interestingly if you read some of the older posts on Roman’s method, it provides a system to do this exact calculation. I think of it as a true anomaly, it makes very little difference to the result though.

## Steven Mosher said

Sorry david.

when you said that 20 was “too short” I thought you were implying there was a correct method. Glad to see you say that there is no correct method.