the Air Vent

Because the world needs another opinion

Comparison and Critique of Anomaly Combination Methods

Posted by Jeff Id on August 21, 2010

This post compares the first difference method with other anomaly methods and explains why first difference methods don’t work well for temperature series. The calculations here are relatively simple.  First I built 10,000 artificial temperature series from 1910-2010 and another 10,000 from 1960-2010, each having the same trendless AR1 noise as the King_Sejong temperature station in the Antarctic.  Why the Antarctic? It was easy for me to access.  I applied a uniform trend of 0.05 C/Decade to both the short and long series, with the shorter series offset -0.25 C from the first to mimic the offset created by temperature anomaly calculations.  Simple version: two temp stations of different lengths with a known trend of 0.05 C/Decade.
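If you want to see what one of these artificial series looks like without scrolling to the full script at the bottom of the post, here is a minimal sketch under the same assumptions (AR1 coefficient 0.4987 taken from the station fit; the innovation standard deviation is left at the default here for simplicity):

set.seed(1)				# reproducibility only
ar1  = 0.4987				# AR1 coefficient fitted to the King_Sejong station
tren = seq(0, .5, 1/2400)		# 0.05 C/Decade across 1201 monthly points (1910-2010)
sim  = ts(arima.sim(n = length(tren), list(ar = ar1)) + tren, end = 2010, deltat = 1/12)
lsfit(time(sim), sim)[[1]][2]*10	# fitted trend in C/Decade, noisy around 0.05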

I calculated the same stats for each series and each method below.

First, the 10,000 long series alone, 1910-2010.

Long Series Alone

Trend:  0.0496 C/Decade

Autocorrelation confidence interval from single trend Santer style:  +/-0.04 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0399 C/Decade

Then the 10,000 shorter series, 1960-2010.  Both the long and short series have the same AR1 noise.

Short Series Alone

Trend:  0.0503 C/Decade

Autocorrelation confidence interval from single trend Santer style:  +/-0.1244 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.111 C/Decade

As expected, the shorter series has wider confidence intervals. Next, the simplest anomaly combination, where the two series are averaged together with no offset.

Simple Average

Trend:  0.0309 C/Decade

Autocorrelation confidence interval from single trend Santer style:  +/-0.0351 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0346 C/Decade

Note that the trend is underestimated by this method.  This is the reason that anomaly offsetting is critical to calculating a proper global trend. In climatology, the offsetting problem is often handled by taking the anomaly of each series over a short common window.  This has the advantage of easy computation, and the drawback that it slightly underestimates the trend.  For this post I chose a window of 1950-1970; note that the second series begins halfway through the window, at 1960.

Anomaly Offset by Window (Climatology)

Trend:  0.0485 C/Decade  — slightly low.

Autocorrelation confidence interval from single trend Santer style:  +/-0.0311 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0469 C/Decade

Note that the Monte Carlo confidence interval has expanded.  This is due to noise in the offset calculation (only 120 months of the short series fall within the window), which creates a systematic error that is not included in confidence intervals for global datasets.   The Santer method only sees the noise in the final trend and doesn’t pick up on this methodological error, an important point for the first difference method, which we’ll apply next.
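For reference, the Santer-style interval quoted throughout is just the ordinary least squares trend error widened for lag-1 autocorrelation in the residuals (the Quenouille adjustment).  A minimal sketch of the calculation follows; the function name is illustrative, and in the script at the end of the post the same work happens in the ciQ lines of the plotting functions:

santer.ci = function(x)		# x is a monthly ts; illustrative only
{
	t2  = as.numeric(time(x))
	fit = lm(x ~ t2)
	r1  = acf(resid(fit), lag.max = 1, plot = FALSE)$acf[2]	# lag-1 autocorrelation of residuals
	se  = summary(fit)$coefficients[2, 2]			# OLS standard error of the slope
	Q   = (1 - r1)/(1 + r1)					# Quenouille correction factor
	2*se/sqrt(Q)*10						# approximate 95% interval in C/Decade
}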

First Difference Method

Trend:  0.0500 C/Decade – Perfect

Autocorrelation confidence interval from single trend Santer style:  +/-0.0363 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.245 C/Decade

The trend using first differences over 10,000 series came out perfect in this run.  In other runs it was sometimes slightly high or low, but we would expect any differences to center around the true trend, an advantage for FDM.  The Santer confidence interval, calculated from only one run, was of reasonable size, but the more encompassing Monte Carlo two sigma confidence interval expanded to a whopping +/-0.245 C/Decade, about 7X larger than normal. This is the reason that first difference doesn’t really work for temperature series; I’ll demonstrate what is going on later in the post.  Next though, we’ll look at what Roman’s seasonal offset method does.

Roman’s Seasonal Offset

Trend:  0.0498 C/Decade

Autocorrelation confidence interval from single trend Santer style:  +/-0.035 C/Decade

Monte Carlo confidence interval from 10,000 trends: +/- 0.0385 C/Decade

——–

Of all the methods above, Roman’s is the only one to replicate the trend correctly while keeping a reasonable confidence interval.  The climatology standard comes close, but only Roman’s least squares method achieves the correct answer.  Note that his trend is slightly higher than the climate-standard anomaly window method, but it is closer to the true value.
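For anyone who wants to try it, here is a minimal usage sketch of Roman’s combination as implemented by the temp.combine() routine in the script at the end of the post.  long_series and short_series are placeholder names for two monthly ts objects on a common time base, with NA wherever the short record is absent:

sld  = ts(cbind(long_series, short_series), end = 2010, deltat = 1/12)
comb = temp.combine(sld)$temps		# least squares seasonal offsets, then average
lsfit(time(comb), comb)[[1]][2]*10	# combined trend in C/Decade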

So what is happening to make first difference so bad?

Figure 1 is a plot of the two trends used in this post.  Both have identical slopes of 0.05.

Figure 1

Figure 2 is a plot of the average of the two above slopes. Note that the trend matches the ‘simple anomaly’ combination above very closely – makes sense!  Please ignore the confidence interval.

Figure 2
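As a quick numeric check on that statement, fitting a trend to the average of the two noise-free ramps (a minimal sketch using the same ramps as the plotting code below) gives roughly 0.03 C/Decade rather than 0.05, matching the simple-average result above:

ta  = ts(seq(0, .5, 1/2400), end = 2010, deltat = 1/12)	# long ramp, 0.05 C/Decade
tb  = ta - .25; tb[1:600] = NA				# short ramp, offset and cut to 1960-2010
avg = ts(rowMeans(cbind(ta, tb), na.rm = TRUE), end = 2010, deltat = 1/12)
lsfit(time(avg), avg)[[1]][2]*10			# about 0.03 C/Decade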

I created two random ARMA (AR1) noise series and added in the above trends. The trend of 0.05 C/Decade isn’t visible under the noise.

Figure 3

The next plot is the two above series combined by first differences.

Figure 4

It’s hard to miss the big step in this plot at 1960, and the hugely significant trend.  But what caused it?  Figure 5 zooms in on the point where the second series is introduced.

Figure 5

To explain further, the next plot is a bit different.  I’ve taken the short series and offset it so that its first coexisting point equals the long series value at that time.

Figure 6

It looks extreme, but that is exactly what first differences does.  A zoomed-in plot of the above, with the intersection point highlighted in green, is below.

Figure 7

Figure 8 is the average of the two series in Figure 6.  Note that it is the same curve as Figure 4.

Figure 8

The difference between the two results is a straight line: just a constant offset.

Figure 9 - Offset to zero manually for clarity

So there you have it.  The first difference method is really just an offset of the anomaly based on the first coexisting point.  Since there is so much microclimate noise on the temperature signal, the offsets implied by using a single point are prohibitively large. I chose this example to demonstrate the offset clearly, but in other runs the step goes negative, and sometimes the step isn’t visible at all.  The average of all the steps over 10,000 series is about zero, but even over 50 series the average trend is dominated by this random microclimate noise.  That’s why FDM is a terrible method for combining temperature trends. While the net trend using thousands of stations will be good, regional areas having even 20 stations will show complete nonsense trends with FDM.
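To make the equivalence concrete, here is a minimal sketch assuming sld is the two-column monthly series built in the first difference plotting code near the end of the post (column 2 is NA before 1960, so row 601 is the first shared month).  The first difference combination and the “offset to the first coexisting point” combination differ only by a constant:

dsld    = diff(sld, 1)
fdm     = ts(cumsum(rowMeans(dsld, na.rm = TRUE)), end = 2010, deltat = 1/12)
ofs     = sld
ofs[,2] = ofs[,2] - (ofs[601,2] - ofs[601,1])	# match the short series at the first shared month
byfirst = ts(rowMeans(ofs, na.rm = TRUE), end = 2010, deltat = 1/12)
range(fdm - byfirst)				# a constant: the two results differ only by an offset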

Now that we can see what FDM does, consider improving the offset of the shorter anomaly above by basing it on more than one coexisting data point. This was the topic of the following post.

Area Weighted Antarctic Reconstructions

The method used a window of overlapping points to determine the offset for the inserted series.  The accuracy of this method falls between the climatology normal method and Roman’s seasonal offset, but much closer to Roman’s.  Figure 3 of that post is reproduced below (as Figure 10) so you can see how much the trend from even 63 stations varied as the overlap went from 1 month (equivalent to FDM) to 95 months.  The proper trend for the Antarctic over the last 50 years is about 0.06 C/Decade, yet FDM returned negative values.

Figure 10

Note the stability of trend once enough months of overlapping data are utilized.
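The windowed offset idea can be sketched in a few lines; offset.by.overlap is an illustrative name rather than the exact routine from the linked post.  With k = 1 it reproduces the FDM-style splice shown above, while larger k stabilizes the offset:

offset.by.overlap = function(sld, k)
{
	first = which(!is.na(sld[,2]))[1]	# first coexisting row
	rows  = first:(first + k - 1)		# k overlapping months
	adj   = mean(sld[rows,2] - sld[rows,1])	# mean difference over the overlap window
	out   = sld
	out[,2] = out[,2] - adj			# offset the short series by the window mean
	ts(rowMeans(out, na.rm = TRUE), start = start(sld), frequency = frequency(sld))
}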

###################################################
## gridded global trend calculation with offset####
###################################################

#### load external functions filtering used
source("http://www.climateaudit.info/scripts/utilities.txt")  #Steve McIntyre

###################################################
######## functions ################################
###################################################
# many of these functions were written by Ryan O SteveM, NicL,RomanM

### Gaussian filter
ff=function(x)
{
	filter.combine.pad(x,truncated.gauss.weights(11) )[,2]#31
}

### Sum of squares
ssq=function(x) {sum(x^2)}

################################################
######## roman M offset series combination #####
################################################
##### version2.0
### subfunction to do pseudoinverse
psx.inv = function(mat,tol = NULL)
{
 	if (NCOL(mat)==1) return( mat /sum(mat^2))
	msvd = svd(mat)
 	dind = msvd$d
	if (is.null(tol))
	{
		tol = max(NROW(mat),NCOL(mat))*max(dind)*.Machine$double.eps
	}
 	dind[dind<tol]=0
 	dind[dind>0] = 1/dind[dind>0]
 	inv = msvd$v %*% diag(dind, length(dind)) %*% t(msvd$u)
	inv
}
### subfunction to do offsets
calcx.offset = function(tdat,wts)
{
	## new version
 	nr = length(wts)
 	delt.mat = !is.na(tdat)
 	delt.vec = rowSums(delt.mat)
 	row.miss= (delt.vec ==0)
 	delt2 = delt.mat/(delt.vec+row.miss)
 	co.mat = diag(colSums(delt.mat)) - (t(delt.mat)%*% delt2)
 	co.vec = colSums(delt.mat*tdat,na.rm=T) - colSums(rowSums(delt.mat*tdat,na.rm=T)*delt2)
 	co.mat[nr,] = wts
 	co.vec[nr]=0
 	psx.inv(co.mat)%*%co.vec
}

temp.combine = function(tsdat, wts=NULL, all=T)
{
	### main routine
	nr = nrow(tsdat)
	nc = ncol(tsdat)
 	dims = dim(tsdat)
 	if (is.null(wts)) wts = rep(1,nc)
 	wts=wts/sum(wts)
 	off.mat = matrix(NA,12,nc)
 	dat.tsp = tsp(tsdat)
 	for (i in 1:12) off.mat[i,] = calcx.offset(window(tsdat,start=c(dat.tsp[1],i), deltat=1), wts)
 	colnames(off.mat) = colnames(tsdat)
 	rownames(off.mat) = month.abb
 	matoff = matrix(NA,nr,nc)
 	for (i in 1:nc) matoff[,i] = rep(off.mat[,i],length=nr)
 	temp = rowMeans(tsdat-matoff,na.rm=T)
 	pred=NULL
 	residual=NULL
 	if (all==T)
	{
		pred =c(temp) + matoff
 		residual = tsdat-pred
	}
	list(temps = ts(temp,start=c(dat.tsp[1],1),freq=12),pred =pred, residual = residual, offsets=off.mat)
}

# pick out those series which have at least nn + 1 observations in every month
# Outputs a logical vector with TRUE indicating that the series is OK
dat.check = function(tsdat, nn=0)
{
	good = rep(NA,ncol(tsdat))
	for (i in 1:ncol(tsdat)) good[i]= (min(rowSums(!is.na(matrix(tsdat[,i],nrow=12))))>nn)
	good
}

######################################################################
##### slope plotting function with modifications for 12 month anomaly
######################################################################
get.slope=function(dat=dat)
{
	len=length(dat)
	month = factor(rep(1:12,as.integer(len/12)+1))
	month=month[1:len]	#first month by this method is not necessarily jan
					#the important bit is that there are 12 steps
	time2=time(dat)
	coes = coef(lm(dat~0+month+time2))
	residual=ts(rep(NA,len),start=time2[1],deltat=1/12)
	fitline=array(NA,dim=c(len,12))
	for(i in 1:12)
	{
		fitline[,i]=coes[13]*time2+coes[i]
	}
	anomaly=rep(NA,len)
	for(i in 1:len)
	{
		residual[i]=dat[i]-fitline[i,month[i]]
	}
	anomaly = residual+time2*coes[13]-mean(time2*coes[13],na.rm=TRUE)
	list(coes,residual,anomaly)
}

### plot average slope using roman's methods
### (note: the driver code further down calls plt.avg(), which is not defined in this listing;
###  plt.avg2 takes the same arguments)
plt.avg2=function(dat, st=1850, en=2011, y.pos=NA, x.pos=NA, main.t="Untitled") {

	###  Get trend
	fm=get.slope(window(dat, st, en))

	###  Initialize variables
	N=length(window(dat, st, en))
	I=seq(1:N)/frequency(dat)
	rmI=ssq(I-mean(I))

	###  Calculate sum of squared errors
	SSE=ssq(fm[[2]][!is.na(fm[[2]])])/(N-2)

	###  Calculate OLS standard errors
	seOLS=sqrt(SSE/rmI)*10

	###  Calculate two-tailed 95% CI
	ci95=seOLS*1.964

	###  Get lag-1 ACF
	r1=acf(fm [[2]], lag.max=1, plot=FALSE,na.action=na.pass)$acf[[2]]

	###  Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
	Q=(1-r1)/(1+r1)
	ciQ=ci95/sqrt(Q)

	maxx=max(time(fm[[3]]))
	minx=min(time(fm[[3]]))
	maxy=max(fm[[3]],na.rm=TRUE)
	miny=min(fm[[3]],na.rm=TRUE)

	###  Plot data
	plot(window(fm[[3]], st, en), main=main.t, ylab="Anomaly (Deg C)")
	lfit=lm(window(fm[[3]], st, en)~I(time(window(fm[[3]], st, en))))

	###  Add trendline and x-axis
	abline(a=lfit[[1]][1],b=lfit[[1]][2], col=2); abline(fm, col=4, lwd=2)
	grid()
	###  Annotate text
	tx=paste("Slope: ", round(fm[[1]][13]*10, 4)
		,"Deg C/Decade", "+/-", round(ciQ, 4)
		, "Deg C/Decade\n(corrected for AR(1) serial correlation)\nLag-1 value:"
		, round(r1, 3))
	text(x=ifelse(is.na(x.pos), (maxx-minx)*.2+minx,x.pos), y=ifelse(is.na(y.pos), (maxy-miny)*.8+miny, y.pos), tx, cex=.8, col=4, pos=4)
}
######## function plots monthly data with seasonal signal intact 12 slope curves
plt.seasonal.slopes=function(dat, st=1850, en=2011, y.pos=NA, x.pos=NA, main.t="Untitled")
{
	###  Get trend
	fm=get.slope(window(dat, st, en))

	###  Initialize variables
	N=length(window(dat, st, en))
	I=seq(1:N)/frequency(dat)
	rmI=ssq(I-mean(I))

	###  Calculate sum of squared errors
	SSE=ssq(fm[[2]][!is.na(fm[[2]])])/(N-2)

	###  Calculate OLS standard errors
	seOLS=sqrt(SSE/rmI)*10

	###  Calculate two-tailed 95% CI
	ci95=seOLS*1.964

	###  Get lag-1 ACF
	r1=acf(fm [[2]], lag.max=1, plot=FALSE,na.action=na.pass)$acf[[2]]

	###  Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
	Q=(1-r1)/(1+r1)
	ciQ=ci95/sqrt(Q)

	maxx=max(time(dat))
	minx=min(time(dat))
	maxy=max(dat,na.rm=TRUE)
	miny=min(dat,na.rm=TRUE)

	###  Plot data
	plot(window(dat, st, en), main=main.t, ylab="Temperature (Deg C)")

	###  Add 12 trendlines and x-axis
	for(i in 1:12)
	{
		abline(a=fm[[1]][i],b=fm[[1]][13], col=2);
	}
	grid()
	###  Annotate text
	tx=paste("Slope: ", round(fm[[1]][13]*10, 4),"C/Dec"
		,"+/-", round(ciQ, 4)
		,"C/Dec(AR(1)Lag-1 val:"
		, round(r1, 3))
	text(x=ifelse(is.na(x.pos), (maxx-minx)*.2+minx,x.pos), y=ifelse(is.na(y.pos), (maxy-miny)*.05+miny, y.pos), tx, cex=.8, col=4, pos=4)
}

###################################################################
###################################################################
###################################################################
###################################################################
###################################################################
###################################################################
###################################################################

###################################################################
#temp stations King_Sejong data from BAS
#You don't need to run this section because I only used the AR1 value for the
#next part

arm=arima(Yo[,22] ,order=c(1,0,0))#FIT ARIMA MODEL

sdv=sd(tdif,na.rm=TRUE)
## console output from the ARIMA fit:
# Call:
# arima(x = tdif, order = c(1, 0, 0))
#
# Coefficients:
#          ar1  intercept
#       0.4987     0.0023
# s.e.  0.0456     0.2191
###################################################################

tren=seq(0,.5,1/2400)

#long series
slp=array(NA,10000)
for(i in 1:10000)
{
	sim=arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv))+tren
	sim=ts(sim,end=2010,deltat=1/12)
	slp[i]=lsfit(time(sim),sim)[[1]][2]
}
sd(slp)*2*10 #-- +/- 0.0399 C/Dec
mean(slp)*10 #0.0496C/Dec
plt.avg(dat=sim,st=1900,en=2010,y.pos=2,x.pos=1910) #+/- 0.04 C/Dec

#short series
slp=array(NA,10000)
for(i in 1:10000)
{
	sim=arima.sim(n = 601,list(ar=0.4987, ma = 0,sd = sdv))+tren[1:601]
	sim=ts(sim,end=2010,deltat=1/12)
	slp[i]=lsfit(time(sim),sim)[[1]][2]
}
sd(slp)*2*10 #-- +/- 0.111
mean(slp)*10 #0.0503C/Dec
plt.avg(dat=sim,st=1900,en=2010,y.pos=2,x.pos=1970) #+/- 0.1244 C/Dec

#simple anomaly combination
slp=array(NA,10000)
for(i in 1:10000)
{
	sim=arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv))+tren
	sim2=arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv))+tren-.25
	sld= cbind(sim,sim2)
	sld[1:600,2]=NA
	sim3=ts(rowMeans(sld,na.rm=TRUE),end=2010,deltat=1/12)

	slp[i]=lsfit(time(sim3),sim3)[[1]][2]
}
sd(slp)*2*10 #-- +/- 0.0351
mean(slp)*10 #0.0309C/Dec
plt.avg(dat=sim3,st=1900,en=2010,y.pos=2,x.pos=1970) #+/- 0.0346 C/Dec

#window anomaly combination
slp=array(NA,10000)
for(i in 1:10000)
{
	sim=arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv))+tren
	sim2=arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv))+tren-.25
	sld= ts(cbind(sim,sim2),end=2010,deltat=1/12)
	sld[1:600,2]=NA

	cm=colMeans(window(sld,start=1950,end=1970),na.rm=TRUE)
	sld=t(t(sld)-cm)
	sim3=ts(rowMeans(sld,na.rm=TRUE),end=2010,deltat=1/12)

	slp[i]=lsfit(time(sim3),sim3)[[1]][2]
}
sd(slp)*2*10 #-- +/- 0.0469
mean(slp)*10 #0.0485C/Dec
plt.avg(dat=sim3,st=1900,en=2010,y.pos=2,x.pos=1970) #+/- 0.0311 C/Dec

### first difference method
slp=array(NA,10000)
for(i in 1:10000)
{
	sim=ts(arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv)),deltat=1/12,end=2010)+tren
	sim2=ts(arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv)),deltat=1/12,end=2010)+tren-.25
	sld= cbind(sim,sim2)
	sld[1:600,2]=NA
	dsld=diff(sld,1)
	sim3=ts(cumsum(rowMeans(dsld,na.rm=TRUE)),end=2010,deltat=1/12)

	slp[i]=lsfit(time(sim3),sim3)[[1]][2]
}
sd(slp)*2*10 #-- +/- 0.245
mean(slp)*10 #0.0500C/Dec
plt.avg(dat=sim3,st=1900,en=2010,y.pos=2,x.pos=1970) #+/- 0.0363 C/Dec

### Roman Seasonal Offset method
slp=array(NA,10000)
for(i in 1:10000)
{
	sim=ts(arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv)),deltat=1/12,end=2010)+tren
	sim2=ts(arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv)),deltat=1/12,end=2010)+tren-.25
	sld= cbind(sim,sim2)
	sld[1:600,2]=NA
	sld=ts(sld,end=2010,deltat=1/12)
	sim3=temp.combine(sld)$temps
	slp[i]=lsfit(time(sim3),sim3)[[1]][2]
	print(i)
}
sd(slp)*2*10 #-- +/- 0.0385
mean(slp)*10 #0.0498C/Dec
plt.avg(dat=sim3,st=1900,en=2010,y.pos=2,x.pos=1970) #+/- 0.035 C/Dec

#########
# plots
library(gplots)	# smartlegend() used below comes from the gplots package
ta=ts(tren,end=2010,deltat=1/12)
tb=ta-.25
tb[1:600]=NA
plot(ta, main="Artificial Trends Used Above", xlab="Year", ylab="Deg C")
lines(tb,col="red")
smartlegend(x="left",y="top",inset=0,fill=1:2,legend=c("Long Trend","Short Trend") ,cex=.7)
#savePlot("c:/agw/plots/1.jpg",type="jpg")

#show slope of averaged trends
tcomb=ts(rowMeans(cbind(ta,tb),na.rm=TRUE),end=2010, freq=12)
plt.avg(tcomb,main.t="Averaged Trends",st=1900,en=2010,x.pos=1910)
#savePlot("c:/agw/plots/1.jpg",type="jpg")

#first difference plots
sim=ts(arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv)),deltat=1/12,end=2010)+tren
sim2=ts(arima.sim(n = 1201,list(ar=0.4987, ma = 0,sd = sdv)),deltat=1/12,end=2010)+tren-.25
sld= cbind(sim,sim2)
sld[1:600,2]=NA
dsld=diff(sld,1)
sim3=ts(cumsum(rowMeans(dsld,na.rm=TRUE)),end=2010,deltat=1/12)

plot(sld[,1], main="Simulated Temperature Series With Artificial Trends", xlab="Year", ylab="Deg C")
lines(sld[,2],col="red")
smartlegend(x="left",y="top",inset=0,fill=1:2,legend=c("Long series with trend","Short series with trend") ,cex=.7)
#savePlot("c:/agw/plots/1.jpg",type="jpg")

plt.avg(sim3,main.t="First Difference Combination of Series",st=1900,en=2010,x.pos=1910,y.pos=4)
#savePlot("c:/agw/plots/1.jpg",type="jpg")

plot(sim3,xlim=c(1959,1961),main="Zoomed in Interface Between Series 1 and 2")
abline(v=1960,col="red")
#savePlot("c:/agw/plots/1.jpg",type="jpg")

sldoffset = sld
sldoffset[,2]=sldoffset[,2]-(sldoffset[601,2]-sldoffset[601,1])#offset temp series to match

plot(sldoffset[,1], main="Simulated Temperature Series With Artificial Trends \nOffset by First Coexisting Point", xlab="Year", ylab="Deg C",ylim=c(-3,4))
lines(sldoffset[,2],col="red")
smartlegend(x="left",y="top",inset=0,fill=1:2,legend=c("Long series with trend","Short series with trend") ,cex=.7)
#savePlot("c:/agw/plots/1.jpg",type="jpg")

plot(sldoffset[,1], main="Simulated Temperature Series With Artificial Trends \nOffset by First Coexisting Point", xlab="Year", ylab="Deg C",ylim=c(-3,4),xlim=c(1959,1961))
lines(sldoffset[,2],col="red")
points(1960,sldoffset[601,1],col="green",pch=19)
smartlegend(x="left",y="top",inset=0,fill=1:2,legend=c("Long series with trend","Short series with trend") ,cex=.7)
#savePlot("c:/agw/plots/1.jpg",type="jpg")

offsetmean=ts(rowMeans(sldoffset,na.rm=TRUE),end=2010,freq=12)
plt.avg(offsetmean,main.t="Combination of Series Offset by First Point",st=1900,en=2010,x.pos=1910,y.pos=3)
#savePlot("c:/agw/plots/1.jpg",type="jpg")

diff=sim3-offsetmean

plot(diff-mean(diff),ylim=c(-1,1),main="Difference Plot FDM minus Offset Anomaly by First Coexisting Point",xlab="Year", ylab="Deg C")
#savePlot("c:/agw/plots/1.jpg",type="jpg")

19 Responses to “Comparison and Critique of Anomaly Combination Methods”

  1. Brian H said

    Lesson: don’t use linear extrapolations of apparent short term trends?

  2. Kenneth Fritsch said

    http://www.ncdc.noaa.gov/temp-and-precip/ghcn-gridded-temp.html

    Data Set Development

    These datasets were created from station data using the Anomaly Method, a method that uses station averages during a specified base period (1961-1990 in this case) from which the monthly/seasonal/annual departures can be calculated. Prior to March 25, 2005 the First Difference Method (Peterson et al. 1998) was used in producing the Mean Temperature grid, and prior to February 24, 2004, the FD Method was used in producing the Maximum and Minimum Temperature grids. Due to recent efforts to restructure the GHCN station data by removing duplicates (see Peterson et al. 1997 for a discussion of duplicates), it became possible to develop grids for all temperature data sets using the Anomaly Method. The Anomaly Method can be considered to be superior to the FD Method when calculating gridbox size averages. The FD Method has been found to introduce random errors that increase with the number of time gaps in the data and with decreasing number of stations (Free et al. 2004). While the FD Method works well for large-scale means, these issues become serious on 5X5 degree scales.

  3. RomanM said

    You’re a busy man, Jeff. Good stuff!

    You’ve become quite a data analyst in the past year or two. Maybe you should have become an academic instead of just a practical engineer. ;)

  4. Brian H said

    RomanM;
    Jeff’s doing good work so you want to demote him? WUWT? >:(

  5. Jeff Id said

    Thanks Roman, unfortunately I didn’t think of this problem until after I saw FDM not working. That’s how it goes sometimes though.

  6. Nic L said

    Excellent article, Jeff. It very clearly explains why FDM won’t work satisfactorily with noisy data. One has to compare stations using a longer period of overlap, as you say.

  7. Genghis said

    Jeff I was wondering if you could do the reverse and see what happens when you take out a few inputs. They have dropped a lot of stations and I am wondering if that has an effect.

  8. Your anomaly by climatology example is not representative: The 20 year window is too short; the 10 year overlap is too short.

    CRU requires 15 years in a 30 year window. Your short series would be rejected in a CRU-style analysis.

    GISTEMP (Reference Station Method), when combining records (critically, Step 3 when station records are combined to produce a gridded product), combines using as many years as are in the overlap, and requires 20 years.

  9. Jeff Id said

    #8 An odd comment.

    Of course, if I widened the window centered on the start of the short series to 30 years, the series would pass the artificial CRU criteria and it would only affect the confidence interval, so why does that matter?

  10. Steven Mosher said

    #8.

    David, what makes you think that CRU criteria are “correct”? There is no CORRECT. I use 15 years, but I require the series to have ALL 12 months present. CRU only require 10 months to make a year. Which is correct? Neither. Each gives you an estimate with different properties. The best one can do with CAM is look at a variety of criteria and study the effect DUE TO your choice.

    Roman’s method has no such “analyst choices”: all the data is used and the best estimate is made from the data. Simply, the trend changes if you go from 15 years to 10 to 20, 10-month years, 11-month, 12-month… or how about at least 12 full years, 5 years with 11 months and 3 years with 10 months? That variation is due to the “model specification” that the analyst defines arbitrarily. It puts undocumented uncertainty into the estimate.

  11. @Jeff #9. Doesn’t your observation in the article “Note that the Monte Carlo confidence has expanded.” deserve a note to say that had the overlap and window been larger then the “expanded confidence” would not have expanded so much? Aren’t we comparing the confidence intervals that different methods give?

    @Steven #10. Who mentioned “CORRECT”?

    FTR I have no problems with Roman’s method and I understand it perfectly. I don’t know why you mention it.

  12. Jeff Id said

    #11 Yes, the confidence interval would tighten, as I mentioned. It would still underestimate the trend though. My intent wasn’t to replicate CRU or GISTEMP (we did that elsewhere) but to demonstrate the strengths and weaknesses of the different methods.

  13. Why are you regressing monthly anomalies? Didn’t Roman M tell us that we have to compute yearly anomalies first? (which is what GISTEMP does).

  14. RomanM said

    #13

    Did I miss something in my own post? That is not what I said.

    In fact, calculating yearly anomalies is a problem when some months are missing. Infilling may increase uncertainty and skipping years with some missing monthly values results in the loss of information from the months which are present. The methods I describe can be applied to the original non-anomaly measurements using all of the information which is available without risk of distortion due to seasonal starting points.

  15. @RomanM: Ah yes, you’re right. You said “Several fixes are possible. The simplest is to use year rather than time as the independent variable in the regression.”

    Which in this toy example amounts to the same thing as computing yearly anomalies.

    Of course I agree with your following qualification: “this solution can still be problematical if the number of available anomalies differs from year to year” if by “problematical” you mean “introduces distortions that are much smaller than a dot of printer’s ink”.

    Do you agree that Jeff is introducing an error by drawing trend lines through monthly anomalies?

  16. Jeff Id said

    #15, In this example there is no stairstep from monthly anomaly. The data is random AR1 with a Gaussian distribution. Think of it like a correct anomaly, rather than the climate standard.

  17. So the mean of your July “anomalies” isn’t 0, it’s whatever the trend line is when it hits July (presumably, at 0.05C per decade, some small value).

  18. Jeff Id said

    #17, Yes. Interestingly if you read some of the older posts on Roman’s method, it provides a system to do this exact calculation. I think of it as a true anomaly, it makes very little difference to the result though.

  19. Steven Mosher said

    Sorry David.

    When you said that 20 was “too short” I thought you were implying there was a correct method. Glad to see you say that there is no correct method.
