the Air Vent

Because the world needs another opinion

Thermal Hammer Part Deux

Posted by Jeff Id on March 25, 2010

This is basically a continuation of the documentation of Roman’s improved method of combining temperature anomaly.  First there are several plots which show the difference between the simple – average the anomaly method – used universally in climate science overlaid on the improved method.

Figure 1 - Two plots of global temperature anomaly. Red is the standard method used in climate science, and black is the improved method.

Note the slightly higher trend in the black method.  This is created by the required offset corrections for incomplete temperature stations.  Looking back, it’s kind of funny, I wouldn’t even calculate an Antarctic trend without the offsets.  Nobody had to tell me, hey Jeff you should….. it’s obvious!!  They are a conceptual necessity in calculation of a global trend.  The more I consider it, the more idiotic it seems that climate ‘science’ doesn’t do it.

Below is the difference in trend according to these new methods.  Please don’t pay too much attention to the confidence intervals b/c the data is filtered.

Figure 2 - Difference between global trend

On the last thread  Chip Knappenberger said;

The land-only CRU dataset has a trend of about 0.224C/decade since 1978 [edit], which is not all that much different from your 0.248C/dec.

Which is pretty cool.  The difference between the corrected offset anomaly average and the standard is basically the same thing.

Figure 3 - Global trend since 1978 using standard non-offset anomaly combination

Figure 4 - Global trend since 1978 using improved offset anomaly combination

How cool is that!  I mashed together global trends using my own standard method in Figure 3 and came within a few decimals of the pro’s.  Like sea ice, this is not too bad.

Roman’s method is so obviously better Figure 4, as is the method I used for Antarctic trend but they are not used by the pro’s. It’s slightly higher in trend as we would expect.

It’s too much, it has to be said —  the waministas who would trade their grandma’s for a few tenths of a degree, didn’t think of the most obvious and correct method to calc temp anomaly.  I’m not even talking about Roman’s mastery of the subject but just about offsetting two anomalies to match each other.  The method always increases trend — but it is correct!! Wow, in Mann08, I was convinced that the idiocy was fraud, this time, I think it’s just incompetence. It is like a whole new world has opened up.

Let’s leave my overactive (and over-vocal) thoughts for another post though. I promised the complete code last night and nobody called me out on it.  I’ve attached everything except some plotting functions below.  The plotting functions are not required for verification and they do add a  lot of false complexity to the appearance.  One of my goals is to get people to take a crack at using R and actually running this stuff.  It’s not so heady that you can’t do it, and there is a ton to learn. If you need the plotting code, leave a comment.

###################################################
## gridded global trend calculation with offset####
###################################################

#### load external functions filtering used
source("http://www.climateaudit.info/scripts/utilities.txt")  #Steve McIntyre

###################################################
######## data load ################################
###################################################
loc="C:/agw/R/GHCN/v2.mean."
wd=c(3,5,3,1,4,5,5,5,5,5,5,5,5,5,5,5,5)
ghmean=read.fwf(loc,widths=wd,comment.char="")

loc="C:/agw/R/GHCN/v2.country.codes"
wd=c(3,38)
ccode=read.fwf(loc,widths=wd)

loc="C:/agw/R/GHCN/v2.temperature.inv"
wd=c(3,5,4,30,8,7,4,6,1,10,3,2,16,1)
#wd=100
tinv=read.fwf(loc,widths=wd,comment.char="")
stationnum=nrow(tinv) #7280

#save(ghmean, ccode, stationnum,tinv ,file="GHCN data.RData")
#load("ghcn data.RData")

###################################################
######## functions ################################
###################################################
# many of these functions were written by Ryan O SteveM, NicL,RomanM

### Gausssian filter
ff=function(x)
{
	filter.combine.pad(x,truncated.gauss.weights(11) )[,2]#31
}

### Sum of squares
ssq=function(x) {sum(x^2)}

################################################
######## roman M offset series combination #####
################################################
##### version2.0
### subfunction to do pseudoinverse
psx.inv = function(mat,tol = NULL)
{
 	if (NCOL(mat)==1) return( mat /sum(mat^2))
	msvd = svd(mat)
 	dind = msvd$d
	if (is.null(tol))
	{
		tol = max(NROW(mat),NCOL(mat))*max(dind)*.Machine$double.eps
	}
 	dind[dind<tol]=0
 	dind[dind>0] = 1/dind[dind>0]
 	inv = msvd$v %*% diag(dind, length(dind)) %*% t(msvd$u)
	inv
}
### subfunction to do offsets
calcx.offset = function(tdat,wts)
{
	## new version
 	nr = length(wts)
 	delt.mat = !is.na(tdat)
 	delt.vec = rowSums(delt.mat)
 	row.miss= (delt.vec ==0)
 	delt2 = delt.mat/(delt.vec+row.miss)
 	co.mat = diag(colSums(delt.mat)) - (t(delt.mat)%*% delt2)
 	co.vec = colSums(delt.mat*tdat,na.rm=T) - colSums(rowSums(delt.mat*tdat,na.rm=T)*delt2)
 	co.mat[nr,] = wts
 	co.vec[nr]=0
 	psx.inv(co.mat)%*%co.vec
}

temp.combine = function(tsdat, wts=NULL, all=T)
{
	### main routine
	nr = nrow(tsdat)
	nc = ncol(tsdat)
 	dims = dim(tsdat)
 	if (is.null(wts)) wts = rep(1,nc)
 	wts=wts/sum(wts)
 	off.mat = matrix(NA,12,nc)
 	dat.tsp = tsp(tsdat)
 	for (i in 1:12) off.mat[i,] = calcx.offset(window(tsdat,start=c(dat.tsp[1],i), deltat=1), wts)
 	colnames(off.mat) = colnames(tsdat)
 	rownames(off.mat) = month.abb
 	matoff = matrix(NA,nr,nc)
 	for (i in 1:nc) matoff[,i] = rep(off.mat[,i],length=nr)
 	temp = rowMeans(tsdat-matoff,na.rm=T)
 	pred=NULL
 	residual=NULL
 	if (all==T)
	{
		pred =c(temp) + matoff
 		residual = tsdat-pred
	}
	list(temps = ts(temp,start=c(dat.tsp[1],1),freq=12),pred =pred, residual = residual, offsets=off.mat)
}

#pick out those series with have at least nn + 1 observations in every month
#Outputs a logical vector with TRUE indicating that that sereis is OK
dat.check = function(tsdat, nn=0)
{
	good = rep(NA,ncol(tsdat))
	for (i in 1:ncol(tsdat)) good[i]= (min(rowSums(!is.na(matrix(tsdat[,i],nrow=12))))>nn)
	good
}

######################################################################
##### slope plotting function with modifications for 12 month anomaly
######################################################################
get.slope=function(dat=dat)
{
	len=length(dat)
	month = factor(rep(1:12,as.integer(len/12)+1))
	month=month[1:len]	#first month by this method is not necessarily jan
					#the important bit is that there are 12 steps
	time2=time(dat)
	coes = coef(lm(dat~0+month+time2))
	residual=ts(rep(NA,len),start=time2[1],deltat=1/12)
	fitline=array(NA,dim=c(len,12))
	for(i in 1:12)
	{
		fitline[,i]=coes[13]*time2+coes[i]
	}
	anomaly=rep(NA,len)
	for(i in 1:len)
	{
		residual[i]=dat[i]-fitline[i,month[i]]
	}
	anomaly = residual+time2*coes[13]-mean(time2*coes[13],na.rm=TRUE)
	list(coes,residual,anomaly)
}

### plot average slope using roman's methods
plt.avg2=function(dat, st=1850, en=2011, y.pos=NA, x.pos=NA, main.t="Untitled") {

	###  Get trend
	fm=get.slope(window(dat, st, en))

	###  Initialize variables
	N=length(window(dat, st, en))
	I=seq(1:N)/frequency(dat)
	rmI=ssq(I-mean(I))

	###  Calculate sum of squared errors
	SSE=ssq(fm[[2]][!is.na(fm[[2]])])/(N-2)

	###  Calculate OLS standard errors
	seOLS=sqrt(SSE/rmI)*10

	###  Calculate two-tailed 95% CI
	ci95=seOLS*1.964

	###  Get lag-1 ACF
	r1=acf(fm [[2]], lag.max=1, plot=FALSE,na.action=na.pass)$acf[[2]]

	###  Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
	Q=(1-r1)/(1+r1)
	ciQ=ci95/sqrt(Q)

	maxx=max(time(fm[[3]]))
	minx=min(time(fm[[3]]))
	maxy=max(fm[[3]],na.rm=TRUE)
	miny=min(fm[[3]],na.rm=TRUE)

	###  Plot data
	plot(window(fm[[3]], st, en), main=main.t, ylab="Anomaly (Deg C)")
	lfit=lm(window(fm[[3]], st, en)~I(time(window(fm[[3]], st, en))))

	###  Add trendline and x-axis
	abline(a=lfit[[1]][1],b=lfit[[1]][2], col=2); abline(fm, col=4, lwd=2)
	grid()
	###  Annotate text
	tx=paste("Slope: ", round(fm[[1]][13]*10, 4)
		,"Deg C/Decade", "+/-", round(ciQ, 4)
		, "Deg C/Decade\n(corrected for AR(1) serial correlation)\nLag-1 value:"
		, round(r1, 3))
	text(x=ifelse(is.na(x.pos), (maxx-minx)*.2+minx,x.pos), y=ifelse(is.na(y.pos), (maxy-miny)*.8+miny, y.pos), tx, cex=.8, col=4, pos=4)
}
######## function plots monthly data with seasonal signal intact 12 slope curves
plt.seasonal.slopes=function(dat, st=1850, en=2011, y.pos=NA, x.pos=NA, main.t="Untitled")
{
	###  Get trend
	fm=get.slope(window(dat, st, en))

	###  Initialize variables
	N=length(window(dat, st, en))
	I=seq(1:N)/frequency(dat)
	rmI=ssq(I-mean(I))

	###  Calculate sum of squared errors
	SSE=ssq(fm[[2]][!is.na(fm[[2]])])/(N-2)

	###  Calculate OLS standard errors
	seOLS=sqrt(SSE/rmI)*10

	###  Calculate two-tailed 95% CI
	ci95=seOLS*1.964

	###  Get lag-1 ACF
	r1=acf(fm [[2]], lag.max=1, plot=FALSE,na.action=na.pass)$acf[[2]]

	###  Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
	Q=(1-r1)/(1+r1)
	ciQ=ci95/sqrt(Q)

	maxx=max(time(dat))
	minx=min(time(dat))
	maxy=max(dat,na.rm=TRUE)
	miny=min(dat,na.rm=TRUE)

	###  Plot data
	plot(window(dat, st, en), main=main.t, ylab="Temperature (Deg C)")

	###  Add 12 trendlines and x-axis
	for(i in 1:12)
	{
		abline(a=fm[[1]][i],b=fm[[1]][13], col=2);
	}
	grid()
	###  Annotate text
	tx=paste("Slope: ", round(fm[[1]][13]*10, 4),"C/Dec"
		,"+/-", round(ciQ, 4)
		,"C/Dec(AR(1)Lag-1 val:"
		, round(r1, 3))
	text(x=ifelse(is.na(x.pos), (maxx-minx)*.2+minx,x.pos), y=ifelse(is.na(y.pos), (maxy-miny)*.05+miny, y.pos), tx, cex=.8, col=4, pos=4)
}
dat=nh
plt.seasonal.slopes(dat,main.t="Typical Temperature Series")
#savePlot("c:/agw/plots/12 line seasonal sloope.jpg",type="jpg")

###########################################################
### combine staton data by insuring whether each series is in fact the same
###########################################################
getsinglestation=function(staid=60360, version=0)
{
	staraw=NA
	#raw data
	smask= (ghmean[,2]==staid)
	mask= ghmean[smask,3]==version
	data=ghmean[smask,][mask,]
	noser= levels(factor(data[,4]))

	for(j in noser)
	{
		mask2=data[,4]==j
		startyear=min(data[mask2,5])
		endyear=max(data[mask2,5])
		subd=data[mask2,6:17]
		index=(data[mask2,5]-startyear)+1
		dat=array(NA,dim=c(12,(endyear-startyear)+1))
		for(k in 1:length(index))
		{
			dat[,index[k]]=as.numeric(subd[k,])
		}
		dim(dat)=c(length(dat),1)

		dat[dat==-9999]=NA
		dat=dat/10
		rawd=ts(dat,start=startyear,deltat=1/12)
		if(max(time(rawd))>=2011)
		{
			print ("error series")
			rawd=NA
		}

		if(!is.ts(staraw))
		{
			staraw=rawd
		}else{
			staraw=ts.union(staraw,rawd)
		}
	}
	if(!is.null(ncol(staraw)))
	{
		allraw=ts(rowMeans(staraw,na.rm=TRUE),start=time(staraw)[1],freq=12)
	}else{
		allraw=staraw
	}

	allraw
}

##############################################################
############ Assemble stations into grid Jeff Id #############
##############################################################

##assign stations to gridcells
gridval=array(NA,dim=c(stationnum,2))
gridval[,1]=as.integer((tinv[,5]+90)/5)+1  #lat
gridval[,2]=as.integer((tinv[,6]+180)/5)+1  #lon

#calculate gridcell trends
gridtem=array(NA,dim=c(36,72,211*12))

for(ii in 1:72)
{
	for(jj in 1:36)
	{
		maska=gridval[,2]==ii
		maskb=gridval[,1]==jj
		maskc= maska & maskb
		sta=NA ##initialize to non TS
		for(i in (1:stationnum)[maskc])
		{
			#get all station trends for station ID
			rawsta = getsinglestation(tinv[i,2],tinv[i,3])

			#place all curves side by side in a ts structure
			if(!is.ts(sta))
			{
				sta=rawsta
			}else{
				ts.union(sta,rawsta)
			}
		}

		if(is.ts(sta))#if at least one time series exists
		{

			sta=window(sta,start=1800)	#ignore pre-1800
			index=as.integer(((time(sta)-1800)*12+.002)+1)  #calculate time index for insertion into array
			if(!is.null(ncol(sta)))  #is station more than one column
			{
				gridtem[jj,ii,index]=temp.combine(sta)$temps
			}else{
				gridtem[jj,ii,index]=sta
			}
			print(jj)  #progress debug
			tit=paste("ii=",ii,"jj=",jj)
			plot(ts(gridtem[jj,ii,],start=1800,freq=12),main=tit)  #debug
		}
	}
	print ("COLUMN")  #progress debug
	print(ii) 		#progress debug
}

###########################################
## calculate global trends using roman's offset combine on each grid
###########################################
rgl=gridtem
dim(rgl)=c(36*72,2532)
rgl=t(rgl)
rgl=ts(rgl,start=1800,deltat=1/12)
mask=!is.na(colMeans(rgl,na.rm=TRUE))
wt=rep(weight,72)
maskb=array(FALSE,dim=c(36,72))
maskb[1:18,]=TRUE
dim(maskb)=(36*72)
maskc=mask & maskb
sh=temp.combine(rgl[,maskc],wt[maskc])$temps
maskc=mask & !maskb
nh=temp.combine(rgl[,maskc],wt[maskc])$temps

plot(nh,main="Northern Hemisphere Temperature",xlab="Year", ylab="Deg J")
#savePlot("c:/agw/plots/NH temp - roman h.jpg",type="jpg")
plot(sh,main="Southern Hemisphere Temperature",xlab="Year", ylab="Deg J")
#savePlot("c:/agw/plots/SH temp - roman h.jpg",type="jpg")
un=ts.union(nh,sh)

glt=temp.combine(un)$temps
plot(glt,main="Global Temperature",xlab="Year", ylab="Deg J")
#savePlot("c:/agw/plots/global temp - roman h2.jpg",type="jpg")
plt.seasonal.slopes(window(glt,start=1900),main.t="Global Temperature Trend")
#savePlot("c:/agw/plots/global temp trend- roman h2.jpg",type="jpg")

plt.avg2(window(glt,start=1900),main.t="Global Temperature Anomaly")
#savePlot("c:/agw/plots/global temp anomaly- roman h2.jpg",type="jpg")

plt.avg2(window(glt,start=1900),main.t="Global Temperature Anomaly")
lines(hadcrut,col="red")
legend("bottomright",c("Ids","HadCru"),col=c(1,2),pch=10)
#savePlot("c:/agw/plots/id plus hadcrut.jpg",type="jpg")

nha=get.slope(nh)
plot(window(nha[[3]],1900),main="Northern Hemisphere Anomaly",ylab="Anomaly Deg C",xlab="Year")
#savePlot("c:/agw/plots/northern hemisphere anom.jpg",type="jpg")

sha=get.slope(sh)
plot(window(sha[[3]],1900),main="Southern Hemisphere Anomaly",ylab="Anomaly Deg C",xlab="Year")
#savePlot("c:/agw/plots/Southern hemisphere anom.jpg",type="jpg")

gla= get.slope(glt)

plot(ff(window(nha[[3]],1978)),main="Hemispheric Temperature Anomaly",ylab="Anomaly Deg C",xlab="Year")
lines(ff(window(sha[[3]],1978)),col=2)
legend("bottomright",c("Northern","Southern"),col=c(1,2),pch=10)

#savePlot("c:/agw/plots/hemisphere anom1978 .jpg",type="jpg")

plot(ff(window(nha[[3]],1978)),main="Hemispheric Temperature Anomaly",ylab="Anomaly Deg C",xlab="Year")
lines(ff(window(glava,1978)),col=2)
legend("bottomright",c("Northern","Southern"),col=c(1,2),pch=10)

#savePlot("c:/agw/plots/hemisphere anom1978 .jpg",type="jpg")

plt.avg2(ff(window(glt,1978)),main="Global Temperature Anomaly")
#savePlot("c:/agw/plots/Global anom1978 .jpg",type="jpg")

####################################
## global average without offset grid combinations
####################################

gta=array(NA,dim=c(36,72,211*12))

for(ii in 1:72)
{
	for(jj in 1:36)
	{
		gta[jj,ii,]=calc.anom(gridtem[jj,ii,])
	}
}

glav=rep(NA,211*12)				#global ave
weight=sin(seq(2.5,180, 5)*(pi)/180)	#cell weights
meanweight=mean(weight)
weight=weight/meanweight			#normalize cell weights
#weight=rep(1,36) # calculate unweighted temp  #uncomment for unweighted version
mask=rep(FALSE,36)				#lon mask
mask[1:18]=TRUE					#lon mask to allow southern hemisphere

for(kk in 1: (211*12))				#hemisphere 1
{
	mm=rep(NA,72)
	for(ii in 1:72)
	{
		mm[ii]=mean(gta[mask,ii,kk]*weight[1:18],na.rm=TRUE)
	}
	glav[kk]=mean(mm,na.rm=TRUE)
}
glava=ts(glav,start=1800,deltat=1/12)	#southern hemisphere average
mask=!mask						#select other hemisphere
for(kk in 1:(211*12))
{
	mm=rep(NA,72)
	for(ii in 1:72)
	{
		mm[ii]=mean(gta[mask,ii,kk]*weight[19:36],na.rm=TRUE)
	}
	glav[kk]=mean(mm,na.rm=TRUE)
}

glavb=ts(glav,start=1800,deltat=1/12)     #north hemisphere average

glav=(glava+glavb)/2				#global average

57 Responses to “Thermal Hammer Part Deux”

  1. tarpon said

    I have a question, does it really matter if the temperature is rising? Because even if it is, it proves exactly nothing, except the temperature is rising.

  2. Jeff Id said

    #1, I tend to let these comments go to others with more opinion. Life is incredibly busy (far too) but my hope is that people realize that I always read.

    There are a ton of things I could write on the meaning of temp rise. It would be cool to get someone like the Lurker to let it rip. IMO, it’s good to combine data in the best possible way, no matter the result or meaning. I mean it’s a pure goal. We don’t get a choice, nobody is forcing anything on anyone else. It’s just data. Cold as space and as predictable as addition. We can actually calculate a proper temperature anomaly from what appears to be weak data in need of serious QC.

    When you say — does it matter.

    The answer is yes, to modelers who believe they can predict the anomaly of the atmosphere. Consider that we have hundred million dollar computers calculating temperature models, and guys like Ben Santer comparing their output to temperature trend for accuracy. YET NOBODY BOTHERED TO OFFSET [snip] ANOMALIES!!!

    [snip] — I really am the most snipped person here. And it’s real snips, not fabricated ones. haha.

  3. M. Simon said

    Look at Willis’ post today on WUWT.

    I am not at all comfortable with manufacturing temperatures.

  4. Jeff Id said

    #3, This is in no way ‘manufactured’ temperature.

    Willis’s post has to do with the comparison between correlation (which like the Antarctic posts here) is highly dependent on short term variance, and trend which affects correlation nearly zero. It’s actually an odd post for him because he’s very good with numbers.

    What this post, like the previous post, shows, is that there is no ‘hide the decline’ in the algorithms that create global temperature series. They are basically correct for the data that is being shoved into them. That means that those who would argue with global warming have got to focus on the data, b/c the trends presented are reasonable in a mathematical sense.

    What this post and the previous also show, is that the boys who like warming, don’t have a clue what they are doing. It’s obvious and correct to offset anomalies when calculating trend. If you need the usage of a supercomputer to calculate your model, why the hell don’t you calculate anomaly offsets for trend on your laptop.

  5. Geoff Sherrington said

    Not wanting to sound downbeat, but to me the next logical step would be to look at the ways that various experts would handle gridding/interpolation etc. You sound like you could do with a holiday first, though. Even a Roman Holiday.

    Beware the empty data cell.

  6. timetochooseagain said

    Now there is just one final nut to crack-how to account for various sources of bias that might be present in the GHCN data. By which I mean, anything that might induce spurious or unrepresentative trends that hasn’t been accounted for.

    That’s a tricky one.

    Oh, quick question, have you thought about looking at Max/Min temps? I believe, correct me if I’m wrong, that GHCN lists those separately, in addition to the mean they get from them. A lot of us surface data skeptics think that Max temps would be more representative of the thermal state of the atmosphere and less influenced by local biases. The Diurnal range is also supposed to be a big signal in the data (of something, but people disagree on what).

  7. Carrick said

    It looks to me like there is a problem with the script as posted.

    Line 232: dat = nh

    It doesn’t appear that nh was defined at this point.

  8. Jeff Id said

    #7, yup, that was a test added in after the fact.

  9. Carrick said

    Gotcha. So just comment out lines 232 and 233 to get this to run.

    In case it’s not clear the urls for the data files are at

    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.country.codes
    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.temperature.inv
    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z
    

    v2.mean will need to be renamed to v2.mean. and the path names probably appropriately edited.

    Nicely done, Jeff.

  10. Thanks, Jeff and Carrick,
    I revived the savePlot commands and found my version of R insisted on type=”jpeg”, not jpg. That fixed, it ran for a long time, through all the gridcell trending (showing reasonable plots), but stopped at line 350 where “weight” had not been created.

  11. Re: Nick Stokes (Mar 26 07:30),
    I went to this post, and found this code for weights:
    weight=sin(seq(2.5,180, 5)*(pi)/180) #cell weights
    meanweight=mean(weight)
    weight=weight/meanweight #normalize cell weights

    That let me proceed. Objects hadcrut and glava were missing, but I could just skip those plots. Then the function calc.anom() was missing. I guess I could find that without too much trouble, but by now a whole lot of plots have appeared, and they look pretty reasonable. “Global Temperature Anomaly” has a super HS at the end – an artefact, but matches well otherwise. Fig 4 of the previous post seems identical, as do the hemisphere anomaly plots. The temp plots look much the same.

    It’s getting late here – I’ll track down calc.anom() tomorrow.

  12. Jeff Id said

    It sounds like I’ve got some clean up work to do. Too many posts were generated from different pieces of the code.

  13. Bob H. said

    It seems the processing of the data has been refined to the point where the results are reliable based on the input data. The problem, as I see it, is the input data has been badly corrupted through UHI effect and other seemingly random adjustment (New Zealand, Australia, etc.) and poor station siting (Watt’s station siting project) plus the station dropout since 1990.

    Once these problems have been corrected from the raw data, perhaps we can then determine what the real numbers are. Until then, it GIGO.

  14. CarlGullans said

    bang bang roman’s silver hammer came down, upon their heads…

  15. [...] Comparisons Written by: lucia Jeff Id posted Roman’s Temperature Reconstruction part 1 and part 2. This uses Roman’s method of combining stations which I am not going to try to explain. (Jeff [...]

  16. It would be cool to replace the Actual temps in the GHCn files with some test data.

    Other guys ( like zeke and CCC and CRU ) could then see how they handle that.

  17. Jeff, I really think you need a fixed anomaly base period here (eg 1961-1990). I described here the problem caused if you let it vary. You are subtracting an anomaly adjustment made up of fragments from different time periods, and trend in the data will be appear as trend in the anomaly subtraction, unless you use a common period.

  18. Jeff Id said

    #17, That’s a good point for the standard anomaly method, Roman’s doesn’t need it. What should I do with series that don’t have data for that timeperiod?

  19. RomanM said

    Re: Nick Stokes (Mar 26 15:33),

    Nick, I don’t think that you understand how the method works. The anomaly is not calculated individually for the “fragments”, but is calculated for the finished series. The entire calculation prior to that point is based on an “anomaly basis” common to all of the series – it’s called “0 degrees C”.

    One of the things that clued me into the fact that this may be a good approach was the property in the original equivalent SS that Tamino proposed to minimize. Although it may not be immediately apparent from my formulation (but can be proved reasonably easily), the offsets are chosen in a way which also minimizes paired differences between offsetted values at the same time of measurement. If there is a trend common to all the series present in the temperatures, it is removed by the differencing and should not mathematically affect the calculation of the offsets. The trend is the mean of the offsetted values. All of this, however, is done simultaneously.

    Show us that this is happening. If you have math to back up what you say (or at least some sort of example illustrating the effect of what you refer to), I will be happy to consider what you say. Otherwise, this is arm waving not mathematics.

  20. Jeff Id said

    #19, I may have misunderstood but I thought Nic was referring to the standard no-offset method compared to yours in the post above. It could be the reason that I’m about a tenth of a degree lower in trend than the published version.

  21. Zeke Hausfather said

    Interesting work Jeff. Looks like you are ready to dive into segmenting and slicing up the stations to see what results you get from various subsets. I’ve been working on something along those lines over at Lucia’s place using satellite nightlights, population density, and urban boundary data as proxies for urbanity, with the assumption that rural places were most likely rural in the past so UHI effects will be observable in the difference between trends in adjacent urban and rural stations. Ron Broberg over at the Whiteboard has done some good work improving the GHCN station metadata by referencing it to various spatial datasets.

    Checking my model quickly, it looks like I get a 0.258 C per decade trend, pretty close to what you get. For reference, GISS land temps show 0.190 C for that period and NCDC land temps show 0.294. I’m somewhat impressed how far apart they are.

  22. Zeke Hausfather said

    Also: be careful in comparing your results to HadCRUT, since they don’t just use GHCN v2.mean_adj, but rather do a lot of their own adjustments to the data.

    The product that should be easiest to replicate is NCDC land temps available at ftp://ftp.ncdc.noaa.gov/pub/data/anomalies/annual.land.90S.90N.df_1901-2000mean.dat since they are simply using the GHCN v2.mean_adj data.

  23. Re: RomanM (Mar 26 16:00),
    Roman, it’s not an issue just for fragments. It applies when combining any groupings – stations, gridcells etc. If the data covers different periods, and the anomalies are calculated with respect to those periods, then the means that you subtract have a trend. This is a far bigger effect than the artificial seasonal effect that you identified.

  24. Jeff Id said

    #22 thanks Zeke.

    Nick,

    I don’t know if you are reading my comments to you at Lucia’s or above but there are two issues Roman solved. One is the seasonal ripple, the second is to realign anomalies by offsetting them to remove steps.

  25. Re: Jeff Id (Mar 26 15:35),
    What should I do with series that don’t have data for that timeperiod?
    That’s the big issue with all anomaly gridding methods. It’s why GISS compute anomalies for grid cells rather than stations, because the grid cell is more likely to have some data for the period.

    My own non-standard suggestion is to do a regression over some period which is as close as possible to the 1961-90 but has enough data, and then use the end 1975 fitted value as the mean to subtract.

    I managed to get your program to run to completion using the following calc.anom function, which I think implements your intent:
    calc.anom =function(tsdat) {
    anom = tsdat
    for (i in 1:12) { sequ = seq(i,length(tsdat),by = 12)
    anom[sequ] = tsdat[sequ] – mean(tsdat[sequ])
    }
    anom
    }

    I adapted it from here.

  26. RomanM said

    Re: Nick Stokes (Mar 26 16:55),

    Jeff’s calculation of the trends simultaneously anomalizes the data. It is not done sequentially as you suggest. No “means with a trend” are subtracted.

  27. ha,

    I managed to get R to run on my mac.

    Don’t expect anything of value from me for a while.

  28. Re: RomanM (Mar 26 17:45),
    Yes, OK, I’ve gone through the math and I now agree that with the simultaneous fitting of offsets and combined function (as with your method and Tamino’s), you don’t need a fixed base period to avoid spurious trend effects. However, the cost is that the offsets will vary as the data period varies. If new data comes in, all the past offsets change. GISS publishes gridded anomalies for 1978, say, and they would change with 2010 data. You need a fixed period to avoid that.

    The objection to the varying base period still applies to the implementation of the “standard non-offset anomaly” which is presented here for comparison. There the fixed common period is needed, and is part of the “standard”. So a comparison without it is meaningless.

  29. Jim said

    What if you plotted two charts – one for the grids with data in the same latitude around the globe and one for grids in the same longitude line around the globe. Smooth both graphs. For a grid with no data, use the average of the two (monthly) smoothed graphs at that grid. This would give an estimate based on the shape of the two temperature curves for the entire circumference of the Earth.

  30. 28.

    Nick, does it matter to the comparision of trend estimations?

  31. Re: steven mosher (Mar 27 14:12),
    Yes. I think Roman’s method is fine, but the “standard” method without a fixed base period will tend to underestimate trend, which spoils the comparison.

  32. Carrick said

    I agree with Nick here that the “standard method” needs to be cleaned up so that a common base period gets used. Same for Roman’s just so prior years are stabilized against future data additions.

    We need to get Jeff/Roman’s code robust enough and have it automatically produce “normal products”.

    I would suggest these outputs:

    global temperature in “standard format”:

    each line would be of the form:

    year jan feb mar .. dec

    where jan…dec are the temperatures. E.g., line #1 of crutempgl3v:

    1850 -1.650 -0.270 -0.481 -0.725 -0.572 0.059 -0.581 -0.300 -0.245 -0.303 -0.4
    15 0.170 -0.443

    same for northern hemisphere and southern hemisphere.

    Then something like the hadcrut 5°x5° gridded format:

    Year month 36 rows 72 columns NA
    followed by 36 lines with 72 values each starting from 90°S going to 90°N i

    such as:

    1980   3  36 rows 72 columns
    NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
     NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA N
    A NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
    NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1.23 NA NA NA NA 1.646 NA 
    NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3.789 NA NA NA
     NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    NA NA NA NA NA NA NA NA NA NA NA NA 0.9329 NA NA NA NA NA NA NA NA NA NA NA NA N
    A NA NA NA NA NA NA 4.401 NA NA NA NA NA NA -0.2039 -1.638 -1.751 0.4102 NA NA N
    A NA NA NA NA NA 2.867 NA NA NA NA 1.744 NA NA NA NA NA NA -1.395 NA NA NA NA NA
     NA NA NA
    1.306 NA NA NA 1.868 NA NA 1.267 NA NA 0.5835 NA NA NA NA NA NA NA NA NA 4.805 N
    A 4.517 NA 5.579 NA NA NA NA NA NA 4.778 -0.6205 -0.8986 -0.2707 0.6645 0.1079 0
    .08463 0.03469 -0.09201 -0.3282 -0.1264 -0.2592 -1.261 -0.4234 -0.7038 -0.9933 -
    0.06599 NA NA NA NA 1.145 NA NA NA -2.385 NA NA NA NA NA NA NA NA -1.863 NA NA -
    0.9145 NA NA NA
    

    This needs to get to that level so that misuse of your results by neophyte R-users like me are less likely to occur.

    If you need any help with writing the output functions (which is really all that is needed right now), I’m sure Nick or I could help.

    Not trying to be tough on you Jeff, it’s great work. It’s just getting close enough to a complete project to need further code cleanup. You knew that. ;-)

  33. Carrick said

    I mean to have the first line of my gridded file to look like:

    1980   3  36 rows 72 columns NA
    

    “NA” is the missing data token.

  34. Jeff Id said

    Carrick,

    I also agree that Nick’s right and the normal method needs to be done over a base period. Roman’s method doesn’t use anomaly. The whole thing is compiled as seasonal data before the final trend was anomalized using a method employing Roman’s improved trend calc.

  35. Geoff Sherrington said

    Jeff,

    Here is a paper from Australia BoM year 2003

    I had to OCR from a poor potocopy given to me by the BoM. Sorry about the quality. If you need equations repeated I’ll JPEG them.

    The point is that the 1200 kn radius correction is invoked. The problem is that I do not know if it is invoked in the global data you have been using above. I suspect it its, so there is a double correction.

    I seem to be due to pay some indulgences before the BoM will talk to me again because a few years ago I mentioned that global temperatures might not increase. If one of you has a better connection, you might be able to find precisely what adjustments the BOM has made to the data (and over what periods) that you are using in the above valuable global example.

  36. Geoff Sherrington said

    Nick Stokes “It’s why GISS compute anomalies for grid cells rather than stations, because the grid cell is more likely to have some data for the period”

    This is a judgement call, because there is no guarantee that an estimate made form other stations in a grid cell is any better or worse that a straight guess of a missing value for a station, or an average od data on either side.

    One day we might get to a stage where we can simply leave all missing data out of global calculations, do no guesses or interpolations, then calculate a difference to see if it is significant. First we have to find a reliable series to compare the difference. It might be that the effects of all missing values sum to nothingness – they might be equally distributed over seasons, latitudes, altitudes, have cancelling TOBS adjustments and so on. Maybe this present work is a large investment in finding ways to overcome missing data and I hope that it allows an examnation of whether missing data matter or not. I am forever uncomfortable about assigning values to missing data because all methods are guesses, by definition.

  37. I tried my short cut method of finding an anomaly by fitting a regression and using a fixed year value in place of the mean. I tested the 1978-2009 period, and used 1994 as the reference year, being a sort of mid-point. This required just a change to calc.anom(). Using the original calc.anom() I got a global trend of 0,211 C/Dec (cf Jeff’s 0.2107). Using the regression I got 0.2217 C/Dec, which is moving towards agreement with Roman’s method (0.2476), but not very far. However, I found that there was more dependence on the choice of year than I liked. Using 1975 brought the trend back to 0.2156 C/Dec.

    Here’s the revised calc.anom():
    calc.anom =function(tsdat) {
    anom = tsdat
    x = seq(1,length(tsdat),by = 12)
    n = 1:length(x) – 195 # centred at 1994
    for (i in 1:12) {
    a = tsdat[x]
    if(sum(!is.na(a))>8){ # eliminate very short sequences
    h = lm(a ~ n)$coef[1]
    anom[x] = a – h
    }else{
    anom[x] = NA
    }
    x = x + 1
    }
    anom
    }

  38. RomanM said

    Nick:

    There appear to be some R technical problems with your script. My computer hiccupped on several of the lines.

    It appears that you are regressing the raw data on a monthly basis over the entire range and zeroeing the intercept at the given year. I am not sure that the programme actually does that correctly since the results I was getting on a sequence of 240 normal variates looked a little weird.

    You are using different slopes for each month and I don’t know that that is desirable. The other possible difficulty I perceive with this is the assumption of linearity inherent in the process. I am having some trouble visualizing exactly what anom actually represents at the end of this entire calculation.

  39. RomanM said

    Carrick:

    Same for Roman’s just so prior years are stabilized against future data additions.

    Yes, you have a point here.

    I ran an example with four series with reasonable amounts of data. I did four runs cutting the data at 1979, 1989, 1999, and 2009, respectively. Differences among the four resultingcombined series were less than a maximum .05 in more recent years (with most much closer to zero), however some reached .15 a century back when only one station was contributing.

    The reason is that the offsets may change slightly as more information becomes available for comparing the input series. Anomalizing the combined series does not change the situation substantially. However, if the entire period is not used, the problem of some series not being usable any more becomes an issue. As well, the uncertainty of the combined series also is increased. Some thought needs to be put into finding procedures regarding these stabilization issues.

  40. Carrick said

    RomanN, when you add more data there are two effects, one is geometric, namely the centroid of the data shifts forward in time. Subtracting the value for a constant baseline (e.g., 1960-1980) just “freezes” the baseline so it doesn’t drift as you add more data.

    The second effect is a statistical one, and is a “real” effect: As you add more data, you are using more information, and that obviously influences your estimates of the offsets.

  41. Re: RomanM (Mar 28 12:40),
    You are using different slopes for each month and I don’t know that that is desirable.
    Yes, it’s a short cut to emulate the effect of a fixed base period without the difficulties that ensue when there are missing values there. It’s for comparison, not claimed to be better than your use of a common trend. The idea is that instead of subtracting the mean of each grid set, you fit a line and subtract its value in a fixed year.

    I’ve pasted the code as it ran here. I have been trying (unsuccessfully) to locate a technical problem because, as I mentioned, it is more sensitive to the choice of base year than I think it should be. Could you be using a different na.action.default for lm()? I’m using the default na.omit.

  42. Geoff Sherrington said

    Sorry for the junk transmission in my 35. I’ll tidy it up. Different browsers/hosters can have subtle (to me) differences and sometime I get it wrong the first time.

    The essence is whether countries like Australia have already made adjustments of the type you incorporate, before you get the data. I simply do not know the answer. The important graphs from above for visual purposes are

    http://i260.photobucket.com/albums/ii14/sherro_2008/d-m05.jpg?t=1269817480

    Re use of correlation coefficients, one gets a different answer using daily Tmax over a short period, to annual Tmax over a long period, for parts of the the same pairs in a group, and a different answer again after smoothing. How do you decide the fundamentally correct calculation? It points to correlation coefficients changing as instruments change, since some of them can do smoothing like daily spike rejection.

  43. Carrick said

    Geoff what’s the reference for that figure?

    Thanks.

  44. geoff

    the missing data and the adjustments are the bits of fudge where more certainty gets assigned than is warrented.

    I’ll take TOBS for example. If you have 60 measurements per month and your instrument error is like that assumed by Jones,
    then your monthly estimate is good to .03C 1sd.

    So a monthly temp ( say march) is say 10c +-.1C at a 95% CI.

    Now you find out that this 10C was taken at 6AM as opposed to midnight, the standard time.

    What’s a poor climate scientist to do?

    Well, build a TOBS model. This model is empirical. they look at years of hourly data from stations. They note the
    month, the lat, the lon, the position of the sun. Then they do a regression. This regression is a prediction.
    The TOBS regression says This:

    F(lat,lon,month, 6AM) = -.5C. In most cases the prediction has a SE of between .1 and .2C It all depends.

    So now, how well do you know that march temperature?.. The estimate comes out to 9.5C.. what about the error?

    I imagine that it’s vanishingly small. Still, I’d like to see how its treated. Same for the MMTS adjustments.

  45. Zeke Hausfather said

    Jeff,

    Cross-posting this from Lucia’s comment thread, since I’m not sure you are actively reading it. Would it be possible to obtain annual anomalies 1900-present for this method? I’m planning on learning R myself in the near future, but at the moment I’m looking to put together all the various surface temp reconstructions floating around (as well as GISS/NCDC/CRUT land temps) on a chart.

  46. Geoff Sherrington said

    43 Carrick

    I could not find an electronic version and goofed in creating one. All I have is

    Aust. Met. Mag. 53 (2004) 75-93
    Updating Australia’s high-quality annual temperature dataset
    Paul Della-Marta and Dean Collins
    National Climate Centre, Bureau of Meteorology, Australia
    and
    Karl Braganza
    CSIRO Atmospheric Research, Aspendale, Australia
    (Manuscript received June 2003; revised September 2003)

    If you go to my post at 35 above, the whole lot is hyperlinked by accident so you can read most of the text. The figures were dropped out in the OCR/rename/post prcesses. Let me kow if you need figs or equations. Geoff.

  47. Geoff Sherrington said

    44 Steven Mosher I’m surprised how hard it is to discover if a country submitted data to the CRUs of the world in raw, averaged, adjusted or whatever form, over what periods and also whether this has been upgraded as the info is constantly being improved by people digging back through the original meta data sheets and making revisions to the home country data. My worry is that better statistical methods for assembling global data are doubling up on some adjustments without the authors knowing that they do not have truly “raw” data. I have an email from one country authority asserting that their data are “raw” when in fact they are rather modified.

  48. Carrick said

    Thanks, Geoff. My google kung fu has grown powerful indeed.

    Here’s the paper. Mercifully no pay wall.

  49. Carrick said

    Actually you want this one (hires figures)

  50. [...] Jeff Id/RomanM [...]

  51. Geoff Sherrington said

    48,49 Carrick,

    Thank you for the electronic version. I feel a bit better knowing that it’s not so easy an exercise with a PC. (Enter stage left, Mac owners with PDF claims).

    Now can you find out the periods when the BOM applied these adjustments and which coefficients they used?

  52. [...] Jeff Id/Roman M [...]

  53. [...] Jeff Id http://noconsensus.wordpress.com/2010/03/25/thermal-hammer-part-deux/ [...]

  54. Joe Papp said

    WIthout making a long, involved comment on the INSANITY of “average temperature”. (Sorry, it’s patent nonsense.) If one wishes to think there is some significance in it…let’s note this, the rise (if real, I have HUGE trouble with the QA of the data, which I’ll get to in a moment)…is “linear” and “consistent” from 1900 on. When one uses the Mana Loa CO2 data as a baseline, and the allegations with regard CO2’s “heat trapping” (which I also debate..yes, I’m and Elasser/Milkosky radical…CO2 is a NET EXHANGE AGENT in the troposphere, and makes no differences until above 2500 PPM!) there is obviously a “cause/effect” problem.

    This heads us towards asking what causes overall cycles, taking “man” out of the question.

    With regard to DATA QUALITY. WHEN, just WHEN does anyone think we transitioned from “hand recorded”, “guess at the high or low” data…and “rotating drum” recorders?????

    Does ANYONE REMEMBER that there were NO satellites in 1900? 1920, 1950, even up to 1980??? Does anyone have a CONCEPT that the “hand recorded data” has virtuatlly NO VALUE for 0.1 C values? (.2 F?)….Does anyone understand GARBAGE DATA IN GARBAGE ANALYSIS OUT?

    The reason for the gradual rise may very much be due to a systematic data bias error due to changing technology.

    I see NO error bars. I see NO attempt to quantify error. (Although BEST does have that. I will give them credit for that, but ZERO credit for the great “Average temperature” flaw.!!!!)

    I hope and pray that some day, some BRIGHT INTELLIGENT people will Wake UP! And realize the futility of all this “average temperature” nonsense.

    The ONLY think that you can do with temperature averages is look at a LOCAL REGION, and see if there are “seasonal shifts” going on.

    I’ve done that for MN. AND AFTER REMOVING THE UHI from the Minneapolis/St. Paul area, and “guessing” at the “error bars” for the prior to 1920 data, I cannot (beyond statistical doubt, STANDARD DEVIATION ANYONE?) say that since the first temperature record (Fort Snelling, Minneapolis MN) there has been any SIGNIFICANT change in the overall weather/temperature profiles in MN. (Barring the fact that the 1820-1840 was the end of the “mini-ice age” and does show early winters and late springs. And averages somewhat lower than the following 40 years, but then 1880 to 1920 pans out pretty close in historgrams and trends to the 1820 to 1840 realm. Again, comparisons made by HISTOGRAMS in terms of “degree days”, and NOT by strictly averaging temperatures…insanity.

  55. [...] simultaneously. In this regard the method is similar to that implemented by skeptical blogger JeffId (with RomanM) and the methods developed by Nick Stokes and [...]

  56. [...] [...]

  57. […] http://noconsensus.wordpress.com/2010/03/25/thermal-hammer-part-deux/ […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 140 other followers

%d bloggers like this: