This is basically a continuation of the documentation of Roman’s improved method of combining temperature anomalies. First there are several plots showing the difference between the simple average-the-anomalies method used universally in climate science and the improved method, overlaid on each other.

Note the slightly higher trend in the method plotted in black. This is created by the offset corrections required for stations with incomplete records. Looking back, it’s kind of funny; I wouldn’t even calculate an Antarctic trend without the offsets. Nobody had to tell me, hey Jeff you should….. it’s obvious!! They are a conceptual necessity in the calculation of a global trend. The more I consider it, the more idiotic it seems that climate ‘science’ doesn’t do it.
Below is the difference in trend according to these new methods. Please don’t pay too much attention to the confidence intervals because the data is filtered.

On the last thread Chip Knappenberger said:
The land-only CRU dataset has a trend of about 0.224C/decade since 1978 [edit], which is not all that much different from your 0.248C/dec.
Which is pretty cool. The difference between the corrected offset anomaly average and the standard average is basically the same thing.

How cool is that! I mashed together global trends using my own standard method in Figure 3 and came within a few hundredths of a degree per decade of the pros’. Like sea ice, this is not too bad.
Roman’s method (Figure 4) is so obviously better, as is the method I used for the Antarctic trend, but neither is used by the pros. It’s slightly higher in trend, as we would expect.
It’s too much, it has to be said: the warministas, who would trade their grandmas for a few tenths of a degree, didn’t think of the most obvious and correct method to calculate a temperature anomaly. I’m not even talking about Roman’s mastery of the subject, just about offsetting two anomalies to match each other. The method always increases trend, but it is correct!! Wow, in Mann08 I was convinced that the idiocy was fraud; this time, I think it’s just incompetence. It is like a whole new world has opened up.
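For anyone who wants to see the effect in miniature, here is a toy sketch (my own illustration, not Roman’s code): two stations tracking the same 0.2 C/decade trend, one of which only reports over the second half of the record. A straight average picks up a step where the second station enters; matching the two over their overlap before averaging removes it.

set.seed(1)
yrs = 1950:2010
truth = 0.02*(yrs-1950)                        # underlying 0.2 C/decade trend
stA = truth + 5 + rnorm(length(yrs), sd=0.1)   # warm site, full record
stB = truth - 5 + rnorm(length(yrs), sd=0.1)   # cool site...
stB[yrs < 1980] = NA                           # ...reporting only from 1980 on

naive = rowMeans(cbind(stA, stB), na.rm=TRUE)  # simple average: step down when B enters
off = mean(stA[yrs >= 1980]) - mean(stB[yrs >= 1980])   # match B to A over the overlap
fixed = rowMeans(cbind(stA, stB + off), na.rm=TRUE)

coef(lm(naive ~ yrs))[2]*10   # trend badly biased by the step
coef(lm(fixed ~ yrs))[2]*10   # close to the true 0.2 C/decade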
Let’s leave my overactive (and over-vocal) thoughts for another post though. I promised the complete code last night and nobody called me out on it. I’ve attached everything except some plotting functions below. The plotting functions are not required for verification and they do add a lot of false complexity to the appearance. One of my goals is to get people to take a crack at using R and actually running this stuff. It’s not so heady that you can’t do it, and there is a ton to learn. If you need the plotting code, leave a comment.
###################################################
## gridded global trend calculation with offset####
###################################################

#### load external functions filtering used
source("http://www.climateaudit.info/scripts/utilities.txt") #Steve McIntyre

###################################################
######## data load ################################
###################################################
loc="C:/agw/R/GHCN/v2.mean."
wd=c(3,5,3,1,4,5,5,5,5,5,5,5,5,5,5,5,5)
ghmean=read.fwf(loc,widths=wd,comment.char="")

loc="C:/agw/R/GHCN/v2.country.codes"
wd=c(3,38)
ccode=read.fwf(loc,widths=wd)

loc="C:/agw/R/GHCN/v2.temperature.inv"
wd=c(3,5,4,30,8,7,4,6,1,10,3,2,16,1)
#wd=100
tinv=read.fwf(loc,widths=wd,comment.char="")
stationnum=nrow(tinv) #7280

#save(ghmean, ccode, stationnum,tinv ,file="GHCN data.RData")
#load("ghcn data.RData")

###################################################
######## functions ################################
###################################################
# many of these functions were written by Ryan O SteveM, NicL,RomanM

### Gausssian filter
ff=function(x)
{
  filter.combine.pad(x,truncated.gauss.weights(11) )[,2] #31
}

### Sum of squares
ssq=function(x) {sum(x^2)}

################################################
######## roman M offset series combination #####
################################################
##### version2.0

### subfunction to do pseudoinverse
psx.inv = function(mat,tol = NULL)
{
  if (NCOL(mat)==1) return( mat /sum(mat^2))
  msvd = svd(mat)
  dind = msvd$d
  if (is.null(tol))
  {
    tol = max(NROW(mat),NCOL(mat))*max(dind)*.Machine$double.eps
  }
  dind[dind<tol]=0
  dind[dind>0] = 1/dind[dind>0]
  inv = msvd$v %*% diag(dind, length(dind)) %*% t(msvd$u)
  inv
}

### subfunction to do offsets
calcx.offset = function(tdat,wts)
{
  ## new version
  nr = length(wts)
  delt.mat = !is.na(tdat)
  delt.vec = rowSums(delt.mat)
  row.miss= (delt.vec ==0)
  delt2 = delt.mat/(delt.vec+row.miss)
  co.mat = diag(colSums(delt.mat)) - (t(delt.mat)%*% delt2)
  co.vec = colSums(delt.mat*tdat,na.rm=T) - colSums(rowSums(delt.mat*tdat,na.rm=T)*delt2)
  co.mat[nr,] = wts
  co.vec[nr]=0
  psx.inv(co.mat)%*%co.vec
}

temp.combine = function(tsdat, wts=NULL, all=T)
{
  ### main routine
  nr = nrow(tsdat)
  nc = ncol(tsdat)
  dims = dim(tsdat)
  if (is.null(wts)) wts = rep(1,nc)
  wts=wts/sum(wts)
  off.mat = matrix(NA,12,nc)
  dat.tsp = tsp(tsdat)
  for (i in 1:12) off.mat[i,] = calcx.offset(window(tsdat,start=c(dat.tsp[1],i), deltat=1), wts)
  colnames(off.mat) = colnames(tsdat)
  rownames(off.mat) = month.abb
  matoff = matrix(NA,nr,nc)
  for (i in 1:nc) matoff[,i] = rep(off.mat[,i],length=nr)
  temp = rowMeans(tsdat-matoff,na.rm=T)
  pred=NULL
  residual=NULL
  if (all==T)
  {
    pred =c(temp) + matoff
    residual = tsdat-pred
  }
  list(temps = ts(temp,start=c(dat.tsp[1],1),freq=12),pred =pred, residual = residual, offsets=off.mat)
}

#pick out those series with have at least nn + 1 observations in every month
#Outputs a logical vector with TRUE indicating that that sereis is OK
dat.check = function(tsdat, nn=0)
{
  good = rep(NA,ncol(tsdat))
  for (i in 1:ncol(tsdat)) good[i]= (min(rowSums(!is.na(matrix(tsdat[,i],nrow=12))))>nn)
  good
}

######################################################################
##### slope plotting function with modifications for 12 month anomaly
######################################################################
get.slope=function(dat=dat)
{
  len=length(dat)
  month = factor(rep(1:12,as.integer(len/12)+1))
  month=month[1:len]
  #first month by this method is not necessarily jan
  #the important bit is that there are 12 steps
  time2=time(dat)
  coes = coef(lm(dat~0+month+time2))
  residual=ts(rep(NA,len),start=time2[1],deltat=1/12)
  fitline=array(NA,dim=c(len,12))
  for(i in 1:12)
  {
    fitline[,i]=coes[13]*time2+coes[i]
  }
  anomaly=rep(NA,len)
  for(i in 1:len)
  {
    residual[i]=dat[i]-fitline[i,month[i]]
  }
  anomaly = residual+time2*coes[13]-mean(time2*coes[13],na.rm=TRUE)
  list(coes,residual,anomaly)
}

### plot average slope using roman's methods
plt.avg2=function(dat, st=1850, en=2011, y.pos=NA, x.pos=NA, main.t="Untitled")
{
  ### Get trend
  fm=get.slope(window(dat, st, en))
  ### Initialize variables
  N=length(window(dat, st, en))
  I=seq(1:N)/frequency(dat)
  rmI=ssq(I-mean(I))
  ### Calculate sum of squared errors
  SSE=ssq(fm[[2]][!is.na(fm[[2]])])/(N-2)
  ### Calculate OLS standard errors
  seOLS=sqrt(SSE/rmI)*10
  ### Calculate two-tailed 95% CI
  ci95=seOLS*1.964
  ### Get lag-1 ACF
  r1=acf(fm[[2]], lag.max=1, plot=FALSE,na.action=na.pass)$acf[[2]]
  ### Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
  Q=(1-r1)/(1+r1)
  ciQ=ci95/sqrt(Q)
  maxx=max(time(fm[[3]]))
  minx=min(time(fm[[3]]))
  maxy=max(fm[[3]],na.rm=TRUE)
  miny=min(fm[[3]],na.rm=TRUE)
  ### Plot data
  plot(window(fm[[3]], st, en), main=main.t, ylab="Anomaly (Deg C)")
  lfit=lm(window(fm[[3]], st, en)~I(time(window(fm[[3]], st, en))))
  ### Add trendline and x-axis
  abline(a=lfit[[1]][1],b=lfit[[1]][2], col=2); abline(fm, col=4, lwd=2)
  grid()
  ### Annotate text
  tx=paste("Slope: ", round(fm[[1]][13]*10, 4) ,"Deg C/Decade", "+/-", round(ciQ, 4) , "Deg C/Decade\n(corrected for AR(1) serial correlation)\nLag-1 value:" , round(r1, 3))
  text(x=ifelse(is.na(x.pos), (maxx-minx)*.2+minx,x.pos), y=ifelse(is.na(y.pos), (maxy-miny)*.8+miny, y.pos), tx, cex=.8, col=4, pos=4)
}

######## function plots monthly data with seasonal signal intact 12 slope curves
plt.seasonal.slopes=function(dat, st=1850, en=2011, y.pos=NA, x.pos=NA, main.t="Untitled")
{
  ### Get trend
  fm=get.slope(window(dat, st, en))
  ### Initialize variables
  N=length(window(dat, st, en))
  I=seq(1:N)/frequency(dat)
  rmI=ssq(I-mean(I))
  ### Calculate sum of squared errors
  SSE=ssq(fm[[2]][!is.na(fm[[2]])])/(N-2)
  ### Calculate OLS standard errors
  seOLS=sqrt(SSE/rmI)*10
  ### Calculate two-tailed 95% CI
  ci95=seOLS*1.964
  ### Get lag-1 ACF
  r1=acf(fm[[2]], lag.max=1, plot=FALSE,na.action=na.pass)$acf[[2]]
  ### Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
  Q=(1-r1)/(1+r1)
  ciQ=ci95/sqrt(Q)
  maxx=max(time(dat))
  minx=min(time(dat))
  maxy=max(dat,na.rm=TRUE)
  miny=min(dat,na.rm=TRUE)
  ### Plot data
  plot(window(dat, st, en), main=main.t, ylab="Temperature (Deg C)")
  ### Add 12 trendlines and x-axis
  for(i in 1:12)
  {
    abline(a=fm[[1]][i],b=fm[[1]][13], col=2)
  }
  grid()
  ### Annotate text
  tx=paste("Slope: ", round(fm[[1]][13]*10, 4),"C/Dec" ,"+/-", round(ciQ, 4) ,"C/Dec(AR(1)Lag-1 val:" , round(r1, 3))
  text(x=ifelse(is.na(x.pos), (maxx-minx)*.2+minx,x.pos), y=ifelse(is.na(y.pos), (maxy-miny)*.05+miny, y.pos), tx, cex=.8, col=4, pos=4)
}

dat=nh
plt.seasonal.slopes(dat,main.t="Typical Temperature Series")
#savePlot("c:/agw/plots/12 line seasonal sloope.jpg",type="jpg")

###########################################################
### combine staton data by insuring whether each series is in fact the same
###########################################################
getsinglestation=function(staid=60360, version=0)
{
  staraw=NA #raw data
  smask= (ghmean[,2]==staid)
  mask= ghmean[smask,3]==version
  data=ghmean[smask,][mask,]
  noser= levels(factor(data[,4]))
  for(j in noser)
  {
    mask2=data[,4]==j
    startyear=min(data[mask2,5])
    endyear=max(data[mask2,5])
    subd=data[mask2,6:17]
    index=(data[mask2,5]-startyear)+1
    dat=array(NA,dim=c(12,(endyear-startyear)+1))
    for(k in 1:length(index))
    {
      dat[,index[k]]=as.numeric(subd[k,])
    }
    dim(dat)=c(length(dat),1)
    dat[dat==-9999]=NA
    dat=dat/10
    rawd=ts(dat,start=startyear,deltat=1/12)
    if(max(time(rawd))>=2011)
    {
      print("error series")
      rawd=NA
    }
    if(!is.ts(staraw))
    {
      staraw=rawd
    }else{
      staraw=ts.union(staraw,rawd)
    }
  }
  if(!is.null(ncol(staraw)))
  {
    allraw=ts(rowMeans(staraw,na.rm=TRUE),start=time(staraw)[1],freq=12)
  }else{
    allraw=staraw
  }
  allraw
}

##############################################################
############ Assemble stations into grid Jeff Id #############
##############################################################
##assign stations to gridcells
gridval=array(NA,dim=c(stationnum,2))
gridval[,1]=as.integer((tinv[,5]+90)/5)+1 #lat
gridval[,2]=as.integer((tinv[,6]+180)/5)+1 #lon

#calculate gridcell trends
gridtem=array(NA,dim=c(36,72,211*12))

for(ii in 1:72)
{
  for(jj in 1:36)
  {
    maska=gridval[,2]==ii
    maskb=gridval[,1]==jj
    maskc= maska & maskb
    sta=NA ##initialize to non TS
    for(i in (1:stationnum)[maskc])
    {
      #get all station trends for station ID
      rawsta = getsinglestation(tinv[i,2],tinv[i,3])
      #place all curves side by side in a ts structure
      if(!is.ts(sta))
      {
        sta=rawsta
      }else{
        ts.union(sta,rawsta)
      }
    }
    if(is.ts(sta)) #if at least one time series exists
    {
      sta=window(sta,start=1800) #ignore pre-1800
      index=as.integer(((time(sta)-1800)*12+.002)+1) #calculate time index for insertion into array
      if(!is.null(ncol(sta))) #is station more than one column
      {
        gridtem[jj,ii,index]=temp.combine(sta)$temps
      }else{
        gridtem[jj,ii,index]=sta
      }
      print(jj) #progress debug
      tit=paste("ii=",ii,"jj=",jj)
      plot(ts(gridtem[jj,ii,],start=1800,freq=12),main=tit) #debug
    }
  }
  print("COLUMN") #progress debug
  print(ii) #progress debug
}

###########################################
## calculate global trends using roman's offset combine on each grid
###########################################
rgl=gridtem
dim(rgl)=c(36*72,2532)
rgl=t(rgl)
rgl=ts(rgl,start=1800,deltat=1/12)
mask=!is.na(colMeans(rgl,na.rm=TRUE))
wt=rep(weight,72)
maskb=array(FALSE,dim=c(36,72))
maskb[1:18,]=TRUE
dim(maskb)=(36*72)
maskc=mask & maskb
sh=temp.combine(rgl[,maskc],wt[maskc])$temps
maskc=mask & !maskb
nh=temp.combine(rgl[,maskc],wt[maskc])$temps

plot(nh,main="Northern Hemisphere Temperature",xlab="Year", ylab="Deg J")
#savePlot("c:/agw/plots/NH temp - roman h.jpg",type="jpg")
plot(sh,main="Southern Hemisphere Temperature",xlab="Year", ylab="Deg J")
#savePlot("c:/agw/plots/SH temp - roman h.jpg",type="jpg")

un=ts.union(nh,sh)
glt=temp.combine(un)$temps
plot(glt,main="Global Temperature",xlab="Year", ylab="Deg J")
#savePlot("c:/agw/plots/global temp - roman h2.jpg",type="jpg")

plt.seasonal.slopes(window(glt,start=1900),main.t="Global Temperature Trend")
#savePlot("c:/agw/plots/global temp trend- roman h2.jpg",type="jpg")

plt.avg2(window(glt,start=1900),main.t="Global Temperature Anomaly")
#savePlot("c:/agw/plots/global temp anomaly- roman h2.jpg",type="jpg")

plt.avg2(window(glt,start=1900),main.t="Global Temperature Anomaly")
lines(hadcrut,col="red")
legend("bottomright",c("Ids","HadCru"),col=c(1,2),pch=10)
#savePlot("c:/agw/plots/id plus hadcrut.jpg",type="jpg")

nha=get.slope(nh)
plot(window(nha[[3]],1900),main="Northern Hemisphere Anomaly",ylab="Anomaly Deg C",xlab="Year")
#savePlot("c:/agw/plots/northern hemisphere anom.jpg",type="jpg")

sha=get.slope(sh)
plot(window(sha[[3]],1900),main="Southern Hemisphere Anomaly",ylab="Anomaly Deg C",xlab="Year")
#savePlot("c:/agw/plots/Southern hemisphere anom.jpg",type="jpg")

gla= get.slope(glt)

plot(ff(window(nha[[3]],1978)),main="Hemispheric Temperature Anomaly",ylab="Anomaly Deg C",xlab="Year")
lines(ff(window(sha[[3]],1978)),col=2)
legend("bottomright",c("Northern","Southern"),col=c(1,2),pch=10)
#savePlot("c:/agw/plots/hemisphere anom1978 .jpg",type="jpg")

plot(ff(window(nha[[3]],1978)),main="Hemispheric Temperature Anomaly",ylab="Anomaly Deg C",xlab="Year")
lines(ff(window(glava,1978)),col=2)
legend("bottomright",c("Northern","Southern"),col=c(1,2),pch=10)
#savePlot("c:/agw/plots/hemisphere anom1978 .jpg",type="jpg")

plt.avg2(ff(window(glt,1978)),main="Global Temperature Anomaly")
#savePlot("c:/agw/plots/Global anom1978 .jpg",type="jpg")

####################################
## global average without offset grid combinations
####################################
gta=array(NA,dim=c(36,72,211*12))
for(ii in 1:72)
{
  for(jj in 1:36)
  {
    gta[jj,ii,]=calc.anom(gridtem[jj,ii,])
  }
}

glav=rep(NA,211*12) #global ave

weight=sin(seq(2.5,180, 5)*(pi)/180) #cell weights
meanweight=mean(weight)
weight=weight/meanweight #normalize cell weights
#weight=rep(1,36) # calculate unweighted temp #uncomment for unweighted version

mask=rep(FALSE,36) #lon mask
mask[1:18]=TRUE #lon mask to allow southern hemisphere

for(kk in 1:(211*12)) #hemisphere 1
{
  mm=rep(NA,72)
  for(ii in 1:72)
  {
    mm[ii]=mean(gta[mask,ii,kk]*weight[1:18],na.rm=TRUE)
  }
  glav[kk]=mean(mm,na.rm=TRUE)
}
glava=ts(glav,start=1800,deltat=1/12) #southern hemisphere average

mask=!mask #select other hemisphere
for(kk in 1:(211*12))
{
  mm=rep(NA,72)
  for(ii in 1:72)
  {
    mm[ii]=mean(gta[mask,ii,kk]*weight[19:36],na.rm=TRUE)
  }
  glav[kk]=mean(mm,na.rm=TRUE)
}
glavb=ts(glav,start=1800,deltat=1/12) #north hemisphere average

glav=(glava+glavb)/2 #global average
I have a question, does it really matter if the temperature is rising? Because even if it is, it proves exactly nothing, except the temperature is rising.
#1, I tend to let these comments go to others with more opinion. Life is incredibly busy (far too busy), but my hope is that people realize that I always read.
There are a ton of things I could write on the meaning of temp rise. It would be cool to get someone like the Lurker to let it rip. IMO, it’s good to combine data in the best possible way, no matter the result or meaning. I mean it’s a pure goal. We don’t get a choice, nobody is forcing anything on anyone else. It’s just data. Cold as space and as predictable as addition. We can actually calculate a proper temperature anomaly from what appears to be weak data in need of serious QC.
When you ask, does it matter?
The answer is yes, to modelers who believe they can predict the anomaly of the atmosphere. Consider that we have hundred million dollar computers calculating temperature models, and guys like Ben Santer comparing their output to temperature trend for accuracy. YET NOBODY BOTHERED TO OFFSET [snip] ANOMALIES!!!
[snip] — I really am the most snipped person here. And it’s real snips, not fabricated ones. haha.
Look at Willis’ post today on WUWT.
I am not at all comfortable with manufacturing temperatures.
#3, This is in no way ‘manufactured’ temperature.
Willis’s post has to do with the comparison between correlation (which, as in the Antarctic posts here, is highly dependent on short-term variance) and trend, which has nearly zero effect on correlation. It’s actually an odd post for him because he’s very good with numbers.
What this post, like the previous post, shows is that there is no ‘hide the decline’ in the algorithms that create global temperature series. They are basically correct for the data being shoved into them. That means that those who would argue with global warming have to focus on the data, because the trends presented are reasonable in a mathematical sense.
What this post and the previous one also show is that the boys who like warming don’t have a clue what they are doing. It’s obvious and correct to offset anomalies when calculating trend. If you need a supercomputer to calculate your model, why the hell don’t you calculate anomaly offsets for trend on your laptop?
Not wanting to sound downbeat, but to me the next logical step would be to look at the ways that various experts would handle gridding/interpolation etc. You sound like you could do with a holiday first, though. Even a Roman Holiday.
Beware the empty data cell.
Now there is just one final nut to crack: how to account for various sources of bias that might be present in the GHCN data. By which I mean anything that might induce spurious or unrepresentative trends that hasn’t been accounted for.
That’s a tricky one.
Oh, quick question: have you thought about looking at max/min temps? I believe, correct me if I’m wrong, that GHCN lists those separately, in addition to the mean they get from them. A lot of us surface data skeptics think that max temps would be more representative of the thermal state of the atmosphere and less influenced by local biases. The diurnal range is also supposed to be a big signal in the data (of something, but people disagree on what).
It looks to me like there is a problem with the script as posted.
Line 232: dat = nh
It doesn’t appear that nh was defined at this point.
#7, yup, that was a test added in after the fact.
Gotcha. So just comment out lines 232 and 233 to get this to run.
In case it’s not clear, the URLs for the data files are at
v2.mean will need to be renamed to v2.mean. (with the trailing period), and the path names probably appropriately edited.
Nicely done, Jeff.
Thanks, Jeff and Carrick,
I revived the savePlot commands and found my version of R insisted on type=”jpeg”, not jpg. That fixed, it ran for a long time, through all the gridcell trending (showing reasonable plots), but stopped at line 350 where “weight” had not been created.
Re: Nick Stokes (Mar 26 07:30),
I went to this post, and found this code for weights:
weight=sin(seq(2.5,180, 5)*(pi)/180) #cell weights
meanweight=mean(weight)
weight=weight/meanweight #normalize cell weights
That let me proceed. Objects hadcrut and glava were missing, but I could just skip those plots. Then the function calc.anom() was missing. I guess I could find that without too much trouble, but by now a whole lot of plots have appeared, and they look pretty reasonable. “Global Temperature Anomaly” has a super HS at the end – an artefact, but matches well otherwise. Fig 4 of the previous post seems identical, as do the hemisphere anomaly plots. The temp plots look much the same.
It’s getting late here – I’ll track down calc.anom() tomorrow.
It sounds like I’ve got some clean up work to do. Too many posts were generated from different pieces of the code.
It seems the processing of the data has been refined to the point where the results are reliable based on the input data. The problem, as I see it, is that the input data has been badly corrupted through the UHI effect, other seemingly random adjustments (New Zealand, Australia, etc.), poor station siting (Watts’ station siting project), plus the station dropout since 1990.
Once these problems have been corrected in the raw data, perhaps we can then determine what the real numbers are. Until then, it’s GIGO.
bang bang roman’s silver hammer came down, upon their heads…
It would be cool to replace the actual temps in the GHCN files with some test data.
Other guys (like Zeke and CCC and CRU) could then see how they handle that.
Jeff, I really think you need a fixed anomaly base period here (eg 1961-1990). I described here the problem caused if you let it vary. You are subtracting an anomaly adjustment made up of fragments from different time periods, and trend in the data will appear as trend in the anomaly subtraction, unless you use a common period.
#17, That’s a good point for the standard anomaly method; Roman’s doesn’t need it. What should I do with series that don’t have data for that time period?
Re: Nick Stokes (Mar 26 15:33),
Nick, I don’t think that you understand how the method works. The anomaly is not calculated individually for the “fragments”, but is calculated for the finished series. The entire calculation prior to that point is based on an “anomaly basis” common to all of the series – it’s called “0 degrees C”.
One of the things that clued me into the fact that this may be a good approach was the property in the original equivalent SS that Tamino proposed to minimize. Although it may not be immediately apparent from my formulation (but can be proved reasonably easily), the offsets are chosen in a way which also minimizes paired differences between offsetted values at the same time of measurement. If there is a trend common to all the series present in the temperatures, it is removed by the differencing and should not mathematically affect the calculation of the offsets. The trend is the mean of the offsetted values. All of this, however, is done simultaneously.
Show us that this is happening. If you have math to back up what you say (or at least some sort of example illustrating the effect of what you refer to), I will be happy to consider what you say. Otherwise, this is arm waving not mathematics.
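For what it’s worth, here is a minimal numerical check of the invariance claim above (a sketch of my own, not Roman’s code, assuming calcx.offset() from the script is already loaded): adding a common trend to every series leaves the computed offsets unchanged, even with incomplete coverage.

set.seed(2)
n = 120                                                  # ten years of monthly data
tdat = matrix(rnorm(n*3), n, 3) + rep(c(0,2,5), each=n)  # three series at different levels
tdat[1:48, 2] = NA                                       # incomplete coverage
tdat[80:n, 3] = NA
common = 0.01*(1:n)                                      # a trend shared by all series

round(calcx.offset(tdat, wts=rep(1,3)), 6)
round(calcx.offset(tdat + common, wts=rep(1,3)), 6)      # same offsets to rounding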
#19, I may have misunderstood, but I thought Nick was referring to the standard no-offset method compared to yours in the post above. It could be the reason that I’m about a tenth of a degree lower in trend than the published version.
Interesting work Jeff. Looks like you are ready to dive into segmenting and slicing up the stations to see what results you get from various subsets. I’ve been working on something along those lines over at Lucia’s place using satellite nightlights, population density, and urban boundary data as proxies for urbanity, with the assumption that rural places were most likely rural in the past so UHI effects will be observable in the difference between trends in adjacent urban and rural stations. Ron Broberg over at the Whiteboard has done some good work improving the GHCN station metadata by referencing it to various spatial datasets.
Checking my model quickly, it looks like I get a 0.258 C per decade trend, pretty close to what you get. For reference, GISS land temps show 0.190 C for that period and NCDC land temps show 0.294. I’m somewhat impressed how far apart they are.
Also: be careful in comparing your results to HadCRUT, since they don’t just use GHCN v2.mean_adj, but rather do a lot of their own adjustments to the data.
The product that should be easiest to replicate is NCDC land temps available at ftp://ftp.ncdc.noaa.gov/pub/data/anomalies/annual.land.90S.90N.df_1901-2000mean.dat since they are simply using the GHCN v2.mean_adj data.
Re: RomanM (Mar 26 16:00),
Roman, it’s not an issue just for fragments. It applies when combining any groupings – stations, gridcells etc. If the data covers different periods, and the anomalies are calculated with respect to those periods, then the means that you subtract have a trend. This is a far bigger effect than the artificial seasonal effect that you identified.
#22 thanks Zeke.
Nick,
I don’t know if you are reading my comments to you at Lucia’s or above but there are two issues Roman solved. One is the seasonal ripple, the second is to realign anomalies by offsetting them to remove steps.
Re: Jeff Id (Mar 26 15:35),
What should I do with series that don’t have data for that timeperiod?
That’s the big issue with all anomaly gridding methods. It’s why GISS computes anomalies for grid cells rather than stations: the grid cell is more likely to have some data for the period.
My own non-standard suggestion is to do a regression over some period which is as close as possible to 1961-90 but has enough data, and then use the fitted value at a fixed year (say 1975) as the mean to subtract.
I managed to get your program to run to completion using the following calc.anom function, which I think implements your intent:
calc.anom = function(tsdat) {
  anom = tsdat
  for (i in 1:12) {
    sequ = seq(i, length(tsdat), by = 12)
    anom[sequ] = tsdat[sequ] - mean(tsdat[sequ])
  }
  anom
}
I adapted it from here.
Re: Nick Stokes (Mar 26 16:55),
Jeff’s calculation of the trends simultaneously anomalizes the data. It is not done sequentially as you suggest. No “means with a trend” are subtracted.
ha,
I managed to get R to run on my mac.
Don’t expect anything of value from me for a while.
Re: RomanM (Mar 26 17:45),
Yes, OK, I’ve gone through the math and I now agree that with the simultaneous fitting of offsets and combined function (as with your method and Tamino’s), you don’t need a fixed base period to avoid spurious trend effects. However, the cost is that the offsets will vary as the data period varies. If new data comes in, all the past offsets change. GISS publishes gridded anomalies for 1978, say, and they would change with 2010 data. You need a fixed period to avoid that.
The objection to the varying base period still applies to the implementation of the “standard non-offset anomaly” which is presented here for comparison. There the fixed common period is needed, and is part of the “standard”. So a comparison without it is meaningless.
What if you plotted two charts – one for the grids with data in the same latitude around the globe and one for grids in the same longitude line around the globe. Smooth both graphs. For a grid with no data, use the average of the two (monthly) smoothed graphs at that grid. This would give an estimate based on the shape of the two temperature curves for the entire circumference of the Earth.
28.
Nick, does it matter to the comparison of trend estimations?
Re: steven mosher (Mar 27 14:12),
Yes. I think Roman’s method is fine, but the “standard” method without a fixed base period will tend to underestimate trend, which spoils the comparison.
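For reference, here is one minimal way to pin the “standard” no-offset comparison to a fixed base period (a sketch only; the function name is mine, and grid cells with no 1961-1990 data come out as NaN, which is exactly the missing-base-period question Jeff raised above):

calc.anom.fixed = function(tsdat, start.yr=1800, base=c(1961,1990)) {
  anom = tsdat
  yr = start.yr + (seq_along(tsdat)-1) %/% 12              # year of each monthly value
  for (i in 1:12) {
    sequ = seq(i, length(tsdat), by=12)
    basevals = tsdat[sequ][yr[sequ] >= base[1] & yr[sequ] <= base[2]]
    anom[sequ] = tsdat[sequ] - mean(basevals, na.rm=TRUE)  # NaN if no base-period data
  }
  anom
}

Dropping this in for calc.anom() in the no-offset section of the script would put every grid cell on the common 1961-1990 base being argued for here.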
I agree with Nick here that the “standard method” needs to be cleaned up so that a common base period gets used. Same for Roman’s just so prior years are stabilized against future data additions.
We need to get Jeff/Roman’s code robust enough and have it automatically produce “normal products”.
I would suggest these outputs:
global temperature in “standard format”:
each line would be of the form:
year jan feb mar .. dec
where jan…dec are the temperatures. E.g., line #1 of crutempgl3v:
1850 -1.650 -0.270 -0.481 -0.725 -0.572 0.059 -0.581 -0.300 -0.245 -0.303 -0.415 0.170 -0.443
same for northern hemisphere and southern hemisphere.
Then something like the hadcrut 5°x5° gridded format:
Year month 36 rows 72 columns NA
followed by 36 lines with 72 values each, starting from 90°S and going to 90°N, such as:
This needs to get to that level so that misuse of your results by neophyte R users like me is less likely to occur.
If you need any help with writing the output functions (which is really all that is needed right now), I’m sure Nick or I could help.
Not trying to be tough on you Jeff, it’s great work. It’s just getting close enough to a complete project to need further code cleanup. You knew that. 😉
I mean to have the first line of my gridded file to look like:
“NA” is the missing data token.
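As a starting point, something like this would write the first of those products (a sketch only, the function name is mine, assuming glt is the monthly ts produced by the script):

write.standard = function(x, file) {
  yr = floor(time(x) + 1e-4)                   # guard against floating point in time()
  m = t(sapply(unique(yr), function(y) {
    v = rep(NA, 12)
    v[cycle(x)[yr == y]] = x[yr == y]          # slot each month into its column
    v
  }))
  out = data.frame(year=unique(yr), round(m, 3))
  names(out) = c("year", tolower(month.abb))
  write.table(out, file, row.names=FALSE, quote=FALSE)   # NA is written for missing months
}
# write.standard(glt, "global_standard.txt")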
Carrick,
I also agree that Nick’s right and the normal method needs to be done over a base period. Roman’s method doesn’t use an anomaly; the whole thing is compiled as seasonal data before the final result is anomalized using a method employing Roman’s improved trend calc.
Jeff,
Here is a paper from the Australian BoM, year 2003.
I had to OCR from a poor photocopy given to me by the BoM. Sorry about the quality. If you need equations repeated I’ll JPEG them.
The point is that the 1200 km radius correction is invoked. The problem is that I do not know if it is invoked in the global data you have been using above. I suspect it is, so there is a double correction.
I seem to be due to pay some indulgences before the BoM will talk to me again because a few years ago I mentioned that global temperatures might not increase. If one of you has a better connection, you might be able to find precisely what adjustments the BOM has made to the data (and over what periods) that you are using in the above valuable global example.
Nick Stokes “It’s why GISS compute anomalies for grid cells rather than stations, because the grid cell is more likely to have some data for the period”
This is a judgement call, because there is no guarantee that an estimate made from other stations in a grid cell is any better or worse than a straight guess of a missing value for a station, or an average of data on either side.
One day we might get to a stage where we can simply leave all missing data out of global calculations, do no guesses or interpolations, then calculate a difference to see if it is significant. First we have to find a reliable series to compare the difference against. It might be that the effects of all missing values sum to nothingness – they might be equally distributed over seasons, latitudes, altitudes, have cancelling TOBS adjustments and so on. Maybe this present work is a large investment in finding ways to overcome missing data, and I hope that it allows an examination of whether missing data matter or not. I am forever uncomfortable about assigning values to missing data because all methods are guesses, by definition.
I tried my short cut method of finding an anomaly by fitting a regression and using a fixed-year value in place of the mean. I tested the 1978-2009 period, and used 1994 as the reference year, being a sort of mid-point. This required just a change to calc.anom(). Using the original calc.anom() I got a global trend of 0.211 C/Dec (cf Jeff’s 0.2107). Using the regression I got 0.2217 C/Dec, which is moving towards agreement with Roman’s method (0.2476), but not very far. However, I found that there was more dependence on the choice of year than I liked. Using 1975 brought the trend back to 0.2156 C/Dec.
Here’s the revised calc.anom():
calc.anom = function(tsdat) {
  anom = tsdat
  x = seq(1, length(tsdat), by = 12)
  n = 1:length(x) - 195            # centred at 1994
  for (i in 1:12) {
    a = tsdat[x]
    if (sum(!is.na(a)) > 8) {      # eliminate very short sequences
      h = lm(a ~ n)$coef[1]
      anom[x] = a - h
    } else {
      anom[x] = NA
    }
    x = x + 1
  }
  anom
}
Nick:
There appear to be some R technical problems with your script. My computer hiccupped on several of the lines.
It appears that you are regressing the raw data on a monthly basis over the entire range and zeroing the intercept at the given year. I am not sure that the programme actually does that correctly, since the results I was getting on a sequence of 240 normal variates looked a little weird.
You are using different slopes for each month and I don’t know that that is desirable. The other possible difficulty I perceive with this is the assumption of linearity inherent in the process. I am having some trouble visualizing exactly what anom actually represents at the end of this entire calculation.
Carrick:
Yes, you have a point here.
I ran an example with four series with reasonable amounts of data. I did four runs cutting the data at 1979, 1989, 1999, and 2009, respectively. Differences among the four resulting combined series were less than a maximum of .05 in more recent years (with most much closer to zero); however, some reached .15 a century back when only one station was contributing.
The reason is that the offsets may change slightly as more information becomes available for comparing the input series. Anomalizing the combined series does not change the situation substantially. However, if the entire period is not used, the problem of some series not being usable any more becomes an issue. As well, the uncertainty of the combined series also is increased. Some thought needs to be put into finding procedures regarding these stabilization issues.
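That check can be scripted in a few lines (a sketch, assuming temp.combine() is loaded and sta is a multi-station monthly ts matrix like the ones built in the gridding loop):

full = temp.combine(sta)$temps
cut9 = temp.combine(window(sta, end=c(1999,12)))$temps
range(window(full, end=c(1999,12)) - cut9, na.rm=TRUE)   # how much the pre-2000 history moves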
RomanM, when you add more data there are two effects. One is geometric, namely the centroid of the data shifts forward in time. Subtracting the value for a constant baseline (e.g., 1960-1980) just “freezes” the baseline so it doesn’t drift as you add more data.
The second effect is a statistical one, and is a “real” effect: As you add more data, you are using more information, and that obviously influences your estimates of the offsets.
Re: RomanM (Mar 28 12:40),
You are using different slopes for each month and I don’t know that that is desirable.
Yes, it’s a short cut to emulate the effect of a fixed base period without the difficulties that ensue when there are missing values there. It’s for comparison, not claimed to be better than your use of a common trend. The idea is that instead of subtracting the mean of each grid set, you fit a line and subtract its value in a fixed year.
I’ve pasted the code as it ran here. I have been trying (unsuccessfully) to locate a technical problem because, as I mentioned, it is more sensitive to the choice of base year than I think it should be. Could you be using a different na.action.default for lm()? I’m using the default na.omit.
Sorry for the junk transmission in my 35. I’ll tidy it up. Different browsers/hosters can have subtle (to me) differences and sometime I get it wrong the first time.
The essence is whether countries like Australia have already made adjustments of the type you incorporate, before you get the data. I simply do not know the answer. The important graphs from above for visual purposes are
http://i260.photobucket.com/albums/ii14/sherro_2008/d-m05.jpg?t=1269817480
Re use of correlation coefficients: one gets a different answer using daily Tmax over a short period than annual Tmax over a long period, for parts of the same pairs in a group, and a different answer again after smoothing. How do you decide the fundamentally correct calculation? It points to correlation coefficients changing as instruments change, since some of them can do smoothing like daily spike rejection.
Geoff what’s the reference for that figure?
Thanks.
geoff
the missing data and the adjustments are the bits of fudge where more certainty gets assigned than is warranted.
I’ll take TOBS for example. If you have 60 measurements per month and your instrument error is like that assumed by Jones,
then your monthly estimate is good to .03C (1 sd).
So a monthly temp (say March) is, say, 10C +/- .1C at a 95% CI.
Now you find out that this 10C was taken at 6AM as opposed to midnight, the standard time.
What’s a poor climate scientist to do?
Well, build a TOBS model. This model is empirical. They look at years of hourly data from stations. They note the
month, the lat, the lon, the position of the sun. Then they do a regression. This regression is a prediction.
The TOBS regression says this:
F(lat, lon, month, 6AM) = -.5C. In most cases the prediction has an SE of between .1 and .2C. It all depends.
So now, how well do you know that March temperature? The estimate comes out to 9.5C. What about the error?
I imagine that it’s vanishingly small. Still, I’d like to see how it’s treated. Same for the MMTS adjustments.
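If the two uncertainties were simply independent and combined in quadrature (a back-of-envelope sketch, not a claim about how GHCN actually treats it), the adjustment’s standard error would dominate the monthly measurement error:

obs.se  = 0.03                         # monthly mean, 1 sd, as above
tobs.se = c(0.1, 0.2)                  # range of TOBS-model prediction SE, as above
round(sqrt(obs.se^2 + tobs.se^2), 3)   # combined 1-sd error: roughly 0.10 to 0.20 C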
Jeff,
Cross-posting this from Lucia’s comment thread, since I’m not sure you are actively reading it. Would it be possible to obtain annual anomalies 1900-present for this method? I’m planning on learning R myself in the near future, but at the moment I’m looking to put together all the various surface temp reconstructions floating around (as well as GISS/NCDC/CRUT land temps) on a chart.
43 Carrick
I could not find an electronic version and goofed in creating one. All I have is
Aust. Met. Mag. 53 (2004) 75-93
Updating Australia’s high-quality annual temperature dataset
Paul Della-Marta and Dean Collins
National Climate Centre, Bureau of Meteorology, Australia
and
Karl Braganza
CSIRO Atmospheric Research, Aspendale, Australia
(Manuscript received June 2003; revised September 2003)
If you go to my post at 35 above, the whole lot is hyperlinked by accident so you can read most of the text. The figures were dropped out in the OCR/rename/post processes. Let me know if you need figs or equations. Geoff.
44 Steven Mosher, I’m surprised how hard it is to discover whether a country submitted data to the CRUs of the world in raw, averaged, adjusted or whatever form, over what periods, and also whether this has been upgraded as the info is constantly being improved by people digging back through the original metadata sheets and making revisions to the home-country data. My worry is that better statistical methods for assembling global data are doubling up on some adjustments without the authors knowing that they do not have truly “raw” data. I have an email from one country authority asserting that their data are “raw” when in fact they are rather modified.
Thanks, Geoff. My google kung fu has grown powerful indeed.
Here’s the paper. Mercifully no pay wall.
Actually you want this one (hires figures)
48,49 Carrick,
Thank you for the electronic version. I feel a bit better knowing that it’s not so easy an exercise with a PC. (Enter stage left, Mac owners with PDF claims).
Now can you find out the periods when the BOM applied these adjustments and which coefficients they used?
Without making a long, involved comment on the INSANITY of “average temperature” (sorry, it’s patent nonsense): if one wishes to think there is some significance in it, let’s note this. The rise (if real; I have HUGE trouble with the QA of the data, which I’ll get to in a moment) is “linear” and “consistent” from 1900 on. When one uses the Mauna Loa CO2 data as a baseline, and the allegations with regard to CO2’s “heat trapping” (which I also debate..yes, I’m an Elasser/Milkosky radical…CO2 is a NET EXCHANGE AGENT in the troposphere, and makes no difference until above 2500 PPM!), there is obviously a “cause/effect” problem.
This heads us towards asking what causes overall cycles, taking “man” out of the question.
With regard to DATA QUALITY: WHEN, just WHEN does anyone think we transitioned from “hand recorded”, “guess at the high or low” data to “rotating drum” recorders?????
Does ANYONE REMEMBER that there were NO satellites in 1900? 1920, 1950, even up to 1980??? Does anyone have a CONCEPT that the “hand recorded data” has virtually NO VALUE for 0.1 C values (.2 F)? Does anyone understand GARBAGE DATA IN, GARBAGE ANALYSIS OUT?
The reason for the gradual rise may very much be due to a systematic data bias error due to changing technology.
I see NO error bars. I see NO attempt to quantify error. (Although BEST does have that. I will give them credit for that, but ZERO credit for the great “average temperature” flaw!!!!)
I hope and pray that some day, some BRIGHT INTELLIGENT people will Wake UP! And realize the futility of all this “average temperature” nonsense.
The ONLY thing that you can do with temperature averages is look at a LOCAL REGION, and see if there are “seasonal shifts” going on.
I’ve done that for MN. AND AFTER REMOVING THE UHI from the Minneapolis/St. Paul area, and “guessing” at the “error bars” for the pre-1920 data, I cannot (beyond statistical doubt, STANDARD DEVIATION ANYONE?) say that since the first temperature record (Fort Snelling, Minneapolis MN) there has been any SIGNIFICANT change in the overall weather/temperature profiles in MN. (Barring the fact that 1820-1840 was the end of the “mini ice age” and does show early winters and late springs, and averages somewhat lower than the following 40 years; but then 1880 to 1920 pans out pretty close in histograms and trends to the 1820 to 1840 realm.) Again, comparisons were made by HISTOGRAMS in terms of “degree days”, and NOT by strictly averaging temperatures…insanity.
The way I read Roman’s offset function, it combines grid cells using ordinary least squares with the area weighting imposed as a linear equality constraint on the offsets. This weighting only shifts the resulting global temperature series up or down, it doesn’t affect its shape. All grid cells have equal weight in the least squares calculation, regardless of the equality constraint, which means that high-latitude regions have a disproportionately big influence on the result. So this method is actually biased towards warming due to faster warming in high latitudes. I think proper weighted least squares would solve the problem and probably result in better agreement with CRUTEM.
That’s an old post. This section attempts to correct for the area weighting: the weight is a function of latitude, but it’s the longitude spacing of the cells that gets compressed toward the poles in lat/lon coordinates.
weight=sin(seq(2.5,180, 5)*(pi)/180) #cell weights
meanweight=mean(weight)
weight=weight/meanweight #normalize cell weights
Roman did the cool anomaly join process. The parts with mistakes are mine.
Thanks, Jeff. Sorry for bumping an old thread, I’m implementing this method and your posts have helped a lot.
I think what goes into the sine function is the polar angle in spherical coordinates. This is the same as taking the cosine of the latitude, which makes sense.
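A one-line check of that (the 2.5, 7.5, …, 177.5 values in the weight line above are the colatitudes of the 5-degree cell centres):

colat = seq(2.5, 180, 5)
all.equal(sin(colat*pi/180), cos((90 - colat)*pi/180))   # TRUE: sin(colatitude) == cos(latitude)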
The problem is mainly in line 92,
temp = rowMeans(tsdat-matoff,na.rm=T)
This is where the combined temperature series is calculated as the mean of the gridded data minus the mean of the offsets. So the combined series is just the unweighted global mean minus some constant. Using an area weighted mean in this step would solve the problem and result in a slightly reduced warming rate.
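If anyone wants to try that change, here is a sketch of it (my own helper, not Roman’s code): swap the rowMeans() call for an area-weighted mean using the same wts vector that already constrains the offsets.

weighted.row.mean = function(x, w) {
  wmat = matrix(w, nrow(x), ncol(x), byrow=TRUE)
  wmat[is.na(x)] = NA                              # drop the weight of missing cells
  rowSums(x*wmat, na.rm=TRUE)/rowSums(wmat, na.rm=TRUE)
}
# inside temp.combine(): temp = weighted.row.mean(tsdat - matoff, wts)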
I don’t mean to criticize Roman’s method. It’s beautiful!