## Global Gridded GHCN Trend by Seasonal Offset Matching

Posted by Jeff Id on February 26, 2010

This post employs global gridded data using Seasonal Offset Matching. RomanM’s recent posts on the topic might even become standard method in climate science. Certainly the Met office should at least pay attention to his methods if they are planning on redoing the dataset. Of course this example is just the incredibly lacking GHCN dataset, full of station dropouts, weak metadata and urban warming but it provides a good example set to apply the methods toward. In my previous versions of the GHCN the code spent a fair amount of time to determine if the data was from the same instrument or not. This method does a good job combining the station info either way. With that said, this is still a preliminary result but it is getting closer to a finished set.

The way the algorithm is applied here, stations with multiple copies of the same data are weighted more in the averaging. Also, this temperature series is only for land based data.

Roman’s station combination code is explained here. – If you’ve not read up on it, this is the best description to date of what he has proposed.

The concept is to provide 12 months of offsets for each temperature series when combining data. It maximizes corrections for missing data when combining multiple incomplete temperature series. The function he wrote is deceptively simple.

calc.offset = function(tdat) { delt.mat = !is.na(tdat) delt.vec = rowSums(delt.mat) co.mat = diag(colSums(delt.vec*delt.mat)) - (t(delt.mat)%*%delt.mat) co.vec = colSums(delt.vec*delt.mat*tdat,na.rm=T) - colSums(rowSums(delt.mat*tdat,na.rm=T)*delt.mat) offs = psx.inv(co.mat)%*%co.vec c(offs)-mean(offs)} ### generally assumes that tsdat starts in month 1 and is a time series. monthly.offset =function(tsdat) { dims = dim(tsdat) off.mat = matrix(NA,12,dims[2]) dat.tsp = tsp(tsdat) for (i in 1:12) { dats = window(tsdat,start=c(dat.tsp[1],i), deltat=1) offsets = calc.offset(dats) off.mat[i,] = offsets } colnames(off.mat) = colnames(tsdat) rownames(off.mat) = month.abb nr = dims[1] nc = dims[2] matoff = matrix(NA,nr,nc) for (i in 1:nc) matoff[,i] = rep(off.mat[,i],length=nr) temp = rowMeans(tsdat-matoff,na.rm=T) pred = c(temp) + matoff residual = tsdat-pred list(temps = ts(temp,start=c(dat.tsp[1],1),freq=12),pred =pred, residual = residual, offsets=off.mat) }

This version which includes Roman’s own hand written regression, doesn’t fail on some of the weird series with no overlapping months of data. I had already written the monthly station combining code from my getstation efforts so it was a simple matter t0 combine Roman’s new algorithm with my work to plot a complete GHCN global temperature station trend.

gridval=array(NA,dim=c(length(sid),2)) #place stations on grid for(i in 1:length(sid)) { gridval[i,1]=as.integer((rawcoord[i,2]+90)/5)+1 #lat gridval[i,2]=as.integer((rawcoord[i,1]+180)/5)+1 #lon } gridtem=ts(array(NA,dim=c(36,72,161*12)),start=1850,deltat=1/12) for(ii in 1:72) { for(jj in 1:36) { maska=gridval[,2]==ii maskb=gridval[,1]==jj maskc= maska & maskb if( sum(maskc)>0) { tar=array(NA,dim=c(161*12,sum(maskc))) tar=ts(tar,start=1850,deltat=1/12) kk=1 for(i in sid[maskc]) { rawsta = getallcurves(as.numeric(i)) if(ncol(rawsta)>1) { sta=monthly.offset(rawsta)$temps }else{ sta=rawsta } ind=which(tinv[,2]==i,arr.ind=TRUE) if(max(time(sta))>1850) { sta=window(sta,start=1850) index=as.integer(((time(sta)-1850)*12+.02)+1) tar[index,kk]=as.vector(sta) } kk = kk+1 } if(ncol(tar)>1) { gridtem[jj,ii,]=calc.anom(monthly.offset(tar)$temps) }else{ gridtem[jj,ii,]=calc.anom(tar) } print(jj) plot(gridtem[jj,ii,],type="l") } } print ("COLUMN") print(ii) } glav=rep(NA,161*12) weight=sin(seq(2.5,180, 5)*(pi)/180) meanweight=mean(weight) weight=weight/meanweight #weight=rep(1,36) # calculate unweighted temp mask=rep(FALSE,36) mask[1:18]=TRUE for(kk in 1: (161*12)) { mm=rep(NA,72) for(ii in 1:72) { mm[ii]=mean(gridtem[mask,ii,kk]*weight,na.rm=TRUE) } glav[kk]=mean(mm,na.rm=TRUE) } glava=glav mask=!mask for(kk in 1: (161*12)) { mm=rep(NA,72) for(ii in 1:72) { mm[ii]=mean(gridtem[mask,ii,kk]*weight,na.rm=TRUE) } glav[kk]=mean(mm,na.rm=TRUE) } glavb=glav #glav=(glava+glavb)/2

Again the code is preliminary but most of the bugs have been worked out over time. This calculates a 5×5 degree gridded temperature plot directly from the GHCN data without any hidden steps. It employs a unique and publishable method by Roman M for combining temperature series with anomalies and results in the following temperature plots.

The trend above looks pretty reasonable for a GHCN dataset. I expanded it to include 1850 to present below.

The trend since 1978 is below. Plenty of warming. 1998 again is not the highest year by the complete GHCN dataset.

Global trend since 1900 plotted by gridcell.

Global temperature trend since 1978

## jstults said

Nice work!

These sorts of posts are great as “see, it’s not so hard to show your work” exemplars (and usually when you know something is for public consumption, you write better, more well commented and documented stuff anyway).

Thanks to you both, keep it up.

## RomanM said

psx.inv (pseudoinverse calculation) is missing. It is needed in calc.offset.

## Jeff Id said

#2, I’ll put it in tonight, in the meantime others can get it from the link to Roman’s blog.

## Hmmm said

Neat! Can you plot the adjustment magnitudes over the years too? Or a before and after plot of the temp anomaly?

## Gene Zeien said

omg, you’re using the adjusted GHCN data. Don’t even need a link to your data source to see that (though a link would be nice). Drop by justdata.wordpress for bash scripts to parse the raw data files and a link to the unadjusted temps.

So sad, that so many people have wasted time working with that junk. “Scientists” start with the same padded & fluffed data. Never mind that Jim Hansen is lead author on the paper used to justify the adjustments…

## steven mosher said

That is great work.

Next some joker will ask for error bars.

A while back ( like 2 years ) Roman discussed with me the way Brohan jones 06 handled the grids that had

part ocean and part land.. I recall that he had issues with that as well..

So when you get to including SSTs that will be a fun discussion.

## David S said

Jeff

I don’t know anything about code, but remember some trigonometry – is

weight=sin(seq(2.5,180, 5)*(pi)/180) the bit where you adjust for the relative size of the 5 x 5 cells?

## Jeff Id said

#5, Why would you assume I’m using adjusted data? This is what they call raw, although Ryan has pointed out even the raw isn’t really raw.

#7, Yup.

## RomanM said

#6 Steven

Your memory better that I thought! Yes, rather than combining grids with both land and water using a fixed pair of weights (such as relative area contribution), they “minimise” the variance of the end result by combining with weights related to the uncertainties of the estimates of the two components. Although it

soundslike a good thing to do, it means that these weights can vary from month to month making grid anomalies at different times incomparable and introduces a bias into the end result.It is like trying to estimate a combined mean of some variable for two equal size populations (that differ drastically from each other). However, 90% of my sample is taken from population A and 10% from B. Instead of weighting the two calculated averages equally to calculate a combined average, since my error bars are so much smaller for group A, I weight them 90% and 10% instead.

Learning some stratified sampling stuff could have been helpful for them…

## Tim said

#9 My understanding is there are huge issues with grid cells containing water that is covered in ice part of the year. This plays hovak with the averages if the time of the ice advance/retreat changes.

## Zeke Hausfather said

Jeff,

Nice work. Its interesting to note that the next effect of the adjustments is to lower the global trend in recent years for the major series (GISS/NCDC/HadCRUt) since they all have a slope of ~ 0.16 C per decade for the period 1978-present (versus 0.233 for the raw data).

It would be interesting to see how the Tamino/Roman method of station combination effects the trend vis-a-vis the reference station method used by GISS.

## Zeke Hausfather said

Er, by next I meant net :-p

## Zeke Hausfather said

Also, I was comparing apples to oranges; the GISS land-only trend is 0.19 C per decade. Still, the general point holds.

## Tim said

#13 – Zeke, I would not get too excited. Many of the dubious adjustments that people find by pulling up individual stations seem to involve adding cooling to the earlier record rather than adding warming to the recent record. This would have the effect of making the 1940 peak look lower than it should be.

Also, UAH has a trend of .13 degC/decade. That may be closer to the true trend once UHI is taken into account.

## Jeff Id said

I just realized “Anomaly” worked its way into the title again.

## steven mosher said

#9

Of course I remember! Dr. M you ever have one of those students who waved his hand wildly with a question he thought was smart?

and maybe 1 times out of 100 it was smart? I’m that dude of course I’d remember that.

## Ruhroh said

Important to note here, an Anti-Groupthink behavior;

Jeff did publish a graph which is not notably dissimilar from some Jones=esque graphs.

This would seem to be a prime candidate for Self-Censorship if Jeff was constrained by GroupThink pressure.

Instead, it seems to be a good example of

Derive/Define the Analytical Method,

then look unflinchingly at the results.

Integrity on Parade…

RR

## Jeff Id said

#17, Thanks for noticing. There is one hell of a lot of warming here :)

## RC Saumarez said

I agree with #17. I find it rather difficult to derive basic mathematical methods from uncommented code. I would like to see an exact analytical decription of how weighting is performed before delving into code.

## RC Saumarez said

Having followed the links after my last post, this seems to be a suspect method. There does not seem to be a proper physical assumption behind the method and estimating the offsets, by a minimisation procedure, I assume, is likely to be ill-conditioned due to a large number of unknowns. This underlies the comment that there are many possible solutions for the offsets. Should there be a geographical weighting between stations?

It simply reinforces my view that any amount of processing lousy data unless, it had been degraded in a very precise way, is not going to produce good estimates of the underlying process.

## jstults said

Yeah, being successful at ill-posed problems depends heavily on knowing the forward model and the noise process pretty well.

## RomanM said

#20 RC S:

I don’t know what “proper physical assumption” you are looking for. Perhaps you should tell us. I presume that you saw the posts here for the model and here for some of the unbiasedness properties.

This is an estimation problem. This particular solution treats each station equally. The method deals with the problems created by missing data in the various series and has some verygood properties. If you want weighted stations, that is possible within this context.

The calculation methodology is proper statistical Analysis of Variance. The “non-uniqueness” consists of a single additive constant which could be added to each of one set of coefficients and subtracted from each coefficient in the other set. Ill-conditioned? I don’t think so. I used a pseudo-inverse to simply avoid choosing a single superfluous equation to eliminate when solving for the coefficients.

I have no idea what you mean by

I would be interested in hearing an explanation.

## jstults said

RomanM said:

I thought he was talking about something along the lines of deconvolution, if you know that your signal was blurred with a gaussian, it’s pretty easy (if numerically ill-posed), if the ‘degradation’ process was more complex than that though…

## Geoff Sherrington said

The rawest of gobal temperature data that I have been able to find is from the Oak Ridge DoE CDIAC files TR022 and TR027, two thick books that go to 1986. They have been put into machine form as described at the Warwick Hughes blog http://www.warwickhughes.com/blog/?p=510

Anyone who uses adjusted data does so at the mercy of the adjusters, of course.

It was interesting to see a partial explanation for one of my hot topics, the global hot year of 1998.

Roy Spencer (07:31:23)noted:

“Steve Goddard: the January 1998 warm spike in the satellite data compared to surface data is because peak El Nino warmth is in the eastern tropical and subtropical Pacific, which is poorly sampled by thermometers. The same is true during most if not all El Nino years.”

An alternative or perhaps complementary explanation comes from a recent, fascinating blink graph at a site I can’t find now (bookmark went wrong), where the hot 1998 was compared to ththe coldest year early in the 1900s, month by month. The peak maxima and minima were roughly the same, it’s just that 1998 had a longer warmer spring and autumn. How does GHG do that? How does El Nino do that? From where does the troposphere gain heat? Of course, large or medium monthly excursions from the customary annual pattern can upset infilling.

## Ausie Dan said

I would like to see this repeated using rural stations only.

From what I have seen elsewhere (WUWT I think),

using rural stations only, plus your method, would almost eliminate ALL warming.

In that case, it would certainly be worse than they thought!

## RomanM said

#23

Sounds like overkill to me. That would make the situation more complicated that it should be.

## steven mosher said

jeffid.. ever check out impervious surface data.. a different measure of urbanity

## The Blackboard » A simple model for spatially-weighted temp analysis said

[…] data (and actually lowers the global temperature vis-a-vis the raw data, as both this analysis and a similar one by Jeff Id show). The Rural/Urban designation here is based on GHCN station metadata; it may not be perfect, […]

## Gene Zeien said

RE: #8 “How do you recognize adjusted data?”

Look familiar? Compare to graph 2.