the Air Vent

Global Gridded GHCN Trend by Seasonal Offset Matching

This post employs global gridded data using Seasonal Offset Matching.  RomanM’s recent posts on the topic might even become standard method in climate science.   Certainly the Met office should at least pay attention to his methods if they are planning on redoing the dataset.  Of course this example is just the incredibly lacking GHCN dataset, full of station dropouts, weak metadata and urban warming but it provides a good example set to apply the methods toward.  In my previous versions of the GHCN the code spent a fair amount of time to determine if the data was from the same instrument or not.  This method does a good job combining the station info either way.  With that said, this is still a preliminary result but it is getting closer to a finished set.

The way the algorithm is applied here, stations with multiple copies of the same data are weighted more in the averaging.  Also, this temperature series is only for land based data.

Roman’s station combination code is explained here. – If you’ve not read up on it, this is the best description to date of what he has proposed.

The concept is to provide 12 months of offsets for each temperature series when combining data.  It maximizes corrections for missing data when combining multiple incomplete temperature series.  The function he wrote is deceptively simple.



calc.offset = function(tdat) {
  delt.mat = !is.na(tdat)
  delt.vec = rowSums(delt.mat)
  co.mat = diag(colSums(delt.vec*delt.mat)) - (t(delt.mat)%*%delt.mat)
  co.vec = colSums(delt.vec*delt.mat*tdat,na.rm=T) - colSums(rowSums(delt.mat*tdat,na.rm=T)*delt.mat)
  offs = psx.inv(co.mat)%*%co.vec
  c(offs)-mean(offs)}

### generally assumes that tsdat starts in month 1 and is a time series.

monthly.offset =function(tsdat) {
  dims = dim(tsdat)
  off.mat = matrix(NA,12,dims[2])
  dat.tsp = tsp(tsdat)
 for (i in 1:12) {
    dats = window(tsdat,start=c(dat.tsp[1],i), deltat=1)
    offsets = calc.offset(dats)
    off.mat[i,] = offsets }
  colnames(off.mat) = colnames(tsdat)
  rownames(off.mat) = month.abb
   nr = dims[1]
   nc = dims[2]
  matoff = matrix(NA,nr,nc)
 for (i in 1:nc) matoff[,i] = rep(off.mat[,i],length=nr)
  temp = rowMeans(tsdat-matoff,na.rm=T)
  pred =  c(temp) + matoff
  residual = tsdat-pred
list(temps = ts(temp,start=c(dat.tsp[1],1),freq=12),pred =pred, residual = residual, offsets=off.mat) }


This version which includes Roman’s own hand written regression, doesn’t fail on some of the weird series with no overlapping months of data. I had already written the monthly station combining code from my getstation efforts so it was a simple matter t0 combine Roman’s new algorithm with my work to plot a complete GHCN global temperature station trend.


gridval=array(NA,dim=c(length(sid),2))

#place stations on grid
for(i in 1:length(sid))
{
	gridval[i,1]=as.integer((rawcoord[i,2]+90)/5)+1  #lat
	gridval[i,2]=as.integer((rawcoord[i,1]+180)/5)+1  #lon
}
gridtem=ts(array(NA,dim=c(36,72,161*12)),start=1850,deltat=1/12)
for(ii in 1:72)
{
	for(jj in 1:36)
	{
		maska=gridval[,2]==ii
		maskb=gridval[,1]==jj
		maskc= maska & maskb

		if( sum(maskc)>0)
		{
			tar=array(NA,dim=c(161*12,sum(maskc)))
			tar=ts(tar,start=1850,deltat=1/12)
			kk=1
			for(i in sid[maskc])
			{
				rawsta = getallcurves(as.numeric(i))
				if(ncol(rawsta)>1)
				{
					sta=monthly.offset(rawsta)$temps
				}else{
					sta=rawsta
				}

				ind=which(tinv[,2]==i,arr.ind=TRUE)

				if(max(time(sta))>1850)
				{
					sta=window(sta,start=1850)
					index=as.integer(((time(sta)-1850)*12+.02)+1)
					tar[index,kk]=as.vector(sta)
				}
				kk = kk+1
			}
			if(ncol(tar)>1)
			{
				gridtem[jj,ii,]=calc.anom(monthly.offset(tar)$temps)
			}else{
				gridtem[jj,ii,]=calc.anom(tar)
			}
			print(jj)
			plot(gridtem[jj,ii,],type="l")

		}
	}
	print ("COLUMN")
	print(ii)
}

glav=rep(NA,161*12)
weight=sin(seq(2.5,180, 5)*(pi)/180)
meanweight=mean(weight)
weight=weight/meanweight
#weight=rep(1,36) # calculate unweighted temp
mask=rep(FALSE,36)
mask[1:18]=TRUE

for(kk in 1: (161*12))
{
	mm=rep(NA,72)
	for(ii in 1:72)
	{
		mm[ii]=mean(gridtem[mask,ii,kk]*weight,na.rm=TRUE)
	}
	glav[kk]=mean(mm,na.rm=TRUE)
}
glava=glav

mask=!mask
for(kk in 1: (161*12))
{
	mm=rep(NA,72)
	for(ii in 1:72)
	{
		mm[ii]=mean(gridtem[mask,ii,kk]*weight,na.rm=TRUE)
	}
	glav[kk]=mean(mm,na.rm=TRUE)
}

glavb=glav

#glav=(glava+glavb)/2

Again the code is preliminary but most of the bugs have been worked out over time. This calculates a 5×5 degree gridded temperature plot directly from the GHCN data without any hidden steps. It employs a unique and publishable method by Roman M for combining temperature series with anomalies and results in the following temperature plots.

The trend above looks pretty reasonable for a GHCN dataset.  I expanded it to include 1850 to present below.

The trend since 1978 is below.  Plenty of warming.  1998 again is not the highest year by the complete GHCN dataset.

Global trend since 1900 plotted by gridcell.

Click to expand


Global temperature trend since 1978

Click to expand