the Air Vent

GHCN Another Way

Ok so I’ve spent far too much time this weekend messing around with GHCN data in R. R has memory limitations which have driven me to the edge of sanity but doggedly I pushed on. In case you weren’t sure, slogging through temperature data is a crap way to spend a weekend but I’m getting closer to an independently created gridded temperature dataset by someone who makes not one penny for promoting one conclusion or another. Unfortunately, were stuck with the crap data archive at GHCN.

Yesterday my goal was simple, instead of collecting timeseries into a single WMO number, I wanted to collect every individual series into it’s own column in a matrix. To that end I wrote the following code. It looks simple but there were a ton of issues with getting this to work . Two days worth in fact, R doesn’t have good memory handling, doesn’t have breakpoints and runs impossibly slow if certain commands are or are not used. All of it makes sense after you beat your head on the wall enough, which I did, and the beginning of the result is shown in the code for compiling all of the GHCN series individually is below.


allstations=as.numeric(levels(factor(ghmean[,2]*100+ghmean[,3])))
nost=length(allstations)
rows=nrow(ghmean)
tempsta=array(NA,dim=c(1920,nost))
indexvec=(ghmean[,4]-1850)+1
mask=indexvec<1
indexvec[mask]=161

stmaskvec=ghmean[,2]*100+ghmean[,3]
td=array(unlist(ghmean[,5:16]),dim=c(595758,12))

datmask=rep(TRUE,1921)
datmask[1921:1932]=FALSE
for(i in 1:nost)
{
	mask =allstations[i]==stmaskvec
	dat=array(NA,dim=c(12,161))
	dat[,indexvec[mask]]=t(td[mask,])
	dim(dat)=c(1932,1)
	mask=dat==-9999
	dat[mask]=NA
	dat=calc.anom(dat/10)
	tempsta[,i]=as.numeric(dat[datmask])
	print (i/nost)
}
tempsta=ts(tempsta,start=1850,freq=12)
dimnames(tempsta)[[2]]=allstations

#save(tempsta,file="ALL GHCN DATA")
#load(file="ALL GHCN DATA")


This code loads ghcn into a matrix 1920 months (160 years) long and 13464 columns wide. We’ve seen a variety of numbers for GHCN station count but 13464 is the true number of total series. Availability by month of GHCN data is as follows:

Figure 1

So we can see the actual peak number of available stations crossed 8000 at one point. However there really were over 13000 instruments in operation over the last 150 years. My understanding is that most of those are still in operation today but for inexplicable reasons, are not collected together. One of the biggest scandals of climate science is that since 1992 there have only been an average of 2000 stations recorded in the primary global database. With all the media outlets screaming global warming doom, unless we pay more taxes, don’t you think someone would notice that NOBODY IS ARCHIVING PROBABLY 80 PERCENT TEMPERATURE STATIONS!!!! Oh yeah, but we have doom in India from melting glaciers, THAT NOBODY MEASURED EITHER! Doom from drought which will never happen. Doom from fish shrinkage due to global warming. Doom!! they say didn’t you hear them? Fig 2 shows the mean of all the GHCN data unweighted and averaged.

Figure 2

I just realized, a couple thousand of us are sitting here on a blog called a ‘skeptic’ blog, ‘denier’ blog, ‘contrarian’ blog and our SIN is plotting the DATA!!! That’s where we went wrong, plotting the damned data. You evil denialist beasts. Sorry, the past two months of climate change have permanently altered my perception of the authorities. — not for the better, in case you were wondering 😉

d2Climate/dt2 = f (Politics)

The change in the climate of climate change over time as a function of politics. I know it’s a crap expression, but it makes me laugh.

Anyway, there ain’ t a lot of trend in Figure 2. In F3, the data is filtered to reveal the incredibly small change in temperatures recorded by the single most complete set of temperature stations on earth over the last 150 years.

Figure 3

There it is, global warming doom. Now most of these stations came from the US, but I wonder how much that really matters. Most of the warming this century happened in the northern hemisphere yet not in the US. If I’m Russia, I would hand out thousands of bic lighters to temperature station operators, if you know what I mean, and hope that the US committed unilateral suicide with the data. Anyway, I’m concerned with Fig 3 because there isn’t enough warming to match the consensus. Before we go into that, I want to say that NONE of my work has 1998 as the warmest year on record. EVER. Yet every consensus metric on earth does. What does that mean? Am I not selective enough with the data? It’s odd, and it could be something wrong here B/C nobody gave me 22 million dollars over the last ten years. Instead of ridiculous $$, we “skeptics”, who actually care about the effing data and whether the JIC’s are lying to us, have a weekend of effort.

Figure 4

The gridding code looks like this:


latlon=tinv[,5:6]
stid=tinv[,2]

coln=as.integer(as.numeric(unlist(dimnames(tempsta)))/100)
Ncol=length(coln)
collatlon=array(NA,dim=c(Ncol,2))

for ( i in 1:Ncol)
{
	mask=stid[i]==coln
	collatlon[mask,2]=latlon[i,1]#latitude
	collatlon[mask,1]=latlon[i,2]#longitude
}

dt=rep(.5,Ncol)
plt.map(dt,collatlon)

gridval=array(NA,dim=c(Ncol,2))

#place stations on grid
for(i in 1:length(coln))
{
	gridval[i,1]=as.integer((collatlon[i,2]+90)/5)+1  #lat
	gridval[i,2]=as.integer((collatlon[i,1]+180)/5)+1  #lon
}

griddata=function(dat=dat,coord=coord)
{
	gridtem=array(NA,dim=c(36,72,160*12))
	for(ii in 1:72)
	{
		print("col")
		print (ii)
		for(jj in 1:36)
		{
			maska=coord[,2]==ii
			maskb=coord[,1]==jj
			maskc= maska & maskb
			maskc[is.na(maskc)]=FALSE
			print(jj)
			if( sum(maskc,na.rm=TRUE)>1)
			{
				gridtem[jj,ii,]=rowMeans((dat[,maskc]),na.rm=TRUE)
			}else{
				if(sum(maskc,na.rm=TRUE)==1)
				{
					print(sum(maskc,na.rm=TRUE))
					print(jj)
					gridtem[jj,ii,]=dat[,maskc]
				}
			}
		}
	}
	gridtem
}

gt=griddata(tempsta,gridval)

globt=rep(NA,1920)
for(i in 1:1920)
{
	globt[i]=mean(gt[,,i],na.rm=TRUE)
}
globt=ts(globt,start=1850,freq=12)
plot(ff(globt), main="Mean of All GHCN Data Gridded 5x5 Deg.", ylab="Anomaly C", xlab="Year")

So what’s the deal then? Why is this result different? Again, I’m not terribly sure, another weekend and still no solid answers. There is a clue though. The data recently released from CRU which is supposed to be raw GHCN has an interesting difference.

Figure 5

It’s a reasonably continuous series yet GHCN timeseries which CRU claims Figure 5 is from, have not one single continuous set of data.

Figure 6

You can see that the data records from all the long stations stopped around 1990, with only one series recorded in recent times (where all the global warming occured). So, since CRU claims that the data they have for each station is directly from GHCN, how did they get so many years of continuous data for station number 25563? I’ve already spent hours looking into this question in an attempt to reverse engineer the process. I could take the easy road and send an email to UEA — do you think they would help the climategate blogger figure this out? Anyway, it’s time to go to work this Monday morning, so we’ll get to look at the next exciting bit tonight or tomorrow. CRUGlue, adhering the data .