the Air Vent

Because the world needs another opinion

GHCN Gridded Temperature Data

Posted by Jeff Id on February 15, 2010

I’ve been working with GHCN temp data. In this case I used a function called getstation4, which looks at each station ID and tries to determine whether the series filed under it come from the same instrument or from different ones. I’ve put the code for it here because anyone can use it for compiling the data for individual station numbers.

For example, in GHCN station number 60360 there are 4 separate series. In raw temperature form, if there are different sensors close together at different altitudes they will have an offset in temperature. Also, a sensor mounted in a city area or by an air conditioner vent will read differently than one mounted five miles away in a field. These must not be averaged as absolute temperatures; they are best averaged after the anomaly is taken, while data from the same instrument is better averaged before anomalization. I think the mathematically inclined will appreciate the difference.
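To make the point concrete, here is a toy example (made-up numbers, not GHCN data). Two nearby sensors read 1 C apart, and the second record only starts a year later: averaging the raw temperatures puts a spurious half-degree step into the combined record, while anomalizing each series first and then averaging leaves it flat.

#toy illustration of average-then-anomalize vs anomalize-then-average
t1=ts(rep(10,24),start=1990,deltat=1/12)                      #field sensor, steady 10 C
t2=ts(c(rep(NA,12),rep(11,12)),start=1990,deltat=1/12)        #city sensor, 1 C warmer, starts a year later
rawmean=rowMeans(cbind(t1,t2),na.rm=TRUE)                     #steps from 10 to 10.5, a false jump
anmean=rowMeans(cbind(t1-mean(t1),t2-mean(t2,na.rm=TRUE)),na.rm=TRUE)   #flat, as it should be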

To be considered the same as each other, I made the condition that both series must have at least a few overlapping months and that the mean of the absolute value of their difference over that overlap must not be greater than 0.2 C, with this line.

if (mean(abs(dt),na.rm=TRUE)<.2)

Where dt is the difference between two GHCN series with the same WMO ID number. From several individual checks, I think the condition pretty well ensures that I don’t accidentally join two series which don’t belong together. The code is below.

getstation4=function(staid=60360)
{
	alladj=NA
	allraw=NA
	#raw data
	mask= (ghmean[,2]==staid)
	data=ghmean[mask,]
	noser=levels(factor((data[,3])))
	for(i in noser)
	{
		mask2=data[,3]==i
		startyear=min(data[mask2,4])
		endyear=max(data[mask2,4])
		subd=data[mask2,5:16]
		index=(data[mask2,4]-startyear)+1
		dat=array(NA,dim=c(12,(endyear-startyear)+1))
		for(j in 1:length(index))
		{
			dat[,index[j]]=as.numeric(subd[j,])
		}
		dim(dat)=c(length(dat),1)

		dat[dat==-9999]=NA
		dat=dat/10
		rawd=ts(dat,start=startyear,deltat=1/12)
		if(max(time(rawd))>=2011)
		{
			print ("error series")
			rawd=NA
		}

		if(!is.ts(allraw))
		{
			allraw=rawd
		}
		else
		{
			allraw=ts.union(allraw,rawd)
		}
	}

	nc=ncol(allraw)
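	#pairwise comparison: series whose mean absolute difference over their
	#overlapping months is under 0.2 C get flagged in matchv as the same instrument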
	matchv=array(NA,dim=c(nc,nc))
	mv=rep(NA,nc)
	if(nc>1)
	{
		for (i in 1:nc)
		{
			for (j in 1:nc)
			{
				dt=allraw[,i]-allraw[,j]
				mask=!is.na(dt)
				if (sum(mask,na.rm=TRUE)>2)
				{
					if (mean(abs(dt),na.rm=TRUE)<.2)
					{#data is the same
						matchv[i,j]=j
					}
				}
			}
		}
		index=1
		for (i in 1:nc)
		{
			mask=!is.na(matchv[i,])
			if( sum(!is.na(mv[mask]))>0 )
			{#found match to already matched number
				mv[mask]=min(mv[mask],na.rm=TRUE)
			}else{
				mv[mask]=index
				index=index+1
			}
		}
	}else{
		mv=1
	}

	srs=levels(factor(mv))
	outar=array(NA,dim=c(nrow(allraw),length(srs)))
	index=1

	for (i in srs)
	{
		mask=i==mv
		if(!is.null(ncol(allraw[,mask])))
		{
			outar[,index]=rowMeans(allraw[,mask],na.rm=TRUE)
		}else{
			outar[,index]=allraw[,mask]
		}

		index=index+1
	}
	outar=ts(outar,start=time(allraw)[1],deltat=1/12)

	outara=calc.anom(outar)
	if(!is.null(ncol(outara)))
	{
		final=ts(rowMeans(outara,na.rm=TRUE),start=time(outara)[1],deltat=1/12)
	}else{
		final=outara
	}
	final
}
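The calc.anom function used at the end of getstation4 (and again in the gridding code further down) isn’t listed in this post. A minimal version consistent with how it’s called here, monthly anomalies taken over each column’s own full record, looks something like the sketch below; the version actually run for the plots may differ in detail.

#minimal stand-in for calc.anom: anomaly of each column relative to its own monthly means
#keeps whatever ts/matrix attributes the input had; assumes the series starts in January
calc.anom=function(x)
{
	out=x
	d=as.matrix(x)
	for(j in 1:ncol(d))
	{
		for(m in 1:12)
		{
			rows=seq(m,nrow(d),by=12)	#every 12th row is the same calendar month
			d[rows,j]=d[rows,j]-mean(d[rows,j],na.rm=TRUE)
		}
	}
	out[]=d
	out
}

#example call, assuming ghmean is already loaded
#sta=getstation4(60360)
#plot(sta,main="GHCN 60360 combined anomaly",ylab="Anomaly Deg C")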

A couple of examples are the first two stations listed in the inventory values provided by GHCN. This one was plotted by Tamino recently; my function came to the same conclusion he did, which is that the four series were all from the same instrument. Therefore they are averaged together as raw temperature data and then the seasonal anomaly is calculated. Tamino likes this one because it shows the warming he wants so badly.

The next plot is the station from a nearby sensor in Annaba.

A map of their locations is shown here; unfortunately I don’t know how to put two locators down yet, so you need to drag the map to the left a bit to find Skikda. It’s in the next bay over, basically horizontal from Annaba.

How is it that the first station shows a continuous warming trend since 1978 yet the other very nearby station shows a flat line? There are literally thousands of calibrated temperature measurements, and one station diverges substantially from the other after 1978; that is a strong indicator that something is wrong with one station or the other. Anyway, in this post I have successfully combined all the GHCN data into a global gridded array.

Here are some of the plots.

Since 1850, the trend was 0.033 C/decade, with most of the slope occurring since 1978. I plotted the trend since 1978 below.
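The fit behind those numbers isn’t shown here; a simple least-squares sketch on the glav series built by the gridding code further down is one way to get a C/decade figure.

#rough trend check: OLS slope of the monthly global anomaly, converted to C per decade
#(glav is created by the gridding script below; the fit used for the figures may differ)
gl=ts(glav,start=1850,deltat=1/12)
fit=lm(as.vector(gl)~as.vector(time(gl)))
trend=coef(fit)[2]*10
#and the post-1978 window
gl78=window(gl,start=1978)
trend78=coef(lm(as.vector(gl78)~as.vector(time(gl78))))[2]*10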

My slope is considerably higher than any of the other raw datasets over this period. Not bad for a denialist blog, eh? Part of that difference might be due to the fact that I’ve taken the anomaly over the full length of each series. This can create artificial offsets. However, this is the plot of GHCN raw data including all its warts. Below, I’ve overlaid the UAH satellite data for comparison, UAH in red.
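One way to reduce those offsets is a fixed common baseline, a variant of the calc.anom sketch above: every series gets anomalized against the same reference window instead of its own full record. The baseline years below are only for illustration, and any series with no data inside the window drops out entirely, which is the usual trade-off with this approach.

#sketch of a fixed-baseline anomaly (illustrative only, not the helper used for these plots)
#assumes x is a monthly ts starting in January
calc.anom.base=function(x,base=c(1961,1991))	#Jan 1961 through Dec 1990
{
	out=x
	d=as.matrix(x)
	tt=as.vector(time(x))
	inbase=tt>=base[1] & tt<base[2]
	for(j in 1:ncol(d))
	{
		for(m in 1:12)
		{
			rows=seq(m,nrow(d),by=12)
			ref=rows[inbase[rows]]
			d[rows,j]=d[rows,j]-mean(d[ref,j],na.rm=TRUE)
		}
	}
	out[]=d
	out
}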

I’ll run a different anomaly baseline later, then we can start to have some fun with the temperature data. Since so many series have been eliminated from GHCN in recent years, one thing we can try is plotting the eliminated vs the kept stations for the available data to see if EM Smith’s point about a bias in the stations which have been removed is in fact correct.

Below is a plot of the global temperature trend since 1850.

Same thing since 1978.

In the last plot of trend since 1978, note how consistent the higher quality US stations are in relation to the rest of the world. There are plenty of stations with the same trend as the US; in fact they are probably the dominant number by histogram, but the variance by gridcell worldwide is incredible. Blues next to reds and blacks. Certainly we can all agree that there are a ton of problems with these sensors.

Anyway, that’s for a later post which I hope will come tomorrow. One thing to mention: this post did weight each gridcell by the cosine of its latitude (the code actually uses the sine of the colatitude, which is the same thing). The code for compiling the data is below.

sid=levels(factor((ghmean[,2])))

rawcoord=array(0, dim=c(length(sid),2) )
j=1
for (i in sid)
{
	mask=tinv[,2]==i
	rawcoord[j,2]=tinv[mask,5][1]#lat
	rawcoord[j,1]=tinv[mask,6][1]#lon
	j=j+1
}

dt= rep(.5,length(sid))
plt.map(dt,rawcoord,labels=c("","GHCN Raw Stations"))
gridval=array(NA,dim=c(length(sid),2))

#place stations on grid
for(i in 1:length(sid))
{
	gridval[i,1]=as.integer((rawcoord[i,2]+90)/5)+1  #lat
	gridval[i,2]=as.integer((rawcoord[i,1]+180)/5)+1  #lon
}

gridtem=array(NA,dim=c(36,72,161*12))
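#gridtem: 36 x 72 grid of 5 x 5 degree cells, 161 years (1850-2010) of monthly values
#the loops below fill each cell with the mean anomaly of the stations that land in it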
for(ii in 1:72)
{
	for(jj in 1:36)
	{
		maska=gridval[,2]==ii
		maskb=gridval[,1]==jj
		maskc= maska & maskb

		if( sum(maskc)>0)
		{
			tar=array(NA,dim=c(161*12,sum(maskc)))
			kk=1
			for(i in sid[maskc])
			{
				sta = getstation4(as.numeric(i))
				ind=which(tinv[,2]==i,arr.ind=TRUE)

				if(max(time(sta))>1850 )

				#if(max(time(sta))>1850 & tinv[ind,9]=="R")
				#if(max(time(sta))>1850 & min(time(sta))<1940)
				{

					rawsta=window(sta,start=1850)
					#if(sum(!is.na(rawsta))>=1200)
					{
						an=rawsta
						#mm=getbreakpoints(ff(an),.8,length=55)
						plot(an,main=i,xlim=c(1850,2010),ylim=c(-5,5))
						mm=list(FALSE)	#breakpoint screening switched off for this run
						if(mm[[1]]==TRUE)
						{
							abline(v=mm[[2]],col="red")
						}

						#monthly row index of this station within the 1850-2010 array
						index=as.integer(((time(rawsta)-1850)*12+.02)+1)
						if(mm[[1]]==FALSE)
						{
							tar[index,kk]=as.vector(rawsta)
						}else
						{
							tar[index,kk]=NA
						}
					}
				}
				kk = kk+1
			}
			gridtem[jj,ii,]=rowMeans(calc.anom(tar),na.rm=TRUE)
			print(jj)

		}
	}
	print ("COLUMN")
	print(ii)
}

glav=rep(NA,161*12)
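#latitude weights: cell centers at colatitudes 2.5, 7.5, ..., 177.5 degrees; sin(colatitude) equals cos(latitude)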
weight=sin(seq(2.5,180, 5)*(pi)/180)
meanweight=mean(weight)
weight=weight/meanweight
#weight=rep(1,36) # calculate unweighted temp
mask=rep(FALSE,36)
mask[1:18]=TRUE

for(kk in 1: (161*12))
{
	mm=rep(NA,72)
	for(ii in 1:72)
	{
		mm[ii]=mean(gridtem[mask,ii,kk]*weight[mask],na.rm=TRUE)	#weight[mask] keeps the weights aligned with the selected latitude bands
	}
	glav[kk]=mean(mm,na.rm=TRUE)
}
glava=glav

mask=!mask
for(kk in 1: (161*12))
{
	mm=rep(NA,72)
	for(ii in 1:72)
	{
		mm[ii]=mean(gridtem[mask,ii,kk]*weight[mask],na.rm=TRUE)	#weight[mask] keeps the weights aligned with the selected latitude bands
	}
	glav[kk]=mean(mm,na.rm=TRUE)
}

glavb=glav

glav=(glava+glavb)/2	#combine the two hemispheric averages into the global series plotted below

#glavc=glav
#pglava=glava
#pglavb=glavb
#pglav=glav

plot(ts(glav,start=1850,deltat=1/12),main="Gridded Global Temperature GHCN",xlab="Year",ylab="Anomaly Deg C")
#savePlot("c:/agw/global GHCN station gridded temperature.jpg",type="jpg")

15 Responses to “GHCN Gridded Temperature Data”

  1. Nice work.

  2. KevinUK said

    Jeff ID

    Is the legend on your maps +/- 1 deg C, i.e. is it an anomaly, or is it an actual trend, i.e. +/- 1 deg C/decade for each grid cell?

    Now personally I don’t like anomalising and gridding data, but instead prefer to just look at the trends in the raw/adjusted data for individual stations. I therefore produced a series of ‘interactive maps’ showing the trends (in deg C/century) over a number of different time periods (1880 to 2010, 1880 to 1909, 1910 to 1939, 1940 to 1969 and 1970 to 2010).

    It takes a little while to load the data into these Flash maps, but it is well worth the wait. If you receive a prompt saying that it is taking Adobe Flash Player a while to load the data into the map, please click the ‘No’ button (perhaps several times) until the full map is loaded. You can then zoom in and out and pan left/right and up/down on the map, and if you click an individual coloured dot it will show the chart of raw/adjusted data for each time period for that station. It is best to load each map into a separate tab within your browser; that way you can click between tabs and contrast the differences in the maps for each time period. Here are some links:

    GISS raw data trends 1880 to 2010

    GISS raw data trends 1880 to 1909

    GISS raw data trends 1910 to 1939

    GISS raw data trends 1940 to 1969

    GISS raw data trends 1970 to 2010

    In particular, compare the trends during the 1910 to 1939 period with those for the 1970 to 2010 period. No wonder Dr Phil thinks that the warming trends during the 1910 to 1940 and 1970 to 2010 periods are not statistically significantly different from one another. It looks to me like the 1910 to 1939 trends in the US stations are greater than those for the 1970 to 2010 period. What do you think?

    Now what is going on with all those ‘dark red’ Canadian stations between 1970 and 2010, and all those ‘dark red’ stations in Greenland, Iceland, northern Norway and Russia between 1910 and 1939? Please note ‘dark red’ is >+5 deg C/century and ‘dark blue’ is <-5 deg C/century. Do we really need to ‘anomalise and grid’ the raw data in order to see clear evidence of (non-global) warming/cooling on an approximately 30-year multi-decadal cycle of warming (1910 to 1939) followed by cooling (1940 to 1969) followed by warming (1970 to 2000) followed by cooling (perhaps from 2000 onwards)? I don’t think we do. Ah, but this data hasn’t been adjusted yet? What effect do the adjustments have on the trends? Well, have a look for yourself.

    GISS raw data trends 1880 to 2010

    GISS raw data trends 1880 to 1909

    GISS raw data trends 1910 to 1939

    GISS raw data trends 1940 to 1969

    GISS raw data trends 1970 to 2010

    See any significant differences? Not really, as the warming (and cooling) is real, it’s largely in the Northern Hemisphere, is largely Northern Hemisphere winter warming, and has very little to do with man’s emissions of CO2 and everything to do with natural climatic variability due to the AMO/PDO/ENSO.

    Spencer: Natural variability unexplained in IPCC models

    Now isn’t it about time for the GCMs to be re-programmed to take full account of natural climatic variability, rather than being deliberately programmed to tell a ‘doom and gloom’ future catastrophic global warming story as they are now?

  3. Espen said

    I bet the sensor in Skikda is located at the airport (yes, it has an airport!). Sometimes I wonder whether these “global temperature” measures are really measuring international air traffic growth.

  4. Jeff Id said

    #1 Thanks, it at least shows that there aren’t big problems in the code. This lays the foundation for experimenting with different stations in different regions. Anthony shared a copy of his surfacestations project (which I can’t blog on) that I’ll now be able to run through some code. I’ll be able to look at the elimination of stations by region, go back to the Siberia data and do comparisons to the CRU version of each trend.

  5. clivere said

    Slightly off topic, but associated with the Anthony Watts surface station work that you mention.

    Is anyone geared up to be able to rerun the John V Open Temp analysis using current USHCN data?

    http://www.opentemp.org/main/

    I recognise that it won’t be possible to run it with the latest data from Anthony, but I assume the dataset provided by Anthony in September 2007 is available.

    I have had a quick look but am not sure what to do to run it myself.

    The original Open Temp runs had 3 take-home messages.

    1. The output for the good surface stations matched closely with the GISS output. This was very popular with the warmer side of the debate, with people like Tim Lambert and John Cook referring to it as a rebuttal to Anthony’s work.

    2. There was a 0.35 C discrepancy in trend per century between the good stations (colder) and poor stations (warmer). For me this is a serious discrepancy compared with the Menne paper, which implies no discrepancy based on a similar sample size.

    3. There were relatively few good stations compared with a large number of poor stations. Supposedly the small number of good stations (based on lights = 0) were used to heavily adjust the poor stations.

    Given the theory that USHCN v2 is now heavily processed in relation to USHCN v1, I am interested in how this would impact the Open Temp output.

    Also, would the dropping of stations in the last couple of years have an impact by reducing the number of good stations?

  6. #5..

    I haven’t used John V’s work since 2007. The problem required more stations than we had at that point.

    When Jeff Id gets ready I have some ideas that need looking into, but I trust he’s thought of most of them.

  7. Kenneth Fritsch said

    The link below is to a statistical analysis that was done by a professor of statistics, RomanM, with the CRN evaluations from the Watts team at that point in time.

    John V, as I recall, was a very good program writer, but not a statistician. His attempted analysis was woefully inadequate.

    I have wondered why John V’s attempt gets so much recognition and RomanM’s analysis so little.

    I know that I used to hear TCO, in attempts to disparage Watts, make comments on the great John V analysis and its refutation of the import of the Watts team work. I finally was able to pin him down on what he understood about the John V and RomanM analyses. It turns out that he did not understand what John V had done and had not read the RomanM analysis.

    http://climateaudit.org/2008/06/08/homogeneity-adjustment-part-i/#comment-150852

    Ordinarily my reference to Roman’s work comes with a sermonette, but today I will close by simply reminding readers that the Watts CRN effort, when complete or near complete, deserves a good statistical analysis such as RomanM did in the link above.

  8. Carrick said

    I know that I used to hear TCO, in attempts to disparage Watts, make comments on the great John V analysis and its refutation of the import of the Watts team work. I finally was able to pin him down on what he understood about the John V and RomanM analyses. It turns out that he did not understand what John V had done and had not read the RomanM analysis.

    That’s pretty much my experience with this group too.

    It’s a typical example of the “bandwagon” effect, whereby somebody who is otherwise quite intelligent gloms onto all supporting data and attacks all contrarian data without regard to the quality of either work.

    People like this just make it harder to separate the wheat from the chaff.

  9. Layman Lurker said

    Jeff, Kenneth, and others, take a look at Chad’s latest series of posts which delve into the automated raw temp data adjustment methods. It will take a while to work through all of the posts, but it is well worth it.

    Start with this post: http://treesfortheforest.wordpress.com/2010/02/10/methods-to-combine-station-data/

    then this:
    http://treesfortheforest.wordpress.com/2010/02/13/combining-inhomogeneous-station-data/

    and finally:
    http://treesfortheforest.wordpress.com/2010/02/16/combining-inhomogeneous-station-data-%e2%80%93-part-ii/

  10. Carrick said

    Layman Lurker, I’ve already been following this, but I agree that it’s some beautiful work. It is exactly the sort of reliability and validation suite that is needed.

  11. clivere said

    I have been following Chad’s analysis, as well as the analyses of E M Smith, Zeke H, Nick Stokes and others performed in the past, including that by Roman M. I have been following RC and CA from when they both started.

    The John V analysis was to an extent verified by Steve McIntyre. See
    http://web.archive.org/web/20071005040536/http://www.climateaudit.org/?p=2069

    John V’s analysis was liked by the warmer community because it showed a close alignment between GISS and the good stations. There were, however, issues with low sample size for good stations and issues with coverage and clustering of the samples. In addition, some QA problems were apparently corrected for the Menne study, which was based on a similar sample.

    I am coming at this from another angle, in that the John V output showed a significant 0.35 C variation in trend per century between the good and poor stations. For example, see comment 380 in the above CA discussion. The Menne study has wiped out that variation and I am trying to understand why.

    There are reasons to believe that USHCN2 is heavily processed relative to USHCN1, and a rerun of the John V output using USHCN2 may help to clarify what is going on. If USHCN2 is heavily processed, it would also potentially explain some other things I am seeing that bother me.

    I expect the Anthony Watts paper will stimulate a lot of discussion and may provide clarification, but I have no idea what the ETA is for the paper. If it is still months away then I am looking to figure things out before then, based on any prior analysis that is out there.

    Given time I can probably run the Open Temp code myself, but I would be starting from scratch and won’t really be able to look at it properly myself for a couple of months.

  12. Kenneth Fritsch said

    Clivere, I am most curious why you mention RomanM’s analysis only in passing and then go on about John V’s. Why do you not comment in detail on RomanM’s work?

  13. clivere said

    Kenneth – because that is the one that got a lot of attention, that I am most familiar with and is considered by the warmer community as a “good” piece of work which they still refer to. It should also give me the information I want. I would need to revisit the Roman M work, but I don’t recall that he does a direct comparison with GISS, which I would be interested in.

  14. Carrick said

    Clivere:

    Given time I can probably run the Open Temp code myself, but I would be starting from scratch and won’t really be able to look at it properly myself for a couple of months.

    You may want to take a look at the Clear Climate Code instead.

    … that is the one that got a lot of attention, that I am most familiar with and is considered by the warmer community as a “good” piece of work which they still refer to

    I just wish he had written up his work even at the level of a coherent blog post. It’s hard to really judge the quality of something presented this piecemeal, don’t you think?

  15. Kenneth Fritsch said

    Clivere, I think one thing that comes through hard and fast when looking at trend data from individual stations is that those trends vary greatly, and drawing any conclusions takes lots of stations and station data. I also found that before 1920 these data sets have considerable missing data.

    The GISS data set for the US is significantly different from the USHCN set because of the different ways that GISS and USHCN handle the UHI adjustment. The USHCN trend for the US from 1920 to present is statistically significantly higher than that for GISS (something like 0.14 degrees C per century, as I recall).

    Roman’s analysis showed that a number of factors, including population, were important. If I were starting from scratch with GISS data, I would use Roman’s approach. Roman used the USHCN data that I provided him for the analysis posted at CA.
