CRU #2 – Why we look at data and code

Sorry for the lack of posting lately. I’ve spent the holiday with the family, reading books and playing with the kids. In the background I’ve been continuing the exploration of the CRU stations available from the GHCN dataset. The purpose was just to continue exploring how the global temperature series are created.

First, there are 1643 series of raw data from GHCN which have 6 digit ID’s that correspond to the 1741 stations CRU listed recently on their website. I got this number by having software search through GHCN station ID’s for a match to CRU. My number is slightly different from the station number reported at CA last week, unfortunately there should be no discrepancy. I can’t find any error in the code, but this is a post of general differences over a lot of data so if there are errors in a few stations it wont cause much trouble. This result means GHCN has data for all but 98 of the CRU stations recently published. One question is, how representative of the full GHCN dataset are these 1643 series

The R software presented below, collects data from each of the 1643 CRU stations from the GHCN rather than the “raw” data presented by CRU. In GHCN, many individual station ID’s have multiple sub-stations. The software here simply averages all the substations for a particular ID number and creates a single time series for each station number. Admittedly, this method has very little QC involved and needs improvement but it should give us some baseline understanding of the data as a whole.

The 1643 individual stations are then averaged by month to create the graph below (2 yr gauss filter). Please ignore the trend lines in the following graphs as they have little meaning and were left there for other plots.

gchn raw CRU — Figure 1 CRU Stations - Data from GHCN

The monthly per year station count is important.

You can see the spike at the most recent end of Figure 1 is based on a very few stations, as are temperatures pre-1900. Overall, Figure 1 has an upslope which is visually quite similar to CRU Figure 3, which seems to provide initial verification of CRU accuracy. Remember, in a previous post, we recreated the CRU gridded average from the code and data presented. – Not a bad match between Figures 3 & 1 I think.

GHCN provides adjusted versions of its data as well. The adjusted version supposedly corrects for steps, biases in time of day and such. Unfortunately, not all stations have an adjusted version. Below is an unweighted average of the 917 available adjusted data stations from GHCN which match the CRU codes released.

gchn adjusted cru — Figure 4- Stations adjusted by GHCN

It’s odd that the upslope since 1970 basically disappears from adjusted data. Not at all what we’ve grown to expect from climate science in general, but again it may have to do with regional differences on these selected stations and the net correction may still be positive.

Figure 5 is the station count per year for the adjusted data in Figure 4. Note the huge drop off in stations in recent years, such a serious dropoff has to have an affect station trend but of course we don’t know how much yet. It certainly could be responsible for some of the differences between Figures 4 and 1 but it is a pretty big difference.

So the next question I had was: How representative are 1643 CRU choices, of the original 4495 individual GHCN surface station ID’s available? Again, these are not area weighted and some necessary corrections are missing, so the trend itself is not particularly reliable. Also certain areas (such as the US) are guaranteed to be oversampled in relation to the ‘developing’ nations (hahaha, can’t resist) but it probably gives some indication of what the subsample should look like.

all gchn raw CRU — Figure 6 - All GHCN - UNADJUSTED RAW station data.

The upslope in Figure 6 is again non-existent compared to the stations CRU chose to use. It doesn’t mean a huge amount at this point without area gridding taken into account, but it’s a little fishy looking to me that there is such a large difference between the chosen GHCN and the available GHCN dataset. It’s also interesting that the adjusted version by scientists at GHCN (Figure 4), doesn’t produce the same hockeystick as the raw data selected from CRU. The GHCN raw (Figure 6) matches Figure 4 adjusted much better than CRU Figures 1 or 3.

Station count (by averaged station ID), for all GHCN.

all raw station count GHCN CRU — Figure 7 - Station Count by ID - All GHCN

Finally, just to check the total of the available adjusted GHCN version I plotted Figure 8.

The Adjusted data has a bit of a downslope compared to the the raw again due mostly to too few series in history, but besides that it’s far more similar to raw than the CRU selections. Next step should be gridded averaging of the above data to see what results we get.

I’ve copied the code below in the first comment, WordPress doesn’t allow uploading a linked text file so you need to fix the fancy quotes and download GHCN data to make it work.

45 thoughts on “CRU #2 – Why we look at data and code”

Jeff Id says:

December 30, 2009 at 4:48 pm

The code incorporates work from Steve McIntyre and Ryan O. It’s not the cleanest thing I’ve written but if you download the GHCN to your computer and fix the fancy quotes created by wordpress, it should run turnkey.

source("http://www.climateaudit.info/scripts/utilities.txt")
ff=function(x) { filter.combine.pad(x,truncated.gauss.weights(23) )[,2] } plt.map=function(dat, coord=sat.coord, rng=.5, divs=1000, labels=map.lbls, norm=FALSE, hlite=NA, hcol="#99FF33", mbx=TRUE, mxs=FALSE, x=-.3, y=-.35, sl=1, st=1, sp=1, spl=1, s.main=1, s.sub=1, s.axis=1, cols=NA, legends=NA, sides=0, ppch=15, offs=0, ci=FALSE, ci.dat=NA, ci.col=rgb(.8, 1, .5, .7), ci.pch=13, ci.sp=1, add.pt=NA, add.col="black", add.pch=15) { library(mapproj) ### Set the offset dat=dat-offs ### Set up Antarctic projection and lat/lon lines #par(mar=c(2,2,3,4)) map("world", proj="azequidistant", orient=c(-0, 0, 0)) a=seq(-180, 150, 30) for(i in 1:length(a)) {lines(mapproject(list(x=rep(a[i], 181), y=c(-90:90))), col="gray", lty=3)} a=c(-180,0) for(i in 1:length(a)) {lines(mapproject(list(x=rep(a[i], 181), y=c(-90:90))), col="gray", lty=1)} a=seq(-50,-80,-10) for(i in 1:4) {lines(mapproject(list(y=rep(a[i], 72), x=seq(-177.5, 177.5,5))), col="gray", lty=3)} ### If no user provided color scale, make the color scale if(is.list(cols)) {mapcolors=cols} else {mapcolors=set.colors(rng, divs, hlite, hcol, sides)} ### Normalize the data to a maximum of +/- 1? nm=ifelse(norm==TRUE, max(abs(as.vector(dat))), 1) ### Assign the color scale if(!is.list(cols)) { start.pos=ifelse(sides==0, (divs+2), 1) start.pos=ifelse(sides==-1, length(mapcolors), start.pos) temp=start.pos+(round((divs+1)*(dat/(rng*nm)))) temp=ifelse(templength(mapcolors), mapcolors[length(mapcolors)], mapcolors[temp]) } else {temp=cols} ### Plot points xy=mapproject(coord[, 1], coord[, 2]) points(xy$x, xy$y, pch=ppch, cex=sp, col=temp) ### Confidence interval overlay? if(ci==TRUE) { ci.pts=ifelse(abs(dat)-ci.dat0) { smask[jj]=TRUE } jj=jj+1 } sum(smask)#1643 stations found rawcoord=array(0, dim=c(length(sid),2) ) j=1 for (i in sid) { mask=tinv[,2]==i rawcoord[j,2]=tinv[mask,5][sum(smask)] rawcoord[j,1]=tinv[mask,6][sum(smask)] j=j+1 } dt= rep(.5,27) plt.map(dt,rawcoord,labels=c("","GHCN Raw Stations")) ######### adjusted stations rawsta=rep(0,sum(smask)) index=rep(0,sum(smask)) ln=sum(smask) allrawar=array(NA,dim=c(160*12,ln)) alladjar=array(NA,dim=c(160*12,ln)) combin=0 for(i in 1:sum(smask)) { sta = getstation(as.numeric(cruid[smask,1][i])) rawsta=rowMeans(sta[[1]],na.rm=TRUE) rawsta=window(ts(rawsta,start=min(time(sta[[1]])),deltat=1/12),start=1850,end=c(2009,12)) if(!is.na(sta[[2]])) { adjsta=rowMeans(sta[[2]],na.rm=TRUE) adjsta=window(ts(adjsta,start=min(time(sta[[2]])),deltat=1/12),start=1850,end=c(2009,12)) }else{ adjsta=NA } index=as.integer(((time(rawsta)-1850)*12+.02)+1) allrawar[index,i]=rawsta index=as.integer(((time(adjsta)-1850)*12+.02)+1) alladjar[index,i]=adjsta print(i) } allraw=ts(allrawar,start=1850,deltat=1/12) alladj=ts(alladjar,start=1850,deltat=1/12) araw = calc.anom(allraw) dimnames(araw)[[2]]=cruid[smask,1] aadj = calc.anom(alladj) traw=rowMeans(araw,na.rm=TRUE) traw=ts(traw,start=time(araw)[1],deltat=1/12) traw=calc.anom(traw) plt.avg(ff((traw)),main.t="1643 GHCN Stations Used in CRU\nUnweighted Average",st=1850) #savePlot("c:/agw/gchn raw CRU.jpg",type="jpg") stct=array(NA,length(traw)) for( i in 1:length(stct)) { stct[i]=sum(!is.na(allraw[i,])) } stct=ts(stct,start=1850,deltat=1/12) plot(stct,main="Raw Station Count",ylab="Count",xlab="Year") #savePlot("c:/agw/raw station count GHCN CRU.jpg",type="jpg") cc=colMeans(aadj,na.rm=TRUE) sum(is.na(cc)) #adjusted gh tadj=rowMeans(aadj,na.rm=TRUE) tadj=ts(tadj,start=time(aadj)[1],deltat=1/12) plt.avg(ff(tadj),main.t="917 Remaining Stations Adusted by GHCN Used in CRU\nUnweighted Average",st=1850,en=2010) #savePlot("c:/agw/gchn adjusted cru.jpg",type="jpg") stct=array(NA,length(tadj)) for( i in 1:length(tadj)) { stct[i]=sum(!is.na(aadj[i,])) } stct=ts(stct,start=1850,deltat=1/12) plot(stct,main="Adjusted Station Count",ylab="Count",xlab="Year") #savePlot("c:/agw/adj station count GHCN CRU.jpg",type="jpg") ########################################################## ########################################################## ####### all ghch ########################################################## ########################################################## rawsta=rep(0,length(sid)) index=rep(0,length(sid)) ln=length(sid) allrawar=array(NA,dim=c(210*12,ln)) alladjar=array(NA,dim=c(210*12,ln)) combin=0 for(i in 1:ln) { sta = getstation(as.numeric(sid[i])) if(max(time(sta[[1]]))>1800) { rawsta=rowMeans(sta[[1]],na.rm=TRUE) rawsta=window(ts(rawsta,start=min(time(sta[[1]])),deltat=1/12),start=1800,end=c(2009,12)) if(!is.na(sta[[2]])) { adjsta=rowMeans(sta[[2]],na.rm=TRUE) adjsta=window(ts(adjsta,start=min(time(sta[[2]])),deltat=1/12),start=1800,end=c(2009,12)) }else{ adjsta=NA } }else{ rawsta=NA adjsta=NA } index=as.integer(((time(rawsta)-1800)*12+.02)+1) allrawar[index,i]=rawsta index=as.integer(((time(adjsta)-1800)*12+.02)+1) alladjar[index,i]=adjsta print(i) } allraw=ts(allrawar,start=1850,deltat=1/12) alladj=ts(alladjar,start=1850,deltat=1/12) araw = calc.anom(allraw) dimnames(araw)[[2]]=cruid[smask,1] aadj = calc.anom(alladj) traw=rowMeans(araw,na.rm=TRUE) traw=ts(traw,start=time(araw)[1],deltat=1/12) traw=calc.anom(traw) plt.avg(ff((traw)),main.t="All 4495 GHCN Station ID's\nUnweighted Average",st=1950) #savePlot("c:/agw/all gchn raw CRU.jpg",type="jpg") stct=array(NA,length(traw)) for( i in 1:length(stct)) { stct[i]=sum(!is.na(allraw[i,])) } stct=ts(stct,start=1800,deltat=1/12) plot(window(stct,start=1850),main="Raw Station Count",ylab="Count",xlab="Year") #savePlot("c:/agw/all raw station count GHCN CRU.jpg",type="jpg") cc=colMeans(aadj,na.rm=TRUE) sum(is.na(cc))#2705 #adjusted gh tadj=rowMeans(aadj,na.rm=TRUE) tadj=ts(tadj,start=time(aadj)[1],deltat=1/12) plt.avg(ff(tadj),main.t="2705 Remaining Stations Adusted by GHCN\nUnweighted Average",st=1850,en=2010,y.pos=1.75) #savePlot("c:/agw/all gchn adjusted cru.jpg",type="jpg") stct=array(NA,length(tadj)) for( i in 1:length(tadj)) { stct[i]=sum(!is.na(aadj[i,])) } stct=ts(stct,start=1850,deltat=1/12) plot(stct,main="Adjusted Station Count",ylab="Count",xlab="Year") #savePlot("c:/agw/adj station count GHCN CRU.jpg",type="jpg")

Reply
curious says:

December 30, 2009 at 5:51 pm

Jeff – check the scale on fig 7? I know it’s the New Year, but 2060 already?! 🙂

Reply
PeterS says:

December 30, 2009 at 6:15 pm

Is this just another example of how they are cherry picking?

Reply
Jeff Id says:

December 30, 2009 at 6:18 pm

#2, Man you guys are so picky :D. I fixed the plot.
——-

Actually, considering this and the emails, it doesn’t look to good for CRU to me. It will take a lot of downsloped stations all in one area to compensate for such a large uptrend. Now the US should help but wow.

It’s a big clue IMO but without finishing the job, there isn’t much to say beyond WUWT.

Reply
vjones says:

December 30, 2009 at 6:25 pm

Great analysis Jeff. It will be interesting to see the gridded versions. It is good to see the unadjusted data charted particularly Figure 6.

I’ve been working with the GHCN data in database form. The adjustments are quite horrendous in many cases, and the number of stations adjusted after 1990 is much much lower (even as % of available data), which suggests this is one key part of the warming after 1990 – i.e. that the adjustment alters the prior data, enhancing the post-1990 warming, at least that is what it looks like. I hope to have a series of posts, mostly guest posts, on it soon.

Reply
Pingback: Mother nature doesnt believe lies « TWAWKI
Mescalero says:

December 30, 2009 at 7:53 pm

I’m kind of new to these discussions. Where can I find a mathematical definition of the “temperature anomaly” shown in Figures 1, 3, 4, 6 and 8?

Reply
Sean Houlihane says:

December 30, 2009 at 8:36 pm

Fig 7. has the wrong x-axis.
A thought: maybe the 1998 peak affected either the station selection, or some of the corrections. Your graphs appear to show some divergence there…

Reply
Jeff Id says:

December 30, 2009 at 8:46 pm

#7, Anomaly is a method for removing the seasonal effects. In the case of monthly data, you subtract the average Jan from all Januaries, avg Feb from all Feb’s, all all March’s…. The name leaves something to be desired.

#8 The X axis of fig seven was fixed before you posted. What do you mean? I’m also lost a to the “divergence there” comment.”

Reply
Dave L. says:

December 30, 2009 at 9:34 pm

I think “adjusted” data should be completely ignored. Only work with raw data. If you have to adjust the raw data, it ain’t science.
Also, it would be great if it were possible to work with only rural sites. It urban heat island anomaly could be removed, warming would likely completely disappear. I reside 15 miles from the closest city; on clear winter nights, it is 5-9F colder at my home than reported from the airport at the nearby city.

Reply
Steve Hempell says:

December 30, 2009 at 10:08 pm

Jeff

I don’t know enough about statistics/computer code to know but are you not pursuing somewhat the same idea as Eugene at http://justdata.wordpress.com/. Your figure 6 looks remarkably like his logo which is his analysis depicted in the 1st chart. His analysis of all the GHCN is quite stunning in it’s implications.

By the way, it is nice to see these kind on analysis going on. I’m kind of tired of seeing the discussions of e-mails and FOIA. Seem to just result in food fights.

Reply
boballab says:

December 30, 2009 at 10:12 pm

@#10

Thats Ok Phil Jones has a problem figuring out UHI and how it works.

Chris – I presume the Met Office continually monitor the weather forecasts. Maybe because I’m in my 50s, but the language used in the forecasts seems a bit over the top re the cold. Where I’ve been for the last 20 days (in Norfolk) it doesn’t seem to have been as cold as the forecasts.

I’ve just submitted a paper on the UHI for London – it is 1.6 deg C for the LWC. It comes out to 2.6 deg C for night-time minimums. The BBC forecasts has the countryside 5-6 deg C cooler than city centres on recent nights. The paper shows the UHI hasn’t got any worse since 1901 (based on St James Park and Rothamsted).

As shown Phil can’t figure out that in the real world the people doing the nighly broadcasts are using the “raw” data and that is why it is 5-6 deg C cooler in the rural areas, it hasn’t been adjusted yet to fit his paper.

Also notice how he claims that the UHI for London hasn’t changed since 1901? Here I thought that Asphalt parking lots, Airconditioners, HVAC units, Automobiles and paved roads were more modern then that. Thanks Phil for showing me that they had all that in 1901!

Reply
Jeff Id says:

December 30, 2009 at 10:42 pm

#11, I agree about the emails, but what should we do when the premiere science journals like Nature are making claims that FOIA takes too much time. Where do we go when climate scientists claim ‘hide the decline’ is out of context? How do we address the fact that we’re being lied to from every direction and ignorant money stealing policies are being recommended as a replacement for logic? They’re flat lies and equally as immoral as Mann08.

Its too much so I hide in math.

I very much doubt that gridding will fix the difference in these datasets.

Reply
Steve Hempell says:

December 30, 2009 at 11:16 pm

Jeff

I’m very happy that you, and others are hiding in the math. I think that this is where the most important “stuff” is.

I just got tired of Susann and Norbert.

I think it would be interesting to see an exchange of ideas between yourself and Eugene. He seems keen to have input.

Much appreciate your work – it could be instrumental in getting back to science.

Reply
James A. Donald says:

December 31, 2009 at 1:26 am

This looks like a hockey stick, Possibly the hockey stick comes from urban stations.

Rural proxy information appears to indicate no twentieth century hockey stick

Reply
Pingback: Cosmetic Surgery is the Best Way to Look Younger and Charming | TRANSNATIC.com
Nick Stokes says:

December 31, 2009 at 2:23 am

Jeff,
Do you have any indication that CRU used GHCN adjusted data for anything? I’m not aware of any.

My understanding of the station drop-off with GHCN adjusted data, and the corresponding lack of post-1970 warming, is this. The GHCN adjustment algorithm, which eschews metadata, requires a run of years on each side of a data point to be adjusted, which means that there is virtually no GHCN-adjusted data post-1994. However, for the US they use USHCN adjustment, which uses metadata. So all (I think) post-1994 data is US. That’s why there is a rapid drop to about 100 stations in about 1994, then a much slower drop.

So the adjusted data is virtually useless for recent trends in global temp, and the Peterson paper included with the data does not recommend that it be used for that purpose.

Reply
Pingback: Data Warehouse And Its Applications In Agriculture | TRANSNATIC.com
Rich says:

December 31, 2009 at 7:56 am

Jeff #8
As I understand it, anomalies are calculated by averaging the monthly temperatures for the years 1961-1990 then subtracting that value from each monthly temperature. The details are here: http://data.giss.nasa.gov/gistemp/

It was James Hansen’s original idea that actual temperatures are pretty meaningless (except for a single observer making a single observation) but that “temperature anomalies are strongly correlated out to distances of the order of 1000 km” so trends in anomalies tell us something useful that trends in actual temperatures don’t.

All of this may or may not be true.

Reply
Jeff Id says:

December 31, 2009 at 9:40 am

$19, I apologize for miswriting above. It was so bad, I went back corrected my comment which was to divide by average temp and makes no sense. You are correct of course that you subtract the average from the temperature over the time period of each month.

As far as the meaning of temperature, anomaly temp is important as it allows seasonal changes to be substantially removed. If you use temperature itself to determine trend, you can get almost the same results if you make sure your start and end date are at the same point of the year but it’s not a correct approach to determining climate trend.

Minimizing the sum of squares of the error between the fitted line and the temp curve means that seasonal data far from the fitted line i.e. the hottest point in the winter and summer, have different weighted effects than points closer to the fitted line. This means that the results, while similar, won’t be a perfect match to the anomaly fit.

Besides that point, if you fit a line to the sine wave-ish raw temperature data you get a strong sine wave residual which is highly autocorrelated, meaning you have a poor model fit. Another appropriate solution might be to fit a sine wave to the actual temperature data. You would get a better fit than a simple line, with less autocorrelation in the residuals, but mathematically it’s almost the same as an anomaly style calculation.

Reply
Peter Dunford says:

December 31, 2009 at 9:50 am

Giss use 51-80 for their base period. Selection of base period can have a small effect on the size of anomaly generated, which might explain in part why GISS is usually higher.

Perhaps an item for further into your project, but how many CRU stations have the airport flag? We know from Chiefio that airports do sterling work raising GISS’s average temperatures.

Reply
Paul Maynard says:

December 31, 2009 at 10:16 am

A few questions from a lurker.

I thought anomalies were used so that you can compute a so called GAT. Otherwise, it would be difficult to compare the South Pole with the Equator.

Jeff, I have seen elsewhere on other blogs that GISS uses 60 – 90 as a reference because it enhances the warming effect. I have also asked why we do not use much longer periods as the reference “average” say 1900 -2000 but have not got an answer.

I recently did a lecture to a group of insurance underwriters. One of my standard features is to go to the GISS site and first pick Amundsen which shows no real trend and then ask the audience to pick places at random provided there is a reasonable station length – say 50 years. Many are surprised at the absence of runaway global warming.

Keep up the good work.

Cheers

Paul

Reply
Don B says:

December 31, 2009 at 12:08 pm

If climate “scientists” cherry-pick starting dates to try to make USA state high temperature records show recent warming, rather than show the 1930s as the decadal record holder of high temperature records, then there should be no surprise about duplicity among members of the AGW cult.

Click to access More_Critique_Of_Ncar_Cherry_Picking_Tempeature_Record_Study.pdf

Reply
Luvmy91stang says:

December 31, 2009 at 1:59 pm

Admittedly, I’m no programmer. Is the code you posted for your gridded map project or for the graphs in the blog post? It appears to be for the gridded map project. If this is correct could you post the code for these graphs?

Reply
vjones says:

December 31, 2009 at 1:59 pm

Jeff,
I’ve had another look at this. Figures 4 and 8 are identical; assuming this is not an error, the 917 adjusted station data is therefore highly representative of all the adjusted data, despite the fact that it is only a quarter of the stations. I find this a bit fishy given the magnitude of some of the adjustments (see here).
Leaving that aside for a moment, Figures 1 and 3 are quite similar – It would be noce to see these on the same axis. Does this suggest that any adjustments CRU makes to raw GHCN data are minimal and that the missing 5% (of unidentified stations) doesn’t matter? So is it just their CHOICE of stations (and adjustments – if any) that create the uptick?

Reply
Jeff Id says:

December 31, 2009 at 2:17 pm

You’re right. I’ve updated the last graph and post.

#17 I think is correct about usage of the adjusted data. I don’t think it’s done well.

Reply
Jeff Id says:

December 31, 2009 at 2:18 pm

#25, as to the choice of stations. In the case of the unweighted average, their choice of stations definitely made an upslope. We don’t really know how much effect their choice has quite yet though as we have to area grid and offset anomalies to make a proper timeseries.

Reply
Jeff Id says:

December 31, 2009 at 2:20 pm

#24, I share all code from my R posts but don’t know which ‘gridded map project’ code you’re looking for.

Reply
Luvmy91stang says:

December 31, 2009 at 2:33 pm

Jeff @ #28

Possibly I was confused by this code segment:

### Set up Antarctic projection and lat/lon lines
“…”

At any rate, I tried to run this code and it’s a no go. At first I thought it might be the quotes problem but then I noticed your not reading in the actual data files. At least not that I can find. Not in the code in comment # 1 or in CA’s utilities.txt either. I thought maybe you had posted the wrong code, but like I said, I’m not a programmer. What am I missing here?

Reply
Jeff Id says:

December 31, 2009 at 3:19 pm

#29 WordPress doesn’t seem to like the read file lines. I recopied the whole script and got the same result. It’s the first time I’ve seen that problem, I uploaded the R file to mediafire. You shouldn’t need to reformat.

http://www.mediafire.com/file/mzgvtgizrqy/GHCN Reader and CRU comparison.R

The gridded comments are because this code is a revision of the Antarctic GHCN code.

Reply
Kenneth Fritsch says:

December 31, 2009 at 3:48 pm

Jeff ID, I am in the process of attempting to determine what you have presented in the introduction to this thread means in terms of what I attempting to do with KNMI data linked below:

http://climexp.knmi.nl/getstations.cgi

As I noted earlier I was working with KNMI text data for adjusted GHCN and all GHCN station data and putting it into table form. I have just about completed that task, but in the meantime I am attempting reconcile what you have presented as the current number of CRU stations and what I found at KNMI. I posted this link before and it shows the historical record of CRU station numbers. From KNMI I get a total of adjusted 4471 GHCN stations over historic times and with at least 10 years of data and a total of 7295 GHCN unadjusted stations with the following explanation from KNMI:

The data comes in two flavours, adjusted and all (unadjusted). The first set has been corrected for urban effects and other biases by comparing urban stations to nearby rural stations. The adjusted data are only available for a few countries.
I am not sure from this help note in KNMI whether the adjusted stations are included in the all or unadjusted stations categories as unadjusted stations or that the unadjusted and adjusted stations have no common stations.

If we assume that CRU is using mainly GHCN data, I would think the low number of GHCN stations you show over the historic time period of interest does not agree with what KNMI has for CRU. Some of the difference may be attributed to how a station is qualified for counting but I doubt that the large differences noted here are explained that way.

In your graphs, Jeff, I do not understand what is meant by raw station count and unadjusted station count for GHCN (I assume) and the 917 remaining stations adjusted by GHCN used in CRU unweighted average.

I would like to understand any real and/or my confusion of the differences between these data sources.

In looking at trends I think it can be more insightful to look at different time periods and compare individual station trends. Regional trends should be of interest also.

It is my understanding that GISS and GHCN should be using similar raw data and any differences would be due primarily to their independent(?) adjustments which should also be case for CRU to GHCN and CRU to GISS. If all the so-called independent data sets for temperature can be shown to be using essentially all the same raw data then the only independence of these data sets would be the adjustments. Thus one can reasonably look at the raw data for sources of uncertainty and then again at differences between data sets for statistical differences between adjustments. And all this should be done at regional and individual station levels and segmented time periods. Looking at the data sets averaged over the entire historical record time period and doing it globally does not cut it with me.

Reply
Jeff Id says:

December 31, 2009 at 5:39 pm

Kenneth,

I haven’t had time to study KNMI station info yet. There is just too much to learn, however, IMO there’s no way KNMI has a substantially different set of surface stations. As far as the number of stations, quite a few of the 7000 GHCN series are actually the same data. It’s like they receive a different set from somewhere and instead of looking it over, they just incorporated it.

What we really, really need is a comprehensive review of the thermometer data. It’s absolutely astounding that scientists don’t demand it. We need to have a post about that here. Why don’t the ‘scientists’ care about an accurate and open review of the friggin’ thermometers. Even a half day messing with them and you find they are a complete mess.

Reply
Steve McIntyre says:

December 31, 2009 at 6:34 pm

Jeff, KMNI is merely a data aggregator for station data. No need to worry about it.

Reply
Steve McIntyre says:

December 31, 2009 at 6:56 pm

Jeff, the code is incomplete somewhere. As it sits, for example, dat, sid are undefined. And the initial function plt.map doesn’t seem to terminate. Cheers, Steve

Reply
Jeff Id says:

December 31, 2009 at 8:15 pm

Steve,

Ok, I’ll have to figure out where the code is messed up. The lines I uploaded in #30 are exactly what I ran but the persistent variables in R can sometimes get confusing.

Reply
Kriek says:

January 1, 2010 at 9:02 am

I’m no expert but I’m curious about this anomaly thing. The main issue with picking temperatures on a monthly basis is that all seasons aren’t equal, depending on the direction of sea currents the effects of a season can drift. To me it’s curious that GISS picks a period where the PDO is going one particular way as their baseline.

Reply
Jeff Id says:

January 1, 2010 at 11:52 am

I’ve gone back cleared all objects and updated the software so that it should work now.

http://www.mediafire.com/file/twgybmfmmyj/GHCN Reader and CRU comparison.R

Reply
stan says:

January 1, 2010 at 8:29 pm

Jeff,

Have you checked this out? http://www.warwickhughes.com/blog/?p=405

Reply
Jeff Id says:

January 1, 2010 at 8:46 pm

#38, I had not seen that.

Reply
Kenneth Fritsch says:

January 2, 2010 at 6:32 pm

Jeff ID, that graph from Hughes for CRU station numbers does not agree with I have for CRU from KNMI or your numbers for GHCN. Please note that Hughes graph axis for years is not to scale but simply denotes Jones’ versions.

It will be important that we can agree on station counts for CRU and GHCN. Remember that CRU claims most of their station data comes from GHCN, but since that outfit seems to derive pleasure in confusing people we must ask the question when was most of the CRU data from GHCN: forever, for the past 50 years, 20 years , last year??

Reply
Luvmy91stang says:

January 2, 2010 at 7:41 pm

Jeff, could you throw your stationid.txt (CRU station data)file up somewhere. I can’t tell from your code what is needed there. The data at CRU is sparse and the CA links are broke since the move.

Reply
Luvmy91stang says:

January 5, 2010 at 2:33 am

Bump for 41

Reply
NicL says:

January 9, 2010 at 5:37 pm

Jeff, it has taken me a while to get round to trying to get the data and run the code per your very constructive, excellent post. Having done so I do have a minor few comments on the code, which hopefully will help others to run it.

First, three of Steve M’s other filter utility functions seem to be required, as follows:

filter.combine.pad=function(x,a,M=NA,method=”norm”) {
temp<-!is.na(x)
w<-x[temp]
N<-length(w)
if(is.na(M)) M<-trunc (length(a)/2)+1
if(method=="norm") w<-c (rep(mean(w[1:M]),M),w,rep(mean(w[(N-M+1):N]),M) )
#if(method=="reflect") w<-c ( w[M:1],w,w[M:(N-M+1):N])
if(method=="mannomatic") {w<-c( c (w[1]-(w[(M+1):2]-w[1])),w, c(w[N]- (w[(N-1):(N-M)]-w[N]) ) ) }
if(method=="mann2") w<-c (rep(w[1],M),w,rep(w[N],M) )
y<-filter(w,a,method="convolution",sides=2)
y<-y[(M+1):(length(y)-M)]
z<- rep(NA,length(x))
z[temp]<-y
if(class(x)=="ts") z<-ts(z,start=tsp(x)[1],freq=tsp(x)[3])
filter.combine.pad<-cbind(x,z)
#filter.combine<-y[,2]
filter.combine.pad
}

truncated.gauss.weights<-function(year) {
a<-gaussian.filter.weights(2*year+1,year)
temp0.05*max(a))
a<-a[temp]
a<-a/sum(a)
truncated.gauss.weights<-a
truncated.gauss.weights
}

gaussian.filter.weights<-function(N,year) {
#N number of points; year – is number of years
sigma<-year/6
i<-( (-(N-1)/2):((N-1)/2) ) /sigma
w<- ((2*pi)^-0.5) * exp(-0.5* i^2)
gaussian.filter.weights<-w/sum(w)
gaussian.filter.weights
}

Also, I couldn't get these lines of the code (from the download link in #30) to work :

rawcoord[j,2]=tinv[mask,5][sum(smask)]
rawcoord[j,1]=tinv[mask,6][sum(smask)]

but all the calculations worked fine if I modified them slightly, as follows:

rawcoord[j,2]=(tinv[mask,5])[[1]]
rawcoord[j,1]=(tinv[mask,6])[[1]]

Two other, small, points: I think there is a typo and st should be 1850, not 1950, in

plt.avg(ff((traw)),main.t="All 4495 GHCN Station ID's\nUnweighted Average",st=1950); and

the number of adjusted station for Fig.8 is 1790 not 2705 [missing ! in sum(is.na(cc))]

I also got an error from plt.map(dt,rawcoord,labels=c("","GHCN Raw Stations")) due to absence of the set.legend function; this only affected the map of station locations.

Thanks again for providing such an illuminating post and for sharing the very useful R script.
Nic

Reply
Pingback: TWAWKI » 2009 Climate Gongs and Awards
Antonia Guida says:

December 17, 2015 at 3:55 am

Oh my God. I love Tori Black, she is so hot. Pondering if she is still looking for someone.

Reply