CRU #3 – The next step

This post has a problem. I don’t believe it’s an error on my part, but nothing is ruled out yet. I’ve mentioned the problem in the comments: the graphs hook down too much at the end, and it doesn’t seem related to short stations or steps in the data as I had surmised below. Despite several hours of looking I haven’t isolated the cause, although when using all series rather than only the longest ones, I get much more CRU-ish plots from the data. There will be an update this afternoon, but I need to run several more 20-minute runs to check out a hunch. -Jeff

————

Today I’ve worked on the GHCN global datasets. This is a work in progress, but I’ve wanted to do it for some time. The work of looking at global temperature came about because Dr. Phil Jones has alleged he made a good dataset; after climategate I wouldn’t trust Jones to hold onto a dollar bill in a candy store. What I’ve learned over the week regarding the surface temperature data is amazing. After referring to the Mann08 data as a box of old socks, I find the surface stations are just as bad. Instead of going forward as I naively hoped, I’m going to have to back up and do more QC to get anything useful from the data, and even then it will be difficult to be comfortable with it. You really have to take a moment and think about what that last sentence means for climatology. This project is just beginning, but rather than report nothing, we should look at a bit of what I’ve found.

In a previous post I averaged all the GHCN data to get the curve in Figure 1.

Figure 1 - all data unweighted.

This is a very different curve from what CRU normally presents, as reproduced in the Figure 2 plot.

Figure 2 - Gridded data using selected stations from CRU

Figure 1 isn’t from gridded data so the spatial distribution of temperature stations can affect trends quite a bit. Therefore it was important to first grid the data before averaging the grids into a global mean.

To create a gridded average, I plotted every single data series as it was added in. This led to the realization that there are a LOT of stations with steps in their anomaly curves, the steps themselves being the true anomalies in the sensor readings. These are probably caused by moving stations, adding blacktop next to the sensors, adding air conditioning units, buildings, concrete, switching instruments, instrument failure, etc. So much of this data is complete junk that it’s hard to describe. But I plowed on anyway.

Figure 3 is a plot of the grid-weighted average of all the station IDs used in Figure 1. The data is collected into 5×5 degree grid cells and averaged within each cell; after that, the cells are averaged together across the globe to create the trend. The grid cells are weighted by the cosine of their latitude before averaging so that the correct area for each block is used.

This means that the one thermometer which may or may not exist in Muslim Somalia is weighted according to its grid area rather than against the hundreds of thermometers that probably exist in the grid block containing New York. There is one other difference between Figure 1 and Figure 3 besides the gridding: I’ve modified the getstation() function to grab the longest existing data series for each station ID. Remember, in the GHCN data each thermometer station can have several different sets of data, sometimes from the same thermometer and sometimes from a different one, even though they carry identical ID numbers. This means that quite a few series, many of which may be identical copies of the same data (we don’t know), have been discarded in favor of the longest series.
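For readers who want to reproduce the approach without digging through the full script, here is a minimal sketch of the gridding and weighting just described (a sketch only, not the posted code; the data frame ‘anom’ and its column names are placeholders):

# Sketch only: 'anom' is assumed to be a data frame of station anomalies with
# columns id, lat, lon, year, anomaly (hypothetical names for illustration).
grid_average <- function(anom) {
  # assign each station to a 5x5 degree cell
  anom$cell <- paste(floor(anom$lat / 5), floor(anom$lon / 5))

  # average the stations within each cell for each year
  cell_means <- aggregate(anomaly ~ cell + year, data = anom, FUN = mean)

  # weight each cell by the cosine of its mean latitude (area correction)
  cell_lat <- aggregate(lat ~ cell, data = anom, FUN = mean)
  cell_means <- merge(cell_means, cell_lat, by = "cell")
  cell_means$w <- cos(cell_means$lat * pi / 180)

  # cosine-weighted global mean for each year
  sapply(split(cell_means, cell_means$year),
         function(d) sum(d$w * d$anomaly) / sum(d$w))
}

The longest-series selection from getstation() happens before this point, so each station ID contributes a single series.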

Figure 3

Figure 3 looks a lot more like Figure 1 than the CRU version in Figure 2. I’m not claiming it’s better (there are an awful lot of short-term features missing from the curve), but it is apparently what the raw GHCN data shows.

The station metadata comes with a flag for each station describing the size of the town it’s near. The classifications are U – Urban, R – Rural, or S – Small town. Rural means not associated with a town of 10,000 or greater population, S is 10,000–50,000, and U is everything bigger. This gives an opportunity to do a couple of different plots based on population. Anthony at WUWT would warn us that just because it says Rural doesn’t mean nobody put a road, runway or AC unit right by the thermometer, but it’s worth a check.

Figure 4 - Urban Station Gridded Data
Figure 5 - Rural Station Gridded Data
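The urban/rural split behind Figures 4 and 5 amounts to filtering on that flag before gridding. A rough sketch, assuming a hypothetical inventory data frame ‘inv’ with columns id and pop_class holding the U/S/R codes:

# hypothetical station inventory with the population flag
urban_ids <- inv$id[inv$pop_class == "U"]
rural_ids <- inv$id[inv$pop_class == "R"]

# grid and average each subset separately (grid_average() from the sketch above)
urban_series <- grid_average(anom[anom$id %in% urban_ids, ])
rural_series <- grid_average(anom[anom$id %in% rural_ids, ])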

As I wrote above, a lot of the data has large and obviously erroneous steps in it. One QC step which can be handled automatically by an unpaid engineer’s laptop is to remove data with steps in it. R offers a couple of functions for the detection of steps; unfortunately they run so slowly you wouldn’t see this post for the next two weeks. Therefore, I wrote a crude step detector to weed out stepped series. It works by fitting a line to a window of the data to check the slope; the window is shifted and another line is fit. If the slope of any window exceeds a threshold amount, the series is assumed to have a step in it and is not used. A large number of bad series were thrown out using this method, and a couple of good ones too. Of course the point of all this is to see how much effect the steps are having on the result.
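In outline, the detector looks something like this (a sketch only; the window width and slope threshold here are arbitrary placeholders, not the values used for Figure 6):

# Screen a monthly anomaly series 'x' for step-like jumps by fitting a line to a
# sliding window and flagging any window with an extreme local slope.
has_step <- function(x, width = 24, threshold = 0.5) {
  if (length(x) <= width) return(FALSE)
  t <- seq_along(x)
  for (start in seq(1, length(x) - width, by = width %/% 2)) {
    idx <- start:(start + width - 1)
    ok <- !is.na(x[idx])
    if (sum(ok) < width / 2) next                    # skip mostly-missing windows
    slope <- coef(lm(x[idx][ok] ~ t[idx][ok]))[2]    # local slope within the window
    if (abs(slope) > threshold) return(TRUE)         # too steep: treat as a step
  }
  FALSE
}

Series flagged by has_step() are simply dropped before gridding.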

Figure 6 - All data step sorted

Figure 6 matches Figure 3 pretty well. The signal is reasonably consistent even when large chunks of stepped data are thrown out. Again, this does not match CRU very well, and I wonder how they chose their station subset. One of the themes at Climate Audit has always been questioning the rules used to choose one dataset over another. In this case, the choice seems to make a big difference in the result, and the subset chosen by CRU doesn’t match the bulk trend I find in the global surface data. Don’t forget that this data is land only; a truly global record needs to average the land network with sea temps to produce a complete global trend.

There is more to the story in this data, including the lack of availability of stations in recent years which may affect the trend in ALL of the graphs above more than we expect. A bad situation considering the money we’re being asked to spend on climate.

The code for this post is getting pretty big now and can be downloaded here – http://www.mediafire.com/file/tydndellqwj/GHCN Reader and CRU comparison 2.R

67 thoughts on “CRU #3 – The next step”

  1. I am amazed and deeply grateful for all the hard slog you’ve put in, to de-tangle the unholy mess that is the CRU data.

    As to why your graphs are so unlike those by CRU, given the same data – CRU couldn’t have turned their graph round by 180 degrees … naaa … they couldn’t have, could they?

    Sorry if this is a bit flippant – but with that CRU lot, who knows! Anything might be possible, except proper analysis …

  2. Jeff, I’m amazed how one single person (you) can program the algorithms needed to grid the data, including corrections for missing or spurious data, in such a short time, when the CRU has spent … what, 20 years … to develop their program. And they have tremendous computer power, not just a PC.
    I wonder why they haven’t cleaned up their databases during that time, knowing how crucial the basic data is.

  3. It would be interesting to see the differences graphed to see where the two computations disagree. At first glance the fall off in your graphs at the end compared to what CRU is showing is almost a “hide the decline” situation (by adjustment).

  4. Hi Jeff,

    Why risk increasing error rather than reducing it? IMHO, we ought to be reluctant to grid the data until such a time as we have a validated climate model ensemble that can model regional (i.e., grid cell) climate change differences within a known error bound. Otherwise no grid weighting scheme can be logically (much less objectively) justified.

    Why should we assume every grid cell is equally important for calculating global climate temperature? What objective justification? For example, if the Northern hemisphere’s climate warmed by 10 degrees and the Southern hemisphere’s climate cooled by 10 degrees would we say the overall global climate temperature remained the same? That would seem arbitrary. A 5 degree temperature change in the Gulf of Mexico might be much more significant than a 5 degree change in the Antarctic. And why weight on area and not some other figure-of-merit such as population density?

    George

  5. I did another analysis, using all the data back to 1702.

    I downloaded the V2.mean file and opened it in Excel. This gave me about 595,000 station-years of data. I eliminated any year that was not complete (i.e. was missing one or more months) and calculated the annual averages.

    I then created a pivot table of stations (13,500 rows) and years (308 columns). I then calculated the temperature change from one year to the next. (This has the advantage of eliminating the need to correct for location; one is only looking at changes, not absolute temperatures. It also has the advantage that any step change (location, instrumentation, concrete, etc.) only affects one year’s data.)

    I then averaged the temperature changes for each year and finally summed them to ‘integrate’ back to anomaly. I also calculated an 11-year (to minimise effects of solar cycles) centred rolling average.
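    For anyone following along in R rather than Excel, a rough sketch of the same first-difference approach (‘tmat’ here is a hypothetical stations-by-years matrix of annual means, with NA where a station-year is incomplete):

    dif <- tmat[, -1] - tmat[, -ncol(tmat)]      # year-over-year change per station
    mean_dif <- colMeans(dif, na.rm = TRUE)      # average change across stations, each year
    mean_dif[is.na(mean_dif)] <- 0               # years with no stations contribute no change
    anomaly <- cumsum(mean_dif)                  # 'integrate' back to an anomaly series
    roll11 <- stats::filter(anomaly, rep(1/11, 11), sides = 2)   # 11-year centred average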

    The results (pdf file) are at:

    http://files.me.com/achantrill/ivkxm3

    and are, I think, interesting.

    Comments, anyone?

    A

  6. I did an analysis of the global temperature anomalies back in 1999 using the cruwlda2 file. It agrees pretty well with Figure 6 above with temperatures in the 1970s being about the same as those around 1900.

    It was not a big project. I used Fortran and from the moment I wrote the first line until it was finished and debugged, it took 4 hours. The program has about 200 lines, including comments. It uses equal area grid boxes rather than 5×5 grids.

  7. I did something similar a couple of weeks back. I seem to recall getting a graph very like your figure 1. If you just sum the raw data together there is a lot of variation pre-1900, but otherwise the trend is pretty much flat.

    It’s interesting that, according to a simple average of the raw data, there is no evidence of global warming – whatsoever. This implies that in those parts of the world, or to be exact those 5deg grid-cells, where we have loads of data, there is no warming. The warming is only happening in places where we don’t have that much data. Only by de-emphasising the data from those places with the best, longest records can we get any warming. Any good scientist, who should always be sceptical, would have to be a little concerned about that step.

    Course, simply taking a global average of all data is clearly wrong.

    I did think of a different approach that should produce a sane global average measurement but avoid gridding: work out the average trend for each country (country codes being available in the GHCN data), then average them together, weighting by country surface area as taken from, say, the CIA World Factbook.

    Now that should provide a legitimate measure of global land temperature, and it should be fairly similar to the CRU land temperature taken from their gridded data. But I wonder if it is.

  8. @Andrew Chantrill – that’s a lovely figure. Would it be possible to use it in a blog post? It fits very well with one I’m writing, commenting on what Jeff is doing and E.M.Smith’s work (e.g. http://chiefio.wordpress.com/2009/08/17/thermometer-years-by-latitude-warm-globe/).

    What is interesting to me is that people are using different means of analysis and getting very similar results (that is always good). The pivot table approach is a good one. I’m involved in a database approach (although I’m not the one doing all the technical stuff). Do email me if you are interested: verityajones [at] gmail.com

  9. Andrew Chantrill,

    I like your method. Gridding sounded stupid to me at first, but it is not as simple as you made it either. You are overweighting and underweighting areas.

    If you have 1000 thermometers in New York and one in Des Moines, you are replicating data to the point that every real .01 degree anomaly in New York has 1000 times the effect of a .01 degree anomaly in Des Moines.

    Keep the ‘anomaly only’ method, but pick only one thermometer per arbitrarily defined area, a 100 mile square for instance. I don’t care where the thermometers are, and I agree that anomaly-only is the way to go, but you have to stop overweighting.

  10. Eyeball observations:
    Fig 3 – warming ca 1890 to 1938, cooling to 1967, warming to 1998, then sharp cooling back to 1850s/1950s levels. Correlation with increasing CO2 concentration probably becomes statistically insignificant.
    Fig 4 – Diminishes 1930s to 1960s cooling; looks more like 100-year warming, in the order of 0.8 degrees C.
    Fig 5 – Emphasizes cooling 1855 to 1910, probably makes the CO2 correlation even worse.

    Relative to the selection used, your use of long records only would eliminate Chiefio’s “march of the thermometers” and AHI effects, while leaving in most UHI effects. This might be the biggest difference between you and CRU after removing positively biased steps.
    How does the rapid warming ca 1967-1998 match earlier solar activity with an appropriate lag for ocean heating effect?

    It looks to me that with all positive biases corrected there is no overall warming at all in the last 150 or so years, but a warming bump that is probably solar. I think you have destroyed the CO2 correlation. Murray

  11. KevinM,

    I’m guilty as charged for overweighting and underweighting areas. But as far as I see, once one starts selecting data one is introducing a bias.

    I also wanted to use all the data available, not just a sub-set.

    And finally, if the supposed warming really is ‘global’ then it shouldn’t make too much difference. (Of course we know what warming there is isn’t global, and is affected by UHI etc)

    I’m not claiming my analysis is the definitive one, but it certainly raises more questions about the warming being unprecedented.

    A

  12. @Andrew Chantrill,
    Would you have, per chance, fitted a trend line to the temperature change curve?

  13. #4, It makes sense, I think, to weight according to area. If you have different regions of the globe reacting differently to ‘the global warming’ you wouldn’t want it all to depend on the thousands of temperature stations in the US. By gridding the stations, high-density areas are not overweighted in proportion to the globe.

  14. LGardy LaRoche,

    No I haven’t fitted a line to it other than the rolling 11-Year average.

    I have however also looked at sea levels as a proxy for temperature and made the following chart:

    http://files.me.com/achantrill/xdvm3p

    I plotted sea level in 3 series; from tide data up to 1969, from tide data after 1970 and from satellite data (more recent).

    The reason for the split in tide data is that it is generally accepted that AGW had no effect until 1970. I wanted to see what the underlying trend was before 1970. (Answer: steeper sea level rise!).

    A

  15. Did you use the data from GHCN v2.mean or v2.mean_adj? I did plain “ungridded” plots of both. V2.mean_adj shows the last ten years cooling; v2.mean does NOT. I was unable to find any discussion about the adjustments applied to create v2.mean_adj. Whatever they were, the data before 1838 got trashed.
    Number of stations reporting peaked at around 100,000 in the 1980’s. After that something trimmed the number of stations down to around 14,000. You have to wonder how that happened.
    Maps I have seen show that the GHCN data is largely from North American stations, so the plain worldwide average that I computed is more a record of North American temperature history rather than world wide history.
    I am considering a simple gridding scheme, divide the world up into slices of 10 degrees latitude and 10 degrees longitude. The slices aren’t equal area, but are easy to compute and close enough for government work. Then average all the readings for each slice, for each year. Then make a second pass computing the averages of all the slices. This gives roughly equal weight to each area of the world. The heavily instrumented North American slices get the same weight in the final world average as the thinly instrumented slices.
    I am reluctant to discard temperature readings. You have to assume that all instrument readings contain errors and you don’t know what those errors are. Averaging all the readings tends to cancel out the errors, high readings are averaged with low readings and the average is closer to the right answer. The more points averaged, the better the average.
    I’ll grant this only works for randomly distributed errors, a systematic error in one direction, say urban heat island, doesn’t cancel. I don’t believe a step change in a data source necessarily invalidates it. We don’t know if the step change represents some change making the station read better or read worse.

  16. I used v2.mean, the unadjusted one, because I don’t have any confidence in any adjustments 😦

    As I said, my aim wasn’t to be the definitive analysis, but just to see if the raw data supported claims of unprecedented warming, which they don’t.

    It also shows, just how convenient 1850 is as a start for HADCRUT. 😉

    A

  17. #16, David,

    In the GHCN data there are many series with sharp obvious steps in them. These are useless in their current form for computing trend. Perhaps if they were split into sections and re-anomalized they could be useful.

  18. Jeff,

    Very interesting post. I have one question: were graphs 4, 5, and 6 grid weighted? I am guessing they were.

    The difference between urban and rural curves after 2000 is nothing short of astounding; since rural stations (presumably) don’t need adjustments for UHI effects, the rural-only station graph ought to be a more accurate representation of average land temperature.

    Since urban stations are likely more numerous than rural, but represent only a small physical area, when all the stations in a grid cell are combined in an average the more numerous urban stations may give a positive bias to a large fraction of the grid cells.

  19. #22, thanks. 3, 4, 5 and 6 are gridded. I think the urban stations typically have longer records than the rural, which probably has some effect.

  20. jeff,

    One further question: I imagine that at least some (and maybe a lot) of the land grids are void of station data, mainly in extremely cold regions. How can you generate a land average excluding grid cells with no station data? Is this where your analysis and CRU differ? Lots of room for nudging the average upward with manually assigned temperatures for desolate regions.

  21. Andrew,

    Your plot looks a lot more like what I would expect from this data than mine does. I wonder what creates so much difference. I’ve also just averaged the data in Figure 1: all the stations averaged together, but for stations with multiple data series they were averaged into a single series before averaging with the group. This, of course, changes the weighting. I’m bothered that I’m not getting the same results.

  22. It’s disturbing to think what just a small part of the $30 billion spent on concocting the lie could do.

    Thanks for your work …

  23. #25, I just leave them blank.

    I’m going to try running the same analysis now using just the CRU data, hopefully I get similar results to Figure 2.

  24. Jeff Id,

    The reason I chose my method is that it essentially ignores absolute temperatures. As a result, dropping a station in, say, Barbados and simultaneously adding one in Siberia makes no difference. By contrast, if one just takes the average one would, of course, see a step change.

    I fully agree it would be interesting to make some corrections to correct the weighting issue, but that is beyond my skills/time.

    A

  25. #29, It’s interesting because I tried a similar method for Antarctic temps. There weren’t enough stations, so when you integrated a new one in, the other stations would experience a non-negligible step dependent on monthly variance. With a higher station count this effect would cancel out. It means that a large portion of what I’m seeing is due to the introduction and removal of individual stations. I suspect that when you have shifts in station count as big as your plot shows, the net effect may also be non-negligible.

  26. #28,

    Your step sorted graph and that from Hadley are almost the same up to 2000; it’s post-2000 that there is a big divergence. I’m guessing the cause is how Hadley handles void grid cells and the ever falling number of stations in recent years.

  27. While “rural” stations will show relatively small UHI effects compared with “urban” stations, this does not mean that the change in UHI will be relatively small at these stations. Recall that “rural” just means local population less than 10,000, and some (many?) of these sites could have gone from being isolated farms to small towns with thousands of people, with large changes in UHI. By contrast urban sites will have mostly been urban for quite a while.

  28. >Jeff Id said
    >January 3, 2010 at 2:34 pm

    >#16, David,

    >In the GHCN data there are many series with sharp obvious steps in them. These are useless in their current form >for computing trend. Perhaps if they were split into sections and re-anomalized they could be useful.

    Jeff,
    Tell me about re-anomalizing. What would you do? Add a corrective factor to the data before the step and another to the data after the step to zap the step? Or run a low-pass filter over the step to smooth it out? Or??

  29. #33 due to the number of stations, I kind of like the idea of taking the derivative and re-integrating. It will take some exploration though, in the Antarctic, I was forced to use quite a bit of data overlap to make sure trends weren’t created from the noise. This means re-integrated data would probably stink on a gridded basis unless substantial overlap existed.

  30. Jeff, wait until you look at the SST data – you’ll love the bucket adjustments and the Pearl Harbor adjustment event.

  31. Steve,
    I did a bit of reading on the buckets almost a year ago; the Pearl Harbor bit sounds new. If nothing else is learned, we know the data is a nightmare.

  32. The question of how to process the raw data is not the real issue, IMHO. If the AGW alarmists are correct then we should be seeing a distinctively stronger trend of rising global average temperatures over the past 100 years or so, compared to the previous 100, 200, 300, etc. years. Statistically, I can’t see it, no matter which data set one uses and from whatever available source (primarily ice core data). Sometimes, we need to take a step back to see the forest from the trees. Having said that, I do agree we need to investigate SCIENTIFICALLY how best to analyze thermometer data, and not just trust CRU and others are doing it the right way on pure face value.

  33. *****************
    Andrew Chantrill said
    January 3, 2010 at 12:12 pm

    I did another analysis, using all the data back to 1702.

    Comments, anyone?

    A
    *************
    Do the stations past the “hump” in the number of stations tend to migrate south towards warmer climes? I believe ChiefIO mentioned that as a possible cause for warmer temps as the number of stations went back down.

  34. ************
    Andrew Chantrill said
    January 3, 2010 at 12:12 pm

    I did another analysis, using all the data back to 1702.

    A
    *************
    Sorry, you are working with annual changes only. My mistake.

    I thought about doing that on a monthly basis for the New Zealand temp records. After thinking about it for a while, I realized the data would need to be corrected for humidity, because the heat capacity of humid air is greater than that of dry air. So for coastal regions the temp swing in New Zealand would be less than that measured further inland where the humidity is lower. It would seem we need to define a standard humidity and correct all stations to that. Then we might have a more representative change in temperature per month or year.

  35. Jeff, I am sorry but you are engaged in the same nonsense that brought us global warming in the first place. You cannot ‘grid’ point samples which completely degrade over the distance of about 10 miles and a time span of easily 30 minutes. As I noted in this post what results is 99+% guesstimate using sparse and dodgy sensors.

    http://strata-sphere.com/blog/index.php/archives/12118

    The idea there is any fact in a solution which is 99+% speculation is a joke.

  36. The Swiss have published their raw and “homogenised” temperatures.
    ALL station data have been homogenised and adjusted UP, while you would expect them to be adjusted down to correct for artificial warming due to UHI, which is statistically absurd!
    But at least Meteo Suisse has been honest enough to make all their data public.

  37. Jeff — would it be possible to do gridded versions of figs 3-6.

    As you noted, the straight averages of all thermometers aren’t very informative because of the differing numbers of thermometers in different areas.

  38. I could be way off on this, but: should you scale by cosine or cosine squared? Wouldn’t the length and width of a gridcell scale with the cosine of the angle?

    Maybe I’m just having a brainfart. I don’t know.

    –t

  39. Layman’s question here – isn’t it possible to hunt down raw data from a number of long term stations which are known to be purely ‘rural’ (ie with no urban influence)? Surely it wouldn’t matter if only 50 to 100 stations could be found, spread around the globe very roughly equidistant to each other. From this a low resolution – but trustworthy – picture of global land temperature could be found for the last 150 years?

    Just wonderin’.

  40. #45, No, Jeff is right to use the cosine only. The difference in latitude stays the same no matter how close you get to the pole (about 60 nmi per degree), but the longitude lines get closer together. The shape is trapezoidal (not quite, since it’s on a sphere): area = ((b1 + b2) / 2)h, where h is the change in latitude in nmi and b1 and b2 are the changes in longitude in nmi at the lower and upper latitudes of the grid cell.
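    For reference, the exact area of such a cell on a sphere of radius R is A = R²·Δλ·(sin φ2 − sin φ1), and for a narrow latitude band that is approximately R²·Δλ·Δφ·cos φ, so the weight scales with the cosine of latitude to first order, not its square.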

  41. Jeff, the idea of averaging temperature trends rather than temperature anomalies is probably a good one. I’ve thought about this one myself, and it does fix one major malfunction in combining temperature data. The biggest issue is that differentiation can be noisy, so one needs to use a different method for the trend estimate than just, e.g., the two-point derivative rule.

    Regarding leaving grid points empty… I’m worried that having stations drop out, and missing grid points may be inserting biases into the temperature trend that are much larger than some of the other effects people have been concerned about.

    Do you think that e.g. using RegEM to fill in missing grid points might be a better approach here than just leaving them empty? I’d also advocate some method like this for replacing multiple individual stations with their grid average.

    One could even imagine a hybrid method (starting 1980) where we used calibrated satellite data to infill missing stations. Hm… that is starting to sound familiar. LOL.

  42. Seems to me that classifying towns of up to 10,000 people as ‘rural’ could be problematic with respect to the delta UHI effect. Towns of 10,000 people will surely have a UHI effect. Think how many cars, air conditioners, houses, factories etc. a town of 10,000 people would have.

    As well, we have the Anthony Watts concerns about individual station compliance with standards re proximity to buildings, air conditioners, asphalt etc.

    To be truly rural, I would have thought you need to consider stations that are remote from any human activity, except perhaps a house or two. And even then, it would have to comply with standards.

  43. Personally, I’d love to see an animation of the raw data points color coded for temperature and located precisely on the globe over time.
    Raw data on the exact coordinate over time – visual.
    No gridding, no averaging, no monkeying with the data.
    I’d like to see it naked first.

  44. Andrew Chantrill: your graph shows that “warming” commenced circa 1970, just when the number of stations started to be pruned. The more stations removed, the more the warming increased. This is evocative of the Russian complaint that their temperatures had been cherry-picked by CRU.

  45. Actually what Jeff is doing is very much in line with what EM Smith found with GHCN. He also did a test where he only used the long-lived stations (around 3,000) and he found no warming trend in them; it was in the 10,000 short-lived stations. To me GHCN is the Achilles heel of the claim Phil Jones used about HadCRUT agreeing with GISS and NCDC. They should all agree outside of those small “adjustments” each makes; they all use the same bad data, from cherry-picked stations!

  46. I also looked at UHI in the UK, using the same method, from the recently-released HADCRUT data:

    http://files.me.com/achantrill/os5wc5

    It doesn’t show any clear overall trend, although ‘rural’ did warm the least, and towns of 10,000 to 100,000 showed most warming after about 1988.

    A

  47. Finally data that seems to correlate with an independent analysis! The data you present (Fig. 3, 5 and 5) looks pretty much like the latter part of the 1000 year time series calculated by METLA (Finnish Forest Research Institute http://www.metla.fi/index-en.html and tree ring pages: http://lustiag.pp.fi/). METLA’s researchers (Meilikäinen and Timonen) by the way, don’t agree with the Mann’s “Hockey Stick” and with Briffa’s methods (esp. number of trees needed for proper statistics). Of course this only relates to Northern Europe but still. Have a look here: http://lustiag.pp.fi/metsamessu2008km.pdf (esp. slides 9 and 24). Unfortunately most part of the slides are in Finnish (with some English) but I guess you can find it in English, too, somewhere from the main pages. I hope you find this interesting.

  48. So now we should start to develop at least a little sympathy for Jones et al.

    Clearly we see that _the signal_ is about flat, while we have _noise_ in the form of UHI hockey sticks. We also have unexplained downward adjustments of older data, and unexplained upward adjustments of newer data.

    My natural inclination is to remove these unexplained adjustments that create upslope, and to clip off the later part of series with UHI hockeysticks, or somehow generate a normalized UHI-vs-population density curve to _correct_ the UHI hockeysticks.

    Then I could get the approximately flat temperature anomaly curve that appears in some rural stations I’ve carefully confirmed are not corrupted by the _noise_ and measurement errors. It’s the result I believe I would see from good data.

    That wouldn’t be a problem, _right_?

  49. Another approach would be to:

    1. Find all raw records for every station within each 1 X 1 degree grid cell (or 5 X 5, if not enough records for 1 X 1) which have homogeneous data for at least 1 decade (no changes in station location, instrument, etc., during the decade). Call these “qualified decadal records.”

    2. Average all QDRs in each cell for each decade, to create a decadal trend segment for that decade/cell. A variation here could be to omit from the average any station designated “U”, to see what difference it makes.

    3. Splice the QDR averages in each cell end-to-end, to create a 100+ year trend for the cell. That would remove any steps. While the absolute temps for the period would be questionable, the trend, which is what interests us, should be valid.

    4. Average the trends in the cells to produce trends over wider areas, such as latitudinal bands.

    An important point, as PeterS (#38) suggested, is that the *trend* is what we wish to know, not the “average global temperature,” which is probably not deducible from the available data. Even clear trends over limited but globally well-dispersed areas, such as N. America, Europe, NZ, Oz, and anywhere else a reasonably reliable record exists would be informative as samples of the global trend.

    (Idea of using and splicing decadal trends first proposed by commenter “supercritical” at WUWT, AFAIK).
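    A rough R sketch of one way to read the splicing in step 3 (‘segs’ here is a hypothetical list of decadal cell-average segments, in time order):

    splice_segments <- function(segs) {
      out <- segs[[1]] - segs[[1]][1]            # start the spliced series at zero
      for (s in segs[-1]) {
        # shift each decade so it continues from where the previous one ended,
        # which removes any step between decades (absolute level is sacrificed)
        out <- c(out, s - s[1] + tail(out, 1))
      }
      out
    }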

  50. Jeff ID, I am not comprehending very well what you have extracted from the GHCN database, realizing that that situation may be more my problem/confusion than your problem. In order to resolve my confusion, I have taken some text KNMI data and converted it to a table with some R code listed below.

    First let me say that I hope we can continue here at TAV to compare notes and get down to some detailed analyses of these available temperature data sets. That will require an initial understanding of what the data really consist of.

    I attempted to download the KNMI data into Excel and that did not work. What was required for me was to register at KNMI here: http://climexp.knmi.nl/start.cgi?someone@somewhere

    On the same page on right side you can click on Monthly Station Data to go here:
    http://climexp.knmi.nl/selectstation.cgi?someone@somewhere

    In order to get an url to download to R you need to click on “full lists” to get here:
    http://climexp.knmi.nl/allstationsform.cgi?someone@somewhere
    then you right-click on mean temperature under GHCN (adjusted), for example, and obtain the URL from the copy shortcut under properties to get, in this example, the following (only with your registered email in place of someone@somewhere):
    http://climexp.knmi.nl/allstations.cgi?someone@somewhere+temperature+12

    I have gone into detail here to show how to use KNMI with R because I think KNMI, if shown to be reliable, is a great one stop shop for climate data.

    To get KNMI temperature series data go here:
    http://climexp.knmi.nl/selectfield_obs.cgi?someone@somewhere

    Linked directly below are 3 graphs showing the historical GHCN (adjusted and all) and CRUTEM3 station numbers. I do not see my numbers agreeing with what you have posted, Jeff, and I would like to resolve that issue. My numbers were determined using the station start and end dates and assuming that the station operated continuously in between. I know that is not the case, but the very small amount of unaccounted-for in-between years would decrease the numbers very little. The graph with a direct comparison of the CRU and GHCN all-station numbers is similar but with significant differences over the range that is depicted.

    I also found that the GHCN all station number includes all of the adjusted GHCN stations.

    The station codes with a decimal on the end are supposed to indicate stations nearby a WMO station. If the individual station codes are tracked, including the decimal-ending ones, they all appear unique with regard to latitude, longitude, altitude and name. Most decimal-ending stations are connected to a station with no decimal ending, but not all, by any means. A decimal-ending station can have from 1 up to more than 5 suffixes. These nearby stations are nearby as stated but often cover shorter time periods than the WMO “mother” station. At this point I am not sure whether these stations should be integrated into the mother WMO station. Some nearby stations, as noted before, do not have any mother WMO station.

    The sudden fall off in the number of adjusted GHCN stations in recent years must have something to do with a lag in adjusting all the stations as I cannot see a global temperature data set being meaningful with only a few hundred stations. Since the CRU number of stations is larger in recent years, and if they use primarily GHCN stations, then it would be obvious that CRU does their own adjustments to unadjusted GHCN data.

    I need to look in more detail at these data and for explanations of what I am looking at.

    #R code for Adjusted and All GHCN Station Data:

    download.file("http://climexp.knmi.nl/allstations.cgi?kenfritsch@sbcglobal.net+temperature+12","knmistat")
    DatK=readLines("knmistat")
    Index=grep("coordinates:",DatK)
    Coor=DatK[Index]
    Lat=substr(Coor,14,19)
    Lon=substr(Coor,23,29)
    Alt=substr(Coor,45,49)
    Index=grep("years with data in",DatK)
    Date=DatK[Index]
    Years=substr(Date,8,10)
    Start=substr(Date,31,34)
    End=substr(Date,36,39)
    Index=grep("coordinates:",DatK)-1
    Loc=DatK[Index]
    Index=grep("coordinates:",DatK)+1
    CodeData=DatK[Index]
    Index=grep("coordinates:",DatK)+2
    Pop=DatK[Index]
    Index=grep("coordinates:",DatK)+3
    Terr=DatK[Index]
    TabAdjGHCN=cbind(Loc,Lat,Lon,Alt,Start,End,Years,Pop,Terr,CodeData)
    save(TabAdjGHCN, file="TabAdjGHCN")
    load("TabAdjGHCN")

    #Do same for All GHCN Stations

    download.file("http://climexp.knmi.nl/allstations.cgi?kenfritsch@sbcglobal.net+temperature_all+12","knmistat")
    #Same code here as for adjGHCN above
    TabAllGHCN=cbind(Loc,Lat,Lon,Alt,Start,End,Years,Pop,Terr,CodeData)
    save(TabAllGHCN, file="TabAllGHCN")
    load("TabAllGHCN")
    CodeR=unlist(strsplit(TabAllGHCN[,10],"code:"))
    CodeS=grep("<a",CodeR)
    CodeT=CodeR[CodeS]
    CodeU=unlist(strsplit(CodeT,"\\(<a"))
    Code=as.numeric(substr(CodeU[seq(from=1, to=length(CodeU)-1,by=2)],2,8))
    TabAllCodeGHCN=cbind(Code,TabAllGHCN)
    write.csv(TabAllCodeGHCN,file="TllCGHCNCVS")

    StatYear=TabAllGHCN[,5:6]
    CumStat= c(StatYear[1,1]:StatYear[1,2])
    for(i in 2:length(StatYear[,1])){
    CumStati=c(StatYear[i,1]:StatYear[i,2])
    CumStat=c(CumStati,CumStat)
    CumStat
    }
    TabStat=as.data.frame(table(CumStat))

  51. Hi Jeff

    The excellent plots you have done start to make more sense if people view them in a broader historic context, as the latest in a never ending series of climatic peaks and troughs.

    As well as the peaks you show there were also very notable ones around 1830, 1770 and 1730 within the instrumental records.

    http://climatereason.com/LittleIceAgeThermometers/

    There has been a very slow and general rise in temperatures from the depths of the little Ice age which commenced around 1350 and had a very large spike downwards in 1683. The MWP finished around 1290 so there was a very sharp deterioration of several degrees in a fifty year period which it has taken hundreds of years to recover from.

    Unfortunately the GISS or Hadley records (1880/1850) represent just a tiny snapshot and don’t enable us to see the full extent of the peaks and troughs which would enable us to view our climate history in its proper context.

    Tonyb

  52. Andrew,

    I empathize with your desire to not “select” thermometers as some of the AGW proponents have done. What about averaging the anomalies from reporting thermometers within a given regional block and using that? I know the area I live in is renowned for microclimates — it’s not at all unusual to see 5 or 6 different temperature and precipitation patterns within the city. This is one of the reasons I have huge issues with Hansen & Schmidt’s assumption they can use a single thermometer to represent a very wide area.

  53. Jeff ID, I did some checking for duplicate stations in the GHCN All Station data set and found 24 stations that I suspect are duplicates. I asked R (code is below) for duplications based on the same latitude, longitude and altitude. I have linked the tabled results here:

    All the duplicates, with one exception, had the same first 5 code numbers. Since these data set owners claim scrupulous quality control, I am surprised to catch this many duplications, and so easily. I need to look now at stations with very close but not identical position coordinates.

    I also went back and looked at the percentage of years in between the start and end dates that were missing for the GHCN All data set. It was 6.9%.

    I have been reading about the GHCN data sources and have found them listing many sources (see link below – click on Data Sources on left side of page and then click on Table 1) and this can create problems with overlapping stations and data that has to be separated out.
    http://www.ncdc.noaa.gov/oa/climate/ghcn-monthly/index.php

    Also look here for chiefio’s summary of GHCN station data:
    http://chiefio.wordpress.com/2009/02/24/ghcn-global-historical-climate-network/

    The link below describes the methods GHCN uses to obtain grid temperature anomalies. The methods are not straightforward and have changed over time.
    http://www.ncdc.noaa.gov/oa/climate/research/ghcn/ghcngrid.html

    R code:

    load("TabAllGHCN")
    Dup=as.data.frame(TabAllGHCN[,2:4])
    Dup1=duplicated(Dup)
    Index1=grep("TRUE",Dup1)
    Dup2=duplicated(Dup, fromLast=TRUE)
    Index2=grep("TRUE",Dup2)
    IndexT=sort(c(Index1,Index2))
    Duplicated=TabAllGHCN[IndexT,]
    write.csv(Duplicated,file="GHCNDuplicated")

  54. I also live in a micro-climate (happily warm!), so I understand the importance of taking a representative cross-section.

    However I feel that taking an average of 13,500 stations should minimise the impact of any one, especially as my method only looks at changes, rather than absolute temperatures.

    I’d like to do an analysis of trend vs population, but I have no ready way (such as a lookup table) of linking station to population. I’ve not used Fortran for over 30 years and am restricted to whatever I can do on a Mac with Excel.

    A

  55. A couple of short comments.

    1) Perhaps the ‘hook down’ at the end is valid. We have had snow in the south of France, the UK is a white blanket. I’ve had poor tomato ‘set’ for 2 or 3 years (cold causes this). Australia has had snow in summer on the mountain peaks. Folks are dying from cold in Peru. The list goes on. Is it really that hard to think that the sun went cool a few years back and we’ve been dropping since then?

    2) The “step up” in the rural graph. Many of the “rural” flagged stations are airports. Airports have been increasing as a percentage of the record. I think you are just finding that there are a lot of airports in the rural ‘batch’ and that they warmed a lot with the onset of the Jet Age.

    http://chiefio.wordpress.com/2009/08/23/gistemp-fixes-uhi-using-airports-as-rural/

    You might want to redo the graph but with the “A” airstation flag sites as one group, the non-“A” rural as really rural, and then the clearly urban. You might also want to screen the station inventory flag for a description that includes “AIRP” as well:

    [ OK. After matching the records by station ID, I could see if any of these “rural reference stations” was, in fact an airport. What I found was about 500 of them (out of 2179 total, so a little under 1/4 of the pristine rural stations used to “remove” UHI effect are airports…)

    I say “about” because some of the stations that do NOT have an “A” for “airstn” do self describe as an AIRPORT. And not all AIRPORTS self describe. So it is somewhat unclear just how many of the ones with no “A” flag are also airports. ]
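    In R, the screen suggested above might look something like this (assuming a hypothetical inventory data frame ‘inv’ with columns id, name, airstn and pop_class):

    # treat a station as an airport if it carries the "A" flag or self-describes as one
    is_airport <- inv$airstn == "A" | grepl("AIRP", toupper(inv$name))
    airport_ids    <- inv$id[is_airport]
    true_rural_ids <- inv$id[inv$pop_class == "R" & !is_airport]
    urban_ids      <- inv$id[inv$pop_class == "U"]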

    FWIW, temperatures have a strong correlation with jet-a fuel burned:

    http://chiefio.wordpress.com/2009/12/15/of-jet-exhaust-and-airport-thermometers-feed-the-heat/

    And if you look at the data tables in this next link you will find the percentage of the thermometers that are at air ports tends to increase about the time you get that rise in the “rural” stations (though it varies by country. New Zealand, for example, now has only one thermometer not at an airport, and it is on a tropical island closer to the equator than the main islands…)

    http://chiefio.wordpress.com/2009/12/08/ncdc-ghcn-airports-by-year-by-latitude/

    So my instinct says that you are on to something, but that it may be tied in with the whole ‘what is rural’ and ‘when is an airport rural’ issue…

    E.M.Smith
