CRU #3 – The next step
Posted by Jeff Id on January 3, 2010
This post has a problem. I don’t believe it’s an error on my part, but nothing is ruled out yet. I’ve mentioned the problem in the comments: the graphs hook down too much at the end, and it doesn’t seem related to short stations or steps in the data as I had surmised below. Despite several hours of looking I haven’t isolated the cause, although when I use all series rather than only the longest ones, I get much more CRU-ish plots from the data. There will be an update this afternoon, but I need to run several more 20 minute runs to check out a hunch. -Jeff
Today I’ve worked on the GHCN global datasets. This is a work in progress, but I’ve wanted to do this for some time. I started looking at global temperature because Dr. Phil Jones has alleged he made a good dataset; after climategate I wouldn’t trust Jones to hold onto a dollar bill in a candy store. What I’ve learned over the week regarding the surface temperature data is amazing. I once referred to the Mann08 data as a box of old socks, and the surface stations are just as bad. Instead of going forward as I naively hoped, I’m going to have to back up and do more QC to get anything useful from the data, and even then it will be difficult to be comfortable with it. You really have to take a moment and think about what that last sentence means for climatology. This project is just beginning, but rather than report nothing, we should look at a bit of what I’ve found.
In a previous post I averaged all the GHCN data to get the curve in Figure 1.
This is a very different curve from what CRU presents normally as reproduced in the Figure 2 plot.
Figure 1 isn’t from gridded data so the spatial distribution of temperature stations can affect trends quite a bit. Therefore it was important to first grid the data before averaging the grids into a global mean.
To create a gridded average, I plotted every single data series as it was added in. This led to the realization that there are a LOT of stations with steps in the anomaly curves, the steps themselves being the true anomalies in the sensor readings. These are probably caused by moving stations, adding blacktop next to the sensors, adding air conditioning units, buildings, concrete, switching instruments, instrument failure, etc. So much of this data is complete junk, it’s hard to describe. But I plowed on anyway.
Figure 3 is a plot of the grid-weighted average of all the station IDs used in Figure 1. The data is collected into 5×5 degree grid cells and averaged within each cell; after that, the cells are averaged together over the globe to create the trend. The cells are weighted by the cosine of their latitude before averaging so that the correct area for each block is used.
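As a rough illustration of the gridding described above, here is a minimal Python sketch for a single time step. It is not the R code from this post; the function names `grid_cell` and `gridded_mean`, the `(lat, lon, anomaly)` tuple layout, and the use of the cell-centre latitude for the cosine weight are all assumptions made for the example.

```python
import math
from collections import defaultdict

def grid_cell(lat, lon, size=5.0):
    """Return the (row, col) index of the size x size degree cell holding a point.

    Points exactly at +90 latitude or +180 longitude are clamped into the
    last row/column so every cell centre stays inside the globe.
    """
    n_rows = int(180.0 / size)
    n_cols = int(360.0 / size)
    row = min(int(math.floor((lat + 90.0) / size)), n_rows - 1)
    col = min(int(math.floor((lon + 180.0) / size)), n_cols - 1)
    return row, col

def gridded_mean(stations, size=5.0):
    """Cosine-weighted mean of grid-cell averages for one time step.

    stations: iterable of (lat, lon, anomaly); anomaly may be None if missing.
    Returns None when no cell has data.
    """
    cells = defaultdict(list)
    for lat, lon, anom in stations:
        if anom is None:
            continue  # skip missing readings
        cells[grid_cell(lat, lon, size)].append(anom)

    weighted_sum = 0.0
    weight_total = 0.0
    for (row, _col), values in cells.items():
        centre_lat = -90.0 + (row + 0.5) * size   # assumed: weight by cell centre
        weight = math.cos(math.radians(centre_lat))
        weighted_sum += weight * (sum(values) / len(values))
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None

# Three stations crowded into one cell and a single station in another cell.
example = [(40.7, -74.0, 1.0), (40.8, -73.9, 1.0), (40.6, -74.1, 1.0),
           (2.0, 45.0, 0.0)]
simple_mean = sum(a for _, _, a in example) / len(example)
grid_mean = gridded_mean(example)
```

In the example, the simple station average is pulled toward the crowded cell, while the gridded average counts each cell once and scales it by area.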
This means that the one thermometer which may or may not exist in Muslim Somalia is weighted according to its grid area rather than against the hundreds of thermometers that probably exist in the grid block containing New York. There is one difference between Figure 1 and Figure 3 besides the above: I’ve modified the getstation() function to grab the longest existing station data for each station ID. Remember, in the GHCN data, each thermometer station can have several different sets of data. The data come sometimes from the same thermometer and sometimes from a different one even though they have identical ID numbers. This means that quite a few series, many of which may have been identical copies of the same data (we don’t know), have been discarded in favor of the longest series.
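The "keep the longest series" choice can be sketched as follows. This is an independent Python illustration, not the post's getstation() function: the name `longest_series`, the dictionary layout, and the definition of "longest" as "most non-missing values" are assumptions.

```python
def longest_series(series_by_duplicate):
    """Pick the duplicate series with the most non-missing values.

    series_by_duplicate: dict mapping a duplicate number to a list of
    monthly values, with None marking a missing month (assumed layout).
    Ties go to the duplicate that appears first in the dict.
    """
    def count_valid(values):
        return sum(v is not None for v in values)

    best = max(series_by_duplicate,
               key=lambda dup: count_valid(series_by_duplicate[dup]))
    return best, series_by_duplicate[best]

# Three hypothetical series recorded under the same station ID.
duplicates = {
    0: [1.0, None, None, 2.0],
    1: [1.0, 1.1, 1.2, None],
    2: [None, None, None, None],
}
chosen, values = longest_series(duplicates)
```

Counting valid months rather than the span from first to last year avoids preferring a series that covers many years but is mostly empty; whether that matches the original code is not something this sketch can confirm.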
Figure 3 looks a lot more like Figure 1 than the CRU version in Figure 2. I’m not claiming it’s better; there are an awful lot of short term features missing from the curve, but it is apparently what the raw GHCN data shows.
The station metadata comes with a flag for each station describing the size of town it’s near. The classifications are U – Urban, R – Rural, or S – Small town. Rural means not associated with a town of 10,000 or greater population, S is 10,000 to 50,000 and U is everything bigger. It gives an opportunity to do a couple of different plots based on population. Anthony at WUWT would warn us that just because it says Rural doesn’t mean nobody put a road, runway or AC unit right by the thermometer, but it’s worth a check.
As I wrote above, a lot of the data has large and obviously erroneous steps in it. One QC step which can be handled automatically by an unpaid engineer’s laptop would be to remove data with steps in it. R offers a couple of functions for the detection of steps; unfortunately they run so slowly you wouldn’t see this post for the next two weeks. Therefore, I wrote a crude step detector which removes gaps from time series. It works by fitting a line to a window of the data to check the slope. The window is shifted and another line is fit. If the slope of any window exceeds a threshold amount, the series has a step in it, and therefore it is not used. A large number of bad series were thrown out using this method and a couple of good ones too. Of course the point of all this is to see how much effect the steps are having on the result.
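A sliding-window slope check of the kind described above might look like the following Python sketch. It is not the detector from the R script: the names `window_slope` and `has_step`, the 24-month window, the 0.05 degrees-per-month threshold, and the assumption that the series has no missing values are all illustrative.

```python
def window_slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx

def has_step(series, window=24, threshold=0.05, stride=1):
    """Return True if any window's fitted slope exceeds the threshold.

    series: list of anomaly values with no gaps (assumed).
    window, threshold and stride are hypothetical tuning values.
    A series shorter than the window is never flagged.
    """
    for start in range(0, len(series) - window + 1, stride):
        if abs(window_slope(series[start:start + window])) > threshold:
            return True
    return False

flat = [0.0] * 120                         # no change at all
stepped = [0.0] * 60 + [2.0] * 60          # 2 degree jump halfway through
trend = [0.001 * i for i in range(120)]    # slow, steady warming

flags = [has_step(flat), has_step(stepped), has_step(trend)]
```

A window straddling the jump sees a steep fitted slope, while a slow trend stays under the threshold. As in the post, a check this crude will also reject some good series, for example ones with a large but real short-term swing.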
Figure 6 matches Figure 3 pretty well. The signal is reasonably consistent even when large chunks of step data are thrown out. Again, this does not match CRU very well and I wonder how they chose their station subset. One of the themes at Climate Audit has always been questioning the rules used to choose one dataset over another. In this case, it seems to make a big difference in the result, and the subset chosen by CRU doesn’t match the bulk trend I find in the global surface data well. Don’t forget that this data is land only, and a truly global network needs to be averaged with sea temps to produce a complete global trend.
There is more to the story in this data, including the lack of availability of stations in recent years which may affect the trend in ALL of the graphs above more than we expect. A bad situation considering the money we’re being asked to spend on climate.
The code for this post is getting pretty big now and can be downloaded here – http://www.mediafire.com/file/tydndellqwj/GHCN Reader and CRU comparison 2.R