the Air Vent

Because the world needs another opinion

CRU 3b – Urban Warm Bias in GHCN

Posted by Jeff Id on January 5, 2010

I’ve learned a great deal playing around with the GHCN data, and I think this is a reasonably significant post. Ya know, it’s hard to know anything until you try it yourself, and I hope more of the readers here will. Again, there was a problem in my last CRU post; however, the more I look for it, the more avenues there are to explore. Rather than chase down every remaining possibility, this post avoids them by using a different method.

Of all the details worked out over the last two days, one is a decent gridded average of temperature data. Unfortunately for us skeptics it looks like Figure 1 which is pretty similar to the CRU plot.

Figure 1 - all data gridded

Yes, there is warming according to our temp stations, but I don’t think comrade Phil Climategate Jones would like this curve, because the warming in this curve happens entirely after 1975.

It’s nice to see a good-quality, CRU-like curve after the previous effort, but that’s how things happen when you do your work in public. The plot above uses all the data, with the series for each 5-digit temp station code averaged together individually, as my first post did. The anomaly is calculated over the entire series length.
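For anyone who wants to try this themselves, here is a minimal sketch of the anomaly-and-gridding idea in R. It is not the code linked from this post. The function names, the 5-degree cell size, the by-calendar-month anomaly and the `temps`/`lat`/`lon` inputs are all assumptions made for illustration, and the cells are averaged equally.

```r
# Anomaly of a monthly series relative to its own full-record mean for each
# calendar month. Assumes the series starts in January and covers whole years.
monthly_anomaly <- function(x) {
  stopifnot(length(x) %% 12 == 0)
  m <- matrix(x, nrow = 12)            # rows = calendar months, columns = years
  clim <- rowMeans(m, na.rm = TRUE)    # full-record mean of each calendar month
  as.vector(m - clim)                  # recycling subtracts clim from every year
}

# Average station anomalies within lat/lon cells, then average the cells equally.
# temps: months x stations matrix (NA = missing); lat, lon: one value per station.
gridded_average <- function(temps, lat, lon, cell_deg = 5) {
  anoms <- apply(temps, 2, monthly_anomaly)
  cell <- paste(floor(lat / cell_deg), floor(lon / cell_deg))
  cell_means <- sapply(split(seq_along(cell), cell), function(idx) {
    rowMeans(anoms[, idx, drop = FALSE], na.rm = TRUE)
  })
  rowMeans(cell_means, na.rm = TRUE)   # cells with no data in a month are skipped
}
```

The point of the two-stage average is that a cell crowded with stations counts no more than a cell with a single station.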

The concern, which was explored in some detail, was the hypothesis that the loss of stations in recent years created or biased the trend. It came about because so many stations are lost in recent years, as Ken Fritsch pointed out in the recent CRU #3 thread.

Figure 2 - Stations per year GHCN per Ken Fritsch from KNMI database

I’ve run dozens of plots over the last several days, some of which contained an error from data selection or a code problem in my previous post. Using the algorithm which averages together individual station ID numbers, I get very consistent CRUesque patterns. The warming is common to a variety of data sorting processes. This method avoids the data selection or code problems of the other methods, and I’m confident in the accuracy of these results, but you should check them.

Several methods were employed to test the consistency of the result, including sorting for Rural and Urban, and sorting for several different time lengths of station data. All varieties so far produced very similar results. There are, however, interesting revelations from examination of the slight differences.

Figure 3 - 100% Urban Data

Figure 3 is a plot of the urban data only. Of note is that the warming starts at 1978, with only slight warming beforehand, and launches up about 1.2 C with no end in sight. Also, 1982 isn’t much reduced from around 1940, which is different from the global average in Figure 1. So the next thing I did was to plot the rural data.

Figure 4 - Rural station data.

That looks a great deal more like the satellite data. The temp rose and fell again prior to 1978, and the rise since 1978 is maybe 0.5C total. I tend to ignore data prior to 1900 due to the very small number of stations. I don’t think the drop in temps to 1900 levels in the early 70’s is the kind of curve that supports the high CO2 sensitivity claimed by climate science. Does anyone remember the snow storms of the early 70’s? Yeah, yeah just weather, I know.

One of the other avenues explored at great length, yet still not finished, was how station starts and stops affect the trend in recent years. To explore that, one of the several methods I used was to sort data according to the number of available data points. Below, I present the gridded global average for all stations with at least 100 years (1200 points) of available data, since many of the stations in Figure 2 were started in 1950.
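The record-length sort is simple to sketch. The helper below is hypothetical (the name and the `temps` layout are my assumptions, not the linked code): it keeps only the station columns with at least `min_points` non-missing monthly values.

```r
# temps: months x stations matrix with NA for missing months (assumed layout).
keep_long_records <- function(temps, min_points = 1200) {
  n_obs <- colSums(!is.na(temps))             # count of real values per station
  temps[, n_obs >= min_points, drop = FALSE]  # drop = FALSE keeps a matrix
}
```

Counting non-missing values rather than the span from first to last month means a station with long gaps does not qualify just because it started early.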

Figure 5 - Urban gridded data from stations with at least 1200 points

The urban-only data in Figure 5 has an even steeper curve; you would expect this from longer series in this type of analysis. The temp rise since 1978 is about 1.2C. The rural 100 year curve is below.

Figure 6 Rural temperature stations with at least 1200 points

So the Rural stations show about 0.7C of warming since 1978. Visibly less warming than the urban stations by themselves. Also note the slight downtrend in recent years. Since the industrial revolution occurred a hundred years ago, it’s hard to imagine this curve is created by CO2. Still I’m not denying the heat capturing ability of CO2, just that the curves here don’t show a continuous warming but rather a short term recent spike.

So of course we should look at the difference between urban and rural stations.

Figure 7 - Difference between urban and rural data as identified by GHCN

Figure 8 - Difference between urban and rural data from GHCN stations with at least 100 yrs of data (1200 monthly points)
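A difference series like the ones in Figures 7 and 8 can be sketched as below. This is an illustration under assumptions: both inputs are monthly gridded anomaly series covering the same whole years, and the subtraction is done on annual means. The function name is made up.

```r
# urban, rural: monthly anomaly series of equal length, whole years only.
urban_minus_rural <- function(urban, rural) {
  stopifnot(length(urban) == length(rural), length(urban) %% 12 == 0)
  annual <- function(x) colMeans(matrix(x, nrow = 12), na.rm = TRUE)
  annual(urban) - annual(rural)   # one value per year
}
```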

Look at that curve! Despite the crudeness of the categorization of thermometers, there is a clear warming bias for big city data. The curve in Figure 8 ends at 0.6C difference. What’s more, the trend between the two looks statistically significant. If Phil Climategate Jones and Michael Marx Mann can choose which data they want to show and hide the rest, I think it’s only fair to choose to look at trends only from Figure 8 since 1978 (even though it won’t make much difference). After all, one hundred percent of global warming has apparently occurred since that time. Let’s do a simple significance test.

Figure 9 - Statistical significance between rural and urban stations

Whoa, it’s not even close: a trend of 0.12 against a no-trend null hypothesis limit of +0.04. The difference between urban and rural warming is as great as the entire trend in UAH data over the same time period.

Just how much trend do the ground stations show?
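If you want to run a similar check on your own difference series, a bare-bones version is an ordinary least-squares trend with a 95% confidence interval. Treat this as a sketch only: it reports the slope per decade (my choice of units) and it ignores autocorrelation, which widens the real interval, so it is not necessarily the test behind Figure 9.

```r
# y: annual values of the urban-minus-rural difference; year: matching years.
trend_per_decade <- function(y, year) {
  fit <- lm(y ~ year)
  ci <- confint(fit, "year", level = 0.95)
  c(trend = unname(coef(fit)["year"]) * 10,
    lower = ci[1, 1] * 10,
    upper = ci[1, 2] * 10)
}
```

If the interval from `lower` to `upper` excludes zero, the no-trend null is rejected at the 5% level under these simplified assumptions.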

Figure 10 - All urban stations different Y scale from 11

Figure 11 - All rural stations different Y scale from 10

Even Figure 11 is still greater than UAH and RSS satellite data but it’s one heck of a lot less than the urban stations. Of course we would be remiss to not mention that WUWT has taught us what rural stations often look like.

What could go wrong with sophisticated technology like that?

The R code for this post is here.

68 Responses to “CRU 3b – Urban Warm Bias in GHCN”

  1. BarryW said

    I’m wondering what a graph of MMTS sensor installation (like the one you show in the photo) would look like compared to when the temps started rising, since the cable runs required them to be closer to buildings.

  2. Jeff Id said

    #1 Interesting point. I’m interested in whether it’s related to the time integral of solar activity right now. If you integrate sunspots, the average activity launches upward since 1978. Now if it’s partially urban bias and partially sunspots, there isn’t much left for CO2, and that wouldn’t surprise me one bit.

  3. KevinM said

    Honesty is the only path.
    Thanks for not hiding.

    Have you dug into the adjustments built into the GHCN data, and do you plan to?

  4. Steve Fitzpatrick said

    Wow, just how much is Big Oil paying you, anyway?

    But seriously, very nice. That rural/urban difference is substantial; do I understand correctly that the urban station data has not been at all corrected for UHI?

    The rural-only gridded graph looks very much like the results from comrade Jones et al, save for the pre-1940 period. Hadley and GISS graphs indicate the early part of the 20th century was substantially (~0.2?) colder. It could be that “estimated/interpolated” temperatures used by Hadley and GISS for empty grid cells make a significant difference (assuming you are not including empty cells in the average).

  5. Jeff Id said

    #4 I’m averaging together all gridcells with values in them equally. Unfortunately this weighting method favors the NH so I plan to do an area weighted plot of the globe in one of the next posts.

    As far as my relationship with big oil, let’s just say the last check was big enough to purchase all the unobtanium (hole filler) futures on the global market. Don’t even try to buy any.

  6. carl said

    got similar results here too using single stations rural vs city

    urban heat island effect methinks and there simply is no good way to calculate or compensate for it with enough statistical accuracy when looking for tiny little temp changes with the quality of data available

    already well known that the majority of US sites do not meet NOAA/NWS specs, I would assume much of the rest of the world is similar

  7. George said

    Another point raised at WUWT has been the gradual replacement of whitewash with latex paint on Stevenson screens. I am guessing that would have happened starting in the 1970’s. And it would have had an interesting impact on the averages: as more and more stations are coated with IR transparent latex paint, the individual stations might show a very slight step change. But overall we would see more and more of these stations with the latex being added to the average, so the average might seem to increase in a linear fashion. Before all of the stations were coated with latex paint, they would then have begun to be replaced with the electronic devices, which would have to be moved closer to buildings, etc. So maybe this causes another slight step up that is difficult to see on any individual station, but in the average over time, as more of these are changed, you continue what looks like a linear ramp up. That is, until all the stations are done changing.

    Would plotting stations that still use only mechanical thermometers (can you determine that?) present a different trend?

  8. boballab said

    I have been doing a station by station look for ones near where I live, comparing GHCN “raw” to GHCN “output” to GISS “input” to GISS “output”. It’s fascinating to see the same temperature change 3 times from the observed reading. One little project is a little slow going: I’m going back to the paper records for a station, getting both the mean and the max/min readings.

  9. Luvmy91stang said

    Not to belabor the point, but your results are not reproducible without the stationid.txt data you are using. Could you link it or at least explain where you got it and what’s in it, including the structure?

  10. Jeff Id said

    #9 Very sorry about that. I forgot you asked and that it was required. It’s been amazingly busy at work so blogging has taken a back seat.

    The file includes all the CRU station id’s which are required for CRU replication and are not required for this post.

  11. turkeylurkey said

    Hey Jeff;

    I’m getting fairly good at plotting stuff in a geo-referenced way.

    I’m happy to plot anything you have that has lat/lon coordinates in it.

    I’m really curious how the stations get gridded out, especially when they have to reach outside the box to find a station.
    At some point, when you are on the topic, consider making a file that has the coordinates of the stations AND the coordinates of the ‘destination’ of where the data goes to. I can draw little vectors that show the ‘outreach’.

    Anyway, nice work…

  12. Jeff,

    You used GHCN *unadjusted* data? The plots are quite different from Eugene Zeien’s, here:

    He combined all data (unadjusted) from stations within a 1 X 1 degree sector, then averaged.

    Not sure what CRU does, but GISS uses unadjusted GHCN and then applies their own adjustments, getting an even steeper post-1975 trend than yours.

  13. Jeff Id said

    #12 His post is difficult to read, but it also seems to have a serious error. This post is correct in every way I can figure, and my posts do get beat to heck. Often, it’s difficult for me to read others work because it’s not expressed in clear code or methods. At this point, curves substantially different from these here IMVHO will require a bit of an explanation.

  14. tallbloke said

    Jeff, have you seen John Graham-Cumming’s several posts on the MET data and code?

    He found a bug in their Perl script and offers a fix.

    Thought for the day. If CRU don’t think UHI is important, they obviously don’t think the elevated co2 levels in cities cause any additional warming.

  15. greg2213 said

    #14: “Thought for the day. If CRU don’t think UHI is important, they obviously don’t think the elevated co2 levels in cities cause any additional warming.”

    To my layman’s eyes that looks like a Damn Good Point.

    Are there any charts which detail the elevated CO2 in the cities?

  16. Douglas Hoyt said

    My suggestion is to look at each grid box and select the station with the minimum temperature trend for 1979 to present. Hopefully, this procedure would be equivalent to using the most rural station in each grid box and would avoid most of the UHI effects.

  17. Peter Dunford said

    #12 #13: As I understand Eugene’s data, the 613 stations he uses are just those with a continuous 1900 to 2009 record, and are not gridded to a global average and location is not considered.

    Does “All data global temperature average” for figure 1 mean all 4495 stations?

    If so, Figure 6 in the previous post showed a downturn at the end of the series, since around 1980, using an average of all 4495 stations. How has gridding eliminated that? Or was that plot in error (I realise you said it didn’t matter)?

    Last question, is your gridding altering the weighting of the boxes depending on latitude?

  18. Jeff Id said

    #17, All means 4495 station ID codes, including their subgroups as well so there are something like 8000ish (I think) series.

    The boxes are weighted according to the cos of latitude.
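    For readers following along, cos-latitude weighting for one time step can be sketched like this. The function and argument names are made up for illustration, and the cell-centre latitudes are assumed to be in degrees.

```r
# cell_values: one anomaly per grid cell (NA = empty cell).
# cell_lat_deg: latitude of each cell centre in degrees.
area_weighted_mean <- function(cell_values, cell_lat_deg) {
  ok <- !is.na(cell_values)
  w <- cos(cell_lat_deg[ok] * pi / 180)   # cell area shrinks toward the poles
  sum(w * cell_values[ok]) / sum(w)
}
```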

    In my previous post, I tried to select the longest record for each gridbox. If it worked correctly, the result might have some meaning. Instead of figuring it out, I tried a variety of other methods which produce results very similar to those here. I need to go back now and see if a particular subset of each gridcell can produce the results of the previous post but it might just be an error somewhere that is difficult to find.

  19. vjones said

    Douglas Hoyt – I would concur. The maps available here may prove useful to anyone considering this. (Further information: )

  20. AMac said

    Unfortunately for us skeptics it looks like Figure 1 which is pretty similar to the CRU plot.

    No. No!

    Perhaps said tongue in cheek, but subtle humor often doesn’t come across as it should over Al Gore’s internet.

    Minor comment — Figures 10 and 11 have different Y-axes, making the post-1978 urban and rural trends look more similar than they actually are.

    Really nice work. And the quick response of #10 to Luvmy91stang’s request for data is a great example for others to follow. (I’ve been reading the 2004 McIntyre/Nature correspondence archived at Climate Audit, which repeatedly focuses on Mann’s refusal to release or explain the data used in MBH98.)

  21. Layman Lurker said

    Great work Jeff. It will be interesting to see what this looks like once you consider the NH bias. Aside from the trend difference, what strikes me is how much the HF wiggles of the rural stations match up with the sat patterns. The same comparison of sat to urban looks much more noisy by my eyeball method.

  22. Douglas Hoyt said

    Another suggestion: use larger equal area grid boxes. For latitude, use acos(0.0), acos(0.1), acos(0.2)…acos(1.0) as the dividing lines. For longitude, use 10 degree widths.
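    One way to read the acos dividing lines, as a sketch: treat acos(x) as the angle measured from the pole, so that equal steps in x give latitude bands of equal area in one hemisphere. That interpretation is an assumption, and the variable names are made up.

```r
# Angles from the pole for x = 0, 0.1, ..., 1 (90 degrees down to 0).
colat_edges <- acos(seq(0, 1, by = 0.1)) * 180 / pi
# The same dividing lines expressed as latitude (0 at the equator, 90 at the pole).
lat_edges <- 90 - colat_edges
# Band area on a sphere is proportional to the difference in sin(latitude).
band_area <- diff(sin(lat_edges * pi / 180))
```

    Each of the ten bands comes out with the same relative area, which is the point of the suggestion.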

  23. Jeff Id said

    #20, I wrote it with a smile to see who was reading carefully.

    #16, Would you be ok with CRU sorting the thermometers for the highest trend?

  24. Sonicfrog said

    Great post Jeff.

    There is one minor thing to consider revising. On figs 10 and 11, the size of the graphs is the same, but the scaling of the temps is different. To the casual climate web surfer, this makes the slope of the temp rise look the same. I would re-size Fig 10 so that the slope difference is visible when comparing the graphs.

    I hope this makes sense. I haven’t had breakfast yet, and I’m not sure if my words are coming out in a cogent manner…. I’m not even sure that made sense! 🙂

  25. Jeff Id said

    No time right now to fix scales. It would have been better had I done it before posting.

  26. Varco said

    Jeff, great work. This deserves wide publicity…

  27. Don B said

    Very, very nice work.

    #2 about looking at solar activity..

    A year or so ago I plotted a 3 year moving average of days of geomagnetic aa index greater than 60 – not very sophisticated – but the visual correlation between that and delayed global temps satisfied me that the sun was moderately important. 🙂

    What disappointed me was that (lagged) temperatures did not decline following the aa peak in the early 1990s. But, of course, your rural temperatures did decline.

  28. Hmmm said

    This urban bias will be watered down somewhat when we look at the global land + sea data, correct? (not that this bias should be dismissed of course)

    Makes me want to trust the satellite data only.

  29. Hmmm said

    You know, this got me thinking: the land stations and data get A LOT of attention. 70% of the surface is ocean though. Who is looking into this data? Not surprisingly they often have “sparse” data, even today. How many buoys were there in 1880???

    It appears that NOAA at one time had applied satellite data to fill modern gaps but has since stopped using satellite filler, citing a bias that caused problems for users. I couldn’t find out what this bias is on their website. I would love to know what the sea surface temps looked like in gapped grid cells, with and without the satellite filler.

    “Sea surface temperatures are determined using the extended reconstructed sea surface temperature (ERSST) analysis. ERSST uses the most recently available International Comprehensive Ocean-Atmosphere Data Set (ICOADS) and statistical methods that allow stable reconstruction using sparse data. The monthly analysis begins January 1854, but due to very sparse data, no global averages are computed before 1880. With more observations after 1880, the signal is stronger and more consistent over time.”

    “ERSST version 3b is currently used. ERSST version 3 improved upon version 2 in several ways: first, by changing the low-frequency tuning, effectively increasing the sensitivity to data prior to 1930; by internally handling sea ice calculations to increase the timeliness of the dataset; and by using satellite observations to increase data where in-situ measurements are sparse (Smith et al., 2008). In version 3b, the satellite observations were removed from the product because they were found to have introduced a bias that caused problems for many of our users. The bias was strongest in the middle and high latitude Southern Hemisphere where in-situ (ship and buoy) observations are sparse. More detailed information about the switch to version 3b.”

  30. Jeff Id said

    #27, I wrote to Anthony Watts to ask what the official position of the climatoknowledgists is on UHI b/c it’s a bit confusing for me, unfortunately he didn’t reply. If someone knows the answer I would appreciate it.

    I wasn’t at all disappointed with this result. If warming only occurs after 75, it’s hard to attribute very much of it to CO2. Also, if the warming can be so positively connected to UHI, the net result will be that all stations should be questioned as the surfacestations project will probably conclude. Even using their own rural/urban sorting results in a definitive result of UHI.

  31. Jeff,

    Have a look at Fig. 1 in
    LIMITS ON CO2 CLIMATE FORCING FROM RECENT TEMPERATURE DATA OF EARTH, by David H. Douglass, John R. Christy, Energy & Environment, Volume 20, Numbers 1-2, January 2009.

    Pretty clear that the CRU temps are biased high — assuming UAH got the sat. temps right.

    Also see

    — for an updated graph of UAH satellite temp data.

    Best wishes,
    Pete Tillman

  32. Jeff Id said

    #31 I agree that CRU is biased high. UAH isn’t perfect though either but I don’t know which way it goes.

    As to the second link, I like mine better.

  33. Kenneth Fritsch said

    I think it is important to remember that GHCN uses adjusted data where there is sufficient spatially adjacent data to do their adjustments for homogeneity by differencing temperature series and looking for change points (breaks) in the difference series. Where there is insufficient data (adjusted or unadjusted) to do the break point analyses, GHCN uses unadjusted data.

    GHCN uses change points for adjustments, including UHI effects, except for TOBS in the US, and they do it because they have insufficient trust in the meta data formerly used to make homogeneity adjustments. GHCN does not acknowledge a need for TOBS in the ROW – which I have not seen documented.

    Reading about GISS, I found that they do some rather subjective adjustments to data when they see change points.

  34. EdeF said

    Things to check further: There is a bit of scalability going on here. If you show the anomalies as temperatures in deg F or C at near full scale, the rural station data shows up as nearly flat. I am finding very few rural stations in California that have a pronounced increase in temperature. Are the rural stations really rural? Most of the small towns of the 1940s have grown dramatically in the last 70 years. It would be interesting to really scrub these rural stations to make sure they are well away from urban areas and have a smallish population base. A few large cities I have looked at on the west coast of Calif., where the wx station is right on the coast, have diminished UHI due to the constant on-shore breeze coming off of the Pacific. This effect dies off as you move inland. Since most of the temperature proxies are out in the mountains or countryside and reflect more of a rural temperature than the overall urban and rural temperature chart, expect to see a divergence between them. Just looking at dozens of rural temperature charts, I really do not see anything that alarms me.

  35. tty said

    What urban/rural classification do you use?

    The GHCN U=Urban/S=Small town/R=Rural or the brightness classification A/B/C?

    I’ve checked both for Sweden where I am familiar with most sites and the U/S/R is not very reliable, for example airports way out in the boondocks are classed as urban because they share name with a major city. The brightness classification is better, but also wrong sometimes because the coordinates are off.

  36. rob r said

    Consider this:

    In New Zealand there are now (2009-2010) only a few climate stations reporting to GHCN. Some of these have UHI issues and some have other local siting issues (as per the surface stations project).

    But as it happens there are dozens of rural climate stations throughout the Country that have very good longish records. Most of these sites do not report to GHCN. The data is freely available by signing up as a user of the National Climate Database that is maintained by NIWA (National Institute of Water and Atmospheric Research).

    But it’s not just average temperature that is available. The following monthly and annual data are available for many stations:

    01 Wet days (> 1mm rain) (month and year)
    02 Mean Air Temp (month and year)
    03 Mean Daily Max Air Temp (month and year)
    04 Mean Daily Min Air Temp (month and year)
    05 Mean Daily Grass Min Temp (month and year)
    06 Extreme Max Air Temp (Hottest measurement of the month/yr)
    07 Extreme Min Air Temp (Coldest measurement of the month/yr)
    08 Extreme Grass Min Temp (Often the coldest frost of the month)
    09 Total Sunshine (month and year)
    10 Mean 5cm Earth (Soil) Temp (month and year)
    11 Mean 10 cm Earth Temp (month and year)
    12 Mean 20 cm Earth Temp (month and year)
    13 Mean 30 cm Earth Temp (month and year)
    14 Mean 100 cm Earth Temp (month and year)
    42 Mean cloud amount (month and year)
    43 Lowest Max Air Temp (month and year)
    44 Highest Max Air Temp (month and year)
    62 Lowest Daily Min Temp (month and year)
    63 Highest Daily Min Temp (month and year)
    65 Mean of 9am Temp (month and year)

    Other data that are available include stats on wind, humidity, evaporation, soil moisture, heating degree-days etc.

    That’s rather a lot to get one’s head around. Some stations don’t have all the data but some have most of it.

    The soil temps are interesting as these can potentially be used to verify local trends in the air temp. Also, when one starts to get down to this level of detail it becomes more apparent which sites are truly rural and which are not. Even for the urban sites one can begin to detect whether the UHI effect is increasing or whether it has stabilised (by comparison with nearby rural stations).

    I suspect that this is at least part of the answer to Doug Hoyt’s rather “tongue in cheek” comment on station selection.

    It is likely that most other “westernised” countries have archives of this type of data. What is perhaps needed is the compilation of a more comprehensive open-source global climate database containing an array of different datatypes. Then it would be possible to get some genuine climate research done. (Rather than the sloppy secret squirrel stuff we have been forced to accept so far).

    Anyway, keep up the good work Jeff. I always enjoy your postings.

  37. […] Reposted from The Air Vent […]

  38. David W said

    Jeff, any idea as to why WUWT seem to have pulled this from their website?

  39. Jeff Id said

    #38, It’s going to run at 6am tomorrow.

  40. Geoff Sherrington said

    Hi Jeff,

    The shape similarity of figs 3 and 4 suggests that most rural stations became urban stations by population encroachment. Therefore subtraction of rural from urban (on a year by year basis with no lag) does not reveal a true magnitude, but a reduced one. The true magnitude, by measurement, can be rather more – transient events of 10 deg C have been reported (lit cit).

    Hypotheticals. (1) Urban stations reach a maximum UHI temperature difference once they exceed a threshold population. The in-city met station does not get a much warmer peak, since the warmth is merely spread over more hours each day. (2) Most urban stations maxed out by 1950. (3) Many, if not most rural stations started to get UHI effects by 1980. (4) Overall, the UHI temp effect is still increasing for rural stations (although there are some with no UHI because of local siting). Thus, year-on-year subtraction understates.

    Observation: Truly rural seaside stations in Oz that I studied in a small set show no discernable temp increase since 1968, when I started the study. This observation needs to be explained in any analysis.

    Rough solution. With your data, add a time axis lag before doing subtraction of rural from urban. e.g. On the rural fig 6, make year 1900 equal to year 1930 etc, then subtract from urban fig 5. Best to stick with data with 100 year records as you have done later. This is rough and ready, but I think it will give a better magnitude for recent city UHI.
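    The lagged subtraction described here can be sketched as follows. The function name, the 30-year default and the annual inputs are assumptions for illustration; years without a lagged rural value come out as NA.

```r
# urban, rural: annual series aligned with `years` (same length, same order).
lagged_difference <- function(urban, rural, years, lag_years = 30) {
  # Position of the rural value from lag_years earlier (NA if out of range),
  # e.g. rural 1900 is paired with urban 1930.
  idx <- match(years - lag_years, years)
  data.frame(year = years, diff = urban - rural[idx])
}
```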

    UHI in my home town of Melbourne has been studied at least at ; and ; and ; and

    In support of some of the above comments on rural localities, one of the papers references in passing the farming town of Deniliquin, present population 8,000, lat 35.5S, long 145E, thus:

    • In February 1995, on clear and calm nights, the town centre can be up to 4.2 deg C warmer than the airport.

    So, not just Tmax and Tmin should be considered, but the daily profile of each. Shame we do not have this in historical data.

    Caveat: Nothing I have written here endorses the accuracy of your data sources which remain open to many questions.

  41. Jeff Id said

    #40 “Therefore subtraction of rural from urban (on a year by year basis with no lag) does not reveal a true magnitude, but a reduced one.”

    Exactly right IMHO. You should consider a decent writeup on your findings. I might know a blog owner who I have some sway with.

  42. Geoff Sherrington said

    Re 41 Jeff

    My writeup is now over a year old and we have moved on. As to your hint that you might just know a blog owner …. Why, I could never have that suggestion appear next to my name if emails were made public.

    Seriously, I have more work to do before going public, if people accept my submission. My data set is too small and I cannot be sure of its provenance. I’m working towards getting really raw data, but it’s not easy. The big unknown at the end will be meta data, like precisely where the met station was positioned re the town in each year.

  43. […] an excellent post appeared on Air Vent in which the author developed R code to read and analyze […]

  44. Espen said

    Very good work! Eyeballing figure 8, it seems to me that it’s quite probable that we’re already on our way down from a local maximum that was reached around 2004-2005. Further, the difference between that maximum and the previous local maximum in 1940 is about 0.4C. I have guesstimated a value in that range before, seems very likely to me.

    So we’re looking at 0.4C/65 years which could be explained by CO2 – but it could also still have other explanations, e.g. that there may be fluctuations with much longer “wavelengths” in addition to the ~65 year cycle (if there is any regularity to long-term temperature fluctuations at all, of course – I’m not quite convinced there is).

    (BTW – I’d rather have +40C than +0.4C – far too cold here now ;-))

  45. Espen said

    Correction: I wrote “Eyeballing figure 8”, I meant Figure 6: “Rural temperature stations with at least 1200 points”

  46. Kenneth Fritsch said

    Jeff ID, in my continuing analyses of the GHCN station data, I broke down the historical AL and Adjusted GHCN series for WMO and Near WMO stations. I have two graphs linked below that show these results. The R code is listed below.

    Some comments are necessary here:

    1. I have not been able to find, in any detail, what exactly a WMO station means other than that the WMO receives data and meta data from these stations.
    2. This quest becomes more important since I found that GHCN has only reported temperatures from WMO stations over the past 3 years, i.e. no more Near WMO stations (see the linked graphs). I am currently looking for an explanation of this development.
    3. The Near WMO stations are mainly stations located very near WMO stations, but not all are. A few have no “mother” WMO station. Notice that over time the GHCN total stations are approximately evenly divided between WMO and Near WMO stations. In effect this means a lot of rather redundant temperature data by very close proximity stations makes the station numbers more impressive than they really are in spatial coverage.
    4. My next analysis will be to look at how many stations are within x distance (degrees of latitude and longitude) of other stations and how that metric has changed over time. After that I want to see how much stations in very close proximity vary in temperature trends over various periods of time.

    Index=grep("Near WMO",TabAllGHCN[,10])
    CumStat= c(StatYear[1,1]:StatYear[1,2])
    for(i in 2:length(StatYear[,1])){

    WMO= TabAllGHCN[Index3,]
    CumStat= c(StatYear[1,1]:StatYear[1,2])
    for(i in 2:length(StatYear[,1])){

    #Do same for Adj GHCN:

  47. magicjava said

    You may have already known this, but, yes, the IPCC claims global warming has only been occurring since 1975.

    And since a “climate cycle” is 30 years long, that means we have a sample size of 1.5 in our discussions about the global warming. In the full sample size there was warming. In the second 0.5 sample size (the last 12 or so years) there’s been no warming.

    At a high level, that represents the sum of our empirical knowledge about global warming.

  48. Phil A said

    “If warming only occurs after 75, it’s hard to attribute very much of it to CO2.”

    As far as I understand the AGW argument it runs something like “CO2 has been warming us since the 1920s but the warming trend was overmastered by atmospheric pollutants from 1945-1972 (or so) until our industry was cleaned up and the warming could resume.”

    Given that Eastern Europe and much of Asia continued polluting (and probably still do in many places) well after the 1970s, I find this “convenient” theory dubious, not least because it fails to explain the similar decline in the early 20th century before the warming of the 20s/30s.

  49. Geoff Sherrington said

    Kenneth Fritsch said
    January 7, 2010 at 3:20 pm

    It might vary with country, but there are examples where the WMO stays the same over a term of several met station shifts.

    The Australian BOM has this explanation, which might not be so useful:

    The Bureau of Meteorology station number uniquely specifies a station and is not intended to change over
    time, although on very rare occasions a station number may change or be deleted from the record (usually
    to correct an error). Generally a new station number is established if an existing station changes in a way
    that would affect the climate data record for that site (measured in terms of air temperature and precipitation).
    Significant station moves are an example of this.
    Some stations also possess a World Meteorological Organization (WMO) station number.

    The WMO number is
    different to the Bureau of Meteorology number. It also uniquely specifies a station at any given time but
    can be reassigned to another station if the new station takes priority in the global reporting network.
    Only selected stations will have a WMO number. Significant stations may maintain their WMO number for
    many decades.

  50. Geoff Sherrington said

    More – I presume you have looked at

    There are discrepancies. The famous Darwin has WMO # 94120, but this is not on Aust meta data online sheets I have looked at and it covers about half a dozen sensor shifts, one of about 15 km.

    Good luck with your distance project – admirable persistence. There is a whole story yet to unravel about the correlation versus separation metrics of sub-daily, daily, weekly etc averages, including uncertainty terms.

  51. […] CRU 3b – Urban Warm Bias in GHCN So I’ve learned a great deal playing around with the GHCN data, I think this is a reasonably significant post. Ya […] […]

  52. avfukta nu said


    sorry for posting somewhat off topic, but…

    wouldn’t the “law of great numbers” lead us to guess that adjustments would center on zero, except for the case when we have a clear overall bias such as industrialisation/urbanisation of large areas? When I read that in total adjustments add significant warming trends I become suspicious. If it should have any trend, it should be negative to cancel for urbanisation, the only large scale cause for adjustment with a clear trend in itself? I find the odds for time of observation etc exposing such a clear trend to be low. Are you aware of any published rationale for adjustments in total being positive?

  53. j ferguson said

    #46 4. “My next analysis will be to look at the how many stations are within x distance (degrees of latitude and longitude) of other stations and how that metric has changed over time. After that I want to see how much stations in very close proximity vary in temperature trends over various periods of time.”

    Mr. Fritsch, is this where you might detect station migration bias discovered, displayed and discussed at length by E.M. Smith at chiefio?

  54. j ferguson said

    Re: #53, to be more clear, “the southward migration of the centroid of the population of stations included in the GHCN report.” In other words, the replacement of cold stations with likely warmer stations over recent years.

  55. j ferguson said

    re #54. I should spend more time waking up before wasting all your time with incoherent babble. southward migration of the centroid would be in the Northern hemisphere. sorry.

  56. Kenneth Fritsch said

    Jeff ID, I determined the number of stations per 5 x 5 degree grid for the GHCN ALL station series. This is the number of stations integrated over the entire time period of the series with at least 10 years worth of data and thus gives an overly optimistic picture of the spatial coverage of the GHCN series. To do this in detail I need to go back and choose some appropriate time periods for coverage. The R code is listed below and the graph and table summarizing my results are linked here:

    My purpose here was to show that many 5 x 5 grids have only a single station over time (29.9%) and then ask the question of how we can determine the uncertainty of using a single station to represent this grid area. The results are listed in the graph which shows the cumulative percentage of the stations that are located in the grids at rates of 1 per grid and 2 on up to 100 per grid. The graph and the table show that many grids have few stations while a few have many and as many as 100. My thoughts at this time are that those grids with many stations (providing they have sufficiently long histories and adjusted temperatures) can be used to determine the variability we can expect from using any given number of stations to average for a grid temperature. I want to look at temperature anomaly trends. I may want to look also at unadjusted stations as the homogeneity adjustments could have the effect of decreasing the grid variability.

    I found the total number of 5 x 5 grids covered in the GHCN All station series was 891. Since there are 36×72 or 2592 total grids for the globe and the land area to total global area is 29%, we have a favorable comparison of 891/2592 or 34%, and particularly if you consider land to sea overlap at the boundaries with these land grids partly in the sea. The grids are designated by a latitude and longitude value and the area enclosed is given by, for example, latitude 85 is >85 to 90.

    Lon= as.numeric(TabAllGHCN[,3])
    LonG= trunc(Lon/5)*5
    LatLonG=cbind(LatG,LonG) [,1],Logitude=LatLonG[,2]))
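As a rough sketch of the per-grid station count described above (not a reconstruction of the code in this comment), the following assigns stations to 5 x 5 degree cells and tabulates how many cells hold 1, 2, 3... stations. The coordinates are made up, and the use of `floor()` is an assumption: `trunc()` rounds toward zero, so coordinates just either side of the equator or prime meridian would share one cell unless sign is handled separately.

```r
# Hypothetical station coordinates in degrees (negative = south/west).
lat <- c(41.8, 42.3, 43.9, -2.5, 2.5, 87.0)
lon <- c(-87.6, -88.1, -89.4, -60.2, -60.9, 20.0)

# floor() gives the lower edge of each 5 x 5 cell on both sides of zero.
# trunc() rounds toward zero, so -2.5 and 2.5 would both land in cell 0.
lat_g <- floor(lat / 5) * 5
lon_g <- floor(lon / 5) * 5

cell           <- paste(lat_g, lon_g)
per_cell       <- table(cell)        # stations in each occupied cell
cells_by_count <- table(per_cell)    # how many cells hold 1, 2, 3... stations
pct_single     <- 100 * sum(per_cell == 1) / length(per_cell)

# Total number of 5 x 5 cells on the globe: 36 latitude bands x 72 longitude bands.
n_cells_globe <- (180 / 5) * (360 / 5)

print(cells_by_count)
print(pct_single)
print(n_cells_globe)
```

With this convention a cell labelled 85 covers 85 up to (but not including) 90, which differs slightly at the edges from the ">85 to 90" convention stated above.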

  57. Jeff Id said

    #56, That’s an interesting post. I don’t have time to write it up but people should check out just how much coverage we actually have. One project rolling around my little brain is a video of the whole earth depicting active temperature stations over time.

  58. Benjamin said

    It would appear to me that Jeff Id’s analysis confirms that Hansen’s crew is doing a pretty good job of adjusting for the UHI effect in their GISTEMP product.

    Looking at numbers 1978-present, Jeff Id found a decadal trend of .20C/decade for rural stations. GISTEMP land only product over that period shows a decadal trend of .18C/decade.


  59. Jeff Id said

    #58, Nope, this post shows the difference between urban and rural temperatures. By simple gridding the trend is heavily weighted to the northern hemisphere, which has more warming. Pay no attention to the trend, only the difference in trend.

  60. Kenneth Fritsch said

    Jeff ID @ #57:

    I was just going to suggest the same video. You know when you attempt to summarize and show what is important for spatial and temporal coverage of these stations it gets a little complicated. A light show video would show it, but I need a snapshot (or several snapshots) to really digest what that coverage means.

  61. vjones said

    #57 Jeff,
    That would certainly be very instructive and it is one of the things KevinS hopes to do with the data he is working on here. At the moment it is in database format and interactive maps, but one of the original ideas was to have the maps show changes with time.

  62. vjones said

    #46 After that I want to see how much stations in very close proximity vary in temperature trends over various periods of time.
    That IMO is one key area for showing how flawed much of the current analysis is. Commenter “lws (06:46:41)” over at this thread on WUWT pointed out three such stations in Dallas (although the point made was about UHI).
    The maps mentioned in comment #61 above might suggest others as the stations are color-coded by trend and clicking on them opens a graph of the data.

  63. Luvmy91stang said


    On line 3 of your script you’re reading from the clipboard.


    What is supposed to be on the clipboard and why not just read it from a file?

  64. Geoff Sherrington said

    Benjamin said
    January 8, 2010 at 3:26 pm

    “It would appear to me that Jeff Id’s analysis confirms that Hansen’s crew is doing a pretty good job of adjusting for the UHI effect in their GISTEMP product.”

    In the broader sense, without invoking mechanisms, this might be a plausible interpretation of the graphs. However, about half of the Australian rural stations I have studied show no temperature increase at all since I started looking in 1968. Mechanism therefore becomes important.

    Unless you can explain the mechanism that allows many stations to show essentially no temp change in 40 years, it is perhaps premature to suggest that a mathematical (as opposed to a physical) method is superior.

  65. Jeff Id said

    #63, Sorry about that, I was testing some of the breakpoint algorithms in R. The lines do nothing.

    #64, I’m starting to see that I shouldn’t have published this post until I had all the area weighting done right. In this version, Australia got a very low weight; the trend was intended only to show the difference between urban and rural. If there is time, I’ll do an area weighted plot this weekend which, due to the low trend of the SH, will reduce the global trend substantially.
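For readers wondering what area weighting changes, here is a minimal sketch with made-up numbers; it is not the weighting used for the plots in this post. It assumes each grid cell already has a trend and weights cells by the cosine of their central latitude, since fixed-size latitude/longitude cells cover less area toward the poles.

```r
# Hypothetical grid-cell trends (deg C/decade) keyed by the cell's
# central latitude; values are made up for illustration.
cells <- data.frame(
  lat_mid = c(62.5, 42.5, 2.5, -27.5),
  trend   = c(0.30, 0.25, 0.10, 0.05)
)

# The area of a fixed-size lat/lon cell shrinks roughly with cos(latitude).
w <- cos(cells$lat_mid * pi / 180)

simple_mean   <- mean(cells$trend)               # every cell counts equally
weighted_mean <- weighted.mean(cells$trend, w)   # cells count by area

print(c(simple = simple_mean, weighted = weighted_mean))
```

In this toy case the high-latitude cells carry the largest trends, so weighting by area lowers the average. Note that this only weights occupied cells; it does nothing about cells with no stations at all.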

  66. […] here, on Jeff Id’s blog, you will find the “R” scripts and the dataset of the series he used… […]

  67. Kenneth Fritsch said

    Jeff ID, I have extracted the following time period station distributions on 5 x 5 degree grids for the GHCN Adjusted Station series (the R code is listed at the bottom of my post):

    1. Stations with no time period restrictions except the standard one of having at least 10 years worth of data.
    2. Stations that covered at least the time period 1950-1990.
    3. Stations that covered at least the time period 1900-1990.
    4. Stations that covered at least the time period 1900-2009.

    I chose these time periods as snapshots of the period that would be of most interest for the effects of AGW on temperature trends. I did not go back further than 1900 because, in my view, the further back the data go, the larger the uncertainties (not that more recent data cannot have them too). I wanted to see how the station distribution thinned out as the time periods of interest increased. What is of interest here are the uncertainties that arise from stringing together short segmented temperature time series to come up with an average temperature for the 5 x 5 grid. I am not at all certain that that uncertainty is well acknowledged or accounted for by the temperature set owners.

    I also am concentrating here on the adjusted stations as the owners go, obviously, to great lengths to provide these adjustments and inform how necessary the adjustments are. As I discuss my findings below, it will become obvious why GHCN states that they use unadjusted data to make their first difference homogeneity adjustments – they run short on adjusted stations.

    I have linked four tables below that summarize the data for the stations meeting the time series lengths given above. Remember that the number of grids occupied with GHCN All stations was 891 and that number agreed reasonably well with the expected number given the land to global area ratio and the grids that would contain both land and sea at the boundaries. Actually a plot of the GHCN All stations on global coordinates shows the outlines of the various continents.

    For the adjusted stations with no time restrictions, 713 of the 5 x 5 grids have stations, indicating that something on the order of 25% of the global land 5 x 5 grids do not have stations with adjusted data. The percentage of grids with 1 or a small number of stations per grid goes up from what we saw with the GHCN All station distribution, with 1 station per grid occurring on 35% of the populated grids, i.e. grids that have stations with adjusted data. The percentages for 2, 3, 4 and 5 stations are 14%, 11%, 7% and 6%, respectively. There are 19 grids with 40 or more stations per grid and it is these grids that can be used to determine variation within a grid area.

    The 1950-1990 time period minimum has 539 grids occupied, with 43% of the occupied grids having only 1 station per grid; for 2, 3, 4 and 5 the percentages are 20%, 10%, 6% and 2%, respectively. There are 12 grids with 40 or more stations per grid.

    For the longer 1900-1990 minimum period there are 270 occupied grids, with 49% having 1 station per grid and 19% and 7% having 2 and 3 stations per grid. There are 6 grids with 40 or more stations per grid.

    For the longest period, 1900-2009, there are only 67 grids occupied (out of a total of 890 or so possible); those with 1 station per grid make up 90%, and no grid contains as many as 3 stations (1 grid does).

    The next step is to measure the within grid variation in temperature trends. If those trends correlate very well, the use of sparsely covered longer segment time periods could be used without introducing large uncertainties in grid averages. If this is not the case then the next step would be to determine the uncertainty of splicing shorter segments together and the availability of shorter segments for splicing.

    R code for GHCN Adjusted with all stations having 1900_2009 time period:

    Lon= as.numeric(TabAdjGHCN[,3])
    LonG= trunc(Lon/5)*5
    Stp= as.numeric(StatT[,4])
    for(i in 1:n){
    if(Srt[i]2008) StatT1[i] =”T” else StatT1[i]=”F”
    write.csv(Stat1900_2009,file=”Stat1900_2009″) [,1],Longitude=Stat1900_2009[,2]))
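The thinning of grid coverage as the required time period lengthens can be illustrated with a small sketch. Everything here is hypothetical (the `inv` table, its columns, and the helpers `covers` and `occupied_cells`), and it assumes "covers a period" simply means the first and last data years bracket it, ignoring gaps inside the record.

```r
# Hypothetical station inventory: first and last year with data,
# plus the lower-left corner of each station's 5 x 5 cell.
inv <- data.frame(
  id    = c("s1", "s2", "s3", "s4", "s5"),
  start = c(1895, 1948, 1899, 1920, 1905),
  stop  = c(2009, 1995, 1992, 2009, 2009),
  lat_g = c(40, 40, 40, 45, 45),
  lon_g = c(-90, -90, -90, -90, -90)
)

# TRUE for stations whose record spans at least from..to.
covers <- function(inv, from, to) inv$start <= from & inv$stop >= to

# Number of distinct cells that still hold a qualifying station.
occupied_cells <- function(inv, from, to) {
  keep <- inv[covers(inv, from, to), ]
  nrow(unique(keep[, c("lat_g", "lon_g")]))
}

periods <- list(c(1950, 1990), c(1900, 1990), c(1900, 2009))
n_occupied <- sapply(periods, function(p) occupied_cells(inv, p[1], p[2]))
print(n_occupied)
```

Applying the same two helpers to a real inventory would give the occupied-grid counts for each period, provided the start and stop years are available per station.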

  68. Kenneth Fritsch said

    Jeff ID, I have now in place a rather automated method of obtaining information from KNMI and manipulating it in R and Excel to determine the variations in temperature trends over various periods of time for individual stations in 5 x 5 degree grids used by GHCN. KNMI is linked here: .

    I constructed graphs of the station data and calculated breakpoints. I will be happy to provide the R code and methods to anyone who might be interested in doing some of these calculations.
    I started with the grid which happens to include my home address in Illinois and is located at >40N to =45N and >85W to =90W. I have started my analyses with only those stations that have extended and complete data and used adjusted data. In this case I used the period 1900-2005 for the GHCN adjusted temperature series (from KNMI) and found 19 stations in this grid that met that criterion.

    My results are summarized in a table of the trend slopes for the 19 stations, the SE of the slopes, the probability that the slope was zero by chance and the AR1 auto correlations of the residuals for regressing temperature anomaly against time (years). Some example of the temperature series and breakpoints are given in the linked graph.
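For anyone wanting to reproduce this kind of table for a single station, a minimal sketch using only base R might look like the following. The series is synthetic, not KNMI or GHCN data, and the variable names are assumptions made for illustration.

```r
# Synthetic annual anomalies for one hypothetical station (not real data).
set.seed(1)
year    <- 1900:2005
anomaly <- 0.006 * (year - 1900) + rnorm(length(year), sd = 0.5)

# Ordinary least squares regression of anomaly on year.
fit   <- lm(anomaly ~ year)
coefs <- summary(fit)$coefficients

slope_century <- coefs["year", "Estimate"] * 100     # deg C per century
se_century    <- coefs["year", "Std. Error"] * 100
p_value       <- coefs["year", "Pr(>|t|)"]           # test of zero slope

# Lag-1 autocorrelation of the regression residuals.
ar1 <- acf(resid(fit), lag.max = 1, plot = FALSE)$acf[2]

print(round(c(slope = slope_century, se = se_century,
              p = p_value, ar1 = ar1), 3))
```

The standard error and p-value here come straight from ordinary least squares and are not adjusted for autocorrelation, so a large AR1 value would mean they are too optimistic.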

    Before discussing the results of my analysis I want to point to some issues I had in downloading data from KNMI and ask the question whether I might better go directly to using GHCN data as Jeff ID has done in his posts here and whether I can look at individual stations. I was able to manipulate the station data from KNMI in R but I ran into a problem of having to do some of the radio button (correct terminology??) pushing manually. This action did not significantly slow the overall process but it was not as elegant as writing a program to run start to finish without human intervention. I recall Steve M running into a similar problem using KNMI for auto downloads and I do remember that he overcame that problem.

    The discussion of the results is best started with inspection of the graphs of the station temperature time series and breakpoints. First I found that 6 of the 19 station series had at least one breakpoint. One station had 3 and another had 2 breakpoints. Breakpoints are determined based on statistical significance, but a visual inspection of the graphs shows a break in most at around the year 1957. Based on that observation I used the time periods for my trend calculations (see linked table) of 1900-1957, 1958-2005 and 1900-2005. I also found breakpoints at 1997 for 4 stations, and further visual inspection shows the same (though not significant) break in most of the series.
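The breakpoint method used for these graphs is not shown in this comment, so the following is only a stand-in to illustrate the idea: a scan for the single shift in mean that best splits a series, using base R on synthetic data. It does not test statistical significance and finds at most one break.

```r
# Synthetic series with a step change after 1957 (illustration only).
set.seed(2)
year <- 1900:2005
x    <- ifelse(year > 1957, 1.0, 0) + rnorm(length(year), sd = 0.2)

# For each candidate split, fit separate means before and after and
# record the residual sum of squares; the best split minimises it.
min_seg <- 10                              # shortest segment allowed
cand    <- min_seg:(length(x) - min_seg)
rss <- sapply(cand, function(k) {
  left  <- x[1:k]
  right <- x[(k + 1):length(x)]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
})
break_year <- year[cand[which.min(rss)]]   # last year of the first segment
print(break_year)
```

A dedicated package such as strucchange handles multiple breaks and significance testing, which this sketch does not attempt.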

    So what are these features that appear rather consistently in most series telling us? I think it shows that the general changes in the climate, that these breaks indicate, are occurring across the 5 x 5 grid area. It should be remembered that I am working here with adjusted data and theoretically adjustments for non-climate non homogeneities have already been made. That adjustment might also decrease the trend variations that I will discuss below. We can also probably conclude that features occurring in most or all stations in an area such as this one are from climate and not station non homogeneities. Having said that about the station feature commonality, if one looks more closely at the segmented trends, they appear different from station to station. That observation is borne out in the discussion below of the trend data.

    The table summary can be further summarized here by noting that from 1900-1957 the range of trends is from 0.1 to 2.4 degrees per century (mean = 1.44 and Stdev = 0.70) with an approximate +/- 1.4 on most station trends, statistically significant trends for 10 stations and relatively small AR1 on the regression residuals of 0 to 0.26. For 1958-2005 the range of trends is from -0.3 to 3.5 (mean = 1.77 and Stdev = 0.92) with an approximate +/- of 1.8, significant trends for 8 stations and AR1 from 0.0 to 0.14. Finally, for the entire period from 1900-2005 the range of trends is from 0.1 to 1.4 (mean = 0.64 and Stdev = 0.36) with an approximate +/- of 0.54, significant trends for 13 stations.

    The trends summarized above, in my mind, show considerable variation and that further shows that climate can evidently be very local. This situation, in turn, adds to the problem of allocating, with good certainty, an average temperature (trend) to a 5 x 5 degree grid with only a few stations. Remember that I have used adjusted data that may already be averaged to some extent by that adjustment for non homogeneities. Grids do have other adjusted data that comes by way of station data that covers only part of the extended time periods of interest. I am curious at this point how homogeneity adjustments are made for short lived stations as the calculation of breakpoints (change points) depends on having extended series.

    I plan to continue to do more of these analyses using grids that contain a reasonably large number of stations for making comparisons.
