Hot News for Temperatures

Anthony Watts made a rather extraordinary announcement on his blog WUWT (AKA the center of the internet).   It has the potential to initiate a necessary change to climate science at its foundation, because if he is correct in his assertion, measured warming trends in the US, and ostensibly globally, have been overstated.  I see his claims as revolutionary, which is a pretty strong word, because they have the potential to change much of our understanding of global warming science.  To make my case, let’s start first with what climate science doesn’t dispute:

Adjustments to trends:

Click for graph source – Source Data: NOAA USHCN V2.5 data


WARNING –  While climate science created these well-known corrections, these are the same adjustments whose existence Lewandowsky labeled me a conspiracy theorist for acknowledging, in a published psychology paper. Be careful in discussing this NOAA-generated data, as it might get you diagnosed with a personality disorder in a “highly regarded” international psychology journal… or maybe even earn you a tax audit…


An accurate plot of adjustments from Nick Stokes from comments below:

The Y axis is actually Deg C rather than F
A link to Nick’s post and code to generate the graph is here:



More seriously, these are known adjustments to the thermometer record deemed necessary by climate science in order to accurately depict US temperatures.  They come straight from the US thermometer data, right from the USHCN website.  The adjustments may be accurate and necessary, and after the thermometers are corrected, they are held out by climate science as an excellent representation of actual temperature trends.  Until the last few years, though, we had no true knowledge of how accurate the corrected trends were.  The corrections often seem quite reasonable, yet there is some conflict with satellite and radiosonde (balloon) measurements.   I’ve always been uncertain of their veracity.
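For the curious, curves like these are easy to reproduce. Here is a minimal sketch of how a “Final minus Raw” adjustment plot is built; the numbers and the station-to-year layout are made up for illustration, and the real USHCN files use a different format:

```python
# Sketch of a "Final minus Raw" adjustment curve. Station data is held as
# station -> {year: temperature in deg C}; all values here are invented.
raw = {
    "A": {2000: 10.0, 2001: 10.1, 2002: 10.2},
    "B": {2000: 20.0, 2001: 20.1, 2002: 20.2},
}
final = {
    "A": {2000: 10.2, 2001: 10.4, 2002: 10.6},
    "B": {2000: 20.1, 2001: 20.3, 2002: 20.5},
}

def adjustment_by_year(raw, final):
    """Mean (final - raw) per year, using only stations with both values."""
    years = sorted({y for series in raw.values() for y in series})
    out = {}
    for y in years:
        # Restrict each year to stations reporting BOTH raw and final --
        # differencing averages of two different station sets is what
        # produces endpoint artifacts.
        pairs = [(final[s][y], raw[s][y])
                 for s in raw
                 if y in raw[s] and s in final and y in final[s]]
        out[y] = sum(f - r for f, r in pairs) / len(pairs)
    return out

for year, adj in adjustment_by_year(raw, final).items():
    print(year, round(adj, 2))  # adjustment grows from ~0.15 to ~0.35 deg C
```

The important detail is that per-year pairing of raw and final values; averaging the two sets independently over whatever stations happen to report is a common way such plots go wrong.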

On other matters, we also know with certainty that climate models run too hot when compared to these adjusted observations.  That said, some of the deeply ensconced climate-alarmist types in the mainstream of the field have still failed to admit what is painfully obvious at this point, while other mainstream types have moved off message to make corrections to the models.  Basically, my own really obvious “certainty” is still being argued with, in ridiculous fashion, in some die-hard corners of the climate science field.


This graph above is from Dr. Roy Spencer’s blog and it shows the trends of ground temps vs modeled temps.   In fairness, some of the more vocal climate scientists didn’t like this plot because of the start point Dr. Spencer chose, but the argument they made is complete nonsense, as the SLOPE of the observations is the key, and it is statistically much lower than the slope of the modeled data.  Quite a few papers have now been published stating this well-known fact in more statistically complete fashion, so this graph is by no means a stand-alone article to be critiqued out of existence by an inconvenient starting point.  I often say that stats usually just tell you what you can already see in the data, and normal people see models running ahead of observations.

I used this version for my argument today because Roy’s plot includes both surface temperatures (HadCRUT4) and satellite lower-troposphere temperatures (UAH).  It is important because the satellite temperatures are lagging (slightly) behind the ground thermometer observations, and the land-based portion of the surface thermometers is the focus of Anthony Watts’s research.   The following statement and graph really caught my attention and are what prompted this post.  If you are reading this blog, they will probably catch yours. In particular, note the following statement and the 3 subtitles in the plot below:

Our findings show that trend is indeed affected, not only by siting, but also by adjustments:




Biasing the record

Why is that a big deal?    Because it is extraordinary to find a statistically differentiable signal difference in a large group of temperature stations.  I need to preface that statement: stations that are not pre-selected for factors which would knowingly bias their record.   What Anthony Watts et al. have done is rank temperature stations by pre-defined criteria, for the singular purpose of comparing data having different levels of human or environmental influence over their record.   Anthony Watts biased the record by separating high- and low-quality stations!

So let’s consider for a moment what this says…. Stations with minimal influence (Class 1 & 2) show a much lower trend than stations with known influences.   The difference is extreme: they show half the trend of the temperature-corrected result.   That claim alone is, frankly, huge.

Nick Stokes, a known skeptic abuser and technically adept blogger, wrote what I found to be a very compelling post that showed historic temperatures for the entire globe can be estimated reliably by as few as 60 temperature stations.  Although the post and math were very cool, the result isn’t technically that surprising.   What it shows quite clearly though, is that no matter which 60 stations you choose, you end up with very similar results.
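Nick’s subset-invariance result can be illustrated with a toy calculation. This uses synthetic data, not Nick’s actual code or the GHCN records: give a few hundred hypothetical stations one shared underlying trend, and the slope recovered from a random 60 of them comes out essentially the same as from all of them.

```python
import random

random.seed(0)

# Toy model: 500 hypothetical stations sharing a common 0.02 deg C/yr
# trend, each with its own local climatology and independent yearly noise.
years = list(range(1900, 2001))
true_trend = 0.02

def make_station():
    offset = random.uniform(-10.0, 25.0)  # local climatology, deg C
    return [offset + true_trend * (y - 1900) + random.gauss(0, 0.5)
            for y in years]

stations = [make_station() for _ in range(500)]

def anomalies(series):
    base = sum(series[:30]) / 30  # 1900-1929 baseline
    return [t - base for t in series]

def ols_slope(series):
    # Ordinary least-squares slope, in deg C per year.
    n = len(series)
    xbar = (n - 1) / 2
    ybar = sum(series) / n
    num = sum((i - xbar) * (t - ybar) for i, t in enumerate(series))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

def mean_anomaly(subset):
    cols = zip(*(anomalies(s) for s in subset))
    return [sum(col) / len(col) for col in cols]

all_trend = ols_slope(mean_anomaly(stations))
sub_trend = ols_slope(mean_anomaly(random.sample(stations, 60)))
print(round(all_trend, 3), round(sub_trend, 3))  # both near 0.02
```

The anomaly step is what makes the subsets interchangeable: it removes each station’s local climatology so the shared trend dominates the average.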

Except apparently when Anthony Watts chooses the stations.

Not possible!

From Nick’s entertaining demonstration, and from the mainstream climate science claims that homogenized corrected temperatures are accurate, it should be nearly impossible to choose less-influenced stations by objective criteria and come to a significantly different trend than the homogenized result.   But what Anthony Watts has demonstrated before, and restated in a blog post about forthcoming work, is that he has done exactly that.    Data from the least influenced, and therefore the best possible thermometers, is dramatically lower than the homogenized land temperature trends that nearly everyone in climate science uses in their publications.

What is more, in Anthony’s previous work, he and his coauthors demonstrated a significant correlation between station quality and trend.   The better the station siting, the lower the trend.   This is actually common sense in the weather industry, as nearly every human influence on a local environment creates local warming effects.  From adding blacktop, air-conditioner outlets, buildings blocking airflow, concrete runways, and so on, progress almost always creates local increases in temperature which influence readings of individual thermometers.  Other changes can shift temperatures cooler or warmer, such as station moves or changes in the time of observation.   None of these local warming/cooling effects are controversial from my reading, but all of these sorts of problems are what homogenization of temperature stations is supposed to correct for.

Now Anthony’s previous work was roundly critiqued by people for certain shortcomings.   Climate science is highly politicized, so many of the critiques were unfounded, even wrongfooted attacks based on the result rather than on genuine scientific problems, but Anthony took them seriously and has apparently come back with an improved version, having again similar results, which directly addresses the previous issues.    Instead of narrowing the difference between mainstream temperature publications and his result, the corrections have reinforced the previous results.

Future critics of Anthony’s work can claim that he has made some error, or that his choice of station quality is somehow biased in some unknown way, and they have in fact done those things in the past. However, these stations are classified by outside influence, and it is extremely unlikely that an “error” would result in a continuous (or nearly continuous) reduction of trend from class 5 to class 1 stations.   How could an error in the work produce such controlled results?  It doesn’t seem to be a reasonable claim.   To top it all off, Anthony’s result just seems like common sense: stations not influenced by buildings, air conditioners, moves, or time of observation produce lower trends.

To be clear again, I am not advocating for Anthony’s result; I haven’t read it.  I don’t know it.   What I do know, and what I am saying, is that there should be NO significant differences if station quality was properly corrected for in the mainstream ground temperature series.  Either a gross error was made, which is very unlikely as critiques have already been fielded and addressed, or we have identified a big problem in land-based temperature measurement.

The early reaction

Thus far we have only a bit of commentary on the results from the BEST Berkeley data group, and it isn’t at all encouraging.   I am hoping, and expecting, to see a group of more open minds look deeply into this in the future, because Anthony Watts’s surface station project is the most thorough look anyone has ever taken at the quality of the temperature data being recorded.   The results are dramatically different from our current understanding of temperature trends, and that is what non-political science is about: understanding.

If these truly revolutionary claims are correct, and scientifically, in a multi-billion-dollar field, they are revolutionary claims, the global temperature trends (observations) are likely lower than shown in Figure 2 (HadCRUT4) above, and climate model projections are trending warmer than observations by even more than we already know.  A proper, coldly scientific review is necessary, and it will mean a full audit of global temperature stations if we ever hope to make truly predictive climate models.

There is much more to write; I just hope that not only normal scientists, but mainstream global warming science, takes a hard look at what this study is claiming in the near future.  The good news for Anthony Watts is that if he is correct, ignoring the result will only delay the inevitable outcome, as the cold science of temperature measurement will certainly prove stronger than a multi-billion-dollar political movement.





82 thoughts on “Hot News for Temperatures”

  1. Thank you, Jeff, for this post and for all you and others have done to restore sanity to society and integrity to government science.

    I am convinced that FEAR of nuclear annihilation in 1945 convinced world leaders to:
    1. Form the United Nations on 24 Oct 1945
    2. Hide the source of energy that destroyed Hiroshima and Nagasaki with false consensus models of:
    _ a.) Heavy atoms like Uranium
    _ b.) Ordinary stars like the Sun
    _ c.) Galaxies like the Milky Way
    _ d.) The expanding Universe, and
    _ e.) Earth’s always changing climate

    1. Here are nine pages of precise experimental data (pages 19-27) that falsify Standard Models of nuclei, stars, Earth’s climate and the cosmos:

      Click to access Chapter_2.pdf

      Words cannot properly express my gratitude for your courage in allowing me to post factual information that is unwelcomed by many AGW skeptics and believers alike.

    2. Forgiveness and healing from the trauma of Climategate emails may be aided by sharing:

      a.) Aston’s warning on 12 Dec 1922 of the danger of transforming Earth into a star by uncontrollable release of nuclear energy [See page 20, last paragraph of Aston’s Nobel Prize Lecture]:

      Click to access aston-lecture.pdf

      b.) Information on uncontrolled chaos in the closing days of WWII

      _ 1. Allied atomic bombs destroyed Hiroshima and Nagasaki;

      _ 2. Japan exploded an atomic bomb off the east coast of Konan, Korea;

      _ 3. Stalin’s USSR troops captured Japan’s atomic bomb facility and took scientists and technicians to Russia; and

      _ 4. A young nuclear geochemist took secret possession of Japan’s atomic bomb plans . . .

      FEAR of death forged the alliance of world leaders and scientists that Climategate emails exposed in Nov 2009.

  2. Can’t wait to see it published in a respectable journal. I am sure Science or Nature Climate Change will snap it up given how “truly revolutionary” these claims are.

    1. The problem with “respectable journals” is that they are often run by the “respectable” cliques in power within a field. Anthony may find it profoundly difficult to find a venue willing to go out on a limb to publish empirical research that can’t be readily debated short of additional empirical research. Empirical research is profoundly hard work, much more so than modeling.

      1. That won’t be the reason Anthony finds it difficult to publish. Also, having published 5 papers myself to date in my field (all empirical), I can respectfully assume you are talking out of your hat and have no personal experience with publishing, or even research for that matter.

  3. Jeff,
    That’s an official looking USHCN graph that you have posted. But it’s not from the link that you cite. It’s actually prepared by Steven Goddard. At the risk of being a sceptic abuser, I’ll remark that he is not a reliable authority. He just took a total average of all stations. And it seems the glitch in 2014 resulted from the fact that some stations did not have adjusted values posted. It’s the difference of two different station sets. Goodness knows what he did with the rest.

    I’ve made a shaded map of temperature trends for individual GHCN stations (unadjusted) for various periods of time, including 1977-2013. You can drill down to get individual station values. For some reason, the US is rather a patchwork compared to the rest of the world.

    1. Thanks Nick. I have run the same Tadj vs Traw plots myself in the past and they usually came out about the same. I don’t see the corrections as particularly controversial, I think the differences are more about the average station quality. If your corrections are based on typical stations and Anthony is correct that a typical station has a higher than actual trend, then your corrections will bias toward a typically higher trend.

      Also, the land is only a small fraction of global temps so any changes wouldn’t be huge trend-wise but models are a mess already and this could be the tipping point 😛 to get something fixed.

    2. I second Nick’s comment. This is a common problem on skeptics’ websites, and bad for their credibility: unsourced tables and (especially) graphics. Somebody, often of dubious reliability (e.g., Goddard) prepares a powerful graphic — and it goes viral, often without citing the source (which would ruin the play).

      It’s somewhat similar to the incidence of fake quotes so often found on conservatives’ websites.

      This is not often found in the work of climate scientists, who tend to be careful about selecting and citing sources.

        1. As I told Nick, the graph is good. I have made similar plots myself in the past. It comes from the linked data, and while you may wrongly state that it hurts credibility, I don’t see anyone pointing out where it is in error.

        Also, climate scientists make plenty of errors on blogs just like the rest of us. When publishing, everyone is careful.

        1. Jeff,

          This is an important point, so I’ll be dogmatic on principle about this despite a lack of supporting data.

          My point was not whether the data is correct or not. Casual readers seldom try to determine that. That’s the role of standard citation practices, to give us some basis to assess validity.

          “You wrongly state that it hurts credibility”

          IMO that’s quite an extraordinary defense for breaking these kinds of basic practices. Blogs don’t usually do the extensive footnoting of journal articles, but correctly stating the origin of graphics is not asking much. I’m not the only one to point this out, or be bothered by the practice seen here.

          “Climate scientists make plenty of errors…”

          Yes, everybody is human. But this is not a binary question of perfect or imperfect (we’re not being evaluated by the NOMAD space probe).

          In my experience they are far more careful about these simple practices, which builds credibility. The “everybody does it” defense is IMO not correct on this point.

          In the short-run the climate policy debate will be won by appeals to the lay public (in the long-run the weather will choose the winner). These kinds of practices, each small but in aggregate significant, might prove decisive.

          1. Fabius Maximus, there seems to be a lot of wishful thinking/hopeful hand-waving in your assertions. Either discuss the quality of the graph or stop the strawman argumentation trying to establish non-existent standards for blogs.

            Is the graphic valid or not? Show your work so that it may be replicated.

          2. “Show your work so that it may be replicated.”
            Well, the problem is that Goddard didn’t show his work. In the head post, he didn’t even say he had done it himself.

            Did SG just average stations with adjustments and subtract the average of stations with raw? The 2014 spike suggests that. Here is David A, not usually afflicted with scepticism, pressing that point. Someone answered (how does he know?) but not SG.

          3. Nick,

            I agree with you about the replication aspect of Steve’s work. The endpoint is clearly messed up somehow but it didn’t bother me because the same thing has been done by others in a much cleaner fashion. I don’t seem to have any in my blog, but I have done it in the past. I must not have considered it worth posting for the same reasons.

            I’m interested in seeing whether Anthony’s class 1 results are simply the low end, the bad class 5 stations the high end, and the homogenization a middle ground between the two. That would tie the whole difference together rather neatly. No Lewandowsky-style homogenization conspiracy required, just basic math. If we homogenize to the middle ground, or to the more common bad stations, we get a higher-than-actual result.

            The possibilities are admittedly pretty exciting to me so I’ll need to take an extra critical look when the paper is released.

          4. Racehorse, the graph states USHCN Final minus Raw. It also lists this link for the data:

            I guess you can’t hide or manipulate the data enough with such a simple process huh??


            Now, give us a detailed explanation as to why that simple procedure is so bad. Are you really saying they post bogus data at their OFFICIAL links?!?! What we been saying for years!!


        2. I don’t think references add much credibility to correlation-sorted paleoclimatology. I also don’t think references helped Steig 09 when actual data shown here, with fewer references, contradicted it.

          Again, this particular data has been repeatedly shown here and at other blogs. It is a well known graphic in multiple forms.

          Here is another:

          and another:

          and another:

          So that was 3 minutes of work. Have I restored my credibility?

      2. @Fabius: In the interest of your own credibility, I suggest refraining from making political attributions such as “fake quotes so often found on conservatives’ websites” and unverifiable qualifiers such as often vs. not often in “This is not often found in the work of climate scientists…”
        The real question here is: are the temperature measurements, the very basis of climate science, trustworthy?

          1. No way buddy, I think Ben has a good point here.

            I provided a link to the data and multiple other links, some of which had supporting documentation for my work. Ben points out that you left unfounded assertions while stomping around about my credibility in particular. You made several of your own claims with zero references yourself including a unique claim in blogland that “climate scientists” do it better.

            Which written claims specifically did not meet your apparently other-people-only standard of reference?

            I want to know, did my follow-up links help you understand the credibility of my claims, or do you still feel you need more or different references in order to understand my claims here?

            It seems only fair that you explain yourself fully, otherwise you look like a troll trying to shoot artificial holes in a reasoned argument.

          2. Jeff,

            That’s a reasonable request, so I’ll respond in detail.

            (a) “while stomping around about my credibility in particular.”

            Wow. Quite over the top, imo.

            You cited (& linked to) Watts as the source of the graphic. Nick pointed out that Goddard was the source; Watts just didn’t tell us that. I posted a reply to Nick’s comment: “I second Nick’s comment. This is a common problem on skeptics’ websites, and bad for their credibility”.

            It was a bland comment. No naming names or kicking butts. Citing sources is like mom and apple pie. Also, Moms say that giving credit to others for their work is a common courtesy. I’m amazed anyone bothered to reply.

            (b) I find very amazing the hostility of the comments. Much more disturbing than my original point. Hence my follow-up comment:

            “Blogs don’t usually do the extensive foot noting of journal articles, but correctly stating the origin of graphics is not asking much. I’m not the only one to point this out, or be bothered by the practice seen here.”

            By “practice seen here” I meant in this comment thread (clear in context, imo). Strange stuff, indeed.

            Even if Jeff hadn’t cited the graphic’s source (which he did), it’s not a big deal. We make errors too frequently on the FM website; we add them to the Smackdowns page, we resolve to do better in the future, and life goes on.

            (c) “including a unique claim in blogland that ‘climate scientists’ do it better.”

            Spend 3 minutes reading Judith Curry, either Pielke, or RealClimate. All are meticulous about citing sources for their graphics. If you disagree, fine. Strange, but whatever.

            (d) “did my follow-up links help you understand the credibility of my claims”

            I applaud extensive citations. It’s the practice on the FM website, with extensive links in the body of the text and a For More Information section at the end. We write about controversial subjects, often on the edge of the known, and have found that this not only builds credibility but also keeps us sharp.

            This takes a lot of time, however, and (as I said in my comment) is not the usual practice on the Internet. On the other hand, the comments in this thread about citations are … (searching for an accurate but bland word) an unusual perspective, imo. Community standards are always nice to know.

            (e) My guess (emphasis on guess) what’s going on here: tribalism

            This isn’t my first rodeo. The FM website has 184 posts about climate change. Journalism (we’re not climate experts in any way) plus analysis of the nature and politics of the debate. As a result many have written articles denouncing us, from both sides of the climate wars — such as Steve Goddard’s response to my comment describing the mechanics of Google News and two posts by Brad DeLong (Prof of economics, Berkeley). Plus being the frequent subject of two-minute hate sessions in comment threads on leftist websites (e.g., this weird one at Naked Capitalism).

            Most of these look to me like tribalism producing substance-free, intense, vituperative replies to rather mild reporting (e.g., this about the pause, our biggest-traffic climate post). IMO their tribalism marginalizes them. But a definitive analysis of that awaits somebody’s dissertation, long after the climate wars have died off (resolved by the weather).

          3. Jeff,

            I hate to clutter comment threads with minutia, but this seems relevant to several important points you and others have raised.

            In my posts about climate change, and about the social role of experts, I define “scientist” as someone who has done original work that is recognized as legitimate by peers in that field. That means a PhD, or publication in a peer-reviewed publication. Like any such definition, it’s somewhat arbitrary — but has the virtue of being operationally clear and objective.

            By my standard, you are a climate scientist: “Improved Methods for PCA-Based Reconstructions: Case Study Using the Steig et al. (2009) Antarctic Temperature Reconstruction”, by Ryan O’Donnell, Nicholas Lewis, Steve McIntyre, and Jeff Condon, Journal of Climate, April 2011.

            That you cited and linked to the source of the headline graphic is a confirmation of my original point.

          4. I guess I’m a little lost now as to what the critique actually is. Hopefully, Nick and others have provided enough evidence that the graphs aren’t unreasonable nonsense otherwise suited for MSNBC or a Lewandowsky conspiracy article.

            Someone stated that Goddard’s graph seems to have come from raw data being averaged rather than anomaly data and that is primarily responsible for the differences in shape. Again, not a big deal either way.

          5. Jeff,
            “Someone stated that Goddard’s graph seems to have come from raw data being averaged rather than anomaly data”
            Anomalies would have mostly fixed it, but I used absolute temperatures, as I think does USHCN. I’ve explained the issue here, especially the spike.
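            A two-station toy example (made-up numbers, not the USHCN data) shows why a changing station set breaks absolute averages but mostly leaves anomaly averages intact:

```python
# One warm-climate and one cold-climate station; the cold station is
# missing its final year, so the averaged station set changes at the end.
warm = [15.0, 15.1, 15.2, 15.3, 15.4]  # deg C
cold = [-5.0, -4.9, -4.8, -4.7, None]

def absolute_mean(i):
    vals = [s[i] for s in (warm, cold) if s[i] is not None]
    return sum(vals) / len(vals)

def anomaly(series, i):
    base = sum(series[:3]) / 3  # each station's own 3-year baseline
    return None if series[i] is None else series[i] - base

def anomaly_mean(i):
    vals = [anomaly(s, i) for s in (warm, cold)]
    vals = [v for v in vals if v is not None]
    return sum(vals) / len(vals)

print([round(absolute_mean(i), 2) for i in range(5)])
# The absolute average jumps by about 10 deg C in the final year even
# though neither station warmed unusually...
print([round(anomaly_mean(i), 2) for i in range(5)])
# ...while the anomaly average, measured against each station's own
# baseline, stays smooth.
```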

        1. Ben says ” I suggest to refrain from making political attributions such as…”

          I ran that through Google Translate, which output “Please provide supporting examples for your assertion about fake quotes on conservative websites.” I’m always happy to oblige polite requests. Here are posts describing some well-known examples, as a start. More available on request; they are legion.

          Fear not! America will not fall due to its citizens’ imprudence. We’ve found a sure solution., 16 April 2012
          The Founders talk to us about guns for a well-regulated militia, 24 July 2012
          But Hitler confiscated guns, leaving Germans helpless!, 11 January 2013


          1. Fabius, I didn’t need these examples to know that a lot of unsubstantiated or fake claims can be found on the web. Not only on the web, but also in scientific publications of all sorts (Hide the decline?). But whoever associates claims (fake or factual) with color of skin, religion, political beliefs, or psychopathies(!) discredits his own arguments, irrespective of the factual content.
            Jeff’s blog deserves better.

          2. Ben,

            “a lot of unsubstantiated or fake claims can be found on the web”

            Specifics. Specifics. Specifics.

            I specifically referred to “fake quotes”. They’re commonplace in writings of the Right. They’re seldom found in writings of the Left, who have their own favored modes of propaganda (for lots and lots of examples see section 5 here: ).

            Seeing these tribal differences helps understand the pageantry of our time.

            If you prefer to be blind to these distinctions, that’s OK too. Whatever.

          3. Ben,

            (1) “this discussion is completely off topic and I don’t care who says what.”

            Topic drift. That’s how comment threads run, and accounts for much of their charm. Nobody can tell where they’ll lead.

            (2) “Your generalizations are simply disgusting.”

            As grandma used to say, that’s giving him a piece of your mind. I’ll treasure it.

            (3) “Judge claims by facts not by associations”

            To what are you replying? I make specific statement, and provide examples. If you have a counter-example, I’d like to see it. That you find this aspect of reality unpleasant … well, OK.

            (4) “Yes, Ben Adler, there are liberal equivalents to climate change denial”, Roy Spencer, at his website, 9 May 2014.

            Thanks for linking to this. As usual, Roy does an excellent job of assembling and analyzing data, even of non-climate matters.

            If you would like more examples, I have 24 posts documenting this phenomenon, going back to 2008. Here are two fun ones:

            The North Pole is now a lake! Are you afraid yet?, 3 August 2013
            Climate science deniers on the Left, captured for viewing, 29 September 2013

      3. Data trumps references, but, employed cannily, “references” can obscure an argument to the point that no one is really thinking about the data. One of the sad truths of education, even in the “good old days”, was that debating class often encouraged outright lying. Debate is a tactical sport rather than a strategic one.

        It is remarkably difficult to refute a “citation” you have never heard of because it doesn’t exist. It is also profoundly difficult to defend a position you have never assumed, but which your debate opponent asserts you hold. The unwary can find themselves defending positions they do not and never did hold. Worse, you can find yourself doing this because of the tactics used by folks who superficially hold more or less the same position you do, if they employ debate tactics that are inherently misleading.

        The Climategate emails show how powerful this tactic is, in that it is plain that few of the correspondents show any respect for Michael Mann. Yet he was the one who effectively set the stage for what followed. In the emails it is clear that there were plenty of doubts and questions and, in fact, very little difference in viewpoint between “hotists” and “lukewarmists.” Nevertheless, the “team” were trapped as much by Mann’s tactics as by any demonstration that AGW theory was remotely close to being valid. Otherwise, why would Trenberth, for example, not only complain about the “missing heat” but assert that the data “must” be wrong?

  4. Great post, completely unsurprising that poorly sited temp sensors run hotter than well sited temp sensors but the fact that the gradient of the poorly sited temp sensors is about double that of the properly sited temp sensors is astonishing.

    Do they calibrate these sensors frequently and keep calibration histories? I doubt it. Do they factor in humidity since humid air has more thermal mass than dry air?

    I have a lot more faith in the satellite temp record cuz it is so easy for corrupt people in the climate science field to put a thumb on the scales of land based temp sensors. It must be very tempting for individuals and countries to tamper with measurements when they benefit politically and financially from that tampering.

    Omanuel, what you are doing is rude.

  5. So, if you can estimate global temperature from 60 stations, why not pick the smallest set of stations that are least likely to have been influenced by human development? If, as Nick claims, we have the math to show that this result would most likely be accurate, and these stations would need little or no adjustment, why use a larger pool of data that requires correction and adjustment?

    1. Well, part of the point of what Jeff, Zeke, I and others did was that you can do the calcs on unadjusted data, and it makes little difference. I always use unadjusted data, not because I think it is better, but because I think it is not worth arguing about.

      1. Ok, then, forget adjustments – why not just pick a small pool of stations which have high-quality measurements and are sited such that their record is unlikely to have been influenced by nearby development? Surely we can find 60 such sites, if not hundreds. If it can be demonstrated that such a pool can accurately estimate global temperature, why add more stations whose record may have been polluted by local development? Adding more stations should not change the answer much – if it does, either your claim that 60 stations can estimate the global average is wrong, or the added stations are not representative of the global trend.

    2. Joshv,

      I think your idea would work quite well. I also agree with Nick, and I suspect Anthony Watts would too, that it makes little difference whether we use homogenized or unhomogenized data. The difference here is that if we take the data as a random selection, Anthony's work means the majority of the stations would bias the trend higher, leading to the current mainstream trends. If we do as you suggest, and if Anthony is right, we would get a lower trend and a more accurate answer.

    3. Joshv,
      The problem is that there are very few weather stations with records long enough to study trends since e.g., the late 19th century.

      Out of the 6051 stations in the non-U.S. component of NOAA’s Global Historical Climatology Network dataset (the one used by most groups), only 8 of the stations are “fully rural” (i.e., rural in terms of assoc. population and nightlight brightness) and have data for at least 95 of the last 100 years.

      Of those 8 stations (The Pas; Angmagssalik; Lord Howe Island; Sodankyla; Hohenpeissenberg; Valentia Observatory; Sulina; and Saentis), most of them have been potentially affected by human development, e.g., The Pas & Lord Howe Island are airport stations, Sulina has been moved to a concrete platform overlooking the River Danube…

      In the U.S. component of that dataset (the “USHCN”), there are quite a few fully rural stations with fairly complete records for the last 100 years – about 20% of stations, due to the high density of the US “COOP network”. But, this is the component that Watts et al. are studying with their Surfacestations project. Apparently, only 20% of those stations were Class 1 or 2.

      Berkeley Earth’s newer dataset has a lot more station records than NOAA’s dataset, but most of these records are fairly short with less than 30 years data, so the situation is not a whole lot better there, either.

      We discuss this in more detail in Section 3 of our “Urbanization bias III. Estimating the extent of bias in the Historical Climatology Network datasets” paper, which we have submitted for open peer review here:
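The screening criterion described above (fully rural flags plus a near-complete century of data) amounts to a simple filter over a station inventory. The record layout below is invented for illustration and does not match NOAA's actual file format; only the two real station names are taken from the comment:

```python
# Toy station inventory: (name, population_flag, nightlight_flag,
# years_with_data_in_last_100). Flags follow the GHCN convention
# mentioned above ("R" = rural population, "A" = dark nightlights);
# all records other than the names are made up.
stations = [
    ("The Pas",              "R", "A", 99),
    ("Valentia Observatory", "R", "A", 100),
    ("Big City Airport",     "U", "C", 100),
    ("Suburban Coop",        "S", "B", 97),
    ("Short Rural Record",   "R", "A", 40),
]

def fully_rural_long_record(st, min_years=95):
    """Fully rural (rural population AND dark nightlights) with at
    least `min_years` of data in the last century."""
    _, pop_flag, light_flag, years = st
    return pop_flag == "R" and light_flag == "A" and years >= min_years

kept = [s[0] for s in stations if fully_rural_long_record(s)]
print(kept)  # only the fully rural, near-complete records survive
```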

      1. Ronan, I would suggest that the vast majority even of rural stations reflect anthropic effects. In terms of sheer scale, the conversion of wild land to farmland is far more geographically extensive than urbanization. Further, the kinds of landscape changes that agriculture requires can directly influence phenomena ranging from albedo to energy uptake through water evaporation on irrigated lands. Even rangelands are subject to long-term change when grazing suppresses succession and slows the replacement of trees in savannah environments. Add to that that changes in crops – wheat to corn, cereal to orchard and vineyard – also have effects. My conclusion is that there may be no unbiased data record from land-based sources.

  6. A couple years ago I looked at growth/decline of US Counties and looked at temps.

    Most of the warming appeared to come from population growth which I suggest came from UHI.


    I took the list of BEST sites and using those sites in BEST with a Country code of United States I used State/County name to merge with the list of Counties I have with population changes.

    I am attempting to correlate county population changes from 1900 to 2010 with cooling or warming from 1900 to 2011.

    1956 Stations with data in 2011 and 1900.

    1320 were warming and 636 were cooling.

    1213 of those I could match to the table of US Counties.

    1089 distinct counties.

    562 of those counties had more warming stations than cooling.

    496 had more cooling stations than warming.

    31 had an equal number of cooling and warming stations.

    Warming Counties had a mean temperature change of .0692C/decade.

    Warming counties had a mean population increase of 174,361.

    Warming counties on average grew by 648% from 1900 to 2011.

    Cooling counties had a mean temperature change of -.0573C/decade.

    Cooling counties had a mean population increase of 39,060.

    Cooling counties on average grew by 194% for 1900 to 2011.

    “Equal” counties had a mean temperature change of .0119C/decade.

    “Equal” counties had a mean population increase of 86,469.

    “Equal” counties on average grew by 512% from 1900 to 2011.

    It appears warming counties grew much, much faster than the country as a whole, while cooling counties grew slower than the country as a whole.
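The county bookkeeping described above can be sketched as follows. The station trends and county names are stand-ins for illustration, not the actual BEST/Census merge:

```python
from collections import defaultdict

# Hypothetical per-station trends (deg C/decade) keyed by station id,
# each tagged with its county; real data would come from merging the
# BEST site list with a county population table.
station_trends = {
    "S1": ("Cook, IL", 0.08), "S2": ("Cook, IL", 0.05),
    "S3": ("Custer, MT", -0.04), "S4": ("Custer, MT", -0.06),
    "S5": ("Lane, OR", 0.03), "S6": ("Lane, OR", -0.03),
}

# Tally warming vs cooling stations per county.
counts = defaultdict(lambda: {"warming": 0, "cooling": 0})
for county, trend in station_trends.values():
    counts[county]["warming" if trend > 0 else "cooling"] += 1

def classify(c):
    """Label a county by its majority of warming/cooling stations."""
    if c["warming"] > c["cooling"]:
        return "warming"
    if c["cooling"] > c["warming"]:
        return "cooling"
    return "equal"

labels = {county: classify(c) for county, c in counts.items()}
print(labels)
```

The final step in the analysis above is then just grouping counties by label and comparing mean population growth across the three groups.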

  7. Hi Jeff,
    Have you read our study of the Surfacestations results yet? Pdf here: We didn’t have access to Watts et al.’s new Leroy, 2010 results, and so we were using the Fall et al., 2011 dataset. However, our results seem to roughly concur with the findings Anthony Watts & Evan Jones mention.

    We found poor siting increased the unadjusted trends by about 32% and TOB-adjusted trends by about 18%. The nominal “good-poor” difference for the fully-adjusted trends is close to zero. But this seems to predominantly be a result of the blending problem with the Menne et al. homogenization algorithm, rather than an indication that the homogenization “removed” the biases.

    We found two blending problems were occurring:

    1. Because the good stations are in the minority, the homogenization algorithm tends to adjust the good stations to better match the poor stations, i.e., more siting biases are introduced to the good stations than are removed from the poor stations.

    2. Many rural USHCN stations are affected by urban blending in the fully-adjusted dataset. This introduces a general “warming” trend into the entire dataset, substantially increasing the average trends of the USHCN.
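Blending problem 1 can be demonstrated numerically with a toy model. The "homogenization" below is a deliberately crude nudge-toward-neighbour-mean, a stand-in for the real pairwise algorithm, and all trend numbers are invented:

```python
import statistics

# Per-station trends (deg C/decade): 2 well-sited stations at the true
# trend, 8 poorly sited stations carrying a +0.10 siting bias.
TRUE = 0.10
trends = [TRUE, TRUE] + [TRUE + 0.10] * 8

def adjust(trends, weight=0.5):
    """Crude 'homogenization': nudge every station partway toward the
    mean of the other stations (its 'neighbours')."""
    out = []
    for i, t in enumerate(trends):
        neighbours = trends[:i] + trends[i + 1:]
        out.append(t + weight * (statistics.mean(neighbours) - t))
    return out

adjusted = adjust(trends)

# The minority good stations are pulled up toward the biased majority
# far more than the majority is pulled down toward them.
good_shift = adjusted[0] - trends[0]    # change to a good station
poor_shift = adjusted[-1] - trends[-1]  # change to a poor station
print(round(good_shift, 3), round(poor_shift, 3))
```

Because the good stations are outvoted, the blended average retains most of the siting bias even though every station was "adjusted".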

    We have uploaded all the data and code (Python) for our paper to FigShare: We have decided to use a fully Open Peer Review system for our climate science papers, rather than the conventional “2 reviewers + 1 editor” closed system. So, anyone that has any comments/criticisms with our analysis is welcome to post them on the OPRJ page… 🙂

    1. Thank you Nick. I have added a link to your post and inserted the graph in the post above. I appreciate the effort.

      I did read the code quickly and it looks accurate.

    2. Jeff,
      Zeke pointed out on my site that the data that Steven Goddard posted, which I used, was in deg C, not deg F. I assumed the latter because he gave the result in deg F. So the plot should be labelled in deg C, which reduces the discrepancy somewhat. But not much; SG’s plot has still almost twice the range, even without the final spike. I’ll modify my plot to show the revised axis.

  8. As someone who was a working experimental physicist, the entire surface station network has always bothered me. First off, it was never designed, built, calibrated, or maintained the way a scientific instrument to measure climate, as opposed to local weather, would be. Furthermore, it never can be because the local environment is part of each instrument. Since the environment changes in random ways, trees growing up and coming down, roads and highways being added and changed, buildings and malls added in the neighborhood, there is no way that one can create a physical theory that allows for “corrections” to the data that are believable. At best one can try to select stations that are and have been far away from any kind of human influence. That still doesn’t exclude changes to the environment that occur naturally and will also bias the readings. Even then, there is always human encroachment since somebody has to be around to read the thermometer.

    Homogenization, infilling, all that stuff is stupid nonsense. It’s a bunch of guys playing “let’s pretend to be scientists” without a clue about the physical understanding that could actually tell you how to adjust the data for unwanted external influence. Face it, it’s on par with pretending that tree rings are thermometers without knowing anything about the biology of trees and all the variables that influence their growth.

    There are uses for the data that don’t involve magic climate incantations. I’d like to see a different analysis than the usual anomaly, which throws away a tremendous amount of information in return for a single reporter ready number. Phil Jones and others have stated that one third of the stations show a downward trend. How about just computing the slope of the raw unadjusted temperatures for each station and plotting them on a map. Red dots for rising temperatures and blue dots for a falling trend. It would be interesting to see if the trends cluster by type or if they’re randomly distributed. Do this for the continental US and see what comes up.
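Paul's red/blue map needs nothing more than an ordinary least-squares slope per station. A minimal sketch with made-up station series (real input would be the raw station files):

```python
def slope(years, temps):
    """Ordinary least-squares slope of temperature against year."""
    n = len(years)
    my, mt = sum(years) / n, sum(temps) / n
    num = sum((y - my) * (t - mt) for y, t in zip(years, temps))
    den = sum((y - my) ** 2 for y in years)
    return num / den

# Hypothetical raw annual means for two stations.
years = list(range(1900, 2011))
warming = [10 + 0.007 * (y - 1900) for y in years]   # ~ +0.07 C/decade
cooling = [10 - 0.005 * (y - 1900) for y in years]   # ~ -0.05 C/decade

for name, temps in [("A", warming), ("B", cooling)]:
    s = slope(years, temps)
    colour = "red" if s > 0 else "blue"   # dot colour for the map
    print(name, round(10 * s, 3), colour)  # slope in deg/decade
```

Plotting the (lat, lon, colour) triples on a map is then a purely cosmetic step.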

    1. Paul,
      “How about just computing the slope of the raw unadjusted temperatures for each station and plotting them on a map. Red dots for rising temperatures and blue dots for a falling trend. It would be interesting to see if the trends cluster by type or if they’re randomly distributed. Do this for the continental US and see what comes up.”

      It’s done here. It’s an active map; you can rotate, zoom, show stations, choose years etc. It’s unadjusted GHCN/SST

  9. Nick Stokes,

    Basically, an upward adjustment of about half a degree since the mid-twentieth century. Surprising, because this value seems to be found in all countries (which generally have no significant TOB problem). Surprising, because a priori adjustments correct for random phenomena and should average out.

    Hence a cardinal question: What is the real origin of this so constant bias ?

    1. Phi, I think you answered your own question: “What is the real origin of this so constant bias?” BIAS – it makes it difficult for them to accept the reality of their own observations.

      1. The adjustments aren’t really the story. The story is that the stations are potentially nearly universally biased by local heating sources and the actual temperatures from unbiased stations may fall on the low side of satellite observations over land. Ocean still overwhelms the record simply because of the area of coverage.

    2. Phi,
      In the US, the main reason is TOBS, and there is a clear reason for its direction in terms of the instructions given to observers over the years.

      I don’t think you’re right that the same figure has been found everywhere. It’s true that homogenization does tend to produce a small upward net effect, and I have wondered why. When all the fussing is done, a temperature index is just a weighted average of the station data. Homogenization identifies certain readings as suspect and downweights them, upweighting neighbours. That only introduces a bias if the downweighted stations systematically differ in trend or whatever. It seems they do, to some extent, and I don’t know why.

        1. It may not be the case in every country, but I am not aware of any country where it is not. Are you?

        TOBS, adjusting temperatures for reading hours, does not seem a problem in itself. It’s just that the evolution of those hours from 1950 to 1990 is quite remarkable. A priori, the overall effect should be neutral, yet it is strongly directional. I’m not saying it’s hiding something; we can legitimately invoke chance. But beware: chance is a thin thread, don’t pull on it too hard.

        The effect of homogenization on trend is not small. In known cases, for the second half of the twentieth century, it is not far from half of the warming. In fact, this effect does not appear to be due to the reweighting, but to the correction of discontinuities.

        This question of origin is important because as long as the answer is not known, this means that a significant phenomenon (in relative terms) escapes us and therefore there is no guarantee that corrections are appropriate.

      2. Phi,
        I think you’ll need to quote numbers to back up your claim that other countries are the same. I gave my analysis of GHCN adjustments here.

        My complete account of why TOBS “cools the past” and why it is necessary is here.

        1. Nick Stokes,

          For a typical case, I suggest you Böhm et al. 2001 for the Alpine region. An official and interesting document for Switzerland :
          If I remember correctly, the problem is of the same magnitude in your part of the world.

          GHCN raises a slightly different problem compared to the national offices, which I had in mind. GHCN uses a greater proportion of short series. This brings us closer to the issue with BEST. The final effect is the same, but it is not possible to make an overall analysis of the adjustments, which logically requires long series. What would interest me are homogenizations by national offices that show little or no warming bias.

          I do not doubt the need of TOBS adjustments, simply, the evolution of reading hours for US is quite surprising. Just don’t push the chance too far.

          1. Phi:

            I do not doubt the need of TOBS adjustments, simply, the evolution of reading hours for US is quite surprising. Just don’t push the chance too far.

            Could you expand on this a bit? What you are getting at isn’t clear to me.

          2. J Ferguson,

            A priori, the cumulative changes in reading hours should have a neutral effect on temperatures. With so many changes spread over a century, there should be no appreciable effect on trends if the changes had a random character.

            I did a quick check, and as long as the reading hours are right, the corrections do not seem too questionable. This means that the changes of observation time are distributed, as a whole, in a very special way that causes a constant cooling bias in the raw series.

            This may be the effect of habits that have evolved constantly and always in the same direction. This situation is quite specific to the US and is not found elsewhere in this form. The U.S. network has other peculiarities; I am reading Ronan’s interesting paper, and my impression is that this singular behavior is probably simple chance. Other adjustments also enhance the warming trend, and there, it has nothing to do with chance.
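Phi's a-priori argument, that randomly signed observation-time changes should cancel in a network average while same-signed changes accumulate, can be illustrated with a toy simulation. The step size and station count are invented for illustration:

```python
import random
import statistics

random.seed(1)

def network_bias(n_stations, sign_chooser):
    """Mean bias introduced when each station gets one step change of
    fixed size 0.3 deg C whose sign is picked by `sign_chooser`."""
    return statistics.mean(0.3 * sign_chooser() for _ in range(n_stations))

# Randomly signed changes: the network mean bias is near zero.
random_signs = network_bias(10000, lambda: random.choice([-1, 1]))

# Directed changes (e.g. every shift cools the raw record): the bias
# survives averaging at full strength.
directed = network_bias(10000, lambda: -1)

print(round(random_signs, 3), round(directed, 3))
```

The observed systematic effect of TOB changes in the US record is thus evidence that the changes were directed (by observer instructions, as Nick notes above) rather than random.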

          3. Jeff Id,

            Indeed, it’s just a little strange, but after all, it could be a coincidence. This oddity occurs in a context where the data processing is clearly deficient, and that kind of singularity makes one suspicious. If you take the case of Switzerland, the bias is almost identical, yet there is no TOBS problem there. The proportion of rural stations is lower, and this may also explain the difference.

          4. All this being very paradoxical, I should have specified that stations which are more subject to increasing perturbations (i.e., for simplicity, which have a more urban character) are also more subject to upward adjustments. This phenomenon is described in Hansen et al. 2001.

          5. Jeff Id,

            Off topic. I think I found something about MXD. According to my analysis, densities decrease in dead trees, especially in the outer rings. I think this explains the strange results you achieved. Unfortunately, this would mean that the usefulness of fossil specimens is rather small.

          6. Jeff Id,

            Here :

            Given the numbers of individuals, it is mostly the outer 100 years which are significant. The violet curves represent the average for all fossils. I added the standing-tree data just for completeness, but it is not usable for comparison.

    3. Phi & Nick Stokes,

      We discuss the homogenization techniques used by NOAA in some detail in Section 4 of our “Urbanization bias III. Estimating the extent of bias in the Historical Climatology Network datasets” paper, which we have submitted for open peer review here: We also provide the data and code for our analysis on FigShare: Have you read it yet?

      I think you are both right to a certain extent. For the US Historical Climatology Network, TOBS introduces a warming trend of about +0.19°C/century, while the other adjustments (step-changes & FILNET) introduce about +0.16°C/century. In other words, TOBS is about half of the additional warming.

      We have done some preliminary calculations along the lines of Jerry Brennan’s 2005 analysis which Nick links us to, and although we haven’t published them yet, we find that the adjustments do seem reasonable, if we assume NOAA’s station history files are accurate. [Although for some reason NOAA seems to have stopped publicly archiving the station history files – the most recent file we could find was the 1996 version!]

      So, I’d agree with Nick that TOBS has partially biased the early 20th century “warm” relative to present.

      However, we have found that the Menne & Williams step-change adjustments are seriously problematic when there is a high frequency of non-climatic biases in the records. In particular, there are two main problems:
      1. If many of the neighbour stations are affected by urbanization bias, then the “urban blending” problem means that Menne’s algorithm (a) doesn’t remove enough UHI from the urban stations and (b) introduces UHI into the rural stations
      2. If well-exposed stations are in the minority (as the Surfacestations results show), then Menne’s algorithm (a) doesn’t remove enough siting bias from the poorly exposed stations and (b) introduces siting bias into the well-exposed stations.

      As a result, much of the +0.16°C/century adjustments for the non-TOBS adjustments are probably spurious & inadequate. Considering the extent of the UHI biases and siting biases in the data, it is likely that a substantial “cooling” adjustment needs to be applied instead.

      If these necessary adjustments are -0.19°C/century or larger in magnitude, then this would approximately cancel the TOBS adjustments and the 1930s would have been the hottest era on record for the US. If the necessary adjustments aren’t as large, then the 1930s would still have been about as warm as present.

      With all this in mind, I would argue that US temperatures during the 1930s (the “dust bowl era”) were at the very least comparable to recent temperatures, suggesting that recent temperatures aren’t particularly unusual.

      P.S. Phi, I think we might have been discussing some of these issues a while back on a couple of ClimateAudit posts, before I had submitted our papers…

        1. A C,
          For the Global Historical Climatology Network (GHCN) stations, NOAA provide two “urbanization” flags which can each have one of three values. One is based on associated population size (“R”: pop less than 10,000; “S”: pop between 10,000 and 100,000; “U”: pop greater than 100,000). The other is based on satellite estimates of nightlight brightness which were made during the mid-1990s (“A”: dark; “B”: dim; “C”: bright).

          Both of these estimates are quite crude, have problems, and are not always very accurate. But we found that if we combined both metrics together, we were able to identify the least urbanized and most urbanized stations reasonably well.

          For most of our analysis, we divided the stations into three categories:
          1. “Fully rural” (RA)
          2. “Fully urban” (UC)
          3. “Intermediate” (all other stations, i.e., RB, RC, SA, SB, SC, UA, UB).

          For the U.S. Historical Climatology Network (USHCN), i.e., the U.S. part of the GHCN, this broke down into 8% “fully urban”, 69% “intermediate” and 23% “fully rural”. For the rest of the GHCN, it broke down into 25% “fully urban”, 42% “intermediate” and 33% “fully rural”.
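The flag combination described above can be sketched as a small classifier. This is an illustration of the categorisation logic only, not NOAA's or the paper's actual code:

```python
def urbanization_class(pop_flag, light_flag):
    """Combine the GHCN population flag (R/S/U) and nightlight flag
    (A/B/C) into the three categories described above."""
    if pop_flag == "R" and light_flag == "A":
        return "fully rural"       # rural population AND dark nightlights
    if pop_flag == "U" and light_flag == "C":
        return "fully urban"       # large population AND bright nightlights
    return "intermediate"          # everything else: RB, RC, SA, SB, SC, UA, UB

print(urbanization_class("R", "A"))  # fully rural
print(urbanization_class("U", "C"))  # fully urban
print(urbanization_class("S", "B"))  # intermediate
```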

          Unfortunately, when NOAA are homogenizing the USHCN stations, they don’t actually use USHCN stations as “station neighbours”, but instead use the larger COOP dataset (from which the USHCN was constructed).
          So for our analysis of the USHCN step-change homogenization algorithm (Section 4.3.3), we couldn’t use the GHCN metrics for identifying the urbanization of the neighbours.

          Instead, we used the “GRUMP” urban extent dataset. If a station’s coordinates were within a GRUMP urban boundary, then it was considered “urban”, otherwise “rural”.

          On average, 44.5% of the COOP neighbours used for homogenizing each USHCN station are urban according to GRUMP. This is why urban blending is such a serious concern for NOAA’s step-change homogenization algorithm.

          The figure is higher for “fully urban” USHCN stations (61.0%), but a bit lower for “fully rural” USHCN stations (37.5%). This makes sense, since you are more likely to find urban neighbours near highly urbanized stations.

          Does that answer your question and/or make sense?

          1. D’oh! 😦
            I forget when you use the “less than” and “greater than” symbols, HTML treats them as something else!

            That beginning bit should read:
            “One is based on associated population size (“R”: pop less than 10,000; “S”: pop between 10,000 and 100,000; “U”: pop greater than 100,000)”

  10. What the discussion about siting suggests to me is that the conventional or consensus position is even farther out on a proxy limb than I’d first thought. The main concept is that “temperature” is a proxy for “climate”, and that CO2 is a proxy for industrial human influence. Now, even stipulating for a moment that the weather data on temperature (neglecting humidity, clouds, precipitation, snow cover…) is a valid indicator of climate, we see that land use changes, particulate pollution, de- or re-forestation, pavement and construction, reservoir construction, and other human activity have competing influences on those temperature records. The attribution of the average of all such influence to “Urban Heat Islands” allows one to look at the data and determine that UHI offers no discernible signal in the overall noise. And yet the disaggregated regional and local data seem to indicate that each such factor has influence, increasing or decreasing temperatures (and again neglecting such useful physical measures of climate heat content as humidity). Put a reservoir HERE to serve a growing city THERE and the station NEARBY (given prevailing winds going THATAWAY) will be affected … in a fashion more or less unique to that station and unmatched by other reservoir/city growth/wind combinations. If we reject the UHI hypothesis because it’s hard to measure, why do we default back to the CO2 hypothesis? That is, what if “warming” continues due to all the other human influence factors, even if solar or nuclear energy production and carbon-capture and sequestration efforts are fully implemented? Will we then start ripping up concrete surfaces? Re-seeding crop lands for forest? What data does the station siting project provide us about all the other knobs on the climate machine?

    1. Pouncer,

      We reanalysed the various studies claiming that UHI only has a negligible effect on global temperature estimates in our “Urbanization Bias I. Is It A Negligible Problem For Global Temperature Estimates?” paper, which we have submitted for open peer review here: [Data and code for our analysis on FigShare:].

      Have you read it? We also have written a less-technical summary of our UHI findings for our blog, which you might prefer if you don’t want to read our full paper:

      At any rate, we looked individually at all nine of those studies (from Hansen & Lebedeff, 1987 to Jones et al., 1990 to Parker, 2006 to Wickham et al., 2013), and found that in all cases their conclusions were unjustified. The problems varied from study to study, but in the end, none of them were valid.

      The UHI problem is a very insidious one, and I doubt that the exact magnitude of the bias can be satisfactorily resolved… using the current publicly-available data. However, I can confidently say that UHI biases have substantially reduced the magnitude of the 1950s-1970s “global cooling” and substantially increased the magnitude of the 1980s-2000s “global warming”. Figuring out the exact magnitudes would require more information than is currently available, in my opinion.

      There does seem to have been some “global warming” since the 1970s, but depending on (a) exactly how much warming and (b) how much “global cooling” occurred during the 1950s-1970s, it is hard to know how unusual recent temperatures are… 😦

      At present, it seems quite plausible to me that global temperatures could have been just as warm during the 1930s/1940s as present! At any rate, the “unusual” nature of the recent 1980s-2000s warming has been substantially overestimated…

  11. This reminded me of a problem I had trying to synchronize a Windows XP real-time process (yeah, I know, contradiction in terms) to an external device using Ethernet packets. The trick was recognizing that Windows was adding 8 milliseconds every now and again to the latency of the packets. Throwing away those outliers allowed the Windows process to reliably track the changes in timing of the external device. It stands to reason that including known outliers in the data invalidates anything one might do with the data. If I understand correctly, what Anthony has done is that he has proven statistically that data from poorly-sited stations are in fact, outliers, and that they have to be excluded if anything of value is to be learned by analyzing the data.
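The outlier-rejection trick described here can be sketched in a few lines. The latency numbers are invented for illustration:

```python
import statistics

# Simulated one-way packet latencies (ms): a stable ~2 ms path, with
# the OS occasionally adding ~8 ms of scheduling delay.
latencies = [2.0, 2.1, 1.9, 10.1, 2.0, 2.2, 9.9, 2.1, 2.0, 1.9]

med = statistics.median(latencies)

# Keep only samples within a fixed window of the median; the ~8 ms
# additions fall far outside it and are discarded as known outliers.
kept = [x for x in latencies if abs(x - med) < 3.0]

print(round(statistics.mean(latencies), 2), round(statistics.mean(kept), 2))
```

The filtered mean tracks the true path latency; the unfiltered mean is dragged well above it by the outliers, which is the analogy being drawn to poorly sited stations.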

  12. I have had discussions with people involved in constructing benchmarks for testing and comparing various methods for adjusting station temperature series. The benchmarking process entails simulating a series with a known climate temperature and then adding into the series non-climate effects that are thought to be realistic. I had a blog discussion with Venema, who was instrumental in authoring a European benchmarking test, about the capability of the adjustment methods to find non-climate effects that change the temperature series gradually over time, i.e., resulting in long gradual trends. I do not think that Venema understood the limitations of the available breakpoint methods in finding these trends.

    I have worked with breakpoint methods and I believe I understand these limitations. Those methods cannot reliably find those gradual trends and even when using differencing between near neighbor stations. It can be readily shown that near neighbor station temperature difference series that have previously been adjusted can have significant trends that are apparently caused, at least in part, by natural noise variations. Further if one models these temperature series the resulting red/white noise from simulations will produce near neighbor difference series with significant trends. (I should add here that the trends of series from station differences after adjustment using the latest GHCN algorithm are generally greater than those derived from model simulations which might mean that the algorithms are not adjusting for trends larger than those occurring naturally.) That is an aside and my point here is that with that naturally occurring noise level it becomes an impossible task to find and distinguish gradually changing non climate changes over longer time periods.
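The point that pure-noise difference series can acquire apparent trends can be checked with a quick simulation. The AR(1) parameters below are arbitrary illustrative choices, not fitted to real station data:

```python
import random
import statistics

random.seed(2)

def ar1(n, phi=0.6, sigma=0.3):
    """Red (AR(1)) noise: each value carries over a fraction of the last."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0.0, sigma)
        out.append(x)
    return out

def ols_slope(y):
    """Ordinary least-squares slope against the index 0..n-1."""
    n = len(y)
    mx, my = (n - 1) / 2, sum(y) / n
    num = sum((i - mx) * (v - my) for i, v in enumerate(y))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den

# Difference series between two "neighbours" sharing a common climate
# signal: the shared signal cancels exactly, leaving pure red noise,
# yet the fitted trends are spread well away from zero.
trends = [ols_slope([a - b for a, b in zip(ar1(100), ar1(100))])
          for _ in range(500)]
print(round(statistics.stdev(trends), 4))
```

The nonzero spread of these pure-noise trends is what makes a gradual non-climatic drift of comparable size essentially undetectable by neighbour differencing.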

    Of course, those constructing benchmark performance tests for these methods can merely avoid introducing conditions for gradual temperature changes by claiming that such changes are unrealistic, or at least rare, in the real world. What you will find in these benchmarking tests is that as the non-climate temperature changes become more subtle, the methods’ overall performance decreases, in some cases dramatically, even when these gradual non-climate changes are used only sparingly in the simulations. It should also be noted that while benchmarking against simulations of the “truth” is probably the best approach to testing adjustment methods, it is a bit of a circular affair: we do not really know all the unknown non-climate effects to put into the simulations, or else we would already have a better handle on the adjustment process.

    What I have requested of those developing better benchmarking tests is to be unafraid to test and show those non climate changes that would be difficult to impossible to find and make proper adjustments for and how that could affect the adjusted series trends and confidence intervals for those trends.

    We do have the satellite troposphere series since 1979 to compare with the ground series, but here we have disagreements about how the trends in the troposphere and ground should compare. Certainly the climate models vary on this issue. I do suspect that the reliability and certainty of the adjusted temperature series going back in time decreases significantly and certainly before 1979. Unfortunately a lot of temperature series investigations only go back to 1979.

    1. Kenneth,
      Have you read Section 4.3 of our “Urbanization bias III” paper that I linked above?

      We provided a quite detailed discussion of our main problems with the various breakpoint methods that have been used by NOAA NCDC (i.e., Karl & Williams, 1987; Easterling & Peterson, 1995; Menne & Williams, 2009). We also briefly discuss the Venema et al., 2012 study you refer to.

      1. Yes, I have read it. It is what reminded me of some of the work I had done previously on detecting gradual trends in temperature series. I do my investigations merely to determine whether I think the published accounts have truly covered all the bases, or at least the more important ones.

  13. Jeff, have you been following the goings-on about TOBS, and especially, PHA adjustments at Lucia’s and Judith Curry’s blogs? If you are able, would love to hear your take on current discussions.
