Things that make you go HMM …

A guest post from Ryan O.  This post discusses some of our concerns about the global temperature data. I hope to continue this, or Ryan can, down the road. Ryan took a brute force approach to looking at pre-homogenization trends in GHCN with a simple script.  I think you’ll find the results interesting.

———————-

So lately, a lot of people have been musing about the accuracy of the temperature indices.  One of the oft-repeated things is that three independent indices (CRU, GISS, NOAA) all yield similar results.  This is presented as confirmation that they cannot be that far off.

Of course, the world is not so simple.  All three indices depend in large part on GHCN; they are not independent.  If there is something wrong with GHCN, it will carry through to all three indices. The “raw” data for GHCN consists of 13,472 stations.  You can download it here: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z Unpack using WinZip or similar (I use 7-Zip since it’s free), import into Excel to split the year from the station identifier, and save as a tab-delimited file.  At the end of this post, I provide a script that will read that tab-delimited file into R.
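
If you would rather skip the Excel step, a minimal R sketch that reads v2.mean directly is below. It assumes the standard GHCN v2.mean fixed-width layout (a 12-character station ID including the duplicate digit, a 4-character year, then twelve 5-character monthly values in tenths of a degree, with -9999 marking missing months); check a few lines of the unpacked file against that before trusting it.

# Read the fixed-width v2.mean file and write the tab-delimited version the
# script below expects (station ID, year, twelve monthly values).
temp <- read.fwf("v2.mean", widths = c(12, 4, rep(5, 12)),
                 colClasses = c("character", rep("numeric", 13)))
write.table(temp, "v2.mean.txt", sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)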

Of those 13,472 stations in GHCN, some go back as far as 1701.  I arbitrarily selected 1900 as a cutoff date.  You are, of course, free to select other dates.  I then wanted to see what the data looked like.  To do this, I simply calculated anomalies using the period of 1900-2009, averaged, and also plotted the station density.  That yields this graphic:

[Figure: allghcn.png – average raw anomaly (1900-2009 baseline) and station count for all GHCN stations]

Of course, this looks quite hokey (and most of you knew it would ahead of time).  The reason for this is simple – you can’t calculate anomalies without figuring out offsets for records that are incomplete during the baseline period.  Note how the shape changes greatly as stations are added/deleted (red line maxes at ~8,000 stations).  The purpose of showing this graph is to illustrate that, in order to use the greatest amount of data in the GHCN database, you MUST make adjustments.  There’s nothing untoward about the fact that adjustments are made . . . if you want to work with anomalies and you want to use as much of the data as possible, you have to make adjustments.

However, you now have a problem: How do you make the adjustments?

To avoid this complication, I simply discarded all series that did not have at least 1,000 points since 1900 (or about 80% complete). This allows me to compute anomalies using a common baseline, and it avoids the complication of having the geographical weight changing over time.  The locations in 1900 are the same as the locations in 2000.  This won’t necessarily be representative of the whole world, as the network is heavily weighted toward US stations, but I’m just taking an initial look and not trying to come up with my own index yet.  Just looking and thinking about what I see.

The 1,000 point cutoff yields 1,793 stations.  Unfortunately, there is still another problem.  A lot of stations dropped out after 1990.  So unless I want to just look at 1900 – 1990, I have to cull some more.  For this next cull, I required that from 1990 – 2009, stations had to have at least 180 points (or about 70% complete).  That yields 894 stations.  You could use other numbers; it doesn’t change the results much.  Anyway . . . now that we have a group of long record length stations that are fairly complete, let’s take a look at what we see (red line maxes at ~890 stations):

[Figure: ghcn2009.png – average raw anomaly and station count for the 894 long-record stations, 1900-2009]

[Figure: loess2009.png – loess smooth of the 894-station average, 1900-2009]

Hmm.   Some interesting things to note.  First, the overall warming from 1900 is only about 0.22 Deg C/Century, using the raw data.  Now, as stated before, this is not a geographically representative sample.  Would a representative sample yield another 1 Deg C of warming?  Dunno.  What is certainly interesting about this sample is that it has the same general shape as CRU/GISS/NOAA – but the 1930 – 1980 decline is more pronounced and the 1980 – present rise is less pronounced.

Also note that a large number of stations still drop out in ~2006. In fact, there are only about 100 stations from 2006 – 2009 that also have 1,000 points since 1900 and 180 points since 1990. I am not sure why, but as we know this can affect the analysis, let’s take a look at what happens if we just go through 2006:

[Figure: ghcn2006.png – average raw anomaly and station count, 1900-2006]

[Figure: loess2006.png – loess smooth of the average, 1900-2006]

More hmm.  The warming trend is under a tenth of a degree C from 1900 and the 2006 point on the smooth is about the same as the 1936 point on the smooth.  Perhaps the drop off in stations is causing a problem; perhaps not.  Not enough information in this analysis to tell.

So what does this mean?  I don’t know.  It certainly doesn’t mean that the temperature indices are wrong, because this is a really basic analysis that does not take into account geographical representation or station moves (when the same station identifier was kept).  It is also interesting that the high warming trend is not present in this subset of raw GHCN data – meaning that, if the warming is due to adjustments, the three indices must be adjusting in very similar ways.  I have no thoughts on how likely that would be at the moment. Regardless, it’s interesting enough to make me want to dig deeper.  Maybe I will find that GISS/CRU/NOAA are fine.  Maybe not.

Either way, I’m sure there will be more to come.

Script:

temp=read.table("v2.mean.txt")     # tab-delimited GHCN v2.mean: station ID, year, 12 monthly values

early.year=min(temp[, 2])          # first and last years present in the file
last.year=max(temp[, 2])

# One column per station, one row per month
ghcn=matrix(ncol=length(unique(temp[, 1])), nrow=12*(1+last.year-early.year))

stn=0
stn.hdr=vector()

for(i in 1:nrow(temp)) {
  # Start a new column whenever the station ID changes
  if(i==1) {stn=stn+1} else {stn=ifelse(temp[i, 1]==temp[i-1, 1], stn, stn+1)}
  stn.hdr[stn]=temp[i, 1]
  record.time=12*(as.numeric(temp[i, 2])-early.year)
  for(j in 1:12) {
    tester=as.numeric(temp[i, j+2])/10                  # values are stored in tenths of a degree
    vals=ifelse(tester>-900 & tester<900, tester, NA)   # -9999 flags missing months
    ghcn[record.time+j, stn]=vals
  }
}

colnames(ghcn)=stn.hdr
save(ghcn, file="ghcn.RData")

ghcn=ts(ghcn, early.year, freq=12)   # monthly time-series matrix
ghcn=window(ghcn, 1900)              # apply the 1900 cutoff

# Keep stations with more than 1,000 monthly values since 1900
h=vector()
for(i in 1:ncol(ghcn)) {h[i]=length(na.omit(as.vector(ghcn[, i])))>1000}
ghcn.1723=ts(ghcn[, h], 1900, freq=12)

# ... and, of those, stations with more than 180 monthly values from 1990 on
h=vector()
for(i in 1:ncol(ghcn.1723)) {h[i]=length(na.omit(as.vector(window(ghcn.1723[, i], start=1990))))>180}
ghcn.894=ts(ghcn.1723[, h], 1900, freq=12)
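
The script stops at the station selection. Here is a rough sketch (my own continuation, not part of the original script) of how the averages and smooths plotted above could then be produced from ghcn.894; the loess span of 0.25 is an arbitrary choice:

anoms <- ghcn.894
mons  <- as.numeric(cycle(ghcn.894))                      # calendar month (1-12) of each row
for (i in 1:ncol(anoms)) {
  clim <- tapply(ghcn.894[, i], mons, mean, na.rm = TRUE) # station's 1900-2009 monthly means
  anoms[, i] <- ghcn.894[, i] - clim[mons]                # anomalies on the common baseline
}
avg <- rowMeans(anoms, na.rm = TRUE)                      # simple unweighted, ungridded average
tt  <- as.numeric(time(ghcn.894))
ok  <- is.finite(avg)
plot(tt[ok], avg[ok], type = "l", col = "grey", xlab = "Year", ylab = "Anomaly (Deg C)")
lines(tt[ok], predict(loess(avg[ok] ~ tt[ok], span = 0.25)), lwd = 2)   # loess smooth
lines(tt, rowSums(!is.na(ghcn.894)) / 1000, col = "red")  # station count, scaled down by 1,000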

86 thoughts on “Things that make you go HMM …”

  1. Why were temp stations dropped? How was dropping them initiated, and by whom? Did someone understand that eliminating “problematic” stations with “cooler” trends would help set the table to create a hockey-stick-shaped upswing in temperature?

    Where is the information about the how and why for all these station dropouts?
    thanks
    Ed

  2. #2, I left this at RC:

    jeff id says:
    Your comment is awaiting moderation.
    15 December 2009 at 4:15 PM

    Eric,

    Do you know if CRU uses homogenized GHCN or do they do their own process?

    Ryan did a post at tAV which takes a brute force look at homogenization. It’s not particularly damaging to any case either way but it’s interesting.

    https://noconsensus.wordpress.com/2009/12/15/3649/

    My own opinion is that the homogenization may very well be acceptable. It would be nice to see it clearly spelled out why certain decisions are made. I have read several papers on it, and without someone to answer a few questions, it’s very difficult to figure out.

    #1 – I don’t know the answers. If someone does, I hope they clue us in.

  3. For those of us who don’t do this kind of thing routinely,

    1) Could I do this kind of thing with Excel or the Open Office version?
    2) I’ve done a few nontrivial things with xL but I usually had a fairly explicit crib sheet, and then I made variations on that theme.
    3) Where would I plug that ‘script’ in my Excel?

    I realized that I feel the need to actually compute something myself in order to move my debate (with my brother) up from the ‘dueling webpage references’ stalemate.

    I realize this is far removed from the kind of analysis performed by the big guys.

    But I think that moving this kind of ‘volks-analysis’ out to a broader audience might have significant value to more than just me.

    In your abundant free time…/sarc
    TL

  4. (posted this on RealClimate half an hour ago, and thought it may prove of interest to you and your readers… haha “volks-analysis” :))

    Hi all,

    I’ve coincidentally tried a somewhat comparable exercise yesterday. Downloaded raw and adjusted GHCN data. Then wrote a MATLAB script that reads the data, selects all WMO stations, selects the measurement series that are present in both datasets, determines the differences between them (i.e., the ‘adjustment’ or ‘homogenization’), bins the adjustments in 5-year bins, and plots the means and std’s of the data in the bins. Not surprisingly, both for the global dataset and the European subset this shows near-neutral adjustments (i.e., no “cooling the old data” or “cooking the recent data”). Additionally, the script shows the deviation from the 1961-1990 mean of each measurement series (both raw and homogenized). Strong warming in the most recent decades is absolutely obvious in both datasets. Here’s a link to the resulting PDF for Europe:

    RESULTS-EUROPE.pdf

    If you want to try it yourself (data + simple script + example output):

    GHCN-QND-ANALYSIS.zip
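
    (If you want to try the same kind of check in R rather than MATLAB, a rough analogue of the binning step is sketched below. It is not the script in the zip; it assumes the adjusted GHCN v2 file is named v2.mean_adj and shares the fixed-width layout of v2.mean, both of which are worth verifying first.)

    read.ghcn <- function(f) read.fwf(f, widths = c(12, 4, rep(5, 12)),
                                      colClasses = c("character", rep("numeric", 13)))
    raw <- read.ghcn("v2.mean")
    adj <- read.ghcn("v2.mean_adj")
    key <- function(x) paste(x[, 1], x[, 2])                 # station/duplicate ID plus year
    common <- intersect(key(raw), key(adj))
    r <- as.matrix(raw[match(common, key(raw)), 3:14])
    a <- as.matrix(adj[match(common, key(adj)), 3:14])
    r[r == -9999] <- NA; a[a == -9999] <- NA                 # drop missing months
    d <- (a - r) / 10                                        # adjustment in degrees C
    bin <- 5 * (raw[match(common, key(raw)), 2] %/% 5)       # 5-year bins
    round(tapply(rowMeans(d, na.rm = TRUE), bin, mean, na.rm = TRUE), 3)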

    I’m not a climatologist (although I am a scientist, and have performed QC on environmental data – which I guess puts me squarely on the Dark Side of the debate on AGW). Yet I’ve done this analysis in 4 hours, without any prior knowledge of the GHCN dataset. What this shows, in my opinion, is that anyone who claims to have spent yeeeaaaars of his/her life studying the dark ways of the IPCC/NOAA/WMO/etc., and still cannot reproduce their results or still cannot understand/believe how the results were obtained, is full of sh#t… (don’t be offended, I guess you haven’t tried long enough to qualify… :))

    Keep it up.
    Steven.

  5. OK, now I am confused.

    Some smart guys plot adjustments vs. time and find disturbing trends.

    Other smart guys plot adjustments vs. time and find proof that no trend can be seen in the adjustments.

    What am I missing here?
    TL

  6. Paring stations WAY down to only those that extend back to the 1700s is revealing: almost all fail to show any recent upturn in trend:

    A couple more are included here that do show a slight upturn:

    These are no good for a global average, but the question becomes why a huge proposed upswing in global average doesn’t show up in individual records. If Global Warming is real, why doesn’t my own city of New York show any of it, at *all*?

    Long records strongly suggest to me that an average is being calculated in a way that hotter stations are being added towards the end, or cooler stations are being removed. Thus a global average could show an upturn while no individual stations do so.

  7. TurkeyLurkey said: “OK, now I am confused. Some smart guys plot adjustments vs. time and find disturbing trends. Other smart guys plot adjustments vs. time and find proof that no trend can be seen in the adjustments.”

    The “no trend” analysis is based on the global data set. The US data set shows much greater adjustments. I’ve found both plots and overlaid them here:

    I have yet to see an equivalent “no trend” analysis of the US data, meaning a histogram of slope changes.

    “No trend” (“no” = minimally positive but still positive) slope change analysis does not rule out shape changes though. Crowding all the warming into the last few decades and taking it out of the former ones will still show “no trend change”.

    Also, most of these adjustments do *not* involve urban heating. So far only GISS does that as far as I know, and though homogenization may mix urban and rural data together it may not actually correct for urban heating (or rural land use changes).

    Confusing indeed!

  8. First of all, Ryan O, your post is well written and easy to comprehend. What you have shown is essentially what I have been attempting to show, and that is that most of the centennial temperature trend (at least in the contiguous 48 US states) is due to adjustments. Therefore, when we compare different data sets like GHCN, GISS and CRU, which use essentially the same raw data, we are obtaining a measure of how well the adjustments agree. We can look at this measure in more detail by disaggregating the series differences into shorter time periods and into regions.

    I would also say that lack of station quality control, as exposed by the Watts team CRN evaluations and ratings, puts much non-measured and unacknowledged uncertainty into the raw data itself. Changing micro climates have to have some very obvious effects on trends. Notice how the people at GHCN hand-waved this problem off when emphatically confronted with it by, I believe, Watts and Pielke Jr.

    I would strongly suggest the linked article below to anyone wanting to better understand the adjustments that go into the GISS and GHCN temperature data. Read it carefully to ensure that you understand what uncertainties exist and whether there have been attempts made to measure them.

    We know the TOB, or time of observation, is a large adjustment, and while I have not attempted to get a good feel for its validity I have seen it written that it is supposed to be straightforward. The MMTS adjustment, from glass to min/max thermometers, should be straightforward also, but I have seen more doubting on that adjustment than for TOB. Of these adjustments, non-homogeneity adjustments would have to be less straightforward in my mind and can be large. Even if random, I would think such a large adjustment would put an additional uncertainty on the adjusted temperatures. I am not at all convinced that that uncertainty is acknowledged and/or taken into account.

    http://pubs.giss.nasa.gov/abstracts/2001/Hansen_etal.html

    My overall impression is that the keepers of the temperature data sets have a very selfish interest in downplaying any uncertainties that exist in the adjusted (or raw) data or even between data sets. This state of affairs cries out for more independent study and published results. I sometimes think that independent investigators chase the easy and perhaps wrong aspects of uncertainty. I would encourage a two-pronged study: comparing the adjustments and understanding the differences between data set constructions, and looking much, much closer at the (changing) micro climate effects on raw temperature measurements. I sometimes think that independent investigators get too wrapped up in the effects of UHI when the real and more general one is micro climate effects, which can occur in any environment.

    Below I want to document what passes for a scientific reply by Phil Jones and ask if we should trust and not verify (also read, above the Jones reply on the same linked page, a statement by Ben Santer, who apparently thinks the different data sets are independent when that situation fits his purposes):

    No one, it seems, cares to read what we put up on the CRU web page. These people just make up motives for what we might or might not have done.

    Almost all the data we have in the CRU archive is exactly the same as in the Global Historical Climatology Network (GHCN) archive used by the NOAA National Climatic Data Center [see here and here].

    The original raw data are not “lost.” I could reconstruct what we had from U.S. Department of Energy reports we published in the mid-1980s. I would start with the GHCN data. I know that the effort would be a complete waste of time, though. I may get around to it some time. The documentation of what we’ve done is all in the literature.

    If we have “lost” any data it is the following:

    1. Station series for sites that in the 1980s we deemed then to be affected by either urban biases or by numerous site moves, that were either not correctable or not worth doing as there were other series in the region.
    2. The original data for sites for which we made appropriate adjustments in the temperature data in the 1980s. We still have our adjusted data, of course, and these along with all other sites that didn’t need adjusting.
    3. Since the 1980s as colleagues and National Meteorological Services (NMSs) have produced adjusted series for regions and or countries, then we replaced the data we had with the better series.

    In the papers, I’ve always said that homogeneity adjustments are best produced by NMSs. A good example of this is the work by Lucie Vincent in Canada. Here we just replaced what data we had for the 200+ sites she sorted out.

    The CRUTEM3 data for land look much like the GHCN and NASA Goddard Institute for Space Studies data for the same domains.
    Apart from a figure in the IPCC Fourth Assessment Report (AR4) showing this, there is also this paper from Geophysical Research Letters in 2005 by Russ Vose et al. Figure 2 is similar to the AR4 plot.

    I think if it hadn’t been this issue, the Competitive Enterprise Institute would have dreamt up something else!

    http://www.climatesciencewatch.org/index.php/csw/details/phil-jones-and-ben-santer-comment-on-cei

  9. I imported the December version to see how it compared to October’s version. The recent station counts for the U.S. (station ID beginning with 425) remain unchanged as far as I can tell:

    YEAR COUNT
    2006 1177
    2007 134
    2008 136
    2009 136

    I didn’t check for duplicate “thermometer versions.”

    Most of the 136 stations are airports, but Chiefio has been on this one for a while.

    136 stations works out to 2.7 airportmometers per state.

    I still can’t figure out why we are trying to measure climate at airports. Anyone?
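
    A rough way to reproduce counts like these, using the temp data frame from the script in the post (the 425 country prefix for the U.S. and the 2006-2009 range come from the check described above):

    us <- temp[substr(as.character(temp[, 1]), 1, 3) == "425", ]      # country code 425 = United States
    has.data <- apply(us[, 3:14], 1, function(v) any(as.numeric(v) > -9000))  # at least one real month
    sapply(2006:2009, function(y) length(unique(us[us[, 2] == y, 1])))              # stations reporting at all
    sapply(2006:2009, function(y) length(unique(us[has.data & us[, 2] == y, 1])))   # stations with actual data
    # Note: unique() counts GHCN "duplicate" records separately; wrap the ID in
    # substr(as.character(.), 1, 11) to collapse duplicates into one station.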

  10. The count for 2009 should really be 134.

    42572340000*2*2009: missing 9 out of 11 values
    42591190000*0*2009: missing 10 out of 11 values

    Why do they even bother to include these stations if most of the values are missing?

  11. Ryan,

    Your brute force analysis strikes me as having a lot more validity than a proxy reconstruction. People seem happy to draw conclusions from a brute force analysis of tree ring widths, so why not draw conclusions from a brute force analysis of mercury heights.

  12. Everyone,

    Yes, this can be gridded up, and I will do so at some point.

    Also, this makes no statement on the adjustments. I am not comparing raw vs. adjusted – I’m simply plotting raw. The adjusted data may show little to no change for these stations, as they are not geographically representative.

    Ken, your points are well taken. I have not looked at TOBS yet, but I will.

    The purpose of this is not to show that anything’s wrong with GHCN – it’s simply to garner ideas and pointers from you guys on what to look at (like in Ken’s post).

  13. I like Chiefio’s phrase for the decline in collated measurements in the early 1990s: The Great Dying of Thermometers.

  14. Willis Eschenbach said that he and Anthony Watts were setting up surfacetemps.org to put all the data and analysis into one place. Sounds like what is needed.

  15. Looks almost like an average of the GISS North America and GISS Global anomalies, which makes sense since so many of your quality stations should be weighted towards the US. Will be very interesting to see your results.

    Question: instead of making “adjustments” for station moves to make them continual long-term trends, what does the data look like if you treat a moved station as a totally different station/trend altogether and only use the trend data from one location and treat the trend at a later location as a totally different station? Would love to see this same chart with moved stations also weeded out if possible.

    Would also like to weed out poorly sited stations (requires a global survey). Could at least do for the US and see how it compares to GISS US.

  16. Sorry for not interacting much today. It’s been busy again.

    #6, The reason for the difference in results between Ryan and Steve is that Ryan accounted for the offset problem that arises when short anomaly stations are knitted in.

    Suppose you have two thermometers at the same location measuring the exact same values, with a perfect noiseless trend of 1C/century. Assume one is operational for the last 100 years and the other for the last 50 (same exact measurements every day). When you anomalize, the first gets an initial value at year zero of -.5 and an ending value of +.5, and at year 50 it’s a perfect 0. The fifty-year thermometer gets an initial value of -.25 and an ending anomaly value of +.25, so when the two are averaged at year fifty there is a sudden step in the average due to the introduction of the new station, even though in true temp they had the same exact value.

    So there are different ways to correct for that effect. One is to offset the new station by the other’s temperature record and eliminate the step. A second method, which Ryan used here, is to not use the ‘short’ data at all.

    Therefore Steve’s method used all thermometers averaged together; he didn’t account for the steps! Ryan’s brute force approach eliminates all the potential bias and gets a different result. It raises the question: does Steve’s, CRU’s, NOAA’s or GISS’s method create trend purely by the introduction of improperly offset short series? I believe ChiefIO, like Ryan, also found a yes answer.
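
    Here is a toy version of that two-thermometer example in a few lines of R (perfect, made-up data matching the numbers above):

    yrs    <- 1900:2000
    true   <- 0.01 * (yrs - 1900)                    # a perfect, noise-free 1 C/century trend
    a      <- true                                   # station A: reports all 101 years
    b      <- ifelse(yrs >= 1950, true, NA)          # station B: identical, but only the last 51 years
    anom   <- function(x) x - mean(x, na.rm = TRUE)  # anomaly against each station's own mean
    merged <- rowMeans(cbind(anom(a), anom(b)), na.rm = TRUE)
    diff(merged)[yrs[-1] == 1950]                    # step of about -0.12 at 1950 vs. the true +0.01

    Offsetting station B by the difference over the overlap (or dropping it entirely, as Ryan did) removes the step.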

    This is actually a pretty strong post, despite my tone softening comment at RC. I for one would like a reasoned explanation.

    I think Steve’s post still did a good job – you can see that a trend was still added to the homogenized data manually, despite the additional bias found by Ryan. Steve’s post just didn’t address the same real issues.

    In my opinion, it is simply not possible to have such huge variations in station numbers without dramatically affecting trends, and these results cannot be ignored. The methods can be easily cherry-picked, as my Antarctic post showed. All but one station was dropped from the Antarctic record, creating a false trend increase. The same effects can occur as they are added in.

  17. I think there is a pretty obvious explanation for this:

    When you “discarded all series that did not have at least 1,000 points since 1900” and required that “from 1990 – 2009, stations had to have at least 180 points (or about 70% complete)”, you accidentally selected for temperature data from the United States.

    Since it is generally agreed that the United States got as warm in the 1930s as it is today, your smoother graph is also as warm in the 1930s as it is during the present day.

    A geographic analysis of the remaining stations should confirm or rebut this point.

  18. Presumably the trick would be to be able to graft the older historical temperature record (when we had lots of stations) on to the satellite temperature record for the more recent past (when we didn’t)?

    But hmmmm indeed.

  19. Re #18, ahhh, my patented ‘ask a stupid question’ technique works once again. I appreciate your sophisticated version of the complex answer.

    Re #9;

    Your efforts are laudable, but I’m thinking that there’s 1.8 degrees F for every degree C.

    But your graph;
    “The “no trend” analysis is based on the global data set. The US data set shows much greater adjustments. I’ve found both plots and overlaid them here:

    http://i46.tinypic.com/6pb0hi.jpg

    seems to indicate that
    1.08 degree C is plotted on the same line as 0.6 degrees F.

    So, as far as I can tell, you might want to rescale that overlay?

    Re #5, I guess the way you got to ‘nothing fishy’ was to look at all the stations instead of the long-record stations. I’m further guessing that you are blessing the ‘southward march’ and the ‘downhill march’ of the thermometers of note.

    I guess that if Ryan O had separately plotted all of the short-record stations, it would be remarkably different than the lengthy ones.

    thanks to all.
    TL

  20. #19

    I was wondering the same thing, which is why I suggested spatial weighting (a la the Jeffs for Antarctica). However, I think the chances that this idea might not have occurred to Ryan are likely slim to none. 😉

  21. “Not surprisingly, both for the global dataset and the European subset this shows near-neutral adjustments (i.e., no “cooling the old data” or “cooking the recent data”).” – Steven van Heuven

    Curious. Eyeballing your Europe graph, it looks as if the temperature rise from (picking obvious round dates) say 1900-2000 has been adjusted from about 1.1 degrees to 1.4 degrees.

    If increasing the measured warming by 27% doesn’t count as “cooking the books” in your world, one can only wonder what does!

  22. #23 Lurk,

    Ryan’s one of those annoying people who doesn’t miss much.

    “that does not take into account geographical representation or station moves”

    I’m sure he realized the geographic weighting can cause a problem which is the reason for the title of the post. However, if I had to guess, the inclusion and exclusion of non-offset anomaly data probably creates more trend than geographical location. It’s a complete guess though and perhaps wrong.

  23. 89 billion dollars spent on global warming and we are only using 134 thermometers to measure the temperature of the United States? Damn, and they castigated the military for 600 dollar toilet seats.

  24. #25 I think the homogenization process has more effect than geographical, but I will find out for sure eventually. 😀

    Lurk . . . I will do even better than what you suggest, though it will take some time. 😉

  25. #28, It’s written in R. The language is free and easily found in a google search. Steve McIntyre promotes R pretty regularly for this reason. It’s very similar to Matlab (which is a more functional language) but R is a free download and anyone can verify.

  26. #27

    Ryan, haven’t seen you drop a teaser like that for several months. It’s like “twenty questions” or something. 🙂 Here is my second question: if I refreshed myself on the Antarctic posts several months back could I deduce the method you are hinting at?

  27. Still am incredulous that climate science tries to estimate “one” global average. Seems pretty silly to me. The least we should have is 48 separate zones divided on our 24-hour time scale and then north/south. To average all of these does not help in “seeing” the W5. Maybe my area of the map is flat, whilst south of me is cooler by .5, but next door to that is warmer by 1.2. Suddenly we have a “global” warming trend? Methinks not!

  28. Gents;
    Thanx, I now see that the answer “R” was in the original post. Duh…
    Eye appointment Friday… Expect I’ll cross the rubicon of +3.5add on my lenses…

    Well, I was counting parentheses on my LISP program, late one night in the basement of the computing center, ~37 years ago, when I decided that LISP SUCKS and that I hated being an IS major.

    I’ve successfully avoided learning any non-assembler languages, but I’m getting up a head of steam to learn this dang R thing.
    At least I have a reason to care about it, and a nice community of people generating compelling examples.

    Thanks for your patience.
    TL

  29. “Suppose you have two thermometers at the same location measuring the exact same values, with a perfect noiseless trend of 1C/century. Assume one is operational for the last 100 years and the other for the last 50 (same exact measurements every day). When you anomalize, the first gets an initial value at year zero of -.5 and an ending value of +.5, and at year 50 it’s a perfect 0. The fifty-year thermometer gets an initial value of -.25 and an ending anomaly value of +.25, so when the two are averaged at year fifty there is a sudden step in the average due to the introduction of the new station, even though in true temp they had the same exact value.”

    Jeff, can’t this be avoided by just calculating a per-station trend using a least squares fit on the raw station data? This should even be relatively insensitive to large gaps in the data. I’ve never understood this need to anomalize raw data first.

    Then take your per-station trends and bin them up to show a frequency distribution. I am guessing it’s a relatively fat distribution, centered on 0.

    Then you could also calculate some similar distributions by latitude, country, etc… and see if there is anything that sticks out.

  30. Steven,

    Your post proves a point, but perhaps not the one you wanted to make.

    The initial mistake I made was to assume GHCN was adjusting raw data, but at least in the US, it has already been adjusted a number of times by NCDC for the USHCN, each step introducing more possible small error while removing gross discontinuities. It has been an eye-opening journey to see how many times the data is adjusted by various algorithms. I’m sure I still don’t have it all completely right, but this is what I have seen so far, at least for US data. You need to ensure your raw data is in fact raw, not already cooked.

    My observation is that the accumulation of these uncertainties appears to exceed the range of detected warming signal that is claimed.

    Every time someone adjusts the data they also increase the band of uncertainty. This uncertainty builds upon the uncertainty already in the raw data prior to hand-off to GHCN. Many of the adjustments appear to have legitimate reasons of trying to remove large discontinuities or false overall trends. However, every time they modify the data with an algorithm that reduces the volatility, they add some uncertainty even while these other problems are fixed.

    In the US, the data is first gathered daily from the station.
    The data is collected and reported in 1 degree F increments.

    DSI-3200 Page 4:
    “The accuracy of the maximum-minimum temperature system (MMTS) is +/- 0.5
    degrees C, and the temperature is displayed to the nearest 0.1 degree F. The observer records the values to the nearest whole degree F. A Cooperative Program Manager calibrates the MMTS sensor annually against a specially maintained reference instrument.”

    Click to access td3200.pdf

    So before any adjustments are made, the data has an error range of +/- 1F or .5C !

    Then it is adjusted at least 3 times before it is handed to GISS
    http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

    Time of Observation Bias Adjustments (Adjustment #1) (Error range unknown)
    “Next, monthly temperature values were adjusted for the time-of-observation bias (Karl, et al. 1986; Vose et al., 2003).”

    “The TOB-adjustment software uses an empirical model to estimate and adjust the monthly temperature values so that they more closely resemble values based on the local midnight summary period.”

    Homogeneity Testing and Adjustment Procedures (Adjustment #2) (Error range may be shown by NCDC, see below)
    “Following the TOB adjustments, the homogeneity of the TOB-adjusted temperature series is assessed. In previous releases of the U.S. HCN monthly dataset, homogeneity adjustments were performed using the procedure described in Karl and Williams (1987).
    Unfortunately, station histories are often incomplete so artificial discontinuities in a data series may occur on dates with no associated record in the metadata archive. Undocumented station changes obviously limit the effectiveness of SHAP. To remedy the problem of incomplete station histories, the version 2 homogenization algorithm addresses both documented and undocumented discontinuities.
    Estimation of Missing Values (Adjustment #3) (Error range unknown)
    Following the homogenization process, estimates for missing data are calculated using a weighted average of values from highly correlated neighboring values. The weights are determined using a procedure similar to the SHAP routine. This program, called FILNET, uses the results from the TOB and homogenization algorithms to obtain a more accurate estimate of the climatological relationship between stations. The FILNET program also estimates data across intervals in a station record where discontinuities occur in a short time interval, which prevents the reliable estimation of appropriate adjustments.
    Urbanization Effects (NCDC says this is covered by their Homogenization algorithms)
    In the original HCN, the regression-based approach of Karl et al. (1988) was employed to account for urban heat islands. In contrast, no specific urban correction is applied in HCN version 2 because the change-point detection algorithm effectively accounts for any “local” trend at any individual station. In other words, the impact of urbanization and other changes in land use is likely small in HCN version 2.”

    Now, after starting out with an observation error range of +/- 1F (.5C), every one of these prior adjustments adds uncertainty to the data.

    For example:

    1. Raw data point is 15C +/- .5C. That means the range is 14.5C to 15.5C.
    2. The first adjustment adds +/- .25. Now the range is 14.25C to 15.75C.
    3. Second adjustment adds +/- .25. Now the range is 14C to 16C.
    4. Third adjustment adds +/- .5. Now the range is 13.5C to 16.5C!

    I chose these numbers for 2, 3 & 4 as examples; I do not yet know the real numbers. However, I chose their net value because the NCDC gives an example in the document that describes their process. Their chart for Reno shows error bars that look to be around a 1.8C to 2C range of error introduced by their adjustments. This is additive to the +/- .5 C built into the raw measurement, as there is no indication it includes the raw error range. It could actually be worse, as it is unclear if the TOB & missing data interpolation are part of their uncertainty calculation.
    http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

    Only after these adjustments are done does GISS get the data to merge into the GHCN.
    http://data.giss.nasa.gov/gistemp/sources/gistemp.html

    Then, incredibly, they make more adjustments!

    http://data.giss.nasa.gov/gistemp/sources/gistemp.html
    First they remove the Adjustment #3 data from above. So in a sense they remove one source of error. (Adjustment #4)
    “The reports were converted from F to C and reformatted; data marked as being filled in using interpolation methods were removed.” If they only remove part of Adjustment #3, then more uncertainty is introduced.

    This indicates GISS is using the NCDC/USHCN data that has been through Adjustment #3 and they take it mostly back to Adjustment #2.

    Then, they homogenize the data again! (Adjustment #5)

    “The goal of the homogeneization effort is to avoid any impact (warming
    or cooling) of the changing environment that some stations experienced
    by changing the long term trend of any non-rural station to match the
    long term trend of their rural neighbors, while retaining the short term
    monthly and annual variations. If no such neighbors exist, the station is
    completely dropped, if the rural records are shorter, part of the
    non-rural record is dropped.”

    The specific stated goal of this adjustment is to take into account the Urban Heat Island effect. Yet NCDC says they already adjusted for this when they homogenized the data! Now the data is twice baked for UHI. (Which I thought the IPCC and Jones, Wang (1990) said was negligible.)

    A bit concerning is that the video data analysis shows UHI is still in the data, by sampling city/rural pairs around the country. Hmm… are we smarter than a 6th grader..:-) (I need the URL, but it is well known and easily reproduced independently)

    Roman M has done an excellent job in his blog showing these GISS adjustments are driving a bias into the data.
    http://statpad.wordpress.com/2009/12/12/ghcn-and-adjustment-trends/

    I have sampled numerous individual stations in CA and seen the same bias being introduced by this process. I described the process for doing this in comments over at WUWT. Others using that method have found the same results in NY, Grand Canyon, Calgary, etc. http://wattsupwiththat.com/ (I need to find the exact links, sorry)

    So there are some building concerns about this final step. First, it appears duplicative to (at least in the US) what has already been done. Second it appears to be reshaping the curves to fit the story. No one has any indication if this is just a bad algorithm or deliberate.

    Don’t forget, this also introduces uncertainty. NCDC estimated their homogenization introduced up to +/- .9C if I read their example chart correctly.

    So right now it appears the inherent cumulative range of error far exceeds the claimed warming signal that has supposedly been detected.

    This chart shows GISS claiming a warming signal of .6 C detected.
    http://data.giss.nasa.gov/gistemp/2008/

    Remember, even the raw data was +/- .5 C from the moment it was written down per the NCDC.

    The most incredibly generous reading of the Reno example for NCDC (assuming it includes the raw error rate, TOB, and Missing Value Estimates) shows an error band of at best +/- .9 C.

    Then there is whatever uncertainty is added by the GISS process that also appears to bend the curve.

    I don’t claim to know all the right numbers for the total uncertainty being introduced into the data. However, this error budget must be fully disclosed and understood. It certainly appears that the signal is less than the noise introduced by the original measurement and the up to 5 adjustments. Exactly by how much is critical to prove the point that a warming signal has in fact been detected.

    I don’t know what process the raw European data goes through. Until someone describes the path from observation to the end, it is hard to say. Your post shows early 20th century adjustments of +/- 1C. Given my earlier mistake on GISS “raw” that turned out to already have been modified by NCDC, you may want to investigate if your data may be in the same position. Assuming a similar raw observation error of +/- .5C, the cumulative error range is already +/- 1.5C. It is really hard to claim a 1C signal detection in that environment.

    Given this is an open review process, I’m sure others will find mistakes in this post, let me know and I will investigate and correct as required.

  31. I think the sudden decrease in station counts is suspect, especially when it appears those that have survived have a marked warming bias. Does GHCN publish their criteria for station inclusion/exclusion? Ideally they’d have some unbiased criteria which everybody could critique and verify.

  33. Just to add a general observation here: In the GISS/Hansen paper that I linked above it is freely admitted that while the calculations for the TOB adjustment are straightforward, given that you have accurate documentation of when the observation times changed and to what hours, the key remains that accurate documentation is first required. TOB and station-change inhomogeneities are the largest adjustments made with regards to trend, but in the end both depend, as noted by Hansen in the paper, on documentation. (USHCN forgets the documentation and uses change points for some of these adjustments, and it makes large differences at some stations.)

    Now please think about the quality control claimed by USHCN for its stations, and then what the Watts evaluation team discovered, and then tell me that you have good faith and trust in the TOB and change-of-station documentation that is claimed or at least implied by the temperature series owners. Look at the sloppiness revealed of the CRU data set owners in “read me Harry” and in losing raw data. Look also at how all this is waved away by the owners, who claim it does not matter. And finally look at how the users of these data and other parties who should be interested in accurate data shrug and carry on.

  34. #37 It’s important not to calculate the trend on raw non-anomaly data, because the raw monthly temperatures have a huge sinusoidal signal from seasonal change. The seasonal signal overwhelms the trend variation. That’s why it’s so important that technical people like yourself grab a bit of data and run a plot.

    It’s not hard and the data is available.
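
    A made-up illustration of the point: bury a 1 C/century trend under a 10 C seasonal cycle and the raw monthly fit has an uncertainty about as large as the trend itself; subtract the monthly means and the trend is easy to resolve.

    set.seed(1)
    yr    <- seq(1900, 2000, by = 1/12)                       # monthly time axis
    mons  <- rep(1:12, length.out = length(yr))               # calendar month of each point
    raw   <- 0.01 * (yr - 1900) + 10 * sin(2 * pi * yr) + rnorm(length(yr), sd = 1)
    summary(lm(raw ~ yr))$coef["yr", ]                        # slope ~0.01 C/yr, std. error ~0.007
    anoms <- raw - ave(raw, mons)                             # subtract each month's long-term mean
    summary(lm(anoms ~ yr))$coef["yr", ]                      # same slope, std. error ~7x smaller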

  35. Hey Jeff;
    Thanks for pointing out another blindingly obvious thing that I had not realized until you wrote it.

    What would be the effect of truncating short records at multiples of 365 days length, and fitting the trend to each one as #37 was suggesting?
    Whenever a purported discontinuity (change of equipment, location, etc) just separately truncate that segment and then mod365.25 days. Hmm, tricky to do that fractional day thing…

    I like the idea of avoiding as many ‘adjustments’ as possible.
    I’m not sure of the value of trying to adjust the segments to get rid of shifts…
    Talk is cheap; I’m not saying anyone ‘should’ do this.
    I’m just curious what would be wrong with this approach.
    TIA
    TL

  36. You can’t just fit trends and then tack them all together; your final answer will be totally erroneous.

    Take any noisy time series – be it the CRU index, an individual station, or a stock price. Break it up into intervals (your choice). Perform a linear regression for each interval and tack them end-on-end. Then compare that result to the linear trend using all the data.

    No worky.

    The problem is that in the regression model y=mx+b, you have to keep track of all of the b’s (the y-intercept). In other words, even if you try just to use trends, you still have to keep track of the intercepts, which means you have to know what the offset is from one segment to the next. Now how do you know if a “shift” is due to real no-shit temperature change? Or a station move? Or UHI? Using trends gives you no additional information.

    The other issue is with anomaly baselines. When it comes to climate, we only care about the change. That change has to be relative to a baseline. If you have incomplete records, you cannot baseline the records to the same period without figuring out what the offset is from one station to the next. With anomalies, the issue is intractable. You can’t go back and add more stations, so if you want to use information that only has partial coverage temporally, you must adjust.

    In other words, the trend thing won’t work.
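
    A toy numerical version of the problem (my own made-up series): the record warms 1 C through a step at 1950; a fit to the whole record sees the warming, but each half is trendless, so segment trends tacked end-on-end see none of it – the change lives entirely in the intercepts.

    set.seed(2)
    yr <- 1900:2000
    x  <- ifelse(yr < 1950, 0, 1) + rnorm(length(yr), sd = 0.2)     # flat, then a +1 C step at 1950
    coef(lm(x ~ yr))["yr"] * 100                                    # whole record: ~1.5 C/century of fitted trend
    c(coef(lm(x ~ yr, subset = yr <  1950))["yr"],
      coef(lm(x ~ yr, subset = yr >= 1950))["yr"]) * 100            # per-segment trends: ~0 C/century each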

  37. It is nice to see people looking at the NASA data, which has been corrupted either way due to their own release statements in the last few years. However, one major problem these programs have is beyond obvious: there is no account of the global position of each temperature taken. For instance, let’s say the wall in your room was to be measured for a wall temperature, and this wall temperature was to be the average temperature across the entire surface. By all laws of sampling, as taught in simple engineering school or grade-school science fair projects, one needs to take temperature readings at a uniform distance across the surface so as to have a uniform sampling density. Why? Because if you take 100 samples on the left side at the floor and one sample in the middle, it would be invalid sampling. This is fairly obvious, but climate people seem to not have a clue. Most of this temperature science is found under “black body radiation and sampling” chapters in most engineering books. In the real world of steel production, temperature sampling is extremely important and is well understood. Uniform or equally distant sample points are required unless a proven distance-to-temperature capacity formula is used to quantify errors in true distance sampling. This means you placed 100 points on a surface uniformly but a few are millimeters off. Averaged cells are used to form a pyramid of cells with error values for each cell. Every sample has a sampled error and a corrected error.

    Just being Captain Obvious here, but none of the sample data will allow a global temperature, even for the USA. Period and end of story. What can be said is that you have historical data on several temperature reading sites. One cannot use it for a global or average temperature at all. Just not correct.

    Sadly, the temperature data can say maybe for a city that the temperature has changed over time, but that is it. All this was well known back in the 1950’s and 1960’s. That is why the thought was to have satellites measure the overall surface. But there are some snags there as well; one is that the aging of the satellites’ sensors has not been sufficiently verified as recommended by the engineers. For instance, a uniform swath of land with temperature readings would need to be verified by the satellite data. This comparison would give a confidence number as to the correctness of the satellite equipment. But once again, this was only done initially and was dropped due to budget cuts. And satellite sensing has its problems due to ionosphere interference, radar curvature problems, and atmospheric interference. Hence, the sampling was never corrected. This was deemed OK since they were just used as a general feel, not tenths-of-a-degree accuracy. Maybe plus or minus 5 degrees C, but then one could not say if the world was warming or cooling a degree.

  38. You mention station dropout, but if you check the source country, quite often the station is still there and recording data, but the new data is missing for some reason. Not sure of the reason for this, though the ones I have looked at, surprise surprise, show no warming in the last 20-odd years. I also noticed in GISS that some stations disappear after being homogenised; again, not sure of the reason.

    A thing to look at also, when looking at station geographical coverage, is to account for changes in station numbers at differing altitudes. By dropping out high-altitude stations and adding in more coastal stations you can skew the data to warming whilst still accounting for station coverage weightings and homogenisation. If the change in higher-altitude stations is not accounted for, the results are easily skewed, and I am not sure of anyone actually considering this – but it’s key to getting it right!

  39. The trend method can, indeed, work. In the first place – you really truly don’t care what the intercept is. Since the entire goal of the exercise is a trend, you never have to convert back to a true temperature. But we actually could with one solid current reading.

    This is exactly analogous to tracking a time-varying position (say: a car) versus tracking the velocity. If the only thing you ever watched was the velocity, you can still peg the average velocity in the end. If you have a single reading of time-position-velocity that you trust, you can convert the entire velocity chart back into position. (Velocity is the first derivative of position with respect to time, and ‘temperature trend’ is likewise the first derivative of temperature with respect to time.)

    The issue with the raw data is: someone is picking up the damn car and relocating it at random. Or fiddling with the odometer. The speedometer still basically works. But the silent relocations are brutal.

    Discontinuities on the seasonally smoothed temperature-time plot imply station moves/instrument changes/other systemic issues – even when there isn’t an accompanying station move notice.

    Picture this set of data with no further info:

    1900 0
    1910 1
    1920 2
    1930 3
    1930.01 10
    1940 9
    1950 8
    1950.01 3
    1960 2
    1970 1
    1980 0
    1990 2
    2000 4

    I’m happier dealing with that set of data from the ‘trend’ domain after splitting on discontinuities. “Excess” splitting means there’s less data in each trend-segment -> increased but directly calculable errors. A ‘missed’ discontinuity basically means it wasn’t large enough to detect – which adds it back into the other noise. That is: a missed discontinuity is a spot with the potential to improve our calculation – but missing it doesn’t dramatically foul it up more than the original temperature-domain calculations.

    I started with three stations (mountain, valley, half-way up) and +1C/d to 1930, -1C/d to 1980, and +2C/d to 2000. Yes, it is simplistic; yes, dealing with "seasonal adjustment" is irritating; and no, this doesn't directly correct for UHI. But a calculation of "this is what the thermometer's temperature history would have been if it hadn't ever been moved" would be quite handy.
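
    Reading the "/d" above as "per decade", here is a minimal R sketch of the trend-domain idea on that made-up series: split wherever the step between consecutive points is implausibly large, then fit a slope within each segment and never compare absolute levels across a split. The 3-degree jump threshold is an arbitrary choice for this toy data.

    yr   <- c(1900, 1910, 1920, 1930, 1930.01, 1940, 1950, 1950.01, 1960, 1970, 1980, 1990, 2000)
    temp <- c(0, 1, 2, 3, 10, 9, 8, 3, 2, 1, 0, 2, 4)
    seg    <- cumsum(c(TRUE, abs(diff(temp)) > 3))    # start a new segment after each big jump
    slopes <- sapply(split(data.frame(yr, temp), seg),
                     function(d) if (nrow(d) > 1) coef(lm(temp ~ yr, data = d))[2] else NA)
    round(slopes * 10, 2)   # slope per decade in each segment: +1, -1, then a blend of -1 and +2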

  40. I have a forecast on the site http://www.aerology.com/national.aspx; I paid to have it put together. I have invested 25 years of my life (and about 20% of my income from those years) to find the elusive natural-analog weather forecast, and now I have it online for the whole world to use, for the next 4 whole years, free of charge.

    There has been no cherry-picking of data stations, no adjustments to the raw data, no hidden agenda; just straight plotting of past data in sync with a natural cyclic pattern. It lets the viewer see the "natural patterns of variability" that need to be included when sorting the noisy short-term (monthly to decade-long) signals, so that the parts played by solar, geomagnetic, lunar tidal, and CO2 forcings in the whole scheme of climate interactions can be quantified, the individual components separated, and algorithms derived to fix the problems with the weather and climate models.

    I do not qualify for grants, subsidies, or even the minimum qualifications to submit for "peer-reviewed publishing." I have spent 30 years reading research; of that, 10 years sorting data and looking for patterns, then 15 more years working the bugs out and trying to find a format in which to present it understandably.

    Why do gatekeepers always end up standing in the way of progress, in the name of keeping the science pure? Science is the study of how truth comes out of the darkness, as more understanding from the sidelines becomes mainstream and fits into an improved picture of reality.

    I was hoping to add to the total understanding and, through exposure to the real weather and climate scientists in this forum, to get something like peer review, to further improve the end product that is available to the public for the greater well-being of the inhabitants of the whole planet. I am not looking for power, money, or fame, just to leave something good behind when I go.

    Richard Holle

  41. (Somehow this got lost off of my post above)

    The raw data set I used is the Cooperative Summary of the Day, TD3200, POR-2001, Rel. Nov 2002.
    How do I find out how much adjustment and homogenization has been done to this set of data? I have assumed that it is totally raw data; was I wrong to assume, or even hope for, this?

    However, it does show trends due to the aforementioned cyclic patterns, which could be used to hone the short-term accuracy of the models if this trend in the cyclic patterns were built into them.

    The analog cyclic pattern I discovered repeats within a complex pattern of inner-planet harmonics and longer-term outer-planet interferences that come round to the 172-year pattern Landscheidt discovered. So this is just the shorter-period set of variables that further define the limits of the natural variables that need to be considered alongside the CO2 hypothesis, as the longer-period parents of these driving forces (Milankovitch and Landscheidt cycles) are valid. It would be an error not to consider them and calculate them into the filtering of the swings in the climate data when forecasting longer terms into the future.

    A sample of the cyclic pattern found in the meteorological database is presented as a composite of the past three cycles, plotted onto maps for a 5-year period starting in 2008 and running to January of 2014, on a rough-draft website I use to further define the shifts in the pattern from the past three cycles to the current one, and to continue learning about the details of the interactions.

  42. If you’re going to look into adjustments, then I think contacting Pielke Sr would be a good idea. He has quite a few papers on the effects of environment on temperature readings.

  43. If the warming is that small, isn't it inside the margin of error? Statistically, that could well mean there is no warming.

  44. Al, your example of car velocity is not quite apt; the underlying variables (time and distance) are continuous, and therefore instantaneous estimates are available. Temperature data is discrete. Different worlds.

    And no, discontinuities absolutely do not imply station moves; “discontinuities” appear naturally all the time. Chaotic systems have a tendency to do that. 😉

  45. The recent discussions of station census, anomalies, raw vs. adjusted data, homogenization, and the like ought to be leading to some important areas of agreement between the AGW-Consensus crowd (e.g. RealClimate) and the Skeptics (e.g. ClimateAudit).

    The red traces (station counts) in Ryan O’s first two graphics are truly shocking. As the stakes become higher, data quantity nosedives. Huh?!?

    Everyone ought to agree on the urgency of a few simple measures.

    Raw data should be collected and made public, with as much metadata (siting, history of adjustments, etc.) as possible. These sorts of databases are essential for discriminating between real trends and spurious ones.

    The many station locations dropped from the GHCN between 1990 and ~2005 should be salvaged. Happily, it seems that in many cases, data collection continued past the dropoff point; the lapse was in collation. Most of those records could be backfilled.

    For those stations that were physically abandoned, most could be reclaimed, and observations re-started in 2010. The addition of metadata would allow factors such as increased urbanization to be taken into account.

    This latter initiative wouldn’t help today or tomorrow. But what if “the science isn’t settled” ten years from now? A much-expanded set of long-term records would be a huge plus, as far as improved understanding of climate.

    So it seems to me.

  46. Anyone know if GHCN file v2.mean is truly raw data? Or has it been cooked in some fashion? For that matter, anyone know just how the data in v2.mean_adj has been “adjusted”? Also, how was the drastic culling of data in the last few years accomplished?

  47. Picking out a trend from a wide error range is not technically defensible.

    Suppose the earlier data has an error range of +/- 1.4C (NCDC processing-induced error of +/- 0.9C plus raw measurement error of +/- 0.5C), and the latest data carries just the raw measurement error (+/- 0.5C).

    In the above, I’m being very, very generous. I did not even include GISS homogenization error. The error margins appear to be even larger.

    You cannot reliably detect a 1C warming signal with this much inherent error in the data.

    They ignore the error range, and are then measuring the “rise” against the middle of the error range in the older data while ignoring the error range in the current data.

    To be simplistic, let's call this a 100-year period and a 10C baseline temp.

    The older data has a 10C base temp with a +/- 1.4C error range, so it can be 8.6C to 11.4C.
    The latest data measures 11C with +/- 0.5C, so it can be 10.5C to 11.5C.

    Now I can get any story I want from this data.

    I can claim a rise from 8.6C to 11.5C. A 2.9C rise!

    I can claim a drop from 11.4C to 10.5C. A 0.9C drop!

    What they appear to do is take the midpoints of the older and newer data and claim 10C to 11C. A 1C rise!
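
    A few lines of R spell out the same interval arithmetic with the illustrative numbers above:

    old <- 10; old.err <- 1.4      # older data: midpoint and half-width of its error range
    new <- 11; new.err <- 0.5      # newer data: midpoint and half-width of its error range
    c(max.rise = (new + new.err) - (old - old.err),   # 11.5 - 8.6  =  2.9
      max.drop = (new - new.err) - (old + old.err),   # 10.5 - 11.4 = -0.9
      midpoint = new - old)                           #  1.0, the headline number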

    Error margins (noise) matter when processing signals, particularly when you can't confirm what signal was sent to test your work.

    Their forward-looking models were proven to have a much larger error range than they thought when the planet's temperatures leveled off or cooled while the models called for a rise. K. Trenberth's email called it a "travesty"; not really, just time to go back and rework the model until it did accurately predict future temps. Their backward-looking reconstructions were proven to have much larger errors because they could not track current temps after 1960 accurately, which led to "hide the decline." The modern data from about 1900 on, and how they process it, has much larger error than they want to broadly admit; I don't see a cumulative error budget anywhere.

    They are claiming to detect a signal below the noise floor. Not gonna happen. Not defensible.

  48. The temperature data is precisely as continuous as position data.

    There is no such thing as teleportation beyond the atomic level, and it is exceedingly difficult to get a temperature to exhibit a true discontinuity rather than a steep but continuous move from one value to another.

    The real issue is “The data is crap, you can’t do that.”

    But even the temperature data needs precisely the same examination for discontinuities (yes, it's discrete data; please read "discontinuity" as shorthand for "examine the data incrementally from the start, perform both incremental fits and residual plots, and set some threshold for veering, like five adjacent points past two sigma, to determine where an instrumentation change probably occurred").
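
    A rough R sketch of that "veering" rule (the run length of 5 and the two-sigma threshold follow the comment; everything else, including the function name, is an assumption for illustration):

    find.break <- function(t, temp, run = 5, k = 2) {
      n <- length(t)
      if (n < 10 + run) return(NA)                   # too short to test
      for (i in seq(10, n - run)) {
        fit  <- lm(temp[1:i] ~ t[1:i])               # incremental fit on the data so far
        s    <- summary(fit)$sigma                   # residual standard error so far
        pred <- coef(fit)[1] + coef(fit)[2] * t[(i + 1):(i + run)]
        if (all(abs(temp[(i + 1):(i + run)] - pred) > k * s))
          return(i + 1)                              # first index of the run that veers off
      }
      NA                                             # no break detected
    }
    # e.g. find.break(series$year, series$temp)      # 'series' is a hypothetical data frame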

  49. Nice work, Ryan

    One can do the whole process of importing the GHCN data in R as follows:

    # Substitute the path to wherever the unpacked raw GHCN data file (v2.mean) is stored
    grid <- scan("Climate Data/v2.mean", what = "", sep = "\n")
    k <- length(grid)
    ghcn.raw <- array(0, dim = c(k, 14))   # columns: station id, year, 12 monthly values
    # Each record is fixed width: 12-char station id, 4-char year, twelve 5-char values
    parse.ghcn <- function(x) {
      c(substr(x, 1, 12), substr(x, 13, 16), substr(x, 17, 21), substr(x, 22, 26),
        substr(x, 27, 31), substr(x, 32, 36), substr(x, 37, 41), substr(x, 42, 46),
        substr(x, 47, 51), substr(x, 52, 56), substr(x, 57, 61), substr(x, 62, 66),
        substr(x, 67, 71), substr(x, 72, 76))
    }
    for (j in 1:k) ghcn.raw[j, ] <- as.numeric(parse.ghcn(grid[j]))
    rm(grid, k)

    Hopefully this will save a little bit of trouble. It takes several minutes to run. I had to do it this way as the version of Excel I am using has a 65k row limit!

    The 12 monthly fields in the GHCN file are all integers. It seems surprising that temperatures are not shown to at least one decimal place. Or am I missing something?
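
    If the integer fields are monthly means stored in tenths of a degree C (which would explain the missing decimal place), a shorter alternative is to let read.fwf do the fixed-width parsing; a sketch, using the same assumed path and layout as above:

    ghcn.raw <- read.fwf("Climate Data/v2.mean",       # same assumed path as above
                         widths = c(12, 4, rep(5, 12)),
                         col.names = c("station", "year", month.abb),
                         colClasses = "numeric")
    ghcn.raw[ghcn.raw == -9999] <- NA                  # -9999 marks missing months
    ghcn.raw[, month.abb] <- ghcn.raw[, month.abb] / 10  # tenths of a degree C to degrees C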

  50. David Starr @ #54

    I strongly suggest that the following link, and its links to change-point calculations, be read in detail by those discussing temperature-set reliability. I would note that change-point adjustments, while not necessarily getting it right (I can see reasons why they would not, and it is in effect a brute-force method), are employed by the temperature-series owners because of a lack of faith in the metadata used for previous adjustments.

    In general, I suggest that a very careful reading of what the temperature series owners publish (and choose not to publish) about their data, and particularly about uncertainties, can be very revealing and certainly provide areas that need independent investigation.

    I have not had an active hand in the evaluations that the Watts team is doing with CRN ratings of USHCN stations, but I was involved in a statistical analysis, where RomanM did all the heavy lifting, using the partially completed station evaluations and the historical station data from USHCN that were posted online at that point in time. My hope is that the Watts team will have the complete or nearly complete data scrutinized and analyzed by competent statisticians. I judge that the data has a major story to tell, and it is an analysis that most independent investigators (and the interested series owners) do not make the effort, or have the wherewithal, to carry out. It would be a damn shame not to publish an analysis of it – and to instead continue the silly bickering over it that I currently see.

    http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

  51. Well, it looks like an institute in Russia has just pointed a finger at CRU and stated that CRU probably cooked the books. It seems that CRU only used 25% of the Russian data, and the stations used just happen to be in urban areas. Damn, we must get the CDC looking for that disease that keeps killing off rural thermometers. ICECAP has it here; they got it from a Russian paper:

    http://icecap.us/index.php/go/new-and-cool

  52. #57. The data is only good down to a granularity of 1 degree F, +/- 1F (about +/- 0.5C), at the time of physical measurement per the NCDC docs. Any further granularity is a result of averaging that implies greater precision than exists in the data. If the first step is in 1-degree increments, you can't claim to detect 1/10th-of-a-degree changes.

    #58 I agree that link is mandatory reading. Also the GISS description of their processes for what they do after NCDC hands them the data. The ChiefIO site has great links to follow this education path.

    I have read and studied the NCDC and GISS docs in detail. A careful reading shows the cumulative error budget to be bigger than one would discern from their summary presentations. Their method of declaring a detected warming signal is at best highly questionable, if not statistically impossible. The exact errors they are introducing are not readily discernible from the published docs. The example NCDC gives of their own homogenization process shows a +/- 0.9C error introduced, but fails to state whether this includes the +/- 0.5C inherent in the measurement. I do not think it does, but cannot prove it yet.

    Maybe we on the outside can reverse engineer it, but as a matter of basic scientific integrity they should be publishing it in the clear. The goal then is to continually narrow it by refining their methods. This is the way signal-processing technology advances; it is why wireless home phones and cell phones used to totally suck but get better every year.

    However, I don't think we on the outside can expect to reverse engineer the full exact answer even if we know the form of the answer. Just processing the data, as you and Roman did, gives us hints of the biases being introduced and is incredibly useful information. However, pinning down the exact breadth of the errors being introduced would require a significant set of well-controlled field studies involving actual field measurements and comparing them to what the algorithms do. If studies exist that actually test their methods for reliability and define an error budget for each step, I have yet to find them. They may be there, but it's not obvious.

    The burden of proof should be on them to show this claimed warming signal can even be detected given the known issues with data set and how it is processed. This is fundamental. The emails showed at least some of them really struggled with underplaying this (see AJ Strata’s comments on Briffa as a leak candidate).

    I know I'm being repetitive, but this to me is more important than the disturbing bias they appear to be adding to the data; I was researching that bias when I ran into the error budget. I would have gotten an F in any engineering coursework for making such claims of bogus accuracy. It's one of the most fundamental things they teach us in engineering, because if you ignore it the products do not work. It is THE challenge in making communications gear go faster with fewer errors. In that case, we know a lot about what data to expect and can use higher-level protocols to reconstruct or resend data that is corrupted. In the case of climate, they don't have a copy of the true signal, so it's more like the old analog phones, not the new digital ones.

    It’s obvious to me because I’ve been around signal processing since the 1980s. They cannot detect the signal with the granularity or confidence they claim to be detecting from this data set.

    I'm confident one of our statistical gurus can publish a general proof of the reliability of the claim.

    The form of this proof would be:
    For a given set of data at a point X with a reliability of +/- Xe,
    the probability that you have detected a deviation of size Z that has a reliability of +/-Ze from X is:
    A% when Z is less than Y
    B% when Z is between Y and 2Y
    C% when Z is between 2Y and 3Y etc.

    Really it is a continuous function. The curve will be instructive for all. I’m hoping to incent one of the many stats gurus to whip this out.

    Then as we learn the error budget Xe we find the answer to what is the smallest warming signal they could detect at what confidence level.
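
    As a rough Monte Carlo sketch in R of the kind of curve being asked for (the uniform +/- 0.5C error, the simple "any observed rise counts" detection rule, and the function name are illustrative assumptions, not anyone's published method):

    detect.prob <- function(Z, Xe = 0.5, n = 1e5) {
      old <- runif(n, -Xe, Xe)            # error on the earlier reading
      new <- Z + runif(n, -Xe, Xe)        # true shift Z plus error on the later reading
      mean((new - old) > 0)               # fraction of trials showing any rise at all
    }
    shifts <- seq(0, 2, by = 0.25)
    cbind(shift = shifts, p.rise = sapply(shifts, detect.prob))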

  53. #60, Thanks; I hadn't got round to reading the metadata yet. I was thinking the temperatures were in degrees Kelvin, but looking at them more carefully they obviously aren't. How stupid of me!

  54. Kenneth Fritsch #58

    Thanks for the link. It took me to a page that described replacing suspect data with an average of data from nearby stations. The criteria for such replacement weren't explained; it could have been software detecting points too far from the mean, or someone eyeballing outliers.
    The discussion concerned the US Historical Climate Network. I assume they treated the Global HCN data the same way. They did not say that the raw data was still available. That is a bad sign; it makes me think it isn't, which would imply that the v2.mean file has been "adjusted".

  55. Stan, I read the thread and comments, thanks.

    I’ll put in Lucia’s last comment on the thread as it summarizes things nicely.

    My main point is accuracy; Lucia's is about precision from rounding. Technically Lucia is right that you can report averages of whole numbers to a tenth of a degree. It can be done if the accuracy information is carried forward with the data. I should be better with my wording on precision vs. accuracy and will be going forward. So thanks; I did blur the two in my posts.

    I can average 12 and 15 to get 13.5. This appears to be the point of the post, and Lucia is right on that point. Lucia explicitly excludes the accuracy question, as they are separate issues.

    However, Lucia also makes my central and much more important point: "Averaging rounded values doesn't improve accuracy. If values are inaccurate, averaging can't fix that." I.e., you cannot infer an accuracy of +/- 0.1 now; you must still carry the +/- 0.5 the original numbers had.

    For example, if the numbers are 12 +/- 0.5 and 15 +/- 0.5, then the actual average is somewhere between:
    (11.5 + 14.5)/2 = 13
    (12+15)/2 = 13.5
    (12.5 + 15.5)/2 = 14

    The range is 13.5 +/- .5 for a range of 13 to 14.
    You cannot say it is 13.5 +/- .1 for a range of 13.4 to 13.6

    Now suppose I claim to have detected a 0.6C rise in temps (which is what the GISS site claims) vs. the central-point average of 13.5 (I made this number up for the example; it could be any number and the example still works). At a minimum, assuming no other errors are introduced, you must carry into that the +/- 0.5C error range in both numbers that was introduced at the time of measurement (per NCDC).

    The real answer could then be:

    A rise of 1.6 C (take .6C + .5 error for 14.6C vs 13.5C – .5C error for 13C).
    A drop of .4C (take .6C – .5 error for 13.6C vs 13.5C + .5C error for 14C).

    Or anywhere in between. And this is just the raw measurement; NCDC and GISS clearly introduce other errors of still-to-be-determined magnitude. NCDC's docs show around +/- 0.9C for their Reno example.

    My central point remains the cumulative error budget needs to be shown and reviewed vs the claimed signal detected because it seems too big to support their claims. This is just on the fragments I can find. +/-.5C on the measurements, then the NCDC changes with the example with error bars for the adjustments they do, then the GISS changes.

    Here is the comment I am referring to:

    lucia (Comment#28251)
    December 16th, 2009 at 1:08 pm

    blueice2hotesea– Averaging rounded values doesn’t improve accuracy. If values are inaccurate, averaging can’t fix that. Fixing inaccuracies relative to what you really want to measure is the goal of “homogenization” and that’s what Briggs is discussing.

    Averaging numbers can result in an averaged quantity with better precision than the individual underlying measurement. This is true whether or not we round. In the case on monthly surface temperature anomalies, averaging a whole bunch of numbers rounded to 1F can result in a monthly average anomaly with a precision that is better than 1F.
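
    A quick R simulation of that distinction (all numbers made up): averaging many values rounded to the nearest whole degree recovers the true mean closely, so precision improves, but a fixed instrument bias passes straight through the average, so accuracy does not.

    set.seed(42)
    true.temps <- rnorm(1000, mean = 13.7, sd = 3)   # hypothetical true values
    rounded    <- round(true.temps)                  # reported to whole degrees
    biased     <- round(true.temps + 0.4)            # same, plus a 0.4C instrument bias
    c(true.mean    = mean(true.temps),
      rounded.mean = mean(rounded),                  # lands close to the true mean
      biased.mean  = mean(biased))                   # still off by about 0.4C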

  56. #67
    The devil is in the details of what they do at each step. A process could leave a result +/- a percentage, or add more +/- degrees C (the impact of 1% of 8C is very different from 1% of 25C, and different again from a process that introduces +/- 0.1C to any number). One way or another they build on that initial raw observation error of +/- 0.5C.

    I don’t envy what they are trying to do.

    They get a +/- .5C raw number with a bunch of random other issues that may or may not make the error range significantly worse. They must address these unknowns. But the price they must pay is to have uncertainty being introduced by the very processes being used to remove other errors like site changes, UHI, equipment changes, etc. It is just the nature of the beast. Nothing nefarious there, but they need to declare how much they can best estimate the net error result is for each step.

    So I get what they are trying to accomplish and the fact that they need to try to clean the data up. I just want clarity on how accurate the claimed result is or is not. I do question what the second homogenization is actually doing to the curves. (Is that like ultra-pasteurized?)

    That a bunch of science people apparently got pressured into declaring greater certainty than actually exists is not a surprise. Management pressures engineering teams to commit to impossible schedules and costs all the time; then reality eventually strikes and bad managers try to blame the engineers when both share the blame. A lack of open process between management and the technical people is typically at the root of the overcommitment. Frankly, I feel for the majority who probably got sucked into this by what appears to be a few overzealous advocates, like the CRU crew, the hockey team, and Hansen. It sure looks like the accuracy of all the models (backward looking, forward looking, and the 20th-century data) was overstated, and there was pressure to do so.

    We need good, open climate science, and scientists. I’m confident the vast majority will be happier when this all settles out. If the CO2 AGW crowd is right, fine, good open science will withstand all skeptics once the zealots are gone.

  57. Here’s a graph of 30 matching US stations with no data points missing in either adjusted or raw GHCN during the 1921-2005 time period. There was a purge of mostly US stations in April 2006.
    With no data needing interpolation, the 85-year trends are raw = 0.06C and adjusted = 0.24C.

    Overall, 38% of the data was adjusted downward, 19% adjusted upward, and 43% unadjusted. Sounds good; almost as if they might be adjusting for UHI. But the devil is in the details: the heaviest downward adjustments were early in the time period, when UHI can be assumed to be less.

  58. Ryan O and Kenneth, I’ve posted the TOBS paper a couple of times; if you have trouble finding it just holler. 1986.

  59. I like those extra-long records. Can’t we just treat them like a tree ring and reconstruct the global temp, or run RegEM and fill in the rest of the planet? hehe

  60. Page 158 from the TOBS doc

    “The standard error of prediction for the mean temperature varies from approximately 0.15-0.20C around sunrise to approximately 0.10-0.20C during the afternoon.”

    This error may be somewhat different in character from the +/- 0.5C, in that the probabilities are not spread equally across the range but are likely Gaussian. The original +/- 0.5C is spread equally.

    So for raw data subject to TOB corrections by NCDC, the error range is now somewhere around +/- 0.6C to +/- 0.7C. But the different types of errors must be tracked, that is, if they are following this method, and it is not apparent how many raw data observations are subject to this correction. It could be 5%, it could be 95%, but that should be explicitly known by NCDC.

    It’s probably worth looking at Vose et al., 2003 as the NCDC references this in addition to Karl for TOB. They may have an updated view of the introduced error.

    The good thing is that they only make a handful of touches for which the error needs to be accounted, so tracking the error ranges and their nature is not impossible.
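
    A purely illustrative R sketch of how those two kinds of error might combine, assuming a uniform +/- 0.5C observation error and a Gaussian TOB-adjustment error with sd 0.15C (the worst-case linear sum of the bounds is the roughly +/- 0.65C quoted above; the bulk of the combined distribution is somewhat narrower):

    set.seed(1)
    n       <- 1e5
    obs.err <- runif(n, -0.5, 0.5)       # uniform measurement/rounding error
    tob.err <- rnorm(n, 0, 0.15)         # Gaussian TOB-adjustment error (assumed sd)
    total   <- obs.err + tob.err
    quantile(total, c(0.025, 0.975))     # ~95% spread of the combined error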

  61. All,

    I’ve had a notion simmering on the back burner for a while about how to reasonably accommodate inconsistencies in the spatial and temporal distribution of observations. One of the things that really churns my stomach is this notion of infilling. Part of the problem I see is in trying to patch together anomaly values from nearby adjacent stations and stitch them into one record. It occurs to me that rather than working with anomalized data, one can work with bona fide temperatures instead.

    Rather than anomalizing first, then gridding, I think one would be more justified in gridding first, then removing seasonal influence by anomalizing the grid. This would allow you to use partial or incomplete records without having to manufacture data based on a daisy-chain of assumptions. Does anyone know if this is done anywhere?

    A benefit of gridding first is you can generate metrics on the spatial uncertainties. Kriging is one method that does this, I am sure there are others.

    A huge upside to a grid-first process is that you don’t have to throw out ANY stations due to short record length.

    I suppose I haven’t explained myself very well, but on the off chance I have, does anyone see any fundamental issues with the grid-first concept?
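
    A minimal R sketch of the grid-first idea, assuming a data frame of station records with columns lat, lon, year, month and temp (actual temperatures); the column names and the 5-degree cell size are illustrative assumptions:

    # Step 1: grid first - average actual temperatures per cell, year and month
    stations$cell <- paste(floor(stations$lat / 5), floor(stations$lon / 5))
    cell.means <- aggregate(temp ~ cell + year + month, data = stations, FUN = mean)

    # Step 2: anomalize the grid - subtract each cell's own monthly climatology
    clim <- aggregate(temp ~ cell + month, data = cell.means, FUN = mean)
    names(clim)[3] <- "clim"
    cell.anom <- merge(cell.means, clim, by = c("cell", "month"))
    cell.anom$anom <- cell.anom$temp - cell.anom$clim

    Because the climatology is computed per cell rather than per station, short or partial station records contribute without any infilling.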

  62. #78 Earle,

    I’ve not seen it done that way but it suffers the same problems as splicing in anomaly data. Consider that new stations will be at different altitudes and surrounding local conditions. The values of even close proximity stations could be different by a half degree quite easily. If this offset in the fixed value isn’t taken into account when new stations are introduced, results like the above would probably be common.

  63. #79 Jeff,

    True. But since the result will be a globally averaged temperature anomaly, there will be smearing of discontinuities whatever the method. The benefit in my mind is to actually quantify the error introduced by those near station discontinuities.

    The alternative of homogenizing the data could be more meaningful if we had a handle on all the changes at each individual station and could justify every adjustment. Given the quality of the current observation network and the quality of the metadata for that network, any adjustments reflect wishful thinking rather than actually reducing uncertainty.

    I suppose my assumption is that there will be sufficient numbers of stations dropping into and out of existence that a given grid cell will be adequately sampled.

    Now consider, if you will, that you can quantify the uncertainty of a given station (a la the CRN rating system, e.g.) and can feed that uncertainty into your gridding algorithm. So you could have one station with a mean temp of 12.5C +/- 1.0C and a nearby station with a mean temp of 12.9C +/- 3.0C. The calculated temp for the area would take both the temperatures and the uncertainties into account, and the resultant uncertainty in the gridded value would reflect the observation uncertainty as well as the uncertainty due to the variation in the data.
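
    A standard way to do exactly that is inverse-variance weighting; a tiny R sketch using the two hypothetical stations above, treating the +/- values as one-sigma uncertainties:

    temps <- c(12.5, 12.9)
    sigma <- c(1.0, 3.0)
    w     <- 1 / sigma^2                      # weight each station by 1/variance
    cell.temp  <- sum(w * temps) / sum(w)     # uncertainty-weighted cell temperature
    cell.sigma <- sqrt(1 / sum(w))            # propagated uncertainty of that estimate
    c(cell.temp, cell.sigma)                  # about 12.54 +/- 0.95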

    An upside to this approach is that the scientific literature has been filled for decades with methods to address this sort of spatial observation and uncertainty. Developing something along these lines would avoid the ‘not invented here’ mindset that seems to pervade climate science statistics.

  64. I don’t know what this means, but it doesn’t look good.
    I looked at four data sources for Lismore, Australia (w/s no. 058037): the BOM raw data (the weather-station data), the BOM Annual Mean Temp graph on the high-quality climate-site data page, NASA gistemp Stations (combining sources at the same location), and NASA data (after homogeneity adjustment).
    Some examples below; the column headings are as mentioned above.
    Year   Raw mean   BOM AMT (graph)   NASA (cs@sl)   NASA (aha)
    1910   19.45      18.5              19.38          18.88
    1915   20.45      19.7              20.67          20.17
    1940   19.8       19.3              19.82          19.32
    1960   18.95      18.4              19.12          19.62
    1980   20         19.6              20.27          19.88
    1992   19.15      19.2              19.16          19.15   (The NASA records end at 1992.)
    Notice that the later the time period, the closer the ‘adjusted’ values are to the raw data, both for the BOM graph and the NASA data.
    e.g. 1910: raw data mean of 19.45 – NASA after adjustment 18.88 – down 0.57C
    1992: raw data mean of 19.15 – NASA after adjustment 19.15 – no change

    I don’t know if this is just this station, but the effect is to make the anomaly graph look as if there has been a steady rise in mean temps for this site, when in fact there hasn’t been as much of one.

  65. Earle,
    I like the general drift. We are better off seeing the patchwork of info as it is, instead of a “smeared” local view that rapidly diverges from reality.

    Low-quality, low-coverage areas will stand out, and people can then look for real solutions to get real data for them. We may just have to accept that for some areas, by current methods, we don’t know and won’t ever know the temperature history. The more I look at the steps (raw, TOB, NCDC homogenization, interpolation of missing station data, removal of some stations by GISS, a second seemingly overlapping GISS homogenization, then another round of merging and adjusting on a global basis as it is fed into the grids), the uglier it gets. The merge is necessary to roll it up; the repeated adjustments are the issue.

    Some of this is clearly necessary (TOB, something to account for site attributes for example). But the general idea of working with the data as it is rather than creating these spliced and estimated continuous records is good. If the data sucks for a region, then it is better to know that and look for ways to improve it. Algorithmically faking it without making it fully visible while ignoring the measurement and introduced error ranges is just broken.

  66. Steven #5,
    I don’t think anyone has said Europe has not warmed. GISS, at one time, had 1934 as the warmest year on record in the US – meaning US warming has not been significant. We know Antarctic warming is also insignificant based on Ryan’s tiles post. And we know Africa has not seen any significant warming in the 20th century. According to scientists in Russia, Russian/Siberian warming seems to be the result of CRU selecting stations which show warming and ignoring others. Douglas Keenan has alleged that claims of Chinese warming are fabricated (and/or the data is fabricated).

    Is European warming really global warming? Really?

  67. I’m new to this whole field of climate studies, although I have been intensely following the Climategate thing since its beginning. Anyway, please forgive me if my ignorance is showing. My question is this: Why is anyone spending all this time and energy “properly” computing land-based temps when we have, supposedly, more reliable balloon and space-based data? Why not focus on what we can trust?


