the Air Vent

Because the world needs another opinion

GHCN Adjustments – Statpad

Posted by Jeff Id on December 12, 2009

Roman M has a blog. We haven’t heard much about it, because he’s only put a couple of posts up. It’s an odd coincidence that the ‘hackers’ knew enough about CA to put the same link on Roman’s blog that they left here. It’s a bit of an arcane detail that not many of us know.

Apparently like myself, Roman has spent the day buried in GHCN data and code.

This post is worth checking out: GHCN and Adjustment Trends


12 Responses to “GHCN Adjustments – Statpad”

  1. Green RD Mgr said

    Interesting. I too have spent recent days looking at how NCDC, GISS, etc. adjust the data. (Thanks to those who have guided me to sources.)

    I suggest plotting NCDC raw vs. adjusted. It seems to show no curve shaping; the differences look fairly random, as you would expect if they are adjusting for local changes. NCDC also does a decent job of explaining what it does and why: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly

    I cannot reconcile the GISS pre-homogenized data (which I had thought was raw) with NCDC raw. They produce very similar curves, slopes and ramps, but are offset significantly. I still do not understand what logic drives the GISS homogenization, which is more aggressive.

    The next thing to look at is how what you plotted applies to individual stations. A station that shows a decline is shaped very differently from one that shows a temperature ramp, and the resulting overall characteristics may be instructive. From what I have seen, the GISS processing seems to drive any individual station’s data toward the overall results in this post.

    It’s too early to reach any conclusions, but the data I have looked at is consistent with this theme. Still searching for data and answers….

  2. boballab said

    That is because the GISS “raw” is not really the raw data. When you go to the GISS site it gives you 3 choices.

    Choice #1 states: “raw GHCN data + USHCN Corrections”

    That means what GISS uses as raw for US stations is the adjusted data from GHCN. When you pull up the list of stations under option 1, e.g. Washington/na, you see there are multiple stations covering different years. When you look at the plot for one of them, you will see that the other stations are plotted on the same graph, but the data is for just that one station.

    Back at the “Selection of type of data” screen, the second option is where you can get the combined data for our Washington/na example: the data set in which all the stations in that area are combined into one. That means station adjustments should already be built into it.

    Then, of course, there is the third option: the GISS “homogenized” data.

    So basically you can’t get raw data from GISS for US sites, because what they get from NCDC is GHCN with the adjustments already in it. That is not GISS’s fault. USHCNv2 adjustments should not have been applied to foreign data, so the “raw” GHCN data GISS has for foreign sites should be closer to raw, but then again we don’t know how that data was handled before NOAA and NCDC got their hands on it. In short, GISS can only work with what it gets, and for US sites it gets pre-adjusted data.

  3. Mesa said

    Yes, Roman M has a very interesting post indeed.

    I argued at RC today that the original graph from GG was meaningless without the time dimension, and got a lot of obnoxious comments about what an idiot I was. I’m sure they will now switch to declaring that they knew this adjustment profile was the case all along, or to some other hindquarter-covering rhetorical tactic.

  4. TerryMN said

    Steig was certainly quick to declare the “science is settled” based on that histogram. Alas, his rigor and robustness have been found lacking yet again…

  5. hpx83 said

    It seems that more people are looking into the same things. I checked the mean annual adjustments a couple of weeks ago (http://savecapitalism.wordpress.com/2009/12/02/ghcn-database-adjustments/) and came up with a similar-looking graph.

  6. Nick Stokes said

    I verified Giorgio’s calculations. The histogram is here. I get the same mean, 0.0175 deg C/decade, and standard deviation, 0.189 deg C/decade. I tried to post the code at Giorgio’s site as a comment, but R seems to trigger his spam filter. Anyway, here it is. I edited the files slightly, replacing -9999 with NA and separating the station number from the year.

    #### A program written by Nick Stokes, 13 Dec 2009, to calculate the changes to regression
    # slopes caused by adjustments to the GHCN temperatures (v2.mean_adj - v2.mean)

    # A function to calculate the least-squares regression slope of v on x. I hope it is faster than lm().
    # Only x needs centring: since sum(m) is zero, the mean of v drops out.
    slope<-function(v,x){
      m=x-mean(x)
      sum(v*m)/sum(m*m)
    }

    #####################
    # read data from v2.mean and v2.mean_adj, downloaded from http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/
    # I edited (emacs) to put a blank between the station number and year, and to change -9999 to NA (add .txt)

    # Read in data from the files in matrix form: station, year, 12 monthly values (tenths of deg C)
    if(TRUE){ # change to FALSE after you have read in the files once
      vmean <- matrix(scan("v2.mean.txt", 0, skip=0, na.strings="NA"), ncol=14, byrow=TRUE)
      vmean_adj <- matrix(scan("v2.mean_adj.txt", 0, skip=0, na.strings="NA"), ncol=14, byrow=TRUE)
      # Now, to save time, move to annual averages (columns: station, year, annual mean)
      vmean_ann=vmean[,1:3]
      vmean_ann[,3]=rowMeans(vmean[,3:14], na.rm=TRUE)
      vmean_ann_adj=vmean_adj[,1:3]
      vmean_ann_adj[,3]=rowMeans(vmean_adj[,3:14], na.rm=TRUE)
    }

    # Initialise
    vv=rep(0.,200) # regression y vector (R extends it if a station has more years)
    jj=rep(0,200)  # regression x vector
    grad=rep(NA,length(unique(vmean_ann[,1]))) # gradients (the output result), one per station

    len=nrow(vmean_ann)
    jmax=nrow(vmean_ann_adj)

    j=1
    k=0
    kk=0
    m=0
    # counters. j is row of adjusted file. m is station counter
    # k,kk are local row (year) counters for station m. k skips NA's, kk doesn't
    # (kk counts rows, so a year missing from v2.mean is not allowed for)

    # loop over all rows in v2.mean; both files must be sorted by station, then year
    for(i in 1:len){
      kk=kk+1
      # If the adjusted row is behind the unadjusted one (earlier station, or same
      # station and earlier year), advance it
      while(j<jmax && (vmean_ann_adj[j,1]<vmean_ann[i,1] ||
            (vmean_ann_adj[j,1]==vmean_ann[i,1] && vmean_ann_adj[j,2]<vmean_ann[i,2]))) j=j+1
      # to find matching rows, check diff between stat nos and years
      u=vmean_ann_adj[j,]-vmean_ann[i,]
      # If we have a match, add to regression vec vv[] (don't add if NA)
      if(u[1]==0 && u[2]==0 && !is.na(u[3])){
        k=k+1      # local adjusted counter
        jj[k]=kk   # x for regression
        vv[k]=u[3] # discrepancies for regression
      }
      # At the last row of each station, compute the slope and reset the local counters
      if(i==len || vmean_ann[i+1,1]!=vmean_ann[i,1]){
        if(k>1){   # a slope needs at least two points
          m=m+1    # m is station counter
          grad[m]=slope(vv[1:k],jj[1:k]) # tenths of deg C per year = deg C per decade
        }
        k=0 # zero local counters
        kk=0
      }
    }
    # Now prepare histogram. Comment out jpeg() and dev.off() to get screen graphics
    jpeg("GHCNAdjustments.jpg")
    hist(grad[1:m],nclass=200,xlab="degrees C/decade",main="GHCN adjustment change to trend") # draw histogram
    dev.off()
    print(c(mean=mean(grad[1:m]), sd=sd(grad[1:m]))) # mean and standard deviation of slope change
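For readers who don’t use R, here is a rough Python sketch of the same calculation; it is not Nick’s code. It assumes the fixed-width GHCN v2 layout (a 12-character station-plus-duplicate ID, a 4-digit year, then twelve 5-character monthly values in tenths of a degree C, with -9999 for missing), so no hand editing of the files is needed; check that layout against the GHCN v2 README before relying on it. Unlike the R script, it matches stations and years by dictionary lookup and uses the calendar year as x. All function names are made up for illustration.

```python
import statistics

MISSING = -9999  # assumed GHCN v2 missing-value flag


def parse_v2_line(line):
    """Split one fixed-width v2.mean line into (station_id, year, annual_mean).

    Assumed layout: 12-char station+duplicate ID, 4-digit year, then twelve
    5-char monthly values in tenths of a degree C. The annual mean is taken
    over the months present, or None when every month is missing.
    """
    station = line[0:12]
    year = int(line[12:16])
    months = [int(line[16 + 5 * i:21 + 5 * i]) for i in range(12)]
    present = [v for v in months if v != MISSING]
    mean = sum(present) / len(present) if present else None
    return station, year, mean


def annual_table(lines):
    """Map station -> {year: annual mean} for every line with some data."""
    table = {}
    for line in lines:
        if not line.strip():
            continue
        station, year, mean = parse_v2_line(line)
        if mean is not None:
            table.setdefault(station, {})[year] = mean
    return table


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    xbar = sum(xs) / len(xs)
    dx = [x - xbar for x in xs]
    return sum(d * y for d, y in zip(dx, ys)) / sum(d * d for d in dx)


def adjustment_trends(raw_lines, adj_lines):
    """Per-station slope of (adjusted - raw) annual means against year.

    With values in tenths of a degree C and x in years, a slope in tenths
    of a degree per year is numerically the same as degrees C per decade.
    """
    raw = annual_table(raw_lines)
    adj = annual_table(adj_lines)
    trends = {}
    for station, adj_years in adj.items():
        raw_years = raw.get(station, {})
        years = sorted(set(adj_years) & set(raw_years))
        if len(years) < 2:
            continue  # a slope needs at least two common years
        diffs = [adj_years[y] - raw_years[y] for y in years]
        trends[station] = slope(years, diffs)
    return trends


def summarize(trends):
    """Mean and sample standard deviation of the per-station slopes."""
    vals = list(trends.values())
    return statistics.mean(vals), statistics.stdev(vals)
```

Passing the lines of v2.mean and v2.mean_adj to `adjustment_trends` and the result to `summarize` gives the mean and standard deviation of the per-station slope changes in deg C/decade.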

  7. Looking at gg’s blog page was interesting. Seeing Roman’s take was even more interesting.

    But this still leaves the urban heat island issue hugely unresolved AFAICT. Patterson 2003 (?) is surely seriously challenged by a lot of people, not least Michaels and McKitrick.

    There are also comments about debunking Watts’ surfacestations work and Eschenbach’s work… but at least it seems gg allows posters from both sides of the debate. To date.

  8. Green RD Mgr said

    Boballab,

    Thanks, that fits with my understanding now.
    However, when I pick a US station and plot the NCDC raw, the NCDC adjusted and the GISS option 1 data, none of them match. That continues to be my issue. The basic characteristics (wave shape, slopes, etc.) are very, very similar, but NCDC adjusted and GISS option 1 are almost a degree apart. So I’m thinking there may be another intermediate step I’m missing somewhere?

    I’m confident NCDC gets the raw. Their docs state they adjust it for time of day, site,etc. A look at their raw and the adj data sets matches with this.

    They also have a nice paper describing their homogenization process.

    As an aside, while their described homogenization process seems well targeted at identifying undocumented station changes and extraneous data events, it also contains the seeds of polluting (infecting) other stations’ data, since it appears to be a highly recursive automated process in which homogenized data is then reused to homogenize other data. So while it addresses one issue, it may create another and drive a higher level of uncertainty into the broader data set. Could be a nice thesis for someone…

    But back on track:

    Then, apparently, when NCDC is done with its adjustments, this all goes to its customers, including GISS. But like I said, I’m still trying to confirm with a data match that this is the source of the GISS pre-homogenized data, or whether there is another source or intermediary.

    Did I miss something here?

    Then GISS apparently homogenizes the data again, regardless of how many times it has already been adjusted! The GISS homogenization clearly reshapes the curves for no stated legitimate reason. That is the process I’m trying to understand, especially if their data supplier has already homogenized the data.

    Following the trail back on the GISS data from them to raw was supposed to be simple data integrity diligence that would take little time…:-). I feel for Harry the programmer…

    What I see so far is raw data, twice (maybe thrice) baked, then a claim of detected trends that are smaller than the changes made. I would throw a technical team out of an operations review if they presented this and asked me to allocate funds/people to a project based on such a story.

    It could all be legit; frankly, the only one that really bothers me is the final GISS curve reshaping. Regardless, the process needs to be open and independently reproducible given the stakes.

    Perhaps they need to accept that with current methods the error range of the data and their processes create noise that exceeds the signal they are claiming to detect. But that would pretty much undermine the whole story, so not likely…;-).

  9. Nick Stokes said

    I reproduced Romanm’s plot with a bit of a hack to the R script. It matched. But it seems entirely consistent with GG’s result. If you fit a line to the 1905-2005 section of Romanm’s plot, the slope is 0.023 C/decade. If you go back further, the slope diminishes. GG’s mean slope (and mine) was 0.0175 C/decade.
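Nick’s check amounts to an ordinary least-squares fit over a chosen span of years. A minimal sketch, assuming Roman’s curve is available as a mapping from year to mean adjustment in deg C (the function name and this data layout are hypothetical):

```python
def windowed_trend(series, start, end):
    """OLS slope of series (year -> value in deg C) over start..end inclusive.

    Returns deg C per decade (the per-year slope times 10). Years whose value
    is None are skipped.
    """
    pts = [(y, v) for y, v in series.items()
           if start <= y <= end and v is not None]
    if len(pts) < 2:
        raise ValueError("need at least two years in the window")
    n = len(pts)
    xbar = sum(y for y, _ in pts) / n
    vbar = sum(v for _, v in pts) / n
    sxy = sum((y - xbar) * (v - vbar) for y, v in pts)
    sxx = sum((y - xbar) ** 2 for y, _ in pts)
    return 10.0 * sxy / sxx
```

Holding the end year at 2005 and moving the start year earlier (1905, 1880, and so on) shows how the fitted slope changes as the window is extended back.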

  10. Green RD Mgr said

    Nick,
    Saw your post on WUWT on data handling.

    Can you point me to sources that describe how GISS undoes the NCDC adjustments and then redoes them? Your exchange with EM Smith explains why I could not reconcile GISS pre homogenized vs NCDC adjusted for the same stations. They look similar, but with significant offset.

  11. Nick Stokes said

    #10 Green
    It was EM Smith who said that GISS undoes the USHCN adjustments and redoes them, and I believe that is so. There is a file called gistemp.txt that comes with the gistemp release and explains the various steps. This paragraph may be relevant to your offset issue:
    Replacing USHCN-unmodified by USHCN-corrected data:
    The reports were converted from F to C and reformatted; data marked as being
    filled in using interpolation methods were removed. USHCN-IDs were replaced
    by the corresponding GHCN-ID. The latest common 10 years for each station
    were used to compare corrected and uncorrected data. The offset obtained in
    way was subtracted from the corrected USHCN reports to match any new incoming
    GHCN reports for that station (GHCN reports are updated monthly; in the past,
    USHCN data used to lag by 1-5 years).
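One reading of the quoted step, sketched in Python under stated assumptions: both records are mappings from year to annual mean in deg C (GISTEMP itself may work differently, e.g. on monthly values), the offset is the mean of corrected minus uncorrected over the latest 10 years the two have in common, and it is subtracted from the whole corrected record. The function name is hypothetical. The effect is that the shifted record agrees with GHCN on average over recent years, so any difference the adjustments introduce shows up in the earlier part of the record instead.

```python
def align_corrected(corrected, uncorrected, n_years=10):
    """Shift a corrected record so it matches the uncorrected one, on average,
    over their latest n_years common years.

    Both arguments map year -> annual mean (deg C). Returns the shifted
    corrected record (all of its years) and the offset that was subtracted.
    """
    common = sorted(set(corrected) & set(uncorrected))
    if not common:
        raise ValueError("records have no years in common")
    recent = common[-n_years:]  # the latest common years
    offset = sum(corrected[y] - uncorrected[y] for y in recent) / len(recent)
    shifted = {y: v - offset for y, v in corrected.items()}
    return shifted, offset
```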

  12. Green R&D Mgr said

    Nick,
    Thanks for the pointer.
