the Air Vent

Because the world needs another opinion

Auto-matic Correlation

Posted by Jeff Id on March 27, 2009

This post has an error.  The calculation inadvertently used temperature rather than anomaly.

Thanks to Steve McIntyre at Climate Audit and comments from Hu McCulloch at Climate Audit for quickly spotting this error and bringing it to our attention.  I think this points out pretty well that accusations of cherry picking or playing favorites on Climate Audit aren’t reasonable.  Problems get chopped up and spit out regardless of the source or meaning.

The two Jeffs

UPDATE This is the corrected graph by Jeff C which has reasonable correlation vs distance.
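For anyone wondering why using temperature instead of anomaly matters so much, here is a minimal sketch on synthetic data. It is not the Steig or NSIDC data. The seasonal amplitude, noise level, and the helper name `to_anomaly` are assumptions made up for illustration. Two cells with completely independent noise still correlate strongly if they share a seasonal cycle, and the correlation collapses once each calendar month's mean is removed.

```r
# Synthetic illustration only: two cells with independent noise
# but a shared seasonal cycle (all numbers below are assumed)
set.seed(1)
n_years <- 25                                   # 1982-2006 is 25 years
month   <- rep(1:12, times = n_years)           # calendar month of each sample
season  <- 15 * cos(2 * pi * (month - 1) / 12)  # assumed seasonal swing, deg C

temp_a <- -30 + season + rnorm(length(month), sd = 3)
temp_b <- -45 + season + rnorm(length(month), sd = 3)

# Anomaly = value minus the mean of that calendar month over all years
to_anomaly <- function(temp, month) temp - ave(temp, month)

cor_temp <- cor(temp_a, temp_b)                 # dominated by the shared cycle
cor_anom <- cor(to_anomaly(temp_a, month),
                to_anomaly(temp_b, month))      # what the noise alone gives

print(c(temperature = cor_temp, anomaly = cor_anom))
```

The shared annual cycle is the only thing linking the two series, so a distance correlation plot built from temperatures mostly measures the seasons rather than the weather.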


We know there’s a problem.  It’s a question of where and how much.

Jeff Id


A guest post by Jeff C.

Jeff C has written a short post for us which I have independently verified with my own separately written code. I think you’ll be surprised.


Previously we have looked at scatter plots to get an understanding of how well correlated the data appears over distance. Below is a scatter plot of the raw Antarctic surface data. This contains no infilling, just actual measured data from occupied surface stations.


This plot above was originally calculated by Steve McIntyre. Note how the correlation is virtually 1 at 0 km, with a gradual decay as distance increases. This is what we would expect to see as stations closer together should have better correlated climate than stations far apart.
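The distance correlation scatter described above can be sketched roughly as follows. This is a self-contained illustration on made-up station locations and random placeholder anomalies, so it will not show the decay with distance that real data does. The function name `gc_dist` and all variable names are hypothetical and are not taken from Steve's code.

```r
# Haversine great-circle distance in km; inputs in degrees
gc_dist <- function(lat1, lon1, lat2, lon2, R = 6372.795) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))   # pmin guards against rounding above 1
}

set.seed(7)
n_st <- 30                          # assumed number of stations
n_t  <- 300                         # assumed number of monthly values
lat  <- runif(n_st, -90, -65)       # made-up Antarctic latitudes
lon  <- runif(n_st, -180, 180)
anom <- matrix(rnorm(n_t * n_st), n_t, n_st)  # placeholder anomalies, rows = months

# Pairwise correlation and pairwise distance between stations
corr <- cor(anom, use = "pairwise.complete.obs")
dist <- outer(seq_len(n_st), seq_len(n_st),
              function(i, j) gc_dist(lat[i], lon[i], lat[j], lon[j]))

# Keep each station pair once (upper triangle, no diagonal)
keep  <- upper.tri(corr)
pairs <- data.frame(dist = dist[keep], corr = corr[keep])
# plot(pairs$dist, pairs$corr, xlab = "Dist (km)", ylab = "Correlation")
```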


This plot, also provided by Steve, is a distance correlation for the satellite era (1982-2006) of the Steig 3 PC reconstruction used in the Nature paper. Note how correlation remains at 1 for some cell pairs at distances out to 3000 km. This seemed suspicious and led many of us to believe that the reduction to 3 PCs had caused a spatial smearing of the data.
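To get a feel for why a reduction to 3 PCs could smear things spatially, here is a toy sketch. It is not Steig's method and uses no real data: the cells are independent random noise by construction, and the truncation is a plain SVD with the rank `k` set to 3. All names are assumptions for illustration.

```r
set.seed(3)
n_t    <- 300    # assumed number of months
n_cell <- 50     # assumed number of grid cells
x  <- matrix(rnorm(n_t * n_cell), n_t, n_cell)  # independent cells
xc <- scale(x, center = TRUE, scale = FALSE)    # remove each cell's mean

# Keep only the first k principal components and rebuild the grid
s <- svd(xc)
k <- 3
recon <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

# Compare typical cell-to-cell correlation before and after truncation
off <- upper.tri(diag(n_cell))
mean_abs_raw   <- mean(abs(cor(x)[off]))
mean_abs_recon <- mean(abs(cor(recon)[off]))
print(c(raw = mean_abs_raw, recon = mean_abs_recon))
```

With only three time series to build every cell from, cells that had nothing to do with each other end up looking related, whatever the distance between them.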


This plot above is of the NSIDC AVHRR data from the University of Wisconsin website that Jeff and I have been processing for the past few weeks. Note that the “cone” is quite a bit wider than the surface data, but the distance correlation looks reasonable. Some of the cell pairs are still rather well-correlated at long distances, but we don’t see the values of 1 we saw on the Steig reconstruction.


This is the shocker. Here is the distance correlation plot for the Steig cloud-masked data released today. This data set has been presented as the satellite data used as the input to the reconstruction. If it were truly “raw” (or minimally processed) satellite data, we would expect to see a plot similar to the NSIDC plot immediately above. Instead, we see that every single data pair has a correlation of greater than 0.5!! Data from the peninsula is highly correlated with data from the East Antarctica coast and the interior despite the surface data showing nothing of the sort.

Why would this data set have such a high cell to cell correlation? I’m speculating here, but Steig talks about “enhanced cloud masking” where daily data points that differ from the climatological mean by more than 10 deg C are considered cloud contaminated and discarded. From my experience with the NSIDC AVHRR data, a huge number of data points would be affected by this threshold, perhaps as much as 50% of all points. If a simplified infilling algorithm was used to replace those points, high correlation might result. Regardless, this plot appears to show that the cloud-masked data set is highly processed and suspect.
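The threshold rule described above might look something like the sketch below. This is only my reading of the description, not the actual Steig processing. The function name `mask_outliers`, the toy values, and the choice to mark discarded points as NA are all assumptions.

```r
# Hypothetical helper: discard values more than `limit` deg C away
# from the climatological mean by setting them to NA
mask_outliers <- function(temp, clim_mean, limit = 10) {
  masked <- temp
  masked[abs(temp - clim_mean) > limit] <- NA
  masked
}

temp <- c(-32, -18, -30, -45, -29)   # made-up daily values, deg C
clim <- rep(-30, 5)                  # made-up climatological mean

masked <- mask_outliers(temp, clim)
frac_discarded <- mean(is.na(masked))  # fraction of points thrown away
print(masked)
```

Whatever then fills the NA gaps decides how much of the original spatial pattern survives, which is why the infilling step is the interesting part.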

When I first ran this plot I thought it must be in error. I checked my code line by line and have repeated the results multiple times. I still find it hard to believe.


Jeff Id

I’ve spent several hours verifying this post and have independently verified the results using my own code. Jeff C’s code used a subset of every 5th value in the grid (due to matrix size R can’t handle the full matrix). My independently written version used a random subset method which was derived from SteveM’s original sat correlation.
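The two subsetting approaches can be sketched as below. The every-5th selection mirrors the `parse` setting in Jeff C's code, while the random selection is only my guess at the general idea, not the actual routine from SteveM's script. The seed and variable names are assumptions.

```r
n_cells <- 5509   # grid cells in the AVHRR data set
parse   <- 5      # keep every nth cell

# Regular subset: every 5th cell
idx_every <- seq(1, n_cells, by = parse)

# Random subset of the same size (assumed approach, seed is arbitrary)
set.seed(42)
idx_random <- sort(sample(n_cells, length(idx_every)))

# Number of cell pairs the correlation scatter has to handle
pairs_full   <- choose(n_cells, 2)
pairs_subset <- choose(length(idx_every), 2)
print(c(full = pairs_full, subset = pairs_subset))
```

Either subset cuts the number of pairs by roughly a factor of 25, which is what makes the scatter plot tractable.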

What it means:

The concept of this paper was to use spatial information to ensure proper weighting and location of individual surface stations across the Antarctic. The surface stations are the lowest noise measurement of atmospheric temperature and show a particular correlation pattern which we can consider “natural” (the first graph). This is the pattern you would expect to see in any data representing Antarctic temperature. The 3rd graph is the NSIDC dataset and represents spatial correlation of the publicly available cloud masked data from the same instruments as processed by the NSIDC. There is a wider spread of the cone angle as compared to surface station data, which is expected due to the increased noise level in the dataset, but the key is that there still is spatial information available. The last graph, however, has correlations pegged at almost 1 for the full width of the dataset, independent of the distance, mountain ranges, peninsula, sea contaminated pixels and the rest.

From my other post which derived the 3 PCs for the reconstruction dataset, this data doesn’t seem to be an exact copy of the original data but it is close. What’s more, we can now make sense of the second to last graph, which is derived from the full reconstruction using 3 PCs as presented by Steig. The data from graph 3 has almost a parallelogram shape because surface station data’s correlation vs distance is copied equally across the entire satellite dataset regardless of actual location.

If you take the surface station points (graph 1) and spread copies of the surface station data across the entire width of the Steig satellite data (graph 4), you get (graph 3).

I’m not in any way saying or in any way implying this was done intentionally but this is just about the perfect dataset to use if you want to weight every station equally and basically average the pre 1982 trends across the entire continent. I thought we were going to have to go through RegEM and do a lot of calculation to find if this was the case — not this time. This is the perfect scenario to blend the high concentration of known warming peninsula stations across an entire continent.


A copy of Jeff C’s R code if you would like to verify the calculation:

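#circledist function
#calculates great circle distance with earth radius 6372.795 km
circledist =function(x,R=6372.795) { #x = c(fromlat,fromlong,lat,long) in degrees
pi180=pi/180 #degrees to radians
y=x[4]-x[2] #longitude difference (this line and the fromlat, tolat and return lines are reconstructed)
y[y>180]= y[y>180]-360
y[y<=-180]= 360+y[y<=-180]
delta= y *pi180
fromlat=x[1]*pi180; tolat=x[3]*pi180
theta=2*asin(sqrt(sin((tolat- fromlat)/2)^2+cos(tolat)*cos(fromlat)*(sin(delta/2))^2))
R*theta #distance in km
}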

parse=5 #set every nth point to include, setting to 1 is very slow
#grid=scan("anom_5509.csv",n= -1,sep=",",skip=1) # use for UWisc AVHRR
grid=scan("cloudmaskedAVHRR.txt",n=-1) # Use for Steig recon or cloud masked
dimnames(anom14_5509)[[2]] <- 1:5509
anom14_5509=anom14_5509[,seq(1,5509,by=parse)] #parses to every nth column

#Load Coordinates
coord_5509=coord_5509[seq(1,5509,by=parse),] #parses to every nth row

#make correlation matrix
corry=cor(anom14_5509,use=use0) #correlation coef calculation
sum(!is.na(corry)) #1585
sum(!is.na(corry)&corry<0) # 658

#make lat-long matrices


#calculate distances

#make ID matrices


plot(station[,4],station[,3],xlab="Dist (km)",ylab="Correlation",col="grey70",xlim=c(0,6000),ylim=c(-.5,1))
# 0.4958,
title("Steig AVHRR cell distance correlation")

27 Responses to “Auto-matic Correlation”

  1. Layman Lurker said

    Jeff C.: On your flow chart you speculated that RegEm was used for infilling masked grids. If RegEm was used coupled with the suspected autocorrelation then we would expect the infilled values to be artifacts and not reflective of local averages for the grid. It looks like your hunch was correct.

    Presumably, infilling masked data with “artifacts” would have resulted in a spatial and temporal “checkerboard” of patterns within AVHRR data. Might this have been the reason behind the decision to further process the data before moving to the reconstruction step?

  2. WhyNot said

    It is obvious the raw data used was heavily manipulated, filtered, artificially created, or whatever term you would like to use, to generate a predetermined output, one that would further substantiate the AGW cause. A polyscience Ponzi scheme!

  3. Jeff C. said

    #1 I don’t think RegEM was used, but I think you are correct in that the infilling methodology is causing the high autocorrelation. I tried using RegEM to infill missing data in the 5509 cell NSIDC AVHRR, it crashed almost immediately. I reduced the data set to 25% of the original and RegEM still crashed.

    I forgot to mention above that the distance correlation code was originally Steve McIntyre’s. I modified portions of it into a format more familiar to me as I don’t understand how R uses data frames. Steve did the original heavy lifting, but any mistakes that might be found are mine.

  4. Jeff C. said

    Folks, I owe a huge apology to you all, and particularly to Dr. Steig. I re-used code to process the scatter plot that I had previously used for the satellite reconstruction. I neglected to account for the fact that the recon values were anomalies while the cloud-masked data set contained temperatures. When I recalculated the scatter plot using anomalies, the familiar pattern re-emerged.

    Thanks to Hu at Climate Audit for reviewing the code and pointing out this flaw. This mistake was entirely mine and I again apologize for jumping to conclusions.

  5. Layman Lurker said

    So the corrected post would now suggest that the infilling for cloud masked grids did not result in spurious correlations. Correct?

  6. Jeff Id said

    #5 yup. The infilling looks pretty normal from a correlation standpoint. The problem is therefore in the PCA reduction.

    #4 — No way you get all the credit for this one. I verified and posted it and totally believed it was correct this morning. Sorry to everyone but these things will happen.

    As I’ve pointed out many times, on the Air Vent, the data is the data and the math is the math. We have little control over either and I will make no attempts to sell something I don’t believe in.

    This is actually a bit of good news, because tonight we can look at RegEM with higher orders whereas it would have been pretty useless had this post been correct. JeffC had ended all the fun and those of us on the air vent would have had to wait for the next paper for entertainment.

    Now we get to keep going 🙂

  7. Layman Lurker said

    If this masking was done correctly, then the result should “tidy up” the scattergram by removing spurious anomalies due to clouds and giving cleaner boundaries to the cone. However there is a significant portion of the cone which now moves into negative correlation territory when compared with the raw AVHRR. Could this effect be an artifact?

  8. Harold Vance said

    #4 — One of the best things about the Internet is that fresh eyes can review one’s code. People who aren’t familiar with a style or way of coding are going to see other code from a different point of view. What might not be obvious to one coder often becomes immediately apparent to another coder. We are lucky to be living in this age where people can review each other’s work at virtually any time of day or night and report their results. For zero cost. (I’ve been coding for 25 years and am envious of the way this process works.)

    The speed with which you guys own up to mistakes is awesome. Man, how I wish others would follow your example.

  9. davidc said

    I agree with Harold, very impressed with your readiness to admit an error.

  10. Fluffy Clouds (Tim L) said

    OK OK OK….. lol but now on to the data!!
    I agree with this post… and the question is.. final jeopardy!!!!!!!!!!

    Layman Lurker said
    March 27, 2009 at 6:40 pm

    If this masking was done correctly, then the result should “tidy up” the scattergram by removing spurious anomalies due to clouds and giving cleaner boundaries to the cone. However there is a significant portion of the cone which now moves into negative correlation territory when compared with the raw AVHRR. Could this effect be an artifact?

    OR is there more less negative ?

    Good work J&J

  11. Now I don’t know if cloud masking is literally what it suggests, but if it is, it DOES raise a point in which could lie a big source of the Steig errors. This is the overall warming effect of cloud cover over permanent icefields. If Svensmark’s hypothesis is correct, from 1957 to about 1970, there would have been more cloud cover over the planet which would have a cooling effect on the rest of the planet but a warming effect over Antarctica. Now if the cloudy records are omitted, the earlier Antarctica records will register too cold: thus a false amount of warming could appear to have happened from that time.

    See my notes on Steig

  12. TCO said

    Actually…no it doesn’t prove that general a point of fair mindedness about CA. I watched his duplicitous conduct with Loehle and was sickened by how he didn’t call out Loehle, but instead prevaricated with the “if it’s bad, Moberg is bad” avoidance (when he knew Moberg was bad and called out Moberg in no uncertain terms.) And when he doesn’t call out the Watts silliness, it’s obvious…if you are in the skeptic clique you get a gentler ride. The guy is a Canadian [snip again – not a good use of my time]

    Have much better hopes for the Jeffs and Hu. Just telling you all to WATCH OUT.

  13. RomanM said

    If I may add my two cents worth to this, what people who may read these blogs have to understand is that what most of us do here is not the presentation of “final” facts. Instead it is an ongoing seminar, an interchange and development of ideas. If one makes a purely calculational slip, it’s not something to lose sleep about. All of our “scratch work” is out in the open and if there wasn’t an occasional goof, we would probably be slacking off too much.

    The “team” does all of their stuff away from prying eyes – and still, more often than they should, post some pretty obvious basic methodological gaffes. Their posts are in journals where a lot more care and doublechecking needs to have been done and where there is no venue (as there is here) for correction.

    You guys do good stuff and these were slight oversights, easily corrected and acknowledged. Anyway, I figure that this gives me a “get out of jail free card” should someone discover one of my screwups, so no harm, no foul. Move along … 😉

  14. TCO said

    I agree with roman. Would only note that Steve likes to have his cake and eat it too. Has little hoi polloi treat his “open lab notebook” as if it were real reports. Thinks he should be invited to travel to conferences, etc. based on it. When the guy has almost nothing for pubs. Other than EE and comments (which don’t count as pubs) all the guy has is one letter in GRL. I think he is slack in not finishing things off. Also think that he likes not having things summarized and stood behind. But that’s unfair to the other side. There’s not even a stable set of criticism to engage on.

    Plus he effs up all his axes on graphs and such.

  15. My colorful density plots that reproduce the graph and tell you much more:

  16. TCO said

    String theory sucks. I have a theorist physicist friend who can do math like a whiz…AND has physical insights. He confirms my suspicions. The tide has turned against you all. Even the popular meme is now that string theory is untested poopoo.

  17. Dear TCO, sorry to inform you but your “whiz” friend is surely a crackpot if he has any doubts about the validity of string theory. Screw your tides, you groupthink-led primitive morons.

  18. TCO said

    Dude…I think he could take you in a math fight. (He’s not sure…but I’m ready to equip you each with Bessel functions and see who survives the duel.)

  19. TCO said

    Just wait…Lumo…I think departments are going to stop hiring stringers soon. The tide is turning against you. You should have figured out high Tc. It still needs a suitable treatment. Instead…you decided to jerk off with imaginary booshwa. Sad, sad. Einstein weeps.

    Oh…and Dick Feynman had the same impression of stringers as I do…

  20. TCO said

    Maybe you could do like Mike Mann and drop out of math-phys and become a climatologist? Probably make more money, do less work, and get laid more often. Plus there are actually more interesting problems and approaches being worked on. Face it, we’re at the “End of Science”. Drop into an applied field. Sorry…Goldman is not hiring.

  21. Dear TCO, you surely don’t believe that your friends can reach my ankles in mathematics.

    I am not denying that breathtakingly stupid and low-quality, uneducated, yet “active” idiots have flooded many if not most U.S. universities. What I am saying is that the validity of string theory has been pretty much established. That’s a very different question than the question how many idiots are being hired somewhere.

    In the very same sense, I am not denying that the White House is currently overrun by socialists or global warming alarmists or similar garbage and that most Americans are brutally confused about very basic questions of freedom, democracy, and capitalism these days. I am only saying that this collection of ideologies is sick and wrong.

    Guess why I escaped from that mess – from the places controlled by fanatical feminists, socialists, anti-string crackpots, and similar scum who use “arguments” involving consensus or hiring, without caring about the merit a bit. Two years ago, it started to be an issue of my very personal safety.

    All the people who refuse to lick the asses of aggressive yet primitive human foam like you are at risk unless they are geographically separated.

  22. TCO said

    Dude, you’re at effing HARVARD! That’s commie central. Heck, I’m still helping Summers get over his butt-whipping from the little shriekers.

    Oh…and I think he could take you. You might know more weird math crap…but he has real insights. He’s from outside the US, btw. And he is lazy and still butt-kicks all the Chinese grad students.

  23. Jeff Id said

    #22 I wouldn’t hold the Chinese grad students above, most that I’ve met have a complete lack of creative thinking. They typically just go through the motions.

  24. TCO said

    IANAR…but, yep.

  25. Jeff Id said

    #13 Thanks Roman, I don’t like screwing up but I’m old enough it rolls off my back.

    I think most people understand and in the end appreciate that if I admit my mistakes, they can actually trust me more when I claim my results are correct.

    On another topic. I think that if Mann had the kind of instantaneous feedback you get in the blog world the hockey stick probably would never have happened. Once he became so committed to an obvious error, it becomes impossible for a large ego to retract it.

  26. Sean said

    For RC, the science is settled – the guilt of human sourced CO2 is established. For CA the science is subject to debate. The defense does not have to prove innocence, only the presence of reasonable doubt. The defense counsel does not have to believe their client, just avoid knowingly letting the defendant lie.

    These are the roles they set themselves. Other sites set themselves other roles.
    Of course, if RC accepted that the guilt of CO2 is debatable, the burden of proof would change.
    It would become more like a civil case.

  27. TCO said

    Sean: I have wasted more time on the two sites than you. I am well beyond simplistic assessments. If you have something more interesting, nuanced, penetrating in terms of an insight feel free. Oh…and if I diss CA, that does not mean I am an RC defender. Only the chimp tribe fece throwers think like that. And they do so reflexively without self-awareness.
