Posted by Jeff Id on March 27, 2009
This post has an error. The calculation inadvertently used temperature rather than anomaly.
Thanks to Steve McIntyre at Climate Audit and comments from Hu McCulloch at Climate Audit for quickly spotting this error and bringing it to our attention. I think this points out pretty well that accusations of cherry picking or playing favorites on Climate Audit aren’t reasonable. Problems get chopped up and spit out regardless of the source or meaning.
The two Jeffs
UPDATE: This is the corrected graph by Jeff C, which shows a reasonable correlation vs distance.
We know there’s a problem. It’s a question of where and how much.
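To see why the temperature-versus-anomaly mix-up noted at the top of this post matters, here is a minimal sketch on synthetic data. It is not the code used for the plots. The two "cells", their seasonal amplitude and their noise level are all made-up assumptions. The point is only that raw temperatures from unrelated locations share the annual cycle, so they correlate strongly until the monthly climatology is removed.

```r
# Synthetic illustration only: two hypothetical grid cells with independent
# weather noise but the same seasonal cycle (all numbers are assumptions).
set.seed(1)
n_years <- 25
month <- rep(1:12, times = n_years)
seasonal <- 15 * cos(2 * pi * (month - 1) / 12)   # shared annual cycle, deg C
temp_a <- -20 + seasonal + rnorm(length(month), sd = 2)
temp_b <- -35 + seasonal + rnorm(length(month), sd = 2)

# Anomaly = value minus the long-term mean for that calendar month.
to_anomaly <- function(x, month) x - ave(x, month)

anom_a <- to_anomaly(temp_a, month)
anom_b <- to_anomaly(temp_b, month)

cor_temp <- cor(temp_a, temp_b)   # dominated by the shared seasonal cycle
cor_anom <- cor(anom_a, anom_b)   # only the independent noise remains
print(c(temperature = cor_temp, anomaly = cor_anom))
```

In this toy case the temperature correlation is high purely because of the common seasonal signal, while the anomaly correlation sits near zero.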
A guest post by Jeff C.
Jeff C has written a short post for us which I have independently verified with my own separately written code. I think you’ll be surprised.
Previously we have looked at scatter plots to get an understanding of how well correlated the data appears over distance. Below is a scatter plot of the raw Antarctic surface data. This contains no infilling, just actual measured data from occupied surface stations.
This plot above was originally calculated by Steve McIntyre. Note how the correlation is virtually 1 at 0 km, with a gradual decay as distance increases. This is what we would expect to see as stations closer together should have better correlated climate than stations far apart.
This plot, also provided by Steve, is a distance correlation for the satellite era (1982-2006) of the Steig 3 PC reconstruction used in the Nature paper. Note how correlation remains at 1 for some cell pairs at distances out to 3000 km. This seemed suspicious and led many of us to believe that the reduction to 3 PCs had caused a spatial smearing of the data.
This plot above is of the NSIDC AVHRR data from the University of Wisconsin website that Jeff and I have been processing for the past few weeks. Note that the “cone” is quite a bit wider than the surface data, but the distance correlation looks reasonable. Some of the cell pairs are still rather well-correlated at long distances, but we don’t see the values of 1 we saw on the Steig reconstruction.
This is the shocker. Here is the distance correlation plot for the Steig cloud-masked data released today. This data set has been presented as the satellite data used as the input to the reconstruction. If it were truly “raw” (or minimally processed) satellite data, we would expect to see a plot similar to the NSIDC plot immediately above. Instead, we see that every single data pair has a correlation of greater than 0.5!! Data from the peninsula is highly correlated with data from the East Antarctica coast and the interior despite the surface data showing nothing of the sort.
Why would this data set have such a high cell-to-cell correlation? I’m speculating here, but Steig talks about “enhanced cloud masking” where daily data points that differ from the climatological mean by more than 10 deg C are considered cloud contaminated and discarded. From my experience with the NSIDC AVHRR data, a huge number of data points would be affected by this threshold, perhaps as many as 50% of all points. If a simplified infilling algorithm were used to replace those points, high correlation might result. Regardless, this plot appears to show that the cloud-masked data set is highly processed and suspect.
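The threshold rule described above can be sketched in a few lines. This is only an illustration of the rule as quoted in this post, not Steig's actual processing. The daily series, the climatology and the 8 deg C scatter are invented, so the flagged fraction it prints says nothing about the real AVHRR data.

```r
# Synthetic daily temperatures for one hypothetical cell (not real AVHRR data).
set.seed(2)
n_days <- 365
clim_mean <- -30 + 12 * cos(2 * pi * (1:n_days) / 365)  # assumed climatology, deg C
temp <- clim_mean + rnorm(n_days, sd = 8)               # assumed day-to-day scatter

threshold <- 10                                         # deg C, as quoted in the post
flagged <- abs(temp - clim_mean) > threshold            # treated as cloud contaminated
masked <- ifelse(flagged, NA, temp)                     # discard the flagged days

frac_flagged <- mean(flagged)
print(frac_flagged)
```

Whatever is done with the resulting gaps (left missing or infilled) is a separate step, and that is the step the speculation above is about.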
When I first ran this plot I thought it must be in error. I checked my code line by line and have repeated the results multiple times. I still find it hard to believe.
I’ve spent several hours verifying this post and have independently verified the results using my own code. Jeff C’s code used a subset of every 5th value in the grid (R can’t handle the full matrix because of its size). My independently written version used a random subset method derived from SteveM’s original sat correlation.
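For readers who want to see the two subsetting approaches side by side, here is a rough sketch. The 5509 cell count and the step of 5 come from the code listing in this post. The size of the random subset is an assumption (chosen here to match the regular one), since the post does not say how many cells the random version used.

```r
n_cells <- 5509                       # grid cells, as in the code listing
parse <- 5                            # keep every 5th cell

idx_regular <- seq(1, n_cells, by = parse)                # systematic subset
set.seed(3)
idx_random <- sort(sample(n_cells, length(idx_regular)))  # random subset, same size (assumed)

n_pairs <- function(k) k * (k - 1) / 2                    # number of distinct cell pairs
print(c(full = n_pairs(n_cells), subset = n_pairs(length(idx_regular))))
```

Either subset cuts the number of cell pairs by a factor of roughly 25, which is what makes the correlation matrix manageable.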
What it means:
The concept of this paper was to use spatial information to ensure proper weighting and location of individual surface stations across the Antarctic. The surface stations are the lowest-noise measurement of atmospheric temperature and show a particular correlation pattern which we can consider “natural” (the first graph). This is the pattern you would expect to see in any data representing Antarctic temperature. The 3rd graph is the NSIDC dataset and represents the spatial correlation of the publicly available cloud-masked data from the same instruments as processed by the NSIDC. The cone angle is wider than for the surface station data, which is expected given the higher noise level in the dataset, but the key is that spatial information is still available. The last graph, however, has correlations pegged at almost 1 across the full width of the dataset, independent of distance, mountain ranges, the peninsula, sea-contaminated pixels and the rest.
From my other post, which derived the 3 PCs for the reconstruction dataset, this data doesn’t seem to be an exact copy of the original data, but it is close. What’s more, we can now make sense of the second to last graph, which is derived from the full reconstruction using 3 PCs as presented by Steig. The data from graph 3 has almost a parallelogram shape because the surface station data’s correlation vs distance is copied equally across the entire satellite dataset regardless of actual location.
If you take the surface station points (graph 1) and spread copies of the surface station data across the entire width of the Steig satellite data (graph 4), you get (graph 3).
I’m not saying or implying in any way that this was done intentionally, but this is just about the perfect dataset to use if you want to weight every station equally and basically average the pre-1982 trends across the entire continent. I thought we were going to have to go through RegEM and do a lot of calculation to find out whether this was the case. Not this time. This is the perfect scenario for blending the high concentration of known warming peninsula stations across an entire continent.
A copy of Jeff C’s R code if you would like to verify the calculation:
#calculates great circle distance with default earth radius R=6372.795 km
circledist =function(x,R=6372.795) #fromlat,fromlong,lat,long
delta= y *pi180
parse=5 #set every nth point to include, setting to 1 is very slow
#grid=scan("anom_5509.csv",n= -1,sep=",",skip=1) # use for UWisc AVHRR
grid=scan("cloudmaskedAVHRR.txt",n=-1) # Use for Steig recon or cloud masked
dimnames(anom14_5509)[[2]] <- 1:5509 #label the 5509 grid-cell columns
anom14_5509=anom14_5509[,seq(1,5509,by=parse)] #parses to every nth column
coord_5509=coord_5509[seq(1,5509,by=parse),] #parses to every nth row
#make correlation matrix
corry=cor(anom14_5509,use=use0) #correlation coef calculation
sum(!is.na(corry) &corry<0) # 658
#make lat-long matrices
#make ID matrices
title("Steig AVHRR cell distance correlation")
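
The listing above is abbreviated, so here is a minimal self-contained sketch of the same kind of distance-correlation calculation. It is not Jeff C's code. The `gc_dist` helper uses the standard haversine formula, the anomaly matrix is random noise at made-up coordinates, and `use = "pairwise.complete.obs"` is an assumption because `use0` is not defined in the listing. With pure noise the plot shows no decay with distance. The sketch only demonstrates the mechanics.

```r
# Haversine great-circle distance in km; inputs in degrees.
gc_dist <- function(lat1, lon1, lat2, lon2, R = 6372.795) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# Synthetic stand-in for an anomaly matrix: rows = months, columns = cells.
set.seed(4)
n_cells <- 40
n_months <- 300
lat <- runif(n_cells, -90, -65)
lon <- runif(n_cells, -180, 180)
anom <- matrix(rnorm(n_months * n_cells), n_months, n_cells)

corry <- cor(anom, use = "pairwise.complete.obs")   # cell-to-cell correlations
dist_km <- outer(seq_len(n_cells), seq_len(n_cells),
                 function(i, j) gc_dist(lat[i], lon[i], lat[j], lon[j]))

keep <- upper.tri(corry)                            # each pair once, no diagonal
plot(dist_km[keep], corry[keep], pch = ".",
     xlab = "Distance (km)", ylab = "Correlation")
```

To try it on real data, replace `anom`, `lat` and `lon` with an anomaly matrix and its cell coordinates.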