Preliminary PCA of Steig Data
Posted by Jeff Id on March 27, 2009
I moved yesterday’s post back to the top of the page.
Today I received the data from Dr. Steig on the Antarctic paper. He sent a polite and very short response to my email and activated the file, with no references to matlab classes. I don’t think he’ll mind my posting the exchange because it is short, but if he asks I’ll remove it.
A link was provided on your homepage for the satellite AVHRR data. The filename is cloudmaskedAVHRR.txt.
Currently it appears to be unintentionally set for password permission only to access. I am interested in continuing analysis of this data through RegEM and comparison to the NSIDC AVHRR dataset. Can you please reset permissions for download?
Sorry about that, permissions set wrong accidentally. Now accessible.
Well, everyone at CA is looking closely at it now. I’ve done a PCA to compare this data to the 3 PCs provided in the reconstruction. They are close but not a perfect match. This is a quote from the methods section of the paper.
The first two principal components of TIR alone explain >50% of the monthly and annual temperature variabilities (ref. 4). Monthly anomalies from microwave data (not affected by clouds) yield virtually identical results.
The statement about 50% seemed pretty questionable to me and others. This next graph shows the eigenvalue weights.
The values represent the importance of each PC curve on the final trend, but I find the graph a little hard to read. The next plot is clearer: for each index x, it shows the sum of eigenvalues 1 through x divided by the total of all 300 eigenvalues (one per month of data).
This graph is updated according to Hu’s comment below. The first 5 values are now: 45.28819, 53.03861, 60.02086, 64.19012, 67.54436.
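The code for this plot is here:
# svd0 = svd() of the AVHRR anomaly matrix; loop body rebuilt from the text above, squaring d is an assumption
perc = rep(NA, length(svd0$d))
for ( i in 1:length(svd0$d))
  perc[i] = 100 * sum(svd0$d[1:i]^2) / sum(svd0$d^2)
plot(perc,xlab="Index (Eigenvalue)",main="Steig AVHRR Signal Contained in Eigenvalues",ylab="Percent contained in (1 - Index)")
#savePlot("C:/agw/antarctic paper/Sat data code/pics 3/steig AVHRR eigenvalue percent weights.jpg",type="jpg")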
I’m now going to have to agree with the statement about 50% of the monthly temp variation being contained in the first two PCs. Originally I had written that it didn’t seem possible. Either way, RegEM needs covariance detail to properly allocate trends, and by keeping only 3 PCs they’ve thrown away 40 percent of it for no good reason. Now that I think of it, that’s a good point: there is no reason whatsoever to throw away all that covariance information, and Gavin’s explanation has been really weak so far. Steig’s group simply referenced this paper, with some familiar names, as the rationale for their decision.
Schneider, D. P., Steig, E. J. & Comiso, J. Recent climate variability in Antarctica
from satellite-derived temperature data. J. Clim. 17, 1569–1583 (2004).
I haven’t read the paper but I wonder about its data now too. What do the PCs look like?
Here are the reconstructed data PCs from the same time period.
These are the first 3 PCs I calculated from the AVHRR data.
Below are the PCs together on individual graphs.
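For anyone following along, here is a minimal sketch of how the first 3 PCs and the share of variance they retain can be pulled from an SVD in R. It runs on synthetic data, not the AVHRR file; the matrix `X`, its dimensions, and the variable names are made up for illustration, and it assumes the variance carried by each PC is proportional to the squared singular value.

```r
set.seed(1)
# Synthetic stand-in for an anomaly matrix: 300 months x 50 grid cells (made-up sizes)
X <- matrix(rnorm(300 * 50), nrow = 300, ncol = 50)
X <- sweep(X, 2, colMeans(X))   # subtract each column's mean

s <- svd(X)
k <- 3                          # number of PCs kept

# PC time series: left singular vectors scaled by their singular values
pcs <- s$u[, 1:k] %*% diag(s$d[1:k])

# Rank-k reconstruction from the kept PCs and their spatial eigenvectors
Xk <- pcs %*% t(s$v[, 1:k])

# Share of total variance retained, computed two ways as a cross-check
retained       <- sum(s$d[1:k]^2) / sum(s$d^2)
retained_check <- 1 - sum((X - Xk)^2) / sum(X^2)
discarded      <- 1 - retained
```

With random noise like this the retained share is small; the point is only that `discarded` is whatever the dropped singular values carried.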
There seem to be ever-increasing differences in the plots (green line) from PC 1 to PC 3. IMO, this demonstrates that this is not the actual data used to make the original paper. It seems close, but for some reason there are differences. This could be due to some differences between matlab and R (I haven’t proven that aspect), but the differences seem too large.
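One thing worth ruling out when comparing PCs from two packages is sign: the sign of each component is arbitrary, so the same PC can come back flipped. A minimal sketch of a sign-aligned comparison follows, on synthetic data. The names `pcs_a`, `pcs_b`, and `align_sign` are hypothetical and this is not the comparison used for the plots above.

```r
# Flip each column of b so it correlates positively with the matching column of a
align_sign <- function(a, b) {
  for (j in seq_len(ncol(a))) {
    if (cor(a[, j], b[, j]) < 0) b[, j] <- -b[, j]
  }
  b
}

set.seed(2)
pcs_a <- matrix(rnorm(300 * 3), ncol = 3)            # stand-in for PCs from one source
noise <- matrix(rnorm(300 * 3, sd = 0.1), ncol = 3)  # small made-up discrepancy
pcs_b <- sweep(pcs_a, 2, c(1, -1, 1), "*") + noise   # same PCs with PC2 flipped, plus noise

pcs_b_aligned <- align_sign(pcs_a, pcs_b)

# Per-PC correlation and RMS difference after alignment
pc_cor  <- sapply(1:3, function(j) cor(pcs_a[, j], pcs_b_aligned[, j]))
pc_rmsd <- sqrt(colMeans((pcs_a - pcs_b_aligned)^2))
```

If the RMS difference stays large after alignment, sign convention is not the explanation.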
I didn’t stop there. Below are plots of the eigenvectors. They look pretty familiar.
The plots above are the ones used; now for what was left out.
There are a few things I can conclude from tonight’s abuse of my math processor. First, this doesn’t seem to be the actual original data, but it is close. Second, there was a substantial amount of covariance information left out of the analysis. While this information doesn’t assist in monthly averages, it is required for RegEM to properly allocate station weights according to area. This does not mean that even with higher eigs we will get proper weighting. It simply offers an explanation of why Steig’s paper did not get proper spatial correlation.
This plot is natural data and represents an expected level of correlation vs distance.
This next plot shows what happens with Steig satellite data in RegEM. Note the extreme correlation vs distance of all the data.
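A correlation vs distance plot of this kind can be sketched as follows. This is a minimal illustration on synthetic sites and random series, not the code behind the two plots above; `gc_dist`, the site coordinates, and the sizes are all made up, and the distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```r
# Great-circle distance in km between points given in degrees (haversine formula)
gc_dist <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

set.seed(3)
n_sites <- 20
lat <- runif(n_sites, -90, -65)                    # made-up Antarctic latitudes
lon <- runif(n_sites, -180, 180)
temps <- matrix(rnorm(300 * n_sites), nrow = 300)  # synthetic monthly anomalies

cors  <- cor(temps)                                # site-by-site correlation matrix
dists <- outer(seq_len(n_sites), seq_len(n_sites),
               function(i, j) gc_dist(lat[i], lon[i], lat[j], lon[j]))

pairs <- upper.tri(cors)                           # each site pair once, no self-pairs
plot(dists[pairs], cors[pairs], xlab = "Distance (km)", ylab = "Correlation")
```

Independent random series like these scatter around zero at every distance; real temperature data would be expected to fall off with distance.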
Thanks to SteveM, Jeff C, Roman (whose anomaly code I used) and all the people at CA who have a part in the code and information used to create this post.
The next step is pretty simple: RegEM...
Thanks also to Hu for pointing out the error in the eig weights.