## Preliminary PCA of Steig Data

Posted by Jeff Id on March 27, 2009

I moved yesterday’s post back to the top of the page.

---

Today I received the data from Dr. Steig on the Antarctic paper. He sent a polite and very short response to my email and activated the file, with no references to Matlab classes. I don’t think he’ll mind my posting the exchange because it is a short email, but if he requests it I’ll remove it.

Dr. Steig,

A link was provided on your homepage for the satellite AVHRR data. The filename is cloudmaskedAVHRR.txt.

Currently it appears to be unintentionally set for password permission only to access. I am interested in continuing analysis of this data through RegEM and comparison to the NSIDC AVHRR dataset. Can you please reset permissions for download?

Jeff

Sorry about that, permissions set wrong accidentally. Now accessible.

Eric

Thanks much.

Well, everyone at CA is looking closely at it now. I’ve done a PCA to compare this data to the 3 PCs provided in the reconstruction; they are close but not a perfect match. This is a quote from the methods section of the paper.

“The first two principal components of TIR alone explain >50% of the monthly and annual temperature variabilities (ref. 4). Monthly anomalies from microwave data (not affected by clouds) yield virtually identical results.”

The statement about 50% seemed pretty questionable to me and others. This next graph represents the eigenvalue weights.

The values represent the importance of each PC curve to the final trend, but I find the graph a little hard to read. The next plot is clearer: for each index x, it shows the sum of eigenvalues 1 through x divided by the total of all 300 eigenvalues (one per month of data).

This graph is updated according to Hu’s comment below. The first 5 values are now: 45.28819, 53.03861, 60.02086, 64.19012, 67.54436.

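The code for this plot is here:

    # svd0 is assumed to hold the result of svd() on the AVHRR anomaly matrix
    tot <- sum(svd0$d^2)               # assumed definition: sum of squared singular values
    perc <- numeric(length(svd0$d))    # cumulative percent for each index
    for (i in seq_along(svd0$d))
    {
      perc[i] <- sum(svd0$d[1:i]^2) / tot * 100
    }
    plot(perc, xlab = "Index (Eigenvalue)", main = "Steig AVHRR Signal Contained in Eigenvalues", ylab = "Percent contained in (1 - Index)")
    grid()
    #savePlot("C:/agw/antarctic paper/Sat data code/pics 3/steig AVHRR eigenvalue percent weights.jpg", type = "jpg")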

I’m now going to have to agree with the statement that about 50% of the monthly temperature variation is contained in the first two PCs. Originally I had written that it didn’t seem possible. Either way, RegEM needs covariance detail to properly allocate trends, and by keeping only 3 PCs they’ve thrown away 40 percent of it for no good reason. Now that I think of it, that’s a good point: there is no reason whatsoever to throw away all that covariance information, and Gavin’s explanation has been really weak so far. Steig’s group simply referenced this paper, with some familiar names, as the rationale for their decision.

Schneider, D. P., Steig, E. J. & Comiso, J. Recent climate variability in Antarctica from satellite-derived temperature data. J. Clim. 17, 1569–1583 (2004).

I haven’t read the paper but I wonder about its data now too. What do the PC’s look like?

Here are the reconstructed data PC’s from the same time period.

These are the first 3 PC’s I calculated from the AVHRR data.

Below are the pcs together on individual graphs

There seem to be ever-increasing differences in the plots (green line) from PC 1 to PC 3. **IMO, this demonstrates that this is not the actual data used to make the original paper; it seems close, but for some reason there are differences.** This could be due to some differences between Matlab and R (I haven’t proven that aspect), but the differences seem too large.
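One difference that software alone can introduce is sign: singular vectors are only determined up to a factor of -1, so two packages can return the same PC flipped. Below is a minimal sketch of aligning signs before differencing two sets of PCs. The data are synthetic, `align_sign` is a hypothetical helper, and the "published" PCs are made up for illustration; none of this is the code behind the plots above.

```r
# Synthetic stand-in for an anomaly matrix (rows = months, columns = grid cells).
set.seed(2)
X <- matrix(rnorm(100 * 20), 100, 20)
X <- sweep(X, 2, colMeans(X))               # center each column

s <- svd(X)
pcs_mine <- s$u[, 1:3] %*% diag(s$d[1:3])   # temporal PCs scaled by singular values

# Hypothetical "published" PCs: same series, PC2 flipped, plus a little noise.
pcs_pub <- pcs_mine
pcs_pub[, 2] <- -pcs_pub[, 2]
pcs_pub <- pcs_pub + matrix(rnorm(300, sd = 0.01), 100, 3)

# Flip columns of a so each correlates positively with the matching column of b.
align_sign <- function(a, b) {
  sgn <- sign(diag(cor(a, b)))
  sweep(a, 2, sgn, `*`)
}

aligned  <- align_sign(pcs_mine, pcs_pub)
rms_diff <- sqrt(colMeans((aligned - pcs_pub)^2))   # per-PC RMS difference
print(rms_diff)
```

If the residual stays large after sign alignment (and after checking that both sets use the same scaling), a sign or convention difference is unlikely to be the explanation.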

I didn’t stop there. Below are plots of the eigenvectors. They look pretty familiar.

The plots above are the ones used; now for what was left out.

There are a few things I can conclude from tonight’s abuse of my math processor. First, this doesn’t seem to be the actual original data, but it is close. Second, there was a substantial amount of covariance information left out of the analysis. While this information doesn’t assist in monthly averages, it is required for RegEM to properly allocate station weights according to area. This does not mean that even with higher eigs we will get proper weighting. It simply offers an explanation of why Steig’s paper did not get proper spatial correlation.
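How much variance is dropped by keeping only the leading PCs can be checked directly from the singular values. Here is a rough sketch on synthetic data; the matrix dimensions and `k = 3` are assumptions chosen for illustration, and the AVHRR file itself is not read.

```r
# Synthetic stand-in for the anomaly matrix (rows = months, columns = grid cells).
set.seed(1)
X <- matrix(rnorm(120 * 40), nrow = 120, ncol = 40)
X <- sweep(X, 2, colMeans(X))          # center each column

s <- svd(X)
k <- 3                                 # number of PCs retained

# Rank-k approximation built from the first k singular triplets.
Xk <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

# Fraction of total variance kept, computed two ways.
retained_sv  <- sum(s$d[1:k]^2) / sum(s$d^2)   # from squared singular values
retained_dir <- sum(Xk^2) / sum(X^2)           # directly from the matrices

cat("retained (singular values):", retained_sv, "\n")
cat("retained (direct):         ", retained_dir, "\n")
cat("discarded:                 ", 1 - retained_sv, "\n")
```

The two numbers agree because the squared Frobenius norm of the truncated matrix equals the sum of the retained squared singular values.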

This plot is natural data and represents an expected level of correlation vs distance.

This next plot shows what happens with Steig satellite data in RegEM. Note the extreme correlation vs distance of all the data.
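For readers who want to build this kind of correlation vs distance scatter themselves, here is a minimal sketch. It uses made-up coordinates and random series, and `haversine_km` is a hypothetical helper written for this example; it is not the code used for the plots in this post.

```r
# Great-circle distance in km between points given in degrees (haversine formula).
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

set.seed(3)
n_pts <- 30
lat   <- runif(n_pts, -90, -65)                    # made-up coordinates
lon   <- runif(n_pts, -180, 180)
temps <- matrix(rnorm(200 * n_pts), 200, n_pts)    # made-up series, one column per point

cors  <- cor(temps)                                # all pairwise correlations
pairs <- which(upper.tri(cors), arr.ind = TRUE)    # each unordered pair once
dist_km <- haversine_km(lat[pairs[, 1]], lon[pairs[, 1]],
                        lat[pairs[, 2]], lon[pairs[, 2]])
r <- cors[pairs]

plot(dist_km, r, xlab = "Distance (km)", ylab = "Correlation",
     main = "Correlation vs distance (synthetic)")
```

With independent random series the cloud sits around zero at every distance; real temperature data would be expected to show correlation falling off as distance grows.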

Thanks to SteveM, Jeff C, Roman (whose anomaly code I used), and all the people at CA who had a part in the code and information used to create this post.

The next step is pretty simple: RegEM…

Thanks also to Hu for pointing out the error in the eig weights.

## Jeff C. said

Good post. I’m surprised the 3 PCs aren’t the same, as it seems foolish to provide something that isn’t correct after all this time. Perhaps they weren’t sure exactly what dataset they used and this was as close as they could get.

You show PC1 having good agreement between the two (the flatline difference trace), but eyeballing it, they don’t look like they agree that well. Might be an illusion since they have different scales.

Regarding the scatter plot, here is one for the NSIDC AVHRR data.

http://i404.photobucket.com/albums/pp127/jeffc1728/nsidcavhrrscatter.gif

There is a reasonable correlation decay over distance. I’ll put one up for Steig’s new AVHRR dataset shortly.

## Jeff Id said

Thanks Jeff for the graph. Good stuff.

## Jeff C. said

I hesitate to post this as I can’t believe the results. Here is the distance correlation scatter plot for Steig’s AVHRR data set released today.

I’ve been over my code and ran it several times because this says the worst-case correlation of any two points is greater than +0.5! This is the same code I have used to run Steig’s recon and the NSIDC AVHRR. How can this be?

I’m happy to forward my code, but if anyone has calculated this independently I would love to see it.

http://i404.photobucket.com/albums/pp127/jeffc1728/steigavhrrscatter.gif

## Jeff Id said

Please send your code. Amazing. I’ve probably got just enough energy to repeat it.

## Jeff C. said

I just sent you the code in an email. I’ll look into it more, so don’t feel obligated to dig into this tonight.

## Hu McCulloch said

I believe what you are showing are not the eigenvalues of the covariance matrix, but the singular values of the AVHRR matrix itself. These are related, but the former are the squares of the latter. The cumulative explanatory power should therefore be in terms of the sum of squared singular values, not the sum of singular values. (I think — I’m just learning this stuff myself. :=) )
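The relationship between the two quantities can be checked numerically. The sketch below uses a small random matrix (an assumption for illustration, not the AVHRR data) to show that the eigenvalues of the covariance matrix equal the squared singular values of the centered data matrix divided by (rows - 1), and that cumulative shares computed from raw singular values understate the leading components.

```r
set.seed(4)
X <- matrix(rnorm(50 * 8), 50, 8)      # synthetic data: 50 rows, 8 columns
X <- sweep(X, 2, colMeans(X))          # center each column

d  <- svd(X)$d                                   # singular values of the data matrix
ev <- eigen(cov(X), symmetric = TRUE)$values     # eigenvalues of the covariance matrix

# The two columns should match: eigenvalue = d^2 / (n - 1).
print(cbind(eigenvalue = ev, d_squared_scaled = d^2 / (nrow(X) - 1)))

# Cumulative explained share: sum of singular values vs sum of squares.
share_unsquared <- cumsum(d) / sum(d)        # not a variance share
share_squared   <- cumsum(d^2) / sum(d^2)    # variance share
print(round(cbind(share_unsquared, share_squared), 4))
```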

The natural “scree” cutoff point for PC’s would be either k = 1 or k = 3. However, it’s not clear to me what purpose is served in the present context, other than computational convenience, from reducing k below n at all.

(In fact, once you have removed seasonal means from the data, the rank of the matrix is n-12, rather than n itself, so the last 12 singular values are 0 and do nothing for the fit. But otherwise, I don’t see the point of reducing k, except perhaps to keep RegEM from choking up on too big a matrix.)
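The rank loss from removing seasonal means is easy to reproduce on a toy matrix. This sketch assumes rows are months and columns are grid cells, with made-up sizes (48 months, 100 cells); subtracting each calendar month's mean imposes 12 linear constraints on the rows, so 12 singular values collapse to numerical zero.

```r
set.seed(5)
n_months <- 48
n_cells  <- 100
X <- matrix(rnorm(n_months * n_cells), n_months, n_cells)   # synthetic data
month <- rep(1:12, length.out = n_months)                   # calendar month of each row

# For every column, subtract the mean of each calendar month.
anom <- X
for (m in 1:12) {
  rows <- month == m
  anom[rows, ] <- sweep(X[rows, , drop = FALSE], 2,
                        colMeans(X[rows, , drop = FALSE]))
}

d <- svd(anom)$d
print(tail(d, 14))                   # the last 12 are at rounding-error level

n_zero <- sum(d < 1e-8 * d[1])       # count singular values that are numerically zero
cat("numerically zero singular values:", n_zero, "\n")
```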

## Jeff Id said

Hu, Very nice. I didn’t understand what was happening to the last 12 values. Here they are.

[283] 1.936804e+01 1.905110e+01 1.887429e+01 1.866718e+01 1.850329e+01 1.789741e+01

[289] 4.805931e-12 4.589858e-12 4.247902e-12 4.131582e-12 3.579812e-12 3.324030e-12

[295] 3.323852e-12 3.153015e-12 3.139742e-12 3.047826e-12 3.017056e-12 2.974581e-12

I’ll change my plot.

## Layman Lurker said

Interesting that the Chladni patterns are back again. Are the ocean pixels in the Steig dataset? If they are then the cloud masking must have added to autocorrelation – seemingly confirmed by Jeff C.’s scattergrams.

## Lucy Skywalker said

Jeff Id

I came here from your post at WUWT which has a dead link but I guess this is the thread. Now I don’t understand enough stats to understand what is this “amazing discovery” of Jeff C, but I’m sure I’m not the only one who would like to understand!

## Lucy Skywalker said

Sorry, link is now working again, AND the Auto-matic article is back, I must have hit unlucky when it was in down time.

## Lucy Skywalker said

Crikey, I now see from the times of posts on both CA and your Auto-matic thread, that exactly as I tried to read your article, you were correcting the mistake you’d found, extremely quickly – as well as apologizing and leaving the story visible. And you’ve explained what it is all about.

Thank you.

## Jeff Id said

#11 No problem. I’m good at making mistakes so I have a lot of experience in apologizing.

The phones at work have been crazy too.

## TCO said

Leaving the mistake visible is different from Steve. Have had the experience of replying to a post and him editing it and not noting it. I’m sure it’s not deliberate. It’s just who expects ethics from [snip]

## Layman Lurker said

Jeff, any ideas yet on accounting for the differences between the PCs used in the reconstruction and the PCs you calculated from the Steig AVHRR data?

## Jeff Id said

#14

I’m guessing this might be a redone version with slightly different settings, as if the original was lost. The other possibility I can think of is that there is some rounding error in the calcs, but I don’t know.