## NSIDC Sat Data PCA

Posted by Jeff Id on March 26, 2009

I thought about titling it NSIDC AVHRR PCA just for fun. I’ve done a short PC analysis of the Antarctic data to see if I could replicate Steig’s 3 PCs. It didn’t work out very well, but it returned some interesting results. I had to post this now because we just got the real data from Eric Steig. We will be using that next, and this gives us something to compare it to.

Before I get started, credit for the code goes to about a dozen people, including Jeff C, SteveM, Roman, Ryan and myself. I may have added someone or left someone off inadvertently, but a lot of people were involved.

First, here is a plot of the first 100 eigenvalues.

Compare this to SteveM’s plot on CA (Climate Audit) of the processed data from Steig.
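
For readers who want to reproduce an eigenvalue plot like this, here is a minimal NumPy sketch. It assumes the gridded data has already been arranged as a (time x pixel) array; the function name `eigenvalue_spectrum` and the random stand-in data are hypothetical, not the code actually used for this post.

```python
import numpy as np

def eigenvalue_spectrum(data, n_keep=100):
    """Leading eigenvalues of the pixel covariance matrix of a (time x pixel) array.

    Each pixel (column) is centred on its own mean. The covariance eigenvalues
    equal the squared singular values of the anomaly matrix divided by
    (n_time - 1), so the large covariance matrix never has to be formed.
    """
    anomalies = data - data.mean(axis=0)
    s = np.linalg.svd(anomalies, compute_uv=False)  # sorted largest first
    return (s ** 2 / (data.shape[0] - 1))[:n_keep]

# Hypothetical stand-in for the satellite grid: 300 months x 500 pixels of noise.
rng = np.random.default_rng(0)
fake = rng.normal(size=(300, 500))
spectrum = eigenvalue_spectrum(fake)
```

Going through the SVD rather than building the covariance matrix keeps memory use down when there are far more pixels than months.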

Remember, this is the statement in Steig’s paper:

> Principal component analysis of the weather station data produces results similar to those of the satellite data analysis, yielding three separable principal components. We therefore used the RegEM algorithm with a cut-off parameter k = 3. A disadvantage of excluding higher-order terms (k > 3) is that this fails to fully capture the variance in the Antarctic Peninsula region. We accept this tradeoff because the Peninsula is already the best-observed region of the Antarctic.

It’s clear from my plot above that the data has strong eigenvalues well beyond k = 3. The NSIDC data used to create the plots is processed differently from Steig’s, but it comes from the same instruments. I fully expected this to look different, because with only 3 PCs there simply isn’t enough covariance information to get proper spatial weighting. It just means they left too much information out.
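
One way to put a number on "too much information left out" is the fraction of total variance the first k PCs retain. The sketch below is illustrative only: `variance_fraction` is a hypothetical helper, and the test field is synthetic, built from three spatial modes plus a little noise, not the NSIDC data.

```python
import numpy as np

def variance_fraction(data, k):
    """Fraction of total variance in a (time x pixel) array captured by the first k PCs."""
    anomalies = data - data.mean(axis=0)
    power = np.linalg.svd(anomalies, compute_uv=False) ** 2
    return power[:k].sum() / power.sum()

rng = np.random.default_rng(1)
# Hypothetical test field: three spatial modes plus small noise.
modes = rng.normal(size=(240, 3)) @ rng.normal(size=(3, 200))
field = modes + 0.05 * rng.normal(size=modes.shape)
for k in (1, 3, 10):
    print(k, round(variance_fraction(field, k), 3))
```

For a field that really has only three modes, k = 3 captures nearly everything; when the eigenvalue spectrum stays high past the third eigenvalue, as in the plot above, the same calculation would show a sizeable share falling outside k = 3.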

Here is a plot of the first 10 PCs.

The first 3 graphs are the PCs I would expect to match the first half of Steig’s data; the right half of the graph below should look the same. Yeah, they’re not very close.
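
A PC time series is the projection of the anomalies onto one eigenvector. This minimal sketch, assuming a (time x pixel) array and using the hypothetical name `principal_components`, returns both the time series and the spatial patterns. One practical point when comparing against another analysis: SVD signs are arbitrary, so a PC and its eigenvector may both need flipping before they line up.

```python
import numpy as np

def principal_components(data, n_pcs=10):
    """PC time series and spatial eigenvectors of a (time x pixel) array.

    Returns (pcs, eofs): pcs has one column per PC (a time series), eofs has one
    row per PC (a spatial pattern). The sign of each pair is arbitrary.
    """
    anomalies = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]   # scale each left singular vector by its singular value
    eofs = vt[:n_pcs]
    return pcs, eofs

rng = np.random.default_rng(2)
data = rng.normal(size=(60, 40))   # hypothetical 60 months x 40 pixels
pcs, eofs = principal_components(data, n_pcs=10)
```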

I then decided to plot the eigenvectors.

This is the same kind of plot SteveM produced, but if you’ve read his posts you’ll see the pattern is different. I expected the secondary oscillation to run from top left to bottom right, but ocean-cell contamination of the data overpowered the spatial covariance of the matrix. Because the ocean pixels are more stable, they show little or no spatial autocorrelation, so the pattern has the ocean pixels positive, the central land positive, and a negative ring in between. A link to a movie that plays the data and clearly shows the ocean pixel contamination is here.
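
Maps like these come from putting each pixel's loading in an eigenvector back at its grid position. A minimal sketch, assuming the data columns were taken from a regular grid in a known order; `eigenvector_map` and `pixel_index` are hypothetical names, and the 3 x 3 grid is a toy, far smaller than the real one.

```python
import numpy as np

def eigenvector_map(eof, pixel_index, grid_shape):
    """Put one spatial eigenvector back on a 2-D grid for plotting.

    pixel_index gives the flat grid position of each data column (for example,
    only the cells that have data); every other cell is left as NaN.
    """
    grid = np.full(grid_shape, np.nan)
    grid.flat[pixel_index] = eof
    return grid

# Hypothetical 3 x 3 grid with data in the diagonal cells only.
pattern = eigenvector_map(np.array([0.5, -0.2, 0.8]), np.array([0, 4, 8]), (3, 3))
```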

I will need to rerun the analysis with the ocean-contaminated pixels masked out, which should produce the autocorrelation patterns we would expect.
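
Masking amounts to dropping the flagged columns before the decomposition. The sketch below assumes a boolean `keep` mask already exists (how it would be derived, for example from a land/sea mask, is not covered here); `masked_pca` is a hypothetical name.

```python
import numpy as np

def masked_pca(data, keep, n_pcs=3):
    """PCA using only the pixels (columns) where the boolean mask `keep` is True."""
    subset = data[:, keep]
    anomalies = subset - subset.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs], vt[:n_pcs]

rng = np.random.default_rng(3)
data = rng.normal(size=(120, 30))        # hypothetical 120 months x 30 pixels
keep = np.ones(30, dtype=bool)
keep[:8] = False                         # pretend the first 8 pixels are ocean-contaminated
pcs, eofs = masked_pca(data, keep)
```

Because masked pixels never enter the SVD, changing their values has no effect on the result, which is exactly the property wanted when the contamination is confined to known cells.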

## Fluffy Clouds (Tim L) said

Thanks for your work.

## Ryan O said

I think this helps show that the match between the surface data and AVHRR data is poor. If the match were good, then I would expect at least the first 2 eigenvectors to make a decent match. The first one is sort of similar, but the second one is definitely not. I wonder if the ocean contamination could explain the whole difference . . .

## Molon Labe said

You might make sure your higher-order modes are not sensitively dependent on the data, perhaps by making a small random net-zero adjustment to the data, repeating the decomposition and comparing the resulting eigenmodes.
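
A minimal sketch of that test, assuming "net-zero" means Gaussian noise shifted so it sums to zero over the whole array. The function names and the synthetic three-mode field are hypothetical; with real data, `scale` would be set near the measurement precision.

```python
import numpy as np

def leading_eofs(data, k):
    """First k spatial eigenvectors (rows) of a (time x pixel) array."""
    anomalies = data - data.mean(axis=0)
    return np.linalg.svd(anomalies, full_matrices=False)[2][:k]

def perturbation_test(data, k=10, scale=0.01, seed=0):
    """|cosine| between each of the first k eigenvectors before and after a small perturbation.

    Values near 1 mean that mode barely moved; values well below 1 flag a mode
    that depends sensitively on the data.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=scale, size=data.shape)
    noise -= noise.mean()                            # net-zero adjustment
    before = leading_eofs(data, k)
    after = leading_eofs(data + noise, k)
    return np.abs(np.sum(before * after, axis=1))   # abs() because SVD signs are arbitrary

# Hypothetical field with three well-separated modes.
rng = np.random.default_rng(4)
q, _ = np.linalg.qr(rng.normal(size=(50, 3)))
field = (rng.normal(size=(200, 3)) * [10.0, 5.0, 2.0]) @ q.T
print(perturbation_test(field, k=3, scale=0.001))
```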

## Jeff Id said

#3

There’s still a lot of spatial autocorrelation in this set’s low orders, so I wouldn’t expect much. You’ve been right before, though, so what would you expect to see from the test?

#2 Ryan,

Have you seen the movie at the link above? The sea pixels stay pretty flat while the continent oscillates up and down between them. I know you understand this better than I do, but I’ll say it anyway: unlike PC1, PC2 looks for the highest equal-but-opposite covariance across the dataset, since PC1 has already removed the common signal. I think the sea pixels are so strong they outweighed the spatial autocorrelation.
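
In standard PCA, PC2 is the leading pattern of whatever variance remains once PC1's contribution has been subtracted, and it is orthogonal to PC1. This NumPy sketch checks that on random data; the function name is hypothetical and nothing here uses the satellite data.

```python
import numpy as np

def pc2_by_deflation(data):
    """Return PC1's and PC2's eigenvectors plus the leading eigenvector left after removing PC1."""
    anomalies = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    residual = anomalies - s[0] * np.outer(u[:, 0], vt[0])   # subtract the PC1 contribution
    leading_of_residual = np.linalg.svd(residual, full_matrices=False)[2][0]
    return vt[0], vt[1], leading_of_residual

rng = np.random.default_rng(5)
eof1, eof2, eof_resid = pc2_by_deflation(rng.normal(size=(100, 20)))
```

The leading pattern of the residual matches PC2 up to sign and is orthogonal to PC1, so whatever dominates the data after PC1 is removed (here, possibly the flat sea pixels) sets the shape of PC2.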

You must be right, though, that cloud noise is making a difference.

## Molon Labe said

On the one hand, I’m thinking of numerical roundoff issues making it difficult to tease out the higher eigenmodes. If a slight perturbation of the data (slight being, say, on the order of the measurement precision) gives wildly different results, then I don’t think you can have much confidence in them.

The basis of comparison would be to carry the analysis all the way through to computing temperature trends. I don’t think you can conclude anything by just comparing the shape of, say, the n’th eigenmode between the original and perturbed decomposition. The first n eigenmodes may just be a different basis for the “same” subspace, in which case the eigenmodes may be different but you’d still see the same trends.
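
Whether two sets of eigenmodes span the "same" subspace can be checked with principal angles: the cosines are the singular values of the product of the two orthonormal bases. A minimal sketch with a hypothetical helper name (SciPy's `scipy.linalg.subspace_angles` computes the angles themselves, taking column bases):

```python
import numpy as np

def subspace_cosines(eofs_a, eofs_b):
    """Cosines of the principal angles between the spans of two sets of eigenvectors.

    Rows of each input must be orthonormal. All cosines near 1 means both sets
    span (nearly) the same subspace, even if the individual eigenvectors differ.
    """
    return np.linalg.svd(eofs_a @ eofs_b.T, compute_uv=False)

rng = np.random.default_rng(6)
basis, _ = np.linalg.qr(rng.normal(size=(50, 4)))   # 4 orthonormal columns
a = basis[:, :2].T                                  # two eigenvector-like rows
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
b = rotation @ a                                    # same plane, different basis
```

Here `a[0]` and `b[0]` differ noticeably as individual vectors, yet both cosines are 1, which is the situation described above where the eigenmodes change but the retained subspace does not.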

On the other hand, consider the implications if such a slight perturbation – again mimicking measurement uncertainty – gave a significantly different temperature trend when you retain just 3 eigenmodes.
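
A rough sketch of that check: reconstruct the field from its first k eigenmodes, take a spatial mean, fit a linear trend, then repeat after a small net-zero perturbation. This is an illustration under stated assumptions, not RegEM: `truncated_trend` is hypothetical, the spatial mean weights every pixel equally (no area weighting), and the field is synthetic.

```python
import numpy as np

def truncated_trend(data, k=3):
    """Least-squares trend (per time step) of the unweighted spatial mean of a rank-k reconstruction."""
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    recon = (u[:, :k] * s[:k]) @ vt[:k] + mean
    series = recon.mean(axis=1)                 # equal weight per pixel (no area weighting)
    slope, _intercept = np.polyfit(np.arange(len(series)), series, 1)
    return slope

rng = np.random.default_rng(7)
months = np.arange(300)
# Hypothetical field: a small trend on one spatial pattern plus noise.
field = np.outer(0.002 * months, rng.normal(1.0, 0.5, size=80)) + rng.normal(size=(300, 80))
noise = rng.normal(scale=0.05, size=field.shape)
noise -= noise.mean()                           # net-zero perturbation
print(truncated_trend(field), truncated_trend(field + noise))
```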

## Ryan O said

Jeff,

Yah, the movie is telling. I had done something similar in R so I could adjust the averaging window (12-, 24-, 48-month means), and it’s always the same. The ocean inclusion definitely explains a lot; it certainly explains the coastal bands in eigenvectors 2-4. But I wonder whether that alone would wipe out the West Antarctic eigenvector, or whether adding the surface stations back in and infilling would resurrect the original shape.
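
The adjustable-window averaging mentioned here was done in R; a comparable moving average in NumPy might look like the sketch below. It is not that R code: `running_mean` is a hypothetical name and the data is random.

```python
import numpy as np

def running_mean(data, window):
    """Moving average of length `window` along the time axis of a (time x pixel) array.

    Returns len(time) - window + 1 rows; row i averages time steps i .. i + window - 1.
    """
    padded = np.vstack([np.zeros((1, data.shape[1])), data])
    csum = np.cumsum(padded, axis=0)             # running totals, one extra leading row of zeros
    return (csum[window:] - csum[:-window]) / window

rng = np.random.default_rng(8)
data = rng.normal(size=(360, 25))                # hypothetical 30 years of monthly values
smoothed = {w: running_mean(data, w) for w in (12, 24, 48)}
```

The cumulative-sum form costs the same regardless of window length, which makes switching between 12-, 24- and 48-month means cheap.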