How Many PC’s Does it Take to …
Posted by Jeff Condon on February 17, 2009
I had an idea earlier today that running RegEM on the massive set of satellite data in the Antarctic reconstruction paper may actually be equivalent to running RegEM on only 3 series. This oddity follows from Roman M’s brilliant analysis on Climate Audit, where he discovered that the satellite reconstruction data for over 5000 grid cells is entirely represented by 3 PC’s. That is a massive amount of data for so small a number of series. Each grid cell is created by multiplying the three PC curves by that cell’s three multipliers (3 x 5000 multipliers in total) and adding the results together.
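To make the "three curves times multipliers" idea concrete, here is a minimal sketch in Python with NumPy. The numbers are random and made up (this is not the Steig data), and the names `pcs`, `multipliers` and `field` are just mine for illustration. It only shows that a field built this way has rank 3 no matter how many grid cells it has.

```python
import numpy as np

rng = np.random.default_rng(0)

n_months = 600   # 1957-2006 monthly, illustrative
n_cells = 5000   # order of magnitude from the post, not the exact grid size
n_pcs = 3

# Three time series standing in for the PCs (random, for illustration only).
pcs = rng.standard_normal((n_months, n_pcs))

# One multiplier per PC per grid cell: a 3 x 5000 matrix.
multipliers = rng.standard_normal((n_pcs, n_cells))

# Every grid cell is a weighted sum of the same three curves.
field = pcs @ multipliers            # shape (600, 5000)

# The whole field therefore has rank 3, however many cells there are.
rank = np.linalg.matrix_rank(field)
print(field.shape, rank)
```

So 5000 columns of numbers can carry no more independent information than the 3 curves they were built from.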
The way the paper seems to work is to use all the complexities of the 1982-2006 satellite data, with its wide variety of covariances in relation to the 42 surface stations, in a sophisticated bounded imputation algorithm (RegEM) to reproduce data back to 1957. Well, it turns out that doesn’t seem to be the case.
I used the 3 back-calculated PC’s that RomanM derived, which extend back to 1957, and deleted all values prior to 1982. These three series were then placed next to the 42 surface station series in a matrix, and RegEM was used to reproduce the deleted portion of the 3 series.
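For anyone who wants to see the shape of that experiment, here is a rough sketch in Python. It is NOT RegEM and not the code I ran: `ridge_em_infill` is a hypothetical, much simplified stand-in (iterated ridge regression on a re-estimated covariance matrix) that leaves out parts of the full algorithm, such as correcting the covariance estimate for imputation error and choosing the regularization. The data are made up, the ridge value is arbitrary, and the stations are assumed complete, which real station records are not.

```python
import numpy as np

def ridge_em_infill(X, ridge=0.1, n_iter=50):
    """Fill NaNs by iterated ridge regression on the available columns.

    Simplified stand-in for RegEM, for illustration only.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling each gap with its column mean.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        C = np.cov(X, rowvar=False)
        # Handle all rows sharing the same missing pattern together.
        for pattern in np.unique(missing, axis=0):
            if not pattern.any():
                continue
            rows = np.where((missing == pattern).all(axis=1))[0]
            m, a = pattern, ~pattern
            Caa = C[np.ix_(a, a)] + ridge * np.eye(a.sum())
            B = np.linalg.solve(Caa, C[np.ix_(a, m)])
            X[np.ix_(rows, m)] = mu[m] + (X[np.ix_(rows, a)] - mu[a]) @ B
    return X

rng = np.random.default_rng(0)
n_months, n_stations, n_pcs = 600, 42, 3   # 600 months = 1957-2006
first_sat = 300                            # row index of January 1982

# Made-up data: three hidden curves drive both the stations and the "PCs".
latent = rng.standard_normal((n_months, n_pcs))
stations = latent @ rng.standard_normal((n_pcs, n_stations))
stations += 0.5 * rng.standard_normal((n_months, n_stations))
true_pcs = latent.copy()

# Delete the PC values before 1982, then place them next to the stations.
masked_pcs = true_pcs.copy()
masked_pcs[:first_sat] = np.nan
X = np.hstack([stations, masked_pcs])      # shape (600, 45)

filled = ridge_em_infill(X)
recreated = filled[:, n_stations:]

# Compare the recreated pre-1982 values with the ones that were deleted.
diff = recreated[:first_sat] - true_pcs[:first_sat]
rmse = np.sqrt((diff ** 2).mean())
corr = [np.corrcoef(recreated[:first_sat, k], true_pcs[:first_sat, k])[0, 1]
        for k in range(n_pcs)]
print("RMSE before 1982:", rmse)
print("correlations:", corr)
```

The layout is the point here: 42 station columns, 3 PC columns with the early half blanked out, and an infilling step that rebuilds the blank part from the stations alone.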
The series on the left are the Steig PC’s, the ones on the right represent the recreated PC’s (the data prior to 1982 in the right side graphs were calculated using RegEM).
This is the SMALL difference between the plots.
What this shows to me is that the reconstruction, with all its complex covariances, actually comes down to the RegEM infilling of 3 PC’s based on the data from 30ish pre-1980 temperature stations. To be very clear, this is a reconstruction of the AWS data for an entire continent based on RegEM from three curves!!
What makes this significant is that a reader of the paper would assume the AWS data covariance is used to determine station weighting relative to individual ground positions. Instead, there is little information here to separate the locations of the ground stations, as would be required for a proper RegEM reconstruction. Also, we can now assume that the AWS reconstruction was done on 3 PC’s rather than the entire set of data.
I think that, given the number of reads and the lack of comments, this may be a bit too confusing the way it’s written, so I added a bit more explanation.
First, a PC or principal component is basically a curve. The curves in this case were calculated to be a best match to the two reconstructions, satellite and AWS, for the Antarctic. The 3 curves each get a multiplier, which can be positive or negative, and are added together to create a reproduction of the entire field of satellite measurements. This process can do a good job of representing a field, but limiting the number of PC’s limits the level of detail in the resulting field. Still, if done right, the trend should be accurate.
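Here is a small Python sketch of that trade-off between the number of PC’s and the level of detail. The field is synthetic (8 made-up patterns plus noise) and the names are mine. It uses a plain SVD to get the PC’s, which is an assumption for illustration and not necessarily how the PC’s in the paper were computed.

```python
import numpy as np

rng = np.random.default_rng(1)
n_months, n_cells = 300, 200

# Synthetic field: 8 underlying patterns plus a little noise.
field = rng.standard_normal((n_months, 8)) @ rng.standard_normal((8, n_cells))
field += 0.1 * rng.standard_normal((n_months, n_cells))
field -= field.mean(axis=0)          # remove each cell's time mean

# SVD: columns of U (scaled by s) are the PC curves, rows of Vt the multipliers.
U, s, Vt = np.linalg.svd(field, full_matrices=False)

def reconstruct(k):
    """Rebuild the field from the first k PCs only."""
    return (U[:, :k] * s[:k]) @ Vt[:k]

# Relative error of the rebuilt field for different numbers of PCs.
errors = {k: np.linalg.norm(field - reconstruct(k)) / np.linalg.norm(field)
          for k in (1, 3, 8)}
for k, err in errors.items():
    print(k, "PCs -> relative error", round(err, 3))
```

With fewer PC’s than underlying patterns, the rebuilt field loses detail; with enough of them, it comes back almost exactly.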
From Jeff C the trend in the field from the 3 pc’s looks like this.
The above graph comes from the data presented by Dr. Steig at his website. Roman discovered this data was actually PC data and did not include the real satellite data. Jeff C simply plots all the points on a map of the Antarctic. Jeff’s graph shows a positive trend at nearly every station in the Antarctic. Still, you can see some of the detail level in the trends, which lets us know that the 3 PC’s can make a nice field. You can see the title of the graph goes back to 1957, yet the AWS data only exists from the 1980s onward.
In RegEM, a temperature station would be expected to have a high correlation with the real satellite data at the same point, so RegEM would then assign a high weighting from the nearby surface station’s historic data to that individual point, creating a reasonable spatial trend.
By using a low number of PC’s in the reconstruction rather than the actual satellite data, the likelihood of an individual station receiving proper weighting is reduced, and by my own guess it seems pretty minimal. That means that the stations in the peninsula would have their already exaggerated (from Jeff C’s post) weighting assigned to the entire reconstruction.
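To illustrate the weighting argument, here is a toy sketch in Python. Everything in it is made up: 60 grid cells on a line, three smooth patterns plus local noise, and one hypothetical station sitting at cell 20. It says nothing about the actual Steig data. It only shows how a station’s correlation can pick out its own grid cell in raw data and stop doing so once the field is reduced to 3 PC’s.

```python
import numpy as np

rng = np.random.default_rng(2)
n_months, n_cells, station_cell = 300, 60, 20
x = np.linspace(0.0, 1.0, n_cells)

# Three smooth, continent-wide patterns plus local weather noise per cell.
patterns = np.vstack([np.ones(n_cells), np.cos(np.pi * x), np.cos(2 * np.pi * x)])
field = rng.standard_normal((n_months, 3)) @ patterns
field += 0.7 * rng.standard_normal((n_months, n_cells))
field -= field.mean(axis=0)

# A station co-located with one grid cell, with a little instrument noise.
station = field[:, station_cell] + 0.1 * rng.standard_normal(n_months)

# The same field rebuilt from only its first 3 PCs.
U, s, Vt = np.linalg.svd(field, full_matrices=False)
field_3pc = (U[:, :3] * s[:3]) @ Vt[:3]

def corr_with(series, cells):
    """Correlation of one series with every column of `cells`."""
    a = (series - series.mean()) / series.std()
    b = (cells - cells.mean(axis=0)) / cells.std(axis=0)
    return (a @ b) / len(series)

corr_full = corr_with(station, field)
corr_3pc = corr_with(station, field_3pc)

# How many cells look (almost) as good a match as the best one?
near_full = int((corr_full > corr_full.max() - 0.05).sum())
near_3pc = int((corr_3pc > corr_3pc.max() - 0.05).sum())
print("best-matching cell, raw data:", int(corr_full.argmax()))
print("cells near the best match, raw vs 3 PCs:", near_full, near_3pc)
```

In this toy setup the raw field gives the station one clear home cell, while the 3-PC field leaves several cells that match it about equally well, which is the kind of lost location information I am describing.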
The first ten times I read the paper I assumed real data was used to properly match surface station data to the satellite data. This result clearly shows that the processed data from 3 pc’s was used to create the historic trends for all the satellite record. My result above has a very small error in the reconstruction which shows that the reconstructions of the pre-satellite data can be created with minimal concern for the weighting of individual stations.
This isn’t final proof or anything but it is strong evidence to me that the peninsula stations are likely exaggerating the trend of the entire reconstruction.