Correlation of Reconstructions
Posted by Jeff Condon on February 25, 2009
One of the key issues with the Steig09 RegEM algorithm is the assumption that correlation will create appropriate location-based weightings for the reconstructed trend. The correlation in the AWS (automated weather station) reconstruction is based on the signal in the AWS and manned surface station data. SteveM used an algorithm on CA to show the correlation vs distance of the surface station data prior to RegEM, and this post applies the same algorithm. RegEM infills missing data across the whole matrix. So while Steig09 presented the AWS reconstruction as a verification of the satellite reconstruction, the missing surface station data were also reconstructed. The first three graphs are therefore similar in shape to SteveM's result, but they are actually based on the RegEM output.
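SteveM's script isn't reproduced here. What follows is only a minimal sketch of a generic correlation vs distance calculation. It assumes an infilled matrix with months in rows and stations in columns, plus station coordinates in degrees. The function names and the haversine great-circle distance are illustrative choices, not taken from SteveM or Steig09. Each unique station pair yields one (distance, correlation) point on the scatter plot.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(np.asarray(lon2) - np.asarray(lon1))
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def correlation_vs_distance(series, lats, lons):
    """Return (distance_km, pearson_r) for every unique station pair.

    series: (n_months, n_stations) array with no missing values,
            e.g. a RegEM-infilled matrix (assumed layout).
    lats, lons: station coordinates in degrees, one per column.
    """
    corr = np.corrcoef(series, rowvar=False)      # station-by-station Pearson r
    i, j = np.triu_indices(series.shape[1], k=1)  # each pair counted once
    dist = great_circle_km(lats[i], lons[i], lats[j], lons[j])
    return dist, corr[i, j]

# Synthetic usage example (made-up data, not station records).
rng = np.random.default_rng(0)
x = rng.standard_normal(120)
series = np.column_stack([x, 2.0 * x + 1.0, rng.standard_normal(120)])
lats = np.array([-75.0, -80.0, -70.0])
lons = np.array([0.0, 90.0, -60.0])
dist, r = correlation_vs_distance(series, lats, lons)
```

Plotting `r` against `dist` gives the kind of scatter discussed in this post. Clipping inside the haversine guards against tiny floating-point overshoots above 1.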
This is a plot of correlation vs distance for the full surface station record as reconstructed by RegEM.
The second graph uses only the data from 1980 onward, and it is almost exactly the same.
The third plot, of the pre-1980 surface station data, involves quite a bit more infilling yet still retains a good amount of data.
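The second and third plots come from splitting the record at 1980. A minimal sketch of that split follows. It assumes monthly rows labelled by year and a raw (pre-RegEM) matrix that marks missing values as NaN. The 1980 cutoff comes from the post; the function names are hypothetical. Comparing the NaN fraction of the raw matrix in each block is one simple way to quantify how much more infilling the pre-1980 period needed.

```python
import numpy as np

def split_by_year(years, values, cutoff=1980):
    """Split the rows of a (n_months, n_stations) matrix at a cutoff year.

    Rows whose year is before `cutoff` go to the first block, the rest to the second.
    """
    before = np.asarray(years) < cutoff
    return values[before], values[~before]

def infill_fraction(raw):
    """Fraction of entries missing (NaN) in the raw, pre-RegEM matrix."""
    return float(np.isnan(raw).mean())

# Toy example: 4 monthly rows, 2 stations; NaN = missing before infilling.
years = np.array([1978, 1979, 1980, 1981])
raw = np.array([[np.nan, 1.0],
                [np.nan, np.nan],
                [1.0, 2.0],
                [3.0, np.nan]])
raw_pre, raw_post = split_by_year(years, raw)
print(infill_fraction(raw_pre), infill_fraction(raw_post))  # 0.75 0.25
```

The same `split_by_year` call applied to the infilled matrix gives the pre- and post-1980 blocks whose pairwise correlations are plotted against distance.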
Judging visually from the plots above, RegEM did a good job of correlating the infilled stations according to region, although there are a few outliers (note the near-1 correlations at distances greater than 2000 km). This is therefore the pattern we should expect to see in the other reconstructions.
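The outliers mentioned above can also be counted rather than spotted by eye. This minimal sketch assumes pair distances and correlations have already been computed. The 2000 km distance comes from the post. The 0.9 cutoff for "near 1" is an arbitrary assumption, and the function names are hypothetical. The resulting fraction gives one number per plot that can be compared between the surface station and AWS reconstructions.

```python
import numpy as np

def distant_high_pairs(dist_km, corr, min_km=2000.0, min_r=0.9):
    """Indices of station pairs farther apart than min_km with correlation >= min_r."""
    dist_km = np.asarray(dist_km, dtype=float)
    corr = np.asarray(corr, dtype=float)
    return np.flatnonzero((dist_km > min_km) & (corr >= min_r))

def distant_high_fraction(dist_km, corr, min_km=2000.0, min_r=0.9):
    """Share of pairs beyond min_km whose correlation is >= min_r (0.0 if none)."""
    dist_km = np.asarray(dist_km, dtype=float)
    corr = np.asarray(corr, dtype=float)
    far = dist_km > min_km
    if not far.any():
        return 0.0
    return float((corr[far] >= min_r).mean())

# Made-up pairs for illustration only.
dist = [500.0, 2500.0, 3000.0, 2100.0]
corr = [0.95, 0.97, 0.40, 0.91]
print(distant_high_pairs(dist, corr))  # [1 3]
print(distant_high_fraction(dist, corr))
```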
The important part of the Steig09 AWS-surface station reconstruction was the AWS data, which was far sparser. This data was used to verify the satellite trend. The next three graphs are all created from automatic weather station data after RegEM infilling.
Of course we would expect some increase in noise, but this plot shows near-1 correlations for many station pairs up to 3000 km apart. This demonstrates a significant spreading of individual stations' influence.
The next plot uses only the post-1980 data, which is the only period in which any automatic weather station data existed. Here you would expect correlation to track distance better.
Finally, the pre-1980 data, which is entirely reconstructed by RegEM.
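The following toy illustrates why a period that is entirely reconstructed can show near-1 correlations regardless of distance. It is a deliberately oversimplified stand-in that uses ordinary least squares on a single predictor; it is not RegEM, and all series are synthetic. Two "stations" share only a weak common signal. Each is observed in the second half of the record only and infilled in the first half from the same predictor. Both infilled series are then straight-line functions of one input, so their correlation in the infilled period is exactly ±1, whatever their true relationship.

```python
import numpy as np

def regress_infill(predictor, target, observed):
    """Fill `target` where not observed, from an OLS fit on `predictor` where observed."""
    X = np.column_stack([np.ones_like(predictor), predictor])  # intercept + slope
    coef, *_ = np.linalg.lstsq(X[observed], target[observed], rcond=None)
    filled = target.copy()
    filled[~observed] = X[~observed] @ coef
    return filled

rng = np.random.default_rng(42)
n = 240
predictor = rng.standard_normal(n)            # synthetic well-sampled series
a = 0.5 * predictor + rng.standard_normal(n)  # two synthetic "stations" sharing
b = 0.3 * predictor + rng.standard_normal(n)  # only a weak common signal
observed = np.arange(n) >= n // 2             # only the later half is observed

a_f = regress_infill(predictor, a, observed)
b_f = regress_infill(predictor, b, observed)
early = ~observed
r_true = np.corrcoef(a[early], b[early])[0, 1]       # weak true correlation
r_recon = np.corrcoef(a_f[early], b_f[early])[0, 1]  # magnitude 1 by construction
print(round(r_true, 2), round(r_recon, 2))
```

With more predictors, the infilled series become combinations of several inputs and the effect is weaker. The toy only shows the direction in which too little constraining information pushes the correlations.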
Clearly the pre-1980 reconstruction has some serious problems. I would have thought that an analysis relying so heavily on correlation to determine proper weighting would include at least some minimal verification that appropriate weightings were used. In my world I'd be stuck with a product that didn't work. In climatology it doesn't seem to matter.
The key point is that the correlation vs distance pattern of the manned surface station data (first three graphs) is consistent between the reconstructed and original data. It therefore represents the pattern you would expect from proper station weighting. Since the AWS reconstruction doesn't reasonably match the surface stations, there will be an unaccounted-for 'bleed-through' of the trends of some stations into others. This is confirmed again by the cluster of near-1 correlations in the pre-1980 AWS data (last graph), which shows that insufficient information was provided to constrain the expectation maximization algorithm.
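One way to make the "consistent vs not matching" comparison less dependent on eyeballing scatter plots is to summarize each plot by its median correlation in distance bins and then compare the profiles bin by bin. This minimal sketch assumes pair distances and correlations are already computed for each dataset. The 500 km bin width and 4000 km range are arbitrary choices, not taken from Steig09 or SteveM, and the function name is hypothetical.

```python
import numpy as np

def binned_median(dist_km, corr, bin_km=500.0, max_km=4000.0):
    """Median correlation in fixed-width distance bins (NaN where a bin is empty).

    Pairs at or beyond max_km are ignored.
    """
    dist_km = np.asarray(dist_km, dtype=float)
    corr = np.asarray(corr, dtype=float)
    edges = np.arange(0.0, max_km + bin_km, bin_km)
    idx = np.digitize(dist_km, edges) - 1  # bin index for each pair
    medians = np.full(len(edges) - 1, np.nan)
    for k in range(len(medians)):
        in_bin = idx == k
        if in_bin.any():
            medians[k] = np.median(corr[in_bin])
    return edges, medians

# Made-up pairs for illustration only.
edges, med = binned_median([100.0, 200.0, 600.0, 1200.0],
                           [0.8, 0.6, 0.5, 0.2],
                           bin_km=500.0, max_km=1500.0)
print(med)  # [0.7 0.5 0.2]
```

Subtracting the surface station profile from the AWS profile, bin by bin, shows at which distances the AWS correlations stay too high.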
The fact that the AWS stations show such extreme scatter confirms that the RegEM reconstruction did not properly account for area weighting in this case and cannot reasonably be used for trend-based conclusions, as was done in Steig09. I really wonder how this reconstruction could verify the satellite result.