the Air Vent

Because the world needs another opinion

Correlation of Reconstructions

Posted by Jeff Id on February 25, 2009

One of the key issues with the Steig09 RegEM algorithm is the assumption that correlation will create appropriate location-based weightings for the reconstructed trend. The correlation in the AWS (automated weather station) reconstruction is based on the signal in the AWS and manual surface station data. This post uses the same algorithm SteveM employed on CA to demonstrate the correlation vs distance of the surface station data prior to RegEM. RegEM infills the missing data throughout the whole matrix, so while Steig09 presented the AWS reconstruction as a verification of the satellite reconstruction, the missing surface station data was also reconstructed. Therefore the first three graphs are similar in shape to SteveM’s result but are actually based on the RegEM-infilled data.
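For readers who want to reproduce this kind of plot, the calculation can be sketched roughly as follows. This is a minimal Python sketch and not SteveM’s actual script: the function names, the use of the haversine formula for distance, and the assumption that the input matrix is already infilled (no missing values) are all mine.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def correlation_vs_distance(series, lats, lons):
    """Return (distance_km, correlation) for every distinct station pair.

    series: array of shape (n_times, n_stations) with no missing values,
            i.e. a hypothetical already-infilled data matrix.
    lats, lons: station coordinates in degrees.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    corr = np.corrcoef(series, rowvar=False)    # station-by-station matrix
    i, j = np.triu_indices(corr.shape[0], k=1)  # each pair counted once
    dist = great_circle_km(lats[i], lons[i], lats[j], lons[j])
    return dist, corr[i, j]
```

Plotting the two returned arrays against each other as a scatter gives a correlation vs distance plot of the kind discussed in this post.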

This is a plot of the correlation of the total surface station data as reconstructed by RegEM.

antarctic-correlation-vs-distance-aws-surface-total

The second graph is just the more recent 1980 onward data which is almost exactly the same.

antarctic-correlation-vs-distance-aws-surface-post-1980

The third plot of pre-1980 surface station data has quite a bit more infilling yet retains a good amount of data.

antarctic-correlation-vs-distance-aws-surface-pre-1980

Judging visually from the above plots, RegEM did a good job of correlating the infilled stations by region, although there are a few outliers (see the near 1 correlations at distances greater than 2000 km). This is therefore a pattern we should see in the other reconstructions.

The important part of the Steig09 AWS-surface station reconstruction was the AWS station data, which was far more sparse. This data was used as verification of the satellite trend. The next 3 graphs are all created from automatic weather stations after RegEM infilling.

antarctic-correlation-vs-distance-aws-total

Of course we would expect some increased noise, but this plot shows a near 1 correlation for a lot of stations up to 3000 km apart. What this demonstrates is significant spreading of individual stations’ influence.

The next plot is the post 1980 data only which is the only time frame where any automatic weather station data existed. You would expect better correlation with distance.

antarctic-correlation-vs-distance-aws-post-1980

It’s better because some of the data drops back from the near 1 correlation, but it is still not very good, with only slight improvement over the total dataset.

Finally, the pre-1980 data. This data is totally reconstructed by RegEM.

antarctic-correlation-vs-distance-aws-pre-19801

Clearly the pre 1980 reconstruction has some serious problems. I would have thought that an analysis which relies so heavily on correlation to determine proper weighting would have some minimal verification that appropriate weightings were used. In my world I’d be stuck with a product that didn’t work. In climatology it doesn’t seem to matter.

The problem is that the correlation vs distance of the manned surface station data (first 3 graphs) is coherent between reconstructed and original data. Therefore, this represents the pattern you expect for proper station weighting. Since AWS didn’t match the surface stations reasonably, there will be an unaccounted for ‘bleed through’ of the trend of some stations into others. This is verified again by the collection of near 1 correlations of the pre 1980 AWS data (last graph) which demonstrates that insufficient information was provided to constrain the expectation maximization algorithm.

The fact that the AWS stations show such extreme scatter confirms that the RegEM reconstruction did not properly account for area weighting in this case and cannot be reasonably used for trend based conclusions as done in Steig09. I really wonder how this reconstruction could verify the satellite result.

26 Responses to “Correlation of Reconstructions”

  1. Jeff, the reason for the differences is that there is a lot more actual data in the surface network than the AWS network and the RegEM version keeps actual data. If you do a similar scatter plot for the AVHRR “representation” that we have, you get something that looks like your last scatter plot. This is then imposed on the surface data.

  2. Jeff C. said

    I’m turning this over in my mind to try to understand the implications. The measured surface data (the most complete, measured data set that we have) without any infilling shows a strong distance correlation. This is our standard against which to judge the merits of the infilling in other data sets. If more infilled data leads to the distance correlation degrading, doesn’t that mean the infilled data must be in error?

    Setting aside the AWS for a moment, if you compare Steve’s first surface scatter plot (which I believe has no infilling) with the top plot in this post (which has infilling), the distance correlation has degraded. I think that demonstrates that RegEM is not smart enough to avoid spurious correlations without explicit distance knowledge. It may be my own confirmation bias, but I’m not sure how else to read it.

  3. Jeff Id said

    #2 that’s exactly as I see it. SteveM already did the satellite version of this calc after his comment at #1, and it looks the same as the AWS. I see this as a big problem myself because if the area of influence of a temperature point is unnaturally spread without substantial control, the reconstruction is not appropriately considering its weighting in the trend. In the sat version, this means there are too few PC’s.

  4. James Mayeau said

    Jeff Id are you to the point where you could take data from 46 US surface stations, and eigenvector a temperature plot for the rest of the country, to test Steig’s method of manipulation against real life temperature records?

  5. Layman Lurker said

    #4

    James, IMO I think that might be difficult (if not impossible) to do without being able to duplicate the geographic and climatic circumstances of Antarctica and also how the surface stations are situated on the island. There would be different variables (or more, or fewer) meaning different numbers of PC’s etc.

    It may be possible, however, to do some kind of case study of a suitable area of the world to examine whether some of the same issues arise which have been suggested such as distance correlation problems, station weighting, etc. Still, if you read the latest post at CA: http://www.climateaudit.org/?p=5326#comments you will see that even some of these narrower issues could be related to variables which would be hard to find outside of Antarctica.

  6. Neil Fisher said

    James, IMO I think that might be difficult (if not impossible) to do without being able to duplicate the geographic and climatic circumstances of Antarctica and also how the surface stations are situated on the island.

    Surely several attempts using different areas could help here? USA, Europe, Australia, Asia – there are surely plenty of places with temperature records we could decimate and then attempt to re-construct. Hell, you could use the same area and use different stations as your “data” and compare as well. This would give an idea of how “good” or “bad” the general method is, would it not?

  7. Neil Fisher said

    Further to my previous post:
    Surely this should have been done *before* Steig et al was given cover page status?

  8. James Mayeau said

    Surely this should have been done *before* Steig et al was given cover page status?

    Amen brother.
    But I have a feeling Nature Mag and the AP aren’t really all that picky when it comes to evidence confirming global warming.

  9. Jeff Id said

    #4 I can do it but I’m not sure the value. I tend to get caught up in my own thoughts on these things. After JeffC’s recent post on CA I want to redo the above analysis using regpar=7 data to see if it does a better job.

    BTW: I’ve started playing around with artificial data with expected correlation to try and understand how the long term trends are copied, so far RegEM does a lousy job with certain patterns but nothing worth posting yet. I hope this sort of analysis has the potential to do a better job getting right at the heart of the matter than temp stations.

  10. Jeff C. said

    #9 – FYI I picked regpar = 7&8 because I wanted to use some of the 6 lobe Chladni patterns in the recon (assuming that the whole Chladni hypothesis holds water). I was thinking these might be able to contain the smearing as the lobe size was consistent with the size of the localized phenomena (i.e. peninsula warming). It may work as well with regpar set to a lower value but I have not had a chance to try it.

    I left neigs unset (default to maximum) and it took about 80 iterations to converge using regpar = 7.

  11. mike freeman said

    Ok guys…the Jeffs, Steve M et al….

    I’m not a staticticthingy person as I cannot even spell it.
    You have all been poring over Steig’s report for a while now. In the UK the BBC is all over the story that the world enviro ministers are in Antarctica right now to witness ‘global warming’ because, very conveniently, a new study suggests that Antarctica is warming after all. Now I can imagine the scheduling of this trip was prob made, let’s guess, a year ago maybe? Or maybe 6 months?
    Anyway, am sure such a trip is not something drummed up on a Thursday night (“hey Hillary” - that’s our enviro bloke, not the famous ‘Mrs I used to be the next President of the USA’ - “wotcha doin’ this weekend, fancy popping down to Antarctica, me an’ me mates thought it would be summat to do eh?”)
    And so they arrive just after Steig’s report? Gosh they are lucky Steig’s report - the outcome of which must have been unknown to Steig when they started gathering their data (….ahem) - didn’t say “no global warming to see here, move along…”

    So the question to you guys is….you have been playing with the data now (bless you) for a while. To a layman….is the Steig report bogus, yes or no?
    I’m not big on coincidences so our politicos arriving there at this time makes my eyebrows twitch. But despite trying really really really hard to follow your blogs & comments….you guys are speaking a diff language to me. I get the fact that the Hockey Stick was made from taking a perhaps unrepresentative proxy in lieu of others which would not show the ‘desired’ effect….so are you guys confident to say that Steig’s paper is also guilty of this?

  12. Tim G said

    Maybe I don’t have the context. But I’m not sure that absolute distance between stations is the correct measure. It seems that (at least for locations in the interior) the latitude (or relative distance from the South Pole) might make more sense. Points equidistant from the South Pole each receive the same amount of sunlight each year.

    Just a thought.

    –t

  13. Layman Lurker said

    #9

    “I can do it but I’m not sure the value”

    If you decided to do this you might be better off constructing your own scenario to test with controlled signals, numbers of variables, etc. so that you are not having to put up with false starts and mental somersaults over interpretations yahdah yahdah.

  14. Jeff Id said

    #11 There are a lot of problems in the paper. There are two halves to it, we have only been given the data for the first half and are guessing at the code. The first half was used as a verification of the conclusions not to make the conclusions so we can’t really discuss the final result without the data.

    However, Steig’s reluctance to provide the data and code, combined with the real and I think substantial problems found in the correlation, indicates to me that the long term trends they calculated cannot be trusted, as they are quite possibly artifacts of an unnecessarily over-complex and unverified statistical process, the complexity of which makes me a bit suspicious all by itself.

    Why the crazy math? What’s wrong with an area weighted average? It worked when I went to school.

    Of course the other guys have their own opinions.
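For reference, one common form of the area weighted average mentioned above can be sketched in a few lines. This is only an illustration under my own assumptions: station data already binned onto a regular latitude-longitude grid, cells weighted by the cosine of latitude, and a hypothetical function name. It is not a method taken from Steig09 or from any commenter’s code.

```python
import numpy as np


def area_weighted_mean(cell_values, cell_lats_deg):
    """Average gridded values, weighting each cell by cos(latitude).

    On a regular latitude-longitude grid the area of a cell is
    proportional to the cosine of its central latitude.
    Cells holding NaN (no data) are skipped.
    """
    values = np.asarray(cell_values, dtype=float)
    weights = np.cos(np.radians(np.asarray(cell_lats_deg, dtype=float)))
    valid = ~np.isnan(values)
    return np.sum(values[valid] * weights[valid]) / np.sum(weights[valid])
```

Applied to a trend value per grid cell, this gives a continental trend in which a dense cluster of stations cannot count for more than the area it actually covers.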

  15. James Mayeau said

    Here is my thinking. What Steig did in effect is to take some remote locations and extrapolate these temps to the wider area.
    Let’s say he did the same with the United States. The Antarctic peninsula would be an analog of Florida, so if Steig did the same thing with the USA he would have a handful of temp readings from, for instance, Miami, Tampa, Fort Lauderdale, Mobile, et cetera. From there he would extrapolate that Billings Montana, Denver Colorado, St. Paul Minn., et cetera, had such and such a temperature trend.

    Intuitively I know it’s total nonsense. You can’t tell what the temp was in Sacramento (where I live) from a San Francisco Los Angeles Redding extrapolation, but intuition doesn’t carry much weight in the world of science.
    So what we need are a few (was it 46 that they used in Antarctica?) station series from the warmer parts of the US, and then to use Steig’s procedure to extrapolate what the temperature of the rest of the country would be according to Steig.
    Then when those emulated temps don’t match the reality of Montana Idaho Maine and New Jersey we have a concrete demonstration of just how useless Steig et al is as a study to base policy on.

    You see the point?

  16. I think that demonstrates that RegEM is not smart enough to avoid spurious correlations without explicit distance knowledge. It may be my own confirmation bias, but I’m not sure how else to read it.

    Jeff, here’s a possible explanation for the weirdness in the AVHRR correlations. The recon is only rank 3. That means that each of the 5509 columns is a linear combination of only 3 series (of length 600). There’s only so many ways that you can combine 3 series, so when you try to make 5509(!) different combinations with a length of only 600, you’ve essentially got more pegs than holes and it seems to me that it will be an absolute cauldron of spurious correlation. The archived data had 99.9% correlations between series over 3000 km apart. If, as seems likely, they used the rank 3 version in the meatgrinder, this may turn out to be a real mess.
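The rank argument can be illustrated with purely synthetic numbers. The sketch below is an assumption-laden toy, not the AVHRR data: it builds a random rank 3 matrix with 600 rows (fewer columns than 5509, to keep it quick) and compares the share of column pairs with very high correlation against a matrix of independent noise. The function names, the 0.99 threshold and the column count are all my own choices.

```python
import numpy as np


def fraction_high_correlation(data, threshold=0.99):
    """Fraction of distinct column pairs with |correlation| > threshold."""
    corr = np.corrcoef(data, rowvar=False)
    i, j = np.triu_indices(corr.shape[0], k=1)
    return float(np.mean(np.abs(corr[i, j]) > threshold))


def random_low_rank(n_times, n_cols, rank, rng):
    """Each column is a random linear combination of `rank` random series."""
    basis = rng.normal(size=(n_times, rank))   # the few underlying series
    coeffs = rng.normal(size=(rank, n_cols))   # per-column mixing weights
    return basis @ coeffs


rng = np.random.default_rng(0)
low_rank = random_low_rank(600, 300, 3, rng)   # toy stand-in for a rank 3 recon
full_noise = rng.normal(size=(600, 300))       # independent columns for contrast

frac_low = fraction_high_correlation(low_rank)
frac_noise = fraction_high_correlation(full_noise)
print("rank 3 matrix, share of pairs with |r| > 0.99:", frac_low)
print("independent noise, share of pairs with |r| > 0.99:", frac_noise)
```

In the toy case the columns carry no location information at all, so any near 1 correlations in the rank 3 matrix arise from the limited rank alone.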

  17. Layman Lurker said

    #16

    Yes Steve, it is along the logic lines of David Stockwell’s post on your Chladni thread: “Bias introduced by finite eigenvector choice”. Would it be possible to compare a hypothesized projection for a scattergram of distance correlation with Chladni patterns to those found by Jeff on this post?

  18. Layman Lurker said

    #15

    James, read CA’s post here: http://www.climateaudit.org/?p=5326

    It appears that PCA analysis of an autocorrelated time series might yield PC’s with predictable “Chladni” patterns – some with warming and cooling lobes. All warming for a given PC would then be forced into its “lobe” regardless of orientation of original station data where warming has occurred (ie: peninsula). This accomplishes pretty much what you suggest in your post.

  19. Layman Lurker said

    Further to #18:

    James and Mike, while these Chladni patterns may “smear” warming from the peninsula to its “lobes” on the continent, these could not likely be replicated in the US because Chladni patterns reflect autocorrelation on a “disk”. This disk pattern may also be a function of the relative symmetry of the continent around the pole, so it may be difficult to go anywhere else in the world to replicate this – Australia for example is an island but does not have the symmetry of latitude that Antarctica does.

    There is still much work ahead (for team #2!) to confirm that this is or is not happening. It is important to note that these patterns, if they exist, are likely not the physical processes which Steig speaks of. They would be artifacts of PCA analysis of “disk” autocorrelation. If this proves to be the case, one has to give the benefit of the doubt to Steig that he was not aware of this potential problem. His arm waving of projecting physical processes onto the first two PC’s may have been confirmation bias on his part however.

  20. Layman Lurker said

    Sorry for clogging things up Jeff, the wheels are spinning (and gears grinding) a bit here so bear with me.

    Thinking about the Chladni patterns still, it seems to me that the weird shape of PC3 of the sat reconstruction makes more sense in this frame of reference. If you look at the pre 1982 section of PC3 and look at the wiggles with a magnifying glass, they look like a shrunken version of your Jeff C. grid weighted PC3 in the same time frame. Is this an artifact of the low weighting of the peninsula seen in the Chladni eigenvectors? If it is, then would this not be confirmation of the Chladni pattern? This effect is seen because there are a disproportionate number of surface stations (17 of 42) being shrunk compared to the geographic area affected by the low eigenvector weighting. Funny thing is I originally thought that the grid weighting constrained the ability of the peninsula to project disproportionate warming onto the continent, but with Chladni eigenvectors it is the opposite, external grid weighting would actually offset the shrinking imposed by the eigenvector weight configuration. I’m not sure I have got my head around PCA analysis yet so tell me if I am out in left field here.

  21. James Mayeau said

    I’m not sure I have got my head around PCA analysis yet so tell me if I am out in left field here.

    This is the problem in a nutshell. Lurker, you have been paying attention to this issue, and yet it still eludes understanding.
    Now imagine trying to explain Chladni eigenvectors to a person who is rebutting you with “Antarctica is warming too”.
    You will lose the argument, not because Antarctica is really melting, but because the attention span of the audience will be exceeded.

    This is why a Steig type recon of the US landmass is useful, possibly essential.

    Antarctica is a special case study for the basic proposition “Does co2 cause warming of the atmosphere?” and its follow up questions, “If so how much?”, and “is this amount dangerous?”

    Lubos Motl put it that in science the ideal understanding comes when you remove as many variables as possible to study the object.
    CO2 is supposedly well mixed, therefore its effect will be in the temperature record even over Antarctica. Over every other part of the globe the signal is masked by “natural variability”, but Antarctica is widely known to be the one place cut off from “natural variability” by circumpolar wind currents. Also it is the driest continent so those are two aspects of climate which are removed from temperature studies of Antarctica.
    Even Steig’s minuscule warming created by a flawed study points to much lower co2 sensitivity than would be dangerous. Without it we are talking about no effect at all.

  22. Layman Lurker said

    James, you are right about people’s eyes glazing over with technical explanations. One cannot escape from the technical limitations here though. The whole issue with Chladni patterns in Antarctica is that the conditions that give rise to them – autocorrelation of time series on a disk – likely cannot be found elsewhere in the world. It is likely a function of the unique geography of Antarctica. One cannot take this to the US and apply a test case because you will not generate Chladni patterns, therefore you will not generate the artifacts that cause the warming bias. It’s unfortunate but it is what it is. BTW, I am totally open to counter arguments and the possibility that I am wrong on this.

    I don’t think disproving this paper would show anything other than Antarctic warming has not been proven. Even if the Steig paper is shot down, the warmers will just retreat to their prior position – lack of warming in the Antarctic is not inconsistent with GCM’s, the ozone hole, etc. And maybe they are right. However, the “Team” could take a shot across the bow with this one. And who knows where that would lead.

    As far as losing arguments with others who don’t get the science, is that ever going to change? We’re talking about people who generally have closed minds on this issue. Hell, many will think that “big oil” have paid off the Jeffs, Steve, etc. to manufacture some dubious claims. Or they will say that only actual climate scientists are capable of producing quality work. If they have open minds then they will open up to understanding the issues. BTW, many people who are not close minded are coming to sites like this one and CA in droves. What a treat they have witnessed over the last month.

  23. Jeff Id said

    #22 As I understand it the eigen patterns will happen on any spatially autocorrelated data. If you take a circular area from the center of the US and apply the same technique, you will get similar patterns in most cases although the axes of pc2 and 3 will shift according to the data.

    My stat terminology may be wrong but here it goes. The pc process works by finding the most covariance of the dataset. So for pc1 the covariance ends up representing the trend of the data. After that main trend is found it is subtracted from the trend at each temp station (residuals remain). Since the primary trend is removed from the residuals and the remaining maximum covariance is found the second time, the data is split in half – half positive, half negative causing the two lobed boundary pattern which is again subtracted for the residual. Of course since that is removed applying the analysis again will produce an orthogonal right angle split as the next maximum covariance. Since both directions in the two lobed pattern are removed the next step is a split of the two lobed patterns which results in the four lobed patterns aligned and at 45 degrees to the two lobed.

    I’m sure you’ve read this link but others may not have.

    http://www.climateaudit.org/scripts/toeplitz/circle.pdf

    Anyway these patterns should be found in spatially autocorrelated data.

    If the data is not spatially autocorrelated, i.e. one station is not related to an adjacent one the PC1, 2, … lobes still exist in the method but the plot of the lobes will be randomized so a graphic output doesn’t show any workable detail.

    Since all temp stations are related, i.e. it’s nearly the same temp here as 20 miles from here, the patterns are an artifact of the method rather than the natural world. — Stick a fork in it.. :)
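The claim that these lobes fall out of the method can be checked on synthetic data with no physics in it. The sketch below is a toy under stated assumptions: random points on a unit disk, a covariance that simply decays exponentially with distance, and eigenvectors taken directly from that covariance matrix (which is what PCA would recover from a long enough series with that covariance). The function names and the length scale are hypothetical choices of mine, not anything from Steig09 or CA.

```python
import numpy as np


def disk_points(n, rng):
    """Sample n points uniformly inside the unit disk."""
    r = np.sqrt(rng.uniform(size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))


def leading_eigenvectors(points, length_scale=1.0, k=3):
    """Top-k eigenpairs of an exponential-decay spatial covariance."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    cov = np.exp(-dist / length_scale)        # nearby points covary strongly
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]     # keep the k largest
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
pts = disk_points(200, rng)
vals, vecs = leading_eigenvectors(pts)

for m in range(vecs.shape[1]):
    n_pos = int((vecs[:, m] > 0).sum())
    print(f"PC{m + 1}: {n_pos} positive weights, {len(pts) - n_pos} non-positive")
```

The first eigenvector weights every point with the same sign, while the later ones must split the disk into positive and negative regions to stay orthogonal to it. Plotting the weights in `vecs` at the locations in `pts` shows where those regions fall.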

  24. Layman Lurker said

    Thanks for the clarification Jeff. I respectfully stand down and retreat to my corner for a crow feast.

    My speculation was that because Antarctica was at the pole, that this autocorrelation would reflect symmetry of a disk around its center. In other words, you can go five miles any direction from the pole (near the center of the island or disk) and be at the same latitude. Such a phenomenon would be unique to Antarctica as an island at the pole. Therefore the autocorrelation would be at work for all points equidistant from the disk’s center as well as between individual points on the disk.

  25. James Mayeau said

    Even if the Steig paper is shot down, the warmers will just retreat to their prior position – lack of warming in the Antarctic is not inconsistent with GCM’s, the ozone hole, etc. And maybe they are right.

    I enjoy the thought of warmers retreating.
    Yep.
    Let me try for a minute and think of a downside…

    Nope. I’m still good with it.

  26. page48 said

    “Since all temp stations are related, i.e. it’s nearly the same temp here as 20 miles from here, the patterns are an artifact of the method rather than the natural world. — Stick a fork in it..”

    I think I’m beginning to get the gist of PCA and its limitations. I’ve been slogging through books for a few months, now, and, yes – I think I am beginning to understand the process! Yea!
