More on CPS Signal Distortion

I’ve been playing around with CPS some more, trying to figure out how to correct for the signal de-amplification created by correlation-based proxy sorting. As I explained in Historic Hockey Stick – Pt 2 Shock and Recovery, correlation does not respond linearly to noise. I’ve rewritten that post several times to improve it, so it reads completely differently than before. This post uses the same 10000 ARMA simulated proxies as noise, with a new signal added that has a square wave in the historic portion. Averaging over the square wave, which spans 200 years, gives a clean measure of the recovered signal magnitude. Figure 1 shows the shape of the signal used.

Figure 1, Artificial Signal (signal added to proxies)

To explore the effects of different noise levels, the ARMA data was scaled by a multiplier running from 0 to 1 in 0.01 steps, so each calculation used the Fig 1 signal + proxy * multiplier. This had the effect of adding noise levels from 0 to 1 times the proxy data. Since the proxy data had a standard deviation very close to 1, you can think of the multiplier as the standard deviation of the noise.
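The noise sweep described above can be sketched in Python as follows. This is only an illustration, not the code used for this post: the series length, calibration window, signal shape, AR(1) coefficient and the names `make_signal`, `make_ar1_noise` and `noisy_proxies` are all assumptions, and far fewer than 10000 proxies are generated to keep it quick.

```python
import numpy as np

rng = np.random.default_rng(0)

N_YEARS = 1000      # assumed series length
N_PROXIES = 200     # assumed; the post used 10000 simulated proxies


def make_signal(n_years=N_YEARS):
    """Hypothetical stand-in for Figure 1: a 200-year square wave in the
    historic portion plus a linear ramp over the last 100 years."""
    s = np.zeros(n_years)
    s[400:600] = 1.0
    s[n_years - 100:] = np.linspace(0.0, 1.0, 100)
    return s


def make_ar1_noise(n_proxies, n_years, phi=0.9):
    """AR(1) red noise, one row per proxy, rescaled to unit standard deviation.
    (The post used ARMA proxies; AR(1) with phi=0.9 is an assumption.)"""
    x = rng.standard_normal((n_proxies, n_years))
    for t in range(1, n_years):
        x[:, t] += phi * x[:, t - 1]
    return x / x.std(axis=1, keepdims=True)


signal = make_signal()
noise = make_ar1_noise(N_PROXIES, N_YEARS)

# Multiplier from 0 to 1 in 0.01 steps: 101 noise levels in total.
multipliers = np.linspace(0.0, 1.0, 101)


def noisy_proxies(m):
    """Signal + proxy * multiplier, as described in the text."""
    return signal + m * noise


print(len(multipliers), noisy_proxies(0.5).shape)
```

Because each noise series is rescaled to unit standard deviation, the multiplier can be read directly as the noise standard deviation, which matches the interpretation given above.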

Since we have 101 individual CPS reconstructions of varying quality, a surface plot does a good job depicting the shape change in the recovered signal. The RMS axis is noise/signal because the plotting program needed ascending values. Each of the individual reconstructions is plotted on the time axis.

Figure 2, Surface plot of CPS reconstructions at different noise levels

From Figure 2, the non-linearity of the response is apparent. The CPS method with no noise results in a perfect reconstruction, and as the noise level is increased the values appear to drift asymptotically toward a zero historic signal magnitude. This demonstrates that higher noise levels will result in a ‘more unprecedented’ calibration range signal.

So then I plotted the magnification factor of the signal recovery for each reconstruction in Figure 3.

Figure 3, Demagnification of Historic Signal as a Function of Noise (magnification factor of historic signal vs. sd)

This again clearly demonstrates the reasons why correlation cannot be used to sort proxy data for validity. All of the proxies used here had equal validity yet there is a serious distortion in both the historic and calibration range signal magnitudes. I’ll have more on this in the near future.
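For readers who want to try this, here is a rough Python sketch of the kind of calculation behind Figure 3. It is not the code used for this post. The CPS variant shown (screen proxies by calibration-window correlation, average the survivors, then rescale the composite to the calibration mean and standard deviation), the 0.5 correlation threshold, the AR(1) noise, the signal shape and the definition of the magnification factor as recovered square-wave height over true height are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N_PROXIES, N_YEARS = 500, 1000   # assumed sizes (the post used 10000 proxies)
CAL = slice(900, 1000)           # assumed calibration window
SQUARE = slice(400, 600)         # assumed position of the 200-year square wave
BASE = slice(0, 400)             # flat historic stretch used as a baseline

# Hypothetical signal: square wave in the historic portion, ramp in calibration.
signal = np.zeros(N_YEARS)
signal[SQUARE] = 1.0
signal[CAL] = np.linspace(0.0, 1.0, 100)

# AR(1) red noise (phi = 0.9 is an assumption), rescaled to unit SD per proxy.
noise = rng.standard_normal((N_PROXIES, N_YEARS))
for t in range(1, N_YEARS):
    noise[:, t] += 0.9 * noise[:, t - 1]
noise /= noise.std(axis=1, keepdims=True)


def cps(proxies, target, r_min=0.5):
    """Assumed CPS variant: correlation screening, composite, rescale."""
    cal = proxies[:, CAL] - proxies[:, CAL].mean(axis=1, keepdims=True)
    tgt = target[CAL] - target[CAL].mean()
    # Pearson correlation of each proxy with the target in the calibration window.
    r = (cal @ tgt) / np.sqrt((cal ** 2).sum(axis=1) * (tgt ** 2).sum())
    composite = proxies[r > r_min].mean(axis=0)
    # Match the composite's calibration-window SD and mean to the target's.
    scale = target[CAL].std() / composite[CAL].std()
    return (composite - composite[CAL].mean()) * scale + target[CAL].mean()


def magnification(recon):
    """Recovered square-wave height divided by the true height."""
    recovered = recon[SQUARE].mean() - recon[BASE].mean()
    true = signal[SQUARE].mean() - signal[BASE].mean()
    return recovered / true


multipliers = np.linspace(0.0, 1.0, 101)
mags = np.array([magnification(cps(signal + m * noise, signal))
                 for m in multipliers])

print(mags[[0, 50, 100]])  # magnification at noise SD 0, 0.5 and 1
```

With zero noise every proxy equals the signal, so the scale factor is 1 and the magnification is 1. As the multiplier grows, the screened composite carries extra variance in the calibration window, the scale factor drops below 1, and the historic square wave shrinks with it.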


Just for fun, I learned how to make gif movies.  Click to play, I think!


26 thoughts on “More on CPS Signal Distortion”

  1. Jeff ID, I have seen passing references to what you are doing here and with regards to some of the Mann reconstructions, but I have not seen a detailed discussion of its importance in the overall view of things.

    I made a comment at CA on the sea ice extent reconstruction 2009 paper (Fauria and Grinsted et al.) with the observation that the variation of the observed data is much greater than that of the reconstruction in the period where they can be compared (the instrumental period). To me, if this relationship extended into the pre-instrumental period, the reconstruction would be misleading.

    I think the reply will be that as long as the reconstructed period is extended into the full instrument period then both the pre and post instrumental data will be equally smoothed and comparable. It would be like saying we place the observed data points on the reconstructed graph but please ignore it and compare only the reconstructed data over the entire time period. I would not be satisfied with that reply, because, while we know we are cutting down the variation, we do not know how much for any given sub-period in the reconstruction.

    What really is misleading in my view is when the reconstruction for some reason does not extend entirely into the instrumental period and the instrumental period is simply attached to the end.

  2. Thanks Kenneth, I’ve also noticed people are working to ignore this. I’m not sure of the reasons, but I’ll just keep making it more clear until I can correct for the problem. I left two links to these posts on another blog and my whole post got clipped for it.

    I’ve seen no attempts to criticize these posts either. None by Mann, Gavin or anyone else. I’ll put up turnkey code for this as well soon (it needs commenting) and keep working on it. I think you and I know the variation is being cut, but most of those making the papers don’t think about it or don’t want to.

    BTW, my impression is that Mann 08 will have about a 60% reduction in the amplitude of any real signal in its proxies.

    I agree about attaching the instrumental period on the end. It’s a political ploy nothing more. Incidentally, since you’ve followed the RegEM thing, the EIV method of Mann 08 is a RegEM of the proxies onto the instrument record. Basically they used an overdefined multivariate regression to determine the scaling of each proxy to instrument data after they did the correlation sort. Since all data is available in the first several hundred years because they infilled it, the result should come down to a single multiplier weight per series at least in the early part of the reconstruction. Weights change when series are missing values.

    The result is a near seamlessly joined reconstruction and instrument record presented as a reconstruction. The signal distortions explored here hold true for EIV as well.

  3. Jeff,

    After I’m done with Antarctica, I’d like to look at Mann08 as well. The main reason is that a loss of amplitude will occur not only with noise in y – the dependent variable – but also with noise in x (time). It would be interesting to see what the loss of amplitude is due to dating uncertainty. I haven’t seen any work by anyone addressing that issue.

    Also, based on the Antarctic work, it should be easy to adapt the algorithm to do an RLS reconstruction of the temperature field directly using proxy data as predictors and using only covariance information from the instrumental record. This allows a direct comparison of the proxy temperature field vs. instrumental, which allows estimation of the CIs for the entire reconstruction for each individual grid cell. It also allows a determination of the effectiveness of the calibration – a la Brown 1982 and Brown & Sundberg 1987 and 1989. I still have yet to fully digest their stuff, but I understand enough with some help from Steve to be able to calculate CIs based on a goodness of fit of the reconstruction to the instrumental data.

  4. I looked at only one Brown and Sundberg and it was a difficult paper. The math is ugly. I understood the concepts presented though, at least to the effects regarding PCA horseshoe patterns. I suspect that it should be referenced regarding Chladni patterns as it discusses the horseshoe created by PC analysis on autocorrelated data, which is the same effect.

    The reason I keep going back to the math here rather than proxies is because paleo papers keep using similar methods to get unprecedented ‘whatever they want’. The ignoring of the results above (some of which has been published) by all of paleo is almost as disturbing as the censored directory. My hope is that it will lead to an ability to correct the data.

  5. Hans von Storch published a similar analysis in Science in 2004, showing that the higher the noise in the proxies, the more flattened the reconstruction was outside of the calibration period.


    Von Storch, H., et al., 2004. Reconstructing past climate from noisy data. Science, 306, 679-682.


    Supposed free link to full paper (but I can’t get to work):

    Click to access vonStorch2004science.pdf

    Here are von Storch’s comments on the analysis and its context in Nature’s climate blog (be sure to read the comments!):

  6. Hans von Storch published a similar analysis in Science in 2004, showing that the higher the noise in the proxies, the more flattened the reconstruction was outside of the calibration period.

    Thanks Curt and Jeff for the reminder. This reference is the one to which I was alluding in my post above. I always wondered why we did not see more discussion of the importance of this observation. I would think it should be prominently displayed on all reconstructions like a warning label.

  7. 12,13,14.
    LOL…. you guys…….. stop it….lmao!

    lurker,,,, maybe if he put some bird pix. in there it would help!.

  8. Very interesting work here. I have a question though, maybe because I didn’t read all the related posts: Wasn’t all this stuff already contained in the 2004 Von Storch Science paper? It looks to me, maybe superficially, that the main conclusions of all these posts were already contained there.

    Best Regards

    Giovanni Pellegrini

  9. Von Storch discussed some of this. What I haven’t seen is a demonstration of the deamplification of historic signals, although there was a short discussion that this occurs. The code here is completely turnkey and can be easily copied, run and verified. So far it has been ignored completely by the team.

    The other problem with VonStorch is they bundled the whole paper with an incorrect replication of Mann98 which allowed the team to aggressively attack the paper as not accurate while ignoring the main conclusions. Actually, by my reading they accidentally did Mann 98 correctly and got a different answer.

    In my opinion, VonStorch should have dramatically changed reconstruction science but instead was ignored and what appear to me to be faulty reconstructions continue to be created at an unprecedented pace (some pun intended). The methods always seem to create the unprecedented shock and recovery pattern which we see in processing random data as well as data with a signal.

  10. I would say that what you call “deamplification” is classified by Von Storch as “marked losses of centennial and multidecadal variations”, which is the central point of the paper. So finally I would say you’re pointing out the same thing. More interesting is the part concerning the reconstruction of the calibration signal. In the Von Storch paper the loss of the final slope as you add noise is clearly seen… which is also what you show, though with no visible shock and recovery pattern, maybe because of the specificity of the employed data. I have to look it up…

    Best regards.

    Giovanni Pellegrini

  11. #18,

    I agree with you that VonStorch was discussing some of the same issues. However, there are multiple effects which were not clearly laid out by VonStorch. One is the amplification/deamplification factor for the calibration portion of the signal. Another is the ability to extract any signal you want from the data. The third is the repression of all signal present in pre-calibration data. I’ve also separated the effects of correlation sorting without rescaling (amplification) and the scaling caused by CPS.

    There are two posts at the top which show some of these effects.

  12. For sure you do have the merit of having clarified all the controversial aspects of CPS one by one. I would say that the amplification/deamplification factor for the calibration portion of the signal, and the repression of all signal present in pre-calibration data, were present in the Von Storch paper (see Fig.3 of that paper for instance, where the LIA disappears, and the final temperature slope goes fairly down), but not clearly outlined as you did. I have some concerns with the second point though, i.e. the ability to extract any signal you want from the data. Mann states that an independent cross validation along different calibration periods is also needed. While the basis of CPS may be faulty, I guess that, by applying this sort of control, it would be impossible to extract, say, a sinusoidal signal.

    If you take as your calibration signal a SIN wave with length one period T=100 years, and then you calibrate on the first or last 50 years, I think that it would be extremely difficult to get any kind of correlation on the other 50 years. It would be worth trying with all the calibration signals you adopted, split in half. I’d expect decent results only for the positive slope, but who knows, maybe the method is so badly flawed that even this independent calibration does not help.

    Best Regards

    Giovanni Pellegrini

  13. #20 What Mann 08 does is apply the calibration before and after a gap. Then he checks the fit inside the gap claiming that the fit exceeds that of random noise. There are several problems with that method which include that the data has been sorted according to correlation to the beginning and end of basically an upslope. We would expect the middle to contain an upslope then as well. Another problem is that M08 uses smoothed and infilled data which was seriously under corrected for autocorrelation. A difficult but not impossible demonstration to prove when the data is so bad the AR function won’t regress in many cases. However, the under correction and infilling by RegEM allows him to say the data is significantly better than random data.

    Your suggestion is a good one though and I’ll get to that when time permits. It would be an interesting demonstration.

  14. Again, just a few quick comments:

    a) You claim that taking a simple mean of the “proxies” recovers more signal than the CPS technique. That seems unlikely, however, where the SNR is significantly different in different “proxies” as would be the case in real world data. In such a situation, screening for the quality of the proxy would appear to be a necessity to recover a genuine signal, if a method can be found to do so.

    b) I am unsure as to the relevance of this claim to the climate debate. Specifically, you show that decreasing SNR will result in a reduction of the amplitude of the recovered signal. That is correct, though you may have overstated the effect for real world data (see comment elsewhere). However, this loss of magnitude takes the form of a shrinkage towards the mean of the calibration period. That means that for any reconstruction in which the recovered “signal” remains below the mean of the calibration period, it will understate cool periods, but overstate warm periods. The reverse will be true if the “signal” lies above the mean. Looking at the recovered “signal” by the CPS method in Mann et al 2008, it is plain to see it lies below the mean at all times before the calibration period, except very briefly, and just barely, around 600 AD. That means that to the extent that this effect biases the Mann et al 2008 reconstruction, it does so by overstating the temperatures. In other words, this effect cannot be the reason why Mann et al 2008 does not reconstruct a MWP warmer than the twentieth century mean. On the contrary, the failure to reconstruct a very warm MWP is likely to be a product of the proxies, and to be a genuine signal.

  15. a) We will have to disagree on this. It is statistical nonsense to sort data using math. Writing that, there are hypothetical cases where a known effect may occur that is ‘very clear’ and sorting removes this effect. None of this sort of clean sorting can be anywhere near justified on trees or temp proxies. The Mean is the obvious answer to the problems with reconstructions, if the data is too noisy the answer is collect more data.

    I realize very well that the average of the proxies won’t produce the unprecedented results some paleo’s too often look for. This is an obvious indication of the lack of quality of temp proxy data. Or perhaps the fact that temps aren’t unprecedented.

    b) Temperature reconstructions which are biased toward producing unprecedented temps today are being used to justify immediate and powerful reactions. These reactions would be a lot less desperate if we understood that only 1000 years ago we had already seen this type of rise and people could have the chance to consider carefully the fact that we haven’t exceeded normal variations. The predictions of disaster at X temperature rely on the fact that we haven’t seen X temperature in recent history. My opinion of this is that we have probably seen similar temperatures in the recent past without the doom scenarios of climatology actually occurring but I can’t prove it and sure would like to know before we trash our economies with idiotic mitigation strategies.

    To that end, I would like the disingenuous reconstructions to stop and good work to begin so that reasoned people can make good choices. People who make these papers and those who blindly support them (not saying you) need to step back and consider the possibility that what I and others have written about these reconstructions is completely true.

    This is the relevance to the climate debate.

    You say:

    However, this loss of magnitude takes the form of a shrinkage towards the mean of the calibration period.

    This is true. Then you follow it with:

    That means that for any reconstruction in which the recovered “signal” remains below the mean of the calibration period, it will understate cool periods, but overstate warm periods. — are you sure you don’t see the relevance to the climate debate?

    This is not correct though. CPS scaling is multiplicative, not additive. The data is sorted by correlation, which amplifies variance in the calibration range. This multiplicatively scaled variance results in a de-amplification of the historic signal. I showed you this effect in the part two post at the hockey stick link above. A +/- 1 sine wave is deamplified equally on the positive and negative sides by the multiplication, as shown by the iso-temperature lines.

    All the rest of your conclusions after this quote are therefore incorrect.

  16. a) Ignoring the hypothetical cases you mention, we can agree that any mathematical sorting will distort the signal recovered from the “proxies” to a greater or lesser degree. However, there are clearly situations in which inclusion of noisy data will unavoidably distort the signal as well. For instance, in the case where we have just two proxies, if the SNR of one is 0.1, and of another is 0.9, then clearly the recovered “signal” from taking the mean will closely approximate to the mean of two times the actual signal plus the noise of the first, ie, the noisier proxy. In this case, if we could find a way to eliminate the noisier proxy, we would more accurately recover the signal, even though we then have no means to eliminate the noise from the less noisy proxy. Obviously, if the noise was genuine noise, sufficiently increasing the number of proxies would eliminate this problem. However, if the noise of the noisier proxies, or a significant subset of them, was from a contaminating signal that was not correlated to the signal we were seeking, then merely multiplying proxies would not eliminate the distortion from taking the mean.

    Another situation arises when a significant portion of the proxies were, in fact, pure red noise. Including these in our mean would have a very similar effect to that which you have shown for the CPS technique, ie, it would shrink the recovered signal towards the mean of the red noise. Again, simply multiplying proxies would not eliminate this problem if the amount of red noise remained constant. Indeed, if scientists started making use of more dubious proxies simply to multiply the number of proxies, they may well reduce the recovered signal rather than increasing it.

    For scientists, then, the issue is a pragmatic one. It is not whether or not their data sorting technique will distort the recovered signal. That is a given, even for non mathematical sorting techniques. It is, will the distortion so introduced be less than the distortion introduced by noisy data if the data is not sorted. If it is less, then there is a reason to use the technique even if it is not statistically immaculate.

    Take Mann et al. 2008 as an example. Of 1290 proxies, approximately 168 would have matched their sorting criteria just by chance. Consequently, of their final proxies, 316 would have had a significant amount of signal, while 168 would have been mostly noise. That means only 65% of the proxies used would have had a significant temperature signal. By chance, approximately 105 of the excluded proxies would have had a significant climate signal, but been excluded because, by chance, that signal was masked in the calibration period. That means that of the original proxies, 421 of 1290, or 33%, would have contained a significant climate signal. That is a near doubling of the proportion of good proxies in the study by means of the screening technique, and could be expected to significantly improve the accuracy of the recovered signal. Whether it improves the accuracy enough to compensate for the distortion introduced by the use of the CPS technique is a different question. Given your estimates of the inaccuracy introduced, I doubt it. But given those shown by Mann et al. 2005, and on figure S12 in the supplementary data to Mann et al. 2008, it probably does. In the latter case, Mann’s use of the technique is certainly warranted.


  17. b) I will ask you to look at this diagram, which is very informative, from your “Historic Hockey Stick pt 2 Shock and Recovery”:

    In this diagram, 0.5 represents the mean of the calibration period, and the 0.5 Iso Temperature line lies on the 0.5 uncalibrated temperature as a result. This means that any signal with a temperature greater than 0.5 in the original data will be shown as having a temperature greater than 0.5 in the recovered signal. What is more, it will be shown as having a lower temperature in the recovered signal than in the original data. Likewise, any temperature less than 0.5 in the original data will be shown with a temperature less than 0.5 in the recovered signal. It will also be shown as having a higher temperature in the recovered signal than in the original data.

    Now, in the recovered signal from Mann et al. 2008 (NH CPS land), as shown in figure three, the reconstruction only crosses the 0 mark once, and that is around 600 AD. The 0 mark may not in fact be the mean of the calibration period, and may instead be the 1960 – 1990 mean. That, however, is a sufficiently close approximation to the mean of the calibration period for my comments to hold.

    Given that the recovered signal only crosses the 0 mark at 600 AD, and at all other points lies below it, then from the properties of CPS reconstructions observed above, this means that the original data only crossed the zero mark in 600 AD, and that at all other times, including the MWP, it lay below the mean. That means that at all times other than 600 AD, the CPS reconstruction over estimates the temperature rather than underestimates it. That is, it does not show the cold periods to be as cold as they actually were (it understates the cold periods), but it also shows the warm periods other than 600 AD as being warmer than they actually were (ie, it overstates them). These facts follow from the fact that CPS techniques shrink data towards the mean of the calibration period, and that the CPS reconstruction in Mann et al 2008 only once (and briefly) climbs above that mean.

  18. #24, Noisy data always distorts the signal, so we agree. Also, if we could choose which tree was a better thermometer we could get a better signal; we agree again.

    However, in a noise + signal situation you cannot do it by correlation sorting, which is what I demonstrate here. The reason is that sorting the noise causes a bias in the calibration range signal, amplifying the local variance. After scaling, this increased variance in the calibration period results in a systematic deamplification of the historic signal. My hope is that by analysis and modeling of the AR signal + noise we could predict the amount of deamplification and offset and provide a correction to the signal. It would be an estimate but it doesn’t seem impossible to me.

    Now regarding Mann 08, of the 1257 proxies 484 were recovered. This is higher than what would be expected from random data indicating that the data he used was not random. That’s good but that is a pretty weak hurdle for a reconstruction. However, there are many reasons why the data isn’t random.

    First – 90 percent of the proxies were infilled by RegEM. The process literally pastes on a hockey stick blade on the tips of the other data. While in most cases the paste was small in cases like Schweingruber MXD data 60 years of data was pasted on.

    There were 110 ish series from this set and not surprisingly most (like 90, it’s been a while) passed correlation. These can not be considered reasonable proxies anymore.

    In addition, the Luterbacher series was actual instrumental data pasted on the ends of the proxies. Of 71, I believe (again from memory) 70 passed correlation. These cannot be considered proxies b/c we’ve just correlated instrument data to instrument data to validate the historic portion.

    In addition to those and several other issues, Mann08 also used a pick 2 correlation method to improve screening results. This biases upward the threshold for what is expected of random data.

    Finally, the autocorrelation of the series was underestimated. Again, as in all cases above, biasing the number away from the possible conclusion that the data is random.

    When you put together all the different biases, it becomes clear that it’s likely that Mann08 has not even proven that the series were not random. The group’s extreme efforts to bias the result (pick 2) are evidence of foreknowledge of this detail as well.
