Historic Hockey Stick – Pt 2 Shock and Recovery

This post is about testing how well Mann 08 CPS (composite plus scale) can recover a signal from artificial ARMA proxy data. ARMA is just a fancy method to create artificial signals which match the noise and autocorrelation of a measured one. If you’re not familiar with this – don’t worry, it doesn’t matter.

In the last post we saw that the Mann CPS hockey stick maker can make any shape you want using the same method and data used to make a hockey stick temperature curve. It happens because any data which doesn’t correlate to a pre-determined curve is discarded. This leads reasonable folks to say, incorrectly: if it is temperature it should correlate, so the method is reasonable. What’s missing from this seemingly reasonable understanding is that the response of the correlation sorting to noise level is nonlinear. In high or medium noise cases, correlation can become a cherry pick of your favorite noise. This post takes the next step and looks at how well CPS does at retrieving a signal from both zero-average random data and random data with a signal.

First, I looked through the Briffa Schweingruber MXD latewood proxies, which are tree ring density proxies. These are interesting because you can actually see the imputed (not real) data which RegEM infilled at the series endpoints, visible as the change in noise level in the most recent 60 years.

Figure 1

The code presented below performs an ARMA match to the noise level of the actual proxies, including their redness (see comment two below), and creates 10,000 proxies with no trend but similar noise levels. Figure 2 shows the first of the 10,000 generated signal-less ARMA proxies.
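For readers who want the gist without opening the full script, here is a minimal sketch of the generation step in R. The names are illustrative (the turnkey code linked at the bottom of the post is the reference):

    set.seed(1)
    # Fit an ARMA(1,1) model to one real MXD series (held in 'proxy') so the
    # simulations inherit its autocorrelation (redness) and noise level.
    fit <- arima(proxy, order = c(1, 0, 1))
    # Generate 10,000 trendless series of 1000 years each, one per column.
    fake <- replicate(10000,
      arima.sim(n = 1000,
                model = list(ar = coef(fit)["ar1"], ma = coef(fit)["ma1"]),
                sd = sqrt(fit$sigma2)))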

Figure 2 - Simulated proxy data

The next step is to verify the proxies don’t contain a signal. So the code averages the data by row, one row for each year. Figure 3 is the average of all proxy data.
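In R this check is one line on the matrix from the sketch above:

    # With no signal present, the year-by-year average should wander around zero,
    # leaving only the residual noise of the average (Figure 3).
    plot(rowMeans(fake), type = "l", xlab = "year", ylab = "average of 10,000 proxies")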

Figure 3 - Average of no-signal proxies

The ripple is of course the remaining noise of the average. In Figure 4, CPS was run on this same zero-average data; the red line represents the curve the software was set to look for, which is analogous to Mann08 inserting a temperature curve in the graph. Data which didn’t correlate at greater than r = 0.6 was thrown out, and the remaining data was scaled and averaged one proxy at a time to match the red line. Standard CPS, in other words.
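Here is a condensed sketch of those two steps in R, continuing the names above (a 1 C per century ramp stands in for the red line; again, the posted code is the reference):

    cal <- 900:1000                                         # calibration window
    target <- c(rep(0, 899), seq(0, 1, length.out = 101))   # the "red line" to look for

    # Sorting: keep only proxies which correlate with the target at r > 0.6.
    r <- apply(fake[cal, ], 2, cor, y = target[cal])
    keep <- which(r > 0.6)

    # Scaling: center each survivor on its calibration mean, match its calibration
    # standard deviation to the target's, then composite by averaging.
    cps <- sapply(keep, function(i) {
      p <- fake[, i]
      (p - mean(p[cal])) * sd(target[cal]) / sd(p[cal]) + mean(target[cal])
    })
    recon <- rowMeans(cps)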

Figure 4 - CPS on data with no signal -- Find the red line

We discovered an unprecedented warming trend in no-signal data. I’ve done a lot of these curves and I see it as a perturbation (shock) and recovery. The shock is created by a biased calibration sort of the noise, and the recovery rate is set by the autocorrelation of the noise. The long term trend tends to re-center right on the mean of the calibration range data.

So you might ask what happens when there is actual temperature information represented by the proxies. Fortunately ARMA gives us a method to test just that. The code presented below generates 10,000 proxies 1000 years long. We can create a fake temperature signal of known amplitude and add it into the proxies. After that, we’ll go look for it using CPS and see how well it does.

Figure 5 is an artificial temperature signal. One hundred one years of warming (1900-2000) was chosen arbitrarily, with exactly a 1 C amplitude. In the earlier history the code added a sine wave, also with a 1 C amplitude.
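In R the signal of Figure 5 takes a couple of lines (the sine period here is my assumption for illustration; the post does not state it):

    yr <- 1:1000
    signal <- sin(2 * pi * yr / 250)                 # historic sine wave, 1 C amplitude
    signal[900:1000] <- seq(0, 1, length.out = 101)  # 101 years of linear warming to 1 C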

Figure 5 - Fake Temp Signal

This signal is then added to the simulated proxy data, producing the same kind of graph as Figure 1; if you squint, you can see the sine wave and the temperature rise in Figure 6.
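Adding it is one line, since R recycles the signal down each column:

    # Add the same temperature signal to every simulated proxy (column).
    fake_with_signal <- fake + signal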

Figure 6 - Sample ARMA proxy with signal added

Just to make it very clear, Figure 7 is the average of all the same ARMA proxies as above, with the signal added.

Figure 7 - Average of simulated data

We have a near perfect recovery of the artificial temperature signal simply from an average of the noisy data. Really, it’s tough to beat a simple average in signal recovery. CPS is supposed to do the same job, so let’s see what we get. Figure 8 uses the same Mannian CPS code as Figure 4, but on the dataset which has the signal from Figure 5. Since we know a priori that the temperature rise in the data is exactly a linear 1 C from 1900 onward, to seven digits of accuracy, that is what I set up the code to look for. The red line in Figure 8 represents the expected value of a 1 C rise in 100 years.
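Figure 8 reuses the CPS sketch from above, swapped onto the signal-bearing matrix (again a sketch, not the posted code):

    # Same sorting and scaling as before, now on proxies carrying the known 1 C rise.
    r2 <- apply(fake_with_signal[cal, ], 2, cor, y = target[cal])
    keep2 <- which(r2 > 0.6)
    recon_sig <- rowMeans(sapply(keep2, function(i) {
      p <- fake_with_signal[, i]
      (p - mean(p[cal])) * sd(target[cal]) / sd(p[cal]) + mean(target[cal])
    }))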

Figure 8 - CPS looking for a 1C rise in a 1C rise signal

As Figure 8 shows, using CPS to detect a 1 C rise, in data which Figure 7 proves averages almost perfectly to a 1 C rise, recovered only a 0.6-0.7 C rise. That is not very good performance in signal extraction.

Just to re-explain what happens here: in Mann 08 the proxy data, which may or may not be temperature, is compared by correlation to gridded temperature data, and proxies which don’t have the proper upslope are chucked in the circular bin. The remaining data is offset so that its mean in the calibration range (the years where the red line above exists) matches the mean of the temperature calibration data (the red line). The last step is that each individual series is scaled to match the red line.

This gets a little complex.

It turns out that the added noise does not affect correlation to the slope evenly. It causes either a positive or negative bias in the calibration region unless we’re really lucky. This is the equation for correlation, copied from Wikipedia:

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}

If you use correlation you find that correlation values on high-slope data are less affected by noise than those on low-slope data. This means that, on average, slope-reducing noise has a greater correlation-reducing effect than slope-increasing noise. For a balanced signal recovery we would hope the math would treat positive and negative slope changes due to noise equally. Since more positive slopes are favored during sorting, when the sorted series are re-averaged the completely random, unsorted noise in the historic portion (to the left of the recovery) still averages to zero, while the calibration period has a non-zero noise average. The result is a distortion of the isotemperature lines in what is presented as a rectilinear plot.
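Here is a quick toy check of that asymmetry (my own illustration, not part of the posted code): add noise to a short ramp many times and see whose slopes survive the r > 0.6 screen.

    set.seed(2)
    x <- 1:101
    ramp <- seq(0, 1, length.out = 101)        # true slope = 0.01 per year
    trial <- function() {
      noisy <- ramp + rnorm(101, sd = 0.3)
      slope <- cov(noisy, x) / var(x)          # least-squares slope of the noisy series
      c(pass = cor(noisy, ramp) > 0.6, up = slope > 0.01)
    }
    res <- t(replicate(10000, trial()))
    # Among survivors of the screen, well over half had their slope pushed UP
    # by the noise: the sort favors slope-increasing noise.
    mean(res[res[, "pass"], "up"])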

It is also possible, though unlikely, to get a magnification of the historic signal. If the actual signal plus noise has a standard deviation smaller than that of the signal you’re looking for, the sort and scale will amplify the historic data, but in practice this is a nearly impossible case.

Therefore what ends up happening is that the signal in history is amplified or deamplified depending on the level of the signal you are looking for and the level of noise on the data. Unless the noise in each proxy is substantially less than the temperature signal you’re looking for, most of the proxies recovered by correlation have a greater calibration range slope, requiring a demagnification in CPS to match the standard deviation of the red line. Since most series are then reduced in amplitude, the net effect is a demagnification of the historic signal.

Therefore there are actually two distortions of any possible temperature signal in CPS. The first is the throwing out of data; the second occurs during the scaling of the graph.

The exciting bit:

In this post the individual distortions are combined, but our artificial data allows us one more trick. Since we added the signal to the simulated data and then went looking for it, we still have a perfect copy of the simulated data without any signal. As the last step in the code presented, the no-signal proxies which average to 0 (as shown in Figure 3) are put in a 7 x (10000 x 1000) array and a constant value is added to them. The offset values used were (-1.5, -1, -0.5, 0, 0.5, 1, 1.5). Then, instead of running CPS with correlation separately, the code used the same proxies which passed correlation in Figure 8 and applied the same offset and magnification from Figure 8, giving us the actual shape of the distortions in temperature created by CPS: true iso-temperature lines.
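A sketch of that last trick (names illustrative: keep8 holds the indices of proxies that passed the Figure 8 sort, and a8/b8 the per-proxy gain and offset saved from that run):

    offsets <- c(-1.5, -1, -0.5, 0, 0.5, 1, 1.5)
    iso <- sapply(offsets, function(off) {
      shifted <- fake + off                 # no-signal proxies at a constant "temperature"
      # Push the shifted data through the SAME selection and scaling as Figure 8,
      # where a8[j] = sd(target[cal]) / sd(proxy j in cal) and b8[j] is the offset.
      rowMeans(sapply(seq_along(keep8),
                      function(j) a8[j] * shifted[, keep8[j]] + b8[j]))
    })
    matplot(iso, type = "l", lty = 1)       # the seven iso-temperature lines of Figure 9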

While results like those in these latest posts have been published showing some of the distortions in the calibration range signal, the distortion of the historic signal hasn’t been published, to my knowledge.

Figure 9 - CPS with iso-temperature lines

The iso-temperature lines indicate the true temperature scale of the graph. I made this graph larger so you can see the detail (click on it). Notice how the tips of the black line sine wave just touch the top and bottom of the blue +/-1 C iso-temperature lines. Note again the valley of temperature at the calibration range: the black line just touches the zero isotemp at the year where the red line begins (0 temperature). The black line ends just touching the blue +1 C iso-temperature line at the year 2000. Now look at the CPS value on the far left temperature scale.

What this means is that if we know the coefficients used and we know the resulting signal, it may be possible to reasonably back calculate (correct for distortions) the result and find the true signal prior to the CPS weighting.

Even if there isn’t a reasonable way to do the back-calculation, what this demonstrates is that hockey stick CPS curves are naturally unprecedented due to the math. Calibration range data is automatically matched to whatever signal you’re looking for, and historic data is demagnified. EIV in Mann 08 is simply a RegEM of the sorted data onto gridded temp curves (an odd form of regularized multivariate regression). Since all data is available, the series are actually linearly scaled by the same factors through most of the early record, so there is no possibility that EIV can correct for the correlation-implemented distortions.

If you consider the most recent 150 years, the correlation sorting has a form of shock and recovery. It’s apparent that the same shock and recovery pattern exists in almost every paleoclimatology temperature reconstruction. Its effects are also visible in the recent sea ice reconstruction by Fauria 09; my eyes are so sensitive to it now that every time I see an unprecedented curve I wonder what happened mathematically to the data. Mann 09’s recent hurricane paper also demonstrates the same effect, but it’s difficult to see where it comes from.

What we do know from this is that correlation sorting of proxies is not a valid method for signal recovery.

As before, the best explanation is in the code, and I recommend people run it. The first half of this code is from the previous post; comments mark the beginning of the second half to separate them. If you’re interested in how CPS distorts the signal, I would recommend you read the above briefly, but then study and run the code.

Link TEMPERATURE CORRELATION r2

41 thoughts on “Historic Hockey Stick – Pt 2 Shock and Recovery”

  1. The graphs are chopped off on the right in Firefox. Can this be fixed?

    Seems the Mannian 2008 is just a sausage machine – one size fits all.

  2. Michael Mann has something to say about all this – see his response below:

    From RC “A Warning from Copenhagen Thread” Post 114.

    Mark Says: 22 June 2009 at 5:29 PM

    I have an interlocutor that says that you can get the Mann Hockey stick from random data.

    It took several goes but this is what he eventually said:

    “Red noise is a random walk. You add a random number to the the previous value and so on.
    You make a number of these series and then you use them to replace the proxy data in a temperature reconstruction.

    If you produce several of these random walks you will see that some of them have a similarity to the instrumental record. Some reconstructions give these series a high weighting overiding most of the other series hence the concerns over a few proxies dictating the result.”

    So you have to use random data and keep using random data until you get Mann’s Hockey Stick.

    That hardly seems to be random data to me…

    [Response: Actually, this line of attack is even more disingenuous and/or ill-informed than that. Obviously, if one generates enough red noise surrogate time series (especially when the “redness” is inappropriately inflated, as is often done by the charlatans who make this argument), one can eventually match any target arbitrarily closely. What this specious line of attack neglects (intentionally, one can safely conclude) is that a screening regression requires independent cross-validation to guard against the selection of false predictors. If a close statistical relationship when training a statistical model arises by chance, as would be the case in such a scenario, then the resulting statistical model will fail when used to make out-of-sample predictions over an independent test period not used in the training of the model. That’s precisely what any serious researchers in this field test for when evaluating the skillfulness of a statistical reconstruction based on any sort of screening regression approach. This isn’t advanced stuff. Its covered in most undergraduate intro stat courses. So the sorts of characters who make the argument you cite either have no understanding of elementary statistics or, all too commonly, do but recognize that their intended audience does not, and will fall prey to their nefarious brand of charlatanism. -mike]

    Thought some of you would like a laugh!

    [Response: Disgust is a more appropriate emotion, recognizing that the errors in reasoning aren’t so innocent, and that there is a willing attempt to deceive involved. -mike]

  3. Thanks Mondo, I would never see these replies if people didn’t tell me. I rewrote a bit of the post above, because I was pretty tired when I finished.

    As far as ‘redness’ or degree of autocorrelation in the data. This has an effect on the recovery rate of the signal and the ability to extract high definition shapes from the signal. However, Mike must have been referring to a different charlatan because my previous post uses HIS DATA.

    So the sorts of characters who make the argument you cite either have no understanding of elementary statistics or, all too commonly, do but recognize that their intended audience does not, and will fall prey to their nefarious brand of charlatanism.

    Yes, yes the nefarious brand of charlatanism called open code and data. Allowing criticism and admitting error.

    ==========

    In CPS you can get any signal you want from any random dataset. Discussion of redness is arm waving, nothing more. It affects the result but does not prevent the same result. This post uses redness matched to Mike’s own data. In addition, the number of proxies doesn’t matter, and it’s dishonest of Mike to claim that we can’t use so many; using fewer only obscures the result in the noise level and he knows it — so we know why Mike makes that point? 😀

    This guy is beyond any scientist I’ve dealt with in his sliminess and if AGW is as severe as claimed, he is doing us all a huge disservice.

  4. I just realized, my next post was going to explore the effects of different redness levels. Now that’s fun.

    I don’t want to be too mean, after all what would I blog about without Mike Mann?

  5. Pete M.

    You can recover the graphs by going to “View”, “Zoom”, and clicking on “Zoom Text Only”.

    I’ve gotten used to doing this whenever I look at tAV – but I wish it weren’t necessary. (hint, hint, Jeff Id.)

  6. Jeff ID, I need to reread what you have done here, but I think I see most of your points. I want to clarify what I think you are doing with the 10,000 simulations. I think this might bear on what MM is saying and why it would not relate to what you are doing here. He talks about using simulations with a given red noise content and determining the percent that would emulate a reconstruction. What I think you are doing is using a series of 10,000 simulated proxies in a single reconstruction. Those proxies carry the red noise level of the tree ring densities, and when you place a signal in the red noise it applies to the entire 10,000 proxy reconstruction. Let me know if I have surmised correctly.

    Meanwhile I need to reread the Mann (2008) paper and SI.

  7. #7, Kenneth, Thanks for taking the time to read this. I’m a bit frustrated with it because people don’t seem to get the sound of the hammer falling. You are right that there are 10,000 simulated proxies. This was done to make the shape of the signal as clear as possible and it works with far less.

    This post demonstrates a problem with almost all of the proxy based temperature reconstructions that I know of. It will become more clear with the next post but only if people learn this one.

    MM references using too much red noise. This post matches the red noise to the proxies. The last post used his actual proxies. That’s one reason my criticisms are different from what MM has encountered in the past.

    This is also different in that it demonstrates the signal distortion caused by individual proxy calibration to an expected curve. This is not a minor point and without being too humble, every paleoclimatologist should be aware of it.

  8. #8 Jeff,
    If this is repeat or an incorrect assumption of the math presented, please ignore.

    In several posts you have shown that you can extract several different signals (up, down, wiggles, etc.). I wonder if you took the 10,000 sample with the two known signals as in figure 5 above, used the method selected by Mann for the proxies, and you show that yours did include “a screening regression (that) requires independent cross-validation to guard against the selection of false predictors”, couldn’t the rejected proxies be used “to back calculate the result and find the true signal prior to the CPS weighting”?

    If it does recover the signal and it should, would that not show definitely that Mann’s approach is a circular argument? In other words it works when you know the answer, it cannot be used to determine the answer.

    I got into an argument at RC that got me booted (go figure), arguing that the way the IPCC approached temperature was wrong because it was a circular argument. They tried using the low frequency signal processing example to show that it could be done. The point that I and several other persons tried to make, which was never answered, is that in the signal processing example the signal was known, not unknown as in the present case. Their claim that it is known is true only for the modern period, and assumptions have to be made in order to extrapolate to the past. So I am curious as to the back calculation. My thought is that if it works, and you redo Mann 08, you will get noise except for the infilled and divergence truncated versions. If true, it should be possible to show that the divergence problem was correctly expressed by Loehle. http://www.climateaudit.org/?p=2405

  9. #9 I’m not sure I understand the back calculation. I’ll have to think about it.

    This post’s main point is that Mann’s method and other proxy calibrations do not work even when you know the answer. The noise in the calibration period is guaranteed to demagnify the historic signal in calibration.

    The historic extrapolation is compressed onto the isotemperature scale of Figure 9.

  10. JeffId, rereading this and running the Rscript from your “mann-cps-demo” (thanks again), I think your comment “”It’s apparent now that the same EXACT shock and recovery pattern exists in almost every paleoclimatology reconstruction we’ve ever seen”” may show you how to backcalculate, or may show why there is a “false” (method driven) divergence.

    Assume that the divergence is not just divergence. It is part of the shock/recovery. It could be needed information. Or it may be the way to further show the method has a fatal flaw.

    Imagine you had run this ten years ago, say 1998. You update your proxies, and re-run. Suppose that the data is just too noisy, meaning it will tend to average to zero. One would expect that the new data would want to go to the original centerline. Even re-centered, it would still tend to go to the new centerline. Doesn’t your work indicate that, if it is red noise, divergence is not what the hand waving excuses say it is? Rather, it approaches proof that the method is flawed.

    I think proof of this would be to add 10 years of red noise at the +1C or even the 0C level, and see what you get. I would use the +1C offset to represent the current flat temperature that the world has seen.

  11. #12, the new data will return to the centerline. This post was done in C (I think) but you get the same results.

    https://noconsensus.wordpress.com/2008/10/11/will-the-real-hockey-stick-please-stand-up/

    When I read this, I believe it proves beyond a shadow of a doubt that CPS cannot ever work ever.

    I got all excited the first time I finished it and expected people would notice more. The isotemp plot at the end is really the end of the line for CPS but really it should be the end of the line for individual calibration of multiple proxies by any of the methods I’ve read.

    In fact, I’ve got no idea how to correctly calibrate trees except to use ARMA data to create the iso-temp lines and rescale the graph. I would expect this type of thing to be done to all paleo reconstructions from now on – it may sound a bit cocky but it’s real. CPS is dead and many other proxy sorting/scaling algorithms need to go with it. EIV, TLS, TTLS, whatever scaling regression is used.

    As far as divergence, I believe that is a different story related to trees once thought to be temp straying from the signal in more recent times, this graph is just noise to me. I’m afraid I may still not understand though – I’m sorry. Thinking about it, I’ve never heard of a tree series that was thought to be temp yet later didn’t diverge. That would be a good question for SteveM.

  12. JeffId. It really bothered me that Mann08 backfilled. Perhaps this is part of the reason for the change from MBH98. This quote “”What this specious line of attack neglects (intentionally, one can safely conclude) is that a screening regression requires independent cross-validation to guard against the selection of false predictors. If a close statistical relationship when training a statistical model arises by chance, as would be the case in such a scenario, then the resulting statistical model will fail when used to make out-of-sample predictions over an independent test period not used in the training of the model. That’s precisely what any serious researchers in this field test for when evaluating the skillfulness of a statistical reconstruction based on any sort of screening regression approach.”” Perhaps the infilling with temperature in Mann08 was used to make sure the out-of-sample test passed.

  13. #14, I agree completely with your concept. Mann had to choose less than 60 (I think the number was 51) hand picked proxies to make sure the QC passed muster. This is not a small detail considering his argument. However, even if it was done correctly and his argument was perfect (it’s not), CPS cannot reconstruct the true signal from known signal data.

  14. Jeff, one of the adverse comments towards Loehle concerned the naive approach (meaning without power) that was used. A lot of the proxy wars have been concerned with how well (power) the reconstruction fares: R^2, CE, etc. But does that matter with this method? If one does not have agreed upon CIs, minimum % inclusion, etc., such as can be found in tables for standard statistical procedures, how can one say for sure that CPS cannot reconstruct an approximate signal? I find this statement “”That’s precisely what any serious researchers in this field test for when evaluating the skillfulness of a statistical reconstruction based on any sort of screening regression approach”” a problematic claim. For, if these tests have not been repeated and standardized (on the Island don’t count) just how is the evaluation of skillfulness of the reconstruction done?

    Remember the flap about George Will? He used a naive approach. But sometimes one just gets the right answer by drawing a straight line through two points. Your work on Mann08 suggests that his claims for the relationships may be naive. Consider your figure 8: in the part similar to the MWP, the amplitude was reduced more than in the modern temperature part, though both were reduced. Without standard methods to determine the amplification (>1, =1, <1), the method is naive and does not support the claims made of it. Just as George Will's was not totally wrong (data is data, and the future has not yet arrived; Will may be right), the method is not powerful.

    You state ""However, even if it was done correctly and his argument was perfect (it’s not), CPS cannot reconstruct the true signal from known signal data."" I don't believe that Mann would claim he got the "true signal", his claim is that within the limits of the findings, the results are "robust". I think you may be bursting that balloon.

  15. It’s not just a matter of truth but rather a matter of nearly guaranteed unprecedentedness. There is a guaranteed correlation in the calibration range with a near guaranteed reduction in historic data. Why ‘near’ guaranteed? Because there are situations which can amplify historic data. These situations will not arise in paleo reconstructions though.

    It’s much worse than people realize; simply calibrating tree ring data often creates a 50 percent reduction in historic signal. I believe they know this; Mann does this with intent. That’s my opinion, just as it was my opinion that Mann knew he could extract any signal with his math. Now that he’s admitted that, we need to realize that it ruins nearly every paleo recon. What’s more, Monte Carlo methods have the potential to restore the reconstructions’ artificial temp scale.

  16. …the sorts of characters who make the argument you cite either have no understanding of elementary statistics or, all too commonly, do but recognize that their intended audience does not, and will fall prey to their nefarious brand of charlatanism

    I notice that Michael Mann is his usual charming self when confronted by knowledgeable opponents.

  17. #20, I just got back from a long day of work. Mann’s rationale for the correctness of his HS in this case has been completely bogus. Actually, he’s not responded to my posts in particular; he just repeated a canned comment about someone else’s work. Nobody else from the RC crowd has mentioned this either.

    So with a lot of views, turnkey code and simple concepts these posts stand uncriticized.

    Except by dHog of course who mocks them without specifics because he doesn’t do math.

  18. This is an interesting result, and quite similar to a result found in Mann et al. 2005, “Testing the Fidelity of Methods Used in Proxy-Based Reconstructions of Past Climate”. However, while you and Mann et al. agree that the CPS method will reduce variation from the mean of the calibration period in the noncalibration period, you significantly disagree about the size of the effect. See, for example, figure S12 a at
    http://www.pnas.org/content/suppl/2008/09/02/0805721105.DCSupplemental/0805721105SI.pdf#nameddest=STXT

    It occurs to me that at least part of the reason for this discrepancy is the validation curve you have used. It seems probable that the scale of the effect is not just dependent on the signal to noise ratio, but also on the scaling ratio in the CPS procedure, i.e., the ratio of the standard deviation of the proxies to the standard deviation of the calibration curve. That being the case, using a straight line for the calibration curve will significantly overstate the effect because of its low SD compared to an actual temperature series, particularly the local temperature series used in Mann et al. 2008.

    I would be very interested in your comment on this.

    It also occurs to me that as you can plot iso-temperature lines for the CPS method, given a reasonable estimate of SNR, you could correct for this effect in such reconstructions as Mann 2008. An interesting project for the mathematically inclined?

  19. There is an effect created by the difference between the SD of the proxies and the SD of the curve, but it doesn’t cause overstatement of any effects. You have touched on one of the subtleties of the post though.

    In Pearson correlation the curves are both divided by their own standard deviation – see the equation above for r. This is a normalization of both curves by their standard deviation before correlation. This is only the data sorting phase, though, and the SD is divided out.

    In the scaling phase the data is offset by its mean and then the standard deviation of each series is scaled to the standard deviation of the temperature signal. The net of the standard deviations is therefore only a linear scaling factor times the whole reconstructed signal (see the code from part 1 or 2). This is handled in some work by regressing the total result to average temperature after the fact. It is, however, unrelated to the ratio of the amplitude of the pre-calibration period recovered signal.

  20. You are going to have to try and help me here. In the supplementary material of Mann et al 2008, they show the application of their method to pseudo-proxies generated in a climate model (Fig S12). The reconstructions from the pseudo-proxies recreate the “global temperature” with a relatively high degree of accuracy, showing a typical error of 0.1 degrees which overestimates temperature in all cases. This may be because the “global temperature” being reconstructed always lies below the mean of the calibration period (except in the calibration period). However, the error is a constant, being approx 0.1 degrees regardless of distance from the mean. Further, the slopes of the curves are identical between reconstruction and original. These properties are very distinct from those you show in your reconstructions, and I am trying to understand where the difference lies. The reconstruction has these properties with a SNR of 0.4, approximately equivalent to your ARMA noise standard deviation of 0.8.
    http://www.pnas.org/content/suppl/2008/09/02/0805721105.DCSupplemental/0805721105SI.pdf#nameddest=STXT

    Given the clear mathematical differences between the target-signal-to-reconstruction relationship in Mann et al’s paper and in your analysis, and failing some clear explanation of that difference, the only reasonable assumption is that you have failed to reproduce Mann et al’s technique. I am trying to find that “reasonable explanation”, which may well result in a conclusion that you have failed to reproduce their technique, but at least leaves open the possibility that your criticism is valid.

    Clearly simple additions or subtractions will not change the slope of the curve, so the centering of the means cannot be the cause of the effect you have identified. Ergo, the effect is a product of the various rescalings. In your program, they result in an adjustment of:
    y(r) = y(s) * SD(cal)/(SD(sig) + dSD(sig))
    where y(r) is a point in the reconstruction, y(s) is the point in the signal, SD(cal) and SD(sig) are the standard deviations of the calibration data and of the signal in the calibration period respectively, and dSD(sig) is the change to the standard deviation of the signal resulting from the addition of noise. Clearly SD(cal) is a factor in the size of the rescaling, and cannot be eliminated, even where SD(cal) = SD(sig). The Pearson correlation does not enter into the rescaling, so I fail to see the relevance.

    However, while the standard deviation of the calibration data is relevant, on consideration adjusting its size would not reproduce the differences between your and Mann et al.’s validation tests. It occurs to me that Mann et al. apply the technique you describe to calibrate the proxies to local temperature data. They then average the values of indexes within the same grid cell, weight the resulting indexes by the area of the cell, and take the average of the weighted indices. This is the composite stage of the CPS technique. However, they then rescale the composite by dividing it by its standard deviation, and then multiplying by the standard deviation of the instrumental record over the calibration period. This latter step, the scaling, does not appear to be reproduced in your program. Perhaps this is the reason for the difference in results between your validation test and Mann et al.’s?

  21. Further to my preceding comment, I am in the process of developing a spreadsheet to test my conjecture in the final paragraph of that post. Specifically, the spreadsheet sets a target of decadal averages for “temperatures” between 1000 and 2000 AD inclusive, with the period 1900 to 2000 AD inclusive constituting the calibration period. Currently, in the calibration period, the target rises linearly from 0 to 1. In other intervals, the target is set as 0, 0.5, 1, 2 or -2. From the target, 100 proxies are then generated by adding white noise to the target for each value, then multiplying by a scaling factor. The scaling factor is determined randomly for each proxy by multiplying a preset value by a random number between 0 and 1. This is to represent the fact that in paleoclimatology, proxies for temperature do not all come in the same units. The white noise is generated by multiplying a random number between 0 and 1 by 4, then subtracting 2. As a result, noise is a significant component of any proxy.

    The spreadsheet then generates reconstructions of the target series by a variety of techniques. The first, and simplest, is to just take the mean of the proxies for each value in the target sequence. The next two then rescale the mean based on standard deviations, and rescale and recenter the mean respectively (MPS). Finally, the spreadsheet generates reconstructions by generating a composite (the technique you use above), and a final proxy by rescaling and recentering the composite reconstruction (the CPS technique). In all cases calibration is done using values from the calibration period only. This is to reflect the fact that in real life, calibration can only be done against the instrumental record.

    Of the reconstruction techniques, the simple mean is the worst performed, followed by the simple composite. The second best performed is the Mean Plus Scale (MPS) technique. The best performed is the full CPS technique, which consistently scores a better r^2, and typically has a lower average error, and SD of error than does the MPS technique. This is partly because the CPS technique consistently outperforms the MPS technique in reproducing the shape of the target, but sometimes is less accurate because of variability in scale.

    Importantly, although the CPS technique sometimes understates high “temperatures” (and overstates low “temperatures”) by an average amount of 0.1 “degrees”, more typically it overstates high values, and understates low values by about the same amount. That is, in this toy example at least, CPS techniques are more likely to reconstruct high temperatures from the past with a higher value than the target rather than the lower. In the same way, they are more likely to reconstruct low temperatures as being lower than the target, though in both cases by a small amount. In other words, CPS techniques are more likely to overstate temperatures in the MWP than to understate them.

    Finally, my experiments with this spreadsheet show conclusively that your program does not reproduce the full CPS technique. Your technique reproduces the composite, but does not then scale the results as was described by Mann et al 2008 in the supplementary material (whose link is above).

    I believe the ball is in your court.

  22. #27, I’m sorry I didn’t answer sooner, it’s awesome that you’re digging into this. Lessee.

    y(r) = y(s) * SD(cal)/(SD(sig) + dSD(sig))

    Perhaps the math looks the same to you but I see it like this. For each proxy Y

    Y(result) = (Y_proxy(t)-mean_proxy_cal) * SD_cal/SD_proxy_cal + mean(cal)

    Where mean_proxy_cal is the mean of the proxy over the calibration range. It’s important to center the proxy by its mean before multiplying through by SD.

    The pearson sorting is the key to the differences between the mean and historic portion of the signal. It is a non-linear sort so what it ends up doing is favoring positive noise in the calibration range over negative. This creates an increase in variance in the calibration portion of the signal as compared to the historic.

    #28 Tom in your first paragraph I want to point out that you describe a process which is mathematically equivalent to adding different noise levels at random to different proxies having the same signal. I would also recommend a red noise instead of white as that is more consistent with a decadally filtered proxy as used in Mann08.

    I would have to see your spreadsheet to understand your second paragraph. If you upload to a file share or email it to my address on the right I could check it out.

    I disagree with your third paragraph. CPS underestimates the variance of the historic signal, yet you seem to see it in terms of absolute value. Saying it’s more likely to reconstruct temps higher in the past is confusing to me. The temps are offset toward the mean of the calibration range and typically but not always demagnified according to the pearson correlation sorting. Demagnification will always be the case in temp reconstructions but that is more complicated than this discussion. So in the final result you have a tendency toward the median of the calibration and a demagnification of variance.

    Your second to last sentence is correct I think but a bit confusing.

    This is what the SI says.

    All proxies available within a given instrumental surface temperature grid box were then averaged and scaled to the same mean and decadal standard deviation as the nearest available 5° latitude by longitude instrumental surface temperature grid box series over the calibration period. Of the 226 total grid boxes available back to at least A.D. 1800 (see Fig. 1B), 136 extend back to A.D. 1500, 21 to A.D. 1000, and 11 to year 0. The gridded proxy data were then areally weighted, spatially averaged over the target hemisphere, and scaled to have the same mean and decadal standard deviation (we refer to the latter as ‘‘variance matching’’) as the target hemispheric (NH or SH) mean temperature series.

    The difference between my program and this result is simple. I don’t have gridded data, so my signal is the same throughout all of the artificial proxy data. First the ‘centered to mean’ data is scaled to the same decadal standard deviation as the ‘gridded box series’; all my grids are the same, so I scale to the instrument SD and then average. I did not make the effort to take the result and rescale it again to the final signal. This would magnify the whole result and achieve a better match to the calibration range data.

    This was done intentionally to keep the program simpler; perhaps it’s not the best idea, but my intent was to write something as simple as possible. This last step was included in several of my earlier posts. I should add it in and do another post just to make it more transparent. However, the relative demagnification of the historic signal stays the same.
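    For completeness, the omitted last step is a single variance matching applied to the composite. Continuing the sketch names from the post, it would look something like this:

    # Recenter and rescale the averaged reconstruction to the calibration-period
    # target. One factor multiplies the whole curve, so the RELATIVE
    # demagnification of the historic portion is unchanged.
    recon2 <- (recon - mean(recon[cal])) * sd(target[cal]) / sd(recon[cal]) +
      mean(target[cal])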

    Have you tried to run R? It’s very easy and you can do a lot more interesting things than a spreadsheet.

  23. Thanks for your quick response this time.

    With regard to #27, your program clearly results in a scaling effect on the proxies, which raises the questions of (1) what is the scaling effect, and (2) is that effect actually an effect of the CPS procedure, i.e., have you correctly reproduced that procedure?

    Solely with regard to question (1), I observed that recentering of the mean is mathematically incapable of rescaling; consequently the scaling must be a consequence of the various multiplications and divisions by the standard deviations. Hence my formula. Your formula is correct as the mathematical operation involved (but see below). My formula can be got from yours by setting the means to 0 (for simplicity) and setting SD_proxy_cal = SD(sig) + dSD(sig). That is, the standard deviation of the proxy over the calibration range is equal to the SD of the signal over the calibration range, plus the change in standard deviation introduced by noise. The sole purpose of the formula is to see which properties of the signal and proxy over the calibration range affect the scaling, and how.

    With regard to (2), I conjectured that you had not correctly reproduced the CPS technique, and my setting up of the spreadsheet has now convinced me of that. In your attempted reconstruction of the CPS technique, you only scale the proxies to the mean and SD of the calibration data once, but Mann et al describe doing that twice. In the first instance, they zero the mean of the proxies, then rescale the standard deviations to that of the temperature series of the nearest instrumental surface temperature grid box, before resetting the mean to that of the nearest instrumental surface temperature grid box. They then averaged the data (after weighting for surface area). After that, they zeroed the mean again, rescaled the standard deviation to that of the global (or hemispheric) temperature series over the calibration period, then reset the mean again. The fact that they do it twice is important. The first operation together with the averaging involved eliminates a large part of the noise. When it is rescaled the second time, there is little noise to distort the process, so the loss of scale that you have correctly identified as a consequence of the first stage of the process is then corrected.

    From my spreadsheet, we have this example (the target value is listed first in each row):

    Target | (1) Simple mean | (2) Rescaled mean | (3) Rescaled + recentered | (4) Composite | (5) Full CPS
    0 | 0.002468936 | 0.000890881 | 0.052492901 | 0.342167349 | -0.091845469
    0.5 | 1.192335189 | 0.430237416 | 0.481839437 | 0.491716096 | 0.468936776
    1 | 2.933460054 | 1.058497884 | 1.110099904 | 0.672430564 | 1.146585148
    2 | 5.152728854 | 1.859289878 | 1.910891899 | 0.94394158 | 2.164705062
    -2 | -5.286552943 | -1.907578423 | -1.855976402 | -0.246590314 | -2.2995861

    The first column after the target is the simple mean of the proxy data; in the second, the mean is rescaled using the calibration period; in the third it is rescaled and recentered. The fourth and fifth columns are the interesting ones. The fourth is equivalent to your recreation of the CPS technique, and just as in your recreation, the recovered signal has contracted towards the mean. The fifth is the equivalent of the CPS procedure as described in Mann et al. All pseudoproxies were rescaled so that their mean and standard deviation matched those of the target in the calibration period. They were then averaged, and the resulting series was again rescaled so that its mean and standard deviation matched those of the target in the calibration period. As you can see, in this case the full CPS reconstruction (column 5) undershot any value in the target equal to the mean of the calibration period (0.5) or less, thus indicating those periods to be “colder” than they actually were. However, it overshot any value that was greater than the mean, thus indicating them to be warmer than they were. In other words, in this case the CPS technique exaggerated past temperature variations (though only slightly). With exactly the same set up, except for the values of the random numbers used to generate noise, this pattern appears to be reproduced about 2/3rds of the time, with the values slightly understating past “temperature” variations about a third of the time. It is also reproduced across a wide range of scaling factors for the noise (at least up to six orders of magnitude).

  24. With regard to (2), my software indeed does represent CPS quite accurately. What you are calling a limitation was intentionally left out.

    The fact that they do it twice is important in achieving a better match to the calibration range data for publication. You are correct, and I have no disagreement there. When you say that the loss of scale is corrected, this is where we disagree; that it is not corrected is in fact what is proven here as well as in VonStorch04.

    I’m sure you agree that rescaling in the calibration range after the scaled proxies are averaged improves the fit to the calibration data. However, the loss of scale I’m referring to is the historic loss of scale in relation to the present time. The operation is trying to recover the historic information, and since the historic scale is demagnified about 40% in comparison to the calibration range, this cannot be corrected by a single scaling factor times the whole curve.

    I apologize for this because I didn’t expect it to create confusion. I thought presenting it in this manner gave a better explanation of how the noise averages out in the CPS process before final scaling.

    Perhaps you could email the spreadsheet so I can see what you’re doing better. Are you looking only at the uninteresting calibration portion of the signal or at the historic part? This could be the whole point of confusion.

  25. In reverse order, the averages given above are all across the historic period, not the calibration period.

    For the CPS, the following are the target values from the historic periods, plus the averages for three refreshes of the spreadsheet, with all settings identical:
    0:-0.091845469,-0.048673564,0.035134974
    0.5:0.468936776,0.570070728,0.59364619
    1:1.146585148,1.014561678,1.019121136
    2:2.164705062,2.181972325,1.936664488
    -2:-2.2995861,-2.322849912,-1.836148312

    Here are the first twenty-odd values overlapping the calibration and historic periods from the first refresh above (all historic targets except the last = 0, the last = 0.5):

    1: 0.769916153; 0.9: 1.003566273; 0.8: 0.759265374; 0.7: 0.894150915; 0.6: 0.500883478; 0.5: 0.620232132; 0.4: 0.2030322; 0.3: 0.178818057; 0.2: 0.338265428; 0.1: 0.006348014; 0: 0.225521976; -0.248773979; -0.195382091; -0.318994145; -0.304880194; -0.086507886; -0.032888573; .018925995; -0.132820071; 0.028713265; 0.491786176

    You can see there is little sign of the dip immediately after the calibration period which is so evident in your graphs. It is not uncommon to get (small) positive values in this range. In my most recent refresh, the second value in the historic period is 0.228, so this looks like random variation around the target, or a value just underneath the target.

    I have now emailed you the spreadsheet so you can look at it in more detail.

    I am uncertain as to why you think multiplication by a single factor cannot restore the loss of scale. Multiplication can certainly restore the relative separation of the parts of the signal on the vertical scale, for as your isotemperature lines show, the information about the separation is conserved. The only issue is whether the pattern of the isotemp lines in and near the calibration period will be eliminated by multiplication by a constant factor. As that distortion is a consequence of the displacement of the isotemp lines, it is at least plausible that restoring the isotemp lines will eliminate it, or at least reduce it to inconsequence. In that regard, it would be interesting to see your graph of the data after the full CPS process with the isotemp lines.

  26. Tom,

    Thanks for the spreadsheet. I see now what you’re doing; please correct me if I’m wrong.

    You’ve created 100 proxies by adding white noise to a signal, rescaled over the calibration range, averaged, and then recalibrated again, recovering the original signal almost perfectly.

    If that’s right I’m not surprised at all. There are a couple of suggestions I would like to make.

    First, adding white noise is the least troublesome form of noise. It cannot create the short term effects which are so common in proxy data. This is important because the SD of every proxy plus white noise always goes up, and likely always by a similar amount. This still isn’t that huge a problem, but I think you’ll see a greater deviation in recovering your signal if you know how to add redness (red noise). If you don’t, I can suggest some methods by spreadsheet – R is still pretty easy for this. I believe if you add some redness your reconstructed signal will be scaled a bit after the second scaling – it’s worth trying.

    Second, I didn’t see the Pearson correlation sorting, which is what distorts the historic result. Without this step all the series have the same scaling through their history, so the historic signal will follow the calibration range.

    Therefore I agree that your method can do what you say but it’s not working with the same kind of data or performing the correlation operation.

  27. That’s almost what I am doing. I am actually adding two components of noise to each proxy. The first component is a simple addition of white noise which is currently at the same scale for each proxy. But I then multiply each pseudoproxy by an (unknown to the reconstruction technique) random scaling factor. This is not noise in the proxy, but its effect is to add noise to the reconstruction. It is because of this feature that the various reconstructions are still quite noisy.

    I suspect you are right about white noise being the least troublesome. It is also the only noise that can be easily implemented on the spreadsheet; i.e., red noise could also be implemented, but only by significantly expanding the complexity of the sheet (I think). I don’t think just adding red noise of itself will result in a reduced scale after the second scaling. What may do so is if the red noise is not noise as such, but rather another, possibly partially correlated signal. In that case, it will not be eliminated because it will appear in multiple proxies and be significantly correlated in doing so. As such, some of it will inevitably survive through the various averaging processes, and may then reduce the scale of the signal after the second scaling.

    As to the correlation sorting distorting the historic result, that is a separate issue from whether the CPS technique itself distorts the result. Are you suggesting Mann et al’s reconstruction would have been technically correct if they had used the CPS technique without the prior sorting?

  28. #34, It would have been closer to the true scale if they had not sorted, of course. There is a problem with the red noise; I have shown this effect in some of my older posts I believe, but if I get time later today I’ll run it and add a post to the list. I want to fix the UAH video tonight. From my recollection, the standard deviation rescaling isn’t perfect b/c of the noise factor, but the historic and calibration scalings were identical.

    For a spreadsheet, if you make a matrix of equal size to your proxies on the next page, you can create red noise by adding a random value times a multiplier to each previous value, then add this matrix back to the scaled proxy matrix for your noise. It should be pretty quick actually.
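    In R, one reading of that recipe is a simple AR(1) series (rho is illustrative; arima.sim(list(ar = rho), n) does the same job):

    # Red noise: a persistence factor times the previous value plus fresh white noise.
    redify <- function(n, rho = 0.7, sd = 1) {
      e <- rnorm(n, sd = sd)
      r <- numeric(n)
      r[1] <- e[1]
      for (t in 2:n) r[t] <- rho * r[t - 1] + e[t]
      r
    }
    red_noise <- replicate(100, redify(1000))   # one column of red noise per proxy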

  29. Tom,

    I’m not forgetting this, I’ll do another post based entirely on CPS without sorting but it’s going to be about 2 days if I had to guess.

  30. Did the post promised in #36 above ever happen? I searched through the November posts, but none seemed to be it.

  31. #37 Joe,

    I’ve done the analysis somewhere else on tAV as suggested before but I still haven’t forgotten that I need to redo it for completeness. It’s actually very easy to modify the above code if you’re really interested. Just copy a few lines to the end to do another scaling.

    If you look at the date of #36: I went hunting by the 13th, climategate came on the 19th, and the CRU data release came at the end of last month. IOW, there have been a lot of distractions; I hope to find time for it soon.

  32. #38 Jeff,

    Thanks for the prompt reply. Actually, I didn’t completely follow the discussion with Tom Curtis; today I’m still trying to learn R and understand the data sets you work with, so that by walking around in them and the code I can assimilate the discussion better. So at this point I’m still basically reacting to not much more than the discussion’s atmospherics, but at that level at least the colloquy with Tom Curtis seemed serious.

    In any event, it will probably be a week or so before I’ve played with your code enough to completely understand what you’ve done already, so from my perspective your return to this issue will likely be timely enough when it happens.

  33. #39, Tom is a regular non-believer who tries to kick dirt on “not green enough” blogs. The step he’s requesting just multiplies the final curve by a scaling factor to match the hockey stick blade better. The shape of the curve is the same. To say it another way, he’s demanding I change the scale of the total Y axis of the result, whereas my point has been the distortion of the Y axis with time.

    That’s why it’s not a terribly big deal.

