Demo of Flawed Hockey Stick Math Using Actual NH Data

I got a bit of a surprise while continuing my calcs using red noise.  This time I used a real reconstruction from the NH data.

Last night I spent about 5 hours on more red noise plus signal analysis.  I found some incredible things regarding the methodology of statistically sorting and calibrating proxy data.  While others have examined and published papers on how hockey sticks can be created from red noise, I am taking this to the next step.  What I have done here is to embed artificial known signals in the red noise and then go look for them.

Last night I expanded that work by putting the CPS northern hemisphere composite data from Mann08 into my red noise data and then using methods similar to his to go look for it.

If you are not familiar with my other work go here first.

Simple Statistical Evidence Why Hockey Stick Temp Graphs are Bent!!

Below is the one curve M08 provided as a result from his proxies.

This is one continuous curve, but I colored the end red to show the 1850-1995 calibration range used in all calculations below.  This curve is actually an average of 10,000 red noise graphs with the signal added in.  It is a near perfect match to the original, which can be verified by the quality of the tail from 0 to 200 AD: the signal is zero there, and the average of the red noise is basically zero as well.

So I have 10,000 series of red noise, each with a perfect known signal added.  How can we find the best proxies to recapture the data?  Paleoclimatology uses statistical correlation, but temperature is only known for the red section of the graph above, so proxies are correlated to the section that is known.  This seems a reasonable assumption but is actually a fatal flaw in the math.  Let’s see what happens just by sorting for the best fit to the red section and averaging the result.
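The setup described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the graphs in this post: the red noise is modeled as a simple AR(1) process, the signal is a hypothetical ramp standing in for the M08 composite (which is not reproduced here), and the names `red_noise`, `calibration_r`, and the value `phi=0.5` are made up for the example.

```python
import numpy as np

def red_noise(n_series, n_years, phi, rng):
    """AR(1) red noise: x[t] = phi * x[t-1] + e[t], with e ~ N(0, 1)."""
    e = rng.standard_normal((n_series, n_years))
    x = np.empty_like(e)
    x[:, 0] = e[:, 0]
    for t in range(1, n_years):
        x[:, t] = phi * x[:, t - 1] + e[:, t]
    return x

def calibration_r(proxies, target, cal):
    """Pearson r of each proxy row against the target, calibration slice only."""
    p = proxies[:, cal] - proxies[:, cal].mean(axis=1, keepdims=True)
    s = target[cal] - target[cal].mean()
    return (p @ s) / (np.linalg.norm(p, axis=1) * np.linalg.norm(s))

rng = np.random.default_rng(0)
n_years = 1996                      # one value per year, 0 to 1995 AD
cal = slice(1850, 1996)             # the 1850-1995 calibration range
signal = np.zeros(n_years)
signal[cal] = np.linspace(0.0, 1.0, 146)   # stand-in ramp, NOT the M08 curve

# Every series is red noise plus the same known signal
proxies = red_noise(2000, n_years, phi=0.5, rng=rng) + signal
r = calibration_r(proxies, signal, cal)
print(f"mean r = {r.mean():.3f}, fraction with r > 0.1 = {(r > 0.1).mean():.2f}")
```

The key point is that `calibration_r` only ever sees the calibration slice, so nothing outside 1850-1995 can influence which series are judged to be good proxies.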

The blue line is the original curve from the first graph.  The purple/pink line is the average of all series with a correlation r > 0.1.  Look what happened: sorting alone amplified the signal in recent times.  What I find interesting is that the historic data (prior to the 1850-1995 calibration range) fits the original data almost perfectly.  There is no amplification.  Naturally you would say that requiring better correlation in the calibration range should give a better fit to the original series.  I reran the same test with r > 0.8 and got the yellow graph.

The yellow, much higher correlation line is even more amplified in recent times, yet the historic data is not magnified.  Consider how correlation works: flat, low amplitude but noisy signals are given low r’s when compared to data with a sloped signal.  This natural and correct mathematical process prefers high slope data.  The result is clear above.  Requiring higher correlation to a positively sloped target (the known calibration data) increases the amplification of a signal inside red noise.
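The sorting effect can be reproduced with a toy example. The sketch below is an assumption-laden illustration, not the code behind the graphs: it uses AR(1) noise with a made-up `phi` of 0.5, a hypothetical ramp as the known signal, and thresholds of 0.1 and 0.4 (at this noise level almost nothing would pass r > 0.8). It keeps the series that pass each threshold, averages them, and compares the calibration slope of that average with the true slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n_series, n_years, phi = 3000, 600, 0.5
cal = slice(n_years - 146, n_years)         # last 146 "years" act as calibration
signal = np.zeros(n_years)
signal[cal] = np.linspace(0.0, 1.0, 146)    # hypothetical ramp signal

# AR(1) red noise plus the same known signal in every series
noise = rng.standard_normal((n_series, n_years))
for t in range(1, n_years):
    noise[:, t] += phi * noise[:, t - 1]
proxies = noise + signal

# Pearson r against the signal, calibration window only
p = proxies[:, cal] - proxies[:, cal].mean(axis=1, keepdims=True)
s = signal[cal] - signal[cal].mean()
r = (p @ s) / (np.linalg.norm(p, axis=1) * np.linalg.norm(s))

def cal_slope(y):
    """Least-squares slope of y over the calibration window."""
    return np.polyfit(np.arange(146), y[cal], 1)[0]

true_slope = cal_slope(signal)
results = {}
for threshold in (0.1, 0.4):
    keep = r > threshold
    mean_kept = proxies[keep].mean(axis=0)
    ratio = cal_slope(mean_kept) / true_slope     # > 1 means amplified
    pre_mean = mean_kept[:cal.start].mean()       # pre-calibration average
    results[threshold] = (int(keep.sum()), ratio, pre_mean)
    print(threshold, results[threshold])
```

In this toy setup the slope ratio comes out above 1 and grows with the threshold, while the pre-calibration average stays close to zero, which is the same pattern as the purple/pink and yellow lines.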

Well this isn’t what paleoclimatology does — completely.  In paleoclimatology, high correlation data are then scaled so the calibration range has a signal that is a perfect match to the known.

In my example here we have the advantage of a perfect known signal and a huge set of proxies to sort so we can see clearly what the scaling does.

I used the following process to scale the calibration data from the two graphs above.

Sort proxies by  r>0.8

Fit a line by least squares to the red section of the top graph

Fit a line by least squares to the calibration range of each remaining proxy that passed the r threshold

Scale and offset each proxy such that linear least squares after magnification has the same slope and intercept as the linear least squares in the red section of the original graph.
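The scale-and-offset step in the list above can be sketched as follows. This is a simplified illustration of the line-matching idea only; the function name `match_calibration_line` and the example data are hypothetical, and it is not the EIV or CPS procedure from M08.

```python
import numpy as np

def match_calibration_line(proxy, target, cal):
    """Scale and offset `proxy` so that its least-squares line over `cal`
    has the same slope and intercept as the line fitted to `target`."""
    t = np.arange(cal.stop - cal.start)          # years since calibration start
    slope_t, icpt_t = np.polyfit(t, target[cal], 1)
    slope_p, icpt_p = np.polyfit(t, proxy[cal], 1)
    gain = slope_t / slope_p                     # blows up as slope_p nears zero
    offset = icpt_t - gain * icpt_p
    return gain * proxy + offset, gain

# Example with made-up data: a weak copy of the signal buried in white noise
rng = np.random.default_rng(2)
n_years = 600
cal = slice(n_years - 146, n_years)
target = np.zeros(n_years)
target[cal] = np.linspace(0.0, 1.0, 146)
proxy = 0.3 * target + 0.5 * rng.standard_normal(n_years)

scaled, gain = match_calibration_line(proxy, target, cal)
print(f"gain applied to the whole series: {gain:.2f}")
```

Note that the gain and offset are computed from the calibration range only but are applied to the entire series, so whatever noise the proxy carries before 1850 is multiplied by the same factor.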

This process is analogous to the EIV and CPS scaling values in M08.  I am not an expert in these methods but since they use linear scaling to make the best fit of proxy data to the red calibration data it is a good approximation.  I believe this simplified method will produce a result so similar it would scare the original authors.  Anyway, it doesn’t matter which method I use for this demo, the result is the same.

Here is the r>0.8 graph in red on top of the original signal in blue.

The red signal in the calibration range from 1850-1995 is shaped almost perfectly like the original data series.  But look what happened to the historic values.  Those familiar with my recent posts (link above) won’t be too surprised by this result.  But for those who haven’t seen it, what you are looking at is a demagnification of historic temperatures based on the basic methodology used in every hockey stick paper.

We know the true signal is the one shown in the blue graph; I put it into the red noise with a + sign.  Yet the statistical process of sorting for the best correlation and magnifying it to fit “known” values creates a substantial demagnification of historic values.

Again, if you are familiar with my other work, you can see the true zero of the red curve is offset by the amount shown in the tail.  Not only is the data demagnified, it is offset!

You might have noticed I did the high correlation r > 0.8 data first.  Science is kind of fun; this next graph caused me to spend hours on what I thought was going to be an easy post.  What happens with r > 0.1 through the same process?  This is the same data as shown in the purple/pink line of the second graph, but each series went through the scaling process to match the calibration range data to the highest degree possible.

Ouch, something went wrong.  I spent hours thinking my software had a problem.  I tried different red noise levels and pored over thousands of lines of code.  No problems.  What?

It took a while to realize what really happened.  The r correlation process rejects flat (near zero) slope signals in the calibration range.  A loose r value of 0.1 allowed series with very flat slopes to be accepted.  These individual series are then magnified by a multiplier until they fit the original slope of the calibration, resulting in a few series having very high weights compared to the rest.

Think of it like this: assume our calibration data has a slope of 1.  The red noise + signal series have slopes varying from below zero to above.  Among the proxies that show just enough correlation to barely exceed the r > 0.1 value, some will likely have a nearly flat slope, say 0.01.  Such a series would then be magnified 100 times to match the known signal, which it clearly did in the calibration period from 1850-1995 (my last graph above).
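The arithmetic in that example is easy to check. The short sketch below uses four made-up proxy slopes (only the 0.01 value and the calibration slope of 1 come from the paragraph above) and shows how much of the total multiplier ends up on the flattest series once every series is scaled to the calibration slope.

```python
import numpy as np

calibration_slope = 1.0
# Hypothetical calibration-range slopes of four accepted proxies
proxy_slopes = np.array([1.0, 0.8, 0.5, 0.01])

gains = calibration_slope / proxy_slopes    # multiplier applied to each whole series
share = gains / gains.sum()                 # each series' share of the total gain

for slope, g, w in zip(proxy_slopes, gains, share):
    print(f"slope {slope:5.2f} -> gain {g:7.2f}, share of total gain {w:6.1%}")
```

With these numbers the nearly flat series gets a gain of 100 and carries more than 95% of the total gain, so its noise dominates the average everywhere outside the calibration range.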

This is actually the correct result!  I also find it a stunning revelation, and a confirmation of what others have shown by looking at the magnification coefficients from actual papers: in historic pre-calibration times, these hockey stick graphs are primarily made up of a few series.

Here’s the big revelation for me

The math forces historic values to be comprised of the worst r value series!

The lowest r values get the highest magnification.  Why didn’t I see that before?

I hope my readers find this to be as huge of a result as I do.  I had been guaranteeing myself that the historic signal would be much flatter than recent times but as I demonstrated in my other post (link above), flat sloped proxies cause amplification of the historic signal.  You know science is fun even when you are wrong in your expectations.  I just didn’t expect a few proxies to get so much magnification that they took over the rest.

By the way r>0.1 resulted in 3399 or 34% of my original random series plus signal being accepted.

Comments and criticisms welcome.

Also, from M08 we know they added fake upslopes to the ends of 90% of the 1209 series.   This would have the effect of making sure that not one proxy took over the historic graph.  Think about that for a minute!

14 thoughts on “Demo of Flawed Hockey Stick Math Using Actual NH Data”

  1. REALLY interesting stuff! Am enjoying your efforts to deconstruct the paleo process. This stuff seems ripe for publication.

    Unfortunately, I’m a bit slow…

    The bottom graph has values that are magnified because of the low r and the math used, correct? If the intent behind using a low r value was to attempt to re-create the M08 graph, it looks like the amplification you obtain is greater than what M08 obtained and of different sign for the Medieval Optimum, as well as in the LIA. So, if it’s a similar process to M08, why are the results different?

    Thanks!
    Bruce

  2. #1

    You are absolutely correct and the result is a bit confusing. In the last graph some flat slope proxies were accepted in the r correlation and outweighed the other proxies dramatically, giving them so much strength over other data that they became “the signal”.

    This is a surprising result to me. In my previous posts, I assumed a nice linear upslope for a known correlation signal and didn’t hit any near zero proxies to this magnitude. They all averaged out.

    Looking at the calibration range data in the top graph, the signal is very non linear. I think the complex shape of it allows accepted r>0.1 values on proxies with a near zero slope, whereas a linear rising temperature would reject any proxy near a zero slope more strongly, making the effect unnoticeable.

    So the vast majority of proxies are deamplified; the averages for a strong linear upward trend or for high r correlation show that. But what I discovered above is that low slope data can be given a higher r from a “shaped signal” correlation, and the calibration process will then amplify it ahead of the others beyond reason.

    It just shows how easy it is to be fooled by fancy statistics. I was certain I would get the same result as my linear examples.

    Now, I am sitting here in amazement that the historic values can be based on just a few proxies. Something I have heard before.

  3. Very interesting! For those of us who don’t know Mann08 inside out, what r threshold did they use for selecting proxies – 0.1 ? And for the 90% of proxies with ‘fake’ data spliced on their end, what percentage of the calibration period contained fake data?

    (I’m just trying to gauge how many & how much of a demagnification effect we’d expect, and how many & how much of a magnification effect we’d expect.)

  4. On another post I calculated, by backfitting the result data through a magic statistical process of my own, an approximate scale factor of 0.62 for the historic values. The degree of infilling varied from proxy to proxy. The 95 Schweingruber series had 38 years of infilled data, a few had more and the rest were all lower.

    I just received an email from Steve McIntyre who explained that the CPS method uses a scale factor to standard deviation rather than the end slope. Since standard deviation is less likely to go to zero value than slope the amplification of individual series is less likely to be so great as in my last graph.

    This does not invalidate the work above, it just minimizes the effect in the last graph where a few series took over completely for the group. Perhaps they would just have a strong influence. I will run it again tonight according to cps.

    Fun stuff.

  5. Jeff, I may have missed it. But, have you done your analysis with a second filter to determine the demagnification wrt “divergence”? The filter of the second sort is to filter out the rednoise that you have kept and mimic an elimination of “divergence” of the relevant rednoise. First to see what eliminating the divergence series does to the demagnification. And secondly to infill with red noise that has been selected as being equal or correct to filter sequence 1 (being correct to your temperature data), and compare results. Being curious of the results, I am. Hope I have stated this clearly.

  6. John,

    I am not sure I am understanding.

    If I am right about your meaning, you are looking for how actual tree ring or mxd divergence affects the signal result. This is a good question by itself. Since we know large groups of tree data are together divergent (not just a small group) I did do an analysis on a previous post of a linear negative signal while sorting for a positive.

    The last graph in the post below demonstrates a negative signal in the proxies.
    https://noconsensus.wordpress.com/2008/09/29/simple-statistical-evidence-why-hockey-stick-temp-graphs-are-bent/

    While closer to your request it wasn’t matched to true proxy red noise. Still if you compare the difference to the other graphs which are mathematically equal you can make some general conclusions about the effect on amplification.

    I also found this an interesting point. I wonder if you will conclude the same things I did.

    For those who are casual readers, my intent and result shows how hockey sticks shapes are created from noise and how the basic math behind them cannot reproduce the “true” signal in the data. I made the signal and then went to look for it using similar techniques in this post and the exact same technique in the next. In all cases the original signal cannot be reproduced.

  7. Yes, that graph is similar to what I requested. The difference is small. I was interested in particular if you took some of the rednoise OUT that correlated to the negative signal you put in baseline. That is why I stated the second sort. Then, that rednoise you took out, truncate the negative slope area and infill with a rednoise with the positive slope. I agree that the hockey stick will be created. It is the demagnification that I was interested in. One of these may be like the difference in noise graphs you did, that showed the differences of the demagnification. I would conclude that taking out a series with a negative slope will give a HS shape. Re-inserting rednoise redone with a positive slope will also give the HS shape. Just a different way of showing just how vigorous it makes HS shapes. And, yes the reason I ask is that if you take a series such as TR or MXD, divergence and demagnification issues per the actual signal could be more than a bit interesting.

  8. Correction: the infilling would have to be rednoise that you sorted to have a positive slope. I do not know that it would have to be different than the first sort you did. Think about what Mann et al 08 did in a real way. Did they not sort, get a matching slope and then infill? It seems that there has to be some kind of double HS search. Once to get the shape, such that the divergence series can be truncated and then the final sort and run for the final result. So with this assumption of the double sort, I was trying to show that whether or not you excluded the divergence, you get a HS, if you have a negative slope, or insert a positive slope, you get a HS. I still wonder if the implied double sort with this positive slope replacement of a negative value could be one of the sources and reason the claim is made that this reconstruction is insensitive to BCpines, Graybill, etc. To me this would show up in the magnification/demagnification that your graphs and work indicate.

  9. I see what you mean now. But I read the paper differently. As I read the SI section, somewhat amazingly, it is even worse than your suggestion. The divergent data was first chopped off, then new improved data was pasted, taped and glued on the end of each proxy. Happy with the new improved data, a single sort was done. The pasting operation was pre-sort. This improved data still only passed correlation because of a very low r correlation value of 0.1.

    The quote below demonstrates the pre processing requirements of the SI section of his paper.

    The RegEM algorithm of Schneider (9) was used to estimate missing values for proxy series terminating before the 1995 calibration interval endpoint, based on their mutual covariance with the other available proxy data over the full 1850–1995 calibration interval.

    If I am correct, the addition of the upslope would have less amplification because the upslope pasted on the end would guarantee less magnification of low correlation series. Reducing the problem I demonstrated in the last graph above.

    But to your point of incorrectly flipping data, you are more than right that if there was real signal in some of these BS proxies you would decrease the amplitude of all the signal by the wrong sign through the entire series except for the glued on end. Steve McIntyre found a gorgeous example of that in the Finnish sediment proxies which Hu McCulloch pointed out was further used to validate the total result.

    I hope I understood better.

  10. Yes. I was curious as to your opinion of what it does… It would seem to me, also, that if it improved the fit, artificial or not, it would actually decrease the demagnification as you stated above, due to the tendency of the selection process to weight the worst highest, which I believe is how you stated it. I checked, and it appears you are correct. It was a presort.

    It would appear to date that your work indicates: the presort and infilling helped, but could introduce spurious results; the frequency of noise can bias the amplification; that if it does not affect the magnification, the result could actually be random correlation with changing magnification around the actual signal, if it can be detected at all.

    But perhaps I am too pessimistic. 🙂

  11. WOW,

    Great graphics and even better discussion.

    So, the question that follows: How ‘should’ it be done?

    Thanks,
    EJ

  12. “The lowest r values get the highest magnification.” So if that was the bristlecone pine’s situation, would I be right to think that Mann mightn’t even have to be consciously devious to produce a graph that effectively multiplies its strength by 390…? if I’ve understood you right, Mann might still think he’s done fair science and for this reason won’t understand why an “amateur oil-funded” denier is going for him all the time.

  13. Lucy,

    Perhaps I am more suspicious than you. Remember, Mann’s team has used at least 3 methods which all create the same distortions of the actual data (CPS, EIV and de-centered PCA). What are the odds that they invented 3 separate techniques which all provide the same distortions?

    It is highly unusual in science to throw away any data for non-physical reasons. It is also unusual to prove a causality theory without strong evidence. For Mann to achieve his desired result of a little ice age and MWP very specific proxies had to be amplified ahead of others.

    I haven’t even looked into it yet, but proxies are weighted more strongly when other proxies are not nearby because they represent more of the planet. If you were to eliminate even 50 or so for no reason you could really strengthen the response of the ones you want.

    Mann accidentally posted 1357 proxies on a government server stating it was the data used. Later he posted 1209 proxies which he claimed were original data. These proxies appeared to me to pass his selection criteria for use so we can be certain that proxies were eliminated for reasons not explained in his paper. See my online experiment with the hockey stick post.

    https://noconsensus.wordpress.com/2008/09/20/online-experiment-with-the-latest-hockey-stick/

    Thanks for the interest.
