The Flaw in the Math Behind the Hockey Stick

When I did this post it was just the beginning of what I found out. Thanks to dogooza for the RC link. To see the true fallacy of the Mann08 math in detail go to this more recent post.

Will the Real Hockey Stick Please Stand Up?

On Sept 11 of 08 I finally understood how hockey stick temperature graphs are being produced. At that time I made bold statements on several blogs that you can’t do that because the historic temperatures are on a different scale than local temperatures (last 100 years). They were rightfully ignored since I had no data to back them up. Now it’s different!

If you don’t know about proxy temperature reconstruction (that’s a mouthful), go to this link first, then come back: Ten Things Everyone Should Know About the Global Warming Hockey Stick

Today, I have put together a clear demonstration of why all of these sorted temperature reconstructions are flawed!

From my previous post How Come So Many Independant Papers Claim Hockey Sticks I made this graph showing random red noise data.

I added a fake temp signal which looks like this below to every series.
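A minimal sketch of this kind of pseudo-proxy data in Python, not the code used for the graphs. The AR(1) noise model, its `phi` and `sigma` values, the triangular shape of the hump, the 1000-2000 year range and all function names are assumptions made for illustration.

```python
import random

def red_noise(n, phi=0.9, sigma=0.1, rng=random):
    """AR(1) red noise: each value is phi * previous value + white noise."""
    out, v = [], 0.0
    for _ in range(n):
        v = phi * v + rng.gauss(0.0, sigma)
        out.append(v)
    return out

def fake_signal(years, start=1200, end=1300, height=1.0):
    """Zero everywhere except a triangular hump peaking midway between start and end."""
    mid, half = (start + end) / 2.0, (end - start) / 2.0
    return [max(0.0, height * (1.0 - abs(y - mid) / half)) for y in years]

def make_proxies(n_series, years, rng):
    """Every pseudo-proxy is the same signal plus its own red noise."""
    signal = fake_signal(years)
    return [[s + e for s, e in zip(signal, red_noise(len(years), rng=rng))]
            for _ in range(n_series)]

years = list(range(1000, 2001))
proxies = make_proxies(200, years, random.Random(0))

# Averaging all series cancels most of the noise and leaves the hump.
mean = [sum(col) / len(col) for col in zip(*proxies)]
print(round(mean[years.index(1250)], 2), round(mean[years.index(1950)], 2))
```

With no sorting at all, the plain average should sit close to 1 at the top of the hump and close to 0 in the last century, which is the baseline the later graphs are compared against.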

I then demonstrated how any random sort produced the same signal. After that I did a sort based on maximum slope in the last 100 years (1900-2000) and produced this graph.

The historic signal is still nearly the same, but recent times show a false temperature rise. It’s false because I made the data myself and gave it a flat temperature! This is a pretty cool trick, but it isn’t what hockey stick papers do. What all the hockey stick papers are doing is correlating the data against a temperature rise in the last 100 years, scaling it, and then offsetting it to match current temperatures; my graph just sorted the data by slope. The scaling (magnification) operation is justified through ‘good correlation’, but it is mathematically improper in concept.
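The slope sort can be sketched like this. It is an illustration only: the noise here carries no signal at all (true temperature flat at zero), and the AR(1) parameters, the series count and the choice to keep the steepest 10% are assumptions, not values taken from the graphs.

```python
import random

def red_noise(n, phi, sigma, rng):
    """AR(1) red noise: each value is phi * previous value + white noise."""
    out, v = [], 0.0
    for _ in range(n):
        v = phi * v + rng.gauss(0.0, sigma)
        out.append(v)
    return out

def slope(x, y):
    """Least-squares slope of y against x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(1)
years = list(range(1000, 2001))
cal = years.index(1900)                      # sorting window is 1900-2000
proxies = [red_noise(len(years), 0.95, 0.1, rng) for _ in range(300)]

# Rank every series by its 1900-2000 slope and keep the steepest 10%.
ranked = sorted(proxies, key=lambda p: slope(years[cal:], p[cal:]), reverse=True)
kept = ranked[:len(ranked) // 10]
recon = [sum(col) / len(col) for col in zip(*kept)]

blade = slope(years[cal:], recon[cal:]) * 100   # trend per century after 1900
shaft = slope(years[:cal], recon[:cal]) * 100   # trend per century before 1900
print(f"blade: {blade:+.2f} per century, shaft: {shaft:+.3f} per century")
```

The average of the kept series rises after 1900 even though nothing was put into the data, while the centuries before 1900 stay essentially flat.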

The Experiment.

What happens when you take random red noise data with the signal in the above graph and perform correlation and MAGNIFICATION to the various series?

I assumed all temperature rises occurred in the last 100 years and were linear (no curvature to the graph).

For the science guys: I took the best fit line of the last 100 years of my random data (using least squares). I then calculated the r value for correlation and only accepted r values greater than 0.8. Remember the real temperature in this data is graph 2.

I first fit lines to the data and averaged all series with r values > 0.8. This process simply looks for series with ends which match well to a linear curve. Since the linear curve can go up or down the average of 10,000 series is flat in recent times as shown here.
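A sketch of that acceptance test, under assumptions: the noise is signal-free AR(1) with a `phi` close to 1 so that a usable fraction of series passes, and only 400 series are drawn, so the up and down slopes cancel far less cleanly than they do with 10,000.

```python
import random

def red_noise(n, phi, sigma, rng):
    """AR(1) red noise: each value is phi * previous value + white noise."""
    out, v = [], 0.0
    for _ in range(n):
        v = phi * v + rng.gauss(0.0, sigma)
        out.append(v)
    return out

def fit_line(x, y):
    """Least-squares slope and Pearson correlation r of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

rng = random.Random(2)
years = list(range(1000, 2001))
cal = years.index(1900)
proxies = [red_noise(len(years), 0.99, 0.05, rng) for _ in range(400)]

# Accept any series whose last 100 years look linear, rising or falling.
fits = [fit_line(years[cal:], p[cal:]) for p in proxies]
kept = [p for p, (b, r) in zip(proxies, fits) if abs(r) > 0.8]
up = sum(1 for b, r in fits if r > 0.8)
down = sum(1 for b, r in fits if r < -0.8)

recon = [sum(col) / len(col) for col in zip(*kept)]
print(f"kept {len(kept)} of {len(proxies)}: {up} rising, {down} falling")
if recon:
    print(f"average changes by {recon[-1] - recon[cal]:+.2f} over 1900-2000")
```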

Wow Jeff, a nice graph with the same shape and amplitude as graph #2! The historic 1200-1300AD peak is the same amplitude (height) as what was produced by maximum slope in graph number 3 as well. Slightly interesting but not that exciting.

Next I decided that I would assume a temperature rise from -0.2 to 0.6 degrees C in the last 100 years. Since the assumption is that temperature is rising I couldn’t accept any negative slopes in the data (this is identical to what is done with temperature proxy data).

Science guys again : I then ran a correlation analysis for high r values and sorted those for positive slopes. Using a least squares technique I ‘calibrated’ the graphs to my assumed 0.8 degree temperature rise.

The steps were like this:

Assume – linear temp rise from 1900 – 2000 of -0.2 to 0.6C

step 1 – correlate high r value data >0.8

step 2 – remove any negative slopes

step 3 – magnify individual data series to match my temperature trend

step 4 – average data
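The four steps can be sketched as follows. This is an illustration in Python rather than the software used for the graphs; the AR(1) noise parameters, the triangular hump, the series count and the names `reconstruct`, `fit_line` and `red_noise` are assumptions. Steps 1 and 2 collapse into a single test, because r > 0.8 already implies a positive slope.

```python
import random

def red_noise(n, phi, sigma, rng):
    """AR(1) red noise: each value is phi * previous value + white noise."""
    out, v = [], 0.0
    for _ in range(n):
        v = phi * v + rng.gauss(0.0, sigma)
        out.append(v)
    return out

def fit_line(x, y):
    """Least-squares slope, intercept and Pearson r of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5

def reconstruct(proxies, years, t0=1900, lo=-0.2, hi=0.6, r_min=0.8):
    cal = years.index(t0)
    x = [y - t0 for y in years[cal:]]          # years since the start of calibration
    target_slope = (hi - lo) / x[-1]
    scaled, factors = [], []
    for p in proxies:
        slope, icpt, r = fit_line(x, p[cal:])
        if r <= r_min:                         # steps 1 and 2: high r, positive slope
            continue
        k = target_slope / slope               # step 3: magnify to the assumed trend...
        c = lo - k * icpt                      # ...and offset so the fitted line matches it
        scaled.append([k * v + c for v in p])
        factors.append(k)
    recon = [sum(col) / len(col) for col in zip(*scaled)]   # step 4: average
    return recon, factors

rng = random.Random(3)
years = list(range(1000, 2001))
signal = [max(0.0, 1.0 - abs(y - 1250) / 50.0) for y in years]   # flat after 1300
proxies = [[s + e for s, e in zip(signal, red_noise(len(years), 0.99, 0.05, rng))]
           for _ in range(400)]

recon, factors = reconstruct(proxies, years)
cal = years.index(1900)
b, a, _ = fit_line([y - 1900 for y in years[cal:]], recon[cal:])
print(f"kept {len(factors)}, fitted 1900-2000 trend runs {a:+.2f} to {a + 100 * b:+.2f}")
print(f"mean scale factor {sum(factors) / len(factors):.2f}")
```

Because every accepted series is rescaled so that its own best-fit line is the assumed trend, the average reproduces the -0.2 to 0.6 rise by construction, whatever the data contained. The mean scale factor is the number that ends up multiplying the 1200-1300 hump.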

The following graph was produced.

First thing you can see is that again I was able to create a strong temperature hockey stick in recent times. The temperature from my fake proxy data went from -0.2 to 0.6 degrees C which again is pretty neat considering we know there is no temperature signal in this data because I made it up! The real effect is on the historic temp rise. The hump between 1200 and 1300 AD is offset +0.2C and smaller in height.

Why did that happen?

I would like to say I predicted this, but I actually expected the hump to get larger in the 1200-1300 AD range. I now understand that the fact that it has shrunk is created by the noise in the data. To say it as simply as I can, if the average slope of the 10,000 series of data in the (1900-2000) calibration period is greater than the slope of the temp signal, the historic values are compressed!!!!!! They could have been magnified with a different noise level but the noise in this pseudo-proxy fake data combined with the shape of the calibration trend determines the true scale factor for historic temperatures!!
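One way to write this down, as a sketch under simplifying assumptions that are not derived in this post: each proxy is the true temperature plus noise, and far from the calibration window the noise is unrelated to the scale factor chosen inside it. The symbols T, n_i, b_i, beta, k_i and c_i are introduced here purely for illustration.

```latex
y_i(t) = T(t) + n_i(t), \qquad
b_i = b_T + b_{n,i}
\quad \text{(least-squares slope of } y_i \text{ over 1900--2000)}

k_i = \frac{\beta}{b_i}, \qquad
\hat{T}(t) = \frac{1}{N} \sum_{i=1}^{N} \bigl( k_i \, y_i(t) + c_i \bigr)
\approx \bar{k} \, T(t) + \bar{c}
\quad \text{for } t \text{ far from the calibration window}
```

Here beta is the assumed calibration slope, b_T is the slope of the true signal in the window (zero in this experiment), b_{n,i} is the slope contributed by the noise, and the bars denote averages over the N accepted series. Accepted series whose slopes b_i are mostly larger than beta give a mean scale factor below 1 and a compressed hump; slopes mostly smaller than beta give a magnified one. The mean offset shifts the historic baseline.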

Ok, now this is big stuff so of course I couldn’t just make one graph. I randomly assumed a temperature rise of -0.55 to 0.95 degrees C and reran the software. The curve below was produced.

Ok, now the same random data with a known flat temperature signal has produced a huge up slope in recent temperature. More interestingly, the historic 1 degree C temperature hump has changed showing a minimum value of .15 and a max of .85 for a total rise of 0.7. This happened just by looking for a slope in recent times using methods similar to paleoclimatology. I couldn’t stop there because this goes at the very foundation of proxy based temperature reconstruction.

Below is another graph of the same data sorted assuming a 0 to 1 degree temperature rise in recent times. Again, remember this is random data with a flat temperature in the most recent 100 years. This time I scaled the data for a 0 to 1 degree C rise.

Ok, same data again and you can see the peak in history has been demagnified by the same processes used in the hockey stick papers. The hump between 1200 and 1300 now has a height of 0.5 degrees instead of the true value of 1 and local temperatures show a 1 degree rise.

One thing I see from this graph is different from the previous two. We know from the signal I gave this random data that the true temperature is zero everywhere except the 1200-1300 range. We also know the peak of the historic 1200-1300 temp is 1C. This means the true zero temperature of the graph basically follows the arc from 100AD of 0.45 through to 1900 AD of 0.2. Also the peak at 2000 and at 1250 are both 1 degree C in reality so you can imagine a similar arc across the graph through those points! You can then get an idea of the shape of the temperature compression created by simply sorting the graph for local trends.

Those who are scientifically minded will notice that these last 3 graphs are all of the same data on different scales (magnifications and offsets). So for those of you who, like myself, still have questions, I did another experiment.

Experiment #2

What if we look for temperature that actually exists in the proxies?

By this I mean, let’s add a 0 to 1 degree known rise in temperature and then use statistical calibration to go find it. Fun eh!

Here is the new temperature signal I used.

Now clearly the last 100 years have a linear temperature rise of 1 degree with a historic temperature rise of 1 degree. I added this to the random series and then performed my least squares based calibration and correlation which is analogous to the majority of the hockey stick paper methodologies.

Look at the historic trend now. This time we know the temperature actually rose exactly 1 degree C. I was able to use 37% of the data or 3700 proxies out of 10000 yet the historic trend is compressed due to scaling. I’m still not done though. An entire science has missed this point, so how far can we go in demonstrating the error in the mathematics? Let’s do one more. This time I will only accept positive-sloped data with correlations greater than r = 0.95.

An r of 0.95 demands a greater accuracy to the linear fit at the end. This means that this method will only accept the best quality data for the calibration period. This only comprises data which represents a highly true linear trend at the end. Instead of improving the historic 1200-1300AD signal to show its real shape, it was in fact further reduced!! Compare the last two graphs!
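Experiment #2 can be sketched with both thresholds side by side. As before this is an illustration under assumptions (AR(1) noise with the parameters shown, a triangular hump, 400 series, and the first century of the series standing in for the true zero level), so the acceptance rates will not match the 37% quoted above.

```python
import random

def red_noise(n, phi, sigma, rng):
    """AR(1) red noise: each value is phi * previous value + white noise."""
    out, v = [], 0.0
    for _ in range(n):
        v = phi * v + rng.gauss(0.0, sigma)
        out.append(v)
    return out

def fit_line(x, y):
    """Least-squares slope, intercept and Pearson r of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5

def calibrate(proxies, x, cal, lo, hi, r_min):
    """Keep series with r > r_min, rescale each to the lo..hi trend, then average."""
    target = (hi - lo) / x[-1]
    scaled = []
    for p in proxies:
        slope, icpt, r = fit_line(x, p[cal:])
        if r > r_min:
            k = target / slope
            scaled.append([k * v + lo - k * icpt for v in p])
    return [sum(col) / len(col) for col in zip(*scaled)], len(scaled)

rng = random.Random(4)
years = list(range(1000, 2001))
cal = years.index(1900)
x = [y - 1900 for y in years[cal:]]
# Known signal: a 1 degree hump at 1200-1300 plus a real 0 to 1 degree rise after 1900.
signal = [max(0.0, 1.0 - abs(y - 1250) / 50.0) + max(0.0, (y - 1900) / 100.0)
          for y in years]
proxies = [[s + e for s, e in zip(signal, red_noise(len(years), 0.99, 0.07, rng))]
           for _ in range(400)]

kept_counts = {}
for r_min in (0.8, 0.95):
    recon, n_kept = calibrate(proxies, x, cal, 0.0, 1.0, r_min)
    kept_counts[r_min] = n_kept
    if n_kept == 0:
        print(f"r > {r_min}: nothing passed")
        continue
    base = sum(recon[:100]) / 100              # 1000-1099 stands in for the zero level
    height = recon[years.index(1250)] - base   # noisy single-year estimate of the hump
    print(f"r > {r_min}: kept {n_kept} of {len(proxies)}, hump height {height:.2f}")
```

The printed hump heights can be compared with the true height of 1. The stricter threshold can only keep a subset of what the looser one keeps, so it is selecting harder on the calibration-period slope.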

Now, if you imagine the zero temperature curve on the above graph from the 1900 point following the slow arc through to 1000 AD you can see what has happened to the true zero temperature of the graph. Then imagine a similar but upside down arc extending from the 2000 year 1 degree C to the 1250 0.7 degree mark, and you can visualize what happened to the actual temperature scale over the years.

What happened here isn’t the result of my manipulation of the data, this was a realistic test of the standard methodology of paleoclimatology. What the scientists are missing is the point that when you scale (magnify) data based on recent times the HISTORIC scaling is affected by the noise level in the data. If your noise has a slope which is greater than the trend you are sorting by, the historic data will be demagnified! If the noise level has a lower slope than the calibration data it will be skewed the other way creating a historic magnification of the trend!

I hope that every paleoclimatologist reads this post. My point is the same as when I found out what was going on. You cannot sort data based on a trend you feel should be in it. If you do, the historic data is not on the same scale as the sorting (calibration) year data (i.e. 1900-2000 compared to 1200-1300). The scaling is affected by many things including the ever present slew rate of the noise.

For those who read this, some will think this is a minor point or just another crazy blog. This is not, it is proof of the flaw in the methodology of an entire science.

As exciting as this was, tune in tomorrow for my next post which will explore the effect of noise slew rate on the above graphs. Sounds fun don’t you think?

20 thoughts on “The Flaw in the Math Behind the Hockey Stick”

  1. Thanks for another very interesting post.

    I like this approach of building from a simple model like this. It has finally helped me understand why it may seem reasonable for someone to only choose the proxies that correlate well with the actual recent temperature record whilst clearly showing the possible dangers in this method.

    I’m looking forward to your next post.

  2. It’s good to see you have a little lawyer in you! As we talked about before, this is all about compiling “evidence” to build a case. In this case, the math is your “smoking gun”.

    Now, the bigger question is…what do you do with it?! Posting it on a blog will never get the audience (and by this I mean academicians and policy makers) this work deserves. You need to constrain your anger (I know…good luck with that one) and find a way to put all of this information to good use in the form of a “peer reviewed” journal. Only then will you join the ranks of McIntyre et al as a truly hated man (and I mean that in the most complimentary way)!

  3. That really is excellent work. Thank you for the effort. IMO, this should be fine tuned, formalized, and published. The simplicity and clarity of the analysis make it quite devastating.

    These stupid hockey sticks need to be put to rest once and for all. Maybe then sharp minds can turn to solving real problems instead of wasting time untangling Mann’s nonsense.

  4. Here’s another interesting trick:

    Reverse the x-axis after adding your fake hockey stick to your random data. Then run that through the hockeystick finder.

  5. Lee, I want to develop this more fully. If I can finish the math I will consider publishing. I am convinced it is a real effect and would change the shape of every hockey stick graph which sorts proxies by correlation.

    I will probably need a partner in the field to finish the work though to help with references and other details just to complete the concept.

  6. Jeff, any feedback on this work from some of the more well known skeptics? I’m a little surprised it hasn’t gotten more attention.

  7. #6

    I am a bit surprised myself. I did receive an email requesting publication but damn it seems to me that mathematically demonstrating a serious flaw in every HS would be bigger news.

    It is only a blog though.

    I am working on the next step. If I’m right it will get a bit of attention for the rest of this work.

  8. I think this appears to be a really neat piece of work. What I think is impressive, is being able to reproduce all the major features of the hockey-stick from a very simple set of assumptions. You get the sharp rising blade at the end. You get the long straight-ish, slightly declining shaft. You also find that any real bumps in the past get attenuated by the technique.

    You get all this from only a small number of assumptions. Firstly, that proxies, even good proxies, will have errors that are red rather than white noise. Secondly, that they are selected/filtered/weighted depending on their correlation with a rising value at the end of the time period.

    It’s a good sign, when you can make a small number of assumptions and your model can then produce a number of useful predictions. This is, pretty much, the opposite of most climate science where they make a huge number of assumptions to get one output.

    I’d observe that McIntyre and McKitrick’s work is a specific disproof of the PCA techniques used in the original Hockey stick. What you have here, is more like a general disproof.

    I think this white vs. red noise point is interesting. Physically, the question is this: is the temperature at a given moment in time some average plus or minus a random amount of noise? Or is it a random amount added/subtracted from the temperature in the last time interval? It seems reasonable to assume that temperature today depends on temperature yesterday. Whether that is reasonable over a longer period I can’t say. The global warming advocates claim that it is. They fret that there is so much heat stored up in the system that global warming will continue even if we cut back emissions now. Real temperature records, with a bit of smoothing, certainly look more like red noise than white.

    Red noise can also be thought of as similar to the mean free path in kinetic theory. And I think I was introduced to this in school maths with something called the drunkards walk. There has been much discussion on Climate Audit about applicability of methods depending on whether the data is stationary or non-stationary. As I read it that’s equivalent to white and red noise again.

    In fact, I think you can probably show that, some of the techniques used by Mann and others would work and would produce the right answer but only if the proxy datasets were accurate temperature measurements with errors that were white noise rather than red (i.e. did not include another trend – whether forced or random).

    I think it would be interesting to demonstrate that you get a hockey stick under a wide range of sampling/filtering techniques. It might also be interesting to test that this doesn’t happen if the noise is white – and perhaps show that the techniques used by Mann and others can actually work under that situation. Then what would be really impressive is to show just how much or how little of the error needs to be red to get a hockey stick.

    You could end up with a set of requirements which any purported proxy analysis would have to show were met. You might be able to set a minimum number required and a minimum required quality standard for the proxies.

    Personally, I think that there is more theology than science in much of this. And rather like theology the proofs offered tend to be circular ones. If you assume God exists it is possible to prove God exists. If you assume that proxies will show a strong temperature rise in the last century of the series – then you can prove that.

    In years to come, the hockey stick will come to have the same scientific status as N-rays or the Martian Canals.
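The white versus red distinction raised in comment 8 can be made concrete with a one-parameter AR(1) model. This is a hedged sketch: the model and the three `phi` values are chosen for illustration and say nothing about how red real proxy errors are.

```python
import random

def ar1(n, phi, rng):
    """x[t] = phi * x[t-1] + white noise; phi=0 is white noise, phi=1 a random walk."""
    out, v = [], 0.0
    for _ in range(n):
        v = phi * v + rng.gauss(0.0, 1.0)
        out.append(v)
    return out

def lag1_autocorr(x):
    """Sample correlation between each value and the one before it."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    return num / sum((a - m) ** 2 for a in x)

rng = random.Random(5)
for name, phi in (("white", 0.0), ("red (AR1)", 0.9), ("drunkard's walk", 1.0)):
    value = lag1_autocorr(ar1(5000, phi, rng))
    print(f"{name:16s} phi={phi:.1f}  lag-1 autocorrelation = {value:+.2f}")
```

One caveat on the stationarity remark: an AR(1) series with `phi` below 1 is red but still stationary, so red versus white is not quite the same split as non-stationary versus stationary. Only the `phi = 1` random walk is non-stationary.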

  9. Long Question, bear with me here:

    If we assume that these two things are true:
    1) temperature proxies are not as good an indication of historical temperature as some people assume, and
    2) observed, non-“messed with” temperature data (from NASA, NOAA, etc.) shows warming over the past 120 years (on the order of slightly less than 1 degree C);
    then doesn’t it follow that to filter out the worst temperature proxy datasets by means of comparing them to the observed temperature would be a good thing?

    My point, in slightly different words: you seem to be asserting that throwing out temperature proxies that don’t fit a predefined curve in the last 100 years is bad science, but wouldn’t it be a good thing to throw out the temperature proxies that don’t fit on the assumption that if they don’t match the observed temperatures of 10% of the last 1000 years that it’s also very likely that they wouldn’t match the actual temperatures in the other 900 years?

    Or are you assuming that temperature proxies have no value whatsoever?

    Thanks for the very interesting blog posts. They’ve certainly given me a lot to think about.


  10. JimmyBH

    Good questions, it seems reasonable on the surface and it is what you are supposed to think. There are several problems though. First, the correlation required is only a loose correlation not a tight excellent fit. Second and more importantly, during the correlation process a best fit amplification of the data is applied. If you have 100 tree measurements, each higher correlation tree is accepted and scaled by a totally different value.

    If trees ONLY react to temperature and other influences created no noise, this might work ok. If you add any noise to the dataset through things like CO2 (which is basically plant food), nutrient levels or moisture level etc., the slope is altered up and down according to these influences as well. The net result creates a statistical amplification and offset of the data which can be calculated. Every proxy I have seen except borehole data is extraordinarily noisy.

    If the noise level present in the collection of proxies creates more positive slopes than negative slopes you get a de-amplification of historic trends as compared to the calibration period (sorting years) created by the noise averaging and de-amplifying these signals.

    It is clearly bad mathematics.

    There are other reasons that tree ring proxies should never have been used for temperature. They are clearly not linear thermometers. The AGW scientists started by blindly assuming linearity and recently have intentionally ignored this fact straight in the face of massive scientific evidence to the contrary. But that is another issue entirely, my post only addresses the statistical amplification of local trends compared to history.

    My other more recent posts have much better demonstrations than this first one.

    Thanks for the interest.

  11. Jeff

    Have you thought about inaccuracies in time at all? (apologies if you’ve covered this elsewhere)

    How about doing the same experiment with the warm period added to each series. However, this time assume that the dates were measured inaccurately, so that for some measures the warmth occurs up to 50 years earlier, and for others up to 50 years later (with a normal spread for the lead and lag). What happens when this is averaged out? Presumably it tends to a flat line
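The dating-error experiment proposed in comment 11 is easy to sketch without any noise at all. The assumptions here: a triangular 1200-1300 hump, 1000 series, and a normally distributed dating error with a 25 year standard deviation, so that about 95% of the errors fall within 50 years either way.

```python
import random

def hump(year, center, half_width=50.0, height=1.0):
    """Triangular warm period centred on `center`."""
    return max(0.0, height * (1.0 - abs(year - center) / half_width))

rng = random.Random(7)
years = list(range(1000, 1501))
n_series = 1000

# Each series sees the same warm period, misdated by its own random error.
centers = [1250 + rng.gauss(0.0, 25.0) for _ in range(n_series)]
mean = [sum(hump(y, c) for c in centers) / n_series for y in years]

peak = max(mean)
width = sum(1 for v in mean if v > 0.05)       # years where the average exceeds 0.05
print(f"peak {peak:.2f} (true 1.00), width {width} years (true hump spans 100)")
```

Under these assumptions the average does not go fully flat. The hump is lowered and widened, and how far it flattens depends on the size of the dating error relative to the width of the warm period.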

  12. Since this is proper, rather than ‘climate’ science, you should be interested in getting it aggressively examined, and ‘disproved’ if possible.

    May I suggest that you draw it to the attention of Tamino and the Real Climate team? Even if they are unable to find any holes in it, their attacks will, I feel, be of general interest in revealing their position. And if they do not comment…..

  13. Jeff,
    The problem with this analysis is the same as the problem with the previous blog on this subject.

    What the paleoclimatologists do is a two step process.
    1) Calibration of proxies to known temperatures for part of the period during which there is a temperature record.
    2) Verify that the calibration of the proxies does work for a different part of the temperature record.

    In the case of the recent paper by Mann et. al. 2008 they performed that procedure with their proxies in 2 different ways using the 150 year period between 1850 and 1900. They used a 50 year period to calibrate the proxy with temperature, and a 100 year period to validate the calibration. They tried reconstructions using the first and last 100 years as the validation period.

    In addition it seems that the real temperature data was used for validation, not a linear approximation.

    This is a lot different from what you did with your random data to choose which of the noise generated data to pick for your reconstruction.

    Suppose you used a calibration and validation procedure similar to what was done in Mann et. al. 2008, on your noise generated data. How many of your noise generated proxies would pass?

  14. Oops,
    The 150 year period between 1850 and 2000 was used for the calibration validation procedure to choose the screened data.

  15. Eric, I see that you have misunderstood my poorly titled post.

    It is poorly titled because there are hockey sticks which use different math. The CPS method represented is used in multiple papers though.

    First thing, I am just about as close to this paper as anyone. I have read all the files in huge detail, downloaded the data used as well as the accidentally published 1357 series data. I personally have discovered several flaws in the data pointed out around this blog and on CA.

    The points you make are simply statements of method in Mann08, he was quite open about his methods except for the EIV reconstruction which is difficult to decipher.

    What I do in this post is use near-identical methodologies to show that CPS methods cause distortion in the historic signal relative to the “calibration range”. This distortion is a function of the frequency and magnitude of the noise level in the data. By simplifying the math to a point where most can understand it, I was able to demonstrate this effect with a minimum of confusion.

    I used more complex patterns and methods in other posts to reveal the same effect.

    When you say,

    This is a lot different from what you did with your random data to choose which of the noise generated data to pick for your reconstruction.

    It makes me wonder if you have enough math background to understand the simplicity of this demonstration. I’m not intending that to be critical, you are clearly a smart guy but I get a lot of different types of people here. The posts on this blog are quite clear on this topic but require a small amount of math background.

    Suppose you used a calibration and validation procedure similar to what was done in Mann et. al. 2008, on your noise generated data. How many of your noise generated proxies would pass?

    You are asking the right question; this is the key point of Mann08. Please consider though that over 90% of the proxies had upslopes pasted on them inside the calibration range (think about what that would do to the magnitude of the rescaling) prior to correlation analysis. r values were computed for all proxies based on gridded data, and the ones with the best match to temp were kept while the remainder were thrown away. You can see them in SD1 online.

    Regarding this post.
    A number of publications have shown that this ‘sorting’ is not an acceptable method for proxy analysis as it guarantees a result from noisy data. This is something unique to climatology and is not acceptable as a form of science. Sorting based on reasoning for choosing a proxy is acceptable when the reasoning is openly explained, however sorting for the shape of the curve you want was the #1 reason I started my AGW kick. My blog, in August was to be on science and politics not AGW.

    High autocorrelation effects from filtered data also become an issue. This was extraordinarily badly addressed in M08 resulting in a higher acceptance level of even the faked-modified data.

    I hope that you will consider these arguments with an open mind. They don’t invalidate AGW in any way, but the reality is that this paper is as flawed as anything ever published. The math is so bad, I personally believe it is done with intent.

    Here is the quote I promised earlier which is from page 1,

    Because of the evidence for loss of temperature sensitivity after 1960 (1), MXD data were eliminated for the post-1960 interval. The RegEM algorithm of Schneider (9) was used to estimate missing values for proxy series terminating before the 1995 calibration interval endpoint, based on their mutual covariance with the other available proxy data over the full 1850–1995 calibration interval. No instrumental or historical (i.e., Luterbacher et al.) data were used in this procedure.

    We didn’t like that the MXD data had a downslope, so we chopped it off, pasted on an upslope using fancy interpolation from only 55 proxies and passed 90% of it through calibration.

    You can confirm my percentages yourself in the SD1 data file in the first of 3 r correlation columns. It’s the Schweingruber data, also referred to as Briffa on CA.

    CA got access to the original data and it resulted in a post here

    If you read and look around with an open mind, it will confirm my remarks and technique.

  16. Jeff: What a mess. Doesn’t it make you stop and think when Steve McI does not endorse this sort of stuff…that maybe, just maybe you don’t have such a stunning final or even initial straw? Steve is very particular to the social dynamics and avoids correcting the skeptic side. But he will still very carefully fail to endorse stuff that is just stupid.

    Given, you screwed up so much here in just basic issue analysis, doesn’t this make you pause and think that you need to be more careful on stuff in the future? More careful not to jump to conclusions?

    P.s. You led me to dhog. Dhog led me here.

  17. I see nothing wrong with the issue analysis TCO. This post is correct and a clear demonstration of how the signal becomes distorted in time.

    If I’m wrong, please tell me where the error is so I can correct it.

    There is nothing stupid I can see in the above post except for you having so limited a mathematical knowledge you can’t figure it out. I redid this post in build your own hockey stick and posted the R code for it. Give it a run and tell me specifically what’s wrong with it or stop being a complete Idiot. You’re better than this – be all you can be.

    Also, what does Steve M’s endorsement of this post mean either way. I was rather left with the impression from Steve that it looks right to him, the math is too simple not to. He said- don’t say every hockey stick B/C they are made differently.

  18. A. Your procedure is simpler than Mann’s, which looks at multiple periods and at more than linear fits.
    REPLY:Not correct, the procedure is analogous as proven in make your own hockey stick.

    B. Obviously, you have to use the current instrument period to qualify proxies. You need to watch for data mining. And physical arguments help you. But just speaking out against any calibration from observed temps is insane.
    REPLY:Calibration of a noisy dataset using this method invariably results in matching your intended signal creating good ‘verification’ while distorting any real signal in the data as proven here and in the other post.

    C. Note that here you are speaking out against using a trend for qualification, but in the Steig case, you have a (unquantified and not clearly described with math logic) argument against “high frequency” matching. Any port in a storm, eh!
    REPLY:Disaggregate TCO, different problems entirely. These papers have no similarities I see.


    You can do better than this.

  19. Stop voice in Goding me. That’s an RC tactic.

    REPLY: Be honest in your criticism and you’ll get treated better. You haven’t even read the material and you are already jumping on it. I pointed out to dhog the material in my post actually WAS published already.

    If you want rights to post openly, you are absolutely required to be honest. This is what led to the removal of some of your posts. You were way too fast to criticize and are way off the mark, indicating a dhog style troll driven by whatever personal craziness your life gives you. Post well, ask questions, figure it out and then you can tear it up. The material is correct however so don’t be afraid to agree.
