the Air Vent

Because the world needs another opinion

Hockey Stick Explanation

Posted by Jeff Id on July 6, 2010

There is quite a bit of confusion about the nature of hockey stick temperature reconstructions.  Currently, many non-paleo climate scientists seem to want to avoid the discussion altogether, yet these studies still pass freely through review, which seems to me a very biased state of affairs.  I reported here on an open review of a paper by Ammann on a different method for scaling proxies to correct for variance loss in proxies.  As I read it, it looks like a method which will get closer to a proper solution but not fix the problems.  It seems that some climate scientists have fully recognized the problem of Mannian-style reconstructions and are interested in improving the results.

This has probably come about after the NAS panel's report on Mann's hockey stick, but whatever the reason, it is good news.

In blogland, people tend to see the hockey stick as a temperature graph that Steve McIntyre debunked.  A commonly missed point is that the first hockey stick was the result of a mathematical error causing the preferential selection of a certain group of high-variance proxies.  Since that time, that particular error has been corrected, but current hockey sticks are created by different proxies and methods.  All of the global temperature methods that I have read – and there have been many – try to linearly re-weight multiple proxies to provide the best match to measured temperature.  Since the proxies are noisy – very noisy – this reweighting process preferentially selects noise which happens to create better agreement with measured temperatures and deweights the noise which doesn't.  The result is that the signal in the measured-temperature region becomes a good match (because of the noise) while the historic noise is unsorted and randomly combined.  The near-guaranteed result, when the measured temperature you are matching is an upslope, is a flat pre-measurement handle and an unprecedented blade.  Again, this is due to the noise and has nothing to do with the signal.  Nobody even knows if there is a temperature signal in trees.
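The effect is easy to demonstrate with a toy simulation (my own sketch, not any particular paper's algorithm): weight pure white-noise "proxies" by how well each matches an upslope over a calibration window, add them up, and a hockey stick appears from data containing no signal whatsoever.

```python
import numpy as np

rng = np.random.default_rng(0)
n_proxies, n_years, cal_len = 1000, 1000, 100

# Pure white-noise "proxies": no temperature signal in them at all.
proxies = rng.standard_normal((n_proxies, n_years))

# The measured record to match: an upslope over the calibration years.
temp = np.linspace(0.0, 1.0, cal_len)
t_c = temp - temp.mean()

# Weight each proxy by its OLS slope against temperature over the
# calibration period (the last cal_len years of each series).
cal = proxies[:, -cal_len:]
weights = cal @ t_c / (t_c @ t_c)

# The weighted composite: a "reconstruction" built from nothing but noise.
recon = weights @ proxies / n_proxies
handle, blade = recon[:-cal_len], recon[-cal_len:]

print(np.corrcoef(blade, temp)[0, 1])  # strong match in the calibration window
print(blade.std() / handle.std())      # blade variance exceeds the flat handle
```

The blade tracks the calibration upslope closely while the pre-calibration handle is flat, low-variance noise, exactly the unprecedented shape described above.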

Now in the link above, ordinary least squares (OLS) is compared to the new method, which regresses one proxy at a time using the OLS method to estimate residuals.  It doesn't matter if you don't get that part, because it's just another way to calculate what to multiply each series by before adding them together.

Items which match better to temperature still get more heavily weighted than those which don’t.

Climate scientists like to call it variance loss in the low-frequency signal; I prefer to refer to it as variance amplification of the noise.  The OLS method shown is, like many methods, completely insensitive to the sign of the proxies.  In other words, a downslope proxy with an inverted temperature profile will be flipped upside down and weighted heavily.  Of course, the physical meaning of reading a thermometer upside down (because you like the fit better) is nonsense.  I found the discussion by the climate scientists posting replies to Ammann's paper interesting in that they acknowledge and understand that Mann's latest reconstructions will likely exhibit these characteristics, but many seem to have failed to understand the reason for this variance amplification problem.  They discuss testing various methods against different types of noise and this sort of thing, an excellent idea, but really seem to skirt around the root cause of the AUTOMATIC AND GUARANTEED variance differential between the calibration range and the historic range. — See hockey stick posts linked above for more.
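The sign blindness takes only a few lines to show (again my own toy example, not the paper's code): hand OLS a proxy that moves opposite to temperature and it cheerfully flips it over with a negative coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
temp = np.linspace(0.0, 1.0, n)

# A physically "upside-down" proxy: it moves opposite to temperature.
inverted = -temp + 0.1 * rng.standard_normal(n)

# OLS fits it with a negative coefficient, silently flipping the
# proxy over, regardless of whether that makes any physical sense.
X = np.column_stack([np.ones(n), inverted])
coef, *_ = np.linalg.lstsq(X, temp, rcond=None)
fit = X @ coef

print(coef[1])                        # negative: the thermometer is read upside down
print(np.corrcoef(fit, temp)[0, 1])   # yet the "fit" looks excellent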

If you have interest in these things, the link above and the replies in the interactive discussion are quite interesting and informative.

Here is a comment which I noted in my previous post, made by one of the reviewers in the interactive discussion.

However, it is well established in the statistical literature that traditional regression parameter estimation can lead to substantial amplitude attenuation if the predictors carry significant amounts of noise.

This has been an endless point made here, but still many people have failed to understand the difference between these methods and the Mannian original hockey stick method.  It's also worth noting that these methods are applied throughout proxy-based climatology and I've not seen a single good one.  Dr. Loehle made the best real effort by averaging pre-calibrated curves, but his source data could very well be nonsense, as nobody has any proof that these proxies are in any way temperature.  Other papers using multi-proxy methods include rainfall estimates, sea ice extent and even one on coral that was run at Climate Audit.  They all seem to contain the same kinds of regressions, they keep finding unprecedented results, and they keep being passed through review despite these known MAJOR issues.
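The amplitude attenuation the reviewer describes is the textbook errors-in-variables effect, and a short simulation (my sketch) reproduces it: regress temperature on a noisy proxy and the fitted slope shrinks toward var(signal)/(var(signal)+var(noise)), squashing the reconstructed amplitude.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
signal = np.sin(np.linspace(0.0, 20.0, n))   # "true" temperature history
proxy = signal + rng.standard_normal(n)      # proxy carries heavy noise

# Calibrate: regress temperature on the noisy proxy, then "reconstruct".
s = signal - signal.mean()
p = proxy - proxy.mean()
slope = (p @ s) / (p @ p)
recon = slope * p

# Classical attenuation: the slope shrinks toward
# var(signal) / (var(signal) + var(noise)), about 1/3 here.
print(slope)
print(recon.std() / signal.std())            # reconstructed amplitude is squashed
```

This is the variance loss in the historic period; the calibration-period match is kept good by the weighting, which is the variance differential discussed above.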

Anyway, it seemed worthwhile to call attention to this paper again and to try again and explain what is creating so many unprecedented paleoclimatology curves.  I hope that climate scientists continue their progress toward being honest about the horrible state of paleoclimatology and step back from the “unprecedented” language.  So far, there has been very little change.


25 Responses to “Hockey Stick Explanation”

  1. Kenneth Fritsch said

    Jeff ID, I hope you realize how important it is for others' understanding of temperature proxies that you continue to make the points that you do here. Some of these points are touched on by some of the not-so-rabid advocates in the climate world, but even then they appear to me to be made almost apologetically.

    That blind spot that you refer to by some climate scientists in this matter is, I think, caused by their assumption of a rather large climate effect from GHGs with positive feedback (from the consensus thinking) and thus their resulting assumption that there must be a temperature signal in the proxies that shows this effect. That what some of us see and explain as primarily white noise in the pre instrumental period that is calibrated and selected in the instrumental period to give the HS blade is just too difficult a concept for them to comprehend.

    The original Mann et al. proxy showing the nearly straight-line historical temperatures attached to the modern blade is what one would expect from treating the white noise in the manner that you have noted. The timing of that publication and the IPCC's use of it made it iconic for the climate scientists/advocates and thus very difficult for them to be critical of it. In fact the opposite was done: the involved scientists chose to defend it and add to it, to the point that their arguments became almost bizarre. I think it was at this point that the tone exposed in the climategate emails became prevalent in the climate science community, to the point that normally anticipated scientific criticism of the HS nearly dried up. This situation then fed upon itself, to the point that a belated criticism becomes almost an admission of some dark hole in the entire peer review system, which in turn weakens the scientist/advocate arguments for AGW mitigation.

    I think over time I see Mann himself as less of the problem in the HS matter and the climate science community as a whole as more of the problem. Unfortunately I think we can find Mannian-type scientists in all areas of science where claims are made rather prematurely for something that at first glance appears to conclude something new and unique. That individual scientist can, because of pride and ownership and, as in Mann's case, a strong advocacy position, have a rather difficult time backing away from his original conclusions, but that is normally compensated for by other scientists in the field exercising their scientific skepticism. Unfortunately, in the climate science community the mantra has been consensus, and skepticism has been given a negative connotation.

  2. Jeff Id said

    Thanks Kenneth,

    I keep trying to think of better ways to write it. I’m not at all sure what the future of this holds but the reaction to the Ammann work was a positive.

  3. Derek said

    I think I might just be beginning to grasp what you are trying to describe in this thread.
    Illustrating it with a completely different data set / subject might be an idea,
    especially if it gets a really silly result, to help explain to blogland.

    Idea – Find a data set, NOT climate science related, treat it in the above statistical manner,
    reach silly conclusions, and submit the results for publication.
    Would it be published?

    This might help separate out the problem you are trying to illustrate to us all, yet again. AND,
    thank you for the patience to try again, it will sink in eventually, hopefully.
    – It is beginning to with me, and I only have CSE maths – lowest of the low in the UK.

    I also wonder if the temperature / linearity issues and proxy / temperature reliability are clouding your explanations in some people's minds.

  4. Brian H said

    On a related matter (data selection), you might like to have a look at this, if you haven’t already:
    “In a paper submitted to a US Senate Committee on Commerce, Science, and Transportation hearing Professor Zbigniew Jaworowski explains,“The basis of most of the IPCC conclusions on anthropogenic causes and on projections of climatic change is the assumption of low level of CO2 in the pre-industrial atmosphere. This assumption, based on glaciological studies, is false.” This means more when you know that Tom Wigley, who is the heart of the CRU gang, introduced the 280 ppm number to the climate science community with a 1983 paper titled, “The pre-industrial carbon dioxide level.” (Climatic Change 5, 315-320). He based his work on studies by G. S. Callendar (1938) of thousands of direct measures of atmospheric CO2 beginning in 1812. Callendar rejected most of the records, including 69% of the 19th century records and only selected records that established 280 ppm as the pre-industrial level.”

    See this graph: http://www.canadafreepress.com/images/uploads/ball122809-1.jpg
    showing his data selection.

  5. Steve Fitzpatrick said

    Jeff,

    I understand the problem with multiple regressions on noisy data that may not be stationary, but I am not sure I see a solution. Only independently determined functions that describe how the individual proxies should vary with temperature (that is, functions based on physical rationale, not on a "calibration" against the temperature record) would seem to give any scientific legitimacy to the whole exercise.

    Do you see any defensible approach?

  6. Jeff Id said

    #5 Perhaps if the data were variance-normalized over its entire length, all series then averaged and the result scaled to temp. I don't see much else that can be done with it reasonably. Mann infilling data with his favorite hockey stick blades and then regressing RegEM-style is about as ugly a process as I could imagine. It does make a HS though.

  7. Steve Fitzpatrick said

    Jeff #6,

    So how would you do that? I imagine you could 1) calculate the mean for each proxy over the entire record, 2) calculate the deviation of each from its own mean, 3) change the sign of the deviation, if needed, so that a positive value means higher temperature, and 4) scale all of the deviations so that they have the same total variance. You could then average the lot and correlate the later part of the average against the instrument record. Does this sound reasonable to you?

  8. Jeff Id said

    I was thinking of something along those lines. The proxies have to be upside right of course, but with varying types of proxies they may need to be done in individual groups, i.e. tree ring widths computed separately from mollusk shells or something. If they weren't done separately, the autoregressive component of the high-frequency proxies would cause an artificial amplitude reduction in comparison to low-frequency responders like sea ice. If you did all the sea ice together and the trees together and then scaled and averaged again, it might improve those issues.

    It’s hard to talk seriously about it because the proxies have a high probability of being mostly unrelated to temp…. at least imho.
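The recipe in #7 can be sketched in a few lines of toy code (mine, and with the crucial caveat that each proxy's orientation must come from physical knowledge, never from the fit):

```python
import numpy as np

def simple_composite(proxies, orientations):
    # Steps 1-4 from comment #7: center each proxy on its own mean,
    # flip any physically inverted series so positive = warmer,
    # scale all series to equal variance, then average.
    proxies = np.asarray(proxies, dtype=float)
    centered = proxies - proxies.mean(axis=1, keepdims=True)
    oriented = centered * np.asarray(orientations, dtype=float)[:, None]
    scaled = oriented / oriented.std(axis=1, keepdims=True)
    return scaled.mean(axis=0)

# Toy check: a signal-bearing proxy plus a physically inverted one.
rng = np.random.default_rng(3)
sig = np.sin(np.linspace(0.0, 10.0, 300))
proxies = [sig + rng.standard_normal(300),
           -sig + rng.standard_normal(300)]   # known-inverted proxy
comp = simple_composite(proxies, orientations=[1, -1])

print(np.corrcoef(comp, sig)[0, 1])           # the signal survives averaging
```

Because no proxy is reweighted by its fit to measured temperature, there is nothing here that can preferentially amplify calibration-period noise, which is the whole point of the approach.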

  9. Derek said

    #5 & #6 Surely if you have x number of variables affecting something, you need to quantify the effects of them all to know what one variable contributed, and how it varied?
    Otherwise how do you know the difference is attributable to one variable only?
    Tree rings, I seem to remember, have at least 5 major variables affecting their width.

  10. Jeff Id said

    #9 Correlation baby, that’s how. No correlation after the process then the S/N doesn’t exist. Mann cheated by throwing out 60% of his uncorrelated data in his 08 paper, he got a decent match too but played a lot of what can only be called ‘games’ to get there.

    Obviously, you are right though and I suspect that’s why we don’t see these methods used. After the process they give bad correlation.

  11. B. Kindseth said

    Jeff:
    Anthony Watts at http://wattsupwiththat.com/2010/07/06/co2-field-experiment-likely-to-cause-do-over-for-climate-models/ and Pierre Gosselin have postings today about a couple of papers from the Max Planck Institute. Of particular significance is the first highlight that pgosselin posted, “In most ecosystems, the photosynthesis rate at which plants fix carbon dioxide from the atmosphere changes relatively little as the temperature varies.” Doesn’t the growth rate of trees, i.e. tree rings, depend directly on the photosynthesis going on in the tree leaves? If this process “changes relatively little as the temperature varies”, then one could conclude that the validity of tree rings as a proxy for temperature is questionable.

    I just finished Bishop Hill’s book, “The Hockey Stick Illusion” and I would highly recommend it to your readers. The data selection process (Others may call it cherry-picking) and the contortion of science to create a hockey stick is mind blowing. I raised the question,
    “Does a tree’s growth rate respond to temperature?” on my website, http://www.socratesparadox.com

  12. AMac said

    This is somewhat off-topic, but does provide some context for the discussion, perhaps. Referee #3, Kevin Anchukaitis, commented at C-a-s back in May to defend dendrochronology. I addressed a response to him downthread. But there was no follow-up.

    I have often wondered about the problem described by Jeff, “The near guaranteed result when the measured temperature you are matching is an upslope is a flat pre-measurement handle and an unprecedented blade.”

    The answer seems to be that it’s not a question that paleoclimate folks are interested in asking (though maybe a closer read of the paper will make me eat those words).

    For an extreme version of the Consensus’ “any proxy’s good, as long as it shows what I want to see” perspective, see the six-comment sub-thread at Arthur Smith’s site.

  13. mikep said

    There is also the problem of spurious regression, where correlations between two variables are the result not of causation from one to the other but of both depending on some third factor, and of nonsense correlations (high correlation but no causation), which can very easily arise from comparing two non-stationary time series. Both problems were analysed by Udny Yule early in the 20th century, and econometricians have developed ways of avoiding nonsense correlations.
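Yule's nonsense correlations are easy to reproduce (a quick sketch of my own): correlate pairs of completely independent random walks and large correlations show up routinely, while first-differencing, the standard econometric remedy, makes them vanish.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 300, 500
corrs_walk, corrs_diff = [], []

for _ in range(trials):
    # Two completely independent non-stationary series (random walks).
    a = rng.standard_normal(n).cumsum()
    b = rng.standard_normal(n).cumsum()
    corrs_walk.append(abs(np.corrcoef(a, b)[0, 1]))
    # First-differencing restores stationarity.
    corrs_diff.append(abs(np.corrcoef(np.diff(a), np.diff(b))[0, 1]))

print(np.mean(np.array(corrs_walk) > 0.5))   # nonsense correlation is common
print(np.mean(np.array(corrs_diff) > 0.5))   # and disappears after differencing
```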

  14. Jeff Id said

    #13 “There is also the problem of spurious regression, where correlations between two variables are the results not of causation from one to the other but of both depending on some third factor”

    I think you’re right, but the problem is even trickier when the data is so noisy and inverse correlations are common. In the least squares method, or more aptly the RegEM method (not much different), the calculation cannot tell the difference between positive and negative correlation. Once one normally positive yet noisy proxy gets a negative correlation that fits better, spurious errors are created. I mean, we know how sediments respond to temp, but in regression the signal is often flipped. Steig et al. was a good example: when you run a regression which doesn’t care about the sign, the algorithm may find that the final result is better when one item is flipped and others magnified to compensate.

    Truncated total least squares is an effort to fix this exact situation, except that climate science prefers to see it as ‘overfitting’ rather than poor methods. I’m a coauthor on an Antarctic temperature paper with Ryan, Nic and Steve which is currently experiencing resistance to publication. Ryan and Nic put massive effort into maximizing the fit without overfitting. The continental trend came out right, a fact I can disclose because it matches simple methods well. Still, my preferred methods are the simple ones: average, area-weighted average, Roman’s offset method. Those kinds of things.

    My guess is that there are a lot of sciences which have covered similar ground to paleoclimatology, but few with so much noise in the data. Paleoclimate has really lost its way and the responses to Ammann in this link represent to me the beginnings of recognition of that fact.

  15. Brian H said

    @BK, #11 “B. Kindseth said
    July 6, 2010 at 6:10 pm

    ” If this process “changes relatively little as the temperature varies”, then one could conclude that the validity of tree rings as a proxy for temperature is questionable. ”
    _____
    No more so than phrenology, surely! 😉

  16. But do you have enough proxies to spatially grid them?

    Seriously, isn’t there an issue with using spatially sparse proxy data and “matching” it against a gridded global temp (or temp anomaly)?

  17. stan said

    Jeff,

    OT — re: Muir Russell silliness

    I posted this question to Pielke, Jr and I’m curious what you and your readers’ takes are.

    The Muir Russell report has been released. Once again, an ‘official investigation’ has been conducted with only one side being interviewed. I have a question for you — I have to think that the sloppy, one-sided efforts are deliberate. Do you think that the blatantly obvious shortcomings of the process are really a signal by the investigators?

    I’m serious. Assume that the people involved all understand their marching orders are to produce a whitewash. Having some shred of integrity, they deliberately avoid making the effort of dotting all the ‘i’s and crossing all the ‘t’s that a thorough quality effort would require. Since the ‘investigation’ is clearly a coverup to anyone looking at the incompetence of the process, they are signaling that they aren’t happy with what they have had to do. It’s subtle sabotage.

    Why else would they produce such obviously inadequate investigations? They don’t even bother to go through the motions.

  18. Kenneth Fritsch said

    I am disappointed that there has not been more discussion of the technical note by C.M. Ammann et al. (2010). The paper presents, in my view, some good background on the attenuation problem where the calibration and prediction periods have different signal to noise ratios. They suggest that smoothing and filtering noise can remove some of this problem. They also point to orthogonal regression techniques as possible solutions after warning about “dangers” in using these techniques.

    Of interest in applying the “correction” is Fig. 1 in the paper, where the known and OLS methods render different reconstruction biases but approximately the same relative lower uncertainty limits, whereas the methods with corrections lead to biases smaller than OLS but with larger CIs. For the correction described in this paper by the authors (ACOLS), they note that the variance is somewhat increased, when in fact it is dramatically increased.

    The authors also use a rather limited spatial computer model to test their correction method. The critical step in their procedure is in estimating a value for the variance of the white noise component in the predictor time period. The authors point to the introduction of red noise as leading to significantly greater uncertainties in the reconstruction results.

    I need to read the SI to understand better what the authors actually did with regard to adding noise to the predictor, but what I have a problem with in these papers is that they are published to improve a previous method while at the same time, unfortunately, often attempting to show that the previous results with regard to AGW remain totally valid. Sometimes to do that they have to introduce estimates and parameters that do not cover the full range of what could occur in a more skeptical world. I am always watchful for this occurrence.

    One can see that the noise-to-signal ratio in Fig. 1 of the paper is critical to the reconstruction making any sense at all, and the authors have only shown what their correction does when the ratio is equal to 1. Knowing that these papers often bury the uncertainties and the more revealing sensitivity testing in the SI, that is where I will read and report next.

  19. Jeff Id said

    #18, I think that when I stop blogging about technical stuff for a couple of weeks people move on. I agree with your criticism that the work attempts to make previous work sound ok, when in fact they are demonstrating why the previous work stinks.

  20. AMac said

    I have a problem with uncertainty estimates and error bars in many paleo reconstruction papers.

    The starting assumption is, “there is a correlation of X between local temperature and proxy value through the instrumental record.” Where X = 0.4 or a similar low but meaningful number. Now, we assume that the correlation of X is maintained for the time period under study.

    But we can’t know that is the case. Tree rings strike me as especially vulnerable to examination. I gasped when I learned that tree ring readings taken from a stand of trees aren’t all incorporated into the proxy (or randomly selected). Instead, I understand that it takes expert judgment to select those records which correlate with temperature during the calibration period. And it takes expert judgment to select records for use from the reconstruction period, as well. (Hopefully you can tell me that this is incorrect.)

    If this is indeed how it’s done, there are potentially large sources of uncertainty that are “nonquantified,” and thus left out of final uncertainty estimates. The risk is that the expert, in seeking a coherent story, will produce one. But without a guarantee that this story concerns temperature (or whatever the specified climate variable is.)

    From my imperfect understanding, there is a second serious problem with tree-ring calibration, in the divergence problem. Somewhere between 1960 and 1980, many “well-behaved” tree stands “lost” the temperature-related tree-ring signal that had been there for decades. The data from 1960/1980 to the present are good–it’s just the purported relationship with temperature that has changed from earlier years.

    It seems to me that any statistical theory of calibration ought to require the entire instrumental record to be used. Unless a pre-existing argument has been made to exclude certain data. “Pre-existing” means “blind to the data at the time the exclusion decision was made,” never “I took a look at the data and didn’t like what I saw.”

    So it looks like handling the Divergence Problem by truncating data series at 1960 or whatever is the equivalent of a declaration of true love: “I love the apparent high-r relationship of treerings to temperature 1880-1960, more than I love the disappointing lower-r relationship that the 1880-2010 data demonstrates.”

    This seems very counter-intuitive, if I am understanding the gist of how the Divergence Problem has been handled in calibrating treering data sets.

  21. Kenneth Fritsch said

    But we can’t know that is the case. Tree rings strike me as especially vulnerable to examination. I gasped when I learned that tree ring readings taken from a stand of trees aren’t all incorporated into the proxy (or randomly selected). Instead, I understand that it takes expert judgment to select those records which correlate with temperature during the calibration period. And it takes expert judgment to select records for use from the reconstruction period, as well. (Hopefully you can tell me that this is incorrect.)

    As far as I know you have the gist of the problem, although I am not sure that selection is based on individual trees in a stand. The problem as I see it is that tree ring proxies are not based on a reasonable a priori selection criterion and then sampled on a random basis. I think that dendros in general have little appreciation of what effect their selection process has on the statistics of the proxies.

    The divergence problem in my view is what would occur if one did what the dendros have done and that is to construct a proxy based, in effect, on in-sample data. The only way to test the validity of that method is with out-of-sample testing. That, in effect, is what the divergence problem shows and it shows that the in-sample data based proxies are not valid. Climate scientists do a lot of talking around this situation in order to avoid admitting the essence of the problem, but in the end it is what it is.

    When Jeff ID and others point to specific problems with the proxy methodology they are not implying that that specific problem is the only problem.

  22. Jeff Id said

    #21 It’s absolutely not the only problem, but it is the easiest to conclusively demonstrate. Steve M maintains that proxy quality is the single largest problem. He’s probably right.

  23. AMac said

    My suspicion is that, were I to learn dendro methods and then apply data selection and statistics that I think are rigorous and appropriate, I would end up with a reconstruction that is a spaghetti trace like many of the others. Except that my error bars would be on the order of +/- 4 degrees, instead of the aesthetically-pleasing and seemingly-informative +/- 0.4 degrees and less that we commonly observe.

    In all likelihood, this manuscript would be rejected out of hand as being fundamentally uninteresting, even if correct (and who can really tell if it is or not, anyway).

    If this is approximately right, there are strong selection pressures operating in favor of the paleo-methods status quo. Those doubters who develop a facility for crimestop will be able to use the methods that generate ‘informative’ results, and publish. But the doubters who keep applying methods rigorously are going to experience less professional success, on the whole.

  24. Kenneth Fritsch said

    The SI for the paper linked above gives only three pages which show the higher resolution graphs for the OLS and the correction applied by the authors. The link below which is in the body of the paper gives more information in nine pages.

    The SI immediately points to the authors’ correction giving a reduction in bias for reconstructions compared to OLS, but also to its use increasing the CIs. The main paper stated that the correction gave a comparable result with red noise added, but was “deteriorative”. An interesting word to use for a dramatic deterioration of the correction method when red noise is present (added). You can see it for yourselves in the SI linked here. The CIs are large, not symmetrical about the line, appear biased in the higher temperature direction, and are shown only for the multiple regression case.

    Again I see a significantly different view coming out of the SI compared to the main paper.

    http://www.clim-past.net/6/273/2010/cp-6-273-2010-supplement.pdf

  25. sleeper said

    AMac says:

    My suspicion is that, were I to learn dendro methods and then apply data selection and statistics that I think are rigorous and appropriate, I would end up with a reconstruction that is a spaghetti trace like many of the others. Except that my error bars would be on the order of +/- 4 degrees, instead of the aesthetically-pleasing and seemingly-informative +/- 0.4 degrees and less that we commonly observe.

    And you would be correct… and go broke. High uncertainty doesn’t pay well in climate science.
