To the Drawing Board

It’s a good day for me. Nic sent a link to an Ammann paper which discussed the suppression of variance in paleoclimate reconstructions, the bane of my sanity. TAV wouldn’t be a climate blog without the horrible statistics in paleoclimate. Ammann (A09) proposes a simple method for correcting the calibrations in paleo reconstructions. The correction is ad hoc and poorly tested in the paper, but it is the beginning of some proper self-reflection by the paleo community.

The link to the paper is here.

A couple of cathartic quotes for my id, which has been ranting into an apparent vacuum as far as climatologists go, for over a year now.

Regression-based climate reconstructions scale one or more noisy proxy records against a (generally) short instrumental data series. Based on that relationship, the indirect information is then used to estimate that particular measure of climate back in time. A well-calibrated proxy record(s), if stationary in its relationship to the target, should faithfully preserve the mean amplitude of the climatic variable. However, it is well established in the statistical literature that traditional regression parameter estimation can lead to substantial amplitude attenuation if the predictors carry significant amounts of noise.

Thank god, it only took this engineer about a minute with Mann08 to figure that out. Now, after almost two years of frustration, it’s great to see some of the published work recognize the problem. All of my hockey stick posts above address this exact issue. Anyone think Ammann is a skeptic?
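For anyone who wants to see the effect in numbers rather than words, here’s a minimal Python sketch (my own toy example, not code from the paper): build a “proxy” as true temperature plus white noise, regress temperature on the proxy over a short calibration window, and watch the fitted slope come in well under one.

```python
import numpy as np

rng = np.random.default_rng(0)

n_years = 150     # length of the "instrumental" calibration period
signal_sd = 1.0   # standard deviation of the true temperature signal
noise_sd = 1.0    # standard deviation of the non-climatic noise in the proxy
n_trials = 2000   # Monte Carlo repetitions

slopes = []
for _ in range(n_trials):
    temp = rng.normal(0.0, signal_sd, n_years)          # true temperature
    proxy = temp + rng.normal(0.0, noise_sd, n_years)   # proxy = signal + white noise
    # OLS slope when temperature is regressed on the noisy proxy (direct regression)
    slopes.append(np.cov(proxy, temp)[0, 1] / np.var(proxy, ddof=1))

expected = signal_sd**2 / (signal_sd**2 + noise_sd**2)  # classic attenuation factor
print(f"mean fitted slope        : {np.mean(slopes):.3f}")
print(f"expected attenuated slope: {expected:.3f}  (a noise-free proxy would give 1.0)")
```

With equal signal and noise variance the fitted slope averages about 0.5, so a reconstruction built from it carries only about half of the true amplitude of past temperature swings; that is the attenuation the quoted passage is talking about.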

Climate proxies derived from tree-rings, ice cores, lake sediments, etc., are inherently noisy and thus all regression-based reconstructions could suffer from this problem.

ALL REGRESSION BASED RECONSTRUCTIONS.

Yup, that’s what I said.

TLS has received significant attention and new NH reconstructions based on this technique generally exhibit  more pronounced amplitude (Hegerl et al., 2006, 2007; Mann et al., 2008; Riedwyl et al., 2009).

They are discussing the amplitude of the calibration period — IOW, how to paste a blade on a hockey stick. (More precisely, the amplitude of the reconstruction period and the reduced straightness of the hockey stick; see the discussion in the comments.)

Now this part of the discussion is for the more technical readers. It’s from the conclusions portion of the paper.

One trade-off that has to be accepted in regression-based reconstructions is that the correction for bias comes at the cost of increased variance (see Supplementary Material: http://www.clim-past-discuss.net/5/1645/2009/cpd-5-1645-2009-supplement.pdf). This variance increase is mostly concentrated at the interannual scale, and thus decadal smoothing of the reconstructions results essentially compensates for this.

What an interesting statement to find in the conclusion. The paper discusses correcting for noise in proxies, and the authors propose a method which I’m not sure is new or something already done elsewhere (a point brought up by one of the reviewing scientists); it comes right out of the blue from my perspective. Anyway, they tested it on idealized data with very simple noise, for which it worked well. If you can read the equations presented, the method corrects the proxy scale multipliers (called slopes here) using the variance (read: annual noise). It doesn’t seem likely that this would work well when there is significant autocorrelation and multi-year noise; it relies on annual variance to properly re-scale the series. Oddly, the potential biases were not explored by the authors, and that point was, correctly, criticized in several of the technical comments. My guess is that it wasn’t explored because the authors knew what they did was an improvement, yet also knew it would fail a more real-world test. It will, but like the recent sea ice wind paper, that in itself doesn’t make it bad.
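To make my reading concrete, here’s a rough sketch of the generic fix and of the variance penalty mentioned in the conclusion. To be clear, this is the textbook errors-in-variables de-attenuation with the white-noise variance assumed known, not necessarily the paper’s exact ACOLS recipe; it is only meant to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, phi, sd, rng):
    """AR(1) series with lag-1 autocorrelation phi and marginal std dev sd."""
    x = np.zeros(n)
    innov_sd = sd * np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, innov_sd)
    return x

def decadal(x):
    """Crude 10-year boxcar smoothing."""
    return np.convolve(x, np.ones(10) / 10.0, mode="valid")

n_rec, n_cal = 1000, 150                    # reconstruction years, calibration years
cal = slice(n_rec, n_rec + n_cal)           # the recent "instrumental" years
noise_sd = 1.0                              # assumed-known white proxy noise level

temp = ar1(n_rec + n_cal, phi=0.7, sd=1.0, rng=rng)     # "true" temperature, reddish
proxy = temp + rng.normal(0.0, noise_sd, temp.size)     # proxy = signal + white noise

b_ols = np.cov(proxy[cal], temp[cal])[0, 1] / np.var(proxy[cal], ddof=1)

# Generic errors-in-variables de-attenuation: divide the OLS slope by the proxy's
# estimated signal-to-total variance ratio.  The noise variance is treated as known
# here; estimating it from real proxy data is the hard (and contested) part.
reliability = (np.var(proxy[cal], ddof=1) - noise_sd**2) / np.var(proxy[cal], ddof=1)
b_corr = b_ols / reliability

for name, b in [("OLS      ", b_ols), ("corrected", b_corr)]:
    recon = temp[cal].mean() + b * (proxy - proxy[cal].mean())
    print(f"{name} slope {b:.2f}: annual sd {recon[:n_rec].std(ddof=1):.2f}, "
          f"decadal sd {decadal(recon[:n_rec]).std(ddof=1):.2f} "
          f"(truth: annual {temp[:n_rec].std(ddof=1):.2f}, "
          f"decadal {decadal(temp[:n_rec]).std(ddof=1):.2f})")
```

The corrected slope restores the amplitude of the climate signal, but the proxy noise now passes through at full scale, so the annual variance of the corrected series overshoots the truth; a decadal smooth removes most (though, in this toy, not quite all) of the excess, which is the trade-off described in the quoted conclusion.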

What makes me so happy tonight is the feeling of a teeny-tiny bit of vindication for my comments on Mann08, 09 and several other reconstructions. Again, this is a different problem from the one SteveM dealt with in Mann98. Different math, different problems, same result – coincidence?

Even more interesting than the paper, though, are the technical review comments in the interactive discussion, a nice format for publication.

A. Moberg

But, in the real world we are dealing with proxy records which can (and do) have much more complex noise structures. We cannot then simply assume that we have a higher SNR at low frequencies compared to high frequencies at the individual site records. If there is much noise at low frequencies in the original proxy series, then I would intuitively guess that ACOLS would result in an artificial inflation not only of the high-frequency noise component in the final reconstruction (as shown in Figure S1), but also inflate the low-frequency component of the noise.

I don’t know A. Moberg and don’t feel like looking him up, but I don’t particularly agree with his intuition. In real-world proxy data the annual noise is quite dominant, and different proxies with different annual autocorrelations will average to uncertain results with this method. A statistician’s nightmare. However, his points on the more complex noise structures are well taken.

Zorita had this to say at the beginning of his comment.

Are all climate reconstructions wrong? Well, this manuscript does not imply it, but it is a welcomed warning that reconstruction methods may be more complex than they seem.

And this:

For instance, when using a linear regression model to reconstruct past temperature (predictand Y) from a temperature-sensitive proxy (predictor X), the estimation of the regression parameter by ordinary-least-squares (OLS) requires that the predictor is noise-free. This is clearly violated most of the times since time variations of proxy records are due to many other processes than temperature. The blind application of OLS in this context leads to an underestimation of the regression parameter, and thus to an underestimation of past climate variations.

Not a surprise from Zorita, who figured this out in 2004.

Christiansen had this to say:

Strangely the authors fail to cite our recent paper (Christiansen et al. 2009) which showed that variance loss is a serious problem for 7 different reconstruction methods including both direct and indirect regression methods as well as methods based on CCA regression and TLS. All the methods showed substantial underestimation of trends, low-frequency variability, and the pre-industrial level.

Well I think we’re reaching a consensus here, but wait.

Brohan said this:

The main issue with the paper is that both the derivation of the method, and the pseudoproxy tests, are done with simplified and idealised forms of contaminating noise. A clear implication is that the methods will work similarly well for real proxy data, where the contaminating noise is more complex; this is not likely to be so.

And clarifying with this:

The conventional approach to solving this for B0 and B1 is to choose the values that minimise the RMS of E – it is now well established that this often results in biased values of B1: that it mis-represents the true relationship between the proxy and the climate.

B1 being the slope of individual noisy proxies.

This is a standard assumption in mathematical statistics,  and the paper does an admirable job of demonstrating the value of ACOLS where it holds. But for real proxy data this assumption is grossly violated: U will be autocorrelated, correlated with X, non-normal, and sigma_U will vary with time. It’s not reasonable to require a calibration method to cope automatically with all these problems (probably no method does), but the value of the proposed method is not how well it behaves in the idealised case, but how well it will do in the real case.

Gotta love it, my mood is improving by the moment.

Anonymous reviewer 3 is one of my favorites.  If you are a serious reader here, consider the bold statements below.

A final point highlighting previous online comments is necessary regarding the representation of the literature, as discussed by Christiansen. I think Christiansen’s argument is a little narrow, but I would agree that Ammann et al. have not done a particularly good job at characterizing the arc of the variance loss and bias discussions within the literature. It is, for instance, surprising to see the Mann et al. (2007, 2008) papers cited as acknowledging the need for attenuation correction. These papers are part of a series (e.g. Rutherford et al. 2005; Mann et al. 2005) dating back to the Mann et al. (1998) publication that have argued vehemently for the ‘low-amplitude’ reconstruction originally reported in that paper. While latter studies test new methods, the thrust of the arguments throughout these papers has been that there is likely no variance loss or biases in their reported results. To imply that the need for variance corrections has been advanced by these studies is therefore a rather serious mischaracterization.

Ok, that comment is correct enough to make me quit blogging on hockey sticks.  The boys are getting it, and it’s not 100% politics first.  He’s absolutely, flatly stated that the premise that Mann had considered variance loss in any way is crap.  Ammann seems to me to be looking for the exit.

But we’re still not done.  It’s a good day/night in blogland.

Anchukaitis wrote:

Most of the other reviewers have commented that red noise is both more realistic and more difficult to deal with. In my own emulations of the ACOLS method, red noise  (lag 1, 0 < r < 1) can introduce spurious variance at the decadal and multidecadal scale, although the longest-term multicentury or millennial mean may still be captured.

I liked Ammann a lot less after this comment. This means I was right: the new method doesn’t fix the problem in a reasonable case. There is a difference between suspecting something won’t work and having confirmation. You cannot write a paper on a fix for paleo reconstructions, find an apparent solution to the variance problems, and employ the ONLY TYPE of noise which allows your fix to work — BY ACCIDENT. Yup, I’m at it again. This looks like intent to me; if Ammann comes by and explains himself here publicly, he has a chance of being off the Mann list; otherwise, there is no realistic choice. If you get the math, you should agree. However, it’s not a big deal, because every SINGLE reviewer picked up on it. 100% honesty.
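To see why the reviewers keep hammering on red noise, here’s the same toy correction re-run with AR(1) proxy noise instead of white noise (again my own sketch, not Anchukaitis’s emulation and not ACOLS itself). Red noise has power at decadal and longer scales, so once the slope is inflated the extra noise no longer averages away under smoothing.

```python
import numpy as np

rng = np.random.default_rng(2)

def ar1(n, phi, sd, rng):
    """AR(1) series with lag-1 autocorrelation phi and marginal std dev sd."""
    x = np.zeros(n)
    innov_sd = sd * np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, innov_sd)
    return x

def decadal(x):
    """Crude 10-year boxcar smoothing."""
    return np.convolve(x, np.ones(10) / 10.0, mode="valid")

n_rec, n_cal = 1000, 150
cal = slice(n_rec, n_rec + n_cal)                    # the recent "instrumental" years
temp = ar1(n_rec + n_cal, phi=0.7, sd=1.0, rng=rng)  # "true" temperature

for label, phi_noise in [("white noise          ", 0.0), ("red noise (phi = 0.9)", 0.9)]:
    proxy = temp + ar1(temp.size, phi_noise, 1.0, rng)   # phi = 0 gives white noise
    b_ols = np.cov(proxy[cal], temp[cal])[0, 1] / np.var(proxy[cal], ddof=1)
    # Same generic de-attenuation as in the earlier sketch; since this is a toy,
    # divide by the true signal-to-total variance ratio (0.5) instead of estimating it.
    recon = temp[cal].mean() + (b_ols / 0.5) * (proxy - proxy[cal].mean())
    print(f"{label} decadal sd of corrected recon: "
          f"{decadal(recon[:n_rec]).std(ddof=1):.2f} "
          f"(truth {decadal(temp[:n_rec]).std(ddof=1):.2f})")
```

On a typical run the white-noise case should land reasonably close to the true decadal variability, while the red-noise case should overshoot it substantially; that is the “spurious variance at the decadal and multidecadal scale” the comment describes.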

It is a good time for climate science. The Anchukaitis comment finished with this:

The manuscript would benefit most from additional tests and representative examples and comparisons using realistic red noise.

Now that IS a consensus. A natural consensus, born of the truth of simple math, in the face of endless reconstructions presented as confirmation that today is the warmest time in history.

Back to the drawing board, boys.

—–

I recommend that anyone interested in the hockey stick plots take a moment to read the above links and view the appropriate graphs. It has been an interesting and rewarding night for me. It’s cathartic to see some sanity in this world.

41 thoughts on “To the Drawing Board”

  1. Stunning. Hopefully it is also meaningful. Maybe now is the time to go after Mann ’08 with a published and peer reviewed critique.

    Makes one think back to the discussions here and on other blogs. Honestly, I can’t think of a single “AGW friendly” scientist who stepped up to the plate on this. I can’t think of anyone who even engaged in any meaningful discussion. On the contrary, there were plenty of drive-by ad homs, appeals to authority, dismissiveness, etc.

  2. All this proves is that the science is NOT IN, and that the AGW debate is alive and well. So, AGW alarmists are just that – alarmists – and must be treated as such. Skeptics are just that – skeptics – and must be treated as such. Hence, the AGW thesis is far from proven and must be researched for many years to come before coming to a definitive conclusion. So, any move to “save the planet” must be treated as a fraud or a hoax, since such moves are being peddled under the assumption that the AGW case is proven, which of course, as shown above, it is not.

  3. Nice one, Jeff. Looks like the penny is (slowly and belatedly) dropping. Unfortunately it will still take some time to filter through to the Alarmists and politicians who are still making crass and uninformed statements and policies.

  4. Is that a faint pink glow over to the east, amid the deep darkness we’ve been enduring?
    Can it be that a new day is dawning for CS practitioners?

    I’ve been totally enmeshed with dayjob stuff, but when I step over for a peek, here’s Jeff with amazing stuff.

    Thanks again,
    RR

  5. Crap like this makes my head explode when it gets published.

    BTW, it is possible to have errors result in increased amplitude in EIV/CFR methods . . . we saw this with Antarctica. I do not see how it is possible (I would bet a lot of money that it is not) for CPS-type recons to show this behavior.

  6. As I’ve been cautioning about proxies for years, why would you expect to be able to predict a really hot or really cold day, week or year in year 990 on the basis of calibrations made on instrumented, near-continuous data in the 1990 era? Of course the variance will calculate out as less in way back times. Nobody has argued that year 990 was hotter than 1931 or 2007 or whatever. The existing maths remove that possibility to the unlikely realm.

    Temperature is only a proxy for heat flow, which is the main item of interest to global warming in any case. Bias bothers me more than precision loss.

  7. Very good Jeff !!!
    Way ahead of the curve !!!
    At least it’s another small step in the right direction.
    Will they admit soon that there is also a slight possibility of a tiny chance that there may be some consideration that the data collection process might be less than perfect?

  8. And the effects of this type of admission follow through to important big-picture areas:

    If the variance in temp reconstructions is greater than was previously considered, this carries through to climate models whose hindcasts were compared to these reconstructions for “validation”. That becomes less impressive, as even more models with completely different forecasts can now fit through the now wider window. (Not that I trust any climate model, warming or cooling).

    It also affects attribution of anthro-CO2 vs. natural causes to current-day trends. The more variance reported from the past, the less unique the current temperature may be, the more you can attribute to natural causes, the less they can state they can’t think of anything else but CO2 as a cause.

  9. Here’s a question: one of the advocates’ arguments against the MWP is that it did not occur at the same time everywhere. Well, the problem with that claim is that dating of certain proxy evidence has some uncertainty. This means that for some proxies, combining a number of them will lead to a “smearing” of the MWP signal. This is analogous to the problem of underestimating historical variance, but I don’t think anyone (except some passing comments by Craig Loehle) has actually raised this issue. Probably because there is no obvious solution to it.

  10. Re: Timetochooseagain question: In general, the farther back in time people sample data, the more dating uncertainty there is, and often the farther apart in time the samples are (or the farther apart the dated samples are) due to costs; this will make an event appear asynchronous around the world for, e.g., the MWP peak. Similarly, the effect of combining proxies with dating error is to smear peaks/troughs (flatten them). I showed both of these in a REAL, PEER-Reviewed journal:
    Loehle, C. 2005. Estimating Climatic Timeseries from Multi-Site Data Afflicted with Dating Error. Mathematical Geology 37:127-140

  11. If you have a number of proxies that seem to show a common “event” (like the MWP) but, as Craig puts it, smeared over time by misdating, could one not time-align them by individually “stretching” or “compressing” their timelines within the (presumably estimable) dating error ranges until maximum correlation is achieved? It looks like we are dealing with two noisy parameters here (“temperature” and date), and to de-noise one of them (the “temp”) by averaging one has to align the contributing data precisely in the other dimension first – just like two or more different copies of a noisy analogue video or sound recording can be combined to gain on s/n ratio, but only if they are precisely in sync: Otherwise out-of-phase parts of the signal will cancel each other out and result both in distortion and (in the worst case) *less* useful signal in the combination than in any of its parts, rather than the desired multiplication of the signal.

  12. Re: ChrisZ – In my paper cited above, I develop a method to do exactly this. In the case of misdating, the effect of averaging series will be to approach the true signal, but with damped amplitude (the mean of the dating error converges to the true value by the Central Limit Theorem). Then you can estimate the amplitude. For white noise errors I demonstrated that my method works nicely. If anyone wants a copy, email me at craigloehl at aol dot com

  13. I think there are two different issues being discussed. In #10, TTCA makes the point that if temperatures peak in different areas at different times, we get a flatter average global signal. That’s a bit different from a peak appearing at a different time in a proxy series due to dating errors. In the TTCA case, if I’m understanding, you wouldn’t want the peaks to align, b/c the true global average should flatten the signal. In Craig Loehle’s case the warming did happen at the same time, but dating error suppresses the historic peak.

    BTW, Dr. Loehle, if you wouldn’t mind sending it, Jeffid1 at gmail.com

    I suppose there is no blogging allowed on it again for some amount of time?

  14. Why sure the other tests were done. They’re in the ‘censored file’. You just have to know where to look.
    =================

  15. Without knowing much about it, this seems to be a problem for the editor and the journal, too. If all the reviewers pointed to the problem, why wasn’t it fixed before publication? Did the editor carelessly ignore the criticisms? Did the author not think the reviews were worth responding to? Was he unable to respond and still have a paper worth publishing? Inquiring minds want to know.

    If I expected peer review to work, this is not how I would set it up.
    ==========================

  16. No wiggle matching. That would assume I know a peak when I see one, but when there is dating error and sampling error, a peak in an individual series is misleading. The mean of n series with each date having random dating errors creates a valid depiction of the peaks and troughs of a cycle, but damped out (larger n better of course). Email me for a copy of the paper. I’ll only sell your address to Gavin!

  17. 15 – Actually, Jeff, I was thinking more along the lines of Craig’s case to begin with, but you do make a good point: if the event really was asynchronous, then you wouldn’t need to worry about that. But how do we know that the smeared MWP was not due to dating error? Or, on the other side, how do we know that it was?

    My opinion has long been that it is very hard to conclude anything about the past climate at this time except for very large-scale, long-timescale things. I do think the MWP was probably global and similar to, if not greater than, recent warming, but I only think so and can’t prove it. That’s just based on my weighing of a lot of evidence, which IMAO requires some subjective assessment. We need more data.

  18. Peter of Sydney said
    April 9, 2010 at 2:05 am

    All this proves is that the science is NOT IN, and that the AGW debate is alive and well.

    No. First, it only addresses paleo-reconstructions, which are only secondary to the AGW debate in the first place. The recons are important to the “A” of AGW only insofar as precedence is concerned. There could still very well be a significant “A” proportion of the recent warming (assuming it is not actually man made in the sense I’m sure you understand) without it being unprecedented. Second, it is only an admission of something many of us have long known (albeit thinly veiled, IMO): without a priori known signal and noise characteristics, along with known physical, and linear, input characteristics, methods that rely on both linear combinations of input sources and stationary statistics are wholly inappropriate for doing the types of extractions that these reconstructions rely upon. Precedence is dubious at best when it is possible to generate just about any set of orthogonal waveforms from just about any data set.

    Mark

  19. #14 Craig,

    sorry to say so, but by averaging without correcting (or at least minimizing) the dating differences first, you will never get a “true picture” of the common traits of your data! As the misdating likely gets worse over longer timespans, the dampening will not be constant, but amplitudes in the distant past will be (on average) more attenuated than more recent (and thus more exactly dated and better “in phase” in regard to the fixed endpoint – the present) data. If you have cyclical effects in your data it gets worse: Averaging, say, one dataset that has 100 sine-shaped cycles over the last 1000 years with another that, by misdating, has these 100 cycles erroneously spread over 1010 years will give a result that shows maximum cycle amplitude at both ends (present and -1000 yrs) and a spurious flat point in the middle – the contours will look like an equally spurious 1000-year cycle modulating the 10-year cycle. More complex combinations of cyclical and non-cyclical effects will be distorted beyond recognition. The phenomenon is well-known with soundwaves (think the “tremolo” of an out-of-tune piano): http://en.wikipedia.org/wiki/Beat_(acoustics)

  20. #25 Combinations of cycles that are independent will not be affected by dating error. You can’t correct it ahead of time or it would not be “error”. I am not proposing that you can find 50 yr cycles in 1 million year old data. I’m looking at say the past 10,000 yrs where dating error is roughly constant. Please read the paper to see that I demonstrate my method.

  21. In an out of tune piano, the sounds distort one another. Dating error shifts a datum to earlier or later–not the same thing at all.

  22. Glad you found the paper and comments interesting, Jeff. I agree that the inclusion of comments is very helpful, and a number of excellent criticisms are made in them. I quite like this publication format. It reminds me a bit of old papers in some of the statistical journals, which had appended to them comments (some very lengthy) made at or following the meeting at which the paper had been presented. In some cases, it is now such comments that are widely cited, rather than the original papers!

    One small query, concerning the quote:

    “TLS has received significant attention and new NH reconstructions based on this technique generally exhibit more pronounced amplitude (Hegerl et al., 2006, 2007; Mann et al., 2008; Riedwyl et al., 2009).”

    You comment on this: “They are discussing the amplitude of the calibration period — IOW, how to paste a blade on a hockey stick.”

    My reading of this statement by Ammann was that it referred principally to the reconstruction amplitude before the calibration period (although I agree that Mann et al 2008, at least, referred also to TLS avoiding an underestimation of recent warming).

    It is well known, of course, that TLS can be expected to give a higher estimate of the slope coefficient than OLS does.

  23. Thanks, many thanks for this, most interesting and enlightening. And, very sincerely, it’s a most welcome reversion to what has been most valuable in this blog: science, not politics. Very glad it appeared, as I was just about at the point of giving up on you; you seemed to have gone down a black hole with almost no climate science in it. Welcome back!

  24. #31, Thanks for the vote of confidence. I’ve chased conservatives away as well. The only thing I know about blogging is to be honest and write what I think. You are welcome to leave your thoughts but like my own, they are fair game.

  25. JeffID, I am curious about the type of processing that is done in reconstructions. In your synthetic series with the MWP and the modern hockey blade, you had the attenuation effect, and also found that it changed the mean, correct? I was wondering if any of the reconstructions did the equivalent of signal decompression. Take your artificial reconstruction: 1. Assume that the MWP signal is now the signal; 2. reverse the process with the full data set to reconstruct the modern period; 3. Matrix solution of the modern period to original amplitude; 4. check known MWP and have the mean as cross verification. If this works, one could put in artificial small signals and misdated signals, and do a series of tests that show the error of the method. On real proxies the error would be larger, but with the artificial one would set the error of the signal recovery method itself. Perhaps that is what has been done and I just missed it. I am more used to the solve-the-equation methods used in engineering.

  26. Re Mark T April 9, 2010 at 6:21 pm

    Given one of the pillars of the AGW debate is the existence or lack thereof of the MWP, you therefore agree with me Mark; the AGW debate is alive and well 🙂

  27. I may have misunderstood this, but is he suggesting that we can take a rubbish proxy or, more likely, a set of rubbish proxies, and then inflate their story of the past in proportion to their rubbishiness?
    – Behold the Ammannomatic

  28. Peter of Sydney said
    April 10, 2010 at 9:02 am
    Re Mark T April 9, 2010 at 6:21 pm

    Given one of the pillars of the AGW debate is the existence or lack thereof of the MWP, you therefore agree with me Mark; the AGW debate is alive and well

    No, I do not, because the recons are not a pillar of the AGW debate. They are only useful in the context of convincing the public of precedence, but have no real bearing on AGW in the first place. They never did. The scientists, correctly, do not claim they do. The scientists don’t want to admit the propaganda value because it is so easy to refute them, which risks taking the public out of the AGW debate, but that’s not where the “debate” resides anyway.

    Mark

  29. The historical proxy reconstructions have clearly been used as part of the IPCC campaign to convince the public of the problem (i.e., on the cover of the report). They have been used by Al Gore extensively. They have been defended against poor statistical methodology and data-hoarding ad infinitum by the pro-AGW camps in academia.

    Now we see a different argument along the lines of Mark’s post. The Real Climate guys have been taking this tack for a while now – “Hey, we never said these paleo reconstructions were important!”

    The fact is the entire pro AGW movement used these as part of the propaganda machine, along with claims of current and impending short term extreme events to sex up the case and attempt to provoke action.

    It backfired. Upon scrutiny, reconstructions are completely indefensible, and we see most of the AGW crowd running away from them now. The argument has “moved on” to modeling. So be it, but the entire “campaign” is tainted by the biased presentation of the case using the reconstructions, and the ridiculous machinations of the paleo crowd, and that can’t be changed.

  30. Yes, Mesa, propaganda. They knew then how shaky the results were, and they’ve always known that the results say nothing about CO2 or anthropogenic causes, but pushed ahead because of the graphic impact on public opinion. It never had anything to do with the real AGW debate per se, except how the public viewed the debate.

    Mark

  31. Oh, and I should add, the recons were never used to “end” the debate anyway. The word “consensus” was what they used for that.

    Mark
