Mann09 Analog vs Digital

In Mann08 a correlation screening process was used to eliminate offending series from a composite-plus-scale reconstruction. It gave the scientists an easy method to choose which data make the best hockey stick. It was very simple to demonstrate the completely bogus and biased selective choice of information, which was, in my opinion, done with the intent of creating a false signal. This blog has been quite vocal about the intent issue in Mann08, and now we are faced with the same kind of result in Mann09. Yet they didn’t use screening this time.

The last time around it was easy to demonstrate the chucking of data that didn’t fit the pre-determined conclusion. The practice was so crystal clear that most of the public could figure it out. Unfortunately the nearly useless media completely refused to pick up on it, and advocate scientists at Real Climate continue to defend the practice. Now, with the disclosure of these emails, we’ve seen how Mann and Jones operate, and perhaps this will get some attention. In the meantime, the team has moved on to a new style of reconstruction. It’s a bit more clever and far more difficult to explain.

Here are the northern hemisphere reconstructions as presented in Mann09:

Fig. 1. Decadal surface temperature reconstructions. Surface temperature reconstructions have been averaged over (A) the entire Northern Hemisphere (NH), (B) North Atlantic AMO region [sea surface temperature (SST) averaged over the North Atlantic ocean as defined by (30)], (C) North Pacific PDO (Pacific Decadal Oscillation) region [SST averaged over the central North Pacific region 22.5°N–57.5°N, 152.5°E–132.5°W as defined by (31)], and (D) Niño3 region (2.5°S–2.5°N, 92.5°W–147.5°W). Shading indicates 95% confidence intervals, based on uncertainty estimates discussed in the text. The intervals best defining the MCA and LIA based on the NH hemispheric mean series are shown by red and blue boxes, respectively. For comparison, results are also shown for parallel (“screened”) reconstructions that are based on a subset of the proxy data that pass screening for a local temperature signal [see (13) for details]. The Northern Hemisphere mean Errors in Variables (EIV) reconstruction (13) is also shown for comparison.

This is what Mann09 has to say about ‘screening’.

Separate experiments were performed using a “screened” subset of the full proxy data set in which proxy records were screened for a local temperature signal based on their correlations with co-located instrumental data. These and other details, including sources, of the proxy data are provided in ref. S1 [note: a recent correction was made to the details of the screening as described in ref. S1. Due to an “off-by-one” error in the degrees of freedom employed in the original screening that has been brought to our attention, the critical p values used for screening decadally-resolved proxy data are actually in the range p=0.11-0.12 rather than the nominal p=0.10 critical value cited. This brings the critical p value closer to the effective p value used for annually-resolved proxies (nominal value of p=0.10, but effective value actually closer to p=0.13 owing to the existence of significant serial correlation in many of the annual proxy data). It is worth noting that the precise thresholds used in the screening are subjective and therefore somewhat immaterial—our use of statistical validation exercises provides the best test of the reliability of any data screening exercises.]

In this study, the use of the full “all proxy” data set is emphasized, as this yields considerably longer-term evidence of reconstruction skill. “Screened proxy” results are only provided for comparison. All data used in this study are available in “SOM Data.”

It seems that perhaps Michael Mann paid a little attention to how easily the bogus methods of the previous 08 paper were criticized. So how did they get the same results as those that used screening? (Click on Figure 1, pane 1, to see the difference.)

The reason for the same results is actually simple, but it will be more difficult to show. Before we get too far, notice that the instrumental portion on the far right of Figure 1 displays no information from the proxies. It’s HadCRU data only. This makes it difficult to visually grasp the quality of fit of the trees to the line. I checked the online results to see if the proxy info for the instrumental period was available, and it is also not presented. Therefore the entire blade in this case is the instrument series, and it’s up to the reader to have faith that the correlation numbers prove that trees and various other things are temperature. It’s also worth mentioning that the Luterbacher series, which was actually instrumental data, was left out of this paper, although it’s left in the 1209allnames.xls data file from the SI (all correlation screening numbers are the same as the original). Luterbacher was another McIntyre criticism which apparently was accepted by the team without acknowledgment.

The original Mann et al (S1) proxy dataset also included 71 European composite surface
temperature reconstructions back to AD 1500 based on a composite of proxy, historical,
and early instrumental data (S5). These data were not used in the present study, so that
gridbox level assessments of skill would be entirely independent of information from the
instrumental record.

I’m getting off track a bit though. From the Steig et al. paper we have more than a passing familiarity with RegEM, which was used again in this paper. In the original 08 paper, RegEM was used to paste information (blades) onto each series and infill missing data up to 2006. This is again the case here. We have all the same goofy proxies, which look as ridiculous as this:

Figure 2 - Pink is pasted on fake data.

or this:

Figure 3

The Briffa hide-the-decline data are all present as well. So in this paper, these proxies, which have been infilled with a hockey stick blade, are then run through RegEM, which becomes a multivariate regression against gridded temperature to create the new hockey sticks above. Multivariate regression is a fancy term for an attempt to weight all of the series to create the best possible match to the temperature data (an upslope).

The math (simply) can be described like this.

Output = c1*proxy1 + c2*proxy2 + c3*proxy3 + … + c1138*proxy1138

The regression determines the best values of the coefficients c, which can be positive or negative and are chosen to give the best possible fit to temperature. This is the important bit: in the original Mann08, as well as in the ‘screened’ version here, correlation was used to eliminate data. The elimination is equivalent to setting c = 0 for each screened-out proxy. In multivariate regression, the correlation with the gridded data determines the weighting of each series to best reproduce the shape of the gridded data. In other words, data with a poor match will receive very low weighting, or even negative weighting, as in the case of Tiljander.
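To make this concrete, here is a minimal numpy sketch (my own toy construction, not code or data from Mann09). Regressing a rising target on three synthetic proxies gives a large weight to the one that tracks it, a near-zero weight to pure noise, and a negative weight to an inverted series, which is the analog counterpart of screening (c = 0).

```python
# Toy illustration of "analog screening": least-squares weights do the job
# that correlation screening did digitally. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 150                                          # calibration years
temp = np.linspace(0.0, 1.0, n) + 0.1 * rng.standard_normal(n)

good = temp + 0.2 * rng.standard_normal(n)       # tracks temperature
noise = rng.standard_normal(n)                   # no temperature signal
flipped = -temp + 0.2 * rng.standard_normal(n)   # anti-correlated (cf. Tiljander)

X = np.column_stack([good, noise, flipped])
c, *_ = np.linalg.lstsq(X, temp, rcond=None)     # Output = c1*p1 + c2*p2 + c3*p3

# c[0] is strongly positive, c[1] is near zero, c[2] is negative:
# the regression has "screened" the proxies by weight instead of by rejection.
```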

The math is the same thing except that it’s analog screening rather than digital.

In the 08 case, data are eliminated a priori through screening (digital data scrapping); in this case, they are deweighted with a multiplier based on their shape (analog data scrapping).

It’s more difficult to demonstrate the preferential selection in this analog case, but the nice thing about this reconstruction is that in the instrumental calibration period there are no missing values in the series. So all series weightings will be like the equation above: a single constant times each proxy, all added together. We should be able to replicate the process, provide the weights from the B weighting matrix, and create some plots which show that the process is another version of preferential selection of hockey sticks.

My sincerest thanks to Michael Mann and team for again providing a unique crossword puzzle which will provide much entertainment for the coming weeks.

I would like to know who reviewed this. There are some reviews in the emails which, if they can be attached to this Mann paper, mean we’ve found some new team members. The reason I question it is simply a matter of taking the time to check again.

20 thoughts on “Mann09 Analog vs Digital”

  1. Output = c1*proxy1 + c2*proxy2 + c3*proxy3 + … + c1138*proxy1138

    I just want to make sure… you’re absolutely certain this is all there is to his model?

  2. RegEM code uses a B matrix to weight the proxies. The B matrix has the ability to change the weighting structure year by year if one series or another is missing. If all series are present (as in the calibration period), the weightings are as simple as eq 1.

    The paper is then a form of regularized regression against a 7 pc version of the temperature data. The problem cannot be backsolved because there is missing data in the early proxies so the B matrix changes weightings.
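    A toy stand-in for that structure (plain ridge regression in place of RegEM’s actual regularized-EM machinery, with an arbitrary ridge parameter) shows why: the weight vector B is fixed when no data are missing, but must be re-solved, and changes, once a proxy drops out.

```python
# Illustrative only: ridge regression as a stand-in for RegEM's regularized
# regression step. Sizes and the ridge parameter are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))                  # proxies, calibration period
y = X[:, :3] @ np.array([0.5, 0.3, 0.2]) + 0.1 * rng.standard_normal(n)

lam = 1.0                                        # assumed ridge parameter
B = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# With every proxy present, the reconstruction is a fixed weighted sum (eq 1):
recon = X @ B

# Drop one proxy (as happens back in time) and the weights must be re-solved:
B_sub = np.linalg.solve(X[:, 1:].T @ X[:, 1:] + lam * np.eye(p - 1),
                        X[:, 1:].T @ y)
```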

  3. I’ve been on a whole bunch of political websites over the last few days, right up to this morning. The Left and their myrmidons are spinning as fast as they can to defend this scandal – STILL! Keep up the good work guys! I hope you get the message out in an easy to understand way for the common citizen to comprehend.

  4. Jeff ID:

    The paper is then a form of regularized regression against a 7 pc version of the temperature data. The problem cannot be backsolved because there is missing data in the early proxies so the B matrix changes weightings.

    Well, when you break it down like that, it looks like a pretty horrid model. Basically each proxy is being modeled as a zero-memory system.

    For some proxies that may be true, but for tree proxies at least, I suspect they respond to short- and long-period fluctuations differently, which means you need to replace the constant with a series, e.g., instead of

    Output(yr) = c1*proxy1(yr) + c2*proxy2(yr) + c3*proxy3(yr) + … + c1138*proxy1138(yr)

    (where I’m assuming the data are being indexed by year for illustration), you need something like

    Output(yr) =[c10 * proxy1(yr) + c11 * proxy1(yr-1) + c12 * proxy1(yr-2) + … c1n * proxy1(yr-n)] + …

    c10 … c1n would be the (truncated) cross-correlation function between proxy1 and its calibrating temperature series.
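    A small synthetic sketch of that lagged form (coefficients and sizes invented): build the lagged design matrix and least squares recovers the FIR coefficients.

```python
# Synthetic check of the lagged-regression idea: the target responds to the
# proxy through a short FIR filter, and least squares recovers it.
import numpy as np

rng = np.random.default_rng(2)
nyears, nlags = 500, 3
proxy = rng.standard_normal(nyears)
temp = np.convolve(proxy, [0.5, 0.3, 0.2])[:nyears] + 0.05 * rng.standard_normal(nyears)

# Columns are proxy(yr), proxy(yr-1), proxy(yr-2)
X = np.column_stack([proxy[nlags - 1 - k : nyears - k] for k in range(nlags)])
y = temp[nlags - 1 :]
c, *_ = np.linalg.lstsq(X, y, rcond=None)        # c comes back near [0.5, 0.3, 0.2]
```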

  5. #4, While you are correct that it would solve one of the problems, I would still be critical of it because it allows data to be weighted preferentially to fit the outcome. My whole problem with a multivariate datamasher in a proxy case is that it sweeps under the rug the fact that proxies are selectively chosen for an upslope.

    The noise in the proxies is preferentially selected to be an upslope in the calibration range and you necessarily get a reduced variance in the historic portion of the reconstruction.

    Guaranteed unprecidnosity.
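    The reduced-variance point is easy to simulate (a toy construction, not data from the paper): screen pure red-noise series against a rising calibration target, average the survivors, and a hockey stick with a damped pre-calibration shaft falls out.

```python
# Screening-fallacy toy: every "proxy" is pure red noise, yet correlation
# screening against a rising target manufactures a blade.
import numpy as np

rng = np.random.default_rng(3)
nyears, ncal, nproxy = 1000, 100, 500
target = np.linspace(0.0, 1.0, ncal)             # rising "instrumental" target

proxies = np.cumsum(rng.standard_normal((nproxy, nyears)), axis=1)
r = np.array([np.corrcoef(p[-ncal:], target)[0, 1] for p in proxies])

passed = proxies[r > 0.5]                        # "digital" screening
recon = passed.mean(axis=0)

# Survivors rise together in the calibration window (the blade), while their
# uncorrelated historic wanderings average toward flat (the shaft).
blade = recon[-1] - recon[-ncal]
```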

    JeffID, if it were me, I would compute the transfer function between the proxy and its local temperature over the calibration period, smooth it, and constrain it to be physically realizable. In an ideal world, I would also have some estimate of the self-noise of the proxy, and use that to construct the optimal Wiener filter (sounds a lot more complicated than it really is of course). One could then use that regularized transfer function to compute (e.g.) the inverse transfer to obtain the coefficients.

    I also wouldn’t cut out high-frequency data (no low-pass filtering), and I’d use the ability of the series to reconstruct known global temperature fluctuations. Mann 09 exhibits spectral structure:

    see here

    but it doesn’t appear to have much to do with the known periodic forcings. The cut-off at 10 years is due to the low-pass filter applied to the reconstruction.

    While that’s better than Mann 08 did:

    see here.

    (Red line is GISS, blue line is Mann outside the reconstruction period.)

    For all its problems, Loehle 07, on the other hand, manages to retain most of the realistic spectral content seen in individual proxies:

    see here

    Here’s an example of an individual proxy:

    see here.

    (reference)
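    The transfer-function/Wiener idea above can be sketched with a bare-bones, segment-averaged spectral estimate (all numbers invented; this sketches the suggested approach, not the Mann09 method):

```python
# Wiener-style gain: estimate cross- and auto-spectra by averaging FFTs over
# segments, then form H = Sxy/Sxx, which shrinks wherever proxy self-noise
# dominates instead of applying a hard low-pass cut-off.
import numpy as np

rng = np.random.default_rng(4)
n, seg = 4096, 512
temp = np.cumsum(0.1 * rng.standard_normal(n))   # toy temperature
proxy = np.convolve(temp, [0.6, 0.3, 0.1], mode="same") + 0.3 * rng.standard_normal(n)

Sxy = np.zeros(seg // 2 + 1, dtype=complex)
Sxx = np.zeros(seg // 2 + 1)
for i in range(0, n, seg):
    P = np.fft.rfft(proxy[i:i + seg] - proxy[i:i + seg].mean())
    T = np.fft.rfft(temp[i:i + seg] - temp[i:i + seg].mean())
    Sxy += np.conj(P) * T
    Sxx += np.abs(P) ** 2

H = Sxy / Sxx   # near 1 at low frequency, collapsing toward 0 where noise wins
```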

  7. Jeff — I reached this conclusion a long time ago myself. Steve McIntyre has been pointing out for years that all of these proxy reconstructions are simply weighted averages of the individual proxy values. You phrase it well, I think. In the studies that use a large number of proxies, the weighting is “analog”; in those that use a small number of proxies, the weighting is “digital” (binary, really — just 0 and 1) through the screening process.

    McIntyre and McKitrick, in their analysis of the original “hockey stick” (MBH98/99) found that the stripbark bristlecone pine series were weighted 390 times as much (in some steps at least) as the least weighted proxies, and very few others were weighted anywhere near as highly as the bristlecones. This, of course, meant that the bristlecone curve was passed almost intact to the final reconstruction.

  8. Ooooo . . . I like this one, Jeff. 😀

    Screening without screening. Beautiful. As soon as Antarctica is done, I believe it’s time to start taking this one apart. Shouldn’t be too hard.

    Amazing people buy this crap. Seriously.

  9. If it were me, I’d have talked to a few people who know about tree physiology, and a few more who know about the ecology of stands of trees. I’d have asked them for their views on attempting to use trees as thermometers accurate to 0.1 degrees. Then we’d have grinned, shared a few pints, and wandered off to spend our lives on a more fruitful idea. Really, the details of the arithmetic are secondary – all they show (had the authors been honest) is that their ambition is just as daft as any sensible man would have seen it to be in the first place.

  10. I’m an outsider who, as a reporter, is quite familiar with medical-research fraud, but climate change is out of my league. However, if there is a similarity between the two areas (and I believe there is), certain climate change lies can be boiled down to simpler statements for uninitiated but intelligent readers.

    For example, PCR testing of gene fragments “amplifies” those fragments to a level where they can be more easily observed. But using PCR to ascertain levels of viral infection in the body doesn’t work, because the process amplifies rather than observes true viral concentrations (titer). Relying on PCR to test for active Swine Flu infection, for instance, is a joke. The whole field of PCR to definitively diagnose illness is misguided, useless, and a fraud.

    Likewise, I’m sure there are whole lines of climate change investigation which are bogus from the ground up, for various basic reasons, and while it is important to track down all the errors made along the way, it’s vital to attack the false foundations of these research lines. Bring the whole house down.

  11. I haven’t read Mann09 yet, but if what you’re saying is correct (and I have no reason to think otherwise), I appreciate your focus on the critical issue (as in #5). Weighting (or eliminating) noisy proxies on the basis of their fit to the modern instrumental temperature record will *always* create a false hockey stick. It doesn’t take a Ph.D. in statistics to see that, but every Ph.D. in statistics should see it as being trivially true.

  12. The proxy-weighting equation gets to one of the extraordinary aspects of Mann et al (2008) that the paleoclimate community refuses to acknowledge. From this post, it looks like this rears its head in Mann 09 in modified form.

    Mann 08’s Figure S8a shows the long-term Northern Hemisphere temperature anomaly reconstruction, as built from all screened proxies, and as built from all screened proxies less the 1 rightside-up Tiljander, the 1 ambiguous Tiljander, the 2 upside-down Tiljanders, and three others.

    In the original Fig. S8a (still at the PNAS website) and the twice-corrected one (at the Penn State website), the two curves are effectively identical.

    This must mean that either the 7 proxies as combined have extraordinarily good agreement with the other screened proxies, or that their weighting was near to zero.

    Four varve proxies, calibrated to the 1850-1995 instrumental record in a manner that is known to be nonsensical. Yet perhaps they show excellent agreement with other proxies prior to 1850? Even though two of the four are upside-down? This is impossible to believe. Thus, the reasonable conclusion is that the “CPS” and “CPS minus 7” traces coincide because the 7 are weighted near zero.

    This makes the once-corrected Fig. S8a (now removed from the Penn State website) particularly curious, because the “CPS” and “CPS minus 7” traces did diverge at a number of times. Thus, the Tiljander proxies must have had significant weighting, at least during the times of major divergence.

    You would think that moving from version 1 (same) to version 2 (divergent), and then from version 2 to version 3 (same) would be sounding alarm bells. Not so, it would seem.

    And if the curves are superimposable because the weighting of the subtracted proxies is near zero, Fig. S8a shouldn’t be there at all. The sentence “we removed 7 unweighted series and that didn’t change anything, duh” would have been much more informative.
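    The arithmetic behind that inference is worth spelling out with toy numbers: a weighted average is unchanged by dropping series only if those series carry (essentially) zero weight.

```python
# If removing series leaves a weighted average unchanged, their weights were ~0.
import numpy as np

rng = np.random.default_rng(5)
proxies = rng.standard_normal((10, 200))                  # 10 fake series
w = np.array([5, 4, 3, 2, 1, 0, 0, 0, 0, 0], dtype=float)
w /= w.sum()                                              # normalized weights

full = w @ proxies                                        # "CPS" with all 10
minus5 = (w[:5] / w[:5].sum()) @ proxies[:5]              # drop the last five

print(np.allclose(full, minus5))  # True: the dropped series never mattered
```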

  13. #15, Exactly. Isn’t it funny that they can do this in plain sight of the reviewers? It’s self-proving of our contention about the new method: when the proxies are simply removed, the result is the same, therefore they had almost no weight in the first place.

  14. The damning thing is what that says about MBH 1998 – the hockeystick.

    The headlines were about “1000s of proxies! Comprehensive!” etc. The crucial part about all the wrangling with bristlecones is: the fact that they’re so heavily weighted necessarily means a whole lot of proxies are weighted right about zero.

  15. Good article, Jeff. I will study the paper and supplementary info carefully.

    # 9, Ryan – Yes, taking this one apart should be most interesting!

