the Air Vent

Because the world needs another opinion

A reply to Dr. Jim Bouldin

Posted by Jeff Id on July 27, 2011

I’ve been gone for a while working on other things.  MikeN called my attention to a criticism by Jim Bouldin of my ‘probing’ of the hockey stick CPS methods. Since the Air Vent wouldn’t even be a climate blog if it weren’t for Mann 08, it does seem important to address Jim’s criticisms.  As it is my blog, the cool thing is that I can shove comments right in the middle of his criticisms to point out the issues of disagreement.


Oh boy. BIG problem with Jeff ID’s point that you quoted above.

To summarize: He is arguing that a hockey stick emerges as an artifact of the method used for screening proxies to include in a reconstruction (with specific reference of course, to *Mike Mann’s* reconstructions). The cause of this artifact production is supposed by him to be due to the fact that: “…The series are scaled and/or eliminated according to their best match to measured temperature which has an upslope. The result of this sorting is a preferential selection of noise in the calibration range that matches the upslope, whereas the pre-calibration time has both temperature and unsorted noise.”

This statement is entirely *false*, and it is so on several levels (including use of poor terminology such as “sorted” to mean screened). Not only is it false, it shows a phenomenal lack of attention to the most basic of facts, as presented by Mann et al in their 2008 paper in PNAS, both in the main paper and in the supplemental material. To wit:

There were 1209 proxies (from some larger candidate set) that met three initial screening criteria, (such as minimum length of record and stated minimum correlation among the individual members at a given site). From these 1209 records, a nominal screening cutoff of p < .10 with either of the two closest instrumental temperature grid points, was established. (After accounting for temporal autocorrelation, this p value rises slightly to p < .128). If one assumes a positive relationship between ring measure and temperature (i.e. one tailed test), the expected number of sites meeting this criterion is: 1209 * .128 = 155. (If one assumes that either a positive or negative relationship might occur, which they do not, the number is half that, about 78.)

The actual number of sites that passed this screening: 484, or over 3 times the number expected based on chance alone, (i.e. assuming no relationship between rings and temperature, and using a one tailed test).

See, Jim has several misunderstandings.  The point he repeats from M08 is that Mannian correlation passed so many proxies that it couldn’t be by accident, so they must truly be temperature!! This is the same point proven wrong here so often. Jim doesn’t read here, for sure.
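Jim’s expected-count arithmetic itself is easy to reproduce. Here is a quick sketch using only the figures quoted above; whether that p < .128 null is the right null for autocorrelated, infilled series is exactly what is in dispute:

```python
# Figures as quoted from Mann et al 2008 above
n_candidates = 1209   # proxies surviving the initial screening criteria
p_cutoff = 0.128      # nominal p < .10 after the autocorrelation adjustment
n_passed = 484        # proxies that actually passed the temperature screening

expected_by_chance = n_candidates * p_cutoff  # one-tailed expectation
print(f"expected by chance: {expected_by_chance:.0f}")                # 155
print(f"ratio passed/expected: {n_passed / expected_by_chance:.1f}")  # 3.1
```

That roughly 3:1 ratio is the entire basis of the ‘can’t be chance’ claim; the rest of this post is about why the 155 denominator is far too small.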

First we can remember Tiljander, which was simply flipped to improve the count of correlated series; from memory, this accounts for 3 (?) of the 484 series, but it is worse than that.  Luterbacher, which included 71 series of ‘ACTUAL’ physical temperature data, also skewed the results – so subtract another 71 bs proxies from the 484: the portion of the ‘proxy’ that Mann checked for correlation to temperature was actual temperature.

Don’t worry, I’m not done yet!

Furthermore, the mean correlation for sites with records that went back to 1000 AD was 0.33. The probability of getting an r value that high by chance, over a 150 year calibration period, for 59 sites, is very small indeed. Note that Mann et al pointed all but the last of these things out in either the paper, the supplement, or both.

In short, the probability of getting 484 sites that pass the p < .128 screening by chance, is very small, and his argument is utterly wrong. The only way it could be true is if somehow the temp-ring relationship magically arose in 1850 but didn’t exist beforehand, which of course is ludicrous.

It is about 120 series that would pass by chance if you accept Mann’s incredibly generous autocorrelation assumptions – far, far more if you do not.  One of the most difficult scams of the paper is the determination of the correct autocorrelations to use for this particular correlation value.  Jim apparently accepts it with a hand wave and zero consideration, but climate science isn’t known for statistical prowess.  It becomes even fuzzier when you realize that the autocorrelations of individual series are so widely different that several won’t even converge with R’s arima fit function.
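To illustrate how sensitive the screening threshold is to the assumed autocorrelation, here is a rough sketch of my own (not M08’s exact procedure): it uses the standard Bretherton-style effective-sample-size adjustment for correlating two AR(1) series, and the Fisher z approximation for the one-tailed p < 0.10 critical r over a ~146-year calibration period:

```python
import math

def critical_r(n_eff, z_one_tailed=1.2816):
    """One-tailed p < 0.10 critical correlation via the Fisher z approximation."""
    return math.tanh(z_one_tailed / math.sqrt(n_eff - 3))

n = 146  # roughly the 1850-1995 annual calibration period
thresholds = {}
for rho in (0.0, 0.5, 0.8):  # assumed lag-1 autocorrelation of proxy and temperature
    # Effective sample size for the correlation of two AR(1) series
    n_eff = n * (1 - rho * rho) / (1 + rho * rho)
    thresholds[rho] = critical_r(n_eff)
    print(f"rho = {rho:.1f}: n_eff = {n_eff:5.1f}, critical r = {thresholds[rho]:.3f}")
```

With no autocorrelation this gives a critical r of about 0.107, essentially the 0.106 screening level; assume a lag-1 autocorrelation of 0.8 in both series and the honest threshold more than doubles, to about 0.23.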

I imagine you had no intention that this post would devolve into another paleoclimate debate, which in the minds of some, is of course perfectly synonymous with the hockey stick “debate”.

I note also that Jeff ID states in the thread you mentioned: “Even though I am certain this is one hundred percent correct, this changes little about the climate story. What it does do is make one wonder how math skilled individuals still refuse to acknowledge it.”

Well Jeff, maybe because it’s perhaps, patently wrong?

Lessee: 484 sites, of which the 71 Luterbacher series are actual temperature, so these clearly do not help support the ‘proxies pass by accident’ null theory and can be subtracted from the 484 and from the 1209 total series.  We can’t forget the ‘high end’ climate scientists who decided the IPCC should ‘hide the decline’ of the latewood density data (MXD) and simply chop the data off.  Yup, chop the offending bit of flaccidity off and give it an Enzyte-style prosthesis courtesy of RegEM – The Mann show. Scientifically speaking, we have now subtracted another 95 series of horsecrap data from the remaining 413, leaving only 318 series (at a maximum) that passed validation presumably using actual data taken from actual proxies.  NOPE, not so fast, this is climate science so we’re not done yet!!

Of all 1209 series, only a small fraction were not artificially infilled prior to screening.  Yup, only a few percent gave the dog its wag.

Of the 484 series with enough Mannliness to pass correlation screening, 391 were infilled with fake data as plotted below.

With average data looking like that, even Bond would be proud.  Individual series often looked like this realistic piece of RegEM’d jewelry.  Imagine the pride you would feel presenting this piece of augmented data to your professor in college.

So, with all but a few data series artificially augmented, Jim Bouldin (PhD) has determined that little Jeff Id, BS, has no clue what he is talking about – thus the BS, I suppose.

Well Jeff, maybe because it’s perhaps, patently wrong?

Methinks the good doctor needs to stop worrying about which team he is on and look at the paper.  Even more important than the infilling, proper characterization of autocorrelation powerfully affects the ‘screening’ of Enzyte-accredited series, as this post demonstrates.

Perhaps the good Doc thinks I need to take his matlab class 😉

All kidding aside, I can and have demonstrated that the number of M08 proxies passing correlation cannot be taken as evidence of an upslope signal, let alone a temperature signal. It is a misconception, promoted by duped reviewers and a scientific community sporting a lack of diligence, and Jim has jumped two feet into the hole.  It may not be his fault, as the paper is obtusely written with more puzzles than the New York Times, but skepticism should be the core of a good scientist.
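For anyone who wants to see the artifact rather than argue about it, here is a minimal sketch (my own toy parameters, not M08’s): generate pure AR(1) red noise with no temperature signal anywhere, screen it by correlation against a rising ‘instrumental’ ramp over the last 100 points, and average the survivors. The screened stack shows a blade in the calibration window and flat, unsorted noise before it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, length, n_cal, phi = 2000, 500, 100, 0.9

# Pure AR(1) red noise: there is no climate signal in any of these series.
innovations = rng.standard_normal((n_series, length))
series = np.zeros_like(innovations)
for t in range(1, length):
    series[:, t] = phi * series[:, t - 1] + innovations[:, t]

# "Instrumental" target: a simple rising ramp over the calibration window.
ramp = np.linspace(-1.0, 1.0, n_cal)  # zero-mean by construction

# Screen: keep series whose calibration-period correlation with the ramp
# exceeds an arbitrary threshold (positive correlations only, one-tailed).
cal = series[:, -n_cal:]
c = cal - cal.mean(axis=1, keepdims=True)
r = (c @ ramp) / (np.sqrt((c ** 2).sum(axis=1)) * np.sqrt((ramp ** 2).sum()))
passed = series[r > 0.2]

stack = passed.mean(axis=0)  # the "reconstruction" from the survivors
rise = stack[-20:].mean() - stack[-n_cal:-n_cal + 20].mean()
print(f"{len(passed)} of {n_series} passed; calibration-period rise = {rise:.2f}; "
      f"pre-calibration mean = {stack[:-n_cal].mean():.2f}")
```

Swap the ramp for measured temperatures and the conclusion doesn’t change: screening on calibration-period correlation preferentially keeps the noise that happens to slope up at the end.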

Not a good start for our relationship.

More to come when time allows.

71 Responses to “A reply to Dr. Jim Bouldin”

  1. Jeff Id said


  2. Layman Lurker said

    From these 1209 records, a nominal screening cutoff of p < .10 with either of the two closest instrumental temperature grid points, was established.

    Jim thinks Mann 08 screened proxies by p rather than r ??

  3. Layman Lurker said

    It is about 120 series that would pass if you accept Mann’s incredibly generous autocorrelation assumptions. Far, far, higher if you do not. One of the most difficult scams of the paper is the determination of the correct autocorrelations to use for this particular correlation value. Jim apparently accepts it with a hand wave and zero consideration but climate science isn’t known for statistical prowess.

    Jeff, it looks to me like you should split the blockquote into two with your above comment in between – correct?

  4. kim said

    Heh, LL, I thought I heard two voices in that block quote, but couldn’t figure it out without your tip.

  5. HaroldW said

    Kim — Well, you wouldn’t expect Jim Bouldin to use the phrase “Jim accepts” — I mean, who talks of themselves in the third person, after all?

  6. Leaders of misinformation about CO2-induced global warming are getting frightened that they may now be held responsible for the impending collapse of Western economic and social order.

    Their roles are summarized in this history of events from 1945 to 2011:

    Click to access 20110722_Climategate_Roots.pdf

    With kind regards,
    Oliver K. Manuel
    Former NASA Principal
    Investigator for Apollo

  7. Jeff Id said

    #3 I fixed it and added a graph.

  8. AMac said

    “Infilling” doesn’t seem like the right word, at least for some cases. What the graph of Zhang_1979_Yellow seems to show is the use of RegEM to generate values for the proxy from ~1925 to ~1995. That would be “extrapolation.”

    This is also the case for the four Tiljander data series. Her paper says she didn’t measure varves later than 1985, because the mud at the top of the core wasn’t as consolidated. So in Mann08, the latest values, 1986 to 1995, were supplied by RegEM. Again, extrapolation.

    It seems questionable to use extrapolated data in a correlation-based study. It seems particularly questionable to use it in the *calibration* and *validation* steps. The extrapolated data accounts for about 6% of the entire correlation period, 1850-1995. But it’s about double that percentage for the Late parts of the two Early/Late screens.

    Reliance on extrapolated series must obviously decrease the accuracy and precision of the resultant recons (compared to reliance on actual data). I don’t see that estimates of the recons’ errors take this into account.

    By the way, here are the four Tiljander data series. They aren’t actually entirely independent of one another, so it’s inappropriate to use them all as separate series. (I’m pretty certain that was an honest mistake — one of the disadvantages of the proxyhopper approach.)

    lightsum – Tiljander interpreted thicker to mean colder, snowier winters, pre-1720. Mann08 interprets it as Thicker means Warmer.

    darksum – Tiljander interpreted thicker to mean warmer, wetter summers, pre-1720. Mann08 interprets it as Thicker means Warmer.

    thickness – Tiljander didn’t interpret this (it’s lightsum plus darksum). Mann08 interprets it as Thicker means Warmer.

    XRD – Tiljander interpreted higher absorbance to mean colder, snowier winters, pre-1720. Mann08 interprets it as Higher Absorbance means Warmer. XRD’s correlation to temperature wasn’t high enough to pass screening.

  9. Andrew said

    A p value of .1 is not particularly impressive and is not the usual “standard” for statistical significance. Why would that be the supposed cutoff?

    Also, note that on average the proxies are claimed to have an r of .33; this is an explained variance of less than 11%. Sounds to me like most of these proxies have very little statistical relationship to temperature, much less any physical relationship. And considering these are the proxies chosen for correlation, it’s even more pathetic.

  10. Steve McIntyre said

    Don’t forget about Mannian pick-two correlations – he permitted correlation to be to an adjacent gridcell if that were more “significant” than the host gridcell. But calculated the benchmark on one gridcell. 🙂

  11. AMac said

    Re: Steve McIntyre (Jul 28 09:34),

    How did pick-two work? (That could only have been for EIV, as CPS used a global average IIRC). If you have a given 5-deg by 5-deg gridcell, then you have four adjacent gridcells (N, S, E, and W). What was the rule for picking which one to correlate to, if r was higher than for the host gridcell? E.g., “pick the one that the proxy’s lat-long position is closest to”?

  12. Steve McIntyre said

    Don’t forget about Mannian pick-two daily keno correlations – he permitted correlation to be to an adjacent gridcell if that were more “significant” than the host gridcell. But calculated the benchmark on one gridcell. 🙂

    See a contemporary CA post on this linking to some other posts (one of which is the first mention of tAV).

    Plotting the pick-two correlations by “proxy” type shows the problems:

    Bouldin is totally wrong if he presumes that the Mann et al 2008 484 prove anything at all about the “significance” of the proxies.

    The problems were immediately identified in the critical climate blogs. While the “community” has no obligation to pay attention to the climate blogs, if the errors are readily observed by the blogs, then they should also be promptly observed by the community and someone in the community should have reported the problems in the academic literature by now so that people like Bouldin don’t treat such claims as having any scientific meaning.

  13. Amac, don’t forget that the use of this extrapolation (it is called infilling, BTW) in the calibration and validation steps gives unknown properties to confidence intervals and autocorrelation. Besides, plotting an X over part of the Y and then claiming to know r, R^2, or CE is a claim that was not supported. It may be supported, but who knows?

  14. Steve McIntyre said

    AMac, we mostly figured it out at the time. Here’s what I recall – without checking the scripts and notes in detail. The second gridcell was the “closest” adjacent cell, based on the deemed location of the proxy. What happened in the case of ties, e.g. a proxy in the dead center of the gridcell (as happens), or a proxy on a 4-corner? Jean S showed, as I recall, that the answer was Matlab specific and sometimes turned on the 16th digit. Another oddity that turned up in trying to figure out pick-two daily keno was that Mann’s grid was 1 degree off from a standard 5×5 grid, i.e. the center would be 53.5N not 52.5N. It’s hard to think that there was any reason for this other than programming error, but it affected pick-two results in detail, though probably not in aggregate (but you can never tell with weird Mannian decisions). I classified this “method” as a “stupid pet trick”, i.e. a method that had no rationale but probably didn’t matter.

  15. Today as the United States approaches the edge of social and economic disaster, it is intriguing that

    1. A mainstream climate scientist is under investigation for possible manipulation of observations:

    2. Nature magazine attacks the Heartland Institute for “letting the cat out of the bag”:

    Nevertheless, the US economy must now be brought under control to prevent total loss of our constitutional form of government. That is the difficult choice we face today.

  16. Mark T said

    Sooo… What does the good Dr. have a PhD. in, numerology?

  17. Andrew said

    12 – It seems to me that the community should not even have had to pick up on this problem after the fact; the reviewers should have picked it up beforehand!

  18. Kenneth Fritsch said

    Jeff ID, thanks much for this post and reminder of your original analysis of Mann (08). I have been delving into several Mann et al reconstructions, in an attempt to model the proxies with ARIMA or ARFIMA, and the “adjustments” that you refer to in this post stood out as a red flag to this skeptic on review.

    Nick Stokes at his blog did some interesting graphs of reconstructions using some R code to animate an individual series against a background of all the reconstruction series. In some he removed the instrumental part that is often tacked onto the end of the series per my request. The appearance with the instrumental blade at the end is very revealing in my view. What I have found further with looking at individual proxy series is that these series look very much like a mixture of white and red noise and without a HS shape. I can model these series and using the derived ARFIMA parameters to simulate a long series in replicate. The replications match well the actual proxy series and can show rather extended length runs with “trends”.

    I have downloaded all 1209 proxies of the Mann (08) reconstruction and graphed the individual series and am in the process of modeling them all. (What I found in modeling the M (98) was that individual proxies fit different ARFIMA models and had fractional d values that varied between 0 and 0.5). When you work this closely with the proxies what is striking is that attempting to integrate those series into something that even faintly resembles a HS is difficult to visualize.

    With the Schweingruber MXD series, Mann et al simply lop off the proxies at 1960, without much rationalization for that drastic action. Remember also that in Mann (98) the North American TR series PCs were adjusted for showing unrealistic growth in the latter parts of the series. The adjustment was based on another TR series. The rationale for the adjustment was that the NA TR PC showed growth that was conjectured to be caused by CO2 fertilization in trees at high altitudes. That growth leveled off in the latest part of the series, and this was attributed by the authors to a “saturation” effect. The authors ended up stating that regardless of the exact mechanism the growth was unnatural and with anthropogenic causes.

    If, indeed, the proxies are more or less merely a combination of white and red noise, then the tacking on of the instrument record at the end is horribly misleading. What is needed then is for proxy records to be updated (using the same sites in an extended series) so that we might have a proxy record all the way to the present without the instrumental record interfering.

    Another interesting development from Mann (08) comes from the differences in validating proxy series in the calibration/verification instrumental period depending on how far back the proxy series dates – with the longer series showing poorer validation. In my view this is because there are far fewer long proxy series to “select” from, and thus the “selection” process has to settle for less “selection” with the long series. This development also brings to mind the need to see a study where the criteria for selection are set a priori and then to look at the reconstructions and validation that process yields (definitely flawed samples that can be readily explained would be the only culling allowed).

    I have linked an article by Briffa and Cook that spells out many of the problems that many of the so-called skeptics have discussed over time. On careful reading, I think many of the arguments that weaken the cases made by some of the consensus climate scientists come from that group itself, whether intentionally or not.

    Click to access tree-rings.pdf

  19. Kenneth Fritsch said

    “Jim thinks Mann 08 screened proxies by p rather than r ??”

    LL, this apparent interchange of p-value and r has bothered me, but on my latest read of Mann (08), I believe they relate p-value to r by way of the degrees of freedom used to calculate p and r. I assume this calculation is possible, although I have been too lazy to do it myself. The Mann (08) authors appear to equate a p-value of 0.10, with the reduced degrees of freedom due to AR1, to an r value of 0.11 – as I recall.

  20. Carrick said


    Sooo… What does the good Dr. have a PhD. in, numerology?

    No, I think it’s a Ph.D. in Inane Ridicule.

  21. Mark T said

    Ah… Even better.


  22. Tom Gray said

    In regard to the claim about a valid proxy being determined by correlation, has anyone tried to match the total list of candidate proxies against other curves? If one were to match the candidates against, say, a flat line with a slight downward slope at the end, what would the overlap be between the proxies that Mann accepted and those?

    It seems to me that Mann’s test just looks for proxies with a rising slope at the end. So with a rising slope at the end and with a variety of shapes before that, deriving a hockey stick would not seem to be very surprising.

  23. Denier said

    #12 & #16

    Jim Bouldin, PhD
    Research Ecologist
    Department of Plant Sciences, UC Davis

  24. Denier said

    Twentieth Century Changes in Forests of the Sierra Nevada Mountains by Jim Bouldin B.S. (Ohio State University) 1982
    DISSERTATION Submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY
    in Plant Biology


  25. Denier said

    Quote from the good Doctor’s website:

    ‘I didn’t go into ecology so I could be a statistician or programmer’

    Doesn’t seem to stop him being dogmatic about both!

  26. Jan v J said

    Hey, Jimbo – you’re neither.
    I suppose that’s the good news!

  27. Mark T said

    Perhaps he went into ecology because he was incapable of doing statistics or programming?

  28. uc said

    I think I tried Ebisuzaki’s method ( ) some time ago, here:

    It seems that only

    1070 tornetrask
    1061 tiljander_2003_darksum

    survive this test in the early steps. At the AD1400 step, the following proxies get through:

    271 ca630
    272 ca631
    287 cana106
    314 cana175
    362 co556
    397 fisher_1996_cgreenland
    424 gisp2o18
    628 mo037
    654 mt110
    778 nm560
    796 norw010
    820 nv516
    908 schweingruber_mxdabd_grid1
    909 schweingruber_mxdabd_grid10
    910 schweingruber_mxdabd_grid100
    920 schweingruber_mxdabd_grid11
    927 schweingruber_mxdabd_grid12
    933 schweingruber_mxdabd_grid18
    934 schweingruber_mxdabd_grid19
    935 schweingruber_mxdabd_grid2
    936 schweingruber_mxdabd_grid20
    937 schweingruber_mxdabd_grid21
    938 schweingruber_mxdabd_grid22
    959 schweingruber_mxdabd_grid42
    960 schweingruber_mxdabd_grid44
    961 schweingruber_mxdabd_grid45
    977 schweingruber_mxdabd_grid6
    988 schweingruber_mxdabd_grid70
    1002 schweingruber_mxdabd_grid89
    1070 tornetrask
    1061 tiljander_2003_darksum
    1104 ut509
    1122 vinther_2004_scgreenland
    427 haase_2003_srca
    330 chuine_2004_burgundyharvest

    Too bad images are missing due to server changes, I can check if I still have those.

  29. Generate a random cloud of 30 to 50 points in 2 variables and compute r: it is not hard to get r around .33 for such a meaningless set of data. If I am doing anything with data and get less than r=0.7, I am not too happy with the relationship – compare that to the r < .1 screening.

  30. uc said

    Another earlier comment that I think is relevant: ( )

    This method is published in high-quality journal, and used by sea level experts. Why wouldn’t Mann use it for testing significant correlations in proxy vs. temperature series? Because it gives critical r values that are in the range of 0.2 – 0.3 ?

    The CCE-Ebisuzaki figure is now here,

  31. oops: can’t use less than sign. If I get less than r=0.7 I am not too happy with the relationship.

  32. timetochooseagain said

    It occurs to me that one way to determine the common signal among proxies would be to correlate them with each other rather than instrumental temperature. Has anyone done an analysis of this? I would venture to guess the proxies have little relationship to one another, which is contrary to the idea that climate variation is spatially auto-correlated…unless of course the lack of correlation was because the proxies are mostly noise rather than signal!

  33. Jeff Id said


    A lot of memories from the old days.

    Let’s see,

    Steve of course – pick two should never be left out of this discussion. I was saving your mining discovery for those who still might disagree.

    UC’s demonstration shows a ton of math designed to correct for the autocorrelation problems I mentioned, with a (no longer surprising) zero real correlation result. I haven’t read the paper, but my own basic work has given me the same impression: <5% SNR.

  34. said

    Jim Bouldin felt the r vs p distinction didn’t change anything, as the two are closely related.

    I tested white noise against white noise, and found that they correlated to each other above the level of .1 10.6% of the time. This is the level of correlation that Mann used, .106. He used .128 for the 100 year periods.

    Jim also mentioned that proxies were selected only if they had .5 correlation among the cores. I’m confused as to how Yamal was included since the core data wasn’t available yet.

  35. Mark T said

    For the record, I do not give a hoot what his degree is in nor whether he even has one… It’s just that alarmists make such a big deal about pedigree that the irony of the pedigree in here just drips.


  36. Novick said

    Jeff, is there any particular reason you used different correlation levels than Mann in this post?

  37. AMac said

    On just about any topic, I find that I have a harder time engaging constructively when personal remarks are directed against me. And that’s any old topic — and the subject here is climate science!

    I assume it’s not just me.

    Coming into the discussion of Mann08, Jim Bouldin has made a real effort to be civil. Perhaps not with 100% success, but even the attempt is more than most parties can usually manage.

    It’s very rare to get an actual dialog concerning Mann08 that is non-inane. And who knows, maybe tongue-biting will turn out to be contagious. That’d be good, for people on all sides of the various divides.

    Just a suggestion.

  38. Jeff Id said


    Yup, laziness.

  39. Jeff Id said

    Sorry Amac, M08 is a sore spot with me.

  40. AMac said

    Re: Jeff Id (Jul 28 21:20),

    In the post, you wrote “Luterbacher, which included 71 series of ‘ACTUAL’ physical temperature data, also skewed the results.”

    I thought “Luterbacher” was a set of tree-ring data. It wouldn’t seem to make much sense to invest a lot of significance in the correlation of “calculated gridcell temperature anomaly” with “instrument-measured temperature.”

    Could you point me to a publication or post that elaborates on this point?

  41. Jeff Id said


    I’m very, very busy and would love to be spending my time on this, but Luterbacher’s info is linked in the M08 credits. The tree-ring data had actual temp data pasted on in the correlation years.

  42. Jeff Id said

    There are certain things that those who don’t understand the history of this need to grok. First, technical blogging is an incredibly time-consuming and reading-intensive task. Anyone who thinks SteveM doesn’t spend hours and hours looking into a technical post isn’t quite understanding the complexity of the post itself. It is a puzzle, it is entertaining, and people like Mann make it more fun. From my perspective, when I started blogging on climate, not only did I need to learn the nuances of paleoclimatology ‘proxies’, but also the statistics. You can literally watch the learning from my early posts to my more recent ones.

    Today, I can rip through a climate/stats paper in literally one tenth the time it used to take. I’m certain that the pros are far better at it than I at parsing standard climate terminology. So when paid pros come along and claim that somehow I’ve missed the single most basic claim of M08, they certainly will need to explain themselves. In my opinion, any experienced paleo-scientist knows within minutes whether M08 is a sensible paper or not. It ain’t a tough call.

    AMac has been a long time reader, but in my opinion, he really missed this point in 37.

    My less than reverent tone is the result of my blatantly greater understanding of M08 relative to the Doc. Sorry if that sounds cocky, but how would any reader here consider Jim’s poorly considered criticism after spending literally a couple of years of their lives deconstructing a paper, CPS, RegEM, proxy processing mathematics, assumptions and origins?

  43. MrPete said

    Re: AMac (Jul 28 21:42),
    Try here. 🙂

  44. Carrick said


    Coming into the discussion of Mann08, Jim Bouldin has made a real effort to be civil

    I don’t see that. I see plenty of snark and passive aggressive behavior.

    This for example is a “real effort to be civil”?

    Not only is it false, it shows a phenomenal lack of attention to the most basic of facts

    I don’t see how anybody could get their feathers ruffled over that, especially given the phenomenal lack of understanding demonstrated by Bouldin in his critique. 😉

    Nor do I really see even a 0.001% chance that Bouldin would ever admit any error to any paper that Mann has ever written. That just wouldn’t be politically viable. I’m not commenting on whether an ecologist has enough math/stats background to really grasp the issues, other than to say I am certain that Mann does not.

  45. Denier said

    #44 Carrick
    Plant Biologist.

  46. page488 said

    Well, I’ve been gone for two years and we’re still on tree rings. (Sigh!!!!!!!!!!!!!!!!!)
    I’m glad I’m not a statistician or I would be mad (in the psychiatric sense) by now.

    Water plays such a huge role in tree growth (structure, transport, and chemical reactions) that I don’t see how anybody can correlate anything to anything without incorporating the known rainfall into the equations dealing with the knowns (width of tree ring and ambient temp.)

    Bristlecone Pine tree rings were only attractive to the AGW academic crowd because the trees live so damn long. But water plays such a major role in their development that to try to squeeze temp data out of their ring widths is ridiculous. I imagine that’s why so much statistical manipulation is necessary – the data they want just isn’t there, or can’t be perceived by any reasonable method.

  47. Jeff Id said

    Yup Page, we haven’t talked about it much for years. Somehow the trees are still wreaking their ecological havoc.

  48. Jeff Id said

    #45, It is interesting that a plant biologist would be supportive of the treemometer concept. In my experience, they are typically the most skeptical. As Page wrote, too many other influences exist, many of which have a far stronger effect on growth than temp. We know this stuff; we also know that it is ignored blindly by the climatology community. Linearity is assumed, as Craig Loehle has published on. And then there is simply bad math that Steve M has posted endlessly on.

    The one key everyone has always come back to is that it is simply bad data. UC demonstrated it, Willis Eschenbach has demonstrated it, Steve McIntyre has demonstrated it, and even I have. It isn’t that strange that paleoscience, which depends on that bad data for funding, continues to pretend that it is useful. That is actually the message of this post. Jim claimed that 484 series couldn’t be accepted by accident if the data were bad. Without speaking for anyone else, many of us look at the math and flatly disagree.

  49. Steve McIntyre said

    It’s also all too characteristic that Jim Bouldin made statements about the 484 series without doing any due diligence of his own. And that he accuses people who have downloaded and parsed the data of a “phenomenal lack of attention”. The situation is the reverse. The climate “community” has shown a “phenomenal lack of attention” to the details and credulously accepted claims in Mann et al 2008 without any due diligence. They should realize that peer review by pals is no substitute for the sort of scrutiny that has taken place at the critical blogs and, rather than take umbrage at the extended peer review at the critical blogs, should thank people for spending the time and embrace it.

  50. #48 JeffID. You get the Reiterate the Obvious Award. A botanist who is an ecologist should not be posting as he did, not only because of what Craig, Steve, and UC demonstrated but because, like you, I have a hole card. The card is the metadata of these species and their extent. Those familiar with the climategate emails may recall the Russians’ metadata, where they noted that about 6000 years ago the subfossil tree line was 150 yards beyond the present one and that the MWP was about the same as today. An ecologist/botanist would note the micro niche described in the metadata and realize that if the Yamal etc. from that region did not show a MWP as warm, and a previous period warmer, then there is something wrong with the method. Most especially, he would realize that Craig’s peer-reviewed article about the linear assumption problems was most probably being borne out in the Mann 08 work, based on the extent of the subfossils contrasted with a linear assumption.

  51. Kenneth Fritsch said

    “In the post, you wrote “Luterbacher, which included 71 series of ‘ACTUAL’ physical temperature data, also skewed the results.”

    I thought “Luterbacher” was a set of tree-ring data. It wouldn’t seem to make much sense to invest a lot of significance in the correlation of “calculated gridcell temperature anomaly” with “instrument-measured temperature.”

    Could you point me to a publication or post that elaborates on this point?”

    Amac, all this information is in the original Mann (08) paper, where the authors classify Luterbacher as a historical proxy – of which the important instrumental period is – well, instrumental. I think part of the problem in not picking up on these hard-to-believe revelations is that a science-oriented mind has trouble visualizing and immediately comprehending that these methods are being used. The truncation of the MXD Schweingruber series at 1960, and its replacement with infilled data, is noted in that paper also. One almost needs someone like SteveM to point to these seemingly unscientific measures to assure oneself that this is indeed what the authors did.

    As Jeff ID noted above, once you are familiar with the use of these adjustments, and indeed their frequent use in some reconstruction papers, along with the somewhat vague writing styles, analysis of the works becomes significantly less painful and time-consuming.

    Just by downloading all of the 1209 proxy series used in Mann(08) and graphing them in R, you will get a very different picture than that described in the Mann (08) text. The task of doing this was much easier than I first assumed it would be.

  52. Layman Lurker said

    #19 Kenneth Fritsch

    Thanks, Kenneth. I am reading through the paper and SI (and later on to Jeff’s posts) for a refresher on all this stuff again.


    Kenneth, FYI Mann discusses his use of Luterbacher on page 2 of the SI:

    As some (European) instrumental information from Luterbacher et al. (3) was incorporated back to A.D. 1500, statistical validation exercises are not entirely independent of the instrumental record for networks that include these predictors. To gauge any potential artificial inflation of skill in reconstructions using these data as predictors, separate validation experiments and reconstructions back to A.D. 1500 were also performed without using these predictors. The details of the early and late validation exercises for all of the various alternative predictor networks (e.g., ‘‘all proxy,’’ ‘‘screened proxy,’’ ‘‘frozen A.D. 1000,’’ ‘‘no tree-ring,’’ ‘‘no Luterbacher’’) and target instrumental series (NH, SH, and global, land, combined land plus ocean, and using both CRU and ICRU instrumental series) are provided in supplementary spreadsheets (‘‘eiv-validation.xls’’ and ‘‘cps-validation.xls’’).

    Here is the spreadsheet of the CPS validation statistics (including “no Luterbacher”). The “no Luterbacher” sensitivity obviously gets wrapped into the Tiljander/dendro validation issue again.

  53. Kenneth Fritsch said

    LL, do you find it frustrating and puzzling that the authors would not show a reconstruction without screening, Luterbacher, or upside-down Tiljander, with and without the TR series, and with the Schweingruber MXD series with the truncated portion restored? When they remove the “offending” proxies one at a time, we get what SteveM refers to as the pea-under-the-shell routine.

    I will give Mann (08) credit for distinctly showing the “divergence” of not only the TR reconstruction but also reconstructions without TR, and even mentioning the non-TR divergence in the paper. It is best seen in the Mann (08) SI, where the instrumental period is magnified for easy viewing of the divergence, and then again with the obscuring instrumental record tacked onto the reconstructions over the entire period. Without the instrumental record tacked on, the reconstructions and proxies look much more like combined red- and white-noise series. Mann (08) has come a long way from the original reconstruction showing the HS, and I find a lot of information in this paper that weakens the case for using many, if not most, of these proxies for reconstructions of past temperatures.

  54. Kenneth Fritsch said

    LL, and, of course, what you excerpted from Mann (08) does not bear on the inclusion of the Luterbacher series in the count of series with a p-value/r above the threshold, or on the expectations of what would be found without a temperature/proxy relationship.

  55. MikeN said

    How did Mann get temperature history for non-standard gridcells?

  56. uc said

    “i.e. assuming no relationship between rings and temperature” + assuming independence between rings? Not sure how this should go; anyway, this is what my replication shows:

    gl high-screened: 413 pass
    gl raw-screened: 483 pass
    nh raw-screened: 420 pass
    nh raw-Ebisuzaki-screened: 355 (71 lut, 85 schweingruber) pass

  57. Jeff Id said

    #56 UC

    So 355 – 71 – 85 = 199 of the 1209 series pass correlation to temperature: 16%, using pick-two methods on a correlation that should accidentally pass 10% with pick one.

    Not too good IMO.
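Jeff’s pick-two point can be sized with a one-line calculation. A minimal sketch, assuming (generously, since neighboring grid cells are spatially correlated, so the two tests are not really independent) that each of the two grid-cell tests is an independent draw:

```python
# Nominal chance that a pure-noise proxy passes a single-cell screen.
p_one = 0.10

# Under "pick two", the proxy passes if EITHER nearest grid cell
# correlates; with the (optimistic) independence assumption:
p_two = 1 - (1 - p_one) ** 2

print(round(p_two, 2))  # 0.19, nearly double the nominal rate
```

Because neighboring cells are correlated, the real inflation lies somewhere between 10% and 19%; the point is only that the screen is more permissive than the nominal threshold suggests.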

  58. Layman Lurker said

    Mann states that circa 150 proxy series would pass screening “by chance alone”, but I cannot find anything stating his assumptions about the type of noise that would generate that number. To determine the noise assumptions behind Mann’s estimate, I generated 1209 simulations of annual values over 133 years (1878 to 2011), standardized them, and then correlated each with standardized mean annual HadCRUT values. My intent was to start with white noise and then introduce progressively higher AR1 coefficients until I matched Mann’s number. I didn’t have to go any further than white noise: 149 simulations had a correlation coefficient >= 0.1. The histogram is here.
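Layman Lurker’s white-noise null experiment is straightforward to reproduce in outline. A sketch in Python rather than the R used elsewhere in the thread, with a simulated stand-in for the standardized HadCRUT target (so only the rough size of the count, on the order of 150 of 1209, is meaningful):

```python
import numpy as np

rng = np.random.default_rng(0)
n_proxies, n_years, cutoff = 1209, 133, 0.10

# Stand-in for the standardized instrumental target series
# (the comment used annual HadCRUT means, 1878-2011).
temp = rng.standard_normal(n_years)

# 1209 white-noise "proxies", each correlated against the target.
proxies = rng.standard_normal((n_proxies, n_years))
r = np.array([np.corrcoef(p, temp)[0, 1] for p in proxies])

passed = int((r >= cutoff).sum())
print(passed)  # roughly 150 pass the screen by chance alone
```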

  59. Carrick said


    How did Mann get temperature history for non-standard gridcells?

    He used teleconnection functions.

    You know, the type where money gets extracted from your wallet, collected in Washington, then dispersed to Mann and collaborators.

  60. j ferguson said

    There must be something to this teleconnection business that I’ve missed. Crudely, correlating ring width to temperature records somewhere else but not necessarily local seems preposterous on the face of it. Maybe like correlating to ring heights in bathtubs in Siberia.

    If this is what Mann actually does, this thing alone, (assuming I haven’t missed its subtlety) should have been enough to drain his swamp.

    I am so much reminded of the “Magic Grits” in “My Cousin Vinny”.

  61. Jeff Id said

    #60, Nope, that is exactly it.

  62. MikeN said

    Layman Lurker, Mann used .106 correlation.

  63. Layman Lurker said

    MikeN, I changed my correlation cutoff to 0.106 from 0.1 and did 100 runs of 1209 white-noise simulations and another 100 runs of 1209 simulations with an AR1 coefficient of 0.1. For white noise the average of 100 runs was ~138 >= 0.106. For AR1 of 0.1 the average was ~161. For AR1 of 0.3 the average was ~214.

    My instrumental data was n=133 ending in 2011. IIRC, Mann’s was n=145 ending in 1995. Plus he ultimately correlated each proxy to individual grid cells, though it’s not clear how the 150 “by chance alone” number was calculated.
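The pattern LL describes, pass counts climbing with the AR1 coefficient, can be sketched as below. The instrumental target here is itself an AR(1) stand-in with an illustrative coefficient of 0.6 (an assumption; it is the target’s own autocorrelation that lets red-noise proxies pass more often), so the counts are only indicative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_proxies, n_years, cutoff = 1209, 133, 0.106

def ar1(phi, n):
    """Standardized AR(1) series with lag-1 coefficient phi."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return (x - x.mean()) / x.std()

# Autocorrelated stand-in for the instrumental target (assumed phi).
temp = ar1(0.6, n_years)

counts = {}
for phi in (0.0, 0.1, 0.3):
    r = np.array([np.corrcoef(ar1(phi, n_years), temp)[0, 1]
                  for _ in range(n_proxies)])
    counts[phi] = int((r >= cutoff).sum())

print(counts)  # counts rise as the proxy noise gets redder
```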

  64. Anonymous said

    By any chance is Mann’s number 155? This is 1209 * .128, which is based on the threshold Mann used for the early and late 100-year calibrations.

    I used the same HadCrut values as Mann and got a .106 chance of white noise correlating > .1, which appears to invert the p and r values.

  65. Kenneth Fritsch said

    Below are 4 links to the ARIMA models for some long Mann (08) proxies that I derived using the R function auto.arima from library(forecast), which Lucia at the Blackboard put me onto. In the function I used “bic” as the information criterion for selecting the best model, with the maximum AR and MA orders limited to 5 and the process examining all the possible models within those combinations. I used BIC because it penalizes model selection more heavily for added AR and MA orders, and I noted that using AIC as the criterion would sometimes give AR orders out to 9, even though the differences from 5 to 9 probably did not have statistical significance.

    The 90 proxies selected were the longest (more than 999 years) within the period from 2003 AD back to 1 AD. I believe all of them have fairly complete data in the 1000–2000 period, and some extend back into the 0–1000 millennium. A very few of these series had less than annual resolution and were interpolated in the Mann (08) data; that interpolation could affect the ARIMA model results as an artifact for those proxies.

    What becomes very clear is that the proxies vary greatly from one to another for the ARIMA model selected and that if one were to determine the expected number of chance correlations above a given threshold with temperature over the instrumental period one would have to look at individual proxies or, at least, in several groupings of proxies.

    Obviously if the proxies were responding strongly to similar temperature effects these proxies would be expected to have very similar ARIMA models.

    My next step is to model these proxies using ARFIMA models. Unfortunately I have not discovered an R function analogous to auto.arima that finds the best ARFIMA model, so the process will be a bit more computationally involved. I have been reading papers on using a Bayesian approach to compare ARIMA to ARFIMA models, but I am not there yet.

  66. Layman Lurker said

    Kenneth, after looking at your charts, I wanted to look at case examples relating proxies to grid points. I went to the NOAA site for M08 and downloaded the “original” proxy data. I picked out az510 because I found that KNMI had grid temperature data for the az510 lat long coordinates. After loading the data for az510 I decided to run auto.arima (on both raw and standardized) according to your constraints as a check (p=5, q=5, ic=”bic”) but I am having problems reproducing your numbers. Am I missing something or using the wrong data?

  67. Kenneth Fritsch said

    LL, I used stepwise=FALSE. That way you select from all the possible combinations of the p and q limits. Stepwise leads you down a path stepwise to the smallest BIC (or AIC or AICc) and I think can miss some smaller BICs in the process. I noted this difference with Lucia at the Blackboard. The R default for stepwise is stepwise=TRUE, so you will have to specify it in the command. I checked my calculation on az510 and it repeated, but was different when stepwise=TRUE.

    Did you get:

        ar1      ma1      ma2   intercept
      0.921  -0.4234  -0.1014    992.0845
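Kenneth’s stepwise=FALSE point, that a greedy search can stop at a local information-criterion minimum while an exhaustive search cannot, can be illustrated outside R. A self-contained Python sketch doing an exhaustive BIC search over candidate AR orders with a plain conditional-least-squares fit (the series is simulated, not a Mann 08 proxy, and only AR orders are searched here, not the full (p, q) grid that auto.arima covers):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated AR(1) series (phi = 0.7) standing in for a proxy record.
n = 500
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()

def ar_bic(x, p):
    """BIC of an AR(p) model fit by conditional least squares."""
    n = len(x)
    if p == 0:
        resid = x - x.mean()
        m = n
    else:
        # Design matrix: intercept plus lags 1..p.
        X = np.column_stack([np.ones(n - p)] +
                            [x[p - k - 1:n - k - 1] for k in range(p)])
        beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
        resid = x[p:] - X @ beta
        m = n - p
    sigma2 = resid @ resid / m
    return m * np.log(sigma2) + (p + 1) * np.log(m)

# Exhaustive search over orders 0..5: no greedy path to get stuck on.
best_p = min(range(6), key=lambda p: ar_bic(x, p))
print(best_p)  # typically recovers order 1 for this series
```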

  68. Layman Lurker said

    Thanks Kenneth. The stepwise=FALSE made the difference. I also discovered that I had started on row 3 instead of row 4 and therefore included the proxy id number as data. The run now reproduces the coefficients in your chart.

  69. Kenneth Fritsch said

    LL, I have tried several different series with stepwise=TRUE and then stepwise=FALSE and have obtained different ARIMA models, with log likelihood and AIC or BIC values that differ, sometimes significantly. In other words, stepwise=TRUE, which is there to save on computation, can miss selecting the best model. Another feature of auto.arima is trace: set trace=TRUE and auto.arima will output a trace of all the models calculated along with their AIC scores.

    By the way, I want to make sure to credit Jeff ID as the first person I noticed pointing to the wide distribution of AR1 coefficients, as I recall, in the Mann 08 proxies; he analyzed and showed how this would give different results than the Mann authors’ approach of simply assuming an average AR1 when decreasing the degrees of freedom to estimate the expected number of proxies selected by chance at a given p-value. Jeff showed the AR1 distribution in a histogram.

    I have been attempting to keep track of the number of proxies that can be deducted from the 484 that the Mann (08) authors used for their selection number. The Luterbacher instrumental and historical series deducts 71, the MXD Schweingruber series cut off from 1960 forward deducts another 105 by my count, the Tiljander series another 4, and the pick-2 selection of an adjacent grid cell to reach the threshold p-value another uncalculated but not small number of proxies. The pre-selection process had also already lopped off proxies for reasons not detailed in Mann (08). Of interest is the criterion the Mann authors used for TRW series, which requires that the samples used for a proxy have an intra-sample correlation of at least 0.50 over the period 1750–1970. I saw no rationale for that selection process.
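Kenneth’s deductions can be tallied directly (a bookkeeping sketch only; the pick-2 and pre-selection effects are omitted because the thread gives no number for them):

```python
passed = 484  # series Mann (08) reports as passing screening
deductions = {
    "Luterbacher (instrumental/historical)": 71,
    "Schweingruber MXD (truncated at 1960)": 105,
    "Tiljander (not responding to temperature)": 4,
}
remaining = passed - sum(deductions.values())
print(remaining)  # 304 before any pick-2 or pre-selection adjustment
```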

    And, of course, we have to doubt the reduced degrees of freedom that the Mann authors used for the expected number of proxies meeting the criteria by chance.

    In my view a true scientific analysis of TRW would require finding reasonable, scientific criteria for a selection process determined a priori. Such a process would then be “tested” against the instrumental record. Also, out-of-sample testing of the currently used legitimate proxies, by bringing them up to date with further measurements on the proxies already selected, would reveal much about whether the proxies are noise or contain a temperature signal, and how strong or weak that signal might be.

    That I have not seen attempts in the climate science literature to accomplish the two objectives presented above, and that climate science authors make calculations like the 484 in Mann (08), makes me skeptical of these proxies and their authors.

  70. Kenneth Fritsch said

    I have now fitted ARFIMA models to the 90 long Mann (08) proxy series and used 27 of these models to estimate the expected number of proxies that would have a 0.1 or lower probability of correlating with the local grid temperature over the 1850–1995 period by chance. Recall that Mann (08) found 484 of a total of 1209 proxies in this probability range (with consideration for decreased degrees of freedom due to AR1), by using 71 instrumental series (Luterbacher), 4 upside-down Tiljander proxies (which were actually not responding to temperature in the period of interest), and 105 MXD series (Schweingruber) that were arbitrarily cut off at 1960.

    The results of 1000 replications of the ARFIMA model fit for each of the 27 proxies are listed in the table linked below. The R function cor.test was used with a one-sided test (positive correlation), and the p-value was extracted.

    The table shows that with ARFIMA models approximately 31% of all 27,000 replications give p-values of 0.1 or less. If we subtract the Luterbacher series from both the numerator and denominator of the Mann (08) calculation, we have a comparable percentage of 36%. Using the SteveM link to a CA thread below, we can see that the MXD proxies all passed the p < 0.1 test. How to handle that series, in view of the arbitrary cut-off, probably needs to be something other than merely eliminating its results from the numerator and denominator, although doing so would allow a sensitivity test for its inclusion/exclusion. Here I will give the MXD series the same success rate as the remaining series minus Luterbacher, which in effect removes 64 from the numerator and yields a percentage of 30%.

    We can obtain a success rate very much in line with that of a legitimate Mann (08) selection process by using ARFIMA models with long-term persistence, and without even adjusting the Mann (08) rates for the pick-2 method applied in that paper.

    I have looked at ARFIMA- and ARIMA-modeled series based on the 90 long proxies and visually compared the three kinds of series, i.e. actual, ARFIMA, and ARIMA. They have much the same appearance, which indicates that an ARIMA series can handle some of the long trending excursions in the actual series as well as, or nearly as well as, ARFIMA does. I plan to do more of the same analysis as here with more long series, using ARIMA.

  71. […] exceeded a certain threshold when using either of the two nearest thermometer records (source, source); that is, the tree may not be sensitive to a temperature measured 100 km away, but […]
