Sometimes they forget

So I left a comment at RC today, bolded below. The boys are stinging again, because they are about as good at PR as most engineers I know.

You know, the lack of disclosure of data not used is nearly equivalent to the regression methods which automatically reject data not preferred. The mere fact that the reconstruction with ALL of the data wasn’t published is not enough to counter the obvious possibility of pre-selection.

[Response: In any statistical analysis there is always a possibility of pre-selection to get a signal, or the possibility of trying different combinations until the signal disappears depending on what the conscious or unconscious bias might be. Yet the scientific literature is not full of people saying that other authors are deceptive or guilty of misconduct because they got a different result. No one can ever prove that they didn’t do a calculation, and ever more insistent demands that they must, are pointless. McIntyre is dead wrong here – both in his conclusions and his conduct. – gavin]

The sophistry here is that we have a history of post-hoc selection of methods (hide the decline), brow-beating of those with different results (the Trenberth travesty), and blocking of papers which refute results (many references). Now we find that many more Yamal-region proxy series were available than stated, and that a reconstruction from such a strong hockey-stick temperature region (usually a 3-month project) has taken years to reach the public eye. Unsurprisingly, now that a basic estimate has been published by Steve McIntyre, the data doesn’t seem to support the six-sigma Yamal trend. So, in context, the request is hardly unreasonable.

In addition, the problem here is that Gavin understands full well the regressomatic techniques of paleoclimate.  I flatly don’t believe he is too stupid to miss how the auto-enzyte algorithms work.  To sum up: The likelihood of Gavin’s misunderstanding of the probability of a paleoclimate regression (post-selection) to get a signal (hockey stick), is inverse to the probability of actually finding said signal.   IOW, he knows damned well how this works.

Of course even Steve’s obviously non-temperature result will work fine in a RegEM, TTLS, TLS, etc. regression. When doing multivariate regression, noise works fine for creating unprecedented temperatures.

166 thoughts on “Sometimes they forget”

  1. Nullius in verba. How many skeptic sites go away when reasonable efforts are made to publish scientifically meaningful SIs?

    auto-enzyte – ha! I am not in the academic world anymore, but I have seen plenty of graphs like these from marketers, CFOs, and financial advisors. Climate science is not special in this regard.

  2. “No one can ever prove that they didn’t do a calculation, and ever more insistent demands that they must, are pointless.”

    Unless, of course, they did do the expected calculation, and reported it.

  3. Auto-enzyte?
    I have no idea what that means. In the first page of Google results I get listings that would probably stop this post getting through to you if I listed them.
    Is that a typo? Or if not, what does it mean?
    I don’t ask without at least TRYING to find out for myself; I’ve tried and failed. Please help.

    1. Sorry.

      https://www.enzyte.com/

      The team uses math where it doesn’t matter if there is a real signal in the data you put in, you get an unprecedented uptick in recent years. See the hockey stick post link above. My tongue in cheek comment was basically in reference to a very old and long discussed problem with multi-variate regression of noisy data.
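      My tongue-in-cheek shorthand has a concrete demonstration. Below is a toy Python sketch with invented noise, nothing to do with any actual proxy network: give least squares as many noise “proxies” as calibration years and it matches the rising target essentially perfectly, while the pre-calibration “reconstruction” is just subdued wiggle. No signal in is required to get a signal out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 50 "proxies" of pure noise, no signal of any kind,
# and a rising "instrumental" target over the last 50 years.
n_years, n_calib = 200, 50
proxies = rng.standard_normal((n_years, n_calib))
target = np.linspace(0.0, 1.0, n_calib)

# Fit least-squares weights over the calibration window, then carry
# those weights back through the full record.
weights, *_ = np.linalg.lstsq(proxies[-n_calib:], target, rcond=None)
recon = proxies @ weights

# With as many proxies as calibration years, the fit is essentially exact:
# the "reconstruction" reproduces the uptick from noise alone.
fit_error = np.max(np.abs(recon[-n_calib:] - target))
calib_slope = np.polyfit(np.arange(n_calib), recon[-n_calib:], 1)[0]
print(fit_error < 1e-6, calib_slope > 0)   # True True
```

      With fewer proxies than calibration years the fit is no longer exact, but the blade survives; the weights are chosen precisely to manufacture it.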

  4. In any statistical analysis there is always a possibility of pre-selection to get a signal, or the possibility of trying different combinations until the signal disappears depending on what the conscious or unconscious bias might be.

    So they recognize the problem in general but are blind to their own situation.

    “Men occasionally stumble over the truth, but most of them pick themselves up and hurry off as if nothing ever happened.” — Sir Winston Churchill

    Their methodology is no better than Dr. Rhine’s ESP experiments at Duke. Keep looking until you find the results that you’re looking for.

  5. I once worked for an instrument company, no longer in business, in which the final checkout of the instrument was to read three identical samples yielding a target standard deviation. When I investigated how they always were able to meet or beat the standard, it turned out they would take ten to twenty readings and select the three “best” readings – the ones that were closest together. Then, when the instrument failed to perform in the field per its specification, the installation engineer was to blame.

    It took me approximately six years to get the practice changed. That was really my mistake, I should have looked for other employment much sooner. However, I was young and ignorant of the ways of the world. I had assumed ignorance on their part rather than laziness or malicious intent. It ultimately turned out to be all three at the same time.

  6. The mere fact that the reconstruction with ALL of the data wasn’t published is not enough to counter the obvious possibility of pre-selection.

    Idungitit. That sentence appears to contradict itself into meaninglessness. Is there an extra negative in it, or something?

    1. Nope, it is right.

      The point of this whole (for me, 3-year) discussion revolves around the intentional selection of data we like vs. data we don’t like. If the “scientists” find data they like and keep it, and discard the data they don’t like, are they really being “scientists”?

      Hide the decline is as much a symptom as it is a result.

      In a nutshell, paleoclimatology has the appearance (no proof) of pre-selecting data which gives the best AGW story. Does anyone really believe Steve McIntyre is so stupid as to ask for unpublished data for no reason? I don’t think so.

      My point is that paleoclimatology regularly ignores data they don’t like. They do it as a matter of standard practice. They intentionally write algorithms which do it and then they defend the algorithms based on the most statistically ignorant commentary anyone can imagine.

      How would it look if Philip Morris ignored any data which supported the carcinogenic nature of cigarettes and only accepted data which showed improved health?

      In my opinion, that is EXACTLY what ALL proprietors of Real Climate support as science.

      1. Pre-selection of preferred data seems worse, because people looked at the data and made a call. Post-selection is actually worse, because it is a mathematically optimized version of the same selection.

  7. And Jeff – you forgot to say that they invent their statistical processing routines without any reference to the study of statistics. The “judgement calls” alluded to by Gavin are where the “science” lies, and they are not presented in a way that they can be discussed. Instead, the focus is on the mechanistic data processing.

  8. Jeff Id wrote,

    ” In addition, the problem here is that Gavin understands full well the regressomatic techniques of paleoclimate. I flatly don’t believe he is too stupid to miss how the auto-enzyte algorithms work. To sum up: The likelihood of Gavin’s misunderstanding of the probability of a paleoclimate regression (post-selection) to get a signal (hockey stick), is inverse to the probability of actually finding said signal. IOW, he knows damned well how this works. ”

    In other words, Gavin has shown he is NOT incompetent, and that he knows he is lying.

  9. Great column Jeff. I hope there can be some acknowledgement in the future.
    In the meantime, a post on some of the Great Canadians whose contributions to society have improved our world would be great. I can think of at least one off the top of my head- Wayne Gretzky.

  10. Someone pointed out a while ago that one should not assume conspiracy when incompetence is a possible explanation. Of course they said it more eloquently than that. Anyway, despite the lofty attitude over at RC, there is a degree of incompetence; and I concede some religious dogma in the groupthink. Any group that believes they are sufficiently self-critical not to cherry-pick, and then develops an algorithm to do exactly that, has demonstrated a fair degree of incompetence.

      1. Re: Anonymous (May 20 01:43),
        They have long been insufficient in this case. Direct challenges to specialist procedures by those with degrees and professional status have been brushed off, even when patently justified. It is not possible that they don’t “know what they do”.

        Malfeasance is now the H0.

  11. Gavin said: “In any statistical analysis there is always a possibility of pre-selection to get a signal, or the possibility of trying different combinations until the signal disappears depending on what the conscious or unconscious bias might be. Yet the scientific literature is not full of people saying that other authors are deceptive or guilty of misconduct because they got a different result. No one can ever prove that they didn’t do a calculation, and ever more insistent demands that they must, are pointless. McIntyre is dead wrong here – both in his conclusions and his conduct.”

    Gavin’s reply is utter BS, pure and simple, and here is why. When SteveM presented further data that appeared to contradict the Yamal data of Briffa, he was doing a sensitivity test, a test that does not validate either proxy series but rather shows the extreme sensitivity of the end result to the selection process. Instead of dealing with the sensitivity test on the basis on which it was intended, Gavin goes off on SteveM, accusing him of something he has not done. That is a diversion, not an answer. Gavin, as an advocate, sees SteveM as the enemy, but only in advocacy terms.

    The silliness Gavin shows in his reply, about never seeing pre-selection of data in published papers, is glaring in the face of what some climate scientists ignore or misuse: the sensitivity test. Those tests can show much about pre-selection and ask (no, beg) the question of the author: why did you limit your selection to those items used in the publication? Further to his silliness, good scientists with any statistical background are well aware of pre-selection and go to great lengths to show and publish a priori selection criteria, and to explain why outliers might have been excluded after the fact.

    When you read a paper that has not done sensitivity tests, or not done them properly, or has not carefully reported its selection rules and when those rules were applied, or has used arbitrary reasons for excluding data after the fact, you have every right to be very skeptical about the paper’s conclusions and to do your own sensitivity tests. These are the points that Gavin entirely ignores.
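    For readers who have not seen one, a sensitivity test can be as simple as a leave-one-out loop. Here is a toy Python sketch with invented data (a real chronology is far more involved than a standardized mean); the point is only the mechanics of the test:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: five noise series plus one series (column 0) carrying
# a genuine trend. The toy "chronology" is just a standardized mean.
n_years, n_series = 100, 6
series = rng.standard_normal((n_years, n_series))
series[:, 0] = np.linspace(0.0, 1.0, n_years) + 0.1 * rng.standard_normal(n_years)

def chronology(cols):
    s = series[:, cols]
    s = (s - s.mean(axis=0)) / s.std(axis=0)   # standardize each series
    return s.mean(axis=1)

def trend(x):
    return np.polyfit(np.arange(len(x)), x, 1)[0]

full = trend(chronology(list(range(n_series))))

# Leave-one-out: how much does the chronology's trend move when each
# series is dropped in turn?
impact = [abs(full - trend(chronology([j for j in range(n_series) if j != i])))
          for i in range(n_series)]
most_influential = int(np.argmax(impact))
print(most_influential)   # -> 0: the trend-carrying series dominates
```

    A paper that reports this kind of check tells the reader exactly how much any one series drives the answer; a paper that omits it leaves the reader guessing.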

    1. And my Id is annoyed that even without pre-selection, the methods used create the same problem post-selection. See, Gavin acts all self-righteous about even the suggestion of selecting preferred data, but that is exactly what the regressomatic auto-enzyte methods do.

        1. I should have added that if one decides to scale the proxy response by subtracting a mean and dividing by the standard deviation the regressomatic could be avoided and then we need to talk about pre-selection.

  12. “Yet the scientific literature is not full of people saying that other authors are deceptive or guilty of misconduct because they got a different result”

    Oh, the irony, in Gavin not realising why that is so; in particular the “..is not full of..” part.

  13. Jeff, the sad fact is just this:

    Two documents intended to advance mankind:

    a.) The 1776 US Declaration of Independence http://www.ushistory.org/declaration/document/

    b.) The 1945 United Nations Charter http://www.un.org/en/documents/charter/preamble.shtml

    Evolved into opposing versions of truth (reality):

    Science ceased being a tool to advance mankind and became a tool to control and enslave mankind with falsehoods and half-truths – like AGW, standard solar model, and oscillating solar neutrinos !

    Wise leaders would have protected and implemented both documents. Unwise leaders tried to achieve the objectives with deception.

    With kind regards,
    Oliver K. Manuel
    http://www.omatumr.com
    http://omanuel.wordpress.com/about/

  14. “Obviously non-temperature”? It wasn’t obvious to me until I looked at this Norwegian research report, linked from a comment in climateaudit: http://met.no/Forskning/Publikasjoner/filestore/Ealat_Yamal_climaterep_dvs-1.pdf

    The station Mare-Sale is on the peninsula proper. Steve’s chart doesn’t look very much like the temperature graphs, for sure, and also not much like the precipitation data. But in any case, the most interesting thing, perhaps, is that Yamal was slightly warmer in the period 1935-1950 than it has been in recent years. There even was slightly less sea ice in the Kara Sea back then! (figure 5)

    So whether you manage to squeeze a “temperature signal” out of the Yamal larches (maybe it’s possible if using the temperature data for the right season and doing a multiple regression with snow cover, precipitation and other factors) or not, you’re still left with a divergence problem! A divergence between what they want to show and the reality, that is…

  15. Who would have guessed in the fall of 2009 that the Climategate emails and documents and “official responses” to them would expose sixty-four years (2009-1945 = 64 yrs) of deceptive science to “save the world” from the threat of nuclear war? http://omanuel.wordpress.com/about/#comment-70

    Thanks, Jeff, for your role in this! It may take a while, but the outcome of this disagreement is decided.

  16. In today’s reply, Steve just sliced and diced Gav like a veg-o-matic. Which reminded me of Jim Croce’s song.

    “The only part that wasn’t bloody was the soles of the GISS man’s feet”

      1. Do you agree with Rashit? It seems to me that he jumped the gun awfully quickly. I could have replicated Steve’s result without even asking, because he has done the same calculation online so many times, and it is typical of Steve to do the post and put the code up later. I’ve seen the same thing dozens of times.

        It looks more like Briffa was pissed that Steve got the data.

        1. Well, the code answers some of the questions. And it does run; I’ve checked. But whether it makes sense I don’t know. Rashit seems to think that Steve’s analysis was hasty. He knows about this stuff. I don’t. But I suspect Rashit knows it isn’t that easy.

      2. Well Nick, if you choose to jump on the wagon with KH, you end up in the quicksand with him. You’re surrounded by trouble and no way out. Should’ve looked more closely before you jumped on board.

        1. Well, when you write a post called “New data from Hantemirov”, and you’re running a campaign based on the claim that Briffa failed to use RH’s data, and you get a loud slapdown from the man himself, then I think it isn’t me in the bog.

  17. NS says “A loud slapdown”. Yeah right. What would have been his reason/motivation for such outrage? Seems like a PC move to me.

    1. It sure sounds to me like he meant it.

      As I noted here, you can ask people to be cooperative, and they often will. But if the result is a series of posts maligning their respected colleagues, they are not going to like it. RH didn’t like it.

  18. I believe Nick Stokes is like Schmidt and RH in that they evidently would rather scold SteveM than address, as I think scientists would, the sensitivity tests that he ran. RH has the code and data just like SteveM has it, and if he has run it, like a good scientist would, and his results were significantly different, I suspect we would have seen those results. I suspect Briffa has the data and code and has probably run it also. Briffa did reply to SteveM’s first sensitivity test.

    It would appear that sensitivity tests are so foreign to climate scientists that they take them to mean that the person doing them is in effect calling the authors liars and they get so irritated that they cannot discuss the results – as good scientists would.

    1. Kenneth,
      I see you’re scolding me for scolding Steve for scolding almost everybody, but not saying much about his analysis either. Well, as I said above, I ran his code, and it gives the results shown. But I think Gavin and Rashit are right. There is just so much missing.

      Briffa in his 2008 paper does a whole lot of prior analysis to show that his sample is suitable for the analysis. In Sec 5, for example, he tests the relation between tree growth and climate. He finds, interestingly, that the sample he used does not have divergence problems. Sec 3 discusses in detail the climate in the regions, with seasonal variations, again to demonstrate that the trees are good candidates for a temperature signal.

      Steve M does none of those things. He just takes a whole lot of trees in a just-arrived data set and cranks the RCS handle, and finds the results are different. So?

      1. The only weather station on the Yamal peninsula proper shows a stronger early 20th century warming than late century warming, so I think the whole divergence problem should be questioned. See link above.

        1. See Briffa’s Fig 7. Correlation between instrumental temperature and proxies is good through C20 in the samples. Neither shows a very marked general rise.

          1. Nick,

            Maybe you can explain to us ignorant types how what Briffa did shows that only a few trees are truth tellers and the majority of trees are liars. While you are thinking of your reply, it would be good to include the proof that trees CAN make a reasonable treemometer and that it is reasonable to use calibrated trees for the blade and non-calibrated trees for the handle etc. etc. etc. etc. etc. etc.

      2. Nick, it is very unclear how Briffa selected his proxies in his study, and thus when SteveM runs his sensitivity tests and obtains different results, the interested scientist and citizen-scientist can well question the Briffa selection process. From the Briffa article you linked, I have the following excerpt:

        “Combining different site data to form regional chronologies is generally only valid if there is an inherent common signal representing the expression of an underlying common forcing in the regional data. The assumption here is that each regional aggregation of data, and hence the variability of ring-width indices from year to year and on longer time scales, represents the expression of common climatic forcing.”

        That excerpt by itself would make me doubt Briffa’s understanding of selection bias and that it could lead to some very misleading conclusions.

        I am not at all certain that you, Nick, appreciate or understand the statistical repercussions that come from selection bias. I believe you have indicated in earlier exchanges that tacking the instrumental record onto the end of proxy series is perfectly valid. That in my mind is the same as saying we can assume the proxy response is as valid a thermometer as that measured instrumentally and we can put the results on an equal footing.

        1. I should have added that SteveM takes chronologies based on what he can obtain data for and shows the results on his blog. That is pretty straight forward. On the other hand, we do not know specifically why or what Briffa rejected in his analysis. It is obvious that SteveM is forcing Briffa’s hand again to make these revelations – which is all good for science.

        2. Kenneth,
          Briffa set out requirements that his data had to fulfill (“empirically demonstrated link between this growth variability and local measured instrumental temperature data”), and showed that they did. Then he analyzed.

          Now there may be other sets that fulfill the requirements and would give a different answer. That would show that there is indeed a selection issue. But to pick a set without showing that it fulfills the requirements doesn’t help at all.

          1. “empirically demonstrated link between this growth variability and local measured instrumental temperature data”

            Empirical criteria for rejecting data which doesn’t match what you want is exactly the ‘crime’ for which the team is accused.

          2. Nick,

            why do you seem to be saying circular logic is good??

            1) set requirements that will select the data that will show what you want to show.
            2) run the code selecting the data
            3) declare the selected data proves what you set out to prove.

            Gee, I should start writing papers proving the trees show alarming cooling!! Oh yeah, you need a group of cronies in positions of influence to be taken seriously.

            HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA

            Isn’t there something in statistics that says you should use OTHER data for your actual series after determining your requirements with one set??

            With such a small data set it brings in the real possibility that you are simply looking at statistical FREAKS!!

          3. “But to pick a set without showing that it fulfills the requirements doesn’t help at all.”

            To solve part of this problem, you could just stop using the series at the given time it stops meeting the requirements. 🙂

          4. Why do you think they go to places like Yamal anyway? It’s to find trees for which temperature is likely to be the dominant determinant of growth. Likely, but how do you know? You check – you have to. And in that 2008 Briffa paper, it’s all laid out in Fig 7. Correlation between measured temp and tree-ring indices. It’s a basic part of the system.

            And if you want to use Hantemirov’s other data, you have to do similar checks.

            And so yes, it’s true that you can’t satisfactorily use proxies as an independent measure of temperature in the instrumental period. You use up most of the information just establishing that the proxies are valid.

          5. Nick,

            As you are aware, if the ‘sorting’ is more than a minuscule amount of the data, then all you are doing is representing your preferred version of the noise. You also guarantee a flat handle for your hockey stick as the non-climate signal will cancel in history creating a reduced variance.

          6. Jeff,
            I did some sorting like that here, where I took 10000 random runs and sorted them by hockey-stick index, which is a crude measure of correlation. And then you do get just sorted noise. But even 10000 doesn’t give you much, which illustrates that the kinds of correlation that Briffa shows are not just sorted noise – they are statistically highly significant.

            I can’t see how that guarantees a flat handle – he isn’t really sorting for flatness. It’s true that in two of the three regions there’s little instrumental trend, but there’s no real indication the process selects for trendlessness – ie that the measures rejected were more likely to have a trend or other pattern that could be propagated back in time.

          7. “I can’t see how that guarantees a flat handle”

            This is odd to me. Think about what happens if you have ‘flat’ noise and sort by correlation to an upslope. The upslope itself nearly guarantees a sort for a blade by the basic correlation mathematics. If you assume flat growth for a thousand years of trees, you can quickly grasp how sorting for an upslope only in recent years will yield an upslope.

            Hell, I took Mann’s data and found the same thing everywhere I searched, no matter which year I tried.
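            The sorting effect takes five minutes to verify. Here is a toy Python sketch with pure invented noise, no real tree data: screen for correlation with a recent upslope, average the survivors, and you get a blade in the screened window and a flat, variance-reduced handle everywhere else.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented data: 2000 series of pure noise standing in for "trees" with
# no temperature signal, plus a rising target over the last century.
n_years, n_series, n_calib = 1000, 2000, 100
noise = rng.standard_normal((n_years, n_series))
upslope = np.linspace(0.0, 1.0, n_calib)

# Screen: keep only the series that happen to correlate with the upslope.
r = np.array([np.corrcoef(noise[-n_calib:, i], upslope)[0, 1]
              for i in range(n_series)])
survivors = noise[:, r > 0.2]
recon = survivors.mean(axis=1)

# Blade: every survivor trends up in the screening window, so the mean does.
blade = np.polyfit(np.arange(n_calib), recon[-n_calib:], 1)[0]
# Handle: before the window the survivors are mutually uncorrelated noise,
# which cancels into a flat line with artificially reduced variance.
handle_std = recon[:-n_calib].std()
print(blade > 0, handle_std < 0.3)   # True True
```

            Roughly 2% of pure noise passes a screen like this by chance alone, and the average of the survivors is a hockey stick.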

          8. Kan,

            “To solve part of this problem, you could just stop using the series at the given time it stops meeting the requirements.”

            I hope you are joking. The issue is that they are making special calls for special data, and just because this data MATCHES something they are trying to prove, they declare it good data. As I mentioned above, how do you know the series in such a small data set aren’t statistical FREAKS!!! That they actually have meaning instead of being accidents??

  19. Nick:

    Correlation between measured temp and tree-ring indices

    That’s not how you establish causation. You need an independent method to establish whether the trees are behaving as temperature proxies, one that doesn’t require comparing tree data to instrumental data, and a metric for determining when the proxy “fails” to continue to track temperature that doesn’t depend on instrumental data. (Otherwise how do you “know” when a proxy is acting as a temperature proxy outside of the incorrectly labeled “verification period”?)

    Correlational studies should be reserved for validation of the proxies, not for “verifying” which data are temperature proxies. (Using paleoclimatology’s logic that “correlation equals causation,” the number of firemen in San Francisco would be a good temperature proxy over the interval for which we have instrumental data. Nice hockey-stick shape too.)

    We factually know that trees respond to other quantities besides temperature, for example moisture, solar exposure, and soil conditions. In fact, adequate moisture, solar exposure, and nutrient availability are all stronger predictors of the growth rate of a tree at a particular location than temperature (whether it’s on a north- or south-facing side of a hill, for example, is well known in forestry to matter more for how rapidly a tree gains girth than the mean temperature of the region), so the use of particular tree-ring proxies needs to be critically examined before they can be accepted as valid temperature proxies (something beyond a “*wink* we checked” statement from the scientists is needed). In addition, we factually know that trees don’t have a linear response to temperature (in fact it’s double-sided: there is an optimal temperature for growth, all other factors being equal), but linear correlational methods between temperature and tree-ring indices assume a linear relationship. This doesn’t necessarily eliminate the use of tree-ring proxies, but it does suggest you need something more than a simple correlational method to sort the good proxies from the bad.
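    The double-sided response is easy to demonstrate. In the toy Python sketch below (a parabolic response curve and all numbers invented purely for illustration), a linear model calibrated on the cool side of the growth optimum reconstructs a genuinely warm past as cool:

```python
import numpy as np

# Invented inverted-U response: growth peaks at an optimum temperature,
# so a single ring width maps back to TWO possible temperatures.
t_opt = 15.0
def growth(temp):
    return 1.0 - 0.05 * (temp - t_opt) ** 2

# Calibrate a linear growth-to-temperature model on cool-side samples only...
calib_t = np.linspace(8.0, 14.0, 30)
slope, intercept = np.polyfit(growth(calib_t), calib_t, 1)

# ...then hand it a ring that actually grew in a warm period above the optimum.
true_past = 18.0
inferred = slope * growth(true_past) + intercept
print(inferred < t_opt < true_past)   # True: the warm past comes back as cool
```

    The linear calibration cannot distinguish a warm-side ring from a cool-side ring of the same width, so every excursion above the optimum is folded back down into an apparently cool past.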

    There is some voodoo mumbling about “temperature limited growth” that gets used as a justification for site selection, but the argument is very weak, seems completely untestable, appears to lack logical self-consistency and, as things stand, certainly wouldn’t pass muster in any field that takes itself seriously as practicing real science. For example, selecting trees that at some point are at the “tree line” is useless unless you can track where the tree line shifts as climate changes, and even then, changes in precipitation amounts have a stronger effect than temperature, and regionally, precipitation certainly does not correlate with temperature over long periods of time. Of course this lack of a correlation between temperature and precipitation is an interesting observation in itself, as a majority of, e.g., Mann 08’s tree-ring proxies are actually precipitation proxies, which he assumes act indirectly as temperature proxies due to what he assumes is a linear relationship between temperature and precipitation.

    The sad truth is that many of the problems in paleoclimatology are solvable, if the people doing the work were more intellectually honest, if the people who aren’t paleoclimatologists weren’t quite so credulous about the flawed work being produced, and frankly, if the overall quality of the science paleoclimatologists practice were a bit less mediocre. Paleoclimatology’s biggest claim to fame is its utility as a propaganda tool, not any underlying “truth” it reveals about historical climate. Thus the paleoclimate people have no motivation to “fix” the problems in their field. They are already giving the answer that is politically convenient. Real improvements in methodology may lead to inconvenient results.

    The fact that the tree-ring temperature proxies fail to validate for about 1/3 of the temperature record is enough to tell any unbiased person that what the reconstructions produce outside of the inaptly named “verification period” is most likely crap. The fact that they often fail to display the periods of overlap with the instrumental records, where the tree-ring proxies diverge from the instrumental record, should tell any neutral observer that they themselves are aware they are producing crap.

    This field has probably done more to undermine the credibility of climate science in the eyes of neutral observers than any microscopic benefit it offers could outweigh. It tells the rest of us just how badly the “true believers” want to believe that they would so readily accept, without critical thinking, this type of work as valid results that should be taken seriously. And the fact that some proxies end up getting used upside down, opposite in direction to the underlying mechanism that gave rise to the temperature proxy, should be enough to inform even the casual observer that the problem isn’t just mediocrity, it’s blatant dishonesty on the part of the so-called scientists.

    1. Carrick,
      For my part I think it would be a miracle if we could get a really reliable, accurate method of determining the temperature a thousand years ago. And I also think the temperature a thousand years ago isn’t relevant to anything much. So I don’t pin a lot of faith on proxy results.

      Still, there is evidence there. Even though we don’t have “a metric for determining when the proxy “fails” to continue to track temperature that doesn’t depend on instrumental data”. Instrumental data is the only temperature measure we have. The only thing we can do with the tree-ring data is correlate. It doesn’t prove causation, absolutely, and doesn’t prove anything about what happened pre-instrumental.

      It isn’t proof, but it isn’t nothing. And, it seems to me, Briffa in that 2008 paper does it about as well as it can be done. For that sample, there isn’t a divergence period. Divergence is common but not universal. The correlation is thoroughly explored, and looks pretty good.

      But the proposition about proxies not telling you about the instrumental period seems fundamental to me, and is widely misunderstood by sceptics. As in their favorite phrase “hide the decline”, which is based on the idea that paleos are trying to manipulate to show a hockey stick with proxies. They aren’t; they know that proxies can’t (because of the circularity of the argument) confirm the recent rise. And they aren’t needed for that; thermometers are more credible anyway.

      Conversely, they have no reason to “hide the decline”, for the same reason. No informed person, rational sceptics included, believes that a post-1960 downturn means that the thermometers are wrong. The sparsely understood rational argument is that divergence detracts from the credibility of the proxies. But that would be true whether the deviation was down or up.

      1. Nick:
        You are very good at misdirection and straw person arguments. Have you been practicing? 🙂

        But the proposition about proxies not telling you about the instrumental period seems fundamental to me, and is widely misunderstood by sceptics. As in their favorite phrase “hide the decline”, which is based on the idea that paleos are trying to manipulate to show a hockey stick with proxies. They aren’t; they know that proxies can’t (because of the circularity of the argument) confirm the recent rise. And they aren’t needed for that; thermometers are more credible anyway.

        This is ridiculous: “Proxies can’t confirm the recent rise”? No, what we are talking about in the “hide the decline” situation is that not only do the proxies not track the recent rise in the target temperatures, they are going in the WRONG direction in the sole period where most reconstructions are being calibrated to the temperatures. There is no circularity involved. Without proper calibration, the entire reconstruction will be junk.

        Your final sentence is classic straw-man misdirection intended to obfuscate the real problem which you fail to address from a scientific viewpoint.

        Conversely, they have no reason to “hide the decline”, for the same reason. No informed person, rational sceptics included, believes that a post-1960 downturn means that the thermometers are wrong. The sparsely understood rational argument is that divergence detracts from the credibility of the proxies. But that would be true whether the deviation was down or up.

        You start this paragraph with a continuation of that misdirection before you finally get to the real issue. However, you don’t frame the issue quite accurately. The bottom line is that the divergence detracts from the credibility of the entire reconstruction, and not just the reliability of one or two recalcitrant proxies. It was indeed a portion of the reconstruction, not the proxies, that magically disappeared from a graph intended to present the results of the reconstruction itself.

        Furthermore, it would not be exactly the same “truth” if the deviation had been up because at least that could offer some hope that the calibration might have been somewhat successful. Do you really believe that “hide the decline” would even have happened had it been exaggeratedly steeper?

        1. Roman,
          Perhaps I’m not very good at direction because you’re just not following the argument. I’m saying that any proxy-based claim about recent temperatures fails because of circularity. That is, you’ve selected proxies that correlate with instrumental temperature over the last century or so, and so all they are telling you in that period is the instrumental period. This is not really altered by the division into calibration and verification periods. Paleos know that, and do not make proxy-based claims about recent temperatures.

          The misconception is with sceptics, and you’ve said it in caps: “they are going the WRONG direction”. There isn’t a wrong direction; since the proxies aren’t telling you anything about temperatures in this period, they can’t tell you anything wrong. The only question is whether they correlate or not. If they don’t, you can’t use them – you can’t even calibrate them. The vexing problem arises when some do seem to correlate well over a consistent part of the period, and not over another part. That’s the much discussed (by paleos) divergence problem.

          “you finally get to the real issue”
          You’re a bit impatient here. I thought I was logically building the argument. Yes, divergence casts doubt on the whole thing. But how much doubt? Yes, it’s not just one or two recalcitrants. But it isn’t all of them either.

          1. “is the instrumental period” – I meant “is the instrumental temperature”

          2. Re: Nick Stokes (May 20 18:00),

            There isn’t a wrong direction; since the proxies aren’t telling you anything about temperatures in this period, they can’t tell you anything wrong. The only question is whether they correlate or not.

            I don’t think that I have ever read a more ridiculous statement from you. Really! I am (almost) speechless…

            Wiggle matching is not science. Usable proxies are not supposed to be black-box entities which contain completely hidden information on relationships between physical entities, teased out by statistical techniques. If you don’t understand what to expect for the specific type of relationship between the proxy and the entity which it supposedly represents, then you should not be using it.

            Yes, in the initial stages, while you are learning about a possible proxy and its characteristics, you might not be aware of the direction of a physical relationship. But if you have reached the stage of applying it to a reconstruction and still are not aware of what it should look like, how can you merit the appellation of scientist?

            Now I can understand why upsidedownmann could have taken umbrage at criticism of his “only in climate science are we allowed to do this” analysis.

          3. Nick,

            If your statement about the claims of the dendro reconstructions is correct, then please explain the use of the word “unprecedented” for late-20th-century temperatures when they discuss the >400-year reconstructions? Their claims appear to be, from the words they choose to use, far greater than you are giving them credit for.

          4. Kan,
            Let me say it again – proxies aren’t telling you about the instrumental period. The reason is that you’ve chosen proxies that correlate with instrumental. Of course this isn’t perfect. But still, having made that choice, to the extent they agree you can’t then claim that the proxies that you’ve chosen are supporting instrumental. You chose them that way, so that you could then use them to measure pre-instrumental temperatures.

            So when they say unprecedented, they are citing instrumental temperatures over the last 100 years or so, and then proxy temperatures before that.

          5. Nick,

            Don’t you throw up in your mouth just a little when you redefine what the authors of the papers said their reconstructions showed and why??

            I have a better idea. Why don’t you go and tell them what to say BEFORE the paper is accepted and published, rather than trying to rewrite them after they are published and the press release exaggerates what they actually say?!?!?!

          6. Roman,
            Again I think you’re missing the point. Of course they know what relationship they “expect” ie would like. They would like some treering index to be an affine function of some temperature – probably a growing season temp. And they select at various levels to try to find proxies with that property. They go to places like Yamal, where reasoning and experience suggests that it might be so. Then they go through their calibration/verification exercise against instrumental, and discard (or downweight) some that don’t qualify.

            Having done all that, they don’t then have, in the remaining sample, an independent measure of temperature against instrumental in the overlap period. That’s not what they are claiming, and not what they are looking for. What they are claiming is that the selected proxies, which correlated with some measure of air temperature in the overlap period, can reasonably be expected to continue that relationship in the past.

            Perhaps the bare words that you quoted (“the proxies aren’t telling you anything about temperatures in this period”) are responsible. The argument I’m setting out is that the proxies as selected aren’t telling you anything about temperatures in this period. It isn’t that they aren’t responding to temperature; the issue is that the selection took away the independence of the information. That’s a statistical argument.
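            That statistical argument can be illustrated with a toy screening simulation – entirely synthetic data, not any actual proxy network – in which pure-noise “proxies” are screened for correlation against an instrumental-style series. The survivors necessarily track the target over the screening window, so their agreement there is not independent confirmation of anything:

```python
# Toy demonstration of selection-induced circularity (all data synthetic).
import numpy as np

rng = np.random.default_rng(0)
n_years, n_proxies = 150, 1000
# A made-up "instrumental" series: gentle trend plus weather noise.
temp = np.linspace(0.0, 1.0, n_years) + rng.normal(0, 0.3, n_years)
# Candidate "proxies" that are pure noise -- no climate signal at all.
proxies = rng.normal(0, 1, (n_proxies, n_years))

# "Calibration" screening: keep proxies that correlate with temp over the
# last 50 "instrumental" years above some cutoff.
cal = slice(100, 150)
r = np.array([np.corrcoef(p[cal], temp[cal])[0, 1] for p in proxies])
selected = proxies[r > 0.3]

recon = selected.mean(axis=0)
# The selected-mean tracks temp inside the screening window by construction...
r_cal = np.corrcoef(recon[cal], temp[cal])[0, 1]
# ...but carries no information outside it, since the proxies were noise.
r_pre = np.corrcoef(recon[:100], temp[:100])[0, 1]
print(len(selected), r_cal, r_pre)
```

            The selected-mean should correlate well with the target inside the screening window and show essentially no relationship before it – which is the sense in which the in-period agreement is not independent information.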

          7. Nick,
            “So when they say unprecedented, they are citing instrumental temperatures over the last 100 years or so, and then proxy temperatures before that.”

            “The sparsely understood rational argument is that divergence detracts from the credibility of the proxies.”

            And now we have the specific problem. Until the divergence is explained, the dendro proxies are useless, and the statement “unprecedented” is knowingly and unabashedly misleading.

            The resolution to the divergence problem requires more investigation, especially of the selection processes used. So far, as more of the data that was available to these scientists becomes public, the selection processes they used have raised very serious questions. But, so far, no retraction of “unprecedented”.

          8. Kan,
            “the selection processes they used have raised very serious questions”
            And what are those very serious questions?

          9. The serious question is whether they have found nothing more than anomalous outliers. Only a small subset of the total population exhibits the desired properties that correlate with temperature, and in some cases for only a subset of the given time period. Perhaps the selection process used, which lends itself strongly to an inherent bias, is based on a false premise. That is a serious question.

      2. Nick:

        The only thing we can do with the tree-ring data is correlate

        If that were true, tree-ring data would have no value.

        I think the statement is not true: there are other ways to verify besides correlation with temperature (few things are better studied than trees, given their economic value). The problem, IMO, is bluntly that the people doing this pseudoscience are neither very bright, nor ethical, nor very gifted as scientists; they just make good mouthpieces for a political movement. This area of research needs a complete reset.

        Conversely, they have no reason to “hide the decline”, for the same reason. No informed person, rational sceptics included, believes that a post-1960 downturn means that the thermometers are wrong. The sparsely understood rational argument is that divergence detracts from the credibility of the proxies. But that would be true whether the deviation was down or up.

        Of course this is nonsense. Anytime you have an adverse result, you automatically have a reason to hide it. What they incorrectly call ‘verification statistics’ fails for about 1/3 of the measurement period; that’s about as adverse as it gets without total burn-down, fall-over and sink-into-the-swamp. Whether people were previously informed of this is irrelevant to the fact that they don’t show adverse results, or to whether they should – which they should, beyond any possible argument to the contrary.

        I don’t find the paleoclimate results particularly interesting because I think they are wrong. Bad data plus wrong methods = crap.

        I also find historical records of local climate much more interesting (and historically relevant), because certainly it’s how local weather is varying that affects whether (e.g.) ships are being sent out for exploration and merchanting. What the global mean temperature is at a point in time isn’t even a particularly meaningful number.

        The paleo results, even if they were right, wouldn’t say much of use regarding the impact of AGW. As it is, they are useful for shoring up the faith of the faithful, but I find reliance on, or even defense of, this garbage to be stomach turning.

        1. “Whether people are previously informed of this is irrelevant to the fact that they don’t show adverse results”

          But they do show and have shown the adverse results. The complaint is that they sometimes excluded them from plots which were supposed to convey information about temperature. They are excluded because they would convey what is acknowledged to be misleading information about temperature. Not every plot is about verification statistics.

          “there are other ways to verify besides correlation with temperature”
          Examples?

          1. Nick,

            You still haven’t even begun to explain how the selection process for the trees used to make the blade has any correlation with, or usefulness in, the selection of the trees that make the shaft, or vice versa. Until you do, you are simply smearing yourself with the ridiculous idea that having two sets of data selected by different means actually has a coherent meaning in this setting.

            What method of selection was used that is consistent throughout the whole reconstruction?!?!?! Where there are differences, where is the research supporting those differences?!?!?!

            Basically you are engaging in special pleading, unsupported by research, for the blade, and throwing the kitchen sink into the shaft. Shame.

          2. KK,
            They don’t choose trees to make the blade. That’s the point. The HS consists of a proxy shaft and an instrumental blade. Because of the selection process, the proxies should track the blade reasonably well, but that isn’t independent information. And when there is a deviation, as with the post-1960 decline observed in some but not all situations, it’s because proxies with that degree of deviation were admitted, presumably because of the prevalence of divergence.

          3. nick

            I guess the trees “chose” themselves without the Paleo guys setting up any equations and doing any processing??

            HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA

            Apologies for my not being as precise as you in my writing. It does leave you wide holes for misdirection though!! 8>)

            Back to the difference in how the trees were selected between the blade and the shaft. If the same criteria are not used for both sets of trees, the reconstruction is essentially meaningless, unless you can explain to us why it would not be. As you so ably point out, the only real criterion for the trees in the blade was correlation with a teleconnection (snicker) temperature!! Since we cannot use this same criterion to select the trees for the shaft, it is arm waving taken to an extreme degree to make ANY claims for the reconstruction!!

            This is the issue I believe Steve Mc. is pointing out and you are attempting to talk around. It is also probably why a certain “scientist” was so upset with Steve Mc. He is now culpable for collaborating with the enemy in showing how his brethren are lacking in any honesty in their work and claims. If he didn’t scream, they would have to see if he floated while tied to rocks.

          4. Nick:

            But they do show and have shown the adverse results

            You are using the “almost a virgin” argument. Some of them show adverse results some of the time.

            Examples?

            Are you arguing for the sake of arguing or do you really not know the literature well enough to answer this question yourself?

          5. “Are you arguing for the sake of arguing”

            No. I can’t think how you can verify that a particular tree-ring index is a proxy for temperature other than by its correlation with some measured temperature. Possibly with other temperature proxies, but I don’t think they do that. Unless you mean that the whole reconstruction process is a verification? I’m hoping you could explain.

          6. If you’re going to be an ardent supporter of a particular approach, I’d suggest it’s your responsibility to become familiar with the literature, and not mine to make you familiar with it.

            Sorry, you can’t have it both ways. You can’t claim to be an “expert” and make assertions that correlation=causation aka cherry picking data is the only strategy available to paleoclimatologists with respect to proxy reconstructions, and then plead for references when it is pointed out this simply isn’t true. RTFL.

          7. And before you start chasing wild horses, I suggest you refresh your memory that I was referring to the verification process for when and where tree ring data behave as temperature proxies. (Just so we don’t substitute arguments about needing to use correlation-based metrics in the reconstruction process, which is different from arguing that you need to use correlation in the selection process.)

      3. Nick Stokes,

        “And I also think the temperature a thousand years ago isn’t relevant to anything much. So I don’t pin a lot of faith on proxy results”

        OK then, let’s defund all the paleo guys and get onto what you believe supports your theory that CO2 is an issue??

        1. There are plenty of things I’d defund before paleo. OK, I should have emphasised – not much relevant to AGW. The reason for concern re CO2 is the same as it has been for 115 years – the greenhouse effect. If you put CO2 in the air, it’s going to get hotter.

          1. Nick,

            tch, tch, tch. That is your ASSERTION and you are sticking to it. We are still waiting for that mythical paper that actually SHOWS us HOW MUCH it will warm in the real world rather than in the models. Are you ready to perform that miracle for us now??

            HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA

  20. Again I think you’re missing the point. Of course they know what relationship they “expect” ie would like. They would like some treering index to be an affine function of some temperature – probably a growing season temp. And they select at various levels to try to find proxies with that property. They go to places like Yamal, where reasoning and experience suggests that it might be so. Then they go through their calibration/verification exercise against instrumental, and discard (or downweight) some that don’t qualify.

    Would like???

    “Expect” is not just “would like”. The whole point is that there is an underlying claim that proxies contain information about the temperature. In the case of trees, this claim is supposedly based on physical principles due to which it also contains a built-in requirement that expects that the observed relationship will be specifically positive – i.e. increasing temperatures will produce an increase in tree ring size.

    The initial information in the sample IS independent of the observed temperature record. The calibration process estimates parameters in determining the exact numerical form of the relationship. Unless there is severe overfitting, only ***some*** of the information in the sample will be used for this purpose.

    The validation process is supposed to verify that the reconstruction has properly extracted the information and can produce reasonable estimates of temperature by showing that the reconstructed temperatures can reasonably approximate the observed temp series over a portion of time which was not used in calculating the reconstruction. If the results are not close enough, the reconstruction is declared to be a failure.
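    The calibration/validation machinery being described can be sketched in a few lines (synthetic series, an ordinary least-squares calibration, and the reduction-of-error score commonly used as a verification statistic in this literature; real implementations vary in the details):

```python
# Sketch of a split-period calibration/validation check (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
years = 100
# Made-up "instrumental" temperature: oscillation plus a mild trend.
temp = np.sin(np.linspace(0, 6, years)) + 0.01 * np.arange(years)

def re_stat(obs, est, cal_mean):
    # Reduction of error: 1 minus the squared error of the estimate relative
    # to a constant prediction at the calibration-period mean; RE <= 0 fails.
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - cal_mean) ** 2)

def validate(proxy):
    cal, val = slice(0, 50), slice(50, 100)
    b, a = np.polyfit(proxy[cal], temp[cal], 1)   # calibrate temp ~ a + b*proxy
    recon = a + b * proxy
    return re_stat(temp[val], recon[val], temp[cal].mean())

good = 1.5 * temp + rng.normal(0, 0.3, years)   # proxy that really tracks temp
junk = rng.normal(0, 1, years)                  # proxy with no temperature signal
print(validate(good), validate(junk))
```

    A proxy carrying real signal scores a clearly positive RE in the withheld period, while a signal-free proxy scores near or below zero and is declared a failure – which is the sense in which the calibration period itself does not get a free ride.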

    What you don’t seem to understand is that the calibration period does not get a free ride. Although the reconstruction cannot be validated by the results in the calibration (a concept which you seem to be interpreting as “proxies not telling you about the instrumental period”), the fit should be as good as or even better during this period than it is in validation. If this is not the case, then there is a serious problem with the calibration. “Hiding the decline” was the result of the understanding of this concept by the team.

    Perhaps the bare words that you quoted (“the proxies aren’t telling you anything about temperatures in this period”) are responsible. The argument I’m setting out is that the proxies as selected aren’t telling you anything about temperatures in this period. It isn’t that they aren’t responding to temperature; the issue is that the selection took away the independence of the information. That’s a statistical argument.

    Estimating parameters and calculating the reconstruction does not necessarily “take away the independence of the information”.

    Look at a standard regression. One can estimate the parameters of the relationship and have information left over to examine the fit through a variety of diagnostic procedures. The residuals in such a regression are uncorrelated (and, for Normal errors, independent) with the parameters. So there is real information available which can allow us to test whether the regression itself might be problematical. The same principle applies to the results of a reconstruction during the calibration period.

    That’s a statistical argument…
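    A toy regression makes the point described above concrete (hypothetical data, not any particular proxy set): OLS residuals are orthogonal to the regressor by construction, yet they still carry diagnostic information – a divergence-like drift injected into part of the sample shows up plainly in the residuals:

```python
# Residual diagnostics after an ordinary least-squares fit (synthetic data).
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(0, 1, n)              # predictor (think: temperature)
y = 2.0 * x + rng.normal(0, 0.5, n)  # response (think: ring-width index)
y[70:] -= np.linspace(0, 1.5, 30)    # inject a late "divergence"-like drift

b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)

# Orthogonality: residuals are uncorrelated with the regressor (OLS identity).
print(float(np.dot(resid, x - x.mean())))

# Diagnostic: the late residual mean sits well below the early one,
# flagging a calibration problem even though the overall fit "looks" fine.
print(resid[:70].mean(), resid[70:].mean())
```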

    1. Roman, Nick was using the phrase “would like” in the sense that scientists would, in general “like” to find a useful relationship between environment (temperature in this case) and some proxy indicator, not in the sense of trying to come to a pre-conceived result regarding what that relationship indicates about past and present temps.

      There is an important mistake being made, in these discussions generally, regarding the validity of choosing a certain set of tree ring data at the expense of other candidate sets, and by so doing appearing to bias the eventual paleo-estimate (aka “reconstruction”, “back-cast”, etc.). The issue ties in directly to the issue of “divergence” and to the paper by Loehle (2009) on that topic. Loehle’s argument there (and it is a very good and important one) is that non-linearities in the relationship between environmental driver and ring response (temperature and ring width for the sake of the present discussion) will create potentially serious problems when one attempts to use the relationship between them, derived over a portion of the instrumental record, to make paleo-estimates of the environmental variable of interest. This is correct, no question about it whatsoever. It is of fundamental importance and is one main reason why Loehle’s paper is so important.

      The issue then arises as to whether one can find, among the set of potential proxy data out there in the world (trees in this case), sites in which these possible non-linearities are not manifest, because the environmental driver has not yet reached the range of values where the nonlinear relationship becomes clearly apparent. By so doing, one at least has a reasonably stable relationship, over calibration and validation (not verification, that’s different!!) periods, i.e. over the full instrumental data period. As Loehle correctly pointed out, this by itself doesn’t guarantee an accurate reconstruction, because you still don’t know, for the pre-instrumental period, where the measured proxy values fall out exactly in this posited non-linear relationship (which even though you haven’t actually observed it during the instrumental period, you have very good theoretical (i.e. biological) reasons for believing must eventually manifest itself at some (uncertain) range of temperatures). What then; are we screwed? In the sense of an un-restricted use of our calibration relationship to estimate all times in the past, yes, but we can think about it a little more deeply than that.

      Let’s assume we have observed a strongly positive relationship between the two during the instrumental period, calibration and validation. In that case, we have at least the possibility of “stepping it back in time” to tell when the last time period was which had temps as high as observed during the instrumental period. By this I mean that it might be possible to go back, from the present, to the point at which the proxy values are as high as observed during the instr. period, and know with reasonable confidence that, at least back to that point, temperatures were no higher than the highest observed during the instr. period. Before that point you can’t really say much at all, because you could, from that point back, be on the “other side” of the non-linear (Loehle’s unimodal response, say) relationship, and therefore beyond the realm of justified use of your calibration relationship. Granted, there are some technical questions to work out, particularly with respect to how much to smooth the time series beforehand, as this will affect the result. But at least this gives us the **possibility** of estimating how far back in time we have to go to find a state similar to the present.
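      The “other side of the curve” ambiguity is easy to exhibit numerically (a made-up unimodal response and calibration range, chosen purely for illustration and not taken from Loehle’s paper):

```python
# Inversion ambiguity under a unimodal proxy response (hypothetical numbers).
import numpy as np

def ring_width(t):
    # Toy unimodal ("Loehle-type") response, peaking at t = 15
    return 1.0 - 0.02 * (t - 15.0) ** 2

# Calibration only sampled the rising limb, t in [5, 12]:
t_cal = np.linspace(5, 12, 50)
b, a = np.polyfit(t_cal, ring_width(t_cal), 1)   # linear fit on the rising limb

# A pre-instrumental width of 0.5 is consistent with two temperatures:
width = 0.5
t_low = 15.0 - np.sqrt((1.0 - width) / 0.02)
t_high = 15.0 + np.sqrt((1.0 - width) / 0.02)
print(t_low, t_high)                 # both temperatures produce width 0.5

# A naive linear inversion picks only one of them, and is badly wrong if the
# past actually sat on the far limb of the curve:
t_inverted = (width - a) / b
print(float(t_inverted))
```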

      Don’t get me wrong–I’m not saying that this reasoning is why these site choices are made. But I am saying that there is a potentially unexplored, and very useful, reason for doing so.

      1. Re: Jim Bouldin (May 21 12:50),

        Roman, Nick was using the phrase “would like” in the sense that scientists would, in general “like” to find a useful relationship between environment (temperature in this case) and some proxy indicator, not in the sense of trying to come to a pre-conceived result regarding what that relationship indicates about past and present temps.

        I am aware of what Nick meant, but you did not understand what I meant in my reply. He had indicated earlier that what the scientists were looking for was any relationship between proxies and temperature, but that the specific characteristics of that relationship did not seem to matter. My response was that they did matter because they were supposedly based on physical real-world interactions between the two. You assumed that I was referring to the creation of hockey sticks, which was NOT the case.

        The rest of your comment, as well intended as it might be, really had little to add to the discussion that we were having.

        However, I do think that there is room for a discussion on the principles and implementation of statistical analyses of proxy data. Much of what I am interested in deals with basic elements which seem to rarely be present in journal papers. An example of such a point to start with:

        — State the explicit model which describes the variables present in the analysis and more importantly the explicit mathematical relationships which link those variables. Doing this would then guide the necessary steps to be taken in the analysis and allow for proper interpretation of the results.

        What bothers me in some tree ring analyses is the estimation of the growth curve(s) from the data, subsequent division of the proxies by the curve(s) followed by simple additive combination of the altered curves to form a “chronology”. Exactly how do the sizes of the rings, the growth curve, the chronology AND the unmentioned random elements relate to each other mathematically? The answer to this question affects the proper calculation of uncertainty levels and of possible mathematical bias in the estimated chronology.

        Another topic which could be raised:

        Why do we not see further diagnostic evaluation of the proxies AFTER a chronology or a temperature reconstruction has been calculated? What role does each proxy play in the reconstruction? What proxies do not seem to have been useful? There are some simple avenues that might be explored in this direction.

        And I have further ideas on better reconstruction methodology… 😉

        1. “I am aware what Nick meant but you did not understand what I meant in my reply. …The rest of your comment, as well intended as it might be, really had little to add to the discussion that we were having.”
          The rest of my comment wasn’t in any way a response to what you had said, although I can see how it might have looked that way given that I included disparate comments together in one message. I tried to put an extra blank line in to indicate that.

          “However, I do think that there is room for a discussion on the principles and implementation of statistical analyses of proxy data”.
          Without question, and it is ongoing in the literature.

          “Much of what I am interested in deals with basic elements which seem to rarely be present in journal papers. An example of such a point to start with: State the explicit model which describes the variables present in the analysis and more importantly the explicit mathematical relationships which link those variables. Doing this would then guide the necessary steps to be taken in the analysis and allow for proper interpretation of the results.”

          “What bothers me in some tree ring analyses is the estimation of the growth curve(s) from the data, subsequent division of the proxies by the curve(s) followed by simple additive combination of the altered curves to form a “chronology”.”
          Not at all sure what you mean there by “simple additive combination of the altered curves”, including the first and last phrases thereof. Usually, a robust mean of the indices for each year is taken to get the annual values.

          “Exactly how do the sizes of the rings, the growth curve, the chronology AND the unmentioned random elements relate to each other mathematically? The answer to this question affects the proper calculation of uncertainty levels and of possible mathematical bias in the estimated chronology.”

          Not sure what you mean exactly by “relate to each other mathematically”. As near as I can tell from this statement, you are looking for something like this:
          http://www.treeringsociety.org/TRBTRR/TRBvol47_37-59.pdf It’s not like these people operate without a conceptual framework for what they are doing.

          “Another topic which could be raised: Why do we not see further diagnostic evaluation of the proxies AFTER a chronology or a temperature reconstruction has been calculated? What role does each proxy play in the reconstruction? What proxies do not seem to have been useful? There are some simple avenues that might be explored in this direction.”
          That would certainly be helpful. That’s the step of the reconstruction process I know the least about so can’t add much. I’d like to see more analyses evaluating spatial coherence between the chronologies themselves, before calibration.

          “And I have further ideas on better reconstruction methodology… 😉 “

          1. Re: Jim Bouldin (May 21 17:39),

            The content of the Cook paper is indeed exactly the type of statistical modelling that I am talking about. On the second page of the paper, the author sets out the basis for the discussion:

            Consider a tree-ring series as a linear aggregate of several unobserved subseries. Let this aggregate series be expressed as:

            Rt = At + Ct + dD1t + dD2t + Et

            where Rt is the observed ring-width series; At is the age-size related trend in ring width; Ct is the climatically related environmental signal; D1t is the disturbance pulse caused by a local endogenous disturbance; D2t is the disturbance pulse caused by a standwide exogenous disturbance; and Et is the largely unexplained year-to-year variability not related to the other signals.

            This is a simple linear model where each ring width measurement is expressed as the sum of the effects of a set of identified factors. But does this model adequately represent the reality from which the data come?

            For example, there is no term in the model which reflects the characteristics of the particular tree from which the tree ring came – all trees are identical except for the random year to year variation Et. The reality is that trees are certainly going to be affected systematically by the single physical location where they spend their entire existence. Some trees will have larger or smaller rings than other trees on a consistent basis. Not taking this into account will give larger trees a greater weight in the subsequent estimates of the factor effects.

            Assumptions about the characteristics of Et are not stated. It should be noticed that tree-ring widths are never negative. In a linear model, this constraint puts a lower limit on the possible value of the term Et which then depends on the values of the other factors. Treating the Et as independent of everything else will produce problems in any subsequent analysis. Is the distribution of Et the same for all rings? From looking at the later analysis in the paper, one can infer that this is not the case.

            Statistically, such a linear model is optimally analyzed in an additive fashion. Factor effects are separated from each other through operations of suitably weighted addition and subtraction which cancel the effects of the other factors, so that the estimators contain only the parameters relating to that factor and the Et random terms. However, we later read in the paper:

            The estimation and removal of At from a ring-width series has been a procedure of dendrochronology since its modern day development by A. E. Douglass (1914, 1919). This procedure is known as standardization (Douglass 1919; Fritts 1976). Standardization transforms the non-stationary ring-widths into a new series of stationary, relative tree-ring indices that have a defined mean of 1.0 and a constant variance. This is accomplished by dividing each measured ring-width by its expected value, as estimated by At.

            In effect, our equation at this point becomes:

            Rt/At = 1 + Ct/At + D1t/At + D2t/At + Et/At

            from which we must now tease out the remaining factor parameters, in particular Ct.

            Averaging, whether done using robust methods or not, will not do the job. The results will be biased because smaller At values will magnify the apparent environmental signal Ct. Variations in the ages of the samples available at a given time will show up as spurious variations in Ct. Since At is often itself an estimate, with uncertainty due to the fitting process, you now have ratios of random variables to deal with when trying to calculate error bars.
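To make the division-by-At point concrete, here is a small numerical sketch (all parameter values are invented for illustration, not anyone's actual standardization): rings are generated from the additive model Rt = At + Ct + Et, then standardized by dividing by the age curve, and the index variance inflates exactly where At is small.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(300)

# Invented age-size curve A_t: a declining negative exponential, as is
# commonly fit during standardization (values shrink as the tree ages)
A = 2.0 * np.exp(-years / 150.0) + 0.3
C = 0.1 * np.sin(2 * np.pi * years / 60.0)          # small common "climate" signal
E = rng.normal(0.0, 0.05, size=(1000, years.size))  # year-to-year noise, 1000 trees

R = A + C + E          # additive model: R_t = A_t + C_t + E_t
index = R / A          # standardized index: R_t / A_t

# Index variance in the early (large A_t) vs late (small A_t) halves
var_early = index[:, :150].var()
var_late = index[:, 150:].var()
print(var_early < var_late)   # True: small A_t magnifies the apparent signal
```

The same noise amplitude produces a much noisier index once At shrinks, which is the bias described above.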

            Enough for now. Hopefully this gives some idea of why I am of the opinion that the topic needs to be revisited, with all of the assumptions inherent in such reconstructions stated explicitly.

        2. correction, that’s not quite right; I was +/- responding to your statement:

          “The whole point is that there is an underlying claim that proxies contain information about the temperature. In the case of trees, this claim is supposedly based on physical principles due to which it also contains a built-in requirement that expects that the observed relationship will be specifically positive – i.e. increasing temperatures will produce an increase in tree ring size.”

          using it as something of a springboard to some related issues

  21. * I use the terminology “lie” in the same sense employed by Eric and the other RealClimate denizens about those they disagree with. For example, being wrong about something is a “lie” in the RealClimate lexicon.

    1. Things would go a LOT better in these discussions if everyone, on all “sides”, would just drop all such accusations and discuss only the science issues.

      1. Agreed. In my comment, I was acknowledging implicitly that Eric is probably unfamiliar with McIntyre’s editorial policy forbidding the discussion of thermodynamics on his blog, so what he said wasn’t really a “lie of commission”, but it was not an honest assertion of what Eric actually knew (as opposed to assumed without verification), and based on that, it was a misrepresentation of what happens on McIntyre’s blog.

        However, discussion of science ethics and responsible conduct of research should remain on the table, even if we agree to remove loaded words from conversation.

        1. Eric’s climateaudit “lie” was along the lines of being asked what the weather is like today in Beijing, and you answering “it’s raining” without ever checking. Is this a lie? (It could be true.) It certainly is a misrepresentation of what you “know” to be true.

          I’ve said for a while that RealClimate needs to establish standards of behavior for their editors, along the lines of real editorial boards. Oddly, McIntyre has much more clearly delineated rules of conduct on his blog than this much larger group of individuals, with presumably much larger resources available to them (and probably experience serving as editors on journals etc), have been able to establish.

          1. And it’s more than a bit ironic I must say, in the very thread in which you debate what exactly lying is, that you “presume” to know what sorts of “resources” are available to RealClimate.

          2. We have no way to gauge the resources available to the George Soros funded RealClimate blog? Really??? That’s your argument? LAWLZ!

            Any product like “real climate” has a branding associated with it. It can be run as a free for all, or more … rationally. I’ve simply suggested it would function better if they established rules of conduct for the editors. There is no “period” associated with that because it isn’t an either or proposition.

          3. No, you “suggested”, which is to say that you, by your own words, “presumed”, that RealClimate has a lot more “resources” available to it than does Steve McIntyre.

            Not even to mention that “rules of conduct” are by themselves meaningless. The question is rather what those rules of conduct are, and if you think that Steve McIntyre’s are better than RealClimate’s, then all I can say is you are entitled to your…..

            opinion!

          4. You’re right Jim. RealClimate is substantially improved by not having any “meaningless” standards of conduct and everybody on the staff acting like spoiled little boys who didn’t get their favorite present for their birthday.

            What was I even thinking. 🙄

          5. Okey Dokey Carrick, we can see what you’re interested in discussing here. Which is not surprising given your understanding of the science as stated upthread.

    2. Carrick,

      then according to that definition we don’t know whether Nick is lying or lying???

      HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA

  22. I think that Jim Bouldin’s comment here indicates a disconnect in the discussion and brings to bear what many of us here are attempting to show as a problem in selection bias for temperature proxies.

    “Let’s assume we have observed a strongly positive relationship between the two during the instrumental period, calibration and validation. In that case, we have at least the possibility of “stepping it back in time” to tell when the last time period was which had temps as high as observed during the instrumental period. By this I mean that it might be possible to go back, from the present, to the point at which the proxy values are as high as observed during the instr. period, and know with reasonable confidence that, at least back to that point, temperatures were no higher than the highest observed during the instr. period. Before that point you can’t really say much at all, because you could, from that point back, be on the “other side” of the non-linear (Loehle’s unimodal response, say) relationship, and therefore beyond the realm of justified use of your calibration relationship. Granted that there are some technical questions to work out, particularly with respect to how much to smooth the time series beforehand, as this will affect the result. But at least this gives us the **possibility** of estimating how far back in time we have to go to find a state similar to the present.”

    It is perhaps time to show some ARFIMA and ARIMA models where, with no deterministic trend included in the series, we can see upward trends at the end of these series. In other words, how many selections would be needed to satisfy a requirement that the proxy was responding to temperature? If, on sampling a proxy (with samples retained under reasonable criteria, none excluded without cause, and with no peeking at the temperature relationship), the upward trend appeared no more often than it would in the ARFIMA or ARIMA models, one would have to conclude that the proxy was not a valid thermometer.

    I would think that the analyses and investigation would have to be confined to the part I describe above until one could at least show that the proxy was doing significantly better than a model with no deterministic trend. I think this first step is what is not understood by some scientists working in this area and rather they simply do not appreciate the fact that with sufficient numbers of selections one can find these random looking trends and correlations.

    If it were appreciated we would see studies of a sampling of potential proxies based on a reasonable set of criteria, determining whether that sampling showed trends significantly in excess of what could arise by chance. There are questions beyond such a study that would be required to be answered in determining a valid proxy thermometer (some of which Jim Bouldin referred back to Craig Loehle), but the investigation I suggest would, I think, be a good and required starting point.
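The null-model check proposed above can be sketched quickly. This is an illustrative simulation with my own parameter choices (AR(1) with rho = 0.9, a hypothetical 100-year "instrumental" warming ramp, one-sided screening at p = 0.10), not a reproduction of any published screening procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_series, n_years, rho = 2000, 400, 0.9

# Trendless AR(1) pseudo-proxies: x[t] = rho * x[t-1] + e[t]
e = rng.normal(size=(n_series, n_years))
x = np.zeros_like(e)
for t in range(1, n_years):
    x[:, t] = rho * x[:, t - 1] + e[:, t]

# Hypothetical "instrumental period": the last 100 years, a warming ramp
ramp = np.linspace(0.0, 1.0, 100)

def pass_rate(series, target, p=0.10):
    """Fraction passing a one-sided (positive) correlation screen at level p."""
    n = len(target)
    r = np.array([np.corrcoef(s, target)[0, 1] for s in series])
    t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return (stats.t.sf(t_stat, df=n - 2) < p).mean()

ar1_rate = pass_rate(x[:, -100:], ramp)
iid_rate = pass_rate(rng.normal(size=(n_series, 100)), ramp)
print(ar1_rate, iid_rate)  # the AR(1) rate lands far above the nominal 10%
```

White noise passes the screen at roughly the nominal 10% rate, while trendless but strongly autocorrelated series pass far more often, which is the point of the proposed comparison.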

    1. I wouldn’t call it a disconnect, I’d call them largely separate issues. But more broadly, I don’t agree with the thrust of your argument.

      “It is perhaps time to show some ARFIMA and ARIMA models where, with no deterministic trend included in the series, we can see upward trends at the end of these series. In other words, how many selections would be needed to satisfy a requirement that the proxy was responding to temperature? If, on sampling a proxy (with samples retained under reasonable criteria, none excluded without cause, and with no peeking at the temperature relationship), the upward trend appeared no more often than it would in the ARFIMA or ARIMA models, one would have to conclude that the proxy was not a valid thermometer.”

      You don’t need an ARIMA model to generate the kind of phenomenon you describe there, you just need strongly 1st order autocorrelated data and enough randomly generated replications thereof. Then yes, you will find some that tail up, or down, at either end.

      But calibrations between proxy and instrumental are not, as usually performed, based on time scales other than annual. These annual variations can, and often do–depending on their magnitude relative to the trend–dominate the calibration relationship and diagnostic statistics from it. The “upward trends” you describe refer implicitly to longer time scales (multi-year), which will have varying degrees of correspondence with the inter-annual relationship, sometimes strong, sometimes not so strong, sometimes strengthening the annual-scale calibration and sometimes weakening it. The fact that these interannual variations often track each other at a p value that is much lower than would be expected under a null model of no relationship (p = .05), indicates that there is a relationship between the proxy and instrumental data. This relationship is observed even at (some) sites that show strong divergence over multi-decadal scales, indicating that there is still some cause and effect connection between the variables, and indicative of the possibility that the divergence phenomenon is, in such cases, at least partly, perhaps even largely, an artifact of the analytical procedures used.

      The best scenario is when the annual-scale relationships are strong across calibration and validation periods (and similar, as judged by various model calibration “efficiency” statistics such as CE, RE, KGE etc), *and* at longer time scales (decadal to multi-decadal). When you have that situation you have definite evidence of a strong and stable calibration relationship that can be used to estimate pre-instrumental values.

      But nevertheless, all the caveats and concerns that I mentioned in my previous post apply. Yes, you do have to run a sort of gauntlet here.
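For readers unfamiliar with the RE and CE statistics mentioned above, here is a minimal sketch of the standard definitions (verification-period skill benchmarked against the calibration-period mean for RE, and against the harder verification-period mean for CE); the toy numbers are mine:

```python
import numpy as np

def re_ce(obs_ver, pred_ver, calib_mean):
    """Reduction of Error (RE) and Coefficient of Efficiency (CE).

    Both compare the squared reconstruction error over the verification
    period against a no-skill benchmark: the calibration-period mean for
    RE, and the verification-period mean for CE (which is why CE is the
    harder test). These are the standard dendroclimatic definitions.
    """
    obs_ver = np.asarray(obs_ver, float)
    pred_ver = np.asarray(pred_ver, float)
    sse = np.sum((obs_ver - pred_ver) ** 2)
    re = 1.0 - sse / np.sum((obs_ver - calib_mean) ** 2)
    ce = 1.0 - sse / np.sum((obs_ver - obs_ver.mean()) ** 2)
    return re, ce

obs = np.array([0.1, 0.3, -0.2, 0.4])
print(re_ce(obs, obs, calib_mean=0.0))              # perfect prediction: (1.0, 1.0)
print(re_ce(obs, np.full(4, 0.0), calib_mean=0.0))  # climatology: RE = 0, CE < 0
```

A reconstruction that merely reproduces the calibration mean scores RE = 0, and anything worse than the verification mean drives CE negative, which is what makes a positive CE the more demanding hurdle.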

      1. “You don’t need an ARIMA model to generate the kind of phenomenon you describe there, you just need strongly 1st order autocorrelated data and enough randomly generated replications thereof. Then yes, you will find some that tail up, or down, at either end.”

        You are more likely to see longer segments of a long series trending (faux trends, if you will) depending on the model parameters you choose. An ARFIMA model with a d value over 0.3 gives many longer trends. If we got it down to deciding whether a proxy and temperature relationship trended significantly more often than a model, we would have to determine which model was more appropriate.

        “The best scenario is when the annual-scale relationships are strong across calibration and validation periods (and similar, as judged by various model calibration “efficiency” statistics such as CE, RE, KGE etc), *and* at longer time scales (decadal to multi-decadal). When you have that situation you have definite evidence of a strong and stable calibration relationship that can be used to estimate pre-instrumental values.”

        As I recall, Mann avoids the use of r in validation testing because, I would think, he is concerned with obtaining a proxy response that looks “good” at the higher frequencies but not at the lower frequencies. This view, I would think, is counter to what you are saying here. I know for example that a difference series between near-neighbor temperature measuring stations and a reference station can have very good correlations and yet yield very different long term trends. I would think a valid proxy response has to track temperature over a long period of time, and following annual wiggles does not necessarily meet that requirement. I have observed scientists noting that a series of tree ring proxies respond with a blip, at the same time position in the individual series, to a known historical event, like a volcanic eruption. The problem is that the responses vary in intensity from proxy to proxy.

        “This relationship is observed even at (some) sites that show strong divergence over mult-decadal scales, indicating that there is still some cause and effect connection between the variables, and indicative of the possibility that the divergence phenomenon is, in such cases, at least partly, perhaps even largely, artifacts of the analytical procedures used.”

        I know you are testing various algorithms for adjusting tree ring growth to match climate, but I have not seen any published evidence of what you are conjecturing here about divergence. If divergence is not explained by analytical artifacts, then I think divergence would show a weakness in your arguments about the importance of annual response. It has been my observation that divergence is not confined to dendro proxies but can be seen in non-dendro proxies as well. Mann (08) makes, in passing, this same observation. I would suppose if you could explain divergence in some proxies, we would continue to be left without explanation for the extraordinary responses of proxies such as those in Yamal and those proxies that show neutral responses.

          1. “As I recall, Mann avoids the use of r in validation testing because, I would think, he is concerned with obtaining a proxy response that looks “good” at the higher frequencies but not at the lower frequencies. This view, I would think, is counter to what you are saying here.”

          Most TR studies, almost all I would guess, calibrate on the annual scale; it’s standard practice. So the fact that Mann does so also is inconsequential. It’s not counter to what I’m saying. My explication on that point was related more to the general issue of whether trees track the environment at all, given the divergence effect. Trees can diverge at longer time scales and still show high correlation at annual time scales, that was my point.

          As for the use of r or R squared, versus other calibration stability statistics (e.g. CE etc) I think it’s widely agreed that the latter are the way to go, and Mann et al used RE and CE in their 2008 analyses IIRC, testing the significance with a Monte Carlo routine, going from memory.

          “I know for example that a difference series between near neighbor temperature measuring stations and a reference station can have very good correlations and yet yield very different long term trends. I would think a valid proxy response has to track temperature over a long period of time and following annual wiggles does not necessarily meet that requirement.”

          As I said, it’s best if you can get high correlations at all temporal scales, no question. This is an important issue and needs to be studied carefully. It also potentially interfaces with the detrending step in a complex way that I can’t easily explain here and won’t try to.

          “I know you are testing various algorithms for adjusting tree ring growth to match climate…”
          No, I’ve developed a new method for detrending the size/age effect in a way that doesn’t remove part of the low-frequency climate variation, like existing methods do, i.e. a better way to extract the true long term climatic trend. There’s no “adjusting tree rings” involved. It’s an annualized detrending of the size effect if you will.

          “but I have not seen any published evidence of what you are conjecturing here about divergence. If divergence is not explained by analytical artifacts, then I think divergence would show a weakness in your arguments about the importance of annual response. It has been my observation that divergence is not confined to dendro proxies but can be seen in non-dendro proxies as well. Mann (08) makes, in passing, this same observation. I would suppose if you could explain divergence in some proxies, we would continue to be left without explanation for the extraordinary responses of proxies such as those in Yamal and those proxies that show neutral responses.”

          Divergence is not always explained by analytical artifacts; my point is that these can sometimes cause them in certain situations. If an actual breakdown in the temp response is instead the cause, then I agree with you, and then you will see a weakening correlation between driver and response at all time scales, including the annual, and there is definitely strong evidence for this at many of the northern sites N of 55 degrees. As for other proxies, I cannot say anything about that.

          I encourage you to read:
          Esper and Frank (2009) Divergence pitfalls in tree-ring research. Climatic Change 94:261–266
          D’Arrigo et al. (2008) On the ‘Divergence Problem’ in Northern Forests… Global and Planetary Change 60:289–305
          Briffa et al (1998) Reduced sensitivity of recent tree growth to temperature at high northern latitudes. Nature 391:678–682

          especially the first one.

          1. Kenneth,
            Actually, the best data I’ve come across is Table 1 of the last reference. There, they show very large differences in the relationships between temperature and maximum latewood density when computed at decade to multi-decade scales, but essentially no, or very minor, differences when computed at the annual scale, viz:

            Table 1 Correlations between tree-ring and temperature series (data rearranged by me)

            Maximum density (MXD)*:

            A B C D
            NWNA 71 66 58 46
            SWNA 72 75 72 63
            ENA 72 73 82 70
            NEUR 89 87 92 81
            SEUR 59 62 82 79
            WSIB 82 83 75 58
            CSIB 56 51 83 67
            ESIB 52 56 78 52

            Ring width (RW)**:

            A B C D
            NWNA 35 33 12 15
            SWNA 27 22 -4 4
            ENA 33 31 26 15
            NEUR 73 64 56 45
            SEUR 44 39 -6 7
            WSIB 41 42 45 28
            CSIB 42 34 54 44
            ESIB 27 37 21 37

            columns:
            A = annual scale r, 1881-1960‡
            B = annual scale r, 1881-1981
            C = multi-decadal r, 1881-1960§
            D = multi-decadal r, 1881-1981

            Definitions of regions and number of sites: southwestern (SWNA, 53 sites), northwestern (NWNA, 30) and eastern (ENA,34) North America; northern (NEUR, 46) and southern (SEUR, 72) Europe; western (WSIB, 42), central (CSIB, 31) and eastern (ESIB, 6) Siberia

            All correlations in this table have been multiplied by 100.
            * Correlated against April–September mean instrumental averages.
            ** Correlated against June–August instrumental data.
            ‡ Interannual correlations involve timeseries of residuals from decadally smoothed curves.
            § Decadal correlations are calculated using decadally smoothed series.

            Note that the large drops in r for the multi-decadal scale are recorded over time spans with a very large degree of overlap in time (80 years), meaning that the large changes shown here at the multi-decadal scale are due to the (apparently very large) changes that occurred from 1961 to 1981.
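The footnoted method (interannual correlations computed from residuals of decadally smoothed curves, decadal correlations from the smoothed series themselves) can be illustrated with two toy series that share an annual signal but diverge in trend. All numbers below are invented for illustration, not taken from the table:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
annual = rng.normal(size=n)        # shared interannual signal
trend = 0.02 * np.arange(n)        # diverging low-frequency component

temp = annual + trend              # "instrumental temperature"
proxy = annual - trend             # tracks annually but diverges at decadal scale

def decadal(x, w=10):
    # simple 10-yr moving average, a stand-in for the decadal smoothing
    return np.convolve(x, np.ones(w) / w, mode="same")

core = slice(10, -10)  # drop the smoothing edge effects

# Interannual r: residuals from the decadally smoothed curves
ra = np.corrcoef((temp - decadal(temp))[core],
                 (proxy - decadal(proxy))[core])[0, 1]
# Decadal r: the smoothed series themselves
rd = np.corrcoef(decadal(temp)[core], decadal(proxy)[core])[0, 1]
print(round(ra, 2), round(rd, 2))  # high annual-scale r, much weaker decadal r
```

This reproduces, in miniature, the pattern in the table: near-perfect interannual tracking coexisting with a badly degraded multi-decadal correlation.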

          2. Jim Bouldin, from your replies I think we are close to being on the same page about what is required of a valid proxy. I have not seen a published paper or article that summarizes what climate scientists judge are the requirements and the bases for them. I see too much, in my view, of never considering the alternative that proxies, as a group or selected individual ones, are not valid thermometers for use in long term temperature reconstructions.

            A couple of quibbles with your reply above:

            1. The use of interannual measures in proxy calibration and verification:
            The link below shows the calculation of RE and CE

            Click to access Mann_Rutherford_GRL02.pdf

            The excerpt from Ammann and Wahl 2007 below would indicate that Mann et al.’s use of the RE metric does indeed address problematic issues tied to interannual-focused measures such as r.

            Ammann and Wahl 2007 asserts:

            “MBH and WA argue for use of the of Reduction of Error (RE) metric as the most appropriate validation measure of the reconstructed Northern Hemisphere temperature within the MBH framework, because of its balance of evaluating both interannual and long-term mean reconstruction performance and its ability thereby to avoid false negative (Type II) errors based on interannual-focused measures”

            2) Adjusting tree rings:

            It is my view that indeed algorithms are used to adjust tree ring widths for tree age/size (and perhaps other considerations) and nobody uses raw tree ring data, i.e. it is all adjusted for use in proxies, and your efforts would in the end do the same.

            3) I have read the articles from your recommended reading list and while they did impart some knowledge to me, I did not see the kind of exposition that I judge needs to be made here – as I noted above.

        2. Kenneth Fritsch:

          I would suppose if you could explain divergence in some proxies, we would continue to be left without explanation for the extraordinary responses of proxies such as those in Yamal and those proxies that show neutral responses

          Regarding the Yamal data, Craig Loehle had some interesting comments:

          [S]trip bark trees are obviously growing in an unusual way due to damage and should be dropped. The Yamal trees look like a case of conversion from shrub to upright growth form, and should likewise be dropped (esp since they are multi-sigma outliers). Instead they do the converse and use them repeatedly.
          The fact that they will not publicly state how/why they dropped sites/trees is quite damning. It is either an untested assumption (per my comment above) or simply data snooping.

          Craig may or may not be right with his comments, but IMO this is the sort of selection process that should be engaged in before one starts computing correlations between the tree-ring proxy and the instrumental record.

          Regarding Mann 08, most of the tree-ring samples he uses don’t actually show late-20th century divergence, even though his reconstruction clearly shows divergence.

          Given that Mann relies on Yamal series that exhibit abnormal growth and flawed non-tree-ring proxies (Tiljander) which he double counts in addition to inverting the prior known causal relationship between temperature and proxy index, it doesn’t take a rocket scientist to say a significant problem here is methodological, and not something that is fundamental to the proxy data themselves.

      2. Jim,

        “The fact that these interannual variations often track each other at a p value that is much lower than would be expected under a null model of no relationship (p = .05),”

        I definitely challenge this assertion. Even considering the highly pre-selected Mannian dataset, the percentage of accepted data is very low. He found 484 proxies of 1209 which correlate above 0.1. Of these, 71 were Luterbacher series which had instrumental data pasted on, and a hundred-odd (I’ve forgotten the exact number) Schweingruber series of hide-the-decline fame with the last 60 years chopped and replaced. By the time I was done with upside down data etc., there were fewer than 10 percent which met p=0.1, and those had been pre-sorted from a much larger set of data – including the newly revealed Yamal results.

        I once made an online estimate in a post at CA which resulted in an S/N ratio of 0.03 or something like that. Chucking data in that scenario, especially rho=0.9 autocorrelated data, is a big no-no.

        1. “The fact that these interannual variations often track each other at a p value that is much lower than would be expected under a null model of no relationship (p = .05),”

          With sufficient data one could easily obtain an r=0.10 that was significant at a probability of <0.01. All this means is that 1% of the proxy variability is explained by temperature. That does not give me a lot of confidence that the proxy will respond consistently well to temperature over time or space.

          "(I’ve forgotten the hundred something) Schwiengruber of hide the decline fame with the last 60 years chopped and replaced."

          104 MXD Schweingruber proxies.
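The first point above is easy to verify numerically. With, say, n = 1000 pairs (a hypothetical sample size chosen only for illustration), r = 0.10 clears p < 0.01 while still explaining only 1% of the variance:

```python
import numpy as np
from scipy import stats

def p_of_r(r, n):
    """Two-sided p-value for a sample correlation r from n independent pairs."""
    t = abs(r) * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return 2 * stats.t.sf(t, df=n - 2)

r, n = 0.10, 1000
print(p_of_r(r, n))  # about 0.0015: "significant" at p < 0.01
print(r ** 2)        # 0.01: yet only 1% of variance explained
```

The same r = 0.10 with n = 30 would not even clear p = 0.05, which is why "significant" and "useful" are separate questions here.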

        2. Well I really can’t follow your argument there and, as usual, it’s always a harking back to what Michael Mann did, is doing, will do, instead of a broader discussion of the science as a whole; I’m not going there.

          1. I am saying that the claimed correlations in the (‘standard everyone uses them’) proxy sets are NOT as high as you claim. Mann’s data is the same as everyone else’s.

          2. Everyone uses the same set of proxy data in their analyses eh? That statement alone reveals how well you know this topic. Probably not a whole lot I can say, other than my original statement on the matter was:

            “The fact that these interannual variations often track each other at a p value that is much lower than would be expected under a null model of no relationship (p = .05) [later corrected to p = 0.50 ], indicates that there is a relationship between the proxy and instrumental data.”

            There is an enormous amount of evidence supporting this statement, and not just from Mann’s work; see the data I posted above from Briffa et al. (1998) and the paper by Wettstein et al cited elsewhere here, among others. And your interpretation of the significance of the correlations used by Mann is incorrect also; there is no way you would get that many sites correlating at p = 0.1 under the null model of no significant relationship between temperature and ring response. Not even close.

          3. From Mann(08) we have this:

            “Reconstructions were performed based on both the “full” proxy data network and on a “screened” network (Table S1) consisting of only those proxies that pass a screening process for a local surface-temperature signal. The screening process requires a statistically significant (P < 0.10) correlation with local instrumental surface-temperature data during the calibration interval. Where the sign of the correlation could a priori be specified (positive for tree-ring data, ice-core oxygen isotopes, lake sediments, and historical documents, and negative for coral oxygen-isotope records), a one-sided significance criterion was used. Otherwise, a two-sided significance criterion was used. Further details of the screening procedure are provided in SI Text.

            The distribution of the “screened” proxy network for the full interval 1850–1995 is shown in Fig. S1B. The rms local annual temperature correlation of the full screened network is r = 0.39 (r = 0.33) for predictors available back to A.D. 1800 (A.D. 1000). This corresponds to signal-to-noise amplitude ratios (SNRs) of SNR = 0.43 and 0.35, respectively (see ref. 32). Of the 1,209 proxy records in the full dataset, 484 (40%) pass the temperature-screening process over the full (1850–1995) calibration interval (Fig. 1; see also SI Text and Table S1).”

            What this says is that we have an approximate 1 in 7 chance that the correlations of proxy to temperature could happen by chance under the test criterion. Mann (08) uses P=0.10, but with the AR1 considered it becomes P=0.13 after accounting for the reduced degrees of freedom. It appears that the average correlation is r=0.39 for those proxies back to 1800 and r=0.33 for those back to 1000.

            When I made the correction for proxies that did not belong in the 40%, it reduced the number passing the screening test to around 22%, as I recall. I then did a sampling of an ARFIMA model with what I judged were parameters typical of a long term proxy response, and that produced correlations passing the P=0.10 screen about 20% of the time.

            By the way, the correlations for all the Mann (08) proxies are in a reference from the paper or SI to another location – I think it was the NOAA reconstruction depository – and I can dig those out for viewing here.
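The degrees-of-freedom correction mentioned above can be sketched with the usual AR(1) effective-sample-size formula from Bretherton et al. (1999). The lag-1 autocorrelations below are my own placeholder values, not numbers from Mann (08); only the 146-year 1850-1995 overlap comes from the excerpt:

```python
import numpy as np
from scipy import stats

def n_effective(n, rho1, rho2):
    """Effective sample size when correlating two AR(1) series with lag-1
    autocorrelations rho1 and rho2 (Bretherton et al. 1999)."""
    return n * (1 - rho1 * rho2) / (1 + rho1 * rho2)

def critical_r(p_one_sided, df):
    """Sample correlation needed to pass a one-sided screen at level p."""
    t = stats.t.isf(p_one_sided, df)
    return t / np.sqrt(df + t ** 2)

# 146 yr of overlap (1850-1995); rho values of 0.5 (proxy) and 0.3
# (local temperature) are assumptions for illustration only
n = 146
n_eff = n_effective(n, 0.5, 0.3)
print(critical_r(0.10, n - 2), critical_r(0.10, n_eff - 2))
```

Shrinking the degrees of freedom raises the correlation needed to pass at a given nominal level, which is the sense in which a nominal P=0.10 screen behaves like a looser one on autocorrelated series.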

          4. Jim: “Everyone uses the same set of proxy data in their analyses eh?”

            If you want to “make progress in science” as you claim, I think it would work better if you honestly portrayed the argument the other person was actually making.

            Pretty sure that Jeff’s point wasn’t that everybody uses the “same proxy data in their analysis.”

          5. Here are the average correlation (r) values for grouped tree ring proxies that passed the P=0.10 screening test in Mann(08):

            Pass screening over 1850–1995: r = 0.1965
            Pass screening over 1896–1995: r = 0.2311
            Pass screening over 1850–1949: r = 0.2283

            I’ll get back to give a link to these data later.

          6. Jim, what you are reporting is pre-sorted data. No, I wasn’t saying ‘everyone uses the exact same data all of the time’. What I am saying is that these datasets are commonly used in multiple papers. In fact much of the Mannian data was referenced to other reconstruction papers.

            There is a great deal of commonality in data. I haven’t got time to review the Briffa result at the moment, but I wouldn’t be surprised to find the same data in M08,09, VS, other Briffa work, etc…

            My point was that the data is a significant and real set and its correlation with temp has been HEAVILY distorted by pre-manipulation.

          7. Jim,

            I have put considerable time into variance loss and signal/noise ratio in paleo data. My results differ from the publications in many respects. Here is a post I did at CA on the matter:

            http://climateaudit.org/2010/08/19/signal-to-noise-ratio-estimates-of-mann08-temperature-proxy-data/

            I also have put considerable time into analyzing autocorrelation and variance loss using simple AR1 models here. As you may be aware, the handle of the stick is forced flat in relation to the blade, simply by the nature of noisy multivariate calibration methods on autocorrelated data.
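A quick way to see the selection and variance-loss effect described here: screen trendless AR(1) noise against a modern warming ramp and composite the survivors. The parameters are arbitrary and this is a sketch of the selection artifact, not of any particular reconstruction's method:

```python
import numpy as np

rng = np.random.default_rng(4)
n_prox, n_years, rho = 1000, 1000, 0.9

# Trendless AR(1) "proxies" containing no climate signal at all
e = rng.normal(size=(n_prox, n_years))
x = np.zeros_like(e)
for t in range(1, n_years):
    x[:, t] = rho * x[:, t - 1] + e[:, t]

# Screen on positive correlation with a warming ramp over the last 100 yr;
# r > 0.13 is roughly the one-sided p = 0.10 cutoff for n = 100
ramp = np.linspace(0.0, 1.0, 100)
r = np.array([np.corrcoef(s, ramp)[0, 1] for s in x[:, -100:]])
comp = x[r > 0.13].mean(axis=0)

# The screened composite acquires a blade that was never in the data,
# while averaging flattens (loses variance in) the handle
blade_rise = comp[-50:].mean() - comp[-100:-50].mean()
handle_sd = comp[:-100].std()
series_sd = x[:, :-100].std()
print(round(blade_rise, 2), round(handle_sd, 2), round(series_sd, 2))
```

The composite rises through the screening period by construction, while the pre-screening handle collapses toward a flat line relative to the variance of the individual series, which is the flat-handle/bright-blade shape Jeff is pointing at.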

          8. Jeff:

            I also have put considerable time into analyzing autocorrelation and variance loss using simple AR1 models here. As you may be aware, the handle of the stick is forced flat in relation to the blade, simply by the nature of noisy multivariate calibration methods on autocorrelated data.

            I think this is less of a problem with the newer regularization schemes being used (e.g., errors in variables) than with the older correlational methods. So it’d be interesting to perform the types of studies you did using the newer methodologies.

            Fundamentally the same methodological problem exists though… the weighting scheme is based on how well the proxy correlates with temperature over an interval that seems to have been selected to give the desired result, rather than on any fundamental scientific reasons.

            Why is 1880-1960, for example, “special”? Likely 1880-1950 is corrupted by measurement issues; temperature is much better understood later, e.g. 1950-2000 is much more tightly nailed down. And as I pointed out on this thread, the majority of tree ring proxies in Mann’s data sets respond “correctly” to temperature over that period.

            Conceptually, you should be able to determine how good a temperature proxy a given record is (establish its weighting) without resorting to correlational methods. If you can’t, from my perspective, you aren’t even practicing science; not even bad science, but voodoo at that point. First you pick the right methodology, then you apply it correctly, then you compute the results, then you interpret them. There should be no feedback of the form “interpretation = inconvenient answer, so adjust methodology.”

            Were Nick actually right and you were stuck with using correlation=causation (seems like Jim implicitly endorses this too), I think there’d be no choice but to pull the plug on the funding…going down that road leads to meaningless nonsense.

            To give a flavor of what I’m thinking about: why is it (on physical grounds) that the tree-ring proxies seem to do a better job with multi-year temperature data than with annual? From a biological (evolutionarily adaptive) standpoint, I don’t see how this makes the slightest bit of sense. And I don’t think it’s consistent with real-world data where they carefully monitor growth patterns and perform MANOVA-type regressions. My brother-in-law (who has an advanced forestry degree with a specialization in tree plantation growth) would tell you much of what I said above: trees respond to changes in precipitation, solar exposure and fertilization much more strongly than they do to annual temperature.

            Jim and the paleoclimate people can fling as many ad hominems as they’d like at us, but these are the basic facts that he and Mann are both blithely ignoring. You don’t get to ignore the elephant in the living room and claim sanity at the same time. That argument just doesn’t hold up to scrutiny.

          9. When I said the following in a previous post:

            “I then did a sampling of an ARFIMA model with what I judged had parameters typical of a long term proxy response and that result produced correlations with a P=>0.10 about 20 % of the time.”

            That should have been P=<0.10 about 20% of the time. In other words, you obtain a significant correlation per the Mann (08) selection criteria about as often with a trendless ARFIMA model as was the case for the proxies in Mann (08) after removing those proxies that should not have been included.
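            The screening pass rate for trendless autocorrelated noise can be sketched as follows. As a hedge: this uses an AR(1) series rather than an ARFIMA model (simpler, shorter memory) and a normal-approximation critical value rather than an exact t-test, so the exact rate will differ from the ~20% figure above. The direction of the effect is the point: persistent noise passes a naive p = 0.10 screen far more often than 10% of the time.

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, phi, rng):
    """Zero-mean AR(1) noise: x[t] = phi*x[t-1] + e[t]."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

n = 146                            # 1850-1995, as in the screening window
trend = np.linspace(0.0, 1.0, n)   # stand-in instrumental "signal"
# two-sided p = 0.10 critical |r| under the usual iid null (normal approx)
r_crit = 1.645 / np.sqrt(n)

trials = 2000
passes = 0
for _ in range(trials):
    noise = ar1(n, 0.9, rng)       # strongly autocorrelated, trendless noise
    r = np.corrcoef(noise, trend)[0, 1]
    if abs(r) > r_crit:
        passes += 1

# pass rate well above the nominal 10%, despite zero real signal
print("naive p=0.10 pass rate:", passes / trials)
```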

          10. To make clearer what the Mann (08) correlation data in my previous post represent, I should point out that there were 927 TRW proxies. The numbers passing the P=0.10 criterion for correlation to temperature are given, for the three periods, in parentheses after the average correlation values. We can see the pass rate is not good (25%), and it would also indicate that, had we had correlation data for those not passing the criterion, the average correlation would have been smaller. The annual data preferred by Jim Bouldin would indicate that less than 5% (r^2) of the year-to-year proxy response is explained by temperature in the Mann (08) TRW proxies. That number would further indicate a poor translation of response to temperature over time and space.

            Pass-screening over 1850-1995 (r) = 0.1965 (258)
            Pass-screening over 1896-1995 (r) = 0.2311 (239)
            Pass-screening over 1850-1949 (r) = 0.2283 (218)

          11. Carrick,

            “I think this is less of a problem with the newer regularization schemes being used (e.g., errors in variables) than with the older correlational methods. So it’d be interesting to perform the types of studies you did using the newer methodologies.”

            I have some experience with these as well and am not convinced that the problem is substantially reduced. Of course the proof is in the pudding and, as you can tell from my lack of posting, commenting, etc., work has gobbled up a huge chunk of my time. The cool bit is that work is getting to be a lot of fun.

          12. “Were Nick actually right and you were stuck with using correlation=causation “
            Carrick, I don’t know where you get the causation from – I haven’t mentioned it. If two things correlate, you can use one to estimate the other. No issue of causation.

          13. Nick:

            Carrick, I don’t know where you get the causation from – I haven’t mentioned it. If two things correlate, you can use one to estimate the other. No issue of causation.

            Using correlation to select proxies is making the assumption of causation. You do understand that, right?

            You have to a) assume that temperature is an explanatory variable for tree-ring growth (that’s where the assumption of causation enters) and b) assume that other explanatory variables such as precipitation can be neglected (again the assumption of causation enters here).

            Using correlation-like measures to calibrate is fine IMO, using correlation to sort when you don’t have any guarantees that your selected explanatory variable (temperature) is the only causative agent is where the flaw in the methodology appears.

          14. Jeff:

            I have some experience with these as well and am not convinced that the problem is substantially reduced. Of course the proof is in the pudding and, as you can tell from my lack of posting, commenting, etc., work has gobbled up a huge chunk of my time. The cool bit is that work is getting to be a lot of fun.

            I’ve got some experience with these methods as well, but not as they are applied in this area. It would be fun to work on, but like you, my own work is consuming all of my time and anyway is more fun and generally a more productive line of research than chasing tree-temperature shadows.

          15. Nick:

            If two things correlate, you can use one to estimate the other. No issue of causation.

            Let me address this too: If they correlate but you don’t use the assumption of causation you can use one to estimate the other “in sample” (that is you can use them in the range where the two metrics overlap). See “in sample estimation”. That you can do this is a mathematical fact. So what you said was technically true, but completely irrelevant to the problem we’re addressing here. I can’t think of any applications where “in sample estimation” of temperature from tree-ring growth is useful, but if they were using correlational methods for “in sample estimation” of temperature (where other methods were not available), nobody would have a problem with it.

            [In practice, as Jim points out, the instrumental temperature record is actually much more finely grained than the record of temperature that you can reconstruct from tree-ring proxies, so there is no practical value to “in sample estimation” in this case.]

            What you absolutely can’t do is use the fact that two variables (e.g., tree-rings and instrumental record) correlate over an overlapping temporal range to estimate the one variable from the other outside of the temporal range where the two variables correlate without applying the assumption of causation.

          16. OK Carrick, what causation do you think they are assuming? They infer temperature from tree-rings – so tree rings cause temperature?

            It doesn’t matter what causes what – it could be something else driving both. The mathematical assumption is only that the observed correlation persists outside the overlap region, for whatever reason. Of course, to find proxies with statistically significant correlation, you’ll have causative mechanisms in mind. And you’ll try to avoid the possibility of confounding factors that might change.

          17. Nick:

            what causation do you think they are assuming

            Um…

            How about this?

            Tree growth is affected by temperature. Increase the temperature, the tree grows more rapidly (this is a “cause”), decrease the temperature, the tree grows more slowly (this is a “cause”.)

            I’ll assume a brain-fart was responsible for that question.

            It doesn’t matter what causes what – it could be something else driving both

            It doesn’t matter as long as you are performing “in sample estimation.”

          18. Nick:

            The mathematical assumption is only that the observed correlation persists outside the overlap region, for whatever reason.

            And the only way that happens is if a causative agent is present.

            Otherwise you’re performing the equivalent of polynomial extrapolation (or extrapolation using a model with zero external validity). You can do it mathematically, but unless you assert causality, the results are mathematically meaningless.

            Get it?
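            The polynomial-extrapolation analogy can be made concrete with a toy fit (the data here are invented): a high-degree polynomial matches noisy samples of a sine very well inside the fitted interval, yet its value one period outside that interval is meaningless.

```python
import numpy as np

rng = np.random.default_rng(2)

# noisy samples of a smooth function on the "overlap" interval [0, 1]
x_in = np.linspace(0.0, 1.0, 30)
y_in = np.sin(2 * np.pi * x_in) + 0.1 * rng.normal(size=30)

# a degree-9 polynomial fits very well in sample...
coef = np.polyfit(x_in, y_in, 9)
resid = y_in - np.polyval(coef, x_in)
print("in-sample RMS error:", float(np.sqrt(np.mean(resid ** 2))))  # small

# ...but extrapolating outside the fitted range is meaningless
print("extrapolated value at x=2:", float(np.polyval(coef, 2.0)))
print("true value at x=2:       ", float(np.sin(4 * np.pi)))
```

            In-sample the fit is excellent; out of sample it diverges wildly, because nothing ties the fitted form to the underlying process. Asserting causality (a physical model) is what licenses the extrapolation.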

          19. Carrick,
            In fact they are correlating with temperatures in distant locations. But the actual determinant of growth might be local soil temperature, snow cover duration etc. Again, mathematically the assumption is simply that statistically significant correlation observed during overlap will persist in the non-overlapping period. There’s no assumption about cause needed. Of course, a possible physical mechanism will make the assumption more persuasive.

          20. Nick, seriously, try out that argument in a peer-reviewed journal.

            The assumption that the correlation will persist prior to the period of overlap has no merit absent a physical model that explains the existence of a correlation, and establishes why it might be expected to persist prior to the period of overlap.

            I would hope you would reject “number of firefighters in San Francisco” as a proxy for global mean temperature… even though the correlation between that metric and decadal-scale temperature change is awesome (and no divergence problem!).

          21. Carrick,
            “I would hope you would reject ‘number of firefighters in San Francisco’ as a proxy for global mean temperature”
            Well, I assume they have a cooling influence.

          22. LOL. Including a causal model now!

            But this is true only if you include in-town fires. 😉

            The US has had a pretty horrendous record of over-managing forest fires, and that has led to some massive super fires in the last few decades.

          23. Nick,

            “Again, mathematically the assumption is simply that statistically significant correlation observed during overlap will persist in the non-overlapping period.”

            If you have autocorrelated data, as proxies are, and the data is noisy, as proxies are, pre-selection followed by MV calibration against temperature containing a trend, guarantees a flatter than actual handle and an upslope at the end.

            I really don’t get why people don’t understand this.
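            A minimal sketch of the effect Jeff describes, with entirely synthetic data: screen pure AR(1) noise “proxies” by their calibration-period correlation with a rising trend, then average the survivors. The average rises over the calibration window (the blade) while the pre-calibration mean sits flat near zero (the handle), even though no proxy contains any signal at all. All parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1(n, phi, rng):
    """Zero-mean AR(1) noise: x[t] = phi*x[t-1] + e[t]."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

n_years, n_proxies = 1000, 200
cal = slice(900, 1000)                 # last 100 "years" = calibration
target = np.linspace(0.0, 1.0, 100)    # rising instrumental trend

# every "proxy" is pure autocorrelated noise: no temperature signal at all
proxies = np.array([ar1(n_years, 0.8, rng) for _ in range(n_proxies)])

# screen: keep proxies that correlate "well" with the trend in calibration
r = np.array([np.corrcoef(p[cal], target)[0, 1] for p in proxies])
kept = proxies[r > 0.2]
recon = kept.mean(axis=0)

# flat handle near zero, rising blade in the calibration window
print("kept:", len(kept), "of", n_proxies)
print("handle mean (years 0-899):", round(float(recon[:900].mean()), 3))
print("blade rise (within cal):  ",
      round(float(recon[cal][-20:].mean() - recon[cal][:20].mean()), 3))
```

            Averaging the unselected noise cancels out in the pre-calibration period (flat handle); the selection step keeps only realizations whose noise happens to slope upward in calibration (blade).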

          24. Jeff, I don’t understand it. I can see that if you overestimate sensitivity of the proxies to temperature in the instrumental period, you’ll get a flatter blade. But I don’t see what you have listed having that effect.

            I also don’t see where the upslope at the end comes from, but in any case I think that should be discounted because of selection. And I see I have support there from McShane and Wyner:
            “Second, the blue curve closely matches the red curve during the period 1902 AD to 1980 AD because this period has served as the training data and therefore the blue curve is calibrated to the red during it (note also the red curve is plotted from 1902 AD to 1998 AD). This sets up the erroneous visual expectation that the reconstructions are more accurate than they really are.”

            Blue is proxy, red instrumental. The gist of this complaint seems to be that proxies should not be shown at all post 1902, and I think I agree.

          25. Nick, I think the flattening that Jeff is referring to is the ‘loss of variance’ issue with this method, and it has been recognized as an issue as long ago as Mann 1999. See e.g. Von Storch’s 2004 paper.

            That’s the reason when I compare different reconstructions, I treat the reconstructed quantity as “unnormalized” outside of the calibration period.

            Blue is proxy, red instrumental. The gist of this complaint seems to be that proxies should not be shown at all post 1902, and I think I agree.

            I actually think it’s fair to show the calibration period, because it demonstrates how well or how poorly the data conform to the training period. Perhaps this could be split out into a separate plot.

            As I intimated at some other point in this thread, I also don’t advocate using the very old data sets for reconstruction; I think the errors in the instrumental data are just too large, and too difficult to quantify, for that.

          26. I am hoping that Jim Bouldin comes back to TAV to reply to the latest comments in our exchanges, and particularly to the correlation with temperature of the tree ring proxies used in Mann (08), which was in turn limited to the 25% (of over 900 tree ring proxies with calculated correlations) that passed the pre-selection criterion of P=0.13.

            From Bouldin’s previous posts I could not judge whether those numbers would surprise him or not. It is good to enter into these kinds of discussions, as it forces you to look even closer at data from which you thought you had pulled a goodly amount of information. In this case it was worse than I thought.

            To complete this analysis – for now anyway – I would like to estimate how those tree ring proxies that passed the pre-selection criteria, and for which we have correlation data, compare to those that did not pass the criteria and for which we do not have correlation data. I see that the tree ring proxies do have correlations within groups for cores and maybe I can do something with that data.

          27. I cannot do much with the intra-core correlations because these are given strictly within the group of cores in an individual proxy that failed/passed the selection criteria. What is interesting, however, is that the intra-core correlations, i.e. of the cores with one another, are much higher than the correlation with temperature of those proxies that passed the selection criteria.

            For the period 1850-1995 the passing proxies had an average correlation with temperature of 0.197, while the intra-core correlation for the passing proxies was 0.656, and that for the failing proxies was higher at 0.701. In other words, the tree rings responded to something very well in concert within the proxy, but that something in total was much bigger than the temperature response. The other factors in that big something would not necessarily track with temperature over time (or space), making it most difficult to eke out a temperature signal.
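            The pattern above (high intra-core correlation, low correlation with temperature) is what you would expect if the cores at a site share a strong common non-temperature factor. A toy sketch with invented coefficients (the 0.2 temperature loading, the unit local loading, and the noise level are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 146
temp = rng.normal(size=n)    # stand-in temperature record
local = rng.normal(size=n)   # shared non-temperature factor (moisture, site, ...)

# five "cores" from one site: strong shared local signal, weak temperature signal
cores = [0.2 * temp + 1.0 * local + 0.5 * rng.normal(size=n) for _ in range(5)]

# intra-core correlations (cores with one another)
intra = [np.corrcoef(cores[i], cores[j])[0, 1]
         for i in range(5) for j in range(i + 1, 5)]
# correlation of each core with temperature
with_temp = [np.corrcoef(c, temp)[0, 1] for c in cores]

print("mean intra-core r:       ", round(float(np.mean(intra)), 2))   # high
print("mean r with temperature: ", round(float(np.mean(with_temp)), 2))  # low
```

            The cores agree with one another because they share the dominant local factor, not because they share a temperature response.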

        3. But more to the point, I refer interested folks to:

          Wettstein et al. (2011) Coherent Region-, Species-, and Frequency-Dependent Local Climate Signals in Northern Hemisphere Tree-Ring Widths. J. Climate 24:5998-

    2. correction, should be:
      “..expected under a null model of no relationship (p = .50)…”, meaning that under the null hypothesis of no relationship you’d get half of all test results above and below p = 0.50.

  23. But calibrations between proxy and instrumental are not, as usually performed, based on time scales other than annual

    With tree rings, I believe correlating with JJA rather than annual temperature is more typical, especially for high-latitude species. The point is to pick species you “know” by other means to exhibit temperature-limited growth; these tend to be species living at tree lines or in the high arctic, where growth is highly seasonal and strongly correlated with summertime temperatures, not annual.

    1. Yes. Using the term “annual” in the context of long term paleoclimate reconstructions typically refers to the yearly scale and below, with the understanding that the effect is largely seasonal.

      1. That can of course be part of the underlying problem that we’re encountering with the divergence—using irregularly sampled surface temperature both spatially and temporally.

        It’s generally understood that global warming/cooling has a stronger effect as you approach the North Pole, and that the effect of polar amplification is larger in winter than summer. See e.g. this, which shows latitudinal effect of warming by season.

  24. Speaking to Jim Bouldin – you might have gathered that this is an emotionally charged environment. In part, the charge is due to the “editorial” policy of RealClimate, which is heavily censored to make agnostics appear like fools and true believers seem like true believers. It is, almost unbelievably, akin to a KKK rally policy. Your recent interventions have come across as oddly scientific, in the sense that we all know that RC does not do proper science anymore. But your recent posts on Climate Audit make me hope that you actually have an open mind with regard to what trees can say.

    1. Yes I’m very well aware of the emotional aspect Diogenes. In most fights there is legitimate blame to be assigned on both/all sides. I think that all that any of us can do given the situation is do our best to understand the science, and try not to add to the hostility. I do my best, but I don’t always necessarily succeed.

  25. I’ve been out of the loop for a while on the Yamal controversy. Am I correct in summarising that the debate has moved on from whether the study was crap or not? All parties agree, tacitly or openly, that it’s junk science; now the debate is over whether the Team was incompetent or dishonest. Is that a fair assessment, or too strong?

  26. The last thing that “official” science needs is for someone to wait till the wire and publish a crap treemometry paper that gets included in the “consensus” science without time for objective evaluation. You would hope that the scientists in the area would have the ethics not to do this again. But I am not holding my breath nor betting the farm.

  27. Sometimes they forget……….

    Your words Eric?: “in my view O’Donnell et al. is a perfectly acceptable addition to the literature. O’Donnell et al. suggest several improvements to the methodology we used, most of which I agree with in principle”

    [Response:Indeed, yes. I also pointed out that I thought that their trend estimate was low, and I gave my reasons, and they chose not to address this.–eric]

    Comment by Grosjan — 23 May 2012 @ 5:42 PM

    (from the borehole)

  28. You have 62 series; detrend and fit to Jan 1900 to Dec 1949 temperature; take the sum of squares of the differences and then rank by smallest sum of squares, so the best fit has the lowest sum of squares and is ranked (1).
    Using the same detrended data, fit to Jan 1950 to Dec 1999 temperature; take the sum of squares of the differences and then rank by smallest sum of squares, so the best fit is ranked (1).

    Now test if the two rankings belong to the same population.
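    DocMartyn’s procedure can be sketched directly. Everything below is synthetic stand-in data (62 white-noise “series”, two invented 50-year monthly temperature targets), and Spearman’s rho on the two rank vectors serves as the test of association: a rho near zero is consistent with the two rankings being independent draws (no shared skill), while a rho near one would mean the same series fit well in both periods.

```python
import numpy as np

rng = np.random.default_rng(4)

# hypothetical stand-ins: 62 detrended series spanning 1200 months, and two
# 600-month temperature targets (Jan 1900-Dec 1949 and Jan 1950-Dec 1999)
n_series, n_months = 62, 600
temp_a = rng.normal(size=n_months)   # stand-in 1900-1949 temperature
temp_b = rng.normal(size=n_months)   # stand-in 1950-1999 temperature
series = rng.normal(size=(n_series, 2 * n_months))

def rank_by_sse(block, target):
    """Rank series by sum of squared differences to the target (best = 0 here)."""
    sse = ((block - target) ** 2).sum(axis=1)
    return sse.argsort().argsort()   # 0-based rank of each series

ranks_a = rank_by_sse(series[:, :n_months], temp_a)
ranks_b = rank_by_sse(series[:, n_months:], temp_b)

# Spearman rank correlation = Pearson correlation of the two rank vectors
rho = np.corrcoef(ranks_a, ranks_b)[0, 1]
print("Spearman rho between the two period rankings:", round(float(rho), 3))
```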

  29. DocMartyn said
    June 1, 2012 at 10:09 pm

    Interesting analysis, Doc. What 62 series are you referring to?
