kgnd & Cross-Validation: PART II – Parting Thoughts

by Ryan O’Donnell

kgnd & Cross-Validation: PART II (TTLS vs. Ridge Regression and Table S3)

In his O’Donnellgate post, Steig makes the following observation:

It’s perhaps also worth pointing out that the *main* criticism I had of O’Donnell’s paper was never addressed. If you’re interested in this detail, it has to do with the choice of the parameter ‘k_gnd’, which I wrote about in my last post. In my very first review, I pointed out that as shown in their Table S3, using k_gnd = 7, “results in estimates of the missing data in West Antarctica stations that is further from climatology (which would result, for example, from an artificial negative trend) than using lower values of k_gnd.”

Mysteriously, this table is now absent in the final paper (which I was not given a chance to review).

In posts at Eli’s and John Nielsen-Gammon’s, he makes some further observations with respect to this:

This is not complicated folks. O’Donnell and gang, not liking my criticisms of the way they used TTLS, and in particular the fact that the truncation parameter they wanted to use, suddently started using IRIDGE. This has the advantage of having a build in verification function, which means you can’t see what the verification statistics are, which means that it is much easier to NOT SHOW THE BAD VERIFICATION STATISTICS I was criticizing them for. Maybe that is not why they used iridge. I don’t know WHY they used IRIDGE but I did not suggest it to them nor endorse it.

[at J N-G’s only:] P.P.P.S So if anyone wants to speculate that hiding table S3 is O’Donnell lying again, go for it, since speculation is all most people seem to be doing these days.

A few of these observations are fairly easy to deal with. Steig claims that the disappearance of Table S3 (which we will discuss in a moment) was “mysterious”, states that its disappearance was unknown to him as a reviewer, and implies (and condones speculation) that it was done for a nefarious purpose.

The problem is . . . none of this is true. Steig did, in fact, see the revised Supporting Information with Table S3 removed. The removal was far from “mysterious”, as Steig himself acknowledges in his third review:

An unfortunate aspect to this new manuscript is that, being much shorter, it now provides less information on the details of the various tests that O’Donnell et al. have done. This is not the authors fault, but rather is a response to reviewers’ requests for a shorter supplementary section. The main thing is that the ‘iridge’ procedure is a bit of a black box, and yet this is now what is emphasized in the manuscript. That’s too bad because it is probably less useful as a ‘teaching’ manuscript than earlier versions. I would love to see O’Donnell et al. discuss in a bit more details (perhaps just a few sentences) how the iridget caclculations actually work, since this is not very well described in the original work of Schneider. This is just a suggestion to the authors, and I do not feel strongly that they should be held to it.

Apparently, Steig’s post-review change-of-heart includes forgetting that he saw the revised SI with the table removed, forgetting that he knew why it was removed, and forgetting that the only additional request he had with respect to the SI was that we add a few words on iRidge (which we declined to do for reasons stated at the end of this post here).

These issues are, of course, incidental to the primary point of concern: Did Steig’s criticisms based on Table S3 have any scientific validity? With the administrative details out of the way, let us examine this in some depth.

Table S3 can be found here. As noted in both the SI and the main text, it was never intended to represent valid cross-validation statistics for the reconstruction. Instead, it was the result of a screening test we used to limit the total number of combinations we had to test in order to determine the optimal parameters.

Regardless of how we intended to use the screening test, the results show one particularly interesting feature. At our optimal choice of kgnd = 7 (demonstrated in Part I), the Byrd AWS station demonstrates a negative CE. A negative CE indicates that the simple mean of the data matches the withheld data better than the infilled values do – or, in other words, that the infilled values are a poor representation of the actual values. Steig noted this, and – during the review process as well as following publication – has used it to claim that our West Antarctic results are no good.
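For reference, the CE statistic is straightforward to compute. A minimal sketch (the function and variable names are ours, not from the paper's scripts):

```python
import numpy as np

def coefficient_of_efficiency(actual, predicted):
    """CE = 1 - SS(residuals) / SS(deviations about the mean of the actual data).

    CE = 1 is a perfect fit; CE = 0 means the fit is no better than the
    simple mean; CE < 0 means the simple mean is the better predictor.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

For example, predicting the series [1, 2, 3, 4] with its own mean (2.5 everywhere) gives CE = 0, while predicting it with [4, 3, 2, 1] gives CE = -3.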

The problem with Steig’s logic is that this test is a very poor indicator of the actual reconstruction skill for an individual station (and an almost equally poor way of finding the ideal truncation parameter). To see why, we must first understand how the screening test was conducted.

For the screening test, we co-opted a popular method for measuring reconstruction skill for the purpose of performing an approximate cross-validation test. This method is an early / late withholding test. How it works is that a certain number of stations are designated as verification targets and alternately have ½ of their data withheld:

Fig 1


The first step is to withhold half of the data for the verification targets (red in the left graphic), infill at various settings of kgnd, and compare the infilled values back to the original, withheld values. We then repeat this by withholding the other half of the data (red in the right graphic). Lastly, we extract the worst results from the early and late tests and compare them at the various settings of kgnd.
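A schematic of the procedure, with `infill` standing in for whatever infilling routine is used (RegEM in the actual scripts; the function and variable names here are illustrative only):

```python
import numpy as np

def early_late_cv(data, target_cols, infill, k_values):
    """Schematic early/late withholding test.

    `data` is a complete (time x station) array; `target_cols` are the
    verification targets; `infill(masked, k)` is a stand-in for the
    infilling routine and must return a completed copy of `masked`.
    For each k we withhold the early half, then the late half, of each
    target and keep the WORST CE observed.
    """
    n = data.shape[0]
    halves = (slice(0, n // 2), slice(n // 2, n))
    scores = {}
    for k in k_values:
        worst = np.inf
        for rows in halves:
            masked = data.copy()
            masked[rows, target_cols] = np.nan     # withhold one half
            filled = infill(masked, k)
            for col in target_cols:
                actual, pred = data[rows, col], filled[rows, col]
                ce = 1.0 - np.sum((actual - pred) ** 2) / np.sum(
                    (actual - actual.mean()) ** 2)
                worst = min(worst, ce)             # keep the worst result
        scores[k] = worst
    return scores
```

The returned dictionary maps each candidate kgnd to the worst CE found across both halves and all targets, which is then compared across the candidate settings.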

Some problems with this type of test should be immediately apparent. First, only a portion of the predictors can be analyzed. If we were to withhold the long record length stations, for example, the test would necessarily yield ideal truncation values that are too small. This is because the maximum allowable truncation parameter is equal to the minimum number of predictors available for any time step. If one attempts to use a larger parameter, the regression coefficients are undefined. This limits us to withholding the short record length stations – which already have the most sampling error due to the short overlap with the predictors (and, as a result, are likely to give more uncertain cross validation results, especially when one considers the ability to match the low-frequency response).

The second major issue is that withholding such large quantities of data affects the order and shape of the eigenvectors. We discuss this at length in our response to Review A here. This means that eigenvectors 1 – 7 (say) when performing the early/late test are different from eigenvectors 1 – 7 when performing the actual reconstruction. Because the eigenvectors are different, there is no reason to suspect that the peak performance would occur at the same number of retained eigenvectors.

In other words, this type of cross-validation testing is simply not very good at determining the ideal truncation parameter, either for the full set or for an individual station. So . . . one might ask, why on earth did we do this?

The reason for performing this test was to get a gross idea of what the ideal truncation parameter might be. Since we have no a priori knowledge of what it might be, simply testing all of the possible parameters (1 through 11) for kgnd was computationally prohibitive. So this test was performed to determine the neighborhood in which we would expect the ideal kgnd to occur when we performed our more extensive and effective cross validation tests.

To this point, however, all we have done is present plausible reasons why early/late cross-validation might be a poor indicator of the ideal truncation parameters. We have not demonstrated this to be true. It is now time to change that.

The test we will perform is simple. We will take known, complete instrumental temperature data, mask out values to duplicate the pattern of missingness in Antarctica, and then determine the actual ideal truncation parameter by comparing the infilling error to the masked values. We will then perform the early/late cross validation test and see how well that test identifies the actual ideal truncation parameter. In addition, we will test the method used by S09 for determining the truncation parameter (Mann et al. 2007). As a bonus, we will also compare the performance of TTLS to ridge regression. For speed, we will use multiple ridge regression (mRidge) instead of iRidge, but this choice does not affect any of the following results.
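The "true ideal" search in the first step can be sketched as follows, with `infill_ttls` a hypothetical stand-in for the TTLS RegEM routine:

```python
import numpy as np

def true_ideal_k(full, mask, infill_ttls, k_values):
    """Find the actual ideal truncation parameter for one synthetic
    replicate: mask the known values, infill at each candidate k, and
    pick the k with the lowest RMS error against the withheld truth.
    `full` is the complete data; `mask` is a boolean array marking the
    cells to treat as missing.
    """
    masked = np.where(mask, np.nan, full)
    errs = {}
    for k in k_values:
        filled = infill_ttls(masked, k)
        errs[k] = np.sqrt(np.mean((filled[mask] - full[mask]) ** 2))
    # Return the best k along with the full error table.
    return min(errs, key=errs.get), errs
```

Because the masked values are known, this gives the genuine ideal parameter for each replicate, against which the candidate selection rules can then be scored.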

The data sets we will use are the AVHRR data corresponding to the manned station locations for one test, and long-record GHCN stations for another. In order to obtain a true Monte Carlo estimate of the performance of these methods, we will use the phase-randomization approach of Christiansen et al. 2008 to obtain random realizations of the source data. This approach involves taking the Fourier transform of the data, randomizing the phases, and then performing an inverse Fourier transform. This preserves the covariance, noise, and autocorrelation structure of the data while yielding random temporal realizations. The test scripts and data used are available here, as are the raw cross-validation statistics used to produce the plots from Part I. We will perform 50 replicates for each data set, each of which takes approximately 2.5 days of computation time.
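The phase randomization step can be sketched as below. We apply the same random phase draw to every column so that the cross-covariance between stations survives, and leave the DC and Nyquist bins untouched so the surrogate remains real-valued (a sketch under those assumptions, not the actual test script):

```python
import numpy as np

def phase_randomize(x, rng):
    """Phase-randomized surrogate of `x` (time x station), after the
    approach of Christiansen et al. 2008: Fourier transform, replace
    the phases with random ones, inverse transform.  The same phase
    draw is applied to every column, which preserves the cross
    covariance between stations as well as each column's power
    spectrum (and hence its autocorrelation structure).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    spec = np.fft.rfft(x, axis=0)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape[0])
    phases[0] = 0.0            # leave the mean (DC bin) alone
    if n % 2 == 0:
        phases[-1] = 0.0       # keep the Nyquist bin real-valued
    return np.fft.irfft(spec * np.exp(1j * phases)[:, None], n=n, axis=0)
```

Because only the phases are rotated, the magnitude of every Fourier coefficient – and therefore the power spectrum of every station – is identical between the original data and each surrogate realization.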

To summarize, we will test:

  1. How effective is the Mann et al. 2007 rule (used by S09) at finding the correct truncation parameter?
  2. How effective is early/late cross validation at finding the correct truncation parameter?
  3. Do the individual station results from early/late cross validation mean anything?
  4. Is ridge regression or TTLS more accurate for infilling instrumental temperatures?

*** MANN et al. 2007 ***

The truncation rule espoused by Mann et al. 2007 – which was the rule used by S09 – performed particularly poorly. For the AVHRR set, the rule correctly identified the ideal truncation parameter only 7 times out of 50 attempts, with an average miss of 2.08, and it missed by 4 or more in 11 of the 50 attempts. The performance with the GHCN set was nearly identical: 8 correct identifications, an average miss of 2.08, and 11 misses by 4 or more.

To put this in perspective, simply generating random numbers from a uniform distribution between the minimum possible truncation parameter (1) and the maximum observed (8) achieved a correct identification 5.2 times out of 50, with an average miss of 3.03 and a miss of 4 or greater 13.2 times out of 50 (averaged over 100 trials).
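The random-number baseline is easy to reproduce in sketch form (the true ideal parameters here would come from the masking experiment; the placeholder set below is ours):

```python
import numpy as np

def random_guess_baseline(true_params, k_min=1, k_max=8, n_trials=100, seed=0):
    """Score uniform random guesses of the truncation parameter against
    a set of known ideal parameters.  Returns the mean number of exact
    hits, the mean miss size (averaged over misses only), and the mean
    number of misses of 4 or more, per replicate set.
    """
    true_params = np.asarray(true_params)
    rng = np.random.default_rng(seed)
    hits, miss_size, big_misses = [], [], []
    for _ in range(n_trials):
        guess = rng.integers(k_min, k_max + 1, size=true_params.size)
        err = np.abs(guess - true_params)
        hits.append(np.sum(err == 0))
        miss_size.append(err[err > 0].mean() if np.any(err > 0) else 0.0)
        big_misses.append(np.sum(err >= 4))
    return np.mean(hits), np.mean(miss_size), np.mean(big_misses)
```

With 50 true parameters spread uniformly over 1 to 8, a uniform guesser lands on roughly 6 exact hits per 50 with an average miss of about 3 – the yardstick against which Mann et al. 2007 barely improves.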

In other words, the Mann et al. 2007 rule is not much better than plucking random numbers out of the air.


*** EARLY/LATE CROSS VALIDATION ***

Early/late cross validation performed better than Mann et al. 2007. (Of course, this is almost a given since it is difficult to conceive of a truncation rule more inept than Mann et al. 2007.) For the AVHRR set, this method correctly identified the ideal truncation parameter 25 times out of 50, with an average miss of 0.64, and only 1 miss of 4 or greater. For the GHCN set, the performance was marginally worse. The method correctly identified the ideal truncation parameter 18 times, with an average miss of 0.76 and no misses of 4 or greater.

At this point, it is clear that neither Mann et al. 2007 nor early/late cross validation is particularly effective. We can expect to get the truncation parameter correct only 15% of the time using Mann et al. 2007 (compared to 10.5% of the time using random numbers), and only 43% of the time using early/late cross validation. Given the large spread in reconstruction results from changing the truncation parameter by only 1, this is not very encouraging.

Even more discouraging is how well the early/late cross validation testing captures the correct parameter for a given station. Remember that one of Steig’s criticisms is that in the early/late test, the Byrd value at kgnd = 7 was negative. Did that negative CE really mean anything? Let’s take a look.

Since we calculated not only the ideal parameter for the whole set but also the ideal parameter for each individual station, we can compare how accurately the early/late cross validation results capture the actual ideal parameter for each station. To do this, we simply extract the known ideal parameters from the full set and compare them to the estimated ideal parameters from the cross-validation testing on a station-by-station basis. If we plot those results, we get:

Fig 2


In case any of us were wondering, that’s not very good. The early/late cross validation test correctly identified the ideal truncation parameter for a given station only 751 times out of 3,500 chances – or a dismal 20.7% success rate. In fact, the true ideal truncation parameter differed from the cross validation estimate by 2 or more over 50% of the time.

In terms of CE, Steig’s case is even weaker. If one examines the CE value for the early/late cross validation test at the ideal parameter for each station, the early/late cross validation estimate of CE is negative over 36% of the time:

Fig 3

To illustrate just how bad the correlation is between the early/late cross validation CE estimate and the true CE at the ideal truncation parameter, we can look at the scatterplot of the values:


Fig 4

The early/late cross validation results tell us virtually nothing about how well the full data set will fit the missing data at the ideal truncation parameter. The idea that the early/late cross validation results – Table S3 – give any indication of full reconstruction performance for a given station is simply baseless. All early-late cross validation can do is give us an indication of what the full set ideal truncation parameter might be. It gives us a neighborhood for the full set – nothing more. It tells us next to nothing about individual stations.

This is why we ignored Steig’s criticisms of kgnd. His criticisms, quite simply, have no basis in fact.


*** RIDGE REGRESSION vs. TTLS ***

The only remaining criticism we’ve yet to prove baseless is that ridge regression is known to cause problems. Of all of the criticisms Steig leveled in that post, this one is the easiest to prove false.

Along with performing a full-set ideal truncation parameter determination and an early-late cross validation test, we also infilled each of the random realizations using ridge regression and compared the results to the best possible TTLS results. How did ridge stack up? Let’s look:

Fig 5


Note that ridge regression beat TTLS EVERY SINGLE TIME. Using ridge regression instead of TTLS reduced RMS error by an average of 11.9%. In terms of capturing the actual linear trend of the data (measured in units of the standard error of the actual trend), ridge demonstrated a 15.9% improvement over TTLS.

The charge that ridge regression is somehow less accurate than TTLS when infilling instrumental temperatures is completely without merit.

Given that ridge regression outperformed TTLS at the ideal truncation parameter every single time, one might wonder what would happen if – instead of using the near-random Mann et al. 2007 rule or the marginally better (but still poor) early/late cross validation tool – we simply picked the truncation parameter based on how well the TTLS solution matches the ridge regression solution. In other words, if we choose the truncation parameter whose TTLS solution best matches the ridge regression solution, how would we do?

The answer is . . . we would do remarkably well.

If we were to do this, we would have chosen the correct truncation parameter for the AVHRR data 34 times with an average miss of 0.46 and no cases of missing by 4 or more. For the GHCN data, the performance is even better. We would have chosen the correct truncation parameter 43 times with an average miss of 0.25 and no cases of missing by 4 or more. We can plot these results to give an idea of how much more effective this method is:

Fig 6
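A sketch of this selection rule (again, `infill_ttls` and `infill_ridge` are hypothetical stand-ins for the actual RegEM routines):

```python
import numpy as np

def pick_k_by_ridge_match(masked, infill_ttls, infill_ridge, k_values):
    """Selection rule sketched in the text: infill once with ridge
    regression, infill with TTLS at each candidate k, and pick the k
    whose TTLS solution is closest (in RMS difference over the
    originally missing cells) to the ridge solution.
    """
    miss = np.isnan(masked)
    ridge_fill = infill_ridge(masked)
    best_k, best_rms = None, np.inf
    for k in k_values:
        ttls_fill = infill_ttls(masked, k)
        rms = np.sqrt(np.mean((ttls_fill[miss] - ridge_fill[miss]) ** 2))
        if rms < best_rms:
            best_k, best_rms = k, rms
    return best_k
```

Note that this rule needs no withheld data at all: it compares two infilling solutions over the cells that are actually missing, which is why it can be applied to the real (incomplete) reconstruction data.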

Even this understates how much better the ridge regression comparison is at picking the right truncation parameter, because when the ridge regression comparison misses, it misses by a much smaller amount than early/late cross validation or Mann et al. 2007. Quantifying this is simple: we sum the difference in RMS error between the truncation parameter chosen by a particular method and the true ideal truncation parameter, and divide by the number of times that method chose the wrong truncation parameter.

Average increase in RMS error per miss:

  1. Ridge regression: 19 misses, RMS error per miss of 0.011
  2. Early / late cross validation: 57 misses, RMS error per miss of 0.095
  3. Mann et al. 2007: 85 misses, RMS error per miss of 2.887 (0.655 with the removal of one large outlier)

In other words, when early/late cross validation chooses the wrong parameter, the resulting increase in RMS error is, on average, 9 times the additional error incurred by using the ridge regression comparison and choosing incorrectly. In fact, just 2 misses using early/late cross validation result in the same additional error as the sum of all 19 misses using ridge regression. And choosing based on Mann et al. 2007 incurs, on average, 65 times the additional error if the benefit of the doubt is given for one outlier, and 288 times the additional error if a strict average is used.
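The bookkeeping behind these numbers is simple; a sketch:

```python
def rms_penalty_per_miss(chosen, ideal, rms_tables):
    """Average increase in RMS error per wrong choice.

    `chosen[i]` and `ideal[i]` are the selected and true ideal
    truncation parameters for replicate i, and `rms_tables[i]` maps
    each candidate parameter to the infilling RMS error observed at
    that parameter.  The penalty is the total excess RMS error of the
    wrong choices, divided by the number of wrong choices.
    """
    excess, misses = 0.0, 0
    for ck, tk, table in zip(chosen, ideal, rms_tables):
        if ck != tk:
            misses += 1
            excess += table[ck] - table[tk]
    return misses, (excess / misses if misses else 0.0)
```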

At this point, we may be wondering . . . at what value of kgnd does the ridge solution best match the TTLS solution for our reconstruction? Well . . .

If we use correlation coefficient, then we get kgnd = 7.

If we use RMS error, then we get kgnd = 7.

If we use CE, then we get kgnd = 7.

If we use RE, then we get kgnd = 7.

If we use our cross validation method from our first submission, then we get kgnd = 7.

If we use our published cross validation method, then we get kgnd = 7.


We had 3 unanswered questions from the previous post:

  1. The lower West Antarctic trends in O10 are due to the use of iRidge. Nope, as the TTLS trends at the ideal parameter are even lower.
  2. Early/late cross-validation is a better way to determine the truncation parameter. Definitely not.
  3. The kgnd issues were not addressed by O10, who “mysteriously” removed a table of offending verification statistics. The removal of the table was neither mysterious nor unexpected, and the kgnd issues were thoroughly addressed in our responses.

With respect to Steig’s 11 criticisms of our paper in his post, only one of them has any potential merit, and that one relates to unpublished, presently unverifiable, results of uncertain accuracy at one location, which are also at variance with his reconstruction.

His final batting average: 0.091.


Regardless of whether Steig honestly attempted to improve our manuscript during the review, and regardless of whether he honestly believed his criticisms to be true, the act of simply claiming things without performing even the simplest check to determine if the claim could possibly be true cannot be condoned. Just as every author has the responsibility to verify results and calculations to the best of his or her ability, every reviewer should have the responsibility to verify claims prior to making them. This verification is even more critical when communicating things as fact to the public.

Regardless of how “gentle” Steig’s critique was, it was nothing more than a collection of false statements posing as fact.

If climate scientists wish to be taken seriously by those who have the time and ability to independently verify their results, they ought to be careful about claiming things to be true that are easily proven to be false.

17 thoughts on “kgnd & Cross-Validation: PART II – Parting Thoughts”

  1. I like how the phase randomization approach of Christiansen et al is simply chucked in there as though it were a bit of light reading. If I ever get time again, that would be a fun thing to do to proxy series.

  2. Ryan, one of the criticisms of ridge regression is that it reduces the measured trend. If you have already done this, can you post the results of synthetic data comparing the mean value of the measured trend of ridge regression vs. TTLS?

    I suppose I could do this myself with a small bit of coaching, if you haven’t.

  3. Carrick, perhaps you could show us a reference to where it was shown that ridge regression results in lower trend, i.e. is that criticism hand waving or real. I agree that a simulation would give a more general picture, but on the other hand I have been attempting to keep score on when these claims are well founded and when they are simply thrown out – or even anecdotal.

    As I recall Steig’s reference was back to a Mann et al paper where the claim was made with the reference to a single result. Have there been other claims?

  4. From a previous RyanO post here, what I recalled was in Ryan’s comment below.

    “More to [what we believe to be] the reviewer’s point, though Mann et al. (2005) did show in the Supporting Information where TTLS demonstrated improved performance compared to ridge, this was by example only, and cannot therefore be considered a general result. By contrast, Christiansen et al. (2009) demonstrated worse performance for TTLS in pseudoproxy studies when stochasticity is considered – confirming that the Mann et al. (2005) result is unlikely to be a general one. Indeed, our own study shows ridge to outperform TTLS (and to significantly outperform the S09 implementation of TTLS), providing additional confirmation that any general claims of increased TTLS accuracy over ridge is rather suspect.”

  5. Carrick, here is Ryan’s response to Steig in the third review:

    The second topic concerns the bias. The bias issue (which is also mentioned in the Mann et al. 2007 JGR paper, not the 2008 PNAS paper) is attributed to a personal communication from Dr. Lee (2006) and is not elaborated beyond mentioning that it relates to the standardization method of Mann et al. (2005). Smerdon and Kaplan (2007) showed that the standardization bias between Rutherford et al. (2005) and Mann et al. (2005) results from sensitivity due to use of precalibration data during standardization. This is only a concern for pseudoproxy studies or test data studies, as precalibration data is not available in practice (and is certainly unavailable with respect to our reconstruction and S09).
    In practice, the standardization sensitivity cannot be a reason for choosing ridge over TTLS unless one has access to the very data one is trying to reconstruct. This is a separate issue from whether TTLS is more accurate than ridge, which is what the reviewer seems to be implying by the term “bias” – perhaps meaning that the ridge estimator is not a variance-unbiased estimator. While true, the TTLS estimator is not variance-unbiased either, so this interpretation does not provide a reason for selecting TTLS over ridge. It should be clear that Mann et al. (2007) was referring to the standardization bias – which, as we have pointed out, depends on precalibration data being available, and is not an indicator of which method is more accurate.
    More to [what we believe to be] the reviewer’s point, though Mann et al. (2005) did show in the Supporting Information where TTLS demonstrated improved performance compared to ridge, this was by example only, and cannot therefore be considered a general result. By contrast, Christiansen et al. (2009) demonstrated worse performance
    for TTLS in pseudoproxy studies when stochasticity is considered – confirming that the Mann et al. (2005) result is unlikely to be a general one. Indeed, our own study shows ridge to outperform TTLS (and to significantly outperform the S09 implementation of TTLS), providing additional confirmation that any general claims of increased TTLS accuracy over ridge is rather suspect.

    Steig was very vague in the review, basically saying only that there was a “bias”. In the RC article, Steig didn’t do much better, saying something to the effect that ridge ‘reduces the trend’. Anyone else besides me struck by curiosity at such ambiguity? IIRC, there are only two possible matters that this criticism can track back to – one of which cannot apply to O10. On the other, call me cynical, but I don’t believe that anyone from the Team is going to be raising the variance loss issue with anything but cryptic language.

  6. Carrick,

    It’s rather small. For the set averages (i.e., row means of the synthesized data), the standard errors are (in deg C/decade):

    GHCN set: 0.03
    AVHRR set: 0.08

    The ability to replicate the trend was recorded in units of SE. So the mean values for trend error in terms of SE were:

    GHCN set:

    …..ridge: 0.46 ==> trend error of 0.03 * 0.46 = 0.015 deg C/decade
    …..TTLS at ideal: 0.57 ==> trend error of 0.03 * 0.57 ≈ 0.017 deg C/decade

    AVHRR set:

    …..ridge: 0.18 ==> trend error of 0.08 * 0.18 = 0.014 deg C/decade
    …..TTLS at ideal: 0.83 ==> trend error of 0.08 * 0.83 = 0.066 deg C/decade

    Insofar as your statement that ridge regression “reduces” the trend, this depends on what happened in the withheld portion of the data. In paleo reconstructions, for example, one of the criticisms made by Smerdon & Kaplan (2007) is that ridge regression increased the trend.

    If the calibration and reconstruction periods have exactly the same trend, on average, ridge will underestimate. If the calibration period has a trend and the reconstruction period does not, on average, ridge will get the trend right. If the calibration period has a trend and the reconstruction period has a trend opposite in sign, on average, ridge will overestimate.

    Since the temporal information and the order/orientation of the masks were randomized, the mean trend difference for both TTLS and ridge are not very informative. What is informative is the absolute deviation from the actual trend in terms of SE of the actual trend.

    Whether ridge and/or TTLS overestimate/underestimate/get-it-right depends on the relationship between the calibration trend and the reconstruction period trend in the actual data. The statement that ridge “reduces” the trend is an implicit assumption that the measured trend in the calibration period is of the same sign as the actual trend in the reconstruction period.

  7. Error in the above. Should read:

    AVHRR set:

    …..ridge: 0.18 ==> trend error of 0.08 * 0.18 = 0.014 deg C/decade
    …..TTLS at ideal: 0.23 ==> trend error of 0.08 * 0.23 = 0.018 deg C/decade

  8. I should also note that the above applies to a single RHS and single regularization parameter. In the case of RegEM, a separate regularization parameter is computed for each pattern of missingness. On average, the above statements may hold, but for an individual realization, the behavior may be quite different. It is possible for ridge to overestimate a trend even if the calibration and reconstruction periods have trends of the same sign, since different predictors are used for each pattern of missingness. If the actual errors are not i.i.d. and of equal variance, then there exist patterns of missingness that will produce individual results that are opposite of the general rules listed above . . . though on average the above rules will hold.

    The Christiansen paper referenced in the post provides some insight into this behavior by analyzing the stochasticity of regression solutions for various methods, including ridge and TTLS. The stochasticity is huge.

    “Anyone else besides me struck by curiosity at such ambiguity?”

    LL, I have already given my impression of what Steig was attempting to do as Reviewer A, and that was to get tutored by Ryan et al. on the methods, get the O(10) authors to provide alternative results by alternative methods, and then finally for Steig to select the “best” (most WA warming) method. What struck me was Steig’s almost total insistence that WA had to be warming, and by the amount, or close to it, that S(09) indicated. It was almost: if you do not get WA sufficiently warming, I know your methods must be wrong or at least strongly suspect. I attribute that approach and the lack of sensitivity testing by some climate scientists to consensus thinking about AGW – as in, we already know what the answer should be.

  10. So Ryan are you saying that TTLS under the conditions you note can give larger errors and thus could give a higher trend than ridge, but at the margins of the trends that have appeared in S(09) and O(10) for the Antarctica continent and 3 main regions.

    When I attempted to find the information that you noted in the Mann(05) SI with regard to TTLS versus iRidge, I could not find it. I used the SI at Mann’s website. Is this the paper with Rutherford as a coauthor where they used synthetic data with various S/N ratios?

  11. I do not know how well this table from the O(10) SI reproduces here, but put yourself in Steig’s shoes and I think you can see where his review was headed. TTLS over IRidge, and then TTLS with kgnd=6, and we have got some real WA warming – and a Peninsula bonus as well.

    | Reconstruction Type | Continent | East Antarctica | West Antarctica | Peninsula |
    | --- | --- | --- | --- | --- |
    | RLS, IRidge | 0.06 ± 0.08 | 0.03 ± 0.09 | 0.10 ± 0.09 | 0.35 ± 0.11 |
    | RLS, TTLS, kgnd=5 | 0.10 ± 0.09 | 0.04 ± 0.10 | 0.21 ± 0.11 | 0.52 ± 0.14 |
    | RLS, TTLS, kgnd=6 | 0.09 ± 0.09 | 0.03 ± 0.10 | 0.20 ± 0.11 | 0.50 ± 0.14 |
    | RLS, TTLS, kgnd=7 | 0.07 ± 0.09 | 0.05 ± 0.09 | 0.07 ± 0.07 | 0.36 ± 0.11 |
    | RLS, TTLS, kgnd=8 | 0.07 ± 0.09 | 0.05 ± 0.09 | 0.11 ± 0.08 | 0.38 ± 0.12 |
    | E-W, IRidge | 0.04 ± 0.06 | 0.02 ± 0.07 | 0.06 ± 0.07 | 0.32 ± 0.09 |
    | E-W, TTLS, kgnd=5 | 0.07 ± 0.07 | 0.04 ± 0.07 | 0.10 ± 0.08 | 0.39 ± 0.09 |
    | E-W, TTLS, kgnd=6 | 0.07 ± 0.06 | 0.04 ± 0.07 | 0.09 ± 0.08 | 0.37 ± 0.09 |
    | E-W, TTLS, kgnd=7 | 0.07 ± 0.06 | 0.05 ± 0.07 | 0.05 ± 0.06 | 0.33 ± 0.09 |
    | E-W, TTLS, kgnd=8 | 0.07 ± 0.07 | 0.05 ± 0.07 | 0.11 ± 0.07 | 0.32 ± 0.08 |

  12. Thanks for the response Ryan. I would have picked kgnd=7, which appears to have been your choice too (balance between CE and spatial resolution).

    In a particular example that I know about, the trend does get reduced, but this turns out to be a good thing (you have ground truth in that example so you can judge the skill of the various inverse solutions).

  13. The problem with being a liar is that you forget what was a lie and what was the truth and get caught. Steig’s problem is compounded by the fact that he does not understand these topics well enough to offer valid criticism.


  14. Ryan,

    Charlie Munger (Warren Buffett’s investment partner) identified Eric Steig’s problem a long time ago in his list of 24 causes of human misjudgment.

    The following applies to Eric and a lot of other climate scientists (like James Hansen and his cabal):
    “4. Fourth, and this is a superpower in error-causing psychological tendency: bias from consistency and commitment tendency, including the tendency to avoid or promptly resolve cognitive dissonance. Includes the self-confirmation tendency of all conclusions, particularly expressed conclusions, and with a special persistence for conclusions that are hard-won.”

    When someone like Eric “shouts out” his conclusions on the cover of Nature, he is automatically and at the same time “pounding in” (into his head) an absolute certainty of the validity of those conclusions. The affirmation is overwhelmingly powerful. That makes it very difficult/impossible to ever accept those same conclusions were rubbish.

    As Eric has so amply demonstrated, starting with blog shouting matches well before the O(10) manuscript was submitted, he will not ever (indeed is probably incapable of) accepting findings that invalidate the conclusions of the S(09) Nature paper.

    Fortunately, objective reality does not depend on Eric’s beliefs.

  15. Thanks, Ryan, for going back over all this. I am just sorry that you had to.

    I think Dr. Steig overplayed his hand when he decided to post his response on RC, but up till then he had accomplished his goals. He had delayed publication, and gotten you to change your methods in the published paper away from a replication of his mathematical failings. To me, that took away from the power of your paper. By having your presentation use a different method, he could now claim you did not disprove his work.

    Personally, I think every method you used showed he was incompetent mathematically. But by getting you to remove the TTLS regressions, he created the illusion that you couldn’t discredit his methods by replication. To the readers of tAV, CA, and WUWT, you’ve won the battle and war, but that is not what the general public and MSM understands.

  16. One of the ironies is the warp caused by the great need to have that big ol’ shelf melt. And what if it did? Would they be happy then?
