Back in 2004, Steve McIntyre and Ross McKitrick (MM) attempted to publish a Nature paper critical of the particular PCA method used in the hockey stick paper MBH98, which the UN IPCC needed to continue the aggressive global warming stance it had taken. Although the paper was eventually rejected for publication, one of the two anonymous reviewers made some interesting supportive commentary regarding PCA. Link available below. Referee #1 says specifically that:
There are two main points of dispute:
1. The principal component technique used.
2. The quality of the early data.
I deal with 1. first. It is an area where I have expertise, but it is not at all clear what exactly is being done.
The referee continued with
1. I think I understand better than before what the MBH98 PCA is doing, namely centering the data about the mean of the 1902-80 period rather than of the whole series. The question is why, and what properties and interpretation does such a procedure have? Given the non-stationarity of the series, it is certainly not successively maximizing variance as in PCA, and talking about ‘explained variance’ therefore makes little sense.
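To make the referee's point concrete, here is a minimal sketch of the two centerings on made-up data. It is not the MBH98 code or data: the matrix sizes, the random noise, the artificial late trend in one series, and the helper name `leading_pc` are all assumptions chosen for illustration. The only thing that differs between the two calls is which rows the column means are taken from.

```python
import numpy as np

def leading_pc(X, center_rows):
    """Leading direction and its share of the total sum of squares,
    after subtracting column means computed over center_rows only.
    (Hypothetical helper for illustration.)"""
    Xc = X - X[center_rows].mean(axis=0)
    # Squared singular values of Xc are the eigenvalues of Xc.T @ Xc.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    share = s[0] ** 2 / np.sum(s ** 2)
    return vt[0], share

rng = np.random.default_rng(0)
n_years, n_series, n_recent = 581, 20, 79   # assumed sizes, not MBH98's
X = rng.standard_normal((n_years, n_series))
# Give one series a trend in the recent segment only, so that its
# recent-segment mean differs from its whole-series mean.
X[-n_recent:, 0] += np.linspace(0.0, 3.0, n_recent)

# Ordinary PCA: center each series on its whole-series mean.
full_vec, full_share = leading_pc(X, slice(None))
# Short-segment centering: center on the mean of the recent rows only.
short_vec, short_share = leading_pc(X, slice(-n_recent, None))

print("full centering : share", round(full_share, 3),
      "| loading on series 0", round(abs(full_vec[0]), 3))
print("short centering: share", round(short_share, 3),
      "| loading on series 0", round(abs(short_vec[0]), 3))
```

In this toy setup the short-segment version hands the trending series a larger share of the leading component, but that share is no longer a share of variance, because the columns no longer have zero mean. That is one way to read the referee's remark that talking about 'explained variance' makes little sense here.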
Recently, a post on George Tamino’s blog carried Tamino’s own insistence that MM were wrong in their use of PCA: “Don’t take my word for it. Just ask Ian Jolliffe.”
Well that didn’t sit too well with Dr. Jolliffe so he wrote back. Many of my readers already know what he said so only a small quote here.
There are an awful lot of red herrings, and a fair amount of bluster, out there in the discussion I’ve seen, but my main concern is that I don’t know how to interpret the results when such a strange centring is used? Does anyone? What are you optimising? A peculiar mixture of means and variances?
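The phrase "a peculiar mixture of means and variances" can be illustrated with a standard identity. This is textbook algebra, not the unpublished relationships Dr. Jolliffe mentions later. The symbols are my own notation: $x_t$ is the vector of series values in year $t$, $\bar{x}$ is the whole-series mean, $m$ is the mean over the short segment, and $d = \bar{x} - m$.

```latex
\begin{align*}
S &= \frac{1}{n}\sum_{t=1}^{n} (x_t - \bar{x})(x_t - \bar{x})^{\top}
  && \text{(ordinary covariance matrix)} \\
M &= \frac{1}{n}\sum_{t=1}^{n} (x_t - m)(x_t - m)^{\top}
  && \text{(matrix actually decomposed)} \\
  &= \frac{1}{n}\sum_{t=1}^{n} \bigl[(x_t - \bar{x}) + d\bigr]\bigl[(x_t - \bar{x}) + d\bigr]^{\top} \\
  &= S + d\,d^{\top}
  && \text{since } \textstyle\sum_{t}(x_t - \bar{x}) = 0 \\
v^{\top} M v &= v^{\top} S v + (v^{\top} d)^{2}
  && \text{for any unit vector } v
\end{align*}
```

So the leading direction maximizes the variance along $v$ plus the squared offset between the two means along $v$. A series whose short-segment mean sits far from its whole-series mean gets extra weight regardless of how much it actually varies.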
This comment about not knowing how to interpret the results, combined with his being an expert on PCA, seems similar to the above referee comments from the Nature review. So I asked him outright on Tamino’s thread.
I should note that in the time between Ian’s first post and my first post, Tamino was embroiled in a detailed conversation with Dr. Jolliffe where he was stating that MM had confused uncentered and decentered PCA in his effort to convince Ian that MM was wrong.
My post —
I am fairly new to the AGW science world but not to science. In my digging regarding PCA, I happened across the anonymous review comments on Climate Audit of a paper submitted to Nature by McIntyre & McKitrick (MM) in 2004. The reviewer’s comments appear to be remarkably similar to your recent post on this blog.
Most notably referee #1 in the first submission states an expertise in PCA. This of course narrows the field considerably.
The comment which really struck me was this
“I think I understand better than before what the MBH98 PCA is doing, namely centering the data about the mean of the 1902-80 period rather than of the whole series. The question is why, and what properties and interpretation does such a procedure have?”
In my admittedly limited understanding of MBH98 PCA this seems to be the same criticism in 2004 that you have posted here. Besides the comments about using a partial The phrasing of the what in the post here on this blog is a big flag.
“I don’t know how to interpret the results when such a strange centring is used? Does anyone? What are you optimising? A peculiar mixture of means and variances?”
So the easy question is –
Were you in fact the reviewer of the MM paper to nature in 04?
and more importantly-
Now that you have had more time, has your position changed regarding the review?
Thanks in advance for your reply.
Well this pissed off Tamino and a couple of his favorite posters, I think more because they had just been dealt a pretty big blow to their arguments than because of the question itself, but it was interesting. I didn’t back down from the criticisms I received, and because of that my questions took up a good chunk of the thread. After some abuse, I relented and re-asked my question, providing links to the paper and the referee comments and asking whether he would state his opinion regarding the Nature commenter.
After some time, Ian replied sounding a bit perturbed but not angry. This is where MM got a bit more of the overdue credit for their pain. Unfortunately, the details of the description are easily lost and the whole paragraph needs to be studied to understand the meaning. There is one paragraph though which stood out to me.
However, if we then go on to interpret the PCs or use them in a non-descriptive way we need to be very careful that we have understood the implications of the underlying covariance structure – for example, might it produce PCs that give undue weight to particular data points, or that emphasise some parts of the structure but hide others that are no less interesting?
He is again discussing how to interpret the results of such an unusual centering method, which is the same criticism made in the MM paper.
I thanked Ian for the reply. Some time later this post came up.
After some thought, I have decided to ‘come out’, for three reasons. The first is that it fairly obvious I was the Nature reviewer and the second is that I’d like to think that when I write a review, there is nothing in it that I can’t defend. A third reason is to warn others about how memory can let you down, especially when you are not as young as you used to be.
I see nothing in the two reviews (Reviewer 1 first submission; Reviewer 2 second submission) that I would change with hindsight. Indeed some of my recent comments are remarkably similar despite not having read through these reviews in detail for 4 years.
Looking back, I was interested to note the chronology. The first review was written in February 2004 and it is clear that I didn’t understand what MBH were doing at that time. The second review was in July 2004 when I said I thought I did understand, but the notorious Powerpoint presentation was in May 2004 when I had yet to see the light.
Now for the scary memory bit. In July 2004 I clearly thought I understood what MBH were doing, though not why or how to interpret it. However, I must have felt that other things, involving less investment of time, were more interesting and moved on. Apart from another reviewing task a year later, I was unaware of the fierce controversy raging, until earlier this year when a co-worker and I started investigating algebraic relationships between PCAs with different centrings. Looking for examples, we revisited MBH, and I was genuinely surprised when my co-worker told me that MBH had not done an uncentred analysis, but something else. I had forgotten what I learnt four years earlier. So I was wrong in saying in an earlier posting ‘it was only fairly recently that I realised the exact nature of decentred PCA’; rather it was a case of being reminded. As I said, scary!
After all of Tamino’s direct attempts to discredit the MM paper, Ian Jolliffe reaffirmed that the PCA method used was in doubt and that its actual meaning was questionable.
It’s not much, but in McIntyre and McKitrick’s world, after years of battling this methodology, they’ll take what they can get.
I’m new to this science though. Not bad but I had hoped for a bit more!
Ian continued his comments on short segment centering today on Tamino OT#6.
In response to chopbox, I’ve said almost everything I can at present regarding short segment centring. I think there are three stages of understanding to go through in the evaluation of any new statistical techniques: understanding the mathematics behind the technique, understanding the technique in statistical terms and understanding how to interpret the results when the technique is applied to real data. I don’t believe that I have progressed in the last four years regarding the second or third of these so there is really nothing to add to my earlier comments. It may be that someone can provide a good explanation of the second stage, but I haven’t seen it yet. It also seems to me that because the second stage hasn’t been thought through properly, there are deficiencies in how the third stage has sometimes been presented, for example in comparing proportion of variation accounted for in short-segment centring with that from ‘ordinary’ PCA, which is like comparing apples with bananas.
As an applied statistician, the second and third stages interest me more than the first, but it crucial to know about the first before going further. Also anyone with a mathematical background takes pleasure in elegant mathematics. For this stage I have learnt something recently mainly from the work of a co-worker, who has derived relationships between the results (eigenvalues, eigenvectors and the PCs themselves) for different types of centring. For various reasons, I won’t be going public with these just yet, but in due course we hope to publish something.
This means to me that we may hear more about what is wrong with the short segment centering in the near future. Should be interesting. Full comments HERE.