Posted by Jeff Id on December 9, 2010
The most pervasive theme in the “Doing It Ourselves” thread was what the reviews said. Rather than respond individually, I thought it would be best [read: less work for me!] to do a post on it.
This post will focus on the comments that required changes to the manuscript. With one exception, I will not spend any time on the comments that we addressed without changes. There are a number of reasons for this. The biggest reason is that several of these comments required only wording changes for clarity as the comments were motivated by misunderstandings. Another reason is that successfully addressed comments have no bearing on the science that will be published.
As I mentioned before, there is one exception that I would like to bring up, because it makes a salient point about how important wording can be in a paper. Quoting from S09:
In this Letter, we use statistical climate-field-reconstruction techniques to obtain a 50-year-long, spatially complete estimate of monthly Antarctic temperature anomalies. In essence, we use the spatial covariance structure of the surface temperature field to guide
interpolation of the sparse but reliable 50-year-long records of 2-m temperature from occupied weather stations.
Now . . . what does this mean?
Depending on how much you know about the S09 method, this could be interpreted in a number of ways. If you know quite a bit, it could be interpreted as combining infilled station data with the AVHRR spatial eigenvectors. This would be the mathematically correct interpretation (well, almost correct). If you know less, it might be interpreted as using the AVHRR spatial structure to help predict ground data. This would not be a mathematically correct interpretation. During the review process, one reviewer took the latter interpretation, and generated this comment:
. . . S09’s methodology is less sensitive to errors in the ground stations than is RO10’s because the former uses information from the AVHRR data when infilling the ground stations, while RO10’s does not. This does not necessarily mean that S09’s methodology is superior: RO10 provides a sound argument as to why the use of the AVHRR data to help infill the ground-station data may possibly be problematic . . .
This is easily shown to be the improper interpretation. Our response:
The claim by the reviewer that S09’s methods are less sensitive to the quality of the ground data is not accurate, and the reasoning given is also inaccurate. In S09, the number of ground stations (42) overwhelms the contribution from the PCs (3). One can test this by splitting S09 into 2 steps: first infill the ground stations, and then add the PCs. If this is done, the following results are obtained:
                 West Antarctica    Peninsula        East Antarctica    Continent
Original S09     0.20 +/- 0.09      0.13 +/- 0.05    0.10 +/- 0.10      0.12 +/- 0.09
2-Step S09       0.19 +/- 0.09      0.13 +/- 0.05    0.10 +/- 0.10      0.12 +/- 0.09
As you can see, the results are virtually identical.
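The logic of this check can be reproduced on synthetic data. Below is a minimal sketch, with made-up numbers and a plain least-squares infill standing in for RegEM, of why 42 station predictors overwhelm 3 PC predictors: infilling a station with and without the PCs barely changes the resulting trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_stn, n_pc = 600, 42, 3            # months, stations, satellite PCs

# One shared climate signal; stations and PCs both observe it with noise.
signal = np.cumsum(rng.normal(0, 0.1, n_t))
stations = signal[:, None] + rng.normal(0, 0.5, (n_t, n_stn))
pcs = signal[:, None] + rng.normal(0, 0.5, (n_t, n_pc))

# Pretend the last 200 months of one station are missing.
target = stations[:, 0]
obs, miss = slice(0, 400), slice(400, 600)

def infill(predictors):
    """Least-squares infill of the target's missing span from predictors."""
    beta, *_ = np.linalg.lstsq(predictors[obs], target[obs], rcond=None)
    filled = target.copy()
    filled[miss] = predictors[miss] @ beta
    return filled

def trend(y):
    return np.polyfit(np.arange(len(y)), y, 1)[0]

with_pcs = infill(np.hstack([stations[:, 1:], pcs]))   # 41 stations + 3 PCs
stations_only = infill(stations[:, 1:])                # 41 stations alone

# With 41 station predictors available, adding 3 PCs barely moves the trend.
```

This is only an analogy for the two-step test quoted above, not the actual calculation, but it illustrates why the PC contribution to the infilling is swamped by the stations.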
The primary point from all of this is that clarity matters a great deal. Note that S09 do not claim in the paper that using the PCs to help predict missing ground station data makes the reconstruction less sensitive to errors in the ground station data. However, because the description of the method is not entirely clear, the statement that actually appears in S09 can easily be read as a claim of improved performance.
Okay, with that finished, let’s move on to the review comments that elicited changes. One group of comments, which I will simply list out, were minor editorial changes, which included (with like comments or issues that were repeated more than once in the text combined):
1. Inconsistent mathematical notation (several spots)
2. Inconsistent citations (e.g., North, 1982 instead of the correct North et al., 1982)
3. Remove a reference to RealClimate
4. Remove discussions concerning GCMs (none of us agreed with this, but as it was not relevant to the main point of the paper, we yielded on this one)
5. Make the abstract consistent with the main text (e.g., note that statistically significant warming is found for the West Antarctic regional average in the abstract)
6. Explain briefly how our criticisms apply to all of the S09 reconstructions (TIR, AWS, and standard PCA)
7. State whether confidence intervals took into account a degrees-of-freedom reduction due to serial correlation of the residuals
8. Change the title to more appropriately reflect the scope of the paper (the original title was “Deconstructing the Steig et al. Antarctic Temperature Reconstruction”)
9. Move as much of the relevant information from the Supporting Information to the main text, and keep the Supporting Information to a manageable length (originally, the Supporting Information was double the length of the paper)
10. Specify whether the residual trend between the raw AVHRR data and ground data was statistically significant
11. Keep the same style (i.e., active or passive voice) throughout the paper, rather than switching styles between paragraphs
12. Clarify the description of the S09 method to prevent confusion
13. Clarify the section on using RegEM to attempt to calibrate unlike variables (several portions needed clarity)
14. Rewrite the section on the difference between the AVHRR eigenvector weighting and the weighting used in RegEM (all of the original 3 reviewers had similar comments concerning the difficulty they had understanding this section)
15. Clarify the explanation of how the S09 method geographically relocates the Peninsula trend
16. Clarify the equations used to generate summary statistics listed in the tables
17. Fix several improper table reference numbers (i.e. “Table 5” when what was meant was “Table 6”)
18. Add more substantial justification for the choice of regularization parameter in the RLS method
19. Add a table showing seasonal trend results
20. Changes to figures:
a. Move the replication of S09 figure to the Supporting Information
b. Move the area definitions from the SI to the main text, and add station locations
c. Include a figure visually demonstrating the claim that “variance loss in our reconstructions is small”
d. Move the seasonal trend maps from the SI to the main text
e. Show boundaries of statistically significant trends on the trend maps
f. Include a figure showing statistically significant differences in trend between RO10 and S09 to visually demonstrate the claims of differences in the text
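Item 7 in the list above refers to a standard adjustment. A minimal sketch of the usual AR(1) correction, where the sample size in the trend confidence interval is replaced by an effective sample size, is below (the paper’s exact treatment may differ; the z-based interval is an assumption for illustration):

```python
import numpy as np

def trend_with_ci(y, z=1.96):
    """OLS trend plus an approximate 95% confidence half-width adjusted
    for serial correlation: the sample size is replaced by the effective
    size n_eff = n * (1 - r1) / (1 + r1), where r1 is the lag-1
    autocorrelation of the residuals (the usual AR(1) correction)."""
    n = len(y)
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    n_eff = n * (1 - r1) / (1 + r1)
    # Slope standard error, with n_eff in place of n in the residual variance.
    se = np.sqrt(np.sum(resid**2) / (n_eff - 2) / np.sum((t - t.mean())**2))
    return slope, z * se
```

For residuals with lag-1 autocorrelation around 0.6, this roughly doubles the half-width relative to the naive white-noise formula.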
Quite a laundry list of changes . . . and remember, I combined a lot of them. These were the kinds of things that are indeed important, but that “blog reviews” generally do not catch. Peer review, on the other hand, did a good job in requiring that they were fixed.
So those were the minor changes. Now for the major ones!
MAJOR CHANGE #1
The most substantial change (which was due to the reviewer who generated the 88 pages of back-and-forth commentary) – and probably the most important (must give credit where credit is due!) – was for us to remove our primary reconstructions from the main text.
The initial version of the paper used reconstructions where we infilled the missing ground station data using RegEM TTLS. We chose this route to have reconstructions that were as close as possible to S09’s method. However, the results are strongly dependent on the truncation parameter used. To support our choice of truncation parameter, we spent a good deal of time in the paper explaining a rather comprehensive cross-validation method and several alternative methods that all arrived at the same result. This reduced the amount of time that we could spend actually discussing the results and added some confusion by referencing a bunch of different procedures.
During the review process, we provided data from RegEM Ridge showing that ridge regression provided the same general results, but with much improved verification statistics and superior reproduction of the LF information in some key areas (such as the Peninsula and West Antarctica). The reviewer suggested that, as we believed the ridge regression results to be superior, these should be the ones shown in the main text.
This resulted in a major rewrite of the paper, and a complete re-do of all of the calculations, figures, and tables (I spent about 3 weeks on it). The entire “Results” section had to be thrown out and rewritten from scratch, the code required some major rewrites, much of the commentary about truncation parameters no longer applied, and the cross-validation method had changed.
I won’t go into detail here about the differences between RegEM TTLS and Ridge – you’ll have to wait for the published paper for that!
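Without getting into the paper’s analysis, the generic distinction between the two regularizations can be sketched on a toy ill-conditioned system: truncation keeps or discards whole SVD components at an integer rank, while ridge shrinks every component smoothly with a continuous parameter. (This is only the regularization contrast, not RegEM itself.)

```python
import numpy as np

rng = np.random.default_rng(1)
# A toy ill-conditioned design matrix (singular values spanning ~3 decades).
A = rng.normal(size=(100, 20)) @ np.diag(np.logspace(0, -3, 20))
b = A @ rng.normal(size=20) + rng.normal(0, 0.1, 100)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def tsvd(k):
    """Truncated-SVD solution: keep the k largest singular values and
    discard the rest entirely -- a hard cutoff, analogous to the
    truncation parameter in TTLS."""
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

def ridge(lam):
    """Ridge solution: shrink each SVD component by s/(s^2 + lam) -- a
    smooth filter controlled by a continuous parameter instead of an
    integer rank."""
    return Vt.T @ ((s / (s**2 + lam)) * (U.T @ b))
```

The practical consequence is that results from the truncated solver can jump discontinuously as the rank changes, whereas the ridge solution varies smoothly with its parameter.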
MAJOR CHANGE #2
The second substantial change was something we instituted on our own based on cross-validation concerns generated by the same reviewer. While all of us felt that these concerns were poorly justified, demonstrating this required a whole new set of cross-validation statistics. Since the new calculations provided stronger evidence that our concerns with the S09 method were valid, we removed the old cross-validation methods from the text and code and substituted the new.
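Cross-validation for reconstructions of this kind is typically scored with RE (Reduction of Error) and CE (Coefficient of Efficiency) statistics on withheld series. A minimal sketch of those two scores follows; the exact statistics used in the paper may differ in detail:

```python
import numpy as np

def re_ce(obs_verif, pred_verif, calib_mean):
    """RE and CE verification scores for a withheld series. RE benchmarks
    the reconstruction against the calibration-period mean; CE benchmarks
    it against the verification-period mean. Both equal 1 for a perfect
    reconstruction and drop to 0 or below when the reconstruction is no
    better than the benchmark."""
    sse = np.sum((obs_verif - pred_verif) ** 2)
    re = 1 - sse / np.sum((obs_verif - calib_mean) ** 2)
    ce = 1 - sse / np.sum((obs_verif - obs_verif.mean()) ** 2)
    return re, ce
```

CE is the stricter of the two, since the verification-period mean is usually a harder benchmark to beat than the calibration-period mean.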
MAJOR CHANGE #3
Another rather important change – which was requested by 2 of the 3 initial reviewers – was to detail the effects of each of the proposed modifications. For this, I will quote excerpts from our review responses.
Effect of including additional satellite eigenvectors alone
In the full period, this variant captures some (but not all) of the features in the RO10 reconstructions. Those features captured are the reduced warming in the Ross region as compared to S09 and better localization of the Peninsula trends. Absent are the prominent Ross, South Pole, and Weddell area cooling. Additionally, the continent-wide trend is much closer to S09 (0.010) than RO10 (0.05).
More significant differences are apparent in the subperiods. The 1957 – 1981 plot looks far closer to the equivalent S09 subperiod than the RO10 reconstructions. The Ross cooling is reduced, the pole is warming instead of cooling, and the strong warming in Victoria/Wilkes Land is absent. Given that the latter two features are in well-observed regions of the continent and match ground records, their absence is significant.
The 1982 – 2006 plot is also substantially different, as it is merely the truncated, but otherwise unaltered, AVHRR data. This is a crucial observation: if the regression coefficients were directly compatible with the AVHRR eigenvector weights, then using the modeled PCs could not greatly alter the patterns of this subperiod.
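Why the satellite-era plot reduces to truncated AVHRR data follows directly from how a truncated reconstruction is assembled. A toy sketch (generic truncated-SVD decomposition, not the paper’s code):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 50))        # toy "AVHRR" field: time x grid cells

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
pcs = U[:, :k] * s[:k]                # principal components (time series)
eofs = Vt[:k]                         # spatial eigenvectors

# If the "modeled" PCs equal the raw PCs over the satellite era, then the
# gridded reconstruction there is exactly the rank-k truncation of the
# original field -- the patterns cannot differ from truncated AVHRR data
# unless the PCs themselves are altered.
recon = pcs @ eofs
trunc = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
```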
Effect of additional satellite eigenvectors + constraining the regression coefficients by the eigenvector weights (i.e., add a constraint that prohibits “negative thermometers”):
In the full period, [constraining by the AVHRR eigenvectors] provides patterns that are very similar to the RO10 reconstructions. Most of the essential features are captured, albeit with the Weddell and South Pole areas showing less cooling than RO10. While the spatial patterns are reasonably well represented, as noted in the response to the previous problem, this does not extend to the overall magnitude. This variant captures only 2/3 of the difference in the continental trends, leaving a substantial portion unaccounted for.
In the subperiods, the patterns remain significantly different from RO10. As Mod 3 only affects the 1982 – 2006 period, the 1957 – 1981 plot is unchanged and retains the same deficiencies noted in Variant 1. The 1982 – 2006 plot, on the other hand, looks substantially different from both Variant 1 and the RO10 reconstructions. It is clear that properly calibrating the PCs has a significant impact on the spatial distribution of trends. This confirms the statements in our text that the coefficients used to predict the PCs differ materially from the weights used to recover gridded estimates, and shows that the reviewer’s belief that use of the modeled PCs has little impact on the spatial patterns is not correct.
Furthermore, the 1982 – 2006 plot is missing all of the essential features of the RO10 reconstructions. It shows a visibly apparent loss of variance, displays a large cooling region in the Ross area, and is missing the Victoria/Wilkes Land and Weddell area cooling.
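The “negative thermometers” phrase refers to a regression handing a station a negative weight, so that warming at that station pushes the reconstruction cooler. A generic illustration of a non-negativity constraint is sketched below using SciPy’s NNLS solver; this is only an analogy, since the actual RO10 constraint ties the regression coefficients to the eigenvector weights rather than simply forcing them non-negative.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
# Toy: one "grid cell" predicted from 10 correlated "station" series.
common = rng.normal(size=300)
stations = common[:, None] + 0.7 * rng.normal(size=(300, 10))
cell = common + 0.3 * rng.normal(size=300)

w_ols, *_ = np.linalg.lstsq(stations, cell, rcond=None)  # unconstrained
w_nn, _ = nnls(stations, cell)                           # weights >= 0

# Unconstrained least squares is free to assign a station a negative
# weight (a "negative thermometer"); the constrained fit cannot, while
# still tracking the target well.
```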
Effect of constraining the regression coefficients by the eigenvector weights, properly calibrating the PCs, and using only 5 eigenvectors:
With only 5 PCs but including [use of the properly calibrated PCs and physical weighting constraints], most of the essential spatial features of the RO10 reconstructions are present, both in the 1957 – 2006 period and in the subperiods. Though there is visually apparent variance loss between these reconstructions and RO10 – and the warming in Victoria Land near Cape Adams is significantly reduced in the 1957 – 1981 period – the overall pattern is close to the RO10 reconstructions.
It is clear that additional eigenvectors alone cannot account for the spatial differences between S09 and RO10. The same is true of the combination of additional eigenvectors and [use of the properly calibrated PCs]. Furthermore, the dependence on the number of retained eigenvectors is less than implied by the reviewer, as most of the essential features of RO10 are reproduced with as few as 5 retained eigenvectors.
Five eigenvectors, properly calibrated PCs, but no physical weighting constraints:
This variant demonstrates the significant impact of [the weighting constraints]. While most of the full period features are captured in this reconstruction, the subperiods are clearly different from both [the previous variant] and the RO10 reconstructions. In particular, without [weighting constraints], the 1957 – 1981 and 1982 – 2006 subperiods are virtually identical, with the exception that the latter displays muted trends.
To address the concern that the contribution of each modification is not documented, we have amended the text to include the table at the beginning of this discussion.
MAJOR CHANGE #4
The final major change – and one that I was quite reluctant to part with – was to remove the discussion of Chladni (i.e., standing-wave) patterns in the eigenvectors. However, since we could not spend much time developing this argument (most of the relevant material was only in the SI), the section did seem somewhat out of place. So we yielded on this issue and removed it. That said, the discussion of Chladni patterns is of critical importance to valid methods of choosing truncation parameters in general.
We therefore intend to write a standalone paper dealing with this issue (work on that has not yet begun).
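The core of the Chladni issue is easy to demonstrate on a toy example: eigenvectors of a field with simple spatial autocorrelation come out as standing-wave patterns whose shapes are set by the domain geometry, not by any physical mode. A 1-D sketch (an assumed exponential correlation, nothing to do with the actual AVHRR field):

```python
import numpy as np

# Exponential spatial covariance on a 1-D transect: even pure correlated
# noise yields eigenvectors that look like standing waves.
x = np.linspace(0.0, 1.0, 200)
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
vals, vecs = np.linalg.eigh(cov)
order = np.argsort(vals)[::-1]
vecs = vecs[:, order]                 # sort by descending eigenvalue

def zero_crossings(v):
    return int(np.sum(np.diff(np.sign(v)) != 0))

# The k-th eigenvector has k sign changes -- the nodal structure of a
# standing wave (Chladni-like pattern) on the domain.
```

This is why retaining or discarding eigenvectors by visual “physical plausibility” is hazardous: spatial autocorrelation alone can manufacture plausible-looking patterns.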
Edit: Added from below
There are three ways in which the reviewers requested/proposed modifications:
1. Present a reasoned, substantiated argument that something we had said was incorrect, incomplete, or not sufficiently detailed.
2. Request that additional information be provided as it would be useful for readers.
3. Make unsubstantiated, hand-waving claims that something is wrong.
For the “laundry list” of minor items presented above, most fell into #1 or #2, and were what I would consider part of the “typical” review process. These primarily came from two of the initial reviewers and from the fourth reviewer, who was added later. The “MAJOR CHANGES” listed above all came as a result of comments that fell into category #3.
In the specific case of the effect of the modifications, one of the reviewers suggested that a table or discussion of the importance of each would be valuable to the reader. This, of course, would have been easy to comply with and would not have required an inordinate amount of discussion. Another reviewer – rather than asking what the effects of the individual modifications were – made several claims that were not substantiated in any way:
1. The difference in patterns of trends was due “almost entirely” to the use of more AVHRR eigenvectors
2. The difference in magnitude of trends was due to the use of the calibrated, modeled PCs
So rather than simply add a table and discussion concerning the effects of the modifications, I had to spend 5+ pages demonstrating that the hand-waving claims of the reviewer were incorrect. I had to spend this time because the reviewer used these claims to support yet another: that we had not shown that the Peninsula trends were geographically relocated by the S09 method, and that the reduction in the continental trend was due to an arbitrary choice concerning the PCs rather than to any mathematical requirement or objective criterion. This resulted in a great deal of extra work for us, an unnecessary delay, and an overly long response.
While the end result was that the paper was improved, a great deal of the work required to implement the improvement was, in my opinion, valueless.
The frustrating (and unnecessary) part of the review process was the sheer number of completely unsubstantiated claims that we ended up having to show were groundless. In my opinion, it is perfectly acceptable for a reviewer to request additional information or additional research to support the conclusions in a paper. What should not be acceptable is for a reviewer to force the authors to respond to claims for which the reviewer presents no evidence. The former merely requires the authors to perform value-added activities; the latter requires them to perform a heap of extra, non-value-added work addressing unsubstantiated hypotheses. Just as authors are required to show objective evidence for their claims, so should reviewers, as reviewers can affect whether a paper is published or rejected.
So while it is true that in the end the paper contains stronger evidence for our conclusions than it did in the beginning, I question whether the amount of effort required was justified. One can spend five years sanding and re-lacquering a table to make it “better” than a 3-day refinishing job, but when 99.9999999% of the people who see it can’t tell the difference and the ones who can don’t really care, was the extra 4.99 years of effort worth it?
Anyway, the biggest issue I had with the reviews was that one particular reviewer insisted on making claims – which we had to rebut – yet rarely provided evidence that these claims were true. In my mind, that is not how the process is supposed to work.