Do “Adjustments” Lead to a Warming Bias?
Posted by Jeff Id on January 20, 2010
I keep telling you guys the guest posts are great, check it out.
Guest post by Dr. Craig Loehle
Recent news items have shown that adjustments to climate data seem to create warming trends where none exists in the raw data. The Darwin case evaluated by Willis Eschenbach (http://wattsupwiththat.com/2009/12/08/the-smoking-gun-at-darwin-zero/) shows the recent adjusted data to be tilted upwards. Willis also had another post about Karlen’s comments on IPCC data (http://wattsupwiththat.com/2009/11/29/when-results-go-bad/). An analysis of 88 stations in Siberia shows a cooling trend for the raw data but a big warming after adjustments by Hadley (http://heliogenic.blogspot.com/2009/12/adjusting-siberia.html). Data from New Zealand are adjusted by their own weather service to create strong warming where none exists in the raw data (http://www.climateconversation.wordshine.co.nz/docs/awfw/are-we-feeling-warmer-yet.htm). In none of these cases is a rationale for the adjustments publicly available. It seems implausible on its face that all corrections for inhomogeneities should lead to a warming bias, especially since correcting for the urban heat island (UHI) effect would cool recent years rather than warm them.
I have come across a paper that may help explain how Hadley, GHCN, and GISS do their adjustments and where the warming trends come from. The paper is:
Peterson, T.C., D.R. Easterling, T.R. Karl, P. Groisman, N. Nicholls, N. Plummer, S. Torok, I. Auer, R. Boehm, D. Gullett, L. Vincent, R. Heino, H. Tuomenvirta, O. Mestre, T. Szentimrey, J. Salinger, E.J. Førland, I. Hanssen-Bauer, H. Alexandersson, P. Jones, and D. Parker. 1998. Homogeneity adjustments of in situ atmospheric climate data: A review. International Journal of Climatology 18:1493‑1517.
I include 2 sections of text. 4.2.2 describes what the GHCN does, and then 3.4.4 describes the method they use for detecting inhomogeneities and adjusting them. There are other methods described in the paper which will have a similar effect, with some methods correcting slopes as well as means.
4.2.2. Global Historical Climatology Network. The Global Historical Climatology Network (GHCN; Peterson and Vose, 1997) includes data sets of both original and homogeneity-adjusted time series of mean monthly maximum, minimum, and mean temperature. Because of the paucity of available station history information for many of the 7280 GHCN temperature stations around the world, no metadata beyond latitude and longitude are used in the GHCN adjustment methodology. First a reference series for each station is made as described in Section 3.2 and Peterson and Easterling (1994) then the candidate-reference series is tested using a 2-phase regression technique described in Section 3.4.4 and Easterling and Peterson (1995a,b). This determines the date and the magnitude of the adjustments.
3.4.4. Two-phase regression. Solow (1987) described a technique for detecting a change in the trend of a time series by identifying the change point in a two-phase regression where the regression lines before and after the year being tested were constrained to meet at that point. Since changes in instruments can cause step changes, Easterling and Peterson (1995a,b) developed a variation on the two-phase regression in which the regression lines were not constrained to meet and where a linear regression is fitted to the part of the (candidate − reference) difference series before the year being tested and another after the year being tested. This test is repeated for all years of the time series (with a minimum of 5 years in each section), and the year with the lowest residual sum of the squares is considered the year of a potential discontinuity. A residual sum of the squares from a single regression through the entire time series is also calculated. The significance of the two phase fit is tested with (i) a likelihood ratio statistic using the two residual sums of the squares and (ii) the difference in the means of the difference series before and after the discontinuity as evaluated by the Student’s t-test. If the discontinuity is determined to be significant, the time series is subdivided into two at that year. Each of these smaller sections are similarly tested. This subdividing process continues until no significant discontinuities are found or the time series are too short to test (<10 years). Each of the discontinuities that have been identified are further tested using a multiresponse permutation procedure (MRPP; Mielke, 1991). The MRPP test is non-parametric and compares the Euclidean distances between members within each group with the distances between all members from both groups, to return a probability that two groups more different could occur by random chance alone.
The two groups are the 12-year windows on either side of the discontinuity, though the window is truncated at a second potential discontinuity. If the discontinuity is significant at the 95% level (probability (P) ≤ 0.05), it is considered a true discontinuity. The adjustment that is applied to all data points prior to the discontinuity is the difference in the means of the (station − reference) difference series’ two windows.
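The breakpoint search and adjustment quoted above can be illustrated with a minimal Python sketch. This is not the GHCN code: the function names (`line_rss`, `find_breakpoint`, `adjust_before_break`) and the synthetic step series are my own assumptions for illustration, the sign convention for the offset is assumed, and the significance tests (likelihood ratio, t-test, MRPP) and the recursive subdivision are left out.

```python
from statistics import mean

def line_rss(x, y):
    """Residual sum of squares of the least-squares line through (x, y)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))

def find_breakpoint(years, diff, min_len=5):
    """Unconstrained two-phase regression on a difference series.

    Tries every split that leaves at least min_len years on each side and
    returns (first year of the later segment, two-phase RSS, single-line RSS).
    """
    best_k, best_rss = None, float("inf")
    for k in range(min_len, len(years) - min_len + 1):
        rss = line_rss(years[:k], diff[:k]) + line_rss(years[k:], diff[k:])
        if rss < best_rss:
            best_k, best_rss = k, rss
    if best_k is None:
        raise ValueError("series too short to split")
    return years[best_k], best_rss, line_rss(years, diff)

def adjust_before_break(years, station, diff, break_year, window=12):
    """Shift every station value before break_year by the difference in the
    means of the difference series in the windows either side of the break.
    (Assumed sign convention: later-window mean minus earlier-window mean.)"""
    before = [d for y, d in zip(years, diff) if break_year - window <= y < break_year]
    after = [d for y, d in zip(years, diff) if break_year <= y < break_year + window]
    offset = mean(after) - mean(before)
    adjusted = [s + offset if y < break_year else s for y, s in zip(years, station)]
    return adjusted, offset

# Hypothetical example: a station that reads 1.0 degree warm before 1975.
years = list(range(1950, 2000))
reference = [0.0] * len(years)
station = [1.0 if y < 1975 else 0.0 for y in years]
diff = [s - r for s, r in zip(station, reference)]

break_year, rss_two, rss_one = find_breakpoint(years, diff)
adjusted, offset = adjust_before_break(years, station, diff, break_year)
```

Note that in this sketch, as in the quoted description, only the years before the break are changed; the most recent segment is treated as the fixed baseline.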
Figure 8. Annual temperature time series for Carlsbad, New Mexico. Solid line is unadjusted data. Long dash is the Jones homogeneity-adjusted data and the short dash is the U.S. HCN adjusted data for the same station. Despite very different methodologies and perhaps some differences in the source data, the long-term trends for the two adjusted time series are in good agreement.
Bold above is my emphasis. Figure 8 shows an application of their method (their caption), with Figure 9 being similar. The recent period is unchanged, and the past years are dropped 1.4 Deg. C in a series that arguably shows no trend. There are two things to note: adjustments are applied only to the period prior to the “discontinuity”, and the algorithm will force any earlier period that is warmer than the reference period to become cooler. If there is a wave in the data, such as a warm mid-century period, for whatever reason, it is likely to be identified as a discontinuity and flattened out. This algorithm might detect true discontinuities, but it is applied in an automated fashion to the global data by GHCN. It would seem from the figures that the Hadley algorithm is nearly identical. In their Figure 9, they match the breakpoint to a plausible station move for Spokane, WA, but offer no such explanation for Figure 8 (Carlsbad, NM).
This raises some questions: Is this still how they do it? How often is this type of adjustment done in the global dataset? Can anyone find it in the CRU code? Also, note that in data sparse regions, even a very few adjustments could have a big impact on regional contributions to global trends.
RomanM has an analysis (http://statpad.wordpress.com/2009/12/12/ghcn-and-adjustment-trends/) that bears on the adjustment question. He asked a very clever question: how do the mean adjustments in GHCN line up over time? His result shows the mean annual adjustment by year for the global data:
Between 1900 and 1990, there is a linear trend of more and more negative adjustments for older dates. Corrections for UHI would go the other way (more cooling for recent dates or warming of older dates). It is difficult to find any records in which a UHI correction looks like it has been applied. To me it looks like 1990 is a pivot point. Roman’s result above is consistent with applying the breakpoint algorithm beginning in the most recent years as the “valid” period and tipping down earlier data from the pivot point in 1990, which would make older dates progressively cooler. This is also consistent with the analysis of the USHCN database (R.C. Balling and C.D. Idso. 2002. Analysis of adjustments to the United States Historical Climatology Network (USHCN) temperature database. Geophysical Research Letters 10.1029/2002GL014825) which found a linear trend of progressive cooling of older dates (RAW-FILNET) between 1930 and 1995 that is even more pronounced than in Roman’s global analysis. A recent post on TheAirVent (https://noconsensus.wordpress.com/2010/01/19/long-record-ghcn-analysis/) shows a similar result to Roman’s using a different way of interrogating the data:
Note that this plot starts in 1900 and is a cumulative plot rather than yearly.
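For anyone who wants to repeat this kind of check, the per-year mean adjustment can be computed roughly as follows. This is only a sketch of the idea, not RomanM's code: the function name `mean_adjustment_by_year`, the dictionary layout (station id mapped to a year-to-temperature table) and the two toy stations are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_adjustment_by_year(raw, adjusted):
    """Average (adjusted - raw) over all stations, year by year.

    raw and adjusted map station id -> {year: temperature}. Only
    station-years present in both data sets are used.
    """
    per_year = defaultdict(list)
    for station_id, raw_series in raw.items():
        adj_series = adjusted.get(station_id, {})
        for year, raw_value in raw_series.items():
            if year in adj_series:
                per_year[year].append(adj_series[year] - raw_value)
    return {year: mean(values) for year, values in sorted(per_year.items())}

# Hypothetical two-station example: early years lowered, recent years untouched.
raw = {"A": {1900: 10.0, 1990: 11.0}, "B": {1900: 5.0, 1990: 6.0}}
adjusted = {"A": {1900: 9.5, 1990: 11.0}, "B": {1900: 4.5, 1990: 6.0}}
result = mean_adjustment_by_year(raw, adjusted)
```

A negative value for an early year means that the adjustments, on average, cooled that year relative to the raw record.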
A valid test of this adjustment algorithm would be to test selections of data, see what “discontinuities” are identified, and then evaluate the metadata to see how often they correctly identify true station moves or instrument changes. In one of the citations they use as authority for the method (Vincent, L.A. 1998. A technique for the identification of inhomogeneities in Canadian temperature series. J. Climate 11:1094-1104) simple simulated climate data were used (without any major local trends such as a mid-century warming) and the algorithm found spurious inhomogeneities in 13.6% of cases. One of the reasons for this false-positive problem is, I believe, that the reference series compiled from nearby stations will be more uniform in time (smoother) than the test series, since it is merged data based on series that are mutually highly correlated. This will make it more likely to find false discontinuities. If only this percentage (13.6%) of stations were adjusted as in Fig. 8 above, then the overall cooling of early dates for a global dataset would be 0.19 Deg. C, which closely matches what RomanM found above. For real climate data (not simulated) I would guess that even more false breakpoints would be identified by this algorithm. Ideally, breakpoints found by the algorithm should be compared against metadata, but this is not done for the global analysis, which is automated. The Vincent paper is the only one I could find so far that tests this approach to doing adjustments for inhomogeneities. Other papers only discuss applying the adjustments. Note that for the non-US sites, the TOBS and other adjustments for known issues cannot be applied due to lack of metadata, nor is it likely Hadley even applies these to US sites (though I don’t know this for sure).
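The 0.19 Deg. C figure above is simple arithmetic. It assumes, as a simplification, that every falsely flagged station has its early record lowered by the full 1.4 Deg. C seen in Fig. 8 and that all other stations are left untouched. The symbols f (fraction of stations adjusted) and δ (size of the step) are labels introduced here for illustration only:

```latex
\Delta T_{\text{early}} \approx f \times \delta
  = 0.136 \times 1.4\,^{\circ}\mathrm{C}
  \approx 0.19\,^{\circ}\mathrm{C}
```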
The above results point to either a conceptual problem with the automated adjustments or a software implementation problem. It also appears that the algorithm is adjusting slopes, whereas adjusting only the means is what would be appropriate for a station move or instrument change. When falsely identified breakpoints are frequent enough to produce false trends of the order of 0.25 deg C/Century (30% of the purported warming), an automated adjustment method does not seem advisable.