This is a guest post by NicL in which he examines the effects of a variety of synthetic data on the Antarctic reconstruction. Nic has an interesting approach to his analysis, which is always a good thing when looking at a complex algorithm. I’ve uploaded a Word document at the bottom which contains the original formatting and code for this post.

### ===================================================

UPDATE: Due to some new developments in the R version of RegEM used in this post, the results have been updated in a paper by Nic here. Although the result has improved, it has changed very little and the conclusions are unchanged. A link to the revised post is here.


### The effect of surface station trends on Steig’s reconstruction of Antarctica temperature trends

In this article, I aim to do three things: to show that RegEM, the Regularized Expectation Maximization algorithm used by Steig in its TTLS (truncated total least squares) variant, is highly sensitive to trends contained in its input data, even when the trending data series have little or no relationship with each other apart from both having trends; to propose a method of counteracting this sensitivity and to evaluate it; and, having demonstrated that the method works well, to show that using it produces much lower average reconstruction trends than Steig’s method. But before I start, I would like to set readers a puzzle. Which surface station has the most impact on the average 1957-2006 trend per Steig’s reconstruction, and how much impact does it have? The answer, which I think will surprise most people and which casts further doubt on Steig’s results, is given in the last section of this article.

### The sensitivity of RegEM to trends in the data

In the context of Steig’s RegEM-based reconstruction of Antarctica temperatures from 1957-2006, which uses satellite AVHRR data from 1982 on, there has been considerable discussion of the likelihood that RegEM may impute a long-term trend from one data series to another based purely on short-term (high-frequency) correlation between the two data series. I believe that is indeed a major concern. It is entirely possible for the temperatures at two sites to be affected similarly by short-term factors but to exhibit entirely different trends over the long term. If, say, temperatures in the Antarctic peninsula exhibited a high-frequency correlation with the data series representing the satellite temperature measurements, local temperature trends in the peninsula arising from oceanic causes could distort the satellite-data-based reconstruction of Antarctica average temperature trends.
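
To make the concern concrete, here is a minimal toy sketch in Python. It is not RegEM and not Steig’s method: it swaps in a plain least-squares infill, and every name and number in it (`simulate`, `trend_x`, the noise levels, the 600-month span with the first 300 months missing) is an assumption chosen purely for illustration. Series `x` has a trend, series `y_true` has none, and the two share only high-frequency noise. Infilling the missing early part of `y` from `x`, using a relationship fitted on the later overlap period, hands `y` a long-term trend it never had.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def simulate(n_months=600, n_missing=300, trend_x=0.002, seed=0):
    """Toy example: all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    t = list(range(n_months))
    # High-frequency variation common to both sites.
    shared = [rng.gauss(0.0, 1.0) for _ in t]
    # x: shared noise plus a linear trend; y_true: shared noise, no trend.
    x = [trend_x * ti + s + rng.gauss(0.0, 0.2) for ti, s in zip(t, shared)]
    y_true = [s + rng.gauss(0.0, 0.2) for s in shared]
    # Fit y on x using only the overlap period (y is "unobserved" before it).
    a, b = ols(x[n_missing:], y_true[n_missing:])
    # Infill the early period from x, then splice on the observed values.
    y_infilled = [a + b * xi for xi in x[:n_missing]] + y_true[n_missing:]
    return {
        "slope_y_on_x": b,
        "true_trend": ols(t, y_true)[1],
        "infilled_trend": ols(t, y_infilled)[1],
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.5f}")
```

In this toy setup the fitted slope of `y` on `x` is driven almost entirely by the shared short-term noise, yet the infilled series ends up with a clearly positive full-period trend while the true series has essentially none.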

However, I have seen little discussion or investigation of the sensitivity of RegEM to data series that exhibit long-term trends but which are not otherwise correlated in any significant way with the remaining data series, that is, where high-frequency correlations are negligible. I have therefore carried out some investigation into this subject. In order to do so, I first modified Steve M’s R-port of the regem_pttls function so that it would, inter alia, produce and report the RegEM TTLS solution. The solution is reported in terms of principal component (PC) time series (equal in number to the regularization parameter regpar) and weights on those PCs for each of the data series, from which the “unspliced trend” can be derived.
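
As a rough illustration of how a trend can be derived from a PC-plus-weights solution, here is a small Python sketch. It assumes that the unspliced series for a station is simply the weighted sum of the PC time series, with no actual data spliced in; that is my reading for illustration only. The function names (`ols_slope`, `unspliced_series`, `unspliced_trend`) are hypothetical and are not taken from the modified regem_pttls code. Because a least-squares trend is linear in the data, the trend of the weighted sum equals the weighted sum of the PC trends.

```python
def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sty = sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y))
    stt = sum((t - t_mean) ** 2 for t in range(n))
    return sty / stt


def unspliced_series(pcs, weights):
    """Weighted sum of PC time series (assumed form of the unspliced series).

    pcs:     list of PC time series, all of equal length (one per PC)
    weights: one weight per PC for the data series of interest
    """
    n = len(pcs[0])
    return [sum(w * pc[t] for w, pc in zip(weights, pcs)) for t in range(n)]


def unspliced_trend(pcs, weights):
    """Trend of the weighted sum, computed directly from the PC trends."""
    return sum(w * ols_slope(pc) for w, pc in zip(weights, pcs))
```

Under that assumption, only the regpar PC trends and the per-series weights are needed to obtain every series’ unspliced trend, without rebuilding each series in full.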