PART I (West Antarctica)

**IMPORTANT NOTE: The code has been modified to extract station-by-station cross-validation results for TTLS. If you are using the code, please download the new copy RO10 Code.txt and use that. The data is here: internal.cv.Rdata.** [I have added a .doc extension to each link for wordpress compatibility; just delete the .doc and the files are ready to use. - Jeff]

One of the issues brought up by Steig about our reconstructions involves a parameter known as “kgnd”. Steig’s argument is that this parameter was chosen improperly for our TTLS reconstructions in the SI (it is not a parameter for the main iRidge reconstructions), and that, if a different value is chosen, our SI reconstructions match S09 better. Specifically, Steig claims here:

The choice of k_gnd that yields the best agreement with the iridge calculations (which, remember, is already known to create problems) happens to be k_gnd = 7, and it just so happens that this yields the minimum trends. In fact, O’Donnell et al. show in a table in their Supplementary Material that the mean trend for West Antarctica for smaller values of k_gnd is more than twice (~0.2 °C/decade) what it is for their ‘optimum’ estimate of k_gnd = 7 (~0.07 °C/decade). {emphasis added}

Steig then goes on to claim that the closer concordance between the kgnd = 5 / 6 reconstructions and the S09 reconstruction means that we chose the *wrong* parameter value. His justification is nothing more than that the other values agree better with his reconstruction. It is not based on any cross-validation results or other calculations (excepting, perhaps, the erroneous calculation of the Byrd station trend, which we evaluated here). Steig – as per usual – doesn’t do calculations to evaluate his own claim. He simply states things as *true*. And – as per usual – these things turn out to be somewhat less than true.

Before we begin discussing why this claim is not accurate, let’s make sure we understand what “kgnd” *is*. First, let’s talk about RegEM, which is where “kgnd” appears. RegEM belongs to a class of algorithms called *Expectation-Maximization* algorithms. In the case of RegEM, the “Reg” refers to “regularized”. We will defer the discussion of regularization for now, as we first need to understand a bit about EM.

EM has been around for quite a while (since the 1960s or thereabouts) and was formalized by Dempster et al. in 1977. The idea behind it is quite simple: use the available information to take a *best guess* at the missing information. Then, assuming that the previous *best guess* is better than having no information at all, use those results to make a *better* guess. This process continues until each subsequent guess is nearly equal to the previous one . . . in other words, until the guesses stop changing. In the case of linear regressions, as long as the regression model is appropriate for the data and the proper correction is applied for the bias generated by the predicted variances, this general framework was mathematically proven to converge to the maximum likelihood estimate of the missing data.
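To make the guess-and-refine loop concrete, here is a toy sketch of the idea – *not* the RegEM or RO10 code, just a minimal illustration with made-up data: one observation in a simple linear dataset is missing, and we alternate between filling it with the current best guess and refitting the regression until the guesses stop changing.

```python
import numpy as np

# Toy EM-style infilling (illustrative only; not the RegEM algorithm itself).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2.0 * x + rng.normal(0, 0.1, 20)
y[5] = np.nan                                # pretend this observation is missing

guess = np.nanmean(y)                        # first best guess: the mean of what we have
for _ in range(100):
    y[5] = guess                             # fill in the current best guess
    slope, intercept = np.polyfit(x, y, 1)   # refit the model to the completed data
    new_guess = slope * x[5] + intercept     # predict the missing value again
    if abs(new_guess - guess) < 1e-10:       # stop when the guesses stop changing
        break
    guess = new_guess
```

Each pass uses the filled-in data to improve the model, and the improved model to improve the fill-in – exactly the expectation/maximization alternation described above.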

Rather than use ordinary least squares for the regression function, RegEM utilizes *regularization*. The reason is that many times – due to large gaps in coverage – the data matrix is *ill-conditioned*. This means that there are too many unknowns and not enough overlaps in the data to provide a unique answer. Such problems come up frequently in non-laboratory analyses like analyzing climate data. Regularization provides a way to obtain a unique solution by adding some additional constraints.
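A quick illustration of what “ill-conditioned” means in practice (again a made-up example, not the actual station data): if two predictors are nearly identical – as happens when station records overlap almost perfectly – the regression has no stable unique answer, which is what the condition number measures.

```python
import numpy as np

# Two nearly collinear predictors, e.g. two stations with almost identical records.
x1 = np.arange(10.0)
x2 = x1 + 1e-6                     # differs from x1 by only a tiny constant offset
A = np.column_stack([x1, x2])

cond = np.linalg.cond(A)           # enormous condition number: ill-conditioned
```

With a matrix like this, many different coefficient combinations fit the data almost equally well, so some additional constraint (regularization) is needed to single one out.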

The RegEM algorithm provides two types of regularization: ridge regression and truncated total least squares (TTLS). For this post, we will only talk about TTLS. Ridge regression will be discussed in a follow-on post.

TTLS relies on a mathematical technique called the *eigenvalue decomposition* (and can alternatively be calculated using the *singular value decomposition*, or SVD). It is easier to visualize using the SVD method, so that is where we will start.

The SVD is a way of restructuring a data set based on the amount of shared information between the variables. Qualitatively, it works by finding the *spatial eigenvector* and *principal component* (PC) pair that best represents the data. It then finds the next best representation for the remainder . . . and so forth . . . until the data set has been completely *decomposed* into eigenvector-principal component pairs:
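This decomposition can be seen directly in a few lines (a sketch with synthetic data, not the Antarctic data): summing all of the eigenvector-PC pairs reproduces the data exactly, and keeping only the first k pairs – the “truncation” that the k in TTLS refers to – still captures most of the signal.

```python
import numpy as np

# Synthetic data: rows are time steps, columns are "stations",
# built from two underlying patterns plus a little noise.
rng = np.random.default_rng(1)
pc_true = rng.normal(size=(100, 2))            # two underlying time patterns
eof_true = rng.normal(size=(2, 8))             # their spatial "footprints"
data = pc_true @ eof_true + 0.01 * rng.normal(size=(100, 8))

U, s, Vt = np.linalg.svd(data, full_matrices=False)
# Pair i: principal component U[:, i] * s[i] with spatial eigenvector Vt[i, :].

# Summing all pairs recovers the data set exactly:
full = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))

# Keeping only the first k pairs (here k = 2, matching the true structure)
# is the truncation step; the leftover error is only the noise:
k = 2
approx = U[:, :k] * s[:k] @ Vt[:k, :]
err = np.abs(data - approx).max()
```

The ordering matters: the first pair explains the most shared variance, the second the most of what remains, and so on – which is why cutting off after k pairs keeps the dominant patterns and discards the poorly constrained remainder.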