## EIV/TLS Regression – Why Use It?

Posted by Jeff Id on January 8, 2011

This is a guest post reprinted with permission from Roman M’s blog statpad. I found this to be the best explanation I’ve read of the difference between OLS and EIV methods, and it has the additional benefit of demonstrating that the solutions of the EIV regression depend on the input scaling. The effects of the scaling differences depend on the data you start with, but EIV looks to be pretty dangerous here. The critique applies to several popular climatology reconstruction methods, for which the problem may or may not be a minor one. Roman put it all into one very nice post.

Roman M:

Over the last month or two, I have been looking at the response by Schmidt, Mann, and Rutherford to McShane and Wyner’s paper on the hockey stick. In the process, I took a closer look at the total least squares (errors-in-variables or EIV) regression procedure, which is an integral part of the methodology used by the hockey team in their paleo reconstructions. Some of what I found surprised me.

A brief explanation of the difference between ordinary least squares (OLS) and EIV is in order. Further information can be found on Wikipedia’s Errors-in-variables models and Total least squares pages. We will first look at the case where there is a single predictor.

**Univariate Case**

The OLS model for predicting a response Y from a predictor X through a linear relationship looks like:

$$Y_i = \alpha + \beta X_i + e_i, \qquad i = 1, \dots, n$$

α and β are the intercept and the slope of the relationship, e is the “random error” in the response variable due to sampling and/or other considerations, and n is the sample size. The model is fitted by choosing the linear coefficients which minimize the sum of the squared errors (which is also consistent with maximum likelihood estimation for a Normally distributed response):

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \alpha - \beta X_i\right)^2$$

The problem is easily solved by using matrix algebra and estimation of uncertainties in the coefficients is relatively trivial.
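As a rough illustration (this is not code from the post, whose script was in R), here is a minimal Python/NumPy sketch of the OLS fit together with the textbook standard errors. The function name `ols_fit` and the simulated data are hypothetical.

```python
import numpy as np

def ols_fit(x, y):
    """Fit y = alpha + beta*x by ordinary least squares.

    Returns the coefficients (alpha, beta) and their standard errors,
    using sigma^2 * (A^T A)^{-1} with sigma^2 = RSS / (n - 2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # minimizes the sum of squared e_i
    resid = y - A @ coef
    sigma2 = resid @ resid / (len(y) - 2)            # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef, se

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)   # hypothetical data
coef, se = ols_fit(x, y)
print("alpha, beta =", coef, " standard errors =", se)
```

Only the vertical distances (the e's) enter the criterion; the predictor is treated as known exactly.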

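EIV regression attempts to solve the problem when there may also be “errors”, f, in the predictors themselves:

$$Y_i = \alpha + \beta X_i^* + e_i, \qquad X_i = X_i^* + f_i, \qquad i = 1, \dots, n$$

The f-errors are usually assumed to be independent of the e-errors, and the estimation of all the parameters is done by minimizing a somewhat different looking expression:

$$\sum_{i=1}^{n} \left[ \left(Y_i - Y_i^*\right)^2 + \left(X_i - X_i^*\right)^2 \right] = \sum_{i=1}^{n} \left( e_i^2 + f_i^2 \right)$$

under the condition

$$Y_i^* = \alpha + \beta X_i^*, \qquad i = 1, \dots, n$$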

$X^*$ and $Y^*$ (often called scores) are the unknown actual values of X and Y. The minimization problem can be recognized as finding the line that minimizes the total of the squared perpendicular distances from the data points to it; the estimated scores are the points on that line closest to the data. Mathematically, the estimated coefficients of the line can be calculated by a principal components analysis of the data. It should be noted that the predictors and responses should each be centered at zero beforehand.
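A minimal sketch of that principal components calculation, again in Python/NumPy rather than the R used in the post: after centering, the right singular vector with the smallest singular value is normal to the fitted line. The names `tls_fit` and `tls_slope_closed_form` are hypothetical; the second function is the standard orthogonal-regression formula (equal error variances), included only as a cross-check.

```python
import numpy as np

def tls_fit(x, y):
    """Total least squares (orthogonal) fit of y = alpha + beta*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    Z = np.column_stack([x - xm, y - ym])            # centered data, one row per point
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    a, b = vt[-1]                                    # normal to the line: a*x + b*y = 0
    beta = -a / b
    return ym - beta * xm, beta

def tls_slope_closed_form(x, y):
    """Same slope from the 2x2 covariance matrix (leading eigenvector)."""
    sxx, syy = np.var(x), np.var(y)
    sxy = np.cov(x, y, bias=True)[0, 1]
    return (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)   # hypothetical data
alpha, beta = tls_fit(x, y)
print(alpha, beta, tls_slope_closed_form(x, y))
```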

The following graphs illustrate the difference between the two approaches:

What could be better, you ask. Well, all is not what it may seem at first glance.

First, you might have noticed that the orange lines connecting the data to the scores in the EIV plot are all parallel. The adept reader can see from considerations of similar triangles that the ratio of the estimated errors, e and f (the green lines plotted for one of the sample points), is a constant: f/e equals minus the slope coefficient, -β (or e/f equals -1/β, depending on which is the numerator term). The claim that somehow this regression properly takes into account the error uncertainty of the predictors seems spurious at best.
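A small numerical check of the parallel-lines observation, as a Python/NumPy sketch on simulated data (the helper `tls_scores` is hypothetical): each score is the perpendicular projection of a point onto the TLS line, so the residual vector (f, e) is orthogonal to the line direction and f = -β·e at every point.

```python
import numpy as np

def tls_scores(x, y):
    """Return the TLS scores (X*, Y*) and the fitted slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    Z = np.column_stack([x - xm, y - ym])
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    d = vt[0]                          # unit direction of the fitted line
    t = Z @ d                          # position of each projection along the line
    xs, ys = xm + t * d[0], ym + t * d[1]
    return xs, ys, d[1] / d[0]

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.7 * x + rng.normal(scale=0.4, size=30)   # hypothetical data
xs, ys, beta = tls_scores(x, y)
f = x - xs                                     # estimated predictor errors
e = y - ys                                     # estimated response errors
print("slope:", beta)
print("f/e for the first five points:", (f / e)[:5])
```

Every ratio printed is the same number, -β, whatever the actual errors in the data were.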

The second, and considerably more important, problematic feature is that, as Wikipedia’s total least squares page linked above states: “total least squares does not have the property of units-invariance (it is not scale invariant).” Simply put, if you rescale a variable (or express it in different units), you will NOT get the same result as for the unscaled case. Thus, if we are doing a paleo reconstruction and decide to calibrate to temperature anomalies in °F rather than °C, we will end up with a different reconstruction. How different will depend on the details of the data. However, the point is that we can get two different answers simply by using different units in our analysis. Since all sorts of rescaling can be done on the proxies, the end result is subject to the choices made.

To illustrate this point, we use the data from the example above. The Y variable is multiplied by a scale factor ranging from 0.1 to 20. The new EIV slope is calculated and divided by the old EIV slope multiplied by the same factor.

If the procedure were invariant under scaling (as OLS is), the result would be equal to unity in all cases. Instead, one can see that for scale factors close to zero, EIV behaves basically like OLS regression. As the scale factor increases, the slope (after unscaling) approaches one over the OLS slope obtained with the X and Y variables switched.
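The same experiment as a Python/NumPy sketch on hypothetical simulated data (not the post's data or script). The two limits it prints are the OLS slope of Y on X and the reciprocal of the OLS slope of X on Y, each divided by the original TLS slope.

```python
import numpy as np

def tls_slope(x, y):
    """Orthogonal-regression slope from the smallest right singular vector."""
    Z = np.column_stack([x - x.mean(), y - y.mean()])
    a, b = np.linalg.svd(Z, full_matrices=False)[2][-1]
    return -a / b

rng = np.random.default_rng(2)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(scale=0.8, size=40)    # hypothetical data

base = tls_slope(x, y)
sxx, syy = np.var(x), np.var(y)
sxy = np.cov(x, y, bias=True)[0, 1]
ols = sxy / sxx        # OLS slope of Y on X
rev = syy / sxy        # 1 / (OLS slope of X on Y)

def ratio(c):
    """TLS slope after multiplying Y by c, divided by c times the original TLS slope."""
    return tls_slope(x, c * y) / (c * base)

for c in [0.1, 0.5, 1, 2, 5, 20]:
    print(f"c = {c:>4}: ratio = {ratio(c):.4f}")
print(f"limits: OLS/TLS = {ols / base:.4f}, reverse/TLS = {rev / base:.4f}")
```

A scale-invariant method would print 1.0000 on every line.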

However, that is not the end of the story. What happens if X and Y are each scaled to have standard deviation 1? This surprised me somewhat: the slope can only be +1 or -1 (except for some cases where the data form an exactly symmetric pattern for which ALL slopes produce exactly the same SS).

In effect, this would imply that, after unscaling, the EIV calculated slope = ±sd(Y) / sd(X), with the sign of the correlation. To a statistician, this would be very disconcerting, since this slope is not determined in any way, shape, or form by any existing relationship between X and Y: it is the same answer whether the data points lie on an exactly straight line or are nearly uncorrelated. It is not affected by sample size, so large-sample convergence results clearly would not be applicable. On the other hand, the OLS slope = Corr(X,Y) * sd(Y) / sd(X) for the same case, so this criticism would not apply to that result.
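Why only ±1: after standardizing, the covariance matrix is [[1, r], [r, 1]], whose eigenvectors are (1, 1) and (1, -1) no matter what r is, so the principal direction always has slope sign(r). A Python/NumPy sketch on hypothetical simulated data with several correlations:

```python
import numpy as np

def tls_slope(x, y):
    """Orthogonal-regression slope from the smallest right singular vector."""
    Z = np.column_stack([x - x.mean(), y - y.mean()])
    a, b = np.linalg.svd(Z, full_matrices=False)[2][-1]
    return -a / b

rng = np.random.default_rng(3)
results = []
for rho in [0.9, 0.3, 0.05, -0.4]:                  # hypothetical true correlations
    x = 2.0 * rng.normal(size=500)                  # sd(X) about 2
    y = 5.0 * (rho * x / 2.0 + np.sqrt(1 - rho**2) * rng.normal(size=500))
    r = np.corrcoef(x, y)[0, 1]
    b_std = tls_slope(x / x.std(), y / y.std())     # slope on unit-variance data
    b_unscaled = b_std * y.std() / x.std()          # back to original units
    b_ols = r * y.std() / x.std()                   # OLS slope for comparison
    results.append((r, b_std, b_unscaled, b_ols))
    print(f"r={r:+.3f}  TLS(std)={b_std:+.6f}  TLS unscaled={b_unscaled:+.3f}  "
          f"OLS={b_ols:+.3f}  sd(Y)/sd(X)={y.std() / x.std():.3f}")
```

The unscaled TLS slope tracks sd(Y)/sd(X) whether r is 0.9 or 0.05; only the OLS column responds to the strength of the relationship.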

**Multilinear Case**

So far we have only dealt with the univariate case. Perhaps having more predictors would alleviate the problems we have seen here. All sorts of comparisons are possible, but to shorten the post, we will only look at the effect of rescaling all of the variables to unit variance.

Using R, we generate a sample of 5 predictors and a single response variable with 20000 values each. The variables are generated “independently” (subject to the limits of a random number generator). We calculate the slope coefficients for both the straight OLS regression and also for EIV/TLS:

| Variable | OLS Reg | EIV-TLS |
|---|---|---|
| X1 | 0.005969581 | 1.9253757 |
| X2 | 0.010657532 | 1.8661962 |
| X3 | -0.005656248 | 3.7607298 |
| X4 | -0.003537972 | 0.6509362 |
| X5 | 0.003616522 | 4.4236177 |

All of the theoretical coefficients are supposed to be zero, and with 20000 observations the estimates should not differ much from zero. In fact, the 95% confidence intervals for the OLS coefficients all contain the value 0. However, the EIV result is completely out to lunch. The response Y must be scaled down by about 20% to make all of the EIV coefficients small enough to fall inside the 95% CIs calculated by the OLS procedure.
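A Python/NumPy sketch of the same kind of experiment (not Roman's R script; the seed and generated data are hypothetical, so the numbers will not match the table above). Multivariate TLS stacks the centered predictors and response into one matrix, takes the right singular vector v with the smallest singular value, and sets the coefficients to -v[:p] / v[p].

```python
import numpy as np

def ols_coefs(X, y):
    """OLS slopes after centering (no intercept column needed)."""
    return np.linalg.lstsq(X - X.mean(0), y - y.mean(), rcond=None)[0]

def tls_coefs(X, y):
    """Multivariate TLS slopes from the smallest right singular vector of [X y]."""
    Z = np.column_stack([X - X.mean(0), y - y.mean()])
    v = np.linalg.svd(Z, full_matrices=False)[2][-1]
    return -v[:-1] / v[-1]

rng = np.random.default_rng(4)
n, p = 20000, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                 # independent of every predictor: true slopes are 0

# Standardize every variable to unit variance, as in the post.
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()

b_ols = ols_coefs(Xs, ys)
b_tls = tls_coefs(Xs, ys)
half_width = 1.96 / np.sqrt(n)         # approximate 95% CI half-width for unit-variance data
print(f"approx. OLS 95% half-width: {half_width:.4f}")
for j in range(p):
    print(f"X{j + 1}: OLS {b_ols[j]:+.5f}   TLS {b_tls[j]:+.5f}")
```

With every variable at unit variance and no real relationship, all directions have nearly the same sum of squares, so the "smallest" singular vector is essentially a random direction, which is why the TLS coefficients come out of order one.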

**EIV on Simulated Proxy Data**

We give one more example of what the effect of applying EIV in the paleo environment can be.

As I mentioned earlier, I have been looking at the response by Gavin and crew to the M-W paper. In their response, the authors use artificial proxy data to compare their EIV construct to other methods. Two different climate models are used to generate a “temperature series” and proxies (which have auto-regressive errors) are provided. I took the CSM model (time frame used 850 to 1980) with 59 proxy sequences as the data. An EIV fit with these 59 predictors was carried out using the calibration period 1856 to 1980. A simple reconstruction was calculated from these coefficients for the entire time range.

This reconstruction was done for each of the three cases: (i) Temperature anomalies in C, (ii) Temperature anomalies in F, and (iii) Temperature anomalies scaled to unit variance during the calibration period. The following plots represent the difference in the resulting reconstructions: (i) – (ii) and (i) – (iii):

The differences here are non-trivial. I realize that this is not a *reproduction* of the full method used by the Mann team. However, the EIV methodology is central to the current spate of their reconstructions, so some effect must be there. How strong is it? I don’t know; maybe they can calculate the Fahrenheit version for us so we can all see it. Surely, you would think that they would be aware of all the features of a statistical method before deciding to use it. Maybe I missed their discussion of it.

A script for running the above analysis is available here (the file is labeled *.doc*, but it is a simple text file). Save it and load it into R directly: Reivpost
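For readers without R, here is a toy Python/NumPy sketch of the three-way comparison in the simulated-proxy section. It is not the CSM model data, not Roman's script, and not the Mann team's method: the "temperature", the 10 proxies with AR(1) noise, and every parameter are hypothetical. Anomalies in °F are taken as 1.8 times anomalies in °C, since the +32 offset drops out after centering. OLS is run alongside as a control.

```python
import numpy as np

rng = np.random.default_rng(5)
years = np.arange(850, 1981)                 # 850-1980, as in the post
calib = years >= 1856                        # calibration period 1856-1980
n, m = len(years), 10                        # m hypothetical proxies (the post used 59)

def ar1(size, phi, scale):
    """AR(1) noise: e[t] = phi * e[t-1] + white noise."""
    e = np.empty(size)
    e[0] = rng.normal(scale=scale)
    for t in range(1, size):
        e[t] = phi * e[t - 1] + rng.normal(scale=scale)
    return e

# Hypothetical "temperature" in deg C: slow random walk plus a late warming trend.
temp_c = np.cumsum(rng.normal(scale=0.05, size=n))
temp_c += np.where(calib, 0.005 * (years - 1856), 0.0)
proxies = np.column_stack([0.5 * temp_c + ar1(n, 0.4, 0.3) for _ in range(m)])

def tls_coefs(X, y):
    Z = np.column_stack([X - X.mean(0), y - y.mean()])
    v = np.linalg.svd(Z, full_matrices=False)[2][-1]
    return -v[:-1] / v[-1]

def ols_coefs(X, y):
    return np.linalg.lstsq(X - X.mean(0), y - y.mean(), rcond=None)[0]

def reconstruct(y_cal, fit):
    """Calibrate on 1856-1980, then apply the coefficients to every year."""
    Xc = proxies[calib]
    beta = fit(Xc, y_cal)
    return y_cal.mean() + (proxies - Xc.mean(0)) @ beta

y_c = temp_c[calib]
mu, sd = y_c.mean(), y_c.std()
out = {}
for name, fit in [("TLS", tls_coefs), ("OLS", ols_coefs)]:
    rec_c = reconstruct(y_c, fit)                          # (i) anomalies in C
    rec_f = reconstruct(1.8 * y_c, fit) / 1.8              # (ii) in F, converted back to C
    rec_z = reconstruct((y_c - mu) / sd, fit) * sd + mu    # (iii) unit variance, converted back
    out[name] = (rec_c, rec_f, rec_z)
    print(f"{name}: max|(i)-(ii)| = {np.abs(rec_c - rec_f).max():.4g}, "
          f"max|(i)-(iii)| = {np.abs(rec_c - rec_z).max():.4g}")
```

The OLS differences are at rounding-error level, while the TLS reconstructions change with the units; how large the TLS differences are depends entirely on the made-up data.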

## Jeroen B. said

Small correction: 2nd paragraph: hockey time -> hockey team

## Jeff Id said

This has particular meaning to proxy reconstructions when there is low signal to noise in a proxy and high signal to noise in the temperature data. Scaling to sd=1 with a signal of 0.1 could create some uniquely random results.

## Carrick said

Wikipedia also describes a scale-invariant version of TLS.

Why can’t these be used instead?

## Kenneth Fritsch said

Thanks, RomanM for sharing and instructing on these methodologies. It is threads like this one that make the online experience interesting and challenging to learn more.

## Kenneth Fritsch said

The above is from Carrick’s Wikipedia link. The question would be: did the authors of these reconstructions use the multiplicative normalization? Was there a reference to Paul Samuelson or the more recent references in their papers?

## Nic L said

I am quite sure that Mann, Rutherford et al did not use multiplicative (least products) regression in their reconstructions. All the references and the archived code for their EIV reconstructions that I have seen refer to standard truncated total least squares RegEM, which normalizes variables to unit standard deviation but, by the nature of TTLS regression, minimizes the sum of squares of the residuals for all variables, X as well as Y, measuring perpendicular to the regression line(s). That is the same as minimizing the sum of the squared perpendicular lengths. There are typically multiple X and multiple Y variables involved here, of course.

## Kenneth Fritsch said

Nic L, I surmised the same as you after I re-read the EIV methodology in Mann et al. (2008).

## Ruhroh said

Hey Jeff;

Even though I check Statpad, I had not made it through this pivotal article until you re-hosted it here. Wow! The EIV slope is somewhere between the OLS slope and 1/OLS slope with interchanged variables. That’s just fantastic!

As you pointed out (I think), the very low signal levels in many proxies create the necessity of massive scaling, for ‘calibration’ to the instrumental history. Yikes!

BTW, for the engineers in the group, what SNR are seen in ‘typical’ proxy records? Are the climascientologists better at pulling signal out of noise than USN sonar guys? How many dB down are those dang temperature signals?

Perhaps you could do a similar kindness for us, regarding the recent UC article at CA. I know there’s something important there, but I can’t seem to get it into my thick skull.

In your abundant free time…

This is obviously less important than having fun with your new son.

Thanks for all you do and have done…

RR

## Mark T said

Depends upon what you mean by proxy and what type of signal you’re trying to acquire. Furthermore, there’s no one answer. For radar the answer depends upon the link parameters and radar cross section (RCS) and it translates to probability of detection (Pd) and probability of false alarm (Pfa.) For communications the answer likewise depends upon the link parameters but it translates to bit error rate (for digital signals, of course) instead.

Those USN sonar guys are detecting signals with known parameters as they propagate through a medium with fairly well known parameters perturbed by noise/interference with fairly well known parameters.

Without well defined signal and noise characteristics, this question is impossible to answer. The definition of “signal” in climate reconstructions is just as arbitrary as the definition of noise.

Mark

## Ruhroh said

Well GARSH, Mark, not sure what I said to elicit that scolding.

While I have very little trust in the ‘instrumental record’ on which the proxy weights are determined, I had the impression that most of the paleoclimaticists are largely in agreement about that ‘signal’.

I was just asking for crude order of magnitudes from people that had wrestled with the actual proxy data.

I’ll ask it another way, probably even more flawed;

If you scale each proxy such that the ‘signal’ portion in the instrumented period is of comparable magnitude to the ‘temperature’ in the ‘instrumented’ period, what do the plots look like? (I’m guessing we might need to relax the usual constraint on 0 K.)

Is this formulation of my general question any less problematic?

RR

## Kenneth Fritsch said

Ruhroh, was not Mark’s point that signal to noise in a proxy, where the correct or expected signal is not known, is different than in some engineering measurements where controlled experiments can be performed? You do have the instrumental period of the reconstructions, but with the cherry picking exercises that are used by most reconstructions one does not obtain a good look at the potential noise. I suppose if a climate scientist tried real hard he could set up an experiment that might be more revealing on noise in the temperature signal, but normally these guys, I surmise, are fighting to find calibration signals during the instrumental period that correlate reasonably well with the proxies, i.e. they do not like to see noise.

I, in a very amateur manner, have attempted to estimate the amount of noise in the Ljungqvist proxies (2009 and 2010) by determining how much noise I have to add to a comparable instrumental series to degrade the correlation between distance separation and series correlations on a pair-wise basis to that I measured in the proxies. I am attempting to refine what I did, as it is very crude. My first attempt says the noise to signal, assuming the instrumental record has none, was approximately 2.5 to 1, but could have been higher as the Ljungqvist proxy series do not correlate well at all. I am right now looking at doing the same comparison with the UAH gridded record since 1979, and at this point I think I can show that the GHCN ground station data has detectable noise in its signal compared to the UAH satellite data.

## Geoff Sherrington said

Roman,

Thank you for crystallizing a long-term unease I have been unable to express clearly because I have been out of the field too long. IIRC, we were routinely using Mahalanobis concepts in ore grade estimation work in the late 70s. We would not have continued with it if it did not provide a gain in understanding.

There is an intuitive discomfort about curve fitting to data normalized to SDs. For a start, with time series of temperature data, the SDs can be calculated in different ways depending on how 10-second or 30 minute observations are aggregated into monthly means. My physics lecturers would have told me to keep to fundamental units, in this case K for temperature. I was lectured by quite a few physics experts because I kept failing and having to repeat. (Females were more interesting than study).

## Geoff Sherrington said

This might not be the optimum place to ask this, and forgive me if it has already been solved.

At the tropics the 4 seasons seem logical units in many ways, but at the poles there is more like a 2 part year, one dark and one light (approximately). When dealing with autocorrelation of effects related to light intensity or temperature, how is correction made for this variation on the time axis?

## Mark T said

How on earth did you take that as a scolding? Geez… lighten up, I was just answering your question as someone that does exactly what you asked about. How about “thanks for the information?”

Mark

## Carrick said

Geoff, it’s one of the reasons for calibrating station temperature to a regionally located proxy.

## Ruhroh said

Mark T;

“Thanks for the information”.

Actually, I was already fully aware of the information portion of your comment, which came across as condescending. It seems clear enough that any CA reader would have awareness that the ‘proxies’ are poorly characterized for both ‘signal’ and ‘noise’, and that exact answers will be N/A. I wasn’t asking for exact…

The value to me here is the realization that when I am flogging my colleagues (for inexact language or mushy logic on a shared engineering project where I’m the senior guy,) I can come across as an ogre, without regard to my good intent to reduce imperfections. I’m often the designated (dx/dt)**3 when it comes to details.

However poorly I phrased it, I am still interested in the relative magnitudes of ‘signal’ and ‘error’ (noise) as those terms are used in climatology. Dr Roman did show that the EIV calculations can be dramatically impacted by scaling.

That seems like pretty big news in ClimateLand.

I was trying to get a zero-th order idea to better calibrate my intuition.

Ken Fritsch gave a very helpful reply of the kind I sought to elicit, to help move the discussion along…

Resuming lurkiness;

RR

## Geoff Sherrington said

15 Carrick,

Thank you. In case I was not clear, the sun is directly above a tropical ground point twice a year and once a year outside the tropics. There is a gradation of cyclicity that is not corrected by simple cosine transforms from Equator to Pole for (say) irradiance and I suspect that it would complicate a global temperature reconstruction to which cyclicity corrections were being applied. I appreciate the value of stations being close to proxy sites for this and other reasons e.g. work on Huon Pine tree rings in Tasmania, where reliable stations are not close to the trees of the classic studies. My interest was more in correcting effects on the Y-axis as well as the X, the latter being emphasised in this thread.

## Mark T said

That’s ridiculous. Your question was clear and I answered it clearly. I dare you to point out one thing I wrote in that post and demonstrate how any rational person could view it as condescending. Based on your response, you did not understand the information portion of my post (which was all of it.) Proxies used in climate science have nothing to do with any point I made.

If you did not want to hear from an engineer what a typical SNR was you should not have asked. If you were going to simply insult anyone that answered, why bother? Quite frankly this is the most asinine thing I’ve heard here.

Mark

## Layman Lurker said

#11 Kenneth Fritsch

Kenneth, another cool variation on your comparison of distance correlations would be to include GCM gridpoints as well. A mismatch with observations would IMO pull the rug out from under Gavin’s ‘GCM as null’ critique of McKitrick and Nierenberg 2010.

## Ruhroh said

Yikes!

I think I see where Murphy worked his inevitable law;

It is my impression that Jeff Id sometimes has referred to SNR in climate ‘data’.

After thanking him for hosting this terrific post, my goal was to ask him to rephrase the statistical arguments into more of an engineering jargon, particularly in the context of the ~flaky ‘proxies’ upon which the dismal dendroclimatology relies.

Thus, when I wrote ‘for the engineers’, I meant to ask him to translate it into a more familiar (to old EE like me) lingo.

I can see that some folks would parse that phrase of mine as a request to be educated about engineering practices unrelated to climate ‘proxies’. I think that is where the wheels first went off the track…

My wife says that English is a second language for me,

a distant second at that, and that my primary language has something to do with a multidimensional correlator which doesn’t involve words.

I’m a ‘work backwards from the answer’ person, so speaking to ‘normal people’ involves backwardization of my entire thought process. Sometimes I do it better than others.

I appreciate your bona fide effort to educate me about the difficulty in quantifying link margin, BER, RCS, et al. (after 36 years of engineering, I may be rather set in my ways…).

The point of my second reply was that you had reminded me how easy it is to offend smart people by seeming to talk down to them. I think I do this more often than I realize. That was me acknowledging my error and thanking you for the realization.

Apparently the second reply was no more successful than my prior efforts.

I regret my role in trashing this thread and the gratuitous insulting of Mark T.

Carpe dinero

RR

## Mark T said

Okie.

Realistically, the concept of SNR in this data is meaningless. To determine SNR the signal needs to be defined a priori (noise, too) but the signal is the unknown, it is the thing being searched for (my points above were that we/I start out with defined signal and noise and there is no “typical” without other clarification.)

I think any portion of temperature data can be legitimately called signal though low frequency components definitely dominate. Fourier transforms of daily data yield a spectrum dominated by the 50 or so year cycle that could reasonably be called signal (17% of the power, btw.) The rolloff of spectral components is logarithmic, which could mean a lot of things and muddies the waters even further.

Mark

## Mark T said

And, if anybody knows how to get this danged Droid 2 keyboard to stop the double letter thing, please, oh please, let me know. IIt’s really bad in these windows, marginally better when texting.

MMark

## Mark T said

Uh, yes to what Kenneth said, btw.

Regarding Roman’s post… my thoughts have always been that a linear trend applied to known cyclical data with at least some chaotic component is rather silly. Like everything else in climate science, the result is pre-supposed to be exactly what they find it to be. “And then a miracle occurs.”

Mark

## steveta_uk said

I would have thought that for dendro at least, the OLS is more appropriate anyway.

If my understanding is correct (which is highly unlikely) the ‘e’ error component is the noise in the tree ring widths, which as we all know could completely mask any temperature signal.

The ‘f’ error component would appear to reflect the error in counting rings. Since a seven-year-old should be able to count without significant error, which is EIV appropriate under these circumstances?

## steveta_uk said

Typo above: “which is EIV appropriate” should read “why is EIV appropriate”

## Kenneth Fritsch said

LL, that was my intention for the next part of my winter project. Once you get into these calculations/observations things tend to get a bit more complicated. Part of the problem involves lack of proxy data (I need to expand beyond the Ljungqvist proxies) and in some cases instrumental data from a spatial perspective. That is, obviously, not a problem with satellite and climate model data.

MW 2010 was quick to pick up on the difference between the autocorrelation of real proxies and computer generated ones. I think, as you suggest, that there are many ways to look at these differences from proxies, and it is rather apparent to me that those producing reconstructions are not terribly interested in showing these differences.

Some issues that I have to consider in my analyses and comparisons include that UAH data is gridded/integrated in 2.5 X 2.5 degree boundaries while GHCN station and proxy data is applied at a point. My arima modeling has shown that finding the best fits between arima(1,1,1), arima(1,0,0) and even arima(0,0,1) can be less than clear cut in some cases.

## Kenneth Fritsch said

I think this discussion of EIV and OLS methods is important in context of the Mann et al. (2008) paper. It has been my opinion that Mann et al in the 2008 paper is beginning to admit to a warping of the hockey stick with age, if not outright deterioration. When you look at the renderings of the reconstructions in the 2008 paper without the instrumental record tacked onto it you no longer see much recognizable as a hockey stick. Also important to note is that the 2008 paper can get back in time a “valid” reconstruction without tree rings, by the paper’s own standards, only with EIV, as OLS does not get beyond the Little Ice Age.

Remember also that with tree rings we have the sensitivity issues of the bristlecones, and, without the tree rings, we have the problem of the upside down Tiljander et al. series.

Finally it should be noted that the 2008 paper uses a “screening” process to preselect proxies and without compensation for the increased uncertainty that that method could attach to the results – given that a valid calculation for that form of cherry picking is even possible.

http://www.pnas.org/content/early/2008/09/02/0805721105.full.pdf

Typical of these papers we see that, while EIV becomes the preferred method in the 2008 paper and apparently because it allows a “valid” reconstruction beyond 1500, that method also shows greater Medieval warming. The authors handle that by stating that the CPS method with the colder Medieval period and the EIV method bracket the correct warming. That is called wringing conclusions out of the analysis.

So in closing this long winded post I say the EIV method was very important to the 2008 Mann paper – and then along comes RomanM with cruel reality.

## Roger Caiazza said

I am a meteorologist and was gobsmacked by this quote:

“the point is that we can get two different answers simply by using different units in our analysis. Since all sorts of rescaling can be done on the proxies, the end result is subject to the choices made.”

This is a perfect example of why meteorologists, climatologists and, especially, climate scientists should consult with statisticians before publishing anything with sophisticated statistics.

PS I learned that lesson early in my career.

## Brian H said

Mark;

About the KB; try opening Control Panel (assuming you are using Windows), Keyboard icon. Lengthen the repeat delay, and/or slow down the repeat speed. Should work.

## Brian H said

Roger;

Yes, statisticians are at or near the top of professionals the DIY amatoors at CRU should be consulting with off the top. But I think they got flamed by a few early on in the process, especially by ones who didn’t appreciate having their suggestions ignored or statements edited or deleted — while keeping their names as part of the author listing. Some near-lawsuits were required to get themselves dissociated from the final Official Dog’s Breakfasts.

So the core group battened the hatches and keeps all such disruptive input out.

## Mark T said

Droid 2, i.e., my phone. Total pain in the ass.

Mark

## Mark T said

Oh, wow. I did not notice that. Unbelievable. Yet, of course, WE are the anti-science ones. Sigh…

Mark

## Chas said

I am slightly puzzled; biologists often use ‘ranged major axis’ [RMA] regression (aka ‘reduced’ major axis) for model II errors. Is that the same thing as scaled TLS?

If so, the slope of the RMA regression (scaled TLS?) is the OLS slope divided by r. Having messed with RMA a little (there used to be an RMA Excel add-in floating around), it is indeed very disconcerting that the slope for totally uncorrelated x and y’s is +1 or -1 rather than 0 as in OLS.

However, I was under the impression that RMA regression shouldn’t be used for making predictions, which is what they appear to be doing with the proxies and temperatures (?)