## REGULARIZED LEAST SQUARES RECONSTRUCTION

Posted by Jeff Id on July 15, 2009

Another great guest post. Ryan figured out a method to do a stable multivariate regression without overfitting, with results similar to TPCA. It’s comforting to see that even though the trend is slightly lower, the results are very similar to the corrected EM reconstruction and the area-weighted reconstruction. The math here is pretty sharp stuff.

———————————————

Ryan O:

One of the problems that Jeff Id has raised on numerous occasions is that none of the reconstruction methods we have examined and tried are truly reconstructions of surface air temperature. They are all in some fashion a mix between surface air temperature and skin temperature. Even the weighted eigenvector reconstructions only fully rotate the AVHRR PCs in the limit where the weight assigned to the AVHRR PCs approaches zero, and there are calculational limits on how low that weight can be set.

There are three different ways to resolve this problem that have been batted around. One is to calibrate the AVHRR data to the ground data. I had attempted to do this in the past without satisfactory results because the station data is so spatially incomplete. A second way is to supplement the weighted eigenvector reconstruction by assigning the stations an area of influence and multiplying them by an additional weighting factor. Jeff Id is currently working on this method. A third way is to not use any AVHRR temperature information at all. Instead of using the PCs (temporal EOFs), we will simply use the eigenvectors scaled by the eigenvalues (spatial EOFs).

To start, let us define our spatial EOFs as:
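In symbols (with m the number of rows of the anomaly matrix, U~p the station-location rows of U, and Σ_N the diagonal matrix of the first N eigenvalues; I have written the √m factor as a multiplier, following the description below):

$$L_p = \sqrt{m}\,\tilde{U}_p\,\Sigma_N$$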

Lp is then a matrix containing columns with the eigenvector weights at the station locations for the first N satellite EOFs (U), scaled by the eigenvalues (sigma), and additionally scaled by the square root of the number of rows.

The problem we want to solve, then, is how to predict what the PCs should be in order to explain the maximum variance at the ground stations. This provides the maximal rotation of the PCs to the ground stations because it does not require the use of any satellite temperature information. We only use covariance information. Mathematically, this means finding a vector of temporal weights (a) for each time (t) given a vector of ground station temperature measurements (d):
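That is, for each month t we want the coefficient vector a(t) that best reproduces the station measurements:

$$\mathbf{d}(t) \approx L_p\,\mathbf{a}(t), \qquad \mathbf{a}(t) = \underset{\mathbf{a}}{\arg\min}\,\left\lVert \mathbf{d}(t) - L_p\,\mathbf{a} \right\rVert^2$$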

The least-squares solution to this problem is given by the pseudoinverse of Lp times the station data vector:
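Written out:

$$\mathbf{a}(t) = L_p^{+}\,\mathbf{d}(t) = \left(L_p^{\mathsf{T}} L_p\right)^{-1} L_p^{\mathsf{T}}\,\mathbf{d}(t)$$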

I had tried this in the past and gotten reconstructions that were massively overfit. The reason for this behavior is that the eigenvectors were derived from the satellite data (so they will not be a perfect representation of station covariance) and that the station data itself contains noise. Dr. Beckers pointed me toward his 2006 paper where they used a regularized version of this to estimate the rms noise for the entire field when infilling. In our case, using the regularized version prevents the overfitting by limiting the fit based on the amount of assumed noise. The regularized equation is:
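Written out, the regularized (ridge) form of the solution is:

$$\mathbf{a}(t) = \left(L_p^{\mathsf{T}} L_p + \mu^2 I\right)^{-1} L_p^{\mathsf{T}}\,\mathbf{d}(t)$$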

I is the identity matrix and mu squared is the mean square noise. Calculating the mean square noise is fairly simple:
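In symbols (the average here is taken over all M station-month points with data, which is my reading of “point-wise”):

$$\mu^2 = \frac{1}{M}\sum_{i=1}^{M}\left(x_i - \xi_i\right)^2$$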

Here, mu squared is simply the mean of the squared point-wise differences between the temperature calculated using the first N satellite EOFs (x) and the ground station data (the Greek letter xi). In other words, it is the variance of the residuals – the amount of station variance left unexplained by the satellite data.

Note: If you look at Dr. Beckers’ paper http://www.ocean-sci.net/2/183/2006/os-2-183-2006.pdf?FrameEngine=false , you will note that he calculates mu squared differently. I have sent him an email about this as I believe his listed equation is in error. The equation gives the mean square difference, which is not the same as the mean square error. For example, Dr. Beckers’ equation would yield an error of zero when comparing two sine waves 180 degrees out of phase, which is clearly not the intent. I believe this is a simple misprint.
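The distinction can be checked numerically. Here is a quick sketch in Python/NumPy (the reconstructions themselves were done in R): the mean of the squared differences between two 180-degree-out-of-phase sine waves is large, while statistics built from differences of means vanish. Which of the two vanishing forms corresponds to the misprinted equation is an assumption on my part.

```python
import numpy as np

# Two sine waves 180 degrees out of phase, sampled over one full period.
t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
x = np.sin(t)            # reconstruction from the EOFs
xi = np.sin(t + np.pi)   # "truth": exactly out of phase

# Correct noise estimate: the mean of the squared differences.
mse = np.mean((x - xi) ** 2)                   # = 2.0 for unit amplitude

# Two mean-difference forms that both vanish here (either could be the
# misprinted quantity; labeling them as such is my assumption):
var_diff = np.mean(x ** 2) - np.mean(xi ** 2)  # difference of mean squares
mean_diff_sq = np.mean(x - xi) ** 2            # square of the mean difference

print(mse, var_diff, mean_diff_sq)
```

The out-of-phase pair could hardly be a worse fit, yet both mean-difference statistics report zero error, which is clearly not the intent.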

Regardless, the correct form of the equation for our purposes (which are slightly different from Dr. Beckers’ – though I do not see why that would require use of the mean square difference) is the one I have listed above.

(Incidentally, the math behind our Iterative TSVD is identical to that used by Dr. Beckers’ DINEOF algorithm, so if you are interested in the mathematical basis for our infilling algorithm, his 2003 and 2006 papers provide an excellent reference.)

THE RECONSTRUCTION

In order to perform a reconstruction with this method, we must first infill the station data. As we have in the past, we will use the Iterative TSVD algorithm with the mode set to covariance, station set GRID 1C (all on-grid stations with at least 96 months of data), and impute up to 5 retained EOFs. This infilled station set is identical to the station set we used for the previous covariance-based reconstruction.

We then take the SVD of the transpose of the AVHRR anomaly data (so that the left-hand eigenvector contains the spatial information and the right-hand eigenvector contains the temporal information). Now we can extract the eigenvector values at the ground station locations for each spatial EOF, arrange them in descending order in matrix U~, and multiply each column by its respective eigenvalue. This defines Lp. From there, we compute the pseudoinverse and calculate matrix A, with columns corresponding to a given satellite EOF and rows equal to the vector a for that particular month. Note that each column of A is a principal component, but calculated using the fit of the ground station temperatures to the eigenvectors rather than satellite data.

From there, the reconstructed temperatures are computed as before: U A, where U is the matrix containing the spatial eigenvectors and A is the matrix of PCs.
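To make the whole sequence concrete, here is a toy end-to-end sketch in Python/NumPy (the actual reconstructions were done in R; the grid size, station locations, and noise below are synthetic stand-ins, and I have dropped the √m scaling, which rescales the amplitudes and the effective regularization but not the structure of the calculation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 300 gridpoints x 240 months of "AVHRR" anomalies,
# with 40 gridpoints hosting ground stations (noisy copies of the grid).
n_time, n_grid, n_stn, n_eofs = 240, 300, 40, 28
avhrr = rng.standard_normal((n_time, n_grid))
stn_idx = rng.choice(n_grid, n_stn, replace=False)
ground = avhrr[:, stn_idx] + 0.5 * rng.standard_normal((n_time, n_stn))

# SVD of the transpose: columns of U are spatial EOFs, rows of Vt temporal.
U, s, Vt = np.linalg.svd(avhrr.T, full_matrices=False)

# Spatial EOFs at the station locations, scaled by the singular values.
Lp = U[stn_idx, :n_eofs] * s[:n_eofs]

# mu^2: mean squared residual between the N-EOF satellite representation
# at the stations and the ground data.
sat_at_stn = Lp @ Vt[:n_eofs, :]             # stations x months
mu2 = np.mean((sat_at_stn.T - ground) ** 2)

# Regularized least squares for the amplitudes, all months at once:
# a(t) = (Lp' Lp + mu^2 I)^-1 Lp' d(t)
A = np.linalg.solve(Lp.T @ Lp + mu2 * np.eye(n_eofs), Lp.T @ ground.T).T

# Reconstructed temperatures over the full grid: rows of A are the "PCs".
recon = A @ (U[:, :n_eofs] * s[:n_eofs]).T   # months x gridpoints
```

Note that the only temperature numbers entering A come from `ground`; the satellite data contributes nothing but the spatial patterns in U.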

This method is attractive for a number of reasons. First, it is a straightforward least squares regression of the station data against the spatial EOFs. It does not require any black-box imputation algorithm, and a least squares regression is a readily understood calculation. And most of all, for the purpose of reconstructing surface air temperatures, the solution contains no direct satellite temperature information.

So, without further ado, here is a 28-spatial EOF reconstruction, using ground station set GRID 1C.

The really encouraging part is that even though we used only ground station temperatures and satellite covariance information, our cross-validation statistics against actual satellite temperatures are double those of Steig’s.

Just for the heck of it, I did a 288-spatial EOF reconstruction. Bigger is better. There are a total of 300 possible spatial EOFs from the decomposition of the AVHRR data. The last 12 have essentially zero eigenvalues (~10^-12), so 288 is the maximum number of EOFs that can be used. Essentially, this uses every last bit of covariance information available from the satellite data. The subtitles are slightly incorrect. I retained zero PCs; I retained only spatial EOFs. Everywhere you see “PC”, substitute “spatial EOF”.

So much for cut-and-paste.

Results are below.

As an interesting side note, I used Dr. Beckers’ noise estimation procedure to compute the rms error between the satellite and ground information across the entire grid. The rms error at ground station locations specifically was about 2.5 °C, or a little better than Comiso’s reported ~3 °C for his cloud masking procedure in Comiso (2000). Across the whole grid, the average was about 2.8 °C. The difference between the satellite information and the stations – and, by extension, the entire grid – is actually a bit more accurate than Comiso advertises his cloud masking to be.

In other words, pretty good.

However, it would do all of us well to remember that the signal we are looking for – ~0.1 °C per decade – is still about 30 times smaller than the standard deviation of the difference between cloud-masked satellite data and ground data. So even though Comiso delivered on his ~3 °C promise, that still results in a massive difference in trend.

Compare:

Figured it was a good idea to add a little perspective to what ~3 °C rms noise really means in terms of the confidence we should place in trends computed from the data.

## John F. Pittman said

Ryan O: Not to be a distraction, but a bit of fun was sent to Anthony at WUWT for supposing that a volcano could affect more than a small area, etc. Now this would be true if we measured temperature everywhere, but we humans measure temperatures of interest. Many find volcanoes interesting, and temperature is one of their parameters. So, cutting to the chase: the major volcanoes are centered in the 0.5 red-black in Graham Land, and Mount Erebus is centered in the 0.5 at the edge of the Ross Ice Shelf. See http://vulcan.wr.usgs.gov/Volcanoes/Antarctica/Maps/map_antarctica_volcanoes.html If you remove the bias by removing the weighted area for this, what does this do? I am assuming with your method that a discrete area can be removed from your database, including the weather station if necessary – specifically the Linear Trends (1957-200x). Ignore if not of interest.

## Kenneth Fritsch said

Ryan O, I want to read your post one more time before asking any questions. I hope that questioning in general here at Air Vent will not be inhibited by posters’ lack of linear algebra expertise as I judge that you have made a diligent effort to explain in rather plain English what your methods and calculations are doing. My questions will be concerning the satellite versus station data and how the satellite data can be used for establishing spatial relationships without being used directly in calculating trends. I will also have a question concerning the Steig authors’ intentions for how the satellite data was to be used in their reconstruction.

By the way, I am able to follow reasonably well how the linear algebra is being utilized from your explanations, but do not follow all the manipulations – as simple as they might appear to those who are more versed in this area of mathematics.

Once again, I think your efforts are what blogging should be all about.

## Jeff Id said

#2 I’m still getting a bunch of visitors here. It’s rather fantastic that the match to the EM versions is so good with a much more understandable (and common) method, although figuring out how to implement it was pretty good stuff.

## Nic L said

Ryan O,

Congratulations on another excellent piece of work. The approach of regularizing by limiting the fit based on the estimated amount of noise is clever. I found the Beckers 2006 paper very helpful in following your math; I agree that his equation (15) for mu squared certainly looks wrong. There were one or two points where I wasn’t quite sure what exactly you were doing, but I will hopefully be able to work it out if you post your script.

One question. I had been wondering for some time how best to use the spatial correlation information in the AVHRR data to improve the surface station reconstruction without the inaccuracies and questionable changes over time in the AVHRR data influencing the temporal aspect of the reconstruction. With there being so few surface stations in large parts of Antarctica, and many of them (particularly the AWS) having relatively short and/or discontinuous records, extra information – even from a source with sizeable average errors – could perhaps measurably improve the surface station reconstruction. However, your approach doesn’t seem to use the AVHRR spatial correlation information for this purpose, as you carry out the surface station reconstruction without any involvement of the AVHRR data. Is that right? Do you think that the surface station reconstruction would be improved if it had the benefit of the AVHRR spatial correlation information? I don’t see why, in principle, one couldn’t somehow add that information in. For instance, would it achieve that object in RegEM if one added (with a suitable weight) covariances between AVHRR data at or around the locations of each pair of surface stations in to the main covariance matrix? Maybe you have tried something similar and found it is not worthwhile?

## Jeff Id said

Nic,

I forwarded Ryan’s script to you, he just sent it this afternoon.

I don’t know if you ever saw my hack method of distributing surface data by correlation. I wasn’t happy with it in the end because the covariance information wasn’t captured well enough by the linear weighting.

http://noconsensus.wordpress.com/2009/04/09/the-antarctic-an-engineers-reconstruction/

I’ve tried other variations on the algorithm with a variety of results similar to the one above. But it seems so hacked together, I didn’t know what to do with it.

## Ryan O said

#3, 4 Yah, you, Kenneth, and others have asked this quite a bit and I hadn’t yet thought of a clean way to do that. I’m working on a way to do this right now.

Basically, you use the existing data to estimate the amplitudes as I have above. Then you multiply those amplitudes through the eigenvectors and estimate temperatures at each station location.

Wash, rinse, repeat . . . until the numbers stop changing. If you do it that way, you use the satellite covariance information to guide the station infilling. The other convenient thing about it is that you don’t then have to recompute anything for the reconstruction. You already have the amplitudes . . . now you just multiply them through the entire eigenvector set.
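That loop might look like the following toy sketch in Python/NumPy (again, the real work is in R; the sizes, noise level, and mu² value here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 3 spatial EOFs sampled at 10 stations, 100 months of data,
# with roughly 20% of the station record missing.
n_stn, n_modes, n_time = 10, 3, 100
Lp = rng.standard_normal((n_stn, n_modes))          # spatial EOFs at stations
data = rng.standard_normal((n_time, n_modes)) @ Lp.T \
       + 0.1 * rng.standard_normal((n_time, n_stn))
mask = rng.random((n_time, n_stn)) < 0.2            # True where missing
mu2 = 0.1                                           # assumed noise level

filled = np.where(mask, 0.0, data)   # start with missing values at zero
for _ in range(100):
    prev = filled
    # 1. Regularized LS estimate of the amplitudes from the current fill.
    A = np.linalg.solve(Lp.T @ Lp + mu2 * np.eye(n_modes),
                        Lp.T @ filled.T).T
    # 2. Multiply the amplitudes through the eigenvectors to predict
    #    temperatures at the station locations.
    pred = A @ Lp.T
    # 3. Replace only the missing entries; keep the observed data as-is.
    filled = np.where(mask, pred, data)
    if np.max(np.abs(filled - prev)) < 1e-8:   # numbers stopped changing
        break
```

The observed entries are never overwritten; only the missing ones are updated each pass, so the satellite covariance guides the infilling without contributing any temperature numbers of its own.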

I don’t like the idea of using RegEM for this, as it’s not really built to do that. I’m sure you could jerry-rig it to do it, as you mentioned, but it’s not natural to the algorithm and I’m not totally certain how to make the scaling work. It seems more natural to simply solve iteratively using least squares. Plus, conceptually, it’s really simple.

#1 At least as far as the edge of Ross goes, that’s actually due to misidentification of water and ice. Different emissivities are used for that, and it appears as if the satellites got it wrong sometimes. I have graphs of it somewhere that I’ll try to find. A long time ago I had masked all of those cases out and it didn’t change the overall trend much.

## Kenneth Fritsch said

Nic L, I like people asking these questions as they sometimes confuse me and that often leads to better understanding on my part. I have assumed that Ryan O uses the AVHRR data for spatial considerations in these reconstructions or how else would we see the full and detailed Antarctica trend maps? I’ll let Ryan attempt to deal with my confusion here.

I am now going to do a TCO and mail this one in without going back to the literature. I have read references to skin temperatures versus air temperatures with, I think, the skin temperatures meaning the AVHRR measurements and the air temperatures meaning the surface station measurements. Is a conversion factor needed to convert skin temperatures to air temperatures, or does it not matter if temperature anomalies are used?

Also I have a question about using the AVHRR data to spatially spread the station temperatures around Antarctica by way of spatial covariances. Can the AVHRR data provide a valid tool for doing this without also having to provide valid trends independent of the station data?

## John F. Pittman said

Thanks. I get curious sometimes.

## Ryan O said

#7 Nic’s question is whether you can directly use the covariance from the satellite data to help impute the ground stations. In our reconstructions, we impute the ground stations without any satellite information and then use the satellite covariance to build the whole grid. The answer here is that I don’t think so. I’ve tried to directly use the satellite covariance and the predictive power is significantly less than just using the surface stations alone. Still more work there, though.

As far as skin vs. air, in the regularized least squares, it doesn’t matter. This one has zero temperature information from the satellite. All it has is covariance information. All of the other reconstructions are a blend of air and skin temps, though because of the weighting used, they’re primarily air.

Remember that the AVHRR data is decomposed into loadings (which contain the trends) and covariance maps. The covariance maps are static. They are not a function of time. So if you just use the covariance maps to recover the entire grid, you do not explicitly include any time-varying information – hence, you do not explicitly include trends. It is still possible that the trends in the satellite data might not quite geographically match the ground stations, which means the covariance maps won’t match perfectly, but that effect is quite small.

So if you only use the covariance, there is no need to calibrate. The only temperature information comes from the ground stations.

## Kenneth Fritsch said

Ryan O, thanks much for the explanations of the reconstruction details which appear in line generally with what I had surmised. I do admit that there are some nuances here that I will not truly understand until I find the time to go through R code calculations.

When you say above that the AVHRR data can be decomposed into loadings containing the trend data and covariance maps, you are saying, I presume, that where Steig uses both the loadings and covariance maps, you, in this reconstruction, have used only the covariance maps.

## Ryan O said

#10 Exactly.