Deconstructing a Reconstruction of Temperature Trend

I'm a tired penguin, but I'm not waiting on this post.

First, I tried everything I could with the stats until the RegEM program began converging. I've been able to combine the Automatic Weather Station (AWS) data with the surface station data to recreate an AWS reconstruction. Without the actual implementation code or a suitably detailed description, I'm forced to guess at the parameters used.

I used RegEM, as provided by Dr. Steig, with the data as he also provided it, to make the following graphs. There are a pile of them because I don't have the actual implementation code and I haven't yet verified which version (if any of these) he used, but here it goes. I think you'll find some interesting detail.

I first used the R scripts provided by CA to download the AWS and surface station data. I placed them side by side, with the 63 AWS data sets on the left and the surface stations on the right, being careful to align the data by time. Then I used R and Excel to process the data into correctly sized and oriented matrices for RegEM, as shown on the internet.
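
For reference, here is a minimal Matlab sketch of the matrix layout described above. The variable names and the months-by-stations orientation are my assumptions about a reasonable setup, not anything taken from Steig's code:

    % Assuming 'aws' and 'surf' are already loaded as months-by-stations matrices
    % covering 1957-2006, one row per month, with NaN marking missing values.
    X = [aws, surf];                        % 63 AWS columns on the left, surface stations on the right
    assert(size(aws, 2) == 63);             % 63 AWS series
    assert(size(aws, 1) == size(surf, 1));  % rows (months) must line up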

My next step was to use Matlab to 'cram' the data through RegEM and generate some outputs along the lines of Steig et al., guessing at parameters to find what makes the algorithm converge.
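
For what it's worth, the call I ended up guessing at looked roughly like this. The option names (regress, neigs, regpar) come from the regem.m documentation; the specific values are my guesses, not settings confirmed anywhere in the paper:

    % Guessed settings; 'ttls' selects truncated total least squares in regem.m.
    OPTIONS.regress = 'ttls';
    OPTIONS.neigs   = 3;    % number of eigenvalue/eigenvector pairs to compute (a guess)
    OPTIONS.regpar  = 1;    % TTLS truncation parameter (a guess; see below)
    [Xfilled, M, C] = regem(X, OPTIONS);    % X is the combined AWS + surface matrix
    % Xfilled is X with its missing (NaN) entries imputed.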

After several hours, I succeeded.

In the end I found that the important parameter was OPTIONS.regpar, which had an amazing amount of control over the final trend. In the methods section the paper states:

Principal component analysis of the weather station data produces results similar to those of the satellite data analysis, yielding three separable principal components. We therefore used the RegEM algorithm with a cut-off parameter k=3. A disadvantage of excluding higher-order terms (k>3) is that this fails to fully capture the variance in the Antarctic Peninsula region. We accept this tradeoff because the Peninsula is already the best-observed region of the Antarctic.

k = 3 corresponds to an OPTIONS.neigs value of 3. This means they want to define the temperatures of an entire continent by linear combinations of only 3 trends. You have to wonder why only 3. As far as I know, there is no reference in the paper, the SI, or anywhere else I could find to the actual value of OPTIONS.regpar.

So, is the Antarctic warming?

[Figures 1-9: AWS reconstructions with trends, for neigs = 3, 4, and 10 at varying regpar values (recon-3-1, recon-3-2, recon-3-3, recon-4-11, recon-4-2, recon-4-3, recon-4-4, recon-10-1, recon-10-21)]

Trends are included in the graphs. Besides that, what do the graphs show? Well, to me they show that setting the regpar value, the truncated total least squares (TTLS) truncation parameter, to anything greater than 1 results in a flat or negative trend, which is more consistent with John Christy's work. With only 3 PCs for an entire continent, as in the first 3 graphs, allowing regpar to be a value other than 1 doesn't hurt the overall appearance, but the trend is lost. The same thing happens with 4 PCs or even 10 PCs. A regpar value of 1 makes it work.
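
To make the regpar dependence concrete, here is a rough sketch of the kind of loop behind the graphs above: rerun RegEM at a fixed neigs for several regpar values and fit a least-squares line to a simple mean of the imputed AWS columns. The unweighted averaging and the trend fit are my own quick choices, not a reproduction of Steig's area weighting:

    OPTIONS.regress = 'ttls';
    OPTIONS.neigs   = 3;                   % fixed for this set of runs
    for rp = 1:3
        OPTIONS.regpar = rp;               % TTLS truncation parameter
        Xf    = regem(X, OPTIONS);         % impute the missing values
        recon = mean(Xf(:, 1:63), 2);      % unweighted mean of the 63 AWS columns
        t     = (1:length(recon))';        % months since the start of the record
        b     = polyfit(t, recon, 1);      % least-squares slope in deg C per month
        fprintf('regpar = %d: trend = %.4f deg C/decade\n', rp, b(1) * 120);
    end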

The other thing you can see is that higher degrees of freedom in regpar create a huge step in the data prior to the AWS stations coming online. Figure 9, and especially Figure 6 (three from the end), demonstrate this.

Now, it is fair to say that I haven't verified the quality of these trends or how well they match the reconstruction presented; that is for tomorrow. I also don't know what value was used for regpar. What I do know is that this effect apparently isn't mentioned in the paper and makes a substantial difference in the final result. From my trends above, 6 of 9 graphs with parameter values in the range required by Steig et al. produce flat or negative trends. Had I plotted regpar of 3 through 9 in the last two neigs=10 graphs, rest assured I would add another 6 negative trends.

I need to gain an improved physical understanding of the regpar value. As I understand it now, a higher number gives more degrees of freedom to the linear fit. If my interpretation is reasonably correct, there should be a section in the paper describing these results and explaining them away as overfitting of the trend at higher regpar.

Still, I'm not sure yet, so, as in politics, I reserve the right to revise and extend my remarks.

If I had the code, this analysis wouldn't have required all three of R, Matlab, and Excel.

I'm exhausted, so I'm going to bed. I hope the wording is intelligible.

Thanks to Steve McIntyre for suggesting I do this analysis and helping me figure it out, a special thanks to RomanM, who patiently answered my questions, and thanks to Eric Steig, who made this so much more entertaining than it should be.

Consider that my post wouldn't be sitting here, using Eric Steig's math with Steig et al.'s data and 6 of 9 graphs showing a flat or negative Antarctic temperature trend, if they had just shown me the code and data as I politely requested.

Oh, and I nearly forgot: I'll release my code for how I did this after Eric Steig releases his (or tomorrow afternoon, whichever comes first).

23 thoughts on “Deconstructing a Reconstruction of Temperature Trend”

  1. Good job, Jeff, but who is going to pay you for your work?
    You pulled off this job – checking the basis of Steig et al.'s work – in, let's say, 24 hours, for nothing. Steig et al. probably received the equivalent of 2 months of salary for their work.

  2. I received an amusing error message when trying to make that last post:

    Goshdarnit!

    Something has gone wrong with our servers. It’s probably Matt’s fault.

    We’ve just been notified of the problem.

    Hopefully this should be fixed ASAP, so kindly reload in a minute and things should be back to normal.

    Who is Matt? Enquiring minds want to know! 🙂

  3. I just read on the “Steig Corrections” thread on CA that RegEM was designed as an interpolation tool. It was never intended for extrapolation…

  4. I messed up on the formatting, #4 was supposed to be a response to your comment:

    “The other thing you can see is that higher degrees of freedom in regpar create a huge step in the data prior to the AWS stations coming online.”

  5. One wonders if Steig considers Jeff worthy of receiving his code yet…

    He may have to defeat him in a light sabre duel. Rumor has it Steig hid the code in a droid.

  6. #5

    That’s what I understand, M08 used it as an extrapolation mechanism also. Now that I can run it I might be able to figure out how RegEm was set up for M08.

    #7 – I do like Star Wars references; funny stuff. As I fell asleep, I considered actually putting up a pic of Vader and Skywalker with blades crashing instead of the penguin.

    Seriously though, this imputation algorithm seems pretty sensitive to its setup parameters in this case. In my humble opinion, there isn't enough data in these matrices to support the claims. I guess I don't know the whole story yet, but when you see the matrix laid out before conversion, and you find out that it simply didn't work when you go back any further in time, it's clear this algorithm has been pushed right to its limit.

  7. Jeff, pondering the following info on “regpar” from the RegEM doc, I would suggest sqrt(eps) is a very small real number << 1, reflecting the intended precision, as it is used like that in numerical programs.

    % OPTIONS.regpar Regularization parameter. not set
    % For ridge regression, set regpar to
    % sqrt(eps) for mild regularization; leave
    % regpar unset for GCV selection of
    % regularization parameters.
    % For TTLS regression, regpar must be set
    % and is a fixed truncation parameter.

  8. This is great stuff. Since programming is not my area, I only half understand half of half of what you write. Nevertheless, I get the overall gist of it, and your work is invaluable.

  9. I don’t even get half of half of half. Is there a “Dummies” version for the layman?

    I can write a webpage by hand, but that’s not really programming. If that’s what it primarily entails, it may be over my head anyways.

  10. Hugo, I saw that in the RegEM file. I'll try lower values when I'm comparing against the final result, but Gavin added a bit more confusion for me in a rare successful pass through moderation at RC. It had a link, so we'll likely get a pile of RC regulars. Here's the exchange:

    I got RegEM to work. My last comment didn't show up in moderation, so cut this one if you need to 😉. Since I don't have the parameters you used, these were my best guesses.

    https://noconsensus.wordpress.com/2009/02/10/deconstructing-a-reconstruction-of-temperature-trend/

    [Response: Well done. Now you have learned something. You now also have the means to test what the results are and are not sensitive to – try looking at monthly anomalies, rather than raw values, the impact of including more eigenvalues, and the various other issues discussed in Tapio's papers. – gavin]

    I'm not sure what I learned from guessing at the steps; I think he doesn't know me very well yet. Anyway, he suggests using anomalies instead of temperatures, so I sent him back a quote from the methods section, which is confusing:

    Are you saying the AWS data is normalized prior to usage? I wasn't sure from this chunk in methods, where it says —

    “and using the READER occupied weather station temperature data to produce a reconstruction of temperature at each AWS site for the period 1957-2006.”

    Would you mind telling me what value was used for OPTIONS.regpar?
    This one has me pretty confused; what would you consider an appropriate value for a TTLS run?

    Finally, I would also be interested in running the satellite data and processing algorithm to mask for clouds. Is there some ETA on that?

    So far nothing yet.

  11. So, am I reading things correctly?
    1) By matching the trends as presented, you will also determine the value of regpar used in the paper.
    2) The value of regpar likely has to be at some level greater than 1 in order to result in positive trends. However…
    3) Values of regpar greater than 1 will reduce the degrees of freedom and therefore the confidence that positive trends are realistic.
    4) No explanation or justification from the authors for the values of regpar that produce the trend has been seen so far.

    Let me know if I am off on this.

  12. Jeff,

    This material is way outside my experience and expertise arenas and over my head. But I see that convergence is mentioned relative to the RegEM methods. I suggest that the sensitivity of the calculated results to the stopping criterion might be interesting to see.

    Just a thought, and very likely not even close to being useful.

  13. Roman and Gavin both recommended using the anomaly data instead of the raw temperature data; a rough sketch of that step is below. I plan to do that tonight if I get time – there is a hockey game, after all. It may improve the sensitivity to different factors, but Layman is right about his point #1, which takes a bunch of time.

    I should bill 'em my normal hourly rate. I mean, what's the big deal with releasing the processing code and results? If the paper is good, nothing I say will change it. Even if it isn't good, what I say will not have much effect.
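
    A rough sketch of the anomaly step (my own simple version: subtract each station's calendar-month mean before running RegEM; it assumes the record starts in January and that nanmean from the Statistics Toolbox is available, and it is not a confirmed recipe from the paper):

    % X is months x stations, assumed to start in January; NaN marks missing values.
    Xanom = X;
    for m = 1:12
        rows = m:12:size(X, 1);                        % all Januaries, all Februaries, ...
        clim = nanmean(X(rows, :), 1);                 % per-station monthly climatology
        Xanom(rows, :) = X(rows, :) - repmat(clim, length(rows), 1);
    end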

  14. Jeff,

    It appears that neigs is not the parameter to use to specify the number of eigenvectors used in the TTLS solution. It’s only used to truncate the eigenproblem solution to avoid long run times.

    Instead, the regpar parameter is used to specify the number of eigenvectors to carry in the solution. Note at line 206 of regem.m that parameter trunc is set as

    trunc = min([options.regpar, n-1, p]);

    This parameter is then passed as dummy variable r into pttls.m:

    % [Xr, Sr, rho, eta] = PTTLS(V, d, colA, colB, r) computes the
    % minimum-norm solution Xr of the TLS problem, truncated at rank r

    So Steig specifies OPTIONS.regpar=3 based on this quote from the SI:

    We apply the same method as for the TIR-based reconstruction to the AWS-data, using RegEM with k = 3 and using the READER occupied weather station temperature data to produce a reconstruction of temperature at each AWS site for the period 1957-2006.

    Suggest you ignore neigs (leave it blank and solve the whole eigenproblem).
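
    In other words, a sketch of the settings this implies (the call shape assumes the standard regem.m interface; it is a sketch, not Steig's script):

    OPTIONS.regress = 'ttls';
    OPTIONS.regpar  = 3;    % k = 3 from the SI: the TTLS truncation rank
    % OPTIONS.neigs is deliberately left unset, so all nonzero eigenmodes are computed
    Xf = regem(X, OPTIONS);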

  15. Molon Labe,

    Really nice work. I read that same line but never noticed the comment.

    I think there is another interpretation for what you found.

    First, read the neigs parameter doc:

    % OPTIONS.neigs Number of eigenvalue-eigenvector pairs not set
    % to be computed for TTLS regression.
    % By default, all nonzero eigenvalues and
    % corresponding eigenvectors are computed.
    % By computing fewer (neigs) eigenvectors,
    % the computations can be accelerated, but
    % the residual covariance matrices become
    % inaccurate. Consequently, the residual
    % covariance matrices underestimate the
    % imputation error conditional covariance
    % matrices more and more as neigs is
    % decreased.

    The keyword for me was rank. I didn't realize I was looking at a rank; when I was playing around with different values, I saw some effects which made me think it was just an integer value. This explains my choices for the post. From your find, I did some Google work.

    I’m sure this is the coolest truncated least squares demo on the planet

    Click to access Sima.pdf

    I think neigs is right but you have given me what I needed to figure out regpar.

    Thanks.

  16. Note how neigs is used:

    regem.m line 407: [V, d] = peigs(C, neigs);

    It’s only used in computing the eigendecomposition. This command says to compute the first neigs eigenmodes of the correlation matrix, C.

    You must have neigs > regpar. And regpar is the integer number of modes to retain in the TTLS analysis.

  17. Thanks for your help again, it’s really appreciated.

    As in my plots above, you are correct: neigs must be greater than or equal to regpar.

    I interpret the neigs value as a limitation to 3 PCs to represent the missing values. I think you may be right that the rank for the TTLS can eliminate eigenvalues by default, but I'm not certain.

    From what you’re saying I need neigs=3 and regpar=3 correct?

  18. OPTIONS.neigs value of 2.
    OPTIONS.regpar value of 2.
    OPTIONS.neigs value of 1.
    OPTIONS.regpar value of 1.
    I would like to see the output with these values.
    It looks like regpar at 1 messes with the later negative values, and at 2 across the board.
    Does it function at a value of 0? Say 2-0 or 4-0?
    Just asking, LOL!
    TTLS might need a 12, hmm…
    Last note: it looks to me like another devious way to make up numbers to look like what you want!

  19. You need neigs > 3 and preferably >> 3. The only reason not to do it is runtime. After you determine the first 3 eigenmodes, everything left over is orthogonal (under transformation by C) to the subspace spanned by the first three modes. However, the “size” of this residual subspace apparently factors into the uncertainty estimates. I think that's what they mean by this comment on neigs (regem.m line 155):

    Consequently, the residual covariance matrices underestimate the imputation
    error conditional covariance matrices more and more as neigs is
    decreased.

    I’m sure you will see that as neigs increases with constant regpar, results converge.

    Also, I suggest you not scoff at the notion that only three modes are used to characterize the temperature field. In reactor physics, the entire reactor core time-dependent neutron flux distribution is adequately characterized with only the single fundamental mode solution of a (generalized) eigenvalue problem.

    What is much more important is the possibility that no positive trend is observed until they get to k=3. The notion that this sparse data contains enough information to resolve the 3rd eigenmode is fatuous.
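
    A quick sketch of that convergence check (my own version, not from the paper): rerun at a fixed regpar and watch the imputed values stop changing as neigs grows.

    OPTIONS.regress = 'ttls';
    OPTIONS.regpar  = 3;
    prev = [];
    for ne = [3 5 10 20]
        OPTIONS.neigs = ne;
        Xf = regem(X, OPTIONS);
        if ~isempty(prev)
            fprintf('neigs = %2d: max change vs previous run = %.4g\n', ...
                    ne, max(abs(Xf(:) - prev(:))));
        end
        prev = Xf;
    end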

  20. #21,

    Thanks again. The results do converge visually with higher neigs, as I would also expect. I need to spend some time studying the math to consider your words and understand whether that is what they did, but you have helped in a big way.

    If you look at temperature data, the local effects of ocean currents, winds, and other influences create fluctuations in the way the temperature is distributed over a land mass. These fluctuations are bounded by chaotic weather patterns, with the land mass providing a porous but more stable boundary. Since the weather pattern is irregular, the danger of 3 PCs becomes the division of the temperature fields/response at a mushy, poorly defined border, creating an inaccurate trend. Maybe I'm wrong, but if we had the data it would be easy enough to find out.

  21. Jeff, after they do these decompositions, they then appear to calculate a geographically weighted average – illustrated in an RC diagram but not described, to my knowledge. To the extent that this is the “signal” of interest, it's not obvious to me that lower-order PCs actually help; indeed, I can think of situations where doing so attenuates the amplitude of the signal and worsens the results. RegEM is not as bad as a multiple inverse regression, but it appears to me to be partway down the road as compared to more straightforward averaging/PC1 methods, which are more plausible to me. In this case, regpar=1 is definitely worth looking at long and hard as a base case.
