tAV to REALCLIMATE: YOU CAN’T GET THERE FROM HERE

A few notes from Jeff,

First, this is exciting work Ryan has done. The Air Vent has been lucky to have great guest posts lately. The work by Ryan and Nic has been excellent in improving my own understanding of the original Steig et al. reconstruction and of methods for improving it.

Ryan has been working on a concept of his to improve two aspects of the original reconstruction. The first one I want to mention is that the imputation of Steig et al. is actually backwards. The Steig reconstruction is a reconstruction of satellite surface skin temperature (AVHRR) rather than the surface temperature reconstruction it is billed as. It needs to be recalibrated to match surface station trends to be considered a valid surface station reconstruction. As we know from previous work here, the trends of the Steig reconstruction are substantially different from surface station trends. The second point has to do with RegEM converging to a local rather than the global minimum.

Ryan employs a weighting method in a TSVD reconstruction that involves two separate steps.

The method Ryan used to correct for both of these problems involves pre-weighting the surface stations relative to the satellite information. By applying a large, equal multiplier to the surface stations relative to the satellite PCs, Ryan gives a strong weighting to the surface stations versus the satellite data and eliminates the need for a post-reconstruction calibration.

The second weighting is more clever and more important to climate science. While Ryan explains it pretty well below, sometimes two explanations help communicate an idea, and this mathematical step should be very important in EM processing of similar data fields. In EM, where large quantities of data are missing, spurious correlations can occur and a non-global minimum can be reached. In this case large portions of the data are missing. Imputing one PC at a time, Ryan weighted the individual surface stations by the PCA eigenvector weighting of the original AVHRR data. This means areas with low information content for the PC are less likely to accidentally become heavily weighted as the iteration progresses – very important. Think of it as an improved starting point for the iteration, or an improved station location in the imputation, rather than letting RegEM figure out where stations belong.

Ryan, please feel free to correct any details you see wrong with this description.

Of course Ryan couldn’t resist a little discussion with RC 😉 .  Read on, I think you’ll like it.

============================================================================

Guest post by Ryan O

Most of you are aware that Dr. Steig posted a response to our reconstructions over at RealClimate. The link is here: http://www.realclimate.org/index.php/archives/2009/06/on-overfitting/

There were two salient points in his post that we should look at. One point was that “someone called Ryan O” had obtained better verification statistics by first calibrating the satellite data to ground data. This point is easily addressed. To all RealClimate readers: everything that follows was done using the cloud masked AVHRR data provided by Dr. Steig as-is. No calibration. In fact, given the way our reconstructions are done, any such calibration would not affect the results. This, too, will be shown later.

The second point is that Dr. Steig claimed that the verification statistics were degraded as additional AVHRR PCs were included. This is certainly true, if you use the original tools (RegEM TTLS) and the original methodology (impute the whole mess at once). I think this point was lost at RealClimate. There are problems with the method and the math behind the method – so we changed the method to address these issues.

Here is a short summary of the issues:

1. TTLS assumes errors in both the predictors and predictands. However, an error in a PC (which is an abstract quantity) does not mean the same thing as an error in a temperature measurement at a specific location. Additionally, if the pre-1982 portion is calculated based on assuming errors in the ground stations and the PCs, then it is inappropriate to simply add the original, unmodified PCs onto the end. The post-1982 solution needs to be calculated the same way as the pre-1982 solution: assuming errors in both.

2. While Steig refers to the ground stations as the predictors and the satellite PCs as the predictands, in their method, this is not strictly true. Any existing values are the predictors and the missing values are the predictands. This means the ground stations and satellite PCs are both predictors and predictands. Not only that, but the satellite PCs affect each other’s imputation by interacting with each other and with the ground stations.

3. Because the solution is based on the truncated SVD of the correlation matrix, the pre-1982 portion of the AVHRR PCs is not truly an extrapolation of the PCs. It is a rotation of the PCs to the ground station data. This means that the original AVHRR PCs should not simply be tacked on to the end; the rotated PCs should be used from 1957 to 2006. The standard RegEM TTLS algorithm does not return the rotated (unspliced) solution (though Nic L’s modification does return it). The problem could be set up as a true extrapolation, but Steig’s method does not accomplish that. (A toy illustration of the splicing issue appears after this list.)

4. The ground stations are used to predict PC values without regard to whether the PC explains any variance at the station location. This is not necessarily a problem – unless you subsequently recover gridded temperatures using the eigenvector. Because Steig uses the eigenvectors to recover gridded temperatures, then the eigenvector must be used to constrain the imputation. RegEM TTLS has no means of doing this.

5. An insufficient number of PCs is used. The claim that 3 PCs can represent land the size of Antarctica, when the ERSST reconstructions required 15+ PCs to represent open ocean areas of equivalent size, defies belief.
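
To make the splicing issue in item 3 concrete, here is a toy sketch in R. The vector lengths and values are purely illustrative and have nothing to do with the real data:

```r
# Stand-in for one AVHRR PC: missing before 1982, observed afterwards.
pc.orig    <- c(rep(NA, 300), rnorm(300))
# Stand-in for a full-length imputed solution, in which the post-1982
# portion has been rotated toward the ground station data.
pc.rotated <- rnorm(600)

# Steig-style splice: original values where they exist, imputed elsewhere.
pc.spliced   <- ifelse(is.na(pc.orig), pc.rotated, pc.orig)
# Unspliced alternative: the rotated solution over the whole 1957-2006 period.
pc.unspliced <- pc.rotated
```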

Some of these issues cannot be resolved when using RegEM TTLS. To that end, we started using a different imputation tool based on a truncated SVD approach (originally written for R by Steve McIntyre). The benefits of the truncated SVD approach are that it is faster, allows direct access to the unspliced solution, and is simpler to understand. That last benefit is important because we will discover that we need to modify the truncated SVD approach to address some of the methodological problems.
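
To give a feel for what such an imputation does, here is a minimal sketch of TSVD infilling in R. It illustrates the general approach only – it is not Steve McIntyre’s actual implementation, and the function name and defaults are mine:

```r
# Minimal truncated-SVD infilling. X is a matrix (rows = months,
# cols = series) with NAs marking missing values; n is the number of
# retained EOFs (the regpar equivalent).
tsvd.impute <- function(X, n, start = NULL, tol = 1e-6, max.iter = 500) {
  miss <- is.na(X)
  Xf <- X
  Xf[miss] <- if (is.null(start)) 0 else start[miss]  # zero infill by default
  for (i in 1:max.iter) {
    s  <- svd(Xf, nu = n, nv = n)
    Xn <- s$u %*% diag(s$d[1:n], n, n) %*% t(s$v)     # rank-n approximation
    if (max(abs(Xn[miss] - Xf[miss])) < tol) break    # converged?
    Xf[miss] <- Xn[miss]                              # update missing cells only
  }
  Xf
}
```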

But as we intend to publish our results, we do need to make some direct comparisons of RegEM and truncated SVD in order to show that the solutions returned by TSVD are as good as – or better than – RegEM’s. Speed and convenience do not matter if the solution is crap.

To make this comparison, I performed a series of ground station imputations using RegEM and the standard TSVD from Steve McIntyre. No AVHRR PCs were included in these imputations – the purpose was to determine how well the methods compare to each other. This method of comparing reconstruction verification statistics for different settings of “n” (equivalent to regpar) to decide the optimal setting comes straight out of Beckers and Rixen (2003) here:

http://ams.allenpress.com/archive/1520-0426/20/12/pdf/i1520-0426-20-12-1839.pdf
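
As an illustration of the split-test idea, the sketch below knocks out known values, imputes them, and scores the result against the withheld truth. It reuses the hypothetical tsvd.impute() from the sketch above; the matrix and knockout fraction are stand-ins:

```r
set.seed(42)
X <- matrix(rnorm(600 * 40), 600, 40)                # stand-in station anomalies
knockout <- matrix(runif(length(X)) < 0.3, nrow(X))  # withhold ~30% of the cells
X.cal <- X
X.cal[knockout] <- NA                                # hide the withheld values
X.rec <- tsvd.impute(X.cal, n = 5)                   # impute from the remainder
r2 <- cor(X[knockout], X.rec[knockout])^2            # verification r^2 vs. truth
```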

The ground station sets used were:

1. All READER stations (set “FULL”)

2. S09 (Steig et al., 2009) stations (set “S09”)

3. No ocean stations (set “NO OCEAN”)

   1. All manned and AWS stations (variant 1A)

   2. 1A minus Deception and Adelaide (variant 1B)

   3. 1B minus all stations with less than 96 months of history (variant 1C)

   4. 1A minus all AWS stations (variant 2A)

   5. 1B minus all AWS stations (variant 2B)

   6. 1C minus all AWS stations (variant 2C)

4. Stations on the AVHRR grid (set “GRID”)

   1. All manned and AWS stations (variant 1A)

   2. 1A minus Deception and Adelaide (variant 1B)

   3. 1B minus all stations with less than 96 months of history (variant 1C)

   4. 1A minus all AWS stations (variant 2A)

   5. 1B minus all AWS stations (variant 2B)

   6. 1C minus all AWS stations (variant 2C)

This is a total of 14 different ground station matrices. For RegEM, each one was imputed starting at regpar=1 up to regpar=9 (not all sets had solutions at regpar settings greater than 9). Split calibration/verification tests were performed at each regpar setting, with some station data withheld during the imputation.

The total number of reconstructions performed for RegEM, then, is:

14 sets X 9 regpar settings X 3 (split calibration/verification) = 378

Which took nearly 28 hours to finish.

The same series of sets were imputed using TSVD. TSVD did not have issues with computational singularity at high settings of n (regpar equivalent), so reconstructions were performed up to n=15. The total number of reconstructions for TSVD, then, is:

14 sets X 15 settings of “n” X 3 (split calibration/verification) = 630

This only took about 14 hours to finish.

The results of the comparison are below:

Fig_1
Fig. 1: RegEM (left) vs. Standard TSVD (right) in split reconstruction verification experiments

From this, we immediately see the following:

1. RegEM and TSVD perform nearly equivalently for equivalent station sets and regpar=n settings.

2. Up to regpar=5, RegEM shows slightly better verification statistics.

3. After regpar=n=5, verification statistics for both RegEM and TSVD drop dramatically, indicating overfitting.

This means that both RegEM and TSVD are somewhat limited. Even if we knew beforehand that more than 5 EOFs were needed to explain the station covariance, we would be restricted by our imputation algorithms to some number less than that. This is the source of the “Which Antarctica Do You Want” post, in which I showed that the RegEM and TSVD solutions were unstable. Small changes to regpar or n result in massive changes in the resulting reconstruction.

ITERATIVE TRUNCATED SVD

This is the first major departure from the Steig method. I was unhappy with the performance of both RegEM and the standard TSVD imputation. I suspected that the source of the poor performance was related to the initial infill of missing values with zeros. To that end, I modified the TSVD algorithm to approach “n” (equivalent to the regpar setting in RegEM) iteratively – starting at n=1. The initial infill at n=2 uses the converged solution from n=1, and so on.
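
In code, the iterative idea is a thin wrapper around the TSVD sketch shown earlier (again, an illustration of the concept rather than the posted script):

```r
# Iterative TSVD: converge the infill at n = 1, then reuse that converged
# infill as the starting guess for n = 2, and so on up to n.max.
tsvd.iterative <- function(X, n.max) {
  Xf <- NULL                         # NULL start means zero infill at n = 1
  for (n in 1:n.max) Xf <- tsvd.impute(X, n, start = Xf)
  Xf
}
```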

To show that this actually works, I performed the same set of tests as I did on RegEM and standard TSVD. This added another 630 reconstructions to my repertoire! The results – with the standard TSVD shown for comparison – are below:

Fig_2
Fig. 2: Iterative TSVD (left) vs. standard TSVD (right) in split reconstruction verification experiments

The improvement in performance is dramatic. Iterative TSVD provides greater maximum values of r (0.73), RE (0.54), and CE (0.48) than either RegEM (0.64, 0.44, 0.42) or standard TSVD (0.63, 0.46, 0.42), is much more stable, and allows use of higher values for n.

So if anyone at RealClimate is still reading, you can’t get there by using RegEM. Dr. Steig will never be able to replicate our results using RegEM – because RegEM simply is not as accurate as our Iterative TSVD method.

Also note that the station selection by S09 is not the optimal station selection. In all of the experiments, the No Ocean 1C and Grid 1C variants showed significantly better verification statistics than the station selection by S09. There simply are not enough stations (42) to capture enough of the actual covariance. More stations are needed to constrain the solution. For No Ocean and Grid sets 1C, all stations (AWS and manned) with greater than 96 months of data are included. This provides 72 and 64 stations, respectively.

From this series of experiments, we have determined the optimal means of imputing the ground stations:

* Use station set No Ocean 1C or Grid 1C

* Use Iterative TSVD

* Impute at n=7

Fig_3
Fig. 3: Minimum values obtained for split reconstruction verification experiments for set No Ocean 1C, Iterative TSVD, n=7.
Fig_4
Fig. 4: Minimum values obtained for split reconstruction verification experiments for set Grid 1C, Iterative TSVD, n=7.

AVHRR PC IMPUTATION

Now that we know the most accurate combination for our ground station imputation, we can determine the most accurate combination for the AVHRR principal component imputation. First, we will not use either RegEM TTLS or standard TSVD, as the Iterative TSVD method is inherently more accurate. That eliminates two possible methods. Using Iterative TSVD, there are still 3 additional possibilities:

1. Imputing all PCs at the same time (simultaneous). This helps constrain the solution as they are all mutually orthogonal post-1982. However, this does allow the PCs to interact with each other and prevents the PCs from fully rotating to the ground station solution.

2. Imputing each PC individually (separate). While this does not allow the PCs to interact, the solution is less constrained because there is no post-1982 overlap to help maintain orthogonality.

3. Imputing each PC individually, but weighting the station data by the corresponding eigenvector weight at the station location (separate, weighted). This both prevents the PCs from interacting and constrains the solution by increasing the importance of stations in regions where the eigenvector explains the most variance. It also prevents an additional problem – present in both the Steig method and the two methods above – where stations in areas where the eigenvector coefficients are small drive the imputation of the PCs.
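
A rough sketch of option 3, reusing the hypothetical tsvd.impute() from earlier. The names, and the choice to weight by the raw eigenvector coefficients, are illustrative assumptions rather than the exact script:

```r
# stations: fully imputed months x stations matrix from the ground step
# pc:       one AVHRR PC, NA before 1982
# w:        eigenvector coefficients at each station's grid cell
impute.pc.weighted <- function(stations, pc, w, n) {
  Xw <- sweep(stations, 2, w, `*`)   # emphasize stations where this
                                     # eigenvector explains the most variance
  X  <- cbind(pc, Xw)                # the PC is the only column with NAs
  tsvd.impute(X, n)[, 1]             # return the rotated, fully populated PC
}
```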

The test was performed similarly to the previous tests. Reconstructions were performed at n=1 to n=15. Five different combinations of AVHRR PCs were used – 3, 7, 13, 21, and 28 – which correspond to breaks in the eigenvalue spectrum that fulfill or nearly fulfill the North criterion. This was performed for the No Ocean 1C, Grid 1C, and S09 station sets at the optimal settings. The total number of reconstructions performed was:

3 station sets X 15 values for “n” X 5 different AVHRR PC sets X 3 imputation options X 3 (split verification/calibration) = 2,025

I will show a panel for the Grid 1C set, which overall had the best verification statistics. Note that although up to this point we have made all of our selections based on best performance against the ground data, in this case the verification is performed against the satellite data.


Fig_5
Fig. 5: Split reconstruction/verification experiments vs. satellite data. Black lines represent the eigenvector-weighted method; red, the separate unweighted method; and green, the simultaneous method. The blue line represents the method with the best overall verification performance, which is the 28-PC, eigenvector-weighted reconstruction.

Note the evidence of overfitting in the top left panel – you can see that the separate and simultaneous methods continue to increase the explained variance statistic while sacrificing accuracy as compared to the withheld data.

Also note how stable the eigenvector-weighted solution is. There is almost no change in verification performance as n is increased. We had observed this in the past – both here and in panels I posted at CA – where the eigenvector weighting produced nearly identical results over a large range of values for n.

The last observation is that, contrary to Steig’s claim that including additional AVHRR PCs leads to overfitting, we find that additional AVHRR PCs lead to better verification performance. This makes sense when we consider that the point of the PC decomposition of the AVHRR data was simply to approximate the data set. The better the approximation, the better the predictive ability.
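
A toy example shows why: the root-mean-square error of a rank-k SVD truncation can only shrink as k grows. The random matrix below is a stand-in, not the AVHRR data:

```r
set.seed(1)
X <- matrix(rnorm(200 * 50), 200, 50)   # stand-in for the AVHRR anomaly field
s <- svd(X)
rmse <- sapply(c(3, 7, 13, 21, 28), function(k)
  sqrt(mean((X - s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k]))^2)))
rmse                                    # monotonically decreasing in k
```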

Steig’s claim about the AVHRR PCs is not an axiomatically true statement. It is only true because the tool used (RegEM) and the method used (simultaneous calibration and prediction of the PCs and ground stations) are mathematically suspect and produce suboptimal verification statistics. It is true because the method is poor.

CONCLUSIONS

At this point, it should be clear that our method and tools provide significantly improved performance as compared to Steig’s method and tools. It should also be clear that Iterative TSVD provides superior predictive ability as compared to RegEM TTLS, at least in cases where a large number of values are missing. In my opinion, there is no reason to use RegEM TTLS at all. Standard TSVD provides nearly identical performance at a fraction of the computational cost, and Iterative TSVD provides much improved performance (still at a fraction of the computational cost).

It is also clear that, if one wishes to retain the eigenvector to reconstruct the field, then the predictors must be weighted by the eigenvector at their locations to provide the best predictive ability. The Mannian / Steig method of regressing the PCs against unweighted predictors results in reduced verification performance. In this specific case, the performance is significantly reduced.

So what does our optimal solution of:

* Station set Grid 1C, fully imputed at n=7 using Iterative TSVD

* 28 AVHRR PCs

* PCs imputed separately with the stations weighted by the eigenvector coefficients at n=10 using Iterative TSVD

look like?


Fig_6
Fig. 6: 1957-2006 trends using optimal ground station set and Iterative TSVD with eigenvector-weighted prediction of AVHRR PCs.
Fig_7
Fig. 7: 1967-2006 trends using optimal ground station set and Iterative TSVD with eigenvector-weighted prediction of AVHRR PCs.
Fig_8
Fig. 8: 1957-2006 and 1967-2006 trends with confidence intervals (uncorrected for spatial autocorrelation) using optimal ground station set and Iterative TSVD with eigenvector-weighted prediction of AVHRR PCs.

Now the most important part of all of this is – what do our verification statistics say? For context, remember the Steig verification statistics were:

Fig_9
Fig. 9: Minimum verification statistics for S09 reconstruction. Average values are 0.21, 0.11, and 0.09, respectively.

Ours, on the other hand, are much, much better:

Fig_10
Fig. 10: Minimum verification statistics for our reconstruction. Average values are 0.37, 0.37, and 0.32, respectively.

Our r2 is nearly double Steig’s, and our RE and CE scores are more than triple his. For RE, our reconstruction has a total of only 5 pixels with negative values.

We can also compare our performance with respect to explained variance at all on-grid ground station locations. Remember that we only used stations with at least 96 months of data, so any station with fewer than 96 months was withheld. That means that 25 of the stations shown in the following figures were entirely withheld during the reconstruction. Our performance with respect to withheld stations is almost identical to the performance with respect to stations used in the imputation.

Fig_11
Fig. 11: Explained variance for all on-grid stations. 25 of the 89 stations were withheld during the reconstruction process.

How do we compare to Steig’s reconstruction? Steig’s is shown below:

Fig_12
Fig. 12: Explained variance for all on-grid stations for the S09 reconstruction.

The interesting thing is that, even though Steig’s reconstruction contains actual satellite data in the post-1982 timeframe, our reconstruction – which contains no actual satellite data in any period – does better than Steig’s, even with respect to the AWS stations that only exist during the satellite period.

In other words, our reconstruction does a better job of reproducing station data than the raw AVHRR data.

To show just how much better our reconstruction does, examine the following plot – which is the difference between our reconstruction and Steig’s:

Fig_13
Fig. 13: Our reconstruction explained variance minus the S09 reconstruction explained variance.

One Siple away from a perfect score.

*****

To RealClimate: Posting about how the S09 method cannot use more than 3 PCs without overfitting does not have anything to do with our reconstruction. If your “here” always starts with using RegEM, with a calibration/imputation step that is not mathematically proper, and with an improper splice of satellite data onto the end of rotated PCs . . . well, you won’t get there from here. You should recognize that our critique of the paper goes far deeper than simply the number of PCs used.

*****

A publishable version of this is in the works.

CLOSING REMARK

The attached script will allow you to duplicate all of the information presented here. Keep in mind that to make this post, I needed to perform 3,033 reconstructions. Those portions of the script take about 70-100 hours to run. You also need adequate disk space to store the results, which consume about 4 GB.

With that in mind, those portions of the script are commented out. You can still do individual reconstructions using the methods described here via the command “do.recon()” with the appropriate settings. You can also run all the standard verification tests, maps, and graphs.

The version of the Iterative TSVD algorithm in this script has been rewritten to run faster and use MUCH less memory than my first version of the algorithm. That first version caused problems, as I would run my reconstruction test functions for a day before they would suddenly error out due to using too much memory. Frustrating, but in the end, the code is a lot cleaner.

———————————

The 100% ready to go free R code is linked here:  SplitReconFinal.

72 thoughts on “tAV to REALCLIMATE: YOU CAN’T GET THERE FROM HERE”

  1. Ryan, What a great reference paper Beckers and Rixen.

    If we start from the initial guess of zero anomalies at the missing data point, it can be shown (appendix A) that the variance of the dominant modes will generally be increased, whereas the less dominant modes will see their amplitude decreased. This is coherent with the interpretation that the initial guess overestimates “high frequency” signals and underestimates “large scale” signals by disregarding any local information around the missing data point.

  2. “Steig’s claim about the AVHRR PCs is not an axiomatically true statement.”

    Ryan, you shouldn’t use the word “axiomatically” without properly crediting TCO. 😉

    Seriously, it has been fascinating watching your understanding of methods and issues evolve on this project. Thanks for the determination to follow through on this. Since I have been too busy lately for anything other than quick reads (rather than studying the posts) the time you must have invested boggles my mind.

  3. Ryan. What kind of horse-power are you throwing at these calculations? (Apologies if this is already addressed; I’ve not had time to fine-tooth-comb all of your postings)

  4. #1 Yah. When Steve first posted that over at CA I just kind of skimmed it. A little while later, I read it in detail, and it is very applicable to this situation. Nicely written, too. Good to see your HF criticisms validated by a major paper, as the mechanisms in the Antarctic reconstruction are identical to those in Beckers & Rixen’s cloudmasking.

    One minor correction to the introductory text. In TSVD, the SVD is done directly on the data itself. The only reason a covariance matrix is even calculated in TSVD is for scaling. It serves no other purpose. Using the correlation matrix for the SVD is a RegEM TTLS thing. That’s one of the reasons I like the TSVD method so much – it’s very intuitive and it’s much easier to evaluate the results.

    #2 Hahaha! Very true, and not in a snarky way. I did actually think of TCO when I wrote that. 😉

    #3 It’s not really all that much, to be honest. A bunch of SVDs. It’s just so iterative that it takes a while to perform.

  5. #4 Oops – that should be “infilling after cloud masking”, not cloudmasking. Beckers & Rixen were reconstructing the data after cloudmasking had been done.

  6. A work of beauty! Good luck with publication – I hope any niggles get spotted and ironed out but I’d say this is worthy of a cover story 🙂 ….

    … and maybe a taught class module? 😉

    Like Layman – nothing but admiration.

    (btw – another “curious” has cropped up elsewhere in case you see posts which don’t echo the above)

  7. So in the end, a barely statistically significant warming trend over the last 50 years, and no statistically significant warming over the last 40.

  8. Thanks & Good luck with the publication. Let’s all hope this work gets the kind of attention it deserves!

  9. #9 The resemblance is really cool. That is why the real meat of the reconstruction is in the surface station imputation; not the satellite part. The satellite part just provides a prettier picture with a bit better resolution.

    Also, did you see the challenge I posted to TCO on Tamino? I told him I’d kiss and make up if he can figure out why this post shows that negatively weighted predictors result in poor extrapolations compared to positively weighted predictors (assuming the predictors and predictands are measuring the same quantity). I won’t say why this shows that yet – I’ll give him a shot first. 😉

  10. I went and looked. I can think of several reasons, but TCO isn’t going to be able to come up with any. He needs to put in the legwork.

    I’ve tried to drop a link at RC; it doesn’t even make it into moderation.

  11. Ryan O, an excellent presentation of your reconstruction modifications (actually a complete overhaul) that will require a second read by this layperson to fully appreciate. I suspect that, given all your analysis, you are in a better position to understand the weaknesses of his approach than Steig (or his RC cohorts) will be for some time.

    You need not answer, but were you able to have a discussion with Beckers about your reconstruction methods, and can you name with whom you might be coauthoring? Oh, and will you be seeking out TCO’s true identity in order to have him consult on the writing of the paper?

  12. #14 Not yet. Dr. Beckers has not yet responded. I’m hopeful that he will respond. If not, I’ll make another attempt. As he has used the standard TSVD approach very well in the past, I’m most interested in his thoughts on the iterative version. I believe he will like the approach, as it makes selecting the value for “n” less critical.

    For coauthors – I’m hoping that Jeff at least would like to sign on. Once we have a draft, then we might be able to convince others to sign on as well – assuming the draft is worth half a shit.

    #13 BTW – the closest station recon would be a great thing to have in a paper, if only to show that the fancy methods really don’t do uber-dramatically better than simple methods. As such, simpler methods should be used as a reality check on the output of black box methods.

  13. I don’t want to over-step my bounds Ryan (you’re having too much fun with this) but I have to give TCO a hint to your challenge. TCO, the answer is hidden in Ryan’s summary above.

  14. #16 Haha. Yep. In the end, something very useful came out of that weighting discussion. The idea behind the methods here was to show that overall negative weights provide less accurate predictions when the predictors and predictands are measuring the same thing . . . in this case, temperature.

  15. Ryan,

    Most excellent work – hearty congratulations. I will study the paper in more detail, and look at the code, before asking any detailed questions save the one below. I like the basic idea of the iterative truncated SVD method. Once I have got my head round it, I might have a go at trying to modify RegEM to incorporate it, and see how that compares.

    One initial query, with apologies if I have misunderstood things. If you are imputing each of the 28 AVHRR PCs separately using all the ground stations in the chosen set (which is what I think you are saying), then am I right in thinking that the 7 TSVD internal solution PCs will be slightly different for each imputation, so that the 28 imputed AVHRR PCs are not all exact linear combinations of the same 7 TSVD PCs? But will the 28 imputed AVHRR PCs not all be nearly so, unless the single PCs are each given a very high weighting in the TSVD? If so, intuitively I find it surprising that keeping more than 7 AVHRR PCs improves the unspliced solution much, although I can see it would improve the spliced solution (and might well improve the ground station reconstruction if all the AVHRR PCs were imputed simultaneously with the ground stations). But probably I am missing something.

  16. RyanO, I hope in your paper you will add some more meat to two of your comments here. One is properly assessing the effects of autocorrelated data and methodology on your methods; the second, of course, is the significance of the results with respect to properly assessed autocorrelation.

    This is one of the areas where I have found it hard to accept that the “amateurs” are piling on those poor misunderstood realclimatescientists. And yes, I am thinking of the Rahmstorf et al 2007 that Steve and Lucia have so ably shown to be “amateurish.” I know that JeffId would let me make a stronger statement, but I think that would be piling on. 😉

  17. Nic,

    I’m not sure I understand your question. The reconstruction steps are these:

    1. Impute the ground stations without the PCs. The reason for this is that the PCs do not provide any covariance information; that information is contained solely in the eigenvector. Imputing everything together degrades the reconstruction performance (which was shown in the previous reconstruction posts). Because all quantities in this imputation are like quantities (temperature anomalies at a station), the appropriate solution to use is the spliced solution containing all of the original station data.

    2. Impute each AVHRR PC using the fully populated ground station matrix, weighted by the eigenvector. Essentially, this rotates the PC such that the eigenvector explains the maximum possible variance in the ground station information.

    3. Multiply the imputed PCs by their respective eigenvectors and sum.
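
    In code, step 3 is just a matrix product. A sketch, assuming pcs is a months x k matrix of imputed PCs and eofs is a gridcells x k matrix of the matching AVHRR eigenvectors (names illustrative):

    ```r
    recon <- pcs %*% t(eofs)   # months x gridcells reconstructed field
    ```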

    You are right in thinking that the TSVD EOFs are different for each PC that is being imputed. For each PC, the associated eigenvector weights are different, which means that TSVD EOFs will be different because they explain variance at different stations. What I’m not clear on is why you think the number of TSVD EOFs limits the number of AVHRR PCs that can be used.

    In the eigenvector weighted method, each PC is independently regressed against the ground information in the associated eigenvector. The regression results are different for each PC, regardless of how many TSVD EOFs are retained. Even at 1 retained TSVD EOF, including additional AVHRR PCs improves the prediction.

    I’m also unclear as to why you think that simultaneous imputation would be better. You cannot use the spatial information from the PC eigenvectors during simultaneous imputation, because each eigenvector is different. To be able to use the spatial information, the imputation must be done separately for each PC.

  18. Did you mean to show all the data (1957 to 2006) in both graphs of Figure 8? Should the second graph only show 1967 to 2006?

  19. Don’t want to be too cynical, but today’s RealClimate posting is about studies/journal articles that report a new breakthrough, or that overturn previous ideas – but then turn out to be false, or are improved upon by others. This may just be a coincidence, right?

  20. #22 The trend was calculated using just the 67-06 portion even though all the data is shown.

    #21 Haha. I forgot about that. 😉

  21. lol #24 – very prescient of them!

    Nice work Ryan.

    Tco … echo … TCO … Dhogza … echo … echo.

  22. Ryan, you have shed a lot of light on the RegEM “black box”. I think that one of the possible benefits which could come from publication is breaking down the knowledge barrier of “black box” RegEM and paving the way for reviewers to better probe these reconstruction methods before a paper is accepted for publication.

  23. What all we lurkers really want to know after this mammoth effort is: is Antarctica warming or not? Were Steig et al correct or not?

  24. Re previous: is the warming significant (P < 0.05) over the period tested, and – even more interesting and more relevant, if possible – from 1958(?) to June 2009? BTW, thank you very much for this effort Ryan O and Jeff Id.

  25. VG,

    No one knows. There isn’t enough data to know. Ryan has shown that, if one is inclined to try to tease a signal from the inadequate data, there is a much better way to do it than Steig et al chose. But the problem of inadequate information remains.

  26. I wonder if someone would be willing to drop a link in the latest RC thread, which is ironically on topic. I’ve tried twice to no avail; apparently my polite and on-topic comments don’t even make it into moderation for some reason now.

  27. Jeff, FYI I just posted the following comment at RC (awaiting moderation):

    Dr. Steig’s Antarctic warming paper is an example of recent work overturning conventional views of Antarctic climate trends.

    Ryan O has just completed an alternative Antarctic climate reconstruction based on iterative truncated SVD rather than RegEM. Some objections raised by Dr. Steig to Ryan’s previous analysis included (1) the effect of calibrating AVHRR and surface station data and (2) overfitting due to inclusion of additional AVHRR PCs. The latest analysis eliminates the calibration and demonstrates improved verification with the inclusion of additional AVHRR PCs.

    I think that Ryan would appreciate discussion, comments, or criticisms from anyone who might be interested.

    https://noconsensus.wordpress.com/2009/07/06/tav-to-realclimate-you-can%e2%80%99t-get-there-from-here/#more-4420

  28. #20 Ryan,

    Thanks for the clarification. I see now how your method results in quite different TSVD EOFs/PCs for each AVHRR PC being imputed – I had not focussed on the effect of varying the weighting of the surface stations for each separate imputation. So I agree that in the weighted case the number of TSVD EOFs does not limit the number of AVHRR PCs that can usefully be used. In the unweighted case, I still imagine that it would do so. Certainly, I think that for simultaneous AVHRR imputation using RegEM the imputed AVHRR PCs, no matter how many there are, will all be linear combinations of the same 3 TTLS solution PCs. That could possibly be one reason why Steig did not use more AVHRR PCs than the RegEM regularisation parameter (= number of TTLS solution PCs).

    I was by no means saying that simultaneous imputation was better, and I agree that if the surface station imputation has been performed separately, as you have done, it makes sense to impute the AVHRR PCs separately so that you can use the spatial information in the eigenvector. What I was saying was that if the AVHRR PCs were imputed simultaneously with the ground stations (not merely with each other) then having more AVHRR PCs might well improve the ground station reconstruction. Although one cannot use the AVHRR eigenvector spatial weights in that case, in principle with a large number of AVHRR PCs the correlation should be very much higher between an AVHRR PC and a ground station in cases where the eigenvector had a high weight at that ground station’s location, so the spatial information should to a fair extent be captured indirectly. I know that the AVHRR data is not measuring quite the same thing as the ground station data, but it is not impossible in principle that including it when performing the ground station imputation might yield improved results. However, I am quite prepared to accept that in practice that is not the case.

  29. #37

    Mine hung in moderation for quite a while. Then there was a glitch in the website. When the site returned my comment was gone from moderation and I saw that yours was gone. Hopefully, they will put it back up.

  30. I’ve dropped a line to RC (should be UC really!). I’ve no doubt that if anything gets past Charybdis, it’ll attract the blue crayon of “Steig et al is robust.”
    I’ve just been reading “Spycatcher” by Peter Wright; a nice couple of quotes from him.
    A Royal Commission derided Marconi’s vision (about SW radio) as “Amateur Science” and one member concluded that radio was “a finished art.”
    He also notes that
    “It taught me a lesson which stayed with me for life – that on big issues the experts are rarely right.”
    Plus ça change?

  31. In practice, the Team will probably not engage in a public discussion, or respond, until or unless the work is published in a journal of record. Naturally, I would not expect them to fall over themselves getting it favourably peer reviewed to allow such publication.

    Try a math-literate journal. A recurring issue is that there is AGW maths, and rest-of-the-world maths.

  32. I think RC have lost any credibility – why do you bother? I wouldn’t dream of posting anything there.

  33. After seeing Jon P’s comment go down following the website glitch I tried to post again asking the following:

    “I see that Jon P’s comment is gone. Is that due to the glitch?”

    I just got back now to see if it got posted. I guess that RC wants nothing to do with Ryan’s latest post, at least not yet.

  34. #45, Ryan hasn’t been around today. My understanding is that he didn’t correct the sat data for this post. By weighting the stations more heavily than the sat data, the trends are far less affected by satellite trend.

  35. 41 and 42: I also tried asking this simple question at RC:
    “Has anyone at RC had a chance to look at the new Antarctic analysis by Ryan O? It looks very interesting to me.”

    After waiting for moderation for about 10 minutes, it disappeared as well. I’ve had numerous other comments get through, and some censored (but none as innocuous as this).

  36. #45, #46 You guys are correct. Zero adjustments to the satellite data.

    #35 You are exactly correct – assuming that the stations in the high weight areas of the AVHRR eigenvectors are the ones that show the best correlation, and, hence, drive the imputation. Unfortunately, this is not the case in Antarctica. Not all of the stations that show the best correlation to the PCs are located in high-weight areas.

    The other problem with the simultaneous method is that you constrain the solution to be (wholly or partially, depending on your weighting scheme) orthogonal in the post-1982 timeframe. This prevents the PCs from fully rotating to the ground station solution. No matter how you try to slice that one, it always ends up being a blend of surface and satellite information.

    The goal with this particular method was to eliminate the need for any adjustment of the satellite data to ground data by simply letting that occur naturally as part of the imputation.

  37. Blog analyses here and at CA and Lucia’s have an advantage over reading published results in that we are not limited by space and contacts with regards to sensitivity testing of the methods employed. The authors of published papers may have done extensive testing but most of us will probably never know about it. What Ryan O has presented on this thread has to be considered a rare treat in providing non-directly involved participants of the analyses with good insights into the reconstruction methods. The bonus is that Ryan O was able to show and articulate the weaknesses of the Steig reconstruction and then demonstrate a superior one via verification and stability tests and do it in a precise and logical progression.

    I must pose the question of whether Steig et al would have published had they used the reconstruction method suggested here or if Nature would have published it had it been submitted – or given it the cover page.

    I truly believe that some of these climate scientists apply methods which have not been thoroughly tested, and particularly so outside their own field, and when the method produces the “preferred” result they lose the incentive to expend much additional effort on testing the methods.

    I hope we can all be kept updated on efforts to publish the results of these analyses.

    I also hope that Beckers will reply to Ryan. I feel his input could make a difference in getting the work published. Unfortunately, I have been sometimes disappointed with leading figures in an applied field failing to help people who are attempting to publish outside their field and seemingly going against the grain.

    I think RomanM would be a major asset in the publication process.

  38. #49 I have actually received a nice, detailed reply from Dr. Beckers. He agreed that most of what we did was similar to his work. While he uses a different centering method (he centers globally, rather than centering each individual series), he did not see that as a substantive difference.

    He did question our scaling to unit variance, as it artificially inflates the variance of stations with large amounts of incomplete data. In my reply, I explained that unless some kind of scaling is performed, the Peninsula variance dominates and EOFs that explain variance elsewhere in Antarctica are relegated to the high order EOFs and are truncated. The overall verification statistics are degraded as a result. However, he did have an excellent point in that our particular scaling decision is somewhat arbitrary and may materially affect the results. To that end, I think I need to perform the analyses above using different scaling methods, one of which was suggested by Dr. Beckers. The criteria for selecting a scaling method would follow the same logic as above: that which results in the best reproduction of withheld data.
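
    For concreteness, the two centering/scaling choices under discussion look roughly like this in R, where X is a months x stations anomaly matrix (a sketch of the alternatives, not the actual script):

    ```r
    X.std    <- scale(X)                   # center each series and scale it
                                           # to unit variance (our choice)
    X.global <- X - mean(X, na.rm = TRUE)  # Beckers-style global centering
    ```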

    When it comes to the eigenvector-weighted method, he needed some additional clarification to comment. I am grateful that he took the time to respond in as much detail as he did.

  39. Ryan O, thanks for sharing the discussion that you had with Beckers. It sounds like those discussions have the potential of being very productive. I would certainly understand you wanting to check your reconstruction against his suggestion before reporting here.

  40. I got distracted and didn’t post this comment in a timely manner. 3yo’s!

    Dr. Beckers did reply to Ryan by email and was very helpful and supportive. Ryan’s method makes so much sense to me now that I’ve seen it; I would have thought RC would find the discussion interesting enough to at least leave a link – a scientist would swallow their pride and do that anyway. Five bucks says Dr. Steig would. Gavin probably fears giving credibility to work done in blogland.

    Perhaps after it’s published they will; there can’t be any real reason not to, as long as things are written well and not too confrontational – rebuttals always have a tough road. I feel like we definitively know what happened to Antarctic temperature now and are working on improvements. If you compare the blue and red areas in the next post, you can see the station info is VERY close, but slightly shifted in position, compared to the area recon.

  41. Warning explicit comment to follow:

    Take it from this Marine, people at RC are a bunch of fin [snip]. I was polite and short and to the point and they delete comments and links to this site, because they are the worlds biggest f[snip] p[snip] and should be ignored and not engaged.

    F Gavin, Dhogza, Barton, Michael, PaulK, Hank ,et. al they all can go fin pound sand and each other for all I care. You know what we call those type of people in battle? Casualties…

    REPLY: While I agree with the sentiment and appreciate your words let’s keep the dialog clean for the moment. This is a time for engagement rather than distancing. RC has a so far misunderstood opportunity to expand their influence through reasonable engagement, their thinking is that by continuous discrediting somehow a win is garnered. In fairness, this is partly due to the fact that they constantly duck bullets and IMHO a lack of belief in science. The pattern you correctly identified at RC is actually well known and possibly soon to be expanded on for the same incident you mention, so consider this a very soft snip with absolutely no apology required.

  42. #52 Yah, I agree. I don’t think it does any good to attempt to publish this as a rebuttal. Besides, it’s not really a rebuttal in the first place. By using the ground data and satellite data as we have, we are tacitly admitting to the validity of Steig’s concept – we’re just updating the tools and basing ours on firmer mathematical grounds. As Steig himself pointed out, if it is truly a contribution to science, then it would be worth publishing. If it is not a contribution, then it is not worth publishing.

    Also, one thing I would like to point out. We had initially suspected that Steig chose regpar=k=3 because it gave the most warming. I no longer believe that to be true. As the above plots indicate, it was the highest combination of PCs and regpar that their method allowed (given their choice of stations to use) before they began to see overfitting. In other words, based on the criteria that I myself used for the above, if all I had available was RegEM TTLS and the idea that everything should be imputed together, I myself would have chosen regpar=k=3 – or, at most regpar=k=4. I would not have included the southern ocean stations, though . . . but that’s a relatively minor point.

  43. #53

    No worries. When I’m irritated I speak my mind and I’m not all that “elitist” nor very polite about it. My sentiment got through, thank you.

    btw, I would never apologize for speaking the truth about all those little xxxxxx at RC. They make me laugh the cowards.

  44. #23, hmmm… a negative R square – something is wrong.

    If my memory serves me well, my understanding is that Steig et al. employed the principal components method to solve the problem of rank deficiency in the data covariance or correlation matrix when imputing missing values. Isn’t the truncated SVD method just a principal components method? And an issue with TSVD is the “component-cutting” or “eigenvalue discarding.” A straightforward method of selecting the number of components retained is to minimize the MSE. Well, I cannot tell exactly how the TSVD is applied by reading this post. Anyway, it’s great that you would like to improve upon existing research. Good luck!

  45. #23, #58 That is R^2, not r^2. R^2 is average explained variance, not the coefficient of determination. It is a measure of how much variance the fit explains relative to the mean of the actual points. The range on R^2 is -infinity to +1.

  46. I know that the R^2 in Excel can be negative after adjustment – usually when the pre-adjusted R^2 is vanishingly small. I hope Steve M is not reading here.

  47. From the Cook article linked by Ryan O comes a less than comprehensive or informative view of the R^2 statistic versus the use of RE and CE. It would appear that CE and RE originated with the dendros without much input from statisticians. Can anyone enlighten me with a case for using CE and RE without R^2? I believe Ross McKitrick noted in a post somewhere that you could use all three statistics, but not without scoring well on, or qualifying with, R^2. A poor R^2 score with a good RE and/or CE score indicates that you have a problem.

    This is a direct measure of the least squares goodness of fit of the regression model that is achieved here by the minimum AICc criterion. However, R2 is known to be a very poor, biased measure of true goodness of fit when the regression model is applied to data not used for calibration (Cramer 1987; Helland 1987), hence the need for regression model verification tests.

    Among the verification statistics used here, the CE is the most rigorous. The only difference between the RE and CE lies in the denominator term. However, this difference generally makes the CE more difficult to pass (i.e., CE > 0). When x̄_v = x̄_c, CE = RE. But when x̄_v ≠ x̄_c, RE will be greater than the CE by a factor related to that difference. This follows by noting that for the CE, the sum of squares in the denominator is fully corrected because x̄_v is the proper mean. However, for the RE, the denominator sum of squares will not be fully corrected unless the calibration period mean is fortuitously identical to the verification period mean. When this is not the case, the denominator sum of squares of the RE will be larger than that of the CE, resulting in RE greater than CE.
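
    In code, the standard definitions make the denominator difference explicit. A sketch, where obs and pred are verification-period observed and predicted values and cal.mean is the calibration-period mean:

    ```r
    RE <- function(obs, pred, cal.mean)
      1 - sum((obs - pred)^2) / sum((obs - cal.mean)^2)

    CE <- function(obs, pred)
      1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
    ```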

  48. Ryan, is it safe to say that the eigenvector weighting of surface stations in your method will be one of the key points that needs to be supported and argued in a publication?

    I speculated to Jeff a long time ago that the low eigenvector weights of surface station areas were responsible for the “un-natural” pre-1982 Steig PC3 artifact. Spatial grid weighting of station input data constrained this effect and eliminated the un-natural PC3 (I think) by imposing spatial discipline on the station inputs. If this is true, then perhaps an argument which demonstrates the effect of low eigenvector weighting of stations (maybe using Steig’s PC3) could be considered in your publication.

  49. r^2 is a verification period statistic; R^2 is a calibration period statistic.

    I see the difference in notation as you apply it above. I have been too sloppy with my interchangeable use of r^2 and R^2. Thanks for setting me straight.

    Correction to Post #62 above: replace R^2 with r^2.

    I believe it was Cubasch and Burger who pointed out the history of using RE and CE and its limited to nonexistent use beyond a couple of fields. They also pointed out that CE and RE should have the same means if they come from the same distribution, implying that if they do not, something could be wrong with the reconstruction. I suppose that assumes a prior detrending or a constant trend.

  50. #65 Personally, I feel that RE and CE are a diagnostic telling you whether the series is stationary. I have read and re-read Wahl & Ammann and Ammann & Wahl, and I can honestly say that I do not understand their argument for the exclusive use of the RE statistic. If there is any one statistic that I would choose – were I forced to choose only one – I would choose CE. I do not see how you can assess reconstruction performance by using a mean from a different period.

  51. Ryan —
    Figure 8 above shows your all-Ant. recon, with slopes that are positive but insignificant (as adjusted for AR(1)? — as I showed on CA, Steig neglected to adjust for s.c.), even for the full period 1957-2006.
    I trust your final product will show similar graphs for each of the 3 regions, including in particular W. Ant., which was the big deal in Steig09. Can you show this now, or are you saving it?

  52. As far as I can tell, r^2 and R^2 are simply stylistic variations: R is used more in economics. r is used in climate science. I include a prefix calibration or verification to distinguish between periods.

    RE and CE as terms come out of the tree ring literature – see Fritts for example. The term “Skill Score” in meteorology sometimes matches RE. There are occasional uses of RE under different nomenclature in economics (I think Theil in the 1970s.)

    The conceit that RE should rule uber-alles occurs nowhere in any literature prior to the revelation of the MBH failure with other verification statistics – whereupon Wahl and Ammann, writing up Mann’s arguments, advocated RE uber alles.

  53. #67 Actually, Hu, I haven’t even done a correction for autocorrelation yet. BTW – AR(8) fits much better than an AR(1) model. So there’s no chance the trends are significant once autocorrelation is accounted for.
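
    For what it’s worth, comparing AR orders on the trend residuals takes only a few lines in R. A toy sketch with simulated anomalies standing in for the real series:

    ```r
    set.seed(1)
    anom <- arima.sim(list(ar = 0.3), n = 600)      # stand-in monthly anomalies
    fit  <- lm(anom ~ seq_along(anom))
    c(AIC(arima(resid(fit), order = c(1, 0, 0))),   # AR(1)
      AIC(arima(resid(fit), order = c(8, 0, 0))))   # AR(8); lower AIC is better
    ```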

    I haven’t yet partitioned up my grid into the various regions. No real reason for that . . . I just haven’t gotten to it yet.

    #68 I just can’t figure out any sensible logic behind the primacy of RE. It really doesn’t make any sense. The more I think about it, the more stunning it is to me that WA and AW got published in the first place, as that argument was essential to both of the papers.

  54. Re Antarctic: Why not just look at the actual station records and plot them for the main continent (the Peninsula is affected by the ocean currents and temperatures)? There are 15 stations on the main continental mass that have 40-50 years of continuous record. Molodeznaya is a good record but it ends in 1999, and Base Orcadas is too far north. In terms of climate, altitude, and latitude the Peninsula is different from the rest of the continent. All other records are too short or too incomplete to be used to define any trends.

    What about temperatures in the 95% of the continent’s land surface lying between 70 South and the South Pole? I have plotted averages of mean annual, winter, and summer temperatures for long-term records in this band using GISS data. I found no significant trends up or down (most are flat) in summer temperatures. Very few of the stations between 67 S and 70 S show positive summer temperatures, and then only fractions of a degree in January. Moving toward the Pole from 70 S, stations show increasingly low temperatures well below zero in the summer months (Dec, Jan, Feb) that would require a huge warming trend to ever make them positive. Even between 70 S and 67 S, warming trends are absent and positive temperatures occur only in January. In my opinion this is a much better way to assess past temperatures in the Antarctic continent than gridding and calculating anomalies.

  55. Steve Reynolds has been posting at RC (“Friday Round Up” thread) about their moderation policy and has engaged in some discussion with a few commenters. Steve used this thread as an example of RC filtering reasonable comments. I just left a post there responding to Steve Fish (awaiting moderation) which I have copied below:

    #302 Steve Fish

    Steve, first of all I want to state outright that I have no problem with RC or how they conduct their moderation. If I try to post something and it doesn’t go through then so be it.

    You mentioned that some of the comments got through. If you are talking specifically about the “you can’t get there from here” thread, I don’t think anyone got through (if I’m missing something, point it out to me). Jon P posted a comment that I saw get through initially, but it later disappeared or was removed. I think most of us had links in our comments to the tAV thread, which may have been the reason they did not get through – or, in Jon’s case, why it was removed.
