A guest post by Ryan O.
As many of you know, we have been in contact with Dr. Beckers, primary author of several papers on using the DINEOF algorithm to infill cloud-masked images. It was very gratifying to discover that the iterative TSVD method we developed turns out to be the same algorithm, with one important exception: DINEOF uses covariance, while ours uses correlation.
The choice between correlation and covariance has been the subject of much debate; indeed, it was one of the issues in the Hockey Stick debate. In our case, rescaling each series to unit variance artificially inflates the variance of the actual data at the beginning of the algorithm (though this effect disappears as the algorithm progresses). It also changes the spatial structure of the EOFs used for imputation.
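To illustrate how the scaling changes the spatial structure of the EOFs, consider a toy case built from synthetic data (not the AVHRR or station records): three low-variance stations share a common signal, while a fourth, high-variance station is independent of them. The hypothetical `leading_eofs` helper below computes EOFs of a complete matrix from either the covariance or the correlation matrix. Correlation is simply the covariance of series that have first been rescaled to unit variance.

```python
import numpy as np

def leading_eofs(X, k, scale="covariance"):
    """First k spatial EOFs (as columns) of a complete time x station matrix,
    plus the fraction of total variance each one explains (hypothetical helper)."""
    Z = X - X.mean(axis=0)
    if scale == "correlation":
        Z = Z / Z.std(axis=0, ddof=1)  # unit variance per station
    C = Z.T @ Z / (Z.shape[0] - 1)     # covariance (correlation if standardized)
    w, V = np.linalg.eigh(C)           # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]    # largest first
    return V[:, order], w[order] / w.sum()

# Toy data: three quiet, highly correlated stations plus one loud, independent one.
rng = np.random.default_rng(0)
n = 500
common = rng.standard_normal(n)
quiet = common[:, None] + 0.3 * rng.standard_normal((n, 3))
loud = 5.0 * rng.standard_normal((n, 1))
X = np.hstack([quiet, loud])

for scale in ("covariance", "correlation"):
    V, frac = leading_eofs(X, 1, scale)
    print(scale, "EOF1 loadings:", np.round(V[:, 0], 2), "variance fraction:", round(frac[0], 2))
```

In this toy case the covariance EOF1 is essentially the loud station alone, whereas after standardization its extra variance no longer counts and EOF1 becomes the shared signal of the three quiet stations.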
While there are certain a priori arguments that can be made for using correlation, there are equally plausible arguments for using covariance. So, at Dr. Beckers' suggestion, I felt it was important to compare results obtained using covariance with our earlier results. The method used was the same as in the previous "You Can't Get There From Here" post. On the left are the correlation results; on the right, the covariance results.
Fig. 1: Results of split verification experiments using ground stations only. Left – correlation. Right – covariance.
For most of the ground station sets, the maximum verification statistics for covariance are slightly better than those for correlation. In the case of the two best-performing sets, correlation actually provides slightly better maximum values. Additionally, the correlation results display less dependence on the maximum number of imputation EOFs. However, neither of these is, in my opinion, sufficient to prefer correlation over covariance.
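Assuming the verification statistics are the reduction of error (RE) and coefficient of efficiency (CE) commonly used in split calibration/verification tests, they can be computed for a withheld period as in the sketch below. The function name `re_ce` is illustrative, not taken from the attached script.

```python
import numpy as np

def re_ce(obs_verif, pred_verif, calib_mean):
    """Reduction of error (RE) and coefficient of efficiency (CE) for values
    withheld during calibration and then predicted by the reconstruction."""
    obs = np.asarray(obs_verif, dtype=float)
    pred = np.asarray(pred_verif, dtype=float)
    sse = np.sum((obs - pred) ** 2)                          # prediction error
    re_score = 1.0 - sse / np.sum((obs - calib_mean) ** 2)   # skill vs calibration mean
    ce_score = 1.0 - sse / np.sum((obs - obs.mean()) ** 2)   # skill vs verification mean
    return re_score, ce_score

# RE = 1 - 0.5/30 ≈ 0.983, CE = 1 - 0.5/5 = 0.9
re_score, ce_score = re_ce([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0], calib_mean=0.0)
```

Because the verification-period mean minimizes the squared error against the withheld data, CE can never exceed RE; a positive score means the infilled values beat the corresponding mean.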
Even the individual station verification statistics show no clear reason to prefer correlation:
The next test is to determine if there is a substantial difference between using correlation or covariance for rotating the satellite PCs to the ground station solution. Again, this was performed in the same manner as the previous post, using the covariance solution for ground station set GRID 1C. A side-by-side comparison of correlation (left) and covariance (right) is below.
When comparing, please note the change in scale. The maximum values for correlation and covariance are nearly identical.
What is definitely different is the stability of the covariance solution. In order to achieve the same stability for the correlation solution, we had to impose an external condition of weighting the stations by their respective AVHRR eigenvectors for each AVHRR PC. With the covariance solution, no such conditions are required. It is clear that the magnitude of the variance for each station contains predictive information, and by using all of the available information – i.e., using covariance – the need for external conditions is removed. Even when imputing the PCs individually there is no need to impose any additional conditions.
Given these results, it is clear that the most appropriate method is to use covariance rather than correlation.
So what do our validation statistics look like for the covariance solution? First, let’s remember what they looked like for correlation:
Now for covariance:
Virtually identical.
Now what do the reconstruction trends look like?
For the paper, we will use the covariance method. It provides results nearly equal to those of the correlation method and does not require imposing external constraints during the PC imputation. It also does not require somewhat arbitrary (and, hence, arguable) assumptions about how to scale the data while infilling. Indeed, as the AVHRR PC imputation indicates, scaling to correlation discards important information contained in the station variances that could otherwise be used for prediction. Another important consideration is that an exactly analogous covariance-based algorithm, DINEOF, is already in use for this type of infilling.
I would like to extend a hearty thank-you to Dr. Beckers for taking the time to assist us.
The attached script has been modified to allow the user to select either covariance or correlation for the imputation.
UPDATE: I don't believe this is the up-to-date code file. Some sections are missing, so I will re-post a link as soon as possible.