Ljungqvist Analysis Continued
Posted by Jeff Id on December 6, 2010
This is a guest post by Kenneth Fritsch. He’s taken the time to further the analysis of the unusual proxies from the recent Ljungqvist paper covered here. Like our work on the Antarctic, this is progressing in baby steps toward a result. In case you don’t know, I have a ton of respect for people who don’t fear the data or results. Just let it lead where it goes.
I’ll attempt to keep my comparison of the series correlations for the annual Ljungqvist proxies and the GHCN station series brief and to the point.
When I refer to correlations in this analysis I mean correlations between time series of stations or between time series of proxies. The comparisons were made using time periods of 20 and 41 years. In all cases the correlations include all the unique pair-wise station to station or proxy to proxy correlations.
I used the Ljungqvist proxy data after normalizing it (z scores) by subtracting the mean and dividing by the standard deviation. I used the mean and standard deviation for 29 of the 30 proxies for a period of 947 years going back from the early 20th century to the late 10th century. I eliminated proxy number 21 because it had a very discontinuous pattern of data availability during this long time period. In order to make the comparison with GHCN station data I used only proxies that had annual resolution.
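As a rough illustration of the normalization step, here is a minimal Python sketch of a z score calculation. The function name, the use of None for missing years, the sample (n - 1) standard deviation and the proxy values are all assumptions made for the example; the post does not specify them.

```python
import math

def z_scores(values):
    """Standardize a series: subtract its mean, divide by its standard deviation.

    Missing years are passed as None and are returned as None.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    mean = sum(present) / n
    # Sample standard deviation (n - 1 denominator). This choice is an
    # assumption; the post does not say which form was used.
    sd = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1))
    return [None if v is None else (v - mean) / sd for v in values]

# Made-up proxy values with one missing year, for illustration only.
proxy = [1.2, 0.8, None, 1.5, 1.1, 0.9]
normalized = z_scores(proxy)
```

After this step every proxy has mean 0 and standard deviation 1 over the years where it has data, so proxies measured in different units can be correlated on a common scale.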
I used GHCN data for the period 1950 to 1990 because during that time I had the largest set of stations with complete data. For that period there were 447 stations. I also used a set of GHCN data over a 100 year period in order to get a picture of the change in the station series correlations over 20 year periods. For that period there were only 35 GHCN stations that had complete data. For GHCN correlations I used temperature data and not anomalies.
My intent was to make an internal comparison of station and proxy data, using the relationship between series correlation and distance separation, and to estimate how that relationship changes over time. The station correlations are the standard of comparison here, knowing full well that proxies are not thermometers and that the proxy response would deteriorate relative to the instrumental record by way of white and red noise. The question that remains is how much deterioration we could expect and still assume we could find a temperature signal or, better, at what point we could assume no signal is discernible.
First, the relationship of station correlation with distance, and how well it holds up over time, must be estimated. To that end I used the temperature series of the 447 GHCN stations for the time period 1950-1990 and calculated the unique pair-wise series correlations (excluding the trivial same-station correlations). The number of unique pairs is large [447*(447-1)/2 = 99,681]. I plotted and regressed those pair-wise series correlations against distance for distance separations less than 3000 km. It can be seen that beyond 3000 km separation the correlation on average goes to 0. The plot is shown in Link 1 (Figure 1) and the regression summary is given in Link 2 (Figure 2) below. While there is considerable scatter of data points around a linear trend line, the correlation is quite high and significant.
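The pair-wise calculation can be sketched as follows in Python. This is only an illustration under stated assumptions: the haversine great-circle distance, the 6371 km Earth radius, the dictionary layout and the three stations are all hypothetical, and the post does not say how distances were computed.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km (an assumed choice of distance measure)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def n_unique_pairs(n):
    """Number of unique pairs among n series, excluding self-pairs."""
    return n * (n - 1) // 2

def pairwise_correlations(stations, max_km=3000.0):
    """Return (distance_km, correlation) for every unique pair closer than max_km."""
    pairs = []
    for a, b in combinations(stations, 2):  # each unordered pair exactly once
        d = great_circle_km(a["lat"], a["lon"], b["lat"], b["lon"])
        if d < max_km:
            pairs.append((d, pearson(a["temps"], b["temps"])))
    return pairs

# Three made-up stations with five years of made-up temperatures.
stations = [
    {"lat": 40.0, "lon": -100.0, "temps": [10.1, 10.4, 9.8, 10.9, 10.2]},
    {"lat": 41.0, "lon": -101.0, "temps": [9.7, 10.1, 9.5, 10.4, 9.9]},
    {"lat": 10.0, "lon": 20.0, "temps": [25.3, 25.1, 25.6, 25.2, 25.4]},
]
pairs = pairwise_correlations(stations)
```

Here n_unique_pairs(447) reproduces the 99,681 pairs quoted above, and only the two nearby stations survive the 3000 km cutoff.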
To better see the relationship that is subject to noise in Link 1, I took the same data and calculated means and standard deviations for 100 km increments. The plots for the means and standard deviations are shown in Link 3 (Figure 3) below. Included is a plot of the mean for the subset of the same data where the altitude difference between stations is more than 1000 m, to show that while altitude differences have an effect, it is not large. Seen in this form, the means and standard deviations show that there is a good correlation between series correlations and distance of separation, i.e. minus the noise. That does not, however, entirely mitigate the scatter of data seen in the first link when a comparison is made with a small number of data points, as is the case with the Ljungqvist proxies. For this post I leave that as an open question that I will attempt to answer at another time with some stratified re-sampling of the station data.
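The grouping into 100 km increments might look something like this minimal Python sketch. The function name, the (distance, correlation) tuple layout and the numbers are hypothetical, chosen only to show the binning.

```python
from collections import defaultdict
from statistics import mean, stdev

def bin_by_distance(pairs, width_km=100.0):
    """Group (distance_km, correlation) pairs into fixed-width distance bins.

    Returns {bin_start_km: (mean_r, sd_r, count)}. The standard deviation is
    None for a bin holding a single pair, where it is undefined.
    """
    bins = defaultdict(list)
    for d, r in pairs:
        bins[int(d // width_km)].append(r)
    summary = {}
    for k in sorted(bins):
        rs = bins[k]
        sd = stdev(rs) if len(rs) > 1 else None
        summary[k * width_km] = (mean(rs), sd, len(rs))
    return summary

# Made-up pair-wise results, for illustration only.
pairs = [(50.0, 0.9), (80.0, 0.7), (150.0, 0.5)]
summary = bin_by_distance(pairs)
```

Averaging within bins removes most of the pair-to-pair scatter, which is why the binned means show the decline with distance more clearly than the raw scatter plot does.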
Next I looked at the proxy data using the same 41 year time periods, but since I have a long time series (over the 947 years) I was able to obtain 23 time periods from which to calculate a mean correlation and a standard deviation. The data plotted in Link 4 (Figure 4) below used only separation distances less than 3000 km and thus, unfortunately, the number of data points for comparison with the station data was only 16. Remember, though, that each point is an average of 23 time periods. The trends look very flat for both the means and the standard deviations. Take out the one point at 0.26 and the means would be flat at approximately 0 correlation.
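To illustrate how a 947 year record yields 23 periods of 41 years (or 47 periods of 20 years), here is a minimal Python sketch. It assumes consecutive, non-overlapping windows, which matches the counts in the post but is not confirmed by it. The two series are synthetic random numbers, not proxy data.

```python
import math
import random
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def windowed_correlations(x, y, width):
    """Correlate x and y within consecutive, non-overlapping windows.

    Any years left over after the last full window are ignored.
    """
    n_windows = min(len(x), len(y)) // width
    return [
        pearson(x[i * width:(i + 1) * width], y[i * width:(i + 1) * width])
        for i in range(n_windows)
    ]

# Two synthetic 947 year series sharing a common signal plus independent noise.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(947)]
series_a = [s + rng.gauss(0, 1) for s in shared]
series_b = [s + rng.gauss(0, 1) for s in shared]

r41 = windowed_correlations(series_a, series_b, 41)  # 947 // 41 = 23 windows
r20 = windowed_correlations(series_a, series_b, 20)  # 947 // 20 = 47 windows
summary_41 = (mean(r41), stdev(r41))  # one plotted point: mean and spread
```

Each proxy pair then contributes a single point (the mean over its windows) at its separation distance, with the standard deviation over the windows as a measure of how stable the correlation is through time.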
To obtain a better comparison over several time periods I used a 20 year time series for pair-wise correlations for the GHCN temperatures over the period 1990 to 1999 and compared it to the Ljungqvist proxy data over the 947 year range with 47 sets of 20 year time periods. That comparison is shown in Link 5 (Figure 5) below. Although the proxy data are sparse and the station data have not been re-sampled, it is rather apparent that the station correlations in the 500 km distance separation range do not even approach an overlap with the very low proxy correlations.
I have added a histogram in the link below. It compares the log10 of the probability that the regression trend of station series correlation versus distance separation is 0, obtained from a stratified re-sampling of the GHCN stations, with the same statistic for the annual Ljungqvist proxies.
Recall that the GHCN station distance/series correlations are from 447 stations for the period 1950-1990. The proxy relationships are average series correlations of 41 years collected in 23 sets over 947 years. In both the station and proxy relationships the correlations used were confined to distance separations less than 3000 km. The final proxy regression with 16 data points was the basis for the GHCN re-sampling: the station data were re-sampled with 16 data points randomly selected from the same 100 km incremental distance separations in which the proxy data resided.
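The stratified re-sampling can be sketched roughly as below. This is a simplified stand-in, not the calculation behind the histogram: it records the slope and the slope t-statistic of each re-sampled regression rather than the log10 probability, because converting t to a probability needs a Student-t tail function with n - 2 degrees of freedom, which is left out to keep the sketch to the standard library. The station pairs are synthetic and the 16 proxy distances are hypothetical.

```python
import math
import random
from collections import defaultdict

def slope_and_t(points):
    """Least-squares slope of correlation on distance, and its t-statistic."""
    n = len(points)
    mx = sum(d for d, _ in points) / n
    my = sum(r for _, r in points) / n
    sxx = sum((d - mx) ** 2 for d, _ in points)
    sxy = sum((d - mx) * (r - my) for d, r in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((r - (intercept + slope * d)) ** 2 for d, r in points)
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se

def stratified_resample(station_pairs, proxy_distances, n_reps, rng, width_km=100.0):
    """For each replicate, draw one station pair from the same 100 km bin as
    each proxy point, then regress correlation on distance.

    Assumes every bin that holds a proxy point also holds station pairs.
    """
    by_bin = defaultdict(list)
    for d, r in station_pairs:
        by_bin[int(d // width_km)].append((d, r))
    results = []
    for _ in range(n_reps):
        sample = [rng.choice(by_bin[int(d // width_km)]) for d in proxy_distances]
        results.append(slope_and_t(sample))
    return results

rng = random.Random(0)
# Synthetic station pairs: correlation declines with distance, plus scatter.
station_pairs = []
for _ in range(5000):
    d = rng.uniform(0.0, 3000.0)
    station_pairs.append((d, 0.9 - 0.0003 * d + rng.gauss(0.0, 0.15)))

# Hypothetical separation distances (km) for 16 proxy pairs.
proxy_distances = [120, 340, 410, 560, 730, 880, 1020, 1250,
                   1400, 1630, 1810, 2050, 2240, 2470, 2690, 2900]

results = stratified_resample(station_pairs, proxy_distances, n_reps=200, rng=rng)
mean_slope = sum(s for s, _ in results) / len(results)
```

Matching the station draws to the bins occupied by the proxies keeps the comparison fair: the station regressions are handicapped by the same small sample size and the same distance coverage as the proxy regression.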
The histogram compares the station log10 probabilities with the proxy log10 probability, shown as the vertical line at -0.33. It shows that the proxy series correlation versus distance relationship is significantly different from that relationship for the station data. This difference is not so apparent when one looks at the scatter plot of the GHCN series correlations against distance separation.
The next step will be to add white noise to the GHCN station data and repeat the comparison described here.