Subsampled Confidence Intervals – Zeke
Posted by Jeff Id on July 15, 2011
Sub-sampling is hard to argue with. Zeke has done a post on global temperatures at the Blackboard taking 500 stations at a time and looking at the extremes of averages. It presented a set of very tight error bars based on weather variance, sampling errors, and any other random events which affect measurements. The error bars don’t incorporate any systematic bias, but there is an amazing amount of detail in the result.
Method:
To test if this is in fact true, we can randomly select subsets of stations to see how a global reconstruction using only those records compares to other randomly selected stations. Specifically, I ran 500 iterations of a process that selects a random 10% of all stations available in GHCN v3 (which ends up giving me 524 total stations to work with, though potentially much fewer for any given month), created a global temperature record with the stations via spatial gridding, and examined the mean and 5th/95th percentiles for each resulting month.
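For anyone who wants to try the idea, here is a rough Python sketch of the procedure Zeke describes. It is not his code. The function name, the array layout, the simple per-cell averaging used as a stand-in for spatial gridding, and the synthetic data are all my own assumptions for illustration.

```python
import numpy as np

def subsample_envelope(anoms, cell_of_station, cell_weight,
                       n_iter=500, frac=0.10, seed=0):
    """Mean and 5th/95th percentiles of gridded averages over random station subsets.

    anoms           : (n_stations, n_months) anomalies, NaN where a month is missing
    cell_of_station : (n_stations,) integer grid-cell index for each station
    cell_weight     : (n_cells,) area weight for each grid cell
    All three inputs are hypothetical layouts chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    n_stations, n_months = anoms.shape
    n_cells = cell_weight.size
    k = max(1, round(frac * n_stations))      # stations per random subset
    recon = np.full((n_iter, n_months), np.nan)

    for i in range(n_iter):
        pick = rng.choice(n_stations, size=k, replace=False)
        sub, cells = anoms[pick], cell_of_station[pick]
        valid = ~np.isnan(sub)

        # Per-cell sums and counts of the valid station values for each month.
        sums = np.zeros((n_cells, n_months))
        counts = np.zeros((n_cells, n_months))
        np.add.at(sums, cells, np.where(valid, sub, 0.0))
        np.add.at(counts, cells, valid.astype(float))

        occupied = counts > 0
        cell_mean = np.divide(sums, counts, out=np.zeros_like(sums), where=occupied)

        # Area-weighted average over the cells that have data in each month.
        w = np.where(occupied, cell_weight[:, None], 0.0)
        wsum = w.sum(axis=0)
        recon[i] = np.divide((w * cell_mean).sum(axis=0), wsum,
                             out=np.full(n_months, np.nan), where=wsum > 0)

    return (np.nanmean(recon, axis=0),
            np.nanpercentile(recon, 5, axis=0),
            np.nanpercentile(recon, 95, axis=0))

# Synthetic stand-in data (not GHCN): shared signal + per-cell weather + station noise.
rng = np.random.default_rng(1)
n_stations, n_months, n_cells = 2000, 24, 36
cell_of_station = rng.integers(0, n_cells, size=n_stations)
signal = np.linspace(0.0, 0.5, n_months)
cell_weather = rng.normal(0.0, 1.0, size=(n_cells, n_months))
anoms = (signal + cell_weather[cell_of_station]
         + rng.normal(0.0, 0.5, size=(n_stations, n_months)))
cell_weight = np.ones(n_cells)            # equal weights; real grids vary with latitude

mean, lo, hi = subsample_envelope(anoms, cell_of_station, cell_weight)
```

On the synthetic data the 5th/95th band comes out much narrower than the scatter between individual stations, because stations in the same cell share most of their weather. A real run would need actual station anomalies and proper area weights.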
The plot above uses only 5% of the data, but the point of this exercise is that Zeke has proven beyond any shadow of doubt that we do have enough station data to determine temperatures to an effectively consistent level. Whether the data is clean enough to yield a high-quality trend is another question, as any systematic bias is also perfectly recorded in the above record. Note how the uncertainty expands both in the distant past and in recent years, due to the lack of station data in both periods.
Back when I still worked with numbers, I performed a similar analysis on the Ljungqvist proxy data. The method is just too simple and direct to argue with.
Here is what is interesting. These are the error bars incorporating weather noise due to sampling error. This is entirely different from Pat Frank’s weather noise discussed in previous posts, because Zeke’s example includes all the local correlation and distant de-correlation of weather patterns. Every possible random variation is included, and the 95/5 percent extremes are still as narrow as shown in Figure 1 above. I wrote at the Blackboard to see if he would mind projecting those to the total dataset. It should be possible to estimate the true (everything included) error per station from his result, and that would allow a projection of a very narrow confidence interval on surface station data. That gets me excited because it would be based on reality instead of the complex estimates used in the standard CRU CI projections.
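As a back-of-envelope illustration of that projection, here is a minimal sketch. It assumes the subsample spread shrinks like 1/sqrt(n) as stations are added, which is my simplifying assumption and not something Zeke’s post establishes. Neighboring stations are correlated, so the real shrinkage is likely different. The function name and the numbers are hypothetical.

```python
import math

def project_half_width(half_width_sub, n_sub, n_full):
    """Scale a subsample CI half-width to a larger network, assuming 1/sqrt(n)."""
    if n_sub <= 0 or n_full <= 0:
        raise ValueError("station counts must be positive")
    per_station = half_width_sub * math.sqrt(n_sub)   # implied single-station error
    return per_station / math.sqrt(n_full)

# Hypothetical numbers for illustration only (not read off Zeke's figure).
projected = project_half_width(half_width_sub=0.20, n_sub=500, n_full=10_000)
```

With twenty times as many stations, this scaling narrows the interval by a factor of sqrt(20), or about 4.5. Treat that as the optimistic end of what a real projection would give.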
I like real but that doesn’t make CRU uncertainty inaccurate.
Zeke’s result makes the CRU uncertainty estimates look to be a little weird but not bad. From the CRU website the paper Brohan, P., J.J. Kennedy, I. Harris, S.F.B. Tett and P.D. Jones, 2006: Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J. Geophysical Research 111, D12106, justifies the following confidence intervals:
Sorry for the size of the image; the pdf has the full-size version. Considering that the number of stations involved is 20 times greater in the total dataset of Figure 2 than in Figure 1, it looks like the CRU intervals are visibly reasonable, except for the weird lack of expansion of the CI in recent and distant past years, which are known to have less data. It leaves me wondering just how that bodge was applied, because it has to be a mistake of some sort as there is a lot less data.
Some may wonder why I think it is all right to put weather noise in Zeke’s CI yet not in Pat Frank’s study. The answer is that the weather noise Zeke incorporated represents the temperature differences due to global sampling error. Zeke determines error bars which say: if you don’t measure all of the weather (incomplete gridding), how much effect does that have on the uncertainty of your final average? In Pat’s work, the error due to weather was the total variance of different stations. In other words, comparison of Pat’s and Zeke’s methods proves that the difference in temperature between two stations doesn’t affect our knowledge of the average, but the density of measurement of weather patterns does.
Anyway, Zeke’s post should put to rest any concerns people may have about sampling density being insufficient for discerning average global temperature. Sampling quality, systematic bias and missing global coverage in the distant past are other matters entirely.