Historic Hockey Stick – Pt 2 Shock and Recovery
Posted by Jeff Condon on June 23, 2009
This post is about testing how well Mann 08 CPS (composite plus scale) can recover a signal from artificial ARMA proxy data. ARMA is just a fancy method to create artificial signals which match the noise and autocorrelation of a measured one. If you’re not familiar with this – don’t worry, it doesn’t matter.
In the last post we saw that the Mann CPS hockey stick maker can make any shape you want using the same method and data used to make a hockey stick temperature curve. It happens because any data which doesn’t correlate to a pre-determined curve is discarded. This leads reasonable folks to say, incorrectly: if it is temperature it should correlate, so the method is reasonable. What’s missing from this seemingly reasonable understanding is that the response of the correlation sorting to noise level is nonlinear. In high or medium noise cases correlation can become a cherry pick of your favorite noise. This post takes the next step and looks at how well CPS does at retrieving a signal from both zero average random data and random data with a signal.
First, I looked through the Briffa Schweingruber MXD latewood proxies, which are tree ring density proxies. These ones are interesting because you can actually see the imputed (not real) data which was RegEM infilled on the graph endpoints by the change in noise level in the most recent 60 years.
The code presented below performs an ARMA match to the noise level of actual proxies (including redness – see comment two below) and creates 10,000 proxies with no trend but similar noise levels. Figure 2 is the first of 10,000 generated ARMA signal-less proxies.
The next step is to verify the proxies don’t contain a signal. So the code averages the data by row, one row for each year. Figure 3 is the average of all proxy data.
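As a rough illustration of these two steps (this is not the post's own code), here is a minimal Python sketch. It assumes a plain AR(1) process with a made-up coefficient in place of an ARMA model fitted to the real MXD proxies, and it uses 2,000 proxies rather than 10,000 to keep it quick. The function name and parameters are hypothetical.

```python
import numpy as np

def make_ar1_proxies(n_proxies=2000, n_years=1000, phi=0.3, sigma=1.0, seed=0):
    """Generate trendless AR(1) pseudo-proxies: one row per year, one column per proxy.

    phi and sigma are illustrative assumptions, not values fitted to real proxies.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, size=(n_years, n_proxies))
    proxies = np.empty_like(shocks)
    # Draw the first year from the stationary distribution so there is no start-up transient.
    proxies[0] = shocks[0] / np.sqrt(1.0 - phi**2)
    for t in range(1, n_years):
        proxies[t] = phi * proxies[t - 1] + shocks[t]
    return proxies

proxies = make_ar1_proxies()

# Average by row (one row per year): with no signal this should hover near zero.
yearly_mean = proxies.mean(axis=1)
print("largest excursion of the yearly average:", np.abs(yearly_mean).max())
```

The yearly average is only the leftover noise, which shrinks as more proxies are averaged.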
The ripple is of course the remaining noise of the average. In Figure 4, CPS was run on the same zero-average data. The red line represents the curve the software was set to look for, which is analogous to Mann08 inserting a temperature curve in the graph. Data which didn’t correlate at better than r = 0.6 was thrown out, and the remaining data was scaled and averaged one proxy at a time to match the red line. Standard CPS, in other words.
We discovered an unprecedented warming trend in no-signal data. I’ve done a lot of these curves and I see it as a perturbation (shock) and recovery. The shock is created by a biased calibration sort of the noise, and the recovery rate is based on the autocorrelation of the noise level. The long term trend tends to re-center right on the mean of the calibration range data.
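The screening, offset and scaling steps can be sketched in a few lines of Python. This is a simplified stand-in for the Mann 08 procedure, not the post's code. The AR(1) coefficient of 0.9, the proxy count and the function name `cps` are assumptions chosen so that a reasonable number of pure-noise series pass the r = 0.6 screen.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1001, 2001)        # 1000 years, one row per year
n_proxies, phi = 2000, 0.9           # assumed values, not fitted to real proxies

# Trendless red noise (AR(1)), one column per proxy.
shocks = rng.normal(size=(years.size, n_proxies))
noise = np.empty_like(shocks)
noise[0] = shocks[0] / np.sqrt(1 - phi**2)
for t in range(1, years.size):
    noise[t] = phi * noise[t - 1] + shocks[t]

cal = years >= 1900                          # calibration range, 1900-2000
target = (years[cal] - 1900) / 100.0         # the "red line": 0 to 1 over the range

def cps(proxies, target, cal, r_min=0.6):
    """Screen by correlation, then offset and scale each survivor to the target."""
    calib = proxies[cal]
    tc = target - target.mean()
    pc = calib - calib.mean(axis=0)
    r = (tc @ pc) / np.sqrt((tc @ tc) * (pc**2).sum(axis=0))
    keep = r > r_min                         # everything else is discarded
    kept = proxies[:, keep]
    scale = target.std() / kept[cal].std(axis=0)
    scaled = (kept - kept[cal].mean(axis=0)) * scale + target.mean()
    return scaled.mean(axis=1), keep

recon, keep = cps(noise, target, cal)
slope = np.polyfit(years[cal] - 1900, recon[cal], 1)[0]
print("proxies passing the screen:", keep.sum())
print("calibration-range rise per century:", 100 * slope)
```

Even though the input has no signal at all, the surviving series all slope upward in the calibration range, so their average does too.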
So you might ask what happens when there is actual temperature information represented by the proxies. Fortunately ARMA gives us a method to do just that. The code presented below generates 10,000 proxies 1000 years long. We can create a fake temperature signal of known amplitude and add it into the proxies. After that, we’ll go look for it using CPS and see how well it does.
Figure 5 is an artificial temperature signal. One hundred one years of warming was chosen arbitrarily from 1900-2000 with exactly a 1 C amplitude. Then in history the code added a sine wave, also having a 1C amplitude.
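A signal of this shape could be built as in the sketch below. The 300-year sine period and the year axis running 1001-2000 are assumptions for illustration, since the post does not state them. The sine is phased to cross zero in 1900, where the ramp begins.

```python
import numpy as np

years = np.arange(1001, 2001)

def make_signal(years, start=1900, end=2000, rise=1.0, sine_amp=1.0, period=300.0):
    """Sine wave in the historic portion, linear warming ramp from `start` to `end`.

    `period` is a hypothetical value; the ramp and sine amplitudes follow the text.
    """
    signal = sine_amp * np.sin(2 * np.pi * (years - start) / period)
    ramp = years >= start
    signal[ramp] = rise * (years[ramp] - start) / (end - start)
    return signal

signal = make_signal(years)
print("ramp years:", (years >= 1900).sum(), "final value:", signal[-1])
```

Adding it to a years-by-proxies noise array is then a one-liner such as `noise + signal[:, None]`.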
This signal is then added to the simulated proxy data to create the same kind of graph as Figure 1. In Figure 6, if you squint, you can see the sine wave and the temperature rise.
Just to make it very clear, Figure 7 is an average of all the same ARMA proxies as above, with the signal added.
We have a near perfect recovery of the artificial temperature signal simply from an average of the noisy data. Really, it’s tough to beat a simple average in signal recovery. CPS aims to do just that, so let’s see what we get. Figure 8 uses the same CPS Mannian code as Figure 4, but this time with the dataset which has the signal from Figure 5. Since we know a priori that the temp rise in the data is exactly linear, 1C from 1900 onward to seven digits of accuracy, that is what I set up the code to look for. The red line in Figure 8 represents the expected value of a 1C rise in 100 years.
As Figure 8 shows, CPS did not perform well at signal extraction. Looking for a 1C rise in data which Figure 7 proves averages almost perfectly to a 1C rise, it found only a 0.6-0.7C rise.
Just to re-explain what happens here: in Mann 08 the proxy data, which may or may not be temperature, is compared by correlation to gridded temperature data, and proxies which don’t have the proper upslope are chucked in the circular bin. The remaining data is offset so that the mean in the calibration range (the years where the red line above exists) matches the mean of the temperature calibration data (red line). The last step is that the individual data is then scaled to match the red line.
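The comparison between a plain average and CPS can be reproduced in miniature. The sketch below is an illustration under assumed settings (AR(1) noise with coefficient 0.9 and shock standard deviation 0.4, a 300-year sine, 2,000 proxies), not the post's code. In this simplified setup the calibration slope CPS recovers works out to the mean correlation of the retained proxies times the target slope, so it falls short of 1C whenever those correlations are below 1.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1001, 2001)
n_proxies, phi, shock_sd = 2000, 0.9, 0.4    # assumed noise parameters

# Trendless AR(1) noise, one column per proxy.
shocks = rng.normal(0.0, shock_sd, size=(years.size, n_proxies))
noise = np.empty_like(shocks)
noise[0] = shocks[0] / np.sqrt(1 - phi**2)
for t in range(1, years.size):
    noise[t] = phi * noise[t - 1] + shocks[t]

# Known signal: sine in history (assumed 300-year period), 1C linear rise 1900-2000.
cal = years >= 1900
signal = np.sin(2 * np.pi * (years - 1900) / 300.0)
signal[cal] = (years[cal] - 1900) / 100.0
target = signal[cal]
proxies = noise + signal[:, None]

# Method 1: simple average of every proxy.
simple_average = proxies.mean(axis=1)

# Method 2: CPS (screen at r > 0.6, then offset and scale to the target, then average).
calib = proxies[cal]
tc = target - target.mean()
pc = calib - calib.mean(axis=0)
r = (tc @ pc) / np.sqrt((tc @ tc) * (pc**2).sum(axis=0))
keep = r > 0.6
kept = proxies[:, keep]
scale = target.std() / kept[cal].std(axis=0)
cps_recon = ((kept - kept[cal].mean(axis=0)) * scale + target.mean()).mean(axis=1)

t_cal = years[cal] - 1900
avg_slope = np.polyfit(t_cal, simple_average[cal], 1)[0]
cps_slope = np.polyfit(t_cal, cps_recon[cal], 1)[0]
print("rise per century, simple average:", 100 * avg_slope)
print("rise per century, CPS:           ", 100 * cps_slope)
```

The exact numbers depend on the assumed noise, but the simple average should land near the true 1C rise while CPS lands below it.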
This gets a little complex.
It turns out that the added noise does not evenly affect correlation to the slope. It causes either a positive or negative bias in the calibration region unless we’re really lucky. This is the equation for correlation copied from Wikipedia:
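For reference, the standard sample form of the Pearson correlation coefficient, which is assumed here to be the equation meant, is given below. Here x is a proxy over the calibration years, y is the curve being matched, and the bars denote means over those n years.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```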
If you use correlation you find out that correlation values on high slope data are less affected by noise than those on low slope data. This means that, on average, slope reducing noise will have a greater correlation reducing effect than slope increasing noise. For a balanced signal recovery we would hope the math would have an equal effect on positive and negative slope changes due to noise on the signal. Since more positive slopes are favored during sorting, when the signal is re-averaged the completely random unsorted noise in the historic portion (to the left of the recovery) still averages to zero, while the area in the calibration period has a non-zero noise average. The result is a distortion in the isotemperature lines of what is presented as a rectilinear plot.
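The asymmetry can be checked with a small calculation. The sketch below uses a deliberately simplified model, which is an assumption and not something from the post: a series is a linear trend plus residual scatter of fixed size, so its correlation with a straight line is trend_sd / sqrt(trend_sd^2 + resid_sd^2). The function name and the numbers are hypothetical.

```python
import numpy as np

def expected_r(slope, n_years=101, resid_sd=0.5):
    """Correlation with a straight line for 'trend + fixed-size scatter' (simplified model)."""
    t = np.arange(n_years)
    trend_sd = slope * t.std()              # standard deviation contributed by the trend
    return trend_sd / np.hypot(trend_sd, resid_sd)

base, delta = 0.01, 0.005                   # 1C per century, nudged by +/- 0.5C per century
r_low, r_mid, r_high = (expected_r(s) for s in (base - delta, base, base + delta))

loss = r_mid - r_low                        # correlation lost when noise flattens the slope
gain = r_high - r_mid                       # correlation gained when noise steepens it
print("loss from a flatter slope:", loss)
print("gain from a steeper slope:", gain)
```

Under this model the curve of correlation against slope flattens as the slope grows, so an equal-sized slope change costs more correlation going down than it buys going up.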
It is also possible but unlikely to get a magnification of the historic signal as well. If the actual signal plus noise has a standard deviation smaller than the signal you’re looking for, the result of the sort and scale will amplify the historic data, but in practice this is a nearly impossible case.
Therefore what ends up happening is that the signal in history is amplified or deamplified depending on the level of the signal you are looking for and the level of noise on the data. If the signal in each proxy is not substantially less than the temperature signal you’re looking for, most of the proxies recovered by correlation have a greater calibration range slope, requiring a demagnification in CPS to match the standard deviation of the red line. Since most data are then reduced in amplitude, the net result is a demagnification of the historic signal.
Therefore there are actually two distortions of any possible temperature signal in CPS. The first is the throwing out of data; the second occurs during the scaling of the graph.
The exciting bit:
In this post the individual distortions are combined, but our artificial data allows us to do one more trick. Since we added the signal to the simulated data and then went to find the signal, we still have a perfect copy of the simulated data without any signal. As the last step in the code presented, the proxies with no signal, which average to 0 as shown in Figure 3, are put in a 7 x (10000 x 1000) array and a constant value is added to them. The offset values used were (-1.5, -1, -0.5, 0, 0.5, 1, 1.5). Then, instead of running CPS with correlation separately, the code used the same proxies which passed correlation in Figure 8 and employed the same offset and magnification from Figure 8, giving us the actual shape of the distortions in temperature created by CPS. True iso-temperature lines.
While the previous results of these latest posts have been published showing some of the distortions in the calibration range signal, the distortion of the historic signal hasn’t been published, to my knowledge.
The iso-temperature lines indicate the true temperature scale of the graph. I made this graph larger so you can see the detail (click on it). Notice how the tips of the black line sine wave just touch the top and bottom of the blue -/+1C isotemperature lines. Note again the valley of temperature at the calibration range: the black line just touches the zero isotemp at the year where the red line begins (0 temperature). The black line ends again just touching the blue +1C isotemperature line at the year 2000. Now look at the CPS value on the far left temperature scale.
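The iso-temperature trick can be sketched as follows. This is an illustration, not the post's code. It assumes AR(1) noise (coefficient 0.9, shock standard deviation 0.4), a 300-year sine and 2,000 proxies. The selection, centering and scale factors are computed once from the signal-bearing proxies and then reapplied to the signal-free copies shifted by each constant offset.

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1001, 2001)
n_proxies, phi, shock_sd = 2000, 0.9, 0.4    # assumed noise parameters

# Signal-free AR(1) noise, one column per proxy.
shocks = rng.normal(0.0, shock_sd, size=(years.size, n_proxies))
noise = np.empty_like(shocks)
noise[0] = shocks[0] / np.sqrt(1 - phi**2)
for t in range(1, years.size):
    noise[t] = phi * noise[t - 1] + shocks[t]

# Same noise with the known signal added (sine period is an assumption).
cal = years >= 1900
signal = np.sin(2 * np.pi * (years - 1900) / 300.0)
signal[cal] = (years[cal] - 1900) / 100.0
target = signal[cal]
proxies = noise + signal[:, None]

# CPS coefficients taken from the signal-bearing run: which proxies pass,
# and the per-proxy centering and scale factor.
calib = proxies[cal]
tc = target - target.mean()
pc = calib - calib.mean(axis=0)
r = (tc @ pc) / np.sqrt((tc @ tc) * (pc**2).sum(axis=0))
keep = r > 0.6
center = calib[:, keep].mean(axis=0)
scale = target.std() / calib[:, keep].std(axis=0)

# Reapply those coefficients to the signal-free proxies plus a constant "true" temperature.
offsets = np.array([-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
iso = np.stack([
    ((noise[:, keep] + c - center) * scale + target.mean()).mean(axis=1)
    for c in offsets
])

spacing = np.diff(iso, axis=0)
print("mean scale factor:", scale.mean())
print("spacing between 0.5C iso-lines on the CPS axis:", spacing[0, 0])
```

In this sketch each row of `iso` is where a constant true temperature lands on the CPS axis. The rows are separated by 0.5 times the mean scale factor, so a mean scale below 1 shows up directly as a compressed historic temperature scale.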
What this means is that if we know the coefficients used and we know the resulting signal, it may be possible to reasonably back calculate (correct for distortions) the result and find the true signal prior to the CPS weighting.
Even if there isn’t a reasonable way to do the back-calculation, what this demonstrates is that hockey stick CPS curves are naturally unprecedented due to the math. Calibration range data is automatically matched to whatever signal you’re looking for and historic data is demagnified. EIV in Mann 08 is simply a RegEM of the sorted data onto gridded temp curves (an odd form of regularized multivariate regression). Since all data is available, the series are actually linearly scaled by the same factors through most of the early record, so there is no possibility that EIV can correct for the correlation implemented distortions.
If you consider the most recent 150 years, the correlation sorting has a form of shock and recovery. It’s apparent that the same shock and recovery pattern exists in almost every paleoclimatology temperature reconstruction. Its effects are also visible in the recent sea ice reconstruction by Fauria 09. My eyes are so sensitive to it now that every time I see an unprecedented curve I wonder what happened mathematically to the data. Mann 09’s recent hurricane paper also demonstrates the same effect, but it’s difficult to see where it comes from.
What we do know from this is that correlation sorting of proxies is not a valid method for signal recovery.
As before the best explanation is in the code. I recommend people run it. The first half of this code is from the previous post, there are comments beginning the second half to separate them out. If you’re interested in how CPS distorts the signal, I would recommend you read above briefly but then study and run the code.