Mann 08 – Variable Data
Posted by Jeff Id on September 16, 2008
Recently Michael Mann published his latest version of the hockey stick, claiming boldly that he can now reconstruct temperatures back into the last millennium with high accuracy. Unlike before, Dr Mann published his data and software for everyone to see. Unfortunately, he published an original dataset and then re-published it less than a day later. Thanks to Climate Audit, I have obtained a copy of the original deleted data, and below I compare it with the data actually used. Keep in mind, all these comparisons are actually between THE SAME DATA!!
The first dataset is a set of 1209 data series captured and archived by ClimateAudit when the data was released. The second set is 1347 data series published (by accident, I assume) on the NOAA server and is at the link ftp://ftp.ncdc.noaa.gov/pub/data/paleo/contributions_by_author/mann2008.
Now that I have some software tools built, and having taken a short look in my previous post at what wasn’t used in the latest hockey stick, I thought it might be useful to make sure that the original (deleted) data matched the 1209 used proxies. As Steve McIntyre already noticed, there are some problems.
First, an example of how the data was extended on many series.
And a closeup to show the end better.
There were many datasets infilled as discussed in the Mann paper. The majority in fact had some kind of extension in recent years.
I ORIGINALLY HAD A TRUNCATED DATASET FOR NV037 HERE. THIS WAS AN ARTIFACT OF THE DOWNLOAD AND THE FULL LENGTH DATA IS ON THE ARCHIVE.
But much stranger than all of this was that many datasets are just plain different. This graph is from Lutannt10, showing the scrapped data series and the replacement series.
I wonder which was the real curve?
Ok, how often does modification happen? I put the number cruncher to work comparing all the datasets and came up with this. The series titled “modified” have numbers in the original set which are different from the numbers in the replacement set; this does not include numbers which were added to or deleted from the ends of the original scrapped data. These curves were changed for unexplained reasons.
From 1500 on, about 50 – 60 series had modified values. The computer went through every series looking for differences. Only years which had actual values in both datasets were compared; the number of series whose values didn’t match was counted for each year to make this plot. I will probably reduce this to a single number instead of a graph in the future, but for now this shows the presence of a great deal of strangeness!
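For anyone who wants to try the same kind of check, here is a minimal sketch of the per-year count described above. It is not my actual number cruncher: the function name, the dictionary layout (series name mapped to year and value) and the tolerance are all assumptions made up for illustration.

```python
from collections import Counter


def count_modified_per_year(original, replacement, tol=1e-9):
    """For each year, count the series whose value differs between the sets.

    Both arguments are hypothetical structures: {series_name: {year: value}}.
    Only years that have a value in BOTH versions of a series are compared,
    so values added to or deleted from the ends are ignored here.
    """
    counts = Counter()
    for name, orig_series in original.items():
        new_series = replacement.get(name)
        if new_series is None:
            continue  # series missing from the replacement set
        # Years present in both versions of this series.
        for year in orig_series.keys() & new_series.keys():
            if abs(orig_series[year] - new_series[year]) > tol:
                counts[year] += 1
    return dict(counts)


# Tiny made-up example, not real proxy data.
original = {
    "a": {1990: 0.1, 1991: 0.2, 1992: 0.3},
    "b": {1990: 1.0, 1991: 1.1},
}
replacement = {
    "a": {1990: 0.1, 1991: 0.25, 1992: 0.3, 1993: 0.4},
    "b": {1990: 1.2, 1991: 1.4},
}
modified = count_modified_per_year(original, replacement)
```

In the toy example the year 1993 never shows up in the count, because it exists only in the replacement version of series "a" and so counts as an extension rather than a modification.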
Ok, what did that do to the sum?
This is the net change to the data from the original to the actually used data. It does not include infilled data, only data which existed with one value and was changed to another.
Then I asked what the truncation or addition of data did. First let’s see how many datasets were modified this way. These curves represent only data which was either added or deleted from the original series.
Nearly all of these series were modified at the end years. It turns out that most, but not all, of the modifications were extensions at the end of the calibration window! A very important point when you consider how correlation is used to show that the curves are even temperature at all.
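The added-or-deleted count can be sketched the same way. Again this is only an illustration under assumed names and an assumed data layout ({series name: {year: value}}), not the code used for the plots: a year counts for a series when it has a value in one version of the series but not the other.

```python
def count_added_or_deleted_per_year(original, replacement):
    """For each year, count the series where a value exists in only one set.

    Both arguments are hypothetical structures: {series_name: {year: value}}.
    A year present only in the replacement is an addition; a year present
    only in the original is a deletion. Both are counted together.
    """
    counts = {}
    for name in original.keys() & replacement.keys():
        # Symmetric difference: years in exactly one version of the series.
        changed_years = original[name].keys() ^ replacement[name].keys()
        for year in changed_years:
            counts[year] = counts.get(year, 0) + 1
    return counts


# Tiny made-up example, not real proxy data.
original = {
    "a": {1990: 0.1, 1991: 0.2},
    "b": {1990: 1.0, 1991: 1.1, 1992: 1.3},
}
replacement = {
    "a": {1990: 0.1, 1991: 0.2, 1992: 0.3, 1993: 0.4},  # extended
    "b": {1990: 1.0, 1991: 1.1},                          # truncated
}
added_or_deleted = count_added_or_deleted_per_year(original, replacement)
```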
Same graph zoomed in.
What did that do to the average?
This plot is an average of the values, in standard deviations, added or deleted from the series. It was calculated by taking the series modified in each year, adding only those values together and dividing by the number of series modified in that year. This means that a year with one modified series and a 0.5 sigma value is plotted equally to a year like 1996, which has over 1000 modified series averaging 0.5 sigma. So although this graph doesn’t show it, you need to look at the last 25 years as having a much greater weight than the rest.
How do these effects look when added together? The following graph is a total of all modifications above and includes all series values which were different in any way.
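The equal-weight point is easier to see with numbers. This is a small sketch with made-up values: the function name and the input layout (year mapped to a list of per-series changes in sigma units) are assumptions, and how the sign of an added versus a deleted value is recorded is left to whoever builds that list.

```python
def mean_change_per_year(changes_by_year):
    """Average the per-series changes for each year.

    changes_by_year is a hypothetical structure: {year: [change, ...]},
    one entry per series modified in that year, in sigma units.
    The divisor is the number of series modified in that year only.
    """
    return {
        year: sum(changes) / len(changes)
        for year, changes in changes_by_year.items()
        if changes  # skip years where nothing was modified
    }


# Made-up numbers: one modified series in 1850, a thousand in 1996.
changes_by_year = {
    1850: [0.5],
    1996: [0.5] * 1000,
}
means = mean_change_per_year(changes_by_year)
```

Both years come out at 0.5 sigma even though one reflects a single series and the other a thousand, which is why the recent end of that graph deserves far more weight than it appears to have.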
The summation of the change
Again the weighting of this graph is equal for 50 series or 1000. In the calculations of the final paper the summation of 1000 series (as in the recent end of the graph) has a greater weight. The same graph is zoomed in below.
Now what would happen if we take into account the numerical weighting of the modifications? After all, repeating the same modification more times gives a greater weight.
So there it is folks. The entire difference between the first set of original data and the modified data actually used. A hard peak at the end with no other net change!
I wonder if this solves the divergence/correlation problem?
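One simple way to apply that weighting, shown here only as a sketch, is to divide each year's total change by a fixed number of series instead of by the number modified that year. The function name, the input layout and the choice of 1209 as the fixed divisor are assumptions for illustration; the weighting behind the plot may differ in detail.

```python
def weighted_change_per_year(changes_by_year, total_series):
    """Sum the per-series changes for each year and divide by a fixed count.

    changes_by_year is a hypothetical structure: {year: [change, ...]}.
    Because the divisor is the same for every year, a year with many
    modified series carries proportionally more weight.
    """
    return {
        year: sum(changes) / total_series
        for year, changes in changes_by_year.items()
    }


# Same made-up numbers as a one-series year versus a thousand-series year.
changes_by_year = {
    1850: [0.5],
    1996: [0.5] * 1000,
}
weighted = weighted_change_per_year(changes_by_year, total_series=1209)
```

With this weighting the thousand-series year is a thousand times larger than the one-series year, where the plain per-year average would plot them as equal.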
As always, comments welcomed!
I think I will do my next comparison between the original 1209 data series downloaded by Climate Audit and the 1209 data series now posted as the “original” data. There might be no difference aside from the truncation above, but we’ll find out.
I have also done a post which shows proxy infilling. It takes some study, but it reveals a huge problem in the latest hockey stick.