## Mann 08 – Variable Data

Posted by Jeff Id on September 16, 2008

Recently Michael Mann published his latest version of the hockey stick, claiming boldly that he can now reconstruct temperatures back over the last millennium with high accuracy. Unlike before, Dr Mann published his data and software for everyone to see. Unfortunately, he published an original dataset and then re-published it less than a day later. Thanks to Climate Audit, I have obtained a copy of the original deleted data, and below is a comparison of the original data to the data actually used. Keep in mind, all these comparisons are actually between THE SAME DATA!!

The first dataset consists of 1209 data series captured and archived by ClimateAudit when the data was released. The second set consists of 1347 data series published (by accident, I assume) on the NOAA server, available at ftp://ftp.ncdc.noaa.gov/pub/data/paleo/contributions_by_author/mann2008.

Now that I have some software tools built, and after taking a short look in my previous post at what wasn’t used in the latest hockey stick, I thought it might be useful to check that the original (deleted) data matched the 1209 proxies that were used. As Steve McIntyre already noticed, there are some problems.

First, an example of how the data was extended in many series.

And a closeup to show the end better.

Many datasets were infilled, as discussed in the Mann paper. In fact, the majority had some kind of extension in recent years.

I ORIGINALLY HAD A TRUNCATED DATASET FOR NV037 HERE. THIS WAS AN ARTIFACT OF THE DOWNLOAD, AND THE FULL-LENGTH DATA IS IN THE ARCHIVE.

http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/data/proxy/allproxyoriginal/

But much stranger than all of this was that many datasets are just plain different. This graph from Lutannt10 shows the scrapped data series and the replacement series.

I wonder which was the real curve?

OK, how often does modification happen? I put the number cruncher to work comparing all the datasets and came up with this. The series labeled “modified” are those whose values in the original set differ from the values in the replacement set; the count does not include values that were added to or deleted from the ends of the original (scrapped) data. These curves were changed for unexplained reasons.

From 1500 on, about 50–60 series had modified values. The computer went through every series looking for differences. Only years with actual values in both datasets were compared, and the number of series whose values didn’t match was counted for each year to make this plot. I will probably reduce this to a single number instead of a graph in the future, but for now it shows the presence of a great deal of strangeness!
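For anyone wanting to repeat this kind of count, here is a minimal Python sketch of the per-year comparison just described. It assumes both datasets have already been parsed into a hypothetical `{series name: {year: value}}` layout (file parsing isn’t shown), and the tolerance `tol` is an added assumption so that rounding noise isn’t counted as a modification.

```python
from collections import Counter

# Hypothetical layout (an assumption): series name -> {year: value}
Series = dict[int, float]
Dataset = dict[str, Series]


def count_modified_per_year(original: Dataset, replacement: Dataset,
                            tol: float = 1e-9) -> Counter:
    """Count, for each year, the series whose value differs between versions.

    Only years where a series has a value in BOTH versions are compared, so
    values added to or trimmed from the ends of a series are not counted here.
    """
    counts: Counter = Counter()
    for name in original.keys() & replacement.keys():
        old, new = original[name], replacement[name]
        for year in old.keys() & new.keys():
            if abs(new[year] - old[year]) > tol:
                counts[year] += 1
    return counts


if __name__ == "__main__":
    original = {"a": {1500: 1.0, 1501: 2.0}, "b": {1500: 3.0}}
    replacement = {"a": {1500: 1.0, 1501: 2.5, 1502: 9.0}, "b": {1500: 3.1}}
    print(sorted(count_modified_per_year(original, replacement).items()))
    # [(1500, 1), (1501, 1)]  (1502 exists only in the replacement, so it is skipped)
```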

OK, what did that do to the sum?

This is the net change from the original data to the data actually used. It excludes infilled data and includes only values that existed in the original with one value and were changed to another.
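A sketch of that net change, using the same hypothetical `{series name: {year: value}}` layout. The post doesn’t say whether each year’s value is a sum or an average over series, so this version assumes a plain sum of (used minus original) over values present in both versions; unchanged values contribute zero, and infilled or trimmed years are skipped.

```python
from collections import defaultdict

# Hypothetical layout (an assumption): series name -> {year: value}
Dataset = dict[str, dict[int, float]]


def net_change_per_year(original: Dataset, replacement: Dataset) -> dict[int, float]:
    """Sum of (replacement - original) per year, over values present in both.

    Years that exist in only one version of a series (infilled or trimmed
    ends) are ignored, so only "changed from one value to another" counts.
    """
    net = defaultdict(float)
    for name in original.keys() & replacement.keys():
        old, new = original[name], replacement[name]
        for year in old.keys() & new.keys():
            net[year] += new[year] - old[year]  # zero when the value is unchanged
    return dict(net)
```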

And a closeup.

Then I asked what the truncation or addition of data did. First, let’s see how many datasets were modified this way. These curves represent only data that was either added to or deleted from the original series.
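The added/deleted count can be sketched the same way, again assuming the hypothetical `{series name: {year: value}}` layout: a year that appears in only one version of a series is counted as added (only in the used data) or deleted (only in the original). Series present in only one of the two datasets are skipped, which is an assumption about how those were handled.

```python
from collections import Counter

# Hypothetical layout (an assumption): series name -> {year: value}
Dataset = dict[str, dict[int, float]]


def count_added_deleted_per_year(original: Dataset,
                                 replacement: Dataset) -> tuple[Counter, Counter]:
    """Count, per year, values that exist in only one version of each series.

    added   = year present in the replacement but not in the original
              (e.g. an extension past the last measured year)
    deleted = year present in the original but not in the replacement
    """
    added: Counter = Counter()
    deleted: Counter = Counter()
    for name in original.keys() & replacement.keys():
        old_years = original[name].keys()
        new_years = replacement[name].keys()
        for year in new_years - old_years:  # dict key views support set difference
            added[year] += 1
        for year in old_years - new_years:
            deleted[year] += 1
    return added, deleted
```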

Nearly all of these series were modified in their end years. It turns out that most, but not all, of the modifications were extensions at the end of the calibration window! This is a very important point when you consider that correlation is what’s used to show that the curves represent temperature at all.

Same graph zoomed in.

What did that do to the average?

This plot shows the average, in standard deviations (sigma), of the values added to or deleted from the series. For each year, I summed the added or deleted values of only the series modified in that year and divided by the number of series modified that year. This means that a year with one modified series at 0.5 sigma plots the same as a year like 1996, with over 1000 modified series averaging 0.5 sigma. So although this graph doesn’t show it, you need to read the last 25 years as carrying much greater weight than the rest.
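Here is a sketch of that per-year average. How each series was converted to sigma units isn’t stated, so this assumes each version of a series is standardized by its own mean and population standard deviation, and that an added value counts as +z while a deleted value counts as -z (its removal from the data). The layout is the same hypothetical `{series name: {year: value}}` one, and a constant series (zero standard deviation) would raise an error; that edge case isn’t handled.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical layout (an assumption): series name -> {year: value}
Series = dict[int, float]
Dataset = dict[str, Series]


def to_sigma(series: Series) -> Series:
    """Standardize a series by its own mean and population SD (an assumption)."""
    values = list(series.values())
    mu, sd = mean(values), pstdev(values)
    return {year: (v - mu) / sd for year, v in series.items()}


def mean_sigma_added_deleted(original: Dataset, replacement: Dataset) -> dict[int, float]:
    """Per year: sum of sigma-unit additions/deletions / number of series changed.

    Each year is a plain average, so one series at 0.5 sigma plots the same
    as 1000 series averaging 0.5 sigma.
    """
    total: dict[int, float] = defaultdict(float)
    n_series: dict[int, int] = defaultdict(int)
    for name in original.keys() & replacement.keys():
        old = to_sigma(original[name])
        new = to_sigma(replacement[name])
        for year in new.keys() - old.keys():  # added values count as +z
            total[year] += new[year]
            n_series[year] += 1
        for year in old.keys() - new.keys():  # deleted values count as -z
            total[year] -= old[year]
            n_series[year] += 1
    return {year: total[year] / n_series[year] for year in sorted(total)}
```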

How do these effects look when added together? The following graph totals all of the modifications above and includes every series value that was different in any way.

The summation of the change

Again, each year in this graph is weighted equally whether 50 series or 1000 were changed. In the final paper’s calculations, a year where 1000 series are summed (as at the recent end of the graph) carries greater weight. The same graph is zoomed in below.

Now, what happens if we take into account the numerical weighting of the modifications? After all, repeating the same modification in more series gives it greater weight.
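The difference between the equal-weight plots and a count-weighted one comes down to whether each year’s changes are divided by the number of series. A minimal sketch, reading “numerical weighting” as summing each year’s changes (mean times number of series) rather than averaging them; that reading is an assumption, not a description of the paper’s code. The example reuses the 0.5 sigma numbers from the text, with 1600 as an arbitrary stand-in for the one-series year.

```python
def unweighted_and_weighted(per_year: dict[int, list[float]]) -> tuple[dict[int, float], dict[int, float]]:
    """per_year maps a year to the sigma-unit changes of every series changed that year.

    unweighted: mean per year (each year counts the same, however many series changed)
    weighted:   sum per year (= mean x number of series), so a change repeated
                in many series counts proportionally more
    """
    unweighted = {y: sum(v) / len(v) for y, v in per_year.items() if v}
    weighted = {y: sum(v) for y, v in per_year.items()}
    return unweighted, weighted


if __name__ == "__main__":
    # One series at 0.5 sigma versus 1000 series averaging 0.5 sigma in 1996.
    changes = {1600: [0.5], 1996: [0.5] * 1000}
    mean_by_year, sum_by_year = unweighted_and_weighted(changes)
    print(mean_by_year)  # {1600: 0.5, 1996: 0.5}
    print(sum_by_year)   # {1600: 0.5, 1996: 500.0}
```

Both years average 0.5 sigma, but once the count is taken into account 1996 sits at 500 against 0.5, which is why the recent end dominates.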

So there it is, folks. The entire difference between the first set of original data and the modified data actually used: a hard peak at the end with no other net change!

I wonder if this solves the divergence/correlation problem?

As always, comments welcomed!

I think I will do my next comparison between the original 1209 data series downloaded by Climate Audit and the 1209 data series now posted as the “original” data. There might be no difference aside from the truncation above, but we’ll find out.

———————

I have also done a post that shows proxy infilling. It takes some study, but it reveals a huge problem in the latest hockey stick.

## Stephen McIntyre said

Jeff, for good order’s sake, you should state in your post the exact data set (file name and URL) that you used and when the data was downloaded.

## Stephen McIntyre said

Tree ring chronologies (e.g. nv037) are in dimensionless units multiplied by 1000. They divide ring width measurements (in, say, mm) by a sort of “standard” aging model (in mm) to get a ratio and then take the average across the trees available in a given year.
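To make that description concrete, here is a minimal sketch of building such a chronology. The function name and input layout are hypothetical, the age model’s expected widths are taken as already given, and a plain arithmetic mean across trees is assumed; real chronology software may use a robust mean and additional detrending steps.

```python
from statistics import mean


def chronology(ring_widths: dict[str, dict[int, float]],
               expected: dict[str, dict[int, float]]) -> dict[int, float]:
    """Dimensionless chronology scaled by 1000 (hypothetical helper).

    ring_widths[tree][year]: measured ring width (e.g. mm)
    expected[tree][year]:    width predicted by the aging model, same units
    Each measurement is divided by its expected width to get a unitless ratio,
    the ratios of all trees present in a year are averaged, and the mean is
    multiplied by 1000.
    """
    ratios: dict[int, list[float]] = {}
    for tree, widths in ring_widths.items():
        for year, width in widths.items():
            ratios.setdefault(year, []).append(width / expected[tree][year])
    return {year: 1000 * mean(vals) for year, vals in sorted(ratios.items())}
```

A year where every tree grew exactly as the aging model predicts comes out at 1000.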

## Jeff Id said

Thanks Steve,

I will take care of both of these.

## mugwump said

Very interesting. A lot of work on your part Jeff!

So are they just playing hide and seek with themselves? “We hid the hockeystick in the data, now we’re going to go find it”

## Jeff Id said

Mugwump,

I answered your comment earlier but must have screwed up the posting somehow. I’m glad, though, because it went something like this:

‘I’m not sure how the infilled data was added, it might be a reasonable process. After work today I’m hoping to figure it out.’

Well, I’m glad it didn’t post, because I’m an idiot for assuming it might be reasonable. The infilled data uses an algorithm that follows high- and low-frequency trends specified by the ‘scientist’. The extended data is therefore FAKE, FALSE, PAINTED IN, IMAGINED, and I think it may have been critical to the correlation.

The dolt that I am, I thought it might follow additional measurements in the area or some other measurable trend, but IT DOESN’T. I keep trying to find the reasonableness in this paper, but I can’t. Sorry about the rant; that’s why I call it The Air Vent.

The infilled data is statistical painting, nothing more!

So, you are right.

## Lucy Skywalker said

Jeff

Great work here too, though not being a statistician, I am only slowly learning to understand it. However, I’ve borrowed a HS icon from this page and used it to start designing what might be a T-shirt for BBC’s Iain Stewart’s lecture in Southampton, UK, if we can make it. Just having fun. Hope you like it. If there’s any problem, let me know; it’s still a work in progress.

## Lucy Skywalker said

PS: All your graphs slide into oblivion on the right-hand side unless I shrink the text into tiny oblivion. Can you make them narrower? Or can I expand the WordPress width?

## Mann 2008: the Luterbacher Mystery « Climate Audit said

[...] Jeff Id has identified another intriguing mystery in the arduous problem of determining what Mann’s real data was. Jeff observed that the version of the Luterbacher lutannt10 series in Mann’s infilled data version allproxy1209 was different than the version of lutannt10 in allproxyoriginal (Sep 5 version), illustrating this as below: [...]