The All-Important Blade of the Stick Uses Less Than 5% of the Data

Many of you saw my graph of the net effect of the extrapolated “projected” data used in Mann 08 in my post titled Mann 08 – Variable Data. What was not clear to me at the time was how this data was created and why.  I think this post clears it up a bit!

The graph below is the last graph from that post. It indicates that a strong positive weighting was added to roughly 1100 of the 1209 series, which in reality had no data at these points.  You can think of it as a summation of the “added in” data divided by the 1083 series that the extrapolation procedure was used on.  This represents all of the data which was never measured but was used in the correlation for the hockey stick.
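
For anyone who wants to reproduce this kind of plot, here is a minimal sketch of the computation in Python. The array names and the random stand-in data are mine; in reality you would load the raw and RegEM-infilled proxy matrices from the Mann 08 archive.

    import numpy as np

    rng = np.random.default_rng(0)
    # Random stand-ins shaped like the real data: years x 1209 series.
    proxies_infilled = rng.normal(size=(200, 1209))   # after RegEM infilling
    proxies_raw = proxies_infilled.copy()
    proxies_raw[150:, :1083] = np.nan                 # pretend 1083 series end early

    # The "added in" data: infilled values wherever no measurement existed.
    added = np.where(np.isnan(proxies_raw), proxies_infilled, 0.0)
    was_infilled = np.isnan(proxies_raw).any(axis=0)  # which series RegEM touched

    # Summation of the added data divided by the 1083 extrapolated series.
    net_effect = added.sum(axis=1) / was_infilled.sum()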

This graph clearly demonstrates a strong hockey stick modification to the complete data set.  It was accomplished using the RegEM method, which has been used by others to interpolate between series; extrapolating (projecting beyond the end of the measured data) is another use entirely!  But how many of these were accepted into the 484 series of the final paper?

After some number crunching I found that 391 of the 484 accepted data sets included extrapolated data.  That is, 80% of the final “passed calibration” data set used extrapolated data.  I need to note here that one proxy (NV037) had a problem in the data import, so the actual number is 392 modified proxies, but let’s not split hairs.  The European Luterbacher data (an additional 71 series) was accepted by calibration 100% with no extrapolation, which means that roughly 95% of the data in recent times is either Luterbacher or projected data.
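
The 80% figure is simple counting. Continuing the sketch above, with a hypothetical accepted_idx holding the column indices of the 484 series that passed calibration:

    accepted_idx = np.arange(484)              # hypothetical stand-in indices
    share = was_infilled[accepted_idx].mean()  # 391/484, about 0.81, with the real data
    print(f"{share:.0%} of accepted series contain extrapolated values")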

Of course we have to ask: what about the ones which passed calibration, how much effect did they have?

When I saw this graph I thought there had to be a mistake.  The net effect of the added proxy data was... almost nothing??  The purple line is the same data as the blue line, but averaged without the infilling.  I expected something bigger at the 1950 end.  What happened?

In the graph below I zoomed in on the endpoint to see the smallest variations of the data.  The purple line indicates the non-extrapolated data, which is barely shifted from the entire series average.  The blue line is the average of all 484 series which passed calibration; the yellow line is the extrapolated data, and it represents 80% of the data used at this point in the calibration.   Mann had more of a hockey stick without the yellow line included, but then he only had the Luterbacher series plus 4.5% of the other non-extrapolated data, not enough to make big claims with.

The first graph above incorporated all 1209 series; the graph below depicts the summation of the extrapolation used in the 484 accepted series.  The summation was divided by the 391 extrapolated series to show the weight of the effect on the final data.  This shows that a very high percentage of the blue line in the above graph was created by the extrapolated series in the graph below.  It was enough to put an artificial kink in the blue line above at the 1998–1999 boundary.

Ok, extrapolation is a tough business even if you know for sure that the trend will continue.   For instance, if you know that your company grew 10% each year for 3 years, how certain are you that the fourth year will also grow 10%?  In the case of proxies, we don’t even know if they represent temperature; Mann’s theory is that he can extrapolate, using an algorithm designed for interpolation, to tell us what the tree rings are doing rather than actually measuring them.

So we need to look at the reasonableness of the extrapolations.  This next graph is the very first graph I plotted.  Each of the following graphs shows values from a single proxy.  The purple line was separated by me to demonstrate the extrapolated portion of the data.  ALL OF THESE FOLLOWING PROXY GRAPHS PASSED THE CALIBRATION SIGNIFICANCE TEST.

The blue line is the actual data and the pink line is the extrapolated data.  When I do intense math, I like to make sure that the result passes the giggle test.  Mann does not follow the same standard.  I found this somewhat encouraging, so I plotted some more.

Again, it isn’t the curve I would have drawn with my pencil.

The above graph is filled in to the same extent as a whole bunch of Schweingruber series.  The projected data jumps downward and completely reverses the trend of the blue line.  Pretty fancy math, I think.

This one above doesn’t hurt so much except that it just doesn’t match the data where the RegEM method takes over.

Pretty strange looking extension also.  All of these curves have a big jump where RegEM takes over.

Does this purple line above look like it belongs to the blue one?

Really strange!  A negative trend.   Because the summation of the added extrapolated data is 100% positive, I must have been pretty lucky to find this one!

How did Mann come up with these projections?

The RegEM algorithm of Schneider (9) was used to estimate missing values for proxy series terminating before the 1995 calibration interval endpoint, based on their mutual covariance with the other available proxy data over the full 1850–1995 calibration interval. No instrumental or historical (i.e., Luterbacher et al.) data were used in this procedure.

He took a very select set of proxies and essentially “pasted” them onto 1083 of the 1209 total proxies.
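
The archived RegEM is MATLAB code from Schneider; as a rough Python stand-in for what this kind of EM infilling does, here is a sketch using scikit-learn’s IterativeImputer, which likewise estimates missing values from the covariance between series. It is an analogue only, not the paper’s algorithm.

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(1)
    X = rng.normal(size=(146, 20))  # a toy 1850-1995 "calibration interval", 20 proxies
    X[-30:, :15] = np.nan           # 15 series terminate 30 years before the end

    # The imputer only ever sees covariance between columns, yet filling the
    # missing tails amounts to extrapolating those 15 series forward in time.
    filled = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)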

That means that 89.6% of the total data sets had extensions.  With the 71 Luterbacher series excluded from the RegEM calculation, there are only 1209 – 71 = 1138 total proxies to use.  Of those, 1138 – 1083 = 55 remaining proxies determined the actual end values of the other 1083 proxy series.  Don’t forget that Luterbacher was just an example, and some of the other series he references were also rejected by his quote above, so the number is actually less than 55.
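
The bookkeeping behind those numbers, spelled out:

    total, luterbacher, infilled = 1209, 71, 1083
    usable = total - luterbacher   # 1138 proxies available to RegEM
    donors = usable - infilled     # at most 55 series supplied the end values
    print(donors, round(infilled / total, 3))  # 55, 0.896 -> 89.6% had extensions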

Now, my first post was about the 148 proxies which were rejected from the original, originally posted dataset, which was placed on a server at NOAA before Mann changed the data to the new “original” dataset of 1209 series.  Since the original series was unmodified by RegEM, let’s look at the graph of the data again.

Sixty-four of these 148 rejected, non-extrapolated data series had values in 1998.  There were 64 more proxies to work with in his extrapolation, yet they were rejected without mention in the paper.

My thoughts:

Steve McIntyre was looking into the Mann software to determine whether or not the extrapolated data was used in the correlation “calibration” to temperature data.   I haven’t had time yet because I stayed up until 1 am trying to make this post.  I believe that it must have been used in calibration and that the extrapolated data was required, for a few reasons.  Either way it will be figured out for certain.

1.  Mann had a better hockey stick without the data extrapolation (see the third figure in this post).  The added data shortened the HS imperceptibly.

2.  The infilled data was only added to recent times (not projected backwards).  We have measured temperature for recent times, so additional data points here can only be useful for the correlation/calibration procedure.  There is absolutely no other reason I can think of why this would be done.

3.  Without the now HUGE mistake of posting 148 available proxies which weren’t used, the Mann 08 paper would only need a few hand-selected proxies to alter all the remaining datasets to have an upward trend.  How would we ever show that he hand-picked a few weird proxies to make the case?  If you notice, in figure 4 of this post, all of the additions to the data were positive in nature!!!!

4. Many are aware of divergence, which means the actual measured tree rings didn’t follow temperature in recent times. In this case, a few non-diverged series could be used to fill in the endpoints of the proxies rather than having to fight with the “inconvenient” data.  I think this is why the MXD data was eliminated as well; this is from the Mann 08 paper:

Because of the evidence for loss of temperature sensitivity after 1960 (1), MXD data were eliminated for the post-1960 interval.
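
In code terms that step amounts to blanking everything after 1960 and letting RegEM (or an analogue, as sketched above) replace it. A toy illustration with made-up data:

    import numpy as np

    years = np.arange(1850, 1996)
    mxd = np.random.default_rng(2).normal(size=(len(years), 105))  # toy MXD series

    mxd_masked = mxd.copy()
    mxd_masked[years > 1960, :] = np.nan  # "eliminated for the post-1960 interval"
    # ...the blanked tails are then infilled from the series that remain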

As I understand it, the “loss of temperature sensitivity” resulted in a short blade on the hockey stick, so a strong MWP is produced compared to today.  The fact that it is so casual a statement is where the consensus comes into effect.  There should be a lot more involved in the elimination of data, but the groupthink allows this to happen.

Keep in mind that the claim to fame is that 40% of the series passed calibration, whereas random data would only pass 10–13%. If a large % of the data were rejected in calibration, no paper.
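
That screening arithmetic is worth writing down. If a random series passes a single gridcell test with probability p, letting it try the two nearest gridcells (a point expanded in the comments below) raises the pass rate to 1 - (1 - p)^2, assuming the two tests are roughly independent, which nearby gridcells are not quite:

    p = 0.10                    # chance a random series passes one gridcell test
    two_tries = 1 - (1 - p)**2  # pick-two screening
    print(two_tries)            # 0.19, nearly double the naive 10%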

My overall conclusion is that the blade of the hockey stick (prior to the temp curve overlay) doesn’t even exist.  It was fabricated from an amazingly low number of proxies (under 55) and painted into the remaining series to claim a high degree of correlation. These 55 series were selected after throwing out an additional 148 proxies, 64 of which had data which could have been used for extrapolation. This tossing of the data was done for unexplained reasons!

It now seems possible that if we remove the extrapolated data, a significant portion of the datasets would fail calibration due to poor correlation and another Mann paper could be shot down.

15 thoughts on “The All-Important Blade of the Stick Uses Less Than 5% of the Data”

  1. I think you have got it, if you can make a few more links in the chain of computation. This needs to be published.

  2. Thanks Craig. I think there’s too much data here; people can’t figure it out, so not too many comments. I got pretty wound up when I figured out what I was looking at last night.

  3. No doubt there is a lot of data. Your straightforward comments make it easier to read. Your work is very interesting. I am just surprised that the veneer of science that Mann used to hide manipulations is so transparent. I think you said in another post that there must be a lot of pressure (both internal and external) for Mann to publish such a thing. That is the only explanation I could think of as well.

  4. Thanks Steve,

    What he is referring to is a large set of 105 tree ring proxies which have more recent values available. These recent values clearly don’t match temperature. These were all chopped at the same year, 1960, and had fake data pinned on the ends of them. Yup, I called it fake.

    Every Schweingruber series had 38 years of data pinned to the end. These are some of the most infilled of all proxies. In the end, 95 of 105 were deemed significant in matching to temperature.

    Mann was so proud in his paper, saying that 40% were accepted, yielding 484 proxies of 1209.

    Let’s look now: 484 − 71 Luterbacher series (which contained actual temp info) − 95 Schweingruber = 318 series remaining, or 26% of real proxies have now passed.

    But wait, only 5% of these weren’t infilled. So there’s more to come.

  5. Ok, again I feel like I should attempt to translate what Steve is saying. I say that because two weeks ago it wouldn’t have had much meaning to me.

    When Mann takes these plots of tree rings (which are not temperature) he compares them to measured temperatures in their location on earth. If it doesn’t meet criteria for one temp, he compares it to the next closest. This means his very noisy and nearly random data gets two chances to have a similar shape to a measured curve. This alters the probability that the data will be accepted as temperature in Mann’s favor.

    Mann did a pretty good job explaining what he did with most of his paper. He was quite open about most of it. The details are the problem, and this is one of the details which wasn’t very clear in the paper to me. I may have missed it, but it is very critical to the premise of the paper, which goes like this:

    – Compare proxies to ground-measured temperature at each location on earth.
    – If proxies correlate to 90%, they are temperature.
    – If proxies are random, only 10% will be accepted.
    – Over 40% of proxies were accepted after comparison; therefore it must be temp.
    – 60% were thrown away even after infilling. Mann needed well over 10% acceptance (30% and up) or I don’t believe the paper gets published!!

    Well, if you compare each series twice (once to each nearby temp), what does that do to the 90%? Anyone who ever pulled the lever on a slot machine knows that certainly changes the odds.

    If some series contain temp info, what does that do to the 90%? Luterbacher contains actual temp info; therefore it is not a proxy!

    If the ends of other series are truncated because they don’t fit the conclusion (he actually indirectly says that in the paper), and temperature curve data is added, pasted, stuck directly onto 90% of the data series ends, WHAT DOES THAT DO TO THE 90%?

    Why am I 90% certain that these guys think they are smarter than the rest of us put together?

    Where is the outrage from those who know?

    George Tamino knows, he has been beating up on CA for a long time now. He knows it’s BS. Gavin Schmidt knows. Hell a bunch of those RealClimate guys know, this is not good science.

    I guess the farther I look the more questions I have.

  6. Does anyone have comments about the missing “little ice age” and “mediaeval warm period”?
    More attempts at rewriting?

  7. Jeff, I’m 99.9% sure that there’s something wrong with the locations of the “Schweingruber” series in the Mann SI. These series appear to come from Mann’s own group despite their name; they derive from Rutherford et al 2005, which has a location map of the MXD gridpoints; the locations don’t match a plot of the Mann locations. It looks like they’ve turned 37.5 long into 3.8, etc.

  8. Pingback: WeatherOutpost12
