## Problems With Berkeley Weighted Jackknife Method

Posted by Jeff Id on November 20, 2011

Well I’ve heard nothing from the Berkeley temperature team (BEST) on my critique of the CI calculation method. As promised, I waited a couple of weeks before starting to publish any demonstrations of the problem. Like Mannian statistics, this one stuck out to me like a sore thumb. They used a neat method to estimate the confidence interval through the algorithm but in the process accidentally violated the main assumptions of the method they used. It is described below in the image grabbed from the methods paper.

As I have written before, Equation 36 is based on the premise that the last term is truly 7/8 of the data. Having no response yet to my critique, I thought it would be worthwhile to make a simple example of the problem for demonstration here. The BEST algorithm works by weighting data closer to the mean greater than data far from the mean. The intent is to weight the better quality stations ahead of the rest. The result, however, may be quite a bit different than the intent.

The confidence interval calculation steps are as follows:

1 – compute global average using entire method

2 – remove 1/8 of the data and compute global average again

3 – scale the global series 8 times and the reduced data series 7 times and take the difference.

4 – repeat the process for all the data 8 times using different random stations each time.

The problem occurs in that weighted data means that some stations count more than others. It would still be ok except that the stations are re-weighted with some data missing. The problem is that we have violated the assumption that the removed data represents 1/8 of the result. There is no-longer any mathematically consistent connection between the per station temperature noise and the fraction used in the jackknife formula. Instead of 1/8 you might have 1/5 (more noise) or 1/20 far less noise.

Below is an example which uses a simple series of 8000 points having either normal or a mirrored exponential distribution (Laplace). The reason for Laplace is in this paragraph from the BEST paper:

In practice, we observe that the data fluctuation distribution tends to be intermediate between a normal distribution and a Laplace distribution. In the Laplace limit, we would expect to crop 2.9% of the data, so the actual exclusion rate can be expected to be intermediate between 1.25% and 2.9% for the typical station record.

This demonstration does not use whole series which are weighted according to their variance but rather uses a single point value to demonstrate the effect of the BEST oversight. It consists of two basic steps which are to first calculate the known error caused by deletion of 1/8 of the data, then see if it changes with the weighting method. In greater detail it is:

1 – Create 8000 random points having either normal or Laplace distribution having a zero mean.

2 – Randomly remove 1/8 of the data 1000 times and calculate the mean to estimate the standard error caused by damaging the dataset.

3 – Use the same data and employ an iterative weighting methodology similar to BEST.

4 – Randomly remove 1/8 of the data 1000 times, reweight after removal, and then calculate mean and standard error as in step 2 above.

If there is any difference between the unweighted 8000 points and the weighted result — there is a problem. I ran the code below and found that with a normal distribution, it returned a standard deviation in result by removal of 1/8 of the unweighted data of .0040. This represents the true value of error in knowledge of the mean caused by removing 1/8 of the data. The next section of the code then calculates the weighted result and it returned 0.0054 indicating that assuming a normal distribution, the BEST method could increase the error estimate over the actual. Again, any real difference is bad news for the method.

I repeated the result then using a Laplace distribution which has longer tails and it gave 0.0058 for the unweighted series and then 0.0045 for the weighted version indicating that the Laplace distribution would result in an undersized confidence interval in that case. In both cases we have a difference from the actual result created by the weighting function in combination with the type of distribution exhibited by the raw data. Variations in the BEST version data and code from this example will likely give different results than shown here but the point is that the effect is real and unaddressed in the BEST CI calculation.

Why do I bring it up again if it is potentially a small issue? Because the paper is in error. This isn’t as important as a potential bias created by the scalpel method which could very easily skew results, but it is a defined error in the usage of the Jackknife formula in a $600,000 reenactment of the global average temeprature from surface stations. The entire CI section of BEST needs to be redone and I’m not sure how to do it.

I hope to expand the detail of this post in the future. In the meantime, if anyone can find any errors in what I’ve done below or has anything to check with the code, let me know your thoughts.

The code for this post is below:

#data=rnorm(8000) # run for normal distribution #data=rexp(8000)*sample(c(-1,1),8000,replace=T) #run for exponential distribution #median v squared is .44 wt=getweightvector(data[0:8000]) #test function #single point weighting using similar methods to BEST getweightvector=function(x) { weights=rep(1,length.out=length(x)) for(i in 1:100) { mn=mean(x*weights) #print(mn) weights=2*.44/(.44+(x-mn)^2) weights[weights> 2 ]= 2 weights[weights< 1/13 ]= 1/13 weights=weights/mean(weights) } #print(mn) weights } #unweighted result noweight=rep(0,1000) for(i in 1:1000) { noweight[i]=mean(sample(data,7000,replace=F)) } sd(noweight) #weighted result weight=rep(0,1000) for(i in 1:1000) { dat=sample(data,7000,replace=F) wts=getweightvector(dat) weight[i]=mean(wts*dat) } sd(weight)

## kuhnkat said

I think the fact you received no response tells us more about Best than any of the problems being found.

Thank you for your efforts.

## Grzegorz Staniak said

Have you considered publishing a critical paper/letter in reviewed press? It would simply weigh more than a blog note. Perhaps they’re not intentionally ignoring you, as Kuhnkat suggests, perhaps you’re just under the radar.

## Venter said

#2, perhaps you would first tell the BEST group to publish in press instead of giving press conferences based on flawed papers which are not peer reviewed and published? Perhaps if you care to read first before commenting, you’ll see that none of the BEST papers have been peer reviewed and published. So why should Jeff submit a peer reviewed rebuttal for an unpublished paper?

Do you even understand or read about what you are posting?

## Anonymous said

@Venter

The papers have been published at the moment when they were submitted for review, and at the moment are in the process of being reviewed. People have been publishing pre-prints for decades now (have a look at http://arxiv.org/), so I really don’t understand what exactly you’re shocked by. If they hadn’t been published, the host of this blog wouldn’t even had a chance to be heard before the press time. I still think that writing a (open) letter to reviewers would be a much better move than just counting on the reviewers to check blogs before they give opinions on the papers.

It’s really heartening to see how peer review suddenly became the benchmark of credibility. But if it is so, then tell me Venter, how do you know the BEST papers are “flawed”? Neither they nor their critiques have been reviewed and published yet.

## Grzegorz Staniak said

#4 is me, I just forgot to log in.

## Jeff Id said

“I still think that writing a (open) letter to reviewers would be a much better move than just counting on the reviewers to check blogs before they give opinions on the papers. ”

Why would you assume I didn’t contact the authors? I’ve stated on other posts that I did. For this post, I was rather hoping that those with the wherewithal would try to grock the issue and give commentary on what I am saying. I have gone back and red the referenced literature by Tukey on the method and have located some language which supports my contention that jackknife method assumptions were violated, but I don’t think the original authors imagined anyone re-weighting individual data between jackknife tests.

## Venter said

#4,

No paper is considered published when it is submitted for review. I’m aware of what pre-prints are and how they are circulated. These were not pre-prints for review sent to close colleagues and experts for comments. These were papers put up in the web before peer review and with a huge PR saying things that were not in the papers. The entire PR exercise was a disgrace, misrepresenting the papers.

Now of course, experts are taking the papers apart and the BEST team will have a lot of re-writing to do. As usual, Climate Scientists and statistics don’t seem to mix as they seem to be bumbling amateurs when it comes to statistics. And their entire edifice is built upon crappy models, flawed statistics and bad maths, not to mention a ton of hubris.

## Carrick said

GS:

Why do you say that?

## Grzegorz Staniak said

@Venter

I encourage you, go and take a look at arxiv.org. You’ll find there papers — in mathematics, physics, biology, statistics — routinely published there as pre-prints/e-prints

beforethey are submitted to press review process, if they’re submitted at all. That’s what open access in science is about, among other things. It looks like you’d like to turn back the time and restrict review to “close colleagues and experts”. I have no idea why.As for the rest of your post: how exactly do you know, Venter, that “the BEST team will have a lot of re-writing to do”? Are you aware that the team consists of physicists and statisticians, with the addition of literally

oneclimate scientist, the skeptical Dr. Curry? Here’s the CV of Prof. Brillinger, who’s responsible for the statistical side of the project:http://www.stat.berkeley.edu/~brill/2010CV.htm

Would you care to present the reasons why you call him a “bumbling amateur”?

Sorry, Venter, but I’m strongly tempted to finish by addressing your question to yourself: do you even understand or read about what you are posting?

## RomanM said

Publishing at archiv.org is no guarantee that everything (or anything!) in a paper is correct. Furthermore, it is sheer arrogance to pontificate on the supposed implications of one’s research BEFORE anybody (reviewers or otherwise ) has had a chance to even glance at the contents.

I have all the respect in the world for David Brillinger having known him well personally and read his work for 42 years. Why do you claim that he is “responsible for the statistical side of the project” when his name does not appear on any of the four papers listed on the BEST site? I would have thought that if he had contributed more than a brief consultation, he would have been listed as one of the primary co-authors instead of given a very cursory “thank you”:

The “jackknife” method used in the paper does not appear to be a standard version since the eight groups which are defined are not unique so there is a further element of uncertainty due to how the specific groups of station records are formed. One would expect that some further

theoreticaljustification of the efficacy of the technique (which resembles a semi-systematic bootstrap) would be in order.## Grzegorz Staniak said

@Jeff Id

Why would you assume I didn’t contact the authors? I’ve stated on other posts that I did.

I don’t read all of them. If you did contact them, then there’s hope this particular point of criticism will be dealt with.

## Carrick said

RomanM:

Worse than just arrogant. Completely foolish, with a nice dash of utterly cynical behavior on the part of Muller and/or whoever else is involved in this nauseating media blitzkrieg.

I’m sure it contains errors, I’m sure sending it through peer review will correct some of those, leave others in place. How well the peer review works depends on the reviewers the editors select and the bias of the reviewers towards the work.

The idea that the blessing of peer review is going to turn this into a gold standard that everything else must be compared against is kind of laughable.

## Jeff Id said

Roman,

“The “jackknife” method used in the paper does not appear to be a standard version since the eight groups which are defined are not unique so there is a further element of uncertainty due to how the specific groups of station records are formed.”

I’m fairly certain they removed each station only once in the 8 groups. I’m at work now and don’t have time to look up the quote. My example above was just to see if there was the predicted difference caused by reweighting and there was. My understanding is that this isn’t simply a matter of a slight adjustment but rather is completely incorrect and will require a rewrite of the CI.

## Grzegorz Staniak said

@RomanM

Publishing at archiv.org is no guarantee that everything (or anything!) in a paper is correctWho says it is a guarantee of anything? The only point I was making is that it’s a bit strange to object to the publication of pre-prints. Everybody’s doing it routinely. As far as “pontificating on supposed implications of one’s research” is concerned, I’m glad you’re saying this, and I’d like to ask a favor off you, if you don’t mind: could you please repeat this opinion when people here start talking about the importance of surfacestations.org, thank you?

Prof. Brillinger is listed as a member of the BEST team, here: http://berkeleyearth.org/aboutus.php. If he’s not formally responsible for the statistics in the project, then again “guidance, key suggestions, and many discussions” seems to be a bit more than “a brief consultation”. The more he was involved with the project, the harder it is to believe he would’ve overlooked a flaw that “like Mannian statistics […] stuck out like a sore thumb”.

## RomanM said

#14, GS: Don’t be so obtuse! Nobody here is objecting to the publication of pre-prints. That is not the issue.

It is the media blitz instigated by BEST (which has become so commonplace in the field of climate science) which in this case

precedesthe entire evaluation process making claims which have not been verified. Should it turn out that serious flaws exist, it is almost certain that there won`t be a retraction and the misinformation will continue to circulate for a long time.With regards to David Brillinger`s contribution to the BEST end product, you don`t have any actual knowledge of the depth of his involvement (as evidenced by your initial statement that he was “responsible for the statistical side of the project”). From my own experiences in consulting situations, there is no guarantee that the advice offered is followed exactly without omissions from or additions to that advice. With his own substantial reputation in the statistical community, had he been the sole (or even major) contributor to the statistical analysis, it would have been a distinct oversight to omit his name from the author list.

Had David had a major hand in the jackknife evaluation, I would have thought that he would have provided better references to the use of the method than the three given in the paper (the latest of which was dated 1974) since none of these references would deal with the specific situation encountered here.

## A. C. Osborn said

Grzegorz Staniak, have you seen the raw data and the resulting output of the calculations?

If not how can you possibly say whether or not the “Papers” are worth the paper they are written on?

Others have critisized elements of the Papers, but as far as I know no-one has actually seen the “Corrected/Improved” database.

The one that they posted is absolute Rubbish and riddled with errors, so if that is the output from there calculations it certainly should never be used for “Trend” analysis or anything else.

## Kenneth Fritsch said

I would think that we should refrain from having GS change the subject of the thread. It sounds to mean that Jeff ID has articulated a valid question of the authors. I would think that if Jeff had made a basic error or assumption in his analysis that the authors would have been able to reply in a timely manner. The weighting issue would appear to be an added complication to the original Tukey jackknife process that the BEST authors may have overlooked. I personally think that a proper uncertainty calculation and the assumptions that calculation includes are very important when attempting to use global and regional temperature data bases.

## Jeff Id said

Thanks Kenneth,

I’ve sent a link to Judith Curry on this post. I happened to notice that I am no longer on her blogroll earlier today which I found interesting. I emailed on both so we’ll see what’s going on.

## Grzegorz Staniak said

@RomanM

Roman, you might recall that the “media blitz instigated by BEST” was originally aimed quite openly against the so-called “consensus view”, with Dr Muller smearing climate scientists in a way that didn’t make him too many friends there. Somehow the “skeptical” side of the blogosphere didn’t mind his PR stunts then at all — apparently “media blitzes” are only wrong when they don’t confirm the denialist bias.

Yes, you’re right, Prof. Brillinger is not signed under any of the papers and can’t be described as someone “responsible for the statistical side of the project”. However, I don’t think you can prove that his role was restricted to “brief consultation” either. And his deeper involvement in the project would make it less probable that he overlooked something that “stuck out like a sore thumb”. His being part of the published team makes me doubt it was just a “brief consultation”, to be honest.

## tertius said

The “denialist” word appears in Gregorz Staniak’s post at 19. One word often says a lot more than many words…

## RomanM said

#20 Tertius: It’s not the first time he has used the word on tAV. In my book, anyone displaying such an ignorant attitude is frankly not worth wasting any time with.

Anyway, Ken is right. He’s hijacked the thread for long enough.

## Grzegorz Staniak said

@AC Osborn

When I read such posts like yours I always want to ask: where does you certainty and lack of doubt come from? How do you know the data published by BEST are “rubbish” and “riddled with errors”? Have you checked? How? Have you accepted somebody else’s opinion about it? Whose?

## Grzegorz Staniak said

@Kenneth Fritsch

Please read the thread you’re joining. And thank “Venter” for changing the subject, not me.

## Grzegorz Staniak said

Guys, I understand you might dislike certain words. Feel free to propose a more correct one to use, and I’ll hapilly oblige to call it what you like. But if you wanna claim there is no such phenomenon as “climate change denialism”, you might as well come from Mars.

## steven mosher said

Spoke to R Rhode and elizabeth a week or so ago in berkeley.

Basically he is inundated with comments.

1. write to judith and have her point him at it

2. write to Mac and have him recommend it.

I’ll pass along a recommendation as well.

squeaky wheel. issue nothing more. I asked about your your post specifically.

## RomanM said

Oh, puh-leeze! Patronizing, arrogant. Who do you think you are dealing with? Children?

Let’s use the two words in the same sentence to be sure that there can be no question about who we are talking about. If I thought you were capable of comprehending anything substantial, I could add some choice words to the mix myself.

## John M said

#23

If by “changing the subject” you mean starting a discussion about an inane recommendation that Jeff submit a rebuttal to an

unpublishedarticle……that would be your doing.

## Jeff Id said

Steven,

Thanks for that. I’ve emailed both and happened to notice that Judith dropped tAV from her blogroll. No problem with that but I assumed she may be angry or something. She’s left a couple of very odd selections there but that’s her right, there are plenty of readers here anyway. I just don’t really know why.

## Geoff Sherrington said

25 Steven

People who comment here, especially those not personally known to the BEST team, could benefit from using your connections to Robert and Elizabeth and others. Some of us thought you were part of the team, but if that is not the present case, it could be a good move for BEST to bring you back on board. If you don’t mind being an intermediary in this way, it seems to me that communication would be faciltated and that a better outcome is a distinct possibility.

If they are inundated with comments, then there must be people here who could help deal with comments. You are a good communicator/facilitator.

## tongo said

Hi Jeff,

Can you check that your weighting function is actually working properly. I couldn’t exactly work out what it is supposed to be doing, however i would expect that after it is applied, the mean of the weighted vectors either moves closer to the mean of the original series, or else stays the same. From my hacky test with your code, I found that it sometimes moves away (mind you this is the first R code I have ever looked at, so I may be doing something wrong).

Code I used:

data=rnorm(8000) # run for normal distribution

#data=rexp(8000)*sample(c(-1,1),8000,replace=T) #run for exponential distribution

#median v squared is .44

# #test function

#single point weighting using similar methods to BEST

getweightvector=function(x)

{

weights=rep(1,length.out=length(x))

for(i in 1:100)

{

mn=mean(x*weights)

#print(mn)

weights=2*.44/(.44+(x-mn)^2) 2 ]= 2 #cap max weighting @ 2

weights[weights< 1/13 ]= 1/13 #cap min weighting @ 1/13th

weights=weights/mean(weights)

}

weights

}

#dat = sample(data,7000,replace=F)

#wt=getweightvector(dat)

#print(mean(wt*dat))

#weighted result

weight=rep(0,10)

print(mean(data))

print("<<< source(“C:\\test.r”)> source(“C:\\test.r”)

[1] 0.007097447 <—— Original Mean

[1] "<<<<"

[1] 0.00365329 <—— Mean of Dat before weighting

[1] 0.003247681 <—— Mean of Weighted Vector

[1] "…"

[1] 0.01645587

[1] 0.008348417

[1] "…"

[1] 0.003540147

[1] 0.0002978156

[1] "…"

[1] 0.006215551

[1] 0.003838796

[1] "…"

[1] 0.006396342

[1] 0.007824298

[1] "…"

[1] 0.008576286

[1] 0.003792833

[1] "…"

[1] 0.008121355

[1] -0.002360774

[1] "…"

[1] 0.0008143292

[1] -0.00179705

[1] "…"

[1] 0.004681428

[1] 0.003562609

[1] "…"

[1] 0.01344138

[1] 0.0067368

[1] "…"

[1] "Done"

Regards (and thanks for inspiring me to download and play with R – not convinced yet)

## tongo said

Ooops, something went wrong between CTRL-C and CTRL-V. Code is:

data=rnorm(8000) # run for normal distribution

#data=rexp(8000)*sample(c(-1,1),8000,replace=T) #run for exponential distribution

#median v squared is .44

# #test function

#single point weighting using similar methods to BEST

getweightvector=function(x)

{

weights=rep(1,length.out=length(x))

for(i in 1:100)

{

mn=mean(x*weights)

#print(mn)

weights=2*.44/(.44+(x-mn)^2)

weights[weights> 2 ]= 2

weights[weights< 1/13 ]= 1/13

weights=weights/mean(weights)

}

weights

}

#dat = sample(data,7000,replace=F)

#wt=getweightvector(dat)

#print(mean(wt*dat))

#weighted result

weight=rep(0,10)

print(mean(data))

print("<<<<")

for(i in 1:10) #Only 10 because my computer is so slow

{

dat=sample(data,7000,replace=F)

print(mean(dat))

wts=getweightvector(dat)

weight[i]=mean(wts*dat)

print(weight[i])

print('…')

}

sd(weight)

print("Done")

## Www.Undertheyardarm.Com said

Hi there! Do you know if they make any plugins to safeguard against hackers? I’m kinda paranoid about losing everything I’ve worked hard on. Any recommendations?