Antarctic Temperature RegEm Forensics

Thanks to Molan Labe, who was quite helpful in identifying what I believe is the correct value of the regpar parameter for recreating the RegEM calculations. After considerable time I am still not able to recreate the results of Steig's paper perfectly, but as you can see below, I am fairly close.

First, let's look at the mean trend of Steig's reconstructed AWS station data. All data used is from the Climate Audit directory (thanks to Steve McIntyre), so the updated reconstructions and data are not involved.

[Figure: steig-recon-trend-a]

The slope is written in the bottom right. Next is my attempt at the reconstruction.

[Figure: id-recon-trend-a]

Overall, the shapes are visually quite similar, so I subtracted them.

[Figure: steig-minus-id-recon-trend-a]

The step function is quite visible. The data pre-1980 is completely imputed/extrapolated, while the data after 1980 has at least some content from one automatic weather station or another. Next are some of the differences between individual series from the data Eric Steig archived for Nature and my reproductions.


[Figure: id-recon-difference-89332]

[Figure: id-recon-difference-89284]

The flat-line sections are where actual data exists in each series; the rest was infilled by RegEM. The step in the difference is quite visible here as well. I did take the time to confirm that offsetting the surface station data had no effect whatsoever on the output reconstruction, so an anomaly in the offset is not the issue. I will try some other settings in RegEM, but I doubt that this will resolve the difference. All in all, I feel like I'm pretty close now. Perhaps someone has a suggestion which could resolve the final differences.
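To put a rough number on the step, the two mean series can be differenced and the pre- and post-1980 averages compared. This is a minimal sketch, assuming idmean and steigmean have already been built by the code below:

d = steigmean - idmean                # Steig minus my reconstruction
mean(window(d, end = c(1979, 12)))    # average difference before 1980
mean(window(d, start = 1980))         # average difference from 1980 on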

Here is the code I used to make the above reconstructions in R.

download.file("http://www.climateaudit.org/data/steig/Data.tab", "Data.tab", mode = "wb")
load("Data.tab")
download.file("http://www.climateaudit.org/data/steig/Info.tab", "Info.tab", mode = "wb")
load("Info.tab")
download.file("http://www.climateaudit.org/data/steig/recon_aws.tab", "recon_aws.tab", mode = "wb")
load("recon_aws.tab")
dimnames(recon_aws)[[2]] = Info$aws_grid$id   # label recon columns with AWS station ids

#########back-calculate AWS anomalies
anoms = window(recon_aws, start = 1980)
dat_aws = window(Data$aws[, colnames(recon_aws)], start = 1980, end = c(2006, 12))
anoms[is.na(dat_aws)] = NA   # keep only the points where real AWS data exists
#write.csv(anoms, file = "c:/agw/antarctic paper/bc anoms1.csv")

#########calculate surface station anoms
surf = Data$surface
surf = window(surf, start = 1957, end = c(2006, 12))
a = as.numeric(surf)
dim(a) = c(600, 44)   # 600 months (1957-2006) x 44 stations
surfanomaly = array(NA, dim = c(600, 44))
for (i in 1:44)
{
  b = a[, i]
  dim(b) = c(12, 50)                # months x years for station i
  an = rowMeans(b, na.rm = TRUE)    # 12 monthly means
  surfanomaly[, i] = a[, i] - an    # the 12 means recycle down the 600 rows
}
#write.csv(surfanomaly, file = "c:/agw/antarctic paper/surface temp anoms.csv")
sanom = ts(surfanomaly, start = 1957, deltat = 1/12)
##### combine anomalies

anomalies = ts.union(anoms, sanom)   # AWS anomalies are padded with NA back to 1957

write.csv(anomalies, file = "c:/agw/antarctic paper/combined anomalies r4 offset.csv")
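# (An alternative sketch, not the run I used: write.table can produce the
# MATLAB-ready file directly, avoiding the manual Excel step described in the
# comments below of deleting the header row/index column and replacing NA with NaN.)
# write.table(anomalies, file = "c:/agw/antarctic paper/combined anomalies r4 offset.csv",
#             sep = ",", na = "NaN", row.names = FALSE, col.names = FALSE)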
######## after matlab RegEm run this #################
recon_id = read.csv(file = "c:/agw/antarctic paper/recon redo R4 offset e=3 rp=3.csv", header = FALSE)

recon_id_aws = ts(recon_id[1:63], start = 1957, deltat = 1/12)   # first 63 columns are the AWS recon

#######plot differences

for (i in 1:63)
{
  plot(recon_aws[, i] - recon_id_aws[, i],
       main = paste("RegEm Steig - Id\nStation # =", colnames(recon_aws)[i]),
       xlab = "Year", ylab = "Anomaly C")
  savePlot(paste("C:/agw/antarctic paper/id recon difference", colnames(recon_aws)[i], ".jpg"),
           type = "jpg")
}

#######total trend calcs
idmean = ts(rowMeans(recon_id_aws), start = 1957, deltat = 1/12)    # mean of my 63 reconstructed series
steigmean = ts(rowMeans(recon_aws), start = 1957, deltat = 1/12)    # mean of Steig's 63 series

plot(idmean, main = "Antarctic AWS Recon Trend\nId eigs=3 regpar=3", xlab = "Year", ylab = "Anomaly C")
abline(lsfit(time(idmean), idmean), col = "red")   # least-squares trend line
text(1962, -6, paste("Slp =", round(coef(lsfit(time(idmean), idmean))[2], 3)), col = "red")
savePlot("C:/agw/antarctic paper/id recon trend A.jpg", type = "jpg")

plot(steigmean, main = "Antarctic AWS Recon Trend\nSteig eigs=3 regpar=3", xlab = "Year", ylab = "Anomaly C")
abline(lsfit(time(steigmean), steigmean), col = "red")
text(1962, -6, paste("Slp =", round(coef(lsfit(time(steigmean), steigmean))[2], 3)), col = "red")
savePlot("C:/agw/antarctic paper/steig recon trend A.jpg", type = "jpg")

plot(steigmean - idmean, main = "Antarctic AWS Recon Trend Difference\nSteig - Id eigs=3 regpar=3", xlab = "Year", ylab = "Anomaly C")
savePlot("C:/agw/antarctic paper/steig minus id recon trend A.jpg", type = "jpg")

######recenter anomaly calculations on a 1980+ baseline and redo the analysis

sanomw = window(sanom, start = 1980)
means = colMeans(sanomw, na.rm = TRUE)   # post-1980 mean of each surface station

for (i in 1:44)
{
  if (!is.na(means[i])) { sanom[, i] = sanom[, i] - means[i] }
}

######

anomalies = ts.union(anoms, sanom)
write.csv(anomalies, file = "c:/agw/antarctic paper/combined anomalies r2.csv")

Since most of the work was done in R, the matlab code was quite simple.

A = dlmread('c:/agw/antarctic paper/combined anomalies r3 offset processed.csv', ',');

OPTIONS.regress = 'ttls';   % truncated total least squares
OPTIONS.neigs = 3;          % eigs=3 in the plot titles
OPTIONS.regpar = 3;         % regularization parameter, regpar=3
A = regem(A, OPTIONS);

dlmwrite('c:/agw/antarctic paper/recon redo R4 offset e=3 rp=3.csv', A)

26 thoughts on "Antarctic Temperature RegEm Forensics"

  1. Jeff,

    Indulge my curiosity if you would be so kind. I recall seeing on CA or perhaps WUWT where Molan Labe took you to task on the availability of Steig's code. Has Molan changed his position, or is he still of the opinion that the code exactly as used is available online?

  2. Jeff wrote: "The step function is quite visible. The data pre-1980 is completely imputed/extrapolated, while the data after 1980 has at least some content from one automatic weather station or another."

    Sorry to be dense, but I don't know what this means. Where does this pre-1980 data come from? Is it solely from a sprinkling of manned stations that was then "RegEm-ed"? Or is it from some other source that was then "RegEm-ed"?

    Bruce

  3. Earle,

    Whether someone agrees or not, a substantial amount of code is being deliberately concealed. There is processing code for the satellite data, unusual anomaly calcs, RegEm parameters, and after this post perhaps even a modified version of RegEm. I suspect the RegEm was Mann's contribution.

    Bruce,

    The pre-1980 data is basically pasted on from surface station measurements which go back into that timeframe.

    #3 I run WordPress from their servers. I kind of wish I had done it myself now. I'm not sure that I can implement the untexturize fix simply because I don't have access to the processing. I'll look into it more.

    In the meantime, to make it faster, copy the text into something like Word and do a find/replace on the left curly quote and then the right. It shouldn't take too long, especially compared to setting up the matrices and running the code.
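    The same clean-up can be done in R itself; a quick sketch, where the file name is just whatever you saved the pasted code as:

    txt <- readLines("pasted_code.R")           # your pasted copy of the code
    txt <- gsub("[\u201C\u201D]", "\"", txt)    # straighten left/right double quotes
    txt <- gsub("[\u2018\u2019]", "'", txt)     # straighten single quotes
    writeLines(txt, "pasted_code_fixed.R")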

    If you need help, feel free to ask.

  4. Real crude here. I took both yours and Steig's "Antarctic AWS Recon Trend" graphs and put them in Paint with "draw opaque" off. I put yours over Steig's and moved it back and forth in time so I could see where the peaks were in relation to each other. Around 1980 the peaks in your graph appear mostly to be compressed relative to Steig's. Around the "step in difference" it is almost like each recon goes through filters with different damping. On the downward slope from about 1973 to 1979 the peaks in your recon appear to be generally larger.

    Is it my lying eyes or is the difference graph really 'Id-Steig'? Any chance of getting Steig's and your "Antarctic AWS Recon Trend" series in a text file?

  5. Greg,

    interesting. "Is it my lying eyes or is the difference graph really 'Id-Steig'?"

    Doc Steig's graph is more positive than mine in history. I show a greater uptrend using his math.

    "Any chance of getting Steig and your 'Antarctic AWS Recon Trend' in a text file?"

    I’ll send it by email tomorrow. I’m feeling a little tired and grumpy now.
    ----

    This whole concept of inventing data is so out of control I can't even begin. I've held my tongue but it's becoming unbearable. How can my graph show so much greater a trend and still not be wrong? Nobody could claim my reconstruction is wrong any more than they can claim Steig's is wrong. Yet mine shows a much greater warming trend. It's the same math, the same interpolating, extrapolating math, with a much larger slope!! How can this be? I did my anomaly calcs correctly; I showed everyone how I did them. I used his anomaly calcs for the AWS 'cause there ain't even enough data to do it reasonably. I showed that too. I fed it into the standard RegEm code and… a different result. How can that be? What's going on??

    If these punks would share the code and data we would all know, but YOU'RE NOT QUALIFIED TO KNOW. You can't handle the truth!!! You don't live in your house, you live in their house because they allow it… You get my meaning.

    I tell you what, I do not fear them or their loose statistics. What I do fear is the possible motive. I hope it doesn't exist, but there seems to be a trend toward cover and hide, which leads people to question. Mann 08 was so bad that I will say until my dying day that I believe the stats were messed up on purpose. Tamino wouldn't even support it when I went after him. Still, I've read other climate/global warming papers which seemed reasonable, but what is the point of making a warming trend from imputed data?? What reason do they have?

    Is it political? Is it money? Is it ego and fame? … because it sure as hell isn't science!!

    The unfortunate thing about all of it is that it still could be right. I just don’t know…

  6. Jeff,
    I see the puzzle pieces too.
    Thanks for the post at WUWT.
    LMAO!!!!! I think I got a small tear in my eye!

    Jeff Id (18:01:15) :

    BTW it takes large amounts of power to induce current into wire.
    No. A few TeraWatt are enough.

    Science humor is great…….

    I am still laughing!!!

    Your subtracting looks insightful. Have you tried an inversion? I did see an (n-1) somewhere in the code.

  7. Shit, sorry!
    but a small number (0.X), X minus 1
    This will invert the whole damn mess.
    Damn sneaky us geeks are, to hide an applesauce code
    good luck!

  8. Jeff-

    In the SI it says:

    “In the full reconstructions, we used the complete weather station data (1957-2006) from all 42 of the
    READER database locations listed in Table S2”

    Yet in your "combined anomalies r4 offset.csv" input file to Matlab, you have 44 surface stations, not 42. The Data$surface you are reading from Steve Mc has 44 sites. It includes two sites that are not in Table S2 (Gough, Marion). It also has one called Mario_Zucchelli that is not in Table S1, but this appears to be the same as Terra_Nova_Bay (that is in Table S1) based on lat/long and record length.

    I added two lines to the code to delete those columns:

    #########calculate surface station anoms
    surf = Data$surface
    surf = window(surf, start = 1957, end = c(2006, 12))
    surf <- surf[, -26]   # deletes column 26 - Marion
    surf <- surf[, -17]   # deletes column 17 - Gough
    ### data from Terra_Nova_Bay appears to be the same as Mario_Zucchelli, so no column is deleted ###
    a = as.numeric(surf)
    dim(a) = c(600, 42)
    surfanomaly = array(NA, dim = c(600, 42))
    for (i in 1:42)
    {
      b = a[, i]
      dim(b) = c(12, 50)
      an = rowMeans(b, na.rm = TRUE)
      surfanomaly[, i] = a[, i] - an
    }
    #write.csv(surfanomaly, file = "surface temp anoms.csv")
    sanom = ts(surfanomaly, start = 1957, deltat = 1/12)
    ##### combine anomalies
    ##### combine anomalies

    This seemed to fix the slope issue in the trend difference plot (and in the individual station plots I sampled), but there is still a high noise level. I think you may still need to tweak the options.

    BTW, you left out the step where you hand-edit the R output in Excel to remove the row and column headers and change all the "NA"s to "NaN"s. Had to figure that one out myself.

  9. “It also has one called Mario_Zucchelli that is not in Table S1, but this appears to be the same as Terra_Nova_Bay (that is in Table S1) based on lat/long and record length.”

    Oops, meant to say Table S2 in both cases. FYI, Gough and Marion aren't included because of their latitude (around 45 S).

  10. A couple of final things I forgot in the earlier comments. I also changed the dimensioning and indexing from 44 to 42 as shown in the code in my comment above.

    The fact that Mario_Zucchelli/Terra_Nova_Bay have different names in Steve Mc's table means that their order in the input matrix to RegEM is slightly different from that shown in Table S2 (they are sorted alphabetically in Steve's table and in Table S2). This doesn't affect the algorithm, but it could make things confusing when you are looking over the input/output tables.

  11. Thanks Jeff,

    It looks like you’re becoming an expert.

    I'll fix the code and try to work with the new data next. There wasn't enough time to dig through all the data yet. I'm not sure if the set I'm using even has the Harry correction made. I just knew the result should be close, and after your changes it is.

    BTW, I checked the slope of all the pre-1980 data in the surface stations (which is all the data that exists there) against Steig's output. Somehow the data RegEm pasted onto the AWS recon has a steeper upslope than the actual data.
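    For anyone who wants to repeat that check, here is a rough sketch using the objects from the post's code (surfmean is just an unweighted mean of the surface anomalies, which is a simplification):

    surfmean = ts(rowMeans(sanom, na.rm = TRUE), start = 1957, deltat = 1/12)
    pre_recon = window(steigmean, end = c(1979, 12))
    pre_surf = window(surfmean, end = c(1979, 12))
    coef(lsfit(time(pre_recon), pre_recon))[2]   # Steig recon slope pre-1980
    coef(lsfit(time(pre_surf), pre_surf))[2]     # actual surface data slope pre-1980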

    ——–
    #7, Sometimes I get a little pissed off with the games. After a bit of venting, I feel better.

  12. Jeff-

    Your getting this to work has really opened my eyes as to what might be going on here.

    Steig says on page 2 of the paper:

    “Unlike simple distance-weighting or similar calculations, application of RegEM takes into account temporal changes in the spatial covariance pattern, which depend on the relative importance of differing influences on Antarctic temperature at a given time.”

    What he fails to state is that RegEM uses *no* distance weighting. RegEM has no information regarding the lat/long of either the 42 manned stations or the 63 AWS, or how far apart they are. It can draw conclusions based on the similarity in the temperature trend patterns from site to site, but that is about it.

    This is important because the peninsula is known to be warming, yet it constitutes only a small percentage of the overall land mass (less than 5%). Despite this, 15 of the 42 manned stations used in the reconstruction are on the peninsula! RegEM doesn't know this. All it knows is that a lot of stations have a warming trend, and it infills accordingly.

    I think your code is working well enough that we can selectively delete peninsula sites, rerun RegEM, and see what happens to the trends; see the sketch at the end of this comment.

    Crap. Valentine's Day was not the right day to have this insight. The wife is going to be pissed.
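    As a sketch of the experiment (the column numbers are placeholders, not the actual peninsula stations, which would have to be looked up from the station list first):

    peninsula_cols <- c(5, 9, 12)            # hypothetical peninsula column indices
    sanom_np <- sanom[, -peninsula_cols]     # drop them from the surface anomalies
    anomalies <- ts.union(anoms, sanom_np)   # rebuild the RegEM input without them
    write.csv(anomalies, file = "combined anomalies no peninsula.csv")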

  13. Jeff,

    That is exactly why I went through this. No distance weighting; how is that possible? I'm just an engineer, so when I imagine mixing together data from a pile of instruments thousands of miles apart and predicting one from the others, my eyes cross.

    I want to run a plot of the locations of the pre-1980 sites today just to see where most of the data came from.

    Also, removing the peninsula sites or just messing around with different trends to see how the values shift would be interesting.

    If we delete part of a series, how well does RegEM replace it? (A sketch of such a test is at the end of this comment.)

    Thanks to your help I’ll be able to move forward with these things.
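    One way to frame that withholding test, as a sketch only (the column and date block are arbitrary, and the RegEM step still happens in matlab):

    test = anomalies
    rows = 397:456           # Jan 1990 - Dec 1994, counting from the 1957 start
    held = test[rows, 1]     # save the real values from column 1
    test[rows, 1] = NA       # withhold them
    write.csv(test, file = "withholding test.csv")
    # run RegEM on the edited file, read it back in, and compare the
    # infilled values in those rows of column 1 against 'held'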

  14. Yep, being an engineer does give a different perspective, doesn't it? Things we design actually have to work.

    I'm going to play with converting the 42 manned-station series into sectors or gridcells (a la Hansen's GISTEMP) and then feeding them into RegEM. By that I mean plot the locations of the 42 on a map, divide Antarctica up into sectors, then average all the stations within a sector into a single time series. Then use the sector time series in place of the 42 stations as RegEM input; a rough sketch follows below.

    Some of the sectors will have many stations averaged (like the peninsula), others will have only one. This should reduce the disproportionate weighting of locations where lots of stations sit in the reconstruction.
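    Roughly like this (the sector assignment is a placeholder; the real one would come from binning the station lat/longs):

    sector <- rep(1:6, length.out = ncol(sanom))   # hypothetical sector labels, one per station
    sector_series <- sapply(1:6, function(s)
        rowMeans(sanom[, sector == s, drop = FALSE], na.rm = TRUE))
    sector_ts <- ts(sector_series, start = 1957, deltat = 1/12)
    anomalies <- ts.union(anoms, sector_ts)        # sector averages replace the 42 stations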

  15. Re: "I want to run a plot of the locations of the pre-1980 sites today just to see where most of the data came from."

    Not sure if you have done it yet, but here it is. It’s not pretty.

  16. I try to run the code and get nothing but errors… What exactly am I supposed to change/replace, and with what? Can you email me or link to a .txt file that will run in R 2.8.1?

    Just so you know, I'm a complete rookie with R, so I might have something set up wrong too.

    Thanks.

  17. Michael D Smith

    It takes a bit of time to run, and you need both matlab and R.

    Run the first section of code up to the

    ######## after matlab RegEm run this #################

    Then you need to replace all the NA values in the CSV file with NaN, delete the header row and the far-left column, and run it in matlab. After matlab, run the R code following the above comment.

    That’s it.

  18. #16,
    Ergo "the Rain in Maine falls mainly on the Seine" at CA. SM found whopper grid errors in some of the Team papers, and the response was deafening… silence. Now I get it. Since the papers were applying Mannian RegEM, grid locale mattered not at all.

  19. To reproduce this effort, the following file is needed but is not provided as a download:

    "c:/agw/antarctic paper/recon redo R4 offset e=3 rp=3.csv"

  20. #23, that's the name I gave the file from the matlab run. All the data you need is provided, thanks to Steve McIntyre.

    Before you run matlab you need to replace all the NA's in the file with NaN and delete the header row and the far-left column (the row numbers). Microsoft Excel does these things nicely. After that there isn't much else to it.

  21. I have a problem looking at the first couple of graphs and the plotting of slope based on them. I know we've discussed before that you shouldn't smooth before analysis, but we're trying to look at a warming trend in a signal with a 10 C range, none of which even the Team would claim their models can reproduce. It would seem you should either reject the outliers or average over a year/decade before trying to do any real stats on it. But I realise we really only have 30-40 years of data to look at, and the provenance of that is debatable. I doubt any of us with a stats background are used to dealing with such small sample sizes in a chaotic system. It's really all just so much nonsense wrapped in equations.
