Antarctic Temperature RegEm Forensics

Thanks to Molan Labe who was quite helpful in identifying what I believe is the correct coefficient into regparm for the purposes of recreating the RegEm calculations. After some considerable time I am still not able to recreate the results of Steig’s paper perfectly, but as you can see below, I am fairly close.

First let’s look at the mean trend of the reconstructed AWS station data by Steig. All data used was from the Climate Audit directory (thanks to Steve McIntyre) so the updated reconstructions and data are not involved.


The slope is written in the bottom right. Next is my attempt at the reconstruction.


Overall the shapes are visually quite similar so I subtracted them.


The step function is quite visible. The data pre-1980 is completely imputed/extrapolated while after 1980 has at least some content from one automatic weather station or other. Next are some of the differences in individual series from the data Eric Steig archived for Nature and my reproductions.



The flat line sections are sections where the actual data exists in each series, the other data was infilled by RegEm.  The step in the difference is quite visible here as well.  I did take the time to confirm that offsetting the surface station data had no effect on the output reconstruction whatsoever so an anomaly in the offset is not the issue.   I will try some other settings in RegEm but I doubt that this will resolve the difference.  All in all, I feel like I’m pretty close now.  Perhaps someone has a suggestion which could resolve the final differences.

Here is the code I used to make the above reconstructions in R.  If you cut and paste it you will need to fix the wordpress quotes.


#########backcalc anomalies
anoms =window(recon_aws, start = 1980)
dat_aws = window(Data$aws[,colnames(recon_aws)],start=1980,end=c(2006,12))
anoms[] =NA
#write.csv(anoms,file=”c:/agw/antarctic paper/bc anoms1.csv”)

#########calculate surfacestatoin anoms
for(i in 1:44)

#write.csv(surfanomaly,file=”c:/agw/antarctic paper/surface temp anoms.csv”)
##### combine anomalies


write.csv(anomalies,file=”c:/agw/antarctic paper/combined anomalies r4 offset.csv”)
######## after matlab RegEm run this #################
recon_id=read.csv(file=”c:/agw/antarctic paper/recon redo R4 offset e=3 rp=3.csv”,header=FALSE)


#######plot differences

for(i in 1:63)
plot(recon_aws[,i]-recon_id_aws[,i],main=paste(“RegEm Steig – IdnStation # = “,colnames(recon_aws)[i]),xlab=”Year”,ylab=”Anomaly C”)
savePlot(paste(“C:/agw/antarctic paper/id recon difference”,colnames(recon_aws)[i],”.jpg”),type=”jpg”)

#######total trend calcs

plot(idmean,main=”Antarctic AWS Recon TrendnId eigs=3 regpar=3″,xlab=”Year”,ylab=”Anomaly C”)
abline(lsfit(time(idmean),idmean,NULL), col=”red”)
text(1962,-6,paste(“Slp =”,round(  coef(lsfit(time(idmean),idmean,NULL))[2],3)  ),col=”red”)
savePlot(“C:/agw/antarctic paper/id recon trend A.jpg”,type=”jpg”)

plot(steigmean,main=”Antarctic AWS Recon TrendnSteig eigs=3 regpar=3″,xlab=”Year”,ylab=”Anomaly C”)
abline(lsfit(time(steigmean),steigmean,NULL), col=”red”)
text(1962,-6,paste(“Slp =”,round(  coef(lsfit(time(steigmean),steigmean,NULL))[2],3)  ),col=”red”)
savePlot(“C:/agw/antarctic paper/steig recon trend A.jpg”,type=”jpg”)

plot(steigmean-idmean,main=”Antarctic AWS Recon Trend DifferencenSteig – Id eigs=3 regpar=3″,xlab=”Year”,ylab=”Anomaly C”)
savePlot(“C:/agw/antarctic paper/steig minus id recon trend A.jpg”,type=”jpg”)

######recenter anomaly calculations based on 1980 and redo analysis


for(i in 1:44)
if(![i])){    sanom[,i]=sanom[,i]-means[i]}


write.csv(anomalies,file=”c:/agw/antarctic paper/combined anomalies r2.csv”)

Since most of the work was done in R, the matlab code was quite simple.

A = dlmread(‘c:/agw/antarctic paper/combined anomalies r3 offset processed.csv’, ‘,’)

OPTIONS.regress = ‘ttls’;

dlmwrite(‘c:/agw/antarctic paper/recon redo R4 offset e=3 rp=3.csv’,A)

26 thoughts on “Antarctic Temperature RegEm Forensics

  1. Jeff,

    Indulge my curiosity if you would be so kind. I recall seeing on CA or perhaps WUWT where Molan Labe taking you to task on the availability of Steig’s code. Has Molan changed his position or is he still of the opinion that the code exactly as used is available online?

  2. Jeff wrote: “The step function is quite visible. The data pre-1980 is completely imputed/extrapolated while after 1980 has at least some content from one automatic weather station or other.”

    Sorry to be dense, I don’t know what this means. Where does this pre-1980 data come from? Is it solely from a sprinkling of manned stations that was then “RegEm-ed”? Or is it from some other source that was then “RegEm-ed”?


  3. Earle,

    Whether someone agrees or not a substantial amount of code is being deliberately concealed. There is processing code for the satellite data, unusual anomaly calcs, reg em parameters and after this post perhaps even a modified version of RegEm. I suspect the RegEm was Mann’s contribution.


    The pre 1980 data is basically pasted on from surface station measurements which went back into that timeframe.\

    #3 I run wordpress from their servers. I kind of wish I had done it myself now. I’m not sure that I can implement the untexturize simply because I don’t have access to the processing. I’ll look into it more.

    In the meantime to make it faster, copy the text into something like word and do a find/replace on the left curly quote and then the right. It shouldn’t take too long, especially compared to setting up the matricies and running the code.

    If you need help, feel free to ask.

  4. Real crude here. I took both yours and Steigs “Antarctica AWS Recon Trend” graphs and put them in Paint with “draw opaque” off. I put yours over Steigs and moved it back and forth in time so I could see where the peaks were in relation to each other. Around 1980 the peaks in your graph appear mostly to be compressed relative to Steigs. Around the “step in difference” it is almost like each recon goes through filters with different damping. On the downward slope from about 1973 to 1979 the peaks in your recon appear to be generally larger.

    Is it my lying eyes or is the difference graph really ‘Id-Steig’? Any chance of getting Steig and your “Antarctica AWS Recon Trend” in a text file?

  5. Greg

    interesting. {Is it my lying eyes or is the difference graph really ‘Id-Steig’?”

    Doc Steigs graph is more positive than mine in history. I show a greater uptrend using his math.

    Any chance of getting Steig and your “Antarctica AWS Recon Trend” in a text file?

    I’ll send it by email tomorrow. I’m feeling a little tired and grumpy now.

    This whole concept of inventing data is so out of control I can’t even begin. I’ve held my tongue but it’s becoming unbearable. How can my graph be of so much greater trend and still not be wrong. Nobody could claim my reconstruction is wrong any more than they can claim steigs is wrong. Yet mine shows a much greater warming trend. It’s the same math, the same interpolating, extrapolating math with a much larger slope!! How can this be? I did my anomaly calcs correctly, I showed everyone how I did them. I used his anomaly calcs for the AWS cause there ain’t even enough data to do it reasonably. I showed that too. I fed it into the RegEm standard code and —– a different result. How can that be? What’s going on??

    If these punks would share the code and data we would all know but YOU’RE NOT QUALIFIED TO KNOW. You can’t handle the truth!!! You don’t live in you’re house, you live in their house because they allow it…. You get my meaning.

    I tell you what, I do not fear them or their loose statistics. What I do fear is the possible motive. I hope it doesn’t exist but there seems a trend toward cover and hide which leads people to question. Mann 08 was so bad that I will say until my dyeing day that I believe the stats were messed up on purpose. Tamino wouldn’t even support it when I went after him. Still I’ve read other climate/global warming papers which seemed reasonable but what is the point of making a warming trend from imputed data?? What reason do they have?

    Is it political? Is it money? Is it ego and fame? ………because it sure as hell isn’t science!!

    The unfortunate thing about all of it is that it still could be right. I just don’t know…

  6. Jeff,
    I see the puzzle pieces too.
    Thanks for the post at wuwt.
    LMAO!!!!! I think I got a small tear in my eye!

    Jeff Id (18:01:15) :

    BTW it takes large amounts of power to induce current into wire.
    No. A few TeraWatt are enough.

    Science humor is great…….

    I am still laughing!!!

    Your subtracting looks insightful. have you tried an inversion? I did see a (n-1) some where? in the code.

  7. Shit sorry!
    but a small number (0.X) X minus 1
    This will invert the hole dam mess.
    Dam sneaky us geeks are, to hide an applesauce code
    good luck!

  8. Jeff-

    In the SI it says:

    “In the full reconstructions, we used the complete weather station data (1957-2006) from all 42 of the
    READER database locations listed in Table S2”

    Yet in your “combined anomalies r4 offset.csv” input file to Matlab, you have 44 surface stations, not 42. The Data$surface you are reader from Steve Mc has 44 sites. It includes two sites that are not in Table S2 (Gough, Marion). It also has one called Mario_Zucchelli that is not in Table S1, but this appears to be the same as Terra_Nova_Bay (that is in Table S1) based on lat/long and record length.

    I added two lines to the code to delete those columns:

    #########calculate surfacestatoin anoms
    surf <- surf[,-26] #deletes column 26 – Marion
    surf <- surf[,-17] #deletes column 17 – Gough
    ### data from Terra Nova Bay appears to be same as Mario_Zucchelli ####
    surf <- surf
    for(i in 1:42)
    #write.csv(surfanomaly,file=”surface temp anoms.csv”)
    ##### combine anomalies

    This seemed to fix the slope issue in the trend difference plot (and on the individual station plots I sampled), but there is still a high noise level. I think you may still need to tweak the options.

    BTW, you left out the step where you hand edit the R output in Excel to remove the row and column header and change all the “NA”s to “NAN”s. Had to figure out that one myself.

  9. “It also has one called Mario_Zucchelli that is not in Table S1, but this appears to be the same as Terra_Nova_Bay (that is in Table S1) based on lat/long and record length.”

    Ooops, meant to say Table S2 in both cases. FYI, Gough and Marion
    aren’t included because of their latitude (around 45 S).

  10. A couple of final things I forgot in the earlier comments. I also changed the dimensioning and indexing from 44 to 42 as shown in the code in my comment above.

    The fact that Mario_Zucchelli/Terra_Nova_Bay have different names in Steve Mc’s table mean that there order in the input matrix to RegEM is slightly different from that shown in Table S2 (they are sorted alphabetically in Steve’s table and Table S2). This doesn’t affect the algorithm, but could make things confusing when you are looking over the Input/Output tables.

  11. Thanks Jeff,

    It looks like you’re becoming an expert.

    I’ll fix the code and try to work with the new data next. There wasn’t enough time to dig through all the data yet. I’m not sure if the set I’m using even has the harry correction made. I just knew the result should be close, and after your changes it is.

    BTW, I checked the slope of all the data prior to 1980 in the surface stations (which is all the data that exists) and Steig’s output. Somehow RegEm pasted on data to the AWS has a steeper upslope than the actual data.

    #7, Sometimes I get a little pissed off with the games. After a bit of venting, I feel better.

  12. Jeff-

    You getting this to work has really opened my eyes as to what might be going on here.

    Steig says on page 2 of the paper:

    “Unlike simple distance-weighting or similar calculations, application of RegEM takes into account temporal changes in the spatial covariance pattern, which depend on the relative importance of differing influences on Antarctic temperature at a given time.”

    What he fails to state is that regEM uses *no* distance weighting. RegEm has no information regarding the lat/long of either the 42 manned stations or the 63 AWS or how far apart they are. It can draw conclusions based on the similarity in the temperature trend patterns from site to site, but that is about it.

    This is important because the peninsula is known to be warming, yet only constitutes a small percentage of the overall land mass (less than 5%). Despite this, 15 of the 42 manned stations used in the reconstruction are on the peninsula! RegEm doesn’t know this. All it knows is that a lot of stations have a warming trend and infills accordingly.

    I think your code is working well-enough that we can selectively delete peninsula sites and rerun regEM and see what happens to trends.

    Crap. Valentines Day was not the right day to have this insight. The wife is going to be pissed.

  13. Jeff,

    That is exactly why I went through this. No distance weighting, how is that possible? I’m just an engineer, so when I imagine mixing the data together from a pile of instruments thousands of miles apart and predicting the others my eyes cross.

    I want to run a plot of the locations of the pre-1980 sites today just to see where most of the data came from.

    Also, removal of peninsula or just messing around with different trends to see how the values shift would be interesting.

    If we delete part of a series, how well does RegEM replace it?

    Thanks to your help I’ll be able to move forward with these things.

  14. Yep, being an engineer does give a different perspective doesn’t it? Things we design actually have to work.

    I’m going to play with converting the 42 manned station data into sectors or gridcells (ala Hansen’s gisstemp) and then feeding them into regEM. By that I mean plot the locations of the 42 on a map, divide Antarctica up into sectors, then average the results of all the stations within a sector into a single time series. Then use the sector time series in place of the 42 stations for regEM input.

    Some of the sectors will have many stations averaged (like the peninsula), others will only have one. This should reduce disproportionate weighting of certain locations where there are lots of stations in the reconstruction.

  15. Re: I want to run a plot of the locations of the pre-1980 sites today just to see where most of the data came from.

    Not sure if you have done it yet, but here it is. It’s not pretty.

  16. I try to run the code and get nothing but errors… What exactly am I supposed to change / replace, and with what?. Can you email me or link to a .txt file that will run in R 2.8.1?

    Just so you know, I’m a complete rookie with R so I might have something set up wrong too…?


  17. Michael D Smith

    It takes a bit of time to run and you need both matlab and R.

    Run the first section of code up to the

    ######## after matlab RegEm run this #################

    Then you need to replace all the NA values in the CSV file with NaN delete the header and far left columns, and run in matlab. After matlab run the R code after the above comment.

    That’s it.

  18. #16,
    Ergo “the Rain in Maine falls mainly on the Seine” at CA. SM found whopper grid errors in some of the Team papers, and the response was deafening….silence. Now I get it. Since the papers were applying Mannian RegM, grid locale mattered not at all.

  19. To reproduce this effort the following file is needed but not provided as a download link to R

    ”c:/agw/antarctic paper/recon redo R4 offset e=3 rp=3.csv”

  20. #23 thats the name I gave the file from the matlab run. All the data you need is provided thanks to Steve McIntyre

    Before you run matlab you need to replace all the NA’s in the file with NaN and delete the header row and far left column (column number). Microsoft excel does these things nicely. After that there isn’t much else to it.

  21. I have a problem looking at the first couple of graphs and the plotting of slope based on them. I know we’ve discussed before that you shouldn’t smooth before analysis, but we’re trying to look at a warming trend in a signal with 10C range, none of which even the Team would claim their models can reproduce. It would seem you should either reject the outliers, or average over the annum/decade before trying to do any real stats on it. But I realise we really only have 30-40 years of data to look at, and the provenance of that is debatable. I doubt any of us with a stats background are used to dealing with such small sample sizes in a chaotic system. It’s really all just so much nonesense wrapped in equations.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s