## Improved Weight Calculation

Posted by Jeff Id on June 15, 2009

My last post on station weight back calculations had a few of problems in it. First, I made the calculation using infilled RegEM matrices, this resulted in a distorted perception of the peninsula weights. The second problem is that I had some location errors in the matrix resulting in improper labeling of individual stations. The third problem is that my back-calculation didn’t allow for the potential for different weightings applied to each year. All things considered, the results are not that different.

This is unfortunate however, because I’ve found that the weighting isn’t a simple value per station but rather a complex intermixing of a different weight vector for each year. When we have to discuss the detail of whether a thermometer anomaly should be inverted, complex weightings aren’t going to help people figure this out ..one bit.

There are still plenty of negative values but I haven’t re-analyzed where they come from and I believe my previous post is correct in this. The B RegEM TTLS matrix has columns and rows which correspond to missing and available values. The matrix has horizontal and vertical dimensions with column and row numbers that always add to 45, corresponding to the 42 surface stations plus 3 pc’s of the satellite data.

The sum of the column values in the B matrix represents the final weighting of the temperature series for each year in the reconstruction. If you follow the math, you’ll wonder how I back-calculated the reconstruction successfully. Since there are years with slight differences small changes in matrix values can create big changes in the result. Basically, I got lucky. Was my previous result wrong?? — nope sorry Gavin. Does it need more improvement? Yes TCO, it does.

— You really have to love blogland.

So NicL noticed my station labling problem which did lead to an apparent overweighting of the peninsula. However the peninsula in Nic’s calculation of Steig et al, similarly to my own, showed far too heavy a weight in comparison to the geographic area – 49% trend (X times) weight and I was still wrong in assuming that the trend didn’t come in large part from the peninsula. Nic, did a great job and I’m thankful for the checking — I do make plenty of mistakes as this is pretty well real time blogging.

Nic’s answer I believe also has a bit of trouble and it’s also my fault. —Ever had one of those lives?

In the post before, the back calculations included infilled station values. Each infilled value had some contribution from the other stations already included. This mixes the contributions of non-peninsula stations with peninsula and vice-versa. So to do it correctly I should have filled the ‘A’ matrix from before with only station values and the output with only the RegEM result. Instead I used RegEM’ed station data.

So I still haven’t repaired the other post and if you follow this you will understand why I’ve waited this long. Steve McIntyre emailed to say he already had the weightings months ago but didn’t do anything with them. His understanding of what was happening in TTLS was much better than my own. Thanks to CA help, I’m catching up though.

Steve said in his post on TTLS – Truncated total least squares.

Because the TTLS relationship can be reduced to a matrix of coefficients , the ultimate Antarctic temperature index proves to be a linear weighting of the underlying station histories.

There’s nothing wrong with that but the actual situation in RegEM is more complex than I thought. Unlike TTLS, the linear weighting apparently changes on an annualized basis for RegEM. Each year with different stations has a slightly different weighting. Stations don’t even maintain the same ratio of weighting when a new one is added, but they are close… It’s blurry math! Similar to image processing nothing you see is definite, only an approximation of the physical world.

From my last post I was able to back calculate the output of RegEM by solving a matrix of 34 equations. I showed that 5 stations had negative weights and everything solved. There was an oddity in the solution which I explained as rounding error. I wasn’t certain about it though.

Here is the important graph (Figure 4) from before which has not changed.

This graph shows by the flat line difference between the Steig solution and the back-calculated linear weighted solution that I successfully calculated station weights which backsolve the problem but the ripples in the line show some imperfection which is not typical. The result had a small problem because I made an error in the assumption that linear weighting would be equal for each year as the TTLS CA solution was.

The B matrix is used in TTLS as the linear multiplication factors between missing and non-missing values. This graph is similar to but not exactly the same as the previous posts weightings. To be exact I need to take the last step of deconvolving the weights from the satellite PC’s, *it’s going to be fun*. What these weights represent is the total weighting of each region to the total missing values. Missing values include surface stations, as well as the satellite record so these weights represent how the total reconstruction is weighted through all stations, surface and satellite.

You can see the Peninsula has a similar weighting to the 50% of my pie chart again showing that the peninsula stations are still overweighted in the reconstruction. The thing I was impressed by was the station weighting didn’t change a great deal through the length of the record. The satellite data shows up at the top right of the plot post 1982.

The code for the above post looks like this: —formatting courtesy wordpress.

#perform RyanO SteveM RegEM reconstruction

clip=form.steig

dat=window(calc.anom(all.avhrr), start=c(1982), end=c(2006, 12))

base=window(parse.gnd.data(all.gnd, form=clip), start=1957, end=c(2006, 12))

base=calc.anom(base)

pcs=get.PCs(dat,3)

dd=ts.union(base,pcs[[1]])

reg3=regem.pttls(dd,maxiter=50,tol=.005,regpar=3,method=”default”, startmethod=”zero”, p.info=”Unspecified Matrix”)

dim(reg3[[35]]$X) #600 45#extract surfacestations and PC’s

regemSST=ts(reg3[[35]]$X[,1:42],start=1957,deltat=1/12)

dim(regemSST) #600 42

regemPC=ts(reg3[[35]]$X[,43:45],start=1957,deltat=1/12)

#######################

##calculate all weights

wtary=array(NA,dim=c(600,45))

for (i in 1:600)

{

m=as.numeric(row.names(reg3[[35]]$B[[i]]))wtary[i,m]=rowSums(abs(reg3[[35]]$B[[i]]))

}

########################

########################

statmask=rep(TRUE,44)

statmask[17]=FALSE #rm gough

statmask[26]=FALSE #rm marion

stationIDX=all.idx[1:44,]

stationIDX=stationIDX[statmask,]nam=as.character(stationIDX)

lats=stationIDX[[2]]

lons=stationIDX[[3]]

stationtype=c(stationIDX[[10]],5,5,5)swts=array(0,dim=c(600,5))

for(i in 1:5)

{

for(j in 1:600)

{

mask=stationtype==i

swts[j,i]=sum(wtary[j,mask],na.rm=TRUE)

}

}

sumrow=rowSums(swts)

swt=(swts/sumrow)*100

swt=ts(swt,start=1957,deltat=1/12)par(mar=c(5,4,4,1))

barplot(t(swt),beside=FALSE,border=NA,col=c(1,2,3,4,5),space=0,main=”Steig Replication B Matrix Absolute – |Weights|”,ylab=”Absolute Percent Weight”,xlab=”Year”)

legend(“topleft”, legend=c(“Peninsula”, “West Antarctica”, “Ross Ice Shelf”, “East Antarctica”), cex=.6, col=c(1, 2, 3, 4), bg=”white”, pch=15,horiz=FALSE)

abline(v=300,col=”darkred”,lwd=3)

text(300,10,”1982″,col=”red”)

text(30,10,”1957″,col=”red”)

text(570,10,”2007″,col=”red”)

#savePlot(“C:/agw/antarctic paper/pics/Steig Recon B matrix weights.jpg”,type=”jpg”)

## TCO said

Dumb, innocent question (honest): The trend is not ALL sattelites after 1982?

## Jeff Id said

The trend is presented as 100% satellite after 1982, however the surface station version is calculated behind the scenes. This behind the sat recon is what RyanEM brings to the forefront.

## Fluffy Clouds (Tim L) said

wow! but what happened to west? some of this shows 0 zero….

that is where the peninsula data went?

Thank you for the redo!

my response is still the same as last time/post.

## Jeff Id said

#3 No problem. It’s complicated enough that it’s difficult to figure out.

## Stephen McIntyre said

Jeff, this is definitely coming along.

I didn’t mean to imply that the weights didn’t change year-to-year in RegEM. In the post in question, I was trying to focus on the fact that the actual data had an “almost block” matrix structure – to coin a term – and to derive coefficients for that case and and analyse the RegEM variation from that “base case”.

There are some older CA posts on weights in MBH and I discussed these in my Georgia Tech presentation. There’s a pretty graphic in it showing the bristlecone weights.

The approach there was very much along the lines that interest you here, relevant CA posts include http://www.climateaudit.org/?p=2708, 2841, 776 and others.

Interestingly, I ended up grouping proxies to achieve an intelligible representation of results that was along the lines of what you’ve done here. I grouped proxies by continent, keeping bristlecones as a separate class. I then calculated the contribution of each proxy class to the total reconstruction and presented a spaghetti graph. Every proxy class other than bristlecones was pretty much low-amplitude noise, while bristlecones had a HS of the same shape and size as the final recon.

In this case, the final recon is the sum of contributions from the 4 classes. It would be interesting to do a spaghetti graph of the contributions of each class to the recon. My guess is that this will show the Peninsula contribution quite vividly.

There’s a definite irony in all of this, as readers of our blogs speculated almost instantly that the Peninsula stations would have a comparable function to bristlecones, with the multivariate methodology serving to overweight these stations and proof of this seems ever closer.

## Al S. said

Surely, using Ross Ice Shelf Data (from sattelite, I preume)for more than 10% of your weight after 1982 will introduce a false trend… it’s GOT to be a lot closer to zero C. than the interior.

## Kenneth Fritsch said

Jeff ID, I see your explanations of your corrections of analyses, that, like you say, are near real time, as something I really appreciate about blogging. It provides an opportunity to obtain more insights into the whole process and it gives me more confidence in credibility of the analyzer.

I also have been referencing yours and others R scripts to help me learn the applications of more R functions and commands.

## TCO said

6. Jeff’s answer was a little ambigous, but I think the Steig trend is all sattelite after 1982. The picture shows some other calculation going on in the background, but regardless, the post 1982 trend is all sattelite. Jeff can clarify if I got that wrong.

## Jeff Id said

#7 Thanks Ken, DHog called me dishonest on tamino’s blog. I realize he can’t read technical (I like that wording), but how can I lie if the code is right there?

Whatever.

## Kenneth Fritsch said

TCO, if you can cut out all those silly mind games you like to play and attempts to spin comments of others, perhaps you can get yourself up to speed on this part of the analysis. We have had a rather lengthy discussion of this issue a while back and I am quite sure that you were around – in fact I know you were. Ryan O in his recent comments has also pointed to this issue as a problem with the Steig et al reconstruction – in splicing measured data (1982-2006) to the reconstruction (1957-1981). Try the link below and some related ones following it.

Steig’s Verification Statistics – Part 2 (A little more majic)

Posted by Jeff Id on April 28, 2009

## TCO said

I’m still curious if the post 1982 trend is sattelite or not. This is not a mind game (honest). Pinning things down.

REPLY:Satellite was shown in Steig et al post 1982. The recon exists behind it in the math. This was well established early in the analysis.

## TCO said

Understood. That’s what I thought. Just double clarifying since Al S. at 6 was still confused even after your explanation.