The Best Laid Plans
Posted by Jeff Condon on August 30, 2009
My job for the Antarctic paper so far has been simple. Present an area weighted reconstruction of the surface stations as a simple sanity check of the serious math reconstruction Ryan and Nic have been doing and I’ve been reading along on by email. So if you’ve been visiting once in a while you know I’ve presented several area weighted reconstruction’s. The algorithm I prefer so far is one which infill’s each satellite gridcell by the closest station which has data for that year I’ve created several ‘dozen’ versions, the latest one worth publishing was shown Some Other Area Weighted Reconstrucions and others were presented at Area Weighted TPCA Check .
Since the latest reconstructions utilize the noisy automatic weather stations which weren’t employed until after 1982 there are a lot of new series starting in the second half of the data. This creates some problems. First, we need to recognize that the Manned stations are more consistent and of a better quality than AWS stations which can become buried in snow for extended periods of time. A second issue is also important, each anomaly is centered about it’s mean. Average = 0. So if you have two thermometers which measure exactly the same values always and one starts 10 years after the other. Both have a mean of all their data of zero so if any trend exists thermometer 2 which measures exactly the same temperature as thermometer 1 ends up having a slightly different anomaly. When the two are averaged this creates a sudden step in the data.
Currently all presented plots do not correct for this factor. Since trends are fairly low, it isn’t a terrible method to use and we can expect that whatever corrected versions I come up with we’ll have a similar result. I’ve had several ideas of how to fix this annoying effect, some more stupid than others and none of which were worth posting on.
One particularly misguided method I used was to start at 1957 – the beginning year of our reconstruction – and offset existing Jan 1957 temps by zero. Then each time a new series came in I took the mean of all the other stations and applied that as the offset for the new data. I got a trend of 0.12 C/Decade which is out of whack completely and matches Steig et al. We know it’s out of whack by checking the surface station versions of the same algorithm which typically come to 0.04 or 0.05C/Decade. What the algorithm did by taking mean was to ignore the fact that the peninsula and Ross ice shelf are heavily oversampled in relation to the rest of the continent.
Ryan O made a great point that the offset in this manner assumes that when a new station is introduced and the mean is used, the temperature record for the new station prior to its existence is assumed to be the same as the continent. It’s an obviously incorrect assumption. In this post, I tried another method for correction. Since any offset assumes the trend of the new station prior to its own existence the next best option would be to choose the next closest station with existing data to compute the offset!
Well the algorithm I used was designed to calculated the offset between the new station and next closest station by using an average of the closest data points. I took the matrix of all ground station data on the sat grid 63 manned and aws stations and ran them through a filtering algorithm. This matrix FYo (filtered Yo)was stored separately and used only for calculating offsets.
squarefilter=function(Yo=Yo,windowsize=3)
{
FYo=Yo #allocate memory
for(j in time(Yo))
{
FYo[time(Yo)==j,]= colMeans(window(Yo,start=j-windowsize,end=j+windowsize),na.rm=TRUE)
}
FYo
}
FYo=squarefilter(Yo,2)
The last line calls the function with a half width of two years. The filter does no endpoint projection and simply averages the data inside the specified window. Seemed like a good idea, however the data is so noisy that I got some unusual results. The function to go through the series wasn’t terribly complicated but isn’t exactly simple either.
for (i in 1:600)
{
mask= is.na(offset)
newmask=((!is.na(Yo[i,]))-!mask)>0
if (sum(newmask)>0)
{#new value started
for(j in which(newmask==TRUE))
{
#get next closest distance with offsets
dist =circledist(surfcgrid[j,1],surfcgrid[j,2],surfcgrid[,1],surfcgrid[,2],)
hh=order(dist[!mask],decreasing=FALSE) #order all stations which have offsets
namask=is.na(Yo[i,hh]) #find stations with no data values
hh=hh[!namask][1] #remove stations with no values
stationindexb=which(dist==dist[!mask][hh])[1]
offset[j]=FYo[i,stationindexb]+offset[stationindexb] – FYo[i,j] #calculate offset as a difference between
#filtered anomalies
if(j==4)
{
print(stationindexb)
print(time(Yo)[i])
}
}
}
}
Formatting by WordPress.
The function moves through all surface stations, as new stations are introduced it calculates an offset based on the filtered existing data from the next closest station. It took quite a while to write and debug so I expected interesting results and wasn’t terribly dissapointed to that end. The following plots are from a randomly chosen gridcell #130 of 5509 total. Each color represents a different surface station. Each satellite gridcell is infilled with the closest available surface station data so the introduction of new stations over time results in a complex combination of surface measurements for each point. Actual stations used are enumerated in the legend.
These plots are of 1 of 5509 reconstructed satellite gridcells.
The light blue surface station #24 is consistently below station #64 in the trend. Of course this is completely unnatural and means that my offsets are too noisy. I confirmed that by trying a variety of different filter levels which demonstrated the offsets were better and worse depending on the filter level. The problem is that in order to get values that look reasonable, the filter level had to get ridiculously long to reduce any high frequency noise.
Below is a plot of he same values using a 4 year filter windows.
The four year window results in much better ‘looking’ values but the trend of the reconstruction is dependent on the filter level. For the four year trend, this method calculated — 0.083C/decade. I believe this trend to be about 60%-80% higher than actual from the manned surface station only reconstructions. The same plot without any offsets is in Figure 3. This is still my preferred area weighted reconstruction despite the fact it has offset issues described in the beginning.
The four year filtered area weighted plot looks like this:
Compared Figure 4 to the non-offset version which has a trend of 0.051 C/Decade in Figure 5.
Well, all I can say is another 10 hours spent on a great idea which didn’t pan out. For those of you who are perennially on the wrong side of every issue, several trends I found were in the opposite direction from reality – cooling. The slightly higher trend, of the offset reconstruction is perfectly acceptable and that was the one I presented, however the method just didn’t work.
So with all this, I’ve got a new idea. My next version will take all the available points between this station and the next closest station and regress them to determine the offset. This will use the maximum available information to align the two stations. It’s Sunday though and I’ve spent four hours this morning and yesterday on the Antarctic trend and I’m going to take a break. BTW, We’re not even allowed to visit anymore due to the possibility of environmental damage (what a pile of bull). The Antarctic is huge in area and impossible to damage through a bit of tourism. Not that I wanted to go anyway but it would have been fun for a short time.







curious said
Apologies if this is a bit OT but seeing fig4 reminded me of this:
http://vulcan.wr.usgs.gov/Volcanoes/Antarctica/Maps/map_antarctica_volcanoes.html
As far as the best fit stuff goes I do wonder about doing an incremental correlation exercise of the coastal stations along the Antarctic circle to nearest neighbours. This could be done on an arc basis – they share the same latitude and AFAIK the same climatic influences. This could possibly help set an upper distance limit to correlation for the coast stations. A similar technique could be used for internal stations by choosing subsets. This could help choose appropriate means to use for new series. I’m typing this without going back to older posts and I think some of this was considered/covered (inc. something at CA) so apologies if it is not relevant or already covered.
Re: visiting – I wondered if it could be to stop us all being so shocked by the glaciers in retreat (!) – and will it also mean an end to individuals doing sponsored or research trips?
Ryan O said
Jeff,
You might also be able to simply extract the satellite data at station locations, take the covariance, and use that to establish the offset to the nearest station.
Jeff Id said
#2, Ryan, I was hoping to do it independently of the sat info because it should give independent confirmation of your work. The no offset version isn’t far from real as verified by the manned only trends. My impression is the same as before, that the change from whatever we do should be minimal from the ground only data. It’s comforting to know that the ground version verifies the rest, both in pattern and trend. I’d like to try the ground only covariance regression first, if that gives odd results I’ll move to the satellite data.
Kenneth Fritsch said
Good stuff Jeff ID, and I hope you do not judge the interest in a topic by the number of responding posts. A post introduction with some meat in it takes a while to simply digest, whereas a post on posting at RC can garner immediate responses from the many who have experienced rejection.
I hope that you do your analysis with and without the satellite data.
Jeff Id said
#Kenneth and Ryan,
I finished the calculation using only the overlapping data from the next closest station and got some reasonable results. I’ll post it this evening.
Ryan O said
Ah, cool.
In a completely unrelated note, I sent you an email. Finally got a perfect replication of Steig. Nothing earth-shattering, but will require a few changes to the script.