## Significance in Global Trend Metrics

Posted by Jeff Id on October 13, 2009

This is something I’ve wanted to explore for some time. Everywhere I’ve looked, the difference between HadCRUT and GISS is described as insignificant and rarely discussed beyond that. However, the significance of a trend is usually judged against the short-term variance in the data, which is dismissed as ‘weather noise’. Weather noise in this case is not instrument or methodological uncertainty: the confidence interval of the slope expresses the probability that the trend was created by deviations from a perfectly linear trend, and those deviations are primarily weather-related shifts in global temperature. The point of this post is to look a little deeper and examine only the instrument and methodological errors, to determine whether the difference between GISS and HadCRUT is statistically significant.

This post plots data since 1950, with trends in Degrees C/Decade, and applies autocorrelation corrections based on the residuals of the linear fit. I’ve done this for only this time period and encourage readers to download R and try their own time windows. Confidence intervals and plots are calculated using code from Ryan O’s Antarctic work.

The HadCRUT trend is 0.1249 +/- 0.0231 C/Decade at the 95% confidence interval. This confidence interval is calculated from the residuals of the linear fit, corrected for autocorrelation. Think of the confidence window as the change in trend that typical deviations from linear could create.
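This calculation can be sketched in a few lines. The post’s actual code (below) is in R; the following is a minimal Python stand-in using a synthetic AR(1) series, so the trend, noise levels, and AR coefficient here are illustrative assumptions, not real HadCRUT values:

```python
import numpy as np

def trend_ci_ar1(y, freq=12):
    """OLS trend and two-tailed 95% CI, both per decade, widened with
    the Quenouille (1 - r1)/(1 + r1) effective-sample-size adjustment."""
    n = len(y)
    t = np.arange(1, n + 1) / freq                 # time in years
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    sse = np.sum(resid**2) / (n - 2)               # residual variance
    se = np.sqrt(sse / np.sum((t - t.mean())**2))  # OLS standard error of slope
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]  # lag-1 autocorrelation
    q = (1 - r1) / (1 + r1)                        # Quenouille adjustment
    ci95 = 1.96 * se / np.sqrt(q)                  # widened 95% CI
    return slope * 10, ci95 * 10                   # per decade

# Synthetic stand-in: ~0.12 C/decade trend plus AR(1) "weather noise"
rng = np.random.default_rng(0)
n = 600                                            # 50 years of monthly data
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + rng.normal(0, 0.1)
y = 0.012 * (np.arange(n) / 12) + noise
trend, ci = trend_ci_ar1(y)
print(round(trend, 3), "+/-", round(ci, 3), "C/decade")
```

The key point is the last division by sqrt(q): positive lag-1 autocorrelation shrinks the effective number of independent samples, so the interval is widened accordingly.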

The GISS trend is 0.1175 +/- 0.0191 C/Decade, slightly lower than HadCRUT. The net difference in trend is 0.0074 C/Decade, which is certainly small, but remember that we have thousands of stations measuring the same thing over 50 years; the question is how well these different methods agree.

The way to test this is simply to subtract the two metrics and plot the difference with its own confidence intervals. If the slope of the difference is greater than the confidence interval, we have a statistically significant difference between the metrics, meaning the slope difference is created by something other than measurement noise.
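Why this works can be sketched with synthetic series (all numbers here are assumed for illustration, not taken from the real metrics): two hypothetical metrics share the same weather signal but apply slightly different trends; subtracting them cancels the shared weather noise, so the small trend difference is judged against a much smaller residual scatter.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
t = np.arange(n) / 12.0                          # time in years
weather = np.cumsum(rng.normal(0, 0.03, n))      # shared "weather" wander
meas_a = rng.normal(0, 0.02, n)                  # independent measurement noise
meas_b = rng.normal(0, 0.02, n)
metric_a = 0.0125 * t + weather + meas_a         # hypothetical metric A
metric_b = 0.0118 * t + weather + meas_b         # hypothetical metric B

# Trends in C/decade
slope_a = np.polyfit(t, metric_a, 1)[0] * 10
slope_diff = np.polyfit(t, metric_a - metric_b, 1)[0] * 10

# Residual scatter around each linear fit
resid_a = metric_a - np.polyval(np.polyfit(t, metric_a, 1), t)
diff = metric_a - metric_b
resid_d = diff - np.polyval(np.polyfit(t, diff, 1), t)

print("difference trend:", round(slope_diff, 4), "C/decade")
print("noise reduction factor:", round(resid_a.std() / resid_d.std(), 1))
```

The difference trend is tiny compared with either metric’s trend, but the residuals of the difference are far smaller still, which is why a small trend difference can nonetheless be significant.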

There you go: the trend of the difference is well outside the 95% confidence interval. There is a statistically significant difference between HadCRUT and GISS. If we had the data and code for HadCRUT, perhaps we could figure out why.

Now my opinion, like many others’, is that the total trend of both GISS and HadCRUT is likely way out of whack. The corrections to GISS are too large and their magnitude is not well justified. As with tree ring proxies, little verification has been done, and some papers estimate a 30% to 50% exaggeration of trend. This post demonstrates that the difference between ground station networks is mathematically significant relative to the instrumental quality of the data, and should be addressed.

The turnkey code for this post is below. This code includes downloaders for GISS, HadCRUT, RSS and UAH for those who want to mess around.

```r
# Ryan O's plotting functions with a few tweaks

ssq = function(x) {sum(x^2)}

plt.avg = function(dat, st=1957, en=c(2006, 12), y.pos=NA, x.pos=NA, main.t="Untitled") {

  ### Get trend
  fm = lm(window(dat, st, en) ~ I(time(window(dat, st, en))))

  ### Initialize variables
  N = length(window(dat, st, en))
  I = seq(1:N) / frequency(dat)
  rmI = ssq(I - mean(I))

  ### Calculate sum of squared errors
  SSE = ssq(fm[[2]]) / (N - 2)

  ### Calculate OLS standard errors
  seOLS = sqrt(SSE / rmI) * 10

  ### Calculate two-tailed 95% CI
  ci95 = seOLS * 1.964

  ### Get lag-1 ACF
  resids = residuals(lm(window(dat, st, en) ~ seq(1:N)))
  r1 = acf(resids, lag.max=1, plot=FALSE)$acf[[2]]

  ### Calculate CIs with Quenouille (Santer) adjustment for autocorrelation
  Q = (1 - r1) / (1 + r1)
  ciQ = ci95 / sqrt(Q)

  ### Plot data
  plot(window(dat, st, en), main=main.t, ylab="Anomaly (Deg C)")

  ### Add trendline and x-axis
  abline(h=0, col=2); abline(fm, col=4, lwd=2)

  ### Annotate text
  text(x=ifelse(is.na(x.pos), st, x.pos),
       y=ifelse(is.na(y.pos), max(dat, na.rm=TRUE) - 0.2 * max(dat, na.rm=TRUE), y.pos),
       paste("Slope: ", round(fm$coef[2] * 10, 4), "+/-", round(ciQ, 4),
             "Deg C/Decade\n(corrected for AR(1) serial correlation)\nLag-1 value:", round(r1, 3)),
       cex=.8, col=4, pos=4)
}

################################## GISS
loc = "http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt"
wd = c(7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
giss = read.fwf(loc, skip=9, widths=wd)
giss = giss[, -1]
giss = giss[-(143:154), ]   # drop repeated header rows in the GISS table
giss = giss[-(132:133), ]
giss = giss[-(110:111), ]
giss = giss[-(88:89), ]
giss = giss[-(66:67), ]
giss = giss[-(44:45), ]
giss = giss[-(22:23), ]
giss = t(giss)
giss = as.numeric(giss)
giss = ts(giss / 100, start=1880, deltat=1/12)
plot(giss)

################################## HadCRUT
loc = "http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/monthly"
wd = c(8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7)
hadc = read.fwf(loc, skip=0, widths=wd)
hadcrut = hadc[, 2]
hadcrut = ts(hadcrut, start=1850, deltat=1/12)
plot(hadcrut)
lines(giss, col="red")
lines(hadcrut - giss)

################################## RSS
loc = "ftp://ftp.ssmi.com/msu/monthly_time_series/rss_monthly_msu_amsu_channel_tlt_anomalies_land_and_ocean_v03_2.txt"
wd = c(10, 9, 9, 9, 9, 9, 9, 9)
RSSa = read.fwf(loc, skip=3, widths=wd)
RSS = ts(as.numeric(as.vector(unlist(RSSa[, 2]))), start=1979, deltat=1/12)

################################## UAH
loc = "http://vortex.nsstc.uah.edu/public/msu/t2lt/tltglhmam_5.2"
wd = c(13, 8, 8, 8, 8)
UAHa = read.fwf(loc, skip=5, widths=wd)
UAHa = UAHa[-(371:372), ]
UAH = ts(as.numeric(as.vector(unlist(UAHa[, 2]))), start=c(1978, 12), deltat=1/12)

################################## Trends and plots
temp = ts.union(hadcrut, giss, RSS, UAH)
gnddiff = giss - hadcrut
plot(gnddiff)

hh = window(hadcrut, start=1950)
gg = window(giss, start=1950)
coef(lsfit(time(gg), gg))
coef(lsfit(time(hh), hh))

plt.avg(hh, main="HadCrut Trend Since 1950")
#savePlot(paste("C:/agw/hadcrut 1950 -.jpg"), type="jpg")
plt.avg(gg, main="Giss Trend Since 1950")
#savePlot(paste("C:/agw/giss 1950 -.jpg"), type="jpg")
plt.avg(hh - gg, main="HadCrut Minus Giss Since 1950")
#savePlot(paste("C:/agw/hadcrut-giss 1950 -.jpg"), type="jpg")
```

## Aslak said

Interesting. You can reject your null hypothesis, which is that the errors have the assumed AR(1) relationship. To me, it looks as if the residuals in the 3rd plot have a residual ENSO pattern, which suggests that the noise model should also have a spectral peak at ENSO frequencies.

Some ideas:

* Remove ENSO and repeat the analysis (like here: http://www.atmos.colostate.edu/~davet/ThompsonWallaceJonesKennedy/).

* Pre-stack the data into ~5 year bins to dampen ENSO effects. Perhaps this will make AR(1) a better noise model.
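The binning suggestion can be sketched with a few lines of Python on synthetic data (the 5-year bin width, the ~3.5-year oscillation, and the noise levels are all assumptions for illustration): averaging monthly anomalies into non-overlapping 5-year blocks damps sub-5-year variability such as ENSO while preserving the long-term trend.

```python
import numpy as np

def bin_means(y, months_per_bin=60):
    """Average a monthly series into non-overlapping bins, dropping any ragged tail."""
    n = (len(y) // months_per_bin) * months_per_bin
    return y[:n].reshape(-1, months_per_bin).mean(axis=1)

rng = np.random.default_rng(2)
months = np.arange(600)                             # 50 years of monthly data
enso_like = 0.2 * np.sin(2 * np.pi * months / 42)   # ~3.5-year oscillation
y = 0.001 * months + enso_like + rng.normal(0, 0.1, 600)
binned = bin_means(y)                               # ten 5-year means
print(len(binned), round(binned.std(), 3), round(y.std(), 3))
```

The binned series keeps the trend but shows much less scatter around it, since each bin spans more than a full cycle of the oscillation and averages most of it away.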

## Jeff Id said

#1, Thanks Doc. I see what you’re looking at and will give it some thought as to how to work it.

## Charlie said

The label on your Hadcrut minus Giss graph has an extra zero. The difference in trend of 0.074/decade is significant. 0.0074/decade isn’t.

## The Diatribe Guy said

#1 – Excluding ENSO is a pet suggestion from many people who propose improvements to temperature trending. But ENSO isn’t the only obvious contributor to temperature. AMO, PDO, and ENSO are all major drivers. But it doesn’t end there… there are numerous smaller oceanic oscillations. Some of these lesser oscillations seem to indicate a cyclical pattern that can help drive temperature, while others seem to react to temperature changes and are not drivers. And then, of course, there is the question of how Carbon Dioxide levels actually do affect temperature changes. But let’s not forget about solar cycles while we’re at it…

Now, let’s take all those things and determine their respective impacts. But how? Well, since PDO and ENSO are positively correlated, and one may well help drive the other, we probably have a cross-bias there, and that will need to be quantified. Which probably means there is a cross-bias to the EPO as well. Which probably means….

And then there’s the idea that solar cycles may actually put some or all of these cycles in motion. And then we know that Carbon Dioxide is released or absorbed from ocean waters based on the temperature of those waters, so there’s probably some cross-correlation between solar cycles, ocean cycles, and CO2.

And then, after all this stuff, one has to figure out any positive and negative feedback effects that result from them all before a new equilibrium is established.

Gosh… I haven’t mentioned cloud cover yet. Or wind. Huh.

The point is this: people often point out trends and then say “Yeah, but take out ENSO and see what happens!” While it may be an interesting exercise, anyone who thinks this really gets you much of anywhere is kidding themselves.

Truly, one must be able to identify ALL the elements of causation and perform a simultaneous/recursive analysis on them to eliminate all cross-bias. This means taking ALL the data sets of ALL the oscillations, solar cycles, CO2 levels, and everything else that might matter, and simultaneously optimizing not just each month’s data but also the lag period within each set (e.g., I’ve estimated the likely lag time in sunspot measures to be 18 months following the most recent annual average; El Nino effects are estimated to occur 4-6 months later). The true lags would need to be determined within this simultaneous approach, which means that you now have to test sets of data within each data set.

All of this is theoretically possible, and I’ve been trying to figure out a way to proxy it most effectively. But there are two main issues, which loom quite large: (1) the data needs to be good: each individual set of data needs to be accurate, and the ultimate temperature measurement needs to be accurate, and we all know that there are issues on both sides of that equation; and (2) we need a boatload of data. A recursive analysis of this sort requires a lot of data for each additional item you throw into the analysis. And even if you throw out some of the minor ocean oscillations, you’re still left with no fewer than 5 main parameters to look at. To really measure those effects well, we probably need a minimum of a couple centuries of really good data.

Given these limitations, it makes little sense to exclude ENSO only. One should either do their best to quantify all such effects – at least on the major items – and exclude all of them, or simply exclude nothing. Another reason for this is the obvious step-elevation in temperature after the 1998/99 event. It seems apparent that the engine of a strong El Nino warmed us considerably, and the greenhouse nature of our atmosphere (regardless of increased CO2 levels, mind you) retained that for a period of time. It takes a while for things to cool off, so how do you remove the ENSO effect without also adjusting for the CO2 or greenhouse effect? It’s quite difficult.

Finally, even if one wished to do so, without doing the simultaneous correlation analysis one doesn’t truly know what the lag of the ENSO effect is. We have decent estimates, but even the lag may differ depending on whether we’re talking about an El Nino or a La Nina, since we know that global temps can be warmed up more quickly than they cool off.

Sorry for the long post, Jeff. I’ve been struggling with this for some time.

## John F. Pittman said

If ENSO is in the residuals, then that would imply that GISS is somewhat ENSO corrected? I would assume it is more likely the difference is in how they handle ocean surface temperatures, or perhaps the GISS way of handling coastal cities lessens the effect. Perhaps a comparison of the magnitude of ENSO with the residuals will give a clue.

## The Diatribe Guy said

I’m an idiot. I misunderstood that he was referring to the residuals, and not the overall temperature data.

Well, my comment stands for those who think it’s a good idea to remove ENSO from temperature data, which I’ve seen suggested before. But I apologize for the rant, given that it was misapplied.

## Jeff Id said

#3 I’m not sure what you mean. 0.0074 +/- 0.0053 means the trend is outside the 95% confidence.

## Jeff Id said

#6, It’s actually a good suggestion I think. However the datasource he’s linked to has already had some operations done to it so I’d like to go back to the source data and try. If the main difference between the two sets is created by ENSO then the trend correction schemes might be pretty similar and the difference could be due to things like station location and density. There’s always something else to try.

BTW, thanks for the links at your site. That’s where I got the links used in the code above. I actually thought about putting together your old style trend graphs in R and sending it to you so that you could push a button and create your posts. It wouldn’t take much time now that the download part is complete.

## Kenneth Fritsch said

Jeff ID, you are taking the path that I have always advocated when attempting to determine how well various temperature series compare and how well they indirectly capture global and regional temperatures.

The keepers of these records tend to imply that there will be no statistically significant differences from series to series. Differences between series imply that the series are not as accurate as they are portrayed. One does have to be careful though in that the finding of statistical differences can depend on the period used in the comparison.

As you say, a statistical difference does not imply that one series is correct and the other not – they can both be incorrect and different. I did a statistical comparison of the USHCN series for the US for the period 1920-present using versions 1 and 2 and found a statistical difference that was not small – as I recall. In effect we have a body that implied that Version 1 was an accurate representation of US temperatures and then does a Version 2 that is significantly different at least over some time period. When the individual USHCN stations are compared using the 2 versions some very large differences can be measured and for a surprisingly large number of stations.

The keepers of the USHCN series did a paper on the differences between Versions 1 and 2, but left most of the observations, that I made above, out of it.

## The Diatribe Guy said

#8 – Thanks, Jeff. Actually, bringing the data into my spreadsheets is pretty automatic. The time consuming part is looking through everything and picking out narrative points, and then actually typing up the post. Often enough I actually have the charts done well before I have time to make the post.

Lately I’ve been looking more at the Ocean Oscillations, and that takes a bit more time while I build it for each data set.

But don’t take that to mean I’m turning down any assistance if you believe there’s a way to make life a bit easier, that isn’t causing work or headache on your part.

## Andrew said

If I had to guess, I’d say this is GISS’s Arctic extrapolation, mostly:

http://data.giss.nasa.gov/gistemp/graphs/ArcticEffect.pdf

## Jeff Id said

#9, I will be adding the intervals that you’ve requested to the sea ice posts soon as well.

## Charlie said

RE my #3, your #7. Please ignore my post. Neither my brain nor my calculator was working very well this morning.

## Raven said

I have always found the idea of ‘correcting for ENSO’ a bit odd, because ENSO is simply a temperature measurement, so we should expect to see an ENSO signal in a GMST estimate. If the ENSO signal shows up in the difference between HadCRUT and GISS, then that means the processing in one of the datasets has smoothed the data to eliminate the ENSO correlation. SteveMc has documented how GISS eliminates UHI by smearing the UHI across all cells, so it is possible that this algorithm also smooths the signals in time.

## Chad said

Hey Jeff. I see this post is rather old, but I’d like to put in my two cents. I’m working on making the best apples-to-apples comparison of the surface temperature records. I’ve been struggling with how to deal with gridded data that has varying coverage. HadCrut is a mess. GISS (1200 km version) is excellent in this respect because it has virtually total coverage since about 1960. The problem is, if you’re going to change the base period of one dataset to match the one you’re comparing it to, you’re making the implicit assumption that the area coverage hasn’t changed over the min/max extent of both time periods. But it has.

Here’s what I’ve done:

* Get the GISS and HadCRUT gridded data.
* Calculate the GISS monthly mean anomaly over 1961-1990 to match HadCRUT’s reference period. If missing values are present, let the mean become NA.
* Subtract the mean monthly gridded anomaly from the entire data set.
* Interpolate this new data to match HadCRUT’s grid map.
* Subtract one series from the other: delta <- GISS – HADCRUT.
* Use is.na(delta) to determine the common missing grid points.
* Mask these grid points in both data sets.
* Compute the global average for both.

Now you have two data sets that are on the same base period and have exactly the same temporally-varying area coverage. I then calculated HadCRUT – GISS as you did above. The magnitude of the residuals is greatly reduced compared to just calculating HadCRUT – GISS without the steps I outlined. This is because calculating delta the simple way differences noise from different grid boxes, whereas the more complicated method differences the noise on the same set of grid boxes at each time step, and since the noise is of similar magnitude, it cancels out more. Anyway, I found a much larger difference between the two data sets than you found: 0.0172 ± 0.0024 °C/decade (2 SE), with the lag-1 value being 0.423. I’ll probably have a post up later tonight revealing my results. I should also note that I did not calculate the global average the way Hadley does. They calculate NH and SH and average the two, which seems weird. I calculated the GISS and HadCRUT global averages the simpler, more straightforward way, for consistency.
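The common-mask step described above can be illustrated with a toy numpy example (the 2x2 grids and the anomaly values are made up for illustration; the re-baselining and interpolation steps are omitted): subtracting the grids yields NaN wherever either data set is missing, and that NaN pattern is then used to restrict both averages to identical coverage.

```python
import numpy as np

# Hypothetical one-month 2x2 anomaly grids with different missing cells
giss = np.array([[0.3, 0.1], [np.nan, 0.2]])
hadcrut = np.array([[0.25, np.nan], [0.4, 0.15]])

delta = giss - hadcrut            # NaN wherever either grid is missing
common = ~np.isnan(delta)         # mask of cells present in BOTH data sets

giss_avg = giss[common].mean()    # both averages use identical coverage
had_avg = hadcrut[common].mean()
print(round(giss_avg - had_avg, 3))   # prints 0.05
```

Because the two averages are taken over exactly the same cells, their difference reflects only disagreement between the data sets, not differences in spatial coverage.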