Thermal Hammer

Ok, today is the day that tAV, the repeatedly alleged EVIL denialist blog, lucky unwitting recipient of the biggest scientific scandal of the last hundred years, and host to a proprietor who has been snipped from every advocate website, will now present a global gridded temperature series from the RAW-ish GHCN data, one having a higher trend than the believers will publish.  It is, however, to my knowledge a more correct representation of the actual data.  To perform this feat of magic, we’ll use Roman’s hammer, a method so popular it has been immortalized in a t-shirt.

Roman’s hammer is a simple method for combining temperature time series. Remember, different thermometers can differ in two primary ways: an offset in temperature (due to altitude, proximity to water, etc.) and a different level of seasonal variance (proximity to water, dryness of the air, and so on). The “hammer” method takes care of both, offsetting values by season to provide the best match between series. A second and independent improvement from Roman’s recent work means that we no longer need to calculate anomaly in order to solve for global temperature trend.

The steps are simple.

– load data
– sort temperature series into their own 5×5 degree gridcells, 72 lon by 36 lat
– create individual series from the GHCN inventory file by simple averaging of station IDs with multiple series representing the same instrument.
– gather all series for each gridcell and hammer them together.
– average NH and SH individually and combine (not sure if this makes sense, but it copies HadCrut).

I’ve changed the renowned “getstation” algorithm for this post; it has a new argument called “version”.  Since multiple instruments sometimes share the same WMO number, and are mixed in with multiple copies of data from the same instrument (how sloppy is that?!), an additional digit is used to tell them apart. Considering the crew, team, or whatever wants literally trillions of dollars to limit CO2, they can certainly spend the small amount of time required to document and record “actual” temperature stations by their own IDs.   In this post, the algorithm sorts out which series are nearby stations versus copies of the same instrument.  The duplicate series are placed in columns, averaged with a row mean, and returned as a single output timeseries.

I’ll put the whole code at a different link but here is the getsinglestation function:

getsinglestation=function(staid=60360, version=0)
{
	staraw=NA
	#raw data: select rows for this WMO station id and version (location modifier)
	smask= (ghmean[,2]==staid)
	mask= ghmean[smask,3]==version
	data=ghmean[smask,][mask,]
	noser= levels(factor(data[,4]))   #duplicate numbers present for this instrument

	for(j in noser)
	{
		mask2=data[,4]==j                              #rows belonging to this duplicate
		startyear=min(data[mask2,5])                   #column 5 holds the year
		endyear=max(data[mask2,5])
		subd=data[mask2,6:17]                          #columns 6:17 hold the 12 monthly values
		index=(data[mask2,5]-startyear)+1              #year position within the series
		dat=array(NA,dim=c(12,(endyear-startyear)+1))  #12 months by number of years
		for(k in 1:length(index))
		{
			dat[,index[k]]=as.numeric(subd[k,])
		}
		dim(dat)=c(length(dat),1)

		dat[dat==-9999]=NA                       #GHCN missing value flag
		dat=dat/10                               #values stored as tenths of a degree C
		rawd=ts(dat,start=startyear,deltat=1/12)
		if(max(time(rawd))>=2011)
		{
			print ("error series")
			rawd=NA
		}

		if(!is.ts(staraw))
		{
			staraw=rawd
		}else{
			staraw=ts.union(staraw,rawd)
		}
	}
	if(!is.null(ncol(staraw)))   #more than one duplicate series: average across columns
	{
		allraw=ts(rowMeans(staraw,na.rm=TRUE),start=time(staraw)[1],freq=12)
	}else{
		allraw=staraw
	}

	allraw
}
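
A quick usage sketch (assuming ghmean has already been loaded from v2.mean, as in the full script): pull one station and confirm it comes back as a single monthly time series.

sta = getsinglestation(60360, 0)        #the default station id and version shown above
is.ts(sta)                              #should be TRUE
frequency(sta)                          #12 samples per year
plot(sta, main="GHCN raw, WMO 60360 version 0")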

The main loop of the software is worth discussing a bit. In the code below, tinv holds the inventory values for each temp station. It contains lat, lon, and a variety of station metadata. Before the loop, an array is created recording which gridcell each station is assigned to.

##assign stations to gridcells
gridval=array(NA,dim=c(stationnum,2))
gridval[,1]=as.integer((tinv[,5]+90)/5)+1  #lat
gridval[,2]=as.integer((tinv[,6]+180)/5)+1  #lon
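
A sanity check worth running at this point (just a sketch, assuming tinv columns 5 and 6 hold latitude and longitude as used above): every station should land in a latitude row from 1 to 36 and a longitude column from 1 to 72.

range(gridval[,1], na.rm=TRUE)          #expect values within 1..36
range(gridval[,2], na.rm=TRUE)          #expect values within 1..72
sum(is.na(gridval))                     #stations with missing coordinates, if any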

Over the last couple of months the algorithm has kept getting simpler, which of course makes the engineer in me happier with each revision.   In the loop below, “ii” is the gridded longitude index in 5 degree increments and “jj” is the latitude index, also in 5 degree increments.

I’ve added a number of comments to the code for this post.

gridtem=array(NA,dim=c(36,72,211*12))

for(ii in 1:72)
{
	for(jj in 1:36)
	{
		maska=gridval[,2]==ii  #mask all stations where longitude doesn't match
		maskb=gridval[,1]==jj  #mask all stations where latitude doesn't match
		maskc= maska & maskb   #combine lon and lat masks to get all stations where both match
		sta=NA                 ##initialize to non TS
		for(i in (1:stationnum)[maskc])
		{
			rawsta = getsinglestation(tinv[i,2],tinv[i,3])

			if(!is.ts(sta)) #add stations to multicolumn timeseries
			{
				sta=rawsta            #first station
			}else{
				sta=ts.union(sta,rawsta)  #other columns (the union must be assigned back to sta)
			}
		}

		if(is.ts(sta))           #if at least one time series exists
		{

			sta=window(sta,start=1800)	                #trim any pre-1800 data
			index=as.integer(((time(sta)-1800)*12+.002)+1)  #calculate time index for insertion into array
			if(!is.null(ncol(sta)))                         #is station more than one column
			{
				gridtem[jj,ii,index]=temp.combine(sta)$temps    #more than one column, use Roman's algorithm
			}else{
				gridtem[jj,ii,index]=sta                        #a single column, just assign the data
			}
			print(jj)                                       #progress debug
			tit=paste("ii=",ii,"jj=",jj)
			plot(ts(gridtem[jj,ii,],start=1800,freq=12),main=tit)   #debug plot
		}
	}
	print ("COLUMN")        #progress debug
	print(ii) 		#progress debug
}

That’s it. Pretty simple: just gather the data into gridcells and, if there is more than one temp station per gridcell, use the Roman Hammer to smack them into place. Ok Jeff, we’ve waited for two weeks, enough scribbling of nonsense, what are the results?

Remember, Roman’s temperature combination allows a different offset for each series and month, yet a single linear trend for the whole dataset.  The offsets are required to align the series with each other and represent a substantial improvement over the typical global instrumental temperature approach.
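
For readers who want to see that idea in miniature, here is a toy sketch of such a regression (my own made-up numbers, not Roman’s actual code): every station gets its own offset for each calendar month, while one linear trend is shared by the whole dataset.

set.seed(2)
df = expand.grid(station=factor(1:3), month=factor(1:12), year=1950:2000)
df$year_frac = df$year + (as.numeric(df$month)-0.5)/12
df$temp = 0.01*(df$year_frac-1950) +                   #shared trend of 0.01 C/year
          c(0,4,-2)[as.numeric(df$station)] +          #per-station offsets
          2*sin(2*pi*as.numeric(df$month)/12) +        #seasonal cycle
          rnorm(nrow(df),0,0.3)                        #noise
fit = lm(temp ~ 0 + interaction(station,month) + year_frac, data=df)
coef(fit)["year_frac"]                                 #recovers roughly 0.01: one trend, many offsets

The station-by-month dummies soak up the level and seasonal differences, so no anomaly step is needed before the trend is estimated.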

The output of the algorithm above is combined into hemispheric averages with the following lines.

rgl=gridtem                                                #move gridded temperature to different variable
dim(rgl)=c(36*72,2532)                                     #restructure grid into 36*72 individual series
rgl=t(rgl)                                                 #transpose so each row is a month and each column a gridcell
rgl=ts(rgl,start=1800,deltat=1/12)                         #make time series
mask=!is.na(colMeans(rgl,na.rm=TRUE))                      #mask marking non-empty gridcell columns
wt=rep(weight,72)                                          #area weighting vector, one latitude weight per gridcell
maskb=array(FALSE,dim=c(36,72))
maskb[1:18,]=TRUE                                          #create mask by hemisphere
dim(maskb)=(36*72)
maskc=mask & maskb                                         #combine hemisphere mask with empty gridcell mask
nh=temp.combine(rgl[,maskc],wt[maskc])$temps               #use Roman's method to combine all gridcell series in the hemisphere
maskc=mask & !maskb                                        #invert hemisphere mask and redo for sh
sh=temp.combine(rgl[,maskc],wt[maskc])$temps

glt=(nh+sh)/2                                              #take global average
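
One note: the weight vector used in wt=rep(weight,72) isn’t defined in this excerpt. A plausible way to build it (my assumption, not necessarily the exact code) is one value per 5 degree latitude band, proportional to the cosine of the band’s centre latitude, which lines up with the 36-latitudes-per-longitude ordering used above.

latcentres = seq(-87.5, 87.5, by=5)                    #centres of the 36 latitude bands, south to north
weight     = cos(latcentres*pi/180)                    #gridcell area shrinks with the cosine of latitude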

It’s pretty sweet how simple things are getting. There is of course still the possibility (and in my case probability) of error, but as the code has been gone over again and again, the differences created by any mistakes are becoming small.

Northern hemisphere trend.


Figure 1 - Northern hemisphere temperatures

I bet you’ve never seen a semi-global temperature plot that looks like that!  It happens to be in Degrees J,  which have equal increments to degrees C but the true value has some offset. Think of it as Degrees C, except that nobody knows where zero really is.

The Southern hemisphere is in Figure 2.

Figure 2 - Southern hemisphere temperatures

Pretty unique looking. It’s a simple matter to combine the two series, yup I used the hammer again.


glt=temp.combine(un)$temps     #"un" presumably holds the union of the nh and sh series; its construction isn't shown in this excerpt
Figure 3, Global temperatures

Ok, so that told us nothing, except that we’ve now been able to calculate global temperature in degrees J.  The variation in actual value seems fairly small.

Trend is what we care about though.

Using Roman’s true anomaly methods, I get the following plot.

Figure 4, Global temperature trend with true slope

This method really is different from the less accurate yet standard anomaly trend of climate science.  Globally, 0.079 C/Decade since 1900.  Of course this is just GHCN data, so the trend could be created by UHI and such.
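
For anyone who wants to pull a decadal slope off the combined series themselves, a plain OLS fit is the quickest check (a sketch only, using the glt series built above; it is not the same calculation as the true-anomaly trend behind the figure).

gw  = window(glt, start=1900)                          #post-1900 portion of the global series
fit = lm(gw ~ time(gw))                                #simple least squares fit
coef(fit)[2]*10                                        #slope converted to degrees per decade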

Below is a corrected anomaly trend.

Figure 4 Global temperature trend by anomaly

Below that is the same plot as Fig 4 with HadCrut overlaid. The anomaly for this post is calculated using my adaptation of Roman’s trend method for anomaly calculation.

Figure 5 - Global temperatures plus HadCrut

Now look at that match.  CRU has slightly too low a historic value WRT  GHCN 1935 and before, but in the alleged anthropogenic global warming era, GHCN has a much higher trend.  I call it Id’s but it’s really primarily Roman’s work which has been shaped into this post.  There are better methods for area combination than gridded and we should play with those in time, but there is a lot of information in the above graph. As a last minute add on, Bob Tisdale has a post which is a fantastic match to this result.

First the obvious, a skeptic, denialist, anti-science blog published a greater trend than Phil Climategate Jones.  What IS up with that?

From climategate, we learned that Phil flinched away from review of at least one technical, math-oriented paper.  My own guess is that he just hadn’t considered an offset method for aligning anomalies, hoping instead that the steps would come out in the wash.   In reality, when there is an upslope in the true signal, the steps created by the starting and stopping of anomaly temperature series always reduce the trend.

We discussed that in this post Anomaly Aversion.
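
A toy demonstration of the step effect (my own simulated numbers, nothing to do with GHCN): two series share the same upward signal but cover different periods; anomalize each over its own record, average them, and the fitted slope comes out well below the truth because of the downward steps at the joins.

set.seed(1)
t    = 1:1000
sig  = 0.01*t                                          #true signal, slope 0.01 per step
a    = ifelse(t<=600, sig+5+rnorm(1000,0,0.5), NA)     #station A, early coverage only
b    = ifelse(t>=400, sig-3+rnorm(1000,0,0.5), NA)     #station B, late coverage only
anomA = a - mean(a, na.rm=TRUE)                        #each series anomalized over its own record
anomB = b - mean(b, na.rm=TRUE)
comb  = rowMeans(cbind(anomA,anomB), na.rm=TRUE)       #simple anomaly average
coef(lm(comb ~ t))[2]                                  #noticeably less than the true 0.01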

The hemispheres tell an interesting story.

Figure 6 - Northern hemisphere temp anomaly
Figure 6 - Southern hemisphere temperature anomaly

Check out the difference between the two hemispheres of GHCN.

Figure 7 - Both hemispheres. Filtered 11 year Gaussian

Note the delta in the most recent data.

There are a lot of details in this next plot which others may not see.  Many of you are more experienced and knowledgeable with respect to temp trend than I, but running the algorithm makes a difference in understanding the quality of the data.  Visually, as each dataset is plotted by anomaly, I see a general upslope in most data.  There are plenty of blue series which go against the trend though, which brings the question of data quality into focus.

Below is a global plot since 1978.  The deep red and blue points are gridcells with such unreasonable temperature trends that they cannot be accepted.  There are far more deep red than deep blue.  So far though, this algorithm is still a bit of a black box beast.  I’ve not explored the reasons for individual extremes in gridcells and we are looking for small temp trends.

Figure 8 - Temperature trends since 1978
Figure 9 - Temperature since 1900

This post has been worked on for literally weeks; I’ve looked at so many plots my head spins.  There is a lot of unwritten detail here.  Consider the amazing evenness in the Russian temp stations, enough that it will be difficult to ever question Siberia as was done prior to climategate.

There are high trends from GHCN, so high in fact that anyone who questions Phil Climategate Jones’ temp trends will need to show some evidence.  Certainly Phil is an ass, but it no longer seems to me that he has ‘directly’ exaggerated temp trends one bit.   Also, the elimination of temp stations has certainly reduced the quality of the data; however, despite E.M. Smith’s work, we don’t know which way the bias runs.   We need the actual data for that.

Several skeptics will dislike this post.  They are wrong, in my humble opinion.   While winning the public “policy” battle outright puts pressure on us for a simple unified message, the data is the data and the math is the math.  We’re stuck with it, and with this result.  In my opinion, it is a better method.  Remember though, nothing in this post discusses the quality of the raw data. I’ve got a lot of information on data quality for the coming days.  In the meantime, consider what would cause such a huge difference in trend between the northern and southern hemispheres.

Anyway, the global temp trend, from this data since 1978 is:

Figure 10, global temperatures as calculated from GHCN

124 thoughts on “Thermal Hammer”

  1. I’d prefer to not see any confirmation of Jones’ work, but my first pass through this leaves me saying “cool”. Let me take a few more passes. In the mean time, congratulations to Roman. I suspect if Jones was right, it was by accident.

  2. Jeff:

    “Since multiple instruments which sometimes have the same WMO number are mixed in with multiple copies of the same instrument (How sloppy is that?!) they use an additional digit.”

    I have a suspicion that they may use 2 additional digits. See my comment 17 on your previous Demonstration of Improved Anomaly Calculation.

    The use of 71892 0 for Vancouver Int includes temperature data for NaniamoA, Ladner, Naniamo Departure Bay and New Westminister and gives a date range of 1937 to 2001

    The use of 71892 4 gives temperature data for an unknown site (it is not in tinv; tinv only contains single digit modifiers – a WS of v2.mean I created in Excel has double digit modifiers) for dates from ~1880 to 1980.

    I don’t know if this makes any difference to the overall result but the WMO numbering system is confusing to say the least.

  3. The simplistic explanation for the difference in trends between the hemispheres is that the southern hemisphere is mostly water, which has a lot higher effective thermal mass than does land.

    I don’t have hard data at hand, but I seem to recall that for up and down bumps in temperature of a decade or so, that it is common for the sea surface temps to vary only about 1/3 that of the land.

    “the data is the data and the math is the math. We”re stuck with it,…. ” —- it may take a while for facts and truth to displace propaganda and false theories, but in the long run they generally do win out.

  4. Just another comment on the WMO #s. I have a copy of the Canadian sites used for someone’s temperature data (CRU?). Think I got it off Warwick Hughes’ site (if that is his correct name). Some of the WMO numbers given in the document do not agree with those in tinv (created by the script you sent me) and some do.

    For example Clinton Point is given in the document as 710530. In tinv it is 719481. Others are correct. Doesn’t give much confidence in whoever’s accuracy.

  5. Jeff:

    On the “loss of stations” affecting the quality of the data, it seems to affect GISS more than Hadcrut due to GISS “infilling”. I did a little look at a nice out of the way grid cell that has only had one station (by location, 6 different records according to GISS) in it, ever: Grid Cell 83N by 63W (Alert, Canada). From the GISS gridded map making page you can make trend and anomaly maps for both 250km and 1200km infill and you can download the data for those maps. According to GHCN, Alert only has data from 1951 to 1990 and that is what GISS uses. When you take the gridded yearly anomaly data for 250km infill you find it matches the Alert station data and that the trend for that station is -.42°C (both 1200 and 250 km infill trends are that).

    Now here is the interesting part: when you go to Environment Canada’s website you find that the Alert station reported data from 1951 to 2005, 14 years longer than GHCN has it. So I took the 1951 to 1990 data, computed 1951-80 baseline anomalies, and plotted it against the GISS adjusted station data and almost got a perfect match.

    This led to looking at what GISS had as the trends for the period 1951-2005 for that grid cell. What I got was a trend of -.32°C for 250 km infill and 1.16°C for 1200km of infill. Notice that when I set the trend for the dates GISS had actual data from that one station, the trends matched between 250 and 1200km infill; once they had no more data from that grid the trends drifted over that 14 year period by almost 1.5°C. I then plotted out the Environment Canada data from 1951 to 2005 (which almost exactly matched the GISS adjusted data up to 1990) and got a trend of .4°C, so both the 1200km and the 250km infill trends do not match the actual data.

    Side note if you run an 1880 to 2009 Trend map (1200km) you will find that the grid cell that Alert is in comes up as having no data since there were no stations within 1200km of it until 1915. So that is the earliest you can go back and get a trend for that Grid Cell, of course all data is infilled from 1915 until 1951 and you see how well that worked for GISS over just 14 years when compared to the Alert Station data for 1991-2005.

  6. Nice, thought provoking stuff.

    And may one say that it’s a refreshing reversion to what has been most valuable about this blog, until its own recent El Nino produced an unseasonably very cold, very dry month….

  7. I’ve suggested before – but probably not here – that it’s not impossible that the Climate Hysterics will prove to be like the notorious Senator McCarthy. That is to say, their evidence is, much of it, bogus but they happen by fluke to be right and that the observations really do suggest a mild bit of warming since the end of the Little Ice Age – just as that old crook McCarthy was right that the Truman and (especially) Roosevelt administrations were riddled with communists. Mind you, that won’t really be clear until something intelligent is done about (1) lousy weather stations, and (2) the Urban Heat Island effect. Still, it would be very droll if those left-wing careerists were really McCarthyites in drag.

  8. A long time lurker sending congratulations and thanks to Jeff (and Roman) for this excellent piece of work. Particular kudos to Jeff for presenting these results in an unbiased manner.

    While presumably just being the start of a long road of investigation for Jeff et al., for me this has already underlined one key point: having independent un-biased analysis of data and conclusions is demonstrably a ‘good thing’ and commensurate with ‘good’ science?

  9. Interesting stuff Jeff. So, temperature has marginally increased since the end of the LIA, who would ever have thought it? 🙂

    The first three graphs back to 1800/1860 included the known warm peaks in that era, although they miss the much warmer peaks in the century before that. The nature of this analysis is that it obviously can’t go back to the MWP or Roman warm period, so we are looking at a small snapshot in time, dominated by an extraordinary event-the LIA.

    I’m curious as to why you then shifted to a 1900 onwards anomaly analysis rather than maintained the broader sweep of history contained in your first three graphs? Data from 1900 is a very short time scale which can’t reflect all the known cycles that appear to operate, let alone the unknown ones.

    Of course none of this says anything about the quality of the data during any of this period nor the effects of UHI nor that many of the stations are recording a different micro climate to the one they started off with, which can’t be catered for in an anomaly system that looks at data rather than the circumstances behind it.

    Look forward to more of this stuff-good work.

    Tonyb

  10. Hi Jeff, I have an unhealthy interest in the 1998 warm peak. If you look at the NH SH graphs in your Fig 7, the main reason 1998 appears high in Fig 8 is because it is high in both hemispheres at the same time. Compare and contrast with say year 2000, or the contrapuntal 2008 period.

    It causes one to question “global warming” when a high or a low is determined by the sum of the hemispheres. It makes one wonder what mechanism can create such strange patterns. It would not seem to be the well-mixed CO2 blocking IR reradiation on a uniform global scale, would it?

    The progress is excellent. When you get raw data free of certain artefacts, we can probably expect to see some flat lining. There are many flat line stations on the globe since 1900. One has to explain why they do not rise at 1 deg C a century or so.

  11. Nicely done Jeff.
    A couple of queries/amplification requests where I think you’re assuming reader familiarity with what you’re doing –

    ‘Load data’ is perhaps glossing over the details a bit? At least ‘GHCN data used was downloaded on day mm/dd/yy’, and possibly what files were used?

    I might be missing something, but if GHCN data is used here, how is it copying Hadcrut in the hemispheric averaging? Does Hadcrut not include ocean temps?

    I’d also suggest explicitly stating that “No GISS type ‘adjustments’ (TOBS,UHI etc) were made to the raw GHCN data during this process”, if that is indeed the case?

    Your summary at the end should make it very clear that this is an attempt at a gridded representation of the input data for GISS, HADCru etc without tweaking or enhancing?

  12. #3, The whole process of WMO numbers is confusing. In tinv I get this list of stations

    2957 403 71892 0 VANCOUVER INT 49.18 -123.17 2 30 U 1169FLxxC O 1 A 2WATER
    2958 403 71892 1 NANAIMO A,BC 49.05 -123.87 3 0 73 S 47HIxxC O 5 A 7WATER
    2959 403 71892 2 LADNER,BC 49.08 -123.07 9 20 U 74FLxxC O 3 x- 9WATER
    2960 403 71892 3 NANAIMO DEPARTURE BAY,BC 49.22 -123.95 8 46 S 47FLxxC O 1 x- 9WATER
    2961 403 71892 4 NEW WESTMINSTER BC CANADA 49.22 -122.93 11 9 59 U 1169FLxxn o-9 x- 9COOL CROPS

    In ghmean there are two relevant columns; it’s too big to copy here. In this example column 3 represents the stations from Vancouver to New Westminster and column 4 represents the number of copies of what is supposed to be the same data. You can see for yourself by setting version to 0 and subtracting the duplicate series for a staid, returned by the getstation function, from each other. Most of the values are the same; however, in this particular example, the series are more different than others. How come the data is different at all? The pros and some readers here probably know, but to me, it’s just another mystery of climate science.

    In your example of Vancouver though there coincidentally happens to be the exact number of copies of the Vancouver data as there are different stations with the same WMO number. Long winded but the algorithm sorts the data according to column 3 and doesn’t, in fact, mix different locations together. There are other stations in the series you can use to prove to yourself that the algorithm is operating correctly.
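
    For anyone who wants to poke at the duplicates directly, a quick sketch (assuming ghmean is loaded as in the script; the year 1975 is just an arbitrary example):

    d = ghmean[ghmean[,2]==71892 & ghmean[,3]==0, ]   #all raw rows for WMO 71892, version 0
    table(d[,4])                                      #which duplicate numbers exist and how many rows each has
    d[d[,5]==1975, 4:17]                              #duplicate number, year, and the 12 monthly values side by side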

  13. I am pretty new to all of this. So, fire away, or better, point me in the right direction. What is the underlying theory of climate that suggests temperature trends are a linear process? And if the theory does not suggest a linear process, why fit the data with a straight line?

  14. #14, you’re right, I got too tired by the end. I have been working on this for a long time now, it’s been through many improvements on what should be a much more simple exercise than I made it. There are several posts here which explain the detail of what you are looking for, in the meantime I need to finish some documentation of this though.

    #13, The values are used as NA. Just gaps in the data. Roman’s method only looks at existing values during the regression to create the offsets.


    Thanks to everyone for the kind words but please feel free to be critical where you can, that’s what helps it improve. For instance, I spent 20 minutes re-verifying the station version due to the comment from #3, although it seems to be working correctly, it was not time wasted.

  15. Jeff and Roman,

    Well done! I suspect that you’ll soon be getting a phone call from the good Dr Phil (maybe even from Reto?) asking if it will be OK to use your algorithm instead of the dodgy CRU one.

    Like Tonyb, I’m truly amazed (not really!) to find out that all this time we’ve been getting all worked up and worried about what is essentially a minor increase in temperature as a result of recovering from the nadir of the LIA.

    As always the key question to ask and continually repeat is ‘was the late 20th century warming period ‘exceptional’ in any way – was it ‘unprecedented’ on a multi-centennial timescale?’. Clearly it wasn’t, so why are we so worried about it? Why are the IPCC scientists 90% confident that it was ‘unprecedented’ within the last 1000 years and was (according to the consensus of 2,500+ scientists) due to man’s continued reliance on burning fossil fuels?

  16. #1, In previous posts, I had some trouble figuring out the best way to combine copies of what was supposed to be the same data. In some versions I treated it as different instruments and just averaged the whole pile together under 1 wmo number, in other versions, I created a single series from each instrument and then averaged the entire wmo number together. This version just assembles all the series from an individual instrument and returns that.

  17. OK, for someone with a pea sized brain, help me out. I can follow what you are doing, except for the “coding”. Are you saying that you have duplicated their process and it matches their results? (without dealing with the data quality issue)

    Bottom line, what does it mean?

  18. Interesting work. But as much as I appreciate your work, I still think the real problem is in the data: The problem with GHCN is that far too many of the still reporting stations are busy airports. I downloaded the v2.mean file and grepped out the Norwegian stations that reported in 2009. There were only 15 of them, so I quickly looked up the graphs with the GISS station data tool (many data series had large gaps, don’t know if they are in the raw ghcn data but http://www.rimfrost.no has more complete records), and as far as I can tell, only the airport stations show anything that could be taken for an “obvious” AGW (or should I say ALW, because it’s anthropogenic for sure :-)) “signal”. Most of the stations that aren’t airports, are close to the sea, and the northernmost of these could be used as a proxy for the AMO – they have a fine “camel back” with bumps in the 30s-40s and in the recent years.

    Btw. one of the longest running stations in GHCN, Torungen fyr, (Torungen light house), holds the record for the warmest summer month ever in that part of the country. It was set in 1901 (as were many of the other regional monthly heat records).

  19. #20 and 22, I think that is what we can take away from this. The math of this post is different and intended to be as accurate as possible for a gridded temperature trend. Sometimes intent doesn’t give a good result though. However, using this different math on the same data produced a trend which is similar to existing series. That means that any large bias in the trend would have to come from the raw data only. From the result, I’m convinced that there is no “hide the decline” in CRU’s math, which was a reasonable question to ask after climategate.

    A better comparison of this vs crutem is coming soon though.

  20. Well, us deniers never said there was NO warming…we are, after all, in an interglacial warming process. Congrats for the work; real temperatures always were imho a better scientific approach.

  21. Great analysis method for the data. Really really good work.

    Now begins the truly hard work of cleaning up the underlying data (adjusting for UHI growth over time, bad station siting, etc). Until that is complete, any pronouncement of trend based on this data is seriously premature.

  22. Jeff, correct me if I’m wrong, but I feel the real benefit from this exercise will be to give sharper resolution to the raw data – less error, therefore less blurred (smoothed) by noise. It will be easier to distinguish the true data problems and homogeneity issues and whether the subsequent adjustments (done by detection algorithms I believe) have any problems.

  23. #11 Tony,

    I shifted to 1900 because there are so few stations prior to that time that the data doesn’t really represent global trend anymore. If we were to create a trend from that time to present with a more balanced number (use less stations in recent time) it would be a more fair result, however the uncertainty would be large. Maybe someone else (like Roman) has an idea on how to best handle this.

  24. Re: Jeff Id (Mar 25 12:20),

    I think that for the moment, starting in 1900 is probably a good idea. The data is very sparse before that time and probably not the most reliable. Any error bounds would be pretty big. I haven’t finished a script for the calculation of appropriate standard errors for the combined series yet, but it’s coming.

  25. Thanks for the reply Jeff.

    As you know, I have increasingly come to believe from writing my articles that Giss was deliberately started in 1880 precisely because it was a date that was reflecting a cold trend. Surely it also over allowed for US stations because that is when the US weather service really got going. The trend from a cold period in 1900 will be more pronounced than one that takes into account warm and cold ones for a further century

    Consequently the data is not comparing like for like, but absolutely appreciate your reasons for doing what you did.

    The 1700’s in particular are very interesting for their warming trend and it would be good if Roman-or anyone else- could factor that in separately on a like for like basis to see what we get, but appreciate the sparse nature of data makes this impracticable.

    As I say, we really shouldn’t get too excited because temperatures have risen from the last shout of the LIA, especially as they have been on a gently warming trend for around 400 years. By ‘we’ I mean the IPCC, Individual Governments etc etc. We should be thankful we are living in a relatively benign period. All this notion of stopping our civilisation in its tracks to reduce carbon or-even worse-introduce geo engineering- is extreme foolishness.

    What we need are more historians on climate projects in order to provide more balance and context. I suspect you’d rather be working with Bettany Hughes (currently gracing British Tv screens) than me though 🙂

    http://www.bettanyhughes.co.uk/news.htm

    Tonyb

  26. Jeff,

    “Remember though, nothing in this post discusses the quality of the raw data”

    Exactly. And, frankly, nothing else matters. Which is why I don’t find any value in these correction/averaging/anomalizing schemes.

    The only meaningful exercise is to identify long lived quality data stations.

    Luv the site, enjoy your posts. I just see this differently, I guess.

  27. #35, I don’t think we disagree at all. What this post demonstrates though is that the previously secret code and short series are probably not responsible for the trends we see. It’s actually in the data as we have it, before this particular post, I wasn’t really sure.

  28. All you have here is proof that they did not clean up the data, if you actually have RAW data.
    Take out the UHI and bingo, you may find a slight cooling.
    This method does a much better job and is transparent.
    Thank You Men!
    Tim L

  29. Now that’s funny. Directing people to my blog and tell them not to read my post. Aren’t people allowed to see some graphs of global average temperature? What are you afraid of?

  30. #40
    Hi Bart,
    Reading your post and the comments on a pc is hard on my eyes. I wrote the whole thing to a pdf, converted to mobi, now it’s on my Kindle, which seems kinder to my eyes. Also, it can be read in bed – which is where I tend to read the things I have the most difficulty understanding.

    some of the threads that have been at AirVent, Lucia’s, and your site, seem (with some editing) publishable.

    Jeff,
    WRT E.M.’s investigations of the data, do you suppose there might be some less tedious way to flag questionable station time series than looking at them one by one?

    I suppose this would inevitably mean that you would have to make assumptions about what a series ought to look like and then compare each one with that assumption – possibly cherry picking.

    On the other hand, when the “raw” data dance in unison (to a new step) in 1971 +/- and again in 1991 +/- it does make one wonder.

  31. #40, No insult was intended. Your comments in the thread are enlightening as well. Roman is a statistician and will probably be most interested in the stats discussion more than the beginning of the thread.

  32. #40, Now that’s really funny, I just realized you posted a comment about being afraid of not showing people global temperature on a thread where we are showing people global temperature. One we calculated to have a higher trend than the published version (although the published version used here includes ocean as someone correctly noted above).

    I’m sure the irony of the “not needing to read the headpost” situation won’t escape you.

    can you say — Oops.

    BTW: I don’t mind, you should take a moment to look at some of Roman’s work linked and explained in the following posts.

    https://noconsensus.wordpress.com/2010/03/21/demonstration-of-improved-anomaly-calculation/
    http://statpad.wordpress.com/2010/03/08/combining-stations-plan-c/

  33. #40

    Bart, Jeff has obviously taken an interest in the VS commentary which happens to be “hitched” to your post. This is the obvious reason for the 700+ comments is it not? The “what are you afraid of” stuff is pretty weird.

  34. Re: Jeff Id (Mar 25 14:06),

    Jeff, I am aware of this link and have over the last week or two glanced at it in spare moments, but have no desire at this point to get involved with 700+ comments among other things.

    As far as what we have been doing, this is not really particularly relevant. What is important to the proper reconstruction of “global temperatures” is that these temperatures have already occurred. At this point there is nothing “random” about them. They become fixed parameters which can be estimated more accurately by gathering a greater amount of (properly collected) information and processing that information in a scientifically acceptable fashion.

    As opposed to that, the discussion you referred to is more concerned with the structural form of the phenomena which generated the temperature sequence and which will continue to generate it in the future. This is more concerned with prediction rather than estimation. In my book, this is not a simple exercise in statistics, but requires serious justification from a physical “causative” viewpoint without which there is no genuine basis for carrying on such an argument.

    I think that there is more to be gained from what we have been doing… and we have fewer people throwing ad hom rocks at us ala Tamino and his Open Whatever followers. 😉

  35. Way cool. I’d like to see this tool turned into a compiled windows app, or perhaps a web app like woodfortrees.org

  36. Re: Bart (Mar 25 14:32),

    Aren’t people allowed to see some graphs of global average temperature? What are you afraid of?

    Nothing to be afraid of, but if you are going to put error bounds on your regression graph, you should at least do them correctly.

    Since you are estimating two parameters (intercept and slope), the bars are not parallel to the line, but are more of a “U” shape – closer in the middle of the line and farther away at both ends. I was able to locate an example of what they look like on the web. Most decent stat programs can generate these for you.
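
    For example, a minimal R sketch on simulated data (nothing to do with the blog’s series), just to show the shape:

    set.seed(3)
    x   = 1:120
    y   = 0.02*x + rnorm(120, 0, 1)             #toy series with a small trend
    fit = lm(y ~ x)
    ci  = predict(fit, interval="confidence")   #columns fit, lwr, upr
    plot(x, y)
    lines(x, ci[,"fit"])
    lines(x, ci[,"lwr"], lty=2)                 #bounds pinch in mid-range and flare at the ends
    lines(x, ci[,"upr"], lty=2)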

  37. Great work Jeff (and Roman). I’ve struggled with some of the technical details both here and that Roman has posted (and definitely the code). I got enough to appreciate it.

    Having spent 6 months or so looking at temperature data from individual sites over large areas to small groups, looking at the gaps in the data and the trends, I’m not in the least surprised by Fig. 4. I’ve just not had the capability to do anything like this – fantastic!

    Am I correct in making the assumption that you should be able to run this with small groups of data sets – such as those for individual grid cells?

  38. #15 Jeff Id:

    Nanaimo A and Nanaimo Departure Bay are both on Vancouver Island (airport and ferry terminal respectively). Ladner is north of the Tsawwassen ferry terminal, and New Westminster is up the Fraser River from Vancouver Int(ernational airport). Odd that they’re the same WMO series.

  39. #47, Anthony,

    I have no experience whatsoever in creating web applications. Almost all my programming has been in math, vision systems, robotics, lens design calculations, and a few other fun things.

  40. #51, it’s like a box of old socks. They have got to do something about the quality of our temperature data. If nothing else, I would support a project to gather and maintain a quality open dataset with true RAW data of the best quality which can be obtained.

  41. Jeff & Roman,

    Just a suggestion: Could you calculate a kind of ‘centre of mass’ for the stations? It would be interesting to see how the mean position of the stations migrated with time, and similarly for the number of stations.

  42. Jeff,

    I am furthering the idea put forth by Timetochooseagain (comment 33). Your analysis seems extremely land-based and yet you’ve overlaid the CRU global (land+ocean) data.

    The land-only CRU dataset has a trend of about 0.224C/decade since 1978 (compared with about 0.16C/dec in the land+ocean record), which is not all that much different from your 0.248C/dec.

    Admittedly, your new trend seems higher, but I guess how much higher will ultimately await an areal to areal comparison, right?

    Thanks,

    -Chip

  43. #56, You are right about the ocean land comparison. The slight increase in trend is all we can expect from an anomaly offset analysis like this one but it is more representative of the data.

  44. Fair enough. Just thought it odd and wrote a quick murky reply.

    RomanM, The lines are indeed not parallel, but slightly curved, but point taken nevertheless.

  45. “As far as what we have been doing, this is not really particularly relevant. What is important to the proper reconstruction of “global temperatures” is that these temperatures have already occurred. At this point there is nothing “random” about them. They become fixed parameters which can be estimated more accurately by gathering a greater amount of (properly collected) information and processing that information in a scientifically acceptable fashion.”

    It’s not about randomness, it’s about the assumption of trend-stationarity, which is violated due to the clear presence of a unit root in the temperature record (extensive evaluation results posted, review welcomed).

    Deterministic OLS trend estimation assumes trend-stationarity, so the model is misspecified. This is a clear violation of the first Gauss-Markov assumption making the OLS estimator BLUE.

    Estimating a proper stochastic trend (I found ARIMA(3,1,0) to do just fine, diagnostics posted) yields a statistically insignificant drift parameter (p-value>0.10) for the period 1881-2008.

    Do come over and take a look. I’m inviting all statisticians.

    There’s goodies like piles of test results, endogenous breakpoints, forecasting intervals and monte carlo simulations 😉

    VS

  46. An invitation for statisticians to come over for “goodies”. Instead of hors d’oeuvres we have “piles of test results, endogenous breakpoints, forecasting intervals and monte carlo simulations”.

    Remind me to never go to one of your parties. 🙂

  47. Re: VS (Mar 25 17:36),

    Deterministic OLS trend estimation assumes trend-stationarity, so the model is misspecified.

    That may be the case if you are talking about fitting a linear model to the situation, but that is not what Jeff is doing (until the very end of the analysis – a very minor optional part). The OLS involved is in estimating a sequence of monthly “combined values” – call it “temperature signal” if you like (I don’t like). There is nothing in the earlier part of the analysis which assumes anything (or depends on anything) of the structure of the process which generated the series.

    If there is auto-correlation in the residuals of the linear fit that he did do, it can be due either to persistence of effects at the station and/or grid level or to the non-linearity of the global temperature changes themselves, which have already occurred.

    Whether the linear fit is reasonable is another question, but what Jeff has done in creating the global series is independent of the discussion you have been having on the possible time series structure. It could however help give a clearer picture of what has actually occurred in the past.

  48. # 63
    As a non statistician I can only presume that there is joke in there!!

    The only “math” joke I know is the one about calculus having its limits………………….

  49. # 60
    Don’t laugh !! I once went to a chemical engineer party ( I am NOT a chemical engineer) that was quite as exciting as you listed !!!!

  50. #63 – haha. Crazy dog.

    Roman or VS,

    As I see it the least squares trend of a 100 year temp series is nearly meaningless. Of course it’s just a thumbnail look which tells me that. I’m still reading and learning about VS’s analysis on the other thread. On the shorter 30 year period, it has meaning to me in that we can see the rate of past rise. Whether it’s significant with respect to what you might get from modeled trendless data having similar properties is not something I’m really trying to answer, however it does look significant. My eyes say there could be a true long term signal underlying the noise. Noise which Roman’s work has incrementally reduced BTW.

    VS’s points though are well taken on the other thread that methods exist for the statistics that can make a proper determination. Others here have seen how I work enough to know that I’ll be doing some reading and hopefully absorbing of what VS does.

    What doesn’t make sense to me is how a linear trend, LS or other, over a short 30 year period isn’t a reasonable thing to do. It only tells us what happened and has no predictive power, so I have a hard time seeing where someone would object. As long as I don’t claim to have knowledge of what the next point will be other than thermal mass (auto correlation), a linear trend seems fine to me for a description of the rate of warming. There is no a-priori claim of a mechanism for the trend and nobody is saying that taking the difference of a proper group of two different time periods might not be better. Where did I go wrong?

  51. I have a question. Just how meaningful is a “global mean temperature” to an understanding of climate and climate changes? There are many aspects of regional climate changes that correlate with major global climate changes, but which can be only vaguely related to anything a thermometer could measure. As an example consider the overall climate changes of the Great Basin in the western US between the Pleistocene and the present. During the Pleistocene the Great Basin hosted some of the largest freshwater lakes on the planet including Pleistocene lakes Bonneville and Lahontan. These lakes were truly immense. Nevada and Utah would have appeared more or less as inland seas with mountainous archipelagos from space. While this might seem consistent with an AGW idea that increased heat implies increased drought and cooling, moister conditions, the ice core data from the Antarctic and from Greenland seem to show that atmospheric dust was far more prevalent during glacial epochs, implying average drier conditions around the globe during the glacial epochs.

    The Great Basin is largely a desert because the Sierra Nevada and Cascade ranges cast a long rain shadow. J. G. Houghton in an article in Professional Geographer (23 Feb 2005) provides a description of the primary sources of Great Basin precipitation. He notes three sources: Pacific Frontal systems during the winter, which generate little precipitation except in the higher mountains, Spring and Fall Continental weather that develops over the Great Basin itself as the polar front locates over the region, and non-cylonic summer precipitation driven by convection from moist easterlies moving up from the Gulf of Mexico. The physical geography of the Great Basin has not changed significantly since the Pleistocene. It can be reasonably argued that if there was more water available in the Great Basin during the glacial epochs and early Holocene to fill those immense lakes, then the individual contributions of these precipitation sources must have balanced out very differently.

    The Pacific frontal weather would almost certainly have been even drier than at present. Air masses moving over the Sierra would have had to pass not only over the mountains but over large montane ice sheets that capped the Sierra for much of the glacial epoch and would have extracted even more moisture. Colder air from the continental ice sheets to the north would also have been extremely dry, presumably limiting the role of “continental” weather in supporting the large lakes. The sole important moisture source left would seem to be the Gulf of Mexico. It would appear then that immense lakes in the Great Basin had to be fed by a radically different atmospheric circulation pattern. Simple temperature differences between the Pleistocene and the present cannot possibly account for the climate differences between the lacustrine environment of the Great Basin during the Pleistocene and early Holocene and the desert present. Climate and temperature appear to be only partially related, and limiting a discussion of climate to “warming” versus “cooling” would seem to turn a blind eye to the actual important climate changes which are in the behaviour of weather systems and atmospheric circulation patterns, which might not have any simple, directional causal ties with temperature changes.

    So, again, just how meaningful IS a measure of global mean temperature?

  52. 67-It depends on what you mean by meaningful. If you are talking about what it means in terms of the global climate, ideally it describes the mean of the whole Earth’s surface’s temperature distribution. Obviously that is only a small part of climate, which consists of the whole of the distributions of all the phenomena.

    That being said, the GMST could also be expected to change generally in sync with most places on Earth, if not the whole, in terms of mean temperatures. Over the short term this is apparently not the case, but over the long term (tens of thousands of years and on) it certainly is, except for a few things like the Younger Dryas. In terms of other climate variables the correlations are weaker and more varied spatially, which is not surprising considering that GMST is not a statistic derived from those distributions. However, in the case of precipitation I would expect a moderately positive correlation on average with some negative here and there.

    How meaningful is GMST to actual people? I’d say zip, zero, zilch, nada.

  53. 66 – I don’t think you’ve gone wrong.

    The value of this is audit, quality and diagnostic. It says nothing about climate. There is clear value in doing something right and checking it against what has been done before with flawed methods.

    The diagnostic value is also against other measures such as single station high quality long term data sets. If they do not agree with the global trend over equivalent periods it prompts questions of “why?”. The predictive quality of all the stuff I’ve seen so far is nil. Splines, wavelets, OLS etc etc tell us nothing about the future without a validated causal model. Nor have I seen a “nul case model” – why is climate as it is and what are the limits of accuracy on that model? Tonyb is showing time and again we are within the limits of what has gone before and that this was in times prior to large scale industrial activity. If the global trend can be cleaned up and relied on with known limits all of a sudden maybe things like rate of change can be discussed sensibly – what if the late 20C showed a (say) 2x, 3x or 10x rate of temp rise to anything previous? I think that would pose more questions; so again trend, or simply gradient, is relevant as a diagnostic not a predictive.

    Apologies for preaching to the converted but a couple of the dismissals of the value of this post are IMO off target. The “whys” that will follow from this will interrogate the datasets as well as the ways in which they have been analysed, represented, modelled and extrapolated. High quality long temp records are the standards which the global temp index needs to measure up to – they are reality and the global temp index is the abstraction; where and why they differ is a cause for investigation in the context of the proposal of AGW.

    FWIW from an observer I think this should be publishable? – given how long the climate community have been banging on about trends it would be good to get a decent open approach in the literature. Maybe as a letter or note to a journal rather than a paper?

    Thanks Jeff and Roman – I’d say again get a tip jar, but now you are pumping the trends I guess it means the big oil money finally found you?!? 🙂

  54. Jeff

    Sorry to belabor this point, but I just don’t seem to get this single station thingy.

    I tried it with 68262 which, according to the ghmean data base, is
    V1 V2 V3 V4
    141 68262 0 0 Years of data 1949 to 1991 tinv gives PRETORIA
    141 68262 0 1 Yrs of data 1987 to 2009 PRETORIA – UNIV PROEFPLAAS
    141 68262 1 0 Yrs data 1960 -1991(There is no entry in tinv for this)

    running the getstation with satid=68262 Version=0 gives a seasonal plot with the Years in the X axis of 1950 to 2010.

    running getstation with satid=68262 Version=1 gives a seasonal plot with the years 1960 to 1991.

    Is it possible to get the program to distinguish between

    V3 V4
    0 0
    0 1
    1 0

    as these appear to be different data records?

    OR can you try to explain to me (for the final time) what the heck I’m missing here.

    By the way, has the script you sent me been updated?

  55. It’s no problem.

    V3 (col 3 in your text) is the location, V4 (col 4 in your text) is different reports of the same instrument.

    141 68262 0 0 Years of data 1949 to 1991 tinv gives PRETORIA
    141 68262 0 1 Yrs of data 1987 to 2009 PRETORIA – UNIV PROEFPLAAS —– this is not correct, it’s actually the same instrument as the first.

    You can see it’s not correct if you look at individual values of the second series. They are in most cases, the same as the first. Why is it like this???– is a great question.

  56. Jeff

    Thanks – I understand what you are saying but it doesn’t seem to make much sense. I’ll have to look closer at the data when I get back home. Don’t understand why they would have two sets of temperatures from one instrument, with different years some of which overlap, two different locations with the same 5 digit code one of which gives an opposite slope etc, the tinv database seemingly giving incorrect location names (did they move the same instrument from Van Int to Nanaimo (50-70 miles across Georgia Strait)?) etc etc.

  57. Very impressive work guys. I like figure 4. I agree with Scott that until you address the UHI expanded to include land use and bad siting increasing as instrumentation changed, you can’t conclude anything about real trend.

    Obviously as one commenter pointed out there is a cyclical trend to the data (which by the way correlates very well with the ocean cycles). Why not apply a polynomial or 5 to 10 year running mean to see that cyclical behavior. Then compare peak to peak and min to min in the cycles to see if you get different results from a linear that starts in a cold cycle period and ends in a warm cycle period? We found the 1940 to 2000 net change less than 0.2C warming in USHCN v2 from peak to peak of the polynomial.

    Also have you spot checked the raw GHCN vs any real stations for which we have raw data to verify it is raw. I heard from one met yesterday that has been comparing NCDC data with the paper records for a long standing high quality site that they didn’t agree.

    Finally, does your approach allow you to quickly do the same analysis for the GHCN adjusted data and compare it to the GHCN raw to see how much the GHCN adjustments change things?

    Keep up the good detective work.

  58. Varco @ Post #9

    While presumably just being the start of a long road of investigation for Jeff et. al., for me this has already underlined one key point: having independent un-biased analysis of data and conclusions is demonstratably a ‘good thing’ and commensurate with ‘good’ science?

    I really enjoy these analyses and discussions and second Varco’s comments. It shows that it is important to get the methods correct before making any more detailed conclusions. I also like to hear some of the more philosophical POVs on what a global mean temperature means.

    Ultimately I am interested in someone determining the realistic CIs for global temperature trends. In the meantime I am continuing to look at what influences the differences in temperature anomalies with a 5 x 5 degree grid. I have reached a point where I will be using the method RomanM taught in analyzing the Watts team CRN evaluations.

    So many variables and so little time.

  59. Now look at that match. CRU has slightly too low a historic value WRT GHCN 1935 and before, but in the alleged anthropogenic global warming era,

    The legend says HadCru. Is the comparison GHCN to CRU, or GHCN to HadCRUT?

  60. I’ll stand where I’ve stood all along; there may be warming, in fact it’s quite likely there has been, but it’s impossible to measure directly, and statistical analysis of a poor data set is a poor substitute for direct measurement.

    And as you indicated; until we have a much better understanding of how to deal with UHI, any trend information derived from this data is highly suspect.

    So while all of this is fun to debate and there is probably learning involved, it is more likely applied statistics in chaotic systems that is being advanced than climate physics. Whatever it is it’s not the basis by which we should enact laws governing our behavior and limiting our economic opportunities.

    Further; if there is a link to CO2, it’s likely that it’s a minor contributor, along with Milankovitch cycles, solar variation, oceanic oscillations and the atmospheric effects of volcanism, etc. etc. Until we can ‘unhighjack’ the science of climate systems we are unlikely to make any progress at all in understanding how all of these contributing factors integrate into mean surface temperature, whatever the trend shows.

    Good work though, always interesting to see a new statistical run at the same problem, producing a slightly differing answer. Sorry to rant, happy Friday.

  61. Jeff, sorry to bug you. Would you consider re-doing your raw 1978-2010 GHCN spatial trend map on the same -3 to +3 scale as your figure 8 above?

  62. Jeff,

    I had the same idea as Anthony, if we could get a volunteer to do the programming, basically duplicate the page at GISS.

    If I get some spare time I’ll see if I can find somebody.

    On the multiple versions for one WMO. That’s one of the most confusing things I’ve seen.

    The three digit modifier has never been clear to me.

    perhaps a couple of concrete examples of what you see and what you do would help.

  63. Re: Jeff Id (Mar 25 22:24),

    They are in most cases, the same as the first. Why is it like this???– is a great question.

    From the GHCN v2_temperature_readme text file:

    Each line of the data file has:

    station number which has three parts:
    country code (3 digits)
    nearest WMO station number (5 digits)
    modifier (3 digits) (this is usually 000 if it is that WMO station)

    Duplicate number:
    one digit (0-9). The duplicate order is based on length of data. Maximum and minimum temperature files have duplicate numbers but only one time series (because there is only one way to calculate the mean monthly maximum temperature). The duplicate numbers in max/min refer back to the mean temperature duplicate time series created by (Max+Min)/2.

    The duplicate values exist because their data may have come from different sources. The ones that share the five number station id, but don’t have the 000 modifier are measurements from other sites proximate to the station.

  64. Thanks Roman,

    I knew I learned how to do this from somewhere ;). Getting older, brain not working. Someone, maybe Nick or NicL pointed it out to me several months ago. I should be calling them duplicate number and modifier.

    BTW: I’m going to be a dad here, twice over, in the next few days. 41yo is a bit late for the second!

  65. Jeff,

    I think your comment regarding the diff between NH & SH temps is the most important one here. The SH trend tracks closely with the SST and LH trends.

    http://woodfortrees.org/plot/uah/from:1900/to:2009/plot/hadsst2gl/from:1900/to:2009/plot/hadcrut3sh/from:1900/to:2009/plot/hadcrut3nh/from:1900/to:2009

    The NH land trend is obviously the outlier. Although Chad does not have the GHCN data online so that the land-only trend can be plotted separately, the large NH diff is clearly due to land temps. That should clue us in that some factor is at work in that dataset that is not affecting the others.

    Good work on the new method.

  66. My head still hurst. or hurts.

    station number which has three parts:
    country code (3 digits)
    nearest WMO station number (5 digits)
    modifier (3 digits) (this is usually 000 if it is that WMO station)

    Duplicate number:
    one digit (0-9). The duplicate order is based on length of data.
    Maximum and minimum temperature files have duplicate numbers but only one
    time series (because there is only one way to calculate the mean monthly
    maximum temperature). The duplicate numbers in max/min refer back to the
    mean temperature duplicate time series created by (Max+Min)/2.

    err needs a picture

  67. re – BTW:I’m going to be…
    1. Congratulations.
    2. Have heard much discussion about this type of outcome over the years (and strange analogies with jelly beans and bottles) with many claiming that the aforesaid outcome was devoid of any and all documented input. Thus rendering any such outcome as not just highly improbable but physically impossible.
    3. Might I suggest that you (as a scientifically minded person) conduct a rigorous experiment (I believe that 30 years is the preferred time span) with the goal of finally putting to bed some of the common misconceptions re correlation and causation wrt the aforementioned outcome.
    4. Naturally any trend you identify within your dataset should be positive to correlate with the overwhelming consensus of other trends.
    5. Ensure your data is protected by IPR which I am reliably informed stands for Individual Procreation Rights.
    6. One rather expects that as a scientifically minded person you will maintain your enthusiasm for generating project data and thus never find yourself in the position where you may need to ‘hide the decline’.

  68. Well, us deniers never said there was NO warming

    Skeptics have consistently and voluminously said that the official surface records are biased warm. This analysis, and many other recent ones like it, throw cold water on this assertion repeated a gazillion times on the intertubes. Of course, the skeptics were announcing a perversion of data long before a global analysis was ever undertaken (kudos to the authors here). What do you reckon the chances are that skeptics will learn to be more cautious in their claims?

    we are, after all, in an interglacial warming process.

    Actually, the reverse according to orbital dynamics. We’ve been slowly heading towards an ice age since the beginning of the current interglacial ~10 000 years ago.

    I find this ‘rebound from the Little Ice Age’ meme quite surprising. The implication is that the Earth has a preferred temperature toward which it has been returning. I thought the skeptical view abhorred such a concept. (That is actually consistent with mainstream science.) No mechanism is discussed, just an assumption of some elastic effect with no physical underpinning. It ain’t orbital dynamics – the trend should be the other way.

    Congratulations on doing all the hard work Roman/Jeff ID (et al?). May the upcoming paper be reviewed favourably.

  69. #92 Barry

    This analysis, and many other recent ones like it, throw cold water on this assertion repeated a gazillion times on the intertubes.

    It may throw some cold water on the notion that station loss causes a warm trend bias, but it says nothing about possible bias in the adjustment methods.

    I find this ‘rebound from the Little Ice Age’ meme quite surprising. The implication is that the Earth has a preferred temperature toward which it has been returning.

    Nothing of the sort is implied. Climate varies on wide ranging time scales. Orbital changes account for climate change on a scale of tens of thousands of years. It is only suggested that GHG’s can account for warming in the last few decades – yet the world warmed between 1850 (roughly the end of the LIA) and 1950.

    No mechanism is discussed, just an assumption of some elastic effect with no physical underpinning.

    Some of the proposed mechanisms to account for emergence from the LIA include solar variability (a lack of solar minimums like the Maunder and Dalton); decreased volcanic activity; possible changes in ocean circulation patterns; internal climate variability.

  70. #94 Turboblocke

    Section 3 is a discussion of elimination or merging of duplicated station time series. After a quick read all I could find was page 2842 (section 4) where it discusses the Colonial Area Archive Project which is digitizing pre-1900 data and gradually including these into the data set. Is this what you are referring to?

    Figure #2(a) page 2842 shows the station count time series with the dramatic drop starting about 1980. I’m pretty sure this recent station drop off is what Jeff is speaking of.

  71. Layman Lurker,

    It may throw some cold water on the notion that station loss causes a warm trend bias, but it says nothing about possible bias in the adjustment methods.

    The majority of recent analyses, like the one in the above post, are of the raw data, not station dropout. The result so far is unanimous. The temperature trends derived from raw station data differ little from the adjusted surface records (GISS is actually the lowest). Previous (endless) claims otherwise were never substantiated and should now be modified if not outright recanted.

    Here’s a comparison of results, including Jeff/Roman’s above, at another skeptical blogsite.

    http://rankexploits.com/musings/2010/comparing-global-land-temperature-reconstructions/

    A lot of hard work has been done by skeptics and proponents alike – finally. This should be acknowledged, as should the results.

    Some of the proposed mechanisms to account for emergence from the LIA include solar variability (a lack of solar minimums like the Maunder and Dalton); decreased volcanic activity; possible changes in ocean circulation patterns; internal climate variability.

    I’d be interested in references if it’s not too much trouble. I’d note that the solar trend has been flat or slightly declining for the last 50 years, and that volcanic activity has been relatively stable for the last 80 as far as we can tell.

  72. #94 Tb

    What do you mean by the “elimination of temp. stations”? Aren’t you aware that there is on-going back filling of old station data?

    Funny thing about that. If it is the cause of the Great Thermometer Dropout of the late eighties and early nineties, then perhaps you might explain why they are “back filling” only the early portion of the record of the multitude of stations that continue to measure temperatures to the present day.

    Maybe they are just waiting for more funds to complete that onerous job of data entry. 😉

  73. Take a break, Jeff. I still haven’t finished catching up on reading everything published while I was gone.

    Hope the little one is doing well.

  74. You cannot polish a turd. It’s all pointless if your dataset is no good.

    As the surface record tracks the satellite record pretty well – variance and trend – we might have some confidence in the surface data. The satellite data is not calibrated by surface data. It’s completely independent (and measures the atmospheric temps to a higher altitude). Satellites are not affected by urban heat influence.

  75. If it is the cause of the Great Thermometer Dropout of the late eighties and early nineties, then perhaps you might explain why they are “back filling” only the early portion of the record of the multitude of stations that continue to measure temperatures to the present day.

    The linked report was published in 1997.

    There will be another flurry of infilling in the future. The task is onerous. It’s done by hand. As we’ve learned, station dropout doesn’t make much of a difference, so perhaps infilling is not a priority at present.

  76. #96 You cannot polish a turd. It’s all pointless if your dataset is no good.

    And if you claim that a dataset is no good, then in the real world, you have to prove it. Considering that, as Barry points out in #100, there is good agreement between all the main temperature indices, why postulate that there is anything wrong with the data?

    I have the suspicion that this meme started because people didn’t realise that the main indices have different baseline years for calculating the anomaly. As GISTEMP has the earliest baseline it always shows a greater anomaly than HadCRUT, which has a later baseline and which in turn shows a greater anomaly than UAH and RSS, which have the youngest baselines.
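
    A toy illustration of the baseline effect in R (all numbers invented purely to show the arithmetic, not actual index values):

    abs_temp <- 14.5                                           # hypothetical global mean for one month, deg C
    baseline <- c(GISTEMP = 14.0, HadCRUT = 14.1, UAH = 14.3)  # hypothetical baseline-period means
    abs_temp - baseline                                        # older, cooler baseline => larger anomaly

    Same temperature, three different anomalies, purely because of the reference period each index subtracts.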

  77. The majority of recent analyses, like the one in the above post, are of the raw data, not station dropout.

    True. My confusion. I do however believe Roman’s method for combining series gives more confidence in preventing any leakage of bias in absolute temps (due to shift to warmer stations from station dropout) into the anomaly trends.

    The result so far is unanimous. The temperature trends derived from raw station data differ little from the adjusted surface records

    True enough, but it is also true that there is a substantial trend in the adjustments from raw to adjusted GHCN. There will undoubtedly be continuing analysis of adjustment methods, UHI, TOB, etc, with much more to learn before conclusions can be drawn. One should also note from figure 8 (trends since 1978) above that the sharper resolution of Roman’s method has uncovered inexplicable spatial trend discontinuities between grids which seem to be smoothed over in the other data sets.

    As for references on possible causes and mechanisms of climate variability (including emergence from LIA), here are some:
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC34299/

    Crowley2000.pdf (linked PDF)

    Besides this there are Wiki articles and paywalled abstracts on internal variability. Most articles argue for a convergence of several factors. My point was essentially that changes in forcing due to orbital dynamics occur so slowly that they cannot account either for the existence of the LIA or the emergence from it.

    (due to shift to warmer stations from station dropout)

    Here’s another zombie meme: it’s a global temperature anomaly. Therefore it measures the difference between the temp. at a station in the past and the temp. at that station now. So it doesn’t matter whether the starting temperature was 10, 20 or 30 C; it’s the difference between the start temperature and the temperature now that gives the anomaly.
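
    A quick toy example in R (made-up numbers) of why the absolute level drops out: each station is anomalised against its own reference, so losing the cold station leaves the anomaly trend unchanged:

    warm_station <- c(25.0, 25.3, 25.6)            # hypothetical absolute temps over three periods
    cold_station <- c( 5.0,  5.3,  5.6)
    warm_anom <- warm_station - warm_station[1]    # anomaly relative to each station's own first period
    cold_anom <- cold_station - cold_station[1]
    rowMeans(cbind(warm_anom, cold_anom))          # 0.0 0.3 0.6 with both stations
    warm_anom                                      # 0.0 0.3 0.6 with the cold station dropped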

    And of course the station drop out meme which is so appealing to conspiracy theorists is just plain wrong. What’s the most likely story… as explained in the 1997 paper I referred to in #94, past station data is being back filled as it (and the necessary man power) becomes available, or… for about 30 years temp stations have been dropped and no one noticed until now?

  79. #101 Tb

    There will be another flurry of infilling in the future. The task is onerous. It’s done by hand.

    Ridiculous. Do you honestly believe that the temperature data post 1990 that we are talking about is NOT in electronic form? Most of it actually exists (and has existed) in that format since it was measured. If it was the other way around, you might have a possibly valid point, but, as it is, your “explanation” is nonsense.

  80. Well, that wasn’t me at 101, but I do think that you ought to check your facts before using words such as “ridiculous” and “nonsense”.

    Read, digest and let’s see how graciously you respond…
    http://www.yaleclimatemediaforum.org/2010/01/kusi-noaa-nasa/

    Some common misconceptions are dealt with in the comments.

    Note here how the data has to be prepared for inclusion… http://gosic.org/gcos/GSN/CLIMAT-code_practical-help_081223.pdf

    BTW I acknowledge that I mistakenly pointed to section 3 in the paper in #94; it should have been section 2 and the note to Figure 2

  81. Also see http://www.ncdc.noaa.gov/cmb-faq/temperature-monitoring.html

    The number of land surface weather stations in the Global Historical Climatology Network (GHCN) drops off in recent years. This fact is an indication of our success in adding historical data. Every month data from over 1,200 stations from around the world are added to GHCN as a result of monthly reports transmitted over the Global Telecommunication System. This number is up from what it was a decade ago due to systematic efforts by the Global Climate Observing System (GCOS) and others to encourage countries to send in CLIMAT reports. If NCDC relied solely on such data that would be the maximum number of stations available. But we have systematically sought to increase the data holdings in the past through international projects such as the once a decade creation of World Weather Records as well as NCDC’s own digitization of select Colonial Era archive data. The creation of the GCOS Surface Network is one example of a specific attempt to both enhance data exchange around the world and to identify and select the ‘best’ stations for long-term climate change purposes. The weighting scheme used to rate stations for the initial selection in the GSN clearly indicates the biases climatologists have in favor of stations that have been in operation for a long time, that are rural, are agricultural research sites, and are distributed throughout the world with increasing density the farther they are away from the tropics. The result of all these efforts is that GHCN has data for many thousands of stations in the period from the 1950s to the 1990s that cannot be routinely updated, thus the number of stations drops considerably in recent years.

  82. #106 Tb

    You are correct. My bad. I mistakenly referenced the wrong name in my comment #105. However, it was your original statement in #94 that was the point in contention.

    Despite the statements in the Peterson and Vose paper you refer to, I stand by my contention that the explanation is specious. What may have had even a grain of truth in 1997 is clearly not true thirteen years later in 2010. There is no excuse for not having incorporated the stations which actually continue to measure temperatures today. The yaleclimatemediaforum site states:

    During that spike in station counts in the 1970s, those stations were not actively reporting to some central repository. Rather, those records were collected years and decades later through painstaking work by researchers. It is quite likely that, a decade or two from now, the number of stations available for the 1990s and 2000s will exceed the 6,000-station peak reached in the 1970s.

    The 1970s were 40 years ago and stations may have used paper to store records then, but that would not be the case today so waiting a “decade or two” to update the 1990’s is nonsense. An update for an earlier period is a one-time deal. It isn’t that difficult.

    A drop of 50% in the number of stations and more than 40% in the number of 5×5 degree grid cells represented is not negligible. Arguments that “it doesn’t matter” for calculating the trends just don’t cut it particularly since regional information is involved and can be important to other research.

    Your quote from NOAA in #107 sounds like a rationalization, not legitimate reasons:

    Every month we add 1200 station values (This is a lot? Are these really the only ones available?). This is up from ten years ago (I would expect this, but down a bunch from twenty years earlier). But we only wish to select the “best” stations for long-term purposes (except when we added – and kept – all of those onerously entered historical ones that inflated those numbers in the 1950s)… and we just cannot update them (Really? All of them? Cannot or “do not”?).

    If these are professional data management agencies, then we are entitled to expect them to produce professional results. If they are unable to do so, then another agency should be given the job.

  83. Sorry RM, but to me you’re just expressing an opinion and using judgemental language such as “specious” and “rationalization” does not reveal you to be open minded. I’m not sure that I’m qualified to express an opinion on the matter as I’m not a professional in the area but I do note that not all countries systematically produce data in the CLIMAT format required. In addition, not all the countries of the world have access to the budget and the technology that is commonplace in the US and the UK.

    I also find your comments about adding the earlier data bizarre. What on earth could possibly be wrong with that? How would you react if you heard that they had access to station data that they didn’t include? You may also have missed the significance of the decadal World Weather Record, see here: http://ols.nndc.noaa.gov/plolstore/plsql/olstore.prodspecific?prodnum=C00160-PUB-S0001

    Frankly, it seems to me that your only responses to links to papers and data showing the legitimate reasons for the past number of stations to be greater than today’s are vague and unsubstantiated accusations of a lack of good will/faith.

  84. The reporting stations that comprise the near real-time updates to GHCN provide data in digitised form that is consistent with or easily translatable into GHCN format. The rest have to be converted individually.

    TB’s link (http://gosic.org/gcos/GSN/CLIMAT-code_practical-help_081223.pdf) gives a clue as to how data has to be translated. It is an onerous task. From the Yale Forum on Climate page:

    It’s common to think of temperature stations as modern Internet-linked operations that instantly report temperature readings to readily accessible databases, but that is not particularly accurate for stations outside of the United States and Western Europe. For many of the world’s stations, observations are still taken and recorded by hand, and assembling and digitizing records from thousands of stations worldwide is burdensome.

    We must be cautious of projecting our experience of easy access to whatever we want on the net to everything else in life. But, if it is insisted the data is easily obtainable, then why don’t you, Roman, go a little further in your laudable efforts, and gather up the data for missing stations? You would be doing two things at once – substantiating your suspicions and infilling data for your own work. You might also end up making a contribution to the GHCN database itself, which would be good for everybody.

  85. Sorry RM, but to me you’re just expressing an opinion and using judgemental language such as “specious” and “rationalization” does not reveal you to be open minded.

    …and “zombie meme” is objective and scientific? C’mon tb. Did you not do the same thing in #104? For you to state what you did in 104 and then chirp at Roman comes off as hypocritical to me.

    We all have our opinions. I think you will find that guys like Roman are open minded, but for an opinion to change it usually takes more than a few lines of a comment.

  86. Barry, tb… FWIW Chad has a series of posts using simulated data in an attempt to evaluate whether data homogeneity, station counts, and other data quality related issues interact with the methods of combining station data. Figure 6 and table 3 of this linked post show correlations between the derived methodological errors and the station counts. This is a case example of why we should not be too quick to shut the door on station counts or any other potential data quality issues.

    http://treesfortheforest.wordpress.com/2010/02/16/combining-inhomogeneous-station-data-%e2%80%93-part-ii/

    I hope to see Chad pick this up again to unravel (and confirm) just how all these factors interact. Perhaps he can incorporate Roman’s method as well.

  87. Ask yourself when the data was “infilled” into the data set and what occurred afterward.

    If it is referred to in a 1997 paper, then the data would have been added at the time the initial dataset was being constructed or shortly thereafter. Then, in the intervening time period, no more data from those stations was included in the set. Now, I would not call that “infilling”. The stations were de facto dropped from the mix although their earlier data remained.

    This was due to station closures in some cases, but for the many cases where data continued to be generated, a decision was obviously made by the people who were looking after the data set not to actively seek and add that data. This decision could have been made on the basis of ease of access and availability, but those stations have been dropped nonetheless. To this date, 13 years after the paper, they have not reappeared.

    Yes, some of the stations may be on paper, but in my view, at this stage that is rare. Stations in Canada, New Zealand and other countries are not used although I am sure you would admit that they are not that difficult to get information from. If “infilling” is desired, this information need only be gathered once so even real-time access is not necessary for that purpose.

    Frankly, I would rather examine what has happened and base my judgment on that rather than read the opinion of a blogger based on the out-of-date verbatim statements made by someone else. But, enough time spent on this disagreement …

  88. If it is referred to in a 1997 paper

    You mean you made your comment without checking the reference given?

    You may harbour suspicions about the honesty of the GHCN compilers, but if you are going to air them you need to come up with something concrete.

    but for the many cases where data continued to be generated, a decision was obviously made by the people who were looking after the data set not to actively seek and add that data. This decision could have been made on the basis of ease of access and availability

    Excellent, you have a hypothesis that you can test very easily. We know that La Paz is not covered by GHCN. They have a weather service and you can get daily temps and precipitation online. See how long it takes you to transform La Paz data into GHCN form. It’s only one city, and you seem to believe that the task should be easy, so it should take you – what – 10 minutes at the most?

    Now, what does GHCN data include? Max/min/mean temps, monthly anomalies, precipitation… you’d need to replicate the full GHCN format to see how long it would take them to do one station. This should be straightforward because you have been working with such data intensely. Then, you can extrapolate how much time it would take to do thousands of stations – that have weather data online.

    Let us know how long it takes for just the one station.

    Once again, kudos for all the hard work you’ve put into the analyses above. The disagreement we have here doesn’t touch that.

  89. #114 Barry

    If it is referred to in a 1997 paper

    You mean you made your comment without checking the reference given?

    Where do you get that notion from? I think you misunderstood what I said. Read “Because” instead of “If”. My point was that what has been termed “infilling” was actually the construction of the initial data set BECAUSE it was already referred to in the 1997 paper. The drop in the number of stations used was not an artifact of a later inclusion of early results. The reason I called that notion nonsense is because it would make zero sense to add data (say in 2005) from ongoing stations but only add the values up to 1990 and at the same time ignore values up to 2005.

    From that same paper:

    One of the primary goals of GHCN version 2 was to acquire additional data in order to enhance spatial and temporal coverage.

    Well, they did that. But then it seems it became subsequently unimportant, because they made no further effort in that direction. Increased spatial coverage is important so that the record need not rely on “1200 km smoothing”. Land areas like Africa, South America and the Arctic could be better sampled and included in the current version of the data set, but there has been very little achieved in that direction since the original construction.

    Your argument that it is “oh, so difficult” to do this one station at a time may apply to amateurs, but this is supposed to be a professional outfit. Does it have to be done one station at a time? I don’t think so. Agreements can be made with other national meteorological organizations for a more regular gathering and transmission of such information. If money is lacking, maybe we can steer some of the research money from polar bear and penguin studies towards the creation of a currently updated good quality temperature record. I think it is important, but that’s only my opinion…

  90. Your argument that it is “oh, so difficult” to do this one station at a time may apply to amateurs, but this is supposed to be a professional outfit. Does it have to be done one station at a time? I don’t think so. Agreements can be made with other national meteorological organizations for a more regular gathering and transmission of such information. If money is lacking, maybe we can steer some of the research money from polar bear and penguin studies towards the creation of a currently updated good quality temperature record. I think it is important, but that’s only my opinion…

    Agreements have been made with NMS’s. That’s the near real-time stream that builds the monthly data.

    I don’t know the ins and outs of the relationships NOAA has with the NMSs it deals with, what the capabilities are of those NMSs, or whether NOAA figures the distribution of stations is sufficient at the moment. Judging by the recent bevy of analyses on station drop out, it seems pretty clear that there’s not much difference between using *all* stations (pre-cutoff) and the ones they have reporting regularly (post-cutoff). IOW, the data grids they have seem to be robust. Have you tested this yourself as Tamino and others have done?

    Maybe it does have to be done one station at a time, for the data not regularly reported.

    Consider – for years ‘skeptics’ have decried the official surface records as warm biased. Years, and no one did the proper analysis to substantiate that view – just cherry-picked stations that showed what they wanted to see. Now YOU have gone to considerable effort, and lo, you’ve shown that the millions of words devoted to rubbishing the temp records (and slandering the compilers) in this way have been grossly misplaced.

    I would think, in light of that alone, that the lesson is clear. If one hasn’t done the work, doesn’t know much about it, then one should be mighty cautious about offering opinions.

    The assumption of warm biased adjustments all this time is not based on analysis but on ideology. You have done good work here (and not because it bolsters the official records). You tested your assumptions and got a clearer view. It’s admirable. What’s not admirable is returning suspicion and innuendo on a subject you haven’t yet tested. The legitimate course is to formulate a good question and set about answering it. If you’re not prepared to make a new investigation – and you don’t have to – then the only honest opinion you can have is “I don’t know.” Have you even tried contacting NOAA and asking them why they don’t have more infill?

    Having seen what it’s done to Phil Jones, the ideology-based criticism galls me. It should gall any honest skeptic, too. It can do real damage. There’s got to be respect alongside skepticism.

    I won’t press this any further. It’s personal to me. Good wishes and keep up the good work.

  91. I was thinking about the amount of work it would take to update 1000 stations that provide daily data online. Daily data has to be averaged to monthly for GHCN, so let’s imagine someone has come up with a nifty program that does this automatically after being fed the numbers (daily min/mean/max temps, precipitation figures etc), converts them to GHCN format, and that it takes about 60 seconds to enter the values for each day. So that’s half an hour per station to click on the appropriate weather web portal, work through each day in turn (handily archived and readily accessible), and enter all the daily data, which is automatically converted into a monthly average in GHCN format.

    For a thousand stations/met offices, that’s 500 hours, which works out to about 62 days at 8 hours a day. Allowing our human data processor weekends off, it would take roughly 12 weeks to convert, via the web, the data for 1000 stations. We have a time problem.
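
    If anyone wants to check that arithmetic, here it is as a rough R calculation (every input is an assumption from the scenario above, nothing measured):

    entry_secs_per_day <- 60        # assumed hand-entry time per daily record
    days_per_month     <- 30        # roughly one month of daily values per station
    stations           <- 1000
    hours_total <- stations * entry_secs_per_day * days_per_month / 3600   # 500 hours per monthly update
    hours_total / 8                 # about 62 eight-hour working days
    hours_total / 8 / 5             # about 12.5 five-day weeks for one person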

    You could divide the work between three people, and they would have to be doing this full time, day after day, month after month. Or you could divide it between 30 undergrads working at NOAA (if there are that many in the right department), and it would only take three days out of their month to do the work. Of course, they would complain, and they would have more than reasonable grounds.

    Of course someone would make a mistake along the way, and someone in the skeptical blogosphere would discover it and accuse NOAA of fudging the data. So I propose this:

    Climate change skeptics should do the work.

    I can think of no one better to oversee quality control than Anthony Watts. Such an effort would seem to be the logical step-up for surfacestations.org.

  92. Good work. My concern is still the UHI and I wonder if your black boxes shed light on this: are they regions of high population growth or energy consumption? In this case, would the lower rate of rise in the S hemisphere be consistent with the UHI?

  93. #120, Thanks Mark, I think some of it could be UHI, but there is a much higher ratio of ocean to land area in the SH, so that is expected to dampen any land temp rise.
