Trend Calculation Incorporating Seasonal Offsets
Posted by Jeff Condon on February 23, 2010
In a recent post, RomanM proposed an interesting concept, explained in very approachable terms for us technical folks; I’ve attempted to apply it and explain it further below. Rather than using a fixed offset, or no offset at all, when averaging several temperature anomaly series, he uses a seasonal method that calculates a different offset for each month of each series. The goal of this method is to combine multiple temperature stations into a more accurate trend curve.
Simple tech side - Roman’s post describes an improved anomaly-offsetting calculation that offsets two different temperature stations according to their monthly anomaly differences. The process is a least-squares knitting of two temperature time series month by month, rather than a single least-squares offset fitted to all of the overlapping data in two or more series.
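For anyone who wants to see the distinction in code, here is a minimal Python sketch (my own, not Roman’s R code) comparing one least-squares offset with twelve monthly offsets for a pair of stations. It assumes both series are equal-length monthly lists aligned in time, starting in January, with None for missing months; the function names are hypothetical. For a constant offset, the least-squares solution is just the mean of the overlapping differences, and doing the same thing month by month gives the twelve seasonal offsets.

```python
# Hypothetical helpers. Assumed input: equal-length monthly lists aligned in
# time, index 0 = January, None = missing month.

def single_offset(a, b):
    """Constant c minimizing sum((a[t] - (b[t] + c))**2) over the overlap.

    The least-squares answer is the mean of the overlapping differences.
    """
    diffs = [x - y for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        raise ValueError("series do not overlap")
    return sum(diffs) / len(diffs)


def monthly_offsets(a, b):
    """Twelve offsets: the mean a - b difference for each calendar month."""
    n = min(len(a), len(b))
    offsets = []
    for m in range(12):
        diffs = [a[t] - b[t] for t in range(m, n, 12)
                 if a[t] is not None and b[t] is not None]
        offsets.append(sum(diffs) / len(diffs) if diffs else None)
    return offsets


def knit(a, b, offsets):
    """Shift b onto a's level month by month, then average where both report."""
    out = []
    for t, (x, y) in enumerate(zip(a, b)):
        c = offsets[t % 12]
        y_adj = None if y is None or c is None else y + c
        vals = [v for v in (x, y_adj) if v is not None]
        out.append(sum(vals) / len(vals) if vals else None)
    return out


if __name__ == "__main__":
    a = [float(t % 12) for t in range(24)]
    # b sits 1 degree below a in Dec-Feb and 3 degrees below the rest of the year
    b = [v - (1.0 if t % 12 in (0, 1, 11) else 3.0) for t, v in enumerate(a)]
    print(single_offset(a, b))    # 2.5
    print(monthly_offsets(a, b))  # [1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.0]
```

With a made-up 1 degree winter and 3 degree summer difference, the single offset lands on 2.5 and misfits every month, while the monthly offsets recover the seasonal pattern. To knit with the single offset instead, pass `[single_offset(a, b)] * 12` to `knit`.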
The rest of us - Roman’s method makes sense when you consider the different climates at various station locations. For instance, a station on the east side of Lake Michigan, right on the shoreline, experiences winds off the lake for the vast majority of the year. During the spring, summer and fall months, open water moderates the air temperature the station experiences, so its readings stay closer to the water surface temperature, whereas the air temperature at an inland station shows more daily and hourly variation. In the winter months the lake freezes substantially, insulating the air from the water surface, so the water conducts less heat and has less effect on temperatures. The net result is that the anomaly difference between shore and inland stations has a seasonal component.
To take this seasonal variation into account, Roman applies 12 offsets, one for each calendar month, to each station’s anomalies. He is the first I’ve seen address this in the instrumental temperature record, not that it hasn’t been done somewhere else.
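As I read it, the multi-station version fits one combined series plus twelve offsets per station by least squares. Below is a minimal Python sketch of that model under my own assumptions: equal-length monthly lists starting in January, None for missing data, and station 0 as a reference whose offsets are pinned to zero (the model only determines offsets relative to one another). I solve it by alternating least squares to keep it dependency-free; Roman may well solve the normal equations directly, and the function names are hypothetical.

```python
# Model (my reading of Roman's method, not his code):
#     x[s][t] ~ T[t] + mu[s][t % 12]
# T is the combined series; mu[s] holds twelve monthly offsets for station s.
# Assumed input: equal-length monthly lists, index 0 = January, None = missing.


def _combine(series, mu):
    """T[t]: mean of x[s][t] - mu[s][t % 12] over the stations reporting at t."""
    T = []
    for t in range(len(series[0])):
        vals = [x[t] - mu[s][t % 12] for s, x in enumerate(series) if x[t] is not None]
        T.append(sum(vals) / len(vals) if vals else None)
    return T


def seasonal_offset_combine(series, max_iter=1000, tol=1e-12):
    """Least-squares fit of T and mu by alternating (block coordinate) updates.

    Station 0 is the reference: its twelve offsets are pinned to zero.
    Returns (T, mu).
    """
    n_t = len(series[0])
    mu = [[0.0] * 12 for _ in series]
    for _ in range(max_iter):
        T = _combine(series, mu)
        # Each station's offset for month m is its mean residual from T.
        new_mu = []
        for x, old in zip(series, mu):
            row = []
            for m in range(12):
                res = [x[t] - T[t] for t in range(m, n_t, 12) if x[t] is not None]
                row.append(sum(res) / len(res) if res else old[m])
            new_mu.append(row)
        # Re-pin the reference. Subtracting the same amount from every
        # station's month-m offset (T absorbs it on the next pass) leaves the
        # fitted values unchanged.
        ref = new_mu[0][:]
        new_mu = [[row[m] - ref[m] for m in range(12)] for row in new_mu]
        change = max(abs(p - q) for new_row, old_row in zip(new_mu, mu)
                     for p, q in zip(new_row, old_row))
        mu = new_mu
        if change < tol:
            break
    return _combine(series, mu), mu
```

Each pass averages the offset-corrected stations, then re-estimates every station’s monthly offsets from its residuals. Because the least-squares problem is convex, the alternation should settle on the same fit a direct solve would give whenever the offsets are identifiable. Since T is built from the raw values, it carries the reference station’s level and seasonal cycle; anomalies or trends can be computed from it afterwards.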
I used a slightly modified version of the getstation4 algorithm from my previous GHCN post. The algorithm automatically sorts the records to determine whether they come from the same instrument or from different ones. If they are from the same instrument, they are averaged. Normally, if they are from different instruments, the algorithm calculates the anomalies and then averages the data into a single series. In this case, the raw data is returned as multiple series.
The function call for the first three stations of GHCN is:
tem=cbind(getstation4(60355),getstation4(60360),getstation4(60390))
tem is a time-series object containing the four series shown below.
Combining these series isn’t a straightforward matter. Since the algorithm has already determined that these four series come from different temperature stations, though, one method is simply to take the anomalies and average the series. This is what I used in my recent GHCN gridded reconstruction.
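For comparison, here is a minimal Python sketch of the anomaly-and-average approach. The anomaly baseline is each station’s own monthly mean over its whole record; that is an assumption for illustration and may differ from the base period getstation4 actually uses. Data are assumed to be equal-length monthly lists starting in January, with None for gaps, and the function names are hypothetical.

```python
# Hypothetical helpers. Assumed input: equal-length monthly lists,
# index 0 = January, None = missing month.

def monthly_anomalies(x):
    """Subtract the station's own mean for each calendar month (None kept)."""
    means = []
    for m in range(12):
        vals = [x[t] for t in range(m, len(x), 12) if x[t] is not None]
        means.append(sum(vals) / len(vals) if vals else None)
    return [None if v is None else v - means[t % 12] for t, v in enumerate(x)]


def anomaly_average(series):
    """Average the anomalies of whichever stations report in each month."""
    anoms = [monthly_anomalies(x) for x in series]
    out = []
    for t in range(len(series[0])):
        vals = [a[t] for a in anoms if a[t] is not None]
        out.append(sum(vals) / len(vals) if vals else None)
    return out
```

Because each baseline comes from that station’s own years of record, a station covering only part of the period can end up with a baseline that doesn’t line up with the others, which could show up as a step when it enters or leaves the average.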
Those with some experience might look at that 1950 hump and think it is the same feature as in CRU, here from only three station IDs, but the hump falls slightly later than it does in the CRU curve.
Below is the result of feeding the same data into Roman’s algorithm.
Not terribly different, but it has a slightly higher trend. Below is the difference.
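The trend comparison comes down to a slope. As a hedged sketch (not necessarily how the trends in these plots were computed), an ordinary least-squares slope against time in years, scaled to units per decade, looks like this in Python; None entries are skipped, index 0 is assumed to be the first month, and the function name is hypothetical.

```python
def ols_trend_per_decade(series):
    """OLS slope of a monthly series against time in years, times 10.

    Units are the series' units per decade (e.g. degrees C per decade).
    """
    pts = [(t / 12.0, v) for t, v in enumerate(series) if v is not None]
    if len(pts) < 2:
        raise ValueError("need at least two data points")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return 10.0 * sxy / sxx
```

Comparing two methods is then a subtraction, e.g. `ols_trend_per_decade(seasonal_result) - ols_trend_per_decade(anomaly_result)`, where both names stand for whatever monthly series each method produced.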
There is substantial annual variation in the difference between the two methods, with an underlying sudden step. The last plot shows the offsets by month.
By taking seasonal variation into account, we can achieve a final result that is more accurate than a standard non-seasonal anomaly combination. The variation in the offset curves above reflects both the existence of a seasonal signal in the residuals left from knitting different temperature station anomalies together and its correction.



steven mosher said
Is this merely auditing (wink)
John F. Pittman said
wmbriggs.com did this for several climate-related questions. It can be really nice. I am about to use it to help determine how much of my energy costs are associated with poor insulation. It would be more difficult and less accurate using some other scheme. He also has a good book for those who want to do the fun part of the stats with R rather than the old way.
RomanM said
Nice work, Jeff.
If I may make several observations:
If the measurements are from the same instrument, you can use this same procedure to combine them into a single station record. Plotting the residuals can help to identify spurious observations.
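Roman’s residual suggestion is easy to sketch. Given a fitted combined series T and one station’s twelve monthly offsets (from whatever least-squares fit you use), the residuals are what is left over, and observations far out in the tails are candidates for a closer look. The three-standard-deviation cutoff is my arbitrary choice, not something from Roman’s comment, and the function names are hypothetical.

```python
import statistics


def station_residuals(x, T, mu_s):
    """Residuals x[t] - T[t] - mu_s[t % 12]; None where x or T is missing."""
    return [None if v is None or T[t] is None else v - T[t] - mu_s[t % 12]
            for t, v in enumerate(x)]


def flag_spurious(res, k=3.0):
    """Indices whose residual lies more than k population SDs from the mean.

    The cutoff k is an arbitrary screening threshold, not a formal test.
    """
    vals = [r for r in res if r is not None]
    if len(vals) < 2:
        return []
    centre = statistics.mean(vals)
    sd = statistics.pstdev(vals)
    if sd == 0:
        return []
    return [t for t, r in enumerate(res)
            if r is not None and abs(r - centre) > k * sd]
```

Plotting `station_residuals` for each station, as Roman suggests, shows more than a list of flagged indices does, but the flags are a quick way to find months worth checking against the raw record.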
Also, with regard to the “Difference between no offset and Least squares plot”, two things appear evident. The “annual” variation seems to be a systematic seasonal variation whose magnitude depends on the number of stations present at that time. The timing of the jump coincides with the introduction of Skikda and Annaba into the mix. Try plotting the residuals for each station and see what they look like.
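A toy Python illustration of this point, and of the step Jeff mentions in his reply: when stations are averaged without offsets, a station that enters the record at a different level shifts the average by its level difference divided by the number of stations then reporting, so both the step and any seasonal mismatch shrink as more stations are present. The station values below are made up for illustration, and the function name is hypothetical.

```python
def no_offset_average(series):
    """Plain average of whichever stations report each month (no offsets)."""
    out = []
    for t in range(len(series[0])):
        vals = [x[t] for x in series if x[t] is not None]
        out.append(sum(vals) / len(vals) if vals else None)
    return out


if __name__ == "__main__":
    a = [10.0] * 36                   # station present throughout
    c = [10.0] * 36                   # second station at the same level
    b = [None] * 24 + [12.0] * 12     # enters at month 24, 2 degrees warmer
    two = no_offset_average([a, b])
    three = no_offset_average([a, c, b])
    print(round(two[24] - two[23], 3))      # 1.0   (2 degrees / 2 stations)
    print(round(three[24] - three[23], 3))  # 0.667 (2 degrees / 3 stations)
```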
By the way, slightly OT, in his latest temperature post, Tamino uses his “optimal method” to calculate grid series. Is he not aware that there may be a “more optimal” way of grid averaging? I’d give him the link to my blog, but I dread the thought of being rejected again.
Kenneth Fritsch said
What am I missing here, Jeff? The prevailing winds in the Chicago area are from west to east, so wouldn’t the east side of Lake Michigan get winds off the lake most of the time? In Chicago we get lake breezes and lake-effect snow, but nothing like they get on the other side, e.g. South Bend’s lake-effect snowfalls.
Kenneth Fritsch said
Jeff, you leave us with a step difference between methods and without an attempt to explain. Methods are easy. It’s the explanations that are difficult. (I do not do smilies.)
What would a breakpoint calculation show?
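One simple way to answer that, offered as a sketch rather than a proper test: scan every candidate month, fit a separate mean before and after it, and keep the split with the smallest total squared error. Applied to the difference between the two methods, this would locate the step and estimate its size, but it says nothing about statistical significance (formal procedures such as a Chow test address that). The minimum segment length is an arbitrary assumption, and the function name is hypothetical.

```python
def best_breakpoint(y, min_seg=12):
    """Single mean-shift breakpoint found by exhaustive least squares.

    Returns (t, shift): t is the index of the first month after the break and
    shift is the mean after minus the mean before. None entries are skipped.
    """
    pts = [(t, v) for t, v in enumerate(y) if v is not None]
    if len(pts) < 2 * min_seg:
        raise ValueError("series too short for the minimum segment length")
    best = None
    for k in range(min_seg, len(pts) - min_seg + 1):
        left = [v for _, v in pts[:k]]
        right = [v for _, v in pts[k:]]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, pts[k][0], mr - ml)
    return best[1], best[2]
```

The brute-force scan is quadratic in the series length, which is fine for a single station record; a longer search over multiple breaks would want a smarter algorithm.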
Jeff, the stations you analyzed here, as I recall, are at the very beginning of the GHCN data set. At this rate we will never finish.
Jeff Id said
#5 Kenneth, I think Roman has it right: the no-offset method would produce a fixed step whenever a new series is introduced. I don’t know which series caused it, though.
Jeff Id said
#4 should be east — damn.
Luke Skywarmer said
#4 and #7
It is not so simple (looking for links to verify).
The wind blows inland during the day and then reverses at night, blowing lakeward after dark.
Relative to the land, the lake is warmer at night and colder during daylight.
Looking for links; this also works along the ocean.
Genghis said
Where I live in the mountains in Utah, the nearest airport is down in the valley. Typically the valley’s highs are 10˚F hotter in the summer and 10˚F colder in the winter, depending on whether an inversion develops or not. If we get a nice inversion, we can easily get 50 to 60 degree weather while the valley gets a nice cold 15˚ for a month at a time : ) The Indians called it the white death, I think.
The distance as the crow flies from my house’s temperature gauge to the airport’s temperature gauge is less than 8 miles, but an 11,000′ mountain range seems to change the climate a little.
steven mosher said
Whatever. Tammy won’t allow any comments whatsoever about this. This could be a new sport.
a tammy slam
Kenneth Fritsch said
Luke @ Post #8:
Do you live in the Chicago area?
See this link on the Windy City:
http://www.usatoday.com/weather/resources/askjack/archive-windy-city.htm
Jeff Id said
I used to live in Kalamazoo and Grand Rapids. I didn’t often notice a reversal. People don’t know how much those lakes affect the weather. The amount of snow within 1/2 mile of the shore is often several times greater than 50 mi inland.