Global temperature by combining surface and satellite data

This is a complex guest post by Nic L.  It represents early results of a method he and Ryan have been working on for global temperatures.  This post uses the concept of regularized least squares, similarly to the RegEM method of Steig et al, where you have two datasets, one with more trend information (ground stations), the other with more spatial information (satellite).  There is a huge amount of work here and the concept is fully publishable, especially when you see the huge warming result at the end ;).  Because of that, this post would probably be one which would make the cut for Tamino or RealClimate; however, this post makes no effort to correct for surface station problems but rather is about the uniqueness of the method and the issues involved with a complex problem.

——NicL 11 February 2010

Introduction

My earlier post on long-record GHCN stations sprang from my interest in using them to create an improved global gridded temperature history from 1850 on, by combining the data from a set of good quality long record surface stations, augmented by ocean data, with the gridded satellite TLT (lower troposphere) data produced by UAH and/or RSS since 1978.

A well known example in the peer reviewed literature (Steig et al, Nature, January 2009) applied this sort of approach to reconstructing a 50 year gridded temperature history of Antarctica using a relatively sparse set of ground stations, many of which had incomplete records and some of which did not cover the full 50 year period, in conjunction with AVHRR satellite TIR (thermal infra-red) data.  Steve McIntyre started an investigation of Steig’s Antarctica reconstruction and Ryan O, Jeff Id and myself took this up and spent a lot of time and effort looking into Steig’s work and finding ways to improve on it.

Following on from that work, I have been seeking to apply some of the same techniques on a global scale.  Ryan has also begun exploring this area; we have been exchanging ideas, and the plan is to work on it together, hopefully along with Jeff. What is presented here is very much a “proof of concept” first-cut approach.

The rationale for combining in-situ surface temperature data with satellite data is that the surface data is available going back much further than satellite data, but is spatially very incomplete, whilst the satellite data is spatially complete (or nearly so) but is available for only the last thirty years. Using mathematical techniques, spatial information from the satellite data can be combined with the temporal information from the surface data to produce a gridded reconstruction over the entire period that surface temperature measurements are available.

Combining data from satellite and surface measurements in the proposed way relies on the assumption that the pattern of spatial correlations of temperature fluctuations shown by the satellite data was the same in the pre-satellite era.  Over the relatively limited time period involved in this case (160 years), that seems a reasonable assumption.  It can be tested to an extent, but I have not attempted to do so at this stage.

There is more than one way of using the satellite data. Because of the very large number of grid cells, it is necessary first to reduce the dimensionality of the satellite data.  This is generally done by performing a singular value decomposition (SVD) of the matrix of monthly satellite temperature data. This is performed after adjusting the data to anomalies by subtracting from each calendar month the mean thereof, and possibly also then normalising the data to unit monthly standard deviation.  SVD is used to convert the data into a set of principal component time series (PCs) and their corresponding spatial eigenvectors (weights), together defining EOFs (Empirical Orthogonal Functions), ordered according to their contribution to the total covariance in the data, such that each sequential EOF accounts for the maximum possible amount of the total (remaining) covariance.  This procedure is also known as Principal Components Analysis (PCA).
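For readers who want to see the mechanics, here is a minimal sketch of the anomaly and SVD step in R. The matrix name tlt is hypothetical (one row per month, one column per grid cell), and the anomaly function assumes rows starting in January; the UAH data used later is already supplied as anomalies, so in practice only the SVD step is needed.

### sketch only: monthly anomalies followed by SVD / PCA
anomalise <- function(x) {
  for (m in 1:12) {                        # subtract each calendar month's mean
    rows <- seq(m, nrow(x), by = 12)
    x[rows, ] <- sweep(x[rows, , drop = FALSE], 2,
                       colMeans(x[rows, , drop = FALSE], na.rm = TRUE))
  }
  x
}
tlt.anom <- anomalise(tlt)
tlt.svd  <- svd(tlt.anom)                  # tlt.anom = U %*% diag(d) %*% t(V)
pcs      <- tlt.svd$u %*% diag(tlt.svd$d)  # principal component time series
eofs     <- tlt.svd$v                      # spatial eigenvectors (one column per EOF)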

Only a limited number of PCs /spatial eigenvector pairs are retained for the reconstruction, on the basis that higher order ones mainly represent noise.  Steig only retained 3 PCs for Antarctica – almost certainly too few properly to capture the spatial distribution of temperature fluctuations there.  Mann in his latest 1500+ year proxy global reconstruction retained a maximum of 7 PCs from the gridded instrumental data.  But in order to obtain a reconstruction with reasonably detailed spatial information it is necessary to use rather more PCs than that.

Steig used the longer record ground station data to complete the satellite record for the pre-satellite period using the TTLS (truncated total least squares) version of RegEM, an iterative regularized expectation maximization algorithm devised by Tapio Schneider and much used by Mann et al for “hockey stick” paleoclimate reconstructions.  With TTLS RegEM, the regularization (involving a reduction in the effective number of free parameters in the model) takes the form of including only a fixed number of EOFs in the regression used to impute missing values.

The choices made by Steig in his procedure seemed unsatisfactory, albeit partly constrained by limitations of the original RegEM program.  But in principle it may be possible to produce a satisfactory reconstruction using a purely RegEM approach, at least where the temporal evolution of the satellite data is not only reasonably correlated with the corresponding surface data but broadly consistent with it as to trends.

Another approach to producing a gridded reconstruction is to perform RLS (regularized least squares) regression of the surface station data against the satellite spatial (eigenvector) structure, as pioneered by Ryan O in relation to Antarctica.  This gives a set of estimated PCs for the entire time period that are then applied to the corresponding satellite spatial eigenvectors to produce a gridded reconstruction.  The surface station data is generally infilled using RegEM before the RLS regression is performed.  This method essentially involves the same equations as used in RegEM, save that the spatial correlations between satellite grid cells are used rather than the correlations between the surface data and the satellite data PCs.
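As an illustration only, the core of an RLS regression can be written in a few lines of R. This is a bare-bones ridge form with hypothetical names (Y for the infilled station anomaly matrix, Vs for the rows of the satellite spatial eigenvectors at the station-occupied cells, V for the full eigenvector matrix); the actual implementation handles weighting and the choice of regularization rather more carefully.

### bare-bones RLS sketch (ridge form), not the actual algorithm
### Y:  months x stations matrix of (infilled) station anomalies
### Vs: stations x k rows of the spatial eigenvectors at the station cells
### V:  cells x k matrix of spatial eigenvectors
rls.recon <- function(Y, Vs, V, lambda) {
  P <- t(solve(crossprod(Vs) + lambda * diag(ncol(Vs)), t(Vs) %*% t(Y)))  # estimated PCs, months x k
  P %*% t(V)                               # gridded reconstruction: months x cells
}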

The RLS method has an advantage over a pure RegEM approach in that the regularization can be more finely controlled and optimised to suit the data, with regularization taking place both in the initial RegEM surface data infilling and in the RLS regression.  On the other hand, it assumes that all surface station data is equally representative of the mean temperature for the grid cell occupied by the station, which (whilst a reasonable approximation in the case of Antarctica) is not necessarily the case.

I initially intended to concentrate first on the pure RegEM approach, which is simpler.  However, much of the work involved is common to both methods, so I decided to try both.

Satellite data

The TIR AVHRR data used in Antarctica measures skin temperature, which one would expect to be reasonably close to near-surface air temperature as measured by the ground stations. TIR measurements can only be made with fairly clear skies, but despite the need for some heavy cloudmasking the AVHRR data generally shows good correlations with the ground station data, on a monthly basis.  Unfortunately, over time the satellite data exhibited substantial drift and inhomogeneities.  Further, AVHRR data is only available pole-wards of about 50 degrees, so is not suitable for a global reconstruction.

In contrast, microwave satellite TLT data is available globally and does not require cloudmasking,  although close to the poles and over high terrain its accuracy is poor.  Further, the TLT data is thought to suffer relatively little from drift and inhomogeneities (although RSS’s version does appear to have had some inhomogeneities in the early 1990s).  In case you aren’t familiar with the picture the TLT data gives of temperature trends worldwide since coverage started, here is a map of them, using UAH’s data. The concentration of warming in the higher northern latitudes is very evident, although the projection used over-represents the area affected.

It is also relevant to look at the distribution of variability of the TLT temperature data.  The mean monthly standard deviation of the TLT data over land is about 25% higher than over sea.  However, the bulk of this difference is due to the strong positive correlation between standard deviation and distance from the equator, as shown in the following map.

[Figure: tlt.sd.map, map of the monthly standard deviation of the TLT data]
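The latitude dependence can be checked directly with a couple of lines, again using the hypothetical tlt.anom matrix from the sketch above and a vector cell.lat of grid-cell latitudes:

### sketch: monthly standard deviation of each cell vs distance from the equator
cell.sd <- apply(tlt.anom, 2, sd, na.rm = TRUE)
cor(cell.sd, abs(cell.lat))                # strongly positive, as the map suggests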

In contrast to the AVHRR data, the TLT data does not measure temperature at ground level (although it does measure air temperature and not ground skin temperature). This means that there is a fundamental difference between the reconstructions produced by the pure RegEM and RLS methods. The former results in a reconstruction of TLT temperatures, as it is the TLT data (as represented by its lower EOFs) that is being imputed, with the regression coefficients reflecting the correlations between changes in TLT data and in surface data.  But as the RLS method involves correlations between satellite data in different places (used, where ground stations are located, as a proxy for correlations between surface temperatures in those locations) and not between satellite and ground level data, the reconstruction is of surface temperatures.  (These are my views; Ryan doesn’t necessarily agree with me on this – yet. 🙂)

Whichever type of reconstruction is intended, the first step is to perform a SVD of the satellite TLT data and to examine the results.

I only used UAH’s TLT data, as their data seems to have been more homogeneous than RSS’s and I have more confidence in their methodology.  The UAH TLT data is available as mean monthly anomalies from December 1978 onwards at http://vortex.nsstc.uah.edu/data/msu/t2lt/tltmonamg.xxxx_5.2, xxxx being the year involved.  I took data only up to November 2009, so as to use only complete years.  The whole globe is covered in 2.5 x 2.5 degree grid cells, but UAH defines global as being from -85 to +85 latitude.  As mentioned previously, data near the poles is of limited use.

The further from the equator one gets, the smaller each grid cell is.  It is appropriate to correct for this before taking the SVD, otherwise areas towards the poles have a disproportionate influence.  Ryan and I have together looked at how best to do this.  One solution is to make each grid cell cover a larger range of longitude as latitude increases (N or S).  However, I opted here to weight the grid cells by the square root of their areas.  This has virtually the same effect as merging data into longitudinally wider grid cells (square root since SVD minimises a RMS function), once the spatial eigenvectors have been reweighted by the same factors. Weighting grid cells retains the conventional format and also has the advantage of preserving the finer spatial resolution available in the original data at higher latitudes.  However, these are relatively minor advantages and come at the expense of a considerably increased computational load and the need to worry about weighting, so I may well in future instead use longitudinally wider grid cells as latitude increases. Ryan has written some helpful code for that purpose and for plotting data in that format on a more realistic global projection than the rectangular format employed here.
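As a sketch of the weighting just described, using the hypothetical names from the earlier SVD example (cell area is proportional to the cosine of latitude, so the weights are sqrt(cos(lat)); the spatial eigenvectors are divided by the same weights afterwards to get back to temperature units):

### sketch: square-root-of-area weighting before the SVD
w      <- sqrt(cos(cell.lat * pi / 180))   # cell area is proportional to cos(latitude)
tlt.w  <- sweep(tlt.anom, 2, w, "*")       # weight each grid-cell column
svd.w  <- svd(tlt.w)
pcs.w  <- svd.w$u %*% diag(svd.w$d)        # PCs of the area-weighted data
eofs.w <- sweep(svd.w$v, 1, w, "/")        # un-weight eigenvectors back to temperature units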

There is one other weighting issue.  In Antarctica, Steig scaled all the satellite data to unit variance before taking the SVD and deriving the principal components and spatial eigenvectors.  That means that the SVD was calculated using the correlation matrix of the data rather than its covariance matrix.  The strong dependence of standard deviation on latitude could be seen as a reason for preferring to use a correlation basis.  However, as Steve M pointed out in his reply to Huybers (http://climateaudit.org/?p=405) in the equivalent context of PCA, the statistical authorities favour using covariance where (as here) the data are measured in commensurate units.  Accordingly, I have not scaled the TLT data to unit variance before taking the SVD.  I may investigate the effects of doing so at a later stage.

SVD eigenvectors of global TLT data

The first six of the spatial eigenvectors resulting from taking the covariance SVD of the thus weighted global TLT data from -85 to +85 latitude are as follows.  Their signs are of course arbitrary.

The correlations between adjacent TLT grid cells are generally high.  But, as the generally horizontal orientation of the first six spatial eigenvectors suggests, the correlation between longitudinally adjacent (E-W) TLT grid cells is considerably closer to unity than that of latitudinally adjacent (N-S) cells.

The below graph shows the pattern of the SVD eigenvalues, each approximately proportional to the mean standard deviation of that part of the fluctuations in the data accounted for by the corresponding EOF.  It will be seen that, apart from the drop after the first seven eigenvalues, there are no steep declines in the curve that would suggest an obvious point at which to truncate.

[Figure: tlt.85.sqrt.d, SVD eigenvalue spectrum of the weighted global TLT data]

The first seven EOFs only account for 40% of the total variability power in the data (the sum of the squared eigenvalues); to get to 90% thereof requires 58 EOFs, and to get to 95% requires 88. However, virtually all of the mean global trend is likely to be accounted for by a fairly small number of the lower EOFs.
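These percentages come straight from the eigenvalues; continuing with the hypothetical svd.w object from the weighting sketch above:

### cumulative fraction of total variability (sum of squared eigenvalues) explained
d2      <- svd.w$d^2
cumfrac <- cumsum(d2) / sum(d2)
cumfrac[7]                  # share accounted for by the first seven EOFs
which(cumfrac >= 0.90)[1]   # number of EOFs needed to reach 90%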

SVD eigenvectors of land only TLT data

It may be better to reconstruct global temperatures separately for land and sea, as the surface data available and its characteristics are quite different for land and for sea.  If so, using a land only and a sea only SVD might provide a more suitable representation for the corresponding reconstruction.  It is relevant in particular to see what the land only SVD of the TLT data looks like, as there is likely to be more spatial detail over land than over sea.  The first six spatial eigenvectors thereof are shown below.

Surface data – Land

The next step is to build suitable sets of surface data, both land and sea. I will start with land data, where there are many more choices to be made.  I used GHCN land station monthly mean raw data up to December 2009.  I processed it in the same way as described in my previous post (see https://noconsensus.wordpress.com/2010/01/19/long-record-ghcn-analysis/).  That gave 7280 stations, excluding stations with no post-1800 data.

Raw data was used in preference to adjusted data mainly because of the greater extent of raw data, particularly outside the USA and for earlier dates, but also partly because of concerns that the adjustment process could contain biases and did not in any event deal with urban heat island (UHI) and other effects of changing land use.  In the case of the USA, adjustment should arguably be made for two widespread cooling biases.  These are the change in time of observation, about which there is little debate, and possibly the change from liquid in glass thermometers to electronic sensors (MMTS) – although the effect of the latter is not entirely clear, and in any event seems small.  But my view is that, overall, UHI and other land use change effects are likely to result in any bias in the raw data being in the direction of warming.

I wanted to obtain as much data as possible going back at least to 1900, ideally 1850, but there is very limited availability of such data for much of the world outside the 48 contiguous USA states, and data for many stations ceased at some point  in the last twenty years.  The number of stations with data extending from 1850 to the late twentieth century is very limited (only 100 or so with fairly complete records), and most of these are included in the set of stations with good data from 1900 to 1999.  I therefore included stations with data at least from 1900 to 1999 and, outside the contiguous USA, stations with data at least from 1950 to 2005.

Initially I required not only that there be some data in the start and end year but that no more than 20% of data be missing over the whole qualifying period.  However, I relaxed this to 30% for the 1900-1999 stations. Whilst doing so increases the number of 1900-1999 stations only modestly (from 1034 to 1060) it  noticeably improves coverage in some areas of the world.

My preference was to include only rural stations, with a view to avoiding UHI effects, but the coverage of long record rural stations outside the USA is too patchy to permit this.  Additionally, there are instances of stations marked at GHCN as being rural that have in fact become urbanised; this may be a particular problem outside the USA.

The final selection was of all 1900-1999 stations, save for non-rural USA stations, and of all 1950-2005 stations that were either outside the USA or were rural stations in Alaska.  Coverage of the USA from rural 1900-1999 stations (of which there were 463, most of which also have post-1999 data) was generally excellent save for in Alaska, where the nine rural stations meeting the 1950-2005 record criteria were included.  The final data set contained 1123 stations.

The precise selection criteria are as per the following lines of R code, “araw” being the matrix of monthly temperature anomalies for the 7280 stations from January 1850 to December 2009 and “tinv” the list of those stations with their metadata:

### create logical indexes of stations with at least one month's data in the year concerned: 1900, 1950, 1999 or 2005
araw.1900.idx=!is.na(colMeans(window(araw,st=1900,en=c(1900,12)),na.rm=TRUE))
araw.1950.idx=!is.na(colMeans(window(araw,st=1950,en=c(1950,12)),na.rm=TRUE))
araw.1999.idx=!is.na(colMeans(window(araw,st=1999,en=c(1999,12)),na.rm=TRUE))
araw.2005.idx=!is.na(colMeans(window(araw,st=2005,en=c(2005,12)),na.rm=TRUE))

### create vectors of the number of months with missing data for each station in the periods 1950-2005 and 1900-1999
araw.55.mis=colSums(is.na(window(araw,st=1950,en=c(2005,12))))
araw.100.mis=colSums(is.na(window(araw,st=1900,en=c(1999,12))))

### make index of all GHCN stations with raw data for some month in each of 1900 and 1999 and <30% missing data
araw.100.30.idx=araw.1900.idx&araw.1999.idx&(araw.100.mis<360) ; sum(araw.100.30.idx)  #[1] 1060

### make index of all GHCN stations with raw data for some month in each of 1950 and 2005 and <20% missing data
araw.55.20.idx=araw.1950.idx&araw.2005.idx&(araw.55.mis<132) ; sum(araw.55.20.idx)  #[1] 1808

### make indexes of non-USA stations, of rural stations, of rural 1950-2005 stations and of the 9 rural Alaskan stations included in araw.55.20.idx
araw.exUSA.idx=(!(tinv[,1]==425)) ; sum(araw.exUSA.idx)  #[1] 5359
araw.rural.idx=(tinv[,9]=="R") ; sum(araw.rural.idx)  #[1] 3912
araw.55.r.idx=araw.1950.idx&araw.2005.idx&(araw.55.mis<132)&araw.rural.idx ; sum(araw.55.r.idx)  #[1] 897
araw.55.20.alaska.idx=rep(FALSE, times=7280)
araw.55.20.alaska.idx[c(3361,3364,3370,3375,3403,3404,3405,3410,3419)]=TRUE

### make index of all 1900-1999 stations save USA non-rural, plus non-USA (and rural Alaska) 1950-2005 stations
araw.rls.idx= (araw.100.30.idx&(!araw.exUSA.idx&araw.rural.idx|araw.exUSA.idx)) | (araw.55.20.idx&araw.exUSA.idx)|araw.55.20.alaska.idx ; sum(araw.rls.idx)  #[1] 1123

[Note that I have not included other R code used to generate the results given in this post; there is much code and it is not currently in a sufficiently organised state for posting.]

The two following maps show respectively the locations and 1950-2005 trends of the 1123 stations selected for the global reconstructions; stations with small trends show up much more clearly on the location map.  [Ignore the number 1132 in the headings of these maps; I made a transposition error in my typing of 1123.]

[Figure: araw.rls reconstruction stations, locations of the 1123 selected stations]

The mean 1950-2005 trend of the set of 1123 stations is 0.104 degrees C /decade (all trends given hereafter are in degrees C /decade).  That is much higher than for rural stations only, at 0.066, but the rural station set is dominated by those in the USA.

Trends over 1900-2005 for stations with data stretching back to 1900 are shown in the below map.  The much sparser coverage outside the USA of stations with good records back to 1900 as compared to 1950 is evident.  This map includes 383 non-rural USA stations that do not form part of the 1123-strong reconstruction set, but their mean trend of 0.020 degrees C /decade is close to the overall mean of 0.026, so the mean excluding them would be only slightly higher.  The 453 rural USA stations had a mean trend of only 0.007, but the mean 1900-2005 trend of the 30 rural stations in the rest of the world was 0.071, close to the 0.078 mean trend of the 189 non-rural ex USA stations.

[Figure: araw.100.30 station 1900-2005 trends and locations]

For a reconstruction that combines ground station with satellite temperature data, it is important that there is a reasonably good correlation between fluctuations in the ground station data and in that for the satellite data for the grid cell in which the station is located, or (where, for example, the station is close to the edge of that cell, or is on the coast) an adjacent cell.  Encouragingly, these correlations were generally good, with a mean adjusted R-squared of 0.52, as shown in the below scatter plot.  Since the satellite TLT data measures temperatures at an average height well above the ground, and local weather and micro-climate derived fluctuations lead to variability in station data that does not feature in a grid cell average, one would not expect anything like perfect correlation.

[Figure: araw.rls reconstruction, adjusted R-squared of each station against its own TLT grid cell]
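The per-station check amounts to regressing each station's monthly anomalies on the TLT anomalies of its grid cell. A sketch, with hypothetical names (araw.sat is the station anomaly matrix restricted to the satellite period, cell.index maps each station to its TLT grid cell, and both are assumed to cover the same months):

### sketch: adjusted R-squared and regression coefficient of station i on its TLT cell
station.vs.tlt <- function(i) {
  fit <- lm(araw.sat[, i] ~ tlt.anom[, cell.index[i]], na.action = na.exclude)
  c(adj.r.sq = summary(fit)$adj.r.squared, coef = unname(coef(fit)[2]))
}
r.sq <- sapply(seq_len(ncol(araw.sat)), function(i) station.vs.tlt(i)["adj.r.sq"])
mean(r.sq, na.rm = TRUE)                   # about 0.52 for the 1123-station set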

Whilst most of the correlations are strong, many very strong, there are a fair number with low adjusted R-squared.  The below map shows how R-squared varies by location.

[Figure: araw.rls reconstruction, map of station vs TLT adjusted R-squared]

It is probably unsurprising that correlations for stations located on small islands, and in some cases perhaps on the coast, are generally low, as sea temperatures behave differently from land temperatures.  But the large north-south difference was unexpected, and is a source of some concern.  Why should most stations south of about 30-40N show much lower correlations than stations to the north?  I have not been able to find any error in my computations, although it is possible that there is one.  Probably there is some good climatological reason for the north-south divergence, of which I am currently unaware.  Fortunately, the correlation of most of the stations south of 30-40N with their TLT grid cell data is significant, even if not particularly high.

The map below shows the regression coefficient (multiplier) of the station data on the satellite data.  This is related to their correlation but also to their relative variance.  Most stations have a coefficient well above 0.4 (the scale is from 0.4 to 1.4).  A good number of stations have a coefficient in excess of 1.2, significantly above unity.  It is well known that in situ measurements of ground temperature show larger trends than do TLT measurements, contrary to the predictions of global circulation models.  This phenomenon, and possible explanations for it, were for instance discussed in the 2009 JGR Klotzbach, Pielke et al. paper “An alternative explanation for differential temperature trends at the surface and in the lower troposphere”.  However, I was unaware that a geared relationship often obtained for monthly fluctuations.

[Figure: araw.rls reconstruction, map of station-on-TLT regression coefficients]

Surface data – Ocean

Having assembled a large set of long record ground stations, I needed to obtain sea surface data.  Although it would seem logical to use near surface marine air temperatures, the accepted global surface temperature datasets such as HadCRU use measurements of water temperature, typically taken 1-10 metres below the surface.  This data is readily available on a gridded basis back to 1850, based on a 5 x 5 degree grid.  There is a fair amount of data missing, increasingly so as one goes further back from the present.  As the measurements generally don’t come from fixed stations, it seems impracticable to use raw data.

I accessed  the HadSST2 data set of adjusted grid cell monthly mean sea surface temperature anomalies at http://hadobs.metoffice.com/hadsst2/data/HadSST2_SST_1850on.txt.gz and scanned the downloaded file into R. In order to get anything approaching reasonable global coverage going back to 1900, I accepted grid cells with up to 30% missing data over 1900-2005 and some data in both those years, in line with the criteria for 1900-1999 ground stations.  724 of the 2592 grid cells met that requirement.  I then added 23 other cells that had some data in 1850-1852 and in 2004-2006 and no more than 30% missing data in between, giving a total data set of 747 grid cells with data starting (for some cells) in 1850.  The distribution of the 724 grid cells is shown in the below map.  It can be seen that coverage, whilst generally reasonable, is weak in the middle of the Pacific ocean and very limited pole-wards of about 40S.

[Figure: all grid boxes with some Hadley adjusted data in 1900 and 2005 and under 30% missing months over 1900-2005]

The corresponding map for cells with data in 1850-1852 is sparser, particularly outside the Atlantic ocean, and has virtually no coverage of the Pacific ocean other than coastal areas.

I then carried out a similar investigation to that for the land stations, of the relationship between the sea surface temperature data and that for the corresponding satellite TLT grid cells.  This required a little thought since the HadSST cells are four times the area of the TLT cells and are ordered differently, but at least each HadSST cell matches exactly with four TLT cells.  The results are shown in this map.
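A sketch of the cell matching just described, with hypothetical coordinate vectors tlt.lat and tlt.lon holding the 2.5 degree cell-centre latitudes and longitudes (longitude wrap-around at the dateline is ignored for simplicity):

### sketch: find the four 2.5-degree TLT cells lying inside a 5-degree HadSST cell
tlt.cells.in.hadsst <- function(had.lat, had.lon) {
  which(abs(tlt.lat - had.lat) < 2.5 & abs(tlt.lon - had.lon) < 2.5)
}
### mean TLT anomaly over the four matching cells, for comparison with the HadSST series:
# rowMeans(tlt.anom[, tlt.cells.in.hadsst(lat0, lon0), drop = FALSE])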

Disappointingly, in most cases the correlation is fairly low, with a mean adjusted R-squared of only 0.14. The regression coefficients of HadSST data on the satellite TLT data were correspondingly low, with a mean of 0.37.  There are a few areas where the correlations and regression coefficients are much higher; principally the eastern Pacific.

It is perhaps unsurprising that sea surface temperatures, reflecting as they do temperatures 1 to 10 metres below the surface, should correlate poorly with air temperatures measured in the lower troposphere.  Certainly, the greater heat retaining capacity of water is reflected in a much lower monthly standard deviation of HadSST grid cell temperature anomalies (mean 0.8 degrees C) than for land stations (mean 1.7 degrees C), measured over the period for which satellite data exists.

Data for marine air temperatures also exists, although it is even less complete than for sea surface temperatures.  Hadley produces the MOHMAT43 dataset of gridded monthly marine air temperature anomalies.  This is based on night time readings only, to avoid distortions from deck heating, and is adjusted to correct for, in particular, changes in mean deck heights over time.  The series starts in 1856 but data has only been produced up to May 2007.  There is also an infilled interpolated version, HadMAT1, but I decided to work with the uninterpolated version in the hope that the lack of spatial smoothing would result in better correlation with the satellite TLT data.

There are 501 MOHMAT grid cells with some data in 1899-1901 and in 2005, and under 30% missing months in 1900-2005.   The distribution of those grid cells is shown in the below map.  It can be seen that coverage is more limited than for the HadSST dataset.

I feared that MOHMAT grid cells with data going back to 1900 provided too limited coverage to provide a reasonable basis for a global reconstruction, so I decided also to use cells with data in 1950 and 2005 and under 30% missing months in that period.  I then computed the correlation of each of the resulting 988 cells  with the TLT satellite data.  The results are shown in the below map.


[Figure: MOHMAT vs TLT adjusted R-squared, scale 0 to 1]
The correlation is generally higher than for the HadSST cells, but the mean adjusted R squared is still rather low at 0.19.  The regression coefficients of MOHMAT data on the satellite TLT data were correspondingly higher than for HadSST data but remain quite low, with a mean of 0.50.

It occurred to me that the low correlation could partially be due to marine air temperatures, which are quite closely related to sea surface temperatures, lagging those of the atmosphere generally, due to the high heat capacity of the upper layer of the oceans.  I therefore regressed the MOHMAT data on the corresponding TLT data for the preceding month.  This did produce modest correlations, with a mean R squared of 0.09.  However, regressing the MOHMAT data on a combination of TLT data from the same and the preceding month only resulted in a marginally higher correlation than regressing on the current month data alone (adjusted R squared increased from slightly under 0.19 to just over 0.20).

Interestingly, the mean monthly standard deviation for the 988 MOHMAT grid cells was virtually the same as for the 747 HadSST cells, at 0.8 degrees C over the period for which satellite TLT data exists. As for land, but perhaps surprisingly for sea, this is higher than the mean standard deviation of the TLT data for the corresponding grid cells, of 0.7 degrees C.

At this point, I was unsure which was the best data to use for reconstructing temperatures over the ocean areas, so I decided to concentrate on land.

Infilling the surface data – Land

In order to produce a spatially and temporally complete gridded reconstruction from surface and satellite temperature data, it is necessary (or in any case helpful)  first to infill the surface temperature records. That includes extending back, to whatever date the reconstruction is to start from,  the records of stations that commenced after that date.  I had always intended doing this separately for the land and sea data.  Their differing characteristics suggested that the extent to which sea temperatures would help to infill land temperatures and vice versa would be limited, and the sizes of the individual data sets meant that even with separate infilling it would take many hours of computer time running RegEM for the iterative infilling algorithm to converge sufficiently.  In light of the much lower correlation of the TLT satellite data with the sea as compared with the land surface data, I decided also to investigate producing separate gridded reconstructions for land and sea areas.

Infilling the missing data for the land stations was done using ridge regression RegEM.  Since all the data was in the same units, no scaling to unit variance was used, as for the TLT SVD.  The use of covariance rather than correlation is not an option in Tapio Schneider’s original Matlab implementation of RegEM, but the R version I have developed is much more flexible.  Unfortunately, ridge regression RegEM is slow with data sets like this.  Achieving an acceptable convergence level took four days on a fast computer.
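To give a flavour of what the infilling involves, here is a heavily stripped-down EM-style iteration with a single fixed ridge penalty. It is emphatically not the RegEM port used for these results, which chooses the ridge parameter adaptively for each record and carries a residual covariance estimate; it just shows the alternation between imputing missing values and refitting the regressions.

### toy EM-with-ridge infilling sketch, NOT the RegEM implementation used here
### X: months x stations matrix of anomalies with NAs for missing values
em.ridge.infill <- function(X, lambda = 0.1, iters = 20) {
  miss <- is.na(X)
  X[miss] <- 0                            # initial guess: the (zero) anomaly mean
  for (it in 1:iters) {
    for (j in which(colSums(miss) > 0)) { # update each station with missing months
      A    <- X[, -j, drop = FALSE]
      beta <- solve(crossprod(A) + lambda * diag(ncol(A)), crossprod(A, X[, j]))
      X[miss[, j], j] <- (A %*% beta)[miss[, j]]
    }
  }
  X
}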

Below is a map showing the resulting 1900-2005 trends of the stations after infilling.  The mean trend is 0.032, up from 0.026 for stations with actual data back to 1900, but that station set was much more dominated by the USA.

I was slightly concerned that the large number of USA stations might distort the imputation process.  However, the mean covariance (after imputation) between USA stations and those elsewhere was only 0.07, or 0.04 excluding Canada (where a measurable correlation is to be expected).  By comparison, the mean covariance of USA stations with other USA stations is 1.93.  So it appears that my concern was likely unfounded.  And with the large effective number of parameters in the regression any dominance of the first few EOFs by US station variability should make little difference to the imputation.
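The covariance check itself is simple once the infilled matrix exists; a sketch with hypothetical names (ainf for the infilled station matrix, usa.idx a logical index of contiguous-USA stations):

### sketch: mean covariances between station groups, from the infilled matrix
cv <- cov(ainf)
mean(cv[usa.idx, !usa.idx])   # USA vs rest of world
mean(cv[usa.idx, usa.idx])    # USA vs USA (diagonal variances included in this crude version)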

The variation of the effective number of parameters retained in the ridge regression is shown below.  The graph rises sharply up to about 1900, then levels off before peaking in the 1930s, and thereafter fluctuates greatly, no doubt due both to the varying availability of station data and varying levels of agreement between data from different stations.

The mean estimated error of the infilled data matrix, as computed in RegEM, which up till 1940 mirrored the inverse of the effective number of parameters, continued to decline until the 1950s and remained generally low until April 2006, when GHCN data for many long record stations ceased.

As the RegEM estimation error is zero where actual data exists, having data from a higher proportion of the data set generally leads to a lower mean estimated error, even when there is then more disagreement between the data (which triggers stronger regularization, causing a reduction in the effective number of parameters).  But also, the further back one goes the greater the uncertainty in the estimates used for infilling, with there being fewer and less well correlated stations to impute from.  Back to about 1900, the mean estimation error – whilst by no means insignificant – is not that high.  Going further back, the mean estimation error rises steeply.

Reconstructions – Land

With an infilled land station set having reasonable geographical coverage and reasonably low estimation error, I decided to have a go at producing some initial gridded land-only reconstructions, based on the land-only TLT data SVD.  The reconstructions are very much experimental at this stage, and I have made no attempt to estimate error bands or carry out verification experiments.

Note that if the SVD of the global TLT data were instead used, it would in principle be possible to produce a global reconstruction using data from land stations only.  I have not presented results of such reconstructions here, as I am not sure how valid they would be over ocean areas.

RegEM reconstructions

I first attempted a reconstruction with RegEM, using a similar approach to Steig in Antarctica.  In this case, unlike Steig’s, the trends recorded by the satellite data for grid cells containing the 1123 ground stations were close to (90% of) the trends recorded by those stations.  I had imputed the missing ground data without using the satellite data, so there was in any case no question of trends in the satellite data affecting the ground data.  And in order to prevent any of the satellite PCs influencing the imputation of any other PC, I gave all the PCs a very low weighting within RegEM.

I found that ridge regression RegEM wouldn’t optimise the regularization correctly – it is not really designed to deal with a completely infilled data set combined with one that is complete for one period and entirely missing for the remaining time.  So I used TTLS RegEM instead (which is what Steig used).  That meant choosing the number of EOFs to retain in RegEM.  It made sense to retain at least as many EOFs in RegEM as there were satellite PCs to impute, but if the number of satellite PCs used was large it seemed unnecessary to retain more EOFs in RegEM than the number of satellite PCs.  So I used 50 satellite PCs and retained 50 EOFs in RegEM.  As for the ground infilling, I used covariance RegEM.

The 1900-2005 trend map of the resulting reconstruction is shown below.  It is disappointing; frankly it looks wrong, even allowing for this being a reconstruction of TLT temperatures rather than surface temperatures.  There is very little spatial detail, and the negligible trend is inconsistent with data from the land stations and from the established sources of gridded data.

I carried out some experimentation with using different numbers of PCs, which made little difference.  I hunted for errors, and found that I had mis-weighted the result, resulting in a slight error in the trend.  But I have so far been unable to establish where the real problem with the RegEM gridded TLT temperature reconstruction lies.  So I have instead turned my attention to RLS gridded reconstructions of surface temperature.

RLS reconstructions

I decided to use 100 PCs for the RLS reconstructions; experience with Antarctica showed that having many RLS PCs did not lead to over-fitting or other problems.  Regularization was set by randomly withholding a small proportion of the station data consisting exclusively of actual measurements. The program iteratively adjusts the level of regularization so as to minimise the estimation error for the withheld data points. Regularization was performed separately for various sub-periods into which the data was divided.  This was in case the time variation in the effective number of parameters in the ground station matrix led to the optimal regularization strength also being time varying (which indeed turned out to be the case).
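To illustrate the withholding idea only, here is a sketch that reuses the ridge-form RLS from the earlier sketch. The names are hypothetical (obs is a logical matrix marking which entries of Y are actual measurements rather than infilled values), and the real procedure adjusts the regularization iteratively and separately by sub-period rather than scanning a fixed grid of penalties.

### sketch: pick the ridge penalty that best predicts randomly withheld real data
choose.lambda <- function(Y, Vs, obs, lambdas, frac = 0.02) {
  held <- obs & (matrix(runif(length(Y)), nrow(Y)) < frac)  # withhold ~2% of measurements
  Y.train <- Y
  Y.train[held] <- 0                                        # crude masking (zero anomaly)
  err <- sapply(lambdas, function(lam) {
    P   <- t(solve(crossprod(Vs) + lam * diag(ncol(Vs)), t(Vs) %*% t(Y.train)))
    fit <- P %*% t(Vs)                                      # predicted station values
    sqrt(mean((fit[held] - Y[held])^2))                     # RMSE at the withheld points
  })
  lambdas[which.min(err)]
}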

The 1900-2005 trends resulting from the 100 PC RLS land reconstruction using the infilled full set of 1123 ground stations are shown in the below map.

[Figure: trend map 1900-2005 for land 100PC RLS on RRR X of araw.rls 55-100 yr 30mis infl1]

Although the mean trend of 0.033 looked more sensible than that of the RegEM reconstruction, it still seemed on the low side.  Moreover, the regularization parameters showed that heavy regularization was being applied, indicating considerable inconsistencies between the land and the satellite data.  Increasing the number of retained PCs from 100 to 200 had no effect, as I expected.

Investigation suggested that the heavy regularization was to a substantial extent related to the USA stations, on the one hand, by and large having very little correlation with, and lower trends than, the stations in Siberia, whilst on the other hand the satellite data showed a material (albeit not that high) degree of correlation between temperatures in the USA and in Siberia.  The map below shows the regression coefficients of TLT temperatures worldwide on the mean TLT temperature for the USA; the correlation is closely linked to these coefficients.

[Figure: regression coefficients of TLT land on TLT USA mean]
RLS is reasonably tolerant of areas being over or under represented by stations.  In Antarctica, I found that reweighting the RLS regression, so as effectively to be equivalent to having the same density of stations in each of the four main regions, made virtually no difference to the outcome.  However, here the USA is very heavily over-weighted and there is considerable divergence between the correlation information from the stations and from the satellite data. It looked as if the over-representation of USA stations might be leading to USA temperatures significantly influencing estimated temperatures in Siberia and thus skewing the result.

I therefore removed most of the contiguous USA stations, retaining only those whose position in the list of GHCN stations was a multiple of eight, or 62 in all.  That reduced the station set from 1123 to 723 in size.  Producing a RLS reconstruction from this station set resulted in a much lower level of regularization being imposed than with the full station set.  The reconstruction showed trends as per the following map.

This is probably a more realistic result than that from using all 1123 stations.  It appears that the heavy over-representation of USA stations in the full set had indeed been skewing the results, in particular preventing the high trends in Siberia (and Europe) from being fully expressed.  But the cooling in the south eastern USA has gone in the 723 station reconstruction, whilst the cooling in Greenland is greater, which both seem odd. For good measure I also tried selecting every tenth USA station instead of every eighth, but found that doing so made little further difference.

And how does the 723 station RLS reconstruction compare with gridded temperature records prepared directly from station data?   I obtained the following 1900-2005 trend map from GISS (the only provider whose website offered this service).  It is based on adjusted data, and areas with insufficient station records are blank. The patterns are in very broad terms similar, but GISS shows lower trends than RLS in the south east USA and in northern Siberia, and higher trends in Alaska and Greenland.

The (smoothed) pattern of mean land temperature over time per the RLS reconstruction, from 1850 on, is as follows.

The chart of mean land temperature on the UK Met Office website, reproduced below, shows a broadly comparable pattern but with a rather less pronounced rise since 1970.  Note that the baselines of the two charts differ slightly, as do their scales.

Finally, just for completeness, the map below shows a global surface reconstruction based on data purely from land stations, in conjunction with the SVD of the global satellite TLT data.  I am not implying that doing so gives anything like an accurate reconstruction over ocean areas.  However, it is encouraging that the reconstruction for land areas is much the same as that using the SVD of the land only TLT data (as should be the case).  Note that the trends stated in this map are unweighted.  The area weighted trends are 0.0356 (global) and 0.0633 (land).

[Figure: trend map 1900-2005 for global 100PC RLS on RRR X of 62US araw.rls 55-100 yr 30mis infl1]

Conclusions

This is just an initial investigation of the possibilities for reconstructing global temperatures by using surface and satellite data in conjunction, so it is premature to draw any definitive conclusions.  But my initial thoughts are that the generally good correlation of the satellite TLT data with land station records suggests that using the two data sources in combination could work well.  The reasonably sensible results of the initial RLS land reconstructions seem to support that view, although the poor results from the initial RegEM reconstructions are disappointing.

The prospects for combining in situ ocean data with satellite TLT data seem rather less good. The low correlations of sea surface temperatures and (to a lesser extent) marine air temperatures with TLT data are disappointing.  But it may be that the TIR AVHRR data from polar orbiting satellites will provide a better match with in situ temperature measurements.  Further investigation is required.

Nic L

60 thoughts on “Global temperature by combining surface and satellite data”

  1. From paragraph eight:
    “…..an iterative regularized expectation maximization algorithm devised by Tapio Schneider….”

    This seems a compact expression, but “expectation maximization?”

    What is this?

  2. # 1.

    In summary, an expectation-maximization (EM) algorithm is used for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. EM is an iterative method which alternates between performing an expectation (E) step, which computes an expectation of the log likelihood with respect to the current estimate of the distribution for the latent variables, and a maximization (M) step, which computes the parameters which maximize the expected log likelihood found on the E step. These parameters are then used to determine the distribution of the latent variables in the next E step.

    This is taken from http://en.wikipedia.org/wiki/Expectation-maximization_algorithm , where a much more detailed explanation is available.

    EM methods are very well established. RegEM is a more recent extension of EM, and has been widely used (sometimes misused) in climate field reconstructions.

  3. Good morning Nic,

    This bit left me confused

    “However, I opted here to weight the grid cells by the square root of their areas.”

    I don’t understand why you wouldn’t have used just the area for this rather than the root.

  4. Greetings Jeff,

    Yes, it’s not intuitively obvious that one should weight cells by the square root of their area when taking the SVD. Initially I weighted by area, but I found that the resulting PCs differed from those resulting from regridding the globe into equal area cells as per Ryan’s code. After a little thought, I realised why.

    In technical terms, the SVD minimises the Frobenius norm of the difference between the original matrix and each truncated SVD approximation, given the number of retained EOFs. The Frobenius norm is an RMS measure, so the differences arising from each (weighted) cell get squared before being summed and the sum minimised. Hence, in order for area-weighting to give the same result as regridding, one must weight by the square root of the areas, not by the areas themselves.

    Weighting by the square root of each grid cell’s area produces identical PCs to those of the SVD of the cells resulting from regridding to equal areas. As the SVD is of the original cell temperature values multiplied by sqrt(cell area), one has to divide the resulting spatial eigenvectors by sqrt( cell area) in order to get back to the original temperature values.

    When calculating weighted worldwide temperature trends, the weighting I applied is by cell areas, not by their square root.

  5. On another weighting issue, I have now implemented a weighted RLS algorithm. The weighting doesn’t scale the solution, but it affects how much influence each station has on the RLS regression. As it is, a station with twice the variance of an otherwise similar station has twice the influence in determining the outcome of the RLS regression.

    This doesn’t matter too much if the station records are, broadly, consistent with each other (having regard to the correlation between the TLT data for their respective locations). However, the Siberian and other Russian stations have much higher standard deviations than those in the southern half of the USA and all other places apart from Canada and Alaska. As well as their standard deviations, their trends are also generally higher. It is therefore possible that my reducing the contiguous USA station set to 62 stations resulted in the Siberian stations having an undue positive influence on worldwide trends.

    Keeping the set of 62 USA stations (723 stations in total), but weighting all stations by the reciprocal of their variance, restored the area-weighted mean land trend over 1900-2005 to the same value of 0.033 degrees C /decade as resulted from unweighted RLS using the full set of 462 USA stations (1123 stations in total).

  6. # 5. Correction. Variance-weighted RLS on the infilled records of the 723 stations gave a mean 1900-2005 trend of 0.030, not 0.033. The 0.033 figure was for variance-weighted RLS on the original, non-infilled, records of the 723 stations – a variant that I call “direct RLS”.

    Although my implementation of RLS includes direct RLS and has no difficulty with missing values, it is essential that all the data is set to a common baseline. That was a problem in Antarctica, but not in this case. All 723 stations have sufficient data over the 1951-1980 period. Whether normal or direct RLS gives a more accurate reconstruction in this case remains to be seen. My first impression is that the differences between them are minor.

  7. Nic,

    One of the lessons we learned from the Antarctic work was that the sampling density determined in large part the final trend. JeffC (a different one) found that by regridding the surface stations the peninsula region oversampling had a dramatically reduced effect on trend, and overall the continental trend dropped to 0.07, which is, of course, very close to the actual trend.

    https://noconsensus.wordpress.com/2009/02/20/satellite-temperature-trend-regridded/

    Anyway, after seeing the same effect in the US here, it would probably be reasonable to use gridded temperatures rather than individual stations in the setup of this calc. It makes some sense to me that oversampling an area with noise would tend to over-distribute that regions information through random correlation.

  8. NicL, I am most impressed with the statistical tools that you have applied to this problem and the apparent ease with which you do it.

    I have not read your thread introduction in any detail, but I intend to do that soon as I have been doing my calculations of temperature trend variations within 5 x 5 degree grids using the adjusted GHCN data set.

    I have found that to obtain a reasonable number of stations within a 5 x 5 grid around the globe I needed to use the time period 1950-1990. The US has lots of temperature station data for the period 1900-2005, but when you go outside the US, the station numbers drop dramatically.

    I will summarize my work soon after I do some Monte Carlo simulations using the trend data and its variations. What I intended was to estimate a global and regional variation of a mean temperature – given the station coverage and its variations with regards to trends.

    I do not know how my approach will tie into yours (although I know it will be confined to a simple minded approach in line with my talents or lack thereof), but what I have found is that trends within a 5 x 5 grid can vary more dramatically than I suspected they would. I have not done any correlation calculations but I do see that while the trends can have large variations, the time series shapes and breakpoints do seem to conform fairly well.

  9. @Kenneth Fritsch #8

    If I am understanding correctly your MonteCarlo efforts, the goal is to demonstrate the range of ‘sensitivity’ (per gridbox) to variations of the records within it. I imagine that this would show the sparsely-filled boxes are more sensitive to variations of a single record than the US boxes.

    This seems like an important measure of the validity of any/all reconstructions based on the spatially (and temporally) sparse records.

    So if you are doing what I imagine, I hope you can soldier on with it, and that it shows a noteworthy spectrum of sensitivities.
    Else, perhaps you will consider a ‘sensitivity’ plot along the way.

    I am thinking of a ‘sensitivity’ analysis as it is done in Electronic Circuit Analysis, where it is used to deduce the need for low-vs.-high precision components in a mass-produced analog design.

    Please pardon the semi-OT nature of this comment. I can make no pretense of addressing NicL work.

    Perhaps this is all obvious and/or irrelevant.
    In any event, thanks to all for your diligent efforts which exceed my capacity.
    RR

  10. # 7. Jeff,

    I think that in view of the effects of changing the number of US stations it certainly makes sense to try using gridded temperatures. I will do so and report the results. For Antarctica, whilst using some methods the high station density in the peninsula does indeed affect the continental trend, that doesn’t appear to be the case when using RLS.

    Even with data that is gridded, there is the question of whether to weight the data so that high variance series have no more influence than low variance ones, as is done in standard RegEM, for instance.

  11. # 8. Kenneth,

    I agree that station availability outside the USA (and to a lesser extent Europe, Siberia and India) before 1950 is a real issue. It makes one wonder what the true error bounds for global temperatures before 1950 are, even leaving aside concerns about the quality of the data as to inhomogeneities, gradual UHI effects and other non-climatic influences.

  12. Hello all, I hope you guys are having a good time with this. Please let the rest of us know when the English translation is available, if ever. 🙂

  13. # 13.
    Yes, Australia, NZ and other Pacific & Asian nations are included, except any too small to appear as land on a 2.5 x 2.5 degree grid. I can’t spot which map doesn’t show them.

    # 12.
    Yes, this is a rather technical post. Sorry about that. Hopefully these investigations may lead to more easily assimilated results later, but this is early stage stuff.

  14. I understand that you are familiar with US data. It is very likely that the GHCN database does not have all the data available outside the US. Warwick Hughes http://www.warwickhughes.com/blog/ has been a leader in looking at temperature manipulation at East Anglia and is mentioned in some of the leaked emails. You could contact him as he has average raw data of long-term stations in Australia covering the whole country.
    There are hundreds of long-term weather stations in Australia. Data can be downloaded from the following site http://www.bom.gov.au/climate/averages/ but unfortunately not all data has been uploaded from manually operated sites which are now closed in favour of automatic stations at nearby airports. Many of the closed sites at present only show rainfall data. Also, there is no information about changes of equipment and surrounds listed on the database. This has to be tracked down in manual files, which Warwick Hughes has done for a large number.
    It has been indicated, elsewhere, that most of the data in the GHCN database for Australia now comes from sites based in large cities and shows UHI effects. No doubt you are aware of the post on WUWT about the stepped changes at Darwin.

  15. # 16.
    Thanks. I will look at the Australian data you refer to. I was aware of problems with Darwin. At this stage, I have been concentrating on methods of reconstructing global temperatures using surface data in conjunction with satellite spatial information rather than on trying to get the best of the long term surface data. But that is obviously important.

  16. The core problem is with any approach that uses a grid. When you move from sensor data to grids you move from measurements to prediction. There has been no mathematical proof as to how valid (uncertainty, error, pick your poison) the temperature of a grid is. Given the natural variability of temps over a few hundred kilometers (something that can be measured any day of the year in thousands of places on the planet to establish a baseline), the guesstimated temperature for a 500×500 km grid is probably only accurate to +/- 2-3°C (all this is natural variability), assuming probably 50+ data points. Grids with 1-5 data points are worse. Grids without data points and filled by predictions based on nearby grids are worse still.

    The fact is we can use satellite data to determine how far off these grid estimates are. Just as we have locales where we can use 50+ data points to determine the natural variability over distance, we have satellite data in all areas which can be used to derive the drift in these grid predictions relative to the satellite baseline.

    I don’t have to go too far out on a limb to predict that the GISS, CRU and NCDC grid temps are pretty lousy, and completely incapable of detecting changes below a degree.

  17. #7.
    I have now generated RLS land reconstructions based on gridded station data. I took the infilled full 1123 station series as described, centered all series on 1951-1980, then for each of the grid cells that had more than one station in them I took the mean of the station data. 700 grid cells in total were occupied. Using the gridded data unweighted, the area-weighted mean 1900-2005 reconstruction trend is 0.058 degrees C /decade. This is much closer to the 0.062 I originally obtained using the reduced set of 62 USA stations than the 0.033 using the full set of 1123 stations (462 in contiguous USA). That supports the view that, without gridding, the heavy over-representation of the USA in the full station set distorts the reconstruction. Even with gridding, there are a disproportionate number of occupied grid cells in the USA.

    Weighting the surface data for each occupied grid cell by the reciprocal of its variance, to equalise the influence of all gridded data series on the reconstruction, gave an area weighted 1900-2005 trend of 0.039. This is slightly higher than the area-weighted trend of 0.038 for the weighted reconstruction using the reduced set of 62 USA stations. (The mean trend of 0.030 that I stated for this reconstruction in #6 was not area weighted.)
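    A minimal sketch of the gridding step described above, with hypothetical names (ainf is the infilled station matrix and stn.cell gives each station's grid cell index):

    ### average the infilled station series within each occupied grid cell
    cells <- sort(unique(stn.cell))
    agrid <- sapply(cells, function(cl) rowMeans(ainf[, stn.cell == cl, drop = FALSE]))
    colnames(agrid) <- cells   # one column per occupied cell (700 in this case)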

  18. AJSrata:

    I don’t have to go too far out on a limb to predict that the GISS, CRU and NCDC grid temps are pretty lousy, and completely incapable of detecting changes below a degree.

    I have to disagree with you here. In terms of relative changes (“precision of measurement”) they can reliably detect differences of tenths of a °C/century, and are probably accurate to (at worst) 0.25°C/century.

    Regarding your criticism of gridding, the first thing to recognize is that the variation in temperature trend is organized strongly south-to-north (so it makes sense to average it over bands of equal latitude), see NicL’s first figure above. Secondly, the trend varies slowly enough with latitude that one can average over as much as 5 degrees safely [there is a standard theory of binned data that applies here].

    There is a difference between land and ocean (presumably because of differences in the boundary condition for ocean surface versus land surface with respect to the atmospheric boundary layer), so it makes sense to divide up the ocean and land and treat them separately. One can go further and separate out coastal/island climes as a third class of temperature fluctuations, if you wanted to.

    In terms of your comment on natural variability, by integrating over bands of equal latitude you substantially increase the surface area you are averaging over, which dramatically reduces the amount of fluctuation left in the data: to a large extent you are spatially filtering out many of these regional-scale variations by averaging over a large area. A large area is needed because, by and large, regional-scale temperature fluctuations are highly correlated over distances (at equal latitude) of up to 1000 km.

    I don’t know what you mean by “mathematical proof”; perhaps you mean a theoretical framework that can be written out and rigorously justified. It can certainly be written out; perhaps somebody else knows of a formal description for climate data (I don’t).

    I’ve worked on it a bit on the side, and what you end up with is a “teleconnections” function that relates the temperature trend averaged over any region (with local fluctuations removed) to the global mean trend. If a really good global climate model existed, you could reliably compute this teleconnections function pretty much from first principles using known climate inputs.

    For proxy reconstructions (where you will never have good global coverage) you actually need this teleconnections function to reconstruct global temperature, i.e. to extrapolate your very limited regional-scale proxy values to global mean temperature.

    For climate, one is dealing with temperature trends rather than just temperature, and trends measured over long time periods (30 years+).

    [There are some comments I made in this thread which address how I think one should go about developing a formal methodology. Criticisms obviously are welcome.]

  19. Luke, here’s my version of the latitude effect. I started from CRUTEM3v rather than the raw GHCN data and this looks at the period 1960-2009 inclusive.

    Tdot(theta1, theta2; t1, t2) is the temperature trend (dT/dt) for temperatures in the range of latitudes theta1 to theta2, averaged over the period t1 to t2 (t2 > t1), shown in units of °C/century.

    This is the land-only trend of course (centroid of the data is at 15°N).
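    As a minimal sketch of how the Tdot statistic defined above could be computed from a gridded anomaly data set in base R (the data frame grid with columns lat, year, anom is hypothetical; this is not the actual CRUTEM3v processing):

    # Trend in degrees C/century for cells between latitudes theta1 and theta2,
    # over years t1 to t2, with a cos(latitude) area weight within the band
    Tdot <- function(grid, theta1, theta2, t1, t2) {
      sel <- subset(grid, lat >= theta1 & lat < theta2 & year >= t1 & year <= t2)
      w   <- cos(sel$lat * pi / 180)
      num <- tapply(w * sel$anom, sel$year, sum)
      den <- tapply(w, sel$year, sum)
      band <- data.frame(year = as.numeric(names(num)), anom = as.numeric(num / den))
      100 * coef(lm(anom ~ year, data = band))[2]
    }

    # e.g. Tdot(grid, 30, 35, 1960, 2009) for the 30N-35N band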

  20. I’m not sure of some of the basic assumptions here. For instance, the earth has been in a warming mode during most of the satellite era – it started just at the end of the previous cooling period. I think you’ll need at least a full 60-70 year cycle of data to make this work.

  21. This is an exciting project Nic. No doubt you have worked through much of the methodology with Antarctica.

    I have just now read the article and have not had a chance to re-read and study so I apologize if this question has already been dealt with. Since there is greater HF variability with TLT when compared to surface, does this have potential to introduce unwanted bias?

  22. This work is good as a concept. However it does not seem to circumvent the various issues with GHCN data that have been found by Chiefio. Nor does it avoid the type of issues uncovered by the surfacestations project. Nor does it circumvent the socioeconomic impact on surface temperature measurement that has been confirmed by Ross McKitrick. Nor does it get around the types of issues recently found in respect of the Darwin or Mackay temperature records.

    Nor does it get around the station choice issue. For instance, there are only around a dozen surface stations currently reporting from NZ into GHCN. But there are in reality more than 100 currently active stations with > 20 years continuous measurement, and more than 350 NZ stations that have more than 20 years continuous measurements at some time during the last 110 years. The opportunity for cherrypicking by GHCN is clear for NZ data just as it is clear for US data. There is a very clear urban bias in the NZ station data currently being input into GHCN. Many other countries will be affected in a similar manner.

    When these types of issues are dealt with there might be some prospect that a project of the type started by Nic L could produce a really useful result.

  23. #24. It shouldn’t matter that the satellite data is taken from a warming period, provided that the spatial correlations of temperature movements are the same whether temperatures on the whole are going up or down. I agree that is an assumption that should be tested, so far as it is possible to do so.

    One obvious way is to split the satellite data into two or three different periods and see if the pattern of correlations (as shown by the spatial eigenvectors) is similar. I shall do so. Over the last 11 years, temperatures generally have not been rising.

    The work in Antarctica, where the satellite data showed very high (and mainly spurious) trends, showed that the satellite spatial correlations were little affected by the trend of satellite temperatures.

  24. RE 26.

    I don’t think you understand what Nic L is doing. Best to stand back and watch.

    “this post makes no effort to correct for surface station problems but rather is about the uniqueness of the method and the issues involved with a complex problem.”

  25. #25. Layman Lurker,
    Thank you for your comment and interesting question. Overall, the TLT data shows lower HF variability than the surface station data. If temperatures for a grid cell fluctuated equally at the surface and in the lower troposphere, then one would expect more HF variability in the surface station data due to random, very localised fluctuations, the surface data being a point measurement whilst the satellite TLT data is an average over a fair-sized area.

    In such a case the regression coefficient of surface data on the TLT data for the corresponding satellite TLT grid cell should however be approximately one. That is the case on average for the set of 1123 stations I selected. However, the relevant map (unnumbered, I’m afraid, but just before the Surface data – Ocean section) shows a clear general north-south divide, with the regression coefficient being below one for most stations south of roughly 20N and modestly above one for most stations north of that latitude. The pattern of regression coefficients is similar, but by no means identical, to the pattern of (squared) correlations shown in the preceding map.

    Some of the north-south difference is no doubt due to the known poor quality of the TLT data over high-altitude areas (e.g., the Himalayas, the Andes, the Tibetan plateau) and, for island and coastal stations, to land/sea temperature fluctuation differences. But I do not know the reason for the general north-south divergence in the strength of the relationship between surface and satellite temperatures. Further investigation is required. Possibly it is related to the ocean making up a greater proportion of the southern hemisphere. And a different relationship between station and satellite data might be expected polewards of about 45S and 45N, where the satellite data shows much higher variability than elsewhere.

    To answer your original question, I don’t think differences in HF variability between the satellite and surface station data should in themselves bias the reconstruction, whether the RLS method or RegEM is used. It is less clear whether variations in the relationship between the satellite and surface station data, particularly as to the regression coefficient between them, could bias the result. My initial thinking is that, whilst in principle they could, in practice any distortion is likely to be fairly small, as the TLT grid cell correlations that drive the reconstruction are mostly within regions and most of the major differences in the relationship of station data to local TLT data arise in regions that are well separated.
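    For anyone wanting to reproduce this sort of diagnostic, a minimal sketch of the per-station calculation in base R is below; the vectors surf and tlt are hypothetical aligned monthly anomaly series for one station and its satellite grid cell.

    set.seed(1)
    tlt  <- rnorm(120)                  # 10 years of monthly TLT cell anomalies (simulated)
    surf <- tlt + rnorm(120, sd = 0.5)  # station anomalies = cell signal + local noise

    coef(lm(surf ~ tlt))[2]             # regression coefficient of surface on TLT (~1 here)
    cor(surf, tlt)^2                    # squared correlation, cf. the preceding map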

  26. I’m afraid I don’t understand it either, Steven.

    Raw station data is 100% useless for determining anything at all. We can argue with “the powers that be” about whether or not they’re applying _correct_ adjustments, but there is no argument that adjustments must be made to raw station data to get something useful.

    You can’t even assume UAH data can be used to make “better” adjustments to the raw data. Adjustments need to be made based on the characteristics of the individual raw readings, not based on the characteristics of some other data set (UAH) that is assumed to be “better”. The information needed to make correct adjustments to raw surface temperature data isn’t going to be found in UAH data. It’s just not there.

    And, for the record, I’ll dispute the notion that UAH data is “better”. That’s not a defense of surface readings, it’s just a statement of fact that _no one_ has vetted UAH data to anything close to the extent that surface readings have been vetted. Few folks, I would guess, even know that UAH data is built using statistics that make adjustments to the raw readings more than 3 times larger than the size of the largest anomaly ever reported by UAH, and more than 10 times larger than the average anomaly.

    If I’m misunderstanding what’s going on here, I apologize. I’m coming into this cold. And I do want to say I appreciate the effort being done here. It’s good to see folks trying to take control of what seems to be a completely out of control process. But if I am misunderstanding things, perhaps an explanation targeted at folks like me who just happen to “wander in” would be useful. Specifically, how, in detail, this process hopes to create a better record of surface temperatures.

  27. P.S.

    A mistake in my previous post. UAH statistical adjustments to the raw data are more than 14 times the size of the largest anomaly they’ve ever reported and more than 50 times the size of the average anomaly.

  28. There is a lot of confusion about what this post is attempting to do. I suppose you would have had to follow the Antarctic reconstruction math pretty closely, especially considering that the final paper submitted for publication contains the bulk of the new work with these methods and we cannot publish any of it online.

    If you have satellite data with good spatial coverage — every gridcell has a measurement every month — and you have ground data for a lot of gridcells but not all, how can you infill the rest of the ground grid?

    In the past CRU, GISS and UAH used different methods to infill between cells. CRUTEM leaves them open, UAH uses kriging (I believe) at the poles, and I don’t remember GISS offhand but I believe they do something as well. This post is the first time that satellite information has been used to distribute the ground information using correlation to high resolution satellite data rather than spatial position. The method allows general weather patterns to be taken into consideration when choosing what area each thermometer best represents. However, that said, there are considerable difficulties in achieving a confirmable result. Station area density is one of them, sea surface temperatures are another, but there are others as well.

    The problem may be solvable and the result shouldn’t be that different from simpler methods but these are really the first (big) steps toward attempting a combined satellite/surface data solution to the problem.

    Something which should stand out from this, and which a bunch of us non-pros should realize, is the horrible correlation between sea surface temperatures and lower troposphere temps. What it means is that 70 percent of the surface air temp isn’t being well sampled. I assumed the correlation would be much better, so I learned something there.

  29. #30.
    The UAH TLT data, specifically the correlations that it shows between fluctuations in different grid cells, is being used purely to derive the spatial distribution of gridded temperatures everywhere using data from a limited number of surface stations. It is not used to correct the surface station data. The underlying thinking is that the satellite data may provide a more accurate way in which to estimate temperatures in cells where there are no representative surface records than existing interpolation, etc., methods.

    I would expect the pattern of correlations shown by the UAH data to be fairly robust to most types of errors in the data, judging by experience with the Antarctica satellite data.

    The question of adjusting surface temperature records is separate. I used raw data because the availability of long term adjusted station data outside the USA is much more limited. And, leaving aside specific issues whose effects are in one particular direction (UHI; in the USA, TOB changes), it is not obvious that adjustments in one direction will tend to predominate. To the extent that the distribution of appropriate adjustments has equal weight in the positive and negative directions, why should they bias trend estimates (which is what we are mainly concerned with)?

  30. magicjava:

    Raw station data is 100% useless for determining anything at all. We can argue with “the powers that be” about whether or not they’re applying _correct_ adjustments, but there is no argument that adjustments must be made to raw station data to get something useful.

    I have to agree with you here. I think it’s useful to run with and without various adjustments… since the magnitude of the correction gives us some bound on how important that correction actually is. A better approach (really you do both) is that being taken by Chad right now, where you Monte Carlo the effect and accuracy of various adjustment algorithms.

    First, an algorithm must be reliable before it is used: the correction from the algorithm needs, on average, to improve the accuracy of the outcome. Secondly, the correction must be significant with respect to the metric of interest (the temperature trend here); only then should the adjustment be made. Significant here is usually some fraction of your target measurement error, which I usually assume is about 0.1°C/century, and the “rule of 10” then says you want to make corrections that affect the outcome by 0.01°C/century.

    More formally you build an “error budget” and identify those adjustments that significantly contribute to the accuracy of the final product, and work out the minimum number of adjustments needed to achieve that accuracy.

    In the end, even if you know an adjustment is “needed” in the sense that the data really suffer from a given flaw, you shouldn’t make the adjustment unless the flaw leads to a meaningful effect on the final result.

    And, for the record, I’ll dispute the notion that UAH data is “better”. That’s not a defense of surface readings, it’s just a statement of fact that _no one_ has vetted UAH data to anything close to the extent that surface readings have been vetted. Few folks, I would guess, even know that UAH data is built using statistics that make adjustments to the raw readings more than 3 times larger than the size of the largest anomaly ever reported by UAH, and more than 10 times larger than the average anomaly.

    I agree about it not being vetted to the same extent, but I think you’ll find the actual variability of the ground based climate data to be the same (I’d expect it to be larger actually) if you compare apples to apples (e.g., daily record).

    Monthly records in the GHCN2 database are highly smoothed already; you need to compare the variability in daily ground-based records in order to get an honest comparison.

  31. Jeff Id –
    Thanks for the clarification on the intent of this process. It’s good to know we’re not trying to correct raw readings with satellite data. Also, the differences between LT and SST were something I wasn’t aware of and are useful to know.

    NicL –
    I think that if you want to use raw surface station measurements you need to give strong reason why and show that for this procedure the raw readings give more realistic results than the adjusted readings. The justification for making adjustments goes back decades and is based on common sense things like “It’s hotter during the day than at night.”

  32. Carrick,

    What do you think about using first differences as opposed to ‘adjusting’ the data?

    Most of the adjustments (TOBS, station move, equipment change) are step functions that are applied for one month in the series.

    If you just calculate first differences and eliminate the months where the changeover happens, you don’t need to “adjust” the data.

  33. 28. Steven Mosher

    I think I do understand what is being attempted and I am rather excited by what is happening. It’s just that, like many others, I do not trust one component of the data.

    When we have a quality controlled set of thermometers, where we know that the historic temperatures contain no unavoidable bias and no unavoidable human-caused warming trends, then the “new” method will be very useful. I suspect that it might be possible to use such a method to actually detect the location and extent of urban warming in surface-station records for the satellite era.

    I certainly do not want to be critical of the concept and as you say it will be good to settle back and watch developments.

  34. magicjava,

    I don’t think you have to make adjustments.

    I’ll explain why.

    you have a monthly series of temperature averages.

    1,1,1,1,

    Then you decide to change equipment to MMTS. In practice this change was just made and the effect now has to be estimated; that estimation comes with some error, which gets ignored.

    Let’s just imagine that the change led to a 1 degree jump.

    1,1,1,1,2,2,2,2

    What do you do? Well, you can adjust the 2’s down to 1:
    1,1,1,1,1,1,1,1

    That’s OK if the change is a deterministic one, a change that isn’t estimated. But what if my change can’t be estimated, or is estimated with a big error?

    Why not just look at first differences?

    0,0,0,1,0,0,0

    Then you drop the case where the change was made.

    It’s slope preserving. All you lose is 1 sample of the entire series.

    But when you adjust and don’t carry the error forward, you get a false sense of certainty.

    You don’t need to make certain adjustments, especially if you are measuring trend.

    UHI contamination? That’s a different animal. But a station that has always been urban doesn’t need an adjustment IF you are measuring trend and if the change in urbanity has been slight (depends on the time period).

    Actually NicL can test his approach by holding back data.

    He can probably test which predicts better: adjusted data, raw data, or first differences.

    Calculating first differences requires station histories.
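    As a rough illustration of the first-difference idea, a minimal sketch in base R (the series is hypothetical, and the changeover month is assumed known from the station history):

    temps <- c(1, 1, 1, 1, 2, 2, 2, 2)   # artificial +1 degree step at month 5
    change_month <- 5                     # known equipment changeover

    d <- diff(temps)                      # first differences: 0 0 0 1 0 0 0
    d <- d[-(change_month - 1)]           # drop the difference spanning the changeover

    # The surviving differences carry the trend information; the step is gone and
    # only one sample has been lost. For this flat example the implied trend is zero.
    mean(d)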

  35. Jeff ID:

    Something which should stand out from this, and which a bunch of us non-pros should realize, is the horrible correlation between sea surface temperatures and lower troposphere temps. What it means is that 70 percent of the surface air temp isn’t being well sampled. I assumed the correlation would be much better, so I learned something there.

    I’d have to guess this is due to the fact that you generally have a temperature inversion in the marine atmospheric boundary layer (ABL). When that happens, you get a partial decoupling of the mechanics within the boundary layer and the atmosphere above it. (E.g., see this.)

    Secondly, quite often the circulation within the marine ABL is driven by the mechanical motion of the surface ocean waves. It’s easy to see how a measurement of the atmosphere above the ABL (as given e.g. by VORTEX) wouldn’t have as much to do with surface temperature as it would over land.

    I’ve pointed out for a while that GCMs don’t have an ABL, but all of the surface data is collected from inside of it. Possibly, satellite based measurements would be a better matchup to what the climate models actually simulate than ground based measurements (especially if you instrumented the GCM code to produce the exact same measure used by the satellites.)

    I guess the Antarctica work had an advantage in that you were measuring surface (skin) temperature directly. That is probably better for oceans than for land; on land, surface soil temperature is significantly different from surface air temperature too. [Soil is generally permeable to air, and is not usually simply structured like e.g. sand, so the mechanical and thermodynamic coupling of soil to air is pretty complicated.]

  36. Interesting about the marine atmospheric boundary layer. I must look into this point.

    The same polar satellite AVHRR skin temperature data that was used in Antarctica is also available for all higher latitudes, and may for the oceans prove to be a better match to the in situ measurements than the satellite TLT data. But if one were seeking to reconstruct TLT temperatures over ocean areas that wouldn’t help. Nor is AVHRR data available over the bulk of the earth’s surface.

    BTW, I have seen someone posting at Watts Up With That under the name NicL. No connection with me. Odd coincidence – I am the only person I know who spells his nickname “Nic”, without a k.

  37. Steven,

    I think you guys should do what you think is best, you’re the ones doing the work.

    But if you decide not to do adjustments, I’d expect a reaction like mine would be pretty common. The idea of adjustments is backed up by decades of literature and common sense. If you’re going to brush that aside, make sure you present a strong case on why you did what you did.

    Specifically, I _wouldn’t_ consider something along the lines of “We didn’t use adjustments because we have some other technique we like better”, and leaving it at that, to be a strong case. As I said in my previous post, I think you’re going to have to show that whatever technique you use or don’t use gives a more realistic result than using the standard adjustments.

    And, basically, if it’s not a big deal to your results, I’d just stick to the adjustments and avoid having to fight the fight as to why you left them out. If you don’t absolutely need to leave them out, leaving them out just distracts from the rest of your work.

  38. #41, The adjustments are incomplete, in many cases incorrect, and my understanding from another comment left here is that even the authors of the GHCN data adjustments don’t recommend using them in their current state. Even the CRU stuff I’ve compared to GHCN used unadjusted data.

    Even without that information, the fact that someone has adjusted data would not be a reason to use it. It’s up to the people who took the data to explain why 57 degrees as measured from a calibrated thermometer in July 1945 is really 56 degrees. I doubt very much that people will argue for use of GHCN ‘adjusted’ data.

  39. NicL:

    The same polar satellite AVHRR skin temperature data that was used in Antarctica is also available for all higher latitudes, and may for the oceans prove to be a better match to the in situ measurements than the satellite TLT data

    I’ve been arguing for a while SST and satellite TLT aren’t measuring the same thing.

    If you could show that you get a great correlation with AVHRR data and SST, that might be the basis for a paper by itself.

  40. Jeff ID:

    #41, The adjustments are incomplete, in many cases incorrect, and my understanding from another comment left here is that even the authors of the GHCN data adjustments don’t recommend using them in their current state. Even the CRU stuff I’ve compared to GHCN used unadjusted data.

    I think at the least, one needs to do QC on any adjustments you make.

    As I mentioned above, the two primary conditions are 1) effect of adjustment is significant with respect to your accuracy goal and 2) adjustment reliably improves the accuracy of the measurement on average.

    Chad is doing some interesting work in that regard btw.

    Take home message is you shouldn’t be applying adjustments that don’t meaningfully affect your results (UHI is one of these; according to my calculations it amounts to 0.03°C/century, hardly an important correction). And you should only apply the correction if on average it yields a significant improvement in accuracy.

    In other words:

    • Don’t apply corrections that don’t matter.

    • Don’t apply corrections that make things worse.

  41. Steven Mosher:

    If you just calculate first differences and eliminate the months where the changeover happens, you don’t need to “adjust” the data.

    That corrects one type of error, namely the DC shift associated with station moves or changes in instrumentation. In doing so it increases the noise by a factor of sqrt(2).

    It would be interesting to try this and see whether combining data this way makes any difference. If it doesn’t affect the central values, we could argue from QC principles that these corrections don’t need to be made and should be left out.
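    A quick base R check of the sqrt(2) point, using simulated independent errors (purely illustrative):

    set.seed(1)
    noise <- rnorm(1e5)     # independent measurement errors, sd = 1
    sd(diff(noise))         # about 1.414, i.e. sqrt(2) times the original sd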

  42. In contrast to the AVHRR data, the TLT data does not measure temperature at ground level (although it does measure air temperature and not ground skin temperature).

    The TLT and TMT channels do pick up emissions from the skin surface. Over land the contribution to the signal can be as much as 20%.

  43. Re: Chad (Feb 15 22:07),

    Of course they correct for that. UAH and RSS use different algorithms and data for that correction. It’s also the main effect behind the orbital drift correction. If the orbit drifts (as opposed to decays which is yet a different correction), then the time of day that the satellite passes over a particular area drifts and the surface temperature correction changes. Aqua doesn’t drift, which makes the correction much simpler and probably more accurate for UAH since 2002 when Aqua went into service.

  44. Re: 47

    DeWitt Payne,

    I asked Christy about this a while back because I had questions about some weighting functions from RSS that were skin surface temperature/pressure dependent, and he told me that a proper construction of a synthetic MSU product must consider the surface emission from land and ocean. From two of his somewhat recent papers:

    To be able to compare sonde and satellite data, brightness temperatures simulating the satellite observations of the three layers are computed from sonde profiles. This is done with full radiation code that includes the tiny effects of humidity and surface emission and reflection.

    Christy, J.R., and W.B. Norris, 2006: Satellite and VIZ–Radiosonde Intercomparisons for Diagnosis of Nonclimatic Influences. J. Atmos. Oceanic Technol., 23, 1181–1194.

    Sonde and satellite data can be compared by using the sonde observations to simulate the satellite brightness temperatures. This is done by taking the sonde temperatures at each pressure level and proportionally weighting them according to the contribution of that level to the microwave product. Full radiation code is applied to all reporting levels so that the effects of humidity and of surface emission and reflection are included.

    Christy, J. R., W. B. Norris, R. W. Spencer, and J. J. Hnilo (2007), Tropospheric temperature change since 1979 from tropical radiosonde and satellite measurements, J. Geophys. Res., 112, D06102, doi:10.1029/2005JD006881.

    Therefore I think that the effects of surface emissions are present in the MSU data because it is included in inter-comparisons with sonde-derived synthetic data.

  45. Re: Chad (Feb 15 23:09),

    If one knows the temperature profile, including the surface temperature and emissivity, then one can use the full radiation code and the weighting function of the detector to create a unique raw brightness temperature reading that should be measured by the satellite for that particular location and time. The reverse problem is ill-posed. There are an infinite number of temperature and humidity profiles that will produce the same limited set of noisy satellite microwave intensity readings, from channels whose weighting functions overlap both with each other and, for some channels, with the surface. In the microwave, you do get to use the Rayleigh-Jeans approximation, which helps. Then there’s the problem that you only get a very limited number of readings each day for each grid box, and from those you have to construct a daily average temperature. Knowing how they use radiosonde data to check the MSU readings does not tell you how they convert MSU readings into LT, MT and LS temperature anomalies.

  46. DeWitt Payne,

    Look at page 2018 of “Analysis of the Merging Procedure for the MSU Daily Temperature Time Series”, Christy et al. 1997. It shows the weighting functions for TLT, TMT and TLS. TLT and TMT do have a non-zero surface contribution.

    On page 2023 it says

    In the subtropics, however, there are also substantial longitudinal variations in surface topography. For land below 500 m, T2LT emissions from the surface account for about 15%–20% of the total signal and for oceanic surfaces, about 10%. At higher elevations, the surface “shines through” more and more because the oxygen overburden, from which the atmospheric emissions originate, becomes less of the observational signal.

    The resultant data for TLT and TMT are weighted combinations of view angles for data that contain surface contributions. If I’m wrong, well, then I’m wrong. But I don’t think that I am. I’ll ask Christy about this.

  47. RE45.

    Agreed. I was going to pull up a few stations just to test. I think the best candidates would be either stations with only an MMTS change or only a TOBS change.

    QC principles tell me not to make the adjustment… I think.

  48. Re 41.

    Thx Magic.

    Before anybody did it I think you’d have to show the work, obviously. The best candidate is TOBS. TOBS is, in my view after reading through the 1986 paper which justified and documented the adjustment, a pretty good candidate for elimination. It’s an empirical estimation of the impact of a change in time of observation. I don’t have the errors associated with the adjustment off the top of my head, but I recall being unimpressed by the empirical model. In fact different agencies appear to use different models. So it’s a ripe one.

  49. Magic

    Let’s do a little recap on adjustments typically made in USHCN (for example):

    1. TOBS. The observer switches from collecting temps at 7AM to noon. In 1986 Karl studied this; it’s a model of the change.
    2. MMTS. The change is estimated from a field study.
    3. SHAP, for changes in station location, altitude and latitude.

    and a mystery

    http://climateaudit.org/2007/08/22/the-ho-83-hygrothermometer/

    If you read any of these you won’t be that impressed. Oh, and SHAP is an undocumented black box.

  50. Steven Mosher, at 38 you say “UHI contamination? that’s a different animal. But a station that has always been urban doesn’t need an adjustment”. Is that true? Is there not a change of technology in buildings and equipment? Is it not more likely that the UHI increases progressively? In Australia at least, a rural station at a Post Office in the early 1900’s may have been on the lawn in a 1/4 acre block beside the residence, with the PO at the front. There may not have been another building within 100 yards. Over time a few shops will be built. The dirt road at the front will be widened. The residence is taken over to enlarge the PO and the lawn disappears.
    From the 1950’s the roads become bitumen. A health centre with parking is put adjacent to the weather station. The health centre gets air conditioning, as does the PO. Footpaths, kerb and guttering go in; a multistorey professional block is erected in place of one of the old shops. A supermarket is erected with parking on bitumen for 300 cars, etc. The town may have grown from a population of 50 in 1870 to 100,000 in 2000. At that point the station is closed and moved to the airport. Darwin has gone to over 200,000.

  51. Re: Chad (Feb 16 01:59),

    The resultant data for TLT and TMT are weighted combinations of view angles for data that contain surface contributions.

    Yes. But they use the weighted combination to correct for the surface contribution and also to improve the pressure resolution of the TLT measurement. RSS uses a different method based on some sort of model (I think that’s correct anyway). So the surface emission does contribute to the raw measured data value. How much it contributes to the final product is a different question. I remember a lot of discussion on how much stratospheric cooling contributes to TMT and TLT a while back. It’s really the same problem with the same answer: It contributes to the measured microwave emission intensity but the algorithm used to calculate the product is supposed to remove that.

  52. Re: 55

    DeWitt Payne,

    Christy got back to me and said that the surface emission is a small part of the signal and is still in the final product, but when reduced to anomalies, the surface component is very small and that the signal is dominated by atmospheric changes.

  53. NicL, I hope you do not mind me piggy-backing this post onto your thread. It relates to your post, but addresses a different issue for which I am asking for suggestions on how to proceed. My last and only attempt at a thread here was a PR disaster for Jeff ID, as the thread, apart from my own posts and a polite one or two by Jeff, generated pretty much no interest.

    I am having a conceptual quandary about the proper method to use to calculate a confidence interval for global or regional temperature anomaly trends, given the grid coverage for the globe or regional area and the variation of those trends within the grid.

    Let me begin by explaining what I have done to this point in measuring grid coverage and within grid trend variations. Essentially I have used the GHCN adjusted data set for the time period 1950-1990 and grids bounded by 5 degree latitude and 5 degree longitude.

    I used adjusted GHCN data since I judge that if I use a temperature data base I use it the way the owners intended it to be used, or I construct my own methods for adjusting the raw temperatures. I do not think that many of us would contend that the raw data require no adjustments. In other words, we can agree on some adjustments and the rationales for using them, or we admit that the raw data tell us essentially nothing, or something with unknown and large uncertainties. Anyway, my interest is in estimating and assigning variation in temperature trends and not in determining an absolute temperature trend.

    The period of interest was selected based on getting some glimpse into the time period where the instrumental temperatures are expected to rise significantly due to GW/AGW and a period that allowed a reasonable sampling of global grids and stations.

    The station availability for GHCN adjusted temperatures by grids is summarized in this link:

    First of all, dividing the globe into 5 x 5 degree grids yields 2592 grids, and since the land makes up approximately 28% of the total global area we have 726 land grids – assuming no ocean/land overlap. With overlap we will have something like 780 land grids.

    What the table in the above link shows is the number and percent of grids that have a given number of stations per grid, with stations that provide a temperature data set that runs at least from 1950-1990. This does not mean that these stations have continuous coverage for the entire period, as we shall see later. We can see that 231 grids have only one station’s coverage, 107 have 2, 54 have 3, and 90% have 10 or fewer stations. The total number of grids accounted for here is only 539, and therefore if we subtract that number from 780 we see that approximately 240 grids have no stations covering them to provide adjusted GHCN temperatures. As has already been shown, these stations are by no means distributed evenly around the globe and in fact many are from the US. Also, the station coverage of the globe by way of numbers becomes very misleading unless one realizes the very uneven spatial coverage that entails. It also points to why it is important to attempt to make an estimate of the uncertainty that that spatial coverage implies, assuming reasonably large variations in the station trends.

    The linked table here presents the grid temperature anomaly trend data handled in two different ways.

    Each 5 x 5 grid with stations with continuous 1950-1990 yearly data is listed in the first 2 columns of the table. A designation of Lat 60 and Lon -150 encompasses the area from >60N to 65N and <150W to 155W. Note that while there were 539 grids with at least one station that covered the period 1950-1990 (which actually means having at least one year’s coverage between 1950 and 1990), only 193 grids had complete 41-year coverage (with some missing months).

    The calculations were made firstly by calculating the individual station trends in degrees C per decade and then calculating a mean and standard deviation of those trend values over the designated grids. Those values are entered in columns 3 and 4 of the table. The second calculation method was performed by averaging together the time series of the stations in each grid into one time series per grid. Each grid time series was then used to calculate a trend (slope) and the standard error of the trend (slope). Those values appear in columns 6 and 7 of the table.

    As can be seen, the two methods yield the same trends for the grids but the variations for the trends are different. The average grid trend for both the first and second methods is 0.130 degrees C per decade, but the average variation for the first method is 0.123 and for the second method is 0.082. Finally, when the time series for each grid are averaged together to give one final grand average time series for all the grids, the trend (slope) is 0.130 degrees C per decade and the standard error of the trend (slope) is reduced to 0.035. I have a problem with stopping at this point and conceding that the overall variation from the sparse station coverage and station variations is measured by 0.130 +/- 0.070. What about those grids with one or no station coverage? I think it is easy to see that for a single station the luck of the draw from potential stations within the grid could have an influence on the overall variation, yet I do not see how that is accounted for by averaging time series together.

    Here is my conceptual problem: I am thinking in terms of what a Monte Carlo (bootstrap) approach would require for estimating the confidence intervals implied by the variations measured in this exercise. Combining the data to obtain trial overall trends for all the grids would, I think, require averaging the data for each separate trial as I did in the second method and then into a final grand time series. In layperson’s terms, I think I am looking at two sources of variation: one from the individual station data and one from the variation between the stations within a grid. How do I capture both sources of variation? I am thinking it might require some form of ANOVA. Can anyone help me to the next step?
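    One possible starting point, sketched in base R below, is a station-level (within-grid) bootstrap that resamples whole stations within each grid cell and then re-does the cell and grand averaging on each trial; the within-station noise could be added by also resampling residuals. The data frame stn (columns cell, station, year, anom) is hypothetical.

    boot_trend <- function(stn) {
      cells <- split(stn, stn$cell)
      # Stage 1: within each cell, resample whole stations with replacement
      resampled <- lapply(cells, function(df) {
        ids  <- unique(df$station)
        pick <- ids[sample(length(ids), length(ids), replace = TRUE)]
        do.call(rbind, lapply(pick, function(id) df[df$station == id, ]))
      })
      # Stage 2: average stations into one series per cell, average the cell series
      # into a grand series, and take its trend (degrees C/decade)
      cell_series <- lapply(resampled, function(df) aggregate(anom ~ year, df, mean))
      grand <- aggregate(anom ~ year, do.call(rbind, cell_series), mean)
      10 * coef(lm(anom ~ year, data = grand))[2]
    }

    # trends <- replicate(1000, boot_trend(stn))
    # quantile(trends, c(0.025, 0.975))   # bootstrap confidence interval

    Note that single-station cells contribute no resampling variation at all, which is exactly the coverage problem raised above; one could also resample the cells themselves to reflect it.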

    I can provide R code for all calculations at a later time.

