Global temperature by combining surface and satellite data
Posted by Jeff Id on February 13, 2010
This is a complex guest post by Nic L. It represents early results of a method he and Ryan have been working on for global temperatures. This post uses the concept of regularized least squares, similarly to the RegEM method of Steig et al, whereas you have two datasets, one with more trend information(ground stations), the other with more spatial information (satellite). There is a huge amount of work here and the concept is fully publishable, especially when you see the huge warming result at the end ;). Because of that, this post would probably be one which would make the cut for Tamino or RealClimate, however, this post makes no effort to correct for surface station problems but rather is about the uniqueness of the method and the issues involved with a complex problem.
——NicL 11 February 2010
My earlier post on long-record GHCN stations sprang from my interest in using them to create an improved global gridded temperature history from 1850 on, by combining the data from a set of good quality long record surface stations, augmented by ocean data, with the gridded satellite TLT (lower troposphere) data produced by UAH and/or RSS since 1978.
A well known example in the peer reviewed literature (Steig et al, Nature, January 2009) applied this sort of approach to reconstructing a 50 year gridded temperature history of Antarctica using a relatively sparse set of ground stations, many of which had incomplete records and some of which did not cover the full 50 year period, in conjunction with AVHRR satellite TIR (thermal infra-red) data. Steve McIntyre started an investigation of Steig’s Antarctica reconstruction and Ryan O, Jeff Id and myself took this up and spent a lot of time and effort looking into Steig’s work and finding ways to improve on it.
Following on from that work, I have been seeking to apply some of the same techniques on a global scale. Ryan has also begun exploring this area, we have been exchanging ideas and the plan is to work on it together, hopefully along with Jeff. What is presented here is very much a “proof of concept” first cut approach.
The rationale for combining in-situ surface temperature data with satellite data is that the surface data is available going back much further than satellite data, but is spatially very incomplete, whilst the satellite data is spatially complete (or nearly so) but is available for only the last thirty years. Using mathematical techniques, spatial information from the satellite data can be combined with the temporal information from the surface data to produce a gridded reconstruction over the entire period that surface temperature measurements are available.
Combining data from satellite and surface measurements in the proposed way relies on the assumption that the pattern of spatial correlations of temperature fluctuations shown by the satellite data was the same in the pre-satellite era. Over the relatively limited time period involved in this case (160 years), that seems a reasonable assumption. It can be tested to an extent, but I have not attempted to do so at this stage.
There is more than one way of using the satellite data. Because of the very large number of grid cells, it is necessary first to reduce the dimensionality of the satellite data. This is generally done by performing a singular value decomposition (SVD) of the matrix of monthly satellite temperature data. This is performed after adjusting the data to anomalies by subtracting from each calendar month the mean thereof, and possibly also then normalising the data to unit monthly standard deviation. SVD is used to convert the data into a set of principal component time series (PCs) and their corresponding spatial eigenvectors (weights), together defining EOFs (Empirical Orthogonal Functions), ordered according to their contribution to the total covariance in the data, such that each sequential EOF accounts for the maximum possible amount of the total (remaining) covariance. This procedure is also known as Principal Components Analysis (PCA).
Only a limited number of PCs /spatial eigenvector pairs are retained for the reconstruction, on the basis that higher order ones mainly represent noise. Steig only retained 3 PCs for Antarctica – almost certainly too few properly to capture the spatial distribution of temperature fluctuations there. Mann in his latest 1500+ year proxy global reconstruction retained a maximum of 7 PCs from the gridded instrumental data. But in order to obtain a reconstruction with reasonably detailed spatial information it is necessary to use rather more PCs than that.
Steig used the longer record ground station data to complete the satellite record for the pre-satellite period using the TTLS (truncated total least squares) version of RegEM, an iterative regularized expectation maximization algorithm devised by Tapio Schneider and much used by Mann et al for “hockey stick” paleoclimate reconstructions. With TTLS RegEM, the regularization (involving a reduction in the effective number of free parameters in the model) takes the form of including only a fixed number of EOFs in regression used to impute missing values.
The choices made by Steig in his procedure seemed unsatisfactory, albeit partly constrained by limitations of the original RegEM program. But in principle it may be possible to produce a satisfactory reconstruction using a purely RegEM approach, at least where the temporal evolution of the satellite data is not only reasonably correlated with the corresponding surface data but broadly consistent with it as to trends.
Another approach to producing a gridded reconstruction is to perform RLS (regularized least squares) regression of the surface station data against the satellite spatial (eigenvector) structure, as pioneered by Ryan O in relation to Antarctica. This gives a set of estimated PCs for the entire time period that are then applied to the corresponding satellite spatial eigenvectors to produce a gridded reconstruction. The surface station data is generally infilled using RegEM before the RLS regression is performed. This method essentially involves the same equations as used in RegEM, save that the spatial correlations between satellite grid cells are used rather than the correlations between the surface data and the satellite data PCs.
The RLS method has an advantage over a pure RegEM approach in that the regularization can be more finely controlled and optimised to suit the data, with regularization taking place both in the initial RegEM surface data infilling and in the RLS regression. On the other hand, it assumes that all surface station data is equally representative of the mean temperature for the grid cell occupied by the station, which (whilst a reasonable approximation in the case of Antarctica) is not necessarily the case.
I initially intended to concentrate first on the pure RegEM approach, which is simpler. However, much of the work involved is common to both methods, so I decided to try both.
The TIR AVHRR data used in Antarctica measures skin temperature, which one would expect to be reasonably close to near-surface air temperature as measured by the ground stations. TIR measurements can only be made with fairly clear skies, but despite the need for some heavy cloudmasking the AVHRR data generally shows good correlations with the ground station data, on a monthly basis. Unfortunately, over time the satellite data exhibited substantial drift and inhomogeneities. Further, AVHRR data is only available pole-wards of about 50 degrees, so is not suitable for a global reconstruction.
In contrast, microwave satellite TLT data is available globally and does not require cloudmasking, although close to the poles and over high terrain its accuracy is poor. Further, the TLT data is thought to suffer relatively little from drift and inhomogeneities (although RSS’s version does appear to have had some inhomogeneities in the early 1990s). In case you aren’t familiar with the picture the TLT data gives of temperature trends worldwide since coverage started, here is a map of them, using UAH’s data. The concentration of warming in the higher northern latitudes is very evident, although the projection used over-represents the area affected.
It is also relevant to look at the distribution of variability of the TLT temperature data. The mean monthly standard deviation of the TLT data over land is about 25% higher than over sea. However, the bulk of this difference is due to the strong positive correlation between standard deviation and distance from the equator, as shown in the following map.
In contrast to the AVHRR data, the TLT data does not measure temperature at ground level (although it does measure air temperature and not ground skin temperature). This means that there is a fundamental difference between the reconstructions produced by the pure RegEM and RLS methods. The former results in a reconstruction of TLT temperatures, as it is the TLT data (as represented by its lower EOFs) that is being imputed, with the regression coefficients reflecting the correlations between changes in TLT data and in surface data. But as the RLS method involves correlations between satellite data in different places (used, where ground stations are located, as a proxy for correlations between surface temperatures in those locations) and not between satellite and ground level data, the reconstruction is of surface temperatures. (These are my views; Ryan doesn’t necessarily agree with me on this – yet. J)
Whichever type of reconstruction is intended, the first step is to perform a SVD of the satellite TLT data and to examine the results.
I only used UAH’s TLT data, as their data seems to have been more homogenous than RSS’s and I have more confidence in their methodology. The UAH TLT data is available as mean monthly anomalies from December 1978 onwards at http://vortex.nsstc.uah.edu/data/msu/t2lt/tltmonamg.xxxx_5.2, xxxx being the year involved. I took data only up to November 2009, so as to use only complete years. The whole globe is covered in 2.5 x 2.5 degree grid cells, but UAH defines global as being from -85 to +85 latitude. As mentioned previously, data near the poles is of limited use.
The further from the equator one gets, the smaller each grid cell is. It is appropriate to correct for this before taking the SVD, otherwise areas towards the poles have a disproportionate influence. Ryan and I have together looked at how best to do this. One solution is to make each grid cell cover a larger range of longitude as latitude increases (N or S). However, I opted here to weight the grid cells by the square root of their areas. This has virtually the same effect as merging data into longitudinally wider grid cells (square root since SVD minimises a RMS function), once the spatial eigenvectors have been reweighted by the same factors. Weighting grid cells retains the conventional format and also has the advantage of preserving the finer spatial resolution available in the original data at higher latitudes. However, these are relatively minor advantages and come at the expense of a considerably increased computational load and the need to worry about weighting, so I may well in future instead use longitudinally wider grid cells as latitude increases. Ryan has written some helpful code for that purpose and for plotting data in that format on a more realistic global projection than the rectangular format employed here.
There is one other weighting issue. In Antarctica, Steig scaled all the satellite data to unit variance before taking the SVD and deriving the principal components and spatial eigenvectors. That means that the SVD was calculated using the correlation matrix of the data rather than its covariance matrix. The strong dependence of standard deviation on latitude could be seen as a reason for preferring to use a correlation basis. However, as Steve M pointed out in his reply to Huybers (http://climateaudit.org/?p=405) in the equivalent context of PCA, the statistical authorities favour using covariance where (as here) the data are measured in commensurate units. Accordingly, I have not scaled the TLT data to unit variance before taking the SVD. I may investigate the effects of doing so at a later stage.
SVD eigenvectors of global TLT data
The first six of the spatial eigenvectors resulting from taking the covariance SVD of the thus weighted global TLT data from -85 to +85 latitude are as follows. Their signs are of course arbitrary.
The correlations between adjacent TLT grid cells are generally high. But, as the generally horizontal orientation of the first six spatial eigenvectors suggests, the correlation between longitudinally adjacent (E-W) TLT grid cells is considerably closer to unity than that of latitudinally adjacent (N-S) cells.
The below graph shows the pattern of the SVD eigenvalues, each approximately proportional to the mean standard deviation of that part of the fluctuations in the data accounted for by the corresponding EOF. It will be seen that, apart from after the first seven eigenvalues, there are no steep declines in the curve that would suggest an obvious point at which to truncate.
The first seven EOFs only account for 40% of the total variability power in the data (the sum of the squared eigenvalues); to get to 90% thereof requires 58 EOFs, and to get to 95% requires 88. However, virtually all of the mean global trend is likely to be accounted for by a fairly small number of the lower EOFs.
SVD eigenvectors of land only TLT data
It may be better to reconstruct global temperatures separately for land and sea, as the surface data available and its characteristics are quite different for land and for sea. If so, using a land only and a sea only SVD might provide a more suitable representation for the corresponding reconstruction. It is relevant in particular to see what the land only SVD of the TLT data looks like, as there is likely to be more spatial detail over land than over sea. The first six spatial eigenvectors thereof are shown below.
Surface data – Land
The next step is to build suitable sets of surface data, both land and sea. I will start with land data, where there are many more choices to be made. l used GHCN land station monthly mean raw data up to December 2009. I processed it in the same way as described in my previous post (see https://noconsensus.wordpress.com/2010/01/19/long-record-ghcn-analysis/). That gave 7280 stations, excluding stations with no post 1800 data.
Raw data was used in preference to adjusted data mainly because of the greater extent of raw data, particularly outside the USA and for earlier dates, but also partly because of concerns that the adjustment process could contain biases and did not in any event deal with urban heat island (UHI) and other effects of changing land use. In the case of the USA, adjustment should arguably be made for two widespread cooling biases. These are the change in time of observation, about which there is little debate, and possibly the change from liquid in glass thermometers to electronic sensors (MMTS) – although the effect of the latter is not entirely clear, and in any event seems small. But my view is that, overall, UHI and other land use change effects are likely to result in any bias in the raw data being in the direction of warming.
I wanted to obtain as much data as possible going back at least to 1900, ideally 1850, but there is very limited availability of such data for much of the world outside the 48 contiguous USA states, and data for many stations ceased at some point in the last twenty years. The number of stations with data extending from 1850 to the late twentieth century is very limited (only 100 or so with fairly complete records), and most of these are included in the set of stations with good data from 1900 to 1999. I therefore included stations with data at least from 1900 to 1999 and, outside the contiguous USA, stations with data at least from 1950 to 2005.
Initially I required not only that there be some data in the start and end year but that no more than 20% of data be missing over the whole qualifying period. However, I relaxed this to 30% for the 1900-1999 stations. Whilst doing so increases the number of 1900-1999 stations only modestly (from 1034 to 1060) it noticeably improves coverage in some areas of the world.
My preference was to include only rural stations, with a view to avoiding UHI effects, but the coverage of long record rural stations outside the USA is too patchy to permit this. Additionally, there are instances of stations marked at GHCN as being rural that have in fact become urbanised; this may be a particular problem outside the USA.
The final selection was of all 1900-1999 stations, save for non-rural USA stations, and of all 1950-2005 stations that were either outside the USA or were rural stations in Alaska. Coverage of the USA from rural 1900-1999 stations (of which there were 463, most of which also have post-1999 data) was generally excellent save for in Alaska, where the nine rural stations meeting the 1950-2005 record criteria were included. The final data set contained 1123 stations.
The precise selection criteria are as per the following lines of R code, “araw” being the matrix of monthly anomalies temperatures for the 7280 stations from January 1850 to December 2009 and “tinv” the list of those stations with their metadata:
### create logical indexes of stations with at least one month’s data in the year concerned:1900, 1950, 1999 or 2005
### create vectors of number of months with missing data for each station in periods 1900-1999 and 1950-2005
### make index of all GHCN stations with raw data for some month in each of 1900 and1999 and <30% missing data
araw.100.30.idx=araw.1900.idx&araw.1999.idx&(araw.100.mis<360) ; sum(araw.100.30.idx) # 1060
### make index of all GHCN stations with raw data for some month in each of 1950 and 2005 and <20% missing data
araw.55.20.idx=araw.1950.idx&araw.2005.idx&(araw.55.mis<132) ; sum(araw.55.idx) #1808
### make indexes of rural stations, of all non-USA stations and of 9 rural Alaskan stations included in araw.55.20.idx
araw.exUSA.idx=(!(tinv[,1]==425)); sum(araw.exUSA.idx) # 5359
araw.rural.idx=(tinv[,9]==”R”) ; sum(araw.rural.idx) # 3912
araw.55.r.idx=araw.1950.idx&araw.2005.idx&(araw.55.mis<132)&araw.rural.idx ; sum(araw.55.r.idx) # 897
araw.55.20.alaska.idx=rep(FALSE, times=7280); araw.55.20.alaska.idx[c(3361,3364,3370,3375,3403,3404,3405,3410,3419)]=TRUE
### make index of all 1900-1999 stations save USA non-rural plus non-USA (and rural Alaska) 1950-2005 stations
araw.rls.idx= (araw.100.30.idx&(!araw.exUSA.idx&araw.rural.idx|araw.exUSA.idx)) | (araw.55.20.idx&araw.exUSA.idx)|araw.55.20.alaska.idx ; sum(araw.rls.idx) #1123
[Note that I have not included other R code used to generate the results given in this post; there is much code and it is not currently in a sufficiently organised state for posting.]
The two following maps shows respectively the locations and 1950-2005 trends of the1123 stations selected for the global reconstructions; stations with small trends show up much more clearly on the location map. [Ignore the number 1132 in the headings of these maps; I made a transposition error in my typing of 1123.]
The mean 1950-2005 trend of the set of 1123 stations is 0.104 degrees C /decade (all trends given hereafter are in degrees C /decade). That is much higher than for rural stations only, at 0.066, but the rural station set is dominated by those in the USA.
Trends over 1900-2005 for stations with data stretching back to 1900 are shown in the below map. The much sparser coverage outside the USA of stations with good records back to 1900 as compared to 1950 is evident. This map includes 383 non-rural USA stations that do not form part of the 1132 strong reconstruction set, but their mean trend of 0.020 degrees C /decade is close to the overall mean of 0.026, so the mean excluding them would be only slightly higher. The 453 rural USA stations had a mean trend of only 0.007, but the mean 1900-2005 trend of the 30 rural stations in the rest of the world was 0.071, close to the 0.078 mean trend of the 189 non-rural ex USA stations.
For a reconstruction that combines ground station with satellite temperature data, it is important that there is a reasonably good correlation between fluctuations in the ground station data and in that for the satellite data for the grid cell in which the station is located, or (where, for example, the station is close to the edge of that cell, or is on the coast) an adjacent cell. Encouragingly, these correlations were generally good, with a mean adjusted R-squared of 0.52, as shown in the below scatter plot. Since the satellite TLT data measures temperatures at an average height well above the ground, and local weather and micro-climate derived fluctuations lead to variability in station data that does not feature in a grid cell average, one would not expect anything like perfect correlation.
Whilst most of the correlations are strong, many very strong, there are a fair number with low adjusted R-squared. The below map shows R-squared varies by location.
It is probably unsurprising that correlations for stations located on small islands, and in some cases perhaps on the coast, are generally low, as sea temperatures behave differently from land temperatures. But the large north-south difference was unexpected, and is a source of some concern. Why should most stations south of about 30-40N show much lower correlations than stations to the north? I have not been able to find any error in my computations, although it is possible that there is one. Probably there is some good climatological reason for the north-south divergence, of which I am currently unaware. Fortunately, the correlation of most of the stations south of 30-40N with their TLT grid cell data is significant, even if not particularly high.
The map below shows the regression coefficient (multiplier) of the station data on the satellite data. This is related to their correlation but also to their relative variance. Most stations have a coefficient well above 0.4 (the scale is from 0.4 to 1.4). A good number of stations have a coefficient in excess of 1.2, significantly above unity. It is well known that in situ measurements of ground temperature show larger trends than do TLT measurements, contrary to the predictions of global circulation models. This phenomenon, and possible explanations for it, were for instance discussed in the 2009 JGR Klotzbach, Pielke at al paper “An alternative explanation for differential temperature trends at the surface and in the lower troposphere”. However, I was unaware that a geared relationship often obtained for monthly fluctuations.
Surface data – Ocean
Having assembled a large set of long record ground station, I needed to obtain sea surface data. Although it would seem logical to use near surface marine air temperatures, the accepted global surface temperature datasets such as HadCRU use measurements of water temperature, taken typically taken 1-10 meters below the surface. This data is readily available on a gridded basis back to 1850, based on a 5 x 5 degree grid. There is a fair amount of data missing, increasingly so as one goes further back from the present. As the measurements generally don’t come from fixed stations, it seems impracticable to use raw data.
I accessed the HadSST2 data set of adjusted grid cell monthly mean sea surface temperature anomalies at http://hadobs.metoffice.com/hadsst2/data/HadSST2_SST_1850on.txt.gz and scanned the downloaded file into R. In order to get anything approaching reasonable global coverage going back to 1900, I accepted grid cells with up to 30% missing data over 1900-2005 and some data in both those years, in line with the criteria for 1900-1999 ground stations. 724 of the 2592 grid cells met that requirement. I then added 23 other cells that had some data in 1850-1852 and in 2004-2006 and no more than 30% missing data in between, giving a total data set of 747 grid cells with data starting (for some cells) in 1850. The distribution of the 724 grid cells is shown in the below map. It can be seen that coverage, whilst generally reasonable, is weak in the middle of the Pacific ocean and very limited pole-wards of about 40S.
The corresponding map for cells with data in 1850-1852 is sparser, particularly outside the Atlantic ocean, and has virtually no coverage of the Pacific ocean other than coastal areas.
I then carried out a similar investigation to that for the land stations, of the relationship between the sea surface temperature data and that for the corresponding satellite TLT grid cells. This required a little thought since the HadSST cells four times the area of the TLT cells and are ordered differently, but at least each HadSST cell matches exactly with four TLT cells. The results are shown in this map.
Disappointingly, in most cases the correlation is fairly low, with a mean adjusted R-squared of only 0.14. The regression coefficients of HadSST data on the satellite TLT data were correspondingly low, with a mean of 0.37. There are a few areas where the correlations and regression coefficients are much higher; principally the eastern Pacific.
It is perhaps unsurprising that sea surface temperatures, reflecting as they do temperatures 1 to 10 metres below the surface, should correlate poorly with air temperatures measured in the lower troposphere. Certainly, the greater heat retaining capacity of water is reflected in a much lower monthly standard deviation of HadSST grid cell temperature anomalies (mean 0.8 degrees C) than for land stations (mean 1.7 degrees C), measured over the period for which satellite data exists.
Data for marine air temperatures also exists, although it is even less complete than for sea surface temperatures. Hadley produces the MOHMAT43 dataset of gridded monthly marine air temperature anomalies. This is based on night time readings only, to avoid distortions from deck heating, and is adjusted to correct for, in particular, changes in mean deck heights over time. The series starts in 1856 but data has only been produced up to May 2007. There is also an infilled interpolated version, HadMAT1, but I decided to work with the uninterpolated version in the hope that the lack of spatial smoothing would result in better correlation with the satellite TLT data.
There are 501 MOHMAT grid cells with some data in 1899-1901 and in 2005, and under 30% missing months in 1900-2005. The distribution of those grid cells is shown in the below map. It can be seen that coverage is more limited than for the HadSST dataset.
I feared that MOHMAT grid cells with data going back to 1900 provided too limited coverage to provide a reasonable basis for a global reconstruction, so I decided also to use cells with data in 1950 and 2005 and under 30% missing months in that period. I then computed the correlation of each of the resulting 988 cells with the TLT satellite data. The results are shown in the below map.
The correlation is generally higher than for the HadSST cells, but the mean adjusted R squared is still rather low at 0.19. The regression coefficients of MOHMAT data on the satellite TLT data were correspondingly higher than for HadSST data but remains quite low, with a mean of 0.50.
It occurred to me that the low correlation could partially be due to marine air temperatures, which are quite closely related to sea surface temperatures, lagging those of the atmosphere generally, due to the high thermal heat capacity of the upper layer of the oceans. I therefore regressed the MOHMAT data on the corresponding TLT data for the preceding month. This did produce modest correlations, with a mean R squared of 0.09. However, regressing the MOHMAT data on a combination of TLT data from the same and the preceding month only resulted in a marginally higher correlation than regressing on the current month data alone (adjusted R squared increased from slightly under 0.19 to just over 0.20).
Interestingly, the mean monthly standard deviation for the 988 MOHMAT grid cells was virtually the same as for the 747 HadSST cells, at 0.8 degrees C over the period for which satellite TLT data exists. As for land, but perhaps surprisingly for sea, this is higher than the mean standard deviation of the TLT data for the corresponding grid cells, of 0.7 degrees C.
At this point, I was unsure which was the best data to use for reconstructing temperatures over the ocean areas, so I decided to concentrate on land.
Infilling the surface data – Land
In order to produce a spatially and temporally complete gridded reconstruction from surface and satellite temperature data, it is necessary (or in any case helpful) first to infill the surface temperature records. That includes extending back, to whatever date the reconstruction is to start from, the records of stations that commenced after that date. I had always intended doing this separately for the land and sea data. Their differing characteristics suggested that the extent to which sea temperatures would help to infill land temperatures and vice versa would be limited, and the sizes of the individual data sets meant that even with separate infilling it would take many hours of computer time running RegEM for the iterative infilling algorithm to converge sufficiently. In light of the much lower correlation of the TLT satellite data with the sea as compared with the land surface data, I decided also to investigate producing separate gridded reconstructions for land and sea areas.
Infilling the missing data for the land stations was done using ridge regression RegEM. Since all the data was in the same units, no scaling to unit variance was used, as for the TLT SVD. The use of covariance rather than correlation is not an option in Tapio Schneider’s original Matlab implementation of RegEM, but the R version I have developed is much more flexible. Unfortunately, ridge regression RegEM is slow with data sets like this. Achieving an acceptable convergence level took four days on a fast computer.
Below is a map showing the resulting 1900-2005 trends of the stations after infilling. The mean trend is 0.032, up from 0.026 for stations with actual data back to 1900, but that station set was much more dominated by the USA.
I was slightly concerned that the large number of USA stations might distort the imputation process. However, the mean covariance (after imputation) between USA stations and those elsewhere was only 0.07, or 0.04 excluding Canada (where a measurable correlation is to be expected). By comparison, the mean covariance of USA stations with other USA stations is 1.93. So it appears that my concern was likely unfounded. And with the large effective number of parameters in the regression any dominance of the first few EOFs by US station variability should make little difference to the imputation.
The variation of the effective number of parameters retained in the ridge regression is shown below. The graphs rises sharply up to about 1900 but then levels off before peaking in the 1930s and thereafter fluctuates greatly, no doubt due both to the varying availability of station data and varying levels of agreement between data from different stations.
The mean estimated error of the infilled data matrix, as computed in RegEM, which up till 1940 mirrored the inverse of the effective number of parameters, continued to decline until the1950s and remained generally low until April 2006, when GHCN data for many long record stations ceased.
As the RegEM estimation error is zero where actual data exists, having data from a higher proportion of the data set generally leads to a lower mean estimated error, even when there is then more disagreement between the data (which triggers stronger regularization, causing a reduction in the effective number of parameters). But also, the further back one goes the greater the uncertainty in the estimates used for infilling, with there being fewer and less well correlated stations to impute from. Back to about 1900, the mean estimation error – whilst by no means insignificant – is not that high. Going further back, the mean estimation error rises steeply.
Reconstructions – Land
With an infilled land station set having reasonable geographical coverage and reasonably low estimation error, I decided to have a go at producing some initial gridded land-only reconstructions, based on the land-only TLT data SVD. The reconstructions are very much experimental at this stage, and I have made no attempt to estimate error bands or carry out verification experiments.
Note that if the SVD of the global TLT data were instead used, it would in principle be possible to produce a global reconstruction using data from land stations only. I have not presented results of such reconstructions here, as I am not sure how valid they would be over ocean areas.
I first attempted a reconstruction with RegEM, using a similar approach to Steig in Antarctica. In this case, unlike Steig’s, the trends recorded by the satellite data for grid cells containing the 1123 ground stations were close to (90% of) the trends recorded by those stations. I had imputed the missing ground data without using the satellite data, so there was in any case no question of trends in the satellite data affecting
the ground data. And in order to prevent any of the satellite PCs influencing the imputation of any other PC, I gave all the PCs a very low weighting within RegEM.
I found that ridge regression RegEM wouldn’t optimise the regularization correctly – it is not really designed to deal with a completely infilled data set combined with one that is complete for one period and entirely missing for the remaining time. So I used TTLS RegEM instead (which is what Steig used). That meant choosing the number of EOFs to retain in RegEM. It made sense to retain at least as many EOFs in RegEM as there were satellite PCs to impute, but if the number of satellite PCs used was large it seemed unnecessary to retain more EOFs in RegEM than the number of satellite PCs. So I used 50 satellite PCs and retained 50 EOFs in RegEM. As for the ground infilling, I used covariance RegEM.
The 1900-2005 trend map of the resulting reconstruction is shown below. It is disappointing; frankly it looks wrong, even allowing for this being a reconstruction of TLT temperatures rather than surface temperatures. There is very little spatial detail, and the negligible trend is inconsistent with data from the land stations and from the established sources of gridded data.
I carried out some experimentation with using different numbers of PCs, which made little difference. I hunted for errors, and found that I had mis-weighted the result, resulting in a slight error in the trend. But I have so far been unable to establish where the real problem with the RegEM gridded TLT temperature reconstruction lies. So I have instead turned my attention to RLS gridded reconstructions of surface temperature.
I decided to use 100 PCs for the RLS reconstructions; experience with Antarctica showed that having many RLS PCs did not lead to over-fitting or other problems. Regularization was set by randomly withholding a small proportion of the station data consisting exclusively of actual measurements. The program iteratively adjusts the level of regularization so as to minimise the estimation error for the withheld data points. Regularization was performed separately for various sub-periods into which the data was divided. This was in case the time variation in the effective number of parameters in the ground station matrix led to the optimal regularization strength also being time varying (which indeed turned out to be the case).
The 1900-2005 trends resulting from the 100 PC RLS land reconstruction using the infilled full set of 1123 ground stations are shown in the below map.
Although the mean trend of 0.033 looked more sensible than that of the RegEM reconstruction, it still seemed on the low side. Moreover, the regularization parameters showed that heavy regularization was being applied, indicating considerable inconsistencies between the land and the satellite data. Increasing the number of retained PCs from 100 to 200 had no effect, as I expected.
Investigation suggested that the heavy regularization was to a substantial extent related to on the one hand, the USA stations by and large having very little correlation with, and lower trends than, the stations in Siberia, whilst on the other hand the satellite data showed a material (albeit not that high) degree of correlation between temperatures in the USA and in Siberia. The map below shows the regression coefficients of TLT temperatures worldwide on the mean TLT temperature for the USA; the correlation is closely linked to these coefficients.
RLS is reasonably tolerant of areas being over or under represented by stations. In Antarctica, I found that reweighting the RLS regression, so as effectively to be equivalent to having the same density of stations in each of the four main regions, made virtually no difference to the outcome. However, here the USA is very heavily over-weighted and there is considerable divergence between the correlation information from the stations and from the satellite data. It looked as if the over-representation of USA stations might be leading to USA temperatures significantly influencing estimated temperatures in Siberia and thus skewing the result.
I therefore removed most of the contiguous USA stations, retaining only those whose position in the list of GHCN stations was a multiple of eight, or 62 in all. That reduced the station set from 1123 to 723 in size. Producing a RLS reconstruction from this station set resulted in a much lower level of regularization being imposed than with the full station set. The reconstruction showed trends as per the following map.
This is probably a more realistic result than that from using all 1123 stations. It appears that the heavy over-representation of USA stations in the full set had indeed been skewing the results, in particular preventing the high trends in Siberia (and Europe) from being fully expressed. But the cooling in the south eastern USA has gone in the 723 station reconstruction, whilst the cooling in Greenland is greater, which both seem odd. For good measure I also tried selecting every tenth USA station instead of every eighth, but found that doing so made little further difference.
And how does the 723 station RLS reconstruction compare with gridded temperature records prepared directly from station data? I obtained the following 1900-2005 trend map from GISS (the only provider whose website offered this service). It is based on adjusted data, and areas with insufficient station records are blank. The patterns are in very broad terms similar, but GISS shows lower trends than RLS in the south east USA and in northern Siberia, and higher trends in Alaska and Greenland.
The (smoothed) pattern of mean land temperature over time per the RLS reconstruction, from 1850 on, is as follows.
The chart of mean land temperature on the UK Met Office website, reproduced below, shows a broadly comparable pattern but with a rather less pronounced rise since 1970. Note that the baselines of the two charts differ slightly, as do their scales.
Finally, just for completeness, the map below shows a global surface reconstruction based on data purely from land stations, in conjunction with the SVD of the global satellite TLT data. I am not implying that doing so gives anything like an accurate reconstruction over ocean areas. However, it is encouraging that the reconstruction for land areas is much the same as that using the SVD of the land only TLT data (as should be the case). Note that the trends stated in this map are unweighted. The area weighted trends are 0.0356 (global) and 0.0633 (land).
This is just an initial investigation of the possibilities for reconstructing global temperatures by using surface and satellite data in conjunction, so it is premature to draw any definitive conclusions. But my initial thoughts are that the generally good correlation of the satellite TLT data with land station records suggests that using the two data sources in combination could work well. The reasonably sensible results of the initial RLS land reconstructions seem to support that view, although the poor results from the initial RegEM reconstructions are disappointing.
The prospects for combining in situ ocean data with satellite TLT data seem rather less good. The low correlations of sea surface temperatures and (to a lesser extent) marine air temperatures with TLT data are disappointing. But it may be that the TIR AVHRR data from polar orbiting satellites will provide a better match with in situ temperature measurements. Further investigation is required.