## Ljungqvist Analysis Continued

Posted by Jeff Id on December 6, 2010

This is a guest post by Kenneth Fritsch. He’s taken the time to further the analysis of the unusual proxies from the recent Ljungqvist paper covered here. Like our work on the Antarctic, this is progressing in baby steps toward a result. In case you don’t know, I have a ton of respect for people who don’t fear the data or results. Just let it lead where it goes.

———————

I’ll attempt to make my analysis of the comparison of series correlations of the annual Ljungqvist proxies and GHCN station series brief and to the point.

When I refer to correlations in this analysis I mean correlations between time series of stations or between time series of proxies. The comparisons were made using time periods of 20 and 41 years. In all cases the correlations include all the unique pair-wise station to station or proxy to proxy correlations.

I used the Ljungqvist proxy data after normalizing it (z scores) by subtracting the mean and dividing by the standard deviation. I used the mean and standard deviation for 29 of the 30 proxies for a period of 947 years going back from the early 20th century to the late 10th century. I eliminated proxy number 21 because it had a very discontinuous pattern of data availability during this long time period. In order to make the comparison with GHCN station data I used only proxies that had annual resolutions.
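For readers who want to follow along, the normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_scores` and the sample values are made up, and the sample standard deviation (n-1 denominator) is assumed because the post does not say which form was used.

```python
from statistics import mean, stdev

def z_scores(series):
    """Return (x - mean) / sd for every value in the series."""
    m = mean(series)
    s = stdev(series)  # sample SD (n-1 denominator); an assumption, not stated in the post
    return [(x - m) / s for x in series]

# Made-up proxy values, purely for illustration.
proxy = [0.8, 1.1, 0.7, 0.9, 1.4, 1.0]
z = z_scores(proxy)
```

After this step every proxy has mean 0 and standard deviation 1 over the chosen period, so series with different units can be compared.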

I used GHCN data for the period 1950 to 1990 because during that time I had the largest set of stations with complete data. For that period there were 447 stations. I also used a set of GHCN data over a 100 year period in order to get a picture of the change in the station series correlations over 20 year periods. For that period there were only 35 GHCN stations that had complete data. For GHCN correlations I used temperature data and not anomalies.

My intent was to use an internal comparison of station and proxy data in the form of the relationships of station and proxy series correlations with distance separations and estimate how that relationship changes over time. The standard of comparison here would be the station correlations knowing full well that proxies are not thermometers and that the relationship of the proxy response would deteriorate from instrumental by way of white and red noise. The question that remains is how much deterioration would we expect and still assume we could find a temperature signal or better at what point could we assume no signal is discernible.

First the relationship of station correlation with distance and how well it holds up over time must be estimated. To that end I used the 447 GHCN station temperature series for the time period 1950-1990 and calculated the unique pair-wise series correlations (excluding the trivial same-station correlations). Those unique pairs are a large number [447*(447-1)/2 = 99,681]. I plotted and regressed those pair-wise series correlations against distance for distance separations less than 3000 km. It can be seen that beyond 3000 km separation the correlation on average goes to 0. The plot is shown in Link 1 (Figure 1) and the regression summary is given in Link 2 (Figure 2) below. While there is considerable scatter of data points around a linear trend line, the correlation is quite high and significant.
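The pair-wise step can be sketched roughly as follows. The post does not say how distances were computed, so the great-circle (haversine) formula with a mean Earth radius of 6371 km is an assumption, and all function names and the `stations` layout are hypothetical.

```python
import math
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def n_unique_pairs(n):
    """Number of unique pairs among n series, excluding self-pairs."""
    return n * (n - 1) // 2

def pairwise_corr_vs_distance(stations, max_km=3000.0):
    """stations: dict name -> (lat, lon, series). Returns a list of (distance_km, r)."""
    out = []
    for (_, a), (_, b) in combinations(stations.items(), 2):
        d = haversine_km(a[0], a[1], b[0], b[1])
        if d < max_km:  # keep only separations below the 3000 km cutoff
            out.append((d, pearson(a[2], b[2])))
    return out

print(n_unique_pairs(447))  # 99681
```

The resulting list of (distance, correlation) points is what gets plotted and regressed in the scatter plot.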

To better see the relationship that is subject to noise in Link 1, I took the same data and calculated means and standard deviations for 100 km increments. The plots for the mean and standard deviations are shown in Link 3 (Figure 3) below. Included is a plot of the mean for the same data that has altitude difference between stations of more than 1000 m to show that while altitude differences have an effect, it is not large. The correlations of the mean distance separation and the standard deviation seen in this form show that there is a good correlation between series correlations and distance of separation, i.e. minus the noise. That does not, however, entirely mitigate the scatter of data seen in the first link when a comparison is made with a small number of data such as is the case with the Ljungqvist proxies. For this post I leave that as an open question that I will attempt to answer at another time with some stratified re-sampling of the station data.
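A minimal sketch of the 100 km binning, assuming the input is a list of (distance_km, correlation) pairs; the function name and the sample points are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def bin_by_distance(pairs, width_km=100.0):
    """Group (distance_km, r) pairs into bins of width_km.

    Returns {bin_start_km: (mean_r, sd_r, n)}; sd_r is NaN for a bin with one point.
    """
    bins = defaultdict(list)
    for d, r in pairs:
        bins[int(d // width_km) * width_km].append(r)
    return {
        start: (mean(rs), stdev(rs) if len(rs) > 1 else float("nan"), len(rs))
        for start, rs in sorted(bins.items())
    }

# Made-up points: two pairs in the 0-100 km bin, one in the 100-200 km bin.
summary = bin_by_distance([(50.0, 0.9), (80.0, 0.7), (150.0, 0.5)])
```

Plotting the bin means against the bin starts gives the smoothed curve, while the bin standard deviations show how much scatter remains at each separation.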

Next I looked at the proxy data using the same 41 year time periods, but since I have a long time series (over the 947 years) I was able to obtain 23 time series to obtain a mean correlation and a standard deviation. The data plotted in Link 4 (Figure 4) below used only separation distances less than 3000 km and thus, unfortunately the number of data points for comparison with the station data was only 16. Remember though that each point is an average of 23 time periods. The trends look very flat for both the means and the standard deviations. Take out the one point at 0.26 and the means would be flat at approximately 0 correlation.
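The sub-period averaging can be sketched as below. It assumes consecutive, non-overlapping windows with leftover years dropped, which is consistent with 23 windows of 41 years (and 47 windows of 20 years) fitting into 947 years, but the post does not spell out the windowing. The two series are synthetic.

```python
import math
import random
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def windowed_correlation(a, b, window=41):
    """Correlate two annual series over consecutive non-overlapping windows.

    Returns (mean_r, sd_r, n_windows); trailing years that do not fill a window are dropped.
    """
    n_win = len(a) // window
    rs = [
        pearson(a[i * window:(i + 1) * window], b[i * window:(i + 1) * window])
        for i in range(n_win)
    ]
    return mean(rs), stdev(rs), n_win

# Synthetic 947-year series sharing a common component (illustration only).
rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(947)]
b = [x + rng.gauss(0, 1) for x in a]
mean_r, sd_r, n_win = windowed_correlation(a, b, window=41)
```

Each proxy pair then contributes one mean and one standard deviation, which is what each plotted point represents.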

To obtain a better comparison over several time periods I used a 20 year time series for pair-wise correlations for the GHCN temperatures over the period 1990 to 1999 and compared it to the Ljungqvist proxy data over the 947 year range with 47 sets of 20 year time periods. That comparison is shown in Link 5 (Figure 5) below. While the proxy data is sparse and without re-sampling the station data it is rather apparent that there is not even an approaching overlap of the station correlations in the 500 km distance separation range with the very low proxy correlations.

I have added a histogram in the link below that compares a stratified re-sampling of the log10 of the probability of the regression trend of the GHCN station distance separation versus station series correlations being 0 to the same statistic for the annual Ljungqvist proxies.

Recall that the GHCN stations distance/series correlations are from 447 stations from the period 1950-1990. The proxy relationships are average series correlations of 41 years collected in 23 sets over 947 years. In both station and proxy relationships the correlations used were confined to distance separations less than 3000 km. The final proxy regression with 16 data points was the basis for the GHCN re-sampling with the station data being re-sampled with 16 data points randomly selected from the same 100 km incremental distance separations in which the proxy data resided.
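The stratified re-sampling can be sketched along these lines. Note one deliberate simplification: the statistic in the post is the log10 of the regression p-value, which needs a t-distribution from a statistics library, so this standard-library sketch records the regression slope of each re-sample instead. All names, the synthetic station pairs, and the 16 proxy distances are made up.

```python
import random
from collections import defaultdict

def bin_index(distance_km, width_km=100.0):
    return int(distance_km // width_km)

def group_by_bin(pairs, width_km=100.0):
    """Group (distance_km, r) station pairs by 100 km distance bin."""
    bins = defaultdict(list)
    for d, r in pairs:
        bins[bin_index(d, width_km)].append((d, r))
    return dict(bins)

def stratified_draw(bins, proxy_distances, rng, width_km=100.0):
    """Draw one station pair from the same distance bin as each proxy pair.

    Raises KeyError if a bin that holds a proxy pair contains no station pairs.
    """
    return [rng.choice(bins[bin_index(d, width_km)]) for d in proxy_distances]

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def resampled_slopes(station_pairs, proxy_distances, n_trials=1000, seed=0):
    rng = random.Random(seed)
    bins = group_by_bin(station_pairs)
    slopes = []
    for _ in range(n_trials):
        xs, ys = zip(*stratified_draw(bins, proxy_distances, rng))
        slopes.append(ols_slope(xs, ys))
    return slopes

# Synthetic station pairs whose correlation decays with distance (illustration only).
gen = random.Random(42)
station_pairs = []
for _ in range(5000):
    d = gen.uniform(0, 3000)
    r = 0.8 - 0.00025 * d + gen.gauss(0, 0.05)
    station_pairs.append((d, max(-1.0, min(1.0, r))))

# 16 made-up proxy pair separations in km.
proxy_distances = [50, 120, 480, 510, 900, 1250, 1300, 1700,
                   1750, 2100, 2200, 2400, 2450, 2600, 2800, 2950]
slopes = resampled_slopes(station_pairs, proxy_distances, n_trials=1000, seed=0)
```

A histogram of the re-sampled statistic then shows where the single proxy value falls relative to station data thinned to the same 16 distance bins.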

The histogram of the station log10 probabilities compared to the proxy log10 probability shown as the vertical line at -0.33 shows that proxy series correlation versus distance relationship is significantly different than that relationship for the station data. This difference is not so apparent when one looks at the scatter plot of the GHCN series correlations to distance separation.

The next step will be to add white noise to the GHCN station data and repeat the comparison described here.

## jeff id said

I really have to spend more time with Ljungqvist. Some readers will remember that I ran his reconstruction with wildly different sections of the data and got the same answer no matter what was chosen. Then from Kenneth’s analysis above you can see the very low correlation of the proxies in all time periods and nearly independent of distance.

This is of course more direct evidence for the very high noise level in proxy data. I have a suspicion of how the two above cases could be true and it would only take a few hours to prove out, but as always these days, I have no time.

## John F. Pittman said

To obtain a better comparison over several time periods I used a 20 year time series for pair-wise correlations for the GHCN temperatures over the period 1990 to 1999 and compared it to the Ljungqvist proxy data over the 947 year range with 47 sets of 20 year time periods. paragraph below figure 4.

This should read 1950 to 1999 shouldn’t it?

## kim said

Cracks the rationale of the GHCN, no?

========================

## Kenneth Fritsch said

Reply to Post #2

I used five 20 year periods from the one hundred years for the time period from 1900-1999 for each of 35 GHCN stations. The shorter periods were 1900-1919, 1920-1939, 1940-1959, 1960-1979 and 1980-1999 for each of the stations. I used only GHCN stations that had no monthly missing data.

I obtained 447 GHCN stations with no missing monthly data from 1950-1990, but when I expanded the time period to 100 years in order to obtain sufficient numbers of subsets (20 years) for a reasonable statistical analysis, the number of stations without missing data was reduced to 35. I could have interpolated missing monthly data, but I judged that I had obtained sufficient numbers of stations for statistical analysis without the interpolation.

## AMac said

Re: Kenneth Fritsch (Dec 7 11:05),

John Pittman (#2) spotted a typo, I think.

## AMac said

For Figure 5’s graph “Mean Correlation vs. Station Distance Separation” — this is the 20-year time periods result — there doesn’t seem to be any data for GHCN separation distances greater than 250 km and less than 1000 km.

This is for 47 sets of pairwise comparisons, 5 data points per set. So it looks like the 47 GHCN pairs that you chose did not happen to include any with distances in the 250 km to 1000 km range? Am I interpreting this correctly?

## AMac said

I like where this is going.

We can think of proxies (eg treerings) as having two components, signal and noise. How “signal” is defined is a function of what we want to look for. For instance, a study of insect infestation history might look at “temperature” information as noise. Similarly, for a paleotemperature study, random stuff, insects, and non-temperature climate effects (wind, precipitation) would all be “noise”.

(Or would non-temp climate effects like precip not be “noise”, on a teleconnections-like view?)

Steve McIntyre’s analyses have shown that there are many ways to get spurious but statistically significant results, when there is redness (time-series autocorrelations) in the treering signal. It seems to me that SMcI assumes that 100% of the redness comes from the “non-climate-signal” part of the proxy data (eg the stripbark pattern of growth).

Gavin Schmidt has addressed the redness issue, briefly and elliptically in some comments (at tAV? I didn’t tag them). In distinction to SMcI, Gavin assumes that 100% of the redness comes from the “temperature-signal” (or perhaps “climate-signal”) part of the proxy data. ISTM.

If I’m understanding this correctly, we have the two adversaries, each with a particular and extreme idea of what the source of treering “redness” is. This is good, in that it should be possible to formulate the competing claims into hypotheses that are testable. If I understand this correctly, both SMcI and Gavin cannot be correct. Obviously, the truth could be somewhere in the middle — which would be important to know. Then the follow-up questions — Are the sources of redness fairly invariant over time? For treerings, do they vary from stand to stand, or from species to species?

I’d be interested to see references to the dendro literature where this subject has been addressed — it doesn’t take a lot of thought to end up with these questions, so I assume they have been asked.

It seems to me that Kenneth Fritsch’s approach to Ljungqvist et al. could be one way to attribute the redness of proxy time series. That would be neat.

## jstults said

You gents are headed in the right direction:

http://www.informaworld.com/smpp/section?content=a920254465&fulltext=713240928

Keep up the good work.

## Kenneth Fritsch said

AMac, you are interpreting correctly. I saw that gap before but did not bother to go back and make sure it is real and not a listing error – now I will. I am rather sure that it is real. The reason for the fewer stations was that I wanted to look not only at the spatial correlations of GHCN station series but also how well that relationship holds up over time. I used the 447 GHCN stations over a 41 year period for my re-sampling for comparison with the Ljungqvist proxies, and thus had better distance coverage there.

Your second post is in line with what I have been thinking about in using proxies to estimate temperatures and, of course, the (next) issue here will be what white noise does to the GHCN relationship of distance separation versus series correlations.

I keep thinking that the noise to signal issue is often handled by some climate scientists with an a priori assumption that there is a temperature signal in there – and that it can be extracted. In the link below I saw a criticism of MW2010 on the use of the Lasso method that related to this – I think. The critique went something like MW had put too stringent an a priori requirement on proxies correlating well with local temperature, but rather that it is the average of the proxies that is the indicator and an indicator of more regional temperatures than local.

I think this all boils down to the thinking that a temperature signal in a local proxy can potentially be overcome by other factors. I am not sure that I see how averaging a very weak signal over several proxies is necessarily going to bring that signal out. I think it is a given that proxies can be grouped together by selective processes to show just about any “temperature signal” you want to show and that process does not prove that there is an (extractable) signal in the proxies.

My other point is that climate scientists have used spatial and temporal correlations of temperatures from stations to impute missing data points in space and time in attempts to estimate the uncertainty in instrumental temperatures (and trends) due to incomplete coverage. If these relationships are greatly weakened by noise added to the temperature signal as would be the case for proxies then the uncertainty of the already sparse data we obtain, for even extensive reconstructions, would be greatly increased. I do not believe that climate scientists consider this point in dealing with reconstruction uncertainties.

http://www.people.fas.harvard.edu/~tingley/Blakeley_Discussion_Tingley_Submitted.pdf

## AMac said

Re: Kenneth Fritsch (Dec 7 16:05),

It’s always good to avoid reinventing the wheel. Are there any cites to articles in the dendro literature where folks have considered the problem of where proxy-data “red noise” comes from? I find it hard to believe that such a basic and important question would have been neglected by an entire field.

Until reading your post, the way I’d been imagining one could approach this was different. Take a bunch of series where there is (and should be) a strong correlation with the local instrumental record. Say, a stand of near-treeline Siberian larches. Now take a bunch of similar series where there isn’t (and shouldn’t be) a similar strong correlation. Say, a stand of larches from a few hundred km to the south. Or, for a montane indicator species, choose a stand that’s a few hundred meters lower in elevation.

Characterize the temporal autocorrelation, as ARMA (x,x) (etc.), and quantitate its magnitude.

Steve McI would predict that treeline-near and treeline-far stands would show similar patterns of red noise.

Gavin would predict that the redness is restricted to treeline-near stands.

Separate issue: it seems to me that the meaning of “signal” should be nailed down early in the discussion. With respect to paleotemperature reconstructions, is “signal” anything that’s climate-related, including precipitation, days of growing season sunlight, and wind? Or, since we are discussing temperature, is “signal” restricted to that part of the treering series data that tracks with the local instrumental record?

## AMac said

Re: Kenneth Fritsch (Dec 7 16:05),

I scanned the Martin Tingley piece you linked. As far as the main issue of concern that I (as an outsider) see with proxy-based paleoclimate reconstructions, his analysis is inadequate. He proves that “The LASSO gives inferior results in [certain] situations…” But what are these? They are “situations where each of a large number of predictors is only weakly correlated with the target series, but the mean across all predictors is highly correlated with that target.”

“Maximizing the signal that is present” is a high-priority goal for a paleo reconstruction.

“Ensuring that noise is not mis-identified as signal and then processed as such” strikes me as the necessary handmaiden, if methods are to produce meaningful reconstructions.

Tingley has constructed his pseudoproxies so that they contain predictors that are weakly correlated with a known target series. OK, fine. But some of the elephants in the room are those data sets that are thought to contain correlated predictors — but do not. (Another problem is posed by data sets that indeed contain correlated predictors, but which are excluded from analysis.)

## Kenneth Fritsch said

I am definitely using signal to mean temperature and I think that most reconstructions we discuss here are looking for temperature (trend) indicators. I have seen references to climate indicators but I always thought that was a scientist being coy. Of course I may find that the W in AGW means not Warming but Wetness or even Whatever.

A problem with a proxy that is affected by other climate or whatever variables when we look for a temperature signal that potentially can give us a temperature trend is that that proposition, I think, must assume that the other variables do not have trends but tend to cancel out over time.

## Kenneth Fritsch said

I added white noise to the GHCN 447 stations with temperature anomalies from 1950 to 1990 and redid the re-sampling of these stations as I had done previously with the same GHCN series without noise added. The noise added gives a temperature signal to noise ratio of approximately 0.4. I added red noise to the white noise to keep the AR1 of each series approximately the same. The AR1 of these series regression residuals is relatively low, with a mean AR1 of 0.135 and a standard deviation of 0.130.
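For anyone wanting to try the noise step, here is a rough sketch of adding white noise at a chosen signal to noise ratio. It assumes the ratio is defined on standard deviations (sd of signal over sd of noise); the comment above does not say whether standard deviations or variances were used, and the red noise adjustment is not reproduced here. The sine-wave "signal" is made up.

```python
import math
import random
from statistics import pstdev

def add_white_noise(series, snr=0.4, rng=None):
    """Add Gaussian white noise so that sd(signal) / sd(noise) is about snr.

    The SD-ratio definition of signal to noise is an assumption of this sketch.
    """
    rng = rng or random.Random()
    noise_sd = pstdev(series) / snr
    return [x + rng.gauss(0.0, noise_sd) for x in series]

# Made-up smooth signal standing in for a station anomaly series.
signal = [math.sin(2 * math.pi * t / 50.0) for t in range(5000)]
noisy = add_white_noise(signal, snr=0.4, rng=random.Random(11))

# Recover the realized ratio from the added noise as a sanity check.
noise = [n - s for n, s in zip(noisy, signal)]
realized_snr = pstdev(signal) / pstdev(noise)
```

With a ratio of 0.4 under this definition the noise standard deviation is 2.5 times that of the signal, which gives a feel for how heavily the station series are being degraded.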

The histogram with the Ljungqvist proxies log10 probability shown at -0.33 (vertical line drawn upward from the x axis) shows that at this ratio of signal to white noise, the distance separation to correlation of the series for the GHCN station data are degraded to approximately what we see with the Ljungqvist annual proxies. The GHCN series with white noise added had 1/3 higher (and 2/3 lower) probabilities of the regression trend line being 0 than for the Ljungqvist proxies.

Next, I would like to apply a similar analysis to other temperature reconstruction proxies and compare the series correlation to separation distance with that for the GHCN station temperature anomalies.

The dilemma I see for reconstructions, with high levels of noise, in the attempts to extract a temperature signal is that in order to report an average for a region or the whole globe, the uncertainty of that interpolation for much missing spatial data is required. That interpolation and the uncertainty of it depends on a spatial correlation of the proxy data. With sufficient noise that correlation degrades dramatically, even for some of the smaller distance separations. Now, one could make the proposition that that relationship can be estimated using modern times instrumental temperatures with the more complete spatial coverage that these station data provide, but that approach assumes that the spatial relationship is stationary over time or that its changes over time can be estimated. That is why the temporal changes in the spatial relationship must also be considered for the instrumental period and assumed to change at the same rate when one goes back into the time periods covered in reconstructions. The problem with measuring the change in the spatial relationships with proxies is that when little or no trend is seen over large time periods, what does it mean to compare that relationship over incremental sub time periods?

## AMac said

Re: Kenneth Fritsch (Dec 8 13:42),

The first two-part question for Tingley and for you is, “Which type(s) of noise to add? And how much of each?”

Clearly, adding “a little” noise isn’t going to change things much, while adding “a whole lot” will bury the signal in the proxy or pseudoproxy. So we are talking about the middle ground that’s relevant to the proxies under consideration. Does “everybody” — Gavin, Steve, you, Tingley — agree on the which and how-much that is meaningful to treerings (etc.) over the past two millennia? Clearly not. As an outsider and a statistical ignoramus, I have no feel for how an S/N of 0.4 and an AR1 value of 0.135 map to the characteristics of the data sets.

Alas, I can only interpret the histograms in the crudest qualitative fashion. For instance, what are the units on the Y-axis?

As far as the dilemma you posit, I think that is so. Mann08 actually was a stab at attacking the spatial part of that problem, which IMO was admirable. At least in theory, disregarding the actual sausage-making. Recall that Mann08’s idea was to compare proxy series to the temperature anomaly calculated by HADCRU for the local 5 degree by 5 degree grid from instrumental record. This was instead of trying to match data sets to a global temp anomaly. Due to paucity of computer skillz, for Tiljander I had to extract the Southern Finland gridcell by hand. It is striking how sparse worldwide coverage becomes as one approaches the 19th century. There just isn’t much instrumental data beyond Western and Central Europe and North America.

Mann08’s approach also led to situations like the following. For the Tiljander proxies, there is also a nearby weather station with records going back to the 1890s or so. By inspection, that temperature record isn’t hugely similar to the gridcell reconstruction. So, did it really make sense to prefer matching proxies to the gridcell?

And, as you note, there can be no assurance that the pattern that held sway during the instrumental period was a faithful continuation of the pattern(s) that prevailed in prehistoric times. If patterns did change, the uncertainties produced via this approach are much larger than they are currently thought to be.

## Layman Lurker said

Fascinating project Kenneth. What about adding AR1 noise with a coefficient of .2 (as Michael Mann suggests) to your GHCN sample?

## Kenneth Fritsch said

That would, of course, be frequency of occurrence, i.e. the number of times out of a total of 1000 trials (resamples) for a particular bin of log10 of probability of the resulting trend for regressing station series correlation (pairwise) versus station separations (pairwise) being 0.

## Kenneth Fritsch said

LL, what I need to do now is look at the AR1 of the proxies series. The GHCN annual station data for the time period 1950-1990 had an average AR1 of 0.135 – which is small.

## Layman Lurker said

Yeah, I was thinking about the same thing. I recently did the non-bristlecone acf for the NOAMER network proxies from MBH98 for kicks.

## Kenneth Fritsch said

In the link below I show the AR1 values for the residuals of the regression of the z values over a 947 year period for the annual Ljungqvist proxies as identified by number of appearance from the list in Ljungqvist 2010. Included in the table are the mean and standard deviation of 23 subsets of the 947 year period of 41 years each of the same AR1 for regression of proxy z values over 41 years.
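The AR1 of regression residuals mentioned above can be sketched as follows: regress the series on time, take the residuals, and compute their lag-1 autocorrelation. The estimator shown is the usual sample lag-1 autocorrelation, which is an assumption since the comment does not name the estimator, and the test series is synthetic (a linear trend plus AR(1) noise with coefficient 0.6).

```python
import random

def detrend(series):
    """Residuals from an ordinary least-squares line fitted against the time index."""
    n = len(series)
    mt = (n - 1) / 2.0
    my = sum(series) / n
    sxx = sum((i - mt) ** 2 for i in range(n))
    slope = sum((i - mt) * (y - my) for i, y in enumerate(series)) / sxx
    return [y - (my + slope * (i - mt)) for i, y in enumerate(series)]

def ar1(series):
    """Sample lag-1 autocorrelation."""
    n = len(series)
    m = sum(series) / n
    num = sum((series[i] - m) * (series[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in series)
    return num / den

def residual_ar1(series):
    return ar1(detrend(series))

# Synthetic series: linear trend plus AR(1) noise with coefficient 0.6.
rng = random.Random(7)
e, series = 0.0, []
for t in range(5000):
    e = 0.6 * e + rng.gauss(0, 1)
    series.append(0.01 * t + e)

raw, resid = ar1(series), residual_ar1(series)
```

In this synthetic case the raw series shows a much higher lag-1 value than the residuals because the trend itself is persistent, which is the reason for working with residuals.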

To be noted from the table data is that the longer term 947 year time period increases the calculated AR1 values in all proxies over the average AR1 for the 41 year subsets. The order of the AR1 values for the proxies remains the same for the long term and shorter term 41 year subsets.

More important, I think, is the observation that the between proxy AR1 variations are approximately 50 percent larger than those for the intra AR1 variations for the 41 year subsets over the 947 year time period. While this analysis is better suited to ANOVA, the evidence in my mind indicates that the variation in AR1 between proxies may well be an artifact of the proxy and not related to the AR1 of the temperature to which the proxy is supposedly linearly related. Recall that over the 41 year period from 1950-1990, the 447 GHCN stations had a mean AR1 of 0.135 and standard deviation of 0.130.

I am very suspicious of the Ljungqvist proxy 28 with an AR1 over 0.90. Such high AR1 values are seen more often, as I recall, when the annual values are obtained from an interpolation of decadal or longer data intervals. That is not the case that is ascribed to proxy 28 by Ljungqvist 2010.

The variations in the AR1 for the Ljungqvist proxies appear to be in line with those shown for Bristle Cones by LL in an above link.

## Layman Lurker said

Actually, I removed the bristlecones from the network before the exercise.

## Kenneth Fritsch said

Sorry LL, non-bristlecones in the link might have been my clue.

What do you think the large variation in proxy AR1s means?

## Ryan O said

A good guess might be that they are responding to something other than temperature in their local environment (precip or other disruption, perhaps) or there are systematic measurement errors for certain proxies (like treerings from cores that are not perpendicular to the growth axis rather than cut samples). It would be interesting to see the AR by proxy type and how ring-width measurements were conducted.

## Layman Lurker said

#19

Sure looks like a smoothed over signal to me Kenneth. In fact there are many time segments of annual values that seem to simply fall on multi year and even decadal linear slope with 0 deviation. Here is the last 500 years plotted for proxy#28 (Dongge cave). I usually graph these things using lines but the dramatically high autocorrelation of proxy #28 really hits home with the individual values plotted. The AR1 coefficient for the full raw archived proxy #28 data was a whopping 0.9633. I don’t know the first thing about cave proxies therefore don’t know how typical this is.

#21 and #22

Any literature I have read about tree rings chalks up “noise” as responses to non-temperature factors. I have not seen anything which says this noise must be “white” or even that it must be limited to AR1(.2). As to the AR1 variability, I suppose if there are variable, non temperature factors which influence trw growth from time to time it stands to reason that the AR1 variability reflects the variable nature of these processes.

I did an ac histogram on the bcp’s as well but I see that I did not save an image. When I get time I will go through my stuff and re-do the image and post it. Off the top of my head I think that all but one of the bcp series exceeded 0.7 as an AR1 coefficient. However when it comes to L10, his proxies were processed in a manner that did not involve a short centered PCA model, or a calibration screening, or a calibration weighting. The high AR value does not influence the weighting of proxy#28 in the L10 reconstruction. In fact I think Jeff (and maybe even Ljungqvist?) did the sensitivities on each of his proxies in the reconstruction and they held up well. Still, with all the stuff that Mann has put out trying to lay the foundations for tame, low value AR1’s as the convention for simulations and Monte Carlo’s it would be interesting to go through the other published reconstructions using tree rings just to look at the ac of raw proxies (and the residuals if possible) just to see if they shake out as Mann suggests.

## Layman Lurker said

Here is the scoop on proxy# 28:

So it’s 5 year resolution.

## Layman Lurker said

Apologies Kenneth, I downloaded the proxy #28 from Ljungqvist 09. IOW the wrong data set! Oh well, at least we learned a little bit about Dongge Cave.

## Kenneth Fritsch said

LL, I am going to link to a graph of the time series from Ljungqvist 2010 which appears 28th on the list of 30 proxies. It appears that it is an annual interpolation of more than 5 years. I now need to go back and make sure I did not select the wrong proxy in my R manipulations of the proxy data.

## Kenneth Fritsch said

LL, I rechecked proxy 28 and it is supposed to be an annual TRW proxy described by Ljungqvist as:

28. Dulan, NE. Qingfhai-Tibet.Plat 36N 98E TRW Annual

The image in the link below strongly suggests that there was a 10 year interpolation of data. Why would one do that for a tree ring series where the resolution is annual?

## Layman Lurker said

#27

I think it really depends on the type of product an analyst may be looking to produce. Once you start tossing 10 year data with interpolated annual values into a meat grinder with other data series that use annual means as values, then I think things can get dicey real quick. For a 2000 year reconstruction like L10 where no individual series had a weighting of more than approx 0.035 then it may not be too bad if expressed as standard units, but when it comes to calibrating to align and scale the series with a narrow calibration period that may be another issue entirely.

## Kenneth Fritsch said

I wanted to assure myself that the data I was using was the same as represented in the Ljungqvist 2010 paper and to that end I went online and purchased a copy of it. I found a minor problem in the paper where latitude and longitude of the proxy locations were represented by designations such as 54 degrees and 92 minutes north. That is a bit like saying it is 92 minutes past 4 when it should be 32 minutes past 5. I think the confusion comes from the mixing of decimal degrees and degree/minutes. Anyway for my distance calculations in my analysis here it will not make a major difference.
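The coordinate mix-up can be illustrated with a small conversion sketch. The function names are hypothetical, and the reading of "54 degrees and 92 minutes" as really being 54.92 decimal degrees is only the guess made above, not something confirmed by the paper.

```python
def dm_to_decimal(degrees, minutes):
    """Convert non-negative degrees plus arc-minutes to decimal degrees."""
    if degrees < 0:
        raise ValueError("pass the hemisphere separately; degrees must be >= 0")
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be in [0, 60)")
    return degrees + minutes / 60.0

def decimal_to_dm(decimal_degrees):
    """Split non-negative decimal degrees into (whole degrees, arc-minutes)."""
    whole = int(decimal_degrees)
    return whole, (decimal_degrees - whole) * 60.0

# If "54 degrees 92 minutes" was really 54.92 decimal degrees,
# the proper degrees/minutes form would be about 54 degrees 55.2 minutes.
deg, minutes = decimal_to_dm(54.92)
```

Calling `dm_to_decimal(54, 92)` raises a `ValueError`, which is the programmatic version of the "92 minutes past 4" complaint. The two readings differ by well under a degree of latitude, so the effect on pair-wise distances of hundreds of km is small.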

The paper is authored by Fredrik Charpentier Ljungqvist, who is with the History Department at Stockholm University, Sweden. It is refreshing to see a non-Team member write in a clear and non advocacy manner listing many of the uncertainties connected with the temperature reconstructions. I have reproduced some of these comments below.

Here Ljungqvist references the wide variability found in different reconstructions. I think what has slipped by many skeptics is the differences in Mann 2003 and Mann 2008 reconstructions in the large difference in variability. Mann attempts to minimize this obvious difference in his comments but that in my view takes the audacity of a Mann to do.

The author makes a good point about the problematic early ending of the proxies.

Clearly the author is not “hiding the decline” here but strongly pointing to the instrumental splice at the end of most reconstructions.

The author brings forth the linearity issue and the calibration period that lacks the warming of the past two decades.

The author here, unfortunately, only touches on the issue of reconstruction amplitude suppression.

The author reiterates what the “hide the decline” was meant to hide. Let us look at reconstructions without the instrumental period tacked onto the end.

## Kenneth Fritsch said

I have also gone back to the proxies found in Ljungqvist 2009 and found some annual proxies (that were not replicated in L2010) that will expand my analysis of comparing the series correlations versus distance separation between proxies and GHCN stations. I plan to show that analysis here in the next day or two.

## Kenneth Fritsch said

I have finished my initial analysis of the combined Ljungqvist 2009 and 2010 annual proxies for the relationship of series correlations versus distance separation of the proxies for comparison with the same relationship for the GHCN station. I will not be showing the GHCN results as those results remain the same as shown previously. I used 41 year subsets of a 990 year continuous period for the proxy data that was reduced to Z scores for the longer period. In each period I determined the series correlation to distance separation for 300 pairs of proxies derived from 25 proxies. I further reduced these pairs to those with less than 3000 km separation; the distance found to be limiting in the case of the GHCN station data. Finally I calculated for each proxy pair the mean and standard deviation of the 24 subsets of 41 years each.

The proxy locations and descriptions, along with AR1 results, are listed in the first link below.

The second link below shows a graph of the relationship of interest for the 45 pairs of proxies that were separated by less than 3000 km, and a table showing in more detail the 5 pairs with the highest correlations. I was able to obtain more and closer pairs by using both the Ljungqvist 2009 and 2010 proxy data. While I obtained higher correlations for the closely spaced proxies, the overall correlations are much reduced from those obtained from GHCN station data, and that includes the closer spacings here.

This time I calculated the AR1 for the proxy series over 990 years for both the residuals (from regressing the series versus years) and the series itself. It can be seen in the table that the residual and series AR1s are not much different. The auto correlations have values over a wider range than seen with GHCN station data. What I also found interesting was that even in the closely spaced pairs (the Yamal pair being different proxies at the same location) the auto correlation values are substantially different in 3 out of the 5 cases.
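For readers who want to try the same comparison, here is a minimal sketch of the two AR1 estimates, assuming AR1 is taken as the lag-1 sample auto correlation and the residuals come from an ordinary least squares line fitted against year. The function names are hypothetical and the original calculation may have used a different estimator.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

def ar1_series_and_residuals(y, years):
    """Return (AR1 of the series, AR1 of the linearly detrended residuals)."""
    y = np.asarray(y, dtype=float)
    years = np.asarray(years, dtype=float)
    slope, intercept = np.polyfit(years, y, 1)  # straight-line fit vs year
    resid = y - (slope * years + intercept)
    return lag1_autocorr(y), lag1_autocorr(resid)
```

When a series has little long-term trend, the two numbers should come out close to each other, which is consistent with what the table shows.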

In the future, I want to look more closely at the differences in series auto correlations between GHCN stations and proxies, and to look at higher order correlations and the partial auto correlation values and patterns for these series. The partial auto correlation at a given lag measures the correlation that remains at that lag after the effects of the shorter lags (starting with AR1) have been accounted for.
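As a rough illustration of what the partial auto correlation looks like in practice, the sketch below computes it from the sample auto correlations using the Durbin-Levinson recursion. The function names are hypothetical, `max_lag` is assumed to be at least 1, and this is one of several possible estimators rather than the one planned for the analysis.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([1.0] + [np.sum(d[k:] * d[:-k]) / denom
                             for k in range(1, max_lag + 1)])

def pacf(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag via Durbin-Levinson."""
    rho = acf(x, max_lag)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    phi[1, 1] = rho[1]
    out[1] = rho[1]  # at lag 1 the partial and ordinary values coincide
    for k in range(2, max_lag + 1):
        prev = phi[k - 1, 1:k]
        num = rho[k] - np.sum(prev * rho[k - 1:0:-1])
        den = 1.0 - np.sum(prev * rho[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = prev - phi[k, k] * prev[::-1]
        out[k] = phi[k, k]
    return out
```

For a series that is purely AR1, the values beyond lag 1 should hover near zero, so sizeable values at lags 2 and up would point to structure that an AR1 model does not capture.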

## Layman Lurker said

Kenneth, correct me if I’m wrong in the following suggestion. Eyeballing your correlations vs distance I think you should consider a different regression model for fitting to your distance correlations (I am assuming it was linear). If you have not fitted a proper model then the residuals are unlikely to yield useful information. Consider the types of functions for fitting the plot of distance correlations that Ryan uses in this post. I believe he links to his code in the post so you should find the functions there.

## Kenneth Fritsch said

LL, I am actually more interested here in denoting differences that I see between GHCN station and proxy data. A fitting of a model to the data is best attempted when one can deduce some physical meaning to what is shaping the relationships. I suspect that the latest graph of the L0910 proxy relationships could be considered, on one hand, an exponential fit or, more likely, a case where the relationship of series correlation to distance separation quickly shuts down once one gets further than walking distance from a proxy. You then have a sharply declining straight line down to a breakpoint where the second line has little or no slope.

If I had shown a similar graph for the GHCN station data extending beyond 3000 km, the relationship would have had a similar shape, with the breakpoint to zero trend occurring at approximately 3000 km for station data and at less than 500 km for the proxy data.
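One simple way to put a number on such a breakpoint is a "broken stick" fit: a straight line that declines with distance up to a breakpoint and is flat beyond it. The sketch below is an assumption about how that could be done, not something that was run for this analysis; the function name and the grid of candidate breakpoints are hypothetical.

```python
import numpy as np

def fit_broken_stick(dist_km, corr, candidates):
    """Fit corr = intercept + slope * min(dist, d0), flat beyond d0.

    Tries each candidate breakpoint d0 and keeps the one with the
    smallest sum of squared errors. Returns (d0, intercept, slope).
    """
    dist_km = np.asarray(dist_km, dtype=float)
    corr = np.asarray(corr, dtype=float)
    best = None
    for d0 in candidates:
        x = np.minimum(dist_km, d0)  # distance is capped at the breakpoint
        A = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(A, corr, rcond=None)
        sse = float(np.sum((corr - A @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(d0), float(coef[0]), float(coef[1]))
    _, d0, intercept, slope = best
    return d0, intercept, slope
```

Running this separately on the station pairs and the proxy pairs would give two breakpoint estimates that can be compared directly, although with only 45 proxy pairs the proxy estimate would be quite uncertain.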

I think this difference is due to the proxy response noise, as Ryan O notes above and as I have suggested previously. Based on what you and Ryan O have stated, and what I recently read in MW 2010, I think that a good look at the differences between station and proxy data auto correlations is in order. Fitting ARMA models to station and proxy data might be instructive.

What really puzzled me was that the two Yamal proxies, which give a reasonably good series correlation (for proxies), had very different series AR1 values. I strongly suspect I will not see that in GHCN station data.

It is also important to note that at the same proxy location, as in the case of the two Yamal proxies, the climatic non temperature noise can be very similar, and we are then perhaps looking only at measurement noise. It is also important to remember that a correlation of 0.85 means one proxy response explains about 70% of the variance of the other, while correlations of 0.70 and 0.35 explain only about 50% and 12%, respectively.
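The percentages come from squaring the correlation, since the fraction of variance in one series explained by a linear fit to the other is $r^2$:

$$
r = 0.85 \Rightarrow r^2 = 0.7225, \qquad
r = 0.70 \Rightarrow r^2 = 0.49, \qquad
r = 0.35 \Rightarrow r^2 = 0.1225
$$

So the rounded figures are roughly 72%, 49% and 12%.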

## Monex said

Once again I was surprised. Here are four synthetic PC1s that all have the same correlation with the monthly data, month by month. As you can see, not only are the PC1s different but their trends are also different.

Conclusions:

1. Very different tree ring width patterns can give identical correlations with a given set of monthly temperatures.
2. The fact that tree ring widths are correlated with monthly temperatures does not mean that tree ring width trends are correlated with temperature trends.
3. For any given set of monthly RW temperature correlations there exists a family of individual, different RW curves which will give the same correlations with the monthly temperatures within instrumental accuracy.