the Air Vent

Because the world needs another opinion

Replicating Steig’s Verification Statistics – Pt 1

Posted by Jeff Id on April 24, 2009

An excellent guest post by Ryan O. Ryan has spent a great deal of time attempting to replicate the correlation/verification statistics that Steig et al. use to assess the quality of their Antarctic reconstruction. This is a complex post which should take an experienced reader several careful reads to absorb. I have not verified the work, simply due to the incredible time and substantial calculation put in; his math processor has been sweating a bit. Ryan is trying to understand the statistical calculations which are the absolute center of Steig’s warming paper, yet the code used for those calculations has so far been refused.
I very much enjoyed reading it in the same way you might enjoy finding the answer to a crossword, but I’ll leave my own thoughts for later. In the meantime let’s revel in the mathematics one has to go through just to understand what should be a simple temperature trend calculation on the Antarctic. While you’re doing that, imagine you are the unpaid peer assigned to review the paper without the code used in making the described calculations.
———-

REPLICATING STEIG’S VERIFICATION STATISTICS: PART ONE

The verification statistics used by Steig et al. in their paper are r (correlation coefficient), r2 (coefficient of determination), RE (reduction of error), and CE (coefficient of efficiency). The first two – r and r2 – are probably familiar to most of you. They are simply the Pearson product-moment correlation coefficient and its square. The second two may not be as familiar, so I will explain them briefly.

RE and CE are defined as the following:

fig_1

Fig. 1

(In plain text: RE = 1 − Σ(xi − x̂i)² / Σ(xi − x̄C)² and CE = 1 − Σ(xi − x̂i)² / Σ(xi − x̄V)², with the sums taken over the verification period.)


Where xi and xi-hat are paired original/reconstructed data points in the verification period, xC-bar is the mean of the original data in the calibration period, and xV-bar is the mean of the original data in the verification period. As far as “calibration period” and “verification period” go, there is nothing special about either. You can define whatever calibration and verification periods you want, so long as they don’t overlap and there are at least 2 pairs of original and reconstructed data in each period.

The means (xC-bar and xV-bar) are often called the “climatological mean”. So when someone says, “Our verification statistics show improvement over the local climatology,” they really mean that the reconstruction explains more variation than fitting a zero-slope line. As such, RE and CE are not particularly hard tests to pass. You can be wildly off and still obtain positive RE and CE values. All you have to do is be better than that zero-slope line. So we might safely say that RE and CE do not have much statistical power.
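A quick toy example in R (my own, not from the paper) makes the point: a reconstruction with heavy noise slapped on top still posts positive RE and CE, because it only has to beat the flat climatological-mean line.

```r
### Toy illustration (mine, not from the paper): a very noisy
### "reconstruction" still earns positive RE and CE because the
### benchmark is just a flat line at the climatological mean.
set.seed(1)
ssq=function(x) {sum(x^2)}

orig=sin(seq(0, 4*pi, length.out=100))     # stand-in "observed" series
recon=orig+rnorm(100, sd=0.5)              # reconstruction with heavy noise

cal=1:50; ver=51:100                       # calibration/verification split
RE=1-ssq(orig[ver]-recon[ver])/ssq(orig[ver]-mean(orig[cal]))
CE=1-ssq(orig[ver]-recon[ver])/ssq(orig[ver]-mean(orig[ver]))
c(RE=RE, CE=CE)                            # both positive despite the noise
```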

Because of this, something else must be done to verify that the resulting RE and CE values weren’t merely due to chance. A typical method is to perform Monte Carlo simulations with red noise and calculate RE and CE for each simulation. The RE and CE numbers from the reconstruction must exceed a certain percentile of the RE and CE numbers from the Monte Carlo simulations (Steig chose the 99th) in order for there to be less than a 100-minus-percentile (in Steig’s case, 1%) chance that they were the result of a red noise process.
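The threshold test itself is simple to sketch. This is my own toy code, not Steig’s: an AR(1) series stands in for the observed data, and `re.stat` is a hypothetical helper implementing the RE formula above.

```r
### Sketch (mine) of the Monte Carlo significance test: the
### reconstruction's RE must beat the 99th percentile of RE values
### produced by red-noise "reconstructions".
set.seed(42)
ssq=function(x) {sum(x^2)}
re.stat=function(orig, recon, cal, ver) {
  1-ssq(orig[ver]-recon[ver])/ssq(orig[ver]-mean(orig[cal]))
}

orig=as.numeric(arima.sim(list(ar=0.5), n=120))   # stand-in "observed" series
cal=1:60; ver=61:120

### RE for 1,000 AR(1) red-noise series posing as reconstructions
sim.re=replicate(1000, re.stat(orig, as.numeric(arima.sim(list(ar=0.5), n=120)), cal, ver))
threshold=quantile(sim.re, 0.99)   # a real reconstruction's RE must exceed this
```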

The reason for using red noise is that most natural processes have some degree of persistence: subsequent values depend on previous ones even in the absence of any real underlying trend. Temperature falls into that category. So if you want to properly model a random, trendless temperature series, you can’t simply use Gaussian (white) noise. The resulting model will not capture the fact that real temperatures (and, hence, the residuals from a fit) at different times are correlated with each other.

A common choice in climate science is AR(1) noise – which is what Steig used. An AR(1) model uses the lag-1 autocorrelation coefficient, making the value at time t dependent on the value at time t-1, with Gaussian noise superimposed on top. However, I have yet to see a temperature series that actually is properly modeled by AR(1) noise. Everything I’ve looked at in Antarctica, the Arctic, and Siberia has more persistence than an AR(1) model allows.
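For the record, generating AR(1) noise from a series’ lag-1 autocorrelation looks like this (my own toy sketch; a synthetic AR(3) series stands in for a temperature record). The AR(1) surrogate reproduces the lag-1 correlation, but nothing forces it to match the longer lags – which is exactly the shortfall at issue here.

```r
### Toy sketch (mine) of AR(1) noise generation: take the lag-1
### autocorrelation, then simulate with Gaussian innovations on top.
set.seed(7)
x=as.numeric(arima.sim(list(ar=c(0.4, 0.2, 0.1)), n=500))   # stand-in "temperature" series

phi1=acf(x, plot=FALSE)$acf[2]          # lag-1 autocorrelation coefficient
sim=arima.sim(list(ar=phi1), n=500)     # AR(1) surrogate of x
```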

In order to determine the appropriate model, I used the following methodology:

  1. Find the autocorrelation factors for the AVHRR data for each gridcell.
  2. Perform 100 Monte Carlo simulations using the first n factors and R’s arima.sim function to generate AR(n) time series on each gridcell (550,900 total simulations per noise model).
  3. Find the autocorrelation factors for each simulation and sum the squares of the factors.
  4. Find the difference between the sum of the squares for the AVHRR data and the mean sum of the squares for the 100 simulations performed on each gridcell.

I performed this for AR(1) through AR(10) noise models, and also performed this for an ARFIMA noise model (which uses all of the autocorrelation factors for the time series – Steve McIntyre and Ross McKitrick used this noise model for their paper on Mann, which is where I got the idea), for a grand total of 6,059,900 Monte Carlo simulations (1,100 per gridcell). I then plotted the difference.
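For a single toy gridcell, the four steps collapse to a few lines. This is my own self-contained sketch using an AR(1) model and 100 simulations; the full functions I actually used follow below.

```r
### Self-contained sketch (mine) of steps 1-4 for one toy gridcell,
### using an AR(1) noise model and 100 Monte Carlo simulations
set.seed(11)
ssq=function(x) {sum(x^2)}
x=as.numeric(arima.sim(list(ar=c(0.4, 0.2)), n=300))   # stand-in gridcell series

### Steps 1 and 3: sum of squared autocorrelation factors for the data
obs.ssq=ssq(acf(x, lag.max=length(x), plot=FALSE)$acf)

### Step 2: Monte Carlo simulations using the fitted AR(1) coefficient
phi=arima(x, order=c(1, 0, 0))$coef[1]
sim.ssq=replicate(100, ssq(acf(arima.sim(list(ar=phi), n=300), lag.max=300, plot=FALSE)$acf))

### Step 4: difference between the data and the mean simulated sum of squares
resid.ssq=obs.ssq-mean(sim.ssq)
```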

fig_2

Fig. 2

This panel shows samples of the difference. AR(1) noise clearly understates the persistence – the residuals are predominantly positive. ARFIMA (bottom right) way overestimates the persistence. The model that matched the best was the AR(8) model (just above the ARFIMA plot). The sum of the residuals for AR(8) was -0.00125.

R Functions to perform the above:


### Sum of squares function
ssq=function(x) {sum(x^2)}

### Find the sum of squares of the autocorrelation factors for a set of simulations
get.mean.ssq=function(dat) {
  mean.ssq=vector()
  for(i in 1:ncol(dat)) {
    mean.ssq[i]=ssq(acf(dat[, i], nrow(dat), plot=FALSE)$acf)
  }
  mean.ssq
}

### Function to perform the Monte Carlo simulations
get.montecarlo=function(dat, method="arima", method.p=c(1, 0, 0), num=100, start.date=1982) {

  library(waveslim)   ### provides hosking.sim for the ARFIMA simulations

  ### Set up temporary lists
  sim=ts(matrix(nrow=nrow(dat), ncol=num), start=start.date, freq=12)
  sim.list=list()

  ### Use for ARIMA noise
  if(method=="arima") {

    ### Get the AC factors
    for(i in 1:ncol(dat)) {
      arima.coef=arima(na.omit(as.vector(dat[, i])), order=method.p)$coef[1:method.p[[1]]]

      ### Run the simulations
      for(j in 1:num) {
        sim[, j]=arima.sim(list(order=method.p, ar=arima.coef), n=nrow(dat))
      }

      ### Save the simulations
      sim.list[[i]]=sim
    }
  }

  ### Use for ARFIMA noise
  if(method=="hosking") {
    for(i in 1:ncol(dat)) {

      ### Get all the AC factors
      p.dat=na.omit(as.vector(dat[, i]))
      n=length(p.dat)
      acfs=acf(p.dat, n, plot=FALSE)[[1]][1:n]

      ### Run the simulations
      for(j in 1:num) {
        sim[, j]=hosking.sim(n, acfs)
      }

      ### Save the simulations
      sim.list[[i]]=sim
    }
  }

  ### Return the set of simulations
  sim.list
}

Having identified a more appropriate noise model, the next step was to duplicate the following excerpt from the Supplemental Information using AR(1) noise and then see if it changed substantially when AR(8) noise was used:

fig_3

Fig. 3

In order to arrive at this, Steig performed 1,000 Monte Carlo simulations per gridcell, calculated r2, RE, and CE, took the 99th percentile, and verified that these values were exceeded by the reconstruction r2, RE, and CE for the same gridcell. He did this twice, using a calibration/verification period of 1982-1994.5/1994.5-2006 for one run and reversing the order of the calibration/verification periods for the second run. He then plotted the minimums of (r2recon) – (r2sim), RE, and CE.
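The “plotted the minimums” step is just an elementwise minimum over the two runs. With random toy matrices standing in for the real statistics, it can be sketched as follows (my own illustration):

```r
### Sketch (mine) of taking the elementwise minimum over the two
### calibration/verification runs, with random stand-in matrices
set.seed(5)
stats.early=matrix(runif(12), nrow=3)   # toy stand-in for run 1
stats.late=matrix(runif(12), nrow=3)    # toy stand-in for run 2

map=stats.early > stats.late            # TRUE where the second run is smaller
stats.min=stats.early
stats.min[map]=stats.late[map]          # elementwise minimum of the two runs
```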

Using the same process as I described earlier, I performed 1,000 simulations per gridcell using the AR(1) model and an additional 1,000 simulations per gridcell using the AR(8) model for both combinations of the calibration/verification periods. I then calculated r2, RE, and CE (along with the Hurst coefficient and Kendall’s tau). I took the 99th percentile and compared it to the reconstruction r2, RE, and CE. In all cases, the reconstruction values exceeded the Monte Carlo simulations.

To see the difference in the 99th percentile values for r2 for the AR(1) vs. AR(8) models, I plotted those as well:

fig_4

Fig. 4

As you can see, in this case, it made very little difference. Note the scale as well – the black dots correspond to a whopping 0.019. So the good news for Steig et al. is that their choice of an AR(1) model has little impact on their verification statistics. This is not always the case, but it is definitely true for their Antarctic reconstruction.

R Functions:

get.stats=function(orig.cal, orig.ver, est.cal, est.ver) {

  ### Find common values (remove NAs without disturbing the pairing)
  map.cal=!is.na(orig.cal) & !is.na(est.cal)
  map.ver=!is.na(orig.ver) & !is.na(est.ver)
  orig.cal=orig.cal[map.cal]
  orig.ver=orig.ver[map.ver]
  est.cal=est.cal[map.cal]
  est.ver=est.ver[map.ver]

  ### Get calibration/verification period means
  orig.mean.c=mean(orig.cal)
  orig.mean.v=mean(orig.ver)
  est.mean.c=mean(est.cal)
  est.mean.v=mean(est.ver)

  ### Get residuals
  cal.r=orig.cal-est.cal
  ver.r=orig.ver-est.ver

  ### Calculate Hurst parameter (HurstK estimator, e.g. from the FGN package)
  hurst=HurstK(c(orig.cal, orig.ver))

  ### Calculate average explained variance [Cook et al (1999)]
  R2c=1-ssq(cal.r)/ssq(orig.cal-est.mean.c)

  ### Calculate Pearson correlation for verification period [Cook et al (1999)]
  pearson=cor.test(orig.ver, est.ver, method="pearson")
  r.ver=pearson$est
  r.ver.p=pearson$p.value

  ### Calculate Pearson correlation for all available data [Cook et al (1999)]
  pearson=cor.test(c(orig.ver, orig.cal), c(est.ver, est.cal), method="pearson")
  r.all=pearson$est
  r.all.p=pearson$p.value

  ### Calculate Kendall's tau for the verification period
  tau=cor.test(orig.ver, est.ver, method="kendall")
  tau.ver=tau$est
  tau.ver.p=tau$p.value

  ### Calculate Kendall's tau for all available data
  tau=cor.test(c(orig.ver, orig.cal), c(est.ver, est.cal), method="kendall")
  tau.all=tau$est
  tau.all.p=tau$p.value

  ### Calculate RE [Cook et al (1999)]
  RE=1-ssq(orig.ver-est.ver)/ssq(orig.ver-orig.mean.c)

  ### Calculate CE [Cook et al (1999)]
  CE=1-ssq(orig.ver-est.ver)/ssq(orig.ver-orig.mean.v)

  ### Return vector
  stats=c(hurst, R2c, r.ver, r.ver.p, r.all, r.all.p, tau.ver, tau.ver.p, tau.all, tau.all.p, RE, CE)
  stats
}

get.stats.matrix=function(orig, est, cal.st, cal.en, ver.st, ver.en) {

  ### Set up the placeholder matrix (12 statistics per gridcell)
  data.matrix=matrix(ncol=ncol(orig), nrow=12)

  ### Set up the variables for the stats function
  orig.c=window(orig, start=cal.st, end=cal.en)
  orig.v=window(orig, start=ver.st, end=ver.en)
  est.c=window(est, start=cal.st, end=cal.en)
  est.v=window(est, start=ver.st, end=ver.en)

  ### Call the stats function for each column
  for(i in 1:ncol(orig)) {
    data.matrix[, i]=get.stats(orig.c[, i], orig.v[, i], est.c[, i], est.v[, i])
  }

  data.matrix
}

The next step is to reproduce the graphic from the SI. Since I had already calculated all of the relevant quantities, this should be a snap. The first graphic I attempted to reproduce was the r2 graphic. The result:

fig_5

Fig. 5

fig_6

Fig. 6

At first glance, it looks similar. However, the more I looked at it, the more I began to suspect that something was not right. My plot has minimum values on the Weddell Ice Shelf and the Steig graphic has a continuous minimum all along West Antarctica and the Peninsula. The red filaments extending into West Antarctica from the Antarctic interior have a different shape. The lower-right quadrant is totally wrong – my plot has the maximum r2 values in that area. The Steig graphic has minimums there and maximums up by the Amery Ice Shelf. Lastly, the east coast drops off in correlation in mine, but does not in Steig’s.

Furthermore, when I plot RE/CE, I obtain no values less than zero. This is clearly different from (b) and (c) in the Steig graphic, which have definite areas of negative values:

fig_7

Fig. 7

So I tried all kinds of things. I tried deliberately calculating RE and CE wrong. I tried scaling RE and CE to the Monte Carlo values. I tried randomly subtracting/adding constants. I reversed the AVHRR data and the reconstruction in the stats function. No matter what I did, the resulting plot either looked totally wrong, or it did not have minimums in the same areas shown in (b) and (c) in the Steig graphic. I could find no way to reproduce those graphics.

Being frustrated, I decided to turn my attention to the PCA recon. I performed the same analysis on it and plotted:

fig_8

Fig. 8

Aha! That looks more like it. Note how the east coast retains decent correlation; I have a continuous minimum along all of the West and the Peninsula, a maximum at the Amery Ice Shelf, and the shape of the filaments extending into West Antarctica is correct.

Steig’s graphic is of the PCA reconstruction – not the TIR reconstruction.

Being suitably encouraged, I decided to try plotting RE and CE.

I received another surprise:

fig_9

Fig. 9

Holy crap.

The negatives are in the right spots, but they’re way negative.

Again thinking that maybe I had done something wrong, I recalculated. Same result. Thinking that Steig may have accidentally done something wrong, I tried all kinds of things – calculating RE/CE incorrectly, scaling RE and CE to the Monte Carlo results, calculating them using the TIR recon vs. the PCA recon, changing the calibration/verification periods – everything I could think of.

In the process of doing this, I realized something. Take a very close look at the color scale Steig used. I couldn’t see the difference on my laptop, but on a CRT I could: Some of the yellow in the color scale is below the zero line. If some of the yellow areas in the Steig graphic are actually negative, it helps reconcile the two images. Even with that, however, the images do not seem to be identical. West Antarctica is too negative in my graphic.

The closest I came to being able to replicate the Steig image was the following:

fig_10

Fig. 10

Note that by doing this I retain the strong negatives in between the filaments stretching from the interior into West Antarctica, and I have a local minimum along the west coast. The measurement, however, is physically meaningless, and I don’t see any way of accidentally plotting this.

A similar result is achieved by dividing CE by the 99th percentile of the Monte Carlo CE and then adding this to a 2*r2 plot:

fig_11

Fig. 11

This one works even better visually (keeping in mind that some of the yellow and all of the white in the Steig graphic is actually negative). The shape of the negative regions in West Antarctica is almost exactly replicated. However, the scale is -2 to 2 and the calculation itself is entirely gibberish.

CONCLUSIONS

  1. The TIR reconstruction produces decent r2, RE, and CE statistics when compared to the AVHRR data in the 1982-2006 timeframe (the comparison to the ground data – including comparisons back to 1957 – will be Part Two). Though the authors used AR(1) noise instead of the more realistic AR(8) noise, this conclusion is unaltered.
  2. The graphic in the SI is not the TIR reconstruction. It is the PCA reconstruction. I am assuming this was accidental, because the PCA reconstruction has horrible r2, RE, and CE statistics compared to the TIR reconstruction.
  3. I have no clue how images (b) and (c) in the Steig graphic were produced, but I do know that the color scale is misleading. That alone does not seem sufficient to account for the differences, as the strongest negative regions do not exactly match. The only way to replicate the appearance of the graphic is by using nonsensical equations.

  133 Responses to “Replicating Steig’s Verification Statistics – Pt 1”

    1. TCO said

      McI was very evasive on his overly modeled ARFIMA red noise. He did not like having his ass pinned down. Total weasely sea lawyer. Need to tape him up in the overhead.

    2. TCO said

      Seriously…it is funny how the same guys who complain about degrees of freedom and overly complex fits…at the same time want to overmodel red noise. BTW, I assume your period is a month. Realize McI was using a year as the period and modelling each series opposite itself. His noise was REALLY way overmodelled.

    3. Ryan O said

      #1 and #2: Yep, my period is a month. In this case, ARFIMA way overstates the persistence of temperatures. However, comparing this to the paleo stuff is apples and oranges. Were I to reduce the temperature series to decadal values and extend them back 2,000 years, an ARFIMA noise process might very well be appropriate.
      Koutsoyiannis has done a lot of work on long-memory natural processes and has found that fractional Gaussian noise (the ARFIMA model) is appropriate for many climate-related and hydrological processes. The waveslim R package itself includes a real-world example of when ARFIMA would have been appropriate (modeling uncertainty of flood basins after damming a river).
      The noise model to use depends on the situation and the time-averaging used. Over short time scales (like monthly), long-term memory might be (read: is) masked by the noise. Using longer averaging removes much of the Gaussian component, which increases the signal-to-noise ratio of the lower frequency component (the long term dependency). So drawing any conclusions about noise processes for long time series on the basis of short time series is inappropriate.
      Some Koutsoyiannis references:
      http://www.itia.ntua.gr/en/docinfo/511/
      http://md1.csa.com/partners/viewrecord.php?requester=gs&collection=ENV&recid=8543215&q=&uid=1056180&setcookie=yes
      http://www.itia.ntua.gr/en/docinfo/849/

    4. AndyL said

      Ryan

      This looks like a great piece of work, and it is certainly clearly presented so that people like me who do not understand the detail can follow it.

      Have you tried contacting Steig or his co-authors to ask the specific question about whether the correct graphics were used, and how they were calculated?

    5. Ryan O said

      #4 No, not yet. I was going to finish up doing the entire replication of their verification statistics first. That way, if I find anything else unusual, I can ask all at once.

      As far as the impact of the above replication on the paper, while it potentially shows some carelessness (or shows that I made a mistake), it does not affect the conclusion in the text. The conclusion in the text is that the 99th percentile Monte Carlo values are exceeded by the reconstruction for all locations for the TIR reconstruction (and they are). It is more of a curiosity that the graphic presented for the RE/CE calculations appears to be a mangled depiction of the PCA reconstruction rather than the TIR reconstruction.

      So it’s interesting, but it’s also something that doesn’t affect their main conclusion.

      It does, however, limit the usefulness of the PCA reconstruction as “confirming” the TIR results. Remember that they did 4 separate reconstructions: 1) the main (or “TIR”) reconstruction, 2) the PCA reconstruction, 3) a limited 15-predictor RegEM reconstruction, and 4) the AWS reconstruction. They presented these as four independent methods that confirm the result.

      The problem is that the AWS reconstruction does not show the same result as the TIR reconstruction unless the start/end dates are cherry picked. The PCA reconstruction has horrible verification statistics. No data was presented for the 15-predictor reconstruction, so the results cannot be independently verified because we don’t have the results against which to benchmark a replication.

      So this effectively eliminates the value of the “corroborating” evidence, but it does not make any statement on the validity of the main reconstruction. That will be Part Two. ;)

    6. Kenneth Fritsch said

      Ryan O, thanks much for the comprehensive analysis and explanation. It will take me another read before I can make any constructive comments. You have presented lots of R and lots of AR from which I can learn. Good presentation of the RE and CE statistics that are frequently mentioned at CA and used /resorted to by the authors of temperature reconstructions.

      PS: Could you please get a certain participant at this blog more interested in the analysis details so we would forgo the constant references to McI. It reminds me of rejected divorcés/divorcées who spend their hours heaping scorn on their former partners and boring the hell out of all who have to listen. And when they are constantly nagging one gets a clue to why they were rejected.

    7. Ryan O said

      #6 If necessary, I can glue an EAB to his face and bleed off the 150# air system. ;)

      In spite of the reference, his statement does have a good corollary: justification for choice of noise models should be provided. I don’t know enough about paleo to make any statement about ARFIMA with respect to millennial time series of temperature proxies. I think it is possible that ARFIMA could be justified. To be honest, though, I don’t presently have any interest in determining that. Antarctica is sufficient for the moment. ;)

    8. Page48 said

      Thanks for such a detailed explanation.

    9. TCO said

      [snip] – This is a technical post not a mudslinging contest.

    10. hswiseman said

      TCO will now pretend to have a thoughtful debate with Ryan O and then throw him under the bus on some other blog, a la his hit job on Lucia at Deep Climate.

    11. TCO said

      [sorry tco] – I’m back from the trade shows and not interested in name calling. I fixed the imposter in #9.

    12. TCO said

      9 was an imposter.

    13. hswiseman said

      And #10 is the juvenilia we have come to expect from a backstabbing chimp.

    14. Fluffy Clouds (Tim L) said

      R.O. this looks like confirmation of the problems of short term vs long term, high frequencies vs low frequencies.
      Thank you for the post.

    15. Layman Lurker said

      Good Point Tim. What you are saying I think is that the verification tests are showing a connection between the reconstruction and high frequency temp fluctuations rather than trend. It may be then that the RE and CE would be almost unaffected using Jeff’s -2C constant trend experiment. If this were the case, would the RE and CE statistics have any meaning?

    16. Jeff Id said

      Ryan,

      I’m not sure what the details of the PCA reconstruction are. It seems to me there is more than one possibility for how that was carried out. Can you explain your methods?

    17. Ryan O said

      #15 These verification statistics are all post-1982, so they are driven primarily by the selection of 3 PCs, not RegEM. The RegEM stuff (which is where Jeff noticed the high-freq correlations were selected over the low-frequency ones) will be the pre-1982 statistics.

      Basically, the statistics above show how well the PCs preserve the spatial and temporal information from the raw AVHRR. Based on the r^2 values, the PCs do okay in the Antarctic interior, but do not do well in the Peninsula, along the coasts, or the ice shelves (Ross, Weddell, Amery). To me, just looking at the r^2 plots alone should have been sufficient indication that 3 PCs was not the way to go and that higher-order components were needed.

      #16 The nice thing about the verification statistics is that I don’t have to know how the reconstructions were done to get the statistics. In this case, I just downloaded the cloudmasked AVHRR (convert to anomalies) and the PCA reconstruction from Steig’s site. Then, after loading the functions ssq, get.stats, and get.stats.matrix, I type the following into R:

      pca.stats.early=get.stats.matrix(avhrr.anom, pca.recon, c(1982, 1), c(1994, 6), c(1994, 7), c(2006,12))

      pca.stats.late=get.stats.matrix(avhrr.anom, pca.recon, c(1994, 7), c(2006,12), c(1982, 1), c(1994, 6))

      map=pca.stats.early > pca.stats.late ### logical map to get the minimum values per Steig’s method in SI

      pca.stats=pca.stats.early

      pca.stats[map]=pca.stats.late[map]

      You can then use the map plotting function (I made some more changes to allow 1-sided color scales – I should email that to you cause WordPress will bugger it up) to plot what you want. The matrix for PCA stats is 12 rows and 5509 columns. The rows are:

      1. Hurst coefficient
      2. Average explained variance
      3. r over the verification period
      4. p-value for r
      5. r over the entire period
      6. p-value for r
      7. Kendall’s tau for the verification period (non-parametric correlation coefficient)
      8. p-value for tau
      9. Kendall’s tau for all periods
      10. p-value for tau
      11. RE
      12. CE

      Just select the row you want and plot – i.e., for RE:

      plt.map(pca.stats[11, ], divs=1000)

      Voila!

    18. Ryan O said

      I should point out that the resulting r-values need to be squared when plotted for the r^2 plots, and that they don’t have the 99th percentile Monte Carlo values subtracted. But based on the Monte Carlo r^2′s being < 0.019, it doesn’t affect the plot (I checked). When I made the r^2 plots, I did:

      plt.map(pca.stats[3, ]^2, divs=1000)

    19. Jeff Id said

      Thanks Ryan,

      I didn’t know if you had done your own PCA recon; I think it was Roman playing around with that. Nice maps, by the way.

      The paper is oddly written. Before the verification maps above are presented, there isn’t any statement about what the verification statistics apply to. It’s assumed by everyone, including the referees doing the peer review, that it is the TIR data we’re seeing. ‘After’ the verification presented here, it is implied quite strongly to be TIR verification in the AWS data, without directly stating what it is.

      We apply the same method as for the TIR-based reconstruction to the AWS-data, using RegEM with k = 3 and using the READER occupied weather station temperature data to produce a reconstruction of temperature at each AWS site for the period 1957-2006.

      In my opinion the reason that the PCA was used and not described is that the recon was billed as showing a weather pattern which only shows up in the Chladni patterns created by PCA on spatially auto-correlated data. Not that there’s some hidden motive, it seems to me they could have simply admitted it was the PCA recon version.

    20. Kenneth Fritsch said

      Ryan O, have you read this thread at CA by Ross McKitrick and linked below on using r, CE and RE for analyzing regressions?

      http://www.climateaudit.org/?p=2418

      The users of RE and CE claim that r is focused on the higher frequencies, while RE and CE focus on the lower frequency components of the regression. McKitrick notes that these tests, when used in conjunction, should not qualify a regression because it scores high on one statistic and not the others, but because it scores well on all of them.

      The use of r leads directly to the coefficient of determination r^2, which in turn gives a measure of the amount of variation in the regression that is explained by the linear trend. Obviously, using CE and RE exclusively would seem to preclude calculating r^2. As a layperson, I judge that r is frequently cited in these analyses without noting the explanatory power of r^2. When the slope of a regression can be determined to be significantly, or even very significantly, different than 0, but r^2 is small, the slope significance takes on less importance with me.

    21. TCO said

      Ken: This is part of my point, why I find it odd that Jeff slams RegEM for wiggle-watching high freq and wants to just do period trend matches. Les denialisters seem not to think about le consistencee of their vous.

    22. Jeff Id said

      #21,

      Period trend matches are the point of the paper, not the choice of my own Id. I prefer higher order trend matching.

    23. hswiseman said

      Snip me too #10 #13

    24. Ryan O said

      #20 I hadn’t read it, and I just did – so I need to think about it for a minute.

      First on r/r^2 . . . in the case of the verification statistics, it’s not linear. You’re looking at the residuals between the actual data and the reconstructed data – that’s it. r/r^2 on the plots above is not terribly sensitive to the low-frequency trends because the noise is so high. To me, it’s not a very good measure of fit (unless, as you say, your r^2 is damned near 1) because it penalizes missing that +/- 5 C noise far more than it penalizes missing that +/- 0.05 C/decade trend – yet it’s the trend that’s important.

      For RE and CE . . . well, most of my statistical background is in process control. In process control, you deal with detecting shifts in means and range primarily, and then attempt to find a physical cause outside of statistics. It is not very theoretical and it’s replete with thumbrules and approximations because, quite simply, in industry, no one cares about the theory. They just want a pretty picture that a high school grad can understand and realize that something’s wrong.

      With that being said, when I compare, say, x-bar and R charts to RE and CE, my initial thought is that RE and CE suck. The reason is that statistical process control works on first establishing a baseline and then taking continual samples to observe when the mean or the range change.

      In the case of RE and CE, however, you don’t first establish a baseline and then take continual samples. You establish a baseline and then look at the rest of the data in one large chunk. RE compares the non-baseline data to the baseline; CE compares the non-baseline data to itself. You can’t find temporally localized problems that way unless you get lucky with your choice of calibration (baseline) and verification (non-baseline) periods. Not only that, but in my non-theoretical and unsophisticated view of RE and CE, they have the further problem that they cannot distinguish between a change in means (which may not be a problem since you’re looking for trends, which would manifest itself in a change in means) and a change in variation. The former does not necessarily cause problems; the latter definitely does. But RE and CE won’t tell you that.

      Additionally, I personally don’t care if both RE and CE are positive. To me, it’s far more important that they be close in value (unless you’re looking for massive trends). Since all the trends in climate are very small compared to the noise, if RE and CE are substantially different, you have a problem – even if one of them is close to unity and the other is still significantly positive. From what I can tell, however, the wicket seems to be positive RE/CE and positive r^2-r^2(noise). That wicket seems to ignore the information that a difference in RE and CE is telling you (big change in means, which could indicate a data problem, or big change in variation, which would indicate your calculated confidence intervals are too narrow).
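      A toy calculation (my own) makes the divergence concrete: identical residuals, but a shift in mean between the calibration and verification periods inflates RE while leaving CE alone.

```r
### Toy calculation (mine): same residuals, but a +1 shift in mean
### between calibration and verification drives RE and CE apart
set.seed(9)
ssq=function(x) {sum(x^2)}

ver.orig=rnorm(50, mean=1)              # verification data with shifted mean
ver.est=ver.orig+rnorm(50, sd=0.3)      # reconstruction with small residuals
cal.mean=0                              # the calibration-period mean

RE=1-ssq(ver.orig-ver.est)/ssq(ver.orig-cal.mean)
CE=1-ssq(ver.orig-ver.est)/ssq(ver.orig-mean(ver.orig))
c(RE=RE, CE=CE)                         # RE is inflated by the mean shift; CE is not
```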

      With all that B.S. being said, I need to think about Ross’s post at CA and do some review work.

      Damn if this climate stuff isn’t making me re-learn a bunch o’crap I forgot (and some I never knew) . . . :)

    25. Kenneth Fritsch said

      Period trend matches are the point of the paper, not the choice of my own Id. I prefer higher order trend matching.

      Not to worry, Jeff ID. To me what climate science sometimes seems to lack, and particularly the reviews of the science that claim to be comprehensive and balanced, is looking at the evidence and analyzing it by various methods in the manner of sensitivity testing. I found the Douglass and Santer papers of great interest in their differing approaches and what one could conclude from those approaches (and perhaps differently than the authors intended).

      I think the consistency falls out in the process: a paper is published with an approach, and then someone comes along with a different approach and points to a perhaps inconsistent result – or someone uses the original approach and results and interprets them differently. I suppose one could see lots of inconsistencies (and “partisan” ones) in the process of obtaining a more comprehensive view of what the science has to offer.

      What I do not find productive in these discussions is to personalize an approach and then beat up on the person in a short quip of a post that throws around some emotional terms like denialist (warmers).

    26. kingelf said

      #25 good point on the poop throwing.
      I wish more discussions on the blogs, like on here, were on topic with less name calling.
      In a perfect world this would not be needed at all.
      This is a very good post with the technical math.
      Thank you, Jeff ID and Ryan O.

    27. TCO said

      22. NO! Period trend matches are not the point of the paper. The period trend is the point of the paper. This is really not that different from using trees as proxies for thermometers. Instead, we are using the physical PATTERN of station temps as a proxy for the satellite area temps in areas that lack a station. Jeff, you have a little bit of a mental block here.

      23 (Ryan): It’s interesting hearing you state the issues with R2. Wegman said it had issues as well. McI never engaged in a real discussion of the issues but retreated to gamesmanship on wording and assertions (a sophomoric debating style rather than that of a curious seeker of insight).

    28. TCO said

      25. You should be able to disaggregate a bit of swagger from an insight or a point. Don’t be a tender flower. Search for insight rather than for reinforcement within your tribe. Be brave, little monkey.

    29. Jeff Id said

      #27 I’ve been accused of confusing the issues before but never on every single post. It grows tiring. You used the term “Jeff slams RegEM for wiggle-watching high freq and wants to just do period trend matches,” which I assumed was gerbil speak for linear curve fit. Perhaps your one sentence wasn’t enough for me to pick up your meaning; I’ve reread it several times and don’t see it any differently than the first time.

      You should know by now I don’t mind criticism, it goes with science but inaccurate and repetitive criticism is grating.

      If you take a few hours to look at the equations of RegEM and understand what a covariance matrix is, you can see why I discussed the high freq overwhelming the signal. It’s a bit more work than reading a blog post and criticizing anything you think you see but it will help you criticize the correct parts. The reason that I bring up the covariance matrix is that you are missing the point that the R2 issues and “wiggle watching” issues are actually the same thing.

      R2 problems are not a ‘deep’ mysterious concept to my knowledge but rather a simple one which can be understood by reading the equations. It’s one of the irritations of the whole hockey stick scam. The problems are well known and probably undergrad level issues. My guess in this case is that SteveM’s responses (that you work into every post) may have gone over your head because he wasn’t interested in explaining something you are unwilling to learn yourself.

      Since I have done and participated in probably two dozen different reconstructions (not all of which are posted) of the Antarctic temperature pattern, are you certain you are the one who should be explaining the meaning of the reconstructions to me? I’m not saying I get everything right but is it possible that the chosen one (who refuses to do math) may have missed a subtlety?

      Finally, I’m still learning from RyanO’s post above. There is a huge amount of detail in it which requires rereading, including the code for my own understanding. Ryan writes nice clean R code which is highly readable even though WordPress makes a hobby of destroying code formatting. I would humbly suggest that since even I am still learning from this post, perhaps the answers you claim to seek actually lie before you in the detail.

    30. TCO said

      I miss lots of subtleties. But I still got you on this thing, Jeff:

      1. You tend to set up simplifications that simplify away the “handles” implicit in the method you critique, since you don’t believe said handles are relevant (teleconnections, high freq qualification, etc.). The problem is that this is basically circular logic.

      2. R2 loves high freq. Wiggle matching loves high freq. Statistical significance (having lots of DoF) loves high freq. Trend matching is crap for stat significance.

      3. Don’t get your panties in a knicker about how much work you’ve done or how little I’ve done. If I have a relevant insight nonetheless, I still do. This is not a who’s read the paper more dick size contest. It’s not a dick size contest at all. It’s a salon of discussion.

    31. Layman Lurker said

      #30

      TCO, Jeff did not just pluck the high frequency stuff out of the air; he learned it through his work deconstructing Steig’s paper. He has shown through numerous posts that the linear trends and noise are almost unconnected, and yet it is the noise correlation which RegEM sees. For these reasons plus the autocorrelation, it is the method artifacts, not real data, which show the negative correlations (I assume this is your example of teleconnection).

      Jeff has shown us. You have waved your arms. Can you show us anything? Can you at least give us a more detailed explanation of your criticism? If you can’t, it means nothing. In the end all you are saying is that maybe Jeff because of uncertainty. Neither Jeff nor the rest of us needs TCO for that.

    32. Layman Lurker said

      Oops

      “In the end all you are saying is that maybe Jeff because of uncertainty.”

      should read

      “In the end all you are saying is that maybe Jeff is wrong because of uncertainty.”

    33. Jeff Id said

      #30, First, you got nutin’.

      1. Simplifications that simplify away handles?? One of us has no idea what you’re talking about. Teleconnections? It’s like you are intentionally over-complicating thermometers. The thermometer in my Illinois window knows pretty well what the temp is in my opinion. I certainly don’t need or intend to call California to correct it.

      The problem is not with high frequency qualification, it is with assuming a negative high frequency qualification (unrelated) has a negative long term trend qualification (negatively related). That is why my positive correlation reconstruction which takes into account all correlations was actually a superior reconstruction. It was impossible to convince you that I wasn’t throwing anything away.

      2. Not exactly correct, but closer. R likes the greatest variance, R basically ignores near-zero slope trends. I don’t know what you mean by trend matching being crap for stat significance. The trend is the point.

      3. You keep telling me my logic is wrong yet can’t seem to explain why. The point I’m making above #29 is that it’s apparent to me that you have more to learn by reading the information.

      BTW: Making the point and taking so much time to explain it recognizes implicitly that your insights are at times relevant. I’ve even offered an open forum for you to present your work if you decide to try some. You can bet you would get a hundred comments even if you claimed the sky was blue.

      You can hammer me one more time if you want on this thread, after that let’s focus on Ryan’s post.

    34. TCO said

      31. Jeff may even be right, that high freq is bad to use to train proxies. But he’s not automatically right. In addition, the RegEM has the ability to respond to things more complicated than simple geo-weighting, to show teleconnection, pattern frequency, etc. if that is important. That’s my point. Do you need more explication than that?*

      32. Maybe that is all I’m saying. But I think I am still helping by zeroing in on the issue analysis.

      *N.B. I am highly concerned about the 3 PCs. I think that is the bigger issue than the RegEM freq matching. (I’ve said it before, so I shouldn’t have to repeat myself, but I will, since internet people are so simple about things, so reflexive about repeating points and making others repeat theirs.)

    35. TCO said

      33.

      I still think you are missing some concepts, but it is too painful to tear it apart, especially with the one-post limit (I bet you would allow yourself a last word, haha, so I let you have the last word on the content, now.) Consider my points raised.

      Ryan has some good stuff.

      Yes, I bet that I would get all kinds of comments even if I said the sky was blue. That’s the nature of the beast and why I think the social aspect of skepticism is significant and in some cases more a part of the game than any real points raised. IOW, it shouldn’t matter if I kiss your ass and tell you it’s green…or kick it and tell you it’s blue. It’s still blue. But the tender flowers that need some time on the pond get all flummoxed from verbal aggression…

    36. Layman Lurker said

      The problem is TCO, that you throw this stuff out (teleconnections, pattern frequency, etc.) and seem to think this is a “relevant insight”. It is not. It is a vague generality. It is not connected to anything real wrt Steig’s paper. The only take home message I get from you is that there is uncertainty. Good one TCO.

    37. TCO said

      Layman:

      Sorry you feel that way, but not surprised. I am raising issues and concepts that bear further discussion, examination, etc. If you engaged in a fruitful walk down towards hashing this stuff out, I would gladly engage with you. Just sitting back and saying TCO didn’t spell it out more will not lure me into spending time to spell it out step by step.

      For one thing, you learn more if you engage. For another thing, I’ve done it in the past (for example with the publication argument) and we never moved to deeper discussion and in fact, I was asked for the same specifics multiple times. I HATE that (oh so common on the net) style of discussion where things never move forward and people just repeat the same basic points.

    38. Layman Lurker said

      #37

      Consider #36 an invitation for you to engage by showing how the issues and concepts you mention are connected to Steig. More specifically, how they cast doubt or uncertainty on Jeff’s analysis.

      Don’t just toss stuff out TCO. Anyone can do that. If your issues are legit then show us why within the Steig frame of reference. Then we can engage.

      GTG. Back this evening.

    39. TCO said

      Give me a little more to get a handle on than “tell me more”, Layman. Engage and react and drill down and ask. I want to see more than just “say it over again”, since in several cases I refer to things which I have already explained (on this site) at more length.

    40. Layman Lurker said

      #39

      My appointment is delayed so I have a few minutes.

      From your point #30 above: “You tend to set up simplifications, that simplify away the “handles” that are implicit in the method that you critique, since you don’t believe said handles are relevant (teleconnections, high freq qualification, etc.) The problem is that this is basically circular logic.”

      If trend is virtually independent from weather noise, how can weather noise correlations be useful to compute spatial weightings for trend in a reconstruction?

    41. TCO said

      40. Layman:

      A. The trends could be matched, but by chance. It’s really just one to two degrees of freedom if the trends are matched over a 20 year period. Instead, if you see that the two functions are correlated with some wiggle-matching, that gives you much more confidence that you can use one to predict the other in an out of sample situation. Think of the bcps and 20th century general trend in global temp for instance.

      B. Note, it very well MAY BE that Jeff has a valid point that high freq matching is not helpful. But I think it bears a bit more thinking. I’m not instantly sure that he is right and I don’t think he’s shown enough to prove it. I certainly have not shown the converse, since I even think he might have a point. Just one where he is jumping too fast to an aha and needs to slow down and better think/show/research (in literature) what it means to have high freq versus trend matching and which should be more trusted.

      C/D: I think this is enough, but note the issues of pattern-matching and telecons are different and can be dug into later more.
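The degrees-of-freedom point above — that a matched trend over a 20-year period carries only one or two degrees of freedom — can be checked with a quick simulation. This is a hypothetical Python sketch with made-up white-noise series, not anyone’s actual analysis:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1000 pairs of independent white-noise "20-year" series: how often do
# their fitted linear trends agree in sign purely by chance?
trials = 1000
years = np.arange(20)
sign_match = 0
for _ in range(trials):
    a = rng.normal(size=20)
    b = rng.normal(size=20)
    slope_a = np.polyfit(years, a, 1)[0]
    slope_b = np.polyfit(years, b, 1)[0]
    sign_match += slope_a * slope_b > 0

# Unrelated series agree on trend sign about half the time, so a matched
# trend over the period carries very little information on its own.
print(sign_match / trials)
```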

    42. Jeff Id said

      “Jeff has a valid point that high freq matching is not helpful”

      High frequency matching is helpful. It tells us when a weather station is close to the satellite coordinate. In the closest-distance reconstructions, the station data is strictly limited by the closest station. In the correlation reconstruction I did, the station data can take on a weighted shape according to high frequency correlation. If it hadn’t spread the trends across the entire continent I would really prefer that reconstruction. I’m considering redoing this recon’s weighting by correlation^2 or ^4 simply to better localize the station data.

    43. Layman Lurker said

      #42

      Exactly, in fact you explained in your post that the hf correlations were a quality piece of info and could be utilized – WITH THE PROPER WEIGHTINGS – to impart trend. No circular logic in this.

    44. TCO said

      42. But if you constrain the problem to nearest neighbor impacts, you are a priori waiving the possibility of teleconnections/patterns. Now, in fact they may not be significant. But better to have a method that would allow them to contribute to the solution (if significant, relevant), but converge to the trivial case if they are not.

    45. Ryan O said

      #44 On this, I would have to disagree entirely. Here’s why:

      First let’s think about this idea of teleconnection in the Mannian sense (to distinguish this from the quantum sense). The Mannian form of teleconnection relies on calculating covariance at time t to predict behavior at time t+x. No other knowledge is imparted to the calculations.

      Because no other knowledge is present, how does the math distinguish between a physically meaningful correlation and a spurious one? The answer is that it cannot. How does the math know if the real covariance between the data is non-stationary in time? The answer is that it cannot.

      In cases where the data series are complete enough to allow rejection of spurious correlations (since the chances of a spurious correlation decrease as the degrees of freedom increase) and the data series are complete enough to capture temporal changes in covariance, the method may work reasonably well. Tapio Schneider’s paper on RegEM shows exactly that. Keep in mind that for his paper he randomly selected 3.3% of the data for deletion (randomized both spatially and temporally). There were not large swaths of data missing, which significantly reduced the chances of both spurious correlations and undetected changes in covariance.

      http://authors.library.caltech.edu/3973/1/SCHNjc01b.pdf

      If those conditions are not met, then the chances of nonphysical results increase. It is the burden of the user of the algorithm to show that both of those conditions are met – yet this is never done. It is the burden of the user to show that any teleconnections have physical meaning – yet this is never done. Instead, the hand-waving argument that the algorithm somehow distinguishes between non-causal and causal correlations with precisely zero information with which to make that decision is advanced.

      Mannian teleconnection is a classic case of confusing correlation and causality. A “teleconnection” IS bunk unless a physical reason can be advanced. Period. Until a physical reason can be established, a teleconnection is nothing more than a curious correlation that may or may not have meaning. It is the “Dogs of the Dow” method applied to climate science.

      If a mechanism for physical coupling of distant stations cannot be advanced, then the argument that correlations between stations 2,000+ km apart can have predictive value is baseless.

    46. Ryan O said

      ^ that’s supposed to be “Dogs of the Dow” method, but I forgot to close the “a href” tag.
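Schneider’s randomized-deletion result versus the large-swaths worry raised above can be caricatured with a one-predictor regression imputation — a deliberately simplified stand-in for RegEM, with invented numbers, written in Python rather than R. When the covariance is non-stationary and an entire regime is missing, the imputation is confidently wrong:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=n)                 # the "predictor" station (always observed)
b = np.empty(n)
b[:500] = 0.9 * a[:500] + 0.2 * rng.normal(size=500)  # strong covariance early on
b[500:] = rng.normal(size=500)                        # covariance breaks down later

def impute_rmse(missing_idx):
    """Fit b ~ a on the observed points, impute the withheld ones,
    and return the RMSE against the withheld truth."""
    obs = np.setdiff1d(np.arange(n), missing_idx)
    slope, intercept = np.polyfit(a[obs], b[obs], 1)
    pred = slope * a[missing_idx] + intercept
    return np.sqrt(np.mean((pred - b[missing_idx]) ** 2))

scattered = rng.choice(n, size=33, replace=False)  # ~3% deleted at random
block = np.arange(500, 1000)                       # one 50% contiguous swath

print(impute_rmse(scattered), impute_rmse(block))
# The contiguous gap is trained entirely on the early regime and imputes
# the late regime with a covariance that no longer exists.
```

The scattered case leaves observed overlap in both regimes, so the breakdown in covariance is at least detectable; the contiguous gap hides it entirely — which is the distinction between Schneider’s 3.3% test and the Antarctic station record.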

    47. Jeff Id said

      #45, A “teleconnection” IS bunk unless a physical reason can be advanced.

      You said it softer than I did. I had to snip myself.

    48. TCO said

      45. If you have the right methodology, relevant teleconnections or pattern matches will express. Otherwise, you will still produce the trivial solution. In fact, getting the trivial solution in that case would be an argument against teleconnections. But instead you want to insist they don’t exist and not allow the possibility. This is your circular logic, little ’hopper.

    49. TCO said

      [snip] – WTF??

      Oops, sorry boomer fag, got you confused with our nubly proprietor.

      There I found it. The name calling isn’t required chosen one.

    50. Ryan O said

      You have me confused with someone who goes 3 knots to nowhere. ;) I prefer to cavitate as necessary.

      Your answer makes as much sense as saying (page out of Ross McKitrick’s book):

      1. Pick 42 stocks.
      2. Delete 75% of the price data, preferably in large chunks.
      3. Put them into RegEM.
      4. Assume the resulting imputation accurately predicts the values of the withheld data.

      RegEM will find correlations. RegEM will find teleconnections. RegEM will predict stock prices.

      If you believe that these teleconnections have physical validity, then I challenge you to use the result to decide where to invest all of your money. If you choose not to take my challenge, then, deep down, you, too, believe that this teleconnection crap is just that: crap.

      After all, isn’t this exactly (not even metaphorically, mind you) what the government is intending to do with its tax revenues?
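The stock-market point above doesn’t even require running RegEM: the raw correlations such an algorithm feeds on are already treacherous. Here is a small sketch (Python, with synthetic “stocks”; the numbers are invented for illustration) of how strong, entirely spurious correlations arise among independent random walks:

```python
import numpy as np

rng = np.random.default_rng(42)

# 42 independent random walks ("stock prices") -- no causal link between any pair
walks = np.cumsum(rng.normal(size=(42, 600)), axis=1)

corr = np.corrcoef(walks)          # 42 x 42 correlation matrix
iu = np.triu_indices(42, k=1)      # the 861 distinct pairs
print(np.max(np.abs(corr[iu])))    # strong correlations, all entirely spurious
```

An imputation algorithm handed these series would find plenty of “teleconnections” to exploit; none of them would predict anything out of sample.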

    51. Jeff Id said

      #50 nice. Sorry for the snip. I didn’t look up the code this time.

    52. Jeff Id said

      I looked up nubly (although I know the meaning) funny thing though:

      Nubly

      Commonly used while intoxicated due to the reduced amount of letters and impaired speech function.

    53. TCO said

      50. What part of my caveat did you not process? It’s not impossible for a sample to predict a population. Relevant statistics apply. This is why I am very against the 3 PCs, but not so excited by the pollywog silliness from Jeff about RegEM.

    54. Jeff Id said

      #53,

      Thermometers measure temperature at their own location well and poorly at other locations. Thermometers tell you the temperature; they put it right in the name ‘thermo-meter’. Teleconnections are the claim that thermometers don’t measure temp but rather are affected by distant temps; that’s what makes the concept crazy. Now certainly a prevailing weather pattern might push similar temps to a region far away, but in this case there are thermometers already there, ordained by god and physics to be measuring away. Letting an unrelated mathematical correlation coefficient control the result without verification is … um … not good.

      Since we have 42 thermometers and RegEM changes the reading of the thermo-meters based on distant thermometers, we should be screaming as loudly as a second grader whose Halloween basket was stolen. Instead we’re calmly discussing the possibilities that this statistical meat grinder might have worked. Claiming tele-connections somehow reduce the accuracy of the thermo-meter is something like watching an episode of ghost hunter.

      It’s mind sucking tripe! And it belongs in the waste basket but not until it has gone through the garbage disposal.

      ——-

      As a caveat RegEM does work just fine in some situations, apparently it was tested in cases where 3% of the data was missing. In this case over 50 percent of the data is missing. Not the same thing!

      Sorry for the tone TCO but occasionally even the chosen one can learn from the hopper.

    55. Ryan O said

      #53 Your words are empty. You refer to samples being used to predict population values. This is trivially true in some cases, not so trivial in other cases, and not true at all in some cases. Is it true for this case of Antarctic temperatures where the average completeness of station locations from 1957 to 2006 is less than 25%? Additionally, what relevant statistics are you referring to, and how are they implemented in RegEM? I will give you a hint: the number is less than one. RegEM is simply an algorithm that predicts values based on correlation. It has no ability to analyze whether those correlations are spurious or have physical meaning. None.

      Furthermore, the output of RegEM cannot be used to distinguish between physical correlations and spurious ones. You can only look at the difference for those data points where actual data exists. If there are not enough actual data points, then you cannot assess whether the population covariance has changed or if the correlations used for the imputation were spurious.

      You can only use statistics where enough data exists to be analyzed. There are no statistics to analyze data that does not exist.

      Therefore, unless there is a physical reason for correlations to exist and the covariance to remain constant during periods of missing data, then you cannot distinguish between physical and nonphysical results.

    56. Ryan O said

      Also, TCO, you’re forgetting that the burden of proof is on the one who says that a correlation has physical meaning – not the other way around.

      The correlation is by definition spurious until a physical reason is advanced.

    57. TCO said

      54. What part of “if there are not teleconnections the algorithm will just converge to the trivial solution” did you fail to grok?

    58. TCO said

      55.

      a. My empty words were in response to your stick example.

      b. Sure, some situations the sample can give a good prediction. Others a more uncertain one. That’s what stat tests are for.

      c. RegEM supplies the best guess of the missing value. Other stats need to be used to evaluate the risk of that number being off or of the overall recon being so.

      d. An adequately constructed algorithm will return suitable teleconnections when they are statistically valid and not when they aren’t. A physical rationale is not needed or perhaps optimal.

      e. I am not claiming that the current Steig system is adequate. Just reacting against the too-quick dismissal of even the concept of more heuristic reconstructions.

    59. Jeff Id said

      #57

      “What part of “if there are not teleconnections the algorithm will just converge to the trivial solution” did you fail to grok?”

      I’ve demonstrated the trivial solution; RegEM didn’t come very close. Me no grok! :)

      Hop!

    60. TCO said

      no…no you didn’t. Instead you have circular logic. You claim that you proved no teleconnections when using a test that already assumes no teles.

    61. Jeff Id said

      “You claim that you proved no teleconnections when using a test that already assumes no teles.”

      I promise that I never claimed that; you are correct that I assume thermo-meters know temp.

    62. TCO said

      I win. Go to your room. Hail to the Redskins!

    63. Ryan O said

      a. My empty words were in response to your stick example.
      No, your words are just plain empty. They are meaningless. They have nothing to do with whether valid “teleconnections” exist. You don’t get a free pass that you might have a valid point by tossing in a “relevant statistics” caveat without describing 1) what the relevant statistics are; and, 2) how said statistics would be able to distinguish between valid and invalid “teleconnections” in the absence of any explanation of physical cause. You are engaged in the same activity of bare assertion that you profess to detest in others.

      b. Sure, some situations the sample can give a good prediction. Others a more uncertain one. That’s what stat tests are for.
      Again, you miss the relevant point. The relevant point is that you cannot use any statistical test where data does not exist. How many degrees of freedom do you have at Mt. Siple between 1957 and 1980? Or Harry? Or Gill? All you can assess is how well the imputed and actual data compare where both actually exist. For some reason, you seem to either overlook or ignore this.

      c. RegEM supplies the best guess of the missing value. Other stats need to be used to evaluate the risk of that number being off or of the overall recon being so.
      See answer above.

      d. An adequately constructed algorithm will return suitable teleconnections when they are statistically valid and not when they aren’t. A physical rationale is not needed or perhaps optimal.
      No. Period. This is equivalent to saying if the statistical measures of correlation/covariance are strong enough then causality necessarily follows. This is a gross conceptual error. Correlation – however strong – does not imply causality.

      e. I am not claiming that the current Steig system is adequate. Just reacting against the too-quick dismissal of even the concept of more heuristic reconstructions.
      Any method that claims to be able to detect physically valid correlations without any knowledge of physical causality is over-reaching, without exception.

    64. Layman Lurker said

      The whole discussion seems rather academic when you consider Steig and the deconstruction that has been done so far. As I understand it, teleconnection is a climate, not a weather, phenomenon. Example: Florida CLIMATE has more in common with a sub-tropical location in the other hemisphere than it does with New York, in spite of the distance. The wiggles of weather noise are not likely to match up in this teleconnection very well. In fact the weather noise wiggles may have some vague correlation to New York.

      A “black box” weather noise negative correlation in Antarctica cannot seriously be considered anything but spurious. Linear trend positive correlations from coastal (non-peninsula) stations on opposite sides would be a plausible teleconnection IMO.

    65. Fluffy Clouds (Tim L) said

      More links for the climate-challenged!

      http://en.wikipedia.org/wiki/Teleconnection
      http://en.wikipedia.org/wiki/Covariance

      Claiming tele-connections somehow reduce the accuracy of the thermo-meter is something like watching an episode of ghost hunter.
      Nice one, Jeff! LOL :)

    66. Fluffy Clouds (Tim L) said

      Layman Lurker,
      This is what I just got out of this long brutal discussion.

      Example: sea water cools as it goes towards the Arctic (Atlantic), then sinks to the bottom of the ocean, becomes an under-ocean (underground) river, then flows to the Pacific somewhere near Japan. Now there is a teleconnection between the weather/climate of the UK and the weather/climate of Japan.
      I know it may be 100 years more or less, but this does give the PDO/AMO a teleconnection,
      and so explains the unexplainable temperature changes.
      Which also PROVES the climate models are 99.9% junk.

      I now see the points made by Jeff on the frequency high/low, and that you too, Dr. Lurker, have been making here. RegEM or any other WILL NOT HANDLE correlations of 100 years, let alone a few days (distance too, lol).
      And as always, maybe it’s just more smoke out the feathers.
      I think I have a new word for this… lol… Telecorrelations!
      Oh ya, all of this flows by the Antarctic (just to keep on topic, lol)

    67. TCO said

      a. Ryan, I am NOT asserting teleconnections. Honest. My point is more subtle. I am saying that if teleconnections DO EXIST and you have a method that eliminates them from influencing the recon, you are throwing out info. I don’t know whether they are there or not. Do you get it?

      For instance, if teleconnection does NOT exist, you would think that this would not be an issue since THERE WILL BE NO CORRELATION. If there is a correlation, there is a teleconnection! Now, I understand that you may be concerned about even more subtle issues: even though teleconnections don’t exist, leaving the machinery open in case they do allows false solutions. However, this needs to be attacked in depth and it is not immediately apparent.

      I bet this will annoy you, little shallow diver…but at least understand my stance when you criticize it.

    68. TCO said

      b. I think my reply was relevant to your stated example (a survey predicting a population)…but let’s move on past that…I want to engage your points that you wanted to make.

      Of course, the overlap period is all that can be used to qualify the proxy. The question becomes…do we ONLY have the information of the nearest neighbor to help predictions…or can the extra information contained in other neighbors, in patterns of stations, in the correlations to ALL the stations be used to give us a better answer. I honestly don’t know. But I don’t dismiss the idea out of hand. Certainly in certain types of pattern matching, etc. problems the correlations to multiple stations could be helpful. Does the physics of weather/climate here have such an impact? I don’t know. But an algorithm that would incorporate said, while converging to the trivial case if things are trivial could be powerful. Note: this is why I HATE the 3 PCs thing, much more than hate the RegEM.

    69. Ryan O said

      Jeff:

      Nubly

      Commonly used while intoxicated due to the reduced amount of letters and impaired speech function.

      The Naval usage of nubly is slightly different. Nub stands for “Non Useful Body”. Nubly is having the characteristics of a non useful body. ;)

    70. TCO said

      For instance…imagine a situation where you had 3 locations: P (predictand), N (nearest neighbor), and NN (next nearest neighbor). It is at least conceivable that using both N and NN to predict P would be most helpful. This is in simplest form a teleconnection. Now, there may even be things like weather patterns where a shift up in one implies a shift down in the other. For instance things like El Nino, like the jet stream position, etc. etc.

      Heck, you even have Steve McI referring to such an idea when he references the ITCZ (or whatever the initials are) wrt the Cobb corals. In that case, he wants to examine a pattern change driving warming/cooling at the observable, rather than general warming/cooling.
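The P/N/NN setup above can be written down directly. A hypothetical Python sketch (all names and coefficients invented) of the case where the next-nearest neighbor genuinely adds information, so a two-predictor fit beats nearest-neighbor alone:

```python
import numpy as np

rng = np.random.default_rng(7)
n_obs = 300
N = rng.normal(size=n_obs)                              # nearest neighbor
NN = rng.normal(size=n_obs)                             # next-nearest neighbor
P = 0.7 * N + 0.5 * NN + 0.3 * rng.normal(size=n_obs)   # P truly depends on both

def fit_rmse(X, y):
    """Ordinary least squares of y on the columns of X; in-sample RMSE."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sqrt(np.mean((y - X @ coef) ** 2))

ones = np.ones((n_obs, 1))
rmse_n = fit_rmse(np.hstack([ones, N[:, None]]), P)
rmse_both = fit_rmse(np.hstack([ones, N[:, None], NN[:, None]]), P)
print(rmse_both < rmse_n)  # True: the second predictor genuinely helps here
```

Of course, the same least-squares machinery would just as happily assign weight to an unrelated series, which is the spurious-correlation concern raised earlier in the thread.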

    71. TCO said

      (let me get through your high-quality post, grommet.)

      c. I think we are in essential agreement on the need for testing during overlap to establish the believability of a proxy in the out of overlap period.

    72. TCO said

      d. Chill out on the periods, stud. I understand that you are concerned about data mining, given the lack of a physical rationale. However, it is still possible that you will not get the optimal “betting” answer if you ignore correlations or require an insanely high individual rationale for each one. Simply saying that this is all temp data and that patterns can occur, which allow better prediction of an area using multiple stations, not merely the nearest neighbor…is a “rationale”. I think it becomes more an area of refinement to look at the danger of data mining versus the danger of throwing out relevant correlations. Remember BOTH could be suboptimal.

    73. TCO said

      e. Reference d. What is your “bar” for correlations? This is all weather, right? Do we have to label every nuance of patterning? And shouldn’t the correlations go to zero if there is no significance? Again, I think this is a more nuanced issue than you assert with your RPM-like “period” dictates.

    74. Ryan O said

      TCO:

      1. If by “teleconnections” you mean “ability to use A to predict B when there is no underlying causal relationship”, I stand by my statement that this is a gross conceptual error.

      2. If by “teleconnections” you mean that statistics alone can demonstrate an ability to use A to predict B without any external information concerning causality, I stand by my statement that this is a gross conceptual error.

      3. If by “teleconnections” you mean “apparent in-sample ability to use A to predict B without any external information concerning causality”, then by prohibiting “teleconnections” you do indeed have the potential of censoring real physical relationships. Please note the caveats here. They are not the same as the caveats you proposed earlier, which placed your statements into #1 and #2.

      The problem with the approach of #3 is that if teleconnections are permitted without any quantitative (or even qualitative) physical relationship established, then you have no way to distinguish between spurious (non-physical) teleconnections and real physical causality.

      For example, I could replace some of the surface stations with stock prices. I’d be willing to bet that this affects the results. There is no physical reason they should affect the results, however, so I would not include them. I make a prior decision to exclude the possibility of teleconnections between stock data and temperature data.

      This is obviously an extreme example, but it illustrates the point that in order to have meaningful results, the researcher needs to make decisions on what type of data to include. Data that cannot (or is likely not to) have a physical relationship is excluded. The open-ended prior you propose can easily render the subsequent analysis meaningless.

      The fact that researchers do make decisions to censor input data is one of the reasons sensitivity analysis is important in evaluating the uncertainty of results. All researchers take their best shot and see what results. The good researcher then re-runs the analysis under the assumption that his best shot was not accurate. The really good researcher then attempts to quantify the differences with quantitative physical explanations, and, if no satisfactory explanation is found, he explicitly states this.

      When it comes to the specific case of Antarctica, your ideas concerning pattern matching are certainly not categorically invalid. However, with any complex system, many different patterns can be matched. For example, Roman indicated that we could perform an orthogonal rotation of the 3 eigenvectors and get different answers even though the input data is mathematically equivalent. RegEM is now matching a different pattern, though not necessarily a more physically correct one. We could also find patterns in seasonal correlations, or correlations within specific temperature ranges, or correlations by regressing the stations on varying lags of other stations. There are literally an unlimited number of patterns we could find, and each is likely to give different results.

      Keeping the pattern-matching open-ended makes the problem of the reconstruction intractable. The only way to limit the analysis to a calculable number of options is to use plausible (and preferably quantifiable) physical relationships to pare down the field.

      This is why the hand-waving arguments about the teleconnections by the RC crowd and other followers are just terrible. If I were to use a different implementation of RegEM, the teleconnections would be different. If I were to truncate the eigenvectors used to a different number, the teleconnections would be different. If I were to include raw AVHRR data rather than PCs, the teleconnections would be different.

      The sets of teleconnections are mutually exclusive, yet according to the RC crowd, they are all equally valid – because if they weren’t valid they wouldn’t have happened. Rather circular reasoning. When I challenged Gavin on this, he responded with a few other hand-waving arguments and left my actual question unanswered (see the “Antarctic Warming Is Robust” thread).

      This is why it is perfectly valid to set up an analysis that excludes “teleconnections” on physical grounds – most especially when you are attempting to perform a sensitivity analysis on a reconstruction that explicitly allows them.

    75. TCO said

      64.

      Weather and climate share a continuum. The period of observation, monthly, has both weather and climate relevance. Further, it is possible for climate changes to be evidenced in weather patterns (more frequent snow storms, etc.), thus looking at weather could give insights on climate.

      “Can not be regarded as anything but spurious”. That is an excluded middle type statement. You want absolute certainty one way or the other. Surely it is possible that large correlations have a rationale despite them not being a priori predicted. Throwing them out is a decision as much as leaving them in. Perhaps there are times when the best predictor of Orlando temp is DC temp and times when the best predictor is Houston. Different patterns of weather can occur, and allowing a more detailed climate field (multiple sensors to eval against) makes it possible to tell when one versus the other is the better predictor.

    76. TCO said

      You can have teleconnections in the ’tractor. Push flux down one place and it pops up elsewhere. Time dependent issues with xenon. Etc.

      P.s. this is a total strain to act like a wannabe, if that is what I’m doing.

    77. TCO said

      74. I think a lot of your kvetch comes down to the usefulness of the 3 PCs rather than of using multiple stations to train a given gridcell.

    78. Ryan O said

      TCO:

      Because you don’t like the “Periods” and assert that my “Periods” are not necessarily solid, I will reprint the relevant parts for discussion:

      d. An adequately constructed algorithm will return suitable teleconnections when they are statistically valid and not when they aren’t. A physical rationale is not needed or perhaps optimal.
      No. Period. This is equivalent to saying if the statistical measures of correlation/covariance are strong enough then causality necessarily follows. This is a gross conceptual error. Correlation – however strong – does not imply causality.

      e. I am not claiming that the current Steig system is adequate. Just reacting against the too quick dismissal of even the concept of more heuristic reconstructions.
      Any method that claims to be able to detect physically valid correlations without any knowledge of physical causality is over-reaching, without exception.

      The reason for the “Period” in d) is because my response is categorically true. There is no way to develop an algorithm that, based on statistical tests alone, can distinguish between physical and spurious correlations between arbitrary data sets. If such a thing were possible, it would be the holy grail of scientific research. No thought required; just plug in your data and let the algorithm tell you the answer. Correlation, however strong, by whatever statistical means you choose, does not imply causality. And if it does not imply causality, then you cannot place any confidence levels on out-of-sample predictions (i.e., where data is missing).

      The subsequent answer to e) is simply an extension of the above. It is categorically true.
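
      [Editor’s illustration – a toy sketch of my own, not code from anyone in this thread. Ryan’s point that a strong in-sample r licenses nothing about out-of-sample prediction is easy to demonstrate: two random walks generated from completely independent seeds have no causal link at all, yet routinely show substantial correlation over a finite sample.]

```python
import random

def pearson(x, y):
    # Plain Pearson product-moment correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def random_walk(n, seed):
    # Cumulative sum of independent Gaussian steps
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

# No causal link whatsoever between these two series
a = random_walk(300, seed=1)
b = random_walk(300, seed=2)
print("in-sample r =", round(pearson(a, b), 3))
```

      [Rerun it with different seeds and r bounces around – sometimes strongly positive, sometimes strongly negative – which is exactly why correlation strength alone cannot distinguish a physical relationship from a spurious one.]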

    79. Ryan O said

      74. I think a lot of your kvetch comes down to the usefulness of the 3 PCs rather than of using multiple stations to train a given gridcell.

      You think, but provide no evidence. I intend (on the artificial temperature maps) to provide the evidence.

    80. TCO said

      78. Please respond to the more substantive points in 72. Consider…nearest neighbor is not an IRONCLAD F=MA physical rationale…and pattern matching of sensors is a weak rationale. The thing is what is your hurdle for rationale. This is not digital, but analog. Also, consider that the danger of false correlations can vary based on the algorithm. It may be possible to construct algorithms that capture relevant correlations without blowing up on false ones. Heck…I think if there IS a significant pattern/teleconnection in the data, my “day one hypothesis” is that it is meaningful and physical. I think overall algorithms can be constructed that avoid a lot of the spurious danger (not saying that was done here…)

    81. TCO said

      79: If you just used ALL the series, vice PCA, this would obviate Roman’s concern. Capisce?

    82. Ryan O said

      TCO:

      I’m not sure how to convince you of this, but there are no substantive points in #72 that support your assertion that a proper algorithm does not need physical causality to accurately assess using A to predict B. While your point that nearest neighbor is not ironclad is valid, it doesn’t have anything to do with your initial assertion.

      On #81, yes, I believe that would obviate Roman’s specific theoretical concerns on this point. I do not believe it obviates all of Roman’s (or anyone’s) concerns about using imputation algorithms for this kind of work.

    83. TCO said

      82.

      part one: well, “it’s temp, and temp can have correlated patterns that vary in geometry” is a rationale. Maybe a weak one, but still…one. Capisce?

      Part two: Good.

    84. Ryan O said

      #83 I’ve never argued that. I’ve instead argued that the mere existence of a pattern is not validation of its usefulness for prediction – regardless of how good the in-sample statistics are.

    85. Layman Lurker said

      #75 TCO

      “Weather and climate share a continuum. The period of observation, monthly, has both weather and climate relevance. Further, it is possible for climate changes to be evidenced in weather patterns (more frequent snow storms, etc.), thus looking at weather could give insights on climate.”

      TCO, you are stretching so far with this one the crotch of your pants has ripped. Climate, wrt the Steig paper, is defined by the 50 year temp trends. We have a reconstruction where the HF correlations get spread far beyond the correlations shown in the actual data (as shown by “Blendinator” and distance correlation scatterplots). Furthermore, any trend connection is incidental, based on proximity – but at distance? Are you actually suggesting that using these distant smeared HF correlations to impart trend might not be spurious? That it might be a “relevant” teleconnection? Or have you wandered off on a “vague generalities” tangent again?

      Regarding the second part of your response, if it makes you feel better I will again invoke the “uncertainty principle”.

    86. TCO said

      Layman:

      I agree that the assertion would be a stretch, were I trying to say firmly “this is a good rationale”. I think of it more as a counter to the firm counterstatement “there is no good rationale”. Capisce?

    87. Layman Lurker said

      TCO, I would submit that the uncertainty principle works against your case more than for it. If trend correlations and monthly frequency correlations are almost independent then how can coincidentally similar correlations in one time period give assurance that the relationship will hold in another time period where no data exists? Note I am not even allowing for distance smearing of HF correlations in this case.

    88. TCO said

      Layman: if the two instruments are unrelated, then why is there significant correlation? And realize that we are using TEMPERATURES and on the same continent. This is not comparison of stock prices to superbowl wins or even BCPs to the global climate field, but something much closer in connection: the comparison of points on the continent with multiple other points; the idea that more than just nearest neighbor can help predict response most efficaciously (otherwise you are throwing info away); the idea that patterns of response can have implications (this is not a stretch, geographic signature weather patterns happen all the time); and that if the points are unrelated, correlation should be low anyhow.

    89. Layman Lurker said

      TCO, Jeff has shown that the 50 year trend correlations and monthly weather correlations are virtually independent. If they were not independent then his analysis would have shown the contrary. Any similarity of these correlations with nearby stations is incidental due to distance. But because they are independent there is therefore no physical reason to believe the incidental similarity will hold in an unknown period. Ryan and Jeff have been arguing the same thing. Note I am not even suggesting that these nearby similarities be thrown out, just arguing the extreme case to support the obvious one – that distant smeared HF correlations of the reconstruction cannot predict trend.

    90. TCO said

      1. He’s shown that for NEAREST NEIGHBORS, which does not contain all the information. Remember that this is a problem with sparse coverage…so throwing out non-nearest neighbors is throwing out something that could help the problem.

      2. “Any similarity of these correlations with nearby stations are incidental due to distance.” Huh? What does this mean? incidental? Do you mean dice rolling chance? And “due to distance”…huh?

      3. Smearing is a real issue. This is why I don’t like the 3 PCs. My hope would be more that we use all the stations to predict all the gridcells. (Ones with poor correlation would get little weighting…and we also allow patterns and interactions to have an effect.)

    91. TCO said

      90

      (addendum to my 1.) And he’s shown it for nearest neighbors VERSUS the 3PC smear. The trend match was much more decent just satellite point versus station.

    92. Ryan O said

      TCO, I think we’re all talking past each other.

      What Jeff – and, more recently, I – have been saying is that blindly letting RegEM impute values based on correlations alone is subject to error, especially because the signal-to-noise ratio is so low. While most of the correlations RegEM finds are physically valid (else it wouldn’t work at all), it is not possible to distinguish physically valid correlations from physically invalid correlations based on the output alone.

      So what you have to do to assess the validity of the RegEM results is to compare the output of RegEM to something else entirely. Hence the nearest neighbor reconstruction. In that, Jeff prohibits long-range correlations – because a physical reason for a valid correlation has not been established – and compares the results. He also compares the result to other methods, such as regridding and using more PCs. In all cases, the RegEM result disagrees markedly, while many of the other methods are more consistent with each other.

      This would indicate that at least some of the correlations RegEM is finding are not physically valid because the RegEM result is the outlier.

      Does this prove that RegEM is junk? No, not yet. But it does give us much reason to be suspicious of these “teleconnections” (as does the correlation does not imply causation caveat).

      The difficult part of this analysis is to attempt to separate RegEM problems from PCA problems, and frankly, I don’t think that it is possible to separate the two without a significant amount of additional work. RegEM needs to be properly benchmarked for a situation like this and it has not yet been done. I don’t think, based on your posts, that you disagree strongly with any of the above.

      So I think for the most part that we’re all sort of saying the same thing in different ways and we need more information to firm up what we think we’ve discovered so far.

      The other thing I want to say is that you keep bringing up close correlations and patterns as implying a causal relationship. No one disagrees that this may be true. The disagreement is that this is not necessarily true – and the burden of proof always lies with the individual asserting that the correlation has physical meaning, not the other way around. RegEM is a case-in-point. Some of the correlations in RegEM must be spurious. Stay tuned for a post that reinforces this.

      We also have differing degrees of trust (or, at least, suspension of disbelief) in RegEM. However, there really is not enough quantitative evidence yet to support whatever beliefs we hold. I think we would be better off waiting for quantifiable evidence of how well RegEM performs in this type of situation before arguing it further.

    93. TCO said

      I’m glad to see you finally bend to my will, shallow diver.

      P.s. Come again on how a nearest neighbor recon with 3PC recon can show the lack of telecons? especially as would be in an unlimited (non PC) recon?

    94. Kenneth Fritsch said

      I do not have the full picture of how Steig et al derive or evaluate their monthly correlations from station to station, but a simple minded view of monthly correlation, without adjustments for autocorrelation and seasonal pairing, would obviously show a good correlation of warmer in the summer, colder in the winter and not as warm or cold in between. That by itself says nothing about a connection of those stations with regards to long term trends. What would annual correlations look like?

      If the correlation is simply used for filling in missing data, to the extent that the missing data does not control the trend, then why not extrapolate or use nearest neighbor stations?

      A correlation that does not have a physical basis is particularly fraught with danger since one has no assurances that the measured correlations will hold into that period where you have only surface stations.

      By the way, has anyone looked at how constant the high frequency correlations in the 1982-2006 period are if the period were divided in half or in thirds and compared?
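
      [Editor’s illustration – Kenneth’s sub-period check is straightforward to sketch. This is hypothetical code of my own, run on synthetic data, not on the actual AVHRR/station series: split the overlap into equal pieces and see whether the pairwise correlation is stable across them.]

```python
import math
import random

def pearson(x, y):
    # Plain Pearson correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_correlations(x, y, parts=3):
    # Correlation of x and y computed separately within each sub-period
    n = len(x) // parts
    return [pearson(x[i * n:(i + 1) * n], y[i * n:(i + 1) * n])
            for i in range(parts)]

# Synthetic monthly anomalies: a shared signal plus independent station noise
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(300)]        # 25 "years" of months
station1 = [s + rng.gauss(0, 0.5) for s in shared]
station2 = [s + rng.gauss(0, 0.5) for s in shared]
print([round(r, 2) for r in split_correlations(station1, station2)])
```

      [If the 1982–2006 correlations wandered badly between halves or thirds under a check like this, that would be direct evidence that calibration-period relationships cannot be trusted to hold in the period where only surface stations exist.]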

    95. TCO said

      A. weather patterns are non-physical?

      B. Would you only use one (the best) sensor as part of a tracking party or combine all the inputs in an integrative fashion?

    96. Jeff Id said

      #94 “By the way, has anyone looked at how constant the high frequency correlations in the 1982-2006 period are if the period were divided in half or in thirds and compared?”

      Not that I’m aware of. The AVHRR is so noisy that different regions have to have some problems in certain timeframes. Jeff C, who disappeared again, was starting to think it was some kind of filtering algorithm that improved the data so much but he didn’t elaborate.

    97. Layman Lurker said

      #90

      TCO, regarding your point #2 I should have said “coincidental” instead of “incidental”. When I spoke of distance I meant that both trend and noise correlations are distance related – closer stations tend to have stronger correlations than distant for both trend and noise.

      I know you have seen this but to refresh your memory here is a link to Jeff’s posts demonstrating the poor relationship between trend and correlation coefficients:

      http://noconsensus.wordpress.com/2009/04/14/warming-the-raw-sat-data/#more-3354

      There is also some interesting discussion on trend vs. noise correlations in the “Engineer’s reconstruction”.
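
      [Editor’s illustration – the trend-versus-noise distinction can be shown with a deliberately extreme toy case of my own construction, not data from the reconstruction: two stations given identical month-to-month weather but opposite long-term trends. Their high-frequency (first-difference) correlation is essentially perfect, yet their trends disagree completely.]

```python
import random

def pearson(x, y):
    # Plain Pearson correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(y):
    # Least-squares slope of y against its time index
    n = len(y)
    xbar, ybar = (n - 1) / 2, sum(y) / n
    num = sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

rng = random.Random(42)
weather = [rng.gauss(0, 2) for _ in range(600)]            # shared HF noise
warming = [0.002 * i + w for i, w in enumerate(weather)]   # imposed +trend
cooling = [-0.002 * i + w for i, w in enumerate(weather)]  # imposed -trend

diffs = lambda x: [b - a for a, b in zip(x, x[1:])]
# r of the first differences is 1 by construction: identical weather steps
print("HF r:", round(pearson(diffs(warming), diffs(cooling)), 3))
print("slopes:", round(ols_slope(warming), 4), round(ols_slope(cooling), 4))
```

      [An imputation scheme that weights stations by HF correlation alone would treat these two stations as interchangeable – and would smear trend from one to the other.]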

    98. Layman Lurker said

      Further to #97

      For a case example, Jeff noted the identical “trends” of Scott and Arturo but HF covariance was negative:

      http://noconsensus.wordpress.com/2009/04/04/whats-wrong-with-regem/

    99. Geoff Sherrington said

      Enough! Enough!

      First of all, establish that the ground stations have produced useful temperature data. Unless you can establish that for the Steig exercise, all subsequent work fails.

      My 2 bob’s worth is that the ground station data have been adjusted then used out of the realm of reality.

    100. curious said

      Geoff – OT but do you have pointers to any decent Southern ocean sea temp records from physical measures (other than Argo)? thanks C

    101. Jeff Id said

      #99 Any ideas how to do that?

    102. Ryan O said

      #101 The simplest way is to do the split reconstruction. Verify that using the ground stations allows you to replicate the withheld portion of the AVHRR data. This is what Steig did, but his choice of PCs/regpar resulted in terrible verification statistics.

      The other thing that I haven’t figured out – for both Jeff and Geoff – is how to properly define confidence intervals. Steig’s confidence intervals are simply the +/-95% based on the regression of the reconstruction. No allowance is made for the fact that the reconstruction does not match instrumental data (where instrumental data exists). To me, the confidence intervals should reflect both the uncertainty in the regression and the uncertainty in the reconstruction. I haven’t figured out how to do the latter.
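
      [Editor’s illustration – for concreteness, here is one reading of “the +/-95% based on the regression”: a naive OLS trend interval computed from residual scatter only. This is a hypothetical sketch of mine, not Steig’s code; it ignores autocorrelation and, per Ryan’s point, makes no allowance for error in the reconstructed values themselves.]

```python
import math
import random

def trend_with_ci(y, t_crit=1.96):
    # OLS slope of y on its time index, with a +/- interval built only
    # from the regression residuals (no allowance for autocorrelation
    # or for reconstruction-vs-instrumental mismatch)
    n = len(y)
    xbar, ybar = (n - 1) / 2, sum(y) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    b = sum((i - xbar) * (v - ybar) for i, v in enumerate(y)) / sxx
    a = ybar - b * xbar
    ss_resid = sum((v - (a + b * i)) ** 2 for i, v in enumerate(y))
    se = math.sqrt(ss_resid / (n - 2) / sxx)
    return b, b - t_crit * se, b + t_crit * se

# Synthetic 50-year monthly anomaly series with a small imposed trend
rng = random.Random(3)
y = [0.0001 * i + rng.gauss(0, 1) for i in range(600)]
slope, lo, hi = trend_with_ci(y)
print(round(slope, 5), round(lo, 5), round(hi, 5))
```

      [Widening this to reflect reconstruction uncertainty is the unsolved part; one rough route would be to inflate the residual variance by the verification-period misfit, but that is exactly the quantity the regression-only interval leaves out.]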

    103. Ryan O said

      #99

      Hey, Geoff . . . do you have a particular reason for believing the ground data has been altered/modified? If you do, I’d like to look into it. For the most part I’ve been going off the assumption that, while there are known problems in the ground data, the overall trend for stations with long history is reasonably accurate.

    104. TCO said

      97/98 (Layman):

      A. Well, no duh…that stuff further away is less correlated. This sorta makes sense from every day experience. My question to you and Jeff…is what are you worried about then? If it’s less correlated, why do you have to come down on top like a commissar and only allow nearest neighbors?

      B. Think about a situation with two evenly spaced neighbors 200 miles from a location. Don’t you think that using both sensors to predict the location will give a superior prediction than just picking one? Now move one foot closer to one sensor and one foot away from the other: Don’t you think using both sensors will help? Now move one to 50 and one to 150 miles. Don’t you still think that some weighted average of both sensors will best predict the given point?

      C. And distance is not the only relevant thing. direction (or really type of location) is important as well. Imagine predicting Point Loma, San Diego. You can go 120 miles up the coast to coastal LA…or 120 miles east to Borrego. Will their impacts be equal? Isn’t it even possible that a further coastal sensor could be a better reference than a closer inland one? However, isn’t it still possible that combining the two could give a more accurate prediction? Could the benefit (weighting) differ at different times of year or during different weather patterns?

      D. Citing a pair of stations that have co-trend and negative correlation is not sufficient to validate a general view of looking at trends versus correlation. First, a single example like that is useless for general assertions. You need a theoretical basis or a sufficient set of observations of impact to be statistically relevant. Second, a trend is WAY easier to match spuriously than a multiperiod correlation. Third, we have rightly criticized examples or dangers of such “second point” effects in bcps and the like.

      E. Note, I’m not asserting that Jeff has no point wrt high freq versus trends. I’m instead saying that he has not demonstrated HIS point. That’s why forcing himself to clearly describe his concepts in mathematical terms, to write them up, and to have them reviewed by real stats thinkers would be helpful.

      F. I KNOW I’m ignorant. What’s scary to me is watching guys like Jeff or Ryan, mucking around, having half an insight, but NOT knowing that they are ignorant and need to do a lot more to really even understand their own arguments.

      P.s. Ryan, I’m not surprised that you learned some things by engagement with Steig. AND. I DID TOO make the general point/concern of PC numbers versus RegEM.
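
      [Editor’s illustration – TCO’s point B is at least true in the idealized case of unbiased, independent sensor errors, where an inverse-variance weighted average always has lower expected error than either input alone. A toy Monte Carlo check of my own, with hypothetical numbers unrelated to the actual stations:]

```python
import random

def mse(estimates, truth):
    # Mean squared error of a list of estimates against the true value
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

rng = random.Random(7)
truth = 10.0
trials = 50000

near = [truth + rng.gauss(0, 1.0) for _ in range(trials)]  # sd 1.0 sensor
far = [truth + rng.gauss(0, 2.0) for _ in range(trials)]   # sd 2.0 sensor

# Inverse-variance weights: w_near = (1/1) / (1/1 + 1/4) = 0.8
combined = [0.8 * a + 0.2 * b for a, b in zip(near, far)]

print("near  MSE:", round(mse(near, truth), 3))      # ~1.0
print("far   MSE:", round(mse(far, truth), 3))       # ~4.0
print("combo MSE:", round(mse(combined, truth), 3))  # ~0.8, beats either alone
```

      [The catch is that real neighbor errors are neither unbiased nor independent – which is where the weighting question stops being trivial and the physical rationale starts to matter.]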

    105. TCO said

      “The other thing that I haven’t figured out – for both Jeff and Geoff – is how to properly define confidence intervals. Steig’s confidence intervals are simply the +/-95% based on the regression of the reconstruction. No allowance is made …”

      What does this mean, “of the regression”?

      Are you trying to make an envelope on the average trend line? Or to estimate the +- accuracy of the overall period trend?

      In either case, would agree that the proven ability of the sensors to predict things during overlap should be the basis of your ability to predict things out of sample. However perhaps that is already included within this “of the regression” thing of which you speak?

    106. TCO said

      I tried listening to some of the Heartland speeches. The preaching to the choir, the longwindedness, the lack of dispute were all things that turned me off…and I did not bother listening to too much. Listening to Schmidt talk about sun effects as if they were so trivial as to not need proof…turned my stomach. That and seeing McI scheduled nuts to butt with Watts-idiot.

      Jeff, you and Ryan may not have McI stats brains. But I at least think you try to figure things out and will report things whichever way they go.

      Somehow, the complex of all these guys patting each other on the back just disgusts me. They can’t bring it in real science…and they won’t call each other out. It’s really a social phenomenon. Was from the beginning with Steve meeting Ross on listservs. That’s why I love gumming up the works and pissing off all the old farts and nutters by calling them on their little monkey tribe silliness.

    107. Geoff Sherrington said

      I posted these on WUWT on 2.05.09.
      ………………………………….
      Geoff Sherrington (06:36:07) :

      Some actual data:

      http://i260.photobucket.com/albums/ii14/sherro_2008/MAWSON1955TO2006.jpg?t=1241270912

      http://i260.photobucket.com/albums/ii14/sherro_2008/DavisAntarctictempAnomaly.jpg?t=1241271003

      http://i260.photobucket.com/albums/ii14/sherro_2008/Macquariegraph.jpg?t=1241271107

      You tell me if it’s warming.

      How do you hide the disappointment that this is not happening?

      http://i260.photobucket.com/albums/ii14/sherro_2008/MOAHockystick.jpg?t=1241271264

      ………………………………

      Geoff Sherrington (20:56:52) :

      Re Mike Bryant (16:51:54) : 2.05.2009
      “Dr Watkins declined to release the temperature data to The Weekend Australian.”

      The Weekend Australian reporter can go to an outlet and obtain a BOM compilation on CD that reports the temperature data. Day by day, maximum and minimum, at the several Antarctic bases, with stops and starts over the years. If the BOM are backing off from the veracity of data, then plausibly they are knowingly selling a defective product.

      The reporter has been saved the effort. I posted some of the data above at Geoff Sherrington (06:36:07) : 2.05.2009. It’s the same data.

      For those with a forensic mind, there are several sources of data reports and some type of major discrepancy seems to happen in year 2007, which reports a lot hotter than any other year at Mawson. Note also that the global 1998 peak is absent; that there are probably changes in the types of instruments used; that there could be splicing to smooth the instrument transitions; and that as shown above, the locations of the weather station screens might not be optimum. There are a few reported effects of the height of the thermometer/thermistor above the surface in the range 0-2 meters, and the snow under the screens is subject to changes in levels. Also, on this continent of ferocious winds, the wind temperature variation that is measured could have been derived from heating/cooling effects that happened quite some distance away. Not easy to pin it to the lat/long of the base.

      So, yes, there are probably complications that would stay the hand of a good scientist from blurting out an unequivocal statement.

      …………………………….

      There is disquiet that the BOM had said to me that they have no control over what happens to their “primary” data once they pass it on to others. I have pointed out some errors attributed to NOAA within the Dutch KNMI database of up to 1 deg C (for Australia outside Antarctica) and I have looked at some GISS versus BOM data. There are differences of inexplicable origin. GISS seem to homogenise the BOM data which have already been homogenised.

      …………………………….

      I have no data in sea temperatures from the Southern Ocean, sorry. I have merely posted elsewhere that many of the cyclic effects supposed to influence weather are NH studies and there is a lot of important water around the Antarctic that seems to be understudied in detail.

      ……………………………

      Error bounds. There seems to be a climate science tendency to take data through several stages of processing then to do statistics on the final result that pertain to the final result rather than to the whole process. e.g. in GCM ensembles, people put error bounds around the ensemble average that exclude some of the ensemble means. I’ve ranted many times that the error bounds should go around ALL of the GCM runs, including those that the makers have rejected because they “don’t look right”. Same comment applies to comments above. You simply cannot use the imperative of the computer printout and the snow job of large number sets to do much about systematic bias. There is a gross shortage of simple, fundamental experiments that would detect and quantify bias. In the final analysis, we are arguing about bias, not the precision of data that are biased. It all goes back to instruments that can read to 1 deg under the best of circumstances. To quote 0.1 deg per decade as a finding is a gross misuse of numbers.

    108. Layman Lurker said

      #104

      TCO, I don’t think that Jeff is holding up “nearest neighbours” as the ultimate. It is one of several angles to give perspective.

      There is nothing wrong with these correlations when it comes to explaining monthly noise type patterns. Noise correlations just cannot be the only vehicle to distribute trend because the relationship is a loose one.

      Regarding your point C I agree. But as I have said before I think it is more likely that the linear trends will be somewhat correlated than the weather noise over such vast distance. The connection would be more like the general connection between subtropical locations thousands of kilometers apart. It is just as conceivable that the HF correlations could obscure a “trend” (tele?)connection between coastal stations on opposite sides of Antarctica.

      There is much more evidence of correlations and trend being disconnected than the Scott/Arturo example. I linked to another post. Many of the reconstructions suggest this. The -2C trend reconstruction shows spurious correlations.

      I don’t think that Jeff and Ryan think that they have a finished product yet. There will likely be more sensitivity analysis/reconstructions done which will further deduce the impacts of the various issues like HF correlations and low order processing.

    109. Geoff Sherrington said

      If you look at animations of cloud movement taken by satellite, there are commonly about 6-7 similar-looking weather systems moving around Antarctica. Since they extend south far enough to touch bases on the coast, I’d certainly expect there to be some correlation if you hit on the right frequency. What it means and how strong it is is an entirely different question.

    110. TCO said

      108. I don’t think he is either. But he often unconsciously makes assumptions in that direction. While you all see me as hectoring him and ruining the good feeling here and not doing work of my own…I am trying to get Jeff to actually be logical about his work.

      P.s. Hearing on the one hand that we don’t have anything publishable yet…and then on the other hand that Steig was definitely messed up and skewed the analysis…makes no sense.

    111. Jeff Id said

      #110 I assure you my assumptions are thought through and intended. You miss the point too often and that makes you appear to be hectoring.

    112. Ryan O said

      Good info, Geoff.

      I’m not yet sure how that would directly relate to this case, though. The data comes directly from BAS and is simply what is recorded at the stations; BAS doesn’t do any homogenization or other manipulation.

      Now, there are definitely problems with the data . . . the question is whether the problems in the raw data are of a magnitude to prevent us from extracting any real information. Based on the good match between the AVHRR data and the ground data, my initial impression is that the data is good enough to do an analysis – just perhaps not the way Steig did his analysis.

    113. Jeff Id said

      #112 #109,

      Yup. If the known signal cannot be reasonably extracted, a better-known signal isn’t of much use. If we run across a fix that makes everything run, we should then turn our attention to the raw data.

      I haven’t had time to check Geoff’s data at the links but I’m interested in comparing to nearby stations from Steig.

    114. Geoff Sherrington said

      112 Ryan O.

      If the BAS get “Primary” data from the BOM, then it is likely that there has already been homogenisation by the BOM, because they say that they do it, in an incompletely specified manner (without specific mention of Antarctica), in a paper by Della-Marta et al., Aust. Met. Mag. 53 (2004) 75-93.

      I have not confirmed whether BAS use this work at all. If they do, I have no confirmation that they use it unmodified. If they get it from KNMI, there are possible errors. If they get it from GISS or HADCRU, there are possible further adjustments. I simply do not know the paper trail.

      It seems quite hard to get any global temperature data in the form as read on the instrument without getting the original observers’ paper reports. So I suspect all of it of adjustment until it can be shown otherwise.

    115. TCO said

      Bald Assertion Man Sherrington (99):

      “My 2 bob’s worth is that the ground station data have been adjusted then used out of the realm of reality.”

      —————————-

      Then challenged to back it up, by others here. His support for his assertion is NOT facts or observations…but essentially an inability to prove the converse.

      —————————-

      Sherrington (114):

      “If…it is likely…I have not confirmed…I have no confirmation…possible errors…possible further adjustments…I do not know the paper trail…I suspect…”

    116. Geoff Sherrington said

      TCO 115

      I have provided one scientific reference from an authoritative source. That is a start of a proof, enough to warrant some extension. I am the type of scientist who leaves room for others to correct my errors when they have better sources than I do.

      If you have documented proof that the temperature data in question were NOT adjusted, then let’s hear it. But I will not be baited. Not my game.

    117. TCO said

      [snip] – g’morning tco

    118. TCO said

      :)

    119. Geoff Sherrington said

      117 TCO

      I read the original of your snipped comment and it is most unbecoming.

      I do have more extensive information and I have been requested not to use some to protect the authors. I was trying to use oblique terms to suggest that the Steig analysis might benefit from a look at the base data and its extent of adjustment. I did this in a neutral and friendly way.

      I have some data not in the public domain. I merely suggested that if you had contra data that were not restricted, then show it. My exercise was to encourage more data to come forth, not to have it held back on the excuse of a logical expression tactic.

      So you see, it is possible to write a logical post without problems arising from missed sedation medication.

    120. Jeff Id said

      Geoff,

      If you can share some of it that I can post, I’m interested in taking a look.

    121. Geoff Sherrington said

      120 Jeff

      Would you part with your email? There are large spreadsheets involved.
      Mine is sherro1 at optusnet. com. au

    122. TCO said

      Bullshit, Geoff. If you don’t want to talk about things, email the authors. You are on a blog. On the internet. With comments enabled. You make brave assertions and then don’t back them up. Why not just say, “I’m worried there might be an issue with the base data”?

    123. Ryan O said

      I am interested as well, Geoff.

    124. Kenneth Fritsch said

      I do have more extensive information and I have been requested not to use some to protect the authors. I was trying to use oblique terms to suggest that the Steig analysis might benefit from a look at the base data and its extent of adjustment. I did this in a neutral and friendly way.

      The authors of Steig et al. (2009) allude to potential errors in the raw data from both the AVHRR (cloud masking from Steig, and we have suspicions of satellite differences from Ryan O’s analyses) and the surface AWS stations (height differences, as I recall), and more than one might expect given the general tone of the paper.

      I think one would have to be rather unaware of Antarctic conditions not to suspect that there are larger temperature measurement errors (and potential need for adjustments) there than at my local weather station up the street. One only hopes that those errors are not biased in one direction; but even if not, the error spread could add some uncertainty to the trend slopes over time and to determining at what point those trends become significantly different from zero.

      Perhaps we could outfit some of the elite corps of the Watts CRN quality team to do audits down there.

    125. TCO said

      Ken: You basically said the same as I, but less punchily.

    126. Kenneth Fritsch said

      The short version, Geoff, is that I agree with your approach and that we never get enough good audits of the temperature measurements that are used in climate science and reconstructions.

      Vive la team Watts and now Sherrington.

    127. TCO said

      [snip] He mouths off about all kinds of intermediate results and then gets pissy when JohnV adds them up. He still hasn’t retracted his solar silliness. The guy is a complete lightweight moron. Steve McI not calling him out just shows how low the Fields medal prize wannabe has fallen. [snip again]

    128. TCO said

      [snip] find a better way to say it TCO, I don’t like playing the job of dad to an out of control teenager. You are welcome to express your opinions in a reasonable manner. This is a waste of my time.

    129. TCO said

      Sorry.

    130. Geoff Sherrington said

      Right now I am part way through preparing a private correspondence to Jeff Id so that he can use his judgement on his blog as to the consequences of use of certain data. There are many discussion points and some large spreadsheets to make valid deductions and there are also past misstatements to be corrected.

    131. TCO said

      Weak.

    132. lweinstein said

      While it may be of some technical interest whether the temperature overall increased or decreased, the discussion above concerns a very small change either way (on the order of plus or minus 0.1C per decade or less). The actual main point here should be that the Antarctic overall has not seen the large increase predicted by modelers under the AGW hypothesis. The excuse now given is the ozone hole. The rest of this discussion is missing the forest for the trees.

    133. Geoff Sherrington said

      Lweinstein June 6, 2009 at 9:57 am

      It’s not just Antarctica. Islands like Macquarie and others in the 50s latitudes are also reluctant to warm. I can show you a number of rural stations in Australia up to 15 deg S that have refused to warm as well. Another worrying point is that the 1998 global peak is absent or very hard to see in these examples. So much work has been NH-centric; poor old SH has been taken for granted too much.
