the Air Vent

Because the world needs another opinion

Slow Feedback

Posted by Jeff Id on August 19, 2011

A recent Lindzen and Choi paper was highlighted at WUWT.  I’ve read it a couple of times and found it to be a comprehensible thing with not many field specific statements to grock the point.A lot of acronyms but those also are workable with time.  What they say, if I am understanding, is that mass regression of feedbacks vs forcings are basically an ill-conditioned matrix( with wildly sensitive solutions) because there is little signal variance to regress on. Therefore this paper chose to do their regressions on regions of the data whereby the greatest change occurred. A large change in temperature should coincide, with lag, to a large change in feedback (emissions of radiation to space).  A near zero change would have only noise to work with on feedback measurements.  So we should seek out the large changes for a proper analysis.  Sensible enough.

The reasoning for this is that, by definition, a temperature change is required to produce radiative feedback, and so the greatest signal (and least  noise) in the estimation of feedback should be associated with the largest temperature changes. Thus, it is advisable, but not essential, to restrict oneself to changes greater than 0.1oC; in fact, the impact of thresholds for ΔSST on the statistics of the results turns out, however, to be minor (Lindzen and Choi, 2009).’

The impacts of delta SST may be minor but it apparently is important to have some delta or the regression overestimates feedback:

In contrast, we show that simple regression methods used by several existing  papers generally exaggerate positive feedbacks and even show positive feedbacks when actual feedbacks are negative. We argue that feedbacks are largely concentrated in the tropics, and the tropical feedbacks can be adjusted to account for their impact on the globe as a whole. Indeed, we show that including all CERES data (not just from the tropics) leads to results similar to what are obtained for thetropics alone – though with more noise.

What I read is that by analyzing the large variance portions of SST with similar methods that look at all Sea Surface Temperatures, they found a different result supporting a very low climate sensitivity to CO2 through observation.  It turned out that the tropics are definitely the region of interest in models vs observation.

There is smoothing used in the choice of data intervals.  This seems reasonable as weather noise at monthly timescales would create difficulty picking consistent intervals and a feedback study of Earth, the choice of high variance intervals shouldn’t change dramatically with or without smoothing.  Also, the simplicity of the study is difficult to miss.   Due to the short term nature of the calculations, sensor drift over years of time makes near zero difference as well so the study has a Gavinesque robustness.  Hopefully, someone will tell me what I’m missing because the paper is fairly definitive about climate models in general.  It supports many of the model inaccuracies documented at various non-sanctioned blogs.  The paper is far to direct to be ignored by mainstream science in my opinion,  if there is a problem with it, the pro’s have to address it.

65 Responses to “Slow Feedback”

  1. timetochooseagain said

    The criticisms I’ve seen so far, and the way I see them:

    1. “The focus on the tropics is erroneous because of how important the ice albedo feedback must surely be”

    This was raise a few times in WUWT comments. Frankly I don’t buy that the ice albedo effect is that large. However, if the high latitude feedbacks are a priori expected to be higher, then the models analyzed in their tropics should have had there sensitivities systematically underestimated. This did not happen. Another problem: I, and Roy Spencer, have both analyzed the global CERES data (rather than mainly tropical ERBE data) and come to very similar results: That the data covering the entire period is decorrelated by noise, if one looks at short time periods, individual time periods (in my case, even including the small change ones) one finds indication of strong negative feedback:

    2. “This is just the fast response, the slow response is bigger” or “TCR vs ECS”

    This, I think if you read my comments at WUWT, you’ll find makes sense if you are assessing the relationship of forcing to temperature change, but most of the reactive properties of the system have timescales much faster than the time necessary for the system to reach equilibrium. So the “TCR vs ECS” issue is not really a big deal with assessing the feedback to temperature change. You can see this must be the case, sense the models, again, did not have their sensitivities systematically underestimated by this method, which they should have if the thermal inertia of the system were biasing the results due to short time periods. Lastly, if one wishes to postulate that there are slow feedback processes, I would agree, but doubt they are relevant for near future AGW: The Carbon Cycle feedback has a timescale on the order of several centuries (see the ice cores) and at any rate slow feedbacks could be positive or negative. The slow ice sheet feedback is another one, but it is probably pretty weak when the ice sheets are restricted to very high latitudes with little insolation, as during the current interglacial and even more so in a warmer world: basically it is a highly non-linear feedback, I tend to think, which weakens when the world is warmer, strengthens when it is cooler. Again, models would need to include this too, and evidently they don’t, or it is a minor factor.

    3. “Paleoclimate data from the Last Glacial Maximum says 3 C for two times CO2”

    In point of fact, the interpretation of the data so involved rests on a number of assumptions:

    A) The response on the timescale of a glaciation is the same as that really relevant for near future AGW. It almost certainly is not.

    B) That we know the forcing mechanisms that caused the glacial interglacial transitions. In fact, the forcings that are often used are highly uncertain. If you ascribe the change to a set of forcings when some other forcing is left out, because you just don’t know about it, you will tend to overestimate sensitivity. Many of the forcings are actually feedbacks (CO2, Ice Sheets) and the sensitivity if only the dust forcing is use is like 30 C.

    C) Related to above, the mechanism for the glacial interglacial swings is Milankovitch. None of the forcings used to explain the glacial interglacial swings by “mainstream science” in getting it’s sensitivity in that manner include Milankovitch, because it operates by changing the latitudinal and seasonal distribution of insolation. Averaged globally there is no forcing from Milankovitch, and yet this latitudinal/seasonal mechanism is the ultimate origin of these global changes. In the current understanding the long term climate sensitivity must be infinite, sense a zero forcing is responsible for a five degree global temperature change. In reality, Richard Lindzen has in fact proposed how the mean climate could be changed by changing latitudinal distribution of radiation, with a negative feedback strongly constraining tropical climate:

    Click to access 171nocephf.pdf

    4. “The satellite radiation data are bad”

    I really have to stand in dumbfounded by these kinds of responses. Yeah, all data are flawed in some way or another. The question is, how is it even possible for these data to be flawed in such a way that we can reasonably conclude it biases the conclusions, and not only that, that it biases the conclusions in a particular direction? To say, “well it could be something wrong with satellites” is, frankly, really difficult to believe.

    So summing up, I have yet to meet an argument against this paper that an undergrad engineering student couldn’t tear apart with minimal research.

  2. laterite said

    Hi Jeff Id, I have an issue with it, and Spencer as well, which could be summed up as if the climate sensitivity is contingent on the methodology, then his approach is not immune from this criticism either.

    First, imagine a theoretical framework for the system, based on random top-of-atmosphere temperature perturbations, and integration of those perturbations at the bottom-of-the-atmosphere, i.e. the surface. Then the BOA is the accumulation of the TOA perturbations, just like a random walk is an accumulation of IID perturbations. There is going to be a certain characteristic decay which makes it not, but ‘almost’ a random walk, but sufficiently high to be ignored for this argument.

    Now even though the BOA perturbations are fully determined by the TOA perturbations, the two are essentially uncorrelated. They can appear to be correlated under various smoothings and time scales of the analysis though, so the low correlation case (high climate sensitivity) and the high correlation case (low climate sensitivity) can be produced arbitrarily by the choice of parameters.

  3. Brian Hall said

    But he’s not addressing sensitivity per se; just the speed of OLR response to warming. This cuts the ground out from under high sensitivity on fundamental grounds: there’s no “there” there. I.e., there’s no warmth and IR left over to explain.

  4. laterite said

    Brian, from the first line of the abstract “We estimate climate sensitivity from observations …” moreover “Low sensitivity of global mean temperature anomaly to global scale forcing does not imply that major climate change cannot occur.” Much depends on the definition of sensitivity and how it is measured. To cut to the chase, I have in mind a small solar forcing over a long period of time accumulating heat in the ocean. In this case, there is nothing left over to explain, as you say, but it looks like high sensitivity, even though the K/Watt relationships are consistent Boltzmann-Planck. The atmosphere is largely ‘driven’ by this dynamic, but where and how you measure the atmosphere gives you different results for sensitivity, because different parts respond at different speed.

  5. Laterite you say “Much depends on the definition of sensitivity and how it is measured.”

    I would assume this statement applies to those studies that result in high sensitivity as well. Correct?

  6. cohenite said

    The standard for sensitivity is 2XCO2 = 3.2C; the AGW crew has muddied this simple confection with transient and equibrium sensitivity, as well as raising the bar with the latter so that, I think, the record now stands at 12C, although the paper does mention 25F:

    If I understand ‘Laterite’ correctly what ‘he’ is talking about is a sustained, above average solar forcing; that is completely different from CO2 forcing, which is ‘trapping’ of the solar forcing; I’ve never understood how extra CO2, from whatever source, could be construed as a forcing in the same league as increases in TSI.

  7. Anonymous said

    Laterite, I skimmed your submission to Climate Research. Very interesting. In fact, matches my own intuition 🙂 .

  8. jstults said

    regression of feedbacks vs forcings are basically an ill-conditioned matrix( with wildly sensitive solutions) because there is little signal variance to regress on. Therefore this paper chose to do their regressions on regions of the data whereby the greatest change occurred. […] So we should seek out the large changes for a proper analysis. Sensible enough.

    This approach seems like it is susceptible to the same sorts of “cherry picking” as the commonly criticized paleo approaches. The proper way to handle an ill-conditioned problem is to pick a better algorithm, not censor the data.

  9. Jeff Id said

    Jstults The data wasn’t censored as I read it. They simply chose high variance points because low variance points have no signal by definition. Think of it as choosing the best signal to noise regions for the regression.


    Certainly it is possible in this kind of noise that cherry picking has occurred. I haven’t run the data obviously but the methodology seems sound to me. Correlations are high where the sensitivity is low. Having almost no time myself, I’ll have to wait for the formal critiques to come out.

  10. timetochooseagain said

    8-I think it is more about attempting to isolate signal from noise, not so much “censoring” data. In my comment above, I try another approach, which focuses on short time periods without isolating larger changes from smaller ones. The answer is very similar. One thing I did notice, however, is that when there are some very short term changes and ones of small magnitude there is a lot of noise. The month length slopes averaged over years due tend, however, toward a fairly narrow range of values.

  11. Alan D McIntire said

    “timetochooseagain said
    August 19, 2011 at 11:44 pm

    The criticisms I’ve seen so far, and the way I see them:

    1. ‘The focus on the tropics is erroneous because of how important the ice albedo feedback must surely be’ ”

    According to this article

    Ice albedo feedback over the polar caps is now zilch. Decreases in ice are completely balanced by increases in clouds. It stands to reason that some sort of balance
    would have been reached, otherwise the unstable balance which led to melting glaciers at the end of the last ice age would have continued until the Greenland and Antarctic ice caps melted.

  12. Jeff Id said

    I thought the title of the post ‘slow feedback’ was cute because the paper is about fast feedback but it took a while before I wrote on it. One of the more interesting things is the lead-lag relationship plotted in the paper using the high variance data. It has quite a strong shift in correlation at zero demonstrating the direction of causality. This gives me some comfort that the authors are on the right track. It makes sense that lags of a few months or less are the case in temperature vs feedback. No proof of anything, just interesting.

  13. laterite said

    John, Anon, Cos, Jeff. The basic rule is that you cant get reliable results from statistical fitting without a correct dynamic model. If you look at Lindzen’s Figure 7 where he has done the simulation studies, his method overestimates the true response to the same degree as the opposition methodology underestimates the response. If the methodology was informed by the correct model of the system dynamics, it would be spot on.

    Both Spencer and Lindzen are aware of these issues of course, and Spencer goes to great length explaining the difficulties. Dessler, however, does not. But anyway, if the dominant dynamic of the system is driven by ocean heat variations, and the atmosphere variability is largely passively driven, its going to be difficult to get stable results without that dynamic being central.

  14. Laterite: That ocean heat dynamic may be nonstationary on all time scales rejecting a simple Dimensional model, or you could look at real nonstationary with LTP phenomena with the common time scale 500* years, correct?

  15. Jeff Id said


    “By contrast, our method shows moderately good performance
    for estimating the feedback parameter especially for significant
    negative feedbacks (comparable to what we observe in the data).”

    It seems reasonable that the high slopes of the high negative feedback would give more accurate results. If you look at pane 3 of fig 7, the estimate by their method isn’t as far off the centerline as the positive feedback.

  16. laterite said

    Jeff, yes, it is not completely wrong, but let me explain more. John, the model I keep in my mind for understanding this stuff is quite simple. The glacial/interglacial range can be modelled by an AR(1) series with coefficient almost, but not exactly one. Because its a=0.999<1 or whatever, it is stationary, and has a very long Tau, or decay time, maybe 4000 years by my calculations. It only loses a very small amount of anomaly every time step. That is the ocean. The surface as AR(1) has an a=0.95, the troposphere a=0.5, and so on up through the atmosphere. The atmosphere loses heat a lot more quickly than the ocean, and loses it even more quickly the higher up you go.

    Now when Lindzen does his comparisons, he correlates the surface with an a=0.95 with the troposphere or above with an a=0.5 or less, using a method that accommodates for the integration relation by taking longer, larger perturbations. The answer he gets is, of course, the atmosphere dumps anomalous heat fairly quickly. Its couched in feedback terminology that makes it more confusing, but essentially the Tau is short. But all that is telling us about is the AR structure of the atmosphere. The vital question of sensitivity lies in how much heat gets into the ocean, and how quickly it gets out.

    Using the example of an IID series (a=0) accumulated into an AR(1) a=1 series, before, as a guide, those random perturbations of the IID series are the signal, they can't be regarded as noise, because on accumulation they create large trends. I am just using this as a counter-example to the idea there is some inherent virtue in distinguishing 'signal from noise' by looking are larger fluctuations. Lindzen alludes to this in an oblique comment on why his method overestimates.

  17. In response to 16 and “”the glacial/interglacial range can be modelled by an AR(1) series with coefficient almost, but not exactly one. Because its a=0.999<1 or whatever, it is stationary, and has a very long Tau, or decay time, maybe 4000 years by my calculations. It only loses a very small amount of anomaly every time step. That is the ocean. The surface as AR(1) has an a=0.95, the troposphere a=0.5, and so on up through the atmosphere. The atmosphere loses heat a lot more quickly than the ocean, and loses it even more quickly the higher up you go"".

    Laterite you stated ""If the methodology was informed by the correct model of the system dynamics, it would be spot on. ""

    So, I must ask do you have at least an enthalpy balance and preferably a mass – energy – reactive-and semi-permeable membrane flux balance??

    1/10 point for the enthalpy balance, if you have to model it. 100 points for a more realistic system.

    Otherwise what would I use to choose the difference??

  18. laterite said

    “So, I must ask do you have at least an enthalpy balance and preferably a mass – energy – reactive-and semi-permeable membrane flux balance??”

    John, what I have shown is derivation of AR(1) from the valid energy balance model (Lindzen’s eqn 8) and interpretation of all of the data from this, including exact prediction of the lags. What I haven’t done relates to the atmosphere/ocean interface viz-a-viz short and long wave. I cant find much useful research on it.

  19. MikeN said

    This paper should not have been published.

  20. #18 I understand that is why I asked.

    Sometimes we just got to use the data we have.

    Sometimes we have to take all statements with a grain of salt.

    “”Especially those who just hafta use the data they got.””

    Some statistician said that I think.

    “”Torture the data long enough and it will confess.””

  21. timetochooseagain said

    11-That may well be one of the reasons I get the essentially the same results for the CERES global data.

  22. troyca said

    I understand the idea that you want to perform the regression during the largest temperature change intervals, as the magnitude of the feedback during that interval is going to dwarf that of the uncertain/internal forcings (which can lead to the decorrelation during the entire period analysis). However, after scanning the papers and the reviews, I’m not sure I understand why LC are using lag times for the feedback response (which will comprise the bulk of the flux). Shouldn’t the feedback be instantaneous? TTCA, aren’t you also (correctly) assuming instantaneous feedback response in your analysis you linked to?

    One obvious issue also seems to be the different lag times for SW vs LW components. The influence of clouds is limited in that they might increase outgoing SW, but they trap LW, somewhat cancelling out the effect…using this method it seems you might count the the increased SW flux in one month when the clouds are at a high, but then also count the increase in LW radiation a few months later when the clouds are lower. So you are adding these responses together, rather than them mostly cancelling each other out. Obviously it is not that black and white, but I agree with the criticism that there would need to be strong justification for using different lag times.

  23. timetochooseagain said

    22-Ideally you would be able to get the feedback from instantaneous reaction to temperature change: In the absence of any forcing confounding the signal Spencer and Braswell (2011) found this to be the case in a simple energy balance model situation, however the situation if forcing is confounding the signal, as probably occurs in the real world data, is that the relationship at zero time lag to surface temp is close to zero, and one needs to head out to further time lags to get any signal. Unfortunately this doesn’t successfully completely eliminate the signal. For my comparison what I have done is use atmospheric temps. I am not sure why the relationship at zero lag is so strong for atmospheric temps. My current theory would be that the atmospheric temps are occur in response a few months after the surface temperature changes, and thus one gets strong correlation with Atmospheric temps. The question is, do feedbacks occur as an instantaneous response to the surface temperature? This is usually what is assumed, but one needs to change the temperature profile of whole atmosphere to change the vertical water vapor profile, for example. So I am sort of implicitly including some lag from Surface temperature, but I am letting the atmosphere itself decide the appropriate lag, and I believe there will be some since clouds and water vapor have an ambient environment influencing them which is far from the sea surface.

  24. laterite said

    Troyca: “Shouldn’t the feedback be instantaneous?” In an accumulative system, e.g the temperature change of a mass being forced by radiation anomaly, the temperature response will be exactly 90 degrees out of phase with the forcing (the integral of sine is a cosine function). So if the perturbation has a 12 month period, the lag will be 12/4=3 months. If there are substantial losses (low Tau), then the lag will be shorter. Anyway, you get a better result with the 3 month lag because in accommodates the dynamics of the system, better relating the forcing over a period to the integrated response.

    L&C seem to be analyzing annual perturbations judging by the 3 month lag. I don’t know why that happens if the SST and TOA data is global.

  25. 23-eliminate the signal should be ~isolate~ the signal. Or eliminate the noise, if you will.

  26. It was useful to me to use an analog of dendrothermometry. In theory, if you choose a time period of tree growth in which the temperature never changes (sensibly speaking), then you have constant tree ring properties and nothing to regress against nothing. This changes if you have had a past period of temperature change for a number of years, adequate to show a signal through the noise. You can then regress because you have a slope. Next question is, do the tree rings show a lag or not as temperatures change? If there is a large lag, then one could have the confusing result of lagged tree ring patterns appearing in a counted time of flat temperatures. Lags would be expected in trees – higher temperatures might produce more leaves over a period of several years and so alter photosynthesis in AND AFTER that time. Hence both slow response and fast response.
    Using this analogy, I felt the selection of times of change was quite appropriate by LC. I felt that the unanswered question was, if lagged responses occur, is whether the rate of change during a lagged response is the same in a time of positive response as in a time of negative response.

  27. timetochooseagain said

    26-One difference I feel I must point out is that while physically a connection between tree rings and temperature may be speculative, a connection between radiation flux and temperature surely isn’t. To my knowledge Dendro work is not informed by physical theory but naive ideas about the climate conditions favored by trees.

  28. troyca said

    23- To clarify, physically the feedback response to temperature changes IS instantaneous, but it may be necessary to use a lag time to draw out the signal? I suppose this makes sense, assuming that much of the temperature change over the interval is radiatively forced. I’m still not sure I see how one can justify using different SW and LW lags, however, given the issue I’d mentioned above.

    You bring up an interesting point of whether the feedback is instantaneous with respect to surface vs. atmospheric temperatures, and I suppose it should be a little of both, but the bulk should be with the atmosphere. I too got a larger correlation/response when using TLT temps for calculating the cloud feedback:

    I also recall doing some lag regressions a while back on the AMSU SST vs. Ch.5, and did find a higher correlation after a few months. This should be pretty easy to do with the different monthly temperature indices, and I’ll do it when I get a chance.

    24- Interesting, and I understand that the feedback won’t be instantaneous with respect to the _forcing_, but rather I was saying it should be instantaneous in response to the temperature change.

  29. Layman Lurker said

    Interesting discussion. The noise structure issue raised by Laterite is a potential objection to LC and the comparison to proxy calibration should be turning on a lot of light bulbs at tAV.

  30. timetochooseagain said

    28-Yeah, basically that’s the deal. You raise a good point with respect to the separate effects of Longwave and Shortwave. This might actually justify use of discriminating between clear sky and “unclear” fluxes, to determine whether one is confusing the effects of clouds at different timescales. Personally I care little where the feedback originates, so I just use the net radiation. I’ll miss some subtleties, sure, but I think I still get a reasonable answer.

  31. 27 ttca Agreed, I tried to cover this by stipulating that the signal was visible through the noise. (I spent about 6 years researching plant nutrition and fully accept your caveat).

  32. Craig Loehle said

    Jstults #8: “the proper way to handle an ill-conditioned problem is to choose a better algorithm”–in the realm of mathematics, yes, though even then not all ill-conditioned problems are solvable. But with real data, ill-conditioned means that tiny changes in the noise (error) will lead to large differences in the result. In a good data set, with a good R^2 value, removing a data point does not affect the results much. In an ill-conditioned problem, this is not true.

  33. jstults said

    Data are not ill-conditioned. Algorithms are ill conditioned. Data do not have “results”. Algorithms have results.

    In a good data set, with a good R^2 value, removing a data point does not affect the results much. In an ill-conditioned problem, this is not true.

    Thank you for proving my point. Robust regression.

  34. DeWitt Payne said

    Re: jstults (Aug 23 15:39),

    Isn’t it the problem that is ill-conditioned? If the problem is ill-conditioned, there may be better or worse algorithms, but the results will still be sensitive to small changes in the input.

  35. jstults said

    DeWitt, I don’t think ill-conditioned is a property of problems (ill-posedness would be). Some problems just require more bandwidth or more sophisticated algorithms than others. The classic example is collisions of many 2D hard spheres, very sensitive to initial conditions; the way that is addressed is tweaks to algorithms and going to higher precision floats. So “ill-conditioned” is a relative thing that’s defined by your algorithm and the bandwidth you have available for processing, not by the data or the problem (i.e. how sensitive is “very sensitive”? well, it depends…).

    Probably this is technical nit-picking, but it seems like “ill-conditioned” is being used inappropriately here as an excuse for data selection choices which may or may not be all that well-founded.

  36. jstults said

    Dewitt, another good example of ill-conditioning being a property of solution methods and computing machines rather than problems is long-time computability of the Lorenz 1963 system (reference from this post):

    [8] Kehlet, B. and Logg, A., “Long-Time Computability of the Lorenz System,”

    They improved the algorithm (adaptive time-stepping) and increased the precision of their floats to converge solutions over long integration times. There is a definite limit to the length of the integration time for which the solution can be converged given a fixed floating point precision, so I guess you could say that any problem with that characteristic is special in some way (but I don’t think “ill-conditioned” would be the right name for it).

    It’s neat how that toy is so widely useful for illustrating concepts; I think I linked another of my posts on that system in a thread at The Blackboard you may remember about “abominable models”. Anyway, good evening to you gentlemen, I’m off to take my random walk…

  37. Craig Loehle said

    Jstults: you are referring in your examples to pure mathematical functions. Perhaps “ill-conditioned” is not the best term for what Lindzen is getting at, but the problem is that for his example very small differences in the noise in the data lead to wide divergence in the estimated sensitivity. An example using your Lorenz system would be estimating the parameters of the underlying Lorenz model from any data set with any degree of noise. Since the trajectories must be exact for correct parameters to be estimated, the inverse problem (estimating those params) is infinitely sensitive to noise (whatever you would like to call that).

  38. jstults said

    No, it is not infinitely sensitive to noise; yes, it can be very sensitive. I encourage you to read up on nonlinear time series analysis (exactly that estimation problem you mention); somehow they are able to estimate those dynamic parameters of even chaotic systems from “real” data…

    Is the level of noise low enough in the case of climate data to apply those techniques? You may be right, it’s probably not.

  39. DeWitt Payne said

    Re: Craig Loehle (Aug 25 15:21),

    Inverse problems are frequently ill-posed rather than ill-conditioned. Although they may be ill-conditioned too. They are ill-posed because there are an infinite number of solutions even if the data wasn’t noisy and you had a computer with infinite bandwidth. Some can be solved because there are other constraints on the solutions such as smoothness. You also usually need a good guess of the solution to start with as well. That’s the whole point of the various versions of the TIGR (Thermodynamic Initial Guess Retrieval) database for calculating temperature profiles from satellite data. X-ray crystallography is also ill-posed in the absence of phase data.

  40. jstults said

    Good point DeWitt; it’s never straight-forward to turn measurements on the boundary into properties on the interior. An example I worked on recently was a reconstruction based on an inverse Abel transform (where the axi-symmetric assumption is the additional bit of info that fixes the ill-posedness). It still requires a derivative of the data (which can be sensitive to noise), and there’s also a singularity; the fix wasn’t to ignore noisy parts of my data; it was to calculate an approximate derivative, and propagate the error introduced by the approximation to the reconstructed result.

    Yes, that means I get larger error bars on my results, but they represent the uncertainty that’s warranted by the all of the data I’ve got. On the other hand, quantifying the uncertainty introduced by ad hoc censoring procedures is fraught with difficulty, and is rarely done with any rigor in practice.

  41. Carrick said

    My two cents:

    Inverse problems are generally noisy, but it’s because the basis functions you are expanding over aren’t orthogonal, and it doesn’t require ill-conditioned or ill-posed problems for that to happen.

    Suppose we’re trying to measure a quantity “y” as a function of x:

    y = a_1 Phi_1(x) + a_2 Phi_2(x) + … a_N Phi_N(x)

    over some range x1… x2, where the a_i, i = 1…n are the coefficients we are trying to solve for using OLS, and the Phi_i(x) are the “basis functions” for the problem. For example, these might be as simple as:

    Phi_i(x) = cos ( 2*pi*f_i*x)

    and f_i is an array of frequencies that we’re trying to extract the amplitude of y from.

    Applying the OLS mechanics, it’s easy enough to show that you get an equation like H . A = B where H_ij = < Phi_i | Phi_j > is the dot product being the ith & jth basis function, A_i = a_i, and B_i = < Phi_i | y >

    It can be shown that the uncertain in the parameters a_1, …. a_n scales as G = sqrt(lambda_max/lambda_min), where lambda_i is the ith eigenvalue of H. This factor G is called the “noise amplification factor” and in general for OLS G ≥ 1,

    If the Phi_i are othornormal, then by definition H_ij = delta_ij, where delta_ij = 0 1 if i=j and 0 if i ≠ j, and lambda_max = lambda_min = 1, and G = 1.

    You get noise amplification regardless of whether the problem is ill-conditioned (which means means that the ratio lambda_max/lambda_min is large, and the inverse process is said to be “very noisy”) or ill posed (which means there is one or more eigenvalues lambda_i that are zero).

    Ill-posed problems are solved by taking the pseudo-inverse (known in some circles as singular value decomposition, though that’s really just the decomposition method that can be used to form the pseudo-inverse) or via some other regularization scheme (e.g., RegEM uses ridge regression as one of its methods) Regardless of method, a singular matrix gives rise to a solution space + a null space, which is the set of linear combinations, y = c_0 Phi_0 + c_1 Phi_1 + … for which H. y = 0. Since there are an infinite number of these null space solutions (it’s easy to show these are not all linearly independent themselves), you end up with an

    It’s necessary and sufficient to say that if you have an infinite number of solutions to H . A = B that you have at least one zero eigenvalue, so you can use either zero eigenvalue or infinite number of solutions as definitions of “ill-posed”

    My only clarification to what DeWitt said is that the inverse process is generally “noisy” (G > 1), regardless of whether it is ill-conditioned or ill-posed. It happens, but it’s uncommon for G = 1.

  42. […] a recent discussion at the Air Vent, TTCA brings up an interesting point.  Basically, if the bulk of the feedback response is due to atmospheric temperature changes, […]

  43. jstults said

    Just to tack on to Carrick’s two cents about pseudoinverses; here is an infamous example of a problem where some methods are ill-conditioned, and give wildly incorrect results, and some are not. People wedded to poor methods decried the problem in that case too.

  44. Carrick said

    jstults that’s a pretty interesting case.

    How you approach this problem depends on whether you actually want the fitted coefficients or just the smoothing polynomial, and then to what accuracy. Since the input data is accurate to only five places, a simple transform of the sort xprime = (x-xbar)/sigmax and yprime = (y-ybar)/sigmay would do it.

    (This transforms problem from an H which is terribly ill conditioned–numerically it has one negative eigenvalue–to one with a gain factor of about 11,000.)

    Of course you can transform back, with a slight degree of difficulty if you don’t have matlab’s symbolic package ;-), and you can get the coefficients. These will have lost about 5 significant digits. Here’s a comparison of the methods:

    -1467.4896142298 -1467.489613921602
    -2772.17959193342 -2772.179591346758
    -2316.37108160893 -2316.371081113799
    -1127.97394098372 -1127.973940739703
    -354.478233703349 -354.4782336255697
    -75.12420173937571 -75.12420172261612
    -10.8753180355343 -10.87531803306112
    -1.06221498588947 -1.06221498564259
    -0.06701911545934081 -0.06701911544337952
    -0.00246781078275479 -0.002467810782151
    -4.02962525080404e-05 -4.029625249788778e-05

    Of course the difference in the coefficients is not significant, because the uncertainty in the y-values is larger (they give the errors as 10% of the values).

    I could cheat at this point and go to quad-precision, if I wanted to match their numbers (assuming they are correct to this number of places).

    An alternative of course is to use the modified Graham-Schmidt method to generate a set of orthonormal basis functions, and fit to an expansion over these.

    If you have a link to some of the other solutions, I’d appreciate it.

  45. jstults said

    Carrick, nice! IIRC, QR works well for this one (Givens rotations or Householder transformations rather than Graham-Schmidt is preferred, but same basic idea, build a good numerical basis). Sorry I don’t have a link handy; I was looking for the description of how the NIST guys calculated the “certified” coefficients the other day, but some of the links on the site are stale. I think they used something like 500 digit extended precision.

  46. Carrick said

    Jstults, if I get a chance, I’ll test both algorithms. It’d be interesting to see of Householder gives a measurable improvement over modified Graham Schmidt for this algorithm. (Once you’re at the numerical noise floor, that’s where you are. 😉 ). It was nice to see that the general formalism predicted the correct amount of loss of precision (5 digits).

    Of course, using 500 digit precision is … blatant cheating. I could do much better just with ordinary quadruple precision & the Z transform (the statistical one I used above, not the DSP version).

  47. Carrick said

    I mean … “for this problem” not “over this algorithm.”

  48. jstults said

    Householder gives a measurable improvement over modified Graham Schmidt for this algorithm. (Once you’re at the numerical noise floor, that’s where you are. 😉 )

    Yeah, the differences between the “good” algorithms tend to be marginal. With a fixed precision, I think you’ll find better answers with the other two methods, but you’re right, at some point ease of implementation is more important than getting those last few digits.

  49. Jeff Id said

    Carrick Jstults,

    The methods might make a fun post if you’re interested. I’m not sure how you transform it back offhand but would be interested in how you did it. Years ago I ran into this problem while fitting fourth order equations to a complex lens surface in excel. I needed the general curve but needed evenly spaced points for tooling the surface. Just removing the mean and adding it back onto the data after the fit was enough to solve my problem.

  50. Andrew said

    Quite apropos to discussing feedbacks would be Lindzen’s recent ACS presentation:

    Click to access acs-2011-lindzen.pdf

    Particularly interesting is that he notes that one should also be able to assess feedbacks by looking at how evaporation changes with temperature. The evaporation (precipitation) results are very similar to those for radiation for sensitivity.

  51. jstults said

    Carrick, after centering and normalizing, what did you use to calculate the fit? The error shape of your solution is similar to what I get with just running the raw data through Scipy’s polyfit(), which I think uses a pseudoinverse. Error plot here:

  52. Carrick said

    Jeff, I just used a standard lower-up inverse routine. After centering & variance adjusting the matrix is pretty well behaved (G=10000). Just centering gives you G=1e5, which still isn’t terrible for experimental data.

  53. steve fitzpatrick said

    I beg you to compare the technical content of the above exchange with the ridiculous rubbish purveyed at most CAGW sites. Skeptics worry about how to rigorously solve messy inverse problems, alarmists worry about…. well, they mindlessly worry most everything, especially how we are killing all life forms on Earth with everything single we do. Too bad the MSM can’t (or won’t) see this.

  54. Jeff Id said

    I think this is where climate science often fails. They have no feedback for their errors on reasonable timescales. I have to admit, many think I’m cocky now, but when I was 22 there was no stopping me. It really was my own endless proven mistakes in between my unique successes which taught me to not be Michael Mann. It also taught me to keep studying and realize that errors aren’t the end of the world. I’ll disagree endlessly with Pat Frank, but if he proves me wrong, I will give in. Right now, I believe he is quite firmly incorrect on a number of issues but if he convinces me, my opinion will change on a dime and that will be the end of it for me. The same goes true for the AGW alarmists.

    Steve posted that Lindzen would be putting up his data and code on line. That will be interesting to see.

  55. jstults said

    Carrick said: Once you’re at the numerical noise floor, that’s where you are.

    That’s about what I found comparing a couple different methods; may be of interest to folks who want a little more insight into black boxes they’re using in R.

  56. Carrick said

    Jeff & Steve, my personal interest in a lot of climate science is the availability of data sets with interesting data, that I can apply modeling techniques and learn new things from. So I find discussions about which method is more efficiently numerically or statistically to be much more interesting the than “OMG we’re dead” crap found on some of the other blogs.

    I sometimes “over analyze” the data because my interest isn’t always in the answer but exploring which method works better for that data. The resolution of questions like “is there a tipping point” is less interesting than “is there a way to quantify and statistically test for what a ‘tipping point’ is?”.

    For the alarmist types (and this is pretty much a definition of them), a) of course there is a typing point and b) by definition we are nearing it, and c) the distance to the typing point is independent of time.. (We are as near to it now as we were in 1988, for example. Things are much worse than we thought.)

  57. Carrick said

    Last paragraph, of course we’re near the “typing” point. Meant “tipping point” though. LOL.

  58. My belated tuppence worth on ill-conditioning –
    We had a similar discussion of polynomial fitting a few years ago at David Stockwell’s. There the issue was not so much inverting the matrix, but the confidence limits on the regression coefficients. But the remedy was the same – shift to a centered origin, and then use orthogonal polynomials (Legendre there).

    Effectively it’s preconditioning the matrix to be inverted. With the regression equations
    y=Xb leading to b = (X^*X)^-1 X^*b
    you have to improve the condition number of X^*X. You can do this with an appropriate multiplier
    U^*X^*XU, but you have to be able to invert U. You’d often make it upper triangular. Carrick’s centering (#44) effectively makes U a triangular matrix of binomial coefficients. You can further multiply by a matrix of orthogonal polynomial coefficients.

    Gram-Schmidt orthogonalisation (#44) is effectively equivalent to inverting X*X by LU factorization. I think that makes a point that the manoeuvring with powers of x is not really what NIST intended with their problem. Applying the preconditioner directly also loses accuracy. If you make the binomial type transform by subtracting a mean value from x and then recomputing the powers, that is much better than using binomial sums. But you have used special properties of the matrix. I think NIST did not intend this.

  59. jstults said

    Nick Stokes, just curious, what do you think NIST intended?

  60. Re: jstults (Sep 1 22:51),
    Well, the link to the page which might have said explicitly is broken. But they have classed it as a problem of higher level of difficulty. Now as Carrick says, you can just centre the predictor variable and a lot of the difficulty goes away. That’s an obvious step. And using orthogonal polynomials isn’t genius stuff. So I think they refer to the problem of solving without using the polynomial underlying structure.

  61. Carrick said

    I would suppose one use of the problem was as part of a suite to test people’s inverse solvers, but without my secret decoder ring I wouldn’t be able to know whether that was the singular motivation of the author in posing this problem.

    I personally think the point is that there is no point, other than “hey this is an interesting problem, what can you learn from it?” Anyway, once you pose the problem, your motivations for posing it become irrelevant (unless you are teaching a class and have some leverage over how the reader chooses to interpret the questions posed by the problem).

    Regardless of the function you are trying to approximate, there are good ways of doing it, and poor ways of doing it. NISTs intensions for posing the problem aside, stated or unstated, I personally think this particular problem mostly illustrates the trouble with a poorly chosen basis set, bad experimental design or both.

  62. Carrick said

    Correct me if I’m wrong, but I don’t think this statement is true, at least in the world of finite precision:

    Gram-Schmidt orthogonalisation (#44) is effectively equivalent to inverting X*X by LU factorization

    Direct LU factorization has all of the problems of the high condition number of the matrix, whereas modified Graham-Schmidt is equivalent to QR.

    What am I missing?

  63. #62
    Well, since the matrix is symmetric, I probably should have said Cholesky rather than LU. But Cholesky transforms it into a product of a triangular matrix and an identity, which is pretty much QR.

    The matter seems to be described in the last para of Wiki on Gram-Schmidt. In the light of that, it’s not clear that my statement is exactly right, but it’s all connected.

    I’ve posted my version of the solution on jstults’ blog.

  64. Carrick said

    Nick, from your link (referring to Cholesky decomposition):

    Yet another alternative is motivated by the use of Cholesky decomposition for inverting the matrix of the normal equations in linear least squares. Let be a full column rank matrix, which columns need to be orthogonalized. The matrix V*V is Hermitian and positive definite, so it can be written as V*V = LL* using the Cholesky decomposition. The lower triangular matrix L with strictly positive diagonal entries is invertible. Then columns of the matrix U = V (L^-1)* are orthonormal and span the same subspace as the columns of the original matrix V. The explicit use of the product VV* makes the algorithm unstable, especially if the product’s condition number is large. Nevertheless, this algorithm is used in practice and implemented in some software packages because of its high efficiency and simplicity.

    The condition number in this case is around 10^62 (computed in Mathematica using 500 digits of precision), which certainly is “large”. 😉

    Whether you want to let Q = I and let R = L is your choice and call Cholesky a type of QR is yoru choice. I’ll stick to thinking of them as very different algorithms, as apparently does Wiki when it lists different matrix decomposition methods.

  65. jstults said

    Nick Stokes, I don’t think the various flavors of QR are “effectively equivalent” to solving the normal equations (by LU or any other method). The whole point of that factorization is to completely avoid the matrix transpose multiplication which makes the condition number explode. If you meant “analytically equivalent”, then you’re right: they are all supposed to give the same least-squares answer in theory. However, since we have finite precision machines, the algorithms are miles apart in practice, one is quite robust, and the other is often… ineffective.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: