Yet another alternative is motivated by the use of the Cholesky decomposition for inverting the matrix of the normal equations in linear least squares. Let V be a full column rank matrix whose columns need to be orthogonalized. The matrix V*V is Hermitian and positive definite, so it can be written as V*V = LL* using the Cholesky decomposition. The lower triangular matrix L with strictly positive diagonal entries is invertible. Then the columns of the matrix U = V (L^-1)* are orthonormal and span the same subspace as the columns of the original matrix V.
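In code, the recipe above might look like the following numpy sketch (the function name is mine; a triangular solve stands in for forming L^-1 explicitly):

```python
import numpy as np

def cholesky_orthonormalize(V):
    """Orthonormalize the columns of a full-column-rank V via Cholesky.

    V*V = L L* (Cholesky), then U = V (L^-1)* has orthonormal columns
    spanning the same subspace as the columns of V.
    """
    G = V.conj().T @ V             # Gram matrix: Hermitian positive definite
    L = np.linalg.cholesky(G)      # lower triangular, positive diagonal
    # U L* = V, so solve conj(L) U^T = V^T rather than inverting L
    U = np.linalg.solve(L.conj(), V.T).T
    return U
```

Note this forms the Gram matrix V*V explicitly, which is exactly the source of the instability discussed below.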

The explicit use of the product V*V makes the algorithm unstable, especially if the product’s condition number is large. Nevertheless, this algorithm is used in practice and implemented in some software packages because of its high efficiency and simplicity.

The condition number in this case is around 10^62 (computed in Mathematica using 500 digits of precision), which certainly is “large”. 😉
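For a sense of how quickly a raw monomial basis blows up, here is a small numpy illustration on a Filip-like range (the x values are made up, not the NIST data, so the exact numbers will differ):

```python
import numpy as np

# Hypothetical predictor values: 82 points on roughly [-9, -3],
# resembling the NIST Filip setup but NOT the actual data.
x = np.linspace(-9, -3, 82)

for deg in (3, 6, 10):
    X = np.vander(x, deg + 1)          # raw monomial (Vandermonde) design
    # cond(X*X) is cond(X)^2; values beyond ~1e16 are not trustworthy
    # in double precision, which is rather the point.
    print(deg, np.linalg.cond(X.T @ X))
```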

Whether you want to let Q = I and R = L and call Cholesky a type of QR is your choice. I’ll stick to thinking of them as very different algorithms, as apparently does Wiki when it lists different matrix decomposition methods.

Well, since the matrix is symmetric, I probably should have said Cholesky rather than LU. But Cholesky transforms it into a product of a triangular matrix and an identity, which is pretty much QR.

The matter seems to be described in the last para of Wiki on Gram-Schmidt. In the light of that, it’s not clear that my statement is exactly right, but it’s all connected.

I’ve posted my version of the solution on jstults’ blog.

Gram-Schmidt orthogonalisation (#44) is effectively equivalent to inverting X*X by LU factorization

Direct LU factorization has all of the problems of the high condition number of the matrix, whereas modified Gram-Schmidt is equivalent to QR.
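The distinction matters in practice. A minimal modified Gram-Schmidt QR in numpy (names mine): the key point is that each inner product is taken against the already-updated residual v, not the original column, which is what distinguishes it from the classical variant:

```python
import numpy as np

def mgs_qr(A):
    """Modified Gram-Schmidt QR factorization: A = Q R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ v      # project against the UPDATED residual
            v -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R
```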

What am I missing?

I personally think the point is that there is no point, other than “hey this is an interesting problem, what can you learn from it?” Anyway, once you pose the problem, your motivations for posing it become irrelevant (unless you are teaching a class and have some leverage over how the reader chooses to interpret the questions posed by the problem).

Regardless of the function you are trying to approximate, there are good ways of doing it and poor ways of doing it. NIST’s intentions for posing the problem aside, stated or unstated, I personally think this particular problem mostly illustrates the trouble with a poorly chosen basis set, bad experimental design, or both.

Well, the link to the page which might have said so explicitly is broken. But they have classed it as a problem of a higher level of difficulty. Now, as Carrick says, you can just centre the predictor variable and a lot of the difficulty goes away. That’s an obvious step. And using orthogonal polynomials isn’t genius stuff. So I think they refer to the problem of solving without using the polynomial underlying structure.

We had a similar discussion of polynomial fitting a few years ago at David Stockwell’s. There the issue was not so much inverting the matrix, but the confidence limits on the regression coefficients. But the remedy was the same – shift to a centered origin, and then use orthogonal polynomials (Legendre there).
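The remedy described above, shifting to a centered origin and then switching to orthogonal polynomials, can be illustrated in a few lines of numpy (hypothetical x values; the Legendre basis here stands in for the orthogonal polynomials used there):

```python
import numpy as np

# Hypothetical predictor values, not the actual data from that discussion
x = np.linspace(-9, -3, 82)
deg = 6

X_raw = np.vander(x, deg + 1)               # raw powers of x
X_cen = np.vander(x - x.mean(), deg + 1)    # centered origin

t = 2 * (x - x.mean()) / (x.max() - x.min())        # scale to roughly [-1, 1]
X_leg = np.polynomial.legendre.legvander(t, deg)    # Legendre basis

print(np.linalg.cond(X_raw), np.linalg.cond(X_cen), np.linalg.cond(X_leg))
```

Each step cuts the condition number substantially, which also tightens the confidence limits on the regression coefficients.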

Effectively it’s preconditioning the matrix to be inverted. With the regression equations

y = Xb, leading to b = (X^*X)^-1 X^* y,

you have to improve the condition number of X^*X. You can do this with an appropriate multiplier

U^*X^*XU, but you have to be able to invert U. You’d often make it upper triangular. Carrick’s centering (#44) effectively makes U a triangular matrix of binomial coefficients. You can further multiply by a matrix of orthogonal polynomial coefficients.
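That triangular matrix of binomial coefficients can be written down directly: expanding (x - c)^k by the binomial theorem expresses the centered monomial basis as the raw basis times an upper triangular matrix. A small sketch (variable names mine, x made up):

```python
import numpy as np
from math import comb

x = np.linspace(-9, -3, 82)      # hypothetical predictor values
c = x.mean()
deg = 4

M = np.vander(x, deg + 1, increasing=True)   # columns 1, x, ..., x^deg

# Upper triangular B with B[j, k] = C(k, j) * (-c)^(k - j),
# so that (x - c)^k = sum_j B[j, k] * x^j
B = np.array([[comb(k, j) * (-c) ** (k - j) if j <= k else 0.0
               for k in range(deg + 1)]
              for j in range(deg + 1)])

# M @ B reproduces the centered basis (x - c)^k column by column
centered = np.vander(x - c, deg + 1, increasing=True)
```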

Gram-Schmidt orthogonalisation (#44) is effectively equivalent to inverting X*X by LU factorization. I think that makes a point that the manoeuvring with powers of x is not really what NIST intended with their problem. Applying the preconditioner directly also loses accuracy. If you make the binomial type transform by subtracting a mean value from x and then recomputing the powers, that is much better than using binomial sums. But you have used special properties of the matrix. I think NIST did not intend this.

I sometimes “over analyze” the data because my interest isn’t always in the answer but exploring which method works better for that data. The resolution of questions like “is there a tipping point” is less interesting than “is there a way to quantify and statistically test for what a ‘tipping point’ is?”.

For the alarmist types (and this is pretty much a definition of them), a) of course there is a tipping point, b) by definition we are nearing it, and c) the distance to the tipping point is independent of time. (We are as near to it now as we were in 1988, for example. Things are much worse than we thought.)

That’s about what I found comparing a couple different methods; may be of interest to folks who want a little more insight into black boxes they’re using in R.

Steve posted that Lindzen would be putting up his data and code online. That will be interesting to see.
