## Correlation is Not Causation

Posted by Jeff Id on May 1, 2011

On the internet, you can meet an amazing set of personalities. Kim Øyhus, a physicist, has an unusual website with some interesting commentary and a near-megalomaniacal tone to it (think I’m kidding?). There are two ‘proofs’ on it: one we’ve been discussing on a different thread, and a second which states the following.

## Correlation is Evidence of Causation

A proof done with conditional probability.

Definition 1, correlation: $c$

Definition 2, causation: $a$

Definition 3, not everything correlates: $P(c) < 1$

Definition 4, causation gives correlation: $P(c \mid a) = 1$

Evidence for causation, $P(a \mid c)$:

$$
\begin{aligned}
P(a \mid c) &= \frac{P(c \mid a)\,P(a)}{P(c)} && \text{Bayesian inference} \\
&= \frac{1 \cdot P(a)}{P(c)} && \text{definition 4} \\
&> P(a) && \text{definition 3}
\end{aligned}
$$

Conclusion: $P(a \mid c) > P(a)$: correlation is evidence of causation. Q.e.d.
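A minimal numeric sketch of the derivation above, in Python. Only P(c|a) = 1 comes from definition 4; the prior P(a) = 0.1 and the 30% chance of correlation without causation are made-up numbers chosen for illustration.

```python
# Hypothetical numbers: only P(c|a) = 1 is taken from Kim's definition 4.
p_a = 0.10              # assumed prior probability of causation, P(a)
p_c_given_a = 1.0       # definition 4: causation always gives correlation
p_c_given_not_a = 0.30  # assumed chance of correlation without causation

# Law of total probability; the result stays below 1, as definition 3 requires.
p_c = p_c_given_a * p_a + p_c_given_not_a * (1 - p_a)

# Bayes' theorem: P(a|c) = P(c|a) P(a) / P(c)
p_a_given_c = p_c_given_a * p_a / p_c

print(f"P(a)   = {p_a:.3f}")
print(f"P(c)   = {p_c:.3f}")
print(f"P(a|c) = {p_a_given_c:.3f}")
```

With these numbers P(c) is 0.37 and P(a|c) comes out at about 0.27: higher than the prior of 0.10, so correlation counts as evidence, but nowhere near 1.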

Which we can all agree with. Correlation is most certainly evidence of causation. Like Kim’s other proof though, this one is also over-interpreted in the conclusion.

Quote from Daniel Dvorkin: “The correlation between ignorance of statistics and using ‘correlation is not causation’ as an argument is close to 1.”

So, since anyone who puts in the effort can work through the math above, and we can all agree that correlation is evidence of causation, why is it that Dvorkin, and apparently Kim, have such a hard time seeing the other side of the > sign?

Here’s my proof using all of the definitions above and some simple math. This doesn’t rebut Kim’s proof but rather expands on it such that proper conclusions can be drawn.

P(a|c) = P(a ∩ c) / P(c)

Read: the probability of ‘a’ (causation) given ‘c’ (correlation) equals the probability of a and c both occurring, divided by the probability of c (correlation).

Since, by definition 3 above, P(c) < 1 (not everything correlates), we can write:

P(a | c) < P(a ∩ c)

In other words, there is a greater probability of causation and correlation existing than causation based correlation alone. So while it is true that correlation is evidence of causation, it is not proof of causation.

Such a simple concept you wonder why you need math to write it down.

My thanks to Kim for the Sunday morning puzzle.

## RomanM said

Sigh, here we go again…

Kim’s definition 4 (which is not a “definition” but an “assumption”) is false. To put it in terms which are more precise and less prone to misinterpretation:

The main problem with Kim’s “proofs” is that they are based on ill-defined, nebulous structures which (s)he (which is it, Kim, he or she? No slight intended; it makes my writing easier) then interprets in similarly ill-defined ways. That is not how real mathematics is done.

Yes, variables that are correlated are not independent. However, the concept of *causality* goes much deeper (as Kim discovered):

That he ended up with “evidence as a mathematical synonym for causation” is unfortunately pure hogwash which redefines causation as something which it is not.

## Oliver K. Manuel said

Yes, correlation is evidence of causation.

But one cause could generate changes in x, y, z, etc.

One cause (a supernova) produced excess Xe-136 by rapid neutron capture, the r-process.

The r-process occurred in the outer part of the supernova, where light elements like He were abundant.

Therefore, we observed that excess Xe-136 correlated with primordial He:

http://www.omatumr.com/Data/1975Data.htm

But in fact He had no part in the synthesis of excess Xe-136.

## curious said

“Yes, variables that are correlated are not independent.”

I’m reminded of the pirates and temperatures example:

http://www.venganza.org/about/open-letter/

:-)

## TimTheToolMan said

Jeff Id writes “Correlation is most certainly evidence of causation.”

Except it’s not, and I’m reminded of the “criminals eat peanut butter” example: the correlation implies peanut butter is somehow implicated in being a criminal, whereas the real reason is simply that they’re fed it more often in prison.

When considering real things in real life rather than abstract concepts in mathematics, those real things had better be demonstrably dependent or you’ll go off the rails. Not all real life examples are as clear cut as the prisoner one either.

## TimTheToolMan said

Damn…beaten by the pirates example :-)

## Jeff Id said

Roman,

“Kim’s definition 4 (which is not a “definition” but an “assumption”) is false. ”

I understand what you are saying. Fortunately, my answer didn’t require definition 4, which really bothered me.

Do you see anything wrong with my own answer? Seriously, this was my first writing of statistical math in this fashion, I’ve never had a stats class and was rather hoping you would critique my work. Of course I’ve never had a climatology class or a matlab class for that matter either.

## Jeff Id said

I can imagine that causation (however you define it) has a probability. If non-zero correlation is more likely in cases of causation, wouldn’t that be considered ‘evidence’ for causation? Certainly it would increase the probability of causation being true. That’s why I considered correlation as evidence of causation without much discussion of 4.

Perhaps 4 should simply be:

P(c|a) > 0

That doesn’t seem right either. a could be completely independent and still have correlation and should in fact have a zero effect on probability.

———

Nope, I’m sticking with the P(c|a) > 0 formula. Because non-zero correlation is more likely in cases of causation, it is evidence, and that is probability.

I await being corrected :D

## David Starr said

Correlation certainly suggests causation, but doesn’t prove it. Unless a physical mechanism for causation exists, I don’t believe causation has been proven. It’s suggested, but suggestion isn’t proof. Eggs and chickens are highly correlated but we still don’t know which came first, the chicken or the egg. Do eggs cause chickens or do chickens cause eggs?

## RomanM said

#7, Jeff, do I *have* to answer that? ;)

Whether c is “evidence for” a (why does Kim use such nondescript notation?) depends on whether knowing that c has occurred *increases* the probability of a (i.e. P(a|c)) over what it was before we knew that c had occurred. What you have written does not really address that. However, you also used Kim’s incorrect generic statement, P(c|a) = 1, which in proper context is not always true.

The real problem with all of the mumbo jumbo that Kim has written is that, like the concept of “teleconnection” in climate science, although there may be a general grain of truth to some of the concepts, the devil is always in the details. Numerically, his results have zero value because those details determine whether his “theorems” apply. The problem that I assigned him (corrected version) illustrates how the answer of what constitutes evidence for something depends specifically on the numerical structure of the underlying situation. One must clearly delineate all of the assumptions one is making, and only then will results have meaningful value.

Of course, because of a failure to distinguish properly among the phrases “evidence for”, “evidence against” and “‘absence’ of evidence”, it really doesn’t matter how the formulae are manipulated.
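RomanM’s criterion can be checked directly: c is evidence for a when P(a|c) > P(a). By Bayes’ theorem, and assuming 0 < P(a) < 1, that happens exactly when P(c|a) > P(c|not a), so P(c|a) = 1 is not needed. A small Python sketch with made-up probabilities, using exact fractions so that the “no evidence” case is not blurred by rounding:

```python
from fractions import Fraction as F


def posterior(p_a, p_c_given_a, p_c_given_not_a):
    """P(a|c) via Bayes' theorem, with P(c) from the law of total probability."""
    p_c = p_c_given_a * p_a + p_c_given_not_a * (1 - p_a)
    return p_c_given_a * p_a / p_c


def evidence_direction(p_a, p_c_given_a, p_c_given_not_a):
    """Compare P(a|c) with the prior P(a): 'for', 'against' or 'none'."""
    post = posterior(p_a, p_c_given_a, p_c_given_not_a)
    if post > p_a:
        return "for"
    if post < p_a:
        return "against"
    return "none"


# Hypothetical scenarios; the prior P(a) = 1/10 in each.
print(evidence_direction(F(1, 10), F(6, 10), F(3, 10)))  # -> for (c likelier under a)
print(evidence_direction(F(1, 10), F(3, 10), F(3, 10)))  # -> none (equally likely)
print(evidence_direction(F(1, 10), F(2, 10), F(3, 10)))  # -> against (c likelier without a)
```

The first case has P(c|a) well below 1 and still counts as evidence for a; the third shows that observing c can be evidence against a.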

## RomanM said

#8 David Starr:

This is basically the advice that I have given for many years. This applies not just to “correlation”, but also more generally to *predictability*. Without a physical mechanism as a *precedential* agent, this may merely indicate a relationship whose specific form is nothing more than speculation.

## Jeff Id said

“However, you also used Kim’s incorrect generic statement, P(c|a) = 1, which in proper context is not always true.”

I don’t believe I did use P(c|a) = 1. It really bothered me this morning when reading, and I considered writing on it, but found the rest more interesting: no matter how confused one becomes on the definitions of causation, the two equations I wrote seem to me to be proof that correlation is not causation. My answer skipped assumption 4 entirely.

## curious said

These zero correlation plots are interesting:

http://en.wikipedia.org/wiki/File:Correlation_examples.png

I did a quick Excel Pearson Product Moment calc on sine x for 0-90deg, 0-359deg and 0-179deg and respectively got these numbers: 0.978511116, -0.779680017, 0.01990246
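For anyone who wants to reproduce these figures outside Excel, here is a sketch in Python. It assumes the calculation correlated x, in whole degrees over each inclusive range, against sin x; that reading is a guess, though it should give values close to those quoted.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sine_correlation(start_deg, end_deg):
    """Correlation between x (whole degrees, inclusive range) and sin(x)."""
    xs = list(range(start_deg, end_deg + 1))
    ys = [math.sin(math.radians(x)) for x in xs]
    return pearson(xs, ys)


for lo, hi in [(0, 90), (0, 359), (0, 179)]:
    print(f"{lo}-{hi} deg: r = {sine_correlation(lo, hi):.4f}")
```

The swing from roughly 0.98 to roughly 0 is the point: sin x is a fixed function of x throughout, yet the linear correlation depends entirely on the range examined.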

## RomanM said

#11 Jeff.

OK, go ahead and ruin my attempt at misdirection to put the blame on Kim. ;)

Your statement P(a | c) < P(a ∩ c) really doesn't tell us anything, sort of like apples and oranges. All that is required for that statement to be true is that P(c) , =, or < P(a), depending on how much of c is *outside* of a. Thus it cannot distinguish what sort of information c provides about the status of a.

## RomanM said

#12 Curious.

I have been saving one of my favorite examples to toss at Kim O. when (s)he comes to argue my comment #1.

Mathematically, start with a continuous random variable U which is uniformly distributed on the interval [0, 2Pi]. Form the variables X = cos(U) and Y = sin(U). X and Y turn out to have distributions between -1 and 1, and each has a mean equal to zero (e.g. from integrating the function sin(U)/2Pi from 0 to 2Pi for the mean of Y). If you calculate the covariance of X and Y, you also get zero. The math is a little harder, but if you use the trick sin(2U) = 2 sin(U) cos(U), which you learned in calculus 101, this is also equal to 0. This means that the theoretical correlation between X and Y is also *exactly* zero.

However, are X and Y related? Notice that the sum of the squares of X and Y is ALWAYS exactly equal to 1! If you don’t know the value of X, the value of Y can be anything from -1 to 1. However, when X is known, the choice of possibilities for Y is reduced to at most two, +/-sqrt(1 - X^2). I would call that *very strongly related*.

This particular example represents a meaningful physical situation. Put a pointer on a plane centered at 0. Draw a circle of radius 1 centered at the zero point. Spin the pointer randomly (with each angle equally likely as a result; the angle is measured anticlockwise from the positive X-axis). The coordinates of the point chosen by the spinner on the unit circle are (X,Y).
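The spinner is easy to simulate. In this Python sketch (the seed and the number of spins are arbitrary choices), the sample correlation of X and Y comes out near zero, while X² + Y² = 1 holds for every spin up to floating-point rounding.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)   # arbitrary fixed seed for repeatability
n = 100_000               # arbitrary number of spins
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]  # U ~ Uniform[0, 2*pi]
xs = [math.cos(u) for u in angles]  # X = cos(U)
ys = [math.sin(u) for u in angles]  # Y = sin(U)

r = pearson(xs, ys)
worst = max(abs(x * x + y * y - 1.0) for x, y in zip(xs, ys))
print(f"sample correlation of X and Y: {r:.4f}")
print(f"largest |X^2 + Y^2 - 1|: {worst:.1e}")
```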

People forget that correlation measures only the *linear* relationship between variables. When the relationships are more complicated, in some cases, one variable may not be linearly predictable from the other and correlation can be equal to zero. The examples on the page you linked to show the variety of possibilities that can emerge. My example is basically the second one from the right in the third row.

## Jeff Id said

I don’t mind being wrong. I’d rather learn and have had a lot of fun blogging this morning for a change. I’m also working at the same time.

To me the fact that correlation and causation are P(!=0) correlated means that the comparison isn’t quite apples and oranges but rather McIntosh apples vs Golden sweet. Having no background in the field, I wonder if that is how stats treats the problem.

If P(abs(correlation)) >0 for causation vs correlation as measured on all cases in the universe (correlation is the norm), doesn’t that mean that P(a | c) < P (a ∩ c) tells us something? — never thought I'd see that sentence.

## Jeff Id said

Correlation being of the complex form you mention of course. The abstraction of the pure mathematical form is not outside of my experience.

## curious said

Thanks Roman – agreed; you have properly expanded the point I was lazily making! :-)

## RomanM said

#15 Jeff.

Oh my! I was so mesmerized by your words that I didn’t look carefully at your math.

P(a|c) = P(a ∩ c) / P(c)

Multiply both sides by P(c):

P(a ∩ c) = P(c)*P(a|c)

P(c) < 1 would imply that P(c)* P(a|c) < 1 * P(a|c) = P(a|c)

Hence:

P(a ∩ c) < P(a|c), not the other way around.
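The direction of the inequality can also be spot-checked numerically. This Python sketch sweeps a grid of hypothetical values with 0 < P(c) < 1 and P(a|c) > 0 and records any case where P(a ∩ c) < P(a|c) fails; it is a sanity check, not a substitute for the two-line algebra.

```python
from fractions import Fraction

# Hypothetical grid of probabilities: 0, 1/10, ..., 1 (exact fractions).
grid = [Fraction(k, 10) for k in range(11)]

checked = 0
violations = []
for p_c in grid[1:-1]:              # 0 < P(c) < 1, as in definition 3
    for p_a_given_c in grid[1:]:    # P(a|c) > 0
        p_a_and_c = p_c * p_a_given_c   # P(a ∩ c) = P(c) * P(a|c)
        checked += 1
        if not p_a_and_c < p_a_given_c:
            violations.append((p_c, p_a_given_c))

print(f"checked {checked} grid points, violations: {len(violations)}")
```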

But it still wouldn’t tell us anything. :)

Part of the problem with this Bayesian type of analysis is that there is a lot of arm waving and very little attention paid to the fact that there are unspecified elements involved, such as the probability measure under which the various probabilities are calculated. Not only are specific values important, but in some cases one can ask whether a *meaningful* probability measure would even exist.

For example, suppose I consider a probability distribution generated by selecting an integer with equal probability randomly from the set of all positive integers. I could describe it, but *it wouldn’t exist*. There is no *proper* probability distribution which could describe this experiment. By “proper”, we mean a distribution that satisfies the usual denumerable probability conditions: probabilities non-negative and add to 1.

However, Bayesians use such “distributions” (appropriately called “*improper* priors”) on the basis that their *end* results, consisting of conditional probabilities, are unaffected by multiplying all the probabilities by the same constant. Besides the problem of whether such results are meaningful, one should also show that the mathematics behind these procedures is well-defined and will not lead to self-contradictory conclusions (such as “proving” 2 = 1).

This is just one of the reasons that I have never considered Bayesian analyses as the first way to approach anything, and why for me, right from square one, Kim O. is on shaky ground. Never mind the definitions, etc., etc., etc., ….
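The reason no proper “uniform distribution on the positive integers” exists fits in one line. If every integer n gets the same probability p, countable additivity forces the total to be either 0 or infinite, never 1:

$$
\sum_{n=1}^{\infty} P(\{n\}) \;=\; \sum_{n=1}^{\infty} p \;=\;
\begin{cases}
0, & p = 0,\\
\infty, & p > 0.
\end{cases}
$$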

## curious said

In words and pictures:

http://xkcd.com/552/

## Jeff Id said

Woops. Got the sign wrong.

I don’t know now Roman, maybe Kim’s right!! Or maybe I shouldn’t just fit equations from wiki into form. I’m going to have to think a bit more on this.

## TimTheToolMan said

At the end of the day Albert Einstein said it best (paraphrased from Wiki)

No amount of experimentation can ever prove me right; a single experiment can prove me wrong.

And it is very easy to falsify the idea that correlation necessarily implies causation. That completely breaks the idea in the rigorous mathematical world, and in the real world, where there are shades of grey, dependence must be established before correlation can be used as an argument.

So for CO2 warming the planet, the dependence has been established experimentally, and so correlation CAN be used to imply causation. Of course correlation is insufficient to prove causation, and that’s where we skeptics come in…

## TimTheToolMan said

I am soooo not a mathematician. But that never stopped me from having an opinion on something :-)

I’d be inclined to make the assumption that x correlates to y if and only if some function f relates them.

Define A as being the set of factors that influence f

Define B as being the set of factors that are non-influencing outcomes of f

If both x and y are members of A then they are unrelated through f and hence do not correlate.

If x is a member of A and y is a member of B then they correlate and that correlation is causative.

If x and y are members of B then they may correlate but will not be causative by definition.

## Eric Anderson said

Jeff, I think you’ve let semantics get the better of you on this one.

A large part of the issue is the fact that “correlation” is simply an after-the-fact label that we place on two separate variables to denote that they bear an (arbitrarily) assigned relationship. In other words, we can gaze around the world and find literally millions of things that occur before, at the same time, or after millions of other things. They may not bear any real interactive relationship with each other. Yet, we, in our wonderful tendency to categorize and label things, can come up with all manner of correlations between these millions of things. Our doing so, however, is simply a labeling exercise.

Classic examples abound: wet sidewalks cause it to rain; hot sidewalks cause your ice cream to melt; most criminals eat bread; most deaths occur within 24 hours of drinking water; and so on ad infinitum.

In contrast to mere correlations, causation seeks to tie the events together in some kind of cause-effect relationship: a causes (or leads to) b. a can be a direct, indirect, sufficient or contributory cause of b, and we can talk about exactly how strong or direct the causation is, but there must be some causal relationship *in addition* to the temporal relationship that accompanies mere correlation.

Thus, unless and until we have a rational reason for thinking that a has a causal relationship with b, all we can say is that there is mere correlation. Correlation is therefore not really evidence of causation, rather it is an initial hurdle to get over in order to even start thinking that causation might be present.

I understand what you are trying to say in terms of causation also having correlation and that if something is caused it is more likely to have correlation than not. However, this is somewhat circular, and ignores the fact that causation is a real-world effect, while correlation is largely a label that we assign, based primarily on the fact that there is a temporal relationship between two things.

In other words, there are two kinds of correlation: correlation that is purely accidental and has no causative relationship, and correlation that also has a causative relationship. Therefore, noting that something simply has correlation does not and cannot, in and of itself, tell us whether it is of the causative kind or the non-causative kind. Therefore, correlation is not, in and of itself, evidence of causation. It is only after we adequately narrow the parameters of what we are looking at and have some rational basis for thinking that there is a causal relationship that we can say anything about whether the particular correlation we noticed is evidence for causation, and even in that case, our taking note of the correlation is essentially only a reality check to see if our thinking about causation may be on the right track.

Saying that correlation is evidence of causation is a result of failing to adequately define our semantics. In the real world, mere correlation is not evidence of causation. And (just to anticipate the objection here) if we persist in thinking that correlation somehow provides non-zero evidence of causation, it is worth noting that such “evidence” is so near zero that it is effectively useless — a mere rounding error in the more difficult task of identifying actual causation.

## Kim Øyhus said

Very good going Jeff!

Just keep at it, while RomanM hides in complexity.

## curious said

Kim – Please can I check something simple?:

Which comes first – an event or evidence of an event?

## Kim Øyhus said

Both happen, Curious.

A hare leaves tracks.

The tracks are evidence of the hare,

and the hare is evidence of the tracks.

## Anonymous said

For a hypothetical linear (or approximately linear) causal relationship, temporal correlation, perhaps with delay, is a minimum requirement. But I fear some of this discussion has things backwards. Looking for correlation is normally (and should be!) the result of a hypothetical causal relationship, one which is based on one or more physical mechanisms: “If a in fact causes b, according to physical process c, then we should find the following variables correlated in the following way(s): ….” The lack of expected correlation is then evidence that the proposed physical mechanism is incorrect or substantially incomplete. Finding an expected correlation, and especially one with the expected physical scale and temporal pattern, is clear supporting evidence of causation.

But searching for and finding correlation in a spaghetti bowl of data, in the absence of pre-specified physical mechanisms (mechanisms which predict both the magnitude and temporal pattern of an expected correlation), is mostly evidence of the naivete of the person doing the analysis, not evidence of causation. Paleo temperature reconstructions are scientifically weak exactly because of this bizarre, irrational disconnect from pre-specified physical mechanisms. If you search through enough data you will likely find spurious correlations of “predictors” to recent historical temperatures… even ones that satisfy ex post facto “validation tests”. The conclusions drawn from such spaghetti bowl data searches are as likely as not to be rubbish…. a silly waste of time.
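The spaghetti-bowl problem is easy to demonstrate. This Python sketch screens 1,000 random “predictors” against a random 30-point target; every series is hypothetical white noise and the sizes are arbitrary. None of the candidates is related to the target by construction, yet the best one typically correlates strongly, and typically dozens pass a naive |r| > 0.361 screen (roughly the two-sided 5% critical value for n = 30).

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)   # fixed seed so the run is repeatable
n_years = 30             # hypothetical length of the calibration period
n_candidates = 1000      # hypothetical number of candidate predictors
critical_r = 0.361       # approx. two-sided 5% critical |r| for n = 30

# Target and candidates are independent noise: every correlation is spurious.
target = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
candidates = [[rng.gauss(0.0, 1.0) for _ in range(n_years)]
              for _ in range(n_candidates)]

rs = [pearson(c, target) for c in candidates]
best = max(rs, key=abs)
passed = sum(1 for r in rs if abs(r) > critical_r)

print(f"strongest |r| among {n_candidates} noise series: {abs(best):.2f}")
print(f"candidates passing |r| > {critical_r}: {passed}")
```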

## Kim Øyhus said

Anonymous, as a quantum physicist, I can tell you that the universe itself is NOT strictly causal.

Biochemical reactions are also NOT strictly causal, since most of them go both ways, forwards and backwards.

Stuff like this behaves more like random walks, with a bias for forward in time and higher entropy.

## TimTheToolMan said

Kim writes “A hare leaves tracks.

The tracks are evidence of the hare,

and the hare are evidence of the tracks.”

But fundamentally assumes the tracks in the sentence “The tracks are evidence of the hare” are from the hare for the relation to have meaning. Suppose the tracks were from a bear. Does the logic still hold?

## Anonymous said

Kim Øyhus,

Nothing is purely causal on a small enough size scale or a long enough time scale; all becomes chaotic at those scales. I would never suggest otherwise. For example, even though thermodynamics is a consequence of chaotic motion at the very small scale, thermodynamics is clearly causal and predictive in a practical sense.

## curious said

26 Kim – thanks. So can tracks generate a hare?

## John B said

Where there is a causal link, there is correlation but I thought that where correlation between two variables A & B was observed, and with no other evidence, there were 5 equally valid possibilities:-

A caused B

B caused A

Some other factor caused both

Some other factor caused either

Coincidence – no relationship
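The third case, some other factor causing both (Eric’s wet sidewalks, too), can be made concrete with a toy model. In this Python sketch, a made-up model rather than data, a hidden factor z drives both x and y, and nothing makes y depend on x. Observed x and y correlate at about 0.8 (in theory, covariance 1 over variance 1.25 each); once x is set independently of z, as a randomized experiment would do, the correlation vanishes.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)   # fixed seed so the run is repeatable
n = 50_000

# Toy model (hypothetical): a hidden factor z drives both x and y.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_observed = [zi + rng.gauss(0.0, 0.5) for zi in z]
y = [zi + rng.gauss(0.0, 0.5) for zi in z]

# "Intervention": set x independently of z. y is left unchanged,
# because in this model x never influenced it.
x_set = [rng.gauss(0.0, 1.0) for _ in range(n)]

r_observed = pearson(x_observed, y)
r_intervened = pearson(x_set, y)
print(f"observational  r(x, y) = {r_observed:.3f}")
print(f"interventional r(x, y) = {r_intervened:.3f}")
```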

Tracking hares.

On seeing and recognising hare tracks, and later seeing a hare, one may suppose the tracks were caused by a hare, but one cannot be certain they were caused by the hare in view.

Also the tracks may have been created by some artificial means.

The evidence would be neither hare nor tracks, but seeing the hare make them.

Hearing the quack of a duck and seeing a duck does not mean the quack was made by that duck rather than by a hunter luring the duck into the sights of his gun.

I would like to see that set in a mathematical formula.

## Layman Lurker said

Sticking with hockey examples, the other night the Bruins scored seven goals. It is evidence that the Bruins *likely* won the game. But is it evidence that scoring seven goals *caused* them to win the game?

## curious said

33 .. or was it the absence of eight (?) goals from their opponents that swung it?? :-)

## David S said

How very odd. Kim seems to have made two relatively uncontroversial statements – correlation is evidence of causation, and absence of evidence is evidence of absence – and then either overstated their significance or others have done so. To me, they are both completely bleedin’ obvious, but inconclusive, and for both these reasons therefore entirely uninteresting.

Let’s define “evidence” as being information that on receipt increases our perceived probability of an assertion being true, which is roughly how it seems the term is used in the legal process. Then since as John B says one of the ways two events may be correlated is through direct causation, and if they are uncorrelated it is extremely unlikely (although not impossible, as Roman shows) that there is a causal relationship, it is more likely that there is a causal relationship if they are correlated than if they are not. However this tells us nothing about the direction of the possible causal relationship, nor about its probability, if that is a meaningful term in the particular circumstances.

It is even more obvious that if there is no evidence to disprove a theory, it is more likely to be true than if there were evidence disproving it, but this truism has no practical value and can be dangerously misleading. Many scientific discoveries, from microorganisms to neutron stars, have been in areas where there was previously little or no known evidence for their existence. Only a fool would think that somehow we have now got to the stage where we know everything, for the first time in history, and so nothing is left to be discovered and absence of evidence can safely be adduced in support of the theory that something is absent.

I suspect the problem can be resolved by acknowledging that there is likely to be as yet unknown evidence that we do not have either the practical tools or the theoretical framework to make use of. That we do not have the evidence does not mean it does not exist.

## BobN said

I think it is more accurate to say that correlation is a necessary condition for demonstrating causation but in and of itself does not provide evidence of causation.

## curious said

36 BobN- check Roman’s comment @ 14 above.

## BobN said

Curious – in the circle example, x and y are correlated but not linearly. From my experience in environmental investigations, I would say linear correlation is the exception rather than the rule in nature.

## Kim Øyhus said

David S, we agree very much. I too find them bleeding obvious.

However, lots of people do not, and will not believe they are true no matter how they are told.

So I made the proofs, and they have actually helped.

Some people have accepted the proofs with baffled confused surprise.

More people have said it was nice to have proofs confirm what they had strong suspicions of.

And some people just deny the truth or correctness of the proofs, never actually understanding them.

But they really are very obvious to sufficiently intelligent minds.

## curious said

39 Kim – Please can I check something possibly obvious – can the tracks generate hares?

## curious said

38 BobN – ok, noted re. natural systems but fwiw I think ‘dependent’ is a better description than ‘correlated’.

## RomanM said

Re: BobN (May 2 17:40), you are using the word in the generic form meaning “related” or “not independent”. I was replying to curious’ previous comment calculating correlations for certain type of data sets.

Unfortunately, many people do not always refer to the correlation coefficient when using the term “correlated”; however, their decision on whether a relationship exists is made using a test of whether “the correlation equals zero”, which is a clear reference to that case.

This is yet another case where a term with technical statistical meaning is commonly used but with a slightly different meaning (like, e.g., *significant*). I usually try to be more precise in my usage so there can be no misinterpretation of the meaning of what I say.

## TimTheToolMan said

And I still want to hear Kim comment on the fact that his assumption is that the tracks are tracks of the hare and not some other hare or animal. So in “proving” causation, he’s assuming it in the first place.

The same argument goes for his “proof”.

## RomanM said

#41 Curious.

We seem to have posted simultaneously.

Great minds… ;)

## curious said

44 – correlated posting – whatever next!? :-) Not sure about the great minds bit though, I appear to be struggling with the bleedin’ obvious. Hopefully Kim will straighten it out.

btw – Judith Curry is talking about Black Swans – surely they don’t exist??

## TimTheToolMan said

Being more specific about this, I believe Kim’s

Definition 4 causation give correlation: P(c|a) = 1

Should be

Definition 4 causation give correlation: P(c|a) >= 0

And I’m justified in allowing zero through real life examples. Bogus correlations like Temperatures and Pirates or Criminals and Peanut butter just to name a couple.

## Eric Anderson said

Great example of coming up with a mathematical “proof” that uses poorly-defined terms, and then thinking that the proof shows more than it does. It is quite clear that correlation does not necessarily mean causation. Therefore, you can have all the correlation in the world, and you still need something in addition to mere correlation to conclude causation.

The argument that, because causation will also have correlation, the existence of correlation somehow increases the likelihood that there is causation is perhaps true in a strictly logical sense (again, assuming we define correlation loosely enough). But it increases the likelihood of causation only to an infinitesimally small degree (i.e., we then acknowledge it is non-zero), and is essentially useless in the real world. Only by significantly narrowing the parameters of the two variables we are looking at, and by proposing some kind of rational causal relationship, does this particular “correlation” become meaningful.

## TimTheToolMan said

Eric writes: “But it increases the likelihood of causation only to an infinitesimally small degree (i.e., we then acknowledge it is non-zero)”

Actually, zero is a valid value, and that’s vital to the argument. Kim has explicitly excluded zero by choosing a set of causations and dependent correlations (where P(c|a) = 1) but explicitly ignores the larger set of causations and independent correlations (where P(c|a) < 1, which includes zero because there are correlations that are clearly not causative).

So if Kim truly believes in the correctness of his proof, and particularly of his "definition" 4, then he ought to be able to explain how the increasing global temperatures/decreasing numbers of pirates correlation works.

## Geoff Sherrington said

Came in late. At the absurd limit, two horizontal lines with the same origin correlate beautifully. One can say nothing about causation unless there is a mechanism, shown as a difference in character between two effects. A mechanism in this horizontal line case cannot be shown because there is no character in either of the lines.

As one raised just past the milk teeth stage on analytical chemistry, the essential starting ingredient was the “calibration curve” that related chemical abundance to instrument response. The instrument was first calibrated with standard solutions/solids whatever. That is the step that is often missing in the climate science debate. There is a lack of a standard curve. The most common slack of the analytical standard curve arose from the difficulty in matching synthetic standards to natural substances. Sure, in climate, we have things like adiabatic lapse curves, but we know that they have a rogue element from humidity. The success of the whole topic depends on elimination of the effects of the rogues.

In climate work, we do not have a plausible relation of the most elementary kind, relating CO2 concentration in the air to any temperature change that it induces. We do not even have a very good correlation. I’d be rather circumspect about reading too much into climate work because the standard methodologies can be glossed over too easily. And they are. Too many rogues.

## Eric Anderson said

TimTheToolMan:

Thanks. I believe we are saying the same thing. I’m arguing (in my earlier comment) that the main problem is in the poor definition of “correlation” because it doesn’t acknowledge that some correlations have no causation attached. I think you’re saying the same thing, probably in a more technically clear way than I. I agree with you that the equation must include zero because there are clearly cases where there is no causation.

For me the bottom line is:

1. You only get “evidence” of causation from correlation if you fail to carefully define correlation.

2. Even with that, the “evidence” is so infinitesimally small as to be useless in the real world.

## Pompous Git said

I’m reminded of Richard Swinburne’s argument that the probability of God’s existence was greater than the probability of his non-existence. When I came to grips with his argument (it was my introduction to Bayesian probability), I realised that in the absence of humans the probability of God’s existence was zero.

Phil Dowe’s book “Physical Causation” (Cambridge University Press 2000) is an interesting, though often difficult read for those interested in “what causation in fact *is* in the actual world”. Phil was my favourite philosophy lecturer in first year.

## TimTheToolMan said

Eric writes “For me the bottom line is:”

We certainly do have agreement. My bottom line is the following

1. In general there is a non-zero probability that correlation implies causality.

2. In general there is a non-zero probability that correlation cannot imply causality because the quantities are independent.

Those two statements are seemingly at odds with each other but I believe simply reflect that in general correlation is a useful indicator but cannot be assumed to be evidence.

## TimTheToolMan said

This has been annoying me for days now :-S

But I think I’ve found a suitable answer in terms of Kim’s “proof”. Please critique.

Definition 1 correlation : c

Definition 2 causation : a

Definition 3 not everything correlates : P(c) < 1

Definition 3a not everything causes : P(a) < 1 P(a) : definition 3 <—— No longer valid

Conclusion: From P(a|c) & P(a) nothing can be drawn : Correlation is not evidence of causation.

## Mark T said

Roman, you need to put the smack down here… the word imply is yet another that has very specific meaning in both logic and statistics, one that is different than colloquial usage.

Mark

## Kim Øyhus said

Causation can always give some correlation.

If you do not have correlation in your variable, choose another variable that does!

If you do not have correlation because it gets balanced away, then cut away the symmetry!

Etc.

That’s why I put 100% probability that causation gives correlation.

You just have to measure the right stuff in the right way.

## TimTheToolMan said

Kim writes : “You just have to measure the right stuff in the right way.”

But if you assume that the correlation implies causation by choosing only situations where that assumption is true, then it’s a no-brainer that the correlation implied causation. It’s true by definition.

I think I’ve found a suitable answer in terms of your “proof” though.

You’ve defined P(c) < 1 by noting that not everything correlates but you must also define P(a) < 1 because not everything "causes" either.

Definition 1 correlation : c

Definition 2 causation : a

Definition 3 not everything correlates : P(c) < 1

Definition 3a not everything causes : P(a) < 1 P(a) : definition 3 <—— No longer valid

Conclusion: From P(a|c) & P(a) nothing can be drawn : Correlation is not evidence of causation.

## TimTheToolMan said

Something went wrong with my last post and I lost the proof part. Here it is again.

Definition 1 correlation : c

Definition 2 causation : a

Definition 3 not everything correlates : P(c) < 1

Definition 3a not everything causes : P(a) < 1 P(a) : definition 3 <—— No longer valid

Conclusion: From P(a|c) & P(a) nothing can be drawn : Correlation is not evidence of causation.

## TimTheToolMan said

For some reason my posts aren’t all being successful and that last one obviously had a problem as it missed the proof part. Hopefully this one will work and I’ll try to get a correct one posted…

## TimTheToolMan said

Definition 1 correlation : c

Definition 2 causation : a

Definition 3 not everything correlates : P(c) < 1

Definition 3a not everything causes : P(a) P(a) : definition 3 <———— No longer valid

Conclusion: From P(a|c) & P(a) nothing can be drawn : Correlation is not always evidence of causation.

## TimTheToolMan said

Definition 1 correlation : c

Definition 2 causation : a

Definition 3 not everything correlates : P(c) < 1

Definition 3a not everything causes : P(a) P(a) : definition 3 ******* This is no longer valid

Conclusion: From P(a|c) & P(a) nothing can be drawn : Correlation is not always* evidence of causation.

* also I’ve added “always” because correlation can be evidence of causation, but it isn’t in general.

## TimTheToolMan said

OK, I don’t know why it’s not accepting the whole thing, but essentially when you add

define P(a) < 1 because not everything "causes"

…the proof breaks.

## Kim Øyhus said

TimTheToolMan, your comments have no bearing on what I actually wrote. You misunderstood.

## Jeff Id said

Sorry Tim, I have had no time whatsoever for checking the blog. I approved all the comments so they are repetitive. It’s too bad WP doesn’t give a little more control over the spam filter.

Kim,

Do you recognize now the fallacy of the concept that correlation is causation? I mean you quoted Dvorkin in a manner which reads a lot like support for this nonsense to me. It is quite obvious that correlation is evidence supporting causation because dependent items have a higher probability of generalized correlation (not P=1 but not far from P=1 across most events), and yet is quite insufficient for proof of any form of causation. This holds under whatever reasonable definitions I can imagine for causation and evidence.

It is an interesting point here because this concept is used to justify everything from trees to mollusk shell isotopes as temperature proxies. Effects of temperature are assumed linear with no evidence that the assumption would ever hold and VERY noisy and highly autocorrelated data is simply mashed together to create thousand year temperature trends. The result is ‘the warmest year in a thousand’. One thing most here know for certain is that we don’t know the temperature a thousand years ago — at all. This is a very serious issue sucking hundreds of millions of dollars out of our government to support these studies – and very little of it is even remotely statistically reasonable.

## Eric Anderson said

Jeff Id wrote: “It is quite obvious that correlation is evidence supporting causation . . .”

Jeff, I think you agree that there are correlations that exist without causation? Indeed, there are many more (a near infinite number of) correlations that are non-causative than those that are causative. As a result, we must take the position that a correlation *may or may not* indicate causation; indeed, it is much more likely that a given correlation is not causative than that it is causative. Therefore, when we identify a given correlation, the only thing we know for sure is that it *may or may not* be evidence of causation. It is only after we have started to identify some plausible causative mechanism between the two items that we can even rationally start to entertain the idea that there might be some causation.

The only way your statement can make sense is if there is never causation without correlation (that may be the case, and I’m inclined to think it is true, although others have expressed doubts). Even assuming it to be true, however, the so-called “evidence” provided by mere correlation can never be any stronger than simply acknowledging that there *may* be causation between these two items, but that there is still an (almost) infinite likelihood that there is not.

## Jeff Id said

Eric,

I think that you are not considering the gray areas of probability. There are a few situations where known, or even anticipated, causative variables produce no mathematically definable correlation. Most cases are more clear. If there is correlation (broadly defined) between two variables over the bulk scale of possibilities, it increases the probability of the two variables having a causal relationship.

But it doesn’t prove it.

Therefore my opinion is that correlation is evidence supporting causation, but correlation does not prove causation.

Evidence

1. The available body of facts or information indicating whether a belief or proposition is true or valid

* – the study finds little evidence of overt discrimination

2. Information given personally, drawn from a document, or in the form of material objects, tending or used to establish facts in a legal investigation or admissible as testimony in court

* – without evidence, they can’t bring a charge

3. Signs; indications

* – there was no obvious evidence of a break-in

## Eric Anderson said

Jeff, great discussion and I don’t want to beat a dead horse, but let me try to articulate my thoughts one last time.

I understand your view that if there is a correlation between two variables it could, as a matter of pure logic, mean that there might also be a causative relationship. However, the mere fact that the probability of x does not equal zero does not mean that this is a useful statement from a scientific standpoint or in the real world.

In terms of probability, for example, there is a non-zero possibility that gravity will fail tonight at midnight, but we don’t consider it a live possibility. There is a non-zero possibility that the sun will cease to shine tomorrow at noon, but we don’t engineer for it, spend any time worrying about it, or conduct our scientific investigations and our lives as though this non-zero possibility might come to pass. For something to be of any meaningful value in the real world, it can’t be a sheer logical possibility, there has to be at least some reasonable probability.

In terms of correlations, there are, by definition, an infinite number of correlations. Only the tiniest fraction of these is causative. Thus, when I find a given correlation (absent additional evidence to make me think there is also causation), I am only justified in saying that I have found a correlation that *may or may not* have anything to do with causation. And from a mathematical standpoint, the best I can possibly say is that I’ve moved from zero probability to 1 over Infinity-1 (or whatever other astronomically massive denominator we want to argue about).

As a result, the statement that a particular “correlation is evidence supporting causation” needs to be modified to be more along the lines of: “this correlation could, hypothetically, have something to do with causation, but the odds are infinitesimally small that it in fact does, so small in fact that from a practical standpoint you can ignore it.”

So we can play with proofs and non-zero probabilities, but if I find mere correlation between two variables and think I’ve found evidence supporting causation, I need to remember how incredibly weak that “evidence” is – from a probability standpoint about as valuable as the assertion that gravity will fail tonight at midnight. In other words, so weak as to be essentially hypothetical and of no meaningful value in a scientific enterprise or in the real world.
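Kim’s Bayesian step and Eric’s objection can both be seen in one small worked example. The probabilities below are invented purely for illustration: assume P(c|a) = 1 (causation always shows up as correlation, Kim’s Definition 4), a prior P(a) = 1e-6 that a given pair of variables is causally linked, and P(c) = 0.05 for the chance that an arbitrary pair looks correlated. The posterior rises above the prior, so the correlation is evidence in Kim’s sense, yet it stays tiny, which is Eric’s point.

```python
# Bayes' rule: P(a|c) = P(c|a) * P(a) / P(c)
# a = "causation", c = "correlation", following Kim's definitions.
# All numbers are hypothetical and chosen only to illustrate the arithmetic.

def posterior_causation(p_c_given_a, p_a, p_c):
    """Probability of causation given an observed correlation."""
    if not (p_c_given_a * p_a <= p_c <= 1.0):
        raise ValueError("P(c) must be at least P(c|a)P(a) and at most 1")
    return p_c_given_a * p_a / p_c


prior = 1e-6    # assumed P(a): very few variable pairs are causally linked
p_corr = 0.05   # assumed P(c): fraction of all pairs that look correlated
post = posterior_causation(1.0, prior, p_corr)

print(f"prior     P(a)   = {prior:.1e}")
print(f"posterior P(a|c) = {post:.1e}")           # larger than the prior...
print(f"update factor    = {post / prior:.0f}x")  # ...by 1/P(c), but still small
```

With P(c|a) = 1 the update factor is exactly 1/P(c), so the size of the posterior is still governed almost entirely by the assumed prior.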

## Jeff Id said

Eric,

I have also enjoyed this discussion – what little time I’ve had for it. I have barely had time to read even this blog. After your last comment I understand your perspective better and agree.

## Kim Øyhus said

Causation will always give some correlation because

when A causes B, there are 3 logical possibilities:

1. A, B

2. !A, !B

3. !A, B

The fourth possibility is absent:

4. A, !B

Since this fourth possibility is zero, there will be a correlation between the other three, always.

You should have been able to figure this out yourselves. Let’s see if you can figure it out for other reasonable definitions of causation.
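Kim’s three-possibility argument can be checked with a 2×2 joint distribution. This is a sketch, assuming A and B are 0/1 indicator variables and the cell probabilities are arbitrary illustrative values. With the (A, !B) cell set to zero, the covariance works out to P(A, B)·P(!A, !B), so it is positive whenever both of those cells are nonzero; it is zero, and the correlation undefined, only in the degenerate cases where A never occurs or B always occurs.

```python
from math import sqrt


def phi_coefficient(p11, p10, p01, p00):
    """Pearson correlation of two 0/1 variables A and B from their joint cells:
    p11 = P(A, B), p10 = P(A, !B), p01 = P(!A, B), p00 = P(!A, !B).
    Returns None when A or B never varies (correlation undefined)."""
    if abs(p11 + p10 + p01 + p00 - 1.0) > 1e-12:
        raise ValueError("cell probabilities must sum to 1")
    p_a = p11 + p10
    p_b = p11 + p01
    cov = p11 - p_a * p_b  # equals p11 * p00 when p10 == 0
    denom = sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return None if denom == 0 else cov / denom


# Kim's case: the (A, !B) cell is empty because A always produces B.
# Hypothetical tables; the last one has B occurring every time.
tables = [(0.2, 0.0, 0.3, 0.5), (0.05, 0.0, 0.05, 0.9), (0.6, 0.0, 0.4, 0.0)]
for cells in tables:
    print(cells, "phi =", phi_coefficient(*cells))
```

Whether this settles anything depends on accepting Kim’s binary, deterministic definition of causation, which is the definition the other commenters are questioning.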

## John F. Pittman said

Kim says that causation will always give some correlation… which is true. However, will someone see it? Will someone be able to figure it out? This additional condition falls into three main groups: yes, no, maybe. Yet Kim’s position on this thread is that someone always sees it. That is a belief, Kim. By your argument you always see it. The history of science, philosophy, law, and physics says you may see some but not all, unless of course you are omniscient. Let’s see if Kim can see other possible reasons why one cannot always figure it out oneself, or why sometimes no one can figure it out. Your argument is rhetorical. Note there are more than 3 possibilities if you realize the intrinsic assumption of your hypothesis.

Damn, I just showed that his item 4 is a rhetorical construct, not a physical reality. One can point out that !B really means inability to detect, not absence. Sorry Kim. Your argument left out a real condition. It assumes omniscience. Logic has that problem when not applied correctly. Sorry, not real. Damn, just what Jeff posted upthread. Kim, you run in a circle of your own belief. Those of us on the sides can see the circle that you ignore, and that you think is a straight path to truth. You are not omniscient.

## Kim Øyhus said

Pittman, your argument would have had some merit if what you believe I mean and said had been equal to what I mean and said.

To all of you:

Just find a damn counterexample then, where causation does not give any correlation!

## John F. Pittman said

Kim, you said “That’s why I put 100% probability that causation gives correlation.

You just have to measure the right stuff in the right way.”

One cannot always do this.

You state “Causation can always give some correlation.” But one cannot always find the correct correlation.

David S said “How very odd. Kim seems to have made two relatively uncontroversial statements – correlation is evidence of causation, and absence of evidence is evidence of absence – and then either overstated their significance or others have done so.” This is what the posters here agree to.

My point is to your first claim of proof. Evidence is evidence, not a proof. Correlation may be spurious, it may not. Absence is evidence of a lack but not a proof of non-existence. Just evidence, not conclusive at that.

Kim, you state: “To all of you: Just find a damn counterexample then, where causation does not give any correlation!” I did give you one. It is one where we cannot see the correlation. The reason it is important is that you claim absence of evidence is evidence of absence. This is why the posters are pointing out the incompleteness of your claim. In this case, the proposition you have made about absence of evidence, combined with an inability to see or know, leads to an incorrect conclusion.

This is what we posters are trying to point out to you. The contradictions that arise from applying your reasoning past a very ineffective framework violate the intrinsic assumptions that you make.

## jeff id said

#71, Exactly.

————–

Kim

“You state “Causation can always give some correlation.” But one cannot always find the correct correlation.”

Now this was my point to Kim on the other thread. You cannot disprove God by lack of correlation with observation. The ‘anticipated’ observation may not be sufficient, i.e. lack of evidence.

We are aware of quantum-physics state-change examples in which the change in state is caused by observation, yet we cannot always detect the state change or even anticipate, other than by probability, what the new state will be. In cases where the state change is two-sided with 50% probability, you have causation with zero correlation. There is an example.

Another example might be warming a fluid and measuring the direction of an individual atom. Ignoring the effects of observation, you know that you have accelerated the atom and therefore you know you have caused it to be traveling in a different direction than it would have been at a given moment in time (causation) but the direction is apparently randomly altered (not predictable). Direction changed by causation yet no correlation?

Now Kim do my examples represent insufficient observation (lack of evidence) or do they represent that causation doesn’t always equal correlation? I’m curious how someone as sure of themselves as you would answer that question.

In my opinion we simply don’t know but my best guess is that we simply lack observational evidence. ‘God does not play dice’.

While it is correct to state that correlation is evidence for causation, it is not sufficient proof or even strong evidence of causation under whatever definition.
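Jeff’s heated-fluid example, and Kim’s later reply that there is still a correlation with acceleration, can both be reproduced with a toy calculation. This is a sketch under loose, hypothetical assumptions: a two-dimensional “atom” whose direction is spread evenly around the circle whether or not the fluid is heated, and whose speed is assumed to double when heated. The heating flag is then uncorrelated with the x-component of velocity (direction) but perfectly correlated with speed.

```python
from math import cos, pi, sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy model: 8 evenly spaced directions; unheated speed 1, heated speed 2.
angles = [2 * pi * k / 8 for k in range(8)]
heated, vx, speed = [], [], []
for flag, s in ((0, 1.0), (1, 2.0)):
    for theta in angles:
        heated.append(flag)
        vx.append(s * cos(theta))  # depends on direction
        speed.append(s)            # depends only on the magnitude

r_direction = pearson(heated, vx)
r_speed = pearson(heated, speed)
print(f"corr(heated, vx)    = {r_direction:.3f}")
print(f"corr(heated, speed) = {r_speed:.3f}")
```

Which of those two quantities counts as “the” effect of heating is exactly the definitional question the following comments argue over.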

## RomanM said

#69 John, as you have noticed, discussing anything with KO is pretty much a useless experience because “what *you* believe he means and says” is probably not “what *he* means and says”. One of the reasons is that nowhere does he *define* his terms or the context of the situations in which they are considered, so you never know whether to interpret his words in a scientific sense or as rhetorical arm waving. A good starting point would be to specifically define “correlation” as he is using it here. Are we talking a numerical parameter or is it merely the generic concept of relationship as a lack of probabilistic independence?

What are “A” and “B”? Are they physical relationships or situations? Observable (or unobservable) measurements or quantities? Are we talking about them in the context of a theoretical probability space or are we considering them in the real world (as you discuss in your comment)? What you allow A and B to be will also determine how “correlation” can be defined or interpreted.

What do we understand as “causation”? Must there be a physical effect? Must it be observable? Does the occurrence (or existence of A) *always* cause B, or can there be (possibly unobservable) circumstances under which this is not the case? If one cannot provide an exact definition, then there need to be examples which can spell out what might qualify as causality as well as examples of cases which are not.

Of course, regardless of the above, whether or not one can meaningfully *define* a *meaningful* probability structure encompassing *all* causative agents and their effects is another matter, which someone with limited knowledge of probability theory is unlikely to appreciate.

## curious said

73 Roman – Agreed. FWIW I also think he has the concept of dependency back to front and is trying to apply conditional probability to demonstrate his “proof” in an invalid way. I am also unclear if he has defined the universal set of events to which his “proof” applies.

70 Kim – “To all of you:

Just find a damn counterexample then, where causation does not give any correlation!”

Did you read 12 and 14 above? Do you have any comments that could clarify your position? Thanks

## Mark T said

a) Kim, did you really say this expecting that nobody could find one?

b) Kim, are you really a physicist, or did you just play one on TV?

sig1 = cos(omega * t)

sig2 = sin(omega * t)

Short term: over an integer number of cycles, sig1 and sig2 are uncorrelated (the correlation is identically zero), though sig2 is simply sig1 lagged by pi/2 radians. In other words, sig2 is caused by sig1.

Long term: a sine and a cosine with the same frequency asymptotically approach zero correlation regardless of whether the number of cycles is an integer.

Wow. Ignorance really is bliss.

Mark
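Mark’s sine/cosine example is easy to check numerically. The sketch below assumes 64 samples per cycle over 3 whole cycles (arbitrary choices). At zero lag the two signals have essentially zero Pearson correlation; shifting the cosine by a quarter period (16 samples) reproduces the sine, so the lagged correlation is 1. The quarter-period shift is used here as a simple stand-in for the phase relation Mark describes.

```python
from math import cos, sin, pi, sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


samples_per_cycle = 64  # assumed sampling density
cycles = 3              # assumed whole number of cycles
n = samples_per_cycle * cycles
sig1 = [cos(2 * pi * k / samples_per_cycle) for k in range(n)]
sig2 = [sin(2 * pi * k / samples_per_cycle) for k in range(n)]

# Zero-lag correlation over a whole number of cycles.
r_zero_lag = pearson(sig1, sig2)

# sin(x) = cos(x - pi/2): compare sig2 with sig1 delayed by a quarter cycle.
lag = samples_per_cycle // 4
r_quarter_lag = pearson(sig1[:-lag], sig2[lag:])

print(f"zero-lag correlation:     {r_zero_lag:.3f}")
print(f"quarter-lag correlation:  {r_quarter_lag:.3f}")
```

The two numbers correspond to the two positions in the exchange that follows: no correlation at the lag Mark measures, full correlation at the lag Kim would say is “the right stuff”.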

## TimTheToolMan said

Kim wrote : “TimTheToolMan, your comments have no bearing on what I actually wrote. You misunderstood.”

Here is what you write…

Conclusion: P(a|c) > P(a) : Correlation is evidence of causation. Q.e.d.

This is not the same as saying something that is caused by something else will correlate to it. And yet when you discuss it, this is what you’re saying. As far as I can see, your proof doesn’t mean what you want it to mean, because quite clearly correlation is not evidence of causation except where it is…

Or, put another way, you’re saying that a implies b where a implies b.

## Kim Øyhus said

Pittman, if there is a correlation you cannot see, it means there IS a correlation.

Jeff ID, in your fluid atom example, there is a correlation between temperature and acceleration of atom.

Your quantum example is too vague to make sense, and I am a quantum mechanic.

RomanM, you lie about me when you accuse me of not defining terms. The falseness of this is easy to see, since I defined terms in my first proof. The meaning of A and B are the typical ones in conditional probability. I also wrote about this on the web pages for the proofs.

Curious, you make me a little bit happy, because you showed the intelligence of stating a problem clearly and concisely. To get correlation from the Wikipedia examples without correlation, you can cut away parts of them. For instance, keep a quadrant of the ring, and the points inside that quadrant will be correlated. This goes for Mark T’s purported counterexample too.

However, none of these examples with hidden correlation had any causation, in the usual sense, so they are doubly invalid.

## Jeff Id said

Kim,

I specifically stated direction change without mathematical correlation. We know there is on average an acceleration so we can demonstrate effective causation but what about direction?

Sorry for the QM generalization but again the point is you know as well as I that there are events which can occur which are beyond our ability to observe or correlate. Therefore your assumption in your proof is not proven. And I am one who agrees that correlation is evidence of causation, just not strong evidence.

I’m just a lowly aeronautical engineer Kim, but at least I can recognize error.

## Kim Øyhus said

Jeff, if there is a correlation between temperature and acceleration, then there IS a correlation, caused by heating in this case. The lack of correlation in atom direction does not change the fact that there IS a correlation in acceleration.

The fact that causation gives correlation is bloody obvious to me. The counterexamples have been exactly as stupid as I anticipated. They were all about measuring some other correlation that was not there, while ignoring ones that were.

## Jeff Id said

But you also have causation for direction change right? Or is there no causation in that case? Or is it simply faith that the direction change was caused by heating?

I find your arguments traversing the gap into sophistry.

## Mark T said

Uh, my counterexample did not have “hidden correlation” and it had 100% causation. The input to a Hilbert transform is a cosine and the output is a sine. The cosine caused the sine, but is uncorrelated with it. You asked for a counterexample, I provided one. That you do not understand the example, nor logic for that matter, is not my fault. I disproved your assertion and you whine about the result… you aren’t a physicist, are you?

Mark

## Eric Anderson said

Kim, even if we assume that there is *always* correlation where there is causation, we still have a near infinity of examples in which correlation is not a result of causation, per my prior comments. So even if we grant the point you are pressing now (causation always has correlation), finding mere correlation is only “evidence” of causation in the exceedingly narrow sense that it means we have moved from zero to something like 1 over infinity-1. In other words, the “evidence” is nothing solid, nothing that can even be considered meaningful in a scientific sense or in the real world.

## Mark T said

You can’t even grant him that. Either it is *always*, or *not always*. Kim doesn’t get to arbitrarily choose which conditions he gets to apply – it is one or the other. *Not always* has at least one example (there are an infinite number)… guess the rest.

Mark

## Kim Øyhus said

Jeff, heating does not cause change of direction of atom, because the atoms change direction anyway. That’s what atoms do.

Mark, you claim your spiral example has causation. Where is it then?

And you obviously did not understand my previous answer, so I will repeat:

If one looks at a spiral closely, it is made of almost straight line segments.

If one looks at a typical gaussian correlation, it looks like a fuzzy line segment.

Get it? Segments of the spiral have correlation.

Eric, the probability of causation obviously depends on the strength of correlation.

If your correlation seems spurious and weak, there is less probability of causation.

If your correlation is clear and statistically significant, the probability of causation is much higher.

This is elementary statistics.

## Kim Øyhus said

The people here are good at ignoring and misunderstanding what I write.

Here is a very short play about this, that captures the gist of this “discussion”.

Kim:

Causation gives correlation, but not everything correlates.

People here:

You are wrong, because there are things that do not correlate!

## Mark T said

What? I already told you. A sine is the Hilbert transform of a cosine, i.e., sine = f(cosine). Causation by definition. I already pointed out how a sine and cosine correlate over time so your point is sort of silly.

What you fail to understand is that you claimed *always*, then asked for one counterexample, which I provided – sine/cosine over an integer number of cycles (or all time). What Gaussians do is immaterial. Your claim is simply untrue by demonstration. Revise your claim or shut up.

Mark

## Mark T said

Your last paragraph indicates you don’t even understand your own argument, or at least, what anybody else is arguing. Your fundamental problem is that you have several statements presented as axiomatic facts which are, in reality, assumptions that you have not proven. In at least one case, a statement that is provably false is presented as a fact.

Mark

## Mark T said

Ha! I never even read Roman’s #14 till just now. :)

Mark

## curious said

77, 84 Kim – yes this is making me laugh too! Hopefully I can cheer you up some more:

“If one looks at a spiral closely, it is made of almost straight line segments.”

By extension – Are you saying a circle is a polygon?

And

“…none of these examples with hidden correlation had any causation…”

Are you saying that “hidden correlation” is a special case of “correlation” which lacks the causation feature you claim?

## Kim Øyhus said

Mark and Curious, I see even my simplest comments go far over your arrogant self righteous heads.

In other words: You do NOT understand what I write. You MISUNDERSTAND.

So, to keep it as simple as possible, but not simple enough for you:

Go calculate the correlation of a quarter circle!

It will NOT be zero.

But this is of course beyond your capabilities.

A circle has a correlation of zero, while a quarter of the same circle will NOT have a correlation of zero.
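Kim’s quarter-circle claim, and the related point that fully dependent variables can still be uncorrelated, can be checked directly. This sketch assumes points spaced evenly in angle on a unit circle (the point count is arbitrary). On the full circle, x and y are completely dependent (x² + y² = 1), yet their Pearson correlation is essentially zero; restricted to one quadrant, the correlation is strongly negative, about -0.92 in the continuous limit.

```python
from math import cos, sin, pi, sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def circle_points(start, stop, n):
    """n points at evenly spaced angles in [start, stop) on the unit circle."""
    step = (stop - start) / n
    angles = [start + step * k for k in range(n)]
    return [cos(t) for t in angles], [sin(t) for t in angles]


xs_full, ys_full = circle_points(0.0, 2 * pi, 1000)
xs_quarter, ys_quarter = circle_points(0.0, pi / 2, 1000)

r_full = pearson(xs_full, ys_full)
r_quarter = pearson(xs_quarter, ys_quarter)
print(f"full circle:    r = {r_full:.3f}")
print(f"quarter circle: r = {r_quarter:.3f}")
```

Both numbers are correct; which subset one is entitled to look at is the point actually under dispute in the comments.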

## Jeff Id said

“Jeff, heating does not cause change of direction of atom, because the atoms change direction anyway. That’s what atoms do.”

Kim,

We now know how far you will go to be right having made the fully insane claim that adding heat to a box will not alter the direction of the individual atoms within. I wonder where they teach that to particle physicists.

You can’t argue with a clown. — George Jetson

## Jeff Id said

The correct answer for you Kim, is that we don’t have sufficient observational evidence to predict the direction, and that lack of correlation to observation is not proof that causation didn’t exist.

Duh.

## John F. Pittman said

#77 You restate what I pointed out as though it answered the point I made. Your proof has intrinsic assumptions, as Roman and I have tried to point out to you, invalidating the generalized claim you made unless your claim is that the proof has little explanatory power or is restricted to a very small subset of already known probability densities or phenomena. It cannot be taken past these constraints without becoming trivial or incorrect.

So, I guess the real question is just what is the value or usefulness of your proof?

## Anonymous said

90 Kim –

“Go calculate the correlation of a quarter circle!

It will NOT be zero.

But this is of course beyond your capabilities.”

Did you read 12 above? Or is it a case of:

“In other words: You do NOT understand what I write. You MISUNDERSTAND.”

Any news on the hare generating tracks and the poly sided circle? …

Whatever – I suggest you write up your proof and submit to a journal. Perhaps then it will get the recognition its brilliance deserves. Don’t forget to include the “special case”!

## Kim Øyhus said

I give up.

What I write about is too far above you.

You just make everything into a confused mess of misunderstandings.

## Jeff Id said

You guys thought I was kidding about the megalomania.

Your retroreflector idea stinks too Kim but I don’t think it is worth explaining why.

## boballab said

Jeff:

Here is a short interview with Dr. Jasper Kirkby, team leader of the CLOUD experiment at CERN and a particle physicist:

http://link.brightcove.com/services/player/bcpid106573614001?bckey=AQ~~,AAAAGKlf6FE~,iSMGT5PckNvcgUb_ru5CAy2Tyv4G5OW3&bctid=941423264001

Just after the 5:50 mark of the interview Dr. Kirkby had this to say about Correlation and its relationship to Cause and Effect:

Maybe Dr. Kirkby of CERN is too far below Kim as well.

## jon rappoport said

There’s a slightly different way to formulate all this.

If A, then B.

B.

Nothing follows.

If A is a hypothesis, and B is a set of events that are predicted, using A, then it doesn’t matter how many times B follows. You can’t logically infer that A is true. But science doesn’t really care whether A is true. It only cares about using A to make useful predictions.

Applying this to climate science, the outcome is obvious. No useful predictions.
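Jon’s formulation is the classic fallacy of affirming the consequent, and a brute-force truth table shows why nothing follows. This small sketch enumerates every assignment of A and B and keeps those where both premises (“if A, then B” and “B”) hold; A turns out true in one surviving row and false in the other.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


# All truth assignments consistent with both premises: (A -> B) and B.
consistent = [(a, b) for a, b in product([True, False], repeat=2)
              if implies(a, b) and b]

for a, b in consistent:
    print(f"A={a}, B={b}")

# If A were a valid conclusion, A would be True in every consistent row.
a_follows = all(a for a, _ in consistent)
print("A follows from the premises:", a_follows)
```

Observing B more often narrows nothing here; only additional premises that rule out the (A false, B true) row would let A be inferred.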

## Eric Anderson said

Jon, I like your simplified formulation to cut through the fog.

“Nothing follows.”

Exactly. Especially since we know that B also flows from all kinds of other stuff that has nothing to do with A. Only rarely is B produced by A.
