Understanding Analytics: Correlation does not imply causality

This is the questionable-cause logical fallacy, and we fall into the trap a lot. The error is so common it even has a Latin name: “cum hoc ergo propter hoc” (“with this, therefore because of this”).

Correlations are flimsy

Correlations are easy to find, but they are inconclusive and carry very little weight. This is why the word “implies” is used in technical conversations involving correlations: it means “is a sufficient condition for”, not “is the cause of”, as most people usually interpret it.

Correlations are cheap, and sometimes they make absolutely no sense. One project, Spurious Correlations, searches through public data to find any type of correlation.
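To see how cheap correlations are, here is a minimal Python sketch. It is not how the Spurious Correlations project works; it simply generates unrelated random series and searches every pair for the strongest correlation. The function names, the number of series and the series length are all assumptions chosen for illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_spurious_pair(n_series=200, n_points=10, seed=0):
    """Search unrelated random series for the most strongly correlated pair.

    Every series is independent noise, so any correlation found is spurious.
    n_series and n_points are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    best = max(
        combinations(range(n_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


pair, r = strongest_spurious_pair()
print(f"Series {pair[0]} and {pair[1]} are unrelated noise, yet r = {r:.2f}")
```

With 200 short series there are 19,900 pairs to compare, so a strong correlation turns up by chance alone. The more data you search, the more of these coincidences you find.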

Mistaking correlation for causation leads to widely held (but mistaken) beliefs, though it should not be confused with illusory correlation, where there might be no actual correlation or even a negative correlation. In illusory correlation, a false association may be formed because rare or novel occurrences are more salient and therefore tend to capture one’s attention. Illusory correlation is the basis of many weird beliefs or superstitions.

Causal relationships are really hard to find

Causalities or causal relationships align better with the way we tend to naturally think about things. They feel solid and are actionable. 

Anywhere a causal relationship exists, there is a correlation, but also a mechanism or sequence from cause to effect. While correlation is a necessary condition for a causal relationship, it is not a sufficient one.

Causal relationships are established by correlations and, more importantly, by observing plausible results under different circumstances. This is the basis of the experimental approach.

But you can’t simply go back in time, replay an event that occurred, tweak a few variables and observe the outcomes. This makes causal relationships very hard to establish. This is the case with many data analytics problems. 

A simple method to identify causal relationships

This paper by Joris M. Mooij identifies a way to filter causal relationships from correlations. Take the correlation between traffic density in Lagos and the recorded bedtime of a doctor. The main idea is that we introduce random fluctuations to each variable in turn and observe whether the other follows. It quickly becomes obvious that while the doctor’s recorded bedtime varies roughly in proportion to traffic density, the reverse is not the case. The doctor’s recorded bedtime does not cause or affect traffic density.
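The asymmetry in the traffic and bedtime example can be illustrated with a toy simulation in Python. This is not the method from the paper; it is a sketch of a made-up world where we wrote the mechanism ourselves (traffic delays bedtime, never the reverse), so we are free to nudge either variable. The function simulate_day, its coefficients and its units are all hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_day(rng, traffic_nudge=0.0, bedtime_nudge=0.0):
    """One day in a made-up world where traffic affects bedtime, not vice versa.

    All numbers are invented for illustration only.
    """
    traffic = rng.uniform(40, 60) + traffic_nudge  # arbitrary density units
    # Assumed mechanism: heavier traffic -> later arrival home -> later bedtime.
    bedtime = 21.0 + 0.04 * traffic + rng.gauss(0, 0.3) + bedtime_nudge  # hours
    return traffic, bedtime


def response_correlations(n_days=2000, seed=1):
    """Nudge each variable at random and check whether the other one follows."""
    rng = random.Random(seed)
    traffic_nudges = [rng.uniform(-20, 20) for _ in range(n_days)]
    bedtime_nudges = [rng.uniform(-1, 1) for _ in range(n_days)]
    # Fluctuate traffic, record the resulting bedtime.
    bedtimes = [simulate_day(rng, traffic_nudge=d)[1] for d in traffic_nudges]
    # Fluctuate bedtime, record the resulting traffic.
    traffic = [simulate_day(rng, bedtime_nudge=d)[0] for d in bedtime_nudges]
    return pearson(traffic_nudges, bedtimes), pearson(bedtime_nudges, traffic)


traffic_to_bedtime, bedtime_to_traffic = response_correlations()
print(f"Nudging traffic vs. resulting bedtime: r = {traffic_to_bedtime:.2f}")
print(f"Nudging bedtime vs. resulting traffic: r = {bedtime_to_traffic:.2f}")
```

In this toy world the first correlation comes out clearly positive and the second stays close to zero, which is the asymmetry described above. With real observational data we usually cannot nudge the variables ourselves, which is exactly why the problem is hard.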

This is a simple technique. In fact, it breaks down when we have more than two events, but it would help immensely in avoiding the questionable-cause logical fallacy trap.
