This post restates and clarifies why the phrase “correlation does not imply causation” is worth remembering and understanding. Two scientific studies will be given, and the words in the phrase, which vary in meaning depending on usage, will be defined accordingly.
Please take a moment to go through the following actual, summarized scientific research results:
1) Numerous epidemiological studies showed that women who were taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD.
2) A study at the University of Pennsylvania Medical Center found that young children who sleep with the light on are much more likely to develop myopia later in life.
We will get back to them in a moment. Now we focus on correlation or co-relation, and why scientists, statisticians and skeptics, at the very least, should always try to maintain and promote the phrase “Correlation does not imply causation”.
Getting the words right
Now we’ll define the more important words in the phrase, so as to remove confusion. The definitions of the words will also help my point, which includes highlighting the confusion between correlation and causation.
First we review that correlation, according to merriam-webster.com, is
“a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone”
I will take this as a fairly permissible definition of the word in casual discussions. Next, we will define what correlation means in mathematics, particularly in a statistical sense:
“it indicates the strength and direction of a linear relationship between two random variables.”
“it refers to the departure of two random variables from independence”
Again, I will take these as fairly permissible in a mathematical discussion.
In casual or colloquial use, correlation generally means that some relationship exists, and that relationship need not be linear in the way the mathematical definition requires.
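To make the gap between the casual and the mathematical meaning concrete, here is a minimal sketch of the usual Pearson correlation coefficient, written out by hand. The function name pearson_r and the two tiny data sets are my own illustrations, not taken from any of the studies discussed here. The first data set is perfectly linear, while in the second y is completely determined by x (a relationship in the casual sense) yet shows no linear correlation at all.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]

# Illustrative data: a straight line, and a parabola symmetric around zero.
linear = [2 * x + 1 for x in xs]
quadratic = [x ** 2 for x in xs]

r_linear = pearson_r(xs, linear)        # strongest possible positive value
r_quadratic = pearson_r(xs, quadratic)  # no linear relationship

print("linear:   ", r_linear)
print("quadratic:", r_quadratic)
```

The parabola is a reminder that a coefficient near zero only rules out a linear relationship, not a relationship as such.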
Now that we have those out of the way, we move to the definition of the word “imply”. The site merriam-webster.com gives the following as some of the definitions of the word “imply”:
: to involve or indicate by inference, association, or necessary consequence rather than by direct statement
: to contain potentially
The word is also listed as synonymous with “suggest” and “infer”. This is how we would normally use the word “imply” in casual discussions, and I will suppose that it will be acceptable for such usage.
Now we define what “imply” means in a mathematical sense, or when used in logic:
To be a sufficient circumstance.
It is worth noting that, especially in mathematics and logic, correlation is generally treated as a requirement for causation. But the reverse is not necessarily so, as we shall soon see. Saying that p is correlated with q is therefore not the same as saying, in mathematics and logic, that if p then q, that is, that if p is true then q must necessarily be true as well.
It is very important therefore to note the difference between the casual use of the words and their scientific or mathematical use. In this article we refer to the mathematical or logical definitions of the words.
I repeat, correlation does not imply causation
The fallacy arises because people, including even seasoned scientists, often make the mistake of immediately assigning causality to two correlated objects or events. Of course, when two objects or events are correlated, the possibility of a causal link between them cannot be dismissed either. But the point still stands: we tend to overlook other variables which, unseen or improperly observed, are mistakenly dismissed, and so we commit the fallacy.
I think one reason why a lot of us, again including even some scientists themselves, fall into this messy problem of causality and correlation is that, to a varying degree, our brains are “biologically wired” to interpret patterns. When we see a linear-looking object between or beneath two rounded objects, we more or less immediately see a face (happy, sad, whatever). When we look at clouds we are quite likely to see patterns. The same thing happens with correlation. We immediately “jump to conclusions” that since object/event A is correlated with object/event B, they must be causally linked.
In fact, I think the existence of two types of correlation, positive (when one variable increases or decreases, so does the other) and negative (when one increases, the other decreases, and vice versa), does not make it any easier to see through the fallacy of assigning causality outright when only correlation exists.
To push the point to absurdity, consider the correlation between your foot size and your intelligence. One could foolishly assume that, since foot size is correlated with intelligence, the larger your feet, the smarter you are. Imagine what a surprise that would be if it were really true. The fact is that as we grow older and our feet grow larger, we also more or less become more intelligent, at least relative to what our intelligence was when we were babies. This is an example of a positive correlation with no reasonable causation.
An example of negative correlation with no reasonable causation is hair loss and senility. As your hair count goes down, your senility goes up. Losing hair as you get older does not cause senility; the two simply tend to happen together, which makes them negatively correlated.
Why not causation? (plus the scientific studies at the start of this article)
We now move to the obvious reasons why correlation does not necessarily imply causation. One reason is that there may be a third, fourth, or any number of further variables involved which were overlooked by the experiment or research. These variables then turn out to be what actually produces the correlation between the variables A and B. Another reason, which is really an extension of the first one, is coincidence. By this we mean that the relationship between A and B is so complex and “un-unravelable” that it may as well be considered coincidental. Another reason is that A contributes to the occurrence of B, but is not the sole cause of B. In other words, if variables or occurrences such as these cannot be known or studied more closely, the observational data alone cannot justify a claim of causation between A and B.
Scientific example 1 (above) turns out to be a correlation but not a causal relationship. When the data were re-examined, it was found that the women who underwent HRT were more likely to come from a higher socio-economic status, and thus tended to live a relatively healthier lifestyle via better diet and exercise opportunities.
Example 2 (above), regarding myopia in children, was later re-examined at the Ohio State University, where researchers did not find a causal link between having the light on during bedtime and the occurrence of myopia later on. The newer study did, however, find a strong link between myopia in the children’s parents and the development of myopia in their children. The study noted that parents with myopia were more likely to leave a light on in their children’s bedroom, further explaining the source of the correlation and the confusion. Thus it was finally found that both child myopia and the leaving of a light on in the children’s bedroom trace back to parental myopia.
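The “third variable” case can be sketched with a small simulation. Everything here is an assumption made for illustration: the hidden variable z, the noise levels, the sample size, and the helper names pearson_r and partial_r. A and B never influence each other; each is just z plus its own random noise. The partial correlation formula used to “remove” z is the standard first-order one.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_ab, r_az, r_bz):
    """Correlation of A and B after controlling for a third variable Z."""
    return (r_ab - r_az * r_bz) / math.sqrt((1 - r_az ** 2) * (1 - r_bz ** 2))

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden variable Z drives both A and B; A never touches B.
z = [random.gauss(0, 1) for _ in range(n)]
a = [zi + random.gauss(0, 1) for zi in z]
b = [zi + random.gauss(0, 1) for zi in z]

r_ab = pearson_r(a, b)
r_ab_given_z = partial_r(r_ab, pearson_r(a, z), pearson_r(b, z))

print("correlation of A and B:          ", round(r_ab, 3))
print("same, after controlling for Z:   ", round(r_ab_given_z, 3))
```

By construction the raw correlation should land somewhere around 0.5, while the value after controlling for z should sit close to zero. The catch in real research is that nobody hands you z; you can only control for variables you thought to measure.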
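The night-light story has the same shape, and a toy simulation shows how it can arise. The probabilities below are invented purely for illustration and are not taken from either study; the names simulate and myopia_rate are likewise my own. In this made-up world the light has no effect whatsoever on the child’s eyes: parental myopia alone decides both how likely the light is left on and how likely the child becomes myopic.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Made-up probabilities, chosen only for illustration.
P_PARENT_MYOPIC = 0.3
P_LIGHT = {True: 0.7, False: 0.3}         # P(light on | parent myopic?)
P_CHILD_MYOPIC = {True: 0.4, False: 0.1}  # P(child myopic | parent myopic?)

def simulate(n):
    """Return n (parent_myopic, light_on, child_myopic) records."""
    rows = []
    for _ in range(n):
        parent = random.random() < P_PARENT_MYOPIC
        light = random.random() < P_LIGHT[parent]
        child = random.random() < P_CHILD_MYOPIC[parent]  # ignores the light
        rows.append((parent, light, child))
    return rows

def myopia_rate(rows, light, parent=None):
    """Share of myopic children among those with the given light setting,
    optionally restricted to one parental group."""
    group = [c for p, l, c in rows
             if l == light and (parent is None or p == parent)]
    return sum(group) / len(group)

rows = simulate(20000)

# Comparing all children: the light looks harmful.
overall_gap = myopia_rate(rows, True) - myopia_rate(rows, False)

# Comparing only children whose parents are alike: the gap mostly vanishes.
gap_myopic_parents = myopia_rate(rows, True, True) - myopia_rate(rows, False, True)
gap_other_parents = myopia_rate(rows, True, False) - myopia_rate(rows, False, False)

print("gap, all children:         ", round(overall_gap, 3))
print("gap, myopic parents only:  ", round(gap_myopic_parents, 3))
print("gap, other parents only:   ", round(gap_other_parents, 3))
```

Splitting the children by parental myopia is a crude form of what the re-examination did: once like is compared with like, the apparent effect of the light should shrink to roughly nothing, even though it looked substantial in the pooled numbers.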
What’s the point of all this again?
But then you ask: if even seasoned scientists fall prey to this mistake, i.e. immediately jumping from correlation to causation, how should a regular guy/gal like me fare any better? The point of this article is to promote healthy skepticism, which starts by not taking every word that comes from an authority figure, no matter how professional or experienced that figure is, as the gospel truth. Since you can read this article on the web, you can surely also read other articles, check my resources and references below, and search the Internet about any scientific topic, discovery, or research result you have just read.
I think this is a very important but easily overlooked phrase, since forgetting it can lead to confusion, fear, anxiety, hysteria and so on. The scientific examples given above probably only gave women and parents something to worry about. But consider, for example, a scientific study correlating the color red with child aggression. If improperly informed, parents, teachers etc. could descend on their children or their children’s belongings, scaring or worrying the children in the process, based only on correlation and not causation.
Resources, references, and further reading:
- Technical Brief on the 1999 Statistical Model, InfoWorks on “Correlation does not imply causation”: http://www.infoworks.ride.uri.edu/1999/techbrief/techbrief_8.htm
- Beyond The Rhetoric article on the phrase: http://btr.michaelkwan.com/2009/01/10/correlation-does-not-imply-causation/
- Types of correlations, from “SPSS Survival Manual: A step by step guide to data analysis using SPSS” by Julie Pallant, Google books.
- Nice Wikipedia article on the phrase: http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation
- On the correlation of HRT and CHD, International Journal of Epidemiology, 33 (3): 464–7. doi:10.1093/ije/dyh124. PMID 15166201
- CNN, May 13, 1999. Night-light may lead to nearsightedness: http://www.cnn.com/HEALTH/9905/12/children.lights/index.html
- Ohio State University Research News, March 9, 2000. Night lights don’t lead to nearsightedness, study suggests: http://researchnews.osu.edu/archive/nitelite.htm