Read other blogs in this series: Part I, Basic Science and Animal Research | Part III, Anecdotes | Part IV, Expert Opinions

In a recent blog, I looked at the failure of Vitamin A to prevent lung cancer in human trials (despite massive hype and other positive research) to demonstrate the rule that we don’t know something is safe and effective in people until it has been adequately tested in people. In my last blog, I looked at some of the limitations of animal research in predicting human safety and efficacy. In this blog, we will look at how easily correlations can mislead, even when they are based on a large number of observations.

In contrast to much of medicine, which studies disease and health in individuals, epidemiology studies health and disease at the population level. As with animal research, this approach has certain advantages, such as making it possible to uncover the health effects of environmental exposures or to measure how public health policy affects the spread of a pandemic. It also has limitations, particularly when it comes to correlational studies.

In a correlational study, researchers collect data on one or more health outcomes of interest (e.g. lung cancer, longevity, happiness) and several potential predictors of those outcomes (e.g. smoking, diet, TV watching, zip code) in a sample of people. They then look for correlations between the predictors and the outcomes. This seems like a pretty straightforward way to determine whether a given predictor causes a given health outcome or disease, but there are many ways it can go wrong:

  1. There could be bias in the sample. If I’m interested in determining whether farm work is associated with certain diseases, but only sample English-speaking people, I could underestimate some significant risks that may impact more vulnerable non-English speakers.
  2. There could be bias in who responds. If I send out a survey on “Cannabis and Happiness,” the people who respond are likely to have stronger feelings on the topic than the people who don’t.
  3. The results could simply be a statistical fluke. Counterintuitively, the more predictors researchers look at, the more likely it is that at least one of them will appear significant purely by chance. In fact, if you look at enough predictors, you can almost guarantee at least one spurious finding, as happened to a Swedish research group that sought to determine whether living close to power lines caused any of a list of over 800 diseases.
  4. Even if the correlation is real, it does not prove causation. 
    1. Sometimes a correlation may arise because of a shared, but unmeasured, causal factor. For example, yellow teeth may be associated with lung cancer, but that is because both are associated with smoking; teeth whitening will not prevent cancer.
    2. Sometimes the conclusions drawn may actually reflect reverse causation. For example, one may see a correlation between smoking and schizophrenia, and conclude that smoking causes schizophrenia; however, it appears that at least some of this correlation may reflect persons with schizophrenia finding some symptom relief from smoking.
    3. Sometimes a correlation may simply reflect larger trends in society or other confounding factors. This website goes into this and other causation errors in depth, including a striking graph showing the correlation (NOT CAUSATION) between U.S. spending on science and deaths by hanging.
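To put rough numbers on point 3, here is a small Python sketch. It is an illustration only, not a reanalysis of the Swedish study or any other study mentioned here, and it rests on simplifying assumptions: every outcome is tested independently, each test uses the conventional 0.05 significance threshold, and no real effect exists anywhere. Under those assumptions, the chance of at least one spurious "significant" result among n tests is 1 - (1 - 0.05)^n. The helper names `familywise_error_rate` and `simulated_false_positives` are hypothetical, chosen just for this example.

```python
import random

# Assumptions for illustration only: a 0.05 threshold and independent tests.
ALPHA = 0.05
N_OUTCOMES = 800  # roughly the number of diseases in the power-line example


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests
    when every null hypothesis is true (i.e., nothing is really going on)."""
    return 1 - (1 - alpha) ** n_tests


def simulated_false_positives(alpha: float, n_tests: int, seed: int = 0) -> int:
    """Count how many tests come out 'significant' when no real effect exists.

    For a well-behaved test, the p-value under a true null hypothesis is
    uniformly distributed on [0, 1], so we draw p-values directly rather
    than simulating raw patient data.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


if __name__ == "__main__":
    for n in (1, 20, N_OUTCOMES):
        print(f"{n:>4} tests: P(at least one fluke) = {familywise_error_rate(ALPHA, n):.4f}")
    flukes = simulated_false_positives(ALPHA, N_OUTCOMES)
    print(f"Simulated 'significant' diseases out of {N_OUTCOMES}, none real: {flukes}")
```

With 20 tests, the chance of at least one fluke is already about 64%; with 800, it is effectively certain, and on average about 40 of the 800 diseases (5%) would look "significant" even though none of the effects are real. Real studies are messier (outcomes are often related to each other, and researchers can apply statistical corrections for multiple comparisons), so treat these figures as rough intuition rather than a description of any particular study.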

The key takeaway here is that one must be skeptical of drawing strong conclusions, particularly about causation, from observational and correlational studies. This happens all the time; many news headlines and medical bullshit books are based on very weak and spurious correlations when you track down the source of the claim.