Statistical thought experiments

I guess I'm one of the few "numerate" people around. At the end of "Deep Impact," when the comet the size of Mt. Everest was blown up into a million pieces, I conjecture that I was the only person in the theater to figure out that each piece still weighed roughly one million tons.
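
The arithmetic behind this is conservation of mass: breaking a body into N pieces divides its mass by N. Here is a rough sketch. It rests on my own assumptions, not on figures from the film: the comet is a sphere whose diameter equals Everest's height of about 8,849 m, and it has a round density of 1,000 kg/m³ (about that of water).

```python
import math


def sphere_mass_tonnes(diameter_m: float, density_kg_m3: float) -> float:
    """Mass in metric tonnes of a solid sphere of the given diameter and density."""
    radius_m = diameter_m / 2
    volume_m3 = (4 / 3) * math.pi * radius_m ** 3
    return volume_m3 * density_kg_m3 / 1000  # 1 tonne = 1000 kg


# Assumptions (not from the film): spherical comet, diameter equal to
# Everest's height (~8,849 m), density ~1,000 kg/m^3.
comet_tonnes = sphere_mass_tonnes(8_849, 1_000)

# Blowing the comet up does not destroy mass; it only divides it.
pieces = 1_000_000
tonnes_per_piece = comet_tonnes / pieces

print(f"comet:     {comet_tonnes:.2e} tonnes")
print(f"per piece: {tonnes_per_piece:.2e} tonnes")
```

With these assumed inputs each piece comes out at a few hundred thousand tonnes, the same order of magnitude as the figure above. The result scales linearly with the assumed density and with the cube of the assumed diameter, so it should only be trusted to within a factor of a few.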

This may not seem like a particularly interesting skill, but it does have some personal benefits. For example, I can easily answer the well-meaning yet annoying relatives who insist that I should wear a helmet whenever I ride my bicycle. My response is that they are incorrectly applying the "ergodic hypothesis," which asserts that the statistical history of a single individual reflects the statistical average of the whole system at a given time (in other words, that time averages equal ensemble averages). In particular, since I've never crashed in the 50,000 miles I've ridden in the last 7 years, I probably won't crash in the future either, no matter what most other people's experience is. A more persuasive argument against the ergodic hypothesis is to design what I call a "statistical thought experiment," i.e., a study which would never be carried out because its premise is inherently flawed. In this case:

"Since 10% of the population is over 6ft. tall, I will be over 6ft. tall for 10% of my life."

That was a somewhat inconsequential example, but such thought experiments can play an important role in casting doubt on entire scientific fields, especially when they predict correct results. For example, the pitfalls of epidemiology are highlighted by the following correct thought experiment:

"Smoking reduces a woman's chance of dying of breast cancer."

In fact, one can easily generate numerous "correct" flawed studies:

Collecting such fallacies was a hobby of mine, but I've recently become aware of many existing studies which exhibit such inherent flaws. The example that got me going was a story on National Public Radio (usually a goldmine of disinformation, since educational radio and TV are often the only news sources that go into enough depth to get things really wrong) claiming that a study had shown that cigar smokers have an even higher rate of heart disease than cigarette smokers. My immediate reaction was the undoubtedly correct thought experiment:

"Driving a Cadillac causes heart disease."

This got me wondering about the risks of smoking. In particular, it seems clear to me that smoking in the U.S.A. is highly correlated with poor diet, lack of exercise, and poverty, so any study correlating smoking with heart disease should take these factors into account (I'm assuming that smoking doesn't actually cause these factors). The relevant thought experiment in this case is the plausible claim:

"Heart disease among smokers in Switzerland is lower than the instance of heart disease in the general U.S. population."

The moral is, as always when dealing with statistical studies, to spot the hidden agenda motivating the study and see how it leads to skewed results.

Challenge Problem: Design your own statistical thought experiment.

Explanation of statistical thought experiments.

  1. If a woman smokes, she is more likely to die of lung cancer, heart disease, or respiratory disease before she has a chance to die of anything else, including breast cancer.

  2. The easiest explanation is the similar thought experiment:

    "Right handed people also live longer than the rest of the population."

    The point is that it takes some time for a child to exhibit handedness, so "the general population" includes infant deaths in the first year (moreover, mortality is higher near birth and the low age compared to the average gives these deaths greater weight). Some psychologists have objected because they claim that handedness is apparent from birth. In that case, substitute "left handed people" with "firefighters."
  3. The vast majority of cycling accidents happen to children and these are now required to wear helmets in states such as California. Of course, this doesn't affect a particular individual's risk of having an accident.

  4. People who exercise vigorously need a minimum level of health to exercise at that intensity, so a fair study would have to compare them only with people who are healthy enough to exercise vigorously but choose not to. Unlike the other examples, a study with this exact flaw has probably actually been carried out.

  5. Cadillac drivers are usually older men, the group with by far the highest incidence of heart disease. Similarly, most cigar smokers are older men.

  6. Smoking is common in Switzerland, yet it does not seem to be as strongly correlated with poor diet or lack of exercise as it is in the U.S. Moreover, there is less poverty in Switzerland than in the U.S., where a significant percentage of the population has no health insurance.
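
The survivorship effect behind explanation 2 can be checked numerically. In this sketch the ages at death are invented, handedness is assumed to have no effect on lifespan, and handedness is assumed not to be recordable before age 4:

```python
# Hypothetical ages at death (years) and counts for 100 people, invented
# for illustration. Handedness is assumed to have no effect on lifespan.
deaths_by_age = {0: 5, 2: 1, 40: 4, 70: 60, 85: 30}

# Assumption: handedness cannot be recorded before this age.
AGE_HANDEDNESS_KNOWN = 4


def mean_age_at_death(deaths: dict[int, int]) -> float:
    total = sum(deaths.values())
    return sum(age * n for age, n in deaths.items()) / total


everyone = mean_age_at_death(deaths_by_age)

# Anyone classified as right- (or left-) handed must have survived long
# enough to be classified, so the group silently excludes early deaths.
# If handedness is unrelated to lifespan, the right-handed group has the
# same age distribution as all classifiable people.
classifiable = {
    age: n for age, n in deaths_by_age.items() if age >= AGE_HANDEDNESS_KNOWN
}
right_handed = mean_age_at_death(classifiable)

print(f"mean age at death, everyone:     {everyone:.2f}")
print(f"mean age at death, right-handed: {right_handed:.2f}")
```

With these numbers the mean age at death is 69.12 for everyone but about 73.5 for people old enough to be classified, so the right-handed group "lives longer" by more than four years without handedness doing anything.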