How to Lie with Statistics
by Darrell Huff
The book is a fantastic primer on how we’re tricked, daily, by the sneaky use of statistics. It’s a must-read for anyone who wants to interpret news, media, and even medical research more intelligently. Once you learn to see these tricks, you cannot unsee them.
Don’t be fooled by missing data: Just because taking a drug clears up a cold in one week doesn’t mean that it wouldn’t have cleared up in a week on its own.
Many conclusions you see come from samples that are too small, biased, or both.
When you hear a statistic, say, that the average American brushes their teeth 1.02 times a day, ask yourself: “How could they have figured that out?” Could it realistically have been measured? In this case, the researchers would have had to ask people, and isn’t it safe to assume that some of them lied?
“To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed.”
If a psychiatrist says that “practically everyone is neurotic,” do you suppose that their impression has been biased by their line of work?
There are three kinds of average:
The mean: add up all the values and divide by the number of values.
The mode: the most common value.
The median: the middle value when the values are sorted from smallest to largest.
These can be very different numbers, and reporters and others will pick the one that best supports their argument.
In a normal distribution the three sit close together (in a perfectly symmetric one they coincide), but in a skewed distribution (e.g., annual household income) they can be vastly different.
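To see how far apart the three can land, here is a small Python sketch using the standard `statistics` module on a made-up list of household incomes (the numbers are hypothetical, chosen only so that one large income skews the data):

```python
from statistics import mean, median, mode

# Hypothetical annual household incomes in dollars (made-up numbers).
# A single very large income skews the distribution to the right.
incomes = [30_000, 30_000, 30_000, 45_000, 50_000, 55_000, 60_000, 1_000_000]

avg_mean = mean(incomes)      # pulled upward by the one large income
avg_median = median(incomes)  # midpoint of the sorted values (mean of the two middle ones here)
avg_mode = mode(incomes)      # the most frequent value

print(f"mean:   {avg_mean:,.0f}")
print(f"median: {avg_median:,.0f}")
print(f"mode:   {avg_mode:,.0f}")
```

With these numbers the mean is 162,500, the median 47,500 and the mode 30,000, so a writer could truthfully call any of the three “the average income” of this group.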
Companies will keep running experiments until they get the results they want, discarding the experiments that “failed to produce significant findings.”
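Why this works: if a product does nothing and each experiment has, say, a 5 percent chance of producing a “significant” result by luck alone, then running enough experiments almost guarantees a fluke to publish. A rough Python sketch, assuming the experiments are independent and using 5 percent as an illustrative false-positive rate (neither number comes from the book; `chance_of_a_fluke` is a made-up helper name):

```python
def chance_of_a_fluke(num_experiments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_experiments` independent tests
    of a useless product comes out 'significant' purely by chance,
    assuming each test has a false-positive rate of `alpha`."""
    return 1 - (1 - alpha) ** num_experiments

for n in (1, 5, 20):
    print(f"{n:>2} experiments: {chance_of_a_fluke(n):.0%} chance of at least one fluke")
```

Under these assumptions, twenty quiet experiments give roughly a 64 percent chance of at least one “success” to advertise.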
The smaller the sample, the more its results swing around by chance. With 10 coin flips you might get 8 heads, but you’re far less likely to get 80 heads in 100 flips.
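The coin-flip comparison can be made exact with the binomial distribution. This Python sketch counts the outcomes with at least the given number of heads for a fair coin; `prob_at_least` is just a helper name made up for the example:

```python
from math import comb

def prob_at_least(heads: int, flips: int) -> float:
    """Exact probability of getting at least `heads` heads in `flips`
    tosses of a fair coin (binomial distribution with p = 0.5)."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

print(f"P(>= 8 heads in 10 flips)   = {prob_at_least(8, 10):.4f}")
print(f"P(>= 80 heads in 100 flips) = {prob_at_least(80, 100):.2e}")
```

Both cases are the same 80 percent share of heads, yet the first happens about 5 percent of the time (56 out of 1,024 equally likely sequences), while the second has less than a one-in-a-billion chance.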
The same data can be graphed in ways that create very different impressions.
Always check the range of values shown on both axes: a truncated or stretched axis can make a modest change look dramatic, or a dramatic one look flat.
The Semiattached Figure
“If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing.”
For example: “You can’t prove that your nostrum cures colds, but you can publish (in large type) a sworn laboratory report that half an ounce of the stuff killed 31,108 germs in a test tube in eleven seconds.”
More: “‘27 percent of a large sample of eminent physicians smoke Throaties— more than any other brand.’ The figure itself may be phony, of course, in any of several ways, but that really doesn’t make any difference. The only answer to a figure so irrelevant is ‘So what?’ With all proper respect toward the medical profession, do doctors know any more about tobacco brands than you do? Do they have any inside information that permits them to choose the least harmful among cigarettes? Of course they don’t, and your doctor would be the first to say so. Yet that ‘27 percent’ somehow manages to sound as if it meant something.”
Or: “By the same kind of nonsense that the article writer used you can show that clear weather is more dangerous than foggy weather. More accidents occur in clear weather, because there is more clear weather than foggy weather. All the same, fog may be much more dangerous to drive in.”
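The fix for the fog example is to compare rates rather than raw counts: divide accidents by how much driving happened in each condition. A Python sketch with invented figures (the accident counts and hours below are hypothetical, not from the book):

```python
# Hypothetical figures, invented for illustration only:
# condition -> (number of accidents, hours driven in that condition)
driving = {
    "clear": (900, 90_000),
    "fog": (100, 5_000),
}

def accidents_per_1000_hours(accidents: int, hours: float) -> float:
    """Accident rate normalized by exposure (hours of driving)."""
    return accidents / hours * 1_000

for condition, (accidents, hours) in driving.items():
    rate = accidents_per_1000_hours(accidents, hours)
    print(f"{condition:>5}: {accidents:>3} accidents, {rate:.0f} per 1,000 hours")
```

Clear weather has nine times as many accidents here, but fog has twice the accident rate per hour of driving.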
You can also represent the same data in many different ways: “There are often many ways of expressing any figure. You can, for instance, express exactly the same fact by calling it a one percent return on sales, a fifteen percent return on investment, a ten-million-dollar profit, an increase in profits of forty percent (compared with 1935–39 average), or a decrease of sixty percent from last year.”
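Working backward shows why all five framings can be true at once: each divides the same $10 million by a different base. This Python sketch only does that arithmetic; the implied sales, investment and earlier profits are derived from the percentages in the quote, not figures Huff gives:

```python
profit = 10_000_000  # dollars: the "ten-million-dollar profit" in the quote

# Each framing relates the same profit to a different (implied) base.
sales = profit / 0.01             # "one percent return on sales"
investment = profit / 0.15        # "fifteen percent return on investment"
baseline_1935_39 = profit / 1.40  # "increase in profits of forty percent"
last_year = profit / (1 - 0.60)   # "decrease of sixty percent from last year"

print(f"implied sales:            ${sales:,.0f}")
print(f"implied investment:       ${investment:,.0f}")
print(f"implied 1935-39 average:  ${baseline_1935_39:,.0f}")
print(f"implied last-year profit: ${last_year:,.0f}")
```

The same profit is “1 percent” against about $1 billion of sales and “15 percent” against roughly $67 million of investment; a writer picks whichever base tells the preferred story.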
Correlation vs. Causation
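Huff calls this the post hoc fallacy, illustrated with a finding that students who smoked got lower grades than nonsmokers: “It is the one that says that if B follows A, then A has caused B. An unwarranted assumption is being made that since smoking and low grades go together, smoking causes low grades. Couldn’t it just as well be the other way around?”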
“This is the post hoc fallacy at its best. It says that these figures show that if you (your son, your daughter) attend college you will probably earn more money than if you decide to spend the next four years in some other manner. This unwarranted conclusion has for its basis the equally unwarranted assumption that since college-trained folks make more money, they make it because they went to college. Actually we don’t know but that these are the people who would have made more money even if they had not gone to college.”
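A small simulation shows how a hidden common cause can produce exactly this kind of correlation. In this hypothetical Python model (all parameters invented for illustration), an unobserved “ability” score makes people both more likely to attend college and likely to earn more, while college itself adds nothing to earnings:

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical model: ability raises both the chance of attending college
# and later earnings. College has NO effect on earnings in this model.
people = []
for _ in range(10_000):
    ability = random.gauss(0, 1)
    attends_college = random.random() < (0.8 if ability > 0 else 0.2)
    earnings = 40_000 + 15_000 * ability + random.gauss(0, 5_000)
    people.append((attends_college, earnings))

college = [e for went, e in people if went]
no_college = [e for went, e in people if not went]

print(f"college:    {mean(college):,.0f}")
print(f"no college: {mean(no_college):,.0f}")
```

The college group still earns noticeably more on average, purely because of who chose to go. Comparing averages alone cannot distinguish this situation from one where college really does raise earnings.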
How to Talk Back to a Statistic
These five questions help you avoid getting tricked by statistics.
Who Says So?
“About the first thing to look for is bias— the laboratory with something to prove for the sake of a theory, a reputation, or a fee; the newspaper whose aim is a good story; labor or management with a wage level at stake.”
How Does He Know?
“Watch out for evidence of a biased sample, one that has been selected improperly or— as with this one— has selected itself. Ask the question we dealt with in an early chapter: Is the sample large enough to permit any reliable conclusion?”
What’s Missing?
“Watch out for an average, variety unspecified, in any matter where mean and median might be expected to differ substantially.”
“Sometimes it is percentages that are given and raw figures that are missing, and this can be deceptive too. Long ago, when Johns Hopkins University had just begun to admit women students, someone not particularly enamored of coeducation reported a real shocker: Thirty-three and one-third percent of the women at Hopkins had married faculty members! The raw figures gave a clearer picture. There were three women enrolled at the time, and one of them had married a faculty man.”
“A report of a great increase in deaths from cancer in the last quarter-century is misleading unless you know how much of it is a product of such extraneous factors as these: Cancer is often listed now where “causes unknown” was formerly used; autopsies are more frequent, giving surer diagnoses; reporting and compiling of medical statistics are more complete; and people more frequently reach the most susceptible ages now. And if you are looking at total deaths rather than the death rate, don’t neglect the fact that there are more people now than there used to be.”
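The last point, totals versus rates, is easy to check with arithmetic. In this Python sketch the population and death counts are hypothetical, chosen only to show that the total can rise while the rate falls:

```python
# Hypothetical figures, invented for illustration only.
then = {"population": 100_000_000, "cancer_deaths": 150_000}
now = {"population": 150_000_000, "cancer_deaths": 210_000}

def deaths_per_100k(year: dict) -> float:
    """Crude death rate: deaths per 100,000 people."""
    return year["cancer_deaths"] / year["population"] * 100_000

growth = now["cancer_deaths"] / then["cancer_deaths"] - 1

print(f"total deaths changed by {growth:+.0%}")
print(f"death rate: {deaths_per_100k(then):.0f} -> {deaths_per_100k(now):.0f} per 100,000")
```

Here total cancer deaths rise 40 percent while the death rate per 100,000 people falls from 150 to 140. The crude rate still ignores Huff’s other factors, such as better diagnosis and more people reaching the most susceptible ages.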
Did Somebody Change the Subject?
Is that statistic about the thing itself, or only about what people reported? (For example, how often people say they bathe is not necessarily how often they actually bathe.)
“The ‘population’ of a large area in China was 28 million. Five years later it was 105 million. Very little of that increase was real; the great difference could be explained only by taking into account the purposes of the two enumerations and the way people would be inclined to feel about being counted in each instance. The first census was for tax and military purposes, the second for famine relief.”
Does It Make Sense?
“Hearings on amendments to the Social Security Act have been haunted by various forms of a statement that makes sense only when not looked at closely. It is an argument that goes like this: Since life expectancy is only about sixty-three years, it is a sham and a fraud to set up a social-security plan with a retirement age of sixty-five, because virtually everybody dies before that.”
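The flaw is that a life expectancy figure like this is measured from birth, so deaths in infancy and childhood drag the average down; it says little about how long people who have already reached working age will live. A toy Python example with an invented cohort of 100 people (hypothetical ages at death, not real mortality data):

```python
from statistics import mean

# Hypothetical cohort of 100 people (invented ages at death):
# 20 die in infancy, 30 die at 60, and 50 die at 75.
ages_at_death = [1] * 20 + [60] * 30 + [75] * 50

life_expectancy_at_birth = mean(ages_at_death)
survivors_to_65 = [age for age in ages_at_death if age >= 65]
share_reaching_65 = len(survivors_to_65) / len(ages_at_death)

print(f"life expectancy at birth: {life_expectancy_at_birth:.1f}")
print(f"share who reach 65:       {share_reaching_65:.0%}")
print(f"average age at death of those who reach 65: {mean(survivors_to_65):.0f}")
```

This made-up cohort has a life expectancy at birth of 55.7 years, well below the retirement age, yet half of its members reach 65, and those who do live to 75 on average.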