A re-read. This time, in the interests of retention, I've written down one key reminder (in some cases, a couple) from each chapter:
- Counting. Ask how the thing being counted has been defined. Also remember that the data collection behind any count is probably imperfect.
- Size. Ask whether the number is a big number in its context. A useful hack is to divide the number by something human to put it in proportion - e.g. for a government expenditure, divide by the affected population and/or convert it to a daily/weekly/annual cost (sketch after this list).
- Chance. Clusters will form by chance rather than cases being perfectly evenly distributed - e.g. a higher-than-usual cluster of cancer incidence may well just be chance (sketch after this list).
- Up and Down. Numbers naturally fluctuate over time, and extreme values tend to be followed by more ordinary ones (regression to the mean) - so look at a longer timeframe before identifying trends or attributing causation (e.g. traffic accident rates before/after speed cameras are installed; sketch after this list).
- Averages. It's important to have a sense of the distribution behind an average. Extremes at either end have a large effect on the mean (e.g. mean wealth). Consider whether the mean, median or mode is most useful, and which group you're actually interested in. Average is not the same as typical (sketch after this list).
- Performance. Measures of performance capture only the thing being measured (not the whole picture) and may be gamed. One approach is to flag only those results so far out of line that chance is an implausible explanation and a real problem is indicated (sketch after this list).
- Risk. To understand how worried we should be by a percentage increase in the risk of something, we need to know the baseline. The baseline may be so low that even an apparently large percentage increase makes little practical difference (sketch after this list).
- Sampling. Is the sample large enough that it could plausibly represent the total population? What biases might exist in the sample? Small biases can lead to large errors when samples are extrapolated to whole populations. Check the confidence interval - strictly, the range that would capture the true value in 95 percent of repeated samples; in practice, a gauge of how precise the estimate is (sketch after this list).
- Data. Collecting good data is difficult.
- Shock figures. Outliers exist in most distributions, and an outlier will often explain an extreme data point - better that than concluding a whole paradigm of accepted knowledge needs to change. Extremes should be treated with healthy scepticism - expect a higher standard of proof before accepting them (sketch after this list).
- Comparison. When comparing groups, make sure they are like-for-like in all relevant ways. Composite indicators (which bundle together multiple measures) are especially tricky.
- Causation. Always ask whether a link is mere correlation rather than causation - e.g. girls do better in single-sex schools, but this is probably explained by single-sex pupils having higher socioeconomic status and the schools being selective. The more plausible something sounds, the more likely we are to mistake correlation for causation (sketch after this list).
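
Several of the points above lend themselves to toy sketches in Python. First, Size: a minimal example of "divide by something human". All figures are hypothetical, purely for illustration.

```python
# "Divide by something human": scale a headline government figure down
# to a per-person, per-week cost. All figures here are made up.

headline_spend = 12_000_000_000   # a "12 billion" programme (hypothetical)
population = 67_000_000           # affected population (hypothetical)

per_person = headline_spend / population
per_week = per_person / 52

print(f"headline:             {headline_spend:,.0f}")
print(f"per person:           {per_person:,.2f}")
print(f"per person, per week: {per_week:,.2f}")
```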
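Chance: a sketch of how lumpy purely random scatter looks. Cases dropped uniformly across equal-sized districts still pile up somewhere.

```python
import random

# Scatter cases uniformly at random across equal-sized districts and
# look at the largest "cluster" that appears by chance alone.
random.seed(1)

N_CASES, N_DISTRICTS = 1000, 100   # expected count per district: 10

counts = [0] * N_DISTRICTS
for _ in range(N_CASES):
    counts[random.randrange(N_DISTRICTS)] += 1

print("expected per district:", N_CASES / N_DISTRICTS)
print("largest chance cluster:", max(counts))
print("smallest:", min(counts))
```

The largest cluster typically comes out at around double the expected count - no cause required.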
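Up and Down: a sketch of regression to the mean in the speed-camera setting. The underlying risk never changes, yet the worst year-one sites "improve".

```python
import math
import random

# 500 junctions with an identical true accident rate in both years.
# Pick the worst year-1 sites (where cameras would go) and watch their
# year-2 counts fall with no intervention at all.
random.seed(2)

def poisson(lam):
    # Knuth's method, to avoid external dependencies.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

TRUE_RATE = 5.0
year1 = [poisson(TRUE_RATE) for _ in range(500)]
year2 = [poisson(TRUE_RATE) for _ in range(500)]

worst = [i for i, c in enumerate(year1) if c >= 10]   # "camera" sites
before = sum(year1[i] for i in worst) / len(worst)
after = sum(year2[i] for i in worst) / len(worst)
print(f"worst sites, year 1: {before:.1f} accidents on average")
print(f"same sites, year 2:  {after:.1f} - a 'fall' with no cause")
```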
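Averages: a sketch of how one extreme value drags the mean of a skewed distribution while the median barely moves.

```python
import random
import statistics

# A right-skewed "wealth" distribution (hypothetical units).
random.seed(3)
wealth = [random.lognormvariate(10, 1) for _ in range(9_999)]

print(f"mean:   {statistics.mean(wealth):>14,.0f}")
print(f"median: {statistics.median(wealth):>14,.0f}")

wealth.append(10_000_000_000)   # one billionaire joins the dataset
print("after adding one extreme value:")
print(f"mean:   {statistics.mean(wealth):>14,.0f}")
print(f"median: {statistics.median(wealth):>14,.0f}")
```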
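Performance: a sketch of the "so far out of line" idea - a crude three-sigma check on failure rates that flags only units where chance is an unlikely explanation. Unit names and figures are hypothetical.

```python
import math

# (failures, total cases) per unit - all hypothetical.
units = {"A": (14, 200), "B": (22, 250), "C": (55, 300), "D": (9, 150)}
overall = sum(f for f, n in units.values()) / sum(n for _, n in units.values())

for name, (fails, n) in units.items():
    expected = overall * n
    sd = math.sqrt(n * overall * (1 - overall))   # binomial spread
    z = (fails - expected) / sd
    verdict = "INVESTIGATE" if abs(z) > 3 else "within normal variation"
    print(f"{name}: rate {fails / n:.1%}, z = {z:+.1f} -> {verdict}")
```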
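Risk: the baseline arithmetic, with made-up numbers - a headline "50% increase" on a rare risk.

```python
# Hypothetical: "risk up 50%!" on a 1-in-10,000 baseline.
baseline = 1 / 10_000
new_risk = baseline * 1.5

print(f"baseline risk:   {baseline:.4%}")
print(f"new risk:        {new_risk:.4%}")
print(f"absolute change: {new_risk - baseline:.4%}")
print(f"extra cases per million people: {(new_risk - baseline) * 1_000_000:.0f}")
```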
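Sampling: the standard 95 percent interval for a sampled proportion, p ± 1.96·sqrt(p(1−p)/n). Note that precision comes from the sample size n, not from the size of the population being sampled.

```python
import math

def margin_of_error(p, n, z=1.96):
    # Half-width of the usual 95% interval for a proportion.
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: 50% plus or minus {margin_of_error(0.5, n):.1%}")
```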
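Shock figures: a sketch of how routinely extremes turn up in big samples - a million draws from a plain normal distribution still produce dozens of "shocking" values.

```python
import random

random.seed(4)
draws = 1_000_000
extreme = sum(1 for _ in range(draws) if abs(random.gauss(0, 1)) > 4)

# Roughly 60 are expected by chance alone (2 * P(Z > 4) * 1e6).
print(f"{extreme} values beyond 4 sigma in {draws:,} draws")
```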
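Causation: a sketch of a confounder manufacturing a correlation. Here "school type" has zero effect on scores; prior attainment drives both selection into the school and the final result, so a raw comparison still shows a gap. All parameters are invented.

```python
import random

random.seed(5)
pupils = []
for _ in range(10_000):
    prior = random.gauss(0, 1)                     # SES / prior attainment
    selective = prior + random.gauss(0, 1) > 1.0   # selective intake
    score = 50 + 10 * prior + random.gauss(0, 5)   # school plays no part
    pupils.append((selective, score))

sel = [s for flag, s in pupils if flag]
non = [s for flag, s in pupils if not flag]
print(f"selective schools:     {sum(sel) / len(sel):.1f}")
print(f"non-selective schools: {sum(non) / len(non):.1f}  (gap, no causation)")
```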