Mathematics & Probability
Section 1
The Core Idea
Most people think they update their beliefs when new evidence arrives. They don't. They filter evidence through existing beliefs and keep whatever confirms what they already think. Bayes' Theorem is the mathematical antidote — a 260-year-old formula that tells you exactly how much to change your mind, and in which direction, given new information. The equation is compact: P(A|B) = P(B|A) × P(A) / P(B). The discipline it demands is not.
The theorem was discovered by Reverend Thomas Bayes, an English Presbyterian minister and amateur mathematician who never published it. After Bayes died in 1761, his friend Richard Price found the manuscript, refined it, and presented it to the Royal Society in 1763. Pierre-Simon Laplace independently derived a more general version in 1774 and spent the next four decades developing its implications. The irony is thick: a clergyman's unpublished paper became the foundation of modern statistical inference, machine learning, and quantitative finance.
Laplace built an entire mathematical edifice on it. The 20th-century frequentist school — led by Ronald Fisher and his intellectual descendants — tried to bury it for the better part of a century. They largely succeeded in academia. But Bayesian methods kept surviving in practice, in the places where getting the right answer mattered more than methodological purity: wartime codebreaking, insurance pricing, nuclear weapons testing. By the 1990s, when cheap computing power made Bayesian calculations tractable for complex problems, the theorem came roaring back. Today it powers everything from Google's search algorithm to the trading systems at Renaissance Technologies.
Here's what the formula actually says, stripped of notation. You start with a belief — your prior. Say you estimate a 10% chance that a startup will reach $100 million in revenue. Then you observe evidence: the company signs enterprise contracts with three Fortune 500 clients in a single quarter. Bayes' Theorem tells you to update your 10% estimate, but by how much?
The answer depends on one question: how likely would you have been to see this evidence if your hypothesis were true, compared to if it were false? If signing three Fortune 500 deals is extremely unlikely for a company that won't reach $100M (say, 2% chance) but fairly likely for one that will (say, 60%), the evidence is "surprising" — it carries a high likelihood ratio. Your posterior belief should jump dramatically, perhaps to 75% or higher. If the evidence is roughly what you'd expect either way, it barely moves the needle. That ratio — the probability of the evidence given your hypothesis divided by the probability of the evidence given the alternative — is everything.
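The startup example above can be worked through directly. This is a minimal sketch of the update rule, using the numbers from the text (a 10% prior, a 60% likelihood of the evidence if the company reaches $100M, and 2% if it doesn't):

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return the posterior P(H|E) given a prior and two likelihoods."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1 - prior)
    return numerator / evidence

# Startup example: 10% prior; three Fortune 500 deals are 60% likely
# if the company will reach $100M, 2% likely if it won't.
posterior = bayes_update(0.10, 0.60, 0.02)
print(round(posterior, 3))  # → 0.769
```

The posterior lands at roughly 77%, consistent with the "75% or higher" figure in the text — the 30-to-1 likelihood ratio does almost all of the work.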
That likelihood ratio is the engine of the whole framework. It's what separates Bayesian reasoning from gut-feeling updates. Most people do one of two things when they encounter new information: they ignore it entirely (anchoring to their prior) or they overreact to it (recency bias, narrative fallacy). Bayes forces a calibrated middle path. The update is proportional to the surprise. Expected evidence teaches you almost nothing. Shocking evidence should transform your worldview.
Ed Thorp built a career on this insight — from the blackjack tables of Las Vegas in the 1960s to options markets in the 1970s. Jim Simons built the most profitable hedge fund in history around it. Alan Turing used it to crack the Enigma cipher at Bletchley Park, inventing his own unit of measurement (the "ban") to quantify how much each intercepted message should shift the probability distribution over possible settings.
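Turing's ban is just the base-10 logarithm of a likelihood ratio, which turns Bayesian updating into addition: each piece of evidence contributes its bans to a running total. A sketch of the idea — the prior odds and the per-message likelihood ratios here are invented for illustration, not historical Enigma figures:

```python
import math

def bans(likelihood_ratio):
    """Evidence in bans: log10 of P(E|H) / P(E|not H)."""
    return math.log10(likelihood_ratio)

# Illustrative prior odds of 1:99 against a candidate setting, then
# three independent pieces of evidence with likelihood ratios 10, 5, 4.
prior_odds = 1 / 99
total_bans = math.log10(prior_odds) + sum(bans(lr) for lr in [10, 5, 4])

# Convert accumulated log-odds back to a probability.
posterior_odds = 10 ** total_bans
posterior = posterior_odds / (1 + posterior_odds)
print(round(posterior, 3))  # → 0.669
```

Accumulating evidence as sums of logs rather than products of ratios is the same trick modern systems use to avoid numerical underflow when multiplying thousands of tiny probabilities.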
The formula hasn't changed since 1763. What's changed is the number of people who can apply it under pressure — and the computational power available to those who can. Machine learning, at its core, is Bayesian updating at industrial scale: algorithms that start with priors, observe millions of data points, and converge on posteriors that can predict credit risk, diagnose diseases, or translate languages. The theorem is everywhere. The discipline to apply it correctly — to resist the temptation to anchor, to override the emotional resistance to updating, to accept that you might be wrong — remains rare.
Consider medical diagnostics — the domain where base rate neglect causes the most measurable harm. A mammogram has 90% sensitivity (it correctly detects 90% of actual cancers) and a 9% false positive rate. A 40-year-old woman with no family history tests positive. Most doctors — and virtually all patients — hear "90% accuracy" and conclude the probability of cancer is roughly 90%.
The actual number, calculated via Bayes, is approximately 9%.
The base rate of breast cancer in that population is about 1%. When you work the math, the low prior probability dominates: of every 1,000 women screened, roughly 10 have cancer (9 of whom test positive) and 990 don't (89 of whom also test positive). The positive result is far more likely to be a false alarm than a true detection.
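The same arithmetic can be run as a simple count over a hypothetical screening cohort of 1,000 women, using the figures from the text (1% base rate, 90% sensitivity, 9% false positive rate):

```python
# Mammogram example, worked as natural frequencies over 1,000 women.
cohort = 1000
with_cancer = cohort * 0.01              # 10 women actually have cancer
true_positives = with_cancer * 0.90      # 9 of them test positive
without_cancer = cohort * 0.99           # 990 women are cancer-free
false_positives = without_cancer * 0.09  # ~89 of them also test positive

# Of all positive tests, what fraction are real?
p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(round(p_cancer_given_positive, 3))  # → 0.092
```

Nine true positives against roughly 89 false alarms gives about 9% — the counting format makes it hard to forget the 990 healthy women who were also screened.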
Gerd Gigerenzer, the German psychologist who has spent decades studying this, found that roughly 80% of physicians get this calculation wrong. Not medical students — practising physicians. That's not a minor cognitive hiccup. It leads to unnecessary biopsies, treatment for conditions that don't exist, and a systematic distortion of how patients understand risk. Gigerenzer's proposed fix — presenting statistics as natural frequencies rather than percentages — dramatically improved physician accuracy in clinical trials. The problem was never that doctors can't do math. The problem was that percentages obscure base rates in a way that frequencies don't.
The non-obvious implication runs deeper than medical math. Rare events require extraordinary evidence to confirm. When the base rate is low — whether you're diagnosing cancer, evaluating a fraud accusation, or assessing the probability that a startup will become a unicorn — even a highly accurate signal produces mostly false positives. The people who internalise this have an enormous decision-making advantage over those who don't. And the advantage compounds: every decision calibrated by base rates produces slightly better outcomes, and those outcomes accumulate across hundreds of decisions into a measurably superior track record.
Nate Silver's FiveThirtyEight model gave Donald Trump a 29% chance of winning the 2016 presidential election while most forecasters gave him 2–15%. Silver wasn't more informed. He was more Bayesian — incorporating base rates of polling errors and structural uncertainty that others ignored. When Trump won, commentators called Silver "wrong."
But a 29% event happening once isn't evidence of a broken model. It's evidence that the model understood something the pundits didn't: that uncertainty is a feature of reality, not a flaw in the forecast. The Huffington Post's model gave Trump a 2% chance. Princeton's Sam Wang gave him less than 1%. Those models failed not because they lacked data but because they ignored the base rate of polling errors — exactly the kind of mistake Bayes' Theorem is designed to prevent.