Simpson's paradox is the reversal of a trend or comparison between aggregated data and the same data split into groups. A treatment can look better in every subgroup but worse overall, or worse in every subgroup but better overall. The cause is usually a confounding variable: a factor that is distributed unevenly across the groups being compared and that also affects the outcome. Ignore the confound and you see one result; control for it (by looking within groups) and you see the opposite. The paradox is named after Edward Simpson (1951), though the phenomenon appears in Yule (1903) and earlier.
Classic example: a university is accused of bias because a lower share of female applicants than male applicants is admitted overall. When admission rates are broken down by department, women have equal or higher admission rates in almost every department. The confound is that women apply in greater numbers to highly selective departments, so the aggregate comparison mixes application mix with admission policy. The right conclusion depends on the question: if you care about fairness by department, the data show no bias against women; if you care about overall representation, the mix of applications matters.
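To make the admissions example concrete, here is a minimal sketch with two departments. The counts are invented for illustration and are not real admissions data; the department names and the helper functions are likewise hypothetical.

```python
# Hypothetical counts, invented for illustration (not real admissions data).
# Layout: department -> gender -> (applicants, admitted)
applications = {
    "less_selective": {"men": (80, 48), "women": (20, 13)},
    "more_selective": {"men": (20, 4), "women": (80, 20)},
}


def department_rates(data):
    """Admission rate per department and gender."""
    return {
        dept: {g: admitted / applicants for g, (applicants, admitted) in by_gender.items()}
        for dept, by_gender in data.items()
    }


def overall_rates(data):
    """Admission rate per gender after pooling all departments."""
    totals = {}
    for by_gender in data.values():
        for g, (applicants, admitted) in by_gender.items():
            prev_app, prev_adm = totals.get(g, (0, 0))
            totals[g] = (prev_app + applicants, prev_adm + admitted)
    return {g: admitted / applicants for g, (applicants, admitted) in totals.items()}


by_dept = department_rates(applications)
overall = overall_rates(applications)

for dept, rates in by_dept.items():
    print(f"{dept}: men {rates['men']:.0%}, women {rates['women']:.0%}")
print(f"overall: men {overall['men']:.0%}, women {overall['women']:.0%}")
# less_selective: men 60%, women 65%
# more_selective: men 20%, women 25%
# overall: men 52%, women 33%
```

In this toy data women are admitted at a higher rate in each department, yet at a lower rate overall, because 80 of the 100 women applied to the department that admits far fewer applicants.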
In business, Simpson's paradox appears whenever you aggregate across segments that differ on a key driver. Conversion might rise on both mobile and desktop when you look at each channel separately, yet fall overall if the traffic mix shifts toward the lower-converting channel. Revenue per user might rise in every segment yet fall overall if you're adding users in lower-ARPU segments. The discipline is to check: does the trend hold within subgroups? If not, you're in Simpson's paradox territory: report by segment and state the confound.
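One way to check for mix shift is to split the change in an aggregate rate into a part due to within-segment rates and a part due to segment mix. The sketch below does this for conversion by device. The shares and rates are made-up numbers, and `decompose_change` is a hypothetical helper, one of several equally valid ways to do the split.

```python
def aggregate_rate(shares, rates):
    """Overall rate as the share-weighted average of segment rates."""
    return sum(shares[s] * rates[s] for s in shares)


def decompose_change(shares_before, rates_before, shares_after, rates_after):
    """Split the change in the aggregate rate into two parts that sum to the total.

    rate_effect: change from within-segment rates, holding the old mix fixed.
    mix_effect:  change from the shift in mix, valued at the new rates.
    """
    rate_effect = sum(
        shares_before[s] * (rates_after[s] - rates_before[s]) for s in shares_before
    )
    mix_effect = sum(
        (shares_after[s] - shares_before[s]) * rates_after[s] for s in shares_before
    )
    return rate_effect, mix_effect


# Hypothetical traffic shares and conversion rates, invented for illustration.
shares_before = {"mobile": 0.4, "desktop": 0.6}
rates_before = {"mobile": 0.020, "desktop": 0.050}
shares_after = {"mobile": 0.8, "desktop": 0.2}
rates_after = {"mobile": 0.025, "desktop": 0.055}

before = aggregate_rate(shares_before, rates_before)
after = aggregate_rate(shares_after, rates_after)
rate_effect, mix_effect = decompose_change(
    shares_before, rates_before, shares_after, rates_after
)

print(f"aggregate: {before:.1%} -> {after:.1%}")   # aggregate: 3.8% -> 3.1%
print(f"rate effect: {rate_effect * 100:+.1f} pp")  # rate effect: +0.5 pp
print(f"mix effect:  {mix_effect * 100:+.1f} pp")   # mix effect:  -1.2 pp
```

Here both channels improve by half a point, but the shift toward mobile costs 1.2 points, so the aggregate falls by 0.7 points.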
Section 2
How to See It
Simpson's paradox appears when an aggregate trend reverses within subgroups. Look for "overall X is higher for A than B, but within every segment B is higher than A" — or the reverse. The diagnostic: when you slice by a plausible confound (segment, channel, cohort), does the direction of the comparison flip?
Business
You're seeing Simpson's Paradox when overall conversion falls month-over-month even though conversion improved in every segment (e.g. by device, region, source). The mix of traffic shifted toward segments with lower conversion — e.g. more mobile traffic. The aggregate hides the within-segment improvement. Slice by segment to see the real trend.
Technology
You're seeing Simpson's Paradox when a new algorithm shows worse overall engagement than the old one, but better engagement in every user cohort (new users, power users, etc.). The confound is that the new algorithm attracts or retains a different mix of users — e.g. more light users — so the aggregate comparison is misleading. Evaluate within cohort.
Investing
You're seeing Simpson's paradox when a fund's overall return is lower than the index even though the fund's holdings beat the index in every sector. The fund may be overweight sectors that did poorly; the sector mix, not stock selection, drives the aggregate underperformance. Segment by sector to judge selection.
Markets
You're seeing Simpson's Paradox when a policy or treatment appears to reduce an outcome overall but increase it in every demographic or region. The confound is usually composition: the treated group has a different mix. Policy evaluation requires within-group comparison or explicit adjustment for the confound.
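The diagnostic described in this section (does the direction of the comparison flip when you slice?) can be sketched as a small check. The function name, the data layout, and the engagement counts below are all hypothetical; the example mirrors the new-versus-old algorithm case above.

```python
def sign(x):
    """Return +1, -1, or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def check_reversal(counts, a, b):
    """Compare group `a` with group `b` overall and within each segment.

    counts maps segment -> group -> (successes, trials).
    Each returned sign is +1 if `a` has the higher rate, -1 if `b` does, 0 if tied.
    reversed_flag is True only when every segment points the opposite way
    from the aggregate.
    """
    totals = {a: [0, 0], b: [0, 0]}
    within = {}
    for segment, groups in counts.items():
        rates = {}
        for g in (a, b):
            successes, trials = groups[g]
            rates[g] = successes / trials
            totals[g][0] += successes
            totals[g][1] += trials
        within[segment] = sign(rates[a] - rates[b])
    aggregate = sign(totals[a][0] / totals[a][1] - totals[b][0] / totals[b][1])
    reversed_flag = aggregate != 0 and all(s == -aggregate for s in within.values())
    return aggregate, within, reversed_flag


# Hypothetical engaged-user counts per cohort: (engaged users, total users).
engagement = {
    "power_users": {"old": (160, 200), "new": (85, 100)},
    "light_users": {"old": (20, 100), "new": (75, 300)},
}

aggregate, within, flipped = check_reversal(engagement, "new", "old")
print(aggregate, within, flipped)
# -1 {'power_users': 1, 'light_users': 1} True
```

The check only flags a full reversal. A comparison that flips in some segments but not others is not Simpson's paradox in the strict sense, though it still deserves a segment-level report.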
Section 3
How to Use It
Decision filter
"When you see a trend or comparison (A vs B), ask: could a confound — a variable that differs across groups and affects the outcome — reverse the result within subgroups? Slice by segment, channel, or cohort. If the direction flips, report by segment and name the confound. Don't act on the aggregate alone."
As a founder
Slice metrics by segment before drawing conclusions. If overall conversion is down, check conversion by channel, cohort, and product. If it's up in every slice, the issue is mix shift: you're adding users or traffic from lower-converting segments. Fix the mix or fix the segment economics; don't optimise the wrong lever. When presenting to the board, show both the aggregate and the segment views so Simpson's paradox doesn't mislead.
As an investor
When a company reports a metric that moved the wrong way, ask for a segment breakdown. Is the trend consistent within segments, or does mix explain it? Portfolio-level returns can hide Simpson's paradox: the portfolio might underperform while every holding outperforms its peer group if the allocation is tilted toward underperforming segments.
As a decision-maker
Before acting on a comparison (this group vs that, this period vs that), check for confounds. Slice by the obvious candidates: segment, region, product, cohort. If the comparison reverses within slices, the aggregate is misleading. Decide based on the level that matches your question — segment-level fairness vs aggregate outcome — and state the confound explicitly.
Common misapplication: Treating the aggregate as the truth. When Simpson's paradox holds, the aggregate and the within-group results conflict, and the right answer depends on the question. "Are we biased by department?" → look within department. "What is the overall admission rate by gender?" → the aggregate is correct but confounded by application mix. Specify the question, then choose the level of analysis.
Second misapplication: Ignoring mix shift. Many "our metric went down" stories are mix shift: you're adding volume in a segment with lower conversion, LTV, or margin. When every segment is flat or improving and the total still falls, that is Simpson's paradox in growth form. Segment so you see whether you're improving within segment or just diluting with a different mix.
Bezos emphasised "disaggregated metrics" — looking at segments rather than only totals. Amazon's culture of slicing by product, geography, and cohort reduces the risk of Simpson's paradox: a drop in aggregate conversion triggers a segment-level check. Mix shift (e.g. international growth with lower conversion) is understood as composition, not universal decline.
Netflix evaluates content and product by cohort and region. Overall engagement can move because of mix (more users in lower-engagement regions) rather than because every segment changed. Hastings has pushed for segment-level accountability so that teams don't hide behind aggregate numbers that are confounded by mix.
Section 6
Visual Explanation
Simpson's paradox: aggregate trend (e.g. A > B) reverses within subgroups (B > A in every segment). Cause: confound that differs across groups. Slice by segment to see the real relationship.
Section 7
Connected Models
Simpson's paradox sits at the intersection of confounding, correlation vs causation, and segmentation. These models either explain the paradox or help avoid it.
Reinforces
Confounding Factor
A confounding factor is a variable that influences both the explanatory variable and the outcome. Simpson's paradox is the dramatic case: the confound differs across groups and reverses the aggregate comparison. Controlling for the confound (slicing by it) reveals the real relationship. Ordinary confounding biases the size of a comparison; Simpson's paradox is confounding strong enough to flip its sign.
Reinforces
Segmentation
Segmentation is splitting the data into meaningful groups. Simpson's paradox is detected and resolved by segmenting: when you slice by the right variable, the paradox appears or disappears. Good segmentation is the antidote to aggregate illusion.
Tension
Correlation vs Causation
Correlation can reverse when you control for a confound — that is Simpson's paradox. The tension: the aggregate correlation may not reflect causation in any segment. Establishing causation requires controlling for confounds; Simpson's paradox is a warning that aggregate correlation can be misleading.
Tension
Selection Bias
Selection bias is when the sample is not representative. Simpson's paradox can look like selection: the "groups" (e.g. treated vs control) have different composition. The tension: sometimes the paradox is due to a confound you can measure and slice by; sometimes it's selection into the sample. Both require careful interpretation of aggregate vs within-group results.
Section 8
One Key Quote
"It is possible for a set of data to show a trend in a given direction when separated into groups, and the opposite trend when combined."
— Edward Simpson, 1951
The definition is the paradox. One dataset, two valid readings — and they point opposite ways. The takeaway: always ask whether the trend holds within subgroups. If it doesn't, the aggregate is confounded. Report both and name the confound.
Section 9
Analyst's Take
Faster Than Normal — Editorial View
Slice before you conclude. Any time a key metric moves, slice by segment, channel, and cohort. If the trend is positive in every slice but negative overall, you have mix shift — Simpson's paradox. Fix the mix or fix the segment; don't optimise the wrong thing.
Name the confound. When you present segment-level results that differ from the aggregate, state what's driving the reversal. "Overall conversion is down because mobile share increased and mobile converts lower." That sentence tells the reader you're not hiding behind the aggregate.
Match the level to the question. "Are we biased in admissions?" → look within department. "What's our overall gender admission rate?" → aggregate, but then explain mix. The paradox doesn't tell you which level is "right"; it tells you they can conflict. Choose the level that answers the question you care about.
Watch for it in A/B tests. If the overall treatment effect is zero or negative but positive in every segment, check for a confound (e.g. segment mix differed between arms). Pre-stratify or analyse within segment so the paradox doesn't hide a real effect.
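As a rough sketch of "analyse within segment", the code below compares a pooled treatment effect with a stratified one that weights each segment's effect by that segment's share of all users. The counts and function names are hypothetical, and the arms are deliberately imbalanced to show the problem; in a properly randomised test the segment mix should be similar across arms, so an imbalance this large would itself be a warning sign.

```python
# Hypothetical A/B counts: segment -> arm -> (conversions, users).
ab_counts = {
    "new_users": {"control": (10, 100), "treatment": (36, 300)},
    "returning_users": {"control": (90, 300), "treatment": (33, 100)},
}


def pooled_effect(counts):
    """Treatment minus control conversion rate, ignoring segments."""
    conversions = {"control": 0, "treatment": 0}
    users = {"control": 0, "treatment": 0}
    for arms in counts.values():
        for arm, (c, n) in arms.items():
            conversions[arm] += c
            users[arm] += n
    return (
        conversions["treatment"] / users["treatment"]
        - conversions["control"] / users["control"]
    )


def stratified_effect(counts):
    """Average of within-segment effects, weighted by segment share of all users."""
    total_users = sum(n for arms in counts.values() for _, n in arms.values())
    effect = 0.0
    for arms in counts.values():
        segment_users = sum(n for _, n in arms.values())
        c_conv, c_users = arms["control"]
        t_conv, t_users = arms["treatment"]
        within = t_conv / t_users - c_conv / c_users
        effect += (segment_users / total_users) * within
    return effect


naive = pooled_effect(ab_counts)
adjusted = stratified_effect(ab_counts)
print(f"pooled effect:     {naive * 100:+.2f} pp")     # pooled effect:     -7.75 pp
print(f"stratified effect: {adjusted * 100:+.2f} pp")  # stratified effect: +2.50 pp
```

The pooled number is negative only because the treatment arm contains far more new users, who convert less in both arms. This sketch reports point estimates only; it says nothing about statistical significance.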
Section 10
Test Yourself
Is this mental model at work here?
Scenario 1
Overall conversion rate falls from 4% to 3.4%. When the team slices by device, conversion is up on both mobile (2% to 2.2%) and desktop (6% to 6.2%). Traffic mix shifted from 50% mobile to 70% mobile.
Scenario 2
A university's overall admission rate is lower for women than men. Within every department, women's admission rate is equal or higher than men's. Women apply more to the most selective departments.
Scenario 3
A company reports that ARPU is down 5% year-over-year. They do not break out ARPU by segment or cohort.
Scenario 4
A team runs an A/B test. Overall treatment has no significant effect. Within each of three user segments, treatment is positive and significant. The team concludes the treatment works and ships it.
Section 11
Further Reading
Simpson's paradox is a staple of statistics and causal inference. These sources cover the math, examples, and how to avoid misinterpretation.
Pearl on causal inference. Simpson's paradox appears in the context of confounding and the need to control for the right variables. Accessible treatment of when and why conditioning reverses associations.
McElreath's Bayesian treatment of regression and confounding. Simpson's paradox appears when you add a variable and a coefficient flips. The book shows how to think about and model confounds.
Summary: Simpson's paradox is the reversal of a trend when data are aggregated versus when split into subgroups. A confound (a variable that differs across groups and affects the outcome) drives the reversal. Slice by segment to see the real relationship; report both aggregate and segment-level results and name the confound. Don't act on the aggregate alone when the within-group trend points the other way.
Leads-to
Regression Analysis
Regression analysis controls for multiple variables at once. When you add the confound as a covariate, the coefficient for the main variable can flip — the regression analogue of Simpson's paradox. Multiple regression is the formal way to "slice" by many confounds simultaneously.
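The coefficient flip can be shown on a tiny made-up dataset without any regression library. The sketch assumes a single binary confound; in that case, fitting a slope after subtracting each group's mean from x and y gives the same coefficient on x as a regression that includes a group indicator. The data and function names are hypothetical.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least squares slope of y on x (with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_controlling_for_group(xs, ys, groups):
    """Slope of y on x after removing group means from both variables.

    For a categorical covariate this matches the coefficient on x in a
    regression of y on x plus group indicators.
    """
    x_means = {g: mean(x for x, gg in zip(xs, groups) if gg == g) for g in set(groups)}
    y_means = {g: mean(y for y, gg in zip(ys, groups) if gg == g) for g in set(groups)}
    x_resid = [x - x_means[g] for x, g in zip(xs, groups)]
    y_resid = [y - y_means[g] for y, g in zip(ys, groups)]
    return slope(x_resid, y_resid)


# Made-up data: within each group y falls by 1 per unit of x,
# but group "b" has both higher x and a higher baseline y.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
ys = [2, 1, 0, -1, 6, 5, 4, 3]

pooled = slope(xs, ys)
adjusted = slope_controlling_for_group(xs, ys, groups)
print(f"slope ignoring group:        {pooled:.3f}")    # slope ignoring group:        0.524
print(f"slope controlling for group: {adjusted:.3f}")  # slope controlling for group: -1.000
```

Ignoring the group, x appears to raise y; controlling for it, x lowers y. With several confounds or continuous covariates you would reach for a proper multiple regression rather than this hand-rolled version.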
Leads-to
Berkson's Paradox
Berkson's paradox is a selection-induced distortion: in a selected sample, two variables can be negatively correlated even when they are unrelated or positively correlated in the population. Like Simpson's paradox, it arises from conditioning, here on whatever determined entry into the sample. Different mechanism, same lesson: conditioning changes relationships.
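A small deterministic sketch of the selection effect follows. It uses a simplified setup in which the two variables are exactly uncorrelated in the full population, rather than positively correlated. The 0 to 9 scores and the admission rule (total score of at least 12) are assumptions chosen for illustration.

```python
from math import sqrt


def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Population: every combination of two scores from 0 to 9 appears once,
# so the scores are uncorrelated by construction.
population = [(x, y) for x in range(10) for y in range(10)]

# Selection rule (assumed for illustration): only cases with a combined
# score of at least 12 make it into the sample.
selected = [(x, y) for x, y in population if x + y >= 12]

pop_corr = correlation(population)
sel_corr = correlation(selected)
print(f"population correlation: {pop_corr:.3f}")
print(f"selected correlation:   {sel_corr:.3f}")
```

Within the selected sample a low score on one variable has to be offset by a high score on the other, which is what produces the negative correlation.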