A confounding factor (confounder) is a variable that influences both the supposed cause and the supposed effect, creating a spurious association between them. You see that X and Y are correlated and infer that X causes Y — but a third variable Z causes both. Ice cream sales and drownings rise together in summer; the confounder is temperature. Without controlling for Z, you misattribute cause.
The cost is wrong inference and wrong decisions. You invest in a "cause" that doesn't work. You copy a "best practice" that succeeded for other reasons. You scale a feature that looked like the driver of retention when it was just correlated with a segment that was already sticky. Confounding is why correlation is not causation: the correlation can be entirely explained by a common cause.
The discipline is to ask: what could cause both X and Y? Then control for it (regression, stratification, matching) or break the link (randomisation). Randomised experiments assign treatment at random so that confounders, measured or not, are balanced across groups on average; that is why they are the gold standard for causal claims. In observational settings, you try to measure and adjust for confounders, knowing you may miss some (unmeasured confounding).
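The common-cause structure can be sketched with a small simulation. This is an illustration under made-up assumptions, not real data: temperature (Z) drives both ice cream sales (X) and drownings (Y), X has no effect on Y at all, and every variable is standardised noise. The names and the 0.1 band width are choices made for this sketch.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Z: the confounder. X and Y each depend on Z plus independent noise.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [z + rng.gauss(0, 1) for z in temperature]   # X
drownings = [z + rng.gauss(0, 1) for z in temperature]   # Y (no X term)

# Association in the raw data: clearly positive, although X never affects Y.
raw_corr = pearson(ice_cream, drownings)

# Hold Z roughly constant by keeping only days in a narrow temperature band.
band = [(x, y) for z, x, y in zip(temperature, ice_cream, drownings)
        if abs(z) < 0.1]
band_corr = pearson([x for x, _ in band], [y for _, y in band])

print(f"raw correlation:       {raw_corr:.2f}")
print(f"within one Z band:     {band_corr:.2f}")
```

Under these assumptions the raw correlation is substantial while the correlation within a narrow band of Z is close to zero, which is what "the correlation can be entirely explained by a common cause" looks like in numbers.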
Section 2
How to See It
You see confounding when an observed association might be explained by a third variable that affects both the cause and the effect. The diagnostic: "X and Y are related — but could Z drive both?" When a success story ignores segment, timing, or context, confounding is a candidate explanation.
Business
You're seeing Confounding Factor when a team attributes a lift in retention to a new onboarding flow. The confounder: they launched the flow to a segment that was already higher-retention (e.g. product-qualified leads). Without a controlled experiment or adjustment for segment, the "effect" may be selection, not cause.
Technology
You're seeing Confounding Factor when a model shows that "users who get push notifications convert more." The confounder: users who enable push may be more engaged to begin with. The association is real; the causal claim (push → conversion) may be wrong. You need a test that randomises who gets push.
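A minimal simulation of the push example, under hypothetical numbers: engagement raises both the chance of opting in to push and the chance of converting, and push itself is given zero true effect. The opt-in rates, conversion rates, and function names are assumptions made for this sketch.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
N_USERS = 20000

def conversion_gap(assign_push):
    """Conversion rate of push users minus conversion rate of non-push users."""
    converted = {True: [], False: []}
    for _ in range(N_USERS):
        engaged = rng.random() < 0.5          # the confounder
        has_push = assign_push(engaged)
        # Conversion depends on engagement only: push has no effect here.
        p_convert = 0.30 if engaged else 0.10
        converted[has_push].append(rng.random() < p_convert)
    rate_push = sum(converted[True]) / len(converted[True])
    rate_no_push = sum(converted[False]) / len(converted[False])
    return rate_push - rate_no_push

def opt_in(engaged):
    """Observational world: engaged users enable push far more often."""
    return rng.random() < (0.8 if engaged else 0.2)

def coin_flip(engaged):
    """Experiment: assignment ignores engagement entirely."""
    return rng.random() < 0.5

observed_gap = conversion_gap(opt_in)
randomised_gap = conversion_gap(coin_flip)

print(f"gap when users self-select: {observed_gap:+.3f}")
print(f"gap when push is randomised: {randomised_gap:+.3f}")
```

With self-selection the push group converts noticeably better even though push does nothing in this setup; with coin-flip assignment the gap shrinks to sampling noise around zero.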
Investing
You're seeing Confounding Factor when a study says "CEOs who wake at 5am outperform." The confounder could be industry, company size, or survivorship — early risers may be in different kinds of firms. Without controlling for those, you can't conclude that wake time causes performance.
Markets
You're seeing Confounding Factor when a macro variable (e.g. rates) is said to "drive" asset returns. Often a third factor — growth, risk appetite, or policy regime — influences both rates and returns. The observed correlation may be confounded; the causal story may be wrong.
Section 3
How to Use It
Decision filter
"When you see X associated with Y, ask: what could cause both? List confounders. Then control for them (data) or break the link (randomise). Don't infer cause from correlation until you've considered and, where possible, ruled out confounding."
As a founder
Before you scale a "win," check for confounding. Did the segment, timing, or channel explain the result? Run A/B tests so treatment is randomised and confounders are balanced. When you only have observational data, name the likely confounders and try to control (e.g. compare within segment or use regression). The mistake is attributing success to the thing you changed when something else (who got it, when, where) drove the outcome.
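Comparing within segment can be sketched with a few counts. The numbers below are invented for illustration: the new flow went mostly to a high-retention segment ("pql"), and within each segment the two flows retain at the same rate.

```python
# Hypothetical counts: (segment, flow) -> (users, retained users)
counts = {
    ("pql", "new"): (800, 480),
    ("pql", "old"): (200, 120),
    ("other", "new"): (200, 60),
    ("other", "old"): (800, 240),
}

def retention(flow, segment=None):
    """Retention rate for a flow, pooled or restricted to one segment."""
    users = retained = 0
    for (seg, fl), (u, r) in counts.items():
        if fl == flow and (segment is None or seg == segment):
            users += u
            retained += r
    return retained / users

# Pooled comparison: mixes the flow with who received it.
naive_lift = retention("new") - retention("old")

# Stratified comparison: same segment on both sides of the subtraction.
lift_by_segment = {
    seg: retention("new", seg) - retention("old", seg)
    for seg in ("pql", "other")
}

print(f"naive lift: {naive_lift:+.2f}")
for seg, lift in lift_by_segment.items():
    print(f"lift within {seg}: {lift:+.2f}")
```

In this made-up table the pooled lift is about 18 points while the lift inside each segment is zero, so the apparent win comes entirely from who got the new flow.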
As an investor
When a company attributes growth or retention to a specific lever, ask: was it randomised? If not, what could confound the association? Segment, cohort, or market conditions often do. Due diligence should include "how do you know it was cause?" — and the answer should involve design (experiment) or adjustment (control for confounders), not just correlation.
As a decision-maker
When you act on a correlation (e.g. "companies that do X perform better"), list confounders. Do they do X because they're already successful? Are they in a different market? Control or stratify before you conclude that X causes performance. Decisions based on confounded associations are systematically biased, not just noisy.
Common misapplication: Assuming that controlling for one confounder is enough. There may be many; unmeasured confounding can remain. Second misapplication: Dismissing every correlation as confounded. Some associations are causal. The move is to think through confounders and test or control, not to give up on causal inference.
Netflix's culture of A/B testing and experimentation is designed to avoid confounding: by randomising who gets a feature, they balance confounders (segment, engagement, geography) across treatment and control. Hastings has emphasised that intuition and correlation are not enough — you need experiments to infer cause. That is confounding awareness in practice.
Amazon runs thousands of experiments; the default is to test, not to infer from observational data. Bezos's insistence on "disagree and commit" and on data-driven decisions includes the expectation that causal claims are backed by experiments or by explicit control for confounders. Avoiding confounded inference is part of scaling decisions correctly.
Section 6
Visual Explanation
Confounding: Z causes both X and Y, so X and Y are correlated without X causing Y. Control for Z or randomise X to infer cause.
Section 7
Connected Models
Confounding sits with models about cause, evidence, and bias. The entries below show what reinforces it, what creates tension, and what it leads to.
Reinforces
Correlation vs Causation
Correlation does not imply causation because confounding (along with reverse causation and selection) can produce correlation. Confounding is one of the main reasons we need more than association to infer cause. The two models are paired: correlation is observed; confounding explains why it can be misleading.
Reinforces
Randomized Controlled Experiment
Randomisation assigns treatment independently of confounders, so on average the groups are comparable. That is why RCTs are the gold standard for causal inference: given a large enough sample, they remove confounding by design rather than by adjustment. Confounding is what you're trying to avoid; RCTs are how you avoid it.
Tension
Selection Bias
Selection bias is when who is in the sample (or who gets the treatment) is related to the outcome. It can look like confounding: a third factor (selection) drives both "treatment" and outcome. The tension: both produce non-causal association; the fix may be design (randomise, include all) or analysis (control, weight).
Tension
Simpson's Paradox
Simpson's paradox is when a trend reverses when you stratify by a third variable. That variable is often a confounder: it's associated with both X and Y, and ignoring it gives the wrong sign or magnitude. Controlling for the confounder can reverse the conclusion — so confounding and Simpson's paradox are closely linked.
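A small numerical sketch of the reversal, using illustrative counts rather than real data. Option A is mostly used on hard cases and option B mostly on easy ones, so case difficulty (the stratum) is tied to both the choice of option and the outcome.

```python
# Illustrative counts: stratum -> option -> (successes, trials)
data = {
    "easy": {"A": (90, 100), "B": (800, 1000)},
    "hard": {"A": (500, 1000), "B": (40, 100)},
}

def rate(option, strata):
    """Success rate of an option over the given strata."""
    wins = sum(data[s][option][0] for s in strata)
    trials = sum(data[s][option][1] for s in strata)
    return wins / trials

# Pooled over all cases, B looks better than A.
pooled_gap = rate("A", data) - rate("B", data)

# Within each stratum, A is better than B.
stratum_gaps = {s: rate("A", [s]) - rate("B", [s]) for s in data}

print(f"pooled A minus B: {pooled_gap:+.2f}")
for s, gap in stratum_gaps.items():
    print(f"{s} A minus B: {gap:+.2f}")
```

With these counts the pooled gap is negative while both stratum gaps are positive, so the sign of the conclusion depends on whether the third variable is taken into account.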
Section 8
One Key Quote
"If you torture the data long enough, it will confess to anything. But the confession may be confounded."
— Ronald Coase (paraphrased)
The point: association can be found between almost any X and Y if you don't control for confounders. The "confession" is the correlation; the torture is the search for a pattern. Without a design or controls that address confounding, the confession is not causal evidence.
Section 9
Analyst's Take
Faster Than Normal — Editorial View
List confounders before you conclude cause. When you see X and Y move together, write down: what could cause both? Segment, time period, geography, prior behaviour — any Z that influences both. Then control for Z or run an experiment. If you can't, state the limitation: "we see association; we don't know if it's causal."
Prefer experiments when you can. Randomisation balances confounders (measured and unmeasured) across groups. That is the cleanest way to avoid confounding. When you can't randomise, control for what you can and be explicit about what you can't — unmeasured confounding remains a threat.
Don't trust "we did X and then Y happened." Post hoc stories are especially prone to confounding: you chose when and where to do X, and that choice may be correlated with Y for other reasons. To infer cause, you have to break that link (randomise) or model it (control). Narrative is not evidence.
Section 10
Test Yourself
Is this mental model at work here?
Scenario 1
Companies with more female board members have higher returns. A report concludes that adding women causes better performance.
Scenario 2
A team runs an A/B test: 50% get the new onboarding, 50% don't. Assignment is random. They find a significant lift in retention for the new flow.
Scenario 3
Users who use the 'recommendations' feature have higher LTV. The team concludes the feature drives LTV and invests in improving it.
Scenario 4
A regression of sales on ad spend includes region, season, and segment. The coefficient on ad spend is positive and significant.
Section 11
Summary & Further Reading
Summary: A confounding factor is a variable that causes both the supposed cause and the effect, producing a non-causal association. To infer cause, control for confounders (regression, stratification) or break the link (randomise). Don't infer causation from correlation without considering and addressing confounding.
Practical guide to RCTs as the way to avoid confounding when evaluating interventions and policies.
Leads-to
Regression Analysis
Regression is a tool to control for confounders: include Z as a covariate and the coefficient on X is the association of X with Y holding Z constant. That removes (linear) confounding by Z. Multiple regression extends to many confounders. The limit is unmeasured confounders.
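A sketch of regression adjustment on simulated data, using only the standard library. The setup is assumed for illustration: the true effect of X on Y is 1.0, and Z pushes both X and Y. Instead of fitting a multiple regression directly, the sketch strips Z out of X and out of Y and regresses the leftovers on each other; by the Frisch-Waugh-Lovell result this gives the same coefficient on X as a regression of Y on X and Z.

```python
import random

def slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """The part of y that a straight-line fit on x does not explain."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(2)  # fixed seed so the sketch is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]                 # confounder
x = [zi + rng.gauss(0, 1) for zi in z]                  # Z raises X
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1)              # true X effect: 1.0
     for xi, zi in zip(x, z)]

# Ignoring Z: the slope absorbs part of Z's effect on Y.
naive_coef = slope(x, y)

# Holding Z constant: regress what is left of Y on what is left of X.
adjusted_coef = slope(residuals(z, x), residuals(z, y))

print(f"coefficient on X, Z ignored:  {naive_coef:.2f}")
print(f"coefficient on X, Z adjusted: {adjusted_coef:.2f}")
```

Under these assumptions the unadjusted coefficient lands well above the true value and the adjusted one lands near 1.0. The adjustment only works because Z is measured here; an unmeasured confounder would leave the bias in place.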
Leads-to
Ceteris Paribus
Ceteris paribus ("all else equal") is the thought experiment of varying one factor while holding others fixed. Controlling for confounders is the empirical version: you hold Z constant (by stratification or regression) to see the effect of X on Y. Confounding is what violates ceteris paribus when you don't control.