The replication crisis is the discovery that many published research findings do not hold when other researchers try to repeat the same study. In psychology, medicine, economics, and other fields, high-profile results have failed to replicate. The causes include publication bias (negative results don't get published), p-hacking (researchers try many analyses until one is "significant"), small samples (underpowered studies produce fragile results), and incentive misalignment (careers reward novel, positive findings). The crisis doesn't mean all science is wrong — it means that a single study, especially a surprising one, should be treated as provisional until replicated.
The mental model applies beyond academia. In business and investing, "studies show" and "research says" often rest on single papers or internal analyses that were never replicated. The discipline is to treat first findings as hypotheses, not truths; to prefer replicated and pre-registered work; and to be suspicious of results that are too good, too neat, or from small samples. When you base decisions on evidence, ask: has this been replicated? Was the analysis pre-registered? What's the base rate for replication in this domain?
Section 2
How to See It
The need for replication-crisis thinking shows up when someone treats a single finding as definitive; the thinking itself shows up when someone asks "has this been replicated?" The diagnostic: are we relying on one study, one experiment, or one internal analysis? Look for claims that cite a single paper, a single A/B test, or a single backtest. When the finding is surprising or convenient, extra scepticism is warranted. The pattern: first result → excitement → replication attempt → often failure. The discipline is to wait for replication or to treat the first result as a hypothesis to be tested.
Business
You're seeing Replication Crisis when a company bases a product or strategy decision on one successful A/B test or one cohort analysis. The result might be real — or it might be noise, p-hacking, or a fluke. Without replication (run the test again, or in another segment), the finding is provisional. The same applies to "best practices" drawn from a single case study. One company's success might be replicable; it might not. The crisis mindset: treat single findings as hypotheses.
Technology
You're seeing Replication Crisis when a model or algorithm is validated on one dataset or one time period and then deployed. Out-of-sample and out-of-time validation are forms of replication. When a quant strategy or ML model hasn't been tested on holdout data or new periods, it may not replicate. The crisis in ML and data science is the same: many published or internal results don't generalise. Replication means testing on new data and new contexts.
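A minimal sketch of why out-of-sample testing matters, assuming a toy setup that is not from the original text: 200 pure-noise "signals" are scored against a pure-noise target, the best one is picked on the first half of the data, and the same signal is then scored on the second half. The function names, sample sizes, and seed are illustrative assumptions.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def best_signal_in_and_out(n_signals=200, n_train=50, n_test=50, seed=0):
    """Pick the best of many noise signals in sample, then score it out of sample."""
    rng = random.Random(seed)
    n = n_train + n_test
    # Target and signals are independent noise, so any "edge" is a fluke.
    target = [rng.gauss(0, 1) for _ in range(n)]
    signals = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_signals)]

    # "Research" phase: strongest absolute correlation on the first n_train points.
    def in_sample(sig):
        return abs(pearson(sig[:n_train], target[:n_train]))

    best = max(signals, key=in_sample)

    # "Replication" phase: score the selected signal on data it was not chosen on.
    out_of_sample = abs(pearson(best[n_train:], target[n_train:]))
    return in_sample(best), out_of_sample


in_corr, out_corr = best_signal_in_and_out()
print(f"in-sample |r| = {in_corr:.2f}, out-of-sample |r| = {out_corr:.2f}")
```

Because the winner was selected for its in-sample score, that score is biased upward even though nothing real is there. The holdout score is an unbiased check, which is the sense in which out-of-sample validation acts as a replication.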
Investing
You're seeing Replication Crisis when an investor or analyst cites "research shows" or a single backtest to support a strategy. Backtests can be overfit; single studies can be wrong. The discipline is to ask: has this been replicated in other markets, periods, or by other researchers? What's the track record of this type of finding? The replication crisis in finance is well documented — many published factor and strategy results don't hold up out of sample.
Markets
You're seeing Replication Crisis when policy or regulation is based on a handful of studies that haven't been replicated. Evidence-based policy is good; policy based on unreplicated findings is risky. The same applies to industry standards or benchmarks that rest on single studies. The mental model: prefer meta-analyses and replicated work; treat single studies as suggestive, not conclusive.
Section 3
How to Use It
Decision filter
"Treat single findings as provisional. Prefer replicated, pre-registered, or out-of-sample evidence. When someone says 'research shows' or 'the data says,' ask: how many studies? Were they replicated? Was the analysis pre-registered? Adjust your confidence and your decisions accordingly."
As a founder
When you run experiments or analyses, don't treat the first significant result as the truth. Replicate: run the test again, or in another segment or period. Pre-register analyses where possible so you're not p-hacking. When you cite external research to support a decision, check whether the finding has been replicated; in psychology and medicine, many headline results have failed to replicate. The first mistake is building strategy on a single study or a single internal test. The second mistake is ignoring negative replications because they're less exciting. Update your beliefs when replication fails.
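One way to make "run the test again" concrete is to apply the same pre-chosen test to both the original A/B test and the rerun, and count the result as replicated only if both agree. The sketch below uses a standard pooled two-proportion z-test; the conversion counts, the 0.05 threshold, and the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replicated(first, rerun, alpha=0.05):
    """True only if both runs are significant and the lift has the same sign."""
    def lift(run):
        conv_a, n_a, conv_b, n_b = run
        return conv_b / n_b - conv_a / n_a

    both_significant = all(
        two_proportion_p_value(*run) < alpha for run in (first, rerun)
    )
    same_direction = lift(first) * lift(rerun) > 0
    return both_significant and same_direction


# Hypothetical counts: (control conversions, control n, variant conversions, variant n)
first_test = (100, 1000, 130, 1000)
rerun_test = (105, 1000, 110, 1000)

print(two_proportion_p_value(*first_test))  # below 0.05 for these counts
print(two_proportion_p_value(*rerun_test))  # well above 0.05 for these counts
print(replicated(first_test, rerun_test))   # False: the rerun did not confirm the lift
```

Requiring the same sign as well as significance guards against counting a rerun that is "significant" in the opposite direction as a confirmation.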
As an investor
Portfolio companies and research often present single-study or single-backtest evidence. Ask: has this been replicated? For strategies and factors, what's the out-of-sample record? For product and growth claims, were the experiments run more than once? The replication crisis in asset pricing and factor investing means many published alphas don't hold. Apply the same scepticism to internal and external research. Prefer evidence that has been stress-tested across time, markets, or teams.
As a decision-maker
When evidence is presented to support a decision, grade it. Single study, no replication, surprising result? Low confidence. Replicated, pre-registered, or consistent across contexts? Higher confidence. Don't let a striking finding override the base rate — in many fields, most first findings don't replicate. Build a habit of asking "has this been replicated?" and of treating first findings as hypotheses. That reduces the chance of betting on false positives.
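Grading evidence can be given rough numbers. The sketch below applies Bayes' rule to a "significant" result: given a prior probability that the hypothesis is true, the study's power, and the false-positive rate, it returns the chance the finding is real. It deliberately ignores bias and p-hacking, which would lower the numbers further, and the priors and power values are illustrative assumptions rather than estimates for any real field.

```python
def prob_true_given_significant(prior, power, alpha=0.05):
    """Probability a significant finding is true, by Bayes' rule (no bias term)."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# A surprising hypothesis (assumed 10% prior) tested with a well-powered study.
well_powered = prob_true_given_significant(prior=0.10, power=0.80)

# The same hypothesis tested with a small, underpowered study.
underpowered = prob_true_given_significant(prior=0.10, power=0.20)

# An independent, well-powered replication that also comes out significant:
# the first study's posterior becomes the prior for the second.
after_replication = prob_true_given_significant(prior=well_powered, power=0.80)

print(round(well_powered, 2))       # 0.64
print(round(underpowered, 2))       # 0.31
print(round(after_replication, 2))  # 0.97
```

Under these assumptions a single significant result for a surprising claim is closer to a coin flip than a fact, and one successful independent replication does most of the work of moving it to high confidence.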
Common misapplication: Dismissing all research as unreliable. The replication crisis doesn't mean nothing replicates — it means we should distinguish between single findings and replicated bodies of work. Use the crisis to calibrate confidence, not to reject evidence altogether.
Second misapplication: Demanding replication for every small decision. Some decisions are low-stakes or time-sensitive; you'll often act on the best available evidence. Reserve strong replication standards for high-stakes, repeatable decisions. The model is a calibration tool, not a veto on all single-study evidence.
Section 4
The Mechanism
Section 5
Founders & Leaders in Action
Charlie Munger, Vice Chairman, Berkshire Hathaway, 1978–2023
Munger long warned about incentive-caused bias and the reliability of reported results. His point: when people are rewarded for certain outcomes, they'll produce them, including in research and analysis. The replication crisis is incentive-caused bias in academia: careers reward novel, positive findings, so the literature is skewed. Munger's discipline was to ask "what are the incentives?" and to treat single findings with scepticism when the incentives favour positive or surprising results. The same applies to business: don't trust one backtest or one case study without asking whether it could be selection or gaming.
The edge of Jim Simons's Renaissance Technologies depends on strategies that replicate out of sample and across time. Simons emphasised that in quantitative investing, most ideas don't work when tested rigorously: they're overfit or flukes. The replication standard is built in: if a signal doesn't hold on holdout data or new periods, it's discarded. The replication crisis in finance, in which many published factors and alphas fail out of sample, is why firms like Renaissance treat in-sample results as provisional. The discipline: test on new data; don't deploy until it replicates.
Section 6
Visual Explanation
Replication Crisis: Many first findings fail when repeated. Treat single studies as provisional; prefer replicated, pre-registered, or out-of-sample evidence.
Section 7
Connected Models
The replication crisis sits with models of evidence, bias, and inference. The connections below either describe the same problem (publication bias, p-hacking), the mindset that amplifies it (confirmation bias), or the tools that address it (RCT, scientific method, significance).
Reinforces
Scientific Method
The scientific method is hypothesis, test, revise. Replication is the "test" that separates real findings from artefact. When we skip replication or don't value it, we weaken the method. The replication crisis is a reminder that the method requires replication and that incentives have to reward it.
Reinforces
Publication Bias
Publication bias is the tendency to publish positive results and not negative or null results. It's a direct cause of the replication crisis: the published literature is skewed toward findings that are more likely to be false positives. Fixing publication bias — publishing replications and null results — is part of fixing the crisis.
Reinforces
P-hacking
P-hacking is trying many analyses or specifications until one is "significant." It produces false positives that won't replicate. The replication crisis is partly a p-hacking crisis. The fix is pre-registration (commit to the analysis before seeing the data) and replication (run the pre-registered analysis again).
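The arithmetic behind p-hacking is short. Assuming the analyses are independent and no real effect exists (a simplification, since real specifications are usually correlated), the chance that at least one of k tests comes out "significant" at level alpha is 1 - (1 - alpha)^k. The sketch below computes that, along with the Bonferroni-adjusted threshold that a pre-registered plan could commit to in advance. The function names are illustrative.

```python
def chance_of_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k, alpha=0.05):
    """Per-test threshold that keeps the overall false-positive rate at or below alpha."""
    return alpha / k


for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3))
# 1 0.05
# 5 0.226
# 20 0.642

# With the corrected threshold, 20 tests stay just under a 5% overall rate.
print(round(chance_of_false_positive(20, bonferroni_threshold(20)), 3))  # 0.049
```

Twenty unplanned analyses give roughly a two-in-three chance of a spurious "finding", which is why committing to the analysis and the threshold before seeing the data changes what a significant result means.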
Reinforces
Confirmation Bias
Confirmation bias is the tendency to seek and accept evidence that supports our view. The replication crisis is exacerbated when we prefer striking, positive findings and downplay replications that fail. The discipline is to update when replication fails and to treat first findings as hypotheses, not confirmations.
Section 8
One Key Quote
"When the same scientific question is subjected to independent replication, the proportion of findings that are confirmed is often surprisingly low."
— John Ioannidis, Why Most Published Research Findings Are False (2005)
Ioannidis's paper was an early formal argument that many published findings are false. The replication efforts of the 2010s confirmed it in several fields. The practitioner's job: assume that a single finding might not replicate; prefer replicated and pre-registered work; and calibrate confidence and decisions accordingly.
Section 9
Analyst's Take
Faster Than Normal — Editorial View
Ask "has this been replicated?" When someone cites a study or an internal result, that's the first question. Single findings are provisional. Replicated findings (same result in another sample, period, or team) deserve more weight. In many domains, the base rate for replication is low — use that to calibrate.
Pre-register when you can. When you're running an analysis or an experiment, state the hypothesis and the analysis plan before you see the full results. That reduces p-hacking and makes the result more interpretable. Pre-registration doesn't guarantee truth, but it reduces the chance that the finding is an artefact of flexible analysis.
Don't dismiss all research. The replication crisis is a calibration tool, not a reason to reject evidence. Some findings replicate; meta-analyses and replicated bodies of work are valuable. The discipline is to distinguish between single, surprising findings and evidence that has been stress-tested. Use the former as hypotheses; use the latter with appropriate confidence.
Apply the same standard to internal work. Backtests, A/B tests, and internal analyses can suffer from the same ills: selection, p-hacking, underpowering. Replicate internal findings when the decision is high-stakes. Treat the first significant result as a hypothesis to be confirmed.
Section 10
Test Yourself
Is this mental model at work here?
Scenario 1
A company bases a major product decision on one A/B test that showed a significant lift. They don't rerun the test or check other segments.
Scenario 2
A team pre-registers their analysis plan before seeing the data, runs the analysis, and then replicates it on a holdout sample.
Scenario 3
An analyst dismisses all published research as unreliable because 'most findings don't replicate.'
Section 11
Summary & Further Reading
Summary: The replication crisis is the finding that many published research results do not hold when replicated. Causes include publication bias, p-hacking, small samples, and incentives for novelty. Use the model by treating single findings as provisional; preferring replicated, pre-registered, or out-of-sample evidence; and asking "has this been replicated?" when basing decisions on research. Don't dismiss all evidence — calibrate confidence. Connected ideas include scientific method, publication bias, p-hacking, confirmation bias, and RCTs.
The foundational paper. Ioannidis argues mathematically that under plausible assumptions, most published findings are false. The replication crisis confirmed the argument empirically.
Gawande on reducing error in complex tasks. Replication and pre-registration are checklists for research quality. The same mindset: make the process explicit to reduce failure.
Kahneman on biases and heuristics. He has written and spoken extensively on the replication crisis in psychology. The book provides the cognitive basis for why we're drawn to striking findings and why replication is essential.
Leads-to
Randomized Controlled Experiment
RCTs are the gold standard for causal evidence. They're also expensive and can be run only once in some settings. The replication crisis says: when possible, replicate the RCT or run it in another context. Single RCTs are stronger than single observational studies, but replication still raises confidence.
Tension
Statistical Significance
Statistical significance (conventionally p < 0.05) is often used as a gate for publication. The replication crisis shows that many "significant" results don't replicate, partly because p-hacking and publication bias make published p-values look stronger than the underlying evidence. Significance on its own is not sufficient; replication and pre-registration are the supplements.