Philip Tetlock spent twenty years tracking 28,000 predictions made by 284 experts — political scientists, economists, intelligence analysts, journalists — across domains where their credentials were supposed to confer predictive authority. The study, published as Expert Political Judgment in 2005, is the most comprehensive empirical evaluation of expert forecasting ever conducted. The results were devastating. On average, the experts performed barely better than chance — and notably worse than simple statistical algorithms that extrapolated base rates. A dart-throwing chimp selecting from three options (things will get better, stay the same, get worse) would have matched the experts' calibration. The experts with the highest media profiles and the strongest reputations performed the worst. Credentials did not predict accuracy. Confidence did not predict accuracy. The single strongest predictor of forecasting failure was the expert's certainty that they were right.
Nassim Taleb crystallised the implication in The Black Swan and Antifragile: we do not have an expertise problem. We have a domain problem. Expertise is genuine and powerful in fields with stable rules, tight feedback loops, and repeatable patterns: chess, surgery, weather forecasting in the short term, actuarial science. These are what the psychologist Robin Hogarth called "kind" learning environments: the relationship between action and outcome is visible, consistent, and available for calibration. A surgeon who botches a procedure sees the patient deteriorate. A chess player who makes a bad move loses the game. The feedback is immediate, unambiguous, and corrective.
The expert problem emerges in "wicked" environments — complex systems where causation is opaque, feedback is delayed or nonexistent, outcomes are driven by interactions among thousands of variables, and the same conditions never recur in identical form. Macroeconomics. Geopolitics. Long-term financial markets. Technology adoption curves. In these domains, the machinery of expertise — pattern recognition, causal modelling, scenario construction — still runs. It just produces outputs that are no more reliable than coin flips, dressed in the vocabulary of authority. The economist who predicted the 2008 crisis did not predict the 2020 pandemic crash. The geopolitical analyst who called the fall of the Soviet Union did not foresee 9/11. Each correct prediction looks prescient in isolation. Across the full portfolio of predictions, the hit rate is indistinguishable from noise.
The social cost is not that experts are wrong. Everyone is wrong about complex systems. The cost is that society allocates credibility, capital, and policy influence based on the assumption that expert predictions in these domains are meaningfully better than non-expert guesses — an assumption Tetlock's data comprehensively demolished. We pay economists to forecast GDP growth. We pay strategists to predict competitive landscapes. We pay political analysts to forecast election outcomes. In each case, the credential creates a confidence premium that the track record does not support. The expert's authority comes not from demonstrated predictive accuracy but from the institutional expectation that expertise should confer predictive accuracy — an expectation that persists because nobody tracks the predictions systematically enough to expose the gap.
Tetlock identified two cognitive profiles among his experts. "Hedgehogs" (experts who knew one big thing and applied it everywhere) were the worst forecasters. They were confident, media-friendly, and spectacularly unreliable. "Foxes" (experts who drew from multiple frameworks, updated frequently, and expressed uncertainty) performed materially better than hedgehogs, though their accuracy was still modest in absolute terms. The fox's advantage was not superior intelligence. It was the willingness to treat every prediction as provisional and to revise when new evidence contradicted the prior view. The hedgehog's disadvantage was not stupidity. It was the psychological investment in a single explanatory framework that made disconfirming evidence feel like a personal attack rather than useful data.
The expert problem is not anti-expertise. It is a precision instrument for distinguishing the domains where expertise delivers genuine predictive power from the domains where it delivers only the performance of predictive power — and for recognising that the second category is far larger than our institutions acknowledge.
Section 2
How to See It
The expert problem is operating whenever someone with credentials in a complex domain is treated as a reliable predictor — and the track record of their predictions is never audited. The diagnostic is the absence of a scorecard. In domains where expertise genuinely predicts outcomes, practitioners keep score obsessively: surgeons track mortality rates, pilots track incident records, meteorologists track forecast accuracy. In domains where the expert problem dominates, nobody keeps score — because the results would be embarrassing.
You're seeing the Expert Problem when a credentialled authority makes confident predictions about complex systems, the predictions are treated as meaningfully more reliable than base-rate extrapolation, and no one tracks whether the predictions come true.
Investing
You're seeing the Expert Problem when Wall Street analysts issue twelve-month price targets with decimal-point precision for stocks in industries undergoing structural disruption. The analyst has a PhD, a Bloomberg terminal, and a proprietary model, yet consensus forward-earnings estimates for the S&P 500 have missed by roughly 30% on average over the past three decades. The apparatus of expertise is real. The predictive improvement over naive extrapolation is negligible. McKinsey's review of analyst forecasts found that analysts were no better at predicting earnings growth than a simple model that assumed growth would revert to long-term averages. The twelve-month target is not a forecast. It is a credential displayed as a number.
Startups
You're seeing the Expert Problem when venture capitalists cite pattern recognition from prior investments as the basis for forecasting which startups will succeed. VC hit rates on individual investments hover around 10–20% for returning capital, and the distribution of returns follows a power law where a tiny fraction of bets generate virtually all the profits. The "expertise" is real in deal sourcing, term negotiation, and portfolio support. It is largely illusory in predicting which specific company will become the outlier. The most honest VCs acknowledge this openly. The least honest present their winners as evidence of superior selection ability while their losers vanish from the narrative.
Leadership
You're seeing the Expert Problem when management consultants deliver strategic recommendations based on industry expertise — and the client treats those recommendations as predictions about what will work. The consultant's expertise in frameworks, competitive analysis, and organisational design is genuine. Their ability to predict whether a specific strategy will succeed in a specific market at a specific time is not meaningfully better than the client's own informed guess. Bain, McKinsey, and BCG do not publish their prediction track records. The absence of the scorecard is the tell.
Personal Decisions
You're seeing the Expert Problem when you defer to a financial advisor's market outlook as the basis for asset allocation timing — instead of treating the advisor's value as residing in tax planning, behavioural coaching, and portfolio construction. The advisor's credential signals competence in financial planning. It does not signal predictive ability in market direction. The distinction between what the credential actually qualifies and what the audience assumes it qualifies is the expert problem's operating mechanism.
Section 3
How to Use It
Decision filter
"Before deferring to any expert's prediction about a complex system, I ask two questions: Is this a domain with tight feedback loops and stable rules, or a domain where causation is opaque and outcomes are driven by nonlinear interactions? And does this expert have a verified track record of predictive accuracy in this specific domain — not adjacent credentials, not a prestigious title, but actual predictions scored against actual outcomes?"
As a founder
The expert problem immunises you against the most expensive form of bad advice: confident strategic guidance from people whose authority rests on credentials rather than demonstrated predictive accuracy. When a former Fortune 500 executive on your advisory board tells you how your market will develop, check whether their track record supports the confidence. When a management consultant presents a competitive landscape analysis as a prediction of where the market is heading, recognise that the analysis is a snapshot, not a forecast. The frameworks are useful. The predictions are unreliable.
Build your strategic planning process around scenario analysis rather than point forecasts. Instead of asking "what will happen?" ask "what are the three to five most plausible futures, and what would we do in each?" This shifts the value of expertise from prediction — where it fails — to preparation, where it genuinely helps.
As an investor
The expert problem is the reason that index funds outperform the vast majority of actively managed funds over time horizons exceeding ten years. Active management is built on the premise that expert analysis generates predictions accurate enough to justify the fees. Malkiel's data and the SPIVA scorecards show for fund managers what Tetlock's data showed for political forecasters: in aggregate, expert stock-pickers do not outperform random selection after fees. The few who do outperform in any given period are indistinguishable from lucky coin-flippers until the sample size is far larger than any career permits.
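The coin-flipper point can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that a manager with no skill beats the benchmark in any given year with probability 0.5, independently from year to year. The function names and the population of 1,000 managers are hypothetical, not figures taken from Tetlock, Malkiel, or SPIVA.

```python
from math import comb


def prob_at_least(wins: int, years: int, p: float = 0.5) -> float:
    """Probability that a no-skill manager has at least `wins` winning years out of `years`."""
    return sum(
        comb(years, k) * p**k * (1 - p) ** (years - k)
        for k in range(wins, years + 1)
    )


def expected_lucky_managers(population: int, wins: int, years: int, p: float = 0.5) -> float:
    """Expected number of no-skill managers who show at least `wins` winning years."""
    return population * prob_at_least(wins, years, p)


if __name__ == "__main__":
    # Hypothetical population: 1,000 managers, none of whom has any skill.
    for wins in (7, 8, 9, 10):
        lucky = expected_lucky_managers(1000, wins, 10)
        print(f"at least {wins}/10 winning years: about {lucky:.1f} managers by luck alone")
```

Under these assumptions, roughly one manager in a thousand posts ten straight winning years without any skill at all, which is why a single decade of outperformance says little about selection ability.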
Use expert analysis for what it delivers: frameworks for understanding a company's competitive position, capital allocation discipline, and risk management architecture. Do not use it for what it does not deliver: reliable predictions about which stocks, sectors, or asset classes will outperform over the next twelve months.
As a decision-maker
Inside organisations, the expert problem manifests as deference to the most senior person's prediction — regardless of whether seniority correlates with predictive accuracy in the domain under discussion. The corrective is to separate expertise in analysis from authority in prediction. A CFO's analysis of the company's financial structure is domain expertise operating in a kind environment. The same CFO's prediction of next year's revenue in a market undergoing disruption is a guess wrapped in a spreadsheet.
Create a prediction registry for your leadership team. Before major strategic decisions, require each decision-maker to record their specific prediction, their confidence level, and their reasoning. Review annually. The prediction registry converts the expert problem from an invisible bias into a measurable phenomenon — and the measurement itself changes behaviour, because executives who know their predictions are being tracked become more calibrated.
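A prediction registry needs very little machinery. The sketch below is one possible shape for it, not a prescribed tool: the class names, field names, and the choice of the Brier score (the mean squared gap between stated confidence and what happened) as the annual review metric are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Prediction:
    author: str
    claim: str
    confidence: float  # stated probability that the claim comes true (0.0 to 1.0)
    reasoning: str
    outcome: Optional[bool] = None  # filled in at the annual review


@dataclass
class PredictionRegistry:
    entries: List[Prediction] = field(default_factory=list)

    def record(self, author: str, claim: str, confidence: float, reasoning: str) -> Prediction:
        """Log a prediction before the decision is made."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be a probability between 0 and 1")
        prediction = Prediction(author, claim, confidence, reasoning)
        self.entries.append(prediction)
        return prediction

    def brier_by_author(self) -> Dict[str, float]:
        """Mean squared gap between stated confidence and outcome, per author.

        0.0 is perfect; always answering 50% scores 0.25. Unresolved entries are skipped.
        """
        squared_errors: Dict[str, List[float]] = {}
        for p in self.entries:
            if p.outcome is None:
                continue
            error = (p.confidence - float(p.outcome)) ** 2
            squared_errors.setdefault(p.author, []).append(error)
        return {author: sum(errs) / len(errs) for author, errs in squared_errors.items()}


if __name__ == "__main__":
    registry = PredictionRegistry()
    # Hypothetical entries recorded before a strategic decision.
    cfo = registry.record("CFO", "Revenue grows more than 10% next year", 0.9, "Strong pipeline")
    coo = registry.record("COO", "Revenue grows more than 10% next year", 0.6, "Strong pipeline, rising churn")
    # One year later the claim resolved false for both.
    cfo.outcome = False
    coo.outcome = False
    print(registry.brier_by_author())
```

Scoring stated confidence rather than a bare right-or-wrong tally is a deliberate choice here: it penalises the executive who was 90% sure and wrong more heavily than the one who was 60% sure and wrong, which is the calibration behaviour the registry is meant to encourage.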
Common misapplication: Concluding that experts are useless. Expertise is extraordinarily valuable — in the right domains. A cardiologist's diagnostic expertise, a structural engineer's load calculations, a chess grandmaster's positional evaluation — these represent genuine predictive power earned through deliberate practice in environments with clear feedback. The expert problem targets the specific claim that this type of expertise transfers to complex, nonlinear systems where it does not.
Second misapplication: Assuming that because experts are bad at prediction, everyone is equally bad. Tetlock's research showed that "foxes" — experts who aggregated multiple perspectives, updated their views, and expressed calibrated uncertainty — outperformed "hedgehogs" by a meaningful margin. The expert problem does not equalise all forecasters. It identifies the conditions under which credentials create false confidence and the cognitive styles that partially mitigate the problem.
Section 4
The Mechanism
Section 5
Founders & Leaders in Action
The leaders below share a common operating principle: they recognised that expertise in understanding a system does not confer the ability to predict that system — and they built decision architectures that embraced this distinction. Their advantage was not ignoring experts. It was using experts for what expertise actually delivers — frameworks, pattern recognition, risk identification — while refusing to treat expert predictions as reliable forecasts in complex domains.
Charlie Munger
Vice Chairman, Berkshire Hathaway, 1978–2023
Munger's entire investment philosophy was a structural response to the expert problem. His insistence on "staying within the circle of competence" was not intellectual modesty — it was a precision instrument for distinguishing domains where his expertise conferred genuine analytical advantage from domains where it would produce only the illusion of understanding. Munger explicitly warned against the "man with a hammer" syndrome — the expert who applies their single framework to every problem, which is precisely Tetlock's hedgehog. His response was to study broadly across disciplines (building what he called a "latticework of mental models") while investing narrowly in businesses simple enough that prediction was partially tractable: durable consumer brands, regulated utilities, insurance float. Munger never claimed to predict macroeconomic outcomes. He claimed to identify businesses whose value was resilient to the predictions he could not make — and this reframing, from prediction to resilience, is the most operationally useful response to the expert problem ever articulated by a practitioner.
George Soros
Founder, Soros Fund Management, 1969–2011
Soros built a multi-billion-dollar track record by treating the expert problem as a source of profit rather than a hazard to avoid. His theory of reflexivity (the idea that market participants' beliefs about the market change the market, which changes the beliefs, in a recursive loop) was a direct critique of expert models that treated markets as external objects to be predicted. Soros argued that the economists and analysts trying to predict markets were themselves part of the system they were predicting, which made their predictions self-referentially unreliable. His trading strategy exploited the gap between expert consensus (which assumed stable equilibria) and market reality (which was driven by reflexive feedback loops that created booms and busts). The British pound trade of 1992, which netted over $1 billion, was a bet against the expert consensus that the European Exchange Rate Mechanism was sustainable. Soros did not claim to predict better than the experts. He claimed the experts were predicting within a framework that misunderstood the system's structure, and that the gap between their confidence and reality was a tradeable asset.
Jeff Bezos structured Amazon's strategic planning around what he called "regret minimisation" and "two-way door" decisions, frameworks that explicitly acknowledged the futility of expert prediction in fast-moving markets. Rather than forecasting which businesses would succeed, Bezos optimised for optionality: launch quickly, fail cheaply, and preserve the ability to reverse decisions that turn out to be wrong. AWS, Amazon's most profitable business, was not the product of expert prediction about cloud computing's trajectory. It emerged from an internal infrastructure decision that Bezos recognised could be externalised. The experts in 2006 predicted that enterprises would never trust a retailer with their computing infrastructure. Amazon ignored the expert consensus and tested the market. Bezos's annual letters repeatedly emphasise that Amazon's strategy does not depend on being right about the future. It depends on making many bets with asymmetric payoffs, so that the cost of the wrong predictions is small and the reward from the right ones is enormous.
Section 6
Visual Explanation
The top half maps the divergence: as domain complexity increases, expert predictive accuracy declines toward the dart-throwing baseline — but expert confidence remains flat. The gap between the confidence line and the accuracy line is the expert problem's operating cost. In kind domains (left), the lines nearly overlap: experts are accurate and appropriately confident. In wicked domains (right), confidence stays high while accuracy collapses to chance. The bottom half shows Tetlock's key finding: hedgehogs — the experts with one big theory — are the most confident and the least accurate. Foxes — the integrators and updaters — sacrifice the narrative clarity that makes hedgehogs media-ready, but they earn the modest accuracy improvement that represents the ceiling of prediction in complex systems.
Section 7
Connected Models
The expert problem does not operate in isolation. It interacts with the cognitive biases that sustain false authority, the analytical frameworks that expose it, and the downstream errors that compound when expert predictions are treated as reliable in domains where they are not.
Reinforces
Dunning-Kruger Effect
The expert problem and the Dunning-Kruger effect create a particularly dangerous feedback loop in complex domains. The Dunning-Kruger effect means the least competent are the most confident. The expert problem means that in wicked environments, even the most competent are no better at prediction than the less competent. The combination produces a landscape where confidence is inversely correlated with accuracy at the low end (Dunning-Kruger) and uncorrelated with accuracy at the high end (expert problem). The audience, unable to distinguish between the two — because both present with confidence — defaults to the most authoritative-sounding voice. Media amplifies this: the expert who says "I'm not sure, it could go either way" is never invited back. The expert who says "here's exactly what will happen" gets the prime-time segment. The selection process for public expertise is optimised for confidence, which the expert problem demonstrates is the wrong signal.
Reinforces
Narrative Fallacy
The narrative fallacy sustains the expert problem by converting failed predictions into coherent post-hoc explanations that preserve the expert's credibility. An economist predicts a recession. No recession occurs. The economist explains that unprecedented central bank intervention "delayed" the inevitable downturn — a narrative that is unfalsifiable, internally coherent, and allows the expert to maintain both the original thesis and their authority. The narrative fallacy means that expert prediction failures never accumulate into evidence against the expert, because each failure is individually explained away. Tetlock found that experts' explanations for why they were wrong were more sophisticated than their original predictions — the narrative machinery is most powerful when it is working to repair credibility rather than to generate insight.
Tension
All Models Are Wrong
Section 8
One Key Quote
"We reach the point of diminishing marginal predictive returns for knowledge disconcertingly quickly. In this age of academic hyperspecialization, there is no reason for supposing that contributors to top journals (distinguished political scientists, area study specialists, economists, and so on) are any better than journalists or attentive readers of the New York Times in 'reading' emerging situations."
— Philip Tetlock, Expert Political Judgment (2005)
Tetlock's finding is not an attack on knowledge. It is a precision measurement of where knowledge stops converting into predictive accuracy. The word "disconcertingly" carries the weight of the passage — Tetlock expected expertise to help more than it did, and the data forced him to report a finding that the entire academic and policy establishment has incentives to suppress. Distinguished economists are not better at predicting GDP growth than attentive newspaper readers. Distinguished political scientists are not better at predicting regime change than informed generalists. The credentials certify deep knowledge of how the system works. They do not certify the ability to predict what the system will do — because what the system will do depends on interactions among more variables than any single mind can model, updated by feedback loops that operate faster than any analytical framework can incorporate.
The deepest implication is for capital allocation. Trillions of dollars are allocated annually based on expert economic forecasts, analyst price targets, and strategist outlooks. If the marginal predictive value of expertise in these domains is near zero — as Tetlock's data demonstrates — then the capital allocation premium paid for expert prediction is a tax on credulity. The money spent on expert forecasts buys confidence. It does not buy accuracy. The discipline is to pay for expertise where it delivers — analysis, frameworks, risk identification — and to stop paying for expertise where it does not deliver: point predictions about complex systems.
Section 9
Analyst's Take
Faster Than Normal — Editorial View
The expert problem is the single most underpriced insight in professional decision-making. Every industry runs on expert prediction — economic forecasts drive fiscal policy, analyst ratings drive capital allocation, strategic outlooks drive corporate planning — and the empirical evidence that these predictions are barely better than chance has been available for two decades with virtually no structural response.
The reason is incentive alignment, not ignorance. Everyone in the prediction industry knows the track records are poor. The economists know. The analysts know. The strategists know. They continue because the demand for expert prediction is not a demand for accuracy — it is a demand for confidence. A CEO who tells the board "we don't know what the market will do, so we've built for multiple scenarios" sounds uncertain. A CEO who presents McKinsey's market forecast as the basis for a capital allocation plan sounds strategic. The forecast buys organisational confidence, not predictive accuracy — and organisational confidence is what boards, investors, and employees are actually purchasing.
The most expensive manifestation of the expert problem is consensus-driven capital allocation. When every major bank's chief economist predicts 2.5% GDP growth, the consensus creates a false floor of certainty that shapes trillions of dollars in lending, investment, and risk-taking. The consensus is not more reliable than any individual forecast in the way that matters: it is the average of individually unreliable forecasts, and averaging cancels only the errors that differ from one forecaster to the next. That makes the consensus look more stable than any single prediction, but it leaves the errors the forecasters share untouched. The 2008 financial crisis, the 2020 pandemic shock, and every major market disruption share a common feature: the expert consensus immediately before the shock indicated stability. The consensus did not fail because the experts were stupid. It failed because consensus in complex systems is the aggregation of models that share the same structural blindness, and the aggregation of shared blindness is not wisdom.
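The point about shared blindness can be illustrated with a small simulation. This is a toy model under stated assumptions, not a description of how any real forecasting panel behaves: each forecaster's error is taken to be the sum of one error common to everyone (the shared blind spot) and one error of their own, both drawn from normal distributions. The function name and all parameter values are hypothetical.

```python
import random
import statistics
from typing import Tuple


def simulate_consensus(n_forecasters: int, n_years: int, shared_sd: float,
                       own_sd: float, seed: int = 0) -> Tuple[float, float]:
    """Return (mean absolute error of individual forecasts, mean absolute error of the consensus)."""
    rng = random.Random(seed)
    individual_errors = []
    consensus_errors = []
    for _ in range(n_years):
        # Error common to every model in a given year: the shared structural blind spot.
        shared_error = rng.gauss(0.0, shared_sd)
        # Each forecaster adds an idiosyncratic error on top of the shared one.
        errors = [shared_error + rng.gauss(0.0, own_sd) for _ in range(n_forecasters)]
        individual_errors.extend(abs(e) for e in errors)
        # The consensus forecast is the average, so its error is the average error.
        consensus_errors.append(abs(statistics.fmean(errors)))
    return statistics.fmean(individual_errors), statistics.fmean(consensus_errors)


if __name__ == "__main__":
    independent = simulate_consensus(20, 2000, shared_sd=0.0, own_sd=1.0)
    correlated = simulate_consensus(20, 2000, shared_sd=1.0, own_sd=0.3)
    print("independent errors (individual, consensus):", independent)
    print("shared blind spot  (individual, consensus):", correlated)
```

When the errors are independent, the consensus is far more accurate than a typical individual. When most of the error is shared, averaging barely helps, even though the consensus number still looks steadier than any single forecast.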
What Tetlock's later work showed — the Good Judgment Project and the superforecasting research — is that the expert problem has a partial solution, and the solution is methodological rather than credentialist. The superforecasters who outperformed intelligence analysts with classified access were not domain experts. They were calibration experts — people who broke complex questions into tractable components, updated their estimates frequently, assigned precise probabilities, and tracked their own accuracy obsessively. The improvement came not from knowing more about geopolitics or economics but from knowing more about the mechanics of prediction itself. This is the expert problem's most actionable implication: the skill of forecasting is distinct from the skill of domain expertise, and the former is trainable even when the latter is not predictively useful.
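The habit of assigning a probability and then revising it in small steps can be sketched with Bayes' rule in odds form. This is one standard way to formalise incremental updating, offered as an illustration rather than as the procedure the superforecasters actually followed. The starting estimate, the news items, and their likelihood ratios below are all made up.

```python
def update(probability: float, likelihood_ratio: float) -> float:
    """Revise a probability with Bayes' rule in odds form.

    likelihood_ratio = P(evidence | claim true) / P(evidence | claim false).
    A ratio above 1 raises the estimate; a ratio below 1 lowers it.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = probability / (1.0 - probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    estimate = 0.20  # hypothetical starting point, for example a historical base rate
    # Hypothetical evidence with illustrative likelihood ratios.
    evidence = [
        ("mildly supportive report", 1.5),
        ("contradicting data release", 0.5),
        ("strong confirmation", 3.0),
    ]
    for label, ratio in evidence:
        estimate = update(estimate, ratio)
        print(f"after {label}: {estimate:.2f}")
```

The design choice worth noticing is that no single item moves the estimate to certainty. Each piece of evidence nudges the number, which is the opposite of the hedgehog's pattern of holding one view until it breaks.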
Section 10
Test Yourself
These scenarios test whether you can distinguish between domains where expert prediction is genuinely reliable and domains where credential-backed confidence is substituting for demonstrated accuracy. The critical diagnostic is always the same: does this domain have the feedback structure — tight loops, stable rules, repeatable patterns — that allows expertise to convert into predictive accuracy? Or is the domain complex, nonlinear, and characterised by delayed feedback that prevents calibration?
Is the Expert Problem operating here?
Scenario 1
A Nobel Prize-winning economist confidently predicts that the Federal Reserve will raise interest rates three times in the coming year, based on his inflation model. Financial media amplifies the prediction as 'expert consensus.' His model correctly predicted the direction of one of the last four rate-change cycles.
Scenario 2
A radiologist with 20 years of experience examines a chest X-ray and identifies a nodule that requires biopsy. Her diagnostic accuracy rate on similar cases, tracked by her hospital's quality assurance program, is 94%.
Scenario 3
A prominent technology analyst predicts that a specific AI startup will become a $100 billion company within five years, citing the founder's background, the TAM, and analogies to previous platform shifts. The analyst has made similar predictions about twelve startups in the past decade; two achieved the predicted outcome.
Section 11
Top Resources
The expert problem literature spans political science, psychology, decision theory, and investment management. The strongest foundation begins with Tetlock's empirical research, extends to Kahneman's cognitive architecture, and deepens with Taleb's philosophical framework for decision-making under radical uncertainty.
Expert Political Judgment, Philip Tetlock (2005)
The foundational empirical study. Twenty years, 284 experts, 28,000 predictions. Tetlock's data demonstrated that expert predictions in complex political and economic domains were barely better than chance, and that the most famous, most confident experts were the least accurate. The book also identifies the fox-hedgehog distinction that provides the partial corrective: experts who integrate multiple frameworks and update frequently outperform those who apply a single theory with high conviction. Essential reading for anyone who makes decisions based on expert forecasts.
Superforecasting, Philip Tetlock and Dan Gardner (2015)
The constructive sequel to Expert Political Judgment. Where the first book demonstrated that experts are bad at prediction, this book identifies who is good at prediction and why. The superforecasters, who outperformed intelligence analysts with classified access, shared specific cognitive habits: breaking complex questions into components, assigning precise probabilities, updating frequently, and tracking their own accuracy. The book provides trainable techniques for improving forecasting calibration.
The Black Swan, Nassim Nicholas Taleb (2007)
Taleb's treatment of expert failure in the face of extreme events provides the philosophical complement to Tetlock's empirical work. The concept of "epistemic arrogance" (the systematic overconfidence of experts in domains governed by fat-tailed distributions) explains why expert models fail most catastrophically precisely when they matter most. The chapters on prediction, narrative fallacy, and the ludic fallacy (the error of applying models from games to real-world complexity) are essential for understanding why expertise works in kind environments and fails in wicked ones.
Clinical versus Statistical Prediction, Paul Meehl (1954)
Meehl's study, conducted a half-century before Tetlock's, demonstrated that simple statistical formulas outperformed expert clinical judgment in predicting outcomes ranging from parole violation to academic performance. The finding has been replicated in over 200 subsequent studies. Meehl's work establishes the foundational evidence for the expert problem: that even in domains where experts have extensive training and direct contact with the subject, mechanical rules frequently outperform their judgment.
Thinking, Fast and Slow, Daniel Kahneman (2011)
Kahneman's chapters on expert intuition, the illusion of validity, and the distinction between "skilled intuition" and "overconfident judgment" provide the cognitive science that explains why the expert problem persists. His collaboration with Gary Klein on identifying when expert intuition is trustworthy versus when it is unreliable (kind environments versus wicked environments) gives the most precise theoretical framework for determining when to defer to experts and when to discount their predictions.
The Expert Problem — Expertise improves prediction in 'kind' domains with stable rules and fast feedback. In 'wicked' domains with complex causation and delayed feedback, expert predictions converge toward chance — but expert confidence does not.
George Box's dictum — "all models are wrong, but some are useful" — creates productive tension with the expert problem by reframing what expertise should deliver. The expert problem critiques the claim that expert models are predictive. Box's framework preserves the claim that expert models are useful — for understanding structure, identifying variables, and mapping relationships — while explicitly denying that utility implies accuracy. The tension is productive because it rescues expertise from the expert problem's most nihilistic implication. Experts are not useless. Their models are genuinely valuable for understanding how systems work. The error is treating the model's structural insights as predictive outputs. A macroeconomic model that identifies the relationship between interest rates and employment is useful. The same model deployed as a GDP forecast for next quarter is almost certainly wrong.
Tension
Circle of Competence
The circle of competence is the operational defence against the expert problem. Munger and Buffett's insistence on defining and staying within the boundary of genuine expertise is a direct response to the finding that credentials spill over into domains where they confer no predictive advantage. The tension is between expansion and restraint: the expert problem pushes toward narrowing one's claims of predictive authority, while career incentives push toward expanding them. The expert who says "I don't know" about adjacent domains sacrifices media appearances, consulting fees, and status — all of which reward breadth of confident opinion. The circle of competence formalises the discipline of refusing to trade credibility for reach.
Leads-to
Survivorship Bias
The expert problem compounds survivorship bias in the advice ecosystem. The experts who are most visible — the ones writing books, giving keynotes, appearing on television — are disproportionately those whose past predictions happened to be correct, regardless of whether their methodology was sound. The experts whose predictions failed are not invited back. The survivorship-filtered sample of visible experts creates the illusion that expert prediction works, because the evidence of failure has been removed. Tetlock's study was revolutionary precisely because it tracked the full population of predictions — including the failures that the public never sees. Without systematic tracking, the expert problem is invisible, sustained by the survivorship-biased visibility of the predictions that happened to be right.
Leads-to
Black Swan Theory
The expert problem is a primary mechanism through which Black Swan events cause disproportionate damage. Taleb's argument is that experts systematically underestimate tail risk — extreme events outside the range of historical experience — because their models are calibrated on the data that exists, which by definition excludes the unprecedented. When the unprecedented arrives — a pandemic, a financial system collapse, a geopolitical shock — the expert models fail simultaneously, and the institutions that relied on those models are unprepared. The expert problem does not cause Black Swans. It amplifies their damage by creating a false sense of preparedness. The more confident the expert consensus that "the system is stable," the more catastrophic the outcome when the system proves otherwise.
My operating framework: use experts for analysis, not for prediction. When an economist explains the transmission mechanism of monetary policy, I am receiving genuine expertise — the kind that improves my understanding of how the system works. When the same economist tells me where interest rates will be in twelve months, I am receiving a guess with better vocabulary than mine. The analysis is worth paying for. The prediction is not — and the conflation of the two is where the expert problem extracts its highest cost.
The practical defence is systematic prediction tracking. Keep a record of every expert prediction that influences your decisions — the economic forecasts you used for planning, the analyst ratings you used for investment, the strategic outlooks you used for resource allocation. Score them annually against outcomes. Within three years, you will have an empirical basis for determining which expert inputs are predictively useful and which are theatre. The exercise is simple. Almost no one does it. The gap between the simplicity of the solution and the rarity of its adoption tells you everything about the incentive structure that sustains the expert problem.
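The annual scoring step can be as simple as asking whether an expert's stated probabilities beat a forecaster who always quotes the base rate. The sketch below uses a Brier skill score for that comparison. It is a simplified illustration: the function names and sample data are hypothetical, and the base rate is computed from the same outcomes being scored, whereas a stricter audit would take it from earlier history.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between forecast probabilities and outcomes (1 = happened, 0 = did not)."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("need equally sized, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def skill_vs_base_rate(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Brier skill score against a constant base-rate forecast.

    Above 0: better than quoting the base rate. 0: no better. Below 0: worse.
    Simplification: the base rate is taken from the scored outcomes themselves.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0.0:
        raise ValueError("all outcomes are identical, so the skill score is undefined")
    return 1.0 - brier_score(probabilities, outcomes) / reference


if __name__ == "__main__":
    # Hypothetical record: four confident calls from one source, only one of which came true.
    expert_calls = [0.9, 0.9, 0.9, 0.9]
    what_happened = [1, 0, 0, 0]
    print(f"skill versus base rate: {skill_vs_base_rate(expert_calls, what_happened):.2f}")
```

A source whose score stays at or below zero across several years is supplying confidence rather than information, which is the distinction the tracking exercise is meant to surface.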