Use this when you need a credible forecast of something nobody can measure directly — emerging technology timelines, market size in five years, regulatory trajectories, geopolitical risk. The Delphi Method structures anonymous, iterative rounds of expert judgment to produce convergent estimates without the distortions of face-to-face debate, status hierarchies, and the loudest voice in the room.
Section 1
What This Tool Does
Put ten smart people in a room and ask them to estimate when autonomous vehicles will capture 20% of new car sales. What happens is predictable and depressing. The most senior person speaks first — or the most confident, which is often worse — and their number becomes the anchor. The group clusters around it. Dissenters self-censor because disagreeing with the VP of Strategy in front of peers carries career risk that disagreeing with a forecast does not. Someone with genuine domain expertise in battery technology stays quiet because the conversation has already moved to regulatory timelines, which is the CEO's hobby horse. The group converges on a number that feels like consensus but is actually one person's guess with nine people's implicit endorsement. This is not collective intelligence. It's collective capitulation.
Olaf Helmer and Norman Dalkey at the RAND Corporation understood this in the early 1950s, when the U.S. Air Force needed forecasts about Soviet nuclear capabilities and the available data was, to put it gently, insufficient. They couldn't run experiments. They couldn't build models — the variables were too uncertain and too political. What they had was experts: physicists, intelligence analysts, military strategists, each holding a piece of the puzzle but none holding the whole picture. The question was how to combine those pieces without the social dynamics that corrupt group judgment.
Their answer was elegant in its simplicity. Remove the room. Give each expert a questionnaire. Collect the responses anonymously. Aggregate the results — medians, interquartile ranges, distributions. Feed the aggregate back to the panel, along with the anonymised reasoning behind outlier positions. Then ask everyone to revise their estimates in light of the group data. Repeat. Two rounds, sometimes three, rarely more than four. The core mechanism is controlled feedback without social pressure — experts learn what others think and why, but never who thinks it, which means they can update their beliefs based on arguments rather than authority. The method was classified for nearly a decade. When RAND finally published it in 1963, it carried the name of the Oracle at Delphi — a fitting allusion to prophecy derived from structured consultation.
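To make the mechanics concrete, here is a minimal sketch in Python of the arithmetic an organiser performs between rounds. The panel size, the sample estimates, and the function name `summarise_round` are illustrative assumptions, not canonical Delphi tooling; the method prescribes the feedback loop, not an implementation.

```python
from statistics import quantiles

def summarise_round(estimates):
    """Aggregate one round of anonymous point estimates.

    Returns the feedback the organiser circulates: the median,
    the interquartile range, and the (still anonymous) responses
    falling outside it, whose authors are asked to explain their
    reasoning in the next round.
    """
    q1, med, q3 = quantiles(estimates, n=4)  # quartile cut points
    outliers = [e for e in estimates if e < q1 or e > q3]
    return {"median": med, "iqr": (q1, q3), "outliers": outliers}

# Round 1: five panellists privately estimate "years until 20% share".
print(summarise_round([4, 6, 7, 8, 15]))
# {'median': 7.0, 'iqr': (5.0, 11.5), 'outliers': [4, 15]}
```

Reporting the median and interquartile range rather than a mean keeps a single wild guess from dragging the summary, and it surfaces the outliers as positions to be explained in the next round rather than noise to be discarded.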
What makes the Delphi Method more than a glorified survey is the iteration. A single anonymous poll captures initial impressions. The feedback-and-revision cycle is where the real work happens. Experts who were anchored on a narrow frame see the range of estimates and realise their confidence was unwarranted. Outliers who hold genuinely novel information get a mechanism to explain their reasoning without the social penalty of being the contrarian in a room full of nodding heads. The group doesn't converge because of conformity pressure — it converges because information flows. When it works, the final-round estimate is measurably more accurate than the first-round estimate, and substantially more accurate than what unstructured group discussion produces. When it doesn't work, the reasons are almost always procedural: bad panel selection, poorly framed questions, or too few rounds to let the information actually circulate.
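A toy simulation of the full loop shows why a handful of rounds is usually enough. Two loud caveats: the revision rule here (each panellist shifting a fraction of the way toward the group median) is a stand-in assumption, since real panellists revise on the strength of the circulated arguments rather than by formula, and the stopping rule (halt once the interquartile range shrinks below a quarter of its initial width, or after four rounds) is a plausible convention, not a prescribed one.

```python
from statistics import median, quantiles

def iqr_width(estimates):
    q1, _, q3 = quantiles(estimates, n=4)
    return q3 - q1

def run_delphi(estimates, pull=0.4, stop_ratio=0.25, max_rounds=4):
    """Iterate feedback rounds until the panel's spread stabilises."""
    initial_width = iqr_width(estimates)
    for round_no in range(1, max_rounds + 1):
        m = median(estimates)
        width = iqr_width(estimates)
        print(f"round {round_no}: median={m:.1f}, IQR width={width:.2f}")
        if width <= stop_ratio * initial_width:
            break  # spread has collapsed enough; stop polling
        # Toy revision step: everyone moves a fraction `pull` toward
        # the median after seeing the feedback. This stands in for
        # genuine belief revision driven by the anonymised arguments.
        estimates = [e + pull * (m - e) for e in estimates]
    return median(estimates)

run_delphi([4, 6, 7, 8, 15])
```

Run on the round-1 estimates from the previous sketch, the spread collapses by the fourth round while the median holds steady, matching the "rarely more than four" pattern described above. In practice the stop decision is the organiser's judgment, made when successive rounds stop shifting the distribution.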
The Delphi Method occupies a specific niche in the decision-making toolkit. It is not for problems where data exists and models can be built — use the models. It is not for problems where a single expert clearly knows more than everyone else — just ask that expert. It is for the genuinely uncertain, the multi-dimensional, the problems where no individual has enough information but a structured collective might. Technology forecasting. Strategic planning under deep uncertainty. Policy design where the consequences are long-term and the evidence base is thin. These are the domains where human judgment, properly aggregated, remains the best instrument available.