In 1747, a Scottish naval surgeon named James Lind had a problem. Scurvy was killing more British sailors than enemy action — roughly 1,400 men had died on George Anson's circumnavigation just five years earlier, most from the disease. Dozens of proposed remedies circulated: vinegar, seawater, sulfuric acid, cider, barley water, fresh air. Every ship's surgeon had a favourite. None of them had evidence.
Lind did something unprecedented. He selected twelve sailors with scurvy at similar stages of the disease, divided them into six pairs, and gave each pair a different treatment while keeping all other conditions constant. The pair receiving oranges and lemons recovered within a week. The other ten showed little or no improvement. It is often cited as the first controlled clinical trial in recorded history, and it demonstrated that systematic testing could resolve questions that centuries of argument, authority, and tradition had failed to settle.
The scientific method is the formalisation of what Lind did intuitively: observe a phenomenon, form a testable explanation, design an experiment that could prove that explanation wrong, run the experiment, analyse the results, and revise the explanation accordingly. Then repeat. The cycle has no terminal point. Every conclusion is provisional — the best explanation the evidence currently supports, held until better evidence arrives.
The intellectual architecture took centuries to assemble. Francis Bacon laid the groundwork in Novum Organum (1620), arguing that knowledge should be built from systematic observation and induction rather than from Aristotelian deduction and received authority. Galileo demonstrated the principle physically, dropping objects from the Leaning Tower of Pisa (probably apocryphal) and rolling balls down inclined planes (definitely real) to show that heavier objects don't fall faster than lighter ones — contradicting Aristotle's claim, which had gone untested for nearly two millennia because nobody had thought to check.
Isaac Newton codified the approach in the Principia (1687), deriving universal gravitation from Kepler's observational data and his own laws of motion. Antoine Lavoisier applied rigorous measurement to chemistry in the 1770s, disproving the phlogiston theory by showing that combustion required oxygen — and demonstrating that careful weighing could settle what philosophical argument could not. Each contribution added a structural element: Bacon contributed systematic induction, Galileo contributed controlled experimentation, Newton contributed mathematical formalisation, Lavoisier contributed quantitative precision.
Karl Popper sharpened the entire framework in The Logic of Scientific Discovery (1934, English translation 1959) with a single, devastating criterion: a theory is scientific only if it is falsifiable — if there exists an observation that could, in principle, prove it wrong. Popper's insight reorganised the hierarchy of intellectual merit. A theory that explains everything predicts nothing. A theory that makes specific, falsifiable predictions and survives repeated attempts to disprove it earns provisional trust proportional to the severity of the tests it has passed. The more opportunities a hypothesis has had to fail, and hasn't, the more seriously you should take it.
The method's history is also a history of institutional resistance to its conclusions. In 1847, the Hungarian physician Ignaz Semmelweis observed that the maternity ward staffed by doctors had a mortality rate several times higher than the ward staffed by midwives. His hypothesis: doctors were carrying "cadaverous particles" from autopsy rooms to delivery wards. He instituted a handwashing policy with chlorinated lime solution. Mortality dropped from roughly 10% to under 2% in months. The medical establishment rejected his findings for two decades, not because the data was weak, but because the implication (that doctors were killing their own patients) was socially intolerable. Semmelweis died in an asylum in 1865, before the acceptance of germ theory vindicated him completely. The data was never the problem. The method produced the right answer. The institution couldn't accept it.
This pattern — method produces evidence, institution resists conclusion — recurs with sufficient regularity that it should be treated as a structural feature, not an anomaly. The method is a machine for generating truths that are sometimes inconvenient. Its value is proportional to the degree of inconvenience it can withstand.
The non-obvious insight: the method's power doesn't come from generating correct answers. It comes from systematically eliminating incorrect ones. Every failed experiment narrows the space of possibilities. The process is subtractive, not additive — a disciplined pruning of the hypothesis tree through contact with evidence. Most people misunderstand this. They think science is about proving things right. It is about finding out what survives being proven wrong.
This subtractive logic is what makes the method transferable to business, investing, and decision-making. A founder running A/B tests, an investor stress-testing a thesis against disconfirming data, a manager running a controlled pilot before rolling out a new process — each is applying the same structural discipline: state a falsifiable hypothesis, design a test that could kill it, run the test honestly, and update.
The uncomfortable corollary: most organisations and most people don't actually do this. They form conclusions, then seek confirming evidence. They design experiments that cannot fail. They interpret ambiguous results as validation. The scientific method isn't hard to understand. It's hard to practise, because every step of the cycle creates opportunities for self-deception — and the human brain is exquisitely designed to take them.
Consider how rarely the method appears in corporate strategy. A company deciding to enter a new market typically builds a business case — a narrative that justifies the decision already made. What they almost never do is state the specific conditions under which they would abandon the initiative, measure against those conditions on a fixed cadence, and actually pull the plug when the kill criteria are met. That process — which is nothing more than the scientific method applied to capital allocation — would prevent the majority of failed expansions, unsuccessful product launches, and value-destroying acquisitions that characterise large-company strategy. The method is available. The organisational will to use it honestly is not.
Section 2
How to See It
The method's signature is unmistakable once you know what to look for: someone translating uncertainty into a structured test rather than an argument. The tell is specificity — not "I think this will work" but "if this hypothesis is correct, we should observe X within Y timeframe, and if we observe Z instead, the hypothesis is wrong."
The absence of the method is equally recognisable: confident assertions without falsification criteria, "data-driven" decisions where the data was selected after the decision, and experiments designed so that every possible outcome counts as confirmation.
Technology
You're seeing Scientific Method when an engineering team deploys a feature to 5% of users before a full rollout, defines success metrics in advance, and establishes a kill criterion — the specific result that would cause them to revert the change. The discipline isn't the A/B test itself. It's the pre-commitment to a falsification threshold before the data arrives. Teams that define success criteria after seeing results are performing confirmation rituals, not experiments.
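The pre-commitment described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular company's tooling: the `ExperimentPlan` class, its field names, and the thresholds are all hypothetical, and the sketch compares raw rates without any significance testing. What it shows is the ordering: the thresholds are frozen before results exist, and the decision function only reads them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: thresholds cannot be edited after the fact
class ExperimentPlan:
    """Hypothetical pre-registration record, written before data arrives."""
    metric: str
    success_threshold: float  # relative lift at or above which we ship
    kill_threshold: float     # relative lift at or below which we revert


def decide(plan: ExperimentPlan, control_rate: float, treatment_rate: float) -> str:
    """Compare the observed relative lift against the pre-committed thresholds."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    lift = (treatment_rate - control_rate) / control_rate
    if lift <= plan.kill_threshold:
        return "revert"
    if lift >= plan.success_threshold:
        return "ship"
    return "inconclusive"  # neither threshold met: keep testing, no victory lap


# Illustrative numbers only: the treatment converts 3% worse than control.
plan = ExperimentPlan(metric="checkout_conversion",
                      success_threshold=0.02, kill_threshold=-0.01)
print(decide(plan, control_rate=0.100, treatment_rate=0.097))
```

Because the plan is immutable, a team that wants a different threshold after seeing the data has to create a new plan, which makes the change visible rather than silent.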
Business
You're seeing Scientific Method when a founder runs a pre-sale campaign before building the product — not as a marketing tactic but as a hypothesis test. "If 200 people will pay $50 for this before it exists, the demand is real. If fewer than 50 respond, the hypothesis is falsified and I need a different approach." The Kickstarter campaigns that produced Pebble, Oculus, and hundreds of less famous products were structured experiments with explicit success criteria.
Investing
You're seeing Scientific Method when an investor articulates the specific conditions under which they would exit a position before entering it. George Soros's reflexivity framework operationalised this: maintain a working hypothesis, define the evidence that would invalidate it, and reverse the position when that evidence appears. The traders who survive decades are the ones who treat every position as a hypothesis subject to revision — not a conviction to defend.
Science
You're seeing Scientific Method when a researcher publishes a result alongside the exact protocol needed to replicate it, inviting others to try to prove the finding wrong. The replication crisis in psychology, where roughly 60% of the 100 published studies re-run for the Open Science Collaboration's 2015 analysis failed to replicate, revealed which subfields had been practising the method and which had been performing its theatre. Real science is the hypothesis that survives hostile replication. Everything else is storytelling with error bars.
Section 3
How to Use It
Decision filter
"What specific, observable outcome would prove this assumption wrong? If I can't answer that question, I don't have a hypothesis — I have a belief. And beliefs don't self-correct."
As a founder
Treat every strategic assumption as a hypothesis with an expiration date. Your product roadmap isn't a plan — it's a set of testable predictions about what customers will value. Structure each feature launch as an experiment: define the metric, set the threshold, run the test, and commit in advance to what you'll do if the results disappoint. Jeff Bezos codified this at Amazon, where every significant initiative began with a written hypothesis and a measurable prediction. The teams that produced the most reliable results weren't the ones with the best ideas. They were the ones with the fastest test cycles and the most honest relationship with negative results.
As an investor
Before allocating capital, write down the three to five falsifiable predictions your thesis depends on. "Revenue will grow at 30% for three years" is a prediction. "The competitive moat will hold" is a wish. The distinction matters because predictions can be tracked against reality on a quarterly cadence — and when two of your five predictions fail in the first year, the method demands you revise the thesis rather than rationalise the miss. Ed Thorp ran his entire career this way: every trade was a hypothesis, every outcome was data, and the portfolio evolved through the same iterative revision cycle that governs experimental physics.
As a decision-maker
When your team presents a recommendation, ask one question: "What result would cause you to reverse this recommendation?" If they can't answer, the recommendation isn't grounded in evidence — it's grounded in narrative. Andy Grove built Intel's strategic planning around "strategic inflection points" — moments when new evidence demanded wholesale revision of the operating thesis. The discipline wasn't identifying the inflection point. It was building an organisation willing to act on it when the evidence arrived, even when acting meant abandoning a profitable strategy.
Common misapplication: The most dangerous misuse is treating the method as a single pass rather than a cycle. Running one experiment, getting a positive result, and declaring the hypothesis "proven" isn't science — it's a press release. The method requires repeated testing, ideally under varied conditions, with genuine attempts at falsification. A single A/B test that shows a 3% conversion lift is a data point. Ten tests across different cohorts, geographies, and time periods that consistently show a lift — that's evidence. The distinction between a data point and evidence is where most organisations stop too early.
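The arithmetic behind the gap between a data point and evidence can be sketched directly. The two helper functions below are hypothetical, and they rest on simplifying assumptions: there is no real effect, every test is independent, and each uses the conventional 5% false-positive rate. Real tests are rarely this clean, so treat the numbers as an intuition pump rather than a calculation to rely on.

```python
def chance_of_any_false_positive(num_comparisons: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several independent null comparisons
    crosses the significance bar purely by chance."""
    return 1.0 - (1.0 - alpha) ** num_comparisons


def chance_all_replications_false(num_replications: int, alpha: float = 0.05) -> float:
    """Probability that every one of several independent replications is a
    false positive, assuming there is no real effect."""
    return alpha ** num_replications


print(f"{chance_of_any_false_positive(1):.3f}")    # one test, one metric
print(f"{chance_of_any_false_positive(30):.3f}")   # thirty comparisons checked at once
print(f"{chance_all_replications_false(10):.2e}")  # ten independent confirmations
```

Under these assumptions a single positive result has a 5% chance of being noise, scanning thirty comparisons for a winner turns up at least one spurious "win" about 79% of the time, and ten independent confirmations of a non-existent effect are vanishingly unlikely. Repetition is what converts a result into evidence.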
Section 4
The Mechanism
Section 5
Founders & Leaders in Action
The scientific method isn't confined to laboratories. Its most consequential applications over the past century have come from people who applied systematic hypothesis testing to domains where intuition, authority, and convention had previously governed — and produced results that convention said were impossible.
What unifies these cases isn't scientific training, though several have it. It's the willingness to subordinate opinion to evidence, to treat negative results as information rather than failure, and to iterate through the observe-hypothesise-test-revise cycle faster than competitors who operate on instinct and committee consensus.
The range is instructive: a physicist deriving the behaviour of subatomic particles, a chemist isolating unknown elements by hand, a CEO building experimentation infrastructure at planetary scale, a mathematician applying hypothesis testing to financial markets, and a rocket engineer treating explosions as data. The method is domain-agnostic. The discipline is universal.
Richard Feynman's approach to quantum electrodynamics in the late 1940s was the scientific method operating at its highest frequency. The existing formalism for calculating electron-photon interactions was technically correct but computationally impractical: pages of integrals for a single scattering event, each containing divergent infinities that had to be painstakingly removed.
Feynman didn't accept the existing framework and refine it. He went back to the observable phenomena — scattering cross-sections, energy levels, magnetic moments — and asked what formalism would predict those observables most directly. His path integral formulation and the diagrams that followed (now universally called Feynman diagrams) weren't just a shortcut. They were a reconceptualisation derived from insisting that theory serve prediction, not the other way around.
The diagrams let physicists calculate in hours what previously took weeks. They predicted the electron's anomalous magnetic moment to ten decimal places — the most precise agreement between theory and experiment in the history of physics. Seventy-five years later, the diagrams remain the standard notation in particle physics. Not because they're elegant — though they are — but because they work. The predictions match the measurements. That's all the method asks.
Feynman's 1974 Caltech commencement address, "Cargo Cult Science," distilled his philosophy: the method requires "a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty." He described South Pacific cargo cults that built imitation airstrips, complete with wooden headphones and bamboo antennas, hoping to attract planes. Everything looked right. Nothing worked. The analogy to organisations that perform experimental theatre without intellectual honesty — running tests designed to confirm, not to falsify — was precise and devastating.
In 1897, Marie Curie began investigating Henri Becquerel's recent discovery that uranium compounds emitted mysterious rays. Her first hypothesis was straightforward: the radiation was a property of uranium itself, not a chemical reaction. She tested it by measuring the radiation from every uranium compound she could obtain, and from uranium metal, and found that the radiation intensity was proportional to the amount of uranium present — regardless of the compound's chemical form. Hypothesis confirmed.
Then she noticed something that didn't fit. Pitchblende — the ore from which uranium was extracted — was more radioactive than pure uranium. This anomaly was the critical observation. Pitchblende shouldn't be more radioactive than its own primary component unless the ore contained another, unknown radioactive element.
Curie spent the next four years testing this hypothesis through what can only be described as industrial-scale experimentation: chemically processing tons of pitchblende in an unheated shed, separating it into fractions, and measuring the radioactivity of each fraction to track the unknown element through the separation process. She and Pierre discovered two new elements — polonium in July 1898 and radium in December 1898. She isolated pure radium chloride in 1902, confirming the hypothesis with a measurable atomic weight.
The four-year process was the scientific method operating under conditions of extreme physical hardship and near-zero funding. Each separation was a test. Each measurement refined the hypothesis. The protocol was so rigorous that her notebooks remain too radioactive to handle without protective equipment, over a century later.
Curie won two Nobel Prizes — Physics in 1903 and Chemistry in 1911 — the only person to win in two different sciences. Both awards recognised not just the discoveries but the experimental methodology: systematic, quantitative, replicable. Her work established radioactivity as a measurable physical property, not a curiosity, and created the template for experimental nuclear physics that would dominate the twentieth century. The method, faithfully applied under adverse conditions, didn't just find two elements. It founded a field.
Jeff Bezos summarised Amazon's operating philosophy this way: "Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day." This isn't metaphor. Amazon's product development runs on a formalised version of the scientific method that Bezos embedded in the company's operational DNA from its earliest years.
Every significant initiative at Amazon begins with a six-page narrative memo that functions as a written hypothesis: here is what we believe, here is the evidence supporting it, and here are the specific predictions we're making about customer behaviour. The memo format forces precision that PowerPoint obscures — you cannot hide a weak hypothesis behind bullet points and transitions.
The experimentation infrastructure is industrial. Amazon reportedly ran over 12,000 A/B tests in a single year during the mid-2010s, each structured with pre-defined success criteria. The one-click purchase button, the recommendation engine, Prime's free shipping threshold, the Kindle hardware pricing strategy — each emerged from iterative testing cycles where negative results were as valued as positive ones because they narrowed the hypothesis space.
AWS itself began as an internal experiment. The hypothesis: developers outside Amazon would pay for the same infrastructure tools Amazon had built for itself. Bezos didn't commission a market study. He tested it. The first AWS services launched in 2006. By 2024, AWS generated over $90 billion in annual revenue. The hypothesis survived contact with reality — which is all the method ever promises.
Jim Simons built Renaissance Technologies on an explicit translation of the scientific method into quantitative finance. His hiring criterion was revealing: he recruited mathematicians, physicists, and computational linguists (scientists accustomed to the hypothesis-test-revise cycle) and avoided Wall Street traders accustomed to operating on conviction and narrative.
The Medallion Fund's process was structurally identical to bench science. Researchers identified statistical patterns in historical market data (observation). They formulated models predicting that those patterns would persist (hypothesis). They tested those models against out-of-sample data that the model hadn't seen (experiment). They measured performance against predicted outcomes (analysis). Models that failed the out-of-sample test were discarded without sentiment. Models that passed entered live trading — where they continued to be monitored against predictions and retired when performance degraded.
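The fit-then-test-on-unseen-data step can be illustrated with a toy example. This is a sketch under loud assumptions, not a description of Renaissance's actual models, which are not public: the data is synthetic, the "model" is a one-parameter sign rule, and the function names and the 52% bar are hypothetical. The only point is structural: the rule is chosen on one slice of history and judged on a slice it never saw.

```python
import random


def fit_sign_rule(train: list[tuple[float, float]]) -> int:
    """'Hypothesis': the next move has the same sign as the signal (+1) or
    the opposite sign (-1). Pick whichever fit the training data better."""
    agree = sum(1 for signal, move in train if signal * move > 0)
    return 1 if agree >= len(train) - agree else -1


def hit_rate(rule: int, data: list[tuple[float, float]]) -> float:
    """Fraction of observations where the rule called the direction correctly."""
    hits = sum(1 for signal, move in data if rule * signal * move > 0)
    return hits / len(data)


def evaluate(data: list[tuple[float, float]], split: float = 0.7,
             min_hit_rate: float = 0.52) -> str:
    """Fit on the first part of the history, judge only on the unseen remainder."""
    cut = int(len(data) * split)
    train, holdout = data[:cut], data[cut:]
    if not train or not holdout:
        raise ValueError("need data on both sides of the split")
    rule = fit_sign_rule(train)
    return "keep" if hit_rate(rule, holdout) >= min_hit_rate else "discard"


# Synthetic history: the signal carries no information about the move,
# so any pattern found in training is noise.
rng = random.Random(0)
noise = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
print(evaluate(noise))
```

The acceptance bar (`min_hit_rate`) is set before the holdout is examined, which is the same pre-commitment the rest of this page describes. A rule that only looked good in-sample gets no second hearing.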
The fund averaged roughly 66% gross annual returns from 1988 to 2018 — a track record unmatched in the history of finance. Simons attributed the performance not to any single brilliant insight but to the velocity and honesty of the revision cycle. "We're right about 50.75% of the time," he reportedly said. "But we're 100% disciplined about cutting what's wrong."
That formulation — marginal accuracy compounded by rigorous error correction — is the scientific method's value proposition expressed as a return profile. The fund was closed to outside investors from 1993 onward, partly because external capital creates narrative pressure that corrupts the willingness to discard models when the data demands it. Simons understood that the method's integrity depended on insulating the revision process from the social costs of admitting error.
SpaceX's development methodology is the scientific method applied to hardware at a cadence the aerospace industry considered reckless. The traditional approach — inherited from NASA's Apollo-era protocols — required exhaustive paper analysis, multi-year design reviews, and zero tolerance for test failures. SpaceX inverted the protocol: build fast, test fast, fail fast, learn from the failure data, revise, test again.
The Starship programme epitomises the approach. Between 2019 and 2024, SpaceX built and flew a series of full-scale prototypes, many of which exploded spectacularly on landing. Each explosion was an experiment generating data that simulations couldn't replicate. SN8 in December 2020 executed a perfect belly-flop manoeuvre but landed too hard and detonated. SN9 repeated the flight profile with revised engine relight timing. SN10 landed, then exploded minutes later. SN15, incorporating revisions from all prior failures, landed successfully in May 2021.
The pattern — observe failure, form hypothesis about cause, modify design, test again — is the same cycle Curie used to isolate radium. The domain is different. The logic is identical. Each failed prototype cost millions but generated data that simulations alone could never produce. The traditional aerospace approach — analyse for years, build once, pray it works — is analogous to a scientist who reads the literature but never enters the lab.
Musk's willingness to accept visible, public failure as a feature of the process, rather than a career risk to be avoided, is what separates SpaceX's development velocity from the rest of the industry. When Starship's first integrated flight test in April 2023 ended in a mid-air explosion, Musk's team had cameras and telemetry capturing every millisecond. Within weeks, the failure data had been incorporated into design revisions. By the fourth flight test in June 2024, every major objective was achieved. Four cycles. Fourteen months. The method, compressed to its tightest operational tempo.
Section 6
Visual Explanation
The scientific method as an iterative cycle — observe, hypothesise, predict, test, analyse. Falsification drives revision. Survival through testing builds confidence.
Section 7
Connected Models
The scientific method doesn't operate in isolation. It interacts with other mental models in ways that either amplify its effectiveness or create productive tensions that sharpen thinking. Understanding these connections — which models compound the method's power, which ones resist it, and which ones naturally follow from it — turns a single tool into a network.
Reinforces
First Principles Thinking
First principles decomposes a problem to its fundamental truths. The scientific method tests whether those truths are actually true. The two form a natural sequence: decompose, then verify. Elon Musk's battery cost analysis was first principles reasoning (what do the materials actually cost?) followed by scientific method (build prototypes, test at scale, measure whether the cost reduction holds in production). First principles without testing is speculation. Testing without decomposition is undirected. Together, they produce conclusions grounded in both logic and evidence.
Reinforces
Bayes' Theorem
Bayesian updating is the mathematical engine inside the method's revision step. When a test produces results, Bayes tells you how much to revise your confidence: in proportion to how much more likely the result was under one hypothesis than under the other. A result that was equally likely under both your hypothesis and its alternative teaches you nothing. A result that was highly unlikely under your hypothesis but expected under the alternative demands a large update. Jim Simons's Medallion Fund illustrates the connection: every model was a hypothesis, every trade was a test, and every outcome was evidence for raising or lowering confidence in the model.
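The two cases in that paragraph can be worked through numerically. The sketch below applies Bayes' theorem to a single hypothesis and its alternative. The function name and all the probabilities are hypothetical, chosen only to make the contrast visible.

```python
def posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Bayes' theorem for a hypothesis H and its alternative:
    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    if denominator == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


prior = 0.60  # hypothetical starting confidence in the hypothesis

# Result equally likely either way: no information, so no update.
print(round(posterior(prior, 0.30, 0.30), 3))

# Result unlikely under the hypothesis, expected under the alternative: large update.
print(round(posterior(prior, 0.05, 0.80), 3))
```

With these illustrative inputs, confidence stays at 0.6 in the first case and falls to roughly 0.09 in the second. The size of the revision depends on the ratio of the two likelihoods, not on how dramatic the result feels.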
Tension
Narrative Fallacy
The scientific method demands that conclusions follow from data. The narrative fallacy — Nassim Taleb's term for the compulsive human tendency to impose coherent stories on random events — demands that data follow from conclusions. The tension is structural: humans remember stories, not datasets, and a compelling narrative about why a product will succeed can survive any amount of contradictory A/B test data if the storyteller is charismatic enough. The method is the antidote, but an antidote that requires constant reapplication because the narrative instinct never diminishes.
Section 8
One Key Quote
"The first principle is that you must not fool yourself — and you are the easiest person to fool."
— Richard Feynman, Cargo Cult Science, Caltech Commencement Address, 1974
Section 9
Analyst's Take
Faster Than Normal — Editorial View
The scientific method is the most powerful general-purpose reasoning tool humans have produced. It is also the most commonly misapplied, because the gap between performing the method and performing its theatre is invisible to people doing the latter.
The theatre looks like this: a team runs an experiment, gets a positive result, declares victory, and ships. They never defined a falsification criterion in advance. They never considered what a negative result would look like. They ran one test, not ten. They interpreted a noisy signal as a clean confirmation because the result aligned with what the team's most senior person already believed. This isn't the scientific method. It's bureaucratic confirmation bias with an experimental veneer.
The real method is uncomfortable. It requires you to specify, in writing, before you see any data, what result would change your mind. That act of pre-committing to a revision criterion is where most so-called "data-driven" organisations fail. They're willing to collect data. They're not willing to let data overrule the strategy they've already committed to emotionally and politically.
The founders and investors who use this model effectively share a specific trait: they treat negative results as the primary output, not a failure mode. Bezos celebrates failed experiments at Amazon because each one eliminates a hypothesis that would otherwise have consumed resources indefinitely. Simons's researchers at Renaissance discard models without sentiment when out-of-sample tests fail. SpaceX engineers analyse explosion footage with the same dispassion a chemist brings to a failed synthesis. The information content of a well-structured failure is often higher than the information content of a success, because failure is specific — it tells you exactly what went wrong — while success is ambiguous — it could be due to your hypothesis being correct, or to luck, or to a confounding variable you didn't measure.
The organisational challenge is severe. The method requires a culture where being wrong is cheap and being dishonest about data is expensive. Most organisations have the incentives inverted: being wrong is career-threatening and massaging data is rewarded with continued funding. Pharmaceutical companies that suppress unfavourable trial results, startups that cherry-pick cohorts for board presentations, research labs that publish only positive findings — all are systematically degrading the method's error-correction mechanism. The method doesn't fail. The institutions hosting it fail.
There's a cost dimension that most discussions of the method ignore. Each test consumes time, capital, attention, and sometimes credibility. SpaceX's exploding prototypes cost millions per failure. Amazon's failed product launches — the Fire Phone, Amazon Destinations, Amazon Local — consumed years of engineering effort. The method doesn't say "test everything." It says "test the assumptions that carry the most risk at the lowest cost." Designing the right experiment is as important as running it. The founders who waste resources running expensive tests on hypotheses that could have been falsified with a spreadsheet and three phone calls are misapplying the model as badly as those who skip testing entirely.
Section 10
Test Yourself
The scientific method sounds simple in description and proves surprisingly difficult in application — particularly under organisational pressure, where admitting a hypothesis has failed carries real social and political cost.
These scenarios test whether you can distinguish genuine hypothesis testing from its common impersonators — confirmation theatre, post-hoc rationalisation, and selective evidence interpretation.
Is this mental model at work here?
Scenario 1
A SaaS company's growth team runs 30 variations of a landing page simultaneously, finds that one variant outperforms the control by 12%, and immediately rolls it out to all traffic. They did not pre-register which metric defined success, and the winning variant was chosen after the fact as the best performer on whichever of several metrics looked strongest.
Scenario 2
A pharmaceutical company runs a Phase III clinical trial for a cancer drug. The trial is double-blinded, randomised, with a pre-registered primary endpoint (overall survival at 24 months). The drug fails to meet the primary endpoint. The company publishes the full results, including the negative outcome, in a peer-reviewed journal.
Scenario 3
A startup founder surveys 500 potential customers, finds that 85% say they would 'definitely' or 'probably' purchase the product at $29/month, and uses this data to raise a Series A. The survey was conducted with a self-selected audience from the founder's existing social media following.
Scenario 4
In 2011, physicists on the OPERA experiment, which timed neutrinos sent from CERN to a detector at Gran Sasso in Italy, observed neutrinos appearing to travel faster than light. If confirmed, the result would have overturned Einstein's special relativity. Rather than announcing a discovery, the team published the anomalous result and invited the global physics community to identify errors in their methodology. Within months, a loose fibre optic cable was found to have introduced a timing error of 73 nanoseconds.
Section 11
Top Resources
The best resources on the scientific method span its philosophical foundations, its historical development, and its application beyond laboratory science. Start with Popper for the logic, move to Kuhn for the sociology, and finish with the practitioners who translated the method into operational discipline outside the lab.
The foundational text on falsificationism. Popper's argument — that what distinguishes science from non-science is the willingness to specify conditions under which a theory would be wrong — remains the sharpest demarcation criterion available. Dense but essential. Read it alongside his more accessible Conjectures and Refutations (1963) if the original proves too demanding.
Kuhn's account of how scientific paradigms form, harden, accumulate anomalies, and eventually collapse under the weight of contradictory evidence. The book introduced "paradigm shift" into common usage but its deeper value is its sociology of science — the mechanisms by which scientific communities resist revision until resistance becomes untenable. Essential for anyone who wants to understand why organisations ignore evidence that contradicts their operating assumptions.
Feynman's memoir is the scientific method in action across physics, biology, art, and safecracking. His "Cargo Cult Science" commencement address (available separately) is the single best short essay on the difference between performing the method and performing its theatre. His insistence on deriving results from first principles rather than trusting authority is the method's spirit distilled to its purest form.
Zuckerman's account of Jim Simons and Renaissance Technologies is the most detailed available study of the scientific method applied to financial markets. The book documents how physicists and mathematicians built a systematic process of hypothesis testing, out-of-sample validation, and iterative revision that produced the greatest track record in investment history. Read it as a case study in what happens when you staff a trading floor with scientists instead of traders.
Bezos's collected shareholder letters and speeches reveal Amazon's experimental culture from the inside. His repeated emphasis on the "institutional yes" — making it easy to run experiments and hard to block them — is the scientific method translated into corporate governance. The 2015 letter on the distinction between "one-way door" and "two-way door" decisions is particularly valuable: it explains when experimentation is warranted and when the cost of testing exceeds the cost of deciding.
Tension
Confirmation Bias
Confirmation bias is the method's oldest and most persistent enemy. The method says: design experiments that could falsify your hypothesis. Confirmation bias says: design experiments that will validate it. Every scientist, founder, and investor who has ever unconsciously structured a test to produce the answer they wanted has experienced this tension. Peter Wason's 1960 rule-discovery experiments showed that most subjects tested only cases that fit their guess, even though a single disconfirming case would have told them more. The method's formal protocols (blinding, randomisation, pre-registration of hypotheses) exist precisely because the human brain will not pursue falsification voluntarily.
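Randomisation and blinding, two of the protocols named above, can be sketched in a few lines. This is a minimal illustration, not a trial-grade procedure: the function name `assign_blinded`, the arm labels, the opaque `group-A`/`group-B` codes, and the seed are all assumptions made for the example.

```python
import random


def assign_blinded(subject_ids, arms=("treatment", "control"), seed=0):
    """Randomly deal subjects into arms, hidden behind opaque group codes.

    Returns (allocation, key). `allocation` maps each subject to a code
    such as "group-A"; `key` maps codes back to arms and is meant to stay
    sealed until outcomes have been scored.
    """
    rng = random.Random(seed)  # seeded so the allocation can be audited

    # Randomisation: chance, not the experimenter, decides who goes where.
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)

    # Blinding: shuffle the codes so "group-A" does not always mean the
    # first arm, then deal subjects round-robin onto the codes.
    codes = [f"group-{chr(ord('A') + i)}" for i in range(len(arms))]
    rng.shuffle(codes)
    key = dict(zip(codes, arms))
    allocation = {sid: codes[i % len(codes)] for i, sid in enumerate(shuffled)}
    return allocation, key


# Hypothetical usage: twelve subjects, as in Lind's trial, in two arms.
allocation, key = assign_blinded(range(1, 13), seed=1747)
```

Whoever scores the outcomes sees only `allocation`. Because the experimenter cannot steer subjects into the arm they expect to do well, the test stays capable of producing an answer they did not want.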
Leads-to
[Feedback](/mental-models/feedback) Loops
The scientific method is itself a feedback loop: observation feeds hypothesis, which feeds prediction, which feeds testing, which feeds revised observation. But the method also reveals feedback loops in the systems it studies. Curie's observation that pitchblende was more radioactive than pure uranium was a signal from a feedback loop she hadn't yet identified — the decay chain of radioactive elements, where one element's decay product is another radioactive element. Understanding the method trains you to see iterative, self-reinforcing processes everywhere, because the method's own structure makes recursive causation intuitive.
Leads-to
Iteration [Velocity](/mental-models/velocity)
Once you've internalised the method, the natural question becomes: how fast can we cycle? The gap between SpaceX and traditional aerospace isn't the quality of their engineers. It's the speed of their test-revise loop. Amazon's 12,000 annual A/B tests aren't better individually than a competitor's 200; they compound faster. The scientific method leads directly to iteration velocity as a competitive variable, because the method's value grows with the number of cycles you can complete before your resources run out or your competitors catch up.
The connection has a practical implication for founders: invest in the infrastructure that makes testing cheaper and faster before you invest in the tests themselves. Amazon built its experimentation platform before it ran experiments at scale. Renaissance built its data infrastructure before it tested models at speed. The method's returns compound with cycle velocity, and cycle velocity is an engineering problem that yields to investment.
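A back-of-envelope calculation makes the compounding claim concrete. The test counts (12,000 and 200) come from the text above. The win rate and the uplift per winning test are purely hypothetical, and the model assumes every test is independent and that wins multiply together, which real experiments rarely do so cleanly.

```python
def compounded_gain(tests_per_year, win_rate, uplift_per_win):
    """Expected multiplicative improvement after one year of testing.

    Illustrative model only: each test is independent, wins with
    probability `win_rate`, and a win multiplies the metric by
    (1 + uplift_per_win). Losing tests leave the metric unchanged.
    """
    expected_step = 1 + win_rate * uplift_per_win
    return expected_step ** tests_per_year


# Hypothetical inputs: one test in ten wins, and a win is worth 0.1%.
fast = compounded_gain(12_000, win_rate=0.10, uplift_per_win=0.001)
slow = compounded_gain(200, win_rate=0.10, uplift_per_win=0.001)

print(f"12,000 tests/year: {fast:.2f}x")
print(f"   200 tests/year: {slow:.2f}x")
```

Under these assumptions the faster cycle ends the year at roughly 3.3x against roughly 1.02x, even though no individual test is any better. The exact figures matter less than the shape: the gap comes entirely from the exponent.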
Experiments are not free.
The relationship between the method and intuition deserves more attention than it gets. Intuition is not the enemy of the method — it's the source of hypotheses. Curie's intuition that pitchblende contained an unknown element preceded the four years of experimentation that confirmed it. Feynman's visual intuition about particle interactions preceded the mathematical formalism of his diagrams. The method doesn't replace intuition. It disciplines it — subjecting intuitive leaps to structured verification before they harden into unexamined assumptions. The founders I respect most have strong intuitions and weak attachments to them. They generate hypotheses fast and kill them faster.
One more thing worth stating directly: the method does not produce certainty. It produces provisional confidence proportional to the severity of the tests survived. A hypothesis that has passed a thousand attempts at falsification is not proven true — it has merely not been proven false yet. Newton's gravitational theory passed every test for over two centuries before Einstein showed it was an approximation. The intellectual humility this demands — holding your best conclusions lightly, always ready to revise — is the method's hardest requirement and its deepest gift. The founders who operate this way don't just make better decisions. They make better second decisions, and third decisions, because the revision habit compounds into an increasingly accurate model of reality.
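One way to read "provisional confidence proportional to the severity of the tests survived" is through Bayes' rule. This is a sketch under stated assumptions, not a claim about how any scientist actually reasons: the function name, the prior of one half, and the pass probabilities are hypothetical. A true hypothesis is assumed to pass every test, and a "severe" test is one a false hypothesis would rarely pass.

```python
from fractions import Fraction


def posterior_after_passes(prior, pass_if_false, n_tests):
    """Posterior probability of a hypothesis after it passes n_tests tests.

    Assumes independent tests, P(pass | true) = 1, and
    P(pass | false) = pass_if_false. Smaller pass_if_false means a more
    severe test. Exact fractions are used so that rounding cannot
    silently turn "very likely" into "certain".
    """
    prior = Fraction(prior)
    survive_if_false = Fraction(pass_if_false) ** n_tests
    # Bayes' rule: P(true | all passed).
    return prior / (prior + (1 - prior) * survive_if_false)


half = Fraction(1, 2)  # hypothetical starting confidence

lenient = posterior_after_passes(half, Fraction(9, 10), 10)    # weak tests
severe = posterior_after_passes(half, Fraction(1, 5), 10)      # hard tests
veteran = posterior_after_passes(half, Fraction(1, 5), 1000)   # many hard tests

print(f"ten lenient tests: {float(lenient):.2f}")
print(f"thousand severe tests still short of certainty: {veteran < 1}")
```

Ten lenient tests move the posterior to roughly 0.74, ten severe ones move it far closer to one, and a thousand severe ones still leave it strictly below one. That matches the point above: survival earns confidence, never proof.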