When a measure becomes a target, it ceases to be a good measure.
The sentence is thirteen words long and describes a failure mode so pervasive that it operates, undetected, inside virtually every organisation that uses quantitative targets, which is to say, virtually every organisation.
Charles Goodhart, a British economist advising the Bank of England, first articulated this principle in a 1975 paper on monetary policy. His original formulation was narrower — "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes" — and was aimed specifically at the problem of using monetary aggregates as policy targets. If the central bank targeted M3 money supply growth, the relationship between M3 and inflation that had made M3 a useful indicator would break down, because financial institutions would find ways to shift activity outside the measured category.
The anthropologist Marilyn Strathern generalised Goodhart's observation in 1997 into the version that has become canonical: "When a measure becomes a target, it ceases to be a good measure." The expansion was decisive. Goodhart was talking about monetary economics. Strathern was talking about everything.
The mechanism is straightforward and ruthless. A metric is chosen because it correlates with something you care about — customer satisfaction, educational achievement, code quality, national productivity. The metric is useful precisely because people aren't trying to manipulate it. Then you make it a target. You attach rewards to it, or punishments for missing it.
At that moment, the humans in the system redirect their intelligence and effort from producing the underlying outcome to producing the number. The correlation between the metric and the thing it was supposed to measure begins to decay. Not because people are malicious, but because they are rational. The metric is legible, concrete, and attached to consequences. The underlying goal is ambiguous, multidimensional, and hard to verify. Rational agents, operating under time and resource constraints, will always converge on the legible target.
Soviet factories provide the most cited historical illustration. When Moscow set production targets by weight, factories produced absurdly heavy chandeliers and nails the size of railroad spikes. When targets shifted to unit count, factories produced millions of tiny, useless nails. The factories hit every target. The economy got neither the chandeliers nor the nails it needed. The planners weren't stupid. They were fighting a structural law: the act of targeting a proxy for quality predictably severs the proxy from quality.
British colonial India supplies the same dynamic in a different guise, at least according to a widely repeated and possibly apocryphal anecdote. The British government in Delhi, concerned about the number of venomous cobras, offered a bounty for every dead cobra presented. Enterprising citizens began breeding cobras for the income. When the authorities discovered the farms and cancelled the bounty, the breeders released their now-worthless snakes, increasing the cobra population beyond its original level. The "cobra effect," as it's now known, is Goodhart's Law operating through the gap between what was measured (dead cobras) and what was desired (fewer living cobras). The metric responded perfectly to the incentive. The outcome moved in the opposite direction.
The phenomenon isn't confined to command economies or colonial misadventures. The No Child Left Behind Act of 2001 measured school quality by standardised test scores in reading and maths. Schools responded rationally: they narrowed curricula to tested subjects, eliminated art, music, and science instruction, and focused resources on students near the proficiency cutoff — the "bubble kids" — whose score improvements would most efficiently move the metric. A 2007 RAND Corporation study found that score gains on state-specific tests were two to five times larger than gains on the National Assessment of Educational Progress, which measured the same skills but carried no institutional stakes. The schools were producing scores, not learning. Goodhart's Law explains the gap.
In technology, the pattern recurs with metronomic regularity. When YouTube optimised for watch time in 2012, its recommendation algorithm began surfacing increasingly extreme and sensationalist content — because outrage, conspiracy, and emotional escalation held eyeballs longest. Watch time rose. The quality of the user experience — the thing watch time was supposed to correlate with — degraded.
When Facebook optimised for engagement, its algorithm learned that divisive political content generated more clicks, comments, and shares than any other category. Engagement rose. Social cohesion, and Facebook's own brand equity, fell.
In each case, the metric was originally a reasonable proxy. The moment it became the target, the system optimised for the proxy at the expense of the thing the proxy was supposed to represent.
The financial sector generates the highest-stakes examples. When banks targeted quarterly earnings-per-share growth in the 2000s, executives discovered that share buybacks — funded by debt, with no corresponding improvement in underlying business performance — were the fastest way to move the number. EPS rose because the denominator (shares outstanding) shrank, not because the numerator (earnings) grew.
Between 2003 and 2023, S&P 500 companies spent over $8 trillion on buybacks, and a substantial portion of that spending was driven not by genuine capital allocation logic but by executive compensation tied to EPS targets. The metric rewarded financial engineering. The underlying businesses received the investment that was left over.
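The denominator effect is easy to check with arithmetic. The sketch below uses a hypothetical company (every figure is an illustrative assumption, not data from any real firm) whose earnings stay flat while it retires 10% of its shares. It ignores the interest cost of any debt used to fund the buyback, which would reduce earnings slightly.

```python
def eps(net_income: float, shares_outstanding: float) -> float:
    """Earnings per share: net income divided by shares outstanding."""
    return net_income / shares_outstanding


# Hypothetical company: earnings are identical before and after the buyback.
net_income = 1_000_000_000        # $1.0bn, unchanged
shares_before = 500_000_000
shares_repurchased = 50_000_000   # 10% of the share count is bought back

eps_before = eps(net_income, shares_before)
eps_after = eps(net_income, shares_before - shares_repurchased)
eps_growth = eps_after / eps_before - 1

print(f"EPS before buyback: ${eps_before:.2f}")
print(f"EPS after buyback:  ${eps_after:.2f}")
print(f"EPS growth with zero earnings growth: {eps_growth:.1%}")
```

Under these assumptions, EPS rises by roughly 11% although the business earned exactly the same amount. An executive paid on EPS growth is rewarded for the change in share count alone.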
The deepest version of Goodhart's Law operates at the level of incentive design. Every KPI dashboard, every quarterly OKR, every compensation structure tied to measurable outputs is a Goodhart's Law experiment. The question is never whether gaming will occur. Gaming is the predicted response of intelligent agents to any explicit target. The question is whether the game that emerges from the target system produces outcomes aligned with what you actually want — or outcomes that merely produce numbers you like looking at while the real performance deteriorates underneath.
Section 2
How to See It
Goodhart's Law is operating wherever the numbers look excellent and the reality feels wrong. The tell is a growing gap between the dashboard and the ground truth — a divergence that everyone senses but nobody wants to name because naming it means admitting the measurement system is broken.
Business
You're seeing Goodhart's Law when Wells Fargo employees open 3.5 million fake accounts between 2002 and 2016 to meet cross-selling quotas. The metric — "products per household," targeted at eight — was selected because customers with more products were more profitable and less likely to churn. True, as a correlational observation. Catastrophic, as a target. The branch employees didn't become fraudsters because they were dishonest. They became fraudsters because the shortest path between the target and the reward ran through fabrication. The metric hit 6.11 products per household. The bank paid $3 billion in fines and its CEO resigned. The number was pristine. The reality was criminal.
Technology
You're seeing Goodhart's Law when a software engineering team, measured on lines of code committed per sprint, begins writing verbose, redundant implementations and splitting single logical changes into multiple commits. The metric rises. Code quality falls. Review cycles lengthen. Bugs per release increase. The team is producing more of what is counted and less of what counts. Any experienced engineering leader has seen a version of this — the deployment frequency metric that produces meaningless micro-deploys, the ticket closure rate that incentivises splitting work into trivially small units, the uptime SLA that is gamed by redefining what constitutes "downtime."
Education
You're seeing Goodhart's Law when universities climb in the U.S. News & World Report rankings by gaming the methodology rather than improving education. Northeastern University rose from No. 162 to No. 49 between 1996 and 2013 through systematic reverse-engineering of the ranking algorithm: increasing application volume to lower acceptance rates, reclassifying alumni donations to inflate giving rates, and adjusting class sizes to hit threshold breakpoints in the formula. The ranking improved. Whether the education improved at the same rate is a question the ranking is structurally incapable of answering, because the ranking is now the target.
Healthcare
You're seeing Goodhart's Law when a hospital system reduces reported surgical mortality rates by refusing to operate on the sickest patients. New York State's public reporting of cardiac surgery mortality by surgeon, introduced in 1989, produced a measurable decline in reported mortality. It also produced evidence that surgeons were declining high-risk cases — patients who most needed intervention — because those patients would damage the surgeon's public scorecard. A 2005 study in the Journal of Health Economics found that risk-adjusted mortality improvements were significantly smaller than raw mortality improvements, suggesting that selection effects accounted for much of the apparent gain. The metric improved. The patients who were turned away didn't appear in the data at all.
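The selection effect can be reproduced with a toy case mix. In the sketch below, the patient counts and per-group mortality rates are invented for illustration. The surgeon's skill, represented by the mortality rate within each risk group, never changes; only the mix of patients accepted for surgery does.

```python
from dataclasses import dataclass


@dataclass
class RiskGroup:
    name: str
    patients: int
    mortality_rate: float  # probability of death after surgery in this group


def raw_mortality(groups: list[RiskGroup]) -> float:
    """Deaths divided by operations, with no adjustment for case mix."""
    operations = sum(g.patients for g in groups)
    deaths = sum(g.patients * g.mortality_rate for g in groups)
    return deaths / operations


# Hypothetical case mix before public reporting.
before = [RiskGroup("low risk", 900, 0.01), RiskGroup("high risk", 100, 0.10)]
# After reporting begins, 60 of the 100 high-risk patients are turned away.
after = [RiskGroup("low risk", 900, 0.01), RiskGroup("high risk", 40, 0.10)]

rate_before = raw_mortality(before)
rate_after = raw_mortality(after)

print(f"Reported mortality before: {rate_before:.2%}")
print(f"Reported mortality after:  {rate_after:.2%}")
```

The reported rate falls even though no patient in either group is safer than before. The 60 patients who were declined simply leave the denominator, which is why a raw scorecard cannot distinguish better surgery from pickier surgeons.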
Section 3
How to Use It
The primary application of Goodhart's Law is defensive: before attaching any incentive to any metric, ask what behaviour the metric will produce when it becomes the target — not just the intended behaviour, but the unintended optimisation paths that intelligent agents will discover. The operating assumption should be adversarial: assume the smartest person on your team will find the cheapest way to move the number, and design the metric system so that the cheapest path is also the most valuable one.
Decision filter
"For every metric you're about to target, ask: if my team could only optimise this number and nothing else, what is the worst behaviour that would still technically hit the target? If that behaviour is destructive, the metric needs a counterweight or a redesign before you attach incentives to it."
As a founder
Build measurement systems that pair output metrics with input metrics and quality counterweights. Revenue alone is gameable — salespeople close unprofitable deals, offer unsustainable discounts, or pull forward future revenue into the current quarter. Revenue paired with gross margin, customer retention at 90 days, and NPS creates a system where gaming any single metric damages another. The goal isn't a perfect metric — that doesn't exist. The goal is a constellation of metrics where optimising for one at the expense of the others produces a visible distortion that triggers review.
Jeff Bezos embedded this logic into Amazon's operating system. The company's internal metrics emphasise controllable input metrics — selection breadth, delivery speed, page load time — rather than output metrics like revenue or stock price. The reasoning: output metrics are lagging indicators that invite manipulation. Input metrics are leading indicators of customer experience, and improving them tends to produce the output metrics as a downstream consequence. Amazon tracks hundreds of these input metrics weekly. The dashboard isn't the destination — it's a navigation instrument.
As an investor
When evaluating a company, look at the gap between the metrics management highlights and the metrics they don't discuss. Every earnings call presents the numbers that look best. Goodhart's Law says those are precisely the numbers most likely to have been optimised at the expense of something unmeasured.
Ask specifically: what metrics did the team choose not to target? A management team that can articulate why they've deliberately avoided targeting certain numbers — and what those numbers might do if targeted — demonstrates a sophistication about measurement that correlates with long-term operational quality. The absence of a target can be as revealing as the presence of one.
Charlie Munger observed that the businesses he most trusts are the ones where management spends more time discussing what could go wrong than what is going right. The same principle applies to metrics: the leaders who understand Goodhart's Law spend time on what they're not measuring and why. The investor's job is to determine whether the metrics being presented reflect operational excellence or operational theatre.
As a decision-maker
Rotate metrics. A metric that has been a target for more than two to three years is almost certainly being gamed — consciously or unconsciously. The gaming becomes institutionalised: teams develop processes optimised for the metric rather than the outcome, and those processes calcify into "how we do things here."
Andy Grove's OKR framework at Intel addressed this by requiring that objectives be qualitative and key results be time-bound and refreshed quarterly. The quarterly refresh wasn't just administrative — it was a deliberate anti-Goodhart's mechanism. By changing which key results mattered every ninety days, the system made it harder for teams to build durable gaming strategies. The targets moved before the gaming could mature. Google adopted OKRs in 1999 for the same reason: not because quarterly goals are intrinsically superior, but because rotating targets resist corruption better than fixed ones.
Common misapplication: Goodhart's Law is not an argument against measurement. It's an argument against naive measurement. Founders who respond to the insight by abandoning metrics entirely — "we don't do KPIs because they'll just get gamed" — have replaced one failure mode with another. Without measurement, you have no feedback signal at all, which is worse than a distorted one. The discipline is designing measurement systems that are aware of their own corruptibility — that build in counterweights, rotation, and qualitative review alongside the quantitative targets.
A second misapplication: using Goodhart's Law retroactively to explain any metric that didn't deliver the desired result. Not every failed metric is a Goodhart's failure. Sometimes the metric was simply wrong from the start: it measured the wrong thing, rather than measuring the right thing and then being gamed. The diagnostic question is specific: did the metric improve while the underlying outcome deteriorated? If both declined together, the metric is still tracking the outcome, and the problem lies elsewhere (measurement error or plain underperformance), not in Goodhart's corruption. If the metric improved while the outcome worsened, you're looking at Goodhart's. The divergence is the signature.
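The divergence test above can be written down as a small classifier. This is a minimal sketch: the function name, the labels, and the `tolerance` parameter are illustrative choices, and it assumes you have an independent estimate of the underlying outcome, which is usually the hard part.

```python
def diagnose(metric_change: float, outcome_change: float, tolerance: float = 0.0) -> str:
    """Classify a pair of changes, both signed so that positive means 'better'.

    Changes within +/- tolerance are treated as flat.
    """
    metric_up = metric_change > tolerance
    metric_down = metric_change < -tolerance
    outcome_up = outcome_change > tolerance
    outcome_down = outcome_change < -tolerance

    if metric_up and outcome_down:
        return "goodhart divergence"       # the signature: number up, reality down
    if metric_down and outcome_down:
        return "declined together"         # not Goodhart's: the metric still tracks
    if metric_up and outcome_up:
        return "moving together"
    return "inconclusive"


# Hypothetical readings, expressed as fractional changes.
print(diagnose(0.22, -0.14))   # metric improved, outcome worsened
print(diagnose(-0.05, -0.08))  # both fell
print(diagnose(0.25, 0.00))    # metric improved, outcome flat
```

A flat outcome alongside a rising metric comes back as inconclusive here, because the source's diagnostic only names the case where the outcome actually deteriorates. In practice that pattern is a reason to keep watching.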
Section 4
The Mechanism
Section 5
Founders & Leaders in Action
The leaders who navigate Goodhart's Law successfully share a common trait: they treat measurement as a tool that requires constant recalibration, not a set-and-forget control system. They design for the inevitable gaming rather than pretending it won't occur. What separates them from average operators isn't their choice of metrics — it's their relationship to metrics. They hold every number with suspicion, pair every target with a counterweight, and maintain the organisational discipline to question whether the dashboard is describing reality or performing a flattering simulation of it.
Andy Grove, Intel's longtime CEO, didn't just pioneer OKRs. He built a measurement philosophy explicitly designed to resist Goodhart's corruption. His central insight, articulated in High Output Management (1983), was that every metric must be paired with a "counter-metric" that captures what the first metric might destroy.
Manufacturing yield targets, for instance, were always paired with quality audit results. If a fab line hit its yield numbers but quality scores declined, the signal was immediate: the line was probably reclassifying defective units rather than producing better chips. Production volume was paired with customer return rates. Revenue targets were paired with design win metrics that indicated future competitive position. Hiring speed was paired with new-hire performance reviews at six months.
The principle extended to every function. No metric at Intel existed without its shadow — a second number designed to reveal whether the first was being gamed.
Grove called this "pairing indicators" and was explicit about why: "For every metric, there is a resistance metric. Without the pair, you're only seeing half the picture — the half the organisation wants you to see." The framework acknowledged that his own employees were intelligent agents who would find optimisation paths he hadn't anticipated. Rather than trying to design un-gameable metrics — which he regarded as impossible — he designed metric systems where gaming one dimension produced a visible distortion in another. Intel's quarterly business reviews were structured around these paired metrics, and Grove reportedly spent more time on the gaps between paired indicators than on the indicators themselves.
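The pairing idea translates directly into a review check: flag any pair where the headline metric improved while its counterweight got worse. The sketch below borrows the pair names from the examples above, but the numbers, the `Pair` type, and the function name are hypothetical. It is not a description of Intel's actual tooling.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    metric: str
    counter_metric: str
    metric_change: float    # positive = improved
    counter_change: float   # positive = improved


def flag_gaming_suspects(pairs: list[Pair], tolerance: float = 0.0) -> list[Pair]:
    """Return pairs whose metric improved while the counter-metric worsened."""
    return [
        p for p in pairs
        if p.metric_change > tolerance and p.counter_change < -tolerance
    ]


# Hypothetical quarter-over-quarter changes.
quarter = [
    Pair("manufacturing yield", "quality audit score", +0.04, -0.06),
    Pair("production volume", "customer return rate", +0.10, +0.01),
    Pair("hiring speed", "six-month new-hire review", +0.15, -0.02),
]

for p in flag_gaming_suspects(quarter):
    print(f"Review: {p.metric} improved but {p.counter_metric} worsened")
```

Note the sign convention: a counter-metric such as return rate, where lower is better, has to be converted so that a positive change means improvement before it goes into the check. A flagged pair is a prompt for a conversation, not proof of gaming.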
Jeff Bezos's most underappreciated strategic innovation is Amazon's metric architecture. In the 2009 shareholder letter, he wrote that the company had set 452 detailed goals for the coming year, and that 360 of them (nearly 80%) would have a direct impact on customer experience. The word "revenue" appeared only eight times across all 452 goals, and "free cash flow" only four.
This was a deliberate Goodhart's countermeasure. Bezos understood that targeting revenue directly would produce behaviour optimised for short-term extraction: price increases, cost-cutting that degraded the customer experience, cross-selling that annoyed rather than served. Instead, Amazon targeted the inputs that caused revenue — product selection, delivery speed, page load time, defect rates — on the theory that if those inputs improved continuously, revenue was a mathematical certainty. The insight was structural: output metrics are maximally gameable because the agent controls the measurement. Input metrics are harder to game because they're closer to the physical reality of the customer experience.
The approach required uncommon discipline. When Wall Street analysts pressed Bezos on quarterly earnings, his response was consistent: the moment we target the number you're asking about, we destroy the thing that produces it. For years, Wall Street punished Amazon's stock for this refusal. Analysts wanted earnings guidance. Bezos gave them defect rates. The stock underperformed the market in multiple years during the 2000s. But the measurement architecture held, and the compounding effects of genuinely improving customer experience — rather than performing the appearance of improvement through metric manipulation — eventually produced the output metrics that Wall Street had been demanding all along. Amazon's $574 billion in 2023 revenue is the downstream consequence of a measurement philosophy that refuses to target revenue.
Netflix's metric evolution is a case study in recognising and correcting Goodhart's distortion in real time. In the DVD era, the company tracked subscriber growth as its primary metric. This worked as long as subscriber growth correlated with customer satisfaction. When Netflix launched streaming, the company initially targeted viewing hours — the assumption being that more hours watched meant more value delivered.
By 2016, Reed Hastings, Netflix's co-founder and CEO, began publicly questioning this metric. The problem: targeting viewing hours incentivised the content algorithm to recommend whatever maximised watch time, which often meant autoplaying mediocre content that viewers didn't actively choose and didn't remember enjoying. The metric was rising, but customer satisfaction surveys showed flattening or declining satisfaction among power users. The proxy had decoupled from the underlying value.
Netflix shifted to a composite approach: retention probability, title-level completion rates, and "choose to view" metrics that distinguished active selection from passive autoplay. Hastings described the logic in a 2019 earnings call: "We compete for screen time, but we don't want to win that competition by being the junk food of entertainment. We want members to feel their time was well spent."
The metric redesign was an explicit Goodhart's correction — acknowledging that the original target had begun incentivising content and algorithmic decisions that undermined the customer relationship it was supposed to proxy for. The willingness to retire a metric that was performing well on its own terms, because it had decoupled from the underlying value, is the hallmark of a leadership team that understands measurement corruption. Most companies never make this correction. They celebrate the rising line and wonder, months later, why the business is deteriorating beneath it.
Ed Catmull
Co-founder & President, Pixar, 1986–2019
Catmull built Pixar's creative process around a principle he articulated in Creativity, Inc. (2014): the most important things about a movie's quality are precisely the things that cannot be measured. Story resonance, emotional depth, visual originality — none of these have quantitative proxies that survive being targeted.
Pixar's response was to separate the creative development process entirely from measurable production targets. The "Braintrust" — a group of senior directors and writers who reviewed each film at multiple stages — operated without authority, without metrics, and without formal scoring. Their feedback was qualitative, specific, and non-binding. No dashboard tracked Braintrust recommendations or measured compliance with their suggestions.
Catmull was explicit about why: the moment you attach a number to creative quality, people start managing the number instead of making the movie. He had seen this pattern at other studios where executive mandates for specific test screening scores produced films that were mathematically optimised for test audience responses — and creatively lifeless. The films scored well in previews and died at the box office because the metric was capturing novelty reaction, not lasting quality.
When Disney acquired Pixar in 2006, Catmull spent significant effort preventing Disney's more metric-heavy production management from being imposed on Pixar's process. Pixar released eleven consecutive commercially and critically successful films between 1995 and 2010, from Toy Story to Toy Story 3, a record unmatched in Hollywood history. The success emerged from a system designed to be unmeasurable, not because measurement is bad, but because Catmull understood that in creative domains, the most important outcomes are precisely the ones that resist quantification. Target them, and you destroy them.
Section 6
Visual Explanation
Goodhart's Law describes a predictable sequence: a useful metric is selected, promoted to a target, and then systematically corrupted by the optimisation pressure it attracts. The top half of the diagram shows the four-step corruption process. The bottom half shows the divergence over time: before the metric becomes a target, it tracks the underlying outcome closely. After it becomes a target, the metric continues to rise while the outcome it was meant to capture declines. The growing gap between the two lines is the Goodhart's signal — invisible on the metric's own chart, obvious when you plot both.
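The divergence can be reproduced with a toy simulation. The model below is an illustrative assumption, not something derived from data: each period a team has a fixed capacity, real work moves both the outcome and the metric, and gaming moves only the metric but pays twice as much per unit of effort. Once the metric becomes a target, a growing share of effort shifts to gaming.

```python
def simulate(periods=12, target_from=6, gaming_step=0.1, gaming_payoff=2.0):
    """Return (metric, outcome) series for a toy model of Goodhart's Law."""
    metric, outcome = [], []
    gaming_share = 0.0
    for t in range(periods):
        capacity = 1.0 + 0.02 * t            # slow underlying improvement
        if t >= target_from:                 # the metric is now a target
            gaming_share = min(1.0, gaming_share + gaming_step)
        real_work = capacity * (1.0 - gaming_share)
        gaming = capacity * gaming_share
        outcome.append(real_work)            # only real work helps the outcome
        metric.append(real_work + gaming_payoff * gaming)
    return metric, outcome


metric, outcome = simulate()
print("period  metric  outcome  gap")
for t, (m, o) in enumerate(zip(metric, outcome)):
    print(f"{t:>6}  {m:>6.3f}  {o:>7.3f}  {m - o:>5.3f}")
```

With the default parameters the two series are identical for the first six periods. After that the metric keeps climbing while the outcome falls, and the gap column widens every period. Plotting either series alone would hide the problem; the gap only appears when both are shown together.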
Section 7
Connected Models
Goodhart's Law sits at the intersection of incentive design, measurement theory, and systems thinking. It connects to models that explain why metrics fail, how to detect the failure, and how to build systems that resist it. Some of these connections reinforce the same insight from different angles. Others create productive tension that sharpens the application.
Reinforces
Incentive-Caused Bias
Incentive-Caused Bias explains the psychological mechanism through which Goodhart's Law operates. The metric becomes a target. The target creates an incentive. The incentive warps cognition. The Wells Fargo employee opening fake accounts isn't cynically gaming a system — they've internalised the metric as the definition of good performance, exactly as Munger's model predicts. Goodhart's Law describes the structural phenomenon. Incentive-Caused Bias describes what happens inside the human brain as the phenomenon unfolds. Together, they explain both why metric corruption occurs and why the people doing the corrupting genuinely don't see it.
The Agency Problem
The Agency Problem describes the misalignment between principals (who set targets) and agents (who pursue them). Goodhart's Law explains why the most common solution to agency problems, measurable targets with attached incentives, often makes the misalignment worse. The principal selects a metric as a proxy for the outcome they want. The agent, who has information advantages about how to game the metric, optimises for the proxy rather than the outcome. The metric dashboard shows alignment. The reality shows divergence. Every performance management system in every corporation is simultaneously an agency problem and a Goodhart's experiment. The two models together explain why "pay for performance," the dominant compensation philosophy of the past fifty years, produces so many perverse outcomes despite its intuitive logic.
Tension
Simplify
Simplification says: reduce complexity to the essential signal. Cut the noise. Focus. Goodhart's Law says: the moment you simplify a complex phenomenon into a single measurable signal and target it, you've created a distortion field. The tension is real and productive. You need simplification to make decisions — you cannot manage what you cannot measure. But the act of simplifying a multidimensional outcome into a low-dimensional metric creates the exact conditions under which Goodhart's corruption occurs. The resolution isn't to avoid simplification. It's to simplify with awareness that every simplification creates a gaming surface, and to design counterweights accordingly. The best measurement systems simplify the signal while preserving enough dimensionality to resist single-axis gaming.
Section 8
One Key Quote
"When a measure becomes a target, it ceases to be a good measure."
— Marilyn Strathern, 'Improving Ratings': Audit in the British University System (1997)
Section 9
Analyst's Take
Faster Than Normal — Editorial View
Goodhart's Law is the mental model I apply most frequently when evaluating companies, and the one I find most consistently underestimated by the people running them.
The pattern is nearly universal: a founder or executive describes their KPI dashboard with pride — growth metrics trending up, churn metrics trending down, engagement scores at all-time highs. I ask one question: "What behaviour is this metric rewarding that you don't want?" The conversation either gets very productive or very uncomfortable, depending on whether the person has asked themselves that question before.
The deeper problem isn't metric gaming. It's metric blindness. Most leaders don't even recognise when Goodhart's Law is operating because the dashboard — the only feedback mechanism they've built — shows exactly what they want to see. The salesperson who pulls revenue forward by offering unsustainable discounts shows up as a top performer this quarter and a churn problem next year. The engineering team that ships features rapidly by accumulating technical debt shows up as high-velocity now and a reliability crisis in eighteen months. The metric tells a true story about itself and a false story about reality, and the leader who trusts the metric is being deceived by their own measurement system.
The companies I've seen navigate this best share three design principles. First, paired metrics: every target has a counterweight that captures what the target might destroy. Revenue paired with retention. Velocity paired with quality. Growth paired with unit economics. The pair creates a tension that makes gaming visible. Second, metric rotation: targets change every one to three quarters, not because the previous targets were wrong, but because any target that persists long enough will be gamed. The rotation forces the organisation to demonstrate genuine capability rather than metric-specific optimisation. Third, qualitative override: someone with authority regularly reviews the ground truth behind the numbers and has the mandate to declare that the numbers are lying. At Amazon, this is the "andon cord" principle — anyone can halt a process when the reality doesn't match the data.
The most sophisticated operators don't trust their own metrics. Bezos at Amazon, Grove at Intel, Hastings at Netflix — each built systems on the assumption that their metrics would eventually be gamed, and designed the organisation to detect and correct the gaming in real time. That's not cynicism. It's Goodhart's Law taken seriously as an engineering constraint.
Section 10
Test Yourself
The following scenarios test whether you can identify Goodhart's Law in action — and distinguish it from related but distinct phenomena like ordinary incompetence, honest error, or incentive structures that happen to work. The key diagnostic: is the metric diverging from the outcome it was supposed to represent?
Is this mental model at work here?
Scenario 1
A police department, under pressure to reduce crime statistics, reclassifies felonies as misdemeanours and discourages officers from filing reports for minor crimes. The city's published crime rate drops 18% in one year. Independent victimisation surveys show no change in the actual rate of crime experienced by residents.
Scenario 2
A SaaS company targets Net Promoter Score as its primary customer satisfaction metric. The customer success team begins timing NPS surveys to arrive immediately after resolving support tickets — when satisfaction is highest — rather than at random intervals. NPS rises from 42 to 67 in six months. Renewal rates remain flat.
Scenario 3
A venture-backed startup sets a target of reaching 10,000 daily active users before its Series A. The growth team achieves the target through a combination of incentivised referrals, content marketing, and a freemium model with generous free-tier access. Monthly retention at 90 days is 45%, above the industry benchmark of 35%.
Scenario 4
A country ties teacher compensation to student performance on standardised maths exams. Over five years, maths scores rise 22%. Science scores, which are not tied to compensation, decline 14%. Teachers report spending 80% of instructional time on tested maths content and reducing science, history, and art instruction to minimal levels.
Section 11
Top Resources
The best thinking on Goodhart's Law spans economics, measurement theory, and organisational design. The literature is thinner than you'd expect for a concept this important — partly because the law operates across so many domains that no single discipline has fully claimed it. Start with Muller for the comprehensive treatment, then build operational depth with Grove and Austin.
The most comprehensive treatment of Goodhart's Law applied across domains. Muller examines metric fixation in education, healthcare, policing, the military, business, and philanthropy, documenting the consistent pattern: targets that were supposed to improve performance instead distort it. His distinction between "metric fixation" and useful measurement is essential for anyone designing incentive systems. Dense with case studies, including the NYPD CompStat programme and the VA hospital wait-time scandal.
Grove's chapter on performance indicators contains the clearest operational framework for resisting Goodhart's corruption. His principle of paired indicators — every output metric coupled with a quality counterweight — is the most practical anti-Goodhart's mechanism in management literature. The book predates the formal popularisation of Goodhart's Law in management circles by two decades, but Grove was already designing around it.
Measuring and Managing Performance in Organizations — Robert D. Austin (1996)
Book
Austin's distinction between "motivational measurement" (metrics used to influence behaviour) and "informational measurement" (metrics used to understand a system) is the theoretical foundation for understanding when Goodhart's Law activates. His central argument: measurement dysfunction occurs specifically when metrics are used motivationally, because agents have an incentive to manage the metric rather than the process. When metrics are purely informational, the distortion doesn't occur. Out of print but available secondhand, and its framework has influenced a generation of engineering management thinking.
The essay that produced the canonical formulation of Goodhart's Law. Strathern's analysis of the British Research Assessment Exercise — in which universities gamed audit metrics to the point where the metrics measured gaming skill rather than research quality — remains the clearest demonstration of how institutional intelligence redirects itself from producing value to producing numbers. Academic in tone but devastating in implication. The single sentence it contributed to the discourse has shaped how a generation of operators thinks about measurement.
Bezos's description of Amazon's 452 goals — 80% targeting controllable input metrics rather than output metrics — is the best real-world example of a measurement architecture designed to resist Goodhart's Law at scale. The letter explains why Amazon targets selection, speed, and defect reduction rather than revenue: because input metrics are harder to game and more directly connected to the customer experience that produces revenue as a downstream consequence. The letter is short, specific, and demonstrates the rare discipline of a CEO who understood that the metrics you refuse to target can matter more than the ones you do. Essential reading for any founder designing a KPI system.
Goodhart's Law — A useful measure is chosen as a target. Agents optimise for the measure. The measure and the underlying outcome diverge. The metric improves while the reality it was meant to capture deteriorates.
Feedback loops are supposed to be self-correcting: measure the output, adjust the input. Goodhart's Law describes the condition under which feedback loops become self-deceiving instead of self-correcting. If the measurement in the loop is corrupted — because agents are optimising the metric rather than the outcome — the loop amplifies the distortion. More "positive" metric signals produce more investment in the metric-gaming behaviour, which produces more positive signals. The loop runs faster and more confidently in precisely the wrong direction. YouTube's recommendation algorithm is the canonical example: an engagement feedback loop that, unchecked, amplified increasingly extreme content because the engagement metric was capturing compulsion, not satisfaction.
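The self-deceiving loop can be made concrete with a toy simulation. This is a minimal sketch, not a model of any real system: the function name `simulate_loop`, the assumption that gaming moves the metric three times as cheaply as real work, and the reinvestment rate are all hypothetical values chosen for illustration.

```python
def simulate_loop(rounds=6, gaming_share=0.2, reinvest_rate=0.5):
    """Toy model: a team splits one unit of effort between real work and gaming.

    Every number here is a hypothetical assumption, chosen only to show
    the direction of the effect.
    """
    history = []
    for _ in range(rounds):
        real_effort = 1.0 - gaming_share
        # Assumption: gaming lifts the metric three times as cheaply as real work.
        metric = real_effort * 1.0 + gaming_share * 3.0
        # Only real work improves the underlying outcome.
        outcome = real_effort * 1.0
        history.append((round(metric, 3), round(outcome, 3)))
        # The loop rewards whatever raised the metric, so gaming gets more effort.
        gaming_share = min(1.0, gaming_share + reinvest_rate * gaming_share)
    return history


if __name__ == "__main__":
    for step, (metric, outcome) in enumerate(simulate_loop(), start=1):
        print(f"round {step}: metric={metric} outcome={outcome}")
```

Under these assumptions the reported metric climbs every round while the outcome falls towards zero, which is the pattern the loop cannot see from inside: the only signal it reads is the one being gamed.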
Leads-to
Second-Order Thinking
Goodhart's Law is a first-order insight: targets corrupt metrics. The natural next step is second-order analysis: what happens after the metric is corrupted? When schools teach to the test, the second-order consequence isn't just worse education — it's a generation of workers who are credentialled but unprepared, which compounds into labour market dysfunction, employer distrust of credentials, and credential inflation. When hospitals game surgical mortality rates, the second-order effect is that the sickest patients are denied care, increasing mortality in the population the reporting system was designed to protect. Goodhart's Law identifies the corruption. Second-Order Thinking traces where the corruption leads.
Leads-to
[Inversion](/mental-models/inversion)
Once you understand Goodhart's Law, the natural design question inverts: instead of asking "what metric should we target?", ask "what metric, if targeted, would produce the most destructive gaming behaviour?" Then don't target that one. Inversion transforms Goodhart's Law from a diagnostic tool into a design tool. Before committing to any KPI, run the inverted scenario: assume the team will game it maximally. What does the worst-case gaming look like? Is that outcome tolerable? If not, redesign the metric before attaching incentives. Leaders who manage measurement well, such as Grove and Bezos, use this inverted logic reflexively. They ask "what will this metric destroy?" before asking "what will this metric track?" The inverted question is the single most effective pre-deployment diagnostic for any new KPI, and it takes less than five minutes to run.
One pattern I've noticed is generational. First-time founders tend to set simple, aggressive numerical targets — "10x revenue this year," "$1M ARR by Q4" — because the targets feel motivating and concrete. They learn Goodhart's Law empirically, usually during a painful board meeting where the numbers look great and the business is falling apart underneath. Second-time founders build paired metrics from day one. They've been burned. The best investors I know evaluate this sophistication explicitly: "Tell me about a metric you stopped tracking and why" is a question that reveals more about operational maturity than any revenue chart.
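A paired metric can be reduced to a very small check. The sketch below is one possible reading of the idea, not Grove's own formulation: the names `PairedIndicator` and `divergence_flag` and the 5% tolerance are hypothetical. It flags any pair where the targeted number improves while its counterweight falls.

```python
from dataclasses import dataclass


@dataclass
class PairedIndicator:
    name: str
    output_change: float   # fractional change in the targeted output metric
    quality_change: float  # fractional change in the paired counterweight


def divergence_flag(pair: PairedIndicator, tolerance: float = 0.05) -> bool:
    """True when the output improves while the counterweight drops beyond tolerance."""
    return pair.output_change > 0 and pair.quality_change < -tolerance


if __name__ == "__main__":
    # Figures from the teacher-pay scenario: maths up 22%, science down 14%.
    schools = PairedIndicator("maths vs science", 0.22, -0.14)
    print(schools.name, divergence_flag(schools))
```

The design choice worth noting is that the flag fires on divergence, not on the level of either number. A falling output with a falling counterweight is an ordinary bad quarter; a rising output with a falling counterweight is the Goodhart signature.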
The limitation worth naming: Goodhart's Law can become a thought-terminating cliché. "Any metric will be gamed, so why bother measuring?" is not the lesson. The lesson is that measurement is a design problem that requires ongoing engineering, not a solved problem that runs on autopilot. Metrics are indispensable. Naive faith in metrics is dangerous. The space between those two truths is where organisational intelligence lives.