Mathematics & Probability
Section 1
The Core Idea
Imagine you're hiking in dense fog. You climb until every direction slopes downward. You've reached the highest point you can find — but you can't see the mountain range. You might be standing on the tallest peak for miles, or you might be on a foothill while the true summit towers somewhere beyond the cloud line. This is the difference between a local maximum and the global maximum, and it is one of the most consequential distinctions in decision-making.
A local maximum is the best outcome reachable through incremental improvements from your current position. A global maximum is the best outcome available across the entire landscape of possibilities. The two are almost never the same place. The tragedy of most careers, companies, and strategies is that they optimise relentlessly for the local peak — making things better within the current framework — without ever asking whether an entirely different framework would yield a fundamentally higher ceiling.
The concept originates in mathematical optimisation. Given a function with multiple peaks and valleys — what mathematicians call a non-convex landscape — gradient ascent algorithms climb by always moving in the direction of steepest improvement. They are guaranteed to find a local maximum: a point where no small step in any direction produces a better result. They are not guaranteed to find the global maximum, because reaching it may require descending first — getting worse before you can get better. The algorithm has no mechanism for that descent. Neither do most humans.
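The trap is easy to reproduce. The sketch below uses a toy one-dimensional landscape of my own choosing (not one from the literature): plain gradient ascent is run from two starting points, one converging to the lower local peak and one to the global peak — and the algorithm itself has no way of telling which it found.

```python
def f(x):
    # Assumed toy landscape: a local maximum near x ≈ -0.96 and the
    # global maximum near x ≈ 1.04, separated by a valley.
    return -(x**2 - 1)**2 + 0.3 * x

def grad(x):
    # Analytic derivative of f.
    return -4 * x**3 + 4 * x + 0.3

def gradient_ascent(x0, lr=0.05, steps=500):
    """Climb by always stepping in the direction of steepest improvement."""
    x = x0
    for _ in range(steps):
        x += lr * grad(x)
    return x

x_left = gradient_ascent(-1.5)   # starts left of the valley
x_right = gradient_ascent(0.5)   # starts right of the valley
# Both runs stop where no small step improves f, but only one found
# the higher peak: the starting point, not the algorithm, decided.
```

Both endpoints satisfy the same stopping condition — zero gradient — which is exactly why a climber in fog cannot distinguish a foothill from the summit.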
The formal mathematics traces to the calculus of variations developed by Euler and Lagrange in the eighteenth century, and to the optimisation theory that emerged from operations research during World War II. George Dantzig's simplex method (1947) could find global optima for linear problems, but for nonlinear, non-convex problems — which describe nearly every interesting real-world situation — no deterministic algorithm guarantees finding the global maximum in polynomial time. The problem is NP-hard in general. This means that in complex landscapes with many local peaks, finding the absolute best solution is computationally intractable. You can find a good solution. You can rarely prove it's the best.
The breakthrough insight came from the physical sciences. In 1983, Kirkpatrick, Gelatt, and Vecchi published "Optimization by Simulated Annealing" in Science, borrowing a metallurgical technique to solve combinatorial optimisation problems. The idea: heat a metal to high temperature (where atoms move freely and explore many configurations), then cool it gradually (allowing the system to settle into a low-energy crystalline structure). Translated into optimisation: start with high randomness, accepting worse solutions early to escape local traps, then progressively reduce randomness as the search converges on the global optimum. The algorithm works because it trades short-term degradation for long-term discovery. It gets worse on purpose so it can eventually get better.
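The schedule can be sketched in a few lines. This is a minimal Metropolis-style annealer on an assumed two-peak toy function (the landscape, temperature, cooling rate, and proposal width are all illustrative choices, not values from the paper): started in the lower peak's basin, the early high temperature lets it accept downhill moves, cross the valley, and settle near the higher peak as the randomness decays.

```python
import math
import random

def f(x):
    # Assumed toy landscape: local max near x ≈ -0.96, global max near x ≈ 1.04.
    return -(x**2 - 1)**2 + 0.3 * x

def simulated_annealing(x0, t0=2.0, cooling=0.995, steps=3000, seed=0):
    rng = random.Random(seed)
    x, t = x0, t0
    best_x, best_f = x, f(x)
    for _ in range(steps):
        candidate = x + rng.gauss(0, 0.5)       # random exploration
        delta = f(candidate) - f(x)
        # Accept improvements always; accept worse moves with a
        # probability exp(delta / t) that shrinks as the system cools.
        if delta > 0 or rng.random() < math.exp(delta / t):
            x = candidate
            if f(x) > best_f:
                best_x, best_f = x, f(x)
        t *= cooling                             # gradually reduce randomness
    return best_x, best_f

best_x, best_f = simulated_annealing(-1.5)
# Despite starting in the lower basin, the search ends near the
# global peak because it was allowed to get worse on the way.
```

The acceptance rule is the whole trick: at high temperature the algorithm willingly descends, which is precisely the move gradient ascent — and loss-averse humans — cannot make.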
This is precisely what humans and organisations find hardest to do. Loss aversion — Kahneman and Tversky's foundational finding from prospect theory — means the psychological pain of descending from a local peak is felt roughly twice as intensely as the potential gain from reaching a higher one. The certain loss of abandoning a working strategy looms larger than the uncertain gain of a potentially superior one. The result is systematic entrapment at local maxima: individuals stay in careers that are good but not great, companies iterate on products that are profitable but not transformative, investors hold positions that are adequate but not optimal. Each incremental improvement feels like progress. The landscape beyond the valley remains unexplored.
The concept maps cleanly onto fitness landscapes — a framework introduced by Sewall Wright in 1932 to describe evolutionary dynamics. Wright imagined a multidimensional landscape where each point represents a combination of genetic traits and the height represents the organism's fitness. Evolution, like gradient ascent, climbs toward local peaks through incremental mutation and selection. But the fittest possible organism might require a combination of traits that can only be reached by passing through less-fit intermediate forms. Wright showed that small, isolated populations were more likely to discover global maxima because genetic drift — random fluctuation — could push them off local peaks and into valleys that might lead to higher ground. Large populations, by contrast, were trapped on local peaks by the sheer weight of their own optimisation. The evolutionary insight maps directly onto corporate strategy: large, successful organisations are the most likely to be trapped at local maxima, because their size and success make the descent into uncertainty feel irrational.
Stuart Kauffman extended Wright's framework in the 1990s with his NK fitness landscape model, demonstrating that as the number of interacting variables (K) increases, the landscape becomes increasingly rugged — more local peaks, steeper valleys, and greater distance between local and global optima. The more complex the system, the harder it is to find the global best. A startup with three variables (product, market, pricing) navigates a relatively smooth landscape. A multinational corporation with hundreds of interdependent variables (product lines, geographies, regulatory environments, organisational structures, partner relationships) navigates a landscape so rugged that the global maximum may be unreachable through any sequence of incremental moves. The only path to the highest peak may require a discontinuous leap — what business strategists call a pivot and what Kuhn called a paradigm shift.
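Kauffman's ruggedness claim is easy to check numerically. The sketch below is a conventional NK implementation (locus i interacts with itself and the next K loci on a ring; the random-table details are standard textbook choices, not Kauffman's exact construction). With K = 0 the landscape is additive and has a single peak; raising K to 8 on the same ten loci produces dozens of local optima.

```python
import itertools
import random

def make_nk_fitness(N, K, rng):
    # Each locus i draws its contribution from a random lookup table
    # keyed by its own allele and the alleles of the next K loci.
    tables = [{} for _ in range(N)]
    def fitness(g):
        total = 0.0
        for i, table in enumerate(tables):
            key = tuple(g[(i + j) % N] for j in range(K + 1))
            if key not in table:
                table[key] = rng.random()   # filled lazily, then fixed
            total += table[key]
        return total / N
    return fitness

def count_local_optima(N, K, seed=1):
    fitness = make_nk_fitness(N, K, random.Random(seed))
    count = 0
    for g in itertools.product((0, 1), repeat=N):
        fg = fitness(g)
        neighbours = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(N))
        if all(fitness(n) < fg for n in neighbours):
            count += 1
    return count

smooth = count_local_optima(10, 0)   # K = 0: exactly one peak
rugged = count_local_optima(10, 8)   # K = 8: many peaks
```

Every extra interaction multiplies the number of places a hill-climber can get stuck — which is the formal version of the claim that the multinational's landscape is harder than the startup's.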
The concept shows up with different vocabulary across every domain that matters. In machine learning, it's the problem of gradient descent converging to suboptimal solutions — the reason researchers use techniques like random restarts, momentum, and learning rate scheduling to shake models out of local minima. In career theory, it's the "golden handcuffs" problem — high-paying positions that are locally optimal but globally suboptimal because they prevent exploration of higher-ceiling trajectories. In product strategy, it's feature-factory syndrome — teams that optimise an existing feature set because the metrics are clear and the improvements are measurable, while the fundamentally better product architecture remains unbuilt because nobody wants to reset the dashboard to zero. In investment theory, it's the disposition effect — the tendency to hold losing positions too long and sell winning positions too early, which Hersh Shefrin and Meir Statman documented in 1985, and which functions as a local-maximum trap in portfolio construction.
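Random restarts are the cheapest of these escape techniques to demonstrate. On the same kind of assumed two-peak toy function used throughout (all parameters here are illustrative), a single hill-climb finds whichever peak its starting point dictates, but keeping the best of twenty random starts reliably lands on the global one:

```python
import random

def f(x):
    # Assumed toy landscape: local max near x ≈ -0.96, global max near x ≈ 1.04.
    return -(x**2 - 1)**2 + 0.3 * x

def climb(x0, lr=0.05, steps=500):
    # Plain gradient ascent using the analytic derivative of f.
    x = x0
    for _ in range(steps):
        x += lr * (-4 * x**3 + 4 * x + 0.3)
    return x

def best_of_restarts(n_restarts=20, seed=42):
    rng = random.Random(seed)
    candidates = [climb(rng.uniform(-2.0, 2.0)) for _ in range(n_restarts)]
    return max(candidates, key=f)   # keep only the highest peak found

best = best_of_restarts()
```

Each individual climb is still greedy; the exploration happens entirely in the choice of starting points. That separation — greedy local search, random global sampling — is the simplest answer to the entrapment problem the paragraph above describes.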
The practical power of the model is diagnostic. It forces a question that incremental thinking never asks: is the ceiling on my current trajectory high enough? You can optimise a horse-drawn carriage for decades — lighter materials, better suspension, faster horses — and never arrive at the automobile. Every improvement is real. Every improvement is progress. And every improvement takes you further from the insight that the entire framework should be abandoned. The local maximum of horse-drawn transport sits far below the global maximum of combustion-engine transport, and no amount of optimisation within the first paradigm reaches the second. You have to descend — abandon the working system — and cross a valley of uncertainty before the higher peak becomes accessible.