In the spring of 2016, Mondelez International offered $23 billion to buy The Hershey Company. The bid represented a roughly 10% premium to Hershey's already-elevated market capitalization and would have merged Oreo, Cadbury, and Chips Ahoy with Reese's, Kisses, and Twizzlers into a transatlantic confectionery colossus, the kind of deal investment bankers build entire careers hoping to close. Hershey's board rejected it. Not because the price was wrong, though they said it was, but because the Hershey Trust Company (the entity Milton Hershey created in 1905 to fund a school for orphaned boys, and which still controls approximately 80% of the voting power of the company's stock) would not let the company be sold. The Trust's fiduciary obligation ran not to shareholders seeking a liquidity event but to disadvantaged children seeking an education. The most consequential corporate governance mechanism in American confectionery is, and has been for over a century, an orphanage.
This is the paradox at the center of Hershey: a publicly traded company whose controlling shareholder is a philanthropic trust, whose products are sold in ninety countries but whose identity is indivisible from a single small town in central Pennsylvania, whose competitive moat was built in the early twentieth century on a formula for milk chocolate that European chocolatiers consider barely palatable — and whose stock has compounded at an annualized rate of roughly 11% over four decades. Hershey is not a startup story, not a disruption narrative, not a platform play. It is something rarer and, in certain respects, more instructive: a study in how brand permanence, structural governance, and relentless incrementalism can generate extraordinary long-term returns in a commodity-adjacent industry where the raw material — cocoa — is subject to wild price swings, consumer preferences shift with every wellness trend, and the competition includes two of the most formidable private companies on earth.
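To make the compounding claim concrete, here is a minimal sketch of the arithmetic. It uses only the approximate figures quoted above (about 11% a year over four decades); the function name is hypothetical, and the calculation ignores taxes, fees, and the year-to-year path of returns.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Return the multiple of starting capital after annual compounding."""
    return (1.0 + annual_rate) ** years


# Approximate figures quoted in the text: ~11% a year for four decades.
multiple = growth_multiple(0.11, 40)
print(f"$1 compounds to roughly ${multiple:.0f}")
```

Under these assumptions a dollar grows on the order of 65-fold, which is why a seemingly modest annual rate, sustained for long enough, produces the "extraordinary long-term returns" described here.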
By the Numbers
Hershey at a Glance
- $11.2B: Net sales, FY2024
- ~20%: Operating margin (historical average)
- #1: U.S. confectionery market position
- ~80%: Voting power held by Hershey Trust
- 100+: Brands in portfolio
- $1.6B: Net income, FY2022
- 130+: Years of continuous operation
- ~19,000: Employees worldwide
The Candy Man's Theorem
Milton Snavely Hershey was born on September 13, 1857, in Derry Township, Pennsylvania, to a Mennonite family fractured by his father Henry's serial entrepreneurial failures. Henry was a dreamer — charming, literate, perpetually chasing the next scheme — while Fanny Hershey was austere, practical, and deeply religious. The marriage dissolved under the weight of Henry's wandering. Milton, who never progressed past fourth grade, was apprenticed at age fourteen to a confectioner in Lancaster, Pennsylvania. He absorbed the trade the way certain people absorb language: totally, instinctively, without needing to understand the grammar.
What followed was a decade of failure. Hershey started a candy business in Philadelphia in 1876. It went bankrupt. He tried again in Denver, learning to make caramels with fresh milk — a technique that would prove decisive — then moved to New York. That venture failed too. He returned to Lancaster in 1886, humiliated and nearly broke, and started the Lancaster Caramel Company with borrowed money. This time it worked. The fresh-milk caramels were superior. By the early 1890s, the business was generating over $1 million in annual sales.
But Hershey had seen something at the 1893 World's Columbian Exposition in Chicago that rewired his ambition. A German manufacturer was demonstrating chocolate-making machinery. Caramels, Hershey reportedly told associates, were a fad. Chocolate was permanent. In 1900, he sold the Lancaster Caramel Company for $1 million (a staggering sum, equivalent to roughly $37 million today) and kept only the chocolate-manufacturing equipment. He was betting everything on a single product category, and he was in his early forties.
Michael D'Antonio's biography Hershey: Milton S. Hershey's Extraordinary Life of Wealth, Empire, and Utopian Dreams captures the almost reckless clarity of this pivot. Hershey didn't just want to make chocolate. He wanted to democratize it. In 1900, chocolate was a luxury good: handmade, expensive, consumed by the wealthy. Hershey's insight, which paralleled Henry Ford's about automobiles, was that the real money was in making a luxury product affordable through industrial-scale production. He would make milk chocolate (richer, sweeter, more accessible than the dark European varieties) and sell it for a nickel.
Give them quality. That's the best kind of advertising in the world.
— Milton Hershey, as quoted in various biographical accounts
Building a Town to Build a Brand
The factory Hershey constructed was not in a city. It was in the middle of Derry Township's dairy country — the same rural landscape where he'd been born. His logic was characteristically concrete: milk chocolate required enormous quantities of fresh milk. Pennsylvania dairy country provided it. He built not only a factory but an entire town: paved streets named Chocolate Avenue and Cocoa Avenue, worker housing, a trolley system, a department store, a bank, churches, a community center, an amusement park, a zoo. Hershey, Pennsylvania — "The Sweetest Place on Earth" — was a company town in the fullest sense, designed and financed by one man's conviction that the environment in which people worked determined the quality of what they produced.
This was not mere philanthropy, though it contained genuine idealism. It was a vertically integrated brand strategy executed before the concept had a name. The town was the marketing. Hershey famously refused to advertise nationally — a stance the company maintained for nearly seventy years, from its founding until 1970. While competitors like Mars spent aggressively on print and radio, Hershey relied on the wrapper itself (tossed on the ground, it was a free billboard), word of mouth, and the magnetic pull of the town, which drew tourists who left as brand evangelists. The Hershey bar's brown-and-silver wrapper became, through sheer ubiquity, as recognizable as the Coca-Cola script.
The no-advertising policy was not, strictly speaking, rational. It was ideological — rooted in Milton Hershey's Mennonite conviction that quality spoke for itself and in his shrewd understanding that the story of Hershey (the generous chocolatier, the utopian town, the school for orphans) was a more powerful brand narrative than any advertisement could construct. He was, without using the term, building what we'd now call an earned-media moat.
The Formula That Shouldn't Work
The milk chocolate Hershey developed in the early 1900s has a distinctive flavor that Europeans — and a significant number of American food critics — find puzzling. It has a slight tang, a faintly sour, almost cheesy note that results from the way Hershey's process handles milk. The prevailing theory, widely accepted though Hershey has never publicly confirmed the precise details, is that the milk undergoes a controlled lipolysis — a partial breakdown of milk fats — before being combined with chocolate liquor and sugar. The resulting flavor compound, butyric acid, is the same molecule found in Parmesan cheese and, less appetizingly, vomit.
This sounds like a defect. It is, instead, the moat.
American consumers who grew up eating Hershey's chocolate — which is to say, nearly all American consumers — imprinted on this flavor profile the way ducklings imprint on the first moving object they see. The taste of a Hershey bar is the taste of chocolate, to Americans. It is the flavor of Halloween, of s'mores around a campfire, of the Reese's Peanut Butter Cup torn open in a movie theater. Competitors who have tried to enter the U.S. market with "superior" European-style chocolate have consistently discovered that American consumers do not want superior chocolate. They want their chocolate. The butyric acid note that Swiss chocolatiers find objectionable is, for 330 million Americans, the Proustian madeleine.
This is a textbook illustration of Hamilton Helmer's concept of "counter-positioning" mutating over time into "brand power." The initial process was likely a cost-driven manufacturing decision — a way to use less-than-perfectly-fresh milk in a pre-refrigeration era. But the flavor it produced became the standard against which all other chocolate in America was measured. When you define the category, you own the category. And when the category definition is literally baked into the taste buds of the national population through a century of consumption, the switching costs are not financial. They are neurological.
The Orphans' Dividend
In 1909, Milton and his wife Catherine — who were unable to have children — established the Hershey Industrial School (later renamed the Milton Hershey School) for orphaned boys. In 1918, three years after Catherine's death, Milton transferred his entire fortune — including his controlling stake in the Hershey Chocolate Company — to the Milton Hershey School Trust. The value of the transfer was approximately $60 million, equivalent to roughly $1.2 billion today. He was sixty-one years old. He had given away everything.
The structural implications of this act have echoed for more than a century. The Hershey Trust Company, which administers the school's endowment, holds all of Hershey's Class B common stock, which carries ten votes per share. This gives the Trust approximately 80% of the voting power of the company, despite owning a smaller percentage of total equity. The Trust's fiduciary duty runs to the students of the Milton Hershey School — currently about 2,100 children from low-income families who receive free education, housing, and comprehensive support from pre-kindergarten through twelfth grade, on a 10,000-acre campus in Hershey, Pennsylvania.
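The gap between voting power and economic ownership follows directly from the ten-votes-per-share arithmetic. The sketch below is a stylized illustration, not a description of Hershey's actual share counts: it assumes the controlling holder owns only super-voting shares, that ordinary common shares carry one vote each, and that there are no other share classes. The function names are hypothetical.

```python
def voting_share(super_shares: float, common_shares: float,
                 votes_per_super: int = 10, votes_per_common: int = 1) -> float:
    """Fraction of total votes held by the super-voting class."""
    super_votes = super_shares * votes_per_super
    common_votes = common_shares * votes_per_common
    return super_votes / (super_votes + common_votes)


def equity_needed_for_votes(target: float, votes_per_super: int = 10,
                            votes_per_common: int = 1) -> float:
    """Fraction of all shares that must be super-voting to reach `target` votes.

    Solves target = vs*b / (vs*b + vc*(1 - b)) for b.
    """
    spread = votes_per_super - votes_per_common
    return target * votes_per_common / (votes_per_super - target * spread)


# Stylized assumption: how much equity yields ~80% of the votes at 10:1?
equity = equity_needed_for_votes(0.80)
print(f"Equity stake needed: {equity:.1%}")
print(f"Check voting share:  {voting_share(equity, 1 - equity):.1%}")
```

Under these simplified assumptions, a holder needs well under a third of the shares to command about 80% of the votes, which is the mechanism that lets a minority economic stake function as permanent control.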
This governance structure is, simultaneously, Hershey's greatest defense and its most debated constraint. Attempts to sell or acquire the company have repeatedly failed: most dramatically in 2002, when the Trust itself explored a sale and drew a bid from Wrigley, alongside reported interest from Nestlé and Cadbury, before abandoning the process, and again with the 2016 Mondelez approach. The Pennsylvania Attorney General has intervened in past attempts to sell, arguing that the Trust's control of Hershey is essential to its charitable mission. The result is a public company that cannot be acquired against the Trust's wishes, which effectively means: Hershey cannot be acquired. Period.
For long-term shareholders, this is a feature. The Trust's permanence as a controlling shareholder creates a time horizon that is genuinely intergenerational — it is not optimizing for quarterly earnings or a three-year private equity hold period but for the perpetual funding of a school. For activist investors or would-be acquirers, it is a fortress with no drawbridge.
How Milton Hershey's philanthropy became a governance mechanism
- 1909: Milton and Catherine Hershey establish the Hershey Industrial School for orphaned boys.
- 1918: Milton transfers his entire Hershey Chocolate Company stake to the school trust, valued at ~$60 million.
- 1927: Hershey Chocolate Company goes public on the NYSE. The Trust retains controlling interest.
- 2002: Hershey Trust explores sale to Wrigley consortium. Pennsylvania AG and public outcry block the deal.
- 2016: Mondelez bids $23 billion. Trust rejects, maintaining independence.
- 2025: Trust still controls ~80% of voting power. Milton Hershey School enrolls ~2,100 students.
The Depression, the War, and the Wrapper
The 1930s tested every American enterprise. Hershey, remarkably, never laid off a worker during the Great Depression. Milton instead redirected employees to construction projects — building the Hotel Hershey, a community center, a sports arena, new facilities for the school. He was, in effect, running a Keynesian stimulus program in miniature, a decade before the concept had entered mainstream economic thought. The loyalty this generated was immense — and it was tested almost immediately.
In April 1937, workers at the Hershey factory staged a sit-down strike, inspired by the wave of labor actions sweeping American industry. The strikers, affiliated with the CIO (Congress of Industrial Organizations), occupied the plant for six days. The resolution was violent: local dairy farmers and non-striking workers, fearful of losing their livelihoods if the factory closed, physically ejected the strikers. Milton Hershey, who viewed the strike as a personal betrayal, was devastated. The episode revealed the fragility of even the most paternalistic corporate culture — the town that had been built on generosity could not, in the end, accommodate dissent.
World War II restored the company's narrative. The U.S. military contracted Hershey to produce a survival ration — the "Field Ration D" bar, engineered to withstand high temperatures and provide concentrated calories. It tasted terrible by design (to prevent soldiers from eating it as candy rather than saving it for emergencies), but Hershey produced over a billion of them. The Tropical Chocolate Bar followed — slightly more palatable, distributed across every theater of the war. Soldiers returned home with the Hershey name encoded in their wartime memories, which is about as powerful a brand association as any company could wish for. The government awarded Hershey the Army-Navy "E" Production Award five times.
Milton Hershey died on October 13, 1945, at age eighty-eight, having witnessed his creation survive depression, labor unrest, and global war. He left behind no biological heirs. He left behind a company, a town, a school, and a trust — an interlocking system designed to perpetuate itself indefinitely.
The Long Plateau and the Advertising Question
For the quarter century following Milton's death, the company was managed by loyalists who treated his methods as scripture. The no-advertising policy held. The product line expanded cautiously: Reese's Peanut Butter Cups (acquired in 1963, when the sons of H.B. Reese, a former Hershey employee who had died in 1956, merged the family business into Hershey) became a cornerstone, but the core identity remained chocolate bars and Kisses sold without promotional support in a market that was rapidly professionalizing its marketing.
The reckoning arrived in the late 1960s. Mars — private, aggressive, and increasingly sophisticated — was eroding Hershey's market share with M&M's, Snickers, and Milky Way, backed by heavy television advertising. Health-consciousness was creeping into American food culture for the first time, and candy consumption per capita was declining. The company's sales were stagnating.
William Dearden — who had attended the Milton Hershey School as a boy after his mother's death, worked his way through the company, and eventually became CEO — made the heretical decision to advertise. In July 1970, Hershey placed ads in 114 Sunday newspapers. Two months later, the company's first television and radio commercials aired. The move was itself a news event — the fact that Hershey was advertising at all generated coverage that amplified the campaign's reach. Dearden hired Ogilvy & Mather, one of the era's preeminent agencies.
The irony was exquisite: seventy years of not advertising had created such powerful brand recognition that the act of finally advertising became its own form of publicity. Milton Hershey's refusal to advertise had, paradoxically, made advertising maximally effective when it was finally deployed.
The Reese's Machine
If the Hershey bar is the company's soul, Reese's is its engine. The brand, built on the deceptively simple combination of chocolate and peanut butter, has become the single largest confectionery brand in the United States — larger than Snickers, larger than M&M's, larger than the Hershey bar itself. Reese's Peanut Butter Cups alone generate billions in annual retail sales.
H.B. Reese was a dairy farmer who went to work at the Hershey factory, then left to start his own candy company in the basement of his home in Hershey, Pennsylvania, in 1923. He used Hershey's chocolate coating for his products — a relationship that was simultaneously competitive and symbiotic. When Reese died in 1956, his sons continued the business until merging it with Hershey in 1963. The merger price, by contemporary standards, was modest. The return on that investment has been, by any measure, one of the great acquisitions in consumer products history.
What makes Reese's durable is the specificity of its flavor combination. Chocolate-and-peanut-butter is not a generic pairing that any manufacturer can replicate — it is, like Hershey's chocolate itself, a specific ratio, a specific texture, a specific experience that consumers have bonded with over decades. The brand has proven extensible (Reese's Pieces, Reese's Sticks, Reese's Miniatures, Reese's Big Cup, seasonal shapes) without diluting the core. Michele Buck, who became CEO in 2017, has called Reese's "the jewel in the crown" — and the financials support the metaphor.
Our consumers tell us that chocolate provides emotional well-being, and we've seen especially recently as stress levels register higher, that chocolate plays an even bigger role in their lives.
— Michele Buck, CEO of Hershey, Fortune interview, 2025
The First Woman and the Salty Pivot
Michele Buck grew up on a farm without indoor plumbing, put herself through school, and spent years at Frito-Lay and other PepsiCo divisions before joining Hershey in 2005. She became the company's first female CEO in March 2017, following a stint as chief operating officer, and immediately articulated a vision that would have been unthinkable to Milton Hershey: she wanted to make Hershey a snacking powerhouse, not just a candy company.
The logic was deceptively straightforward. The American snacking market — broadly defined to include salty snacks, better-for-you options, and on-the-go formats — was growing faster than confectionery alone. Hershey's distribution infrastructure, its relationships with mass retailers and convenience stores, and its consumer insights capabilities were transferable across snack categories. The constraint was that Hershey had no credible brands outside of confectionery. Buck's answer was M&A.
The acquisitions came in a deliberate sequence: Amplify Snack Brands (SkinnyPop popcorn) in 2018 for approximately $1.6 billion, Pirate's Booty in 2018, and Dot's Homestyle Pretzels in 2021. Each target shared specific characteristics Buck would later codify: at least $100 million in annual sales, high margins, ease of integration into Hershey's distribution system, and a brand identity that Hershey could build rather than create from scratch. "We are builders of brands, we are not necessarily creators," she told Fortune in 2025. It is an unusually honest self-assessment for a CEO — an acknowledgment that the company's core competency is distribution, marketing, and brand stewardship, not product invention.
The salty snacks segment reached approximately 10% of Hershey's $11.2 billion in 2024 revenue — meaningful but still a fraction of the confectionery business. Buck herself acknowledged that she wished the transformation had moved faster. "Early on, our M&A capability was not where it needed to be and we stubbed our toe on a couple of small acquisitions," she admitted. The candor is notable. Building M&A as an organizational competency — not just finding deals but developing the institutional muscle to evaluate, integrate, and grow acquired businesses — is among the most underappreciated challenges in corporate strategy.
The Snacking Acquisitions
Hershey's expansion beyond confectionery
| Acquisition | Year | Category | Rationale |
|---|---|---|---|
| Amplify Snack Brands (SkinnyPop) | 2018 | Better-for-you popcorn | Entry into $10B+ salty snacks market |
| Pirate's Booty | 2018 | Puffed snacks | Kids/family snacking occasions |
| ONE Brands (protein bars) | 2019 | Better-for-you bars | Health-conscious consumer segment |
| Dot's Homestyle Pretzels | 2021 | Salty/savory pretzels | High-margin, high-growth pretzel category |
The Cocoa Crisis and the Price of Sweetness
In 2024, cocoa prices hit fifty-year highs. The causes were structural and meteorological — poor harvests in West Africa (which produces roughly 70% of the world's cocoa), disease affecting cocoa trees, and the long-term underinvestment in farming infrastructure that had been building for decades. Cocoa futures that had traded around $2,500 per metric ton for most of the 2010s surged past $10,000 in early 2024 and spiked above $12,000 by year-end, a price level with no modern precedent.
For Hershey, which purchases enormous quantities of cocoa and cocoa derivatives, the impact was severe. Input costs surged. Margins compressed. The stock price, which had peaked near $275 in April 2023, fell more than 30% over the following eighteen months. Wall Street analysts, long accustomed to Hershey's steady, almost boring predictability, issued rare criticisms. The company raised prices — it had always been able to raise prices, given the brand power — but the magnitude of the cocoa shock tested the limits of consumer price elasticity in a way that sugar taxes and health trends never had.
Buck navigated the crisis with the same approach she'd brought to COVID-19, which she described as an exercise in empathetic leadership and operational triage. The company hedged cocoa purchases, adjusted pack sizes, and leaned on its salty snacks business — which was not cocoa-dependent — as a partial buffer. But the episode exposed a vulnerability that had always been latent in the business model: Hershey is, at its core, a value-added processor of an agricultural commodity whose supply is concentrated in politically unstable regions. The brand power that allows pricing is real. The commodity exposure that requires it is also real.
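The tension between commodity exposure and pricing power can be sketched with simple margin arithmetic. The example below is purely illustrative: the 4x cocoa multiplier mirrors the move from roughly $2,500 to $10,000 per metric ton cited above and the 20% starting margin mirrors the historical average, but the assumption that cocoa equals 10% of revenue is hypothetical, as are the function names. It also assumes no hedging and unchanged volumes, both of which overstate the real-world impact.

```python
def margin_after_shock(revenue: float, cocoa_cost: float,
                       other_costs: float, cocoa_multiplier: float) -> float:
    """Operating margin if cocoa costs scale and prices stay flat."""
    new_costs = cocoa_cost * cocoa_multiplier + other_costs
    return (revenue - new_costs) / revenue


def price_increase_to_hold_margin(cocoa_cost: float, other_costs: float,
                                  cocoa_multiplier: float) -> float:
    """Price rise that restores the original margin at unchanged volume.

    Holding margin constant means revenue must scale with total costs.
    """
    old_costs = cocoa_cost + other_costs
    new_costs = cocoa_cost * cocoa_multiplier + other_costs
    return new_costs / old_costs - 1.0


# HYPOTHETICAL inputs, indexed to revenue = 100 (20% starting margin).
revenue, cocoa, other = 100.0, 10.0, 70.0
margin = margin_after_shock(revenue, cocoa, other, cocoa_multiplier=4.0)
increase = price_increase_to_hold_margin(cocoa, other, cocoa_multiplier=4.0)
print(f"Unhedged margin with flat prices: {margin:.1%}")
print(f"Price increase to hold margin:    {increase:.1%}")
```

Even with made-up inputs, the shape of the result explains the paragraph above: a modest cost line can erase the entire margin when it quadruples, and the price increase needed to offset it is large enough to test consumer elasticity. Hedging, pack-size changes, and non-cocoa revenue all soften that outcome in practice.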
The MAHA Shadow
As Buck prepared to hand the CEO role to Kirk Tanner — a PepsiCo and Wendy's veteran who took over in August 2025 — a different kind of threat was gathering. The "Make America Healthy Again" movement, energized by political figures and amplified by social media, was pressuring food manufacturers to reformulate products, reduce sugar content, and confront the health implications of ultra-processed foods. For a company whose flagship products are chocolate bars and peanut butter cups, the rhetorical environment was hostile.
Buck, characteristically, refused to panic. "I don't see it as a hard right turn or like nothing we've ever seen before," she told Fortune in her exit interview. The candy industry had survived the health-food movement of the 1970s, the low-fat craze of the 1990s, and the organic revolution of the 2000s. Confectionery, Buck argued, occupied a different psychological space than everyday food — it was a treat, an indulgence, an emotional reward. Consumers who virtuously purchased kale and quinoa for dinner still wanted a Reese's Cup after.
The data largely supports this. U.S. confectionery sales have grown through every major health-consciousness cycle of the past fifty years. The category's resilience suggests that confectionery is not competing with healthy food — it is competing with other forms of small indulgence: a latte, a cocktail, a streaming subscription. The real competitive set is not carrots. It is dopamine.
The Anti-Acquisition
Hershey's governance structure — the Trust's voting control, the Pennsylvania AG's historical willingness to intervene, the deep emotional attachment between the Hershey brand and the community that bears its name — has created what might be the most takeover-proof company in American consumer products. This is not, as it might first appear, simply a story about defense. It is a story about what a company does when it cannot be bought.
Companies that can be acquired are disciplined by the threat of acquisition. If management underperforms, an acquirer will replace them. The market for corporate control, as economists call it, provides a check on complacency. Hershey lacks this check. The Trust's control means that no matter how badly the stock underperforms, no hostile bidder can take the company. This should, in theory, breed complacency.
It hasn't — at least not fatally. Why? The answer may lie in the unusual alignment of interests created by the Trust structure. The Trust needs Hershey's dividends to fund the school. The school needs the dividends to grow indefinitely. The Trust therefore has an incentive to ensure that Hershey generates strong, growing cash flows — not the rapid growth that venture-backed companies pursue, but the steady compounding that funds a perpetual endowment. The time horizon is not quarterly. It is not even decadal. It is genuinely perpetual. And this orientation, paradoxically, may be the most rational long-term shareholder structure available to a public company.
I'm very open to mergers and acquisitions. I see them playing a key role in our growth agenda going forward. We've an opportunity with mergers and acquisitions to go into spaces where our brands currently can't travel.
— Michele Buck, Fortune, 2017
130 Years of Wrapper on the Ground
Consider the arc. A farm boy with a fourth-grade education fails twice in the candy business, succeeds on the third attempt with milk caramels, sees a chocolate machine at a world's fair, sells everything, builds a factory in a cornfield surrounded by dairy cows, designs an entire town, creates a flavor of chocolate that is technically inferior by European standards but neurologically imprinted on the American palate, refuses to advertise for seven decades, gives his entire fortune to an orphanage that still controls the company, and — through this improbable sequence — produces a business that generates over $11 billion in annual revenue, maintains operating margins above 20%, and has compounded shareholder wealth for nearly a century.
The company Milton Hershey built has survived him by eighty years. It has survived two world wars, the Great Depression, a violent strike, hostile takeover attempts, the health-food movement, a global pandemic, and the most severe cocoa price shock in half a century. It has done this not through disruption or technological innovation but through the compounding of small advantages: a proprietary flavor, extraordinary brand recognition, relentless distribution, a governance structure that prevents short-termism, and a willingness to evolve — slowly, cautiously, but purposefully — from a chocolate company into a snacking company.
Kirk Tanner, the new CEO, inherits a machine. The question is not whether the machine works — it has worked for 130 years — but whether the same principles that built it can carry it into a world of GLP-1 drugs, shifting cocoa economics, and consumers who increasingly demand that corporations justify not just their products but their existence.
In Hershey, Pennsylvania, the streetlights are still shaped like Hershey's Kisses. The air, on certain days when the factory is running at full capacity, still smells of chocolate. And 2,100 children from low-income families are attending school — free of charge, fully funded — because a man who went bankrupt twice decided that the best use of his fortune was not a dynasty but a trust, and the best advertisement for his chocolate was a wrapper on the ground.
Hershey's 130-year operating history offers a set of principles that are less obvious than they first appear. They are not principles of speed, disruption, or network effects. They are principles of duration — of building competitive advantages that compound across generations, of structuring governance to resist the gravitational pull of short-termism, of understanding that in consumer products, the most durable moat is the one encoded in the customer's nervous system.
Table of Contents
- 1. Define the category, then become the definition.
- 2. Let the product be the advertisement.
- 3. Build the town around the factory.
- 4. Structure governance for permanence, not liquidity.
- 5. Acquire builders, not inventions.
- 6. Treat the commodity exposure as a feature to price through.
- 7. Extend the brand into adjacent occasions, not adjacent identities.
- 8. Use constraint as competitive advantage.
- 9. Compound small advantages across long time horizons.
- 10. Never confuse treats with food.
Principle 1
Define the category, then become the definition.
Milton Hershey didn't make the best chocolate. He made the first mass-market milk chocolate in America — and by doing so, he defined what chocolate tasted like for an entire nation. The slightly tangy, butyric-acid-tinged flavor profile that results from Hershey's proprietary milk-processing technique became the baseline against which all other chocolate in the U.S. was measured. Competitors could make objectively "better" chocolate by European standards, but they were entering a market where the reference point had already been set.
This is Hamilton Helmer's concept of branding power taken to its extreme: when the brand and the category become synonymous, switching costs are not financial or informational — they are sensory. The taste of a Hershey bar is, for most Americans, the taste of childhood. You cannot compete with someone's childhood.
Benefit: Category definition creates a moat that is nearly impossible to replicate because it is encoded in consumer behavior at a pre-conscious level. The switching cost is not price sensitivity — it is identity.
Tradeoff: The same flavor specificity that locks in domestic consumers limits international expansion. European and Asian consumers, who did not imprint on Hershey's flavor profile, often reject it. Hershey's international business remains a fraction of its domestic revenue.
Tactic for operators: If you are first to define a product category, invest heavily in ensuring your specific implementation — not the generic category — becomes the default reference point. The goal is not market share. The goal is to become the mental model.
Principle 2
Let the product be the advertisement.
For sixty-nine years, Hershey did not advertise nationally. No print ads. No radio spots. No television commercials. The company relied entirely on product quality, word of mouth, the visibility of the wrapper, and the draw of Hershey, Pennsylvania, as a tourist destination. When the company finally began advertising in 1970, the fact of advertising became itself a major news story — generating enormous free publicity.
The lesson is not "don't advertise." (Hershey now spends hundreds of millions on marketing.) The lesson is that a period of earned attention — when the product and its story do the marketing work — creates a reservoir of brand equity that paid advertising can later amplify but cannot create from scratch. Milton Hershey understood, intuitively, that the story of Hershey (the generous founder, the company town, the school for orphans) was a more powerful brand narrative than any agency could fabricate.
Benefit: Decades of organic brand-building created authenticity that paid media cannot replicate. When Hershey finally advertised, it was amplifying an existing emotional connection, not constructing one.
Tradeoff: The sixty-nine-year abstinence from advertising allowed Mars to gain significant market share during the mid-twentieth century. Purity of brand philosophy came at the cost of competitive positioning in a rapidly professionalizing market.
Tactic for operators: Before spending on paid acquisition, ask whether your product generates genuine word-of-mouth. If it does, consider investing disproportionately in the story around the product (founder narrative, community, mission) before scaling paid channels. The story compounds. The ads depreciate.
Principle 3
Build the town around the factory.
Hershey, Pennsylvania, is the most extreme example in American business history of a company creating its own ecosystem. Milton Hershey didn't just build a factory; he built worker housing, schools, infrastructure, an amusement park, a hotel — an entire community designed to attract, retain, and motivate the workforce that would produce his chocolate. The town became a tourist destination, a brand experience, and a recruitment tool simultaneously.
The modern equivalent is not a literal company town but the deliberate construction of an ecosystem around a core product — app stores, developer communities, content networks, educational programs. The principle is the same: when you control the environment in which your product is produced and consumed, you capture value that would otherwise leak to intermediaries.
Benefit: The ecosystem created self-reinforcing loyalty loops — workers invested in the community, consumers visited the town, the town's reputation enhanced the brand. Each element strengthened the others.
Tradeoff: Company towns concentrate risk. When the 1937 strike occurred, it was not just a labor dispute — it was a community fracture. The same tight coupling that created loyalty created fragility. Hershey's economic and social dependence on one employer in one town remains a structural vulnerability.
Tactic for operators: Invest in the infrastructure around your product, not just the product itself. Developer tools, community events, educational content, and physical spaces all create switching costs that pure product quality cannot. But design for resilience — avoid dependencies where a single point of failure can collapse the entire system.
Principle 4
Structure governance for permanence, not liquidity.
Milton Hershey's decision to transfer his controlling stake to a charitable trust created a governance structure that has protected the company for over a century. The Trust's ~80% voting control means Hershey cannot be acquired, cannot be pressured by activist investors into value-destructive short-term actions, and must generate the steady dividends that fund its charitable mission. The time horizon is not quarterly — it is perpetual.
How charitable control creates shareholder value
| Feature | Conventional Public Company | Hershey (Trust-Controlled) |
|---|---|---|
| Time horizon | Quarterly / 3–5 year | Perpetual |
| Takeover vulnerability | Yes | Effectively zero |
| Activist pressure | High | Minimal |
| Capital allocation bias | Buybacks, growth at any cost | Steady dividends, disciplined reinvestment |
| Management accountability | Market for corporate control | Trust board oversight |
Benefit: The governance structure aligns the controlling shareholder's interests with genuinely long-term value creation. The Trust doesn't need the stock to go up next quarter — it needs the dividend to grow for decades. This orientation has allowed patient capital allocation and insulated the company from the short-termism that plagues many public companies.
Tradeoff: The absence of takeover discipline can breed complacency. There have been periods — notably in the 1980s and early 2000s — where Hershey underperformed peers, and the normal market mechanism for correcting underperformance (acquisition) was unavailable. Trust governance also creates opacity: the Trust's decision-making is not always transparent to minority shareholders.
Tactic for operators: If you are structuring a company for multi-generational duration, consider dual-class share structures or mission-locked governance that insulates strategic decisions from short-term capital market pressure. But build in accountability mechanisms — the absence of external discipline requires stronger internal discipline.
Principle 5
Acquire builders, not inventions.
Michele Buck's M&A strategy was defined by a brutally honest self-assessment: "We are builders of brands, we are not necessarily creators." Hershey's acquisitions — SkinnyPop, Pirate's Booty, Dot's Pretzels — were not technology acquisitions or acqui-hires. They were brand acquisitions, purchased because Hershey's distribution system and marketing capabilities could scale them faster and more efficiently than their original owners.
The criteria Buck established — minimum $100 million in sales, high margins, ease of integration — are notable for what they exclude: early-stage bets, turnaround plays, and anything requiring Hershey to develop capabilities it doesn't possess. This is a company that knows what it is and what it isn't.
Benefit: Disciplined acquisition criteria dramatically reduce integration risk and ensure that the acquired business can immediately benefit from Hershey's scale advantages (distribution, shelf placement, marketing spend).
Tradeoff: The criteria exclude transformational bets. Hershey will never acquire the next category-creating brand at an early stage because the $100 million revenue floor eliminates pre-scale companies. Buck herself acknowledged the approach was slower than she wished.
Tactic for operators: Before pursuing M&A, define with surgical precision what your company does well and what it does not. Acquire businesses that need what you have (distribution, capital, operational excellence) rather than businesses that have what you need (technology, talent, a business model you haven't proven). The former is scaling. The latter is hoping.
Principle 6
Treat the commodity exposure as a feature to price through.
Hershey's dependence on cocoa — a volatile agricultural commodity concentrated in West Africa — is its most visible structural vulnerability. But the company has historically treated commodity exposure not as a risk to hedge away entirely but as a pricing opportunity. When cocoa prices rise, Hershey raises consumer prices. When prices fall, margins expand. The brand power that allows consumers to absorb a 10% price increase on a Reese's Cup without switching to a private-label alternative is, itself, the return on 130 years of brand investment.
The 2024 cocoa spike tested this principle at unprecedented levels. Prices quadrupled. Hershey raised prices, adjusted pack sizes, and leaned on its non-cocoa businesses (salty snacks). The stock fell, margins compressed, and analysts complained. But unit volumes held. The brand absorbed the shock.
Benefit: Pricing power transforms commodity volatility from a pure cost risk into a profit-margin management tool. Companies with sufficient brand power can pass through input cost increases, which effectively transfers commodity risk to the consumer.
Tradeoff: There are limits. The 2024 cocoa crisis demonstrated that even Hershey's brand power has a breaking point — or at least a stress point — when input costs quadruple. Extended periods of extreme pricing may erode consumer goodwill and invite private-label competition.
Tactic for operators: If your business depends on volatile inputs, invest obsessively in brand power. Pricing power is not a financial strategy — it is a brand strategy. The ability to pass through costs is not granted by spreadsheets. It is earned by decades of consumer trust.
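A rough way to see how pricing power interacts with an input shock is to model margin per dollar of revenue. The sketch below is illustrative only: the 5% input share of revenue, the 20% starting margin, and both function names are assumptions chosen for the arithmetic, not Hershey disclosures, and the model ignores hedging, pack-size changes, and volume effects.

```python
def margin_after_shock(base_margin, input_share, input_multiple, price_increase):
    """Operating margin after an input-cost shock and a price increase.

    All quantities are per $1.00 of pre-shock revenue, with unit volume held flat.
    input_share is the input's cost as a fraction of pre-shock revenue (hypothetical).
    """
    base_cost = 1.0 - base_margin
    added_cost = input_share * (input_multiple - 1.0)  # extra cost from the shock
    new_price = 1.0 + price_increase
    return (new_price - (base_cost + added_cost)) / new_price


def breakeven_price_increase(base_margin, input_share, input_multiple):
    """Price increase needed to restore the original percentage margin."""
    new_cost = (1.0 - base_margin) + input_share * (input_multiple - 1.0)
    return new_cost / (1.0 - base_margin) - 1.0


# Hypothetical inputs: 20% margin, input equal to 5% of revenue, input price quadruples.
squeezed = margin_after_shock(0.20, 0.05, 4.0, price_increase=0.10)
needed = breakeven_price_increase(0.20, 0.05, 4.0)
print(f"margin after a 10% price increase: {squeezed:.1%}")
print(f"price increase needed to restore 20%: {needed:.2%}")
```

With these hypothetical inputs, a 10% price increase leaves the margin at about 13.6%, and restoring the original 20% would take a price increase of 18.75%. The gap between those two numbers is the stress point the tradeoff describes.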
Principle 7
Extend the brand into adjacent occasions, not adjacent identities.
Hershey's expansion into salty snacks was not a brand extension — it was an occasion extension. The company was not trying to make Hershey-branded pretzels. It was trying to capture more of the American snacking occasion by adding SkinnyPop, Dot's, and Pirate's Booty to a distribution system already reaching every convenience store, grocery chain, and mass retailer in the country. The Hershey name does not appear prominently on these products. What Hershey provides is infrastructure, not identity.
This distinction matters. Brand extensions that dilute the core identity (Hershey-branded salad dressing, hypothetically) destroy value. Occasion extensions that leverage distribution infrastructure without borrowing the brand name preserve the core while expanding the addressable market.
Benefit: Occasion extension preserves brand equity while expanding the revenue base. The core Hershey and Reese's brands remain unmixed with non-confectionery associations.
Tradeoff: Without the Hershey brand on salty snack products, the company must build or buy brand equity separately for each new category. This is slower and more expensive than leveraging the master brand.
Tactic for operators: When expanding into adjacent categories, ask whether you are extending your brand or your infrastructure. If the answer is infrastructure, keep the brands separate.
Distribution systems are promiscuous. Brand identities are monogamous.
Principle 8
Use constraint as competitive advantage.
Hershey's inability to be acquired, its governance by a charitable trust, its geographic concentration in central Pennsylvania, its dependence on a single raw material — these are constraints. And constraints, for 130 years, have forced the company to develop capabilities it would not have built if acquisition or diversification had been easy options.
The Trust structure forced long-term thinking. The inability to acquire (or be acquired by) a global giant forced Hershey to maximize the domestic market rather than chasing international scale. The geographic concentration forced deep community investment. The commodity dependence forced pricing discipline and brand investment.
Benefit: Constraints narrow the decision space, which reduces strategic drift and forces depth over breadth. Hershey's domestic dominance is partly a consequence of being unable to pursue the global diversification strategies of its competitors.
Tradeoff: Constraints are only advantages when the constrained space is large enough to support growth. Hershey's domestic focus works because the U.S. confectionery market is enormous. In a smaller market, the same constraints would be fatal.
Tactic for operators: Audit your constraints. Not the ones you complain about — the ones you've adapted to so thoroughly that you no longer notice them. Some of those constraints may be the source of your deepest competitive advantages, precisely because they forced you to develop capabilities your unconstrained competitors never had to build.
Principle 9
Compound small advantages across long time horizons.
Hershey has never made a single bet-the-company move (other than Milton's original pivot from caramels to chocolate). The company's history is instead a story of incremental compounding: small product-line extensions, gradual geographic expansion, steady pricing power, disciplined capital allocation, and a governance structure that prevents the kind of dramatic strategic pivots — leveraged buyouts, mega-mergers, wholesale reinventions — that destroy value at least as often as they create it.
The stock's performance reflects this: roughly 11% annualized returns over four decades, with remarkably low volatility relative to the broader market. This is not exciting. It is, however, extraordinarily rare. The number of companies that can compound at double-digit rates for forty years is vanishingly small.
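To make the compounding concrete, the short sketch below works out what a constant annual return does over forty years. It assumes the roughly 11% figure is applied as a flat annual rate with no interruptions; the 7% comparison rate and the function name `compound_multiple` are illustrative assumptions, not figures from Hershey.

```python
def compound_multiple(annual_rate: float, years: int) -> float:
    """Growth multiple of one unit compounded once a year at annual_rate."""
    return (1.0 + annual_rate) ** years


# Roughly 11% a year for four decades, per the text.
multiple_11 = compound_multiple(0.11, 40)

# Hypothetical comparison rate, chosen only to show the sensitivity to the rate.
multiple_7 = compound_multiple(0.07, 40)

print(f"11% for 40 years: {multiple_11:.1f}x")
print(f" 7% for 40 years: {multiple_7:.1f}x")
```

Under these assumptions, 11% sustained for forty years multiplies the starting value roughly 65-fold, more than four times the multiple produced by the hypothetical 7% rate. The difference comes almost entirely from duration, which is the point of the principle.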
Benefit: Compounding rewards patience and consistency over brilliance and boldness. A 20% operating margin maintained for decades is more valuable than a 40% margin achieved for three years.
Tradeoff: Incremental compounding requires that the base business remain healthy. If the core erodes — through commodity shocks, health trends, or competitive disruption — incremental improvements cannot offset structural decline. The approach is fragile to discontinuities.
Tactic for operators: If your business has a durable competitive advantage, resist the temptation to pursue dramatic strategic pivots. Invest in deepening the advantage, extending its duration, and compounding its returns. The unsexy path of consistent 10–12% growth, maintained for decades, creates more value than most "transformational" strategies.
Principle 10
Never confuse treats with food.
Hershey's resilience through every health-consciousness wave of the past fifty years — from the 1970s health-food movement to the 1990s low-fat craze to the current MAHA movement — rests on a psychological insight that Buck articulated clearly: confectionery does not compete with healthy food. It competes with other forms of small indulgence. The relevant competitive set for a Reese's Cup is not a carrot stick. It is a $6 latte, a glass of wine, a moment of self-reward.
This positioning is both descriptively accurate and strategically crucial. Companies that try to make confectionery "healthy" (reducing sugar, adding protein, reformulating for nutritional virtue) almost invariably destroy the indulgence value that drives the purchase. The consumer who wants healthy food buys healthy food. The consumer who wants a treat buys a treat. The worst strategic error is to serve neither audience well.
Benefit: Clear psychological positioning in the "treat" category insulates confectionery from health trends that primarily affect everyday food consumption. The sales record supports this: U.S. candy sales have continued to grow through successive health-consciousness cycles.
Tradeoff: This positioning limits the credibility of better-for-you line extensions. ONE Bars (protein bars) and SkinnyPop (positioned as a lighter snack) sit in tension with the core indulgence identity. Managing both requires brand architecture discipline.
Tactic for operators: Understand which psychological need your product serves — and defend that positioning aggressively. Do not let external pressure (health trends, regulatory scrutiny, media narrative) push you to redefine your category in ways that weaken the core value proposition. Be honest about what you sell and whom you sell it to.
Conclusion
The Permanent Company
The principles above share a common thread: they are principles of duration. Hershey has not disrupted anything. It has not created a platform or built a network effect or achieved hypergrowth. What it has done — for 130 years, through depression and war and commodity shocks and changing consumer tastes — is endure. And endurance, in a business world obsessed with speed and disruption, is itself a form of competitive advantage.
The operators who study Hershey will not find a playbook for building the next unicorn. They will find something arguably more valuable: a model for building a business that can outlast its founder, its competitors, and the assumptions of the era in which it was created. Milton Hershey built a company that could survive him by a century. The principles that enabled that survival — category definition, brand patience, structural governance, disciplined acquisition, occasion extension, constraint exploitation, and relentless compounding — are available to any operator willing to trade speed for permanence.
The question is whether that trade is worth making. Hershey's 130-year track record suggests the answer.
Part III
Business Breakdown
The Business at a Glance
Current Vital Signs
Hershey FY2024
$11.2B Net sales
~20% Operating margin
~$1.6B Free cash flow (FY2022 reference)
~19,000 Employees
~$39B Market capitalization (early 2024)
22% Return on invested capital (ROIC)
80+ Countries with product distribution
Hershey is the largest confectionery company in the United States and the fourth-largest globally, behind Mars, Ferrero, and Mondelez International. The company's scale is concentrated in North America, where it commands dominant market share in both the chocolate and overall confectionery categories. International operations, while profitable and growing, remain a relatively small contributor to total revenue — a structural choice reflecting both the governance constraints described above and the difficulty of exporting a flavor profile calibrated for American tastes.
The business is organized into three reporting segments: North America Confectionery (NAC), North America Salty Snacks (NASS), and International. NAC, which includes the Hershey bar, Reese's, Kisses, Kit Kat (under license from Nestlé in the U.S.), Jolly Rancher, Twizzlers, and Bubble Yum, accounts for approximately 81% of total revenue. NASS, driven by SkinnyPop, Dot's Pretzels, and Pirate's Booty, contributes approximately 10%. International rounds out the remainder.
How Hershey Makes Money
Hershey's revenue model is conceptually simple: the company manufactures branded confectionery and snack products and sells them to retailers (grocery chains, mass merchandisers, convenience stores, dollar stores, drug stores, and wholesale distributors), who sell them to consumers. There is no subscription model, no platform fee, no recurring SaaS revenue. The business runs on the oldest model in capitalism: make a product, sell the product, repeat.
The sophistication lies in the execution. Hershey's pricing power — the ability to raise the price of a Reese's Cup by 8-10% without losing meaningful unit volume — is the financial manifestation of brand equity accumulated over more than a century. The company's relationships with major retailers give it privileged shelf placement, particularly during the seasonal holidays (Halloween, Christmas, Valentine's Day, Easter) that drive a disproportionate share of confectionery sales.
Hershey's three operating segments
| Segment | % of Revenue (approx.) | Key Brands | Growth Profile |
|---|---|---|---|
| North America Confectionery | ~81% | Reese's, Hershey's, Kisses, Kit Kat (U.S. license), Jolly Rancher, Twizzlers | Mature, pricing-driven |
| North America Salty Snacks | ~10% | SkinnyPop, Dot's Pretzels, Pirate's Booty | Expanding |
| International | ~9% | Hershey's, Reese's, Jolly Rancher (global markets) | |
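As a quick cross-check, the approximate segment shares can be translated into dollar terms. The sketch assumes the FY2024 net sales figure of $11.2 billion and the rounded shares in the table, so the resulting amounts are approximations and not reported segment revenue.

```python
NET_SALES_BN = 11.2  # FY2024 net sales in billions of dollars, per the text

# Approximate revenue shares from the segment table.
SEGMENT_SHARE = {
    "North America Confectionery": 0.81,
    "North America Salty Snacks": 0.10,
    "International": 0.09,
}

# Implied revenue per segment, in billions of dollars, rounded to two decimals.
segment_revenue_bn = {
    name: round(NET_SALES_BN * share, 2) for name, share in SEGMENT_SHARE.items()
}

for name, revenue in segment_revenue_bn.items():
    print(f"{name}: ~${revenue}B")
```

On these rounded inputs, confectionery is roughly a $9 billion business, while salty snacks and international each contribute about $1 billion.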
The unit economics of confectionery are attractive. Raw materials (cocoa, sugar, milk, peanuts, packaging) represent the largest cost component, but the value-add from branding and manufacturing allows gross margins consistently above 40%. Operating margins have historically hovered around 20%, supported by the company's scale advantages in procurement, manufacturing, and distribution.
Free cash flow generation is strong and consistent — the business requires modest capital expenditure relative to its revenue, and the working capital cycle is well-managed.
Seasonal dynamics are pronounced. Halloween alone accounts for a significant percentage of annual U.S. confectionery sales, and the broadly defined holiday season (October through February, spanning Halloween, Christmas, and Valentine's Day) is the industry's peak. Hershey's product portfolio is particularly well-positioned for seasonal gifting (Kisses in foil, boxed chocolates, seasonal shapes of Reese's Cups).
Competitive Position and Moat
Hershey operates in one of the most concentrated consumer products markets in the world. In U.S. confectionery, Hershey and Mars together command the vast majority of the market. Globally, the competitive landscape includes Mars (private, estimated $47 billion in annual revenue across all divisions), Ferrero (private, ~$17 billion), Mondelez International (public, ~$36 billion), Nestlé (public, with a massive global confectionery business), and Lindt & Sprüngli (public, ~$5 billion).
Hershey's moat derives from five reinforcing sources:
1. Brand Power. Hershey's, Reese's, and Kisses are among the most recognized consumer brands in the United States. Brand recognition translates directly into pricing power — the ability to raise prices without proportional volume loss. This was demonstrated repeatedly during the 2022–2024 inflationary environment.
2. Sensory Imprinting. As discussed extensively, the specific flavor profile of Hershey's chocolate is neurologically imprinted on American consumers from childhood. This creates a switching cost that is pre-conscious and non-financial — the most durable form of lock-in available to a consumer products company.
3. Distribution and Shelf Placement. Hershey's distribution infrastructure reaches virtually every point of sale in the United States — grocery, convenience, mass merchandise, dollar stores, drug stores, vending machines. The company's category management expertise (helping retailers optimize their confectionery sections) creates deep retailer relationships that are difficult for smaller competitors to replicate.
4. Scale Economies. As the largest U.S. confectionery manufacturer, Hershey enjoys procurement advantages in cocoa, sugar, and milk, as well as manufacturing efficiencies across its network of production facilities. These scale advantages widen as smaller competitors face the same input cost pressures without the same ability to negotiate pricing.
5. Governance Moat. The Hershey Trust's voting control prevents hostile acquisition, which means competitors cannot consolidate Hershey's brands and distribution into a larger entity. This effectively preserves an independent competitor in the market and prevents the kind of industry consolidation that has occurred in beer, spirits, and other consumer categories.
Sources of durable competitive advantage
| Moat Source | Strength | Vulnerability |
|---|---|---|
| Brand power / pricing | Very strong | Extreme input cost inflation (2024 cocoa crisis) |
| Sensory imprinting | Very strong | Does not transfer internationally |
| Distribution / shelf placement | Very strong | Retailer consolidation, private-label pressure |
| Scale economies | |
The moat's weakness is geographic. Hershey's brand power, sensory imprinting, and distribution dominance are overwhelmingly U.S.-centric. International expansion has been slow and difficult. Mars and Ferrero, both private companies with multi-generational time horizons of their own, dominate global confectionery in ways Hershey has not been able to match.
The Flywheel
Hershey's compounding engine operates through a reinforcing cycle that has been turning, in recognizable form, for over a century:
Reinforcing cycle of brand, distribution, and pricing power
1. Brand recognition → Pricing power. 130 years of consumer exposure, supported by category-defining flavor and ubiquitous distribution, creates brand equity that allows above-inflation price increases.
2. Pricing power → Margin expansion. The ability to pass through (and sometimes exceed) input cost increases generates consistent operating margins above 20%, even in volatile commodity environments.
3. Margin expansion → Cash flow generation. High margins on an $11+ billion revenue base generate substantial free cash flow (~$1.5–2.0 billion annually in normal commodity environments).
4. Cash flow → Reinvestment. Free cash flow funds dividends to the Trust (supporting the school), share buybacks, and strategic acquisitions (salty snacks, better-for-you categories) that expand the addressable market.
5. Reinvestment → Distribution breadth. Acquisitions like SkinnyPop and Dot's leverage Hershey's existing distribution system, adding products to an infrastructure already reaching every major retail channel in the U.S.
6. Distribution breadth → Brand exposure. More products on more shelves create more consumer touchpoints, reinforcing brand recognition, and the cycle begins again.
The flywheel's speed is slow. This is not a viral loop or a network effect that compounds weekly. It compounds annually and generationally. The power of the flywheel is not its velocity but its durability — it has been spinning, with minor interruptions, since the early 1900s.
Growth Drivers and Strategic Outlook
Five specific vectors are likely to drive Hershey's growth over the next five to ten years:
1. Salty Snacks Scaling. The NASS segment, currently ~10% of revenue, has significant room to grow. The U.S. salty snacks market exceeds $30 billion, and Hershey's portfolio (SkinnyPop, Dot's, Pirate's Booty) is positioned in the fastest-growing subcategories. Under Kirk Tanner, who brings deep experience from PepsiCo/Frito-Lay and Wendy's, further M&A in the snacking space is likely.
2. Pricing Realization. Hershey has demonstrated consistent ability to take 5–10% annual price increases without proportional volume decline. In inflationary environments, this creates significant revenue growth even with flat or modestly declining volumes. The key question is whether the post-2024 cocoa environment allows prices to hold at elevated levels while input costs normalize — a scenario that would generate substantial margin expansion.
3. Seasonal and Occasion Expansion. Hershey has expanded beyond the traditional "Big Four" seasons (Halloween, Christmas, Valentine's Day, Easter) into movie-theater snacking, everyday snacking occasions, and gifting formats. Each new occasion represents incremental volume for existing brands.
4. International Expansion. While Hershey's international business remains small, the company has targeted specific markets (Mexico, Brazil, India, China) where rising incomes and growing middle classes create confectionery demand. The global confectionery market is estimated at over $200 billion, and with only ~9% of Hershey's revenue coming from international operations, there is significant whitespace, assuming the product adaptation challenge (flavor preferences, format preferences) can be navigated.
5. Digital and Data Capabilities. Hershey has invested in data analytics and AI tools for demand forecasting, consumer insights, and marketing optimization. The company was an early adopter of category management technology, using data to help retailers optimize confectionery shelf layouts — a capability that deepens retailer relationships and strengthens distribution positioning.
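The pricing realization argument in the second driver rests on simple price-volume arithmetic: revenue growth is the product of the price change and the volume change. The sketch below uses a hypothetical 7% price increase and 2% volume decline; neither number comes from Hershey's reporting, and the function names are illustrative.

```python
def revenue_growth(price_change: float, volume_change: float) -> float:
    """Revenue growth implied by a price change and a volume change."""
    return (1.0 + price_change) * (1.0 + volume_change) - 1.0


def breakeven_volume_change(price_change: float) -> float:
    """Volume change at which a price increase leaves revenue exactly flat."""
    return 1.0 / (1.0 + price_change) - 1.0


# Hypothetical inputs: prices up 7%, unit volume down 2%.
growth = revenue_growth(0.07, -0.02)
floor = breakeven_volume_change(0.07)

print(f"revenue growth: {growth:.2%}")
print(f"volume decline that would erase it: {floor:.2%}")
```

In this illustration revenue still grows by about 4.9%, and volume would have to fall by roughly 6.5% before the price increase stopped paying for itself. That cushion is what "without proportional volume decline" means in practice.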
Key Risks and Debates
1. Cocoa Price Structural Shift. The 2024 cocoa crisis may not be cyclical. West African cocoa production faces long-term structural pressures: aging trees, climate change, soil depletion, and chronic underinvestment in farming infrastructure. If cocoa prices settle at a new, permanently higher baseline (say, $5,000–$7,000/metric ton versus the historical ~$2,500), Hershey's margin structure will be permanently different. The company can price through some of this, but a sustained 2–3x increase in its largest input cost would fundamentally alter the economics.
2. GLP-1 Drugs and the Appetite Suppression Effect. Ozempic, Wegovy, Mounjaro, and their successors represent a genuinely novel threat to confectionery demand. These drugs suppress appetite and reduce cravings for calorie-dense, sugary foods. If GLP-1 adoption reaches 10–15% of the U.S. adult population (current estimates suggest this is plausible by 2030), the impact on confectionery volumes could be material — potentially 3–5% of category demand, though estimates vary widely.
3. Private-Label and Dollar-Store Pressure. As Hershey raises prices, the gap between branded and private-label confectionery widens. Dollar stores and discount retailers, which have become an increasingly important channel, may shift shelf space toward private-label alternatives that offer higher margins for the retailer. Hershey's brand power provides insulation, but not immunity.
4. Trust Governance Conflicts. The Hershey Trust's fiduciary obligation runs to the Milton Hershey School, not to minority shareholders. In periods of strategic disagreement — whether to pursue a major acquisition, whether to accept a premium takeover bid, whether to alter dividend policy — the Trust's interests may diverge from those of other shareholders. The 2002 near-sale generated intense controversy and litigation. Future governance conflicts are not hypothetical.
5. CEO Transition Risk. Kirk Tanner, who succeeded Michele Buck in August 2025, inherits a company in transition — simultaneously managing a cocoa crisis, a strategic pivot into snacking, and the MAHA headwind. Tanner brings relevant experience (PepsiCo, Wendy's) but has no history at Hershey. The cultural fit between a new CEO and a 130-year-old company with a deeply entrenched identity is not guaranteed.
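The magnitudes in the first two risks follow from simple ratios. The sketch below recomputes them from the figures quoted in the list. The assumption that GLP-1 users previously bought confectionery at the population-average rate is an added simplification, and both function names are illustrative.

```python
HISTORICAL_COCOA_USD_PER_TON = 2500.0  # approximate historical level cited in the text


def cocoa_multiple(new_price_usd_per_ton: float) -> float:
    """New cocoa price expressed as a multiple of the historical baseline."""
    return new_price_usd_per_ton / HISTORICAL_COCOA_USD_PER_TON


def implied_user_cutback(category_impact: float, adoption_rate: float) -> float:
    """Average consumption cut per GLP-1 user implied by a category-level impact.

    Simplification: users are assumed to have bought at the population-average rate.
    """
    return category_impact / adoption_rate


low_multiple = cocoa_multiple(5000.0)
high_multiple = cocoa_multiple(7000.0)

# Smallest and largest per-user cutbacks consistent with the quoted ranges.
min_cutback = implied_user_cutback(0.03, 0.15)
max_cutback = implied_user_cutback(0.05, 0.10)

print(f"cocoa baseline multiple: {low_multiple:.1f}x to {high_multiple:.1f}x")
print(f"implied per-user cutback: {min_cutback:.0%} to {max_cutback:.0%}")
```

The cocoa scenario works out to 2.0x to 2.8x the historical price, consistent with the 2–3x range in the text. Under the stated simplification, the GLP-1 estimate implies that each user cuts confectionery purchases by somewhere between a fifth and a half, which helps explain why estimates vary so widely.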
Why Hershey Matters
Hershey matters to operators and investors not because it is the most innovative company in consumer products — it manifestly is not — but because it is among the most instructive. The company's 130-year history is a masterclass in the compounding of seemingly modest competitive advantages over genuinely long time horizons. Brand recognition accumulated without advertising for seven decades. A flavor profile that became the American definition of chocolate through industrial standardization rather than artisanal quality. A governance structure that was designed for philanthropy and became, accidentally, the most effective anti-takeover defense in corporate America. A distribution system built for chocolate bars that now carries popcorn and pretzels.
None of these advantages, individually, is dramatic. Collectively, compounded across a century, they have produced a $39 billion enterprise generating over $11 billion in annual revenue with consistent 20%+ operating margins and double-digit annualized shareholder returns sustained across four decades. The lesson is not specific to confectionery. It is about the power of duration itself — the insight that in many industries, the most valuable strategic asset is not the ability to move fast but the accumulated weight of having been there, consistently and profitably, for a very long time.
Milton Hershey failed twice before succeeding. He bet everything on a commodity product in a commodity industry. He built a factory in a cornfield, a town around the factory, and a school to inherit both. One hundred and thirty years later, the streetlights in that town are still shaped like Kisses, and the orphanage still controls the company. The most enduring businesses are not always the most brilliant. Sometimes they are simply the ones that refused to stop compounding.