Redundancy is duplicate capacity or parallel paths so that if one element fails, the system can still function. It is a hedge against failure: you pay a cost (extra components, complexity, maintenance) for the ability to absorb the loss of a part. The trade-off is explicit: redundancy costs more in the normal case and pays off when the primary fails. When the cost of failure is high — life, capital, continuity — redundancy is often worth it. When the cost of failure is low or the failure mode is unlikely, redundancy can be waste.
Redundancy can be passive (standby capacity that activates on failure) or active (multiple elements working in parallel, e.g. load sharing). It can apply to hardware, people, data, or process. The key question is: redundant with respect to what? A second server is redundant for server failure but not for power failure; a second data centre is redundant for site failure but not for a bad deploy. Identify the failure mode you're hedging, then add redundancy that actually covers it.
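The passive case can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: `call_with_failover`, `primary`, and `standby` are hypothetical names, and the sketch assumes a failure shows up as a raised exception.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider()
        except Exception as error:  # assumption: any exception counts as failure
            last_error = error
    # Every provider failed (or none were given): the redundancy is exhausted.
    raise RuntimeError("all providers failed") from last_error


def primary() -> str:
    # Hypothetical primary that is currently down.
    raise ConnectionError("primary is down")


def standby() -> str:
    # Hypothetical standby that only does work when the primary fails.
    return "served by standby"


print(call_with_failover([primary, standby]))
```

The standby here only covers failures that affect `primary` alone. If both callables depended on the same network or the same credentials, the loop would simply fail twice.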
Redundancy has downsides. It increases complexity and can create hidden single points of failure (e.g. two systems that share one dependency). It can also induce complacency: "we have a backup" can mean less care with the primary. The discipline is to design redundancy so it is independent of the primary for the failure modes that matter, and to test it so you know it works when needed.
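One way to look for hidden single points of failure is to list what each redundant element depends on and intersect the lists. The sketch below assumes you can enumerate dependencies by name; the inventory and the function name `shared_dependencies` are hypothetical.

```python
from typing import Dict, Set


def shared_dependencies(replicas: Dict[str, Set[str]]) -> Set[str]:
    """Return the dependencies that every replica relies on.

    Anything in the result is a single point of failure hiding behind
    the redundancy: if it fails, all replicas fail together.
    """
    dependency_sets = list(replicas.values())
    if not dependency_sets:
        return set()
    return set.intersection(*dependency_sets)


# Hypothetical inventory: two servers that look redundant.
replicas = {
    "server_a": {"rack_1_power", "vendor_x_dns", "disk_a"},
    "server_b": {"rack_1_power", "vendor_x_dns", "disk_b"},
}

print(sorted(shared_dependencies(replicas)))
```

In this made-up inventory the two servers are independent for disk failure but share power and DNS, so they are not redundant for those failure modes.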
Section 2
How to See It
Redundancy shows up wherever there is backup capacity or parallel capability. Look for: duplicate systems, failover, standby, or multiple paths to the same outcome. When someone says "we have a backup" or "we run two of those," redundancy is in play. The diagnostic: for which failure mode does this redundancy actually protect?
Business
You're seeing Redundancy when a company has a backup supplier for a critical component. If the primary supplier fails (quality, delivery, or insolvency), the second can ramp up. The redundancy has costs: dual qualification, possibly higher prices, and relationship management. It pays off when the primary fails. The same logic applies to key-person risk: backup coverage, documentation, and cross-training.
Technology
You're seeing Redundancy when a service runs on multiple instances across availability zones. One zone or instance can fail; the others continue. The redundancy is for hardware and zone failure — not for a bad code deploy or a security breach that affects all instances. Redundancy is always relative to a failure mode.
Investing
You're seeing Redundancy when a portfolio holds uncorrelated or negatively correlated assets. When one segment falls, others may hold or rise. The redundancy is for concentration risk. The cost is dilution of upside in the best-performing asset; the benefit is survival in a drawdown. Position sizing and diversification are forms of redundancy.
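The effect of correlation can be made concrete with the standard two-asset volatility formula. The sketch below is illustrative only: the weights, volatilities, and correlations are made-up inputs, and `two_asset_volatility` is a hypothetical helper name.

```python
import math


def two_asset_volatility(w1: float, sigma1: float, sigma2: float, rho: float) -> float:
    """Volatility of a two-asset portfolio with weight w1 in asset 1.

    variance = (w1*s1)^2 + (w2*s2)^2 + 2*w1*w2*rho*s1*s2, with w2 = 1 - w1.
    """
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical inputs: equal weights, both assets at 20% volatility.
for rho in (1.0, 0.0, -0.5):
    print(rho, round(two_asset_volatility(0.5, 0.20, 0.20, rho), 4))
```

With perfectly correlated assets the second holding adds no protection; as the correlation falls, the same two holdings produce a less volatile portfolio. That is the sense in which the redundancy only counts when the assets fail independently.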
Markets
You're seeing Redundancy when a clearinghouse or exchange has multiple liquidity providers or backup settlement paths. If one path fails, another can clear. The redundancy protects against operational or counterparty failure. The design question: are the backup paths independent of the primary for the failure modes we care about?
Section 3
How to Use It
Decision filter
"Before adding redundancy, name the failure mode you're hedging. Add redundancy that is independent for that mode. Test it. Weigh the cost of redundancy against the cost and probability of failure. Don't assume redundancy is always good — it has cost and can create false confidence."
As a founder
Use redundancy where failure would be existential or very costly: critical infrastructure, key person, single supplier, or single distribution channel. For each, ask: what fails? Is the backup independent for that failure? Do we test it? The mistake: redundant systems that share a single point of failure (e.g. same power, same vendor). The second mistake: no redundancy where one failure would kill the company, because "we'll fix it before it happens."
As an investor
Assess redundancy in critical paths: tech, key people, supply chain, regulatory. Companies that have no backup for a single point of failure carry tail risk. Companies that have tested, independent redundancy in the right places are more resilient. Check whether "we have redundancy" means real independence or shared dependency.
As a decision-maker
Add redundancy when the expected cost of failure (probability × impact) exceeds the cost of the redundancy. Define the failure mode; ensure the redundant element would actually work when that mode occurs. Test; don't assume. For low-stakes or reversible failure, redundancy may not be worth the cost.
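The rule can be written as a one-line comparison. The sketch below rests on simplifying assumptions that the text does not state: probability and cost are measured over the same period, and the redundancy fully covers the named failure mode. The function name and the figures are hypothetical.

```python
def redundancy_is_worth_it(
    failure_probability: float,
    failure_impact: float,
    redundancy_cost: float,
) -> bool:
    """Return True when expected failure cost exceeds the cost of redundancy.

    Assumes probability and costs refer to the same period (e.g. one year)
    and that the redundancy fully covers the failure mode in question.
    """
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    expected_failure_cost = failure_probability * failure_impact
    return expected_failure_cost > redundancy_cost


# Hypothetical figures: a 2% annual chance of an outage.
print(redundancy_is_worth_it(0.02, 5_000_000, 40_000))  # costly outage
print(redundancy_is_worth_it(0.02, 500_000, 40_000))    # cheaper outage
```

In the first case the expected loss is about 100,000 against a 40,000 backup, so the hedge pays. In the second the expected loss is about 10,000 and the same backup is hard to justify on these numbers alone.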
Common misapplication: Redundancy that doesn't cover the real failure mode. Two servers in the same rack are redundant for server failure but not for power or fire. Design redundancy for the failure you're hedging.
Second misapplication: Treating redundancy as a substitute for fixing the primary. Redundancy is a hedge; it doesn't remove the incentive to make the primary reliable. Over-reliance on backup can mask fragility in the main system.
FedEx built redundancy into the network: multiple hubs, aircraft, and routes so that weather, mechanical failure, or congestion in one place doesn't stop delivery. The redundancy is for operational failure; the company tests and drills so that when one path is blocked, another is used. Fred Smith's insight was that reliability requires redundant capacity and clear procedures for using it, not just hope that the primary never fails.
Netflix designed for redundancy and failure: multi-region deployment, chaos engineering, and the expectation that components will fail. The redundancy is architectural (many instances, many regions) and cultural (practicing failure so the system survives). Reed Hastings framed "design for failure" as a competitive advantage: the system that assumes failure and has redundant paths is more reliable than the one that assumes success.
Section 6
Visual Explanation
Redundancy: parallel or standby elements so that if one fails, the system continues. Redundancy must be independent for the failure mode you're hedging.
Section 7
Connected Models
Redundancy sits with fail-safes, margin of safety, and resilience. The models below either implement it (Backup System Model), extend it (Defense in Depth), or explain when systems still fail (Normal Accidents).
Reinforces
[Fail-safes](/mental-models/fail-safes)
Fail-safes are mechanisms that trigger when something goes wrong. Redundancy is one form: when the primary fails, the backup takes over. Fail-safes can also include circuit breakers and shutdowns. Redundancy provides capacity; fail-safes can activate it or protect when redundancy isn't enough.
Reinforces
[Margin of Safety](/mental-models/margin-of-safety) (Systems)
Margin of safety is a buffer against failure. Redundancy is a form of margin: extra capacity so that the loss of one part doesn't breach the limit. The two align: margin can be capacity, time, or redundancy; the aim is to survive failure.
Leads-to
Backup System Model
The backup system model is the explicit design of standby or parallel systems. Redundancy is the principle; the backup system model is the implementation. Design the backup so that it is independent and tested.
Reinforces
[Resilience](/mental-models/resilience)
Resilience is the ability to absorb disruption and recover. Redundancy contributes: when one path fails, another can carry load. Resilience can also come from flexibility and fast recovery; redundancy is one lever.
Section 8
One Key Quote
"Failure is not an option — but we plan for it anyway."
— Gene Kranz, NASA Flight Director
The line captures the mindset: aim for success, but assume failure will occur and design redundancy and procedures so the system can survive. Redundancy is part of that plan.
Section 9
Analyst's Take
Faster Than Normal — Editorial View
Redundancy is always relative to a failure mode. Two data centres are redundant for site failure; they're not redundant for a bad global deploy or a shared vendor. Name the failure you're hedging, then check that your redundancy is independent for that failure. Most "redundancy" discussions skip this and assume any backup is enough.
Test the backup. Redundancy that isn't tested is a hope, not a design. Run failover drills, restore from backup, and simulate the failure. You'll find shared dependencies and broken procedures. Fix them before the real failure.
Cost versus consequence. Not everything needs redundancy. Where failure is cheap or reversible, redundancy may be waste. Where failure is existential or very costly, redundancy is insurance. Make the trade-off explicit: probability × impact versus cost of redundancy.
Section 10
Summary
Redundancy is duplicate or parallel capacity so the system can continue when one part fails. It costs more in the normal case and pays off when failure occurs. Design redundancy for specific failure modes and ensure the backup is independent for those modes. Test it. Weigh cost against the expected cost of failure; don't add redundancy everywhere.
Test redundancy and failure paths by deliberately breaking things. Practice so that when failure happens, the system and team are ready.
Leads-to
Defense in Depth
Defense in depth is multiple layers of protection. Redundancy can be one layer — e.g. redundant servers behind redundant networks. The two combine: depth means more than one line of defense; redundancy means duplicate capacity within or across layers.
Normal Accidents theory says complex, tightly coupled systems can have failures that cascade across redundant elements (common-mode or cascading failure). Redundancy helps when failures are independent; it can fail when they're not. Design for independence and test for common-mode failure.
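A small calculation shows why independence matters so much. The model below is a deliberately simple assumption, not something taken from Normal Accidents theory itself: each of two redundant elements fails on its own with probability `p_independent`, and a shared cause takes out both with probability `p_common`. The function name is hypothetical.

```python
def pair_failure_probability(p_independent: float, p_common: float) -> float:
    """Probability that both elements of a redundant pair are down.

    Simplified model: a common cause fails both at once with probability
    p_common; otherwise each element fails independently with probability
    p_independent, so both fail with probability p_independent ** 2.
    """
    for p in (p_independent, p_common):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    return p_common + (1.0 - p_common) * p_independent ** 2


# Hypothetical numbers: each element fails 1% of the time on its own.
print(pair_failure_probability(0.01, 0.0))    # fully independent pair
print(pair_failure_probability(0.01, 0.005))  # 0.5% chance of a shared cause
```

With no shared cause the pair fails roughly once in ten thousand periods. A shared cause that strikes only half a percent of the time pushes that to about one in two hundred, so the common-mode term dominates and the second element adds far less protection than it appears to.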