A backup system is a parallel capability that takes over when the primary fails. The model is simple: identify single points of failure, then add fallbacks that activate automatically or with minimal delay. The goal is not to prevent every failure but to ensure the system continues operating when components break. Critical infrastructure — power grids, data centres, payment networks — runs on this logic. One path carries the load; the other sits idle until needed. The discipline is designing the handoff so it happens before damage compounds.
Backups differ by activation. Hot backups run in parallel and switch instantly (database replicas, redundant servers). Warm backups need a short ramp (standby data centre, backup team). Cold backups require manual deployment (tape archives, contingency plans). The trade-off is cost versus recovery time. Hot is expensive; cold is cheap but slow. Most organisations mix: hot for customer-facing systems, warm for internal tools, cold for disaster recovery. The mistake is treating "we have a backup" as sufficient without testing the switch. Untested backups fail when invoked.
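The cost-versus-recovery-time trade-off can be sketched as a small selection rule: pick the cheapest tier whose recovery time still meets the target. This is a minimal illustration, not a sizing tool. The `BackupTier` type, the `cheapest_tier` function, and every number below are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupTier:
    name: str
    recovery_minutes: float  # assumed time for the backup to take over
    monthly_cost: float      # assumed running cost, arbitrary units


# Illustrative figures only; real values depend on the system in question.
TIERS = [
    BackupTier("hot", recovery_minutes=1, monthly_cost=10_000),
    BackupTier("warm", recovery_minutes=60, monthly_cost=3_000),
    BackupTier("cold", recovery_minutes=24 * 60, monthly_cost=300),
]


def cheapest_tier(max_recovery_minutes: float) -> Optional[BackupTier]:
    """Return the cheapest tier that recovers within the target, or None."""
    candidates = [t for t in TIERS if t.recovery_minutes <= max_recovery_minutes]
    return min(candidates, key=lambda t: t.monthly_cost, default=None)


# A customer-facing system that tolerates 5 minutes of downtime needs "hot";
# an internal tool that tolerates 2 hours can use "warm".
customer_facing = cheapest_tier(5)
internal_tool = cheapest_tier(120)
```

A `None` result means no tier meets the target, which is itself useful information: either the target or the set of available backups has to change.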
The model extends beyond technology. Key-person risk is a single point of failure; succession plans and cross-training are backups. Single suppliers create counterparty risk; dual sourcing is backup. Concentration in one market or product is unbuffered; diversification is backup. The strategic question is always: what happens when this component goes to zero? If the answer is "we stop," you need a backup or you accept the risk explicitly. High-reliability organisations assume components will fail and design the system to absorb those failures.
Section 2
How to See It
Backup systems reveal themselves when primary paths fail and operations continue. Look for redundant capacity, failover procedures, and recovery-time targets. When an outage occurs and a secondary system takes over without customer impact, the backup system model is at work. The inverse signal: a single failure causes cascading stoppage because no fallback existed.
Business
You're seeing the Backup System Model when a payment processor's primary data centre goes offline and transactions route to a secondary region within seconds. Customers see no interruption because the backup was built, tested, and automated. The same logic applies to key executives: when a critical leader leaves, the organisation that has groomed a successor and documented decisions continues. The one that relied on one person stalls.
Technology
You're seeing the Backup System Model when a service is deployed across multiple availability zones (multi-AZ) so that a single zone failure doesn't take it down. Database replication, backup power, and geographic distribution are all backup-system design. The alternative (a single server, single region, no replica) is a bet that nothing will fail. That bet loses eventually.
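The failover idea can be sketched in a few lines: try the primary path, and move to the next path only when the current one fails. This is a simplified illustration rather than how any particular cloud provider implements it. The names `call_with_failover`, `AllPathsFailed`, `primary_zone`, and `replica_zone` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllPathsFailed(Exception):
    """Raised when the primary and every backup path have failed."""


def call_with_failover(paths: Sequence[Callable[[], T]]) -> T:
    """Try each path in order and return the first successful result."""
    errors: List[Exception] = []
    for path in paths:
        try:
            return path()
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(exc)
    raise AllPathsFailed(errors)


# Hypothetical stand-ins for a primary zone that is down and a healthy replica.
def primary_zone() -> str:
    raise ConnectionError("zone-a unavailable")


def replica_zone() -> str:
    return "served from zone-b"


result = call_with_failover([primary_zone, replica_zone])
```

With only `primary_zone` in the list, the call raises `AllPathsFailed`, which is the single-server, single-region bet described above.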
Investing
You're seeing the Backup System Model when a fund or company holds liquidity buffers and diversified counterparties so that one bank failure or one asset freeze doesn't force a fire sale. The backup looks optional until the day it is needed. Concentrated exposure to one custodian, one prime broker, or one currency means any fallback is untested, and often that there is no fallback at all.
Markets
You're seeing the Backup System Model when supply chains have alternate suppliers, or when a central bank has swap lines so that a liquidity crunch in one jurisdiction can be met with foreign-currency backup. In the 2008 crisis, institutions that depended on a single funding source were the most exposed when it dried up; those with access to several liquidity sources were better placed to survive.
Section 3
How to Use It
Decision filter
"Before depending on any single component — person, system, supplier, or path — ask: what is the backup? If the answer is unclear or 'we'll figure it out,' you have a single point of failure. Design the backup before the failure. Test it."
As a founder
Identify single points of failure in the business: key people, critical vendors, core systems. For each, add a backup — succession, dual sourcing, redundancy — and test it. Hot backup for revenue-critical paths (payments, auth); warm or cold for the rest. The mistake is assuming backups work. Run failover drills. Document recovery procedures. When a key person leaves or a supplier fails, the backup should already be in place.
As an investor
Assess whether portfolio companies have backup systems for critical dependencies. Single cloud provider, single customer, single key person — each is a risk. The question: can the company operate if this one thing goes away? Companies with tested backups are more resilient; those without are one failure away from a crisis. Value the ones that have run the drill.
As a decision-maker
Before committing to a single path, map the cost of adding a backup against the cost of failure. For low-impact failures, skip the backup. For high-impact ones, build it and test it. The backup system model is not "back up everything" — it's prioritising backup where failure is unacceptable and accepting risk where it isn't.
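The comparison of backup cost against failure cost can be written as a rough expected-loss calculation. This is a back-of-the-envelope sketch under stated assumptions: the function name, the parameters, and the figures are all hypothetical, and a plain expected-value rule understates failures that are unacceptable at any probability.

```python
def backup_is_worth_it(
    failure_probability: float,  # assumed chance of the failure per year, 0..1
    cost_of_failure: float,      # assumed loss if the failure hits with no backup
    backup_cost: float,          # assumed yearly cost to build and test the backup
    residual_loss: float = 0.0,  # assumed loss that remains even with the backup
) -> bool:
    """Compare expected yearly loss with and without the backup."""
    expected_loss_without = failure_probability * cost_of_failure
    expected_loss_with = failure_probability * residual_loss + backup_cost
    return expected_loss_with < expected_loss_without


# High-impact failure: 5% chance of losing 2,000,000 versus a 20,000 backup.
# Without: 0.05 * 2,000,000 = 100,000. With: 0.05 * 50,000 + 20,000 = 22,500.
high_impact = backup_is_worth_it(0.05, 2_000_000, 20_000, residual_loss=50_000)

# Low-impact failure: 1% chance of losing 10,000 versus a 5,000 backup.
# Without: 0.01 * 10,000 = 100. With: 5,000.
low_impact = backup_is_worth_it(0.01, 10_000, 5_000)
```

The first case favours building the backup and the second favours accepting the risk, which matches the rule of backing up where failure is unacceptable and skipping it where it isn't.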
Common misapplication: Treating backup as a one-time project. Backups decay. People leave, systems change, procedures go stale. Without periodic testing and updates, the backup may not work when needed. Schedule failover tests. Refresh documentation. Treat backup as a living capability.
Second misapplication: Backing up the wrong thing. Organisations often back up data or hardware while leaving processes, decisions, and key relationships undocumented. The backup that matters is the one for the constraint. If the constraint is a person's judgment, back up the decision framework and the successor, not just the files.
Netflix built its streaming infrastructure on redundant, multi-region AWS deployments. The company practises chaos engineering, deliberately killing servers and zones, to verify that backups and failover work. The backup system model is explicit: assume components will fail; design so that the service continues. That discipline helped Netflix scale while designing out single points of failure.
Epic provides electronic health records for large hospital systems. Judy Faulkner, Epic's founder and CEO, has emphasised reliability and backup: hospitals cannot afford downtime. Epic's architecture and deployment practices reflect a backup-system mindset, with redundant systems and recovery procedures so that when something fails, care delivery continues.
Section 6
Visual Explanation
Backup System Model — Primary path carries load; backup path is idle until primary fails. Failover must be independent, detected, and fast.
Section 7
Connected Models
Reinforces
Redundancy
Redundancy is the presence of extra capacity or components so that the system can tolerate loss. The backup system model is one way to implement redundancy: a dedicated backup path that activates on primary failure.
Reinforces
Fail-safes
Fail-safes are mechanisms that default to a safe state when something goes wrong. A backup system is a form of fail-safe: when the primary fails, the system fails over to the backup rather than failing entirely.
Reinforces
Margin of Safety (Systems)
Margin of safety in systems is extra capacity or buffer so that normal variance and failures don't breach the limit. Backup systems provide margin when the primary is lost.
Leads-to
Resilience
Resilience is the ability to recover from disruption. Backup systems are a direct contributor: they reduce recovery time and limit impact when the primary fails.
Leads-to
Defense in Depth
Section 8
One Key Quote
"One test is worth a thousand expert opinions."
— Wernher von Braun, on rocket design
The quote applies directly to backup systems. Expert opinion says the backup will work; only a test proves it. Organisations that run failover drills discover gaps. Those that never test discover those gaps during an outage.
Section 9
Analyst's Take
Faster Than Normal — Editorial View
Backup is not backup until it's tested. Many organisations claim to have backups yet have never run a failover. When the primary fails, an untested backup often fails too. The discipline is scheduling tests and treating backup as a live capability.
Prioritise backup by consequence. Backup the components whose failure would stop the system or cause unacceptable damage. For the rest, accept the risk or add cheaper mitigation.
Independence matters. A backup that shares the same failure mode as the primary is useless. Geographic and organisational separation increase the chance the backup works when called.
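One rough way to check independence is to list what each path depends on and look for overlap. The sketch below assumes dependencies can be written as simple labels; the function name and the example inventories are hypothetical.

```python
from typing import Set


def shared_failure_modes(primary: Set[str], backup: Set[str]) -> Set[str]:
    """Return the dependencies that both paths rely on."""
    return primary & backup


# Hypothetical dependency inventories for a primary and its backup.
PRIMARY = {"region:eu-west", "power:grid-a", "dns:provider-x", "team:platform"}
BACKUP = {"region:us-east", "power:grid-b", "dns:provider-x", "team:platform"}

overlap = shared_failure_modes(PRIMARY, BACKUP)
# Here the regions and power feeds are separate, but both paths share one DNS
# provider and one team, so a failure in either takes out primary and backup.
```

An empty result does not prove independence, since it only covers the dependencies someone thought to list, but a non-empty one points directly at a shared failure mode.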
Section 10
Test Yourself
Is this mental model at work here?
Scenario 1
A company runs its main application in two regions. When one region goes down, traffic switches to the other within 30 seconds. Customers see a brief latency spike but no outage.
Google's SRE practice is built on redundancy, failover, and testing. Chapters on redundancy and disaster recovery are direct applications of the backup system model.
Perrow analyses why complex systems fail and how redundancy can sometimes increase risk. Essential for understanding when backup helps and when it adds complexity.
Summary: The backup system model says: for any critical single point of failure, maintain a parallel path that can take over when the primary fails. Design the handoff, test it, and keep the backup current.
Further Reading: For implementation, see SRE and resilience literature. For organisational backup (key person, succession), see succession planning and bus-factor reduction practices.
Defense in depth uses multiple layers of protection. Backup systems are one layer: the backup is the second line.
Tension
Antifragility
Antifragility is gaining from stress. Backup systems reduce the cost of failure, but unlike antifragile systems they do not improve because of it. The tension: too much backup can reduce pressure to improve the primary.