Part 7: Designing the Whole Colony
Individual concepts are ingredients. This section is the recipe. When you sit down to design a system, what’s the process?
7.1 Start With One Chinchilla, One Burrow
The biggest mistake: Starting with distributed systems. “We need 5 microservices, a message queue, a CDN, and a Kubernetes cluster.”
No. Start with the SIMPLEST thing that could possibly work:
- One server
- One database
- No cache
- No queue
- Synchronous everything
Why? Because you need to understand the REQUIREMENTS before you can make TRADEOFFS. And you can’t make tradeoffs until you know what breaks.
Then ask: “What breaks first?”
7.2 The Breaking Points
As load increases, things break in a predictable order:
Stage 1: Single server, single database (0-1K users)
- Works fine. Ship it.
Stage 2: Database becomes slow (1K-10K users)
- Add indexes for common queries
- Add a cache (Redis) for frequent reads
- Read replicas for read-heavy workloads
- This gets you surprisingly far.
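The cache in Stage 2 is usually the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. A minimal sketch, using an in-memory dict as a stand-in for Redis (the names `query_database` and `TTL_SECONDS` are illustrative, not from any particular library):

```python
import time

# Hypothetical in-memory stand-in for Redis; a real system would use a
# cache client with the same get/set shape.
cache = {}
TTL_SECONDS = 60

def query_database(user_id):
    # Placeholder for the real (slow) database query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside read: serve from cache if fresh, else hit the database."""
    entry = cache.get(user_id)
    if entry is not None and time.time() - entry["cached_at"] < TTL_SECONDS:
        return entry["value"]            # cache hit
    value = query_database(user_id)      # cache miss: go to the database
    cache[user_id] = {"value": value, "cached_at": time.time()}
    return value
```

The second call for the same user never touches the database, which is why this pattern gets you surprisingly far on read-heavy workloads.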
Stage 3: Server can’t handle the load (10K-100K users)
- Add a load balancer
- Run multiple instances of your application
- Make your app STATELESS (no in-memory sessions: use Redis or cookies)
- Now any server can handle any request.
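“Stateless” means session state lives outside any single app instance, so the load balancer can route a request anywhere. A sketch of the idea, with a module-level dict standing in for a shared store like Redis (the function names are illustrative):

```python
import secrets

# Hypothetical external session store shared by ALL app instances;
# in production this would be Redis, not process memory.
session_store = {}

def create_session(user_id):
    """Issue a token the client keeps in a cookie. The state behind it
    lives in the shared store, so any server can resolve it."""
    token = secrets.token_hex(16)
    session_store[token] = {"user_id": user_id}
    return token

def get_session(token):
    # Any instance can look the token up; none holds it in memory alone.
    return session_store.get(token)
```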
Stage 4: Database is the bottleneck again (100K-1M users)
- Vertical scaling (bigger database server)
- Read replicas (for read-heavy workloads)
- Sharding (for write-heavy workloads)
- Consider: some data can move to NoSQL
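Sharding at its simplest is hash-based routing: each key deterministically maps to one database, spreading writes across machines. A minimal sketch (shard names are made up):

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id):
    """Hash the key and take it modulo the shard count. Deterministic:
    the same user always lands on the same shard. Caveat: adding a shard
    remaps most keys, which is why consistent hashing exists."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```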
Stage 5: Global scale (1M+ users)
- Multi-region deployment
- CDN for static content
- Async processing via message queues
- Microservices for independent scaling
- Full observability (metrics, logs, traces)
- Caching at every layer
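The async-processing idea in Stage 5 is: the request handler enqueues a job and returns immediately; a worker drains the queue later. A toy sketch using the standard library’s `queue.Queue` and a thread in place of a real broker like RabbitMQ or SQS:

```python
import queue
import threading

# queue.Queue stands in for a message broker: producers enqueue,
# workers drain, the two sides never block each other.
jobs = queue.Queue()
processed = []

def worker():
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut the worker down
            break
        processed.append(f"emailed {job['email']}")  # the slow work

t = threading.Thread(target=worker)
t.start()
jobs.put({"email": "a@example.com"})  # handler "returns" right away
jobs.put(None)
t.join()
```

In a real deployment the producer and worker are separate processes, often separate services, and the broker persists jobs so a crashed worker doesn’t lose them (the Remember instinct again).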
The point: You don’t need distributed systems for 1,000 users. You NEED them at 1,000,000. Design for where you are, not where you wish you were.
Start with one burrow. Only add complexity when something BREAKS. Never before.
7.3 The Six-Instinct Checklist
For EVERY system design decision, run through the instincts:
| Instinct | Question | If you skip this… |
|---|---|---|
| Remember | What happens if it crashes? Is state persisted? | Data loss. Incomplete operations. Angry users. |
| Agree | Can two components see different truth? | Silent corruption. Duplicate charges. Double bookings. |
| Survive | What’s the worst failure? What’s the blast radius? | Full outages. Cascading failures. Lost revenue. |
| Protect | Who sees what? What’s validated? | Data breaches. Compliance violations. Trust loss. |
| Sustain | What happens at 100x load? On worst hardware? | Crashes. Slow responses. Massive cloud bills. |
| Organize | Can someone new understand this in a day? | Tech debt. Impossible migrations. Onboarding hell. |
Not every instinct applies to every decision. But THINKING about all six ensures you don’t miss something critical.
7.4 Common Architectures (As Colonies)
The Monolith (One Big Burrow)
Everything in one application. One codebase, one deployment, one database.
- When: You’re small. You’re moving fast. Your team is fewer than 10 people.
- Pro: Simple. Fast to develop. Easy to debug.
- Con: Scaling means scaling EVERYTHING. One component change requires redeploying EVERYTHING.
Microservices (Many Small Burrows)
Each feature is a separate service with its own database.
- When: You’re big. Features have very different scaling needs. Teams need to deploy independently.
- Pro: Scale pieces independently. Teams own their service. Polyglot (each service picks its own tech).
- Con: Network calls between services (latency). Distributed transactions (complexity). Debugging across 50 services (observability).
Event-Driven (The Grapevine Colony)
Services communicate through events, not direct calls. “User signed up” -> everyone who cares reacts independently.
- When: Loose coupling matters. Events are the natural model. Multiple consumers per event.
- Pro: Maximum decoupling. Easy to add new consumers. Natural audit trail (event log).
- Con: Eventual consistency. Harder to reason about (“what caused this state change? Let me trace 7 events…”).
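The grapevine shape fits in a few lines: publishers append to a log and fire handlers; consumers subscribe without the publisher knowing who they are. A minimal in-process sketch (a real system would use a broker like Kafka so consumers run in separate services):

```python
from collections import defaultdict

subscribers = defaultdict(list)
audit_log = []   # the event log doubles as a natural audit trail

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    audit_log.append((event_type, payload))
    for handler in subscribers[event_type]:  # each consumer reacts independently
        handler(payload)

# Two consumers of the same event; the publisher knows about neither.
welcome_emails, analytics = [], []
subscribe("user.signed_up", lambda p: welcome_emails.append(p["email"]))
subscribe("user.signed_up", lambda p: analytics.append(p["email"]))
publish("user.signed_up", {"email": "a@example.com"})
```

Adding a third consumer is one `subscribe` call and zero changes to the publisher, which is the decoupling the pattern buys you.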
Serverless (Chinchillas Without Burrows)
No servers to manage. Function-based. Write code, cloud runs it when needed, you pay per invocation.
- When: Unpredictable/spiky traffic. Simple, independent functions. Rapid prototyping.
- Pro: Zero server management. Scales to zero (no traffic = no cost). Scales to infinity (provider handles it).
- Con: Cold starts (latency on first invocation). Vendor lock-in. Hard to debug. Execution time limits.
The right answer: Almost always a hybrid. Monolith for the core, a few extracted services for things that scale differently, events for cross-service communication, serverless for simple glue functions.