Part 5: The Mountain Gets Bigger

The colony had 10 chinchillas and 500 seeds. It worked great. Now there are 10,000 chinchillas and 5 million seeds. Everything that worked before is breaking. Welcome to scaling.

The problem: Every time a chinchilla wants a seed, it runs to the seed burrow, digs, grabs one, and runs back. 50 trips a day. Exhausting.

The solution: Keep the most popular seeds in a quick-access stash near the den entrance. Instant access. Only go to the main burrow when the stash is empty.

The principle: Store frequently-accessed data in a fast, nearby location. Trade freshness for speed.

Cache: A faster (usually smaller) copy of data. CPU cache. Browser cache. CDN cache. Redis cache. All the same idea.

Cache strategies:

Cache-aside (lazy loading): Check cache first. Cache miss? Read from database, put it in cache for next time, return the data.

  • Most common pattern. Simple.
  • First request is always slow (cache miss). Subsequent requests are fast.
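
The cache-aside flow above can be sketched in a few lines. This is a minimal illustration using a plain dict as the cache and a stand-in function as the database; real systems would use something like Redis and a real query.

```python
import time

cache = {}  # stand-in for a real cache (e.g. Redis)

def db_read(key):
    """Stand-in for a slow database lookup."""
    time.sleep(0.01)  # simulate query latency
    return f"value-for-{key}"

def get(key):
    """Cache-aside: check the cache first, fall back to the database on a miss."""
    if key in cache:
        return cache[key]      # cache hit: fast path
    value = db_read(key)       # cache miss: slow path
    cache[key] = value         # populate the cache for next time
    return value
```

The first `get("user:1")` pays the database cost; every later call for the same key is a dict lookup.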

Read-through: Cache sits in front of database. Application only talks to cache. Cache handles misses by reading from database itself.

  • Application code is simpler (just reads from cache).
  • The cache layer itself must know how to read from the database.

Write-through: Every write goes to cache AND database simultaneously.

  • Cache is always up-to-date. No stale data.
  • Writes are slower (waiting for both cache and database).
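
A write-through sketch, using plain dicts as stand-ins for the cache and the database: the write does not return until both stores have the new value.

```python
cache = {}
database = {}  # stand-in for the durable store

def put(key, value):
    """Write-through: every write hits the database AND the cache before returning."""
    database[key] = value   # durable store first
    cache[key] = value      # cache is never stale, at the cost of a slower write
```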

Write-back (write-behind): Write to cache only. Cache periodically flushes to database.

  • Fastest writes.
  • Risk: if cache crashes before flushing, data is LOST.
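
Write-back can be sketched with a "dirty" set tracking keys that have not yet reached the database. The risk is visible in the code: anything in `dirty` is lost if the process dies before `flush()` runs.

```python
cache = {}
database = {}
dirty = set()  # keys written to the cache but not yet flushed

def put(key, value):
    """Write-back: fast write to the cache only; the database is updated later."""
    cache[key] = value
    dirty.add(key)

def flush():
    """Periodic flush. If the cache crashes before this runs, dirty data is LOST."""
    for key in dirty:
        database[key] = cache[key]
    dirty.clear()
```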

Cache eviction (when the stash is full):

  • LRU (Least Recently Used): Evict what hasn’t been accessed in the longest time. Most popular strategy.
  • LFU (Least Frequently Used): Evict what’s accessed least often overall.
  • TTL (Time To Live): Evict after a fixed time period regardless of usage.
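
LRU is simple enough to sketch with Python's `OrderedDict`, which remembers insertion order: moving a key to the end on access marks it as recently used, and popping from the front evicts the least recently used.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: most recently used keys live at the end of the dict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
```

(Python ships a production version of this idea as the `functools.lru_cache` decorator.)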

The hardest problem in caching: invalidation. When does the cache become stale? If you change data in the database, the cached copy is now wrong. Options:

  • TTL: Accept staleness for up to N seconds (simple, usually good enough)
  • Event-driven invalidation: When data changes, actively delete the cached version
  • Never cache it: Some data changes so often that caching doesn’t help
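
The first two options can be combined in one small sketch: entries carry an expiry timestamp (TTL), and an explicit `invalidate` is called when the underlying data changes (event-driven). The 60-second TTL is an arbitrary illustrative value.

```python
import time

cache = {}   # key -> (value, expires_at)
TTL = 60     # accept up to 60 seconds of staleness (illustrative)

def put(key, value):
    cache[key] = (value, time.monotonic() + TTL)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:  # stale: treat it as a miss
        del cache[key]
        return None
    return value

def invalidate(key):
    """Event-driven invalidation: call this whenever the source data changes."""
    cache.pop(key, None)
```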

“There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.”

Instinct: SUSTAIN

The problem: All chinchillas live in different parts of the mountain. Everyone runs to the central seed depot on the east face. Chinchillas from the west face take 10x longer to get there.

The solution: Build satellite depots across the mountain. Stock them with copies of the most popular seeds. Each chinchilla goes to the nearest depot.

The principle: Move data closer to where it’s needed. Geography matters. The speed of light is a hard limit: crossing an ocean takes about 100ms.

CDN (Content Delivery Network): Servers distributed globally that cache static assets (images, videos, CSS, JS) close to users. Cloudflare, AWS CloudFront, Akamai.

Edge computing: Run some computation at the edge (near the user), not just in a central data center. Process at the satellite depot instead of shipping everything to headquarters.

Multi-region deployment: Run your actual application in multiple regions (US-East, EU-West, Asia). Users connect to the nearest one.

  • Challenge: How do you keep data in sync across regions?
  • See: Replication, consistency, CAP theorem

Instinct: SUSTAIN

The problem: At dawn, every chinchilla wakes up and runs to the seed depot simultaneously. At noon, hardly anyone is there. The depot is either overwhelmed or idle: never just right.

The solution: Several strategies, often combined:

  1. Queue chinchillas at the entrance so the depot processes them at a sustainable rate
  2. Open more depot windows when it’s busy, close them when it’s slow
  3. Have chinchillas come at staggered times

The principle: Traffic is rarely uniform. Systems must handle spikes without dying and without wasting resources during quiet periods.

Auto-scaling: Automatically add more servers when load increases, remove them when it decreases. Cloud providers (AWS, GCP, Azure) make this easy.

  • Scale on CPU usage: >70% CPU for 5 minutes? Add a server.
  • Scale on queue depth: >1000 messages waiting? Add a worker.
  • Scale on latency: p99 latency >500ms? Add capacity.
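
The three rules above amount to a simple policy function. This is a toy sketch of such a policy; the thresholds mirror the bullets and are illustrative, not recommendations, and real auto-scalers (AWS, GCP, Azure) add cooldowns and smoothing.

```python
def desired_servers(current, cpu_pct, queue_depth, p99_ms,
                    min_servers=2, max_servers=20):
    """Toy scaling policy combining CPU, queue depth, and latency signals."""
    if cpu_pct > 70 or queue_depth > 1000 or p99_ms > 500:
        target = current + 1   # overloaded on any signal: scale out
    elif cpu_pct < 30 and queue_depth == 0 and p99_ms < 100:
        target = current - 1   # idle on all signals: scale in
    else:
        target = current       # steady state
    return max(min_servers, min(max_servers, target))
```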

Back-pressure: When a system is overwhelmed, push back on the caller. “I’m full, slow down.” This prevents the system from collapsing under load.

  • HTTP 429 Too Many Requests, often with a Retry-After header: “try again in 30 seconds”
  • Queue-based: When the queue hits max size, reject new messages
  • TCP flow control: The receiver tells the sender to slow down
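
The queue-based version is the easiest to sketch: a bounded queue gives you back-pressure for free, because enqueueing fails once the limit is hit instead of letting work pile up without bound.

```python
import queue

work = queue.Queue(maxsize=100)  # bounded queue = built-in back-pressure

def submit(job):
    """Returns False (the moral equivalent of HTTP 429) instead of growing forever."""
    try:
        work.put_nowait(job)
        return True
    except queue.Full:
        return False  # caller should back off and retry later
```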

Load shedding: When completely overwhelmed, intentionally DROP some requests to save the rest. Better to serve 80% of users well than 100% of users poorly.

  • Drop low-priority requests (analytics) to protect high-priority ones (payments)
  • Return degraded responses (cached, partial) instead of timing out
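
A load-shedding decision can be as small as one branch: when load crosses a threshold, low-priority requests are dropped so high-priority ones still succeed. This sketch uses a made-up request shape and an illustrative 90% threshold.

```python
def handle(request, load_pct):
    """Shed low-priority work first as load climbs (threshold is illustrative)."""
    if load_pct > 90 and request["priority"] == "low":
        return {"status": 503}  # dropped: analytics can wait, payments cannot
    return {"status": 200, "body": f"handled {request['id']}"}
```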

Thundering herd: 10,000 cached items expire at exactly the same second. 10,000 requests hit the database simultaneously to rebuild the cache. Database dies.

  • Solution: Stagger expiry times with random jitter
  • Solution: Cache locking: only ONE request rebuilds the cache, others wait for it
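
Both fixes fit in a short sketch: jitter spreads expiry times so they don’t all land on the same second, and a lock ensures only one caller rebuilds a missing entry while the rest wait and then reuse it. The ±10% jitter and 300-second base TTL are illustrative values.

```python
import random
import threading

BASE_TTL = 300  # seconds (illustrative)

def ttl_with_jitter():
    """Spread expiries over ±10% so 10,000 entries don't expire simultaneously."""
    return BASE_TTL + random.uniform(-0.1, 0.1) * BASE_TTL

rebuild_lock = threading.Lock()

def get_or_rebuild(key, cache, rebuild):
    value = cache.get(key)
    if value is not None:
        return value
    # Cache locking: only ONE caller rebuilds; the others block here, then reuse it.
    with rebuild_lock:
        value = cache.get(key)    # double-check after acquiring the lock
        if value is None:
            value = rebuild(key)  # the single database hit
            cache[key] = value
    return value
```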

Instinct: SUSTAIN + SURVIVE