You Added a Cache. Now You Have Two Problems.

A cache was supposed to make things faster. Instead you have stale data and a thundering herd. A teardown of caching layers, invalidation, and stampede control.

A page was slow, so someone added a cache. For about a week it was glorious: p99 latency dropped, the database stopped sweating, everyone moved on. Then a customer filed a bug: they updated their billing address and the old one kept showing up. Then, during a deploy that flushed the cache, the database fell over for ninety seconds because forty thousand requests all missed at once and stampeded it.

The cache didn't fix your performance problem. It traded a slow-but-correct system for a fast-but-occasionally-lying one, plus a brand-new failure mode where an empty cache is more dangerous than no cache at all. There's an old joke that there are only two hard problems in computer science: cache invalidation and naming things. The joke is load-bearing. Most teams add the cache and skip the invalidation strategy entirely, which is how you end up here.

Let's take the whole thing apart and put it back together so it actually helps.

The three places a cache can live

"Add a cache" is not one decision. It's at least three, and they solve different problems. Conflating them is the first mistake.

The CDN / edge cache sits in front of everything, geographically close to the user. It caches whole HTTP responses — static assets, images, and, if you're disciplined about cache headers, full pages or API responses that don't vary per user. This is the cheapest, highest-leverage cache you have, and it's the one teams under-use because it requires thinking about Cache-Control headers instead of writing application code. A correctly cached static response never touches your origin at all. Zero database load, zero app-server load, served from a city near the user in single-digit milliseconds.
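As a rough illustration of what "thinking about Cache-Control headers" means in practice, here is a minimal sketch that picks a header value per response. The function name `cache_control` and its parameters are hypothetical; the directives themselves (`public`, `private`, `no-store`, `max-age`, `s-maxage`, `immutable`) are standard HTTP.

```python
from typing import Optional


def cache_control(
    *,
    per_user: bool,
    max_age_s: int,
    edge_max_age_s: Optional[int] = None,
    immutable: bool = False,
) -> str:
    """Build a Cache-Control header value for one response (illustrative helper)."""
    if per_user:
        # Personalized responses stay out of shared caches entirely.
        return "private, no-store"
    directives = ["public", f"max-age={max_age_s}"]
    if edge_max_age_s is not None:
        # s-maxage applies only to shared caches such as a CDN.
        directives.append(f"s-maxage={edge_max_age_s}")
    if immutable:
        # For fingerprinted assets whose URL changes when the content does.
        directives.append("immutable")
    return ", ".join(directives)
```

A fingerprinted asset might use `cache_control(per_user=False, max_age_s=31536000, immutable=True)`, while anything that varies per user gets `per_user=True` and never lands in the edge cache.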

The application cache is the one everyone means when they say "add a cache" — Redis or Memcached holding the results of expensive computations and queries. The user's permission set. The rendered dashboard payload. The product catalog. This is where the power is, and where all the danger is, because this is the layer that goes stale.

The database's own cache is the one you didn't add and shouldn't ignore. Postgres keeps hot pages in shared_buffers; the OS keeps a page cache underneath that. A surprising number of "we need Redis" situations are actually "our working set doesn't fit in memory and the database is hitting disk." Before you put a cache in front of the database, check whether the database is already caching effectively and just needs more RAM or a better index. Adding Redis to compensate for a missing index is paying rent to avoid a one-line fix.
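One way to check whether the database is already caching effectively is to look at the buffer hit ratio. The sketch below assumes Postgres and its built-in `pg_stat_database` view, which exposes `blks_hit` and `blks_read`; the helper name `buffer_hit_ratio` is made up for illustration. Note that `blks_read` counts misses in `shared_buffers` only, and some of those reads are still served by the OS page cache, so treat the number as a rough signal rather than a verdict.

```python
from typing import Optional

# Run this against Postgres and feed the two numbers to buffer_hit_ratio().
HIT_RATIO_SQL = """
SELECT sum(blks_hit) AS blks_hit, sum(blks_read) AS blks_read
FROM pg_stat_database;
"""


def buffer_hit_ratio(blks_hit: int, blks_read: int) -> Optional[float]:
    """Fraction of block requests served from shared_buffers, or None if idle."""
    total = blks_hit + blks_read
    if total == 0:
        return None  # no block activity recorded yet, so nothing to judge
    return blks_hit / total
```

A ratio that sags under normal traffic suggests the working set doesn't fit in memory, which is a RAM or index conversation before it is a Redis one.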

The order of operations: serve at the edge what you can, cache in the app what you must, and make sure the database tier is actually using the memory you're paying for.

Cache-aside vs write-through: pick on purpose

There are two dominant patterns for keeping an application cache populated, and the difference is who writes to the cache and when.

Cache-aside (lazy loading) is the default. On a read, the app checks the cache. Hit — return it. Miss — read from the database, write the result into the cache, return it. Writes go straight to the database and invalidate the cache entry so the next read repopulates it. It's simple, it only caches what's actually requested, and a cache failure degrades to "slower," not "broken," because the database is still the source of truth.

read:   cache hit?  -> return
        cache miss? -> db read -> populate cache -> return
write:  write db -> delete cache key  (next read repopulates)

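The trap in cache-aside is the gap between writing the database and invalidating the key. Get the order wrong (invalidate first, then write) and a concurrent read can repopulate the cache with the old value in that window, leaving you stale until the TTL expires. Write the database first, then invalidate. Always. Even that ordering leaves a narrower race, where a read that missed just before the write repopulates the key just after the delete, which is one more reason every key also carries a TTL.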
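The read and write paths above can be sketched in a few lines. This is a minimal illustration, not a production client: plain dicts stand in for the database and for Redis or Memcached, and the class name `CacheAside` and the `db_reads` counter are hypothetical.

```python
from typing import Dict, Optional


class CacheAside:
    """Cache-aside over two dict stand-ins (illustrative, single-process)."""

    def __init__(self, db: Dict[str, str], cache: Dict[str, str]) -> None:
        self.db = db
        self.cache = cache
        self.db_reads = 0  # counts how often a read fell through to the database

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]  # hit: the database is never touched
        self.db_reads += 1
        value = self.db.get(key)  # miss: fall through to the source of truth
        if value is not None:
            self.cache[key] = value  # populate so the next read is a hit
        return value

    def write(self, key: str, value: str) -> None:
        self.db[key] = value  # 1. write the database first
        self.cache.pop(key, None)  # 2. then invalidate; the next read repopulates
```

Swapping the two lines in `write` reproduces the stale-read window: a reader that misses between the delete and the database write caches the old value.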

Write-through updates the cache and the database together on every write, so the cache is always warm and consistent. Reads are always fast because the data is always there. The cost is write latency (you're writing twice) and the fact that you're caching things nobody may ever read. It earns its keep for read-heavy data with a small, hot key space — the kind where a cold cache after a flush would be catastrophic.
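For contrast, a write-through sketch under the same simplifying assumption (dicts standing in for the database and the cache; `WriteThrough` is a made-up name). A real implementation also has to decide what happens when one of the two writes fails, which this sketch does not attempt.

```python
from typing import Dict, Optional


class WriteThrough:
    """Write-through over two dict stand-ins (illustrative, single-process)."""

    def __init__(self, db: Dict[str, str], cache: Dict[str, str]) -> None:
        self.db = db
        self.cache = cache

    def write(self, key: str, value: str) -> None:
        # Two writes per update: that is the latency cost of this pattern.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        # Only reached for keys written before the cache existed or after a flush.
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Every written key is warm immediately, whether or not anyone ever reads it.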

Most systems want cache-aside for the broad case and write-through only for the handful of keys where a cold miss is unacceptable. The wrong move is picking one globally because a blog post said so.

Invalidation is the actual job

A cache without an invalidation strategy is a bug with a TTL. The two tools are time and events, and you need both.

TTL (time-to-live) is the safety net. Every cached entry gets an expiry, so the worst case for staleness is bounded — set a five-minute TTL and the data is never more than five minutes wrong, even if your event-based invalidation has a bug. TTL alone is fine for data where bounded staleness is acceptable: a product catalog, a list of regions, yesterday's analytics. It is not fine for a billing address.
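A minimal sketch of the bounded-staleness guarantee, assuming an in-process cache. `TTLCache` is a hypothetical class, and the injectable `clock` exists only so expiry can be exercised without sleeping; with Redis or Memcached you would set the expiry on the key instead.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process cache where every entry expires ttl_s seconds after it is set."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock() + self._ttl_s)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller falls back to the source of truth.
            del self._entries[key]
            return None
        return value
```

With `TTLCache(ttl_s=300)`, an entry can be served for at most five minutes after it was written, whatever else goes wrong with invalidation.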

Event-based invalidation is the precise tool. When the underlying data changes, you explicitly evict (or update) the affected keys. This is what keeps the billing address correct — the moment the user saves, the code that performs the write also kills the cached entry. The hard part is coverage: every code path that mutates the data has to invalidate the right keys, and the failure mode is the path someone forgot. The discipline that makes this tractable is centralizing writes so there's one place that owns both the database mutation and the invalidation, instead of scattering both across the codebase.
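Centralizing writes might look like the sketch below: one class owns both the mutation and the list of cache keys derived from it. Everything here is hypothetical, including the class name, the key formats `billing_address:{id}` and `checkout_summary:{id}`, and the dict stand-ins for the database and the cache.

```python
from typing import Dict, List


def billing_cache_keys(user_id: int) -> List[str]:
    """Every cache key derived from a user's billing address (assumed names)."""
    return [f"billing_address:{user_id}", f"checkout_summary:{user_id}"]


class BillingAddressWriter:
    """The only code path allowed to change a billing address."""

    def __init__(self, db: Dict[int, str], cache: Dict[str, str]) -> None:
        self._db = db
        self._cache = cache

    def save(self, user_id: int, address: str) -> None:
        self._db[user_id] = address  # database first
        for key in billing_cache_keys(user_id):
            self._cache.pop(key, None)  # then evict everything derived from it
```

When a new cached view of the address appears, its key is added to `billing_cache_keys` once, rather than to every handler that happens to touch billing.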

Use TTL as the backstop and events as the precision instrument. TTL bounds your blast radius when the event logic has a hole — and it will, eventually, have a hole.

The thundering herd

Here's the failure that took down your database during the deploy. It has a name: cache stampede, or the thundering herd.

A popular key expires (or the cache gets flushed). At that instant, every in-flight request for that key misses simultaneously. In cache-aside, every one of those misses independently decides to go regenerate the value from the database. So instead of one request recomputing an expensive query, you get ten thousand identical expensive queries hitting the database in the same millisecond. The database, which was comfortable serving cached traffic, gets a synchronized spike of its single most expensive operation and falls over. The empty cache is what killed you, not a full one.

Three defenses, used together:

Request coalescing (single-flight). When many requests miss the same key at once, let exactly one of them go regenerate the value while the others wait for that result. One database query, not ten thousand. This is the single most important stampede defense, and many caching libraries and proxies support it directly, so check yours before writing your own.

Probabilistic early expiration. Instead of every entry expiring at a hard deadline, refresh keys slightly before they expire, with a randomized jitter, so the hot ones get regenerated by a single background-ish refresh rather than expiring under live traffic. Expirations spread out instead of synchronizing.

Staggered TTLs and warm-up on deploy. Never give a thousand keys the same exact TTL — add jitter so they don't all expire in the same second. And don't deploy in a way that flushes the entire cache cold into peak traffic. Warm the hot keys, or roll the flush gradually, so the database never faces a fully cold cache under load.
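The second and third defenses can be sketched as two small helpers. The early-expiration check follows one published formulation (often called XFetch), in which the chance of refreshing rises as expiry approaches and scales with how long the value takes to recompute. The function names, the 10% default jitter, and the `beta` default are assumptions for illustration.

```python
import math
import random
from typing import Optional


def jittered_ttl(
    base_ttl_s: float,
    jitter_fraction: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Spread expirations: base TTL plus or minus up to jitter_fraction of it."""
    rng = rng or random.Random()
    spread = base_ttl_s * jitter_fraction
    return base_ttl_s + rng.uniform(-spread, spread)


def should_refresh_early(
    now: float,
    expires_at: float,
    recompute_s: float,
    beta: float = 1.0,
    rng: Optional[random.Random] = None,
) -> bool:
    """Probabilistically decide to refresh before the hard expiry.

    recompute_s is how long regenerating the value takes; beta > 1 refreshes
    more eagerly. Always True once the entry has actually expired.
    """
    rng = rng or random.Random()
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is defined and <= 0
    return now - recompute_s * beta * math.log(u) >= expires_at
```

A reader that gets `True` regenerates the value and resets the expiry while everyone else keeps serving the still-valid entry, so the hard deadline is rarely reached under live traffic.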

naive:  key expires -> 10,000 misses -> 10,000 db queries -> db dies
fixed:  key expires -> single-flight -> 1 db query -> 9,999 wait ~20ms
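A minimal single-flight sketch for a threaded, single-process app. `SingleFlight` and its `do` method are hypothetical names modeled on the general pattern, not any particular library's API; coalescing across multiple app servers needs a shared lock or a similar mechanism, which this does not cover.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared between the leader and the requests waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[Exception] = None


class SingleFlight:
    """Let one caller per key run fn; concurrent callers wait for its result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.result = fn()  # the one real database query
            except Exception as exc:
                call.error = exc  # waiters see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()  # wake everyone who piled up behind us
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

On a miss, the read path calls something like `flight.do(key, load_from_db)` instead of calling `load_from_db` directly, and the leader populates the cache before returning.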

When the cache is hiding a real problem

The uncomfortable section. Sometimes the cache isn't a performance tool — it's a tourniquet on a wound you don't want to look at.

If a query is so slow that the product is unusable without caching it, the cache is masking a missing index, an N+1, or a data model that can't answer the question efficiently. Caching it means the first user every five minutes eats the slow query, and your TTL is now a dial that trades "how stale" against "how often someone suffers." That's not a fix; it's a schedule for the pain. The fix is making the underlying query fast, and then caching it because it's hot, not because it's broken.

The tell: if you can't disable the cache in a staging environment without the product becoming unusable, the cache is structural, not an optimization. Real optimizations make a working system faster. Tourniquets make a broken system shippable. Know which one you built.
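Running that test is easier if the cache has an off switch. A minimal sketch, assuming a hypothetical `CACHE_ENABLED` environment variable and a dict standing in for the cache:

```python
import os
from typing import Any, Callable, Dict, Mapping


def cache_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Cache is on unless CACHE_ENABLED is explicitly set to "0" (assumed flag)."""
    return env.get("CACHE_ENABLED", "1") != "0"


def get_or_compute(
    key: str,
    compute: Callable[[], Any],
    cache: Dict[str, Any],
    enabled: bool,
) -> Any:
    """Serve from cache when enabled; otherwise always hit the real code path."""
    if not enabled:
        return compute()  # staging with the switch off exercises the raw query
    if key not in cache:
        cache[key] = compute()
    return cache[key]
```

Flip the switch in staging and watch what happens: slower is fine, unusable means the cache is holding the product up.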

What fixed looks like

The billing address updates instantly, because the write path that saves it also invalidates its cache key, and a short TTL backstops any path that forgets. The CDN serves static and public (not per-user) responses at the edge, so most traffic never reaches your origin. The application cache uses cache-aside with database-first ordering, with write-through reserved for the few keys where a cold miss is unacceptable.

When a hot key expires under load, single-flight lets one request regenerate it while the rest wait twenty milliseconds for the result — no stampede, no synchronized spike. Deploys warm the hot keys instead of flushing cold into peak traffic. And the queries underneath the cache are fast on their own, so the cache is making a healthy system faster rather than keeping a sick one upright. Pull the cache and the product is slower, not broken.

This is for you if

You're running a funded product with real traffic, you've already added a cache, and you've already been bitten — stale data in front of a customer, or a stampede that took the database down when the cache went cold. You want caching that's correct under load, not caching that works until it spectacularly doesn't.

A caching and performance engagement runs $50k+: we map what's actually hot, place the right cache at the right layer, build invalidation that covers every write path, and add stampede protection — then prove it by flushing the cache under load and watching the database stay calm. A full performance re-architecture, where the queries underneath get fixed so the cache is an optimization and not a tourniquet, runs $100k+.

It's not for you if you have light traffic and a quiet database — at that stage a cache adds correctness risk to solve a problem you don't have, and the right move is a good index and a simple query path. It's for the team whose cache has already lied to a customer once and shouldn't get a second chance.

// cache the answer, not the problem
