
Your app is “fine”… until it isn’t.
A dashboard page loads in 400ms in staging, and then one day it’s 6 seconds in production. The database CPU is pegged, the same expensive query is running thousands of times per minute, and now every tiny spike cascades into a mini outage.
Caching is one of the simplest ways to make systems feel fast and to reduce pressure on your most expensive dependencies.
If you’re following the learning path, read What Is Scalability? A Beginner’s Guide for Developers and What Is Load Balancing and How It Works first.
For the full roadmap, see the System Design Foundations series (the pillar page) and the System Design tag (the category page).
Related foundation posts that pair well with this topic:
- What Are Sticky Sessions in Load Balancing? (Session Affinity)
- Horizontal vs Vertical Scaling Explained (Scale Out vs Up)
- What Is Database Connection Pooling and Why It Matters
Table of Contents
- What Is Caching? (Definition)
- Cache Hit vs Cache Miss (And Why It Matters)
- Where Caches Live (Browser, CDN, App, Database)
- TTL, Eviction, and Staleness
- Common Caching Strategies (Cache-Aside, Read-Through, Write-Through)
- Cache Invalidation: The Hard Part
- A Practical Example: Cache-Aside in Code (TTL Cache)
- Real-World Examples
- Common Mistakes
- Interview Questions
- 1. What is caching and why does it improve performance?
- 2. What’s the difference between cache-aside and write-through?
- 3. How do you choose a TTL for a cache key?
- 4. What are common cache invalidation strategies?
- 5. What can go wrong if you add caching too early?
- 6. How do you prevent a cache stampede?
- Conclusion
- References
- YouTube Videos
What Is Caching? (Definition)
Caching means storing a copy of data somewhere faster (or closer) so you can serve repeated requests without doing the expensive work every time.
In system design terms, caching usually serves one or more of these goals:
- reducing latency (fewer network hops, fewer slow queries)
- reducing load (less work for your database or downstream APIs)
- smoothing spikes (absorbing bursts of repeated reads)
A simple mental model:
- Primary storage is the source of truth (database, object storage, upstream API)
- Cache is a fast copy that’s allowed to be slightly stale
Cache Hit vs Cache Miss (And Why It Matters)
A cache only helps when you get a cache hit.
- Cache hit: data is already in the cache → return quickly.
- Cache miss: data is not in the cache → fetch from the source of truth, then populate the cache.
The metric that matters is cache hit ratio:
- high hit ratio → big latency win and big cost/load reduction
- low hit ratio → you pay cache overhead but still hammer the database
In production, hit ratio isn’t a vanity metric; it drives real outcomes like:
- database CPU
- p95 latency
- tail latency during spikes
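To make the hit-ratio effect concrete, here is a back-of-the-envelope model. The numbers (a 1ms cache lookup, a 50ms database query, 10,000 reads per second) are illustrative assumptions, not measurements, and the model ignores queuing, which is what actually inflates tail latency under load.

```typescript
// Back-of-the-envelope model of how hit ratio affects latency and DB load.
// All latencies and traffic numbers are illustrative assumptions.
interface CacheModel {
  hitRatio: number;       // fraction of reads served from cache (0..1)
  cacheLatencyMs: number; // cost of a cache lookup, paid on every read
  dbLatencyMs: number;    // cost of a database query, paid only on misses
  readsPerSecond: number; // incoming read traffic
}

function averageLatencyMs(m: CacheModel): number {
  // Every read checks the cache; misses additionally go to the database.
  return m.cacheLatencyMs + (1 - m.hitRatio) * m.dbLatencyMs;
}

function dbQueriesPerSecond(m: CacheModel): number {
  // Only misses reach the database.
  return (1 - m.hitRatio) * m.readsPerSecond;
}

const base = { cacheLatencyMs: 1, dbLatencyMs: 50, readsPerSecond: 10_000 };
for (const hitRatio of [0.5, 0.9, 0.99]) {
  const m = { ...base, hitRatio };
  console.log(`hit ratio ${hitRatio}: ~${averageLatencyMs(m).toFixed(1)}ms avg, ~${Math.round(dbQueriesPerSecond(m))} DB queries/s`);
}
```

In this model, going from a 90% to a 99% hit ratio cuts average latency from about 6ms to about 1.5ms, and database traffic from about 1,000 to about 100 queries per second. That tenfold drop in database work is what shows up in database CPU.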
Where Caches Live (Browser, CDN, App, Database)
Most real systems use multiple cache layers (each with different trade-offs).
flowchart TD
U[User Browser] --> CDN[CDN / Edge Cache]
CDN --> LB[Load Balancer]
LB --> APP[Application Servers]
APP --> DC[(Distributed Cache<br/>Redis/Memcached)]
APP --> DB[(Database)]
classDef layer fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000000;
class U,CDN,LB,APP,DC,DB layer;
1. Browser cache
The browser caches static assets (JS/CSS/images) and sometimes even API responses.
- Great for repeat visits.
- Doesn’t help your server load if you have lots of unique users who never repeat requests.
- Needs careful use of HTTP caching headers.
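Here is a small sketch of what “careful use of HTTP caching headers” can look like: a helper that picks a Cache-Control value per kind of response. The response categories, the helper name cacheControlFor, and the specific max-age values are assumptions for illustration; the directives themselves (public, private, max-age, immutable, no-cache, no-store) are standard HTTP.

```typescript
// Hypothetical response categories for this sketch.
type ResponseKind = "fingerprinted-asset" | "html-page" | "user-api" | "sensitive";

// Pick a Cache-Control header value for each kind of response.
function cacheControlFor(kind: ResponseKind): string {
  switch (kind) {
    case "fingerprinted-asset":
      // The file name changes whenever the content changes (e.g. app.3f9a1c.js),
      // so browsers and CDNs can keep it for a year without revalidating.
      return "public, max-age=31536000, immutable";
    case "html-page":
      // May be stored, but must be revalidated with the server before each reuse.
      return "no-cache";
    case "user-api":
      // Only the user's own browser may cache it, briefly; shared caches must not.
      return "private, max-age=60";
    case "sensitive":
      // Must not be stored by any cache.
      return "no-store";
  }
}
```

The design choice: only content whose URL changes when its bytes change gets a long max-age, and anything user-specific is marked private so no shared cache stores it.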
2. CDN / edge cache
A CDN caches content close to the user.
- Big win for global latency.
- Great for static assets and cacheable API responses.
- You must control cache keys carefully (especially for personalized content).
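Part of controlling cache keys is normalization: requests that return the same bytes should map to the same key, or the hit ratio suffers. The sketch below is not any CDN's configuration API; edgeCacheKey is a hypothetical helper that drops tracking parameters (the ignore list is an assumption) and sorts the remaining ones.

```typescript
// Query parameters assumed not to change the response (illustrative list).
const IGNORED_PARAMS = new Set(["utm_source", "utm_medium", "utm_campaign", "fbclid"]);

// Normalize a request URL into a cache key: same host + path + meaningful
// params => same key, so "?a=1&b=2" and "?b=2&a=1" share one cache entry.
function edgeCacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, name) => {
    if (!IGNORED_PARAMS.has(name)) kept.push([name, value]);
  });
  // Sort by parameter name so ordering differences don't split the cache.
  kept.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const query = new URLSearchParams(kept).toString();
  return query ? `${url.host}${url.pathname}?${query}` : `${url.host}${url.pathname}`;
}
```

Note what this key leaves out: cookies and auth headers. That is exactly why a personalized response must either carry its user scope in the key or not be cached at the edge at all.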
3. Application (in-memory) cache
Each app instance keeps a local cache in memory (a Map, LRU, etc.).
- Extremely fast.
- But each instance has its own cache, so the same key is missed and stored separately on every instance; hit ratio tends to drop as you add instances.
- Cache contents vanish when the instance restarts.
4. Distributed cache (Redis/Memcached)
A shared cache that all app servers can use.
- Better hit ratio because it’s shared.
- Extra network hop (still usually much faster than a database query).
- Needs operational maturity: sizing, eviction behavior, failover.
If you want the deeper interview-style version of this, see System Design Interview: Distributed Cache Like Redis/Memcached.
TTL, Eviction, and Staleness
Caching always introduces a trade-off: freshness vs speed.
Three concepts show up everywhere:
TTL (time to live)
A TTL is how long a cached value is considered valid.
- Short TTL → fresher data, more cache misses.
- Long TTL → higher hit ratio, higher staleness risk.
Eviction
Caches have limited space. When they fill up, something must be removed.
Common eviction policies include:
- LRU (least recently used)
- LFU (least frequently used)
- FIFO (first in, first out)
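LRU is the easiest of these to sketch. The version below relies on the fact that a JavaScript Map iterates keys in insertion order: re-inserting a key on every read moves it to the “most recent” end, so the first key is always the least recently used. LruCache is a teaching sketch (bounded by entry count, no TTL), not a substitute for a real cache's eviction.

```typescript
// Minimal LRU cache: bounded by entry count, evicts the least recently used key.
class LruCache<K, V> {
  private map = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    // Move the key to the most-recently-used end (Map keeps insertion order).
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.map.keys().next();
      if (!oldest.done) this.map.delete(oldest.value);
    }
  }

  // Presence check that does not count as a use (no recency update).
  has(key: K): boolean {
    return this.map.has(key);
  }
}

const lru = new LruCache<string, number>(2);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a");    // "a" is now the most recently used
lru.set("c", 3); // over capacity: evicts "b", the least recently used
```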
Staleness
If the source data changes before the TTL expires, the cache becomes stale.
That’s not always wrong. Many systems intentionally accept a little staleness (seconds to minutes) to gain massive performance improvements.
Common Caching Strategies (Cache-Aside, Read-Through, Write-Through)
When people say “we added caching,” they usually mean one of these patterns.
1. Cache-aside (lazy loading)
The application checks cache first. On miss, it loads from the DB and then sets the cache.
flowchart TD
R[Request] --> G{Get from cache}
G -->|Hit| C[Return cached value]
G -->|Miss| Q[Query database]
Q --> S[Set cache with TTL]
S --> D[Return DB value]
classDef step fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000000;
class R,G,C,Q,S,D step;
Pros:
- simple
- cache fills naturally with hot keys
Cons:
- the first request for each key (and the first after each expiry) is always slow
- a burst of simultaneous misses can stampede the DB if you’re not careful
2. Read-through
The cache itself knows how to load missing data from the database; the application only ever asks the cache.
Pros:
- application code can be simpler
Cons:
- harder to implement unless you’re using a caching library/framework that supports it
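A minimal read-through sketch: ReadThroughCache is constructed with a loader function (loadProductFromDb is a hypothetical stand-in for a database query), and callers only ever call get(). On a miss, the cache calls the loader itself and stores the result with a TTL. Real read-through support comes from caching libraries or frameworks; this only shows the shape of the pattern.

```typescript
type Loader<K, V> = (key: K) => Promise<V>;

// The cache owns the "how to load" logic; callers never talk to the DB directly.
class ReadThroughCache<K, V> {
  private store = new Map<K, { value: V; expiresAtMs: number }>();
  private loader: Loader<K, V>;
  private ttlMs: number;

  constructor(loader: Loader<K, V>, ttlMs: number) {
    this.loader = loader;
    this.ttlMs = ttlMs;
  }

  async get(key: K): Promise<V> {
    const entry = this.store.get(key);
    if (entry && Date.now() < entry.expiresAtMs) return entry.value; // hit
    // Miss or expired: the cache loads from the source of truth itself.
    const value = await this.loader(key);
    this.store.set(key, { value, expiresAtMs: Date.now() + this.ttlMs });
    return value;
  }
}

// Hypothetical loader standing in for a real database query.
let dbCalls = 0;
async function loadProductFromDb(id: string): Promise<{ id: string; price: number }> {
  dbCalls++;
  return { id, price: 42 };
}

const products = new ReadThroughCache(loadProductFromDb, 30_000);
```

Calling products.get("p1") twice within 30 seconds runs the loader only once; the application code never mentions the database.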
3. Write-through
On write, you write to the cache and the database (synchronously).
Pros:
- keeps cache warm
- reduces “read after write” misses
Cons:
- writes become slower
- failure handling is trickier (what if cache write succeeds but DB write fails?)
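A write-through sketch that handles the failure question above in one specific way: it writes the database first, then the cache, so a failed database write never leaves a value in the cache that the source of truth doesn't have. DbStore, CacheStore, writeThrough, and the in-memory demo stores are hypothetical names for illustration; real code would wrap a database client and a Redis/Memcached client.

```typescript
// Hypothetical storage interfaces for the sketch.
interface DbStore<V> {
  save(key: string, value: V): Promise<void>;
}
interface CacheStore<V> {
  set(key: string, value: V): Promise<void>;
  delete(key: string): Promise<void>;
}

async function writeThrough<V>(db: DbStore<V>, cache: CacheStore<V>, key: string, value: V): Promise<void> {
  // 1. Write the source of truth first. If this throws, the cache is untouched.
  await db.save(key, value);
  try {
    // 2. Update the cache in the same request so the next read is a hit.
    await cache.set(key, value);
  } catch {
    // 3. Cache update failed: best-effort delete so readers reload from the DB
    //    (if the delete also fails, the old value lives until its TTL).
    await cache.delete(key).catch(() => undefined);
  }
}

// In-memory stand-ins so the sketch runs on its own.
const dbRows = new Map<string, number>();
const cacheRows = new Map<string, number>();
const demoDb: DbStore<number> = {
  save: async (k, v) => { dbRows.set(k, v); },
};
const demoCache: CacheStore<number> = {
  set: async (k, v) => { cacheRows.set(k, v); },
  delete: async (k) => { cacheRows.delete(k); },
};
```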
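There are also write-back (write to the cache now, flush to the database later) and write-around (write only to the database and let reads populate the cache) strategies, but cache-aside + TTL is usually the first pattern teams reach for.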
Cache Invalidation: The Hard Part
Caching feels easy until the data changes.
Invalidation is hard because the cache is no longer “truth.” You must decide how to keep it correct enough.
Common invalidation approaches:
- TTL-only: accept staleness until TTL expires
- explicit invalidation: after DB update, delete or update relevant cache keys
- versioned keys: embed a version in the key (product:v3:123) so you can “flip” versions
Two practical rules:
- If your cache key is too broad (for example, it omits the user or tenant), you’ll accidentally serve one user’s cached data to another.
- If your invalidation logic is too complex, you’ll ship bugs where data is wrong “sometimes.”
For beginner systems, “short TTL + delete-on-write for obvious keys” is often the best balance.
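A minimal sketch of the two explicit approaches, using plain Maps as stand-ins for the database and the cache (productsTable, cacheStore, productKey, updateProductPrice, and invalidateAllProducts are all hypothetical names). Delete-on-write drops the one key a write obviously affects; the versioned key lets you invalidate a whole family of keys by bumping a single number.

```typescript
// In-memory stand-ins so the sketch runs on its own.
const productsTable = new Map<string, number>(); // "database": id -> price
const cacheStore = new Map<string, unknown>();   // "cache": key -> value

// Versioned keys: the current version is part of every product key.
let productCacheVersion = 3;
function productKey(id: string): string {
  return `product:v${productCacheVersion}:${id}`;
}

// Delete-on-write: update the source of truth first, then drop the cached
// copy so the next read misses and reloads fresh data.
async function updateProductPrice(id: string, price: number): Promise<void> {
  productsTable.set(id, price); // stands in for an awaited DB UPDATE
  cacheStore.delete(productKey(id));
}

// Version flip: every key built with the previous version becomes unreachable
// at once; the orphaned entries age out via TTL or eviction.
function invalidateAllProducts(): void {
  productCacheVersion++;
}
```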
A Practical Example: Cache-Aside in Code (TTL Cache)
This is a tiny in-memory TTL cache you can use to understand the pattern. It’s not production-ready (there’s no size limit, and expired entries are only removed when something reads them), but it shows the flow.
type CacheEntry<T> = {
  value: T;
  expiresAtMs: number;
};
export class TtlCache<K, V> {
  private store = new Map<K, CacheEntry<V>>();
  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    // Expired → behave like a miss (and drop the stale entry)
    if (Date.now() >= entry.expiresAtMs) {
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }
  set(key: K, value: V, ttlMs: number): void {
    this.store.set(key, {
      value,
      expiresAtMs: Date.now() + ttlMs,
    });
  }
}
type UserProfile = { id: string; name: string };
// One cache instance shared by every caller in this process (declared before use)
const cache = new TtlCache<string, UserProfile>();
// Example usage (cache-aside)
async function getUserProfile(userId: string): Promise<UserProfile> {
  const key = `user:${userId}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // cache hit
  // Cache miss: fetch from the source of truth
  const profile = await fetchUserFromDatabase(userId);
  // Cache set: pick a TTL based on how fresh the data needs to be
  cache.set(key, profile, 60_000);
  return profile;
}
async function fetchUserFromDatabase(userId: string): Promise<UserProfile> {
  // Replace with a real DB call
  return { id: userId, name: "Ada" };
}
What to notice:
- expired entries become cache misses
- TTL is a design choice (it should match your freshness requirements)
Real-World Examples
Netflix: caching to protect the database and speed up reads
Consumer apps like Netflix serve massive read traffic. A big part of keeping latency low is ensuring that repeated reads (catalog data, metadata, personalization inputs) don’t always translate into expensive database queries.
Caching gives you a “shock absorber” layer: the database still exists, but it’s not doing the same work on every request.
CDNs: caching static assets at the edge
A CDN is essentially a specialized distributed cache.
If you’ve ever noticed that a website’s images load instantly the second time, that’s caching at work, often at multiple layers (browser + CDN).
Session state (and why it relates to sticky sessions)
Many teams start caching or storing session-like state in memory because it’s fast.
Then they add multiple app servers, and requests start landing on different instances. That’s one reason sticky sessions exist, but the more scalable solution is to move shared state into a shared cache/store like Redis.
If you want that full story, see What Are Sticky Sessions in Load Balancing? (Session Affinity).
Common Mistakes
1. Caching personalized data without a safe cache key
If the cache key doesn’t include the user/tenant scope, you can leak one user’s data to another. Treat cache key design like security work.
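One way to make that scope hard to forget is to build every personalized key through a single helper that requires it. The RequestScope shape, the personalizedKey helper, and the key layout below are hypothetical, shown only to illustrate the idea.

```typescript
// Hypothetical request scope; in a real app this comes from auth middleware.
interface RequestScope {
  tenantId: string;
  userId: string;
}

// Every personalized key goes through this helper, so the tenant and user
// are always part of the key and two users can never share an entry.
function personalizedKey(scope: RequestScope, resource: string): string {
  for (const part of [scope.tenantId, scope.userId]) {
    // Reject empty IDs, and IDs containing the ":" separator, which could
    // otherwise be crafted to collide with another user's key.
    if (part === "" || part.includes(":")) {
      throw new Error(`invalid scope component: "${part}"`);
    }
  }
  return `tenant:${scope.tenantId}:user:${scope.userId}:${resource}`;
}
```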
2. “Forever” caches with no invalidation plan
A cache with no TTL and no invalidation becomes a correctness bug factory. If you can’t invalidate safely, start with TTL.
3. Stampedes on cache miss
When a hot key expires, lots of requests can miss at once and hammer the database.
Common mitigations include:
- adding jitter to TTLs
- using request coalescing (only one in-flight refresh)
- pre-warming the cache for known hot keys
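A minimal in-process sketch of the first two mitigations. ttlWithJitter spreads expirations by up to ±10% (the ratio is an assumption), and loadOnce shares one in-flight promise per key so concurrent misses trigger a single load. Both names are hypothetical, and this only coalesces requests inside one process; across many app servers you would need something shared, such as a short-lived per-key lock.

```typescript
// Jitter: add up to ±jitterRatio of random noise to a base TTL, so keys
// written at the same moment don't all expire at the same moment.
function ttlWithJitter(baseTtlMs: number, jitterRatio = 0.1): number {
  const jitter = (Math.random() * 2 - 1) * jitterRatio * baseTtlMs;
  return Math.round(baseTtlMs + jitter);
}

// Request coalescing: concurrent misses for the same key share one load.
const inFlight = new Map<string, Promise<unknown>>();

function loadOnce<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>; // someone is already loading it
  const promise = loader().then(
    (value) => {
      inFlight.delete(key); // done: the next miss may trigger a fresh load
      return value;
    },
    (err: unknown) => {
      inFlight.delete(key); // failed: don't keep serving the rejection
      throw err;
    },
  );
  inFlight.set(key, promise);
  return promise;
}
```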
4. Treating cache as the source of truth
Caches fail. They restart. They evict under memory pressure. If losing cache data breaks correctness, you built a database with worse durability.
Interview Questions
1. What is caching and why does it improve performance?
Caching is storing a copy of data in a faster place so repeated requests can avoid expensive work like database queries or remote API calls. It improves performance by reducing latency and by lowering load on downstream systems, which also reduces tail latency during traffic spikes. In interviews, I emphasize that caching is about both speed and stability: protecting your database can prevent incident cascades. The trade-off is freshness, because cached data can become stale.
2. What’s the difference between cache-aside and write-through?
Cache-aside loads data into cache only on a read miss, so the first request is slow but the pattern is simple and flexible. Write-through updates the cache at the time of the write, which keeps the cache warm and reduces read-after-write misses. The trade-off is that write-through makes writes slower and introduces tricky failure handling when one of the writes fails. Cache-aside is usually the default starting point, especially when you’re still learning access patterns.
3. How do you choose a TTL for a cache key?
A good TTL depends on how stale the data is allowed to be, how expensive it is to recompute, and how often it changes. For “pretty static” data like product catalogs, longer TTLs can be fine; for rapidly changing data like inventory counts, you may need shorter TTLs or explicit invalidation. In practice, I pick an initial TTL, watch hit ratio and correctness complaints, then iterate. TTL is not a one-time decision; it’s an operational tuning knob.
4. What are common cache invalidation strategies?
The simplest approach is TTL-only: accept staleness until the entry expires. More advanced approaches include explicit invalidation on write (delete/update affected keys) and versioned keys that allow you to switch to a new cache namespace safely. The hardest part is identifying all the keys that a write affects, especially when cached data is a derived view. Many teams choose TTL-first because it avoids complex invalidation logic that’s easy to get wrong.
5. What can go wrong if you add caching too early?
Caching adds complexity and can hide underlying problems, like an unindexed query or an N+1 query pattern. It can also introduce correctness bugs if your cache key design is wrong or invalidation is incomplete, leading to inconsistent user experiences. Another risk is overconfidence: teams sometimes assume cache makes things “safe,” but a cache miss stampede can make outages worse. I treat caching as a powerful tool, but one that requires observability and a rollback plan.
6. How do you prevent a cache stampede?
A cache stampede happens when many requests miss at once (often due to expiration) and all hit the database simultaneously. Practical mitigations include adding jitter so keys don’t expire at the exact same time, coalescing requests so only one refresh is in-flight per key, and pre-warming the cache for known hot keys. In distributed systems, you might also use a short-lived lock per key, but you need to be careful that the lock mechanism itself doesn’t become a bottleneck. The goal is to keep the database protected even when the cache is cold.
Conclusion
- Caching means keeping a fast copy of data to serve repeated requests with lower latency and lower downstream load.
- Cache hits are where the benefit comes from; cache miss patterns determine whether your database stays protected.
- Most systems use multiple cache layers (browser, CDN, app memory, distributed cache), each with different trade-offs.
- TTL and eviction are how caches stay bounded, but they also introduce staleness you must plan for.
- Cache-aside is a great starting strategy; invalidation is the part that needs the most discipline.
The next topic in this series covers In-Memory Cache vs Distributed Cache, including how to choose between “fast local” and “shared but networked.”
If you want to revisit how caching interacts with scaling and routing, What Is Load Balancing and How It Works and What Are Sticky Sessions in Load Balancing? (Session Affinity) connect the dots.
References
- What is Caching and How it Works - AWS: https://aws.amazon.com/caching/
- HTTP caching - MDN Web Docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
- Caching strategies overview - Cloudflare Learning Center: https://www.cloudflare.com/learning/cdn/what-is-caching/
- Eviction policies - Redis Documentation: https://redis.io/docs/latest/operate/rs/databases/memory-performance/eviction-policy/
YouTube Videos
- “Caching Explained in Simple Words | Browser, CDN, Redis & DB”: https://www.youtube.com/watch?v=MIsEOHgB_Ic
- “Master Caching Strategy Selection | Cache-Aside, Write-Through, Write-Back, Write-Around Explained”: https://www.youtube.com/watch?v=Iim9lDEIh2g
- “Cache Hit & Cache Miss Explained | Caching Fundamentals”: https://www.youtube.com/watch?v=bPvW4uAYj_A