What Is Caching? Why It Improves Performance

Your app is “fine”… until it isn’t.

A dashboard page loads in 400ms in staging, and then one day it’s 6 seconds in production. The database CPU is pegged, the same expensive query is running thousands of times per minute, and now every tiny spike cascades into a mini outage.

Caching is one of the simplest ways to make systems feel fast and to reduce pressure on your most expensive dependencies.

If you’re following the learning path, read What Is Scalability? A Beginner’s Guide for Developers and What Is Load Balancing and How It Works first.

For the full roadmap, use the System Design Foundations series as the pillar page and the System Design tag as a category page.

Table of Contents

What Is Caching? (Definition)
Cache Hit vs Cache Miss (And Why It Matters)
Where Caches Live (Browser, CDN, App, Database)
TTL, Eviction, and Staleness
Common Caching Strategies (Cache-Aside, Read-Through, Write-Through)
Cache Invalidation: The Hard Part
A Practical Example: Cache-Aside in Code (TTL Cache)
Real-World Examples
Common Mistakes
Interview Questions
Conclusion
References
YouTube Videos

What Is Caching? (Definition)

Caching means storing a copy of data somewhere faster (or closer) so you can serve repeated requests without doing the expensive work every time.

In system design terms, caching is usually about one of these:

reducing latency (fewer network hops, fewer slow queries)
reducing load (less work for your database or downstream APIs)
smoothing spikes (absorbing bursts of repeated reads)

A simple mental model:

Primary storage is the source of truth (database, object storage, upstream API)
Cache is a fast copy that’s allowed to be slightly stale

Cache Hit vs Cache Miss (And Why It Matters)

A cache only helps when you get a cache hit.

Cache hit: data is already in the cache → return quickly.
Cache miss: data is not in the cache → fetch from the source of truth, then populate the cache.

The metric that matters is cache hit ratio:

high hit ratio → big latency win and big cost/load reduction
low hit ratio → you pay cache overhead but still hammer the database

In production, hit ratio isn’t a vanity metric - it drives real outcomes like:

database CPU
p95 latency
tail latency during spikes
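To make the effect of hit ratio concrete, here is a small back-of-the-envelope sketch. The function names and the 1 ms / 50 ms figures are illustrative assumptions, not measurements; the model simply assumes a miss pays for the cache lookup plus the source fetch.

```typescript
// Expected average latency for a read path fronted by a cache.
// hitRatio is in [0, 1]; a miss pays the cache lookup plus the source fetch.
function expectedLatencyMs(hitRatio: number, cacheMs: number, sourceMs: number): number {
  if (hitRatio < 0 || hitRatio > 1) {
    throw new RangeError("hitRatio must be between 0 and 1");
  }
  const hitCost = cacheMs;
  const missCost = cacheMs + sourceMs;
  return hitRatio * hitCost + (1 - hitRatio) * missCost;
}

// Fraction of reads that still reach the source of truth.
function sourceLoadFraction(hitRatio: number): number {
  return 1 - hitRatio;
}

// Illustrative numbers only: 1 ms cache lookup, 50 ms database query.
for (const hitRatio of [0.5, 0.9, 0.99]) {
  console.log(
    hitRatio,
    expectedLatencyMs(hitRatio, 1, 50).toFixed(2),
    sourceLoadFraction(hitRatio).toFixed(2),
  );
}
```

With these assumed numbers, the average drops from about 26 ms at a 50% hit ratio to about 6 ms at 90% and about 1.5 ms at 99%. Moving from 90% to 99% also cuts the share of reads reaching the database from 10% to 1%, which is why small hit ratio changes show up so clearly in database CPU.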

Where Caches Live (Browser, CDN, App, Database)

Most real systems use multiple cache layers (each with different trade-offs).

flowchart TD
    U[User Browser] --> CDN[CDN / Edge Cache]
    CDN --> LB[Load Balancer]
    LB --> APP[Application Servers]
    APP --> DC[("Distributed Cache<br/>Redis/Memcached")]
    APP --> DB[(Database)]

    classDef layer fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000000;
    class U,CDN,LB,APP,DC,DB layer;

1. Browser cache

The browser caches static assets (JS/CSS/images) and sometimes even API responses.

Great for repeat visits.
Doesn’t help your server load if you have lots of unique users who never repeat requests.
Needs careful use of HTTP caching headers.
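As a rough illustration of what "careful use of HTTP caching headers" can look like, the sketch below maps a few resource types to Cache-Control values. The directives are standard HTTP, but the three categories, the type name, and the specific max-age values are assumptions chosen for the example; the right policy depends on your application.

```typescript
// Hypothetical resource categories for this example.
type ResourceKind = "fingerprintedAsset" | "htmlPage" | "privateApiResponse";

function cacheControlFor(kind: ResourceKind): string {
  switch (kind) {
    case "fingerprintedAsset":
      // The file name contains a content hash, so the URL changes whenever
      // the content does. Safe to cache for a long time (one year here).
      return "public, max-age=31536000, immutable";
    case "htmlPage":
      // May be stored, but must be revalidated with the server before reuse.
      return "no-cache";
    case "privateApiResponse":
      // Only the user's own browser may store it, and only briefly.
      return "private, max-age=60";
  }
}

console.log(cacheControlFor("fingerprintedAsset"));
console.log(cacheControlFor("htmlPage"));
console.log(cacheControlFor("privateApiResponse"));
```

The `private` directive matters for anything user-specific: it tells shared caches such as CDNs not to store the response.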

2. CDN / edge cache

A CDN caches content close to the user.

Big win for global latency.
Great for static assets and cacheable API responses.
You must control cache keys carefully (especially for personalized content).
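One common way to control cache keys is to normalize the URL before using it as a key, so that equivalent requests share one entry. The sketch below is a simplified assumption of how that might look: the ignored-parameter list and the function name are hypothetical, and real CDNs expose this through their own configuration rather than application code.

```typescript
// Query parameters that do not change the response (assumed list for this example).
const IGNORED_PARAMS = new Set(["utm_source", "utm_medium", "utm_campaign"]);

function normalizedCacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);

  // Keep only the parameters that affect the response.
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, name) => {
    if (!IGNORED_PARAMS.has(name)) kept.push([name, value]);
  });

  // Sort by name so ?a=1&b=2 and ?b=2&a=1 map to the same key.
  kept.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));

  const query = kept
    .map(([name, value]) => `${encodeURIComponent(name)}=${encodeURIComponent(value)}`)
    .join("&");

  const base = `${url.host}${url.pathname}`;
  return query ? `${base}?${query}` : base;
}

console.log(normalizedCacheKey("https://example.com/products?page=2&utm_source=mail&category=shoes"));
```

Dropping tracking parameters raises the hit ratio, but the opposite mistake is more dangerous: if a parameter or header that does change the response is left out of the key, users can receive content meant for someone else.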

3. Application (in-memory) cache

Each app instance keeps a local cache in memory (a Map, LRU, etc.).

Extremely fast.
But each instance has its own cache, so the same key must be loaded once per instance and the overall hit ratio tends to drop as you add instances.
Cache contents vanish when the instance restarts.

4. Distributed cache (Redis/Memcached)

A shared cache that all app servers can use.

Better hit ratio because it’s shared.
Extra network hop (still usually much faster than a database query).
Needs operational maturity: sizing, eviction behavior, failover.

If you want the deeper interview-style version of this, see System Design Interview: Distributed Cache Like Redis/Memcached.

TTL, Eviction, and Staleness

Caching always introduces a trade-off: freshness vs speed.

Three concepts show up everywhere:

TTL (time to live)

A TTL is how long a cached value is considered valid.

Short TTL → fresher data, more cache misses.
Long TTL → higher hit ratio, higher staleness risk.

Eviction

Caches have limited space. When they fill up, something must be removed.

Common eviction policies include:

LRU (least recently used)
LFU (least frequently used)
FIFO (first in, first out)
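To show how an eviction policy works in practice, here is a minimal LRU sketch. It relies on the fact that a JavaScript Map iterates in insertion order; the class name and API are assumptions for illustration, and it leaves out TTLs and memory-based sizing that a real cache would need.

```typescript
class LruCache<K, V> {
  private store = new Map<K, V>();
  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.store.has(key)) return undefined;
    const value = this.store.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.store.delete(key);
    this.store.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.store.has(key)) {
      // Overwriting also counts as a use.
      this.store.delete(key);
    } else if (this.store.size >= this.capacity) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.store.keys().next().value as K;
      this.store.delete(oldest);
    }
    this.store.set(key, value);
  }

  has(key: K): boolean {
    return this.store.has(key);
  }
}

const recent = new LruCache<string, number>(2);
recent.set("a", 1);
recent.set("b", 2);
recent.get("a"); // "a" is now more recently used than "b"
recent.set("c", 3); // cache is full, so "b" is evicted
console.log(recent.has("a"), recent.has("b"), recent.has("c"));
```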

Staleness

If the source data changes before the TTL expires, the cache becomes stale.

That’s not always wrong. Many systems intentionally accept a little staleness (seconds to minutes) to gain massive performance improvements.

Common Caching Strategies (Cache-Aside, Read-Through, Write-Through)

When people say “we added caching,” they usually mean one of these patterns.

1. Cache-aside (lazy loading)

The application checks cache first. On miss, it loads from the DB and then sets the cache.

flowchart TD
    R[Request] --> G{Get from cache}
    G -->|Hit| C[Return cached value]
    G -->|Miss| Q[Query database]
    Q --> S[Set cache with TTL]
    S --> D[Return DB value]

    classDef step fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000000;
    class R,G,C,Q,S,D step;

Pros:

simple
cache fills naturally with hot keys

Cons:

the first request for each key (and the first after every expiry) pays the full miss cost
high miss bursts can stampede the DB if you’re not careful

2. Read-through

The cache acts like it “knows” how to load data from the database automatically.

Pros:

application code can be simpler

Cons:

harder to implement unless you’re using a caching library/framework that supports it
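A rough sketch of the read-through idea: the cache is constructed with a loader function, so callers only ever ask the cache. The class name, the loader signature, and the in-memory Map are assumptions for illustration; caching libraries that support read-through have their own APIs.

```typescript
type Loader<K, V> = (key: K) => Promise<V>;

class ReadThroughCache<K, V> {
  private store = new Map<K, { value: V; expiresAtMs: number }>();
  private loader: Loader<K, V>;
  private ttlMs: number;

  constructor(loader: Loader<K, V>, ttlMs: number) {
    this.loader = loader;
    this.ttlMs = ttlMs;
  }

  async get(key: K): Promise<V> {
    const entry = this.store.get(key);
    if (entry && Date.now() < entry.expiresAtMs) {
      return entry.value; // hit
    }
    // Miss or expired: the cache itself loads from the source and stores the result.
    const value = await this.loader(key);
    this.store.set(key, { value, expiresAtMs: Date.now() + this.ttlMs });
    return value;
  }
}

// Stand-in loader; replace with a real database call.
async function loadPriceFromDatabase(sku: string): Promise<number> {
  return sku.length * 100;
}

const prices = new ReadThroughCache<string, number>(loadPriceFromDatabase, 30_000);
prices.get("sku-1").then((price) => console.log(price));
```

Compared with cache-aside, the calling code never touches the database directly, which is what makes application code simpler and the cache layer more complex.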

3. Write-through

On write, you write to the cache and the database (synchronously).

Pros:

keeps cache warm
reduces “read after write” misses

Cons:

writes become slower
failure handling is trickier (what if cache write succeeds but DB write fails?)
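One way to reduce the failure-handling risk is to write to the database first and only then update the cache, so the cache never holds a value the database rejected. The sketch below assumes a hypothetical `KeyValueStore` interface and in-memory stand-ins for both stores; it is one possible ordering, not the only valid one.

```typescript
// Hypothetical minimal store interface used for both the database and the cache.
interface KeyValueStore<V> {
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V): Promise<void>;
  delete(key: string): Promise<void>;
}

class InMemoryStore<V> implements KeyValueStore<V> {
  private data = new Map<string, V>();
  async get(key: string): Promise<V | undefined> {
    return this.data.get(key);
  }
  async set(key: string, value: V): Promise<void> {
    this.data.set(key, value);
  }
  async delete(key: string): Promise<void> {
    this.data.delete(key);
  }
}

async function writeThrough<V>(
  db: KeyValueStore<V>,
  cache: KeyValueStore<V>,
  key: string,
  value: V,
): Promise<void> {
  // Source of truth first. If this throws, the cache is left untouched.
  await db.set(key, value);

  try {
    await cache.set(key, value);
  } catch {
    // The database has the new value but the cache update failed.
    // Best effort: drop the key so readers fall back to the database.
    await cache.delete(key).catch(() => undefined);
  }
}

const db = new InMemoryStore<string>();
const cache = new InMemoryStore<string>();
writeThrough(db, cache, "user:1:name", "Ada").then(async () => {
  console.log(await cache.get("user:1:name"));
});
```

If the best-effort delete also fails, the cache can still serve the old value, which is one reason to keep a TTL on entries even with write-through.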

There are also write-back and write-around strategies, but cache-aside + TTL is usually the first pattern teams reach for.

Cache Invalidation: The Hard Part

Caching feels easy until the data changes.

Invalidation is hard because the cache is a copy, not the source of truth: every change to the source can leave the copy wrong. You must decide how to keep it correct enough.

Common invalidation approaches:

TTL-only: accept staleness until TTL expires
explicit invalidation: after DB update, delete or update relevant cache keys
versioned keys: embed a version in the key (product:v3:123) so you can “flip” versions

Two practical rules:

If your cache key is too broad (for example, it omits the user or tenant ID), one user’s cached data can be served to another.
If your invalidation logic is too complex, you’ll ship bugs where data is wrong “sometimes.”

For beginner systems, “short TTL + delete-on-write for obvious keys” is often the best balance.
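The sketch below illustrates delete-on-write and versioned keys together. The two Maps stand in for a database and a cache, and all names are hypothetical. TTLs are left out to keep the example short; in practice you would combine this with a TTL so that unreachable old-version entries eventually age out.

```typescript
// Stand-ins for the source of truth and the cache.
const database = new Map<string, string>([["123", "Blue mug"]]);
const cacheStore = new Map<string, string>();

// Embedding a version in the key lets you abandon every old entry at once.
let keyVersion = 3;

function productKey(productId: string): string {
  return `product:v${keyVersion}:${productId}`;
}

function readProduct(productId: string): string | undefined {
  const key = productKey(productId);
  const cached = cacheStore.get(key);
  if (cached !== undefined) return cached;

  const fresh = database.get(productId);
  if (fresh !== undefined) cacheStore.set(key, fresh);
  return fresh;
}

// Explicit invalidation: update the source of truth, then delete the cached copy.
function updateProduct(productId: string, name: string): void {
  database.set(productId, name);
  cacheStore.delete(productKey(productId));
}

// Versioned keys: after a bump, reads use new keys and miss until repopulated.
function bumpKeyVersion(): void {
  keyVersion += 1;
}

console.log(readProduct("123")); // loads from the database and fills the cache
updateProduct("123", "Red mug"); // the next read will miss and reload
console.log(readProduct("123"));
```

Deleting the key rather than overwriting it keeps the write path simple: the next read repopulates the cache from the database through the normal cache-aside flow.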

A Practical Example: Cache-Aside in Code (TTL Cache)

This is a tiny in-memory TTL cache you can use to understand the pattern. It’s not a production cache (it has no size limit or eviction, and expired entries are only removed when they are read), but it shows the flow.

type CacheEntry<T> = {
  value: T;
  expiresAtMs: number;
};

export class TtlCache<K, V> {
  private store = new Map<K, CacheEntry<V>>();

  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;

    // Expired → behave like a miss
    if (Date.now() >= entry.expiresAtMs) {
      this.store.delete(key);
      return undefined;
    }

    return entry.value;
  }

  set(key: K, value: V, ttlMs: number): void {
    this.store.set(key, {
      value,
      expiresAtMs: Date.now() + ttlMs,
    });
  }
}

type UserProfile = { id: string; name: string };

const cache = new TtlCache<string, UserProfile>();

async function fetchUserFromDatabase(userId: string): Promise<UserProfile> {
  // Replace with real DB call
  return { id: userId, name: "Ada" };
}

// Example usage (cache-aside)
async function getUserProfile(userId: string): Promise<UserProfile> {
  const key = `user:${userId}`;

  // Compare against undefined so falsy cached values would still count as hits
  const cached = cache.get(key);
  if (cached !== undefined) return cached;

  // Cache miss: fetch from the source of truth
  const profile = await fetchUserFromDatabase(userId);

  // Cache set: pick a TTL based on how fresh it needs to be
  cache.set(key, profile, 60_000);

  return profile;
}

What to notice:

expired entries become cache misses
TTL is a design choice (it should match your freshness requirements)

Real-World Examples

Netflix: caching to protect the database and speed up reads

Consumer apps like Netflix serve massive read traffic. A big part of keeping latency low is ensuring that repeated reads (catalog data, metadata, personalization inputs) don’t always translate into expensive database queries.

Caching gives you a “shock absorber” layer: the database still exists, but it’s not doing the same work on every request.

CDNs: caching static assets at the edge

A CDN is essentially a specialized distributed cache.

If you’ve ever noticed that a website’s images load instantly the second time, that’s caching at work - often at multiple layers (browser + CDN).

Session state (and why it relates to sticky sessions)

Many teams start caching or storing session-like state in memory because it’s fast.

Then they add multiple app servers, and requests start landing on different instances. That’s one reason sticky sessions exist, but the more scalable solution is to move shared state into a shared cache/store like Redis.

If you want that full story, see What Are Sticky Sessions in Load Balancing? (Session Affinity).

Common Mistakes

1. Caching personalized data without a safe cache key

If the cache key doesn’t include the user/tenant scope, you can leak one user’s data to another. Treat cache key design like security work.
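A small sketch of what a scoped key builder might look like. The key layout and function name are assumptions; the point is that tenant and user are always part of the key and that separators inside the inputs cannot be used to forge another user's key.

```typescript
function scopedKey(tenantId: string, userId: string, resource: string): string {
  for (const part of [tenantId, userId, resource]) {
    if (part.length === 0) {
      throw new Error("tenantId, userId, and resource must all be non-empty");
    }
  }
  // encodeURIComponent escapes ":" so an input such as "a:user:b"
  // cannot collide with a different tenant/user combination.
  return [
    "tenant",
    encodeURIComponent(tenantId),
    "user",
    encodeURIComponent(userId),
    encodeURIComponent(resource),
  ].join(":");
}

console.log(scopedKey("acme", "42", "dashboard"));
console.log(scopedKey("acme", "43", "dashboard"));
```

Routing every personalized read through one helper like this makes it harder to forget the scope in a single call site.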

2. “Forever” caches with no invalidation plan

A cache with no TTL and no invalidation becomes a correctness bug factory. If you can’t invalidate safely, start with TTL.

3. Stampedes on cache miss

When a hot key expires, lots of requests can miss at once and hammer the database.

Common mitigations include:

adding jitter to TTLs
using request coalescing (only one in-flight refresh)
pre-warming the cache for known hot keys
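The first two mitigations can be sketched in a few lines. This is a single-process illustration under assumed names (`jitteredTtlMs`, `coalesce`); across multiple app servers you would need a shared mechanism such as a per-key lock instead of a local Map.

```typescript
// Spread expirations by picking a TTL uniformly within ±jitterFraction of the base.
// The random source is injectable so the function is easy to test.
function jitteredTtlMs(
  baseTtlMs: number,
  jitterFraction: number,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * jitterFraction * baseTtlMs;
  return Math.round(baseTtlMs + offset);
}

// Request coalescing: concurrent callers for the same key share one in-flight load.
const inFlight = new Map<string, Promise<unknown>>();

function coalesce<T>(key: string, load: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  // Remove the entry once the load settles so later misses trigger a fresh load.
  const pending = load().finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}

// Stand-in for an expensive query; counts how often it actually runs.
let databaseCalls = 0;
async function loadHotKey(): Promise<string> {
  databaseCalls += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return "report";
}

Promise.all([1, 2, 3, 4, 5].map(() => coalesce("report:today", loadHotKey))).then(() => {
  console.log("database calls:", databaseCalls);
  console.log("ttl with jitter:", jitteredTtlMs(60_000, 0.1));
});
```

In a cache-aside read path, the miss branch would call `coalesce(key, ...)` around the database fetch and store the result with `jitteredTtlMs(...)` as the TTL.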

4. Treating cache as the source of truth

Caches fail. They restart. They evict under memory pressure. If losing cache data breaks correctness, you built a database with worse durability.

Interview Questions

1. What is caching and why does it improve performance?

Caching is storing a copy of data in a faster place so repeated requests can avoid expensive work like database queries or remote API calls. It improves performance by reducing latency and by lowering load on downstream systems, which also reduces tail latency during traffic spikes. In interviews, I emphasize that caching is about both speed and stability: protecting your database can prevent incident cascades. The trade-off is freshness, because cached data can become stale.

2. What’s the difference between cache-aside and write-through?

Cache-aside loads data into cache only on a read miss, so the first request is slow but the pattern is simple and flexible. Write-through updates the cache at the time of the write, which keeps the cache warm and reduces read-after-write misses. The trade-off is that write-through makes writes slower and introduces tricky failure handling when one of the writes fails. Cache-aside is usually the default starting point, especially when you’re still learning access patterns.

3. How do you choose a TTL for a cache key?

A good TTL depends on how stale the data is allowed to be, how expensive it is to recompute, and how often it changes. For “pretty static” data like product catalogs, longer TTLs can be fine; for rapidly changing data like inventory counts, you may need shorter TTLs or explicit invalidation. In practice, I pick an initial TTL, watch hit ratio and correctness complaints, then iterate. TTL is not a one-time decision - it’s an operational tuning knob.

4. What are common cache invalidation strategies?

The simplest approach is TTL-only: accept staleness until the entry expires. More advanced approaches include explicit invalidation on write (delete/update affected keys) and versioned keys that allow you to switch to a new cache namespace safely. The hardest part is identifying all the keys that a write affects, especially when cached data is a derived view. Many teams choose TTL-first because it avoids complex invalidation logic that’s easy to get wrong.

5. What can go wrong if you add caching too early?

Caching adds complexity and can hide underlying problems, like an unindexed query or an N+1 query pattern. It can also introduce correctness bugs if your cache key design is wrong or invalidation is incomplete, leading to inconsistent user experiences. Another risk is overconfidence: teams sometimes assume cache makes things “safe,” but a cache miss stampede can make outages worse. I treat caching as a powerful tool, but one that requires observability and a rollback plan.

6. How do you prevent a cache stampede?

A cache stampede happens when many requests miss at once (often due to expiration) and all hit the database simultaneously. Practical mitigations include adding jitter so keys don’t expire at the exact same time, coalescing requests so only one refresh is in-flight per key, and pre-warming the cache for known hot keys. In distributed systems, you might also use a short-lived lock per key, but you need to be careful that the lock mechanism itself doesn’t become a bottleneck. The goal is to keep the database protected even when the cache is cold.

Conclusion

Caching means keeping a fast copy of data to serve repeated requests with lower latency and lower downstream load.
Cache hits are where the benefit comes from; cache miss patterns determine whether your database stays protected.
Most systems use multiple cache layers (browser, CDN, app memory, distributed cache), each with different trade-offs.
TTL and eviction are how caches stay bounded, but they also introduce staleness you must plan for.
Cache-aside is a great starting strategy; invalidation is the part that needs the most discipline.

The next topic in this series covers In-Memory Cache vs Distributed Cache - and how to choose between “fast local” and “shared but networked.”

If you want to revisit how caching interacts with scaling and routing, What Is Load Balancing and How It Works and What Are Sticky Sessions in Load Balancing? (Session Affinity) connect the dots.

References

What is Caching and How it Works - AWS
https://aws.amazon.com/caching/
HTTP caching - MDN Web Docs
https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
Caching strategies overview - Cloudflare Learning Center
https://www.cloudflare.com/learning/cdn/what-is-caching/
Eviction policies - Redis Documentation
https://redis.io/docs/latest/operate/rs/databases/memory-performance/eviction-policy/

YouTube Videos

“Caching Explained in Simple Words | Browser, CDN, Redis & DB”
https://www.youtube.com/watch?v=MIsEOHgB_Ic
“Master Caching Strategy Selection | Cache-Aside, Write-Through, Write-Back, Write-Around Explained”
https://www.youtube.com/watch?v=Iim9lDEIh2g
“Cache Hit & Cache Miss Explained | Caching Fundamentals”
https://www.youtube.com/watch?v=bPvW4uAYj_A
