Caching Strategies in System Design

July 25, 2025 · 10 min read

Tags: System Design, Caching, Redis, CDN, Performance

This is Part 5 of the System Design from Zero to Hero series.

TL;DR

Caching stores frequently accessed data closer to where it is needed, reducing latency and database load. The main strategies — cache-aside, write-through, write-behind, and read-through — each offer different trade-offs between consistency, performance, and complexity. The hardest problem in caching is not storing data; it is knowing when to invalidate it. Effective caching happens at multiple layers: browser, CDN, application (Redis), and database query cache.

Why This Matters

In Part 4, we discussed choosing the right database for your data. But even the best database hits performance limits when every request queries it directly. A well-tuned PostgreSQL instance on solid hardware can handle thousands of queries per second. A Redis cache on the same hardware handles hundreds of thousands.

Caching is not optional at scale. It is the difference between sub-50ms response times and multi-second page loads. It is the difference between a database humming along at 30% CPU and one falling over during peak traffic. Every major system — from Google Search to Netflix to Twitter — relies on aggressive, multi-layer caching to deliver the performance users expect.

Core Concepts

The Caching Hierarchy

Caching happens at every layer of the stack, each with different latency characteristics and use cases:

Layer            Latency        Scope               Example
────────────────────────────────────────────────────────────────────────
Browser cache    ~0ms           Single user         Static assets, API responses
CDN cache        ~10-50ms       Geographic region   Images, CSS, JS, HTML pages
Load Balancer    ~1ms           Per-server          SSL sessions, simple responses
Application      ~1-5ms         Cluster-wide        Redis/Memcached
Database cache   ~5-10ms        Per-database        Query cache, buffer pool
Disk cache       ~0.1ms         Per-machine         OS page cache

The goal is to serve as many requests as possible from the fastest layer. A request that hits the browser cache never touches your servers at all. A request that hits the CDN avoids your application entirely. A request that hits Redis avoids an expensive database query. Each cache layer absorbs traffic that would otherwise cascade down to slower, more expensive layers.
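The cascade above can be sketched in code. This is a minimal, hypothetical two-level lookup (an in-process dict standing in for the local layer, any shared store standing in for Redis); the helper names and TTL values are illustrative, not part of any real library:

```python
import json
import time

# Layer 1: in-process memory. Keep its TTL very short so instances
# don't drift far from the shared cache.
local_cache: dict[str, tuple[float, dict]] = {}
LOCAL_TTL = 5  # seconds

def get_cascading(key: str, shared_cache, fetch_from_db):
    # Layer 1: in-process memory (~nanoseconds)
    entry = local_cache.get(key)
    if entry and time.time() - entry[0] < LOCAL_TTL:
        return entry[1]

    # Layer 2: shared cache such as Redis (~1-5ms)
    cached = shared_cache.get(key)
    if cached is not None:
        value = json.loads(cached)
    else:
        # Layer 3: database (~5-10ms or more)
        value = fetch_from_db(key)
        shared_cache[key] = json.dumps(value)

    # Populate the faster layer on the way back up
    local_cache[key] = (time.time(), value)
    return value
```

Each miss falls through to the next, slower layer, and each hit at a fast layer shields everything beneath it.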

Cache-Aside (Lazy Loading)

Cache-aside is the most common caching pattern. The application code is responsible for reading from and writing to the cache. Data is loaded into the cache on demand — only when a request needs it.

Read path:
1. Application checks cache
2. Cache HIT → return cached data
3. Cache MISS → query database → store in cache → return data

Write path:
1. Application writes to database
2. Application invalidates (deletes) the cache entry
3. Next read will repopulate the cache
python
import redis
import json
 
cache = redis.Redis(host='redis.internal', port=6379)
CACHE_TTL = 3600  # 1 hour
 
async def get_user_profile(user_id: str) -> dict | None:
    cache_key = f"user:profile:{user_id}"
 
    # Step 1: Check cache
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)  # Cache HIT
 
    # Step 2: Cache MISS - query database
    profile = await db.fetchrow(
        "SELECT id, name, email, avatar, bio FROM users WHERE id = $1",
        user_id
    )
 
    if profile is None:
        return None
 
    result = dict(profile)
 
    # Step 3: Store in cache with TTL
    cache.setex(cache_key, CACHE_TTL, json.dumps(result))
 
    return result
 
async def update_user_profile(user_id: str, updates: dict):
    # Step 1: Update database (source of truth)
    await db.execute(
        "UPDATE users SET name = $1, bio = $2 WHERE id = $3",
        updates['name'], updates['bio'], user_id
    )
 
    # Step 2: Invalidate cache (delete, not update)
    cache.delete(f"user:profile:{user_id}")
    # Next read will fetch fresh data from DB and repopulate cache

Advantages: Only caches data that is actually requested. Simple to implement. Cache failures do not block the application — you fall back to the database.

Disadvantages: First request for any piece of data is always slow (cache miss). Potential for stale data between database write and cache invalidation. Cache stampede risk (explained below).

Write-Through

In write-through caching, every write goes to both the cache and the database simultaneously. The write is not considered complete until both have been updated.

Write path:
1. Application writes to cache AND database (synchronously)
2. Both are always in sync

Read path:
1. Application reads from cache (always populated)
2. Cache MISS → query database (rare, only after eviction)
python
async def update_product_price(product_id: str, new_price: float):
    # Write to database
    await db.execute(
        "UPDATE products SET price = $1 WHERE id = $2",
        new_price, product_id
    )
 
    # Write to cache (synchronously - both must succeed)
    product = await db.fetchrow("SELECT * FROM products WHERE id = $1", product_id)
    cache.setex(
        f"product:{product_id}",
        CACHE_TTL,
        json.dumps(dict(product))
    )

Advantages: Cache is always consistent with the database. Reads are always fast (no cold cache on reads).

Disadvantages: Higher write latency (two writes per operation). Caches data that might never be read, wasting memory. Not suitable for write-heavy workloads where cache churn is high.

Write-Behind (Write-Back)

Write-behind writes to the cache immediately and asynchronously flushes changes to the database in the background. This optimizes write performance at the cost of potential data loss.

Write path:
1. Application writes to cache (fast, returns immediately)
2. Background process batches and flushes cache writes to database

Read path:
1. Application reads from cache (always up-to-date)
python
import asyncio
 
# Write buffer for batching database writes (video_id -> latest count)
write_buffer: dict[str, int] = {}
 
async def update_view_count(video_id: str):
    """Increment view count in cache immediately, batch-write to DB"""
    cache_key = f"video:views:{video_id}"
 
    # Atomic increment in Redis (fast)
    new_count = cache.incr(cache_key)
 
    # Buffer the write for batch processing
    write_buffer[video_id] = new_count
 
async def flush_view_counts():
    """Periodically flush buffered counts to database"""
    while True:
        await asyncio.sleep(10)  # Flush every 10 seconds
 
        if not write_buffer:
            continue
 
        # Batch update database
        batch = dict(write_buffer)
        write_buffer.clear()
 
        for video_id, count in batch.items():
            await db.execute(
                "UPDATE videos SET view_count = $1 WHERE id = $2",
                count, video_id
            )

Advantages: Extremely fast writes. Batching reduces database load. Ideal for high-frequency counters, metrics, and analytics.

Disadvantages: Risk of data loss if the cache crashes before flushing. Increased complexity. Not suitable for data that requires immediate durability (financial transactions).

Read-Through

Read-through is similar to cache-aside, but the cache itself is responsible for loading data from the database on a miss. The application only talks to the cache, never directly to the database for reads.

Read path:
1. Application requests data from cache
2. Cache HIT → return data
3. Cache MISS → cache queries database, stores result, returns data

The application never queries the database directly for reads.

This pattern is often implemented by the caching framework itself (e.g., NCache, Hazelcast) rather than in application code. The benefit is that cache population logic is centralized in the cache layer rather than scattered across application code.
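The pattern can be approximated in application code by putting the loader inside the cache object, so callers only ever talk to the cache. This is a hypothetical sketch, not a real NCache or Hazelcast API:

```python
import json

class ReadThroughCache:
    """The cache owns the loader; callers only ever call get()."""

    def __init__(self, backing_store, loader, ttl=3600):
        self.store = backing_store  # e.g. a redis client
        self.loader = loader        # called on a miss to hit the DB
        self.ttl = ttl

    def get(self, key):
        cached = self.store.get(key)
        if cached is not None:
            return json.loads(cached)
        # Miss: the cache layer itself loads, stores, and returns the value
        value = self.loader(key)
        if value is not None:
            self.store.setex(key, self.ttl, json.dumps(value))
        return value
```

Notice the application code never contains a database query for reads; the loader is registered once, at cache construction time.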

Redis Data Structures for Caching

Redis is far more than a simple key-value store. Its data structures enable powerful caching patterns:

python
import redis
 
r = redis.Redis(host='redis.internal', port=6379)
 
# STRING: Simple key-value caching
r.setex('user:42:profile', 3600, json.dumps(user_data))
 
# HASH: Cache object fields independently
r.hset('user:42', mapping={
    'name': 'Alice',
    'email': 'alice@example.com',
    'role': 'admin'
})
# Update single field without fetching entire object
r.hset('user:42', 'last_login', '2024-01-15T10:30:00Z')
# Get single field
name = r.hget('user:42', 'name')
 
# SORTED SET: Leaderboards, top-N queries
r.zadd('trending:articles', {'article:101': 4500, 'article:205': 3200})
# Get top 10 articles
top_articles = r.zrevrange('trending:articles', 0, 9, withscores=True)
 
# LIST: Recent activity feeds, queues
r.lpush('user:42:feed', json.dumps(new_activity))
r.ltrim('user:42:feed', 0, 99)  # Keep only last 100 items
recent = r.lrange('user:42:feed', 0, 19)  # Get 20 most recent
 
# SET: Unique values, membership checks
r.sadd('online:users', 'user:42', 'user:88', 'user:101')
is_online = r.sismember('online:users', 'user:42')  # O(1) lookup
online_count = r.scard('online:users')
 
# HyperLogLog: Approximate unique counts (uses ~12 KB regardless of count)
r.pfadd('page:home:visitors', 'user:42', 'user:88')
r.pfadd('page:home:visitors', 'user:42')  # duplicate, not counted
approx_unique = r.pfcount('page:home:visitors')

Choosing the right Redis data structure can eliminate the need for complex database queries entirely. A sorted set replaces a SELECT ... ORDER BY score DESC LIMIT 10 query that scans and sorts rows on every request.

CDN Caching

A Content Delivery Network caches content at edge servers geographically close to users. For static assets (images, CSS, JavaScript), CDN caching reduces latency from hundreds of milliseconds to single-digit milliseconds.

Without CDN:
User in Tokyo ──500ms──▶ Origin server in Virginia

With CDN:
User in Tokyo ──10ms──▶ CDN edge in Tokyo
                         (cached copy of content)

CDN caching is controlled through HTTP headers:

# Cache-Control header examples

# Cache for 1 year (static assets with fingerprinted filenames)
Cache-Control: public, max-age=31536000, immutable

# Cache for 5 minutes, serve stale while revalidating
Cache-Control: public, max-age=300, stale-while-revalidate=60

# Don't cache (user-specific content)
Cache-Control: private, no-store

# Cache but always revalidate with origin
Cache-Control: public, no-cache

The immutable directive is particularly powerful for assets with content hashes in their filenames (e.g., app.a3b2c1.js). It tells the CDN and browser to never revalidate — the content at that URL will never change. If you update the file, the filename changes and it is treated as a new resource.
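In application code, the header value is usually chosen per asset type. A small sketch of such a policy helper (the function, extensions, and max-age values are illustrative assumptions, to be adapted to your deployment):

```python
def cache_control_for(path: str, user_specific: bool = False) -> str:
    """Pick a Cache-Control value based on what is being served."""
    if user_specific:
        # Never let shared caches (CDN) store per-user responses
        return "private, no-store"
    if path.endswith((".js", ".css", ".woff2", ".png", ".jpg")):
        # Fingerprinted static assets: cache "forever", never revalidate
        return "public, max-age=31536000, immutable"
    if path.endswith(".html"):
        # HTML: short TTL, serve stale while refreshing in the background
        return "public, max-age=300, stale-while-revalidate=60"
    # Default: cache but revalidate with the origin on every use
    return "public, no-cache"
```

The returned string would then be set as the `Cache-Control` header on the response in whatever web framework you use.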

Browser Caching

Browser caching is the fastest cache layer — zero network latency. The browser stores responses locally based on Cache-Control headers.

javascript
// Service Worker: programmatic browser caching
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      if (cached) {
        // Serve from cache, update in background (stale-while-revalidate)
        event.waitUntil(
          fetch(event.request).then((response) => {
            caches.open('v1').then((cache) => {
              cache.put(event.request, response);
            });
          })
        );
        return cached;
      }
      // Not cached: fetch from network and cache the response
      return fetch(event.request).then((response) => {
        const clone = response.clone();
        caches.open('v1').then((cache) => cache.put(event.request, clone));
        return response;
      });
    })
  );
});

Cache Invalidation Strategies

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. Here are the proven approaches:

TTL-based (Time-To-Live): Set an expiration time on cached entries. Simple and predictable, but users see stale data until the TTL expires.

python
# TTL-based: simple but potentially stale
cache.setex('product:42', 300, json.dumps(product))  # expires in 5 minutes

Event-based invalidation: When data changes, publish an event that triggers cache invalidation. More complex but ensures freshness.

python
# Event-based: invalidate on write
async def update_product(product_id, updates):
    await db.execute("UPDATE products SET ... WHERE id = $1", product_id)
 
    # Publish invalidation event
    await message_bus.publish('cache.invalidate', {
        'keys': [f'product:{product_id}', f'category:{updates["category_id"]}:products']
    })
 
# Cache invalidation subscriber (runs on all app servers)
async def on_cache_invalidate(event):
    for key in event['keys']:
        cache.delete(key)

Versioned keys: Include a version number in cache keys. When data changes, increment the version. Old cache entries are never explicitly deleted — they simply expire via TTL.

python
# Versioned keys: avoid explicit invalidation
async def get_product(product_id):
    version = await db.fetchval(
        "SELECT version FROM products WHERE id = $1", product_id
    )
    cache_key = f"product:{product_id}:v{version}"
 
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
 
    product = await db.fetchrow("SELECT * FROM products WHERE id = $1", product_id)
    cache.setex(cache_key, 3600, json.dumps(dict(product)))
    return dict(product)
 
# On update: just increment version. Old cache entry expires naturally.
async def update_product(product_id, updates):
    await db.execute(
        "UPDATE products SET name=$1, version=version+1 WHERE id=$2",
        updates['name'], product_id
    )

Cache Stampede Prevention

A cache stampede (also called the thundering herd problem) occurs when a popular cache entry expires and hundreds of concurrent requests simultaneously query the database to repopulate it. This can overwhelm the database.

Normal operation:
1000 requests/sec ──▶ Cache (HIT) ──▶ response
                     Only 1 request/sec leaks to DB

Cache entry expires:
1000 requests/sec ──▶ Cache (ALL MISS) ──▶ DB receives 1000 queries simultaneously
                                           DB overwhelmed

Solution 1: Locking — Only one request repopulates the cache; others wait.

python
import asyncio
 
async def get_with_lock(key: str, fetch_fn, ttl: int = 3600):
    # Try cache first
    cached = cache.get(key)
    if cached:
        return json.loads(cached)
 
    # Acquire lock (only one process populates the cache)
    lock_key = f"lock:{key}"
    acquired = cache.set(lock_key, "1", nx=True, ex=10)  # 10s lock timeout
 
    if acquired:
        try:
            # This process populates the cache
            data = await fetch_fn()
            cache.setex(key, ttl, json.dumps(data))
            return data
        finally:
            cache.delete(lock_key)
    else:
        # Another process is populating — wait and retry
        for _ in range(50):  # wait up to 5 seconds
            await asyncio.sleep(0.1)  # non-blocking; yields the event loop
            cached = cache.get(key)
            if cached:
                return json.loads(cached)
 
        # Fallback: query database directly
        return await fetch_fn()

Solution 2: Early expiration (probabilistic) — Refresh the cache before it actually expires. Each request that reads a near-expiry entry has a small probability of triggering a background refresh.

python
import asyncio
import random
 
async def refresh_cache(key: str, fetch_fn, ttl: int):
    """Re-fetch the data and overwrite the cache entry in the background."""
    data = await fetch_fn()
    cache.setex(key, ttl, json.dumps(data))
 
async def get_with_early_refresh(key: str, fetch_fn, ttl: int = 3600):
    cached = cache.get(key)
    if cached:
        data = json.loads(cached)
        remaining_ttl = cache.ttl(key)
 
        # If less than 10% of TTL remaining, probabilistically refresh
        if remaining_ttl < ttl * 0.1:
            refresh_probability = 1 - (remaining_ttl / (ttl * 0.1))
            if random.random() < refresh_probability:
                # Refresh in background; the caller still gets the cached value
                asyncio.create_task(refresh_cache(key, fetch_fn, ttl))
 
        return data
 
    # Cache miss: fetch and populate
    data = await fetch_fn()
    cache.setex(key, ttl, json.dumps(data))
    return data

Solution 3: Never expire hot keys — For extremely popular cache entries, refresh them proactively on a schedule rather than relying on TTL expiration.
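Solution 3 can be sketched as a background task that rebuilds a registered set of hot keys on a timer, with a TTL comfortably longer than the refresh interval so the entries never expire under load. The registry, fetchers, and intervals below are hypothetical:

```python
import asyncio
import json

async def refresh_once(cache, registry: dict):
    """Rebuild every registered hot entry from its fetcher."""
    for key, fetch_fn in registry.items():
        data = fetch_fn()
        # TTL is 3x the refresh interval, so the entry survives a
        # missed refresh but never expires during normal operation
        cache.setex(key, 300, json.dumps(data))

async def refresh_hot_keys_forever(cache, registry: dict, interval: int = 60):
    """Run refresh_once on a schedule, e.g. as a startup background task."""
    while True:
        await refresh_once(cache, registry)
        await asyncio.sleep(interval)
```

A registry entry might look like `{"homepage:trending": compute_trending}`, where `compute_trending` runs the expensive query that would otherwise stampede the database.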

Practical Implementation

Here is a complete caching layer for a web application that combines multiple strategies:

python
class CacheLayer:
    """Multi-strategy caching with stampede prevention"""
 
    def __init__(self, redis_client, default_ttl=3600):
        self.cache = redis_client
        self.default_ttl = default_ttl
 
    async def get_or_fetch(self, key, fetch_fn, ttl=None, strategy='cache_aside'):
        ttl = ttl or self.default_ttl
 
        if strategy == 'cache_aside':
            return await self._cache_aside(key, fetch_fn, ttl)
        elif strategy == 'read_through_locked':
            return await self._read_through_locked(key, fetch_fn, ttl)
        else:
            raise ValueError(f"Unknown caching strategy: {strategy}")
 
    async def _cache_aside(self, key, fetch_fn, ttl):
        cached = self.cache.get(key)
        if cached:
            return json.loads(cached)
 
        data = await fetch_fn()
        if data is not None:
            self.cache.setex(key, ttl, json.dumps(data))
        return data
 
    async def _read_through_locked(self, key, fetch_fn, ttl):
        """Cache-aside with stampede prevention via locking"""
        cached = self.cache.get(key)
        if cached:
            return json.loads(cached)
 
        lock_key = f"lock:{key}"
        if self.cache.set(lock_key, "1", nx=True, ex=10):
            try:
                data = await fetch_fn()
                if data is not None:
                    self.cache.setex(key, ttl, json.dumps(data))
                return data
            finally:
                self.cache.delete(lock_key)
        else:
            # Wait for the lock holder to populate cache
            for _ in range(50):
                await asyncio.sleep(0.1)
                cached = self.cache.get(key)
                if cached:
                    return json.loads(cached)
            return await fetch_fn()
 
    def invalidate(self, *keys):
        """Delete one or more cache entries"""
        if keys:
            self.cache.delete(*keys)
 
    def invalidate_pattern(self, pattern):
        """Delete all keys matching a pattern (use sparingly)"""
        cursor = 0
        while True:
            cursor, keys = self.cache.scan(cursor, match=pattern, count=100)
            if keys:
                self.cache.delete(*keys)
            if cursor == 0:
                break
 
# Usage
cache_layer = CacheLayer(redis.Redis(host='redis.internal'))
 
# Standard cache-aside for user profiles
profile = await cache_layer.get_or_fetch(
    f"user:{user_id}:profile",
    lambda: db.get_user_profile(user_id),
    ttl=1800
)
 
# Locked read-through for popular product pages
product = await cache_layer.get_or_fetch(
    f"product:{product_id}",
    lambda: db.get_product(product_id),
    ttl=300,
    strategy='read_through_locked'
)

Trade-offs and Decision Framework

Strategy        Consistency   Read Perf              Write Perf   Complexity   Best For
───────────────────────────────────────────────────────────────────────────────────────
Cache-Aside     Eventual      Good (after warm-up)   Good         Low          General purpose
Write-Through   Strong        Excellent              Slower       Medium       Read-heavy, consistency matters
Write-Behind    Eventual      Excellent              Excellent    High         Counters, analytics, metrics
Read-Through    Eventual      Good (after warm-up)   Good         Medium       Framework-managed caching

Decision guidelines:

  • Start with cache-aside. It is the simplest, most flexible, and handles most use cases.
  • Use write-through when read consistency is important and write volume is moderate.
  • Use write-behind only for data where temporary loss is acceptable (view counts, analytics).
  • Set TTLs on everything. A cache entry without a TTL is a memory leak waiting to happen.
  • Monitor cache hit rates. Below 80% suggests your TTLs are too short or you are caching the wrong data.
  • Prefer short TTLs (minutes) over long ones (hours) when starting out. You can always increase TTLs after validating that stale data is acceptable for your use case.
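The hit-rate guideline above can be monitored directly from the counters Redis exposes via `INFO stats` (`keyspace_hits` and `keyspace_misses`); a minimal sketch:

```python
def cache_hit_rate(stats: dict) -> float:
    """Compute hit rate from the INFO stats counters Redis exposes."""
    hits = stats.get("keyspace_hits", 0)
    misses = stats.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0

# With a live redis-py client this would be:
#   r = redis.Redis(host='redis.internal')
#   rate = cache_hit_rate(r.info('stats'))
#   # alert if rate < 0.8, per the guideline above
```

Note these counters are cumulative since the last restart, so for dashboards you typically compute the rate over the delta between two samples.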

Common Interview Questions

Q: How would you implement caching for a social media news feed? Use a fan-out-on-write approach: when a user posts, push the post ID to each follower's cached feed (a Redis list). For celebrities with millions of followers, use fan-out-on-read instead — merge the celebrity's posts with the cached feed at read time. Use cache-aside with a TTL for the feed, and sorted sets to maintain ordering by timestamp. Invalidate or update the feed cache when a user creates, deletes, or hides a post.
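The fan-out-on-write step described above can be sketched with a Redis list per follower, using `LPUSH` plus `LTRIM` to bound memory (the key format and feed limit are illustrative assumptions):

```python
FEED_LIMIT = 100  # keep only the most recent items per follower

def fan_out_post(cache, post_id: str, follower_ids: list[str]):
    """Push a new post's ID onto each follower's cached feed."""
    for follower_id in follower_ids:
        key = f"user:{follower_id}:feed"
        cache.lpush(key, post_id)          # newest first
        cache.ltrim(key, 0, FEED_LIMIT - 1)  # bound memory per feed
```

For the celebrity case, you would skip this loop entirely and merge the celebrity's recent posts into the feed at read time instead.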

Q: Your cache hit rate dropped from 95% to 60% overnight. What happened? Common causes: (1) A deployment changed cache keys (new naming convention invalidated all existing entries). (2) The cache was restarted or scaled, causing a cold cache. (3) A new feature introduced uncacheable queries. (4) Traffic patterns shifted (new users accessing uncached content). (5) TTLs were inadvertently shortened. Diagnose by checking cache memory usage, eviction rates, and correlating with recent deployments or traffic changes.

Q: When should you NOT use caching? When your data changes with every request (real-time stock prices during trading hours). When strong consistency is a regulatory requirement and even milliseconds of staleness is unacceptable. When your working set is larger than available cache memory (you will get constant evictions and low hit rates). When the overhead of cache management exceeds the performance benefit (very small datasets that the database serves fast enough).

Q: Explain the difference between cache eviction and cache invalidation. Cache invalidation is intentionally removing specific entries because the underlying data changed (you update a product price and delete the cached product). Cache eviction is the cache automatically removing entries to free memory when it is full, typically using an LRU (Least Recently Used) or LFU (Least Frequently Used) policy. Invalidation ensures correctness; eviction ensures the cache does not run out of memory. Both can cause cache misses, but for different reasons.

What's Next

This concludes the first five parts of our System Design from Zero to Hero series. We have covered the foundational concepts in Part 1, scaling strategies in Part 2, load balancing in Part 3, database selection in Part 4, and caching in this post. Together, these form the building blocks for designing systems that scale. Future posts in the series will cover message queues, API design, microservice communication patterns, and more.

FAQ

What is the difference between cache-aside and write-through caching?

Cache-aside loads data into cache on demand (lazy loading) — the application checks the cache first, and on a miss, queries the database and populates the cache. Write-through updates the cache synchronously on every write, ensuring the cache always has fresh data. Cache-aside is simpler and avoids caching data that is never read, but risks cache misses on the first access and brief windows of stale data. Write-through guarantees consistency between cache and database but adds write latency and may cache data that is rarely read. Most systems start with cache-aside and move to write-through for specific hot paths where consistency matters.

How do I handle cache invalidation in a distributed system?

Use TTL-based expiration for simplicity — every cache entry gets a time-to-live and automatically expires. For tighter consistency, use event-driven invalidation with a message queue: when data changes, publish an invalidation event that all application instances consume to delete stale cache entries. For immutable data, use versioned keys where the version number is part of the cache key, and old entries simply expire via TTL. Avoid manual, ad-hoc invalidation across services as it is error-prone and does not scale. In practice, combining TTL as a safety net with event-driven invalidation for critical data provides the best balance of simplicity and correctness.

When should I use a CDN instead of application-level caching?

Use CDNs for static assets (images, CSS, JavaScript, fonts), media files (video, audio), and any content that is identical for all users and benefits from geographic distribution. CDN edge servers are physically closer to users worldwide, reducing latency from hundreds of milliseconds to single digits. Use application-level caching with Redis or Memcached for dynamic, user-specific data (session tokens, personalized recommendations, computed results) that varies per request and cannot be served from a shared edge cache. Many systems use both: CDN for static assets and Redis for dynamic data, each handling what it does best.

Article Author

Sadam Hussain

Senior Full Stack Developer

Senior Full Stack Developer with over 7 years of experience building React, Next.js, Node.js, TypeScript, and AI-powered web platforms.
