Caching Strategies in System Design
Explore caching strategies like write-through, write-back, and cache-aside for system design. Learn Redis caching patterns and CDN optimization techniques.
This is Part 5 of the System Design from Zero to Hero series.
TL;DR
Caching stores frequently accessed data closer to where it is needed, reducing latency and database load. The main strategies — cache-aside, write-through, write-behind, and read-through — each offer different trade-offs between consistency, performance, and complexity. The hardest problem in caching is not storing data; it is knowing when to invalidate it. Effective caching happens at multiple layers: browser, CDN, application (Redis), and database query cache.
Why This Matters
In Part 4, we discussed choosing the right database for your data. But even the best database hits performance limits when every request queries it directly. A well-tuned PostgreSQL instance on solid hardware can handle thousands of queries per second. A Redis cache on the same hardware handles hundreds of thousands.
Caching is not optional at scale. It is the difference between sub-50ms response times and multi-second page loads. It is the difference between a database humming along at 30% CPU and one falling over during peak traffic. Every major system — from Google Search to Netflix to Twitter — relies on aggressive, multi-layer caching to deliver the performance users expect.
Core Concepts
The Caching Hierarchy
Caching happens at every layer of the stack, each with different latency characteristics and use cases:
| Layer | Latency | Scope | Example |
|---|---|---|---|
| Browser cache | ~0 ms | Single user | Static assets, API responses |
| CDN cache | ~10-50 ms | Geographic region | Images, CSS, JS, HTML pages |
| Load balancer | ~1 ms | Per-server | SSL sessions, simple responses |
| Application cache | ~1-5 ms | Cluster-wide | Redis/Memcached |
| Database cache | ~5-10 ms | Per-database | Query cache, buffer pool |
| Disk cache | ~0.1 ms | Per-machine | OS page cache |
The goal is to serve as many requests as possible from the fastest layer. A request that hits the browser cache never touches your servers at all. A request that hits the CDN avoids your application entirely. A request that hits Redis avoids an expensive database query. Each cache layer absorbs traffic that would otherwise cascade down to slower, more expensive layers.
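The layered absorption described above can be sketched as a tiered lookup. This is a minimal illustration (not from the article's code): plain dicts stand in for the tiers, where in production the L1 tier would be in-process memory and the L2 tier a shared Redis cluster.

```python
def tiered_get(key, l1: dict, l2: dict, load_from_db):
    """Serve a read from the fastest tier that has the data."""
    if key in l1:
        return l1[key]                 # fastest layer: in-process memory
    if key in l2:
        value = l2[key]                # shared cache absorbs the miss
        l1[key] = value                # promote to the faster tier
        return value
    value = load_from_db(key)          # slowest path: the database
    l2[key] = value                    # populate both tiers on the way back
    l1[key] = value
    return value
```

After the first miss, subsequent reads for the same key never reach the database; the traffic is absorbed by the upper tiers.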
Cache-Aside (Lazy Loading)
Cache-aside is the most common caching pattern. The application code is responsible for reading from and writing to the cache. Data is loaded into the cache on demand — only when a request needs it.
Read path:
1. Application checks cache
2. Cache HIT → return cached data
3. Cache MISS → query database → store in cache → return data
Write path:
1. Application writes to database
2. Application invalidates (deletes) the cache entry
3. Next read will repopulate the cache
import redis
import json

cache = redis.Redis(host='redis.internal', port=6379)
CACHE_TTL = 3600  # 1 hour

async def get_user_profile(user_id: str) -> dict | None:
    cache_key = f"user:profile:{user_id}"

    # Step 1: Check cache
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)  # Cache HIT

    # Step 2: Cache MISS - query database
    profile = await db.fetchrow(
        "SELECT id, name, email, avatar, bio FROM users WHERE id = $1",
        user_id
    )
    if profile is None:
        return None
    result = dict(profile)

    # Step 3: Store in cache with TTL
    cache.setex(cache_key, CACHE_TTL, json.dumps(result))
    return result

async def update_user_profile(user_id: str, updates: dict):
    # Step 1: Update database (source of truth)
    await db.execute(
        "UPDATE users SET name = $1, bio = $2 WHERE id = $3",
        updates['name'], updates['bio'], user_id
    )
    # Step 2: Invalidate cache (delete, not update)
    cache.delete(f"user:profile:{user_id}")
    # Next read will fetch fresh data from DB and repopulate the cache

Advantages: Only caches data that is actually requested. Simple to implement. Cache failures do not block the application — you fall back to the database.
Disadvantages: First request for any piece of data is always slow (cache miss). Potential for stale data between database write and cache invalidation. Cache stampede risk (explained below).
Write-Through
In write-through caching, every write goes to both the cache and the database simultaneously. The write is not considered complete until both have been updated.
Write path:
1. Application writes to cache AND database (synchronously)
2. Both are always in sync
Read path:
1. Application reads from cache (always populated)
2. Cache MISS → query database (rare, only after eviction)
async def update_product_price(product_id: str, new_price: float):
    # Write to database
    await db.execute(
        "UPDATE products SET price = $1 WHERE id = $2",
        new_price, product_id
    )
    # Write to cache (synchronously - both must succeed)
    product = await db.fetchrow("SELECT * FROM products WHERE id = $1", product_id)
    cache.setex(
        f"product:{product_id}",
        CACHE_TTL,
        json.dumps(dict(product))
    )

Advantages: Cache is always consistent with the database. Reads are always fast (no cold cache on reads).
Disadvantages: Higher write latency (two writes per operation). Caches data that might never be read, wasting memory. Not suitable for write-heavy workloads where cache churn is high.
Write-Behind (Write-Back)
Write-behind writes to the cache immediately and asynchronously flushes changes to the database in the background. This optimizes write performance at the cost of potential data loss.
Write path:
1. Application writes to cache (fast, returns immediately)
2. Background process batches and flushes cache writes to database
Read path:
1. Application reads from cache (always up-to-date)
import asyncio

# Write buffer for batching database writes (key -> latest count)
write_buffer: dict[str, int] = {}

async def update_view_count(video_id: str):
    """Increment view count in cache immediately, batch-write to DB"""
    cache_key = f"video:views:{video_id}"
    # Atomic increment in Redis (fast)
    new_count = cache.incr(cache_key)
    # Buffer the latest count for batch processing
    write_buffer[video_id] = new_count

async def flush_view_counts():
    """Periodically flush buffered counts to database"""
    while True:
        await asyncio.sleep(10)  # Flush every 10 seconds
        if not write_buffer:
            continue
        # Batch update database
        batch = dict(write_buffer)
        write_buffer.clear()
        for video_id, count in batch.items():
            await db.execute(
                "UPDATE videos SET view_count = $1 WHERE id = $2",
                count, video_id
            )

Advantages: Extremely fast writes. Batching reduces database load. Ideal for high-frequency counters, metrics, and analytics.
Disadvantages: Risk of data loss if the cache crashes before flushing. Increased complexity. Not suitable for data that requires immediate durability (financial transactions).
Read-Through
Read-through is similar to cache-aside, but the cache itself is responsible for loading data from the database on a miss. The application only talks to the cache, never directly to the database for reads.
Read path:
1. Application requests data from cache
2. Cache HIT → return data
3. Cache MISS → cache queries database, stores result, returns data
The application never queries the database directly for reads.
This pattern is often implemented by the caching framework itself (e.g., NCache, Hazelcast) rather than in application code. The benefit is that cache population logic is centralized in the cache layer rather than scattered across application code.
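A minimal sketch of the idea, independent of any particular framework (the class and method names here are illustrative): the cache object owns the loader function, so application code only ever talks to the cache.

```python
class ReadThroughCache:
    """The cache itself loads data on a miss; callers never query the DB."""

    def __init__(self, loader, store=None):
        self.loader = loader          # e.g. a database query function
        self.store = store if store is not None else {}

    def get(self, key):
        if key not in self.store:     # miss: the CACHE fetches from the source
            self.store[key] = self.loader(key)
        return self.store[key]
```

The application simply calls `users.get("42")`; population logic lives in one place instead of being repeated at every call site.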
Redis Data Structures for Caching
Redis is far more than a simple key-value store. Its data structures enable powerful caching patterns:
import redis
r = redis.Redis(host='redis.internal', port=6379)
# STRING: Simple key-value caching
r.setex('user:42:profile', 3600, json.dumps(user_data))
# HASH: Cache object fields independently
r.hset('user:42', mapping={
    'name': 'Alice',
    'email': 'alice@example.com',
    'role': 'admin'
})
# Update single field without fetching entire object
r.hset('user:42', 'last_login', '2024-01-15T10:30:00Z')
# Get single field
name = r.hget('user:42', 'name')
# SORTED SET: Leaderboards, top-N queries
r.zadd('trending:articles', {'article:101': 4500, 'article:205': 3200})
# Get top 10 articles
top_articles = r.zrevrange('trending:articles', 0, 9, withscores=True)
# LIST: Recent activity feeds, queues
r.lpush('user:42:feed', json.dumps(new_activity))
r.ltrim('user:42:feed', 0, 99) # Keep only last 100 items
recent = r.lrange('user:42:feed', 0, 19) # Get 20 most recent
# SET: Unique values, membership checks
r.sadd('online:users', 'user:42', 'user:88', 'user:101')
is_online = r.sismember('online:users', 'user:42') # O(1) lookup
online_count = r.scard('online:users')
# HyperLogLog: Approximate unique counts (uses ~12 KB regardless of count)
r.pfadd('page:home:visitors', 'user:42', 'user:88')
r.pfadd('page:home:visitors', 'user:42') # duplicate, not counted
approx_unique = r.pfcount('page:home:visitors')

Choosing the right Redis data structure can eliminate the need for complex database queries entirely. A sorted set replaces a SELECT ... ORDER BY score DESC LIMIT 10 query that scans and sorts rows on every request.
CDN Caching
A Content Delivery Network caches content at edge servers geographically close to users. For static assets (images, CSS, JavaScript), CDN caching reduces latency from hundreds of milliseconds to single-digit milliseconds.
Without CDN:
User in Tokyo ──500ms──▶ Origin server in Virginia
With CDN:
User in Tokyo ──10ms──▶ CDN edge in Tokyo
(cached copy of content)
CDN caching is controlled through HTTP headers:
# Cache-Control header examples
# Cache for 1 year (static assets with fingerprinted filenames)
Cache-Control: public, max-age=31536000, immutable
# Cache for 5 minutes, serve stale while revalidating
Cache-Control: public, max-age=300, stale-while-revalidate=60
# Don't cache (user-specific content)
Cache-Control: private, no-store
# Cache but always revalidate with origin
Cache-Control: public, no-cache
The immutable directive is particularly powerful for assets with content hashes in their filenames (e.g., app.a3b2c1.js). It tells the CDN and browser to never revalidate — the content at that URL will never change. If you update the file, the filename changes and it is treated as a new resource.
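Fingerprinting is typically done at build time. A minimal sketch (the function name and 8-character digest length are illustrative choices, not from the article):

```python
import hashlib
import pathlib

def fingerprinted_name(filename: str, content: bytes) -> str:
    """Embed a content hash in the filename, e.g. app.js -> app.a3b2c1d4.js.

    The URL changes whenever the content changes, so the asset can be
    served with Cache-Control: public, max-age=31536000, immutable.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = pathlib.Path(filename)
    return f"{p.stem}.{digest}{p.suffix}"
```

Same content always yields the same name; changed content yields a new name, which browsers and CDNs treat as a brand-new resource.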
Browser Caching
Browser caching is the fastest cache layer — zero network latency. The browser stores responses locally based on Cache-Control headers.
// Service Worker: programmatic browser caching
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      if (cached) {
        // Serve from cache, update in background (stale-while-revalidate)
        event.waitUntil(
          fetch(event.request).then((response) => {
            return caches.open('v1').then((cache) => {
              return cache.put(event.request, response);
            });
          })
        );
        return cached;
      }
      // Not cached: fetch from network and cache the response
      return fetch(event.request).then((response) => {
        const clone = response.clone();
        caches.open('v1').then((cache) => cache.put(event.request, clone));
        return response;
      });
    })
  );
});

Cache Invalidation Strategies
Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. Here are the proven approaches:
TTL-based (Time-To-Live): Set an expiration time on cached entries. Simple and predictable, but users see stale data until the TTL expires.
# TTL-based: simple but potentially stale
cache.setex('product:42', 300, json.dumps(product))  # expires in 5 minutes

Event-based invalidation: When data changes, publish an event that triggers cache invalidation. More complex but ensures freshness.
# Event-based: invalidate on write
async def update_product(product_id, updates):
    await db.execute("UPDATE products SET ... WHERE id = $1", product_id)
    # Publish invalidation event
    await message_bus.publish('cache.invalidate', {
        'keys': [f'product:{product_id}', f'category:{updates["category_id"]}:products']
    })

# Cache invalidation subscriber (runs on all app servers)
async def on_cache_invalidate(event):
    for key in event['keys']:
        cache.delete(key)

Versioned keys: Include a version number in cache keys. When data changes, increment the version. Old cache entries are never explicitly deleted — they simply expire via TTL.
# Versioned keys: avoid explicit invalidation
async def get_product(product_id):
    version = await db.fetchval(
        "SELECT version FROM products WHERE id = $1", product_id
    )
    cache_key = f"product:{product_id}:v{version}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    product = await db.fetchrow("SELECT * FROM products WHERE id = $1", product_id)
    cache.setex(cache_key, 3600, json.dumps(dict(product)))
    return dict(product)

# On update: just increment version. Old cache entry expires naturally.
async def update_product(product_id, updates):
    await db.execute(
        "UPDATE products SET name=$1, version=version+1 WHERE id=$2",
        updates['name'], product_id
    )

Cache Stampede Prevention
A cache stampede (also called the thundering herd problem) occurs when a popular cache entry expires and hundreds of concurrent requests simultaneously query the database to repopulate it. This can overwhelm the database.
Normal operation:
1000 requests/sec ──▶ Cache (HIT) ──▶ response
Only 1 request/sec leaks to DB
Cache entry expires:
1000 requests/sec ──▶ Cache (ALL MISS) ──▶ DB receives 1000 queries simultaneously
DB overwhelmed
Solution 1: Locking — Only one request repopulates the cache; others wait.
import asyncio

async def get_with_lock(key: str, fetch_fn, ttl: int = 3600):
    # Try cache first
    cached = cache.get(key)
    if cached:
        return json.loads(cached)

    # Acquire lock (only one process populates the cache)
    lock_key = f"lock:{key}"
    acquired = cache.set(lock_key, "1", nx=True, ex=10)  # 10s lock timeout
    if acquired:
        try:
            # This process populates the cache
            data = await fetch_fn()
            cache.setex(key, ttl, json.dumps(data))
            return data
        finally:
            cache.delete(lock_key)
    else:
        # Another process is populating — wait and retry
        for _ in range(50):  # wait up to 5 seconds
            await asyncio.sleep(0.1)  # non-blocking wait
            cached = cache.get(key)
            if cached:
                return json.loads(cached)
        # Fallback: query database directly
        return await fetch_fn()

Solution 2: Early expiration (probabilistic) — Refresh the cache before it actually expires. Each request that reads a near-expiry entry has a small probability of triggering a background refresh.
import asyncio
import random

async def refresh_cache(key: str, fetch_fn, ttl: int):
    """Fetch fresh data and rewrite the cache entry in the background."""
    data = await fetch_fn()
    cache.setex(key, ttl, json.dumps(data))

async def get_with_early_refresh(key: str, fetch_fn, ttl: int = 3600):
    cached = cache.get(key)
    if cached:
        data = json.loads(cached)
        remaining_ttl = cache.ttl(key)
        # If less than 10% of TTL remaining, probabilistically refresh
        if remaining_ttl < ttl * 0.1:
            refresh_probability = 1 - (remaining_ttl / (ttl * 0.1))
            if random.random() < refresh_probability:
                # Refresh in background
                asyncio.create_task(refresh_cache(key, fetch_fn, ttl))
        return data

    # Cache miss: fetch and populate
    data = await fetch_fn()
    cache.setex(key, ttl, json.dumps(data))
    return data

Solution 3: Never expire hot keys — For extremely popular cache entries, refresh them proactively on a schedule rather than relying on TTL expiration.
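The scheduled-refresh idea can be sketched as a periodic pass over a known set of hot keys (all names here are hypothetical, and a plain dict stands in for the Redis client):

```python
import asyncio
import json

async def refresh_hot_keys(cache: dict, fetchers: dict) -> None:
    """One refresh pass; fetchers maps cache key -> async fetch function.

    Entries are written without a TTL: freshness comes from the schedule,
    so a hot key is never missing when a burst of readers arrives.
    """
    for key, fetch in fetchers.items():
        data = await fetch()
        cache[key] = json.dumps(data)

async def hot_key_refresh_loop(cache, fetchers, interval: float = 30.0):
    """Run refresh passes forever; start this as a background task."""
    while True:
        await refresh_hot_keys(cache, fetchers)
        await asyncio.sleep(interval)
```

Because readers never observe an expired hot key, this sidesteps the stampede entirely for the small set of keys that would cause the worst one.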
Practical Implementation
Here is a complete caching layer for a web application that combines multiple strategies:
class CacheLayer:
    """Multi-strategy caching with stampede prevention"""

    def __init__(self, redis_client, default_ttl=3600):
        self.cache = redis_client
        self.default_ttl = default_ttl

    async def get_or_fetch(self, key, fetch_fn, ttl=None, strategy='cache_aside'):
        ttl = ttl or self.default_ttl
        if strategy == 'cache_aside':
            return await self._cache_aside(key, fetch_fn, ttl)
        elif strategy == 'read_through_locked':
            return await self._read_through_locked(key, fetch_fn, ttl)
        raise ValueError(f"Unknown caching strategy: {strategy}")

    async def _cache_aside(self, key, fetch_fn, ttl):
        cached = self.cache.get(key)
        if cached:
            return json.loads(cached)
        data = await fetch_fn()
        if data is not None:
            self.cache.setex(key, ttl, json.dumps(data))
        return data

    async def _read_through_locked(self, key, fetch_fn, ttl):
        """Cache-aside with stampede prevention via locking"""
        cached = self.cache.get(key)
        if cached:
            return json.loads(cached)
        lock_key = f"lock:{key}"
        if self.cache.set(lock_key, "1", nx=True, ex=10):
            try:
                data = await fetch_fn()
                if data is not None:
                    self.cache.setex(key, ttl, json.dumps(data))
                return data
            finally:
                self.cache.delete(lock_key)
        else:
            # Wait for the lock holder to populate cache
            for _ in range(50):
                await asyncio.sleep(0.1)
                cached = self.cache.get(key)
                if cached:
                    return json.loads(cached)
            return await fetch_fn()

    def invalidate(self, *keys):
        """Delete one or more cache entries"""
        if keys:
            self.cache.delete(*keys)

    def invalidate_pattern(self, pattern):
        """Delete all keys matching a pattern (use sparingly)"""
        cursor = 0
        while True:
            cursor, keys = self.cache.scan(cursor, match=pattern, count=100)
            if keys:
                self.cache.delete(*keys)
            if cursor == 0:
                break

# Usage
cache_layer = CacheLayer(redis.Redis(host='redis.internal'))

# Standard cache-aside for user profiles
profile = await cache_layer.get_or_fetch(
    f"user:{user_id}:profile",
    lambda: db.get_user_profile(user_id),
    ttl=1800
)

# Locked read-through for popular product pages
product = await cache_layer.get_or_fetch(
    f"product:{product_id}",
    lambda: db.get_product(product_id),
    ttl=300,
    strategy='read_through_locked'
)

Trade-offs and Decision Framework
| Strategy | Consistency | Read Perf | Write Perf | Complexity | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Eventual | Good (after warm-up) | Good | Low | General purpose |
| Write-Through | Strong | Excellent | Slower | Medium | Read-heavy, consistency matters |
| Write-Behind | Eventual | Excellent | Excellent | High | Counters, analytics, metrics |
| Read-Through | Eventual | Good (after warm-up) | Good | Medium | Framework-managed caching |
Decision guidelines:
- Start with cache-aside. It is the simplest, most flexible, and handles most use cases.
- Use write-through when read consistency is important and write volume is moderate.
- Use write-behind only for data where temporary loss is acceptable (view counts, analytics).
- Set TTLs on everything. A cache entry without a TTL is a memory leak waiting to happen.
- Monitor cache hit rates. Below 80% suggests your TTLs are too short or you are caching the wrong data.
- Prefer short TTLs (minutes) over long ones (hours) when starting out. You can always increase TTLs after validating that stale data is acceptable for your use case.
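Hit-rate monitoring can be computed directly from Redis's INFO stats output, which includes keyspace_hits and keyspace_misses counters (redis-py exposes them in the dict returned by r.info('stats')). A small sketch, with an illustrative alerting threshold:

```python
def cache_hit_rate(info: dict) -> float:
    """Compute hit rate from Redis INFO stats counters."""
    hits = info.get('keyspace_hits', 0)
    misses = info.get('keyspace_misses', 0)
    total = hits + misses
    return hits / total if total else 0.0

def should_alert(info: dict, threshold: float = 0.8) -> bool:
    """Flag the cache for investigation when the hit rate drops too low."""
    return cache_hit_rate(info) < threshold
```

In production you would feed this from `r.info('stats')` on a schedule and track the trend, since a sudden drop usually points to a deployment or traffic change.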
Common Interview Questions
Q: How would you implement caching for a social media news feed?
A: Use a fan-out-on-write approach: when a user posts, push the post ID to each follower's cached feed (a Redis list). For celebrities with millions of followers, use fan-out-on-read instead — merge the celebrity's posts with the cached feed at read time. Use cache-aside with a TTL for the feed, and sorted sets to maintain ordering by timestamp. Invalidate or update the feed cache when a user creates, deletes, or hides a post.
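The fan-out-on-write mechanics can be sketched in memory (all names here are hypothetical). In production each follower's feed would be a Redis sorted set scored by timestamp; a dict of {post_id: timestamp} per follower stands in for it:

```python
from collections import defaultdict

# follower_id -> {post_id: timestamp}; stands in for per-user sorted sets
feeds = defaultdict(dict)

def fan_out_on_write(follower_ids, post_id, timestamp):
    """On post creation, push the post into every follower's cached feed."""
    for follower_id in follower_ids:
        feeds[follower_id][post_id] = timestamp   # like ZADD feed:<id> ts post

def read_feed(user_id, limit=20):
    """Newest first, like ZREVRANGE 0 limit-1 on a sorted set."""
    ranked = sorted(feeds[user_id].items(), key=lambda kv: kv[1], reverse=True)
    return [post_id for post_id, _ in ranked[:limit]]
```

The write cost scales with follower count, which is exactly why the celebrity case switches to fan-out-on-read.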
Q: Your cache hit rate dropped from 95% to 60% overnight. What happened?
A: Common causes: (1) A deployment changed cache keys (new naming convention invalidated all existing entries). (2) The cache was restarted or scaled, causing a cold cache. (3) A new feature introduced uncacheable queries. (4) Traffic patterns shifted (new users accessing uncached content). (5) TTLs were inadvertently shortened. Diagnose by checking cache memory usage, eviction rates, and correlating with recent deployments or traffic changes.
Q: When should you NOT use caching?
A: When your data changes with every request (real-time stock prices during trading hours). When strong consistency is a regulatory requirement and even milliseconds of staleness is unacceptable. When your working set is larger than available cache memory (you will get constant evictions and low hit rates). When the overhead of cache management exceeds the performance benefit (very small datasets that the database serves fast enough).
Q: Explain the difference between cache eviction and cache invalidation.
A: Cache invalidation is intentionally removing specific entries because the underlying data changed (you update a product price and delete the cached product). Cache eviction is the cache automatically removing entries to free memory when it is full, typically using an LRU (Least Recently Used) or LFU (Least Frequently Used) policy. Invalidation ensures correctness; eviction ensures the cache does not run out of memory. Both can cause cache misses, but for different reasons.
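Eviction behavior is configured on the Redis server. maxmemory and maxmemory-policy are real redis.conf directives and allkeys-lru is a real policy name; the memory limit below is illustrative:

```
# redis.conf — cap cache memory and choose an eviction policy
maxmemory 2gb
maxmemory-policy allkeys-lru    # evict least-recently-used keys when full
```

allkeys-lru may evict any key; volatile-lru restricts eviction to keys that have a TTL set, which protects entries you intend to keep permanently.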
What's Next
This concludes the first five parts of our System Design from Zero to Hero series. We have covered the foundational concepts in Part 1, scaling strategies in Part 2, load balancing in Part 3, database selection in Part 4, and caching in this post. Together, these form the building blocks for designing systems that scale. Future posts in the series will cover message queues, API design, microservice communication patterns, and more.
FAQ
What is the difference between cache-aside and write-through caching?
Cache-aside loads data into cache on demand (lazy loading) — the application checks the cache first, and on a miss, queries the database and populates the cache. Write-through updates the cache synchronously on every write, ensuring the cache always has fresh data. Cache-aside is simpler and avoids caching data that is never read, but risks cache misses on the first access and brief windows of stale data. Write-through guarantees consistency between cache and database but adds write latency and may cache data that is rarely read. Most systems start with cache-aside and move to write-through for specific hot paths where consistency matters.
How do I handle cache invalidation in a distributed system?
Use TTL-based expiration for simplicity — every cache entry gets a time-to-live and automatically expires. For tighter consistency, use event-driven invalidation with a message queue: when data changes, publish an invalidation event that all application instances consume to delete stale cache entries. For immutable data, use versioned keys where the version number is part of the cache key, and old entries simply expire via TTL. Avoid manual, ad-hoc invalidation across services as it is error-prone and does not scale. In practice, combining TTL as a safety net with event-driven invalidation for critical data provides the best balance of simplicity and correctness.
When should I use a CDN instead of application-level caching?
Use CDNs for static assets (images, CSS, JavaScript, fonts), media files (video, audio), and any content that is identical for all users and benefits from geographic distribution. CDN edge servers are physically closer to users worldwide, reducing latency from hundreds of milliseconds to single digits. Use application-level caching with Redis or Memcached for dynamic, user-specific data (session tokens, personalized recommendations, computed results) that varies per request and cannot be served from a shared edge cache. Many systems use both: CDN for static assets and Redis for dynamic data, each handling what it does best.