Load Balancing and Reverse Proxies: Distributing Traffic
Master load balancing algorithms and reverse proxy patterns. Learn how Nginx, HAProxy, and cloud load balancers distribute traffic across servers effectively.
This is Part 3 of the System Design from Zero to Hero series.
TL;DR
Load balancers distribute incoming traffic across multiple backend servers to maximize throughput, minimize response times, and prevent any single server from becoming overwhelmed. They operate at Layer 4 (transport) or Layer 7 (application) of the network stack, using algorithms like round-robin, least connections, or consistent hashing. A reverse proxy is the broader concept — a server that sits between clients and backends — and a load balancer is a reverse proxy with traffic distribution as its primary job.
Why This Matters
In Part 2, we discussed horizontal scaling — adding more servers to handle increased load. But once you have multiple servers, you need a way to distribute traffic across them. Without a load balancer, you would need to give clients the addresses of all your servers and hope they spread their requests evenly (they will not).
Load balancers are so fundamental that they appear in virtually every production architecture. They are the front door to your application. Understanding how they work, what algorithms they use, and how they handle failures is essential for designing systems that stay fast and available under real-world conditions.
Core Concepts
What Is a Reverse Proxy?
A forward proxy acts on behalf of clients — it sits between your browser and the internet, forwarding your requests outward. A VPN is a type of forward proxy.
A reverse proxy acts on behalf of servers — it sits between the internet and your backend servers, forwarding incoming requests inward. Clients never communicate directly with your application servers; they only see the reverse proxy.
Forward Proxy:
Client ──▶ Proxy ──▶ Internet ──▶ Server
(client hides behind proxy)
Reverse Proxy:
Client ──▶ Internet ──▶ Reverse Proxy ──▶ Server(s)
(servers hide behind proxy)
Reverse proxies provide several benefits beyond load balancing:
- SSL termination: Handle TLS encryption/decryption so backend servers do not have to.
- Compression: Compress responses before sending them to clients.
- Caching: Cache static content and reduce load on backends.
- Security: Hide the identity and topology of backend servers. Block malicious requests.
- Rate limiting: Throttle excessive requests before they reach your application.
Layer 4 vs Layer 7 Load Balancing
Load balancers operate at two network layers, each with distinct capabilities:
Layer 4 (Transport Layer) load balancers route traffic based on IP addresses and TCP/UDP ports. They do not inspect the content of the request — they see only network packets. This makes them extremely fast but limited in routing flexibility.
Layer 7 (Application Layer) load balancers inspect the full HTTP request, including headers, URL path, cookies, and body. They can make intelligent routing decisions based on content.
Layer 4 Load Balancer:
┌─────────────┐
│ Sees: │
│ - Source IP │ ┌──────────┐
│ - Dest IP │──────▶ │ Server 1 │
│ - Src Port │──────▶ │ Server 2 │
│ - Dst Port │──────▶ │ Server 3 │
│ │ └──────────┘
│ Cannot see: │
│ - URL path │
│ - Headers │
│ - Cookies │
└─────────────┘
Layer 7 Load Balancer:
┌──────────────────┐
│ Sees everything: │ /api/* ──▶ API Servers
│ - URL path │────────────────────────────
│ - HTTP method │ /static/* ──▶ CDN/Static
│ - Headers │────────────────────────────
│ - Cookies │ /ws/* ──▶ WebSocket Servers
│ - Request body │
└──────────────────┘
When to use each:
- L4 when you need raw performance with minimal overhead, when you are load balancing non-HTTP protocols (database connections, gRPC), or when routing decisions do not depend on request content.
- L7 when you need content-based routing (different paths to different backends), SSL termination, header manipulation, or WebSocket support.
Most web applications use L7 load balancers because the routing flexibility is worth the small performance overhead.
Load Balancing Algorithms
The algorithm determines which backend server receives each incoming request. The right choice depends on your workload characteristics.
Round-Robin: Requests are distributed sequentially across servers. Server 1, Server 2, Server 3, Server 1, Server 2, Server 3, and so on. Simple and effective when all servers have identical capacity and all requests have similar processing costs.
Request 1 ──▶ Server A
Request 2 ──▶ Server B
Request 3 ──▶ Server C
Request 4 ──▶ Server A (cycles back)
Request 5 ──▶ Server B
Weighted Round-Robin: Like round-robin, but servers with higher weights receive proportionally more requests. Useful when servers have different hardware specs.
Weights: Server A=3, Server B=1, Server C=1
Request 1 ──▶ Server A
Request 2 ──▶ Server A
Request 3 ──▶ Server A
Request 4 ──▶ Server B
Request 5 ──▶ Server C
(cycle repeats)
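The rotation above can be sketched in a few lines of Python. This is a deliberately naive illustration — production balancers use smooth weighted round-robin so a high-weight server does not receive its share as one burst:

```python
import itertools

class WeightedRoundRobin:
    """Naive weighted round-robin: repeat each server by its weight,
    then cycle through the expanded list."""

    def __init__(self, weights):
        # weights: dict mapping server name -> integer weight
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def next_server(self):
        return next(self._cycle)

lb = WeightedRoundRobin({"A": 3, "B": 1, "C": 1})
# One full cycle mirrors the diagram above
print([lb.next_server() for _ in range(5)])  # ['A', 'A', 'A', 'B', 'C']
```

With plain round-robin, every weight is 1 and the expanded list is just the server list itself.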
Least Connections: Routes each request to the server with the fewest active connections. This naturally handles variable request durations — a server processing a slow database query accumulates connections and temporarily receives fewer new requests.
IP Hash: The client's IP address is hashed to determine which server receives the request. The same client always goes to the same server. This provides session affinity without cookies, but causes uneven distribution when clients are behind NAT (thousands of users sharing one IP address).
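IP hash reduces to a modulo over the server list, which is also why changing the list reshuffles almost every client — a weakness consistent hashing (below) fixes:

```python
import hashlib

def ip_hash(client_ip, servers):
    # Hash the client IP, then index into the server list
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["A", "B", "C"]
pinned = ip_hash("203.0.113.7", servers)  # same client -> same server, every time
```

Because the mapping depends on `len(servers)`, adding or removing one server changes `h % len(servers)` for most clients at once.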
Consistent Hashing: A more sophisticated approach where both servers and requests are mapped onto a hash ring. When a server is added or removed, only a fraction of requests are remapped — not all of them. This is critical for caching layers where remapping requests means cache misses.
Hash Ring (simplified — each server sits where its hash lands, in degrees around the ring):

     0° ──┬── (ring wraps from 360° back to 0°)
          │
  S2 ─────┤ 180°
          │
  S3 ─────┤ 270°
          │          Requests hashing between
  S1 ─────┤ 330°     S3 (270°) and S1 (330°) go to S1
          │

If S2 is removed, only requests that were
going to S2 get remapped (to S3).
S1's traffic is unchanged.
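A minimal hash ring demonstrating the remapping property. This sketch omits virtual nodes, which real implementations add (many ring positions per server) to even out the distribution:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers):
        # Sorted list of (position, server) pairs around the ring
        self.ring = sorted((self._hash(s), s) for s in servers)
        self._positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise to the first server at or past the key's position,
        # wrapping around to the start of the ring if necessary
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = ["user:%d" % i for i in range(100)]
ring = HashRing(["S1", "S2", "S3"])
before = {k: ring.server_for(k) for k in keys}

ring = HashRing(["S1", "S3"])  # remove S2
after = {k: ring.server_for(k) for k in keys}

# Only the keys that belonged to S2 change owners
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` was previously mapped to S2; keys owned by S1 or S3 keep their owner, so a cache behind this ring loses only S2's share of entries.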
Health Checks
A load balancer is only useful if it knows which backends are healthy. Health checks continuously verify that servers can handle traffic.
Passive health checks monitor real traffic. If a server returns five consecutive 500 errors, mark it as unhealthy. This is reactive — users experience the errors before the server is removed.
Active health checks send periodic probe requests to a dedicated health endpoint. If a server fails to respond to three consecutive probes, remove it from the pool. This is proactive — unhealthy servers are removed before users are affected.
Load Balancer
│
├── GET /health ──▶ Server 1: 200 OK ✓ healthy
├── GET /health ──▶ Server 2: 200 OK ✓ healthy
├── GET /health ──▶ Server 3: timeout ✗ unhealthy (removed)
└── GET /health ──▶ Server 4: 503 ✗ unhealthy (removed)
Traffic now routes only to Server 1 and Server 2.
A good health check endpoint verifies more than "the process is running." It should check database connectivity, cache connectivity, and available disk space — anything whose failure means the server cannot meaningfully serve requests.
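The active-check rules above ("three consecutive failed probes removes a server") can be sketched as a small state machine. The `probe` callable is a stand-in for an HTTP GET to the health endpoint:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    failures: int = 0
    healthy: bool = True

class HealthChecker:
    """Mark a backend unhealthy after 3 consecutive failed probes;
    a single successful probe restores it to the pool."""
    MAX_FAILURES = 3

    def __init__(self, backends):
        self.backends = backends

    def run_probe_round(self, probe):
        # probe(backend) returns True if the backend answered 200 OK
        for b in self.backends:
            if probe(b):
                b.failures, b.healthy = 0, True
            else:
                b.failures += 1
                if b.failures >= self.MAX_FAILURES:
                    b.healthy = False

    def pool(self):
        # Traffic is only routed to healthy backends
        return [b for b in self.backends if b.healthy]

backends = [Backend("s1"), Backend("s2")]
checker = HealthChecker(backends)
for _ in range(3):
    checker.run_probe_round(lambda b: b.name != "s2")  # s2 times out every round
# After three failed probes, s2 has been removed from the pool
```

Real balancers run these rounds on a timer (e.g. every 10 seconds) and usually require a few consecutive successes before returning a recovered server to the pool, to avoid flapping.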
SSL/TLS Termination
SSL termination means the load balancer handles the TLS handshake and decryption, then forwards unencrypted HTTP to the backend servers. This offloads CPU-intensive cryptographic operations from your application servers and simplifies certificate management (you only manage certificates in one place).
Client ──HTTPS──▶ Load Balancer ──HTTP──▶ Backend Servers
                  (decrypts here)         (no TLS overhead)
For environments requiring end-to-end encryption (PCI compliance, healthcare), use SSL passthrough (L4) or SSL re-encryption (L7 — decrypt at the LB, then re-encrypt to the backend using an internal certificate).
Sticky Sessions
Sticky sessions (session affinity) ensure that a client's requests consistently route to the same backend server. The load balancer typically sets a cookie containing the target server's identifier.
First request:
Client ──▶ LB (no cookie) ──▶ assigns Server 2
sets cookie: SERVERID=srv2
Subsequent requests:
Client ──▶ LB (cookie: SERVERID=srv2) ──▶ Server 2 (always)
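The cookie flow above amounts to: if the request carries a valid `SERVERID` cookie, honor it; otherwise assign a server and tell the client to remember it. A minimal sketch (the cookie name and round-robin fallback are illustrative choices):

```python
import itertools

class StickyLB:
    """Round-robin for first-time clients; the cookie pins later requests."""
    COOKIE = "SERVERID"

    def __init__(self, servers):
        self.servers = set(servers)
        self._rr = itertools.cycle(servers)

    def route(self, cookies):
        pinned = cookies.get(self.COOKIE)
        if pinned in self.servers:
            return pinned, {}                     # already pinned, no new cookie
        server = next(self._rr)                   # first visit: assign a server
        return server, {self.COOKIE: server}      # cookie to set on the response

lb = StickyLB(["srv1", "srv2"])
server, set_cookie = lb.route({})                 # no cookie -> assigns srv1
repeat, _ = lb.route({"SERVERID": "srv1"})        # cookie present -> srv1 again
```

Note the failure mode: if `srv1` dies, its pinned clients must be reassigned and lose any in-memory session state — which is why external session storage is preferred.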
Use sticky sessions only when you cannot externalize state to Redis or a database. They create uneven load distribution and complicate failover, as we discussed in Part 2.
Practical Implementation
Nginx as a Load Balancer
Nginx is the most widely deployed reverse proxy and load balancer. Here is a production-ready configuration:
# /etc/nginx/nginx.conf
upstream api_backend {
    # Least connections algorithm
    least_conn;

    # Backend servers with passive health check parameters
    server 10.0.1.10:3000 weight=3 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 weight=3 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 weight=2 max_fails=3 fail_timeout=30s;

    # Backup server - only receives traffic if all others are down
    server 10.0.1.20:3000 backup;

    # Keep connections alive to backends (connection pooling)
    keepalive 64;
}

# Rate limiting zone - must be defined at the http level, outside any server block
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=30r/s;

server {
    listen 443 ssl http2;
    server_name api.example.com;

    # SSL termination
    ssl_certificate     /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # Gzip compression
    gzip on;
    gzip_types application/json text/plain application/javascript;
    gzip_min_length 1000;

    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;

        proxy_pass http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 30s;

        # Retry on failure (only for idempotent methods)
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }

    # Static files served directly by Nginx (no backend needed)
    location /static/ {
        root /var/www;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}

Key configuration decisions in this setup:
- `least_conn` routes each request to the server handling the fewest active requests, adapting to variable request durations.
- `max_fails=3 fail_timeout=30s` means that if a server fails 3 times within 30 seconds, it is removed from the pool for 30 seconds (a passive health check).
- `keepalive 64` maintains a pool of persistent connections to the backends, avoiding the overhead of establishing a new TCP connection per request.
- `proxy_next_upstream` retries failed requests on another server, but only for safe-to-retry errors and methods.
AWS ALB vs NLB
Amazon Web Services offers two primary load balancers:
| Feature | ALB (Application Load Balancer) | NLB (Network Load Balancer) |
|---|---|---|
| Layer | Layer 7 | Layer 4 |
| Protocols | HTTP, HTTPS, WebSocket | TCP, UDP, TLS |
| Routing | Path-based, host-based, header-based | Port-based only |
| SSL termination | Yes | Yes (TLS listener) |
| Performance | Good (adds a few milliseconds of latency) | Extreme (millions of requests/sec, ultra-low latency) |
| Static IP | No (DNS-based) | Yes (Elastic IP per AZ) |
| Cost | Per hour + per LCU | Per hour + per NLCU |
| Best for | Web applications, APIs, microservices | Gaming, IoT, extreme throughput, gRPC |
Use ALB for most web applications. Path-based routing (/api/* to one target group, /admin/* to another) is essential for microservice architectures.
Use NLB when you need static IP addresses (for allowlisting), extreme low-latency, or when load balancing non-HTTP protocols.
Trade-offs and Decision Framework
| Decision | Choose When | Avoid When |
|---|---|---|
| L7 Load Balancing | Web apps, content-based routing, SSL termination | Non-HTTP protocols, ultra-low-latency needs |
| L4 Load Balancing | Database connections, gaming, extreme throughput | Need to inspect or route by HTTP content |
| Round-Robin | Uniform servers, uniform request cost | Variable server capacity or request duration |
| Least Connections | Variable request durations (some fast, some slow) | All requests take the same time |
| Consistent Hashing | Caching layers, need minimal remapping on changes | Simple stateless services |
| Sticky Sessions | Cannot externalize session state (short-term) | Stateless services (prefer external state) |
| Single LB | Development, small-scale production | Production systems requiring high availability |
| Active-Passive LB pair | Production systems needing LB redundancy | Cost-constrained environments |
An important pattern is multi-tier load balancing: a global DNS-based load balancer (like AWS Route 53 or Cloudflare) distributes traffic across regions, each region has an L7 load balancer (ALB) distributing across services, and individual services may have internal L4 load balancers for database connection distribution.
Common Interview Questions
Q: Your load balancer is a single point of failure. How do you address this? Deploy load balancers in an active-passive or active-active pair. In active-passive, the passive LB monitors the active one and takes over using a floating IP (via VRRP/keepalived) if the active LB fails. Cloud providers handle this automatically — AWS ALB runs across multiple Availability Zones with built-in redundancy. For global redundancy, use DNS-based load balancing across multiple regions.
Q: How does consistent hashing differ from IP hash, and when would you use each? Both use hashing to route clients to servers, but they differ in how they handle server changes. With IP hash, adding or removing a server changes the hash-to-server mapping for most keys — causing widespread cache misses. With consistent hashing, only the keys that were mapped to the changed server are remapped. Use consistent hashing for cache layers (Memcached, Redis) where cache misses are expensive. IP hash is fine when cache locality does not matter and you just need simple session affinity.
Q: A backend server is returning 200 OK on the health check but users are getting errors. What is happening? The health check endpoint is too shallow — it confirms the process is running but does not verify the server can actually serve requests. The server might have a broken database connection, a full disk, or a deadlocked thread pool. Fix this by making the health endpoint check all critical dependencies (database, cache, external services) and return unhealthy if any are degraded. Also implement passive health checks that monitor real traffic error rates.
Q: How would you handle WebSocket connections with a load balancer? WebSockets require a persistent connection, so you need an L7 load balancer that supports connection upgrades (HTTP Upgrade header). Configure the load balancer to route the initial HTTP handshake and then maintain the persistent connection to the same backend. Both Nginx and AWS ALB support this natively. For scaling, use Redis Pub/Sub or a message broker so that messages can reach clients connected to any server.
What's Next
Load balancers distribute traffic across servers, but those servers need to store and retrieve data efficiently. Continue to Part 4: Databases — SQL vs NoSQL where we explore database selection, indexing strategies, and the trade-offs between relational and non-relational data stores.
FAQ
What is the difference between a load balancer and a reverse proxy?
A reverse proxy sits in front of servers and forwards client requests, while a load balancer is a specific type of reverse proxy that distributes traffic across multiple backend servers based on an algorithm. Every load balancer is a reverse proxy, but not every reverse proxy is a load balancer. A reverse proxy with a single backend still provides benefits: SSL termination, caching, compression, and security (hiding backend topology). When you add multiple backends and a distribution algorithm, that reverse proxy becomes a load balancer.
Which load balancing algorithm should I use?
Round-robin works for stateless apps with uniform servers — it is the simplest and most predictable. Use least-connections for workloads with variable request durations, such as APIs where some endpoints query the database and others serve cached data. Use consistent hashing when you need session affinity or cache locality, particularly for caching layers where routing the same key to the same server maximizes cache hit rates. When in doubt, start with round-robin and switch to least-connections if you observe uneven server utilization.
How do load balancers handle server failures?
Load balancers perform health checks on backend servers and automatically remove unhealthy nodes from the pool, redistributing traffic to healthy servers until the failed ones recover. Active health checks send periodic probes (e.g., HTTP GET to /health every 10 seconds) and mark a server unhealthy after a configured number of consecutive failures. Passive health checks monitor real request outcomes and remove servers that produce too many errors. Production systems typically use both: active checks for fast detection and passive checks as a safety net for issues the health endpoint does not cover.