Load Balancing and Reverse Proxies: Distributing Traffic
Master load balancing algorithms and reverse proxy patterns. Learn how Nginx, HAProxy, and cloud load balancers distribute traffic across servers effectively.
This is Part 3 of the System Design from Zero to Hero series.
TL;DR
Load balancers distribute incoming traffic across multiple backend servers to maximize throughput, minimize response times, and prevent any single server from becoming overwhelmed. They operate at Layer 4 (transport) or Layer 7 (application) of the network stack, using algorithms like round-robin, least connections, or consistent hashing. A reverse proxy is the broader concept — a server that sits between clients and backends — and a load balancer is a reverse proxy with traffic distribution as its primary job.
Why This Matters
In Part 2, we discussed horizontal scaling — adding more servers to handle increased load. But once you have multiple servers, you need a way to distribute traffic across them. Without a load balancer, you would need to give clients the addresses of all your servers and hope they spread their requests evenly (they will not).
Load balancers are so fundamental that they appear in virtually every production architecture. They are the front door to your application. Understanding how they work, what algorithms they use, and how they handle failures is essential for designing systems that stay fast and available under real-world conditions.
Core Concepts
What Is a Reverse Proxy?
A forward proxy acts on behalf of clients — it sits between your browser and the internet, forwarding your requests outward. A VPN is a type of forward proxy.
A reverse proxy acts on behalf of servers — it sits between the internet and your backend servers, forwarding incoming requests inward. Clients never communicate directly with your application servers; they only see the reverse proxy.
Forward Proxy:
Client ──▶ Proxy ──▶ Internet ──▶ Server
(client hides behind proxy)
Reverse Proxy:
Client ──▶ Internet ──▶ Reverse Proxy ──▶ Server(s)
(servers hide behind proxy)
Reverse proxies provide several benefits beyond load balancing:
- SSL termination: Handle TLS encryption/decryption so backend servers do not have to.
- Compression: Compress responses before sending them to clients.
- Caching: Cache static content and reduce load on backends.
- Security: Hide the identity and topology of backend servers. Block malicious requests.
- Rate limiting: Throttle excessive requests before they reach your application.
Layer 4 vs Layer 7 Load Balancing
Load balancers operate at two network layers, each with distinct capabilities:
Layer 4 (Transport Layer) load balancers route traffic based on IP addresses and TCP/UDP ports. They do not inspect the content of the request — they see only network packets. This makes them extremely fast but limited in routing flexibility.
Layer 7 (Application Layer) load balancers inspect the full HTTP request, including headers, URL path, cookies, and body. They can make intelligent routing decisions based on content.
Layer 4 Load Balancer:
┌─────────────┐
│ Sees: │
│ - Source IP │ ┌──────────┐
│ - Dest IP │──────▶ │ Server 1 │
│ - Src Port │──────▶ │ Server 2 │
│ - Dst Port │──────▶ │ Server 3 │
│ │ └──────────┘
│ Cannot see: │
│ - URL path │
│ - Headers │
│ - Cookies │
└─────────────┘
Layer 7 Load Balancer:
┌──────────────────┐
│ Sees everything: │ /api/* ──▶ API Servers
│ - URL path │────────────────────────────
│ - HTTP method │ /static/* ──▶ CDN/Static
│ - Headers │────────────────────────────
│ - Cookies │ /ws/* ──▶ WebSocket Servers
│ - Request body │
└──────────────────┘
When to use each:
- L4 when you need raw performance with minimal overhead, when you are load balancing non-HTTP protocols (database connections, gRPC), or when routing decisions do not depend on request content.
- L7 when you need content-based routing (different paths to different backends), SSL termination, header manipulation, or WebSocket support.
Most web applications use L7 load balancers because the routing flexibility is worth the small performance overhead.
Load Balancing Algorithms
The algorithm determines which backend server receives each incoming request. The right choice depends on your workload characteristics.
Round-Robin: Requests are distributed sequentially across servers. Server 1, Server 2, Server 3, Server 1, Server 2, Server 3, and so on. Simple and effective when all servers have identical capacity and all requests have similar processing costs.
Request 1 ──▶ Server A
Request 2 ──▶ Server B
Request 3 ──▶ Server C
Request 4 ──▶ Server A (cycles back)
Request 5 ──▶ Server B
Weighted Round-Robin: Like round-robin, but servers with higher weights receive proportionally more requests. Useful when servers have different hardware specs.
Weights: Server A=3, Server B=1, Server C=1
Request 1 ──▶ Server A
Request 2 ──▶ Server A
Request 3 ──▶ Server A
Request 4 ──▶ Server B
Request 5 ──▶ Server C
(cycle repeats)
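The rotation above can be sketched in a few lines of Python. This is a deliberately naive illustration — production balancers use smooth weighted round-robin so a high-weight server does not receive its share as one burst:

```python
import itertools

class WeightedRoundRobin:
    """Naive weighted round-robin: repeat each server by its weight,
    then cycle through the expanded list."""

    def __init__(self, weights):
        # weights: dict mapping server name -> integer weight
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def next_server(self):
        return next(self._cycle)

lb = WeightedRoundRobin({"A": 3, "B": 1, "C": 1})
# One full cycle mirrors the diagram above
print([lb.next_server() for _ in range(5)])  # ['A', 'A', 'A', 'B', 'C']
```

With plain round-robin, every weight is 1 and the expanded list is just the server list itself.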
Least Connections: Routes each request to the server with the fewest active connections. This naturally handles variable request durations — a server processing a slow database query accumulates connections and temporarily receives fewer new requests.
IP Hash: The client's IP address is hashed to determine which server receives the request. The same client always goes to the same server. This provides session affinity without cookies, but causes uneven distribution when clients are behind NAT (thousands of users sharing one IP address).
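IP hash reduces to a modulo over the server list, which is also why changing the list reshuffles almost every client — a weakness consistent hashing (below) fixes:

```python
import hashlib

def ip_hash(client_ip, servers):
    # Hash the client IP, then index into the server list
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["A", "B", "C"]
pinned = ip_hash("203.0.113.7", servers)  # same client -> same server, every time
```

Because the mapping depends on `len(servers)`, adding or removing one server changes `h % len(servers)` for most clients at once.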
Consistent Hashing: A more sophisticated approach where both servers and requests are mapped onto a hash ring. When a server is added or removed, only a fraction of requests are remapped — not all of them. This is critical for caching layers where remapping requests means cache misses.
Hash Ring (simplified — each server sits where its hash lands, in degrees around the ring):

     0° ──┬── (ring wraps from 360° back to 0°)
          │
  S2 ─────┤ 180°
          │
  S3 ─────┤ 270°
          │          Requests hashing between
  S1 ─────┤ 330°     S3 (270°) and S1 (330°) go to S1
          │

If S2 is removed, only requests that were
going to S2 get remapped (to S3).
S1's traffic is unchanged.
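A minimal hash ring demonstrating the remapping property. This sketch omits virtual nodes, which real implementations add (many ring positions per server) to even out the distribution:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers):
        # Sorted list of (position, server) pairs around the ring
        self.ring = sorted((self._hash(s), s) for s in servers)
        self._positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise to the first server at or past the key's position,
        # wrapping around to the start of the ring if necessary
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = ["user:%d" % i for i in range(100)]
ring = HashRing(["S1", "S2", "S3"])
before = {k: ring.server_for(k) for k in keys}

ring = HashRing(["S1", "S3"])  # remove S2
after = {k: ring.server_for(k) for k in keys}

# Only the keys that belonged to S2 change owners
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` was previously mapped to S2; keys owned by S1 or S3 keep their owner, so a cache behind this ring loses only S2's share of entries.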
Health Checks
A load balancer is only useful if it knows which backends are healthy. Health checks continuously verify that servers can handle traffic.
Passive health checks monitor real traffic. If a server returns five consecutive 500 errors, mark it as unhealthy. This is reactive — users experience the errors before the server is removed.
Active health checks send periodic probe requests to a dedicated health endpoint. If a server fails to respond to three consecutive probes, remove it from the pool. This is proactive — unhealthy servers are removed before users are affected.
Load Balancer
│
├── GET /health ──▶ Server 1: 200 OK ✓ healthy
├── GET /health ──▶ Server 2: 200 OK ✓ healthy
├── GET /health ──▶ Server 3: timeout ✗ unhealthy (removed)
└── GET /health ──▶ Server 4: 503 ✗ unhealthy (removed)
Traffic now routes only to Server 1 and Server 2.
A good health check endpoint verifies more than "the process is running." It should check database connectivity, cache connectivity, and available disk space — anything whose failure means the server cannot meaningfully serve requests.
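The active-check rules above ("three consecutive failed probes removes a server") can be sketched as a small state machine. The `probe` callable is a stand-in for an HTTP GET to the health endpoint:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    failures: int = 0
    healthy: bool = True

class HealthChecker:
    """Mark a backend unhealthy after 3 consecutive failed probes;
    a single successful probe restores it to the pool."""
    MAX_FAILURES = 3

    def __init__(self, backends):
        self.backends = backends

    def run_probe_round(self, probe):
        # probe(backend) returns True if the backend answered 200 OK
        for b in self.backends:
            if probe(b):
                b.failures, b.healthy = 0, True
            else:
                b.failures += 1
                if b.failures >= self.MAX_FAILURES:
                    b.healthy = False

    def pool(self):
        # Traffic is only routed to healthy backends
        return [b for b in self.backends if b.healthy]

backends = [Backend("s1"), Backend("s2")]
checker = HealthChecker(backends)
for _ in range(3):
    checker.run_probe_round(lambda b: b.name != "s2")  # s2 times out every round
# After three failed probes, s2 has been removed from the pool
```

Real balancers run these rounds on a timer (e.g. every 10 seconds) and usually require a few consecutive successes before returning a recovered server to the pool, to avoid flapping.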
SSL/TLS Termination
SSL termination means the load balancer handles the TLS handshake and decryption, then forwards unencrypted HTTP to the backend servers. This offloads CPU-intensive cryptographic operations from your application servers and simplifies certificate management (you only manage certificates in one place).
Client ──HTTPS──▶ Load Balancer ──HTTP──▶ Backend Servers
                  (decrypts here)         (no TLS overhead)
For environments requiring end-to-end encryption (PCI compliance, healthcare), use SSL passthrough (L4) or SSL re-encryption (L7 — decrypt at the LB, then re-encrypt to the backend using an internal certificate).
Sticky Sessions
Sticky sessions (session affinity) ensure that a client's requests consistently route to the same backend server. The load balancer typically sets a cookie containing the target server's identifier.
First request:
Client ──▶ LB (no cookie) ──▶ assigns Server 2
sets cookie: SERVERID=srv2
Subsequent requests:
Client ──▶ LB (cookie: SERVERID=srv2) ──▶ Server 2 (always)
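The cookie flow above amounts to: if the request carries a valid `SERVERID` cookie, honor it; otherwise assign a server and tell the client to remember it. A minimal sketch (the cookie name and round-robin fallback are illustrative choices):

```python
import itertools

class StickyLB:
    """Round-robin for first-time clients; the cookie pins later requests."""
    COOKIE = "SERVERID"

    def __init__(self, servers):
        self.servers = set(servers)
        self._rr = itertools.cycle(servers)

    def route(self, cookies):
        pinned = cookies.get(self.COOKIE)
        if pinned in self.servers:
            return pinned, {}                     # already pinned, no new cookie
        server = next(self._rr)                   # first visit: assign a server
        return server, {self.COOKIE: server}      # cookie to set on the response

lb = StickyLB(["srv1", "srv2"])
server, set_cookie = lb.route({})                 # no cookie -> assigns srv1
repeat, _ = lb.route({"SERVERID": "srv1"})        # cookie present -> srv1 again
```

Note the failure mode: if `srv1` dies, its pinned clients must be reassigned and lose any in-memory session state — which is why external session storage is preferred.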
Use sticky sessions only when you cannot externalize state to Redis or a database. They create uneven load distribution and complicate failover, as we discussed in Part 2.
Practical Implementation
Nginx as a Load Balancer
Nginx is the most widely deployed reverse proxy and load balancer. Here is a production-ready configuration:
# /etc/nginx/nginx.conf
upstream api_backend {
    # Least connections algorithm
    least_conn;

    # Backend servers with passive health check parameters
    server 10.0.1.10:3000 weight=3 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 weight=3 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 weight=2 max_fails=3 fail_timeout=30s;

    # Backup server - only receives traffic if all others are down
    server 10.0.1.20:3000 backup;

    # Keep connections alive to backends (connection pooling)
    keepalive 64;
}

# Rate limiting zone - must be defined at the http level, outside any server block
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=30r/s;

server {
    listen 443 ssl http2;
    server_name api.example.com;

    # SSL termination
    ssl_certificate     /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # Gzip compression
    gzip on;
    gzip_types application/json text/plain application/javascript;
    gzip_min_length 1000;

    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;

        proxy_pass http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 30s;

        # Retry on failure (only for idempotent methods)
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }

    # Static files served directly by Nginx (no backend needed)
    location /static/ {
        root /var/www;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}

Key configuration decisions in this setup:
- `least_conn` routes each request to the server handling the fewest active requests, adapting to variable request durations.
- `max_fails=3 fail_timeout=30s` means that if a server fails 3 times within 30 seconds, it is removed from the pool for 30 seconds (a passive health check).
- `keepalive 64` maintains a pool of persistent connections to the backends, avoiding the overhead of establishing a new TCP connection per request.
- `proxy_next_upstream` retries failed requests on another server, but only for safe-to-retry errors and methods.
AWS ALB vs NLB
Amazon Web Services offers two primary load balancers:
| Feature | ALB (Application Load Balancer) | NLB (Network Load Balancer) |
|---|---|---|
| Layer | Layer 7 | Layer 4 |
| Protocols | HTTP, HTTPS, WebSocket | TCP, UDP, TLS |
| Routing | Path-based, host-based, header-based | Port-based only |
| SSL termination | Yes | Yes (TLS listener) |
| Performance | Good (adds a few milliseconds of latency) | Extreme (millions of requests/sec, ultra-low latency) |
| Static IP | No (DNS-based) | Yes (Elastic IP per AZ) |
| Cost | Per hour + per LCU | Per hour + per NLCU |
| Best for | Web applications, APIs, microservices | Gaming, IoT, extreme throughput, gRPC |
Use ALB for most web applications. Path-based routing (/api/* to one target group, /admin/* to another) is essential for microservice architectures.
Use NLB when you need static IP addresses (for allowlisting), extreme low-latency, or when load balancing non-HTTP protocols.
Trade-offs and Decision Framework
| Decision | Choose When | Avoid When |
|---|---|---|
| L7 Load Balancing | Web apps, content-based routing, SSL termination | Non-HTTP protocols, ultra-low-latency needs |
| L4 Load Balancing | Database connections, gaming, extreme throughput | Need to inspect or route by HTTP content |
| Round-Robin | Uniform servers, uniform request cost | Variable server capacity or request duration |
| Least Connections | Variable request durations (some fast, some slow) | All requests take the same time |
| Consistent Hashing | Caching layers, need minimal remapping on changes | Simple stateless services |
| Sticky Sessions | Cannot externalize session state (short-term) | Stateless services (prefer external state) |
| Single LB | Development, small-scale production | Production systems requiring high availability |
| Active-Passive LB pair | Production systems needing LB redundancy | Cost-constrained environments |
An important pattern is multi-tier load balancing: a global DNS-based load balancer (like AWS Route 53 or Cloudflare) distributes traffic across regions, each region has an L7 load balancer (ALB) distributing across services, and individual services may have internal L4 load balancers for database connection distribution.
Common Interview Questions
Q: Your load balancer is a single point of failure. How do you address this? Deploy load balancers in an active-passive or active-active pair. In active-passive, the passive LB monitors the active one and takes over using a floating IP (via VRRP/keepalived) if the active LB fails. Cloud providers handle this automatically — AWS ALB runs across multiple Availability Zones with built-in redundancy. For global redundancy, use DNS-based load balancing across multiple regions.
Q: How does consistent hashing differ from IP hash, and when would you use each? Both use hashing to route clients to servers, but they differ in how they handle server changes. With IP hash, adding or removing a server changes the hash-to-server mapping for most keys — causing widespread cache misses. With consistent hashing, only the keys that were mapped to the changed server are remapped. Use consistent hashing for cache layers (Memcached, Redis) where cache misses are expensive. IP hash is fine when cache locality does not matter and you just need simple session affinity.
Q: A backend server is returning 200 OK on the health check but users are getting errors. What is happening? The health check endpoint is too shallow — it confirms the process is running but does not verify the server can actually serve requests. The server might have a broken database connection, a full disk, or a deadlocked thread pool. Fix this by making the health endpoint check all critical dependencies (database, cache, external services) and return unhealthy if any are degraded. Also implement passive health checks that monitor real traffic error rates.
Q: How would you handle WebSocket connections with a load balancer? WebSockets require a persistent connection, so you need an L7 load balancer that supports connection upgrades (HTTP Upgrade header). Configure the load balancer to route the initial HTTP handshake and then maintain the persistent connection to the same backend. Both Nginx and AWS ALB support this natively. For scaling, use Redis Pub/Sub or a message broker so that messages can reach clients connected to any server.
What's Next
Load balancers distribute traffic across servers, but those servers need to store and retrieve data efficiently. Continue to Part 4: Databases — SQL vs NoSQL where we explore database selection, indexing strategies, and the trade-offs between relational and non-relational data stores.
FAQ
What is the difference between a load balancer and a reverse proxy?
A reverse proxy sits in front of servers and forwards client requests, while a load balancer is a specific type of reverse proxy that distributes traffic across multiple backend servers based on an algorithm. Every load balancer is a reverse proxy, but not every reverse proxy is a load balancer. A reverse proxy with a single backend still provides benefits: SSL termination, caching, compression, and security (hiding backend topology). When you add multiple backends and a distribution algorithm, that reverse proxy becomes a load balancer.
Which load balancing algorithm should I use?
Round-robin works for stateless apps with uniform servers — it is the simplest and most predictable. Use least-connections for workloads with variable request durations, such as APIs where some endpoints query the database and others serve cached data. Use consistent hashing when you need session affinity or cache locality, particularly for caching layers where routing the same key to the same server maximizes cache hit rates. When in doubt, start with round-robin and switch to least-connections if you observe uneven server utilization.
How do load balancers handle server failures?
Load balancers perform health checks on backend servers and automatically remove unhealthy nodes from the pool, redistributing traffic to healthy servers until the failed ones recover. Active health checks send periodic probes (e.g., HTTP GET to /health every 10 seconds) and mark a server unhealthy after a configured number of consecutive failures. Passive health checks monitor real request outcomes and remove servers that produce too many errors. Production systems typically use both: active checks for fast detection and passive checks as a safety net for issues the health endpoint does not cover.