Database Sharding and Partitioning at Scale
Deep dive into database sharding and partitioning strategies. Learn hash-based, range-based, and geographic sharding with real-world examples and trade-offs.
This is Part 7 of the System Design from Zero to Hero series.
TL;DR
Sharding splits a database across multiple machines by a shard key, enabling horizontal scaling but introducing complexity in cross-shard queries and rebalancing. Choosing the right shard key and partitioning strategy is one of the most consequential decisions you will make in a distributed system.
Why This Matters
In Part 4, we covered how to choose between SQL and NoSQL databases. In Part 2, we discussed vertical and horizontal scaling. But there comes a point where even the beefiest single database server cannot handle your data volume or query load. At that point, you need to split your data across multiple machines.
This is where sharding enters the picture. Every large-scale system -- from social media platforms handling billions of posts to e-commerce platforms processing millions of orders -- relies on some form of data partitioning. Getting it right means seamless scaling. Getting it wrong means years of painful migrations.
Core Concepts
Partitioning vs Sharding
These terms are often used interchangeably, but they are distinct:
Partitioning splits data within a single database instance. PostgreSQL table partitioning, for example, divides a large table into smaller physical tables based on a partition key. The database engine handles routing queries to the correct partition transparently.
Sharding distributes partitions across multiple database instances running on different machines. Each shard is an independent database that holds a subset of the total data. The application (or a proxy layer) must route queries to the correct shard.
Vertical Partitioning splits a table by columns. You might store frequently accessed columns (user_id, name, email) in one table and rarely accessed columns (bio, preferences, settings) in another. This reduces row size and improves cache efficiency.
Horizontal Partitioning splits a table by rows. Users with IDs 1-1,000,000 go to partition A, and users with IDs 1,000,001-2,000,000 go to partition B. This is the basis of sharding.
Range-Based Sharding
Range-based sharding assigns rows to shards based on value ranges of the shard key.
Shard 1: user_id 1 - 1,000,000
Shard 2: user_id 1,000,001 - 2,000,000
Shard 3: user_id 2,000,001 - 3,000,000
Advantages:
- Range queries are efficient (find all users with IDs between X and Y)
- Easy to understand and implement
- Supports sequential scans within a range
Disadvantages:
- Prone to hotspots. If new users are always assigned incrementing IDs, Shard 3 handles all the write traffic while Shard 1 sits idle.
- Uneven data distribution if ranges are not carefully calibrated
Range-based sharding works well for time-series data where you query by time ranges and can tolerate that recent data shards are hotter than historical ones.
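Range-based routing can be sketched as a lookup over sorted boundaries; the bounds below mirror the example ranges above and are purely illustrative, and bisect keeps the lookup logarithmic in the number of shards:

```python
from bisect import bisect_right

# Inclusive upper bound of each shard's user_id range, ascending.
# Illustrative values matching the example above.
SHARD_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def get_range_shard(user_id: int) -> int:
    """Return the index of the shard whose range contains user_id."""
    idx = bisect_right(SHARD_BOUNDS, user_id - 1)
    if idx == len(SHARD_BOUNDS):
        raise ValueError(f"user_id {user_id} exceeds all shard ranges")
    return idx
```

Adding a shard here is just appending a boundary, which is why splitting ranges is the easy part; keeping the ranges evenly loaded is the hard part.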
Hash-Based Sharding
Hash-based sharding applies a hash function to the shard key and uses modulo to determine the shard:
```python
def get_shard(user_id: int, num_shards: int) -> int:
    return hash(user_id) % num_shards
```

This distributes data uniformly across shards, eliminating the hotspot problem of range-based sharding.
The critical flaw: When you add or remove shards, num_shards changes, and almost every key maps to a different shard. This means rehashing and moving nearly all your data -- a catastrophic operation on a live system.
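A quick experiment makes the flaw concrete. This sketch counts how many of a million keys would change shards when a cluster grows from 4 to 5 shards under plain modulo hashing:

```python
def modulo_shard(key: int, num_shards: int) -> int:
    return key % num_shards

# Count keys that map to a different shard after adding a fifth shard.
moved = sum(
    1 for key in range(1_000_000)
    if modulo_shard(key, 4) != modulo_shard(key, 5)
)
print(f"{moved / 1_000_000:.0%} of keys moved")  # prints "80% of keys moved"
```

A key stays put only when key % 4 == key % 5, which happens for just 4 values in every 20, so 80% of the data has to move for a single added shard.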
Consistent Hashing Ring
Consistent hashing solves the rehashing problem. Instead of hash(key) % N, it maps both keys and nodes onto a circular hash space (a ring).
```
              Node A (position 90)
             /                    \
            /                      \
Node D (position 340)       Node B (position 180)
            \                      /
             \                    /
              Node C (position 270)
```
Each key is assigned to the first node encountered when walking clockwise from the key's position on the ring. When a node is added or removed, only the keys between the new node and its predecessor need to move -- roughly 1/N of the total data instead of nearly all of it.
```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    def __init__(self, nodes=None, virtual_nodes=150):
        self.ring = {}
        self.sorted_keys = []
        self.virtual_nodes = virtual_nodes
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        """Add a node with virtual nodes for better distribution."""
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            self.ring[hash_val] = node
            self.sorted_keys.append(hash_val)
        self.sorted_keys.sort()

    def remove_node(self, node: str):
        """Remove a node and all its virtual nodes."""
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            del self.ring[hash_val]
            self.sorted_keys.remove(hash_val)

    def get_node(self, key: str) -> str:
        """Find which node a key belongs to."""
        if not self.ring:
            raise Exception("No nodes in ring")
        hash_val = self._hash(key)
        idx = bisect_right(self.sorted_keys, hash_val)
        # Wrap around to the first node if past the end
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

# Usage
ring = ConsistentHashRing(nodes=["shard-1", "shard-2", "shard-3"])
shard = ring.get_node("user:12345")  # Deterministically routes to a shard

# Adding a new shard only moves ~1/N of the keys
ring.add_node("shard-4")
```

Virtual nodes are critical. Without them, nodes may be unevenly spaced on the ring, leading to skewed data distribution. Each physical node gets multiple positions (virtual nodes) on the ring, smoothing out the distribution.
Shard Key Selection
The shard key is the single most important decision in a sharding strategy. A bad shard key can make the entire system unusable.
Good shard key properties:
- High cardinality -- many distinct values to distribute across shards evenly
- Even distribution -- no single value dominates (avoid country_code if 60% of users are in one country)
- Query alignment -- most queries include the shard key, avoiding cross-shard operations
- Immutability -- changing a shard key value means moving the row between shards
Common shard keys by domain:
- Multi-tenant SaaS: tenant_id (all tenant data on one shard, no cross-shard queries for tenant operations)
- Social media: user_id (user's posts, followers, and activity colocated)
- E-commerce: order_id or customer_id depending on query patterns
- Time-series: device_id + time range (not timestamp alone, which creates write hotspots)
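To illustrate the time-series case, here is a hypothetical routing function that combines device_id with a coarse daily bucket; the bucket granularity, hash choice, and shard count are assumptions for the sketch, not a recommendation:

```python
import hashlib
from datetime import datetime, timezone

NUM_SHARDS = 16  # illustrative

def timeseries_shard(device_id: str, ts: datetime) -> int:
    """Route by device_id plus a daily time bucket.

    Hashing on the device spreads concurrent writes across shards,
    while the coarse bucket keeps one day of a device's data together
    so old buckets can be aged out shard by shard.
    """
    bucket = ts.strftime("%Y-%m-%d")  # coarse bucket, not the raw timestamp
    key = f"{device_id}:{bucket}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_SHARDS
```

Two readings from the same device on the same day land on the same shard; a thousand devices writing at the same instant do not all land on one.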
Cross-Shard Queries and Joins
Once data is sharded, queries that span multiple shards become expensive. A query like "find all orders over $100 across all customers" must be sent to every shard, and results must be aggregated at the application layer.
```python
import asyncio

async def cross_shard_query(query: str, shards: list) -> list:
    """Execute a query across all shards and merge results."""
    # execute_on_shard is assumed to be defined elsewhere in the codebase
    tasks = []
    for shard in shards:
        tasks.append(execute_on_shard(shard, query))
    # Fan out to all shards in parallel
    results = await asyncio.gather(*tasks)
    # Merge and sort results at the application layer
    merged = []
    for result_set in results:
        merged.extend(result_set)
    return sorted(merged, key=lambda r: r['created_at'], reverse=True)
```

Strategies for minimizing cross-shard operations:
- Denormalize: Store frequently joined data together on the same shard
- Global tables: Replicate small, rarely-changing reference tables to every shard
- Application-level joins: Fetch from multiple shards and join in application code
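An application-level join can be sketched as a two-step gather-and-merge; fetch_users_by_ids is a hypothetical stand-in for a routing layer that queries each relevant shard once for its subset of ids:

```python
def application_join(orders, fetch_users_by_ids):
    """Join orders with their users in application code.

    Step 1: collect the foreign keys from the first result set.
    Step 2: batch-fetch the referenced rows (one query per shard).
    Step 3: merge the two sets in memory.
    """
    user_ids = {o["user_id"] for o in orders}
    users = fetch_users_by_ids(user_ids)  # {user_id: user_row}
    return [{**o, "user": users.get(o["user_id"])} for o in orders]

# Demo with an in-memory fetcher standing in for per-shard queries
demo_users = {10: {"name": "Ada"}, 20: {"name": "Lin"}}
joined = application_join(
    [{"id": 1, "user_id": 10}, {"id": 2, "user_id": 20}],
    lambda ids: {uid: demo_users[uid] for uid in ids},
)
```

The batching matters: fetching users one by one would turn a single join into N network round trips.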
Global Secondary Indexes
When you shard by user_id but need to look up users by email, you have two options:
Local secondary index: Each shard indexes only its own data. Looking up by email requires querying all shards (scatter-gather). Simple to maintain, expensive to query.
Global secondary index: A separate service maps email to the shard containing that user. Fast lookups, but the index itself becomes a component that must be kept in sync with all shards.
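A minimal sketch of the global-index idea, with an in-memory dict standing in for what would be a separately replicated, write-synchronized store (the class and method names here are invented for illustration):

```python
class GlobalEmailIndex:
    """Maps email -> name of the shard holding that user.

    In production this would be a separate durable store updated in the
    same write path as the sharded user table; keeping the two in sync
    is exactly the operational cost the article describes.
    """

    def __init__(self):
        self._email_to_shard = {}

    def register(self, email: str, shard: str):
        self._email_to_shard[email] = shard

    def lookup(self, email: str):
        # One index read replaces a scatter-gather across every shard
        return self._email_to_shard.get(email)

# Usage
idx = GlobalEmailIndex()
idx.register("ada@example.com", "shard-2")
```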
Rebalancing Strategies
When shards become unevenly loaded, you need to rebalance:
- Split and merge: Split a hot shard into two smaller ones, or merge cold shards. Requires application-level routing updates.
- Virtual shards (logical shards): Create many more logical shards than physical nodes. Rebalancing means reassigning logical shards to different physical nodes without moving data within a shard.
- Online resharding: Tools like Vitess perform live resharding by copying data to new shards while serving reads from old shards, then cutting over.
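The virtual-shard approach can be sketched as a fixed key-to-logical-shard hash plus a mutable logical-to-physical table; the shard count and node names below are illustrative:

```python
import hashlib

NUM_LOGICAL_SHARDS = 256  # fixed forever; keys never rehash

def logical_shard(key: str) -> int:
    """Map a key to one of many fixed logical shards."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_LOGICAL_SHARDS

# Assignment of logical shards to physical nodes; rebalancing edits
# this table instead of rehashing every key.
physical_of = {ls: f"node-{ls % 4}" for ls in range(NUM_LOGICAL_SHARDS)}

def route(key: str) -> str:
    return physical_of[logical_shard(key)]

# Draining a hot logical shard onto a new node is a one-entry change:
physical_of[42] = "node-4"
```

Only the data in logical shard 42 has to move; routing for the other 255 logical shards is untouched.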
Practical Implementation
Vitess (MySQL Sharding)
Vitess, originally built at YouTube, adds horizontal sharding to MySQL:
```json
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "hash"
        }
      ]
    },
    "orders": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "hash"
        }
      ]
    }
  }
}
```

This VSchema defines the sharding rules. Vitess handles query routing, cross-shard queries, and online resharding transparently. Applications connect to Vitess as if it were a single MySQL instance.
Citus (PostgreSQL Sharding)
Citus extends PostgreSQL with distributed tables:
```sql
-- Create a distributed table sharded by tenant_id
SELECT create_distributed_table('orders', 'tenant_id');

-- Colocate related tables on the same shards
SELECT create_distributed_table('order_items', 'tenant_id',
       colocate_with => 'orders');

-- Queries that include tenant_id are routed to a single shard
SELECT * FROM orders WHERE tenant_id = 42 AND status = 'pending';

-- Cross-shard queries work but are slower
SELECT status, count(*) FROM orders GROUP BY status;
```

The colocate_with option is powerful -- it ensures that related tables sharing the same shard key land on the same physical node, enabling local joins.
Trade-offs and Decision Framework
Shard when:
- A single database cannot handle the write throughput
- Dataset exceeds the storage capacity of a single machine
- You need data locality (geographic sharding)
Do not shard when:
- Read replicas can handle the load (as covered in Part 2)
- Vertical scaling is still viable
- Your data model requires many cross-entity joins
- Caching can reduce database load sufficiently
Sharding is a one-way door. Once you shard, you cannot easily unshard. Exhaust all other scaling options first -- read replicas, caching, query optimization, vertical scaling. Shard only when you must.
| Strategy | Distribution | Range Queries | Hotspot Risk | Rebalancing |
|---|---|---|---|---|
| Range-based | Uneven | Efficient | High | Easy (split ranges) |
| Hash-based | Even | Scatter-gather | Low | Hard (rehash all) |
| Consistent hashing | Even | Scatter-gather | Low | Easy (move ~1/N) |
| Directory-based | Configurable | Depends | Configurable | Manual |
Common Interview Questions
Q: You have a users table with 2 billion rows. How do you shard it?
A: Shard by user_id using consistent hashing. Use virtual nodes for even distribution. Colocate user-related tables (posts, settings) on the same shard. Build a global secondary index for email-based lookups. Start with 256 logical shards mapped to a smaller number of physical nodes for easier future rebalancing.
Q: How do you handle a cross-shard JOIN between orders and products?
A: Replicate the products table (small, read-heavy, rarely changes) to every shard as a reference table. This converts the cross-shard join into a local join. For truly large cross-shard queries, use a separate analytics pipeline that aggregates data outside the transactional path.
Q: What happens if one shard receives disproportionately more traffic?
A: This is a hotspot. First, verify the shard key has sufficient cardinality. If one key value is extremely popular (a celebrity user), split that shard or use secondary sub-sharding. Monitor shard-level metrics (CPU, query latency, storage) and set up alerts for imbalance. Use virtual shards that can be redistributed without data movement.
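One common sub-sharding trick for a single hot key is salting; this sketch (function name and salt count are illustrative) appends a random suffix so the hot key's writes spread across several sub-shards, at the cost of fanning reads out across every salt:

```python
import random

NUM_SUBSHARDS = 8  # illustrative fan-out for a single hot key

def salted_shard_key(user_id: int, hot: bool) -> str:
    """Return the physical shard key for a logical user_id.

    Normal keys pass through unchanged. A key flagged as hot (by a
    config entry or traffic detector in practice) gets a random salt,
    splitting its write traffic across NUM_SUBSHARDS sub-keys.
    Reads for a hot key must query all NUM_SUBSHARDS variants.
    """
    if hot:
        return f"{user_id}#{random.randrange(NUM_SUBSHARDS)}"
    return str(user_id)
```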
Q: How do you perform schema migrations across shards?
A: Use online schema change tools (pt-online-schema-change for MySQL, pg_repack for PostgreSQL) that work on each shard independently. Roll out changes shard by shard, not all at once. Ensure the application code handles both old and new schemas during the migration window.
What's Next
With data properly distributed across shards, Part 8: API Design, Rate Limiting, and Authentication covers how to expose your distributed system through well-designed APIs while protecting it from abuse.
FAQ
How do I choose the right shard key for my database?
Pick a shard key with high cardinality that evenly distributes data and aligns with your most common query patterns. Avoid keys that create hotspots, like timestamp-based keys in write-heavy systems.
What happens when I need to add more shards?
Adding shards requires resharding, which moves data between nodes. Use consistent hashing to minimize data movement, or use virtual shards that can be reassigned to new physical nodes without full redistribution.
What is the difference between sharding and partitioning?
Partitioning splits data within a single database instance (vertical or horizontal), while sharding distributes partitions across multiple database instances on different machines for horizontal scalability.