Polyglot Database Architecture for E-Commerce at Scale
How we designed a polyglot database architecture using MySQL, MongoDB, Redis, and Elasticsearch for an e-commerce platform, with data flow patterns and consistency strategies.
TL;DR
Using MySQL for transactional data (orders, inventory, payments), MongoDB for product catalogs and vendor profiles, Redis for caching, and Elasticsearch for search gave each workload its optimal storage engine. Change data capture with Debezium kept everything in sync with eventual consistency. The architecture let us scale each data tier independently while maintaining data integrity where it mattered most.
The Challenge
I was building an e-commerce platform that had outgrown its single-database architecture. Everything lived in MySQL: product catalogs with wildly varying attribute schemas, order transactions requiring strict ACID guarantees, vendor profiles with nested documents, full-text search across millions of products, and session/cart data that needed sub-millisecond reads.
The symptoms were clear:
- Product catalog queries were slow because we used an Entity-Attribute-Value (EAV) pattern to handle variable product attributes, requiring multiple joins for every product listing
- Search was handled by MySQL LIKE queries, which fell apart at scale and could not support faceted filtering, typo tolerance, or relevance ranking
- The same MySQL instance was handling both write-heavy order processing and read-heavy catalog browsing, creating lock contention during peak traffic
- Vendor onboarding required a schema migration every time a new vendor category introduced new profile fields
The decision to adopt a polyglot persistence strategy was not taken lightly. Multiple databases mean multiple failure modes, multiple backup strategies, and multiple skill sets to maintain. But the pain of forcing everything into a single relational database had become the bigger problem.
The Architecture
Data Ownership Boundaries
The first and most important decision was defining which database owned which data. Ownership meant that database was the source of truth for that domain, and all other systems derived their copies from it.
MySQL (Source of Truth: Transactional Data)
├── Orders (order_id, user_id, status, total, created_at)
├── Order Items (order_item_id, order_id, product_id, quantity, price)
├── Inventory (product_id, warehouse_id, quantity, reserved)
├── Payments (payment_id, order_id, amount, status, provider_ref)
└── Users (user_id, email, hashed_password, created_at)
MongoDB (Source of Truth: Content & Catalog Data)
├── Products (flexible schema per category)
├── Vendor Profiles (nested documents, varying structures)
├── Categories (hierarchical, with inherited attributes)
└── Reviews (denormalized with user display info)
Redis (Derived/Ephemeral Data)
├── Session Data
├── Cart State
├── Rate Limiting Counters
├── Cached Product Listings
└── Inventory Availability Cache
Elasticsearch (Derived Data — fed by CDC)
├── Product Search Index
├── Vendor Search Index
└── Order Search Index (for admin/support tools)
MySQL for Transactional Integrity
Orders, inventory, and payments required ACID guarantees. When a customer places an order, we need to atomically deduct inventory, create the order record, and initiate payment. If any step fails, everything rolls back.
// order-service.ts — transactional order creation
import { getConnection } from 'typeorm';
async function createOrder(
  userId: string,
  items: CartItem[],
  paymentMethod: PaymentMethod
): Promise<Order> {
  const connection = getConnection();
  const queryRunner = connection.createQueryRunner();
  await queryRunner.connect();
  await queryRunner.startTransaction('SERIALIZABLE');
  try {
    // 1. Lock and verify inventory for all items
    for (const item of items) {
      const inventory = await queryRunner.manager
        .createQueryBuilder(Inventory, 'inv')
        .setLock('pessimistic_write')
        .where('inv.product_id = :productId AND inv.warehouse_id = :warehouseId', {
          productId: item.productId,
          warehouseId: item.warehouseId,
        })
        .getOne();
      // Available stock is on-hand quantity minus already-reserved units
      const available = inventory ? inventory.quantity - inventory.reserved : 0;
      if (!inventory || available < item.quantity) {
        throw new InsufficientInventoryError(item.productId);
      }
      // Reserve inventory
      inventory.reserved += item.quantity;
      await queryRunner.manager.save(inventory);
    }
    // 2. Create order record
    const order = queryRunner.manager.create(Order, {
      userId,
      status: OrderStatus.PENDING_PAYMENT,
      items: items.map(item => ({
        productId: item.productId,
        quantity: item.quantity,
        unitPrice: item.price,
      })),
      total: items.reduce((sum, i) => sum + i.price * i.quantity, 0),
    });
    await queryRunner.manager.save(order);
    // 3. Initiate payment
    const payment = await paymentService.charge({
      orderId: order.id,
      amount: order.total,
      method: paymentMethod,
    });
    if (payment.status === 'failed') {
      throw new PaymentFailedError(payment.declineReason);
    }
    order.status = OrderStatus.CONFIRMED;
    await queryRunner.manager.save(order);
    await queryRunner.commitTransaction();
    return order;
  } catch (error) {
    await queryRunner.rollbackTransaction();
    throw error;
  } finally {
    await queryRunner.release();
  }
}

The SERIALIZABLE isolation level and pessimistic write locks on inventory rows prevented overselling during concurrent purchases of the same product. This is the exact scenario where MySQL's transactional guarantees are essential and where a document database would be the wrong tool.
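The invariant the locked transaction protects is plain inventory arithmetic. A pure-function sketch of that check (the `Inv` shape and `tryReserve` name are illustrative, not part of the actual service):

```typescript
// Reservation invariant enforced inside the locked transaction:
// available = quantity - reserved, and a reservation may only proceed
// when the requested amount fits within what is available.
type Inv = { quantity: number; reserved: number };

function tryReserve(inv: Inv, requested: number): Inv | null {
  const available = inv.quantity - inv.reserved;
  if (available < requested) return null; // would oversell: caller aborts the transaction
  return { ...inv, reserved: inv.reserved + requested };
}
```

Without the row lock, two concurrent checkouts could both read the same availability and both reserve the last unit; the pessimistic write lock serializes them so the second transaction sees the updated reserved count and fails the check.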
MongoDB for Product Catalogs
Product data was the worst fit for a relational schema. A t-shirt has size and color. A laptop has CPU, RAM, storage, and screen size. A food item has allergens, nutritional info, and expiry date. In MySQL, this required either an EAV table (performance nightmare) or a JSON column (losing the ability to index and query individual attributes efficiently).
MongoDB's flexible document model solved this naturally:
// Product documents — different categories, different shapes
// All valid, all in the same collection
const tshirtProduct = {
  _id: ObjectId('...'),
  sku: 'TS-BLU-M',
  name: 'Classic Cotton T-Shirt',
  vendorId: 'vendor_123',
  category: ['apparel', 'tops', 't-shirts'],
  basePrice: 29.99,
  attributes: {
    size: 'M',
    color: 'Blue',
    material: '100% Cotton',
    fit: 'Regular',
  },
  variants: [
    { sku: 'TS-BLU-S', attributes: { size: 'S' }, priceAdjustment: 0 },
    { sku: 'TS-BLU-L', attributes: { size: 'L' }, priceAdjustment: 0 },
    { sku: 'TS-BLU-XL', attributes: { size: 'XL' }, priceAdjustment: 2.00 },
  ],
  media: [
    { type: 'image', url: '/images/ts-blu-front.jpg', alt: 'Front view' },
    { type: 'image', url: '/images/ts-blu-back.jpg', alt: 'Back view' },
  ],
  seo: {
    metaTitle: 'Classic Cotton T-Shirt - Blue',
    metaDescription: 'Comfortable 100% cotton t-shirt in classic blue.',
    slug: 'classic-cotton-t-shirt-blue',
  },
};
const laptopProduct = {
  _id: ObjectId('...'),
  sku: 'LP-PRO-15',
  name: 'ProBook 15 Laptop',
  vendorId: 'vendor_456',
  category: ['electronics', 'computers', 'laptops'],
  basePrice: 1299.99,
  attributes: {
    cpu: 'Intel Core i7-13700H',
    ram: '16GB DDR5',
    storage: '512GB NVMe SSD',
    screenSize: '15.6"',
    resolution: '1920x1080',
    gpu: 'Intel Iris Xe',
    batteryLife: '10 hours',
    weight: '1.8 kg',
    ports: ['USB-C x2', 'USB-A x1', 'HDMI', '3.5mm audio'],
  },
  variants: [
    {
      sku: 'LP-PRO-15-32',
      attributes: { ram: '32GB DDR5', storage: '1TB NVMe SSD' },
      priceAdjustment: 300.00,
    },
  ],
  media: [/* ... */],
  seo: {/* ... */},
};

Vendor profiles had the same flexibility requirement. A clothing vendor needed fields for sizing charts and fabric sourcing. An electronics vendor needed warranty information and certification documents. MongoDB let each vendor profile contain exactly the fields relevant to their business without schema migrations.
// Vendor profile — flexible nested documents
const vendorProfile = {
  _id: ObjectId('...'),
  businessName: 'TechParts Direct',
  contactInfo: {
    primaryEmail: 'ops@techparts.example.com',
    phone: '+1-555-0123',
    address: {
      street: '123 Commerce Way',
      city: 'Toronto',
      province: 'ON',
      postalCode: 'M5V 1A1',
      country: 'CA',
    },
  },
  certifications: [
    { name: 'ISO 9001', issuedBy: 'Bureau Veritas', expiresAt: '2026-03-01' },
  ],
  // Category-specific fields — no schema change required
  electronicsSpecific: {
    warrantyPolicy: '2-year manufacturer warranty',
    returnWindow: 30,
    authorizedRepairPartner: true,
  },
  performance: {
    avgShipTime: 2.3,
    returnRate: 0.04,
    rating: 4.6,
    totalOrders: 12847,
  },
};

Data Synchronization with Change Data Capture
The critical question with polyglot persistence is: how do you keep data consistent across databases? We used Debezium for change data capture (CDC) from MySQL, and MongoDB change streams for MongoDB-sourced updates.
// CDC consumer — listens to MySQL changes, updates Elasticsearch
// Simplified representation of the Debezium connector pipeline
interface CDCEvent {
  source: 'mysql' | 'mongodb';
  operation: 'INSERT' | 'UPDATE' | 'DELETE';
  table: string;
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
  timestamp: number;
}

class SearchIndexSynchronizer {
  constructor(
    private esClient: ElasticsearchClient,
    private redisClient: RedisClient,
    private productRepo: MongoProductRepository
  ) {}

  async handleOrderEvent(event: CDCEvent): Promise<void> {
    // Order status changes update the admin search index
    if (event.table === 'orders' && event.operation === 'UPDATE') {
      const order = event.after as OrderRow;
      await this.esClient.update({
        index: 'orders',
        id: order.id.toString(),
        body: {
          doc: {
            status: order.status,
            updatedAt: new Date(event.timestamp).toISOString(),
          },
        },
      });
    }
  }

  async handleInventoryEvent(event: CDCEvent): Promise<void> {
    // Inventory changes update Redis cache and product availability in ES
    if (event.table === 'inventory') {
      const inventory = event.after as InventoryRow;
      const available = inventory.quantity - inventory.reserved;
      // Update Redis cache for real-time availability checks
      await this.redisClient.set(
        `inventory:${inventory.product_id}:${inventory.warehouse_id}`,
        available.toString(),
        'EX',
        300 // 5 minute TTL as safety net
      );
      // Update Elasticsearch for search filtering
      await this.esClient.update({
        index: 'products',
        id: inventory.product_id.toString(),
        body: {
          doc: { inStock: available > 0 },
        },
      });
    }
  }

  async handleProductChangeStream(change: ChangeStreamDocument): Promise<void> {
    // MongoDB product changes sync to Elasticsearch search index
    if (change.operationType === 'update' || change.operationType === 'replace') {
      const product = change.fullDocument;
      await this.esClient.index({
        index: 'products',
        id: product._id.toString(),
        body: this.mapProductToSearchDocument(product),
      });
    }
  }

  private mapProductToSearchDocument(product: ProductDocument) {
    return {
      name: product.name,
      description: product.description,
      category: product.category,
      vendorId: product.vendorId,
      price: product.basePrice,
      attributes: product.attributes,
      // Flatten for faceted search
      ...this.flattenAttributes(product.attributes),
    };
  }

  private flattenAttributes(attributes: Record<string, unknown>) {
    // Prefix attribute keys (e.g. attr_size) so each becomes a top-level
    // facetable field in the search document
    return Object.fromEntries(
      Object.entries(attributes ?? {}).map(([key, value]) => [`attr_${key}`, value])
    );
  }
}

The CDC approach avoided the dual-write problem. Instead of writing to MySQL and then writing to Elasticsearch in the same application code (which risks inconsistency if the second write fails), we wrote to the source of truth and let the CDC pipeline propagate changes asynchronously.
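A related property makes consumer restarts safe: the downstream writes are idempotent upserts keyed by document id, so replaying an already-processed event converges to the same state. A minimal sketch, with an in-memory map standing in for the search index (the event shape and names are illustrative):

```typescript
// At-least-once delivery is harmless when the apply step is an idempotent
// upsert: reprocessing the same event leaves the index unchanged.
const searchIndex = new Map<string, Record<string, unknown>>();

function applyCdcEvent(event: { id: string; doc: Record<string, unknown> }): void {
  searchIndex.set(event.id, event.doc); // last write for a given id wins
}
```

This is why replaying from the last committed offset after a crash needs no deduplication logic on the consumer side.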
Handling Cross-Database Queries
Some operations naturally span databases. Displaying an order details page requires order data from MySQL and product details from MongoDB. We handled this with an aggregation layer:
// order-details-aggregator.ts
async function getOrderDetails(orderId: string): Promise<OrderDetailsView> {
  // Fetch order from MySQL
  const order = await mysqlOrderRepo.findById(orderId);
  if (!order) throw new NotFoundError('Order not found');
  // Fetch product details from MongoDB for all order items
  const productIds = order.items.map(item => item.productId);
  const products = await mongoProductRepo.findByIds(productIds);
  // Build a lookup map for O(1) access
  const productMap = new Map(
    products.map(p => [p._id.toString(), p])
  );
  // Compose the view
  return {
    orderId: order.id,
    status: order.status,
    createdAt: order.createdAt,
    items: order.items.map(item => {
      const product = productMap.get(item.productId);
      return {
        productId: item.productId,
        productName: product?.name ?? 'Product unavailable',
        productImage: product?.media?.[0]?.url ?? null,
        quantity: item.quantity,
        unitPrice: item.unitPrice,
        lineTotal: item.quantity * item.unitPrice,
      };
    }),
    total: order.total,
  };
}

This is an explicit join at the application layer. It is less elegant than a SQL join, but it allows each database to serve the data it is optimized for. The product name and image are display-only context on the order page; the authoritative order data (quantities, prices, totals) comes from MySQL.
Key Decisions & Trade-offs
MySQL over PostgreSQL for transactional data. The team had deep MySQL expertise, and the existing codebase was already on MySQL. PostgreSQL's JSON support might have reduced the need for MongoDB for some use cases, but the migration cost and the learning curve were not justified given our timeline. The pragmatic choice was to keep MySQL for what it did well and add purpose-built tools for what it did not.
Eventual consistency acceptance. CDC introduces a propagation delay. A product updated in MongoDB might take a few hundred milliseconds to appear in Elasticsearch search results. We accepted this trade-off because product catalog updates are not time-critical in the way that inventory deductions are. For inventory, we used Redis with short TTLs and fell back to MySQL for authoritative reads during cache misses.
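The inventory read path described above is a standard cache-aside pattern. A minimal sketch with in-memory stand-ins for Redis and MySQL; the key format mirrors the CDC consumer's, but the client objects and helper names here are illustrative, not the production code:

```typescript
// Cache-aside read for inventory availability: serve from cache when a
// fresh entry exists, otherwise fall back to the source of truth and
// repopulate with a short TTL so stale entries age out quickly.
type InventoryRow = { quantity: number; reserved: number };

const cache = new Map<string, { value: string; expiresAt: number }>();
const mysqlInventory = new Map<string, InventoryRow>([
  ['inventory:sku-1:wh-1', { quantity: 10, reserved: 3 }],
]);

async function readFromMySQL(key: string): Promise<number> {
  const row = mysqlInventory.get(key);
  if (!row) return 0;
  return row.quantity - row.reserved; // authoritative availability
}

async function getAvailability(productId: string, warehouseId: string): Promise<number> {
  const key = `inventory:${productId}:${warehouseId}`;
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return Number(hit.value); // fast path: cached availability
  }
  const available = await readFromMySQL(key); // miss: authoritative read
  cache.set(key, { value: String(available), expiresAt: Date.now() + 300_000 });
  return available;
}
```

The 5-minute TTL bounds how stale a cached value can get if a CDC update is lost; the authoritative check at order time still goes through the locked MySQL transaction.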
Denormalization in MongoDB. Product documents included denormalized vendor display names and category breadcrumbs. This meant updating a vendor's name required updating all their product documents. We chose this trade-off because product reads vastly outnumbered vendor name changes, and the MongoDB change stream handled propagation when vendor data did change.
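The vendor-rename propagation amounts to a fan-out update over that vendor's products. A sketch of the logic, with a plain array standing in for the products collection (field names like `vendorDisplayName` are illustrative; in production this runs off the MongoDB change stream as an `updateMany`):

```typescript
// Refresh the denormalized vendor display name embedded in each product
// document when the vendor record changes.
type Product = { sku: string; vendorId: string; vendorDisplayName: string };

function propagateVendorRename(products: Product[], vendorId: string, newName: string): number {
  let updated = 0;
  for (const p of products) {
    if (p.vendorId === vendorId && p.vendorDisplayName !== newName) {
      p.vendorDisplayName = newName; // denormalized copy refreshed in place
      updated++;
    }
  }
  return updated; // analogous to updateMany's modifiedCount
}
```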
Redis as cache, not source of truth. Redis was used exclusively for derived and ephemeral data. Cart state, session data, and inventory availability caches could all be rebuilt from MySQL and MongoDB if Redis was lost. This meant a Redis failure caused degraded performance (cache misses) but never data loss.
Results & Outcomes
Product listing pages became dramatically faster once catalog data was served from MongoDB documents instead of multi-join EAV queries. Search moved from basic keyword matching to full faceted search with typo tolerance, synonym support, and relevance ranking powered by Elasticsearch.
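For reference, faceted search with typo tolerance maps onto Elasticsearch's query DSL roughly as below. The index, field names, boosts, and price buckets are assumptions for illustration, not our exact production query:

```typescript
// Illustrative Elasticsearch search body: fuzzy full-text matching plus
// facet aggregations, with a post_filter so facet counts reflect the
// unfiltered result set.
const searchBody = {
  query: {
    multi_match: {
      query: 'coton tshirt',          // misspelling still matches via fuzziness
      fields: ['name^3', 'description'],
      fuzziness: 'AUTO',
    },
  },
  aggs: {
    by_category: { terms: { field: 'category.keyword' } }, // category facet counts
    price_ranges: {
      range: {
        field: 'price',
        ranges: [{ to: 25 }, { from: 25, to: 100 }, { from: 100 }],
      },
    },
  },
  post_filter: { term: { inStock: true } }, // narrow hits without skewing facets
};
```

None of this is expressible in a MySQL LIKE query, which is what made the dedicated search tier worthwhile.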
Order processing throughput improved because the read-heavy catalog workload no longer competed with write-heavy transactional operations for the same database resources. Each database could be scaled independently based on its workload characteristics.
Vendor onboarding became a configuration task instead of a schema migration. New vendor categories with unique profile fields could be added without database changes, reducing onboarding time from days (waiting for a migration to be reviewed, tested, and deployed) to hours.
The CDC pipeline proved reliable in production. Data propagation from MySQL to Elasticsearch typically completed within a few hundred milliseconds, and the system handled CDC consumer restarts gracefully by replaying from the last committed offset.
What I'd Do Differently
Evaluate PostgreSQL more seriously. PostgreSQL's jsonb type with GIN indexes might have covered the product catalog use case well enough to avoid introducing MongoDB entirely. Fewer databases means fewer operational concerns. I would prototype the catalog workload on PostgreSQL jsonb before committing to a separate document database.
Implement a unified data access layer from the start. The aggregation code that joins data across databases was scattered across multiple services initially. A centralized data access layer with clear interfaces for cross-database reads would have been cleaner and easier to maintain.
Add CDC lag monitoring earlier. We did not have good visibility into CDC pipeline lag until we hit a situation where Elasticsearch search results were minutes behind MySQL inventory updates. Adding lag monitoring and alerting from day one would have caught this during initial rollout instead of in production.
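The lag metric itself is simple: compare the source event's commit timestamp to the wall clock when the consumer processes it. A sketch (the threshold and function name are illustrative; a real setup would emit this to a metrics system rather than return it):

```typescript
// Gauge CDC pipeline lag per event and flag when it exceeds a threshold.
interface CdcTimestamped { timestamp: number } // ms since epoch, from the CDC envelope

function recordLag(
  event: CdcTimestamped,
  nowMs: number,
  alertThresholdMs = 5_000
): { lagMs: number; alert: boolean } {
  const lagMs = Math.max(0, nowMs - event.timestamp); // clamp clock skew to zero
  return { lagMs, alert: lagMs > alertThresholdMs };
}
```

Tracking this per table would have surfaced the minutes-long Elasticsearch lag long before support tickets did.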
Consider CRDTs for cart state. We used Redis for cart state, which worked but had edge cases with concurrent modifications from multiple devices. Conflict-free Replicated Data Types (CRDTs) would have handled concurrent cart modifications more gracefully, though the added complexity might not have been worth it for our scale.
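One simple technique from the CRDT family is a last-writer-wins register per cart line, which merges concurrent edits from two devices deterministically. A sketch under that assumption (the `CartLine` shape is illustrative; a full CRDT cart would also need tombstones for removals):

```typescript
// LWW merge of two cart replicas: for each product, keep whichever
// replica recorded the later update. Merge order does not matter.
type CartLine = { productId: string; quantity: number; updatedAt: number };

function mergeCarts(a: CartLine[], b: CartLine[]): CartLine[] {
  const merged = new Map<string, CartLine>();
  for (const line of [...a, ...b]) {
    const existing = merged.get(line.productId);
    if (!existing || line.updatedAt > existing.updatedAt) {
      merged.set(line.productId, line); // later write wins for this product
    }
  }
  return [...merged.values()];
}
```

With plain Redis, whichever device wrote last simply overwrote the whole cart; the per-line merge above would at least preserve non-conflicting edits from both devices.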
FAQ
What is polyglot persistence?
Polyglot persistence means using different database technologies for different data needs within the same application. Instead of forcing all data into one database, you pick the best tool for each workload: relational databases for transactions that require ACID guarantees, document databases for flexible schemas where each record can have different fields, key-value stores for high-speed caching, and search engines for full-text queries with faceting and relevance ranking. The key is defining clear ownership boundaries so each database is the source of truth for its domain, with synchronization handled through event-driven patterns like change data capture.
How do you keep multiple databases in sync?
We used change data capture (CDC) with Debezium to stream changes from MySQL to Elasticsearch. For MongoDB, we used native change streams. This event-driven approach provides eventual consistency with minimal latency and avoids the dual-write problem where application code writes to two databases and one write fails. The CDC consumer reads the database transaction log, converts changes to events, and applies them to downstream databases. If the consumer falls behind or restarts, it replays from the last committed offset, ensuring no changes are lost. The trade-off is eventual consistency: there is a small window (typically milliseconds) where downstream databases lag behind the source of truth.
Does polyglot persistence add too much complexity?
It adds operational complexity that is only justified when a single database has become a genuine bottleneck. For most applications, PostgreSQL handles relational, document, and even basic full-text search workloads adequately. We adopted polyglot persistence when product catalog flexibility, search quality, and workload isolation became bottlenecks that a single MySQL instance could not resolve efficiently. The complexity cost includes operating multiple database clusters, maintaining CDC pipelines, handling cross-database queries at the application layer, and training the team on multiple database technologies. If you can solve your problem with one database, that is almost always the better choice.