Architecting GraphQL Gateways
Designing a unified API layer using Apollo Federation to aggregate disparate microservices into a single cohesive graph.
TL;DR
A GraphQL Gateway aggregates multiple backend microservices into a single unified API endpoint. Apollo Federation is the dominant pattern for this, letting independent teams own their subgraphs while a gateway composes them into one supergraph and routes each query to the right services. This guide covers federation architecture, schema design, solving the N+1 problem with DataLoader, caching strategies at the gateway level, and error handling patterns that keep your graph resilient when downstream services fail.
Why This Matters
The transition from monolith to microservices creates an immediate problem for frontend teams. A single page that previously made one API call to the monolith now needs data from five or six different services -- users from the identity service, orders from the commerce service, recommendations from the ML service, and so on. The frontend becomes an orchestration layer, making waterfall requests and stitching JSON responses together in the browser.
This is backwards. The backend should handle data composition, not the client. A GraphQL Gateway solves this by giving clients a single endpoint that understands the shape of the data they need and distributes queries to the right backend services automatically.
The alternative -- a Backend-for-Frontend (BFF) REST layer -- works for small teams but scales poorly as teams and views multiply. Every new feature requires changes to both the BFF and the frontend. With a GraphQL Gateway, the frontend team can query any field that already exists in the graph without waiting for a BFF team to expose it.
How It Works
Schema Stitching vs. Apollo Federation
There are two main approaches to building a GraphQL Gateway, and the distinction matters.
Schema Stitching was the original approach. The gateway fetches schemas from each service, merges them into a single schema, and resolves conflicts manually. It works, but it creates tight coupling between the gateway and the services. Every schema change requires gateway redeployment.
Apollo Federation inverts the relationship. Each service declares its own subgraph and explicitly marks which types it extends from other services. The gateway uses a composition algorithm to build the supergraph at startup.
# Users Subgraph
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

type Query {
  user(id: ID!): User
  users: [User!]!
}

# Orders Subgraph
type Order @key(fields: "id") {
  id: ID!
  total: Float!
  items: [OrderItem!]!
}

# Extending User from the Users subgraph
extend type User @key(fields: "id") {
  id: ID! @external
  orders: [Order!]!
}

The @key directive tells the gateway which fields uniquely identify an entity across services. The @external directive marks fields that are owned by another subgraph but needed for resolution. In Federation 2, the extend keyword and @external on key fields are optional; they are kept here to make ownership explicit.
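On the resolver side, a key lets the gateway hand a subgraph a small reference (the type name plus the key fields) and ask for the rest. The sketch below shows that idea with plain resolver maps and no Apollo dependency. `__resolveReference` is the resolver name Apollo subgraph libraries look for; the in-memory arrays, the `userId` field on orders, and the sample data are assumptions made for illustration.

```typescript
// Hypothetical in-memory data standing in for each subgraph's database.
interface User { id: string; name: string; email: string }
interface Order { id: string; total: number; userId: string }

const users: User[] = [
  { id: "u1", name: "Ada", email: "ada@example.com" },
  { id: "u2", name: "Grace", email: "grace@example.com" },
];
const orders: Order[] = [
  { id: "o1", total: 42.5, userId: "u1" },
  { id: "o2", total: 10, userId: "u1" },
];

// A reference is the entity's __typename plus its @key fields.
type UserRef = { __typename: "User"; id: string };

// Users subgraph: turn a reference into the full entity.
const usersResolvers = {
  User: {
    __resolveReference: (ref: UserRef): User | undefined =>
      users.find((u) => u.id === ref.id),
  },
};

// Orders subgraph: contribute User.orders knowing only the user's id.
const ordersResolvers = {
  User: {
    orders: (user: UserRef): Order[] =>
      orders.filter((o) => o.userId === user.id),
  },
};

const ada = usersResolvers.User.__resolveReference({ __typename: "User", id: "u1" });
const adaOrders = ordersResolvers.User.orders({ __typename: "User", id: "u1" });
```

Neither subgraph imports the other. The only thing they share is the key, which is why the two teams can deploy independently.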
Gateway Architecture
The gateway sits between clients and subgraphs. When a query arrives, the gateway builds a query plan -- a directed acyclic graph of fetches to the underlying services.
import { ApolloGateway, IntrospectAndCompose } from '@apollo/gateway';
import { ApolloServer } from '@apollo/server';

const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: 'users', url: 'http://users-service:4001/graphql' },
      { name: 'orders', url: 'http://orders-service:4002/graphql' },
      { name: 'products', url: 'http://products-service:4003/graphql' },
    ],
  }),
});

const server = new ApolloServer({ gateway });

In production, you would use a managed federation approach where subgraphs publish their schemas to a registry (like Apollo Studio), and the gateway fetches the composed supergraph from the registry. This decouples deployment -- subgraphs can update their schemas without restarting the gateway.
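To make the query plan idea from this section more concrete, here is a small sketch that models a plan as nested fetch, sequence, and parallel nodes and flattens it into ordered waves of subgraph calls. The node shapes and the `toWaves` helper are simplified assumptions for illustration, not the gateway's real data structures, and the wave model is a conservative approximation of parallel execution.

```typescript
// Simplified plan nodes, loosely modeled on the fetch/sequence/parallel
// shape of a gateway query plan. Not Apollo's actual data structure.
type PlanNode =
  | { kind: "Fetch"; service: string }
  | { kind: "Sequence"; nodes: PlanNode[] }
  | { kind: "Parallel"; nodes: PlanNode[] };

// Flatten a plan into ordered "waves": every service in a wave can be
// called concurrently, and a wave starts only after the previous one ends.
function toWaves(node: PlanNode): string[][] {
  switch (node.kind) {
    case "Fetch":
      return [[node.service]];
    case "Sequence": {
      // Children run one after another, so their waves are concatenated.
      const out: string[][] = [];
      for (const child of node.nodes) out.push(...toWaves(child));
      return out;
    }
    case "Parallel": {
      // Children run side by side, so their waves are merged index by index.
      const merged: string[][] = [];
      for (const child of node.nodes) {
        toWaves(child).forEach((wave, i) => {
          merged[i] = [...(merged[i] ?? []), ...wave];
        });
      }
      return merged;
    }
  }
}

// Orders first, then users and products together once the order ids are known.
const plan: PlanNode = {
  kind: "Sequence",
  nodes: [
    { kind: "Fetch", service: "orders" },
    {
      kind: "Parallel",
      nodes: [
        { kind: "Fetch", service: "users" },
        { kind: "Fetch", service: "products" },
      ],
    },
  ],
};

const waves = toWaves(plan);
```

The number of waves is a rough lower bound on round trips for a query, which is one reason deeply chained entity lookups across many subgraphs tend to be slow.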
Solving the N+1 Problem with DataLoader
The N+1 problem is the biggest performance trap in GraphQL. Consider this query:
query {
  orders {
    id
    total
    user {
      name
    }
  }
}

Without optimization, resolving 100 orders means 1 query for the order list plus 100 individual user queries, one per order. DataLoader batches and deduplicates these into a single bulk fetch:
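The batching idea itself needs no library. The sketch below is a teaching version, assuming a hypothetical `TinyLoader` class: it collects every key requested in the same tick, removes duplicates, and calls the batch function once. It is not how the dataloader package is implemented, and it leaves out per-key caching and error values.

```typescript
type Deferred<V> = {
  promise: Promise<V>;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
};

// Minimal batching loader (hypothetical, for illustration only).
class TinyLoader<K, V> {
  private queue = new Map<K, Deferred<V>>();
  private batchFn: (keys: readonly K[]) => Promise<V[]>;

  constructor(batchFn: (keys: readonly K[]) => Promise<V[]>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    const queued = this.queue.get(key);
    if (queued) return queued.promise; // same key asked twice: reuse the promise

    let resolve!: (value: V) => void;
    let reject!: (reason: unknown) => void;
    const promise = new Promise<V>((res, rej) => {
      resolve = res;
      reject = rej;
    });

    // The first key of a tick schedules one flush for the whole batch.
    if (this.queue.size === 0) {
      Promise.resolve().then(() => this.flush());
    }
    this.queue.set(key, { promise, resolve, reject });
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = new Map();
    const keys = Array.from(batch.keys());
    try {
      const values = await this.batchFn(keys);
      // Results must come back in the same order as the keys.
      keys.forEach((key, i) => batch.get(key)!.resolve(values[i]));
    } catch (err) {
      batch.forEach((deferred) => deferred.reject(err));
    }
  }
}

const batchCalls: string[][] = [];
const tinyUserLoader = new TinyLoader<string, { id: string; name: string }>(async (ids) => {
  batchCalls.push([...ids]); // record each bulk fetch
  return ids.map((id) => ({ id, name: `User ${id}` }));
});

// Three orders that reference two distinct users produce one bulk fetch.
const loaded = Promise.all(["u1", "u2", "u1"].map((id) => tinyUserLoader.load(id)));
```

The dataloader package follows the same contract with more care, and looks like this in a resolver: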
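import DataLoader from 'dataloader';

// Batch function: receives every key requested during one tick
const userLoader = new DataLoader(async (userIds: readonly string[]) => {
  const users = await userService.getUsersByIds(userIds);
  // DataLoader requires results in the same order as the input keys
  const usersById = new Map(users.map(u => [u.id, u] as const));
  // Return an Error for missing keys so only that one load() rejects
  return userIds.map(id => usersById.get(id) ?? new Error(`User ${id} not found`));
});

const resolvers = {
  Order: {
    user: (order) => userLoader.load(order.userId),
  },
};

The module-level loader above keeps the example short. In a real server, DataLoader instances must be created per request, because each loader caches results by key and a shared instance would leak data between users. Create them in the context factory: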
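import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';

const server = new ApolloServer({ typeDefs, resolvers });

// Apollo Server 4 takes the context factory at startup, not in the constructor
const { url } = await startStandaloneServer(server, {
  context: async ({ req }) => ({
    // Fresh loaders for every request; resolvers read them from context.loaders
    loaders: {
      user: new DataLoader(batchUsers),
      product: new DataLoader(batchProducts),
    },
  }),
});

Caching at the Gateway Level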
GraphQL caching is more nuanced than REST caching because queries are POST requests with dynamic shapes. There are several strategies:
Response caching stores full query results keyed by the query string and variables. This works well for public, unauthenticated queries.
import responseCachePlugin from '@apollo/server-plugin-response-cache';

const server = new ApolloServer({
  plugins: [
    responseCachePlugin({
      // Identifies the session so privately scoped responses are cached per user
      sessionId: async (ctx) =>
        ctx.request.http?.headers.get('authorization') || null,
    }),
  ],
});

Partial query caching uses cache hints on individual fields:
type Product @cacheControl(maxAge: 3600) {
  id: ID!
  name: String!
  price: Float! @cacheControl(maxAge: 60)
  inventory: Int! @cacheControl(maxAge: 0)
}

Persisted queries reduce bandwidth by sending a query hash instead of the full query string, and they enable CDN caching by converting POST requests to GET requests with cacheable URLs.
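Returning to the field-level hints on Product: a response can only be cached for as long as its shortest-lived field, so the fields a client selects decide the effective lifetime. The sketch below works through that arithmetic for the hints shown earlier. The `responseMaxAge` helper is a hypothetical simplification that ignores cache scope and the default rules for unhinted root fields.

```typescript
// Cache hints copied from the Product type above, in seconds.
const TYPE_MAX_AGE = 3600; // @cacheControl on the Product type
const FIELD_MAX_AGE: Record<string, number> = {
  price: 60,
  inventory: 0,
};

// Simplified rule: the response lives as long as its shortest-lived field.
// Fields without their own hint inherit the hint on the type.
function responseMaxAge(selectedFields: string[]): number {
  return selectedFields.reduce(
    (lowest, field) => Math.min(lowest, FIELD_MAX_AGE[field] ?? TYPE_MAX_AGE),
    TYPE_MAX_AGE,
  );
}

const catalogPage = responseMaxAge(["id", "name"]); // 3600
const pricedListing = responseMaxAge(["id", "name", "price"]); // 60
const stockCheck = responseMaxAge(["name", "price", "inventory"]); // 0, not cacheable
```

One practical consequence is that a single volatile field such as inventory makes the whole response uncacheable, so it can pay to fetch volatile fields in a separate query.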
Practical Implementation
A production gateway needs health checks, graceful degradation, and observability. Here is a more complete setup:
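import { ApolloGateway, RemoteGraphQLDataSource } from '@apollo/gateway';

class AuthenticatedDataSource extends RemoteGraphQLDataSource {
  constructor({ name, url }) {
    super({ url });
    this.name = name; // subgraph name, used as a metric label below
  }

  willSendRequest({ request, context }) {
    // Forward auth headers to subgraphs. The context can be empty for
    // requests the gateway makes on its own, such as schema loading.
    if (context?.authToken) {
      request.http.headers.set('authorization', context.authToken);
    }
    if (context?.requestId) {
      request.http.headers.set('x-request-id', context.requestId);
    }
  }

  didReceiveResponse({ response, context }) {
    // Capture subgraph response metrics (time since the operation started).
    // context.startTime and the metrics client are assumed to be set up elsewhere.
    const duration = Date.now() - context.startTime;
    metrics.record('subgraph_latency', duration, {
      subgraph: this.name,
    });
    return response;
  }
}

const gateway = new ApolloGateway({
  // supergraphSdl: configured as in the earlier gateway example
  buildService({ name, url }) {
    return new AuthenticatedDataSource({ name, url });
  },
});

Common Pitfalls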
Overly granular subgraphs. Splitting every database table into its own subgraph creates unnecessary network hops. Group subgraphs by business domain, not by data model. A "commerce" subgraph that owns orders, carts, and payments is better than three separate services.
Ignoring query complexity. Without limits, a single deeply nested query can trigger thousands of downstream requests. Use query depth limiting and cost analysis:
import { createComplexityLimitRule } from 'graphql-validation-complexity';

const server = new ApolloServer({
  validationRules: [createComplexityLimitRule(1000)],
});

No timeout on subgraph calls. If one subgraph hangs, the entire query hangs. Set aggressive timeouts and return partial data when possible.
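As a rough sketch of the timeout-plus-partial-data pattern, the helpers below wrap a downstream call in a deadline and turn a failure into a null field with an error entry instead of failing the whole query. The names `withTimeout` and `resolveOptional`, the result shape, and the simulated calls are all assumptions for illustration; this shows the pattern only, not gateway configuration.

```typescript
// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

interface PartialResult<T> {
  data: T | null;
  errors: string[];
}

// For a non-critical field: on failure or timeout, return null plus an
// error entry so the rest of the response can still be served.
async function resolveOptional<T>(
  work: Promise<T>,
  ms: number,
  label: string,
): Promise<PartialResult<T>> {
  try {
    return { data: await withTimeout(work, ms, label), errors: [] };
  } catch (err) {
    return { data: null, errors: [err instanceof Error ? err.message : String(err)] };
  }
}

// Simulated downstream calls: one answers quickly, one never answers.
const fast = new Promise<string[]>((resolve) => setTimeout(() => resolve(["rec-1"]), 5));
const hung = new Promise<string[]>(() => {}); // never settles, like a hung subgraph

const results = Promise.all([
  resolveOptional(fast, 100, "recommendations"),
  resolveOptional(hung, 20, "reviews"),
]);
```

This only works for fields the schema declares nullable. A non-null field that fails propagates the error up to its nearest nullable parent, so which fields are nullable is part of the resilience design.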
Leaking internal schema details. Your subgraph schemas are an internal implementation detail. The composed supergraph is your public API. Use @inaccessible to hide fields that should not be exposed to clients.
Skipping the schema registry. Without a registry, you cannot validate that a subgraph schema change will not break composition. Apollo Studio and GraphQL Hive both provide composition checks in CI pipelines.
When to Use (and When Not To)
A GraphQL Gateway is the right choice when:
- Multiple frontend teams consume data from multiple backend services
- You need a unified API for mobile, web, and third-party consumers
- Teams need to iterate on their schemas independently
- You have complex data requirements that span multiple domains
Skip the gateway when:
- You have a single backend service (just expose GraphQL directly)
- Your API consumers are all internal and you can coordinate easily
- Your data model is simple and REST serves it well
- Your team is small and the operational overhead is not justified
Consider alternatives when:
- You need real-time data (GraphQL subscriptions through a gateway add complexity -- consider a dedicated WebSocket service)
- Your services are not HTTP-based (gRPC services need a translation layer)
FAQ
What is a GraphQL Gateway?
A GraphQL Gateway acts as a reverse proxy that understands your data requirements, aggregating multiple backend microservices into a single cohesive API endpoint that clients can query.
How does Apollo Federation work?
Federation allows different backend teams to build independent GraphQL APIs (subgraphs), while a gateway composes them into a single supergraph and distributes each client query to the appropriate services.
When should you use a GraphQL Gateway over REST?
Use a GraphQL Gateway when your frontend needs data from multiple microservices for a single view, as it eliminates the need for multiple REST calls and prevents over-fetching of data.