Implementing Feature Flags for Progressive Rollout
How we implemented a feature flag system for progressive rollout across web and mobile, with percentage-based targeting, A/B testing integration, and instant kill switches for production incidents.
TL;DR
A custom feature flag service with percentage-based rollout and real-time kill switches let us ship daily with confidence across a micro-frontend platform. The system supported progressive rollout from 1% to 100%, A/B testing readiness through consistent user bucketing, backward-compatible API contracts, and CI/CD integration that tied flag states to deployment pipelines. Flag lifecycle management prevented stale flags from accumulating as tech debt.
The Challenge
We were running a micro-frontend platform with multiple teams shipping features independently. The problem was that "independently" did not mean "safely." A bad deploy from one team could cascade across the platform, and rolling back meant reverting an entire deployment, which could undo other teams' work in the process.
The platform served a multi-tenant enterprise client base, which meant we could not afford "move fast and break things." Each tenant had contractual SLAs, and a production incident affecting the wrong tenant at the wrong time had real business consequences.
We needed a system that let us:
- Ship code to production without immediately exposing it to users
- Gradually roll out features to catch issues early with limited blast radius
- Kill a feature instantly without deploying new code
- Run A/B tests using the same flag infrastructure
- Keep flag evaluation fast on both server-side Node.js and client-side React
I designed and built this feature flag system, including the flag evaluation engine, the management API, and the CI/CD integration points.
The Architecture
Flag Evaluation Engine
The core of the system was a deterministic flag evaluation engine. Given a flag key and a user context, it always returned the same result for the same inputs. This consistency was essential for both user experience (a user should not see a feature appear and disappear on page refreshes) and A/B testing validity.
```typescript
// flag-evaluator.ts
import { createHash } from 'crypto';

interface FlagConfig {
  key: string;
  enabled: boolean;
  rolloutPercentage: number; // 0-100
  allowlist: string[]; // user IDs that always get the feature
  blocklist: string[]; // user IDs that never get the feature
  targetingRules: TargetingRule[]; // attribute-based targeting
  variants?: Variant[]; // for A/B testing
  killSwitch: boolean; // overrides everything
}

interface UserContext {
  userId: string;
  tenantId: string;
  attributes: Record<string, string | number | boolean>;
}

interface FlagResult {
  enabled: boolean;
  variant: string | null;
}

// TargetingRule, Variant, evaluateTargetingRule, and assignVariant are
// defined elsewhere in the module.
function evaluateFlag(flag: FlagConfig, user: UserContext): FlagResult {
  // Kill switch overrides everything — instant off
  if (flag.killSwitch || !flag.enabled) {
    return { enabled: false, variant: null };
  }

  // Blocklist check
  if (flag.blocklist.includes(user.userId)) {
    return { enabled: false, variant: null };
  }

  // Allowlist check — bypass percentage rollout
  if (flag.allowlist.includes(user.userId)) {
    return { enabled: true, variant: assignVariant(flag, user) };
  }

  // Targeting rules — evaluate attribute-based conditions
  if (flag.targetingRules.length > 0) {
    const matched = flag.targetingRules.some(rule =>
      evaluateTargetingRule(rule, user)
    );
    if (!matched) {
      return { enabled: false, variant: null };
    }
  }

  // Percentage-based rollout using deterministic hashing
  const bucket = getUserBucket(flag.key, user.userId);
  const enabled = bucket < flag.rolloutPercentage;
  return {
    enabled,
    variant: enabled ? assignVariant(flag, user) : null,
  };
}

function getUserBucket(flagKey: string, userId: string): number {
  // Deterministic hash ensures same user always gets same bucket
  const hash = createHash('md5')
    .update(`${flagKey}:${userId}`)
    .digest('hex');
  // Convert first 8 hex chars to number, mod 100 for bucket
  const numericHash = parseInt(hash.substring(0, 8), 16);
  return numericHash % 100;
}
```

The MD5 hash approach was deliberately chosen over random assignment. By hashing the combination of flag key and user ID, the same user consistently lands in the same bucket. This means:
- A user who sees the feature at the 15% rollout stage will continue to see it at 25%, 50%, and 100%
- Different flags hash independently, so being in the "early" bucket for one flag does not affect other flags
- No state storage is needed for bucket assignments
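The monotonic-inclusion property is easy to verify in isolation: because a user's bucket is fixed, raising the percentage only ever adds users. A minimal standalone sketch, reusing the same hashing scheme as the evaluator (`isIncluded` is an illustrative helper, not part of the production API):

```typescript
import { createHash } from 'crypto';

// Same bucketing scheme as the evaluator: hash flag key + user ID, mod 100.
function getUserBucket(flagKey: string, userId: string): number {
  const hash = createHash('md5').update(`${flagKey}:${userId}`).digest('hex');
  return parseInt(hash.substring(0, 8), 16) % 100;
}

// A user included at a lower rollout percentage stays included at every
// higher one, because the bucket never changes between evaluations.
function isIncluded(flagKey: string, userId: string, pct: number): boolean {
  return getUserBucket(flagKey, userId) < pct;
}

// Walk the rollout stages for one user: once true, inclusion never reverts.
const stages = [1, 10, 25, 100];
const inclusion = stages.map(pct =>
  isIncluded('enhanced-order-tracking', 'user-42', pct)
);
```

Since buckets range from 0 to 99, everyone is included at 100%, and no per-user assignment ever needs to be stored.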
Progressive Rollout Pipeline
Rollout was not a single percentage slider. It was a structured pipeline with gates:
```typescript
// rollout-pipeline.ts
interface HealthCheck {
  type: 'error-rate' | 'latency-p99' | 'business-metric';
  threshold?: number;
  window?: string;
  name?: string; // for business metrics
  regression?: number; // maximum allowed regression
}

interface RolloutStage {
  percentage: number;
  duration: string; // minimum time at this stage
  healthChecks: HealthCheck[];
  autoAdvance: boolean; // auto-advance if health checks pass
}

const defaultRolloutPipeline: RolloutStage[] = [
  {
    percentage: 1,
    duration: '1h',
    healthChecks: [
      { type: 'error-rate', threshold: 0.01, window: '15m' },
      { type: 'latency-p99', threshold: 500, window: '15m' },
    ],
    autoAdvance: false, // manual approval for first stage
  },
  {
    percentage: 10,
    duration: '4h',
    healthChecks: [
      { type: 'error-rate', threshold: 0.01, window: '30m' },
      { type: 'latency-p99', threshold: 500, window: '30m' },
    ],
    autoAdvance: true,
  },
  {
    percentage: 25,
    duration: '24h',
    healthChecks: [
      { type: 'error-rate', threshold: 0.005, window: '1h' },
      { type: 'latency-p99', threshold: 500, window: '1h' },
      { type: 'business-metric', name: 'conversion-rate', regression: 0.05 },
    ],
    autoAdvance: true,
  },
  {
    percentage: 100,
    duration: '48h',
    healthChecks: [
      { type: 'error-rate', threshold: 0.005, window: '2h' },
    ],
    autoAdvance: false, // manual confirmation for full rollout
  },
];
```

The health check integration was key. At each stage, the system monitored error rates and latency. If error rates spiked above the threshold during a rollout stage, the system would automatically halt advancement and alert the team. The first and final stages required manual approval, creating human checkpoints at the most critical moments.
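The gate logic behind this behavior can be sketched as a pure decision function. The names below (`decide`, `StageState`, `HealthResult`) are illustrative; how the production pipeline polled the metrics backend is not shown in the article:

```typescript
interface HealthResult {
  name: string;
  passed: boolean;
}

interface StageState {
  percentage: number;
  enteredAt: number;     // epoch ms when this stage began
  minDurationMs: number; // minimum dwell time at this stage
  autoAdvance: boolean;
}

type Decision = 'advance' | 'hold' | 'halt';

// Decide what the pipeline should do with the current stage.
// 'halt' means a health check failed: stop advancement and alert the team.
function decide(stage: StageState, checks: HealthResult[], now: number): Decision {
  if (checks.some(c => !c.passed)) return 'halt';
  const dwellElapsed = now - stage.enteredAt >= stage.minDurationMs;
  if (dwellElapsed && stage.autoAdvance) return 'advance';
  return 'hold'; // healthy, but waiting on dwell time or manual approval
}
```

Keeping the decision pure (metrics in, verdict out) makes the gate trivially testable, with all the metric fetching pushed to the edges.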
Client-Side SDK
The React SDK needed to be lightweight (the SDK itself should add almost nothing to the bundle) and fast (no visible flicker while flags resolve):
```typescript
// useFeatureFlag.ts
import { createContext, useContext, useEffect, useState } from 'react';

// FlagResult and fetchFlags come from the shared flag client module.
interface FlagStore {
  flags: Map<string, FlagResult>;
  loading: boolean;
}

const FlagContext = createContext<FlagStore>({
  flags: new Map(),
  loading: true,
});

export function FlagProvider({ children }: { children: React.ReactNode }) {
  const [store, setStore] = useState<FlagStore>({
    flags: new Map(),
    loading: true,
  });

  useEffect(() => {
    // Bootstrap: load all flags for current user on mount
    fetchFlags().then(flags => {
      setStore({ flags, loading: false });
    });

    // Real-time updates via SSE for kill switches
    const eventSource = new EventSource('/api/flags/stream');
    eventSource.addEventListener('flag-update', (event) => {
      const update = JSON.parse(event.data);
      setStore(prev => {
        const next = new Map(prev.flags);
        next.set(update.key, update.result);
        return { ...prev, flags: next };
      });
    });

    return () => eventSource.close();
  }, []);

  return (
    <FlagContext.Provider value={store}>
      {children}
    </FlagContext.Provider>
  );
}

export function useFeatureFlag(key: string): {
  enabled: boolean;
  variant: string | null;
  loading: boolean;
} {
  const { flags, loading } = useContext(FlagContext);
  const result = flags.get(key);
  return {
    enabled: result?.enabled ?? false,
    variant: result?.variant ?? null,
    loading,
  };
}
```

The Server-Sent Events (SSE) connection enabled real-time kill switches. When an operator flipped the kill switch in the management dashboard, the SSE channel pushed the update to all connected clients within seconds, without requiring a page refresh or re-deployment.
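The server side of this channel can be sketched with Node's built-in http module. This is a hypothetical sketch; the production endpoint and its authentication and tenant handling are not shown in the article:

```typescript
import { createServer, ServerResponse } from 'http';

// Format one Server-Sent Events frame. The event name matches the
// 'flag-update' listener registered by the React SDK.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Open connections are tracked so a kill switch fans out to every client.
const clients = new Set<ServerResponse>();

function broadcastFlagUpdate(
  key: string,
  result: { enabled: boolean; variant: string | null }
): void {
  const frame = sseFrame('flag-update', { key, result });
  for (const res of clients) res.write(frame);
}

const server = createServer((req, res) => {
  if (req.url === '/api/flags/stream') {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    });
    clients.add(res);
    req.on('close', () => clients.delete(res));
  } else {
    res.writeHead(404).end();
  }
});
// server.listen(3000) would start accepting SSE subscribers.
```

Because SSE is plain HTTP with a `text/event-stream` content type, browsers reconnect automatically after a dropped connection, which is part of why it was preferred over WebSocket here.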
Backward-Compatible API Contracts
Feature flags in a micro-frontend platform introduce a contract problem. If Team A's MFE enables a feature that changes the API contract, Team B's MFE (which has not adopted the flag yet) will break.
We solved this with versioned API responses:
```typescript
// API controller with flag-aware response shaping
async function getOrderDetails(req: Request, res: Response) {
  const order = await orderService.findById(req.params.id);
  const flags = req.flagContext; // injected by middleware

  // Base response — always present
  const response: OrderResponse = {
    id: order.id,
    status: order.status,
    items: order.items.map(formatItem),
    total: order.total,
  };

  // New fields are additive, never replace existing ones
  if (flags.isEnabled('enhanced-order-tracking')) {
    response.tracking = {
      carrier: order.trackingCarrier,
      trackingNumber: order.trackingNumber,
      estimatedDelivery: order.estimatedDelivery,
      events: order.trackingEvents.map(formatTrackingEvent),
    };
  }

  // Deprecated fields remain until all consumers migrate
  if (!flags.isEnabled('remove-legacy-status-field')) {
    response.statusText = legacyStatusMap[order.status];
  }

  res.json(response);
}
```

The contract rule was: new fields are always additive. Existing fields could only be removed through a separate deprecation flag that was rolled out after all consumers had migrated.
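The additive rule can also be encoded in the response type itself: flag-gated fields are optional, so consumers compiled against the base shape keep working whether or not a flag is on for a given request. A sketch (the element shapes for `items` and `events` are assumptions, not taken from the article):

```typescript
interface TrackingInfo {
  carrier: string;
  trackingNumber: string;
  estimatedDelivery: string;
  events: Array<{ timestamp: string; description: string }>;
}

// Flag-gated fields are optional members: present only when the
// corresponding flag is enabled for the requesting user.
interface OrderResponse {
  id: string;
  status: string;
  items: Array<{ sku: string; quantity: number; price: number }>;
  total: number;
  tracking?: TrackingInfo; // added behind 'enhanced-order-tracking'
  statusText?: string;     // deprecated; removed via 'remove-legacy-status-field'
}

// A base response with no flag-gated fields is still a valid OrderResponse.
const base: OrderResponse = { id: 'o-1', status: 'shipped', items: [], total: 0 };
```

Consumers that have not adopted a flag simply never read the optional fields, so enabling the flag for other tenants cannot break them.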
CI/CD Integration
Feature flags were tied into the deployment pipeline at two points:
```yaml
# .github/workflows/deploy.yml
steps:
  - name: Validate flag references
    run: |
      # Scan codebase for flag keys and validate they exist in flag service
      npx flag-lint check --config .flagrc.json

  - name: Deploy to production
    run: npm run deploy

  - name: Set rollout stage
    if: env.FLAG_KEY != ''
    run: |
      curl -X POST "$FLAG_SERVICE_URL/api/flags/$FLAG_KEY/rollout" \
        -H "Authorization: Bearer $FLAG_TOKEN" \
        -d '{"stage": "canary", "percentage": 1}'
```

The flag-lint step was a custom script that parsed the codebase for all useFeatureFlag() and flags.isEnabled() calls, extracted the flag keys, and validated them against the flag service API. This caught typos in flag keys before they reached production and identified references to flags that had been archived.
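A simplified version of that extraction pass might look like the following (a regex sketch with illustrative names; the actual flag-lint internals are not shown in the article, and a production linter would more likely walk the AST):

```typescript
// Extract flag keys referenced via useFeatureFlag('...') or flags.isEnabled('...').
const FLAG_CALL = /(?:useFeatureFlag|flags\.isEnabled)\(\s*['"]([\w-]+)['"]\s*\)/g;

function extractFlagKeys(source: string): string[] {
  const keys = new Set<string>();
  for (const match of source.matchAll(FLAG_CALL)) keys.add(match[1]);
  return [...keys].sort();
}

// Compare referenced keys against the set the flag service knows about;
// anything left over is a typo or a reference to an archived flag.
function findUnknownKeys(source: string, knownKeys: Set<string>): string[] {
  return extractFlagKeys(source).filter(k => !knownKeys.has(k));
}
```

Running this across the repo and failing the build when `findUnknownKeys` returns anything gives the CI gate described above.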
Flag Lifecycle Management
Stale flags are tech debt. We enforced a lifecycle:
```typescript
// Flag metadata includes lifecycle tracking
interface FlagMetadata {
  key: string;
  owner: string; // team responsible
  createdAt: string;
  expectedCleanupDate: string; // when the flag should be removed
  status: 'active' | 'fully-rolled-out' | 'archived';
  jiraTicket: string; // cleanup ticket auto-created
}
```

When a flag reached 100% rollout and passed its stabilization period, the system automatically:
- Created a Jira ticket for flag cleanup
- Assigned it to the owning team
- Started a 30-day countdown
- Sent weekly reminders until the flag references were removed from code
- Archived the flag after cleanup was confirmed
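The countdown step above can be sketched as a small decision function. This is a hypothetical sketch: `FlagLifecycle` is a reduced shape with illustrative fields, and the real system also created Jira tickets and sent the reminders themselves:

```typescript
interface FlagLifecycle {
  key: string;
  owner: string;
  status: 'active' | 'fully-rolled-out' | 'archived';
  fullyRolledOutAt?: number; // epoch ms when the flag hit 100%
}

const CLEANUP_WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // the 30-day countdown

type CleanupAction = 'none' | 'remind' | 'escalate';

// Decide whether a flag needs a cleanup nudge. 'escalate' fires once the
// 30-day window has elapsed without the flag being archived.
function cleanupAction(flag: FlagLifecycle, now: number): CleanupAction {
  if (flag.status !== 'fully-rolled-out' || flag.fullyRolledOutAt === undefined) {
    return 'none';
  }
  const elapsed = now - flag.fullyRolledOutAt;
  return elapsed > CLEANUP_WINDOW_MS ? 'escalate' : 'remind';
}
```

A scheduled job running this check over all flags is enough to keep stale flags visible until someone deletes the dead code paths.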
This lifecycle management prevented the common failure mode where feature flags accumulate indefinitely, creating a combinatorial explosion of possible states that no one can reason about.
Key Decisions & Trade-offs
Custom build vs. LaunchDarkly or Unleash. Third-party solutions are excellent for most teams. We built custom because we needed deep integration with our multi-tenant architecture (flag evaluation needed tenant context), we wanted flag evaluation on both server and React client without SDK overhead, and we needed the SSE-based kill switch mechanism. The trade-off was engineering time: roughly six weeks of focused development for the initial system.
Deterministic hashing vs. stored assignments. Hashing is stateless and fast but means you cannot move a specific user between buckets. Stored assignments give you per-user control but require a database lookup on every flag evaluation. We chose hashing because the performance characteristics matched our needs and the allowlist/blocklist covered the cases where we needed per-user overrides.
SSE vs. WebSocket for real-time updates. SSE is simpler (unidirectional, auto-reconnect, works through most proxies) and sufficient for our use case where updates only flow from server to client. WebSocket would have been over-engineering for a notification channel.
Auto-advance vs. manual approval at every stage. Full manual approval slows down rollouts and creates bottlenecks when the approver is unavailable. Full auto-advance risks advancing past issues that automated health checks do not catch. The hybrid approach (manual for first and last stages, auto for middle stages with health gates) balanced safety and velocity.
Results & Outcomes
The feature flag system changed how teams shipped code. Deployments became non-events because deploying code and enabling a feature were decoupled. Teams shipped to production multiple times per day, with new features sitting behind flags until they were ready for rollout.
The kill switch capability was used several times during production incidents. In each case, the problematic feature was disabled within seconds, restoring service while the team investigated and fixed the underlying issue. Previously, this would have required a rollback deployment that took minutes and potentially reverted other changes.
A/B testing became accessible without additional infrastructure. Product managers could define experiments using the flag management dashboard, and the deterministic bucketing ensured statistically valid results.
Flag lifecycle management kept the system clean. The automated cleanup reminders and Jira ticket creation prevented stale flags from accumulating.
What I'd Do Differently
Add a flag dependency graph from the start. Some flags had implicit dependencies on other flags. Flag A assumed Flag B was enabled, but there was no way to express or enforce this in the system. A dependency graph with validation would have caught invalid flag combinations before they caused issues in production.
Implement flag evaluation caching on the client more aggressively. The initial implementation re-evaluated flags on every render cycle via the context provider. Adding a memoization layer based on user context hash would have reduced unnecessary re-renders in flag-heavy UIs.
Build the management dashboard first. We built the evaluation engine and API first, managing flags through direct API calls and scripts. The dashboard came later. In hindsight, the dashboard should have been the first deliverable because non-engineering stakeholders (product managers, support) needed to view and manage flags without developer involvement.
Define flag naming conventions more strictly. We ended up with inconsistent flag names like new-checkout-flow, enable_dark_mode, and FEATURE_enhanced_search. A strict naming convention enforced by the flag-lint tool from day one would have prevented this inconsistency.
FAQ
What is progressive rollout with feature flags?
Progressive rollout gradually exposes a new feature to increasing percentages of users, starting at 1%, then 10%, 25%, and finally 100%. At each stage, the system monitors health metrics like error rates, latency, and business KPIs. If any metric regresses beyond a configured threshold, the rollout halts automatically. This catches issues early with a limited blast radius. The key enabler is deterministic user bucketing: once a user is included in a rollout stage, they remain included as the percentage increases, which prevents the disorienting experience of a feature appearing and disappearing.
Should you build or buy a feature flag system?
For most teams, a managed service like LaunchDarkly, Unleash, or Flagsmith is the right choice. They handle the operational burden, provide SDKs for multiple platforms, and include management dashboards out of the box. We built a custom solution because we needed deep integration with our multi-tenant architecture and wanted flag evaluation on both server and React Native client without SDK size overhead. The build cost was roughly six weeks of focused engineering time, plus ongoing maintenance. Unless you have specific requirements that managed services cannot accommodate, buying is almost always more cost-effective.
How do feature flags integrate with A/B testing?
Feature flags control which variant a user sees, and the A/B testing layer tracks conversion metrics per variant. By assigning users to consistent flag buckets using hashed user IDs, you get statistically valid experiments without separate A/B testing infrastructure. The flag configuration defines the variants and their traffic split, and the evaluation engine returns both whether the flag is enabled and which variant the user is assigned to. Analytics events are tagged with the variant identifier, which lets you compare conversion rates, engagement metrics, and other KPIs across variants in your existing analytics platform.
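The variant-assignment step referenced by the evaluation engine can be sketched as weighted bucketing over a second, salted hash, so variant choice stays independent of the rollout bucket. This is a hypothetical sketch with a flattened signature, not the production `assignVariant`:

```typescript
import { createHash } from 'crypto';

interface Variant {
  name: string;
  weight: number; // weights are assumed to sum to 100
}

// Salt the hash with ':variant' so the variant bucket is independent of the
// rollout bucket: being an "early" user does not bias which variant you get.
function variantBucket(flagKey: string, userId: string): number {
  const hash = createHash('md5')
    .update(`${flagKey}:variant:${userId}`)
    .digest('hex');
  return parseInt(hash.substring(0, 8), 16) % 100;
}

function assignVariant(
  flagKey: string,
  userId: string,
  variants: Variant[]
): string | null {
  if (variants.length === 0) return null;
  const bucket = variantBucket(flagKey, userId);
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.name;
  }
  return variants[variants.length - 1].name; // guard against rounding gaps
}
```

Because the assignment is a pure function of flag key and user ID, the analytics pipeline can tag every event with the returned variant name and trust that the same user never crosses groups mid-experiment.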