WebRTC Architecture: Signaling, ICE, and Peer Connections
A deep dive into WebRTC architecture covering signaling servers, ICE framework, STUN/TURN, SDP negotiation, and peer connection lifecycle.
WebRTC (Web Real-Time Communication) enables direct peer-to-peer audio, video, and data transfer between browsers without requiring plugins or native applications. However, establishing that direct connection is far more complex than most developers realize. Before any media flows between peers, a sophisticated dance of signaling, network discovery, and capability negotiation must occur. This guide breaks down each layer of the WebRTC architecture so you can build reliable real-time applications.
TL;DR
WebRTC uses three core mechanisms to establish peer-to-peer connections: a signaling server for exchanging session descriptions (SDP) and connection candidates, the ICE framework (with STUN/TURN servers) for NAT traversal, and the RTCPeerConnection API for managing media streams and data channels. Understanding how these pieces interact is essential for building production-grade real-time communication systems.
Why This Matters
Real-time communication has become a baseline expectation across industries. Video conferencing, telehealth, live customer support, collaborative editing, and multiplayer gaming all depend on low-latency peer-to-peer connections. WebRTC is the technology that makes this possible natively in browsers, but its architecture has enough moving parts that misunderstanding any single component can result in connections that fail silently for a subset of users—particularly those behind corporate firewalls or symmetric NATs.
If you are building anything that requires real-time media or data transfer in the browser, understanding WebRTC architecture is not optional—it is foundational.
How It Works
The Signaling Server
WebRTC itself does not define a signaling protocol. It leaves that choice to the developer. The signaling server is responsible for two critical tasks: helping peers discover each other and relaying the metadata required to establish a connection.
The metadata exchanged during signaling includes:
- SDP (Session Description Protocol) offers and answers: describing media capabilities, codecs, and connection parameters
- ICE candidates: potential network paths for the connection
You can implement signaling over WebSockets, HTTP long polling, Server-Sent Events, or even manual copy-paste for debugging. Here is a basic signaling flow using WebSockets:
// Signaling server (Node.js with ws)
import { WebSocketServer, WebSocket } from 'ws';
const wss = new WebSocketServer({ port: 8080 });
const rooms = new Map<string, Set<WebSocket>>();
wss.on('connection', (ws) => {
  let currentRoom: string | null = null;
  ws.on('message', (data) => {
    let message: any;
    try {
      message = JSON.parse(data.toString());
    } catch {
      return; // ignore malformed input instead of crashing the server
    }
    switch (message.type) {
      case 'join': {
        const room = String(message.room);
        // Leave the previous room, if any, before joining a new one
        if (currentRoom) rooms.get(currentRoom)?.delete(ws);
        currentRoom = room;
        if (!rooms.has(room)) {
          rooms.set(room, new Set());
        }
        rooms.get(room)!.add(ws);
        break;
      }
      case 'offer':
      case 'answer':
      case 'ice-candidate':
        // Relay to all other peers in the room
        if (currentRoom && rooms.has(currentRoom)) {
          rooms.get(currentRoom)!.forEach((peer) => {
            if (peer !== ws && peer.readyState === WebSocket.OPEN) {
              peer.send(JSON.stringify(message));
            }
          });
        }
        break;
    }
  });
  ws.on('close', () => {
    const peers = currentRoom ? rooms.get(currentRoom) : undefined;
    if (currentRoom && peers) {
      peers.delete(ws);
      // Drop empty rooms so the map does not grow forever
      if (peers.size === 0) rooms.delete(currentRoom);
    }
  });
});
The signaling server never sees the actual media. It is purely a coordination mechanism.
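Because the server relays whatever clients send, it is worth validating messages before acting on them. The sketch below defines message shapes that mirror the fields used in this guide's examples (`join` with `room`, `offer`/`answer` with `sdp`, `ice-candidate` with `candidate`); the `SignalMessage` type and `parseSignal` helper are hypothetical names, not part of `ws` or any other library.

```typescript
// Hypothetical message shapes matching the fields used in this guide
type IceCandidateInit = {
  candidate?: string;
  sdpMid?: string | null;
  sdpMLineIndex?: number | null;
};

type SignalMessage =
  | { type: 'join'; room: string }
  | { type: 'offer' | 'answer'; sdp: string }
  | { type: 'ice-candidate'; candidate: IceCandidateInit };

// Validate untrusted input; returns null instead of throwing so a
// misbehaving client cannot crash the signaling server.
function parseSignal(raw: string): SignalMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // malformed JSON
  }
  if (typeof value !== 'object' || value === null) return null;
  const { type, room, sdp, candidate } = value as {
    type?: unknown;
    room?: unknown;
    sdp?: unknown;
    candidate?: unknown;
  };
  if (type === 'join' && typeof room === 'string' && room.length > 0) {
    return { type, room };
  }
  if ((type === 'offer' || type === 'answer') && typeof sdp === 'string') {
    return { type, sdp };
  }
  if (type === 'ice-candidate' && typeof candidate === 'object' && candidate !== null) {
    return { type, candidate: candidate as IceCandidateInit };
  }
  return null; // unknown type or missing fields
}
```

On the server, `parseSignal(data.toString())` could stand in for the bare `JSON.parse` call, returning early on `null`. The checks are deliberately shallow: they confirm the message structure, not that the SDP or candidate strings are well-formed.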
The SDP Offer/Answer Model
Before peers can exchange media, they must agree on what media to send and how to encode it. This negotiation happens through SDP. The initiating peer creates an offer, and the receiving peer responds with an answer.
An SDP message contains information about:
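- Media types (audio, video, data)
- Codec preferences and parameters
- DTLS certificate fingerprints (WebRTC derives the SRTP keys from the DTLS handshake; the keys themselves never appear in SDP)
- ICE credentials (username fragment and password)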
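To see these fields in a real offer, you can log `offer.sdp` and scan its attribute lines. The following is a deliberately naive, line-based sketch; `summarizeSdp` is a hypothetical helper, and production code would use a proper SDP parser rather than prefix matching.

```typescript
// Summary of a few WebRTC-relevant SDP lines
interface SdpSummary {
  media: string[];        // from m= lines, e.g. "audio", "video", "application"
  codecs: string[];       // from a=rtpmap lines, e.g. "opus/48000/2"
  fingerprints: string[]; // from a=fingerprint lines (DTLS certificate hashes)
  iceUfrags: string[];    // from a=ice-ufrag lines
}

function summarizeSdp(sdp: string): SdpSummary {
  const summary: SdpSummary = { media: [], codecs: [], fingerprints: [], iceUfrags: [] };
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      // m=<media> <port> <proto> <formats...>
      summary.media.push(line.slice(2).split(' ')[0]);
    } else if (line.startsWith('a=rtpmap:')) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      summary.codecs.push(line.split(' ')[1]);
    } else if (line.startsWith('a=fingerprint:')) {
      summary.fingerprints.push(line.slice('a=fingerprint:'.length));
    } else if (line.startsWith('a=ice-ufrag:')) {
      summary.iceUfrags.push(line.slice('a=ice-ufrag:'.length));
    }
  }
  return summary;
}
```

For example, calling `summarizeSdp(offer.sdp ?? '')` right after `createOffer()` shows which codecs and DTLS fingerprint the offer advertises.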
// Initiating peer creates an offer.
// `configuration` is the RTCConfiguration shown in the ICE section below;
// `signalingChannel` is a WebSocket connected to the signaling server.
const peerConnection = new RTCPeerConnection(configuration);
// Add local media tracks
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true,
});
stream.getTracks().forEach((track) => {
  peerConnection.addTrack(track, stream);
});
// Create and set local description
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
// Send offer through signaling server
signalingChannel.send(JSON.stringify({
  type: 'offer',
  sdp: offer.sdp,
}));
On the receiving side:
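// Receiving peer handles the offer. addEventListener (rather than assigning
// onmessage) lets other handlers, such as the ICE candidate handler, coexist.
signalingChannel.addEventListener('message', async (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'offer') {
    await peerConnection.setRemoteDescription(
      new RTCSessionDescription({ type: 'offer', sdp: message.sdp })
    );
    // For two-way media, add this peer's own tracks here, before createAnswer()
    const answer = await peerConnection.createAnswer();
    await peerConnection.setLocalDescription(answer);
    signalingChannel.send(JSON.stringify({
      type: 'answer',
      sdp: answer.sdp,
    }));
  }
});
When the answer reaches the initiating peer, it passes the answer to setRemoteDescription() to complete the exchange.
The ICE Framework and NAT Traversal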
Most devices sit behind NATs (Network Address Translators), which means their local IP addresses are not directly reachable from the internet. The ICE (Interactive Connectivity Establishment) framework solves this by discovering all possible network paths between peers and selecting the best one.
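ICE gathers three types of local candidates (a fourth type, peer-reflexive, can be discovered later during connectivity checks):
- Host candidates: the device's own interface addresses (work when both peers are on the same network or one peer is directly reachable; modern browsers often mask these behind mDNS .local names)
- Server-reflexive candidates: the public IP and port as observed by a STUN server (usually work unless the NAT assigns a different mapping for each destination, as symmetric NATs do)
- Relay candidates: an address allocated on a TURN server, which relays traffic between peers (works when direct connectivity fails entirely)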
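Each gathered candidate arrives as an SDP attribute string whose `typ` field names its type. Browsers already expose this as `RTCIceCandidate.type`, but parsing the string is handy when inspecting candidates in server-side logs. A minimal sketch, assuming the standard ICE candidate-attribute field order (`parseCandidate` is a hypothetical helper):

```typescript
// Candidate types defined by ICE
type CandidateType = 'host' | 'srflx' | 'prflx' | 'relay';

interface ParsedCandidate {
  protocol: string; // "udp" or "tcp"
  address: string;  // IP address or mDNS hostname
  port: number;
  type: CandidateType;
}

// Expected layout (optionally prefixed with "a="):
// candidate:<foundation> <component> <protocol> <priority> <address> <port> typ <type> ...
function parseCandidate(line: string): ParsedCandidate | null {
  const parts = line.replace(/^a=/, '').split(' ');
  if (!parts[0].startsWith('candidate:') || parts[6] !== 'typ') return null;
  const type = parts[7] as CandidateType;
  if (!['host', 'srflx', 'prflx', 'relay'].includes(type)) return null;
  return {
    protocol: parts[2].toLowerCase(),
    address: parts[4],
    port: Number(parts[5]),
    type,
  };
}
```

Counting `relay` candidates during gathering is a quick way to confirm that your TURN server is reachable from a given network.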
const configuration: RTCConfiguration = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      // Placeholder TURN server and static credentials; production
      // deployments typically issue short-lived credentials per session
      urls: 'turn:turn.example.com:3478',
      username: 'user',
      credential: 'password',
    },
  ],
};
const peerConnection = new RTCPeerConnection(configuration);
// ICE candidates are gathered asynchronously; a null candidate marks the end
peerConnection.onicecandidate = (event) => {
  if (event.candidate) {
    signalingChannel.send(JSON.stringify({
      type: 'ice-candidate',
      candidate: event.candidate.toJSON(),
    }));
  }
};
// Receive ICE candidates from the remote peer (addEventListener keeps the
// offer/answer handler registered alongside this one)
signalingChannel.addEventListener('message', async (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'ice-candidate') {
    await peerConnection.addIceCandidate(
      new RTCIceCandidate(message.candidate)
    );
  }
});
STUN and TURN Servers
STUN (Session Traversal Utilities for NAT) servers are lightweight. A peer sends a binding request to the STUN server, and the server responds with the public IP address and port that the request appeared to come from. This allows the peer to share its "server-reflexive" address with the other peer. STUN servers handle no media traffic and are inexpensive to operate.
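The Binding exchange is simple enough to look at byte by byte. This sketch follows the message layout in RFC 5389: it builds a 20-byte Binding Request header and decodes an IPv4 XOR-MAPPED-ADDRESS attribute, which is how the server reports the reflexive address. The function names are hypothetical, and the UDP socket, response parsing, and retransmission logic a real client needs are omitted; browsers do all of this internally.

```typescript
// STUN constants from RFC 5389
const MAGIC_COOKIE = 0x2112a442;
const BINDING_REQUEST = 0x0001;

// Build a 20-byte Binding Request header with no attributes
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) throw new Error('transaction ID must be 12 bytes');
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer);
  view.setUint16(0, BINDING_REQUEST); // message type (big-endian)
  view.setUint16(2, 0);               // message length: no attributes
  view.setUint32(4, MAGIC_COOKIE);    // fixed magic cookie
  msg.set(transactionId, 8);          // 96-bit transaction ID
  return msg;
}

// Decode an IPv4 XOR-MAPPED-ADDRESS attribute value (8 bytes):
// reserved byte, family byte (0x01 = IPv4), X-Port (16 bits), X-Address (32 bits)
function decodeXorMappedIPv4(value: Uint8Array): { address: string; port: number } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) throw new Error('only IPv4 is handled here');
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const raw = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0; // back to unsigned
  const address = [24, 16, 8, 0].map((shift) => (raw >>> shift) & 0xff).join('.');
  return { address, port };
}
```

The XOR step exists so that NATs which rewrite IP addresses appearing in packet payloads cannot corrupt the reported address.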
TURN (Traversal Using Relays around NAT) servers are the fallback. When direct paths fail, typically because both peers sit behind restrictive NATs (a symmetric NAT on at least one side is the classic case) or because a firewall blocks UDP entirely, TURN servers relay all media traffic between the peers. This means the TURN server must carry the full bandwidth of every relayed stream, making it the most expensive component in a WebRTC deployment.
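A back-of-the-envelope calculation shows why relayed traffic dominates TURN costs. The sketch assumes each relayed stream enters and leaves the server exactly once; the 1.5 Mbps bitrates are illustrative values, not measurements, and `relayEgressBytes` is a hypothetical helper.

```typescript
// Rough relay traffic estimate for one TURN-relayed call. Assumption: each
// stream passes through the server once, so egress equals the sum of the
// stream bitrates multiplied by the call duration (ingress is the same).
function relayEgressBytes(streamBitratesBps: number[], durationSeconds: number): number {
  const totalBps = streamBitratesBps.reduce((sum, bps) => sum + bps, 0);
  return (totalBps * durationSeconds) / 8; // bits to bytes
}

// Example: a one-hour 1:1 call with 1.5 Mbps flowing in each direction
const bytes = relayEgressBytes([1_500_000, 1_500_000], 3600);
console.log(`${bytes / 1e9} GB egress`); // logs: 1.35 GB egress
```

Multiplied across concurrent relayed calls, this is the number that drives TURN bandwidth bills.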
In production, you should always configure both STUN and TURN servers. Without TURN, a meaningful percentage of your users will be unable to connect.
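Static TURN usernames and passwords, like the placeholders in the configuration above, are easy to leak. A common production pattern is the "TURN REST API" style of short-lived credentials, which coturn supports through its `use-auth-secret` option (with `static-auth-secret`): the username embeds an expiry timestamp, and the credential is a Base64 HMAC-SHA1 of that username keyed with a secret shared by your backend and the TURN server. A minimal Node.js sketch; `createTurnCredentials` is a hypothetical helper, so check your TURN server's documentation for the exact format it expects.

```typescript
import { createHmac } from 'node:crypto';

interface TurnCredentials {
  username: string;
  credential: string;
  expiresAt: number; // Unix time in seconds
}

// Runs on your backend only: the shared secret must never reach the browser
function createTurnCredentials(
  sharedSecret: string,
  userId: string,
  ttlSeconds = 3600,
  nowSeconds = Math.floor(Date.now() / 1000),
): TurnCredentials {
  const expiresAt = nowSeconds + ttlSeconds;
  const username = `${expiresAt}:${userId}`; // "<expiry>:<user id>"
  const credential = createHmac('sha1', sharedSecret).update(username).digest('base64');
  return { username, credential, expiresAt };
}
```

The backend returns `username` and `credential` to the client, which places them in an `iceServers` entry next to the TURN URL; the TURN server recomputes the HMAC and rejects the pair once the expiry has passed.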
Peer Connection Lifecycle
The RTCPeerConnection reports its lifecycle through the connectionState property, which moves through these values:
- new: the connection object has been created, but no network activity has started
- connecting: ICE and DTLS negotiation is in progress
- connected: ICE has found a working candidate pair and the DTLS handshake is complete
- disconnected: connectivity checks indicate a loss of connectivity, which may be temporary
- failed: ICE has exhausted all candidate pairs without finding a working connection, or the DTLS handshake failed
- closed: close() has been called and the connection is shut down
peerConnection.onconnectionstatechange = () => {
  console.log('Connection state:', peerConnection.connectionState);
  switch (peerConnection.connectionState) {
    case 'connected':
      console.log('Peers connected successfully');
      break;
    case 'disconnected':
      console.log('Peer disconnected; may reconnect');
      // Implement reconnection logic (often: wait briefly, then restart ICE)
      break;
    case 'failed':
      console.log('Connection failed; restart ICE or create a new connection');
      // restartIce() only flags the need for a new offer; an onnegotiationneeded
      // handler must create that offer and send it through the signaling channel
      peerConnection.restartIce();
      break;
    case 'closed':
      console.log('Connection closed');
      cleanup(); // app-specific teardown (stop tracks, reset UI), not shown here
      break;
  }
};
Media Streams and Data Channels
WebRTC supports two types of real-time communication:
Media tracks handle audio and video using addTrack() and the ontrack event:
// Receiving remote media
peerConnection.ontrack = (event) => {
  const remoteVideo = document.getElementById('remoteVideo') as HTMLVideoElement;
  // ontrack fires once per track (audio and video); attach the stream only once
  if (remoteVideo.srcObject !== event.streams[0]) {
    remoteVideo.srcObject = event.streams[0];
  }
};
Data channels provide arbitrary bidirectional data transfer with configurable reliability:
// Creating a data channel
const dataChannel = peerConnection.createDataChannel('chat', {
  ordered: true,     // deliver messages in the order they were sent
  maxRetransmits: 3, // partially reliable: give up after 3 retransmissions
                     // (or set maxPacketLifeTime instead; setting both throws)
});
dataChannel.onopen = () => {
  dataChannel.send('Hello from peer A');
};
dataChannel.onmessage = (event) => {
  console.log('Received:', event.data);
};
// Receiving peer listens for data channels
peerConnection.ondatachannel = (event) => {
  const channel = event.channel;
  channel.onmessage = (e) => {
    console.log('Received:', e.data);
  };
};
Data channels use SCTP over DTLS, so they are encrypted by default. You can configure them as ordered or unordered and as fully or partially reliable, which makes them suitable for everything from chat messages (ordered, fully reliable) to game state updates (unordered, with few or no retransmissions).
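Large payloads such as files have to be split into multiple messages. This sketch chunks a byte array into pieces of at most 16 KiB, a commonly cited conservative message size for interoperability between browsers, and joins them back together; `chunkPayload` and `joinChunks` are hypothetical helpers.

```typescript
// Split a payload into chunks no larger than maxChunkBytes
function chunkPayload(data: Uint8Array, maxChunkBytes = 16 * 1024): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += maxChunkBytes) {
    // subarray() creates a view, so no bytes are copied here
    chunks.push(data.subarray(offset, offset + maxChunkBytes));
  }
  return chunks;
}

// Reassemble chunks on the receiving side, assuming they arrived in order
// (which an ordered, fully reliable channel guarantees)
function joinChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```

On the sending side, each chunk goes to `dataChannel.send()`; watching `bufferedAmount`, together with `bufferedAmountLowThreshold` and the `bufferedamountlow` event, keeps a large transfer from overfilling the send buffer. The receiver also needs to know when the transfer is complete, for example from a small header message announcing the total size, which this sketch leaves out.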
Practical Implementation
A complete peer connection setup brings all these pieces together. Here is a consolidated example:
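class WebRTCConnection {
  private pc: RTCPeerConnection;
  private signalingChannel: WebSocket;
  // Messages queued until the WebSocket opens
  private outbox: string[] = [];
  // Remote candidates that arrive before the remote description is set
  private pendingCandidates: RTCIceCandidateInit[] = [];
  constructor(signalingUrl: string, room: string, iceServers: RTCIceServer[]) {
    this.signalingChannel = new WebSocket(signalingUrl);
    this.pc = new RTCPeerConnection({ iceServers });
    // The signaling server only relays between peers that joined the same room
    this.signalingChannel.onopen = () => {
      this.signal({ type: 'join', room });
      this.outbox.splice(0).forEach((payload) => this.signalingChannel.send(payload));
    };
    this.pc.onicecandidate = ({ candidate }) => {
      if (candidate) {
        this.signal({ type: 'ice-candidate', candidate: candidate.toJSON() });
      }
    };
    // Fires after addTrack() and after restartIce(), so one handler covers the
    // initial offer, later renegotiation, and ICE restarts
    this.pc.onnegotiationneeded = async () => {
      try {
        const offer = await this.pc.createOffer();
        await this.pc.setLocalDescription(offer);
        this.signal({ type: 'offer', sdp: offer.sdp });
      } catch (err) {
        console.error('Negotiation failed:', err);
      }
    };
    this.pc.ontrack = (event) => {
      this.onRemoteStream(event.streams[0]);
    };
    this.pc.onconnectionstatechange = () => {
      if (this.pc.connectionState === 'failed') {
        this.pc.restartIce(); // handled by onnegotiationneeded above
      }
    };
    this.signalingChannel.onmessage = (event) => {
      this.handleSignal(JSON.parse(event.data)).catch((err) => {
        console.error('Failed to handle signaling message:', err);
      });
    };
  }
  async startCall(): Promise<void> {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { width: 1280, height: 720 },
      audio: { echoCancellation: true, noiseSuppression: true },
    });
    // Adding tracks fires negotiationneeded, which creates and sends the offer
    stream.getTracks().forEach((track) => this.pc.addTrack(track, stream));
  }
  private async handleSignal(message: any): Promise<void> {
    switch (message.type) {
      case 'offer': {
        await this.pc.setRemoteDescription({ type: 'offer', sdp: message.sdp });
        await this.flushCandidates();
        // To send media back, add this peer's tracks here before answering
        const answer = await this.pc.createAnswer();
        await this.pc.setLocalDescription(answer);
        this.signal({ type: 'answer', sdp: answer.sdp });
        break;
      }
      case 'answer':
        await this.pc.setRemoteDescription({ type: 'answer', sdp: message.sdp });
        await this.flushCandidates();
        break;
      case 'ice-candidate':
        if (this.pc.remoteDescription) {
          await this.pc.addIceCandidate(message.candidate);
        } else {
          this.pendingCandidates.push(message.candidate);
        }
        break;
    }
  }
  private async flushCandidates(): Promise<void> {
    for (const candidate of this.pendingCandidates.splice(0)) {
      await this.pc.addIceCandidate(candidate);
    }
  }
  private signal(data: object): void {
    const payload = JSON.stringify(data);
    if (this.signalingChannel.readyState === WebSocket.OPEN) {
      this.signalingChannel.send(payload);
    } else {
      this.outbox.push(payload);
    }
  }
  private onRemoteStream(stream: MediaStream): void {
    const video = document.getElementById('remoteVideo') as HTMLVideoElement;
    video.srcObject = stream;
  }
  close(): void {
    this.pc.close();
    this.signalingChannel.close();
  }
}
Common Pitfalls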
Not including TURN servers. The most frequent production issue. Developers test on the same network or with permissive NATs and skip TURN configuration. When real users behind corporate firewalls try to connect, they get silent failures.
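Ignoring ICE restart. When a connection enters the "failed" state, many implementations simply give up. Calling restartIce() can recover the connection without tearing down the RTCPeerConnection, but it is not free: it fires a negotiationneeded event, and a new offer/answer exchange carrying fresh ICE credentials must still go through signaling.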
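How aggressively to restart is an application decision. The sketch below encodes one possible policy as a pure function, so it can be tested outside a browser: wait out a short grace period on `disconnected`, restart ICE on `failed`, and recreate the connection after repeated failures. The thresholds (three restarts, five seconds) and all names are illustrative assumptions.

```typescript
// Mirrors the values of RTCPeerConnection.connectionState
type PeerConnectionState =
  | 'new' | 'connecting' | 'connected' | 'disconnected' | 'failed' | 'closed';

type RecoveryAction =
  | { kind: 'none' }
  | { kind: 'restart-ice-after'; delayMs: number }
  | { kind: 'restart-ice-now' }
  | { kind: 'recreate-connection' }
  | { kind: 'cleanup' };

function decideRecovery(
  state: PeerConnectionState,
  restartsSoFar: number,
  maxRestarts = 3,
  graceMs = 5000,
): RecoveryAction {
  switch (state) {
    case 'disconnected':
      // Often recovers on its own, so restart only if it persists
      return { kind: 'restart-ice-after', delayMs: graceMs };
    case 'failed':
      return restartsSoFar < maxRestarts
        ? { kind: 'restart-ice-now' }
        : { kind: 'recreate-connection' };
    case 'closed':
      return { kind: 'cleanup' };
    default:
      return { kind: 'none' };
  }
}
```

In the browser, a `connectionstatechange` handler would call `decideRecovery(pc.connectionState, restarts)` and act on the result, cancelling any pending delayed restart if the state returns to `connected`.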
Race conditions in signaling. If ICE candidates arrive before the remote description is set, addIceCandidate() rejects with an InvalidStateError. Buffer incoming candidates until setRemoteDescription() has resolved, then apply them.
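A small queue is enough to close this race. The sketch is written against a minimal interface that mirrors the two `RTCPeerConnection` members it uses (`remoteDescription` and `addIceCandidate()`), so the logic can be exercised without a browser; `CandidateBuffer` and the local types are hypothetical.

```typescript
type CandidateInit = {
  candidate?: string;
  sdpMid?: string | null;
  sdpMLineIndex?: number | null;
  usernameFragment?: string | null;
};

// The subset of RTCPeerConnection this buffer relies on
interface CandidateTarget {
  readonly remoteDescription: unknown;
  addIceCandidate(candidate: CandidateInit): Promise<void>;
}

class CandidateBuffer {
  private pending: CandidateInit[] = [];
  private target: CandidateTarget;

  constructor(target: CandidateTarget) {
    this.target = target;
  }

  // Call for every 'ice-candidate' message from the signaling channel
  async add(candidate: CandidateInit): Promise<void> {
    if (this.target.remoteDescription) {
      await this.target.addIceCandidate(candidate);
    } else {
      this.pending.push(candidate); // too early: keep it for later
    }
  }

  // Call right after setRemoteDescription() resolves
  async flush(): Promise<void> {
    for (const candidate of this.pending.splice(0)) {
      await this.target.addIceCandidate(candidate);
    }
  }

  get size(): number {
    return this.pending.length;
  }
}
```

With a real connection, `new CandidateBuffer(peerConnection)` replaces direct `addIceCandidate()` calls in the signaling handler.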
Not handling renegotiation. Adding or removing tracks after the initial connection requires renegotiation via the negotiationneeded event. Ignoring this event leads to tracks that never appear on the remote side.
Assuming UDP availability. Some enterprise networks block all UDP traffic. Without TURN over TCP (or TURN over TLS on port 443), these users cannot connect at all.
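Covering UDP-blocking networks mostly comes down to advertising extra TURN URLs. A sketch, assuming a TURN server (such as coturn) that you have configured to accept TCP on port 3478 and TLS on port 443; the host name is a placeholder, `buildIceServers` is a hypothetical helper, and `IceServer` is a local stand-in for the DOM `RTCIceServer` type.

```typescript
// Local stand-in for RTCIceServer so the sketch runs outside a browser
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Advertise the same TURN deployment over UDP, TCP, and TLS on 443
function buildIceServers(turnHost: string, username: string, credential: string): IceServer[] {
  return [
    { urls: `stun:${turnHost}:3478` },
    {
      urls: [
        `turn:${turnHost}:3478?transport=udp`,
        `turn:${turnHost}:3478?transport=tcp`,
        `turns:${turnHost}:443?transport=tcp`, // TLS on 443 often passes strict firewalls
      ],
      username,
      credential,
    },
  ];
}
```

Relaying over TCP or TLS adds latency, so the UDP URL stays in the list; ICE normally prefers the UDP path when it works.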
When to Use (and When Not To)
Use WebRTC when:
- You need low-latency, real-time media streaming between browsers
- Privacy matters and you want end-to-end encrypted media without server-side processing
- You are building video calls, screen sharing, file transfer, or real-time gaming
- You need data channels for low-latency bidirectional communication
Do not use WebRTC when:
- You need to broadcast to thousands of viewers (use HLS/DASH or a media server like Janus/mediasoup for SFU architecture)
- You need server-side recording or processing of media (you will need a media server intermediary)
- Your use case is simple request-response (standard HTTP or WebSockets are simpler)
- You need guaranteed delivery of large files (standard file transfer protocols are more appropriate)
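For large-scale video applications, consider a Selective Forwarding Unit (SFU) architecture, where a media server receives one stream from each participant and selectively forwards it to the others. This avoids the cost of a full mesh, in which every participant uploads a separate copy of its stream to every other participant: per-client upload grows linearly with the number of participants, and the total number of peer connections grows quadratically.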
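The difference is easy to quantify in stream counts. This sketch assumes every participant sends one stream and receives all the others, with one transport per participant to the SFU (some SFUs use separate send and receive transports) and no selective dropping; the helper names are hypothetical.

```typescript
// Stream and connection counts for an n-participant call
interface TopologyCost {
  uploadsPerClient: number;   // streams each participant sends
  downloadsPerClient: number; // streams each participant receives
  peerConnections: number;    // total transport connections in the call
}

function meshCost(n: number): TopologyCost {
  return {
    uploadsPerClient: n - 1,             // one copy per other participant
    downloadsPerClient: n - 1,
    peerConnections: (n * (n - 1)) / 2,  // every pair connects directly
  };
}

function sfuCost(n: number): TopologyCost {
  return {
    uploadsPerClient: 1,        // a single upstream to the SFU
    downloadsPerClient: n - 1,  // the SFU forwards every other participant
    peerConnections: n,         // one connection per participant to the SFU
  };
}

// For 8 participants: mesh needs 7 uploads per client and 28 connections,
// while an SFU needs 1 upload per client and 8 connections.
console.log(meshCost(8), sfuCost(8));
```

Upload bandwidth is usually the scarcer resource on client connections, which is why cutting per-client uploads from n - 1 to 1 matters most.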
FAQ
What is WebRTC and how does it work?
WebRTC is a set of browser APIs that enables peer-to-peer audio, video, and data transfer without plugins. It works by using a signaling server to exchange connection metadata, ICE candidates for NAT traversal, and STUN/TURN servers to discover public IP addresses and relay media when direct connections fail.
Why does WebRTC need a signaling server if it is peer-to-peer?
WebRTC is peer-to-peer for media transfer, but peers need a way to discover each other and exchange connection details first. The signaling server handles this coordination—exchanging SDP offers/answers and ICE candidates—before the direct connection is established.
What is the difference between STUN and TURN servers?
STUN servers help peers discover their public IP address for NAT traversal, enabling direct connections. TURN servers relay traffic when direct connections fail due to restrictive firewalls or symmetric NATs. STUN is lightweight; TURN consumes significant bandwidth.
When does WebRTC fall back to TURN relay?
WebRTC falls back to TURN when direct peer-to-peer connectivity fails, typically due to symmetric NATs, strict corporate firewalls, or VPN configurations that block UDP traffic.
Can WebRTC be used for more than video calls?
Yes. WebRTC data channels support arbitrary data transfer, enabling file sharing, real-time gaming, collaborative editing, and IoT device communication with the same low-latency peer-to-peer architecture.