Building a Realtime Collaboration Tool

TL;DR

CRDTs via Yjs provide a simpler, more robust approach to real-time collaborative editing than Operational Transforms, with automatic conflict resolution and offline support. I built a Notion-like multiplayer editor using Yjs, React, and WebSockets that supported concurrent editing across dozens of users without a single merge conflict.

The Challenge

A client needed a collaborative document editor embedded within their project management platform. Their existing workflow involved teams editing shared documents one person at a time, creating constant bottlenecks. They wanted Google Docs-like real-time editing, but within their own application, not relying on a third-party embed.

The core requirements were straightforward on paper but brutal in practice:

›Multiple users editing the same document simultaneously with sub-second latency
›Rich text editing with blocks, headings, lists, code snippets, and inline formatting
›Offline support so users on flaky connections would not lose work
›Presence awareness showing cursors and selections of other users in real time
›Conflict resolution that never corrupts or loses data

The previous team had attempted a naive implementation using database locks and last-write-wins semantics. The result was predictable: users would overwrite each other's paragraphs, edits would vanish, and the "collaborative" editor was anything but.

The Architecture

Why CRDTs Over Operational Transforms

The two dominant approaches for collaborative editing are Operational Transforms (OT) and Conflict-free Replicated Data Types (CRDTs). I spent the first week of the project evaluating both.

Operational Transforms power Google Docs and have decades of academic backing. OT works by transforming operations against each other. If User A inserts "hello" at position 5 and User B deletes the character at position 3, the server must transform A's operation to account for B's deletion before applying it. This sounds manageable until you realize that every pair of operation types needs its own transformation rule, so the number of transformation functions grows roughly quadratically with the number of operation types. Google's OT implementation reportedly took years to stabilize.
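
To make the transformation step concrete, here is a minimal sketch of a single transform rule. It assumes 0-based positions, single-character deletes, and only two operation types. The names InsertOp, DeleteOp, and transformInsertAgainstDelete are hypothetical and do not come from any OT library.

```typescript
// Hypothetical operation shapes, for illustration only.
type InsertOp = { kind: 'insert'; pos: number; text: string };
type DeleteOp = { kind: 'delete'; pos: number }; // deletes one character at pos

// Rewrite an insert so it still lands in the intended place after a
// concurrent single-character delete has already been applied.
function transformInsertAgainstDelete(ins: InsertOp, del: DeleteOp): InsertOp {
  // A delete strictly before the insert point shifts the insert left by one.
  if (del.pos < ins.pos) {
    return { ...ins, pos: ins.pos - 1 };
  }
  // A delete at or after the insert point leaves the position unchanged.
  return ins;
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

function applyDelete(doc: string, op: DeleteOp): string {
  return doc.slice(0, op.pos) + doc.slice(op.pos + 1);
}

// User A inserts "hello" at position 5; User B concurrently deletes position 3.
const opA: InsertOp = { kind: 'insert', pos: 5, text: 'hello' };
const opB: DeleteOp = { kind: 'delete', pos: 3 };
const transformedA = transformInsertAgainstDelete(opA, opB);
```

A complete OT system needs a rule like this for every ordered pair of operation types, which is where the implementation cost comes from.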

CRDTs take a fundamentally different approach. Instead of transforming operations, CRDTs use data structures that are mathematically guaranteed to converge regardless of the order operations are applied. There is no central server needed to resolve conflicts because every replica independently arrives at the same state.
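
The convergence guarantee is easiest to see on a much simpler CRDT than a text document. The sketch below is a grow-only counter (G-Counter), not the structure Yjs uses for text, and all names in it are hypothetical. It shows the property that matters: merging is order-independent and safe to repeat.

```typescript
// A grow-only counter (G-Counter), one of the simplest CRDTs.
// Each replica increments only its own slot; merge takes the per-replica maximum.
type GCounter = Record<string, number>;

function incrementCounter(counter: GCounter, replicaId: string): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + 1 };
}

function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const replicaId of Object.keys(b)) {
    merged[replicaId] = Math.max(merged[replicaId] ?? 0, b[replicaId]);
  }
  return merged;
}

function counterValue(counter: GCounter): number {
  return Object.keys(counter).reduce((sum, id) => sum + counter[id], 0);
}

// Two replicas diverge while disconnected from each other.
const replicaA = incrementCounter(incrementCounter({}, 'a'), 'a'); // a: 2
const replicaB = incrementCounter({}, 'b'); // b: 1

// Merging in either order produces the same state.
const mergedAB = mergeCounters(replicaA, replicaB);
const mergedBA = mergeCounters(replicaB, replicaA);
```

Because merge is commutative, associative, and idempotent, no replica needs to ask a server which version wins.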

I chose CRDTs, specifically Yjs, for three reasons:

›No central conflict resolution server required. The WebSocket server is a relay, not an arbiter. This massively simplifies the backend.
›Offline support is built into the model. A CRDT can accept edits while disconnected and merge cleanly when reconnected.
›Yjs is battle-tested. It runs in production in projects such as JupyterLab, underpins the Hocuspocus collaboration backend, and powers numerous collaborative editors.

The Editor Layer

I chose TipTap as the rich text editor, which is built on top of ProseMirror and has a first-class Yjs integration via the y-prosemirror binding. The component hierarchy looked like this:

tsx

import { useEffect, useMemo } from 'react';
import { useEditor, EditorContent } from '@tiptap/react';
import StarterKit from '@tiptap/starter-kit';
import Collaboration from '@tiptap/extension-collaboration';
import CollaborationCursor from '@tiptap/extension-collaboration-cursor';
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

interface CollaborativeEditorProps {
  documentId: string;
  user: { name: string; color: string };
}

export function CollaborativeEditor({ documentId, user }: CollaborativeEditorProps) {
  // One Y.Doc per document, so switching documentId never mixes two documents' content
  const ydoc = useMemo(() => new Y.Doc(), [documentId]);

  const provider = useMemo(
    () =>
      new WebsocketProvider(
        process.env.NEXT_PUBLIC_WS_URL!,
        documentId,
        ydoc,
        { connect: true }
      ),
    [documentId, ydoc]
  );

  const editor = useEditor(
    {
      extensions: [
        StarterKit.configure({ history: false }), // Disable default history; Yjs handles undo/redo
        Collaboration.configure({ document: ydoc }),
        CollaborationCursor.configure({
          provider,
          user: { name: user.name, color: user.color },
        }),
      ],
    },
    [ydoc, provider] // Recreate the editor when the underlying document changes
  );

  // Tear down the connection and the document when they are replaced or on unmount
  useEffect(() => {
    return () => {
      provider.destroy();
      ydoc.destroy();
    };
  }, [provider, ydoc]);

  return <EditorContent editor={editor} />;
}

The critical detail is history: false on StarterKit. ProseMirror's built-in undo/redo tracks local operations, but in a collaborative context, you need Yjs's undo manager, which understands the CRDT operation log and can undo only the current user's changes without reverting other collaborators' edits.
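
The idea of per-user undo can be sketched without any library. The toy model below is an assumption-laden simplification, not how Yjs is implemented: every edit remembers its author, and undo marks that author's latest edit as deleted instead of rewinding shared history. In Yjs itself this behavior is exposed through Y.UndoManager, which only tracks transactions from the origins you tell it to track. SharedText and its methods are hypothetical names.

```typescript
// Toy model: each edit records its author, and undo tombstones the edit
// rather than restoring an earlier snapshot of the whole document.
type Edit = { author: string; text: string; deleted: boolean };

class SharedText {
  private readonly edits: Edit[] = [];

  append(author: string, text: string): void {
    this.edits.push({ author, text, deleted: false });
  }

  // Undo the most recent live edit by `author`; other users' edits are untouched.
  undoLast(author: string): boolean {
    for (let i = this.edits.length - 1; i >= 0; i--) {
      const edit = this.edits[i];
      if (edit.author === author && !edit.deleted) {
        edit.deleted = true;
        return true;
      }
    }
    return false; // nothing left to undo for this author
  }

  toString(): string {
    return this.edits
      .filter((edit) => !edit.deleted)
      .map((edit) => edit.text)
      .join('');
  }
}

const shared = new SharedText();
shared.append('alice', 'Hello ');
shared.append('bob', 'world');
shared.undoLast('alice'); // removes Alice's text, keeps Bob's
```

A snapshot-based history would have rolled back Bob's later edit as well, which is exactly the behavior collaborators find surprising.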

The WebSocket Sync Layer

The backend used a Node.js WebSocket server built with the y-websocket utility package. The server's responsibilities were minimal by design:

typescript

import { WebSocketServer } from 'ws';
import { setupWSConnection } from 'y-websocket/bin/utils';
 
const wss = new WebSocketServer({ port: 1234 });
 
wss.on('connection', (ws, req) => {
  // Use the URL path as the document name and ignore any query string
  const docName = req.url?.slice(1).split('?')[0] || 'default';
  setupWSConnection(ws, req, { docName, gc: true });
});

The y-websocket utility handles document state management, binary encoding of updates, and garbage collection of tombstoned CRDT nodes. The server keeps a live copy of each active Y.Doc in memory, which lets it bring late joiners up to date, and persists snapshots to a database on a configurable interval.

Persistence and Recovery

Yjs documents serialize to a compact binary format via Y.encodeStateAsUpdate(ydoc). I stored these snapshots in PostgreSQL as bytea columns, with a debounced persistence strategy:

typescript

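import * as Y from 'yjs';

const PERSIST_INTERVAL_MS = 5000;
const pendingDocs = new Map<string, NodeJS.Timeout>();

// `db` is assumed to be a PostgreSQL client or pool created elsewhere.
function schedulePersistence(docName: string, ydoc: Y.Doc) {
  // A write is already scheduled; it will capture the latest state when it fires
  if (pendingDocs.has(docName)) return;

  const timeout = setTimeout(async () => {
    // Clear the entry first so edits that arrive during the write schedule a new one
    pendingDocs.delete(docName);
    try {
      const update = Y.encodeStateAsUpdate(ydoc);
      await db.query(
        'INSERT INTO documents (name, state) VALUES ($1, $2) ON CONFLICT (name) DO UPDATE SET state = $2, updated_at = NOW()',
        [docName, Buffer.from(update)]
      );
    } catch (err) {
      // A failed write is logged; the next update schedules another attempt
      console.error(`Failed to persist document ${docName}`, err);
    }
  }, PERSIST_INTERVAL_MS);

  pendingDocs.set(docName, timeout);
}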

On server restart or when a new client connects to a document not in memory, the server loads the last snapshot and applies it to a fresh Y.Doc before any WebSocket connections are accepted.
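
One way to guarantee that a snapshot is loaded exactly once, even when several clients connect to a cold document at the same moment, is to cache the in-flight load. This is a minimal sketch under assumptions: DocCache is a hypothetical helper, and the loader passed to it stands in for "read the bytea column and apply it to a fresh Y.Doc with Y.applyUpdate".

```typescript
// Hypothetical cache that keeps one in-flight load per document name.
class DocCache<T> {
  private readonly docs = new Map<string, Promise<T>>();
  private readonly load: (docName: string) => Promise<T>;

  constructor(load: (docName: string) => Promise<T>) {
    this.load = load;
  }

  // Every caller for the same document awaits the same promise,
  // so the snapshot is read from storage only once.
  get(docName: string): Promise<T> {
    let pending = this.docs.get(docName);
    if (!pending) {
      pending = this.load(docName);
      // Drop failed loads so a later connection can retry.
      pending.catch(() => {
        this.docs.delete(docName);
      });
      this.docs.set(docName, pending);
    }
    return pending;
  }
}
```

A connection handler would await cache.get(docName) before wiring the socket to the document, so no client ever sees an empty document that is about to be overwritten by the snapshot.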

Presence and Awareness

Yjs includes an awareness protocol separate from the document state. Each client broadcasts ephemeral presence data (cursor position, selection range, user name, color) through the same WebSocket connection. This data is not persisted and is automatically cleaned up when a client disconnects.

The awareness protocol handles edge cases like stale cursors when a user closes their laptop without a clean disconnect. Clients periodically renew their awareness state, and a timeout (30 seconds by default) removes the state of any client that stops responding.
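
The stale-cursor cleanup boils down to a timestamp check. The sketch below is a simplified model of that idea, not the awareness implementation itself; PresenceEntry, pruneStale, and the constant name are hypothetical.

```typescript
// Simplified model of presence bookkeeping: one entry per connected client.
type PresenceEntry = { user: string; lastUpdated: number };

const OUTDATED_TIMEOUT_MS = 30000; // mirrors the 30-second timeout described above

// Remove every client whose state has not been renewed within the timeout.
// Returns the removed client IDs so callers can broadcast the removal.
function pruneStale(
  states: Map<number, PresenceEntry>,
  now: number,
  timeoutMs: number = OUTDATED_TIMEOUT_MS
): number[] {
  const removed: number[] = [];
  states.forEach((entry, clientId) => {
    if (now - entry.lastUpdated >= timeoutMs) {
      states.delete(clientId);
      removed.push(clientId);
    }
  });
  return removed;
}
```

Running a check like this on an interval is what makes a cursor disappear shortly after its owner's laptop lid closes, even though no disconnect message was ever sent.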

Key Decisions & Trade-offs

Yjs over Automerge. Both are mature CRDT libraries. I chose Yjs because its binary encoding is significantly more compact, its ProseMirror binding is more mature, and its WebSocket provider is production-ready out of the box. Automerge has a cleaner API for general-purpose CRDT use cases, but Yjs wins for text editing specifically.

TipTap over Slate. Slate's collaborative editing story was less mature at the time, and its data model required more custom work to integrate with Yjs. TipTap's ProseMirror foundation meant I could leverage the existing y-prosemirror binding directly.

In-memory document state on the server. Storing active Y.Docs in server memory is a trade-off. It limits horizontal scalability because a given document must be served by a single server instance, which in practice means sticky sessions. For this project, the document count was manageable on a single server. For a larger-scale deployment, I would add a shared coordination layer such as Redis so that multiple instances can serve the same document.
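
The single-instance constraint can be met by routing on the document ID. The following is a minimal sketch, assuming a fixed list of instance names and a plain modulo hash; fnv1a and pickInstance are hypothetical helpers, not part of y-websocket.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: small and deterministic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a document to exactly one instance so all of its editors share one Y.Doc.
function pickInstance(documentId: string, instances: string[]): string {
  if (instances.length === 0) {
    throw new Error('no instances available');
  }
  return instances[fnv1a(documentId) % instances.length];
}
```

Modulo hashing reassigns most documents whenever the instance count changes, so a deployment that scales up and down frequently would want consistent hashing or a lookup table instead.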

Debounced persistence over write-on-every-update. Writing every CRDT update to the database would guarantee zero data loss but create unsustainable write throughput. The 5-second debounce means a hard server crash could lose at most 5 seconds of edits. For a document editor, this was an acceptable trade-off. For financial data, it would not be.

Results & Outcomes

The collaborative editor shipped and handled concurrent editing sessions smoothly. Users reported that the editing experience felt "instant" even with 15-20 simultaneous collaborators in a single document. The offline support worked seamlessly during demos where we deliberately killed network connections and reconnected.

The most significant outcome was architectural simplicity. The entire WebSocket server was under 200 lines of code because Yjs handles the hard parts (conflict resolution, binary encoding, awareness protocol) internally. This meant the maintenance burden was minimal compared to what an OT-based solution would have required.

The team that took over the project after my engagement was able to onboard within a few days because the CRDT model is conceptually simpler than OT. There are no transformation functions to debug, no operation ordering issues, and no server-side conflict resolution logic.

What I'd Do Differently

Invest in operational tooling earlier. Debugging collaborative editing issues in production is difficult because problems are inherently multi-client. I would build a replay system from the start that records the stream of CRDT updates for a document session, allowing me to replay and reproduce issues locally.
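
A replay system of this kind can start very small. The sketch below is a hypothetical recorder, assuming updates arrive as Uint8Array values (as they do from a Y.Doc update event) and that replaying means feeding them to a callback, which in practice would call Y.applyUpdate on a fresh document.

```typescript
// One recorded CRDT update, with enough context to reproduce a session.
type RecordedUpdate = { at: number; clientId: number; update: Uint8Array };

class UpdateRecorder {
  private readonly log: RecordedUpdate[] = [];

  record(clientId: number, update: Uint8Array, at: number = Date.now()): void {
    // Copy the bytes so later mutation of the source buffer cannot alter the log.
    this.log.push({ at, clientId, update: update.slice() });
  }

  // Feed every recorded update, in arrival order, to the given callback.
  // Returns the number of updates replayed.
  replay(apply: (entry: RecordedUpdate) => void): number {
    for (const entry of this.log) {
      apply(entry);
    }
    return this.log.length;
  }
}
```

Persisting this log per document session would make it possible to step through a reported bug locally, one update at a time.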

Use a shared persistence layer from day one. The in-memory document model works for a single server, but the client's user base grew faster than expected. Retrofitting sticky sessions was messy. Starting with Redis-backed document state would have made horizontal scaling straightforward.

Add structured logging for awareness events. Presence bugs (stale cursors, phantom users) were the most common support tickets. Better logging around awareness state changes would have cut debugging time significantly.
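
As a rough illustration of what such logging could look like, the sketch below turns an awareness change into a single JSON log line. It assumes the change arrives as lists of added, updated, and removed client IDs, which is the shape the awareness change event reports; awarenessLogLine and its field names are hypothetical.

```typescript
// Shape of an awareness change: client IDs grouped by what happened to them.
type AwarenessChange = { added: number[]; updated: number[]; removed: number[] };

// Build one structured log line per awareness change.
function awarenessLogLine(
  docName: string,
  change: AwarenessChange,
  origin: string,
  now: Date = new Date()
): string {
  return JSON.stringify({
    ts: now.toISOString(),
    event: 'awareness_change',
    doc: docName,
    origin, // who triggered the change, for example a local edit or a timeout
    added: change.added,
    updated: change.updated,
    removed: change.removed,
  });
}
```

With lines like this in the logs, a phantom-user ticket becomes a search for a client ID that was added but never removed.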

Rate-limit update broadcasting. In documents with heavy concurrent editing, the WebSocket server would broadcast every granular CRDT update immediately. Batching updates on a short interval (50-100ms) would reduce network traffic without any perceptible latency increase.
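
The batching idea can be sketched as a small buffer with a timer. This is a hypothetical helper, not something y-websocket provides; the send callback is an assumption and would typically combine the batch into one message (Yjs offers Y.mergeUpdates for that) before broadcasting.

```typescript
// Collects updates and hands them to `send` at most once per interval.
class UpdateBatcher {
  private pending: Uint8Array[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly send: (batch: Uint8Array[]) => void;
  private readonly intervalMs: number;

  constructor(send: (batch: Uint8Array[]) => void, intervalMs: number = 50) {
    this.send = send;
    this.intervalMs = intervalMs;
  }

  push(update: Uint8Array): void {
    this.pending.push(update);
    // Start the timer on the first update of a batch only.
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // Send everything collected so far; safe to call manually, e.g. on shutdown.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}
```

The interval bounds the added latency: with a 50 ms window, no update waits longer than 50 ms before it is broadcast.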

FAQ

What is a CRDT?

A Conflict-free Replicated Data Type (CRDT) is a data structure that can be replicated across multiple nodes and merged automatically without conflicts, making it ideal for real-time collaboration. Unlike traditional data structures that require a central authority to resolve concurrent modifications, CRDTs are designed so that any two replicas that have received the same set of updates will converge to the same state, regardless of the order those updates were applied. This mathematical property, called strong eventual consistency, eliminates an entire class of synchronization bugs.

How does Yjs handle conflicts in collaborative editing?

Yjs uses CRDTs to automatically merge concurrent edits from multiple users without requiring a central server to resolve conflicts, ensuring eventual consistency. Internally, every character in a Yjs text document is assigned a unique identifier based on the client ID and a logical clock. When two users type at the same position simultaneously, Yjs uses a deterministic ordering rule based on these identifiers to decide which character comes first. The result is that both users see the same final document without either edit being lost.
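
The tie-break can be illustrated with a deliberately small model. The sketch below only handles single characters inserted concurrently after the same left neighbor, and orders them by client ID; that rule is an assumption made for the example. The algorithm Yjs actually uses also covers runs of text, deletions, and more, so treat this as an illustration of the idea rather than a description of the library. All names are hypothetical.

```typescript
// Each character carries a unique ID and remembers its left neighbor at insert time.
type ItemId = { client: number; clock: number };
type Item = { id: ItemId; origin: ItemId | null; char: string };

function sameId(a: ItemId | null, b: ItemId | null): boolean {
  if (a === null || b === null) return a === b;
  return a.client === b.client && a.clock === b.clock;
}

// Place `item` after its origin (assumed to be in the list already),
// skipping concurrent siblings that have a lower client ID.
function integrate(list: Item[], item: Item): Item[] {
  const originIndex =
    item.origin === null ? -1 : list.findIndex((existing) => sameId(existing.id, item.origin));
  let pos = originIndex + 1;
  while (
    pos < list.length &&
    sameId(list[pos].origin, item.origin) &&
    list[pos].id.client < item.id.client
  ) {
    pos++;
  }
  return [...list.slice(0, pos), item, ...list.slice(pos)];
}

function toText(list: Item[]): string {
  return list.map((item) => item.char).join('');
}

// Shared starting point: client 1 typed "a".
const itemA: Item = { id: { client: 1, clock: 0 }, origin: null, char: 'a' };
// Clients 2 and 3 both insert right after "a" at the same time.
const itemX: Item = { id: { client: 2, clock: 0 }, origin: itemA.id, char: 'X' };
const itemY: Item = { id: { client: 3, clock: 0 }, origin: itemA.id, char: 'Y' };

// The two replicas receive the concurrent inserts in opposite orders.
const replicaOne = integrate(integrate([itemA], itemX), itemY);
const replicaTwo = integrate(integrate([itemA], itemY), itemX);
```

Both replicas end up with the same text because the placement depends only on the IDs, never on arrival order.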

When should you use CRDTs over Operational Transforms?

CRDTs are preferred when you need offline support, peer-to-peer collaboration, or simpler implementation, while OT may be better for centralized server architectures with strict ordering requirements. In practice, CRDTs have become the dominant choice for new collaborative editing projects because the tooling (Yjs, Automerge) has matured to the point where the theoretical complexity of CRDTs is hidden behind clean APIs. OT still makes sense if you are building on top of an existing OT-based infrastructure or need extremely fine-grained server-side control over every operation applied to a document.

Article Author

Sadam Hussain

Senior Full Stack Developer

Senior Full Stack Developer with over 7 years of experience building React, Next.js, Node.js, TypeScript, and AI-powered web platforms.
