WebRTC Patterns for Browser Multiplayer: Building Real-Time Collaborative Apps
Real-time is no longer a nice extra on the web. Users expect cursors to glide across a shared whiteboard the instant a teammate moves them, and players expect hit detection to resolve in under 100ms. If your browser app routes every packet through a central server, physics feels sluggish and collaboration feels dead.
At our studio, we keep returning to the same transport when we need real-time — WebRTC. This guide walks through the patterns we use to build browser multiplayer experiences and collaborative apps on top of WebRTC data channels and media streams, from signaling all the way to rollback networking.
What WebRTC Actually Is
WebRTC is a peer-to-peer transport stack baked into every modern browser. It was originally designed for video calls but turns out to be the fastest way to move arbitrary bytes between two browsers on the open internet.
Unlike WebSockets, which always tunnel through your server, WebRTC negotiates a direct socket between peers whenever possible. That direct path typically delivers data in 30-80ms, versus 150-300ms for the WebSocket round trip through a data center.
WebRTC is a browser API that establishes peer-to-peer connections between clients using ICE for NAT traversal, DTLS for encryption, and SCTP data channels or SRTP media streams for transport. It keeps central servers out of the data path, enabling latencies near 50ms that a server-routed WebSocket cannot match.
Under the hood sits a stack of acronyms: ICE (Interactive Connectivity Establishment) discovers network paths, STUN (Session Traversal Utilities for NAT) tells you your public IP and port, TURN (Traversal Using Relays around NAT) relays packets when direct connections fail, and DTLS/SRTP encrypt everything end-to-end.
You do not need to implement any of it yourself. The browser handles the cryptography, the NAT punching, and the congestion control. Your job is to orchestrate the handshake and then push bytes down the channel.
WebRTC vs WebSocket vs SSE
Picking the right transport is a load-bearing decision for a real-time app. The three browser-native options each solve a different problem, and mixing them up costs you either latency or reliability.
| Characteristic | WebRTC | WebSocket | SSE |
|---|---|---|---|
| Topology | Peer-to-peer (or relay) | Client ↔ server | Server → client only |
| Typical Latency | 30–80 ms | 100–300 ms | 200–500 ms |
| Delivery | Reliable or unreliable | Reliable only (TCP) | Reliable only (TCP) |
| Ordering | Optional per channel | Always ordered | Always ordered |
| Handshake Cost | High (ICE + DTLS) | Low (HTTP Upgrade) | Low (HTTP GET) |
| Best For | Games, voice, cursors | Chat, dashboards | Notifications, feeds |
The headline feature is that unreliable, unordered delivery option on WebRTC data channels. For gameplay state and cursor positions, you actively want packets to be dropped rather than queued — a 200ms-stale cursor position is worse than no update at all.
Signaling: How Peers Actually Find Each Other
WebRTC is peer-to-peer for data, but peers still need an introduction. Signaling is the out-of-band exchange where two clients swap session descriptions (SDP) and ICE candidates before the direct connection exists.
The signaling channel itself is not part of the WebRTC spec. You can use WebSockets, HTTP long polling, or even a shared Google Doc — anything that moves small JSON blobs between the two clients works.
A signaling server relays SDP offers, SDP answers, and ICE candidates between two peers that do not yet have a direct connection. It only carries handshake metadata, never gameplay data, so a single small WebSocket server can broker thousands of peer introductions per second before stepping out of the data path entirely.
The handshake goes in four beats. Peer A calls createOffer and sends the resulting SDP to the signaling server. Peer B receives the offer, calls createAnswer, and sends the SDP answer back. Each side trickles ICE candidates as they are discovered. Once a candidate pair succeeds, the data channel opens and the signaling server is no longer needed.
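The four beats translate almost directly into code. The sketch below is a minimal version of that glue; `PeerLike` is a structural stand-in for the subset of RTCPeerConnection the glue touches (so the logic can run outside a browser), and `signal` is an assumed function that relays a JSON blob to the remote peer through your signaling server.

```typescript
// Minimal offer/answer plus trickle-ICE glue for one peer connection.
type SignalMsg =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "ice"; candidate: unknown };

// Structural type for the subset of RTCPeerConnection used here, so the
// logic can be exercised outside a browser.
interface PeerLike {
  onicecandidate: ((ev: { candidate: { toJSON(): unknown } | null }) => void) | null;
  createOffer(): Promise<{ type: string; sdp?: string }>;
  createAnswer(): Promise<{ type: string; sdp?: string }>;
  setLocalDescription(d: { type: string; sdp?: string }): Promise<void>;
  setRemoteDescription(d: { type: string; sdp?: string }): Promise<void>;
  addIceCandidate(c: unknown): Promise<void>;
}

function wireSignaling(
  pc: PeerLike,
  signal: (msg: SignalMsg) => void,
): (msg: SignalMsg) => Promise<void> {
  // Beat 3: trickle local ICE candidates as they are discovered.
  pc.onicecandidate = (ev) => {
    if (ev.candidate) signal({ kind: "ice", candidate: ev.candidate.toJSON() });
  };
  // Returned handler processes incoming signaling messages from the remote side.
  return async (msg) => {
    if (msg.kind === "offer") {
      await pc.setRemoteDescription({ type: "offer", sdp: msg.sdp });
      const answer = await pc.createAnswer();          // beat 2
      await pc.setLocalDescription(answer);
      signal({ kind: "answer", sdp: answer.sdp! });
    } else if (msg.kind === "answer") {
      await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
    } else {
      await pc.addIceCandidate(msg.candidate);
    }
  };
}

// Beat 1, on the initiating side: create the offer and send it.
async function startOffer(pc: PeerLike, signal: (m: SignalMsg) => void) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signal({ kind: "offer", sdp: offer.sdp! });
}
```

In a real app, `pc` is the RTCPeerConnection itself and `signal` writes to your signaling WebSocket; beat 4 (the channel opening) arrives as the data channel's open event once a candidate pair succeeds.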
You do want authenticated signaling. Anonymous signaling lets attackers spoof identities, inject malicious SDP, or hijack sessions. Gate your signaling endpoint behind the same token you use elsewhere — our guide on cross-platform player identity walks through the canonical token flow we reuse here.
Data Channels vs Media Streams
WebRTC carries two fundamentally different payloads. Data channels move arbitrary bytes using SCTP over DTLS, while media streams move encoded audio and video using SRTP with RTCP feedback.
Use data channels for games, cursors, presence, chat, and CRDT operations. Use media streams for voice chat, video calls, and screen sharing. The distinction matters because media streams get hardware-accelerated codec paths that you cannot replicate with raw data channels.
Data channels let you pick reliability per channel. Configure one ordered reliable channel for chat and CRDT state, and a separate unordered unreliable channel for 20Hz position updates. Mixing them on a single channel means a dropped position packet stalls every chat message behind it.
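A minimal sketch of that split. The `ChannelFactory` interface here is a structural stand-in for the createDataChannel side of RTCPeerConnection so the setup can be exercised outside a browser; the channel labels are illustrative.

```typescript
// Structural stand-in for the part of RTCPeerConnection we need.
interface ChannelFactory {
  createDataChannel(
    label: string,
    opts?: { ordered?: boolean; maxRetransmits?: number },
  ): { label: string; send(data: string | ArrayBuffer): void };
}

function openChannels(pc: ChannelFactory) {
  // Ordered and reliable: chat and CRDT ops must all arrive, in order.
  const reliable = pc.createDataChannel("reliable", { ordered: true });
  // Unordered and unreliable: stale position packets are dropped, never queued.
  const positions = pc.createDataChannel("positions", {
    ordered: false,
    maxRetransmits: 0, // give up immediately instead of retransmitting
  });
  return { reliable, positions };
}
```

Create both channels on the offering side before generating the offer, so they are negotiated in the initial SDP exchange.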
Data channels carry arbitrary application bytes with configurable reliability and ordering, ideal for game state, cursors, and CRDT patches. Media streams carry compressed audio and video frames through hardware codecs, ideal for voice, video, and screen sharing. Use both side-by-side on the same peer connection when an app needs gameplay plus voice.
Mesh vs SFU vs MCU Topologies
Peer-to-peer is magical for two players. At eight players it is painful. At thirty players it is impossible. Your topology choice determines how far WebRTC scales for your specific app.
A full mesh has every peer connected to every other peer — n-squared connections. A selective forwarding unit (SFU) is a server peer that every client connects to once; it forwards each incoming stream to the other clients without re-encoding. A multipoint control unit (MCU) is a server that mixes all incoming streams into a single composite and sends that mix to each client.
Topology Scaling Limits
How many simultaneous peers each topology realistically supports in a modern browser on a typical residential connection is approximate and workload-dependent, but the ordering is consistent: a full mesh tops out first, an SFU scales considerably further, and an MCU furthest of all.
The trade-off is latency and cost. Mesh is the lowest latency because there is no server hop, but client bandwidth and CPU climb quadratically. SFU adds one server hop (typically 10-30ms) but keeps client load linear. MCU adds encode cost on the server but lets you support massive audiences on thin clients.
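The quadratic-versus-linear claim is just arithmetic, sketched here with illustrative helper names:

```typescript
// Total connections in a full-mesh session: one per pair of peers.
// Each individual peer maintains n - 1 of them.
function meshConnections(n: number): number {
  return (n * (n - 1)) / 2;
}

// Streams each client must upload. In a mesh you send a copy of your
// stream to every other peer; with an SFU or MCU you upload exactly once
// and the server fans out (SFU) or mixes (MCU).
function uplinkStreamsPerClient(n: number, topology: "mesh" | "sfu" | "mcu"): number {
  return topology === "mesh" ? n - 1 : 1;
}
```

At 8 players, a mesh already means 28 connections in the session and 7 uplink streams per client; the SFU numbers are 8 and 1.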
For collaborative apps like shared whiteboards or cursors, mesh is usually fine because payloads are tiny. For voice chat with more than four players, reach for an SFU immediately.
NAT Traversal: When You Actually Need TURN
NAT traversal is the part of WebRTC that surprises everyone. About 80% of peer connections succeed with STUN-derived candidates alone. The other 20% live behind symmetric NATs, corporate firewalls, or carrier-grade NAT, and they need a TURN relay to connect at all.
STUN is free to run and stateless — a STUN server just echoes back your public IP and port. TURN is expensive because it actually relays your media traffic, which means bandwidth costs scale with session volume.
| Component | Purpose | Bandwidth Cost | When Needed |
|---|---|---|---|
| STUN | Public IP discovery | Negligible | Always |
| TURN (UDP) | Relay for symmetric NAT | Full session bandwidth | ~15–20% of sessions |
| TURN (TCP/TLS) | Corporate firewall bypass | Full + TLS overhead | ~3–5% of sessions |
| ICE | Candidate negotiation | None (protocol layer) | Always |
You must provide TURN for any production app. The 20% of sessions that need it will simply fail to connect otherwise, and those users will never know why. Treat TURN as infrastructure insurance, not as an optional extra.
Use short-lived TURN credentials. Issue them from your signaling server after authentication, with a TTL of 1 hour. Long-lived static credentials are one leak away from becoming someone else's bandwidth budget.
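One way to mint such credentials is coturn's time-limited credential mechanism (the use-auth-secret mode): the username embeds an expiry timestamp, and the password is an HMAC of the username under a secret shared between your signaling server and the TURN server, so coturn can verify credentials without a database lookup. A server-side sketch, with an illustrative function name:

```typescript
import { createHmac } from "node:crypto";

// Mint a TURN credential pair compatible with coturn's time-limited
// credentials. The TTL defaults to the 1-hour window suggested above.
function mintTurnCredentials(userId: string, sharedSecret: string, ttlSeconds = 3600) {
  const expiry = Math.floor(Date.now() / 1000) + ttlSeconds;
  const username = `${expiry}:${userId}`; // coturn parses the expiry from here
  const credential = createHmac("sha1", sharedSecret)
    .update(username)
    .digest("base64");
  return { username, credential };
}
```

Issue these from the authenticated signaling endpoint and hand them to the client as part of its iceServers config; the TURN server rejects them automatically once the embedded timestamp passes.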
Latency Targets and Why They Matter
Human perception has hard thresholds. Under 50ms feels instant. 50-100ms feels responsive. 100-200ms feels sluggish. Above 200ms, collaboration falls apart and competitive gameplay becomes unplayable.
WebRTC peer-to-peer gets you near the floor of physics — typically 30-80ms on a good network. The same data routed through a central WebSocket server usually lands at 150-300ms because packets make a double round trip through a data center that may not be near either player.
Latency budget is cumulative. A 30ms transit plus a 16ms frame buffer plus a 16ms render means your input-to-display loop is already 62ms. Every millisecond you save on the wire buys you headroom for simulation, rendering, and animation — exactly the kind of budget needed by the browser 3D work covered in our WebGL and Three.js guide.
Synchronization Patterns
Low latency alone does not make multiplayer feel good. You still need a state replication model that reconciles what each peer believes is happening. Three patterns cover 95% of real-world apps.
State replication sends the entire authoritative state at a fixed tick rate (usually 20-30 Hz). Lag compensation rewinds the simulation when resolving player actions to account for the sender's historical view. Rollback networking speculatively simulates forward and rolls back when new inputs arrive from other peers.
Rollback networking predicts remote player inputs, simulates forward, then rewinds and resimulates when real inputs arrive. It gives fighting-game-tier responsiveness over unreliable channels but requires a fully deterministic simulation — no floating-point drift, no Math.random without a seeded PRNG, no async state reads. If your game is not deterministic, use state replication instead.
For collaborative apps like whiteboards and docs, CRDTs (conflict-free replicated data types) are the right mental model. Each peer applies local operations optimistically, then broadcasts the operation over the data channel. CRDT merge semantics guarantee every peer converges to the same final state regardless of delivery order.
The trick is picking the smallest CRDT that solves your problem. A sequence CRDT like RGA for text editing. A grow-only set for presence lists. A last-writer-wins map for cursor positions. Generic libraries like Yjs and Automerge are excellent, but a hand-rolled LWW map is often 10x smaller and plenty for cursors.
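A hand-rolled LWW map is small enough to show in full. This sketch breaks timestamp ties by peer id so that every replica deterministically picks the same winner regardless of delivery order:

```typescript
type Stamp = { t: number; peer: string };

class LwwMap<V> {
  private entries = new Map<string, { value: V; stamp: Stamp }>();

  // Higher timestamp wins; equal timestamps fall back to peer id so all
  // replicas agree on the winner.
  private wins(a: Stamp, b: Stamp): boolean {
    return a.t > b.t || (a.t === b.t && a.peer > b.peer);
  }

  // Apply a local or remote write; returns true if it became the winner.
  apply(key: string, value: V, stamp: Stamp): boolean {
    const cur = this.entries.get(key);
    if (cur && !this.wins(stamp, cur.stamp)) return false;
    this.entries.set(key, { value, stamp });
    return true;
  }

  get(key: string): V | undefined {
    return this.entries.get(key)?.value;
  }
}
```

Broadcast each apply over the reliable channel and replicas converge no matter the arrival order, which is exactly the property cursors and presence need.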
Cursors, Presence, and Shared Whiteboards
Shared cursors are the canonical real-time collaboration demo for a reason. They are cheap, visually impressive, and exercise every part of the WebRTC stack in miniature.
Send cursor positions at 30 Hz on an unordered, unreliable data channel. Each packet is a 20-byte tuple of (peerId, x, y, timestamp). On the receive side, discard any packet whose timestamp is older than the newest one you have seen — you only ever care about the latest position.
Render other peers' cursors with a smoothing interpolation (typically 80-120ms lookback) so they glide instead of teleporting. The same interpolation trick drives the smooth motion work we cover in our guide on procedural browser animation — it is cheap, fast, and makes everything feel alive.
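The lookback interpolation can be sketched as a small buffer of timestamped samples (names illustrative); the renderer asks where the cursor was 100ms ago and lerps between the two samples that bracket that moment:

```typescript
type Sample = { t: number; x: number; y: number };

class CursorTrail {
  private samples: Sample[] = [];

  push(s: Sample): void {
    const last = this.samples[this.samples.length - 1];
    if (last && s.t <= last.t) return; // drop stale or duplicate packets
    this.samples.push(s);
  }

  // Position as of (now - lookbackMs), linearly interpolated.
  positionAt(now: number, lookbackMs = 100): Sample | undefined {
    const target = now - lookbackMs;
    const s = this.samples;
    if (s.length === 0) return undefined;
    if (target <= s[0].t) return s[0];
    for (let i = 1; i < s.length; i++) {
      if (s[i].t >= target) {
        const a = s[i - 1], b = s[i];
        const k = (target - a.t) / (b.t - a.t); // interpolation factor in [0, 1]
        return { t: target, x: a.x + (b.x - a.x) * k, y: a.y + (b.y - a.y) * k };
      }
    }
    return s[s.length - 1]; // past the newest sample: hold the last position
  }
}
```

A production version would also prune samples older than a second or so; the lookback means the rendered cursor is always slightly in the past, which is the price of smooth motion.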
Presence (who is online, who is idle) is a separate concern. Use a small reliable data channel with infrequent heartbeats (every 5 seconds), and treat missing heartbeats as grounds for a UI fade-out, not an immediate disconnect.
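A sketch of that bookkeeping, using the 5-second heartbeat cadence from above; the fade and drop thresholds are illustrative choices (two missed beats to fade, six to drop):

```typescript
class Presence {
  private lastBeat = new Map<string, number>();

  // Record a heartbeat from a peer; `now` is a millisecond timestamp.
  beat(peerId: string, now: number): void {
    this.lastBeat.set(peerId, now);
  }

  // With 5 s heartbeats: fade the UI after ~2 missed beats, drop after ~6.
  status(peerId: string, now: number): "online" | "fading" | "gone" {
    const last = this.lastBeat.get(peerId);
    if (last === undefined || now - last > 30_000) return "gone";
    return now - last > 10_000 ? "fading" : "online";
  }
}
```

Driving the UI from status() rather than from disconnect events means a laptop waking from sleep fades back in instead of flickering through a full leave/join cycle.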
Step-by-Step Implementation Flow
When we wire WebRTC into a browser app, we follow the same sequence every time. Skipping steps or doing them out of order creates debugging nightmares later.
Build a tiny WebSocket server that relays offer, answer, and ice-candidate messages between authenticated clients. Do not put any application data on this server. Its only job is introductions.
Use Google's public STUN servers for dev, then provision your own STUN and TURN for production (coturn on a small VM works fine). Issue short-lived TURN credentials from the signaling server — never bake static credentials into the client bundle.
Call new RTCPeerConnection(iceConfig), create data channels before generating the offer, exchange SDPs through signaling, and trickle ICE candidates as they arrive. Treat iceconnectionstatechange as the source of truth for connection health.
Create one ordered reliable channel for CRDT ops and chat, and a separate unordered unreliable channel (ordered: false, maxRetransmits: 0) for position and state updates. Mixing them on one channel means a dropped packet stalls your latency-sensitive stream.
JSON is fine for control messages but wasteful for high-frequency data. Pack position updates into a fixed-size DataView or Uint8Array — you can usually fit a full player state into 24-32 bytes versus 150+ bytes for the JSON equivalent.
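A sketch of one such fixed layout; the 24-byte frame format below is an assumption for illustration, not a standard:

```typescript
// Layout (little-endian): u32 peerId | f32 x | f32 y | f32 vx | f32 vy | u32 tick.
function packState(peerId: number, x: number, y: number, vx: number, vy: number, tick: number): ArrayBuffer {
  const buf = new ArrayBuffer(24);
  const v = new DataView(buf);
  v.setUint32(0, peerId, true);
  v.setFloat32(4, x, true);
  v.setFloat32(8, y, true);
  v.setFloat32(12, vx, true);
  v.setFloat32(16, vy, true);
  v.setUint32(20, tick, true);
  return buf;
}

function unpackState(buf: ArrayBuffer) {
  const v = new DataView(buf);
  return {
    peerId: v.getUint32(0, true),
    x: v.getFloat32(4, true),
    y: v.getFloat32(8, true),
    vx: v.getFloat32(12, true),
    vy: v.getFloat32(16, true),
    tick: v.getUint32(20, true),
  };
}
```

Pin the endianness explicitly (the `true` flags) so peers on different architectures agree, and bump a version byte into the layout once the format needs to evolve.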
Listen for iceconnectionstatechange transitions to disconnected or failed. On failure, call restartIce() and renegotiate through signaling. Expose connection state in the UI so users know whether their input is reaching other peers.
Connection Reliability: ICE Restart and Reconnection
Networks fail. Laptops go to sleep. Wi-Fi switches to cellular. WebRTC has a specific mechanism for surviving these events without blowing away the entire session — ICE restart.
An ICE restart reuses the existing DTLS encryption context but renegotiates candidates from scratch. It preserves data channel state and media stream subscriptions while picking up a new network path. Call pc.restartIce() and trigger a new offer-answer exchange through signaling.
For longer outages, you want a full reconnection path. Keep a session ID on the signaling server, let the client re-authenticate and rejoin the session, and re-sync any CRDT state that drifted during the outage. The reconnection logic is often more code than the happy path, and skipping it is the single most common reason real-time apps feel unreliable.
ICE restart renegotiates network candidates while preserving the encrypted session and open channels, ideal for brief network changes like Wi-Fi to cellular handoffs. Full reconnection tears down the peer connection and rejoins through signaling, needed for longer outages or server-side session loss. Implement both — users toggle between them constantly without knowing it.
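The choice between the two can be isolated into a small pure function driven by iceConnectionState. The escalation policy here (treat disconnected as transient, escalate failed after one restart attempt) is an illustrative choice, not the only reasonable one:

```typescript
type IceState = "new" | "checking" | "connected" | "completed" | "disconnected" | "failed" | "closed";
type Action = "none" | "ice-restart" | "full-reconnect";

// "disconnected" is often transient (Wi-Fi handoff), so try an ICE restart.
// "failed" means the candidate pairs are dead; after one failed restart,
// tear down and rejoin through signaling.
function recoveryAction(state: IceState, restartsAttempted: number): Action {
  if (state === "disconnected") return "ice-restart";
  if (state === "failed") return restartsAttempted === 0 ? "ice-restart" : "full-reconnect";
  return "none";
}
```

Wire this to the iceconnectionstatechange handler: an "ice-restart" result calls pc.restartIce() and triggers a new offer, while "full-reconnect" goes back through the session-rejoin path.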
Security: DTLS, Authenticated Signaling, and Origin Checks
WebRTC ships with strong defaults. Every data channel and media stream is encrypted with DTLS 1.2 or higher, keys are negotiated per-session, and browsers refuse to connect without successful cipher negotiation.
The weak point is almost always signaling. An attacker who can inject SDP or ICE candidates into your signaling channel can redirect the data path, harvest TURN credentials, or cause denial of service. Require an authenticated token on every signaling message and validate that the sender is actually a member of the session they're messaging about.
Validate the origin of WebSocket connections to your signaling server. Reject anything that is not your domain. The same defense-in-depth mindset applies to 3D interactive experiences — our Web3D frontier guide discusses the shared surface area for browser games and interactive sites.
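A sketch of both checks as they might look on the signaling server; the domain and the session-store shape are placeholders:

```typescript
// Allowlist for the Origin header on WebSocket upgrade requests.
const ALLOWED_ORIGINS = new Set(["https://game.example.com"]); // your domains here

function originAllowed(originHeader: string | undefined): boolean {
  return originHeader !== undefined && ALLOWED_ORIGINS.has(originHeader);
}

// Only authenticated members of a session may relay SDP or ICE for it.
function senderMayMessage(
  sessions: Map<string, Set<string>>, // sessionId -> member peer ids
  sessionId: string,
  senderId: string,
): boolean {
  return sessions.get(sessionId)?.has(senderId) ?? false;
}
```

Run originAllowed at upgrade time and senderMayMessage on every relayed message; rejecting early keeps malicious SDP out of the handshake entirely.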
Finally, treat TURN credentials as short-lived secrets. Rotate them every hour, scope them per session, and log when they are issued. An attacker who harvests long-lived TURN creds can silently use your relay infrastructure to anonymize their own traffic — expensive and embarrassing.
Where AI Fits In
Real-time multiplayer is increasingly a substrate for AI-driven interactions. Cursor-level presence updates, speech-to-text transcripts streamed over data channels, and AI NPCs that respond to player actions all ride the same WebRTC pipes.
The pattern we use is simple. AI inference runs on a dedicated peer (often server-side) that joins the WebRTC session like any other participant. Players broadcast events on the data channel, the AI peer consumes them, and the AI peer publishes its own events back. Our guide on AI in game development goes deeper into the inference pipelines behind these kinds of AI peers.
Key Takeaways
WebRTC is not just for video calls. It is the fastest way to move arbitrary data between browsers, and it is the right default for any real-time collaborative or multiplayer browser app where latency matters.
Start with signaling and ICE before you touch data channels. Provision STUN and TURN early — the 20% of sessions that need TURN will silently fail otherwise. Split your channels by reliability needs, pack payloads densely, and build reconnection logic on day one.
For collaborative apps, layer CRDTs on top of reliable data channels and render remote state with a small interpolation buffer. For games, pick a synchronization model (state replication, lag compensation, or rollback) that matches your simulation's determinism guarantees. Either way, WebRTC gets you within a few milliseconds of the physical limit of real-time on the open web.
