The SVC scalability mode set is the small list of canonical strings – L1T1, L1T2, L1T3, L2T2, L3T3, and friends – that a WebRTC developer drops into RTCRtpEncodingParameters.scalabilityMode to tell the encoder how many spatial and temporal layers to produce. It is the actual control surface for SVC in WebRTC. Everything else – codec selection, simulcast, bandwidth allocation – eventually points back at one of these strings.
Most engineers never read the W3C spec. They copy a scalabilityMode value from a sample app, ship it, and move on. That is fine until something breaks and the question becomes “what mode am I actually running, and is it doing what I think it is?” This entry is the lookup page for that moment.
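As a rough sketch of where the string goes and how to read it back: the mode is passed in `sendEncodings` when a transceiver is created, and `getParameters()` on the sender returns the configured encodings. The helper names `buildSvcTransceiverInit` and `configuredModes`, and the small structural interfaces, are illustrative assumptions so the snippet type-checks without browser typings. Note that the value read back is the configured mode; whether it matches what the encoder currently produces can vary by browser.

```typescript
// Minimal structural types (assumed for this sketch) so it compiles without DOM typings.
interface EncodingLike {
  scalabilityMode?: string;
}
interface SenderLike {
  getParameters(): { encodings: EncodingLike[] };
}

// Build the init object that is passed as the second argument of addTransceiver().
function buildSvcTransceiverInit(mode: string) {
  return {
    direction: "sendonly" as const,
    sendEncodings: [{ scalabilityMode: mode }],
  };
}

// Read back the mode configured on each encoding of a sender.
// An undefined entry means no mode is set on that encoding.
function configuredModes(sender: SenderLike): Array<string | undefined> {
  return sender.getParameters().encodings.map((e) => e.scalabilityMode);
}

// In a browser this would be used roughly like:
//   const transceiver = pc.addTransceiver(track, buildSvcTransceiverInit("L1T3"));
//   console.log(configuredModes(transceiver.sender));
```

Keeping the read-back in its own helper makes it easy to log the mode next to other call diagnostics when something looks off.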
Why this notation exists
The W3C webrtc-svc spec defines a small registry of named modes so the browser, the application, and the SFU can all agree on what an encoder is producing without negotiating it byte by byte. Before this registry, SVC in WebRTC was a soup of vendor-specific knobs and SDP munging with no standard way of doing things across browsers, media servers and applications. The mode set replaces that with a single string.
The notation is LxTy:
- L is the number of spatial layers (different resolutions, stacked)
- x is the count – L1 means one resolution, L3 means three resolutions on top of each other
- T is the number of temporal layers (different frame rates inside the same resolution)
- y is the count – T1 means a single frame rate, T3 means three frame rates that SFUs and decoders can selectively drop
So L1T1 is one resolution and one frame rate – effectively no scalability. L3T3 is three resolutions, each with three temporal layers, which is the “everything on” mode that VP9 and AV1 SFUs lean on for large conferences.
There is also a less-known S family – S2T1, S3T3, etc. – where S means “simulcast on a single SSRC, no inter-layer prediction”.
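The notation is regular enough to parse mechanically. The following is a minimal sketch, not a validator for the full registry: `parseScalabilityMode` and `ParsedMode` are hypothetical names, and the pattern only covers the L and S families plus the `_KEY` and `_KEY_SHIFT` suffixes mentioned in this entry.

```typescript
interface ParsedMode {
  family: "L" | "S"; // L = layered SVC, S = the simulcast-style family
  spatialLayers: number;
  temporalLayers: number;
  suffix: string; // "", "_KEY" or "_KEY_SHIFT"
}

// Split a mode string such as "L3T3" or "L2T2_KEY" into its parts.
// Returns null when the string does not follow the LxTy / SxTy shape.
function parseScalabilityMode(mode: string): ParsedMode | null {
  const m = /^([LS])([1-9])T([1-9])(_KEY(?:_SHIFT)?)?$/.exec(mode);
  if (m === null) return null;
  return {
    family: m[1] as "L" | "S",
    spatialLayers: Number(m[2]),
    temporalLayers: Number(m[3]),
    suffix: m[4] ?? "",
  };
}

// Example: parseScalabilityMode("L3T3") yields 3 spatial and 3 temporal layers.
```

A string that parses is not necessarily a mode the browser accepts; the parser only checks the shape of the name.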
The mode list
The canonical predefined modes from the W3C webrtc-svc spec, with one-line meanings:
Temporal-only modes (work with all video codecs)
| Mode | Spatial | Temporal | Meaning |
|---|---|---|---|
| L1T1 | 1 | 1 | One layer, no scalability. The default if nothing is set |
| L1T2 | 1 | 2 | One resolution, two frame rates. Cheap insurance for packet loss |
| L1T3 | 1 | 3 | One resolution, three frame rates. The standard temporal SVC pick |
Spatial + temporal modes, 2:1 resolution ratio (VP9 and AV1 only)
| Mode | Spatial | Temporal | Meaning |
|---|---|---|---|
| L2T1 | 2 | 1 | Two resolutions, one frame rate each |
| L2T2 | 2 | 2 | Two resolutions, two frame rates each |
| L2T3 | 2 | 3 | Two resolutions, three frame rates each |
| L3T1 | 3 | 1 | Three resolutions, one frame rate each |
| L3T2 | 3 | 2 | Three resolutions, two frame rates each |
| L3T3 | 3 | 3 | Three resolutions, three frame rates each. The full SVC mode |
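To make the tables concrete, here is a small worked sketch of what a mode implies for a given capture size. It uses the 2:1 resolution ratio named above for spatial layers, and it assumes each temporal layer doubles the frame rate of the one below it, which is a common arrangement but an assumption here, not something the tables state. `layerLadder` is a hypothetical helper, and real encoders may round dimensions differently.

```typescript
interface LayerLadder {
  resolutions: Array<{ width: number; height: number }>; // lowest first
  frameRates: number[]; // lowest first
}

// Derive the per-layer resolutions and frame rates for a capture size.
// Spatial layers step down by 2:1 per layer; temporal layers are assumed
// to halve the frame rate per layer (an assumption of this sketch).
function layerLadder(
  width: number,
  height: number,
  fps: number,
  spatialLayers: number,
  temporalLayers: number
): LayerLadder {
  const resolutions: Array<{ width: number; height: number }> = [];
  for (let s = 0; s < spatialLayers; s++) {
    const divisor = Math.pow(2, spatialLayers - 1 - s);
    resolutions.push({
      width: Math.round(width / divisor),
      height: Math.round(height / divisor),
    });
  }
  const frameRates: number[] = [];
  for (let t = 0; t < temporalLayers; t++) {
    frameRates.push(fps / Math.pow(2, temporalLayers - 1 - t));
  }
  return { resolutions, frameRates };
}

// L3T3 on a 1280x720 capture at 30 fps:
//   resolutions 320x180, 640x360, 1280x720
//   frame rates 7.5, 15, 30
const ladder = layerLadder(1280, 720, 30, 3, 3);
```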
Codec support matrix
Not every codec supports every mode. The spec is explicit:
| Codec | Modes supported |
|---|---|
| VP8 | L1T1, L1T2, L1T3 (temporal only) |
| H.264 | L1T1, L1T2, L1T3 (temporal only, browser support varies) |
| VP9 | All L modes including _KEY and _KEY_SHIFT variants |
| AV1 | All L modes including _KEY and _KEY_SHIFT variants |
Notes:
- VP8 and H.264 do not do spatial SVC at all in WebRTC. If a developer asks for L2T2 on a VP8 sender, the browser either rejects it or silently falls back
- While simulcast (VP8) supports 3 temporal layers in the spec, today’s browser implementations only encode 2 temporal scalability layers for simulcast
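Because the browser may either reject an unsupported mode or quietly fall back, an application can check the request itself before applying it. The sketch below encodes the matrix above as a lookup, limited to the unsuffixed modes listed in this entry. `isModeSupported` and `temporalOnlyFallback` are hypothetical helpers, and the fallback rule (keep the temporal count, drop to one spatial layer) is an application-level policy assumed for illustration, not what any browser is documented to do.

```typescript
type Codec = "VP8" | "H264" | "VP9" | "AV1";

const TEMPORAL_ONLY: string[] = ["L1T1", "L1T2", "L1T3"];
const SPATIAL: string[] = ["L2T1", "L2T2", "L2T3", "L3T1", "L3T2", "L3T3"];

// Support matrix from this entry, without the _KEY / _KEY_SHIFT variants.
const SUPPORT: Record<Codec, string[]> = {
  VP8: TEMPORAL_ONLY,
  H264: TEMPORAL_ONLY,
  VP9: TEMPORAL_ONLY.concat(SPATIAL),
  AV1: TEMPORAL_ONLY.concat(SPATIAL),
};

function isModeSupported(codec: Codec, mode: string): boolean {
  return SUPPORT[codec].indexOf(mode) !== -1;
}

// Assumed policy: if the mode is unsupported, keep its temporal layer count
// and drop to a single spatial layer; fall back to L1T1 for unknown strings.
function temporalOnlyFallback(codec: Codec, requested: string): string {
  if (isModeSupported(codec, requested)) return requested;
  const m = /^L[1-3]T([1-3])$/.exec(requested);
  return m === null ? "L1T1" : "L1T" + m[1];
}

// Example: L3T3 requested on a VP8 sender becomes L1T3 under this policy.
```

Doing the check in the application makes the downgrade explicit and loggable instead of leaving it to the browser.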
What to pick for which use case
There is no single correct answer. There is a default that fits most cases and edge cases that justify deviating.
1:1 calls. Current best practice is to use neither simulcast nor SVC. Temporal scalability can still be highly useful here, but web browsers do not enable it on the encoder.
Group calls (3 or more participants). Use simulcast or SVC, depending on the codecs you select and support. With simulcast, disable every layer above the lowest that no one is consuming or viewing. With SVC, reduce the bitrate to match the needs of the most demanding viewer. If you have never used either, start with simulcast and get that implementation fine-tuned and optimized before trying to introduce SVC.
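The two group-call rules can be sketched as pure functions that compute the next sender settings. Several things here are assumptions for illustration: the helper names `pruneSimulcast` and `svcMaxBitrate`, the convention that the first encoding is the lowest layer, and the idea that the list of consumed layers and per-viewer bitrate needs arrive from the SFU over the application's own signaling.

```typescript
interface SimulcastEncoding {
  rid: string;
  active: boolean;
}

// Simulcast: always keep the lowest layer (index 0 by assumption) active,
// and keep any other layer active only if someone is consuming it.
function pruneSimulcast(
  encodings: SimulcastEncoding[],
  consumedRids: string[]
): SimulcastEncoding[] {
  return encodings.map((enc, index) => ({
    rid: enc.rid,
    active: index === 0 || consumedRids.indexOf(enc.rid) !== -1,
  }));
}

// SVC: cap the single encoding at the largest bitrate any viewer needs,
// never going below a floor that keeps the base layer usable.
function svcMaxBitrate(viewerNeedsBps: number[], floorBps: number): number {
  return viewerNeedsBps.reduce((max, need) => Math.max(max, need), floorBps);
}

// In a browser, the results would be copied onto the objects returned by
// sender.getParameters() (the `active` and `maxBitrate` fields of each
// encoding) and applied with sender.setParameters().
```

Keeping the decision logic separate from the `setParameters()` call makes it possible to unit-test the policy without a live peer connection.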
Broadcast / one-to-many. The same rules as for group calls apply. Here, there might be no real benefit in optimizing for fewer layers or lower bitrates – especially in larger broadcasts or when the media gets recorded or streamed to other protocols.


