Last updated: May 2, 2026

The SVC scalability mode set is the small list of canonical strings – L1T1, L1T2, L1T3, L2T2, L3T3, and friends – that a WebRTC developer drops into RTCRtpEncodingParameters.scalabilityMode to tell the encoder how many spatial and temporal layers to produce. It is the actual control surface for SVC in WebRTC. Everything else – codec selection, simulcast, bandwidth allocation – eventually points back at one of these strings.

Most engineers never read the W3C spec. They copy a scalabilityMode value from a sample app, ship it, and move on. That is fine until something breaks and the question becomes “what mode am I actually running, and is it doing what I think it is?” This entry is the lookup page for that moment.

Why this notation exists

The W3C webrtc-svc spec defines a small registry of named modes so the browser, the application, and the SFU can all agree on what an encoder is producing without negotiating it byte by byte. Before this registry, SVC in WebRTC was a soup of vendor-specific knobs and SDP munging, with no standard way to express layering across browsers, media servers, and applications. The mode set replaces all of that with a single string.

The notation is LxTy:

  • L marks spatial layers (different resolutions, stacked), and x is their count – L1 means one resolution, L3 means three resolutions on top of each other
  • T marks temporal layers (different frame rates inside the same resolution), and y is their count – T1 means a single frame rate, T3 means three frame rates that SFUs and decoders can selectively drop

So L1T1 is one resolution and one frame rate – effectively no scalability. L3T3 is three resolutions, each with three temporal layers, which is the “everything on” mode that VP9 and AV1 SFUs lean on for large conferences.

There is also a lesser-known S family – S2T1, S3T3, and so on – where S means the spatial layers are encoded independently (no inter-layer prediction) but still travel on a single SSRC, simulcast-style.
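The LxTy/SxTy grammar is regular enough to parse mechanically. Here is a minimal sketch (parseScalabilityMode is a hypothetical helper, not part of any W3C API; the spec's "h" resolution-ratio variants such as L2T2h are ignored for brevity):

```typescript
interface ParsedMode {
  family: "L" | "S";        // L = SVC with inter-layer prediction, S = independent layers
  spatialLayers: number;    // the x in LxTy
  temporalLayers: number;   // the y in LxTy
  keyMode: "" | "_KEY" | "_KEY_SHIFT";
}

// Decompose a canonical scalability mode string into its layer counts.
// Returns null for strings outside the (simplified) registry grammar.
function parseScalabilityMode(mode: string): ParsedMode | null {
  const m = /^([LS])(\d)T(\d)(_KEY(?:_SHIFT)?)?$/.exec(mode);
  if (!m) return null;
  return {
    family: m[1] as "L" | "S",
    spatialLayers: Number(m[2]),
    temporalLayers: Number(m[3]),
    keyMode: (m[4] ?? "") as ParsedMode["keyMode"],
  };
}
```

In a browser, such a string ultimately lands in a call like pc.addTransceiver(track, { sendEncodings: [{ scalabilityMode: "L3T3" }] }).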

The mode list

The canonical predefined modes from the W3C webrtc-svc spec, with one-line meanings:

Temporal-only modes (work with all video codecs)

Mode   Spatial  Temporal  Meaning
L1T1   1        1         One layer, no scalability. The default if nothing is set
L1T2   1        2         One resolution, two frame rates. Cheap insurance for packet loss
L1T3   1        3         One resolution, three frame rates. The standard temporal SVC pick

Spatial + temporal modes, 2:1 resolution ratio (VP9 and AV1 only)

Mode   Spatial  Temporal  Meaning
L2T1   2        1         Two resolutions, one frame rate each
L2T2   2        2         Two resolutions, two frame rates each
L2T3   2        3         Two resolutions, three frame rates each
L3T1   3        1         Three resolutions, one frame rate each
L3T2   3        2         Three resolutions, two frame rates each
L3T3   3        3         Three resolutions, three frame rates each. The full SVC mode
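Under the 2:1 ratio, each lower spatial layer halves both dimensions of the one above it. A quick sketch of the resulting resolution ladder (spatialLayerResolutions is an illustrative helper, not a browser API):

```typescript
// For a 2:1 ratio mode with the given layer count, return the resolution of
// each spatial layer, lowest layer first. L3 at 1280x720 yields
// 320x180, 640x360 and 1280x720.
function spatialLayerResolutions(
  width: number,
  height: number,
  layers: number,
): Array<{ width: number; height: number }> {
  const out: Array<{ width: number; height: number }> = [];
  for (let i = layers - 1; i >= 0; i--) {
    out.push({ width: Math.round(width / 2 ** i), height: Math.round(height / 2 ** i) });
  }
  return out;
}
```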

Codec support matrix

Not every codec supports every mode. The spec is explicit:

Codec  Modes supported
VP8    L1T1, L1T2, L1T3 (temporal only)
H.264  L1T1, L1T2, L1T3 (temporal only, browser support varies)
VP9    All L modes, including _KEY and _KEY_SHIFT variants
AV1    All L modes, including _KEY and _KEY_SHIFT variants

Notes:

  • VP8 and H.264 do not do spatial SVC at all in WebRTC. If a developer asks for L2T2 on a VP8 sender, the browser either rejects it or silently falls back
  • While the spec allows three temporal layers for VP8 simulcast, today’s browser implementations only encode two temporal scalability layers for simulcast
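The matrix above collapses into a small pre-flight check before a mode is requested. This sketch mirrors the table for L modes only (SPATIAL_SVC_CODECS and supportsMode are hypothetical names; in production, prefer the per-codec scalabilityModes lists that the webrtc-svc spec surfaces through RTCRtpSender.getCapabilities("video")):

```typescript
// Codecs that can do spatial SVC in WebRTC, per the matrix above.
const SPATIAL_SVC_CODECS = new Set(["video/VP9", "video/AV1"]);

// Would this codec honor the requested L-family scalability mode?
function supportsMode(mimeType: string, mode: string): boolean {
  const m = /^L(\d)T\d(?:_KEY(?:_SHIFT)?)?$/.exec(mode);
  if (!m) return false;                      // outside the L-mode grammar
  const spatialLayers = Number(m[1]);
  if (spatialLayers === 1) return true;      // temporal-only: all codecs
  return SPATIAL_SVC_CODECS.has(mimeType);   // spatial SVC: VP9/AV1 only
}
```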

What to pick for which use case

There is no single correct answer. There is a default that fits most cases and edge cases that justify deviating.

1:1 calls. Current best practice is to use neither simulcast nor SVC. Temporal scalability would still be highly useful here, but web browsers do not enable it on the encoder.

Group calls (3 or more participants). Use simulcast or SVC, depending on the codecs you select and support. With simulcast, disable the non-lowest layers that no one is consuming. With SVC, reduce bitrates to match the needs of the most demanding viewer. If you have used neither, start with simulcast and get that implementation fine-tuned and optimized before introducing SVC.
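The "disable layers no one is watching" advice can be sketched as a pure function over the sender's simulcast encodings (toggleEncodings is a hypothetical helper; in a real app the returned array would be applied via sender.setParameters(), and the consumed rids would come from your SFU's subscription state):

```typescript
interface SimulcastEncoding {
  rid?: string;     // simulcast layer id, e.g. "q", "h", "f"
  active?: boolean;
}

// Keep the lowest layer always on; switch higher layers off when no
// viewer is currently consuming them.
function toggleEncodings(
  encodings: SimulcastEncoding[],
  consumedRids: Set<string>,
): SimulcastEncoding[] {
  return encodings.map((e, i) => ({
    ...e,
    active: i === 0 || (e.rid !== undefined && consumedRids.has(e.rid)),
  }));
}
```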

Broadcast / one-to-many. The same rules as group calls apply. Here there may be no real benefit in optimizing for fewer layers or lower bitrates – especially in larger broadcasts, or when the media gets recorded or streamed out to other protocols.


Looking to learn more about WebRTC? 

Check my WebRTC training courses

About WebRTC Glossary

The WebRTC Glossary is an ongoing project where users can learn more about WebRTC related terms. It is maintained by Tsahi Levent-Levi of BlogGeek.me.