In video coding, temporal scalability is the ability to decode only some of the frames in a video stream instead of the whole stream. This enables an SFU, for example, to reduce the bitrate sent towards viewers who don't have enough bandwidth or CPU to handle the whole stream. It also lets devices that lose a packet continue decoding the stream partially until an intra-frame is received.
What is Temporal Scalability?
Temporal scalability is a concept that is crucial to the smooth operation of video streams, especially in the context of WebRTC and group video conferencing. To understand temporal scalability, we first need to look at the structure of video streams.
A video stream is composed of different types of frames. There are I-frames, or intra-frames, which contain all the data needed to render a specific frame, much like how a JPG stores a complete image. Then there are P-frames, or predicted frames, which store only the changes from one frame to the next, relying on previous frames for the full picture. Usually, an I-frame will be followed by many consecutive P-frames, each dependent on the one before it, creating a long dependency chain.
The Challenge of Packet Loss
This system of dependent P-frames works efficiently as long as the network is stable. However, when packet loss occurs, it can disrupt the entire chain of frames, since each P-frame is dependent on the last. To resolve this, a new I-frame needs to be transmitted, which can be bandwidth-intensive.
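The effect of a broken dependency chain can be sketched in a few lines. This is a toy model, not an actual decoder: each "P" frame is assumed to depend only on its immediate predecessor, and the frame labels are hypothetical.

```python
def decodable_frames(frames, lost):
    """Return indices of frames that can still be decoded after `lost` indices are dropped."""
    decodable = set()
    for i, ftype in enumerate(frames):
        if i in lost:
            continue
        if ftype == "I":
            # an intra-frame is always decodable on its own
            decodable.add(i)
        elif ftype == "P" and (i - 1) in decodable:
            # a P-frame needs its immediate predecessor to have been decoded
            decodable.add(i)
    return sorted(decodable)

stream = ["I", "P", "P", "P", "P", "P", "I", "P", "P"]
# Losing frame 2 makes frames 3-5 undecodable; recovery only at the next I-frame (index 6)
print(decodable_frames(stream, lost={2}))  # → [0, 1, 6, 7, 8]
```

Everything after the lost packet is frozen until the next (expensive) I-frame arrives, which is exactly the problem temporal scalability addresses.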
How Temporal Scalability Works
Temporal scalability offers a solution to this problem. It introduces the concept of layering frames, typically referred to as L0 and L1. Let's say we're transmitting at 30 frames per second. We start with a keyframe, then alternate: L0 frames depend only on previous L0 frames, while each L1 frame depends on the L0 frame before it. The crucial property is that no frame ever depends on an L1 frame, which creates a structured hierarchy of frames rather than a single fragile chain.
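A minimal sketch of this two-layer pattern, under the assumptions above (even-positioned frames are L0 and depend on the previous L0 frame; odd-positioned frames are L1 and depend on the L0 frame just before them; the names and pattern are illustrative, not a codec's exact layout):

```python
def temporal_pattern(num_frames):
    """Return (layer, depends_on) for each frame index; frame 0 is the keyframe."""
    pattern = []
    for i in range(num_frames):
        if i == 0:
            pattern.append((0, None))    # keyframe, no dependency
        elif i % 2 == 0:
            pattern.append((0, i - 2))   # L0 depends on the previous L0 frame
        else:
            pattern.append((1, i - 1))   # L1 depends on the preceding L0 frame
    return pattern

for idx, (layer, dep) in enumerate(temporal_pattern(6)):
    print(f"frame {idx}: L{layer}, depends on {dep}")
```

Note that every `depends_on` target is an L0 frame: the L1 frames sit at the leaves of the dependency tree, so removing them never breaks anything else.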
The Benefits of Temporal Scalability
The real advantage of temporal scalability is the flexibility it provides. For instance, if we want to reduce the frame rate from 30 to 15 frames per second, we can simply drop all L1 frames while keeping the dependency chain between the L0 frames intact. The SFU can make this decision per viewer: those with lower available bandwidth receive the 15-frames-per-second stream, and by extension a lower bitrate, without any re-encoding.
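The SFU-side decision can be sketched as a simple per-frame filter. The `Frame` type and the bandwidth flag here are hypothetical stand-ins for whatever the SFU actually tracks:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    seq: int
    temporal_layer: int  # 0 = base layer, 1 = droppable enhancement layer

def forward(frame, viewer_has_bandwidth):
    # L0 frames must always be forwarded: other frames depend on them.
    if frame.temporal_layer == 0:
        return True
    # L1 frames can be safely dropped, since nothing depends on them.
    return viewer_has_bandwidth

# A 30 fps stream with alternating L0/L1 frames, sent to a constrained viewer
frames = [Frame(i, i % 2) for i in range(6)]
sent = [f.seq for f in frames if forward(f, viewer_has_bandwidth=False)]
print(sent)  # → [0, 2, 4]: half the frames, 15 fps instead of 30
```

The key point is that this is pure forwarding logic: the SFU never decodes or re-encodes video, it just reads the temporal layer id and decides per packet.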
This ability to build a dependency tree within the encoder and selectively drop frames to adjust the frame rate is what defines temporal scalability.
Temporal Scalability in Different Codecs
Temporal scalability is available in VP8, but only when simulcast is used. In VP9 and AV1 you'll find SVC (Scalable Video Coding), which includes temporal scalability as part of its capabilities. To some extent, it is also available in H.264. Interestingly, Google has modified the implementation in VP8 from three temporal layers to just two, which has implications for the flexibility and quality of streams.
Impact on Selective Forwarding Units (SFU)
The introduction of temporal scalability significantly enhances the capabilities of an SFU. Without these techniques, an SFU would have to forward whatever it receives. However, with simulcast and temporal scalability, the number of alternatives available to an SFU increases.
For example, from three different streams at varying bitrates, we can now effectively have six options by dropping frames to reduce the frame rate from 30 to 15 frames per second. This added flexibility allows for larger calls with higher quality than would be possible otherwise.
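Counting the options is straightforward. The bitrates below are made-up illustrative numbers, and the assumption that dropping L1 saves roughly 40% of the bitrate (rather than 50%) is mine, since L1 frames are usually cheaper to encode than L0 frames:

```python
# Three simulcast streams times two frame rates = six forwarding options
simulcast_kbps = {"low": 150, "medium": 500, "high": 1500}

options = []
for name, kbps in simulcast_kbps.items():
    options.append((name, 30, kbps))                 # full stream, both layers
    options.append((name, 15, int(kbps * 0.6)))      # L1 dropped: half the frame rate

for stream, fps, kbps in options:
    print(f"{stream}: {fps} fps @ ~{kbps} kbps")

print(len(options))  # → 6 options instead of 3
```

More options means the SFU can match each viewer's bandwidth more closely, which is what makes larger, higher-quality calls possible.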
Temporal scalability is just one of the many terms and concepts crucial to understanding and effectively working with WebRTC. If you’re looking to expand your knowledge with a comprehensive training course on WebRTC, check out webrtccourse.com. Thank you for exploring the intricacies of temporal scalability with us.