What Are the Challenges of Building Your Own WebRTC SFU?

S doesn't stand for Simple in WebRTC SFU.

I have noticed recently that more and more companies are attempting to build their own SFU. SFU stands for Selective Forwarding Unit, and it is by far the most popular and cost-efficient architecture today for multiparty video with WebRTC. With it, all participants send their video to a single entity (usually in multiple resolutions/bitrates), and that single entity decides (selectively) which of the incoming video streams to route to each participant.

One popular SFU implementation is the Jitsi Videobridge.
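To make the "selective" part concrete, here is a minimal sketch in Python of one possible forwarding rule: each sender uploads a few encodings, and for every receiver the SFU forwards the highest-bitrate one that fits that receiver's bandwidth. The `Layer` type, the layer names, the bitrates and the per-receiver budgets are all assumptions made up for illustration; this is not how Jitsi Videobridge or any other specific SFU is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Layer:
    """One encoding a sender uploads to the SFU (hypothetical model)."""
    name: str          # illustrative label such as "low", "mid", "high"
    bitrate_kbps: int  # approximate bitrate of this encoding


def pick_layer(layers: List[Layer], budget_kbps: int) -> Optional[Layer]:
    """Return the highest-bitrate layer that fits the receiver's budget.

    Falls back to the lowest layer when nothing fits, and returns None
    only when the sender offers no layers at all.
    """
    if not layers:
        return None
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    fitting = [layer for layer in ordered if layer.bitrate_kbps <= budget_kbps]
    return fitting[-1] if fitting else ordered[0]


def route(layers: List[Layer], budgets_kbps: Dict[str, int]) -> Dict[str, str]:
    """Map each receiver id to the name of the layer the SFU would forward."""
    plan: Dict[str, str] = {}
    for receiver, budget in budgets_kbps.items():
        layer = pick_layer(layers, budget)
        if layer is not None:
            plan[receiver] = layer.name
    return plan


# Made-up numbers: one sender with three encodings, three receivers.
layers = [Layer("low", 150), Layer("mid", 500), Layer("high", 1500)]
plan = route(layers, {"alice": 2000, "bob": 600, "carol": 100})
```

A real SFU would re-run a decision like this continuously as bandwidth estimates change, and would also weigh things like who is speaking and how large each video is rendered.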

Up until today, an SFU for WebRTC was rather simplistic. You had VP8 to contend with as a developer but that's about it. Life was good. You built your service and mostly whined about incompatibility between browsers.

Things have changed.

Here are a few things that you need to take into consideration when you either build your own WebRTC SFU or adopt/acquire one from others:

  • Do you use VP8 or VP9 in your SFU?
    • Google is already adding VP9 to Chrome
    • How long will it take until it catches on for some use cases?
    • VP9 is a better codec, so why not use it?
  • Can it support multiple codecs simultaneously?
    • Before the end of this year, we will have VP8, VP9 and H.264 available to us in browsers
    • Not all browsers will support them all
    • VP8 seems like the lowest common denominator at the moment
    • This may change to H.264 when Microsoft Edge and Chrome support it though
    • An SFU supporting only VP8 will start looking old pretty fast - and won't work on Edge
    • Staying in H.264/VP8 land will not perform as well as VP9 in terms of perceived quality for the users
    • So it would be beneficial to be able to use whatever is available at the best possible quality
    • Which makes it a lot more complex for an SFU - more decisions to make with more data points to take into consideration
  • Mobile
    • Mobile doesn't like multiple, simultaneous video decoders
    • Especially not when this is hardware accelerated - not all smartphone hardware can work this way
    • For mobile devices, you just might want to select a single video stream to send to them - or combine multiple video streams into a single one (which looks more like an MCU, but who cares?)
  • Broadcast
    • In many new use cases, people want to have multiple participants chatting, but many more passively viewing
    • Can an SFU scale there? And if it can't, what will you do instead?
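Two of the decisions above can be sketched in a few lines of Python: picking the best codec that everyone in a session can handle, and capping how many video streams go to a mobile device. This is only an illustration under stated assumptions. The preference order (VP9, then H.264, then VP8), the participant names, the codec sets and the stream caps are all hypothetical, and the sketch assumes the SFU forwards media as-is without transcoding.

```python
from typing import Dict, List, Optional, Sequence, Set

# Assumed preference order for illustration: best perceived quality first.
CODEC_PREFERENCE = ("VP9", "H264", "VP8")


def common_codec(
    supported: Dict[str, Set[str]],
    preference: Sequence[str] = CODEC_PREFERENCE,
) -> Optional[str]:
    """Pick the most preferred codec that every participant supports.

    Returns None when there are no participants or no shared codec, in
    which case the session would need to be split or transcoded.
    """
    if not supported:
        return None
    shared = set.intersection(*supported.values())
    for codec in preference:
        if codec in shared:
            return codec
    return None


def streams_to_forward(
    ranked_senders: List[str],
    is_mobile: bool,
    mobile_cap: int = 1,   # hypothetical cap: a single decoder on mobile
    desktop_cap: int = 4,  # hypothetical cap for desktop receivers
) -> List[str]:
    """Keep only the top-ranked senders a receiver is assumed able to decode."""
    cap = mobile_cap if is_mobile else desktop_cap
    return ranked_senders[:cap]


# Made-up session: participant "c" only supports H.264.
session = {
    "a": {"VP8", "VP9", "H264"},
    "b": {"VP8", "H264"},
    "c": {"H264"},
}
codec = common_codec(session)
mobile_view = streams_to_forward(["a", "b", "c"], is_mobile=True)
```

Picking one codec per session is the simplest policy. An SFU that wants "whatever is available at the best possible quality" would have to decide per sender and per receiver instead, which is exactly where the added complexity comes from.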

Like any other technology, once you get down to it, there's more to do than the proof of concept. Consider these aspects at the beginning of your project - and not when you need to seriously rethink your architecture.

Tsahi Levent-Levi


Independent WebRTC analyst. I help companies ship real-time communications they can actually monitor. 20+ years in the comms space, last 13 focused on WebRTC.
