How To Implement Multipoint Video Using WebRTC: Broadcast

January 29, 2013
Here's how you broadcast using WebRTC. This is the second post dealing with multipoint:
  1. Introduction
  2. Broadcast (this post)
  3. Small groups
  4. Large groups
If you are developing a WebRTC service that requires broadcasting, then there are several aspects you need to consider. First off, don't assume you will be able to broadcast directly from the browser – doing so isn't healthy, for several reasons. Let's start with an initial analysis of the various resources required for sending media over WebRTC (a short code sketch of this pipeline follows the list):
  1. Camera acquisition, where the browser grabs raw data from the camera
  2. Video encoding, where the browser encodes the raw video data. This part is CPU intensive, especially now when there's no hardware acceleration available for the VP8 video codec
  3. Sending the encoded packets over the network, where there are two resources being consumed:
    1. Bandwidth on the uplink, where it is usually scarce
    2. For lack of a better term, the "network driver" – the work of pushing each of these packets through the sender's network stack, once per open connection
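To make these resource buckets concrete, here's a minimal sketch of the naive, browser-only approach – acquire the camera once, then open a separate peer connection towards every viewer. It uses today's promise-based API rather than the prefixed callback API of early 2013, and sendOfferToViewer() is a hypothetical signaling placeholder rather than anything WebRTC defines:

```typescript
// A minimal sketch of the naive, browser-only approach: acquire the camera
// once, then open a separate RTCPeerConnection towards every viewer.
// ICE handling and error handling are omitted.

// Hypothetical signaling helper – WebRTC leaves signaling entirely up to the
// application, so in a real app this would be a WebSocket or XHR message.
function sendOfferToViewer(viewerId: string, offer: RTCSessionDescriptionInit): void {
  console.log(`would send offer to viewer ${viewerId}`, offer);
}

async function naiveBroadcast(viewerIds: string[]): Promise<void> {
  // 1. Camera acquisition – done once, no matter how many viewers there are.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

  // 2 + 3. One peer connection per viewer: every pass through this loop is
  // another open session consuming uplink bandwidth on the broadcaster's side.
  for (const viewerId of viewerIds) {
    const pc = new RTCPeerConnection();
    stream.getTracks().forEach(track => pc.addTrack(track, stream));

    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    sendOfferToViewer(viewerId, offer);
  }
}
```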
The first two – camera acquisition and encoding – are done once, so they take the same amount of effort as a regular P2P call. The third one gets multiplied by the number of endpoints you wish to reach with your broadcast. As there is no real way to use multicast, what gets used is multi-unicast – each receiving side gets its own media stream and its own open session with our broadcaster, exactly the per-viewer loop in the sketch above. While home broadband might let you open a handful of such sessions and "broadcast" over them, it won't scale up, and it definitely won't work for mobile handsets: there it will eat up battery life as well as your data plan, assuming it works at all.

And these are the relatively easy problems – we haven't even dealt with the different capabilities of different receiving clients. For this reason, when planning on broadcasting, you will need to stick with a server-side solution. Here's what a simplified architecture for broadcasting looks like (I've ignored signaling for the sake of simplicity, and because WebRTC ignores it):

The things to note:
  1. The client-side broadcaster sends out a single media stream to the server, which takes care of the heavy lifting here (the first sketch after this list shows the broadcaster's side).
  2. The server side does the work of "multiplying" the media stream into a stream in front of each participant – and with each of them it holds an open, active session. This can be either simple or nasty, depending on how smart you want to be about it (the second sketch after this list shows roughly where this decision lives):
    1. The easiest way: the server acts as a proxy, not holding a real "session" in front of each client – it just forwards whatever it receives to the clients it serves. Easy to do, and it costs only bandwidth, but it doesn't give the best quality achievable
    2. You can decide to transcode – changing resolution, frame rate, bitrate, whatever – in front of each receiver. Here it is also relatively easy to hold context for each receiver, providing the best quality
    3. You can do the above to generate several bit streams that fit different classes of devices (laptops, tablets, smartphones, etc.), and then send those out proxy-style
    4. You can maintain sessions in front of each client, but not delve too deep into the packets – giving a bit better quality, at a reasonable cost
  3. The server side is… more than a single server, in most cases – either because it is easier to scale that way, or because it is better to keep the transcoders and the proxies separated in the design itself.
  4. While broadcasting, you might want to consider supporting non-WebRTC clients – serving pure HTML5 video, Flash or iOS devices natively. This means another gateway function is required here.
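To ground point 1 above, here's a rough sketch of the broadcaster's side in the server-based architecture. Contrast it with the per-viewer loop earlier: the broadcaster holds exactly one session, and the server does the fan-out. The /broadcast endpoint and its JSON body are made-up placeholders, and trickle ICE and error handling are skipped:

```typescript
// Broadcaster side in the server-based architecture: one upstream
// RTCPeerConnection towards the media server, regardless of audience size.
// The /broadcast endpoint and its JSON shape are hypothetical placeholders.

async function publishToServer(serverUrl: string): Promise<RTCPeerConnection> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

  const pc = new RTCPeerConnection();
  stream.getTracks().forEach(track => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Hand the offer to the server and apply its answer – a single session,
  // no matter how many people end up watching (trickle ICE omitted).
  const response = await fetch(`${serverUrl}/broadcast`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sdp: pc.localDescription }),
  });
  const { sdp: answer } = await response.json();
  await pc.setRemoteDescription(answer);

  return pc;
}
```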
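And for point 2, a very rough sketch of where the proxy-versus-transcode decision lives on the server side. This ignores the entire WebRTC transport stack (ICE, DTLS-SRTP, RTCP feedback, congestion control); MediaPacket, ViewerSink and Transcoder are made-up types, used purely to show the shape of the fan-out:

```typescript
// Schematic server-side fan-out. MediaPacket, ViewerSink and Transcoder are
// made-up types; the real WebRTC transport stack is ignored entirely.

interface MediaPacket {
  payload: Uint8Array;
  timestampMs: number;
}

interface ViewerSink {
  profile: 'laptop' | 'tablet' | 'smartphone';
  send(packet: MediaPacket): void; // push one packet towards one viewer
}

interface Transcoder {
  // Adapt resolution / frame rate / bitrate for a given class of device.
  convert(packet: MediaPacket, profile: ViewerSink['profile']): MediaPacket;
}

class Broadcast {
  private readonly viewers = new Set<ViewerSink>();

  // No transcoder means option 1 (pure proxy); with one, options 2/3 apply.
  constructor(private readonly transcoder?: Transcoder) {}

  addViewer(sink: ViewerSink): void {
    this.viewers.add(sink);
  }

  // Called for every packet arriving on the broadcaster's single upstream.
  onPacketFromBroadcaster(packet: MediaPacket): void {
    for (const viewer of this.viewers) {
      if (this.transcoder) {
        // Options 2/3: adapt the stream per receiver or per device class –
        // best fit for each viewer, but heavy on CPU.
        viewer.send(this.transcoder.convert(packet, viewer.profile));
      } else {
        // Option 1: pure proxy – forward whatever was received; costs only
        // bandwidth, quality is whatever the broadcaster sent.
        viewer.send(packet);
      }
    }
  }
}
```

The pure-proxy branch is the cheap one; anything that touches the transcoder is where the computing bill from the closing paragraph comes from.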
If you plan on broadcasting, make sure you know where you are headed with it – map your requirements against your business model to make sure they fit, because the bandwidth and the computing power required for these tasks don't come cheap.
