The Future of WebRTC Live Broadcast

February 18, 2016
The future is on the viewer side. Live broadcast is all the rage when it comes to WebRTC - in 2015 it grew 3-fold. It is a hard nut to crack, but there are solutions out there already, including the new Spotlight service from TokBox.

WebRTC Live Broadcast Today

If you look closely, most of the deployments today for live broadcast using WebRTC look somewhat like the following diagram:
[Diagram: How you live broadcast using WebRTC today]
What happens today is that WebRTC is used only for the presenter - the acquisition of the initial video happens using WebRTC, right up to the broadcast server. There, the media gets transcoded and changes format to the dialects used for broadcasting - Flash, HLS and/or MPEG-DASH.

The problem is that these broadcast dialects add latency - check this explanation about HLS to understand why. With our infatuation with real time, and the drive to move any type of workload and use case towards real time, it is no wonder that the above architecture isn't good enough. In my discussions, many entrepreneurs said they would love to see this obstacle removed, with live broadcasts having a latency of mere seconds (if not less).

The current approaches won't work, because they rely heavily on the ability to buffer content before playing it, and that buffering adds up to latency.
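To see where that latency comes from, here is a back-of-the-envelope sketch in TypeScript. The numbers are illustrative assumptions - segment duration, player buffer depth and encoding overhead all vary by deployment - but they show why segmented delivery easily lands in the tens of seconds.

```typescript
// Back-of-the-envelope HLS latency estimate. Every number used below is an
// illustrative assumption, not a measurement from any specific deployment.
interface HlsLatencyInputs {
  segmentDurationSec: number;   // length of each HLS media segment
  segmentsBuffered: number;     // players typically buffer a few segments before starting
  encodeAndPackageSec: number;  // transcoding + segmenting on the broadcast server
  cdnAndNetworkSec: number;     // propagation through the CDN and the last mile
}

function estimateHlsLatency(i: HlsLatencyInputs): number {
  // The dominant term is segment duration times the player's buffer depth:
  // a segment can only be fetched once it has been fully written.
  return (
    i.segmentDurationSec * i.segmentsBuffered +
    i.encodeAndPackageSec +
    i.cdnAndNetworkSec
  );
}

// Classic 6-second segments with a 3-segment buffer already put you in the
// ~20 second range discussed later in this post.
console.log(
  estimateHlsLatency({
    segmentDurationSec: 6,
    segmentsBuffered: 3,
    encodeAndPackageSec: 1,
    cdnAndNetworkSec: 1,
  })
); // 20
```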

WebRTC Live Broadcast Tomorrow

This is why a new architecture is needed - one where low latency and real time are imperatives and not an afterthought. Since standardization and deployment take time, the best alternative out there today is utilizing WebRTC, which is already available in most browsers.
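On the viewer's side this requires nothing exotic from the browser. Below is a minimal sketch of a viewer joining such a broadcast with a receive-only peer connection; the /broadcast/viewer signaling endpoint is a hypothetical placeholder, not any specific product's API.

```typescript
// Minimal viewer-side sketch, assuming the broadcast server exposes a
// hypothetical HTTP endpoint that takes an SDP offer and returns an SDP answer.
async function watchBroadcast(videoElement: HTMLVideoElement): Promise<void> {
  const pc = new RTCPeerConnection();

  // A viewer only receives media - it never sends any.
  pc.addTransceiver("video", { direction: "recvonly" });
  pc.addTransceiver("audio", { direction: "recvonly" });

  pc.ontrack = (event) => {
    // Attach the incoming broadcast stream to the page's <video> element.
    videoElement.srcObject = event.streams[0];
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Hypothetical signaling call - replace with whatever your server actually uses.
  const response = await fetch("/broadcast/viewer", {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  const answerSdp = await response.text();
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });
}
```

The hard part isn't the viewer - it is the server that has to terminate all of these peer connections, which is what the diagram below is about.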
[Diagram: How WebRTC live broadcast will look tomorrow]
The main difference here? The broadcast server needs to be able to send WebRTC at scale, and not only handle it on its ingress.

To do this, we need a totally different server-side WebRTC media implementation than the alternatives on the market today (both open source and commercial). Current WebRTC implementations on the server are designed to work almost back-to-back - they simulate a full WebRTC client per connection. That's all nice and well, but it can't scale to hundreds, thousands or millions of connections.

To get there, the server will first need to break its dependency on the presenter - it will need to be able to process media by itself, but do so in a way that is optimized for large-scale sessions (a rough sketch of that forwarding idea follows the list below). This, in turn, means rethinking how a WebRTC media stack is architected and built. Someone will need to rebuild WebRTC from the ground up with this single use case in mind.

I am leaving a lot of the details out of this article for two reasons:
  1. While I am certain it can be done, I don't have the whole picture in my mind at the moment
  2. I have a different purpose here, which we are now getting to
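Here is a rough TypeScript sketch of that forwarding idea: the presenter's already-encoded packets are fanned out to every viewer, instead of each viewer getting its own full client-style pipeline. The interfaces are hypothetical placeholders rather than a real media server API, and the sketch deliberately ignores the parts that make this genuinely hard - per-viewer feedback, retransmissions, bandwidth estimation and keyframe requests.

```typescript
// Fan-out sketch: forward the presenter's encoded packets as-is, with no
// per-viewer decoding or transcoding. All types here are hypothetical.
interface EncodedPacket {
  ssrc: number;            // RTP source identifier
  sequenceNumber: number;  // RTP sequence number
  payload: Uint8Array;     // already-encoded audio/video - never touched here
}

interface ViewerTransport {
  // Push one packet toward a single viewer (e.g. over its own SRTP leg).
  send(packet: EncodedPacket): void;
}

class Broadcast {
  private viewers = new Set<ViewerTransport>();

  addViewer(viewer: ViewerTransport): void {
    this.viewers.add(viewer);
  }

  removeViewer(viewer: ViewerTransport): void {
    this.viewers.delete(viewer);
  }

  // Called for every packet arriving from the presenter. Forwarding instead
  // of re-encoding is what makes scaling past a handful of viewers plausible.
  onPresenterPacket(packet: EncodedPacket): void {
    for (const viewer of this.viewers) {
      viewer.send(packet);
    }
  }
}
```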

A Skillset Issue

To build such a thing, you can't just declare that you want low latency broadcast capabilities - especially not if you are new to video processing and WebRTC. The only teams that can get such a thing built are ones with experience in video streaming, video conferencing and WebRTC - that's three different domains of expertise. People with that combination exist, but they are scarce.

Is it worth it?

Optimizing down from 20 seconds of latency to 2 seconds of latency. That's what we're talking about. Is investing in it worth the effort? I don't have a good answer for this one.
