Too many WebRTC open source projects. Not enough good ones.
There are many shapes and sizes of WebRTC servers. In this article, I want to focus on WebRTC media servers, and in this category, the open source alternatives.
Ever gone to github to search for something you needed for your WebRTC project? Great. Today, there are almost as many WebRTC github projects as there are WebRTC LinkedIn profiles.
Some of these code repositories really are popular and useful while others? Less so.
Here’s the most glaring example for me –
When you just search for WebRTC on github, and let it select the “Best match” by default for you, you’ll get PubNub’s sample of using PubNub as your signaling for a simple 1:1 video call using WebRTC. And here’s the funny thing – it doesn’t even work any longer. Because it uses an old PubNub WebRTC SDK. And that’s for an area that requires less of an effort from you anyway (signaling).
You can read more about this in my open source WebRTC media server github comparison.
Let’s assume you actually did find a WebRTC open source media server that you like (on github of course). How do you know that it is any good? Here are 10 different signals (not WebRTC ones) that you can use to make that decision.
1. Do You Grok the Code?
If you are going to adopt an open source media server for your WebRTC project then expect to need to dive into the code every once in a while.
This is something you’ll have to do either to get the darn thing to work, fix a bug, tweak a setting or even write the functionality you need in a plugin/add-on/extension or whatever name that media server uses for making it work.
In many of the cases I see when vendors rely on an open source WebRTC media server framework, they end up having to dig in its code, sometimes to the point of taking full ownership of it and forking it from its baseline forever.
To make sure you’re making the right decision – check first that you understand what you’re getting yourself into and try to grok the code first.
My own personal preference would be code that has comments in it (I know I have high standards). I just can’t get myself behind the notion of code that explains itself (it never does). So be sure to check if the non-obvious parts (think bandwidth estimation) are commented properly while you’re at it.
2. Is the Code Fresh?
Apple just landed with WebRTC. And yes. We’re all happyyyyy.
But now we all need to shift our focus to H.264. Including that WebRTC media server you’ve been planning to use.
Oh – and Google? They just announced they will be migrating slowly from Plan B to Unified Plan. Don’t worry about the details – it just changes the way multiple streams are handled. Affecting most group calling implementations out there.
And there was that getStats() API change recently, since the draft specification of WebRTC finally settled on its definition, which differed from Chrome’s implementation.
The end result?
Code written a year or two ago has serious issues in actually running.
Without even talking about upgrades, updates, security patches, etc.
Just baseline “make it work” stuff.
When you check that github page of the WebRTC media server you plan on adopting – make sure to look when it was last updated. Bonus points if you check what that update was about and how frequently updates take place.
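If you want to put numbers on “fresh”, here is a minimal sketch of the idea. It assumes you have already collected the commit dates of a repository as ISO date strings (from the commit history page or any other way); the function name and the sample dates are made up for illustration.

```python
from datetime import datetime
from statistics import median

def freshness(commit_dates, now):
    """Return (days since the last commit, median days between commits)."""
    stamps = sorted(datetime.fromisoformat(d) for d in commit_dates)
    if not stamps:
        raise ValueError("need at least one commit date")
    days_since_last = (now - stamps[-1]).days
    # Gaps between consecutive commits show how regular the updates are.
    gaps = [(b - a).days for a, b in zip(stamps, stamps[1:])]
    median_gap = median(gaps) if gaps else None
    return days_since_last, median_gap

# Made-up commit dates for a hypothetical repository.
sample = ["2017-03-01", "2017-03-15", "2017-04-02", "2017-06-20"]
now = datetime(2017, 7, 1)
print(freshness(sample, now))
```

A small number of days since the last commit together with a small median gap suggests steady maintenance. What counts as “small” is your call; the numbers won’t tell you what the updates were about, so still read a few of them.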
3. Anyone Using It?
Nothing like making the same mistakes others are making.
Err… wrong expression.
What you want is a popular open source WebRTC media server. If a project isn’t popular, there’s a reason for it, and that reason won’t be that you found a diamond in the rough.
Go for a popular framework. One that is battle tested. Preferably used by big names. In production. Inside commercial products.
Why? Because it gives you two things you need:
It gives you validation that this thing is worth something – others have already used it.
And it gives you an ecosystem of knowledge and experience around it. This can be leveraged sometimes for finding some freelancers who have already used it or to get assistance from more people in the “community”.
I wouldn’t pick a platform only based on popularity, but I would use it as a strong signal.
4. Is This Thing Documented?
What you get in a media framework for WebRTC is a sort of engine. Something that has to be connected to your whole app. To do that, you need to integrate with it somehow – either through its APIs when you link against it – or a lot more commonly these days through REST APIs (or GraphQL or whatever). You might need both if you plan on writing your own addon/extension to it.
And to do that, you need to know the interface that was designed especially for you.
Which means something written. Documentation.
When it comes to open source frameworks, documentation is not guaranteed to be of specific quality – it will vary greatly from one project to another.
Just make sure the one you’re using is documented to a level that makes it… understandable.
If possible, check that the documentation includes:
- Some introductory material to the makeup and architecture of the project
- An API reference
- A few demos and examples of how to use this thing
- Some information about installation, configuration, maintenance, scaling, …
The more documentation the better off you will be a year down the road.
5. Is It Debuggable?
WebRTC is real time and real time is hard to debug.
It gets harder still if what you need to look at isn’t the signaling part but rather the media part.
I know you just LOVE adding your own printf and cout statements to your C++ code and trying to reproduce that nagging bug. Or better yet – start collecting PCAP files and… err… read them.
It would be nice though if some of that logging, debugging, etc would be available without you having to always add these statements into the code. Maybe even have a mechanism in place with different log levels – and have sensible information written into the logs so the time you’ll need to invest in finding bugs will be reduced.
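To show what I mean by log levels, here is a rough sketch using Python’s standard logging module. It is only an illustration of the mechanism, not how any specific media server does it; the logger names ("mediaserver", "mediaserver.rtp") and the messages are hypothetical.

```python
import logging

lines = []

class ListHandler(logging.Handler):
    """Collects formatted records in memory; a real server would write to a file or syslog."""
    def emit(self, record):
        lines.append(self.format(record))

handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

server_log = logging.getLogger("mediaserver")  # hypothetical logger name
server_log.addHandler(handler)
server_log.setLevel(logging.WARNING)           # quiet by default in production
server_log.propagate = False

rtp_log = logging.getLogger("mediaserver.rtp")  # child logger inherits WARNING
rtp_log.debug("packet seq=1001 lost")           # dropped at WARNING level
rtp_log.warning("packet loss above threshold")  # kept

rtp_log.setLevel(logging.DEBUG)                 # turn up only the part being debugged
rtp_log.debug("packet seq=1002 lost")           # now kept

print("\n".join(lines))
```

The point is that the verbose statements are already in the code, and you choose per subsystem how much of them you want to see, without rebuilding anything.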
Also – make sure it is easy to plug a monitoring system into the “application” metrics of the media server you are going to use. If there is no easy way to test the health of this thing in production, it is either not used in production or requires some modifications to get there. You don’t want to be in that predicament.
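As a rough idea of what a monitoring system would do with such metrics, here is a minimal sketch. It assumes the media server exposes a few numeric metrics somehow; the metric names and the limits below are invented for the example and will differ per server.

```python
def unhealthy(metrics, limits):
    """Return the names of metrics that are missing or above their limit."""
    breaches = []
    for name, limit in limits.items():
        value = metrics.get(name)
        # A metric that is not reported at all is treated as a problem too.
        if value is None or value > limit:
            breaches.append(name)
    return breaches

# Hypothetical limits and a hypothetical snapshot from one media server.
limits = {"cpu_percent": 80, "active_participants": 100, "packet_loss_percent": 3}
snapshot = {"cpu_percent": 65, "active_participants": 120}

print(unhealthy(snapshot, limits))
```

If you can’t get a snapshot like this out of the server without patching it, that tells you something about how it is used in production.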
While at it – make sure the code itself is well documented. There’s nothing as frustrating (and as stupid) as the self explanatory code notion. People – code can’t explain itself – it does what it does. I know that the person who wrote that media server was god incarnate, but the rest of us aren’t. Your programmers are excellent, but trust me – not that good. Pick something maintainable. Something that is self explanatory because someone took the time to write some damn good comments in the tricky parts of the code. I know this one is also part of grokking the code, but it is that important – check it twice.
For me, the ability to debug, troubleshoot and find issues faster is usually a critical one for something that is going to get into my own business. Just a thought.
6. Does it Scale?
Media servers are resource hogs (check this video mini series for a quick explanation).
This means that in all likelihood, if your business becomes successful in any way, you will need more than a single media server to run at “scale”.
You can always crank it up on Amazon AWS from m4.large up to m4.16xlarge, but then what’s next?
At the end of the day, scaling of media servers comes down to adding more machines. Which is simple enough until you start dealing with group calls.
Here’s an example.
- Assume a single machine can handle 100 participants, split into any group type (I am simplifying here)
- And we have 10 participants on average in each call
- Each group call can have 2 participants, up to… 50 participants
Now… how do we scale this thing?
How many machines do we need to put out there? When do we decide that we don’t add new calls into a running machine? Do we cascade these machines when they are fully booked? Throw out calls and try to shift them to other machines?
I don’t know, but you should. At least if you’re serious about your product.
You probably won’t find the answers to this in the open source WebRTC media server’s documentation, but you should at least make sure you have some reasonable documentation of how to run it at scale, and not as a one-off instance only.
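To make the example above a bit more concrete, here is a back-of-the-envelope sketch of one possible policy: reserve a fixed number of seats for every call when it starts, and stop admitting new calls to a machine once its seats are all reserved. The policy itself and the figure of 1,000 concurrent calls are my assumptions for illustration; only the capacity of 100, the average of 10 and the maximum of 50 come from the example.

```python
import math

CAPACITY = 100  # participants per machine, from the example above

def machines_needed(num_calls, seats_reserved_per_call):
    """Machines required if every call reserves a fixed number of seats up front."""
    calls_per_machine = CAPACITY // seats_reserved_per_call
    return math.ceil(num_calls / calls_per_machine)

def expected_utilization(seats_reserved_per_call, avg_call_size=10):
    """Share of a full machine that is actually used, on average."""
    calls_per_machine = CAPACITY // seats_reserved_per_call
    return calls_per_machine * avg_call_size / CAPACITY

# 1,000 concurrent calls is an assumed load, not a figure from the example.
for reserve in (10, 20, 50):
    print(reserve, machines_needed(1000, reserve), expected_utilization(reserve))
```

Reserving for the worst case of 50 participants means five times as many machines as reserving for the average of 10, most of them idle. Reserving for the average leaves no room for any call that grows beyond it, which is where cascading or moving calls comes in.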
7. What Languages Does it Use?
They wrote it in Kotlin.
Because that’s the greatest language ever. Here. Even Google just made it an official one for Android.
What can go wrong?
Two things I look for in the language being used:
- That the end result will be highly performant. Remember it’s a resource hog we’re dealing with here, so better start with something that is even remotely optimized
- That I know how to use it and that my developers have it covered
Some of these things are Node.js based while others are written in Java. Which one are your developers more comfortable with? Which one fits better with your technology plans for your company in the next 5 years?
If you need to make a decision between two media servers and the main difference between them for you is the language – then go with the one that works better for your organization.
8. Does It Fit Your Signaling Paradigm?
Three things you need in your WebRTC product:
- Signaling server
- STUN/TURN server
- Media server
That 3rd one isn’t mandatory, but it is if you’re reading this.
That media server needs to interact with the signaling server and the STUN/TURN server.
Sure, you can use the signaling capabilities of the media server, but they aren’t really meant for that, and my own suggestion is not to put the media server publicly out there for everything – have it controlled internally in your service. It just doesn’t make architectural sense for me.
So you’ll need to have it interacting with your signaling server. Check that they both share similar paradigms and notions, otherwise, this in itself is going to be quite a headache.
While at it – check that there’s easy integration between the media server you’re selecting and the STUN/TURN server that you’ve decided to use. This one is usually simple enough, but just make sure you’re not in for a surprise here.
9. Is the License Right For You?
BSD? MIT? APL? GPL? AGPL?
What license does this open source WebRTC media server framework come with exactly?
Interestingly, some projects switch their license somewhere along the way. Jitsi did it after its acquisition by Atlassian – moving from LGPL to the more permissive Apache license (APL).
What your business model looks like and the way you plan to deploy the service are going to affect the types of open source licenses you will want and will be able to adopt inside your service.
There are different types of free when it comes to open source licenses.
Every piece of code you pick and use needs to adhere to these internal requirements and constraints that you have, and that also includes the media server. Since the media server isn’t something you’ll be replacing in and out of place like a used battery, my suggestion is to pick something that comes with a permissive open source license – or go for a commercial product instead (it will cost you, but will solve this nagging issue).
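As a first-pass filter only (this is not legal advice), the reasoning can be sketched like this. The grouping of licenses into families and the rules in the function are simplifying assumptions of mine: permissive licenses pass, copyleft licenses get flagged when you distribute the software, and AGPL also gets flagged when you only run it as a network service. Anything not in the table is flagged.

```python
# SPDX identifiers mapped to a rough license family (a simplification).
LICENSE_FAMILY = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "LGPL-2.1-only": "weak-copyleft",
    "GPL-3.0-only": "strong-copyleft",
    "AGPL-3.0-only": "network-copyleft",
}

def needs_legal_review(spdx_id, distributes_software, runs_as_network_service):
    """Rough filter: True means talk to someone who knows licenses before adopting."""
    family = LICENSE_FAMILY.get(spdx_id, "unknown")
    if family == "permissive":
        return False
    if family == "network-copyleft":
        return distributes_software or runs_as_network_service
    if family in ("weak-copyleft", "strong-copyleft"):
        return distributes_software
    return True  # unknown license: always ask

# A hosted service that ships no software to customers.
print(needs_legal_review("AGPL-3.0-only", False, True))
print(needs_legal_review("GPL-3.0-only", False, True))
```

The same media server can be fine for a hosted service and a problem for an on-premise product, which is why the deployment model goes into the function and not only the license name.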
I’ve written about open source licenses before – check it out as well.
10. Is Anyone Offering Paid Support For It?
Yes. I know.
You’re using open source to avoid paying anyone.
And yet. If you check many of the successful and well maintained open source projects – especially the small and mid-sized ones – you will see a business model behind them. A way for those who invest their time in it to make a living. In many cases, that business model is in support and custom development.
Having paid support as an option means two things:
- Someone is willing to take ownership and improve this thing and is doing it as a day job – and not as a hobby
- If you need help ASAP – you can always pay to get it
If no one is offering any paid support, then who is maintaining this code and to what end? What would encourage them to continue with this investment? Will they just decide to stop investing in it next month? Last month? Next year?
Making the Decision
I am not sure about you, but I love making decisions. I really do.
Taking in the requirements and the constraints, understanding that there are always unknowns and partial information. And from there distilling a decision. A decisive selection of what to go with.
You can find more technical information on media servers in this great compilation made by Gustavo Garcia.
After you take the functional requirements that you have, and find a few suitable open source WebRTC media frameworks that can fit these requirements, go over this list.
See how each of them addresses the points raised. It should help you get to the answer you are seeking about which framework to go with.
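One simple way to go over the list is a weighted score. The sketch below is an assumption about how you might do it, not a prescribed method: rate each candidate from 1 to 5 on the ten signals, give more weight to the signals that matter most to you, and compare the weighted averages. The signal keys, the weights and the two candidates are all made up.

```python
SIGNALS = ["code", "freshness", "adoption", "docs", "debuggability",
           "scaling", "language", "signaling", "license", "support"]

def weighted_score(ratings, weights):
    """Weighted average of the ratings (1-5) over the ten signals."""
    total_weight = sum(weights[s] for s in SIGNALS)
    return sum(ratings[s] * weights[s] for s in SIGNALS) / total_weight

# Hypothetical weights: the license matters three times as much as anything else.
weights = {s: 1 for s in SIGNALS}
weights["license"] = 3

# Two made-up candidates.
server_a = {s: 3 for s in SIGNALS}
server_a["license"] = 5
server_b = {s: 4 for s in SIGNALS}
server_b["license"] = 1

print(weighted_score(server_a, weights), weighted_score(server_b, weights))
```

Here the first candidate wins even though it rates lower on nine of the ten signals, because the weights say the license is what matters most. The score doesn’t make the decision for you, but it forces you to write down what you care about.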
Oh, and remember that media servers are just one piece of the puzzle of the WebRTC servers you will need in your application.
Towards that goal, I also created a Media Server Framework Selection Sheet. Use it when the need comes to select an open source WebRTC media server framework for your project.