I don’t really know, but there’s a lot in this innocent “WebRTC JS library” question that isn’t clear without digging a lot further.
Every now and again (every week or two), I get a question asking me to help select this or that open source component, pick a CPaaS vendor for a project, find someone to outsource WebRTC work to or hire a stellar WebRTC developer.
Many of these emails are about shortcuts. Give us that silver bullet. Shortcuts seldom work with WebRTC.
Last week, I had a question come in. A startup is looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.
The problem I had with it is that this simple question of which WebRTC JS library to use didn’t align well with the set of requirements that came with it.
This article is about what components are needed for WebRTC deployments. If you’re looking to dig deeper into the media paths in WebRTC, then join my free webinar: Mesh, MCU or SFU
Let’s break WebRTC down into its main components, as seen from a network architecture perspective:
- Signaling
- NAT traversal
- Media servers
- Application servers
Here’s a slide I’ve been using to explain what a device connects to in a typical WebRTC session:
Signaling is how the devices reach out to one another. They can’t do it directly, since they don’t have each other’s IP address, and even if they did, we would need some kind of “protocol” for them to do that.
Signaling in WebRTC is… non-existent. You need to bring your own signaling. This approach confuses some developers, and it probably explains why there’s no single signaling solution that fits everyone.
Today, you can use SIP, XMPP, MQTT or just proprietary protocols as your signaling for WebRTC traffic. Each such protocol will have its own set of frameworks, services and SDKs that you can use. Some will be free (open source) while others will be licensable software or SaaS based.
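To make “bring your own signaling” concrete, here is a minimal sketch of a home-grown, proprietary-style protocol. It assumes JSON messages carrying SDP offers/answers and ICE candidates, and it replaces the real network transport (a WebSocket server, say) with an in-memory relay so it runs anywhere. The message shapes and the `SignalingRelay` class are hypothetical illustrations, not part of any WebRTC API or specific framework.

```typescript
// Hypothetical message shapes for a home-grown signaling protocol.
// WebRTC does not define these; they are illustrative only.
type SignalMessage =
  | { type: "offer"; from: string; to: string; sdp: string }
  | { type: "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string };

type Handler = (msg: SignalMessage) => void;

// In-memory stand-in for a signaling server. In production this would sit
// behind a WebSocket (or be replaced by SIP, XMPP, MQTT...).
class SignalingRelay {
  private peers = new Map<string, Handler>();

  register(id: string, onMessage: Handler): void {
    this.peers.set(id, onMessage);
  }

  // Forwards a message to its addressee; returns false if the peer is unknown.
  send(msg: SignalMessage): boolean {
    const deliver = this.peers.get(msg.to);
    if (!deliver) return false;
    // Round-trip through JSON to mimic serialization over the wire.
    deliver(JSON.parse(JSON.stringify(msg)) as SignalMessage);
    return true;
  }
}

// Usage: alice sends a (placeholder) SDP offer to bob through the relay.
const relay = new SignalingRelay();
const bobInbox: SignalMessage[] = [];
relay.register("alice", () => {});
relay.register("bob", (msg) => bobInbox.push(msg));
relay.send({ type: "offer", from: "alice", to: "bob", sdp: "v=0 (placeholder SDP)" });
```

The point is that WebRTC only hands you the blobs (SDP, candidates); getting them from one device to the other is entirely up to this layer, whatever protocol you pick for it.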
NAT traversal is about being able to actually get media flowing.
WebRTC is P2P (peer to peer), meaning you can, in some cases, send media directly between devices. Web browsers can’t otherwise do that. WebRTC also prefers UDP, since it offers better real-time, low-latency characteristics. It is also one of the few kinds of browser traffic that use UDP, which means it is sometimes blocked as well.
NAT traversal is how WebRTC gets past these pesky issues, and it requires additional servers (STUN and TURN) to do so. Some of these servers (TURN) may end up relaying all of the traffic through themselves…
At the end of the day, you will need to deploy these servers or pay someone to do it for you (no free lunch here).
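As a sketch of where those servers plug in, assuming a browser client: the object below has the shape of the `RTCConfiguration` you would pass to `new RTCPeerConnection(config)`. The interfaces are declared locally so the snippet compiles without DOM typings, and the `stun.example.com` / `turn.example.com` hostnames and the credentials are placeholders for servers you would deploy yourself or rent.

```typescript
// Local mirrors of the browser's RTCIceServer / RTCConfiguration dictionaries,
// declared here so the sketch compiles without DOM typings.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy?: "all" | "relay";
}

// Builds the configuration you would pass to `new RTCPeerConnection(config)`.
// Hostnames are hypothetical placeholders.
function buildPeerConfig(turnUser: string, turnPassword: string, forceRelay = false): PeerConfig {
  return {
    iceServers: [
      // STUN: lets a device discover its public address. Cheap to run.
      { urls: "stun:stun.example.com:3478" },
      // TURN: relays media when no direct path works; this is where bandwidth costs live.
      {
        urls: [
          "turn:turn.example.com:3478?transport=udp",
          "turns:turn.example.com:443?transport=tcp", // TLS on 443 for networks that block UDP
        ],
        username: turnUser,
        credential: turnPassword,
      },
    ],
    // "relay" restricts ICE to TURN-relayed candidates only.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

const config = buildPeerConfig("demo-user", "demo-secret");
```

Setting `iceTransportPolicy` to `"relay"` is a handy way to check that your TURN deployment actually works, since it rules out direct paths during testing.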
Then there are media servers. Recording. Group calling. The need to control media paths. Broadcasting. All of these end up requiring media servers in the backend: ones that can process media in one way or another.
The most common approach today is to use SFUs and solve most of the world/media problems with them. SFUs also come with a signaling protocol of their own. My preference is usually to short-circuit it and route all of that traffic through a different signaling/messaging path, especially for the more complex applications.
Again, media servers come in different shapes, sizes and types: open source ones and commercial ones. If you’re looking for the hosted/managed path, you usually won’t be able to pay for them separately as a hosted service and will need to go to a CPaaS vendor to get the whole set of solutions.
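Here is one hedged way to picture that “short-circuit” preference: every message, including SFU negotiation, travels over the application’s own messaging channel inside a tagged envelope, and a router hands the SFU traffic to an adapter. `Envelope`, `SfuAdapter`, `handleControl` and `MessageRouter` are hypothetical names; real SFUs each expose their own control API, which such an adapter would wrap.

```typescript
// Hypothetical envelope: all traffic rides the application's own messaging
// channel, tagged with the subsystem it belongs to.
type AppPayload = { kind: string; [key: string]: unknown };
type Envelope =
  | { channel: "app"; payload: AppPayload }
  | { channel: "sfu"; roomId: string; payload: unknown };

// Illustrative wrapper around whatever control API your chosen SFU exposes.
interface SfuAdapter {
  handleControl(roomId: string, payload: unknown): void;
}

class MessageRouter {
  private sfu: SfuAdapter;
  private onAppMessage: (payload: AppPayload) => void;

  constructor(sfu: SfuAdapter, onAppMessage: (payload: AppPayload) => void) {
    this.sfu = sfu;
    this.onAppMessage = onAppMessage;
  }

  // Parses one raw message from the app channel and dispatches it.
  route(raw: string): void {
    const env = JSON.parse(raw) as Envelope;
    if (env.channel === "sfu") {
      // SFU negotiation never gets its own connection; it is forwarded here.
      this.sfu.handleControl(env.roomId, env.payload);
    } else {
      this.onAppMessage(env.payload);
    }
  }
}

// Usage: one SFU control message and one app message over the same channel.
const sfuRooms: string[] = [];
const appKinds: string[] = [];
const router = new MessageRouter(
  { handleControl: (roomId) => { sfuRooms.push(roomId); } },
  (payload) => { appKinds.push(payload.kind); },
);
router.route(JSON.stringify({ channel: "sfu", roomId: "room-1", payload: { op: "subscribe" } }));
router.route(JSON.stringify({ channel: "app", payload: { kind: "chat" } }));
```

The design benefit is a single connection, authentication model and reconnection logic for everything, instead of one for your application and another for the SFU.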
Then there are application servers: payments, user authentication and identity, the website itself and a large number of other things you might need.
These are really outside the scope of WebRTC, but they are sometimes provided by the various vendors and frameworks out there.
Back to that question
What were we dealing with to begin with here?
looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.
Here’s how I’d break this one down to try and understand what was asked:
- That “WebRTC JS library” hints at someone searching for a signaling framework, which is great
- 1:1 voice chats strengthen the feeling that we’re dealing with signaling only
- The word rooms… that feels more like an SFU media server. In this case, though, I’ll assume there’s no need for a media server, given the price point asked for (free), the fact that there’s no ask for recording and that this is a 1:1 scenario
- Stores user profiles. Hmm. This usually has nothing to do with WebRTC, so much so that most CPaaS vendors don’t offer such a capability either
- Twilio is about the full shebang: a hosted, SaaS, CPaaS, managed (pick the term you like best) solution that gives you signaling, NAT traversal, media and some other knick-knacks. That doesn’t quite fit with the rest of the ask here
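For the 1:1 scenario in that question, the server-side logic beyond a signaling relay and STUN/TURN can be quite small. The sketch below is a hypothetical in-memory room registry (`OneToOneRooms` is an illustrative name, not a library) that caps each room at two participants and tells the second joiner whom to call. No media server is involved, and user profiles would live elsewhere, in your application servers.

```typescript
// Hypothetical 1:1 room registry: media flows peer to peer (or via TURN),
// so the server only tracks who is in which room.
const MAX_PARTICIPANTS = 2;

class OneToOneRooms {
  private rooms = new Map<string, Set<string>>();

  // Returns the other participant's id if one is already waiting,
  // null if the caller is first, and throws if the room is full.
  join(roomId: string, userId: string): string | null {
    const members = this.rooms.get(roomId) ?? new Set<string>();
    if (!members.has(userId) && members.size >= MAX_PARTICIPANTS) {
      throw new Error(`room ${roomId} is full`);
    }
    members.add(userId);
    this.rooms.set(roomId, members);
    for (const other of members) {
      if (other !== userId) return other;
    }
    return null;
  }

  // Removes a participant and drops the room once it is empty.
  leave(roomId: string, userId: string): void {
    const members = this.rooms.get(roomId);
    if (!members) return;
    members.delete(userId);
    if (members.size === 0) this.rooms.delete(roomId);
  }
}

// Usage: alice waits, bob joins and learns he should send his offer to alice.
const rooms = new OneToOneRooms();
const first = rooms.join("r1", "alice");
const second = rooms.join("r1", "bob");
```

Everything else in that request (the offer/answer exchange, STUN/TURN) maps onto the signaling and NAT traversal components, not onto a “WebRTC JS library” on its own.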
When I get such jumbled questions, it feels like there’s a bit of a misunderstanding of what WebRTC is and of how the ecosystem of vendors and services has evolved around it.
Want to learn more about WebRTC?
There are several things to do at this point if you need to grok WebRTC: