When someone says “WebRTC server”, what do they really mean? There are 4 different types of WebRTC servers that you need to know about: application, signaling, NAT traversal and media.
WebRTC is a communications standard that enables us to build a variety of applications. The most common ones will be voice or video calling services (1:1 or group calls). You can use it for broadcasts, live streaming, private/secure messaging, etc.
Getting it to work requires a multitude of “WebRTC servers” – machines that reside in the cloud (or are at least remote and reachable) and provide the functionality needed to get WebRTC sessions connected properly.
What I’d like to do here is explain what types of WebRTC servers exist, what they are used for and when you will need them. There are 4 types of servers detailed in this article:
- WebRTC application servers – essentially the website hosting the service
- WebRTC signaling servers – how clients find each other and connect to each other
- NAT traversal servers for WebRTC – servers used to assist in connecting through NATs and firewalls
- WebRTC media servers – media processing servers for group calling, recording, broadcasting and other more complex features
More of the audio-visual type? I’ve recorded a quick free 3-part video course on WebRTC servers.
WebRTC application servers
Not exactly a WebRTC server, but you can’t really have a service without it 😀
Think of it as the server that serves you the web page when you open the application’s website itself. It hosts the HTML, CSS and JS files. A few (or many) images. Some of it might not even be served directly from the application server but rather from a CDN for the static files.
What’s so interesting about WebRTC application servers? Nothing at all. They are just there and are needed, just like in any other web application out there.
WebRTC signaling servers
Signaling servers for WebRTC are sometimes embedded or collocated/co-hosted with the application servers, but more often than not they are built and managed separately from the application itself.
While WebRTC handles the media, it leaves the signaling to “someone else” to take care of. WebRTC will generate SDP (Session Description Protocol) messages that the application needs to pass between the users. Passing these messages is the main concern of a signaling server.
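To make the “passing messages” part concrete, here is a minimal sketch of what a signaling server boils down to: routing opaque messages between users. It is an in-memory stand-in with no network transport at all, and the `SignalMessage` shape, the `SignalingRelay` class and the user names are all made up for illustration. WebRTC does not define any of them.

```typescript
// Hypothetical message shape; WebRTC itself does not define a signaling format.
type SignalKind = "offer" | "answer" | "candidate";

interface SignalMessage {
  from: string;
  to: string;
  kind: SignalKind;
  payload: string; // e.g. an SDP blob or a serialized ICE candidate
}

type Deliver = (msg: SignalMessage) => void;

// In-memory stand-in for a signaling server: it only routes messages
// and never looks inside the payload.
class SignalingRelay {
  private peers = new Map<string, Deliver>();

  // Register a user together with the callback used to push messages to them.
  join(userId: string, deliver: Deliver): void {
    this.peers.set(userId, deliver);
  }

  leave(userId: string): void {
    this.peers.delete(userId);
  }

  // Forward a message to its recipient; returns false if they are not connected.
  send(msg: SignalMessage): boolean {
    const deliver = this.peers.get(msg.to);
    if (deliver === undefined) return false;
    deliver(msg);
    return true;
  }
}

// Usage: alice sends an offer, bob replies with an answer.
const relay = new SignalingRelay();
const aliceInbox: SignalMessage[] = [];
const bobInbox: SignalMessage[] = [];
relay.join("alice", (m) => { aliceInbox.push(m); });
relay.join("bob", (m) => { bobInbox.push(m); });
relay.send({ from: "alice", to: "bob", kind: "offer", payload: "v=0 ..." });
relay.send({ from: "bob", to: "alice", kind: "answer", payload: "v=0 ..." });
```

In a real deployment the `deliver` callback would typically write to a WebSocket or similar connection, and you would add authentication and room handling on top. The routing idea stays the same.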
There are 4 main signaling protocols that are used today with WebRTC, each lending itself to different signaling servers that will be used in the application:
- SIP – The dominant telecom VoIP protocol out there. When used with WebRTC, it is done as SIP over WebSocket. CPaaS and telecom vendors end up using it with WebRTC, mostly because they already had it in use in their infrastructure
- XMPP – A presence and messaging protocol. Some of the CPaaS vendors picked this one for their signaling protocol
- MQTT – Messaging protocol used mainly for IoT (Internet of Things). The first time I saw it used with WebRTC was in Facebook Messenger, which makes it a very popular/common/widespread signaling protocol for WebRTC
- Proprietary – the most common approach of all, where people just implement or pick an alternative that just works for them
SIP, XMPP and MQTT all have existing servers that can be deployed with WebRTC.
The proprietary option takes many shapes and sizes. Node.js is quite a common server alternative used for WebRTC signaling (just make sure not to pick an outdated alternative – that’s quite a common mistake in WebRTC).
If you are going down the proprietary route:
- Don’t use apprtc as the baseline for your WebRTC signaling server
- Consider my WebRTC the missing codelab course
NAT traversal servers for WebRTC
To work well, WebRTC requires NAT traversal servers. These WebRTC servers are in charge of making sure you can send media from one browser to another.
STUN is used to answer the question “what is my public IP address?” and then share the answer with the other user in the session, so they can try to use that address to send media directly.
TURN is used to relay the media through it (so it costs more in bandwidth), and is used when you can’t really reach the other user directly.
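On the client side, STUN and TURN servers are handed to WebRTC as a list of ICE servers. Below is a rough sketch of building that configuration. The interfaces mirror the browser’s `RTCConfiguration` and `RTCIceServer` dictionaries but are redeclared under different names so the snippet runs outside a browser. The `buildIceConfig` helper, the `example.com` hosts and the credentials are placeholders I made up; ports 3478 and 5349 are the usual defaults, and your provider may use others.

```typescript
// Shapes modeled on the browser's RTCIceServer / RTCConfiguration dictionaries,
// redeclared under different names so the sketch runs outside a browser.
interface IceServerEntry {
  urls: string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServerEntry[];
  iceTransportPolicy: "all" | "relay";
}

// Hypothetical TURN account details; real ones come from your TURN provider.
interface TurnAccount {
  host: string;
  username: string;
  credential: string;
}

function buildIceConfig(stunHost: string, turn: TurnAccount, forceRelay = false): IceConfig {
  return {
    iceServers: [
      // STUN: lets the client learn its public address.
      { urls: [`stun:${stunHost}:3478`] },
      // TURN: relays media when no direct path works (UDP, TCP and TLS variants).
      {
        urls: [
          `turn:${turn.host}:3478?transport=udp`,
          `turn:${turn.host}:3478?transport=tcp`,
          `turns:${turn.host}:5349?transport=tcp`,
        ],
        username: turn.username,
        credential: turn.credential,
      },
    ],
    // "relay" forces all media through TURN, which is handy for testing the TURN setup.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

const exampleTurnAccount: TurnAccount = {
  host: "turn.example.com",
  username: "demo-user",
  credential: "demo-secret",
};
const exampleIceConfig = buildIceConfig("stun.example.com", exampleTurnAccount);
// In a browser this object would be passed along the lines of:
//   new RTCPeerConnection(exampleIceConfig)
```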
A few quick thoughts here:
- You need both STUN and TURN to make WebRTC work. You can skip STUN if the other end is a media server. You will need TURN in WebRTC even if your other end of the session is a media server on a public IP address
- Don’t use free STUN servers in your production environment. And never ever use “free” TURN servers
- If you deploy your own servers, you will need to place the TURN servers as close as possible to your users, which means handling TURN geolocation
- TURN servers don’t have access to the media. Ever. They don’t pose a privacy issue if they are configured properly, and they can’t be used by you or anyone else to record the conversations
- Prefer using paid managed TURN servers instead of hosting your own if you can
- Make sure you configure NAT traversal sensibly. Here’s a free 3-part video course on effectively connecting WebRTC sessions
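One way to sanity-check a NAT traversal setup is to look at which kinds of ICE candidates a client gathers. In the candidate line, `typ host` is a local address, `typ srflx` is an address learned via STUN, and `typ relay` is an address allocated on a TURN server. The helper below is a small sketch of pulling that field out; `candidateTypeOf` is a name I made up and the sample lines use documentation-only IP addresses.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

const knownCandidateTypes: CandidateType[] = ["host", "srflx", "prflx", "relay"];

// Extract the value following "typ" in an ICE candidate line.
// Returns null when the field is missing or not a known type.
function candidateTypeOf(candidateLine: string): CandidateType | null {
  const tokens = candidateLine.trim().split(/\s+/);
  const typIndex = tokens.indexOf("typ");
  if (typIndex === -1) return null;
  const value = tokens[typIndex + 1];
  const match = knownCandidateTypes.find((t) => t === value);
  return match ?? null;
}

// Sample candidate lines (addresses are from documentation-only ranges).
const sampleCandidates = [
  "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.10 rport 54321",
  "candidate:3 1 udp 41885439 198.51.100.20 61000 typ relay raddr 203.0.113.7 rport 54321",
];

const gatheredTypes = sampleCandidates.map(candidateTypeOf);
// A missing "srflx" hints that STUN is unreachable;
// a missing "relay" hints that TURN is unreachable or misconfigured.
const hasRelayCandidate = gatheredTypes.includes("relay");
```

In a browser, the candidate strings would come from the peer connection’s ICE candidate events rather than from a hard-coded array.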
WebRTC media servers
WebRTC media servers are servers that act as WebRTC clients but run on the server side. They are termination points for the media where we’d like to take action. Popular tasks done on WebRTC media servers include:
- Group calling
- Broadcast and live streaming
- Gateway to other networks/protocols
- Server-side machine learning
- Cloud rendering (gaming or 3D)
The adventurous and strong hearted will go and develop their own WebRTC media server. Most would pick a commercial service or an open source one. For the latter, check out these tips for choosing WebRTC open source media server framework.
In many cases, the thing developers are looking for is support for group calling, something that almost always requires a media server. In that case, you need to decide whether to go with the classic (and now somewhat old) MCU mixing model or with the more accepted and modern SFU routing model. You will also need to think a lot about the sizing of your WebRTC media server.
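To get a feel for the difference between the two models, here is a back-of-the-envelope sketch that counts media streams. It assumes every participant sends a single stream and watches everyone else, with no simulcast or other optimizations, so treat the numbers as a rough illustration and not as a sizing tool. The `StreamLoad` shape and the function names are mine.

```typescript
interface StreamLoad {
  clientUp: number;   // streams each participant sends
  clientDown: number; // streams each participant receives
  serverIn: number;   // streams arriving at the media server
  serverOut: number;  // streams leaving the media server
}

// SFU (routing): the server forwards each incoming stream to every other participant.
function sfuLoad(participants: number): StreamLoad {
  return {
    clientUp: 1,
    clientDown: participants - 1,
    serverIn: participants,
    serverOut: participants * (participants - 1),
  };
}

// MCU (mixing): the server decodes all streams and sends one mixed stream to each participant.
function mcuLoad(participants: number): StreamLoad {
  return {
    clientUp: 1,
    clientDown: 1,
    serverIn: participants,
    serverOut: participants,
  };
}

const sfuFive = sfuLoad(5); // clientDown: 4, serverOut: 20
const mcuFive = mcuLoad(5); // clientDown: 1, serverOut: 5
```

The stream counts favor the MCU on bandwidth, but they hide its cost: mixing means decoding and re-encoding media on the server, which is CPU heavy, while an SFU only forwards packets.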
For recording WebRTC sessions, you can either do that on the client side or the server side. In both cases you’ll be needing a server, but what that server is and how it works will be very different in each case.
If it is broadcasting you’re after, then you need to think about the broadcast size of your WebRTC session.
A quick FAQ on WebRTC servers
Can WebRTC work without servers?
Not really. You will somehow need to know who to communicate with, and in many cases you will need to negotiate IP addresses and even route data through a server to connect your session properly.
Can WebRTC servers access my data?
That depends on the service you are using, as different implementations will put their focus on different features.
In general, signaling and NAT traversal servers in WebRTC don’t have access to the actual data. Media servers often have (and need) access to the actual data.
Can I host WebRTC servers on AWS?
Yes. You can host your WebRTC servers on AWS. Many popular WebRTC services are hosted today on AWS, Google Cloud, Microsoft Azure and Digital Ocean servers. I am sure other hosting providers and data center vendors work as well.
Can I use WebRTC on a WordPress or PHP website?
WebRTC can be added to any WordPress, PHP or other website. In such a case, the PHP WordPress server will serve as the application server and you will need to add into the mix the other WebRTC servers: signaling server, NAT traversal server and sometimes media servers.
Know your WebRTC servers
No matter how or what it is you are developing with WebRTC, you should know what WebRTC servers are and what they are used for.
If you want to expand your knowledge and understanding of WebRTC, check out my WebRTC training courses.