WebRTC Signaling Protocols and WebRTC Transport Protocols Demystified

By Tsahi Levent-Levi

May 16, 2016

WebRTC as a protocol comes without signaling. This means that you as a developer will need to take care of it.

Can you guess the signaling and transport here?

The first step will be selecting the protocol for it. Or more accurately – two protocols: transport and signaling. In many cases, we don’t see the distinction between them (or just don’t care), but sometimes it matters. A recent question in the comments section of one of the two posts mentioned at the beginning got me to write this explanation. Probably yet again.

Having a hard time figuring out which signaling protocol you need? Just read my article on choosing the best signaling protocol for your WebRTC application.

WebRTC Transport Protocols and Browsers

This actually fits any browser transport protocol.

A transport protocol is necessary for us to send a message from one device to another. I don’t care what is in that message or how the message is structured at this point – just that it can be sent – and then received.

HTTP/1.1

5 years ago browsers were simple when it came to transport protocols. We essentially had HTTP/1.1 and all the hacks on top of it, known as XHR, SSE, BOSH, Comet, etc. If you are interested in the exact mechanics of it, then leave a comment and I’ll do my best to explain in a future post (though there’s a lot of existing explanation around the internet already).

I call the group of solutions on top of HTTP/1.1 workarounds. They make use of HTTP/1.1 because there was no alternative at the time, but they do it in a way that makes no technical sense.

Oh – and you can even use REST to some extent, which is again a minor “detail” above HTTP/1.1.

Since then, three more techniques materialized: WebSocket, WebRTC and recently HTTP/2.

WebSocket

WebSocket was added to do what HTTP/1.1 can’t: provide a bidirectional mechanism where both the client and the web server can send each other messages. What these messages are, what they mean and what format they follow was left to the implementer of the web page to decide.
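
To make the "bidirectional, and doesn't care what's inside" part concrete, here is a minimal sketch. The `Transport` interface and the `createLinkedPair` helper are assumptions made up for this illustration: an in-memory stand-in for a connection, so it runs without a server. In a browser, the real `WebSocket` object's `send()` method and `onmessage` handler play the same two roles.

```typescript
// A transport only moves opaque text in both directions; it never looks inside.
interface Transport {
  send(data: string): void;
  onMessage: ((data: string) => void) | null;
}

// Hypothetical in-memory stand-in for a WebSocket connection: whatever one
// end sends is handed to the other end's onMessage handler.
function createLinkedPair(): [Transport, Transport] {
  const left: Transport = { send: () => {}, onMessage: null };
  const right: Transport = { send: () => {}, onMessage: null };
  left.send = (data) => { if (right.onMessage) right.onMessage(data); };
  right.send = (data) => { if (left.onMessage) left.onMessage(data); };
  return [left, right];
}

const [client, server] = createLinkedPair();
const receivedByServer: string[] = [];
const receivedByClient: string[] = [];

// The server end can push a message to the client without being asked twice.
server.onMessage = (data) => {
  receivedByServer.push(data);
  server.send(JSON.stringify({ type: "ack" }));
};
client.onMessage = (data) => {
  receivedByClient.push(data);
};

// The message format (JSON here) is our own decision, not the transport's.
client.send(JSON.stringify({ type: "hello", from: "alice" }));
```

Note that nothing in `Transport` knows about "hello" or "ack". That part belongs to the signaling protocol, not the transport.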

There’s also socket.io or the less popular SockJS. Both offer client side implementations that simulate WebSocket in cases where it cannot be used (the browser or a proxy doesn’t support it). If you hear that the transport is socket.io – for the most part you can just think about it as WebSocket.

When WebSockets work, they are great. But sometimes they don’t (more on that below, under the HTTP/2 part).

WebRTC’s Data Channel

To some extent, the Data Channel in WebRTC can be used for signaling.

Yes. You’ll need to negotiate IP addresses and use ICE first – and for that you’ll need an additional layer of signaling and transport (one of the others listed in this post), but once connected, you can use the data channel for the rest of your signaling.

This can be done either directly between the two peers, or through intermediaries (for multiple reasons).

Where would you want to do that?

To reduce latency in your signaling – this is theoretically the fastest you can go
To reduce load on the server – now it won’t receive all messages just to route them around – you’ll be sending it things it really needs
To increase privacy – not sending messages through the server means the server can’t be privy to their content – or even the fact there was communication

For the most part, this is quite rare as transport for signaling in WebRTC.
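
If you do want to try it, the hand-over from server-relayed signaling to the data channel can be sketched roughly like this. `MessagePipe` and `SignalingRouter` are hypothetical names invented for this example, and the two pipes are plain objects so the sketch runs anywhere. In a browser you would pass something wrapping your WebSocket first, and something wrapping an `RTCDataChannel` once its `open` event fires.

```typescript
// Anything that can carry a text message (a WebSocket, a data channel, ...).
interface MessagePipe {
  send(data: string): void;
}

// Hypothetical helper: sends signaling through the server until a direct
// peer-to-peer pipe is handed over, then prefers the direct pipe.
class SignalingRouter {
  private readonly viaServer: MessagePipe;
  private direct: MessagePipe | null = null;

  constructor(viaServer: MessagePipe) {
    this.viaServer = viaServer;
  }

  // Call once the data channel reports that it is open.
  upgrade(directPipe: MessagePipe): void {
    this.direct = directPipe;
  }

  // Call if the data channel closes, to fall back to the server.
  downgrade(): void {
    this.direct = null;
  }

  send(message: object): void {
    const pipe = this.direct ?? this.viaServer;
    pipe.send(JSON.stringify(message));
  }
}

const serverLog: string[] = [];
const directLog: string[] = [];

const router = new SignalingRouter({ send: (d) => { serverLog.push(d); } });
router.send({ type: "offer" });        // still has to go through the server
router.upgrade({ send: (d) => { directLog.push(d); } });
router.send({ type: "renegotiate" });  // now goes peer to peer
```

The first message (the one that sets up the connection) always needs the server. Only what comes after the upgrade can bypass it.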

HTTP/2

I’ve written about HTTP/2 before. Since then, HTTP/2 has grown in its popularity and spread.

HTTP/2 fixes a lot of the limitations in HTTP/1.1, which can make it a good long term candidate for transport of signaling protocols.

A good read here would be Allan Denis’ writeup on how HTTP/2 may affect the need for WebSocket.

WebRTC Signaling Protocols

Signaling is where you express yourself. Or rather your service does. You want one user to be able to reach out to another one. Or a group of people to join a virtual room of sorts. To that end, you decide on what types of messages you need, what they mean, what they look like, etc.

That’s your signaling protocol.
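
As an illustration only, here is what "deciding on your messages" might look like for a simple one-to-one call. The message names and fields (`join`, `offer`, `answer`, `candidate`, `leave`) are assumptions for this sketch, not a standard. Your own service may need fewer or quite different ones.

```typescript
// Hypothetical message set for a one-to-one call; the names are assumptions.
type SignalMessage =
  | { type: "join"; room: string; user: string }
  | { type: "offer"; sdp: string }
  | { type: "answer"; sdp: string }
  | { type: "candidate"; candidate: string }
  | { type: "leave"; user: string };

// Parse text received from the transport. Returns null for anything that is
// not one of the messages above, so the caller can ignore or log it.
function parseSignal(raw: string): SignalMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;
  const fields = value as Record<string, unknown>;

  // Read a field only if it is a string.
  const text = (key: string): string | null => {
    const field = fields[key];
    return typeof field === "string" ? field : null;
  };
  const room = text("room");
  const user = text("user");
  const sdp = text("sdp");
  const candidate = text("candidate");

  switch (fields["type"]) {
    case "join":
      return room !== null && user !== null ? { type: "join", room, user } : null;
    case "offer":
      return sdp !== null ? { type: "offer", sdp } : null;
    case "answer":
      return sdp !== null ? { type: "answer", sdp } : null;
    case "candidate":
      return candidate !== null ? { type: "candidate", candidate } : null;
    case "leave":
      return user !== null ? { type: "leave", user } : null;
    default:
      return null; // unknown message type
  }
}

const good = parseSignal('{"type":"offer","sdp":"v=0"}');
const bad = parseSignal('{"type":"join","room":"lobby"}'); // null: "user" is missing
```

Whatever transport delivered the text is irrelevant to this code, which is the whole point of keeping the two layers apart.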

As opposed to the transport protocol, you aren’t really limited by what the browser allows, but rather by what you are trying to achieve.

Here are the 3 main signaling protocols out there in common use with WebRTC:

SIP

I hate SIP.

Never really cared for it.

It has its uses, especially when it comes to telephony and connecting to legacy voice and video services.

Other than that, I find it too bloated, complex and unnecessary. At least for most of the use cases people approach me with.

SIP comes from the telephony world. Its main transport was UDP. Then TCP and TLS were added as transport protocols for it. Later on SCTP. You don’t care about any of these, as you can’t really access them directly with a browser. So what was done was to add WebSocket as a SIP transport and just call it “SIP over WebSocket”. Before WebRTC got standardized (it hasn’t yet), SIP over WebSocket got standardized and already has an RFC of its own. Why is it important? Because the only use of SIP over WebSocket is to enable it to use WebRTC.

So there’s SIP. And if you know it, like it or need it, you can use it as your WebRTC signaling protocol.

XMPP

I hate XMPP.

Not really sure why. Probably because any time I say something bad about it, a few hard core fans/followers/fanatics of XMPP come rushing in to its rescue in the comments section. It makes things fun.

XMPP has a worldview revolving around presence and instant messaging, and use cases that need it can really benefit from it – especially if the developer already knows XMPP and what he is doing.

If you like it enough – make sure to slam me in the comments – you’ll find their section at the end of this post…

Proprietary

I hate NIH. And yet a proprietary signaling protocol has a lot of benefits in my view.

In many cases, you just want to get the two darn users into the “same page”. Not much more. I know I am dumbing it down, but the alternative is to carry around extra protocol messages you don’t need or intend to use.

In many other cases, you don’t really want to add another web server to handle signaling. You want your web server to host the whole site. So you resort to a proprietary signaling protocol. You might not even call it that, or think of it as a signaling protocol at all.
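
To show how little "getting two users into the same page" can require, here is a rough sketch of the server-side logic. `RoomRelay` and its methods are hypothetical names for this example, and it keeps everything in memory with plain callbacks instead of real connections. The idea is only this: users join a named room, and whatever one of them sends gets forwarded untouched to the others.

```typescript
// Callback the server uses to deliver a message to one connected user.
type Deliver = (message: string) => void;

// Hypothetical in-memory relay: no parsing, no state machine, just rooms.
class RoomRelay {
  private rooms = new Map<string, Map<string, Deliver>>();

  join(room: string, user: string, deliver: Deliver): void {
    let members = this.rooms.get(room);
    if (!members) {
      members = new Map<string, Deliver>();
      this.rooms.set(room, members);
    }
    members.set(user, deliver);
  }

  leave(room: string, user: string): void {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(user);
    if (members.size === 0) this.rooms.delete(room); // drop empty rooms
  }

  // Forward a message to everyone in the room except the sender.
  // Returns how many users it was delivered to.
  relay(room: string, from: string, message: string): number {
    const members = this.rooms.get(room);
    if (!members) return 0;
    let delivered = 0;
    members.forEach((deliver, user) => {
      if (user !== from) {
        deliver(message);
        delivered++;
      }
    });
    return delivered;
  }
}

const signaling = new RoomRelay();
const bobInbox: string[] = [];
signaling.join("room-1", "alice", () => {});
signaling.join("room-1", "bob", (m) => { bobInbox.push(m); });

// Alice's offer reaches Bob and nobody else.
const deliveredTo = signaling.relay("room-1", "alice", '{"type":"offer"}');
```

A real service would add authentication and hook `deliver` up to an actual transport, but the signaling logic itself may not need to be much bigger than this.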

How to Choose?

Always start from the signaling protocol.

If there’s reason to use SIP due to existing infrastructure or external systems you need to connect to – then use it. If there’s no such need, then my suggestion would be to skip it.

If you like XMPP, or need its presence and instant messaging capabilities – then go use it.

If the service you are adding WebRTC to already has some logic of its own, it probably has signaling in there. So you just add the relevant messages you need to that proprietary signaling.

In any other case, my advice would be to use a proprietary signaling solution that fits your exact need. If you’re fine with it, I’d even go as far as picking a SaaS vendor for signaling.
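
For those who prefer reading rules as code, the order of questions above can be written down as a small function. The type and field names here are made up for the illustration, and the function only encodes the order in which this post asks the questions, nothing more.

```typescript
// Hypothetical description of what a service needs from its signaling.
interface SignalingNeeds {
  sipInterop: boolean;        // existing SIP infrastructure or external systems to connect to
  wantsXmpp: boolean;         // you like XMPP, or need its presence and instant messaging
  existingSignaling: boolean; // the service already exchanges its own messages
}

type SignalingChoice =
  | "SIP"
  | "XMPP"
  | "extend existing signaling"
  | "new proprietary or SaaS";

// Ask the questions in the same order as the text above.
function chooseSignaling(needs: SignalingNeeds): SignalingChoice {
  if (needs.sipInterop) return "SIP";
  if (needs.wantsXmpp) return "XMPP";
  if (needs.existingSignaling) return "extend existing signaling";
  return "new proprietary or SaaS";
}

const greenfield = chooseSignaling({
  sipInterop: false,
  wantsXmpp: false,
  existingSignaling: false,
});
```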

Need to understand WebRTC and how to design and architect real world solutions with it? A first step is to understand the servers used to connect WebRTC.

Matthew says:

May 16, 2016 at 4:22 pm

You forgot Matrix.org as a less hateful alternative to SIP, XMPP and proprietary SaaS for signalling 🙂

1. Tsahi Levent-Levi says:
  
  May 16, 2016 at 5:07 pm
  
  Matthew,
  
  Thought about you and Matrix (https://matrix.org) multiple times when writing this one. You know I love what you’re doing and also suggesting it to people when it makes sense.
  
  What you might not like is where I have Matrix compartmentalized – in my mind, this is still in the domain of proprietary even if that isn’t the intent with it 🙂
  
Jeff - VoipDIY says:

May 16, 2016 at 7:37 pm

As a ham radio operator since the age of 13, answering the question in the photo caption above:
“Can you guess the signaling and transport here?”
didn’t take me more than a millisecond to recall.

As the photo is of a telegraph hand key, I would presume the “protocol” is International Morse Code – more commonly known as just – “Morse Code”.

In the day of Thomas Edison, the “transport” would typically be just a pair of wires hung between telephone/telegraph poles connected to a telegraph key and “sounder” at each end of the wires.

Later, the main transport for transmitting and receiving Morse Code became the HF radio transmitter/antenna pair at the near end, and an antenna/radio receiver pair at the far end.

In my early days of ham radio, Morse Code was the primary methodology I used to communicate with other ham radio operators around the world. It was also used by the military, merchant marines, and not to mention Western Union. 20 wpm was my top transmission rate speed.

Morse Code over radio was also known as “CW” (continuous wave modulation). The Morse Code hand key essentially modulates a “continuous wave” radio signal with its on/off action producing a coded message encoded with Morse Code.

One might consider the human/telegraph-key pair as the Encoder and the human/radio receiver pair as the decoder, while the radio transmitter/receiver antenna pairs as the transport system.

Sorry if I was too long winded answering that question to the photo caption… 🙂

1. Tsahi Levent-Levi says:
  
  May 16, 2016 at 7:39 pm
  
  Thanks for the complete answer Jeff 🙂
  
Shair Khan says:

May 17, 2016 at 2:36 am

Let me ask a stupid question.

Why do we need elaborate ICE, TURN and STUN servers to discover each other’s address? Why can’t we just use socket.io to exchange the addresses with the peers?

1. Sergey Markovich says:
  
  May 17, 2016 at 4:27 am
  
  Shair Khan,
  
  The problem is that if your Browser is behind the NAT, it has a private address like 192.168.*.*, etc. Sharing only this address (we call it “local candidate”) with the peer (via socket.io or something else) does not make any sense because the peer will not be able to use this address to contact you. So ICE is needed to discover more “candidates” (addresses) – like your public address on the public side of the NAT and probably allocating another address on the TURN server. Then you share ALL your candidates (addresses) with the peer via socket.io or any other mechanism. This is done by a special SDP extension. The peer does the same and shares the list of his candidates with you. After that the connectivity testing starts – the goal of this process is to find the pair of candidates (one yours and one of your peer) that will work TOGETHER, i.e. media will flow in both directions.
  This is a very brief and simplified description 🙂
  