How to Select a Signaling Protocol for Your Next WebRTC Project?

When you build your next project with WebRTC – how should you select what signaling protocol to use?

In my monthly email from a few weeks ago, I gave a quick answer to this question. What are the alternative signaling protocols for WebRTC?

As I am currently looking closely at various API platforms for WebRTC, and dealing with that question myself with several clients, I decided it would be beneficial to share my answer here as well, in a bit of a longer form.

There are essentially 5 different options to choose from.

1. COMET / XHR / SSE

Consider this the classic approach to web signaling. If you don’t know enough about it, then read about it on Wikipedia. In essence, this is a hack that enables a web server to send messages to clients – something you need to be able to do when dealing with something like a session across two users/browsers that runs via a server.

These techniques are widely available on web browsers, which makes them commonplace and relatively easy to set up and use.

The only problem? Scaling them. Because they are hacks in nature, they tend to take up more resources on the server side, which means fewer browsers connected to each server, and that translates to a higher cost of operation.

On small scales, that might not be an issue, but if you plan on millions of users, you might want to think this one through.
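
To make the resource cost concrete, here is a minimal Python sketch of the server side of long polling, one of the COMET techniques. The `Mailbox` class and its methods are hypothetical names of my own, not part of any framework, and a real server would call `poll` from inside an HTTP request handler.

```python
import queue
import threading


class Mailbox:
    """Per-client message queues for a long-polling signaling server (hypothetical)."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _queue_for(self, client_id):
        # Create the client's queue on first use.
        with self._lock:
            return self._queues.setdefault(client_id, queue.Queue())

    def send(self, client_id, message):
        # Called when another user (or the server) has something for this client.
        self._queue_for(client_id).put(message)

    def poll(self, client_id, timeout=25.0):
        # Called while the client's HTTP request is pending.
        # Blocks until a message arrives or the timeout passes; returns
        # None on timeout, after which the client is expected to poll again.
        try:
            return self._queue_for(client_id).get(timeout=timeout)
        except queue.Empty:
            return None


mailbox = Mailbox()
mailbox.send("bob", {"type": "offer", "from": "alice"})
first = mailbox.poll("bob", timeout=0.1)   # a message is already waiting
second = mailbox.poll("bob", timeout=0.1)  # nothing pending, so this times out
```

Every waiting client holds a blocked request on the server for up to the full timeout, whether or not there is anything to deliver. That is where the extra resources go.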

UPDATE: As someone smart pointed out – on its own, this technique still requires you to define your own proprietary signaling messages.
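
What might "your own proprietary signaling messages" look like? Here is one possible shape, sketched in Python with JSON as the wire format. The message types, field names and function names are assumptions of mine for illustration; nothing in WebRTC mandates them.

```python
import json

# Hypothetical message types for a home-grown signaling protocol.
MESSAGE_TYPES = {"offer", "answer", "candidate", "bye"}


def encode_message(msg_type, sender, recipient, payload=None):
    """Serialize one signaling message to a JSON string."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    return json.dumps(
        {"type": msg_type, "from": sender, "to": recipient, "payload": payload}
    )


def decode_message(text):
    """Parse a JSON string and reject anything that is not a known message."""
    msg = json.loads(text)
    if not isinstance(msg, dict):
        raise ValueError("signaling message must be a JSON object")
    missing = {"type", "from", "to"} - msg.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if msg["type"] not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg['type']}")
    return msg


# "<SDP text>" is a placeholder for the session description the browser produces.
wire = encode_message("offer", "alice", "bob", {"sdp": "<SDP text>"})
received = decode_message(wire)
```

The same handful of messages can travel over COMET, a WebSocket or a data channel; only the transport underneath changes.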

2. WebSocket

WebSockets are a relatively new addition to browsers. They enable opening up a session from the client to the web server, and then leaving it open for messages from both directions.

These messages can be textual or binary, they can be as rich as you wish, and they run really fast. You end up with nice scaling capabilities when using WebSockets. You can read more about it here.

The down side of WebSockets? They might not be available in the browser you plan on using (a non-issue for browsers supporting WebRTC, but may become an issue once you wrap WebRTC in a plugin for IE as an example). Oh, and not all web servers and proxies support them, so depending on your architecture and network deployment – you might not be able to even make use of WebSockets.

If you do plan on using WebSockets, I suggest you do two additional things:

  1. Run them over a secured TLS connection, which in general is what you should do for any WebRTC signaling anyway
  2. Think of using a hybrid solution like socket.io or SockJS, which can automatically “downgrade” to COMET mechanisms if WebSockets aren’t available
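
The "downgrade" idea in the second point can be sketched roughly as below. This is not how socket.io or SockJS are implemented (they work out what is usable at runtime); the transport labels and the `pick_transport` function are assumptions made for illustration.

```python
# Preference order, best first. The names are labels for this sketch only.
TRANSPORT_PREFERENCE = ["websocket", "sse", "long-polling"]


def pick_transport(client_supports, network_allows):
    """Return the best transport usable by both the client and the network path."""
    for transport in TRANSPORT_PREFERENCE:
        if transport in client_supports and transport in network_allows:
            return transport
    raise RuntimeError("no usable signaling transport")


# A modern browser behind a proxy that blocks WebSocket upgrades.
chosen = pick_transport(
    client_supports={"websocket", "sse", "long-polling"},
    network_allows={"sse", "long-polling"},
)
```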

I’d also use WebSockets whenever. As in whenever I don’t feel that options 3-5 below make sense to me.

UPDATE: As someone smart pointed out – on its own, this technique still requires you to define your own proprietary signaling messages.

3. SIP over WebSocket

This is like WebSockets, only instead of placing proprietary messages inside, you end up putting SIP messages in there.
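
For a feel of what this looks like on the wire, here is a rough Python sketch that builds a SIP INVITE as the text of a single WebSocket message, loosely following RFC 7118 (SIP over WebSocket). The function name, the addresses and the placeholder SDP are all assumptions for illustration; a real SIP stack adds more headers and handles the responses.

```python
def build_sip_invite(caller, callee, call_id, branch, tag, sdp):
    """Build a minimal SIP INVITE as the text payload of one WebSocket message."""
    body = sdp.encode("utf-8")
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        # A browser has no reachable address of its own, so a placeholder
        # host under ".invalid" stands in for it in the Via header.
        f"Via: SIP/2.0/WSS client.invalid;branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller};transport=ws>",
        "Content-Type: application/sdp",
        # Content-Length counts bytes, not characters.
        f"Content-Length: {len(body)}",
    ]
    # SIP lines end in CRLF, and an empty line separates headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


invite = build_sip_invite(
    caller="alice@example.com",
    callee="bob@example.com",
    call_id="call-1@client.invalid",
    branch="z9hG4bK-demo-1",  # SIP branch values start with "z9hG4bK"
    tag="a1b2c3",
    sdp="v=0\r\n",            # placeholder; the real SDP comes from the browser
)
```

The whole string goes out as one WebSocket text message, on a connection opened with the `sip` subprotocol.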

Ugly as hell, but gets the job done – especially if what you are looking for is connecting to an existing telephony backend. Who does this? Asterisk. Those that try to fuse WebRTC to IMS or RCS. People who need to “gateway” their way into SIP.

Unless you already have a SIP investment in place, and unless a major part of your use case includes calling to PSTN – don’t use this. Even if your origins are in VoIP and SIP is your mother tongue.

4. XMPP/Jingle

Similar to SIP, but this time using another standard signaling protocol called XMPP.

If you take this route, it is probably either because you have an existing XMPP installation or you need the presence capabilities that XMPP comes with out of the box (and with server side implementations readily available).
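
To give an idea of what the signaling looks like here, this is a bare-bones Python sketch of a Jingle `session-initiate` stanza for a single audio stream, using the namespaces from XEP-0166 and its RTP and ICE-UDP companions as I understand them. The function name and addresses are assumptions; a real stanza also carries payload types and ICE credentials and candidates, and the `iq` element would sit in the stream's default namespace.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"
RTP_NS = "urn:xmpp:jingle:apps:rtp:1"
ICE_NS = "urn:xmpp:jingle:transports:ice-udp:1"


def build_session_initiate(initiator, responder, sid, iq_id):
    """Build a bare-bones Jingle session-initiate stanza for one audio stream."""
    iq = ET.Element(
        "iq", {"from": initiator, "to": responder, "id": iq_id, "type": "set"}
    )
    jingle = ET.SubElement(
        iq,
        f"{{{JINGLE_NS}}}jingle",
        {"action": "session-initiate", "initiator": initiator, "sid": sid},
    )
    content = ET.SubElement(
        jingle, f"{{{JINGLE_NS}}}content", {"creator": "initiator", "name": "audio"}
    )
    # Codec (payload type) and ICE candidate details are omitted in this sketch.
    ET.SubElement(content, f"{{{RTP_NS}}}description", {"media": "audio"})
    ET.SubElement(content, f"{{{ICE_NS}}}transport")
    return iq


stanza = build_session_initiate(
    "alice@example.com/web", "bob@example.com/web", sid="sid-1", iq_id="iq-1"
)
xml_text = ET.tostring(stanza, encoding="unicode")
```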

I am not a fan of XMPP to say the least, but I don’t really have anything bad to say about this approach. If you know and like XMPP – go for it.

5. Data Channel

WebRTC has a data channel. Once an initial connection is made between the two “endpoints”, you can use the data channel to communicate and drive your signaling instead of going via a server.

Few that I’ve seen use this approach, and it does have merit. It has 3 main benefits:

  1. Latency of signaling messages is lower, as there’s no server in-between that needs to parse and understand them
  2. Since a server isn’t involved, server scalability improves – it handles fewer messages from each connected browser
  3. Improved privacy, simply because tapping into the server gives you less information

UPDATE: As with Comet and WebSockets, you still need to define your messages when using the data channel.
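
The switch from server to data channel can be sketched in Python as a small router: the first messages still go through the server, and everything after the channel opens goes peer-to-peer. `SignalingRouter` and its callbacks are hypothetical names; the two send functions stand in for whatever transport objects a real application has.

```python
class SignalingRouter:
    """Send signaling over the data channel once it is open, else via the server."""

    def __init__(self, send_via_server, send_via_channel):
        # Both arguments are caller-supplied callables taking one message.
        self._send_via_server = send_via_server
        self._send_via_channel = send_via_channel
        self._channel_open = False

    def on_channel_open(self):
        self._channel_open = True

    def on_channel_close(self):
        # Fall back to the server, e.g. to renegotiate after a dropped connection.
        self._channel_open = False

    def send(self, message):
        # Returns the route taken, which is handy for logging.
        if self._channel_open:
            self._send_via_channel(message)
            return "data-channel"
        self._send_via_server(message)
        return "server"


server_log, channel_log = [], []
router = SignalingRouter(server_log.append, channel_log.append)
router.send({"type": "offer"})      # channel not open yet: goes via the server
router.on_channel_open()
router.send({"type": "candidate"})  # now sent peer-to-peer
```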

 

Why is it important?

Selection of the signaling protocol will decide the development effort required for certain features as well as the cost you pay for it – in setup time of sessions, server performance, etc.

It is a decision that shouldn’t be taken lightly.

-

While we’re here – how about subscribing to my monthly email? It deals with such questions on… a monthly basis.

 

 


Comments

  1. Tsahi, this is a nice balanced review of WebRTC signaling options. Scalability is indeed the primary drawback of do-it-yourself approaches regardless of whether Comet, WebSockets, etc. are used.

    For this reason, we’ve published the WebRTC SDK on GitHub back in June for app developers (http://www.pubnub.com/blog/webrtc-sdk-now-available-on-pubnub/) not only to provide a highly-scalable signaling solution but also address the equally-important reliability and QoS aspects.

    The approach is similar to what we’ve done with socket.io which is a very popular WebSockets open-source library but does not scale by itself. Hence, we injected PubNub underneath in order to make it suitable for mass-scale commercial deployments.

    Similarly, if you need to build a mass-scale WebRTC video chat messenger with presence: http://www.pubnub.com/blog/building-video-calling-with-pubnub-and-webrtc/

    • Doron,

      Correct – I didn’t deal with who’s hosting the solution or managing it – just the technology. Now that you mention it, I need to add it to my posting schedule as another topic to touch when it comes to signaling.

      Tsahi

  2. This is a nice overview of the options for signaling in WebRTC projects. When I’m talking with our clients I usually bring the focus back to the use cases particularly around interoperability.

    If your WebRTC app is going to be self-contained, as in you provide all of the control features in your application, then you can choose based on pure technical merit. When there is a need to interop with existing infrastructure, we normally recommend SIP over WebSocket to avoid most of the need for a signaling gateway.

    In reality you will still need to have some level of signaling gateway in place due to the nuances of SIP implementations, especially if you are trying to interop with some less standard implementations like MS Lync.

    This said I do encourage software companies to really think about what level of signaling support they want in their WebRTC app. Not every app is going to need all the features/functions that SIP offers and if you go that route you will invariably get drawn into extended testing cycles.

  3. I was mulling over commenting here, because I’m not sure I really like being branded a fanatic. But that horse has probably taken a train, so it’s too late to bolt the station door. Or something.

    So I thought I’d point out that “XMPP” is just an option for putting inside the Comet/Websocket/Data Channel. The bindings for each are well-defined (though to be fair supporting XMPP in SCTP is pretty experimental). For XMPP in a long polling session, there’s BOSH. There’s also more web-like gateway libraries such as xmpp-ftw.

    A typical server (like open-source Prosody, or commercial Isode M-Link), ships with support for BOSH. As the XMPP/Websocket draft stabilizes, we’ll see that support for Websocket too (Prosody already supports Websocket). Data Channel won’t be very far behind; the work on Websocket support for XMPP is carefully considering SCTP as well.

    XMPP based services have been proven at very high scalability levels, too. Google are not the only ones to have run multi-million client systems, though they’re the most well-known – and for VOIP, that’s still on XMPP (as are their cloud services, too). WhatsApp also use something that’s mostly XMPP. They deviated from the standard partly sensibly (they were in uncharted waters when they started) and stupidly (hence their security problems).

    I’m thankfully not actually as religious as I may seem to be here – I’m just a bit confused when you say things like, “I am not a fan of XMPP to say the least, but I don’t really have anything bad to say about this approach.” – statements like that seem at odds with claiming to give a balanced view, and I feel like I have to balance things a bit.

    I actually think, given the current deployed landscape, that if you want interop with SIP, then it should be easy on paper, but is likely harder because of trickle ICE, security model mismatch, and codecs. In addition, most SIP servers don’t speak any web binding (Asterisk is an outlier here), and as far as I know, none are standardized.

    Leaving out federation for the moment, the question becomes one of least work, highest scaling, and minimum lock-in. That means you’ll be up and running faster, you’ll be able to keep to the same technology as you grow, and yet you’ll still be pretty adaptive. With XMPP you’ve a choice of three libraries and a slew of servers, all interchangeable and most of the servers scale impressively.

    As I say, if you’ve a simple set of needs then the overhead of XMPP won’t be useful. XMPP provides you with a rich set of operations, and if you’re not going to benefit from them there’s little point. But that’s really the main argument against XMPP.

    If your main reason for using XMPP is because you’ve a burning need to tell people you’ve an XML-based backbone, then you’re using XMPP for the wrong reasons, and furthermore, you’re probably a relic from the ’90s.

    On the other hand, if your application is going to need signalling, other peer-to-peer messaging, presence, persistent addressing, user authentication, security, reliable messaging, pub-sub, and/or federation, you’re going to save yourself a lot of effort by using XMPP.

    • I think we are in agreement.

      The only place that we might differ is in how many of the use cases will end up needing that much or how many won’t need to divert in ways that will be hard to achieve with something like XMPP or SIP.

      • I’m not implying that you need to use *every* feature, you know. I’d have thought that just signalling, security, and reliability would be enough to make most people find it an attractive option. Basic signalling is relatively easy, but security and reliability turn out to be quite hard, and personally I prefer leaning on the work of real experts.

        • Now let’s see…
          Just signaling – websites have been doing this for years already. Security – ask PayPal and your bank – they do that over the web quite well. Reliability – got that covered.

          Where does XMPP or SIP help me here? I’d use SIP only if I must, and XMPP if my use case relies heavily on presence (and the way presence is modeled by XMPP).

  4. Philipp Hancke says:

    I have yet to see all those innovative usecases you talk about where Jingle or SIP are not capable of expressing the signalling semantics.
    I hit (and report) bugs in Chrome whenever I try something “innovative”, so the lack of bug reports suggests to me that nobody is innovating here.

    • I have yet to see a vendor that selected SIP or XMPP for his WebRTC service if he didn’t have an existing deployment with one of these protocols already – and even then, the decision isn’t always SIP or XMPP.

      These services and their developers? They made that decision without consulting me.
