Now that we have WebRTC, people are starting to think about SIP over HTML.
At one point, we struggled with the idea of offering video conferencing on the web. That was two or three years ago, before WebRTC. We ended up ruling HTML out and focusing on Flash, which we later ditched due to some technical issues. The project got parked.
When WebRTC came out, I expected the architectures built around it to look something like this:
You take your current VoIP infrastructure, which probably uses SIP, and connect it to a new specialized gateway that “talks” to web browsers using a proprietary protocol implemented in JavaScript. Media is sent using WebRTC, either directly between endpoints or routed through a media server where transcoding is required.
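To make the gateway’s role concrete, here is a minimal Python sketch of the translation step it performs: turning a message from the browser into a SIP INVITE. The JSON message shape, the `json_to_sip_invite` function and the `gw.example.com` address are all hypothetical stand-ins for whatever proprietary protocol such a gateway would actually use.

```python
import json

# Hypothetical address of the gateway itself (an assumption for this sketch).
GATEWAY_HOST = "gw.example.com:5060"


def json_to_sip_invite(raw: str, call_id: str, branch: str, tag: str) -> str:
    """Translate a hypothetical browser 'call' command (JSON) into a SIP INVITE.

    The shape {"type": "call", "from": ..., "to": ..., "sdp": ...} is an
    assumption standing in for the gateway's proprietary protocol.
    """
    msg = json.loads(raw)
    if msg.get("type") != "call":
        raise ValueError(f"unsupported message type: {msg.get('type')!r}")

    body = msg["sdp"]
    headers = [
        f"INVITE {msg['to']} SIP/2.0",
        # RFC 3261 branch IDs start with the magic cookie "z9hG4bK".
        f"Via: SIP/2.0/UDP {GATEWAY_HOST};branch=z9hG4bK{branch}",
        "Max-Forwards: 70",
        f"From: <{msg['from']}>;tag={tag}",
        f"To: <{msg['to']}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:gateway@{GATEWAY_HOST}>",
        "Content-Type: application/sdp",
        # Content-Length counts bytes of the body, not characters.
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # Header lines end in CRLF; an empty line separates headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body
```

A real gateway would also have to deal with the SDP itself, since WebRTC requires ICE and DTLS-SRTP, which many legacy SIP endpoints don’t support; this sketch passes the SDP through untouched.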
But now, it seems, some are working on placing SIP right inside the web browser as well. I’ve found three such projects: sip-js, sipml5 and SIP on the Web. This changes the architecture:
Now we have native SIP going end-to-end at all times. A small proxy translates the transport from TCP/UDP to WebSocket. A media server for transcoding might still be required, though.
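With SIP running in the browser, the browser itself becomes the SIP user agent and sends ordinary SIP requests over the WebSocket. The sketch below builds a REGISTER the way RFC 7118 (the SIP-over-WebSocket specification) describes it: `WS`/`WSS` in the Via header and `transport=ws` in the Contact URI. The `build_ws_register` function and the use of Python are illustrative choices; the projects named above are JavaScript libraries with their own APIs.

```python
import secrets
from typing import Optional


def build_ws_register(user: str, domain: str, secure: bool = True,
                      host_token: Optional[str] = None) -> str:
    """Build a SIP REGISTER as a browser user agent might send it over WebSocket."""
    via_transport = "WSS" if secure else "WS"
    # The browser cannot reliably learn its own IP address and port, so the UA
    # uses a random placeholder host under the reserved .invalid TLD, as in
    # RFC 7118's examples.
    host = f"{host_token or secrets.token_hex(6)}.invalid"
    aor = f"sip:{user}@{domain}"
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/{via_transport} {host};branch=z9hG4bK{secrets.token_hex(8)}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={secrets.token_hex(4)}",
        f"To: <{aor}>",
        f"Call-ID: {secrets.token_hex(10)}",
        "CSeq: 1 REGISTER",
        # RFC 7118 defines the "ws" transport parameter for SIP URIs.
        f"Contact: <sip:{user}@{host};transport=ws>",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"
```

Because RFC 7118 carries each SIP message in exactly one WebSocket message, the proxy can treat every WebSocket message it receives as one complete SIP message.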
The implications are huge. It means that native web applications can now be written in a way that requires little change to current VoIP architecture and infrastructure. And some are on top of it already: Kevin P. Fleming announced on the Asterisk blog the project’s plans to support WebRTC:
Very soon (hopefully in the next few weeks), we’ll be able to demonstrate a standards-compliant audio/video call from a browser through Asterisk (to any destination or channel technology that Asterisk supports) without the use of any plugins or native code in the browser, at all. Pure HTML5 and JavaScript on the browser end, and new modules on the Asterisk end.
It can’t get any better than that.
SIP in HTML5 – The Library
I took the liberty of looking a bit more closely at sipml5 to see what it means in practice. Here are a few observations:
- The package itself is rather simplistic. It provides a basic implementation of a SIP user agent, with the usual supplementary services.
- It comes with presence and instant messaging (a real plus).
- It provides no security-related features: no TLS and no authentication.
- It is probably enough for a lot of the solutions out there…
- At this point, the library weighs 1.8 MB, and you can expect it to grow considerably as additional features are added. That’s a lot compared to the Strophe.js XMPP library, which weighs a mere 137 KB (roughly 13 times smaller).
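For reference, the authentication sipml5 was missing is, in SIP, normally HTTP Digest, which RFC 3261 reuses from RFC 2617: the server challenges with a nonce and the client answers with an MD5-based response. A minimal sketch of computing that response, covering only the MD5 algorithm with and without `qop=auth`; the `digest_response` name is our own, not part of any library mentioned here.

```python
import hashlib
from typing import Optional


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, password: str, realm: str, method: str,
                    uri: str, nonce: str, qop: Optional[str] = None,
                    nc: str = "", cnonce: str = "") -> str:
    """Compute the Digest 'response' value (MD5 algorithm only)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # credential-dependent part
    ha2 = _md5_hex(f"{method}:{uri}")                 # request-dependent part
    if qop is None:
        # RFC 2069-compatible form, used when the challenge carries no qop.
        return _md5_hex(f"{ha1}:{nonce}:{ha2}")
    if qop == "auth":
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    raise ValueError(f"qop {qop!r} is not handled in this sketch")
```

In SIP, `method` would be the request method (for example `REGISTER`) and `uri` the Request-URI. With the worked example from RFC 2617 (user `Mufasa`, password `Circle Of Life`, realm `testrealm@host.com`, `GET /dir/index.html`, `qop=auth`), the function returns `6629fae49393a05397450978507c4ef1`, the value given there.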
The Timing
Why only now? Why did it take so long for SIP to become a web resident, when XMPP (a “competing” protocol) already has multiple libraries written in JavaScript?
- XMPP has a standardized binding over HTTP (BOSH), which libraries such as Strophe.js use, making it easier to implement in a browser
- SIP required native TCP (or UDP) support, which browsers don’t expose; the closest equivalent arrived only with HTML5 WebSocket (and then some standardization wizardry to hook the two together)
- Now that we have WebRTC, it makes sense to get the signaling up and running in the browser as well
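The “standardization wizardry” ended up as RFC 7118, which registers a WebSocket subprotocol named `sip`. On the server side (for example, in the small proxy described earlier), accepting such a connection is a normal RFC 6455 handshake that also confirms the `sip` subprotocol. A minimal sketch of building that handshake response, assuming the request headers have already been parsed into a dictionary; the function names are hypothetical.

```python
import base64
import hashlib
from typing import Mapping

# GUID fixed by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key (RFC 6455)."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def sip_handshake_response(request_headers: Mapping[str, str]) -> str:
    """Build the HTTP response to a WebSocket upgrade that should carry SIP."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}
    offered = [p.strip() for p in headers.get("sec-websocket-protocol", "").split(",")]
    # Sketch policy (an assumption): refuse clients that don't offer "sip".
    if "sip" not in offered or "sec-websocket-key" not in headers:
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    accept = websocket_accept(headers["sec-websocket-key"].strip())
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n"
        # Echo the selected subprotocol so the browser knows SIP was accepted.
        "Sec-WebSocket-Protocol: sip\r\n"
        "\r\n"
    )
```

Rejecting clients that don’t offer `sip` is a choice made for this sketch; a production server would follow whatever policy its SIP stack requires.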
There’s a lot of innovation happening around WebRTC – and not only with voice and video communication. SIP in HTML was the missing piece in the puzzle of connecting current VoIP with the web browser. And now, that piece is here.
Mr. Tsahi,
I am Viet, a WebRTC newbie. I just have two points. First, a media relay is required only for video calls; in other words, the media relay could be skipped for audio-only calls.
Second, multi-party communication: Flash supports it, but WebRTC does not (it only does P2P). This could be overcome by:
1. Having the app open and control several P2P sessions at the same time. I am going to test this.
2. If 1 is impossible, running a service in the cloud to do it.
I am also interested in the future of WebRTC, as I am still trying to define a venture business around it. That is why I love reading your posts.
Viet
Viet,
Voice is the same as video when it comes to P2P: when video needs a media relay, so does voice. It doesn’t happen all the time; it depends on where both parties are on the network and the type of NAT/firewall they are behind.
If you are trying to achieve multi-point video calls, then I am sure this is on WebRTC’s roadmap: Google is planning on consolidating Google Voice, Talk and Hangouts, and it is reasonable to assume they will want to use WebRTC for that. Google Hangouts does multi-point.
Four years have elapsed since this article was published, and we still don’t have WebRTC support in half of the browsers (IE, Edge, Safari).
The best approach for VoIP in browsers seems to be a hybrid solution, such as combining WebRTC with NPAPI and using whichever one the browser supports. Not to mention the complexity of setting up a WebRTC solution properly (TURN servers for ICE). So it is a total mess unless you buy a commercial solution like the mizu webphone to handle all of this consistently.
I tend to disagree, and will write about it in the near future.
One of the reasons why is that in the four years that elapsed, IE’s market share plummeted. Safari hasn’t grown either, and Edge isn’t interesting (and has, or will soon have, WebRTC).
Moreover, if what you are looking for is a communications tool, then apps (mobile or desktop) are the mainstream. The mizu webphone might be a great tool, but I have a feeling it isn’t for everyone either.