VoIP Signaling 101

June 7, 2012
It is time for a short introduction to VoIP signaling. My post on how WebRTC will affect SIP and H.323 prompted a short conversation in the comments section. It made me realize that some of my readers are fairly new to VoIP but still want to learn more about WebRTC. This is why I decided to write this post: VoIP Signaling 101.

Think about driving for a moment. In order to get it right, there's a whole infrastructure in use: cars, roads, signposts, traffic lights, parking spaces, fuel stations and a few other components that I probably missed. VoIP telephony is similar in that respect – it is more than just the selection of a codec. If I had to split VoIP telephony into components, there are 3 main aspects I'd focus on:
  1. Network
  2. Signaling
  3. Media

Network

The network gets the data from one point to the other – be it signaling or media. It has its own characteristics which then affect what types of signaling and media are used. In the case of the internet, we can generally speak about these characteristics:
  • No guaranteed quality of service: data sent might not be received
  • No guaranteed latency: data sent may arrive with different delay characteristics
  • Heterogeneous in nature: different parts of the network behave differently (WiFi, LAN, Ethernet, MPLS, LTE, etc.)
  • Asymmetric: NAT and firewall devices may restrict reaching certain addresses
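
To make the first two points a bit more concrete, here is a minimal sketch in Python of how a receiver might put numbers on loss and delay variation. The function name and the packet tuple layout are my own assumptions for illustration; the smoothing step is the interarrival jitter estimate defined for RTP in RFC 3550.

```python
def summarize_stream(received, sent_count):
    """Estimate loss and jitter for one media stream.

    received: list of (sequence_number, sent_ms, received_ms) tuples for the
              packets that actually arrived (hypothetical layout).
    sent_count: how many packets the sender transmitted.
    """
    # Loss: the share of packets that never showed up.
    loss_fraction = 1 - len(received) / sent_count

    # Jitter: running estimate of how much the transit time varies
    # between consecutive packets (RFC 3550 style, gain of 1/16).
    jitter_ms = 0.0
    previous_transit = None
    for _seq, sent_ms, received_ms in received:
        transit = received_ms - sent_ms
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            jitter_ms += (difference - jitter_ms) / 16
        previous_transit = transit

    return loss_fraction, jitter_ms


# Four packets were sent; packet 3 was lost and packet 4 was delayed.
packets = [(1, 0, 50), (2, 20, 70), (4, 60, 130)]
loss, jitter = summarize_stream(packets, sent_count=4)
print(loss, jitter)  # 0.25 1.25
```

A real receiver would also have to deal with reordering, clock differences between sender and receiver, and sequence number wrap-around, all of which this sketch ignores.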

Signaling

Signaling is the addressing part of VoIP. If I want to reach someone over the network for a voice call, there are a few things I need to know. At the most basic level, these are where that person is and whether they are available. Usually, signaling includes:
  • Registration and discovery – how do I tell the world how to reach me?
  • Dialing – how do I dial, receive calls, drop them, etc.
  • Capabilities – how do participants in the call understand what each side is capable of?
  • Supplementary services – the usual suspects: hold, mute, transfer, forward, park, conferencing, …
The above must be done in the same "language" between all participants – all should know how to communicate in SIP, H.323, XMPP, Skype's proprietary protocol or any other "VoIP dialect". And each such protocol has its own nuances and quirks. You can also look at signaling from another angle, namely the non-functional features it needs to provide:
  • Networking – IPv4, IPv6, UDP, TCP, etc.
  • Security – things like privacy, authentication, etc.
  • Connectivity – the ability to connect endpoints no matter where they are. This focuses mainly on NAT traversal issues
These non-functional requirements apply to the media as well.
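
The functional side of signaling listed earlier in this section (registration, discovery and capabilities) can be sketched in a few lines of Python. This is a toy model under my own assumptions: the Registrar class and the negotiate function are hypothetical names, and real protocols (SIP REGISTER, SDP offer/answer and their equivalents in H.323 or XMPP) add authentication, expiry, error handling and much more.

```python
class Registrar:
    """Toy registration and discovery service (hypothetical, in-memory)."""

    def __init__(self):
        self._contacts = {}

    def register(self, user, address):
        # "How do I tell the world how to reach me?"
        self._contacts[user] = address

    def unregister(self, user):
        self._contacts.pop(user, None)

    def lookup(self, user):
        # Returns None when the user is unknown or not available.
        return self._contacts.get(user)


def negotiate(offered_codecs, supported_codecs):
    """Keep the caller's preference order, drop what the callee lacks."""
    return [codec for codec in offered_codecs if codec in supported_codecs]


registrar = Registrar()
registrar.register("alice", ("192.0.2.10", 5060))

# Discovery: the caller asks where alice can be reached.
print(registrar.lookup("alice"))  # ('192.0.2.10', 5060)
print(registrar.lookup("bob"))    # None

# Capabilities: the caller offers two codecs, the callee supports two others.
print(negotiate(["opus", "PCMU"], {"PCMU", "G729"}))  # ['PCMU']
```

Whatever the protocol, the idea stays the same: both sides need a shared way to find each other and to agree on what they can send and receive before any media flows.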

Media

Media is what we're here for. Once the participants of the call know enough about each other, media kicks in and starts to flow between them to get us our call. Media is usually thought of as the voice and video codecs along with their transport mechanism. But at times, the lines between signaling and media are a little blurry. In recent years, vendors have packaged media into components called media engines or media frameworks, which take care of the codecs and their transport. Like everyone else, these vendors tried (and are still trying) to move up the food chain, and for them this means adding some signaling features into their media engines. It works well most of the time, as it eases the integration and development efforts of application vendors.

Where does WebRTC fit in?

WebRTC is a media engine with a twist – it is targeted at browsers and offers a JavaScript API that is being standardized, so it will eventually be available to all web developers. WebRTC also includes a bit of signaling – the non-functional NAT traversal mechanisms: STUN, TURN and ICE. Luckily, these 3 mechanisms are also the ones defined for NAT traversal in both SIP and XMPP.

When people compare WebRTC to Skype, they miss the fact that WebRTC doesn't include signaling beyond NAT traversal and isn't a service but rather an infrastructure protocol. When people compare WebRTC to SIP, they miss the fact that SIP does signaling and WebRTC does media, and that the two actually fit rather well together.

If you are going to use WebRTC, make sure you know these distinctions and understand what is missing:
  • There's signaling you need to complete
  • Server side components you will need to install (NAT traversal servers and signaling servers)
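
To give a feel for what those NAT traversal servers speak, here is a small Python sketch that builds the 20-byte header of a STUN Binding Request, the message a client sends to learn its public address. The header layout and the magic cookie value come from RFC 5389; the function names are my own, and the sketch neither sends anything over the network nor adds the attributes a real client would include.

```python
import os
import struct

STUN_BINDING_REQUEST = 0x0001   # message type for a Binding Request
STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389


def build_binding_request(transaction_id=None):
    """Return a STUN Binding Request with no attributes (header only)."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Header: 16-bit type, 16-bit body length (0 here), 32-bit magic cookie,
    # all in network byte order, followed by the transaction ID.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


def parse_header(message):
    """Split a STUN header into (type, body length, cookie, transaction ID)."""
    message_type, length, cookie = struct.unpack("!HHI", message[:8])
    return message_type, length, cookie, message[8:20]


request = build_binding_request()
print(len(request))  # 20
```

In the browser, WebRTC handles this exchange for you as part of ICE; what remains your job is running the STUN and TURN servers and building the signaling around them.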
