Why is there only a Single Voice Codec in WebRTC?

July 3, 2014
There are 2, but actually... just one. There's a large number of voice codecs out there, and many of them are used quite a bit. At the end of the day, WebRTC leaned towards using G.711 and Opus. Why is that?

G.711

Consider G.711 the fallback to crappy audio. G.711 is naive and stupid. It does nothing well, eats up bandwidth and is sensitive to network conditions. The only thing it has going for it is that it is supported everywhere. Those using it today do so to connect to existing systems – mainly because transcoding from Opus to whatever they have on the other side requires more effort.
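
To put a number on "eats up bandwidth", here is a rough back-of-the-envelope calculation. It assumes 20 ms packets and plain IPv4 + UDP + RTP headers with no extensions; the function name is just something I made up for the illustration.

```python
# Rough one-way bandwidth of a G.711 stream.
SAMPLE_RATE_HZ = 8000        # G.711 samples narrowband audio at 8 kHz
BITS_PER_SAMPLE = 8          # one companded byte per sample
HEADER_BYTES = 20 + 8 + 12   # assumed IPv4 + UDP + RTP headers, no extensions


def g711_bitrates(frame_ms=20):
    """Return (payload_bps, on_the_wire_bps) for one direction of a call."""
    payload_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
    packets_per_second = 1000 // frame_ms
    payload_bytes = SAMPLE_RATE_HZ * frame_ms // 1000 * BITS_PER_SAMPLE // 8
    wire_bps = (payload_bytes + HEADER_BYTES) * 8 * packets_per_second
    return payload_bps, wire_bps


payload_bps, wire_bps = g711_bitrates()
print(payload_bps, wire_bps)  # 64000 80000
```

That is 64 kbps of payload, and about 80 kbps on the wire under these assumptions, for narrowband audio.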

Opus

Opus is considered by some the best codec in existence today. It is brand new – it is newer than WebRTC itself. And it is heading towards non-WebRTC VoIP products as well. What makes Opus interesting is its unique design.

From narrowband to fullband

The human ear hears only parts of the sounds around us. The hearing spectrum of humans is usually split into 4 broad categories:

  • Narrowband – you can also refer to it as what you hear on a normal phone call today (i.e. nothing)
  • Wideband – something that captures speech nicely, but doesn't work that well for music. This is what is known as HD voice today, where some of the mobile phone carriers have rolled it out
  • Super-wideband – that would be something that covers music as well as speech (and would definitely be good enough for my ears)
  • Fullband – what audiophiles will be looking for
The higher the bands, the more bits you need to express them when you capture and send them – and the more demand there will be on your hardware components and CPU processing. Different codecs are designed to work for different bands. Here's a general rule of thumb for some of these codecs:

Codec     Bands covered
G.711     Narrowband
G.723     Narrowband
G.729     Narrowband
iLBC      Narrowband
AMR-NB    Narrowband
AMR-WB    Wideband
Speex     Narrowband and Wideband
G.722     Wideband
G.719     Fullband
AAC       Fullband
Vorbis    Fullband
MP3       Fullband
Opus      Narrowband, Wideband, Super-wideband and Fullband

Opus is the only codec that fits everything from narrowband to fullband. Which brings us to the next characteristic of Opus.
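
To see why higher bands need more bits, here is a small sketch. The audio bandwidths and sample rates are the ones the Opus specification (RFC 6716) uses for these band names; the dictionary and function names are mine, and the 16 bits per sample is an assumption about the uncompressed capture format.

```python
# Band name -> (audio bandwidth in Hz, sample rate in Hz), as used by Opus.
BANDS = {
    "narrowband": (4000, 8000),
    "wideband": (8000, 16000),
    "super-wideband": (12000, 24000),
    "fullband": (20000, 48000),
}


def raw_pcm_kbps(band, bits_per_sample=16):
    """Bitrate of uncompressed mono audio captured for the given band."""
    _, sample_rate_hz = BANDS[band]
    return sample_rate_hz * bits_per_sample / 1000


for name in BANDS:
    print(name, raw_pcm_kbps(name), "kbps before compression")
```

Going from narrowband to fullband multiplies the raw data the codec has to chew on by six, which is where the extra CPU and bandwidth demand comes from.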

2 for the price of one

Opus is actually 2 codecs, baked into 1:
  1. SILK, the codec introduced by Skype. Its focus is low bitrate speech
  2. CELT, a new codec, which focuses on music and high fidelity
My first thought when I learned that was that, based on the specific call, Opus would choose one of these codecs, but apparently it is a lot smarter than I am – it can use both at the same time:

At any given moment, Opus can encode a single audio frame by splitting it into two parts: SILK encodes the lower frequencies (the narrowband and wideband range) and CELT encodes the frequencies above them (the super-wideband and fullband range). This hybrid mode gives Opus a lot of flexibility, and it leaves room for implementations to optimize for the use case and the processor's capabilities.
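
You can actually see which of the two codecs is in play by looking at the first byte of an Opus packet, the TOC byte. The sketch below decodes it following the layout in RFC 6716, section 3.1; the function name and the returned dictionary are my own invention, not part of any Opus library.

```python
def parse_opus_toc(toc):
    """Decode the first byte of an Opus packet (layout per RFC 6716, 3.1)."""
    if not 0 <= toc <= 0xFF:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3          # top 5 bits: mode, bandwidth and frame size
    stereo = bool(toc & 0x04)  # next bit: mono or stereo
    frame_code = toc & 0x03    # last 2 bits: how frames are packed

    if config < 12:
        mode = "SILK-only"
        bandwidth = ("narrowband", "mediumband", "wideband")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:
        mode = "Hybrid"  # SILK for the low band, CELT on top of it
        bandwidth = ("super-wideband", "fullband")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:
        mode = "CELT-only"
        bandwidth = ("narrowband", "wideband",
                     "super-wideband", "fullband")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    return {"mode": mode, "bandwidth": bandwidth, "frame_ms": frame_ms,
            "stereo": stereo, "frame_code": frame_code}


print(parse_opus_toc(0x78))  # config 15: hybrid, fullband, 20 ms, mono
```

Note that hybrid mode only shows up for super-wideband and fullband; below that, SILK can handle the whole signal on its own.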

FEC and flow control

Opus has built-in FEC and flow control mechanisms. FEC stands for Forward Error Correction. When things go bad on a network, packets get lost. Opus can send redundant data alongside the regular audio as a kind of insurance – if a packet gets lost, the redundant data that arrives later can be used to regenerate what was lost. FEC improves media quality and adds robustness at the cost of bandwidth. What is interesting here is that FEC can be applied to the SILK part only, making sure you hear voice well, but losing the higher bands of music. This saves bandwidth and processing.

Flow control is what enables a codec to negotiate and indicate any commands and information it requires to change behavior during the call. Some codecs have external flow control mechanisms, where the control messages are sent over the signaling channel or in the RTP layer that wraps around the codec. Opus does all flow control by itself. This means the codec's implementation is packaged nicely, with few points of integration with other layers, which makes it easier to maintain.
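
To make the FEC part concrete, here is a toy model of the recovery logic. It assumes that packet n carries frame n plus a lower-quality copy of frame n-1, which is how Opus in-band FEC is usually described; no real audio is involved, and the function is purely illustrative.

```python
def receive_with_fec(num_frames, lost_packets):
    """Say how each frame gets played out when some packets are lost.

    Assumed model: packet n carries frame n at full quality plus a
    low-bitrate redundant copy of frame n-1.
    """
    lost = set(lost_packets)
    playout = []
    for n in range(num_frames):
        if n not in lost:
            playout.append("primary")    # packet arrived, decode normally
        elif n + 1 < num_frames and (n + 1) not in lost:
            playout.append("fec")        # rebuilt from the copy in packet n+1
        else:
            playout.append("concealed")  # nothing usable, decoder has to guess
    return playout


print(receive_with_fec(6, {1, 3, 4}))
```

In this model a single lost packet is recoverable, at reduced quality, as long as the next one arrives. Two losses in a row still leave a hole, and waiting for the next packet costs a frame of extra delay.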

Why is this important?

There are many codec implementations out there for voice and audio, and yet WebRTC decided to focus on a single one – Opus. Having more codecs means headaches for implementers on many levels:
  • Footprint of the implementation grows
  • Need to handle patents and royalties (most codecs have that problem)
  • Deciding on a codec in a call becomes more complex
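
That last point is easy to underestimate. Even the simplest possible selection rule, sketched below, has to deal with preference order and with the case where nothing matches. This is a deliberately naive illustration with a made-up function name; real SDP offer/answer negotiation involves payload types, format parameters and more.

```python
def pick_codec(offered, supported):
    """Return the first codec in the offerer's preference order that the
    other side also supports, or None when there is no overlap."""
    supported_names = {name.lower() for name in supported}
    for name in offered:
        if name.lower() in supported_names:
            return name
    return None


# A browser offering Opus first, talking to a hypothetical legacy gateway.
browser = ["opus/48000/2", "PCMU/8000", "PCMA/8000"]
gateway = ["PCMA/8000", "PCMU/8000"]
print(pick_codec(browser, gateway))  # PCMU/8000
```

Every extra codec adds another row to this matrix, plus its own parameters to agree on.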
WebRTC leaned towards using a codec that can fit as many use cases as possible without sacrificing quality. What it did sacrifice is interoperability – you need server-side transcoding for that. For now, WebRTC's selection of a voice codec means it can offer the best audio experience compared to other VoIP and communication systems. I have no doubt that a newer and better voice codec will present itself. How WebRTC will handle that will be interesting to see.
