Why is there only a Single Voice Codec in WebRTC?

03/07/2014

There are 2, but actually… just one.

There’s a large number of voice codecs out there, and many of them are used quite a bit. At the end of the day, WebRTC leaned towards using G.711 and Opus. Why is that?

G.711

Consider G.711 the fallback to crappy audio.

G.711 is naive and stupid. It does nothing well, eats up bandwidth and is sensitive to network conditions. The only thing it has going for it is that it is supported everywhere.

Those using it today do so to connect to existing systems – mainly because transcoding from Opus to whatever they have on the other side requires more effort.

Opus

Opus is considered by some the best codec in existence today. It is brand new – it is newer than WebRTC itself. And it is heading towards non-WebRTC VoIP products as well.

What makes Opus interesting is its unique design.

From narrowband to fullband

The human ear hears only parts of the sounds around us. The hearing spectrum of humans is usually split into 4 broad categories:

Voice bands

  • Narrowband – you can also refer to it as what you hear on a normal phone call today (i.e. nothing)
  • Wideband – something that captures speech nicely, but doesn’t work that well for music. This is what is known as HD voice today, where some of the mobile phone carriers have rolled it out
  • Super-wideband – that would be something that covers music as well as speech (and would definitely be good enough for my ears)
  • Fullband – what audiophiles will be looking for

The higher the bands, the more bits you need to express them when you capture and send them – and the more demand there will be on your hardware components and CPU processing.
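To put rough numbers on that, here is a back-of-the-envelope sketch of how much raw, uncompressed audio each band produces before any codec touches it. The sample rates are the ones the Opus specification (RFC 6716) lists for each band; other codecs draw the boundaries a little differently, and the helper name `raw_pcm_kbps` is made up for this illustration.

```python
# Effective sample rates per band as listed in RFC 6716 (Opus).
# Other codecs may use slightly different rates for the same band names.
SAMPLE_RATE_HZ = {
    "narrowband": 8_000,
    "wideband": 16_000,
    "super-wideband": 24_000,
    "fullband": 48_000,
}


def raw_pcm_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM audio in kbit/s (hypothetical helper)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


for band, rate in SAMPLE_RATE_HZ.items():
    print(f"{band:15s} {rate:6d} Hz -> {raw_pcm_kbps(rate):6.0f} kbit/s raw")
```

The same arithmetic explains the G.711 bitrate: 8,000 samples per second at 8 bits each gives 64 kbit/s for narrowband audio.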

Different codecs are designed to work for different bands. Here’s a general rule of thumb for some of these codecs:

Codec     Supported bands
G.711     Narrowband
G.723     Narrowband
G.729     Narrowband
iLBC      Narrowband
AMR-NB    Narrowband
AMR-WB    Narrowband and Wideband
Speex     Narrowband and Wideband
G.722     Wideband
G.719     Super-wideband
AAC       Fullband
Vorbis    Fullband
MP3       Fullband
Opus      Narrowband, Wideband, Super-wideband and Fullband

Opus is the only codec that fits everything from narrowband to fullband. Which brings us to the next characteristic of Opus.

2 for the price of one

Opus is actually 2 codecs, baked into 1:

  1. SILK, the codec introduced by Skype. Its focus is low bitrate speech
  2. CELT, a new codec, which focuses on music and high fidelity

My first thought when I learned that was that based on the specific call, Opus will choose one of these codecs, but apparently it is a lot smarter than I am – it uses both at the same time:

Opus hybrid use of SILK and CELT

At any given moment, Opus can encode a single audio frame by splitting it into two parts, encoding the narrowband and wideband frequencies using SILK and the super-wideband and fullband frequencies using CELT.

This hybrid mode gives Opus a lot of flexibility, and it leaves room for implementations to optimize based on the use case and processor capabilities.
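As a rough illustration of that split, the sketch below assigns frequency ranges to the two layers. The 8 kHz crossover is the figure RFC 6716 gives for hybrid mode; the function name and return shape are invented here, and a real encoder also has SILK-only and CELT-only modes that this toy ignores.

```python
# In hybrid mode SILK covers the spectrum up to 8 kHz and CELT covers the rest
# (per RFC 6716). Everything else in this sketch is illustrative only.
SILK_CEILING_HZ = 8_000


def hybrid_split(audio_bandwidth_hz):
    """Return {layer: (low_hz, high_hz)} for a frame with the given audio bandwidth."""
    layers = {"SILK": (0, min(audio_bandwidth_hz, SILK_CEILING_HZ))}
    if audio_bandwidth_hz > SILK_CEILING_HZ:
        # Only frequencies above the SILK ceiling are handed to CELT.
        layers["CELT"] = (SILK_CEILING_HZ, audio_bandwidth_hz)
    return layers


print(hybrid_split(20_000))  # fullband: both layers are used
print(hybrid_split(4_000))   # narrowband: SILK alone is enough
```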

FEC and flow control

Opus has built-in FEC and flow control mechanisms.

FEC stands for Forward Error Correction. When things go bad on a network, packets get lost. Opus has the ability to send additional packets serving as a kind of insurance – if packets get lost, then these packets can be used to regenerate the lost packets.

FEC adds robustness at the cost of bandwidth and improves media quality.

What is interesting here is that FEC can be applied to the SILK codec only, making sure you hear voice well, but losing the higher bands of music. This saves bandwidth and processing.
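To make the insurance idea concrete, here is a toy model of redundancy-based recovery. It assumes each packet piggybacks a copy of the previous frame, which is roughly the shape of Opus in-band FEC; in the real codec the redundant copy is a lower-bitrate encoding rather than an exact duplicate. The packet layout and function names are invented for this sketch.

```python
def packetize(frames):
    """Each packet carries its own frame plus a redundant copy of the previous one."""
    packets = []
    previous = None
    for seq, frame in enumerate(frames):
        packets.append({"seq": seq, "frame": frame, "redundant_prev": previous})
        previous = frame
    return packets


def receive(packets, total):
    """Rebuild the frame sequence; None marks a frame that could not be recovered."""
    by_seq = {p["seq"]: p for p in packets}
    out = []
    for seq in range(total):
        if seq in by_seq:
            out.append(by_seq[seq]["frame"])
        elif seq + 1 in by_seq:
            # The frame was lost, but the next packet carries a copy of it.
            out.append(by_seq[seq + 1]["redundant_prev"])
        else:
            out.append(None)
    return out


frames = ["f0", "f1", "f2", "f3"]
sent = packetize(frames)
arrived = [p for p in sent if p["seq"] != 1]  # packet 1 is lost on the network
print(receive(arrived, len(frames)))
```

A single lost packet is recovered from its successor. Two losses in a row leave a gap, which is why this kind of FEC is insurance and not a guarantee.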

Flow control is what enables a codec to negotiate and indicate any commands and information it requires to change behavior during the call. Some codecs have external flow control mechanisms, where the control messages are sent over the signaling channel or the RTP that wraps around the codec.

Opus does all flow control by itself. This means the codec’s implementation is packaged nicely, with few points of integration to other layers. It makes it easier to maintain.
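One visible piece of this self-containment is the first byte of every Opus packet, which RFC 6716 calls the TOC byte. It tells the decoder which mode, band and frame size the encoder picked for that packet, so the encoder can change its mind mid-call without any outside signaling. The sketch below decodes that byte following the layout in section 3.1 of the RFC; the function name and the returned dictionary are my own choices, not part of any library.

```python
def parse_toc(toc):
    """Decode the first byte of an Opus packet (layout per RFC 6716, section 3.1)."""
    config = toc >> 3               # top 5 bits: mode, band and frame size
    stereo = bool((toc >> 2) & 1)   # 1 bit: mono or stereo
    code = toc & 0x3                # 2 bits: how frames are packed in the packet

    if config < 12:
        mode = "SILK-only"
        band = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:
        mode = "Hybrid"
        band = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:
        mode = "CELT-only"
        band = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    return {"mode": mode, "band": band, "frame_ms": frame_ms,
            "stereo": stereo, "code": code}


print(parse_toc(0x78))  # a hybrid, fullband, 20 ms, mono packet
```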

Why is this important?

There are many codec implementations out there for voice and audio, and yet WebRTC decided to focus on a single one – Opus. Having more codecs means headaches for implementers on many levels:

  • Footprint of the implementation grows
  • Need to handle patents and royalties (most codecs have that problem)
  • Deciding on a codec in a call becomes more complex
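The last point is easy to see in a sketch. With only two codecs on the table, picking one is a short walk down a preference list. The function below is a simplified stand-in for real offer/answer negotiation, not the actual WebRTC logic; `PCMU` and `PCMA` are the RTP payload names for the two G.711 variants, and everything else is hypothetical.

```python
def negotiate(local_preference, remote_offer):
    """Return the first locally preferred codec that the remote side also offers."""
    offered = {name.lower() for name in remote_offer}
    for codec in local_preference:
        if codec.lower() in offered:
            return codec
    return None  # no codec in common: the call needs transcoding or fails


webrtc_side = ["opus", "PCMU", "PCMA"]  # Opus first, G.711 as the fallback

print(negotiate(webrtc_side, ["opus", "G722"]))          # another WebRTC-style endpoint
print(negotiate(webrtc_side, ["G729", "PCMA", "PCMU"]))  # a legacy VoIP system
```

Every additional codec lengthens both lists and adds per-codec parameters to agree on, which is where the complexity creeps in.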

WebRTC leaned towards using a codec that can fit as many use cases as possible without sacrificing quality. What it did sacrifice is interoperability – you need server-side transcoding for that.

For now, WebRTC’s selection of a voice codec means it can offer the best audio experience compared to other VoIP and communication systems.

I have no doubt that a newer and better voice codec will present itself. How that will be handled in WebRTC will be interesting to see.

Responses

Michael Graves says:
July 3, 2014

In truth, WebRTC and Opus are somewhat codependent. Without the WebRTC battlefront Opus would not see widespread uptake as quickly as it has. Without Opus the WebRTC effort would be mired in even more debate about codecs.

Techie007 says:
December 15, 2014

Opus and Vorbis support in Internet Explorer for HTML5 is now up for vote. Please vote it up here so that webpages using HTML5 audio with Opus will work in Internet Explorer (all you need is an email address to sign-in): https://wpdev.uservoice.com/forums/257854-internet-explorer-platform/suggestions/6513488-ogg-vorbis-and-opus-audio-formats-support-firefox

Chrome, Firefox, and Opera already support these awesome codecs.

Supported Audio Constraints in getUserMedia() - Pipe Blog says:
November 9, 2017

[…] Opus is a great audio codec. It’s open, royalty free and so flexible it can encode (fullband) music better than the AAC encoder in iTunes but also encode narrowband voice with a latency that’s lower than any other codec’s. Tsahi Levent Levi has a great blog post on why Opus is so great. […]
