Can a native media engine beat WebRTC’s performance?

By Tsahi Levent-Levi

February 6, 2023  

WebRTC is the best media engine out there. And it has nothing to do with its performance…

I was part of the video conferencing industry throughout the first decade of the 21st century and a bit of the second decade as well. The driving force at the time was resolution and frame rate. There was an arms race among vendors over who could provide higher resolutions and frame rates in their room systems. Much of the ethos at the time revolved around implementing proprietary media engines built for the task at hand. Optimizing and fine-tuning them for media quality was considered a core competency.

Fast forward to 2023: what should the mindset and ethos be today?

🤔👉 This is something of a continuation of my article on WebRTC predictions for 2023

What is a media engine?

In the context of VoIP and WebRTC, a media engine is a component that takes care of media processing. Simplified, a media engine implementation does something like this:

  • Capturing the raw data from the input devices (camera and microphone, but also the display)
  • Encoding that media and then sending it over the network (with WebRTC, that’s using SRTP)
  • Receiving the media from the network and then decoding it
  • Playing it back to the speakers and the display

The media engine also deals with improving voice and video – things such as echo cancellation, noise suppression, packet loss concealment, background blurring, etc.
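To make the stages above a bit more concrete, here is a deliberately tiny TypeScript sketch of an audio-only pipeline. Everything in it is hypothetical and simplified: a crude noise gate stands in for noise suppression, 16-bit PCM stands in for a real codec (there is no compression at all), and sorting packets by sequence number stands in for a jitter buffer. Real engines such as libWebRTC do far more at every one of these stages.

```typescript
// A toy, audio-only model of the media engine stages listed above.
// All names are hypothetical. A real engine such as libWebRTC uses real
// codecs (e.g. Opus), SRTP, jitter buffers and far more elaborate DSP.

type Samples = Float32Array; // PCM samples in the range [-1, 1]

interface Packet {
  seq: number;         // sequence number, loosely modeled on RTP
  payload: Int16Array; // encoded samples carried by this packet
}

// "Improving voice": a crude noise gate that silences low-level samples.
// Real noise suppression is far more sophisticated than this.
function noiseGate(input: Samples, threshold = 0.02): Samples {
  return input.map((s) => (Math.abs(s) < threshold ? 0 : s));
}

// "Encoding": quantize to 16-bit linear PCM. No actual compression happens.
function encode(input: Samples): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const clamped = Math.max(-1, Math.min(1, input[i]));
    out[i] = Math.round(clamped * 32767);
  }
  return out;
}

// "Sending": split the encoded stream into numbered, fixed-size packets.
function packetize(data: Int16Array, samplesPerPacket: number): Packet[] {
  const packets: Packet[] = [];
  for (let start = 0, seq = 0; start < data.length; start += samplesPerPacket, seq++) {
    packets.push({ seq, payload: data.slice(start, start + samplesPerPacket) });
  }
  return packets;
}

// "Receiving": packets may arrive out of order, so sort by sequence number
// before reassembling (a very rough stand-in for a jitter buffer).
function depacketize(packets: Packet[]): Int16Array {
  const ordered = [...packets].sort((a, b) => a.seq - b.seq);
  const total = ordered.reduce((n, p) => n + p.payload.length, 0);
  const out = new Int16Array(total);
  let offset = 0;
  for (const p of ordered) {
    out.set(p.payload, offset);
    offset += p.payload.length;
  }
  return out;
}

// "Decoding": convert back to floating-point samples ready for playback.
function decode(data: Int16Array): Samples {
  return Float32Array.from(data, (v) => v / 32767);
}

// Capture -> improve -> encode -> send ... receive -> decode -> play.
const captured: Samples = Float32Array.from([0.5, 0.01, -0.25, 0.9, -0.005, 0.3]);
const sent = packetize(encode(noiseGate(captured)), 4);
const played = decode(depacketize([...sent].reverse())); // simulate reordering
```

Each of these toy functions hides the work a production engine has to do, and then optimize per device and platform, at that stage of the pipeline.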

WebRTC (and libWebRTC) as a media engine

One of the descriptions of WebRTC that I love is that WebRTC is a media engine with a JavaScript API on top.

Google’s implementation of WebRTC is libWebRTC. It originated in Google’s acquisition of GIPS (Global IP Solutions), a company that licensed its proprietary media engine to VoIP developers. Google took that library, sprinkled the WebRTC API definition on top of it and integrated it into its Chrome browser.

10 years ago, there were other media engines as well. Most large vendors built and maintained their own media engine – especially if their market was video conferencing.

WebRTC is a standard at both the network and interface layers, and libWebRTC is an open source implementation of it (one that is maintained by Google AND integrated inside the most popular web browser). Together, that made WebRTC the best media engine out there practically overnight (or at least within 10 years and through a pandemic).

Joining a video call in your browser? Great! If you aren’t using Zoom, then there’s a 99.99% chance that what you are using is WebRTC, with the libWebRTC implementation.

Can a media engine other than WebRTC perform better?


Yes.

But what does that even mean?

What does performing better than WebRTC mean exactly?

  • It supports HEVC. Is it better?
  • Let’s say it uses 10% less CPU. Is it better? How about 30% less memory consumption? That’s definitely much better
  • The video encoder compresses the same video input using 5% fewer bits at similar video quality. Is it better now?
  • It has more resilience to packet losses. It must be better!
  • Offering more voice codecs makes it better. Obviously…
  • …

libWebRTC isn’t the best media engine out there. At least not in the one (or more) parameters you’ve decided to compare against your own proprietary alternative. But does it even matter?

Advantages of native (and proprietary) media engines

Building and maintaining your own native and proprietary media engine? Good for you! Let’s see what advantages you gain by doing that:

  • You own and control your destiny
    • The code is yours
    • Along with it, the ability to modify it at will
  • Your application, your behavior
    • libWebRTC is optimized for… well… nothing. Almost – it is optimized for Google’s own needs
    • Your implementation of a media engine can be optimized to the exact needs, architecture, hardware and software that you use
  • Easy to differentiate
    • You own the code. You modify it to your heart’s content
    • This means that media specific capabilities can be unique and differentiated

Challenges of native (and proprietary) media engines

Now that we’re happy with building our own native and proprietary media engine, let’s see what our challenges are:

  • Resources
    • Developing and maintaining media engines is ridiculously expensive and time-consuming
    • There aren’t a lot of experienced media engine engineers out there waiting in line to be hired
  • Availability
    • Where exactly is your media engine running? Windows?
      • Now we need it for Mac
      • Next week on iOS and Android
      • And on a gazillion devices and chipsets
    • Every new device permutation you need to support is a new headache to deal with and optimize for
    • Did I mention it takes time and money to do that?
  • Browsers
    • You’ve got your super perfect solution, but what happens the moment your customers want to be able to use it in a browser?
    • That’s when you need WebRTC…
    • And for that, you need to gateway and interoperate between your own media engine and the WebRTC implementation found in browsers
    • In most cases, doing that will degrade the media experience AND remove most of your proprietary differentiated features

WebTransport, WebCodecs, WebAssembly

We’re in the 3rd year of the WebRTC unbundling trend. These are still early days.

WebAssembly is here. It is powerful. And it is used more and more, with ever increasing usefulness.

WebTransport and WebCodecs are still great experiments, usable mostly for proofs of concept or early implementations. Using these to power a full-fledged media engine that doesn’t make use of WebRTC is still a challenge.

Not all browsers support these interfaces, and those that do still have instabilities and a lot of optimization work to pour into them.
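Because support is uneven, an application that wants to experiment with these APIs would typically feature-detect them first and fall back to WebRTC otherwise. Below is a minimal TypeScript sketch, assuming a browser-like global scope; the helper and its field names are hypothetical, and it only checks that the relevant globals exist.

```typescript
// Hypothetical helper: which "unbundled" building blocks does this runtime
// expose? It only checks that the globals exist; it says nothing about
// codec support, stability or performance.
interface UnbundledSupport {
  webAssembly: boolean;
  webTransport: boolean;
  webCodecs: boolean;
}

function detectUnbundledSupport(scope: object = globalThis): UnbundledSupport {
  return {
    webAssembly: "WebAssembly" in scope,
    webTransport: "WebTransport" in scope,
    // WebCodecs exposes VideoEncoder/VideoDecoder (and AudioEncoder/AudioDecoder).
    webCodecs: "VideoEncoder" in scope && "VideoDecoder" in scope,
  };
}

const support = detectUnbundledSupport();
const canSkipWebRTC = support.webAssembly && support.webTransport && support.webCodecs;
console.log(support, canSkipWebRTC ? "try the unbundled path" : "fall back to WebRTC");
```

The presence of `VideoEncoder` does not mean a particular codec or resolution will work; WebCodecs provides `VideoEncoder.isConfigSupported()` to query a specific configuration before relying on it.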

Using these is a long term investment that won’t offer a usable solution for 2023.

Why would I choose WebRTC as my media engine every day of the week?

Going to use your own native and proprietary media engine implementation? Good for you!

But do you need browser support in your application? Is that 5% of your user base or interactions, or is it more like 50% or more?

Are you looking to make use of open source media servers and components? If so, are these available for your proprietary implementation, or will it be easier to just use ones that support… 🤔 WebRTC!

Assuming you need browser support for your application, and that said browser support isn’t there just as another unused feature to win a customer deal (and then lie forgotten somewhere), then you should just use WebRTC.

Why?

Because at the end of the day, that’s what browsers have available for you.


Comments

  1. Great post Tsahi!

    I think the costs associated with developing something akin to WebRTC should be mentioned. I think Google already spent $1B+ in development, so it's hard to see how developing it in house would be justified, unless it were constrained to run on a specific platform and without many features.

    WebTransport, WebCodecs and WebAssembly are interesting technologies. I suppose we might one day see a Zoom version for the browser leveraging these technologies?

  2. Another advantage of a media engine is handling scenes with fast movement. Typical video conferencing is talking heads without much movement, but you can test the quality simply by moving your hands fast and seeing how it looks on the receiving side. I think you can call WebRTC a good-enough-quality implementation that will progress over time.
