2020 marks the point of WebRTC unbundling. It seems like the new initiatives are the beginning of the end of WebRTC as we know it as we enter the era of differentiation.
Life is interesting with WebRTC. One moment, it is the only way to get real time media towards a web browser. And the next, there are other alternatives. Though no one is quite announcing them the way they should.
We’re at the cusp of getting WebRTC 1.0 officially released. Seriously this time. For real. I think. Well… maybe.
If I were to chart our path through this crazy world of WebRTC, it would look something like this:
Towards the end of 2019, and with greater force during the pandemic, we’ve seen what the future of WebRTC looks like. It is all about differentiation.
Up until now, all vendors had access to the same WebRTC stack, as it is implemented by Google (and the other browser vendors), with the exact same capabilities in the browser.
I’ve alluded to it in my article about Google’s private WebRTC roadmap. Since then, many additional signals came from Google marking this as the way forward.
Today, there are 2 separate WebRTC stacks – the one available to all, and the one used internally by Google in native applications. While this is something everyone can do, Google is now leveraging this option to its fullest.
The interesting thing that is happening is taking place somewhat elsewhere though. WebRTC is now being unbundled so that Google (and others) don’t need to maintain two separate versions, but rather can have their own “differentiation” implemented on top of “WebRTC”.
At this point, you’re probably asking yourselves what that means exactly. Before we continue, I suggest you watch the last 15 minutes from web.dev LIVE Day Two:
That’s where Google is showing off the progress made in Chrome and what the future holds.
The whole framing of this session feels “off”. Google here is contemplating how they can bring a solution that can fit Zoom, at a time when 99% of all vendors have already figured out how to be in the browser – by using WebRTC.
The solution here is to unbundle WebRTC into 3 separate components:
- WebTransport – enables sending bidirectional low latency UDP-like traffic between a client and a “web server”, which in our context is a media server
- WebCodecs – gives the browser the ability to encode and decode audio and video independently of WebRTC
- WebAssembly – a browser accelerator for running code and an enabler for machine learning
While these can all be used for new and exciting use cases (think Google Stadia, with a simpler implementation), they can also be used to implement something akin to what WebRTC does (without the peer-to-peer capability).
WebTransport replaces SRTP. WebCodecs does the encoding/decoding. WebAssembly does all the differentiation and some of the heavy lifting that is left (things like bandwidth estimation). Echo cancellation and other audio algorithms… not sure where they end up – maybe inside WebCodecs or implemented using WebAssembly.
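To make the division of labor a bit more concrete, here is a minimal sketch of one job that lands on the application once SRTP is out of the picture: cutting an encoded frame into datagram-sized packets and putting it back together on the other side. The 8-byte header layout and the names `fragmentFrame` and `reassembleFrame` are assumptions invented for this illustration; they are not part of WebTransport, WebCodecs, or any standard.

```typescript
// Hypothetical 8-byte header, invented for this sketch (not a standard format):
//   bytes 0-3: frame id, bytes 4-5: fragment index, bytes 6-7: fragment count
const HEADER_BYTES = 8;

// Split one encoded frame (for example, bytes copied out of an encoded
// video chunk) into packets no larger than maxDatagram bytes.
function fragmentFrame(frameId: number, payload: Uint8Array, maxDatagram: number): Uint8Array[] {
  const room = maxDatagram - HEADER_BYTES;
  if (room <= 0) throw new Error("maxDatagram too small for the header");
  const count = Math.max(1, Math.ceil(payload.length / room));
  if (count > 0xffff) throw new Error("frame too large for a 16-bit fragment count");
  const packets: Uint8Array[] = [];
  for (let i = 0; i < count; i++) {
    const body = payload.subarray(i * room, (i + 1) * room);
    const packet = new Uint8Array(HEADER_BYTES + body.length);
    const view = new DataView(packet.buffer);
    view.setUint32(0, frameId);
    view.setUint16(4, i);
    view.setUint16(6, count);
    packet.set(body, HEADER_BYTES);
    packets.push(packet);
  }
  return packets;
}

// Rebuild a frame from its packets, in any arrival order.
// Returns null when a fragment is missing, since datagrams are unreliable.
function reassembleFrame(packets: Uint8Array[]): Uint8Array | null {
  const head = packets[0];
  if (!head) return null;
  const headView = new DataView(head.buffer, head.byteOffset, head.byteLength);
  const frameId = headView.getUint32(0);
  const count = headView.getUint16(6);
  const parts: (Uint8Array | undefined)[] = new Array(count);
  let total = 0;
  for (const p of packets) {
    const view = new DataView(p.buffer, p.byteOffset, p.byteLength);
    const index = view.getUint16(4);
    // Skip packets from another frame, out-of-range indexes, and duplicates.
    if (view.getUint32(0) !== frameId || index >= count || parts[index]) continue;
    const body = p.subarray(HEADER_BYTES);
    parts[index] = body;
    total += body.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (let i = 0; i < count; i++) {
    const part = parts[i];
    if (!part) return null; // a fragment was lost
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

In a browser, the payload would presumably come from a WebCodecs encoder's output and the packets would be written to a WebTransport session's datagram stream. Retransmission, pacing, and bandwidth estimation are left out here, and those are exactly the parts where vendors would differentiate.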
What comes after the unbundling of WebRTC?
This isn’t just a thought. It is an active effort taking place at Google and in standardization bodies. The discussion today is about enabling new use cases, but the more important discussion is about what that means to the future of WebRTC.
As we unbundle WebRTC, do we need it anymore?
With Google, as they have switched gears towards differentiation already, it is not that hard to see how they shift away from WebRTC in their own applications:
Google Stadia is all about cloud gaming. WebRTC is currently used there because it was the closest and only solution Google had for low latency live streaming towards a web browser.
What does Google Stadia need from WebRTC?
- The ability to decode video in real time in the browser
- Send back user actions from the remote control towards the cloud at low latency
That’s a small portion of what WebRTC can do, and using it as the monolith that it is is probably hurting Google’s ability to optimize the performance further.
Sending back user actions was already implemented in Stadia on top of QUIC and not SCTP. That’s because Google has greater control over QUIC’s implementation than it does over SCTP. They are probably already using an early implementation of WebTransport, which is built on top of QUIC, in Stadia.
The decoding part? Easier to just do over WebTransport as well and be done with it instead of messing around with the intricacies of setting up WebRTC peer connections and maintaining them.
For Stadia, unbundling WebRTC makes moving away from WebRTC to a WebTransport+WebCodecs combo the natural choice.
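As a rough illustration of how little the "send back user actions" half needs, here is a sketch of packing a controller event into a 12-byte message that could travel in a single datagram. Everything in it is an assumption made up for this example: the `PadEvent` fields, the byte layout, and the helper names. It says nothing about what Stadia actually sends.

```typescript
// Hypothetical controller event; field names are invented for this sketch.
interface PadEvent {
  seq: number;         // sequence number, wraps at 16 bits
  timestampMs: number; // client clock in milliseconds, wraps at 32 bits
  buttons: number;     // bitmask of up to 16 pressed buttons
  axisX: number;       // stick position in [-1, 1]
  axisY: number;       // stick position in [-1, 1]
}

const PAD_EVENT_BYTES = 12;

function clampAxis(v: number): number {
  return Math.max(-1, Math.min(1, v));
}

// Pack an event into a fixed 12-byte, big-endian message.
function encodePadEvent(e: PadEvent): Uint8Array {
  const bytes = new Uint8Array(PAD_EVENT_BYTES);
  const view = new DataView(bytes.buffer);
  view.setUint16(0, e.seq & 0xffff);
  view.setUint32(2, e.timestampMs >>> 0);
  view.setUint16(6, e.buttons & 0xffff);
  view.setInt16(8, Math.round(clampAxis(e.axisX) * 32767));
  view.setInt16(10, Math.round(clampAxis(e.axisY) * 32767));
  return bytes;
}

function decodePadEvent(bytes: Uint8Array): PadEvent {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    seq: view.getUint16(0),
    timestampMs: view.getUint32(2),
    buttons: view.getUint16(6),
    axisX: view.getInt16(8) / 32767,
    axisY: view.getInt16(10) / 32767,
  };
}

// Datagrams can arrive out of order, so the receiver keeps only events that
// are newer than the last one applied (with 16-bit wraparound).
function isNewerSeq(seq: number, lastSeq: number): boolean {
  const diff = (seq - lastSeq) & 0xffff;
  return diff !== 0 && diff < 0x8000;
}
```

For this kind of traffic a late event is worthless, so dropping stale messages instead of retransmitting them is the usual trade-off. Making that choice per message is something an unbundled transport lets the application do on its own terms.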
Google Duo & Google Meet
For Duo and Meet things are a bit less apparent.
They are built on top of WebRTC and use it to its fullest. Both have been optimized during this pandemic to squeeze every ounce of potential out of what WebRTC can do.
But is it enough?
Differentiation in WebRTC
Google has been adding layers of differentiation and features on top and inside of WebRTC recently to fit their requirements as the pandemic hit. Suddenly, video became important enough and Zoom’s IPO and its huge rise in popularity made sure that management attention inside Google shifted towards these two products.
This caused an acceleration of the roadmap and the introduction of new features – most of them to catch up and close the gap with Zoom’s capabilities.
These features ranged from simple performance optimizations, through beefing up security (Google Duo doing E2EE now), towards machine learning stuff:
- Proprietary packet loss concealment algorithm in native Duo app
- Cloud based noise suppression for Meet
- Upcoming background replacement for Meet
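For context on the first item in that list, packet loss concealment fills in audio that never arrived. The sketch below is a deliberately naive baseline, assuming fixed-size frames of PCM samples with `null` marking a lost frame: it replays the previous frame at reduced volume. It is not WaveNetEQ and not the concealment used in libwebrtc; the function name and the `decay` parameter are invented for illustration.

```typescript
// Naive packet loss concealment: replay the last frame, attenuated.
// frames: audio frames in playout order; null marks a frame that was lost.
function concealLosses(
  frames: (Float32Array | null)[],
  frameSize: number,
  decay = 0.5, // hypothetical attenuation applied per consecutive lost frame
): Float32Array[] {
  const out: Float32Array[] = [];
  let last: Float32Array = new Float32Array(frameSize); // silence before any audio
  for (const frame of frames) {
    if (frame) {
      last = frame;
    } else {
      // Lost frame: reuse the previous output at lower volume, so a long
      // burst of losses fades towards silence instead of looping loudly.
      last = last.map((s) => s * decay);
    }
    out.push(last);
  }
  return out;
}
```

A machine learning approach aims to generate plausible audio for the gap rather than just repeating what came before, which is where the quality difference, and the differentiation, comes from.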
Advantages of unbundling WebRTC for Duo/Meet
Can Google innovate and move faster if they used the unbundled variant? Instead of using WebRTC, just make use of WebTransport+WebCodecs+WebAssembly?
What advantages would they derive out of such a move?
- Faster time to market on some features, as there’s no need to haggle with standardization organizations on how to introduce them (E2EE requires the introduction of Insertable Streams to WebRTC)
- Google Meet is predominantly server based, so the P2P capability of WebRTC isn’t really necessary. Removing it would reduce the complexity of the implementation
- More places to add machine learning in a differentiated way, instead of offering it to everyone. The new WaveNetEQ packet loss concealment was added outside of WebRTC and only in native apps; with the unbundled variant, it could theoretically be implemented without the need to maintain two separate implementations
If I were Google, I’d be planning ahead to migrate away from WebRTC to this newer alternative in the next 3-5 years. It won’t happen in a day, but it certainly makes sense to do.
Can/should Google maintain two versions of WebRTC?
Today, for all intents and purposes, Google maintains two separate versions of WebRTC.
The first is the one we all know and love. It is the version found in webrtc.org and the one that is compiled into Chrome.
The other one is the one Google uses and promotes, where it invests in differentiation through the use of machine learning. This is where their WaveNetEQ can be found.
Do you think Google will put engineers to work on improving the packet loss concealment algorithm in the WebRTC code in Chrome, or on improving its WaveNetEQ packet loss concealment algorithm? Which one would further its goal, assuming they don’t have the manpower to do both? (and they don’t)
I can easily see a mid-term future where Google invests a lot less in WebRTC inside Chrome and shifts focus towards WebTransport+WebCodecs with their own proprietary media engine implementation on top of it powered by WebAssembly.
Will that happen tomorrow? No.
Should you be concerned and even prepare for such an outcome? That depends, but for many the answer should be Yes.
The end of a level playing field and back to survival of the fittest
WebRTC brought us to an interesting world. It leveled the playing field for anyone to adopt and use real-time voice and video communication technologies with a relatively small investment. It got us as far as where we are today, but it might not take us any further.
For this to be sustainable, browser vendors need to further invest in the quality of their WebRTC implementations and make that investment open for general use. Here’s the problem:
Apple doesn’t really invest in anything of consequence in WebRTC.
- They seem to care more about having an HEVC implementation than about getting their audio to work properly in mobile Safari in WebRTC
- To date, they have taken the libwebrtc implementation from Google and ported it to work inside Safari, making token adjustments to their own media pipelines
- I am not aware of any specific improvements Apple made in Safari’s WebRTC implementation to quality via the media algorithms used by libwebrtc itself
Apple cares more about FaceTime than all of that WebRTC nonsense anyways…
Mozilla actually has a decent implementation.
- While Firefox uses libwebrtc as the baseline, they replaced components of it with their own
- This includes media capturer and renderer for audio and video
- They have invested a lot in improving the audio pipeline in Firefox, which affects quality in WebRTC
Microsoft’s latest Edge release is Chromium based.
- They aren’t doing much at the moment in the WebRTC part of it, as far as I am aware
- They could improve the media pipeline implementation of Chromium (and by extension Edge) for Windows 10
- Do they have an incentive?
- Would they contribute such a thing back to Google or keep it in their Edge implementation?
- Would Google take it if Microsoft gave it to them?
And then there’s Microsoft Teams, which offers a worse experience in the browser than it does in the native application. All of the investment in Teams is going towards improving quality and user experience in the app. The web is just an afterthought at the moment.
Google believes WebRTC is good enough.
- There are some optimizations and improvements that are now finding their way into WebRTC in the browser
- But a lot of what is done now is kept out of the web and the open source community. WaveNetEQ is but an example of things to come
- It is their right to do that, but does this further the goal of WebRTC as a whole and the community around it?
Now that we’re heading towards differentiation, the larger vendors are going to invest in gutting WebRTC and improving it while keeping that effort to themselves.
No more level playing field.
Prepare for the future of WebRTC
What I’ve outlined above is a potential future of WebRTC. One that is becoming more and more possible in my mind.
There’s a lot you can do today to take WebRTC and optimize around it. Making your application more scalable. Offering better media quality as well as user experience. Growing call sizes to hundreds of participants or more.
Investing in these areas is going to be important moving forward.
I’ve recently created a workshop covering the present and future of WebRTC, along with techniques and best practices employed by vendors in this space. If you want to learn more, you may want to take that workshop.