Google’s private WebRTC roadmap for 2020 = AI

06/01/2020

Google’s plans for WebRTC have either changed or have finally been revealed. Where? In its internal WebRTC roadmap.

WebRTC is many things.

On one hand, it is a standard specification at the W3C (and is reaching its 1.0 milestone).

On the other hand, it is an open source project. While there are a few such projects today, the most important one is Google’s webrtc.org. This is the code that gets into Chrome itself and the one being adopted by many (simply because it is already highly optimized for the main scenarios. And… it is free).

Google made it super simple for companies to adopt its WebRTC implementation – it uses a BSD open source license, making it quite permissive.

In the last 8 years, we’ve been treated like royalty, having access to a world-class media engine implementation for free.

The WebRTC roadmap we’ve seen so far from Google had 3 types of features in it:

  1. Making sure the implementation fits the spec
  2. Improving the architecture to perform better
  3. Adding features specific to Google’s needs in other projects (not necessarily abiding by the spec)

At all times, these were available to everyone.

Google’s intent in open sourcing WebRTC

When WebRTC was first introduced it was about who has the balls to take something that up until that point was considered a core competency and make it freely available. This was a piece of technology that video conferencing companies protected fiercely, battling over it in their sales and marketing pitches, each claiming to have superior media quality. At the time, media quality wasn’t in the “good enough” position that it is today.

Google took a calculated risk at the time:

  • Media quality was improving. So were available bandwidth and compute. It made sense that it would get to a point of “good enough” within a few years’ time
  • A migration to the cloud for video conferencing wasn’t on most companies’ agenda yet, but as cloud migration started picking up everywhere, it made sense for it to occur here as well. These cloud migrations took place hand in hand with the use of browsers
  • Dominance in browsers and lack of a real operating system footprint meant needing to have a media engine as part of the browser
  • Google had no leading service in video conferencing. Google Hangouts was available, but wasn’t any real competition to the leading platforms at the time, so they didn’t have much to lose by the decision

Other vendors just followed along for the ride, making minor contributions here and there. Today, the leading (and only) media engine out there for WebRTC is still the Google one. At least in any meaningful way. So much so that Google’s “competitors” are using Google’s WebRTC stack directly in their products.

Where has this led Google?

WebRTC is a huge success. All modern browsers now support it. They interoperate (to a good extent). Today, in every industry and market where live or real time media is needed, WebRTC is playing an important role.

But what about Google and WebRTC? What success did Google extract from WebRTC?

Not a lot. Or at least not enough.

Google uses WebRTC in the following services it offers:

  • Hangouts / Google Meet
  • Duo
  • Stadia
  • Chrome Remote Desktop
  • YouTube Live

Let’s see how well Google fared in each.

Hangouts / Google Meet

I use these two services almost on a daily basis. My calendar meetings default to them simply because they are so easy to schedule with Google Calendar. They offer what I need without any of the complexity.

But.

When you read or hear discussions about the video conferencing market, the vendors mentioned are usually Zoom and Cisco. Maybe Microsoft Teams or Skype for Business. Also Bluejeans and Pexip. A few others. Google isn’t one of the top vendors that come to mind here. Even though their service is rather good.

Did I mention that almost all their competitors are using WebRTC as well?

Duo

Duo. Google’s answer to Apple’s FaceTime.

It is a standalone video calling app available on Android and iOS. It isn’t installed by default on most smartphones and users need to actively find it, install it and make a decision on using it. Not an easy feat.

Why hasn’t Google nailed and bolted it smack into Android? Probably due to carriers and not wanting to hurt their feelings (and Google’s relationship with them). Otherwise, it makes no sense for Google to try and compete with the likes of FaceTime with one hand tied behind their backs.

Anyways… Duo is quite popular. Even on iPhone. It is ranked #7 in the social apps in the Apple App Store. This is higher than Houseparty (positioned somewhere at #17-20), which is rather interesting considering the high engagement Houseparty sees for its users.

Google doesn’t share any stats on usage of Duo. The only things we know are downloads and the number of people who rated it, two data points that are useless for social networks. This is quite telling about the real usage numbers: not publishing them suggests they aren’t on par with the competition.

Curious myself, I’ve put out a quick poll on Twitter:

This is most definitely NOT the way to know or understand usage, but it is interesting.

My audience is probably tech savvy. Those answering the poll are highly likely to know about WebRTC. And still. We have over 50% who never tried it and 13% who use it. I’d consider 13% quite a lot and surprising. But it isn’t scratching the surface of where it should be given that Google owns and controls Android.

Stadia

Google Stadia is something totally different. It is cloud gaming. The game is being processed and rendered in “the cloud” and gets streamed in real time to your device using WebRTC. Google even made modifications to its WebRTC implementation to make it a better fit for gaming.

The concept is great. The technology is solid. The experience is said to be good (if you’re close enough to the data center and have a good network connection).

From the media, it seems like there are hurdles and challenges to the Stadia launch. Articles like this one titled “Stadia’s biggest problem? Google” or this one titled “Google Duo is the best video calling service you’re not using” are rather common. Especially when put in comparison to the Apple Arcade launch.

Looking at Google Play store numbers for the Stadia app, things look rather disappointing: below 1M installs so far.

I have this feeling Google expected more.

Cloud gaming is still new and nascent. It will take time to happen and mature.

Take an adjacent industry: Netflix introduced streaming in 2007. It took 3-4 years for the stock to take notice and for the service to mature enough to make a dent in the industry. Today, every other production studio is launching its own streaming service.

Will Google have the patience with Stadia to get there or will it end up shutting it down like many other “experiments” it has been running throughout the years? The thought itself is making it hard for Google to entice game developers to jump on its platform.

Chrome Remote Desktop

Google apparently has a remote desktop service. It makes use of WebRTC’s screen sharing capability and is called Chrome Remote Desktop.

While I haven’t used it myself, this does seem to have quite a following: 10M+ installs on Android, and the Chrome extension shows ~4.8M users.

There is no apparent business model as the service is offered freely, and while the market has similar paid services, it doesn’t seem to be big enough to attract a company like Google. This isn’t interesting enough to justify an investment in WebRTC itself by Google.

YouTube Live

YouTube has the ability to host live events. And it does that with the help of WebRTC.

That said, its use of WebRTC isn’t an impressive one. It is just a window into the service if you want to broadcast from your browser. It isn’t used for live streaming to the viewers themselves. There’s more on the technical side of it on webrtcHacks, where they analyze what goes on the wire with YouTube Live.

Here’s the thing – just like Chrome Remote Desktop, this is Google exploiting a technology that is there. It isn’t about leading the industry or the market with it. And as with Chrome Remote Desktop, it isn’t of enough value to make it worth their while to invest in making WebRTC itself better.

WebRTC is now part of HTML5 and part of what browsers need to do, so Google needs to invest in having it in Chrome. How much to invest is the real challenge.

To WebRTC or not to WebRTC?

Meet, Duo and Stadia seem to be the leading factors in whatever Google is doing in WebRTC, other than dealing with complaints and feedback from the community.

Google Meet

Google Meet is using VP9. It is one of the few group calling services running in production at scale that has made that shift.

By harnessing WebRTC and owning its roadmap, Google is able to experiment and build their service faster than others can on WebRTC.

Two interesting examples we’ve had in the past year –

1. At Kranky Geek 2018, Google showed an experiment of using WebAssembly with WebRTC to improve video switching in a conference by distinguishing noise and speech.

Did it find its way into Google Meet? Maybe.

2. Then there’s the new captioning feature in Google Meet, which Gustavo nicely explains. It uses the data channel in WebRTC to send back the results. If anything in WebRTC needed to change to make this work better, Google could do that as it owns the WebRTC roadmap.
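To make the idea of captions travelling over a data channel more concrete, here is a minimal sketch of what such messages could look like. The `CaptionUpdate` schema, its field names and the helper functions are all hypothetical and made up for illustration. This is not the format Google Meet uses. The sequence number is there because a data channel can be configured as unordered or unreliable, so a receiver may see updates out of order.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class CaptionUpdate:
    """Hypothetical caption message; not Google Meet's actual format."""
    speaker_id: str
    sequence: int   # increases with every update from the same speaker
    text: str
    is_final: bool  # False while the recognizer may still revise the text


def encode_caption(update: CaptionUpdate) -> str:
    """Serialize an update to the JSON string sent over the data channel."""
    return json.dumps(asdict(update), separators=(",", ":"))


def decode_caption(payload: str) -> CaptionUpdate:
    """Parse a JSON string received from the data channel."""
    return CaptionUpdate(**json.loads(payload))


def apply_update(latest: Dict[str, CaptionUpdate], update: CaptionUpdate) -> None:
    """Keep the newest caption per speaker, dropping stale out-of-order updates."""
    previous = latest.get(update.speaker_id)
    if previous is None or update.sequence > previous.sequence:
        latest[update.speaker_id] = update
```

On the receiving side, each incoming message would be passed through `decode_caption` and then `apply_update`, and the UI would render whatever is stored per speaker.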

Google Meet, being predominantly a browser based experience, will need to rely on changes made directly into WebRTC or things that can be bolted on top using WebAssembly.

Google Duo

Google Duo is a mobile first service. It has browser support via Duo for Web, but for the most part, it is meant to be used on your smartphone.

Last month, Google announced some new features in Pixel phones, but also 3 machine learning based improvements for Duo:

Auto-framing:

“Auto-framing keeps your face centered during your Duo video calls, even as you move around, thanks to Pixel 4’s wide-angle lens. And if another person joins you in the shot, the camera automatically adjusts to keep both of you in the frame.”

We’ve seen Facebook do that in Portal and a few video conferencing vendors adding that to their room systems.

Packet loss concealment:

“When a bad connection leads to spotty audio, a machine learning model on your Pixel 4 predicts the likely next sound and helps you to keep the conversation going with minimum disruptions.”

Packet loss concealment using machine learning is something not many are doing (or publishing that they are doing).
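For readers unfamiliar with packet loss concealment, the sketch below shows a classical, non-ML baseline: when an audio frame is missing, replay the last good frame at a decaying volume. It is only meant to show what "filling the gap" means. It is not what Duo does, and the function name and the `decay` parameter are assumptions picked for this example. A machine learning approach replaces the "repeat and fade" step with a model that predicts the missing audio.

```python
from typing import List, Optional, Sequence


def conceal_losses(
    frames: Sequence[Optional[List[float]]],
    frame_size: int,
    decay: float = 0.5,
) -> List[List[float]]:
    """Fill lost frames (None) by repeating the last good frame, fading out.

    Classical baseline only; an ML-based concealer would predict the missing
    samples instead of repeating old ones.
    """
    output: List[List[float]] = []
    last_good = [0.0] * frame_size  # silence until the first frame arrives
    gain = 1.0
    for frame in frames:
        if frame is not None:
            # Frame arrived: play it as-is and reset the fade.
            last_good = list(frame)
            gain = 1.0
            output.append(list(frame))
        else:
            # Frame lost: replay the last good frame, quieter each time.
            gain *= decay
            output.append([sample * gain for sample in last_good])
    return output
```

With `decay=0.5`, two consecutive lost frames are played at 50% and then 25% of the last received frame, so a long burst of loss fades towards silence instead of looping the same sound.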

Background blur:

“you can now apply a portrait filter as well. You’ll look sharper against the gentle blur of your background, while the busy office or messy bedroom behind you goes out of focus.”

Another nice feature, which is available in other services such as Zoom.

From the looks of it, auto-framing and background blur rely on hardware based capabilities of the Pixel devices. Packet loss concealment… a lot less so.

Could we see machine learning based packet loss concealment find its way into the WebRTC codebase? (where it makes the most sense to add it instead of as an external piece of software). Not soon…

Google Stadia

For Google Stadia, Google went with QUIC instead of SCTP for the controls. It decided to make use of WebRTC for live streaming itself.

But it wasn’t enough. It needed the low latency of WebRTC to be even lower. So it added a Chrome experiment to enable them to reduce the playout delay in WebRTC. A few of my clients have already adopted it and are happy with the results for their own use case.
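Why is playout delay a knob worth exposing? The toy model below illustrates the trade-off under simplifying assumptions: every frame is scheduled to play a fixed delay after the fastest observed network transit time, and a frame is "late" if its extra network delay exceeds that budget. This is not how the WebRTC jitter buffer works internally, and `late_frame_ratio` is a hypothetical helper written for this post.

```python
from typing import Sequence


def late_frame_ratio(network_delays_ms: Sequence[float], playout_delay_ms: float) -> float:
    """Fraction of frames that miss their playout deadline.

    Toy model: the deadline for each frame is the smallest observed network
    delay plus the playout delay budget.
    """
    if not network_delays_ms:
        return 0.0
    baseline = min(network_delays_ms)
    late = sum(
        1 for delay in network_delays_ms if delay - baseline > playout_delay_ms
    )
    return late / len(network_delays_ms)


# Example one-way delays (milliseconds) with two jitter spikes.
delays = [20, 22, 35, 21, 60, 20, 24, 20]
for budget in (5, 20, 50):
    print(budget, late_frame_ratio(delays, budget))
```

A smaller budget means lower latency but more frames arriving too late to be shown. That is acceptable for cloud gaming on a good connection, where responsiveness matters more than smoothness, and less so for a video call on a noisy network.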

Google also tweaked and improved the VP9 decoder to make it work with 4K 60fps streams.

In the case of Stadia, the changes need to be made inside the WebRTC codebase to apply well for its service anywhere.

What is changing with Google’s strategy about WebRTC in 2020?

WebRTC 1.0 is “out”. Almost.

The latest CR (Candidate Recommendation) is dated December 13. Hopefully the last one before we go to the next step. It is interesting to look at the original charter of WebRTC:

It took somewhat longer to get here than originally expected, but we’re almost there.

Google held its internal milestone of WebRTC 1.0 code complete two months back.

What now?

Besides housekeeping, bug fixes, and talking about WebRTC NV (the next version), I think much of the focus inside Google will shift to how they can make more of their investment in WebRTC and stay or become more competitive in the market. This being an open source project means that some features will need to be kept out of the open source codebase. Like the new packet loss concealment mechanism in Google Duo.

How is that achievable?

The leading factor is going to be adding more flexibility and control to developers over what WebRTC is and how it operates. Ideally by using WebAssembly and in the future by using WebTransport and WebCodecs, two new initiatives that will unbundle a lot of what WebRTC is.

This gives Google the ability to take improvements out of the baseline implementation and introduce them as proprietary features.

The demarcation line of what will go into the WebRTC codebase by Google and what will be kept out of it is going to be the use of machine learning and artificial intelligence. Whenever a feature makes use of learned machine learning models, Google will most probably try to keep that implementation out of WebRTC. Why? Because it has the greatest value and the highest investment today.

Should this worry you?

Maybe, but it is to be expected.

Google has invested heavily in WebRTC. Without this investment nothing that we see and use today in WebRTC and take for granted would have been possible.

It is even surprising that it lasted this long…

WebRTC closes the basic gaps and requirements of media engines. It is good enough. If you want to improve upon it, differentiate or be at the cutting edge of the WebRTC technology, you will need to invest in it yourself as well. Relying only on Google isn’t an option. And probably never really was.

Here’s to an interesting and eventful 2020 with WebRTC!

To learn more about the future of WebRTC, I suggest you go ahead and also read about WebRTC unbundling.

Responses

Lawrence Byrd says:
February 28, 2020

Tsahi,

Your analysis of “value of WebRTC to Google” is based solely on their apps, where I agree results have been iffy. However, a significant additional goal (stated at the time) and benefit has been to drive Chrome to be the #1 used browser by making it a complete application delivery platform. And this then supports the whole Chromebook market as well – which could not support many widely used use cases like contact centers and video sessions (with anyone’s conferencing apps) without WebRTC. Of course WebRTC was just a piece of HTML5 and other capabilities needed for Chrome/Chromebook success, but an important one.

My belief is that about $1 billion has been spent on the “free” WebRTC core technology we all use every day, accounting for Google’s original acquisitions, plus follow-on work by Google and others. What’s your estimate?

This all means that we in this part of the communications industry owe Google a big Thank You!

    Tsahi Levent-Levi says:
    February 28, 2020

    I think there’s a different purpose/focus to the work in Google done in 2020 than in 2012 with regards to WebRTC.

    I have no clue how much they’ve invested in total, but it would be a substantial amount.

Brian Burt says:
August 11, 2020

There are other good reasons as well for such an investment. For example: to take away profitability from competitors (see: invest in gdocs; take $ from microsoft, which ms would use to compete with google on other fronts).

Lawrence, share your back of the envelope for the billion? It’s an interesting question. Any idea/guesses how many developer years have been put into it?
