Are you blocked by the rules of your upbringing in your WebRTC application?

By Tsahi Levent-Levi

March 11, 2019  

I know I am. I am constantly surprised by what people are doing with WebRTC.

Here’s something I hear a lot:

How do you make a call with WebRTC?

Well… you don’t. Not really. And in many scenarios, that term – call, or dialing, or answering – has no real meaning.

Here’s a funny opposite for you:

Kids in front of old phones don’t know what to do. It isn’t “natural”. Guess what? Nothing is. The things that are natural to you are things you’ve learned, and are now used to. They are a set of rules in your upbringing.

If you come from a VoIP background, then WebRTC brings with it quite a challenge to your world. I know – I had 13 years of VoIP background before WebRTC was announced. Since that announcement, I’ve been surprised time and again by what people are doing with WebRTC. Especially people who, by the old rules, shouldn’t even be able to use it because they don’t know VoIP well enough.

Coming from VoIP? Interested in streaming? Broadcasting? Some other communication use cases? Tomorrow I am hosting a free webinar – Google Does Gaming: WebRTC Man-to-Machine Use Cases

When we all first started out on this adventure called WebRTC, what we saw was video calling. It was all about face-to-face meetings. It took time to start thinking about WebRTC in other settings and for other use cases.

And here we are. Years later, dealing with WebRTC in aid of cloud gaming. Google used WebRTC in Project Stream, where they showcased playing Assassin’s Creed Odyssey through a web browser – the game itself was rendered in Google’s cloud.


(that’s a screenshot of one of my slides for tomorrow’s webinar)

Who would have thought WebRTC would be used for that?

Anyways, if you come from a VoIP background, here are some aspects of WebRTC you’ll need to unlearn and relearn – I am still grappling with them myself every once in a while:

Signaling? What’s “Signaling”?

With any other VoIP protocol out there, it seems like we’re starting off with signaling.

H.323? Signaling.

SIP? That’s signaling.

XMPP? Ditto.

WebRTC? Nope. No signaling. Sorry.

What does that mean exactly? That you can use whatever signaling mechanism/protocol you see fit. That’s assuming you can get it to run inside a web browser or wherever it is your application needs to operate.

SIP, the most popular VoIP signaling protocol out there, is probably overkill for a lot of WebRTC services. I tend to look at it as a hindrance when I see it in architectures – I ask time and again why it’s there, to make sure there’s a real need beyond someone simply wanting signaling for their WebRTC application.
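
To make this concrete, here’s a minimal sketch of what “bring your own signaling” can look like – a plain WebSocket shuttling SDP and ICE candidates between two browsers. The wss:// URL and the message format are made up for illustration; WebRTC doesn’t care what you use, as long as both sides agree:

    // A minimal "bring your own signaling" sketch in TypeScript.
    // The wss:// URL and the message format are invented for illustration –
    // WebRTC only requires *some* agreed-upon channel for exchanging
    // SDP blobs and ICE candidates.
    const signaling = new WebSocket('wss://example.com/signaling');
    const pc = new RTCPeerConnection({
      iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
    });

    // Trickle our ICE candidates to the other side as they are gathered.
    pc.onicecandidate = ({ candidate }) => {
      if (candidate) signaling.send(JSON.stringify({ kind: 'candidate', candidate }));
    };

    // The initiating side calls sendOffer() once the socket is open.
    async function sendOffer(): Promise<void> {
      await pc.setLocalDescription(await pc.createOffer());
      signaling.send(JSON.stringify({ kind: 'offer', sdp: pc.localDescription }));
    }

    // Both sides react to whatever arrives on the channel.
    signaling.onmessage = async ({ data }) => {
      const msg = JSON.parse(data);
      if (msg.kind === 'offer') {
        await pc.setRemoteDescription(msg.sdp);
        await pc.setLocalDescription(await pc.createAnswer());
        signaling.send(JSON.stringify({ kind: 'answer', sdp: pc.localDescription }));
      } else if (msg.kind === 'answer') {
        await pc.setRemoteDescription(msg.sdp);
      } else if (msg.kind === 'candidate') {
        await pc.addIceCandidate(msg.candidate);
      }
    };

The same exchange could ride just as well on top of long polling, MQTT or anything else that delivers messages – that’s the point.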

You. Don’t. Answer. Calls.

And while we’re at it – there’s no such thing as a call, either.

I remember running a live WebRTC training a couple of years back. I had to hammer out of people their incessant questions about dial, answer, mute, hold and a bunch of other paradigms they thought were golden rules of communications.

If you feel that way too, then look at that video at the top of this article again. What made sense 20 years ago doesn’t hold water today.

WebRTC isn’t fixed to any specific concept of how “calls” are made. I prefer using the term session, and dealing with the initiation part of it on a case-by-case basis.

If there’s no need for dialing or answering – just don’t force it on your WebRTC solution.
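
To illustrate, a support page could simply start the session the moment it loads – no ringing, no answer button. A minimal sketch, where joinRoom(), announcePresence() and the room id are hypothetical placeholders rather than any real API:

    // Session-style initiation instead of dial/answer semantics: media
    // starts once both sides happen to be in the same "room".
    // announcePresence() and the room id are hypothetical placeholders
    // for whatever signaling mechanism the application uses.
    declare function announcePresence(roomId: string, pc: RTCPeerConnection): void;

    async function joinRoom(roomId: string): Promise<void> {
      const media = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
      const pc = new RTCPeerConnection();
      media.getTracks().forEach(track => pc.addTrack(track, media));
      announcePresence(roomId, pc); // whoever shows up second triggers the offer/answer
    }

    joinRoom('support-42').catch(console.error); // runs on page load – nobody dials, nobody answers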

It isn’t only Google

Most days of the week, I like thinking of WebRTC as the source code that resides on webrtc.org. That’s the codebase Google maintains and ships inside its Chrome browser.

The thing is, many end up modifying it for their own needs. They:

  • Port it over to mobile
  • Fix private bugs in it
  • Add their own minor modifications to it where needed
  • Seriously change it (check out what Discord did)
  • Modify the Chromium version, replace it inside Electron and release their own stuff

There are some really interesting “mods” to the vanilla WebRTC implementation out there, usually kept private for the internal use of companies. In many ways, this is a shortcut to building your own media engine from scratch.

There’s more than one way

What I like about WebRTC is that for a few core things, there’s only a single way of doing them: everything is encrypted – you can’t override that; it multiplexes and bundles its media connections by default; the list goes on.
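
To illustrate the opinionated side, here’s about the extent of the related knobs the browser exposes – you can tighten bundling, but there’s no switch anywhere to turn encryption off:

    // WebRTC's opinionated defaults: encryption is mandatory (there is no
    // configuration option to disable it), and RTP/RTCP multiplexing plus
    // media bundling are on by default. The settings below only tighten
    // what the browser already does.
    const pc = new RTCPeerConnection({
      bundlePolicy: 'max-bundle', // force all media onto a single transport
      rtcpMuxPolicy: 'require',   // multiplex RTP and RTCP on one port
    });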

How you use it is a totally different story.

Each SFU implementation is different from the others. There are different ways to record a session. Different ideas and approaches to broadcasting at low latency.

The “right” answer differs a lot not only based on the use case, but also on the business model, the developers available, the DNA of the company, etc.

Wasteful can be just fine

There’s also a school of thought that never really existed with VoIP: the “good enough” approach – one where we’re just fine with not optimizing everything, leaving things in a kind of mediocre state that is good enough for what we’re trying to do. It may eat up too much bandwidth or tax the CPU. Or just not be how things are done around here. But it works. Good enough.

Heck – the default WebRTC implementation does it on its own, deciding to spend 1.7Mbps on a VGA resolution encoding instead of limiting it to 800kbps or less. Such a waste of good resources.
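
If you do decide to optimize, the standard RTCRtpSender parameters API lets you cap the encoder yourself. A minimal sketch, assuming pc is an already-connected RTCPeerConnection sending video:

    // Cap outgoing video at ~800kbps instead of letting the encoder
    // spend 1.7Mbps on VGA. Uses the standard RTCRtpSender parameters API.
    async function capVideoBitrate(pc: RTCPeerConnection, maxBitrate = 800_000): Promise<void> {
      const sender = pc.getSenders().find(s => s.track?.kind === 'video');
      if (!sender) return;
      const params = sender.getParameters();
      if (!params.encodings.length) params.encodings = [{}]; // some browsers return an empty list here
      params.encodings[0].maxBitrate = maxBitrate; // in bits per second
      await sender.setParameters(params);
    }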

I learned to love this approach (and then try to optimize it with my clients).

How do you think about WebRTC?

What about you?

What mistakes do you see people make when thinking about WebRTC in ways that fit the web or VoIP better?

What things do you need to unlearn about WebRTC?

Coming from VoIP? Interested in streaming? Broadcasting? Some other communication use cases? Tomorrow I am hosting a free webinar – Google Does Gaming: WebRTC Man-to-Machine Use Cases



Comments

  1. Why are you talking about dial, answer, mute and hold, when the standards themselves talk about Early Media? 🙂

    But seriously, one of my disappointments is that not many WebRTC apps decouple a “session” from a “media flow”. H.323 codified it in its framework; SIP has it theoretically, but most implementations don’t exploit it, since their focus has been interworking with POTS.

    1. Aswath,

      Not sure what you mean by decoupling here. I’ve seen many services that don’t interwork with or need POTS. In these, the two paths of signaling and media are totally separate.

      Also, whenever a client of mine starts adopting a media server, one of the first suggestions I give is to have the media server communicate with the application for signaling rather than working directly with the devices – making sure that only media gets routed from the devices to the media server.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}