The Makeup of a WebRTC API Platform


WebRTC API Platforms are different from the classic/legacy/common CPaaS.

As I work through the final TBDs in the upcoming update to my Choosing a WebRTC API Platform report, I wanted to share something that may seem obvious, but probably isn’t.

When talking about CPaaS, WebRTC brings with it something more than just accessibility from the browser.

Here’s the makeup of a CPaaS platform:

There’s backend telephony in there, built out of some VoIP server components, connected to the carriers to handle things like phone numbers and actual calling.

Developers connect to that backend via REST APIs, or some other form of scripting interface.

Latencies and wait times aren’t important for the most part, so the CPaaS vendor doesn’t need to be spread across the globe to provide the service. A couple of data centers for redundancy and some reduction in latencies is usually enough.

Here’s what a WebRTC API platform looks like:

There might or might not be REST APIs. They are important, but definitely aren’t the main way developers interact with the system. That’s done via the SDKs. The SDKs are wrappers around the REST APIs or some other interface (probably WebSocket based), and they also get hold of the actual media and process it as part of the SDK, either in the browser or on a mobile device.

And then there’s the backend. Signaling and NAT traversal are mandatory. Without them, this won’t be a WebRTC API platform. In the majority of cases, you’ll also have access to an SFU, allowing you to support group video calls. All that backend? Especially the media parts of NAT traversal and SFU? They have to be as close to the end user as possible, so these platforms often deploy globally, in as many data centers of a cloud provider (think AWS or GCE) as they can, sometimes running on multiple cloud providers to increase their reach.
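To make the "as close to the end user as possible" point concrete, here is a minimal Python sketch of how a client SDK might choose a media region: probe each candidate and connect to the one with the lowest round-trip time. The region names and RTT figures are made up for illustration, and this is only one possible approach. A real platform may just as well rely on DNS-based or anycast routing.

```python
from statistics import median


def pick_region(rtt_samples_ms: dict[str, list[float]]) -> str:
    """Return the region whose median measured round-trip time is lowest."""
    if not rtt_samples_ms:
        raise ValueError("no regions were probed")
    # The median is less sensitive than the mean to a single slow probe.
    return min(rtt_samples_ms, key=lambda region: median(rtt_samples_ms[region]))


# Hypothetical probe results (milliseconds) for a user somewhere in Europe.
probes = {
    "us-east": [95.0, 102.0, 98.0],
    "eu-west": [18.0, 250.0, 21.0],  # one outlier probe
    "ap-south": [160.0, 158.0, 171.0],
}
print(pick_region(probes))  # eu-west
```

The fewer regions a vendor runs, the worse the best available option in this table gets for users far from them, which is why the backend ends up spread so widely.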

The difference then?

  1. An SDK that handles the actual media processing, with less focus on REST APIs
  2. A globally spread backend, to reduce latencies

The Build vs Buy Challenge of WebRTC API Platforms

There’s a challenge in selling to developers. They tend to underestimate the effort involved. And they usually prefer building shiny new toys to polishing and maintaining something that’s working. This is made worse by the seemingly “easy” fashion in which you can get a WebRTC peer-to-peer call going inside a browser between two tabs. It gives the impression that developing and running WebRTC at scale is trivial.

Especially when you compare it to connecting to a phone number and dialing it. Doing this via an API is easy. But how do you go about dialing out a number on your own without the assistance of CPaaS? Is there a really simple example of this? Not really. This requires more than just programming. The value here is access to the phone network, which is considered a royal ongoing headache. So it is easy to outsource and easy to understand its value.

Here’s how the thinking goes:

SDKs? Sure. We can write them.

Signaling? I found a project on GitHub that looks popular enough.

NAT Traversal? Everyone’s already using coturn. Should be simple enough to get it up and running.

SFU? Just passing data around. Can be written in a weekend.
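The "just passing data around" view is not wrong about the core loop, which is part of why it is so tempting. Here is a rough Python sketch of that loop, with hypothetical names and no real networking. It also shows how quickly the outgoing load grows with the size of the room:

```python
from collections import defaultdict


class NaiveSfuRoom:
    """Toy model of SFU forwarding: every packet goes to everyone but its sender."""

    def __init__(self) -> None:
        self.participants: set[str] = set()
        # Packets queued for delivery, keyed by the receiving participant.
        self.outbox: dict[str, list[tuple[str, bytes]]] = defaultdict(list)

    def join(self, participant: str) -> None:
        self.participants.add(participant)

    def on_packet(self, sender: str, payload: bytes) -> int:
        """Forward one media packet and return how many copies were sent."""
        copies = 0
        for receiver in self.participants:
            if receiver != sender:
                self.outbox[receiver].append((sender, payload))
                copies += 1
        return copies


def outgoing_streams(n: int) -> int:
    """Each of n senders is forwarded to the other n - 1 participants."""
    return n * (n - 1)


room = NaiveSfuRoom()
for name in ("alice", "bob", "carol", "dave"):
    room.join(name)

print(room.on_packet("alice", b"frame"))  # 3
print([outgoing_streams(n) for n in (2, 4, 10)])  # [2, 12, 90]
```

That really is a weekend project. What it leaves out is what a production SFU typically has to deal with: bandwidth estimation, choosing which video quality to forward to each receiver, packet loss, encryption, and keeping all of it running across many machines.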

Will WebRTC API Platform vendors be able to overcome this challenge? How can this be explained to developers? There is a lot that goes into building such a platform. More than the mere initial technical hurdles.

Browsers are changing. There are now 4 of them that have “support” for WebRTC. That support is different between browsers. New browser versions break things that used to work before. The specification is being finalized now, but no browser supports it yet.

Media backends need to be maintained. Monitored. Updated. Secured. On an ongoing basis.

In the coming years we will see a shift from H.264 and VP8 video codecs to VP9, HEVC and/or AV1 video codecs. This will require additional investment in the infrastructure.

And still it is believed to be easy and simple.

It isn’t.

Planning on Launching Your Own WebRTC API Platform?

If you are planning to launch your own WebRTC API Platform, then you should know what you’re up against.

In the past 4 years I’ve been looking at this market, analyzing it. Seeing it grow and mature. The report covers 20+ vendors offering WebRTC API Platforms. Most of them are active. A few died or got acquired and taken off the market.

One of the things to note is how new WebRTC API Platform vendors make their decision to launch their service. What do they decide to include in their initial launch? What do they use as differentiating factors from the existing players?

The space is rather crowded already, even if no clear winner exists yet.

Make sure to do your homework here. Understand what you’re up against and why developers should come to you and not to others. And plan for the long run.

Planning to Use a WebRTC API Platform?

If you are at the build vs buy decision point, then think of the alternative costs of each approach. Also figure out the time to market of each and the risk of failure. For new projects, I tend to suggest a platform instead of self development. It reduces risk and upfront costs, but more than that, it enables experimenting and proving the business before committing too much to the project.

If you decided to build on your own, make sure your reasoning is rock solid. If the only reason is cost, then I suggest you recalculate.
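For that recalculation, a back-of-the-envelope sketch in Python can help frame the question. Every number below is a placeholder I made up for illustration: the per-minute price, the team size, the cost per engineer and the infrastructure bill. Plug in your own.

```python
def monthly_platform_cost(minutes: float, price_per_minute: float) -> float:
    """Usage-based bill when buying from a platform vendor."""
    return minutes * price_per_minute


def monthly_build_cost(engineers: float, cost_per_engineer: float, infra: float) -> float:
    """Ongoing cost of running your own stack: people plus infrastructure."""
    return engineers * cost_per_engineer + infra


def break_even_minutes(build_cost: float, price_per_minute: float) -> float:
    """Monthly minutes above which building becomes cheaper than buying."""
    return build_cost / price_per_minute


# Placeholder inputs, not market data.
PRICE_PER_MINUTE = 0.004
build = monthly_build_cost(engineers=3, cost_per_engineer=10_000, infra=5_000)

print(build)  # 35000
print(round(break_even_minutes(build, PRICE_PER_MINUTE)))  # 8750000
```

This is deliberately simplistic. It treats infrastructure as a flat fee when it actually grows with usage, and it ignores the time it takes to get to a working service. Both of those push the break-even point further out.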

If you decided to buy into a platform instead, then pick a platform that fits your needs. But make sure, as much as you can, that it is here to stay. This market is dynamic and is bound to stay that way for a few more years.

The Report Update

The updated report will get published later this week.

If you want to learn more about it, just contact me.


Adiel says:
December 4, 2017

Thanks Tsahi, as always very informative. I think it also depends on the type of service provided. The prices of WebRTC vendors are comparable to the price of wholesale VoIP termination ($0.004-0.005 per min). It probably makes perfect sense for an enterprise solution but is prohibitively costly in many other cases.

    Tsahi Levent-Levi says:
    December 4, 2017

    Thanks Adiel.

    WebRTC API platforms will never fit everyone. I’d look at the opposite as well – it is hard today to maintain profitability as such a vendor without relying on SMS/phone revenue. So there must be some validity to these price points.

Lawrence Byrd says:
December 5, 2017

“The SDKs are wrappers around the REST APIs or some other interface allowing getting the actual media and processing it as part of the SDK.” You almost say it here, Tsahi, but to re-emphasize: for mobile SDKs the SDKs ARE the media processing engine with all the requirements for video, audio, encode, decode, echo cancellation, noise elimination, fall-back strategies, etc (leveraging WebRTC core tech). For browsers, of course, this is embedded inside the browser, but still has to be controlled and tracked across version changes. But in all cases WebRTC is utilizing a much more distributed media processing architecture, versus PSTN-control, with broad impact on how this “massively distributed processing edge” of the network is managed and its latency and failover across appropriate global points-of-presence, SFUs, signaling servers, etc. As you say, a very different architecture.

As an aside, some of the funnest conversations with startups looking at API platforms are with real-time-experienced founders and CTOs who actually come from the Googles and Facebooks etc. who say “never again do I want to do all that as part of my startup’s core focus!!” 🙂

    Tsahi Levent-Levi says:
    December 5, 2017

    Lawrence – so true – thanks for that addition!

    As for that core focus – I did that in my past at RADVISION and… never again do I want to…

hfan says:
December 11, 2017

Commercial and multi-party use is a key driver for WebRTC APIs. P2P apps are very straightforward; most users would build them on their own.

Be careful, a lot of P2P scenarios actually involve 3 parties once recording is introduced.