Vonage acquires TokBox. Where do we go from here?

August 6, 2018

Video, in the hands of the right company, can be a powerful thing.

In 2012, Telefonica acquired TokBox. I wrote about it at the time – almost 6 years ago. It feels sad to reread that piece about the TokBox acquisition. I suggested three areas where Telefonica could make a difference with TokBox. Let’s see what happened.

What Could Telefonica do with TokBox?

What I said in 2012:

Will Telefonica wait the same amount of time it did with Jajah until it does something with this acquisition? I hope they will move faster this time…

Telefonica did nothing with TokBox. They haven’t integrated them into anything. They decided to leave TokBox independent.

This has helped TokBox grow over these 6 years into one of the dominant players in video APIs for real time communications. Almost every developer and initiative I talk to that has decided to go for a 3rd party platform has chosen TokBox. I see others as well, but not as frequently.

Since the acquisition, TokBox:

  • Switched to WebRTC fully, killing its Flash based solution
  • Increased its session sizes to fit thousands of parallel streams per session
  • Added recording and broadcasting
  • Created their Inspector tool, one of the best I’ve seen on the market for debugging sessions after the fact
  • Cleaned, beefed up and curated their documentation. Again – one of the best I’ve seen on the market for communication APIs
  • Gained customers as well: over 2,300, per the press release

Telefonica failed to make use of TokBox. It didn’t go into video with it. It didn’t try to figure out VoIP. It didn’t try to understand why developers chose TokBox. Telefonica did nothing other than let TokBox continue on its trajectory. That is probably why Telefonica lost interest and decided to sell TokBox to Vonage.

Telefonica plans on folding TokBox into BlueVia, but how will they combine TokBox, if at all, with their Tu Me VoIP OTT service?

  • Didn’t happen
  • BlueVia died somewhere between 2013-2014
  • Along with Jajah, Tu Me and Tu whatever that Telefonica built
  • VoIP is not a thing for carriers
  • appear.in was sold by Telenor to Videonor
  • AT&T started and stopped its WebRTC APIs initiative
  • What will happen with Deutsche Telekom’s immmr?

Telefonica made no use of its strengths to find synergies with TokBox. Would doing so have killed TokBox altogether, or could it have made them stronger?

What will Telefonica do about voice? Their main API set doesn’t seem to include voice calling, but now it has video… will they be going for Twilio or Voxeo for that one? Or will they roll out their own? Will they skip voice altogether?

TokBox doubled down on video, beefing up their capabilities in that domain. It has a SIP connector, but nothing more than that. It is a missed opportunity.

Where is TokBox today?

TokBox is video communication APIs. There are other vendors out there doing that today: Twilio, Vidyo.io, Agora, Sinch, Voximplant, Temasys and probably a few others I forgot to mention (sorry for missing out on you).

TokBox is the market leader here when it comes to breadth of features in the video space.

It just wasn’t enough to get them to more customers and garner more than $35 million in the acquisition. I’d attribute this to:

  1. They weren’t operating as a startup. Being part of Telefonica meant stability, which probably took away their focus on revenue and growth in the way you see in other CPaaS vendors. The end result of such a thing is expenses that were too high when aligned to revenue or to the potential to raise money in the VC world. Vonage will need to handle this, and a change in direction and DNA is never an easy one
  2. Telefonica probably wanted out. They weren’t interested in continuing with this, so any amount above $0 was a good number for them

Does this say anything about the market of video APIs? The viability of it to other vendors? The importance of video in the bigger picture?

I don’t really know.

Where are we with Video CPaaS?

Video CPaaS, which in a way can be extended to WebRTC CPaaS vendors (those who don’t dabble too much with PSTN voice and/or SMS), is a finicky market. The vendors that get acquired in this space are either gobbled up, never to be seen again (think AddLive or Requestec), or they just don’t grow fast enough or become as big as their PSTN voice/SMS counterparts.

And yet.

IDC maintains that the U.S. programmable video market will be a $7.4 billion opportunity by 2022, representing more than a 140% four-year CAGR. Assuming only 10% of that becomes a reality, the question becomes who will be the winners in programmable video?
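To put those numbers in perspective, here is a rough back-of-the-envelope sketch of what they imply. It assumes the "four-year CAGR" compounds from a 2018 base to 2022 and treats "more than 140%" as exactly 140%; both are my assumptions, not figures from IDC. Under them, the implied starting market is roughly $223 million, and the "only 10%" scenario ($740 million in 2022) still works out to about 35% growth a year.

```python
# Back-of-the-envelope check of the IDC figures quoted above.
# Assumptions (not from IDC): growth compounds over 4 years from a 2018 base,
# and "more than 140%" is taken as exactly 140%.

TARGET_2022 = 7.4e9  # projected U.S. programmable video market, in USD
CAGR = 1.40          # 140% compound annual growth rate
YEARS = 4


def implied_base(target: float, cagr: float, years: int) -> float:
    """Starting market size that grows to `target` at `cagr` over `years`."""
    return target / (1 + cagr) ** years


def implied_cagr(base: float, target: float, years: int) -> float:
    """Compound annual growth rate needed to get from `base` to `target`."""
    return (target / base) ** (1 / years) - 1


base_2018 = implied_base(TARGET_2022, CAGR, YEARS)
pessimistic_2022 = 0.10 * TARGET_2022  # "only 10% of that becomes a reality"
pessimistic_cagr = implied_cagr(base_2018, pessimistic_2022, YEARS)

print(f"Implied 2018 base:      ${base_2018 / 1e6:,.0f}M")
print(f"10% scenario in 2022:   ${pessimistic_2022 / 1e6:,.0f}M")
print(f"CAGR in 10% scenario:   {pessimistic_cagr:.1%}")
```

Even the pessimistic case describes a market that more than triples in four years, which is why the question of who wins it matters.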

What types of services do they need to offer? What products? Are these lower level APIs, or higher level abstractions? Maybe we’re looking at almost complete solutions with a nice API lipstick on top that get calculated in that $7.4 billion.

Video is here to stay.

It won’t be replacing every voice call. But it definitely has its place.

Otherwise, why did Apple go for group video calls in FaceTime with 32 participants in their latest iOS?

And why did WhatsApp just add group video calls? And why did Instagram add group video calls?

Are they doing it just for fun? Is the market bound to be focused only on larger social networks?

I can’t believe that will be the case.

I came from a video conferencing company. Every year I was promised by management that this year would be the year of video. It never happened.

For the last 5 years, I have been using video so much that the year of video has already passed.

I guess the next question is what year will be the year of video CPaaS?

The difference between these two questions is that the year of video is the year when video became a widespread service. The year of video CPaaS will be the year when video becomes a widespread feature. We’re not there yet, but we’re heading in that direction.

In many ways, TokBox is one of the vendors figuring out how to get there.

Where are we with CPaaS?

CPaaS seems to be different, but only slightly.

Growth in this space, as far as I understand, comes from SMS and PSTN voice. That’s it.

VoIP? WebRTC? IP messaging? Social omnichannel aggregation? Video? All nice-to-have features for now that don’t affect the bottom line enough. And at the moment, they don’t seem to be big enough to fill the gap when SMS and PSTN voice fall out of favor.

To be a successful CPaaS vendor today, you need to:

  1. Look into the future and execute the future
  2. Rely on SMS and PSTN revenue – AND improve your services in that domain
  3. Cultivate multiple IP based solutions and services, preparing to reap rewards once that market grows exponentially

The thing about that third point is that it won’t be as simple to achieve as what CPaaS did with SMS and PSTN. In SMS and PSTN, CPaaS needed to act as an aggregator of carriers with a simple API. No one wants to deal with carriers (which is why they fail with these API initiatives when it comes to WebRTC and video services), so friendly CPaaS vendors are a great alternative.

What is the moat/barrier that CPaaS vendors are building in the IP world? Answering this question holds the key to the future of CPaaS.

What will Vonage do with TokBox?

Not have it as a standalone business.

Doing that would mean perpetuating what happened at Telefonica. While not all of it was bad, it didn’t bring the expected growth with it.

Vonage is uniquely positioned here – more than any other vendor in the market, which is probably why it ended up acquiring TokBox.

The opportunity space:

  • VBC at Vonage deals with UCaaS
  • Nexmo and TokBox are all about CPaaS


  • TokBox will probably be merged with Nexmo, bringing a single offering to developers
  • Nexmo has voice, SMS, IP messaging and omnichannel aggregation, with video just launched. TokBox has video
  • Together, that closes the gap in communication services for developers, bringing Vonage on par with its biggest CPaaS competitor – Twilio
  • This means the threat of customers leaving TokBox to Twilio because they want to deal with a single vendor and need other telephony services is now lessened
  • It also means that the threat of customers leaving Nexmo to Twilio because Nexmo lacks a good video service is now lessened as well
  • If you are a TokBox customer that also uses Twilio, it might make sense for you to switch to Nexmo. I am sure Nexmo will be going through the roster of TokBox customers to see if there are Twilio customers among them that it can convert
  • TokBox had time to flesh out their service in a unique way – the time Telefonica gave them was put to good use when it comes to infrastructure and developer related capabilities (look at Inspector and their documentation). Next, Vonage can decide to cherry pick the best pieces of Nexmo and TokBox and combine them to give a better user experience across the board for the developers using their CPaaS platform


  • On the UCaaS front, Vonage is using Amazon Chime today. The challenge with Chime is that it is a completely standalone product – something that is harder to embed and integrate into an existing experience. Vonage isn’t alone here – RingCentral is relying on Zoom. Such integrations are nice, but they can’t go deep
  • TokBox brings APIs that are far superior and more flexible than what Zoom, Chime or any other video conferencing player can bring with its integration APIs. Using these to bake video right into its UCaaS VBC app makes sense, and puts Vonage in a better position than its UCaaS competitors
  • Especially if video is the next frontier

What does this mean to TokBox competitors?

Telefonica was never a serious competitor in video CPaaS.

Nexmo, and by extension Vonage, is.

Nexmo is probably second only to Twilio.

TokBox is probably first in video CPaaS.

They combine nicely and offer Nexmo a capability that its competitors don’t have if you look at the breadth of their video offering.

If Vonage executes this well, the end result will be a better CPaaS offering, a better Nexmo and a better Vonage.


  1. Excellent analysis
    One thing missing from it that could help both Vonage and their developer base :
    Pricing was a hell of a pain. Too complicated and not predictable across different use cases. This can’t be done in the space. Everything should be a price a month. Here, the more participants there were, the more the costs skyrocketed!
    Understandable because of the scalability needed to support this (especially on recorded sessions), but this was not practical. Their customers couldn’t compete with even the most expensive of the video conf solutions out there…
    So, not easy webrtc user acceptance

    1. Sotiris,

      I am not sure I can agree with this approach. CPaaS vendors in general price per usage in one way or another. Some do it by user/seat, others by minute, device, stream or gigabyte.

      The more you use the more you need to pay – just makes sense.

      It might be said that the price point was high to begin with, or that the calculation required to understand costs was complex. But asking for a monthly flat rate makes no real sense.

    2. The pricing model was horrendous and a source of constant friction. Some customers need cost certainty, others cost efficiency. One of the biggest challenges the CPaaS industry still needs to deal with is the pricing model. Agree that the more you use the more you need to pay, but how that is implemented today is kludgy IMO.

      1. This is exactly what I am talking about @Todd.

        Developers that were coming into the WebRTC arena and choosing a CPaaS vendor had to understand the full business and pricing model of that CPaaS vendor before they could understand how to serve their own customers.
        Sometimes this was a prohibiting factor for most of the Startup/Developer value proposition / use case, and they had to limit their exposure to high prices due to the fact that a meeting with 5 participants was almost 5 times more expensive than a meeting with 2, because of this crazy “published streams” concept of WebRTC tech.

        I am not saying that this was not a problem for the CPaaS vendor to solve, but why should the customer (the Developer) need to learn all that and at the end find out that they should instead abandon the WebRTC meeting space for the safety of a platform integration with Zoom or Webex?

        More context can be given @Tsahi if needed.

  2. As more interactions shift to video / WebRTC, transcription would be an important piece to turn conversations into searchable, mineable text.

    Otter.ai can fill that gap for cloud video collaboration as it already powers transcription for Zoom.us

    Learn more: http://aisense.com/zoom

  3. Great article! A couple of comments:
    1. Telcos are not interested in owning sub-$100M business units. Tokbox failed to deliver on that and for Telefonica, Tokbox became a distraction, not worth the management time and corporate resources. As a result, the sale price is irrelevant.

    2. From a strategic standpoint, the question is: how do you make money in this business? Tokbox and Twilio have a commodity approach, with generally similar feature sets.
    Remember how Google was not the first search engine to be available, not even the second one, but the seventh one?
    Remember how Uber accounted for 14% of Twilio’s revenues in 2016, 9% in 2017, and Twilio acknowledged that “Uber decreased its usage of [its] products throughout 2017”?
    This game is far from over. While also price competitive, a more nimble player like http://www.voxeet.com has a more differentiated offering with its 3D audio capability, simple, per-minute pricing and a UX toolkit that accelerates deployments.

    1. Time will tell if 3D audio is enough of a differentiator.

      Twilio and TokBox are definitely different than one another and offer different feature sets.

      Twilio is already a business that makes above $100M a year, so you could say carriers missed that one – at least for now. The challenge for carriers isn’t lack of technology, but rather the fact that potential customers (=developers) don’t want to be carrier customers.

    1. That’s a good question Jeff.

      I don’t think I have a good example of a pure API player in the market who started out as something else. Carriers tried adding APIs and failed. Cisco tried multiple times, only to end up acquiring Tropo.

      Building an API first company is hard. Really hard – you know that better than most. There are multiple reasons why, with the biggest ones being that developers are hard customers to sell to AND that 80% of the work isn’t the code or the API itself but rather the things around it – documentation, samples, developer outreach – and that is often neglected by those that add an API just because they can.

      Back to Vonage – before the acquisition of Nexmo – I have no clue. I don’t remember any API initiatives so can’t really say.

      After the acquisition – Nexmo had an API which catered for developers. UCaaS APIs are different in nature, as they are geared more towards integrations. Video APIs were just launched at Nexmo. The TokBox acquisition came at a good point in time, as it allows Nexmo to leapfrog the whole part of building up the video feature set AND the process of acquiring the first customers. It saved them 2-3 years easily.

  4. Have you ever dealt with TokBox customer service? Completely useless. Also, TokBox is at least 20x more expensive than Twilio. Sure, TokBox may have some bells and whistles that Twilio lacks, but TokBox is simply not worth it.

    1. Daniel,

      As with all companies I talk to, I can say that support is a mixed bag. There are those who are happy with TokBox support and those who aren’t. I’ve seen this across the board with customers of TokBox as well as customers of other CPaaS players.

      The fact that there are alternatives on the market is great, as it enables anyone to choose the vendor that fits his needs the most.
