Where does Machine Learning fit in Real Time Communication (ML in RTC)?

By Tsahi Levent-Levi

June 11, 2018

ML in RTC can fit anywhere – from low level optimization to the higher application layers.

TL;DR – I am working with Chad Hart on a new ML in RTC report. If you are interested in it, scroll down to the end of this article.

Machine Learning (ML), Artificial Intelligence (AI), Big Data Analytics. Call it what you will. You’ll find it everywhere. Autonomous cars, ecommerce websites, healthcare – the list goes on. In recent years this domain has flourished thanks to increases in memory and processing power, but also thanks to some interesting breakthroughs in machine learning algorithms – breakthroughs that have rapidly increased the accuracy of what a machine can now do.

My ML Origin Story

I’ve been looking at and dealing with machine learning for many years now. Never directly calling it that, but always in the vicinity of the communications industry.

It probably started in university. I decided to do an M.Sc because I was somewhat bored at work. I took a course in computational linguistics, which ended with me doing research in backward transliteration, looking at phonemic similarities between English and Spanish (#truestory). That was in 2005, and we used a variant of dynamic programming and the Viterbi algorithm. That and other topics such as hidden Markov models were my bread and butter at the time.

Later on, I researched the domain of Big Data and Analytics at Amdocs. I was part of a larger group trying to understand what these mean in telecommunications. Since then, that effort grew into a full business group within Amdocs (as well as the acquisition of Pontis, well after I left Amdocs for independent consulting).

Which is why, when I talked to Chad Hart about what we could do together, we agreed that something around ML and AI made a lot of sense for both of us, and that taking it through the prism of RTC (real time communications) placed it in both our comfort zones.

We molded that effort under the Kranky Geek roof, calling it Kranky Geek Research. We created a landing page for our research, a brochure and a survey (more on that later).

During that period, we thought a lot about what domains we wish to cover and what ML in RTC really means.

Categorizing ML in RTC

Communications is a broad enough topic, even when limited to the type that involves humans. So we limited it even further, to real time communications – RTC. And while at it, we threw text out the window (or at the very least decided that it must include voice and video).

Why do that? So we don’t have to deal with the chatbots craze. That’s too broad a topic on its own, and we figured there should be quite a few reports there already – and a few snake oil sellers as well. Not our cup of tea.

This still left the interesting question – what exactly can you do with AI and ML in RTC?

We set out to look at the various vendors out there and understand what they are doing when it comes to ML in RTC.

Our decision was to model it around 4 domains: Speech Analytics, Computer Vision, Voice Bots / Assistants and RTC quality / cost optimization.

1. Speech Analytics

Speech Analytics deals a lot with Natural Language Processing (NLP) and Natural Language Understanding (NLU).

Each has a ton of different use cases and algorithms to it.

Think of a contact center and what you can do there with speech analytics:

Employ speech-to-text for transcription of the sessions
Go further with sentiment analysis, drawing on voice cues and not only the transcribed text
Extract meaning from the transcription and derive actionable insights from that meaning

You will find a lot of speech analytics related RTC ML taking place in contact centers. A bit less of it in unified communications, though that might be changing if you factor in Dialpad’s acquisition of TalkIQ.

2. Computer Vision

Computer Vision deals a lot with object classification and face detection, with all the derivative use cases you can bring to bear from it.

“Simple” things like face recognition or emotion recognition can be used in real time communications for a multitude of communication applications. Object detection and classification can be used in augmented reality scenarios, where you want to mark and emphasize certain elements in the scene.

Compared to speech analytics, computer vision is still nascent, though moving rapidly forward. You’ll find a growing number of startups in this domain as well as the cloud platform giants.

3. Voice Bots & Assistants

To me, voice bots and assistants are the tier that comes right above speech analytics.

If speech analytics gets you to NLP and NLU, the ability to convert speech to text and from there move to intent, then voice bots are about conversations – moving from a single request to a fluid interaction. The best example? Probably the Google Duplex demo – the future of what conversational AI may feel like.

Voice bots and assistants are rather new to the scene and they bring with them another challenge – do you build them as a closed application or do you latch on to the new voice bot ecosystems that have been rapidly making headway? How do you factor the likes of Amazon Alexa, Google Home, Google Assistant, Siri and Cortana into your planning? Are they going to be the interaction points of your customers? Does building your own independent voice bot even make sense?

Whatever the answers are, I am pretty sure there’s a voice bot in the future of your communications application. Maybe not in 2018, but down the road this is something you’ll need to plan for.

4. RTC Quality & Cost Optimizations

While the previous 3 machine learning domains revolve around new use cases, scenarios and applications through enabling technologies, this one is all about optimization.

There are many areas in real time communication that are built around heuristics or simple rule engines. To give an example, when we compress and decompress media we do so using a codec. The encoding process (that is, compression) is lossy in nature. We don’t keep all the data from the original media, but rather throw away stuff we assume won’t be noticed anyway (sounds outside the human hearing range, small changes in color tones, etc.) and then we compress the data.

The codecs we use for that purpose are defined by the decoder – by what you do if you receive a compressed bitstream. No one is defining what an encoder needs to look like or how it should behave. That is left to developers to decide, and encoders differ in many ways. They can’t brute-force their way to the best possible media quality, especially not in real time – there’s not enough time to do that. So they end up being built around guesswork and heuristics.

Can we improve this with machine learning? Definitely.

Can we improve network routing, bandwidth estimation, echo cancellation and the myriad of other algorithms necessary in real time communications using machine learning? Sure we can.
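To make the heuristic-versus-learned distinction a bit more concrete, here is a minimal Python sketch built around bandwidth estimation. It is not taken from any real RTC stack: the function names, the loss thresholds and the training samples are all assumptions made up for illustration. The first function is a hand-tuned rule in the spirit of loss-based rate controllers. The second part fits a one-variable least-squares line to synthetic past observations, which is about the simplest thing that could be called a learned estimator.

```python
from typing import List, Tuple


def heuristic_bitrate(current_kbps: float, loss_rate: float) -> float:
    """Hand-tuned rule with illustrative thresholds (assumed, not from a spec)."""
    if loss_rate > 0.10:
        # Heavy loss: back off in proportion to the loss we saw.
        return current_kbps * (1.0 - 0.5 * loss_rate)
    if loss_rate < 0.02:
        # Almost no loss: probe upward by 5%.
        return current_kbps * 1.05
    # In between: hold the current rate.
    return current_kbps


def fit_line(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def learned_bitrate(model: Tuple[float, float], loss_rate: float) -> float:
    """Predict a bitrate from the fitted line, never going below zero."""
    slope, intercept = model
    return max(0.0, slope * loss_rate + intercept)


# Synthetic (made-up) history: observed loss rate -> bitrate in kbps that held up.
HISTORY = [(0.00, 2000.0), (0.05, 1500.0), (0.10, 1000.0), (0.15, 500.0)]

if __name__ == "__main__":
    model = fit_line(HISTORY)
    for loss in (0.01, 0.08, 0.20):
        print(loss, heuristic_bitrate(1000.0, loss), learned_bitrate(model, loss))
```

A production system would feed in far more signals (delay, jitter, history) and use a much richer model. The only point here is that the fixed rule gets replaced by something fitted to data.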

The result is that you get better media quality and user experience by optimizing under the hood. Not many do it, as the work isn’t as high profile as the other domains. That said, it is necessary.

Interested in ML in RTC?

Check out our report. A preview of the report is available for download.

Share your opinion on AI in RTC

Doing something interesting in this space? Share your thoughts with us.

Oshri Naparstek says:

June 12, 2018 at 5:13 pm

I recently published a paper about using deep reinforcement learning for dynamic channel access

https://arxiv.org/abs/1704.02613

1. Tsahi Levent-Levi says:
  
  June 13, 2018 at 8:44 am
  
  Oshri,
  
  For the layman, where and how does this fit in real time comms?
  