Autonomous Cars Are Killing Video AI in RTC

By Tsahi Levent-Levi

July 16, 2018

Autonomous cars are sucking all the oxygen out of video AI in real time comms. Talent is focusing elsewhere 🙁

I went to the data science summit in Israel a month or so back. It was an interesting day. But somehow, I had to make sure to dodge all the boring autonomous cars sessions. They just weren’t meant for me, as I was wandering around, trying to figure out where machine learning and AI fit in RTC (you do remember I am working on a report on this – right?).

After countless interviews done this past month, along with my partner in crime here, Chad Hart, I can say that I now know a lot more about this topic. We’ve mapped the industry inside and out, talking to technology vendors, open source projects, suppliers, consumers, you name it.

There were two interesting themes that relate to the use of AI in video – again, the focus is on real time communications:

There’s a lot less expertise to go around in the industry, where the industry is real time comms and not machine learning or computer vision in general
Standards and capabilities in computer vision at large seem higher and better than what we see in RTC today

Guess what – we’re about to incorporate the responses we got on our web survey on AI in RTC into the report. If you fill it out, you’ll get our upcoming “Introduction to AI in RTC” ebook and a chance to win one of 5 $100 Amazon gift cards – along with our appreciation for helping us out. Why wait?

Knowledge in AI is lacking

In broad strokes, when you want to do something with AI, you’ll need to either source it from other vendors or build it on your own.

As an example, you can just use Amazon Rekognition to handle object classification, and then you don’t need a lot of in-house expertise.

The savvy vendors will have people handling machine learning and AI internally as well. Being in the build category means you need 3 types of skills:

Data scientists – people who can look at hordes of data, check out different algorithms and decide on what works best – what pieces of data to look at and what model to build
Data engineers – these are the devops of this field. They are there to connect the dots of the different elements in the system and build a kind of a pipeline where data gets processed and handled. They don’t need to know the details of algorithms, but they do need to know the jargon and concepts
Product managers – these are the guys who need to decide what to do. Without them, engineers will play without any focus or oversight, wasting time and resources instead of working towards value creation. These product managers need to know a thing or two about data science, machine learning and how it works

Data scientists are the hardest to find and retain. In one of our interviews, we were told that the company in question had to train their internal workforce for machine learning because it was impossible to hire experienced people in the Valley – Google, Apple, Facebook and Amazon are the main recruiters for that position and they are too competitive in what they offer employees.

Data engineers are probably easier to find and train, but what is it you need them to do exactly?

And then there are product managers. I am not even sure there’s any training program specifically for product managers who need to work in this space. I know I am still learning what that means exactly. Part of it is by asking, through our current research, how vendors end up adding AI into their products. The answers vary and are quite interesting.

Anyways – lots of hype. Less in the way of real skills out there you can hire for the job.

Autonomous driving is where computer vision is today

If you follow the general technology media out there, then there are 3 things that bubble up to the surface these days when it comes to AI:

AI and job displacement
The end of privacy (coupled with fake news in some ways)
Autonomous cars

The third one is a very distinct use case. And it is the one that is probably eating away a lot of the talent when it comes to computer vision. The industry as a whole is interested, for some reason, in taking a stab at making cars drive on their own. This is quite a challenge, and it is probably why so many researchers are flocking towards it. A lot of the data being processed in order to get us there is visual data.

The importance of vision in autonomous cars cannot be overstated. This ABC News clip of the recent Uber accident drives that point home. Look at these few seconds explaining things:

“These vehicles are trained to see pedestrians, to see cyclists, to see red lights. So it’s really unclear what went wrong here”

And then you ask a data scientist to deal with boring video meeting recordings to do whatever it is we need to do in real time communications with AI. Not enough fame in it as opposed to self driving cars. Not enough of a good story to tell your friends when you meet them after work.

Computer vision in video meetings is nascent

Then there’s the actual tidbit of what we do with AI in computer vision versus what we do with AI in video meetings.

I’d like to break this down into a table:

Computer vision	Video meeting AI
Count faces/people; Speaker identification; Facial recognition; Gesture control; Emotion detection	Auto-frame participants

Why this difference? Two main reasons:

Video meetings are real time in nature and limited in the available compute power. There’s more on that in our upcoming report. But the end result is that adopting the latest and greatest that computer vision has to offer isn’t trivial
We haven’t figured out as an industry where the ROI is in most of the computer vision capabilities when it comes to video meetings – there are lower hanging fruit these days in the form of transcription, translation and what you can do with speech
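
To make the first reason a bit more concrete, here is a back-of-the-envelope sketch in Python. The numbers (30 fps, 50 ms of inference per frame) and the function names are assumptions picked for illustration only, not measurements from any product or from our report:

```python
import math


def frame_budget_ms(fps: float) -> float:
    """Time available to handle a single frame, in milliseconds."""
    return 1000.0 / fps


def analysis_stride(fps: float, inference_ms: float) -> int:
    """Analyze only every Nth frame so the model keeps up with live video.

    Hypothetical helper: assumes a single model instance that handles
    one frame at a time and takes inference_ms per frame.
    """
    return max(1, math.ceil(inference_ms * fps / 1000.0))


if __name__ == "__main__":
    fps = 30            # assumed frame rate of the meeting video
    inference_ms = 50   # assumed per-frame inference time of the model
    print(f"Budget per frame: {frame_budget_ms(fps):.1f} ms")
    print(f"Analyze every {analysis_stride(fps, inference_ms)} frame(s)")
```

Under these assumed numbers the model cannot look at every frame, and that is before encoding, decoding and everything else a video meeting needs from the same CPU.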

As we move forward, companies will start figuring this one out – deciding what the data pipeline for computer vision needs to look like in video meetings AND deciding which use cases are best addressed with computer vision.

Where are we headed?

The communication market is changing. We are seeing tremendous shifts in our market – cloud and APIs are major contributors to this. Adding AI into the mix means change is ahead of us for years to come.

On my end, I am adding ML/AI expertise to the things I consult about, with the usual focus of communications in mind. If you want to take the first step into understanding where AI in RTC is headed, check out our upcoming report – there’s a discount associated with purchasing it before it gets published:

You can download our report prospectus here.


TechNoFy says:

July 16, 2018 at 3:25 pm

Hi Tsahi,

Very nice article. Can AI in RTC be considered as deep technology?

1. Tsahi Levent-Levi says:
  
  July 16, 2018 at 5:55 pm
  
  Thanks.
  
  Not sure what you mean by deep technology. If you are referring to deep learning, then many of the techniques used today in RTC when it comes to machine learning algorithms are employing neural networks, which are considered as deep learning.
  
  1. TechNoFy says:
    
    July 17, 2018 at 7:21 pm
    
    Deep technology is a term for technology that solves previously intractable real-world problems, e.g. medical devices and drugs that cure disease and extend life; artificial intelligence to forecast natural disasters such as earthquakes; and clean energy solutions that can help stem global warming. Usually, deep technologies are developed through years of research and testing.
    
    So I was wondering whether AI in RTC can be considered the same, as it is relatively new, complex and needs a lot of research/data, but has the capacity to solve real-world problems.
    
    1. Tsahi Levent-Levi says:
      
      July 17, 2018 at 9:15 pm
      
      I don’t think AI in RTC is a specific technology, area of research or even data type. There are multiple technologies there, all from the machine learning domain, which are used in totally different ways to solve a large variety of problems.
      
      1. Stefan Karlsson says:
        
        July 18, 2018 at 5:45 pm
        
        ah buzz words are like rabbits.
        https://rationalwiki.org/wiki/Deepity
Stefan Karlsson says:

July 17, 2018 at 5:16 pm

Tsahi,

It’s very interesting to hear about your endeavours, both on your blog and in your newsletter. I especially liked your ideas on how to use CNN approaches to solve the gaze problem in video conferencing.

Here are some topics you may want to look closer into in your future endeavours:

– CNN LSTM, architectures that are well suited for audio generation
– Google Duplex, the single biggest thing happening at the junction of AI and RTC
– specialized hardware, especially Intel Movidius chipsets, that will make ugly GPU solutions go away for real-time inference on almost any cheap hardware in the future.
– next generation deep fakes are coming out with new impressive features (look in the clip for a solution to your gaze problem in video conf: https://www.youtube.com/watch?v=qc5P2bvfl44)

I work actively with some connected problems and have only briefly visited RTC for web-based streaming. I may be available for a chat, but promise nothing 🙂

You are making great content, keep up the good work

1. Tsahi Levent-Levi says:
  
  July 17, 2018 at 5:29 pm
  
  Stefan – thanks for the kind words and the suggestions – I’ll definitely follow up on them 🙂
  
  1. Stefan Karlsson says:
    
    July 18, 2018 at 5:35 pm
    
    Did you see the potential for gaze correction in the video link I sent? Don’t miss it
    
    1. Tsahi Levent-Levi says:
      
      July 18, 2018 at 9:54 pm
      
      I’ve seen similar ones in the past month or two. Kinda scary…
      
      1. Stefan Karlsson says:
        
        July 19, 2018 at 12:57 pm
        
        I haven’t seen the kind of parameterization that this work has. It’s in that parameterization that the key to your problem lies…
        
        Blink while watching the YouTube clip, and you’ll miss it (all pun intended)