Autonomous Cars Are Killing Video AI in RTC

July 16, 2018
Autonomous cars are sucking all the oxygen out of video AI in real time comms. Talent is focusing elsewhere :-(

I went to the data science summit in Israel a month or so back. It was an interesting day. But somehow, I had to make sure to dodge all the boring autonomous cars sessions. They just weren’t meant for me, as I was wandering around, trying to figure out where machine learning and AI fit in RTC (you do remember I am working on a report on this - right?).

After countless interviews done this past month, along with my partner in crime here, Chad Hart, I can say that I now know a lot more about this topic. We’ve mapped the industry inside and out, talking to technology vendors, open source projects, suppliers, consumers, you name it.

There were two interesting themes that relate to the use of AI in video - again, the focus is on real time communications:
  1. There’s a lot less expertise to go around in the industry, where “the industry” is real time comms and not machine learning or computer vision in general
  2. Standards and capabilities in the broader computer vision industry seem higher and better than what we see in RTC today
Guess what - we’re about to incorporate the responses we got on our web survey on AI in RTC into the report. If you fill it out, you’ll get our upcoming “Introduction to AI in RTC” ebook and a chance to win one of 5 $100 Amazon gift cards - along with our appreciation for helping us out. Why wait?

Knowledge in AI is lacking

In broad strokes, when you want to do something with AI, you’ll need to either source it from other vendors or build it on your own. As an example, you can just use Amazon Rekognition to handle object classification, and then you don’t need a lot of in-house expertise. The savvy vendors will have people handling machine learning and AI internally as well. Being in the build category means you need 3 types of skills:
  1. Data scientists - people who can look at hordes of data, check out different algorithms and decide on what works best - what pieces of data to look at and what model to build
  2. Data engineers - these are the devops of this field. They are there to connect the dots of the different elements in the system and build a kind of a pipeline where data gets processed and handled. They don’t need to know the details of algorithms, but they do need to know the jargon and concepts
  3. Product managers - these are the guys who need to decide what to do. Without them, engineers will play without any focus or oversight, wasting time and resources instead of working towards value creation. These product managers need to know a thing or two about data science, machine learning and how it works
Data scientists are the hardest to find and retain. In one of our interviews, we were told that the company in question had to train their internal workforce for machine learning because it was impossible to hire experienced people in the Valley - Google, Apple, Facebook and Amazon are the main recruiters for that position and they are too competitive in what they offer employees.

Data engineers are probably easier to find and train, but what is it you need them to do exactly?

And then there are product managers. I am not even sure there’s any training program specifically for product managers who need to work in this space. I know I am still learning what that means exactly. Part of it is asking, through our current research, how vendors end up adding AI into their products. The answers vary and are quite interesting.

Anyways - lots of hype. Less in the way of real skills out there you can hire for the job.
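To make the “source it from a vendor” path a bit more concrete, here is a minimal sketch of what label detection on a single video frame could look like. It assumes a client object that behaves like `boto3.client("rekognition")` and its `detect_labels` call; the function name `classify_frame`, the thresholds and the `FakeRekognition` stand-in with its canned labels are all made up for illustration, so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, List, Tuple


def classify_frame(client: Any, jpeg_bytes: bytes,
                   min_confidence: float = 80.0,
                   max_labels: int = 10) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for one encoded video frame.

    `client` is assumed to expose detect_labels() the way
    boto3.client("rekognition") does.
    """
    response: Dict[str, Any] = client.detect_labels(
        Image={"Bytes": jpeg_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"])
            for label in response.get("Labels", [])]


class FakeRekognition:
    """Hypothetical stand-in for the real client; the labels are canned."""

    def detect_labels(self, Image, MaxLabels, MinConfidence):
        canned = [{"Name": "Person", "Confidence": 98.5},
                  {"Name": "Laptop", "Confidence": 61.0}]
        kept = [item for item in canned if item["Confidence"] >= MinConfidence]
        return {"Labels": kept[:MaxLabels]}


if __name__ == "__main__":
    # Placeholder bytes; a real call would pass an actual JPEG or PNG frame.
    print(classify_frame(FakeRekognition(), b"placeholder-frame"))
```

Swapping the fake for a real client should only require `boto3` plus configured credentials and a region. The point is that the integration work is closer to data engineering than to data science.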

Autonomous driving is where computer vision is today

If you follow the general technology media out there, then there are 3 things that bubble up to the surface these days when it comes to AI:
  1. AI and job displacement
  2. The end of privacy (coupled with fake news in some ways)
  3. Autonomous cars
The third one is a very distinct use case. And it is the one that is probably eating away a lot of the talent when it comes to computer vision. The industry as a whole is interested, for some reason, in taking a stab at making cars drive on their own. This is quite a challenge, and it is probably why so many researchers are flocking towards it. A lot of the data being processed in order to get us there is visual data. The importance of vision in autonomous cars cannot be overstated. This ABC News clip of the recent Uber accident drives that point home. Look at these few seconds explaining things: https://youtu.be/ufNNuafuU7M?t=51s
“These vehicles are trained to see pedestrians, to see cyclists, to see red lights. So it’s really unclear what went wrong here”
And then you ask a data scientist to deal with boring video meeting recordings to do whatever it is we need to do in real time communications with AI. Not enough fame in it as opposed to self driving cars. Not enough of a good story to tell your friends when you meet them after work.

Computer vision in video meetings is nascent

Then there’s the actual tidbit of what we do with AI in computer vision versus what we do with AI in video meetings. I’d like to break this down into a table:
Computer vision Video meeting AI
  • Count faces/people
  • Speaker identification
  • Facial recognition
  • Gesture control
  • Emotion detection
  • Auto-frame participants
Why this difference? Two main reasons:
  1. Video meetings are real time in nature and limited in the available compute power. There’s more on that in our upcoming report. But the end result is that adopting the latest and greatest that computer vision has to offer isn’t trivial
  2. We haven’t figured out as an industry where the ROI is in most of the computer vision capabilities when it comes to video meetings - there is lower hanging fruit these days in the form of transcription, translation and what you can do with speech
As we move forward, companies will start figuring this one out - deciding what the data pipeline for computer vision needs to look like in video meetings AND deciding which use cases are best addressed with computer vision.
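A quick back-of-the-envelope sketch shows why the real time constraint bites. It assumes a single worker that analyzes frames one at a time; the function names and the sample numbers (30 fps, 120 ms per inference) are hypothetical and not taken from any specific product.

```python
import math


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def analysis_stride(fps: float, inference_ms: float) -> int:
    """Smallest N such that analyzing every Nth frame keeps up with real time."""
    return max(1, math.ceil(inference_ms * fps / 1000.0))


if __name__ == "__main__":
    fps, inference_ms = 30, 120  # hypothetical numbers
    print(f"budget per frame: {frame_budget_ms(fps):.1f} ms")
    print(f"analyze every {analysis_stride(fps, inference_ms)}th frame")
```

With these made-up numbers, a model that takes 120 ms per frame can only look at every 4th frame of a 30 fps stream, before accounting for encoding, decoding and everything else the device is doing during the call.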

Where are we headed?

The communication market is changing. We are seeing tremendous shifts in our market - cloud and APIs are major contributors to this. Adding AI into the mix means change is ahead of us for years to come. On my end, I am adding ML/AI expertise to the things I consult about, with the usual focus on communications in mind. If you want to take the first step into understanding where AI in RTC is headed, check out our upcoming report - there’s a discount associated with purchasing it before it gets published.

You can download our report prospectus here.
