The future of Programmable Video: Prebuilt and marketplaces

By Tsahi Levent-Levi

December 16, 2024  

Uncover the synergy between Programmable Video, Prebuilt, and marketplaces. Explore the role of video APIs in accelerating development.

Programmable Video is a known quantity. It is part of the CPaaS movement, where in this case video APIs let developers build their applications faster. WebRTC is also a part of all this.

Prebuilt is another concept that is well defined, but unlike Programmable Video, it is still reshaping itself. Prebuilt is about embedding the UX/UI component of the video interactions, and not just using an API – it makes the faster development of Programmable Video… well… faster.

Then there are/were marketplaces. When taken to the domain of Programmable Video, they take a slightly different shape still.

What’s there between Programmable Video, Prebuilt and marketplaces? Where are we headed with this? This is something I want to explore in this article.

Not my first look at video and Prebuilt

This isn’t my first foray into the Prebuilt market. Even before WebRTC, I was fascinated by what a lowcode/nocode solution looks like for a Programmable Video offering.

At RADVISION, I was in charge of defining our cloud vision for developers. What we did was license protocol stacks to developers building their own voice and video communication applications. That was before CPaaS was called CPaaS. And before WebSocket was part of the web.

Later on, I wrote about embeddable solutions and then published an ebook on lowcode/nocode for video communications – and the Prebuilt solutions it prescribes for us. That ebook is still relevant today and can be downloaded for free.

Marketplaces 101

Let’s switch gears a bit. Before we head back to programmable video and lowcode/nocode solutions, I want to talk about a different topic: marketplaces.

When a vendor wants to build an ecosystem around its solution, one of the ways of doing that is to introduce a marketplace. The higher you go in the food chain, the more likely you are to find a marketplace as part of the complete offering.

I thought it would be best to explain what a marketplace is by looking at one. But just searching for an example, such as “AWS marketplace” on Google, gives you the gist of it.

For me, a marketplace usually means:

  • A mechanism for a vendor to partner, curate, share and expand its offerings using third party vendors
  • A place for customers to search and find solutions they need easily
  • Where partners can showcase their offerings and have a faster and more comfortable channel to potential customers

The biggest marketplaces today are probably Apple’s App Store and Google Play for Android. There’s also the Microsoft Store for Windows applications.

Then there are the marketplaces of all the big IaaS vendors (Amazon AWS, Microsoft Azure and Google Cloud).

Cloud contact center vendors? All the big ones have marketplaces (I just searched for NICE, Five9 and Genesys to confirm).

Zoom has its own App Marketplace.

To complete this part – marketplaces are there once a vendor is big enough and looking to encompass third party solutions and services as part of its own offering.

Oh, and if you don’t understand why this is here, then just think what a marketplace for a Programmable Video Prebuilt offering may look like and mean.

Lowcode/nocode in Programmable Video: The next generation

When I wrote the lowcode/nocode ebook, most of the market for Prebuilt revolved around CPaaS vendors who were just adding a UI layer on top – anywhere from a source code reference application to a higher level of abstraction with an API and documentation.

Since then, the market has evolved. We are now seeing vendors coming from a different origin story into the domain of CPaaS and Programmable Video, and these come with a different view of Prebuilt. Here’s how I recently explained the origin stories:

The vendors with a SaaS origin story started life with a full-fledged video meetings application – UX/UI – the whole shebang. For them, going down the food chain towards Programmable Video meant their focus was Prebuilt first and then the rest of the low level APIs. As such, they brought with them some new qualities and capabilities not often found in Prebuilt Programmable Video solutions up to that point.

That brings us to the next generation of what Prebuilt is in Programmable Video, and how this market is evolving and shaping up towards the future.

A few notable examples

Here are a few notable examples of what is changing in the Prebuilt space, and how these changes are shaping what future Prebuilt solutions in the programmable video space are going to look like.

Supporting multiple languages

Moving from the API layer to the UX/UI layer brings with it a need to deal with different languages and internationalization.

That means that the text messages displayed on the screen and shared with the user need to be conveyed in different languages. Which ones? That depends on the vendor. Each vendor offers a different set of languages, usually based on the customer base it has.

This is more than just text translation – there’s the direction of the text for some languages (Arabic and Hebrew), number and date conventions, and the layout of the screen, since placing text in a given area or on a specific button might require changing its size.

For a Prebuilt service, there’s also the ability (maybe?) of letting the customer make changes to the text being displayed, and that needs to be done – again – in multiple languages.
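To make the moving parts concrete, here is a minimal sketch of how a Prebuilt UI layer might resolve a displayed string. It is not taken from any vendor: the `Localizer` class, the string keys and the sample translations are all assumptions made for illustration. It shows three things mentioned above: per-language text, a text direction for Arabic and Hebrew, and customer overrides that take priority over the vendor's defaults.

```python
from dataclasses import dataclass, field

# Hypothetical built-in UI strings a Prebuilt vendor might ship.
DEFAULT_STRINGS = {
    "en": {"join": "Join meeting", "leave": "Leave"},
    "he": {"join": "הצטרפות לפגישה", "leave": "יציאה"},
    "ar": {"join": "انضم إلى الاجتماع"},  # "leave" is deliberately missing
}

# Languages written right to left.
RTL_LANGUAGES = {"ar", "he"}
FALLBACK_LANGUAGE = "en"


@dataclass
class UiText:
    text: str
    direction: str  # "rtl" or "ltr"


@dataclass
class Localizer:
    # Customer overrides, keyed by language and then by string key (assumed shape).
    overrides: dict = field(default_factory=dict)

    def resolve(self, key: str, language: str) -> UiText:
        """Try the customer override, then the vendor default, then the fallback language."""
        for lang in (language, FALLBACK_LANGUAGE):
            for table in (self.overrides, DEFAULT_STRINGS):
                text = table.get(lang, {}).get(key)
                if text is not None:
                    # Direction follows the language of the text actually found.
                    direction = "rtl" if lang in RTL_LANGUAGES else "ltr"
                    return UiText(text, direction)
        raise KeyError(key)
```

A customer who wants different wording on the join button would pass something like `Localizer(overrides={"en": {"join": "Enter room"}})`. A string missing in Arabic falls back to English, with a left-to-right direction to match.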

It may sound obvious, especially if you’ve built consumer applications. But for those focused on developing APIs for other developers, this is a new type of headache they need to deal with.

Prebuilt in Programmable Video is now coming in multiple languages from some vendors. Others may need to follow.

A user construct

For the most part, CPaaS and Programmable Video vendors don’t think in terms of users. They think mostly in terms of minutes and peers or devices. Got a meeting to connect to? You publish your streams and subscribe to streams of others. These peers aren’t users in the sense that they aren’t identified or known. Their identity, if any, is decided and managed by the application on top.

Programmable Video offers no memory or notion of the users, their preferences or history.

Prebuilt? Sometimes…

Some Prebuilt solutions are starting to show signs of dealing with users – their identification and authentication. Sometimes even offering different permission types within meetings based on who they are.
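As a rough illustration of what "different permission types within meetings" can mean, here is a small sketch of a role-to-permission mapping. The role names and the permission list are hypothetical, not the model of any specific Prebuilt product.

```python
from enum import Enum, auto


class Permission(Enum):
    PUBLISH_AUDIO = auto()
    PUBLISH_VIDEO = auto()
    SHARE_SCREEN = auto()
    MUTE_OTHERS = auto()
    END_MEETING = auto()


# Hypothetical roles; real Prebuilt products define their own.
ROLE_PERMISSIONS = {
    "host": set(Permission),  # every permission
    "participant": {
        Permission.PUBLISH_AUDIO,
        Permission.PUBLISH_VIDEO,
        Permission.SHARE_SCREEN,
    },
    "viewer": set(),  # watch only
}


def can(role: str, permission: Permission) -> bool:
    """Return True when the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

With a plain video API, a table like this lives in the application on top. A Prebuilt solution that knows about users has to own it, along with the identification and authentication that decide which role a person gets.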

I am not sure how this will hold moving forward, but it is something to track and contemplate if you are investing in a Prebuilt offering.

Calendar integration

Meetings are sometimes done on a set schedule. And that schedule means there’s a calendar involved.

Programmable Video doesn’t have calendars integrated into it, so adding external ones via partnerships might make sense – and it does for some of the Prebuilt vendors.

The ones adding such an integration into their Prebuilt solution are mostly those with a SaaS origin story. They see such requirements from their users and then translate them to their embedded offering as well.

Transcriptions, translations, summaries and everything AI

Like calendars and multiple languages, there are other meeting features that aren’t a “classic” match for Programmable Video APIs but make sense for Prebuilt. These include the gamut of handling the speech-to-text side of things – the ability to transcribe, translate, generate summaries, extract action items, etc.

All these are things that Prebuilt solutions for Programmable Video are introducing now. And again, it comes mostly from those with a SaaS origin story.

While these are getting better and more accurate due to LLMs and generative AI, I think it is worth separating the two. Which leads me to the next thing – LLMs.

How will LLMs, conversational AI and bots fit in

With the introduction of generative AI, driven by LLMs, we’ve seen huge amounts of money poured into this space. That money is geared towards the creation of conversational AI solutions along with voice and video bots. How this will affect the programmable video space is yet to be seen.

Programmable Voice and Video goes generative AI

OpenAI just released a Realtime API for ChatGPT. It is WebSocket based and not easy enough to use for live interaction from browsers or end devices.

This left a kind of a gap in the market, which a lot of CPaaS vendors and Programmable Video vendors have rushed to fill with an interface of their own connecting to OpenAI’s Realtime API. We’re tracking these as part of the WebRTC Insights service.

The reason for this rush is threefold:

  1. Basking in the light of OpenAI and their success. These vendors are doing what is needed to get into the mindshare of potential customers
  2. Existing use cases can enjoy the use of this technology to generate better summaries. There are potentially other features that can improve here as well
  3. It opens up a new interaction mode of 1:machine which is different from 1:1. It means more minutes to charge for

What is missing here though is an answer to one question: once OpenAI does release a decent realtime API that fixes the gaps (think WebRTC interface), where does that leave all the Programmable Voice (and Video) vendors for the 1:machine use case?

Will they need to compete head on with LLM technology vendors for developer mindshare (and pocket) or will they still be viewed as viable partners?

Prebuilt and generative AI

Would this enable plugging machine intelligence instead of humans into these conversations, in an attempt to focus on specific industries and market niches? Or would it be more towards interfacing with third parties who bring the machine intelligence piece from “elsewhere”?

More importantly for this article, how will that fit into the world of Prebuilt solutions? On the one hand, this can keep developers away from adopting Prebuilt approaches, as these may or may not be able to cater for the latest approach that comes along with generative AI. Using Prebuilt may be viewed as a way to stick with the best practice in the video conferencing domain. But we are at an inflection point where trying to figure out and understand what conversational AI really means and what it will look like in the future is practically like writing best practices from scratch. Keeping at the forefront here might mean skipping Prebuilt and needing to go at least one level lower in the abstraction stack.

On the other hand, going Prebuilt might mean having the ability and resources needed to figure out how to add conversational AI to such a solution, assuming it is flexible enough. But how does one know which Prebuilt solution is going to be flexible enough in a domain that is only now being defined?

And maybe, going Prebuilt might mean not needing to deal with this new technology front, and instead, having it provided by the Prebuilt vendor itself – at some (near) future point in time.

More questions here than answers.

The challenge of a niche focus

A word of caution though. Taking the strategy of Prebuilt means diving into a niche market.

If you are developing a Prebuilt offering, then know that not all businesses are going to need or align with your offering. Each has its own unique requirements, many of which you are unlikely to be able to cater for. It means knowing and understanding that your potential target market is smaller, and also likely different in nature from the traditional programmable video market.

For those looking for a solution, choosing a Prebuilt alternative means subscribing to the set of features and capabilities provided by that specific vendor. At its core, a Prebuilt offering is less generic and more opinionated. You essentially get what the vendor thinks makes sense. It might be the common sense best practices that the vendor baked into its solution, but that doesn’t mean that it fits your needs exactly. In some cases, using a more generic programmable video offering in the form of a video API might be the better option.

A hybrid approach

Some vendors have decided to enjoy both worlds. They do so by offering a low level generic API while at the same time offering a higher level Prebuilt construct.

How they go about doing that varies, and it is interesting. They might start with a low level API, adding a Prebuilt solution on top. Or they might start with a kind of SaaS offering of video communications, later creating a Prebuilt solution from it and, further down the road, introducing a lower level API for it as well.

As time goes by and the market matures, we will see more vendors taking up the hybrid approach.

We are seeing this today with CPaaS where quite a few vendors offer both a generic API and a drag and drop Flow/Studio interface.

What’s next

If you are into this domain and need assistance, be it in validating the work you are doing with your own APIs and lowcode/nocode solution or in deciding which vendor to work with for your application, reach out to me. I can help.

