Explore the role of voice LLMs in interactive AI. Understand why voice interfaces in generative AI require the use of WebRTC technology.
[In this series of short articles, I'll be going over some WebRTC-related quotes and trying to explain them]
We’re all into ChatGPT, LLMs, Agentic AI, Conversational AI, bots, whatever you want to call them.
Our world and life now revolve around prompting. It used to be search and copy+paste. Now it is all prompting.
A natural extension of text is voice. And for voice, we also need to understand that the whole interaction is going to be different:
Where prompting is turn-by-turn, voice is a lot more interactive.
At the “beginning” (as if we’ve had ChatGPT with us for a decade…), companies introduced voice interfaces to their Generative AI LLM models using WebSockets. Some still introduce it that way today – even calling it “low latency”.
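To make that concrete, here is roughly what the WebSocket approach looks like in the browser. Treat this as a sketch: the wss:// endpoint is made up, but the browser APIs are the real ones. Notice that the audio gets chunked and buffered before it ever hits the network, and the network itself is TCP, where a single lost packet delays everything behind it:

```typescript
// A hypothetical sketch of the early WebSocket approach: chunk the
// microphone with MediaRecorder and push compressed audio frames to a
// server. The wss:// URL is an assumption, not a real service.
async function streamMicOverWebSocket(): Promise<void> {
  const ws = new WebSocket("wss://example.com/voice"); // assumed endpoint
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });

  // MediaRecorder buffers and compresses; each chunk adds tens of
  // milliseconds of delay before it even reaches the (TCP) network.
  const recorder = new MediaRecorder(mic, {
    mimeType: "audio/webm;codecs=opus",
  });
  recorder.ondataavailable = (event) => {
    if (ws.readyState === WebSocket.OPEN) ws.send(event.data);
  };

  ws.onopen = () => recorder.start(100); // emit a chunk every ~100ms
}
```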
Rather quickly, that notion died and was replaced with the use of… WebRTC.
Why? Because we need something that is low latency, real-time, interactive and live. All words used to describe WebRTC.
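And here is the same idea over WebRTC. The microphone goes out as a live media track over SRTP/UDP, with the codec, jitter buffer and packet loss handling built into the stack. The signaling endpoint and its SDP-exchange contract below are assumptions (each vendor does this slightly differently); everything else is the standard WebRTC API:

```typescript
// A minimal sketch of wiring a browser microphone to a voice LLM over
// WebRTC. The "/session" endpoint and its answer-SDP response are
// hypothetical; the RTCPeerConnection APIs are standard.
async function connectVoiceBot(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();

  // Play whatever audio the model sends back.
  pc.ontrack = (event) => {
    const audio = new Audio();
    audio.srcObject = event.streams[0];
    void audio.play();
  };

  // Send the microphone to the model as a live, low-latency track.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  mic.getTracks().forEach((track) => pc.addTrack(track, mic));

  // A data channel for out-of-band events (transcripts, interruptions…).
  const events = pc.createDataChannel("events");
  events.onmessage = (e) => console.log("model event:", e.data);

  // Classic SDP offer/answer, here against an assumed HTTP endpoint.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const res = await fetch("/session", {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });

  return pc;
}
```

The difference isn’t the few lines of signaling. It is that WebRTC handles the hard real-time media problems (codecs, jitter, packet loss, echo cancellation) for you, while the WebSocket approach leaves all of that to the application.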
Want to dig deeper into this? Check out the following articles:
🧩 What Will Be the API Giving Voice to LLMs? (Nordic APIs)
🧩 CPaaS and LLMs both need APIs and SDKs
🧩 OpenAI, LLMs, WebRTC, voice bots and Programmable Video
🧩 Generative AI and WebRTC: The fourth era in the evolution of WebRTC
Need help?
👉 My Generative AI & WebRTC workshop is available for corporate customers to enroll in as live sessions
👉 This blog is chock-full of resources and articles that deal with these topics. You just need to search for them and read
👉 I offer consulting to companies that want to develop with WebRTC. This includes making use of Generative AI technologies