There’s scaling and then there’s scaling.
One thing missing from these comments is an understanding of what scale means, or rather of the different types of scaling that real time video requires.
Here are a few different aspects of scaling real time video.
#1 – Streams per machine
This is something that was raised in one of the comments on Facebook:
Most of the SFUs out there can actually handle 100’s and even 1000’s of connections (our data is not public but look at JVB:https://jitsi.org/Projects/JitsiVideobridgePerformance) and with most of them it should be possible without much effort to configure multiple SFUs in cascade to scale almost without any limit in my opinion.
That answers the question: how many parallel sessions can you run on a single machine?
What is this one good for?
When you know how many sessions / streams you plan on having, you can then calculate how many machines you’ll need to run that scenario. From there, it is easier to extrapolate costs.
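As a rough illustration of that calculation, here is a minimal sketch in Python. The function names, the 20% headroom, the hourly price and the 730 hours per month are all assumptions made up for the example, not figures from any vendor or benchmark.

```python
def machines_needed(total_streams: int, streams_per_machine: int,
                    headroom_pct: int = 20) -> int:
    """Machines required to carry total_streams while keeping
    headroom_pct of every machine's capacity unused (assumed safety margin)."""
    usable = streams_per_machine * (100 - headroom_pct) // 100
    if usable < 1:
        raise ValueError("no usable capacity per machine")
    return -(-total_streams // usable)  # ceiling division on integers


def monthly_cost(machines: int, hourly_price: float, hours: int = 730) -> float:
    """Extrapolate a monthly bill from a hypothetical hourly machine price."""
    return machines * hourly_price * hours


if __name__ == "__main__":
    n = machines_needed(total_streams=10_000, streams_per_machine=1_000)
    print(n, "machines, monthly cost:", monthly_cost(n, hourly_price=0.20))
```

With 10,000 streams, 1,000 streams per machine and 20% headroom, each machine carries 800 streams, so the sketch asks for 13 machines.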
But that’s not our only vector of scale.
#2 – Streams per session
How many streams can we “bundle” per session?
What the comment above failed to mention is that these tests of 100’s and 100’s of connections were run with no more than 33 streams in each session. So if what I want is to live broadcast a singer to 1000’s of viewers in real time – this SFU solution won’t be suitable for my need.
It is nice to be able to do multiparty video or to broadcast live with low latency, but always ask yourself – what’s the upper limit here for this single session? How many participants can I cram into that session without making things impossible on my infrastructure?
There are, in general, two critical challenges here:
- When the number of users per session grows, the amount of communication between peers should be limited. At the extreme, a broadcaster should not be harassed by viewers directly (which is where the SFU starts breaking at scale and why I assume Jitsi preferred not to check above 33 participants)
- When the number of users per session grows beyond a single machine, how does that compute? You’ll need to be able to distribute the session somehow either by cascading or using some other means of architectural magic
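To get a feel for what cascading means in numbers, here is a minimal sketch for the one-broadcaster case. It assumes every SFU can forward the stream to at most `fanout` receivers, whether those are viewers or other SFUs. Both `fanout` and `cascade_tiers` are hypothetical names for this example, not something any particular media server exposes.

```python
def cascade_tiers(viewers: int, fanout: int) -> list[int]:
    """Number of SFUs per tier for a single broadcast stream,
    listed from the edge tier (closest to viewers) up to the root.

    Assumption: each SFU forwards the stream to at most `fanout`
    downstream receivers, which are viewers or lower-tier SFUs.
    """
    if viewers < 1:
        raise ValueError("need at least one viewer")
    if fanout < 2:
        raise ValueError("fanout must be at least 2 for the tree to converge")
    tiers = []
    receivers = viewers
    while True:
        sfus = -(-receivers // fanout)  # ceiling division
        tiers.append(sfus)
        if sfus == 1:  # a single SFU at this tier is the root
            break
        receivers = sfus  # this tier's SFUs are the next tier's receivers
    return tiers


if __name__ == "__main__":
    print(cascade_tiers(viewers=5_000, fanout=1_000))
```

For 5,000 viewers and a fan-out of 1,000 this gives 5 edge SFUs fed by 1 root SFU. The sketch only counts machines; it says nothing about signaling, latency between tiers or what happens when a tier fails.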
It is also worth pointing out that the larger the group, the more fragmentation issues you’ll have across parallel sessions – if the size of a session is dynamic, then on what kind of a machine should you start it? One which is free or one which is already somewhat busy? Can you dynamically route a session to other machines when the need arises? How do you load balance this?
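The "free machine or busy machine" question can be made concrete with a small sketch. It compares two placement policies, which I call "spread" (pick the machine with the most free capacity) and "pack" (pick the busiest machine that still fits the session). The `Machine` class, the policy names and the capacity numbers are illustrative assumptions, not how any specific load balancer works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    capacity: int  # max streams this machine can carry (assumed known)
    used: int = 0  # streams currently allocated

    @property
    def free(self) -> int:
        return self.capacity - self.used


def place(machines: list[Machine], streams: int,
          strategy: str = "spread") -> Optional[Machine]:
    """Reserve room for a session expected to need `streams` streams.
    Returns the chosen machine, or None if no machine has enough room."""
    candidates = [m for m in machines if m.free >= streams]
    if not candidates:
        return None
    if strategy == "spread":   # emptiest machine: room for the session to grow
        chosen = max(candidates, key=lambda m: m.free)
    elif strategy == "pack":   # busiest machine that fits: less fragmentation
        chosen = min(candidates, key=lambda m: m.free)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    chosen.used += streams
    return chosen


if __name__ == "__main__":
    fleet = [Machine("a", 100, used=90), Machine("b", 100, used=10)]
    print(place(fleet, 5, "spread").name, place(fleet, 5, "pack").name)
```

Spreading leaves headroom for sessions that grow after they start, while packing keeps whole machines free for large sessions. Neither policy answers the harder question of moving a live session to another machine.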
#3 – Failure diffusion
This one is related because the higher the scale and capacity, the more of an issue this will be.
Let’s assume we can get a machine to run 10,000 streams in parallel. I am optimistic today. Let’s also assume that this all happens in a single process running on our machine.
What happens if there’s a bug somewhere (and believe me – there already is) that happens to crash the system? Whenever we hit the bug, 10,000 streams get disconnected.
Now let’s further assume that each session holds 10 streams on average. And the bug was invoked due to one of these streams doing something slightly unorthodox. Now we have one session causing the disconnection of 999 more sessions on that machine.
Which leads us to the question –
Can I run multiple processes on the same machine, each catering to a smaller number of sessions? Maybe even only a single session? How does that impact memory and performance? Is it even desirable?
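The trade-off is easier to see with the numbers from the example above. This minimal sketch counts how many other sessions go down with the offending one. It assumes sessions are spread evenly across processes and that a crash takes down only its own process; `blast_radius` is a name made up for this illustration.

```python
def blast_radius(total_streams: int, streams_per_session: int,
                 processes: int) -> int:
    """Other sessions disconnected when one session crashes its process.

    Assumptions: sessions are spread evenly across `processes`, and a
    crash is contained to the process that hit the bug.
    """
    if processes < 1 or streams_per_session < 1:
        raise ValueError("processes and streams_per_session must be >= 1")
    sessions = total_streams // streams_per_session
    sessions_per_process = -(-sessions // processes)  # ceiling division
    return max(sessions_per_process - 1, 0)


if __name__ == "__main__":
    for procs in (1, 10, 1_000):
        print(procs, "process(es):", blast_radius(10_000, 10, procs))
```

One process gives the 999 collateral sessions described above, 10 processes bring it down to 99, and one process per session brings it to 0. What the sketch leaves out is the cost side: the memory and CPU overhead of running that many processes.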
For some, this might be necessary in their architecture – and it is very far from how telecom services are architected…
When Talking About Scaling…
Make sure you refer to the specific aspects you wish to scale.