Jitsi is getting a boost in its development.
When a developer-focused company gets acquired, it is time to start worrying.
Was the acquisition due to the technology, the customers or the business model?
Will the product continue to grow and flourish in the new regime?
Are the current signed agreements going to be renewed?
For open source, there are even more questions.
How will the community that was created around the open source project be treated?
Will existing business models around support, customization and dual licensing be maintained or will they be killed?
Two and a half years ago or so we had 3 popular open source media servers for WebRTC: Janus, Jitsi and Kurento.
Progress on Kurento since its acquisition has been minimal at best. My guess is that Twilio is just too busy getting its own multiparty video ready for GA to focus on the Kurento open source project itself. It also hasn’t quite acquired everything that is Kurento – parts of it were left for the community and the original parent company Naevatec. As time passes, many Kurento adopters are growing frustrated and searching for alternatives.
So time to ask –
How did Jitsi fare since its acquisition?
And it seems to be getting a lot more interesting lately.
In the past 4 months, I’ve been adding a post about Jitsi to the WebRTC Weekly almost every week. The team there has been continuously churning out new features for the project.
Here’s what was announced on the Jitsi blog since June when it comes to new features:
- New Layouts in Jitsi Meet
- Control the Volume for Every Meet Participant
- Speaker Times in Jitsi Meet
- Telephony Support on meet.jit.si
There’s a mix of announcements here. They range from the addition of UX features to some deep optimizations of the media server itself. Part of it is due to GSoC (Google Summer of Code), a program started by Google some years ago in which university students join open source projects as interns. Jitsi has been part of this program for some time now.
In a way, these are the least interesting features when it comes to a media server, but they are the ones that make it easier to use.
What Jitsi did in this round was tweak the UI to be a bit more modern and easier to use. For video layouts, there was a decision to better cater for 1:1 scenarios and to move video thumbnails from the bottom of the page to the right side of the page. This is also what Google decided to do once they shifted away from Hangouts to Meet. This makes for a more modern approach that sits well with the wider displays we have in recent years.
An audio only button was added to the UI. I am assuming it is just a shortcut to muting incoming and outgoing video. Having this UI element there makes it easier for users to operate (and easier for adopters of the Jitsi Videobridge to customize).
The interesting addition to me is the speaker times one.
I am intrigued in this case to know how easy it would be for an application to get that information from the Jitsi Videobridge – is this supported via the signaling offered by Jitsi towards the web client or is it also available as a backend-to-backend REST API? I can see this being used later in various ways, assuming the API is detailed enough and easy to use.
A WebRTC media server is but a part of what you need to run a full application. While central and important, there are other aspects to it. In recent months, Jitsi has added a few additional integrations, making it easier to use and connect to.
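To make the idea concrete, here is a minimal sketch of how an application could compute speaker times on its own, assuming it receives a chronological list of "dominant speaker changed" events. The event shape and the `speaker_times` function are hypothetical and for illustration only; they are not taken from the Jitsi API.

```python
from collections import defaultdict


def speaker_times(events, session_end):
    """Sum up how long each participant was the dominant speaker.

    events: chronological (timestamp_seconds, participant_id) pairs, each
            marking the moment a participant became the dominant speaker
            (hypothetical event shape, not a Jitsi data structure).
    session_end: timestamp at which the session ended.
    """
    events = list(events)
    totals = defaultdict(float)
    # Pair every event with the next one; the last event runs until session_end.
    following = events[1:] + [(session_end, None)]
    for (start, speaker), (next_start, _) in zip(events, following):
        totals[speaker] += next_start - start
    return dict(totals)


if __name__ == "__main__":
    sample = [(0.0, "alice"), (12.5, "bob"), (20.0, "alice")]
    print(speaker_times(sample, session_end=30.0))
```

Whether the events arrive over client signaling or a backend API, the bookkeeping on the application side would look roughly like this.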
Three such integration points were announced:
1. Mobile SDK
Jitsi has had mobile applications for quite some time. While nice, that is different from having a mobile SDK.
Something I’ve been telling media server vendors for a few years now is that they should offer a mobile SDK as part of their media server. In WebRTC, it is an important part of their offering and one that is hard to ignore.
In the case of Jitsi, users had to use the mobile application as a reference and modify it to their heart’s content. The problem with this approach starts when you need to maintain the codebase in the long run. When a new version of the mobile app comes out – how do you know which parts are critical to upgrade (without them the app will break with the new Jitsi Videobridge) and which ones are just UI fixes that you can ignore or skip since you’ve created your own UI experience already?
This is exactly why an SDK is such an important aspect of the solution:
With a mobile SDK, application developers can now just use the Jitsi Meet mobile application as a reference or even write something from scratch on top of the mobile SDK itself. Each is independently updated and maintained, making it easier to upgrade to newer releases.
2. Speech to text
Translation and NLP seem to be all the rage these days.
The way you get these things connected to WebRTC varies, but follows a similar approach for media servers:
You somehow collect the audio streams on the media server, mix and process them to the format supported by a 3rd party speech-to-text engine (Google Cloud speech-to-text seems quite popular these days), and once you get the resulting text, you do something with it.
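As a rough illustration of the "mix and process" step, the sketch below sums several 16-bit mono PCM buffers into one and hands the result to a speech-to-text callable. It assumes all streams share the same sample rate and native byte order, and `recognize` is a hypothetical stand-in for whatever 3rd party engine you use; none of this reflects how Jitsi implements it.

```python
import array


def mix_pcm16(streams):
    """Mix 16-bit mono PCM byte buffers by summing samples and clipping.

    Assumes every buffer uses the same sample rate and native byte order.
    Shorter buffers are treated as silent once they run out.
    """
    decoded = [array.array("h", s) for s in streams]
    length = max((len(d) for d in decoded), default=0)
    sums = [0] * length
    for samples in decoded:
        for i, sample in enumerate(samples):
            sums[i] += sample
    # Clip to the signed 16-bit range to avoid wrap-around distortion.
    clipped = [max(-32768, min(32767, s)) for s in sums]
    return array.array("h", clipped).tobytes()


def transcribe_conference(streams, recognize):
    """Mix all participant audio and pass it to a speech-to-text callable.

    recognize: hypothetical function taking PCM bytes and returning text.
    """
    return recognize(mix_pcm16(streams))
```

A real pipeline would also resample, chunk the audio for streaming recognition, and likely keep per-speaker streams separate so the transcript can say who spoke.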
In the case of Jitsi, this was a GSoC project. Information about its current status can be found on the developer’s website – Nik Vaessen.
This probably requires some more improvements and polish, but offers a good starting point for developers.
I’d wager that in GSoC 2018, the Jitsi team is planning on adding translation and text-to-speech to it.
3. Telephony
Telephony was already available in Jitsi before. It is implemented via a Jigasi server (JItsi GAteway to SIP). Now Atlassian is eating its own dogfood, not only in its internal HipChat service but also in its free meet.jit.si showcase service.
In the case of meet.jit.si, the length of calls was limited to 2 minutes, which is enough to hunt down meeting participants who haven’t joined the session.
This serves two purposes:
- Show that Jigasi works and showcase its use
- Work out the kinks of getting this into the UX
Media Server Optimizations
At the heart of Jitsi is the media server itself. This is what developers aim for to begin with and the additions there are quite interesting.
The first one is that Jitsi now supports peer-to-peer media for 1:1 sessions, in effect bypassing the media server. The reasoning is that many calls end up being 1:1, and it is far easier and more cost effective to send media directly between the participants.
In the past, supporting such a thing with Jitsi required running a separate signaling mechanism for 1:1 sessions and then, once the need arose to grow, shifting and renegotiating everything in front of Jitsi. It was tedious at best.
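The switching logic itself is simple to picture. The sketch below tracks participants and reports when the media path should change between peer to peer and the bridge. The `Conference` class and the mode names are hypothetical, and the hard part (renegotiating the actual media sessions) is deliberately left out.

```python
class Conference:
    """Tracks participants and decides between P2P and bridge media paths.

    Illustrative only: a real implementation must also renegotiate the
    WebRTC sessions whenever the mode changes.
    """

    def __init__(self):
        self.participants = set()
        self.mode = "p2p"

    def _update_mode(self):
        # Two or fewer participants can exchange media directly.
        new_mode = "p2p" if len(self.participants) <= 2 else "bridge"
        switched = new_mode != self.mode
        self.mode = new_mode
        return switched

    def join(self, participant_id):
        """Add a participant; returns True if the media path must switch."""
        self.participants.add(participant_id)
        return self._update_mode()

    def leave(self, participant_id):
        """Remove a participant; returns True if the media path must switch."""
        self.participants.discard(participant_id)
        return self._update_mode()
```

Each `True` returned here marks a point where the old approach forced applications to shift and renegotiate on their own.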
The other work effort is way more interesting.
Bandwidth estimation is nasty. Network conditions are varying and dynamic. You can start a session with 2Mbps and have it drop considerably throughout the session, come back up again and change characteristics.
To get that right, WebRTC (and any other VoIP alternative) needs to use bandwidth estimation. This is a process where the device tries to understand how much bandwidth is available to it at any given point in time. The algorithm can be naive, smart, complex, whatever. And a lot of the perceived quality of a call relies on the quality of the algorithm used for bandwidth estimation.
WebRTC has its own built in bandwidth estimation mechanism. It works. But you need your own algorithm in a media server. Jitsi has its algorithm, and it is a work in progress.
The Jitsi team is now taking it to the next level, trying to understand not only the availability of bandwidth but also what the best course of action should be. It is trying to discern whether it is better to reduce bitrate or add forward error correction instead.
It also does that with the coolest set of tech tools available to us today – TensorFlow and Machine Learning.
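For contrast with the machine learning approach, here is what a naive, hand-written version of that decision might look like. It is a sketch under my own assumptions: the thresholds, the RTT-based congestion check and the function names are illustrative and are not Jitsi's algorithm.

```python
def update_estimate(estimate_kbps, loss_rate):
    """Naive loss-based bandwidth estimate update (illustrative thresholds).

    Back off when loss is high, probe upward when loss is low,
    and hold steady in between.
    """
    if loss_rate > 0.10:
        return estimate_kbps * (1 - 0.5 * loss_rate)
    if loss_rate < 0.02:
        return estimate_kbps * 1.05
    return estimate_kbps


def choose_action(loss_rate, rtt_ms, baseline_rtt_ms):
    """Pick between keeping the bitrate, reducing it, or adding FEC.

    Assumption: loss accompanied by a rising round trip time suggests
    congestion, so reduce bitrate; loss with a stable round trip time
    suggests random loss, where forward error correction may help more.
    """
    if loss_rate < 0.02:
        return "keep"
    congested = rtt_ms > 1.5 * baseline_rtt_ms
    return "reduce_bitrate" if congested else "add_fec"
```

Fixed rules like these are easy to write and hard to tune across real networks, which is presumably where a learned model earns its keep.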
Here’s what Emil Ivov shared during our Kranky Geek event last month:
Where to Next?
Looking for an open source alternative for your media server?
The most popular approaches out there for you are Janus and Jitsi.
Which one to pick out of the two seems to be based on personal taste more than anything else.