Should you use Kurento or Jitsi for your multiparty WebRTC video conference product?

By Tsahi Levent-Levi

September 5, 2016  

Kurento or Jitsi; Kurento vs Jitsi – is this the ultimate head to head comparison for open source media servers in WebRTC?

Kurento vs Jitsi - which one best fits your needs?

Yes and no. And if you want an easy answer of “Kurento is the way to go” or “Jitsi will solve all of your headaches” then you’ve come to the wrong place. As with everything else here, the answer depends a lot on what it is you are trying to achieve.

Need to pick a WebRTC media server framework? Why not use my Free Media Server Framework Selection Worksheet when checking your alternatives?

Since this is something that gets raised quite often these days by the people I chat with, I decided to share my views here. To do that, the best way I know is to start by explaining how I compartmentalized these two projects in my mind:

Jitsi Videobridge

The Jitsi Videobridge is an SFU. It is an open source one, which is currently owned and maintained by Atlassian.

The acquisition of the Jitsi Videobridge serves Atlassian in two ways:

  1. Integrating Jitsi Videobridge into HipChat while owning the technology (it took the better part of the last 18 months)
  2. Showing some open source love – they did change the license of Jitsi from LGPL to Apache

Here’s the intro of Jitsi from its github page:

Jitsi Videobridge is an XMPP server component that allows for multiuser video communication. Unlike the expensive dedicated hardware videobridges, Jitsi Videobridge does not mix the video channels into a composite video stream, but only relays the received video channels to all call participants. Therefore, while it does need to run on a server with good network bandwidth, CPU horsepower is not that critical for performance.

I emphasized the important parts for you. Here’s what they mean:

  • XMPP server component – a decision was made as to the signaling of Jitsi. It was made years ago, when the idea was to “compete” head-to-head with Google Hangouts. So the choice was made to use XMPP signaling. This means that if you need/want/desire anything else, you are in for a world of pain – doable, but not fun
  • does not mix the video channels – it doesn’t look into the media at all and can’t process raw video in any way
  • only relays the received video – it is an SFU

Put simply – Jitsi is an SFU with XMPP signaling.
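
To make the relay idea concrete, here is a minimal sketch of selective forwarding, written in Python. It is a toy in-memory model and not Jitsi Videobridge code: the ToySfu class, its method names and the packet representation are all hypothetical and exist only for illustration. What it shows is that the server copies each incoming packet to every other participant without decoding or mixing it.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A conference participant; inbox holds packets relayed by the server."""
    name: str
    inbox: list = field(default_factory=list)


class ToySfu:
    """Hypothetical in-memory model of an SFU (not Jitsi's implementation)."""

    def __init__(self):
        self.participants = {}

    def join(self, name):
        self.participants[name] = Participant(name)

    def on_packet(self, sender, payload):
        # Relay the payload untouched to everyone except the sender.
        # No decoding, mixing or re-encoding happens on the server.
        for name, participant in self.participants.items():
            if name != sender:
                participant.inbox.append((sender, payload))


sfu = ToySfu()
for user in ("alice", "bob", "carol"):
    sfu.join(user)

sfu.on_packet("alice", b"\x01\x02")  # stands in for an encoded video packet
```

After the last call, bob and carol each hold one copy of alice's packet and alice holds none. Because the work is copying bytes rather than processing video, this model loads the network far more than the CPU, which is the trade-off the Jitsi description points at.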

If this is what you’re looking for then this baby is for you. If you don’t want/need an SFU or have another signaling protocol, better start elsewhere.

You can find outsourcing vendors who are happy to use Jitsi and have it customized or integrated to your use case.

Kurento

Kurento is a kind of media server framework. This too is an open source one, but one that is maintained by Kurento Technologies.

With Kurento you can essentially build whatever you want when it comes to backend media processing: SFU, MCU, recording, transcoding, gateway, etc.

This is an advantage and a disadvantage.

An advantage because it means you can practically use it for any type of use case you have.

A disadvantage because there’s more work to be done with it than something that is single purpose and focused.

Kurento has its own set of vendors who are happy to support, customize and integrate it for you, including the actual authors and maintainers of the Kurento code base.

Which one’s for you? Kurento or Jitsi?

Both frameworks are very popular, with each having at the very least tens of independent installations and integrations done on top of them and running in production services.

Kurento or Jitsi? Not always an easy choice, but here’s where I draw the line:

If what you need is a pure SFU with XMPP on top, then go with Jitsi. Or find some other “out of the box” SFU that you like.

If what you need is more complex, or necessitates more integration points, then you are probably better off using Kurento.

What about Janus?

Janus is… somewhat tougher to explain.

Their website states that it is a “general purpose WebRTC Gateway”. So in my mind it will mostly fit into the role of a WebRTC-SIP gateway.

That said, I’ve seen more than a single vendor using it in totally other ways – anything from an SFU to an IOT gateway.

I need to see more evidence of use cases where production services end up using it for multiparty as opposed to a gateway component to suggest it as a solid alternative.

Oh – and there are other frameworks out there as well – open source or commercial.

Where can I learn more?

Multiparty and server components are a small part of what is needed when going about building a WebRTC infrastructure for a communication service.

In the past few months, I’ve noticed a growing number of questions that stem from challenges with and misunderstandings of what WebRTC really is and how it works. People tend to focus on the obvious side of the browser APIs that WebRTC has, and forget to think about the backend infrastructure for it – something that is just as important, if not more.

It is why I’ve decided to launch an online WebRTC Architecture course that tackles these types of questions.

Course starts October 24, priced at $247 USD per student. If you enroll before October 10, there’s a $50 discount – so why wait? Until I get enrollment automation up, contact me directly.



    1. Gustavo – thanks

      Slack for now is voice-only. While they might add video now that HipChat relaunched it on their service (via Jitsi), who knows if it will still be Janus or not.

      On top of that, Slack is a great reference, but might not be the right one for others. I don’t know how much support they got, how many customizations they made and how much crap they ate along the way – it might be the best thing that happened to Slack – or it might not.

      The track record I see and the recommendations I give are based on multiple variables – it relates to the DNA of the vendor adopting the framework, the feature set he needs, the type of support he is looking for, the scale he needs, the direct feedback I get from others on their use of said frameworks and on discussions with the vendors themselves.

      For now, I am still waiting for more evidence about Janus besides Slack – and not because I have anything bad to say about it.

      1. Just for a little “Cicero pro domo sua”, as I’m the Janus main author, here’s a link to a presentation I made a few months ago at Kamailio World:

        http://www.kamailio.org/events/2016-KamailioWorld/Day1/10-Lorenzo.Miniero-Janus-WebRTC-SIP-Gateway.pdf (there’s also a video on YouTube of me presenting this, if you’ve time to watch it)

        Despite the title, it was a more generic overview on Janus in general, and towards the end you can see a (non-exhaustive) list of different products using Janus nowadays, most of them exploiting the SFU plugin. As to Slack, they did everything by themselves without involving us, apart from a couple open discussions on our Google group.

        Hope this helps.

  1. XMPP (or rather colibri) is just the control layer. You can write a server for the xmpp component connection to the jitsi video bridge and translate to whatever you want from there… just requires skill.

    1. Hi,
      I’m the Kurento lead, so my opinion may well be biased, but my honest feeling is that Tsahi’s diagnosis in this post is quite accurate. Jitsi was designed with a specific videoconferencing model in mind and XMPP makes a lot of sense for it. When creating applications that comply with this model, Jitsi does a great job and using it may save lots of development hours.

      However, this may be too narrow when special requirements need to be satisfied. First, because using XMPP-inspired control mechanisms is not appropriate for all types of media control logic one may need to have. Just as an example, consider how you would use XMPP for things such as interoperating with IP cameras or smart video devices, controlling computer vision filters, combining media mixing models with SFU models dynamically or orchestrating a complex dynamic media processing topology. Using XMPP-like control mechanisms for those might be a counterintuitive and complex task for developers. Second, because extending Jitsi Videobridge with further capabilities requires a lot of hacking and deep knowledge of its code internals.

      On the other hand, Kurento was designed, from its very beginning, as a modular media development framework providing full composability and extensibility. Due to this, Kurento developers use consistent APIs available through programming-language-dependent SDKs that are designed based on software engineering principles (e.g. type protection, efficient management of synchronous/asynchronous calls, efficient use of threads, concurrency control, distributed garbage collection, testability, etc.). These APIs are fully agnostic to any kind of signaling or assumption; the “call model” is just one of the possibilities.

      This provides a lot of flexibility but it also has the drawback Tsahi comments on: when you just need a standard videoconferencing call flow you might need to develop your own signaling stack, and then you might feel that you are reinventing the wheel. To minimize this effect, on top of the Kurento raw APIs, the Kurento team also created several high-level APIs providing specific signaling, such as the Kurento Room API and the Kurento Tree API, but this is another story.

    2. Jitsi Videobridge also supports a REST control layer, so XMPP is by no means a requirement. Some pretty eminent adopters out there (e.g., HighFive, join.me and others) are using it through REST control.

  2. Hi,
    I am currently developing a many-to-many video/audio conference solution for a German company which should integrate a public phone conference system via SIP in the future. Therefore I played around with Kurento and Janus. From my experiences so far I can tell that there are advantages and drawbacks in both media servers.
    Both of them have interesting approaches regarding the architecture. I like the Kurento way of connecting media endpoints to pipelines. But also the Plugin based approach of Janus offers a lot of possibilities as long as the developer is able to create those native plugins.
    In my view the typical use case for multiparty audio/video conferences is an SFU approach for video and an MCU approach (mixing) for audio. Mixing video would eat a lot of server resources, and it’s not typical for all participants in a big conference to show video concurrently. But it’s different for audio, especially if a gateway to the public phone world is necessary. Here you can’t get around mixing the audio channels. And this is the point where I see advantages on the Janus side. I compared server resource consumption of the Janus Audio Bridge to the Kurento composite element. Janus uses libOpus in the plugin to decode the opus audio, mixes the streams and encodes the mixed stream. This implementation limits the usage to the opus codec but it saves a lot of resources compared to the Kurento composite. As far as I understood, Kurento uses gstreamer libs and pipelines for all the media processing. This may be much more flexible but leads to much more CPU consumption. My rough measurements showed approx. four times higher CPU load with Kurento for an audio room with 10 participants.

        1. Yes. The dominant players are still Jitsi and Janus in the open source part. Kurento is going down in popularity and use.

          Other alternatives haven’t made enough progress yet – at least not based on the conversations I have with developers.

          1. Hi, Tsahi. Nice review of these technologies, but what’s your take on mediasoup? We’re a full JavaScript (Node) team, and while Kurento provides NodeJS support, your latest comments indicate that its usage has begun to dwindle. How does mediasoup compare with Kurento?

          2. Kurento has everything in a box, which means it is very generic in nature. It also wasn’t really updated in the past year that much, although things should be improving now.

            As for mediasoup, it is still new with a small team behind it and a small ecosystem. I know of projects who’ve opted for using it, though not about their current status.

          3. Thanks for the prompt reply Tsahi. I looked at Kurento over the weekend, and it appears as though work has started on it again after almost a year of inactivity. The tutorial repos for JavaScript still appear dated at 6.6.2 as against the current 6.7.*. Moreover, it appears to be overkill for our particular use case. I think I’ll peruse mediasoup as it appears more lightweight and better suited for our use case. Thanks again.

          4. Good luck and be sure to update me on your findings – I am really interested in hearing feedback from developers on the various media server alternatives out there.

        1. Tsahi, I agree with you. I haven’t heard from anyone using the same for an enterprise level solution. The main problem for me is “multi-party seems to be limited to 16 participants” in Intel’s SDK. I want to achieve video calls with as many as 100 pax in a single call. I also went through your blog concerning the load testing for Kurento and it was a bit disappointing. Do you still think I can make a 100 pax call work with Kurento in MCU mode, with a very high spec server?

          1. MCUs have challenges with load which amount to the cost of the service.
            You can use the Intel one to do that by cascading machines one on top of the other (assuming Intel supports that mode).

            If you don’t need to see all 100 participants at all times, then my suggestion would be to go for an SFU based architecture.
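
            The trade-off can be put into rough numbers. The sketch below is plain arithmetic under simplifying assumptions: every participant sends exactly one video stream, and the SFU may cap how many streams it forwards to each participant. The function names and the last_n parameter are hypothetical and are not taken from any specific media server.

```python
from typing import Optional


def sfu_streams(participants: int, last_n: Optional[int] = None) -> dict:
    """Stream counts for one SFU room, assuming one video stream per sender."""
    others = participants - 1
    # Without a cap, every participant receives everyone else's stream.
    shown = others if last_n is None else min(others, last_n)
    return {
        "uplinks_per_client": 1,
        "downlinks_per_client": shown,
        "server_outgoing_total": participants * shown,
    }


def mcu_streams(participants: int) -> dict:
    """Stream counts for one MCU room that sends a single composite to all."""
    return {
        "uplinks_per_client": 1,
        "downlinks_per_client": 1,
        "server_decodes": participants,  # every incoming stream is decoded
        "server_encodes_min": 1,         # at least one composite is encoded
    }


full_room = sfu_streams(100)               # 100 * 99 = 9900 outgoing streams
capped_room = sfu_streams(100, last_n=9)   # 100 * 9 = 900 outgoing streams
mixed_room = mcu_streams(100)              # 100 decodes on the server
```

            With 100 participants, an uncapped SFU forwards 9,900 streams and each client has to receive 99 of them, while capping at 9 visible participants brings that down to 900 and 9. The MCU keeps clients at one stream each but pays for it by decoding 100 streams on the server, which is where the cost comes from.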

    1. Aruna,

      Frankly, I am not aware of anything dominant or typical when it comes to signaling alongside Kurento. My guess is that it takes one of two forms:

      1. The developers use the Node.js server coming from Kurento and modify it, making it their de facto signaling server
      2. The developers use whatever it is they decided to drive their app interactions with

      If other readers here can share their views and experiences that will be appreciated.

      1. Thank you Tsahi for your insights on Kurento. Any experiences with Nubomedia, which is an extension to Kurento? Are there any commercial deployments using it?

        1. Hi,
          I’m Luis, NUBOMEDIA project coordinator. NUBOMEDIA is a research infrastructure that was created to experiment with novel paradigms for combining WebRTC with advanced media processing capabilities in a scalable way. As a research project, it is not devoted to production and it lacks features that are probably required for such a purpose (e.g. billing, fault-resilience mechanisms, etc.). Hence, NUBOMEDIA is a good starting point for any organization wishing to create a next-generation WebRTC PaaS, but not for being used directly in production. In other words, if you are willing to create your very own WebRTC PaaS, evolving NUBOMEDIA may save you thousands of development hours, but further effort needs to be invested before NUBOMEDIA is production ready.

      2. If I do the signaling myself on a separate server, which APIs does Kurento provide to interact with? Is there a detailed reference for that?

    1. Shuan,

      I guess it depends on how Kurento will progress from here now that Twilio is at the helm. Better? Worse? Who knows?

      Once Twilio officially releases their own Kurento supported CPaaS capabilities, we will see what gets pushed back into the open source Kurento code and be able to know better.

          1. Well, it’s been a few months. It appears one can still select Kurento from the AWS marketplace, though I have not tried it, and officially, Twilio has stated (back in ’16!) that they are not taking new elasticRTC clients. Maybe that only means they are not offering support services for new elasticRTC clients, but you are still free to use it. I’m trying to pick the right media server and hosting environment for my needs, and I wish there was some resolution on this. Twilio pricing seems to be $0.001 per participant minute. That’s better than Vidyo.io ($0.01 per participant minute) but I think I can do better by self-hosting.

          2. Rick,

            I think you’re misinterpreting the Twilio pricing – it should be higher, especially if you are aiming at a media server.

            Look at Janus and Jitsi for viable alternatives. I wouldn’t use Kurento today, as it hasn’t been updated for quite some time.

            Good luck!

  3. Hi
    Can you please let us know which vendors, other than the Kurento team itself, provide customization/integration and general support for the Kurento stack?

    Really appreciate your response here.

    Br, Sudhi

  4. hi Tsahi

    Nice basic article. I have a few questions.

    1) Can an open source media server run on Windows Server?
    2) Do you know of any open source STUN/TURN server software that can run on Windows Server?
    3) I am using Xamarin. Can you direct me to an existing wrapper for Xamarin?

    Thanks

    1. Vuyiswa,

      For media servers look at Jitsi, Janus and mediasoup these days. Not sure about their Windows support, but they probably have solutions for that.

      For STUN/TURN use coturn.

      Xamarin – I am not an expert on that, so don’t have an out of the box answer for you for this one.

  5. Hello Tsahi.
    Thanks for your article.
    I have one question.
    Jitsi vs Kurento vs others: based only on video conference performance, which of these is the better solution?
    I mean things like the limit on the number of users, video quality, etc.
    I am struggling to choose a framework for a video conference in which over 30 users participate in one room, with additional rooms for other participants.

    thanks

    1. You will need to figure out on your own which one performs best for your needs. There are tools on the market to conduct such tests, including https://testrtc.com where I am a cofounder (I can’t really tell you the answer simply because I don’t know it – our focus as a company isn’t to compare media servers but to assist clients in testing and monitoring).

      As for going to 30 participants and more, I just released an ebook on that: https://webrtccourse.com/product/scale-webrtc-group-call/

  6. Hello Sumit and Aruna,

    We are using the Intel CS for WebRTC framework to develop our video conferencing platform. We can have any number of participants in a single video call, either in SFU mode or mixing mode. It all depends on your use cases, UI design and hardware resources. Intel also supports H.264 video hardware acceleration if you use an Intel server with UHD graphics. It also provides excellent support for developing native conferencing applications on the Android and iOS platforms. Of course, there are some issues with the Intel WebRTC framework too. One of the major issues is that community support is very limited. Thank you.
