AWS is NOT your WebRTC Hosting of Choice

By Tsahi Levent-Levi

January 23, 2014  

UPDATE: This article is from 2014. At the time, the jury was still out on whether cloud virtualization was suitable for real-time communication solutions such as WebRTC. In 2020 it is quite apparent that WebRTC fits nicely with cloud, virtualization and containers. Times change 😀

Amazon Web Services? Keep it off my WebRTC service.

WebRTC and AWS

Here’s the thing:

  1. This is going to be a controversial post
  2. I haven’t got a solid opinion, but I will take a side here, for the fun of argumentation

I have been talking to WebRTC vendors for a long time now. It is a lot of fun and very educational for me. One of the questions where I get the most interesting answers is about the deployment model of these services – they range from fully virtualized, redundant, cloudified, dynamic solutions to dedicated servers hosted in select data centers. The whole rainbow of choices is out there.

But here’s the thing – WebRTC is still video communications. And it is still done in real time. As such, it isn’t that fond of latencies and surprises. It has mechanisms to deal with them, but at the end of the day, give it a healthy dose of stability and predictability (with a touch of low latency) and it will pay you back in media quality.
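
To make those “mechanisms” a bit more concrete: the main one on the receiving side is the playout (jitter) buffer, which trades a small fixed delay for smooth, in-order media. Here is a minimal sketch of the idea in Python – my own illustration of the concept, not WebRTC’s actual implementation (NetEQ and friends are far more adaptive):

```python
# A toy playout buffer: hold a few packets back so that packets arriving
# out of order (or slightly late) can still be played in sequence.
import heapq

class JitterBuffer:
    def __init__(self, depth=2):
        self.depth = depth  # packets held back before playout starts
        self.heap = []      # min-heap ordered by RTP-style sequence number

    def push(self, seq, payload):
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        # Release the oldest packet only once enough are buffered;
        # out-of-order arrivals within the window come out sorted.
        if len(self.heap) > self.depth:
            return heapq.heappop(self.heap)
        return None  # underrun – a real stack conceals the loss instead

# Packets arriving out of order are still played back in order:
buf = JitterBuffer(depth=2)
for seq in [1, 3, 2, 5, 4, 6]:
    buf.push(seq, b"frame")
    released = buf.pop()
    if released:
        print("play", released[0])
```

The price is the buffering delay itself – which is exactly why the network (and server) underneath should add as little jitter as possible to begin with.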

Virtualization… how would I say it… isn’t that fond of predictability. It likes keeping its options open. It tries to squeeze whatever fits on the CPU to make sure it increases utilization as a whole.

See the contention here? The misalignment in purpose?

WebRTC has a hard time today with smartphones (and desktop browsers) because they run on operating systems that aren’t real time in nature. It is hard to commit to processing frames at 20-millisecond intervals and calculating the required echo cancellation if the operating system oscillates between 15 and 30 milliseconds (or more, if it decides to garbage collect memory somewhere in another application). These same headaches? They exist at a whole different level on the server side – especially if you intend on doing some video processing work there (complex or otherwise).
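
You can see this for yourself with a few lines of code – a rough sketch I am adding for illustration, not a benchmark: ask the operating system for a wake-up every 20 milliseconds and measure how late it actually is. On an idle dedicated machine the numbers stay tiny; on a loaded or virtualized one, the worst case can easily blow through the frame budget.

```python
# Measure scheduler jitter around a 20 ms "frame" interval.
import time

FRAME_INTERVAL = 0.020  # the 20 ms budget a media pipeline expects
SAMPLES = 500

lateness_ms = []
deadline = time.monotonic() + FRAME_INTERVAL
for _ in range(SAMPLES):
    time.sleep(max(0.0, deadline - time.monotonic()))
    now = time.monotonic()
    lateness_ms.append((now - deadline) * 1000.0)  # how late we woke up
    deadline += FRAME_INTERVAL

print(f"average lateness: {sum(lateness_ms) / len(lateness_ms):.3f} ms")
print(f"worst lateness:   {max(lateness_ms):.3f} ms")
```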

Thinking of deploying your server?

Striving for the best media quality possible under the circumstances? Steer away from AWS and cloud/virtualized data centers. Go for dedicated ones.

Think otherwise? Let’s hear your voice in the comments below – as I said, I am not sure I am in agreement with myself on this one.



  1. On this point, I do not fully agree with you. AWS is an amazing and useful platform, as are other virtualization platforms – for instance, OpenStack. They have great potential, and we are seeing a lot of activity around Software-Defined Networking (SDN) and Network Function Virtualization (NFV), so this market cannot be separated from WebRTC. Of course, there are a lot of different architectures and requirements; I am not saying virtualization is the silver bullet, but it is an option to be checked. For example, TeleStax is doing great things on AWS with excellent results: http://www.telestax.com/load-testing-smsc-on-ec2/ and we’ve deployed OCWSC in our own environment with great results. As I’ve said before, IMHO virtualization can be a good idea in some scenarios.

    1. SMS is a simple case – it doesn’t require predictability at the millisecond level the way voice and video do, so I don’t see this as a case for virtualization in WebRTC media. That said, virtualization definitely has its place in most of the scenarios today (and it is growing in that area by the day).

  2. Latency and jitter matter most at the endpoints. If your WebRTC media is peer-to-peer between 2 browsers, you don’t much care if the signalling is delayed slightly in an Amazon VM.

    Even if the media is going through a gateway on a VM (say, being decrypted or something) – you probably don’t mind much, as the network latency and jitter on your wifi or 3G last mile will probably exceed anything EC2 is likely to add.

    If, however, you want to mix WebRTC media with PSTN-originated media, then you may care much more. The PSTN has historically had super accurate clocks (sync’d by GPS these days) and gets all upset by delayed media.

    So – it depends on your use case. Middle boxes are probably fine in the cloud; endpoints are risky but possible if you examine the details correctly.

    1. Tim,

      I think we are on the same page. Doing signaling on virtualized servers is something I think is a definite “must” today. It is easy to do, and it has huge benefits with little in the way of downsides. Media is the challenge here, and there it means TURN, transcoding, routing, mixing, etc.

  3. From a media server perspective, we have been able to make AWS work, but it’s very unreliable if you are using a shared instance – especially the cheapest micro instances. We are finding many optimizations you can do to make it work better.

    Some of our customers have had better results with other hosting providers like Rackspace. The use of shared instances is definitely something that needs to be considered if you care about quality.

    Twilio uses AWS and seems to survive fine, although I do not think they do any server-side video processing.

    1. It would be interesting to see, Chad, how Dialogic will work this out.
      One of the best things going for you is the fact that you are pure software. The next step is probably easy cloud deployment for customers, and from there, full virtualization.

  4. There is a big difference between shared/virtualized instances running AWS’s own OS vs. dedicated server instances (on-demand), also from AWS. We have our own selection of OS and optimized libraries for real-time video services. We had to do the hard work to “eliminate” the unpredictability and maximize performance, to provide the benefits of cloud with the reliability of telco.

      1. Containers are a much better fit than VMs – less latency, because there is no overhead/emulation/extra layers:

        http://www.youtube.com/watch?v=p-x9wC94E38

        I’ve suggested Joyent in the past on your blog; they use containers (or actually Solaris/SmartOS Zones, which are similar).

        http://www.docker.io/ is also getting a lot of adoption; it is a developer-oriented (can I say optimized?) way to develop and deploy with Linux containers: http://www.youtube.com/watch?v=Q5POuMHxW-0

  5. The abstracted point that you make is critical: use-case-specific deployments and no one size fits all. Glad you wrote it up in such a nice manner, as we gloss over it too often.

    My guess is that only a minority of overall WebRTC deployments will need dedicated compute:

    Yes, if you are doing video transcoding and need to keep audio and video streams in sync with minimal frame delays and jitter buffers, especially if you are dealing with lossy networks.

    Likely no, if you aren’t doing video, are doing P2P media, or even if you are *just* switching media streams in the cloud… you would really need to do some benchmarking and use case analysis to determine if dedicated compute is necessary.

    1. Well… almost any use case will need TURN. TURN relays media, and the less jitter it adds to the stream, the better. Routing and switching video is also critical, because it will usually imply multipoint scenarios where, again, there is a need to sync not only a single session but several of them.

      1. I suspect that TURN (shuffling around network packets) has somewhat different requirements here than anything that does mixing.

        I don’t have data to prove it though.
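
        To illustrate the distinction in this thread: a relay in the spirit of TURN just moves datagrams along without touching the payload, while a mixer must decode, combine and re-encode on a strict clock. Below is a minimal sketch of the relay side – my own illustration, not real TURN (the addresses are made up, and a real TURN server also handles allocations, authentication and permissions):

        ```python
        # Forward every incoming UDP datagram, untouched, to a fixed peer.
        import socket

        LISTEN = ("0.0.0.0", 3478)      # hypothetical relay listening address
        PEER = ("198.51.100.7", 40000)  # hypothetical destination peer

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(LISTEN)

        while True:                            # runs until interrupted
            data, _addr = sock.recvfrom(2048)  # one media packet in...
            sock.sendto(data, PEER)            # ...straight back out
        ```

        The relay’s per-packet work is constant and tiny; the mixer’s is not – which is one reason the two tolerate virtualization very differently.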

  6. That’s a pretty broad recommendation. I wouldn’t agree with it. There are different server services involved in setting up a WebRTC call. You should be able to safely host the web pages, the signalling server and a STUN server in the cloud. A TURN server, an MCU and any kind of transcoding/real-time manipulation service are likely better off on a dedicated machine. Remember that low latency is required for the transport of audio and video, not so much for anything else. Most calls should be peer-to-peer, so the speed of the server should eventually not matter.

  7. To test your thesis, I put up a JitMeet site in Azure, with the jitsi-videobridge audio mixer and video conferencing server running in a small VM. I haven’t done extensive load tests, but it seems to work ok for a small number of users. Check it out:
    http://meet.azurewebsites.net/

    Note that the jitsi-videobridge is not a conventional MCU, but rather a forwarding unit (i.e. an RTP translator). Since it just forwards packets, it doesn’t add much latency.

    1. Greg,

      The Kinesis solution, as far as I am aware, is still focused on peer-to-peer and IoT. I am guessing it is best used where you are trying to connect embedded devices and their video streams to the cloud, and less for video conferencing type scenarios – for those it has little appeal at this point in time.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}