Whatsapp VoIP Implementation: The Good, the Bad and the Ugly

07/04/2015

Whatsapp soft launched their voice calling feature and it is far from satisfactory.

Almost a year after promising it, Whatsapp finally started rolling out its voice calling feature. There are a great many things to say about it. Some good. Some bad. Some just ugly.

Take the information below with a grain of salt. I live in Israel, which is far from Whatsapp’s target audience, so my guess is the calls I’ve made traveled the world before getting to their destination (which was usually in the same building).

The Good

  • I loved the launch
    • Never really making it official
    • Starting off with opening up the feature if you received a call from someone who has that feature
    • Then allowing people to download the latest version from their website
  • The voice quality
    • Awesome. Made a call with my wife. She immediately said I sound weird. I asked her if she hears me better than on the phone. She said yes. She wasn’t used to HD voice
    • My guess (which is detailed below)? This is WebRTC and the voice codec is Opus. But again – only a guess
  • Works well on LTE with little coverage
    • My smartphone (OnePlus) had 2 bars of LTE cellular coverage
    • It didn’t hurt the call quality as far as I could tell – it worked better than most
  • User profile image of the caller
    • Don’t know why, but I really liked the implementation they did
    • Better than other apps I am using
    • It seems people invest in their Whatsapp profile image for no good reason, and now it actually shows
  • Recent calls
    • This is where the real power lies
    • Once this service becomes widespread and usable, people will find it easier to go to their recent Whatsapp text chats and just call someone from there

The Bad

  • Missing desktop notifications
    • It’s really nice to see calls coming to my phone
    • Would be nice if the web interface notified on incoming calls as well

The Ugly

  • Echo
    • There’s some latency issue there, where a call can get 1-2 seconds of echo easily
    • It sounds bad when it happens, which isn’t every time, but somehow, in most of my calls one of the sides had this nagging echo issue, and it wasn’t something I could nail down to network condition or a specific device

Until they fix this echo issue, the use of Whatsapp for voice for me will be non-existent.

In Israel, Whatsapp is everywhere. On a daily basis, I receive a voice call over Whatsapp from someone just trying to see what that phone icon on the screen does. And then there’s echo on one side of the call. Never on the same side. A missed opportunity.

Is it WebRTC?

I don’t know. When Whatsapp was 500 million users strong, I hypothesized that Whatsapp voice would be based on WebRTC. Whatsapp recently passed the 700 million users mark. I haven’t changed my mind and still believe it is WebRTC based – it is the most logical thing.

What makes WebRTC a better decision is also Whatsapp’s recent foray into the desktop – and its use of WebRTC there. If it is using WebRTC in its mobile app, then adding voice to the browser will be easier to achieve.

iOS First Anyone?

Whatsapp decided to launch voice for Android prior to iOS.

They also decided to launch Whatsapp for web on Chrome first. Then Firefox. And then… wait.

This should be a huge red light for those who believe they had the hegemony up until now – iOS first is the way to go for most apps and developers, but in some cases, it seems that Android works best.

For those interested in VoIP, Android may well be a better platform to start with.

 

Responses

Andy Abramson says:
April 7, 2015

Tsahi,

I use Viber to speak to friends all over the world who also use WhatsApp. No latency, no echo. Given my WhatsApp account is on my iPhone6 and they don’t support simultaneous login, treating each mobile device as its own account endpoint, when I switch phones it confuses people on the other end so I have not tried the service on my OnePlus.

I’ve found echo to be the biggest problem, but often it’s more than the app that causes it. The routing of the call, the mobile operator, WiFi, the wireline provider who is the ISP, even the WiFi router all can be contributors. Standards are supposed to be coming that make WiFi calling better but I’m skeptical.

What WhatsApp/Facebook is seeking to do is be the next Skype, but the old Skype dealt with echo and latency from almost the start, and they were rigorous in their pursuit to keep it as good as any landline. They basically looked at IMS and figured out how to do it with the operator. Until WhatsApp approaches it the same way, it will suffer from what you described.

    Tsahi Levent-Levi says:
    April 7, 2015

    Andy,

    Thanks for the tips. Much appreciated.

    I believe you can do better with such echo on the software side of things – others succeeded, so I see no reason why Facebook/Whatsapp won’t be able to as well. I’ll be checking their voice calling every month or so (or whenever I am called) to see if things change. Here in Israel, Whatsapp is THE messaging system, so if their voice calling service improves, it shouldn’t be hard for them to win considerable chunks of the voice market here as well.

Andy Abramson says:
April 7, 2015

Tsahi,

I agree if they want to. What I call “the ins and outs” of transcoding, network interconnects and device configuration all impact the call quality. Here in the USA I use WiFi calling with T-Mobile and when calling T-Mobile to T-Mobile the call quality is amazing. When a call comes in via Google Voice it drops back to SD over WiFi, and is good. When I move to the mobile network, the call quality degrades because the network doesn’t handle the network changes as the routing is slightly different with respect to how Google’s network sees it. Until things like that are fixed, and usually it’s latency, we will still never have perfection…but we’re getting there.

Carlos says:
April 7, 2015

I’m pretty sure they use Opus [1]. I can’t think of a better option than this in the current situation of the market.

I’d dare to say they use PJSIP too, which in case it’s true, wouldn’t surprise me much.

[1] https://twitter.com/caruizdiaz/status/558102717361831937

    Tsahi Levent-Levi says:
    April 7, 2015

    Carlos,

    Thanks. My bet won’t be on PJSIP as it is known that Whatsapp runs on top of a variant of XMPP, so adding the SIP protocol makes no sense.

      Carlos says:
      April 7, 2015

      Then there must be a non-obvious reason why they initialized PJLIB (see screenshot in my tweet) which is a direct indication of SIP usage.

        Diego says:
        October 13, 2017

        Hi;
        PJLIB can be used without the SIP part.
        It has very good OS abstraction libraries.

Antón says:
April 7, 2015

I was using the service for a while and it’s quite good, enough to be a great success. People don’t care about technology, there is a new button and they can make free calls when they are connected to the Wifi, they don’t need to install anything more on the smartphone (advantage over Skype) and they have most of their contacts (advantage over Hangouts) so it doesn’t matter if the quality of voice is better or not… It worked initially for chat and it’s going to work for voice in the same way.

I bet we’ll see the phone icon in the web very soon… and WebRTC is the logical candidate to implement it.

My only doubt is how Facebook is going to capitalize on this, the infrastructure to support a service like this is huge and nobody was able to earn significant revenues offering free voip calls. I know, “If something is free, you are the product.”… but it’s not so easy.

    Tsahi Levent-Levi says:
    April 7, 2015

    Anton,

    Thanks for commenting. Here in Israel, of ALL the voice calls I’ve made so far on Whatsapp, none has been good enough to justify using it. I am sure this will improve with time, but until it does, I am not going to use it.

    As for capitalizing on it, I am sure Facebook have done their ROI and are aware of the risks and the opportunities involved.

      Antón says:
      April 7, 2015

      Tsahi, you know what Opus means… so you are probably not the Whatsapp target, but you are probably going to use it more than you expect, because some contacts are going to call you using it or they are going to be available ONLY in whatsapp when you are traveling. In my opinion, the success or failure of Whatsapp voice isn’t dependent on the quality of the service but only time will tell.

      Regarding the capitalizing, yes, sure they did but they may be wrong or they can see it as the only opportunity to grow and they are taking risks. I don’t know but I am sure about this, if Whatsapp is able to manage this service, support such a huge number of subscribers and earn money with it, CSPs are going to see an important decline in one of their main revenue sources… and, at least for me, this is bad news.

Zoa says:
April 7, 2015

I wouldn’t call android the best for voip or real time audio in general. They (google) are trying hard to make things better, but nobody can beat the <10ms iOS audio latency and google does not have full control over the manufacturers to do a better job.

Even if you get decent latency, it doesn't mean quality will also be ok, it takes a lot of testing and reworking to get consistent quality on the worst 10% of devices.

I spoke briefly about the subject on elastixworld, this is the video:
https://vimeo.com/channels/896559?ref=tw-share

You can test your phone's audio latency with this app:
https://play.google.com/store/apps/details?id=com.zoiper.audiolatency.app&hl=en

Zoa (Zoiper CTO)

    Tsahi Levent-Levi says:
    April 7, 2015

    Zoa,

    Thanks – I’ll be checking that video soon.

    The point I was trying to convey is the fact that in the last two months or so I’ve been noticing a couple of companies who roll out Android-first features and apps – where the main theme is the use of VoIP technology in them. It is counterintuitive compared to most other trends I am monitoring in this space, so I tried to explain it from the information I have.

      Zoa says:
      April 7, 2015

      Tsahi,

      I think the complexity for the implementation of the VoIP part is about the same on the different platforms. Most companies will use off the shelf cross platform SDKs (we make one of those SDKs).

      I don’t know what the reason is they would go for android first, better examples possibly, but I presume it has more to do with the android market share and the price of the devices.

      The apps will typically pass small scale beta testing, but when deployed to a larger number of devices (there are about 7000 devices listed in google play now), a lot of solutions will suffer from bad quality. The bad quality can in many cases be worked around with bigger buffers, but latency is already bad so that is not the best option. Hardcoding settings for a specific device is tricky as well, many devices and firmware combinations possible.
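The buffer-versus-latency trade-off described above comes down to simple arithmetic: every frame held in a buffer is time the listener waits. The following is a minimal sketch of that calculation, not code from any SDK. The function name and the buffer sizes are hypothetical and chosen only for illustration; they are not measurements from any device.

```python
def buffer_latency_ms(frames_per_buffer: int, sample_rate_hz: int,
                      buffer_count: int = 1) -> float:
    """Return the playout time held in `buffer_count` audio buffers, in ms."""
    if frames_per_buffer <= 0 or sample_rate_hz <= 0 or buffer_count <= 0:
        raise ValueError("all arguments must be positive")
    return 1000.0 * frames_per_buffer * buffer_count / sample_rate_hz


# Illustrative buffer sizes at a 48 kHz sample rate (assumed values).
for frames in (240, 960, 3840):
    latency = buffer_latency_ms(frames, 48_000)
    print(f"{frames} frames -> {latency} ms of added delay")
```

Larger buffers give the audio path more slack to ride out scheduling hiccups, but the delay they add is paid on every call, on top of whatever the device and network already contribute.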

      The latency issue is discussed in this famous ticket:
      https://code.google.com/p/android/issues/detail?id=3434

      On ios, there are some issues too, but the number of devices and firmware versions to test on is limited.

      Cheers!

Dean Bubley (@disruptivedean) says:
April 8, 2015

I suspect there’s a pretty strong correlation between Whatsapp usage and Android ownership. On iOS & within iPhone-heavy social groups, a certain proportion of “basic” messaging probably uses iMessage instead.

Also, iPhones are often used by high end post-paid contract users, who have bundled SMS (& often unlimited voice call minutes). Whatsapp on Android probably has a higher % of prepay, lower-end users, who use up their PAYG credit per-message/minute.

ie voice telephony arbitrage opportunities are probably (on average) more attractive to Android users

    Tsahi Levent-Levi says:
    April 8, 2015

    Thanks Dean.

    I am noticing this as well to some extent here, but Israel is slightly different:

    Whatsapp is THE messaging app. You can’t live an orderly social life (especially with kids or while at school) without having Whatsapp. Here it is not a matter of phone type or social grouping – it is just mandatory.

    To top it off, in Israel, the situation has moved to unlimited, low, flat rates for SMS, voice and data. It hasn’t changed the behavior of people at all.

Florian Overkamp (@florianoverkamp) says:
April 8, 2015

If it is indeed WebRTC under the hood (would make sense!) there is ample opportunity to minimize latency, since audio streams should be able to go peer to peer given the proper NAT-traversal methods. It would also open up the possibility to add video at some point.

As for the echo problems: echo is not caused by latency, but it is amplified by it. If an echo trace exists, for example because of the phone form factor or audio settings, it will become much much worse as the latency increases.
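The distinction between echo level and echo delay can be made concrete with a toy model. This is only a sketch under simplifying assumptions: the echo is treated as a single attenuated, delayed copy of the talker's own signal, and the function names and sample values are made up for illustration. It shows that the energy of the echo depends on the gain of the echo path, not on the delay. The delay only moves the echo later in time, which generally makes it easier to hear as a separate repeat.

```python
def add_echo(signal, delay_samples, echo_gain):
    """Return `signal` mixed with one delayed, attenuated copy of itself."""
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    # Pad the output so the delayed copy has room to finish.
    mixed = list(signal) + [0.0] * delay_samples
    for index, sample in enumerate(signal):
        mixed[index + delay_samples] += echo_gain * sample
    return mixed


def echo_component(signal, delay_samples, echo_gain):
    """Return only the echo part of the mix (the mix minus the original)."""
    mixed = add_echo(signal, delay_samples, echo_gain)
    padded = list(signal) + [0.0] * delay_samples
    return [m - p for m, p in zip(mixed, padded)]


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


# A made-up short signal; the same echo gain at two different delays.
talker = [1.0, -0.5, 0.25, 0.0]
short_delay = echo_component(talker, delay_samples=1, echo_gain=0.5)
long_delay = echo_component(talker, delay_samples=3, echo_gain=0.5)
print(energy(short_delay) == energy(long_delay))
```

In this model, reducing the echo means lowering the gain of the echo path (which is what an echo canceller attempts), while reducing latency only makes whatever echo remains less objectionable.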

It would be cool if it were possible to use the phone’s hardware (echo cancellation, codecs) for this problem, but I haven’t seen any softclient that does that. I’d suspect there are no viable APIs to access that hardware.

    Tsahi Levent-Levi says:
    April 8, 2015

    Florian, thanks.

    From impressions and comments I received via Facebook, the implementation most probably isn’t based on WebRTC – I’d love to understand why.

    The echo thing isn’t specific to a device or even a device type, so I can’t really say what goes on there – just that it sucks.

    As far as I know, the hardware echo cancellation isn’t accessible via APIs to most (all?) handset devices out there.

    That said, Whatsapp (and Facebook) should be able to fix these issues – I can’t see them leaving matters as bad as they are for long.

Vipul Rastogi says:
April 8, 2015

Any thoughts on how Facebook can monetize voice calling? Or video calling in the future? Do you experts see WhatsApp also moving into the voice/video conference space?

    Tsahi Levent-Levi says:
    April 10, 2015

    Not sure. I guess Facebook is more concerned at the moment with increasing the engagement time of users than anything else. Getting voice and video chats in there is part of it.

josefina says:
April 9, 2015

That’s interesting, information about Whatsapp calling completely slipped my attention. I will give it a try but I assume it will be the same as Facebook: the service is not bad but there are still better applications I would use instead.

    Tsahi Levent-Levi says:
    April 9, 2015

    Thanks Josefina.

    I had a colleague tell me her sons were ecstatic about this feature – they liked it a lot more than using Skype – even if the quality is worse at the moment. Go figure teenagers.

Alex Cohn says:
April 10, 2015

There is a strong reason to believe that they use Opus: methods exported by /data/data/com.whatsapp/lib/libwhatsapp.so include

Java_com_whatsapp_util_OpusPlayer_allocateNative()

and some others.

Alex Cohn says:
April 10, 2015

Further analysis: even though the same library includes strings

Using WebRTC AEC
Using WebRTC AECM
Using WebRTC AGC
Using WebRTC noise suppression
Using WebRTC high-pass filtering

these patterns do not match the strings that appear in Chrome or in WebRTC source tree. So, probably they use parts of WebRTC code integrated into their app. Less likely that their implementation is compatible with WebRTC clients.
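For readers who want to repeat this kind of check, the following is a minimal sketch of the approach: pull runs of printable ASCII out of a binary (much like the Unix `strings` tool) and look for library names in them. The function names and the marker list are hypothetical, and the sample bytes are synthetic stand-ins built from the strings quoted in these comments, not data read from the real library.

```python
import re


def printable_strings(data: bytes, min_length: int = 4):
    """Yield runs of printable ASCII that are at least `min_length` long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_length).encode() + rb",}")
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def find_markers(data: bytes, markers):
    """Map each marker to the extracted strings containing it (case-insensitive)."""
    hits = {marker: [] for marker in markers}
    for text in printable_strings(data):
        lowered = text.lower()
        for marker in markers:
            if marker.lower() in lowered:
                hits[marker].append(text)
    return hits


# Synthetic bytes standing in for a native library (assumption, not real data).
sample = (
    b"\x00\x01Java_com_whatsapp_util_OpusPlayer_allocateNative\x00"
    b"\xffUsing WebRTC AEC\x00abc\x00"
)
hits = find_markers(sample, ["opus", "webrtc", "pjlib"])
for marker, found in hits.items():
    print(marker, found)
```

To run it against a real file, the bytes could be read with `open(path, "rb").read()`. A matching string is only a hint that some code from a library was linked in; as noted above, it does not show that the app is wire-compatible with that library's usual clients.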

    Tsahi Levent-Levi says:
    April 10, 2015

    Thanks Alex.

    It strikes me as weird that they take this route with Whatsapp but the native WebRTC route with Facebook Messenger.

      Alex Cohn says:
      April 13, 2015

      “native WebRTC route with Facebook Messenger” is on desktop, and uses the browser infrastructure, isn’t it? But on Android, they may have chosen to distribute a minimal size, and cut out whatever possible. For one, the JavaScript API is of no use there.

        Tsahi Levent-Levi says:
        April 13, 2015

        I am not a purist. If they built it on the browser with WebRTC, it is most likely that they used the same type of codecs, signaling and network stack for their mobile app. This translates into WebRTC everywhere for Facebook.

Philipp Hancke says:
April 10, 2015

This comment box is too narrow to contain my comments.

    Tsahi Levent-Levi says:
    April 10, 2015

    Fippo,

    Just for you, I am willing to give the alternative of writing a full length comment and I’ll publish it as a guest post 🙂

What's up with WhatsApp and WebRTC? - webrtcHacks says:
May 16, 2015

[…] This spurred some interest in the WebRTC world with the usual suspects like Tsahi Levent-Levi chiming in and starting a heated debate. Unfortunately, the comment box on Tsahi’s BlogGeek.Me blog was […]

Joe bliw says:
October 27, 2015

What do you expect m??? As far as QoS goes it’s hit and miss for all free uncontrolled VoIP no matter where you live or what so called app service you use. If you want guaranteed voice quality…. Then make a call via a telco (even then it can be dodgy)

The world has been duped by all the Vibers, Skypes, Facetimes, Whatsapps and so on. You are using the internet… There is no control over packets between endpoints. It’s hit and miss if it works. The sooner you come to terms with this the sooner sites like this will stop wasting everyone’s time by publishing rubbish about the pros and cons of something that is fundamentally flawed regardless of what the app is. There is no such thing as a free lunch. If you want a comms app that can be supported and has a chance of working then you need to pay for a service like Cisco Webex….

    Tsahi Levent-Levi says:
    October 28, 2015

    Joe,

    You are missing the point. I know VoIP quite well. Have been working in this field for most of my life, and used it for the past 15+ years on a daily basis. Whatsapp’s implementation at that point in time was useless for Israeli customers – their servers were located too far away and calls were relayed by default (at least at the beginning of the call).

    To get VoIP working properly there’s a lot more required than just slapping a server or two on the network – the actual way the deployment is placed and the care and attention given to the use case and various scenarios play a significant role in the quality of experience users will get.

joseph says:
July 19, 2018

Tsahi,

Perhaps you can assist with my query, you seem to be an expert in these implementations.

When on a phone call, my bluetooth headset’s noise cancellation works a treat, and I can take conference calls with no hassle.

However, when calling via whatsapp, telegram, or google duo, the noise cancellation function seems not to be present.

Where is the restriction here? Do you have another web based dialling solution that can still work?

The headset is a popular plantronics model. Android phone.

    Tsahi Levent-Levi says:
    July 19, 2018

    Joseph,

    You will need to go with this to the specific VoIP provider.

    In general, echo cancellation and noise cancellation are implemented in software when it comes to VoIP applications. Their implementations vary in quality.

    Input devices can also come with their own noise and echo cancellation capabilities – just choose one that offers the quality you are looking for.
