popexpert and WebRTC: An Interview With Jeremy Thomas

04/04/2013

If you are looking at the video experts market, then popexpert is for you.

There are several services out there that focus on bridging the gap between experts and their customers. popexpert is one of them. The idea behind it is an elegant one: if you need an expert in any area, search the site, schedule a meeting, and open your browser when the time comes – the service will take care of the rest, from setting up an account to scheduling to billing by the hour.

Jeremy Thomas

I was really happy when Jeremy Thomas, CTO and Co-Founder at popexpert, took the time to guide me through their service and answer my questions.

 

What is popexpert all about?

popexpert has a core mission of inspiring and enabling lifelong learning.  This mission is manifested through technology that makes experts accessible to people, regardless of geographical boundaries.  Booking time with an amazing expert is as easy as booking a meeting through your favorite calendaring system.  And we think convenience is very important, which is why popexpert seamlessly integrates scheduling, conversation and payment.

 

How did you come up with the idea for such a service?

My co-founder, Ingrid, came up with the idea after pulling back from her technology career and taking a six-month sabbatical.  During this time, she learned meditation and nutrition, and was inspired to make the experts she learned from accessible to others.

 

popexpert screenshot

What role does WebRTC play in popexpert?

WebRTC is the audio/video standard we implemented to provide live, one on one video sessions between experts and learners.  While lifelong learning is our purpose, we’re in the business of selling airtime with experts.  So, in a way, we’re selling WebRTC sessions.

 

Why not use Flash or just Skype for that matter?

We initially tried Flash but found it to be too CPU-intensive for our purposes.  Peer to peer connections, which is what most of our users establish, tend to be data-intensive.  Skype might be our temporary fallback if WebRTC doesn’t gain mass adoption, but we’re bullish here.  We wanted a solution that worked within the popexpert experience.  Plus, we have logic baked into our video session page that records session metrics.  These help us diagnose the efficacy of each session and prompt billing action after a session has ended.  Billing scenarios can be complicated.  Did the learner show up?  How about the expert?  How long did the session last?  With Skype, answering these questions would have been much more difficult.
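The billing questions above (who showed up, and for how long) could in principle be answered from join and leave timestamps recorded on the session page. The following is only a minimal sketch under that assumption; the event shape and the `summarizeSession` name are hypothetical and are not popexpert's actual metrics code.

```javascript
// Hypothetical event shape (an assumption, not from the interview):
//   { role: "expert" | "learner", type: "join" | "leave", at: <ms timestamp> }
function summarizeSession(events) {
  const sorted = [...events].sort((a, b) => a.at - b.at);
  const present = { expert: false, learner: false };
  const showedUp = { expert: false, learner: false };
  let overlapStart = null; // time at which both parties became present
  let sharedMs = 0;        // total time both parties were present together

  for (const e of sorted) {
    const wasBoth = present.expert && present.learner;
    present[e.role] = e.type === "join";
    if (e.type === "join") showedUp[e.role] = true;
    const isBoth = present.expert && present.learner;

    if (!wasBoth && isBoth) overlapStart = e.at;
    if (wasBoth && !isBoth) {
      sharedMs += e.at - overlapStart;
      overlapStart = null;
    }
  }
  // An overlap that is still open (no "leave" recorded) is not counted.
  return {
    expertShowedUp: showedUp.expert,
    learnerShowedUp: showedUp.learner,
    sharedMs,
  };
}
```

A billing step could then charge only for `sharedMs` and treat a missing `learnerShowedUp` or `expertShowedUp` as a no-show case.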

 

What do you use for the front end of the solution?

popexpert is built in Ruby on Rails.  Anyone with Chrome or IE with ChromeFrame (which we prompt IE users to install) can use our WebRTC video technology.  And anyone with any browser can use all other parts of the site.  The requirement for Chrome or ChromeFrame on our video page has been nice for us as engineers as it allows us to fully utilize HTML5.  Examples of that include video mirroring and full-screen video.

 

What goes on in the backend?

We use WebSockets for signaling.  Given that we’re a relatively small company, I didn’t want to have to stand up and support our own web socket server nor worry about adding the security layer on top of that.  So we went with a third party service called Pusher, which we use for WebRTC signaling and text chat.  Pusher plays nicely with SSL and has a callback feature that allows us to authenticate users before they gain access to a specific channel.

As far as WebRTC goes, the majority of our connections are peer to peer, which keeps our costs down.  But we set up a TURN server, specifically the open source rfc5766-turn-server project hosted on Google Code (https://code.google.com/p/rfc5766-turn-server/), to broker audio and video when firewalls prevent peer-to-peer communication.  It took us a while to figure out how to configure it to work with WebRTC, but we managed to do it after perusing WebRTC forums.  If anyone’s interested, here’s our configuration:

turnserver -o -X xx.xxx.xx.x --no-tls --no-dtls -a -b turnuserdb.conf -f -r popexpert.com

Security is important to us, so we built a simple Sinatra app that populates the “turnuserdb.conf” file with temporary credentials for each user that enters a video session.  The app calls the “turnadmin” command, which encodes the temporary password and adds the record to the file for us.  Those credentials are then passed into our “iceServers” list when we initialize a peer connection from JavaScript.  Here’s how our ice servers are configured:

{iceServers:[{ url: "turn:" + turnUserName + "@" + turnServer, credential: turnPassword },{url: "stun:stun.l.google.com:19302"}]}

“turnUserName”, “turnServer” and “turnPassword” are all passed into our JavaScript constructor when the video session page loads. We use Google’s STUN server as a backup, and found that, unless we had our TURN server listed first here, Chrome would never use it.
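The configuration above embeds the user name in the TURN URI and uses the `url` key, which is the form Chrome accepted at the time. The standardized `RTCIceServer` dictionary uses `urls` with separate `username` and `credential` fields, so an equivalent setup today might look like the sketch below. The `buildRtcConfig` helper name is hypothetical; the TURN-first ordering simply mirrors what the interview describes.

```javascript
// Builds an RTCPeerConnection configuration with the TURN entry listed first,
// using the current RTCIceServer fields (urls, username, credential).
// buildRtcConfig is an illustrative name, not part of popexpert's code.
function buildRtcConfig(turnUserName, turnServer, turnPassword) {
  return {
    iceServers: [
      // TURN relay with the temporary credentials issued for this session.
      { urls: "turn:" + turnServer, username: turnUserName, credential: turnPassword },
      // Public STUN server kept as a backup, as in the original snippet.
      { urls: "stun:stun.l.google.com:19302" },
    ],
  };
}
```

In a browser, the result would be passed straight to the constructor, for example `new RTCPeerConnection(buildRtcConfig(turnUserName, turnServer, turnPassword))`.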

When the video session ends, our Rails app makes a call to the Sinatra app, which in turn removes the credentials from turnuserdb.conf.  Here’s a diagram illustrating these steps:

popexpert - interaction diagram

 

What worked and what didn’t with the WebRTC integration?

We initially implemented WebRTC via OpenTok.  I love what those guys are doing.  Their bread and butter is in the many to many space, and they had set flags in their implementation for WebRTC to operate via many to many.  Chrome 24 shipped with a bug that broke OpenTok’s WebRTC implementation (audio would fall out after a short period of time).  And they couldn’t fix the problem until Chrome 25 shipped.

Waiting for a new release of Chrome to resolve bugs would kill our business.  So we decided to roll our own implementation of WebRTC.  The difficulty there was in figuring out which reference implementation to use. Some of the demos from the HTML5Rocks article (http://www.html5rocks.com/en/tutorials/webrtc/basics/) are dated.  The apprtc implementation is good, but it’s built with the assumption that person A calls person B.  In our world, we know a session is scheduled to happen between person A and B, but we don’t know which one will get there first, meaning we don’t know who the initiator is.  So we had to build logic into our signaling system where one user is dynamically designated as the initiator.
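The interview does not say how the initiator is chosen, so the following is only one possible approach. It assumes both clients can see the same two participant IDs on the signaling channel, and it picks the ID that sorts first, so both sides reach the same answer regardless of who arrived first. The `isInitiator` name and the ID format are hypothetical.

```javascript
// Returns true if the local user should create the WebRTC offer.
// Assumption: both clients see the same participant IDs (as strings)
// on the signaling channel once both have joined.
function isInitiator(localId, participantIds) {
  const unique = [...new Set(participantIds)];
  // Not ready until exactly two distinct participants are present,
  // one of which is the local user.
  if (unique.length !== 2 || !unique.includes(localId)) return false;
  // Deterministic tie-break: the ID that sorts first initiates.
  return unique.sort()[0] === localId;
}
```

Each client would call this whenever the participant list changes, and only the one that gets `true` would go on to create and send the offer.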

The next challenge was figuring out how to get the TURN server to work.  There was no one authoritative source showing how to configure WebRTC on the client-side to work with TURN, nor was there a source showing how to configure the RFC5766 project to work with WebRTC.  Getting that to work required a bit of digging.

Finally, we’ve had intermittent issues with audio where it doesn’t work bi-directionally while video works every time.  I think I’ve figured out how to resolve that problem (it looks like a race condition – you need to wait a bit after you call “.addStream” on your peer connection to add your local stream before sending an offer), but that required a lot of trial and error.  Really what it boils down to is that WebRTC is so new, and there are so few reference implementations, you’re on your own to resolve issues like these.
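The workaround described above (pause after adding the local stream, then send the offer) might be sketched as follows. This is a hedged illustration, not the code popexpert shipped: it uses the promise-based `createOffer` and `setLocalDescription` of current browsers rather than the callback style of 2013, the `settleMs` default is a guess rather than a measured value, and `addStream` is the legacy call named in the interview (newer code would typically use `addTrack`).

```javascript
// Resolves after the given number of milliseconds.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Adds the local stream, waits briefly, then creates and applies an offer.
// `pc` is expected to behave like an RTCPeerConnection.
// `settleMs` is an arbitrary placeholder, not a recommended value.
async function addStreamThenOffer(pc, localStream, settleMs = 500) {
  pc.addStream(localStream);           // legacy API mentioned in the interview
  await delay(settleMs);               // give the connection time to register the stream
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  return offer;                        // the caller sends this over the signaling channel
}
```

A fixed delay only papers over a race, so it is worth treating as a stopgap until the underlying ordering problem is understood.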

 

Given the opportunity, what would you change in WebRTC?

Cross-browser interoperability will be huge for us.  I’m excited by what I see coming down the pipeline between Firefox and Chrome.  But I’d love to see a more authoritative reference architecture that illustrates how to set up a TURN server, which technologies are good for signaling, etc.

 

What’s next for popexpert?

Next up is focusing on building our marketplace.  We may bring WebRTC to mobile natively.  But we’re more likely to wait for mobile browsers to catch up.  We’d also love to partner with Google as a reference implementation.

I’d like to thank Jeremy for his time and effort on this one. If you’d like to try out popexpert, you can use their invite link to do so.

The interviews are intended to give different viewpoints than my own – you can read more WebRTC interviews.

Responses

Scott colesworthy says:
April 4, 2013

Fabulous article. Does anyone know if the VP8 implementation, in the simplest environment of one Chrome browser communicating with another Chrome browser, supports 100% reliable full duplex? Meaning both parties can talk continuously at the same time and there are no audio dropouts due to things like echo cancellation getting activated?

    Tsahi Levent-Levi says:
    April 4, 2013

    Scott,

    VP8 is a video codec. The questions you ask are around the voice codec – and more precisely about the algorithms and voice engine around it.

    The voice engine used in Chrome comes from Google’s acquisition of GIPS, which at the time of acquisition was one of the best in the market.

    I hope this helps…
    Tsahi

      Scott colesworthy says:
      April 5, 2013

      This is even more complicated than I thought. Is the voice codec choice separate from the video codec choice? In WebRTC, does the browser choose its own voice codec? Or is the GIPS-based voice codec assumed to always be linked with the VP8 video codec?

      Do you know someone who is an expert on this issue and on the full duplex capability of the GIPS-based audio codec?

      Thank you.

        Tsahi Levent-Levi says:
        April 6, 2013

        Scott,

        This is relatively unrelated to this post. I’ll try to follow up on it with you via email.
