Can we learn from the history of email about the future of WebRTC?
[Chad Hart is someone you should know already. He is one of the authors of webrtcHacks. He was kind enough to share his views about the possible future of WebRTC federations.]
There has been a lot of discussion lately about WebRTC and federation. Since WebRTC does not specify signaling, whether a WebRTC service interoperates with other services is entirely up to the provider. This is in contrast to how the traditional telecom world operates, where a common signaling protocol is mandatory across every device that connects to the network. This mandatory protocol means that, in theory, any phone can plug into any telco provider and any telco provider can talk to any other telco provider, since they all speak a common language.
With the exception of a few vendors like Hookflash and &yet, multi-site federation for WebRTC has not been a design goal of the vast majority of WebRTC startups and web-oriented companies. The telco-oriented companies have a slightly different perspective, but they do not really need WebRTC to federate since they can already do that in their networks with the PSTN.
This got me thinking – are there other examples of widely used communications services that are federated over the Internet? Is the universally federated nature of the PSTN really just an anomaly due to its unique history that dates back to the late Victorian era? Or do networks naturally want to merge due to Metcalfe’s law?
The most prevalent federated communications system on the Internet I could come up with is email. Would a federated WebRTC network end up like today’s email systems? Is this the future of the PSTN post-analog/TDM?
Similarities
The PSTN certainly has a significantly longer and more complex history than email, but telephony actually has a lot in common with email. Both have a universal addressing system – phone numbers for the PSTN and the user@domain format for email. Both are globally federated – I can email anyone who has an “email address” just like I can call anyone who has a phone number.
Both are also based on principles from previous service paradigms. Today’s VoIP networks do not look anything like the analog systems from 100 years ago, yet they still use the same concepts of “calling”, “ringing”, “hang-up”, etc. Likewise, email leverages many principles of the postal system that came before it – after all, it is electronic “mail” with terms like “mailboxes”, “address”, message “envelope”, and the Post Office Protocol (POP).
Figure 1 Basic Email architecture
Figure 2 Simplified, Telco-oriented view of WebRTC federation
Email – from silos to ubiquity
Internet email has certainly evolved. Although federated internet email started in the ARPANET days, widespread usage really began in closed corporate systems like VAX. These eventually led to LAN-based systems like Lotus Notes and Banyan VINES. Walled garden systems like CompuServe and AOL also offered email for consumers. Eventually, and gradually, these systems started to federate until they all came to support SMTP, allowing easy emailing over the Internet. Expanding the range of who could be emailed helped continue its growth. Later, email was incorporated into the web, initially with services like Hotmail.com, where federation was assumed from day one and most newcomers never knew the silo approach.
Today federated email is free and ubiquitous – most people have multiple email addresses and getting more email identities is a non-issue. Email is also embedded everywhere – my WiFi router has email alerts. I can get email notifications of my social networking updates. Email has become a basic feature of almost every application.
PSTN-telephony is certainly trending in the same direction. WebRTC makes PSTN-access over the web possible, in the same way early web-mail systems freed email from desktop-based applications. With “click-to-call” buttons showing up more often, the PSTN is also becoming embedded in more applications. WebRTC will make this much easier.
Is PSTN-telephony going to continue down this path with a system to federate WebRTC clients? What would it mean if it did?
Implications of this analogy
Email technology is so ubiquitous and mature that adding it to an application is no big deal. While WebRTC has many issues that need to be worked out, it does represent a major maturation point for VoIP telephony. It is good enough to open source. It is mature enough to be added to nearly a billion browsers already. Millions of web developers can make use of it in short order. And just like email, if something is free, available everywhere, easy to embed, and easy to use, it will get used.
Who can provide email services? Pretty much anyone – loading up an SMTP server is pretty simple in the realm of IT tasks today. Every ISP has a mail server, as do most major businesses and pretty much anyone with a significant user base that sends email.
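To give a sense of how little it takes to embed email in an application, here is a minimal sketch using only Python's standard library. The addresses, the subject line, and the `localhost:25` relay are illustrative assumptions, not anything taken from a real deployment.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text email message (nothing is sent here)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an SMTP relay. Host and port are assumptions."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Hypothetical router-style alert; the addresses are placeholders.
alert = build_alert(
    "router@example.net", "admin@example.net", "Link down", "WAN link is down."
)
```

Calling `send(alert)` would deliver it through whatever relay is listening at the assumed host and port. The point is only that the whole feature fits in a couple of dozen lines.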
Who makes money off of email? Almost no one. Google, Yahoo, and Microsoft have figured out how to monetize email as a service. Everyone else just offers it as an amenity – like your ISP, as a feature of a larger platform or ecosystem – like Apple iCloud, or as an embedded feature – like your favorite UC provider.
What does this mean for telecom service providers? If WebRTC goes this route then it will mean web-based telephony will be everywhere, but only a few providers will figure out how to make money off of it. This will be bad news for the approximately 2000 PSTN service providers out there.
What about the vendors to these telecom service providers? Well, there will still be huge demand for WebRTC technology but the customer base is going to look a lot different. WebRTC will mostly be an underlying feature, not a stand-alone product. The market will evolve in weird and unexpected ways once various niches get a hold of it.
Could WebRTC really end up like Email?
Does WebRTC need to be federated – no. Will a way to do broad-scale federation eventually emerge for those that want it – likely yes. Will telcos use WebRTC as a user access mechanism and federate using existing mechanisms on the back-end – they are already actively experimenting with this.
Can the PSTN be compared to email? It is certainly not a perfect analogy. The PSTN has a heavy regulatory legacy. Email was never proposed as a way for citizens to receive emergency services. Let’s also not forget that the quick and universally federated nature of email has led to massive spamming. The PSTN has never really experienced widespread spam issues, both because of the relative expense to the spammer and because of laws preventing it. It will be interesting to see how these changes are addressed as the PSTN dissolves into the web.
So this makes me think, even if it is possible, perhaps inevitable, that there will be some degree of WebRTC federation to supersede the PSTN, is that something we really want?
Don’t worry, it is inevitable. I would be surprised if it didn’t happen.
Why do you think the telecom providers, Google, Microsoft with Lync/Skype and all the other players are in this game? It’s disruptive technology.
You don’t think Google wants to be your next telephone-number ? They already have a presence on/in the operating system of many of the phones.
There are people who are part of the IETF RTCWeb working group that are working on this.
Eric Rescorla, the guy who writes all the security-related drafts for WebRTC for the working group, has this: http://tools.ietf.org/html/draft-ietf-rtcweb-security-arch-07#section-5.6
Mozilla has an implementation of that: https://air.mozilla.org/intern-presentation-seys/
There are telephony providers in my country that I could use that can route my own mobile telephone-number directly to my own VoIP-server.
So I’m already free to put the infrastructure in place to route all the legacy-callers to my email address so to say. And have my email address point to all my devices.
So really ? How difficult do you think this is going to be ?
Does this sound expensive? I think most of it will be available for free, built in to your operating system and your browser.
Mozilla has something called the Social API. Do you know what that is ? That’s a contact-list: https://blog.mozilla.org/blog/2013/02/24/webrtc-ringing-a-mobile-phone-near-you/
You know things will change radically if the CTO of the FCC does a talk at the IETF.
His slides are:
http://www.ietf.org/proceedings/86/slides/slides-86-iab-techplenary-4.pptx
Audio recording of his talk is at 28:22 of this file:
http://www.ietf.org/audio/ietf86/ietf86-caribbean4-20130311-1730-pm3.mp3
For example did you know POTS is getting more and more expensive ?
It’s not really surprising that Henning Schulzrinne talks at the IETF. He’s been doing that for over 20 years 🙂
But does he also talk about POTS and the PSTN being replaced with VoIP and WebRTC, with only one network (the Internet) remaining?
Very nice post!
In my view, one of the strong points of WebRTC is that it does not mandate any specific signalling protocol, minimizing the need for standardization, giving more freedom to innovation.
However, it would be great if we could get a standard mechanism to let any WebRTC end-user talk to any other, independently of the domain the user belongs to (similar to email and PSTN, as you mention).
In my opinion, this should be achieved by ensuring that:
1- required standardization is kept at a minimum ie standard signalling protocols should not be required;
2- full WebRTC features are supported
3- security mechanisms must be in place to avoid email problems like spam
4- required additional costs should not jeopardize innovation from emerging Startups ie back-end solution should be kept simple
5- application development is possible by any Web Programmer
This is quite a challenge but it is still possible to achieve!!
If you’re to have standardized federation, you’re going to need a standardized signalling mechanism – and standardized transport, and standardized identity. Or at least, a heck of a lot of bespoke gateways if you don’t.
Luckily, the folks over at http://xmpp.org/ have been doing that for several years, and &Yet and ESTOS demonstrated a federated voice and video solution based around XMPP, Jingle, and WebRTC. Philipp Hancke, who’s behind the ESTOS implementation, and Lance Stout, behind the &Yet implementation, talked at WebRTCCamp about desktop interoperability, security, and more.
Oh, and because it’s XMPP, your address will look (almost) exactly like an email address. (XMPP’s addresses – known as jids – are fully unicode unlike email, and miss out the more obscure syntax rules, but otherwise much the same).
It’s not *quite* a solved problem – there’s some WebRTC features that don’t yet map cleanly to XMPP’s Jingle – but you’re all welcome to join the mailing list at [email protected] and help build the future.
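To illustrate how close a jid is to an email address, here is a rough sketch that splits one into its parts. It only does the structural split (optional local part, domain, optional resource) and deliberately skips the Unicode preparation and validation rules that real XMPP libraries apply. The `Jid` type, the `parse_jid` name, and the example address are made up for illustration.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    local: Optional[str]     # the part before "@", if any
    domain: str              # always present
    resource: Optional[str]  # the part after the first "/", if any


def parse_jid(text: str) -> Jid:
    """Split a jid of the form [local@]domain[/resource].

    Simplified: no Unicode normalisation, no length or character checks.
    """
    # The resource is everything after the first "/".
    bare, slash, resource = text.partition("/")
    if slash and not resource:
        raise ValueError("empty resource part")

    # The local part, if present, ends at the first "@" of the bare jid.
    if "@" in bare:
        local, _, domain = bare.partition("@")
        if not local:
            raise ValueError("empty local part")
    else:
        local, domain = None, bare

    if not domain:
        raise ValueError("empty domain part")
    return Jid(local, domain, resource if slash else None)
```

With this sketch, `parse_jid("juliet@example.com/balcony")` yields the local part `juliet`, the domain `example.com`, and the resource `balcony`. Drop the resource and what is left reads exactly like an email address.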
I don’t expect that XMPP will be used.
Have a look at the first 2 links I posted in the comments.
Those can also be made to work outside of the browser.
Sure. Jingle calls could be made on Nokia’s n900 out of the box back in 2009, as well as previous “non-phone” devices like the n800 in 2007. Google’s Hangouts actually use Jingle (and XMPP) under the hood on Android, too.
No browsers were harmed in the making of this call.
That is not what I meant. I meant that a PSTN-like WebRTC will probably NOT use XMPP, but will use an email address or an email-address-like identifier ( [email protected] ).
Do have a look at the first 2 links I mentioned in the first comment.
Dave,
I’m not sure if standardized federation is synonymous with universal interoperability, but you can ensure interoperability with no standard signalling transport.
Right now, it is already possible to have anyone talking to each other even if they are registered in different domains that have no agreements to interoperate or any common server-to-server protocol or gateway.
The calling party just have to have a URL from the called party to download a Web App and .. boom … they can talk, chat, see each other. And that’s it. No standard signalling involved. Both parties “only” have to be connected via the same signalling server that, in this case, is provided by the called party. This is really a new paradigm that won’t be easily embraced by well established mind sets and, of course there are some issues to solve with this approach. One of them is that the user experience is set by the called party which, in some use cases, is nice, but in others is not acceptable.
Currently we (Portugal Telecom) are partnering with Deutsche Telekom in a R&D experimental project (WONDER – Webrtc interOperability tested in coNtradictive DEployment scenaRios), partially funded by the European Commission, to address these issues and others with some new concepts. One of them is “signalling on the fly” ie you download and instantiate different signalling javascript libs according to your needs and interests.
We are going to present initial results in the “Telecom APIs” conference in London (12 Nov) and in the “WebRTC Conference” in Paris (10-12 Dec)
I know Jingle quite well (I developed a proof-of-concept SIP-Jingle GW almost 10 years ago) and I like it. But for some use cases it may not be the most appropriate protocol. It is “just” another protocol like SIP. Agreeing on a single “signalling protocol” will take more time and effort than we can afford. And in the end it won’t address all use cases in the most appropriate way.
Let’s keep it open and let service providers and developers decide which signaling protocol to use.
PCh
So what you’ve done there is established a signalling protocol of “send someone a URI”.
And yes, you could send that by email, or indeed any other transmission path you happen to have. But if you want something ubiquitous, like an email address, but instant, like IM, then there’s precious few choices.
You can wander up and down the stack a bit – send a signalling library URI instead of an app URI, and establish a standard signalling API instead of a protocol, but there’s a standard somewhere – it just depends where you want to put it.
As you might have noticed, I think Eric Rescorla incorporated an important part that I think the other solutions don’t yet have.
Authentication of the end points.
WebRTC has encryption in the form of SRTP and allows for doing proper authentication. So you know who you are talking to (and you can send media directly between these peers) and it has been confirmed by the organization that controls the domain.
At least have a look at the presentation from the Mozilla intern Ryan Seys.
I’m aware of the use of DTLS, and I’ve a reasonable understanding. I’m not sure, though, what that has to do with my comment.
Actually, SRTP has been in Jingle for some time – we’ve typically used ZRTP, but DTLS isn’t dramatically harder to support. I think we added SRTP in around mid 2008, actually. (Philipp tells me it was the end of July 2008).
XMPP’s authentication model is hop-by-hop, so I can know what domain the other endpoint is, and in practise domains enforce user security. When it comes to media channels, though, we can use the same authentication as WebRTC uses – after all, they’re the same data channel. We do have a little more security in how we transmit the signalling data, though, as compared to a blind throwing of data – and moreover, we actually have an authentication model.
This is important, because given your identifier I can be reasonably – not absolutely, but reasonably – sure that a message I send to you will only go to you. This includes both text chat, presence, and Jingle signalling traffic. And that means that I also have a reasonable assertion that the DTLS fingerprint given in the SDP (Okay, so now it’s Jingle) is almost certainly enough to authenticate you.
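As a rough illustration of that last point: once the SDP (or its Jingle equivalent) has arrived over a channel you trust, checking the peer comes down to comparing the fingerprint it carried with a hash of the certificate actually presented in the DTLS handshake. The sketch below assumes SHA-256 and that the caller already has the peer certificate as DER bytes. The function names are made up, and the test data is not a real certificate.

```python
import hashlib
import hmac
from typing import Optional, Tuple

PREFIX = "a=fingerprint:"


def fingerprint_from_sdp(sdp: str) -> Optional[Tuple[str, str]]:
    """Return (hash name, fingerprint) from the first a=fingerprint line."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            algo, _, value = line[len(PREFIX):].partition(" ")
            return algo.lower(), value.strip().upper()
    return None


def certificate_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 over the DER certificate, as colon-separated uppercase hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def peer_matches(sdp: str, der_bytes: bytes) -> bool:
    """True if the signalled fingerprint matches the presented certificate."""
    found = fingerprint_from_sdp(sdp)
    if found is None or found[0] != "sha-256":
        return False  # this sketch only handles SHA-256
    return hmac.compare_digest(found[1], certificate_fingerprint(der_bytes))
```

The comparison itself is trivial. Everything rests on how much you trust the path that delivered the fingerprint, which is why the authentication model of the signalling layer matters.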
Yes, it would be a client side standard API (aligned with current WebRTC API where the Identity stuff plays a key role) and not a standard protocol. In future, the WebRTC API could evolve in this direction with minimum costs for browser and back-end server vendors.
“The calling party just have to have a URL from the called party to download a Web App and .. boom … they can talk, chat, see each other.”
And they use what to transmit it? The assumption is that the transport of the url is fairly realtime. What if the caller leaves before the callee accepts? How is the caller notified that the callee rejects the invitation. What if the caller leaves and the identity connected to the URL is assigned to someone else? How does the caller reclaim that identity if he comes back later?
For the majority of use-cases you need a bidirectional channel between caller and callee which is near-realtime. And if you have that, you can run signalling over that channel.
“Both parties “only” have to be connected via the same signalling server […] This is really a new paradigm that won’t be easily embraced by well established mind sets”
No, it’s a fairly old one. It is called a SILO. Or walled garden if you’re from an older generation.
Actually, WebRTC can be done federated just fine.
Also, there is a person in the IETF RTCWeb working group from AT&T who works on push notification.
So calling a person without sending them a prior message should be possible that way.
“And they use what to transmit it? The assumption is that the transport of the url is fairly realtime.”
The URL would be part of your “signature” ie it would be an Identity Management issue.
“What if the caller leaves before the callee accepts? How is the caller notified that the callee rejects the invitation.”
As soon as the signalling channel is established between the two peers you can exchange all messages needed to control the call including “cancel”, “reject”, etc
“No, it’s a fairly old one. It is called a SILO. Or walled garden if you’re from an older generation.”
Not really, if peers involved in a conversation don’t have to have an account in the same domain or service provider. And, yes, I’m old enough to know what a SILO and Walled Garden is and I also don’t like them 🙂
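The control messages mentioned above can be pictured as a small state machine running over whatever channel links the two peers. This is only a sketch: “cancel” and “reject” come from the discussion, while “invite”, “accept”, “hangup” and the state names are assumptions added to make the example complete.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    RINGING = "ringing"
    ACTIVE = "active"
    ENDED = "ended"


# (current state, message) -> next state
TRANSITIONS = {
    (CallState.IDLE, "invite"): CallState.RINGING,
    (CallState.RINGING, "accept"): CallState.ACTIVE,
    (CallState.RINGING, "reject"): CallState.ENDED,  # callee declines
    (CallState.RINGING, "cancel"): CallState.ENDED,  # caller gives up first
    (CallState.ACTIVE, "hangup"): CallState.ENDED,
}


def apply_message(state: CallState, message: str) -> CallState:
    """Advance the call state, refusing messages that make no sense here."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"unexpected {message!r} in state {state.value}") from None
```

For example, a caller who leaves before the callee answers sends “cancel” while the call is still ringing, and both sides end up in the ended state.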
paulo:
@part-of-your-signature: yes, that is the more useful kind of meet-me url. It requires strong identity and you can not simply generate one on the fly.
@silo: you’re not an identity silo anymore, right. Instead, you’ve morphed into a call data record silo. At least, the user can change that without changing identity, so it might be a step in the right direction.
Let’s have a chat about that in Paris 🙂
Philipp, not sure what you mean by “you’ve morphed into a call data record silo” but, yes, let’s keep talking in Paris or somewhere else. You may also contact me at [email protected]
@Philipp on the issue of “call data record silo” have you ever looked at Persona/BrowserID ?
It seems to me that the ultimate destination of WebRTC is browser-to-browser calling with video. It also seems the natural path for the technology is thru the E-mail client.
Yes? No?
Bob,
I am not sure that is correct. It depends a lot on progress in push notifications in web browsers as well as mobile adoption. Another question is where you draw the line between what is and isn’t WebRTC… porting it to a mobile device and embedding it inside an app – is that the same thing? If it is, then no email necessary.