Matrix.org and WebRTC: An Interview with Matthew Hodgson

November 13, 2014
Federated signaling for WebRTC. This interview is important to me for several reasons:
  1. I am not a fan of federated signaling. This is why this interview is so important to me – it shows a different way of thinking than my own here in my small corner of the world
  2. It is being developed and put out there by people I have known for several years, outside the scope of WebRTC
Matthew Hodgson is the technical co-founder of Matrix.org, an open source federated signaling standard with an accompanying framework implementation. The idea as well as the execution is really interesting. Here are Matthew’s answers to my questions.

What is Matrix.org all about?

Matrix.org is an ambitious new open source project which defines a simple HTTP API for decentralized federated messaging - be that for WebRTC call setup, or an instant message, or indeed any kind of JSON data structure. Matrix's ambition is to be the missing signaling link for interoperable WebRTC.

Our core mission is to fix the problem of fragmentation between today's VoIP and IM communication silos by providing a really pragmatic solution for interoperable messaging; trying to make VoIP and IM as easy, ubiquitous and flexible as email, whilst reassuringly familiar to a modern web developer. Meanwhile, the bigger vision is to use Matrix's cryptographically secure eventually-consistent distributed message history to provide a robust decentralized messaging layer for the internet of things.

We provide the open Matrix standard, Apache-licensed implementations of our reference Matrix-compatible server (called Synapse), and example client SDKs and apps for the Web, iOS, Android, Perl, Python and more...

Matrix is still evolving at this point - we only launched in September, and we've been publishing our work as we go in the true spirit of open source. However, we're approaching a frozen version of the standard - and Synapse is almost feature complete; all that remains at this point is finishing the authorization layer for federated traffic. We strongly encourage developers to look at the tutorials and the standard; have a play with the APIs; try running your own server; and come give us feedback on #matrix:matrix.org so we can incorporate feedback and give Matrix the best chance of success.
And as always, patches and GitHub pull requests are very welcome :)

What excites you about working in WebRTC?

Ironically, in a previous life the Matrix team started out by building softphone SDKs and VoIP/video stacks - we wrote one of the first (closed source, commercial) SIP softphones for iOS back in 2009, using reSIProcate for the signaling and our own media framework (mxmedia) for all the RTP and realtime media processing. So we're pretty familiar with the idea of WebRTC - and when Google acquired GIPS and released WebRTC it was both good news and bad news. At last there was going to be a good VoIP media stack installed in almost every web browser in the world! Suddenly anyone would be able to build their own VoIP functionality, harnessing the flexibility and frontier spirit of the web itself! But on the minus side, the marketplace for commercial VoIP stacks shrank dramatically - forcing us to look at the bigger problem of communications, rather than the specifics of VoIP.

Nowadays, the excitement of WebRTC is definitely that it solves the hardest piece of the technological problem of rolling out ubiquitous VoIP and IM services by turning almost every browser into a viable endpoint. The fact that it doesn't define any solution to the problem of interoperating between WebRTC services is also exciting, as it gives us an opportunity to help out using Matrix.

Why Matrix? We already have a slew of signaling frameworks out there. What sets Matrix.org apart?

I'm not sure there are /that/ many standard signaling frameworks out there - the only ones which immediately spring to my mind are SIP, XMPP, OpenPeer, FreeSWITCH's mod_verto and some of the XMPP extensions like XMPP-FTW and Buddycloud (I'm hoping nobody's gone and tried to write an H.323 stack in JavaScript...). Obviously there are a bajillion proprietary non-standardized JSON-over-HTTP implementations, which is part of the reason for trying to provide a standard implementation for those who want it...

Matrix sets itself apart by:
  • Entirely distributed architecture.  Messages are replicated over all servers who participate in a conversation with eventual consistency, and cryptographically signed by the origin servers using a blockchain-style model to assert the integrity of the conversation's message graph.   There is no single point of control or failure over a Matrix conversation.  Servers can cache as much or as little of the message history as they like.  This is a huge difference from SIP/MSRP, XMPP MUCs, or conventional monolithic HTTP messaging servers.  It also means we get a 'self-healing' architecture transparently resilient to netsplits and offline/disconnected operation thanks to the eventual consistency.
  • It follows that Message History is a 1st class citizen - not an afterthought.  There is no distinction between sending a message and adding it to the history for a Matrix conversation; it's the same thing.
  • Similarly, it follows that Group Conversation is a 1st class citizen too, and not an afterthought.  1:1 chat is simply a subset of group conversation - it's just a group of 2 people.
  • And again, Open Federation is a 1st class citizen - the entire nature of the platform is built around the concept of a distributed message store that anyone can join.
  • Be as Web-friendly as possible, as befits WebRTC. The mandatory baseline APIs for client-server and server-server interaction are plain old HTTP (HTTPS mandatory for server-server).  To send a message, you PUT its JSON to your Matrix server, which federates it to the destination Matrix server (discovered by SRV DNS record) over HTTP PUT, which relays it to the recipient by returning an HTTP response to a long-lived GET request.  We deliberately keep it as simple and compatible as we possibly can (not even using WebSockets!), whilst leaving the door open for more performant transports in future.
  • Strong crypto. Matrix provides PKI infrastructure for the ecosystem, letting both servers and clients publish public keys to the world.  This means all federation traffic is signed by the originating server, preventing server identity from being spoofed.  It also lets servers sign their messages in the distributed message graph, preventing anyone from tampering with message history.  Finally, it provides the infrastructure for End-To-End encryption, for those who don't trust their servers.
  • Identity Agnostic. Rather than creating yet another mandatory global identity namespace on the net, Matrix assimilates your existing ones.  Users can associate as many 3PIDs (3rd Party IDs: email addresses, MSISDNs, Facebook IDs, etc) with their matrix identity as they like - and then be discovered on Matrix via 3PID.  We think this is the optimal solution for identity, allowing us to piggyback on everyone's existing address books without complicating things even further.
  • Entirely Open.  Matrix is an open standard published currently under the liberal Apache license, and to the best of our knowledge it is not encumbered by any patents.  Matrix.org certainly will not assert any patents over it, as per ASLv2.  Meanwhile, all of our reference code is Apache licensed too, and we expect and encourage everyone to run their own Matrix servers and clients and join the Open Federation that Matrix provides.  It can't get much more open than that!
  • Independent.  We have deliberately released Matrix as a pragmatic working project with the intention of maturing both the standard and reference implementations in parallel as rapidly as possible.  Once things are frozen we'll start looking at how best to handle the custodianship of the standard going forwards, but for now we're independent of IETF, W3C, 3GPP, XMPP or any other official dedicated standards body.
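
The relay path described in the Web-friendly point above (a client PUT to its own server, a server-to-server PUT, then delivery on the recipient's long-lived GET) can be modelled as a toy in-memory sketch. Nothing here is Matrix or Synapse code: `ToyHomeserver`, its method names and the `network` dictionary (standing in for SRV DNS discovery) are all hypothetical, and no HTTP is involved.

```python
from collections import defaultdict, deque


class ToyHomeserver:
    """In-memory stand-in for a homeserver (hypothetical; no HTTP or DNS)."""

    def __init__(self, domain, network):
        self.domain = domain
        self.network = network  # domain -> server; stands in for SRV lookup
        self.inboxes = defaultdict(deque)  # user_id -> events awaiting a poll
        network[domain] = self

    def client_send(self, sender, recipients, event):
        """Models the client's PUT: accept the event, then federate it."""
        stamped = dict(event, sender=sender)
        domains = sorted({user.split(":", 1)[1] for user in recipients})
        for domain in domains:
            local = [u for u in recipients if u.split(":", 1)[1] == domain]
            # Models the server-to-server PUT to the destination server.
            self.network[domain].federation_receive(local, stamped)

    def federation_receive(self, local_users, event):
        """Queue the event for each local recipient."""
        for user in local_users:
            self.inboxes[user].append(event)

    def client_poll(self, user):
        """Models the response to the recipient's long-lived GET."""
        inbox = self.inboxes[user]
        events = list(inbox)
        inbox.clear()
        return events


network = {}
matrix_org = ToyHomeserver("matrix.org", network)
alice_com = ToyHomeserver("alice.com", network)

matrix_org.client_send(
    "@matthew:matrix.org",
    ["@alice:alice.com"],
    {"type": "m.room.message", "content": {"msgtype": "m.text", "body": "hello"}},
)
delivered = alice_com.client_poll("@alice:alice.com")
```

A real deployment would also replicate history and sign events; the sketch only traces the hop-by-hop path of a single message.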
Obviously creating a new standard is always risky and controversial, but after spending the last 10 years building SIP/XMPP/IAX infrastructure we felt we knew many of their limitations, and just as WebRTC gives a clean start for manipulating media in the web browser, we reason that the signalling deserves a clean start too. And so this is our proposal :)

What signaling protocol have you selected for Matrix.org and why?

The client-server API has the mandatory baseline of plain old JSON over HTTPS (or plain HTTP for tinkering). HTTP is incredibly ubiquitous, and thanks to SPDY and HTTP/2 it's even quite performant these days. Especially for WebRTC, we see no reason to shoe-horn a separate signaling stack into your browser when you already have HTTP there by definition.

For instance, to send an IM to a conversation in Matrix, I'd do something like:
curl -XPOST -d '{"msgtype":"m.text", "body":"hello"}' "https://mydomain.com/_matrix/client/api/v1/rooms/ROOM_ID/send/m.room.message?access_token=ACCESS_TOKEN"

# returning:
{ "event_id": "YUwRidLecu" }
This is sending the message "hello" as a plain old UTF-8 text message (msgtype "m.text") to your Matrix homeserver exposed to the net at https://mydomain.com/_matrix. The actual 'event' being injected into Matrix has type "m.room.message" - Matrix supports Java-style namespacing of JSON events, reserving the 'm.' prefix for official specified Matrix events. But you could equally well inject any random type of JSON object - e.g. by PUTting JSON to a URL ending /send/com.mydomain.custom or similar. Finally, the room we send the message into is identified in the URL (here as the ROOM_ID placeholder). And to identify the end-user, each device has an access_token provisioned during login to authenticate them.

To initiate a WebRTC call, I'd do something very similar - but this time sending an m.call.invite event:
curl -XPOST -d '{
  "version": 0,
  "call_id": "12345",
  "offer": {
    "type": "offer",
    "sdp": "v=0\r\no=- 658458 2 IN IP4 127.0.0.1…"
  }
}' "https://mydomain.com/_matrix/client/api/v1/rooms/ROOM_ID/send/m.call.invite?access_token=ACCESS_TOKEN"

# returning:
{ "event_id": "ZruiCZBu" }
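
For readers who would rather script this than type curl, here is a minimal Python sketch that assembles the same two requests without sending them. The URL layout, placeholder values and event bodies are taken from the curl examples above (the SDP string is truncated exactly as it is there); the helper name `build_send_request` is hypothetical and not part of any Matrix SDK.

```python
import json
from urllib.parse import quote, urlencode


def build_send_request(base_url, room_id, event_type, content, access_token):
    """Return (method, url, body) for sending one event into a room.

    Hypothetical helper: it only mirrors the URL shape of the curl examples.
    """
    path = "/_matrix/client/api/v1/rooms/{}/send/{}".format(
        quote(room_id, safe=""), quote(event_type, safe="")
    )
    query = urlencode({"access_token": access_token})
    url = base_url.rstrip("/") + path + "?" + query
    return "POST", url, json.dumps(content)


# The instant message from the first curl example.
message = build_send_request(
    "https://mydomain.com",
    "ROOM_ID",
    "m.room.message",
    {"msgtype": "m.text", "body": "hello"},
    "ACCESS_TOKEN",
)

# The call invite, still carrying SDP in its "offer" field.
invite = build_send_request(
    "https://mydomain.com",
    "ROOM_ID",
    "m.call.invite",
    {
        "version": 0,
        "call_id": "12345",
        "offer": {"type": "offer", "sdp": "v=0\r\no=- 658458 2 IN IP4 127.0.0.1…"},
    },
    "ACCESS_TOKEN",
)
```

Each tuple can be handed to whichever HTTP client you prefer; only the event type in the URL and the JSON body differ between an IM and a call invite.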
As you can see, we're currently using SDP to describe the WebRTC session, although once ORTC is fully adopted we can easily bump the version of our m.call.invite event and send a JSON ORTC descriptor instead :) The WebRTC call setup really is unashamedly simplistic (the current spec lacks a 'ringing' state, for instance, although by the time you read this that may well have been fixed), but we believe it's perfectly adequate. For more details, please see our client-server API tutorial and the standard itself.

Meanwhile, the server-server API is considerably more complicated, but we don't expect anyone other than server implementers to ever need to worry about it. Again, the baseline is plain old HTTP - entirely push-based using POSTs and PUTs between the servers. A fictitious, handmade example is:
curl -XPOST -H 'Authorization: X-Matrix origin=matrix.org,key="898be4…",sig="j7JXfIcPFDWl1pdJz…"' -d '{
    "transaction_id":"916d630ea616342b42e98a3be0b74113",
    "ts": 1413414391521,
    "origin": "matrix.org",
    "destination": "alice.com",
    "prev_ids": ["e1da392e61898be4d2009b9fecce5325"],
    "pdus": [{
        "age": 314,
        "content": {
            "body": "hello world",
            "msgtype": "m.text"
        },
        "context": "!fkILCTRBTHhftNYgkP:matrix.org",
        "depth": 26,
        "hashes": {
            "sha256": "MqVORjmjauxBDBzSyN2+Yu+KJxw0oxrrJyuPW8NpELs"
        },
        "is_state": false,
        "origin": "matrix.org",
        "pdu_id": "rKQFuZQawa",
        "pdu_type": "m.room.message",
        "prev_pdus": [
            ["PaBNREEuZj", "matrix.org"]
        ],
        "signatures": {
            "matrix.org": {
                "ed25519:auto": "jZXTwAH/7EZbjHFhIFg8Xj6HGoSI+j7JXfIcPFDWl1pdJz+JJPMHTDIZRha75oJ7lg7UM+CnhNAayHWZsUY3Ag"
            }
        },
        "origin_server_ts": 1413414391521,
        "user_id": "@matthew:matrix.org"
    }]
}' https://alice.com:8448/_matrix/federation/v1/send/916d630ea616342b42e98a3be0b74113
This shows an event (PDU) being replicated from the matrix.org homeserver to the alice.com homeserver.  The important stuff to notice is:
  • We have a transaction layer that wraps a list of PDUs; the transactions use logical timestamping (the prev_ids field) to refer to the previous transaction in the flow.
  • This contains one PDU: the "m.room.message" event for "hello world" (as a plain UTF-8 instant message - i.e. msgtype of "m.text").
  • The "essential fields" of this event are hashed as sha256.
  • The pointer(s) to the antecedent event(s) in the distributed message graph for the room are included in the event in the prev_pdus array.
  • The message, including these pointers, is then signed by the origin homeserver using an Ed25519 elliptic-curve signature.
  • Interestingly, rather than signing the actual message contents, the signature applies to the message hash instead - this allows event contents to be redacted and discarded from servers without violating the signing guarantees.  Redaction events are equivalent to "recall messages" in email - servers uphold them as best they choose.  This gives us a way for obnoxious content to be nuked without destroying the integrity of a room.
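
The last three points can be illustrated with a small sketch of one possible reading of the scheme: hash the content, sign the graph pointers plus that hash, and observe that dropping the content leaves the signature verifiable. Several things below are assumptions for illustration only: which fields make up the signed envelope, the canonical JSON encoding, the flat `signature` field, and above all the use of HMAC-SHA256 from the Python standard library as a stand-in for the Ed25519 signature the interview describes.

```python
import hashlib
import hmac
import json


def canonical(obj):
    """Deterministic JSON encoding so equal objects hash identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_hash(content):
    """SHA-256 over the event content (assumed hashing rule)."""
    return hashlib.sha256(canonical(content)).hexdigest()


def sign(event, key):
    """Sign the envelope plus the content hash, never the content itself."""
    envelope = {
        "pdu_id": event["pdu_id"],
        "origin": event["origin"],
        "prev_pdus": event["prev_pdus"],
        "hashes": event["hashes"],
    }
    # Stand-in only: HMAC-SHA256 replaces the Ed25519 signature here.
    return hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()


def verify(event, key):
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(event, key), event["signature"])


KEY = b"demo-only-shared-secret"  # hypothetical key, for illustration

event = {
    "pdu_id": "rKQFuZQawa",
    "origin": "matrix.org",
    "prev_pdus": [["PaBNREEuZj", "matrix.org"]],
    "content": {"msgtype": "m.text", "body": "hello world"},
}
event["hashes"] = {"sha256": content_hash(event["content"])}
event["signature"] = sign(event, KEY)

# Redaction: discard the content but keep the hash and the graph pointers.
redacted = {k: v for k, v in event.items() if k != "content"}

# Tampering with the graph pointers invalidates the signature.
tampered = dict(event, prev_pdus=[["someOtherPdu", "matrix.org"]])
```

In this sketch `verify(redacted, KEY)` still succeeds because the signed envelope never included the content, while `verify(tampered, KEY)` fails because the pointers are covered by the signature.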
Backend. What technologies and architecture are you using there?

For Synapse, the current reference Matrix backend, we're using Python 2 and Twisted. This was a conscious decision to use a mature technology - Twisted's been around for 12 years now, and has had most of the kinks knocked out of it. Meanwhile, Python's great for rapid app development and legible code (although in the rush to get Matrix out the door, I'm not sure we'd win any medals for code beauty - but we are going through fixing this currently).

Synapse itself is very much a reference implementation - it's not super-scalable or heavily optimized; instead, it's meant to show relatively coherently how a Matrix homeserver could look. For persistent storage it uses sqlite by default, but in practice any SQL backend could be swapped in. Architecturally, everything is a very straightforward event-driven async model that should be familiar to any Node.js developers out there.

Meanwhile, I'm aware of two other serious implementations of a Matrix homeserver in progress - one in Golang, the other in (modern) Perl 5, although neither has hit GitHub yet. It's incredibly exciting to see the open source community plunging in and trying to implement the backend though!

Where do you see WebRTC going in 2-5 years?

Today WebRTC is a horrible fragmentation of thousands of similar services, none of which can interoperate with one another. Just because someone wants to talk to me using appear.in (for instance) for a simple call doesn't mean that I should be forced to use appear.in if I have my own preferred calling service. We obviously hope that in the years to come, WebRTC will be as ubiquitous and interoperable as email - if I want to video call someone, I shouldn't have to play hide-and-seek on yet another random service that they or I pick, but instead use my favorite.

Meanwhile, it's inevitable that all the browsers will finally implement WebRTC.
With Microsoft having finally committed last week, only Apple remains - and it's very unclear what Apple has to gain by being last to the party. I'm hoping that both VPx and H.26x codecs will be implemented wherever possible, with network-side resources providing transcoding by default alongside the all-important TURN relays. In the end, folks who have good hardware H.26x will clearly want to use it. And folks who don't want to pay H.26x licensing fees will clearly want to use VPx. I don't think we're going to get away from that any time soon, and it's not necessarily a bad thing - competition is good. I suspect there may be some shortcuts that can be taken when transcoding between H.26x and VPx to speed things up a bit, which might make interop more palatable.

With this solved, we may finally see better interoperability between WebRTC and the telco world - as VoLTE and ViLTE get more widely deployed, the ability to interwork the telco and internet ecosystems will be critical, if we're ever to get back to the level of consistent and controllable experience we once enjoyed with the PSTN - or continue to enjoy today on the web and email.

If you had one piece of advice for those thinking of adopting WebRTC, what would it be?

Please don't invent yet another proprietary closed signaling protocol! Use Matrix instead!!

Given the opportunity, what would you change in WebRTC?

I'd try to speed up the standardization process, perhaps by encouraging Google to be more of a benign dictator. I'd also make Matrix a recommended solution for interoperable signaling :)

What’s next for Matrix.org?

As of the beginning of November we are on a mission to close the remaining federation security loopholes and get to the point where we can point the wider open source and VoIP/IM community at running their own Synapses and see what happens.
In parallel, we need to do a bunch of editing of the standard itself to bring it fully up-to-date, consistent, comprehensive and digestible.

Then, for the next few months, we'll be documenting and implementing the Application Service API: a new HTTP API which Matrix servers can expose in order to build more exotic business logic on top of Matrix. For our first cut of Matrix, we concentrated on making the core ecosystem robust and self-contained - but obviously the promise of Matrix is to be interoperability glue between existing silos. For this, we need a way to implement gateways between existing silos and Matrix - as well as developing more exotic behaviors on top of Matrix (e.g. speech recognition, conferencing, etc). Many of these can be done through the existing client-server interface, but some of them require privileged access to the server - the ability to intercept and filter messages or the ability to create rooms of virtual users and messages. For this we are defining the various Application Service APIs - and it's at this point that Matrix could start to get really interesting.

A simple example of a use case for the Application Service API is APNS/GCM push support in Matrix. Currently Synapse doesn't know how to send push to mobile clients, which is obviously a bit of an oversight. However, there's no reason this sort of non-core functionality should be baked into a specific homeserver implementation. By implementing a separate Push Service which talks to your homeserver via the AS API, we can make a nice reusable separate push module which can work against /any/ homeserver.

Finally, we'll be putting a lot of work into making our current demo clients look less ugly - right now the web/iOS/Android apps are deliberately functional, intended for real power users and developers to experiment with Matrix and use as inspiration for their own clients. This should get fixed fairly shortly, however :)
