Matrix.org and WebRTC: An Interview with Matthew Hodgson

By Tsahi Levent-Levi

November 13, 2014

Federated signaling for WebRTC.

This interview is important to me for several reasons:

I am not a fan of federated signaling. This is why this interview is so important to me – it shows a different way of thinking than my own here in my small corner of the world.
It is being developed and put out there by people I have known for several years, outside the scope of WebRTC.

Matthew Hodgson is the Technical co-founder of Matrix.org. And Matrix.org is an open source federated signaling standard with an accompanying framework implementation. The idea as well as the execution is really interesting.

Here are Matthew’s answers to my questions.

What is Matrix.org all about?

Matrix.org is an ambitious new open source project which defines a simple HTTP API for decentralized federated messaging – be that for WebRTC call setup, or an instant message, or indeed any kind of JSON data structure. Matrix’s ambition is to be the missing signaling link for interoperable WebRTC.

Our core mission is to fix the problem of fragmentation between today’s VoIP and IM communication silos by providing a really pragmatic solution for interoperable messaging; trying to make VoIP and IM as easy, ubiquitous and flexible as email, whilst reassuringly familiar to a modern web developer. Meanwhile, the bigger vision is to use Matrix’s cryptographically secure eventually-consistent distributed message history to provide a robust decentralized messaging layer for the internet of things.

We provide the open Matrix standard, Apache-licensed implementations of our reference Matrix-compatible server (called Synapse), and example client SDKs and Apps for the Web, iOS, Android, Perl, Python and more…

Matrix is still evolving at this point – we only launched in September, and we’ve been publishing our work as we go in the true spirit of open source. However, we’re approaching a frozen version of the standard – and Synapse is almost feature complete; all that remains at this point is finishing the authorization layer for federated traffic. We strongly encourage developers to look at the tutorials and the standard; have a play with the APIs; try running your own server; and come give us feedback on #matrix:matrix.org so we can incorporate feedback and give Matrix the best chance of success.

And as always, patches and github pull requests are very welcome 🙂

What excites you about working in WebRTC?

Ironically, in a previous life the Matrix team built softphone SDKs and VoIP/video stacks – we wrote one of the first (closed source, commercial) SIP softphones for iOS back in 2009, using reSIProcate for the signaling and our own media framework (mxmedia) for all the RTP and realtime media processing. So we’re pretty familiar with the idea of WebRTC – and when Google acquired GIPS and released WebRTC it was both good news and bad news.

At last there was going to be a good VoIP media stack installed in almost every web-browser in the world! Suddenly anyone would be able to build their own VoIP functionality, harnessing the flexibility and frontier spirit of the web itself! But on the minus side, the marketplace for commercial VoIP stacks shrank dramatically – forcing us to look at the bigger problem of communications, rather than the specifics of VoIP.

Nowadays, the excitement of WebRTC is definitely that it solves the hardest piece of the technological problem of rolling out ubiquitous VoIP and IM services by turning almost every browser into a viable endpoint. The fact that it doesn’t define any solution to the problem of interoperating between WebRTC services is also exciting, as it gives us an opportunity to help out using Matrix.

Why Matrix? We already have a slew of signaling frameworks out there. What sets Matrix.org apart?

I’m not sure there are /that/ many standard signaling frameworks out there – the only ones which immediately spring to my mind are SIP, XMPP, OpenPeer, FreeSWITCH’s mod_verto and some of the XMPP extensions like XMPP-FTW and Buddycloud (I’m hoping nobody’s gone and tried to write an H.323 stack in JavaScript…). Obviously there are a bajillion proprietary non-standardized JSON-over-HTTP implementations, which is part of the reason for trying to provide a standard implementation for those who want it…

Matrix sets itself apart by:

Entirely distributed architecture. Messages are replicated over all servers who participate in a conversation with eventual consistency, and cryptographically signed by the origin servers using a blockchain-style model to assert the integrity of the conversation’s message graph. There is no single point of control or failure over a Matrix conversation. Servers can cache as much or as little of the message history as they like. This is a huge difference from SIP/MSRP, XMPP MUCs, or conventional monolithic HTTP messaging servers. It also means we get a ‘self-healing’ architecture transparently resilient to netsplits and offline/disconnected operation thanks to the eventual consistency.
It follows that Message History is a 1st class citizen – not an afterthought. There is no distinction between sending a message and adding it to the history for a Matrix conversation; it’s the same thing.
Similarly, it follows that Group Conversation is a 1st class citizen too, and not an afterthought. 1:1 chat is simply a subset of group conversation – it’s just a group of 2 people.
And again, Open Federation is a 1st class citizen – the entire nature of the platform is built around the concept of a distributed message store that anyone can join.
Be as Web-friendly as possible, as befits WebRTC. The mandatory baseline APIs for client-server and server-server interaction are plain old HTTP (HTTPS mandatory for server-server). To send a message, you PUT its JSON to your Matrix server, which federates it to the destination Matrix server (discovered by SRV DNS record) over HTTP PUT, which relays it to the recipient by returning an HTTP response to a long-lived GET request. We deliberately keep it as simple and compatible as we possibly can (not even using WebSockets!), whilst leaving the door open for more performant transports in future.
Strong crypto. Matrix provides PKI infrastructure for the ecosystem, letting both servers and clients publish public keys to the world. This means all federation traffic is signed by the originating server, preventing server identity from being spoofed. It also lets servers sign their messages in the distributed message graph, preventing anyone from tampering with message history. Finally, it provides the infrastructure for End-To-End encryption, for those who don’t trust their servers.
Identity Agnostic. Rather than creating yet another mandatory global identity namespace on the net, Matrix assimilates your existing ones. Users can associate as many 3PIDs (3rd Party IDs: email addresses, MSISDNs, Facebook IDs, etc) with their matrix identity as they like – and then be discovered on Matrix via 3PID. We think this is the optimal solution for identity, allowing us to piggyback on everyone’s existing address books without complicating things even further.
Entirely Open. Matrix is an open standard published currently under the liberal Apache license, and to the best of our knowledge it is not encumbered by any patents. Matrix.org certainly will not assert any patents over it, as per ASLv2. Meanwhile, all of our reference code is Apache licensed too, and we expect and encourage everyone to run their own Matrix servers and clients and join the Open Federation that Matrix provides. It can’t get much more open than that!
Independent. We have deliberately released Matrix as a pragmatic working project with the intention of maturing both the standard and reference implementations in parallel as rapidly as possible. Once things are frozen we’ll start looking at how best to handle the custodianship of the standard going forwards, but for now we’re independent of IETF, W3C, 3GPP, XMPP or any other official dedicated standards body.
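To make the replicated message graph and its "self-healing" behaviour a little more concrete, here is a toy Python model. It is only a sketch under assumptions: the `Event` class, the `heads` and `merge` helpers and the event ids are invented for illustration, and it leaves out signing, hashing and everything else a real homeserver would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    prev_ids: tuple  # ids of the events this one directly follows
    body: str


def heads(events):
    """Return the ids of events that no other known event points back to."""
    referenced = {p for e in events.values() for p in e.prev_ids}
    return sorted(eid for eid in events if eid not in referenced)


def merge(a, b):
    """Union of two servers' copies of the room (events are immutable)."""
    merged = dict(a)
    merged.update(b)
    return merged


# Both servers start from the same root event.
root = Event("e1", (), "hello")
server_a = {"e1": root}
server_b = {"e1": root}

# During a netsplit each side keeps appending independently.
server_a["e2"] = Event("e2", ("e1",), "sent via server A")
server_b["e3"] = Event("e3", ("e1",), "sent via server B")

# When the link returns, each side takes in the other's events.
server_a = merge(server_a, server_b)
server_b = merge(server_b, server_a)

# The next event points at every current head, re-joining the two branches.
joined = Event("e4", tuple(heads(server_a)), "back together")
server_a[joined.event_id] = joined
```

Because merging is just a set union over immutable events, the order in which servers exchange history does not matter, which is the property that eventual consistency relies on in this toy model.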

Obviously creating a new standard is always risky and controversial, but after spending the last 10 years building SIP/XMPP/IAX infrastructure we felt we knew many of their limitations, and just as WebRTC gives a clean start for manipulating media in the web browser, we reason that the signalling deserves a clean start too. And so this is our proposal 🙂

What signaling protocol have you selected for Matrix.org and why?

The client-server API has the mandatory baseline of plain old JSON over HTTPS (or plain HTTP for tinkering). HTTP is incredibly ubiquitous, and thanks to SPDY and HTTP/2 it’s even quite performant these days. Especially for WebRTC, we see no reason to shoe-horn a separate signaling stack into your browser when you already have HTTP there by definition.

For instance, to send an IM to a conversation in Matrix, I’d do something like:

curl -XPOST -d '{"msgtype":"m.text", "body":"hello"}' "https://mydomain.com/_matrix/client/api/v1/rooms/ROOM_ID/send/m.room.message?access_token=ACCESS_TOKEN"

# returning:
{ "event_id": "YUwRidLecu" }

This is sending the message “hello” as a plain old UTF8 text message (msgtype “m.text”) to your Matrix homeserver exposed to the net at https://mydomain.com/_matrix. The actual ‘event’ being injected into Matrix has type “m.room.message” – Matrix supports Java-style namespacing of JSON events, reserving the ‘m.’ prefix for official specified Matrix events. But you could equally well inject any random type of JSON object – e.g. by PUTting JSON to a URL ending /send/com.mydomain.custom or similar. Finally, the room we send the message into is identified in the URL (here as the ROOM_ID placeholder). And to identify the end-user, each device has an access_token provisioned during login to authenticate them.
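For readers who would rather script this than use curl, the same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: `build_send_request` and `is_reserved_type` are hypothetical helper names, the `Content-Type` header is an assumption the curl example does not set, and the request is only built here, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def is_reserved_type(event_type):
    """True for types in the 'm.' namespace reserved for specified Matrix events."""
    return event_type.startswith("m.")


def build_send_request(homeserver, room_id, event_type, content, access_token):
    """Build (but do not send) the POST that injects one event into a room."""
    url = "{}/_matrix/client/api/v1/rooms/{}/send/{}?access_token={}".format(
        homeserver.rstrip("/"),
        quote(room_id, safe=""),       # room ids may contain '!' and ':'
        quote(event_type, safe=""),
        quote(access_token, safe=""),
    )
    body = json.dumps(content).encode("utf-8")
    # Content-Type is an assumption of this sketch, not taken from the curl call.
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


request = build_send_request(
    "https://mydomain.com", "ROOM_ID", "m.room.message",
    {"msgtype": "m.text", "body": "hello"}, "ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it; a custom event would only need a different `event_type`, such as `com.mydomain.custom`.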

To initiate a WebRTC call, I’d do something very similar – but this time sending an m.call.invite event:

curl -XPOST -d '{
  "version": 0,
  "call_id": "12345",
  "offer": {
    "type" : "offer",
    "sdp" : "v=0\r\no=- 658458 2 IN IP4 127.0.0.1…"
  }
}' "https://mydomain.com/_matrix/client/api/v1/rooms/ROOM_ID/send/m.call.invite?access_token=ACCESS_TOKEN"

# returning:
{ "event_id": "ZruiCZBu" }

As you can see, we’re currently using SDP to describe the WebRTC session, although once ORTC is fully adopted we can easily bump the version of our m.call.invite event and send a JSON ORTC descriptor instead 🙂

The WebRTC call setup really is unashamedly simplistic (the current spec lacks a ‘ringing’ state, for instance, although by the time you read this that may well have been fixed), but we believe it’s perfectly adequate.
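When the invite is built in code rather than by hand, the SDP line breaks are the main thing to get right. The sketch below is an illustration only: `build_call_invite` is a hypothetical helper, and the SDP string is a shortened placeholder rather than a valid session description. It shows a JSON encoder taking care of the `\r\n` escaping that the curl example writes out manually.

```python
import json


def build_call_invite(call_id, sdp, version=0):
    """Content of an m.call.invite event, mirroring the fields shown above."""
    return {
        "version": version,
        "call_id": call_id,
        "offer": {"type": "offer", "sdp": sdp},
    }


# Placeholder SDP: in a browser this would come from the WebRTC offer.
sdp = "v=0\r\no=- 658458 2 IN IP4 127.0.0.1\r\n"
invite = build_call_invite("12345", sdp)

# json.dumps escapes the CR/LF pairs, so the payload stays on one line.
wire = json.dumps(invite)
print(wire)
```

Bumping `version` is the hook mentioned above for switching to a different session descriptor later.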

For more details, please see our client-server API tutorial and the standard itself.

Meanwhile, the server-server API is considerably more complicated, but we don’t expect anyone other than server implementers to ever need to worry about it. Again, the baseline is plain old HTTP – entirely push-based using POSTs and PUTs between the servers. A fictitious, handmade example is:

curl -XPOST -H 'Authorization: X-Matrix origin=matrix.org,key="898be4…",sig="j7JXfIcPFDWl1pdJz…"' -d '{
    "transaction_id":"916d630ea616342b42e98a3be0b74113",
    "ts": 1413414391521,
    "origin": "matrix.org",
    "destination": "alice.com",
    "prev_ids": ["e1da392e61898be4d2009b9fecce5325"],
    "pdus": [{
        "age": 314,
        "content": {
            "body": "hello world",
            "msgtype": "m.text"
        },
        "context": "!fkILCTRBTHhftNYgkP:matrix.org",
        "depth": 26,
        "hashes": {
            "sha256": "MqVORjmjauxBDBzSyN2+Yu+KJxw0oxrrJyuPW8NpELs"
        },
        "is_state": false,
        "origin": "matrix.org",
        "pdu_id": "rKQFuZQawa",
        "pdu_type": "m.room.message",
        "prev_pdus": [
            ["PaBNREEuZj", "matrix.org"]
        ],
        "signatures": {
            "matrix.org": {
                "ed25519:auto": "jZXTwAH/7EZbjHFhIFg8Xj6HGoSI+j7JXfIcPFDWl1pdJz+JJPMHTDIZRha75oJ7lg7UM+CnhNAayHWZsUY3Ag"
            }
        },
        "origin_server_ts": 1413414391521,
        "user_id": "@matthew:matrix.org"
    }]
}' https://alice.com:8448/_matrix/federation/v1/send/916d630ea616342b42e98a3be0b74113

This shows an event (PDU) being replicated from the matrix.org homeserver to the alice.com homeserver. The important stuff to notice is:

We have a transaction layer that wraps a list of PDUs; the transactions use logical timestamping (the prev_ids field) to refer to the previous transaction in the flow.
This contains one PDU: the “m.room.message” event for “hello world” (as a plain UTF-8 instant message – i.e. msgtype of “m.text”).
The “essential fields” of this event are hashed as sha256.
The pointer(s) to the antecedent event(s) in the distributed message graph for the room are included in the event in the prev_pdus array.
The message, including these pointers, is then signed by the matrix.org homeserver using an Ed25519 elliptic-curve signature.
Interestingly, rather than signing the actual message contents, the signature applies to the message hash instead – this allows event contents to be redacted and discarded from servers without violating the signing guarantees. Redaction events are equivalent to “recall messages” in email – servers uphold them as best they choose. This gives us a way for obnoxious content to be nuked without destroying the integrity of a room.
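The hash-then-sign idea can be illustrated with a few lines of Python. This is a simplified sketch and not the actual Matrix algorithm: which fields count as "essential", the sorted-key JSON encoding and the hex digest are all assumptions made for the example, and the Ed25519 signing step itself is left out because the standard library has no Ed25519 implementation.

```python
import hashlib
import json


def canonical(obj):
    """Deterministic JSON bytes (sorted keys, no extra whitespace)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_hash(event):
    """SHA-256 over the event, excluding the hash and signature fields."""
    unhashed = {k: v for k, v in event.items() if k not in ("hashes", "signatures")}
    return hashlib.sha256(canonical(unhashed)).hexdigest()


def signing_payload(event):
    """Bytes a server would sign: the stored hash plus everything except content."""
    essential = {k: v for k, v in event.items() if k not in ("content", "signatures")}
    return canonical(essential)


def redact(event):
    """Drop the message content but keep the hash that commits to it."""
    redacted = dict(event)
    redacted["content"] = {}
    return redacted


event = {
    "pdu_id": "rKQFuZQawa",
    "origin": "matrix.org",
    "pdu_type": "m.room.message",
    "prev_pdus": [["PaBNREEuZj", "matrix.org"]],
    "content": {"body": "hello world", "msgtype": "m.text"},
}
event["hashes"] = {"sha256": content_hash(event)}

before = signing_payload(event)
after = signing_payload(redact(event))
print(before == after)  # a signature over these bytes survives redaction
```

Since the signed bytes cover the stored hash rather than the content, a server can discard the content and the signature still verifies, while anyone holding the original content can still check it against the hash.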

Backend. What technologies and architecture are you using there?

For Synapse, the current reference Matrix backend, we’re using Python 2 and Twisted. This was a conscious decision to use a mature technology – Twisted’s been around for 12 years now, and has most of the kinks knocked out of it by now. Meanwhile, Python’s great for rapid app development and legible code (although in the rush to get Matrix out the door, I’m not sure we’d win any medals for code beauty – but we are going through fixing this currently).

Synapse itself is very much a reference implementation – it’s not super-scalable or heavily optimized; instead, it’s meant to show relatively coherently how a Matrix homeserver could look. For persistent storage it uses sqlite by default, but in practice any SQL backend could be swapped in.

Architecturally, everything is a very straightforward event-driven async model that should be familiar to any Node.js developers out there.

Meanwhile, I’m aware of two other serious implementations of a Matrix Homeserver in progress – one in Golang, the other in (modern) Perl 5, although neither has hit github yet. It’s incredibly exciting to see the open source community plunging in and trying to implement the backend though!

Where do you see WebRTC going in 2-5 years?

Today WebRTC is a horrible fragmentation of thousands of similar services, none of which can interoperate with one another. Just because someone wants to talk to me using appear.in (for instance) for a simple call doesn’t mean that I should have to be forced to use appear.in if I have my own preferred calling service. We obviously hope that in the years to come, WebRTC will be as ubiquitous and interoperable as email – if I want to video call someone, I shouldn’t have to play hide-and-seek on yet another random service that they or I pick, but instead use my favorite.

Meanwhile, it’s inevitable that all the browsers will finally implement WebRTC. With Microsoft having finally committed last week, only Apple remains – and it’s very unclear as to what Apple has to benefit by being last to the party. I’m hoping that both VPx and H.26x codecs will be implemented wherever possible, with network-side resources providing transcoding by default alongside the all-important TURN relays. In the end, folks who have good hardware H.26x will clearly want to use it. And folks who don’t want to pay H.26x licensing fees will clearly want to use VPx. I don’t think we’re going to get away from that any time soon, and it’s not necessarily a bad thing – competition is good. I suspect there may be some shortcuts that can be taken when transcoding between H.26x and VPx to speed things up a bit, which might make interop more palatable.

With this solved, we may finally see better interoperability between WebRTC and the Telco world – as VoLTE and ViLTE get more deployed, the ability to interwork the telco and internet ecosystems will be critical, if we’re ever to get back to the level of consistent and controllable experience we once enjoyed with the PSTN – or continue to enjoy today on the web and email.

If you had one piece of advice for those thinking of adopting WebRTC, what would it be?

Please don’t invent yet another proprietary closed signaling protocol! Use Matrix instead!!

Given the opportunity, what would you change in WebRTC?

I’d try to speed up the standardization process, perhaps by encouraging Google to be more of a benign dictator. I’d also make Matrix a recommended solution for interoperable signaling 🙂

What’s next for Matrix.org?

As of the beginning of November we are on a mission to close the remaining federation security loopholes and get to the point where we can point the wider open source and VoIP/IM community at running their own Synapses and see what happens.

In parallel, we need to do a bunch of editing of the standard itself to bring it fully up-to-date, consistent, comprehensive and digestible.

Then, for the next few months, we’ll be documenting and implementing the Application Service API: a new HTTP API which Matrix servers can expose in order to build more exotic business logic on top of Matrix. For our first cut of Matrix, we concentrated on making the core ecosystem robust and self-contained – but obviously the promise of Matrix is to be interoperability glue between existing silos. For this, we need a way to implement gateways between existing silos and Matrix – as well as developing more exotic behaviors on top of Matrix (e.g. speech-recognition; conferencing; etc). Many of these can be done through the existing client-server interface, but some of them require privileged access to the server – the ability to intercept and filter messages or the ability to create rooms of virtual users and messages. For this we are defining the various Application Service APIs – and it’s at this point that Matrix could start to get really interesting.

A simple example for a use case of the Application Service API is APNS/GCM push support in Matrix. Currently Synapse doesn’t know how to send push to mobile clients, which is obviously a bit of an oversight. However, there’s no reason this sort of non-core functionality should be baked into a specific homeserver implementation. By implementing a separate Push Service which talks to your homeserver via the AS API, we can make a nice reusable separate push module which can work against /any/ homeserver.

Finally, we’ll be putting a lot of work into making our current demo clients look less ugly – right now the web/iOS/Android apps are deliberately functional, intended for real power users and developers to experiment with Matrix and use as inspiration for their own clients. This should get fixed fairly shortly, however 🙂

Philipp Hancke says:

November 13, 2014 at 6:05 pm

> after spending the last 10 years building SIP/XMPP/IAX infrastructure we felt we knew many of their limitations,

I’d love to hear about the perceived limitations of XMPP, in particular for federation. And how you’re going to avoid the problem of people not understanding x509, making things like DANE and POSH necessary.

Good luck and be wary when Google offers to join. What they did to xmpp actually showed that federated technologies are not enough.

1. Matthew Hodgson says:
  
  November 14, 2014 at 11:47 pm
  
  Good questions 🙂 Sorry for delay in response – have been travelling back from TADSummit and wanted to try to write a comprehensive answer. The main limitations of XMPP that pushed us towards Matrix were:
  
  * We wanted an architecture where group communication has no single point of control or failure. XMPP MUCs depend on a single chat server, with no intrinsic support for horizontal scalability (especially over federation). A single server crash or disconnect should never be sufficient to destroy a room – and rooms should be able to recover gracefully after netsplits or partial data loss.
  
  * Running a clientside XMPP stack feels unnecessary in a browser which already has an HTTP stack, especially given how performant SPDY and HTTP/2 are. Whilst XMPP-FTW provides a REST mapping for XMPP, you still end up converting everything through into XMPP in the end.
  
  * The baseline featureset of XMPP is deliberately too minimal for many use cases (e.g. no group chat; no history; no reliable message delivery; no specific support for mobile use cases…) – and whilst there are obviously many XEPs, this introduces fragmentation as you can’t guarantee that any given client or server will actually support the extensions and provide a good federation experience. For instance, we believe message history should be a fundamental feature for any modern communication app – whereas it’s very much an optional after-thought on XMPP.
  
  * We wanted to bake in cryptographic strong identity for both servers & clients from the outset, to avoid identity and history spoofing, support end-to-end crypto and provide good foundations for avoiding spam.
  
  * Jingle feels unnecessarily complicated for the task of setting up WebRTC calls, and has not seen huge uptake.
  
  * JIDs haven’t taken over the world; we wanted an architecture which primarily used 3rd party IDs (email addresses, MSISDNs, Facebook IDs etc) which get mapped through to relatively opaque internal identifiers… rather than relying on the success and ubiquity of a global identifier namespace like JIDs or SIP URIs.
  
  * JSON is perhaps a more convenient data representation for current web developers than XML.
  
  In the end, XMPP and Matrix federation have fundamentally different architectures despite similar use cases and superficial similarities. XMPP is all about passing stanzas around, whereas Matrix is all about replicating crypto-signed data structures (conversation history message graphs) with eventual consistency. If anything, Matrix has more in common with an eventually-consistent globally distributed database than a message-passing system like XMPP.
  
  This does have the disadvantage that implementing federation in Matrix is certainly more complicated than XMPP. But we hope that the features are worth it 🙂
  
  In terms of TLS fun and games… the Matrix client-server API expects to be proxied behind whatever existing HTTPS (or HTTP, if you’re desperate) vhost you have hanging around, so we dodge the bullet there. We need to support WSGI or similar; for now the server exposes a standalone Twisted listener.
  
  The server-server API is a bit trickier; because we mandate HTTPS for federation, the server by default spins up its own TLS listener on port 8448, which we expect folks to unfirewall and advertise via SRV. By default the server generates a self-signed certificate and publishes the public cert over HTTPS (in future via a public key server), so avoiding the need for everyone to go and buy public signed PKIX X.509 certs. Servers connecting over federation validate the public certs against the key server (or their cache of the cert). This listener can also be proxied behind an existing HTTPS vhost with an officially signed X.509 cert if needed (or a self-signed cert that has been published to Matrix). We authenticate the requests at the HTTP layer (TLS client certs being unhelpful if the TLS listener is loadbalanced).
  
  In terms of DANE: we’d be more than happy to authenticate certs via DNSSEC rather than CA signatures, or our own “we just publish the public certs out of band” approach. We haven’t implemented it yet – patches welcome; it’d just be an extension of ways we specify to validate a cert.
  
  In terms of POSH: honestly, I think we missed this one. It looks like a more robust version of the publish-your-cert-over-HTTPS that Matrix does already today. The fact that by default we generate per-server self-signed certs for federation means we limit the exposure of private keys to just the Matrix service in a multitenant environment, but the idea of delegating your certs to a better trusted intermediary seems smart. We’ll have to look into it properly. On the plus side, the Matrix spec (and especially the identity & crypto system) isn’t frozen yet, so we’re always very happy to take inspiration from work like this. Thanks for the pointer!
  
  1. Dave Cridland says:
    
    November 16, 2014 at 12:40 pm
    
    Thanks for your response; the majority of nay-sayers tend to limit the descriptions of their problems to vague issues without much definition, but you’ve given some well-reasoned answers.
    
    In order:
    
    * If a MUC is hosted on a single domain, and a single server, then even a single network link dropping can cause outage. Luckily, a domain can be hosted across multiple servers, and chatrooms can even be federated. I’ve seen this work over satellite links to aircraft – it really does work. I think the bigger problem is that the concepts of group conversation in XMPP are intrinsically linked to a chatroom, meaning that the mechanical semantics have to be mapped very carefully to the user experience – that is, some chatrooms look like chatrooms in the UI, whereas others look like an ad-hoc group conversation.
    
    * I personally don’t think it matters what happens to the data on the server side; and I personally don’t think it matters if you’re running a different stack inside the browser much either, these days. The latter in particular is a matter of architectural choice, however – I suspect that as the XMPP community gains some more familiarity with HTTP/2, we’ll probably see a XMPP to HTTP/2 mapping happening.
    
    * Matrix is entirely an optional afterthought to the web, of course. Optional afterthoughts are generally speaking good protocol design, especially if you want to have at least some possibility of people getting into the playing field. There’s really only 5 players in the browser business because the browser requirements are so “kitchen sink”, and this in turn is out of necessity because it’s a single platform standard. The XMPP community had two attempts at a conversation archive spec, XEP-0136 and MAM. Had these been baked into the core spec, I think we’d have been in a much worse shape. In years past, we’ve put together kitchen-sink specs for IM clients and servers, though – I’d personally love to see these resurrected so we can state the requirements for modern IM better.
    
    * Yes. End-to-end crypto is hard. I don’t believe it’s ever been accomplished in a standards-based protocol. XMPP is trying, in a number of directions.
    
    * Jingle is, fundamentally, what Google Hangouts uses. So I’d argue that it’s seen very huge uptake. It suffers a bit from interop, however I’m hopeful that WebRTC will help this (as a common set of codecs, better ICE support, etc).
    
    * Yeah… I think this is a tough one. I think using third-party identifiers is great for getting users on-board, since users really hate sign-up. I think using them for auth is probably practical. I think making these homogeneous throughout the network is possibly practical, but not really without having infinite trust within the network.
    
    * Oh dear lord. Yeah, syntax-fashion has been a bit of a pest. Lloyd tried to address this with XMPP-FTW, Lance with stanza.io, but it’s a point of frustration. From a protocol design angle, XML is much more convenient than JSON, but JSON has a very simple data model that maps really well to most OO-ish languages.
    
    On the subject of message-passing versus DHTs, I’d casually point out that you can implement one in the other, but the end result differs in really fascinating ways, in particular, when you consider the privacy of endpoints. But discussing this would require a lengthy blog-post in itself, and we’re taking over Tsahi’s comments quite enough as it is. 🙂
    
    That said, have you looked at Google Wave? I suspect that you’ve some similarity there (and Google Wave based all its federation and message passing on XMPP, despite being more like a distributed database).
    
    1. Tsahi Levent-Levi says:
      
      November 16, 2014 at 2:35 pm
      
      By all means guys – take over my comments section 🙂
      
      If either one of you wants to write a guest post, I’ll be happy to share it here on my blog as well.
      
    2. Matthew Hodgson says:
      
      November 17, 2014 at 12:45 am
      
      Thanks for continuing the discussion and the constructive responses! In the spirit of taking over Tsahi’s comment section then (as sad as it is to throw info into a distinctly non-federated silo like a blog…):
      
      * In terms of whether MUCs can federate – I’m aware of Kevin Smith’s XEP-0289, however I don’t think this is compatible with ‘normal’ MUCs, and also isn’t mainstream yet. Ironically FMUC looks fairly similar to Matrix architecturewise, although we only discovered it after publishing Matrix. As far as I understood, normal MUCs don’t federate at all – unless the implementation happens to try to support clustering of a single logical MUC over multiple physical servers, like OpenFire’s Hazelcast-based plugin. I may be missing something, though, as it seems unlikely anyone would ever run a hazelcast cluster split over airplane wifi… 🙂
      
      * In terms of caring about what happens to data serverside… well, as a user I want to make sure I have synchronised message history guaranteed on all devices, and it’s even nicer if my data is represented in some kind of redundant manner serverside so that I may recover it from the wider network in case of disaster. Plus I want to be able to pick what service I use to store it.
      
      * In terms of “why not just run a XMPP/SIP/whatever stack clientside in the browser”… well, it’s just engineering hygiene. I don’t want to have to select/write/maintain/expend-CPU/bandwidth/memory on unnecessary dependencies if I can get away with just firing off an HTTP (or COAP or MQTT) request. This is particularly true on constrained devices.
      
      * Totally agreed that modular specs are a Good Thing – and that Matrix is just a module on top of the web. It’s just a question of how kitchen-sink to go. Matrix is unashamedly quite kitchen-sink, in terms of trying to provide a baseline of common functionality over as many current VoIP/IM apps as possible. But it’s also extensible beyond that. Obviously we’re interested to see whether we’ve set the baseline at the right height 🙂
      
      * End-to-end crypto is indeed hard, but it’s looking pretty positive – we already have PKI crypto-signing running (and mandatory) for federation traffic at the application level, in addition to TLS at the transport layer. Extending this to support actual end-to-end crypto should be fairly straightforward, and should land in the next month or so.
      
      * I’ve personally never developed against Jingle – and I didn’t realise Hangouts still used it. I maintain that it’s not taken over the world for open federated VoIP though, alas…
      
      * Using 3PIDs (3rd party IDs) for ID is how Matrix works, and hopefully doesn’t require infinite trust – instead we currently ringfence the ID servers that map 3PIDs to IDs to a trusted clique (a bit like DNS root servers). This may well evolve in future, but as long as the messaging fabric is decoupled from the identity problem, I hope we can avoid infinite trust 🙂
      
      Finally, with respect to message-passing vs. replicated data structures… yes, they are two sides of the same coin. Matrix isn’t a DHT (unlike Telehash, Tox & friends), but in the end it’s passing messages to synchronise data structures rather than simpler stanza-like events. Whether this is a good idea or not remains to be seen. The assumption is that only server-implementors will ever speak the federation protocol, so it doesn’t matter too much that it’s a bit complicated.
      
      In terms of Wave… yes, featurewise Matrix is quite heavily inspired by Wave (albeit without the XMPP, obviously). I don’t think anyone’s actually compared the protocols though – would certainly be good to look and learn.
      
      1. Dave Cridland says:
        
        November 17, 2014 at 1:11 pm
        
        Tsahi will regret telling us we can take over his comments. 🙂
        
        * So Hazelcast – while actually performing surprisingly well in Openfire – really isn’t suited to use over links of that quality. I’ve seen FMUC running over poor quality links, though, and I believe Isode have run clustering over the same. So yeah, it’s possible, and works. XMPP servers can cluster internally using eventual consistency (potentially for everything, given enough care), leaving the more structured messaging purely for external connections.
        
        * Yes, you do indeed want data stored securely, and in a service of your choice. I see this as a strong argument *for* XMPP, though.
        
        * I agree with your sentiment, but I suspect that after everything else has been considered, the fact it’s XMPP over WebSocket rather than HTTP isn’t a big deal. There was a discussion… erm… late last year, I think, about doing XMPP on EPROM-grade embedded kit. It surprised me that it’s possible. A chunk of the IoT crowd seems to think XMPP is the right choice there, too.
        
        * E2E is trivial, these days, in terms of the crypto architecture. The problem is that if I want to encrypt my traffic to you, I’ve got to find your key, and in general terms that means I need to find something I trust that can tell me your key. When you consider that PH-B’s “encode the key fingerprint into the email address” is currently one of the more user-friendly mechanisms we have on the table, it begins to become more understandable why this is such a tough area to crack. An option might be to have an issuer lookup running over a DANE-like DNSSEC mechanism, but if you’re using a centralized flat ID space I’m not sure how that works.
        
        * So you’re assuming that there is a single point of trust which presumably you operate. And if those ID servers are compromised, the entire network is compromised, isn’t it? I think you’re braver than I am taking on that responsibility. It solves lots of problems, like key-lookup, etc – but everyone on the entire network must trust you entirely.
      2. Matthew Hodgson says:
        
        November 18, 2014 at 6:44 am
        
        Randomly it doesn’t look like I can reply more than 5 deep, so continuing the thread here…
        
        > Tsahi will regret telling us we can take over his comments. 🙂
        
        I am sure he’s loving the debate 😀
        
        > * So Hazelcast – while actually performing surprisingly well in Openfire – really isn’t suited to use over links of that quality. I’ve seen FMUC running over poor quality links, though, and I believe Isode have run clustering over the same.
        
        Right, interesting. When I came across FMUC a few months ago I got the impression it didn’t exist much outside Isode’s commercial implementation. I maintain that normal XEP-45 MUCs are still problematic though – hence the need for FMUC (or Matrix).
        
        > * Yes, you do indeed want data stored securely, and in a service of your choice. I see this as a strong argument *for* XMPP, though.
        
        Well, XMPP and Matrix share the same architecture here: you know precisely where your data is in a client/server model – there’s no magical DHT or similar p2p cloud.
        
        > * I agree with your sentiment, but I suspect that after everything else has been considered, the fact it’s XMPP over WebSocket rather than HTTP isn’t a big deal. There was a discussion… erm… late last year, I think, about doing XMPP on EPROM-grade embedded kit. It surprised me that it’s possible.
        
        I’m sure it’s possible, but is it actually compelling relative to CoAP or MQTT?
        
        > A chunk of the IoT crowd seems to think XMPP is the right choice there, too.
        
        I can see XMPP being good for the federated messaging ‘backbone’ (and similarly Matrix), but I’m not sure I’d want some super-lowpower super-cheap low-bandwidth device having to speak XMPP (or HTTP for that matter…)
        
        > * E2E is trivial, these days, in terms of the crypto architecture.
        
        In terms of the crypto theory, yes. Implementations that don’t scare off client developers seem a bit rarer 🙂
        
        > The problem is that if I want to encrypt my traffic to you, I’ve got to find your key, and in general terms that means I need to find something I trust that can tell me your key. When you consider that PH-B’s “encode the key fingerprint into the email address” is currently one of the more user-friendly mechanisms we have on the table, it begins to become more understandable why this is such a tough area to crack.
        
        Hm, sorry for ignorance, but what’s a PH-B (other than a pointy haired boss)?
        
        > An option might be to have an issuer lookup running over a DANE-like DNSSEC mechanism, but if you’re using a centralized flat ID space I’m not sure how that works.
        
        Well, our ID space isn’t flat – matrix IDs are per-domain, and we could absolutely do a DANE style model. But instead we’re looking more at going down a POSH style route, using well known URIs to let domains host their own keyservers, falling back to a central point of control if all else fails. In fact we’ve been looking at how Open Peer handles the same problem and trying to converge there.
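As an illustration of the fallback order described above (domain-hosted keyserver first, central service last), here is a minimal Python sketch. Everything in it is hypothetical: `discover_key`, the source names, and the in-memory dictionaries standing in for real lookups are assumptions made for the example, not part of Matrix, POSH, or DANE.

```python
from typing import Callable, Iterable, Optional, Tuple

# A key source is a (label, lookup function) pair. The lookup returns the
# key bytes for a domain, or None if this source has no key for it.
KeySource = Tuple[str, Callable[[str], Optional[bytes]]]


def discover_key(domain: str, sources: Iterable[KeySource]) -> Tuple[str, bytes]:
    """Try each key source in order and return (source label, key)."""
    for label, lookup in sources:
        try:
            key = lookup(domain)
        except OSError:
            # Treat an unreachable source as "no answer" and move on.
            key = None
        if key is not None:
            return label, key
    raise LookupError(f"no key found for {domain}")


# Hypothetical stand-ins: a real deployment would fetch over HTTPS or DNS.
domain_hosted = {"example.org": b"key-served-by-example.org"}
central_fallback = {
    "example.org": b"older-copy-of-example.org-key",
    "other.net": b"key-held-centrally",
}

sources = [
    ("domain-hosted", domain_hosted.get),
    ("central-fallback", central_fallback.get),
]

print(discover_key("example.org", sources))
print(discover_key("other.net", sources))
```

Because the sources are tried in order, a domain that hosts its own key never depends on the central fallback, and a new mechanism can be added by inserting another entry in the list.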
        
        > * So you’re assuming that there is a single point of trust which presumably you operate. And if those ID servers are compromised, the entire network is compromised, isn’t it? I think you’re braver than I am taking on that responsibility. It solves lots of problems, like key-lookup, etc – but everyone on the entire network must trust you entirely
        
        Right now we have a single logical cluster of identity servers (there are two currently; one run by matrix.org and another by the developer who wrote it), and they store all ID mappings and replicate globally across the cluster. Right now the public keys (certs) themselves are actually served up by their own server – like SSH, the first time you connect you trust the server and cache the key to spot future changes.
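The SSH-style behaviour described above is usually called trust on first use. A minimal Python sketch of the idea follows, assuming keys are compared by SHA-256 fingerprint. The names `TofuKeyCache` and `KeyChangedError` are hypothetical and are not taken from Synapse or any Matrix SDK.

```python
import hashlib


class KeyChangedError(Exception):
    """Raised when a server presents a key that differs from the cached one."""


class TofuKeyCache:
    """Trust-on-first-use cache: pin the first key seen for each server."""

    def __init__(self):
        # Maps server name -> fingerprint of the first key it presented.
        self._pinned = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        return hashlib.sha256(public_key).hexdigest()

    def check(self, server_name: str, public_key: bytes) -> str:
        """Pin the key on first contact; reject it if it later changes."""
        seen = self.fingerprint(public_key)
        pinned = self._pinned.get(server_name)
        if pinned is None:
            # First contact: trust the key and remember it.
            self._pinned[server_name] = seen
            return "pinned"
        if pinned != seen:
            raise KeyChangedError(
                f"key for {server_name} changed: {pinned[:16]} -> {seen[:16]}"
            )
        return "ok"


cache = TofuKeyCache()
print(cache.check("example.org", b"first-key"))   # first contact
print(cache.check("example.org", b"first-key"))   # same key again
```

The weakness of this scheme is the first connection, which is trusted blindly. It only detects changes made after that point.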
        
        However, the beauty here is that the key distribution and discovery mechanism is *entirely* decoupled from the rest of Matrix – and so especially at this early stage we’re quite happy to add new key distribution mechanisms and deprecate bad ones. Hence looking at how OpenPeer does it – and now POSH and DANE too. Personally I particularly dislike the current single logical ID server model, given almost everything else in Matrix is decentralised (apart from the server where your account lives, but we’re looking at fixing that too).
        
        Either way, there is definitely room for refinement in the identity system, but at least the current initial version works. Watch this space to see how it evolves!
Aswath Rao says:

November 14, 2014 at 1:37 am

Would you be doing a follow-up post on what points of yours on fed sig were answered and what issues are still outstanding?

1. Matthew Hodgson says:
  
  November 15, 2014 at 1:37 am
  
  I guess this one’s for Tsahi, but I’d certainly love to see a follow-up on federation signalling from Tsahi’s perspective.
  
  From my perspective: I agree that federation is a controversial topic. Some obvious arguments against it which I’m aware of include:
  * Federation only gives a lowest-common-denominator experience between different apps – if some users only support a subset of functionality, the overall usability of the app can feel impacted.
  * Federation can make it hard to enhance the federation API without fragmentation – any feature beyond the baseline API will by definition be fragmented
  * Given launching a new WebRTC app is as simple as clicking on a link, why bother trying to minimise the number of apps you use?
  * Picking a communication channel to use to interact with someone is a social negotiation and is a feature rather than a bug. When you go for dinner with someone you don’t insist on both using your favourite restaurants and then doing a lowest-common-denominator video call between them – instead you agree on a mutually favourite restaurant and use that.
  * Why would existing messaging apps ever federate? WhatsApp shows you don’t need to…
  
  However, I genuinely believe that the arguments in favour outweigh the above:
  * Users should absolutely have the right to choose what comms solution they use, rather than being forced by their contacts into installing and trusting given apps or using given devices.
  * Lowest-common-denominator communication is better than being forced to install and use apps that you don’t want to use, and remember which contact is on which app.
  * Discovering how to contact new users is a nightmare currently – you simply don’t know what apps they have installed or what identifiers they use, or which they prefer to use.
  * If users want richer domain-specific features only supported by a given app, /then/ that’s a good reason to install the target app.
  * My conversation history should not be fragmented across multiple silos – as a user, I should have the right to own my data and control which service hosts it, and be able to index/archive/delete/encrypt it as I desire.
  * I can still choose to use different apps by default for different experiences and social groups if I still desire – federation simply provides choice.
  * Users have been lulled into thinking that the fragmentation in VoIP/IM is a feature. If email was fragmented like this, there would be rioting in the streets. Imagine if you had to install 20 televisions in your living room, each to watch different shows, each with a different remote control and different size of screen/picture quality/etc? This is where we’re at with VoIP/messaging apps currently…
  * Whilst the big existing messaging apps have no particular economic reason to federate (and may even have short-term reasons not to), we think that emerging & smaller apps benefit enormously from federation. And if you glue all of those communities together, you can rapidly produce a community which rivals the existing incumbents *and* supports user choice and freedom. And this is compelling.
  * The web, email and the PSTN all show that a federated network can be very compelling. Just because XMPP or SIP haven’t taken over the world doesn’t mean that the same doesn’t apply for VoIP/IM.
  
  I could go on 😉
  
Lennie says:

June 12, 2019 at 12:38 pm

The matrix.org specification has reached 1.0 and they say it’s production ready.

Could this be a good moment for a new interview?

1. Tsahi Levent-Levi says:
  
  June 12, 2019 at 5:52 pm
  
  It might. I stopped doing these interviews though.
  
  I’ll probably be incorporating a recorded video interview in my WebRTC course at some point for Matrix.
  
