Why was SCTP Selected for WebRTC’s Data Channel?

May 19, 2014

Because we can.

I think the people who defined WebRTC are historians or librarians. I say this all the time: WebRTC brings practically no new technology with it. It is a collection of existing standards, some brought from the dead, coupled together to create this thing called WebRTC. It is probably why many VoIP vendors fail to understand the disruption of it.

Before I digress, let me get back on track here:

Data Channel is a part of WebRTC. It is a neglected part by most developers. It is awesome.

Oh. And it runs on top of this no name protocol called SCTP.

WebRTC Data Channel stack

What is SCTP?

SCTP stands for Stream Control Transmission Protocol. It is an IETF standard (RFC 4960, which replaced the original RFC 2960). And it is old. From 2000. That’s when I had a full year of experience in VoIP. H.323 was the next big thing. SIP was mostly a dream. Chrome didn’t exist. Firefox didn’t exist either. We’re talking great grandparents type old in technology years.

SCTP sits somewhere between TCP and UDP – it is a compromise of both, or rather an improvement on both. It also has a feature called multi-homing which isn’t used in WebRTC (so I’ll ignore it here).

I remember some time around 2005 or so, we decided at RADVISION to implement SCTP. I don’t recall if this was for SIP or for Diameter. The problems started when we looked for an SCTP implementation. None really existed in the operating system. We ended up implementing it on our own, on top of raw sockets – not the best experience we had.

Fast forward to 2014. SCTP is nowhere to be found. Sure – some SIP Trunks have it. A Diameter implementation or two. But if you go look at Wikipedia for SCTP, this is what you get once you filter the list for relevant operating systems:

  • Generic BSD (with external patch)
  • Linux 2.4 and above
  • SUN Solaris 10 and above
  • VxWorks 6.x and above (but not all 6.x versions)
  • Nothing on Windows besides third parties

Abysmal for something you’d think should be an OS service.

Mobile? It is there on Android, but someone needs to turn the lights on when you compile Android and enable it. On iOS? Meh.

Where SCTP is used in WebRTC

SCTP is almost never used. And when it is, it is by the VoIP community and for server-to-server communications. But now in WebRTC, it is used for peer-to-peer arbitrary data delivery across browsers.

Why on earth?

The answer lies in the word arbitrary. The data channel is there for use cases we have no clue about. Where we don’t really know the types of requirements.

I’ll try to explain it with the table below that I use in my training sessions:

[Table: TCP, UDP and SCTP compared on the four characteristics below]

  • Reliability – if I send a packet – do I have the confidence (acknowledge) it was received on the other end?
  • Delivery – if I receive 2 packets – am I sure the order they were received is the order in which they were sent?
  • Transmission – am I sending packets or an endless stream of bytes?
  • Flow control / congestion control – does the protocol itself act responsibly and deal with congested networks on its own?
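
To make the four questions concrete, here is a rough sketch that writes the usual textbook answers down as data. The type and field names are made up for illustration, and the values are my reading of the three protocols (with SCTP as the data channel uses it), not a copy of the table above.

```typescript
// Hypothetical names; the values are the commonly cited properties of each transport.
type Choice = "yes" | "no" | "configurable";
type TransportName = "TCP" | "UDP" | "SCTP";

interface TransportTraits {
  reliable: Choice;              // is delivery acknowledged and retried?
  ordered: Choice;               // do packets arrive in the order they were sent?
  unit: "byte stream" | "messages";
  congestionControlled: boolean; // does the protocol back off on its own?
}

const transports: Record<TransportName, TransportTraits> = {
  TCP: { reliable: "yes", ordered: "yes", unit: "byte stream", congestionControlled: true },
  UDP: { reliable: "no", ordered: "no", unit: "messages", congestionControlled: false },
  // SCTP as used by the data channel: the application picks reliability and ordering.
  SCTP: { reliable: "configurable", ordered: "configurable", unit: "messages", congestionControlled: true },
};

// Returns the transports that leave both reliability and ordering up to the application.
function configurableTransports(): TransportName[] {
  return (Object.keys(transports) as TransportName[]).filter(
    (name) =>
      transports[name].reliable === "configurable" &&
      transports[name].ordered === "configurable"
  );
}
```

Only SCTP ends up in that last list, which is pretty much the whole argument of this post.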

For SCTP, reliability and delivery are configurable – I can decide if I want these characteristics or not. Why wouldn’t I want them? Everything comes at a price. In this case latency, and assumptions made on my use case.
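
In today's browser API these knobs show up as the `ordered`, `maxRetransmits` and `maxPacketLifeTime` fields of the options passed to `createDataChannel()`. The sketch below only builds such option objects, so it runs without a browser. The profile names, the `optionsFor` helper and the 500 ms figure are assumptions of mine for illustration.

```typescript
// Mirrors the relevant fields of the browser's RTCDataChannelInit dictionary.
interface ChannelOptions {
  ordered?: boolean;          // keep messages in send order (browser default: true)
  maxRetransmits?: number;    // give up after this many retransmissions
  maxPacketLifeTime?: number; // or give up after this many milliseconds
}

// Hypothetical use-case profiles; pick whatever matches your application.
type Profile = "file-transfer" | "game-state" | "timely-chat";

const profiles: Record<Profile, ChannelOptions> = {
  // Reliable and ordered, behaves much like TCP. Pays in latency when packets are lost.
  "file-transfer": { ordered: true },
  // No retransmissions, no ordering, behaves much like UDP. Stale updates are simply dropped.
  "game-state": { ordered: false, maxRetransmits: 0 },
  // Ordered, but stop retrying a message after half a second.
  "timely-chat": { ordered: true, maxPacketLifeTime: 500 },
};

// Returns a copy so callers cannot mutate the shared profile table.
function optionsFor(profile: Profile): ChannelOptions {
  return { ...profiles[profile] };
}

// In a browser this would be used roughly as:
//   const channel = peerConnection.createDataChannel("state", optionsFor("game-state"));
```

Note that a channel takes either `maxRetransmits` or `maxPacketLifeTime`, not both. Browsers reject a configuration that sets the two together.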

So there are times I’d like more control, which leans towards UDP. But I also want to make life easier for developers, so I’d rather hint to the protocol about my needs and let it handle the rest.

Similarly, here’s a great post Justin Uberti linked to a few weeks back – it is on reasons some game developers place their signaling over UDP and not TCP. Read it to get some understanding on why SCTP makes sense for WebRTC.

The Data Channel is designed for innovation. And in this case, using an old and unused tool was the best approach.


Comments

  1. “Developers aren’t challenged enough” (Hadriel Kaplan, in a slightly different context) is a much better explanation 🙂

    I think that the architectural distinction between VoIP and data, instead of an integrated approach like the one taken by Amicima’s MFP (which later became Adobe’s RTMFP), will be regretted in the future.

      1. well, some of the recent discussions on the IETF list make it seem that SCTP now gets the first real deployment experience.

        That also seems true for TURN though 😉

    1. I don’t know how technically accurate you want to be, but ICE does not encapsulate packets:
      so it’s actually SCTP over DTLS with ICE, not over ICE.

  2. SCTP is extensively used in SS7/SIGTRAN networks as the transport for protocols like M3UA, M2PA, so I would say it has a good pedigree in telecom signalling stacks.

    1. Sure it does – but that’s like nowhere if you consider the rest of the world of networking. Everything else is on TCP or UDP. No wonder this has no adoption by many operating systems.

      1. The primary problem is that Microsoft blatantly refuses to even consider adding support for it. Without full access to it on clients, it’ll go nowhere fast.

        SCTP is awesome, and I’m saddened that it seems to be struggling to gain traction while older and far more problematic/limited protocols building on the even older and even more limited IP stack keep the net back.
        All the nifty things they do on IP these days, like web sockets and streams are essentially hacks trying to approximate what you get out of the box on SCTP.

  3. Yep, I think you are right, it was (yet another) pragmatic decision to use a proven protocol or stack and
    slightly re-purpose it to meet the unique needs of webRTC. – Consider the alternative, designing a whole new
    protocol stack just for the dataChannel use, protocols generally need a couple of revisions to get the kinks
    ironed out, SCTP has been through that process and is heavily used (albeit in a different domain as Martyn pointed out).
    Also it is a well written RFC with multiple implementations, which is proof at least that it is implementable from the RFC!

    I don’t think that your list of implementations is overly useful. All of them are in the wrong place.
    You are looking (mostly) at a list of SCTP stacks that are designed to run inside various kernels.

    The requirements for running the relevant subset of SCTP inside a webRTC app are considerably simpler,
    as you point out, multi-homing isn’t needed, for example. What the browser vendors have done is to take the source code of a working kernel based stack, surround it with code that mimics the kernel, then plug the resulting box into
    the already tangled heap that is the browser. I’m personally unconvinced that such a mechanism will scale if/when we
    need server side data channel support.

    I think we can expect to see a few simplified SCTP stacks emerge pretty soon. I can be confident in saying this because
    Westhawk is working on one 🙂 . Ours is aimed to be plugged into an existing Java (or other JVM based) Server environment, this will allow servers to play a full role in the growth of the DataChannel.

    1. Tim,

      For a first release, that will do. The thing is that data channel related startups are already taking this capability to the extreme with their P2P assisted delivery. To make this work well, SCTP will need to find its way into the kernel instead of running on top of the OS in user mode.

      1. That’s a big ask. Looking at your diagram, you’ll have to push DTLS and ICE into the kernel too. If you don’t, every
        packet that is received by the kernel will come out to the ICE/DTLS stack in user land, then transit back into the kernel for SCTP, then back out again to the end application, which would be staggeringly inefficient.
        There is also a downside to pushing SCTP down to the kernel: you lose the tight connection of the data channel with the rest of the web app/media and blur the security boundaries between users. Both are currently pretty well served in webRTC.

        I can imagine some gateways (say to IMS or the rest of the legacy VoIP world) might want to push all of this into a product specific kernel. However for the bulk of the server side webRTC use-cases you want the data-channel close to the web app so as to leverage the context of the web session.

        1. Without knowing the details, I assume there are packets that don’t need to go “all the way up” to the application.

          There’s also the nagging issue of needing to reassemble packets into messages (again, no need to do that in user space).

          And there’s the possibility of using asynchronous I/O or lord forbid zero copying (or close to it).

          What I am looking for is an SCTP kernel mode implementation that is comparable to that of UDP and TCP – both live in kernel mode and spew buffers up to the application.

          1. Protocols like SCTP can perfectly well be done in user space. You only need the kernel to make sure that one application can’t receive packets of a different application; UDP does this in the webrtc case. One could argue that moving session-specific protocol handling into user space is the right thing to do if you believe in the end-to-end principle. The kernel is just another hop on the path. It is always easier to add a function to an application than to add a function to the kernel (see tls vs ipsec).

  4. Hello,

    Don’t be so harsh. It’s a good protocol.
    Some info – SCTP was created as an effort to allow carrying SS7 (signalling) traffic on top of IP. TCP is not suitable for this purpose. This is the reason SCTP is message oriented (rather than stream oriented) and supports redundancy at the protocol level (multi-homing). SS7, for the readers not familiar with Telecoms, is an ancient protocol stack which predates TCP/IP and has its roots in circuit switched networks.
    The lovechild of this effort is the SIGTRAN stack. Currently SCTP + M3UA is the de-facto standard for carrying signalling traffic in Telco networks around the world.
    Yes, SCTP presence in consumer devices might be abysmal but that’s not true in general. There are many products and systems which rely on it. Implementations (some proprietary) are mostly for Linux, Unix and Solaris (yeah, nobody runs Windows in this kind of environment).

    Back to the topic of why SCTP is used for WebRTC. Maybe (I don’t know) the idea was to make WebRTC ubiquitous. Make it a standard for both consumer peer to peer communications and larger scale enterprise systems (PBXs). SCTP shines in use cases where two nodes communicate with each other for the sessions of many users.
    There were plenty of vendors at the last MWC (in Barcelona) offering enterprise products which supported WebRTC (along with other protocols).

  5. P.S.
    “I think the people who defined WebRTC are historians or librarians. I say this all the time: WebRTC brings practically no new technology with it.”
    If it was intended for enterprise systems, then that’s understandable. In large systems voice/video traffic goes through HW implemented codecs. Anything too fancy is not going to be available on silicon and running it on x86 (or any general purpose processor) is not a good option for scaling.
