All the Truth About the Latest (non)Hype of Fuzzy Testing WebRTC

By Tsahi Levent-Levi

December 17, 2018

There’s a lot of fuzzing around lately about WebRTC. Which is really about SRTP. Which is really important. But also really misplaced.

Before I Begin

This all started when Google Project Zero, a team tasked with actively searching for zero-day bugs (nasty crashes and similar bugs that might be exploited by hackers), set their sights on video conferencing and WebRTC. The end result of it all is a GitHub repository with tools to test RTP streams (and some filed bugs).

A few things to put the house in order:

These bugs are important. Go fix them
I am not a security expert, but I know my way around security and have a few scars to show for it
This isn’t the end of the world. A few bugs were found. Many of them old. This happens every day. Some are nastier than others
These won’t be the last bugs in WebRTC and they won’t be the most serious that get found either. Just ask NewVoiceMedia about their recent audio issues
We will all forget about this come 2019 and proceed with our normal daily lives
Check here for more on WebRTC testing

Now that we’ve cleared the air – let’s check what’s all that fuzz. Shall we?

What Fuzzing means

Wikipedia has this to say about Fuzzing:

Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.

For me, fuzz testing is about the generation of malformed inputs in ways that the developers haven’t anticipated or tested for. This will result in undefined behavior, which is largely a nicer way of saying a bug. In some cases, the bug will be an innocent one. In other cases, it can be nasty:

It might cause the software to crash
Go read or write where it shouldn’t (overflow)
Deadlock the whole thing (=cause it to freeze)
Cause a memory leak

The type of bugs that can be found is endless, which makes for really good FUD (fear, uncertainty, doubt) and lore.

A good malformed input can theoretically be used to grant you administrative access to a machine or to let you read memory you shouldn’t have access to.

A simple explanation can be this: assume your software expects a user’s email to be at most 40 characters long. Shorter than that is obviously fine, but what will happen if you use an email that is longer than 40 characters? Somewhere along the line, there should be a piece of code that checks the length and states that the email is too long. And if there isn’t… well… we’ve reached the realm of undefined and potential security bugs.
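To make the email example concrete, here is a minimal sketch in Python. The names `store_email` and `fuzz_store_email` are made up for illustration, and the 40-character limit is the one assumed in the example above. The first function is the length check that should exist "somewhere along the line"; the second throws random strings of random lengths at it and counts how many get rejected cleanly.

```python
import random
import string

MAX_EMAIL_LENGTH = 40  # the limit assumed in the example above


def store_email(email: str) -> str:
    """Accept an email only if it fits the 40-character field (hypothetical)."""
    if len(email) > MAX_EMAIL_LENGTH:
        raise ValueError("email longer than 40 characters")
    return email


def fuzz_store_email(rounds: int = 1000, seed: int = 0) -> int:
    """Feed random-length inputs and return how many were rejected cleanly."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(rounds):
        length = rng.randint(0, 200)
        candidate = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            accepted = store_email(candidate)
            # Anything that gets through must respect the limit.
            assert len(accepted) <= MAX_EMAIL_LENGTH
        except ValueError:
            rejected += 1  # a clean rejection is the behavior we want
    return rejected
```

In Python a missing check would not corrupt memory the way it can in C or C++, so treat this only as an illustration of the idea: the fuzzer does not know what a valid email looks like, it just keeps pushing inputs the developer may not have planned for.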

The same can happen in network protocols, where whatever you send “on the wire” has a structure of sorts. The machines need structure to be able to parse the data and act upon it. So if you change the data so it is close to the expected structure, but off by just a bit – you might get to that realm of undefined as well.

Fuzzing is trying to get to that place – adding randomness in just the correct places to get to undefined software behavior.
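Here is a rough sketch of what "close to the expected structure, but off by a bit" can look like for RTP. It assumes the fixed 12-byte RTP header layout from RFC 3550 (version, CSRC count, extension bit and so on) and ignores padding and other details. The names `parse_rtp`, `mutate` and `fuzz_rtp` are mine, and this is not the tooling Project Zero published. The parser rejects anything that claims more bytes than the packet holds; the fuzzer starts from one valid packet, overwrites a few random bytes, sometimes truncates it, and checks that the parser never fails in any way other than a clean rejection.

```python
import random
import struct


def parse_rtp(packet: bytes) -> dict:
    """Parse the fixed RTP header, CSRC list and header extension."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("unsupported RTP version")
    offset = 12 + 4 * (first & 0x0F)  # skip the CSRC list
    if offset > len(packet):
        raise ValueError("CSRC list runs past the end of the packet")
    if first & 0x10:  # extension bit is set
        if offset + 4 > len(packet):
            raise ValueError("truncated extension header")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
        if offset > len(packet):
            raise ValueError("extension runs past the end of the packet")
    return {
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }


def mutate(packet: bytes, rng: random.Random) -> bytes:
    """Overwrite a few random bytes and occasionally truncate the packet."""
    data = bytearray(packet)
    for _ in range(rng.randint(1, 4)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    if rng.random() < 0.3:
        del data[rng.randrange(len(data)):]
    return bytes(data)


def fuzz_rtp(rounds: int = 10000, seed: int = 0) -> int:
    """Return how many mutated packets were rejected cleanly."""
    rng = random.Random(seed)
    valid = struct.pack("!BBHII", 0x80, 96, 1, 0, 0x1234) + b"payload"
    rejected = 0
    for _ in range(rounds):
        sample = mutate(valid, rng)
        try:
            parsed = parse_rtp(sample)
            assert len(parsed["payload"]) <= len(sample)
        except ValueError:
            rejected += 1
        # Any other exception propagates: that would be a finding.
    return rejected
```

Remove any one of the length checks in `parse_rtp` and the fuzzer will usually hit a `struct.error` within a few thousand rounds. In a C or C++ stack the same missing check is a read past the end of a buffer.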

Let me tell you a bedtime story

My fuzzy life started in Finland, though I’ve never been there (yet).

At the University of Oulu, one day, a new something called “PROTOS Test Suite” was created. At the time, I was the project manager leading the development and maintenance of RADVISION’s H.323 protocol stack. We licensed it to many vendors around the globe, all using our source code to build VoIP products.

The PROTOS Test-Suite was all about security testing. The intent behind it was to find bugs that cause crashes and other ailments to those using H.323. And they chose the best possible entry point. Here’s how they phrased it:

The purpose of this test-suite is to evaluate implementation level security and robustness of H.225.0 implementations. H.225.0 is a protocol responsible for signalling and setting up H.323 calls. […]
The scope of the test-suite was narrowed to H.225.0 version 4 Setup-PDU. Rationale behind this selection was:

Setup is the first message sent to a target H.323 endpoint upon call signalling, it is easy to deliver test-cases and to restore the implementation back to its initial state by disconnecting.

[…]

I marked in bold the important parts. Specifically, the guys at Oulu decided to go after the “pick up line” of H.323 and try to come up with nasty Setup messages that will confuse H.323 devices.

And confuse they did. PROTOS has 4497 Setup messages. On my first run with it, probably 50% of them caused our beloved H.323 stack to crash. I spent a week building the software to automate using it and fixing all the nastiness out of it. I admired the work they did and the work they made me do.

PROTOS essentially analyzed how things go on the wire and devised a set of messages that were bound to trip over bad programming practices, the kind we all fall into as humans. This isn’t exactly fuzzing in an automated fashion, but it is the “manual” equivalent of it.

This got its own CERT vulnerability note and we had a great time working with our customers on updating our stack and getting these security fixes to work.

I believe some of our customers actually upgraded and updated their systems due to this. I am sure many didn’t. I am also assuming many of our customers’ customers didn’t upgrade their own deployed equipment. And the world continued on. Happily enough.

All this took place in 2004. Before WebRTC. Before the cloud. Before mobile. With practically the same RTP/RTCP protocol and the same techniques and mechanisms in VoIP that we use today in WebRTC.

Why didn’t people look at RTP vulnerabilities at that time? We’ll get to that.

Google’s Project Zero and video conferencing

This year, Google Project Zero decided to look at video conferencing. The “way in” was through WebRTC. Natalie Silvanovich was tasked with this and she wrote a series of 5 posts about it. The first one was about her selection and adventures with WebRTC itself. In it, she writes:

I started by looking at WebRTC signalling, because it is an attack surface that does not require any user interaction. […] WebRTC uses SDP for signalling.
I reviewed the WebRTC SDP parser code, but did not find any bugs. Then I also compiled it so it would accept an SDP file on the commandline and fuzzed it, but I did not find any bugs through fuzzing either. […]
I then decided to look at how RTP is processed in WebRTC. While RTP is not an interaction-less attack surface because the user usually has to answer the call before RTP traffic is processed, picking up a call is a reasonable action to expect a user to take. […]
Setting up end-to-end fuzzing was fairly time intensive […]

A few things that come to mind here:

The “signaling” layer in WebRTC (=the SDP parser) is rather robust against these types of attacks. Natalie couldn’t find anything there
Signaling and SDP are the equivalent of what the guys at Oulu targeted with their PROTOS test suite
There is a notion here of “call answering”. This isn’t what WebRTC does. It connects sessions. Sometimes directly and sometimes indirectly. And in all cases, there are layers above RTP that the users (and attackers) will need to go through first
Setting up such a test, doing end-to-end fuzzing in the RTP layer, is time intensive

Time intensive is important, as this raises the bar for those wishing to exploit such a weakness.

The fact that RTP isn’t the first attack surface and isn’t the first layer of interaction makes it somewhat less obvious how to exploit it (besides instigating DDoS attacks on devices and servers).

Coupling these two, the complexity and the non-obviousness of an exploit, is what kept people from putting the effort into it up until today.

The Fuzzy feelings of our WebRTC industry

Ben Hawkes, the Project Zero team lead, tweeted about the series. His tweets garnered three-digit likes and retweets, tapering off in the last 2 posts (I attribute that to fatigue with the subject):

Project Zero blog: "Adventures in Video Conferencing Part 1: The Wild World of WebRTC" by @natashenka – https://t.co/pdtZLDDP9M
— Ben Hawkes (@benhawkes) December 4, 2018

That kind of sharing is an average day for most posts published by that team. A few immediately took the cue and started fuzzing on their own. A notable example is Philipp Hancke who aimed at the Janus media server and fuzzed REMB RTCP messages.

His attack was quite successful for several reasons:

He had the source code of Janus and was able to isolate the area he wanted to attack. This made the process easier than the work done by Project Zero
He picked an obvious target that was bound to crash multiple times – a message buried deep inside the protocol that aimed at control logic that takes place long after the session gets connected
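To give a feel for why a message like REMB makes a tempting target, here is a sketch of a defensive REMB parser. It assumes the layout from the REMB Internet-Draft as I understand it: an RTCP payload-specific feedback packet (PT 206, FMT 15) carrying the ASCII identifier "REMB", an SSRC count, a 6-bit exponent, an 18-bit mantissa and then the SSRC list. The name `parse_remb` is mine. This is not Janus code and it does not reproduce the specific bug that was found there.

```python
import struct


def parse_remb(packet: bytes) -> dict:
    """Parse a REMB feedback message, rejecting inconsistent lengths."""
    if len(packet) < 20:
        raise ValueError("too short for a REMB message")
    first, packet_type, length_words = struct.unpack("!BBH", packet[:4])
    if first >> 6 != 2 or packet_type != 206 or first & 0x1F != 15:
        raise ValueError("not a payload-specific feedback message with FMT 15")
    declared = 4 * (length_words + 1)  # RTCP length is in 32-bit words minus one
    if declared > len(packet):
        raise ValueError("declared length exceeds the bytes received")
    if packet[12:16] != b"REMB":
        raise ValueError("missing REMB identifier")
    num_ssrc = packet[16]
    exponent = packet[17] >> 2
    mantissa = ((packet[17] & 0x03) << 16) | (packet[18] << 8) | packet[19]
    end = 20 + 4 * num_ssrc
    if end > declared:
        # The count field is attacker controlled; never trust it blindly.
        raise ValueError("SSRC count claims more entries than the packet holds")
    ssrcs = list(struct.unpack(f"!{num_ssrc}I", packet[20:end]))
    return {"bitrate": mantissa << exponent, "ssrcs": ssrcs}
```

The interesting fields are the ones the sender controls and the receiver uses for arithmetic: the declared length and the SSRC count. A fuzzer that keeps the outer framing valid and randomizes those two bytes gets straight to the checks that matter.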

Should you start Fuzzing away your WebRTC application?

Probably not.

There’s a lot more to WebRTC testing than just fuzzing. Prioritize what you’re doing…

And let’s face it – in the list of tests that you want to do but don’t do today, fuzzing fits nicely near the end, among the things you just never find the time and priority to handle.

The good thing? For most of us, fuzzing is something that “others” should be doing.

If you are using a CPaaS vendor, it is their task to protect their signaling and media servers against such attacks.

If you run on top of the browser… well… those who maintain the WebRTC code for the browser need to do it (and it is Google for the most part at the moment).

You should think about fuzzing in your own application logic and the things that are under your control, but the WebRTC pieces? Going down the rabbit hole of fuzzing RTP and RTCP packets? Not for you.

Your role here is to ask the vendors you work with if they have taken steps in the area of security testing and what exactly they have done there. Fuzzing needs to be one of those things.

Who should care about fuzzing?

There’s a shortlist of people who need to deal with fuzzing.

If you develop and deploy your own media servers and client side frameworks – you should fuzz them away
- The example above that Philipp Hancke did with Janus? It should be done on more such message types and protocol layers and it should be done for the other media servers
- A WebRTC implementation in Python added some fuzzing related fixes in version 0.9.14: “Fix RTP and RTCP parsing errors detected by fuzzing”
- That said, do we want them to do that or implement unified plan? What has a higher priority? For most of the industry, it would be unified plan…
If you are using third parties, you need to make sure you update them frequently
- Using a WebRTC stack from a year or two ago isn’t something you should be doing
- Using open source media servers without upgrading them from time to time (and actively looking for these security patches for them) is also not something you should be doing
CPaaS vendors…
- This is one of the things they live for
- They deal with this headache so you don’t have to
- If they don’t – you should take your business elsewhere. Just saying
Browser vendors. Enough said

Where do we go to next?

Start by learning everything you can about WebRTC security. Fuzzing isn’t the first thing that comes to mind when you set off to build your business. Not even if your service must be very secure.

We are at a point where fuzzing is being dealt with and addressed, and the RTP layer is where people seem to be doing it (at least a bit). We’ve come a long way since we started with WebRTC and it is a good sign.

If you are serious about testing your WebRTC service, you can check out more here to learn about how to perform a WebRTC test.

To Fuzz or not to Fuzz? Where should you spend your energies with WebRTC? If you need help with that, just contact me.


Philipp Hancke says:

December 17, 2018 at 6:06 pm

Tsahi,

thanks for putting fuzzing into a wider context.

I still think it is important to fuzz for people who write their own RTP/RTCP implementations. It complements (hopefully) existing processes for code review (even though I can see how I would have approved the janus get_remb code) and unit testing. And I hope that i’ve shown it is not overly expensive or complicated so can be applied by all developers on your shortlist.
The business rationale for this is increased stability and uptime.

If we can make fuzzing common place in 2019 i’ll be happy. But I think we need to talk about it a bit more.
