My talk is going to be a little different. It is about the surprising efficiency and the exponential cost of fuzzing. How can something be surprisingly efficient and also come with an exponential cost? The actual purpose of this talk is to introduce you to the weirdness of randomness and probability theory. Given the name of this event I won't talk much about the theory, but I want to talk about intuition. One of the key messages I want you to take away from this talk is that we might have strong intuitions about solving a problem, but without a deep understanding of the problem our intuitions can sometimes lead us astray.

So what is fuzzing? Fuzzing is a testing process. Here we are given a program, and that program takes four characters.

One approach to testing this program is called white box fuzzing: white box because the analysis can actually see the internals of the program. The program has four branches, and we could try to explore every path through the program simply by capturing the conditions of each of these branches. Say one input exercises the path where the first character is not a 'b', another input exercises the path where the first character is a 'b' but the second is not an 'a', and so on. So we could explore all paths, and after exploring the fifth path we would see that the program actually crashes. In this sense the approach called white box fuzzing is the most effective. Why most effective? Because it can actually prove the absence of an error, in this case an assertion violation, simply by enumerating all paths through the program. We are making some assumptions, but in principle white box fuzzing would be able to prove the absence of errors, and in this sense it is the most effective.

It is also quite efficient. There are five paths in that program; if we sample a path at random without replacement, we would expect to need about three inputs to find the crashing bug, with the inputs generated by enumerating these paths using white box fuzzing.

On the other side of the spectrum there is black box fuzzing. Black box fuzzing doesn't know anything about the program; it simply generates random inputs for it. On most machines a character can take one of 256 values, so suppose we just generate these values by sampling uniformly at random, with replacement.
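As a rough illustration, here is a minimal sketch of this generational black box setup. The program under test is a hypothetical reconstruction of the four-character example from the talk, assumed to crash on the input "bad!"; the code is illustrative rather than the speaker's.

```python
import random

def program_under_test(s):
    # Hypothetical stand-in for the example program: four nested branches,
    # and the path reached by "bad!" violates an assertion.
    if len(s) == 4 and s[0] == 'b':
        if s[1] == 'a':
            if s[2] == 'd':
                if s[3] == '!':
                    raise AssertionError("crash reached")

def generational_blackbox_fuzz(max_trials):
    # Generate whole inputs uniformly at random, with replacement.
    for trial in range(1, max_trials + 1):
        candidate = ''.join(chr(random.randrange(256)) for _ in range(4))
        try:
            program_under_test(candidate)
        except AssertionError:
            return trial  # number of inputs needed to hit the crash
    return None

# Each input is the crashing value with probability (1/256)**4, so the
# number of inputs until the crash is geometrically distributed with an
# expectation of 256**4, roughly 4.3 billion.
```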
Of course, black box fuzzing can never prove the absence of errors. In fact there is this great quote by Dijkstra: program testing can be used to show the presence of bugs, but never to show their absence. More recently we have looked into the problem of giving testing some kind of statistical guarantee, so that we can compute what is called the residual risk after a certain amount of testing, but the fact remains: black box fuzzing is not the most effective technique, because it cannot prove the absence of errors.

Okay, so let's look at this from a statistical perspective. I said earlier that white box fuzzing would require about three inputs in expectation to find that bug. For black box fuzzing, if we sample the value of each character uniformly at random, we would expect to generate about four billion inputs before discovering the bug. That sounds like a lot. So white box fuzzing should be better, right? It is the most effective, and it is also quite efficient. And here is where our intuition goes wrong for the first time: at least not always. Sometimes black box fuzzing wins. This was discovered 30 to 40 years ago, when people started running experiments with these two techniques, the hypothetically most effective technique and simple random sampling. They found that, given some limited time, this very simple, very dumb technique wins and finds more bugs than the most effective technique.

We looked at this very recently from a statistical perspective and found that if a white box fuzzer takes too long per input, a black box fuzzer outperforms it. If generating one input takes too long, the black box fuzzer wins, because the black box fuzzer generates inputs very fast. On my machine it takes about 6.3 seconds to generate four billion inputs, and if I had 100 machines it would take 63 milliseconds. So we can easily scale bug finding, because black box fuzzing is essentially embarrassingly parallel. In this paper we actually give a probabilistic bound on this maximum time per input.

So is black box fuzzing the best that we can do? Again the answer is no; we can do even better than that. What I presented so far was a generational black box fuzzer, meaning that I generate values for each of the characters from scratch. Instead of generating inputs from scratch, which is the generational approach, maybe we can reuse existing inputs; this is called mutational black box fuzzing. So suppose we have a seed input, which is "bad?", and the bug-triggering input is "bad!".
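Here is a minimal sketch of such a mutational fuzzer for the same example. The mutate function and the direct check against "bad!" (standing in for running the program and observing the crash) are illustrative assumptions.

```python
import random

def mutate(seed):
    # Pick one of the seed's character positions at random and replace it
    # with a uniformly random byte value.
    pos = random.randrange(len(seed))
    return seed[:pos] + chr(random.randrange(256)) + seed[pos + 1:]

def mutational_blackbox_fuzz(seed, max_trials):
    for trial in range(1, max_trials + 1):
        candidate = mutate(seed)
        if candidate == "bad!":  # stand-in for observing the crash
            return trial
    return None

# Starting from the seed "bad?", a single mutation produces "bad!" with
# probability (1/4) * (1/256) = 1/1024 per generated input.
```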
A mutational fuzzer like this simply selects one of the characters at random and then chooses a value for it at random, so each generated input turns "bad?" into "bad!" with probability 1/4 times 1/256, which is 1/1024. In expectation that is about a thousand inputs, which is much less than four billion inputs.

But I cheated a little bit here: we chose a really good seed to start with. So can we do even better? Can we somehow automatically discover this seed input? This is where the third approach, gray box fuzzing, comes in. Can we take the advantage of black box fuzzing and the advantage of white box fuzzing and somehow combine them? In gray box fuzzing we don't actually analyze the program, but we take some feedback from its execution, called coverage feedback, and we add generated inputs that increase coverage to the corpus.

So suppose we just start with a random input. The probability that we generate the first coverage-increasing input is roughly one in a thousand, more precisely 1/1024, which means that in expectation we require about 1,024 inputs to generate the first coverage-increasing input. We add it to the corpus, and then we select the next seed uniformly at random; generating the next coverage-increasing input, by choosing this seed, this character, and 'a' as the value, requires about 2,000 inputs in expectation. We go on like that, and starting from a random input we can find the bug using only about 10,000 inputs. On my machine that takes just 150 microseconds. This is much faster than a symbolic execution tool, which requires all this machinery, constraint solving and encoding and so on, for its three inputs.

In the work that we presented at CCS '16 we also boosted gray box fuzzing, simply by choosing the seed which exercises the lowest-probability path, and that takes us down further to about four thousand inputs and 55 microseconds.
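Here is a minimal sketch of the coverage-guided gray box loop just described. The per-character coverage signal and all names are illustrative assumptions; a real gray box fuzzer obtains branch coverage from lightweight instrumentation of the program.

```python
import random

def coverage_of(s):
    # Stand-in coverage signal: how many of the nested branches of the
    # example program ('b', 'a', 'd', '!') this input gets past.
    depth = 0
    for got, want in zip(s, "bad!"):
        if got != want:
            break
        depth += 1
    return depth

def greybox_fuzz(max_trials):
    # Start from a single random seed and keep every coverage-increasing input.
    corpus = [''.join(chr(random.randrange(256)) for _ in range(4))]
    best = max(coverage_of(s) for s in corpus)
    for trial in range(1, max_trials + 1):
        seed = random.choice(corpus)              # pick a seed uniformly at random
        pos = random.randrange(4)                 # mutate one character
        candidate = seed[:pos] + chr(random.randrange(256)) + seed[pos + 1:]
        cov = coverage_of(candidate)
        if cov == 4:                              # "bad!" reached: the crash
            return trial
        if cov > best:                            # coverage increased: keep it
            corpus.append(candidate)
            best = cov
    return None
```

Across many runs this loop needs on the order of ten thousand candidates before it reaches the crash, which matches the rough calculation above.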
Okay, so the insight is: if you have a really efficient fuzzer, just throw more machines at the problem. Remember, on my machine it takes 6.3 seconds; on 100 machines it takes 63 milliseconds to find the same bug.

So then you might think: if I have X times more machines, I can find X times more bugs, right? And again we are wrong. This is the empirical cost. We looked at a lot of data; on one axis is an exponentially increasing number of machines (it is a log scale), and on the other is the number of additional vulnerabilities discovered within the same time. We see that a linear increase in the number of new vulnerabilities discovered requires an exponential increase in the number of machines. This is the exponential cost.

Here is a little bit of an explanation, just the intuition. Think about just one bug while we increase the number of machines exponentially. For a long time, in terms of the number of machines, we won't see that bug; then there is an almost linear increase around a certain number of machines; and once we have discovered the bug the curve continues flat. If instead of just one vulnerability we have ten vulnerabilities, we get ten of these curves, and together they almost look like a straight line; and as we increase the number of vulnerabilities, or bugs, or whatever you want to measure, the overall curve gets more and more linear. The reason is that at any point some bugs are effectively never seen, the ones yet to be discovered, and some are always seen, the ones already discovered, and in between we have these almost linear increases. This is how we get a linear increase in discoveries for an exponential increase in the number of machines. So intuitively, each new vulnerability requires more resources than the previous vulnerability, and a constant rate of vulnerability discovery requires an exponentially increasing amount of resources.

So, what I showed you: white box fuzzing is a technique we really love to work with. It is really smart, it analyzes the program, and it is in fact the most effective. But in practice it is easily outperformed by a very dumb technique called black box fuzzing, which simply generates inputs at random. In fact black box fuzzing is so efficient that we can easily scale it across a lot of machines and find the same number of bugs X times faster. I also talked about a technique, gray box fuzzing, that leverages the efficiency of black box fuzzing but works its way through the program's paths a little like white box fuzzing. And then I explained how even these techniques, black box fuzzing and gray box fuzzing, are affected by an exponential cost. Thank you so much.

All right, thank you very much, Marcel. It's great to see this kind of quantitative approach to this problem. A question coming in from one of the viewers is: how much of the statistics do I need to understand to be able to apply these sorts of techniques?

Oh, to apply it there is no need to understand the statistics. Understanding the statistics is more of a help for you. For instance, there is the insight that you cannot just throw more machines at the problem and then hope to find more bugs; that doesn't work.
At some point you run out of machines, because the cost of finding more vulnerabilities is exponential. It is interesting to understand why this is the case, and we use statistics and probability theory to explain why. But when you apply fuzzing you don't need to understand the statistics.

Maybe one other thing that we explored: we are often interested in the probability that we find a bug given that we have not found one yet. Suppose you have run a campaign for 24 hours and you have not found any bug. You still want to know the probability that you find a bug with just one more generated input, and this is where you can use statistics.
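To illustrate that last point, here is one very simple estimate, the Laplace rule of succession, for the probability that the very next input finds a bug after n inputs found none. This is a stand-in chosen for illustration and not necessarily the estimator used in the residual risk work mentioned earlier.

```python
def laplace_estimate(n_inputs_without_bug):
    # Rule-of-succession estimate of the probability that the next input
    # triggers a bug, given that n inputs so far triggered none.
    return 1.0 / (n_inputs_without_bug + 2)

# Example: after a 24-hour campaign that generated one billion inputs
# without finding a bug, the estimated chance that the next input finds
# one is about 1e-9.
print(laplace_estimate(1_000_000_000))
```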