Quick fuzzing with AFL

As part of my projects, I am working on a network-based evolutionary fuzzer which has Driller and honggfuzz at its core. It is capable of code instrumentation and symbolic execution. During the course of this project, I was able to work a bit with AFL, probably one of the most powerful fuzzers out there. However, I never really tamed the beast, and I thought I should get back to it. This post is a starting point for that experience and gives a brief overview of AFL.

Edit:

This was my first time using AFL, so there are many things I had not taken into account. Nevertheless, I am leaving this post here for informational purposes, as most of the commands and the logic are what you would use when doing actual fuzzing. However, please note that:
- When using AFL you should aim for a much better exec speed! I am surprised I found anything, considering my exec speed was around 100 execs/sec. 3000 execs/sec is the minimum you should be aiming for.
- In order to reach maximum coverage, please spend some time creating a test corpus. Use afl-showmap to check the coverage a given input achieves, and use afl-cmin to reduce your testcases to the interesting ones.
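To make the corpus advice a bit more concrete, here is a rough sketch of how afl-cmin and afl-showmap could be chained. The directory names gif_testcase_min/ and traces/ are placeholders I made up, and the target invocation simply mirrors the one used later in this post. With an ASAN build you would most likely also need to pass -m none to both tools.

```shell
# Placeholder paths: adjust to your own layout.
target=./gifsicle/src/gifsicle

if command -v afl-cmin >/dev/null 2>&1 && [ -x "$target" ]; then
  # Keep only the inputs that add coverage no other input already provides.
  # afl-cmin expects the output directory to be empty or not to exist yet.
  afl-cmin -i gif_testcase/ -o gif_testcase_min/ -- "$target" -i -o /dev/null

  # Record the coverage tuples reached by each remaining input.
  mkdir -p traces
  for f in gif_testcase_min/*; do
    afl-showmap -q -o "traces/$(basename "$f").map" -- "$target" -i -o /dev/null < "$f"
  done

  # One line per tuple, so the line count is a rough per-input coverage figure.
  wc -l traces/*.map
else
  echo "afl-cmin or the instrumented target was not found"
fi
```

Inputs whose trace files are much shorter than the others are good candidates for replacement by something more representative.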

(Really) Quick reminders

Fuzzing: an automated vulnerability discovery technique, which you can think of as billions of monkeys behind keyboards feeding random input to your app until it crashes.
Instrumented fuzzing: now picture this: instead of giving purely random input to the application, some fuzzers instrument the app, which means they insert "beacons" in the binary at compile time. This allows them to monitor code coverage and see which paths of the application were hit by their input. They can then select which inputs to mutate (by bit flipping, for example) in order to maximize code coverage. The selection of the inputs to mutate is often based on a genetic algorithm, which is inspired by... Yep! You got it! Genetic evolution.
In order to get efficient code coverage, it is important to select base testcases that follow the application's logic. For instance, don't use a txt file if you're trying to fuzz an application which usually takes GIFs as input.
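The feedback loop described above can be sketched in a few lines of shell. This is a toy, not how AFL is implemented: target_coverage is a made-up stand-in for an instrumented binary, mutate replaces a single character instead of flipping bits, and the scheduling (always mutate the newest kept input) is far cruder than a real genetic algorithm.

```shell
# Toy target (hypothetical): prints the "branches" a given input reaches.
target_coverage() {
  case $1 in
    GIF*) echo "A B C D" ;;
    GI*)  echo "A B C" ;;
    G*)   echo "A B" ;;
    *)    echo "A" ;;
  esac
}

# Mutation: replace one random character with a random uppercase letter.
# $1 is the input, $2 a seed for awk's random number generator.
mutate() {
  awk -v s="$1" -v seed="$2" 'BEGIN {
    srand(seed)
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    pos = int(rand() * length(s)) + 1
    ch = substr(letters, int(rand() * 26) + 1, 1)
    print substr(s, 1, pos - 1) ch substr(s, pos + 1)
  }'
}

queue="AAA"                        # space-separated list of kept inputs
seen="|$(target_coverage AAA)|"    # coverage signatures seen so far
i=0
while [ "$i" -lt 200 ]; do
  i=$((i + 1))
  parent=${queue##* }              # newest kept input
  child=$(mutate "$parent" "$i")
  cov=$(target_coverage "$child")
  case $seen in
    *"|$cov|"*) ;;                 # no new coverage: discard the mutant
    *) seen="$seen$cov|"; queue="$queue $child" ;;
  esac
done
echo "kept inputs: $queue"
```

Each kept input reaches a branch that no earlier input reached, which is why the queue tends to drift from AAA towards strings starting with G, GI and possibly GIF. Whether it gets all the way there in 200 rounds depends on the random draws.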

Fuzzing with AFL

AFL, also known as American fuzzy lop, is an instrumentation-based fuzzer created by Michał Zalewski (author of The Tangled Web and probably one of the most influential people in infosec). It offers several instrumentation methods, most of them specific to AFL.

First and foremost, I decided to fuzz an application which was probably not fuzzed very often. I also did not want to fuzz a network application, as AFL does not work out of the box with network applications (though there are plenty of posts on the internet describing workarounds), and I thought this would be quite a hassle for the first post of a long series (my next post will probably revolve around a network application though).

I decided on https://github.com/kohler/gifsicle. It's a very popular app for manipulating GIFs, and I had no idea whether or not someone had ever felt like fuzzing it.

After installing AFL (pretty simple tbh) I ran:

CC=/home/warsang/afl-2.41b/afl-gcc CXX=/home/warsang/afl-2.41b/afl-g++ LDFLAGS="-pthread" AFL_USE_ASAN=1 ./configure

in the gifsicle directory, so that the binary gets built with AFL instrumentation. The AFL_USE_ASAN flag tells AFL's compiler wrapper to build the target with AddressSanitizer (ASAN).
ASAN is an open source tool from Google.
I'd like to quote the clang documentation, which explains pretty well all you need to know about it:

AddressSanitizer is a fast memory error detector. It consists of a compiler instrumentation module and a run-time library. The tool can detect the following types of bugs:

1. Out-of-bounds accesses to heap, stack and globals

2. Use-after-free

3. Use-after-return (runtime flag ASAN_OPTIONS=detect_stack_use_after_return=1)

4. Use-after-scope (clang flag -fsanitize-address-use-after-scope)

5. Double-free, invalid free

6. Memory leaks (experimental)

Typical slowdown introduced by AddressSanitizer is 2x.
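One thing worth double-checking when building with ASAN: as far as I understand the wrapper, afl-gcc looks at AFL_USE_ASAN in its environment every time it is invoked, so the variable has to be set when make runs too, not only during ./configure. A minimal sketch, assuming AFL was built in the directory named by AFL_DIR (a placeholder variable I introduced here):

```shell
# AFL_DIR is a placeholder; point it at your own AFL checkout.
AFL_DIR="${AFL_DIR:-$HOME/afl-2.41b}"

# Exported so that both ./configure and make see the same settings.
export AFL_USE_ASAN=1
export CC="$AFL_DIR/afl-gcc" CXX="$AFL_DIR/afl-g++"

if [ -x ./configure ] && [ -x "$CC" ]; then
  LDFLAGS="-pthread" ./configure && make clean all
else
  echo "run this from the gifsicle source tree with AFL built in $AFL_DIR"
fi
```

If the build picked up ASAN, feeding the binary a deliberately broken input should produce an AddressSanitizer report instead of a plain segfault.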

So I gave it a shot and after running a

make clean all

I ran:

afl-fuzz -m none -i gif_testcase/ -o output/ ./gifsicle/src/gifsicle -i -o toto.gif

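afl-fuzz is the part of AFL which does the actual fuzzing. -i points to the directory holding the initial testcases and -o to the directory where AFL writes its findings. Everything after the path to the binary is passed to gifsicle itself, which reads the GIF from standard input when no input file is given.

-m none: tells AFL not to set a memory limit for the target. That is something you want when using ASAN, because ASAN reserves a very large amount of virtual memory.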

Nevertheless, using this method I was at something around 80 execs/second. This was quite slow and I thought it would take ages to fuzz my binary :'(

So I went through AFL docs and found this. Hence I thought I'd give llvm instrumentation a go. My initial testcases were gifs between 4 and 20k so I couldn't really make them smaller.

After installing clang (which I won't cover here as it's also pretty easy)

I ran:

CC=/home/warsang/afl-2.41b/afl-clang-fast CXX=/home/warsang/afl-2.41b/afl-clang-fast++ LDFLAGS="-pthreads" ./configure

Please note that the LDFLAGS here is "-pthreads" and not "-pthread" as before. We're using different compilers so that is probably why.

With afl-clang-fast, the instrumentation is inserted by the compiler itself, which means we benefit from compiler optimisation (have a look at this if you don't really know how a compiler works). The instrumentation we used beforehand was injected by AFL into the assembly produced by gcc, and hence was not as optimised.

After doing another make clean all

I ran again:

afl-fuzz -i gif_testcase/ -o output/ ./gifsicle/src/gifsicle -i -o toto.gif

in a screen session this time. After letting it fuzz for almost 16 hours, I had the following results:

(Image: campaign results)

Quite a slow exec time. I think it's probably because I was running something else at the same time. As you will see the asciinema video had better results.

I found no crashes (which are typically caused by, for example, a buffer overflow vulnerability); however, I did find 43 hangs. That doesn't matter much for a GIF editing binary, but on a network service which needs to stay up all the time, such an input may cause a Denial of Service (DoS). Looking at fuzzer_stats in the AFL output directory, I saw that the coverage was quite good: 466 paths were hit out of the 470 paths found, which works out to over 99%! Keep in mind that this ratio is relative to the paths AFL itself discovered, not to all the code in the binary.

This is quite good, so I doubt that fuzzing any longer would find the crash vulnerability I was hoping for!
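For reference, here is the arithmetic behind that figure, together with a small helper for pulling values out of fuzzer_stats. stat_value is a name I made up, and the key names in the loop are the ones I remember from AFL 2.x, so treat them as assumptions and check them against your own file.

```shell
# Ratio quoted in this post: 466 paths out of 470.
awk 'BEGIN { printf "%.2f%%\n", 466 / 470 * 100 }'   # prints 99.15%

# Print the value of one key from a "key : value" stats file.
# $1 is the key, $2 the path to the file.
stat_value() {
  awk -F':' -v key="$1" '
    { k = $1; gsub(/[ \t]/, "", k) }
    k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v }
  ' "$2"
}

# Key names assumed from AFL 2.x; adjust if your fuzzer_stats differs.
if [ -f output/fuzzer_stats ]; then
  for key in paths_total unique_hangs unique_crashes execs_per_sec; do
    echo "$key = $(stat_value "$key" output/fuzzer_stats)"
  done
fi
```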

Maybe symbolic execution would have helped hit 100% code coverage.
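Coming back to the hangs: AFL saves the offending inputs in the hangs/ subdirectory of its output directory, so they can be replayed against the binary to see which ones really hang. A minimal sketch, assuming the timeout utility from GNU coreutils is available; replay_hangs is a helper name I made up and the 5-second limit is an arbitrary choice.

```shell
# Replay every saved hang against the target with a wall-clock limit.
# $1 is the directory with the hang inputs, $2 the path to the target binary.
replay_hangs() {
  hang_dir=$1
  target=$2
  for f in "$hang_dir"/id*; do
    # If the glob matched nothing, there is nothing to replay.
    [ -e "$f" ] || { echo "no hang files in $hang_dir"; return 0; }
    timeout 5 "$target" -i -o /dev/null < "$f" > /dev/null 2>&1
    echo "$f -> exit status $?"
  done
}

replay_hangs output/hangs ./gifsicle/src/gifsicle
```

An exit status of 124 means timeout had to kill the process, so that input still hangs outside of AFL. Anything else finished within the limit and was probably just slow under fuzzing load.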

(Asciinema recording of the whole fuzzing setup)

I hope you enjoyed this article.
I hope to do a post on fuzzing a network application and/or on symbolic/concolic execution sometime.

Resources

http://lcamtuf.coredump.cx/afl/
http://www.geeknik.net/4rzj8nz7n
http://resources.infosecinstitute.com/fuzzing-mutation-vs-generation/#gref
http://www.aosabook.org/en/llvm.html
https://clang.llvm.org/docs/