Discussion: Fuzzing from First Principles

October 14th, 2021 – Alisa Esage

On September 14, 2024, I had the pleasure of participating in the Off By One Security Podcast by SANS, discussing "Fuzzing from First Principles". The livestream had two parts: first, a concise theoretical presentation on the math behind fuzzing, covering cutting-edge models which underpin all of fuzzing practices, from basic fuzzing to modern coverage-guided evolutionary techniques. The second part was a practical walkthrough of my successes and failures in 0-day vulnerability discovery across a wide range of popular software systems, as published in my github, which served as a basic demo of the model applications. You can find the slides here and watch the livestream recording here. In this follow-up article, I’ll tackle some interesting questions and comments from the livestream chat that didn’t make it into the live Q&A due to time constraints.

Introduction

First, a shoutout to podcast host Stephen Sims for protecting my peace during the presentation. I didn’t realize how many questions flooded in the livestream chat until I opened the recording on YouTube. Stephen expertly managed the chat and filtered a handful of relevant questions for our Q&A so, moving forward, I’ll address some more here. From the feedback, it’s clear that some viewers failed to understand my points and the math behind them. That’s not surprizing; my ideas in this presentation are extremely novel, therefore they are expected to provoke defensive reactions from narrow-minded individuals, leading to arrogant rejections based on false claims. As one commenter pointed out, the math seemed to trigger people. Shrug. Let’s be real: to grasp this presentation, you need a solid understanding of BOTH probability theory and advanced fuzzing practices. That’s not a common combo. But just because some aren't open to new ideas doesn’t mean I owe them repeated explanations, especially when they approach with offense. I do plan to publish a concrete text for those willing to learn. What thrilled me is that some viewers instantly connected with my ideas, evident from their relevant contributions to the livestream chat. Their questions showed general grasp of my theoretical framework, while struggling to connect it with advanced aspects of practical fuzzing that I didn't cover. I’ll discuss those meaningful contributions in this article, along with some off-topic but still interesting questions about binary exploitation, and a bit of casual chat for completeness. Prerequisites: low-level fuzzing internals (as in this masterclass), cutting-edge concepts in software/hardware exploitation, basic ideas of probability theory, Fuzzing from First Principles presentation. Note: certain questions appear out of context as they reacted to a specific thing that I mentioned in the moment. Here, I respond generally.

Fuzzing, Symbolic Execution, and Math

L0u15> Thanks for the awesome talk, it is gold! Besides coverage feedback, what other kinds of feedback do you think fuzzers could use for targets like browsers?

Program branch passing feedback is the only fundamental low-level metric in fuzzing. Note that I replaced your original term "coverage feedback" with a more specific term for a reason. In complex software like browsers (I think you meant JavaScript Engines in particular) it's not so much the code coverage that matters, but the context in which a particular piece of code executes; and other "meta" aspects of program at runtime.

Example. Imagine that you have a piece of JIT compiler code that generates low-level encoding for primitive values. There is a bug that can somehow be triggered from a WASM frontend that uses this lowering backend, but not from a JS frontend that also uses it. It means that achieving code coverage on that procedure isn't a meaningful indicator of progress in your fuzzer, as it pertains to finding the bug, because it misses the high level context. Similarly, one can think of a bug in JS runtime that can only be triggered from JIT-optimized JavaScript code, but not from interpreted code. In these sample scenarios you can technically achieve 100% code coverage on the whole source code and still miss a ton of bugs, with no hopes of getting closer to it with algorithmic optimization. This is obviously a state-of-the-art research topic where no ready solutions exist.

mesh3al32> if you know probability enough, fuzzing could be improved drastically. thanks for giving that to our attention

Exactly this. In fact, it's the core point of my presentation: use theoretical models to guide fuzzing improvement.

Aaron Hernandez> Loved the presentation. What do you think about the future of fuzzing and vulnerability research, considering the shift towards memory safe programming? Will fuzzing for vulns still be effective?

Fuzzing is as fundamental and inevitable as vulnerability. Also, it isn't just for memory safety issues. As technologies evolve, specific fuzzing techniques will change, but the core concept will remain relevant.

One thing that will change in fuzzing, regardless of programming technologies that it will target, is the low-level success rate. Speaking in terms of this presentation, the probability distribution of fuzzer will be consistently improving through research and practical efforts, while reducing the low-level error rate. Symbolic execution assist is one way to do this, which is an active area of research. Another way is to leverage LLM to generate fuzzing inputs.

An ideal fuzzer generates a single input and finds the bug with it, which is impossible - kind of like in low-level physics it's impossible to know a particle's location and speed at the same time, you can only approximate it. To the best of scientific knowledge, this (the principle of uncertainty) is a fundamental property of the universe. Similarly, vulnerability seems to be a fundamental property of systems. And so is fuzzing, as a way to uncover it.

Hhhhhhhhhhgffrfghjkjhyh> Just curious in terms of this example, would symbolic execution not get us a higher rate? I am just a bit confused on the overall scenario I may be a bit wrong

Symbolic execution can greatly enhance the fuzzer's probability distribution in theory, increasing the success rate per execution (i.e., discovering new branches or finding bugs). Ideally, if a branch can be deterministically resolved with an SMT solver, it gives a probability of P=1 for finding the right conditional value, compared to P=1/254 with stochastic blackbox fuzzing (assuming an 8-bit conditional value on branch and single-byte mutator with a true random generator, as an example; Image 1). So, going deeper into the program, the probabilities of finding the next branch correctly drop overall much slower that in random fuzzing and even in coverage guided fuzzing. This is in theory.

Image 1. Example of probabilities in stochastic blackbox fuzzing (singular branch)

In practice, however, this comes with the cost of running the solver, that slows down the fuzzer; as well as other practical constraints. Especially facing the program path explosion, this slowdown accumulates exponentially, rendering symbolic execution-assisted fuzzers unproductive toward deeper program exploration. This is why, at present, symbolic execution-assisted fuzzers are only marginally useful. My model shows exactly why and how it can be improved, though much of the solution will depend on practical challenges.

Note: This is purely theoretical reasoning; I might be overlooking specific implementation details, which I am not extensively familiar with.

mesh3al32> will this [fuzzing] suffer from path explosion as the case with symbolic execution?

Fuzzing suffers from "path explosion" just as symbolic execution, but to a lesser degree. In symbolic execution, the SMT solver overhead is substantial and it accumulates quickly, which at some point effectively halts the fuzzer. In contrast, in normal fuzzing it's the small probabilities of finding correct values which compound over time, while the cost of running the fuzzer and the fuzzed program is largely consistent throughout the program tree (Image 2).

Image 2. Example of probabilities in stochastic blackbox fuzzing (compounding)

This means that a SE-assisted fuzzer may initially outperform a normal fuzzer, but its productivity drops sharply, while a normal fuzzer starts slower but excels deeper into the program.

In practical terms, a SE-assisted fuzzer is better for quickly exploring medium-shallow surfaces, while a normal coverage-guided fuzzer is more productive in the long run, and generally better for targeting ultra-deep code. This is also a purely theoretical reasoning, based on the models that I presented in this livestream.

Alek Say> What about side channels and transitive path that is not represented in program and happens because of lack of constraints knowledge of underlying levels: hw, stlib, os, firmware api

First note that, while the root cause of side-channel attacks and speculative execution bugs isn't in software, they still require specific code constructs to exploit. Therefore, from the fuzzer's perspective, these attack surfaces aren't much different from software. You’re just shifting from fuzzing programs to fuzzing *with* programs, as you need to find the right code construct rather than a bug in the given code.

Core challenge here is bug detection. In software fuzzing, bugs typically manifest as memory access violations (crashes, BSODs, segfaults), but for hardware-level transient issues, this indicator must be synthetically constructed.

Another major challenge is low-level fuzzing optimization. As the root cause of bug lies beyond the code, it isn't possible to apply cutting edge techniques in this scenario, such as symbolic execution or input-to-state correspondence, nor even some basic coverage feedback; which restricts fuzzing to the old ways of random.

There are additional challenges, but overall, the difference between software fuzzing and hardware fuzzing is less than most people think.

Most importantly, there’s zero difference in the mathematical model I presented. Except the "probability distribution of program" will extend to "...of a set of programs", which is a purely practical difference.

Janek> is fuzzing combined with RE the most effective way to discover vulnerabilities ?

True in most cases. With RE used to target the fuzzer to specific deep code, that otherwise would take forever or never to reach with scratch cov-guided fuzzing of the general input vector.

Adam A> Q: in her experience with fuzzer probabilities vs those with vulnerability. Is there something to be said about the difference? IE vulnerabilities are more likely with Fuzz inputs > 5?

Adding extra seed inputs (assuming coverage-guided fuzzing with valid inputs) immediately enhances the fuzzer’s probability distribution. However, it will only align this distribution with the bug space to the extent that it aligns with the full branch coverage distribution. This is beneficial but not ideal. As I mentioned in the presentation, matching Pf to Pv is the core challenge of fuzzing.

Jack> This probability theory is really fascinating that I would like to visit some Mathematical things again to get fundamentals clear.

100%. I've included a couple of textbooks that I like on probability theory in the Reference section.

1337's> Im dumb but doest this mean tldr fuzzing don't work ?

Fuzzing works. Otherwise I wouldn't spare the effort on mathematical modeling it.

At the same time, fuzzing fails at a rate of 99%+ under the hood. Meaning the fraction of fuzzing inputs, and therefore target program runs, that neither crash the program nor discover new branches. They just spin the CPU, and negatively contribute to the climate problem. This situation is normalized in the fuzzing culture, and here is my humble contribution to denormalize it. It isn't smart to scale a 99%+ failure, even if it works to some extent.

If you can improve low-level efficiency of fuzzing, as opposed to compensating it with scaling - the latter being a mainstream trend in fuzzing optimization research - then you will find the bug faster than anybody else. If, after that, you scale your improved fuzzer, you will beat the Google's cloudfuzzer at scale.

I hope this makes sense.

Janek> How do you pick targets for fuzzing ? How to narrow down interesting targets ?

Generally, an ideal target for fuzzing has three properties: 1) complex code with binary data parsing (or a simple grammar that is easy to generate) 2) compact size of input 3) easy to interface with the fuzzer. If the target satisfies, then fuzzing is a no-brainer. If some requirements are off, then you'll need to reverse engineer the target software anyways to tweak and specialize the fuzzer, so maybe find some bugs manually in the process.

For open source software, manual code review will usually be the first choice. As someone with a lot of VR experience, I will likely spot the bug faster than it takes to set up the fuzzer.

When the code is fairly simple, meaning low potential for security issues: fuzzing of such code doesn't make sense, as it will take longer to fuzz than to find the bug manually.

In case of uncertainty get the fuzzer running first, meanwhile reverse engineer or manual analysis. Note that running "off the shelf" fuzzers rarely finds bugs in high-demand software, but at least you'll get a sense of how the program performs under harness and start collecting research feedback.

To summarize, a normal workflow in bug hunting isn't so much about "picking the target for fuzzing", it's more about picking the target first and then considering: "what techniques are best suited to find security bugs in this attack vector, and how to differ from other researchers?".

Binary Exploitation

A1 exe> Will she talk about how to get better at triaging crashes?

Crash triage is a distinct topic which is unrelated to fuzzing, in terms of "how it works". Obviously, this wasn't the topic of discussion, but I'm happy to address it here.

Assuming that your research platform basics are covered (symbolized binary or source code debugging, a snappy debugging setup, a stable crashing test case, etc.), you improve your crash triage by understanding the target software internals. Awareness of common vulnerabilities in target software is helpful, too, so studying the history of vulnerabilities at a mid to deep technical level. This forms the foundation of my specialized Vulnerability Research trainings (such as Hypervisor Vulnerability Research, with more target courses coming soon). Without this theory, you’ll struggle during triaging as well as exploit development, or waste excessive time debugging.

صائدون البيض > And also I wish Alisa could explain how to make the exploit portable, like for a Linux kernel exploit how to make it work for other kernel versions etc

How to make the exploit portable: a universal approach.

The first thing to understand is the limitations imposed by the target software system. Sometimes, the system's internals change so drastically between versions that you'll need a completely different bug trigger or heap grooming technique. If you can't determine the system's exact version - common in many scenarios - and you're restricted from crashing the target with brute-force based exploit techniques (like with OS kernels), you're effectively stuck on one version.

In a better scenario, you still can’t know the exact version, but the bug trigger is consistent, only the locations of system libraries vary, that you reuse in the exploit. Here, you can implement a dynamic assembler for ROP gadgets, as I did in my JScript9 exploit.

For OS kernels, a similar approach is possible using something like eBPF JIT compiler - not as the vulnerability target, but as a universal exploit engineering framework. The kernel memory space is unified; there’s no process separation. All threads can see each other, allowing you to use any kernel subsystem as a source of exploit primitives for vulnerabilities in other subsystems.

Ibrahim Aymen> If no one asked yet, how AI will change the fuzzing game

I will cover this topic in my upcoming keynote at VXCON. Hong Kong, 16th November 2024.

Binary House> arent JIT bug already dead?

JIT bugs (assuming JavaScript Engines context) are far from being dead. In fact, they keep evolving beyond mitigations, and will not die any time soon.

Kush> Implementing rust in kernel will stop or somehow limit the attack area? at least for beginner researcher

Kind of. Implementations in Rust have a significantly reduced attack surface, which comes at the cost of poorly understood security implications of the unreduced fraction of the attack surface. So, it isn't necessarily a good idea to rewrite OS kernels in Rust. Rather, maintainting a well audited open source C-code base with a small footprint offers better security guarantees in this scenario, as vulnerability implications of memory-unsafe languages are well understood, and they can be effectively mitigated, given a compact codebase. A good example is KVM (Kernel Virtual Machine hypervisor in Linux Kernel), which is written in pure C, and it's rather hard to break.

Claudius> binexp is literally magic

I know it's a casual take but it's more relevant than you think.

Low level hacking is a game (in the sense of game theory, not “video game”) where, as a hacker, you choose your balance between technology and magic as your strategy. You can rely entirely on technology, which is what most hackers do, but that won’t make you legendary. Pulling off 100% magic is tough and impractical in real life. A balance of around 80% technology and 20% magic elevates you to the elite level.

I touched on this when discussing the “probability distribution collapse” scenario - spending weeks analyzing code without results, only to get an exploitable crash by accident under some tooling; a pure coincidence, apparently. That’s the essence of it.

Just Chat

Joshua J. Drake> please!! teach me how to hack ;-)

My pleasure: zerodayengineering.com/training

Alek Say> Math proof last minute update before exam

Literally. I finalized the slides AND THE MATH MODELS in the last moments leading to the livestream, after a full week of hard intellectual work. There are typos in the slides to prove it.

Ethical Math> hi everyone!

Ethical Math just entered a practical hacking podcast. You saw it here first!

Adam A> Youtube's notifications are trash. just got here to support the stream

Indeed, Youtube's notifications of a livestream going live aren't super convenient. Also, it turns out that Youtube doesn't count livestream visitors as views, so the ~10K views number that currently shows on my livestream recording isn't inclusive of the original 7,500+ live viewers that broke Stephen's podcast record!!

Cat drinking fountain


Kush> in her training, will she also teach about fuzzing? Maze> Id be looking at the v8 internals course

My Zero Day Vulnerability Research course has a module on fuzzing. It also explains the basics of attack surface modeling to decide where to target the fuzzer.

Hypervisor Vulnerability Research training includes hypervisor-specific aspects of fuzzing in the general context of vulnerability discovery in hypervisors, while assuming familiarity with fuzzing foundations.

Fuzzing masterclass shows how to level up your fuzzing game through understanding low-level internals of modern fuzzing tools, with a goal in mind to attack non-trivial software that either isn't supported by fuzzers out of the box, or is extensively fuzzed by the vendor so you need to outsmart them.

JavaScript Engines Internals masterclass covers essential prerequisite knowledge for successful fuzzing of browser JavaScript Engines, not covering the fuzzing itself.

References

Fuzzing from First Principles, A.Esage, 2024, Off By One Security Podcast; slides, recording

Probability theory: - Introduction to Probability Models, 13th Edition, Sheldon M. Ross - Probability Theory: The Logic of Science, E.T. Jaynes

Youtube livestream chat export: YCS Chrome extension

Research Training