Compiler and Machine Learning-based Predictive Techniques for Security Enhancement through Software Debloating
Author(s)
Porter, Christopher
Abstract
Code reuse attacks continue to be a serious threat to software. Attackers today are able to piece together short sequences of instructions in otherwise benign code to carry out malicious actions. Eliminating these reusable code snippets, known as gadgets, has become a prime focus of attack surface reduction research. The aim is to break chains of gadgets, thereby making code reuse attacks impossible or substantially less common. Recent work on attack surface reduction has attempted to eliminate these attacks by subsetting the application, e.g., via user-specified inputs, configurations, or features, to achieve high gadget reductions. However, such approaches either sacrifice soundness (meaning the software might crash or produce incorrect output during attack-free executions on regular inputs) or remain conservative, leaving a large amount of attack surface unaddressed. This thesis develops three techniques that combine static analysis with dynamic, machine learning (ML)-based predictions to address these shortcomings. They are fully sound, obtain strong gadget reduction, and are shown to break shell-spawning gadget chains and stop real-world attacks arising from known Common Vulnerabilities and Exposures (CVEs). The techniques reduce attack surface by activating a minimal set of functions at chosen callsites and deactivating them upon return.
In the first work, BlankIt, we target library code and achieve ~97% attack surface reduction. The technique uses the arguments to library function calls and their static single assignment (SSA)-based backward slices to train an ML model, which then predicts the reachable functions at each callsite using runtime values. In particular, we are able to debloat GNU libc, which is notorious for housing gadgets for code reuse attacks. In the second work, Decker, we target application code and achieve ~73% total gadget reduction. The percentage reduction is similar to prior art, but without sacrificing soundness. Decker instruments the program at compile time at key points to enable and disable code pages; at runtime, the framework executes these permission-mapping calls with minimal overhead (~5%). In the third work, PDSG, we show how to augment the whole-application technique with an accurate predictor to further reduce the potential attack surface. ML-based predictive techniques offer no guarantees and suffer from mispredictions; thus, the predictions are sanitized with lightweight checks. The checks rely on statically derived ensue relations (i.e., valid call-sequence relations) that separate mispredictions from actual attacks. PDSG achieves ~83% total gadget reduction with ~11% runtime overhead; its predictions trigger runtime checking in ~4% of cases.
In conclusion, the thesis empirically shows that it is possible to devise precise and sound attack surface reduction techniques by combining static analysis and ML so that each overcomes the other's inherent limitations: ML prediction aids purely static analysis by improving its precision, and static techniques augment the ML models by providing mechanisms for identifying when a misprediction is truly an attack.
Date
2023-07-30
Resource Type
Text
Resource Subtype
Dissertation