Title:
Improving the Understanding of Malware using Machine Learning

Thumbnail Image
Author(s)
Downing, Evan
Authors
Advisor(s)
Lee, Wenke
Advisor(s)
Person
Editor(s)
Associated Organization(s)
Organizational Unit
Organizational Unit
Series
Supplementary to
Abstract
When a security organization receives a sample (whether it be a binary, script, etc.) from their customers, their goal is to determine if it is malicious or benign. Because samples can be received in large volumes, automated triage and analysis is required to keep up. Broadly speaking, these automated solutions are composed of statistical models and heuristic rulesets, which use distinctive attributes from malicious samples observed in the past. In response, attackers will evolve their samples to evade analysis and detection over time. To evade static analysis, malware binaries can obfuscate themselves by removing system calls and strings from plain view. This prevents reverse engineers from statically identifying binary functions of interest to trigger during dynamic analysis. To evade dynamic analysis detection, malware can randomize their artifacts (such as filenames, process names, etc.), which makes automatically mining behaviors which generalize for future variations difficult. To address these challenges, this thesis proposes a framework to identify malicious functions in static malware binaries for analysis, and behavior combinations in dynamic analysis reports for detection. The framework takes incoming sample binaries submitted to the organization to be analyzed as input. First, DeepReflect localizes malicious functions within the unpacked malware binaries (statically), allowing analysts to target specific regions for further dynamic analysis. DeepReflect increases the malicious function detection Area Under the Curve (AUC) value by 6-10% compared to four state-of-the-art approaches on a dataset of 36k unique, unpacked malware binaries. After executing the samples in a controlled sandbox, BCRAFTY uses its dynamic report to extract and generalize behavior combinations to detect similar malware samples in the future. Compared to using analyst-defined behaviors alone, BCRAFTY increases the malware detection True Positive Rate (TPR) value by 7.5% while keeping the False Positive Rate (FPR) value near 0.3% .
Sponsor
Date Issued
2023-12-06
Extent
Resource Type
Text
Resource Subtype
Dissertation
Rights Statement
Rights URI