Improving the Understanding of Malware using Machine Learning

Downing, Evan

Files

DOWNING-DISSERTATION-2023.pdf (3.69 MB)

Author(s)

Downing, Evan

Advisor(s)

Lee, Wenke

Associated Organization(s)

College of Computing

School of Computer Science

Collections

Theses and Dissertations

Permanent Link

https://hdl.handle.net/1853/73147

Abstract

When a security organization receives a sample (a binary, a script, etc.) from its customers, its goal is to determine whether the sample is malicious or benign. Because samples can arrive in large volumes, automated triage and analysis are required to keep up. Broadly speaking, these automated solutions are composed of statistical models and heuristic rulesets, which rely on distinctive attributes of malicious samples observed in the past. In response, attackers evolve their samples over time to evade analysis and detection. To evade static analysis, malware binaries can obfuscate themselves by removing system calls and strings from plain view. This prevents reverse engineers from statically identifying binary functions of interest to trigger during dynamic analysis. To evade detection during dynamic analysis, malware can randomize its artifacts (such as filenames and process names), which makes it difficult to automatically mine behaviors that generalize to future variants. To address these challenges, this thesis proposes a framework that identifies malicious functions in static malware binaries for analysis and behavior combinations in dynamic analysis reports for detection. The framework takes as input the sample binaries submitted to the organization for analysis. First, DeepReflect statically localizes malicious functions within the unpacked malware binaries, allowing analysts to target specific regions for further dynamic analysis. DeepReflect increases the malicious function detection Area Under the Curve (AUC) value by 6-10% compared to four state-of-the-art approaches on a dataset of 36k unique, unpacked malware binaries. After the samples are executed in a controlled sandbox, BCRAFTY uses the resulting dynamic analysis reports to extract and generalize behavior combinations to detect similar malware samples in the future. Compared to using analyst-defined behaviors alone, BCRAFTY increases the malware detection True Positive Rate (TPR) value by 7.5% while keeping the False Positive Rate (FPR) value near 0.3%.
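
For readers unfamiliar with the detection metrics quoted in the abstract, the following is a minimal sketch of how a True Positive Rate and a False Positive Rate are conventionally computed from binary labels. It is not taken from the dissertation: the function name `tpr_fpr` and the toy labels are hypothetical and only illustrate the standard definitions.

```python
def tpr_fpr(labels, predictions):
    """Return (TPR, FPR) for binary labels, where 1 = malicious and 0 = benign."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # malicious, flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # malicious, missed
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # benign, flagged
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # benign, passed

    # TPR = TP / (TP + FN); FPR = FP / (FP + TN). Guard against empty classes.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical toy data: 4 malicious and 6 benign samples.
labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tpr, fpr = tpr_fpr(labels, predictions)
print(f"TPR={tpr:.3f} FPR={fpr:.3f}")
```

In this toy case three of the four malicious samples are flagged and one of the six benign samples is misflagged. A detector is generally preferred when it raises TPR without raising FPR, which is the trade-off the abstract reports for BCRAFTY.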

Date Issued

2023-12-06

Resource Type

Text

Resource Subtype

Dissertation

Georgia Tech Library
