Organizational Unit:

School of Computer Science

Permanent Link

https://hdl.handle.net/1853/70781

Parent Organization

Organizational Unit

College of Computing

ArchiveSpace Name Record

https://finding-aids.library.gatech.edu/agents/corporate_entities/945

Publication Search Results

Improving the Understanding of Malware using Machine Learning

(Georgia Institute of Technology, 2023-12-06) Downing, Evan

When a security organization receives a sample (whether it be a binary, script, etc.) from its customers, its goal is to determine if it is malicious or benign. Because samples can be received in large volumes, automated triage and analysis are required to keep up. Broadly speaking, these automated solutions are composed of statistical models and heuristic rulesets, which use distinctive attributes from malicious samples observed in the past. In response, attackers evolve their samples to evade analysis and detection over time. To evade static analysis, malware binaries can obfuscate themselves by removing system calls and strings from plain view. This prevents reverse engineers from statically identifying binary functions of interest to trigger during dynamic analysis. To evade dynamic analysis detection, malware can randomize its artifacts (such as filenames, process names, etc.), which makes it difficult to automatically mine behaviors that generalize to future variations. To address these challenges, this thesis proposes a framework to identify malicious functions in static malware binaries for analysis, and behavior combinations in dynamic analysis reports for detection. The framework takes as input the sample binaries submitted to the organization for analysis. First, DeepReflect localizes malicious functions within the unpacked malware binaries (statically), allowing analysts to target specific regions for further dynamic analysis. DeepReflect increases the malicious function detection Area Under the Curve (AUC) value by 6-10% compared to four state-of-the-art approaches on a dataset of 36k unique, unpacked malware binaries. After executing the samples in a controlled sandbox, BCRAFTY uses the resulting dynamic report to extract and generalize behavior combinations to detect similar malware samples in the future. Compared to using analyst-defined behaviors alone, BCRAFTY increases the malware detection True Positive Rate (TPR) value by 7.5% while keeping the False Positive Rate (FPR) value near 0.3%.

Understanding and mitigating security threats in software supply chain

(Georgia Institute of Technology, 2023-11-29) Xiao, Feng

Modern software heavily relies on the software supply chain ecosystem to boost development efficiency and reduce costs. Due to its popularity, securing the software supply chain has become an increasingly critical concern for individuals, organizations, and governments alike. Unfortunately, the inherent vastness, complexity, and interdependence of the software supply chain often render existing security techniques inadequate. In particular, as software developers nowadays incorporate a plethora of unfamiliar third-party code, it is becoming increasingly challenging for existing vulnerability detection and mitigation techniques to understand and restrict program behaviors. To tackle the diverse threats and rising complexities, my dissertation proposes a series of novel program analysis techniques that focus on validating the interactions between supply chain modules. Along this path, I have designed and implemented a robust, end-to-end program analysis framework. In this dissertation, I first present LYNX and JASMINE, which are designed to assist developers in understanding the security-related properties of complex supply chain software. Specifically, LYNX is capable of revealing and comprehending hidden execution paths or input spaces that arise from careless use of supply chain software packages. LYNX has led to the discovery of a novel attack vector, hidden property abusing, as well as 15 previously unknown vulnerabilities. JASMINE, on the other hand, is a scalable program analysis diagram that simplifies the complexity of supply chain security analysis by focusing on inter-module behaviors when analyzing bloated and complex third-party modules. By applying JASMINE to real-world programs in the npm supply chain, we successfully detected 22 new vulnerabilities, many of which were assigned the highest severity rating by the CVSS. In the end, I will present XGuard, a tool designed for developers to implement robust and efficient security protection. 
This tool uses the comprehensive security properties identified by LYNX and JASMINE to automatically generate detailed protection policies. With these policies, XGuard ensures the integrity of data and control flow within the supply chain software.

Web-Based Forensics & Attack Investigations

(Georgia Institute of Technology, 2023-07-10) Allen, Joe

When a data breach transpires, forensic investigators swing into action to unravel the adversary's activities within the enterprise network, necessitating the elucidation of attack-induced damages, identification of sensitive resources accessed by the adversary, and formulation of future defense strategies. The rigorous examination often hinges on the organization's audit logs, which provide insights into each stage of the cyber-kill chain. Addressing this, researchers have devised sophisticated auditing systems that record complete system data provenance. However, a notable drawback is the semantic-gap issue, resulting in limited visibility into web-based attacks, a critical flaw considering the increasing prevalence of such attacks, often used by nation-state adversaries for initial penetration and compromise of enterprise networks. To address this limitation, this thesis presents a web-based attack investigation framework for forensic analysis of web-based attacks, both statically and dynamically, in a postmortem manner. The framework involves a web-based auditor that passively collects audit logs from user browsing sessions at an enterprise level, storing them on a logging server for later analysis. If a data breach occurs, these logs can help determine the root causes and implications of the attack. For static analysis, the logs can be transformed into a causality graph for thorough causality analysis. To demonstrate this, we propose Mnemosyne, a system utilizing audit logs to reconstruct, investigate, and assess the impacts of watering hole attacks. For dynamic analysis, the framework produces replayable causality logs, enabling auditors to identify suspicious events and replay the attack site. To achieve this, we developed WebRR, a novel, OS- and device-independent record-and-replay forensic auditing system for Chromium-based web browsers, allowing an investigator to dynamically analyze the attack through replaying the event postmortem.

Detection and Forensic Analysis of Modern ICS Attacks Via Correlating Scada Host Operations with Physical Behavior

(Georgia Institute of Technology, 2023-07-07) Ike, Moses Junior

The increased cyber connectivity in modern Industrial Control Systems (ICS) has improved the overall operations of life-essential processes such as power and water treatment plants. Unfortunately, it has also widened the cyber-attack surface of ICS, allowing adversaries to penetrate previously air-gapped plants and cause physical disruptions to critical infrastructure. Modern ICS attacks penetrate plants by infecting cyber-facing Supervisory Control and Data Acquisition (SCADA) workstations, which manage physical processes and devices. To evade defenses, attackers use ICS knowledge to stage and blend their attacks with normal SCADA activities, injecting just enough payloads at each step. As such, existing host and physical anomaly-based defenses miss these stealthy tactics due to their inability to correlate SCADA operations with physical behavior. To address this problem, this dissertation presents a hybrid approach that applies ICS domain knowledge to correlate SCADA operations with physical effects, enabling it to analyze the multistage behaviors of modern attacks. To demonstrate the efficacy of my approach, I first present an attack detection technique, SCAPHY. SCAPHY leverages the unique execution phases of SCADA to identify the limited set of behaviors that legitimately control physical processes and that differ from an attacker’s activities. SCAPHY detected real past attacks such as the Ukrainian power disruption. Next, to proactively detect staged attacks, I present FORECAST, a symbolic execution-based exploration of SCADA execution states following suspicious process symptoms. FORECAST detects “not-yet-executed” attacks and ranks them by their likelihood of future execution, enabling operators to prioritize their attack response. Finally, I present a post-mortem attack recovery technique, OTGUARD, which extends the ideas from SCAPHY and FORECAST to connect process symptoms to SCADA infections. OTGUARD uses the physical location of process symptoms to guide a symbolic exploration of multiple SCADA execution states leading up to the attack.

Toward solving the security risks of open-source software use

(Georgia Institute of Technology, 2019-11-11) Duan, Ruian

Open-source software (OSS) has been widely adopted in all layers of the software stack, from operating systems to web servers and mobile apps. Despite its myriad benefits, careless use of OSS can introduce significant legal and security risks, which, if ignored, not only jeopardize the security and privacy of end users but also cause developers and enterprises high financial loss. On one hand, use of OSS implicitly binds the developer to the associated licensing terms protected under copyright laws, which could have legal ramifications if violated. Just recently, Cisco and VMware were involved in legal disputes for failing to comply with the licensing terms of the Linux kernel. On the other hand, software that reuses OSS also inherits its flaws, which could be exploited if not fixed in a timely manner. For example, the record-breaking security breach of Equifax originated from failure to patch a disclosed vulnerability in the open-source Apache Struts framework. Moreover, attackers are actively injecting malware into the open-source ecosystem, abusing OSS reuse to amplify its effects. For example, eslint-scope, a package with millions of downloads on npm, was compromised to steal credentials from developers. In this thesis, we aim to provide solutions to those risks posed by OSS misuse. First, we present a scalable OSS detection system (OSSPolice) that accurately detects OSS included in binary programs and checks for illegal misuse and n-day vulnerabilities in those OSS versions. OSSPolice was used to compare 1.6M apps against 140K OSS versions and identified over 40K potential GPL/AGPL license violators and over 100K apps using known vulnerable OSS. Once vulnerabilities have been identified, our next work (OSSPatcher) provides an automated patching system that fixes vulnerable OSS versions in app binaries using publicly available source patches. OSSPatcher is based upon variability-aware techniques, which make patch feasibility analysis and, more importantly, source-code-to-binary-code matching possible. Third, we present a study (MalOSS) on recent supply chain attacks against the open-source ecosystem, where hundreds of malware packages have slipped into package managers and have been downloaded millions of times. We propose a comparative framework to understand the attacks and the misplaced trust that makes them possible, and a vetting pipeline to detect malware in package managers. MalOSS reported 339 malware packages to package manager maintainers, of which 278 (82 percent) have been confirmed and removed, and 3 with more than 100K downloads have been assigned CVEs.

Efficient and refinable attack investigation

(Georgia Institute of Technology, 2019-10-04) Ji, Yang

As modern attacks become more stealthy and persistent, detecting or preventing them at their early stages becomes virtually impossible. Instead, an attack investigation or provenance system aims to continuously monitor and log interesting system events with minimal overhead. Later, if the system observes any anomalous behavior, it analyzes the log to identify who initiated the attack and which resources were affected by the attack and then assess and recover from any damage incurred. However, because of a fundamental tradeoff between log granularity and system performance, existing systems typically record system-call events without detailed program-level activities (e.g., memory operation) required for accurately reconstructing attack causality or demand that every monitored program be instrumented to provide program-level information. In this thesis, I will present my research focusing on addressing this issue. First, I will present a Refinable Attack INvestigation system (RAIN) based on a record-replay technology that records system-call events during runtime and performs instruction-level dynamic information flow tracking (DIFT) during on-demand process replay. Instead of replaying every process with DIFT, RAIN conducts system-call-level reachability analysis to filter out unrelated processes and to minimize the number of processes to be replayed, making inter-process DIFT feasible. Second, I will present a data flow tagging and tracking mechanism, called RTAG, which further enables practical cross-host attack investigations. RTAG allows lazy synchronization between independent and parallel DIFT instances of different hosts, and applies an optimal tag map to minimize memory consumption. Evaluation results show RTAG is able to recover true data flows of realistic cross-host attack scenarios with low time and memory cost. Furthermore, we deployed RAIN and RTAG in the red team adversarial engagements funded by the DARPA Transparent Computing program. 
The data generated by our system effectively reconstructed the causality of the attacks with high accuracy, even in the presence of knowledgeable attackers.
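
The reachability filtering mentioned in the abstract above can be pictured with a small sketch. This is not RAIN's implementation: it only assumes that logged system-call events can be reduced to (source, destination) data-flow pairs, and the `proc:` / `file:` node names, the `related_processes` helper, and the sample events are all hypothetical. Event ordering in time is ignored here, which a real analysis would need to respect.

```python
from collections import defaultdict, deque


def related_processes(events, start):
    """Return the process nodes reachable from `start` in the flow graph.

    `events` is an iterable of (src, dst) pairs, where an edge src -> dst
    means data may have flowed from src to dst (hypothetical encoding).
    """
    graph = defaultdict(set)
    for src, dst in events:
        graph[src].add(dst)

    # Plain breadth-first search over the directed flow graph.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Hypothetical naming convention: process nodes start with "proc:".
    return {n for n in seen if n.startswith("proc:")}


# Made-up events for illustration only.
events = [
    ("proc:browser", "file:/tmp/payload"),  # browser writes a file
    ("file:/tmp/payload", "proc:dropper"),  # dropper reads that file
    ("proc:dropper", "file:/etc/passwd"),   # dropper writes a sensitive file
    ("proc:editor", "file:/home/u/notes"),  # unrelated activity
]
to_replay = related_processes(events, "proc:browser")
```

In this toy log, only the browser and dropper processes would be candidates for the more expensive replay step; the editor process is filtered out because no flow connects it to the starting point.
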

Identifying and clustering attack-driven crash reports using machine learning

(Georgia Institute of Technology, 2019-04-26) Alzahrani, Ibtehaj M.

We propose a tool that distinguishes crashes caused by failed exploits from benign crashes, and clusters them based on the exploited vulnerabilities in order to prioritize crashes from a security point of view. The tool extracts features from crash reports and decides whether or not a crash was caused by malicious behavior. In the case of malicious behavior, it identifies the attack type that generated the crash report; we focus on four attack types: heap exploitation, shellcode injection, format string attacks, and return-oriented programming. Further, it clusters the crash reports based on the exploited vulnerabilities.

Identifying and mitigating threats from embedding third-party content

(Georgia Institute of Technology, 2017-08-02) Meng, Wei

Embedding content from third parties to enrich features is a common practice in the development of modern web applications and mobile applications. Such practices can pose serious security and privacy threats to an end user, because sensitive data about a user in an application can be directly accessed by third-party content that usually operates with the same privilege as first-party content. The confidentiality and integrity of a user’s indirect data, such as a user profile, may also be compromised by such practices. This dissertation aims to identify new threats posed to end users by the practices of embedding third-party content and develop techniques to mitigate these threats. We first demonstrate how a malicious first-party application can either pollute or infer a user’s indirect data in a third-party service or application by embedding it, and propose defense techniques to mitigate these two new classes of threats. We then study how over-privileged third-party JavaScript code accesses a user’s direct data in a web application in general through a large-scale measurement. This dissertation also aims to design mechanisms that enable end users and developers to limit the privilege of third-party content to prevent unintended behaviors. First, we present TrackMeOrNot, a client-side tracking control mechanism that allows end users to selectively opt out of third-party web tracking based on their demand. Second, we propose a fine-grained permission mechanism for web applications to restrict the privilege of third-party JavaScript code.

Securing software systems by preventing information leaks

(Georgia Institute of Technology, 2017-07-31) Lu, Kangjie

Foundational software systems such as operating systems and web servers are implemented in unsafe programming languages for efficiency, and system designers often prioritize performance over security. Hence, these systems inherently suffer from a variety of vulnerabilities and insecure designs that have been exploited by adversaries to launch critical system attacks. Two typical goals of these attacks are to leak sensitive data and to control victim systems. This thesis aims to defeat both data leaks and control attacks. We first identify that, in modern systems, preventing information leaks can be a general defense that not only stops data leaks but also defeats control attacks. We then investigate three ways to prevent information leaks: eliminating information-leak vulnerabilities, re-designing system mechanisms against information leaks, and protecting certain sensitive data from information leaks. We have developed multiple tools for each way. While automatically and reliably securing complex systems, all these tools impose negligible performance overhead. Our extensive evaluation results show that preventing information leaks can be a general and practical approach to securing complex software systems.

Building trust in the user I/O in computer systems

(Georgia Institute of Technology, 2017-07-26) Jang, Yeong Jin

User input plays an essential role in computer security because it can control system behavior and make security decisions in the system. System output to users, or user output, is also important because it often contains security-critical information whose integrity and confidentiality must be protected, such as passwords and users’ private data. Despite the importance of user input and output (I/O), modern computer systems often fail to provide necessary security guarantees on them, which could result in serious security breaches. This dissertation aims to build trust in the user I/O in computer systems to keep the systems secure from attacks on the user I/O. To this end, we analyze the user I/O paths on popular platforms including desktop operating systems, mobile operating systems, and trusted execution environments such as Intel SGX, and identify that threats and attacks on the user I/O can be blocked by guaranteeing three key security properties of user I/O: integrity, confidentiality, and authenticity. First, GYRUS addresses the integrity of user input by matching the user’s original input with the content of outgoing network traffic to authorize user-intended network transactions. Second, M-AEGIS addresses the confidentiality of user I/O by implementing an encryption layer on top of the user interface layer that provides user-to-user encryption. Third, the A11Y ATTACK addresses the importance of verifying user I/O authenticity by demonstrating twelve new attacks. Finally, to establish trust in the user I/O in a commodity computer system, I built a system called SGX-USB, which combines all three security properties to provide assurance of user I/O. The implemented system supports common user input devices such as a keyboard and a mouse over the trusted channel. Having assurance in user I/O allows the computer system to securely handle commands and data from the user by eliminating attack pathways to a system’s I/O paths.