Title:
Robust Learning Frameworks and Algorithms for Scalable Data Systems

Author(s)
Chow, Ka Ho
Advisor(s)
Liu, Ling
Associated Organization(s)
School of Computer Science
Abstract
The data explosion and advances in machine learning have empowered modern computing systems with intelligence. While machine learning has blossomed in business, science, and engineering with numerous real-world applications, it has also surfaced unprecedented challenges in security, privacy, and scalability. This dissertation research is dedicated to advancing robust learning algorithms and developing scalable frameworks for next-generation trustworthy and responsible intelligent data systems.

The first contribution is a set of risk assessment frameworks for in-depth investigation of security threats in deep learning-driven visual recognition systems, covering vulnerabilities in both the model inference and distributed model training phases. We identify risks unique to object detection systems arising from their multi-task learning nature and introduce TOG, a suite of optimization algorithms that generate deceptive queries to fool well-trained object detection models. TOG targets different loss functions in object recognition to deceive the victim model into misbehaving either randomly or purposefully with domain knowledge. Similarly, we take a holistic approach to understanding data poisoning vulnerabilities in distributed model training. We introduce perception poisoning, which misleads the learning process of the global object detection model in federated learning by selectively poisoning various combinations of objectness, bounding boxes, and class labels. These innovations offer practitioners comprehensive frameworks for risk management and help researchers identify root causes when designing mitigation strategies.

The second contribution is a set of security and privacy risk mitigation frameworks for building reliable systems with robustness guarantees against adversarial manipulation and for enabling privacy-preserving photo sharing against unauthorized face recognition. Deceptive queries at the model inference phase can harm the integrity of intelligent systems.
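The idea behind such deceptive queries can be illustrated in a heavily simplified form: a single gradient-sign step that perturbs an input, within an L-infinity budget, so as to increase the victim model's loss. This is a minimal FGSM-style sketch, not the actual TOG algorithm; the toy linear loss and the names `w`, `x`, and `epsilon` are assumptions made only for illustration.

```python
import numpy as np

# Toy differentiable "detection" loss: a linear scorer standing in for
# a trained detector. Purely illustrative; not the TOG algorithm.
rng = np.random.default_rng(0)
w = rng.normal(size=16)          # stand-in model weights
x = rng.normal(size=16)          # stand-in input "image"

def loss(x):
    # Higher loss means worse detection; an attacker ascends this surface.
    return float(w @ x)

def grad_loss(x):
    return w                     # gradient of the linear toy loss w.r.t. x

# One FGSM-style step: move x in the gradient-sign direction,
# bounded by an L-infinity budget epsilon.
epsilon = 0.1
x_adv = x + epsilon * np.sign(grad_loss(x))

assert loss(x_adv) > loss(x)                     # the deceptive query raises the loss
assert np.max(np.abs(x_adv - x)) <= epsilon + 1e-12  # perturbation stays within budget
```

A real attack on a detector would iterate such steps against the detector's objectness, localization, or classification losses rather than a linear toy.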
They can also be transferred across different models to launch black-box attacks. To counter such a severe threat, we present ODEN, a diversity-driven model fusion framework for robust object detection. It employs a team of models, carefully constructed by our optimization algorithms and focal diversity methodology, to conduct robust fusion through a three-stage technique. ODEN effectively mitigates TOG and other state-of-the-art attacks and even enhances accuracy in the benign scenario. The perception poisoning threat during the distributed training phase is harder to detect: only a small population of clients is present, and malicious clients can contribute gradients inconsistently to obfuscate their identity. We introduce STDLens, a poisoning-resilient federated learning framework with a spatial-temporal forensic methodology that performs timely identification and removal of malicious clients. Even under various adaptive attacks, the STDLens-protected system exhibits no observable performance degradation. Beyond these security threats, we develop defense mechanisms against unauthorized face recognition. Governments, private companies, or even individuals can scrape the web, collect facial images, and build a face database to fuel a face recognition system that identifies people without their consent. We introduce PMask and Chameleon, which let users remove their facial signatures from photos before sharing them online. Even though a signature-removed photo looks similar to its unprotected counterpart, privacy intruders cannot infer meaningful information from it for face recognition.

The third contribution of this dissertation research is a set of machine learning algorithms that strengthen the cyber-resilience and scalability of microservice applications in hybrid clouds. Cyberattacks such as ransomware have been on the rise, and rapid recovery from such attacks with minimal data loss is crucial for business continuity.
We introduce DeepRest, which estimates how many resources are expected to be consumed in serving the application traffic received from end users. It verifies resource usage by comparing the expected consumption with the actual measurement from the microservice application, without assuming workload periodicity. Any statistically unjustifiable resource usage can be flagged as a potential threat. Our extensive studies confirm the effective detection of representative ransomware and crypto-jacking attacks. As a dual-purpose solution, DeepRest is also the first to support resource estimation for unseen future application traffic (e.g., ten times more users purchasing products due to a sale). While this enables precise scaling in advance, the expected resource usage can exceed the capacity of the current private computing infrastructure. We therefore propose Atlas, a hybrid cloud migration advisor that spans the microservice application across private and public clouds, offering virtually unlimited resources while remaining cost-effective and performance-optimized, with the least disruption to the regular operation of the application during the migration process.
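The verification idea described above can be sketched in a minimal form: predict resource usage from observed traffic, compare with the actual measurement, and flag statistically unjustifiable excess. This is a hedged illustration only; the linear traffic-to-CPU model and the z-score threshold are assumptions, not DeepRest's actual learned estimator.

```python
import numpy as np

# Hedged sketch of traffic-driven resource verification. A stand-in
# model maps request volume to expected CPU; residuals far outside the
# normal range indicate usage unjustified by the traffic.
rng = np.random.default_rng(2)
requests = rng.integers(80, 120, size=48)    # requests per interval
cpu_expected = 0.5 * requests                # assumed traffic->CPU model
noise = rng.normal(0.0, 1.0, size=48)
cpu_actual = cpu_expected + noise
cpu_actual[-1] += 25.0                       # e.g. crypto-jacking burns extra CPU

# Flag intervals whose residual is a statistical outlier.
residual = cpu_actual - cpu_expected
z = (residual - residual.mean()) / residual.std()
flags = np.where(z > 3.0)[0]

assert list(flags) == [47]                   # only the attacked interval is flagged
```

The key property, as in the abstract, is that the check is driven by observed traffic rather than an assumed periodic workload, so aperiodic but traffic-justified spikes are not false alarms.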
Date Issued
2023-04-27
Resource Type
Text
Resource Subtype
Dissertation