Title:
Robustness To Visual Perturbations In Pixel-Based Tasks

dc.contributor.advisor Kira, Zsolt
dc.contributor.advisor Hoffman, Judy
dc.contributor.author Yung, Dylan D.
dc.contributor.committeeMember Xu, Danfei
dc.contributor.department Computer Science
dc.date.accessioned 2023-05-18T17:48:03Z
dc.date.available 2023-05-18T17:48:03Z
dc.date.created 2023-05
dc.date.issued 2023-01-13
dc.date.submitted May 2023
dc.date.updated 2023-05-18T17:48:04Z
dc.description.abstract Convolutional Neural Networks (CNNs) have been shown to provide great utility across many vision tasks and have become the go-to model for problems involving video or image input. Though they have shown promise across many problems, they come with inherent flaws. For example, in image classification, CNNs are known to output very high confidence values even when their accuracy is low. This is exacerbated when visual perturbations are introduced to the inputs, causing accuracy to drop while confidence remains high. This is similarly problematic when models use visual inputs for decision-making, as in pixel-based Reinforcement Learning (RL), where an agent must learn a policy that leverages images of the environment as input. RL agents in these settings can perform well in training, but once deployed may face unseen visual perturbations, causing erroneous execution of their learned task. Such poor robustness can be dangerous in applied Machine Learning (ML), for example in medicine and autonomous vehicles. Thus, ways to impart robustness on CNNs for image classification and RL are of utmost importance. In this thesis, we explore solutions to the problem of overconfident image classification models and to embedding robustness to visual perturbations in RL. We propose two distinct frameworks, one for each context: Geometric Sensitivity Decomposition (GSD) for image-based classification and Augmentation Curriculum Learning (AugCL) for decision-making. CNNs utilized for image classification have been shown to be erroneously overconfident. Much of this overconfidence is attributable to the combination of Cross-Entropy loss, the standard loss for classification, and the final linear layer typical of vision models. GSD decomposes the norm of a sample's feature embedding and its angular similarity to a target classifier into an instance-dependent and an instance-independent component.
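The norm-and-angle decomposition just described can be illustrated with a minimal sketch. The function name `decomposed_logits`, the parameter names `alpha` and `beta`, and the exact way the two components recombine with the cosine term are assumptions for illustration, not the thesis's actual formulation:

```python
import numpy as np

def decomposed_logits(features, weights, alpha, beta):
    """Sketch of a GSD-style classification head.

    Standard linear logits factor as ||f(x)|| * ||w_c|| * cos(theta_c).
    The sketch separates an instance-dependent scale, derived from the
    sample's feature norm (alpha * ||f(x)||), from an instance-independent
    learned constant (beta), each applied to the angular similarity.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)   # ||f(x)|| per sample
    f_hat = features / norms                                  # unit-norm features
    w_hat = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_sim = f_hat @ w_hat                                   # cos(theta) per class
    # instance-dependent component scales with the sample's own norm;
    # instance-independent component is a shared learned scalar
    return (alpha * norms + beta) * cos_sim
```

With `beta = 0` and `alpha = 1` this reduces to an ordinary linear layer with unit-norm class weights; the instance-independent term only shifts the overall logit scale, which is what calibration methods typically tune.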
The instance-dependent component captures the sensitive information about changes in the input, while the instance-independent component represents the insensitive information that serves solely to minimize the loss on the training dataset. Inspired by this decomposition, we analytically derive a simple extension to current softmax-linear models that learns to disentangle the two components during training. On several common vision models, the disentangled model outperforms other calibration methods on standard calibration metrics in the face of out-of-distribution (OOD) data and corruption, with significantly less complexity. Specifically, we surpass the current state of the art by a 30.8% relative improvement in Expected Calibration Error on corrupted CIFAR100. Pixel-based RL agents have shown a limited ability to identify and learn visual features when properties such as color are changed. Image augmentation has been shown to mitigate this, but is difficult to balance. AugCL is a novel curriculum learning approach that schedules image augmentation by splitting training into a weak augmentation phase and a strong augmentation phase. We also introduce a novel visual augmentation strategy that proves beneficial on the benchmarks we evaluate. Our method achieves state-of-the-art performance on the DeepMind Control Generalization Benchmark when combined with previous methods.
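The two-phase curriculum can be sketched minimally as follows. The single hard switch point, the choice of a pad-and-crop shift as the weak augmentation, and random channel mixing as the strong augmentation are all illustrative assumptions, not the specific schedule or augmentations proposed in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_aug(img):
    # weak phase: small random pixel shift via pad-and-crop, a common
    # light augmentation in pixel-based RL
    pad = 4
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    y, x = rng.integers(0, 2 * pad, size=2)
    return padded[y:y + h, x:x + w]

def strong_aug(img):
    # strong phase: random channel mixing as an illustrative stand-in
    # for a heavier visual augmentation
    mix = rng.uniform(0.0, 1.0, size=(img.shape[-1], img.shape[-1]))
    mix /= mix.sum(axis=0, keepdims=True)
    return img @ mix

def augment(img, step, switch_step=100_000):
    # two-phase curriculum: weak augmentation early in training so the
    # agent can learn the task, strong augmentation later for robustness
    return weak_aug(img) if step < switch_step else strong_aug(img)
```

The design intuition is that strong augmentation applied from the start can prevent the agent from ever learning the task, while weak augmentation alone leaves it brittle; the curriculum tries to capture the benefits of both.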
dc.description.degree M.S.
dc.format.mimetype application/pdf
dc.identifier.uri https://hdl.handle.net/1853/71957
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject reinforcement learning
dc.subject computer vision
dc.subject calibration
dc.subject robustness
dc.title Robustness To Visual Perturbations In Pixel-Based Tasks
dc.type Text
dc.type.genre Thesis
dspace.entity.type Publication
local.contributor.advisor Kira, Zsolt
local.contributor.advisor Hoffman, Judy
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computer Science
relation.isAdvisorOfPublication 7d182893-486b-4570-87b4-0c4ba0c10626
relation.isAdvisorOfPublication 403cff3c-8f25-4db5-978b-ef617a9f8b6a
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 6b42174a-e0e1-40e3-a581-47bed0470a1e
thesis.degree.level Masters
Files
Original bundle
Name:
YUNG-THESIS-2023.pdf
Size:
4.33 MB
Format:
Adobe Portable Document Format
License bundle
Name:
LICENSE.txt
Size:
3.86 KB
Format:
Plain Text