Title:
Robustness To Visual Perturbations In Pixel-Based Tasks

dc.contributor.advisor Kira, Zsolt
dc.contributor.advisor Hoffman, Judy
dc.contributor.author Yung, Dylan D.
dc.contributor.committeeMember Xu, Danfei
dc.contributor.department Computer Science
dc.date.accessioned 2023-05-18T17:48:03Z
dc.date.available 2023-05-18T17:48:03Z
dc.date.created 2023-05
dc.date.issued 2023-01-13
dc.date.submitted May 2023
dc.date.updated 2023-05-18T17:48:04Z
dc.description.abstract Convolutional Neural Networks (CNNs) have been shown to provide great utility across many vision tasks and have become the go-to model for problems involving video or image input. Though they have shown promise across many problems, they come with inherent flaws. For example, in image classification, CNNs are known to output very high confidence values even when their accuracy is low. This is exacerbated when visual perturbations are introduced to the inputs, causing accuracy to drop while confidence remains high. This is similarly problematic when models use visual inputs for decision-making, as in pixel-based Reinforcement Learning (RL), where an agent must learn a policy that leverages images of the environment as input. RL agents in these settings can perform well in training, but once deployed may face unseen visual perturbations, causing erroneous execution of their learned task. Such poor robustness can be dangerous in applied Machine Learning (ML), for example in medicine and autonomous vehicles. Thus, ways to impart robustness on CNNs for image classification and RL are of utmost importance. In this thesis, we explore solutions to the problem of overconfident image classification models and to embedding robustness to visual perturbations in RL. We propose two distinct frameworks, one for each context: Geometric Sensitivity Decomposition (GSD) for image-based classification and Augmentation Curriculum Learning (AugCL) for decision-making. CNNs utilized for image classification have been shown to be erroneously overconfident. Much of this overconfidence is attributable to the combination of Cross-Entropy loss, the standard loss for classification, and the final linear layer typical of vision models. GSD decomposes the norm of a sample's feature embedding and its angular similarity to a target classifier into an instance-dependent and an instance-independent component.
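The norm-and-angle decomposition just described can be illustrated with a minimal sketch. The function name `decomposed_logits`, the parameter names `alpha` and `beta`, and the exact way the two components recombine with the cosine term are assumptions for illustration, not the thesis's actual formulation:

```python
import numpy as np

def decomposed_logits(features, weights, alpha, beta):
    """Sketch of a GSD-style classification head.

    Standard linear logits factor as ||f(x)|| * ||w_c|| * cos(theta_c).
    The sketch separates an instance-dependent scale, derived from the
    sample's feature norm (alpha * ||f(x)||), from an instance-independent
    learned constant (beta), each applied to the angular similarity.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)   # ||f(x)|| per sample
    f_hat = features / norms                                  # unit-norm features
    w_hat = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_sim = f_hat @ w_hat                                   # cos(theta) per class
    # instance-dependent component scales with the sample's own norm;
    # instance-independent component is a shared learned scalar
    return (alpha * norms + beta) * cos_sim
```

With `beta = 0` and `alpha = 1` this reduces to an ordinary linear layer with unit-norm class weights; the instance-independent term only shifts the overall logit scale, which is what calibration methods typically tune.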
The instance-dependent component captures the sensitive information about changes in the input, while the instance-independent component represents the insensitive information that serves solely to minimize the loss on the training dataset. Inspired by this decomposition, we analytically derive a simple extension to current softmax-linear models that learns to disentangle the two components during training. On several common vision models, the disentangled model outperforms other calibration methods on standard calibration metrics in the face of out-of-distribution (OOD) data and corruption, with significantly less complexity. Specifically, we surpass the current state of the art by a 30.8% relative improvement in Expected Calibration Error on corrupted CIFAR100. Pixel-based RL agents have shown a limited ability to identify and learn visual features when properties such as color are changed. Image augmentation has been shown to mitigate this, but is difficult to balance. AugCL is a novel curriculum learning approach that schedules image augmentation by splitting training into a weak augmentation phase and a strong augmentation phase. We also introduce a novel visual augmentation strategy that proves beneficial on the benchmarks we evaluate. Our method achieves state-of-the-art performance on the DeepMind Control Generalization Benchmark when combined with previous methods.
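The two-phase curriculum can be sketched minimally as follows. The single hard switch point, the choice of a pad-and-crop shift as the weak augmentation, and random channel mixing as the strong augmentation are all illustrative assumptions, not the specific schedule or augmentations proposed in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_aug(img):
    # weak phase: small random pixel shift via pad-and-crop, a common
    # light augmentation in pixel-based RL
    pad = 4
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    y, x = rng.integers(0, 2 * pad, size=2)
    return padded[y:y + h, x:x + w]

def strong_aug(img):
    # strong phase: random channel mixing as an illustrative stand-in
    # for a heavier visual augmentation
    mix = rng.uniform(0.0, 1.0, size=(img.shape[-1], img.shape[-1]))
    mix /= mix.sum(axis=0, keepdims=True)
    return img @ mix

def augment(img, step, switch_step=100_000):
    # two-phase curriculum: weak augmentation early in training so the
    # agent can learn the task, strong augmentation later for robustness
    return weak_aug(img) if step < switch_step else strong_aug(img)
```

The design intuition is that strong augmentation applied from the start can prevent the agent from ever learning the task, while weak augmentation alone leaves it brittle; the curriculum tries to capture the benefits of both.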
dc.description.degree M.S.
dc.format.mimetype application/pdf
dc.identifier.uri https://hdl.handle.net/1853/71957
dc.language.iso en_US
dc.publisher Georgia Institute of Technology
dc.subject reinforcement learning
dc.subject computer vision
dc.subject calibration
dc.subject robustness
dc.title Robustness To Visual Perturbations In Pixel-Based Tasks
dc.type Text
dc.type.genre Thesis
dspace.entity.type Publication
local.contributor.advisor Kira, Zsolt
local.contributor.advisor Hoffman, Judy
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computer Science
relation.isAdvisorOfPublication 7d182893-486b-4570-87b4-0c4ba0c10626
relation.isAdvisorOfPublication 403cff3c-8f25-4db5-978b-ef617a9f8b6a
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 6b42174a-e0e1-40e3-a581-47bed0470a1e
thesis.degree.level Masters
Files
Original bundle
Name:
YUNG-THESIS-2023.pdf
Size:
4.33 MB
Format:
Adobe Portable Document Format
License bundle
Name:
LICENSE.txt
Size:
3.86 KB
Format:
Plain Text