Title:
Robustness To Visual Perturbations In Pixel-Based Tasks
Author(s)
Yung, Dylan D.
Advisor(s)
Kira, Zsolt
Hoffman, Judy
Abstract
Convolutional Neural Networks (CNNs) have been shown to provide great utility across
many vision tasks and have become the go-to model for problems involving video or
image input. Though they have shown promise across many problems, they come with
inherent flaws. For example, in image classification, CNNs are known to output very
high confidence values even when their accuracy is low. This is exacerbated when visual
perturbations are introduced to inputs causing accuracy to drop, but confidence to remain
high. This is similarly problematic when models use visual inputs for decision-making,
such as through pixel-based Reinforcement Learning (RL) where an agent must learn a
policy leveraging images of the environment as input. RL agents under these settings
can perform well in training, but once deployed may face unseen visual perturbation,
causing them to execute their learned task erroneously. Poor robustness to such
perturbations can be deadly in applied Machine Learning (ML) domains such as medicine
and autonomous vehicles. Thus, methods to impart robustness on CNNs for image
classification and RL are of utmost importance. In this thesis, we explore solutions to the problem of
overconfident image classification models and embedding robustness to visual perturbations
in RL. We propose two distinct frameworks for doing so in two contexts: Image-based
classification (Geometric Sensitivity Decomposition (GSD)) and decision-making
(Augmentation Curriculum Learning (AugCL)).
CNNs utilized for image classification have been shown to be erroneously overconfident.
This overconfidence is largely attributed to the combination of Cross-Entropy
loss, the standard loss for classification, and the final linear layer typical of vision
models. GSD decomposes the norm of a sample feature embedding and the angular
similarity to a target classifier into an instance-dependent and an instance-independent
component. The instance-dependent component captures the sensitive information about
changes in the input while the instance-independent component represents the insensitive
information serving solely to minimize the loss on the training dataset. Inspired by
the decomposition, we analytically derive a simple extension to current softmax-linear
models, which learns to disentangle the two components during training. On several
common vision models, the disentangled model outperforms other calibration methods on
standard calibration metrics in the face of out-of-distribution (OOD) data and corruption
with significantly less complexity. Specifically, we surpass the current state of the art by
30.8% relative improvement on corrupted CIFAR100 in Expected Calibration Error.
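Expected Calibration Error (ECE), the metric cited above, measures the bin-weighted gap between a model's confidence and its actual accuracy. A minimal sketch of the standard binned estimator follows; the bin count and equal-width binning are common defaults, not necessarily the exact evaluation protocol of the thesis:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted average of |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()    # empirical accuracy in this bin
            conf = confidences[mask].mean()  # mean predicted confidence in this bin
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

A well-calibrated model (e.g., 90% confidence, 90% accuracy) yields an ECE near zero, while an overconfident model such as the corrupted-input case described above yields a large ECE.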
Pixel-based RL agents often fail to identify and learn visual features when attributes
such as color are changed. Image augmentation has been shown to mitigate this, but its
strength is difficult to balance. AugCL is a novel curriculum learning approach
that schedules image augmentation during training into a weak augmentation phase and a
strong augmentation phase. We also introduce a novel visual augmentation strategy that
proves beneficial on the benchmarks we evaluate. Our method achieves state-of-the-art
performance on the DeepMind Control Generalization Benchmark when combined with
previous methods.
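The two-phase schedule can be sketched as follows. The switch point, the particular weak and strong transforms, and the function names here are illustrative assumptions standing in for the thesis's actual augmentation strategies:

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_augment(obs):
    # Mild perturbation: small random pixel shift (illustrative stand-in).
    dy, dx = rng.integers(-2, 3, size=2)
    return np.roll(obs, (dy, dx), axis=(0, 1))

def strong_augment(obs):
    # Heavy perturbation: blend the observation with random noise
    # (illustrative stand-in for e.g. random convolution or overlay).
    noise = rng.uniform(0.0, 1.0, size=obs.shape)
    return 0.5 * obs + 0.5 * noise

def curriculum_augment(obs, step, switch_step=100_000):
    """Two-phase curriculum (simplified): weak augmentation for the first
    switch_step updates, strong augmentation afterwards."""
    return weak_augment(obs) if step < switch_step else strong_augment(obs)
```

The design intuition, per the abstract, is that the agent first learns the task under mild perturbations before being exposed to harsher ones, rather than facing strong augmentation from the start.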
Date Issued
2023-01-13
Resource Type
Text
Resource Subtype
Thesis