Towards Reliable Computer Vision Systems
Author(s)
Prabhu, Viraj Uday
Abstract
The real world has infinite visual variation – across viewpoints, time, space, and curation. As deep visual models become ubiquitous in high-stakes applications, their ability to generalize across such variation becomes increasingly important. Such generalization will alleviate the need to label a large corpus for every new deployment, which may be infeasible due to data volume (e.g. autonomous driving) or labeling cost (e.g. medical diagnosis). Further, such generalization is necessary to overcome the natural spatiotemporal distribution shifts that a deployed model will invariably face (e.g. changing geographies and seasons). Finally, it will unlock the possibility of knowledge transfer from inexpensive sources of data (e.g. transferring models trained in simulation to reality).
In this thesis, I will present opportunities to improve such generalization at different stages of the ML lifecycle. First, I will discuss proactive strategies for training robust models by leveraging simulation to augment the long tail of real training data. Next, I will present reactive strategies to recover from unforeseen distribution shifts via self-supervised domain adaptation. Finally, I will present a framework to stress-test the robustness of vision models by leveraging foundation models for text and image synthesis to generate challenging counterfactual test cases.
Date
2024-02-28
Resource Type
Text
Resource Subtype
Dissertation