Towards Transparent and Grounded Visual AI Systems

Goyal, Yash
Batra, Dhruv
My research goal is to build transparent and grounded AI systems. More specifically, my research tries to answer the question: do deep visual models make their decisions for the "right reasons"? In my dissertation, I approach this question in two ways. 1. Visual grounding. Grounding is essential for building reliable and generalizable systems that are not driven by dataset biases. In the task of Visual Question Answering (VQA), we would expect models to be visually grounded, i.e., to look at the right regions in the image while answering a question. I address visual grounding in VQA by proposing a) two new benchmark datasets to test visual grounding, and b) a new VQA model that is visually grounded by design. 2. Transparency. Transparency in AI systems can help system designers find the systems' failure modes and can provide guidance for teaching humans. I develop techniques for generating explanations from deep models that give insight into what the models base their decisions on. Specifically, I study a) which parts of the input VQA models focus on while making a prediction, b) a new counter-example explanation modality in which a VQA model has to identify images for which a given question-answer pair is not true, c) counterfactual visual explanations and how such explanations can be used to teach humans, and d) causal concept explanations (for example, explaining a "zebra" class prediction in terms of the human-understandable concept "stripes") obtained by reasoning about the causal relationships between concept explanations, images, and classifier predictions.