Mutual exclusivity bias and spatial reasoning in Vision-Language Models

Author(s)
Thai, Ngoc Anh
Associated Organization(s)
Organizational Unit
School of Interactive Computing
School established in 2007
Abstract
Despite rapid advances in machine learning that enable models to generalize beyond their training data, these models still lag significantly behind young children in learning efficiency. In this dissertation, we draw inspiration from developmental psychology, specifically children’s learning environments and strategies, to inform machine learning algorithms. To this end, we focus on two key aspects of children’s word and object learning: 1) spatial preposition comprehension through 3D information, and 2) mutual exclusivity bias, which facilitates object-word association. We begin by examining the generalization ability of 3D reconstruction models, identifying key factors that influence their capacity to infer complete 3D structures. Extending this exploration, we demonstrate that 2D features with strong semantic correspondence can be effectively leveraged for 3D object part segmentation. Building on the rapid progress in large vision-language models (VLMs), we introduce SplatTalk, a novel approach that uses multi-view RGB images to address 3D Visual Question Answering (3D VQA), a task in which 3D spatial understanding is essential for achieving high performance. To further investigate the capabilities of VLMs and assess whether they exhibit human-like learning biases, particularly those observed in young children, we introduce MEBench, a benchmark for object detection and recognition. This benchmark challenges computational models to leverage mutual exclusivity bias to rapidly associate new semantic concepts with novel objects. Beyond traditional mutual exclusivity bias evaluation, we explore whether VLMs can effectively use spatial information to reason about scenes and resolve ambiguities in uncertain learning environments.
Date
2025-04-02
Resource Type
Text
Resource Subtype
Dissertation