Extracting Knowledge with Multimodal and Multilingual Intelligent Systems
Author(s)
Chen, Yang
Abstract
Recent advances in large language models (LLMs) have revolutionized natural language processing and vision-language tasks. Although these models demonstrate emergent capabilities, they pose challenges for responsible development, particularly with respect to visual world knowledge, privacy, and multilingual capabilities. This thesis addresses these challenges through three main contributions. First, we introduce InfoSeek, a vision-language benchmark that assesses models' ability to leverage world knowledge to answer queries about visual entities. To improve performance on this challenging task, we develop multimodal retrieval-augmented generation systems that acquire knowledge from external resources. Second, inspired by the visual knowledge we uncovered with InfoSeek, we raise an emerging privacy concern: multimodal LLMs can reveal geolocation information from user-posted images. We then present PrivQA, a benchmark that evaluates models' ability to follow access-control instructions and prevent the disclosure of private information. Our findings reveal biases and vulnerabilities in current privacy protection mechanisms, especially in adversarial settings. Third, we propose three approaches to enhance multilingual capabilities and improve information extraction (IE) in low-resource languages: TransFusion, a framework that leverages English translations to enhance multilingual performance; EasyProject, a simplified method for creating synthetic multilingual IE data; and a model selection algorithm that predicts multilingual model performance on unseen languages. Together, these contributions develop and benchmark methods for extracting knowledge with multimodal and multilingual intelligent systems, addressing key challenges in emergent visual knowledge, privacy, and multilingual capabilities.
Date
2024-07-25
Resource Type
Text
Resource Subtype
Dissertation