Augmenting Large Language Models for Situated Multimodal Conversations in Scientific Documents
Author(s)
Sundar, Anirudh
Abstract
The number of scientific articles published has grown rapidly in recent years, particularly in computing: the number of electronic pre-prints submitted to arXiv has increased 30-fold over the last 30 years. At the same time, staying current with peer-reviewed work remains essential for scientific research. To address this challenge, this dissertation presents research toward language-model-based methods for situated and multimodal interactive conversations in scientific documents. First, the problem is decomposed into five research threads based on the taxonomy of research in multimodal machine learning: multimodal representation, translation, fusion, alignment, and co-learning. Next, proofs of concept for the individual research threads are developed, and their efficacy is demonstrated on standard benchmark datasets. Finally, a new dataset, Conversational Papers, is collected and open-sourced specifically for situated and multimodal interactive conversations in scientific documents, and a series of baseline methods is presented to address both the dataset and the overall challenge.
Date
2025-04-23
Resource Type
Text
Resource Subtype
Dissertation