Uncertainty-Aware and Data-Efficient Fine-Tuning and Application of Foundation Models
Author(s)
Li, Yinghao
Abstract
Pre-trained foundation models have become indispensable in modern Natural Language Processing (NLP) and scientific domains, evolving into Large Language Models (LLMs) with impressive zero- and few-shot capabilities. Despite their widespread success, challenges persist in real-world applications due to distribution shifts between training and inference data, opaque inference processes, and the scarcity of manually labeled in-domain examples. These issues complicate reliable confidence estimation and the application of foundation models to downstream tasks. To address these concerns, this thesis explores two primary research directions: 1) reliable Uncertainty Quantification (UQ) and 2) data-efficient model learning.
In addressing reliability, we investigate different UQ methods and develop novel techniques to enhance model calibration. We present the Molecular Uncertainty Benchmark (MUBen), which establishes best practices for UQ in molecular representation models by thoroughly evaluating uncertainty calibration and predictive accuracy on large-scale discriminative tasks. Extending uncertainty estimation to autoregressive LLMs, we introduce Uncertainty Quantification with Attention Chain (UQAC), an approach that employs iterative attention-chain backtracking to approximate an otherwise intractable marginalization over Chain of Thought (CoT) reasoning paths, yielding more robust confidence estimates for LLMs.
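For intuition, the sketch below shows Expected Calibration Error (ECE), a standard metric of the kind such calibration benchmarks report: predictions are grouped into confidence bins, and the gap between mean confidence and empirical accuracy is averaged across bins, weighted by bin size. The binning scheme and variable names are illustrative assumptions, not MUBen's exact implementation.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the per-bin
    gap between mean confidence and empirical accuracy, weighted by
    the fraction of samples falling in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight * calibration gap
    return ece

# Example: overconfident predictions yield a nonzero ECE.
conf = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55]
hits = [1, 0, 1, 0, 1, 0]
print(f"ECE = {expected_calibration_error(conf, hits):.3f}")
```

A well-calibrated model drives this quantity toward zero: among predictions made with, say, 80% confidence, roughly 80% should be correct.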
Regarding data efficiency, our work targets scenarios with limited or noisy labeled data. For zero-shot Named Entity Recognition (NER), we develop the Conditional Hidden Markov Model (CHMM) and its extension, the Sparse Conditional Hidden Markov Model (Sparse-CHMM), which exploit weak supervision signals through contextual embeddings from autoencoding foundation models and employ sparsity regularization to improve robustness. Additionally, we propose Generate and Organize (G&O), a zero-shot Information Extraction (IE) framework that leverages the powerful reasoning abilities of autoregressive LLMs. Lastly, we introduce Ensembles of Low-Rank Expert Adapters (ELREA), designed for data-efficient multi-task fine-tuning, which clusters training instructions by gradient direction and applies task-specific Low-Rank Adaptation (LoRA) experts through ensemble techniques. ELREA mitigates task interference, promoting better generalization and parameter efficiency.
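As a rough illustration of the gradient-direction clustering idea behind ELREA, the sketch below L2-normalizes per-instruction gradient features so that Euclidean k-means effectively groups them by cosine similarity, then routes a query across the resulting experts with a softmax over centroid similarities. The feature extraction, the k-means choice, and the softmax routing are assumptions made for illustration, not the thesis's exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_by_gradient_direction(grad_feats, n_experts=4, seed=0):
    """Cluster instructions by gradient *direction*: L2-normalize each
    gradient feature vector so k-means groups by cosine similarity;
    one LoRA expert would then be trained per cluster (expert training
    itself is omitted in this sketch)."""
    directions = grad_feats / np.linalg.norm(grad_feats, axis=1, keepdims=True)
    return KMeans(n_clusters=n_experts, n_init=10, random_state=seed).fit(directions)

def expert_weights(query_grad, km):
    """Route a query: softmax over cosine similarity between the query's
    gradient direction and each cluster centroid."""
    q = query_grad / np.linalg.norm(query_grad)
    sims = km.cluster_centers_ @ q
    w = np.exp(sims - sims.max())
    return w / w.sum()

# Example with random stand-in gradient features (hypothetical data).
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 64))    # 256 instructions, 64-dim features
km = cluster_by_gradient_direction(feats)
print(km.labels_[:10])                # expert assignment per instruction
print(expert_weights(feats[0], km))   # ensemble weights for one query
```

Routing by similarity rather than hard assignment lets multiple experts contribute to a query that sits between clusters, which is one way an ensemble can soften task boundaries.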
Together, our proposed methods enhance the trustworthiness and adaptability of pre-trained models in critical domains by addressing uncertainty concerns and reducing dependence on extensive labeled data. The thesis underscores the importance of calibration, interpretability, and scalable fine-tuning strategies in developing robust, data-efficient solutions suitable for high-stakes real-world applications.
Date
2025-04-25
Resource Type
Text
Resource Subtype
Dissertation