Enhancing Foundation Models with Self-Guided Techniques: From Attention to Adapters to Agents

Author(s)
Yu, Zhongzhi
Associated Organization(s)
School of Computer Science
Abstract
Foundation models, a class of large-scale transformers pretrained on massive datasets, have achieved remarkable performance across various applications. However, the growing demand to deploy foundation models in real-world applications with diverse resource and capability requirements highlights three critical challenges hindering their broader adoption: (1) the accuracy-efficiency trade-off, where improving accuracy through scaling incurs prohibitive computational costs; (2) inefficient adaptation strategies that require heavy supervision and resources, limiting use in resource-constrained environments; and (3) limited capabilities in handling complex tasks, such as automated hardware code generation and multi-agent collaboration. This thesis addresses these challenges by leveraging our insight that foundation models encode rich representations which, if effectively extracted, can enable self-guided optimization. Specifically, we introduce a set of techniques across three complementary levels, each targeting one of the aforementioned challenges: (1) At the attention level, addressing the accuracy-efficiency trade-off, we introduce the Attention Calibration Technique (ACT), which refines suboptimal attention distributions to improve performance without training, and SpotVLM, which reduces visual token redundancy in video-language models through attention-based selection. (2) At the adapter level, targeting adaptation efficiency, we present Master-ASR, which enables dynamic selection and composition of adapters for efficient model adaptation. (3) At the agent level, targeting complex tasks that require knowledge retrieval and reasoning, we propose Instant-RAG, a retrieval-augmented generation system that hides retrieval overhead within the standard generation workflow to enable efficient knowledge access, and Spec2RTL-Agent, which tackles the challenging task of generating Register Transfer Level (RTL) code directly from specification documents by coordinating multiple foundation models to achieve advanced reasoning capabilities. Together, these techniques form a comprehensive framework for self-guided optimization that addresses key challenges limiting the broader deployment of foundation models, making them more accessible and capable in real-world scenarios.
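To make the attention-level idea concrete, the following is a minimal, hypothetical sketch of training-free attention recalibration in the spirit of what the abstract describes for ACT. It is an illustration under assumptions, not the thesis's actual procedure: the sink token index, the damping factor alpha, and the choice to renormalize each row after damping are all introduced here for exposition.

```python
import torch

def calibrate_attention(attn, sink_idx=0, alpha=0.3):
    """Illustrative training-free attention recalibration (hypothetical sketch).

    attn: attention weights of shape (heads, query_len, key_len),
          where each row along the last dimension sums to 1.
    Dampens the probability mass on a designated "sink" position
    (`sink_idx`) by a factor `alpha`, then renormalizes each row so the
    removed mass is redistributed across the remaining tokens.
    """
    calibrated = attn.clone()
    calibrated[..., sink_idx] *= (1.0 - alpha)          # dampen the sink column
    calibrated /= calibrated.sum(dim=-1, keepdim=True)  # renormalize rows to sum to 1
    return calibrated

# Toy usage: 2 heads, 4 queries, 4 keys of softmax-normalized attention.
attn = torch.softmax(torch.randn(2, 4, 4), dim=-1)
print(calibrate_attention(attn).sum(dim=-1))  # each row still sums to 1
```

Note that renormalization redistributes the removed mass proportionally across the remaining tokens; other redistribution schemes are equally plausible, and the thesis may use a different criterion for identifying which attention entries to refine.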
Date
2025-04-29
Resource Type
Text
Resource Subtype
Dissertation