Hierarchical and Hardware-Aware Optimization for Enhancing AI Model Efficiency: From Bits to Modules to Models
Author(s)
Fu, Yonggan
Abstract
Despite the remarkable advancements of AI foundation models, such as large language models (LLMs), in numerous tasks and applications, deploying these powerful models on everyday devices remains challenging due to their growing computational and memory demands. This challenge hinders the realization of immersive and interactive user experiences that require real-time AI processing on resource-constrained devices.
This PhD thesis aims to bridge this gap through hierarchical and hardware-aware optimization of AI models, maximizing accuracy-efficiency trade-offs to enable ubiquitous edge intelligence. Specifically, the thesis addresses redundancy at the bit, module, and model levels and leverages hardware characteristics to achieve speed-ups on real devices. The proposed techniques include cyclic precision training (CPT) for efficient and accurate bit-level quantization; DepthShrinker and AmoebaLLM for delivering LLMs that are efficient on real hardware through module-level optimization; and, at the model level, Hymba, a new language model architecture for efficient language processing, and Omni-Recon for efficient 3D understanding. Together, these techniques enable real-time execution of complex AI models on everyday devices, advancing efficient AI solutions for ubiquitous edge intelligence.
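To make the bit-level idea concrete: cyclic precision training varies the quantization bit-width cyclically between a lower and an upper bound over the course of training, analogous to cyclic learning-rate schedules. The sketch below is a minimal illustration, not the thesis's implementation; the cosine-shaped cycle and the specific bounds (3 to 8 bits, 32 cycles) are assumptions chosen for demonstration.

```python
import math

def cyclic_precision(step, total_steps, num_cycles=32, low_bits=3, high_bits=8):
    """Illustrative cyclic precision schedule (assumed cosine shape):
    within each cycle, the bit-width rises from low_bits to high_bits,
    and the cycle repeats num_cycles times over training."""
    cycle_len = total_steps / num_cycles
    t = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    bits = low_bits + 0.5 * (high_bits - low_bits) * (1 - math.cos(math.pi * t))
    return int(round(bits))
```

In a quantization-aware training loop, the returned bit-width would parameterize the weight/activation quantizer at each step, so the model periodically trains at low precision (encouraging robustness to quantization) and at high precision (recovering accuracy).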
Date
2025-04-29
Resource Type
Text
Resource Subtype
Dissertation