Advanced Quantization Techniques for Communication Efficiency and Privacy in Federated Learning, and Memory-efficient Fine-tuning of LLMs
Author(s)
Youn, Yeo Joon
Abstract
Quantization techniques play a crucial role in developing communication-efficient Federated Learning (FL) algorithms and in optimizing the training of Large Language Models (LLMs). While careful engineering makes it evident that quantization can significantly improve the efficiency of modern machine learning systems, the theoretical question of why federated optimization algorithms retain their convergence guarantees under quantization, especially when it is combined with other optimization techniques, remains open. Moreover, because privacy-sensitive data on local devices necessitates privacy-preserving training in FL, a quantization scheme must be designed that not only ensures communication efficiency but also upholds rigorous privacy guarantees. Finally, for memory-efficient fine-tuning of LLMs, conventional quantization methods such as QLoRA fall short on extremely low-bit fine-tuning tasks.
This dissertation addresses these three challenges of quantization from an optimization perspective. Specifically, it presents:
1. Federated Optimization Algorithm with Acceleration and Quantization (FedAQ), which tackles the communication bottleneck in federated learning by combining an accelerated federated averaging method, which reduces the number of training and synchronization steps, with an efficient quantization scheme, significantly lowering communication complexity while providing stronger theoretical guarantees.
2. Randomized Quantization Mechanism (RQM), a new algorithm that maps gradients to a randomized discrete grid while preserving Rényi differential privacy, offering improved privacy-accuracy trade-offs in federated learning over the previous state of the art.
3. Quantization Group Adaptive NormalFloat (AdaNF), a redesign of the NormalFloat quantization in QLoRA that dynamically adjusts the CDF offset based on each quantization group's statistics, enabling 2-bit fine-tuning in resource-constrained environments (a brief sketch of the per-group offset idea follows).
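The per-group offset idea behind AdaNF (item 3 above) can be pictured with a minimal sketch. The Python snippet below builds a simplified NormalFloat-style codebook from standard-normal quantiles and picks the tail offset separately for each quantization group; the functions build_nf_codebook and quantize_group, the tail-ratio heuristic, and all parameter choices are illustrative assumptions for this page, not the dissertation's actual AdaNF procedure, and QLoRA's real NF4 codebook construction is more involved than the even spacing used here.

```python
import numpy as np
from scipy.stats import norm


def build_nf_codebook(num_bits: int, offset: float) -> np.ndarray:
    """NormalFloat-style codebook: N(0,1) quantiles on [offset, 1-offset],
    rescaled so the levels span [-1, 1]. Simplified relative to QLoRA's NF4."""
    k = 2 ** num_bits
    probs = np.linspace(offset, 1.0 - offset, k)
    levels = norm.ppf(probs)
    return levels / np.abs(levels).max()


def quantize_group(weights: np.ndarray, num_bits: int = 2):
    """Quantize one group of weights with a group-adaptive tail offset.

    The adaptive rule below (heavier-tailed group -> smaller tail
    probability, so the outer levels reach further into the tails) is
    purely illustrative, not the dissertation's AdaNF rule.
    """
    absmax = np.abs(weights).max() + 1e-12
    normalized = weights / absmax                 # map the group into [-1, 1]

    tail_ratio = np.abs(weights).max() / (weights.std() + 1e-12)
    offset = np.clip(0.5 / tail_ratio, 1e-4, 0.25)

    codebook = build_nf_codebook(num_bits, offset)
    # Nearest-codeword assignment for each normalized weight.
    codes = np.abs(normalized[:, None] - codebook[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), absmax, codebook


def dequantize_group(codes, absmax, codebook):
    """Reconstruct approximate weights from stored codes and the group scale."""
    return codebook[codes] * absmax


# Example: quantize a 64-element group of synthetic weights to 2 bits.
rng = np.random.default_rng(0)
group = rng.normal(scale=0.02, size=64)
codes, scale, cb = quantize_group(group, num_bits=2)
recon = dequantize_group(codes, scale, cb)
print("max abs error:", np.abs(group - recon).max())
```

Storing only the 2-bit codes plus one scale and one offset per group is what makes such schemes attractive for memory-constrained fine-tuning; letting the offset vary with each group's statistics is the degree of freedom the abstract attributes to AdaNF.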
Date
2024-11-19
Resource Type
Text
Resource Subtype
Dissertation