Organizational Unit:
School of Computational Science and Engineering


Publication Search Results

Now showing 1 - 2 of 2
  • Item
    Fast and compact neural network via Tensor-Train reparameterization
    (Georgia Institute of Technology, 2023-08-28) Yin, Chunxing
    The exponential growth of data and model sizes poses a number of challenges for deep learning training. Large neural network layers can be reparameterized via tensor decomposition to compress the model, but at the potential cost of degraded accuracy and extra execution time to reconstruct the layer parameters from the tensorized representation. In this dissertation, we explore neural network compression through Tensor Train (TT) reparameterization. We aim to develop efficient algorithms that accelerate the training of tensorized networks while minimizing memory consumption, and to understand the components necessary for the Tensor Train format to succeed at model compression. We design efficient algorithms to accelerate the training of tensorized layers in Convolutional Neural Networks (CNNs), Deep Learning Recommendation Models (DLRMs), and Graph Neural Networks (GNNs). While the use of TT for compression in CNNs has been suggested in the past, prior art has not demonstrated significant speedups for training or inference. The reason is that conventional implementations of TT-compressed convolutional layers pose several challenges: increased computational work to reconstruct TT-compressed layers, an increased memory footprint due to weight reconstruction, and limited parallel scalability as the effective problem sizes shrink under compression. We address these issues through asymptotic reductions in computation, avoidance of data movement, and an alternative parallelization strategy that significantly improves scalability. In recommendation models, the performance of TT-compressed DLRM (TT-Rec) is further optimized with batched matrix multiplication and caching strategies for embedding-vector lookup operations. In addition, we show mathematically and empirically the effect of the weight initialization distribution on DLRM accuracy and propose initializing the tensor cores of TT-Rec from a sampled Gaussian distribution.
    In the next part of this dissertation, we study node embeddings in graph neural networks, where both the numerical features and the topological graph information must be preserved. We design training schemes that unify hierarchical tensor decomposition and graph topology to exploit graph homophily, and develop novel parameter initialization algorithms that introduce the graph spectrum to improve model convergence and accuracy. Finally, we evaluate our technique on million-node graphs to demonstrate its efficiency and accuracy on real-world graphs, and on synthetic graphs to understand the correlation between graph homophily and weight sharing in TT. While the primary focus of this dissertation is proof-of-concept algorithms, its outcomes can have significant implications for systems. For example, by transforming the data-intensive embedding operator into a compute-intensive, memory-efficient tensorized embedding, we can potentially reconfigure the allocation of system resources within a heterogeneous data center that combines CPUs and GPUs. Moreover, our compression technique would enable storing large modules on a limited-memory accelerator under data parallelism, thereby providing opportunities for optimizing communication.
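    The TT reparameterization the abstract describes can be illustrated with a small sketch. All shapes, ranks, and the Gaussian initialization scale below are hypothetical choices for illustration, not the dissertation's implementation; the sketch only shows how TT cores replace a full embedding table and how one row can be fetched without reconstructing the whole table:

    ```python
    import numpy as np

    # Hypothetical factorization: a 1000 x 512 embedding table viewed as
    # (10*10*10) x (8*8*8), with TT ranks r = [1, 4, 4, 1].
    I, J, r = [10, 10, 10], [8, 8, 8], [1, 4, 4, 1]
    rng = np.random.default_rng(0)

    # Gaussian-initialized TT cores (the scale 0.5 is illustrative; the
    # dissertation argues the initialization distribution matters).
    cores = [rng.normal(0.0, 0.5, size=(r[k], I[k], J[k], r[k + 1]))
             for k in range(3)]

    def tt_reconstruct(cores):
        """Contract the TT cores back into the full weight matrix."""
        out = cores[0]                               # (1, I1, J1, r1)
        for G in cores[1:]:
            out = np.einsum('...a,aijb->...ijb', out, G)
        # Shape is now (1, I1, J1, I2, J2, I3, J3, 1); interleave modes.
        out = out.reshape(I[0], J[0], I[1], J[1], I[2], J[2])
        out = out.transpose(0, 2, 4, 1, 3, 5)        # (I1, I2, I3, J1, J2, J3)
        return out.reshape(np.prod(I), np.prod(J))

    def tt_row(cores, row):
        """Fetch one embedding row without materializing the table."""
        i1, rem = divmod(row, I[1] * I[2])
        i2, i3 = divmod(rem, I[2])
        v = cores[0][:, i1, :, :]                    # (1, J1, r1)
        for G, i in zip(cores[1:], (i2, i3)):
            v = np.einsum('...a,ajb->...jb', v, G[:, i, :, :])
        return v.reshape(-1)                         # length J1*J2*J3 = 512

    W = tt_reconstruct(cores)
    full = np.prod(I) * np.prod(J)                   # 512000 parameters
    tt = sum(G.size for G in cores)                  # 1920 parameters
    print(f"full: {full} params, TT: {tt} params")
    assert np.allclose(tt_row(cores, 7), W[7])
    ```

    The row lookup touches only one slice of each core, which is the kind of reconstruction-avoiding access pattern that makes TT-compressed embedding lookups practical.
    
    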
  • Item
    Multifidelity Memory System Simulation
    (Georgia Institute of Technology, 2023-08-25) Lavin, Patrick
    As computer systems grow larger and more complex, simulating them in detail takes ever more time. Researchers interested in simulating large systems must choose between simpler, less accurate models and simulating smaller portions of their benchmarks, both of which are often manual, offline approaches that require time-consuming analysis by experts. Multifidelity simulation aims to lessen this burden by adapting the fidelity of a simulation to the complexity of the behavior being simulated. A multifidelity simulation is one that can use multiple models of the same phenomenon at different levels of fidelity. We borrow the term from the simulation of physical systems, where scientists may use models with more or fewer terms, or resolve their models on smaller or larger grids, depending on the nature of the behavior at any point in the simulation. We apply those ideas to computer architecture simulation. In this dissertation, we present our novel multifidelity computer architecture simulation algorithm and implement it in two models: one for the cache and one for the entire memory system. Our cache model automatically trains and chooses between low-fidelity models online, adapting to the complexity of the modeled behavior. The second model, for the memory system, refines the ideas developed for the first. We use statistical techniques to choose the data used to build the low-fidelity models, and we implement this work as reusable components within SST, a widely used simulator. This model achieves up to 2x speedup with only 1-5% mean error in instructions per cycle (IPC).
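    The fidelity-switching mechanism can be sketched in a few lines. Everything below is hypothetical and greatly simplified — a single direct-mapped cache stands in for the high-fidelity model, a hit-rate replay for the low-fidelity one, and the switch happens unconditionally after the first observation window. The dissertation's approach trains and selects among low-fidelity models online and can detect phase changes; this sketch only illustrates the basic mechanism:

    ```python
    import random
    from collections import deque

    class DetailedCache:
        """High-fidelity model: a direct-mapped cache simulated per access."""
        def __init__(self, n_sets=64, line=64):
            self.tags = [None] * n_sets
            self.n_sets, self.line = n_sets, line

        def access(self, addr):
            s = (addr // self.line) % self.n_sets
            tag = addr // (self.line * self.n_sets)
            hit = self.tags[s] == tag
            self.tags[s] = tag
            return hit

    class StatisticalCache:
        """Low-fidelity model: replays a hit rate observed from the detailed model."""
        def __init__(self, hit_rate):
            self.hit_rate = hit_rate

        def access(self, addr):
            return random.random() < self.hit_rate

    def multifidelity_sim(trace, window=1000):
        model = DetailedCache()
        recent = deque(maxlen=window)
        hits = detailed_accesses = 0
        for i, addr in enumerate(trace):
            if isinstance(model, DetailedCache):
                detailed_accesses += 1
            hit = model.access(addr)
            recent.append(hit)
            hits += hit
            # At the first window boundary, drop to the cheap model using the
            # hit rate the detailed model just produced.  A real controller
            # would test that behavior is stable before switching, and would
            # switch back to the detailed model on a detected phase change.
            if i % window == window - 1 and isinstance(model, DetailedCache):
                model = StatisticalCache(sum(recent) / len(recent))
        return hits / len(trace), detailed_accesses

    random.seed(0)
    trace = [random.randrange(1 << 14) & ~63 for _ in range(10_000)]
    rate, slow = multifidelity_sim(trace)
    print(f"hit rate ~{rate:.2f}, detailed-model accesses: {slow}")
    ```

    Here only the first 1000 of 10000 accesses pay the detailed-model cost; the speedup/accuracy trade hinges entirely on how well the low-fidelity model captures the behavior of the window it was fit to.
    
    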