Title:
Fast and compact neural network via Tensor-Train reparameterization

Author(s)
Yin, Chunxing
Advisor(s)
Vuduc, Richard W.
Abstract
The exponential growth of data and model sizes poses a number of challenges for deep learning training. Large neural network layers can be reparameterized via tensor decomposition to compress the model, but at the potential cost of degraded accuracy and additional execution time to reconstruct the layer parameters from the tensorized representation. In this dissertation, we explore neural network compression through Tensor-Train (TT) reparameterization. We aim to develop efficient algorithms that accelerate the training of tensorized networks while minimizing memory consumption, and to understand which components are necessary for the Tensor-Train format to succeed at model compression. We design efficient algorithms to accelerate the training of tensorized layers in Convolutional Neural Networks (CNNs), Deep Learning Recommendation Models (DLRMs), and Graph Neural Networks (GNNs).

While the use of TT for compression in CNNs has been suggested in the past, prior work has not demonstrated significant speedups for training or inference. The reason is that conventional implementations of TT-compressed convolutional layers pose several challenges: increased computational work to reconstruct TT-compressed layers, an increased memory footprint due to weight reconstruction, and limited parallel scalability as the effective problem sizes shrink under compression. We address these issues through asymptotic reductions in computation, avoidance of data movement, and an alternative parallelization strategy that significantly improves scalability.

In recommendation models, the performance of TT-compressed DLRM (TT-Rec) is further optimized with batched matrix multiplication and caching strategies for embedding-vector lookups. In addition, we show mathematically and empirically the effect of the weight-initialization distribution on DLRM accuracy and propose initializing the tensor cores of TT-Rec from a sampled Gaussian distribution.

In the next part of this dissertation, we study node embeddings in graph neural networks, where both numerical features and topological graph information must be preserved. We design training schemes that unify hierarchical tensor decomposition with graph topology to exploit graph homophily, and we develop novel parameter-initialization algorithms that incorporate the graph spectrum to improve model convergence and accuracy. Finally, we evaluate our technique on million-node graphs to demonstrate its efficiency and accuracy on real-world graphs, as well as on synthetic graphs to understand the correlation between graph homophily and weight sharing in TT.

While the primary focus of this dissertation lies in exploring proof-of-concept algorithms, its outcomes can hold significant implications for systems. For example, by transforming the data-intensive embedding operator into a compute-intensive, memory-efficient tensorized embedding, we can potentially reconfigure the allocation of system resources within a heterogeneous data center that combines CPUs and GPUs. Moreover, our compression technique would enable storing large modules on a limited-memory accelerator with data parallelism, thereby providing opportunities for optimizing communication.
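To make the core idea concrete, the sketch below shows a Tensor-Train reparameterized embedding lookup in the spirit of TT-Rec, where the full table is never materialized and each requested row is contracted from small TT cores on the fly. This is an illustrative NumPy sketch, not the dissertation's implementation: the class name, the chosen factorizations of the row and column dimensions, the rank, and the 0.1 initialization scale are all assumptions made for the example.

```python
# Minimal sketch of a TT-reparameterized embedding table (illustrative only).
# An implicit table W of shape (N, D), with N = n1*n2*n3 and D = d1*d2*d3,
# is stored as three TT cores G_k of shape (r_{k-1}, n_k, d_k, r_k).
import numpy as np


class TTEmbedding:
    def __init__(self, n_dims=(200, 220, 250), d_dims=(4, 4, 8), rank=16, seed=0):
        rng = np.random.default_rng(seed)
        ranks = (1, rank, rank, 1)  # boundary TT ranks r_0 = r_3 = 1
        self.n_dims, self.d_dims = n_dims, d_dims
        # Gaussian core initialization. The dissertation derives a specific
        # sampled-Gaussian scheme so the *reconstructed* rows follow a target
        # distribution; the 0.1 standard deviation here is only a placeholder.
        self.cores = [
            rng.normal(0.0, 0.1, size=(ranks[k], n_dims[k], d_dims[k], ranks[k + 1]))
            for k in range(3)
        ]

    def _digits(self, idx):
        """Mixed-radix decomposition of a row index into (i1, i2, i3)."""
        n1, n2, n3 = self.n_dims
        return idx // (n2 * n3), (idx // n3) % n2, idx % n3

    def lookup(self, idx):
        """Reconstruct row `idx` of the implicit (N, D) table from the TT cores."""
        i1, i2, i3 = self._digits(idx)
        a = self.cores[0][0, i1]        # (d1, r1)
        b = self.cores[1][:, i2]        # (r1, d2, r2)
        c = self.cores[2][:, i3, :, 0]  # (r2, d3)
        # Contract over the TT ranks: O(D * r^2) work per lookup, and only the
        # requested row is ever formed -- no full-table reconstruction.
        row = np.einsum("ar,rbs,sc->abc", a, b, c)
        return row.reshape(-1)          # length d1*d2*d3 = D


emb = TTEmbedding()
vec = emb.lookup(123456)
print(vec.shape)  # (128,): D = 4*4*8, while N = 200*220*250 = 11M rows are implicit
```

In practice, batching many lookups into grouped (batched) matrix multiplications over shared core slices, plus caching frequently accessed rows, is what recovers performance, as the abstract describes for TT-Rec; the per-row contraction above only illustrates the memory trade-off.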
Date Issued
2023-08-28
Resource Type
Text
Resource Subtype
Dissertation