Organizational Unit:
School of Computational Science and Engineering


Publication Search Results

Now showing 1 - 10 of 199
  • Item
    Methodologies for co-designing supercomputer-scale systems and deep learning software
    (Georgia Institute of Technology, 2024-04-27) Isaev, Mikhail
    This dissertation introduces new methodologies to co-design deep learning software and supercomputer hardware in the setting of large-scale training. The first is an analytical performance model for exploring the co-design space of parallel algorithms for large language models (LLMs) and potential supercomputer architectures during the early phases of the co-design process. On the algorithm side, we consider diverse implementation strategies, including data, tensor, and pipeline parallelism, communication-computation overlap, and memory optimization. The hardware aspect includes hierarchical memory systems, multiple interconnection networks, and parameterized efficiencies based on operation size. Our open-source tool, Calculon, implements this model. Its analytical nature enables rapid evaluation, estimating performance for billions of strategy and architecture combinations. This facilitates co-design-space exploration for future LLMs with trillions of parameters, yielding insights into optimal system characteristics and the interplay between algorithmic and architectural decisions. As models scale beyond 100 trillion parameters, two bottlenecks become especially critical to address: memory capacity and network speed. For the former, Calculon suggests a hardware solution involving the addition of slower capacity-tier memory for intermediate tensors and model parameters, optimizing faster memory for current-layer computation. For the latter, we present novel distributed-memory parallel matrix multiplication algorithms capable of hiding communication entirely, potentially achieving perfect scaling. Looking ahead, we foresee a need to model artificial intelligence (AI) applications beyond LLMs and perform detailed system simulations in later design stages. To meet these demands, we introduce ParaGraph, a tool bridging the gap between applications and network hardware simulators. ParaGraph features a high-level graph representation of parallel programs that is automatically extracted from compiled applications, together with a runtime environment for emulator-based dynamic execution. Case studies on deep learning workloads extracted from JAX and TensorFlow programs illustrate ParaGraph's utility for software-hardware co-design workflows, including communication optimization, hardware bottleneck identification, and simulation validation.
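    As a rough illustration of the analytical style of modeling described above (a minimal sketch only; these are not Calculon's actual equations, and every parameter value below is an illustrative assumption), a first-order estimate might combine a compute term with a data-parallel gradient all-reduce term:
    ```python
    # Toy first-order model of one LLM training step (illustrative assumptions throughout).
    def step_time_estimate(params, tokens_per_batch, peak_flops, net_bw,
                           tensor_parallel, data_parallel, flops_per_param_token=6):
        """Estimate step time as compute time plus gradient all-reduce time (no overlap)."""
        total_flops = flops_per_param_token * params * tokens_per_batch  # fwd+bwd rule of thumb
        compute_s = total_flops / (peak_flops * tensor_parallel * data_parallel)
        grad_bytes = 2 * params * 2          # fp16 gradients, ~2x traffic for a ring all-reduce
        comm_s = grad_bytes / net_bw
        return compute_s + comm_s            # an overlap-aware model would not simply add these

    # Example: 1T-parameter model on 1,024 accelerators (8-way tensor x 128-way data parallel),
    # assuming 1 PFLOP/s per accelerator and 100 GB/s of all-reduce bandwidth.
    t = step_time_estimate(params=1e12, tokens_per_batch=4e6, peak_flops=1e15,
                           net_bw=100e9, tensor_parallel=8, data_parallel=128)
    print(f"estimated step time: {t:.1f} s")
    ```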
  • Item
    Accurate and Trustworthy Recommender Systems: Algorithms and Findings
    (Georgia Institute of Technology, 2024-04-15) Oh, Sejoon
    The exponential growth of information on the Web has led to the problem of “information overload,” which has been addressed through the use of recommender systems. Modern recommender systems use deep learning algorithms trained with user-item interaction data to generate recommendations. However, current recommender systems still face diverse challenges with respect to accuracy, personalization, and robustness. In this thesis, we investigate such challenges and provide insights and solutions to them. This thesis is divided into two parts: (1) making recommender systems accurate and personalized, and (2) making recommender systems robust and trustworthy. First, we study session-based recommender systems (SBRSs) and user intent-aware recommender systems, which have been proposed to enhance accuracy and personalization via modeling users’ short-term and evolving interests. Existing recommender systems face two significant limitations. First, they cannot incorporate session contexts or user intents (i.e., high-level interests) into their models, which could improve next-item prediction. To address this, we propose a novel SBRS, ISCON, which assigns precise and meaningful implicit contexts to sessions via node embedding and clustering algorithms. By leveraging the session contexts found by ISCON, we can offer more personalized recommendations to end users. We also propose a new recommendation framework, INTENTREC, which predicts a user’s intent on Netflix and uses it as an input feature for that user’s next-item prediction. The user intents obtained by INTENTREC can be used for diverse applications such as real-time recommendations and personalized UI and notifications. Second, existing recommender systems cannot scale to large real-world recommendation datasets. To handle the scalability issue, we propose M2TREC, a metadata-aware multi-task Transformer model that uses only item attributes to learn item representations and is completely item-ID free. With M2TREC, we can achieve faster convergence, higher accuracy, and robust recommendations with less training data. Sparse training data can cause recommendation models to produce incorrect and popularity-biased recommendations. It is well known that most recommendation datasets are extremely large and sparse, limiting the ability of models to generate effective representations for cold-start users or items with few interactions. To address the sparsity issue, we devise DAIN, an influence-guided data augmentation technique that adds to the original data the points most important for reducing training loss. With DAIN, we can enhance the recommendation model’s generalization ability and mitigate cold-start and popularity-bias problems. Apart from accuracy and personalization, we also analyze the robustness of existing recommender systems against input perturbations and devise solutions to enhance their robustness. Deep learning-based recommender systems have shown sensitivity to arbitrary and adversarial input perturbations, resulting in drastic alterations of recommendation lists. This sensitivity disproportionately affects low-accuracy user groups compared to high-accuracy groups, making the models unreliable and detrimental to both users and service providers, particularly in high-stakes applications such as healthcare, education, and housing. Despite its importance, the stability of recommender systems has not been studied thoroughly.
    Thus, we first introduce two Rank List Sensitivity (RLS) metrics that allow us to measure changes in recommendations under perturbations, and we propose two training data perturbation mechanisms (random and CASPER) for recommender systems. We show that existing sequential recommenders are highly vulnerable to CASPER and even to random perturbations. We further introduce a fine-tuning mechanism called FINEST that can stabilize predictions of sequential recommender systems against training data perturbations. FINEST simulates perturbations during fine-tuning and utilizes a rank-preserving loss function to ensure stable recommendations. With FINEST, any sequential recommender becomes more robust to interaction-level perturbations. Finally, we investigate the robustness of text-aware recommender systems against adversarial text rewriting. Our proposed text rewriting framework (ATR) can generate optimal product descriptions via two-phase fine-tuning of language models. Such rewritten product descriptions can significantly boost the ranks of target items, and attackers can exploit the vulnerability of text-aware recommenders to promote their own items on diverse web platforms such as e-commerce sites. Our work highlights the importance of studying the robustness of existing recommenders and the need for defense mechanisms against text rewriting attacks such as ATR. Overall, we propose next-generation recommendation frameworks that improve accuracy, personalization, and robustness. We also suggest several ongoing and future directions, including a unified robustness benchmark of existing recommender systems, adversarial attacks/defenses against multimodal recommenders, and leveraging emerging large language models to maximize the accuracy, personalization, and interpretability of recommender systems.
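    As a hedged illustration of measuring rank-list stability (the actual RLS metric definitions are given in the dissertation; the top-k Jaccard overlap below is only an illustrative stand-in):
    ```python
    # Minimal sketch: quantify how much top-k recommendations change after a perturbation.
    def jaccard_at_k(list_before, list_after, k=10):
        """Overlap between the top-k items recommended before and after a perturbation."""
        a, b = set(list_before[:k]), set(list_after[:k])
        return len(a & b) / len(a | b)

    def mean_sensitivity(recs_before, recs_after, k=10):
        """Average instability across users; 0 means recommendations are fully stable."""
        scores = [1 - jaccard_at_k(rb, ra, k) for rb, ra in zip(recs_before, recs_after)]
        return sum(scores) / len(scores)

    # One user's top-5 list before and after perturbing a single training interaction
    print(mean_sensitivity([[1, 2, 3, 4, 5]], [[1, 2, 9, 4, 8]], k=5))   # ~0.57, i.e. unstable
    ```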
  • Item
    Federated approaches for the visualization and analysis of neuroimaging data
    (Georgia Institute of Technology, 2023-12-13) Saha, Debbrata Kumar
    In the neuroimaging domain, the data collection process is expensive, and attempting to pool data from multiple imaging sites faces numerous challenges, including variations in data acquisition protocols from site to site. There is also concern about revealing the identities of rare disease subjects. These data-sharing challenges keep datasets smaller than desired, ultimately limiting the benefits of large-scale data in research. This dissertation aims to address these challenges. First, we develop a federated embedding algorithm for the quality control of neuroimaging datasets. Our algorithm demonstrates superior performance on challenges that several notable existing algorithms struggle to solve. Subsequently, we introduce a privacy-preserving algorithm tailored to the neuroimaging domain, ensuring formal mathematical privacy guarantees during message passing in federated computation. The integration of this algorithm with the existing software platform for federated neuroimaging has been demonstrated, making our methods readily available as tools for neuroimaging users worldwide. Our third proposed approach emphasizes fast federated communication with more stringent privacy assurances. Lastly, we design a federated algorithm to extract multivariate patterns (covarying networks) from structural magnetic resonance imaging (sMRI) data for the analysis of brain morphometry. Together, these four methods enable neuroimaging users to perform analyses in federated environments where running them centrally is typically not possible.
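    As a minimal sketch of the kind of privacy-preserving message passing described above (not the dissertation's algorithm; the clipping and noise levels are illustrative rather than a formal differential-privacy accounting), each site might share only a clipped, noised summary:
    ```python
    import numpy as np

    def site_message(local_data, clip=1.0, noise_std=0.1, rng=None):
        """A site sends a clipped, noised summary statistic instead of raw scans."""
        rng = rng or np.random.default_rng()
        summary = local_data.mean(axis=0)
        summary = summary / max(1.0, np.linalg.norm(summary) / clip)   # bound the message norm
        return summary + rng.normal(0.0, noise_std, size=summary.shape)

    def aggregate(messages):
        """The coordinator only ever sees noised site-level summaries."""
        return np.mean(messages, axis=0)

    rng = np.random.default_rng(0)
    sites = [rng.normal(size=(50, 4)) for _ in range(3)]     # 3 imaging sites, 4 features each
    print(aggregate([site_message(d, rng=rng) for d in sites]))
    ```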
  • Item
    Safe Explanations And Explainable Models For Neuroimaging Data Through A Framework Of Constraints
    (Georgia Institute of Technology, 2023-12-11) Lewis, Noah Jerome
    Neuroimaging data, which can be highly complex and occasionally inscrutable, requires robust, reproducible, and domain-specific methods. Deep learning and model explainability have become common methods for analyzing neuroimaging data. However, the complex, obscure, and sometimes flawed nature of both deep learning and explainability compounds the difficulties in neuroimaging analysis. This dissertation addresses several of these issues with explainability by employing a framework of constraint-based solutions. These constraints span the entire modeling pipeline, including initialization, model parameters and gradients, and loss functions. To familiarize readers with the field, this dissertation begins with a comprehensive investigation of current explainability methods, both in general and specific to neuroimaging, and then describes the three constraint-based methodologies that comprise this framework. First, we develop an attention-based constraint for recurrent models that resolves vanishing saliency. Vanishing saliency is closely related to vanishing gradients, a common training issue in which gradients shrink toward zero during backpropagation. Our second proposed method is a set of initialization constraints that target underspecification and its implications for post-hoc explanations. Our final proposed method leverages inherent neuroimaging-based geometric information in the input to constrain the optimization approach to produce more interpretable models. These three constraint methods amount to a broad framework that provides a robust and reproducible explanatory system appropriate for neuroimaging.
  • Item
    Interactive Scalable Discovery Of Concepts, Evolutions, And Vulnerabilities In Deep Learning
    (Georgia Institute of Technology, 2023-12-05) Park, Haekyu
    Deep Neural Networks (DNNs) are increasingly prevalent, but deciphering their operations is challenging. Such a lack of clarity undermines trust and problem-solving during deployment, highlighting the urgent need for interpretability. How can we efficiently summarize concepts models learn? How do these concepts evolve during training? When models are at risk from potential threats, how do we explain their vulnerabilities? We address these concerns with a human-centered approach by developing novel systems to interpret learned concepts, their evolution, and potential vulnerabilities within deep learning. This thesis focuses on three key thrusts: (1) Scalable Automatic Visual Summarization of Concepts. We develop NeuroCartography, an interactive system that scalably summarizes and visualizes concepts learned by a large-scale DNN, such as InceptionV1 trained with 1.2M images. A large-scale human evaluation with 244 participants shows that NeuroCartography discovers coherent, human-meaningful concepts. (2) Insights to Reveal Model Vulnerabilities. We develop scalable interpretation techniques to visualize and identify internal elements of DNNs that are susceptible to potential harms, aiming to understand how these defects lead to incorrect predictions. We develop first-of-their-kind interactive systems such as Bluff, which visually compares the activation pathways of benign and attacked images in DNNs, and SkeletonVis, which explains how attacks manipulate human joint detection in human action recognition models. (3) Scalable Discovery of Concept Evolution During Training. Our first-of-its-kind ConceptEvo unified interpretation framework holistically reveals the inception and evolution of learned concepts and their relationships during training. ConceptEvo enables powerful new ways to monitor model training and discover training issues, addressing critical limitations of existing post-training interpretation research. A large-scale human evaluation with 260 participants demonstrates that ConceptEvo identifies concept evolutions that are both meaningful to humans and important for class predictions. This thesis contributes to information visualization, deep learning, and crucially, their intersection. We have developed open-source interactive interfaces, scalable algorithms, and a unified framework for interpreting DNNs across different models. Our work impacts academia, industry, and government. For example, our work has contributed to the DARPA GARD program (Guaranteeing AI Robustness against Deception). Additionally, our work has been recognized through a J.P. Morgan AI PhD Fellowship and the 2022 Rising Stars in IEEE EECS. NeuroCartography has been highlighted as a top visualization publication (top 1%) invited to SIGGRAPH.
  • Item
    Learning with Less: Low-rank Dynamics, Communication, and Introspection in Machine Learning
    (Georgia Institute of Technology, 2023-10-03) Baker, Bradley Thomas
    This research is a focused empirical and theoretical analysis of optimization methods in machine learning and of the underlying role that the matrix rank of the learning statistics they use plays in these algorithms. We show that this new perspective on machine learning optimization yields communication-efficient federated learning algorithms as well as novel insights into model introspection and the theory of learning dynamics. In applications to the complex domain of neuroimaging data analysis, we show that this rank-focused frame of reference allows for unique insights into how models perform on particular populations.
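    As a hedged sketch of how low-rank structure in learning statistics can reduce federated communication (the rank-r SVD truncation below is an illustrative assumption, not the dissertation's specific algorithm):
    ```python
    import numpy as np

    def compress_rank_r(grad, r):
        """Send two thin factors (m x r and r x n) instead of the full m x n gradient."""
        U, s, Vt = np.linalg.svd(grad, full_matrices=False)
        return U[:, :r] * s[:r], Vt[:r, :]

    def decompress(U_s, Vt):
        return U_s @ Vt

    rng = np.random.default_rng(0)
    grad = rng.normal(size=(512, 8)) @ rng.normal(size=(8, 256))   # a nearly rank-8 gradient
    U_s, Vt = compress_rank_r(grad, r=8)
    err = np.linalg.norm(grad - decompress(U_s, Vt)) / np.linalg.norm(grad)
    print(f"relative error {err:.1e}; values sent {U_s.size + Vt.size} vs {grad.size} dense")
    ```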
  • Item
    Fast and compact neural network via Tensor-Train reparameterization
    (Georgia Institute of Technology, 2023-08-28) Yin, Chunxing
    The exponential growth of data and model size poses a number of challenges for deep learning training. Large neural network layers can be parameterized based on tensor decomposition to compress model size, but at the potential cost of degraded accuracy and additional execution time to reconstruct the layer parameters from the tensorized representation. In this dissertation, we explore neural network compression through Tensor Train (TT) reparameterization. We aim to develop efficient algorithms that accelerate training of tensorized networks while minimizing memory consumption, and to understand the components necessary for the Tensor Train format to succeed in model compression. We design efficient algorithms to accelerate the training of tensorized layers in Convolutional Neural Networks (CNNs), Deep Learning Recommendation Models (DLRMs), and Graph Neural Networks (GNNs). While the use of TT for compression in CNNs has been suggested in the past, the prior art has not demonstrated significant speedups for training or inference. The reason is that conventional implementations of TT-compressed convolutional layers pose several challenges: increases in computational work for reconstructing TT-compressed layers, increases in memory footprint due to weight reconstruction, and limitations to parallel scalability as the effective problem sizes shrink under compression. We address these issues through asymptotic reductions in computation, avoidance of data movement, and an alternative parallelization strategy that significantly improves scalability. In recommendation models, the performance of TT-compressed DLRM (TT-Rec) is further optimized with batched matrix multiplication and caching strategies for embedding vector lookup operations. In addition, we analyze, both mathematically and empirically, the effect of the weight initialization distribution on DLRM accuracy and propose initializing the tensor cores of TT-Rec from a sampled Gaussian distribution. In the next part of this dissertation, we study node embeddings in graph neural networks, where both numerical features and topological graph information must be preserved. We design training schemes that unify hierarchical tensor decomposition and graph topology to exploit graph homophily, and we develop novel parameter initialization algorithms that incorporate graph spectral information to improve model convergence and accuracy. Finally, we evaluate our technique on million-node graphs to demonstrate its efficiency and accuracy on real-world graphs, as well as on synthetic graphs to understand the correlation between graph homophily and weight sharing in TT. While the primary focus of this dissertation lies in exploring proof-of-concept algorithms, its outcomes can hold significant implications for systems. For example, by transforming the data-intensive embedding operator into a compute-intensive and memory-efficient tensorized embedding, we can potentially reconfigure the allocation of system resources within a heterogeneous data center with a combination of CPUs and GPUs. Moreover, our compression technique would enable storing large modules on a limited-memory accelerator under data parallelism, thereby providing opportunities for optimizing communication.
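    As a rough sketch of the reconstruction step that TT reparameterization substitutes for a dense embedding lookup (the core shapes, ranks, and index factorization below are illustrative assumptions, not TT-Rec's optimized kernels):
    ```python
    import numpy as np

    def tt_embedding_row(cores, index, row_shape):
        """Reconstruct one embedding row from TT cores without materializing the full table."""
        idx = np.unravel_index(index, row_shape)         # split the row id across TT modes
        out = np.ones((1, 1))                            # running (length, rank) contraction
        for core, i_k in zip(cores, idx):
            slice_k = core[:, i_k, :, :]                 # core: (r_prev, n_k, d_k, r_next)
            out = np.einsum('ap,pdr->adr', out, slice_k)
            out = out.reshape(-1, slice_k.shape[-1])     # fold this mode's output dim in
        return out.reshape(-1)                           # embedding vector of length prod(d_k)

    # A 1000 x 64 embedding table factored with modes (10,10,10) x (4,4,4) and TT ranks (1,8,8,1)
    rng = np.random.default_rng(0)
    n, d, r = (10, 10, 10), (4, 4, 4), (1, 8, 8, 1)
    cores = [rng.normal(scale=0.1, size=(r[k], n[k], d[k], r[k + 1])) for k in range(3)]
    print(tt_embedding_row(cores, index=123, row_shape=n).shape)  # (64,), from 3,200 TT params
    ```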
  • Item
    Scalable Algorithms for Hypergraph Analytics using Symmetric Tensor Decompositions
    (Georgia Institute of Technology, 2023-08-28) Shivakumar, Shruti
    Tensors are higher-dimensional generalizations of matrices and are used to represent multi-dimensional data. Tensor-based methods have received renewed attention in recent years due to their prevalence in diverse real-world applications. Symmetric tensors are an important class of tensors, arising in diverse fields such as signal processing, machine learning, and hypergraph analytics. Hypergraphs, generalizations of graphs that allow edges to span multiple vertices, have become ubiquitous in understanding real-world networks and multi-entity interactions. Affinity relations in a hypergraph can be represented as a high-order adjacency tensor that is sparse and symmetric. While mathematical research on symmetric tensors is longstanding, emerging massive data in these applications has sparked the demand for scalable, efficient algorithms that utilize advances in numerical linear algebra, numerical optimization, and high-performance computing. State-of-the-art tensor libraries incorporate high-performance tensor methods for general sparse tensors; however, they lack specialized algorithms for sparse tensors that are symmetric. This dissertation focuses on scaling hypergraph analytics to real-world datasets by taking advantage of the sparsity and symmetry of the associated adjacency tensors through the development of compact storage formats and efficient serial and parallel algorithms for tensor operations. We present a novel computation-aware compressed storage format, CSS, for sparse symmetric tensors, along with efficient parallel algorithms for symmetric tensor operations that are compute- and memory-intensive due to the high tensor order and the associated factorial explosion in the number of non-zeros. In order to scale to large multi-entity complex networks, we consider the problem of distributed-memory hypergraph analytics. To that end, we present algorithms for parallel distributed-memory line graph construction of hypergraphs and demonstrate their application to large-scale symmetric adjacency tensor decomposition for hypergraph clustering. For hypergraphs with varying edge cardinalities, we extend the CSS format to the CCSS format and use it in a new shared-memory parallel algorithm for a key symmetric tensor kernel in the computation of hypergraph tensor eigenvector centrality. Finally, we present Coupled Symmetric Tensor Completion (CoSTCo), a Riemannian optimization framework for link prediction in non-uniform hypergraphs, and analyze its performance on both synthetic and real-world datasets against state-of-the-art general tensor completion algorithms.
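    As an illustration of the basic symmetry savings that compressed symmetric storage exploits (the actual CSS/CCSS layouts and their computation-aware details are defined in the dissertation; this is only a toy canonical-index store):
    ```python
    from itertools import permutations

    def compress_symmetric(entries):
        """Keep one canonical (sorted) copy of each nonzero index tuple of a symmetric tensor."""
        store = {}
        for *idx, val in entries:
            store[tuple(sorted(idx))] = val          # all permutations share one stored value
        return store

    def expand(store):
        """Recover the full nonzero set by enumerating distinct permutations of each tuple."""
        return {p: val for idx, val in store.items() for p in set(permutations(idx))}

    # Third-order adjacency-tensor entries for the hyperedge {1, 4, 7}, listed redundantly
    nz = [(1, 4, 7, 1.0), (4, 1, 7, 1.0), (7, 4, 1, 1.0)]
    compact = compress_symmetric(nz)
    print(len(compact), "stored vs", len(expand(compact)), "expanded nonzeros")   # 1 vs 6
    ```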
  • Item
    Multifidelity Memory System Simulation
    (Georgia Institute of Technology, 2023-08-25) Lavin, Patrick
    As computer systems grow larger and more complex, it takes more time to simulate them in detail. Researchers interested in simulating large systems must choose between using simpler, less accurate models and simulating smaller portions of their benchmarks, both of which are often manual, offline approaches that require time-consuming expert analysis. Multifidelity simulation aims to lessen this burden by adapting the fidelity of a simulation to the complexity of the behavior being simulated. Multifidelity simulation refers to a simulation that can utilize multiple models of the same phenomenon at different levels of fidelity. We borrow the phrase from the simulation of physical systems, where scientists may use models with more or fewer terms, or resolve their models on finer or coarser grids, depending on the nature of the behavior at any point or time in the simulation. We have taken those ideas and applied them to computer architecture simulation. In this dissertation, we present a novel multifidelity computer architecture simulation algorithm and implement it in two separate models: one for the cache and one for the entire memory system. Our cache model automatically trains and chooses between low-fidelity models online to adapt to the complexity of the modeled behavior. The second model, of the memory system, refines the ideas developed for the first. We use statistical techniques to choose the data used to create the low-fidelity models and implement this work as reusable components within SST, a widely used simulator. This model achieves up to a 2x speedup with only 1-5% mean error in instructions per cycle.
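    As a hedged sketch of the control flow behind adapting fidelity online (the phase detector, thresholds, and toy cache below are illustrative assumptions, not the dissertation's SST components):
    ```python
    import random

    def detailed_cache_model(addr, cache, lines=256, line_bytes=64):
        """Expensive path: track the state of a tiny direct-mapped cache."""
        block = addr // line_bytes
        hit = cache.get(block % lines) == block
        cache[block % lines] = block
        return hit

    def fast_statistical_model(hit_rate):
        """Cheap path: draw hits from the rate observed during detailed windows."""
        return random.random() < hit_rate

    def simulate(addresses, window=1000, resample_every=10, stable_delta=0.01):
        cache, hit_rate, stable = {}, 0.0, False
        for w, start in enumerate(range(0, len(addresses), window)):
            chunk = addresses[start:start + window]
            detailed = (not stable) or (w % resample_every == 0)  # periodically re-check
            if detailed:
                hits = sum(detailed_cache_model(a, cache) for a in chunk)
                new_rate = hits / len(chunk)
                stable = abs(new_rate - hit_rate) < stable_delta  # stationary -> allow cheap model
                hit_rate = new_rate
            else:
                sum(fast_statistical_model(hit_rate) for _ in chunk)  # cheap stand-in window
        return hit_rate

    # A synthetic address stream whose working set changes halfway through
    addrs = ([random.randrange(0, 2**14, 64) for _ in range(50_000)] +
             [random.randrange(2**20, 2**21, 64) for _ in range(50_000)])
    print(f"final observed hit rate: {simulate(addrs):.2f}")
    ```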
  • Item
    Artificial Intelligence for Data-centric Surveillance and Forecasting of Epidemics
    (Georgia Institute of Technology, 2023-08-15) Rodriguez Castillo, Alexander D.
    Surveillance and forecasting of epidemics are crucial tools for decision making and planning by government officials, businesses, and the general public. In many respects, our understanding of how epidemics spread is still in its infancy, despite multiple advances in modeling how diseases propagate through populations. Many of the major challenges stem from other complex dynamics, such as mobility patterns, policy compliance, and even shifts in data collection procedures. As a result of efforts to collect and process data from novel sources, granular data are becoming increasingly available on many of these variables. These datasets, however, are difficult to exploit using traditional methodologies from mathematical epidemiology and agent-based modeling. Meanwhile, AI methods in epidemiology are challenged by data sparsity, distributional changes, and disparities in data quality. AI methods also lack built-in knowledge of epidemic dynamics, which may lead to unrealistic predictions. Several frameworks are proposed in this dissertation to address these challenges and move toward more data-centric methods. Specifically, we use multiple examples to show that bringing the data-driven expressiveness of AI into epidemiology leads to more sensitive and precise surveillance and forecasting of epidemics.