Organizational Unit:
School of Computational Science and Engineering

Publication Search Results

Now showing 1 - 10 of 46
  • Item
    Scalable tensor decompositions in high performance computing environments
    (Georgia Institute of Technology, 2018-07-31) Li, Jiajia
    This dissertation presents novel algorithmic techniques and data structures to help build scalable tensor decompositions on a variety of high-performance computing (HPC) platforms, including multicore CPUs, graphics co-processors (GPUs), and Intel Xeon Phi processors. A tensor may be regarded as a multiway array, generalizing matrices to more than two dimensions. When used to represent multifactor data, tensor methods can help analysts discover latent structure; this capability has found numerous applications in data modeling and mining in domains such as healthcare analytics, social network analytics, computer vision, signal processing, and neuroscience, to name a few. When attempting to implement tensor algorithms efficiently on HPC platforms, there are several obstacles: the curse of dimensionality, mode orientation, tensor transformation, irregularity, and arbitrary tensor dimensions (or orders). These challenges result in non-trivial computational and storage overheads. This dissertation considers these challenges in the specific context of two of the most popular tensor decompositions, the CANDECOMP/PARAFAC (CP) and Tucker decompositions, which are, roughly speaking, the tensor analogues of low-rank approximations in standard linear algebra. Within that context, two of the critical computational bottlenecks are the operations known as Tensor-Times-Matrix (TTM) and Matricized Tensor Times Khatri-Rao Product (MTTKRP). We consider these operations in cases when the tensor is dense or sparse. Our contributions include: 1) applying memoization to overcome the curse-of-dimensionality challenge that arises in a sequence of tensor operations; 2) addressing the challenge of mode orientation through a novel tensor format, HICOO, and proposing a parallel scheduler to avoid locks on write-conflicted memory; 3) carrying out TTM and MTTKRP operations in place, for dense and sparse cases, to avoid tensor-matrix conversions; 4) employing different optimization and parameter-tuning techniques for CPU and GPU implementations to overcome the challenges of irregularity and arbitrary tensor orders. To validate these ideas, we have implemented them in three prototype libraries for arbitrary-order tensors, named AdaTM, InTensLi, and ParTI!. AdaTM is a model-driven framework that generates an adaptive tensor memoization algorithm with optimal parameters for sparse CP decomposition. InTensLi produces fast single-node implementations of dense TTM of arbitrary dimension. ParTI! is short for Parallel Tensor Infrastructure; it is written in C with OpenMP, MPI, and NVIDIA CUDA for sparse tensors and provides MATLAB interfaces for application-level users.
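
For readers less familiar with the MTTKRP kernel named above, the following Python sketch shows how a mode-0 MTTKRP can be computed for a sparse third-order tensor stored in coordinate (COO) form. It is only a minimal illustration of the kernel, not the dissertation's HICOO-based or GPU implementations, and all names and sizes are placeholders.

```python
import numpy as np

def mttkrp_mode0(indices, values, B, C, dim0):
    """Mode-0 MTTKRP for a sparse third-order tensor stored in COO form.

    indices : (nnz, 3) integer coordinates (i, j, k) of the nonzeros
    values  : (nnz,) nonzero values
    B, C    : factor matrices of shape (dim1, R) and (dim2, R)
    Returns M of shape (dim0, R), where M[i, :] += x_ijk * (B[j, :] * C[k, :]).
    """
    R = B.shape[1]
    M = np.zeros((dim0, R))
    for (i, j, k), x in zip(indices, values):
        M[i, :] += x * B[j, :] * C[k, :]
    return M

# Toy example: a 4x3x2 tensor with three nonzeros and CP rank R = 2.
idx = np.array([[0, 1, 0], [2, 0, 1], [3, 2, 1]])
val = np.array([1.0, 2.0, -0.5])
B = np.random.rand(3, 2)
C = np.random.rand(2, 2)
print(mttkrp_mode0(idx, val, B, C, dim0=4))
```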
  • Item
    Tackling chronic diseases via computational phenotyping: Algorithms, tools and applications
    (Georgia Institute of Technology, 2018-07-31) Chen, Robert
    With the recent tsunami of medical data from electronic health records (EHRs), there has been a rise in interest in leveraging such data to improve the efficiency of healthcare delivery and improve clinical outcomes. A large part of medical data science involves computational phenotyping, which leverages data-driven methods to subtype and characterize patient conditions from heterogeneous EHR data. While many applications have used supervised phenotyping, unsupervised phenotyping will become increasingly important in future precision medicine initiatives. A typical healthcare analytics workflow consists of phenotype discovery from EHR data, followed by predictive modeling that may leverage such phenotypes, followed by model deployment via avenues such as FHIR. To address unmet clinical needs, we have developed and demonstrated algorithms, tools, and applications along each step of this process.
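
As a rough illustration of unsupervised phenotyping on EHR data (not the specific algorithms developed in this dissertation), the sketch below factors a toy patient-by-code count matrix with scikit-learn's NMF so that each component can be read as a candidate phenotype of co-occurring codes; all data and parameters are made up.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy patient-by-medical-code count matrix (rows: patients, cols: codes).
# In practice this would be built from EHR diagnosis/medication/procedure codes.
counts = np.random.poisson(lam=0.3, size=(100, 50))

# Factor into patient memberships (W) and phenotype definitions (H).
model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(counts)   # (100, 5): how strongly each patient expresses each phenotype
H = model.components_             # (5, 50): which codes characterize each phenotype

# Top codes per candidate phenotype.
for p, row in enumerate(H):
    top = np.argsort(row)[::-1][:5]
    print(f"phenotype {p}: codes {top.tolist()}")
```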
  • Item
    Learning over functions, distributions and dynamics via stochastic optimization
    (Georgia Institute of Technology, 2018-07-27) Dai, Bo
    Machine learning has recently witnessed revolutionary success in a wide spectrum of domains. The learning objectives, model representation, and learning algorithms are important components of machine learning methods. To construct successful machine learning methods that naturally fit different problems with different targets and inputs, one should consider these three components together in a principled way. This dissertation aims to develop a unified learning framework for that purpose. The heart of this framework is optimization with integral operators in infinite-dimensional spaces. This integral-operator representation view provides an abstract tool for considering the three components together across many machine learning tasks, and it leads to efficient algorithms equipped with flexible representations that achieve better approximation ability, scalability, and statistical properties. We investigate several machine learning problems, i.e., kernel methods, Bayesian inference, invariance learning, and policy evaluation and policy optimization in reinforcement learning, as special cases of the proposed framework with different instantiations of the integral operator. These instantiations result in learning problems whose inputs are functions, distributions, and dynamics. The corresponding algorithms handle the particular integral operators via efficient and provable stochastic approximation, exploiting structural properties of the operators. The proposed framework and the derived algorithms are deeply rooted in functional analysis, stochastic optimization, nonparametric methods, and Monte Carlo approximation, and they contribute to several sub-fields of the machine learning community, including kernel methods, Bayesian inference, and reinforcement learning. We believe the proposed framework is a valuable tool for developing machine learning methods in a principled way and can potentially be applied to many other scenarios.
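
One concrete instance of learning with functional representations and stochastic optimization is kernel regression approximated by random Fourier features and trained with SGD; the sketch below is a generic illustration of that idea on assumed toy data, not the dissertation's framework or algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier features approximating an RBF kernel: phi(x) ~ cos(Wx + b).
d, D = 5, 256                      # input dimension, number of random features
W = rng.normal(scale=1.0, size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def features(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# Streaming SGD on squared loss for kernel-ridge-style regression.
theta = np.zeros(D)
lam, lr = 1e-4, 0.1
for t in range(1, 5001):
    x = rng.normal(size=d)
    y = np.sin(x.sum())            # toy target function
    phi = features(x)
    grad = (phi @ theta - y) * phi + lam * theta
    theta -= (lr / np.sqrt(t)) * grad

x_test = rng.normal(size=d)
print(features(x_test) @ theta, np.sin(x_test.sum()))
```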
  • Item
    Efficient parallel algorithms for error correction and transcriptome assembly of biological sequences
    (Georgia Institute of Technology, 2018-05-29) Sachdeva, Vipin
    Next-generation sequencing technologies have led to a big-data age in biology. Since the sequencing of the human genome, the primary bottleneck has steadily moved from collection to storage and analysis of the data. The primary contributions of this dissertation are the design and implementation of novel parallel algorithms for two important problems in bioinformatics: error correction and transcriptome assembly. For error correction, we focused on a k-mer-spectrum-based error-correction application called Reptile. We designed a novel distributed-memory algorithm that divides the k-mers and tiles amongst the processing ranks. This allows hardware with any memory size per node to be employed for error correction using Reptile's algorithm, irrespective of the size of the dataset. Our implementation achieved highly scalable results for E. coli and Drosophila datasets as well as a human dataset consisting of 1.55 billion reads. Besides an algorithm that distributes k-mers and tiles between ranks, we have also implemented numerous heuristics that are useful for adjusting the algorithm based on hardware traits. We further extended our parallel algorithm by pre-generating tiles and using collective messages to reduce the number of point-to-point messages for error correction. Further extensions of this work have focused on creating a library for distributed k-mer processing, which has applications to problems in metagenomics. For transcriptome assembly, we have implemented a hybrid MPI-OpenMP approach for Chrysalis, which is part of the Trinity pipeline. Chrysalis clusters minimally overlapping contigs obtained from the prior module in Trinity, called Inchworm. With this parallelization, we were able to reduce the runtime of the Chrysalis step of the Trinity workflow from over 50 hours to less than 5 hours for the sugarbeet dataset. We also employed this implementation to assemble the transcriptome of a 1.5-billion-read dataset pooled from different bread wheat cultivars. Furthermore, we have implemented a MapReduce-based approach to clustering k-mers, which has applications to the parallelization of the Inchworm module of Trinity. This implementation is a significant step towards making de novo transcriptome assembly feasible for ever bigger transcriptome datasets.
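
A common way to make a k-mer spectrum fit in distributed memory is to assign each k-mer to an owning rank by hashing, so every process agrees on where a k-mer's counts live. The serial Python sketch below illustrates only this partitioning idea with invented names; it is not Reptile's algorithm or the dissertation's MPI implementation.

```python
from collections import Counter
from hashlib import blake2b

def kmers(read, k):
    """Yield all k-mers of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def owner_rank(kmer, n_ranks):
    # Stable hash so every process agrees on which rank owns a given k-mer.
    h = int.from_bytes(blake2b(kmer.encode(), digest_size=8).digest(), "big")
    return h % n_ranks

def partition_counts(reads, k, n_ranks):
    # One Counter per rank; in an MPI code these buckets would be exchanged all-to-all.
    buckets = [Counter() for _ in range(n_ranks)]
    for read in reads:
        for km in kmers(read, k):
            buckets[owner_rank(km, n_ranks)][km] += 1
    return buckets

reads = ["ACGTACGTGA", "CGTACGTTTA"]
for r, c in enumerate(partition_counts(reads, k=5, n_ranks=4)):
    print(f"rank {r}: {dict(c)}")
```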
  • Item
    Doctor AI: Interpretable deep learning for modeling electronic health records
    (Georgia Institute of Technology, 2018-05-23) Choi, Edward
    Deep learning has recently shown superior performance in complex domains such as computer vision, audio processing, and natural language processing compared to traditional statistical methods. Naturally, deep learning techniques, combined with the large electronic health record (EHR) data generated by healthcare organizations, have the potential to bring dramatic changes to the healthcare industry. However, typical deep learning models can be seen as highly expressive black boxes, making them difficult to adopt in real-world healthcare applications due to a lack of interpretability. In order for deep learning methods to be readily adopted by real-world clinical practices, they must be interpretable without sacrificing prediction accuracy. In this thesis, we propose interpretable and accurate deep learning methods for modeling EHR, specifically focusing on longitudinal EHR data. We will begin with a direct application of a well-known deep learning algorithm, recurrent neural networks (RNNs), to capture the temporal nature of longitudinal EHR. Then, based on the initial approach, we develop interpretable deep learning models by focusing on three aspects of computational healthcare: efficient representation learning of medical concepts, code-level interpretation for sequence predictions, and leveraging domain knowledge in the model. Another important aspect that we address in this thesis is developing a framework for effectively utilizing multiple data sources (e.g. diagnoses, medications, procedures), which can be extended in the future to incorporate wider data modalities such as lab values and clinical notes.
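
To make the RNN-over-visits idea concrete (this is not the exact architecture of the thesis), the PyTorch sketch below encodes each visit as a multi-hot vector of medical codes, runs a GRU over the visit sequence, and predicts the codes of the following visit; all dimensions and names are placeholders.

```python
import torch
import torch.nn as nn

class VisitRNN(nn.Module):
    """Predict the codes of the next visit from the sequence of past visits."""
    def __init__(self, n_codes, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Linear(n_codes, emb_dim)        # multi-hot visit -> dense embedding
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_codes)       # scores for every code at the next visit

    def forward(self, visits):                          # visits: (batch, n_visits, n_codes) multi-hot
        h, _ = self.rnn(torch.relu(self.embed(visits)))
        return self.out(h)                              # logits for the visit following each time step

# Toy run: 2 patients, 5 visits each, vocabulary of 100 codes.
model = VisitRNN(n_codes=100)
x = (torch.rand(2, 5, 100) < 0.05).float()
logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits[:, :-1], x[:, 1:])  # predict visit t+1 from visits up to t
loss.backward()
print(loss.item())
```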
  • Item
    Scalable and resilient sparse linear solvers
    (Georgia Institute of Technology, 2018-05-22) Sao, Piyush kumar
    Solving a large and sparse system of linear equations is a ubiquitous problem in scientific computing. The challenges in scaling such solvers on current and future parallel computer systems are the high cost of communication and the expected decrease in reliability of hardware components. This dissertation contributes new techniques to address these issues. Regarding communication, we make two advances to reduce both on-node and inter-node communication of distributed-memory sparse direct solvers. On-node, we propose a novel technique, called HALO, targeted at heterogeneous architectures consisting of multicore processors and hardware accelerators such as GPUs or Xeon Phi. The name HALO is shorthand for highly asynchronous lazy offload, which refers to the way the method combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing (a la halo or ghost zones), all of which serve to hide and reduce communication, whether to local memory, across the network, or over PCIe. The overall hybrid solver achieves speedups of up to 3x on a variety of realistic test problems in single- and multi-node configurations. To reduce inter-node communication, we present a novel communication-avoiding 3D sparse LU factorization algorithm. The 3D sparse LU factorization algorithm uses a three-dimensional logical arrangement of MPI processes and combines data redundancy with so-called elimination-tree parallelism to reduce communication. The 3D algorithm reduces the asymptotic communication costs by a factor of $O(\sqrt{\log n})$ and latency costs by a factor of $O(\log n)$ for planar sparse matrices arising from finite element discretization of two-dimensional PDEs. For non-planar sparse matrices, it reduces the communication and latency costs by a constant factor. Beyond performance, we consider methods to improve solver resilience. In emerging and future systems with billions of computing elements, hardware faults during execution may become the norm rather than the exception. We illustrate the principle of self-stabilization for constructing fault-tolerant iterative linear solvers. We give two proof-of-concept examples of self-stabilizing iterative linear solvers: one for steepest descent (SD) and one for conjugate gradients (CG). Our self-stabilized versions of SD and CG require only small amounts of fault detection, e.g., we may check only for NaNs and infinities. We test our approach experimentally by analyzing its convergence and overhead for different types and rates of faults.
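
The self-stabilization idea for iterative solvers can be illustrated with a conjugate gradient loop that checks only for NaNs/infinities and periodically recomputes the true residual so that a transient fault cannot persist. The sketch below is a simplified stand-in with made-up parameters, not the dissertation's exact self-stabilized SD/CG formulations.

```python
import numpy as np

def fault_checked_cg(A, b, max_iter=500, tol=1e-8, correction_every=10):
    """Conjugate gradients with cheap sanity checks in the spirit of
    self-stabilization: detect NaN/Inf and periodically recompute the
    residual from scratch so that transient faults cannot persist."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for it in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        # Correction step: restore the CG invariants if anything looks corrupted.
        if not np.all(np.isfinite(x)) or (it + 1) % correction_every == 0:
            x = np.where(np.isfinite(x), x, 0.0)
            r = b - A @ x          # recompute the true residual
            p = r.copy()
            rs = r @ r
            continue
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy symmetric positive definite system.
M = np.random.rand(50, 50)
A = M @ M.T + 50 * np.eye(50)
b = np.random.rand(50)
print(np.linalg.norm(A @ fault_checked_cg(A, b) - b))
```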
  • Item
    A novel method for cluster analysis of RNA structural data
    (Georgia Institute of Technology, 2018-05-21) Rogers, Emily
    Functional RNA is known to contribute to a host of important biological pathways, with new discoveries being made daily. Because function is dependent on structure, computational tools that predict the secondary structure of RNA are crucial to researchers. By far the most popular method is to predict the minimum free energy structure as the native one. However, well-known limitations of this method have led the computational RNA community to move on to Boltzmann sampling. This method predicts an ensemble of structures sampled from the Boltzmann distribution under the Nearest Neighbor Thermodynamic Model (NNTM). Although it provides a more thorough view of the folding landscape of a sequence, the Boltzmann sampling method has the drawback of needing post-processing (i.e. data mining) in order to be meaningful. This dissertation presents a novel method of representing and clustering the secondary structures of a Boltzmann sample. In addition, it demonstrates the method's ability to extract the meaningful structural signal of a Boltzmann sample by identifying significant commonalities and differences. Applications include two outstanding problems in the computational RNA community: investigating the ill-conditioning of thermodynamic optimization under the NNTM, and predicting a consensus structure for a set of sequences. Finally, this dissertation concludes with research performed as an intern for the Department of Defense's Defense Forensic Science Center. This work concerns analyzing the results of a DNA mixture interpretation study, highlighting the current state of forensic interpretation.
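
A standard way to post-process a Boltzmann sample is to represent each secondary structure by its set of base pairs and cluster structures by base-pair distance; the sketch below illustrates that generic approach (not the dissertation's novel representation or clustering method) on a few made-up dot-bracket strings.

```python
from itertools import combinations
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def base_pairs(dot_bracket):
    """Convert a dot-bracket secondary structure into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for i, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.add((stack.pop(), i))
    return pairs

def bp_distance(s1, s2):
    """Number of base pairs present in one structure but not the other."""
    return len(base_pairs(s1) ^ base_pairs(s2))

# A tiny stand-in for a Boltzmann sample of structures for one sequence.
sample = ["((..((...))..))", "((..((...)).)).", "(((.......)))..", "..((.......)).."]
n = len(sample)
D = np.zeros((n, n))
for a, b in combinations(range(n), 2):
    D[a, b] = D[b, a] = bp_distance(sample[a], sample[b])

# Hierarchical clustering on the pairwise base-pair distances, cut into 2 clusters.
labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
print(labels)
```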
  • Item
    Graph analysis of streaming relational data
    (Georgia Institute of Technology, 2018-04-13) Zakrzewska, Anita N.
    Graph analysis can be used to study streaming data from a variety of sources, such as social networks, financial transactions, and online communication. The analysis of streaming data poses many challenges, including dealing with the high volume of data and the speed with which it is generated. This dissertation addresses challenges that occur throughout the graph analysis process. Because many datasets are large and growing, it may be infeasible to collect and build a graph from all the data that has been generated. This work addresses the challenges created by large volumes of streaming data through new sampling techniques. The algorithms presented can sample a subgraph in a single pass over an edge stream and are therefore appropriate for streaming applications. A sampling algorithm that can produce a temporally biased subgraph is also presented. Before graph analysis techniques can be applied, a graph must first be created from the data collected. When creating dynamic graphs, it is not obvious how to de-emphasize old information, especially when edges are derived from interactions. This work evaluates several methods of aging old data to create dynamic graphs. This dissertation also contributes new techniques for dynamic community detection and analysis. A new algorithm for local community detection on dynamic graphs is presented. Because it incrementally updates results when the graph changes, the method is suitable for streaming data. The creation of dynamic graphs allows us to study community changes over time. This work addresses the topic of community analysis with a vertex-level measure of community change. Together, these contributions advance the study of streaming relational data through graph analysis.
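
The single-pass sampling setting can be illustrated with classic reservoir sampling over an edge stream, which keeps a uniform subset of edges without storing the full graph; the sketch below shows only this baseline idea, not the dissertation's sampling algorithms (which, for example, also support temporal biasing).

```python
import random

def reservoir_edge_sample(edge_stream, m, seed=0):
    """Single-pass uniform sample of m edges from a stream of (u, v) pairs."""
    rng = random.Random(seed)
    reservoir = []
    for t, edge in enumerate(edge_stream):
        if t < m:
            reservoir.append(edge)
        else:
            j = rng.randint(0, t)          # replace a stored edge with probability m/(t+1)
            if j < m:
                reservoir[j] = edge
    return reservoir

# Toy stream of 10,000 edges; keep a 100-edge subgraph sample in one pass.
stream = ((random.randrange(500), random.randrange(500)) for _ in range(10_000))
sampled = reservoir_edge_sample(stream, m=100)
print(len(sampled), sampled[:5])
```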
  • Item
    Energy efficient data driven distributed traffic simulations
    (Georgia Institute of Technology, 2018-04-05) Neal, Sabra Alexandria
    With the growing capabilities of the Internet of Things and the proliferation of mobile devices, interest in using real-time data as input to distributed online simulations has increased. Online simulations provide users with the ability to utilize real-time data to adapt the system, e.g., to adjust to unexpected events. One problem that arises when using these systems on mobile devices is that they depend on the device's stored energy. It is vital to understand how all components of such a system use the stored energy in order to understand how to develop such systems for energy-constrained environments. One aspect of this thesis is to examine the effect that discrete event driven and cellular automata models have on energy consumption in embedded systems. Discrete event driven simulations depend on a future event list for execution. It is important to understand the effect of the data structure used for the future event list on energy consumption when running such simulations in embedded systems. This thesis presents a characterization of the relationship between the operations performed on the future event list and energy consumption. This thesis also investigates an energy-aware approach applicable to systems that are restricted to energy-constrained environments.
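
The future event list mentioned above is typically a priority queue keyed by event timestamp; the sketch below shows a minimal binary-heap version, since the choice of this data structure is exactly the kind of implementation detail whose energy cost the thesis characterizes. Names and the example events are illustrative only.

```python
import heapq
import itertools

class FutureEventList:
    """Minimal binary-heap future event list for a discrete event simulation."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for events with equal timestamps

    def schedule(self, timestamp, event):
        heapq.heappush(self._heap, (timestamp, next(self._counter), event))

    def next_event(self):
        timestamp, _, event = heapq.heappop(self._heap)
        return timestamp, event

    def __len__(self):
        return len(self._heap)

# Toy usage: schedule two events and process them in timestamp order.
fel = FutureEventList()
fel.schedule(5.0, "vehicle_arrives")
fel.schedule(2.5, "light_turns_green")
while fel:
    print(fel.next_event())
```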
  • Item
    Point process modeling and optimization of social networks
    (Georgia Institute of Technology, 2018-04-05) Farajtabar, Mehrdad
    Online social media such as Facebook and Twitter and communities such as Wikipedia and Stack Overflow have become an inseparable part of today's lifestyle. Users participate in a variety of ways, such as sharing text and photos, asking questions, finding friends, and favoring content. These activities produce sequences of event data whose complex temporal dynamics need to be studied and are of practical, economic, and societal interest. We propose a novel framework based on multivariate temporal point processes for the modeling, optimization, and inference of processes taking place over networks. In the modeling part, we propose a temporal point process model for the joint dynamics of information propagation and structure evolution in networks. These two highly intertwined stochastic processes have been predominantly studied separately, ignoring their co-evolutionary dynamics. Our model allows us to efficiently simulate interleaved diffusion and network events, and to generate traces obeying common diffusion and network patterns observed in real-world networks. In the optimization part, we establish the fundamentals of intervention and control in networks by combining the rich area of temporal point processes and the well-developed framework of Markov decision processes. We use point processes to capture both endogenous and exogenous events in social networks and formulate the problem as a Markov decision problem. Our methodology helps find the optimal policy that balances high present reward against a large penalty on poor future outcomes in the presence of extensive uncertainty. In the inference part, we propose an intensity-free approach to point process modeling that transforms a nuisance process into the target one. Furthermore, we train our deep neural network model using a likelihood-free approach leveraging the Wasserstein distance between point processes.
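
For intuition about temporal point processes, the sketch below simulates a univariate Hawkes process with an exponential kernel using Ogata's thinning algorithm; it is a textbook illustration with made-up parameters, not the multivariate co-evolution model or the likelihood-free training proposed in the dissertation.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a univariate Hawkes process with exponential kernel by Ogata's
    thinning: intensity(t) = mu + sum over past events t_i of alpha * exp(-beta * (t - t_i))."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < horizon:
        # Upper bound on the intensity from now on (it only decays until the next event).
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar:   # accept with probability intensity ratio
            events.append(t)
    return events

events = simulate_hawkes(mu=0.2, alpha=0.8, beta=1.0, horizon=100.0)
print(len(events), events[:5])
```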