Organizational Unit:
School of Computational Science and Engineering

Publication Search Results

  • Item
    The fast multipole method at exascale
    (Georgia Institute of Technology, 2013-11-26) Chandramowlishwaran, Aparna
    This thesis presents a top-to-bottom analysis of designing and implementing fast algorithms for current and future systems. We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (FMM) for solving N-body problems. We target the FMM because it is broadly applicable to a variety of scientific particle simulations used to study electromagnetic, fluid, and gravitational phenomena, among others. Importantly, the FMM has asymptotically optimal time complexity with guaranteed approximation accuracy. As such, it is among the most attractive solutions for scalable particle simulation on future extreme-scale systems. We specifically address two key challenges. The first challenge is how to engineer fast code for today's platforms. We present the first in-depth study of multicore optimizations and tuning for the FMM, along with a systematic approach for transforming a conventionally parallelized FMM into a highly tuned one. We introduce novel optimizations that significantly improve the within-node scalability of the FMM, thereby enabling high performance on multicore and manycore systems. The second challenge is how to understand scalability on future systems. We present a new algorithmic complexity analysis of the FMM that considers both intra- and inter-node communication costs. Using these models, we present results for choosing the optimal algorithmic tuning parameter. This analysis also yields the surprising prediction that although the FMM is largely compute-bound today, and therefore highly scalable on current systems, the trajectory of processor architecture designs, if there are no significant changes, could cause it to become communication-bound as early as the year 2015. This prediction suggests the utility of our analysis approach, which directly relates algorithmic and architectural characteristics, for enabling a new kind of high-level algorithm-architecture co-design. To demonstrate the scientific significance of the FMM, we present two applications: direct simulation of blood, a multi-scale, multi-physics problem, and large-scale biomolecular electrostatics. MoBo (Moving Boundaries) is the infrastructure for the direct numerical simulation of blood. It comprises two key algorithmic components, of which the FMM is one. We were able to simulate blood flow using Stokesian dynamics on 200,000 cores of Jaguar, a petaflop system, and achieve a sustained performance of 0.7 Petaflop/s. The second application we propose as future work in this thesis is biomolecular electrostatics, where we solve for the electrical potential using the boundary-integral formulation discretized with boundary element methods (BEM). The computational kernel in solving the large linear system is a dense matrix-vector multiply, which we propose can be calculated using our scalable FMM. We propose to begin with the two-dielectric problem, where the electrostatic field is calculated using two continuum dielectric media, the solvent and the molecule. This is only a first step toward solving biologically challenging problems, which have more than two dielectric media, ion-exclusion layers, and solvent-filled cavities. Finally, given the difficulty of producing high-performance scalable code, productivity is a key concern. Numerical algorithms are increasingly being redesigned to take advantage of the architectural features of emerging multicore processors.
    These new classes of algorithms express fine-grained asynchronous parallelism and hence reduce the cost of synchronization. We performed the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. The CnC model is well suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems. Our implementations in CnC were able to match, and in some cases even exceed, competing vendor-tuned and domain-specific library codes. We combine these two distinct research efforts by expressing the FMM in CnC; this approach tries to marry performance with productivity, which will be critical on future systems. Looking forward, we would like to extend this work to distributed-memory machines, specifically by implementing the FMM in the new distributed CnC (distCnC) to express fine-grained parallelism that would require significant effort in alternative models.
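    The core computation the FMM accelerates is the all-pairs interaction sum. The minimal Python sketch below (illustrative only, not code from the thesis; names are arbitrary) shows the O(N^2) direct summation of a 1/r potential that the FMM replaces with a hierarchical, asymptotically optimal approximation built from multipole expansions over a spatial tree.

    import numpy as np

    def direct_potential(points, charges):
        """O(N^2) direct evaluation of phi_i = sum_{j != i} q_j / |x_i - x_j|."""
        n = len(points)
        phi = np.zeros(n)
        for i in range(n):
            d = np.linalg.norm(points - points[i], axis=1)
            d[i] = np.inf                      # exclude the self-interaction
            phi[i] = np.sum(charges / d)
        return phi

    rng = np.random.default_rng(0)
    pts, q = rng.random((1000, 3)), rng.random(1000)
    print(direct_potential(pts, q)[:3])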
  • Item
    Generalized N-body problems: a framework for scalable computation
    (Georgia Institute of Technology, 2013-08-26) Riegel, Ryan Nelson
    In the wake of the Big Data phenomenon, the computing world has seen a number of computational paradigms developed in response to the sudden need to process ever-increasing volumes of data. Most notably, MapReduce has proven quite successful in scaling out an extensible class of simple algorithms to even hundreds of thousands of nodes. However, there are some tasks---even embarrassingly parallelizable ones---that neither MapReduce nor any existing automated parallelization framework is well-equipped to perform. For instance, any computation that (naively) requires consideration of all pairs of inputs becomes prohibitively expensive even when parallelized over a large number of worker nodes. Many of the most desirable methods in machine learning and statistics exhibit these kinds of all-pairs or, more generally, all-tuples computations; accordingly, their application in the Big Data setting may seem beyond hope. However, a new algorithmic strategy inspired by breakthroughs in computational physics has shown great promise for a wide class of computations dubbed generalized N-body problems (GNBPs). This strategy, which involves the simultaneous traversal of multiple space-partitioning trees, has been applied to a succession of well-known learning methods, accelerating each asymptotically and by orders of magnitude. Examples of these include all-k-nearest-neighbors search, k-nearest-neighbors classification, k-means clustering, EM for mixtures of Gaussians, kernel density estimation, kernel discriminant analysis, kernel machines, particle filters, the n-point correlation, and many others. For each of these problems, no overall faster algorithms are known. Further, these dual- and multi-tree algorithms compute either exact results or approximations to within specified error bounds, a rarity amongst fast methods. This dissertation aims to unify a family of GNBPs under a common framework in order to ease implementation and future study. We start by formalizing the problem class and then describe a general algorithm, the generalized fast multipole method (GFMM), capable of solving all problems that fit the class, though with varying degrees of speedup. We then show O(N) and O(log N) theoretical run-time bounds that may be obtained under certain conditions. As a corollary, we derive the tightest known general-dimensional run-time bounds for exact all-nearest-neighbors and several approximated kernel summations. Next, we implement a number of these algorithms in a commercial database, empirically demonstrating dramatic asymptotic speedup over their conventional SQL implementations. Lastly, we implement a fast, parallelized algorithm for kernel discriminant analysis and apply it to a large dataset (40 million points in 4D) from the Sloan Digital Sky Survey, identifying approximately one million quasars with high accuracy. This exceeds the previous largest catalog of quasars in size by a factor of ten and has since been used in a follow-up study to confirm the existence of dark energy.
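    The common ingredient across these dual- and multi-tree algorithms is pruning node pairs using distance bounds computed from space-partitioning trees. The self-contained Python sketch below (an illustration of that pruning rule under simplifying assumptions, not the GFMM itself) counts point pairs within a radius r, skipping node pairs whose bounding boxes are provably too far apart and bulk-counting node pairs that are provably close enough.

    import numpy as np

    class Node:
        def __init__(self, pts):
            self.pts = pts
            self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)
            self.left = self.right = None
            if len(pts) > 16:                               # leaf size
                dim = np.argmax(self.hi - self.lo)
                order = np.argsort(pts[:, dim])
                mid = len(pts) // 2
                self.left, self.right = Node(pts[order[:mid]]), Node(pts[order[mid:]])

    def min_dist(a, b):
        # lower bound on the distance between any point in a and any point in b
        gap = np.maximum(0.0, np.maximum(a.lo - b.hi, b.lo - a.hi))
        return np.linalg.norm(gap)

    def max_dist(a, b):
        # upper bound on the distance between any point in a and any point in b
        return np.linalg.norm(np.maximum(a.hi - b.lo, b.hi - a.lo))

    def count_pairs(a, b, r):
        if min_dist(a, b) > r:
            return 0                                        # prune: no pair can be within r
        if max_dist(a, b) <= r:
            return len(a.pts) * len(b.pts)                  # prune: every pair is within r
        if a.left is None and b.left is None:               # base case: brute-force the leaves
            d = np.linalg.norm(a.pts[:, None, :] - b.pts[None, :, :], axis=-1)
            return int(np.sum(d <= r))
        if a.left is None:
            return count_pairs(a, b.left, r) + count_pairs(a, b.right, r)
        if b.left is None:
            return count_pairs(a.left, b, r) + count_pairs(a.right, b, r)
        return sum(count_pairs(x, y, r) for x in (a.left, a.right) for y in (b.left, b.right))

    rng = np.random.default_rng(1)
    root = Node(rng.random((2000, 3)))
    print(count_pairs(root, root, 0.1))   # ordered pairs within r, including i == j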
  • Item
    Extending low-rank matrix factorizations for emerging applications
    (Georgia Institute of Technology, 2013-08-12) Zhou, Ke
    Low-rank matrix factorizations have become increasingly popular for projecting high-dimensional data into latent spaces with small dimensions in order to obtain a better understanding of the data and thus more accurate predictions. In particular, they have been widely applied to important applications such as collaborative filtering and social network analysis. In this thesis, I investigate applications and extensions of the ideas of low-rank matrix factorization to solve several practically important problems arising from collaborative filtering and social network analysis. A key challenge in recommendation system research is how to effectively profile new users, a problem generally known as cold-start recommendation. In the first part of this work, we extend the low-rank matrix factorization by allowing the latent factors to have more complex structures, namely decision trees, to solve the problem of cold-start recommendations. In particular, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of adaptive interview construction based on low-rank matrix factorizations. The second part of this work considers the efficiency problem of making recommendations in the context of large user and item spaces. Specifically, we address the problem through learning binary codes for collaborative filtering, which can be viewed as restricting the latent factors in low-rank matrix factorizations to be binary vectors that represent the binary codes for both users and items. In the third part of this work, we investigate the applications of low-rank matrix factorizations in the context of social network analysis. Specifically, we propose a convex optimization approach to discover the hidden network of social influence with low-rank and sparse structure by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes, emphasizing the mutually exciting nature of the dynamics of event occurrences. The proposed framework combines the estimation of mutually exciting processes and the low-rank matrix factorization in a principled manner. In the fourth part of this work, we estimate the triggering kernels for the Hawkes process. In particular, we focus on estimating the triggering kernels from an infinite-dimensional functional space with the Euler-Lagrange equation, which can be viewed as applying the idea of low-rank factorizations in the functional space.
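    For readers unfamiliar with the baseline these extensions build on, the short Python sketch below (illustrative only, with assumed synthetic data and parameter choices; not the thesis code) fits a plain low-rank factorization R ~ U V^T to a partially observed ratings matrix by alternating least squares, the basic collaborative-filtering setup that fMF, binary-code learning, and the Hawkes-process models extend.

    import numpy as np

    def als(R, mask, k=8, reg=0.1, iters=20, seed=0):
        """R: ratings matrix; mask: 1 where a rating is observed, 0 elsewhere."""
        rng = np.random.default_rng(seed)
        n_users, n_items = R.shape
        U = rng.normal(scale=0.1, size=(n_users, k))
        V = rng.normal(scale=0.1, size=(n_items, k))
        I = reg * np.eye(k)
        for _ in range(iters):
            for u in range(n_users):          # solve for each user's latent factors
                idx = mask[u] > 0
                U[u] = np.linalg.solve(V[idx].T @ V[idx] + I, V[idx].T @ R[u, idx])
            for i in range(n_items):          # solve for each item's latent factors
                idx = mask[:, i] > 0
                V[i] = np.linalg.solve(U[idx].T @ U[idx] + I, U[idx].T @ R[idx, i])
        return U, V

    # Tiny synthetic example with half of the entries observed
    rng = np.random.default_rng(1)
    R = rng.random((50, 4)) @ rng.random((30, 4)).T
    mask = (rng.random(R.shape) < 0.5).astype(float)
    U, V = als(R * mask, mask)
    print(np.abs(U @ V.T - R)[mask == 0].mean())   # error on held-out entries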
  • Item
    Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC
    (Georgia Institute of Technology, 2013-08) Lin, Zhiyuan; Chau, Duen Horng
    Large graphs with billions of nodes and edges are increasingly common, calling for new kinds of scalable computation frameworks. Although popular, distributed approaches can be expensive to build or require many resources to manage or tune. State-of-the-art approaches such as GraphChi and TurboGraph have recently demonstrated that a single machine can efficiently perform advanced computation on billion-node graphs. Although fast, they both use sophisticated data structures, memory management, and optimization techniques. We propose a minimalist approach that forgoes such complexities by leveraging the memory mapping capability found in operating systems. Our experiments on large datasets, such as a 1.5-billion-edge Twitter graph, show that our streamlined approach is up to 26 times faster than GraphChi and comparable to TurboGraph. We contribute our crucial insight that by leveraging memory mapping, a fundamental operating system capability, we can outperform the latest graph computation techniques.
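    The essence of the approach is to let the operating system's virtual memory manager do the heavy lifting. A minimal Python sketch of that idea (with an assumed binary edge-list layout and file name; not the paper's implementation) memory-maps an edge list and streams over it to compute out-degrees, so the OS pages data in on demand rather than the program loading the whole graph into RAM.

    import numpy as np

    def write_edges(path, edges):
        np.asarray(edges, dtype=np.int32).tofile(path)     # flat (src, dst) int32 pairs

    def out_degrees(path, n_nodes, chunk=1_000_000):
        edges = np.memmap(path, dtype=np.int32, mode="r").reshape(-1, 2)
        deg = np.zeros(n_nodes, dtype=np.int64)
        for start in range(0, len(edges), chunk):           # OS pages in only what we touch
            src = edges[start:start + chunk, 0]
            deg += np.bincount(src, minlength=n_nodes)
        return deg

    write_edges("toy.edges", [(0, 1), (0, 2), (1, 2), (2, 0)])
    print(out_degrees("toy.edges", 3))    # [2 1 1]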
  • Item
    Multi-tree algorithms for computational statistics and physics
    (Georgia Institute of Technology, 2013-07-02) March, William B.
    The Fast Multipole Method of Greengard and Rokhlin does the seemingly impossible: it approximates the quadratic scaling N-body problem in linear time. The key is to avoid explicitly computing the interactions between all pairs of N points. Instead, by organizing the data in a space-partitioning tree, distant interactions are quickly and efficiently approximated. Similarly, dual-tree algorithms, which approximate or eliminate parts of a computation using distance bounds, are the fastest algorithms for several fundamental problems in statistics and machine learning -- including all nearest neighbors, kernel density estimation, and Euclidean minimum spanning tree construction. We show that this overarching principle -- that by organizing points spatially, we can solve a seemingly quadratic problem in linear time -- can be generalized to problems involving interactions between sets of three or more points and can provide orders-of-magnitude speedups and guarantee runtimes that are asymptotically better than existing algorithms. We describe a family of algorithms, multi-tree algorithms, which can be viewed as generalizations of dual-tree algorithms. We support this thesis by developing and implementing multi-tree algorithms for two fundamental scientific applications: n-point correlation function estimation and Hartree-Fock theory. First, we demonstrate multi-tree algorithms for n-point correlation function estimation. The n-point correlation functions are a family of fundamental spatial statistics and are widely used for understanding large-scale astronomical surveys, characterizing the properties of new materials at the microscopic level, and for segmenting and processing images. We present three new algorithms which will reduce the dependence of the computation on the size of the data, increase the resolution in the result without additional time, and allow probabilistic estimates independent of the problem size through sampling. We provide both empirical evidence to support our claim of massive speedups and a theoretical analysis showing linear scaling in the fundamental computational task. We demonstrate the impact of a carefully optimized base case on this computation and describe our distributed, scalable, open-source implementation of our algorithms. Second, we explore multi-tree algorithms as a framework for understanding the bottleneck computation in Hartree-Fock theory, a fundamental model in computational chemistry. We analyze existing fast algorithms for this problem, and show how they fit in our multi-tree framework. We also show new multi-tree methods, demonstrate that they are competitive with existing methods, and provide the first rigorous guarantees for the runtimes of all of these methods. Our algorithms will appear as part of the PSI4 computational chemistry library.
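    The simplest member of this family, the 2-point correlation, reduces to counting point pairs within a given separation, the computation that the thesis's multi-tree algorithms generalize to n-tuples. A brief sketch using scipy's kd-tree (illustrative only, not the thesis's distributed implementation) computes such pair counts for a data catalog and a random catalog at several radii.

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    data = rng.random((50_000, 3))            # stand-in for survey positions
    randoms = rng.random((50_000, 3))         # random catalog used by correlation estimators

    radii = np.array([0.01, 0.02, 0.05])
    dd = cKDTree(data).count_neighbors(cKDTree(data), radii)       # data-data pair counts
    dr = cKDTree(data).count_neighbors(cKDTree(randoms), radii)    # data-random pair counts
    print(dd, dr)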
  • Item
    Integration of computational methods and visual analytics for large-scale high-dimensional data
    (Georgia Institute of Technology, 2013-07-02) Choo, Jaegul
    With the increasing amount of collected data, large-scale high-dimensional data analysis is becoming essential in many areas. These data can be analyzed either by using fully computational methods or by leveraging human capabilities via interactive visualization. However, each method has its drawbacks. While a fully computational method can deal with large amounts of data, it lacks depth in its understanding of the data, which is critical to the analysis. With the interactive visualization method, the user can gain deeper insight into the data, but the approach suffers when large amounts of data need to be analyzed. Even with an apparent need for these two approaches to be integrated, little progress has been made. To tackle this problem, computational methods have to be redesigned both theoretically and algorithmically, and the visual analytics system has to expose these computational methods to users so that they can choose the proper algorithms and settings. To achieve an appropriate integration between computational methods and visual analytics, this thesis focuses on essential computational methods for visualization, such as dimension reduction and clustering, and it presents fundamental developments of computational methods as well as visual analytics systems involving the newly developed methods. The contributions of the thesis include (1) a two-stage dimension reduction framework that better handles the significant information loss in visualization of high-dimensional data, (2) efficient parametric updating of computational methods for fast and smooth user interactions, and (3) an iteration-wise integration framework for computational methods in real-time visual analytics. The latter parts of the thesis focus on the development of visual analytics systems involving the presented computational methods: (1) Testbed, an interactive visual testbed system for various dimension reduction and clustering methods; (2) iVisClassifier, an interactive visual classification system using supervised dimension reduction; and (3) VisIRR, an interactive visual information retrieval and recommender system for large-scale document data.
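    As a rough illustration of the two-stage dimension reduction pattern described above (a generic sketch with assumed stage choices, not the specific framework developed in the thesis), a fast linear reduction first brings the data to a moderate dimension, a visualization-oriented method then produces the final 2-D layout, and clustering provides the grouping shown to the user.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 500))                  # stand-in for high-dimensional data

    X_mid = PCA(n_components=50).fit_transform(X)     # stage 1: cheap linear reduction
    X_2d = TSNE(n_components=2, init="pca").fit_transform(X_mid)   # stage 2: 2-D layout

    labels = KMeans(n_clusters=5, n_init=10).fit_predict(X_mid)    # clusters for coloring
    print(X_2d.shape, np.bincount(labels))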
  • Item
    Ad hoc distributed simulation: a method for embedded online simulations
    (Georgia Institute of Technology, 2013-07-02) Huang, Ya-Lin
    The continual growth of computing power in small devices has motivated the development of novel approaches to optimizing operational systems efficiently and effectively. These optimization problems are often so complex that solving them analytically may be difficult, if not prohibitive. One method for solving such problems is to use online simulation. However, challenges in using online simulation include the issues of responsiveness (e.g., because of communication delays), scalability, and failure resistance. To tackle these issues, this study proposes embedding online simulations into a network of sensors that monitors the system under investigation. This thesis explores an approach termed "ad hoc distributed simulation," which is based on embedding online simulations into a sensor network and adding communication and synchronization among simulators to model operational systems. This approach offers several potential advantages over existing approaches: (1) it can provide rapid response to system dynamics as well as efficiency, since data exchange is local to the sensor network; (2) it can achieve better scalability to incorporate more sensors; and (3) it can provide better robustness to failures because portions of the system are still under local control. This research addresses several statistical issues in this ad hoc approach: (1) rapid and effective estimation of the input processes at model boundaries, (2) estimation of system-wide performance measures from individual simulator outputs, and (3) correction mechanisms responding to unexpected events or inaccuracies within the model. This thesis examines ad hoc distributed simulation analytically and experimentally, mainly focusing on the accuracy of predicting the performance of open queueing networks. First, the analytical part formalizes the ad hoc approach and evaluates its accuracy at modeling a certain class of open queueing networks with regard to steady-state system performance measures. This work concerning steady-state metrics is extended to a broader class of networks by an empirical study, which presents evidence that the ad hoc approach can generate predictions comparable to those from sequential simulations. Furthermore, a "buffered-area" mechanism is proposed to substantially reduce prediction bias with a moderate increase in execution time. In addition to these steady-state studies, another empirical study targets the prediction accuracy of the ad hoc approach on open queueing networks with short-term system-state transients. This study demonstrates that, with slight modification to the design of the ad hoc queueing simulation method used in the steady-state studies, system dynamics can be modeled well. The results, again, support the conclusion that the ad hoc approach is competitive with the sequential simulation method in terms of prediction accuracy.
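    The performance measures at issue are classical queueing quantities. As a minimal, self-contained point of reference (an illustration of the kind of steady-state measure being predicted, not the thesis's simulator), the Python sketch below simulates a single M/M/1 queue and compares the time-averaged number in system against the analytic steady-state value rho / (1 - rho).

    import random

    def mm1_avg_in_system(lam, mu, horizon, seed=0):
        rng = random.Random(seed)
        t, next_arrival, next_departure = 0.0, rng.expovariate(lam), float("inf")
        n, area = 0, 0.0                  # n = customers in system, area = integral of n dt
        while t < horizon:
            t_next = min(next_arrival, next_departure, horizon)
            area += n * (t_next - t)
            t = t_next
            if t >= horizon:
                break
            if next_arrival <= next_departure:            # arrival event
                n += 1
                next_arrival = t + rng.expovariate(lam)
                if n == 1:                                # server was idle: start a service
                    next_departure = t + rng.expovariate(mu)
            else:                                         # departure event
                n -= 1
                next_departure = t + rng.expovariate(mu) if n > 0 else float("inf")
        return area / horizon

    lam, mu = 0.8, 1.0
    rho = lam / mu
    print("simulated:", mm1_avg_in_system(lam, mu, 200_000))
    print("analytic :", rho / (1 - rho))                  # 4.0 for rho = 0.8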
  • Item
    New paradigms for approximate nearest-neighbor search
    (Georgia Institute of Technology, 2013-07-02) Ram, Parikshit
    Nearest-neighbor search is a very natural and universal problem in computer science. Oftentimes, the problem size necessitates approximation. In this thesis, I present new paradigms for nearest-neighbor search (along with new algorithms and theory in these paradigms) that make nearest-neighbor search more usable and accurate. First, I consider a new notion of search error, the rank error, for an approximate neighbor candidate. Rank error corresponds to the number of possible candidates that are better than the approximate neighbor candidate. I motivate this notion of error and present new efficient algorithms that return approximate neighbors with rank error no more than a user-specified amount. Then I focus on approximate search in a scenario where the user does not specify the tolerable search error (error constraint); instead, the user specifies the amount of time available for search (time constraint). After differentiating between these two scenarios, I present some simple algorithms for time-constrained search with provable performance guarantees. I use this theory to motivate a new space-partitioning data structure, the max-margin tree, for improved search performance in the time-constrained setting. Finally, I consider the scenario where we do not require our objects to have an explicit fixed-length representation (vector data). This allows us to search over a large class of objects, including images, documents, graphs, strings, time series, and natural language. For nearest-neighbor search in this general setting, I present a novel, provably fast exact search algorithm. I also discuss the empirical performance of all the presented algorithms on real data.
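    The rank-error notion is easy to state operationally: the rank error of a returned candidate is the number of points strictly closer to the query than that candidate. A small Python sketch (illustrative only; the approximate answer here is just a random stand-in) computes it by brute force.

    import numpy as np

    def rank_error(data, query, candidate_idx):
        dists = np.linalg.norm(data - query, axis=1)
        return int(np.sum(dists < dists[candidate_idx]))   # how many better answers were missed

    rng = np.random.default_rng(0)
    data = rng.random((10_000, 20))
    query = rng.random(20)

    exact = int(np.argmin(np.linalg.norm(data - query, axis=1)))
    approx = int(rng.integers(len(data)))                  # stand-in for an approximate answer
    print(rank_error(data, query, exact), rank_error(data, query, approx))   # 0 and (likely) > 0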
  • Item
    Analysis of macromolecular structure through experiment and computation
    (Georgia Institute of Technology, 2013-04-08) Gossett, John Jared
    This thesis covers a wide variety of projects within the domain of computational structural biology. Structural biology is concerned with the molecular structure of proteins and nucleic acids, and the relationship between structure and biological function. We used molecular modeling and simulation, a purely computational approach, to study DNA-linked molecular nanowires. We developed a computational tool that allows potential designs to be screened for viability, and then we used molecular dynamics (MD) simulations to test their stability. As an example of using molecular modeling to create experimentally testable hypotheses, we were able to suggest a new design based on pyrrylene vinylene monomers. In another project, we combined experiments and molecular modeling to gain insight into factors that influence the kinetic binding dynamics of fibrin "knob" peptides and complementary "holes." Molecular dynamics simulations provided helpful information about potential peptide structural conformations and intrachain interactions that may influence binding properties. The remaining projects discussed in this thesis all deal with RNA structure. The underlying approach for these studies is a recently developed chemical probing technology called selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE). One study focuses on ribosomal RNA, specifically the 23S rRNA from T. thermophilus. We used SHAPE experiments to show that Domain III of the T. thermophilus 23S rRNA is an independently folding domain. This first required the development of our own data processing program for generating quantitative and interpretable data from our SHAPE experiments, due to limitations of existing programs and modifications to the experimental protocol. In another study, we used SHAPE chemistry to study the in vitro transcript of the RNA genome of satellite tobacco mosaic virus (STMV). This involved incorporating the SHAPE data into a secondary structure prediction program. The SHAPE-directed secondary structure of the STMV RNA was highly extended and considerably different from that proposed for the RNA in the intact virion. Finally, analyzing SHAPE data requires navigating a complex data processing pipeline. We review some of the various ways of running a SHAPE experiment and how this affects the approach to data analysis.
  • Item
    Cyber Games
    (Georgia Institute of Technology, 2013-02-19) Vorobeychik, Yevgeniy
    Over the last few years I have been working on game theoretic models of security, with a particular emphasis on issues salient in cyber security. In this talk I will give an overview of some of this work. I will first spend some time motivating game theoretic treatment of problems relating to cyber and describe some important modeling considerations. In the remainder, I will describe two game theoretic models (one very briefly), and associated solution techniques and analyses. The first is the "optimal attack plan interdiction" problem. In this model, we view a threat formally as a sophisticated planning agent, aiming to achieve a set of goals given some specific initial capabilities and considering a space of possible "attack actions/vectors" that may (or may not) be used towards the desired ends. The defender's goal in this setting is to "interdict" a select subset of attack vectors by optimally choosing among miti-gation options, in order to prevent the attacker from being able to achieve its goals. I will describe the formal model, explain why it is challenging, and present highly scalable decomposition-based integer programming techniques that leverage extensive research into heuristic formal planning in AI. The second model addresses the problem that defense decisions are typically decentralized. I describe a model to study the impact of decentralization, and show that there is a "sweet spot": for an intermediate number of decision makers, the joint decision is nearly socially optimal, and has the additional benefit of being robust to the changes in the environment. Finally, I will describe the Secure Design Competition (FIREAXE) that involved two teams of interns during the summer of 2012. The problem that the teams were tasked with was to design a highly stylized version of an electronic voting system. The catch was that after the design phase, each team would attempt to "attack" the other's design. I will describe some salient aspects of the specification, as well as the outcome of this competition and lessons that we (the designers and the students) learned in the process.