Series
Master of Science in Computer Science

Series Type
Degree Series
Description
Associated Organization(s)
Organizational Unit

Publication Search Results

  • Item
    Tuned and asynchronous stencil kernels for CPU/GPU systems
    (Georgia Institute of Technology, 2009-05-18) Venkatasubramanian, Sundaresan
    We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single- and double-precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider "wildly asynchronous" implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations, thereby potentially trading off more flops (via more iterations to converge) for a higher degree of asynchronous parallelism. Our relaxed-synchronization implementations on a GPU can be 1.2-2.5x faster than our best synchronized GPU implementation while achieving the same accuracy. Looking forward, this result suggests research on similarly "fast-and-loose" algorithms in the coming era of increasingly massive concurrency and relatively high synchronization or communication costs.
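    A minimal serial sketch of the synchronous baseline described above, assuming the standard five-point Jacobi update for the 2-D Poisson problem on an n-by-n grid; names and signatures are illustrative, not taken from the thesis. The relaxed-synchronization GPU variants differ mainly in when, or whether, a barrier separates sweeps.

    #include <cstddef>
    #include <vector>

    // One Jacobi sweep: read from u, write to u_new; boundary values stay fixed.
    void jacobi_sweep(const std::vector<double>& u, std::vector<double>& u_new,
                      const std::vector<double>& f, std::size_t n, double h)
    {
        for (std::size_t i = 1; i + 1 < n; ++i)
            for (std::size_t j = 1; j + 1 < n; ++j)
                u_new[i * n + j] = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                                         + u[i * n + j - 1] + u[i * n + j + 1]
                                         + h * h * f[i * n + j]);
        // In a synchronized version, all workers barrier here before the buffers are
        // swapped; the "wildly asynchronous" variants drop or delay that barrier and
        // tolerate reading slightly stale neighbor values.
    }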
  • Item
    Prediction of secondary structures for large RNA molecules
    (Georgia Institute of Technology, 2009-01-12) Mathuriya, Amrita
    The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as sequence length increases. We present a new scalable, parallel multicore program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large viral RNA sequences while achieving comparable prediction accuracy. We analyze the algorithm's concurrency and describe its parallelization for a shared-memory environment such as a symmetric multiprocessor or multicore chip. As computing shifts to multicore chips, parallelism must be addressed explicitly to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness for an optimized algorithm for internal loop calculations, the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³), and show that exact algorithms such as ILSA can be executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We document detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a basis for implementing future algorithmic improvements and an improved thermodynamic model in GTfold. GTfold is written in C/C++ and is freely available as open source from our website.
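    The O(n⁴) bottleneck mentioned above can be sketched as follows; this is an illustrative reconstruction of the naive internal-loop recurrence, not GTfold's actual code, and internal_loop_energy is a placeholder for the thermodynamic model. For every closing pair (i, j), all interior pairs (ip, jp) are scanned, which is the computation ILSA reorganizes to bring the overall cost down to O(n³).

    #include <algorithm>
    #include <limits>
    #include <vector>

    // Placeholder for the thermodynamic internal-loop energy model (assumed).
    double internal_loop_energy(int /*i*/, int /*j*/, int /*ip*/, int /*jp*/) { return 0.0; }

    // Naive minimization over all interior pairs enclosed by (i, j): O(n^2) work per
    // closing pair, hence O(n^4) over the whole sequence.
    double best_internal_loop(int i, int j, const std::vector<std::vector<double>>& V)
    {
        double best = std::numeric_limits<double>::infinity();
        for (int ip = i + 1; ip < j; ++ip)
            for (int jp = ip + 1; jp < j; ++jp)
                best = std::min(best, internal_loop_energy(i, j, ip, jp) + V[ip][jp]);
        return best;
    }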
  • Item
    Modeling and simulating the propagation of infectious diseases using complex networks
    (Georgia Institute of Technology, 2008-07-15) Quax, Rick
    For explanation and prediction of the evolution of infectious diseases in populations, researchers often use simplified mathematical models for simulation. We believe that the results from these models are often questionable when the epidemic dynamics become more complex, and that developing more realistic models of this kind is intractable. In this dissertation we propose to simulate infectious disease propagation using dynamic and complex networks. We present the Simulator of Epidemic Evolution using Complex Networks (SEECN), an expressive and high-performance framework that combines algorithms for graph generation with various operators for modeling temporal dynamics. For graph generation we use the Kronecker algorithm, derive its underlying statistical structure, and exploit it for a variety of purposes. The epidemic is then evolved over the network by simulating the dynamics of the population and the epidemic simultaneously, with each type of dynamics performed by a separate operator. All dynamics operators can be fully and independently parameterized, facilitating incremental model development and enabling different influences to be toggled for differential analysis. As a prototype, we simulate two relatively complex models of the HIV epidemic and find a remarkable fit to reported data for AIDS incidence and prevalence. Our most important conclusion is that the dynamics of the HIV epidemic alone are sufficient to produce rather complex trends in the incidence and prevalence statistics, e.g., without the introduction of particularly effective treatments at specific times. We show that this invalidates assumptions and conclusions made previously in the literature, and argue that simulations used for explaining and predicting trends should incorporate more realistic models of both the population and the epidemic than is currently done. In addition, we substantiate a previously predicted paradox: the availability of Highly Active Anti-Retroviral Treatment likely causes an increase in HIV incidence.
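    An interface-level sketch of the operator-based loop described above; the names are illustrative and not SEECN's API. Population dynamics and epidemic dynamics are separate, independently parameterized operators applied to the network at each step, so individual influences can be toggled on or off for differential analysis.

    #include <functional>
    #include <vector>

    struct Network { /* nodes, edges, infection states, demographic attributes ... */ };

    // Each dynamics operator mutates the network at a given time step.
    using Operator = std::function<void(Network&, int /*step*/)>;

    void simulate(Network& net, const std::vector<Operator>& operators, int steps)
    {
        for (int t = 0; t < steps; ++t)
            for (const auto& op : operators)   // e.g. aging, partner turnover, transmission, treatment
                op(net, t);
    }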
  • Item
    Honeynet design and implementation
    (Georgia Institute of Technology, 2007-12-20) Artore, Diane
    Over the past decade, web criminality has become a real issue. Because they allow botmasters to control hundreds to millions of machines, botnets have become the first-choice attack platform for network attackers, used to launch distributed denial-of-service attacks, steal sensitive information, and send spam email. This work aims at designing and implementing a honeynet specific to IRC bots. Our system works in three phases: (1) binary collection, (2) simulation, and (3) activity capturing and monitoring. The phase 2 simulation extracts the connection information by means of an IRC redirection (using a DNS redirection and a "fake server"). In phase 3, we use the information previously extracted to launch our honeyclient, which captures and monitors the traffic on the C&C channel. With our honeynet, we build a database of IRC botnet activity (connection characteristics and commands on the C&C channel), and hope to learn more about botnet behavior and the underground market it creates.
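    A minimal illustration of the kind of phase-3 processing described above; this is hypothetical code, not the thesis implementation, and the sample bot command is invented. It extracts the command text from a raw IRC PRIVMSG line observed on the C&C channel so it could be stored in an activity database.

    #include <iostream>
    #include <string>

    // IRC PRIVMSG format: ":nick!user@host PRIVMSG #channel :command text"
    std::string extract_command(const std::string& raw_line)
    {
        auto pos = raw_line.find("PRIVMSG");
        if (pos == std::string::npos) return {};
        auto colon = raw_line.find(" :", pos);
        return colon == std::string::npos ? std::string{} : raw_line.substr(colon + 2);
    }

    int main()
    {
        std::cout << extract_command(":bot!u@h PRIVMSG #cc :.scan.start 10.0.0.0/8") << '\n';
    }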
  • Item
    Efficient Parallel Algorithm for Overlaying Surface Meshes
    (Georgia Institute of Technology, 2007-05-17) Jain, Ankita
    Many computational applications involve multiple physical components, each having its own computational domain discretized by a mesh. An integrated simulation of these physical systems requires transferring data across domain boundaries, which are typically represented by surface meshes composed of triangles or quadrilaterals that are non-matching, with differing connectivities and geometry. To transfer data between different domains accurately and conservatively, it is necessary to construct a common refinement (or common tessellation) of the surface meshes. For large-scale problems that involve moving boundaries, the common tessellation must be updated frequently within integrated simulations running on parallel computers. Previously, Jiao and Heath developed an algorithm for constructing a common tessellation by overlaying the surface meshes. The original algorithm is efficient and robust, but unfortunately it is complex and difficult to parallelize. In this thesis, we develop a modified algorithm for overlaying surface meshes. Our algorithm employs a high-level primitive, face-intersection, which combines the low-level point-projection and edge-intersection primitives of the original algorithm. A main advantage of our modified algorithm is its ease of implementation and parallelization. Our implementation utilizes flexible data structures for efficient computation and querying of the common tessellation, and avoids potential redundancy in computations to achieve high efficiency. To achieve robustness, we pay special attention to avoiding potential topological inconsistencies due to numerical errors, and introduce a preprocessing step that projects one surface mesh onto the other when the two are far apart, before computing the common tessellation. We present numerical examples to demonstrate the robustness and efficiency of our method on parallel computers.
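    An interface-level sketch of how a face-intersection primitive could drive the overlay; the types and names are hypothetical, not the thesis code, and intersect_faces is left as a stub. Each pair of candidate faces from the two surface meshes is intersected directly, and the non-empty overlaps form the cells of the common tessellation.

    #include <vector>

    struct Face    { /* vertex coordinates or indices of one input face */ };
    struct Polygon { std::vector<int> vertices; /* one cell of the common tessellation */ };

    // Assumed high-level primitive: returns the (possibly empty) overlap of two faces,
    // subsuming the point-projection and edge-intersection steps of the original algorithm.
    Polygon intersect_faces(const Face& /*blue*/, const Face& /*green*/) { return {}; }

    std::vector<Polygon> overlay(const std::vector<Face>& blue_mesh,
                                 const std::vector<Face>& green_mesh)
    {
        std::vector<Polygon> tessellation;
        for (const auto& b : blue_mesh)
            for (const auto& g : green_mesh) {       // in practice, restricted to nearby candidate pairs
                Polygon cell = intersect_faces(b, g);
                if (!cell.vertices.empty()) tessellation.push_back(cell);
            }
        return tessellation;
    }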
  • Item
    Parallel discrete event simulation techniques for scientific simulations
    (Georgia Institute of Technology, 2005-04-19) Dave, Jagrut Durdant
    Exponential growth in computer technology over the past decades, both in individual CPUs and in parallel technologies, has triggered rapid progress in large-scale simulations. Despite these achievements, however, it has become clear that many conventional state-of-the-art techniques are ill-equipped to tackle problems that inherently involve multiple scales in configuration space. The difficulty is that conventional ("time-driven" or "time-stepped") techniques update all parts of the simulation space (fields, particles) synchronously, i.e., at time intervals assumed to be the same throughout the global computation domain, or at best varying on a sub-domain basis (as in adaptive mesh refinement algorithms). Using a serial electrostatic model, it was recently shown that discrete event techniques can yield more than two orders of magnitude speedup compared to the time-stepped approach. This research focuses on extending that technique to parallel architectures using parallel discrete event simulation. Previous research on parallel discrete event simulation of scientific phenomena has been limited. This thesis outlines a technique for converting a time-stepped simulation in the scientific domain into an equivalent parallel discrete event model. As a candidate simulation, an electromagnetic hybrid plasma simulation is considered. The experiments and analysis show the performance trade-offs of varying the following factors: the simulation model's characteristics (e.g., lookahead), application load balancing, and the accuracy of simulation results. The experiments are performed on a high-performance cluster using a conservative synchronization mechanism. Initial performance results are encouraging, demonstrating very good parallel speedup for large-scale model configurations containing tens of thousands of cells. Overheads for inter-processor communication remain a challenge for smaller computations.
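    A sketch of the sequential discrete event loop that underlies this approach; it is illustrative only, with assumed names. Each cell schedules its own next update time instead of all cells advancing in lockstep; in the parallel version described above, a conservative synchronization protocol with lookahead bounds how far each processor may safely advance.

    #include <functional>
    #include <queue>
    #include <vector>

    struct Event {
        double time;
        int cell;
        bool operator>(const Event& other) const { return time > other.time; }
    };

    using EventQueue = std::priority_queue<Event, std::vector<Event>, std::greater<Event>>;

    // update(cell, now) advances one cell and returns the time of its next event.
    void run(double end_time, EventQueue& pending,
             const std::function<double(int, double)>& update)
    {
        while (!pending.empty() && pending.top().time <= end_time) {
            Event e = pending.top();
            pending.pop();
            double next = update(e.cell, e.time);    // only this cell is touched
            if (next <= end_time) pending.push({next, e.cell});
        }
    }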
  • Item
    An Optimization Framework for Embedded Processors with Auto-Modify Addressing Modes
    (Georgia Institute of Technology, 2004-12-08) Lau, ChokSheak
    Modern embedded processors with a dedicated address generation unit support memory accesses using indirect addressing modes with auto-increment and auto-decrement. The auto-increment/decrement mode, if properly utilized, can save address-arithmetic instructions, reduce the static and dynamic footprint of the program, and speed up execution as well. We propose an optimization framework for embedded processors based on the auto-increment and auto-decrement addressing modes for address registers. Existing work on this class of optimizations focuses on building an access graph and finding a maximum-weight path cover to obtain an optimized layout of stack variables. We take this further by using coalescing, addressing-mode selection, and offset registers to find further opportunities for reducing the number of load-address instructions required. We also propose an algorithm for building the layout that takes memory accesses across basic blocks into account, because existing work mainly considers intra-basic-block information. We then use the available offset registers to try to further reduce the number of address-arithmetic instructions after layout assignment.
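    A toy illustration of why the stack layout matters; the helper below is hypothetical and not part of the thesis framework. Given an access sequence over stack variables and a candidate layout, consecutive accesses that land on the same or an adjacent slot can be served by auto-increment/auto-decrement and need no explicit address-arithmetic instruction.

    #include <cstddef>
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Counts consecutive access pairs whose slots differ by at most one.
    int accesses_covered(const std::vector<std::string>& access_seq,
                         const std::vector<std::string>& layout)
    {
        std::unordered_map<std::string, int> slot;
        for (int i = 0; i < static_cast<int>(layout.size()); ++i) slot[layout[i]] = i;
        int covered = 0;
        for (std::size_t k = 1; k < access_seq.size(); ++k)
            if (std::abs(slot[access_seq[k]] - slot[access_seq[k - 1]]) <= 1) ++covered;
        return covered;
    }

    int main()
    {
        std::vector<std::string> seq = {"a", "b", "a", "c", "b"};
        std::cout << accesses_covered(seq, {"c", "a", "b"}) << '\n';   // layout derived from the access graph
    }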
  • Item
    An Empirical Evaluation of Human Figure Tracking Using Switching Linear Models
    (Georgia Institute of Technology, 2004-11-19) Patrick, Hugh Alton, Jr.
    One of the difficulties of human figure tracking is that humans move their bodies in complex, non-linear ways. An effective computational model of human motion could therefore be of great benefit in figure tracking. We are interested in the use of a class of dynamic models called switching linear dynamic systems for figure tracking. This thesis makes two contributions. First, we present an empirical analysis of some of the technical issues involved in applying linear dynamic systems to figure tracking. The lack of high-level theory in this area makes this type of empirical study valuable and necessary. We show that the sensitivity of these models to perturbations in their input is a central issue in their application to figure tracking. We also compare different types of LDS models and identification algorithms. Second, we describe 2-DAFT, a flexible software framework we have created for figure tracking. 2-DAFT encapsulates the data and code involved in different parts of the tracking problem in a number of modules. This architecture leads to flexibility and makes it easy to implement new tracking algorithms.
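    A minimal sketch of the switching-LDS idea studied above; the parameters are illustrative, not models learned from motion data. The active regime selects which linear model propagates the hidden state, so different regimes can capture qualitatively different phases of motion.

    #include <iostream>
    #include <random>

    int main()
    {
        // Two 2x2 state-transition matrices over a (position, velocity) state.
        const double A0[2][2] = {{1.0, 1.0}, {0.0, 1.0}};   // regime 0: constant velocity
        const double A1[2][2] = {{1.0, 0.0}, {0.0, 0.5}};   // regime 1: damped motion
        const double (*A)[2] = A0;

        std::mt19937 rng(0);
        std::normal_distribution<double> noise(0.0, 0.01);  // process noise w_t

        double x[2] = {0.0, 1.0};                           // initial hidden state
        for (int t = 0; t < 10; ++t) {
            if (t == 5) A = A1;                             // switching variable changes regime
            double next0 = A[0][0] * x[0] + A[0][1] * x[1] + noise(rng);
            double next1 = A[1][0] * x[0] + A[1][1] * x[1] + noise(rng);
            x[0] = next0; x[1] = next1;
            std::cout << t << ' ' << x[0] << ' ' << x[1] << '\n';
        }
    }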