Series
Doctor of Philosophy with a Major in Computational Science and Engineering

Series Type
Degree Series
  • Item
    Data-Driven Approach using Machine Learning for Real-Time Flight Path Optimization
    (Georgia Institute of Technology, 2021-04-27) Kim, Jung Hyun
    Airlines typically gather all available weather information before departure to generate flight routes that avoid hazardous weather while minimizing operating expenditures. However, pilots may have to perform in-flight re-planning because weather conditions can change significantly after the original flight plans are created. One issue is that current in-flight re-planning systems are not fully automated; pilots today perform some portions of the in-flight activities manually. Another issue is that the weather forecasts these systems rely on are not always accessible in a timely manner. This research addresses both issues by developing a framework that continuously and automatically performs in-flight re-planning using the latest available weather information. Specifically, this study develops 1) a supervised machine learning-based wind prediction model to obtain continuous wind information, 2) an unsupervised machine learning-based short-term (i.e., every 10 minutes) convective weather prediction model to define reliable and up-to-date areas of convective weather, and 3) a designated-points-based flight path optimization model that combines the A* search algorithm with a free-flight approach to find an optimal flight path. As part of this research, statistical analyses are performed on real flights to demonstrate the potential benefits and applicability of the proposed methodology. The results indicate that the framework generates flight routes that reduce flight time by up to two percent in most cases. The outcome of this research not only establishes an automated framework that continuously performs flight path optimization but also provides a basis for optimizing flight routes for all categories of airplanes.
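
    As a toy illustration of the path-planning component described above, the Python sketch below runs A* on a small 2-D grid in which cells marked as convective weather are never expanded and edge costs are scaled by a simple head/tailwind term. The grid, wind model, constants, and heuristic are all assumptions made for illustration, not the dissertation's actual designated-points model.

```python
import heapq
import itertools
import math

# Hypothetical 2-D grid standing in for the waypoint graph; cells in
# CONVECTIVE model hazardous weather and are never expanded.
N = 20
CONVECTIVE = {(8, y) for y in range(4, 16)}   # a storm "wall" with gaps at top and bottom
WIND_U = 0.3                                  # assumed eastward wind component

def neighbors(p):
    x, y = p
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1),
                   (1, 1), (1, -1), (-1, 1), (-1, -1)]:
        q = (x + dx, y + dy)
        if 0 <= q[0] < N and 0 <= q[1] < N and q not in CONVECTIVE:
            yield q

def edge_cost(p, q):
    # Toy ground-speed model: tailwind shortens travel time, headwind lengthens it.
    dist = math.hypot(q[0] - p[0], q[1] - p[1])
    headwind = -WIND_U * (q[0] - p[0]) / dist
    return dist * (1.0 + 0.5 * headwind)

def heuristic(p, goal):
    # Every edge here costs at least 0.85x its length, so this stays admissible.
    return 0.85 * math.hypot(goal[0] - p[0], goal[1] - p[1])

def a_star(start, goal):
    tie = itertools.count()                   # tiebreaker so the heap never compares nodes
    heap = [(heuristic(start, goal), next(tie), 0.0, start, None)]
    parent, best = {}, {start: 0.0}
    while heap:
        _, _, g, node, prev = heapq.heappop(heap)
        if node in parent:                    # already expanded at lower cost
            continue
        parent[node] = prev
        if node == goal:                      # walk parents back to the start
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1], g
        for nxt in neighbors(node):
            ng = g + edge_cost(node, nxt)
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                heapq.heappush(heap, (ng + heuristic(nxt, goal), next(tie), ng, nxt, node))
    return None, math.inf

path, cost = a_star((0, 10), (19, 10))
print(f"{len(path)} waypoints, cost {cost:.2f}")  # the route detours around the storm wall
```

    Re-running the search whenever the convective set or wind field is refreshed gives, at toy scale, the continuous re-planning loop the abstract describes.
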
  • Item
    Analysis of the subsequence composition of biosequences
    (Georgia Institute of Technology, 2012-05-07) Cunial, Fabio
    Measuring the amount of information and of shared information in biological strings, as well as relating information to structure, function, and evolution, are fundamental computational problems of the post-genomic era. Classical analyses of the information content of biosequences are grounded in Shannon's statistical theory of communication, while the recent focus is on suitable specializations of the notions introduced by Kolmogorov, Chaitin, and Solomonoff, based on data compression and compositional redundancy. Symmetrically, classical estimates of mutual information based on string editing are currently being supplanted by compositional methods hinging on the distribution of controlled substructures. Current compositional analyses and comparisons of biological strings are almost exclusively limited to short sequences of contiguous solid characters. Comparatively little is known about longer and sparser components, both from the point of view of their effectiveness in measuring information and in separating biological strings from random strings, and from the point of view of their ability to classify and to reconstruct phylogenies. Yet, sparse structures are suspected to grasp long-range correlations and, at short range, they are known to encode signatures and motifs that characterize molecular families.

    In this thesis, we introduce and study compositional measures based on the repertoire of distinct subsequences of any length, but constrained to occur with a predefined maximum gap between consecutive symbols. Such measures highlight previously unknown laws that relate subsequence abundance to string length and to the allowed gap, across a range of structurally and functionally diverse polypeptides. Measures on subsequences are capable of separating only a few amino acid strings from their random permutations, but they reveal that the random permutations themselves amass along previously undetected linear loci. This is perhaps the first time that the vocabulary of all distinct subsequences of a set of structurally and functionally diverse polypeptides has been systematically counted and analyzed.

    Another objective of this thesis is measuring the quality of phylogenies based on the composition of sparse structures. Specifically, we use a set of repetitive gapped patterns, called motifs, whose length and sparsity have never been considered before. We find that extremely sparse motifs in mitochondrial proteomes support phylogenies of quality comparable to state-of-the-art string-based algorithms. Moving from maximal motifs -- motifs that cannot be made more specific without losing support -- to a set of generators of decreasing size and redundancy generally degrades classification, suggesting that redundancy itself is a key factor for the efficient reconstruction of phylogenies. This is perhaps the first time that the composition of all motifs of a proteome has been systematically used in large-scale phylogeny reconstruction.

    Extracting all maximal motifs, or even their compact generators, is infeasible for entire genomes. In the last part of this thesis, we therefore study the robustness of similarity measures built around the dictionary of LZW -- the variant of the LZ78 compression algorithm proposed by Welch -- and of some of its recently introduced gapped variants. These algorithms use a very small vocabulary, run in time linear in the length of the input strings, and can be made even faster than LZ77 in practice. We find that dissimilarity measures based on maximal strings in the dictionary of LZW support phylogenies comparable to state-of-the-art methods on test proteomes. Introducing a controlled proportion of gaps does not degrade classification, and makes it possible to discard up to 20% of each input proteome during comparison.
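
    As an editorial illustration of the LZW-based comparison discussed above, the sketch below collects the phrases LZW inserts into its dictionary while parsing a string, keeps the maximal ones (those that are not proper prefixes of other phrases in the set), and compares two sequences with a Jaccard-style distance on those phrase sets. The particular distance and the toy amino-acid strings are assumptions; the thesis's measures and its gapped LZW variants are more elaborate.

```python
def lzw_phrases(s):
    """Return the set of phrases LZW inserts into its dictionary while parsing s."""
    dictionary = set(s)          # seed the dictionary with the alphabet of s
    phrases = set()
    w = ""
    for c in s:
        if w + c in dictionary:
            w += c               # extend the current match
        else:
            dictionary.add(w + c)
            phrases.add(w + c)   # record the newly created phrase
            w = c
    return phrases

def maximal(phrases):
    """Phrases that are not proper prefixes of another phrase in the set."""
    return {p for p in phrases
            if not any(q != p and q.startswith(p) for q in phrases)}

def dissimilarity(a, b):
    # Jaccard-style distance on maximal LZW phrase sets (an illustrative choice).
    pa, pb = maximal(lzw_phrases(a)), maximal(lzw_phrases(b))
    return 1.0 - len(pa & pb) / len(pa | pb)

# Toy "proteomes": the more shared repeated structure, the smaller the distance.
print(dissimilarity("MKTAYIAKQR" * 8, "MKTAYIAKQR" * 6 + "GVDLSHQWNE" * 2))
print(dissimilarity("MKTAYIAKQR" * 8, "GVDLSHQWNE" * 8))
```
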
  • Item
    Stochastic m-estimators: controlling accuracy-cost tradeoffs in machine learning
    (Georgia Institute of Technology, 2011-11-15) Dillon, Joshua V.
    m-Estimation represents a broad class of estimators, including least-squares and maximum likelihood, and is a widely used tool for statistical inference. Its successful application, however, often requires negotiating physical resources for desired levels of accuracy. These limiting factors, which we abstractly refer to as costs, may be computational, such as time-limited cluster access for parameter learning, or financial, such as purchasing human-labeled training data under a fixed budget. This thesis explores these accuracy-cost tradeoffs by proposing a family of estimators that maximizes a stochastic variation of the traditional m-estimator. Such "stochastic m-estimators" (SMEs) are constructed by stitching together different m-estimators at random. Each instantiation resolves the accuracy-cost tradeoff differently, and taken together they span a continuous spectrum of accuracy-cost tradeoff resolutions. We prove the consistency of the estimators and provide formulas for their asymptotic variance and statistical robustness. We also assess their cost with respect to two concerns typical of machine learning: computational complexity and labeling expense. For the sake of concreteness, we discuss experimental results in the context of a variety of discriminative and generative Markov random fields, including Boltzmann machines, conditional random fields, and model mixtures. The theoretical and experimental studies demonstrate the effectiveness of the estimators when computational resources are insufficient or when obtaining additional labeled samples is necessary. We also demonstrate that in some cases the stochastic m-estimator is associated with robustness, thereby increasing its statistical accuracy and representing a win-win.
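
    One way to picture the construction is the toy sketch below: when estimating a location parameter, each observation is randomly assigned one of two component m-estimators, a robust Huber term standing in for an "expensive" component and a squared-error term for a "cheap" one, and the mixing probability tunes the accuracy-cost tradeoff. The losses, step size, and data here are illustrative assumptions, not the estimators analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sme_location(x, p_expensive):
    """Toy stochastic m-estimator of a location parameter.

    A coin flip per observation decides which psi function it contributes:
    Huber (robust, standing in for a costlier component) or squared error.
    """
    use_huber = rng.random(x.size) < p_expensive
    theta = np.median(x)                     # crude initializer
    for _ in range(200):                     # fixed-step gradient descent
        r = theta - x
        psi = np.where(use_huber, np.clip(r, -1.0, 1.0), r)
        theta -= 0.5 * psi.mean()
    return theta

# Contaminated sample: accuracy improves as more observations receive
# the robust (costlier) component, tracing out an accuracy-cost curve.
x = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(8.0, 1.0, 100)])
for p in (0.0, 0.3, 1.0):
    est = np.mean([sme_location(x, p) for _ in range(20)])
    print(f"p_expensive={p:.1f}  estimate={est:+.3f}")  # true location is 0
```
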
  • Item
    A parallel geometric multigrid method for finite elements on octree meshes applied to elastic image registration
    (Georgia Institute of Technology, 2009-06-24) Sampath, Rahul Srinivasan
    The first component of this work is a parallel algorithm for constructing non-uniform octree meshes for finite element computations. Prior to octree meshing, the linear octree data structure must be constructed and a constraint known as "2:1 balancing" must be enforced; parallel algorithms for these two subproblems are also presented. The second component of this work is a parallel matrix-free geometric multigrid algorithm for solving elliptic partial differential equations (PDEs) on these octree meshes. The last component of this work is a parallel multiscale Gauss-Newton optimization algorithm for solving the elastic image registration problem. The registration problem is discretized using finite elements on octree meshes, and the parallel geometric multigrid algorithm is used as a preconditioner in the Conjugate Gradient (CG) algorithm to solve the linear system of equations formed in each Gauss-Newton iteration.

    Several ideas were used to reduce the overhead of constructing the octree meshes. These include (a) lowering communication costs by reducing the number of synchronizations and the communication message size, (b) reducing the number of searches required to build element-to-vertex mappings, and (c) a compression scheme that reduces the memory footprint of the entire data structure.

    To our knowledge, the multigrid algorithm presented in this work is the only matrix-free multiplicative geometric multigrid implementation for solving finite element equations on octree meshes using thousands of processors. The proposed registration algorithm is also unique: it combines adaptivity, parallelism, fast optimization algorithms, and fast linear solvers.

    All the algorithms were implemented in C++ using the Message Passing Interface (MPI) standard and were built on top of the PETSc library from Argonne National Laboratory. The multigrid implementation has been released as open-source software: Dendro. Several numerical experiments were performed to test the performance of the algorithms on a variety of NSF TeraGrid platforms. Our largest run was a highly nonuniform, 8-billion-unknown elasticity calculation on 32,000 processors.
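
    For readers unfamiliar with linear octrees, the sketch below shows the core representational trick: each octant is identified by a Morton (Z-order) key obtained by interleaving the bits of its anchor coordinates, so the whole tree can be stored as a flat sorted array and queried with binary search. This toy Python fragment, including how the level is packed into the key, is an illustrative assumption; Dendro's C++ implementation is considerably more involved.

```python
def morton_id(x, y, z, level, max_level=5):
    """Interleave anchor-coordinate bits into a Morton (Z-order) key.

    Linear octrees keep octants sorted by this key instead of storing
    parent/child pointers. Packing the level into the low bits makes an
    octant sort before descendants that share its anchor.
    """
    key = 0
    for b in range(max_level):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return (key << 4) | level

# A tiny non-uniform "linear octree": octants given as (x, y, z, level)
# anchors on a 2^5 grid, kept sorted by Morton key.
octants = [(16, 0, 0, 1), (0, 0, 0, 1), (0, 16, 0, 2), (8, 8, 8, 3)]
octants.sort(key=lambda o: morton_id(*o))
for o in octants:
    print(o, bin(morton_id(*o)))
```
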