Computational Science and Engineering Seminar Series

Series Type
Event Series
Associated Organization(s)

Publication Search Results

Now showing 1 - 10 of 35
  • Item
    Optimization for Machine Learning: SMO-MKL and Smoothing Strategies
    (Georgia Institute of Technology, 2011-04-15) Vishwanathan, S. V. N. ; Purdue University
    Our objective is to train $p$-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and scales efficiently to large problems. As a result, it has gained widespread acceptance, and SVMs are routinely trained using SMO in diverse real-world applications. Training using SMO has been a long-standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore cannot be optimised using SMO-style coordinate ascent. In this paper, we demonstrate that linear MKL regularised with the $p$-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than the state-of-the-art specialised $p$-norm MKL solvers. We show that we can train on a hundred thousand kernels in less than fifteen minutes and on fifty thousand points in nearly an hour on a single core using standard hardware.
  • Item
    The Aha! Moment: From Data to Insight
    (Georgia Institute of Technology, 2014-02-07) Shahaf, Dafna ; Georgia Institute of Technology. School of Computational Science and Engineering ; Stanford University
    The amount of data in the world is increasing at incredible rates. Large-scale data has the potential to transform almost every aspect of our world, from science to business; for this potential to be realized, we must turn data into insight. In this talk, I will describe two of my efforts to address this problem computationally. The first project, Metro Maps of Information, aims to help people understand the underlying structure of complex topics, such as news stories or research areas. Metro Maps are structured summaries that can help us understand the information landscape, connect the dots between pieces of information, and uncover the big picture. The second project proposes a framework for automatic discovery of insightful connections in data. In particular, we focus on identifying gaps in medical knowledge: our system recommends directions of research that are both novel and promising. I will formulate both problems mathematically and provide efficient, scalable methods for solving them. User studies on real-world datasets demonstrate that our methods help users acquire insight efficiently across multiple domains.
  • Item
    Accurate Inference of Phylogenetic Relationships from Multi-locus Data
    (Georgia Institute of Technology, 2010-03-09) Nakhleh, Luay ; Rice University. Dept. of Computer Science ; Rice University. Dept. of Biochemistry and Cell Biology
    Accurate inference of phylogenetic relationships of species, and understanding their relationships with gene trees, are two central themes in molecular and evolutionary biology. Traditionally, a species tree is inferred by (1) sequencing a genomic region of interest from the group of species under study, (2) reconstructing its evolutionary history, and (3) declaring it to be the estimate of the species tree. However, recent analyses of increasingly available multi-locus data from various groups of organisms have demonstrated that different genomic regions may have evolutionary histories (called "gene trees") that disagree with each other, as well as with that of the species. This observation has called into question the suitability of the traditional approach to species tree inference. Further, when some, or all, of these disagreements are caused by reticulate evolutionary events, such as hybridization, the phylogenetic relationship of the species is more appropriately modeled by a phylogenetic network than a tree. As a result, a new, post-genomic paradigm has emerged, in which multiple genomic regions are analyzed simultaneously, and their evolutionary histories are reconciled in order to infer the evolutionary history of the species, which may not necessarily be treelike. In this talk, I will describe our recent work on developing mathematical criteria and algorithmic techniques for analyzing incongruence among gene trees, and inferring phylogenetic relationships among species despite such incongruence. This includes work on lineage sorting, reticulate evolution, as well as simultaneous treatment of both. If time permits, I will describe our recent work on population genomic analysis of bacterial data, and the implications for the evolutionary forces shaping the genomic diversity in these populations.
  • Item
    High-performance-computing challenges for heart simulations
    (Georgia Institute of Technology, 2012-08-31) Fenton, Flavio H. ; Georgia Institute of Technology. School of Physics
    The heart is an electro-mechanical system in which, under normal conditions, electrical waves propagate in a coordinated manner to initiate an efficient contraction. In pathologic states, propagation can destabilize and exhibit chaotic dynamics, mostly produced by single or multiple rapidly rotating spiral/scroll waves that generate complex spatiotemporal patterns of activation that inhibit contraction and can be lethal if untreated. Despite much study, little is known about the actual mechanisms that initiate, perpetuate, and terminate spiral waves in cardiac tissue. In this talk, I will motivate the problem with some experimental examples and then discuss how we study the problem from a computational point of view, from the numerical models derived to represent the dynamics of single cells to the coupling of millions of cells to represent the three-dimensional structure of a working heart. Some of the major difficulties of computer simulations for these kinds of systems include: i) different orders of magnitude in time scales, from milliseconds to seconds; ii) millions of degrees of freedom over millions of integration steps within irregular domains; and iii) the need for near-real-time simulations. Advances in these areas will be discussed, as well as the use of GPUs over the web using WebGL.
  • Item
    Dependable direct solutions for linear systems using a little extra precision
    (Georgia Institute of Technology, 2009-08-21) Riedy, E. Jason ; Georgia Institute of Technology. School of Computational Science and Engineering
    Solving a square linear system Ax=b often is considered a black box. It's supposed to "just work," and failures often are blamed on the original data or subtleties of floating-point. Now that we have an abundance of cheap computations, however, we can do much better. A little extra precision in just the right places produces accurate solutions cheaply or demonstrates when problems are too hard to solve without significant cost. This talk will outline the method, iterative refinement with a new twist; the benefits, small backward and forward errors; and the trade-offs and unexpected benefits.
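The idea of iterative refinement with extra precision in the residual can be illustrated with a minimal sketch. This is not the speaker's method or code: the function names (`solve_float`, `residual_exact`, `refine`) are hypothetical, the "extra precision" is simulated here with exact rational arithmetic from Python's `fractions` module, and the solver refactorizes on every step for brevity where a practical implementation would reuse the factorization.

```python
from fractions import Fraction


def solve_float(A, b):
    """Gaussian elimination with partial pivoting in working (double) precision."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def residual_exact(A, b, x):
    """Compute r = b - A x exactly in rational arithmetic, then round once."""
    return [
        float(Fraction(bi) - sum(Fraction(a) * Fraction(xj) for a, xj in zip(row, x)))
        for row, bi in zip(A, b)
    ]


def refine(A, b, steps=3):
    """Iterative refinement: solve, then repeatedly correct using the residual."""
    x = solve_float(A, b)
    for _ in range(steps):
        r = residual_exact(A, b, x)   # the only place extra precision is used
        d = solve_float(A, r)         # correction solved in working precision
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(refine(A, b))
```

Only the residual is computed with extra precision; the solves themselves stay in working precision, which is why the extra cost is small.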
  • Item
    Composite Objective Optimization and Learning for Massive Datasets
    (Georgia Institute of Technology, 2010-09-03) Singer, Yoram ; Google Research ; Georgia Institute of Technology. School of Computational Science and Engineering
    Composite objective optimization is concerned with the problem of minimizing a two-term objective function which consists of an empirical loss function and a regularization function. Applications with massive datasets often employ a regularization term which is non-differentiable or structured, such as L1 or mixed-norm regularization. Such regularizers promote sparse solutions and special structure of the parameters of the problem, which is a desirable goal for datasets of extremely high dimension. In this talk, we discuss several recently developed methods for performing composite objective minimization in the online learning and stochastic optimization settings. We start with a description of extensions of the well-known forward-backward splitting method to stochastic objectives. We then generalize this paradigm to the family of mirror descent algorithms. Our work builds on recent work which connects proximal minimization to online and stochastic optimization. We focus in the algorithmic part on a new approach, called AdaGrad, in which the proximal function is adapted throughout the course of the algorithm in a data-dependent manner. This temporal adaptation metaphorically allows us to find needles in haystacks as the algorithm is able to single out very predictive yet rarely observed features. We conclude with several experiments on large-scale datasets that demonstrate the merits of composite objective optimization and underscore superior performance of various instantiations of AdaGrad.
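A composite update of the kind described above can be sketched as follows, assuming a diagonal AdaGrad step size combined with an L1 proximal (soft-thresholding) step. The function names, the toy quadratic loss, and the parameter defaults are illustrative assumptions, not details taken from the talk.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def adagrad_l1(grad, w0, lam, eta=1.0, steps=1000, eps=1e-12):
    """Minimize loss(w) + lam * ||w||_1 with per-coordinate adaptive steps.

    grad: function returning the gradient of the smooth loss at w (assumed).
    """
    w = list(w0)
    G = [0.0] * len(w)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad(w)
        for i, gi in enumerate(g):
            G[i] += gi * gi
            lr = eta / (math.sqrt(G[i]) + eps)  # data-dependent step size
            # Gradient step on the loss, then proximal step on the regularizer.
            w[i] = soft_threshold(w[i] - lr * gi, lr * lam)
    return w


if __name__ == "__main__":
    # Toy smooth loss 0.5 * ||w - target||^2, whose gradient is w - target.
    target = [3.0, 0.5]
    w = adagrad_l1(lambda w: [wi - ti for wi, ti in zip(w, target)],
                   w0=[0.0, 0.0], lam=1.0)
    print(w)
```

For this toy loss the L1-regularized minimizer is the soft-thresholded target, so the first coordinate shrinks toward 2 and the second is driven exactly to zero, which illustrates how the non-differentiable regularizer produces sparsity.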
  • Item
    Coordinate Sampling for Sublinear Optimization and Nearest Neighbor Search
    (Georgia Institute of Technology, 2011-04-22) Clarkson, Kenneth L. ; Almaden Research Center (IBM Research). Dept. of Computer Science Principles and Methodologies
    I will describe randomized approximation algorithms for some classical problems of machine learning, where the algorithms have provable bounds that hold with high probability. Some of our algorithms are sublinear, that is, they do not need to touch all the data. Specifically, for a set of points $a_1, \ldots, a_n$ in $d$ dimensions, we show that finding a $d$-vector $x$ that approximately maximizes the margin $\min_i a_i \cdot x$ can be done in $O(n+d)/\epsilon^2$ time, up to logarithmic factors, where $\epsilon > 0$ is an additive approximation parameter. This was joint work with Elad Hazan and David Woodruff. A key step in these algorithms is the use of coordinate sampling to estimate dot products. This simple technique can be an effective alternative to random projection sketching in some settings. I will discuss the potential of coordinate sampling for speeding up some data structures for nearest neighbor searching in the Euclidean setting, via fast approximate distance evaluations.
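Estimating a dot product by coordinate sampling can be sketched as below. This sketch assumes uniform sampling of coordinates, chosen for simplicity; the abstract does not say which sampling distribution the speaker's algorithms use, and the function name `sampled_dot` is hypothetical.

```python
import random


def sampled_dot(a, x, k, rng=random):
    """Unbiased estimate of sum(a[j] * x[j]) from k uniformly sampled coordinates.

    Each sample contributes d * a[j] * x[j], whose expectation over a uniform
    index j is exactly the dot product; averaging k samples reduces variance.
    """
    d = len(a)
    total = 0.0
    for _ in range(k):
        j = rng.randrange(d)
        total += d * a[j] * x[j]
    return total / k


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.uniform(-1, 1) for _ in range(1000)]
    x = [rng.uniform(-1, 1) for _ in range(1000)]
    exact = sum(ai * xi for ai, xi in zip(a, x))
    print(exact, sampled_dot(a, x, k=200, rng=rng))
```

The estimator reads only `k` coordinates rather than all `d`, which is the source of the sublinear running time; its variance depends on how unevenly the products `a[j] * x[j]` are spread across coordinates.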
  • Item
    PHAST: Hardware-Accelerated Shortest Path Trees
    (Georgia Institute of Technology, 2011-02-25) Delling, Daniel ; Microsoft Research Silicon Valley ; Georgia Institute of Technology. School of Computational Science and Engineering
    We present a novel algorithm to solve the nonnegative single-source shortest path problem on road networks and other graphs with low highway dimension. After a quick preprocessing phase, we can compute all distances from a given source in the graph with essentially a linear sweep over all vertices. Because this sweep is independent of the source, we are able to reorder vertices in advance to exploit locality. Moreover, our algorithm takes advantage of features of modern CPU architectures, such as SSE and multi-core. Compared to Dijkstra's algorithm, our method needs fewer operations, has better locality, and is better able to exploit parallelism at multi-core and instruction levels. We gain additional speedup when implementing our algorithm on a GPU, where our algorithm is up to three orders of magnitude faster than Dijkstra's algorithm on a high-end CPU. This makes applications based on all-pairs shortest-paths practical for continental-sized road networks. Several algorithms, such as computing the graph diameter, exact arc flags, or centrality measures (exact reaches or betweenness), can be greatly accelerated by our method. Joint work with Andrew V. Goldberg, Andreas Nowatzyk, and Renato F. Werneck.
  • Item
    Efficient High-Order Discontinuous Galerkin Methods for Fluid Flow Simulations
    (Georgia Institute of Technology, 2010-02-22) Shahbazi, Khosro ; Brown University. Division of Applied Mathematics
  • Item
    Virus Quasispecies Assembly using Network Flows
    (Georgia Institute of Technology, 2009-09-25) Zelikovsky, Alexander ; Georgia State University
    Understanding how the genomes of viruses mutate and evolve within infected individuals is critically important in epidemiology. In this talk I focus on optimization problems in sequence assembly for viruses based on the 454 Life Sciences system. Several formulations of the quasispecies assembly problem and a measure of the assembly quality will be given. I will describe a scalable assembly method for quasispecies based on network flow and maximum likelihood formulations and then give details of existing and novel methods for reliably assembling quasispecies that have very long common segments. Finally, I report the results of assembling 44 quasispecies from the 1700 bp long E1E2 region of Hepatitis C Virus.