Organizational Unit:
School of Mathematics

Publication Search Results

Now showing 1 - 10 of 832
  • Item
    Improving Foundation Models
    (Georgia Institute of Technology, 2023-12-10) Komatsuzaki, Aran
    Foundation models are the family of models (e.g. GPT-4, CLIP) that are trained on a massive dataset and perform various downstream tasks, usually with either zero- or few-shot learning, optionally after fine-tuning. This dissertation presents a wide range of important measures we have taken to make foundation models more efficient, performant and versatile. In particular, we focus on three points of improvement: architecture, dataset and training. We first present our findings on how to optimally scale language models, which lead to significant performance improvements. We then present GPT-J, one of the earliest open-source large language models. Next, we show that the performance of ViT and T5, both Transformer-based foundation models, can be greatly improved for a given compute budget using Sparse Upcycling, which resumes training from a sparsely gated model built out of pretrained dense models. We also briefly discuss the LAION datasets, massive open-source datasets with around one billion text-image pairs that are used to train various state-of-the-art multimodal models, and the ARB benchmark, a highly challenging benchmark for evaluating state-of-the-art LLMs such as GPT-4. On the theoretical side, we prove that the feedforward layers of a transformer cannot be compressed without information loss, which may explain the power of sparsely gated models such as mixture-of-experts.
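The Sparse Upcycling idea above can be illustrated with a minimal pure-Python sketch. It assumes a one-hidden-layer ReLU feedforward block, experts that start as exact copies of that dense block, a freshly initialized router, and top-k gating whose weights are renormalized to sum to 1; these are illustrative choices, and the function names (dense_ffn, upcycle, moe_forward) are hypothetical rather than the dissertation's code. Under these assumptions the upcycled layer computes the same output as the dense layer at initialization, so training resumes from the dense model's function instead of from scratch.

```python
import copy
import math
import random

def dense_ffn(x, w_in, w_out):
    """Dense feedforward block: w_out @ relu(w_in @ x), with weights as nested lists."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def upcycle(w_in, w_out, num_experts, seed=0):
    """Build a sparsely gated layer whose experts all start as copies of the dense block."""
    rng = random.Random(seed)
    experts = [(copy.deepcopy(w_in), copy.deepcopy(w_out)) for _ in range(num_experts)]
    # The router has no dense counterpart, so it gets a small random initialization (assumed).
    router = [[rng.gauss(0.0, 0.02) for _ in range(len(w_in[0]))] for _ in range(num_experts)]
    return experts, router

def moe_forward(x, experts, router, k=2):
    """Top-k routing; the gate weights of the selected experts are renormalized to sum to 1."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    peak = max(logits[e] for e in top)
    gates = [math.exp(logits[e] - peak) for e in top]
    total = sum(gates)
    out = [0.0] * len(experts[0][1])
    for e, g in zip(top, gates):
        y = dense_ffn(x, *experts[e])
        out = [o + (g / total) * yi for o, yi in zip(out, y)]
    return out

w_in = [[0.5, -1.0], [1.5, 0.25], [-0.5, 0.75]]  # hidden x d_model
w_out = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]      # d_model x hidden
experts, router = upcycle(w_in, w_out, num_experts=4)
x = [0.3, -0.7]
print(dense_ffn(x, w_in, w_out), moe_forward(x, experts, router))
```

Once training starts, the copied experts receive different gradients through the router and can specialize.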
  • Item
    Mathematical Approaches to Identification problems -- Counting, RNA folding, and PDE identification
    (Georgia Institute of Technology, 2023-11-28) Tang Rajchel, Mengyi
    Mathematical algorithms have become an essential tool for uncovering hidden patterns and dynamic behaviors within complex datasets, helping to gain deeper insights and make informed choices in an era of data-driven decision-making. This thesis introduces several numerical algorithms addressing identification problems derived from mathematical models. These works place a specific emphasis on identifying and predicting structures and patterns within various types of datasets, while also offering the capacity to forecast the behavior of future data. Our contributions include StemP, a novel, simple, and deterministic algorithm that uses graph notation to predict the secondary structures of RNA sequences without any training process. Additionally, our work Counting Objects by Diffused Index (CODI) efficiently counts objects in digital images using a diffusion algorithm with an operator-splitting approach and the alternating direction minimization method inspired by color inpainting, delivering results within seconds. Furthermore, our works WeakIdent and FourierIdent focus on identifying differential equations in the physical and frequency domains, respectively. WeakIdent provides a general and robust framework for identifying differential equations, enhancing accuracy with the proposed mechanisms of narrow-fit and trimming. FourierIdent explores the benefits and challenges of using the frequency domain in differential equation identification, presenting comprehensive experiments that demonstrate improved robustness over state-of-the-art methods.
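As a hedged illustration of differential equation identification in general, the sketch below recovers du/dt = -2u from samples of u(t) = e^{-2t} by sequentially thresholded least squares over a small library {1, u, u^2}. This is the SINDy-style strong-form approach with finite-difference derivatives, swapped in for illustration; it is not WeakIdent, which works with the weak form and adds the narrow-fit and trimming mechanisms. The library, threshold, test signal, and helper names are all assumptions.

```python
import math

def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(columns, target):
    """Least-squares coefficients via the normal equations (adequate for tiny libraries)."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * t for p, t in zip(ci, target)) for ci in columns]
    return solve(gram, rhs)

def stlsq(library, target, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit on the survivors."""
    active = list(range(len(library)))
    coef = [0.0] * len(library)
    for _ in range(max_iter):
        coef = [0.0] * len(library)
        for j, c in zip(active, least_squares([library[j] for j in active], target)):
            coef[j] = c
        keep = [j for j in active if abs(coef[j]) >= threshold]
        if keep == active or not keep:
            break
        active = keep
    return [c if abs(c) >= threshold else 0.0 for c in coef]

dt = 0.01
u = [math.exp(-2.0 * i * dt) for i in range(101)]
# Central differences approximate du/dt at the interior grid points.
du_dt = [(u[i + 1] - u[i - 1]) / (2 * dt) for i in range(1, 100)]
u_mid = u[1:100]
library = [[1.0] * len(u_mid), u_mid, [v * v for v in u_mid]]
coef = stlsq(library, du_dt)
print(dict(zip(["1", "u", "u^2"], coef)))
```

The thresholding step is what turns an ordinary regression into a sparse model selection: spurious terms with small coefficients are removed and the remaining terms are refit.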
  • Item
    Set Images and Convexity Properties of Convolutions for Sum Sets and Difference Sets
    (Georgia Institute of Technology, 2023-07-31) Lee, Chi-Nuo
    Many recent breakthroughs in additive combinatorics, such as results relating to Roth's theorem or inverse sum set theorems, use a combination of Fourier-analytic and physical methods. Physical methods refer to results in physical space, such as almost-periodicity results for convolutions. This thesis focuses on the properties of convolutions. Given an abelian group $G$ and a set $A \subseteq G,$ we study the properties of the convolutions for sum sets and difference sets, $1_A*1_A$ and $1_A*1_{-A}.$ Given $\bm{x} \in G^n,$ we consider its corresponding \emph{set image} of the sum set, the image of $f(A):= 1_A*1_A(\bm{x})$ as $A$ ranges over subsets of $G,$ and the similarly defined set image of the difference set. We break down the study of set images into two cases: when $\bm{x}$ is independent, and when $\bm{x}$ is an arithmetic progression. In both cases, we provide convexity results for the set images of both the sum set and the difference set. For the case of the arithmetic progression, we prove convexity by first showing a recurrence relation for the distribution of the convolution. Finally, we prove a smoothness property regarding 4-fold convolutions $1_A*1_A*1_A*1_A.$ We then construct different examples to better understand possible bounds for the smoothness property in the case of 2-fold convolutions $1_A*1_A.$
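To make the convolutions and set images concrete, here is a brute-force sketch for the cyclic group G = Z/nZ. It assumes the unnormalized counting convention (f * g)(x) = sum over y of f(y) g(x - y), so 1_A * 1_A(x) counts pairs (a, b) in A^2 with a + b = x; normalized conventions differ by a constant factor. Because it enumerates all 2^n subsets A, it is only practical for very small n, and the function names are hypothetical.

```python
from itertools import combinations

def conv_sum(A, n, x):
    """(1_A * 1_A)(x) = #{(a, b) in A^2 : a + b = x} in Z/nZ."""
    return sum(1 for a in A if (x - a) % n in A)

def conv_diff(A, n, x):
    """(1_A * 1_{-A})(x) = #{(a, b) in A^2 : a - b = x} in Z/nZ."""
    return sum(1 for a in A if (a - x) % n in A)

def set_image(n, xs, conv):
    """Image of A -> (conv(A, x_1), ..., conv(A, x_k)) over all subsets A of Z/nZ."""
    image = set()
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            A = set(subset)
            image.add(tuple(conv(A, n, x) for x in xs))
    return image

print(sorted(set_image(7, [1, 2, 3], conv_sum)))
```

Here the call uses the arithmetic progression x = (1, 2, 3) in Z/7Z; passing conv_diff instead gives the corresponding set image of the difference set.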
  • Item
    Unfoldings of Convex Polyhedra
    (Georgia Institute of Technology, 2023-07-28) Barvinok, Nicholas
    A pseudo-edge graph of a convex polyhedron K is a 3-connected embedded graph in K whose vertices coincide with those of K, whose edges are distance-minimizing geodesics, and whose faces are convex. We construct a convex polyhedron K in Euclidean 3-space with a pseudo-edge graph with respect to which K is not unfoldable. The proof is based on a result of Pogorelov on convex caps with prescribed curvature, and an unfoldability criterion for almost flat convex caps due to Tarasov. Our example, which has 340 vertices, significantly simplifies an earlier construction by Tarasov, and confirms that Dürer's problem has a negative answer for pseudo-edge unfoldings. We then use the Maxwell-Cremona correspondence to present evidence both for and against Dürer's problem.
  • Item
    Theory and computation of Wasserstein geometric flows with application to time-dependent Schrödinger equation
    (Georgia Institute of Technology, 2023-07-26) Wu, Hao
    We focus on the systematic study of a novel computational framework for Wasserstein geometric flows, which describe the time evolution of probability density functions on the infinite-dimensional Wasserstein manifold. In particular, Wasserstein gradient flows (WGFs) and Wasserstein Hamiltonian flows (WHFs) are the main examples used throughout this research; they have many applications in real-world physical systems and, more recently, in deep learning problems such as generative models. The main feature of our computational framework is the use of deep neural networks to parameterize push-forward maps that push a simple reference density to the densities solving the WGFs or WHFs. This approach essentially reduces these flows, defined on the infinite-dimensional Wasserstein manifold, to finite-dimensional dynamical systems of the parameters. These new dynamical systems are parameterizations of the WGFs and WHFs, which we call PWGFs and PWHFs for short. By leveraging a relaxed pullback Wasserstein metric on the parameter space, we develop effective numerical methods to approximate the solutions of these flows. For WGFs, we show that our proposed PWGF scheme can be applied to WGFs with general energy functionals. Moreover, our scheme does not require any spatial discretization and thus scales to problems in high spatial dimensions. Our approach only requires solving standard least squares problems in each time step, and is hence training-free. With these features, PWGF demonstrates promising computational efficiency and accuracy on a variety of WGF examples, as shown in our numerical experiments. For WHFs, we adopt a similar idea but apply it to the more challenging Hamiltonian systems on Wasserstein manifolds. To preserve the Hamiltonian, we employ a symplectic numerical scheme to solve the PWHF, where a fixed-point iteration is used to solve the implicit update equation for the model parameters.
Similar to PWGF, PWHF is training-free and thus avoids the issues of nonconvex optimization algorithms. We also present the connection between the Lagrangian and Eulerian perspectives of the original flows using PWHF. Approximation error analysis and a number of numerical examples are also provided for PWHF. Furthermore, we consider the Schrödinger equations (SEs) and show how to use PWHF to solve them.
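The reduction to a finite-dimensional parameter flow with one least-squares solve per time step can be sketched in a deliberately tiny setting. The sketch assumes one spatial dimension, an affine push-forward T(z) = a z + b of the standard normal in place of a neural network, the potential-energy functional with V(x) = (x - m)^2 / 2 (whose Wasserstein gradient flow moves each particle with velocity -V'(x)), and an explicit Euler step; it is not the thesis's PWGF implementation, and pwgf_affine is a hypothetical name. At each step, the parameter velocities are chosen so that the induced particle velocities match -V' in the least-squares sense over reference samples.

```python
import random

def pwgf_affine(a, b, m, dt=0.01, steps=500, num_samples=1000, seed=0):
    """Parameterized gradient flow for V(x) = (x - m)^2 / 2 with push-forward T(z) = a*z + b."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    # Normal-equation entries that depend only on the reference samples z ~ N(0, 1).
    s_zz = sum(z * z for z in zs) / num_samples
    s_z = sum(zs) / num_samples
    det = s_zz - s_z * s_z
    for _ in range(steps):
        # Target particle velocities v(T(z)) = -V'(T(z)).
        v = [-(a * z + b - m) for z in zs]
        r_z = sum(z * vi for z, vi in zip(zs, v)) / num_samples
        r_1 = sum(v) / num_samples
        # Least squares: minimize mean (z*da + db - v)^2, i.e. solve
        # [[s_zz, s_z], [s_z, 1]] @ [da, db] = [r_z, r_1].
        da = (r_z - s_z * r_1) / det
        db = (s_zz * r_1 - s_z * r_z) / det
        # Explicit Euler step in parameter space.
        a, b = a + dt * da, b + dt * db
    return a, b

a, b = pwgf_affine(a=2.0, b=-3.0, m=1.0)
print(a, b)
```

Because the affine family can represent this particular flow exactly, the least-squares fit has zero residual and the parameters follow da/dt = -a and db/dt = -(b - m): the density contracts toward a point mass at m, as the exact flow without a diffusion term does.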
  • Item
    Applications of the filtered mapping cone surgery formulas in Heegaard Floer homology
    (Georgia Institute of Technology, 2023-07-25) Zhou, Hongyi
    Heegaard Floer homology is a package of powerful invariants for three-manifolds and knots, equipped with many robust computational tools. Among them is the filtered mapping cone surgery formula of Hedden-Levine, which computes the knot Floer homology of the image of the knot meridian in the manifold obtained by surgery on the knot. In particular, this formula allows one to explicitly compute the knot Floer complex of knots in three-manifolds other than S^3. This thesis highlights the practical value of the filtered mapping cone formula in solving problems in low-dimensional topology. We include two applications and a refinement of the formula.
  • Item
    Informed Sampling in Discrete Space, and its Applications
    (Georgia Institute of Technology, 2023-07-25) Sun, Haoran
    Sampling has been an important problem in physics, statistics, computer science, and machine learning. When the target distribution is intractable, Metropolis-Hastings algorithms are widely used. Within the Metropolis-Hastings paradigm, informed sampling refers to using information about the target distribution to guide the proposal distribution; in continuous space, this typically means gradient-based sampling. Over the past decades, gradient-based sampling algorithms have significantly improved sampling efficiency in continuous space, both theoretically and practically. However, informed sampling in discrete space is less understood, as the diffusion processes used in continuous space do not apply in discrete space. In this thesis, we introduce recent advances in informed sampling in discrete space. Specifically:
      - Discrete Langevin Dynamics: The Langevin dynamics, from which gradient-based sampling algorithms in continuous space are designed, is the gradient flow that minimizes the KL-divergence to the target distribution on the Wasserstein manifold. Inspired by this connection, we derive the discrete Langevin dynamics as a continuous-time Markov chain by leveraging the gradient flow on the Wasserstein manifold of discrete distributions.
      - Designing Informed Discrete Samplers: The discrete Langevin dynamics provides a principled framework for designing informed samplers in discrete space. We discuss numerical methods for discrete-time simulation of the discrete Langevin dynamics, and approximations of the target information that allow informed sampling in discrete space to be implemented efficiently with the help of modern accelerators such as GPUs. We also derive an asymptotic theorem that allows us to adaptively tune the parameters of informed samplers.
      - Applications: We investigate applications of informed sampling in discrete space, including Monte Carlo integration, combinatorial optimization, and generative modeling. We demonstrate the strong performance of informed sampling compared with classical methods such as Gibbs sampling. We also build a benchmark for sampling in discrete space to facilitate future research.
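The thesis's discrete Langevin samplers are not reproduced here. As a hedged illustration of what using target information to guide the proposal means in discrete space, the sketch below uses a locally balanced single-flip proposal with weights sqrt(pi(y) / pi(x)) (a known construction due to Zanella, named here as a swapped-in technique) inside Metropolis-Hastings, on a hypothetical small Ising-style target over {0, 1}^4. All function names, the target's fields, and the coupling are assumptions; the exact marginals are computed by enumeration only to check the sampler.

```python
import itertools
import math
import random

def flip(x, i):
    """Return a copy of the binary tuple x with bit i flipped."""
    return x[:i] + (1 - x[i],) + x[i + 1:]

def proposal(x, log_pi):
    """Locally balanced proposal: q(flip i | x) proportional to sqrt(pi(flip(x, i)) / pi(x))."""
    lp = log_pi(x)
    weights = [math.exp(0.5 * (log_pi(flip(x, i)) - lp)) for i in range(len(x))]
    total = sum(weights)
    return [w / total for w in weights]

def informed_mh_step(x, log_pi, rng):
    """One Metropolis-Hastings step using the informed proposal above."""
    q_fwd = proposal(x, log_pi)
    i = rng.choices(range(len(x)), weights=q_fwd)[0]
    y = flip(x, i)
    q_rev = proposal(y, log_pi)  # flipping bit i again moves y back to x
    log_ratio = log_pi(y) - log_pi(x) + math.log(q_rev[i]) - math.log(q_fwd[i])
    return y if rng.random() < math.exp(min(0.0, log_ratio)) else x

def ising_log_pi(h, coupling):
    """Hypothetical target: unnormalized log-probability of a 1D Ising-style chain on {0, 1}^n."""
    def log_pi(x):
        field = sum(hi * xi for hi, xi in zip(h, x))
        pairs = sum(x[k] * x[k + 1] for k in range(len(x) - 1))
        return field + coupling * pairs
    return log_pi

def exact_marginals(log_pi, n):
    """P(x_k = 1) for each k by brute-force enumeration (small n only)."""
    states = list(itertools.product((0, 1), repeat=n))
    probs = [math.exp(log_pi(s)) for s in states]
    z = sum(probs)
    return [sum(p * s[k] for p, s in zip(probs, states)) / z for k in range(n)]

def estimate_marginals(log_pi, n, steps, seed=0):
    """Empirical P(x_k = 1) along one informed Metropolis-Hastings chain."""
    rng = random.Random(seed)
    x = (0,) * n
    counts = [0] * n
    for _ in range(steps):
        x = informed_mh_step(x, log_pi, rng)
        for k in range(n):
            counts[k] += x[k]
    return [c / steps for c in counts]

log_pi = ising_log_pi(h=[0.5, -0.3, 0.8, 0.0], coupling=0.4)
exact = exact_marginals(log_pi, 4)
estimate = estimate_marginals(log_pi, 4, steps=30000)
print(exact, estimate)
```

Compared with a uniform random flip, the proposal favors flips that increase the target probability, while the Metropolis-Hastings correction keeps the chain's stationary distribution exactly equal to the target.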
  • Item
    Divisors and multiplicities under tropical and signed shadows
    (Georgia Institute of Technology, 2023-07-24) Gunn, Trevor
    This thesis addresses questions related to divisors and multiplicities as analyzed through tropicalization or signs. It begins with an introduction to the subject matter written for a non-specialist. The next chapter concerns fully-faithful tropicalization in low dimension. The last two chapters concern questions about Baker-Lorscheid multiplicities in one and several variables, respectively. With fully-faithful tropicalization, the goal was to construct a tropicalization map from a curve to a 3-dimensional toric variety. The constraints are that the map must be injective and the gcd of all the slopes must be 1, so that we get an isometry with respect to the lattice length metric. We also have some results about smooth, fully-faithful tropicalizations of a genus g curve in a toric variety of dimension 2g + 2 (three more than the lower bound imposed by the maximal vertex degree). For multiplicities, I present a broad generalization of the work of Baker and Lorscheid on univariate multiplicities over hyperfields. In their work, Baker and Lorscheid show how Descartes's Rule of Signs and Newton's Polygon Rule may be obtained from factorizing polynomials in the arithmetics of signs and tropical numbers, respectively. In Chapter 3, I extend their multiplicity operator to a class of arithmetics, which I call "whole-idylls." In particular, we have a way of extending multiplicity rules by extending the arithmetic by a valuation. An important corollary is that for so-called "stringent" hyperfields, we have a degree bound: the sum of multiplicities for a polynomial is bounded by its degree. The last chapter contains my work with Andreas Gross on multivariate hyperfield multiplicities. We give particular attention to the hyperfield of signs and the so-far-unresolved Multivariate Descartes Question. We define several multiplicity operators for linear factors of polynomials and apply them to systems of equations.
We recover the lower bound of Itenberg-Roy on any potential upper bound for roots with a given sign pattern.
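The classical Descartes's Rule of Signs, which the abstract notes Baker and Lorscheid recover from factorization over the arithmetic of signs, bounds the number of positive real roots (counted with multiplicity) by the number of sign changes in the coefficient sequence; applying it to p(-x) bounds the negative roots. The sketch below works with ordinary integer coefficients rather than hyperfield arithmetic, and its helper names are hypothetical.

```python
def sign_changes(coeffs):
    """Count sign changes in a coefficient sequence, ignoring zero coefficients."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def negate_variable(coeffs):
    """Coefficients of p(-x), given those of p(x) listed from highest degree down."""
    d = len(coeffs) - 1
    return [c * (-1) ** (d - k) for k, c in enumerate(coeffs)]

def poly_from_roots(roots):
    """Coefficients, highest degree first, of the product of (x - r) over the given roots."""
    coeffs = [1]
    for r in roots:
        # x * p(x) appends a trailing zero; subtract r * p(x) aligned one place to the right.
        coeffs = [c - r * q for c, q in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

p = poly_from_roots([1, 2, -3])  # x^3 - 7x + 6
print(p, sign_changes(p), sign_changes(negate_variable(p)))
```

For (x - 1)(x - 2)(x + 3) = x^3 - 7x + 6, the coefficients show two sign changes and those of p(-x) show one, matching the two positive roots and the single negative root.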
  • Item
    Functional Itô Calculus for Lévy Processes (With a View Towards Mathematical Finance)
    (Georgia Institute of Technology, 2023-07-24) Viquez Bolanos, Jorge Aurelio Aurelio
    We examine the relationship between Dupire’s functional derivative and a variant of the functional derivative developed by Kim for analyzing functionals in systems with delay. Our findings demonstrate that if Dupire’s space derivatives exist, differentiability in any continuous functional direction implies differentiability in any other direction, including the constant one. Additionally, we establish that co-invariant differentiable functionals can lead to a functional Itô formula in the pathwise setting of Cont and Fournié under suitable regularity conditions. Next, we turn to functional extensions of the Meyer-Tanaka formula and to efforts to characterize the zero-energy term in integral representations of functionals of semimartingales. Using Eisenbaum’s idea for reversible semimartingales, we obtain an optimal integration formula for Lévy processes, which avoids imposing additional regularity requirements on the functional’s space derivative and extends other approaches based on the stationarity and martingale properties of Lévy processes. Finally, we address integral representations for the Delta of a path-dependent pay-off, generalizing Benth, Di Nunno, and Khedher’s framework for approximating functionals of jump-diffusions to cases where these may be driven by a process satisfying a path-dependent differential equation. Our results extend Jazaerli and Saporito’s formula for the Delta of functionals to the jump-diffusion case. We propose an adjoint formula for the horizontal derivative, with the aim of obtaining more tractable formulas for the Delta of options with strongly path-dependent pay-offs.
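Dupire's space (vertical) and horizontal derivatives can be illustrated by finite differences on a discretized path. The sketch assumes a uniform time grid, represents a path as a list of values X(0), X(dt), ..., X(t), approximates the running integral by a left Riemann sum, and uses hypothetical function names; it is a numerical illustration, not a construction from the thesis. The vertical derivative bumps only the current value X(t), while the horizontal derivative extends the path flat in time.

```python
def running_integral(path, dt):
    """F(X_t) = integral of X over [0, t], approximated by a left Riemann sum on a uniform grid."""
    return sum(path[:-1]) * dt

def terminal_square(path, dt):
    """F(X_t) = X(t)^2, which depends only on the current value of the path."""
    return path[-1] ** 2

def horizontal_derivative(functional, path, dt):
    """Extend the path flat by one grid step (keeping X(t)) and take a forward difference in time."""
    extended = path + [path[-1]]
    return (functional(extended, dt) - functional(path, dt)) / dt

def vertical_derivative(functional, path, dt, h=1e-6):
    """Bump only the current value X(t) by +/- h and take a central difference."""
    up = path[:-1] + [path[-1] + h]
    down = path[:-1] + [path[-1] - h]
    return (functional(up, dt) - functional(down, dt)) / (2 * h)

path = [0.0, 0.4, -0.1, 0.7, 0.3]
dt = 0.25
print(horizontal_derivative(running_integral, path, dt), vertical_derivative(running_integral, path, dt))
print(horizontal_derivative(terminal_square, path, dt), vertical_derivative(terminal_square, path, dt))
```

For the running integral of X the horizontal derivative is X(t) and the vertical derivative is 0, while for X(t)^2 the vertical derivative is 2X(t) and the horizontal derivative is 0; the sketch reproduces both patterns on the sample path, up to floating-point error.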
  • Item
    Tutte paths and even covers
    (Georgia Institute of Technology, 2023-07-24) Wigal, Michael Carroll
    A Tutte path of a graph G is a path P of G such that every component of G − P has at most three attachments on P. Tutte paths are well studied in the literature due to their applications to the Hamiltonian cycle problem. We prove the existence of Tutte paths in which the number of components is bounded for circuit graphs, a natural family of planar graphs that generalizes 3-connected planar graphs. As a consequence, we obtain a sharp lower bound for the circumference of essentially 4-connected planar graphs, answering a conjecture of Fabrici, Harant, Mohr, and Schmidt. The Traveling Salesperson Problem (TSP) is a foundational problem in the optimization literature and generalizes the Hamiltonian cycle problem. Motivated by the TSP, we investigate even covers of subcubic graphs, i.e., finding a small number of cycles that cover the majority of the vertices (a graph is subcubic if its maximum degree is 3). As an application, we show that if G is a 2-connected subcubic graph with n vertices and n_2 vertices of degree 2, then G has a TSP walk of length at most (5n + n_2)/4 − 1, establishing a conjecture of Dvořák, Král', and Mohar. There is an infinite family of subcubic (respectively, cubic) graphs whose minimum TSP walks have length (5n + n_2)/4 − 1 (respectively, 5n/4 − 2). As this walk can be found in quadratic time, this provides a state-of-the-art 5/4-approximation algorithm for the TSP on 2-connected cubic graphs, improving the prior best guarantee of 9/7.
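The Tutte-path condition stated above can be checked directly with a breadth-first search over the components of G − V(P). This sketch checks exactly the condition as phrased in the abstract; definitions in the literature are usually phrased in terms of P-bridges and can add further requirements (for instance, relative to the outer face of a plane graph), which are not checked here. The adjacency-dictionary representation and the function name is_tutte_path are assumptions.

```python
from collections import deque

def is_tutte_path(adj, path):
    """Check that every component of G - V(P) has at most three attachments on P.

    adj maps each vertex to the set of its neighbours; path lists the vertices of P in order.
    An attachment of a component is a vertex of P adjacent to some vertex of that component.
    """
    if any(v not in adj[u] for u, v in zip(path, path[1:])):
        raise ValueError("consecutive path vertices must be adjacent")
    on_path = set(path)
    seen = set()
    for start in adj:
        if start in on_path or start in seen:
            continue
        # Breadth-first search over one component of G - V(P), collecting its attachments.
        attachments = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in on_path:
                    attachments.add(w)
                elif w not in seen:
                    seen.add(w)
                    queue.append(w)
        if len(attachments) > 3:
            return False
    return True

# Wheel with hub 0 and rim cycle 1-2-3-4-1.
wheel = {0: {1, 2, 3, 4}, 1: {0, 2, 4}, 2: {0, 1, 3}, 3: {0, 2, 4}, 4: {0, 1, 3}}
print(is_tutte_path(wheel, [0, 1, 2]), is_tutte_path(wheel, [1, 2, 3, 4]))
```

In the wheel, the path 0-1-2 leaves the component {3, 4} with attachments {0, 1, 2}, so the condition holds; the rim path 1-2-3-4 leaves the hub as a component with four attachments, so it fails.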