Organizational Unit:
Transdisciplinary Research Institute for Advancing Data Science
Publication Search Results
Showing 1-10 of 16 results

Lecture 5: Mathematics for Deep Neural Networks: Energy landscape and open problems (2019-03-18)
Schmidt-Hieber, Johannes ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; University of Twente. Dept. of Applied Mathematics
To derive a theory for gradient descent methods, it is important to have some understanding of the energy landscape. In this lecture, an overview of existing results is given. The second part of the lecture is devoted to future challenges in the field. We describe important steps needed for the future development of the statistical theory of deep networks.

The Debiased Lasso (Georgia Institute of Technology, 2018-09-06)
van de Geer, Sara ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; Georgia Institute of Technology. School of Mathematics ; Eidgenössische Technische Hochschule Zürich (ETH Zürich)
There will be three lectures, which in principle will be independent units. Their common theme is exploiting sparsity in high-dimensional statistics. Sparsity means that the statistical model is allowed to have quite a few parameters, but that it is believed that most of these parameters are actually not relevant. We let the data themselves decide which parameters to keep by applying a regularization method. The aim is then to derive so-called sparsity oracle inequalities. In the first lecture, we consider a statistical procedure called M-estimation. "M" stands here for "minimum": one tries to minimize a risk function in order to obtain the best fit to the data. Least squares is a prominent example. Regularization is done by adding a sparsity-inducing penalty that discourages too good a fit to the data. An example is the l₁-penalty, which together with least squares gives rise to an estimation procedure called the Lasso. We address the question: why does the l₁-penalty lead to sparsity oracle inequalities, and how does this generalize to other norms? We will see in the first lecture that one needs conditions which relate the penalty to the risk function. They have, in a certain sense, to be “compatible”. We discuss these compatibility conditions in the second lecture in the context of the Lasso, where the l₁-penalty needs to be compatible with the least squares risk, i.e. with the l₂-norm. We give as example the total variation penalty. For D := {x1, …, xn} ⊂ R an increasing sequence, the total variation of a function f : D → R is the sum of the absolute values of its jump sizes.
We derive compatibility and, as a consequence, a sparsity oracle inequality which shows adaptation to the number of jumps. In the third lecture we use sparsity to establish confidence intervals for a parameter of interest. The idea is to use the penalized estimator as an initial estimator in a one-step Newton-Raphson procedure. Functionals of this new estimator can, under certain conditions, be shown to be asymptotically normally distributed. We show that in the high-dimensional case, one may further profit from sparsity conditions if the inverse Hessian of the problem is not sparse.
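The l₁-penalized least squares estimator (the Lasso) described in the first lecture can be sketched with cyclic coordinate descent and soft-thresholding. This is an illustrative implementation, not code from the lectures; the data, dimensions, and penalty level λ are invented for the demo:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1-norm: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Lasso via cyclic coordinate descent:
    minimize ||y - X b||^2 / (2n) + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # ||x_j||^2 / n for each column
    r = y.copy()                        # residual y - X b (b starts at 0)
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]                 # remove j-th contribution
            rho = X[:, j] @ r / n               # partial correlation
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]                 # put updated contribution back
    return b

# High-dimensional toy data: p > n, only s of the p parameters are relevant
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0                          # sparse truth
y = X @ beta + 0.5 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=0.2)
# The l1-penalty lets the data decide which parameters to keep:
# the s true coefficients stay large, the irrelevant ones are (near) zero.
```

The soft-thresholding step is exactly where the sparsity comes from: any coordinate whose partial correlation with the residual falls below λ is set to zero.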

Lecture 3: Mathematics for Deep Neural Networks: Advantages of Additional Layers (2019-03-13)
Schmidt-Hieber, Johannes ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; University of Twente. Dept. of Applied Mathematics
Why are deep networks better than shallow networks? We provide a survey of the existing ideas in the literature. In particular, we discuss localization of deep networks, functions that can be easily approximated by deep networks, and finally discuss the Kolmogorov-Arnold representation theorem.

Lecture 4: Mathematics for Deep Neural Networks: Statistical theory for deep ReLU networks (2019-03-15)
Schmidt-Hieber, Johannes ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; University of Twente. Dept. of Applied Mathematics
We outline the theory underlying the recent bounds on the estimation risk of deep ReLU networks. In the lecture, we discuss specific properties of the ReLU activation function that relate to skip connections and efficient approximation of polynomials. Based on this, we show how risk bounds can be obtained for sparsely connected networks.
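The efficient approximation of polynomials by deep ReLU networks mentioned in the abstract can be illustrated with the well-known sawtooth construction for x² on [0, 1]: compose a small "hat" network with itself and subtract rescaled copies. This is a sketch of that standard technique, not necessarily the exact construction used in the lecture; the depth m is an illustrative choice:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    """Sawtooth g: [0,1] -> [0,1] built from three ReLU units;
    g(x) = 2x on [0, 1/2] and 2(1 - x) on [1/2, 1]."""
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def relu_square(x, m=5):
    """Deep-ReLU approximation of x**2 on [0,1]:
    x**2 ~ x - sum_{s=1..m} g^(s)(x) / 4**s, where g^(s) is g composed s times.
    The error decays geometrically in the depth m."""
    approx = x.copy()
    g = x.copy()
    for s in range(1, m + 1):
        g = hat(g)                 # one more composed layer
        approx = approx - g / 4.0 ** s
    return approx

xs = np.linspace(0.0, 1.0, 1001)
err = np.max(np.abs(relu_square(xs) - xs ** 2))
# err is on the order of 4**(-m): a few extra layers buy exponential accuracy
```

The point of the example is the depth/accuracy trade-off: each additional composed layer costs only three ReLU units but shrinks the approximation error by a constant factor, which is the mechanism behind efficient polynomial approximation by deep networks.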

Lecture 1: Mathematics for Deep Neural Networks (2019-03-06)
Schmidt-Hieber, Johannes ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; University of Twente. Dept. of Applied Mathematics
There are many different types of neural networks that differ in complexity and the data types that can be processed. This lecture provides an overview and surveys the algorithms used to fit deep networks to data. We discuss different ideas that underlie the existing approaches for a mathematical theory of deep networks.

Lecture 2: Mathematics for Deep Neural Networks: Theory for shallow networks (2019-03-08)
Schmidt-Hieber, Johannes ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; University of Twente. Dept. of Applied Mathematics
We start with the universal approximation theorem and discuss several proof strategies that provide some insights into functions that can be easily approximated by shallow networks. Based on this, a survey on approximation rates for shallow networks is given. It is shown how this leads to estimation rates. In the lecture, we also discuss methods that fit shallow networks to data.

Lecture 4: Spectral Methods Meets Asymmetry: Two Recent Stories (2019-09-04)
Chen, Yuxin ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; Princeton University. Dept. of Electrical Engineering
This talk is concerned with the interplay between asymmetry and spectral methods. Imagine that we have access to an asymmetrically perturbed low-rank data matrix. We attempt estimation of the low-rank matrix via eigendecomposition, an uncommon approach when dealing with non-symmetric matrices. We provide two recent stories to demonstrate the advantages and effectiveness of this approach. The first story is concerned with top-K ranking from pairwise comparisons, for which the spectral method enables unimprovable ranking accuracy. The second story is concerned with matrix denoising and spectral estimation, for which the eigendecomposition method significantly outperforms the (unadjusted) SVD-based approach and is fully adaptive to heteroscedasticity without the need for careful bias correction. The first part of this talk is based on joint work with Cong Ma, Kaizheng Wang, and Jianqing Fan; the second part is based on joint work with Chen Cheng and Jianqing Fan.
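A small simulation makes the second story concrete. For a symmetric rank-one signal corrupted by asymmetric i.i.d. noise, the leading eigenvalue of the observed (non-symmetric) matrix estimates the signal strength with far less bias than the leading singular value. This is a minimal illustrative sketch (dimensions, signal strength, and noise level are invented), not the exact experiments from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
lam_star = 4.0
M = lam_star * np.outer(u, u)                   # symmetric rank-1 signal
H = rng.standard_normal((n, n)) / np.sqrt(n)    # asymmetric i.i.d. noise
A = M + H                                       # observed perturbed matrix

# Eigendecomposition of the asymmetric matrix A (eigenvalues may be complex;
# the one tracking the signal is real, so take the largest real part)
eig_est = np.max(np.real(np.linalg.eigvals(A)))

# Conventional alternative: leading singular value of A
svd_est = np.linalg.svd(A, compute_uv=False)[0]

# The singular value carries a systematic upward bias from the noise,
# while the eigenvalue is nearly unbiased for lam_star.
```

The intuition is that the asymmetric noise has no preferred direction to align with, so its contribution largely cancels in the eigenvalue, whereas the singular value always absorbs the noise energy positively.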

Lecture 3: Projected Power Method: An Efficient Algorithm for Joint Discrete Assignment (2019-09-03)
Chen, Yuxin ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; Princeton University. Dept. of Electrical Engineering
Various applications involve assigning discrete label values to a collection of objects based on some pairwise noisy data. Due to the discrete (and hence nonconvex) structure of the problem, computing the optimal assignment (e.g. the maximum likelihood assignment) becomes intractable at first sight. This paper makes progress towards efficient computation by focusing on a concrete joint discrete alignment problem: that is, the problem of recovering n discrete variables given noisy observations of their modulo differences. We propose a low-complexity and model-free procedure, which operates in a lifted space by representing distinct label values in orthogonal directions, and which attempts to optimize quadratic functions over hypercubes. Starting with a first guess computed via a spectral method, the algorithm successively refines the iterates via projected power iterations. We prove that for a broad class of statistical models, the proposed projected power method makes no error (and hence converges to the maximum likelihood estimate) in a suitable regime. Numerical experiments have been carried out on both synthetic and real data to demonstrate the practicality of our algorithm. We expect this algorithmic framework to be effective for a broad range of discrete assignment problems. This is joint work with Emmanuel Candès.
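The two-stage recipe in the abstract (spectral initialization, then projected power iterations) can be sketched on its simplest instance, Z₂ synchronization: labels in {-1, +1} observed through noisy pairwise products. This is an illustrative special case of the joint alignment setting, not the general modulo-difference algorithm of the paper; problem sizes and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.choice([-1.0, 1.0], size=n)      # ground-truth discrete labels

# Noisy pairwise data: rank-1 signal x x^T plus symmetric Gaussian noise
Y = np.outer(x, x) + 2.0 * rng.standard_normal((n, n))
Y = (Y + Y.T) / 2

# Stage 1 -- spectral initialization: round the leading eigenvector of Y
_, vecs = np.linalg.eigh(Y)              # eigenvalues in ascending order
z = np.sign(vecs[:, -1])
z[z == 0] = 1.0

# Stage 2 -- projected power iterations: multiply by the data matrix,
# then project back onto the discrete hypercube vertices {-1, +1}^n
for _ in range(20):
    z = np.sign(Y @ z)
    z[z == 0] = 1.0

# The labels are only identifiable up to a global sign flip
agreement = abs(z @ x) / n               # 1.0 means exact recovery
```

Each iteration is one matrix-vector product plus an entrywise projection, which is what makes the method low-complexity; the spectral first guess is what lands the iterates in the basin where refinement succeeds.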

Compatibility and the Lasso (Georgia Institute of Technology, 2018-09-04)
van de Geer, Sara ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; Georgia Institute of Technology. School of Mathematics ; Eidgenössische Technische Hochschule Zürich (ETH Zürich)
There will be three lectures, which in principle will be independent units. Their common theme is exploiting sparsity in high-dimensional statistics. Sparsity means that the statistical model is allowed to have quite a few parameters, but that it is believed that most of these parameters are actually not relevant. We let the data themselves decide which parameters to keep by applying a regularization method. The aim is then to derive so-called sparsity oracle inequalities. In the first lecture, we consider a statistical procedure called M-estimation. "M" stands here for "minimum": one tries to minimize a risk function in order to obtain the best fit to the data. Least squares is a prominent example. Regularization is done by adding a sparsity-inducing penalty that discourages too good a fit to the data. An example is the l₁-penalty, which together with least squares gives rise to an estimation procedure called the Lasso. We address the question: why does the l₁-penalty lead to sparsity oracle inequalities, and how does this generalize to other norms? We will see in the first lecture that one needs conditions which relate the penalty to the risk function. They have, in a certain sense, to be “compatible”. We discuss these compatibility conditions in the second lecture in the context of the Lasso, where the l₁-penalty needs to be compatible with the least squares risk, i.e. with the l₂-norm. We give as example the total variation penalty. For D := {x1, …, xn} ⊂ R an increasing sequence, the total variation of a function f : D → R is the sum of the absolute values of its jump sizes.
We derive compatibility and, as a consequence, a sparsity oracle inequality which shows adaptation to the number of jumps. In the third lecture we use sparsity to establish confidence intervals for a parameter of interest. The idea is to use the penalized estimator as an initial estimator in a one-step Newton-Raphson procedure. Functionals of this new estimator can, under certain conditions, be shown to be asymptotically normally distributed. We show that in the high-dimensional case, one may further profit from sparsity conditions if the inverse Hessian of the problem is not sparse.
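The total variation of f on the grid D, as defined in the abstract, is just the sum of absolute jump sizes, which takes one line of NumPy (the sample values below are invented for illustration):

```python
import numpy as np

def total_variation(f_vals):
    """Total variation of f on an increasing grid D = {x1, ..., xn}:
    the sum of the absolute values of its jump sizes |f(x_{i+1}) - f(x_i)|."""
    return np.abs(np.diff(f_vals)).sum()

# A piecewise-constant function with two jumps, of sizes 2 and 3
f = np.array([1.0, 1.0, 3.0, 3.0, 0.0, 0.0])
total_variation(f)  # 2 + 3 = 5
```

Penalizing this quantity in a least squares fit favors piecewise-constant estimates with few jumps, which is exactly the adaptation to the number of jumps that the sparsity oracle inequality quantifies.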

Sharp Oracle Inequalities for Non-Convex Loss (Georgia Institute of Technology, 2018-08-31)
van de Geer, Sara ; Georgia Institute of Technology. Transdisciplinary Research Institute for Advancing Data Science ; Eidgenössische Technische Hochschule Zürich (ETH Zürich)
There will be three lectures, which in principle will be independent units. Their common theme is exploiting sparsity in high-dimensional statistics. Sparsity means that the statistical model is allowed to have quite a few parameters, but that it is believed that most of these parameters are actually not relevant. We let the data themselves decide which parameters to keep by applying a regularization method. The aim is then to derive so-called sparsity oracle inequalities. In the first lecture, we consider a statistical procedure called M-estimation. "M" stands here for "minimum": one tries to minimize a risk function in order to obtain the best fit to the data. Least squares is a prominent example. Regularization is done by adding a sparsity-inducing penalty that discourages too good a fit to the data. An example is the l₁-penalty, which together with least squares gives rise to an estimation procedure called the Lasso. We address the question: why does the l₁-penalty lead to sparsity oracle inequalities, and how does this generalize to other norms? We will see in the first lecture that one needs conditions which relate the penalty to the risk function. They have, in a certain sense, to be “compatible”. We discuss these compatibility conditions in the second lecture in the context of the Lasso, where the l₁-penalty needs to be compatible with the least squares risk, i.e. with the l₂-norm. We give as example the total variation penalty. For D := {x1, …, xn} ⊂ R an increasing sequence, the total variation of a function f : D → R is the sum of the absolute values of its jump sizes. We derive compatibility and, as a consequence, a sparsity oracle inequality which shows adaptation to the number of jumps.
In the third lecture we use sparsity to establish confidence intervals for a parameter of interest. The idea is to use the penalized estimator as an initial estimator in a one-step Newton-Raphson procedure. Functionals of this new estimator can, under certain conditions, be shown to be asymptotically normally distributed. We show that in the high-dimensional case, one may further profit from sparsity conditions if the inverse Hessian of the problem is not sparse.
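The one-step Newton-Raphson correction of the third lecture can be illustrated in the simplest possible setting, an orthogonal design with X'X/n = I, where the Lasso reduces to soft-thresholding and the correction exactly undoes the shrinkage bias. This is a toy sketch (dimensions, λ, and data are invented), not the general high-dimensional construction with an estimated inverse Hessian:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 20                        # low-dimensional orthogonal toy case
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
X *= np.sqrt(n)                       # scale columns so that X.T @ X / n = I
beta = np.zeros(p)
beta[0] = 1.0                         # one active parameter of interest
y = X @ beta + rng.standard_normal(n)

lam = 0.3
z = X.T @ y / n                       # per-coordinate least squares estimate

# Initial estimator: the Lasso, which here is exact soft-thresholding of z
b_lasso = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# One-step Newton-Raphson correction (the Hessian is the identity here):
# add back the gradient of the least squares risk at the initial estimator
b_debiased = b_lasso + X.T @ (y - X @ b_lasso) / n
# In this orthogonal case b_debiased equals z: the shrinkage bias of size
# lam on the active coordinate is removed, and b_debiased[0] is an
# asymptotically normal estimate of beta[0].
```

The Lasso estimate of the active coordinate is biased toward zero by roughly λ; the debiased estimate recovers an (approximately) unbiased, normally distributed quantity, which is what makes confidence intervals possible.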