Data-driven Discovery of Differential Equations, Binary Classifiers and Spectrum of Dynamical Systems
Author(s)
Cheng, Jiahui
Advisor(s)
Liao, Wenjing
Abstract
Data-driven methods have been widely applied in many fields, such as healthcare, finance, marketing, and manufacturing. This dissertation focuses on scientific computation in data-driven discovery: high-dimensional binary classification in machine learning, identification of differential equations from noisy observations, and spectrum estimation of dynamical systems from partial trajectory data.
The first part of the dissertation studies high-dimensional binary classification, particularly under label shift, where the class priors differ between the training and test distributions. Classical theoretical analysis of linear classifiers, such as Fisher's Linear Discriminant classifier, often considers the underparameterized regime, where the sample size is much larger than the data dimension. However, modern applications (e.g., neural networks) often operate in the overparameterized setting, for which theoretical results are very limited. To bridge this gap, we establish a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification. In particular, we study its asymptotic behavior under label shift and prove that a phase transition occurs: in a certain overparameterized regime, the classifier trained on imbalanced data outperforms its counterpart trained on reduced balanced data. Moreover, we investigate the impact of regularization under label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
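For readers unfamiliar with the classifier, the following is a minimal sketch of a ridge-regularized Fisher Linear Discriminant in NumPy. It is not taken from the dissertation: the function names, the ridge parameter `lam`, and the intercept built from the training class priors are illustrative assumptions. The ridge term keeps the pooled covariance invertible when the dimension exceeds the sample size.

```python
import numpy as np

def fit_regularized_fld(X, y, lam=1.0):
    """Fit a ridge-regularized Fisher discriminant (illustrative sketch).

    X: (n, p) data matrix, y: labels in {0, 1}, lam: assumed ridge strength.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class sample covariance; singular when p > n - 2.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / max(len(X) - 2, 1)
    p = X.shape[1]
    # Discriminant direction w = (Sigma + lam * I)^{-1} (mu1 - mu0).
    w = np.linalg.solve(sigma + lam * np.eye(p), mu1 - mu0)
    # Intercept: midpoint rule plus the log-odds of the TRAINING priors
    # (an assumption here; under label shift the test priors differ).
    pi1 = len(X1) / len(X)
    b = -0.5 * w @ (mu0 + mu1) + np.log(pi1 / (1.0 - pi1))
    return w, b

def predict_fld(X, w, b):
    """Assign class 1 where the linear score is positive."""
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)
```

Increasing `lam` shrinks the influence of the estimated covariance, which is the knob whose effect on label shift the abstract describes.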
The second part focuses on the identification of differential equations from data, an area of data-driven modeling where significant progress has been made with the weak (integral) formulation. We explore weighting the test functions to better identify differential equations from a single set of noisy observations. Formulating differential equations in weak form with test functions has various advantages, so we ask whether an optimal test function can be found adaptively from the observed data. Although this is a difficult task, we propose a new method in this direction that weights a collection of localized test functions. We find that using highly dynamic regions is effective for identifying both the equation and its coefficients; we therefore propose a dynamics indicator for each differential term and weight the weak-form equation accordingly. For greater stability against noise, we further consider occurrence voting for equation identification. Systematic numerical experiments demonstrate the robustness of our method under high levels of noise.
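To illustrate the weak-form idea in its simplest setting, the sketch below fits the coefficients of a scalar ODE u' = c_1 f_1(u) + ... + c_m f_m(u) by integrating against localized test functions and moving the derivative onto them via integration by parts, so the noisy data are never differentiated. It uses unweighted least squares; the dynamics indicators, adaptive weights, and occurrence voting described above are not reproduced. The polynomial bump, the function names, and the uniform time grid are assumptions made for this example.

```python
import numpy as np

def bump(t, center, half_width):
    """Localized test function phi = (1 - s^2)^2 on |s| < 1, and its derivative."""
    s = (t - center) / half_width
    inside = np.abs(s) < 1.0
    phi = np.where(inside, (1.0 - s**2) ** 2, 0.0)
    dphi = np.where(inside, -4.0 * s * (1.0 - s**2) / half_width, 0.0)
    return phi, dphi

def weak_form_fit(t, u, library, centers, half_width):
    """Least-squares coefficients of u' = sum_j c_j * library[j](u) in weak form.

    Assumes a uniform grid t and test-function supports contained in the
    sampled interval, so that boundary terms vanish.
    """
    dt = t[1] - t[0]
    features = [f(u) for f in library]
    A = np.empty((len(centers), len(library)))
    b = np.empty(len(centers))
    for k, c in enumerate(centers):
        phi, dphi = bump(t, c, half_width)
        # Integration by parts: int phi * u' dt = -int phi' * u dt.
        b[k] = -np.sum(dphi * u) * dt
        A[k] = [np.sum(phi * f) * dt for f in features]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

# Example: noisy samples of the logistic equation u' = u - u^2.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 1001)
u_noisy = 1.0 / (1.0 + 9.0 * np.exp(-t)) + 0.01 * rng.normal(size=t.size)
library = [lambda u: u, lambda u: u**2]
coef = weak_form_fit(t, u_noisy, library, np.linspace(1.0, 9.0, 40), 1.0)
```

Each row of the linear system corresponds to one test function, which is where a per-row weight would enter in a weighted variant.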
Lastly, the third part of this dissertation addresses the challenge of estimating the spectrum of affine dynamical systems from partial observations. Accurate spectrum estimation is crucial for understanding the stability and long-term behavior of dynamical systems, particularly when complete data access is limited or costly. Traditional spectral analysis methods often assume full-state observations or multiple trajectories, which are impractical in many real-world scenarios. To overcome these limitations, we establish a spectral estimation algorithm from partial data, with theoretical guarantees, and propose various reconstruction algorithms generalizing the classical Prony method, ESPRIT, and the matrix pencil method. Extensive numerical studies demonstrate the effectiveness of the proposed algorithms.
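As background, the classical Prony method mentioned above can be sketched as follows. Given samples x_k = c_1 z_1^k + ... + c_n z_n^k, it fits a linear recurrence to the samples and recovers the nodes z_j as the roots of the recurrence's characteristic polynomial. This is the textbook method only, not the generalized algorithms of the dissertation; the function name and the noise-free example are assumptions for illustration.

```python
import numpy as np

def prony_nodes(samples, order):
    """Recover the nodes z_j from samples x_k = sum_j c_j * z_j**k.

    Requires at least 2 * order samples.
    """
    x = np.asarray(samples, dtype=complex)
    m = len(x) - order
    # Hankel system: x[i+n] + a[n-1]*x[i+n-1] + ... + a[0]*x[i] = 0.
    H = np.array([x[i:i + order] for i in range(m)])
    a, *_ = np.linalg.lstsq(H, -x[order:], rcond=None)
    # Roots of z^n + a[n-1]*z^(n-1) + ... + a[0] are the nodes.
    return np.roots(np.concatenate(([1.0], a[::-1])))

# Example: three real nodes observed through a single scalar sequence.
true_nodes = np.array([0.9, 0.5, -0.3])
weights = np.array([1.0, 2.0, 3.0])
k = np.arange(10)
samples = (weights[None, :] * true_nodes[None, :] ** k[:, None]).sum(axis=1)
estimated = prony_nodes(samples, order=3)
```

In this noise-free example the estimated nodes match the true ones up to ordering and floating-point error. A scalar observation of a linear system takes this sum-of-exponentials form with the z_j given by eigenvalues, which is why Prony-type methods are natural for spectrum estimation.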
Date
2025-04-24
Resource Type
Text
Resource Subtype
Dissertation