On Parameter Efficiency of Neural Language Models

(Georgia Institute of Technology, 2024-01-04) Liang, Chen

In recent years, pre-trained neural language models have achieved remarkable capabilities across various natural language understanding and generation tasks. However, the trend of scaling these models to encompass billions of parameters, while enhancing adaptability and emergent capabilities, has brought forth significant deployment challenges due to their massive size. These challenges include constraints in model storage and inference latency for real-world deployment, intensive time and computational costs for task adaptation, and the presence of substantial redundant parameters that affect task adaptability. Motivated by these challenges, this thesis aims to improve the parameter efficiency of these models, seeking to minimize storage requirements, accelerate inference and adaptation, and enhance generalizability. Improving Parameter Utilization in Neural Language Models: While recent studies have identified significant redundancy in pre-trained neural language models, the impact of parameter redundancy on model generalizability remains largely underexplored. We first examine the relationship between parameter redundancy and model generalizability. Observing that removing redundant parameters improves generalizability, we propose an adaptive optimization algorithm for fine-tuning to improve the utilization of the redundant parameters. Experimental results validate increased generalization across various downstream tasks. Model Compression in Neural Language Models: We explore model compression methods, including weight pruning and knowledge distillation, to reduce model storage and accelerate inference. We first develop a reliable iterative pruning method that accounts for uncertainties in training dynamics. We then turn to knowledge distillation, addressing the large teacher-student "knowledge gap" that often hampers the student's performance. 
To tackle this, we offer two solutions for producing students for specific tasks by selectively distilling task-relevant knowledge. In scenarios demanding student adaptability across diverse tasks, we propose to reduce the knowledge gap by combining iterative pruning with distillation. Our approaches significantly surpass conventional distillation methods at similar compression ratios. Efficient Task Adaptation in Neural Language Models: While fine-tuning is an essential adaptation method for attaining satisfactory performance on downstream tasks, it is both computation-intensive and time-consuming. To speed up task adaptation, we study the hypernetwork approach, which employs an auxiliary hypernetwork to swiftly generate task-specific weights based on few-shot demonstration examples. We improve the weight generation scheme by exploiting the intrinsic weight structure as an inductive bias, enhancing sample efficiency for hypernetwork training. The method shows superior generalization performance on unseen tasks compared to existing hypernetwork methods.
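
For readers unfamiliar with the "conventional distillation" baseline mentioned in the abstract above, the following is a minimal sketch of the standard soft-target distillation loss, written with NumPy. It is a generic illustration and not the thesis's method: the function names (`log_softmax`, `distillation_loss`), the `temperature` parameter, and its default value are assumptions introduced here.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch of softened distributions.

    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures (an assumption of this sketch).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student), axis=-1)
    return float(temperature ** 2 * kl.mean())

# Hypothetical logits for a batch of two examples and three classes.
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
student = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, which is one way to make the "knowledge gap" between teacher and student concrete.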
Online Decision Making Under Information-Theoretic Constraints

(Georgia Institute of Technology, 2023-12-15) Chang, Meng-Che

We consider the problems of online decision-making under two categories of information-theoretic constraints, namely, security and communication constraints. The three types of security constraints considered in this thesis are secrecy, covertness, and robustness, where the objectives are to make the adversary have little knowledge about the unknown parameter, to make the decision-making process undetectable, and to make the decision-making algorithm robust to adversarial attacks, respectively. We formally formulate the decision-making problems under these three types of security-related constraints and analyze the performance of decision-making algorithms under these constraints. In the second half of this thesis, we analyze the tradeoff between the detection error exponent and the communication rate under the framework of joint communication and sensing. Both mono-static and bi-static joint communication and sensing models are considered in the thesis. Finally, an extension to the case when the transmission window varies with observations is also made to emphasize the benefit of adaptivity.
Data driven approaches to address inaccurate nosology in mental health from neuroimaging data

(Georgia Institute of Technology, 2023-12-14) Rokham, Hooman

This research addresses the pervasive challenge of label noise in machine learning, particularly in sensitive domains like medical applications and psychiatry. Label noise, stemming from sources such as inadequate information and human error, poses a significant threat to the accuracy of classification models, especially in medical imaging where mislabeled data can lead to harmful outcomes. In the realm of psychiatry, existing categorizations of psychosis face complexities due to unreliability and heterogeneity. The research aims to achieve two key objectives: first, to develop robust frameworks and algorithms for detecting and rectifying incorrect labels in datasets, focusing on semi-supervised auto-labeling approaches to enhance data homogeneity. Second, the research delves into biomarker discovery for mood and psychosis disorders, seeking to unveil latent patterns within neuroimaging data that could serve as vital biomarkers, revolutionizing diagnostic and classification methods for these complex mental health conditions.
Path-Based Differential Algorithm and Graph Theory-Based Analysis on Neuroimaging Data

(Georgia Institute of Technology, 2023-12-14) Falakshahi, Haleh

Graph theoretical methods have emerged as crucial tools for exploring the intricate networks within the human brain, spanning disciplines such as neuroscience, cognitive science, psychiatry, psychology, and the study of brain disorders and development. However, research in this realm has traditionally concentrated on assessing local and global graph metrics, inadvertently neglecting the rich information embedded within the intricate paths that interconnect distinct brain regions. This gap in knowledge motivated the development of an innovative algorithm aimed at identifying multi-step paths in patient groups by comparing them to control cohorts. Following path identification, a covariance decomposition approach is employed to delve into the connections shared between pairs of brain nodes, offering a deeper understanding of network dynamics. The application of this methodology is exemplified through the analysis of resting-state functional MRI data from individuals with schizophrenia, yielding valuable insights into the presence of disconnectors within and between specific functional domains, with a particular focus on the default mode and cognitive control networks. Additionally, an extensive longitudinal study investigates the processes associated with healthy aging, employing advanced neuroimaging techniques and cognitive assessments. This comprehensive approach spans from individuals in their mid-30s to centenarians, revealing dynamic changes in brain networks. These findings underscore the importance of considering both static and dynamic network characteristics and highlight specific graph metrics that hold relevance in elucidating the cognitive changes associated with the aging process. Furthermore, the proposed path analysis algorithm detects disrupted pathways, shedding light on potential path-based biomarkers. 
Altogether, these research endeavors expand our understanding of brain network dynamics in health and disease, with implications for both clinical applications and the broader study of brain function.
Overcoming Longstanding Synthesis Challenges Toward Realizing the Full Device Potential of III-Nitride Semiconductors

(Georgia Institute of Technology, 2023-12-13) Matthews, Chris

III-nitrides have the potential to address a large number of fundamental needs for semiconductor device applications, including full-spectrum/tandem-with-silicon solar cells (InGaN), RGB LEDs (InGaN), UV LEDs and lasers (AlGaN), and high-power diodes and transistors (AlGaN). However, many of these applications remain unrealized due to challenges in growing high-quality material by traditional growth techniques like metalorganic chemical vapor deposition (MOCVD) and molecular beam epitaxy (MBE). Through control of growth kinetics, metal-modulated epitaxy (MME) has been shown to have success in growing III-nitrides, especially highly doped films and ternary alloys with compositions in the miscibility gaps unreachable by other techniques. This control of growth kinetics is expected to lead to the realization of devices that have driven interest in this material system but have thus far been unachievable. This dissertation focuses on progressing the understanding of material synthesis and properties at the extreme ends of the III-nitride material range, with a particular emphasis on InGaN, AlGaN, and AlN. The history of InGaN synthesis as it relates to phase separation is reviewed, and a revised definition of phase separation is proposed. The prior assumption of spinodal decomposition in III-nitrides is re-examined and found to be unlikely due to the density and packing of these materials. Phase separation is reconsidered as a function of surface processes, especially for epitaxy by physical vapor deposition methods such as MBE and MME. A set of surface processes (thermal decomposition, lateral cation separation, vertical cation segregation, and preferential incorporation) are proposed to contribute to phase separation in InGaN, AlInN, and even AlGaN. This revised definition of phase separation in III-nitrides is discussed as needing further examination experimentally, theoretically, or both. 
The range of growth conditions for MME is much larger than for MOCVD or MBE, and the throughput of all of these techniques is low, so it is necessary to develop a model that can quantitatively describe the growth kinetics under any growth condition in order to simplify the task of quantifying the revised definition of phase separation and eventually realizing devices of interest. Such a model is described, implemented, and evaluated against experimental data from phase-separated AlGaN. This model is simplified to only include vertical cation segregation and preferential incorporation due to the high thermal stability of AlGaN and high growth rates used in MME. Both mechanisms are found to be important in modeling phase separation in AlGaN that is similar to the experimental data. Previously, a major impediment to AlN semiconductor device progress was achieving high, bulk carrier concentrations through impurity doping. Low-temperature epitaxial methods are investigated and found to play a key role in enabling the doping of AlN and the eventual realization of AlN-based semiconductor devices. Both silicon and beryllium doping of AlN are hindered by temperature-dependent processes during epitaxy, such as lattice expansion, dopant desorption, and generation of compensating impurities. Using metal-modulated epitaxy to grow AlN at low temperatures, p- and n-type AlN films with carrier concentrations of 4.4 × 10¹⁸ cm⁻³ and 6 × 10¹⁸ cm⁻³ and resistivities of 0.045 Ω-cm and 0.02 Ω-cm, respectively, are achieved. Doping and defect states in doped aluminum nitride films are examined via cathodoluminescence (CL) spectroscopy. Energy levels within the band gap are observed and potential associated defects are proposed. Fermi-Dirac statistics are used to identify three effective donor states in Si-doped AlN and a single effective acceptor energy in Be-doped AlN. CL investigation reveals near-band-edge and defect luminescence for both n- and p-type AlN films. 
AlN is found to be a promising optoelectronic material, but requires significant further study on contaminant and defect mitigation before high-quality devices can be realized.
Magnetic Steering to Save Sight: Trabecular Meshwork Cell Therapy as a Treatment for Primary Open Angle Glaucoma

(Georgia Institute of Technology, 2023-12-12) Bahrani Fard, Mohammad Reza

Glaucoma, which affects almost 80 million people worldwide, is the main cause of irreversible blindness. The most common type, primary open angle glaucoma (POAG), causes gradual loss of vision by damaging retinal ganglion cells. The major risk factor for POAG is high intraocular pressure (IOP). Current clinical treatments for POAG aim to reduce IOP, but they often have low success rates. The trabecular meshwork (TM) is a key regulator of IOP and has been shown to undergo significant changes in POAG including a loss of cells. This motivates the regeneration or restoration of the TM as a potential treatment for POAG. While TM cell therapy has shown promise in reversal of POAG pathology, previously-developed cell delivery techniques have resulted in poor cell delivery efficiency which elevates the risk of tumorigenicity and immunogenicity and undermines therapeutic potential. In addition, a lack of comprehensive characterization of the treatment effects in an appropriate POAG model is a roadblock to clinical translation. We here tackled these shortcomings by: 1) using an optimized magnetic delivery method to significantly improve the specificity and efficiency of delivery of cells to the mouse TM, in turn reducing the risk of unwanted side-effects, and 2) employing this optimized method to test the therapeutic capabilities of two types of cells in a mutant myocilin mouse model of ocular hypertension, characterizing the morphological and functional benefits of the treatment. The central hypothesis of this work is that an optimized magnetically-driven TM cell therapy can lead to long-term clinically significant levels of IOP reduction while minimizing the risks associated with unwanted off-target cell-delivery. This work resulted in the development of a novel magnetic TM cell therapy technique which outperformed those used previously. 
Using this technique, both adipose-derived mesenchymal stem cells (hAMSC) and induced pluripotent stem cells differentiated towards a TM phenotype (iPSC-TM) proved effective in lowering IOP. The mesenchymal stem cells showed superior efficacy, stably lowering IOP by 27% for 9 months, accompanied by increased cellularity in the conventional outflow pathway. These findings bring magnetic TM cell therapy one step closer to clinical translation.
An observational and modeling study of energy, water, and carbon transport in eco-hydro-meteorological systems

(Georgia Institute of Technology, 2023-12-12) Zhu, Modi

Eco-hydro-meteorological systems play a critical role in regulating the Earth's energy, water, and carbon cycles. Understanding the physical mechanisms driving ecosystem functioning is essential for predicting and mitigating the impacts of global environmental change. The primary objective of this study is to understand the complex mechanisms and interactions that govern the transport of energy, carbon, and water in various eco-hydro-meteorological systems. However, these mechanisms differ considerably from one system to another. By combining observational data with modeling techniques, this study investigates the physical transport of energy, water, and carbon within diverse ecosystems (forest, permafrost, and lake), each with its distinct mechanisms, and develops a comprehensive understanding of how these ecosystems function and respond to environmental changes. In the observational phase, data are gathered using flux towers that measure the exchange of energy, water, and carbon between the Earth's surface and the atmosphere. Datasets from multiple flux towers across forest, permafrost, and lake ecosystems are scrutinized to discern patterns and drivers of eco-hydro-meteorological system processes. The observations reveal differences in how energy, water, and carbon are transported in different eco-hydro-meteorological systems and underscore the need for further study. In the modeling phase, traditional models of energy, water, and carbon transport in eco-hydro-meteorological systems are carefully reviewed. Non-gradient models have been widely applied to meteorological processes in recent decades. This study uses the Maximum Entropy Production (MEP) model and Half-order Derivative (HOD) methods, together with newly proposed inference models, to simulate eco-hydro-meteorological processes, yielding results consistent with field experiments. 
Overall, this study has significant implications for our understanding of how eco-hydro-meteorological systems function and how they respond to environmental changes. The knowledge gained from this research could inform the development of policies and strategies to promote environmental sustainability and protect these vital ecosystems for future generations.
Neural-network representations of chemical kinetics

(Georgia Institute of Technology, 2023-12-12) Sabenca Gusmao, Gabriel

High-fidelity microkinetic models (MKMs) have provided a framework for understanding and mathematically representing the elementary processes underpinning catalytic reactions in terms of differential equations. Computational chemistry methods, such as density functional theory, have enabled the estimation of the thermochemistry of elementary reactions on different catalytic materials from first principles. MKMs constructed around computational chemistry methods have proven useful in determining trends in catalytic activity across different materials. Nevertheless, there are still challenges to overcome: (i) the inclusion of lateral interactions and solvation effects in models leads to over-parameterization, making the mean-field approximation useless and approximating MKMs to kinetic Monte Carlo; (ii) uncertainties in the structure of the active site and the detailed mechanism; and (iii) non-standardization in the reported thermochemistry models. In this thesis, we introduce a general and unifying algebraic framework that uses singular value decomposition to assess the connectivity of complex reaction networks. Such a framework addresses the standardization issue by leveraging the use of thermochemical data from multiple sources, allowing them to be re-referenced or combined in extended reaction mechanisms. It also generalizes the construction of descriptor-based models by providing a means to quantify the explained variance by each descriptor. With this general algebraic representation of MKM thermochemistry, we set our focus on creating methods to bridge information from transient experiments to high-fidelity MKMs. Physics-informed neural networks (PINNs) have proven to be a suitable mathematical scaffold for solving inverse problems involving ordinary (ODE) and partial differential equations (PDE). In this work, we devise an application of the PINNs formulation designed to address inverse kinetics problems, which we call Kinetics-Informed Neural Networks (KINNs). 
It consists of soft-constrained multi-objective optimization problems that include a hyperparameter that controls the trade-off between adhering to physical laws and interpolating observed data. We further bridge the statistical formulation of the error probability density in inverse PINNs to frame it in terms of maximum-likelihood estimators (MLE), which allows explicit error propagation from interpolation to the physical model space through Taylor expansion, thereby eliminating the need for hyperparameter tuning. We explore its application to high-dimensional coupled ODEs constrained by differential algebraic equations that are common in transient chemical and biological kinetics. Furthermore, we show that singular-value decomposition (SVD) of the ODE coupling matrices (reaction stoichiometry matrices) provides reduced uncorrelated subspaces in which PINN solutions can be represented and over which residuals can be projected. Finally, SVD bases serve as preconditioners for the inversion of covariance matrices in this hyperparameter-free robust application of MLE to KINNs, in robust-KINNs (rKINNs). Our exploration extends to applying rKINNs to high-fidelity MKMs constructed from an amalgamation of literature-reported DFT energies relying on the developed unified algebraic thermochemistry framework. We embed domain knowledge into the computational singular perturbation (CSP) approach to satisfy MKM constitutional constraints and generate synthetic data by solving the stiff forward differential equations associated with laboratory-scale transient reactor models. In this endeavor, we probe the limits of realistic chemical dynamics recoverability, using the ab-initio predicted timescales of the reverse water-gas shift (RWGS) MKM as a case study.
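
The role of the SVD of a stoichiometry matrix, as described in the abstract above, can be illustrated on a toy example. The sketch below is not taken from the thesis: the three-species network A ⇌ B ⇌ C, the rate vector `r`, and all variable names are hypothetical. It only shows that the left singular vectors split species space into a reduced subspace in which the dynamics dc/dt = S r evolve and a complementary subspace of conserved quantities.

```python
import numpy as np

# Hypothetical toy network A <-> B <-> C (an assumption, not from the thesis).
# Rows: species A, B, C. Columns: net reactions A -> B and B -> C.
S = np.array([[-1.0, 0.0],
              [1.0, -1.0],
              [0.0, 1.0]])

# Full SVD so that U also contains the left null space of S.
U, sigma, Vt = np.linalg.svd(S, full_matrices=True)
rank = int(np.sum(sigma > 1e-12 * sigma.max()))

U_range = U[:, :rank]  # orthonormal basis for every reachable dc/dt
U_null = U[:, rank:]   # conserved combinations of species (left null space)

# Any net rate vector r gives a time derivative inside span(U_range).
r = np.array([0.3, -0.7])          # arbitrary illustrative rates
dcdt = S @ r
reduced = U_range.T @ dcdt         # coordinates in the reduced subspace
reconstructed = U_range @ reduced  # lossless back-projection

print("rank of S:", rank)
print("conserved direction (normalized to first entry):",
      U_null[:, 0] / U_null[0, 0])
```

For this network the conserved direction is proportional to (1, 1, 1), the total amount of A + B + C, so a model fitted in the two-dimensional reduced coordinates respects that conservation law by construction.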
Modeling and Simulation of Industrial Membrane Processes Using Complex Mixtures for Integration in Process Simulation Environments

(Georgia Institute of Technology, 2023-12-12) Weber, Dylan Jacob

The goal of this work is to enable design, optimization, and control of membrane-based separation processes that encounter complex industrial streams of up to thousands of components. These mixture components can have boundless concentrations and interactions between them. Presently, tools for such processes are non-existent. For chemical engineers, after synthesizing the chemical of interest, half of the job is separating it. Traditional separations rely on energy-intensive heat and specialty chemicals, which generate pollutants and contribute to climate change. Membrane-based separations alleviate these effects by using electrical energy, which can be based on renewable resources. This thesis achieves this goal by pursuing the following objectives: (i) develop improved numerical methods for local membrane transport of complex mixtures, (ii) extend models for predicting complex mixture sorption and diffusion, (iii) develop a software package for membrane process simulation to use within process flowsheet simulation environments, and (iv) present preliminary process design and control strategies for transport of complex mixtures through ion-exchange membrane modules (to shift towards electrochemical membrane-based separations for nutrient recovery from ubiquitous waste streams). The numerical methods, models, and software package presented have been, and will continue to be, used by researchers and engineers to design, optimize, and control membrane-based processes as a green alternative for separations in the oil, bio-refinery, papermaking, and water treatment industries.
Robotic System to Motivate Long Term Infant Kicking for Motor Development Progression

(Georgia Institute of Technology, 2023-12-12) Emeli, Victor

The spontaneous kicking patterns of an infant provide markers that may predict the progression of motor development. Consistent atypical kicking behaviors can forecast irregularities in future development. One main indicator of impaired motor development is the progressive advancement of spasticity in muscle groups. For at-risk infants, physical therapy that encourages kicking motions can help reduce the onset of spasticity, especially if initiated at an early age. Traditionally, physical therapy is conducted by health professionals in a clinical setting, which can be labor intensive and costly. A method that increases physical therapy opportunities by providing an in-home system that motivates kicking motions and operates without immediate clinical supervision would be beneficial. We introduce a system that utilizes 3D computer vision and a robotic infant mobile to detect spontaneous kicking patterns and activate mobile stimuli to encourage prolonged kicking activity. The visual classification of kick or non-kick activity is used to activate the mobile stimuli with the goal of encouraging continued kicking patterns. We also employ statistical techniques to calculate kick amplitude, kick intensity, and kick deviation. These parameters provide insight into kick features and provide measurements to validate the influence of the mobile stimuli on kicking behavior. Additionally, we develop algorithms to identify mobile stimuli preferences that are unique to each infant for encouraging prolonged kicking activity. Finally, we investigate methods for reducing the complexity of the system by employing 2D data estimation for real-world use cases.