Theses and Dissertations

Item

On Parameter Efficiency of Neural Language Models

(Georgia Institute of Technology, 2024-01-04) Liang, Chen

In recent years, pre-trained neural language models have achieved remarkable capabilities across various natural language understanding and generation tasks. However, the trend of scaling these models to encompass billions of parameters, while enhancing adaptability and emergent capabilities, has brought forth significant deployment challenges due to their massive size. These challenges include constraints in model storage and inference latency for real-world deployment, intensive time and computational costs for task adaptation, and the presence of substantial redundant parameters that affect task adaptability. Motivated by these challenges, this thesis aims to improve the parameter efficiency of these models, seeking to minimize storage requirements, accelerate inference and adaptation, and enhance generalizability.

Improving Parameter Utilization in Neural Language Models. While recent studies have identified significant redundancy in pre-trained neural language models, the impact of parameter redundancy on model generalizability remains largely underexplored. We first examine the relationship between parameter redundancy and model generalizability. Observing that removing redundant parameters improves generalizability, we propose an adaptive optimization algorithm for fine-tuning to improve the utilization of the redundant parameters. Experimental results validate increased generalization across various downstream tasks.

Model Compression in Neural Language Models. We explore model compression methods, including weight pruning and knowledge distillation, to reduce model storage and accelerate inference. We first develop a reliable iterative pruning method that accounts for uncertainties in training dynamics. Then, we turn to knowledge distillation, addressing the large teacher-student "knowledge gap" that often hampers the student's performance. To tackle this, we offer two solutions for producing students for specific tasks by selectively distilling task-relevant knowledge. In scenarios demanding student adaptability across diverse tasks, we propose to reduce the knowledge gap by combining iterative pruning with distillation. Our approaches significantly surpass conventional distillation methods at similar compression ratios.

Efficient Task Adaptation in Neural Language Models. While fine-tuning is an essential adaptation method for attaining satisfactory performance on downstream tasks, it is both computation-intensive and time-consuming. To speed up task adaptation, we study the hypernetwork approach, which employs an auxiliary hypernetwork to swiftly generate task-specific weights based on few-shot demonstration examples. We improve the weight generation scheme by exploiting the intrinsic weight structure as an inductive bias, enhancing sample efficiency for hypernetwork training. The method shows superior generalization performance on unseen tasks compared to existing hypernetwork methods.

Item

Evaluation of Convolutional Neural Networks for Modeling Blast Propagation in Multi-room Bunkers

(Georgia Institute of Technology, 2023-12-15) Luo, Felix

The rapid evaluation of blasts in enclosed, geometrically complex spaces has long been an obstacle to the design of safer blast-resistant structures. Traditional methods of determining blast responses in such spaces often rely on computational fluid dynamics (CFD) solvers to compute the entire flow field of the structure. This approach carries an enormous computational burden, especially because blasts are highly transient and the transient pressure fluctuations must be determined to formulate an accurate blast response prediction. More efficient methods of blast evaluation are therefore desired, so that parametric sweeps or optimization processes can be performed at low cost as a tool for iterative design. To fill this gap in capabilities, a convolutional neural network (CNN) based model was developed to provide rapid blast predictions for 2D structures and aid in the design of more blast-resistant structures. This approach leverages the inherent spatial awareness of CNNs to predict peak pressures, since blasts in enclosed spaces are highly dependent on the spatial relationships between blast locations and wall locations. It provides a nearly 5,000-fold speedup over the CFD simulations used in this study, with good convergence of errors, correlation coefficients, predicted and truth values, and distributions in all situational evaluations. These computational advantages come, in part, from using the CNN-based model to directly predict peak pressures, whereas traditional CFD solvers require iterations to propagate fluid flows over time. However, some limitations exist with respect to higher errors, model training costs, and the capability to predict 3D structures. Nonetheless, the results provide a characterization of the capabilities of CNN-based models in predicting peak pressures from blasts in enclosed spaces. 
From these evaluations and studies, a model which can provide significant computational savings while maintaining a similar accuracy can be obtained, which enables the rapid iterative design of more blast resistant structures.

Item

Explicit Group Sparse Projection for Machine Learning

(Georgia Institute of Technology, 2023-12-14) Ohib, Riyasat

The concept of sparse solutions in classical machine learning is noted for its efficiency and has parallels in the natural world, such as in the mammalian visual cortex. This biological basis provides an inspiration for the importance of sparsity in computational models. Sparsity is increasingly relevant in machine learning, especially in non-negative matrix factorization (NMF), where it aids in interpretability and efficiency. NMF involves breaking down a non-negative matrix into simpler components, with sparsity ensuring these components distinctly represent data features, simplifying interpretation. In deep learning, sparse model parameters lead to more efficient computation, quicker training and inference, and in some cases, more robust models. As models grow in size, the role of inducing sparsity becomes even more crucial. In this thesis, we design a new sparse projection method for a set of vectors that guarantees a desired average sparsity level, measured using the popular Hoyer measure. Existing approaches either project each vector individually or require the use of a regularization parameter which implicitly maps to the average $\ell_0$-measure of sparsity. Instead, in our approach we set the Hoyer sparsity level for the whole set explicitly and simultaneously project a group of vectors with the Hoyer sparsity level of each vector tuned automatically. Hence, we call this the Group Sparse Projection (GSP). We show that the computational complexity of our projection operator is linear in the size of the problem. GSP can be used in particular to sparsify the columns of a matrix, which we use to compute sparse low-rank matrix approximations (namely, sparse NMF). We showcase the efficacy of our approach in both supervised and unsupervised learning tasks on image datasets including MNIST and CIFAR10. In non-negative matrix factorization, our approach yields competitive reconstruction errors against state-of-the-art algorithms. 
In neural network pruning, the sparse models produced by our method have competitive accuracy at corresponding sparsity values compared to existing methods.
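
For readers unfamiliar with the Hoyer measure mentioned in this abstract, the following is a minimal Python sketch of how it is usually defined for a nonzero vector x with n > 1 entries: (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1). The function names and the use of a plain mean over the group are illustrative assumptions. This is not the thesis's GSP projection operator, which is not reproduced here.

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer sparsity of a nonzero vector with more than one entry.

    Returns 0 when all entries have equal magnitude and 1 when exactly
    one entry is nonzero.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.abs(x).sum()          # l1 norm
    l2 = np.sqrt((x ** 2).sum())  # l2 norm
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

def average_hoyer_sparsity(vectors):
    """Plain mean of the Hoyer sparsity over a group of vectors (assumed aggregation)."""
    return float(np.mean([hoyer_sparsity(v) for v in vectors]))
```

For example, `hoyer_sparsity([1, 1, 1, 1])` evaluates to 0 and `hoyer_sparsity([0, 0, 3, 0])` to 1, so a group containing just those two vectors has an average sparsity of 0.5 under this sketch.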

Item

Chordate-specific gene regulatory network of neuron development in Ciona.

(Georgia Institute of Technology, 2023-12-12) Kim, Kwantae

In this research, I investigated the complex gene regulatory networks underlying neurogenesis, taking advantage of the unique neurons of the Ciona model system. I revealed that Fgf signaling is crucial for the neurogenesis of Bipolar Tail Neurons (BTNs) by controlling the expression of Neurogenin, the fate-determining transcription factor in these neurons. I then characterized multiple effector genes functioning in the development of BTNs. Additionally, I determined the vital role of the Pax3/7 transcription factor in the neural plate border in inducing neural tube closure. Finally, I demonstrated how Pax3/7 also orchestrates an intricate gene regulatory network upstream of multiple transcription factors and functional effectors during the neurogenesis of Descending Decussating Neurons (ddNs). I found that the majority of this network’s regulatory branches are shared with other neurons in Ciona or even other organisms including vertebrates. Moreover, I revealed the role of key putative effector genes during the neurogenesis of ddNs. These findings will provide profound insights into developmental mechanisms in the central nervous system of chordates.

Item

A Sliding-Window Matrix Pencil Method for Aeroelastic Design Optimization with Limit-Cycle Oscillation Constraints

(Georgia Institute of Technology, 2023-12-13) Golla, Tarun

This thesis presents a new approach to constraining limit-cycle oscillations (LCOs) in aeroelastic design optimization. LCOs are self-excited oscillations that can develop in nonlinear aeroelastic systems experiencing flutter, and they must be avoided during operation to maintain safety and performance. One approach to addressing this problem is to design the system using an optimization process that includes an LCO constraint. Previous efforts have proposed various LCO constraints for aeroelastic design optimization but have not addressed realistic design applications. This gap persists because existing LCO constraints are not oriented toward scalable gradient-based optimization algorithms. The proposed approach builds on a recent LCO constraint that bounds the recovery rate to equilibrium and is suited to gradient-based optimization. The contribution of this thesis is a new matrix pencil method for accurately evaluating the recovery rate within the LCO constraint using output data from transient responses. The amplitude-varying behavior of the recovery rate in the presence of dynamic nonlinearities is captured using a sliding time window along the transient response for a chosen quantity of interest. This new approach differs from the conventional matrix pencil method, which considers an entire transient response at once under linearized assumptions. Sensitivity studies are conducted to identify the optimal singular-value decomposition tolerance, sliding window size, stride size, output data sampling step, and aggregation parameters for obtaining accurate results. The new sliding-window matrix pencil method is then used to optimize a typical aeroelastic section model with a subcritical LCO behavior, enforcing no flutter or LCOs at the chosen operating conditions. Optimization results are compared with previous work that used the same LCO constraint formulation combined with an approximate, conservative method to evaluate the recovery rate. 
The LCO constraint evaluated using the new sliding-window matrix pencil method allows the optimizer to completely suppress subcritical LCOs within the specified operating conditions while minimizing design changes, achieving a less conservative optimized solution. This work is a step toward constraining LCOs in large-scale aeroelastic design optimization to enable higher-performance designs while avoiding undesirable dynamics, such as subcritical LCOs. Future work includes formulating adjoint derivatives of the LCO constraint and demonstrating the methodology for aeroelastic models of increasing physical and computational complexity.
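
As a rough illustration of the idea described in this abstract, the Python sketch below applies a textbook matrix pencil estimate to successive windows of a sampled transient response. It is a simplified sketch under stated assumptions, not the thesis's method: the function names, the fixed model order used in place of a singular-value tolerance, the default pencil length of one third of the window, and the choice to report the largest real part per window are all hypothetical. How these per-window rates relate to the recovery rate and aggregation used in the thesis is not specified here.

```python
import numpy as np

def matrix_pencil(y, dt, order, pencil=None):
    """Estimate exponents s_m in y[k] ~ sum_m a_m * exp(s_m * k * dt).

    `order` is the assumed number of exponential terms; `pencil` is the
    pencil length L (defaults to one third of the samples).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    L = pencil if pencil is not None else n // 3
    # Hankel data matrix: row i holds samples y[i], ..., y[i + L].
    Y = np.array([y[i:i + L + 1] for i in range(n - L)])
    # Keep the dominant right singular vectors (signal subspace).
    _, _, vh = np.linalg.svd(Y, full_matrices=False)
    vh = vh[:order]
    a1, a2 = vh[:, :-1], vh[:, 1:]
    # Eigenvalues of the shift relation between a1 and a2 are the
    # discrete-time poles z_m = exp(s_m * dt).
    z = np.linalg.eigvals(a2 @ np.linalg.pinv(a1)).astype(complex)
    return np.log(z) / dt

def sliding_window_rates(y, dt, order, window, stride):
    """Largest real part of the estimated exponents in each window."""
    rates = []
    for start in range(0, len(y) - window + 1, stride):
        s = matrix_pencil(y[start:start + window], dt, order)
        rates.append(s.real.max())
    return np.array(rates)
```

For a linear, exponentially decaying sinusoid the per-window rates are all equal to the decay rate; in a nonlinear response they would be expected to vary with amplitude, which is the behavior the sliding window is meant to expose.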

Item

Structural characterization and understanding growth kinetics of modern III-nitride epitaxial methods

(Georgia Institute of Technology, 2023-12-12) Motoki, Keisuke

In recent years, III-nitride materials including GaN, AlN, InN, ScN, and their ternary alloys have drawn attention for use in various power electronics and optoelectronics technologies. These include power-switching devices, radio frequency devices, light-emitting diodes, laser diodes, and solar cells. Investigations of some of the binary and ternary III-nitrides are still in their infancy, lacking an understanding of the mechanisms dictating their physical properties. A better understanding of the material properties is needed to achieve better material quality. The structural properties of a material are among the most important factors determining its electrical and optical performance, and understanding the defect mechanisms is crucial for the advancement of novel devices. This dissertation presents investigations of the structural properties, coupled with electrical and optical performance, of GaN, AlInN, AlGaN, and ScAlN grown via Metal Modulated Epitaxy (MME), a modern growth technique in the Molecular Beam Epitaxy (MBE) system.

Item

Modeling and Simulation of Industrial Membrane Processes Using Complex Mixtures for Integration in Process Simulation Environments

(Georgia Institute of Technology, 2023-12-12) Weber, Dylan Jacob

The goal of this work is to enable the design, optimization, and control of membrane-based separation processes that encounter complex industrial streams of up to thousands of components. These mixture components can have boundless concentrations and interactions between them. Presently, tools for such processes are non-existent. For chemical engineers, after synthesizing the chemical of interest, half of the job is separating it. Traditional separations rely on energy-intensive heat and specialty chemicals, which generate pollutants and contribute to climate change. Membrane-based separations alleviate these effects by using electrical energy, which can come from renewable resources. This thesis pursues this goal through the following objectives: (i) develop improved numerical methods for local membrane transport of complex mixtures, (ii) extend models for predicting complex mixture sorption and diffusion, (iii) develop a software package for membrane process simulation to use within process flowsheet simulation environments, and (iv) present preliminary process design and control strategies for transport of complex mixtures through ion-exchange membrane modules (to shift towards electrochemical membrane-based separations for nutrient recovery from ubiquitous waste streams). The numerical methods, models, and software package presented have been, and will continue to be, utilized by researchers and engineers to design, optimize, and control membrane-based processes as a green alternative for separations in the oil, bio-refinery, paper making, and water treatment industries.

Item

Fundamental Limits and Algorithms for Database and Graph Alignment

(Georgia Institute of Technology, 2023-12-12) Dai, Osman Emre

Data alignment refers to a class of problems where, given two sets of anonymized data pertaining to overlapping sets of users, the goal is to identify the correspondences between the two sets. If the data of a user is contained in both sets, the correlation between the two data points associated with the user might make it possible to determine that both belong to the same user and hence link the data points. Alignment problems are of practical interest in applications such as privacy and data fusion. Data alignment can be used to de-anonymize data; studying the feasibility of alignment therefore allows for a more reliable understanding of the limitations of anonymization schemes put in place to protect against privacy breaches. Additionally, data alignment can aid in finding the correspondence between data from different sources, e.g. different sensors. The data fusion performed through data alignment can in turn help with a variety of inference problems that arise in scientific and engineering applications. This thesis considers two types of data alignment problems: database and graph alignment. Database alignment refers to the setting where each data point (i.e. feature) in a data set is associated with a single user. Graph alignment refers to the setting where data points in each data set are associated with pairs of users. For both problems, we are particularly interested in the asymptotic case where n, the number of users with data in both sets, goes to infinity. Nevertheless, our analyses often yield results applicable to the finite n case. 
To develop a preliminary understanding of the database alignment problem, we first study the closely related problem of planted matching with Gaussian weights of unit variance, and derive tight achievability bounds that match our converse bounds. Specifically, we identify different inequalities between log n and the signal strength (which corresponds to the square of the difference between the mean weights of planted and non-planted edges) that guarantee upper bounds on the log of the expected number of errors. Then, we study the database alignment problem with Gaussian features in the low per-feature correlation setting where the number of dimensions of each feature scales as ω(log n). We derive inequalities between log n and signal strength (which, for database alignment, corresponds to the mutual information between correlated features) that guarantee error bounds matching those of the planted matching setting, supporting the claimed connection between the two problems. Then, relaxing the restriction on the number of dimensions of features, we derive conditions on signal strength and dimensionality that guarantee smaller upper bounds on the log of the expected number of errors. The stronger results in the O(log n)-dimensional-feature setting for Gaussian databases show that planted matching, while useful, is not a perfect substitute for understanding the dynamics of the more complex problem of database alignment. For graph alignment, we focus on the correlated Erdős–Rényi graph model, where the data point (i.e. edge) associated with each pair of users in a graph is a Bernoulli random variable that is correlated with the data point associated with the same pair in the other graph. We study a canonical labeling algorithm for alignment and identify conditions on the density of the graphs and the correlation between edges across graphs that guarantee the recovery of the true alignment with high probability.
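
The planted matching setting described in this abstract can be made concrete with a small Python sketch. It assumes a complete bipartite weight matrix in which planted edges have mean `mu` and all other edges have mean 0, all with unit variance, so the signal strength in the abstract's sense is `mu ** 2`. With equal variances, the maximum-weight perfect matching is the maximum-likelihood estimate of the planted one. The function names are hypothetical, and the exhaustive search is only viable for very small n; it stands in for a proper linear assignment solver and is not an algorithm taken from the thesis.

```python
import itertools
import numpy as np

def sample_planted_weights(n, mu, rng):
    """Draw an n x n unit-variance Gaussian weight matrix.

    Edge (i, perm[i]) is planted and has mean mu; all other edges have mean 0.
    """
    perm = rng.permutation(n)
    w = rng.standard_normal((n, n))
    w[np.arange(n), perm] += mu
    return w, perm

def max_weight_matching(w):
    """Exhaustive maximum-weight perfect matching (small n only)."""
    n = w.shape[0]
    rows = np.arange(n)
    best = max(itertools.permutations(range(n)),
               key=lambda p: w[rows, list(p)].sum())
    return np.array(best)

def count_errors(estimate, truth):
    """Number of users matched to the wrong data point."""
    return int(np.sum(np.asarray(estimate) != np.asarray(truth)))
```

Sampling with a large `mu` relative to the noise and comparing `max_weight_matching(w)` against the planted permutation with `count_errors` gives an empirical feel for the error counts that the thesis bounds analytically.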

Item

Enhancement of Ankle Fusion through FK506 Induced Osteogenesis

(Georgia Institute of Technology, 2023-12-14) Huffman, Nicholas

Ankle arthrodesis is a common surgical procedure that typically involves the fusion of the tibia and talus of the patient. During surgery, the surgeon uses screws and plates to compress the bones together and eliminate plantarflexion and dorsiflexion motion [1]. However, one of the main complications of the surgery is non-union of the bones. This can be due to loosening of the screws or failure to grow new bone in the joint space. Our team hypothesized that introducing an additional orthobiologic into the system would assist in bone formation and reduce non-union rates. In this study, we evaluated the effectiveness of osteogenic drugs in improving bone fusion in ankle arthrodesis. One such molecule we evaluated is FK506 (Tacrolimus), an FDA-approved drug for treating organ transplant rejection. We implemented a cell culture model to test the osteogenic potential of FK506. Bovine Marrow Derived Cells (MDCs) were cultured for 1-2 weeks and evaluated with Alizarin Red S staining, ALP activity, and gene expression. Results were also tested with hMSCs. We found that FK506 significantly affects Alizarin Red S staining within our MDCs. Additionally, we identified that rhPDGF-bb could be a potential adjuvant to FK506 treatment, though future work will be needed to confirm the effects of rhPDGF-bb within an in vivo model. We also observed significant variation in the MDC results between donors, which we will look to investigate with flow cytometry in future experiments. Following those results, we tested our model within a rabbit ankle model to evaluate effectiveness.

Item

Surface passivation for enhanced stability and performance in perovskite solar cells

(Georgia Institute of Technology, 2023-12-13) Sharma, Sakshi

Lead halide perovskite solar cells (PSCs) have emerged as promising next-generation photovoltaics. Their unique ABX3 stoichiometry, where ‘A’ is a monovalent cation, ‘B’ is a divalent metal cation and ‘X’ is a halogen, provides tremendous potential for composition and bandgap engineering to obtain desired optoelectronic properties, enabling high power conversion efficiencies exceeding 25%. Despite their growing appeal, commercialization of PSC technology faces challenges due to device instabilities in ambient conditions. In particular, device interfaces between the active perovskite layer and adjacent charge transport layers are vulnerable to defects, which can accelerate perovskite degradation under environmental stressors such as heat, moisture, or oxygen, limiting their long-term viability. Interfaces also significantly impact charge transport, collection and recombination mechanisms in devices and thus require optimization. To address these challenges, research has concentrated on interface modification to passivate surface defects, protect the bulk of the perovskite from the external environment, and tune the charge transfer properties at the surface. Conjugated organic ammonium salts have been used at interfaces to introduce hydrophobicity on the perovskite film and promote charge delocalization brought on by conjugation. However, most surface treatment strategies relying on organic molecules introduce an electrically insulating spacer layer under thermal stress. Heat-induced diffusion of molecules can reconstruct the interface into lower-dimensional phases, which impedes charge extraction and affects the power conversion efficiency (PCE) of devices. This creates a tradeoff between the benefits of passivation and charge extraction. For proper interface design, it is essential to study the thermal behavior of these passivation layers and establish their relationship with the optoelectronic properties of solar cells. 
This work explores how the thermal behavior of passivation agents, specifically long-chain thiophene-functionalized π-conjugated molecules (2TI and 4TmI, with two and four thiophene rings, respectively), affects interfacial structural stability and charge extraction. Tailoring the steric hindrance of the bulky cations used to treat perovskite surfaces presents an opportunity to control cation mobility, and consequently any phase changes occurring at elevated temperatures. Structural studies reveal that the length of the cation backbone regulates the rate of interfacial perovskite structure reconstruction on prolonged heating. Consequently, faster phase conversion is observed in 2TI compared to the larger 4TmI, with the formation of an n=1 A’PbI4 two-dimensional phase, which consists of inorganic PbI6 octahedra monolayers separated by an organic spacer layer, A’ being either 2T or 4Tm. The oligothiophene tail in these molecules further contributes to spacer layer conductivity, prompting distinct charge extraction and recombination behaviors in 2TI versus 4TmI passivated devices, confirmed by synchrotron-based X-ray measurements. Results show that despite the observed phase changes, 2TI-treated devices can tune the surface potential to promote efficient hole extraction to the overlying hole transport layer and reduce carrier recombination. This interfacial steric engineering translates to high-performing passivated solar cells, with 2TI/CsFAPbI3 devices exhibiting efficiency exceeding 20%, an open-circuit voltage of 1.07 V and minimal changes under continuous thermal exposure. By identifying the nature and impact of heat-induced dynamical structural changes at passivated perovskite interfaces, this work highlights the key to surface functionalization so that solar cell performance can be maintained at high operating temperatures.
