Organizational Unit:
School of Chemistry and Biochemistry


Publication Search Results

Now showing 1 - 2 of 2
  • Item
    Integrating Machine Learning Solutions into Untargeted Metabolomics and Xenobiotics Workflows
    (Georgia Institute of Technology, 2024-05-01) Rainey, Markace Alan
    Untargeted metabolomics explores the entirety of small molecules within biological samples, providing insights into metabolic alterations associated with various conditions. Standard methodologies like NMR and LC-MS are pivotal in identifying molecular markers but often fall short in fully deciphering the metabolic landscape due to limitations in accurately annotating a vast number of metabolites. This gap in annotation hampers the diagnostic application and biological interpretation of metabolomic data. Ion mobility spectrometry (IMS) offers a solution by providing semi-orthogonal data that enhances metabolite annotation. IMS separates ions based on their collision cross-section (CCS), a property influenced by an ion's mass, shape, charge, and external factors like temperature and pressure. When integrated with mass spectrometry (MS), IMS aids in resolving ions of similar or identical mass-to-charge ratio (m/z), offering a refined approach to metabolite characterization. This thesis focuses on employing computational strategies within LC-IM-MS workflows to facilitate rapid metabolite characterization. Chapter 1 outlines the challenges in metabolomics, specifically the limitations of current LC-MS workflows and the concept of the "dark metabolome." This introductory chapter provides the theoretical framework to better understand ion mobility and the use of quantitative structure-activity relationships to predict molecular properties. The chapter also discusses xenobiotics (external compounds impacting health) and their characterization challenges. Chapter 2 introduces Collision Cross Section Predictor 2.0 (CCSP 2.0), a machine learning-based tool for predicting ion mobility-derived CCS values. CCSP 2.0, developed to improve the accuracy and ease of CCS prediction, is evaluated for its efficacy in enhancing annotation accuracy in LC-MS workflows.
It utilizes a support-vector regression model and incorporates a comprehensive library of molecular descriptors, demonstrating superior prediction accuracy and utility in reducing false positive annotations. Chapter 3 presents a workflow for automated detection of polyhalogenated xenobiotics in biological samples using LC-IM-MS. This approach combines CCS-to-m/z ratios, Kendrick mass defect analysis, and CCS prediction to filter isomeric candidates. A case study on the detection of per- and polyfluorinated alkyl substances in human serum exemplifies the workflow's effectiveness. Chapter 4 describes an analytical chemistry experiment for undergraduate students, focusing on laser-induced breakdown spectroscopy (LIBS) and its application in data science education. This chapter emphasizes enhancing students' programming literacy and analytical skills through hands-on experiments and analysis using Jupyter Notebooks. The experiment, adaptable to various curricula, showcases real-world applications of LIBS, including its use in space exploration. Chapter 5 summarizes key findings from the research, discussing the implications of integrating computational methods in metabolomics and the potential advancements in ion-mobility mass spectrometry. Future research directions are proposed to further explore and refine these methodologies. Appendix A explores an ongoing project aimed at predicting analyte concentrations without standard calibration curves using machine learning. This approach predicts relative ionization efficiencies of lipids from their structural properties, demonstrating the potential of machine learning in streamlining quantitative analyses in metabolomics. In conclusion, this thesis underscores the importance of computational approaches in enhancing metabolite annotation and characterizing xenobiotics, contributing valuable tools and methodologies to the field of metabolomics.
  • Item
    Metabolomics and Machine Learning for Early-Stage Cancer Diagnosis
    (Georgia Institute of Technology, 2023-04-25) Sah, Samyukta
    Amongst all omics sciences, metabolomics is the most recent and is rapidly advancing as one of the predominant methodologies for early disease diagnosis and precision medicine. Metabolomics involves the high-throughput analysis of low molecular weight metabolites and their interactions with biological networks. Metabolomics studies involve measuring the abundance of hundreds to thousands of metabolites in biological fluids, tissues, and cells, and provide an instantaneous snapshot of the status of the biological system. For cancer research, metabolomics is a powerful platform as it enables the identification of metabolic alterations useful in diagnosis, prognosis, and therapeutics. However, because of the complexity of the metabolome, no single analytical platform can capture the complete metabolic profile of a biological system. Thus, the use of complementary techniques allows for a more comprehensive analysis. Mass spectrometry (MS) is one of the commonly used techniques in metabolomics. Due to its high sensitivity and high resolution, MS-based metabolomics studies can provide a wide breadth of knowledge of cancer metabolism. MS is typically coupled with separation techniques such as liquid chromatography (LC) or capillary electrophoresis to reduce the spectral complexity. MS-based metabolomics studies generate a large amount of data, and thus the use of machine learning (ML) methods is becoming increasingly popular to interpret and visualize metabolomics data and uncover new biological insights into disease biology. This thesis work focuses on MS-based metabolomics analysis for improved understanding of cancer metabolism. As diagnosis of ovarian cancer (OC) remains an unmet clinical challenge, the main focus of this thesis is on uncovering the metabolic profile of OC and identifying potential biomarkers for its early diagnosis. In addition, ML methods were applied to identify the urinary metabolic profile of renal cell carcinoma (RCC).
Kidney cancer is one of the most lethal urinary cancers, out of which 90% are renal cell carcinomas (RCC). Diagnosis of RCC is typically performed with expensive imaging tests and biopsies, which not only are invasive but also are prone to sampling errors. Due to the proximity of the tumor to the urine, urine metabolomic profiling provides an excellent opportunity to study the metabolic rewiring of RCC. Chapter 1 introduces OC and the application of metabolomics to identify the metabolic reprogramming associated with OC pathogenesis. An overview of the analytical platforms and techniques in metabolomics is given, and the general workflow, including the use of ML in metabolomics data handling, is outlined. The commonly used statistical approaches and ML-assisted metabolomics analysis for cancer research are provided. Furthermore, an overview of various MS-based metabolomics studies of OC is given, describing the metabolic phenotype of OC. The diagnostic potential of the metabolite panels, as identified by the described studies, is also given. Finally, the biological implications of the potentially important metabolic alterations in OC are described. Chapter 2 presents a longitudinal serum metabolomics profiling of a triple-mutant (TKO) mouse model of high-grade serous carcinoma (HGSC), a subtype of OC. Two complementary ultrahigh performance liquid chromatography (UHPLC)-MS techniques were used to profile both the serum lipidome and the polar metabolome of TKO and TKO control mice. Sequentially collected serum samples from TKO mice starting from 8 weeks of age until death were analyzed, and a comprehensive metabolic map associated with HGSC onset, development, and progression was revealed. These UHPLC-MS experiments were complemented with spatial lipidomic profiling of the entire reproductive system of the TKO mice. Ultrahigh-resolution matrix-assisted laser desorption/ionization (MALDI) mass spectrometry was used to visualize the lipid alterations in HGSC.
The longitudinal analysis of the serum metabolome revealed specific temporal trends for 17 lipid classes, as well as for other polar metabolites including amino acids, TCA acids, bile acids, bile acid derivatives, progesterone metabolites, and metabolites of arachidonic acid. Spatial lipidomic experiments provided a map of the metabolic alterations of HGSC that showed accumulation and reduction of various lipid classes, indicating the observed changes in the serum are in fact a signature of OC. Chapter 3 describes the serum lipidomic alterations in another mouse model of HGSC. For this study, sequentially collected serum samples from a double knockout mouse model (DKO) of HGSC were analyzed with reverse phase (RP) UHPLC-MS. Similar to the serum collection protocol followed for the TKO mouse in Chapter 2, serum samples from these DKO mice were collected biweekly starting from 8 weeks of age until death or humane endpoint for sacrifice. Longitudinal lipidomic alterations in the DKO were investigated via machine learning (ML) approaches. A hierarchical clustering analysis identified 4 main lipid trajectory clusters. These clusters mainly consisted of glycerophospholipids, including ether-linked phospholipids, and sphingolipids. Of note, early disease stages were marked by changes in phospholipids and sphingolipids while late disease stages showed more diverse changes. Furthermore, the ML-based approaches characterized lipidome alterations at various disease stages. Five ML algorithms were used for classification purposes, and a 5-lipid panel discriminated early-stage DKO mice from DKO control mice with an area under the curve (AUC) value of 0.80, indicating the possibility of employing circulating lipid markers for early detection. Animal models of OC provide a simpler, better-controlled model to study metabolic alterations which can later potentially be translated to humans.
In addition, as obtaining early-stage OC samples from humans is challenging, animal models provide a rare opportunity to study early disease stages. However, the goal of these metabolomics studies is to develop clinically relevant biomarkers that aid in selecting therapeutic strategies, and thus investigating and validating metabolic changes in humans is crucial for assessing their translational implications. Additionally, besides detecting metabolic alterations in OC compared to controls with no gynecological malignancies, distinguishing OC from other benign or cancerous gynecological conditions remains a challenge with significant implications for patient survival. Better results are observed when women with OC are correctly diagnosed and receive optimal treatment. In Chapter 4, a comprehensive serum lipidome profiling of OC and various other gynecological conditions (non-OC), including benign ovarian tumor, benign uterine tumor, and cervical cancer, is performed with RP UHPLC-MS. This study used serum samples from two independent tissue banks (Dongsan Hospital Human Tissue Bank and Gangnam Severance Hospital Gene Bank) in South Korea. The age-matched patient cohort included 208 women with OC of various histological types and disease stages (mean age 51.9 years) and 137 non-OC patients (mean age 49.9 years). Among the OC patients, 93 women had early-stage (I and II) OC, out of which 31% were of serous histology. Serous tumors accounted for 86% of late-stage (III and IV) cases, while the remaining 14% of the cases included clear cell, transitional, mucinous, and carcinosarcoma subtypes. The serum lipidome of OC showed most lipid species to be reduced in OC, with some lipid classes, including ceramides and triglycerides, showing increased abundance. Stage-stratified analyses were conducted to investigate if lipids show distinct alterations in early-stage (I and II) or late-stage (III and IV) OC versus non-OC. While both early-stage and late-stage OC vs. 
non-OC conditions showed alterations in various lipid classes including phospholipids and sphingolipids, changes in certain lipid species such as diglycerides, fatty acids and cholesterol were distinctive of advanced stage OC vs. non-OC. Moreover, results indicated that lipidome alterations in OC were present when the cancer was localized, and those changes were amplified as the disease progressed. In addition, OC of different histological types showed similar lipidome changes for most lipid classes. Additionally, a panel of 10 top discriminating lipids, consisting mainly of ether phospholipids, phosphatidylcholine species, and one sphingomyelin species, was selected that differentiated OC from non-OC conditions with AUC of 0.91 and 0.74 in training and test sets, respectively. These results provided a systemic analysis of circulating lipidomic alterations in OC patients, highlighting the potential of lipids as a complementary class of blood-based biomarkers for OC diagnosis. Although UHPLC-MS remains one of the major tools in metabolomics studies, the use of complementary analytical techniques holds promise to significantly improve metabolite coverage. Capillary electrophoresis (CE) coupled with high-resolution MS (HRMS) offers high selectivity and sensitivity for charged polar metabolites and provides a complementary view of the metabolome. Recent advances in CE-MS technologies include small, chip-based CE systems coupled with nanoelectrospray ionization (nanoESI), enabling fast and sensitive analysis. In Chapter 5, a microchip CE (µCE)-MS based targeted metabolomics assay was developed to analyze 40 key metabolites related to cancer progression. A commercial µCE-MS system from 908 Devices (Boston, MA) was coupled with a high-resolution accurate-mass Q Exactive Plus mass spectrometer (Thermo Fisher, MA). The developed method was applied to biweekly collected serum samples from 3 TKO and 3 TKO control mice of HGSC.
The µCE-MS method produced sharp, baseline-resolved peaks, and calibration curves maintained good linearity. When applied to serum samples from the TKO mouse, 30 metabolites were successfully detected. These included amino acids, amino acid derivatives, and nucleotides. The data collected from this µCE-MS platform were compared with the previously collected UHPLC-MS data in Chapter 2. Time-resolved data for the 5 metabolites detected with both platforms showed identical temporal trends, indicating the µCE-MS method performed satisfactorily in capturing granular time-course data in complex biological matrices. Chapter 6 presents the major conclusions drawn from this thesis work. Metabolic alterations in the two animal models of OC are discussed, highlighting the similarities and differences observed in each case. A summary of conclusions from the lipidomic profiling of human ovarian cancer samples is presented, and the results from animal models of OC are compared with alterations observed in humans. Possible future directions to continue with this work are also discussed. Appendix A presents a pilot study to investigate the feasibility of liquid-based Papanicolaou (Pap) tests as biospecimens for OC detection. As past studies have observed OC cells in Pap tests, we hypothesized that lipid alterations that are observed in serum and ovarian tissues can also be present in cells obtained in a Pap test. To test this hypothesis, a UHPLC-MS based lipidomics pipeline was developed and a liquid-based Pap test sample from a woman with normal cytology was analyzed. Cells were pelleted, and lipids were then extracted from the pellets. Results showed that lipids can be detected in these cell pellets. The detected lipids included sphingolipids, phospholipids, ether-linked phospholipids, and glycerolipids. These results suggest a possibility of developing non-invasive techniques for OC diagnosis and detection.
Appendix B outlines collaborative metabolomics work that combined both nuclear magnetic resonance (NMR) and UHPLC-MS datasets and used ML to discover urine-based candidate biomarkers for RCC prediction. Urine samples from patients at Emory University Hospital with a solid renal mass that was confirmed to be RCC were used for this study. Controls were identified during the annual physical exam. The study cohort consisted of 105 RCC patients and 179 controls, which is larger than most previously published studies. ML-based feature selection led to a panel of seven metabolites that discriminated RCC from controls with 88% accuracy, 94% sensitivity, 85% specificity and AUC of 0.98 in the test cohort. This panel consisted of metabolites that were detected on the MS platform. High-resolution MS and tandem MS experiments were conducted to assign metabolite annotation. The annotated metabolites included 2-phenylacetamide, lysine-isoleucine or lysine-leucine, hippuric acid, a mannitol hippurate derivative, N-acetyl-glucosaminic acid and two exogenous metabolites: 2-mercaptobenzothiazole and dibutylamine. Furthermore, Appendix C presents a machine learning based urine metabolomics study to identify the metabolites associated with RCC staging and to estimate RCC tumor size. The same dataset that was collected for the analysis in Appendix B was used in this study. The metabolites associated with RCC progression included 3-hydroxyanthranilic acid, lysyl-glycine, glycine, and citrate. Overall, this multiplatform metabolomics study provided broad metabolite coverage and a complementary view of the urine metabolome of RCC. These results suggest the use of urinary metabolomics profiling as a promising platform for RCC detection.
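
Illustrative note on the first record above: its abstract names Kendrick mass defect analysis as one filter for polyhalogenated xenobiotics. The sketch below shows the general idea only and is not the thesis workflow. It assumes a CF2 Kendrick base, the sign convention "nominal Kendrick mass minus Kendrick mass", and neutral monoisotopic masses computed from formulas (a real workflow would start from measured m/z values). The function names and the three example compounds are chosen here for illustration.

```python
# Monoisotopic masses (u) of the lightest stable isotopes, rounded to 5 decimals.
MONOISOTOPIC = {"C": 12.0, "H": 1.00783, "F": 18.99840, "O": 15.99491}

# Assumed Kendrick base: the CF2 repeat unit of perfluoroalkyl chains.
CF2 = {"C": 1, "F": 2}


def monoisotopic_mass(formula):
    """Sum element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def kendrick_mass_defect(mass, base=CF2):
    """Rescale mass so the base unit weighs an integer, then return
    nominal Kendrick mass minus Kendrick mass (assumed sign convention)."""
    base_exact = monoisotopic_mass(base)
    base_nominal = round(base_exact)
    kendrick_mass = mass * base_nominal / base_exact
    return round(kendrick_mass) - kendrick_mass


# Two perfluorinated acids that differ by exactly one CF2 unit,
# and a non-fluorinated fatty acid for contrast.
pfoa = {"C": 8, "H": 1, "F": 15, "O": 2}
pfna = {"C": 9, "H": 1, "F": 17, "O": 2}
palmitic_acid = {"C": 16, "H": 32, "O": 2}

kmd_pfoa = kendrick_mass_defect(monoisotopic_mass(pfoa))
kmd_pfna = kendrick_mass_defect(monoisotopic_mass(pfna))
kmd_palmitic = kendrick_mass_defect(monoisotopic_mass(palmitic_acid))

print(f"PFOA KMD:          {kmd_pfoa:.4f}")
print(f"PFNA KMD:          {kmd_pfna:.4f}")
print(f"Palmitic acid KMD: {kmd_palmitic:.4f}")
```

Because adding one CF2 unit shifts the Kendrick mass by exactly the nominal base mass, members of a CF2 homologous series share the same defect, while unrelated compounds generally do not. Grouping features by near-equal defect is one way such a filter could flag candidate series.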