Developing Graph-based Computational Algorithms for Single-cell Data Science

Author(s)
Lim, Hong Seo
Advisor(s)
Editor(s)
Associated Organization(s)
Organizational Unit
Wallace H. Coulter Department of Biomedical Engineering
The joint Georgia Tech and Emory department was established in 1997
Supplementary to:
Abstract
Rapid advances in single-cell measurement technologies allow in-depth analysis of cellular heterogeneity in biological systems of interest. Single-cell profiling through flow cytometry, mass cytometry, and single-cell RNA sequencing (scRNA-seq) has led to novel discoveries in immunology, virology, neuroscience, and cancer biology. Single-cell data science is a new discipline that applies statistics, mathematics, or machine learning to the computational challenges arising in single-cell profiling data and subsequent analysis steps. In this thesis, we identify three such challenges that need proper attention: (1) integration of single-cell datasets acquired with different technologies or affected by batch effects, (2) quantification of cluster-like and trajectory-like characteristics of scRNA-seq datasets to guide algorithm choice, and (3) quantification of cell-type-specific differences across single-cell datasets. We provide graph-based computational tools to tackle these challenges, as follows. (1) We propose a new algorithm, JSOM, that aligns two datasets through jointly evolved self-organizing maps. We demonstrated that the JSOM maps could be used to identify related clusters between the two datasets, and we demonstrated the alignment of various single-cell profiling datasets. (2) We present five scoring metrics and a new pipeline to quantify geometric characteristics of scRNA-seq data, specifically its clusterness and trajectoriness. The scoring metrics are based on pairwise distance distribution, persistent homology, vector magnitude, Ripley's K, and degrees of separation, and we demonstrated that the pipeline could quantify the clusterness and trajectoriness of scRNA-seq data. (3) We present a new pipeline to quantify cell-type-specific differences and to identify the features driving the variation. The pipeline exploits the quantifiable differences seen in the low-dimensional UMAP embedding and uses SHAP analysis to measure them, and we demonstrated its utility in interpreting and quantifying differences in various single-cell profiling data.
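To give a rough sense of how a pairwise distance distribution can separate cluster-like from trajectory-like data, the sketch below compares two synthetic 2D point sets. It is an illustration only and is not the scoring metric defined in the thesis: the function names, the synthetic data, and the choice of Sarle's bimodality coefficient as the summary statistic are all assumptions made for this example. Two well-separated clusters produce a bimodal distance distribution (short within-cluster distances and long between-cluster distances), while points spread along a line produce a unimodal one.

```python
import numpy as np


def pairwise_distances(points):
    """Return the Euclidean distances between all unique pairs of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    dist_matrix = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # skip diagonal and duplicates
    return dist_matrix[upper]


def bimodality_coefficient(values):
    """Sarle's bimodality coefficient, (skewness^2 + 1) / kurtosis.

    Uses population moments without finite-sample correction. Values
    close to 1 suggest a bimodal distribution. This is a hypothetical
    stand-in for a clusterness score, not the metric from the thesis.
    """
    centered = values - values.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis
    return (skewness ** 2 + 1.0) / kurtosis


rng = np.random.default_rng(0)

# Synthetic cluster-like data: two tight, well-separated groups.
clusters = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(10.0, 0.0), scale=0.3, size=(100, 2)),
])

# Synthetic trajectory-like data: points spread along a noisy line.
t = rng.uniform(0.0, 10.0, size=200)
trajectory = np.column_stack([t, rng.normal(scale=0.1, size=200)])

bc_clusters = bimodality_coefficient(pairwise_distances(clusters))
bc_trajectory = bimodality_coefficient(pairwise_distances(trajectory))

print(f"clusters:   {bc_clusters:.3f}")
print(f"trajectory: {bc_trajectory:.3f}")
```

In this toy setting the clustered data should score higher than the trajectory data. Real scRNA-seq data are high-dimensional and noisy, so a single statistic like this would likely need to be combined with other measures, which is consistent with the thesis proposing five complementary metrics.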
Sponsor
Date
2022-07-30
Extent
Resource Type
Text
Resource Subtype
Dissertation
Rights Statement
Rights URI