Organizational Unit:

School of Biological Sciences

Permanent Link

https://hdl.handle.net/1853/70750

Parent Organization

Organizational Unit

College of Sciences

Includes Organization(s)

Organizational Unit

Center for the Study of Systems Biology

ArchiveSpace Name Record

https://finding-aids.library.gatech.edu/agents/corporate_entities/1131

Publication Search Results

Now showing 1 - 10 of 13

Human genetic ancestry, health, and adaptation in Latin America

(Georgia Institute of Technology, 2019-11-05) Norris, Emily Taylor

Genetic admixture is the process that occurs when populations that were previously reproductively isolated, and consequently genetically diverged, come back together and exchange genes. Recent studies of modern and ancient genomes have underscored the frequency with which admixture has occurred during human evolution. Indeed, human evolution has been characterized by numerous iterations of physical isolation and genetic divergence followed by population convergence and admixture. Genetic admixture has profound implications for human evolution as it results in the creation of evolutionarily novel genomes that contain combinations of genetic variants (haplotypes) never seen before on the same genomic background. This dissertation explores the implications of large-scale genetic admixture in Latin America for human health, evolution (natural selection), and population structure (assortative mating). Latin America provides an ideal setting to explore the implications of admixture given the formation of modern populations via admixture among distinct African, European, and Native American population groups. Human health and evolution are explored through the lens of admixture, with an emphasis on the demographic processes that serve to combine distinct ancestry components within genomes. Population structure is considered with respect to assortative mating, which serves to limit the extent of genetic admixture within populations, thereby maintaining genetic diversity among distinct population groups even when they are co-located. In order to understand the implications of admixture for the formation of the New World, comparative genomic analyses were used to characterize patterns of genetic ancestry and admixture for individuals from four modern Latin American populations: Colombia, Mexico, Peru, and Puerto Rico. 
Comparative genomic analyses with ancestral source populations allowed for the characterization of genetic ancestry and admixture profiles for these four Latin American populations at both genome-wide (global) and variant/gene (local) levels. These data on genetic ancestry were integrated with a variety of functional genomic data sources in an effort to more fully understand the biological implications of admixture. Global patterns of ancestry for each population were used to parameterize the expected values of local ancestry, for both specific genetic variants and at the level of individual genes, and comparisons of observed versus expected ancestry levels were used to look for anomalous deviations of local ancestry, i.e. ancestry enrichment. Ancestry-enriched genetic variants were implicated in a number of health-related phenotypes, including immune system and disease response pathways, and a number of these variants were shown to exert their phenotypic effects via ancestry-specific gene regulation. Ancestry enrichment at the gene level was used to provide evidence for rapid adaptation to local environments via admixture-enabled selection, which occurs when admixture introduces novel genetic variants (haplotypes) to newly formed populations at intermediate frequencies. Admixture-enabled selection was observed for the major histocompatibility complex (MHC) locus of the adaptive immune system across multiple Latin American populations, and both the adaptive and innate immune systems were shown to evolve via polygenic admixture-enabled selection. Patterns of gene level ancestry were also used to search for evidence of population structure caused by assortative mating, whereby mate choice is influenced by phenotypic similarity. This analysis allowed us to characterize the genetic basis of phenotypic cues that influence patterns of assortative mating, including a number of anthropometric and neurological traits as well as the MHC locus. 
Considered together, these results underscore the outsized role that admixture has played in shaping the biology of modern Latin American populations. Global patterns of genetic ancestry and admixture are distinct to each population, and local ancestry can differ widely even for closely related individuals within a population. Local ancestry impacts a wide variety of health-related traits, provides the raw material for rapid, adaptive evolution, and informs the phenotypic cues that are used for mate choice and help to maintain population structure.

Global dysregulation of gene expression and tumorigenesis: Data science for cancer

(Georgia Institute of Technology, 2019-09-03) Clayton, Evan

Dysregulation of gene expression is a hallmark of cancer. Broadly speaking, my research is focused on the changes in gene expression that characterize the transition from normal to cancerous states, i.e. tumorigenesis. To study such changes, I performed integrated analysis of next generation sequencing data for matched normal and primary tumor samples from hundreds of patients across numerous different cancer types. By analyzing this sequencing data, I have been able to explore the global landscape of transcriptional reprogramming in cancer and discover how changes in the regulation of gene expression may be implicated in tumorigenesis. My thesis is focused on four specific areas of transcriptional reprogramming in cancer: (1) changes in the expression and activity of transposable elements (TEs), (2) changes in alternative splicing induced by TEs, (3) allele-specific expression of tumor suppressor genes (TSGs), and (4) gene expression changes that are implicated in cancer drug response. TEs are known to be uniformly overexpressed in cancer, suggesting a possible role for their activity in tumorigenesis. I discovered a class of long interspersed nuclear elements (the LINE-1 family) with elevated levels of expression and activity in three different cancer types, and I showed examples where cancer-specific LINE-1 insertions disrupt enhancers, leading to the down-regulation of TSGs. TEs are also implicated in the creation of novel splicing isoforms, and aberrant alternative splicing has been associated with tumorigenesis for a number of different cancers. Integrated analysis of genome sequence and transcriptome data revealed thousands of TE-generated alternative splice events genome-wide, including close to 5,000 events distributed among cancer associated genes. I explored the functional implications of specific cases of isoform switching, whereby TE-induced isoforms of cancer associated genes show elevated levels of relative expression in tumor samples. 
A closer look at TSG expression in matched normal and tumor samples indicated that functionally important changes in patterns of allele-specific expression in individuals heterozygous for loss-of-function TSG alleles are a significant factor in cancer onset/progression. These results identified a variety of molecular mechanisms that contribute to the observed changes in allele-specific expression patterns in cancer, with allele-specific alternative splicing mediated by anti-sense RNA emerging as a predominant factor. Furthermore, analysis of the genomic variation for world-wide human populations demonstrates that loss-of-function TSG alleles are segregating at remarkably high frequencies, implying that a significant fraction of otherwise healthy individuals may be predisposed to developing cancer. For the final study of my thesis research, I applied the gene expression data from primary tumor samples to build predictive models of cancer drug response for two common chemotherapeutics: 5-Fluorouracil and Gemcitabine. My gene expression based models predict whether patients will respond to individual therapies with up to 86% accuracy. The genes that I found to be most informative for predicting drug response were enriched in well-known cancer signaling pathways, highlighting their potential significance in prognosis of chemotherapy.

Building a systematic analytic pipeline – big data innovation in healthcare

(Georgia Institute of Technology, 2019-08-27) Wang, Yuanbo

Electronic Health Records (EHR) containing large amounts of patient data present both opportunities and challenges to industry, policymakers, and researchers. Data-driven healthcare utilizing big data in EHR has the potential to revolutionize care delivery while reducing costs. However, for researchers, policymakers, and practitioners to take full advantage of the benefits that electronic records can provide, several challenges must be addressed: 1) Extraction and coding methods for EHR data must be strategically designed to address issues of data quantity, quality, and patient confidentiality; 2) Standardization of clinical terminologies is essential in facilitating interoperability among EHR systems and allows for multi-site comparative effectiveness studies; 3) Effective methods for mining longitudinal health data common in the EHR are critical for revealing disease progression, treatment patterns, and patient similarities, all of which play an important role in clinical decision support and treatment improvement; 4) Advanced machine learning techniques are necessary for early detection and prognosis of disease and for identifying critical factors that impact patient outcomes; and 5) Practical intervention strategies must be developed to address healthcare disparity in rural and remote areas that lack resources and access. This thesis focuses on these five issues by developing a systematic analytic pipeline for big data in healthcare. Specifically, innovative strategies are developed for information extraction, clinical terminology mapping, time-series mining and clustering, feature selection and discriminatory modeling. Finally, practical implementation methods for telehealth services are designed to reduce healthcare disparity in underserved rural Appalachian counties in Georgia.

Public health informatics - Strategy and decision modeling

(Georgia Institute of Technology, 2019-08-21) Tian, Haozheng

My research is composed of three studies focused on providing decision modeling and analytical tools with the objective of protecting public health. The first study introduces an agent-based simulation platform that serves as a decision support system for crowd management in public venues. I propose a new implementation of agent-based simulation with improvements in four aspects: path planning, collision avoidance, emotion modeling and optimization with simulation. The deliverables of this study also include a complete simulation platform for researchers' use. The second study applies a data-driven informatics and machine learning approach to quantify the outcome of practice variance of medical care providers. The study investigates the safety and efficacy of a large-dose, needle-based epidural anesthesia technique for parturient women. A machine learning model is proposed as the classifier to predict the occurrence of hypotension. Further, a machine learning approach is applied to predict the outcome of epidural anesthesia, uncovering the important factors of a successful practice. Quantification of the effect of practice variance and medicine usage is provided. The findings from this investigation facilitate delivery improvement and establish an improved clinical practice guideline for training and for dissemination of safe practice. The third study proposes the application of a convolutional neural network (CNN) to the prediction of antigenicity of influenza viruses (A/H3N2) and to vaccine recommendation. The study systematically explores ways of representing hemagglutinin (HA) beyond the binary digit or character encodings widely applied in other studies. Heuristic optimization is applied to optimize the selection of AAindex as well as the structure of the CNN. In contrast to other state-of-the-art approaches, the model offers better coverage in vaccine recommendation and has superior performance in accurate prediction of antigenicity.

Transposable element polymorphisms and human genome regulation

(Georgia Institute of Technology, 2017-11-13) Wang, Lu

Transposable elements (TEs) are DNA sequences that are capable of moving from one genomic location to another. A large proportion of the human genome is derived from TEs, and TE-derived sequences have been shown to contribute to genome regulation in a variety of ways. There are several active families of human TEs, primarily the Alu, LINE-1 (L1), and SVA retrotransposons, which generate structural variations that segregate as polymorphisms within and between human populations. Given the known regulatory properties of human TEs, considered together with the fact that TE insertion activity is a source of population genetic variation, I hypothesized that TE polymorphisms can lead to gene regulatory differences among human individuals with health-related phenotypic consequences. I evaluated this hypothesis via a series of genome-wide association screens aimed at assessing: (1) how the human genome regulates TE activity, and (2) how TE activity impacts human genome regulation and health-related phenotypes. Expression quantitative trait loci (eQTL) analysis was used to discover a number of novel genetic modifiers of L1 element expression, including genes encoding transcription factors and chromatin-associated proteins. Human TE polymorphisms were shown to participate in population-specific gene regulation, with the potential to coordinately modify transcriptional networks. The regulatory effects of human TE polymorphisms were linked to immune system function, and related diseases, via insertions into cell type-specific enhancers. Results from my novel genome-wide approach to the study of human TE activity underscore the ability of TEs to affect health-related phenotypes by virtue of changes to the regulatory landscape of the genome.

Population genomics of human polymorphic transposable elements

(Georgia Institute of Technology, 2016-11-15) Rishishwar, Lavanya

Transposable element (TE) activity has had a major impact on the human genome; more than two-thirds of the sequence is derived from TE insertions. Several families of human TEs – primarily Alu, L1 and SVA – continue to actively transpose, thereby generating insertion polymorphisms between individuals. Until very recently, it has not been possible to characterize the genetic variation generated by the activity of these TE families at the scale of whole genomes for multiple individuals within and between human populations. For this reason, the impact of recent TE activity on human evolution has yet to be fully appreciated. My dissertation research leverages novel technologies in data science to investigate the role that recent TE activity has played in shaping human population genetic variation. Specifically, my dissertation addresses three problems: 1) evaluation of the computational techniques used to characterize human polymorphic TE insertion sites from whole genome, next-generation sequence data, 2) characterization of the population genomic variation of human polymorphic TEs and evaluation of their effectiveness as markers of human genetic ancestry and admixture, and 3) analysis of the effects that natural selection (negative and positive) has exerted on human polymorphic TE insertions. I close by presenting a broad prospectus on the implications of genome-scale analyses of human polymorphic TE insertions for population and clinical genetic studies. The results reported in this dissertation represent the dawn of the population genomics era for human TEs.

Effects of repetitive DNA and epigenetics on human genome regulation

(Georgia Institute of Technology, 2013-07-02) Jjingo, Daudi

The highly developed and specialized anatomical and physiological characteristics observed for eukaryotes in general and mammals in particular are underwritten by an elaborate and intricate process of genome regulation. This precise control of the location, timing and amplitude of gene expression is achieved by a variety of genetic and epigenetic tools and mechanisms. While several of these regulatory mechanisms have been extensively studied, our understanding of the complex and diverse associations between various epigenetic marks and genetic elements with genome regulatory systems has remained incomplete. However, the recent profound improvements in sequencing technologies have significantly improved the depth and breadth to which their functions and relationships can be understood. The objective of this thesis has been to apply bioinformatics, computational and statistical tools to analyze and interpret various recent high-throughput datasets from a combination of next-generation sequencing and chromatin immunoprecipitation (ChIP-seq) experiments. These datasets have been analyzed to further our understanding of the dynamics of gene regulation in humans, particularly as it relates to repetitive DNA, cis-regulation and DNA methylation. The thesis thus resides at the intersection of three major areas: transposable elements, cis-regulatory elements and epigenetics. It explores how those three aspects of regulation relate to gene expression and the functional implications of those interactions. From this analysis, the thesis provides new insights into: 1) the relationship between the transposable element environment of human genes and their expression, 2) the role of mammalian-wide interspersed repeats (MIRs) in the function of human enhancers and enhancement of tissue-specific functions, 3) the existence and function of composite cis-regulatory elements and 4) the dynamics and relationship between human gene-body DNA methylation and gene expression.

Computational algorithm development for epigenomic analysis

(Georgia Institute of Technology, 2012-07-03) Wang, Jianrong

Multiple computational algorithms were developed for analyzing ChIP-seq datasets of histone modifications. For basic ChIP-seq data processing, the problems of ambiguous short sequence read mapping and broad peak calling of diffuse ChIP-seq signals were solved by novel statistical methods. Their performance was systematically evaluated against existing approaches. Their potential utility for finding meaningful biological information was demonstrated by applications to real datasets. For biological question driven data mining, several important topics were selected for algorithm development, including hypothesis-driven insulator prediction, unbiased chromatin boundary element discovery and combinatorial histone modification signature inference. The integrative computational pipeline for insulator prediction not only produced a list of putative insulators but also recovered specific associated chromatin and functional features. Selected predictions have been experimentally validated. The unbiased chromatin boundary element prediction algorithm was feature-free and had the capability to discover novel types of boundary elements. The predictions found a set of chromatin features and provided the first report of tRNA-derived boundary elements in the human genome. The combinatorial chromatin signature algorithm employed chromatin profile alignments for unsupervised inferences of histone modification patterns. The signatures were associated with various regulatory elements and functional activities. Both the computational advantages and the biological discoveries were discussed.

Alteration of transcription by non-coding elements in the human genome

(Georgia Institute of Technology, 2012-06-27) Conley, Andrew Berton

The human genome contains ~1.5% coding sequence, with the remaining 98.5% being non-coding. The functional potential of the majority of this non-coding sequence remains unknown. Much of this non-coding sequence is derived from transposable element (TE) sequences. These TE sequences contain their own regulatory information, e.g. promoters and transcription factor binding sites. Given the large number of these sequences, over 4 million in the human genome, it would be expected that the regulatory information that they contain would affect the expression of nearby genes. This dissertation describes research that characterizes the alteration of and contribution to the human transcriptome by non-coding elements, including TE sequences.

Computational tools for molecular epidemiology and computational genomics of Neisseria meningitidis

(Georgia Institute of Technology, 2010-11-17) Katz, Lee Scott

Neisseria meningitidis is a Gram-negative, and sometimes encapsulated, diplococcus that causes devastating disease worldwide. For the worldwide genetic surveillance of N. meningitidis, the gold standard for profiling the bacterium uses genetic loci found around the genome. Unfortunately, the software for analyzing the data for these profiles is difficult to use for a variety of reasons. This thesis presents my suite of tools, called the Meningococcus Genome Informatics Platform, for the analysis of these profiling data. To better understand N. meningitidis, the CDC Meningitis Laboratory and other world-class laboratories have adopted a whole genome approach. To facilitate this approach, I have developed a computational genomics assembly and annotation pipeline called the CG-Pipeline. It assembles a genome, predicts locations of various features, and then annotates those features. Next, I developed a comparative genomics browser and database called NBase. Using CG-Pipeline and NBase, I addressed two open questions in N. meningitidis research. First, some N. meningitidis isolates cause disease, but many do not. What is the genomic basis of disease-associated versus asymptomatically carried isolates of N. meningitidis? Second, the capsule type of some isolates cannot be easily determined. Because isolates are assigned to one of many serogroups based on this capsule, which aids epidemiological studies and the public health response to N. meningitidis, such isolates often cannot be grouped. Thus the question is: what is the genomic basis of nongroupability? This thesis addresses both of these questions on a whole genome level.