Organizational Unit:
School of Biological Sciences


Publication Search Results

Now showing 1 - 10 of 15

Differential Gene Co-Expression Network Characteristics Of Cancer

2022-12-13 , Arshad, Zainab

The transformation from a healthy state to a disease state in cancer is dictated in large part by structural and regulatory abnormalities in genes. While the molecular features underlying this transition have been investigated for some time, allowing groundbreaking advancements in cancer research, a majority of these efforts are focused on mutational and expression changes of individual genes. The recent advancement of network-based analytic methods affords an additional route through which disease pathophysiology and biologic regulation can be investigated. Furthermore, with the development of high-throughput technologies and the availability of large biobanks, gene interaction changes and their functional consequences can be reliably interpreted from a systemic perspective, in a context-specific manner. Toward this end, my research investigates gene co-expression changes, derived from transcriptomic case-control data, that underlie cancer onset and progression relative to healthy tissue. For the first study, global network changes associated with cancers of nine different tissues of origin were investigated. Network complexity generally dropped in the transition from normal precursor tissues to corresponding primary tumors, whereas cross-tissue cancer network similarity overall increased in early-stage cancers, followed by a subsequent loss in similarity as tumors reacquire cancer-specific network complexity in late-stage cancers. In addition, gene-gene connections remaining stable through cancer development were found enriched for "housekeeping" gene functions, whereas newly acquired interactions were associated with established cancer-promoting functions. For the second study, gene-network characteristics of the molecular subtypes (Luminal A, Luminal B and Basal) of Breast Cancer (BC) were outlined based on a comparative analysis relative to precursor normal breast tissue.
Basal was identified as the most highly connected subtype, yet the one most dissimilar to the normal control. We discovered eight extensively connected network modules acquired in Basal BCs that harbored 19 genes found to be significantly associated with survival and encoding cancer hallmark functions, including regulation of cell proliferation and motility, as well as neural pathways that have not been previously associated with Basal BCs. Finally, the consensus approach of network construction for an unbiased differential analysis of gene co-expression networks used in these studies was published as a step-by-step protocol. Altogether, this thesis highlights gene-network changes characteristic of individual cancer types, molecular subtypes and disease stages that inform their diverse progression patterns and clinical outcomes. Furthermore, it underscores the importance and demonstrates the utility of gene co-expression networks in identifying key genes, gene interactions and functional characteristics of cancers that may be undiscovered by standard molecular analysis approaches.

Global dysregulation of gene expression and tumorigenesis: Data science for cancer

2019-09-03 , Clayton, Evan

Dysregulation of gene expression is a hallmark of cancer. Broadly speaking, my research is focused on the changes in gene expression that characterize the transition from normal to cancerous states, i.e. tumorigenesis. To study such changes, I performed integrated analysis of next generation sequencing data for matched normal and primary tumor samples from hundreds of patients across numerous different cancer types. By analyzing this sequencing data, I have been able to explore the global landscape of transcriptional reprogramming in cancer and discover how changes in the regulation of gene expression may be implicated in tumorigenesis. My thesis is focused on four specific areas of transcriptional reprogramming in cancer: (1) changes in the expression and activity of transposable elements (TEs), (2) changes in alternative splicing induced by TEs, (3) allele-specific expression of tumor suppressor genes (TSGs), and (4) gene expression changes that are implicated in cancer drug response. TEs are known to be uniformly overexpressed in cancer, suggesting a possible role for their activity in tumorigenesis. I discovered a class of long interspersed nuclear elements (the LINE-1 family) with elevated levels of expression and activity in three different cancer types, and I showed examples where cancer-specific LINE-1 insertions disrupt enhancers, leading to the down-regulation of TSGs. TEs are also implicated in the creation of novel splicing isoforms, and aberrant alternative splicing has been associated with tumorigenesis for a number of different cancers. Integrated analysis of genome sequence and transcriptome data revealed thousands of TE-generated alternative splice events genome-wide, including close to 5,000 events distributed among cancer associated genes. I explored the functional implications of specific cases of isoform switching, whereby TE-induced isoforms of cancer associated genes show elevated levels of relative expression in tumor samples. 
A closer look at TSG expression in matched normal and tumor samples indicated that functionally important changes in patterns of allele-specific expression in individuals heterozygous for loss-of-function TSG alleles are a significant factor in cancer onset and progression. These results identified a variety of molecular mechanisms that contribute to the observed changes in allele-specific expression patterns in cancer, with allele-specific alternative splicing mediated by antisense RNA emerging as a predominant factor. Furthermore, analysis of the genomic variation for worldwide human populations demonstrates that loss-of-function TSG alleles are segregating at remarkably high frequencies, implying that a significant fraction of otherwise healthy individuals may be predisposed to developing cancer. For the final study of my thesis research, I applied the gene expression data from primary tumor samples to build predictive models of cancer drug response for two common chemotherapeutics: 5-Fluorouracil and Gemcitabine. My gene expression-based models predict whether patients will respond to individual therapies with up to 86% accuracy. The genes that I found to be most informative for predicting drug response were enriched in well-known cancer signaling pathways, highlighting their potential significance in the prognosis of chemotherapy.

Transposable element polymorphisms and human genome regulation

2017-11-13 , Wang, Lu

Transposable elements (TEs) are DNA sequences that are capable of moving from one genomic location to another. A large proportion of the human genome is derived from TEs, and TE-derived sequences have been shown to contribute to genome regulation in a variety of ways. There are several active families of human TEs, primarily the Alu, LINE-1 (L1), and SVA retrotransposons, which generate structural variations that segregate as polymorphisms within and between human populations. Given the known regulatory properties of human TEs, considered together with the fact that TE insertion activity is a source of population genetic variation, I hypothesized that TE polymorphisms can lead to gene regulatory differences among human individuals with health-related phenotypic consequences. I evaluated this hypothesis via a series of genome-wide association screens aimed at assessing: (1) how the human genome regulates TE activity, and (2) how TE activity impacts human genome regulation and health-related phenotypes. Expression quantitative trait loci (eQTL) analysis was used to discover a number of novel genetic modifiers of L1 element expression, including genes encoding transcription factors and chromatin-associated proteins. Human TE polymorphisms were shown to participate in population-specific gene regulation, with the potential to coordinately modify transcriptional networks. The regulatory effects of human TE polymorphisms were linked to immune system function, and related diseases, via insertions into cell type-specific enhancers. Results from my novel genome-wide approach to the study of human TE activity underscore the ability of TEs to affect health-related phenotypes by virtue of changes to the regulatory landscape of the genome.

Computational algorithm development for epigenomic analysis

2012-07-03 , Wang, Jianrong

Multiple computational algorithms were developed for analyzing ChIP-seq datasets of histone modifications. For basic ChIP-seq data processing, the problems of ambiguous short sequence read mapping and broad peak calling of diffuse ChIP-seq signals were solved by novel statistical methods. Their performance was systematically evaluated against existing approaches. Their potential utility for finding meaningful biological information was demonstrated through applications to real datasets. For biological question-driven data mining, several important topics were selected for algorithm development, including hypothesis-driven insulator prediction, unbiased chromatin boundary element discovery and combinatorial histone modification signature inference. The integrative computational pipeline for insulator prediction not only produced a list of putative insulators but also recovered specific associated chromatin and functional features. Selected predictions have been experimentally validated. The unbiased chromatin boundary element prediction algorithm was feature-free and had the capability to discover novel types of boundary elements. The predictions found a set of chromatin features and provided the first report of tRNA-derived boundary elements in the human genome. The combinatorial chromatin signature algorithm employed chromatin profile alignments for unsupervised inferences of histone modification patterns. The signatures were associated with various regulatory elements and functional activities. Both the computational advantages and the biological discoveries were discussed.

Efficient alignment-free software applications for next generation sequencing-based molecular epidemiology

2020-01-09 , Espitia Navarro, Hector Fabio

Public health agencies increasingly couple next generation sequencing (NGS) characterization of microbial genomes with bioinformatics analysis methods for molecular epidemiology. The overhead associated with the bioinformatics methods used for this purpose, in terms of both the required human expertise and computational resources, represents a critical bottleneck that limits the potential impact of microbial genomics on public health. This is particularly true for local public health agency laboratories, which are typically staffed with microbiologists who may not have substantial bioinformatics expertise or ready access to high-performance computational resources. There is a pressing need for bioinformatics solutions to genome-enabled molecular epidemiology that are accurate, easy to use, fast, and computationally efficient. This thesis research is focused on the development of an alignment-free algorithm for NGS data analysis and its implementation into turn-key software applications tailored explicitly for genome-enabled molecular epidemiology and environmental microbial genomics. I explored a computational strategy based on k-mer frequencies to distinguish among sequences of interest in NGS read samples. By combining this strategy with an efficient data structure, the Enhanced Suffix Array (ESA), I developed a base algorithm for the rapid analysis of unprocessed NGS reads. I further adapted and implemented this algorithm into a suite of software applications for sequence typing, gene detection, and gene-based taxonomic read classification. Benchmarking and validation analyses showed that STing is an ultrafast and accurate solution for genome-enabled molecular epidemiology, which performs better than existing bioinformatics methods for sequence typing and gene detection.
To overcome the limitations of bioinformatics infrastructure and expertise in public health laboratories, I developed WebSTing, a Web platform that uses the STing algorithm to provide easy access to accurate and rapid alignment-free automated characterization of whole genome sequencing (WGS) samples of bacterial isolates. Finally, to demonstrate the utility of STing in problems beyond simple sequence typing and gene detection, I applied the alignment-free algorithm to two different areas: (1) public health, with the virulence gene profiling of Shiga toxin-producing Escherichia coli (STEC) isolates, and (2) environmental microbial genomics, with the nifH gene-based taxonomic classification of amplicon sequencing reads. I showed that STing performs better than the gold-standard method for STEC isolate characterization, and that it correctly classifies amplicon sequencing reads on simulated communities of nitrogen-fixing organisms.
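
The general idea of alignment-free, k-mer-based typing described in this abstract can be illustrated with a minimal sketch. This is not the STing implementation: it swaps the Enhanced Suffix Array for a plain Python dictionary, ignores reverse-complement strands and sequencing errors, and uses made-up allele names and sequences (`allele_1`, `allele_2`) purely for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(alleles, k):
    """Map each k-mer to the set of allele names that contain it.

    A dictionary stands in here for the suffix-array index named in the
    abstract; this substitution is an assumption made for brevity.
    """
    index = {}
    for name, seq in alleles.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def score_reads(reads, index, k):
    """Count, per allele, how many read k-mers are found in the index."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            for name in index.get(kmer, ()):
                hits[name] += 1
    return hits


def call_allele(reads, alleles, k=5):
    """Return the allele with the most k-mer hits, or None if nothing hits."""
    hits = score_reads(reads, build_index(alleles, k), k)
    return hits.most_common(1)[0][0] if hits else None


# Hypothetical toy data: two alleles that differ by a single base.
alleles = {
    "allele_1": "ACGTACGTTAGC",
    "allele_2": "ACGTACCTTAGC",
}
reads = ["GTACGTTA", "ACGTTAGC"]

print(call_allele(reads, alleles, k=5))
```

Because no read is aligned, the cost per read is one index lookup per k-mer. This is the property that makes such approaches attractive for unprocessed NGS reads.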

Building a systematic analytic pipeline – big data innovation in healthcare

2019-08-27 , Wang, Yuanbo

Electronic Health Records (EHR) containing large amounts of patient data present both opportunities and challenges to industry, policymakers, and researchers. Data-driven healthcare utilizing big data in EHR has the potential to revolutionize care delivery while reducing costs. However, for researchers, policymakers, and practitioners to take full advantage of the benefits that electronic records can provide, several challenges must be addressed: 1) Extraction and coding methods for EHR data must be strategically designed to address issues of data quantity, quality, and patient confidentiality; 2) Standardization of clinical terminologies is essential in facilitating interoperability among EHR systems and allows for multi-site comparative effectiveness studies; 3) Effective methods for mining longitudinal health data common in the EHR are critical for revealing disease progression, treatment patterns, and patient similarities, all of which play an important role in clinical decision support and treatment improvement; 4) Advanced machine learning techniques are necessary for early detection and prognosis of disease and for identifying critical factors that impact patient outcomes; and 5) Practical intervention strategies must be developed to address healthcare disparity in rural and remote areas that lack resources and access. This thesis focuses on these five issues by developing a systematic analytic pipeline for big data in healthcare. Specifically, innovative strategies are developed for information extraction, clinical terminology mapping, time-series mining and clustering, feature selection and discriminatory modeling. Finally, practical implementation methods for telehealth services are designed to reduce healthcare disparity in underserved rural Appalachian counties in Georgia.

Population genomics of human polymorphic transposable elements

2016-11-15 , Rishishwar, Lavanya

Transposable element (TE) activity has had a major impact on the human genome; more than two-thirds of the sequence is derived from TE insertions. Several families of human TEs – primarily Alu, L1 and SVA – continue to actively transpose, thereby generating insertion polymorphisms between individuals. Until very recently, it has not been possible to characterize the genetic variation generated by the activity of these TE families at the scale of whole genomes for multiple individuals within and between human populations. For this reason, the impact of recent TE activity on human evolution has yet to be fully appreciated. My dissertation research leverages novel technologies in data science to investigate the role that recent TE activity has played in shaping human population genetic variation. Specifically, my dissertation addresses three problems: 1) evaluation of the computational techniques used to characterize human polymorphic TE insertion sites from whole genome, next-generation sequence data, 2) characterization of the population genomic variation of human polymorphic TEs and evaluation of their effectiveness as markers of human genetic ancestry and admixture, and 3) analysis of the effects that natural selection (negative and positive) has exerted on human polymorphic TE insertions. I close by presenting a broad prospectus on the implications of genome-scale analyses of human polymorphic TE insertions for population and clinical genetic studies. The results reported in this dissertation represent the dawn of the population genomics era for human TEs.

Human genetic ancestry, health, and adaptation in Latin America

2019-11-05 , Norris, Emily Taylor

Genetic admixture is the process that occurs when populations that were previously reproductively isolated, and consequently genetically diverged, come back together and exchange genes. Recent studies of modern and ancient genomes have underscored the frequency with which admixture has occurred during human evolution. Indeed, human evolution has been characterized by numerous iterations of physical isolation and genetic divergence followed by population convergence and admixture. Genetic admixture has profound implications for human evolution as it results in the creation of evolutionarily novel genomes that contain combinations of genetic variants (haplotypes) never seen before on the same genomic background. This dissertation explores the implications of large-scale genetic admixture in Latin America for human health, evolution (natural selection), and population structure (assortative mating). Latin America provides an ideal setting to explore the implications of admixture given the formation of modern populations via admixture among distinct African, European, and Native American population groups. Human health and evolution are explored through the lens of admixture, with an emphasis on the demographic processes that serve to combine distinct ancestry components within genomes. Population structure is considered with respect to assortative mating, which serves to limit the extent of genetic admixture within populations, thereby maintaining genetic diversity among distinct population groups even when they are co-located. In order to understand the implications of admixture for the formation of the New World, comparative genomic analyses were used to characterize patterns of genetic ancestry and admixture for individuals from four modern Latin American populations: Colombia, Mexico, Peru, and Puerto Rico. 
Comparative genomic analyses with ancestral source populations allowed for the characterization of genetic ancestry and admixture profiles for these four Latin American populations at both genome-wide (global) and variant/gene (local) levels. These data on genetic ancestry were integrated with a variety of functional genomic data sources in an effort to more fully understand the biological implications of admixture. Global patterns of ancestry for each population were used to parameterize the expected values of local ancestry, for both specific genetic variants and at the level of individual genes, and comparisons of observed versus expected ancestry levels were used to look for anomalous deviations of local ancestry, i.e. ancestry enrichment. Ancestry-enriched genetic variants were implicated in a number of health-related phenotypes, including immune system and disease response pathways, and a number of these variants were shown to exert their phenotypic effects via ancestry-specific gene regulation. Ancestry enrichment at the gene level was used to provide evidence for rapid adaptation to local environments via admixture-enabled selection, which occurs when admixture introduces novel genetic variants (haplotypes) to newly formed populations at intermediate frequencies. Admixture-enabled selection was observed for the major histocompatibility complex (MHC) locus of the adaptive immune system across multiple Latin American populations, and both the adaptive and innate immune systems were shown to evolve via polygenic admixture-enabled selection. Patterns of gene level ancestry were also used to search for evidence of population structure caused by assortative mating, whereby mate choice is influenced by phenotypic similarity. This analysis allowed us to characterize the genetic basis of phenotypic cues that influence patterns of assortative mating, including a number of anthropometric and neurological traits as well as the MHC locus. 
Considered together, these results underscore the outsized role that admixture has played in shaping the biology of modern Latin American populations. Global patterns of genetic ancestry and admixture are distinct to each population, and local ancestry can differ widely even for closely related individuals within a population. Local ancestry impacts a wide variety of health-related traits, provides the raw material for rapid, adaptive evolution, and informs the phenotypic cues that are used for mate choice and help to maintain population structure.
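
The comparison of observed versus expected local ancestry described in this abstract can be sketched as a simple standardized deviation. This is a hedged illustration, not the dissertation's method: the function names, the locus labels, the ancestry fractions, and the choice to scale deviations by their root-mean-square across loci are all assumptions made for the example.

```python
import math


def enrichment_z_scores(local_ancestry, global_fraction):
    """Standardize each locus's observed ancestry against the expectation.

    local_ancestry maps a locus label to the population-average fraction
    of one ancestry component at that locus; global_fraction is the
    genome-wide (global) fraction used as the expected value.
    """
    deviations = {
        locus: observed - global_fraction
        for locus, observed in local_ancestry.items()
    }
    # Root-mean-square deviation across loci (an assumed scaling choice).
    spread = math.sqrt(
        sum(d * d for d in deviations.values()) / len(deviations)
    )
    if spread == 0:
        # No locus deviates from the expectation, so nothing is enriched.
        return {locus: 0.0 for locus in deviations}
    return {locus: d / spread for locus, d in deviations.items()}


def enriched_loci(local_ancestry, global_fraction, threshold=1.5):
    """Return loci whose absolute standardized deviation meets the threshold."""
    scores = enrichment_z_scores(local_ancestry, global_fraction)
    return sorted(
        locus for locus, z in scores.items() if abs(z) >= threshold
    )


# Hypothetical toy data: one locus carries far more of this ancestry
# than the genome-wide fraction of 0.30 would predict.
local = {"locus_a": 0.30, "locus_b": 0.32, "locus_c": 0.28, "locus_d": 0.60}

print(enriched_loci(local, global_fraction=0.30))
```

A real analysis would work from per-individual local ancestry calls and a null model that accounts for sampling variance and linkage between loci. The sketch only shows the observed-minus-expected logic.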

Public health informatics - Strategy and decision modeling

2019-08-21 , Tian, Haozheng

My research is composed of three studies focused on providing decision modeling and analytical tools with the objective of protecting public health. The first study introduces an agent-based simulation platform that serves as a decision support system for crowd management in public venues. I propose a new implementation of agent-based simulation with improvements in four aspects: path planning, collision avoidance, emotion modeling and optimization with simulation. The deliverables of this study also include a complete simulation platform for researchers' use. The second study applies a data-driven informatics and machine learning approach to quantify the outcome of practice variance among medical care providers. The study investigates the safety and efficacy of a large-dose, needle-based epidural anesthesia technique for parturient women. A machine learning model is proposed as the classifier to predict the occurrence of hypotension. Further, a machine learning approach is applied to predict the outcome of epidural anesthesia, uncovering the important factors of a successful practice. Quantification of the effect of practice variance and medicine usage is provided. The findings from this investigation facilitate delivery improvement and establish an improved clinical practice guideline for training and for dissemination of safe practice. The third study proposes the application of a convolutional neural network (CNN) to the prediction of antigenicity of influenza viruses (A/H3N2) and to vaccine recommendation. The study systematically explores ways of representing hemagglutinin (HA) beyond the binary digit or character encodings widely applied in other studies. Heuristic optimization is applied to optimize the selection of AAindex as well as the structure of the CNN. In contrast to other state-of-the-art approaches, the model offers better coverage in vaccine recommendation and superior performance in accurate prediction of antigenicity.

Effects of repetitive DNA and epigenetics on human genome regulation

2013-07-02 , Jjingo, Daudi

The highly developed and specialized anatomical and physiological characteristics observed for eukaryotes in general and mammals in particular are underwritten by an elaborate and intricate process of genome regulation. This precise control of the location, timing and amplitude of gene expression is achieved by a variety of genetic and epigenetic tools and mechanisms. While several of these regulatory mechanisms have been extensively studied, our understanding of the complex and diverse associations between various epigenetic marks and genetic elements with genome regulatory systems has remained incomplete. However, the recent profound improvements in sequencing technologies have significantly improved the depth and breadth to which their functions and relationships can be understood. The objective of this thesis has been to apply bioinformatics, computational and statistical tools to analyze and interpret various recent high-throughput datasets from a combination of next generation sequencing and chromatin immunoprecipitation (ChIP-seq) experiments. These datasets have been analyzed to further our understanding of the dynamics of gene regulation in humans, particularly as it relates to repetitive DNA, cis-regulation and DNA methylation. The thesis thus resides at the intersection of three major areas: transposable elements, cis-regulatory elements and epigenetics. It explores how those three aspects of regulation relate to gene expression and the functional implications of those interactions. From this analysis, the thesis provides new insights into: 1) the relationship between the transposable element environment of human genes and their expression, 2) the role of mammalian-wide interspersed repeats (MIRs) in the function of human enhancers and enhancement of tissue-specific functions, 3) the existence and function of composite cis-regulatory elements and 4) the dynamics and relationship between human gene-body DNA methylation and gene expression.