Novel Bioinformatic Approaches To Assess Genetic Relatedness, Degree Of Endemicity And Associations Among Viral And Bacterial Communities Along The Chattahoochee Riverine System
Author(s)
Ruiz Perez, Carlos Alexander
Abstract
Bacteriophages, the most abundant biological entities on Earth, play a major role in biogeochemical cycling by infecting and killing millions of bacterial cells daily. Not only do they affect the flow of nutrients, but they also have the capacity to modulate or modify the behavior of their hosts. The advent of next-generation sequencing (NGS) and bioinformatics methods has revealed the presence of phages in virtually every environment where microorganisms are present, as well as their huge diversity and functional potential. However, we are still scratching the surface. While large-scale studies in ocean environments revealed that viral communities follow ecological patterns similar to those of their hosts in terms of seasonality and biogeography, other environments such as freshwater systems remain to be fully characterized. To provide additional insight into the diversity and ecology of freshwater bacteriophage communities and their potential hosts, in Chapter 2 we performed a metagenomic survey of five interconnected lakes and one estuarine location along the Chattahoochee River in the southeastern USA. A large fraction of the viral genomes we recovered were unclassified, highlighting the extant viral diversity in freshwater systems. We found differences among lake viral (and bacterial) communities along the river, which correlated with geographic distances and environmental parameters. These differences were mainly driven by the presence of endemic viral populations with preferential abundances towards one or a couple of locations. The potential interactions between phages and their hosts revealed a complex interaction network where phages and hosts preferentially interacted in smaller subcommunities with similar endemicity preferences and where non-endemic phages could act as bridges between subcommunities.
A closer focus on phages that infect Cyanobacteria in Chapter 3 revealed underexplored diversity compared to ocean representatives and the presence of photosynthesis-related, bacteria-derived metabolic genes, which we demonstrated are evolutionarily divergent from ocean versions and can serve as markers to identify cyanophages and determine their origin. A common theme observed in the Chattahoochee River microbial communities was the large portion of genomes without classification or close representatives in reference databases. While several methods currently exist for the estimation of genome relatedness in microorganisms, many of them suffer from poor scalability and speed. To address these issues, in Chapter 4 we developed FastAAI, a tool aimed at the fast and accurate estimation of genome relatedness in microorganisms. We demonstrated that FastAAI is consistent with traditional Average Amino Acid Identity (AAI) while being orders of magnitude faster than current implementations and requiring fewer computational resources. This allows FastAAI to scale in response to the ever-increasing number of genomes recovered from environmental samples and makes it suitable for replacing traditional AAI implementations. Finally, we focused on functional annotation, an important step when characterizing microbial genomes. While there are multiple tools for this purpose, fast tools use small, non-comprehensive databases, which results in incomplete annotations. Comprehensive and complex tools, on the other hand, rarely scale to multiple genomes, and their results are often not intuitive, requiring additional work to extract relevant information. This is a limitation, especially for large datasets that are not easily parsed without command-line interface experience. In Chapter 5 we present MicrobeAnnotator, a user-friendly pipeline for the annotation of microbial genomes.
We demonstrated that MicrobeAnnotator can comprehensively annotate microbial genomes, using multiple databases to speed up the annotation process, and that it summarizes individual protein annotations into modules and pathways, yielding ready-to-use and easy-to-interpret results. Comparisons with smaller, faster tools demonstrated the higher fraction of proteins annotated by MicrobeAnnotator, while comparisons with more complex tools revealed the same level of annotation but lower use of computational resources by MicrobeAnnotator.
Date
2021-06-17
Resource Type
Text
Resource Subtype
Dissertation