Toward Advancing the Definitions of Sequence-Discrete Prokaryotic Species and Intra-Species Units, and Quantifying Their Distribution Patterns in Marine Environments
Author(s)
Conrad, Roth Edward
Advisor(s)
Konstantinidis, Konstantinos T.
Abstract
Microbial communities constitute the majority of Earth's biodiversity and play fundamental roles in ecosystem function, biogeochemical cycling, and human health. Recent advancements in genomic and metagenomic technologies have revolutionized our understanding of microbial diversity, revealing intricate patterns of genetic variation and ecological adaptation within microbial populations. Central to this understanding is the concept of microbial species and intra-species units, defined largely by genomic similarity metrics such as the average nucleotide identity (ANI) and shared gene content. The delineation of microbial species has evolved beyond traditional morphological criteria to encompass genomic coherence and ecological cohesion. Studies have shown that microbial populations often exhibit discrete genomic clusters with high ANI (>95%) within species, while ANI values below 90% typically differentiate between species. These genomic discontinuities not only define species boundaries but also underscore the presence of finer-scale diversity within microbial populations, such as strains and sequence types, which is crucial for understanding ecological interactions and adaptive responses. In Chapter 1, we show that another discontinuity exists between 99.2% and 99.8% ANI (midpoint 99.5%) in most of the 330 best-sampled bacterial species, each with at least 10 genome representatives available in public databases. Similar patterns were observed with long-read metagenomes, suggesting that the results reported in the chapter are not merely an effect of isolation biases. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies but provides clusters with ~20% higher accuracy in terms of the evolutionary and gene-content relatedness of the grouped genomes, while strains should consequently be defined at higher ANI values (>99.99% proposed).
Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity.
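As a rough illustration of how the ANI thresholds quoted above could be applied to a single pairwise comparison, the following minimal sketch maps an ANI value to a unit label. The function name `classify_ani`, the label strings, and the choice of inclusive versus exclusive boundaries are assumptions made for this example; the thesis itself defines units from ANI distributions across many genomes, not from a lookup like this.

```python
def classify_ani(ani_percent: float) -> str:
    """Map a pairwise ANI value (percent) to a unit label.

    Thresholds come from the abstract (>99.99% strain, 99.5% intra-species
    unit, >95% species, <90% different species). Boundary handling
    (>= vs >) is an assumption of this sketch.
    """
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani_percent > 99.99:
        return "same strain"
    if ani_percent >= 99.5:
        return "same intra-species unit"
    if ani_percent > 95.0:
        return "same species"
    if ani_percent < 90.0:
        return "different species"
    # Values between 90% and 95% fall in the gap the abstract leaves open.
    return "ambiguous"
```

A single cutoff per tier is a simplification: the abstract describes the 99.5% value as the midpoint of a 99.2% to 99.8% discontinuity, so pairs falling inside that range would deserve closer inspection in practice.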
The identification of species-level units (previous work by the Konstantinidis Lab) and intra-species units (this thesis, Chapter 1) highlighted the need to answer the question: what drives these sequence-discrete units? Moreover, the mechanisms that maintain genomic coherence within microbial populations, including ecological interactions and horizontal gene transfer mediated by homologous and non-homologous recombination, are pivotal for species stability and adaptive potential. These mechanisms challenge traditional models of microbial speciation based on asexual (clonal) reproduction, emphasizing the synergy between ecological cohesion and genetic exchange in shaping microbial diversity across diverse habitats. In Chapter 3, by analyzing closely related isolate genomes from the same or related samples with a novel methodology for identifying recent recombination events, we show that high ecological cohesiveness coupled with frequent-enough and unbiased (i.e., not selection-driven) horizontal gene flow, mediated by homologous recombination, often underlies the species and intra-species units. Ecological cohesiveness was inferred from the higher similarity in temporal abundance patterns among genomes of the same unit than among genomes of different units, while recombination was shown to have at least twice the impact of point (diversifying) mutation on sequence evolution. Therefore, our results represent a departure from previous models of microbial speciation that invoke either ecology or selection-driven recombination, but not their synergistic effect, and provide a mechanistic explanation of how members of species and intra-species units cohere.
Although the work described in Chapter 3 elucidates the mechanisms of species cohesion, it is important to realize that microbial species are not constant but undergo substantial gene-content gain and loss (or fluidity). For instance, environmental perturbations, such as changes in salinity or light intensity, can drive adaptive shifts within microbial populations, influencing the dynamics of core and accessory genome components. The maintenance of genomic diversity within populations, despite selective pressures, highlights the role of adaptive pangenomes in microbial ecology. Understanding these dynamics is essential for predicting microbial responses to environmental change and for harnessing microbial diversity in biotechnological applications. Toward addressing this knowledge gap, the work under Chapter 2 showed that the pangenome of Salinibacter ruber isolates, obtained over the course of one month from a single saltern, is open and similar in size to that of randomly sampled Escherichia coli genomes isolated over many years by various labs across the globe [the pangenome is defined as the total non-redundant genes of all members of a species or group of genomes]. While most of the accessory (non-core) genes of Sal. ruber were isolate-specific and showed low in situ abundances in the metagenomes compared to the core genes, indicating that they were functionally unimportant and/or transient, 3.5% of them became abundant when salinity (but not light) conditions in the salterns changed and encoded functions related to osmoregulation. Nonetheless, the ecological advantage of these genes, while significant, was apparently not strong enough to purge diversity within the population. Collectively, these results provide an explanation for how this immense intra-species gene diversity is maintained and quantify what fraction of the pangenome may be ecologically important during transitions in environmental conditions.
In marine environments, microbial communities exhibit depth-stratified diversity patterns, with distinct genomic adaptations correlating with environmental gradients. Metagenomic analyses of oceanic samples reveal site-specific genomic signatures and functional adaptations, shedding light on the genomic basis of microbial niche specialization and biogeochemical cycling in the marine realm. However, it remains challenging to determine from short-read data whether the populations retrieved from separate samples or locations are identical in terms of sequence diversity and gene content. To answer this question, the work described in Chapter 5 employed the approaches developed in other chapters of this thesis as well as a new approach to define which reads recruited from a metagenome belong to a target population, and introduced the ANIr concept (average nucleotide identity of mapped reads). By applying this new approach to the samples from the Gulf of Mexico (GoM) described in Chapter 4, this work showed that most populations had high ANIr (i.e., were identical) in only one or a few samples of an ocean basin at similar depths. Between samples where a closely related population was detected (e.g., the same 95% ANI-based genomospecies), ANIr decreased and gene-content differences increased, and both trends correlated with the distance (horizontal or vertical) between the samples. Accordingly, only a few truly cosmopolitan populations in the world's oceans were identified. Interestingly, a few of these cosmopolitan populations, with closest matches to Alteromonas macleodii (97% average amino acid identity, AAI), Prochlorococcus marinus (79% AAI), and Desulfuromonas soudanensis (40% AAI), showed high relative abundance in samples from both the surface (0-200 m) and the deep (>1000 m) ocean.
These data suggest that ubiquitous marine taxa may show significant endemic adaptation as they disperse, indicative of local population divergence and speciation, and provide a highly needed methodology to identify and track such populations.
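To make the ANIr idea more concrete, the sketch below computes a length-weighted average identity over the reads that are taken to belong to a target population. It is only an illustration under stated assumptions: the abstract does not describe how the thesis decides which recruited reads belong to the population, so the fixed `min_identity` cutoff, the length weighting, and the function name `anir` are hypothetical stand-ins rather than the method of Chapter 5.

```python
from typing import Iterable, Optional, Tuple


def anir(alignments: Iterable[Tuple[float, int]],
         min_identity: float = 95.0) -> Optional[float]:
    """Length-weighted average identity (percent) of reads mapped to a reference.

    alignments: (percent identity, aligned length in bp) for each mapped read.
    min_identity: hypothetical fixed cutoff standing in for the thesis's
    read-assignment step, which the abstract does not specify.
    Returns None when no read passes the cutoff.
    """
    weighted_identity = 0.0
    total_length = 0
    for identity, length in alignments:
        if length > 0 and identity >= min_identity:
            weighted_identity += identity * length
            total_length += length
    if total_length == 0:
        return None
    return weighted_identity / total_length
```

Under this sketch, a sample whose reads all match the reference almost perfectly yields an ANIr close to 100%, whereas a sample carrying a related but divergent population yields a lower value.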
Finally, detecting and monitoring target genes, such as nitrogen-cycling and antimicrobial resistance genes (ARGs), in microbial populations to assess the relative importance of different functions poses significant challenges due to sequence similarity with non-target genes and assembly artifacts. Novel computational tools, such as ROCker models, enhance the accuracy of ARG (or other target gene) detection from short-read sequences, providing robust frameworks for surveillance and management of antimicrobial resistance in environmental and clinical settings. However, the ROCker tool developed previously by the Konstantinidis Lab has a few shortcomings when dealing with the larger datasets that have recently become available. Most notably, it relies on a couple of old bioinformatics libraries that are no longer supported and is not friendly to the non-expert user. The work under Chapter 6 describes a new version of ROCker that alleviates these shortcomings, as well as the development of new ROCker models for families of ARGs that were not previously covered by a model. Specifically, novel ROCker models for macrolide resistance genes that target the broad functional classes of mcr, mph, erm, and lnu genes, as well as models targeting specific clades containing the mcr-1, mphA, ermB, lnuF, lnuB, and lnuG genes, were developed and validated with simulated reads spanning a range of common read lengths (100, 150, 250, and 300 base pairs). Subsequently, these simulated reads were used to challenge the filtering efficacy of ROCker vs. common static filtering approaches. ROCker models generally had improved F1 scores [2 × precision × recall / (precision + recall)] and consistently showed lower false-positive rates (FPR) and false-negative rates (FNR) compared to alternative methods such as similarity searches using BLASTx with various e-value thresholds or hidden Markov models.
The ROCker models and all related reference material and data are freely available through http://enve-omics.ce.gatech.edu/rocker/models. These new ROCker models further expand the available model collection developed previously for other genes, including β-lactamases, and their application to short-read metagenomes, metatranscriptomes, and PCR amplicon data should facilitate the reliable detection and quantification of antimicrobial resistance genes.
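For readers less familiar with the evaluation metrics named above, the following minimal sketch computes the F1 score, FPR, and FNR from the counts of a read-filtering benchmark. The function name `filter_metrics` and its dictionary output are assumptions of this example and are not part of ROCker; only the formulas are standard.

```python
from typing import Dict


def filter_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute F1, false-positive rate, and false-negative rate.

    tp/fp/tn/fn are counts of reads classified as true positives, false
    positives, true negatives, and false negatives. Ratios with a zero
    denominator are reported as 0.0 (a convention chosen for this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "fpr": fpr, "fnr": fnr}
```

Because simulated reads come with known origins, each read can be placed in one of the four count categories, which is what makes this kind of comparison between filtering approaches possible.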
Therefore, this thesis integrated genomic and metagenomic approaches to elucidate microbial diversity, adaptation mechanisms, and speciation dynamics across various ecosystems. By exploring these interconnected themes, this thesis advanced our understanding of microbial community ecology, informed the microbial species definition and surveillance strategies, and contributed to the sustainable management of microbial resources and their health implications.
Date
2024-07-22
Resource Type
Text
Resource Subtype
Dissertation