Title:
Low Richness -- Set 2

dc.contributor.author Konstantinidis, Kostas T.
dc.contributor.author Rodriguez, Luis M.
dc.contributor.corporatename Georgia Institute of Technology. School of Civil and Environmental Engineering en_US
dc.contributor.corporatename Georgia Institute of Technology. School of Biology en_US
dc.coverage.temporal June 2012 - July 2012
dc.date.accessioned 2014-02-19T13:58:12Z
dc.date.available 2014-02-19T13:58:12Z
dc.date.issued 2014-02-03
dc.description The files provide simulated metagenomic datasets, generated in silico from complete bacterial and archaeal genomes. The read length and frequency of errors are based on Illumina technology. The objective of these simulated datasets was to evaluate the performance of Nonpareil, an algorithm and implementation designed to estimate the average coverage of metagenomic datasets. Nonpareil is described in the following publication, from which the abstract in this record is taken: Rodriguez-R LM, Konstantinidis KT. (2013). Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics btt584. doi: 10.1093/bioinformatics/btt584. Nonpareil, which was tested using these data, can be found at https://github.com/lmrodriguezr/nonpareil/ under the Artistic License 2.0. en_US
dc.description These data are part of a larger collection of 13 datasets, which were produced using only 282 genomes from the species Escherichia coli, Yersinia pestis, Helicobacter pylori, and Staphylococcus aureus, to simulate environments with low species richness and phylogenetic diversity, but high intra-species diversity. To recreate the full collection, please also see the additional tiers of files, Low Richness -- Set 1. Each tier contains one "README.txt" file in raw text format, as well as paired files with the same prefix. Files ending with ".fa.gz" are the sequences of the simulated dataset in gzipped FastA format, and files ending with ".genomes" are the tables of abundance per molecule. All files are packaged in a single compressed archive and may need to be extracted before they can be used. en_US
dc.description.abstract Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to these limitations, central ecological questions with respect to the global distribution of microbes and the functional diversity of their communities cannot be robustly assessed. Results: We introduce Nonpareil, a method to estimate and project coverage in metagenomes. Nonpareil does not rely on high-quality assemblies, OTU calling, or comprehensive reference databases; thus, it is broadly applicable to metagenomic studies. Application of Nonpareil on available metagenomic datasets provided estimates on the relative complexity of soil, freshwater and human microbiome communities, and suggested that about 200Gb of sequencing data are required for 95% abundance-weighted average coverage of the soil communities analyzed. en_US
dc.description.sponsorship United States. Department of Energy en_US
dc.description.sponsorship National Science Foundation (U.S.) en_US
dc.embargo.terms null en_US
dc.identifier.citation Rodriguez-R LM, Konstantinidis KT. (2013). Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics btt584. doi: 10.1093/bioinformatics/btt584 en_US
dc.identifier.uri http://hdl.handle.net/1853/51143
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.relation.ispartofseries Metagenomic Datasets Simulated In Silico for the Evaluation of Nonpareil
dc.relation.uri http://hdl.handle.net/1853/50838
dc.rights These data were collected from the National Center for Biotechnology Information (NCBI) database GenBank, which was designed to provide and encourage access within the scientific community to sources of current and comprehensive information. NCBI and Georgia Tech place no restrictions on the use or distribution of the data contained in this collection. However, some of the original data may be subject to patent, copyright, or other intellectual property rights. Neither NCBI nor Georgia Tech is in a position to assess the validity of such claims, and since there is no transfer of rights from submitters to NCBI, NCBI has no rights to transfer to a third party. For more information on NCBI's copyright disclaimer, please see: http://www.ncbi.nlm.nih.gov/About/disclaimer.html
dc.subject Environmental and clinical microbiology en_US
dc.subject Bioinformatics applications en_US
dc.subject Sequence analysis en_US
dc.subject Metagenomics en_US
dc.subject Operational taxonomic units en_US
dc.subject Nonpareil en_US
dc.title Low Richness -- Set 2 en_US
dc.type Dataset en_US
dspace.entity.type Publication
local.contributor.author Konstantinidis, Kostas T.
local.contributor.corporatename Environmental Microbial Genomics Laboratory
local.contributor.corporatename School of Civil and Environmental Engineering
local.contributor.corporatename College of Engineering
local.relation.ispartofseries Metagenomic Datasets Simulated In Silico for the Evaluation of Nonpareil
relation.isAuthorOfPublication f66cc347-a0bd-44a1-ac96-d4f61b26368a
relation.isOrgUnitOfPublication d5ae838e-a56b-420d-95d3-ec6584c98d2c
relation.isOrgUnitOfPublication 88639fad-d3ae-4867-9e7a-7c9e6d2ecc7c
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
relation.isSeriesOfPublication bfa5b4bb-4787-48ff-b7cf-4b56a9282aa8
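The file layout given in the description (paired files sharing a prefix, with ".fa.gz" for sequences and ".genomes" for abundance tables) can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the function names are hypothetical, the directory layout inside the archive is an assumption, and the ".genomes" tables are not parsed because their column format is not stated in this record.

```python
import gzip
import os
import tarfile
from collections import defaultdict

FASTA_SUFFIX = ".fa.gz"    # simulated reads, gzipped FastA (per the description)
TABLE_SUFFIX = ".genomes"  # abundance-per-molecule table (format not assumed here)


def pair_by_prefix(names):
    """Group file names by shared prefix: {prefix: {"fasta": ..., "table": ...}}."""
    pairs = defaultdict(dict)
    for name in names:
        base = os.path.basename(name)
        if base.endswith(FASTA_SUFFIX):
            pairs[base[:-len(FASTA_SUFFIX)]]["fasta"] = name
        elif base.endswith(TABLE_SUFFIX):
            pairs[base[:-len(TABLE_SUFFIX)]]["table"] = name
    return dict(pairs)


def count_fasta_records(path):
    """Count sequences in a gzipped FastA file by counting '>' header lines."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def list_archive(archive_path):
    """Return the member names of a .tar.gz archive without extracting it."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

For example, `pair_by_prefix(list_archive("lr_2.tar.gz"))` would list which prefixes have both a sequence file and an abundance table, and `count_fasta_records` can then be run on each extracted ".fa.gz" file. A prefix with only one of the two entries suggests an incomplete download or extraction.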
Files
Original bundle
Name:
lr_2.tar.gz
Size:
4.54 GB
Format:
Unknown data format
Description:
License bundle
Name:
license.txt
Size:
3.13 KB
Format:
Item-specific license agreed upon to submission
Description: