Title:
Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training

dc.contributor.author Ter-Hovhannisyan, Vardges en_US
dc.contributor.author Lomsadze, Alexandre en_US
dc.contributor.author Chernoff, Yury O. en_US
dc.contributor.author Borodovsky, Mark en_US
dc.contributor.corporatename Georgia Institute of Technology. School of Biology en_US
dc.contributor.corporatename Georgia Institute of Technology. Dept. of Biomedical Engineering en_US
dc.contributor.corporatename Emory University. Dept. of Biomedical Engineering en_US
dc.contributor.corporatename Georgia Institute of Technology. College of Computing en_US
dc.date.accessioned 2013-10-03T20:24:02Z
dc.date.available 2013-10-03T20:24:02Z
dc.date.issued 2008-12
dc.description ©2008 by Cold Spring Harbor Laboratory en_US
dc.description DOI: 10.1101/gr.081612.108 en_US
dc.description.abstract We describe a new ab initio algorithm, GeneMark-ES version 2, that identifies protein-coding genes in fungal genomes. The algorithm does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM). Instead, the anonymous genomic sequence in question is used as an input for iterative unsupervised training. The algorithm extends our previously developed method tested on genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. To better reflect features of fungal gene organization, we enhanced the intron submodel to accommodate sequences with and without branch point sites. This design enables the algorithm to work equally well for species with the kinds of variations in splicing mechanisms seen in the fungal phyla Ascomycota, Basidiomycota, and Zygomycota. Upon self-training, the intron submodel switches on in several steps to reach its full complexity. We demonstrate that the algorithm accuracy, both at the exon and the whole gene level, is favorably compared to the accuracy of gene finders that employ supervised training. Application of the new method to known fungal genomes indicates substantial improvement over existing annotations. By eliminating the effort necessary to build comprehensive training sets, the new algorithm can streamline and accelerate the process of annotation in a large number of fungal genome sequencing projects. en_US
dc.identifier.citation Vardges Ter-Hovhannisyan, Alexandre Lomsadze, Yury O. Chernoff, and Mark Borodovsky, "Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training," Genome Research, (December 2008) 18:1979–1990. en_US
dc.identifier.doi 10.1101/gr.081612.108
dc.identifier.issn 1088-9051
dc.identifier.uri http://hdl.handle.net/1853/49178
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.publisher.original Cold Spring Harbor Laboratory Press en_US
dc.subject Gene prediction en_US
dc.subject Ab initio algorithm en_US
dc.subject Fungal genomes en_US
dc.subject Enhanced intron submodel en_US
dc.title Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training en_US
dc.type Text
dc.type.genre Article
dspace.entity.type Publication
local.contributor.author Chernoff, Yury O.
local.contributor.author Borodovsky, Mark
local.contributor.corporatename College of Sciences
local.contributor.corporatename School of Biological Sciences
relation.isAuthorOfPublication d9f3d192-f4c7-4db2-ace4-2baadbeb98b6
relation.isAuthorOfPublication fa975b84-f807-4cec-93a6-9df633afb791
relation.isOrgUnitOfPublication 85042be6-2d68-4e07-b384-e1f908fae48a
relation.isOrgUnitOfPublication c8b3bd08-9989-40d3-afe3-e0ad8d5c72b5
Files
Original bundle
Name: 1979.pdf
Size: 467.81 KB
Format: Adobe Portable Document Format