Title:
High precision multi-genome scale reannotation of enzyme function by EFICAz

dc.contributor.author Arakaki, Adrian K.
dc.contributor.author Tian, Weidong
dc.contributor.author Skolnick, Jeffrey
dc.contributor.corporatename Georgia Institute of Technology. Center for the Study of Systems Biology
dc.contributor.corporatename Harvard Medical School. Dept. of Biological Chemistry and Molecular Pharmacology
dc.date.accessioned 2009-01-28T18:52:30Z
dc.date.available 2009-01-28T18:52:30Z
dc.date.issued 2006-12-13
dc.description ©2006 Arakaki et al; licensee BioMed Central Ltd. en
dc.description The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/7/315
dc.description doi:10.1186/1471-2164-7-315
dc.description.abstract Background: The functional annotation of most genes in newly sequenced genomes is inferred from similarity to previously characterized sequences, an annotation strategy that often leads to erroneous assignments. We have performed a reannotation of 245 genomes using an updated version of EFICAz, a highly precise method for enzyme function prediction. Results: Based on our three-field EC number predictions, we have obtained lower-bound estimates for the average enzyme content in Archaea (29%), Bacteria (30%) and Eukarya (18%). Most annotations added in KEGG from 2005 to 2006 agree with EFICAz predictions made in 2005. The coverage of EFICAz predictions is significantly higher than that of KEGG, especially for eukaryotes. Thousands of our novel predictions correspond to hypothetical proteins. We have identified a subset of 64 hypothetical proteins with low sequence identity to EFICAz training enzymes whose biochemical functions have recently been characterized, and found that in 96% (84%) of the cases we correctly identified their three-field (four-field) EC numbers. For two of the 64 hypothetical proteins, PA1167 from Pseudomonas aeruginosa, an alginate lyase (EC 4.2.2.3), and Rv1700 of Mycobacterium tuberculosis H37Rv, an ADP-ribose diphosphatase (EC 3.6.1.13), we have detected an annotation lag of more than two years in databases. Two examples are presented where EFICAz predictions act as hypothesis generators for understanding the functional roles of hypothetical proteins: FLJ11151, a human protein overexpressed in cancer that EFICAz identifies as an endopolyphosphatase (EC 3.6.1.10), and MW0119, a protein of Staphylococcus aureus strain MW2 that we propose as a candidate virulence factor based on its EFICAz-predicted activity, sphingomyelin phosphodiesterase (EC 3.1.4.12). Conclusion: Our results suggest that we have generated enzyme function annotations of high precision and recall. These predictions can be mined and correlated with other information sources to generate biologically significant hypotheses and can be useful for comparative genome analysis and automated metabolic pathway reconstruction. en
dc.identifier.citation BMC Genomics 2006, 7:315 en
dc.identifier.issn 1471-2164
dc.identifier.uri http://hdl.handle.net/1853/26731
dc.language.iso en_US en
dc.publisher Georgia Institute of Technology en
dc.publisher.original BioMed Central
dc.subject EFICAz en
dc.subject Genome sequencing en
dc.subject Enzyme function prediction en
dc.subject Enzyme function annotation en
dc.subject Enzyme Function Inference by Combined Approach
dc.title High precision multi-genome scale reannotation of enzyme function by EFICAz en
dc.type Text
dc.type.genre Article
dspace.entity.type Publication
local.contributor.author Skolnick, Jeffrey
local.contributor.corporatename College of Sciences
local.contributor.corporatename School of Biological Sciences
local.contributor.corporatename Center for the Study of Systems Biology
relation.isAuthorOfPublication 80f29357-f18b-4635-abd1-628d627d301d
relation.isOrgUnitOfPublication 85042be6-2d68-4e07-b384-e1f908fae48a
relation.isOrgUnitOfPublication c8b3bd08-9989-40d3-afe3-e0ad8d5c72b5
relation.isOrgUnitOfPublication d3d635bd-b38e-4ef6-a2d0-0875b9a83e34
Files
Original bundle
Name:
268.pdf
Size:
463.91 KB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.8 KB
Format:
Item-specific license agreed to upon submission