Organizational Unit:
College of Sciences

Publication Search Results

  • Item
    EFICAz²: enzyme function inference by a combined approach enhanced by machine learning
    (Georgia Institute of Technology, 2009-04-13) Arakaki, Adrian K. ; Huang, Ying ; Skolnick, Jeffrey
    Background: We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision performance, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment. Results: We have developed two new EFICAz components, analogs to the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz², exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. A comparative analysis of enzyme function annotation of the human proteome by EFICAz² and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz² generates considerably more unique assignments than KEGG. Conclusion: Performance benchmarks and the comparison with KEGG demonstrate that EFICAz² is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz² web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html
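The precision and recall measures referred to in the abstract above can be illustrated with a minimal sketch. It assumes predictions and reference annotations are available as sets of (protein, EC number) pairs; the function name, protein identifiers, and toy data are hypothetical and are not taken from the EFICAz² benchmark.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two sets of (protein_id, ec_number) pairs."""
    # A true positive is a pair that appears in both the predictions and the reference.
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
predicted = {("P1", "3.6.1.13"), ("P2", "4.2.2.3"), ("P3", "1.1.1.1")}
reference = {("P1", "3.6.1.13"), ("P2", "4.2.2.3"),
             ("P4", "3.1.4.12"), ("P5", "2.7.1.1")}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of three predictions are correct and two of four reference annotations are recovered, so precision is higher than recall.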
  • Item
    Identification of metabolites with anticancer properties by computational metabolomics
    (Georgia Institute of Technology, 2008-06-17) Arakaki, Adrian K. ; Mezencev, Roman ; Bowen, Nathan J. ; Huang, Ying ; McDonald, John F. ; Skolnick, Jeffrey
    Background: Certain endogenous metabolites can influence the rate of cancer cell growth. For example, diacylglycerol, ceramides and sphingosine, NAD+ and arginine exert this effect by acting as signaling molecules, while carrying out other important cellular functions. Metabolites can also be involved in the control of cell proliferation by directly regulating gene expression in ways that are signaling pathway-independent, e.g. by direct activation of transcription factors or by inducing epigenetic processes. The fact that metabolites can affect the cancer process on so many levels suggests that the change in concentration of some metabolites that occurs in cancer cells could have an active role in the progress of the disease. Results: CoMet, a fully automated Computational Metabolomics method to predict changes in metabolite levels in cancer cells compared to normal references, has been developed and applied to Jurkat T leukemia cells with the goal of testing the following hypothesis: up- or down-regulation in cancer cells of the expression of genes encoding metabolic enzymes leads to changes in intracellular metabolite concentrations that contribute to disease progression. All nine metabolites predicted to be lowered in Jurkat cells with respect to lymphoblasts that were examined (riboflavin, tryptamine, 3-sulfino-L-alanine, menaquinone, dehydroepiandrosterone, α-hydroxystearic acid, hydroxyacetone, seleno-L-methionine and 5,6-dimethylbenzimidazole) exhibited antiproliferative activity that has not been reported before, while only two (bilirubin and androsterone) of the eleven tested metabolites predicted to be increased or unchanged in Jurkat cells displayed significant antiproliferative activity. Conclusion: These results: a) demonstrate that CoMet is a valuable method to identify potential compounds for experimental validation, b) indicate that cancer cell metabolism may be regulated to reduce the intracellular concentration of certain antiproliferative metabolites, leading to uninhibited cellular growth and c) suggest that many other endogenous metabolites with important roles in carcinogenesis are awaiting discovery.
  • Item
    The Mosaic Genome of Anaeromyxobacter dehalogenans Strain 2CP-C Suggests an Aerobic Common Ancestor to the Delta-Proteobacteria
    (Georgia Institute of Technology, 2008-05-07) Thomas, Sara H. ; Wagner, Ryan D. ; Arakaki, Adrian K. ; Skolnick, Jeffrey ; Kirby, John R. ; Shimkets, Lawrence J. ; Sanford, Robert A. ; Löffler, Frank E.
    Anaeromyxobacter dehalogenans strain 2CP-C is a versaphilic delta-Proteobacterium distributed throughout many diverse soil and sediment environments. 16S rRNA gene phylogenetic analysis groups A. dehalogenans together with the myxobacteria, which have distinguishing characteristics including strictly aerobic metabolism, sporulation, fruiting body formation, and surface motility. Analysis of the 5.01 Mb strain 2CP-C genome substantiated that this organism is a myxobacterium but shares genotypic traits with the anaerobic majority of the delta-Proteobacteria (i.e., the Desulfuromonadales). Reflective of its respiratory versatility, strain 2CP-C possesses 68 genes coding for putative c-type cytochromes, including one gene with 40 heme binding motifs. Consistent with its relatedness to the myxobacteria, surface motility was observed in strain 2CP-C and multiple types of motility genes are present, including 28 genes for gliding, adventurous (A-) motility and 17 genes for type IV pilus-based motility (i.e., social (S-) motility) that all have homologs in Myxococcus xanthus. Although A. dehalogenans shares many metabolic traits with the anaerobic majority of the delta-Proteobacteria, strain 2CP-C grows under microaerophilic conditions and possesses detoxification systems for reactive oxygen species. Accordingly, two gene clusters coding for NADH dehydrogenase subunits and two cytochrome oxidase gene clusters in strain 2CP-C are similar to those in M. xanthus. Remarkably, strain 2CP-C possesses a third NADH dehydrogenase gene cluster and a cytochrome cbb3 oxidase gene cluster, apparently acquired through ancient horizontal gene transfer from a strictly anaerobic green sulfur bacterium. The mosaic nature of the A. dehalogenans strain 2CP-C genome suggests that the metabolically versatile, anaerobic members of the delta-Proteobacteria may have descended from aerobic ancestors with complex lifestyles.
  • Item
    High precision multi-genome scale reannotation of enzyme function by EFICAz
    (Georgia Institute of Technology, 2006-12-13) Arakaki, Adrian K. ; Tian, Weidong ; Skolnick, Jeffrey
    Background: The functional annotation of most genes in newly sequenced genomes is inferred from similarity to previously characterized sequences, an annotation strategy that often leads to erroneous assignments. We have performed a reannotation of 245 genomes using an updated version of EFICAz, a highly precise method for enzyme function prediction. Results: Based on our three-field EC number predictions, we have obtained lower-bound estimates for the average enzyme content in Archaea (29%), Bacteria (30%) and Eukarya (18%). Most annotations added in KEGG from 2005 to 2006 agree with EFICAz predictions made in 2005. The coverage of EFICAz predictions is significantly higher than that of KEGG, especially for eukaryotes. Thousands of our novel predictions correspond to hypothetical proteins. We have identified a subset of 64 hypothetical proteins with low sequence identity to EFICAz training enzymes, whose biochemical functions have been recently characterized, and find that in 96% (84%) of the cases we correctly identified their three-field (four-field) EC numbers. For two of the 64 hypothetical proteins, PA1167 from Pseudomonas aeruginosa, an alginate lyase (EC 4.2.2.3), and Rv1700 of Mycobacterium tuberculosis H37Rv, an ADP-ribose diphosphatase (EC 3.6.1.13), we have detected an annotation lag of more than two years in databases. Two examples are presented where EFICAz predictions act as hypothesis generators for understanding the functional roles of hypothetical proteins: FLJ11151, a human protein overexpressed in cancer that EFICAz identifies as an endopolyphosphatase (EC 3.6.1.10), and MW0119, a protein of Staphylococcus aureus strain MW2 that we propose as a candidate virulence factor based on its EFICAz predicted activity, sphingomyelin phosphodiesterase (EC 3.1.4.12). Conclusion: Our results suggest that we have generated enzyme function annotations of high precision and recall. These predictions can be mined and correlated with other information sources to generate biologically significant hypotheses and can be useful for comparative genome analysis and automated metabolic pathway reconstruction.
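The distinction between three-field and four-field EC numbers drawn in the abstract above can be sketched as a prefix comparison. This is only an illustration under the assumption that EC numbers are written as four dot-separated fields; the helper names `ec_prefix` and `ec_agree` are hypothetical and are not part of EFICAz.

```python
def ec_prefix(ec_number, fields):
    """Return the first `fields` dot-separated fields of a four-field EC number."""
    parts = ec_number.split(".")
    if len(parts) != 4 or not 1 <= fields <= 4:
        raise ValueError(f"cannot take {fields} fields of EC number {ec_number!r}")
    return ".".join(parts[:fields])


def ec_agree(predicted, reference, fields):
    """True if two EC numbers are identical in their first `fields` fields."""
    return ec_prefix(predicted, fields) == ec_prefix(reference, fields)


# Both EC numbers appear in the abstract; pairing them here is purely illustrative.
print(ec_prefix("4.2.2.3", 3))                # 4.2.2
print(ec_agree("3.6.1.13", "3.6.1.10", 3))    # True: same three-field class
print(ec_agree("3.6.1.13", "3.6.1.10", 4))    # False: fourth field differs
```

A prediction can therefore count as correct at three-field resolution while being wrong at four-field resolution, which is why the two success rates are reported separately.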
  • Item
    EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference
    (Georgia Institute of Technology, 2004-12-01) Tian, Weidong ; Arakaki, Adrian K. ; Skolnick, Jeffrey
    EFICAz (Enzyme Function Inference by Combined Approach) is an automatic engine for large-scale enzyme function inference that combines predictions from four different methods developed and optimized to achieve high prediction accuracy: (i) recognition of functionally discriminating residues (FDRs) in enzyme families obtained by a Conservation controlled HMM Iterative procedure for Enzyme Family classification (CHIEFc), (ii) pairwise sequence comparison using a family specific Sequence Identity Threshold, (iii) recognition of FDRs in Multiple Pfam enzyme families, and (iv) recognition of multiple Prosite patterns of high specificity. For FDR (i.e. conserved positions in an enzyme family that discriminate between true and false members of the family) identification, we have developed an Evolutionary Footprinting method that uses evolutionary information from homofunctional and heterofunctional multiple sequence alignments associated with an enzyme family. The FDRs show a significant correlation with annotated active site residues. In a jackknife test, EFICAz shows high accuracy (92%) and sensitivity (82%) for predicting four EC digits in testing sequences that are <40% identical to any member of the corresponding training set. Applied to the Escherichia coli genome, EFICAz assigns more detailed enzymatic function than KEGG, and generates numerous novel predictions.
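The sequence identity criterion used to select testing sequences in the jackknife test can be illustrated with a minimal sketch. It assumes the test sequence has already been aligned pairwise to each training sequence, and it defines identity as matches over columns where neither sequence has a gap; other definitions exist, and the abstract does not say which one EFICAz uses. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def max_identity_to_training(aligned_pairs):
    """Highest identity between one test sequence and any training sequence."""
    return max((percent_identity(q, t) for q, t in aligned_pairs), default=0.0)


# Hypothetical pre-aligned (test, training) pairs, for illustration only.
pairs = [("MKT-AYI", "MRTQAYL"), ("MKTAYI-", "LRSAWVG")]
highest = max_identity_to_training(pairs)
print(f"max identity to training set: {highest:.1f}%")
print("below 40% threshold:", highest < 40.0)
```

Taking the maximum over all training sequences matters because a single close homolog in the training set is enough to make a test case easy.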