Organizational Unit:
Center for the Study of Systems Biology


Publication Search Results

Now showing 1 - 10 of 46
  • Item
    A Threading-Based Method for the Prediction of DNA-Binding Proteins with Application to the Human Genome
    (Georgia Institute of Technology, 2009-11-13) Gao, Mu ; Skolnick, Jeffrey
    Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict the DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein’s sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity is less than 30% to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than the standard sequence comparison method PSI-BLAST and is comparable to DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone Root Mean Square Deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that ~30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that 20% of classic zinc finger domains play a functional role not related to direct DNA-binding.
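    The sensitivity/precision pair quoted in this abstract can be illustrated with a minimal sketch. The function name and the toy counts below are hypothetical and are not taken from the paper's benchmark; the sketch assumes the usual definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP).

```python
def sensitivity_precision(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (sensitivity, precision) from confusion-matrix counts.

    tp: binders correctly predicted as binders
    fp: non-binders wrongly predicted as binders
    fn: binders that were missed
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one actual and one predicted positive")
    sensitivity = tp / (tp + fn)  # fraction of true binders recovered
    precision = tp / (tp + fp)    # fraction of predictions that are correct
    return sensitivity, precision


# Toy counts for illustration only (not benchmark data).
sens, prec = sensitivity_precision(tp=8, fp=2, fn=2)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

    With a small positive class and a much larger negative class, as in the benchmark described, precision is the more telling of the two numbers because even a low false-positive rate can produce many false positives.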
  • Item
    FINDSITE^LHM: A Threading-Based Approach to Ligand Homology Modeling
    (Georgia Institute of Technology, 2009-06-05) Brylinski, Michal ; Skolnick, Jeffrey
    Ligand virtual screening is a widely used tool to assist in new pharmaceutical discovery. In practice, virtual screening approaches have a number of limitations, and the development of new methodologies is required. Previously, we showed that remotely related proteins identified by threading often share a common binding site occupied by chemically similar ligands. Here, we demonstrate that across an evolutionarily related, but distant family of proteins, the ligands that bind to the common binding site contain a set of strongly conserved anchor functional groups as well as a variable region that accounts for their binding specificity. Furthermore, the sequence and structure conservation of residues contacting the anchor functional groups is significantly higher than those contacting ligand variable regions. Exploiting these insights, we developed FINDSITE^LHM, which employs structural information extracted from weakly related proteins to perform rapid ligand docking by homology modeling. In large-scale benchmarking, using the predicted anchor-binding mode and the crystal structure of the receptor, FINDSITE^LHM outperforms classical docking approaches with an average ligand RMSD from native of ~2.5 Å. For weakly homologous receptor protein models, using FINDSITE^LHM, the fraction of recovered binding residues and specific contacts is 0.66 (0.55) and 0.49 (0.38) for highly confident (all) targets, respectively. Finally, in virtual screening for HIV-1 protease inhibitors, using similarity to the ligand anchor region yields significantly improved enrichment factors. Thus, the rather accurate, computationally inexpensive FINDSITE^LHM algorithm should be a useful approach to assist in the discovery of novel biopharmaceuticals.
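    The abstract does not define its enrichment factor. The sketch below assumes the common virtual-screening definition: the hit rate among the top-ranked fraction of a library divided by the hit rate of the whole library. The function name, the rounding rule for the cutoff, and the toy ranking are all hypothetical.

```python
def enrichment_factor(ranked_is_active: list[bool], top_fraction: float) -> float:
    """Enrichment factor for a ranked compound list (best-scored first).

    ranked_is_active[i] is True when the i-th ranked compound is a known active.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("library must be non-empty and contain an active")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    # Assumed convention: round to the nearest compound, at least one.
    n_top = max(1, round(n_total * top_fraction))
    hits_in_top = sum(ranked_is_active[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy ranking for illustration only: 3 actives in a 10-compound library.
ranking = [True, True, False, True, False, False, False, False, False, False]
print(enrichment_factor(ranking, top_fraction=0.2))
```

    A value of 1 corresponds to random selection; values above 1 mean the ranking concentrates actives near the top.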
  • Item
    EFICAz²: enzyme function inference by a combined approach enhanced by machine learning
    (Georgia Institute of Technology, 2009-04-13) Arakaki, Adrian K. ; Huang, Ying ; Skolnick, Jeffrey
    Background: We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision performance, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment. Results: We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz², exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. A comparative analysis of enzyme function annotation of the human proteome by EFICAz² and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz² generates considerably more unique assignments than KEGG. Conclusion: Performance benchmarks and the comparison with KEGG demonstrate that EFICAz² is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz² web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html
  • Item
    From Nonspecific DNA–Protein Encounter Complexes to the Prediction of DNA–Protein Interactions
    (Georgia Institute of Technology, 2009-04-03) Gao, Mu ; Skolnick, Jeffrey
    DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing the protein's specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Cα deviation from their native structures is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in search of its specific DNA targets by a DNA-binding protein.
  • Item
    Protein structure prediction by pro-Sp3-TASSER
    (Georgia Institute of Technology, 2009-03) Zhou, Hongyi ; Skolnick, Jeffrey
    An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP3. Top templates identified by each scoring function are combined to derive contact and distance restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contact and distance restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those templates with good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models whose TM-score to native >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively. A server that implements the above algorithm is available.
  • Item
    Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score
    (Georgia Institute of Technology, 2008-12-12) Pandit, Shashi Bhushan ; Skolnick, Jeffrey
    Background: Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed that was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true. Results: To further improve structure alignment quality, the Fr-TM-align algorithm has been developed where aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. For the assessment of the structural alignment quality from Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better in comparison to both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) in comparison to those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align. Conclusion: Fr-TM-align, a new algorithm that employs fragment alignment and assembly, provides better structural alignments in comparison to TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/.
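    The abstract names the TM-score without giving its formula. As a rough sketch, the standard published definition is TM = (1/L) Σ 1 / (1 + (d_i/d0)²) over aligned residue pairs, with d0 = 1.24 (L − 15)^(1/3) − 1.8 and L the target length. The code below evaluates that sum for one fixed superposition only; the full TM-score takes the maximum over superpositions, which is not attempted here. The function names and the 0.5 Å floor on d0 for short chains are assumptions of this sketch.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in Angstroms.

    The 0.5 floor for short chains is a guard assumed for this sketch.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances: list[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    aligned_distances: distance in Angstroms between each aligned residue pair.
    target_length: number of residues in the target (normalizing) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(aligned_distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


# Toy input: half of a 100-residue target aligned with zero deviation.
print(tm_score([0.0] * 50, target_length=100))
```

    Because unaligned residues contribute nothing to the sum while the normalization stays at the full target length, the score penalizes low coverage as well as large deviations, which is the combined behavior the abstract describes.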
  • Item
    Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints
    (Georgia Institute of Technology, 2008-08) Lee, Seung Yup ; Skolnick, Jeffrey
    To improve tertiary structure predictions of more difficult targets, the next generation of TASSER, TASSER_2.0, has been developed. TASSER_2.0 incorporates more accurate side-chain contact restraint predictions from a new approach, the composite-sequence method, based on consensus restraints generated by an improved threading algorithm, PROSPECTOR_3.5, which uses computationally evolved and wild-type template sequences as input. TASSER_2.0 was tested on a large-scale benchmark set of 2591 nonhomologous, single domain proteins ≤ 200 residues that cover the Protein Data Bank at 35% pairwise sequence identity. Compared with the average fraction of accurately predicted side-chain contacts of 0.37 using PROSPECTOR_3.5 with wild-type template sequences, the average accuracy of the composite-sequence method increases to 0.60. The resulting TASSER_2.0 models are closer to their native structures, with an average root mean-square deviation of 4.99 Å compared to the 5.31 Å result of TASSER. Defining a successful prediction as a model with a root mean-square deviation to native < 6.5 Å, the success rate of TASSER_2.0 (TASSER) for Medium targets (targets with good templates/poor alignments) is 74.3% (64.7%) and 40.8% (35.5%) for the Hard targets (incorrect templates/alignments). For Easy targets (good templates/alignments), the success rate slightly increases from 86.3% to 88.4%.
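    The success criterion in this abstract (RMSD to native < 6.5 Å) can be sketched as follows. The sketch assumes the model and native coordinates are already optimally superimposed and listed in matching atom order; a real evaluation would first perform a superposition step, which is omitted. The function names and toy values are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(model: list[Coord], native: list[Coord]) -> float:
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


def success_rate(rmsds: list[float], cutoff: float = 6.5) -> float:
    """Fraction of targets whose model RMSD is strictly below the cutoff."""
    if not rmsds:
        raise ValueError("need at least one target")
    return sum(r < cutoff for r in rmsds) / len(rmsds)


# Toy values for illustration only (not benchmark data).
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]))
print(success_rate([3.0, 5.0, 7.0, 6.5]))
```

    The strict inequality mirrors the "< 6.5 Å" wording of the abstract, so a model at exactly 6.5 Å does not count as a success.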
  • Item
    Identification of metabolites with anticancer properties by computational metabolomics
    (Georgia Institute of Technology, 2008-06-17) Arakaki, Adrian K. ; Mezencev, Roman ; Bowen, Nathan J. ; Huang, Ying ; McDonald, John F. ; Skolnick, Jeffrey
    Background: Certain endogenous metabolites can influence the rate of cancer cell growth. For example, diacylglycerol, ceramides and sphingosine, NAD+ and arginine exert this effect by acting as signaling molecules, while carrying out other important cellular functions. Metabolites can also be involved in the control of cell proliferation by directly regulating gene expression in ways that are signaling pathway-independent, e.g. by direct activation of transcription factors or by inducing epigenetic processes. The fact that metabolites can affect the cancer process on so many levels suggests that the change in concentration of some metabolites that occurs in cancer cells could have an active role in the progress of the disease. Results: CoMet, a fully automated Computational Metabolomics method to predict changes in metabolite levels in cancer cells compared to normal references has been developed and applied to Jurkat T leukemia cells with the goal of testing the following hypothesis: Up or down regulation in cancer cells of the expression of genes encoding for metabolic enzymes leads to changes in intracellular metabolite concentrations that contribute to disease progression. All nine metabolites predicted to be lowered in Jurkat cells with respect to lymphoblasts that were examined (riboflavin, tryptamine, 3-sulfino-L-alanine, menaquinone, dehydroepiandrosterone, α-hydroxystearic acid, hydroxyacetone, seleno-L-methionine and 5,6-dimethylbenzimidazole) exhibited antiproliferative activity that has not been reported before, while only two (bilirubin and androsterone) of the eleven tested metabolites predicted to be increased or unchanged in Jurkat cells displayed significant antiproliferative activity. Conclusion: These results: a) demonstrate that CoMet is a valuable method to identify potential compounds for experimental validation, b) indicate that cancer cell metabolism may be regulated to reduce the intracellular concentration of certain antiproliferative metabolites, leading to uninhibited cellular growth and c) suggest that many other endogenous metabolites with important roles in carcinogenesis are awaiting discovery.
  • Item
    DBD-Hunter: a knowledge-based method for the prediction of DNA–protein interactions
    (Georgia Institute of Technology, 2008-05-31) Gao, Mu ; Skolnick, Jeffrey
    The structures of DNA–protein complexes have illuminated the diversity of DNA–protein binding mechanisms shown by different protein families. This lack of generality could pose a great challenge for predicting DNA–protein interactions. To address this issue, we have developed a knowledge-based method, DNA-binding Domain Hunter (DBD-Hunter), for identifying DNA-binding proteins and associated binding sites. The method combines structural comparison and the evaluation of a statistical potential, which we derive to describe interactions between DNA base pairs and protein residues. We demonstrate that DBD-Hunter is an accurate method for predicting DNA-binding function of proteins, and that DNA-binding protein residues can be reliably inferred from the corresponding templates if identified. In benchmark tests on ~4000 proteins, our method achieved an accuracy of 98% and a precision of 84%, which significantly outperforms three previous methods. We further validate the method on DNA-binding protein structures determined in DNA-free (apo) state. We show that the accuracy of our method is only slightly affected on apo-structures compared to the performance on holo-structures cocrystallized with DNA. Finally, we apply the method to ~1700 structural genomics targets and predict that 37 targets with previously unknown function are likely to be DNA-binding proteins.
  • Item
    The Mosaic Genome of Anaeromyxobacter dehalogenans Strain 2CP-C Suggests an Aerobic Common Ancestor to the Delta-Proteobacteria
    (Georgia Institute of Technology, 2008-05-07) Thomas, Sara H. ; Wagner, Ryan D. ; Arakaki, Adrian K. ; Skolnick, Jeffrey ; Kirby, John R. ; Shimkets, Lawrence J. ; Sanford, Robert A. ; Löffler, Frank E.
    Anaeromyxobacter dehalogenans strain 2CP-C is a versaphilic delta-Proteobacterium distributed throughout many diverse soil and sediment environments. 16S rRNA gene phylogenetic analysis groups A. dehalogenans together with the myxobacteria, which have distinguishing characteristics including strictly aerobic metabolism, sporulation, fruiting body formation, and surface motility. Analysis of the 5.01 Mb strain 2CP-C genome substantiated that this organism is a myxobacterium but shares genotypic traits with the anaerobic majority of the delta-Proteobacteria (i.e., the Desulfuromonadales). Reflective of its respiratory versatility, strain 2CP-C possesses 68 genes coding for putative c-type cytochromes, including one gene with 40 heme binding motifs. Consistent with its relatedness to the myxobacteria, surface motility was observed in strain 2CP-C and multiple types of motility genes are present, including 28 genes for gliding, adventurous (A-) motility and 17 genes for type IV pilus-based motility (i.e., social (S-) motility) that all have homologs in Myxococcus xanthus. Although A. dehalogenans shares many metabolic traits with the anaerobic majority of the delta-Proteobacteria, strain 2CP-C grows under microaerophilic conditions and possesses detoxification systems for reactive oxygen species. Accordingly, two gene clusters coding for NADH dehydrogenase subunits and two cytochrome oxidase gene clusters in strain 2CP-C are similar to those in M. xanthus. Remarkably, strain 2CP-C possesses a third NADH dehydrogenase gene cluster and a cytochrome cbb3 oxidase gene cluster, apparently acquired through ancient horizontal gene transfer from a strictly anaerobic green sulfur bacterium. The mosaic nature of the A. dehalogenans strain 2CP-C genome suggests that the metabolically versatile, anaerobic members of the delta-Proteobacteria may have descended from aerobic ancestors with complex lifestyles.