Organizational Unit:
Center for the Study of Systems Biology

Publication Search Results

Ab initio protein structure prediction using chunk-TASSER

2007-09 , Zhou, Hongyi , Skolnick, Jeffrey

We have developed an ab initio protein structure prediction method called chunk-TASSER that uses ab initio folded supersecondary structure chunks of a given target as well as threading templates for obtaining contact potentials and distance restraints. The predicted chunks, selected on the basis of a new fragment comparison method, are folded by a fragment insertion method. Full-length models are built and refined by the TASSER methodology, which searches conformational space via parallel hyperbolic Monte Carlo. We employ an optimized reduced force field that includes knowledge-based statistical potentials and restraints derived from the chunks as well as threading templates. The method is tested on a dataset of 425 hard target proteins ≤250 amino acids in length. The average TM-scores of the best of top five models per target are 0.266, 0.336, and 0.362 by the threading algorithm SP3, original TASSER and chunk-TASSER, respectively. For a subset of 80 proteins with predicted α-helix content ≥50%, these averages are 0.284, 0.356, and 0.403, respectively. The percentages of proteins with the best of top five models having TM-score ≥0.4 (a statistically significant threshold for structural similarity) are 3.76, 20.94, and 28.94% by SP3, TASSER, and chunk-TASSER, respectively, overall, while for the subset of 80 predominantly helical proteins, these percentages are 2.50, 23.75, and 41.25%. Thus, chunk-TASSER shows a significant improvement over TASSER for modeling hard targets where no good template can be identified. We also tested chunk-TASSER on 21 medium/hard targets <200 amino acids long from CASP7. Chunk-TASSER is ~11% (10%) better than TASSER for the total TM-score of the first (best of top five) models. Chunk-TASSER is fully automated and can be used in proteome scale protein structure prediction.

TASSER-Lite: an automated tool for protein comparative modeling

2006-12 , Pandit, Shashi Bhushan , Zhang, Yang , Skolnick, Jeffrey

This study involves the development of a rapid comparative modeling tool for homologous sequences by extension of the TASSER methodology, developed for tertiary structure prediction. This comparative modeling procedure was validated on a representative benchmark set of proteins in the Protein Data Bank composed of 901 single domain proteins (41-200 residues) having sequence identities between 35-90% with respect to the template. Using a Monte Carlo search scheme with the length of runs optimized for weakly/nonhomologous proteins, TASSER often provides appreciable improvement in structure quality over the initial template. However, on average, this requires ~29 h of CPU time per sequence. Since homologous proteins are unlikely to require the extent of conformational search as weakly/nonhomologous proteins, TASSER's parameters were optimized to reduce the required CPU time to ~17 min, while retaining TASSER's ability to improve structure quality. Using this optimized TASSER (TASSER-Lite), we find an average improvement in the aligned region of ~10% in root mean-square deviation from native over the initial template. Comparison of TASSER-Lite with the widely used comparative modeling tool MODELLER showed that TASSER-Lite yields final models that are closer to the native. TASSER-Lite is provided on the web at http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html.

TM-align: a protein structure alignment algorithm based on the TM-score

2005-04-22 , Zhang, Yang , Skolnick, Jeffrey

We have developed TM-align, a new algorithm to identify the best structural alignment between protein pairs that combines the TM-score rotation matrix and Dynamic Programming (DP). The algorithm is ~4 times faster than CE and 20 times faster than DALI and SAL. On average, the resulting structure alignments have higher accuracy and coverage than those provided by these most often-used methods. TM-align is applied to an all-against-all structure comparison of 10 515 representative protein chains from the Protein Data Bank (PDB) with a sequence identity cutoff <95%: 1996 distinct folds are found when a TM-score threshold of 0.5 is used. We also use TM-align to match the models predicted by TASSER for solved non-homologous proteins in PDB. For both folded and misfolded models, TM-align can almost always find close structural analogs, with an average root mean square deviation, RMSD, of 3 Å and 87% alignment coverage. Nevertheless, there exists a significant correlation between the correctness of the predicted structure and the structural similarity of the model to the other proteins in the PDB. This correlation could be used to assist in model selection in blind protein structure predictions.
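For readers unfamiliar with the TM-score that TM-align optimizes, the following is a minimal sketch of how the score is evaluated for one fixed superposition, assuming the per-residue distances between aligned Cα pairs have already been computed. It uses the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names and the 0.5 Å floor on d0 for very short chains are assumptions made for this illustration; the sketch does not perform the rotation search or dynamic programming described in the abstract.

```python
from typing import Sequence

D0_FLOOR = 0.5  # assumed lower bound on d0 (in angstroms) for very short chains


def tm_score_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        # The cube root of a non-positive number is not meaningful here,
        # so fall back to the assumed floor.
        return D0_FLOOR
    return max(D0_FLOOR, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE given superposition.

    aligned_distances: distance (angstroms) between each aligned residue pair.
    target_length: number of residues in the target (normalizing) structure.
    Unaligned residues contribute nothing, so partial alignments score lower.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical example: 100-residue target, 80 aligned pairs at 2 angstroms.
    print(round(tm_score([2.0] * 80, 100), 3))
```

The full TM-score is the maximum of this quantity over all superpositions, so a value computed from a single superposition is a lower bound on it.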

Application of sparse NMR restraints to large-scale protein structure prediction

2004-08 , Li, Wei , Zhang, Yang , Skolnick, Jeffrey

The protein structure prediction algorithm TOUCHSTONEX that uses sparse distance restraints derived from NMR nuclear Overhauser enhancement (NOE) data to predict protein structures at low-to-medium resolution was evaluated as follows: First, a representative benchmark set of the Protein Data Bank library consisting of 1365 proteins up to 200 residues was employed. Using N/8 simulated long-range restraints, where N is the number of residues, 1023 (75%) proteins were folded to a Cα root-mean-square deviation (RMSD) from native <6.5 Å in one of the top five models. The average RMSD of the models for all 1365 proteins is 5.0 Å. Using N/4 simulated restraints, 1206 (88%) proteins were folded to a RMSD <6.5 Å and the average RMSD improved to 4.1 Å. Then, 69 proteins with experimental NMR data were used. Using long-range NOE-derived restraints, 47 proteins were folded to a RMSD <6.5 Å with N/8 restraints and 61 proteins were folded to a RMSD <6.5 Å with N/4 restraints. Thus, TOUCHSTONEX can be a tool for NMR-based rapid structure determination, as well as used in other experimental methods that can provide tertiary restraint information.

Ab initio modeling of small proteins by iterative TASSER simulations

2007-05-08 , Wu, Sitao , Skolnick, Jeffrey , Zhang, Yang

Background: Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in the ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method for the ab initio modeling and examine systemically its ability to fold small single-domain proteins. Results: We developed I-TASSER by iteratively implementing the TASSER method, which is used in the folding test of three benchmarks of small proteins. First, data on 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα-root mean square deviation (RMSD) of 3.8Å, with 6 of them having a Cα-RMSD < 2.5Å. The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time by I-TASSER was much shorter (150 CPU days vs. 5 CPU hours). Second, data on 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5Å. The average Cα-RMSD of the I-TASSER models was 3.9Å, whereas it was 5.9Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9Å was obtained for the third benchmark, with seven cases having a Cα-RMSD < 2.5Å. Conclusion: Our simulation results show that I-TASSER can consistently predict the correct folds and sometimes high-resolution models for small single-domain proteins. 
Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or is similar within a lower computational time. These data, together with the significant performance of automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progresses in automated ab initio model generation. The I-TASSER server is freely available for academic users at http://zhang.bioinformatics.ku.edu/I-TASSER.

Onset of anthrax toxin pore formation

2006-05 , Gao, Mu , Schulten, Klaus

Protective antigen (PA) is the anthrax toxin protein recognized by capillary morphogenesis gene 2 (CMG2), a transmembrane cellular receptor. Upon activation, seven ligand-receptor units self-assemble into a heptameric ring-like complex that becomes endocytosed by the host cell. A critical step in the subsequent intoxication process is the formation and insertion of a pore into the endosome membrane by PA. The pore conversion requires a change in binding between PA and its receptor in the acidified endosome environment. Molecular dynamics simulations totaling ~136 ns on systems of over 92,000 atoms were performed. The simulations revealed how the PA-CMG2 complex, stable at neutral conditions, becomes transformed at low pH upon protonation of His-121 and Glu-122, two conserved amino acids of the receptor. The protonation disrupts a salt bridge important for the binding stability and leads to the detachment of PA domain II, which weakens the stability of the PA-CMG2 complex significantly, and subsequently releases a PA segment needed for pore formation. The simulations also explain the great strength of the PA-CMG2 complex, achieved through extraordinary coordination of a divalent cation.

EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference

2004-12-01 , Tian, Weidong , Arakaki, Adrian K. , Skolnick, Jeffrey

EFICAz (Enzyme Function Inference by Combined Approach) is an automatic engine for large-scale enzyme function inference that combines predictions from four different methods developed and optimized to achieve high prediction accuracy: (i) recognition of functionally discriminating residues (FDRs) in enzyme families obtained by a Conservation controlled HMM Iterative procedure for Enzyme Family classification (CHIEFc), (ii) pairwise sequence comparison using a family specific Sequence Identity Threshold, (iii) recognition of FDRs in Multiple Pfam enzyme families, and (iv) recognition of multiple Prosite patterns of high specificity. For FDR (i.e. conserved positions in an enzyme family that discriminate between true and false members of the family) identification, we have developed an Evolutionary Footprinting method that uses evolutionary information from homofunctional and heterofunctional multiple sequence alignments associated with an enzyme family. The FDRs show a significant correlation with annotated active site residues. In a jackknife test, EFICAz shows high accuracy (92%) and sensitivity (82%) for predicting four EC digits in testing sequences that are <40% identical to any member of the corresponding training set. Applied to Escherichia coli genome, EFICAz assigns more detailed enzymatic function than KEGG, and generates numerous novel predictions.

High precision multi-genome scale reannotation of enzyme function by EFICAz

2006-12-13 , Arakaki, Adrian K. , Tian, Weidong , Skolnick, Jeffrey

Background: The functional annotation of most genes in newly sequenced genomes is inferred from similarity to previously characterized sequences, an annotation strategy that often leads to erroneous assignments. We have performed a reannotation of 245 genomes using an updated version of EFICAz, a highly precise method for enzyme function prediction. Results: Based on our three-field EC number predictions, we have obtained lower-bound estimates for the average enzyme content in Archaea (29%), Bacteria (30%) and Eukarya (18%). Most annotations added in KEGG from 2005 to 2006 agree with EFICAz predictions made in 2005. The coverage of EFICAz predictions is significantly higher than that of KEGG, especially for eukaryotes. Thousands of our novel predictions correspond to hypothetical proteins. We have identified a subset of 64 hypothetical proteins with low sequence identity to EFICAz training enzymes, whose biochemical functions have been recently characterized and find that in 96% (84%) of the cases we correctly identified their three-field (four-field) EC numbers. For two of the 64 hypothetical proteins: PA1167 from Pseudomonas aeruginosa, an alginate lyase (EC 4.2.2.3) and Rv1700 of Mycobacterium tuberculosis H37Rv, an ADP-ribose diphosphatase (EC 3.6.1.13), we have detected annotation lag of more than two years in databases. Two examples are presented where EFICAz predictions act as hypothesis generators for understanding the functional roles of hypothetical proteins: FLJ11151, a human protein overexpressed in cancer that EFICAz identifies as an endopolyphosphatase (EC 3.6.1.10), and MW0119, a protein of Staphylococcus aureus strain MW2 that we propose as candidate virulence factor based on its EFICAz predicted activity, sphingomyelin phosphodiesterase (EC 3.1.4.12). Conclusion: Our results suggest that we have generated enzyme function annotations of high precision and recall. 
These predictions can be mined and correlated with other information sources to generate biologically significant hypotheses and can be useful for comparative genome analysis and automated metabolic pathway reconstruction.
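The distinction between three-field and four-field EC number agreement used in the abstract above can be illustrated with a small sketch. This is only an illustration of the comparison, not EFICAz code: the function name is hypothetical, and the second EC number in the usage example is chosen purely to show a three-field match that fails at the fourth field.

```python
def ec_fields_match(predicted: str, reference: str, n_fields: int) -> bool:
    """Return True if two EC numbers agree on their first n_fields fields.

    EC numbers are written as four dot-separated fields, e.g. "4.2.2.3".
    A "-" placeholder (unspecified field) is treated as a non-match.
    """
    if not 1 <= n_fields <= 4:
        raise ValueError("n_fields must be between 1 and 4")
    pred = predicted.strip().split(".")
    ref = reference.strip().split(".")
    if len(pred) < n_fields or len(ref) < n_fields:
        return False
    if "-" in pred[:n_fields] or "-" in ref[:n_fields]:
        return False
    return pred[:n_fields] == ref[:n_fields]


if __name__ == "__main__":
    # EC 4.2.2.3 is the alginate lyase activity cited in the abstract;
    # 4.2.2.7 is used here only as an illustrative comparison value.
    print(ec_fields_match("4.2.2.3", "4.2.2.7", 3))  # three-field agreement
    print(ec_fields_match("4.2.2.3", "4.2.2.7", 4))  # four-field agreement
```

A prediction can therefore count as correct at the three-field level while still being wrong at the four-field level, which is why the abstract reports the two percentages separately.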

Structure Modeling of All Identified G Protein–Coupled Receptors in the Human Genome

2006-02 , Zhang, Yang , DeVries, Mark E. , Skolnick, Jeffrey

G protein–coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation from native of 4.6 Å, with a root-mean-squared deviation in the transmembrane helix region of 2.1 Å. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. 
These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis.

Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins

2004-10 , Zhang, Yang , Skolnick, Jeffrey

We evaluate tertiary structure predictions on medium to large size proteins by TASSER, a new algorithm that assembles protein structures through rearranging the rigid fragments from threading templates guided by a reduced Cα and side-chain based potential consistent with threading based tertiary restraints. Predictions were generated for 745 proteins 201–300 residues in length that cover the Protein Data Bank (PDB) at the level of 35% sequence identity. With homologous proteins excluded, in 365 cases, the templates identified by our threading program, PROSPECTOR_3, have a root-mean-square deviation (RMSD) to native <6.5 Å, with >70% alignment coverage. After TASSER assembly, in 408 cases the best of the top five full-length models has a RMSD <6.5 Å. Among the 745 targets are 18 membrane proteins, with one-third having a predicted RMSD <5.5 Å. For all representative proteins less than or equal to 300 residues that have corresponding multiple NMR structures in the Protein Data Bank, 20% of the models generated by TASSER are closer to the NMR structure centroid than the farthest individual NMR model. These results suggest that reasonable structure predictions for nonhomologous large size proteins can be automatically generated on a proteomic scale, and the application of this approach to structural as well as functional genomics represent promising applications of TASSER.