Organizational Unit:

Center for the Study of Systems Biology

Permanent Link

https://hdl.handle.net/1853/70795

Parent Organization

Organizational Unit

School of Biological Sciences

Publication Search Results

Now showing 1 - 7 of 7

Ab initio modeling of small proteins by iterative TASSER simulations

(Georgia Institute of Technology, 2007-05-08) Wu, Sitao ; Skolnick, Jeffrey ; Zhang, Yang

Background: Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method to ab initio modeling and to examine systematically its ability to fold small single-domain proteins. Results: We developed I-TASSER by iteratively implementing the TASSER method, and used it in a folding test on three benchmarks of small proteins. First, data on 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα-root mean square deviation (RMSD) of 3.8 Å, with 6 of them having a Cα-RMSD < 2.5 Å. The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time by I-TASSER was much shorter (5 CPU hours for I-TASSER vs. 150 CPU days for ROSETTA). Second, data on 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, whereas it was 5.9 Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9 Å was obtained for the third benchmark, with seven cases having a Cα-RMSD < 2.5 Å. Conclusion: Our simulation results show that I-TASSER can consistently predict the correct folds and sometimes high-resolution models for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or similar at a lower computational cost. These data, together with the significant performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available to academic users at http://zhang.bioinformatics.ku.edu/I-TASSER.
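
The Cα-RMSD values quoted in this abstract measure the root mean square distance between matched Cα atoms of a model and the native structure. The following is a minimal Python sketch of that quantity, assuming the two coordinate sets are already optimally superimposed and listed in the same residue order. The function name `ca_rmsd` and the toy coordinates are illustrative and are not taken from I-TASSER.

```python
import math

def ca_rmsd(model, native):
    """RMSD (in the input units, e.g. angstroms) between matched C-alpha atoms.

    Assumes both structures are already superimposed and that model[i]
    corresponds to native[i]; no structural fitting is done here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))

# Toy example (hypothetical coordinates): every atom is shifted by 2 along x.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 2.0, y, z) for x, y, z in native]
print(ca_rmsd(model, native))
```

In practice the superposition step (for example, a least-squares fit) comes first; it is left out here to keep the definition itself visible.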
TASSER-Lite: an automated tool for protein comparative modeling

(Georgia Institute of Technology, 2006-12) Pandit, Shashi Bhushan ; Zhang, Yang ; Skolnick, Jeffrey

This study involves the development of a rapid comparative modeling tool for homologous sequences by extension of the TASSER methodology, developed for tertiary structure prediction. This comparative modeling procedure was validated on a representative benchmark set of proteins in the Protein Data Bank composed of 901 single domain proteins (41-200 residues) having sequence identities between 35-90% with respect to the template. Using a Monte Carlo search scheme with the length of runs optimized for weakly/nonhomologous proteins, TASSER often provides appreciable improvement in structure quality over the initial template. However, on average, this requires ~29 h of CPU time per sequence. Since homologous proteins are unlikely to require the extent of conformational search as weakly/nonhomologous proteins, TASSER's parameters were optimized to reduce the required CPU time to ~17 min, while retaining TASSER's ability to improve structure quality. Using this optimized TASSER (TASSER-Lite), we find an average improvement in the aligned region of ~10% in root mean-square deviation from native over the initial template. Comparison of TASSER-Lite with the widely used comparative modeling tool MODELLER showed that TASSER-Lite yields final models that are closer to the native. TASSER-Lite is provided on the web at http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html.
Structure Modeling of All Identified G Protein–Coupled Receptors in the Human Genome

(Georgia Institute of Technology, 2006-02) Zhang, Yang ; DeVries, Mark E. ; Skolnick, Jeffrey

G protein–coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signals into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation from native of 4.6 Å, with a root-mean-squared deviation in the transmembrane helix region of 2.1 Å. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis.
TM-align: a protein structure alignment algorithm based on the TM-score

(Georgia Institute of Technology, 2005-04-22) Zhang, Yang ; Skolnick, Jeffrey

We have developed TM-align, a new algorithm to identify the best structural alignment between protein pairs that combines the TM-score rotation matrix and Dynamic Programming (DP). The algorithm is ~4 times faster than CE and 20 times faster than DALI and SAL. On average, the resulting structure alignments have higher accuracy and coverage than those provided by these most often used methods. TM-align is applied to an all-against-all structure comparison of 10,515 representative protein chains from the Protein Data Bank (PDB) with a sequence identity cutoff < 95%: 1996 distinct folds are found when a TM-score threshold of 0.5 is used. We also use TM-align to match the models predicted by TASSER for solved non-homologous proteins in PDB. For both folded and misfolded models, TM-align can almost always find close structural analogs, with an average root mean square deviation, RMSD, of 3 Å and 87% alignment coverage. Nevertheless, there exists a significant correlation between the correctness of the predicted structure and the structural similarity of the model to the other proteins in the PDB. This correlation could be used to assist in model selection in blind protein structure predictions.
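
The TM-score that TM-align relies on, and the 0.5 threshold used in this abstract to count distinct folds, can be illustrated with a short Python sketch. It evaluates the standard TM-score expression for a single, already superimposed alignment; the actual program searches over superpositions for the maximum value, which is not attempted here. The function name `tm_score`, its arguments, and the 0.5 Å floor on d0 for very short chains are assumptions of this sketch, not details taken from the abstract.

```python
def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the normalizing (target) protein.

    TM = (1 / L) * sum_i 1 / (1 + (d_i / d0)^2),
    with d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than residues in the target")
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.0  # formula is not usable for such short chains
    d0 = max(d0, 0.5)  # assumed floor so that d0 stays positive
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Half of a 100-residue target aligned perfectly, the other half unaligned.
print(tm_score([0.0] * 50, 100))  # prints 0.5
```

Because unaligned residues contribute nothing to the sum while the normalization uses the full target length, the score rewards coverage as well as accuracy.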
Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins

(Georgia Institute of Technology, 2004-10) Zhang, Yang ; Skolnick, Jeffrey

We evaluate tertiary structure predictions on medium to large size proteins by TASSER, a new algorithm that assembles protein structures through rearranging the rigid fragments from threading templates guided by a reduced Cα and side-chain based potential consistent with threading-based tertiary restraints. Predictions were generated for 745 proteins 201–300 residues in length that cover the Protein Data Bank (PDB) at the level of 35% sequence identity. With homologous proteins excluded, in 365 cases, the templates identified by our threading program, PROSPECTOR_3, have a root-mean-square deviation (RMSD) to native < 6.5 Å, with > 70% alignment coverage. After TASSER assembly, in 408 cases the best of the top five full-length models has an RMSD < 6.5 Å. Among the 745 targets are 18 membrane proteins, with one-third having a predicted RMSD < 5.5 Å. For all representative proteins less than or equal to 300 residues that have corresponding multiple NMR structures in the Protein Data Bank, 20% of the models generated by TASSER are closer to the NMR structure centroid than the farthest individual NMR model. These results suggest that reasonable structure predictions for nonhomologous large size proteins can be automatically generated on a proteomic scale, and that structural as well as functional genomics represent promising applications of TASSER.
TOUCHSTONE II: a new approach to ab initio protein structure prediction

(Georgia Institute of Technology, 2003-08) Zhang, Yang ; Kolinski, Andrzej ; Skolnick, Jeffrey

We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized through the maximization of correlation for 30 × 60,000 decoys between the root mean square deviation (RMSD) to native and energies, as well as the energy gap between native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41/100 small proteins (36–120 residues) with predicted structures having an RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger-size proteins as well as to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR where homologous proteins were excluded from the database. With these threading-based restraints, the program can fold 83/125 test proteins (36–174 residues) with structures having an RMSD to native below 6.5 Å in the top five cluster centroids. This shows the significant improvement of folding by using predicted tertiary restraints, especially when the accuracy of side-chain contact prediction is > 20%. For native fold selection, we introduce quantities dependent on the cluster density and the combination of energy and free energy, which show a higher discriminative power to select the native structure than the previously used cluster energy or cluster size, and which can be used in native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.
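
The idea of weighting energy terms so that the total energy correlates with RMSD to native can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`pearson`, `total_energy`, `best_weights`), the toy decoy data, and the choice among a fixed list of candidate weights are all hypothetical. The abstract does not describe the optimization procedure itself, and the energy-gap criterion it mentions is not modeled here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def total_energy(terms, weights):
    """Weighted sum of the energy terms of one decoy."""
    return sum(w * t for w, t in zip(weights, terms))

def best_weights(decoy_terms, rmsds, candidates):
    """Pick the candidate weight vector whose combined energy correlates
    most strongly (and positively) with RMSD to native over all decoys."""
    def score(weights):
        energies = [total_energy(terms, weights) for terms in decoy_terms]
        return pearson(energies, rmsds)
    return max(candidates, key=score)

# Hypothetical decoys: two energy terms each, plus RMSD to native.
decoy_terms = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
rmsds = [1.0, 2.0, 3.0, 4.0]
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
print(best_weights(decoy_terms, rmsds, candidates))
```

A high positive correlation means that lower-energy decoys tend to lie closer to the native structure, which is the property a folding force field needs.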
Parallel-hat tempering: A Monte Carlo search scheme for the identification of low-energy structures

(Georgia Institute of Technology, 2001-09-15) Zhang, Yang ; Skolnick, Jeffrey

A new parallel-hat tempering algorithm has been developed for Monte Carlo simulations, in which a composite ensemble of noninteracting replicas of the molecule system at different temperatures is simulated, and the Markov process of each replica is driven by a hatlike weight factor with exponentially enhanced probability in both low- and high-energy regions. To test the algorithm, the methodology is applied to a homopolymeric protein chain located on a face-centered cubic lattice. We demonstrate that the ability of the current approach to search for low-energy molecule structures is significantly better than other Monte Carlo techniques found in the literature.
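
For orientation, the replica-exchange step that schemes of this kind build on can be sketched in Python. The sketch shows conventional parallel tempering with Boltzmann weights, not the hat-like weight factor of the abstract, whose functional form is not given there. The function name `swap_probability` and the reduced units (Boltzmann constant set to 1 by default) are assumptions made for illustration.

```python
import math

def swap_probability(energy_i, temp_i, energy_j, temp_j, k_b=1.0):
    """Metropolis probability of exchanging the configurations of two
    replicas in standard (Boltzmann-weighted) parallel tempering.

    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), beta = 1 / (k_b * T).
    """
    if temp_i <= 0 or temp_j <= 0:
        raise ValueError("temperatures must be positive")
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

# The cold replica (T = 1) holds the higher-energy configuration,
# so handing it the lower-energy one is always accepted.
print(swap_probability(energy_i=-5.0, temp_i=1.0, energy_j=-10.0, temp_j=2.0))

# The reverse exchange is accepted only with probability exp(-2.5).
print(swap_probability(energy_i=-10.0, temp_i=1.0, energy_j=-5.0, temp_j=2.0))
```

Exchanges of this kind let low-energy configurations found at high temperature migrate to the cold replicas, which is what allows the ensemble to escape local minima.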
