57
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Improving the accuracy of protein secondary structure prediction using structural alignment

      product-review

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high.

          Results

          We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%.

          Conclusion

          By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.

          Related collections

          Most cited references46

          • Record: found
          • Abstract: found
          • Article: not found

          The PredictProtein server.

          PredictProtein (http://www.predictprotein.org) is an Internet service for sequence analysis and the prediction of protein structure and function. Users submit protein sequences or alignments; PredictProtein returns multiple sequence alignments, PROSITE sequence motifs, low-complexity regions (SEG), nuclear localization signals, regions lacking regular structure (NORS) and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions, disulfide-bonds, sub-cellular localization and functional annotations. Upon request fold recognition by prediction-based threading, CHOP domain assignments, predictions of transmembrane strands and inter-residue contacts are also available. For all services, users can submit their query either by electronic mail or interactively via the World Wide Web.
            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins.

              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              VADAR: a web server for quantitative evaluation of protein structure quality.

              VADAR (Volume Area Dihedral Angle Reporter) is a comprehensive web server for quantitative protein structure evaluation. It accepts Protein Data Bank (PDB) formatted files or PDB accession numbers as input and calculates, identifies, graphs, reports and/or evaluates a large number (>30) of key structural parameters both for individual residues and for the entire protein. These include excluded volume, accessible surface area, backbone and side chain dihedral angles, secondary structure, hydrogen bonding partners, hydrogen bond energies, steric quality, solvation free energy as well as local and overall fold quality. These derived parameters can be used to rapidly identify both general and residue-specific problems within newly determined protein structures. The VADAR web server is freely accessible at http://redpoll.pharmacy.ualberta.ca/vadar.
                Bookmark

                Author and article information

                Journal
                BMC Bioinformatics
                BMC Bioinformatics
                BioMed Central (London )
                1471-2105
                2006
                14 June 2006
                : 7
                : 301
                Affiliations
                [1 ]Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
                [2 ]Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
                Article
                1471-2105-7-301
                10.1186/1471-2105-7-301
                1550433
                16774686
                0ad57439-11cd-479b-b004-72e1bae4aabc
                Copyright © 2006 Montgomerie et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 18 October 2005
                : 14 June 2006
                Categories
                Software

                Bioinformatics & Computational biology
                Bioinformatics & Computational biology

                Comments

                Comment on this article