
PROSPER: An Integrated Feature-Based Tool for Predicting Protease Substrate Cleavage Sites



      The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be used effectively to identify protein substrates efficiently by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER combines established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach, predicting protease cleavage sites from different but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for identifying their protein substrates. Systematic prediction analysis for the twenty-four proteases included so far revealed that these features strongly improve cleavage site prediction, as evidenced by their contribution to identifying known cleavage sites in substrates of these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage.
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at
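
The "local amino acid sequence profile" feature named in the abstract can be pictured as a fixed-width window of residues around each candidate scissile bond. The following is a minimal sketch under stated assumptions, not PROSPER's actual implementation: the window width (P4 to P4'), the padding symbol, the one-hot encoding, and the function names `cleavage_windows` and `one_hot` are all hypothetical choices made for illustration.

```python
# Illustrative only: window width, padding symbol, and one-hot encoding are
# assumptions for this sketch, not details taken from PROSPER.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical symbol for window positions beyond the termini


def cleavage_windows(sequence, half_width=4):
    """Yield (bond_index, window) for every peptide bond in `sequence`.

    bond_index i is the bond between residues i and i+1 (1-based), so the
    window holds P{half_width}..P1 followed by P1'..P{half_width}'.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    for i in range(1, len(sequence)):
        # In `padded`, the residue at position P{half_width} sits at index i.
        yield i, padded[i:i + 2 * half_width]


def one_hot(window):
    """Encode a window as a flat 0/1 vector, 20 entries per position.

    Padding symbols (and any non-standard residue) become all-zero blocks.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    for bond, window in cleavage_windows("ACDEFG", half_width=2):
        print(bond, window, sum(one_hot(window)))
```

In a feature-based predictor of the kind the abstract describes, vectors like these would presumably be combined with per-residue predictions of secondary structure, solvent accessibility and disorder before being passed to a classifier; that combination step is not shown here.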


            Author and article information

            [1] Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
            [2] National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin, People's Republic of China
            [3] Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
            [4] Faculty of Information Technology, Monash University, Melbourne, Australia
            [5] ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, Australia
            Indian Institute of Science, India
            Author notes

            Competing Interests: The authors have declared that no competing interests exist.

            Conceived and designed the experiments: JS JCW RNP. Performed the experiments: JS HT. Analyzed the data: JS HT AJP GIW TA. Contributed reagents/materials/analysis tools: GIW TA. Wrote the paper: JS RNP.

            Role: Editor
            PLoS ONE
            Public Library of Science (San Francisco, USA)
            29 November 2012
            Volume: 7
            Issue: 11
            PMID: 23209700; PMCID: 3510211; Manuscript ID: PONE-D-12-21687; DOI: 10.1371/journal.pone.0050300

            This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

            Pages: 23
            This work was supported by grants from the National Health and Medical Research Council of Australia (NHMRC) (490989), the Australian Research Council (ARC) (LP110200333), the Chinese Academy of Sciences (CAS), the Japan Society for the Promotion of Science (S11156), the Knowledge Innovation Program of CAS (KSCX2-EW-G-8) and Tianjin Municipal Science & Technology Commission (10ZCKFSY05600). JS is an NHMRC Peter Doherty Fellow and a recipient of the Hundred Talents Program of CAS. AJP is an NHMRC Peter Doherty Fellow. JCW is an ARC Federation Fellow and an honorary NHMRC Principal Research Fellow. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
            Research Article
            Protein Structure
            Computational Biology
            Genome Analysis Tools
            Gene Prediction
            Sequence Analysis
            Computer Science
            Computer Applications
            Web-Based Applications


