      Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min-max scoring function.

      Bio Systems
      Algorithms, Artificial Intelligence, Binding Sites, Catalysis, Endopeptidases, chemistry, Enzyme Activation, Hydrolysis, Neural Networks (Computer), Oligopeptides, Peptide Fragments, chemical synthesis, Peptide Hydrolases, Protein Binding, Reproducibility of Results, Sensitivity and Specificity, Sequence Alignment, methods, Sequence Analysis, Protein, Structure-Activity Relationship


          Abstract

          This paper presents an algorithm that extracts discriminant rules from oligopeptides for predicting protease proteolytic cleavage activity. The algorithm is developed using genetic programming. Its three key components are a min-max scoring function, reverse Polish notation (RPN), and the minimum description length principle. The min-max scoring function uses amino acid similarity matrices to measure the similarity between an oligopeptide and a rule, where a rule is a complex algebraic equation of amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the scoring values using the class labels associated with the oligopeptides, so that the discriminant ability of each rule can be evaluated. Using RPN simplifies the evolutionary operations and therefore reduces the computational cost. To prevent overfitting, the minimum description length concept is used to penalize over-complicated rules. The fitness function therefore combines the Fisher ratio with a minimum description length penalty for an efficient evolutionary process. In applications to four protease datasets (trypsin, factor Xa, hepatitis C virus and HIV protease cleavage site prediction), the algorithm is superior to C5, a conventional method for deriving decision trees.
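The abstract does not spell out the scoring function or the fitness formula, so the following is only a minimal sketch of how the pieces it names could fit together. Everything specific here is an assumption made for illustration: the toy `similarity` function (a stand-in for a real amino acid similarity matrix, not actual matrix values), the rule encoding as RPN tokens of `(position, residue)` operands with `"min"` and `"max"` operators, the Fisher ratio written as squared mean difference over summed class variances, and the fitness as that ratio minus a length penalty standing in for minimum description length. None of it should be read as the authors' implementation.

```python
from statistics import mean, pvariance

# Toy stand-in for an amino acid similarity matrix (hypothetical values,
# NOT taken from BLOSUM/PAM or from the paper).
BASIC = set("KRH")

def similarity(a, b):
    if a == b:
        return 1.0
    if a in BASIC and b in BASIC:
        return 0.5
    return 0.0

def score_rpn(rule, peptide):
    """Evaluate an RPN rule against one peptide using a stack.

    Operands are (position, residue) pairs scored by `similarity`;
    the operators "min" and "max" combine the two topmost scores.
    """
    stack = []
    for token in rule:
        if token in ("min", "max"):
            right = stack.pop()
            left = stack.pop()
            stack.append(min(left, right) if token == "min" else max(left, right))
        else:
            pos, residue = token
            stack.append(similarity(peptide[pos], residue))
    if len(stack) != 1:
        raise ValueError("malformed RPN rule")
    return stack[0]

def fisher_ratio(scores, labels):
    """Squared gap between class means divided by the summed class variances."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    return gap / (spread + 1e-9)  # small constant avoids division by zero

def fitness(rule, peptides, labels, penalty=0.01):
    """Fisher ratio minus a per-token penalty (assumed MDL-style term)."""
    scores = [score_rpn(rule, p) for p in peptides]
    return fisher_ratio(scores, labels) - penalty * len(rule)

# Made-up 8-residue peptides; index 3 plays the role of the residue
# before the candidate cleavage site. Labels: 1 = cleaved, 0 = not cleaved.
peptides = ["AGLKSAVT", "TVARGLLE", "AGLESAVT", "TVADGLLE", "AGLHSAVT"]
labels = [1, 1, 0, 0, 0]

# RPN for "max(similarity to K at 3, similarity to R at 3)".
rule = [(3, "K"), (3, "R"), "max"]
# Same scores as `rule`, but two redundant tokens longer.
longer_rule = rule + [(3, "K"), "max"]

print(fitness(rule, peptides, labels))
print(fitness(longer_rule, peptides, labels))
```

Because an RPN rule is a flat token list, crossover and mutation in this sketch would reduce to list slicing and token replacement, which is one way to read the claim that RPN simplifies the evolutionary operations. The two rules at the end score every peptide identically, so the longer one receives the lower fitness purely through the length penalty.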
