
      Exploring bias in the Protein Data Bank using contrast classifiers.

      Pacific Symposium on Biocomputing
      Keywords: Bias (Epidemiology), Binding Sites, Computational Biology, Databases, Protein, Molecular Structure, Neural Networks (Computer), Proteins, chemistry, classification, genetics


          Abstract

          In this study we analyzed the bias in the Protein Data Bank (PDB) using a novel contrast classifier approach. We trained an ensemble of neural network classifiers, called a contrast classifier, to learn the distributional differences between non-redundant sequence subsets of PDB and SWISS-PROT. Assuming that SWISS-PROT is representative of the sequence diversity in nature while PDB is a biased sample, the output of the contrast classifier can be used to measure whether the properties of a given sequence or sequence region are underrepresented in PDB. We applied the contrast classifier to SWISS-PROT sequences to analyze the bias in PDB toward different functional protein properties. The results showed that transmembrane, signal, disordered, and low-complexity regions are significantly underrepresented in PDB, while disulfide bonds, metal binding sites, and sites involved in enzyme activity are overrepresented. Additionally, hydroxylation and phosphorylation post-translational modification sites were found to be underrepresented, while acetylation sites were significantly overrepresented. These results suggest that contrast classifiers may be useful for selecting target proteins for structural characterization experiments.
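The idea behind a contrast classifier can be sketched as follows. A probabilistic classifier trained on equal-sized samples from a biased set (label 0) and a reference set (label 1) ideally outputs p_ref(x) / (p_ref(x) + p_biased(x)), so an output above 0.5 suggests x is more common in the reference set, i.e. underrepresented in the biased one. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it swaps in logistic regression members for the paper's neural networks, uses bagging on balanced bootstrap samples as the ensembling scheme, and works on a hypothetical 1-D synthetic feature rather than real PDB or SWISS-PROT sequence features. All names (`ContrastClassifier`, `train_logistic`, `predict`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Vector, x: Vector) -> float:
    # zip stops at len(x), so the trailing bias term w[-1] is added separately.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logistic(xs: Sequence[Vector], ys: Sequence[int],
                   epochs: int = 300, lr: float = 0.5) -> Vector:
    """Fit logistic regression by full-batch gradient descent.

    Stand-in for one neural network member (an assumption of this sketch).
    Returns the weight vector with the bias stored as the last element.
    """
    dim = len(xs[0])
    w = [0.0] * (dim + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j in range(dim):
                grad[j] += err * x[j]
            grad[dim] += err
        for j in range(dim + 1):
            w[j] -= lr * grad[j] / n
    return w


class ContrastClassifier:
    """Hypothetical bagged ensemble separating a biased sample (label 0)
    from a reference sample (label 1)."""

    def __init__(self, n_members: int = 5, seed: int = 0) -> None:
        self.n_members = n_members
        self.rng = random.Random(seed)
        self.members: List[Vector] = []

    def fit(self, biased: Sequence[Vector],
            reference: Sequence[Vector]) -> "ContrastClassifier":
        k = min(len(biased), len(reference))
        for _ in range(self.n_members):
            # Balanced bootstrap: k draws (with replacement) from each sample,
            # so every member sees a 50/50 class prior.
            b = [self.rng.choice(biased) for _ in range(k)]
            r = [self.rng.choice(reference) for _ in range(k)]
            self.members.append(train_logistic(b + r, [0] * k + [1] * k))
        return self

    def score(self, x: Vector) -> float:
        # Average member outputs; values above 0.5 suggest x is
        # underrepresented in the biased sample relative to the reference.
        return sum(predict(w, x) for w in self.members) / len(self.members)


# Hypothetical 1-D feature in [0, 1]; the "PDB-like" sample only covers
# the lower half of the range, while the "SWISS-PROT-like" one covers all of it.
data_rng = random.Random(1)
pdb_like = [[data_rng.uniform(0.0, 0.5)] for _ in range(200)]
swiss_like = [[data_rng.uniform(0.0, 1.0)] for _ in range(200)]
cc = ContrastClassifier(n_members=5, seed=0).fit(pdb_like, swiss_like)
print(f"score(0.9) = {cc.score([0.9]):.2f}, score(0.1) = {cc.score([0.1]):.2f}")
```

In this toy setup the ideal outputs are 1 for x = 0.9 (never seen in the biased sample) and about 1/3 for x = 0.1 (twice as dense in the biased sample); a linear member only approximates these values, but the scores should fall on the expected sides of 0.5. Averaging several members trained on different balanced resamples is one common way to reduce the variance of an individual classifier's output.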
