      DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning

      research-article

          Abstract

          Background

          Mining high-throughput screening (HTS) assays is key to better decisions in drug repositioning and drug discovery. However, developing suitable and accurate methods for extracting useful information from these assays poses many challenges. Virtual screening and the wide variety of databases, methods and solutions proposed to date have not completely overcome these challenges. This study is based on a multi-label classification (MLC) technique for modeling correlations between several HTS assays, meaning that a single prediction represents a subset of correlated labels rather than a single label. The devised method thus increases the probability of accurate predictions for compounds that were not tested in particular assays.
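To make the multi-label setting concrete, the following is a minimal sketch and not DRABAL's implementation. It assumes a toy label table in which every compound has one outcome per assay (1 = active, 0 = inactive, None = not tested). The compound names, assay names, and the helper functions `label_subset` and `agreement` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: one row per compound, one entry per assay.
# 1 = active, 0 = inactive, None = compound was not tested in that assay.
ASSAYS = ["assay_A", "assay_B", "assay_C"]

labels = {
    "cmpd_1": [1, 1, None],
    "cmpd_2": [0, None, 0],
    "cmpd_3": [1, 1, 1],
    "cmpd_4": [0, 1, None],
}


def label_subset(row):
    """Return the set of assays in which a compound is labeled active."""
    return {assay for assay, y in zip(ASSAYS, row) if y == 1}


def agreement(label_table, i, j):
    """Fraction of compounds tested in both assays i and j with the same outcome.

    Returns None when no compound was tested in both assays.
    """
    pairs = [
        (row[i], row[j])
        for row in label_table.values()
        if row[i] is not None and row[j] is not None
    ]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(label_subset(labels["cmpd_1"]))
    print(agreement(labels, 0, 1))
```

A multi-label prediction for a compound is then a set such as the one returned by `label_subset`, and a simple pairwise statistic like `agreement` hints at why correlated assays can inform predictions for the untested (None) entries.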

          Results

          Here we present DRABAL, a novel MLC solution that incorporates structure learning of a Bayesian network as a step to model dependencies between the HTS assays. In this study, DRABAL was used to process more than 1.4 million interactions of over 400,000 compounds and to analyze the existing relationships between five large HTS assays from the PubChem BioAssay Database. Compared to different MLC methods, DRABAL significantly improves the F1 score by about 22%, on average. We further illustrate the utility of DRABAL by screening FDA-approved drugs and reporting those that have a high probability of interacting with several targets, thus enabling drug-multi-target repositioning. Specifically, DRABAL suggests the drug Thiabendazole as a common activator of the NPC1 and Rab-9A proteins, both of which are targeted by assays designed to identify treatment modalities for Niemann–Pick type C disease.
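For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it per assay and averages over assays (a macro average). The abstract does not state which averaging scheme the authors used, so the macro average here is an assumption, and the function names `f1_score` and `macro_f1` are illustrative only.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(Y_true, Y_pred):
    """Average the per-assay F1 over the columns of two label matrices.

    Assumption: macro averaging; the abstract does not specify the scheme.
    """
    n_labels = len(Y_true[0])
    scores = [
        f1_score([row[j] for row in Y_true], [row[j] for row in Y_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


if __name__ == "__main__":
    # Toy matrices: 4 compounds (rows) x 2 assays (columns).
    Y_true = [[1, 1], [1, 0], [0, 1], [0, 0]]
    Y_pred = [[1, 1], [0, 0], [1, 1], [0, 0]]
    print(macro_f1(Y_true, Y_pred))
```

In this toy case the first assay has one true positive, one false positive and one false negative, and the second assay is predicted perfectly.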

          Conclusion

          We developed a novel MLC solution based on a Bayesian active learning framework to overcome the lack of fully labeled training data and to exploit actual dependencies between the HTS assays. The solution is motivated by the need to model dependencies between existing experimental confirmatory HTS assays and to improve prediction performance. We have conducted extensive experiments over several HTS assays and have shown the advantages of DRABAL. The datasets and programs can be downloaded from https://figshare.com/articles/DRABAL/3309562.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13321-016-0177-8) contains supplementary material, which is available to authorized users.

                Author and article information

                Contributors
                othman.soufan@kaust.edu.sa
                wail.baalawi@kaust.edu.sa
                moataz.afeef@kaust.edu.sa
                magbubah.essack@kaust.edu.sa
                panos.kalnis@kaust.edu.sa
                vladimir.bajic@kaust.edu.sa
                Journal
                Journal of Cheminformatics (J Cheminform)
                Springer International Publishing (Cham)
                ISSN: 1758-2946
                Published: 10 November 2016
                Volume: 8
                Article number: 64
                Affiliations
                [1] Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900 Saudi Arabia
                [2] Infocloud Group, Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900 Saudi Arabia
                Author information
                http://orcid.org/0000-0001-5435-4750
                Article
                Article ID: 177
                DOI: 10.1186/s13321-016-0177-8
                PMC ID: 5105261
                PubMed ID: 27895719
                © The Author(s) 2016

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 15 May 2016
                Accepted: 3 November 2016
                Funding
                Funded by: King Abdullah University of Science and Technology (KAUST) and KAUST Office of Sponsored Research (OSR)
                Award ID: URF/1/1976-02
                Categories
                Research Article
                Chemoinformatics