      ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics

      research-article

          Abstract

          Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high-quality dataset is a vital step in realizing this goal, and this work aims to compile such a comprehensive chemogenomics dataset. The dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL), including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data, not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13321-017-0203-5) contains supplementary material, which is available to authorized users.

          Most cited references (18)

          BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology

          BindingDB, www.bindingdb.org, is a publicly accessible database of experimental protein-small molecule interaction data. Its collection of over a million data entries derives primarily from scientific articles and, increasingly, US patents. BindingDB provides many ways to browse and search for data of interest, including an advanced search tool, which can cross searches of multiple query types, including text, chemical structure, protein sequence and numerical affinities. The PDB and PubMed provide links to data in BindingDB, and vice versa; and BindingDB provides links to pathway information, the ZINC catalog of available compounds, and other resources. The BindingDB website offers specialized tools that take advantage of its large data collection, including ones to generate hypotheses for the protein targets bound by a bioactive compound, and for the compounds bound by a new protein of known sequence; and virtual compound screening by maximal chemical similarity, binary kernel discrimination, and support vector machine methods. Specialized data sets are also available, such as binding data for hundreds of congeneric series of ligands, drawn from BindingDB and organized for use in validating drug design methods. BindingDB offers several forms of programmatic access, and comes with extensive background material and documentation. Here, we provide the first update of BindingDB since 2007, focusing on new and unique features and highlighting directions of importance to the field as a whole.

            Entrez Gene: gene-centered information at NCBI

            Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.

              The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo-and Bioinformatics

              The Chemistry Development Kit (CDK) is a freely available open-source Java library for Structural Chemo-and Bioinformatics. Its architecture and capabilities as well as the development as an open-source project by a team of international collaborators from academic and industrial institutions is described. The CDK provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given.

                Author and article information

                Contributors
                Jiangming.Sun@astrazeneca.com
                jeliazkova.nina@gmail.com
                vchupakh@its.jnj.com
                jgolibdz@ITS.JNJ.com
                Ola.Engkvist@astrazeneca.com
                Lars.A.Carlsson@astrazeneca.com
                jwegner@its.jnj.com
                hceulema@its.jnj.com
                ivan@jonan.info
                vedrin.jeliazkov@gmail.com
                nick@uni-plovdiv.net
                ashby@imec.be
                hongming.chen@astrazeneca.com
                Journal
                Journal of Cheminformatics (J Cheminform)
                Springer International Publishing (Cham)
                ISSN: 1758-2946
                Published: 7 March 2017
                Volume: 9
                Article number: 17
                Affiliations
                [1] Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 43183 Mölndal, Sweden
                [2] Ideaconsult Ltd., 4. Angel Kanchev Str., 1000 Sofia, Bulgaria
                [3] Computational Biology, Discovery Sciences, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2349 Beerse, Belgium (ISNI 0000 0004 0623 0341, GRID grid.419619.2)
                [4] Computational Biology, Discovery Sciences, Janssen Cilag SA, Calle Río Jarama, 71A, 45007 Toledo, Spain
                [5] Department of Analytical Chemistry and Computer Chemistry, University of Plovdiv, Plovdiv, Bulgaria (ISNI 0000 0001 1014 775X, GRID grid.11187.3e)
                [6] Imec vzw, Kappeldreef 75, 3001 Louvain, Belgium (ISNI 0000 0001 2215 0390, GRID grid.15762.37)
                Article
                203
                DOI: 10.1186/s13321-017-0203-5
                5340785
                925bd624-80ee-404f-bb5e-2b5b1d93e071
                © The Author(s) 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 5 December 2016
                Accepted: 24 February 2017
                Funding
                Funded by: EU Horizon 2020 project ExCAPE (Exascale Compound Activity Prediction Engines)
                Award ID: 671555
                Categories
                Database
                Custom metadata
                Chemoinformatics
                Keywords: big data, bioactivity, chemogenomics, chemical structure, molecular fingerprints, search engine, QSAR
