      Is Open Access

      The Molecule Cloud - compact visualization of large collections of molecules





          Analysis and visualization of large collections of molecules is one of the most frequent challenges facing cheminformatics experts in the pharmaceutical industry. Various sophisticated methods are available to perform this task, including clustering, dimensionality reduction and scaffold frequency analysis. All of them, however, still require viewing and analyzing large tables of molecular structures. We present a new visualization technique that provides basic information about the composition of molecular data sets at a single glance.


          We present a method that visually represents the most common structural features of chemical databases in the form of a cloud diagram. The frequency of molecules containing a particular substructure is indicated by the size of the respective structural image. The method is useful for quickly perceiving the most prominent structural features present in a data set. The approach was inspired by the popular word cloud diagrams that are used to visualize textual information in a compact form, and we therefore call it the “Molecule Cloud”. The method also supports visualization of additional information by color coding, for example the biological activity of molecules containing a given scaffold or the protein target class typical for particular scaffolds. A detailed description of the algorithm is provided, allowing easy implementation of the method with any cheminformatics toolkit. The layout algorithm is available as open source Java code.
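          The frequency-to-size mapping described above can be illustrated with a minimal Python sketch. It is not the authors' Java layout code: the function name `scaffold_image_sizes`, the logarithmic scaling, the pixel bounds and the SMILES strings used as scaffold labels are all assumptions made for illustration, and scaffold extraction is assumed to have been done beforehand by a cheminformatics toolkit.

```python
import math
from collections import Counter


def scaffold_image_sizes(scaffolds, top_n=5, min_px=40, max_px=200):
    """Map the top_n most frequent scaffolds to an image edge length in pixels.

    `scaffolds` holds one scaffold label (e.g. a SMILES string) per molecule.
    Log scaling and the pixel bounds are illustrative assumptions, not
    values taken from the article.
    """
    counts = Counter(scaffolds).most_common(top_n)
    if not counts:
        return {}

    # Log scaling keeps rare scaffolds visible next to very common ones.
    logs = [math.log(count) for _, count in counts]
    lo, hi = min(logs), max(logs)
    span = hi - lo

    sizes = {}
    for (scaffold, _count), log_count in zip(counts, logs):
        # If all frequencies are equal, draw every image at full size.
        frac = 1.0 if span == 0 else (log_count - lo) / span
        sizes[scaffold] = round(min_px + frac * (max_px - min_px))
    return sizes


if __name__ == "__main__":
    # Hypothetical data set: 100 benzene, 10 pyridine, 1 cyclohexane scaffold.
    data = ["c1ccccc1"] * 100 + ["c1ccncc1"] * 10 + ["C1CCCCC1"]
    for scaffold, px in scaffold_image_sizes(data).items():
        print(scaffold, px)
```

          Under these assumptions the most frequent scaffold is drawn at `max_px` and the least frequent of the selected scaffolds at `min_px`. A color could be attached to each scaffold in the same way to encode activity or target class.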


          Visualization of large molecular data sets using the Molecule Cloud approach allows scientists to easily get information about the composition of molecular databases and their most frequent structural features. The method may be used in areas where analysis of large molecular collections is needed, for example the processing of high throughput screening results, virtual screening or compound purchasing. Several example visualizations of large data sets, including the PubChem, ChEMBL and ZINC databases, are provided as Molecule Cloud diagrams.



                Author and article information

                Journal: Journal of Cheminformatics (J Cheminform)
                Publisher: BioMed Central
                Published: 6 July 2012
                Volume: 4
                Article number: 12
                [1] Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056, Basel, Switzerland
                Copyright ©2012 Ertl and Rohde; licensee Chemistry Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                : 14 May 2012
                : 6 July 2012

                Keywords: open source, molecule cloud, scaffold analysis, visualization, chemical databases

