
      CheS-Mapper 2.0 for visual validation of (Q)SAR models

      Journal of Cheminformatics
      BioMed Central
      Keywords: Visualization, Validation, (Q)SAR, 3D space


          Abstract

          Background

          Sound statistical validation is important for evaluating and comparing the overall performance of (Q)SAR models. However, classical validation does not help the user to better understand the properties of the model or the underlying data. Even though a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking.

          Results

          We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0 that facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: it is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints.
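          Comparing predictions to actual activity for both endpoint types can be illustrated with a small per-compound score. The Python sketch below is an illustration under stated assumptions, not CheS-Mapper's implementation: the function names `discrepancy` and `flag_compounds`, the scaling of residuals by the range of the actual values, and the default 0.5 highlight threshold are hypothetical choices made here. Because it looks only at endpoint values, it is model-independent in the same sense as the approach described above.

```python
from typing import List, Sequence


def discrepancy(actual: Sequence, predicted: Sequence, qualitative: bool) -> List[float]:
    """Per-compound disagreement between actual and predicted endpoint values (hypothetical helper).

    Qualitative endpoints: 1.0 for a misclassified compound, 0.0 otherwise.
    Quantitative endpoints: absolute residual divided by the range of the actual
    values (0.0 = perfect prediction; may exceed 1.0 for far-off predictions).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if qualitative:
        return [0.0 if a == p else 1.0 for a, p in zip(actual, predicted)]
    span = (max(actual) - min(actual)) or 1.0  # avoid dividing by zero for constant data
    return [abs(a - p) / span for a, p in zip(actual, predicted)]


def flag_compounds(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of compounds whose score exceeds the (hypothetical) threshold, worst first."""
    return sorted((i for i, s in enumerate(scores) if s > threshold), key=lambda i: -scores[i])


# Qualitative endpoint: the second compound is misclassified.
print(discrepancy(["active", "inactive", "active"], ["active", "active", "active"], qualitative=True))
# -> [0.0, 1.0, 0.0]

# Quantitative endpoint: residuals 0, 1 and 2 over an activity range of 4.
scores = discrepancy([1.0, 3.0, 5.0], [1.0, 4.0, 3.0], qualitative=False)
print(scores, flag_compounds(scores, threshold=0.2))
# -> [0.0, 0.25, 0.5] [2, 1]
```

          In a 3D viewer, a score like this could drive a color scale so that groups of poorly predicted compounds stand out against the rest of the dataset.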

          Conclusions

          Visual validation with CheS-Mapper enables analyzing (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals whether the endpoint is modeled too specifically or too generically, and it highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.
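          One common operational definition of an activity cliff is a pair of structurally similar compounds whose activities differ strongly. The following Python sketch, which is an assumption rather than the procedure used in CheS-Mapper, finds such pairs using Tanimoto similarity on fingerprints represented as sets of on-bit indices, and then checks whether a model's predictions reproduce each cliff. The helper names and the thresholds `min_sim=0.7` and `min_diff=1.0` are hypothetical and would need tuning for a real dataset and endpoint.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # indices of bits set to 1 (hypothetical representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets."""
    union = len(a | b)
    # Convention used here: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


def activity_cliffs(
    fps: Sequence[Fingerprint],
    activity: Sequence[float],
    min_sim: float = 0.7,   # hypothetical similarity threshold
    min_diff: float = 1.0,  # hypothetical activity-difference threshold
) -> List[Tuple[int, int]]:
    """Return index pairs that are similar in structure but far apart in activity."""
    cliffs = []
    for i, j in combinations(range(len(fps)), 2):
        if tanimoto(fps[i], fps[j]) >= min_sim and abs(activity[i] - activity[j]) >= min_diff:
            cliffs.append((i, j))
    return cliffs


def cliff_reproduced(pair: Tuple[int, int], predicted: Sequence[float], min_diff: float = 1.0) -> bool:
    """True if the predictions also separate the two compounds by at least min_diff."""
    i, j = pair
    return abs(predicted[i] - predicted[j]) >= min_diff


# Compounds 0 and 1 share 7 of 9 bits (similarity 7/9) but differ by 3 activity units.
fps = [frozenset(range(1, 9)), frozenset(list(range(1, 8)) + [9]), frozenset({20, 21})]
actual = [5.0, 8.0, 5.1]
cliffs = activity_cliffs(fps, actual)
print(cliffs)  # -> [(0, 1)]
print(cliff_reproduced(cliffs[0], [5.2, 5.5, 5.0]))  # -> False: predictions smooth over the cliff
```

          A cliff whose predicted difference falls below the threshold suggests that the model smooths over the cliff, which is one behavior that inspecting predictions alongside actual activities could expose.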

          Graphical abstract

          Comparing actual and predicted activity values with CheS-Mapper.


                Author and article information

                Journal: Journal of Cheminformatics (J Cheminform)
                Publisher: BioMed Central
                ISSN: 1758-2946
                Published: 23 September 2014
                Volume 6, article 41
                Affiliations
                [1] Institute for Physics, Albert-Ludwigs-Universität Freiburg, Hermann Herder Str. 3, Freiburg D-79104, Germany
                [2] Information Systems, Institut für Informatik, Johannes Gutenberg-Universität Mainz, Staudingerweg 9, Mainz D-55128, Germany
                Article ID: s13321-014-0041-7
                DOI: 10.1186/s13321-014-0041-7
                PMCID: PMC4186979
                Copyright © 2014 Gütlein et al.; licensee Springer.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 7 April 2014
                Accepted: 29 August 2014
                Categories: Software, Chemoinformatics
