      Predictive Capability of QSAR Models Based on the CompTox Zebrafish Embryo Assays: An Imbalanced Classification Problem

      research-article

          Abstract

          The CompTox Chemistry Dashboard (ToxCast) contains one of the largest public databases on zebrafish (Danio rerio) developmental toxicity. The data consist of 19 toxicological endpoints measured on 1018 unique compounds in relatively low concentration ranges. The endpoints relate to developmental effects occurring in dechorionated zebrafish embryos over 120 hours post fertilization, monitored via gross malformations and mortality. We report the predictive capability of 209 quantitative structure–activity relationship (QSAR) models developed by machine learning methods, using penalization techniques and diverse model quality metrics to cope with the imbalanced endpoints. All these QSAR models were generated to test how well the imbalanced classification (toxic or non-toxic) endpoints could be predicted, regardless of which of three algorithms is used: logistic regression, multi-layer perceptron, or random forests. Additionally, the QSAR toxicity models were developed starting from sets of classical molecular descriptors, structural fingerprints, and their combinations. Only 8 out of 209 models exceeded a Matthews correlation coefficient of 0.20, the value defined a priori as the threshold for acceptable model quality on the test sets. The best models were obtained for the endpoints mortality (MORT), ActivityScore, and JAW (deformation). The low predictability of the QSAR models developed from the zebrafish embryotoxicity data in the database is attributed mainly to the higher sensitivity of the 19 endpoint measurements carried out on dechorionated embryos at low concentrations.
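The abstract judges models by the Matthews correlation coefficient (MCC) rather than by accuracy. The sketch below illustrates why that matters for imbalanced endpoints. It is not the authors' code: the confusion-matrix counts, the 10% toxic fraction, and the names `always_negative` and `some_signal` are hypothetical, chosen only for illustration. The 0.20 threshold is the one value taken from the abstract.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (e.g. the model never
    predicts the positive class), a common convention for that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Threshold for acceptable model quality, as stated in the abstract.
MCC_THRESHOLD = 0.20

# Hypothetical test set: 200 compounds, 20 of them toxic (10% minority class).
# Classifier A labels every compound non-toxic.
always_negative = dict(tp=0, fp=0, tn=180, fn=20)
# Classifier B recovers 8 of the 20 toxic compounds with 12 false alarms.
some_signal = dict(tp=8, fp=12, tn=168, fn=12)

for name, counts in [("always_negative", always_negative),
                     ("some_signal", some_signal)]:
    mcc = matthews_corrcoef(**counts)
    verdict = "passes" if mcc > MCC_THRESHOLD else "fails"
    print(f"{name}: accuracy={accuracy(**counts):.2f}, "
          f"MCC={mcc:.2f} ({verdict} the {MCC_THRESHOLD} threshold)")
```

With these illustrative counts, the trivial classifier reaches the higher accuracy (0.90 versus 0.88) yet has an MCC of 0, while the classifier that actually identifies some toxic compounds reaches an MCC of about 0.33. This is the kind of distinction an accuracy-only evaluation would hide on imbalanced data.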

                Author and article information

                Contributors
                Role: Academic Editor
                Journal
                Molecules
                MDPI
                1420-3049
                15 March 2021
                Volume: 26
                Issue: 6
                Article number: 1617
                Affiliations
                [1] Know-Center, Inffeldgasse 13, 8010 Graz, Austria; mlovric@know-center.at (M.L.); rkern@know-center.at (R.K.)
                [2] Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia; olga.malev@irb.hr
                [3] Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov Trg 6, 10000 Zagreb, Croatia; goran.klobucar@biol.pmf.hr
                [4 ]Institute of Interactive Systems and Data Science, TU Graz, Inffeldgasse 16c, 8010 Graz, Austria
                [5 ]Department of Chemical Engineering, Pukyong National University, Busan 608-739, Korea
                Author notes
                [*] Correspondence: jayliu@pknu.ac.kr (J.J.L.); lucic@irb.hr (B.L.); Tel.: +82-51-629-6453 (J.J.L.); +385-1-456-1111 (B.L.)
                Author information
                https://orcid.org/0000-0002-3541-9624
                https://orcid.org/0000-0002-0838-4593
                https://orcid.org/0000-0003-0202-6100
                https://orcid.org/0000-0001-7232-2007
                Article
                molecules-26-01617
                10.3390/molecules26061617
                7998177
                33803931
                d911dc00-47a4-443c-8f88-57beef1e0360
                © 2021 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 05 February 2021
                Accepted: 11 March 2021
                Categories
                Article

                predictive QSAR, toxicity, ToxCast, zebrafish embryo, RDKit, structural descriptors, structural fingerprints, machine learning, imbalanced classification, aquatic toxicology
