      Critical Assessment of Small Molecule Identification 2016: automated methods

          Abstract

          Background

          The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest (www.casmi-contest.org) was held in 2016 with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3 (without and with metadata, respectively), from the organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and for small molecule annotation/identification.

          Results

          The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in “Category 2: Best Automatic Structural Identification—In Silico Fragmentation Only”, won by Team Brouard with 41% challenge wins. The winner of “Category 3: Best Automatic Structural Identification—Full Information” was Team Kind (MS-FINDER), with 76% challenge wins. The best methods achieved over 30% Top 1 ranks in Category 2, and all methods ranked the correct candidate in the Top 10 in around 50% of challenges. In Category 3 this success rate rose to 70% Top 1 ranks, with the correct candidate in the Top 10 in over 80% of challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways.
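The Top 1 and Top 10 figures are success rates over challenges: the share of challenges in which the correct structure appears at or above a given rank in a method's candidate list. The Python snippet below is a minimal sketch of that calculation, assuming each challenge yields a single 1-based rank for the correct candidate (or no rank if the correct structure was missing from the list). The function name `top_k_rate` and the example ranks are hypothetical and are not CASMI 2016 data; the contest's own handling of tied scores may differ.

```python
from typing import Optional, Sequence


def top_k_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of challenges whose correct candidate is ranked within the top k.

    Sketch only: assumes one integer rank per challenge (1 = best), or None
    when the correct structure was absent from the candidate list. Ties are
    not modelled here.
    """
    if not ranks:
        raise ValueError("need at least one challenge")
    if k < 1:
        raise ValueError("k must be >= 1")
    # Count challenges where the correct candidate has a rank of k or better.
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for ten challenges (illustrative only, not contest results).
example_ranks = [1, 3, None, 1, 12, 2, 1, 7, 25, 1]

for k in (1, 3, 10):
    print(f"Top {k}: {top_k_rate(example_ranks, k):.0%}")
```

With these made-up ranks the loop prints Top 1: 40%, Top 3: 60% and Top 10: 70%. Because the denominator is the number of challenges, a missing correct candidate counts as a failure at every cut-off.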

          Conclusions

          The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. The high rates of correct candidates in the Top 1 and Top 10, achieved despite large candidate numbers, open up great possibilities for high-throughput annotation of “known unknowns” in untargeted analysis. As more high-quality training data become available, machine learning methods will likely continue to improve, but the alternative approaches still provide valuable complementary information. Better integration of experimental context will also further improve identification success for “real life” annotations. The true “unknown unknowns” remain to be evaluated in future CASMI contests.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13321-017-0207-1) contains supplementary material, which is available to authorized users.

                Author and article information

                Contributors
                emma.schymanski@eawag.ch
                cruttkie@ipb-halle.de
                martin.krauss@ufz.de
                celine.brouard@aalto.fi
                tkind@ucdavis.edu
                kai.duehrkop@uni-jena.de
                felicity.allen@ualberta.ca
                avaniya@ucdavis.edu
                dries.verdegem@vib-kuleuven.be
                sebastian.boecker@uni-jena.de
                juho.rousu@aalto.fi
                huibin.shen@aalto.fi
                hiroshi.tsugawa@riken.jp
                tsajed@ualberta.ca
                ofiehn@ucdavis.edu
                bart.ghesquiere@vib-kuleuven.be
                sneumann@ipb-halle.de
Journal
Journal of Cheminformatics (J Cheminform)
Springer International Publishing (Cham)
ISSN: 1758-2946
Published: 27 March 2017
Volume 9, Article 22 (2017)
Affiliations
[1] Eawag: Swiss Federal Institute for Aquatic Science and Technology, Überlandstrasse 133, 8600 Dübendorf, Switzerland (ISNI 0000 0001 1551 0562; GRID grid.418656.8)
[2] Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany (ISNI 0000 0004 0493 728X; GRID grid.425084.f)
[3] Department of Effect-Directed Analysis, UFZ: Helmholtz Centre for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany (ISNI 0000 0004 0492 3830; GRID grid.7492.8)
[4] Department of Computer Science, Aalto University, Konemiehentie 2, 02150 Espoo, Finland (ISNI 0000 0001 0838 9418; GRID grid.5373.2)
[5] Helsinki Institute for Information Technology, Tekniikantie 14, 02150 Espoo, Finland
[6] West Coast Metabolomics Center and Genome Center, University of California Davis, 451 Health Sciences Drive, Davis, CA 95616, USA (ISNI 0000 0004 1936 9684; GRID grid.27860.3b)
[7] Chair of Bioinformatics, Friedrich-Schiller-University Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany (ISNI 0000 0001 1939 2794; GRID grid.9613.d)
[8] Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E9, Canada (GRID grid.17089.37)
[9] Department of Chemistry, University of California Davis, One Shields Avenue, Davis, CA 95616, USA (ISNI 0000 0004 1936 9684; GRID grid.27860.3b)
[10] Metabolomics Expertise Center, Vesalius Research Center (VRC), VIB, KU Leuven – University of Leuven, 3000 Louvain, Belgium (ISNI 0000 0001 0668 7884; GRID grid.5596.f)
[11] RIKEN Center for Sustainable Resource Science (CSRS), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan (ISNI 0000 0000 9446 5255; GRID grid.7597.c)
[12] Department of Biochemistry, Faculty of Sciences, King Abdulaziz University, Jeddah, Saudi Arabia (ISNI 0000 0001 0619 1117; GRID grid.412125.1)
                Author information
                http://orcid.org/0000-0001-6868-8145
Article
DOI: 10.1186/s13321-017-0207-1
PMCID: PMC5368104
PMID: 28316652
© The Author(s) 2017

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History
Received: 14 December 2016
Accepted: 13 March 2017
Funding
Funded by: European Commission (FundRef http://dx.doi.org/10.13039/501100000780); Award ID: 603437
Funded by: Academy of Finland; Award ID: 268874/MIDAS
Funded by: Deutsche Forschungsgemeinschaft (FundRef http://dx.doi.org/10.13039/501100001659); Award ID: BO 1910/16
Funded by: NSERC, AICML, AIHS, Genome Alberta, CIHR
Funded by: The Metabolomics Innovation Centre
Funded by: Leibniz Association
                Categories
                Research Article
Chemoinformatics
Keywords: compound identification, in silico fragmentation, high resolution mass spectrometry, metabolomics, structure elucidation
