10
views
0
recommends
+1 Recommend
1 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      In silico MS/MS spectra for identifying unknowns: a critical examination using CFM-ID algorithms and ENTACT mixture samples

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          High-resolution mass spectrometry (HRMS) enables rapid chemical annotation via accurate mass measurements and matching of experimentally derived spectra with reference spectra. Reference libraries are generated from chemical standards and are therefore limited in size relative to known chemical space. To address this limitation, in silico spectra (i.e., MS/MS or MS2 spectra), predicted via Competitive Fragmentation Modeling-ID (CFM-ID) algorithms, were generated for compounds within the U.S. Environmental Protection Agency’s (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database (totaling, at the time of analysis, ~ 765,000 substances). Experimental spectra from EPA’s Non-Targeted Analysis Collaborative Trial (ENTACT) mixtures ( n = 10) were then used to evaluate the performance of the in silico spectra. Overall, MS2 spectra were acquired for 377 unique compounds from the ENTACT mixtures. Approximately 53% of these compounds were correctly identified using a commercial reference library, whereas up to 50% were correctly identified as the top hit using the in silico library. Together, the reference and in silico libraries were able to correctly identify 73% of the 377 ENTACT substances. When using the in silico spectra for candidate filtering, an examination of binary classifiers showed a true positive rate (TPR) of 0.90 associated with false positive rates (FPRs) of 0.10 to 0.85, depending on the sample and method of candidate filtering. Taken together, these findings show the abilities of in silico spectra to correctly identify true positives in complex samples (at rates comparable to those observed with reference spectra), and efficiently filter large numbers of potential false positives from further consideration.

          Graphical abstract

          Electronic supplementary material

          The online version of this article (10.1007/s00216-019-02351-7) contains supplementary material, which is available to authorized users.

          Related collections

          Most cited references22

          • Record: found
          • Abstract: found
          • Article: not found

          Searching molecular structure databases with tandem mass spectra using CSI:FingerID.

          Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Hydrogen Rearrangement Rules: Computational MS/MS Fragmentation and Structure Elucidation Using MS-FINDER Software.

            Compound identification from accurate mass MS/MS spectra is a bottleneck for untargeted metabolomics. In this study, we propose nine rules of hydrogen rearrangement (HR) during bond cleavages in low-energy collision-induced dissociation (CID). These rules are based on the classic even-electron rule and cover heteroatoms and multistage fragmentation. We evaluated our HR rules by the statistics of MassBank MS/MS spectra in addition to enthalpy calculations, yielding three levels of computational MS/MS annotation: "resolved" (regular HR behavior following HR rules), "semiresolved" (irregular HR behavior), and "formula-assigned" (lacking structure assignment). With this nomenclature, 78.4% of a total of 18506 MS/MS fragment ions in the MassBank database and 84.8% of a total of 36370 MS/MS fragment ions in the GNPS database were (semi-) resolved by predicted bond cleavages. We also introduce the MS-FINDER software for structure elucidation. Molecular formulas of precursor ions are determined from accurate mass, isotope ratio, and product ion information. All isomer structures of the predicted formula are retrieved from metabolome databases, and MS/MS fragmentations are predicted in silico. The structures are ranked by a combined weighting score considering bond dissociation energies, mass accuracies, fragment linkages, and, most importantly, nine HR rules. The program was validated by its ability to correctly calculate molecular formulas with 98.0% accuracy for 5063 MassBank MS/MS records and to yield the correct structural isomer with 82.1% accuracy within the top-3 candidates. In a test with 936 manually identified spectra from an untargeted HILIC-QTOF MS data set of human plasma, formulas were correctly predicted in 90.4% of the cases, and the correct isomer structure was retrieved at 80.4% probability within the top-3 candidates, including for compounds that were absent in mass spectral libraries. The MS-FINDER software is freely available at http://prime.psc.riken.jp/ .
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra

              CFM-ID is a web server supporting three tasks associated with the interpretation of tandem mass spectra (MS/MS) for the purpose of automated metabolite identification: annotation of the peaks in a spectrum for a known chemical structure; prediction of spectra for a given chemical structure and putative metabolite identification—a predicted ranking of possible candidate structures for a target spectrum. The algorithms used for these tasks are based on Competitive Fragmentation Modeling (CFM), a recently introduced probabilistic generative model for the MS/MS fragmentation process that uses machine learning techniques to learn its parameters from data. These algorithms have been extensively tested on multiple datasets and have been shown to out-perform existing methods such as MetFrag and FingerId. This web server provides a simple interface for using these algorithms and a graphical display of the resulting annotations, spectra and structures. CFM-ID is made freely available at http://cfmid.wishartlab.com.
                Bookmark

                Author and article information

                Contributors
                chao.alex@epa.gov
                sobus.jon@epa.gov
                Journal
                Anal Bioanal Chem
                Anal Bioanal Chem
                Analytical and Bioanalytical Chemistry
                Springer Berlin Heidelberg (Berlin/Heidelberg )
                1618-2642
                1618-2650
                22 January 2020
                22 January 2020
                2020
                : 412
                : 6
                : 1303-1315
                Affiliations
                [1 ]Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711 USA
                [2 ]Student Contractor, U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711 USA
                [3 ]GRID grid.422638.9, ISNI 0000 0001 2107 5309, Present Address: Agilent Technologies Inc., ; Santa Clara, CA 95051 USA
                [4 ]General Dynamics Information Technology, 79 T.W. Alexander Drive, Research Triangle Park, NC 27709 USA
                [5 ]GRID grid.16008.3f, ISNI 0000 0001 2295 9843, Present Address: Luxembourg Centre for Systems Biomedicine, , University of Luxembourg, ; 4365 Esch-sur-Alzette, Luxembourg
                [6 ]U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711 USA
                [7 ]U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711 USA
                Author information
                http://orcid.org/0000-0002-7914-3623
                Article
                2351
                10.1007/s00216-019-02351-7
                7021669
                31965249
                e5d79499-3f98-42a3-b923-44e596328c29
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 4 October 2019
                : 27 November 2019
                : 11 December 2019
                Categories
                Research Paper
                Custom metadata
                © Springer-Verlag GmbH Germany, part of Springer Nature 2020

                Analytical chemistry
                non-targeted analysis,high-resolution mass spectrometry,cfm-id,entact,toxcast,dsstox

                Comments

                Comment on this article