
      In silico fragmentation for computer assisted identification of metabolite mass spectra





          Mass spectrometry has become the analytical method of choice in metabolomics research, and the identification of unknown compounds is its main bottleneck. In addition to the precursor mass, tandem MS spectra carry informative fragment peaks, but spectral libraries of measured reference compounds are far from covering the complete chemical space. Compound libraries such as PubChem or KEGG describe a larger number of compounds, and the in silico fragmentation of these compounds can be compared with the spectra of unknown metabolites.
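As a rough illustration of comparing in silico fragments with a measured spectrum, the sketch below collects the measured peaks that some predicted fragment m/z explains within a tolerance, and computes the fraction of total intensity those peaks carry. It is not MetFrag's scoring function, which this abstract does not give. The 10 ppm / 0.01 tolerances, the function names and the peak values are hypothetical. The fragment masses are assumed to be already converted to ion m/z for the chosen ionization mode.

```python
from typing import Iterable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def tolerance(mz: float, ppm: float, abs_tol: float) -> float:
    """Matching window: the larger of a relative (ppm) and an absolute tolerance."""
    return max(abs_tol, mz * ppm / 1e6)


def explained_peaks(measured: Iterable[Peak], fragment_mz: Iterable[float],
                    ppm: float = 10.0, abs_tol: float = 0.01) -> List[Peak]:
    """Return the measured peaks lying within tolerance of any in silico fragment m/z."""
    fragments = list(fragment_mz)
    matched = []
    for mz, intensity in measured:
        tol = tolerance(mz, ppm, abs_tol)
        if any(abs(mz - f) <= tol for f in fragments):
            matched.append((mz, intensity))
    return matched


def explained_intensity_fraction(measured: Iterable[Peak], fragment_mz: Iterable[float],
                                 ppm: float = 10.0, abs_tol: float = 0.01) -> float:
    """Fraction of the total measured intensity explained by a candidate's fragments."""
    peaks = list(measured)
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    hit = sum(intensity for _, intensity in explained_peaks(peaks, fragment_mz, ppm, abs_tol))
    return hit / total


# Hypothetical spectrum and hypothetical fragment m/z values of one candidate.
measured = [(77.0386, 120.0), (105.0335, 999.0), (150.0, 10.0)]
fragments = [77.0391, 105.0340]
print(len(explained_peaks(measured, fragments)))                    # 2
print(round(explained_intensity_fraction(measured, fragments), 4))  # 0.9911
```

Weighting by intensity rather than only counting peaks lets a candidate that explains the dominant fragments outrank one that explains many small peaks. Whether and how MetFrag weights peaks is not stated here.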


          We created the MetFrag suite to obtain a candidate list from compound libraries based on the precursor mass, which is subsequently ranked by the agreement between measured and in silico fragments. In the evaluation, MetFrag ranked most of the correct compounds within the top 3 candidates returned by an exact mass query in KEGG. Compared with a previously published study, MetFrag obtained better results than the commercial MassFrontier software. Especially for large compound libraries, the candidates with a good score are structurally very similar or differ only in stereochemistry; a subsequent clustering based on chemical distances reduces this redundancy. The in silico fragmentation requires less than a second to process a molecule, and on an average desktop PC MetFrag completes a search in KEGG or PubChem within 30 or 300 seconds on average, respectively.
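The retrieval, ranking and redundancy-reduction steps described above could be sketched as follows: filter a library by neutral precursor mass within a ppm window, sort by a score assumed to come from a spectrum comparison, and group candidates whose fingerprints are nearly identical. The abstract only says that clustering is based on chemical distances. The Tanimoto similarity on fingerprint bit sets, the greedy leader clustering, the 0.95 threshold, the 10 ppm window and all names are assumptions for illustration, not MetFrag's implementation. The neutral mass is assumed to have been derived from the precursor m/z beforehand.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Candidate:
    name: str                    # hypothetical library identifier
    monoisotopic_mass: float     # neutral monoisotopic mass (Da)
    fingerprint: FrozenSet[int]  # indices of set fingerprint bits (assumed precomputed)
    score: float = 0.0           # spectrum agreement, assumed computed elsewhere


def query_by_mass(library: List[Candidate], neutral_mass: float,
                  ppm: float = 10.0) -> List[Candidate]:
    """Exact-mass query: keep candidates within +/- ppm of the neutral precursor mass."""
    tol = neutral_mass * ppm / 1e6
    return [c for c in library if abs(c.monoisotopic_mass - neutral_mass) <= tol]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_ranked(candidates: List[Candidate],
                   threshold: float = 0.95) -> List[List[Candidate]]:
    """Greedy leader clustering in descending score order.

    Each candidate joins the first cluster whose leader (its best-scoring member)
    has Tanimoto similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters: List[List[Candidate]] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        for cluster in clusters:
            if tanimoto(cluster[0].fingerprint, cand.fingerprint) >= threshold:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    return clusters


library = [
    Candidate("cand_A", 180.0634, frozenset({1, 2, 3, 4}), score=0.90),
    Candidate("cand_B", 180.0634, frozenset({1, 2, 3, 4}), score=0.85),  # e.g. a stereoisomer of A
    Candidate("cand_C", 180.0634, frozenset({1, 5, 6}), score=0.50),
    Candidate("cand_D", 200.0000, frozenset({7, 8}), score=0.99),       # outside the mass window
]
hits = query_by_mass(library, 180.0634)
print([[c.name for c in cl] for cl in cluster_ranked(hits)])  # [['cand_A', 'cand_B'], ['cand_C']]
```

Candidates that differ only in stereochemistry usually share a 2D fingerprint, so they land in the same cluster and the user sees one representative per structural group instead of dozens of near-duplicates.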


          We presented a method that is able to identify small molecules from tandem MS measurements, even without spectral reference data or a large set of fragmentation rules. With today's massive general-purpose compound libraries we obtain dozens of very similar candidates, which still allows a confident estimate of the correct compound class. Our tool MetFrag improves the identification of unknown substances from tandem MS spectra and delivers better results than comparable commercial software. MetFrag is available as a web application, as web services and as a Java library. The web frontend allows the end user to analyse single spectra and browse the results, whereas the web service and console application are aimed at batch searches and evaluation.
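The abstract stresses that no large set of fragmentation rules is needed but does not describe the fragmentation algorithm itself. One generic, rule-free way to generate in silico fragments is to cut bonds of the molecular graph and keep the resulting connected pieces. The sketch below does this exhaustively for up to two broken bonds, so that ring bonds can also yield fragments. It rests on simplifying assumptions (hydrogens stay with their heavy atoms, no hydrogen shifts, no charge, no bond-energy weighting), and every name in it is hypothetical rather than MetFrag's implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Monoisotopic masses (Da) of the elements used in the toy example.
MONOISOTOPIC = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.99491461956}

Bond = Tuple[int, int]


def components(n_atoms: int, bonds: List[Bond]) -> List[FrozenSet[int]]:
    """Connected components of an undirected graph with atoms 0..n_atoms-1."""
    adjacency: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen: Set[int] = set()
    result: List[FrozenSet[int]] = []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            atom = stack.pop()
            comp.add(atom)
            for nb in adjacency[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(frozenset(comp))
    return result


def fragment_mass(elements: List[str], hydrogens: List[int], fragment: FrozenSet[int]) -> float:
    """Neutral mass of a fragment: heavy atoms plus their implicit hydrogens (no H rearrangement)."""
    return sum(MONOISOTOPIC[elements[i]] + hydrogens[i] * MONOISOTOPIC["H"] for i in fragment)


def in_silico_fragments(elements: List[str], hydrogens: List[int], bonds: List[Bond],
                        max_breaks: int = 2) -> Dict[FrozenSet[int], float]:
    """Break every combination of up to max_breaks bonds and record each resulting piece."""
    fragments: Dict[FrozenSet[int], float] = {}
    for k in range(1, max_breaks + 1):
        for broken in combinations(range(len(bonds)), k):
            kept = [bond for idx, bond in enumerate(bonds) if idx not in broken]
            pieces = components(len(elements), kept)
            if len(pieces) < 2:  # e.g. a single ring bond was cut: still one piece
                continue
            for piece in pieces:
                fragments.setdefault(piece, fragment_mass(elements, hydrogens, piece))
    return fragments


# Toy example: ethanol heavy-atom graph C0-C1-O2 with 3, 2 and 1 implicit hydrogens.
frags = in_silico_fragments(["C", "C", "O"], [3, 2, 1], [(0, 1), (1, 2)])
for atoms, mass in sorted(frags.items(), key=lambda kv: kv[1]):
    print(sorted(atoms), round(mass, 4))
```

The number of bond combinations grows as C(n_bonds, k), which is why a practical tool would limit the search depth or prune fragments. How MetFrag bounds this is not stated in the abstract.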



                Author and article information

                BMC Bioinformatics
                BioMed Central
                22 March 2010
                Volume 11, Article 148
                [1] Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, Weinberg 3, 06120 Halle (Saale), Germany
                [2] Institut für Informatik, Martin-Luther-Universität Halle-Wittenberg, Von-Seckendorffplatz 1, 06120 Halle (Saale), Germany
                Copyright © 2010 Wolf et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Methodology article

                Bioinformatics & Computational biology

