In silico fragmentation for computer assisted identification of metabolite mass spectra

Abstract

Background

Mass spectrometry has become the analytical method of choice in metabolomics research, but the identification of unknown compounds remains the main bottleneck. In addition to the precursor mass, tandem MS spectra carry informative fragment peaks, but spectral libraries of measured reference compounds are far from covering the complete chemical space. Compound libraries such as PubChem or KEGG describe a larger number of compounds, whose in silico fragmentation can be compared with the spectra of unknown metabolites.

Results

We created the MetFrag suite to obtain a candidate list from compound libraries based on the precursor mass; the candidates are subsequently ranked by the agreement between measured and in silico fragments. In the evaluation, MetFrag was able to rank most of the correct compounds within the top 3 candidates returned by an exact mass query in KEGG. Compared to a previously published study, MetFrag obtained better results than the commercial MassFrontier software. Especially for large compound libraries, the candidates with a good score show a high structural similarity or differ only in stereochemistry; a subsequent clustering based on chemical distances reduces this redundancy. The in silico fragmentation requires less than a second to process a molecule, and MetFrag performs a search in KEGG or PubChem on average within 30 or 300 seconds, respectively, on an average desktop PC.
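The two-step workflow described above (candidate retrieval by precursor mass, then ranking by fragment agreement) can be illustrated with a minimal Python sketch. It is not MetFrag's implementation: the abstract does not specify the scoring function, so the score here is simply the fraction of measured peaks explained by a candidate's fragments. The `Candidate` class, the function names, the precomputed `fragment_masses` field and the 10 ppm default tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical library entry; fields are illustrative assumptions."""
    name: str
    exact_mass: float             # neutral monoisotopic mass in Da
    fragment_masses: List[float]  # precomputed in silico fragment masses


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    """True if the two masses agree within a relative tolerance in ppm."""
    return abs(observed - theoretical) <= theoretical * ppm * 1e-6


def select_candidates(library: Sequence[Candidate], neutral_mass: float,
                      ppm: float = 10.0) -> List[Candidate]:
    """Step 1: exact mass query against the compound library."""
    return [c for c in library if within_ppm(neutral_mass, c.exact_mass, ppm)]


def score(candidate: Candidate, measured_peaks: Sequence[float],
          ppm: float = 10.0) -> float:
    """Step 2: fraction of measured peaks matched by any in silico fragment.

    This is a simplified stand-in, not the scoring used by MetFrag.
    """
    if not measured_peaks:
        return 0.0
    explained = sum(
        1 for peak in measured_peaks
        if any(within_ppm(peak, frag, ppm) for frag in candidate.fragment_masses)
    )
    return explained / len(measured_peaks)


def rank(library: Sequence[Candidate], neutral_mass: float,
         measured_peaks: Sequence[float],
         ppm: float = 10.0) -> List[Tuple[float, str]]:
    """Return (score, name) pairs for mass-matched candidates, best first."""
    hits = select_candidates(library, neutral_mass, ppm)
    scored = [(score(c, measured_peaks, ppm), c.name) for c in hits]
    # sorted() is stable, so equally scored candidates keep library order.
    return sorted(scored, key=lambda pair: -pair[0])
```

In this sketch, candidates with identical fragment sets (for example stereoisomers) receive identical scores, which is the redundancy that the clustering step mentioned in the abstract is meant to reduce.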

Conclusions

We presented a method that is able to identify small molecules from tandem MS measurements, even without spectral reference data or a large set of fragmentation rules. With today's massive general purpose compound libraries we obtain dozens of very similar candidates, which still allows a confident estimate of the correct compound class. Our tool MetFrag improves the identification of unknown substances from tandem MS spectra and delivers better results than comparable commercial software. MetFrag is available through a web application, web services and as a Java library. The web frontend allows the end-user to analyse single spectra and browse the results, whereas the web service and console application are intended for batch searches and evaluation.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2010

Publication date (Electronic): 22 March 2010

Volume: 11

Page: 148

Affiliations

[1] Leibniz Institute of Plant Biochemistry, Department of Stress- and Developmental Biology, Weinberg 3, 06120 Halle (Saale), Germany

[2] Institut für Informatik, Martin-Luther-Universität Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle (Saale), Germany

Article

Publisher ID: 1471-2105-11-148

DOI: 10.1186/1471-2105-11-148

PMC ID: 2853470

PubMed ID: 20307295

SO-VID: 2b7e5467-6f92-4ddc-bfd6-d6a728a99a6e

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 24 November 2009

Date accepted: 22 March 2010

