The mzTab Data Exchange Format: Communicating Mass-spectrometry-based Proteomics and Metabolomics Experimental Results to a Wider Audience*

Abstract

The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, at present, there is no format to describe the final qualitative and quantitative results for proteomics and metabolomics experiments in a simple tabular format. Many downstream analysis use cases are only concerned with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R.

We developed the mzTab file format for MS-based proteomics and metabolomics results to meet this need. mzTab is intended as a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML), providing a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files can contain protein, peptide, and small molecule identifications together with experimental metadata and basic quantitative information. The format is not intended to store the complete experimental evidence but provides mechanisms to report results at different levels of detail. These range from a simple summary of the final results to a representation of the results including the experimental design. This format is ideally suited to make MS-based proteomics and metabolomics results available to a wider biological community outside the field of MS. Several software tools for proteomics and metabolomics have already adopted mzTab as an output format. The comprehensive mzTab specification document and extensive additional documentation can be found online.
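As an illustration of why a tabular format is easy to consume downstream, the following minimal Python sketch splits mzTab-style text into metadata and result tables. It is not taken from the article: it assumes the three-letter line prefixes used in the mzTab 1.0 specification (`MTD` for metadata, `PRH`/`PRT`, `PEH`/`PEP`, `PSH`/`PSM`, and `SMH`/`SML` for header and data rows, `COM` for comments). The sample content, the accessions, and the function name `parse_mztab` are hypothetical, and the columns shown are only a small subset of what a valid file would carry.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, heavily shortened mzTab-style content (tab-separated).
# Accessions and values are made up; a valid file has many more columns.
SAMPLE = "\n".join([
    "COM\tIllustrative example, not real data",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tToy experiment",
    "PRH\taccession\tdescription\tbest_search_engine_score[1]",
    "PRT\tP12345\tExample protein A\t0.001",
    "PRT\tQ67890\tExample protein B\t0.02",
    "SMH\tidentifier\tchemical_formula",
    "SML\tCID:00001\tC6H12O6",
])

# Map each header-line prefix to the prefix of the data rows it describes.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab-style text into a metadata dict and per-section row lists."""
    metadata = {}
    headers = {}
    sections = defaultdict(list)
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines and comment lines.
        if not fields or fields[0] == "COM":
            continue
        prefix, rest = fields[0], fields[1:]
        if prefix == "MTD" and len(rest) >= 2:
            metadata[rest[0]] = rest[1]
        elif prefix in HEADER_TO_ROW:
            # Remember the column names for the matching data rows.
            headers[HEADER_TO_ROW[prefix]] = rest
        elif prefix in headers:
            sections[prefix].append(dict(zip(headers[prefix], rest)))
    return metadata, dict(sections)


metadata, sections = parse_mztab(SAMPLE)
```

Because every line carries its section prefix in the first column, the same file can also be opened directly in a spreadsheet or filtered by prefix in R; the sketch only makes that grouping explicit.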

Most cited references 28

Gene Ontology: tool for the unification of biology

Michael Ashburner, Catherine A. Ball, Judith Blake … (2002)

Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

HMDB: a knowledgebase for the human metabolome

David Wishart, Craig Knox, An Guo … (2009)

The Human Metabolome Database (HMDB, http://www.hmdb.ca) is a richly annotated resource that is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. Since its first release in 2007, the HMDB has been used to facilitate the research for nearly 100 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 2.0) has been significantly expanded and enhanced over the previous release (version 1.0). In particular, the number of fully annotated metabolite entries has grown from 2180 to more than 6800 (a 300% increase), while the number of metabolites with biofluid or tissue concentration data has grown by a factor of five (from 883 to 4413). Similarly, the number of purified compounds with reference to NMR, LC-MS and GC-MS spectra has more than doubled (from 380 to more than 790 compounds). In addition to this significant expansion in database size, many new database searching tools and new data content has been added or enhanced. These include better algorithms for spectral searching and matching, more powerful chemical substructure searches, faster text searching software, as well as dedicated pathway searching tools and customized, clickable metabolic maps. Changes to the user-interface have also been implemented to accommodate future expansion and to make database navigation much easier. These improvements should make the HMDB much more useful to a much wider community of users.

Open mass spectrometry search algorithm.

Lewis Geer, Sanford Markey, Jeffrey Kowalak … (2004)

Large numbers of MS/MS peptide spectra generated in proteomics experiments require efficient, sensitive and specific algorithms for peptide identification. In the Open Mass Spectrometry Search Algorithm (OMSSA), specificity is calculated by a classic probability score using an explicit model for matching experimental spectra to sequences. At default thresholds, OMSSA matches more spectra from a standard protein cocktail than a comparable algorithm. OMSSA is designed to be faster than published algorithms in searching large MS/MS datasets.

Author and article information

Journal

Journal ID (nlm-ta): Mol Cell Proteomics

Journal ID (iso-abbrev): Mol. Cell Proteomics

Journal ID (hwp): mcprot

Journal ID (pmc): mcprot

Journal ID (publisher-id): MCP

Title: Molecular & Cellular Proteomics : MCP

Publisher: The American Society for Biochemistry and Molecular Biology

ISSN (Print): 1535-9476

ISSN (Electronic): 1535-9484

Publication date (Print): October 2014

Publication date (Electronic): 30 June 2014

Publication date PMC-release: 30 June 2014

Volume: 13

Issue: 10

Pages: 2765-2775

Affiliations

[1]From the ‡European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK;

[2]§Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Vienna, Austria;

[3]‖Institute of Integrative Biology, University of Liverpool, L69 7ZB, Liverpool, UK;

[4]**Center for Bioinformatics and Department of Computer Science, University of Tübingen, D-72076 Tübingen, Germany;

[5]‡‡Computational Proteomics Unit and Cambridge Centre for Proteomics, Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, CB2 1QR, Cambridge, UK;

[6]§§Institute for Genomics and Bioinformatics, Graz University of Technology, Petersgasse 14/V, 8010 Graz, Austria;

[7]¶¶Core Facility Bioinformatics, Austrian Centre of Industrial Biotechnology (ACIB GmbH), Petersgasse 14/V, 8010 Graz, Austria;

[8]‖‖Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany;

[9]ᵃDepartment of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, 06120 Halle (Saale), Germany;

[10]ᵇSchool of Biological and Chemical Sciences, Queen Mary University of London, London, UK;

[11]ᶜCollege of Computer, Hubei University of Education, Wuhan, China;

[12]ᵈCenter for Computational Mass Spectrometry, University of California, San Diego, La Jolla, CA;

[13]ᵉSwiss-Prot group, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland;

[14]ᶠVital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, Genopode 1015 Lausanne;

[15]ᵍCenter of Integrative Genomics, University of Lausanne, Quartier Sorge Genopode, 1015 Lausanne;

[16]ʰQuantitative Biology Center, University of Tübingen, D-72076 Tübingen, Germany

Author notes

ⁱ To whom correspondence should be addressed: Dr. Juan Antonio Vizcaíno, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK, Tel.: 44-1223-492-610, Fax: 44-1223-494-484, E-mail: juan@ebi.ac.uk.

¶ These authors contributed to this work equally.

Article

Publisher ID: O113.036681

DOI: 10.1074/mcp.O113.036681

PMC ID: 4189001

PubMed ID: 24980485

SO-VID: 44c2cc34-61df-4fa1-9a41-d809566af271

License:

Author's Choice—Final version full access.

History

Date received: 11 December 2013

Date revision received: 20 June 2014

Funding

Funded by: National Institutes of Health

Award ID: 8P41GM103485-05
