
      Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective

      Biochimica et Biophysica Acta
      Elsevier Pub. Co
      Abbreviations: AMT, Accurate Mass Tag; ATAQS, Automated and Targeted Analysis with Quantitative SRM; CV, Controlled Vocabulary; DAO, Data Access Object; EBI, European Bioinformatics Institute; emPAI, exponentially modified Protein Abundance Index; FDR, False Discovery Rate; (HUPO)-PSI, (Human Proteome Organization) Proteomics Standards Initiative; GUI, Graphical User Interface; ICAT, Isotope-Coded Affinity Tags; ICPL, Isotope-Coded Protein Label; IPTL, Isobaric Peptide Termini Labeling; ISB, Institute for Systems Biology; iTRAQ, Isobaric Tag for Relative and Absolute Quantitation; JPL, Java Proteomic Library; LC-MS, Liquid Chromatography–Mass Spectrometry; LIMS, Laboratory Information Management System; MGF, Mascot Generic Format; MIAPE, Minimum Information About a Proteomics Experiment; MS, Mass Spectrometry; SILAC, Stable Isotope Labeling by Amino acids in Cell culture; PASSEL, PeptideAtlas SRM Experiment Library; PRIDE, PRoteomics IDEntifications (database); PSM, Peptide Spectrum Match; PTM, Post-Translational Modifications; RT, Retention Time; SRM, Selected Reaction Monitoring; TMT, Tandem Mass Tag; TOPP, The OpenMS Proteomics Pipeline; TPP, Trans-Proteomic Pipeline

      Keywords: Proteomics; Databases; Bioinformatics; Software libraries; Application programming interface; Open source software



          Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often among the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in proteomics has triggered the development of new software libraries, including freely available and open-source software. Although the objectives of these libraries and packages vary significantly, ranging from database search analysis to post-processing of the identification results, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of output files from various proteomics search engines, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and describe some of the freely available applications that make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
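One of the common use cases named in the abstract, the handling of peptide sequences, can be illustrated with a minimal sketch. It is not taken from any of the libraries reviewed: the function names `monoisotopic_mass` and `mz` are hypothetical, and the code assumes unmodified peptides built from the 20 standard amino acids, using standard monoisotopic residue masses.

```python
# Minimal sketch (hypothetical helper names, not an API of any reviewed library).
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one water is added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge carrier in positive-ion mode


def monoisotopic_mass(sequence: str) -> float:
    """Return the neutral monoisotopic mass of an unmodified peptide."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    except KeyError as err:
        # Assumption: anything outside the 20 standard residues is rejected.
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


def mz(sequence: str, charge: int) -> float:
    """Return the m/z of the peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (monoisotopic_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    print(round(monoisotopic_mass("PEPTIDE"), 4))
    print(round(mz("PEPTIDE", 2), 4))
```

For the peptide PEPTIDE this gives a neutral mass of about 799.36 Da. Production libraries additionally handle modifications, isotope distributions and non-standard residues, which this sketch deliberately leaves out.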


          • A review of existing open-source software for computational proteomics.

          • Available software for each step in a typical MS experiment is described.

          • OpenMS, TPP, compomics, ProteoWizard, JPL, PRIDE toolsuite are covered in detail.

          • Different programming languages are considered (Java, Perl, C++ or Python).

          Most cited references (129)


          The Proteomics Identifications (PRIDE) database and associated tools: status in 2013

          The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics in data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update about the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the quality control efforts that are ongoing at present and outline our future plans.

            Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

            We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.
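The idea in this abstract, using expectation maximization to separate correct from incorrect assignments and to attach a probability to each one, can be sketched on a single score per spectrum. The example below is a deliberate simplification and not the published model: it assumes a two-component Gaussian mixture over one score, ignores the number of tryptic termini, and uses hypothetical names (`fit_two_gaussians`, `prior_good`) and simulated scores.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (posteriors, params); posteriors[i] is the estimated probability
    that scores[i] belongs to the higher-scoring ("good") component.
    """
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    # Crude initialisation: place the two means inside the observed range.
    mean_bad, mean_good = lo + 0.25 * span, lo + 0.75 * span
    sd_bad = sd_good = span / 4.0
    prior_good = 0.5
    posteriors = [0.5] * len(scores)
    for _ in range(n_iter):
        # E-step: posterior probability of the "good" component per score.
        posteriors = []
        for s in scores:
            good = prior_good * normal_pdf(s, mean_good, sd_good)
            bad = (1.0 - prior_good) * normal_pdf(s, mean_bad, sd_bad)
            total = good + bad
            posteriors.append(good / total if total > 0.0 else 0.5)
        # M-step: re-estimate the parameters from the weighted scores.
        w_good = max(sum(posteriors), 1e-12)
        w_bad = max(len(scores) - sum(posteriors), 1e-12)
        mean_good = sum(p * s for p, s in zip(posteriors, scores)) / w_good
        mean_bad = sum((1.0 - p) * s for p, s in zip(posteriors, scores)) / w_bad
        var_good = sum(p * (s - mean_good) ** 2 for p, s in zip(posteriors, scores)) / w_good
        var_bad = sum((1.0 - p) * (s - mean_bad) ** 2 for p, s in zip(posteriors, scores)) / w_bad
        sd_good = max(math.sqrt(var_good), 1e-6)
        sd_bad = max(math.sqrt(var_bad), 1e-6)
        prior_good = w_good / len(scores)
    params = {
        "mean_good": mean_good, "sd_good": sd_good,
        "mean_bad": mean_bad, "sd_bad": sd_bad,
        "prior_good": prior_good,
    }
    return posteriors, params


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated scores: 300 "incorrect" around 0 and 200 "correct" around 5.
    simulated = [rng.gauss(0.0, 1.0) for _ in range(300)]
    simulated += [rng.gauss(5.0, 1.0) for _ in range(200)]
    _, fitted = fit_two_gaussians(simulated)
    print(fitted)
```

Filtering on the returned posteriors (for example, keeping assignments above a chosen probability) is what allows an expected error rate to be estimated for the retained set, which is the practical benefit the abstract describes.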

              TANDEM: matching proteins with tandem mass spectra.

              Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.

                Author and article information

                Journal: Biochimica et Biophysica Acta (Biochim. Biophys. Acta)
                Publisher: Elsevier Pub. Co
                Published: 1 January 2014
                Volume: 1844
                Issue: 1
                Pages: 63-76
                [a] EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
                [b] Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
                [c] Proteome Informatics Group, Swiss Institute of Bioinformatics, CMU - 1, rue Michel Servet, CH-1211 Geneva, Switzerland
                Author notes
                [*] Corresponding author at: EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. Tel.: +44 1223 492 610; fax: +44 1223 494 484. juan@ebi.ac.uk
                © 2014 Elsevier B.V.

                This document may be redistributed and reused, subject to certain conditions.

                History: 1 October 2012; 5 February 2013; 22 February 2013


