Blog
About

13
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Published and Perished? The Influence of the Searched Protein Database on the Long-Term Storage of Proteomics Data*

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or may even be deleted over time.

          To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database by November 2010. To map the submitted protein identifier to a currently active entry, two distinct approaches were used. The first approach used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second one (called logical mapping algorithm) accessed the source databases and retrieved the current status of the reported identifier.

          Our analysis showed the differences between the main protein databases (International Protein Index (IPI), UniProt Knowledgebase (UniProtKB), National Center for Biotechnological Information nr database (NCBI nr), and Ensembl) in respect to identifier stability. For example, whereas 20% of submitted IPI entries were deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results. For example, the PICR service reported 10% more IPI entries deleted compared with the logical mapping algorithm. We found several cases where experiments contained more than 10% deleted identifiers already at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still fitted the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB: two releases per year were used, from 2005. This analysis showed for the first time the true effect of changing protein identifiers on proteomics data. Based on these findings, UniProtKB seems the best database for applications that rely on the long-term storage of proteomics data.

          Related collections

          Most cited references 31

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Reactome: a database of reactions, pathways and biological processes

          Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is an Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSIQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Ensembl 2011

            The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              NCBI Reference Sequences: current status, policy and new initiatives

              NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 106 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.
                Bookmark

                Author and article information

                Journal
                Mol Cell Proteomics
                mcprot
                mcprot
                MCP
                Molecular & Cellular Proteomics : MCP
                The American Society for Biochemistry and Molecular Biology
                1535-9476
                1535-9484
                September 2011
                23 June 2011
                23 June 2011
                : 10
                : 9
                Affiliations
                From the ‡EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK;
                §Department of Medicine I, Medical University of Vienna, Borschkegasse 8a, 1090 Vienna, Austria
                Author notes
                ¶ To whom correspondence should be addressed: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. E-mail: juan@ 123456ebi.ac.uk .
                Article
                M111.008490
                10.1074/mcp.M111.008490
                3186200
                21700957
                © 2011 by The American Society for Biochemistry and Molecular Biology, Inc.

                Creative Commons Attribution Non-Commercial License applies to Author Choice Articles

                Product
                Categories
                Technological Innovation and Resources

                Molecular biology

                Comments

                Comment on this article