ProHits: an integrated software platform for mass spectrometry-based interaction proteomics


Abstract

Affinity purification coupled with mass spectrometric identification (AP-MS) is now a method of choice for charting novel protein-protein interactions, and has been applied in a large number of both small-scale and high-throughput studies [1]. However, general and intuitive computational tools for sample tracking, AP-MS data analysis, and annotation have not kept pace with rapid methodological and instrument improvements. To address this need, we developed the ProHits LIMS platform. ProHits is a complete open source software solution for MS-based interaction proteomics that manages the entire pipeline from raw MS data files to fully annotated protein-protein interaction datasets. ProHits was designed to provide an intuitive user interface from the biologist's perspective, and can accommodate multiple instruments within a facility, multiple user groups, multiple laboratory locations, and any number of parallel projects. ProHits can manage all project scales, and supports common experimental pipelines, including those utilizing gel-based separation, gel-free analysis, and multi-dimensional protein or peptide separation.

ProHits is a client-based HTML program written in PHP that uses a MySQL database hosted on a dedicated server. The complete ProHits software solution consists of two main components: a Data Management module and an Analyst module (Fig. 1a; see Supplementary Fig. 1 for data structure tables). These modules are supported by an Admin Office module, in which projects, instruments, user permissions and protein databases are managed (Supplementary Fig. 2). A simplified version of the software suite ("ProHits Lite"), consisting only of the Analyst module and Admin Office, is also available for users who have pre-existing data management solutions or who receive pre-computed search results from analyses performed in a core MS facility (Supplementary Fig. 3).
A step-by-step installation package, installation guide and user manual (see Supplementary Information) are available on the ProHits website (www.prohitsMS.com).

In the Data Management module, raw data from all mass spectrometers in a facility or user group are copied to a single secure storage location on a schedule. Data are organized by instrument, with folder and file organization mirroring that on the acquisition computer. ProHits also assigns unique identifiers to each folder and file. Log files and visual indicators of current connection status assist in monitoring the entire system. The Data Management module monitors the use of each instrument for reporting purposes (Supplementary Fig. 4–5). Raw MS files can be automatically converted to appropriate file formats using the open source ProteoWizard converters (http://proteowizard.sourceforge.net/). Converted files may be subjected to manual or automated database searches, followed by statistical analysis of the search results, according to any user-defined schedule; search engine parameters are also recorded to facilitate reporting and compliance with MIAPE guidelines [2]. Mascot [3], X!Tandem [4] and the Trans-Proteomic Pipeline (TPP) [5] are fully integrated with ProHits via linked search engine servers (Supplementary Fig. 6–7).

The Analyst module organizes data by project, bait, experiment and/or sample, for gel-based or gel-free approaches (Fig. 1a; for a description of a gel-based project, see Supplementary Fig. 8). To create and analyze a gel-free affinity purification sample, the user specifies the bait gene name and species. ProHits automatically retrieves the amino acid sequence and other annotation from its associated database. Bait annotation may then be modified as necessary, for example to specify the presence of an epitope tag or mutation (Supplementary Fig. 9). A comprehensive annotation page tracks experimental details (Supplementary Fig. 10), including descriptions of the Sample, Affinity Purification protocol, Peptide Preparation methodology, and LC-MS/MS procedures. Controlled vocabulary lists for experimental descriptions can be added via drop-down menus to facilitate compliance with annotation guidelines such as MIAPE [6] and MIMIx [7], and to facilitate the organization and retrieval of data files. Free text notes (for cross-referencing laboratory notebook pages, adding experimental details not captured in other sections, or describing deviations from reference protocols) and links to gel images or other file types may be added in the Experimental Detail page. Once an experiment is created, multiple samples may be linked to it, for example technical replicates of the same sample, or chromatographic fractions derived from the same preparation. All baits, experiments, samples and protocols are assigned unique identifiers.

Once a sample is created, it is linked to both the relevant raw files and database search results. For multiple samples in high-throughput (HTP) projects, automatic sample annotation may be established by using a standardized file naming system (Supplementary Fig. 11), or files may be manually linked. Alternatively, search results obtained outside of ProHits (with the X!Tandem or Mascot search engines) can be manually imported into the Analyst module (Supplementary Fig. 12). The ProHits Lite version enables uploading of external search results for users with an established MS data management system.

In the Analyst module, mass spectrometry data can be explored in an intuitive manner, and results from individual samples, experiments or baits can be viewed and filtered (Supplementary Fig. 13–14). The Comparison tool provides a user interface for aligning data from multiple baits or MS analyses. Data from individual MS runs, or from any user-defined sample group, are selected for visualization in a tabular format for side-by-side comparison (Fig. 1b; Supplementary Fig. 15–17).
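The side-by-side tabular comparison described above can be pictured as a simple pivot of hit records. The following is a minimal Python sketch under stated assumptions, not the ProHits implementation (which is written in PHP): the record layout, the sample and protein names, and the choice to show 0 for a protein not detected in a sample are all hypothetical.

```python
from collections import defaultdict

# Hypothetical hit records: (sample name, protein identifier, spectral count).
# These names and values are illustrative and are not taken from ProHits.
hits = [
    ("bait1_rep1", "P1", 12),
    ("bait1_rep1", "P2", 3),
    ("bait1_rep2", "P1", 9),
    ("control", "P2", 4),
]


def build_comparison_table(hits):
    """Pivot hit records into {protein: {sample: spectral_count}}."""
    table = defaultdict(dict)
    for sample, protein, count in hits:
        table[protein][sample] = count
    return dict(table)


def render(table, samples):
    """Return tab-separated rows: one row per protein, one column per sample."""
    lines = ["protein\t" + "\t".join(samples)]
    for protein in sorted(table):
        # Assumption: a protein not detected in a sample is shown as 0.
        cells = [str(table[protein].get(sample, 0)) for sample in samples]
        lines.append(protein + "\t" + "\t".join(cells))
    return lines


samples = ["control", "bait1_rep1", "bait1_rep2"]
table = build_comparison_table(hits)
for line in render(table, samples):
    print(line)
```

Placing the control in the first column is only a display choice in this sketch; any user-defined grouping of samples could be passed as the column list.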
In the Comparison view, control groups and individual baits, experiments or samples are displayed by column. Proteins identified in each MS run or group of runs are displayed by row, and each cell corresponds to a putative protein hit, according to a user-specified database search score cutoff. Cells display spectral counts, unique peptides, search engine scores, and/or protein coverage information; a mouse-over function reveals all associated data for each cell in the table. For each protein displayed in the Comparison view, an associated Peptide link (Fig. 1b) may also be selected to reveal information such as sequence, location, spectral counts, and score for each associated peptide. Importantly, all search results can be filtered. For example, ProHits allows the removal of non-specific background proteins from the hit list, as defined by negative controls, search engine score thresholds, or contaminant lists. Links to the external NCBI and BioGRID [8] databases are provided for each hit to facilitate data interpretation. Overlap with published interaction data housed in the BioGRID database [8] can be displayed to allow immediate identification of new interaction partners. A flexible export function enables visualization in a graphical format with Cytoscape [9], in which spectral counts, unique peptides, and search engine scores can be visualized as interaction edge attributes. The Analyst module also includes advanced search functions, bulk export functions for filtered or unfiltered data, and management of experimental protocols and background lists (e.g. Supplementary Fig. 18–20).

Deposition of all mass spectrometry-associated data in public repositories is likely to become mandatory for publication of proteomics experiments [2, 7, 10]. Open access to raw files is essential for data reanalysis and cross-platform comparison; however, data submission to public repositories can be laborious due to strict formatting requirements.
ProHits facilitates extraction of the necessary details in compliance with current standards, and generates Proteomics Standards Initiative (PSI) v2.5 compliant reports [11], either in the MITAB format for BioGRID [8] or in XML format for submission to IMEx consortium databases [12], including IntAct [13] (Supplementary Fig. 21). MS raw files associated with a given project can also be easily retrieved and grouped for submission to data repositories such as Tranche [14].

ProHits has been developed to manage many large-scale in-house projects, including a systematic analysis of kinase and phosphatase interactions in yeast, consisting of 986 affinity purifications [15]. Smaller-scale projects from individual laboratories are readily handled in a similar manner. Examples of AP-MS data from both yeast and mammalian projects are provided in a demonstration version of ProHits at www.prohitsMS.com, and in Supplementary documents. The modular architecture of ProHits will accommodate additional new features, as dictated by future experimental and analytical needs. Although ProHits has been designed to handle protein interaction data, simple modifications of the open source code will enable straightforward adaptation to other proteomics workflows.
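The background filtering and edge-attribute export described in this article can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the ProHits code: the field names, the score cutoff of 50, the example protein names and the tab-delimited layout are hypothetical. A table of this general shape, with one bait-prey pair per row, is the kind of file that Cytoscape can import as a network with edge attributes.

```python
import csv
import io

# Hypothetical bait-prey hits; field names and values are illustrative only.
hits = [
    {"bait": "BAIT1", "prey": "PREY_A", "spectral_count": 15,
     "unique_peptides": 6, "score": 250.0},
    {"bait": "BAIT1", "prey": "PREY_B", "spectral_count": 2,
     "unique_peptides": 1, "score": 40.0},
    {"bait": "BAIT1", "prey": "PREY_C", "spectral_count": 8,
     "unique_peptides": 4, "score": 120.0},
    {"bait": "BAIT1", "prey": "KERATIN", "spectral_count": 30,
     "unique_peptides": 9, "score": 400.0},
]

control_preys = {"PREY_C"}   # proteins also seen in negative-control runs
contaminants = {"KERATIN"}   # user-maintained contaminant list
MIN_SCORE = 50.0             # assumed search engine score cutoff

FIELDS = ["bait", "prey", "spectral_count", "unique_peptides", "score"]


def filter_hits(hits, control_preys, contaminants, min_score):
    """Keep hits that pass the score cutoff and are not background."""
    return [
        hit for hit in hits
        if hit["score"] >= min_score
        and hit["prey"] not in control_preys
        and hit["prey"] not in contaminants
    ]


def to_edge_table(hits):
    """Serialize hits as tab-delimited text, one interaction edge per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(hits)
    return buffer.getvalue()


kept = filter_hits(hits, control_preys, contaminants, MIN_SCORE)
edge_table = to_edge_table(kept)
print(edge_table)
```

In this sketch any prey seen in a control is removed outright; a real analysis might instead compare spectral counts between bait and control purifications before discarding a hit.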

Most cited references 11

TANDEM: matching proteins with tandem mass spectra.

Robertson Craig, Ronald C Beavis (2004)

Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.

Large-scale mapping of human protein–protein interactions by mass spectrometry

Rob Ewing, Peter Chu, Fred Elisma … (2007)

Mapping protein–protein interactions is an invaluable tool for understanding protein function. Here, we report the first large-scale study of protein–protein interactions in human cells using a mass spectrometry-based approach. The study maps protein interactions for 338 bait proteins that were selected based on known or suspected disease and functional associations. Large-scale immunoprecipitation of Flag-tagged versions of these proteins followed by LC-ESI-MS/MS analysis resulted in the identification of 24 540 potential protein interactions. False positives and redundant hits were filtered out using empirical criteria and a calculated interaction confidence score, producing a data set of 6463 interactions between 2235 distinct proteins. This data set was further cross-validated using previously published and predicted human protein interactions. In-depth mining of the data set shows that it represents a valuable source of novel protein–protein interactions with relevance to human diseases. In addition, via our preliminary analysis, we report many novel protein interactions and pathway associations.

IntAct—open source resource for molecular interaction data

S. Kerrien, Y. Alam-Faruque, Gloria Aranda … (2006)

IntAct is an open source database and software suite for modeling, storing and analyzing molecular interaction data. The data available in the database originates entirely from published literature and is manually annotated by expert biologists to a high level of detail, including experimental methods, conditions and interacting domains. The database features over 126 000 binary interactions extracted from over 2100 scientific publications and makes extensive use of controlled vocabularies. The web site provides tools allowing users to search, visualize and download data from the repository. IntAct supports and encourages local installations as well as direct data submission and curation collaborations. IntAct source code and data are freely available from .

Author and article information

Journal

Journal ID (nlm-journal-id): 9604648

Journal ID (pubmed-jr-id): 20305

Journal ID (nlm-ta): Nat Biotechnol

Title: Nature biotechnology

ISSN (Print): 1087-0156

ISSN (Electronic): 1546-1696

Publication date Nihms-submitted: 20 August 2010

Publication date (Print): October 2010

Publication date PMC-release: 1 April 2011

Volume: 28

Issue: 10

Pages: 1015-1017

Affiliations

[1] Centre for Systems Biology, Samuel Lunenfeld Research Institute, 600 University Avenue, Toronto, Ontario, M5G 1X5, Canada

[2] Department of Molecular Genetics, University of Toronto, 1 Kings College Circle, Toronto, Ontario, M5S 1A8, Canada

[3] Departments of Pathology and Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109-0602, USA

[4] Ontario Cancer Institute and McLaughlin Centre for Molecular Medicine, 101 College St, Toronto, ON, M5G 1L7, Canada

[5] Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Mayfield Road, Edinburgh, EH9 3JR, Scotland, UK

Author notes

[*] Correspondence should be addressed to M.T. (m.tyers@ed.ac.uk) or A.-C.G. (gingras@lunenfeld.ca)

Article

Manuscript ID: nihpa229581

DOI: 10.1038/nbt1010-1015

PMC ID: 2957308

PubMed ID: 20944583

SO-VID: 745266a4-8dff-4c9a-a78c-0e7a825045ef

License:

Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

Funding

Funded by: National Cancer Institute : NCI

Award ID: R01 CA126239-04
