Affinity purification coupled with mass spectrometric identification (AP-MS) is now
a method of choice for charting novel protein-protein interactions, and has been applied
in a large number of both small-scale and high-throughput studies1. However, general
and intuitive computational tools for sample tracking, AP-MS data analysis, and annotation
have not kept pace with rapid methodological and instrument improvements.
To address this need, we developed the ProHits LIMS platform. ProHits is a complete
open source software solution for MS-based interaction proteomics that manages the
entire pipeline from raw MS data files to fully annotated protein-protein interaction
datasets. ProHits was designed to provide an intuitive user interface from the biologist's
perspective, and can accommodate multiple instruments within a facility, multiple
user groups, multiple laboratory locations, and any number of parallel projects. ProHits
can manage all project scales, and supports common experimental pipelines, including
those utilizing gel-based separation, gel-free analysis, and multi-dimensional protein
or peptide separation.
ProHits is a web application written in PHP: users work through an HTML interface in
a browser, and data are stored in a MySQL database on a dedicated server. The complete
ProHits software solution consists of two main components:
a Data Management module, and an Analyst module (Fig. 1a; see Supplementary Fig. 1
for data structure tables). These modules are supported by an Admin Office module,
in which projects, instruments, user permissions and protein databases are managed
(Supplementary Fig. 2). A simplified version of the software suite (“ProHits Lite”),
consisting only of the Analyst module and Admin Office, is also available for users
with pre-existing data management solutions or who receive pre-computed search results
from analyses performed in a core MS facility (Supplementary Fig. 3). A step-by-step
installation package, installation guide and user manual (see Supplementary Information)
are available on the ProHits website (www.prohitsMS.com).
In the Data Management module, raw data from all mass spectrometers in a facility
or user group are copied on a schedule to a single secure storage location.
Data are organized by instrument, and the folder and file layout on the storage
server mirrors that on the acquisition computer. ProHits also assigns unique
identifiers to each folder and file. Log files and visual indicators of current connection
status assist in monitoring the entire system. The Data Management module monitors
the use of each instrument for reporting purposes (Supplementary Fig. 4–5). Raw MS
files can be automatically converted to appropriate file formats using the open source
ProteoWizard converters (http://proteowizard.sourceforge.net/). Converted files may
be subjected to manual or automated database searches, followed by statistical analysis
of the search results, according to any user-defined schedule; search engine parameters
are also recorded to facilitate reporting and compliance with MIAPE guidelines2. Mascot3,
X!Tandem4 and the Trans-Proteomic Pipeline (TPP5) are fully integrated with ProHits
via linked search engine servers (Supplementary Fig. 6–7).
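The mirroring and identifier assignment described above can be illustrated with a minimal Python sketch. It is not the ProHits implementation (which is written in PHP); the function name `plan_mirror`, the storage root `/storage` and the instrument name `LTQ1` are assumptions made for illustration only.

```python
from itertools import count
from pathlib import PurePosixPath


def plan_mirror(instrument, relative_paths, storage_root="/storage", start_id=1):
    """Plan a backup that mirrors the acquisition computer's folder layout.

    Returns one record per file, holding a unique identifier and a
    destination path that keeps the original relative path under an
    instrument-specific folder. All names here are hypothetical.
    """
    ids = count(start_id)
    instrument_root = PurePosixPath(storage_root) / instrument
    records = []
    for rel in sorted(relative_paths):
        records.append({
            "id": next(ids),                       # unique identifier per file
            "instrument": instrument,
            "source": rel,                         # path on the acquisition computer
            "destination": str(instrument_root / rel),
        })
    return records


records = plan_mirror("LTQ1", ["2009/runB.raw", "2009/runA.raw"])
for record in records:
    print(record["id"], record["destination"])
```

Keeping the relative path unchanged means that a file can be located on the storage server from its position on the instrument computer, while the identifier gives every file a stable handle for linking to samples later.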
The Analyst module organizes data by project, bait, experiment and/or sample, for
gel-based or gel-free approaches (Fig. 1a; for description of a gel-based project,
see Supplementary Fig. 8). To create and analyze a gel-free affinity purification
sample, the user specifies the bait gene name and species. ProHits automatically retrieves
the amino acid sequence and other annotation from its associated database. Bait annotation
may then be modified as necessary, for example to specify the presence of an epitope
tag or mutation (Supplementary Fig. 9). A comprehensive annotation page tracks experimental
details (Supplementary Fig. 10), including descriptions of the Sample, Affinity Purification
protocol, Peptide Preparation methodology, and LC-MS/MS procedures. Controlled vocabulary
lists for experimental descriptions can be added via drop-down menus to facilitate
compliance with annotation guidelines such as MIAPE6 and MIMIx7, and to facilitate
the organization and retrieval of data files. Free-text notes (for cross-referencing
laboratory notebook pages, adding experimental details not captured in other sections,
or describing deviations from reference protocols) and links to gel images or other
file types may be added in the Experimental Detail page. Once an experiment is created,
multiple samples may be linked to it, for example technical replicates of the same
sample, or chromatographic fractions derived from the same preparation. All baits,
experiments, samples and protocols are assigned unique identifiers.
Once a sample is created, it is linked to both the relevant raw files and database
search results. For multiple samples in high-throughput (HTP) projects, automatic sample annotation
may be established by using a standardized file naming system (Supplementary Fig.
11), or files may be manually linked. Alternatively, search results obtained outside
of ProHits (with the X!Tandem or Mascot search engines) can be manually imported into
the Analyst module (Supplementary Fig. 12). The ProHits Lite version enables uploading
of external search results for users with an established MS data management system.
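As an illustration of how a standardized file name can drive automatic sample annotation, the Python sketch below parses a purely hypothetical convention of the form `<sample ID>_<bait gene>_<label>.<extension>`. The naming scheme actually used by ProHits is the one described in Supplementary Fig. 11 and may differ; the pattern and the function name `parse_raw_name` are assumptions.

```python
import re

# Hypothetical convention: <sample ID>_<bait gene>_<free-text label>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<sample_id>\d+)_(?P<bait>[A-Za-z0-9-]+)_(?P<label>.+)\.(?P<ext>\w+)$"
)


def parse_raw_name(filename):
    """Return the annotation encoded in a file name, or None if it does not match."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None  # such a file would have to be linked to its sample manually
    info = match.groupdict()
    info["sample_id"] = int(info["sample_id"])
    return info


print(parse_raw_name("1042_CDC28_rep1.raw"))
print(parse_raw_name("notes.txt"))
```

Files whose names do not follow the convention return `None`, which corresponds to the manual linking route mentioned above.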
In the Analyst module, mass spectrometry data can be explored in an intuitive manner,
and results from individual samples, experiments or baits can be viewed and filtered
(Supplementary Fig. 13–14). The Comparison tool aligns data from multiple
baits or MS analyses. Data from individual MS runs, or derived
from any user-defined sample group, are selected for visualization in a tabular format,
for side-by-side comparisons (Fig. 1b; Supplementary Fig. 15–17). In the Comparison
view, control groups and individual baits, experiments or samples are displayed by
column. Proteins identified in each MS run or group of runs are displayed by row,
and each cell corresponds to a putative protein hit, according to a user-specified database
search score cutoff. Cells display spectral counts, numbers of unique peptides, scores
from search engines, and/or protein coverage information; a mouse-over function reveals
all associated data for each cell in the table. For each protein displayed in the
Comparison view, an associated Peptide link (Fig. 1b) may also be selected to reveal
information such as sequence, location, spectral counts, and score, for each associated
peptide. Importantly, all search results can be filtered. For example, ProHits allows
for the removal of non-specific background proteins from the hit list, as defined
by negative controls, search engine score thresholds, or contaminant lists. Links
to the external NCBI and BioGRID8 databases are provided for each hit to facilitate
data interpretation. Overlap with published interaction data housed in the BioGRID
database8 can be displayed to allow immediate identification of new interaction partners.
A flexible export function enables visualization in a graphical format with Cytoscape9,
in which spectral counts, unique peptides, and search engine scores can be visualized
as interaction edge attributes. The Analyst module also includes advanced search functions,
bulk export functions for filtered or unfiltered data, and management of experimental
protocols and background lists (e.g. Supplementary Fig. 18–20).
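The logic of the Comparison view and of background filtering can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ProHits code: hits are represented as `(sample, protein, spectral_count, score)` tuples, any protein passing the score cutoff in a control sample is treated as background, and the function names and example values are invented for the sketch.

```python
def comparison_table(hits, controls, min_score=0.0):
    """Build a protein-by-sample table of spectral counts.

    hits     : iterable of (sample, protein, spectral_count, score) tuples
    controls : set of sample names that serve as negative controls
    Proteins seen in any control at or above the cutoff are removed.
    """
    hits = list(hits)
    background = {protein for sample, protein, _count, score in hits
                  if sample in controls and score >= min_score}
    table = {}
    for sample, protein, spectral_count, score in hits:
        if sample in controls or score < min_score or protein in background:
            continue
        table.setdefault(protein, {})[sample] = spectral_count
    return table


def edge_rows(table):
    """Flatten the table into tab-delimited bait/prey rows with one edge attribute."""
    rows = ["bait\tprey\tspectral_count"]
    for prey in sorted(table):
        for bait in sorted(table[prey]):
            rows.append(f"{bait}\t{prey}\t{table[prey][bait]}")
    return rows


example_hits = [
    ("ctrl", "HSP70", 12, 80.0),
    ("baitA", "HSP70", 15, 90.0),
    ("baitA", "CDC37", 7, 60.0),
    ("baitB", "CDC37", 3, 20.0),
    ("baitB", "CKS1", 5, 55.0),
]
table = comparison_table(example_hits, controls={"ctrl"}, min_score=30.0)
print("\n".join(edge_rows(table)))
```

A delimited table of this kind, with one row per bait-prey pair, can be imported into Cytoscape as a network in which the extra column becomes an edge attribute.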
Deposition of all mass spectrometry-associated data in public repositories is likely
to become mandatory for publication of proteomics experiments2, 7, 10. Open access
to raw files is essential for data reanalysis and cross-platform comparison; however,
data submission to public repositories can be laborious due to strict formatting requirements.
ProHits facilitates extraction of the necessary details in compliance with current
standards, and generates Proteomics Standards Initiative (PSI) v2.5 compliant reports11,
either in the MITAB format for BioGRID8 or in XML format for submission to IMEx consortium
databases12, including IntAct13 (Supplementary Fig. 21). MS raw files associated with
a given project can also be easily retrieved and grouped for submission to data repositories
such as Tranche14.
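For readers unfamiliar with the MITAB format, the following Python sketch assembles a single row with the 15 tab-separated columns of PSI-MI TAB 2.5, using "-" for empty fields. It does not reproduce the ProHits export; the short column keys, the function name `mitab25_row` and the placeholder identifiers `BAIT_ID` and `PREY_ID` are assumptions for illustration.

```python
# Short keys for the 15 columns of PSI-MI TAB 2.5, in order (key names are ours).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type",
    "source_db", "interaction_id", "confidence",
]


def mitab25_row(**fields):
    """Return one tab-delimited MITAB 2.5 row; missing columns are written as '-'."""
    unknown = set(fields) - set(MITAB25_COLUMNS)
    if unknown:
        raise ValueError(f"unknown MITAB column(s): {sorted(unknown)}")
    return "\t".join(fields.get(column, "-") for column in MITAB25_COLUMNS)


row = mitab25_row(
    id_a="uniprotkb:BAIT_ID",      # placeholder, not a real accession
    id_b="uniprotkb:PREY_ID",      # placeholder, not a real accession
    detection_method='psi-mi:"MI:0004"(affinity chromatography technology)',
    taxid_a="taxid:4932",
    taxid_b="taxid:4932",
)
print(row)
```

In an actual submission every field would be filled from the bait, sample and protocol annotation recorded for the experiment, and the controlled-vocabulary terms would be checked against the current PSI-MI ontology.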
ProHits has been developed to manage many large-scale in-house projects, including a systematic
analysis of kinase and phosphatase interactions in yeast, consisting of 986 affinity
purifications15. Smaller-scale projects from individual laboratories are readily handled
in a similar manner. Examples of AP-MS data from both yeast and mammalian projects
are provided in a demonstration version of ProHits at www.prohitsMS.com, and in Supplementary
documents.
The modular architecture of ProHits will accommodate additional new features, as dictated
by future experimental and analytical needs. Although ProHits has been designed to
handle protein interaction data, simple modifications of the open source code will
enable straightforward adaptation to other proteomics workflows.