
      Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line

      research-article


          Abstract

          • We provide a large-scale dataset on absolute protein and matching mRNA concentrations from the human medulloblastoma cell line Daoy. The correlation between mRNA and protein concentrations is significant and positive (R_s = 0.46, R² = 0.29, P-value < 2e-16), although non-linear.

          • Out of ∼200 tested sequence features, sequence length, frequency and properties of amino acids, as well as translation initiation-related features are the strongest individual correlates of protein abundance when accounting for variation in mRNA concentration.

          • When integrating mRNA expression data and all sequence features into a non-parametric regression model (Multivariate Adaptive Regression Splines), we were able to explain up to 67% of the variation in protein concentrations. Half of the contributions were attributed to mRNA concentrations, the other half to sequence features relating to regulation of translation and protein degradation. The sequence features are primarily linked to the coding and 3′ untranslated region. To our knowledge, this is the most comprehensive predictive model of human protein concentrations achieved so far.

          Abstract

          mRNA decay, translation regulation and protein degradation are essential parts of eukaryotic gene expression regulation (Hieronymus and Silver, 2004; Mata et al, 2005), which enable the dynamics of cellular systems and their responses to external and internal stimuli without having to rely exclusively on transcription regulation. The importance of these processes is emphasized by the generally low correlation between mRNA and protein concentrations. For many prokaryotic and eukaryotic organisms, <50% of the variation in protein abundance is explained by variation in mRNA concentrations (de Sousa Abreu et al, 2009).

          Given the plethora of regulatory mechanisms involved, most studies have focused so far on individual regulators and specific targets. Particularly in humans, we currently lack system-wide, quantitative analyses that evaluate the relative contribution of regulatory elements encoded in the mRNA and protein sequence. Existing studies have been carried out only in bacteria and yeast (Nie et al, 2006; Brockmann et al, 2007; Tuller et al, 2007; Wu et al, 2008). Here, we present the first comprehensive analysis of the impact of translation and protein degradation on protein abundance variation in a human cell line. For this purpose, we experimentally measured absolute protein and mRNA concentrations in the Daoy medulloblastoma cell line, using shotgun proteomics and microarrays, respectively (Figure 1). These data comprise one of the largest such sets available today for humans. We focused on sequence features that likely impact protein translation and protein degradation, including length, nucleotide composition, structure of the untranslated regions (UTRs), coding sequence, composition of the translation initiation site, presence of upstream open reading frames, putative target sites of miRNAs, codon usage, amino-acid composition and protein degradation signals.

          Three types of tests were conducted: (a) we examined partial Spearman's rank correlation of numerical features (e.g. length) with protein concentration, accounting for variation in mRNA concentrations; (b) for numerical and categorical features (e.g. function), we compared two extreme populations with Welch's t-test; and (c) using a Multivariate Adaptive Regression Splines model, we analyzed the combined contributions of mRNA expression and sequence features to protein abundance variation (Figure 1). To account for the non-linearity of many relationships, we used non-parametric approaches throughout the analysis.
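Tests (a) and (b) can be illustrated with a minimal Python sketch. It assumes average ranks for ties, the standard first-order partial correlation formula applied to ranks, and the Welch-Satterthwaite approximation for the degrees of freedom. All function names and toy numbers are hypothetical; they are not taken from the study's data or code.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_spearman(x, y, z):
    """Spearman correlation between x and y after controlling for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical toy data: a sequence feature, protein and mRNA levels for six genes.
feature = [2, 1, 4, 3, 6, 5]
protein = [1, 3, 2, 4, 6, 5]
mrna = [1, 2, 3, 4, 5, 6]

raw = pearson(ranks(feature), ranks(protein))       # plain Spearman correlation
partial = partial_spearman(feature, protein, mrna)  # controlling for mRNA
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

In this toy example the feature and the protein level both track mRNA, so the raw rank correlation is high while the partial correlation is close to zero. This is the kind of confounding that controlling for mRNA concentration is meant to remove.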

          We observed a significant positive correlation between mRNA and protein concentrations, larger than many previous measurements (de Sousa Abreu et al, 2009). We also show that the contribution of translation and protein degradation is at least as important as the contribution of mRNA transcription and stability to the abundance variation of the final protein products. Although variation in mRNA expression explains ∼25–30% of the variation in protein abundance, another 30–40% can be accounted for by characteristics of the sequences, which we identified in a comparative assessment of global correlates. Among these characteristics, sequence length, amino-acid frequencies and nucleotide frequencies in the coding region have a strong influence (Figure 3A). Characteristics of the 3′UTR and of the 5′UTR, that is, length, nucleotide composition and secondary structures, describe another part of the variation, leaving 33% of the expression variation unexplained. The unexplained fraction may be accounted for by mechanisms not considered in this analysis (e.g. regulation by RNA-binding proteins or gene-specific structural motifs), as well as expression and measurement noise.

          Our combined model, which includes mRNA concentration and sequence features, can explain 67% of the variation of protein abundance in this system and thus has the highest predictive power for human protein abundance achieved so far (Figure 3B).
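As a rough illustration of what a Multivariate Adaptive Regression Splines model looks like, the Python sketch below evaluates a sum of hinge functions and computes the fraction of variance explained (R²). The predictor names, knots and coefficients are invented for illustration and are not the fitted model from the study; a real analysis would estimate them from data with a MARS implementation.

```python
def hinge(x, knot, direction=1):
    """MARS hinge basis: max(0, x - knot) for direction=1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def predict_log_protein(log_mrna, log_cds_length):
    """Hypothetical MARS-style predictor; every number here is made up."""
    return (
        1.0
        + 0.8 * hinge(log_mrna, 2.0)         # slope above the mRNA knot
        - 0.5 * hinge(log_mrna, 2.0, -1)     # different slope below the knot
        - 0.3 * hinge(log_cds_length, 3.0)   # penalty for long coding sequences
    )


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical usage: (log mRNA, log CDS length) pairs and observed log protein levels.
genes = [(1.0, 2.5), (2.5, 3.5), (3.0, 2.8), (4.0, 3.2)]
observed = [0.4, 1.3, 1.9, 2.4]
predicted = [predict_log_protein(m, c) for m, c in genes]
explained = r_squared(observed, predicted)
```

Because each hinge function is zero on one side of its knot, the model is piecewise linear, which lets it capture the non-linear relationship between mRNA and protein concentrations without assuming a single global slope.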

          Abstract

          Transcription, mRNA decay, translation and protein degradation are essential processes during eukaryotic gene expression, but their relative global contributions to steady-state protein concentrations in multi-cellular eukaryotes are largely unknown. Using measurements of absolute protein and mRNA abundances in cellular lysate from the human Daoy medulloblastoma cell line, we quantitatively evaluate the impact of mRNA concentration and sequence features implicated in translation and protein degradation on protein expression. Sequence features related to translation and protein degradation have an impact similar to that of mRNA abundance, and their combined contribution explains two-thirds of protein abundance variation. mRNA sequence lengths, amino-acid properties, upstream open reading frames and secondary structures in the 5′ untranslated region (UTR) were the strongest individual correlates of protein concentrations. In a combined model, characteristics of the coding region and the 3′UTR explained a larger proportion of protein abundance variation than characteristics of the 5′UTR. The absolute protein and mRNA concentration measurements for >1000 human genes described here represent one of the largest datasets currently available, and reveal both general trends and specific examples of post-transcriptional regulation.

          Most cited references (36)

          Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search.

          We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.

            Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation.

            We report a method for large-scale absolute protein expression measurements (APEX) and apply it to estimate the relative contributions of transcriptional- and translational-level gene regulation in the yeast and Escherichia coli proteomes. APEX relies upon correcting each protein's mass spectrometry sampling depth (observed peptide count) by learned probabilities for identifying the peptides. APEX abundances agree with measurements from controls, western blotting, flow cytometry and two-dimensional gels, as well as known correlations with mRNA abundances and codon bias, providing absolute protein concentrations across approximately three to four orders of magnitude. Using APEX, we demonstrate that 73% of the variance in yeast protein abundance (47% in E. coli) is explained by mRNA abundance, with the number of proteins per mRNA log-normally distributed about approximately 5,600 (approximately 540 in E. coli) protein molecules/mRNA. Therefore, levels of both eukaryotic and prokaryotic proteins are set per mRNA molecule and independently of overall protein concentration, with >70% of yeast gene expression regulation occurring through mRNA-directed mechanisms.

              Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans.

              Upstream ORFs (uORFs) are mRNA elements defined by a start codon in the 5' UTR that is out-of-frame with the main coding sequence. Although uORFs are present in approximately half of human and mouse transcripts, no study has investigated their global impact on protein expression. Here, we report that uORFs correlate with significantly reduced protein expression of the downstream ORF, based on analysis of 11,649 matched mRNA and protein measurements from 4 published mammalian studies. Using reporter constructs to test 25 selected uORFs, we estimate that uORFs typically reduce protein expression by 30-80%, with a modest impact on mRNA levels. We additionally identify polymorphisms that alter uORF presence in 509 human genes. Finally, we report that 5 uORF-altering mutations, detected within genes previously linked to human diseases, dramatically silence expression of the downstream protein. Together, our results suggest that uORFs influence the protein expression of thousands of mammalian genes and that variation in these elements can influence human phenotype and disease.

                Author and article information

                Journal: Molecular Systems Biology (Mol Syst Biol)
                Publisher: Nature Publishing Group
                ISSN: 1744-4292
                Published: 24 August 2010
                Volume 6, article 400
                Affiliations
                [1] Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX, USA
                [2] Children's Cancer Research Institute, University of Texas Health Science Center, San Antonio, TX, USA
                [3] Department of Management Science and Statistics, University of Texas, San Antonio, TX, USA
                [4] Center for Cancer Research Nanobiology Program, National Cancer Institute, NCI-Frederick, Frederick, MD, USA
                Author notes
                [a] Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, 2500 Speedway, MBB 3.210, Austin, TX 78229-3900, USA. Tel.: +1 512 232 3919; Fax: +1 512 471 2149; E-mail: cvogel@mail.utexas.edu
                [b] Children's Cancer Research Institute, University of Texas Health Science Center, San Antonio, TX, USA. Tel.: +1 210 562 9049; Fax: +1 210 562 9014; E-mail: penalva@uthscsa.edu
                [*] These authors contributed equally to this work

                Article
                Publisher ID: msb201059
                DOI: 10.1038/msb.2010.59
                PMCID: PMC2947365
                PMID: 20739923
                Copyright © 2010, EMBO and Macmillan Publishers Limited

                This is an open-access article distributed under the terms of the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported License, which allows readers to alter, transform, or build upon the article and then distribute the resulting work under the same or similar license to this one. The work must be attributed back to the original author and commercial use is not permitted without specific permission.

                History
                Received: 10 March 2010
                Accepted: 29 June 2010
                Categories
                Article

                Quantitative & Systems biology
                Keywords: protein stability, gene expression regulation, protein degradation, translation
