
      Peptidomic discovery of short open reading frame-encoded peptides in human cells



          The amount of the transcriptome that is translated into polypeptides is of fundamental importance. We developed a peptidomic strategy to detect short ORF (sORF)-encoded polypeptides (SEPs) in human cells. We identified 90 SEPs, 86 of which are novel, the largest number of human SEPs ever reported. SEP abundances range from 10-1000 molecules per cell, identical to known proteins. SEPs arise from sORFs in non-coding RNAs as well as multi-cistronic mRNAs, and many SEPs initiate with non-AUG start codons, indicating that non-canonical translation may be more widespread in mammals than previously thought. In addition, coding sORFs are present in a small fraction (8/1866) of long intergenic non-coding RNAs (lincRNAs). Together, these results provide the strongest evidence to date that the human proteome is more complex than previously appreciated.

          Most cited references (36)


          Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

          RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.

            Comprehensive comparative analysis of strand-specific RNA sequencing methods

            Strand-specific, massively-parallel cDNA sequencing (RNA-Seq) is a powerful tool for novel transcript discovery, genome annotation, and expression profiling. Despite multiple published methods for strand-specific RNA-Seq, no consensus exists as to how to choose between them. Here, we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-Seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library construction protocols, including both published and our own novel methods. We found marked differences in strand-specificity, library complexity, evenness and continuity of coverage, agreement with known annotations, and accuracy for expression profiling. Weighing each method’s performance and ease, we identify the dUTP second strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.

              Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans.

              Upstream ORFs (uORFs) are mRNA elements defined by a start codon in the 5' UTR that is out-of-frame with the main coding sequence. Although uORFs are present in approximately half of human and mouse transcripts, no study has investigated their global impact on protein expression. Here, we report that uORFs correlate with significantly reduced protein expression of the downstream ORF, based on analysis of 11,649 matched mRNA and protein measurements from 4 published mammalian studies. Using reporter constructs to test 25 selected uORFs, we estimate that uORFs typically reduce protein expression by 30-80%, with a modest impact on mRNA levels. We additionally identify polymorphisms that alter uORF presence in 509 human genes. Finally, we report that 5 uORF-altering mutations, detected within genes previously linked to human diseases, dramatically silence expression of the downstream protein. Together, our results suggest that uORFs influence the protein expression of thousands of mammalian genes and that variation in these elements can influence human phenotype and disease.

                Author and article information

                Nature Chemical Biology (Nat. Chem. Biol.)
                26 February 2013
                18 November 2012
                January 2013
                01 July 2013
                Volume 9, issue 1, pages 59-64
                [1] Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA
                [2] Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
                [3] Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
                [4] Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
                [5] Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA
                [6] Genome Sequencing & Analysis Program, Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, Massachusetts 02141, USA
                [7] Research Computing, Division of Science, Faculty of Arts and Sciences, Harvard University, 38 Oxford St, Room 211A, Cambridge, Massachusetts 02138, USA
                [8] Center of Systems Biology, Mass Spectrometry and Proteomics Lab, Faculty of Arts and Sciences, Harvard University, 52 Oxford St, Northwest Labs, B243.20, Cambridge, Massachusetts 02138, USA
                Author notes

                These authors contributed equally to this work.

                Correspondence to: saghatelian@chemistry.harvard.edu

                Users may view, print, copy, download, and text- and data-mine the content in such documents for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

                Funded by: National Human Genome Research Institute (NHGRI)
                Award ID: U54 HG003067
                Funded by: National Institute of General Medical Sciences (NIGMS)
                Award ID: R01 GM102491
                Funded by: Office of the Director, NIH
                Award ID: DP2 OD002374


