
      Hominoid-Specific De Novo Protein-Coding Genes Originating from Long Non-Coding RNAs


          Abstract

          Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNAs. How these genes originated is not well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in vertebrate phylogeny. Strand-specific RNA–Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), which were then integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. On the basis of comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes encoded polyadenylated non-coding RNAs in rhesus macaque or chimpanzee with a similar transcript structure and correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profile was largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings we report in this manuscript are robust and insensitive to the parameters used in the identification and analysis of de novo genes. Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.
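
          The comparison described above (a correlated tissue expression profile, with higher abundance for the human coding gene) can be illustrated with a minimal sketch. It assumes Pearson correlation and a mean log2 ratio as the measures; the abstract does not state which statistics the authors used, and the expression values below are hypothetical, not data from the study.

```python
import math

# Tissue order follows the five macaque tissues named in the abstract.
TISSUES = ["liver", "prefrontal cortex", "skeletal muscle", "adipose", "testis"]


def pearson(xs, ys):
    """Pearson correlation between two equally long expression profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def mean_log2_ratio(coding, noncoding):
    """Average log2(coding / non-coding) across tissues; values must be > 0."""
    ratios = [math.log2(c / n) for c, n in zip(coding, noncoding)]
    return sum(ratios) / len(ratios)


# Hypothetical expression values (arbitrary units), NOT taken from the paper.
human_coding = [2.0, 8.0, 1.0, 3.0, 20.0]
macaque_noncoding = [1.0, 4.0, 0.5, 1.5, 10.0]

r = pearson(human_coding, macaque_noncoding)
ratio = mean_log2_ratio(human_coding, macaque_noncoding)

for tissue, h, m in zip(TISSUES, human_coding, macaque_noncoding):
    print(f"{tissue}: human={h}, macaque={m}")
print(f"profile correlation r = {r:.3f}")
print(f"mean log2(human / macaque) = {ratio:.3f}")
```

          In this toy case the two profiles have the same shape across tissues while the human values are uniformly higher, which is the qualitative pattern the abstract reports. Real data would need normalization and a treatment of zero counts, which this sketch omits.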

          Author Summary

          Since the pre-genomic era, “mother gene”-based mechanisms such as gene duplication have been regarded as the major means of creating new genes. Recently, we and others reported several “motherless” protein-coding genes in human, challenging this conventional view by showing that some protein-coding genes may have emerged de novo from ancestral non-coding DNA. However, how these genes originated has remained unaddressed. The ancestral non-coding DNA must become transcribed and gain a translatable open reading frame before becoming a protein-coding gene, but these two steps could occur in either order. Here, we performed a comparative transcriptome study in human, chimpanzee, and rhesus macaque to address this question. We found that most of the hominoid-specific de novo protein-coding genes encoded long non-coding RNAs in rhesus macaque or chimpanzee, with similar transcript structure and correlated tissue expression profile, but the protein-coding genes often had higher transcriptional abundance. According to the rule of parsimony, we conclude that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes that are then further optimized at the transcriptional level. This pattern is insensitive to the parameters used in the identification and analysis of de novo genes.


                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS Genetics (PLoS Genet)
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 13 September 2012
                Volume 8, Issue 9, e1002942
                Affiliations
                [1] Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
                [2] Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
                [3] Institute of Molecular Medicine, Peking University, Beijing, China
                University of California Davis, United States of America
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: C-YL LW YEZ. Performed the experiments: C-YL CX YEZ J-YC C-JL W-ZZ YL RZ MZ. Analyzed the data: C-YL CX YEZ J-YC C-JL W-ZZ YL RZ MZ. Contributed reagents/materials/analysis tools: C-YL YEZ RZ. Wrote the paper: C-YL LW YEZ.

                [¤] Current address: Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China

                Article
                Manuscript ID: PGENETICS-D-12-00411
                DOI: 10.1371/journal.pgen.1002942
                PMCID: PMC3441637
                PMID: 23028352
                7029086d-21b9-4880-8aa2-ddcc3461c459
                Copyright © 2012

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 17 February 2012
                Accepted: 24 July 2012
                Page count: 13
                Funding
                This work was supported by the National Basic Research Program of China [2011CB518000] (http://www.973.gov.cn/), the National Natural Science Foundation of China [31171269] (http://www.nsfc.gov.cn/Portal0/default106.htm), and the National High-Tech R&D Program [2007AA02Z165] (http://www.most.gov.cn/eng/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Evolutionary Biology
                Evolutionary Processes
                Natural Selection
                Comparative Genomics
                Evolutionary Theory
                Genomics
                Genome Expression Analysis

                Genetics
