
      A Human-Specific De Novo Protein-Coding Gene Associated with Human Brain Functions

      research-article


          Abstract

          To determine whether any human-specific new genes are associated with human brain function, we computationally screened the genetic vulnerability factors identified through genome-wide association studies (GWAS) and linkage analyses of nicotine addiction, and found one human-specific de novo protein-coding gene, FLJ33706 (alternative gene symbol C20orf203). Cross-species analysis revealed how this gene originated from noncoding DNA. On the branch leading to humans and chimpanzees, insertion of repeat elements, especially Alu, contributed to the formation of the first coding exon and six standard splice junctions. Subsequently, two substitutions in the human lineage eliminated two stop codons and created an open reading frame of 194 amino acids. We experimentally verified FLJ33706 mRNA and protein expression in the brain. Real-time PCR across multiple tissues showed that FLJ33706 is most abundantly expressed in brain. Human polymorphism data suggest that FLJ33706 encodes a protein under purifying selection. A specifically designed antibody detected the protein across human cortex, cerebellum and midbrain, and immunohistochemistry of normal human brain cortex localized it to neurons. Elevated FLJ33706 expression was detected in Alzheimer's disease brain samples, suggesting a role for this novel gene in the human-specific pathogenesis of Alzheimer's disease. FLJ33706 provides the strongest evidence so far that human-specific de novo genes can have protein-coding potential and differential protein expression, and can be involved in human brain functions.
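The origin step described above (substitutions that remove in-frame stop codons and thereby open a longer reading frame) can be illustrated with a minimal Python sketch. The sequences, the helper names `longest_orf`, `protein_length` and `substitute`, and the simple forward-strand, ATG-to-stop ORF definition are all hypothetical assumptions chosen for illustration; they are not FLJ33706 data and not the authors' pipeline.

```python
# Minimal sketch (hypothetical toy data, not FLJ33706): shows how one
# substitution that removes an in-frame stop codon can lengthen an ORF.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF in the three forward frames.

    Assumption: an ORF must start with ATG and end with an in-frame stop;
    an ATG with no downstream stop is ignored. The reverse strand is not scanned.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG in this frame
    return best


def protein_length(orf: str) -> int:
    """Number of amino acids encoded by an ORF (stop codon excluded)."""
    return max(len(orf) // 3 - 1, 0)


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at 0-based position pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical "ancestral" sequence: an early in-frame TAA truncates the ORF.
ancestral = "ATGGCTTAAGGCAAACTGGAATGA"
# Hypothetical derived allele: T->C at position 6 turns TAA into CAA (Gln).
derived = substitute(ancestral, 6, "C")

print(protein_length(longest_orf(ancestral)))  # 2
print(protein_length(longest_orf(derived)))    # 7
```

For FLJ33706 the abstract reports two such stop-removing substitutions yielding a 194-amino-acid ORF; the toy example removes a single stop so the effect on ORF length stays easy to follow.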

          Author Summary

          For decades, gene duplication, retrotransposition and gene fusion were believed to be the major ways of increasing gene number. All involve “mother” genes as the “building blocks” for new genes. However, several recently identified “motherless” genes challenge this idea, suggesting that some proteins may have emerged de novo from ancestral noncoding DNA. Did any such genes emerge in the human lineage after its divergence from chimpanzee? If so, they might help explain what makes us human. Here we report the first experimentally verified case of a human-specific protein-coding gene, FLJ33706 (alternative gene symbol C20orf203), that originated de novo since the divergence of human and chimpanzee. FLJ33706 was formed by the insertion of repeat elements, especially Alu sequences, which contributed to the formation of the first coding exon and six standard splice junctions, followed by two human-specific substitutions that eliminated stop codons. The protein-coding function of FLJ33706 is supported by population genetics, transcriptome profiling, Western blot and immunohistochemistry assays. The data suggest that FLJ33706 may be involved in nicotine addiction and Alzheimer's disease. FLJ33706 provides the strongest evidence so far that human-specific de novo genes can have protein-coding potential and be involved in human brain functions.
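Polymorphism-based tests for purifying selection commonly compare nonsynonymous with synonymous variation; this section does not say which test the authors applied. A minimal Python sketch of the first step, classifying each coding SNP by its effect on the encoded amino acid under the standard genetic code, is shown below. The sequence, the SNP list and the helper names `translate` and `classify_snp` are hypothetical. A real test (for example comparing pN/pS) would also normalize the counts by the numbers of synonymous and nonsynonymous sites, which this sketch omits.

```python
# Minimal sketch: classify coding SNPs as synonymous, nonsynonymous or
# nonsense under the standard genetic code, and tally the classes.
# Hypothetical toy data; not the paper's population-genetic analysis.
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; '*' marks stop codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution to `alt` at 0-based CDS position `pos`."""
    cds = cds.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


# Hypothetical CDS (translates to "MAQG") and SNPs as (position, alt base).
cds = "ATGGCTCAAGGC"
snps = [(5, "C"), (3, "A"), (6, "T"), (11, "T")]
counts = Counter(classify_snp(cds, pos, alt) for pos, alt in snps)
print(translate(cds))
print(dict(counts))
```

Under purifying selection, amino-acid-changing variants tend to be removed from the population, so a coding region under constraint is expected to carry relatively fewer nonsynonymous polymorphisms than a neutral expectation would predict.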

                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X (print); 1553-7358 (electronic)
                Published: 26 March 2010
                Volume 6, Issue 3: e1000734
                Affiliations
                [1] Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, China
                [2] Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
                [3] Department of Pathology, Chinese PLA General Hospital, Beijing, China
                [4] Institute of Molecular Medicine, Peking University, Beijing, China
                [5] Molecular Neurobiology Branch, NIDA, Baltimore, Maryland, United States of America
                [6] Department of Biochemistry and Molecular Biology, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, China
                [7] Behavioral Neuroscience Branch, Intramural Research Program, National Institute on Drug Abuse, NIH/DHHS, Baltimore, Maryland, United States of America
                University of California San Diego, United States of America
                Author notes

                Conceived and designed the experiments: Chuan-Yun Li, George R Uhl, Qing-Rong Liu, Liping Wei. Performed the experiments: Chuan-Yun Li, Zhanbo Wang, Yan Zhang, Chunmei Cao, Ping-Wu Zhang, Shu-Juan Lu, Xiao-Mo Li, Quan Yu, Xiaofeng Zheng, Quan Du. Analyzed the data: Chuan-Yun Li, Yong Zhang. Wrote the paper: Chuan-Yun Li, Yong Zhang, Qing-Rong Liu, Liping Wei. Performed most of the experiments: Chuan-Yun Li. Conceived the idea: Chuan-Yun Li, Liping Wei.

                Article
                Manuscript: 09-PLCB-RA-1354R3
                DOI: 10.1371/journal.pcbi.1000734
                PMCID: PMC2845654
                PMID: 20376170
                This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
                History
                Received: 5 November 2009
                Accepted: 3 March 2010
                Page count
                Pages: 11
                Categories
                Research Article
                Evolutionary Biology/Evolutionary and Comparative Genetics
                Evolutionary Biology/Human Evolution

                Quantitative & Systems biology
