      Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse

      The Mouse Genome Sequencing Consortium
      PLoS Biology
      Public Library of Science

          Abstract

          A finished clone-based assembly of the mouse genome reveals extensive sequence duplication during recent evolution and rodent-specific expansion of certain gene families. Newly assembled duplications contain protein-coding genes that are mostly involved in reproductive function.

          The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has more than 175,000 fewer gaps and more than 139 Mb of additional novel sequence compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into why this sequence resisted sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be composed largely of sequence regions containing transposable elements and duplicated protein-coding genes; some of these may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows ancestral biological functions that are shared with human to be distinguished from derived functions that are not.
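The "over a thousand more" comparison with human follows directly from the two protein-coding gene counts given in the abstract:

```latex
\underbrace{20{,}210}_{\text{mouse}} - \underbrace{19{,}042}_{\text{human}} = 1{,}168 \ \text{protein-coding genes}
```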

          Author Summary

          The availability of an accurate genome sequence provides the bedrock upon which modern biomedical research is based. Here we describe a high-quality assembly, Build 36, of the mouse genome. This assembly was built by aligning overlapping individual clones, each representing part of the genome. It provides a more complete picture than previous assemblies because it adds much rodent-specific sequence that was previously unavailable. The addition of these sequences provides insight into both the genomic architecture and the gene complement of the mouse. In particular, it highlights recent gene duplications and the expansion of certain gene families during rodent evolution. An improved understanding of the mouse genome, and thus of mouse biology, will enhance the utility of the mouse as a model for human disease.
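Build 36 is described as being put together from overlapping individual clones. As a rough illustration of that idea only, here is a toy Python sketch that joins clone sequences whose order along the genome (the tiling path) is assumed to be already known. It is not the consortium's finishing pipeline. It assumes exact suffix-prefix matches, and it marks any join it cannot resolve with a run of `N`s. The names `longest_overlap`, `merge_tiling_path`, `min_overlap` and `gap_len` are hypothetical.

```python
from typing import List, Tuple


def longest_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    Exact matching is a simplifying assumption; real data need error-tolerant alignment.
    """
    lower = max(min_overlap, 1)  # an overlap must be at least one base long
    # Try the longest candidate first, so the first hit is the maximal overlap.
    for k in range(min(len(left), len(right)), lower - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_tiling_path(clones: List[str], min_overlap: int = 20,
                      gap_len: int = 100) -> Tuple[str, int]:
    """Join an ordered list of clone sequences into a single sequence.

    Adjacent clones are merged at their exact suffix-prefix overlap. When no
    overlap of at least `min_overlap` bases is found, `gap_len` 'N's are
    inserted to mark an unresolved gap. Returns (sequence, number_of_gaps).
    """
    if not clones:
        return "", 0
    assembled = clones[0]
    gaps = 0
    for clone in clones[1:]:
        k = longest_overlap(assembled, clone, min_overlap)
        if k:
            # Append only the part of the clone not already covered.
            assembled += clone[k:]
        else:
            # No usable overlap: record a gap, then continue with this clone.
            assembled += "N" * gap_len + clone
            gaps += 1
    return assembled, gaps
```

For example, `merge_tiling_path(["ACGTACGTTTGCA", "TTGCAGGCCATA", "CCCCGGGG"], min_overlap=4, gap_len=3)` merges the first two clones over their shared `TTGCA`. It then inserts a three-base gap before the third clone, returning `("ACGTACGTTTGCAGGCCATANNNCCCCGGGG", 1)`. Real clone-based finishing typically orders clones with a physical map and tolerates sequencing errors by aligning rather than exact matching. The sketch deliberately leaves both out.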

                Author and article information

                Contributors
                Role: Academic Editor
                Journal: PLoS Biology (PLoS Biol)
                Public Library of Science (San Francisco, USA)
                ISSN: 1544-9173 (print); 1545-7885 (electronic)
                Published: 26 May 2009
                Volume 7, Issue 5, e1000112
                Affiliations
                [1] National Center for Biotechnology Information, Bethesda, Maryland, United States of America
                [2] MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
                [3] The Genome Center at Washington University, St. Louis, Missouri, United States of America
                [4] The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
                [5] Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
                [6] Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
                [7] Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington, Seattle, Washington, United States of America
                [8] The Jackson Laboratory, Bar Harbor, Maine, United States of America
                [9] Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
                [10] McArdle Laboratory for Cancer Research, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, United States of America
                New England Biolabs, United States of America
                Author notes
                ¶ Membership of The Mouse Genome Sequencing Consortium is provided in the Acknowledgments.

                The author(s) have made the following declarations about their contributions: Conceived and designed the experiments: DMC LG DCS KLT CPP. Performed the experiments: DMC LG LWH MZ SG XS CJB RA JLC MD WH YK PM DM ZB ACM TG SZ BT KP CC MP JH RR DF JAL ZC TMGSC. Analyzed the data: DMC LG LWH MZ SG XS CJB DCS ZC EEE CPP. Contributed reagents/materials/analysis tools: DMC LG. Wrote the paper: DMC LG LWH MZ KLT EEE CPP.

                Article
                Manuscript ID: 08-PLBI-RA-5501R2
                DOI: 10.1371/journal.pbio.1000112
                PMCID: PMC2680341
                PMID: 19468303
                This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
                History
                Received: 19 December 2008
                Accepted: 3 April 2009
                Page count: 16
                Categories
                Research Article
                Genetics and Genomics/Bioinformatics
                Genetics and Genomics/Genome Projects
                Genetics and Genomics/Genomics
                Genetics and Genomics/Plant Genomes and Evolution

                Life sciences