      Organization and Evolution of Primate Centromeric DNA from Whole-Genome Shotgun Sequence Data

      research-article


          Abstract

          The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

          Author Summary

          Centromeric DNA has been described as the last frontier of genomic sequencing; such regions are typically poorly assembled during the whole-genome shotgun sequence assembly process due to their repetitive complexity. This paper develops a computational algorithm to systematically extract data regarding primate centromeric DNA structure and organization from that ∼5% of sequence that is not included as part of standard genome sequence assemblies. Using this computational approach, we identify and reconstruct published human higher-order alpha satellite arrays and discover new families in human, chimpanzee, and Old World monkeys. Experimental validation confirms the utility of this computational approach to understanding the centromere organization of other nonhuman primates. An evolutionary analysis in diverse primate genomes supports fundamental differences in the structure and organization of centromere DNA between ape and Old World monkey lineages. The ability to extract meaningful biological data from random shotgun sequence data helps to fill an important void in large-scale sequencing of primate genomes, with implications for other genome sequencing projects.

          Most cited references (30)


          Consed: a graphical tool for sequence finishing.

          Sequencing of large clones or small genomes is generally done by the shotgun approach (Anderson et al. 1982). This has two phases: (1) a shotgun phase in which a number of reads are generated from random subclones and assembled into contigs, followed by (2) a directed, or finishing phase in which the assembly is inspected for correctness and for various kinds of data anomalies (such as contaminant reads, unremoved vector sequence, and chimeric or deleted reads), additional data are collected to close gaps and resolve low quality regions, and editing is performed to correct assembly or base-calling errors. Finishing is currently a bottleneck in large-scale sequencing efforts, and throughput gains will depend both on reducing the need for human intervention and making it as efficient as possible. We have developed a finishing tool, consed, which attempts to implement these principles. A distinguishing feature relative to other programs is the use of error probabilities from our programs phred and phrap as an objective criterion to guide the entire finishing process. More information is available at http://www.genome.washington.edu/consed/consed.html.

            Genomic and genetic definition of a functional human centromere.

            The definition of centromeres of human chromosomes requires a complete genomic understanding of these regions. Toward this end, we report integration of physical mapping, genetic, and functional approaches, together with sequencing of selected regions, to define the centromere of the human X chromosome and to explore the evolution of sequences responsible for chromosome segregation. The transitional region between expressed sequences on the short arm of the X and the chromosome-specific alpha satellite array DXZ1 spans about 450 kilobases and is satellite-rich. At the junction between this satellite region and canonical DXZ1 repeats, diverged repeat units provide direct evidence of unequal crossover as the homogenizing force of these arrays. Results from deletion analysis of mitotically stable chromosome rearrangements and from a human artificial chromosome assay demonstrate that DXZ1 DNA is sufficient for centromere function. Evolutionary studies indicate that, while alpha satellite DNA present throughout the pericentromeric region of the X chromosome appears to be a descendant of an ancestral primate centromere, the current functional centromere based on DXZ1 sequences is the product of the much more recent concerted evolution of this satellite DNA.

              Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Comput Biol
                pcbi
                PLoS Computational Biology
                Public Library of Science (San Francisco, USA)
                1553-734X
                1553-7358
                September 2007
                28 September 2007
                Volume: 3
                Issue: 9
                Article: e181
                Affiliations
                [1] Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, United States of America
                [2] Department of Genetics and Microbiology, University of Bari, Bari, Italy
                [3] Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
                [4] Howard Hughes Medical Institute, Seattle, Washington, United States of America
                The J. Craig Venter Institute, United States of America
                Author notes
                * To whom correspondence should be addressed. E-mail: eee@gs.washington.edu
                Article
                07-PLCB-RA-0242R3 plcb-03-09-14
                10.1371/journal.pcbi.0030181
                1994983
                17907796
                0cd84760-9e08-4662-9ed0-3b797e9d6988
                Copyright: © 2007 Alkan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 1 May 2007
                Accepted: 31 July 2007
                Page count
                Pages: 12
                Categories
                Research Article
                Computational Biology
                Evolutionary Biology
                Molecular Biology
                Homo (Human)
                Primates
                Custom metadata
                Alkan C, Ventura M, Archidiacono N, Rocchi M, Sahinalp SC, et al. (2007) Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data. PLoS Comput Biol 3(9): e181. doi: 10.1371/journal.pcbi.0030181

                Quantitative & Systems biology
