Blog
About

1,378
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      The Diploid Genome Sequence of an Individual Human

      Read this article at

      ScienceOpenPublisherPMC
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

          Author Summary

          We have generated an independently assembled diploid human genomic DNA sequence from both chromosomes of a single individual (J. Craig Venter). Our approach, based on whole-genome shotgun sequencing and using enhanced genome assembly strategies and software, generated an assembled genome over half of which is represented in large diploid segments (>200 kilobases), enabling study of the diploid genome. Comparison with previous reference human genome sequences, which were composites comprising multiple humans, revealed that the majority of genomic alterations are the well-studied class of variants based on single nucleotides (SNPs). However, the results also reveal that lesser-studied genomic variants, insertions and deletions, while comprising a minority (22%) of genomic variation events, actually account for almost 74% of variant nucleotides. Inclusion of insertion and deletion genetic variation into our estimates of interchromosomal difference reveals that only 99.5% similarity exists between the two chromosomal copies of an individual and that genetic variation between two individuals is as much as five times higher than previously estimated. The existence of a well-characterized diploid human genome sequence provides a starting point for future individual genome comparisons and enables the emerging era of individualized genomic information.

          Abstract

          Comparison of the DNA sequence of an individual human from the reference sequence reveals a surprising amount of difference.

          Related collections

          Most cited references 106

          • Record: found
          • Abstract: found
          • Article: not found

          Initial sequencing and analysis of the human genome.

           James Galagan (2001)
          The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Genome sequencing in microfabricated high-density picolitre reactors.

            The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Tandem repeats finder: a program to analyze DNA sequences.

               G. Benson (1999)
              A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T cellreceptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface atc3.biomath.mssm.edu/trf.html has been established for automated use of the program.
                Bookmark

                Author and article information

                Contributors
                Role: Academic Editor
                Journal
                PLoS Biol
                pbio
                PLoS Biology
                Public Library of Science (San Francisco, USA )
                1544-9173
                1545-7885
                October 2007
                4 September 2007
                : 5
                : 10
                Affiliations
                [1 ] J. Craig Venter Institute, Rockville, Maryland, United States of America
                [2 ] Program in Genetics and Genomic Biology, The Hospital for Sick Children, and Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada
                [3 ] Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
                [4 ] Genetics Department, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
                Lawrence Berkeley National Laboratories, United States of America
                Author notes
                * To whom correspondence should be addressed. E-mail: slevy@ 123456jcvi.org
                Article
                07-PLBI-RA-1258R2 plbi-05-10-02
                10.1371/journal.pbio.0050254
                1964779
                17803354
                Copyright: © 2007 Levy et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Page count
                Pages: 32
                Categories
                Research Article
                Genetics and Genomics
                Mammals
                Primates
                Human
                Custom metadata
                Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. (2007) The diploid genome sequence of an individual human. PLoS Biol 5(10): e254. doi: 10.1371/journal.pbio.0050254

                Life sciences

                Comments

                Comment on this article