
      The Characterization of Twenty Sequenced Human Genomes


          Abstract

          We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.

          Author Summary

          We report here the nearly complete genomic sequence of 20 different individuals, determined using “next-generation” sequencing technologies. We use these data to characterize the type of genetic variation carried by humans in a sample of this size, which is to our knowledge the largest set of unrelated genomic sequences that have been reported. We summarize different categories of variation in each genome, and in total across all 20 of the genomes, finding a surprising number of variants predicted to reduce or remove the proteins encoded by many different genes. This work provides important fundamental information about the scope of human genetic variation, and suggests ways to further explore the relationship between these genetic variants and human disease.


          Most cited references (20)


          The complete genome of an individual by massively parallel DNA sequencing.

          The association of genetic variation with disease and drug response, and improvements in nucleic acid technologies, have given great optimism for the impact of 'genomic medicine'. However, the formidable size of the diploid human genome, approximately 6 gigabases, has prevented the routine application of sequencing methods to deciphering complete individual human genomes. To realize the full potential of genomics for human health, this limitation must be overcome. Here we report the DNA sequence of a diploid genome of a single individual, James D. Watson, sequenced to 7.4-fold redundancy in two months using massively parallel sequencing in picolitre-size reaction vessels. This sequence was completed in two months at approximately one-hundredth of the cost of traditional capillary electrophoresis methods. Comparison of the sequence to the reference genome led to the identification of 3.3 million single nucleotide polymorphisms, of which 10,654 cause amino-acid substitution within the coding sequence. In addition, we accurately identified small-scale (2-40,000 base pair (bp)) insertion and deletion polymorphism as well as copy number variation resulting in the large-scale gain and loss of chromosomal segments ranging from 26,000 to 1.5 million base pairs. Overall, these results agree well with recent results of sequencing of a single individual by traditional methods. However, in addition to being faster and significantly less expensive, this sequencing technology avoids the arbitrary loss of genomic sequences inherent in random shotgun sequencing by bacterial cloning because it amplifies DNA in a cell-free system. As a result, we further demonstrate the acquisition of novel human sequence, including novel genes not previously identified by traditional genomic sequencing. This is the first genome sequenced by next-generation technologies. Therefore it is a pilot for the future challenges of 'personalized genome sequencing'.

            Ensembl 2009

            The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases, and other information for chordate, selected model organism and disease vector genomes. As of release 51 (November 2008), Ensembl fully supports 45 species, and three additional species have preliminary support. New species in the past year include orangutan and six additional low coverage mammalian genomes. Major additions and improvements to Ensembl since our previous report include a major redesign of our website; generation of multiple genome alignments and ancestral sequences using the new Enredo-Pecan-Ortheus pipeline and development of our software infrastructure, particularly to support the Ensembl Genomes project (http://www.ensemblgenomes.org/).

              Analysis of genetic inheritance in a family quartet by whole-genome sequencing.

              We analyzed the whole-genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors (resulting in > 99.999% accuracy), and identify very rare single-nucleotide polymorphisms. We also directly estimated a human intergeneration mutation rate of approximately 1.1 × 10⁻⁸ per position per haploid genome. Both offspring in this family have two recessive disorders: Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the value of complete genome sequencing in families.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genetics (PLoS Genet)
                Journal IDs: plos, plosgen
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 9 September 2010
                Volume: 6
                Issue: 9
                Article number: e1001111
                Affiliations
                [1] Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
                [2] McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
                [3] Allergic Inflammation Unit, Laboratory of Allergic Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, United States of America
                [4] G. H. Sergievsky Center and Departments of Epidemiology and Neurology, Columbia University, New York, New York, United States of America
                [5] Division of Epidemiology, New York State Psychiatric Institute, New York, New York, United States of America
                [6] Duke Human Vaccine Institute, Duke University, Durham, North Carolina, United States of America
                [7] Infections and Immunoepidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America
                Georgia Institute of Technology, United States of America
                Author notes

                Conceived and designed the experiments: KVS DBG. Performed the experiments: KVS JPS CEG CRC LKH KAL AMM. Analyzed the data: KP DG JMM MZ JPS ETC JF SPD ELH ACN EKR AS. Contributed reagents/materials/analysis tools: NLMS JEHF JDM RO BFH JJG. Wrote the paper: KP KVS JF DBG.

                Article
                Manuscript number: 10-PLGE-RA-2481R2
                DOI: 10.1371/journal.pgen.1001111
                PMCID: PMC2936541
                PMID: 20838461
                9d24a0ba-ec75-4869-98e2-fadaa80ff636
                This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
                History
                Received: 27 January 2010
                Accepted: 3 August 2010
                Page count
                Pages: 10
                Categories
                Research Article
                Genetics and Genomics/Genome Projects
                Genetics and Genomics/Genomics

                Genetics
