
      Chromosome-scale, haplotype-resolved assembly of human genomes

      research-article


          Abstract

          Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98–99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
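          The NG50 statistic used in the abstract can be computed from a list of contig lengths. The following is a minimal Python sketch, not code from DipAsm. The function name `ng50` and the toy contig lengths are hypothetical, and the sketch assumes the caller supplies the genome size (for example, an estimate of the human genome size).

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly (illustrative sketch).

    NG50 is the length of the shortest contig in the smallest set of the
    longest contigs whose combined length reaches at least 50% of the
    genome size. Returns None if the whole assembly covers less than half
    of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    target = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating their lengths
    # until half of the genome size is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None
```

          For example, `ng50([50, 30, 20, 10], genome_size=200)` returns 20, because the three longest contigs together reach 100, which is half of the genome size. NG50 divides by the genome size rather than by the total assembly length (which N50 uses), so values stay comparable across different assemblies of the same genome.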

          Abstract

          Assembly of phased human genomes is achieved by combining long reads and long-range conformational data.


                Author and article information

                Contributors
                shilpa_garg@hms.harvard.edu
                jchin@dnanexus.com
                gchurch@genetics.med.harvard.edu
                hli@ds.dfci.harvard.edu
                Journal
                Nature Biotechnology (Nat Biotechnol)
                Nature Publishing Group US (New York)
                ISSN: 1087-0156 (print); 1546-1696 (online)
                Published online: 7 December 2020
                2021; 39(3): 309–312
                Affiliations
                [1] Department of Genetics, Harvard Medical School, Boston, MA, USA (GRID grid.38142.3c; ISNI 0000 0004 1936 754X)
                [2] Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA (GRID grid.65499.37; ISNI 0000 0001 2106 9910)
                [3] Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA (GRID grid.38142.3c; ISNI 0000 0004 1936 754X)
                [4] DNAnexus, Mountain View, CA, USA
                [5] Google, Mountain View, CA, USA (GRID grid.420451.6)
                [6] Arima Genomics, San Diego, CA, USA (GRID grid.504177.0)
                [7] Pacific Biosciences, Menlo Park, CA, USA (GRID grid.423340.2; ISNI 0000 0004 0640 9878)
                [8] Dovetail Genomics, Scotts Valley, CA, USA (GRID grid.504403.6)
                [9] Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA (GRID grid.39382.33; ISNI 0000 0001 2160 926X)
                [10] Max Planck Institute for Molecular Genetics, Berlin, Germany (GRID grid.419538.2; ISNI 0000 0000 9071 0620)
                [11] Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA (GRID grid.507869.5; ISNI 0000 0004 0647 9307)
                [12] Saarland University, Saarbrücken, Germany (GRID grid.11749.3a; ISNI 0000 0001 2167 7588)
                [13] Max Planck Institute for Informatics, Saarbrücken, Germany (GRID grid.419528.3; ISNI 0000 0004 0491 9823)
                Author information
                http://orcid.org/0000-0003-0200-4200
                http://orcid.org/0000-0002-2553-4231
                http://orcid.org/0000-0001-8346-9565
                http://orcid.org/0000-0003-2309-8402
                http://orcid.org/0000-0002-9376-1030
                http://orcid.org/0000-0001-6040-2691
                http://orcid.org/0000-0003-4394-2455
                http://orcid.org/0000-0003-3535-2076
                http://orcid.org/0000-0003-4874-2874
                Article
                DOI: 10.1038/s41587-020-0711-0
                PMCID: PMC7954703
                PMID: 33288905
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 21 October 2019
                : 9 September 2020
                : 17 September 2020
                Categories
                Letter
                Custom metadata
                © The Author(s), under exclusive licence to Springer Nature America, Inc. 2021

                Biotechnology
                computational biology and bioinformatics, genetics, molecular biology, diseases
