      Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome



          The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the ‘genome in a bottle’ (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
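          The abstract reports a contig N50 and precision/recall rates without defining them. As an illustration only, the following minimal Python sketch shows how these summary metrics are conventionally computed. The function names and the toy numbers are hypothetical and are not taken from the paper, whose actual benchmarking tools are not described in this abstract.

```python
from typing import Iterable, Tuple


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L of the contig at which, walking from the longest
    contig downward, the running sum first reaches half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one contig with non-zero length")
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches total")


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative variant call counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)  # fraction of reported calls that are correct
    recall = tp / (tp + fn)     # fraction of true variants that were reported
    return precision, recall


if __name__ == "__main__":
    # Toy assembly: four contigs totalling 100 bases (hypothetical values).
    print(contig_n50([40, 30, 20, 10]))
    # Toy call set: 90 correct calls, 10 spurious calls, 30 missed variants.
    print(precision_recall(tp=90, fp=10, fn=30))
```

          In this sketch a higher N50 means that more of the assembly sits in long contigs, and precision and recall separately penalize spurious calls and missed variants, which is why the abstract reports both.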

                Author and article information

                Nature Biotechnology (Nat. Biotechnol.)
                12 July 2019
                12 August 2019
                October 2019
                12 February 2020
                Volume: 37
                Issue: 10
                Pages: 1155-1162
                [1] Pacific Biosciences, Menlo Park, CA, USA
                [2] Google Inc., Mountain View, CA, USA
                [3] Center for Bioinformatics, Saarland University, Saarbrücken, Germany
                [4] Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
                [5] Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, Saarbrücken, Germany
                [6] DNAnexus, Mountain View, CA, USA
                [7] National Institute of Standards and Technology, Gaithersburg, MD, USA
                [8] Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
                [9] Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
                [10] Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
                [11] Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
                [12] Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China
                [13] Dana-Farber Cancer Institute, Boston, MA, USA
                Author notes

                These authors contributed equally to this work.

                Author Contributions

                A.M.W., D.R.R., M.W.H., and P.P. designed the study. D.R.R. and P.P. developed the sample preparation protocol and performed sample preparation. D.R.R., P.P., and Y.Q. performed sequencing. A.C., A.K., C-S.C., M.A.D., and P.C. adapted the algorithms and implementation of DeepVariant. A.C., A.F., A.K., A.M.P., A.M.W., A.T., C-S.C., D.R.R., F.J.S., G.M., G.T.C., H.L., J.E., J.M.Z., J.R., M.A., M.A.D., M.C.S., M.M., N.D.O., P.C., P.P., R.J.H., S.K., T.M., and W.J.R. performed analysis. A.C., A.M.P., C-S.C., D.R.R., F.J.S., J.M.Z., M.A.D., M.C.S., and M.W.H. supervised analysis. A.C., A.M.W., D.R.R., G.M., J.M.Z., P.P., R.J.H., S.K., and W.J.R. wrote the manuscript. All authors reviewed and approved the final manuscript.

                [*] Address correspondence to M.W.H. (mhunkapiller@pacb.com) or D.R.R. (drank@pacb.com).

                Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms



