Reconstructing complex regions of genomes using long-read sequencing technology

Abstract

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single-molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and found that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher-quality finished state.

Author and article information

Journal

Journal ID (nlm-ta): Genome Res

Journal ID (iso-abbrev): Genome Res

Journal ID (publisher-id): GENOME

Title: Genome Research

Publisher: Cold Spring Harbor Laboratory Press

ISSN (Print): 1088-9051

ISSN (Electronic): 1549-5469

Publication date (Print): April 2014

Volume: 24

Issue: 4

Pages: 688-696

Affiliations

[1] Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA;

[2] Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA;

[3] Pacific Biosciences of California, Inc., Menlo Park, California 94025, USA;

[4] Department of Biology, University of Bari, Bari 70126, Italy;

[5] The Genome Institute at Washington University, Washington University School of Medicine, St. Louis, Missouri 63110, USA;

[6] Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey

Author notes

[7] Corresponding author. E-mail: eee@gs.washington.edu

Article

Medline ID: 9518021

DOI: 10.1101/gr.168450.113

PMC ID: 3975067

PubMed ID: 24418700

SO-VID: 550d34fe-540f-46cd-8b7c-ffd372d78af5

License:

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.

History

Date received : 17 October 2013

Date accepted : 9 January 2014

Page count

Pages: 9
