Identification of copy number variants in whole-genome data using Reference Coverage Profiles

Abstract

The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
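The normalization and segmentation steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the bin layout, the median-based scaling, the Gaussian emission model, the state set, and every parameter value (`sd`, `stay`, the expected ratios) are assumptions chosen for illustration, and the function names are hypothetical. It divides a sample's binned depth by a reference profile, then uses Viterbi decoding over a four-state HMM to label each bin with a copy number.

```python
import math
import statistics

# Assumed copy-number states and the coverage ratio each one would produce
# relative to a diploid reference profile (0 is nudged above zero).
STATES = [0, 1, 2, 3]
EXPECTED_RATIO = {0: 0.02, 1: 0.5, 2: 1.0, 3: 1.5}


def normalize_to_profile(depth, profile, eps=1e-9):
    """Divide binned depth by the reference profile, then rescale so the
    median bin (assumed diploid) has a ratio of 1.0."""
    raw = [d / max(p, eps) for d, p in zip(depth, profile)]
    scale = statistics.median(raw)
    return [r / scale for r in raw]


def log_gauss(x, mu, sd):
    """Log density of a normal distribution, used as the emission model."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi_copy_number(ratios, sd=0.15, stay=0.99):
    """Most likely copy-number path for a list of normalized ratios."""
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (len(STATES) - 1))
    # Uniform prior over states for the first bin.
    score = {s: log_gauss(ratios[0], EXPECTED_RATIO[s], sd) for s in STATES}
    back = []
    for x in ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s, including the transition cost.
            prev = max(
                STATES,
                key=lambda p: score[p] + (log_stay if p == s else log_switch),
            )
            trans = log_stay if prev == s else log_switch
            new_score[s] = score[prev] + trans + log_gauss(x, EXPECTED_RATIO[s], sd)
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_cnv_segments(path, diploid=2):
    """Collapse runs of non-diploid bins into (start, end_exclusive, copy_number)."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != diploid:
                segments.append((start, i, path[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Synthetic example: 60 bins with sequence-specific coverage bias and a
    # hemizygous deletion spanning bins 20-29 in a 40x sample.
    profile = [0.8, 1.2, 1.0] * 20
    true_cn = [2] * 20 + [1] * 10 + [2] * 30
    depth = [40 * p * cn / 2 for p, cn in zip(profile, true_cn)]
    ratios = normalize_to_profile(depth, profile)
    print(call_cnv_segments(viterbi_copy_number(ratios)))
```

Dividing by the profile first removes the position-specific fluctuations, so the HMM only has to distinguish a few ratio levels. A high self-transition probability keeps isolated noisy bins from being called as events.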

Author and article information

Contributors

Gustavo Glusman (URI: http://community.frontiersin.org/people/u/196951)

Jared C. Roach (URI: http://community.frontiersin.org/people/u/44701)

Leroy Hood (URI: http://community.frontiersin.org/people/u/12300)

Journal

Journal ID (nlm-ta): Front Genet

Journal ID (iso-abbrev): Front Genet

Journal ID (publisher-id): Front. Genet.

Title: Frontiers in Genetics

Publisher: Frontiers Media S.A.

ISSN (Electronic): 1664-8021

Publication date (Electronic): 17 February 2015

Publication date Collection: 2015

Volume: 6

Electronic Location Identifier: 45

Affiliations

[1] Institute for Systems Biology, Seattle, WA, USA

[2] Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA

Author notes

Edited by: Yih-Horng Shiao, US Patent Trademark Office, USA

Reviewed by: Tony Merriman, University of Otago, New Zealand; Junjie Fu, Chinese Academy of Agricultural Sciences, China

*Correspondence: Gustavo Glusman, Institute for Systems Biology, 401 Terry Ave. N, Seattle, WA 98109, USA; e-mail: gustavo@systemsbiology.org

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics.

Article

DOI: 10.3389/fgene.2015.00045

PMC ID: 4330915

PubMed ID: 25741365

SO-VID: 546063c5-e992-4588-9312-4f777eb9888b

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received: 30 November 2014

Date accepted: 30 January 2015

Page count

Figures: 8, Tables: 1, Equations: 0, References: 61, Pages: 13, Words: 9402
