Copy number variation detection and genotyping from exome sequence data


Abstract

While exome sequencing is readily amenable to single-nucleotide variant discovery, the sparse and nonuniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method using singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) as well as genotype copy number polymorphic (CNP) loci with high sensitivity and specificity from exome sequencing data. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that this method can be used to reliably predict (94% overall precision) both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r² = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.
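The abstract names SVD normalization without giving its details, so the following is only a minimal sketch of the general idea, not the CoNIFER implementation. It assumes an exons-by-samples matrix of read-depth values, standardizes each exon across samples, and discards the strongest singular components on the assumption that they capture systematic capture and batch effects rather than true copy number signal. The function name `svd_normalize` and the parameter `n_remove` are hypothetical and chosen for illustration.

```python
import numpy as np


def svd_normalize(depth, n_remove=1):
    """Remove the top singular components from a read-depth matrix.

    depth    : 2-D array, rows = exons, columns = samples (assumed layout).
    n_remove : number of leading singular components to discard
               (hypothetical parameter; a suitable value is data dependent).
    """
    depth = np.asarray(depth, dtype=float)

    # Standardize each exon (row) across samples so that exons with
    # different baseline coverage become comparable.
    mean = depth.mean(axis=1, keepdims=True)
    std = depth.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant rows
    z = (depth - mean) / std

    # Thin SVD: z = U * diag(s) * Vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(z, full_matrices=False)

    # Zero the strongest components, assumed here to reflect systematic noise.
    s_clean = s.copy()
    s_clean[:n_remove] = 0.0

    # Reconstruct the matrix from the remaining components.
    return (u * s_clean) @ vt
```

In a sketch like this, a candidate deletion or duplication would appear as a run of consecutive exons whose normalized values in one sample sit well below or above zero. How many components to remove and how to threshold the result are choices the abstract does not specify.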


Author and article information

Journal

Journal ID (nlm-ta): Genome Res

Journal ID (iso-abbrev): Genome Res

Journal ID (publisher-id): GENOME

Title: Genome Research

Publisher: Cold Spring Harbor Laboratory Press

ISSN (Print): 1088-9051

ISSN (Electronic): 1549-5469

Publication date (Print): August 2012

Volume: 22

Issue: 8

Pages: 1525-1532

Affiliations

[1] Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA

[2] NHLBI Exome Sequencing Project, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

[3] Department of Public Health Sciences, Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia 22908, USA

[4] Howard Hughes Medical Institute, Seattle, Washington 98195, USA

Author notes

[5] Corresponding author. E-mail: eee@gs.washington.edu

Article

Medline ID: 9518021

DOI: 10.1101/gr.138115.112

PMC ID: 3409265

PubMed ID: 22585873

SO-VID: b3c1904e-e7bb-4880-82d2-0ed335ed4205

License:

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.

History

Date received: 26 January 2012

Date accepted: 11 May 2012
