Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller

Abstract

Background

Detecting DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this with next-generation sequencing, researchers need a detection method that 1) efficiently captures rare mutation-containing DNA fragments from a background of abundant wild-type DNA; 2) sequences the DNA library to deep coverage; and 3) accurately distinguishes true low-level variants from amplification and sequencing errors. Targeted enrichment using PCR primers is a convenient way to achieve deep sequencing of a small but highly relevant region on benchtop sequencers. Molecular barcoding (or indexing) offers a unique way to reduce sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in the recent literature, most variant calling has been done on limited targets using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models.
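The general idea behind barcode-based error reduction can be illustrated with a minimal sketch. It assumes that reads sharing a molecular barcode derive from the same original DNA molecule, so an isolated disagreement within a barcode family is more likely an amplification or sequencing error than a true variant. The function name `consensus_by_barcode`, the `min_agreement` threshold, and the toy reads are hypothetical and are not taken from the article; real pipelines also handle alignment, base qualities, and barcode errors, which are omitted here.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_agreement=0.8):
    """Collapse reads into one consensus sequence per molecular barcode.

    reads: iterable of (barcode, sequence) pairs. Sequences that share a
    barcode are assumed to be already aligned and of equal length
    (a simplifying assumption for this sketch).
    """
    # Group reads into families by barcode.
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        bases = []
        # Walk the family column by column and take the majority base.
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Emit "N" when the majority is not strong enough.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus


# Toy input: one read in family "AAGT" carries a likely error (G -> T).
toy_reads = [
    ("AAGT", "ACGT"),
    ("AAGT", "ACGT"),
    ("AAGT", "ACTT"),
    ("CCTA", "ACGA"),
    ("CCTA", "ACGA"),
]
lenient = consensus_by_barcode(toy_reads, min_agreement=0.6)
strict = consensus_by_barcode(toy_reads, min_agreement=0.8)
```

With the lenient threshold the discordant base is outvoted by the two agreeing reads; with the stricter threshold the position is masked as `N` instead, which trades sensitivity for specificity.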

Results

We present here a highly efficient, simple, and scalable enrichment protocol that integrates molecular barcodes into multiplex PCR amplification. In addition, we developed smCounter, an open-source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5% and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions.
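To give a feel for what a Bayesian treatment of barcode counts can look like, here is a toy sketch. It is not smCounter's model, which the abstract does not describe; it simply compares two hypotheses at a single site with a binomial likelihood over barcode families. The function names and the default values for `allele_fraction`, `error_rate`, and `prior` are assumptions chosen for illustration only.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_variant(k, n, allele_fraction=0.01, error_rate=1e-4, prior=1e-3):
    """Posterior probability that a site carries a true variant.

    k: number of barcode families supporting the alternate allele
    n: total number of barcode families covering the site
    The two competing hypotheses (illustrative assumptions):
      variant -> each family supports the allele with prob. allele_fraction
      error   -> each family supports the allele with prob. error_rate
    """
    like_variant = binom_pmf(k, n, allele_fraction)
    like_error = binom_pmf(k, n, error_rate)
    numerator = prior * like_variant
    return numerator / (numerator + (1 - prior) * like_error)


# 10 of 1000 families support the alternate allele vs. none at all.
p_supported = posterior_variant(10, 1000)
p_unsupported = posterior_variant(0, 1000)
```

Under these assumed parameters, ten supporting families out of a thousand push the posterior close to one, while zero supporting families push it close to zero. Counting barcode families rather than raw reads is what lets the assumed per-family error rate be much lower than a per-read error rate.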

Conclusions

We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-3425-4) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Chang Xu: chang.xu@qiagen.com

Yexun Wang: yexun.wang@qiagen.com (ORCID: http://orcid.org/0000-0003-0362-1892)

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2164

Publication date (Electronic): 3 January 2017

Publication date PMC-release: 3 January 2017

Publication date Collection: 2017

Volume: 18

Electronic Location Identifier: 5

Affiliations

Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland, 21703 USA

Article

Publisher ID: 3425

DOI: 10.1186/s12864-016-3425-4

PMC ID: 5209917

PubMed ID: 28049435

SO-VID: ad6c7de7-d770-462d-8edb-78969ed83922

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 27 May 2016

Date accepted: 14 December 2016

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: variant caller, molecular barcode, statistical model, PCR enrichment
