appreci8: a pipeline for precise variant calling integrating 8 tools

Abstract

Motivation

The application of next-generation sequencing in research, and particularly in clinical routine, requires valid variant calling results. However, evaluations of several commonly used tools have shown that no single tool meets this requirement. False positive as well as false negative calls necessitate additional experiments and extensive manual work. An intelligent combination of different tools, together with filtering of their output, could significantly improve the current situation.

Results

We developed appreci8, an automatic variant calling pipeline for calling single nucleotide variants and short indels by combining and filtering the output of eight open-source variant calling tools, based on a novel artifact- and polymorphism score. Appreci8 was trained on two data sets from patients with myelodysplastic syndrome, covering 165 Illumina samples. Subsequently, appreci8’s performance was tested on five independent data sets, covering 513 samples. Variation in sequencing platform, target region and disease entity was considered. All calls were validated by re-sequencing on the same platform, a different platform or expert-based review. Sensitivity of appreci8 ranged between 0.93 and 1.00, while positive predictive value ranged between 0.65 and 1.00. In all cases, appreci8 showed superior performance compared to any evaluated alternative approach.
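The two performance measures reported above can be illustrated with a minimal sketch. It assumes that each caller's output has already been reduced to a set of (chromosome, position, reference allele, alternate allele) tuples and that a validated call set is available. The tool names, the toy variants and the `min_support` threshold are hypothetical, and the simple vote count used here is not appreci8's artifact and polymorphism score.

```python
from collections import Counter


def merge_calls(calls_by_tool):
    """Count how many tools report each variant (a simple vote, for illustration only)."""
    support = Counter()
    for calls in calls_by_tool.values():
        support.update(set(calls))  # each tool counts at most once per variant
    return dict(support)


def sensitivity_ppv(called, truth):
    """Return (sensitivity, positive predictive value) of a call set against validated calls."""
    tp = len(called & truth)   # true positives: called and validated
    fp = len(called - truth)   # false positives: called but not validated
    fn = len(truth - called)   # false negatives: validated but not called
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv


# Toy variants as (chrom, pos, ref, alt); all values are made up.
v1 = ("chr1", 100, "A", "G")
v2 = ("chr2", 200, "C", "T")
v3 = ("chr3", 300, "G", "GA")
v4 = ("chr4", 400, "TC", "T")
v5 = ("chr5", 500, "A", "C")

# Hypothetical per-tool call sets and validated truth set.
calls_by_tool = {
    "tool_a": {v1, v2},
    "tool_b": {v1, v3},
    "tool_c": {v1, v2, v4},
}
truth = {v1, v2, v5}

support = merge_calls(calls_by_tool)

# Union of all tools: every variant reported by at least one tool.
union_calls = set(support)
union_metrics = sensitivity_ppv(union_calls, truth)

# Hypothetical filter: keep variants reported by at least two tools.
min_support = 2
filtered_calls = {v for v, n in support.items() if n >= min_support}
filtered_metrics = sensitivity_ppv(filtered_calls, truth)

print("union:   ", union_metrics)
print("filtered:", filtered_metrics)
```

In this toy example, filtering removes the two unvalidated calls without losing a validated one, so the positive predictive value rises while sensitivity is unchanged. Real data need not behave this way.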

Availability and implementation

Appreci8 is freely available at https://hub.docker.com/r/wwuimi/appreci8/. Sequencing data (BAM files) of the 678 patients analyzed with appreci8 have been deposited into the NCBI Sequence Read Archive (BioProjectID: 388411; https://www.ncbi.nlm.nih.gov/bioproject/PRJNA388411).

Supplementary information

Supplementary data are available at Bioinformatics online.

Author and article information

Contributors

John Hancock: Role: Associate Editor

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 15 December 2018

Publication date (Electronic): 26 June 2018

Publication date PMC-release: 26 June 2018

Volume: 34

Issue: 24

Pages: 4205-4212

Affiliations

[1] Institute of Medical Informatics, University of Münster, Münster, Germany

[2] Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden

[3] Laboratory Hematology, RadboudUMC, Nijmegen GA, The Netherlands

[4] Department of Hematology, Oncology, and Rheumatology, Heidelberg University Hospital, Heidelberg, Germany

[5] Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden

[6] Departments of Hematology Oncology & Molecular Medicine, Fondazione IRCCS Policlinico San Matteo & University of Pavia, Pavia, Italy

Author notes

To whom correspondence should be addressed. E-mail: sarah.sandmann@uni-muenster.de

Author information

Sarah Sandmann http://orcid.org/0000-0002-5011-0641

Article

Publisher ID: bty518

DOI: 10.1093/bioinformatics/bty518

PMC ID: 6289140

PubMed ID: 29945233

SO-VID: b05dc5e4-1568-4cc2-b5d7-8776ba29ea12

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

History

Date received: 19 March 2018

Date revision received: 30 May 2018

Date accepted: 25 June 2018

Page count

Pages: 8

Funding

Funded by: European Union

Funded by: Triage-MDS

Funded by: ERA-Net TRANSCAN BMBF

Award ID: 01KT1401

Funded by: Horizon2020 MDS-RIGHT

Award ID: 634789

Funded by: Deutsche Krebshilfe 10.13039/501100005972

Funded by: Verbesserung der Diagnostik von Tumorerkrankungen durch neue DNA-Sequenzierverfahren und Algorithmen
