Comparison of error correction algorithms for Ion Torrent PGM data: application to hepatitis B virus

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Ion Torrent Personal Genome Machine (PGM) technology is a mid-length read, low-cost and high-speed next-generation sequencing platform with a relatively high insertion and deletion (indel) error rate. A full systematic assessment of the effectiveness of various error correction algorithms in PGM viral datasets (e.g., hepatitis B virus (HBV)) has not been performed. We examined 19 quality-trimmed PGM datasets for the HBV reverse transcriptase (RT) region and found a total error rate of 0.48% ± 0.12%. Deletion errors were clearly present at the ends of homopolymer runs. Tests using both real and simulated data showed that the algorithms differed in their abilities to detect and correct errors and that the error rate and sequencing depth significantly affected the performance. Of the algorithms tested, Pollux showed a better overall performance but tended to over-correct ‘genuine’ substitution variants, whereas Fiona proved to be better at distinguishing these variants from sequencing errors. We found that the combined use of Pollux and Fiona gave the best results when error-correcting Ion Torrent PGM viral data.

Related collections

Most cited references 34

Record: found
Abstract: found
Article: not found

An integrated semiconductor device enabling non-optical genome sequencing.

Jonathan Rothberg, Wolfgang Hinz, Todd M. Rearick … (2011)

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

0 comments Cited 549 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

SSAHA: a fast search method for large DNA databases.

Z. Ning, A Cox, J Mullikin (2001)

We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the "hits" for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects.

0 comments Cited 373 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

AdapterRemoval: easy cleaning of next-generation sequencing reads

Stinus Lindgreen (2012)

Background With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. Findings We present a tool called AdapterRemoval which is able to pre-process both single and paired-end data. The program locates and removes adapter residues from the reads, it is able to combine paired reads if they overlap, and it can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence in both the 5’ and 3’ ends of the reads. This is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. Conclusions AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.

0 comments Cited 282 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

Keyue Ding: dingkeyue@hospital.cqmu.edu.cn

Journal

Journal ID (nlm-ta): Sci Rep

Journal ID (iso-abbrev): Sci Rep

Title: Scientific Reports

Publisher: Nature Publishing Group UK (London )

ISSN (Electronic): 2045-2322

Publication date (Electronic): 14 August 2017

Publication date PMC-release: 14 August 2017

Publication date Collection: 2017

Volume: 7

Electronic Location Identifier: 8106

Affiliations

[1 ]ISNI 0000 0000 8653 0555, GRID grid.203458.8, , Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital, Chongqing Medical University, ; Chongqing, 400010 P.R. China

[2 ]ISNI 0000 0001 0662 3178, GRID grid.12527.33, , Center for Hepatobillary and Pancreatic Diseases, Beijing Tsinghua Changgung Hospital, Medical Center, Tsinghua University, ; Beijing, 100044 P.R. China

Article

Publisher ID: 8139

DOI: 10.1038/s41598-017-08139-y

PMC ID: 5556038

PubMed ID: 28808243

SO-VID: 1b567f86-6274-473f-b2b5-948d65026962

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received : 5 January 2017

Date accepted : 5 July 2017

Custom metadata

ScienceOpen disciplines: Uncategorized

Data availability:

ScienceOpen disciplines: Uncategorized

Comments

Comment on this article

scite_

Cited by 14

See all cited by

Most referenced authors 868

See all reference authors

Comparison of error correction algorithms for Ion Torrent PGM data: application to hepatitis B virus

Read this article at

Abstract

Related collections

Point-Of-Care Testing for Hepatitis

Most cited references 34

An integrated semiconductor device enabling non-optical genome sequencing.

SSAHA: a fast search method for large DNA databases.

AdapterRemoval: easy cleaning of next-generation sequencing reads

Author and article information

Contributors

Journal

Affiliations

Article

History

Categories

Custom metadata

Comments

Comment on this article

Similar content 94

Cited by 14

Most referenced authors 868