SeqPurge: highly-sensitive adapter trimming for paired-end NGS data

Abstract

Background

Trimming of adapter sequences from short read data is a common preprocessing step during NGS data analysis. When performing paired-end sequencing, the overlap between forward and reverse read can be used to identify excess adapter sequences. This is exploited by several previously published adapter trimming tools. However, our evaluation on amplicon-based data shows that most of the current tools are not able to remove all adapter sequences and that adapter contamination may even lead to spurious variant calls.
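
The overlap idea can be illustrated with a minimal Python sketch. It assumes error-free reads of equal length and exact matching, which real tools do not require. The function names and the `min_insert` parameter are hypothetical and are not taken from any of the published trimmers.

```python
from typing import Optional, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def find_insert_length(fwd: str, rev: str, min_insert: int = 8) -> Optional[int]:
    """Return the insert length if it is shorter than the reads, else None.

    If the insert has length L < read length, the first L bases of the
    forward read equal the reverse complement of the first L bases of the
    reverse read. Everything after position L is adapter sequence.
    """
    read_len = min(len(fwd), len(rev))
    # Try long candidates first: a long exact match is unlikely to be chance.
    for length in range(read_len - 1, min_insert - 1, -1):
        if fwd[:length] == reverse_complement(rev[:length]):
            return length
    return None


def trim_pair(fwd: str, rev: str, min_insert: int = 8) -> Tuple[str, str]:
    """Cut both reads back to the insert if an overlap is found."""
    length = find_insert_length(fwd, rev, min_insert)
    if length is None:
        return fwd, rev  # no evidence of read-through into the adapter
    return fwd[:length], rev[:length]
```

Because this sketch demands an exact match, a single sequencing error in the overlap would hide the adapter; tolerating such errors is what the tools discussed here address.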

Results

Here we present SeqPurge (https://github.com/imgag/ngs-bits), a highly-sensitive adapter trimmer that uses a probabilistic approach to detect the overlap between forward and reverse reads of Illumina sequencing data. SeqPurge can detect very short adapter sequences, even if only one base long. Compared to other adapter trimmers specifically designed for paired-end data, we found that SeqPurge achieves a higher sensitivity. The number of remaining adapter bases after trimming is reduced by up to 90 %, depending on the compared tool. In simulations with different error rates, we found that SeqPurge is also the most error-tolerant adapter trimmer in the comparison.
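
The abstract does not spell out the probabilistic model. As an illustration only, the Python sketch below scores a candidate overlap by the binomial probability of seeing at least the observed number of matching bases by chance. It assumes uniformly random bases (match probability 0.25); the function names and the `max_p` threshold are hypothetical, and this should not be read as a description of SeqPurge's implementation.

```python
from math import comb
from typing import Optional

_RC_TABLE = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_RC_TABLE)[::-1]


def chance_match_probability(n: int, k: int, p: float = 0.25) -> float:
    """P(X >= k) for X ~ Binomial(n, p): k or more of n bases match by chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def count_matches(a: str, b: str) -> int:
    """Count positions where both sequences carry the same called base."""
    return sum(x == y and x != "N" for x, y in zip(a, b))


def detect_insert_length(fwd: str, rev: str, max_p: float = 1e-6) -> Optional[int]:
    """Return the most significant insert length below the read length, or None."""
    read_len = min(len(fwd), len(rev))
    best_len, best_p = None, max_p
    for length in range(read_len - 1, 0, -1):
        matches = count_matches(fwd[:length], reverse_complement(rev[:length]))
        p_value = chance_match_probability(length, matches)
        if p_value < best_p:  # keep only overlaps that beat the threshold
            best_len, best_p = length, p_value
    return best_len
```

Under these assumptions the statistical evidence comes from the overlapping insert rather than from the adapter itself, which is one way an adapter remnant of a single base can still be located: an overlap covering all but the last base of the reads is far too long to match by chance, even with a mismatch or two.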

Conclusion

SeqPurge achieves a very high sensitivity and a high error-tolerance, combined with a specificity and runtime that are comparable to other state-of-the-art adapter trimmers. The very good adapter trimming performance, complemented with additional features such as quality-based trimming and basic quality control, makes SeqPurge an excellent choice for the pre-processing of paired-end NGS data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1069-7) contains supplementary material, which is available to authorized users.

Most cited references 5

Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads.

Gerton Lunter, Martin Goodson (2011)

High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences ("reads") resulting from a sequencing run is first "mapped" (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly so for short insertions and deletions (indels). Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher useable sequence yield and improved accuracy compared to that of existing software.

AdapterRemoval: easy cleaning of next-generation sequencing reads

Stinus Lindgreen (2012)

Background: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. Findings: We present a tool called AdapterRemoval which is able to pre-process both single and paired-end data. The program locates and removes adapter residues from the reads, it is able to combine paired reads if they overlap, and it can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence in both the 5’ and 3’ ends of the reads. This is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. Conclusions: AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.

An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis

Cristian Del Fabbro, Simone Scalabrin, Michele Morgante … (2013)

Next Generation Sequencing is having an extremely strong impact in biological and medical research and diagnostics, with applications ranging from gene expression quantification to genotyping and genome reconstruction. Sequencing data is often provided as raw reads which are processed prior to analysis. One of the most used preprocessing procedures is read trimming, which aims at removing low quality portions while preserving the longest high quality part of a NGS read. In the current work, we evaluate nine different trimming algorithms in four datasets and three common NGS-based applications (RNA-Seq, SNP calling and genome assembly). Trimming is shown to increase the quality and reliability of the analysis, with concurrent gains in terms of execution time and computational resources needed.

Author and article information

Contributors

Marc Sturm: marc.sturm@med.uni-tuebingen.de

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Electronic): 10 May 2016

Publication date PMC-release: 10 May 2016

Publication date Collection: 2016

Volume: 17

Electronic Location Identifier: 208

Affiliations

Institute of Medical Genetics and Applied Genomics, University Hospital Tübingen, Tübingen, Germany

Article

Publisher ID: 1069

DOI: 10.1186/s12859-016-1069-7

PMC ID: 4862148

PubMed ID: 27161244

SO-VID: 20ccc8e4-2d79-44d6-afdb-ce28577e129b

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 22 January 2016

Date accepted: 3 May 2016

Custom metadata

ScienceOpen disciplines: Bioinformatics & Computational biology
