Quality control and preprocessing of metagenomic datasets

Abstract

Summary: Here, we present PRINSEQ for easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. Summary statistics of FASTA (and QUAL) or FASTQ files are generated in tabular and graphical form and sequences can be filtered, reformatted and trimmed by a variety of options to improve downstream analysis.

Availability and Implementation: This open-source application was implemented in Perl and can be used as a standalone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available at http://prinseq.sourceforge.net/.

Contact: rschmied@sciences.sdsu.edu; redwards@cs.sdsu.edu
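The kind of preprocessing described in the Summary (summary statistics, trimming and filtering of FASTQ reads) can be illustrated with a minimal sketch. This is not PRINSEQ's code, which is written in Perl; it is a small Python illustration under stated assumptions: Phred+33 quality encoding, simple four-line FASTQ records, and hypothetical function names and thresholds (`min_q`, `min_len`, `max_n_fraction`) chosen only for the example.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer scores (Phred+33 is assumed)."""
    return [ord(c) - offset for c in qual]


def trim_right(seq, qual, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def summarize(records):
    """Return simple summary statistics for a list of records."""
    lengths = [len(seq) for _, seq, _ in records]
    quals = [mean(phred_scores(q)) for _, _, q in records if q]
    return {
        "reads": len(records),
        "mean_length": mean(lengths) if lengths else 0,
        "mean_quality": mean(quals) if quals else 0,
    }


def preprocess(records, min_q=20, min_len=5, max_n_fraction=0.1):
    """Trim each read, then drop reads that are too short or too ambiguous."""
    kept = []
    for read_id, seq, qual in records:
        seq, qual = trim_right(seq, qual, min_q)
        if not seq or len(seq) < min_len:
            continue  # too short after trimming
        if seq.upper().count("N") / len(seq) > max_n_fraction:
            continue  # too many ambiguous bases
        kept.append((read_id, seq, qual))
    return kept


# Hypothetical example data, made up for illustration only.
sample = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIII##",
    "@read2", "ACGT", "+", "####",
    "@read3", "ANNNACGTAC", "+", "IIIIIIIIII",
]
records = list(parse_fastq(sample))
print(summarize(records))
print(preprocess(records))
```

In this sketch the statistics are computed on the raw reads and the filters are applied afterwards, so the effect of each threshold can be checked by calling `summarize` again on the output of `preprocess`.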

Most cited references

Manipulation of FASTQ data with Galaxy

Daniel Blankenberg, Assaf Gordon, Gregory Von Kuster … (2010)

Summary: Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps.

Availability and Implementation: This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Two short movies that highlight the functionality of tools described in this manuscript as well as results from testing components of this tool suite against a set of previously published files are available at http://usegalaxy.org/u/dan/p/fastq

Contact: james.taylor@emory.edu; anton@bx.psu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Systematic artifacts in metagenomes from complex microbial communities.

Vicente Gomez-Alvarez, Tracy K. Teal, Thomas M. Schmidt (2009)

Metagenomics is providing an unprecedented view of the taxonomic diversity, metabolic potential and ecological role of microbial communities in biomes as diverse as the mammalian gastrointestinal tract, the marine water column and soils. However, we have found a systematic error in metagenomes generated by 454-based pyrosequencing that leads to an overestimation of gene and taxon abundance; between 11% and 35% of sequences in a typical metagenome are artificial replicates. Here we document the error in several published and original datasets and offer a web-based solution (http://microbiomes.msu.edu/replicates) for identifying and removing these artifacts.
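As a rough illustration of what flagging artificial replicates can involve, the sketch below treats a read as a replicate when it is identical to, or an exact prefix of, a longer read that has already been kept. This is a deliberate simplification and an assumption of this example, not the method used by the web tool mentioned above; the function name and the example reads are hypothetical.

```python
def remove_replicates(reads):
    """Return (kept_reads, replicate_fraction).

    A read counts as a replicate if a kept read starts with its full
    sequence (exact duplicate or exact prefix). Longest reads are kept.
    The pairwise check is quadratic, which is fine for a small example
    but not for a real dataset.
    """
    kept = []
    removed = 0
    # Stable sort, longest first, so shorter reads are compared to longer ones.
    for read_id, seq in sorted(reads, key=lambda r: -len(r[1])):
        if any(kept_seq.startswith(seq) for _, kept_seq in kept):
            removed += 1
        else:
            kept.append((read_id, seq))
    fraction = removed / len(reads) if reads else 0.0
    return kept, fraction


# Hypothetical reads, made up for illustration only.
reads = [
    ("a", "ACGTACGT"),
    ("b", "ACGTACGT"),  # exact duplicate of "a"
    ("c", "ACGTAC"),    # exact prefix of "a"
    ("d", "TTGACCAA"),
]
kept, fraction = remove_replicates(reads)
print([read_id for read_id, _ in kept], fraction)
```

Reporting the replicate fraction alongside the filtered reads makes it easy to compare a dataset against the range quoted in the abstract above.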

TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

Robert Schmieder, Yan Wei Lim, Forest Rohwer … (2010)

Background: Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data.

Results: TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences.

Conclusions: TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner.

Author and article information

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (publisher-id): bioinformatics

Journal ID (hwp): bioinfo

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 15 March 2011

Publication date (Electronic): 28 January 2011

Publication date PMC-release: 28 January 2011

Volume: 27

Issue: 6

Pages: 863-864

Affiliations

¹Department of Computer Science, ²Computational Science Research Center, San Diego State University, San Diego, CA 92182 and ³Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA

Author notes

* To whom correspondence should be addressed.

Associate Editor: Alex Bateman

Article

Publisher ID: btr026

DOI: 10.1093/bioinformatics/btr026

PMC ID: 3051327

PubMed ID: 21278185

SO-VID: 244a01be-c7c0-40cf-979c-5c5dad3d1669

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 8 November 2010

Date revision received: 11 January 2011

Date accepted: 12 January 2011
