Tools for mapping high-throughput sequencing data

Abstract

A ubiquitous and fundamental step in high-throughput sequencing analysis is the alignment (mapping) of the generated reads to a reference sequence. Numerous software tools have been proposed for this task, and determining which mappers are most suitable for a specific application is not trivial. This survey classifies mappers according to a wide range of characteristics. The goal is to allow practitioners to compare the mappers more easily and find those that are most suitable for their specific problem.

Most cited references 42

Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning.

Shawn Cokus, Suhua Feng, Xiaoyu Zhang … (2008)

Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences. Recent genomic studies in Arabidopsis thaliana have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels. However, plants have different types of methylation controlled by different genetic pathways, and detailed information on the methylation status of each cytosine in any given genome is lacking. To this end, we generated a map at single-base-pair resolution of methylated cytosines for Arabidopsis, by combining bisulphite treatment of genomic DNA with ultra-high-throughput sequencing using the Illumina 1G Genome Analyser and Solexa sequencing technology. This approach, termed BS-Seq, unlike previous microarray-based methods, allows one to sensitively measure cytosine methylation on a genome-wide scale within specific sequence contexts. Here we describe methylation on previously inaccessible components of the genome and analyse the DNA methylation sequence composition and distribution. We also describe the effect of various DNA methylation mutants on genome-wide methylation patterns, and demonstrate that our newly developed library construction and computational methods can be applied to large genomes such as that of mouse.

Cited 793 times

ART: a next-generation sequencing read simulator.

Weichun Huang, Leping Li, Jason R. Myers … (2012)

ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.

Cited 653 times

Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads.

Gerton Lunter, Martin Goodson (2011)

High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences ("reads") resulting from a sequencing run is first "mapped" (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly so for short insertions and deletions (indels). Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher useable sequence yield and improved accuracy compared to that of existing software.

Cited 523 times

Author and article information

Journal

Title: Bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 1460-2059

ISSN (Print): 1367-4803

Publication date (Print): December 01 2012

Publication date (Electronic): October 11 2012

Volume: 28

Issue: 24

Pages: 3169-3177

Article

DOI: 10.1093/bioinformatics/bts605

PubMed ID: 23060614

SO-VID: 6ba6b3b2-0cb2-4a1d-9baa-1cda98bbbce1

Cited by 100
