STAR: ultrafast universal RNA-seq aligner

Abstract

Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning 550 million 2 × 76 bp paired-end reads per hour to the human genome on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
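The seed search named in the abstract can be illustrated with a toy example. The Python sketch below is not STAR's implementation. It only shows, under simplifying assumptions, how an uncompressed suffix array can be used to find the longest prefix of a read that occurs exactly in a reference, and how repeating that search from the first unmatched base splits a spliced read into seeds. All function names are hypothetical, the suffix array is built naively, and mismatches, seed clustering, stitching and scoring are left out.

```python
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: start positions sorted by suffix (illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a: str, i: int, b: str) -> int:
    """Length of the longest common prefix of a[i:] and b."""
    n = 0
    while i + n < len(a) and n < len(b) and a[i + n] == b[n]:
        n += 1
    return n


def maximal_mappable_prefix(
    genome: str, sa: List[int], read: str, start: int
) -> Tuple[int, List[int]]:
    """Longest exact-match prefix of read[start:] in genome, with all its loci."""
    query = read[start:]
    # Binary search for the position where `query` would be inserted.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The longest match is shared with a suffix adjacent to the insertion point.
    best = 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            best = max(best, common_prefix_len(genome, sa[k], query))
    if best == 0:
        return 0, []
    # Suffixes sharing that prefix form a contiguous block around `lo`.
    positions = []
    k = lo - 1
    while k >= 0 and common_prefix_len(genome, sa[k], query) >= best:
        positions.append(sa[k])
        k -= 1
    k = lo
    while k < len(sa) and common_prefix_len(genome, sa[k], query) >= best:
        positions.append(sa[k])
        k += 1
    return best, sorted(positions)


def sequential_seeds(
    genome: str, sa: List[int], read: str
) -> List[Tuple[int, int, List[int]]]:
    """Repeat the prefix search from the first unmatched base of the read."""
    seeds = []
    start = 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(genome, sa, read, start)
        if length == 0:
            start += 1  # simplification: skip a base that matches nowhere
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy reference: two "exons" separated by a 12-base "intron".
exon1, intron, exon2 = "ACGTGCA", "GTAAGTTTTTAG", "CCATTGA"
genome = exon1 + intron + exon2
read = exon1 + exon2  # a read spanning the splice junction
sa = build_suffix_array(genome)
seeds = sequential_seeds(genome, sa, read)
print(seeds)
```

For this toy input the read splits into two seeds, `(0, 7, [0])` and `(7, 7, [19])`: the first seven bases map to reference position 0 and the remaining seven to position 19, so the gap between the two loci corresponds to the intron. A real aligner would then have to cluster and stitch such seeds and tolerate mismatches, which this sketch does not attempt.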

Author and article information

Journal

Title: Bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 1460-2059

ISSN (Print): 1367-4803

Publication date (Print): January 01 2013

Publication date (Electronic): October 25 2012

Volume: 29

Issue: 1

Pages: 15-21

Article

DOI: 10.1093/bioinformatics/bts635

PMC ID: 3530905

PubMed ID: 23104886

SO-VID: e5081734-6d53-4318-9c23-a9d053187d7b
