Is Open Access

Burrows-Wheeler transform for terabases

Preprint

      Abstract

      In order to avoid the reference bias introduced by mapping reads to a reference genome, bioinformaticians are investigating reference-free methods for analyzing sequenced genomes. With large projects sequencing thousands of individuals, this raises the need for tools capable of handling terabases of sequence data. A key method is the Burrows-Wheeler transform (BWT), which is widely used for compressing and indexing reads. We propose a practical algorithm for building the BWT of a large read collection by merging the BWTs of subcollections. With our 2.4 Tbp datasets, the algorithm can merge 600 Gbp/day on a single system, using 30 gigabytes of memory overhead on top of the run-length encoded BWTs.
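As a small-scale illustration of the transform named in the abstract, the sketch below builds the BWT of a single short string by sorting its rotations and then run-length encodes the result. This is the textbook construction, not the merging algorithm the paper proposes. The function names `bwt` and `run_length_encode` and the `$` terminator are assumptions made for this example. The approach needs memory quadratic in the input length, so it is only meant to show what the transform produces.

```python
from itertools import groupby


def bwt(text: str, terminator: str = "$") -> str:
    """Naive BWT: sort all rotations of text + terminator, keep the last column.

    Assumes the terminator sorts before every character of the text
    (true for "$" against ASCII letters) and does not occur in the text.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse each maximal run of equal characters into (character, length)."""
    return [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)                     # annb$aa
    print(run_length_encode(transformed))  # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
```

The transform tends to group equal characters into runs when the input is repetitive, which is why a run-length encoded representation such as the one mentioned in the abstract can be compact.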

      Author and article information

      2015-11-03
      2016-01-14
      arXiv:1511.00898

      http://arxiv.org/licenses/nonexclusive-distrib/1.0/

      Custom metadata
      This is the full version of the paper that was accepted to DCC 2016. The implementation is available at https://github.com/jltsiren/bwt-merge
      cs.DS

      Data structures & Algorithms
