      Is Open Access

      Burrows-Wheeler transform for terabases

      Preprint


          Abstract

          In order to avoid the reference bias introduced by mapping reads to a reference genome, bioinformaticians are investigating reference-free methods for analyzing sequenced genomes. With large projects sequencing thousands of individuals, this raises the need for tools capable of handling terabases of sequence data. A key method is the Burrows-Wheeler transform (BWT), which is widely used for compressing and indexing reads. We propose a practical algorithm for building the BWT of a large read collection by merging the BWTs of subcollections. With our 2.4 Tbp datasets, the algorithm can merge 600 Gbp/day on a single system, using 30 gigabytes of memory overhead on top of the run-length encoded BWTs.
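          To make the idea of merging the BWTs of subcollections concrete, here is a minimal sketch of what a merge has to produce. It is a naive illustration and not the authors' algorithm: it sorts explicit suffixes, which would be infeasible at terabase scale, where the ranks would have to come from the indexes themselves. The function names (`sorted_suffixes`, `merge_bwts`), the `first_id` parameter, and the convention that every read ends in its own terminator (all written as `$`, ordered by read number) are assumptions made for this example.

```python
from bisect import bisect_left


def sorted_suffixes(reads, first_id=0):
    """Return (sorted suffix keys, BWT string) for a small read collection.

    Assumption for this sketch: read number i ends in a unique terminator
    with id first_id + i; terminators sort before all ordinary characters
    and among themselves by id.
    """
    entries = []
    for i, read in enumerate(reads):
        # (0, id) encodes a terminator, (1, c) an ordinary character.
        text = [(1, c) for c in read] + [(0, first_id + i)]
        for p in range(len(text)):
            # The BWT character is the one preceding the suffix; the suffix
            # starting at position 0 is preceded by the read's terminator.
            prev = read[p - 1] if p > 0 else "$"
            entries.append((tuple(text[p:]), prev))
    entries.sort(key=lambda e: e[0])
    keys = [key for key, _ in entries]
    bwt = "".join(prev for _, prev in entries)
    return keys, bwt


def merge_bwts(keys_a, bwt_a, keys_b, bwt_b):
    """Interleave two BWTs into the BWT of the combined collection."""
    # For each suffix of B, count the suffixes of A that sort before it.
    # Here this is done by binary search over explicit suffix keys.
    ranks = [bisect_left(keys_a, key) for key in keys_b]
    merged, i = [], 0
    for j, r in enumerate(ranks):
        merged.append(bwt_a[i:r])  # characters of A that precede B's suffix j
        i = r
        merged.append(bwt_b[j])
    merged.append(bwt_a[i:])       # remaining characters of A
    return "".join(merged)


reads_a = ["GATTACA", "TACAGA"]
reads_b = ["ACAT", "GAGA"]
keys_a, bwt_a = sorted_suffixes(reads_a)
keys_b, bwt_b = sorted_suffixes(reads_b, first_id=len(reads_a))
merged = merge_bwts(keys_a, bwt_a, keys_b, bwt_b)
direct_bwt = sorted_suffixes(reads_a + reads_b)[1]
```

          In this sketch `merged` matches `direct_bwt`, the BWT built from the combined collection in one pass. The point of the design is that the merge only needs one number per suffix of the second collection (its rank among the suffixes of the first) to interleave the two transforms.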

          Author and article information

          Journal
          2015-11-03
          2016-01-14
          Article
          arXiv: 1511.00898

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Custom metadata
          This is the full version of the paper that was accepted to DCC 2016. The implementation is available at https://github.com/jltsiren/bwt-merge
          cs.DS

          Data structures & Algorithms
