
      ERGC: an efficient referential genome compression algorithm

      research-article
      Bioinformatics
      Oxford University Press


          Abstract

          Motivation: Genome sequencing has become faster and more affordable. Consequently, the number of available complete genomic sequences is increasing rapidly. As a result, the cost to store, process, analyze and transmit the data is becoming a bottleneck for research and future medical applications. The need for efficient data compression and data reduction techniques for biological sequencing data is therefore growing by the day. Although a number of standard data compression algorithms exist, they are not efficient at compressing biological data, because these generic algorithms do not exploit inherent properties of the sequencing data. To exploit the statistical and information-theoretic properties of genomic sequences, we need specialized compression algorithms. Five different next-generation sequencing data compression problems have been identified and studied in the literature. We propose a novel algorithm for one of these problems, known as reference-based genome compression.

          Results: We conducted extensive experiments on five real sequencing datasets. The results on real genomes show that the proposed algorithm is competitive: it achieves better compression ratios than the best currently known algorithms for this problem. The time to compress and decompress a whole genome is also very promising.

          Availability and implementation: The implementations are freely available for non-commercial purposes. They can be downloaded from http://engr.uconn.edu/~rajasek/ERGC.zip.

          Contact: rajasek@engr.uconn.edu


          Author and article information

          Journal
          Bioinformatics
          Oxford University Press
          ISSN: 1367-4803 (print), 1367-4811 (electronic)
          Published: 01 November 2015 (online 02 July 2015)
          Volume: 31
          Issue: 21
          Pages: 3468-3475
          Affiliations
          Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
          Author notes
          *To whom correspondence should be addressed.

          Associate Editor: John Hancock

          Article
          PMCID: PMC4838057
          Publisher ID: btv399
          DOI: 10.1093/bioinformatics/btv399
          PMID: 26139636
          © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
          History
          Received: 11 February 2015
          Revised: 12 June 2015
          Accepted: 17 June 2015
          Page count
          Pages: 8
          Categories
          Original Papers
          Sequence Analysis
