      Is Open Access

      Extraction of long k-mers using spaced seeds

      Preprint


          Abstract

          The extraction of k-mers from sequencing reads is an important task in many bioinformatics applications, including all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the k-mers used are unique in the analyzed DNA, so longer k-mers are preferred. As the read lengths of short-read sequencing technologies increase, the error rate, rather than the read length, will become the factor that determines the largest usable value of k. Here we propose LoMeX, a tool that uses spaced seeds to extract long k-mers accurately even in the presence of sequencing errors. Our experiments show that LoMeX extracts long k-mers from current Illumina reads with higher recall than a standard k-mer counting tool. Furthermore, our experiments on simulated data show that as the read length increases further, the performance of standard k-mer counters declines, whereas LoMeX still extracts long k-mers successfully.
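          The abstract does not spell out why the error rate caps k or why spaced seeds help, so here is a minimal Python sketch of the general idea under stated assumptions. It assumes that per-base errors are independent with rate e, so a contiguous k-mer is error-free with probability (1 - e)^k. It also assumes that a spaced seed is written as a binary mask, where '1' marks a position that must match and '0' marks a don't-care position. A window's spaced projection then only needs its w care positions to be correct, which happens with probability (1 - e)^w. The function names (`kmers`, `spaced_kmers`, `count`, `error_free_probability`) and the mask `11101111` are hypothetical. This is not LoMeX's implementation and does not show how bases at don't-care positions would be filled in to report full long k-mers.

```python
from collections import Counter
from collections.abc import Callable, Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every contiguous k-mer (substring of length k) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def spaced_kmers(seq: str, mask: str) -> Iterator[str]:
    """Yield, for each window of length len(mask), only the bases at the
    care positions of the (hypothetical) mask: '1' = care, '0' = don't care."""
    if set(mask) - {"0", "1"}:
        raise ValueError("mask must contain only '0' and '1'")
    span = len(mask)
    care = [j for j, c in enumerate(mask) if c == "1"]
    for i in range(len(seq) - span + 1):
        yield "".join(seq[i + j] for j in care)


def count(reads: Iterable[str],
          extractor: Callable[[str], Iterable[str]]) -> Counter:
    """Count all items produced by extractor over a collection of reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(extractor(read))
    return counts


def error_free_probability(e: float, n: int) -> float:
    """Probability that n positions are all correct, ASSUMING independent
    per-base errors at rate e: (1 - e) ** n."""
    return (1.0 - e) ** n
```

          For example, take the reads `ACGTTGCA` and `ACGATGCA`, where the second read carries an assumed error at position 3. Counting contiguous 8-mers sees the true k-mer `ACGTTGCA` only once, because the erroneous read yields a different 8-mer. With the mask `11101111`, position 3 is a don't-care position, so both reads project to `ACGTGCA`, which is counted twice. Under the independence assumption, `error_free_probability(0.01, 100)` is about 0.366, while a seed with 50 care positions gives about 0.605. This illustrates why longer contiguous k-mers become harder to observe intact as k grows.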


          Author and article information

          Date: 22 October 2020
          Article identifiers: 2010.11592 (arXiv); 24f72430-08ee-48ac-ad13-003ac1d260de

          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          LoMeX is freely available at https://github.com/Denopia/LoMeX
          arXiv category: q-bio.GN

          Genetics
