24
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Improved ESP-index: a practical self-index for highly repetitive texts

      Preprint
      , ,

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary searches for finding appearances of variables for a query text, which resulted in slowing down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index about the top-down searches, it avoid the binary searches using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large-scale, and we show that ESP-index-I performs better that other possible approaches.

          Related collections

          Most cited references10

          • Record: found
          • Abstract: not found
          • Book Chapter: not found

          A Faster Grammar-Based Self-index

            Bookmark
            • Record: found
            • Abstract: not found
            • Book Chapter: not found

            Succinct Representations of Permutations

              Bookmark
              • Record: found
              • Abstract: not found
              • Book Chapter: not found

              LZ77-Based Self-indexing with Faster Pattern Matching

                Bookmark

                Author and article information

                Journal
                2014-04-19
                2014-04-27
                Article
                1404.4972
                b301e345-6bb4-4894-a00f-7fa648384872

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                This is the full version of a proceeding accepted to the 11th International Symposium on Experimental Algorithms (SEA2014)
                cs.DS

                Data structures & Algorithms
                Data structures & Algorithms

                Comments

                Comment on this article