      Is Open Access

      Distributed Caching for Complex Querying of Raw Arrays

      Preprint


          Abstract

          As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format, without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. In the first stage, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, it determines an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority of the proposed framework over existing techniques, by as much as two orders of magnitude, in terms of cache overhead and workload execution time.
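
          To make the idea of workload-aware eviction concrete, the following is a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes cached chunks and queries are axis-aligned boxes with inclusive bounds, and it scores each chunk by a recency-decayed count of past queries that overlap it, divided by the chunk's size. The names `WorkloadAwareCache`, `overlaps`, `cell_count`, and the `decay` parameter are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One inclusive (low, high) pair per dimension. Illustrative type, not from the paper.
Box = Tuple[Tuple[int, int], ...]


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share at least one cell."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def cell_count(box: Box) -> int:
    """Number of cells covered by a box with inclusive bounds."""
    n = 1
    for lo, hi in box:
        n *= hi - lo + 1
    return n


@dataclass
class WorkloadAwareCache:
    """Hypothetical cache whose eviction considers the historical query workload."""
    capacity: int                      # budget, measured in cells
    decay: float = 0.5                 # weight multiplier per query of age (assumed)
    chunks: Dict[str, Box] = field(default_factory=dict)
    history: List[Box] = field(default_factory=list)

    def used(self) -> int:
        return sum(cell_count(b) for b in self.chunks.values())

    def score(self, box: Box) -> float:
        """Decayed count of past queries touching the box, per cached cell."""
        hits = sum(self.decay ** age
                   for age, q in enumerate(reversed(self.history))
                   if overlaps(box, q))
        return hits / cell_count(box)

    def record_query(self, query: Box) -> None:
        self.history.append(query)

    def admit(self, name: str, box: Box) -> List[str]:
        """Cache a chunk, evicting the lowest-scoring chunks until it fits."""
        evicted: List[str] = []
        if cell_count(box) > self.capacity:
            return evicted  # larger than the whole cache; leave the cache unchanged
        while self.used() + cell_count(box) > self.capacity:
            victim = min(self.chunks, key=lambda k: self.score(self.chunks[k]))
            del self.chunks[victim]
            evicted.append(victim)
        self.chunks[name] = box
        return evicted


cache = WorkloadAwareCache(capacity=200)
cache.admit("A", ((0, 9), (0, 9)))      # 100 cells
cache.admit("B", ((10, 19), (0, 9)))    # 100 cells
cache.record_query(((0, 4), (0, 4)))    # touches only chunk A
cache.record_query(((0, 4), (0, 4)))
evicted = cache.admit("C", ((20, 29), (0, 9)))  # cache is full, so one chunk must go
```

          In this sketch chunk B is evicted because no recorded query touched it, while chunk A stays cached. A real system would also have to account for transfer cost between nodes, which this single-node illustration ignores.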


                Author and article information

                Date: 16 March 2018
                arXiv ID: 1803.06089
                cde672b3-fb81-4d1a-9995-20001cf8251b

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                arXiv categories: cs.DB cs.DC
