      Is Open Access

      PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

      Preprint


          Abstract

          This paper describes PlinyCompute, a system for the development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. In the small, however, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system, both designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are built on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns, such as memory management (including layout and de/allocation) and virtual method/function dispatch, to the JVM. This hybrid approach (declarative in the large, while trusting the programmer to use the PC object model efficiently in the small) results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PC can result in a speedup of 2x to more than 50x compared to equivalent implementations on Spark.


                Author and article information

                Date: 15 November 2017
                arXiv ID: 1711.05573
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Length: 48 pages, including references and Appendix
                arXiv categories: cs.DB cs.DC
