Blog
About

  • Record: found
  • Abstract: found
  • Article: found
Is Open Access

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Preprint

Read this article at

Bookmark
      There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

      Abstract

      Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline Model and the Execution-Cache-Memory (ECM) model are proven approaches to performance modeling of loop nests. In comparison to the Roofline model, the ECM model can also describes the single-core performance and saturation behavior on a multicore chip. We give an introduction to the Roofline and ECM models, and to stencil performance modeling using layer conditions (LC). We then present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and LC analysis. The layer condition analysis allows to predict optimal spatial blocking factors for loop nests. Together with the models it enables an ab-initio estimate of the potential benefits of loop blocking optimizations and of useful block sizes. In cases where LC analysis is not easily possible, Kerncraft supports a cache simulator as a fallback option. Using a 25-point long-range stencil we demonstrate the usefulness and predictive power of the Kerncraft tool.

      Related collections

      Most cited references 11

      • Record: found
      • Abstract: not found
      • Article: not found

      Roofline

        Bookmark
        • Record: found
        • Abstract: not found
        • Article: not found

        POLLY — PERFORMING POLYHEDRAL OPTIMIZATIONS ON A LOW-LEVEL INTERMEDIATE REPRESENTATION

          Bookmark
          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model

           ,  ,   (2015)
          Stencil algorithms on regular lattices appear in many fields of computational science, and much effort has been put into optimized implementations. Such activities are usually not guided by performance models that provide estimates of expected speedup. Understanding the performance properties and bottlenecks by performance modeling enables a clear view on promising optimization opportunities. In this work we refine the recently developed Execution-Cache-Memory (ECM) model and use it to quantify the performance bottlenecks of stencil algorithms on a contemporary Intel processor. This includes applying the model to arrive at single-core performance and scalability predictions for typical corner case stencil loop kernels. Guided by the ECM model we accurately quantify the significance of "layer conditions," which are required to estimate the data traffic through the memory hierarchy, and study the impact of typical optimization approaches such as spatial blocking, strength reduction, and temporal blocking for their expected benefits. We also compare the ECM model to the widely known Roofline model.
            Bookmark

            Author and article information

            Journal
            2017-01-13
            1702.04653

            http://arxiv.org/licenses/nonexclusive-distrib/1.0/

            Custom metadata
            22 pages, 5 figures
            cs.PF

            Performance, Systems & Control

            Comments

            Comment on this article