
      Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

      Preprint


          Abstract

          An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architectures. Such predictions are an indispensable component of analytical performance models, such as the Roofline and the Execution-Cache-Memory (ECM) model, and allow a deep understanding of the performance-relevant interactions between hardware architecture and loop code. We present the Open Source Architecture Code Analyzer (OSACA), a static analysis tool for predicting the execution time of sequential loops comprising x86 instructions under the assumption of an infinite first-level cache and perfect out-of-order scheduling. We show the process of building a machine model from available documentation and semi-automatic benchmarking, and carry it out for the latest Intel Skylake and AMD Zen microarchitectures. To validate the constructed models, we apply them to several assembly kernels and compare runtime predictions with actual measurements. Finally, we give an outlook on how the method may be generalized to new architectures.
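          The basic idea of a static in-core throughput prediction can be illustrated with a minimal port-pressure sketch. Following the assumptions stated in the abstract (infinite first-level cache, perfect out-of-order scheduling), it ignores data dependencies and latencies and only accumulates the load each instruction places on the execution ports; the most heavily loaded port then bounds the cycles per loop iteration. The instruction classes, port assignments, and the even split of each instruction's cycles across its eligible ports are hypothetical simplifications for illustration, not OSACA's actual machine model or implementation.

```python
from collections import defaultdict

# HYPOTHETICAL, simplified port model loosely styled after Skylake port names.
# Each entry maps an instruction class to (eligible ports, cycles of port occupancy).
# Real machine models contain many instruction forms and are built from
# documentation plus benchmarks; these numbers are illustrative only.
PORT_MODEL = {
    "load":    (("2", "3"), 1.0),
    "fp_add":  (("0", "1"), 1.0),
    "store":   (("4",), 1.0),
    "int_add": (("0", "1", "5", "6"), 1.0),
    "branch":  (("0", "6"), 1.0),
}


def port_pressure(kernel):
    """Accumulate per-port pressure for one loop iteration.

    Assumption: each instruction's cycles are split evenly over all
    ports it may issue to (a simple heuristic, not an optimal schedule).
    """
    pressure = defaultdict(float)
    for instr in kernel:
        ports, cycles = PORT_MODEL[instr]
        share = cycles / len(ports)
        for port in ports:
            pressure[port] += share
    return dict(pressure)


def predict_cycles_per_iteration(kernel):
    """Return (predicted cycles per iteration, bottleneck port).

    With an infinite L1 cache and ideal out-of-order execution, latencies
    and dependencies are ignored, so the most loaded port is the bound.
    """
    pressure = port_pressure(kernel)
    bottleneck = max(pressure, key=pressure.get)
    return pressure[bottleneck], bottleneck


if __name__ == "__main__":
    # Hypothetical loop body, e.g. a[i] = b[i] + c[i] plus loop overhead.
    kernel = ["load", "load", "fp_add", "store", "int_add", "branch"]
    cycles, port = predict_cycles_per_iteration(kernel)
    print(f"{cycles:.2f} cycles/iteration, bottleneck port {port}")
```

          For the hypothetical loop body above, port 0 accumulates 1.25 cycles per iteration and becomes the predicted bottleneck. The even split is a pessimistic heuristic: assigning fp_add to port 1, int_add to port 5, and branch to port 6 would keep every port at or below 1.0 cycles here, so a scheduler model that balances ports more carefully can tighten the prediction.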


                Article information

                Date: 04 September 2018
                arXiv: 1809.00912
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                11 pages, 4 figures, 7 tables
                Categories: cs.PF cs.SE
                Subjects: Software engineering, Performance, Systems & Control
