26
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Computing the R of the QR factorization of tall and skinny matrices using MPI_Reduce

      Preprint

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          A QR factorization of a tall and skinny matrix with n columns can be represented as a reduction. The operation used along the reduction tree has in input two n-by-n upper triangular matrices and in output an n-by-n upper triangular matrix which is defined as the R factor of the two input matrices stacked the one on top of the other. This operation is binary, associative, and commutative. We can therefore leverage the MPI library capabilities by using user-defined MPI operations and MPI_Reduce to perform this reduction. The resulting code is compact and portable. In this context, the user relies on the MPI library to select a reduction tree appropriate for the underlying architecture.

          Related collections

          Most cited references8

          • Record: found
          • Abstract: not found
          • Article: not found

          A class of parallel tiled linear algebra algorithms for multicore architectures

            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Applying recursion to serial and parallel QR factorization leads to better performance

              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              Parallel Tiled QR Factorization for Multicore Architectures

              As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data. These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out of order execution of the tasks which will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can only be exploited at the level of the BLAS operations.
                Bookmark

                Author and article information

                Journal
                23 February 2010
                Article
                1002.4250
                c3355f39-05b2-4e80-a897-e1f4288abe12

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                math.NA

                Comments

                Comment on this article