This paper describes a parallel implementation of a C++ matrix/vector library for a large distributed-memory multicomputer. The library is “self-optimising”: it exploits lazy evaluation, delaying the execution of matrix operations for as long as possible. This exposes the context in which each intermediate result is used. The run-time system extracts a functional representation of the values being computed and optimises data distribution, grain size and scheduling prior to execution. This approach exploits results from the theory of program transformation for optimising parallel functional programs, while presenting an entirely conventional interface to the programmer. We present details of some of the simple optimisations we have implemented so far and illustrate their effect using a small example.
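The delayed-evaluation idea in the abstract can be illustrated with a minimal, sequential C++ sketch. It is not the paper's implementation: the names `Node`, `LazyVec`, `force()` and `g_evaluations` are hypothetical, and no distribution, grain-size or scheduling optimisation is attempted. The sketch only shows that overloaded operators can record an expression graph behind a conventional interface, so that the whole expression is visible to a run-time system before any arithmetic is performed.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expression-graph node; not the library's actual data structure.
struct Node {
    enum class Kind { Leaf, Add, Scale };
    Kind kind = Kind::Leaf;
    std::vector<double> data;        // leaf payload, or cached result once evaluated
    std::shared_ptr<Node> lhs, rhs;  // operands (rhs unused for Scale)
    double factor = 1.0;             // multiplier for Scale nodes
    bool evaluated = false;
};

// Counts the element-wise passes actually executed (for illustration only).
static int g_evaluations = 0;

class LazyVec {
public:
    explicit LazyVec(std::vector<double> v) : node_(std::make_shared<Node>()) {
        node_->data = std::move(v);
        node_->evaluated = true;     // leaves already hold their values
    }

    // Operators only record the operation; no arithmetic happens here.
    // Assumption: both operands have the same length.
    friend LazyVec operator+(const LazyVec& a, const LazyVec& b) {
        auto n = std::make_shared<Node>();
        n->kind = Node::Kind::Add;
        n->lhs = a.node_;
        n->rhs = b.node_;
        return LazyVec(std::move(n));
    }

    friend LazyVec operator*(double s, const LazyVec& a) {
        auto n = std::make_shared<Node>();
        n->kind = Node::Kind::Scale;
        n->lhs = a.node_;
        n->factor = s;
        return LazyVec(std::move(n));
    }

    // Evaluation is triggered only when the value is demanded. A real
    // run-time system could inspect and optimise the graph at this point.
    const std::vector<double>& force() const {
        eval(*node_);
        return node_->data;
    }

private:
    explicit LazyVec(std::shared_ptr<Node> n) : node_(std::move(n)) {}

    static void eval(Node& n) {
        if (n.evaluated) return;     // leaf, or result already cached
        eval(*n.lhs);
        const std::size_t len = n.lhs->data.size();
        n.data.resize(len);
        if (n.kind == Node::Kind::Add) {
            eval(*n.rhs);
            for (std::size_t i = 0; i < len; ++i)
                n.data[i] = n.lhs->data[i] + n.rhs->data[i];
        } else {                     // Kind::Scale
            for (std::size_t i = 0; i < len; ++i)
                n.data[i] = n.factor * n.lhs->data[i];
        }
        ++g_evaluations;
        n.evaluated = true;
    }

    std::shared_ptr<Node> node_;
};
```

In this sketch, writing `LazyVec c = 2.0 * (a + b);` performs no arithmetic; the two element-wise passes run only when `c.force()` is called, and the cached result is reused on later calls.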
Author and article information
Contributors
Paul H J Kelly
Conference
Publication date: July 1995
Publication date (Print): July 1995
Pages: 1-10
Affiliations
[0001] Department of Computing, Imperial College
180 Queen’s Gate, London SW7 2BZ, UK