An Axiomatic Semantics for Data-Parallel Computation

Proof rules for both directly and indirectly indexed data-parallel array assignment are presented. The correctness of two programs, (i) a representation of Cannon's algorithm and (ii) sparse matrix-vector multiplication, is then established by application of the rules.


Introduction
Data parallelism [2] [3] [7] [12] [14] [16] is a structure for specifying scientific computations (usually representations of numerical algorithms) which are to be executed on multi-processor machines. Data parallel computation is attractive because (i) many scientific applications can be conveniently described and efficiently implemented in the framework; and (ii) it is conceptually simpler than many alternative models (for example, CSP [11] and BSP [15]). Its significance is reflected by the adoption of the array assignment construct in FORTRAN 90 [6] and the FORALL statement [1] in HPF [9]. The goals of this paper are to (i) provide an axiomatic definition of data parallel assignment and (ii) illustrate how the resulting formal rules may be used in correctness proofs.
The notation ∀i ∈ S. a(i) := E(i) is used to represent (directly indexed) data parallel assignment; S denotes a subset of the indices of array variable a, i denotes an arbitrary index tuple in S, and E(i) denotes an expression involving program variables and the bound index variable i. In particular, S may denote either a regular or an irregular substructure of the domain of a, and E(i) may be a conditional expression and may involve index expressions.
Examples:

Informally, the meaning of ∀i ∈ S. a(i) := E(i) is to (i) evaluate (simultaneously) all right hand sides E(i), i ∈ S, and (ii) perform the assignments to the specified substructure of a. Data parallel assignment may involve indirect addressing of the left hand side variable [7,14]. The notation ∀i ∈ S. a(v(i)) := E(i) is used to represent indirectly indexed data parallel assignment, where v denotes another array variable. This form of the construct may be used, for example, to specify parallel operations over sparse data structures where the positions of non-null elements are recorded in an auxiliary array (akin to the array v above).

An axiomatic semantics for Data-Parallel Computation, 1st Irish Workshop on Formal Methods, 1997
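The two-step informal meaning of directly indexed assignment can be made concrete with a small sketch. The Python fragment below is illustrative only: arrays are modelled as dictionaries from indices to values, and the names `dp_assign` and `sequential_assign` are ours rather than part of the notation of this paper. It contrasts simultaneous evaluation with an ordinary sequential loop on the shift a(i) := a(i-1).

```python
from typing import Callable, Dict, Hashable, Iterable

Array = Dict[Hashable, float]


def dp_assign(a: Array, S: Iterable[Hashable],
              E: Callable[[Hashable, Array], float]) -> Array:
    """Return the array produced by `forall i in S. a(i) := E(i)`."""
    S = list(S)
    if any(i not in a for i in S):
        raise KeyError("S must be a subset of the domain of a")
    # Step (i): evaluate every right hand side against the OLD array.
    rhs = {i: E(i, a) for i in S}
    # Step (ii): perform all assignments to the selected substructure.
    new = dict(a)
    new.update(rhs)
    return new


def sequential_assign(a: Array, S: Iterable[Hashable],
                      E: Callable[[Hashable, Array], float]) -> Array:
    """For contrast: each assignment sees the results of earlier ones."""
    new = dict(a)
    for i in S:
        new[i] = E(i, new)
    return new


a = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
shift = lambda i, arr: arr[i - 1]          # E(i) = a(i - 1)

parallel = dp_assign(a, [2, 3, 4], shift)
sequential = sequential_assign(a, [2, 3, 4], shift)
print(parallel)
print(sequential)
```

Under the data parallel reading every element receives its left neighbour's old value, whereas the sequential loop propagates a(1) through the whole range.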
The goal of this article is to present axiomatic rules for reasoning about both direct and indirect data parallel assignment and to illustrate the use of the rules through proofs of Cannon's matrix multiplication algorithm and a vector-matrix multiplication on a sparse store.

Axiomatic Semantics (Direct)
An array a is a function which, for each possible argument, gives the value of the array at that point. The generalised assignment ∀i ∈ S. a(i) := E(i) assigns to a a new array that is like a except at the subscript values defined by S. We define the notation (a; i ∈ S: E(i)) to denote this new array, by

$$(a;\ i \in S:\ E(i))(j) \;=\; \begin{cases} E(j) & \text{if } j \in S \\ a(j) & \text{otherwise} \end{cases} \qquad \text{(Rule array)}$$
Therefore, data parallel assignment can be defined axiomatically in a way akin to sequential (array) assignment [4,5,8,10]:

$$\{\, Q^{(a;\ i \in S:\ E(i))}_{a} \,\}\ \ \forall i \in S.\ a(i) := E(i)\ \ \{\, Q \,\} \qquad \text{(Rule DPD)}$$

where $Q^{(a;\ i \in S:\ E(i))}_{a}$ denotes the substitution of the term E(i) for a(i), for all i in S, in Q. Q is an assertion about the state after the assignment; every occurrence of a(i), i ∈ S, in Q can be replaced by the semantically equivalent expression E(i), evaluated in the state before the assignment. Thus, the meaning of Q after the assignment is equivalent to the meaning of the assertion $Q^{(a;\ i \in S:\ E(i))}_{a}$ before the assignment.

For example, consider the derivation of a precondition for the statement ∀(i, j) ∈ S. r(i, j) := 0.0 and postcondition Q = ∀(i, j) ∈ S. r(i, j) = 0.0. Rule DPD defines the precondition to be:

$$\forall (i,j) \in S.\ (r;\ (i,j) \in S:\ 0.0)(i,j) = 0.0$$

which, since the modified array takes the value 0.0 at every index in S, reduces to ∀(i, j) ∈ S. 0.0 = 0.0, that is, true.

Rule DPD is sound and relatively complete [13]. Rule DPD assumes that (i) the expression E(i) is well formed and (ii) the index set S is a subset of the domain of the assigned variable a. We follow convention [8] and exclude these conditions from the rule.
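Rule DPD can also be read semantically: the precondition is simply Q evaluated on the modified array. The following Python sketch illustrates that reading on the r(i, j) := 0.0 example. It is an illustration under our own modelling assumptions (arrays as dictionaries, assertions as Boolean functions of the array); the helper names `modified` and `precondition` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Index = Tuple[int, int]
Array = Dict[Index, float]
Assertion = Callable[[Array], bool]


def modified(a: Array, S: Iterable[Index],
             E: Callable[[Index, Array], float]) -> Array:
    """The array (a; i in S: E(i)): E(j) where j is in S, a(j) elsewhere."""
    S = set(S)
    return {j: (E(j, a) if j in S else a[j]) for j in a}


def precondition(Q: Assertion, S: Iterable[Index],
                 E: Callable[[Index, Array], float]) -> Assertion:
    """Rule DPD read semantically: Q with a replaced by the modified array."""
    S = list(S)
    return lambda a: Q(modified(a, S, E))


n = 3
S = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
zero = lambda idx, arr: 0.0                           # E(i, j) = 0.0
Q = lambda r: all(r[idx] == 0.0 for idx in S)         # postcondition
P = precondition(Q, S, zero)                          # derived precondition

r_before = {idx: float(10 * idx[0] + idx[1]) for idx in S}
holds_before = P(r_before)            # precondition holds in any state
r_after = modified(r_before, S, zero)  # effect of the assignment
holds_after = Q(r_after)
print(holds_before, holds_after)
```

The precondition holds for an arbitrary initial r, in agreement with its reduction to true, and the postcondition holds after the assignment.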

Cannon's Algorithm: an illustration
The use of the axiomatic rule is illustrated by a proof of correctness of Cannon's algorithm for multiplying two n × n matrices b and c. The algorithm is particularly suited for implementation on array processors.
Arrays b and c are realigned so that every index position (i, j) is associated with a pair of components of the realigned matrices which form a part of the (i, j)th inner-product of the result matrix (see statements 2 and 3 below). The n² required inner-products are computed in parallel (through data-parallel assignments). Individual inner-products are assembled in n stages (statements 5 and 6). At each stage in the assembly of the inner-products the "matrices" b and c are realigned (statements 7 and 8).
Realignment is carried out using cyclic shifting. Let ⊕ denote cyclic (n) addition. Let b₀ and c₀ denote the values of the variables b and c in the initial state. A description of Cannon's algorithm is given below:

where the invariant I[k] is defined by:

It remains to establish that the assertions embedded in the program above are correct.
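The realignment and staged accumulation described in this section can be sketched in Python. This is a reconstruction under stated assumptions, not the listing of this paper: indices are 0-based, cyclic addition is written with `% n`, the initial realignments follow the statement c(i, j) := c(i ⊕ (j-1), j) and its symmetric counterpart for b, and the invariant asserted at each stage is our guess at I[k]. The helper names `dp_assign` and `invariant` are hypothetical.

```python
import random


def dp_assign(a, S, E):
    """forall idx in S. a(idx) := E(idx): evaluate all right hand sides, then store."""
    rhs = {idx: E(idx) for idx in S}   # every E(idx) reads the old state
    a.update(rhs)


def invariant(k, r, b, c, b0, c0, n):
    """Assumed I[k]: r holds k terms of each inner product; b, c shifted k times."""
    for i in range(n):
        for j in range(n):
            partial = sum(b0[i, (i + j + m) % n] * c0[(i + j + m) % n, j]
                          for m in range(k))
            aligned = (b[i, j] == b0[i, (i + j + k) % n]
                       and c[i, j] == c0[(i + j + k) % n, j])
            if r[i, j] != partial or not aligned:
                return False
    return True


def cannon(b0, c0, n):
    S = [(i, j) for i in range(n) for j in range(n)]
    b, c = dict(b0), dict(c0)
    r = dict.fromkeys(S, 0.0)
    # Initial realignment: row i of b shifts by i, column j of c shifts by j.
    dp_assign(b, S, lambda ij: b[ij[0], (ij[1] + ij[0]) % n])
    dp_assign(c, S, lambda ij: c[(ij[0] + ij[1]) % n, ij[1]])
    for k in range(n):
        assert invariant(k, r, b, c, b0, c0, n)
        # Accumulate one term of every inner product in parallel.
        dp_assign(r, S, lambda ij: r[ij] + b[ij] * c[ij])
        # Realign b and c by one cyclic step.
        dp_assign(b, S, lambda ij: b[ij[0], (ij[1] + 1) % n])
        dp_assign(c, S, lambda ij: c[(ij[0] + 1) % n, ij[1]])
    assert invariant(n, r, b, c, b0, c0, n)
    return r


random.seed(0)
n = 4
b0 = {(i, j): float(random.randint(-5, 5)) for i in range(n) for j in range(n)}
c0 = {(i, j): float(random.randint(-5, 5)) for i in range(n) for j in range(n)}
result = cannon(b0, c0, n)
expected = {(i, j): sum(b0[i, m] * c0[m, j] for m in range(n))
            for i in range(n) for j in range(n)}
print(result == expected)
```

Integer-valued entries are used so that the floating point sums compare exactly; each `dp_assign` relies on all right hand sides being evaluated before any element is overwritten.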

Initialisation
The precondition of the for loop, I[1], is shown to be consistent with the precondition and statements 1-4:

∀(i, j) ∈ S. c(i, j) := c(i ⊕ (j-1), j);

First consider statement 4. We have, by Rule DPD and expansion:

The right hand side expression simplifies to:

Assertion A can be dragged back through statements 2 and 3 (Rule DPD), giving:

which is valid (by expansion) as b = b₀, c = c₀ in the initial state.

Invariant
We establish the invariance of I. For loop body postcondition I[k+1], the following precondition can be calculated by applying proof Rule DPD:

The precondition expands to:

which simplifies to:

Finally, letting the expression above be B, it follows that B ⇐ I[k] (i.e. I is an invariant of the loop).

Termination
On termination, the sum in the invariant ranges over every index position by cyclic arithmetic, and hence the program post-condition follows.
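As a worked sketch of this step, suppose (as an assumption, since the displayed invariant does not survive in this text) that I[k] states that r(i, j) holds the first k-1 terms of the realigned inner product, that b₀ and c₀ name the initial values of b and c, and that ⊕ is 1-based cyclic addition, i ⊕ m = ((i + m - 1) mod n) + 1. On exit from the loop we would then have:

```latex
\begin{align*}
I[n+1] \;\Rightarrow\; r(i,j)
  &= \sum_{m=1}^{n} b_0(i,\, t_m)\; c_0(t_m,\, j),
     \qquad t_m = i \oplus (j-1) \oplus (m-1) \\
  &= \sum_{l=1}^{n} b_0(i,\, l)\; c_0(l,\, j)
\end{align*}
```

The second line holds because, for fixed i and j, t_m takes each value in {1,..., n} exactly once as m ranges over 1,..., n; the sum is therefore a reordering of the (i, j)th inner product.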

Indirect addressing
It is useful to specify a construct which defines the indices at which an array is to be updated indirectly by means of a subsidiary indexing function. Such a construct supports the expression of parallel operations over sparse data structures and allows data to be permuted (e.g. sorting). Informally, the meaning of ∀i ∈ S. a(v(i)) := E(i) is to: (i) evaluate (simultaneously) all right hand sides E(i), i ∈ S, and (ii) perform the assignments to the specified substructure {v(i) | i ∈ S} of a.
Difficulties occur if |S| ≠ |{v(i) | i ∈ S}| (i.e. more right hand sides are instantiated than there are elements to be updated). Such a situation can be resolved by either: (a) defining assignments in a non-deterministic way, for example, if v(1) = v(2) … The second approach is used in order to maintain consistency with FORTRAN 90. Thus,

(Rule DPI)
where the modified array (a; v; i ∈ S: E(i)) is defined as:

$$(a;\ v;\ i \in S:\ E(i))(k) \;=\; \begin{cases} E(i) & \text{if } k = v(i) \text{ for some } i \in S \\ a(k) & \text{otherwise} \end{cases}$$

In other words, if k occurs at position ip in vector v, then (a; v; i ∈ S: E(i))(k) = E(ip). As is to be expected, Rule DPI is more cumbersome than Rule DPD because of the definedness condition and the indirect way that the terms A(V) and E are linked [7,14]. The instantiation of the right hand side expressions is carried out in the conventional manner.
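A minimal Python sketch of indirectly indexed assignment follows. It assumes that the definedness condition amounts to v being injective on S, which is our reading of the text and of the FORTRAN 90 restriction on repeated vector subscripts on the left hand side; the name `dp_assign_indirect` is hypothetical. The example permutes an array into sorted order, with v(i) giving the destination of a(i).

```python
def dp_assign_indirect(a, v, S, E):
    """Return the array produced by `forall i in S. a(v(i)) := E(i)`.

    Assumption: the assignment is defined only when v is injective on S.
    """
    S = list(S)
    targets = [v[i] for i in S]
    if len(set(targets)) != len(targets):
        raise ValueError("v is not injective on S: assignment undefined")
    # Step (i): evaluate every right hand side against the old array.
    rhs = {v[i]: E(i, a) for i in S}
    # Step (ii): scatter the values to the positions {v(i) | i in S}.
    new = dict(a)
    new.update(rhs)
    return new


a = {1: 40.0, 2: 10.0, 3: 30.0, 4: 20.0}
rank = {1: 4, 2: 1, 3: 3, 4: 2}          # v(i): sorted position of a(i)
sorted_a = dp_assign_indirect(a, rank, [1, 2, 3, 4], lambda i, old: old[i])
print(sorted_a)

clash = {1: 2, 2: 2, 3: 3, 4: 4}         # v(1) = v(2): two values, one target
try:
    dp_assign_indirect(a, clash, [1, 2, 3, 4], lambda i, old: old[i])
    clash_detected = False
except ValueError:
    clash_detected = True
print(clash_detected)
```

Rejecting the clash, rather than picking one of the competing values, corresponds to the deterministic alternative to non-deterministic assignment.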

Sparse Matrix Manipulation: an illustration
The use of the indirect axiomatic rule is illustrated by a proof of correctness of a method for multiplying a sparse n × n matrix by a dense vector. Let A be a sparse n × n matrix with exactly w non-zero elements per row. A compact store is defined by:

∀(i, j) ∈ {1,..., n} × {1,..., w}. a(i, j) = A(i, indices(i, j))

where indices is an auxiliary matrix whose (i, j)th element records the position of the jth non-zero element in row i of A. For example, if

then the sparse store would be represented by:

injective(indices(k)).
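The compact store defined above can be illustrated with a short Python sketch. The example matrix is ours (the example of this paper is not preserved in this text), non-zero positions are recorded in ascending column order as one possible choice, and indices(k) is read as row k of the auxiliary matrix, which is an assumption. The names `compact_store` and `injective_row` are hypothetical.

```python
def compact_store(A, n, w):
    """Build a and indices with a(i, j) = A(i, indices(i, j)), 1-based."""
    a, indices = {}, {}
    for i in range(1, n + 1):
        cols = [j for j in range(1, n + 1) if A[i, j] != 0.0]
        if len(cols) != w:
            raise ValueError(f"row {i} does not have exactly {w} non-zeros")
        for k, col in enumerate(cols, start=1):
            indices[i, k] = col      # position of the k-th non-zero in row i
            a[i, k] = A[i, col]      # its value
    return a, indices


def injective_row(indices, k, w):
    """Assumed reading of injective(indices(k)): row k has no repeated position."""
    row = [indices[k, j] for j in range(1, w + 1)]
    return len(set(row)) == len(row)


n, w = 3, 2
dense = [[5.0, 0.0, 7.0],
         [0.0, 2.0, 3.0],
         [4.0, 6.0, 0.0]]
A = {(i, j): dense[i - 1][j - 1]
     for i in range(1, n + 1) for j in range(1, n + 1)}

a, indices = compact_store(A, n, w)
print(a)
print(indices)
print(all(injective_row(indices, k, w) for k in range(1, n + 1)))
```

Each row of indices lists distinct column positions, so the injectivity check succeeds for every k in this example.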
A conventional matrix transpose vector product V = Aᵀ.u can be recast in terms of the sparse store as follows:

Proof Outline:

Initialisation
The precondition of the for loop, I[1], is by definition:
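Since the program text for the sparse product is not preserved here, the following Python sketch shows one way the transpose product V = Aᵀ.u could be expressed with indirectly indexed assignment. It is an assumption-laden illustration, not the program of this paper: the loop runs over the rows k of the store, V starts at zero, each stage performs ∀j ∈ {1,..., w}. V(indices(k, j)) := V(indices(k, j)) + a(k, j) * u(k), and row k of indices is required to be injective. The store values are hard-coded from a small example of our own.

```python
def dp_assign_indirect(a, v, S, E):
    """forall i in S. a(v(i)) := E(i), defined only if v is injective on S."""
    S = list(S)
    targets = [v[i] for i in S]
    if len(set(targets)) != len(targets):
        raise ValueError("v is not injective on S: assignment undefined")
    rhs = {v[i]: E(i, a) for i in S}   # all right hand sides use the old array
    new = dict(a)
    new.update(rhs)
    return new


def sparse_transpose_product(a, indices, u, n, w):
    """Compute V = A^T u from the compact store (a, indices), 1-based."""
    V = {j: 0.0 for j in range(1, n + 1)}      # assumed initial state
    cols = range(1, w + 1)
    for k in range(1, n + 1):
        v = {j: indices[k, j] for j in cols}    # row k of the auxiliary matrix
        V = dp_assign_indirect(
            V, v, cols,
            lambda j, old: old[v[j]] + a[k, j] * u[k])
    return V


# Compact store for A = [[5, 0, 7], [0, 2, 3], [4, 6, 0]] with w = 2.
n, w = 3, 2
a = {(1, 1): 5.0, (1, 2): 7.0, (2, 1): 2.0, (2, 2): 3.0, (3, 1): 4.0, (3, 2): 6.0}
indices = {(1, 1): 1, (1, 2): 3, (2, 1): 2, (2, 2): 3, (3, 1): 1, (3, 2): 2}
u = {1: 1.0, 2: 2.0, 3: 3.0}

V = sparse_transpose_product(a, indices, u, n, w)
print(V)
```

Each stage scatters the contributions of one row of A into V, which is why the left hand side must be indirectly indexed and why distinct targets within a row matter.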

Summary
Two vital concerns for the designer of a programming language are: (i) the potential efficiency of compiler generated code; and (ii) the ease with which programs expressed in the language can be reasoned about.
The ease of proof of a program depends on the tractability of the underlying language constructs. The goal of this paper is to formulate the axiomatic laws of data parallelism and, by so doing, to expose the level of difficulty in reasoning about data parallel programs.
then 0.0 else 1.0
Example 1 specifies that all elements of the array a in the range {1..n} are simultaneously set to zero. Example 2 illustrates a multidimensional assignment, while example 3 is a value dependent assignment. Example 4 is an index dependent computation which has separate definitions for boundary and interior points of grid g (such computations typically arise in PDE algorithms).