A Case Study on Proving Transformations Correct: Data-Parallel Conversion

The issue of correctness in the context of a certain style of program transformation is investigated. This style is characterised by the fully automated application of large numbers of simple transformation rules to a representation of a functional program (serving as a specification) to produce an equivalent efficient imperative program. The simplicity of the transformation rules ensures that the proofs of their correctness are straightforward. A selection of transformations appropriate for use in a particular context is shown to preserve program meaning. The transformations convert array operations expressed as the application of a small number of general-purpose functions into applications of a large number of functions which are amenable to efficient implementation on an array processor.


Introduction
In [1,2], and elsewhere, a style of program derivation is advocated in which a complex change is wrought on a program through a number of simpler changes brought about by the automated application of a sequence of sets of simple transformations. For example, in [3] an imperative, explicitly data-parallel program is derived from a pure, functional program (which is used as an abstract specification) in six main steps.

SML → λ-calculus → Unfolded → Simplified → Array Form → Common Sub-expressions Eliminated → Fortran Plus Enhanced (DAP)

The functional program (expressed in SML [4]) is translated into the λ-calculus; definitions are unfolded; expressions are simplified using algebraic rules; array expressions are translated into explicitly data-parallel (but still pure) forms; expressions are optimized; state and imperative control constructs are introduced to produce an imperative program (expressed in a variant of Fortran for execution on the AMT DAP array processor [5]).

One of the advantages claimed for this style of derivation is that establishing that program correctness is preserved at all stages is simplified by (i) the representation of a complex change as a sequence of simpler, conceptually independent changes; (ii) the simplicity of the individual transformations; and (iii) regarding transformations as abstract rewrite rules rather than concentrating on their concrete effect on given programs.

The work reported in this paper was supported by SERC grant GR/G 57970.
This paper presents evidence to support this claim. It contains, for the stage in the derivation that produces the explicitly data-parallel form, a proof of correctness for each of the more complex transformations and a sample of proofs of correctness for the simple transformations. This stage is conceptually and technically significant. It is the first of the two pivotal stages in the derivation, the other being the translation from functional to imperative form. Note that the proofs are of themselves not of great interest: none of them contains novel proof techniques. Indeed, the interest lies in the fact that the proofs are mundane.
The structure of the paper is as follows: the basic notation used is explained; the functions used in the initial form and in the Array Form are defined; useful lemmas relating to the functions are stated (their proofs are relegated to the appendix); the transformations are defined; and their correctness is proved.

Basic Notation
The basic notation used in this paper is that of the λ-calculus with optional type information and a set of primitive functions. An array is considered to be a mapping from a (finite) set of indices onto some range of values of type α. A 1-dimensional array may be referred to as a vector and a 2-dimensional array as a matrix. For a set, the operator + indicates element insertion: $S + i \equiv S \cup \{i\}$.
In the following sections, four primitive array functions and the Array Form functions are defined. The primitive functions are taken to define the semantics of arrays. Most array operations commonly encountered in numerical mathematics can be compactly expressed using these primitives. However, it is not intended that a programmer be restricted to using only these primitives: other, perhaps more convenient, functions can be defined in terms of these primitive functions. Transformations can be employed to eliminate such derived functions using techniques such as unfolding and algebraic simplification.
The Array Form functions are much more restricted than the primitive functions, but are also much simpler to implement efficiently on an array processor such as the AMT DAP. In effect, the Array Form may be considered as a functional abstraction of an array processor.

Primitive Array Functions
The four primitive array functions are shape, element, generate and reduce.

$$\mathrm{shape}(A: \alpha\ \mathrm{array}) \to \mathrm{Shape}$$

Given an array, the function shape can be used to obtain its index set. The extent of an array in a particular dimension can be obtained by specifying the dimension:

$$\mathrm{shape}(A: \alpha\ \mathrm{array},\ n: \mathrm{int}) \to \mathrm{int}$$

$$\mathrm{element}(A: \alpha\ \mathrm{array},\ i: \mathrm{index}) \to \alpha$$

The element function returns the value of the element of A at position i. For convenience, the infix operator @ is defined to be equivalent to element:

$$A@i \equiv \mathrm{element}(A, i)$$

$$\mathrm{generate}(S: \mathrm{Shape},\ g: \mathrm{index} \to \alpha) \to \alpha\ \mathrm{array}$$

The basic function for constructing arrays is generate. The first argument, S, specifies the index set of the constructed array. The second argument, g, is a function, called the generating function, which determines the values of the elements: the value of element i is g(i).
The following are some examples of arrays constructed using generate:

the elementwise addition of two arrays, of arbitrary dimensionality, having the same shape: $\mathrm{generate}(\mathrm{shape}(A),\ \lambda i \cdot A@i + B@i)$

the transpose of a 2-dimensional array A of shape [m, n]: $\mathrm{generate}([n, m],\ \lambda [i, j] \cdot A@[j, i])$

An argument, such as i, of a generating function is called a generating index. For multi-dimensional arrays, the term may also be used of the components of an index argument; for example, in $\lambda [i, j] \cdot e$, the generating indices are i and j. It should be clear from context whether a whole index or a component is being considered.
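As an informal illustration of generate (not part of the derivation), the sketch below models an array in Python as a dictionary from index tuples to values. The helper name `box`, the 0-based indices and the dictionary representation are assumptions made only for this example.

```python
from itertools import product

def box(*extents):
    """Index set for a manifest shape such as [m, n] (0-based indices; an assumption)."""
    return frozenset(product(*(range(e) for e in extents)))

def generate(S, g):
    """Array with index set S whose element at index i is g(i)."""
    return {i: g(i) for i in S}

def shape(A):
    """Index set of an array."""
    return frozenset(A)

def element(A, i):
    """Value of A at index i (written A@i in the text)."""
    return A[i]

A = generate(box(2, 3), lambda ij: 10 * ij[0] + ij[1])
B = generate(box(2, 3), lambda ij: 1)

# Elementwise addition of two arrays of the same shape.
total = generate(shape(A), lambda i: element(A, i) + element(B, i))

# Transpose of a 2-dimensional array of shape [2, 3].
transposed = generate(box(3, 2), lambda ij: element(A, (ij[1], ij[0])))
```

Both constructions use only generate, shape and element, which mirrors the way the primitive functions are used in the text.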

$$\mathrm{reduce}(r: \alpha \times \alpha \to \alpha,\ r0: \alpha,\ S: \mathrm{Shape},\ g: \mathrm{index} \to \alpha) \to \alpha$$

Many array operations require the elements of an array to be accumulated, or reduced, into a single value by the repeated application of a binary reducing function. For example, the sum of the elements of a numeric array is a reduction using the addition function. Reductions are denoted using the reduce function.
The argument r is the reducing function.The argument r0 is the initial value which is used to instantiate the accumulation (it is usually an identity of the reducing function and so does not alter the value of the reduction; its inclusion helps simplify the semantics of reductions by ensuring that a reduction is well defined even if an array contains only one element, or even no elements).
The arguments S and g (g is a generating function) can be used to specify the elements of the array which are to be reduced. For example, $\mathrm{reduce}(+, 0, \mathrm{shape}(A), \lambda i \cdot A@i)$ produces the sum of the elements of array A. However, the generating function need not necessarily be an application of element: a reduction can involve any set of values which can be specified by applying a generating function over an index set. For example, the inner-product of two vectors U and V, of shape [n], can be expressed as

$$\mathrm{reduce}(+, 0, [n], \lambda i \cdot U@i * V@i)$$

The four functions shape, element, generate and reduce are the basic array functions; most common vector and matrix operations can be readily expressed using them. Some further examples are given below:

Row i of a matrix A: $\mathrm{generate}(\mathrm{shape}(A, 2),\ \lambda [j] \cdot A@[i, j])$

Product of two matrices A and B (which are assumed to be conformant; i.e. the number of columns of A equals the number of rows of B): $\mathrm{generate}([\mathrm{shape}(A, 1), \mathrm{shape}(B, 2)],\ \lambda [i, j] \cdot \mathrm{reduce}(+, 0, \mathrm{shape}(A, 2), \lambda [k] \cdot A@[i, k] * B@[k, j]))$

Boolean matrix, of shape ...

Definition 1: generate

$$\mathrm{shape}(\mathrm{generate}(S, \lambda i \cdot g)) \equiv S \qquad (\mathrm{G1})$$

$$\mathrm{element}(\mathrm{generate}(S, \lambda i \cdot g), i') \equiv \lambda i \cdot g\ (i') \quad \text{where } i' \in S \qquad (\mathrm{G2})$$

That is, the shape of an array constructed by an application of generate is the shape specified by the shape argument; and the value of each element of the constructed array is found by applying the generating function to the element's index.
When the shape of a generation is a manifest list, axiom G1 may be used in the form

$$\mathrm{shape}(\mathrm{generate}([m, n], \lambda i \cdot g), 1) \equiv m$$

In the proofs presented in this paper, the condition $i' \in S$ in axiom G2 (that an index be a member of an array's shape) is usually ignored. It could be verified separately from the main proofs, or it could be included in the form

$$\mathrm{element}(\mathrm{generate}(S, \lambda i \cdot g), i') \equiv \mathbf{if}\ (i' \in S)\ \mathbf{then}\ \lambda i \cdot g\ (i')\ \mathbf{else}\ \bot \qquad (\mathrm{G2}')$$

where $\bot$ is the undefined value, bottom. Context could then be used to establish that the condition $i' \in S$ is true, and so the conditional expression can be reduced to just the true limb. This technique is illustrated in one proof (of lemma 5).

Definition 2: reduce
The reduce function is defined recursively on the index set over which the reduction is to be performed:

$$\mathrm{reduce}(r, r0, \emptyset, \lambda i \cdot g) \equiv r0 \qquad (\mathrm{R1})$$

$$\mathrm{reduce}(r, r0, S + i', \lambda i \cdot g) \equiv r(\lambda i \cdot g\ (i'),\ \mathrm{reduce}(r, r0, S, \lambda i \cdot g)) \qquad (\mathrm{R2})$$

where $i' \notin S$ and $\emptyset$ denotes the empty set (of indices).

Note that no order is defined for performing reductions, so reducing functions must be associative and commutative.
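The recursive definition can be read almost directly as a program. The following is a minimal sketch, assuming index sets are modelled as Python sets of index tuples; the names `reduce_` and `box` are hypothetical and chosen for this example only.

```python
import operator
from itertools import product

def box(*extents):
    """Index set for a manifest shape such as [n] (0-based indices; an assumption)."""
    return frozenset(product(*(range(e) for e in extents)))

def reduce_(r, r0, S, g):
    """Reduction following R1 and R2; the order in which indices are taken is arbitrary."""
    rest = set(S)
    if not rest:                      # R1: reduction over the empty index set
        return r0
    i0 = rest.pop()                   # choose some i' such that S = rest + i'
    return r(g(i0), reduce_(r, r0, rest, g))   # R2

U = {(0,): 1, (1,): 2, (2,): 3}
V = {(0,): 4, (1,): 5, (2,): 6}

# Inner product of two vectors of shape [3].
inner = reduce_(operator.add, 0, box(3), lambda i: U[i] * V[i])

# A reduction over an empty index set yields the initial value.
empty = reduce_(operator.add, 0, box(0), lambda i: U[i])
```

Because `set.pop` removes an arbitrary member, the sketch relies on the reducing function being associative and commutative, as the definition requires.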

Array Form Functions
The main Array Form functions are now defined. These functions are intended to capture the sorts of operations that any array processor could be expected to implement efficiently (for example, simultaneously adding corresponding elements of two arrays). Some of the functions, and some of the transformations discussed later, are perhaps peculiar to array processors in which the processing elements are arranged in a two-dimensional array (the AMT DAP is one such processor). It should be emphasized that what is under consideration in this paper is the correctness of the transformations that create the Array Form, rather than how well suited the Array Form is for use on a particular processor.

Main Array Form Functions
See Figure 1 for examples. Probably the most important functions of the Array Form are the mapping functions, which apply a scalar function to each element of an array, or to corresponding elements of a pair of arrays. Mappings are supported for only certain scalar functions, for example the basic arithmetic and logical functions.
Definition 3: map

$$\mathrm{map}(A: \alpha\ \mathrm{array},\ B: \beta\ \mathrm{array},\ f: \alpha \times \beta \to \gamma) \to \gamma\ \mathrm{array} \ \stackrel{\mathrm{def}}{=}\ \mathrm{generate}(\mathrm{shape}(A),\ \lambda i \cdot f(A@i, B@i))$$

where A and B have the same shape □

The fold function performs restricted forms of reductions, in which the values reduced are elements of an array and where the reducing function is one of a limited set (typically evaluating a sum, product, logical and, logical or, minimum or maximum). A variant of fold is also defined which reduces a matrix along its rows, thereby forming a vector of (partial) cumulative values; this corresponds to operations on the DAP which compute multiple vector reductions simultaneously.
The join function is a data-parallel conditional.The result of applying join is an array with elements merged from two arrays according to whether the corresponding element of a mask array is true or false.
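Definition 6: join

$$\mathrm{join}(M: \mathrm{boolean\ array},\ T: \alpha\ \mathrm{array},\ F: \alpha\ \mathrm{array}) \to \alpha\ \mathrm{array} \ \stackrel{\mathrm{def}}{=}\ \mathrm{generate}(\mathrm{shape}(M),\ \lambda i \cdot \mathbf{if}\ M@i\ \mathbf{then}\ T@i\ \mathbf{else}\ F@i)$$

where M, T and F have the same shape □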
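To make the data-parallel conditional concrete, the sketch below reads Definition 6 literally in Python. Arrays are modelled as dictionaries from index tuples to values, which is an assumption of the example and not the representation used on the DAP.

```python
def generate(S, g):
    """Array with index set S whose element at index i is g(i)."""
    return {i: g(i) for i in S}

def join(M, T, F):
    """Definition 6: take the element of T where the mask is true, else the element of F."""
    return generate(frozenset(M), lambda i: T[i] if M[i] else F[i])

S = frozenset((k,) for k in range(4))
M = generate(S, lambda i: i[0] % 2 == 0)   # mask: true at even positions
T = generate(S, lambda i: i[0])
F = generate(S, lambda i: -i[0])

merged = join(M, T, F)
```

In this example the merged vector takes its even-indexed elements from T and its odd-indexed elements from F.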

Miscellaneous Functions
The Array Form defines many functions for performing miscellaneous array operations such as transposing a matrix, 'shifting' the elements of a matrix in a specified direction and constructing logical matrices having true values in certain patterns (such as along their main diagonals or in their upper triangles). Here, only a few examples of such functions are considered, since the correctness of transformations involving such functions is usually trivial to establish from the function definitions.

Preliminary Results
Some properties of generate and reduce are presented below; proofs of these properties are presented in the appendix. In addition, some elementary identities of the λ-calculus are noted (without proof).
In the following, if an identifier is introduced on the right of an identity, then it should be assumed that it is a 'new' identifier, i.e. one that does not occur free in the expression on the left.For example, in the identity it is to be assumed that x does not occur free in B, so that no problem arises with name clashes.
Lemma 1: Identity λ-bindings

A λ-binding in which the bound identifier and the bound value are the same is redundant.

$$\lambda x \cdot B\ (x) \equiv B$$ □

Lemma 2: Propagation of λ-binding out of abstraction

A λ-binding can be moved out of an immediately enclosing λ-abstraction if the bound value does not depend on the identifier of the abstraction.

$$\lambda i \cdot (\lambda x \cdot B\ (e)) \equiv \lambda x \cdot (\lambda i \cdot B)\ (e)$$

where e does not contain i (Note that x is still bound to e.) □

Lemma 3: Propagation of λ-binding through a function application

An applied λ-binding that is an argument in a function application can be moved outside the function application.

$$f(\lambda x \cdot B\ (e)) \equiv \lambda x \cdot f(B)\ (e)$$

where f does not contain x □

This identity can be generalised for moving a λ-binding out of an argument in a function application which has more than one argument, provided all of the other arguments are free of the bound identifier. Note that this lemma can be applied in both directions, to move a λ-binding into, as well as out of, an argument position.

Properties of Elementwise Applications
The following lemmas pertain to elementwise applications of functions. Such applications can be denoted using the map function but, for convenience in later proofs, the operator ε is used to promote a binary function to a binary function on arrays. For example, ε(+) is a function that performs elementwise addition of two arrays.

Definition 9: Elementwise Operator, ε

$$\varepsilon(f: \alpha \times \alpha \to \alpha) \to (\alpha\ \mathrm{array} \times \alpha\ \mathrm{array} \to \alpha\ \mathrm{array}) \ \stackrel{\mathrm{def}}{=}\ \lambda X: \alpha\ \mathrm{array},\ Y: \alpha\ \mathrm{array} \cdot \mathrm{generate}(\mathrm{shape}(Y),\ \lambda i \cdot f(X@i, Y@i))$$ □

Lemma 4: Shape of an elementwise application

The shape of an application of an elementwise function to two arrays is the same as the shape of the second argument array (which is required to be of the same shape as the first argument array).

$$\mathrm{shape}(\varepsilon(f)(A, B)) \equiv \mathrm{shape}(B)$$ □

Lemma 5: Element of an elementwise application

An element of an elementwise application of a function to two arrays is the value of that function applied to the corresponding elements of the arrays; that is, element propagates through ε:

$$\mathrm{element}(\varepsilon(f)(A, B), i) \equiv f(\mathrm{element}(A, i), \mathrm{element}(B, i))$$

Properties of Reductions
If the reducing function of a reduction is an elementwise function, then the result of the reduction is an array, so it is valid to apply the shape and element functions to the result. Such a reduction is called an ε-reduction:

Definition 10: ε-reduction

A reduction of the form $\mathrm{reduce}(\varepsilon(r), R0, S, \lambda i \cdot g)$ is called an ε-reduction. □

The following two lemmas pertain to ε-reductions.

Lemma 6: Shape of an ε-reduction

The result of an ε-reduction has the same shape as the reduction's initial value:

$$\mathrm{shape}(\mathrm{reduce}(\varepsilon(r), R0, S, \lambda i \cdot g)) \equiv \mathrm{shape}(R0)$$ □

Lemma 7: Element of an ε-reduction

An element of a reduction using ε(r) is a reduction using r: that is, element can be propagated through an ε-reduction.

$$\mathrm{element}(\mathrm{reduce}(\varepsilon(r), R0, S, \lambda i \cdot g), j) \equiv \mathrm{reduce}(r,\ \mathrm{element}(R0, j),\ S,\ \lambda i \cdot \mathrm{element}(g, j))$$

where S is independent of j □

Lemma 8: Reduction over a union

Since no order is specified for performing reductions (and since reducing functions are required to be associative and commutative), a reduction over an index set that is the union of two sets can be split into a pair of reductions.

$$\mathrm{reduce}(r, r0, S \cup T, \lambda i \cdot e) \equiv r(\mathrm{reduce}(r, r0, S, \lambda i \cdot e),\ \mathrm{reduce}(r, r0, T, \lambda i \cdot e))$$

where r0 is an identity element of r, and S and T are disjoint □

Lemma 9: Collapsing singleton dimensions

If one of the dimensions of a multi-dimensional reduction contains only a single member, that dimension can be collapsed, that is, removed from the reduction's index set. For this paper, collapsing is required for only leading dimensions (i is a leading dimension in $\{i\} \times S$).

$$\mathrm{reduce}(r, r0, \{i'\} \times S, \lambda ij \cdot e) \equiv \mathrm{reduce}(r, r0, S, \lambda j \cdot (\lambda i \cdot e\ (i')))$$

where r0 is an identity element of r and where i' and i have the same dimensionality □
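The lemmas on elementwise reductions and on reductions over a union can be spot-checked on small examples. The sketch below is only a numerical sanity check under an assumed dictionary model of arrays, not a substitute for the proofs; `reduce_` and `elementwise` are hypothetical names.

```python
import operator

def generate(S, g):
    """Array with index set S whose element at index i is g(i)."""
    return {i: g(i) for i in S}

def reduce_(r, r0, S, g):
    """Reduction over index set S; any order is allowed for associative, commutative r."""
    acc = r0
    for i in S:
        acc = r(g(i), acc)
    return acc

def elementwise(f):
    """Promote a binary scalar function to a function on two arrays of the same shape."""
    return lambda X, Y: generate(frozenset(Y), lambda i: f(X[i], Y[i]))

S = frozenset((k,) for k in range(4))
J = frozenset((k,) for k in range(3))
g = lambda i: generate(J, lambda j: (i[0] + 1) * (j[0] + 2))   # a vector for each i
R0 = generate(J, lambda j: 0)

# Element of an elementwise reduction versus a scalar reduction of elements.
j = (1,)
lhs7 = reduce_(elementwise(operator.add), R0, S, g)[j]
rhs7 = reduce_(operator.add, R0[j], S, lambda i: g(i)[j])

# Reduction over a union of two disjoint index sets.
T = frozenset((k,) for k in range(4, 6))
e = lambda i: i[0] ** 2
lhs8 = reduce_(operator.add, 0, S | T, e)
rhs8 = operator.add(reduce_(operator.add, 0, S, e), reduce_(operator.add, 0, T, e))
```

Addition with initial value 0 satisfies the side conditions of both lemmas, since 0 is an identity of + and S and T are disjoint.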

The Transformations
In this section, the main transformations (actually, identities) for converting from expressions that use the basic array functions (generate, reduce, etc.) into expressions that use Array Form functions are listed; their correctness is established in the following section. The basic strategy for converting into Array Form is to propagate applications of generate into generating functions; for example, a generation of a suitable scalar function becomes an application of map, and a generation of a conditional expression becomes an application of join. This strategy is discussed at length in [3]. In addition, several transformations optimize combinations of operations to make best use of the DAP hardware.

General Transformations
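Constructing an array by applying a scalar function to the elements of an array (or to corresponding elements of a pair of arrays) is equivalent to applying the elementwise version of the function (expressed using map).

Transformation 1: Propagation through scalar functions

$$\mathrm{generate}(S, \lambda i \cdot f(a, b)) \equiv \mathrm{map}(\mathrm{generate}(S, \lambda i \cdot a),\ \mathrm{generate}(S, \lambda i \cdot b),\ f)$$ □

(The function f is necessarily independent of i.)

Constructing an array by evaluating a conditional expression for each element is equivalent to forming a data-parallel conditional, in which arrays are constructed from the true and false limbs of the conditional and are merged according to a mask generated from the predicate.

Transformation 2: Propagation through conditional

$$\mathrm{generate}(S, \lambda i \cdot \mathbf{if}\ p\ \mathbf{then}\ t\ \mathbf{else}\ f) \equiv \mathrm{join}(\mathrm{generate}(S, \lambda i \cdot p),\ \mathrm{generate}(S, \lambda i \cdot t),\ \mathrm{generate}(S, \lambda i \cdot f))$$ □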
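The propagation of generate through a scalar function (as stated in Proof 1) and through a conditional (Transformation 2) can be illustrated by evaluating both sides on a small index set. This is a minimal sketch under assumptions: arrays are dictionaries from index tuples to values, and `map2` is a hypothetical name for the binary map function, chosen to avoid Python's built-in `map`.

```python
import operator
from itertools import product

def generate(S, g):
    """Array with index set S whose element at index i is g(i)."""
    return {i: g(i) for i in S}

def map2(A, B, f):
    """Assumed form of the binary map: apply f to corresponding elements of A and B."""
    return generate(frozenset(A), lambda i: f(A[i], B[i]))

def join(M, T, F):
    """Data-parallel conditional: merge T and F according to the mask M."""
    return generate(frozenset(M), lambda i: T[i] if M[i] else F[i])

S = frozenset(product(range(2), range(3)))
a = lambda i: i[0] + i[1]
b = lambda i: i[0] * i[1]
p = lambda i: i[0] <= i[1]

# Propagation through a scalar function (here, subtraction).
lhs1 = generate(S, lambda i: operator.sub(a(i), b(i)))
rhs1 = map2(generate(S, a), generate(S, b), operator.sub)

# Propagation through a conditional.
lhs2 = generate(S, lambda i: a(i) if p(i) else b(i))
rhs2 = join(generate(S, p), generate(S, a), generate(S, b))
```

Note that the right-hand side of the conditional case evaluates both limbs at every index, so the sketch assumes both limbs are defined throughout S.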
As mentioned previously, only a few examples of miscellaneous Array Form functions will be considered.

Propagation through λ-expressions
Consider a generation in which the body of the generating function is a λ-binding:

$$\mathrm{generate}(S, \lambda i \cdot (\lambda x \cdot B\ (e)))$$
The task of the Array Form transformations is to convert this generation into a form that can be efficiently implemented on an array processor: in the general case, all of the bindings are evaluated in parallel, then all of the body expressions are evaluated in parallel:

$\forall i \in S$: evaluate e; $\forall i \in S$: evaluate B

If parallelism of unlimited dimensionality were permitted, it would be a simple matter to create this parallel form. However, because the DAP is limited to 2-dimensional parallelism, it is incapable of efficiently implementing the general case in the above manner. For example, if S were 2-dimensional and e 1-dimensional, then the above scheme would require the creation of a 2-dimensional array of 1-dimensional arrays, a structure which the DAP can manipulate, but not in a completely parallel manner. Nevertheless, the above scheme can be used for certain cases where the effective or useful parallelism is at most 2-dimensional: for example, if S is 1- or 2-dimensional and e is a scalar value; or if S is 2-dimensional and e is a vector which is independent of one of the dimensions of S. The transformations below pertain to such cases; if none of these transformations applies to a given binding, then, as a last resort, the binding can be β-reduced in the hope that the resulting expression can be parallelised. In the following, it is assumed that all shapes in generations and reductions are at most 2-dimensional.
If the bound value is independent of the generating index, then the generation can be propagated into the binding (Transformation 5). In Transformation 6, e is a scalar expression; the binding for x is removed by β-reduction after application of that transformation.

Suppose that a 2-dimensional array of shape [l, m] is being generated over indices i and j, and that for each element a λ-binding is formed in which the bound value is a vector generation over index k. In the general case, $l \times m$ vectors must be created. However, if the bound vector is independent of j, then only l vectors need be created (one for each value of i). These l vectors can be created simultaneously as the rows of a matrix generation over indices i and k.
Transformation 7: Matrix generation, vector binding independent of one generating index

$$\mathrm{generate}([l, m],\ \lambda [i, j] \cdot (\lambda x \cdot B\ (\mathrm{generate} \ldots$$

where e is a function of i and k but not of j, and is not an application of element □

As it stands, the expression produced by this transformation does not appear to be an improvement over the initial expression, as the generation still contains a vector binding. However, it is usually the case that B requires only individual elements of x, and not the vector as a whole. Then the binding for x can be reduced and parallelisation of the resulting expression can proceed. A similar transformation can be applied when e is independent of i rather than j. (It is assumed that the case of e independent of k is optimized by converting the vector binding into a scalar binding.)

If the array being constructed is a matrix and the bound value is a vector which is dependent on both matrix indices, and if the generating function of the vector is a function application, then the function application can be moved outside the vector. This transformation is expressed below for arrays of arbitrary dimensionality, but it is applied in practice only to a matrix generation/vector binding combination.

□
As discussed above, it may be possible to remove the binding for x if only individual elements of x are required, and not the entire array. The repeated application of this transformation may reduce the binding into a form that can be parallelised by one of the preceding transformations. A similar transformation can be used if f is a binary function.

Transformations for Reductions
Consider a generation having a generating function that is a reduction:

$$\mathrm{generate}(S, \lambda i \cdot \mathrm{reduce}(r, r0, T, \lambda j \cdot g))$$

There are several ways that this expression could be converted into Array Form:

(i) Each of the reductions can be parallelised: that is, i is iterated over sequentially, and for each i, a parallel reduction is performed.

(ii) The generation can be parallelised by exchanging the generate and the reduce; then j is iterated over sequentially, while for each j, the generation is evaluated in parallel.

(iii) The generation and the reduction can be combined into a partial reduction (such as fold.rows), so that both are evaluated in parallel.
The third option, combination, is generally preferable when it is feasible, since it makes maximum use of parallelism. However, on a computer such as the DAP, which is limited to 2-dimensional parallelism, combination is possible only when both S and T are 1-dimensional.
Failing the third option, the second option is preferable, since generations generally make better use of parallelism than reductions. (For example, two arrays of arbitrary size can, theoretically, be added in a single step, whereas the reduction of an array of size [n] requires $\log_2(n)$ steps.) The following two transformations enforce these preferences.
where n and r are independent of i □

Transformation 10: generate-reduce swap

$$\mathrm{generate}(S, \lambda i \cdot \mathrm{reduce}(r, r0, T, \lambda j \cdot e)) \equiv \mathrm{reduce}(\varepsilon(r),\ \mathrm{generate}(S, \lambda i \cdot r0),\ T,\ \lambda j \cdot \mathrm{generate}(S, \lambda i \cdot e))$$

where r, r0 and T are independent of i and S is independent of j □

This transformation can be applied for arrays of arbitrary dimensionality; for the DAP, however, it is used only for matrix generation and vector reduction.
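The effect of the generate-reduce swap can be seen on a matrix-vector product, where the left-hand side reduces each row in turn and the right-hand side accumulates whole columns elementwise. The sketch below is a small numerical check under an assumed dictionary model of arrays; the data and the names `reduce_` and `elementwise` are hypothetical.

```python
import operator

def generate(S, g):
    """Array with index set S whose element at index i is g(i)."""
    return {i: g(i) for i in S}

def reduce_(r, r0, S, g):
    """Reduction over index set S; any order is allowed for associative, commutative r."""
    acc = r0
    for i in S:
        acc = r(g(i), acc)
    return acc

def elementwise(f):
    """Promote a binary scalar function to a function on two arrays of the same shape."""
    return lambda X, Y: generate(frozenset(Y), lambda i: f(X[i], Y[i]))

A = [[1, 2, 3], [4, 5, 6]]
x = [1, 0, 2]
S = frozenset((i,) for i in range(2))   # row indices
T = frozenset((j,) for j in range(3))   # column indices
e = lambda i, j: A[i[0]][j[0]] * x[j[0]]

# Left-hand side: one scalar reduction per generated element.
lhs = generate(S, lambda i: reduce_(operator.add, 0, T, lambda j: e(i, j)))

# Right-hand side: one elementwise reduction whose components are generations.
rhs = reduce_(elementwise(operator.add),
              generate(S, lambda i: 0),
              T,
              lambda j: generate(S, lambda i: e(i, j)))
```

On the right-hand side the loop over T is sequential while each step operates on a whole array over S, which corresponds to the second of the options discussed above.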
If each component of a reduction is itself a reduction (which uses the same reducing function), then coalescing the reductions into a single reduction increases parallelism.
Transformation 11: reduce-reduce combination

$$\mathrm{reduce}(r, r0, S, \lambda i \cdot \mathrm{reduce}(r, r0, T, \lambda j \cdot e)) \equiv \mathrm{reduce}(r, r0, S \times T, \lambda ij \cdot e)$$

where r0 is an identity element of r, and r and T are independent of i □
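As a small illustration of coalescing nested reductions, the sketch below compares a reduction of reductions with a single reduction over the product index set. It assumes indices are Python tuples, so that a member of S × T is formed by concatenating an index from S with one from T; the name `reduce_` is hypothetical.

```python
import operator
from itertools import product

def reduce_(r, r0, S, g):
    """Reduction over index set S; any order is allowed for associative, commutative r."""
    acc = r0
    for i in S:
        acc = r(g(i), acc)
    return acc

S = frozenset((i,) for i in range(3))
T = frozenset((j,) for j in range(4))
ST = frozenset(i + j for i, j in product(S, T))   # S x T as 2-component indices

e = lambda ij: (ij[0] + 1) * (ij[1] + 1)

# Nested form: a reduction over S whose components are reductions over T.
nested = reduce_(operator.add, 0, S,
                 lambda i: reduce_(operator.add, 0, T, lambda j: e(i + j)))

# Combined form: a single reduction over the product index set.
combined = reduce_(operator.add, 0, ST, e)
```

The side condition that r0 be an identity of r matters here: the nested form introduces r0 once per member of S as well as once overall, which is harmless only for an identity element.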

Proofs of Correctness
The correctness of the transformations discussed in the preceding section is established below. The proofs are presented in the same order as the transformations.

□
A similar proof applies for unary functions. Note that the proof is simplified by choosing the appropriate generating index when the generation is introduced. Any other generating index, say j, could be used and would lead to an expression such as

$$\mathrm{generate}(S, \lambda j \cdot f(\lambda i \cdot a\ (j),\ \lambda i \cdot b\ (j)))$$

which, since f is independent of j, is equivalent to

$$\mathrm{generate}(S, \lambda j \cdot (\lambda i \cdot f(a, b)\ (j)))$$

which is one way of expressing the process of α-converting a λ-abstraction from using identifier i to using identifier j; that is, this expression is equivalent under α-conversion to $\mathrm{generate}(S, \lambda i \cdot f(a, b))$.

Proof 4: row

$$\mathrm{generate}([m], \lambda [j] \cdot A@[r, j]) \equiv \mathrm{row}(A, r)$$

where A and r are independent of j and A has shape [n, m]

Proof follows directly from the definition of row:

$\mathrm{row}(A, r)$
= definition 8
$\mathrm{generate}(\mathrm{shape}(A, 2), \lambda [j] \cdot A@[r, j])$
= substituting for the shape of A
$\mathrm{generate}([m], \lambda [j] \cdot A@[r, j])$ □

Proofs of Transformations for λ-bindings
The following proofs pertain to transformations for generating functions whose bodies are λ-bindings.

Proof 5: Invariant binding

$$\mathrm{generate}(S, \lambda i \cdot (\lambda x \cdot B\ (e))) \equiv \lambda x \cdot \mathrm{generate}(S, \lambda i \cdot B)\ (e)$$

where e is independent of i and S is independent of x

Proof involves only elementary properties of the λ-calculus:

$\mathrm{generate}(S, \lambda i \cdot (\lambda x \cdot B\ (e)))$
= move binding of x out of abstraction of i (lemma 2)
$\mathrm{generate}(S, \lambda x \cdot (\lambda i \cdot B)\ (e))$
= move binding out of application of generate (lemma 3)
$\lambda x \cdot \mathrm{generate}(S, \lambda i \cdot B)\ (e)$ □

Proof 10: generate-reduce swap

$$\mathrm{generate}(S, \lambda i \cdot \mathrm{reduce}(r, r0, T, \lambda j \cdot e)) \equiv \mathrm{reduce}(\varepsilon(r),\ \mathrm{generate}(S, \lambda i \cdot r0),\ T,\ \lambda j \cdot \mathrm{generate}(S, \lambda i \cdot e))$$

where r, r0 and T are independent of i and S is independent of j

Since both sides of this identity evaluate to arrays, proof of this identity requires proof that the two arrays have the same shape and the same elements.

Same Elements
Consider an arbitrary element i'.

□
Proof 11: reduce-reduce combination

$$\mathrm{reduce}(r, r0, S, \lambda i \cdot \mathrm{reduce}(r, r0, T, \lambda j \cdot e)) \equiv \mathrm{reduce}(r, r0, S \times T, \lambda ij \cdot e)$$

where r0 is an identity element of r and r, r0 and T are independent of i

Proof is by induction over S.

Conclusions
The transformations required for converting basic array expressions into whole-array form have been shown to preserve the meaning of expressions.The majority of the proofs of correctness are simple and many are trivial; even the more complex proofs are straightforward to carry out, and use only well-known techniques (primarily induction over sets).
The simplicity of the proofs is due in large measure to the decomposition of a derivation into independent stages and to the decomposition of each stage into a sequence of simple transformations: the effect of each transformational step is small and is consequently readily amenable to formal analysis. In addition, the postponement of consideration of imperative details until very late in a derivation allows most of the transformational steps to be made within a purely functional framework; indeed, in this paper, it was not necessary to consider imperative details at all, even though the motive for applying the transformations is to tailor a program to the peculiarities of a particular type of imperative system.
Transformation sequences provide a means of characterising the special features of particular parallel architectures and/or problem domains (e.g. array processors or sparse matrix problems). The transformations relating to array processors presented in this paper may be reused in the derivation of efficient implementations of algorithms for the solution of a range of problems. Further, they are applied automatically by a tool. For these reasons, it is particularly important that the transformation sequence be proved to be meaning preserving.

A Proofs of Preliminary Lemmas
Some basic properties of generate and reduce are established below.

A.1 Properties of Elementwise Applications
Proof of Lemma 4: Shape of an elementwise application

$$\mathrm{shape}(\varepsilon(f)(A, B)) \equiv \mathrm{shape}(B)$$

Proof.

Proof.
Proof is by induction on S.

Figure 1: Examples of Array Form functions

Transformation 4: row

$$\mathrm{generate}([m], \lambda [j] \cdot A@[r, j]) \equiv \mathrm{row}(A, r)$$

where A has shape [n, m] and A and r are independent of j □

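Transformation 5: Invariant binding

$$\mathrm{generate}(S, \lambda i \cdot (\lambda x \cdot B\ (e))) \equiv \lambda x \cdot \mathrm{generate}(S, \lambda i \cdot B)\ (e)$$

where e is independent of i and S is independent of x □

If the bound value is a scalar, then the generation can be propagated into the binding by creating an array of bound values.

Transformation 6: Scalar binding

$$\mathrm{generate}(S, \lambda i \cdot (\lambda x \cdot B\ (e))) \equiv \lambda X \cdot \mathrm{generate}(S, \lambda i \cdot (\lambda x \cdot B\ (X@[i])))\ (\mathrm{generate}(S, \lambda i \cdot e))$$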
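The scalar-binding transformation replaces a per-element binding by a lookup in a precomputed array of bound values. The sketch below checks this on a small example; the particular bound expression `e` and body `B` are hypothetical, and arrays are modelled as dictionaries from index tuples to values.

```python
def generate(S, g):
    """Array with index set S whose element at index i is g(i)."""
    return {i: g(i) for i in S}

S = frozenset((k,) for k in range(5))
e = lambda i: i[0] * i[0] + 1     # scalar bound value, depends on the generating index
B = lambda x, i: x * x - i[0]     # body of the binding, uses x and the index

# Original form: the binding is evaluated inside the generating function.
lhs = generate(S, lambda i: (lambda x: B(x, i))(e(i)))

# Transformed form: all bound values are generated first as an array X,
# and each element then binds x to the corresponding element of X.
X = generate(S, e)
rhs = generate(S, lambda i: (lambda x: B(x, i))(X[i]))
```

The array X plays the role of the outer binding in the transformed expression: all bound values are produced in one generation before any body is evaluated.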

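Proof 1: Propagation through scalar functions

Consider the case of a binary function.

$$\mathrm{generate}(S, \lambda i \cdot f(a, b)) \equiv \mathrm{map}(\mathrm{generate}(S, \lambda i \cdot a),\ \mathrm{generate}(S, \lambda i \cdot b),\ f)$$

Proof follows directly from the definition of map:

$\mathrm{map}(\mathrm{generate}(S, \lambda i \cdot a), \mathrm{generate}(S, \lambda i \cdot b), f)$
= definition 3
$\mathrm{generate}(\mathrm{shape}(\mathrm{generate}(S, \lambda i \cdot a)),\ \lambda i \cdot \ldots$

$\ldots,\ r0,\ \emptyset,\ \lambda ij \cdot e)$
= by R1
$r0$

Inductive Step: S + i'

Assume identity holds for shape S. Now consider shape S + i', where $i' \notin S$.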