Compact Fusion

There are many advantages to writing functional programs in a compositional style, such as clarity and modularity. However, the intermediate data structures produced may mean that the resulting program is inefficient in terms of space. These may be removed using deforestation techniques, but whether the space performance is actually improved depends upon the structures being consumed in the same order that they are produced. In this paper we explore this problem for the case when the intermediate structure is a list, and present a solution. We then formalise the space behaviour of our solution by means of program transformation techniques and the use of abstract machines.


INTRODUCTION
Hylomorphisms [1] represent a common programming pattern of using an intermediate data structure that is first built and then collapsed to give a result. More formally, a hylomorphism is the composition of an unfold and a fold: the unfold uses a seed value to generate a data structure, and the fold takes this structure and collapses it in some way. The space efficiency of this composition may be improved by applying fusion techniques to eliminate the intermediate data structure. However, whether the space performance is actually improved depends on the fold being able to consume elements as they are generated. If this is not the case, then the whole structure is created before any folding can take place, and the intermediate structure still effectively exists in the fused function.
Here we illustrate this problem with some examples and show how using an accumulating fold, fold-left, improves the space performance. We then show how to formalise these space results, by using abstract machines to expose the underlying data structures, which can then be measured. The contributions are (i) a new hylomorphism theorem that captures the idea of consuming elements as they are generated, and (ii) a process for producing space results. To achieve the second contribution, we derive an abstract machine using program transformation techniques. Once we have such a machine we can produce a high-level function that measures space usage. All our examples are given in Haskell [2].

HYLOMORPHISMS
We will consider hylomorphisms where the intermediate data structure is a list; that is, the unfold function generates a list from a seed value, and the fold then consumes this list.

Unfold
The unfold function builds a list from an initial seed value. It takes three additional arguments: a predicate, p, to determine when to stop generating list elements, and two other functions, hd and tl, which respectively make the head of the list and modify the seed value that is passed to the recursive call to generate the rest of the list. The resulting list is therefore of the form:
unfold p hd tl x = [hd x, hd (tl x), hd (tl (tl x)), ...]
For example, we can define a function downFrom using unfold, which takes a natural number n and produces the list of all the numbers from n down to 1, using the identity function id to make each element and the predecessor function pred to update the seed. Applying downFrom to the number 3 produces evaluation trace A in figure 1.
We can use the shape of the trace to informally measure the space requirements of evaluating the expression. The expression size can be estimated by counting constructor symbols, and the space requirement for evaluating an expression is given by the maximum expression size generated during evaluation, since space may be re-used at each step of evaluation. As we can see in the trace, the expression size reaches its maximum when the list has been completely generated, producing a list of length equal to the argument to downFrom. Evaluating downFrom therefore requires additional space proportional to its argument, and so has linear space requirements.
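The description of unfold above can be made concrete with a small sketch. This is a reconstruction from the prose rather than the paper's verbatim code: the type signature, the use of if-then-else, and the stopping predicate (== 0) for downFrom are assumptions.

```haskell
-- Sketch of the list unfold described in the text (reconstructed).
-- p decides when to stop, hd makes an element, tl updates the seed.
unfold :: (s -> Bool) -> (s -> e) -> (s -> s) -> s -> [e]
unfold p hd tl x =
  if p x then [] else hd x : unfold p hd tl (tl x)

-- downFrom n lists n, n-1, ..., 1.
-- Assumption: generation stops when the seed reaches 0.
downFrom :: Int -> [Int]
downFrom = unfold (== 0) id pred
```

With these definitions, downFrom 3 unwinds to 3 : downFrom 2, then 3 : 2 : downFrom 1, and so on until the seed is 0, which is the growth in expression size that the text measures.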

Fold-right
The standard fold operator for lists [3] takes two arguments, a binary operator (⊕) and a value v, and replaces every list constructor (:) with (⊕) and the empty list [ ] with v. For example, a list [a, b, c] would be folded as a ⊕ (b ⊕ (c ⊕ v)). Calculating the product of a list of numbers can be expressed by folding the multiplication operator over the list, substituting the unit of multiplication, 1, in the empty list case. Applying product to the list [3,2,1] gives evaluation trace B shown in figure 1, and takes space proportional to the length of the list. This fold is called fold-right because, as shown in the trace, after replacing each (:) with (*), the applications bracket to the right.
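As an illustration, fold-right and product can be written as follows. This is the standard textbook definition of foldr specialised to lists, with the Prelude versions hidden so that the names match the text; the concrete Integer type for product is an assumption.

```haskell
import Prelude hiding (foldr, product)

-- Fold-right: replace each (:) by the operator and [] by v.
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _  v []       = v
foldr op v (x : xs) = x `op` foldr op v xs

-- Product of a list: fold multiplication, with 1 for the empty list.
product :: [Integer] -> Integer
product = foldr (*) 1
```

Because the operator is applied only after the recursive call returns, product [3,2,1] unwinds to 3 * (2 * (1 * 1)) before any multiplication is performed.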

Hylomorphisms
A hylomorphism is the composition of an unfold with a fold. We use the name hylor for this function, rather than the standard hylo, to emphasise that it is specified in terms of fold-right. Within the definition of hylor, a list is generated by the unfold function and passed to the fold, which consumes it. However, the well-known hylo theorem [1] states that the two functions may be fused together to eliminate this intermediate data structure.
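The hylomorphism theorem for lists expresses hylor as a single recursive function that no longer builds the list: when the predicate p holds of the seed x the result is v, and otherwise the element hd x is combined, using (⊕), with the result of the recursive call on the new seed tl x. Now we will look at an example hylomorphism and see how the space performance is affected by applying this theorem.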
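The specification of hylor and its fused form can be sketched side by side. The argument order (p, hd, tl, operator, v, seed) follows the order in which spaceMach takes its arguments later in the paper; the names hylorSpec and the type signatures are assumptions introduced for illustration.

```haskell
import Prelude hiding (foldr)

unfold :: (s -> Bool) -> (s -> e) -> (s -> s) -> s -> [e]
unfold p hd tl x =
  if p x then [] else hd x : unfold p hd tl (tl x)

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _  v []       = v
foldr op v (x : xs) = x `op` foldr op v xs

-- Specification: build the whole list, then fold it from the right.
hylorSpec :: (s -> Bool) -> (s -> e) -> (s -> s) -> (e -> r -> r) -> r -> s -> r
hylorSpec p hd tl op v = foldr op v . unfold p hd tl

-- Fused form: the same result without constructing a list.
hylor :: (s -> Bool) -> (s -> e) -> (s -> s) -> (e -> r -> r) -> r -> s -> r
hylor p hd tl op v x =
  if p x then v else hd x `op` hylor p hd tl op v (tl x)
```

Instantiating the operator with (:) and v with [] turns hylor back into unfold, which is one way to see that the list structure is still implicit in the recursion.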

Example: factorial
The factorial of a natural number, n, can be calculated by taking the product of the list from n down to 1. We can therefore express the factorial function as the composition of the two functions product and downFrom, that is, fact = product • downFrom. This composition is a hylomorphism, since the downFrom function is an unfold and product is a fold, and so we can apply the hylor theorem, inlining the pred function, to give the fused program fact x = if x ≡ 0 then 1 else x * fact (x − 1). The purpose of fusing the program is to eliminate the creation of the intermediate list. In this case the input and output are both integers, but a list is built in the process, so potentially we could perform the multiplication after each element of the list is generated and achieve evaluation in a constant amount of space. However, the unwinding of the fused definition of factorial, given in trace A of figure 2, shows that this is not the case. The trace shows that all of the list elements do have to be generated before any multiplication can occur. Although there is no explicit list, the structure is still there, with the list constructor replaced by the multiplication operator. Multiplication can only occur once the unfold has finished producing list elements, and the structure is then collapsed from the right.
The maximum expression size produced in the factorial example occurs when the list has been completely generated. Therefore, the amount of space required to evaluate the factorial of a number is directly proportional to that number: it is linear, not the constant desired.
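The factorial example can be sketched in both its compositional and fused forms. The name factSpec, the Integer type, and the (== 0) stopping predicate are assumptions; the unwinding in the comment is worked by hand from the fused definition.

```haskell
unfold :: (s -> Bool) -> (s -> e) -> (s -> s) -> s -> [e]
unfold p hd tl x =
  if p x then [] else hd x : unfold p hd tl (tl x)

downFrom :: Integer -> [Integer]
downFrom = unfold (== 0) id pred

-- Compositional version: build the list n, ..., 1 and multiply it out.
factSpec :: Integer -> Integer
factSpec = product . downFrom

-- Fused version obtained from the right hylomorphism theorem.
-- Unwinding by hand:
--   fact 3 = 3 * fact 2
--          = 3 * (2 * fact 1)
--          = 3 * (2 * (1 * fact 0))
--          = 3 * (2 * (1 * 1))
-- No multiplication can fire until the recursion bottoms out,
-- so the pending expression grows linearly with the argument.
fact :: Integer -> Integer
fact x = if x == 0 then 1 else x * fact (x - 1)
```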

Impedance mismatch
The problem is the impedance mismatch between unfold and fold-right: the former generates the list elements in left-to-right order, but the latter consumes them in right-to-left order. The hylor theorem eliminates the overhead of constructing and destructing the intermediate list, but retains the impedance mismatch and hence gives poor space performance.

Fold-left
An alternative way to fold a list is to bracket the operator from the left, so that with initial value v a list [a, b, c] is folded as ((v ⊕ a) ⊕ b) ⊕ c. This version, called fold-left, uses an accumulator that is returned in the empty list case; in the non-empty case the accumulator is combined with the head of the list using the operator, and the updated accumulator is then passed on to fold the tail of the list.
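The accumulator behaviour of fold-left described above corresponds to the standard definition, sketched here for lists with the Prelude version hidden so that the name matches the text.

```haskell
import Prelude hiding (foldl)

-- Fold-left: thread an accumulator through the list from the left.
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _  a []       = a
foldl op a (x : xs) = foldl op (a `op` x) xs
```

For instance, foldl (-) 10 [3, 2] brackets as (10 - 3) - 2, whereas the corresponding fold-right would bracket as 3 - (2 - 10).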

Duality
A well-known duality property [4] is that when the operator (⊕) is associative and has the element e as its unit, foldr (⊕) e and foldl (⊕) e always give the same result. In fact, the converse also holds, so the two statements are equivalent. In the case of the product function, (*) is associative and has 1 as its unit, so product can be re-expressed using fold-left as foldl (*) 1. Under Haskell's lazy evaluation strategy, the outermost redex is chosen to be evaluated first, so the recursive call is evaluated before the accumulator expression. This is illustrated in evaluation trace A of figure 3. To force evaluation of the multiplication first we can introduce a strictness annotation, $!. In the expression f $! x, the strictness annotation ensures that x is evaluated first, though only far enough to check that it is not undefined (weak head normal form), before f x is evaluated [4]. Fold-left can be modified so that its recursive call is applied to the updated accumulator using $!. Re-expressing product using this foldl means it is evaluated as in trace B in figure 3, with the evaluation of the multiplication now occurring before the recursive call.
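A strict fold-left of the kind described above might look as follows. The exact placement of $! is an assumption (this is the common textbook formulation); the Integer type for product is also assumed.

```haskell
import Prelude hiding (foldl, product)

-- Fold-left with a strict accumulator: $! forces the new accumulator
-- (to weak head normal form) before the recursive call is made.
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _  a []       = a
foldl op a (x : xs) = (foldl op $! (a `op` x)) xs

-- Product re-expressed with fold-left; valid because (*) is
-- associative with unit 1.
product :: [Integer] -> Integer
product = foldl (*) 1
```

With this definition, product [3,2,1] steps through accumulators 1, 3, 6, 6 without building a chain of suspended multiplications.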

Left hylomorphism
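The corresponding hylomorphism, hylol, composes an unfold with a fold-left, and its theorem fuses the two into a single tail-recursive function: when the predicate p holds of the seed x the accumulator is returned, and otherwise the accumulator is combined with hd x using (⊕) and passed, together with the new seed tl x, to the recursive call. Although straightforward, to the best of our knowledge, this operator has not been considered before.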
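A sketch of the left hylomorphism and its fused form is given here. The argument order, the name hylolSpec, and the use of $! in the fused version are assumptions; foldl' from Data.List stands in for the strict fold-left.

```haskell
import Data.List (foldl')

unfold :: (s -> Bool) -> (s -> e) -> (s -> s) -> s -> [e]
unfold p hd tl x =
  if p x then [] else hd x : unfold p hd tl (tl x)

-- Specification: build the list, then fold it from the left.
hylolSpec :: (s -> Bool) -> (s -> e) -> (s -> s) -> (b -> e -> b) -> b -> s -> b
hylolSpec p hd tl op a = foldl' op a . unfold p hd tl

-- Fused form: each element is combined into the accumulator as soon
-- as it is generated, and the recursive call is a tail call.
hylol :: (s -> Bool) -> (s -> e) -> (s -> s) -> (b -> e -> b) -> b -> s -> b
hylol p hd tl op a x =
  if p x then a else (hylol p hd tl op $! (a `op` hd x)) (tl x)
```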

Proof of left hylomorphism theorem
Structural induction cannot be used to prove that this definition satisfies the specification above, because there is nothing to do induction over: we do not know the structure of the seed value given to the unfold. There is also no structured result to do co-induction over. However, because both foldl and unfold are defined as fixpoints, and therefore hylol is specified as a composition of two fixpoints, we can apply the "total fusion" theorem [5]. This theorem gives conditions under which the composition of two fixpoints is equal to a third fixpoint. We can prove the total fusion theorem using fixpoint induction [6].
The assumptions here are that types are complete partial orders (CPOs), which are sets with a partial ordering ⊑, a least element ⊥, and limits of all non-empty chains, and that programs are continuous functions, that is, functions between CPOs that preserve the partial order and limit structure.
Showing that the first conjunct is satisfied is trivial (⊥ • ⊥ ≡ ⊥), so we proceed straight to verifying the second conjunct, which follows by calculation. This completes the proof, apart from showing that the predicate P is admissible (preserves limits of chains), which is immediate from the fact that any equality between continuous functions is admissible, and that the composition of any two continuous functions is continuous.
To apply total fusion we first need to re-express unfold, foldl and hylol in terms of least fixpoints. The list and accumulator arguments have been swapped over in the foldl and hylol functions, so the list is now the first argument. This makes it easier to compose the fold-left and unfold in the proof, in that the result of the unfold (a list) is the first argument to the fold-left.
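One way to picture the fixpoint formulations is with the fix combinator from Data.Function. The names unfoldF, foldlF and hylolF are hypothetical, and the exact shape of the functionals is an assumption; the sketch only follows the text in placing the list (or seed) argument before the accumulator.

```haskell
import Data.Function (fix)

-- unfold as the least fixpoint of a non-recursive functional.
unfoldF :: (s -> Bool) -> (s -> e) -> (s -> s) -> s -> [e]
unfoldF p hd tl = fix (\u x -> if p x then [] else hd x : u (tl x))

-- fold-left as a fixpoint, with the list first and accumulator second.
foldlF :: (b -> e -> b) -> [e] -> b -> b
foldlF op = fix (\f xs a -> case xs of
                              []     -> a
                              y : ys -> f ys (a `op` y))

-- The fused left hylomorphism as a fixpoint, seed first.
hylolF :: (s -> Bool) -> (s -> e) -> (s -> s) -> (b -> e -> b) -> s -> b -> b
hylolF p hd tl op = fix (\h x a -> if p x then a else h (tl x) (a `op` hd x))
```

In this form the claim to be proved reads hylolF p hd tl op = foldlF op . unfoldF p hd tl, a statement about the composition of two fixpoints being equal to a third.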
We can now prove the hylol theorem by applying total fusion, which leaves a final equation to be verified by calculation. The functions unfold, foldl and hylol are only locally defined above and so contain free variables, but we use them for clarity.

Example: left factorial
The factorial function can be re-expressed using the fold-left version of the product function, and applying the left-hylomorphism theorem then gives a fused definition, factl, in which the running product is carried in an accumulator. The resulting trace (B in figure 2) shows that the multiplication now happens as soon as each list element is generated. The shape of the evaluation trace is different because evaluation now occurs in constant space: only the additional space to hold the accumulator is required.
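A fused left factorial along these lines can be sketched as follows. The local helper go, the initial accumulator 1, and the use of $! are assumptions; the paper's factl may expose the accumulator differently.

```haskell
-- Left-fused factorial: the accumulator holds the product so far.
-- Unwinding by hand:
--   factl 3 = go 1 3 -> go 3 2 -> go 6 1 -> go 6 0 -> 6
-- Each multiplication is performed before the next call, so the
-- expression never grows beyond the accumulator and the seed.
factl :: Integer -> Integer
factl = go 1
  where
    go a x = if x == 0 then a else (go $! (a * x)) (x - 1)
```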

Calculating an accumulator version
It is interesting to consider whether a function produced from the hylor theorem can be turned into a space-efficient version by calculation. In general, an accumulator version f′ of a function f can be calculated, with an appropriate operator ⊗, using the specification f′ a x = a ⊗ f x. In the factorial example, we can attempt to calculate an accumulating version facta from the specification facta a x = a * fact x. Unfolding the fused definition of fact and using the associativity of multiplication, the recursive case becomes (a * x) * fact (x − 1). The next step would be to substitute facta (a * x) (x − 1) for (a * x) * fact (x − 1), but we cannot do this because there is no induction hypothesis. One could be created for this specific case by induction on natural numbers, but not for the general case of functions produced using the hylor rule. It is therefore not possible to produce an accumulator version in the general case from hylor, but this can be done by applying the hylol theorem instead.
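The accumulator specification can be checked on sample values, which is of course weaker than the calculation the text discusses. In this sketch facta is written directly in the form the substitution step would produce; specHolds is a hypothetical helper.

```haskell
-- Fused right factorial, as produced by the right hylomorphism theorem.
fact :: Integer -> Integer
fact x = if x == 0 then 1 else x * fact (x - 1)

-- Candidate accumulator version (assumed form of the desired result).
facta :: Integer -> Integer -> Integer
facta a x = if x == 0 then a else facta (a * x) (x - 1)

-- The specification relating the two: facta a x = a * fact x.
-- This tests the equation pointwise; it does not prove it.
specHolds :: Integer -> Integer -> Bool
specHolds a x = facta a x == a * fact x
```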

Strictness
The space performance of the original hylomorphism definition may in some cases still be constant. This occurs when the fold operator is non-strict in its second argument, that is, when it does not always require the value of that argument to produce a result.

Example: prime
We can naively define a function that tests if a number is prime by creating a list from two up to the integer argument and checking to see that none of the list elements are divisors.
Applying the hylor theorem gives a fused function with no intermediate list. In Haskell, the conjunction function ∧ is strict in its first argument and non-strict in its second: True ∧ b = b and False ∧ b = False. Using this definition of ∧, the evaluation of prime 9 stops as soon as the divisor 3 is found. The resulting trace has constant space requirements, because ∧ can be evaluated solely on the basis of the value of its first argument. If conjunction were implemented differently, so that it was strict in both its arguments, then evaluation would occur as in the previous examples. The fold-left version of this function still has constant space requirements, though the time requirements are worse if the number is not prime: because the fold-left always makes a tail-recursive call, it can never exploit the laziness of ∧ when the first argument evaluates to False.
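The primality example might be sketched as follows. The stopping predicate (>= n), the divisor test, and the names primeSpec and prime's local helper h are assumptions chosen so that the sketch terminates on every input; the paper's own definition may differ in these details.

```haskell
unfold :: (s -> Bool) -> (s -> e) -> (s -> s) -> s -> [e]
unfold p hd tl x =
  if p x then [] else hd x : unfold p hd tl (tl x)

-- Compositional version: one Boolean per candidate divisor 2 .. n-1,
-- combined with conjunction.
primeSpec :: Int -> Bool
primeSpec n = foldr (&&) True (unfold (>= n) (\x -> n `mod` x /= 0) (+ 1) 2)

-- Fused version. Since (&&) ignores its second argument when the
-- first is False, and returns it unchanged when the first is True,
-- no chain of pending conjunctions builds up.
-- By hand: prime 9 = h 2 = True && h 3 = h 3 = False && h 4 = False
prime :: Int -> Bool
prime n = h 2
  where
    h x = if x >= n then True else (n `mod` x /= 0) && h (x + 1)
```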

FORMALISING
We now seek to formalise the space performance results of the previous section. Inspired by our earlier work on measuring time performance [7], the approach here is to first transform the function whose space performance we wish to measure into an abstract machine that makes explicit how evaluation proceeds. This technique has been developed by Danvy et al. [8] and has been applied in a calculational way by Hutton and Wright [9]. We then label the transitions of the machine with explicit space information, and reverse the transformation process to obtain a high-level function that measures the space behaviour of the original function. In the remainder of this section we show how this proceeds for the particular case of the hylor function.

Abstract machines
Let us start with the fused definition of the hylor function. The first step in the process of obtaining an abstract machine that implements this function is to make the control flow explicit, by transforming the function into continuation-passing style [10]. The next step is to replace the use of continuations by an explicit stack data structure, by applying the technique of defunctionalization [10]. We can now rewrite this function in the form of transition rules for an abstract machine with two states: the state ⟨x, c⟩ corresponds to evaluating an expression using the function call h x c, and the state ⟨c, v⟩ to executing a stack using the function call exec c v. Finally, we also specify the evaluation order of the else branch within these rules, by introducing explicit let bindings with strict semantics. Further details of this approach to transforming a function to an abstract machine can be found in [8,9].
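The first two transformation steps can be sketched on the fused right hylomorphism. The constructor TOP and the helper names h and exec appear in the paper; the constructor name PUSH, the decision to store the already computed element hd x on the stack, and the function names hylorCPS and hylorMach are assumptions.

```haskell
-- Step 0: the fused right hylomorphism.
hylor :: (s -> Bool) -> (s -> e) -> (s -> s) -> (e -> r -> r) -> r -> s -> r
hylor p hd tl op v x =
  if p x then v else hd x `op` hylor p hd tl op v (tl x)

-- Step 1: continuation-passing style. The continuation c records the
-- pending applications of the operator.
hylorCPS :: (s -> Bool) -> (s -> e) -> (s -> s) -> (e -> r -> r) -> r -> s -> r
hylorCPS p hd tl op v x = h x id
  where
    h y c = if p y then c v else h (tl y) (\r -> c (hd y `op` r))

-- Step 2: defunctionalization. Each kind of continuation becomes a
-- stack constructor, and exec interprets the stack.
data Stack e = TOP | PUSH e (Stack e)

hylorMach :: (s -> Bool) -> (s -> e) -> (s -> s) -> (e -> r -> r) -> r -> s -> r
hylorMach p hd tl op v x = h x TOP
  where
    h y c = if p y then exec c v else h (tl y) (PUSH (hd y) c)
    exec TOP        r = r
    exec (PUSH e c) r = exec c (e `op` r)
```

The stack that was implicit in the recursion of hylor is now an ordinary data structure, so its size can be counted like any other.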

Memory management
To keep track of the space usage a memory manager data structure is introduced, consisting of a pair of non-negative integers. The first component of the pair is the amount of memory that has been explicitly freed at the current point, and the second is the amount that has been explicitly allocated. As we shall see, both parts are necessary to capture an accurate space model, in that memory freed by earlier evaluation may be re-used later on. Two functions are defined on the manager to allocate and free memory, alloc and free. To free some memory, the amount to be freed is simply added to the free memory integer, and is then available for use in later allocation requests. When allocating memory, the request is first satisfied using the pool of free memory that is currently available, by subtracting the amount from the free memory integer until it is zero, with the difference then added to the allocated memory integer. For simplicity we assume an infinite amount of memory, and hence allocation requests are always successful. The auxiliary subtraction function, x ∸ y, is defined as the maximum of x − y and 0, thereby ensuring that the result is never negative.
For the purposes of later proofs, we will exploit four properties of these functions, which can easily be proved from their definitions. The first and second properties express that repeated occurrences of free or alloc may be accumulated. The third states that a free followed immediately by an alloc of the same amount has no effect, since the allocation can use up the previously freed amount. Finally, the last property expresses that freeing memory does not affect the amount allocated.
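The memory manager described above can be sketched directly from the prose. The type name Mem and the function name monus (for the truncated subtraction) are assumptions; the comments state the four properties in the form suggested by their descriptions.

```haskell
-- Memory manager: (amount currently freed, amount allocated so far).
type Mem = (Int, Int)

-- Truncated subtraction: never negative.
monus :: Int -> Int -> Int
monus x y = max (x - y) 0

-- Freeing adds to the pool of free memory.
free :: Int -> Mem -> Mem
free n (f, a) = (f + n, a)

-- Allocation uses the free pool first; only the shortfall is added
-- to the allocated total.
alloc :: Int -> Mem -> Mem
alloc n (f, a) = (f `monus` n, a + (n `monus` f))

-- Properties (for non-negative amounts):
--   free n . free m   = free (n + m)
--   alloc n . alloc m = alloc (n + m)
--   alloc n . free n  = id
--   snd . free n      = snd
```

For example, alloc 3 (free 2 (0, 5)) first grows the pool to 2 units and then satisfies the request from the pool plus 1 new unit, giving (0, 6).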

Space costs
For the purposes of assigning space costs we use the notation x^s to denote the space requirements for evaluating x. In the case when x is a piece of data, this will be a non-negative integer representing the size of that data, which we measure by simply counting constructors. For example, the cost of the stack data structure is defined recursively over its constructors. In the case of a function f of a single argument, f^s will be a function that takes this argument along with a memory manager, and returns a modified memory manager that reflects the cost of this application. The tail function on lists, for example, can be given a cost function in this way. Functions with multiple arguments can be treated in the same way by exploiting currying, resulting in a function of n arguments having n unary cost functions.

Transition costs
To add space information to the abstract machine, a way of instrumenting each transition with its cost is required. The space requirements are added using an accumulator, so that the result remains an abstract machine. The accumulator is a memory manager and is updated according to the structure of the transition. For a basic transition of the form x → y we can perform an update operation, update x^s y^s, when provided with the sizes of the data structures on the left- and right-hand sides of the transition (before and after the transition occurs). The update captures the idea that as much space is re-used as possible. First the space occupied by structures in x that do not occur in y is freed, allowing it to be re-used, and then the space for additional structures that appear only in y is allocated.
There are two special cases to consider, when transitions have the structure let or if. For transitions of the form x → let y = f x in z, initially the space for the argument x is allocated, then the space cost of applying the function f to x is incurred, and finally an update occurs with the sizes of the left-hand side (which now includes the newly bound data y) and the right-hand side, update (x^s + y^s) z^s. Altogether this occurs as:
update (x^s + y^s) z^s • f^s x • alloc x^s
Similarly in the if case, the space cost of performing the transition x → if p x then y else z first allocates the space for x, then applies the cost of applying the function p to x. If the predicate p x evaluates to True then an update occurs with the size of the left-hand side x^s + True^s and right-hand side y^s, and if it is False then the size of the left-hand side is x^s + False^s and the right-hand side z^s:
(if p x then update (x^s + True^s) y^s else update (x^s + False^s) z^s) • p^s x • alloc x^s
Mathematically Structured Functional Programming
In the new machine each argument is paired with its space cost, as defined in the previous section. For example, x is replaced by (x, x^s). The resulting machine, which has also been simplified by inlining the definition of update and applying the properties in section 3.2, is given below:
spaceMach (p, p^s) (hd, hd^s) (tl, tl^s) ((⊕), (⊕^s1), (⊕^s2)) v (x, x^s) m = h x (alloc 1 m) TOP
  where h x m c = if p x then exec c v ((free (x^s + 1)
The next step is to perform the same program transformations, but in the reverse order, to produce a high-level function that measures the space from the abstract machine. After refunctionalizing the continuation and transforming back from CPS, an accumulator version, spacer, is produced. In the next section, we will use this derived function to prove the space properties of the factorial example function.
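The transition costs can be sketched on top of the memory manager. The definition of update as a free of the left-hand size followed by an alloc of the right-hand size is an assumption based on the description in the text, as are the helper names letCost and ifCost and the choice of 1 as the size of a Boolean constructor. The cost functions fCost and pCost stand for f^s x and p^s x already applied to their argument.

```haskell
type Mem = (Int, Int)  -- (freed, allocated)

monus :: Int -> Int -> Int
monus x y = max (x - y) 0

free :: Int -> Mem -> Mem
free n (f, a) = (f + n, a)

alloc :: Int -> Mem -> Mem
alloc n (f, a) = (f `monus` n, a + (n `monus` f))

-- Basic transition x -> y: release the left-hand side, then claim the
-- right-hand side, re-using freed space where possible (assumed form).
update :: Int -> Int -> Mem -> Mem
update l r = alloc r . free l

-- Transition x -> let y = f x in z, given the sizes of x, y and z.
letCost :: Int -> Int -> Int -> (Mem -> Mem) -> Mem -> Mem
letCost xs ys zs fCost = update (xs + ys) zs . fCost . alloc xs

-- Transition x -> if p x then y else z; b is the value of p x.
-- Assumption: True and False each have size 1.
ifCost :: Bool -> Int -> Int -> Int -> (Mem -> Mem) -> Mem -> Mem
ifCost b xs ys zs pCost =
  (if b then update (xs + 1) ys else update (xs + 1) zs) . pCost . alloc xs
```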

Example: factorial space
We can analyse the space performance of the factorial function by first producing space requirements functions for the primitive functions: equality with zero, multiplication and predecessor. This is done simply by taking the difference in size between the input and output. For example, if we define the size of an integer to be one unit of space, then the multiplication function will free one unit of space, since it takes two integers as arguments and the result is one integer.
Applying the spacer function and inlining the primitive space functions gives a space requirements function for factorial. The resulting function shows how, for each recursive call, two units need to be allocated before the call, which are then released afterwards.
To prove that the space requirements are linear we can form a specification that says: if we free n units of space initially and execute the space requirements function, then the allocated amount of memory will be unchanged. This means there was no need to request more memory, since the pool of n units of free memory was sufficient for evaluation. If we can prove this specification, then we can say that the function executes in n units of memory.
[...] may be rewritten as a fold-left if their operators associate with each other, and the empty list value, v, is the right and left unit for the fold-right and fold-left operator respectively. The left-hylo space function can then be applied to get the space requirements function:
spaceToBinl = h
  where h x = if x ≡ 0 then free 3 • alloc 2 else h (x `div` 2) • alloc 2
This function gives the space requirements 4 + 2 * log₂ n but, not including the space that the result occupies, it requires only a constant 3 units for evaluation.
The results of the two space requirements functions show that the right-hylo version requires additional space proportional to the size of the result, whereas the left version only requires a constant amount of additional space.
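The space requirements function spaceToBinl quoted above can be run against the memory manager to observe the stated bound on small inputs. The memory manager definitions are reconstructed from the prose, and peak is a hypothetical helper that starts from an empty manager and reads off the allocated total.

```haskell
type Mem = (Int, Int)  -- (freed, allocated)

monus :: Int -> Int -> Int
monus x y = max (x - y) 0

free :: Int -> Mem -> Mem
free n (f, a) = (f + n, a)

alloc :: Int -> Mem -> Mem
alloc n (f, a) = (f `monus` n, a + (n `monus` f))

-- Space requirements function for the left version, as in the text.
spaceToBinl :: Int -> Mem -> Mem
spaceToBinl = h
  where
    h x = if x == 0 then free 3 . alloc 2 else h (x `div` 2) . alloc 2

-- Total allocation when starting with no memory at all.
peak :: Int -> Int
peak n = snd (spaceToBinl n (0, 0))
```

For n = 8 the argument is halved four times before reaching 0, so five allocations of 2 units are made and 3 units are freed at the end, giving a final manager of (3, 10); this agrees with 4 + 2 * log₂ 8.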

CONCLUSION AND FURTHER WORK
The aim of applying fusion theorems, such as the hylomorphism theorem, is to eliminate the intermediate data structure produced. However, we have shown that this is only achieved if the generating function produces elements in the same order as they are consumed. The examples given illustrate this impedance mismatch and show how an accumulator version, using fold-left, is often a solution. The accumulator is then able to evaluate the elements generated in place, rather than waiting until the end, giving improved space performance.
The space results may be observed informally by looking at evaluation traces, but we can get more concrete space measures by using program transformation techniques to derive the underlying abstract machine. At this level we can measure data structures that were not visible at the original function level. The machine can then be instrumented with space usage, and the transformations reversed to obtain a space requirements function. This can then be used to prove the space performance.
Applying this technique to more general structures is not so simple, since fold-left cannot be generalised as fold-right can. There are more restrictive functions that can be generalised, such as crush [11], where the structure is first flattened to a list and then folded. The same idea, of using an accumulating fold, may be applied to improve the space usage. How to extend this approach to other structures would be an interesting topic for further work.

FIGURE 1: Evaluation traces for downFrom and product

FIGURE 2: Evaluation traces for fact and factl