Communication Traces in the Verification of Distributed Programs

Two types of communication traces, channel traces and process traces, have been used in the literature in dealing with distributed programs. Specifications and proofs in channel-trace systems are usually considered to be simpler than in systems based on process traces. But channel traces do not contain information about the relative order of communications along different channels of a process, and this can lead to incompleteness of channel-trace-based axiom systems. Several attempts have been made to overcome this incompleteness by adding new axioms to these systems.
 
We show with simple examples that these axioms do not by themselves solve the incompleteness problem. Effectively, process traces, or some equivalent thereof, are necessary to achieve completeness. We also consider the possibility of adding new communications to the processes to include more information in the channel traces.


Introduction
Auxiliary variables [3] are very useful in program verification. They are especially valuable in the verification of distributed programs since they allow us to account for interactions between the different processes of a program.¹ Although auxiliary variables may be used to express many different types of information about the program or process under consideration, perhaps the most useful auxiliary variables in dealing with distributed programs are those that represent the sequences of communications that the processes engage in. This is not surprising since it is via these communications that all interactions between the processes take place, there being no shared variables. Further, there is also a 'naturalness' about these sequences: while other types of auxiliary variables have to be 'invented' by the program prover based on the particular program in question, communication sequences simply record the communications between the various processes. This is reflected in the proofs of individual programs; whereas for other types of auxiliary variables the program prover must introduce new (assignment) statements into the program to update their values appropriately, the statements that update communication sequences are already in the program in the form of the input and output commands that the individual processes use to communicate with each other. It is also reflected in the proof systems; whereas systems that allow arbitrary auxiliary variables must include rules governing the introduction and removal of these variables from programs (see, for example, [1]), systems that include only communication sequences need only ensure that the rules corresponding to the input and output commands capture the effect of these commands on the communication sequences (see, for example, [7]). Two types of communication sequences have been used in the literature. The first, which we will call channel traces or CTs, records the
communications that a process P goes through on one of its channels, a separate channel trace being used to record P's communications along each of its channels. The second type of sequence, which we call process traces or PTs, records all the communications that P engages in, along all of its channels, on a single trace variable. The primary advantage of channel traces is ease of use. In particular, when composing a set of processes P1, ..., Pn in parallel, the channel traces of [P1 ∥ ... ∥ Pn] are obtained directly from the channel traces of the component processes. Thus if c is a channel along which [P1 ∥ ... ∥ Pn] communicates with an external process, c will also be a channel of exactly one of P1, ..., Pn, say Pi; hence the trace of [P1 ∥ ... ∥ Pn]'s communications along c is identical to the trace of Pi's communications along c. The primary disadvantage of channel traces is the lack of information about the relative ordering of communications along different channels of a process.
The goal of this paper is to consider exactly what information is missing from channel traces, to discuss in detail some attempts (by adding axioms to the proof system) that have been made in the literature to handle the resulting problems, and to show via relatively simple examples that channel traces have a fundamental incompleteness that cannot be resolved by adding such axioms. We should note that for most practical examples, channel traces are entirely adequate. But in all these examples, correctness proofs that use process traces are no more complex than those that use channel traces; indeed, any proof using channel traces can be translated in a straightforward manner into one that uses process traces. Our conclusion is that while channel traces are usually sufficient, (relative) completeness requires us to use process traces as the underlying mechanism in our proof systems.
The rest of the paper is organized as follows. In the next section we briefly describe the fairly standard notation that we use for processes, traces, and specifications. In the third section we present two examples that illustrate the types of information that channel traces do not provide. The first example exhibits behavior like that of the classic Brock-Ackerman [2] example, and we show in section 4 how the CT-based approach, when strengthened with axioms such as those proposed by Widom et al [9], can deal with it. (The original Brock-Ackerman example, and even the simplified version in [5], are rather involved. Our example is simpler but illustrates the same point.) Our second example is new to this paper, and we show in section 4 why general axioms such as those of [9] are inadequate to deal with the problem it illustrates. This seems to contradict the results of [9], and we discuss the apparent contradiction. Briefly, the problem is as follows. Widom et al [9] essentially prove that if we have 'sufficient information' about the component processes P1, ..., Pn of a program, then all valid conclusions about [P1 ∥ ... ∥ Pn] that are expressible in terms of the channel traces of [P1 ∥ ... ∥ Pn] can be derived using their axioms. Our example shows, however, that in some cases the required information about the component processes P1, ..., Pn cannot be expressed in terms of the channel traces of those processes, although the result about [P1 ∥ ... ∥ Pn] can be expressed in terms of its channel traces. And this is not a problem of the assertion language used to express these properties; rather, it is a result of the nature of channel traces. In section 4 we also briefly relate our discussion to the models of Jonsson and Kok [4], who introduce the notion of full abstractness to capture whether or not a particular model contains enough information to avoid Brock-Ackerman type problems.
In section 5 we reiterate our conclusions and briefly mention an extension to channel traces that may allow us to handle the incompleteness problem in a very different manner. Interestingly, this approach would require us to introduce new statements into the processes to include additional information in the channel traces, in the spirit of the original auxiliary variables. We should note that we have not fully explored the ramifications of including such additional information in the traces, nor have we formalized axioms and rules corresponding to this extension. We hope to do that in a future paper.

Processes, Traces, and Specifications
We use a standard CSP-like notation with processes communicating along named channels. A process may be either a sequential process or the parallel composition [P1 ∥ ... ∥ Pn] of n other processes. A sequential process is made up of the standard statements: skip, assignment, sequential composition, selection, and repetition. Selection and repetition may have input as well as output guards.
Consider a parallel composed process [P1 ∥ ... ∥ Pn]. Let Ii and Oi be the sets of input and output channels of Pi. We will make the standard assumption that P1, ..., Pn are 'compatible', i.e., that no channel appears in more than one Ii or in more than one Oi. (We have borrowed some of our terminology and notation from [4].) If c appears in Oi ∩ Ij, then it is a channel along which Pi can send values to Pj; such a channel is an internal channel of [P1 ∥ ... ∥ Pn], communications along which are, of course, not visible to an external observer of [P1 ∥ ... ∥ Pn]. If c appears in Oi but in none of the Ij, it is an external channel of [P1 ∥ ... ∥ Pn] and is used by [P1 ∥ ... ∥ Pn] for outputting values to the 'environment', more precisely to the process at the other end of c. Similar remarks may be made if c ∈ Ii but c is in none of the Oj.
Let P be a process, sequential or parallel composed. An external observer of P will see all the communications that P engages in on its external channels but none of its internal activities, including communications on any internal channels (in the case of parallel composed processes). The observer can record the entire sequence of P's communications with external agents; this record is the process trace h of P. Thus h is a sequence of elements, each representing a communication between P and an external process, an element of h being of the form (c, v) where c is the channel and v the value communicated. Communications in which P receives a value and those in which it sends a value are both recorded in the same manner. Alternatively, we may associate a separate channel trace with each channel c of P and record in it all communications that P engages in along c. Each element of a channel trace is just a value, there being no need to record the identity of the channel. Given the value of the process trace h at any time, we can obtain the corresponding channel traces of P, the trace for channel c say, by simply projecting out those elements of h whose first component is c and then omitting this first component from each element. On the other hand, given the values of all the channel traces of P at some time, we cannot, in general, obtain the corresponding h, since we do not know the relative order of communications along the different channels.
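The projection from a process trace to channel traces, and the ordering information it discards, can be sketched in a few lines of Python. This is a minimal illustration, not part of the formal system: the channel names a and b, integer values, and the helper names project and channel_traces are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

# One process-trace element: (channel, value communicated).
Event = Tuple[str, int]


def project(h: List[Event], channel: str) -> List[int]:
    """Channel trace on `channel`: keep the elements of h whose first
    component is `channel`, then drop that component."""
    return [v for (c, v) in h if c == channel]


def channel_traces(h: List[Event], channels: List[str]) -> Dict[str, List[int]]:
    """All channel traces of a process, recovered from its process trace."""
    return {c: project(h, c) for c in channels}


# Two process traces that differ only in the relative order of
# communications on the (hypothetical) channels a and b.
h1: List[Event] = [("a", 0), ("b", 0)]
h2: List[Event] = [("b", 0), ("a", 0)]

same_cts = channel_traces(h1, ["a", "b"]) == channel_traces(h2, ["a", "b"])
print(same_cts)  # True: the channel traces cannot tell h1 and h2 apart
```

The map from h to its channel traces is a function, but it is not injective, which is exactly why the converse reconstruction fails in general.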
Consider next the specification of a process. One possibility is to specify the set of possible values that the process trace h of P (in the PT-based approach), or the channel traces of P (in the CT-based approach), may have when P finishes. A better alternative, especially for non-terminating processes, is an invariant that specifies the possible values that h, or the channel traces of P, may have at any time during execution. We will use invariants not only because they can deal with non-terminating processes, but also because, as we will soon see, they add considerable power to the CT-based system even for terminating processes. We will not formally define an assertion language for the invariants; something along the lines of the STL (Simple Trace Logic) of Widom [8] would be appropriate.
Let us consider a simple example. Suppose B is a bounded buffer of size n that reads values on channel a and outputs them on channel b. B[0..(n−1)] is the internal array in which B stores the values it has input but not yet output, and the variables in and out keep track of how many values have been input and output respectively. A CT-invariant r for B would be:

$$ r \equiv (b \preceq a) \wedge (|a| \le |b| + n) $$

where ⪯ denotes 'prefix of', and |s| is the number of elements in s. Thus r says that the sequence of values B has output on channel b is a prefix of the sequence of values it has input on a, with the length of the former being no more than n less than the length of the latter.
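As a minimal sketch, the invariant r can be written as a Python predicate over the two channel traces and checked against a simple simulation of a buffer. The FIFO model, the random scheduler in run_buffer, and the names a and b for the input and output traces are our assumptions; run_buffer is not the CSP program of the text, only a stand-in that behaves like a bounded buffer of size n (n ≥ 1).

```python
import random
from collections import deque
from typing import List, Sequence, Tuple


def is_prefix(s: Sequence[int], t: Sequence[int]) -> bool:
    """s ⪯ t: s is a prefix of t."""
    return len(s) <= len(t) and list(t[: len(s)]) == list(s)


def r(a: Sequence[int], b: Sequence[int], n: int) -> bool:
    """CT-invariant of a bounded buffer of size n with input trace a and output trace b."""
    return is_prefix(b, a) and len(a) <= len(b) + n


def run_buffer(inputs: Sequence[int], n: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical FIFO model of the buffer (n >= 1): at each step it either
    inputs the next value (if it has room) or outputs its oldest stored value,
    chosen at random. Returns the channel-trace state after every communication."""
    rng = random.Random(seed)
    stored: deque = deque()
    pending = list(inputs)
    a: List[int] = []
    b: List[int] = []
    states = [([], [])]
    while pending or stored:
        can_input = bool(pending) and len(stored) < n
        can_output = bool(stored)
        if can_input and (not can_output or rng.random() < 0.5):
            v = pending.pop(0)
            stored.append(v)
            a.append(v)
        else:
            b.append(stored.popleft())
        states.append((list(a), list(b)))
    return states


# r holds in every state the model passes through.
print(all(r(a, b, 2) for a, b in run_buffer(range(6), n=2)))  # True
```

In this model |a| − |b| is always the number of stored values, which never exceeds n, and FIFO order keeps b a prefix of a; the predicate r records exactly these two facts.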
Suppose we have two bounded buffers B1, B2: B1 is exactly like B except that it has capacity n1, inputs on channel a, and outputs on channel b′; B2 is also like B except that it has capacity n2, inputs on b′, and outputs on b. Pictorially:

a → B1 → b′ → B2 → b

The invariant for B1 is:

$$ r_1 \equiv (b' \preceq a) \wedge (|a| \le |b'| + n_1) $$

and for B2:

$$ r_2 \equiv (b \preceq b') \wedge (|b'| \le |b| + n_2) $$

The invariant for [B1 ∥ B2] can be obtained by conjoining r1 and r2 and eliminating references to communications on the internal channel b′. The conjunction is:

$$ r_1 \wedge r_2 \equiv (b' \preceq a) \wedge (|a| \le |b'| + n_1) \wedge (b \preceq b') \wedge (|b'| \le |b| + n_2) $$

Simplifying and eliminating references to channel b′ (b ⪯ b′ ⪯ a gives b ⪯ a, and the two length bounds add up), we get, as invariant for [B1 ∥ B2]:

$$ (b \preceq a) \wedge (|a| \le |b| + n_1 + n_2) $$

This invariant shows that [B1 ∥ B2] behaves like a buffer of capacity n1 + n2, copying values from a to b as we would expect. Note that we have used the same name b′ for the trace of values output by B1 on b′ and the trace of values input by B2 on b′. This is a fairly standard trick but, of course, it is valid only if i/o is synchronous (so that these traces are, in fact, always equal). If i/o were asynchronous, we would have to use distinct names, say out_b′ and in_b′, for these traces and add the condition in_b′ ⪯ out_b′ at parallel composition, i.e., when combining r1 and r2 to obtain the invariant for [B1 ∥ B2]. For convenience we will assume synchronous communication in our discussion.
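The elimination of the internal channel can be sanity-checked by brute force: for short traces over a small alphabet, "there exists an internal trace b′ with r1 ∧ r2" should coincide with the composed invariant. This is a bounded test under our own assumptions (alphabet {0, 1}, traces of length at most 4, the names a, b, b′ for the three channels), not a proof. Since r1 forces b′ ⪯ a, the prefixes of a are the only candidates for b′.

```python
from itertools import product
from typing import Iterator, List, Sequence


def is_prefix(s: Sequence[int], t: Sequence[int]) -> bool:
    return len(s) <= len(t) and list(t[: len(s)]) == list(s)


def r1(a: List[int], bp: List[int], n1: int) -> bool:   # B1: inputs on a, outputs on b'
    return is_prefix(bp, a) and len(a) <= len(bp) + n1


def r2(bp: List[int], b: List[int], n2: int) -> bool:   # B2: inputs on b', outputs on b
    return is_prefix(b, bp) and len(bp) <= len(b) + n2


def exists_internal(a: List[int], b: List[int], n1: int, n2: int) -> bool:
    """Is there some trace b' on the internal channel making r1 and r2 true?"""
    return any(r1(a, a[:k], n1) and r2(a[:k], b, n2) for k in range(len(a) + 1))


def composed(a: List[int], b: List[int], n1: int, n2: int) -> bool:
    """The invariant obtained for [B1 || B2] after eliminating b'."""
    return is_prefix(b, a) and len(a) <= len(b) + n1 + n2


def seqs(max_len: int, alphabet=(0, 1)) -> Iterator[List[int]]:
    for length in range(max_len + 1):
        for s in product(alphabet, repeat=length):
            yield list(s)


def check(n1: int, n2: int, max_len: int = 4) -> bool:
    return all(exists_internal(a, b, n1, n2) == composed(a, b, n1, n2)
               for a in seqs(max_len) for b in seqs(max_len))


print(check(1, 2))  # True
```

The exhaustive check only covers a finite slice of the trace space; the general argument is the transitivity-plus-addition step given in the text, together with choosing b′ as the prefix of a of length min(|a|, |b| + n2) for the converse direction.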
In a PT-based system, the invariants r1, r2 would be expressed in terms of h1, h2, the process traces of B1, B2, and things would seem to be more complex: to obtain the process trace of [B1 ∥ B2], we would have to 'merge' h1 and h2 and then omit the elements corresponding to communications on the internal channel. (For details, see for example [7].) In fact, though, in examples like [B1 ∥ B2] the situation in the PT-based system is not very different from that in the CT-based system, since r1 and r2 can be expressed in terms of the projections of h1 on the two channels of B1 and the projections of h2 on the two channels of B2 respectively. Once this is done, the invariants in the PT-based system look exactly like those in the CT-based system, and combining them to obtain the invariant for the parallel composition is no harder than in the CT-based system: the 'merge' operation in this case reduces to identifying, for instance, the projection of h1 on the internal channel with the projection of h2 on that channel, the projection of the trace of the parallel composition on its input channel with the projection of h1 on that channel, and so on. In general, given a CT-based proof, we can similarly translate it into an equivalent PT-based proof.
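The projection and hiding steps of the PT-based composition can be sketched as follows. We assume a hypothetical joint history of [B1 ∥ B2] that includes the internal channel (named b′, written "b'" in the code, with a and b as the external channels); with synchronous i/o each internal communication appears once and belongs to both component traces. The sketch goes from a given joint history to the component and composite process traces; it does not enumerate all possible merges.

```python
from typing import List, Set, Tuple

Event = Tuple[str, int]


def restrict(h: List[Event], channels: Set[str]) -> List[Event]:
    """Keep only the elements of h on the given channels (channel tags kept)."""
    return [(c, v) for (c, v) in h if c in channels]


def hide(h: List[Event], internal: Set[str]) -> List[Event]:
    """Omit communications on internal channels, as seen by an external observer."""
    return [(c, v) for (c, v) in h if c not in internal]


# Hypothetical joint history of [B1 || B2]; with synchronous i/o each
# communication on the internal channel b' appears once, shared by both buffers.
joint: List[Event] = [("a", 7), ("b'", 7), ("a", 8), ("b", 7), ("b'", 8)]

h1 = restrict(joint, {"a", "b'"})   # process trace of B1
h2 = restrict(joint, {"b'", "b"})   # process trace of B2
h = hide(joint, {"b'"})             # process trace of [B1 || B2]

# The 'merge' reduces to identifying projections, e.g. on b':
print([v for c, v in h1 if c == "b'"] == [v for c, v in h2 if c == "b'"])  # True
print(h)  # [('a', 7), ('a', 8), ('b', 7)]
```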

Consider again the buffer process B, with input channel a and output channel b, and consider the value

((b,0), (a,0), (a,1), (b,1), (a,2))

for the process trace h of B. The corresponding values of the channel traces a and b are ⟨0,1,2⟩ and ⟨0,1⟩ respectively. These values satisfy the invariant r, but this particular trace cannot in fact arise during the execution of B, since the value 0 appears on channel b before it has been input on channel a. In a process-trace-based approach we can easily fix this problem by strengthening the invariant as follows:

$$ r' \equiv \forall h' \preceq h .\; (h'{\downarrow}b \preceq h'{\downarrow}a) \wedge (|h'{\downarrow}a| \le |h'{\downarrow}b| + n) $$

where h′↓a is the sequence obtained from h′ by retaining only the elements communicated along channel a and dropping their channel component; in other words, it is the channel trace on a corresponding to the process trace h′; h′↓b is similar. This invariant allows us to conclude that the given trace cannot arise during the execution of B, since if we take h′ to be ((b,0)), it does not satisfy the conditions in r′.

Interestingly, Widom et al [9] showed that there is no need to resort to process traces to solve this problem. The intuition behind their approach was that although the original invariant r does not directly rule out the trace above, we can still argue that it is not an acceptable value for h according to r. Since r is an invariant, not only must it be satisfied at the current moment, but it must also have been satisfied at all moments in the past. But this would not have been the case immediately following the communication recorded in the first element of the h in question, since at that point the values of the channel traces a and b would have been ⟨⟩ (the empty trace) and ⟨0⟩ respectively, and this pair of values does not satisfy r.
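A small Python sketch makes the difference concrete. We assume the buffer's input channel is a, its output channel is b, n = 2, and the history is the interleaving ((b,0), (a,0), (a,1), (b,1), (a,2)). The function r_now evaluates r only on the current channel traces; r_always evaluates r on the channel traces of every prefix of h, which is both the strengthened PT invariant r′ and the "r must have held at every earlier moment" argument applied to this particular history.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, int]


def is_prefix(s: Sequence[int], t: Sequence[int]) -> bool:
    return len(s) <= len(t) and list(t[: len(s)]) == list(s)


def r(a: Sequence[int], b: Sequence[int], n: int) -> bool:
    """CT-invariant of the bounded buffer: b ⪯ a and |a| <= |b| + n."""
    return is_prefix(b, a) and len(a) <= len(b) + n


def project(h: List[Event], channel: str) -> List[int]:
    return [v for (c, v) in h if c == channel]


def r_now(h: List[Event], n: int) -> bool:
    """r evaluated only on the current channel traces derived from h."""
    return r(project(h, "a"), project(h, "b"), n)


def r_always(h: List[Event], n: int) -> bool:
    """r evaluated on every prefix h' of h (the strengthened invariant r')."""
    return all(r_now(h[:k], n) for k in range(len(h) + 1))


h: List[Event] = [("b", 0), ("a", 0), ("a", 1), ("b", 1), ("a", 2)]
n = 2
print(r_now(h, n))     # True: the final channel traces <0,1,2>, <0,1> satisfy r
print(r_always(h, n))  # False: the prefix ((b,0)) already violates r
print(next(k for k in range(len(h) + 1) if not r_now(h[:k], n)))  # 1
```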
Using this type of reasoning, one can in general extract from a CT-based invariant a considerable amount of information about the relative ordering of communications along different channels. Widom et al proposed two general axioms, prefix and ordering, to capture the properties that follow from the fact that we are dealing with invariants over (channel) traces; the idea was that these axioms would allow us to derive, from invariants over channel traces, all the information we would need about the communications, so that there would never be any need to resort to process traces. One of the examples in the next section will show that this is not possible in general.

The Missing Information
We present two examples in this section. The first is similar, as far as the problem it demonstrates is concerned, to the Brock-Ackerman [2] example. This problem, as we will see in the next section, is handled by the axioms of [9]. Our second example demonstrates a different type of problem with CT-based systems, one that cannot be handled by such axioms.
Before we present the examples, we should make some general remarks. The bounded buffer example of section 2 is not really a good example of incompleteness in the CT-based system. This is because the particular property of the buffer we considered (that the output of 0 on the buffer's output channel must take place after the input of 0 on its input channel) is not expressible using channel traces: once the communications on the two channels have taken place, there is no way to talk about their order. So one could validly argue that there is no need to worry about this property, or about the fact that it is not implied by the CT-based invariant r. The importance of the Brock-Ackerman example is that it presents a property p of a process [P1 ∥ P2] that is expressible in terms of its channel traces and that also intuitively follows from the (channel-trace) invariants r1 and r2 of P1 and P2, but is not implied by r1 ∧ r2. The axioms of [9] solve this problem since, in conjunction with those axioms, r1 ∧ r2 does imply p.
Our second example behaves differently. In that example a process [P3 ∥ P4] has a property q that is expressible in terms of its channel traces. But the properties of P3 and P4 that lead to [P3 ∥ P4] having the property q are not themselves expressible in terms of their channel traces. It is this kind of example that general axioms such as those of [9] cannot help with.
Let us now turn to the actual examples. The first is the parallel composition of two processes [P1 ∥ P2], where:

The invariants r1, r2 for P1, P2 are as follows (in r1, r2 below, as well as in the rest of the paper, we will use the name of a channel to denote the trace of communications on that channel; this should cause no confusion since the context will make it clear whether we are talking about the channel or the corresponding channel trace):

Eliminating references to the two internal channels, we get, as the invariant for [P1 ∥ P2]:

From this there is no way to see that [P1 ∥ P2] will not output any values on its external output channel. But, as we will see in the next section, r1 ∧ r2, in conjunction with the axioms of [9], will let us derive this.
Our next example is more involved. It is the parallel composition of two processes [P3 ∥ P4], where:

The invariants r3, r4 for P3, P4 are:

The reader might already see the beginnings of a problem in r3: there is no way to see from it the relation between the value P3 outputs on its external channel and the order of its previous communications on the internal channels. And there is no way to strengthen r3 to include information about this relation, since each combination of values for P3's channel traces allowed by r3 can actually arise during the execution of P3. We may, for now, simply ignore this relation, but when we consider [P3 ∥ P4] we will see that without information about this relation (and a similar relation for P4) we will be unable to establish a property of that process, although the property is easily stated in terms of its channel traces.
Combining the two:

Simplifying and omitting references to the internal channels, we get:

But we can write a stronger invariant for [P3 ∥ P4] that expresses the equality of the values output on its two external channels:

but, as we will see in the next section, there is no way to derive this from r3 ∧ r4, even with the help of the axioms of [9].

The Incompleteness of Channel Traces
Consider the first example [P1 ∥ P2] from section 3. The conjunction of r1 and r2, the invariants of P1 and P2, is:

This assertion does, using the special kind of reasoning from section 2, let us conclude that no value will be output on the external channel, since for that to happen a value must also have been output on each of the two internal channels. But if we consider the situation when only one or two of these three values have been output, the corresponding values of the three channel traces will not satisfy r1 ∧ r2; hence, since there is no way for the process to communicate simultaneously on all three channels, no values will be output on any of these channels. In particular, no value will be output on the external channel.
But this type of argument does not apply once we abstract away the internal channels and obtain the 'proper' invariant (i.e., the invariant that refers only to [P1 ∥ P2]'s external channels):

To solve this problem, [9] introduce two new axioms. Their prefix axiom says that the value of a channel trace at any time will be a proper prefix of its value at the 'next' moment. Essentially this says that communications cannot be 'taken back' once they have occurred. Their second axiom, the ordering axiom, essentially says that if (an invariant asserts that) two or more communications must take place simultaneously (or not at all), then they will not take place at all. They also go on to show that there is no way to express these axioms without using temporal operators, and Widom [8] analyses in depth exactly how much temporal notation is needed to express them (the point being that if we simply allow the full power of temporal logic, as [5] do, we would lose the simplicity of the CT-based approach and such a system would be even more complex than the PT-based system). For our purposes, though, it is not necessary to look at the precise formal statements of these axioms. All we need is to note that using these axioms we can formalize the type of informal arguments we have considered. This essentially takes care of our first example [P1 ∥ P2], since we can conjoin the ordering axiom with r1 ∧ r2 and derive the stronger invariant:

We can then omit the references to the internal channels and derive the required invariant for [P1 ∥ P2].
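The informal prefix-and-ordering argument can be approximated by a reachability search: starting from the empty state, allow one communication on one channel at a time and never pass through a state that violates the invariant. We do not reproduce r1 ∧ r2 here; the predicate lockstep is a hypothetical stand-in for an invariant that forces the external channel and the two internal channels to advance together. The sketch also abstracts away values, tracking only how many communications have occurred on each channel, and caps counts at a bound so the search terminates. It is a finite illustration, not a formalization of the axioms of [9].

```python
from typing import Callable, Set, Tuple

# A state records only how many communications have occurred on each channel
# (values are abstracted away, an assumption of this sketch).
State = Tuple[int, ...]


def reachable(num_channels: int, inv: Callable[[State], bool], bound: int) -> Set[State]:
    """States reachable from the empty state by one communication on one channel
    at a time, never passing through a state that violates inv.
    Counts are capped at `bound` so that the search terminates."""
    start: State = (0,) * num_channels
    if not inv(start):
        return set()
    seen = {start}
    frontier = [start]
    while frontier:
        s = frontier.pop()
        for i in range(num_channels):
            if s[i] >= bound:
                continue
            t = s[:i] + (s[i] + 1,) + s[i + 1:]
            if inv(t) and t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


# Hypothetical invariant over (external, internal1, internal2): the three
# channels may only advance together, i.e. they would have to communicate simultaneously.
def lockstep(s: State) -> bool:
    return s[0] == s[1] == s[2]


print(reachable(3, lockstep, bound=3))  # {(0, 0, 0)}: nothing is ever output
print(lockstep((1, 1, 1)))              # True, yet (1, 1, 1) is unreachable
```

The state (1, 1, 1) satisfies the invariant on its own, but no single-step path reaches it, which mirrors why the ordering axiom rules out any output on the external channel.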
Consider now our second example. By looking at r3 (and r4) there is, as we saw, no way to tell what the relation is between the value output on P3's (and P4's) external channel and the order of the prior communications on the internal channels. We can see this by noting that the process P3′ also satisfies r3, and for this process there is indeed no relation between the value output on its external channel and the order of communications on the internal channels. A similar process P4′ satisfies r4, again with no relation between the value output on its external channel and the order of communications on the internal channels. Given these facts, it is clear that the axioms of [9] cannot be used to strengthen r3 ∧ r4 so as to establish the equality of the values output on the two external channels by [P3 ∥ P4], since otherwise we would be able to do the same for [P3′ ∥ P4′], and this equality does not hold for [P3′ ∥ P4′]. The only other possibility would be to try to arrive at stronger invariants for P3 and P4, which P3′ and P4′ would presumably not satisfy. But if we are using a CT-based approach, no invariant stronger than r3 and r4 can be valid for P3 and P4, since every combination of values for the traces of the internal channels and of the external channel allowed by r3 (r4) can actually arise during execution. So no strengthening of the invariants of P3, P4 is possible. If we cannot strengthen r3 and r4 individually, and we should not even try to strengthen r3 ∧ r4 (given the existence of [P3′ ∥ P4′]), the conclusion has to be that there is no way to establish the relation between the values output on the two external channels by [P3 ∥ P4] in a CT-based system.
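To make the shape of this argument concrete, here is a toy model of our own; it is not the definition of P3, P4, P3′, or P4′. We assume each component communicates once (value 0) on each of two hypothetical internal channels c3 and c4, in either order, and then outputs one bit on its external channel (e3 or e4). In the "records_order" variant the bit is 1 exactly when c3 came first; in the "ignores_order" variant the bit is arbitrary. Both variants have identical channel-trace behaviour, yet under synchronous composition only the first guarantees equal outputs.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Event = Tuple[str, int]
INTERNAL = ("c3", "c4")  # hypothetical internal channels between the two processes


def behaviours(out_ch: str, rule: Callable[[Tuple[str, str]], List[int]]) -> List[List[Event]]:
    """Toy component: one communication (value 0) on each internal channel,
    in either order, then one output on out_ch with a value allowed by rule."""
    result = []
    for order in (("c3", "c4"), ("c4", "c3")):
        for bit in rule(order):
            result.append([(c, 0) for c in order] + [(out_ch, bit)])
    return result


def records_order(order: Tuple[str, str]) -> List[int]:
    return [1 if order[0] == "c3" else 0]   # output reveals which channel came first


def ignores_order(order: Tuple[str, str]) -> List[int]:
    return [0, 1]                           # output unrelated to the order


def ct_view(traces: List[List[Event]], channels: Tuple[str, ...]) -> Set[tuple]:
    """Everything a CT-based specification can observe: the tuple of channel traces."""
    return {tuple(tuple(v for c, v in h if c == ch) for ch in channels) for h in traces}


def composed_outputs(left: List[List[Event]], right: List[List[Event]]) -> Set[Tuple[int, int]]:
    """Synchronous composition: only pairs of behaviours that agree on the order
    of internal communications can occur together; return the pairs of outputs."""
    outs = set()
    for hl, hr in product(left, right):
        if [e for e in hl if e[0] in INTERNAL] == [e for e in hr if e[0] in INTERNAL]:
            outs.add((hl[-1][1], hr[-1][1]))
    return outs


P3, P4 = behaviours("e3", records_order), behaviours("e4", records_order)
P3p, P4p = behaviours("e3", ignores_order), behaviours("e4", ignores_order)

print(ct_view(P3, ("c3", "c4", "e3")) == ct_view(P3p, ("c3", "c4", "e3")))  # True
print(sorted(composed_outputs(P3, P4)))    # [(0, 0), (1, 1)]
print(sorted(composed_outputs(P3p, P4p)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Because the channel-trace views of the two variants coincide, any CT-based invariant valid for one is valid for the other, so no such invariant can separate the compositions.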
Widom et al introduce the notion of 'precise invariants' in their paper. Informally, an invariant is precise if every combination of (channel) trace values that satisfies it either can actually arise during the execution of the process or is such that, 'somewhere along the way to reaching this set of values', the traces would have to go through one or more combinations of values that do not satisfy the invariant. (The idea is that in the latter case we can use their axioms to strengthen the invariant so that it is satisfiable only by combinations of trace values that can actually arise during execution.) They then show that, given precise invariants of the component processes, we can derive a precise invariant for the parallel composed process. This seems to be contradicted by our example: r3 and r4 are precise invariants for P3 and P4, since each combination of trace values allowed by r3 and r4 can arise in practice, yet r3 ∧ r4 is not precise for the process [P3 ∥ P4], as we saw.
The problem is that, though the examples they consider suggest the above definition of preciseness, Widom et al's formal definition of preciseness is stronger. To see this definition, let us introduce the notion of a 'state' of a process as the set of values of all the channel traces of the process at any given time. Consider now a sequence of states such that each state can arise from the previous one by performing one communication on one channel; in other words, there are no violations of the axioms of [9] in going from one state to the next. Suppose also that each state in the sequence satisfies the invariant r in question. Then we will say that r is 'path precise' (this is not a term that [9] use; we use it to distinguish this notion from the one introduced in the last paragraph) if every such sequence of states can actually arise during the execution of the process. Note that it is not enough that each state can arise; the entire sequence must be capable of arising during execution. It is clear that path preciseness implies preciseness, but the converse is not true. In particular, r3 and r4 are not path precise for P3 and P4, since r3, for instance, is satisfied by each state in the following sequence:

and each state in this sequence can be obtained from the previous one by a single communication, but this sequence cannot arise during the execution of P3.
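The "admissible path" half of this definition is easy to state in code: every state satisfies the invariant, and consecutive states differ by exactly one communication on one channel. Path preciseness then asks that every admissible path can actually occur, which requires a model of the process and is not checked here. Since the sequence for r3 is not reproduced, the sketch uses the bounded-buffer invariant (channels a and b, n = 1) as an assumed stand-in; the names one_step, admissible, and buffer_inv are ours.

```python
from typing import Callable, Dict, List

# A state maps each channel to its current channel trace.
StateT = Dict[str, List[int]]


def one_step(s: StateT, t: StateT) -> bool:
    """True if t arises from s by exactly one communication on exactly one channel."""
    if s.keys() != t.keys():
        return False
    changed = [c for c in s if s[c] != t[c]]
    if len(changed) != 1:
        return False
    c = changed[0]
    return len(t[c]) == len(s[c]) + 1 and t[c][:-1] == s[c]


def admissible(path: List[StateT], inv: Callable[[StateT], bool]) -> bool:
    """Every state satisfies inv and consecutive states differ by one communication.
    An invariant is path precise if every admissible path can actually occur."""
    return all(inv(s) for s in path) and all(one_step(s, t) for s, t in zip(path, path[1:]))


def buffer_inv(s: StateT, n: int = 1) -> bool:
    """Bounded-buffer invariant r (input channel a, output channel b), used as an example."""
    a, b = s["a"], s["b"]
    return len(b) <= len(a) and a[: len(b)] == b and len(a) <= len(b) + n


good = [{"a": [], "b": []}, {"a": [5], "b": []}, {"a": [5], "b": [5]}]
jump = [{"a": [], "b": []}, {"a": [5], "b": [5]}]   # two communications at once
print(admissible(good, buffer_inv), admissible(jump, buffer_inv))  # True False
```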
In their proof that from precise invariants of component processes one can derive precise invariants of the parallel composition, it is this stronger definition of preciseness that [9] use, not the weaker one we considered earlier. But if we assume that we must start with specifications of component processes which are (path) precise, we have a serious problem. There is no way to specify such invariants for P3 and P4 in first-order logic; as we saw, r3 and r4 are the strongest such invariants these processes satisfy. We would instead have to move to temporal assertions, which means that much of the promise of the approach of [9], that of retaining the simplicity of the first-order CT-based system, will not be met. Indeed, once we allow such temporal invariants, there would be no need for the axioms proposed by Widom et al, because r3 and r4 could then include all the ordering information that is specifiable using process-trace invariants.²

Before concluding this section we briefly consider the models that Jonsson and Kok [4] prove are 'fully abstract'. Informally, a model is fully abstract if it contains all (but no more) information necessary to distinguish between processes that exhibit differences in their communications with the outside world.³ It is important for the model to contain this information since otherwise, as we saw in the examples, these differences can be translated into effects that show up in the channel-trace specifications of processes constructed out of these processes. Jonsson and Kok [4] establish the full abstractness of three models. The first is the process trace model. The remaining two models are the ones of interest to us. Both of these use channel traces, but the semantics of a process is not just the set of values of its channel traces; rather, it is a set of functions whose values are essentially all the legal sequences of states, where by a state we mean, as before, the set of values of all the channel traces. But given such a legal sequence, we can directly map it to the corresponding process trace, and this is exactly why these models are fully abstract. (The main difference between the two models that [4] consider is in the details of how the functions are specified.) The problem, of course, is that the simplicity that was the original advantage of the CT-based system over the PT-based system is lost, since specifying these legal sequences, or combining the sets of legal sequences of component processes to obtain the set of legal sequences of the parallel composition, is no easier than the corresponding task in the PT-based system.
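The observation that a legal sequence of states determines a process trace can be sketched directly: each step extends exactly one channel trace by one element, and that element, tagged with its channel, is the next entry of the process trace. The state representation (a dict from channel name to trace) and the channel names a and b are assumptions of this sketch, not the function-based semantics of [4].

```python
from typing import Dict, List, Tuple

StateT = Dict[str, List[int]]
Event = Tuple[str, int]


def to_process_trace(path: List[StateT]) -> List[Event]:
    """Map a legal sequence of channel-trace states (one communication per step)
    to the process trace it determines."""
    h: List[Event] = []
    for s, t in zip(path, path[1:]):
        if s.keys() != t.keys():
            raise ValueError("states must mention the same channels")
        changed = [c for c in s if s[c] != t[c]]
        if len(changed) != 1:
            raise ValueError("consecutive states must differ by one communication")
        c = changed[0]
        if len(t[c]) != len(s[c]) + 1 or t[c][:-1] != s[c]:
            raise ValueError("a channel trace may only grow by one element")
        h.append((c, t[c][-1]))
    return h


path = [{"a": [], "b": []}, {"a": [0], "b": []}, {"a": [0, 1], "b": []}, {"a": [0, 1], "b": [0]}]
print(to_process_trace(path))  # [('a', 0), ('a', 1), ('b', 0)]
```

A set of channel-trace states alone, without the sequence, would not support this reconstruction; that sequencing is precisely the information these models add back.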

Discussion
Is it likely that stronger axioms than those of [9] can be found with which we can achieve completeness in a CT-based system while using only first-order assertions for the invariants of processes? The answer, as indicated by our [P3 ∥ P4] and [P3′ ∥ P4′], is almost certainly no. The problem is that even a 'small amount of missing information' about the ordering between communications on separate channels can, by composing the given process with an appropriately designed process, be made to manifest itself as an influence on the actual values that are output on specific channels. In fact, [4] (in an appendix to their paper) show how to construct such a process quite simply. If we could recover this information using general axioms, then we would also be able to apply these axioms to a differently designed example where the missing information ought not to be derived (because it is not valid), and we would have a serious problem.

³ It is worth noting here that the term 'fully abstract' is perhaps a bit confusing. The problem with the channel trace models and some of the others analyzed by [4] is not that they are not abstract enough but that they are too abstract; the problem is not that they contain inappropriate information (such as, for instance, the details of the internal workings of a process) but that they do not contain enough information about the external behavior of the processes in question. In order for a model to be fully abstract according to the definition of [4], the model is required not only to not contain too much information (such as about the internals of the process) but also to not contain too little information; it is this latter aspect that we are concerned with in this paper. Perhaps something like 'sufficiently expressive' rather than 'fully abstract' might be a better term for this.

But there may be another possible approach. The problem with channel traces is that the information they contain is not sufficient to allow us to discriminate between P3 and P3′, and similarly between P4 and P4′. But what if we added the required information to the channel traces? This would be like introducing appropriate auxiliary variables, together with assignment statements that update them at appropriate points so that they contain the required information. Similarly, we may be able to introduce additional communication commands to include additional information in the individual elements of the (channel) traces. For instance, in this particular example, we could record, immediately following a communication on one of P3's internal channels, whether or not the communication on the other internal channel has taken place, and specify how the value later communicated on P3's external channel depends on that earlier information. We might then be able to combine these pieces of information from P3 and P4 to show that the values communicated by [P3 ∥ P4] on its two external channels are equal to each other. This is obviously very preliminary and many details remain to be worked out, but it looks promising. One problem we must deal with is ensuring that the additional communication commands we introduce do not influence the behavior of the processes; in particular, we certainly do not want to introduce deadlocks into the
system! One easy way of ensuring this would be to allow such auxiliary communication commands only to be 'piggy-backed' onto existing communication commands. We plan to investigate this possibility and report on the results in a future paper.
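The piggy-backing idea can be illustrated on histories, under assumptions of our own: each element on a (hypothetical) internal channel c3 or c4 carries, in addition to its value, a flag saying whether a communication on the other channel has already occurred. The flag representation and the names augmented_traces and watch are illustrative only; the sketch works on recorded histories rather than CSP processes and says nothing about how to guarantee deadlock freedom.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]


def project(h: List[Event], channel: str) -> List[int]:
    return [v for (c, v) in h if c == channel]


def augmented_traces(h: List[Event], watch: Dict[str, str]) -> Dict[str, List[Tuple[int, bool]]]:
    """Hypothetical piggy-backing: each element on channel c also carries a flag
    saying whether a communication on watch[c] has already taken place."""
    happened = set()
    out: Dict[str, List[Tuple[int, bool]]] = {c: [] for c in watch}
    for c, v in h:
        if c in out:
            out[c].append((v, watch[c] in happened))
        happened.add(c)
    return out


watch = {"c3": "c4", "c4": "c3"}   # hypothetical internal channels
h1: List[Event] = [("c3", 0), ("c4", 0)]
h2: List[Event] = [("c4", 0), ("c3", 0)]

# Plain channel traces coincide ...
print(all(project(h1, c) == project(h2, c) for c in watch))   # True
# ... but the augmented ones record the order.
print(augmented_traces(h1, watch) == augmented_traces(h2, watch))  # False
```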