On Continuous Models of Computation: Towards Computing the Distance Between (Logic) Programs

We present a report on work in progress on certain aspects of a programme of research concerned with building formal, mathematical models both for aspects of the computational process and for features of programming languages. In this paper, considering work of Kozen showing that complete normed vector spaces (Banach spaces) and bounded linear operators provide a framework for the semantics of deterministic and probabilistic programs, we include logic programs within this framework. We thereby make it a framework in which it is possible to handle the semantics of all three types of program. Using these ideas, we advance a programme of research proposed by M. Bukatin and J.S. Scott concerned with defining and computing meaningful notions of metrics and generalized metrics measuring the distance between two programs, the terms metrics and generalized metrics being used here in the precise sense in which they are employed in mathematics. The long-term objective of this work is to use such metrics as tools in measuring correctness of programs.


INTRODUCTION
Roughly speaking, the term software metric as used in software engineering refers to some property of a piece of software, or of its specification, which can be measured. Such software metrics might include the number of lines of source code, the number of operators, the number of executable statements, the cyclomatic complexity (McCabe), etc. They can be thought of as giving a rough measure of how far a program is from some ideal or, via similarity graphs, how far apart two programs are. On the other hand, it is common practice in many branches of mathematics and related disciplines to introduce metrics, pseudometrics or generalized metrics, in the precise mathematical sense defined in Section 2, in order to measure the distance between any two members of a given class of objects under investigation. The intuition behind this is that two objects which are close together, that is, a small distance apart, should share many properties. Indeed, one might expect that two objects which are zero distance apart should always be identical and, conversely, that the distance from an object to itself should always be zero. Whether or not these expectations actually hold depends, of course, on the precise definition one adopts of "metric" and its generalizations, and we discuss this issue in Section 3.1. In the context of computing science, the question of measuring the distance between two programs using metrics or generalized metrics has been addressed in [2, 4] in terms of domain theory. In fact, the work just cited may, in a certain sense, be viewed as an abstract formulation of the software development process. Indeed, granted for a moment that the goal of producing completely correct software is not feasible, it is argued in [2] that the provision of suitable optimal methods which produce close approximations to the ideal is an alternative to the absolute goal of formal methods. It is further argued in [2] that such methods depend on the powerful tools and methods of so-called
"continuous mathematics", and therefore that these tools should be made available in the context of program development environments (including that of domain theory).
Of course, the mathematical tools traditionally associated with computer science are logic and discrete mathematics, the latter including set theory, abstract algebra, combinatorics, graph theory, order theory, and the like, and these topics naturally reflect the discrete nature of computer science. On the other hand, there is a quite well-established usage of the methods of continuous mathematics, by which we mean topology, mathematical analysis, linear analysis etc., in dealing with programming languages involving uncertainty and probabilistic elements in various ways, and in handling concurrency. For example, in the fundamental paper [6], Kozen defines a semantics for a simple programming language allowing assignment statements involving a random number generator. (Deterministic semantics is of course a special case of this, obtained essentially by eliminating the random assignment.) In this semantics, each program denotes a bounded linear operator on a Banach lattice of measures, and we will give some details of this construction in Section 3.2. In [7] and its references, the reader will find some similar examples from the area of concurrency. Thus, the tools employed in these papers are sophisticated, and include measure theory and functional analysis, and also the methods of linear algebra and the theory of operator algebras. Of considerable interest, however, is the fact that Kozen shows that Scott domains (and fixed-point semantics) can be embedded in his Banach-lattice framework and hence in the conventional structures found in mathematical analysis. Consistent with the theme of the previous paragraph are a number of applications of mathematical analysis to the theory of logic programming semantics made by various writers, including the first-named author of this paper and his co-authors. In the main, such applications result from viewing certain semantic operators arising from logic programs as rather general dynamical systems, see [13] for example. (We support the
view proposed by Prakash Panangaden in [11] that "... Indeed one can say that the semantics of programming languages is a branch of dynamical systems and should be studied as such.") Normally, these operators T are studied by means of order theory, and one is interested especially in their fixed points, if any exist. Our present point of view is different: it is to "linearize" such operators T by representing them as bounded linear operators on Banach spaces, and hence to determine their properties by studying their linearizations from the viewpoint of linear analysis, in a manner akin to that found in [6]. When this much is done, one then sees, by virtue of Kozen's embedding mentioned above, that Banach spaces and linear operators defined on them provide a common framework in which to define the denotational semantics of imperative programs, of probabilistic programs and of logic programs, and it is the first of our two main objectives in this paper to draw attention to this observation. Moreover, the framework of Banach spaces and linear operators is highly conducive to the approximation methods proposed in [2], indeed more so in many ways than the framework of domain theory. In particular, once programs denote bounded linear operators, it is possible to consider the distance between two programs determined by the operator norm of the difference of the corresponding operators. This distance determines a pseudometric in general, but by a standard process one can pass to an associated metric. It is the second main objective of this paper to advance the programme of research proposed in [2] by reporting on initial investigations to found a theory of distance functions defined on pairs of programs in this way, although we focus here mainly on logic programs. Nevertheless, there are several issues which immediately arise whichever type of program one wishes to consider, as follows. The first is the question of how well the linearizations reflect properties of the programs in
question. The second is the question of how meaningful the distance between two programs is, and in particular how meaningful it is, in terms of program development, when two programs are close together. The third is the question of actually computing the distance itself between two programs. These issues will be partially addressed here. As far as the second point is concerned, the linear operators in question are all related to the semantics of programs rather than to their syntax. In the case of logic programs, these operators are determined by semantic operators, as already noted. Thus, we note here straight away that two definite logic programs which are distance zero apart have the same (fixed-point) semantics and hence compute the same things, which is what one would expect, and this fact can be viewed as positive evidence that the methods proposed here have some significance. The overall structure of the paper is as follows. First, in Section 2, we collect together the necessary background we need in domain theory and analysis. Then, in Section 3, we discuss the properties one would expect of functions measuring the distance between pairs of elements of various classes of programs, commencing with a brief summary of the results of [2, 4], moving then to a brief discussion of the results of [6] before commencing, in the latter part of Section 3, the presentation of our own results in the context of logic programs. This development is continued in Sections 4 and 5. In particular, we show in Section 5 that two definite programs which are distance zero apart are subsumption equivalent and hence compute the same things. We briefly discuss the issue of how well our linear operators represent programs, and close with a simple, but interesting, example showing that the metrics we define reflect properties of programs. Finally, in Section 6, we list our conclusions. Because of the technicalities involved and lack of space, we do not include details of all of the constructions and proofs.

Acknowledgement. We thank two anonymous referees for their comments, which helped sharpen the focus and presentation of the paper.

BACKGROUND MATERIAL
In this section, we collect together the basic facts we need in the sequel concerning semantics, domain theory, analysis etc., with the intention of making the paper as self-contained as possible, but with no pretence at completeness. We begin with the following definition, which formalizes the various notions of distance function that one needs to consider.

Definition 2.1 Let X be a set and let d : X × X → R⁺ be a mapping, where R⁺ denotes the set of non-negative real numbers.
1. We call d a metric if it satisfies the following axioms: (a) For all x, y ∈ X, x = y if and only if d(x, y) = 0. (b) For all x, y ∈ X, d(x, y) = d(y, x). (c) For all x, y, z ∈ X, d(x, y) ≤ d(x, z) + d(z, y).

2. We call d a pseudometric if it satisfies axioms (b) and (c) for a metric, and also the following axiom: (a′) For all x ∈ X, d(x, x) = 0.
3. We call d a partial metric if it satisfies the following axioms: (a) For all x, y ∈ X, x = y if and only if d(x, x) = d(x, y) = d(y, y). (b) For all x, y ∈ X, d(x, x) ≤ d(x, y). (c) For all x, y ∈ X, d(x, y) = d(y, x). (d) For all x, y, z ∈ X, d(x, y) ≤ d(x, z) + d(z, y) − d(z, z).
Thus, a pseudometric fails to be a metric just to the extent that d(x, y) = 0 does not imply that x = y. In general, if d is a pseudometric defined on a set X, it is a standard procedure, see [14], to pass to the quotient set X/∼ determined by the equivalence relation x ∼ y if and only if d(x, y) = 0; the function d then induces a metric on X/∼. It should be noted that a partial metric d can satisfy d(x, x) ≠ 0. This property is slightly surprising, but in fact partial metrics were introduced by S.G. Matthews, see [10], in the context of Kahn's dataflow model, where they naturally occur, and they have found application in a number of other places in computing, one of which we will discuss below in Section 3.1. It can be argued that d(x, x) ≠ 0 simply means that x is only partially defined.
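The non-zero self-distance can be seen concretely in Matthews' standard example of a partial metric on the non-negative reals, p(x, y) = max(x, y). The following sketch is our own illustration, not taken from the paper; it checks the axioms listed above on a few sample points.

```python
from itertools import product

def p(x, y):
    # Matthews' standard partial metric on the non-negative reals:
    # the self-distance p(x, x) = x is non-zero unless x = 0.
    return max(x, y)

pts = [0.0, 0.5, 1.0, 2.0, 3.5]

for x, y in product(pts, repeat=2):
    # Axiom (a): x = y iff p(x,x) = p(x,y) = p(y,y).
    assert (x == y) == (p(x, x) == p(x, y) == p(y, y))
    # Axiom (b): small self-distances.
    assert p(x, x) <= p(x, y)
    # Axiom (c): symmetry.
    assert p(x, y) == p(y, x)

for x, y, z in product(pts, repeat=3):
    # Axiom (d): the modified triangle inequality.
    assert p(x, y) <= p(x, z) + p(z, y) - p(z, z)

print(p(2.0, 2.0))  # self-distance 2.0, not 0
```

Here the point x = 2.0 has self-distance 2.0, illustrating the reading of non-zero self-distance as partial definedness.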
As far as semantics is concerned, we will not be concerned here with either operational or axiomatic semantics of programming languages, but only with denotational semantics. Thus, by the term semantics we mean denotational semantics, and therefore we are concerned with mappings from a syntactic space of programs (or program parse trees) to a space of meanings. This theory divides into two parts: one based on order and domain theory, the other based on more conventional mathematics, as we shall see.
The basic ingredients in order-based semantics are as follows. By the term domain we understand a directed complete partial order (D, ⊑) equipped with the Scott topology, see [1]. Thus, (D, ⊑) is a partially ordered set with bottom element, ⊥, in which each directed subset M has a supremum, ⊔M, in D. Furthermore, a subset U of D is, by definition, Scott open if it is upwards closed, that is, whenever x ∈ U and x ⊑ y we have y ∈ U, and if, for any directed set M with ⊔M ∈ U, we have M ∩ U ≠ ∅. If (D, ⊑) and (E, ⊑) are domains and f : D → E is a function, then f is called Scott continuous if it is continuous in the Scott topologies on D and E. In fact, this is equivalent to requiring that f is monotonic (whenever x ⊑ y we have f(x) ⊑ f(y)) and that, for any directed set M, we have f(⊔M) = ⊔f(M). The usual thinking in semantics based on order is that all data types should be represented by domains, and all computable functions should be represented by Scott continuous functions between domains.

We assume that the reader has a basic familiarity with vector spaces. Most of the time we will be working over the real field of scalars, but most of what we say applies equally well in the case of the complex field also. Let E be a vector space. We remind the reader that a norm on E is a mapping ‖·‖ : E → R⁺ satisfying the properties: (1) ‖x‖ = 0 if and only if x = 0, (2) ‖ax‖ = |a|‖x‖ for all scalars a, and (3) ‖x + y‖ ≤ ‖x‖ + ‖y‖. It is an important fact that any norm on E induces a metric d on E, where d(x, y) = ‖x − y‖. As usual, we call a normed vector space complete, or a Banach space, if it is complete in this metric. It will also be convenient to record some elementary facts concerning linear operators, or linear transformations, defined on Banach spaces. Thus, let (E₁, ‖·‖₁) and (E₂, ‖·‖₂) be (real or complex) Banach spaces, and let L : E₁ → E₂ be a linear mapping. Then L is said to be bounded if there is a real number K ≥ 0 with the property that ‖L(x)‖₂ ≤ K‖x‖₁ for all x ∈ E₁. When L is bounded, we define the norm ‖L‖ of L by ‖L‖ = sup{‖L(x)‖₂ ; ‖x‖₁ ≤ 1}. Of course, the value of ‖L‖ depends on the choice of the norms on E₁ and E₂, and we indicate this fact when necessary by writing ‖L‖₁,₂. Furthermore, we will on occasion use, without further mention, the following well-known facts concerning the norm of an operator L.
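The dependence of ‖L‖ on the chosen norms can be illustrated with a matrix operator on R². With the ℓ¹ norm on both sides the operator norm is the maximum absolute column sum, while with the sup norm it is the maximum absolute row sum; these are standard matrix-norm facts, and the matrix below is a hypothetical example of ours.

```python
# A linear operator on R^2 given by a matrix; its operator norm
# depends on which norms are placed on domain and codomain.
A = [[1.0, 2.0],
     [3.0, 4.0]]

def norm_1(A):
    # Operator norm when both spaces carry the l1 norm: the
    # maximum absolute column sum of the matrix.
    return max(sum(abs(A[i][j]) for i in range(len(A)))
               for j in range(len(A[0])))

def norm_inf(A):
    # Operator norm when both spaces carry the sup norm: the
    # maximum absolute row sum of the matrix.
    return max(sum(abs(v) for v in row) for row in A)

print(norm_1(A), norm_inf(A))  # 6.0 7.0: different values for the same L
```

The same linear mapping thus receives two different norms, 6.0 and 7.0, depending on the norms placed on R².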

Proposition 2.2 Let L : (E₁, ‖·‖₁) → (E₂, ‖·‖₂) be a bounded linear operator. Then the following hold. (1) ‖L(x)‖₂ ≤ ‖L‖ ‖x‖₁ for all x ∈ E₁. (2) ‖L‖ = sup{‖L(x)‖₂ ; ‖x‖₁ = 1} = sup{‖L(x)‖₂/‖x‖₁ ; x ≠ 0}.

DISTANCE FUNCTIONS ON CLASSES OF PROGRAMS
It will be helpful to first summarize the results of [4] and [6].

Deterministic Programs
In [4], Bukatin and Scott consider the conventional semantic framework of a domain P of (parse trees of) programs, a domain A of meanings, and a Scott continuous semantic function ⟦·⟧ : P → A determined by the usual methods of denotational semantics. Furthermore, it is assumed that we have a domain D representing distances (usually, the interval domain) and a Scott continuous function ρ : A × A → D, called a generalized distance function. Under certain natural topological conditions, ρ will reflect computational properties of A, in which case ρ(⟦p₁⟧, ⟦p₂⟧) can be thought of as yielding a computationally meaningful distance between programs p₁ and p₂, namely, the distance between their meanings. Finally, assume that there is an element 0 of D representing the ordinary numerical zero. One of the main conclusions of [4] is then that ρ(a, a) must be non-zero for some values of a. Thus, it emerges that partial metrics and the associated relaxed metrics defined in [4] are the most appropriate distance functions to consider in the context of domain theory if one wants Scott continuity of ρ.

Probabilistic Programs
In [6], Kozen considers while programs which have simple assignment, composition, conditional tests, while loops, and calls x := random to a random number generator. He gives two semantics for such programs: one operational in flavour and the other denotational in flavour, and it is the second which concerns us. Suppose that the program variables are x₁, …, x_n. Let (X, M) be a measurable space and let B = B(X^n, M^(n)) be the set of measures on the cartesian product measurable space (X^n, M^(n)).
The Sixth International Workshop in Formal Methods (IWFM'03)

Thus, B consists of all linear combinations of all possible joint distributions of the program variables x₁, …, x_n, where addition and scalar multiplication are defined pointwise by (µ + ν)(A) = µ(A) + ν(A) and (aµ)(A) = aµ(A). Let P denote the cone of positive measures in B, and let ‖µ‖ denote the total variation norm of µ given by ‖µ‖ = |µ|(X^n), where |µ| denotes the total variation measure determined by µ, namely, the sum of the positive and negative parts of µ in the Hahn–Jordan decomposition of µ. Then (B, P, ‖·‖) is a conditionally complete Banach lattice. Thus, (B, ‖·‖) is a Banach space; P orders B by µ ≤ ν if and only if ν − µ ∈ P; each pair µ, ν has a least upper bound or join µ ∨ ν, and each bounded subset of B has a least upper bound; and, finally, we have that ‖ |µ| ‖ = ‖µ‖ and, whenever 0 ≤ µ ≤ ν, we have ‖µ‖ ≤ ‖ν‖. The measures of interest here are the probability measures, namely, the positive measures with norm 1, and the subprobability measures, namely, the positive measures with norm at most 1. Kozen shows that any program in the language in question maps, in a natural way, a probability distribution to a subprobability distribution, and that this mapping extends uniquely to a bounded linear operator B → B; we do not, however, have space to give more details of this here, other than to remark that the semantics of while loops is given in terms of least fixed points, as usual. Thus, ultimately, each program S denotes a bounded linear operator, and hence may be interpreted as an element of the Banach space B′ of all bounded linear operators B → B. Furthermore, B′ is itself ordered by requiring S ≤ T if and only if S(µ) ≤ T(µ) for all µ ∈ P.
Note that, by eliminating the random assignment and restricting input distributions to point masses, one obtains deterministic semantics as a special case of probabilistic semantics. Using these facts, Kozen then further shows that one may embed the domain Pfn(ω → ω) of partial functions, endowed with the usual ordering of graph inclusion, into B′ for a suitable choice of B, where ω denotes the set of natural numbers. Under this embedding, the totally undefined function (the bottom element) is mapped to 0, and ⊆ in Pfn(ω → ω) is mapped to ≤ in B′, as one would require. Thus, in a rather general sense, one sees that Banach spaces and bounded linear operators form a semantic framework for both probabilistic and deterministic programming languages. Furthermore, once each program denotes a bounded linear operator, one has a natural distance function d defined on classes of programs by d(S, T) = ‖S − T‖, that is, the operator norm of the difference of S and T, where we are using the same symbols S, T etc. to denote both programs and their denotations which, in this case, are bounded linear operators. Our remaining task is to carry out this sort of construction in the context of logic programs, and we do this next. Once this is done, we will have fulfilled our first objective of showing that Banach spaces and linear operators form a semantic framework for deterministic, probabilistic and logic programs. Then, by considering the distance function d just defined, we will have candidate distance functions to fulfill our second objective for each type of program we are discussing.
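On a finite state space this construction can be sketched very concretely: measures become vectors, programs become matrices acting on them, and d(S, T) = ‖S − T‖ becomes a computable matrix norm. The two "programs" below are hypothetical toy examples of ours, not drawn from [6]: from any state, S sets a two-valued variable to its first value with probability p, and T does so with probability q.

```python
# Two toy "probabilistic programs" on a two-element state space,
# represented by their Markov kernels.  A measure is a row vector
# mu; a program acts by mu -> mu K; and the operator norm induced
# by the total variation norm ||mu|| = sum(|mu_i|) is the maximum
# absolute row sum of K.
p, q = 0.7, 0.5
S = [[p, 1 - p], [p, 1 - p]]
T = [[q, 1 - q], [q, 1 - q]]

def op_norm_tv(K):
    # Operator norm of mu -> mu K for the l1 norm on measures.
    return max(sum(abs(v) for v in row) for row in K)

D = [[S[i][j] - T[i][j] for j in range(2)] for i in range(2)]
d = op_norm_tv(D)
print(d)  # 2|p - q|, up to floating point
```

Here d(S, T) = 2|p − q|, so the distance shrinks to 0 exactly as the two biases agree, which matches the intuition that nearby operators should denote similar programs.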

Logic Programs
There are various possible ways of associating Banach spaces and bounded linear operators with logic programs, but the one we pursue here uses the notion of composition operator defined below; other ways will be considered elsewhere. One obvious requirement in this process, in any computing paradigm, is that if the operators corresponding to programs P₁ and P₂ are equal, then P₁ and P₂ should be related in some sense. In other words, the representation in terms of operators should be faithful relative to some notion of equivalence on programs. We shall see later on that this is so in the case of logic programs. Let X be a set, and let F denote either the real or complex scalar field. We let F(X) denote the set of all functions f : X → F defined on X, equipped with the usual vector space and algebra operations defined pointwise as follows: (f₁ + f₂)(x) = f₁(x) + f₂(x) and (f₁f₂)(x) = f₁(x)f₂(x) for all x ∈ X, and (αf)(x) = αf(x) for all x ∈ X and all α ∈ F, where f₁, f₂ and f are elements of F(X).
Definition 3.1 Let T : X → Y be a mapping. We define the composition operator F(T) : F(Y) → F(X) by setting F(T)(g) = g ∘ T for each g ∈ F(Y). Each operator F(T) defined in this way is a homomorphism of algebras over F, and F determines a contravariant functor from the category of sets and mappings to the category of algebras and algebra homomorphisms over F.
In the sequel, we will be concerned only with the case in which X = Y, that is, with functions T mapping X to itself and therefore with the corresponding composition operators F(T) : F(X) → F(X). Nevertheless, it is useful to give the definition in general, since it is this that makes clear the contravariance of F.
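For a finite set X, the composition operator F(T) is just a 0/1 matrix, and the contravariance of F can be checked directly: F(S ∘ T) = F(T)F(S). The maps S and T below are hypothetical examples of ours.

```python
# F(T) on a finite set X: identify a function g : X -> R with the
# vector (g(x) for x in X); then F(T)(g) = g . T is given by the
# 0/1 matrix M with M[i][j] = 1 iff T(X[i]) = X[j].
X = [0, 1, 2]

def comp_matrix(T):
    return [[1 if T(x) == y else 0 for y in X] for x in X]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

T = lambda x: (x + 1) % 3    # a cyclic shift on X
S = lambda x: min(x + 1, 2)  # a non-injective map on X
ST = lambda x: S(T(x))

# Contravariance: F(S . T) = F(T) F(S), with the factors reversed.
assert comp_matrix(ST) == matmul(comp_matrix(T), comp_matrix(S))
print(comp_matrix(ST))
```

The reversal of the factors in the product is exactly the contravariance noted in Definition 3.1.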
In practice, the sets X we consider will be sets of partial functions or sets of valuations etc., and may be treated as sets, as topological spaces (in the Scott topology, the Lawson topology etc.) or as measure spaces. Of particular importance is the case when X is a space of valuations equipped with the Cantor topology, and is thus a compact metric space. Furthermore, the scalar-valued functions we consider defined on X will either be bounded or essentially bounded, in which case F(X) will be equipped with the uniform norm, or will be integrable functions, and then F(X) will be equipped with one of the L^p norms. It will therefore be convenient next to summarize the properties we need of these two classes of normed spaces.

Bounded Functions and the Uniform Norm
Noting the categorical nature of F, and in order to distinguish between the cases, it will be helpful to use the notation B(X) for F(X) and B(T) for F(T) in the case of bounded functions. Later on in this section, the notation L(X), respectively L(T), will be established in the case of integrable functions and the L^p norms. Thus, we introduce the following notation.
Notation 3.3 Let B(X) denote the set of all bounded real or complex valued functions defined on X. In both cases, we endow B(X) with the uniform or supremum norm, ‖·‖∞, defined by ‖f‖∞ = sup{|f(x)| ; x ∈ X}.

Proposition 3.4 For any set X, the pair (B(X), ‖·‖∞) is a Banach space.

Integrable Functions and the L p Norms
Suppose that (X, B, µ) is a measure space. For a real number p satisfying 1 ≤ p < ∞, we define L^p(X, B, µ) to be the set of all measurable real valued functions f defined on X for which ∫_X |f|^p dµ is finite. If (X, B, µ) is understood, we write L^p(X, µ), L^p(X) or just L^p for L^p(X, B, µ). For an element f ∈ L^p(X), we define ‖f‖_p by ‖f‖_p = (∫_X |f|^p dµ)^(1/p). In case p = ∞, we define L^∞(X, B, µ) to be the set of all essentially bounded real valued measurable functions defined on X. Once again, we write L^∞(X) or L^∞ instead of L^∞(X, B, µ) when (X, B, µ) is understood. In this case, we define the norm by ‖f‖∞ = ess sup |f|, where ess sup denotes the essential supremum of f and is defined in the usual way, namely, ess sup(f) = inf{M ; µ{x ∈ X ; f(x) > M} = 0}. Note that if f is actually bounded, then the supremum and the essential supremum coincide, and so there is no conflict of notation here. Note also that, for all values of p, 1 ≤ p ≤ ∞, we follow the common practice of identifying two functions in L^p(X) which are equal µ-almost everywhere. Now suppose that (X, B, µ) is a measure space, and that T : X → X is a mapping on X which is measurable with respect to B, that is, T⁻¹(A) ∈ B for each A ∈ B. We want to consider the possibility of defining the composition operator L(T) : L^p(X) → L^p(X) by setting L(T)(g) = g ∘ T, as before, but there are a number of technical obstacles to overcome. First, although g ∘ T will be measurable for each function g ∈ L^p(X), it need not be the case that |g ∘ T|^p is µ-integrable for a given g ∈ L^p(X), that is, ∫_X |g ∘ T|^p dµ need not be finite. Second, even if |g ∘ T|^p is µ-integrable for each g ∈ L^p(X), it need not be the case that L(T) is a bounded operator.
There are some important situations, which we consider, when both of the obstacles just mentioned can be overcome. The first of these, which we discuss in a number of simple examples later, is to take the measure consisting of unit masses placed at each point of a finite set. In this case, integration just amounts to summation, and the operator norms of interest coincide with well-known matrix norms. The second important case is when X is a compact metric space and T is continuous. In this case, by the classical theory of Krylov and Bogoliubov, T has invariant Borel probability measures. Thus, there are measures µ defined on the Borel sets of X such that µ(X) = 1 and µ(T⁻¹(A)) = µ(A) for each Borel set A ∈ B. Indeed, this latter condition is equivalent to the condition that ∫_X (g ∘ T) dµ = ∫_X g dµ for each measurable function g. In fact, such invariant measures µ are the limit points of the sequences of Cesàro sums (1/n) Σ_{i=0}^{n−1} (T^i)_* ν in the weak topology on P(X) determined by the functions ν ↦ ∫_X f dν, where f is a continuous real valued function. Here, P(X) denotes the space of Borel probability measures on X, and T_* ν denotes the Borel probability measure on X defined by (T_* ν)(A) = ν(T⁻¹(A)) for each A ∈ B, where ν ∈ P(X). Finally, if µ is an invariant measure for T, then, because of the identity ∫_X |g ∘ T|^p dµ = ∫_X |g|^p dµ, the operator L(T) : L^p(X) → L^p(X) is well-defined and bounded; indeed, it is an isometry.
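The Cesàro construction can be illustrated on a finite set, where pushforwards are easy to compute; the map T below is a hypothetical example of ours, and the finite setting is only a stand-in for the compact-space situation covered by the Krylov–Bogoliubov theorem.

```python
# Cesaro averages of pushforwards for a map T on a finite set:
# starting from a point mass, the averages converge to a
# T-invariant probability measure supported on the 2-3 cycle.
X = list(range(5))
T = {0: 1, 1: 2, 2: 3, 3: 2, 4: 2}  # every orbit enters the 2-3 cycle

def pushforward(nu):
    # (T_* nu)(y) = nu(T^{-1}{y}).
    out = {x: 0.0 for x in X}
    for x, m in nu.items():
        out[T[x]] += m
    return out

nu = {x: (1.0 if x == 0 else 0.0) for x in X}  # point mass at 0
total = {x: 0.0 for x in X}
n = 10000
for _ in range(n):
    for x in X:
        total[x] += nu[x]
    nu = pushforward(nu)
mu = {x: total[x] / n for x in X}

# mu is (approximately) invariant: mu(T^{-1}{y}) = mu({y}).
for y in X:
    pre = sum(mu[x] for x in X if T[x] == y)
    assert abs(pre - mu[y]) < 1e-3
print(mu)  # mass roughly 1/2 each on states 2 and 3
```

The limiting measure places mass 1/2 on each point of the cycle, and the transient states 0, 1 and 4 carry only the O(1/n) mass contributed by the early iterates.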

METRICS DETERMINED BY COMPOSITION OPERATORS
Let T₁, T₂ : X → X be mappings on X. As already suggested, we want to use the expressions ‖B(T₁) − B(T₂)‖∞ and ‖L(T₁) − L(T₂)‖_p to measure the distance between T₁ and T₂, and the following result confirms that doing so yields metrics and pseudometrics.

Proposition 4.1 (1) Let X be a set and let T₁, T₂ : X → X be mappings. We define d(T₁, T₂) to be ‖B(T₁) − B(T₂)‖∞. Then d is a metric.

(2) Let X be a compact metric space, let T₁, T₂ : X → X be continuous, let µ be a Borel measure on X, and suppose that L(T₁), L(T₂) : L^p(X) → L^p(X) are bounded linear operators. We define the distance function d(T₁, T₂) to be ‖L(T₁) − L(T₂)‖_p. Then d is a pseudometric, and is a metric if we identify those functions T₁ and T₂ which are µ-almost everywhere equal.
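When X is finite, the norm in part (1) can be computed by brute force: the supremum over the unit ball of B(X) is attained at {−1, +1}-valued functions, since these are the extreme points of the ball, so it suffices to enumerate them. The maps T₁ and T₂ below are hypothetical examples of ours.

```python
from itertools import product

# Brute-force computation of d(T1, T2) = ||B(T1) - B(T2)||_inf on a
# finite set X, enumerating the {-1, +1}-valued functions g with
# ||g||_inf = 1 at which the supremum is attained.
X = [0, 1, 2]
T1 = {0: 1, 1: 2, 2: 0}
T2 = {0: 1, 1: 0, 2: 0}

def d_inf(T1, T2):
    best = 0.0
    for signs in product([-1.0, 1.0], repeat=len(X)):
        g = dict(zip(X, signs))  # a function on X with sup norm 1
        diff = max(abs(g[T1[x]] - g[T2[x]]) for x in X)
        best = max(best, diff)
    return best

print(d_inf(T1, T2))  # 2.0: the maps differ at x = 1
print(d_inf(T1, T1))  # 0.0
```

Note that in this setting the distance is 2 whenever T₁ ≠ T₂ (choose g to take the value 1 at T₁(x) and −1 at T₂(x) at a point x where they differ), so d here is twice the discrete metric on maps.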

PSEUDOMETRICS AND METRICS FOR LOGIC PROGRAMS

Generalities
Let L be a first-order language. We will suppose that all logic programs P we discuss have underlying language L, although, of course, a given program P need not employ all the symbols of L. We refer the reader to [8] for background concepts and our notation concerning logic programs.
In fact, it will be convenient to treat each logic program P as the set ground(P) of all ground instances over B_P of each clause in P, where B_P denotes the Herbrand base of all ground instances of atoms occurring in P. (This is common practice and we will follow it without further mention.) Each logic program has associated with it a number of semantic operators which capture many of its properties, and the one we mainly work with here is the usual two-valued immediate consequence operator T_P : I_P → I_P, which we define below, where I_P, or more precisely I_L, denotes the set of all two-valued interpretations for L. We endow I_P with the Cantor topology, in which I_P is homeomorphic to the usual Cantor set in R, and hence is a compact metric space, see [12]. It turns out that T_P is sometimes continuous in this topology, but not always, see [12] again, but it is always Borel measurable, see [5].
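When ground(P) is finite, T_P is directly computable: A ∈ T_P(I) precisely when some clause with head A has all its positive body atoms in I and none of its negative ones. The following sketch uses our own encoding of ground clauses as (head, pos, neg) triples, and the program P is a hypothetical example, not one from the paper.

```python
# The immediate consequence operator T_P for a ground normal
# program.  A clause is a triple (head, pos, neg); an
# interpretation I is a frozenset of ground atoms.
def tp(program, I):
    return frozenset(
        head for (head, pos, neg) in program
        if pos <= I and not (neg & I))

# A tiny ground program: q(a) <- ; q(b) <- q(c), not q(a).
P = [("q(a)", frozenset(), frozenset()),
     ("q(b)", frozenset({"q(c)"}), frozenset({"q(a)"}))]

print(tp(P, frozenset()))                  # {q(a)}
print(tp(P, frozenset({"q(c)"})))          # {q(a), q(b)}
print(tp(P, frozenset({"q(a)", "q(c)"})))  # {q(a)}
```

Since I_P is finite here, T_P is a map of a finite set to itself, and the constructions B(T_P) and L(T_P) of the previous sections apply verbatim.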
We are thus in a position to consider the distances d(P₁, P₂) between two logic programs P₁ and P₂ as given by Proposition 4.1, by taking T₁ and T₂ to be T_P₁ and T_P₂ respectively.

Definition 5.1 Let P₁ and P₂ be logic programs with underlying language L. We define the distances d∞(P₁, P₂) and d_p(P₁, P₂) between them by setting d∞(P₁, P₂) = ‖B(T_P₁) − B(T_P₂)‖∞ and d_p(P₁, P₂) = ‖L(T_P₁) − L(T_P₂)‖_p.

The distance functions just defined are not actually metrics, since d(P₁, P₂) = 0 does not necessarily mean that P₁ = P₂. What it does mean is that T_P₁ = T_P₂. Indeed, it is readily checked that d∞ and d_p satisfy all the axioms for a metric other than the property just mentioned, and hence are pseudometrics. We therefore suppose that the procedure for obtaining a metric from a pseudometric discussed in Section 2 has been carried out, and refer loosely to the metrics d∞ and d_p on the set of all programs whose underlying language is L, denoting them collectively just by d.
In fact, for definite programs, the equivalence relation just described coincides with Maher's notion of subsumption equivalence, see [9], since two definite programs P₁ and P₂ are subsumption equivalent if and only if T_P₁ = T_P₂. Hence, for definite programs, the identity d(P₁, P₂) = 0 has a well-established meaning within logic programming, and in fact implies that the two-valued fixed-point semantics for P₁ and P₂ coincide. More precisely, it implies that the least fixed points of T_P₁ and T_P₂ coincide, and hence that the least Herbrand models for P₁ and P₂ coincide; in turn this means that P₁ and P₂ compute the same things. Indeed, we expand on this point a little next to clarify the situation.

Suppose that P is a normal logic program, and let C denote a typical clause of P, written in the form A ← A₁, …, A_k, ¬B₁, …, ¬B_m; we set head(C) = A, pos(C) = {A₁, …, A_k} and neg(C) = {B₁, …, B_m}. In [9], Maher defined the notion of subsumption equivalence for definite programs containing variable symbols (recall that a definite program is one in which neg(C) is empty for each clause C). As far as sets of ground clauses are concerned, his definition amounts to the following. Suppose that C and C′ are two clauses with the same head A, say. We say that C′ subsumes C if pos(C′) ⊆ pos(C). One then says that a program P₁ is subsumed by a program P₂ if each clause of P₁ is subsumed by a clause of P₂, and that P₁ and P₂ are subsumption equivalent if each program subsumes the other. A main result concerning these notions, as already mentioned, is that definite programs P₁ and P₂ are subsumption equivalent if and only if T_P₁ = T_P₂. We want to work more generally with normal logic programs. An obvious extension of the definition of subsumption to this case is as follows. If C and C′ are two clauses over L with the same head, then we say that C′ subsumes C if pos(C′) ⊆ pos(C) and neg(C′) ⊆ neg(C). The remaining definitions are extended in the obvious way, and one has the following result.

Proposition 5.2 If normal programs P₁ and P₂ are subsumption equivalent, then T_P₁ = T_P₂.

Proof (sketch). Suppose that A ∈ T_P₁(I), witnessed by a clause C of P₁ with head(C) = A, pos(C) ⊆ I and neg(C) ∩ I = ∅. Any clause of P₂ subsuming C also witnesses A ∈ T_P₂(I), so that T_P₁(I) ⊆ T_P₂(I). The reverse inclusion is obtained similarly. Moreover, the same argument shows that T_P₁(I) = ∅ if and only if T_P₂(I) = ∅. It follows that T_P₁(I) = T_P₂(I) for all I and hence that T_P₁ = T_P₂, as stated.
The converse of the previous proposition fails, as the following example shows. Take P₁ and P₂ to be normal programs for which the following claims are easily checked: T_P₁ = T_P₂; T_P₁ and T_P₂ are not constant; no clause in either program is redundant, in the sense that removing any clause from either program changes the corresponding immediate consequence operator; no atom in any clause is redundant in the same sense (hence, these programs are in some sense irreducible); the first (and second) clause in P₁ is not subsumed by any clause in P₂; and the first (and second) clause in P₂ is not subsumed by any clause in P₁. Furthermore, there is no obvious modification of the definition of subsumption just using simple containment of the sets pos and neg which overcomes the previous observation. Finally, consider a further program P with T_P = T_P₁ = T_P₂ and with P definite. Again, no clause and no atom in P is redundant. This time, however, P subsumes both P₁ and P₂, but, conversely, neither P₁ nor P₂ subsumes P.
We note in passing the well-known fact that if P is definite, then T_P is monotonic, and even continuous in the order-theoretic sense, but the programs used in the previous example show that the converse of this is false. As already noted, the converse of the previous proposition does hold for definite programs, and for the sake of completeness we prove this fact next in our present terms, that is, without using variable symbols or theorems concerning the addition of constants to L, see [9].
Proposition 5.4 Suppose that P₁ and P₂ are definite programs. If T_P₁ = T_P₂, then P₁ and P₂ are subsumption equivalent.
Proof. Let C be any clause in P₁. Suppose that head(C) = A, and let I be the (possibly empty) set pos(C). Then we immediately have that A ∈ T_P₁(I). Hence, A ∈ T_P₂(I), and therefore there is a clause C′ in P₂ whose head is A and which is such that I ⊨ body(C′). But this latter statement simply says that pos(C′) ⊆ pos(C), and hence C′ subsumes C. By the symmetry of this argument, we obtain that P₁ and P₂ are subsumption equivalent, as required.
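The ground, set-based definition of subsumption above is easy to implement directly. The sketch below uses our own encoding of ground definite clauses as (head, pos) pairs, and the programs are hypothetical examples: the extra clause of P₁ is subsumed, so P₁ and P₂ are subsumption equivalent (and indeed T_P₁ = T_P₂).

```python
# Subsumption for ground definite programs, following the set-based
# definition: C' subsumes C when they share a head and
# pos(C') is a subset of pos(C).  Clauses are pairs (head, pos).
def subsumes(c_prime, c):
    return c_prime[0] == c[0] and c_prime[1] <= c[1]

def subsumed_by(P1, P2):
    # Every clause of P1 is subsumed by some clause of P2.
    return all(any(subsumes(c2, c1) for c2 in P2) for c1 in P1)

def subsumption_equivalent(P1, P2):
    return subsumed_by(P1, P2) and subsumed_by(P2, P1)

# Two toy ground definite programs with the same T_P:
# P1: p <- q ; p <- q, r.     P2: p <- q.
P1 = [("p", frozenset({"q"})), ("p", frozenset({"q", "r"}))]
P2 = [("p", frozenset({"q"}))]
print(subsumption_equivalent(P1, P2))  # True: the extra clause is subsumed
```

By Proposition 5.4, for definite programs this check agrees exactly with the condition d(P₁, P₂) = 0.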

Further Generalities
One of the virtues of introducing the operators B(T ) and L(T ) is that they can be used to study properties of T and hence, ultimately, the underlying programs. This point is quite general, of course, provided the representation of programs in terms of operators is sufficiently faithful. In this context, we have the following result.
Proposition 5.5 Let P 1 and P 2 be normal logic programs. Then
We further illustrate the points just made by showing next how periodic and fixed points of a transformation T : X → X are related, in a simple way, to the corresponding entities for B(T ) and L(T ). This observation then applies to any of the semantic operators T P , Φ P and Ψ P . In fact, we work only with B(T ), since the same facts for L(T ) can be established in the same way. Suppose that T : X → X, and form the operator B(T ) : B(X) → B(X) as usual, where B(T )(g) = g ◦ T . Consider the following calculation, where g ∈ B(X) is arbitrary.
We have B(T ) 2 (g) = B(T )(B(T )(g)) = B(T )(g ◦ T ) = (g ◦ T ) ◦ T = g ◦ T 2 , by associativity of composition of functions. Thus we have, by induction, the identity B(T ) n (g) = g ◦ T n for all n ∈ N and all g ∈ B(X), which shows that we can transform iterates of T into iterates of B(T ), and vice versa. For example, if T is periodic with period n, that is, T n = Id X for some n ∈ N, then it is clear that B(T ) n = Id B(X) , and so B(T ) is periodic with period n. Conversely, if B(T ) is periodic with period n, the same identity shows that T is periodic with period n.
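The identity relating iterates of T and of B(T ) can be checked numerically on a small finite X. The sketch below is our own illustration under the assumption that X is finite, so that functions in B(X) can be represented directly as Python callables.

```python
def B(T):
    """The operator B(T): g -> g∘T on functions g: X -> R."""
    return lambda g: (lambda x: g(T(x)))

X = range(6)
T = lambda x: (x + 1) % 6          # a periodic map on X with period 6
g = lambda x: x * x                # an arbitrary g in B(X)

# Compute B(T)^n(g) by n-fold application of B(T).
n = 4
bng = g
for _ in range(n):
    bng = B(T)(bng)

def Tn(x, n=n):
    """The n-th iterate T^n applied to x."""
    for _ in range(n):
        x = T(x)
    return x

# The identity B(T)^n(g) = g∘T^n, checked pointwise on X.
assert all(bng(x) == g(Tn(x)) for x in X)

# Periodicity transfers: T^6 = Id_X, hence B(T)^6(g) = g for every g.
b6g = g
for _ in range(6):
    b6g = B(T)(b6g)
assert all(b6g(x) == g(x) for x in X)
```

The choice of g is immaterial: the same assertions hold for any function on X, which is exactly what the induction argument shows.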
Next, we relate fixed points of T to fixed points of B(T ), as follows.
Proposition 5.6 Let x ∈ X. Then x is a fixed point of T if and only if we have the identity B(T )(g)(x) = g(x) for all g ∈ B(X).
Proof. If T (x) = x, then immediately we have that B(T )(g)(x) = g(T (x)) = g(x) for all g ∈ B(X). Conversely, if g(T (x)) = g(x) for all g ∈ B(X), then on choosing for g a bounded function which separates T (x) from x, such as the characteristic function of the set {x}, we obtain T (x) = x.
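On a finite X, the converse direction of this proof can be carried out concretely: the characteristic (indicator) functions separate points, so testing the identity against them suffices to recover the fixed points of T. The following sketch, our own illustration with a hypothetical T, does exactly that.

```python
def B(T):
    """The operator B(T): g -> g∘T."""
    return lambda g: (lambda x: g(T(x)))

X = range(5)
T = lambda x: min(x + 1, 4)        # a map on X whose unique fixed point is 4

# The indicator functions of the singletons separate points of X, so x is a
# fixed point of T exactly when B(T)(g)(x) == g(x) for every indicator g.
indicators = [lambda y, i=i: 1 if y == i else 0 for i in X]
fixed = [x for x in X if all(B(T)(g)(x) == g(x) for g in indicators)]
print(fixed)
```

Here the list comprehension recovers precisely the fixed-point set {4} of T, matching Proposition 5.6.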

A Simple Example
We close by considering two simple example programs illustrating the theory. Despite their simplicity, they reveal two interesting points.
To make the examples here genuinely simple, we suppose that no function symbols are present in our language, and therefore that the sets I P are finite, I P = {I 1 , I 2 , . . . , I n }, say, where n depends on the number of constant and predicate symbols present in P . In this case, F(I P ) is naturally equivalent to R n under the identification of f : I P → R with the tuple (f (I 1 ), f (I 2 ), . . . , f (I n )). So, F(T P ) is then an n × n real matrix A = (a ij ) obtained in the usual way as the matrix of a linear operator. We want to consider the L p norms in this case, 1 ≤ p ≤ ∞, and also the supremum norm, ∥ ∥ ∞ . These norms are all equivalent, since R n is finite dimensional, but their actual numerical values are of interest; we want to consider the corresponding operator norms, ∥F(T P )∥ p and ∥F(T P )∥ ∞ , determined by the norms on R n . We consider here just the norm ∥ ∥ 1 , where we take the counting measure on I P . Thus, ∥x∥ 1 is in fact the sum |x 1 | + |x 2 | + · · · + |x n | for x = (x 1 , x 2 , . . . , x n ) ∈ R n . In this case, ∥F(T P )∥ 1 coincides with the maximum column sum norm, max j Σ i |a ij |.
The Sixth International Workshop in Formal Methods (IWFM'03)
Example 5.7 For example, take P 1 and P 2 to be, respectively, the following programs:
P 1 :             P 2 :
q(a) ←           q(a) ←
q(b) ← q(c)      q(b) ← ¬q(c)
The underlying language L for both these programs contains the three constant symbols a, b, c and the unary predicate symbol q. Thus, in both cases the Herbrand base B L is [q(a), q(b), q(c)], which we write as a list since the order matters for our present purposes. Hence, there are eight distinct subsets of B L , which we list as follows (again, the order matters): [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]. Our convention here relates to the representation above of B L as a list; for example, the set denoted by the list [0, 1, 1] is the set {q(b), q(c)} (q(a) is not contained in the set because 0 is the first element of the list, and q(b) and q(c) are elements because 1 is in the second and third positions of the list [0, 1, 1], and so on). Thus, F(I P ) is equivalent to R 8 . Indeed, a function f ∈ F(I P ) maps each
of the sets in I L to a number in R, and so each f can be represented as an 8-tuple of real numbers (x 1 , x 2 , . . . , x 8 ), where f ([0, 0, 0]) = x 1 , f ([0, 0, 1]) = x 2 , etc. Conversely, an 8-tuple of real numbers determines an element of F(I P ) in the obvious way. Bearing in mind the definition of T P1 and T P2 , we now consider the effect of F(T P1 ) and F(T P2 ) on the standard basis {e i ; i = 1, . . . , 8} for R 8 , where e i is the vector consisting of zeros in all places but the ith, which contains 1; we regard each e i as an element of F(I P ), namely the function taking the value 1 on the ith interpretation in our list and the value 0 on the others. Then the jth column of the required matrices is obtained from the coordinate vector of F(T P k )(e j ), k = 1, 2, relative to the standard basis. For example, F(T P1 )(e 5 )([0, 0, 0]) = e 5 (T P1 ([0, 0, 0])) = e 5 ([1, 0, 0]) = 1, etc. Thus, the fifth column of the matrix representing T P1 is the fifth column of the first matrix below, and we proceed similarly to find the other columns and thereby obtain the following matrices.
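The matrices of Example 5.7, and the corresponding operator 1-norms, can be computed mechanically. The sketch below is our own reconstruction of that computation: it encodes the two programs with a hypothetical (head, pos, neg) clause representation, lists the eight interpretations in the binary-counting order used above, and builds the matrix whose (i, j) entry is 1 exactly when T P maps the ith interpretation to the jth.

```python
import numpy as np
from itertools import product

atoms = ["q(a)", "q(b)", "q(c)"]
# The eight interpretations in binary-counting order: [0,0,0], [0,0,1], ...
interps = [frozenset(a for a, bit in zip(atoms, bits) if bit)
           for bits in product([0, 1], repeat=3)]

def t_p(program, I):
    """Immediate consequence operator for a ground program."""
    return frozenset(h for (h, pos, neg) in program
                     if pos <= I and not (neg & I))

p1 = [("q(a)", frozenset(), frozenset()),
      ("q(b)", frozenset({"q(c)"}), frozenset())]   # q(a) ← ; q(b) ← q(c)
p2 = [("q(a)", frozenset(), frozenset()),
      ("q(b)", frozenset(), frozenset({"q(c)"}))]   # q(a) ← ; q(b) ← ¬q(c)

def matrix(program):
    """Matrix of F(T_P): entry (i, j) is 1 iff T_P maps interps[i] to interps[j]."""
    n = len(interps)
    A = np.zeros((n, n), dtype=int)
    for i, I in enumerate(interps):
        A[i, interps.index(t_p(program, I))] = 1
    return A

A1, A2 = matrix(p1), matrix(p2)
print(np.abs(A1).sum(axis=0).max())        # ∥F(T_P1)∥_1: maximum column sum
print(np.abs(A2).sum(axis=0).max())        # ∥F(T_P2)∥_1
print(np.abs(A1 - A2).sum(axis=0).max())   # ∥F(T_P1) − F(T_P2)∥_1
```

Under this encoding both operator norms come out as 4 (four interpretations map to each of the two reachable interpretations), while the 1-norm of the difference is 8, since the two programs send every interpretation without q(c), respectively with q(c), to opposite targets.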

CONCLUSIONS
We have made two contributions. In the first, we have shown that logic programming semantics can be studied within the framework of Banach spaces and linear operators. Since Kozen [6] showed how to do this for deterministic and probabilistic programs, see also [7], it can now be seen that this framework is rather general, in that it provides a semantic framework for all three of the computing paradigms just mentioned. In this process, the denotation of a program is a bounded linear operator. The view is often expressed that the full objective of formal methods (complete proof of correctness of software) is not feasible. On the other hand, software metrics provide only a rough measure of how different two programs are or of how far a program is from some ideal. The framework described in the previous paragraph provides a means of defining metrics, based on semantics, with the potential to formalize the process of comparing two programs, and this is our second contribution. In realizing this programme of research, initiated in [4], a number of criteria have to be met and a number of obstacles have to be overcome if software metrics are to be formalized; some of these criteria are as follows.
1. The operator denoting a program must faithfully reflect properties of the program, especially its semantics.
2. The distance between two programs must be meaningful in terms of program development.
3. It must be possible/feasible to calculate the distance between two programs.
The initial results presented here, at least for logic programs, give some positive support for this programme. In fact, the proposal envisaged in [2,4] is even more ambitious than that described here, in that it proposes the use of optimal approximation methods, based on techniques of mathematical analysis, to find "a reasonable practically acceptable approximation to the ideal solution". Whether or not any of this is really feasible remains to be seen. Finally, we note that, aside from the issues discussed above, there are several purely mathematical questions raised by our work which lack of space prevents us from discussing. One example is the following: the process of embedding a domain inside a Banach lattice goes in the opposite direction to the known problem of representing classical spaces (in this case, a Banach space) as the maximal point space of some domain in the Scott topology. It should prove to be of interest to investigate this and many related questions.

Proposition 4.1 Consider the distance functions d defined below on the collection of all mappings from X to X: d(T 1 , T 2 ) = ∥B(T 1 ) − B(T 2 )∥ ∞,∞ (denoted simply by ∥B(T 1 ) − B(T 2 )∥ ∞ ) and d(T 1 , T 2 ) = ∥L(T 1 ) − L(T 2 )∥ p,p (denoted simply by ∥L(T 1 ) − L(T 2 )∥ p ). These determine metrics on the collection of all mappings on X.
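When X is finite, the first of these distances can be computed directly: B(T ) is then represented by a 0/1 matrix with a single 1 per row, and the ∞,∞ operator norm is the maximum absolute row sum. The following Python sketch is our own illustration, with two hypothetical maps T 1 and T 2 on a four-point X.

```python
import numpy as np

# On a finite X = {0,...,n-1}, B(T) acts on g ∈ R^n by (B(T)g)_i = g_{T(i)},
# so its matrix has a single 1 in row i, at column T(i).
def b_matrix(T, n):
    M = np.zeros((n, n))
    for i in range(n):
        M[i, T(i)] = 1
    return M

n = 4
T1 = lambda x: (x + 1) % n
T2 = lambda x: (x + 1) % n if x != 0 else 0   # differs from T1 at x = 0 only

D = b_matrix(T1, n) - b_matrix(T2, n)
d = np.abs(D).sum(axis=1).max()   # the ∞,∞ operator norm: maximum row sum
print(d)
```

Each row of the difference matrix sums to 0 where the maps agree and to 2 where they differ, so on a finite X this particular distance takes only the values 0 and 2: it detects whether two maps differ, but not by how much, which already hints at why the choice of norm matters for obtaining a meaningful metric.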
Given a clause C, written A ← A 1 , . . . , A n , ¬B 1 , . . . , ¬B m , we let head(C) denote its head, A, and body(C) denote its body. We let pos(C) denote the set {A 1 , . . . , A n } of positive or unnegated atoms in the body of C, and let neg(C) denote the set {B 1 , . . . , B m } of negated atoms in the body of C. We recall that, in classical two-valued logic, interpretations for (the underlying language L of) a logic program P are identified with subsets I of B P , and hence the set I P of all such interpretations can be identified with the power set of B P . In these terms, we define the immediate consequence operator T P : I P → I P determined by a program P by setting T P (I) to be the set of all A ∈ B P for which there is a (ground instance of a) clause C of the form A ← body(C) satisfying I |= body(C). Of course, I |= body(C), that is, body(C) is true in I, precisely when pos(C) ⊆ I and neg(C) ⊆ I c , where I c denotes the complement of I in B P .

Proposition 5.2 If P 1 and P 2 are subsumption equivalent, then T P1 = T P2 .
Proof. Let I be an arbitrary interpretation for L. Suppose that T P1 (I) is non-empty and that A ∈ T P1 (I). Then there is a clause C in P 1 with head(C) = A and such that I |= body(C). By hypothesis, there is a clause C′ in P 2 such that C′ subsumes C. But then head(C′) = A, and we immediately have that I |= body(C′). Therefore, A ∈ T P2 (I), and we have T P1 (I) ⊆ T P2 (I). By symmetry, T P2 (I) ⊆ T P1 (I), and hence T P1 = T P2 .

Example 5.3 Let P 1 be the following program:
p(a) ← p(a)
p(a) ← ¬p(a)
p(b) ← p(a)