Checking the Consistency of UML Class Diagrams Using Larch Prover

The Unified Modeling Language (UML) has been designed to be a full standard notation for Object-Oriented Modelling. UML is a rather complete set of notations, but it lacks of formal semantics. This article introduces formal semantics for UML based on algebraic abstract data types. We currently consider only class and object diagrams. These diagrams include class structures, associations, multiplicities, constraints, instances as well as specialization relationships. We give a formal semantics for each of these elements by interpreting the structure of a class as an abstract data type, associations as values of type Association, and specialization as structural projection. We show that a tool like Larch Prover is able to support proofs over UML diagrams. We use the critical pair computation to find out inconsistencies. Several different inconsistencies of class diagrams are shown on a library example.


Introduction
The Unified Modeling Language (UML) [1] has been designed to be a full standard notation for Object-Oriented Modelling.UML is a rather complete set of notations, including nine diagrams that cover all the semantics of the object paradigm.The price to pay is an informal semantics: ambiguous concepts (e.g. the semantics of aggregation [2]), concepts overlapping the diagrams (e.g.class vs actor).As a consequence, the modelling is seldom complete and often includes many inconsistencies inside the diagrams or between them.The definition of a formal semantics for UML is a mandatory step for the normalization of UML to be mature, as underlined by the pUML group 1 .
Such a definition is a difficult task because the notation is rich, complex and informal.The class diagram being the reference point of the notation, any formalization must start with this diagram.The current proposals are mainly based on state-based languages like Z [3,4].We choose an algebraic approach because it is more abstract than statebased style languages.Another reason is the number of existing tools and theoretical frameworks.From an historical viewpoint, object-oriented languages and abstract data types (ADT for short) share a common ancestor: Simula [5].The two techniques were developed independently, however some works have tried to merge them: FOOPS [6], and Eiffel [7].The main and first approach of a semantics for UML with ADT was [8].It reuses a previous work about OMT [9].The authors use the Larch Shared Language.The present work is related to these approaches but we focus on a more operational approach and the checking of consistency within the Larch Prover environment.There are also some other differences, mainly in the semantics given for association, composition and inheritance.We introduce a general semantic framework for UML class and object diagrams.The current work includes class related elements (class, association, constraints) and their specialization as well as objects and links.We assume the reader to be familiar with the corresponding UML notations.
A great part of this paper is devoted to the use of the Larch Prover [10] (LP for short) to check the correctness of class diagrams.The consistency of such diagrams is not easily checkable.One reason is the possible use of code annotations to denote constraints on the elements of the diagram.Code annotations may be written in programming languages or rather in OCL (Object Constraint Language).OCL is a kind of first order functional language with collections and iterator facilities.In this paper, code annotations will be LP expressions in order to lighten the translation process.The use of OCL with a similar translation to [11] would be a good way.The problem of consistency of such class diagrams is related to the consistency of first order predicate logic.This kind of problem is not decidable [12].It exists several useful approaches to detect either consistency or inconsistency.For example, techniques based on Herbrand's Theorem may detect inconsistency [13], but they require process termination since this problem is semidecidable.Here we use LP and under some hypotheses it can prove a system consistent for equational logic.These hypotheses concern the use of quantifier free formulas, the termination of the rewriting system and the termination of the Knuth-Bendix completion algorithm.In practice it is difficult to prove consistency, and it is often easier to detect inconsistencies with the critical pair computation.
The structure of the paper is the following.Section 2 describes the context: algebraic specifications of data types and Larch Prover.We briefly explain the method we use; it is based on the critical pair computations.The third Section gives the main elements of our translational semantics of UML class diagram into algebraic specifications.In Section 4 we illustrate our method on different examples where we have detected inconsistencies.We conclude by lessons about these experiments, some related works and future extensions.

Algebraic Specifications and Larch Prover
Algebraic specifications have been used for the definition of ADT and the formal specification of safety systems for a long time.The reader will find a rather comprehensive introduction and survey about these formal specification techniques in [14].
Larch Prover [10,15] is a the proof assistant2 based on the Larch Shared Language.It allows one to define algebraic specifications, to use rewrite rules and to prove properties.However it supports neither partial algebras nor genericity.Some modifications are done to our specifications in order to fix these problems.The genericity problem is simply fixed by a copy and replace mechanism.Partial operations are left unspecified, we generate axioms with terms which are assumed to be well-defined.
LP has a set of proof tools, mainly they are: rewriting, critical pair computation, Knuth-Bendix completion procedure, proof by induction, proof by contradiction, and proof by case.LP supports first order predicate calculus with equality.
An algebraic specification of a data type is composed of three parts: a heading, containing informations about the module: the name (or sort) of the type, the imported modules, and the constructor names (or generators), a signature which describes the operation syntax, and the axioms which describe the operation semantics.
An example, with the Assoc[A, B] ADT, is depicted in Figure 2. Note that operations of a defined type are classified into internal operations and observers.An internal operation is an operation whose resulting type is the defined type.An observer has not the defined type as resulting type.A generator is a particular internal operation needed to build the values of the type.We use an informal syntax close to the Larch Prover one.Axioms are conditional: cond => t = u; where t, u are algebraic terms and cond is a conjunction of ti = ui equations.Larch Prover allows one to write first-order expressions where~is logical not, ^is and, _ is or, => is implication, = is equality.Existential (respectively universal) quantified propositions are possible with the nE (respectively the nA) prefix.The general way, in LP, to check consistency is to compute the critical pairs.A critical pair is an equational deduction in a set of rules.It can be either a new fact forgotten in the system or an irremediable inconsistent fact  (for instance true = false).The command critical-pair <setr1> with <setr2> computes critical pairs between two sets of rewriting rules.The complete command uses the well-known Knuth-Bendix completion algorithm 3 .This algorithm computes all the critical pairs and adds them into the system.The process may stop with an inconsistency.When the process terminates without inconsistency, the system is consistent.Otherwise the process does not terminate, and we cannot say anything about the consistency.Carefully note, consistency is related to equational logic not to predicate calculus.Strictly speaking, it needs a logical system with only quantifier free equations and rewrite rules.However, first-order formulas can be transformed (by skolemization) into quantifier free formulas.

Translation of UML Class Diagrams
In this section we sketch a translational semantics for UML class diagrams.More details and additional elements (aggregation, composition, association as class, abstract classes, instance management or the specification file organization) can be found in [16,17].A class diagram is translated into algebraic definitions.We do not take into account code inheritance and side effects, hence a purely functional approach with identities is pertinent here.The existing instances of a class equate the values of the ADT associated to that class.To define the semantics of a class, two types are needed: the effective type and the maximal type.The effective type is the set of the effective instances of the class.The maximal type is the set of all instances of a class coping with subclasses.This provides an approach where implicit and explicit descriptions of instances and associations are possible.In order to define the mapping rules from class diagrams to algebraic ADTs, the syntax and semantics of each diagram element is evaluated.We give in the following sections the description of some of these mapping rules and the corresponding resulting algebraic constructions.

From Class to ADT
We assume that classes are formalized by ADTs.Basic types are supposed to exist (imported ADTs).To explicit the connection between ADT and Object-Oriented programming, we reuse the formal class principles [18].A UML object has an implicit identity.The identity value of an instance of class C is defined by the IdC associated type.An implicit hypothesis is: two values represent the same object if and only if they have the same identity.As in programming languages there are two equalities: an object equality (eq) which is based on identity equality, and a semantic equality (equal) which is based on equality of attribute values.
As usual, the first one implies the second one.Taking class C, the following mapping rules are defined.A single newC constructor represents the instantiation process.Instance attributes are translated into argument types in the constructor signature.For each argument type of the constructor, a primitive observer, called field selector, is defined.Where sel1:T1, ..., seln:Tn are the typed attributes of class C. Primitive observers are operations related to the main aspect of the ADT which semantics is defined in terms of the constructor.

Association and Multiplicity
The ADT of Figure 2 describes a part of a general type which potentially defines any association between two generic types A and B.
The eq operator defines an intensional equality between associations.Of course this type can be completed with other operators, for instance an equal operator.An association is a value of type Assoc[A, B] with a name, a set of links and association properties (multiplicity, attributes, etc).We have a higher-order association type which allows one to define many general associations, to compute multiplicity, to search links or to search left or right associated objects.It may be specialized and adapted to different situations by adding operations and axioms.Of course it is possible to extend this type to n-ary associations.To define a particular association we declare a new value with a name.

Navigation and Role
In order to express explicit navigation, it is necessary to define a new operator.The same schema allows the definition of roles for the association.

Code Annotation and Constraint
It is possible to define general constraints on classes, operations or relations using code annotations.The UML Guide suggests to use OCL or a programming language to express these code annotations.Our code annotations are LP expressions because they may easily be integrated into the target LP specification.As an example we define a general rightXor constraint on two associations of type A-B and A-C respectively.In the previous example, axiom writing is only limited by first order expressions.Since explicit quantifiers may be used (see examples later) we achieve a great degree of expressiveness.

Inheritance
Specialization is made up of two parts: structural inheritance (attributes and associations) and behavioural inheritance (operations).In this paper we only deal with the structural part.Structural specialization includes: adding new attributes or associations, strengthening the invariant constraint, redefining the attributes or associations.However, we assume that the multiplicity intervals are not redefined.We consider an extended UML approach with multi-covariant redefinitions.Therefore type problems may arise in this framework, even in a purely functional approach [19].These problems are well-known and some solutions exist [20,21].Once translated into our algebraic framework, we assume that our specifications do not carry type-checking problems.

Subclass Translation
Let us consider the specialization between class C and SubC one of its subclasses.To simplify the example we have no attribute redefinition, but this is allowed in our approach.The SubC defines a proper structure: sub1: S1, ..., subm : Sm.We translate the class to get the following ADT:  As previously explained in translation of class C, we add axioms for eq and equal in the ADT associated to class SubC.In fact, the specialization of equalities is subtle because object equalities are symmetric and polymorphic.Some solutions exist in [22,20].Here we simply add declarations to take into account the parameter polymorphism.

Structural Projection
An instance of SubC can be viewed as an instance of C. We formalize this relation by a structural projection which comes from a previous work [23].The interpretation of specialization is done according Figure 5. Specialization of structure defines a mapping between the arguments of the ADT constructors.We assume this mapping is: the Uniqueness of the superclass instances is based on semantic equality, namely the equal predicate.These properties express an isomorphism between the equivalence classes associated to the equal C relation in class C and subclass SubC.We consider PfSubC,Cg as a relation between class C and SubC to be more uniform with associations.Note that it can be seen as a total function from SubC to C. In general it is a many-to-one function and it is not onto.

Association and Specialization
Whenever an association is defined, we expect it to be inherited in some way.This has only effect on structural specialization.An adequate notion of association specialization seems also required.A more comprehensive discussion about this topic exists in [16].
In Section 3.2 we did not handle class polymorphism, but only monomorphic associations between types.From now on, such an association will be noted assoc m .A general definition of a polymorphic association (assoc p ) between two classes is a set of monomorphic associations: where stands for subtyping, A i and B j are subclasses of (respectively) A and B, and assoc m A i B j ] is a monomorphic association between the types associated to A i and B j .We give a semantics of inclusion to the specialization of associations.Let asub and assoc be two associations with the same name, and A 0 B 0 A B, an association asub p A 0 B 0 ] is a specialization of another assoc p A B] iff asub p A 0 B 0 ] p assoc p A B]. Inclusion of polymorphic associations is defined as: asub p A 0 B 0 ] p assoc p A B] () 8 A 0 i B 0 j A 0 B 0 asub m A 0 i B 0 j ] m assoc m A 0 i B 0 j ] where m stands for the inclusion of monomorphic associations 4 .Much work still has to be done in order to get more useful and specific definitions, and also to relate them to association classes.

Tool Integration
To allow the automatic generation of algebraic specifications from a UML diagram, we developed an UML case tool carrying the mapping rules integrated in its context.The case tool [24] was built in Graphtalk meta-tool 5 .We have not yet translated the OCL language into algebraic specifications, but [25] demonstrates its feasibility.

Checking UML Diagrams Consistency Using LP
As we explained before, we use LP to detect inconsistencies in class diagrams.We often need to prove auxiliary facts and LP is of great help in such cases.In this Section, we illustrate the use of LP to check class diagrams.Of course in some examples the problem is quite obvious because the wrong elements have been extracted from larger specifications.Ordinary diagrams contain many classes within different packages so that a tool is necessary to detect the problems.

Examples about Association Consistency
The following examples present inconsistencies with associations, constraints and multiplicities.There are different kinds of consistency problems in UML diagrams, like problems with types of associations (e.g. an inclusion constraint needs that association types are the same).Some problems can be detected by type-checking or multiplicity-checking.However, this is not generally sufficient because the specifier is able to define any kind of constraint, and some constraints need a more complex checking.This is illustrated by the last example of this Subsection.

A Constraint and Multiplicity Problem.
The practitioner usually writes separately the multiplicities and the constraints.Sometimes, their conjunction is inconsistent.Let consider the example of Figure 6, where A = Person, B = Book, assoc1 = author and assoc2 = reviewer.A person may be author of one book or reviewer of one book, but not both.It can be simply expressed by the P1 existential proposition: The rightNand constraint may then be defined by proposition P2: A Simple Proof.Below is a simple proof done with the help of the Larch Prover.The assertion says that if the multiplicity is equal to one then there is a link in the association.
prove equal(rightCardinality(Xab, a), 1) => isRightLinked(Xab, a) % these are LP comments % The above command tries to resume a proof by induction resume by induction on Xab <> basis subgoal % This means ok for basis case [] basis subgoal <> induction subgoal % Try to prove the induction step by implication resume by => <> => subgoal % Try to prove by case res by case identity(a1c) eq identity(ac) <> case identity(a1c) eq identity(ac) [] case identity(a1c) eq identity(ac) % Try to prove by the opposite case <> case ˜(identity(a1c) eq identity(ac)) % Addition of a trivial lemma assert ˜(identity(ac) eq identity(a1c)) % Critical pair computation to finish the proof Proofs are usually not automatic, an expert user must choose the way to process them.We can now show that the previous UML model is not consistent by defining two associations as1(a), as2(a): declare operator as1, as2 : A -> AssocAB % eliminations of \E fix Yab as as2(a), Xab as as1(a) in P1 % elimination of \A instantiate Xab by as1(a), Yab by as2(a) in P2 LP says that the system becomes inconsistent It illustrates one fact: the rightNand constraint is not compatible with two multiplicities of one.The previous example easily extends to a class A rightNand associated to class B1, B2, .. Bn.There are similar problems with xor or other constraints.This gives rules to define a kind of multiplicity-checking.Note that there is a more straightforward way which makes explicit use of the critical pair computation as in the following examples.

Inconsistency without Multiplicity Problem
We illustrate a simple example with an inconsistency, but multiplicity-checking would not find it.In this example C1 is a constraint which states that each link in assoc1 appears either in assoc2 or in assoc3 and not in both associations.The constraint C2 states that the associations assoc2 and assoc3 are equal.It may seem that this is a consistent example, but it is not.Since multiplicities are strictly greater than zero, the associations are not empty.But the C1 relation and the equality between assoc2 and assoc3 imply that these associations are empty.Below are the corresponding axioms, auxiliary facts and the inconsistency proof.

Examples with Specialization
Specialization adds also some potential problems with multiplicity and specialized association.

Wrong Multiplicity or Wrong Constraint
The specialization semantics of Section 3.5.3implies that the multiplicity intervals of the specialized association must be included or equal to the ones of the base association.Hence Figure 8, example a is inconsistent.Furthermore, some constraints are inconsistent with specialization, for instance Figure 8, example b.We may check these two examples and detect inconsistencies.In case 8.a a critical pair computation finds the problem, but a complete command will diverge.In case 8.b a complete command will prove that asub has to be empty, and if not an inconsistency is detected.

Constraint and Specialized Associations
We present now a more complex example where an association (assoc) is specialized twice, see figure 9. Association specialization is defined in page 8, the LP translation of this example is presented below.In this context, we detect an inconsistency if the redefined associations contains one link with two objects equal with respect to the equal operation of class A. However, this inconsistency is more difficult to prove.

Synthesis
Using our algebraic approach, we have checked several kinds of inconsistencies: multiplicity problem with derived association, inconsistency between multiplicity and constraint (see Figure 6), inconsistency without neither multiplicity nor type problem (see Figure 7), problem with multiple and disjoint specialization, problem with circular composition, problems with specialization, multiplicity and constraint (see Figure 8), ROOM 2000 problem with constraint and specialized association (see Figure 9).We have studied different examples and proved their inconsistency, but the results are under the influence of the form of the axioms and the properties known by the system.In the previous examples, the following schema is able to detect any inconsistency: 1. translate the diagram into LP, 2. prove some useful consequences and add them in the system, and 3. check the consistency with complete or critical-pair commands.
However, this is still far away from an automatic strategy to check inconsistency.On the other hand, our approach already gives us some advantages.One has a general approach (for expert only) which allows to define any constraint, to prove properties and to detect inconsistency.Our approach can be adapted to different contexts and may be used to improve or to give formal semantics to UML concepts.One can define and prove simple static checking based on the compatibility between multiplicity and constraint.For instance Nand is not compatible with 1-1, composition or transitivity imply that multiplicity are in relation of integer times, and the inclusion constraint implies that multiplicities are lesser or equal.Sometimes, an automatic strategy to detect problems exists, and it can be implemented in UML tools.Clearly, type-checking is a first and simple level of checking.It is able, for instance, to detect problems in derived associations or ill-written constraints.The consistency of multiplicity with not any code annotation is decidable, see for example [26] for an algorithm to check it.If the model does not contain any user code annotation and if it is consistent for multiplicity, then our translation generates a rewriting system which is terminating and consistent.Another control is to check the multiplicity consistency with association constraints.[27] described a language suitable to Entity-Relationships and Data Base models together with a decidable logic for it.This work can be useful to check some parts of the UML static diagrams (cardinality and predefined constraints).However, [27] deals only with the non recursive structural aspects of objects and does not include any feature for the specification of behavioural properties of objects.This is insufficient because invariant, derived association, constraint are behavioural specifications of UML class diagrams.

Related Work
One important source of information, the first in this area, is [9].It is related to OMT and the use of the Larch Shared Language (LSL).LSL is an abstract language for specification, which is different from the Larch Prover tool language.The authors study in depth classes, associations, cardinality, aggregation and many aspects of OMT.They also consider notions of state and errors.Following this work, Hamie, Howse and Kent apply it to the UML in [8].Using this framework they also study navigation and OCL [11].They use the LSL in a modular way and they give semantics to some static and dynamic parts of UML, including state changes and time.
Our definition is closer to algebraic specifications for data types and therefore operations are quite easy to define.[8] have an abstract approach but they do not guarantee the usual properties about algebraic specifications such as consistency and sufficient completeness.For us, it is not sufficient to write formal specifications, we want to be able to check them.We have a constructive, more operational and more precise approach, as indicated by the choice of generator, the notion of sensible signatures, and the two equality functions.It is difficult, and even impossible, without any additional information, to prove properties in the [8] approach.
We interpret neither OCL nor dynamic diagram.Our current approach seems not to be adequate for the semantics of a dynamic diagrams.We plan to use other works based on GAT [28].In our context, equality is important because, for example, composition needs object identity and inheritance needs semantic equality.We have an explicit instantiation process; instances and associations can be implicitly or explicitly defined.Associations in [8] are interpreted as two functions that map an object of one type to the set of associated objects of another type.In our approach associations are first class values, hence Assoc[A, B] is a true and constructive type.This is not true in [9,8].We think this is more flexible and natural for UML, and that makes the integration of other constructions easier.For example, there is neither generic constraint nor association class in [8].In this work there are no semantics for composition ROOM 2000 either.We also have a proper semantics for specialization which improves the one of [9] (reused in [8]).This former work defines the simulate function which asserts that a subclass is a subtype of another class by requiring that associated sorts are in subsort relationship.The simulate function maps an object identifier of the subclass to an object identifier of the super class that behaves like it.The [9] and [8] works take into account only the maximal type, in addition we formalize the effective type.This provides an approach where implicit and explicit descriptions of instances and associations are possible.We also propose a definition for the structural projection based on semantic equality.It allows us to define associations and inherited code in an operational way.Other related works are [29,30].

Conclusion
Our approach gives a formal semantics to UML class diagrams using abstract data types.It is a successful approach on this sub part of UML.Our work is complementary with previous works and it improves them in several ways: it suggests first-class association, it allows association constraints, it formalizes structural inheritance, it gives a useful and rigorous definitions to specialized association.We also show that the use of a tool like Larch Prover helps in the proof of properties and also can detect some ill-formed designs.Further work will concern methodologies to prove properties.We want to focus on an automatic detection of inconsistencies, and to be able to show schema equivalence.The use of LP is interesting, but actually it surely cannot handle very big examples.We consider that a practical limit should be around 1000 rules in LP.Therefore we have to optimize our translation schema and define new automatic strategies to check examples.Looking at the translational semantics, the current work is able to translate the OCL language in LP and to include the semantics of the operations.The difficulties are not in the translation of OCL basic types, predicates, navigations or collections, but merely in the semantics of subtyping.The semantics of the operations is natural for ADT, it is expressed by axioms.The difficulty comes from "code inheritance".A first approach uses copies of axioms and operator overloading.Formal specification of the meta-level is also a future extension of this work.

Figure 5 :
Figure 5: Structural Specialization first n arguments (except identity) of the newSubC constructor corresponds to the n arguments (except identity) of the newC constructor.When a subclass is defined we add the definition of the structural relation between the ADTs corresponding to the classes.If SubC inherits from C then we define the PfSubC,Cg relation between the ADTs associated to SubC and C: PfSubC,Cg(newSubC(id, X1, ..., Xn, Y1, ..., Ym), newC(jd, Z1, ..., Zn))= equal(X1, Z1) ^... ^equal(Xn, Zn) This defines a partial ordering on abstract data types parallel to the class specialization relationship.This definition is automatically computed once the specialization relations are known.This relation verifies: 8s 1 s 2 : Su b C (equal C (s 1 s 2 ) () 9c : C PfSubC Cg(s 1 c ) ^P fSubC Cg(s 2 c )) ROOM 2000 where equal C is the equality predicate for two instances of class C. It means that the specialization relation defines an equivalence relation on the subclass instances.Furthermore this relation yields the following property: 8s : SubC 8c 1 c 2 : C PfSu b C Cg(s c 1 ) ^P fSubC Cg(s c 2 ) = ) equal C (c 1 c 2 )

Figure 8 :
Figure 8: Wrong Examples with Specialization

Figure 9 :
Figure 9: Example with Constraint and Specialized Associations