Methods Integration Data Modelling in ZIM

Data modelling is an important part of the software development process. In this paper we propose the ZIM (Z In data Modelling) method to do data modelling. ZIM is an approach where structured or object-oriented requirements specifications given as entity-relationship diagrams are transformed to formal Z specifications. Techniques from the world of relational data modelling are used to guarantee that the generated Z specification is well structured and in a certain sense minimal. The ZIM method is exemplified via a case study.


Introduction
In this paper we present a new approach, ZIM, to data modelling. In ZIM, structured, object-oriented, and relational methods are used when creating formal Z specifications. The ZIM method uses an entity-relationship diagram as the starting point. The main goal in ZIM is to transform such a diagram into a Z specification that fulfils the following criteria:
Unnormalized data is removed from the objects (i.e. entities and relationships).
Redundant data representations are removed.
The specification reflects the structure of each object.
Every object is explicitly associated with all its constraints in the specification.
Furthermore, the transformation from an entity-relationship diagram to a Z specification should be such that it can be performed mechanically within a CASE tool.
Similar integrations between formal methods and structured methods have been reported previously; see Josephs et al. [7, 8], Semmens et al. [14, 15, 16], Ginbayashi [4], and Polack et al. [12, 13]. The main difference between our approach and these is that, in order to meet the criteria above, we use theory from relational data modelling to remove redundancy and to normalize data objects. Furthermore, we pay special attention to the structure of every object by keeping an object and all its associated constraints tightly together.
There also exist integrations between formal methods and relational methods for database design [1, 10]. These are, however, mainly used to specify database applications. This can also be done following the ZIM approach.
Overview. We proceed as follows. In section 2, we briefly present the semantics of entity-relationship diagrams. The formal language Z [18] that is used in ZIM is assumed to be familiar. The ZIM method is put forward in section 3. In section 4 we give an extension to our method where some additional features of relational data modelling are taken into account. We show the practical applicability of our method via a case study in section 5. We end in section 6 with some concluding remarks.

Entity relationship models
The entity-relationship model (ER model) is a semantic data model that is used to model data as entities and relationships between these. The result of entity-relationship modelling is called an entity-relationship diagram (ER diagram), first introduced by Chen [2]. Later, some extensions were made to Chen's ER model, for example by Engels et al. [3] and Yourdon [19]. The ER model has its origins in structured methods. However, the ER model also forms a basis for object-oriented and relational data modelling methods.
Let us start by briefly describing the basic notions of the ER model, i.e. entities and relationships. In Figure 1, a general form of an ER diagram is shown.
Entities. An entity is something material or nonmaterial that exists in the real world. There are two types of entities, strong entities and weak entities. Strong entities, also known as regular entities, exist independently of other entities. Every weak entity depends on some strong entity. Entities have attributes that describe their properties. It is assumed that there is a distinct attribute list, or that the attributes are presented in the ER diagram. Attribute values are grouped into domains. At least one attribute (known as a key) should uniquely identify instances of every entity. Instances form an entity set. Each weak entity can inherit all of the properties of its corresponding strong entity. Furthermore, each weak entity may add its own properties. However, an instance of a weak entity is identified by the key of the corresponding strong entity, possibly in combination with its own attributes.
Relationships. Relationships represent the interaction between two or more entities. The participating entities can be paired using one-to-one, many-to-one, or many-to-many mapping constraints. If the constraint is many-to-many we talk about relational relationships; otherwise the relationship is functional. Each entity may participate or has to participate in a relationship, i.e. an entity is optional or compulsory in a relationship. The cardinality aspect is formalized by giving it as an ordered pair $(x, y)$ next to the appropriate entity in a diagram. Here $x \in \{0, 1\}$, with $x = 0$ indicating that each instance of the entity may participate in the relationship (i.e. the entity is optional in the relationship) and $x = 1$ indicating that each instance of the entity has to participate in the relationship (i.e. the entity is compulsory in the relationship). Moreover, $y \in \{1, N\}$, where $y = 1$ denotes that an entity participates in at most one relationship and $y = N$ denotes that an entity participates in one relationship or in many relationships.
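The cardinality pairs can be made concrete with a small executable check. The following Python sketch is an illustration only and not part of the ZIM method: the function name `satisfies_cardinality`, the representation of a relationship as a set of key pairs, and the example data are all assumptions made here.

```python
from collections import Counter

def satisfies_cardinality(instances, pairs, side, x, y):
    """Check one entity's (x, y) pair against a relationship.

    instances: set of keys of the entity's instances
    pairs:     set of (domain_key, range_key) tuples (assumed encoding)
    side:      0 if the entity is the domain, 1 if it is the range
    x:         0 (optional) or 1 (compulsory)
    y:         1 (at most one) or "N" (one or many)
    """
    counts = Counter(p[side] for p in pairs)
    for key in instances:
        n = counts[key]  # Counter yields 0 for keys that never occur
        if x == 1 and n == 0:
            return False  # compulsory, but this instance does not participate
        if y == 1 and n > 1:
            return False  # limited to one, but participates several times
    return True

# Hypothetical example data.
employees = {"e1", "e2", "e3"}
departments = {"d1", "d2"}
works_in = {("e1", "d1"), ("e2", "d1")}
```

With this data, `employees` satisfies the pair (0, 1) but not (1, 1), because `e3` does not participate, and `departments` satisfies (0, N) but not (0, 1), because `d1` occurs in two pairs.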
Functional and relational relationships are called standard relationships. There can be a standard relationship about which some information is maintained. This information is carried by an entity that is called an assigner [17]. Standard relationships are identified by the identifiers of the connecting entities.
ISA- and ID-relationships are two functional relationships called existential relationships. Both relationships are used for representing hierarchical structures between strong entities and weak entities. An ISA-relationship is used for modelling the partitioning of a set of strong entities (a superclass) into subsets of weak entities (subclasses) [10], a notion particularly familiar from the world of object-oriented design.
An ID-relationship (also known as a part-of relationship) is always between exactly two entities, a strong entity and a weak entity. The weak entity is identified by a unique identifier, which is a concatenation of the key of the corresponding strong entity and its own attributes. More on these topics can be found elsewhere [9].
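The two existential relationships can be illustrated with a short Python sketch. It is a hedged illustration under assumptions made here, not the authors' formalization: `is_partition`, `weak_key`, and the example data are hypothetical names, and a weak entity's identifier is simply modelled as a tuple.

```python
def is_partition(superclass_ids, subclasses):
    """True if the subclasses split the superclass with no overlap and no gap.

    superclass_ids: set of keys of the strong entity (superclass)
    subclasses:     dict from subclass name to a set of keys (assumed encoding)
    """
    seen = set()
    for ids in subclasses.values():
        if seen & ids:
            return False  # an instance belongs to two subclasses
        seen |= ids
    return seen == superclass_ids

def weak_key(strong_key, own_part):
    """Identifier of a weak entity in an ID-relationship:
    the strong entity's key combined with the weak entity's own attribute."""
    return (strong_key, own_part)

# Hypothetical example data.
accounts = {1, 2, 3}
by_kind = {"savings": {1, 3}, "current": {2}}
```

Here `is_partition(accounts, by_kind)` holds, whereas dropping account 3 from `savings` or adding account 2 to it would violate the partitioning.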
Removing redundancy. Often ER diagrams contain information that is redundant. When such diagrams are transformed into Z, this redundancy is reflected in the Z specification. Therefore, it is advisable to first remove the redundant information.
If an entity has an attribute whose value can be derived by a relationship from an attribute of another entity, this derivable attribute is removed as shown in Figure 2. Let us assume that in Figure 2 a) the entity A has an attribute A2 that is derivable via the relationship R2. In Figure 2 b) this attribute is removed.
Another kind of redundancy can be found in relationships, whenever one relationship is created from another by composition. This is also illustrated in Figure 2, where the relationship R3 in a) is a composition of the relations R1 and R2 ($R3 = R1 \circ R2$), and hence can be removed.

The ZIM method
The ZIM method takes an ER diagram, where redundancy has been removed as described above, and produces a Z specification from it. The method has three phases as follows:
Entity type normalization. The attributes have domains that should be defined before an entity is formed. All non-key attributes should be dependent on the key attribute of an entity. If there are functional dependencies between non-key attributes, they are removed.
Object set generation. Schemas that illustrate sets of instances of objects (i.e. entities and relationships) are generated. Constraints associated with the set are considered. The definition of a set links each object to a unique key.
Forming complete state. Finally, all classes are gathered into one complete state.
Let us look at these phases.

Entities and entity sets in ZIM
The entities are transformed into Z schemas. First, we look at the dependencies between the different attributes within an entity. If such dependencies are found, they are first removed. Then the domains for the attributes are defined. The entity sets are defined in two steps. First, we generate a specification for the non-key attributes of an entity, and secondly the entity set itself is defined by identifying the possible instances of the entity.
Entity type normalization. In ZIM, the interdependencies between attributes are removed using entity type normalization. This technique, which is here described only through an example, is used in relational modelling to remove functional dependencies between attributes, leaving only the dependencies on the key attribute [5]. Entity type normalization is illustrated in Figure 3. There the entity A has four attributes, A1, A2, A3, and A4. The attribute A1 is the key attribute, and the attribute A2 is dependent on it. Moreover, attribute A4 is dependent on A3. This dependency is removed by splitting the entity A into two entities, A and A' (Figure 3 b). The attribute A3 becomes the key of the entity A'. The cardinality of A' is (1,N) and that of A is (1,1), i.e. both entities in the functional relationship are compulsory, each instance of A is related to exactly one instance of A', and an instance of A' can be related to several instances of A.
The domains of attributes. Each entity has a set of attributes. Every attribute has an associated type, a domain name. These will directly correspond to Z types:
$[DOMAIN]$
$DOMAIN ::= value_1 \mid value_2 \mid \ldots \mid value_n$
Domains can also be specified using some other domain, or domains can have constraints:
$DOMAIN == ANOTHER\_DOMAIN$
$DOMAIN == \{\, dom : ANOTHER\_DOMAIN \mid constraints \,\}$
In the relational data model, the attributes are atomic, single-valued, and mandatory or optional [5]. If a value of an attribute is structured, or it has several values, the attribute should be changed to an entity [9]. However, in ZIM an attribute is allowed to be structured as well. Therefore, the domain of a structured attribute is specified as a schema type:
$DOMAIN \defs [\, attribute_1 : DOMAIN_1; \ldots; attribute_N : DOMAIN_N \,]$
Moreover, a non-key attribute is allowed to have several (or null) values. In this case, its type is defined using the power set construction $\power DOMAIN$.
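The normalization step described above, in which the dependency of A4 on A3 is moved into a separate entity A', can be sketched in Python as follows. This is a minimal illustration under assumptions made here: the function name `split_on_dependency`, the representation of instances as dictionaries, and the sample rows are hypothetical and not taken from the paper.

```python
def split_on_dependency(rows, determinant, dependent):
    """Split rows so that `dependent` is stored once per `determinant` value.

    rows: list of dicts, one per instance of the original entity A.
    Returns (rows of A without `dependent`, dict for A' keyed by `determinant`).
    Raises ValueError if the data contradict the assumed dependency.
    """
    a_prime = {}
    a_rows = []
    for row in rows:
        det, dep = row[determinant], row[dependent]
        # setdefault stores dep on first sight and returns the stored value after
        if a_prime.setdefault(det, dep) != dep:
            raise ValueError(f"{dependent} does not depend on {determinant}")
        a_rows.append({k: v for k, v in row.items() if k != dependent})
    return a_rows, a_prime

# Hypothetical instances of A with attributes A1 (key), A2, A3, A4.
rows = [
    {"A1": 1, "A2": "x", "A3": "p", "A4": 10},
    {"A1": 2, "A2": "y", "A3": "p", "A4": 10},
    {"A1": 3, "A2": "z", "A3": "q", "A4": 20},
]
a_rows, a_prime = split_on_dependency(rows, "A3", "A4")
```

After the split, A keeps A1, A2, and A3, while A' holds one A4 value per A3 value. Two instances of A share the A3 value `"p"`, which shows how one instance of A' can be related to several instances of A.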
Non-key attributes of an entity. Let us now consider an entity with N attributes, $attribute_1, \ldots, attribute_N$, plus the key attribute. The non-key attributes of an entity are combined into a record schema as follows:
$Entity \defs [\, attribute_1 : DOMAIN_1; \ldots; attribute_N : DOMAIN_N \,]$
A similar definition also identifies the assigner in an associative relationship. Each strong entity that participates in an ISA-relationship is formalized by the partitioned weak entities. This is shown later in section 3.3.
Entity set. Let us now consider a set of instances of the entity, i.e. the entity set. Let $ENTITY\_ID$ be the domain of the key attribute. The possible instances of every strong entity are now formalized in Z as the schema
\begin{schema}{EntityDS}
  Entities : ENTITY\_ID \pfun Entity
\where
  constraints
\end{schema}
where the partial function ($\pfun$) guarantees that each entity has its own identifier, although different entities can have similar non-key attributes. The predicate specifies the constraints over the attributes of existing instances.
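One way to read the entity set schema operationally is as a finite mapping from identifiers to records. The Python sketch below is an illustration under assumptions made here: a dictionary stands in for the partial function, and the `Entity` fields, the `valid_entity_set` helper, and the `adult` constraint are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """Non-key attributes of an entity (hypothetical attributes)."""
    name: str
    age: int

def valid_entity_set(entities, constraint):
    """entities: dict from key to Entity, playing the role of the partial
    function from identifiers to records; constraint: predicate on an Entity."""
    return all(constraint(e) for e in entities.values())

# Two instances with identical non-key attributes but distinct identifiers.
entities = {
    101: Entity("Ann", 34),
    102: Entity("Ann", 34),
}
adult = lambda e: e.age >= 18  # hypothetical constraint over the attributes
```

Because dictionary keys are unique, each instance has exactly one identifier, while the two records above remain distinguishable even though their non-key attributes coincide.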

Standard relationships in ZIM
Let us now consider the standard relationships, i.e. relational and functional relationships.
A relational relationship. A relational relationship between two entities is defined as a Z relation ($\rel$) between the domains of the key attributes of the two entities:
\begin{schema}{RelationshipDS}
  DomainEntityDS; RangeEntityDS \\
  Relationships : DOMAIN\_ID \rel RANGE\_ID
\where
  \dom Relationships \subseteq \dom DomainEntities \\
  \ran Relationships \subseteq \dom RangeEntities
\end{schema}
The participating entity set schemas ($DomainEntityDS$, $RangeEntityDS$) are included in the definition part of the schema. The known instances of both entities are used to constrain the defined relationship. Both entity sets are defined as $EntityDS$ in the previous section. As it stands, the subset sign ($\subseteq$) indicates that both entities ($DomainEntities$ and $RangeEntities$) are optional in the relationship. In case an entity is compulsory in the relationship, the corresponding subset sign is replaced with equality ($=$).
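The role of the subset and equality signs can be checked on concrete data. The following Python sketch is only an illustration under assumptions made here: a relationship is a set of key pairs, entity sets are dictionaries keyed by identifier, and the helper names `dom`, `ran`, and `valid_relationship` as well as the example data are hypothetical.

```python
def dom(rel):
    """Domain of a relation given as a set of (domain_key, range_key) pairs."""
    return {d for d, _ in rel}

def ran(rel):
    """Range of a relation given as a set of (domain_key, range_key) pairs."""
    return {r for _, r in rel}

def valid_relationship(rel, domain_entities, range_entities,
                       domain_compulsory=False, range_compulsory=False):
    """Optional participation uses subset; compulsory participation uses equality."""
    known_d, known_r = set(domain_entities), set(range_entities)
    ok_d = (dom(rel) == known_d) if domain_compulsory else (dom(rel) <= known_d)
    ok_r = (ran(rel) == known_r) if range_compulsory else (ran(rel) <= known_r)
    return ok_d and ok_r

# Hypothetical entity sets and a many-to-many relationship between them.
students = {"s1": "Ann", "s2": "Bob"}
courses = {"c1": "Logic", "c2": "Algebra"}
takes = {("s1", "c1"), ("s1", "c2")}
```

With this data the relationship is valid when both entities are optional, and also when only `courses` is compulsory. It is not valid when `students` is compulsory, since `s2` takes no course.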
A functional relationship. A functional relationship is defined like the relational relationship; the only difference is in the definition part. There are two cases:
many-to-one, illustrated by the partial function symbol ($\pfun$) as follows:
$Relationships : DOMAIN\_ID \pfun RANGE\_ID$
one-to-one, illustrated by the injective partial function symbol ($\pinj$) as follows:
$Relationships : DOMAIN\_ID \pinj RANGE\_ID$
A standard relationship with an assigner. A relationship between two entities is identified by the key domains of these entities, $DOMAIN\_ID$ and $RANGE\_ID$. If some information is carried by the relationship, this information is modelled by an assigner [6]. This is captured in a Z schema in which $Entity$ refers to the assigner that is related to the relationship. If the relationship between entities is functional, the mapping between entities is either many-to-one ($DOMAIN\_ID \pfun RANGE\_ID$) or one-to-one ($DOMAIN\_ID \pinj RANGE\_ID$).
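The difference between the many-to-one and one-to-one cases can be tested directly on a set of key pairs. The sketch below is an illustration under assumptions made here, with hypothetical helper names `is_function` and `is_injection`; it does not model the assigner.

```python
def is_function(rel):
    """Many-to-one: no domain key is paired with two different range keys."""
    mapping = {}
    # setdefault records the first range key seen for d and returns it afterwards
    return all(mapping.setdefault(d, r) == r for d, r in rel)

def is_injection(rel):
    """One-to-one: a function whose inverse is also a function."""
    return is_function(rel) and is_function({(r, d) for d, r in rel})

# Hypothetical relationships over employee and department keys.
works_in = {("e1", "d1"), ("e2", "d1")}   # many-to-one
manages = {("e1", "d1"), ("e2", "d2")}    # one-to-one
```

Here `works_in` is a function but not an injection, because two employees map to the same department, while `manages` is both.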

Existential relationships in ZIM
Next, the existential relationships, ISA and ID, are considered.
An ISA-relationship schema. An ISA-relationship partitions the instances of a strong object into several weak objects. However, one instance of the strong object corresponds to one instance of the weak object. Every strong object participating in this relationship is identified by the key of the strong object. Let $STRONG\_ID$ be the key domain of the strong object. Then an ISA-relationship corresponds to a Z schema in which the key attribute identifies exactly one strong object instance.
Methods Integration Workshop, 1996
[...] attribute $STRONG\_ID$, and a weak entity. The domain of a weak entity, $W\_ENTITY$, is modelled by a non-key $Entity$. It is identified by the key attribute inherited from the strong entity in combination with an attribute of its own, $WEAK\_ID$.

Complete state
The complete state schema combines the defined entities and relationships, which are included in its definition part. In the object-oriented approach there is a concept of a metaclass, i.e. a class whose instances are object sets (entity sets or relationship sets). The complete state corresponds to a metaclass. The possible constraints are applicable to the set of instances as a whole [11], and these are added to the predicate part of the complete state schema.

Foreign keys
Relationships are represented in relational models by foreign keys [1]. Constraints on foreign keys are used instead of constraints on whether an entity is compulsory or optional in a relationship. In ZIM, either the domain or the range entity has to be compulsory in the relationship when the foreign key representation is used. Let us illustrate the foreign key representation. A functional relationship where the participation of the domain entity is compulsory can be removed by adding the key attribute of the range entity to the attributes of the domain entity. The added attribute is called the foreign key. Hence, the value of a foreign key is always equal to the value of the key of the range entity in the removed relationship. This is illustrated in Figure 4.
Observe that the injective partial function ($\pinj$) is used when a one-to-one relationship is removed. If the mapping constraint between the domain and the range entity is many-to-one, the general form of a schema that incorporates a foreign key is: [...]
If both entities are optional in the relationship, null values should be considered when foreign keys are used to replace functional or relational relationships between entities. There is no universally accepted approach to null (missing attribute) values. The null values can be explicitly included via the foreign key approach, as done by Barros [1], who uses his FOR KEY operators and specifies one null constant for each attribute domain.
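The replacement of a functional relationship by a foreign key can be sketched on concrete data. The Python code below is a hedged illustration under assumptions made here and does not reproduce the paper's Z schemas or Barros's operators: the functions `to_foreign_key` and `references_exist`, the dictionary encoding, and the example data are hypothetical, and null values are not handled.

```python
def to_foreign_key(domain_rows, relationship, fk_name):
    """Fold a many-to-one relationship into the domain entity.

    domain_rows:  dict from domain key to a dict of non-key attributes
    relationship: dict from domain key to range key (a function)
    Requires the domain entity to be compulsory: every key must be related.
    """
    if set(relationship) != set(domain_rows):
        raise ValueError("domain entity must be compulsory in the relationship")
    return {key: {**attrs, fk_name: relationship[key]}
            for key, attrs in domain_rows.items()}

def references_exist(rows, fk_name, range_rows):
    """Every foreign key value must be the key of an existing range instance."""
    return all(row[fk_name] in range_rows for row in rows.values())

# Hypothetical entities and a many-to-one relationship between them.
employees = {"e1": {"name": "Ann"}, "e2": {"name": "Bob"}}
departments = {"d1": {"title": "Sales"}}
works_in = {"e1": "d1", "e2": "d1"}

merged = to_foreign_key(employees, works_in, "dept")
```

After the call, each employee record carries a `dept` attribute whose value is the key of its department, so the separate relationship is no longer needed, and `references_exist(merged, "dept", departments)` confirms that every foreign key refers to a known range instance.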

An example: Access Security System
In secure computer systems, access to information, user/password validation at login, the creation of new data entities, and the maintenance of security data need to be controlled. An ER diagram and the associated Z specification for this problem, following the method of Semmens et al., are given in [15]. The ER diagram for the system is given in Figure 5. The attribute list for assigners and entities is as follows: [...]
There is no need for redundancy elimination or entity type normalization. Hence, we can directly give a Z formalization of the ER diagram in Figure 5 following the ZIM method.
The attribute domains. The domains of the Access Security System do not have any constraints [15]. The [...]

Conclusions
We have proposed a new method, ZIM, for data modelling. Our main goal in designing the method was to be able to generate clear, readable, and well-structured Z specifications with as little redundancy as possible. Furthermore, ZIM is independent of the software development method used, which can be structured, object-oriented, or relational.
Compared to other reported integrations between the entity-relationship model and Z [4, 6, 7, 8, 12, 14, 15], our complete state specification is more concise. In ZIM, all the constraints belonging to an entity or a relationship are given where the entities and relationships are presented. Moreover, we pay special attention to relationships between entities. In ZIM, the participating entities are not redefined when the relationships are specified. The participating entity set schemas ($DomainEntityDS$, $RangeEntityDS$) are used to constrain the defined relationship.

\begin{schema}{RelationshipDS}
  DomainEntityDS; RangeEntityDS \\
  Relationships : DOMAIN\_ID \rel RANGE\_ID
\where
  \dom Relationships = \dom DomainEntities \\
  \ran Relationships = \dom RangeEntities
\end{schema}
However, in the related integrations, relationship schemas have redundancy in the definition part, where the sets of existing instances (or known identifiers) of the participating entities ($knownDomains$, $knownRanges$) have to be redefined, even though definitions for the participating entities already exist.
\begin{schema}{RelationshipDS}
  knownDomains : \power DOMAIN\_ID \\
  knownRanges : \power RANGE\_ID \\
  Relationships : DOMAIN\_ID \rel RANGE\_ID
\where
  knownDomains = \dom Relationships \\
  knownRanges = \ran Relationships
\end{schema}
Additionally, the reported integrations do not consider the different features of the relationships when the definition part of the relationship schemas is specified. Instead, different relationships are specified by adding constraints to the predicate part of the schema. In ZIM, both standard and existential relationships have their own formalizations. The features of the relationships are declared in the definition part. Therefore, additional constraints are not needed in the predicate part of the relationship schemas.
Close to our method, in the sense of redundancy removal, is the SAZ method of Polack et al. [13], because in their work the key attribute is introduced only when defining the instances of both an entity and a relationship. Furthermore, in the relationship schemas the participating entity sets are used as types to constrain the domain and range sets: [...]
We are currently considering how state transitions and operations can be included in ZIM. Furthermore, we are considering the possibility of including our ideas in a CASE tool.