Key Integrity for Cooperative Database

ACID properties and serialisability have been extensively studied for transactions that are submitted concurrently in a multithread database environment. A number of protocols have been proposed to ensure ACID properties for centralised and distributed databases. These protocols aim to ensure consistency of update operations. However, they do not address key integrity problems that arise in a heterogeneous multidatabase environment when some of the hosts in the multidatabase have no constant communication links which is a frequently found situation for environments with mobile hosts. Transactions in such environments have longer life time which complicates transaction management and replica control protocols. In this paper we propose to use semantic information from the conceptual schema of the MDBS to ensure key integrity. Transaction handling and reconciliation protocols for entities with natural and surrogate primary keys are proposed and the description of the delayed commit protocol is presented.


Introduction
Transaction management for centralised and distributed databases ensures that all transactions in a multithread database environment are provided with ACID transactional guarantees [8,16].ACID guarantees are achieved if the results of all reads and writes are consistent and do not breach data integrity.This is based on the idea that a schedule of read/write operations for concurrent transactions should be equivalent to some serial schedule, when transactions are scheduled one at a time, which is known as the property of serialisability [7].This property and the means of ensuring thereof has been studied quite extensively.
Cooperative and federated database environments known as multidatabase systems (MDBS) [10,15,14,20] impose new challenges upon the transaction management mechanism.An MDBS environment represents a loose integration of several database systems.In such an environment both global applications whose transactions span more than one participating databases at once, and local applications that issue transactions that affect only one underlying database, must be supported.It has been shown that ACID properties, and global serialisability and atomicity in particular, are self-guaranteed if all local database systems (LDBS) ensure commit-order serialisability [2,17] and do participate in a global two-phase commit.For global transactions this assumes use of the nested transaction management mechanism.
However, scheduling that ensures strict serialisability, usually in practice causes severe performance problems for a regular MDBS.Indeed, the closed nested transaction management mechanism [11] does not allow subtransactions to commit before the whole global transaction reaches the last, so called ready-to-commit state [3].The commit only can begin when the longest subtransaction from the series has acknowledged that it has finished and is ready to R. M. I. T. -Royal Melbourne Institute of Technology Advances in Databases and Information Systems, 1996 commit.Therefore, all locks placed to ensure serialisability (even for the shortest transaction in the series) cannot be released until the longest subtransaction finishes.
Many authors have argued that serialisability might be too strong of a requirement in an MDBS environment.Thus, a concept of quasi serialisability has been introduced [5].Another concept of open nested transaction management mechanism [1,8] is seen as a means of increasing efficiency of handling concurrent transactions while providing sufficient guarantees for data consistency and integrity.This is achieved via reducing the probability of subtransactions conflicts by allowing a subtransaction to commit as soon as possible after it has completed all its read and write operations.Semantic-based transaction management [4] based on open transaction management mechanism is proposed in order to speed up the MDBS performance further more by eliminating pseudo-conflicts through exploiting the semantic properties of subtransactions to assess the possibilities of a conflict to appear.

MDBS with mobile hosts
Cooperative heterogeneous database environments with mobile hosts represent a class of federated databases with no constant on-line communication links between component databases.This assumes certain restrictions in accessing the component LDBSs.
There is less of a problem when a constant on-line link between such mobile hosts and stationary networked nodes exists as although the mobile nature of the nodes adds to the complexity of the problem and has its impact on the way such an environment operates, but still, it resembles a conventional MDBS, whereby each of the nodes most of the time has an on-line connection to the rest of the MDBS.
When there is no possibility to have each of such mobile hosts constantly connected to the rest of the MDBS because of various problems (e.g. higher costs of communication link that is not used completely, location outside of the area of coverage for mobile phones, etc), additional problems in maintaining MDBS integrity may arise.There is no longer the possibility to enforce complete integrity, controlling the execution of a distributed transaction.Certain design decisions, transaction and replica control protocols must be put in place in order to ensure MDBS integrity.
There are certain conflicting operational requirements that such an MDBS environment should satisfy.Among them are the following: The requirement for stationary and mobile hosts running LDBSs to preserve their autonomy.Support for distributed transactions as well as local transactions at each host.
Provision of certain level of transparency for global transactions where the mobile nature of some hosts is hidden.
Correct handling of off-line periods with no communication link for the rest of the MDBS as well as tolerance to network failures.

Periodical broadcasting database updates for both mobile and stationary hosts.
There are some approaches and protocols proposed to handle transaction processing for an MDBS with mobile hosts [9,23] that address specific features of mobile computing.

Key integrity in an MDB system
Serialisability as a means of enforcing ACID properties, and special transaction management protocols proposed for MDB systems, ensure data consistency when data is being accessed and updated by a number of concurrent transactions.Hence, the correctness of protocols has been the main theme of research concerned with transaction management for MDBSs.
However, when it comes to consideration of data integrity for heterogeneous MDB systems with no constant communication link, the integrity of primary keys, maintaining of referential integrity of primary key -foreign key links for databases supporting relational data model, becomes an important topic of research.Since transactions in such an environment can have longer life time because of the necessity to wait for periodical on-line connections that Advances in Databases and Information Systems, 1996 are established from time to time, special replica control or reconciliation protocols need to underlie the transaction management mechanism.Clearly, in order to solve the key integrity problems, semantics found in conceptual and logical models of the MDBS must be used.
This paper makes the following contribution: It studies how the semantics extracted from the conceptual model can be used to enforce key integrity.
It studies the types and semantics of entity keys from the data modelling point of view and explains how these affect the length of a transaction life and type of reconciliation protocol needed.
It provides two models for reconciliation for entities with natural and surrogate primary keys.
It introduces the delayed commit protocol to resolve the key integrity problem for entities with surrogate primary keys.
The structure of the paper is as follows.Section 2 discusses the class of MDBSs being the subject of enquiry for this paper, the model thereof and main assumptions.Section 3 is devoted to the types of update operations that a mobile host can initiate in an MDBS.Reconciliation protocol for entities with a natural primary key and delayed commit protocol are discussed in section 4. The last section of the paper represents conclusions.

The Model and Assumptions
This section sets out the framework for the study and describes the class of multidatabase systems being considered for the rest of the paper.We describe assumptions regarding the topology of the multidatabase system and logical schema thereof.For the sake of clarity, we assume that all LDB systems at each node have the same type of data model and this being the relational model.This slight simplification is still fairly close to the reality as the relational model can in most cases be considered as a reasonable common ground for generalisation of the global database schema for a heterogeneous MDBS.
Our assumptions and study encompass two levels of abstraction, namely -conceptual level where we consider modelling entity types (referred to as just entities further on in the paper) and entity instances, and logical level where relations and their tuples are considered, and how the tuples participate in transaction and reconciliation protocols.
We assume that the primary site technique that is based on the idea of distinguished copy [12,13], is used, and it is used over particular type of network topology being the star topology (Fig. 1a).In such a case the central node is the primary site that maintains the primary copy of the MDBS.This assumption can be easily extended to the case of a network with non-mobile hosts (constantly on-line ones) replacing single central node (Fig. 1b).
Star topology for cooperative database environments is very popular when it comes to designing database systems for industry, whereby a headquarter controls operation of its branches, or when there is a central company and a number of agents that operate for the company.It is frequently found these days that such an agent or a regional host is mobile and has certain level of autonomy.
This assumption is very important for the study as while designing reconciliation protocols we shall assume that there is always at most one non constant communication link between a mobile host and the central site.
There are two types of transaction managers that run on different hosts of the MDBS.Each host runs its own local transaction manager (LTM) that is used either to access the LDBS autonomously or to accomplish subtransactions of a global transaction.Global transactions are handled by the global transaction manager (GTM) that can run on the primary host.
Generally, vertical fragmentation takes place between the central host and all or some of the regional hosts, and join is used for re-constructing full relations.In other words, let where n c is the degree of relation R c ) be attributes of a relation schema R c being part of the local database schema for the central node, and A mj i 2 R mj -those for the relation R within the j th mobile host of the system.Then and fA k g = P Kis the primary key of the relation.
Advances in Databases and Information Systems, 1996 Outer union over all regional hosts is used to re-construct the regional part of the relation R: where r(R) denotes the relation instance of R. No horizontal replication occurs between the regional nodes of the database, i.e.
8i; j 2 [1::m] ) r mi (R mi ) \ r mj (R mj ) = ; We shall distinguish between two types of primary keys for entities whose corresponding relations are kept in the database.By natural primary key (NPK) we understand a set of attributes that (i) uniquely identifies an entity instance and (ii) retains its meaning beyond the context of the database, application, or data model thereof.Examples of an NPK can include such attributes as Social Security No, Driver's Licence No, Product ID, Part No, etc. From a strict data modelling point of view not all of them can be classified as natural primary keys as some of them are in fact former surrogate keys (see further) that have become natural keys with time.However, due to their relative stability they can be seen as completely natural.Clearly, it is possible that NPK can be either simple or composite.The existence of a natural primary key is the guarantee that there is a bijection between an entity instance (as a physical real world entity) and a tuple from the corresponding relation in the database.
Surrogate (artificial) primary key [6,18,19] is a key that has no explicit meaning and is frequently assigned a value by the transaction manager or the database engine.An example of an attribute that is not a natural key, could be Transaction Number for a bank transaction.This, in fact, is a surrogate key whose value does not have any explicit meaning beyond the scope of the corporate data model for the bank in question.
We shall assume that transactions are queued on the mobile host until the communication link is re-established and reconciliation begins.

Update Operations For An MDB System With Mobile Hosts
A rigid approach to transaction management for an MDBS with features described in the previous section, dictates that in accordance with the isolation property, any update initiated from a mobile host should be first made invisible for the whole MDBS including the host of its origin.The transaction should be deemed committed and the result of the update made visible when the subtransaction that updates the primary copy successfully commits.Then the MDBS propagates the update trough the other hosts according to its replica control protocol.This would mean that the result of the update is unavailable at least until the time of re-establishing a communication link, that may be totally impractical and may have serious impact on the performance of the MDBS.However, as we shall show further, immediate commit of the local subtransaction may create key integrity problems within the MDBS.Our aim is to devise such reconciliation protocols that would allow to relax the rigid requirements making the results of a transaction available as soon as possible, still avoiding, however, key integrity problems.
In order to reveal problems that update operations may introduce, and observe the impact they may have on the key integrity over the whole MDBS, let's consider different types of updates and the ways they affect reconciliation of an LDBS that resides on a mobile host, and the rest of the MDBS.First, assume that there is an entity RE within the conceptual schema of the cooperative database that is mapped to a relation R within the logical schema, that satisfies the criteria represented in the previous section.Then the following updates are possible: modification of the value for any attribute A mj i 2 R mj of a tuple of R where A mj i 6 2 P K .This might create no integrity problems if the attribute A mj i is also not part of any referential integrity constraints, and no pre-update triggers or RPC are associated with it.If these conditions are met, the corresponding local transaction can be executed straightaway while the rest of the global distributed transaction can be queued for submission to the GTM at a later stage when communication link is restored.The results of the local transaction can be committed immediately, thus, being made available for local use provided there are no constraints that are violated within the global part of the transaction.If there are those that are violated, compensating techniques [21,22] should be used to roll the distributed transaction back.deletion of any tuple from the relation R. Again, this does not create integrity problems provided there are no referential integrity constraints held across the distributed system, whereby even at least one path that this type of constraint traverses, includes non-permanent communication link.In fact, it is clear that this requirement can be relaxed to the degree of having all such constraints that hold over this type of paths, specified with the cascade delete option.Again, the results of the local transaction can be committed and made available immediately, having the rest of the global transaction queued until connection between the mobile host and the rest of the system is restored.insertion of a new tuple into the relation R.This should not create problems for the entities with natural primary keys as there is the bijection between entity instances for the entity RE and tuples of the relation instance r(R).However, a reconciliation for the relations that correspond to entities that have immediate and transitive relationships with RE may be required.Totally different situation occurs if RE has a surrogate primary key.
The concept of a surrogate key in most cases requires that the system allocates a key value for the newly introduced tuple and this value should be unique.In a homogenious MDBS with on-line communication links it would be possible to solve the problem, having a composite primary key consisting, of the key of RE plus the site number.However, this cannot be achieved in a heterogenious cooperative MDBS.This creates integrity problem straightaway as a mobile node cannot immediately synchronise allocation of a unique primary key with the rest of the distributed system.The transaction cannot be regarded partially committed even locally until a unique primary key is assigned.modification of the value for any attribute A mj i 2 R mj of a tuple where A mj i 2 P K .This either should be prohibited (ie automatically replaced by a deletion/insertion pair) or should be treated the same way as an insertion.
In this section we will concentrate on the insertion problems and show how the strict requirements can be relaxed to make a transaction locally committed before the communication link is restored and before the insertion is submitted to GTM.We also show how the semantics is extracted from the conceptual and logical model of the MDBS can be used in enforcing key integrity.For this purpose we assume that GTM as well as LTMs for all mobile hosts have access to some kind of internal representation of the conceptual model (e.g.entity-relationship) for correct reasoning on primary key -foreign key links and propagation of the primary keys of any entity into composite primary keys for other entities connected to it via direct or transitive relationships.Naturally, each LTM can have its own version of the conceptual model, however, all the models must comply with the requirements presented in section 2.

Entities with natural primary keys
For the sake of simplicity we consider here only the first part of the reconciliation protocol, namely: submission of updates from a mobile host to the distributed transaction manager.When a communication link is established, LTM starts reconciliation using the following protocol.
Step 1.Let RE be an entity with the natural primary key K = fA 1k ; A 2 k ; :::; A nk g.LTM for a mobile host isolates the relation instance r(R) that corresponds to the conceptual level entity RE on the logical level, with the relation schema R, where K R. It also searches for relationships between the entity RE and other entities of the conceptual model.
Not all such relationships need to be taken into consideration, however.The only relationships of interest are those through whose schemas the primary key K of R is propagated.Let SE be an entity on the other end of the relationship.Then, only the following relationships need to be considered: 1:1 relationships when the following holds for the cardinalities of the two entities: car(RE) car(SE), and perhaps the total participation constraint holds for RE in this relationship; 1:M relationships where the entity RE participates at the '1' end of the relationship; all M:N relationships.
For all these cases the primary key of RE is normally propagated into the relation schema of SE and forms part of its primary key.
Step 2. LTM issues a query to GTM that runs at the other end of the communication link, to search for a tuple with the same value of the primary key K in the global relation RE.If the tuple does not exist then the reconciliation is simple and the tuple is added to the global relation instance r(R) residing on the central site that maintains the primary copy, and GTM takes care of the fragmentation and replication issues for the rest of the MDBS.If the tuple does exist, then there is no need to insert the tuple again therefore avoiding key violation.The fact that the primary key K is a natural one, guarantees that the two tuples (from both ends of the link) correspond to one and the same entity instance.Reconciliation in this case consists of submitting tuples from all the other relations that were isolated on the previous step for insertion.These relations represent entities that have relationships with RE and their new tuples should be added to the corresponding relations within the primary copy.

Entities with surrogate primary keys
When no natural key exists for the entity, quite popular solution is to use surrogate keys to ensure uniqueness of a tuple being inserted.Usually, the value of a surrogate key is assigned by the transaction manager or database engine and determined at the time of committing a transaction.As a rule, an MDBS provides certain means for that either via supporting special data types (e.g.autoincrement data type in Borland database engine -Paradox, Delphi) or via special mechanisms provided by the engine (e.g.sequences in Oracle DB engine).
In fact, it is frequently found that on the stage of mapping conceptual models to logical database design, the designer makes the decision to use surrogate keys even though a natural primary key exists.This is done to get rid of a long sequence of attributes being the natural key for a relation, where those attributes are inherited from its ancestors through a long chain of 1:M relationships.
Since many of DB engines are ACID compliant and ensure atomicity and isolation for serialisable schedules1 , there is no much of a problem to ensure correct assignment of a surrogate key for a centralised database environment.However, a transaction management mechanism for an MDBS environment that is just ACID compliant cannot ensure key integrity on its own.Hence we propose a delayed commit protocol for entities with surrogate primary keys, that addresses the following two issues: immediate local commit and queuing requests to the global manager to propagate the transaction throughout the rest of the MDBS for modification and deletion types of transactions; assigning locally unique values to the primary keys, immediate local commit and queuing requests to the global manager for insertion type of transactions with subsequent reconciliation of the values of the primary keys at a later stage.
Following is the strategy that the delayed commit protocol exercises (Fig. 2) for a transaction that is initiated between a mobile and the central host when an insert type transaction is being committed.
Step 1.The mobile host H k is about to commit a record.LTM for this host generates a unique surrogate key for the new tuple and commits the subtransaction locally.A request for a global transaction is put in a queue to the global transaction manager.The local subtransaction is committed, therefore, the new record is now visible for LDBS, however, it is not considered fully globally committed as yet.
Step 2. The second stage of the protocol takes place when the mobile host and the central node of the MDBS initiate a communication session.The mobile host replicates all new records that were successfully committed locally since the last communication session between the two hosts.Each record, when replicated, is assigned a global primary key generated by the global transaction manager.The global transaction that is to propagate changes throughout the whole database is stored in the global transactions queue pending acknowledgment from the local host (see next steps).If this set of actions fails, a local roll back is initiated by the LTM on the mobile host.If the value of the primary key has been changed successfully, the mobile host acknowledges GTM that the subtransaction has been successfully completed locally.This triggers the global manager to start propagating changes for the rest of the distributed system.

Conclusion
We have studied key integrity problem for MDB systems with mobile hosts that do not maintain constant communication links.We have shown that there are two distinct types of integrity problems: the one for entities with natural primary keys and the one for entities with surrogate primary keys.It appears that due to the very nature of natural primary keys the key integrity problem reduces to a reconciliation problem for entities that have direct or transitive relationships with the entity in question.Maintaining integrity for entities with surrogate primary keys however, requires special protocol for insertion type transactions.We have presented the delayed commit protocol that preserves key integrity.
The main feature of the delayed commit protocol is that it is uses semantic information from the conceptual model of the MDBS.It requires that this information is made available for LTM for correct propagating of the primary key within LDBS.GTM announces the value of the primary key to the mobile host.8. LTM initiates the primary key change for all associated relations.9. LTM acknowledges GTM of success or fail.10.GTM releases the record from its "pending" status and submits it for further replication.11.LTM submits all changed records (step 8) for replication

Figure 1 :
Figure 1: Examples of network star topology with mobile hosts

Fig. 2 .Step 3 .
State transition diagram for the delayed commit protocol The next step begins when the central host initiates a new subtransaction on the mobile host H k that changes the value of the primary key for every previously locally committed record.A separate subtransaction is initiated for every record.Not does the subtransaction encompass only the relation r(R) where the new tuple was inserted in.LTM uses semantic information from the conceptual schema to update the value of K R for all relations that have direct and transitive relationships with R. If this part of the transaction fails, the whole global transaction rolls back to the state before step 2.

1 .
Mobile host application submits a new tuple.2. LTM commits the record locally.3. LTM puts the record in the global transaction queue.4. The global transaction is submitted to GTM. 5. GTM assigns the primary key attributes a value.6. GTM puts the record in the queue for further replication, assigning it the "pending" status.

Figure 2 :
Figure 2: Delayed commit protocol for entities with surrogate keys