Querying and Updating Constraint Databases with Incomplete Information

The aim of the research described in this paper is to extend work on constraint databases with incomplete information to deductive databases by proposing a theory of indefinite deductive constraint databases (IDCDs). This model provides a useful and natural representation of partial and infinite information. Because our representation of incomplete information is based on constraints, the model extends easily to temporal and/or spatial data, either by defining the domain of the variables over which constraints are defined to be the temporal or spatial domain, or by using, for example, linear arithmetic constraints to represent these domains. We describe how the syntax of the deductive database model can be enhanced with global conditions on incompletely specified constants to naturally represent partial information, and we provide a full declarative, fixpoint and procedural semantics for the language defined. We discuss an as yet unexplored problem with respect to constraint databases: that of updating and maintaining a consistent set of data in the presence of global conditions on incomplete information.


Introduction
Non-traditional applications in domains such as planning and scheduling, decision support, spatial information systems, CAD/CAM and scientific databases require a number of functionalities not traditionally provided by standard database models. These include the need to represent temporal and/or spatial information, the existence of both complete and incomplete representations for data, the ability to retrieve both conditional and definite answers to queries in the face of incomplete information and the ability to represent, query and dynamically update complex interrelated data.
In this extended abstract, we present the main concepts behind the ongoing development of a prototype deductive database which aims to provide this functionality through an integration with constraint handling similar to that proposed by Koubarakis [9] for relational databases. We motivate this work using the following example.
Example 1 Routing and scheduling applications are concerned with collections of entities (e.g. locations, depots) requiring service by a fleet of vehicles. Real-world routing and scheduling applications involve large amounts of data and multi-user access and would therefore benefit from database support. Applications in this domain require the representation of spatio-temporal information: for example, routes can be represented as spatially embedded networks, and the times at which vehicles service locations on the route need to be stored, as well as basic attribute data about these vehicles and entities. Information about these values may be incompletely specified but still needs to be represented and maintained prior to the generation of a schedule. Indeed, even once a schedule has been generated, incompletely specified information may still exist. For example, the actual times at which locations are scheduled may not be known at the outset; however, such details as the latest time by which a location must be serviced, or that location A must be serviced before location B, are known.
In addition to the problem of how to model such information, such representations are not static. For example, new information about the possible times that locations can be serviced may become known at any time, and existing information may be found to be incorrect and need to be deleted.
Analysis of this and similar applications raises several issues not adequately dealt with by existing general purpose systems, in particular the presence of spatio-temporal data represented as continuous values, the presence of incompletely specified values, which may also be continuous, and the interrelated (incomplete) specification of values, for example that one event occurred before another.
There have been numerous attempts to extend existing data models in order to deal with this sort of incomplete information (see, for example, [1,4,8,10]); however, these extensions concentrate either on discretely representable, independent, incomplete values or on definite representations of continuous data. In common with other researchers, we believe that many applications, for example applications involving temporal or spatial reasoning, can benefit from the development of a database model which allows the representation of definite, indefinite, finite and infinite information in a single unifying framework, and that this can best be achieved through an integration of formalisms developed in both Artificial Intelligence and databases. Such an integration will not only provide the expressive power required for such applications, but also the necessary functionalities for more general forms of querying, dynamic updates, concurrency, recovery mechanisms, etc. We see these as necessary functionalities given the large amounts of data which these applications are now generating, the dynamic nature of that data and the need for multi-user access to the data. The approach we take is based on an integration of constraint databases, deductive models for dealing with incomplete data at the level of "value incompleteness" and formalisms for dealing with constraints.
Constraints have been extensively used to represent continuous data, such as temporal intervals or spatial objects, and have recently been integrated into database query languages. In [8] the Constraint Query Language (CQL) scheme was introduced, which directly supports the representation of continuous data. In this scheme a "constraint tuple" is a finite representation of an infinite fact, such as an interval, a line or a region. A traditional data value is a special case of a constraint tuple where the only constraint is equality.
In [9] the CQL scheme was extended for relational databases to allow the representation of global constraints on existentially quantified variables appearing in constraint tuples. This extension enables complex partial information such as that described above to be represented.
The aim of the research described in this paper is to extend the work reported above to deductive databases by proposing a theory of indefinite deductive constraint databases (IDCDs). This model provides a useful and natural representation of partial and infinite information. Because our representation of incomplete information is based on constraints, the model extends easily to temporal and/or spatial data, either by defining the domain of the variables over which constraints are defined to be the temporal or spatial domain, or by using, for example, linear arithmetic constraints to represent these domains.
The contributions of this paper are as follows:
We describe how the syntax of the deductive database model can be enhanced with global conditions on incompletely specified constants to naturally represent partial information and provide a full declarative, fixpoint and procedural semantics for the language defined.
We present a deductive query language which can be used to reason with information represented in this model and address the issue of bottom-up evaluation in this framework.
We discuss an as yet unexplored problem with respect to constraint databases: that of updating and maintaining a consistent set of data in the presence of global conditions on incomplete information.
This paper reports on work in progress and some of the ideas are presented through examples only.
Advances in Database and Information Systems, 1997

Syntax and Semantics
Work on representing partial information in deductive databases describes their semantics as a set of databases, each one corresponding to a possible world, with more information required to know which one corresponds to the actual state of the world [2]. This imaginary scenario is unsuitable for direct implementation due to the prohibitive expense of storing multiple databases [13], particularly where the domain for any partial information is continuous. A more compact representation of the candidate databases is required for the sake of efficiency. Several formalisms currently exist; however, these are inadequate for representing the sorts of application domains in which we are interested.
In this section we lay the foundations for the implementation of a deductive constraint database with indefinite information, based on an integration of semantic concepts from deductive databases with null values [10,4], constraint databases [9,10] and constraint logic programming [5,6,7], by developing the model of indefinite deductive L-databases where L, the parameter, is a first order constraint language. In this model IC-constants represent incompletely specified data. Global conditions specify what is known about these IC-constants at assertion time.

Language syntax
The Indefinite Deductive L-Constraint Database (IDC(L)) model extends the concepts of definite deductive databases by allowing constraint terms known as C-constants in facts. These constants can have one of two interpretations.
1. Definite C-constants (DC-constants) represent infinite (continuous) data, for example, temporal intervals or spatial attributes such as lines or regions. Bounds on these constant values are represented by local constraints. Using constraints in this way is the subject of most work on constraint databases and we only describe the details of how DC-constants are used and interpreted where necessary to highlight the semantics of IDC(L).
2. Indefinite C-constants (IC-constants) represent values which are known to exist but which are imprecisely or incompletely specified. The partial specification of such constant values is represented by global conditions.
We distinguish C-constants from ordinary variables because of their different interpretation.
We assume familiarity with deductive and constraint databases and introduce the following extended definitions.
A term can be any one of the following: a variable, a constant, a DC-constant or an IC-constant. An atom takes the form $p(t_1, \ldots, t_n)$ ($n \ge 1$) where $p$ is a predicate and the $t_i$ are terms; an atom may be accompanied by a (possibly empty) conjunction of constraints $c$, written $p(t_1, \ldots, t_n; c)$. A ground atom (or fact) is an atom which only contains constants.
An IDC(L) clause takes the general form $A_0 \leftarrow C \wedge A_1 \wedge \ldots \wedge A_n; c$ where each of the $A_i$ is an atom, $C$ is a conjunction of query constraints and $c$ is a constraint expression. $A_0$ is the head of the clause, $C \wedge A_1 \wedge \ldots \wedge A_n; c$ is called the body of the clause and $c$ is called the domain specification of the clause. A goal is a program clause with no head. If there are no IC- or DC-constants in a clause then $c = \emptyset$.
We use the notational conventions of representing normal constants and predicate symbols as strings beginning with a lower case character and general variables as strings beginning with an upper case character. We require that atoms with the same predicate symbol are of the same arity, that is, they have the same number of arguments. DC-constants are denoted by lowercase letters of the English alphabet (for example, $x$, $y$, $z$, $t$), possibly subscripted, and IC-constants by letters from the Greek alphabet (for example, $\omega$).
A constraint expression $c$ is a conjunction of closed L-formulas (constraints) which place bounds on the values of an IC-constant $\omega_i$ (resp. DC-constant $x_i$). The partially solved form of $c$ is called the range of $\omega_i$ (resp. $x_i$).
An atom $A$ is an infinite L-constraint fact $p(t_1, \ldots, t_m; c)$ if and only if $\exists t_i \in p$ such that $t_i$ is a DC-constant and all other terms in $p$ are either normal constants or DC-constants. Each infinite L-constraint fact $p$ is associated with a conjunction of L-constraints $l_p \in c$ over DC-constants appearing in $p$, called the local condition of $p$.
An atom $A$ is an indefinite L-constraint fact if and only if $\exists t_i \in p$ such that $t_i$ is an IC-constant and all other terms in $p$ are either normal constants, DC-constants or IC-constants. Each indefinite L-constraint fact $p$ is associated with a boolean combination of L-constraints $g_p \in c$ called the global condition of $p$. The collection of global conditions defined for predicates in $D$ is known as the global constraint store $G$. It is assumed that $G$ is always consistent.
An IDC(L) database $D$ consists of a finite set of clauses that satisfy the following conditions:
Each infinite or indefinite constraint fact appearing in $D$ is ground, that is, it contains no ordinary variables.
If an IC-constant $\omega_i$ appears in two different clauses, then the range of $\omega_i$ is the same for both clauses.
Each fact of $D$ is ground.
Every variable occurring in the head of a rule must also occur in the body of the same rule.
Each DC- and IC-constant occurring in a condition in the body of a rule must also occur in a predicate in the body of the same rule.
We also require all predicates with the same name to be of the same arity. Conditions in the body of a rule cannot be used to manipulate the values of DC- and IC-constants stored in the database, unless they appear in an update clause. Rather, they are collected and solved during query evaluation to restrict the facts generated for a particular rule. The following example highlights the syntax of IDC(L) where L is instantiated with the arithmetic constraint domain Z. The example shows the use of IC-constants to represent indefinite information only.
Example 2 Let us consider a portion of a simple genealogy database. This database contains information about possible ancestors and the year of their birth represented by the atom person(First, Last, DoB).
Years are integer values whose range is drawn from the integer arithmetic constraint domain denoted by Z. Names of individuals are always fully known but we may not know, or may only have partial information about, dates of birth.

person(arthur, hayward, 1896).
This first ground atom is a normal deductive fact which states that the date of birth of Arthur Hayward is known to have been 1896. The following atoms contain indefinite information:

person(abraham, hayward, ω1).
person(alfred, hayward, ω2).
person(alice, hayward, ω3).

Conditions on the IC-constants appearing in the Z-constraint facts provide all currently known information about those values and are held in the global constraint store $g_{person}$ shown below:

$g_{person}: 1860 \le \omega_1,\ \omega_1 \le 1870,\ \omega_1 - 15 \ge \omega_2,\ \omega_2 \le 1845,\ \omega_3 \le 1825,\ \omega_2 - \omega_3 \ge 20.$

The global condition states that the date of birth of Abraham Hayward is known to lie between 1860 and 1870, and that the date of birth of Alfred Hayward is assumed to be at least 15 years prior to that of Abraham and at least 20 years later than that of Alice. Finally, Alice Hayward is known to have been born no later than 1825. We also define a rule that a person is a possible ancestor of another person if they have the same surname and were born at least 15 years apart.

possible_ancestor(First1, First2, Last) ← person(First1, Last, X), person(First2, Last, Y), Y ≥ X + 15.

In this rule the query constraint $Y \ge X + 15$ is used to restrict the derivation of facts using the rule.
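The effect of the global condition on this rule can be explored with a brute-force sketch. It is only an illustration, not the evaluation method of IDC(L): it assumes the relation symbols of the global condition read as in the prose (with the bound on Alfred taken as an upper bound), truncates the unbounded domain Z to a hypothetical window of years, and the names `YEARS`, `global_condition`, `definitely` and `possibly` are invented for the example.

```python
from itertools import product

# Hypothetical finite window of birth years; the constraint domain Z is unbounded.
YEARS = range(1800, 1871)

def global_condition(w1, w2, w3):
    """Assumed reading of the global condition (w1 Abraham, w2 Alfred, w3 Alice)."""
    return (1860 <= w1 <= 1870      # Abraham born between 1860 and 1870
            and w1 - 15 >= w2       # Alfred at least 15 years before Abraham
            and w2 <= 1845          # assumed upper bound on Alfred's birth year
            and w3 <= 1825          # Alice born no later than 1825
            and w2 - w3 >= 20)      # Alfred at least 20 years after Alice

# Each satisfying assignment is one possible world for the three IC-constants.
WORLDS = [w for w in product(YEARS, repeat=3) if global_condition(*w)]

def definitely(cond):
    """True if the condition holds in every possible world."""
    return all(cond(*w) for w in WORLDS)

def possibly(cond):
    """True if the condition holds in at least one possible world."""
    return any(cond(*w) for w in WORLDS)

# Body constraint of the rule for (alfred, abraham): Y >= X + 15 with X = w2, Y = w1.
alfred_abraham = lambda w1, w2, w3: w1 >= w2 + 15
# A condition that holds in only some worlds: Alfred born no later than 1840.
alfred_early = lambda w1, w2, w3: w2 <= 1840

print(definitely(alfred_abraham))
print(possibly(alfred_early), definitely(alfred_early))
```

Enumerating worlds is only workable because the window is small; the point of the constraint representation is that the same answers can be obtained from the constraints without listing the worlds.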

Language semantics
IDC(L) has a full declarative, procedural and fixpoint semantics based on extending existing database semantics with IC-constants and conditions and taking account of the indefinite nature of the data through the provision of both definite and possible model semantics. This extension is heavily based on approaches to representing nulls in deductive databases (see, for example, [10]).
The declarative semantics is based upon an extension of the Herbrand universe with IC-constants. This is used to generate a 'quasi' Herbrand Base with IC-constants. Information about IC-constants is derived from the solution universe of the database. This is the set of all interesting constraints achieved by projecting out each IC-constant from the global constraint set and provides a lower and upper bound on the value of each IC-constant (known as the solved form of its associated set of conditions). By attaching relevant constraints to predicates appearing in the Herbrand Base we achieve the generalised Herbrand Base with indefinite values. From this we generate two types of interpretations: definite interpretations, which represent facts true in every possible world of the database, and possible interpretations, which represent values true in at least one possible world of the database. A minimal model semantics is generated from these models by taking their intersection.
We begin by extending some definitions for definite constraint databases.

Definition 1 (Valuation)
A valuation is a mapping $C \to M$ which maps an IC-constant to an element of the constraint domain L, a DC-constant to an interval over the constraint domain and a constraint $c$ to a closed L-formula of the structure $D$. If $X$ is a set of facts, each denoted by $A$, containing IC- and/or DC-constants, then ...
The constrained atom $p(t_1, \ldots, t_m; c)$ is an atom with distinct IC- and DC-constants as arguments and a conjunction of global and local constraints on these constants. Each constrained atom with DC-constants represents a possibly infinite set of facts which satisfy the local conditions on those constants.
Each constrained atom with IC-constants represents a "set of possible worlds" which could be obtained by assigning a single value to each constant which satisfies the global constraint(s) associated with it. That is, each IC-constant, together with its global constraint(s), is a finite representation of a possibly infinite set of disjunctive values.
The semantics of both IC-constants and DC-constants is defined by the set of their domain instances (see [] for the introduction of this notion in another setting).

Definition 3 (Domain instances)
The (set of possible) domain instances $[p(t_1, \ldots, t_m; c)]$ of a constrained atom $p(t_1, \ldots, t_m; c)$ is defined as

$[p(t_1, \ldots, t_m; c)] = \{\, p(t_1, \ldots, t_m)\theta \mid \theta \text{ is a solution of } c \,\}$

Thus, the true (as yet unknown) value of an IC-constant is an instantiation to a single value from the set of possible domain instances for that constant. The true (known) value of a DC-constant, on the other hand, is the set of domain instances for that constant.
The set of possible worlds for a predicate $p(t_1, \ldots, t_m; c)$, therefore, consists of the set of possible domain instances obtained by replacing each IC-constant with each possible instantiation according to the global constraints. The meaning of $p(t_1, \ldots, t_m; c)$ is given by the constraint domain.
Definition 4 (Solution set) The solution set of a predicate $p(t_1, \ldots, t_m; c)$ is the set of possible worlds for that predicate obtained as described above.
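Over a small finite domain, the domain instances of a constrained atom can be listed directly. The following is a minimal sketch under stated assumptions: the class `CConst`, the function `domain_instances` and the finite `domain` argument are hypothetical names introduced here, and the constraint is given as a Python predicate rather than an L-formula.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class CConst:
    """Hypothetical marker for a C-constant (IC or DC) inside an atom."""
    name: str

def domain_instances(pred, terms, constraint, domain):
    """Ground instances of pred(terms; constraint) over a finite domain."""
    names = sorted({t.name for t in terms if isinstance(t, CConst)})
    instances = set()
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if constraint(env):  # env is a solution of the constraint
            ground = tuple(env[t.name] if isinstance(t, CConst) else t
                           for t in terms)
            instances.add((pred, ground))
    return instances

# p(w1, a) with the constraint 6 <= w1 <= 9 over the window 0..19.
instances = domain_instances("p", (CConst("w1"), "a"),
                             lambda env: 6 <= env["w1"] <= 9, range(0, 20))
print(sorted(instances))
```

If `w1` is read as an IC-constant, exactly one of the listed instances is the true but unknown fact; if it is read as a DC-constant, all of them hold.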

Definition 5 (Equivalence)
Let $A$ be the set of constrained atoms appearing in a database $D$. We define a preorder $\preceq$ on $A$ as follows: $p(t_1, \ldots, t_m; c_1) \preceq p(t_1, \ldots, t_m; c_2)$ iff $[p(t_1, \ldots, t_m; c_1)] \subseteq [p(t_1, \ldots, t_m; c_2)]$.
The equivalence induced by $\preceq$ on the set of atoms is denoted $\equiv$. That is, two constraint atoms are equivalent if their solution sets are equivalent. We next extend the notion of substitution to constraint solving as in [7]. ... where the following conditions hold:
1. Each $v_i$ is either a variable or an IC-constant, and $v_i \ne v_j$ for $i \ne j$.
2. Each $v_i$ does not appear in the corresponding $t_i$.
3. If $v_i$ is a variable, then $t_i$ is an arbitrary term. If $v_i$ is an IC-constant then $t_i$ can only be a normal constant or another IC-constant.
The former case is known as a possible constrained substitution, denoted by $\theta_P$, and the latter as a definite constrained substitution, denoted by $\theta_D$.

Declarative Semantics
In this section we define a declarative semantics for IDC(L) by generalising the existing semantics for constraint databases based on ideas presented in work on value-based incomplete databases. In this semantics, rather than generating subdatabases, we represent alternative models of the world through constrained constants. We develop the semantics using the following example where L is the domain of integers. We start by defining the declarative semantics without taking full consideration of the structure provided by the constraints. This enables us to show the meaning of the incomplete database in terms of the actual values of each possible world represented. This semantics is similar to that defined for definite constraint databases but results in a semantics which does not have a unique minimal model. We follow this initial definition by a semantics which takes full account of the constraint structure and generates a unique minimal model. This second semantics does not require us to deal with the full complexity of disjunctive logic programs and also enables us to provide a tractable semantics for infinite indefinite data.

Definition 9 (Generalised Herbrand Universe)
The universe $U_D$ of a database $D$ with constraint constants is the set of all ground terms which can be formed from all constants and DC-constants appearing in $D$ and the solution sets of all IC-constants appearing in indefinite facts of $D$, wrt equivalence.
Example 4 The universe of the set of clauses $C$ in $D$ is as follows: ...
... are true under $I$. A fact $p(t_1, \ldots, t_m; c)$ is true under $I$ iff $\forall\, p(t_1, \ldots, t_m; c),\ \exists\, p(t_1, \ldots, t_m)\theta \in I$. That is, a ground fact which represents a possible domain instance for $p$ appears in $I$. A rule $A_0 \leftarrow A_1 \wedge \ldots \wedge A_n \wedge C_1 \wedge \ldots \wedge C_m$ is true under $I$ iff whenever $A_1 \wedge \ldots \wedge A_n \in I$ then it also holds that $A_0 \in I$. A constraint atom $A$ is true iff it is true in all minimal possible models. Ground facts of $D$, denoted $Ground(D)$, are exactly the ones which are true in all possible minimal models. That is, if $F$ is a ground fact, $D \models F$ iff $F$ is true in every minimal model for $D$.

Each possible minimal model is a possible world for D according to the global conditions of D. However, generating each possible world would be intractable, particularly when the underlying domain is continuous or little (or no) information is available for the IC-constants appearing in the database.
In the following examples we show that by taking full account of the constraint structure it is unnecessary to generate each possible world explicitly in order to generate models for $D$; instead, we can continue to model domain instances using constraints. The advantage of this approach is that we obtain a unique minimal model for the database.
Definition 13 (Generalised Herbrand Universe with IC-constants) The Generalised Herbrand Universe with IC-constants $U^C_D$ is the set of all constants, DC-constants (with their associated intervals) and IC-constants appearing in $D$.
The $HB^{IC}_D$ is not adequate as a basis to describe the semantics of IDC(L) databases as it contains no information about the incompleteness of IC-constants. We also require that the range of each predicate be associated with the $HB^{IC}_D$. We call this the solution universe of $D$ [10].
Definition 15 (Solution Universe) The solution universe $s$ of a predicate $p$ is the set of constraints which define its range. The solution universe $S$ of a database $D$ is therefore $S = \{\, c \mid c \in consistent(g_p) \,\}$.
Example 9 The solution universe of the predicate $p(\omega_2, \omega_3)$ is s = f5 ! 2 ; ! 1 ! 2 , 6 ; ! 2 12; 12 !3 ; 9 ! 3 ; ! 3 12; ! 4 ! 3 + 2 ; 11 ! 3 ; ! 4 14:5 ! 2 ; ! 2 12; ! 2 ! 3 g This being the set of constraints necessary to define its range.
Definition 16 (Generalised Herbrand Base with Indefinite Values) Let $D$ be a database, $HB^{IC}_D$ be its Herbrand Base with IC-constants and $S$ its solution universe. The Generalised Herbrand Base with Indefinite Values for database $D$, $HB^{I}_D$, is the set of ordered pairs $(p(t_1, \ldots, t_m), s)$ where $p(t_1, \ldots, t_m) \in HB^{IC}_D$ and $s \in S$. We call elements of $HB^{I}_D$ extended atoms.

along with the globally defined binary constraints.
A program with indefinite values can be viewed as a compact means of representing a set of possible worlds.
For example, if $D$ had the atoms $p_1(\omega_1;\ 1 \le \omega_1, \omega_1 \le 2)$, $p_2(\omega_2;\ 3 \le \omega_2, \omega_2 \le 4)$ and $p_3(5; \emptyset)$, meaning that either 1 or 2 is in predicate $p_1$, either 3 or 4 is in predicate $p_2$ and 5 is in predicate $p_3$, we can view $D$ as representing four possible worlds: $D_1 = \{p_1(1), p_2(3), p_3(5)\}$; $D_2 = \{p_1(2), p_2(3), p_3(5)\}$; $D_3 = \{p_1(1), p_2(4), p_3(5)\}$; $D_4 = \{p_1(2), p_2(4), p_3(5)\}$. These can be obtained by instantiating each indefinite value $\omega_i$ with a constant from the constraint domain which satisfies any binary conditions. Each possible world can therefore be identified with a particular substitution [10], but it is not necessary to carry out this substitution in order to represent the semantics of the database.
... $p(\omega_2, 15;\ 6 \le \omega_2, \omega_2 \le 12)\}$ where $I_1$ and $I_3$ are possible interpretations and $I_2$ is a definite interpretation. As a consequence of allowing two types of interpretations, two types of model can be derived for a clause $c$ or set of clauses $s$.
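The four possible worlds of this small example can be enumerated mechanically, and the facts common to all of them are the ones a definite interpretation would contain. The sketch below assumes, as the prose states, that the first IC-constant ranges over {1, 2} and the second over {3, 4}; the names `ranges`, `possible_worlds` and `binary_ok` are hypothetical.

```python
from itertools import product

# Assumed ranges of the two IC-constants, taken from the prose of the example.
ranges = {"w1": [1, 2], "w2": [3, 4]}

def possible_worlds(ranges, binary_ok=lambda env: True):
    """Instantiate every IC-constant; keep assignments passing binary conditions."""
    names = list(ranges)
    worlds = []
    for values in product(*(ranges[n] for n in names)):
        env = dict(zip(names, values))
        if binary_ok(env):
            worlds.append(frozenset({("p1", env["w1"]),
                                     ("p2", env["w2"]),
                                     ("p3", 5)}))
    return worlds

worlds = possible_worlds(ranges)
# Facts true in every possible world.
definite_facts = frozenset.intersection(*worlds)

print(len(worlds))
print(sorted(definite_facts))
```

Passing a `binary_ok` predicate such as `lambda env: env["w1"] + 2 <= env["w2"]` would drop the worlds that violate a binary global condition, which is why explicit enumeration grows quickly and the constraint form is preferred.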
Definition 18 (Possible model) $I$ is a possible model $M^P_D$ for a clause $c$ (or set of clauses $s$) iff for each fact $F$ in $I$, $S \models_P F$ (that is, $F$ satisfies $S$).
Definition 19 (Definite model) $I$ is a definite model $M^D_D$ for a clause $c$ (or set of clauses $s$) iff for each fact ...
Definite (resp. possible) models have a unique minimal model $MM^D_D$ (resp. $MM^P_D$) which is the intersection of all the definite (resp. possible) models of $D$.
The minimal model therefore consists of two parts, the data part and the constraint part, and defines the declarative meaning of a constraint database $D$. That is, it contains all constrained atoms which are definite (resp. possible) consequences of $D$. For any atom $A$, $D \models_D A$ (resp. $D \models_P A$) if $A$ is a member of the minimal model $MM^D_D$ (resp. $MM^P_D$).
Example 13 $I_1$ is a possible model of $D$, $I_2$ is a definite model of $D$ and $I_3$ is not a model.

Fixpoint Semantics
Using either of the declarative approaches to generate the model theoretic semantics of indefinite constraint databases would be inefficient as it would require generating all models of D and then finding the intersection.
However, IDC(L) databases have a fixed point semantics which generates extended atoms. The fixpoint semantics of IDC(L) starts from an interpretation containing the base extended atoms of $D$. A function $T_D$ (resp. $T_P$) is applied to an interpretation $I$ to obtain the interpretation which is its immediate consequence. The immediate consequence is generated by applying the rules in $D$ to an interpretation $I$.
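The iteration described above can be sketched as a naive least-fixpoint loop. This is an illustration only and not the operator defined for IDC(L): extended atoms are simplified to tuples carrying a set of constraint labels, the labels `g1` and `g2` stand in for real constraints, and the rule set (a transitive `anc` relation over `par` facts) is a hypothetical example.

```python
def least_fixpoint(base, immediate_consequence):
    """Apply the consequence function until no new extended atoms appear."""
    current = frozenset(base)
    while True:
        nxt = current | frozenset(immediate_consequence(current))
        if nxt == current:
            return current
        current = nxt

def step(atoms):
    """One application of: anc(X,Y) <- par(X,Y); anc(X,Z) <- par(X,Y), anc(Y,Z).

    The conditions of the body atoms are unioned onto the derived atom.
    """
    derived = set()
    for (p, x, y, c1) in atoms:
        if p != "par":
            continue
        derived.add(("anc", x, y, c1))
        for (q, y2, z, c2) in atoms:
            if q == "anc" and y2 == y:
                derived.add(("anc", x, z, c1 | c2))
    return derived

base = {("par", "a", "b", frozenset({"g1"})),
        ("par", "b", "c", frozenset({"g2"}))}
model = least_fixpoint(base, step)
print(len(model))
```

The derived atom `anc(a, c)` carries both labels, mirroring the idea that conditions are collected along a derivation rather than solved fact by fact.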

Procedural semantics
Before we can define a procedural semantics for IDC(L) we need to extend the definitions of unification and resolution for definite deductive databases.

Definition 20 (Unifiers for extended atoms)
A set of constrained atoms $T = \{A_1, \ldots, A_n\}$ is unifiable if there is a constrained substitution $\theta$ such that $A_i\theta = A_j\theta$ for $1 \le i, j \le n$ and $A \ne \emptyset$. $\theta$ is called a unifier of $T$. A unifier $\theta$ of $T$ is an mgu iff for each unifier $\sigma$ of $T$, there is a substitution $\gamma$ such that $\sigma = \theta\gamma$.
Depending on which form of constrained substitution is used we say that a unifier is either a definite unifier $\theta_D$ or a possible unifier $\theta_P$.

Example 15
Let $A_1 = p(Y, \omega_1)$ and $A_2 = p(a, X)$ and let $c_1 = \{6 \le \omega_1, \omega_1 \le 9\}$ and $c_2 = \{X = 8\}$. $\{A_1, A_2\}$ is definitely unifiable since $\theta_D = \{Y \leftarrow a, \omega_1 \leftarrow X\}$ is a definite unifier. If $c_2 = \{5 \le X, X \le 10\}$ then $\theta_D$ is not a definite unifier but is a possible unifier $\theta_P$ since $c_1$ is consistent with $c_2$.
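The distinction drawn in this example can be checked with interval arithmetic. The sketch below rests on an assumed reading: the binding is treated as definite when every value allowed by the second constraint set is also allowed by the first, and as possible when the two sets merely overlap. Constraints are simplified to closed integer intervals, and the names `entails`, `consistent` and `classify` are hypothetical.

```python
def entails(inner, outer):
    """True if every value in interval `inner` also lies in interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def consistent(a, b):
    """True if the two closed intervals share at least one value."""
    return max(a[0], b[0]) <= min(a[1], b[1])

def classify(c1, c2):
    """Classify the binding of an IC-constant bounded by c1 to a term bounded by c2."""
    if entails(c2, c1):
        return "definite"
    if consistent(c1, c2):
        return "possible"
    return "none"

print(classify((6, 9), (8, 8)))    # c2 = {X = 8}
print(classify((6, 9), (5, 10)))   # c2 = {5 <= X, X <= 10}
print(classify((6, 9), (10, 12)))  # disjoint ranges
```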

Definition 21 (Definite (possible) SLD-(L)-derivation step)
A definite (resp. possible) SLD(L)-derivation from the goal $A$ on database $D$ proceeds by selecting a clause $H \leftarrow A_1 \wedge \ldots \wedge A_n \wedge C_1 \wedge \ldots \wedge C_l$ in $D$ such that $\theta$, with $\theta \in \{\theta_D, \theta_P\}$, is the mgu of $A$ and $H$. An SLD(L)-derivation of a goal $A$ is a finite or infinite sequence $G_0 = A, G_1, \ldots, G_n$ of goals, a sequence $C_1, \ldots, C_n$ of variants of clauses of $D$ and a sequence $\theta_1, \ldots, \theta_n$ of mgus, such that each $G_{i+1}$ is derived from $G_i$ and $C_{i+1}$ using $\theta_{i+1}$.

Definition 22 (SLD(L)-refutation)
An SLD(L)-refutation of $P \cup \{A\}$ is a finite SLD(L)-derivation of $P \cup \{A\}$ where the last element is a goal of the form $\leftarrow c$, where $c$ is the answer constraint of the derivation. All other finite SLD(L)-derivations are finitely failed.

Answering queries in IDC(L)
IDC(L) extends existing query optimisations and evaluation algorithms for constraint databases to take account of the possible incompleteness of data. In particular, IDC(L) allows three types of queries: definite queries, denoted by a dedicated symbol preceding a goal; possible queries, denoted by a second such symbol; and conditional queries, represented by unannotated goals.
Definite queries have a "truth in all possible worlds" semantics. Under this semantics a fact "satisfies" a query if every assignment of fully specified values to IC-constants appearing in the fact, consistent with the global constraint set, satisfies any conditions appearing in the query. To compute the answer set to this query requires checking whether the global conditions for that fact entail the query conditions. This corresponds to the definite model of the database.
Possible queries have a "truth in at least one possible world" semantics. Under this semantics, a fact "satisfies" the query if at least one assignment of fully specified values to IC-constants in the fact, consistent with the global condition, satisfies the constraint in the deductive relation. To compute the answer set to this type of query we need to check the satisfiability of the conjunction of constraints present in the global constraint set and the deductive relation. This is equivalent to a possible model for the query.
Conditional queries also have a possible model semantics; however, answers to conditional queries are accompanied by the conditions under which they are true. This is computed by integrating any query conditions with the global conditions for each fact and then simplifying the resulting constraint set.
The above query types require that the rule application step in bottom-up evaluation is replaced by the following steps:
1. For every goal atom appearing in the body of the rule, choose a fact contained in the corresponding relation.
2. If the query is a definite query, check that any conditions associated with the fact entail any conditions appearing in the body of the rule; otherwise, check that the two sets of conditions are consistent.
3. If step 2 is satisfied, unify the constants appearing in the consistent facts with variables appearing in the head of the rule and use variable elimination techniques to project relevant constraints onto any IC-constants which appear in the newly generated fact.
Once the set of answers has been generated, if the query is conditional, query conditions are integrated with the resulting conditions on facts appearing in answers to the query and simplified.
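The three query types can be contrasted on a deliberately simplified case. The sketch below assumes each fact carries a single IC-constant bounded by a closed integer interval and that the query condition is also an interval; entailment and consistency then reduce to interval containment and overlap. The function `answer`, its `mode` argument and the sample bounds are hypothetical and are not the evaluation algorithm of IDC(L).

```python
def answer(facts, query, mode):
    """Answer an interval query over facts with interval-bounded IC-constants.

    facts: {name: (lo, hi)} bounds from the global condition on each fact.
    query: (lo, hi) bounds requested by the query condition.
    mode:  "definite", "possible" or "conditional".
    """
    qlo, qhi = query
    answers = {}
    for name, (lo, hi) in facts.items():
        meet = (max(lo, qlo), min(hi, qhi))
        if meet[0] > meet[1]:
            continue  # inconsistent conditions: no world satisfies the query
        if mode == "definite":
            if qlo <= lo and hi <= qhi:  # fact condition entails query condition
                answers[name] = (lo, hi)
        elif mode == "possible":
            answers[name] = (lo, hi)
        elif mode == "conditional":
            answers[name] = meet  # simplified condition under which it holds
        else:
            raise ValueError(mode)
    return answers

# Illustrative bounds on years of birth; the query asks for births in 1865..1900.
facts = {"abraham": (1860, 1870), "arthur": (1896, 1896)}
query = (1865, 1900)

print(answer(facts, query, "definite"))
print(answer(facts, query, "possible"))
print(answer(facts, query, "conditional"))
```

Only the fully known fact is a definite answer here; the partially known one is a possible answer, and the conditional mode reports the narrowed range under which it satisfies the query.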
Prior to evaluation, we utilise various extensions to rewriting techniques in order to reduce the number of facts generated during query evaluation. These include:
Query independent constraint propagation, which occurs at assertion time and is outlined in a subsequent subsection.
Extended rule adornment, which identifies constant and constraint bindings generated by a query.
Query dependent constraint propagation and sideways information passing, which uses the extended adornments to rewrite rules so that subgoals with bound arguments are evaluated before unbound subgoals, and rule and global conditions are placed at the earliest point at which they can be solved.

Updating IDC(L) databases
The simple extensions to query evaluation outlined in the previous section rest on a number of assumptions:
That the information which we have about IC-constants is finitely representable.
That all interesting constraints on IC-constants are explicitly represented so that it is only necessary to collect the constraints relevant to a fact together for query evaluation to occur.
That all constraints stored in the database are consistent.
Existing approaches to incomplete constraint databases do not address the issue of how this state is to be achieved in a dynamic database. Our research has therefore been concerned to explore the update issue wrt IDC(L) databases. The problems raised can be characterised as follows:

The presence of global conditions means that data is interrelated and that updates to one piece of data may have an effect on another.
The existence of the global constraint store means that the issue of belief revision needs to be taken into account when discussing updates.
Additional complexity is introduced if updates with conditions are allowed. These may require that an update is performed only on a subset of the possible worlds of the database.
These problems mean that the standard set of update operators (insert, delete and modify) is not adequate. Instead we extend the work of Abiteboul and Grahne and of Winslett on incomplete relational databases to develop the Dynamic IDC(L) (DIDC(L)) language.

The CDUL Language - Background
In general, the goal of updating a database D with a fact U is a database that contains all the information in D as well as the information needed to derive the new fact or to prove its falsity. The problem tackled here is slightly different and can be characterised as follows: given a normal indefinite constraint database D consisting of an EDB, an IDB and a constraint program GCS, a partially solved database D' of D, and an update U to apply to D, compute an update U' such that U' is a partially solved program with respect to D. The method used to achieve U' is to identify those parts of the GCS affected by an update and to leave unchanged those parts not affected. In IDC(L) this requires that, when a new constraint is added to the GCS, all interesting constraints which can be derived from it are also inserted into the GCS and, when a constraint is removed from the GCS, any constraints which have been derived from it are also removed. This approach means that the following categories of update can be distinguished.
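The insert-and-derive, delete-and-retract behaviour of the GCS can be illustrated with a small Python sketch. It is a simplification under our own assumptions: the only constraints are of the form a ≤ b between named terms, the only derivation rule is transitivity, and the derived set is recomputed from the explicitly asserted constraints rather than maintained incrementally. The names closure and ConstraintStore are hypothetical.

```python
Constraint = tuple[str, str]   # (a, b) is read as "a <= b"


def closure(base: set[Constraint]) -> set[Constraint]:
    """All constraints derivable from `base` by transitivity of <=."""
    derived = set(base)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(derived):
            for (c, d) in list(derived):
                if b == c and a != d and (a, d) not in derived:
                    derived.add((a, d))
                    changed = True
    return derived


class ConstraintStore:
    """Keeps asserted constraints and everything derived from them."""

    def __init__(self) -> None:
        self.base: set[Constraint] = set()   # explicitly asserted
        self.gcs: set[Constraint] = set()    # asserted plus derived

    def insert(self, c: Constraint) -> set[Constraint]:
        """Assert `c`; return every constraint newly present in the store."""
        self.base.add(c)
        new = closure(self.base)
        added = new - self.gcs
        self.gcs = new
        return added

    def delete(self, c: Constraint) -> set[Constraint]:
        """Retract `c`; return every constraint that disappears with it."""
        self.base.discard(c)
        new = closure(self.base)
        removed = self.gcs - new
        self.gcs = new
        return removed
```

Inserting 1840 ≤ w and then w ≤ w1 also adds the derived constraint 1840 ≤ w1; deleting w ≤ w1 afterwards removes the derived constraint again while leaving 1840 ≤ w untouched.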

Updates vs revisions of the database
Firstly, we distinguish between updates and revisions of a database, where an update is an action which changes the truth value of a fact in the extensional database (EDB) of an IDC(L) database and a revision is an action which alters the information that is stored about a partially specified fact in the EDB. For example, suppose we wished to carry out the action U = "Insert person(frank, hayward, ω); 1840 ≤ ω where person(arthur, hayward, _)". Assuming there is some fact which can be substituted for the where part of the clause, then if person(frank, hayward, ω) is not true with respect to the EDB of D prior to the insert and is true afterwards, then U is an update. On the other hand, if person(frank, hayward, ω) is already true, then U represents a revision of an already known fact. In order to avoid confusion, in the remainder of this paper we call the act of changing the contents of a database a dynamic action, where such an action may be either an update or a revision.
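The distinction can be made concrete with a short Python sketch. It is illustrative only and assumes a representation of our own: EDB facts are tuples, a pattern uses None as a wildcard for an argument such as an IC-constant, and a fact counts as already true if some EDB fact matches the head pattern. The names matches and classify_action are hypothetical.

```python
def matches(pattern: tuple, fact: tuple) -> bool:
    """True if `fact` agrees with `pattern`; None matches anything."""
    return len(pattern) == len(fact) and all(
        p is None or p == f for p, f in zip(pattern, fact)
    )


def classify_action(edb: set, head: tuple, where: tuple) -> str:
    """Classify an insert as an update or a revision.

    The action applies only if some EDB fact can be substituted for the
    where part. It is a revision if the head fact is already present in
    the EDB, and an update otherwise.
    """
    if not any(matches(where, fact) for fact in edb):
        return "not applicable"
    if any(matches(head, fact) for fact in edb):
        return "revision"
    return "update"
```

For the example above, inserting a fact about Frank Hayward is an update when only Arthur Hayward is recorded, and a revision once a fact about Frank already exists.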

Passive vs active updates and revisions
Secondly, we distinguish between passive and active actions. A passive action changes the truth value or revises the known information about a single explicitly listed fact. An active action, on the other hand, causes the update or revision of other, existing atoms. It should be noted that in this paper we do not consider active actions resulting in the update of other atoms; this category of actions would allow us to model actions on rules and is considered as one element of future research. Instead, we only consider active actions which result in the revision of another atom. For example, in the action U = "Insert person(frank, hayward, ω); 1840 ≤ ω where person(arthur, hayward, _)" only the fact person(frank, hayward, ω) is affected. However, the action U = "Insert person(frank, hayward, ω); ω ≤ ω_1 where person(arthur, hayward, ω_1)" affects not only the fact person(frank, hayward, ω) but also the fact person(arthur, hayward, ω_1), as the introduction of the constraint ω ≤ ω_1 implies an additional constraint ω_1 ≥ 1840. We normally require that active updates or revisions only result in the addition of data about other atoms if that data is consistent with what is already known. However, we recognise that the data could be inconsistent yet still represent the true state of the world. This situation is equivalent to modification in standard databases and is handled in a similar fashion. That is, we would require that the existing information is first deleted before the update/revision is carried out. This could, of course, be done automatically.
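The effect of an active revision on another atom, together with the consistency requirement, can be sketched in Python. The sketch assumes, for illustration only, that each IC-constant carries a closed interval of possible values and that the binary constraints are all of the form a ≤ b; the function name propagate is hypothetical and the procedure is a plain bounds propagation rather than the solver used in the prototype.

```python
import math


def propagate(bounds, order):
    """Tighten interval bounds using constraints a <= b.

    `bounds` maps an IC-constant to (lo, hi); `order` is a list of pairs
    (a, b) read as a <= b. Returns the tightened bounds, or None if some
    interval becomes empty, i.e. the constraints are inconsistent.
    """
    work = {k: list(v) for k, v in bounds.items()}
    for a, b in order:
        work.setdefault(a, [-math.inf, math.inf])
        work.setdefault(b, [-math.inf, math.inf])
    changed = True
    while changed:
        changed = False
        for a, b in order:
            if work[a][0] > work[b][0]:     # lower bound of a flows to b
                work[b][0] = work[a][0]
                changed = True
            if work[b][1] < work[a][1]:     # upper bound of b flows to a
                work[a][1] = work[b][1]
                changed = True
    if any(lo > hi for lo, hi in work.values()):
        return None
    return {k: tuple(v) for k, v in work.items()}
```

Starting from 1840 ≤ w and adding w ≤ w1 raises the lower bound of w1 to 1840, which is the additional constraint on the second atom described above. If w1 were already known to be at most 1830, the function would return None and the revision would be rejected as inconsistent.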

Definite vs possible world actions
A further distinction is made between definite and possible updates. This distinction arises because IDC(L) has two semantics: a definite world semantics, which generates interpretations of clauses based on whether they are entailed by EDB facts, and a possible world semantics, which generates interpretations for clauses based on their consistency with the EDB. Either of these semantics can be used to check the pre-condition(s) of the update.

Every world and some world actions
In this section we distinguish between actions which are performed on every world represented by a constraint and those which are performed on only a subset of possible worlds. For example, suppose U = "Insert person(frank, hayward, ω); 1840 ≤ ω where person(arthur, hayward, X), X ≤ 1850". If a possible world semantics is used to evaluate the precondition of this update then the body is true even though it is possible for Arthur to have been born after 1850. The head of the update is therefore only true in those possible worlds where Arthur was born before 1850, and is false in those possible worlds where he was born after 1850. Using the definite world semantics for the precondition would result in the head of the update being definitely true in all possible worlds.

The Update Language - Syntax
The primitive manipulations definable on a constraint database program are general insertion, general deletion, positive revision and negative revision of (constraint) atoms. These are denoted by the symbols {+, −, ++, −−} respectively. For the sake of readability, in this paper we will refer to ground constraint atoms of the form p(X) ← c. using the form p(X); c or, where the only constraints are equalities, as p(v).
Definition 23 A basic action U is of the form p_0(X); c ← p_1(X_1), ..., p_n(X_n), C_{m+1}, ..., C_l. together with a mode, which denotes either possible or definite, and an action operator. The head of the action consists of p_0(X) (also denoted by A), a predicate to be inserted or deleted, and c, a constraint expression to be inserted or deleted. Either p_0(X) or c may be empty, but not both. The body of the action, p_1(X_1), ..., p_n(X_n), C_{m+1}, ..., C_l (also denoted by B), is a pre-condition which consists of p_1(X_1), ..., p_n(X_n), called the predicate pre-condition of U, and C_{m+1}, ..., C_l, called the constraint pre-condition of U. If both c and C are empty then the action is equivalent to a standard (relational) update.
Definition 24 Each update to an IDC database must satisfy the following conditions:
1. The head constraint expression c is a conjunction of constraints between two terms t_1 and t_2, where at least one t_i is an IC-constant appearing in p_0. The second t_i may be any of the following: an IC-constant appearing in p_0, a variable appearing in p_1(X_1), ..., p_n(X_n), or a constant from the constraint domain denoted by L.
2. Each C_i is a constraint between two terms t_1 and t_2, where at least one t_i is a variable appearing in one of the EDB predicates; the other t_i may be any of the following: a variable appearing in p_1(X_1), ..., p_n(X_n), or a constant from the constraint domain denoted by L.
Basic actions are combined declaratively into sequences, which allows actions involving more than one atom to be expressed.
Definition 25 Let {U_1, ..., U_n} (n ≥ 1) be a set of basic actions. Then U_1, ..., U_n is an action operation.
The execution of the action operation U_1, ..., U_n corresponds to executing the basic action U_1 first; then U_2 is executed, and so on, until the execution of the basic action U_n.

Example 16
The following example actions highlight the syntax of CDUL(L). Each example is followed by an informal description of its semantics. In the examples, each ω_i represents an IC-constant from the constraint domain dePCL [9].
+person(harold, wilks, ω); 1840 ≤ ω ← person(alfred, mccomb, _).
+person(robert, mccombe, ω); 1840 ≤ ω, ω ≤ ω_2 ← person(alfred, hayward, ω_3).
The first update is a passive insert which results in a database where person(harold, wilks, ω); 1840 ≤ ω is definitely true, assuming that prior to the update person(alfred, hayward, _) is either definitely or possibly true with respect to the EDB (depending on whether the update is a definite or possible update) and person(harold, wilks, ω); 1840 ≤ ω is definitely false. This is achieved by adding the data part of the update to the EDB and generating the solution universe for the IC-constant. The second update is an active insert, the result of which is a database where person(robert, mccombe, ω); 1840 ≤ ω, ω ≤ ω_2 is definitely true. It is achieved by inserting person(robert, mccombe, ω_1) into the EDB and merging the associated solution universe with that for ω_3. This insertion will only succeed if the new constraint is consistent with the existing constraints in the EDB, and it results in the addition of new constraints on the range of ω_3 (e.g. 1840 ≤ ω_3).
++ 1830 ≤ X ← person(alice, mccombe, X).
++ X ≤ ω_2, 1830 ≤ X ← person(alice, mccombe, X), person(robert, mccomb, ω_1).
The first revision is passive and provides additional information on the IC-constant which is substituted for the variable X. The result, assuming a substitution person(alice, mccombe, ω_3) exists for the body predicate, is to add the constraint 1830 ≤ ω_3 to the solution universe for the substituted fact (if it is consistent) and to propagate any constraints which can be derived from the existing global conditions and the new constraint.
The second revision is active and its result, again assuming person(alice, mccombe, X) and person(robert, mccomb, ω_1) are true with respect to the EDB, is to add the conditions ω_3 ≤ ω_1, 1830 ≤ ω_3 to the global conditions of the two pre-condition predicates, as long as the conditions being inserted are consistent with the conditions in the store.
−− person(robert, mccombe, ω_1) ← person(alfred, hayward, ω_2).
This is a deletion. If the global condition (should there be one) for person(robert, mccombe, ω_2) does not contain any constraints involving IC-constants which do not appear in the data part of the constraint fact, the deletion is passive and the result, assuming both person(alfred, hayward, ω_2) and person(robert, mccombe, ω_1) are true with respect to the EDB, is to delete person(robert, mccombe, ω_1) from the EDB and to delete the solution universe for ω_1. On the other hand, if the global condition for person(robert, mccombe, ω_1) does contain constraints involving IC-constants which do not appear in the data part of the constraint fact, then the deletion is active and requires that not only the data for the constraint fact person(robert, mccombe, ω_1) be deleted but also any constraints which were derived from its insertion.
−− ω_3 ≤ ω_1, 1825 ≤ ω_3 ← person(alice, mccombe, ω_3), person(robert, mccomb, ω_1).
This is an active negative revision. Assuming the predicates in the body and constraints in the head of the action are true prior to the action, this action results in the deletion of those constraints and any information which has been inferred from them through constraint propagation.
+person(robert, mccombe, ω_1) ← person(alfred, hayward, ω_2), ω_2 ≤ 1862.
This last example highlights the every world and some world aspects of IDC(L). The effect of the update is for the action to be performed only in the subset of possible worlds denoted by the rule condition. In the example this means that the inserted atom and associated conditions are only true in worlds where Robert McCombe was born after 1840.

For the purposes of query evaluation and optimisation, the database D is partitioned into three sets: the extensional database D_B, the set of ground and constrained user atoms in D; the intensional database D_D, an IDC(L) program; and the global constraint set GCS_D, the set of constraints on IC-constants appearing in D_B. The predicates occurring in D_B and D_D are divided into two disjoint sets: the D_B-predicates, which are all those occurring in D_B, and the D_D-predicates, which are all those occurring in D but not in D_B. We require that the head predicate of each rule in D be a D_D-predicate; D_B-predicates may occur in D_D but only in the body of clauses. The global constraint store (GCS) is also partitioned into two disjoint sets: the set of unary constraints UC_D and the set of binary constraints BC_D.
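The partitioning can be sketched in Python as follows. The representation is an assumption made for illustration: atoms are tuples whose first element is the predicate name, a rule is a pair of a head atom and a list of body atoms, a constraint is a triple (left, operator, right), and IC-constants are strings beginning with "w". We also read a unary constraint as one mentioning exactly one IC-constant and a binary constraint as one mentioning two. The names is_ic_constant and partition_database are hypothetical.

```python
def is_ic_constant(term) -> bool:
    """Assumption: IC-constants are strings such as "w1" (for ω_1)."""
    return isinstance(term, str) and term.startswith("w")


def partition_database(facts, rules, gcs):
    """Split predicates into D_B / D_D sets and the GCS into UC_D / BC_D.

    Raises ValueError if a rule head uses a D_B-predicate, since head
    predicates are required to be D_D-predicates.
    """
    db_preds = {fact[0] for fact in facts}
    rule_preds = set()
    for head, body in rules:
        if head[0] in db_preds:
            raise ValueError(f"rule head {head[0]!r} is a D_B-predicate")
        rule_preds.add(head[0])
        rule_preds.update(atom[0] for atom in body)
    dd_preds = rule_preds - db_preds

    def ic_count(constraint) -> int:
        left, _, right = constraint
        return int(is_ic_constant(left)) + int(is_ic_constant(right))

    unary = {c for c in gcs if ic_count(c) == 1}
    binary = {c for c in gcs if ic_count(c) == 2}
    return db_preds, dd_preds, unary, binary
```

A rule whose body refers to an EDB predicate is accepted, whereas a rule that redefines an EDB predicate in its head is rejected.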

Definition 6 (Consistent) [7] Let c_1 and c_2 be sets of constraints; then c_1 and c_2 are consistent iff D ⊨ c_1 ∧ c_2.
Definition 7 (Entailment) [7] Let c_1 and c_2 be two constraints; c_1 entails c_2 iff D ⊨ c_1 → c_2.
For the purposes of this paper the tests for consistency and entailment are always combined with a function infer which computes from c_1 and c_2 a new set of simplified constraints. The above definitions provide the basis for two alternative definitions of constrained substitution.
Definition 8 (Constrained substitution) A constrained substitution is a finite set of the form {t_1/v_1, ..., t_m/v_m};

Definition 10 (Generalised Herbrand Base) The base B_D is the set of all ground definite facts which can be generated from the predicates in D and the universe U_D.
Example 5 The base of the set of clauses C is as follows: B_D = {p(9, 9), p(9, 10), p(9, 11), ..., p(15, 15)}
Definition 11 (Generalised Interpretation) An interpretation I is a subset of B_D containing all facts which

Example 6
Possible interpretations for the example database are:
I_1 = {p(9, 10), p(10, 9), p(9, 9)}
I_2 = {p(9, 10), p(10, 10), p(9, 10)}
I_3 = {p(9, 10), p(10, 11), p(9, 11), p(13, 15)}
Definition 12 (Generalised Possible Model) An interpretation which is a possible solution to a clause C is called a Generalised Possible Model (GPM_D) of D. D has no unique minimal model. Instead, a minimal possible model is one for which no other model of D is strictly smaller.
..., 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} DC
The Generalised Herbrand Base with IC-constants HB^IC_D of D is the set of all atoms generated from the predicates in D and the universe U^C_D. We note that when there is no IC-constant in D, HB^IC_D reduces to the (classical) Herbrand Base.
Example 8 A sample of the base HB^IC_D is as follows
An extended interpretation I is a subset of HB^IC_D containing all the extended atoms which are true under I. An extended fact p(t_1, ..., t_m); c ∈ D is true under I iff p(t_1, ..., t_m); c ∈ I. A rule R = A_0 ← A_1 ∧ ... ∧ A_n ∧ C_1 ∧ ... ∧ C_o. is true under I iff whenever A_1 ∧ ... ∧ A_n are true in I (A_i = p(t_1, ..., t_m); c) it also holds that A_0 is true in I, where the mode of truth is drawn from {D, P}. If the mode is D then I is called a definite extended interpretation; otherwise, if the mode is P, I is called a possible extended interpretation.
Example 12 Three extended interpretations of the database D are:
Advances in Database and Information Systems, 1997