Query Language Primitives for Programming with Incomplete Databases

We study the problem of choosing a suitable collection of primitives for querying databases with disjunctive information. Theoretical foundations for querying such databases have been developed in [11, 12]. The main tool for querying disjunctive information has come to be known as normalization. In this paper we show how these theoretical results can lead to practical languages for querying databases with disjunctive information. We discuss a collection of primitives that one may want to add to a language in order to ask a variety of queries over incomplete databases (including existential and optimization queries). We describe a new practical and easily implementable technique for partial normalization, and show how to combine it with the known technique for space-efficient normalization. As a result, we demonstrate that with very little added to the language, one can express a variety of primitives using just one general polynomial-space iterator. We discuss some practical implications, including nondeterminism of the resulting language, and the implementation project.


Introduction
We study querying databases in which incomplete information is represented via disjunctions. Such databases often arise in the design and planning areas, as was first noticed in [9]. For certain objects whose values are not known at present, a database may contain a number of possible values. Choosing one possibility for each instance of disjunctive information gives us a possible world described by an incomplete database. In practical applications, most queries the user would like to ask are queries against collections of possible worlds, rather than against the representation of those possible worlds by means of disjunctive information. That is, an additional transformation of the data stored in a database is needed in order to answer such queries.

The need for distinguishing two classes of queries against databases with disjunctive information is known in the literature, cf. [9, 10, 12, 16]. Queries that ask questions about the representation of possible worlds are called structural, whereas conceptual queries ask questions about the data encoded by the information in a database.

For example, consider a template used by a designer (shown in figure 1). It may indicate that part D consists of two subparts, A1 and A2, and A1 is built from B1 and B2 and B3, while B1 is a or b or c, B2 is d or e, and B3 is f or g. The subpart A2 has a similar structure. In figure 1, vertical and horizontal lines represent parts that must be included, while the sloping lines represent possible choices. It must be stressed that the smallest subparts shown in figure 1 may in turn have very complex structure and involve incomplete information.

With the example in figure 1 we can illustrate the difference between structural and conceptual queries. A structural query may ask about the number of possible choices for B1: this information can be directly obtained from the database. Conceptual queries ask questions about possible completed designs. Most typically, these are existential queries (is there a completed design that costs less than m dollars?) or optimization queries (find the most reliable design).

The complexity of conceptual queries was studied in [9, 10], where a coNP-completeness result was proved. Then tight upper bounds on the number of possible worlds encoded by databases with disjunctions were obtained in [12]. Roughly, if a database has size $n$, the size of the collection of possible worlds encoded by it is bounded above by $n \cdot 1.45^{n}$. Thus, answering conceptual queries is generally very expensive; nevertheless, they do arise in practice and one needs mechanisms for answering them.
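To make the distinction concrete, the following sketch encodes subpart A1 of the template as a small dictionary of alternatives. It is written in Python purely for illustration; the names `A1`, `COST`, `structural_choice_count`, `possible_worlds` and `exists_design_cheaper_than`, as well as the cost figures, are assumptions introduced for this example and are not part of the language discussed in the paper.

```python
from itertools import product

# Hypothetical encoding of subpart A1: each required component maps to
# the list of alternatives recorded for it in the database.
A1 = {"B1": ["a", "b", "c"], "B2": ["d", "e"], "B3": ["f", "g"]}

# Made-up costs, used only to illustrate an existential query.
COST = {"a": 5, "b": 2, "c": 4, "d": 3, "e": 1, "f": 2, "g": 6}


def structural_choice_count(template, part):
    """Structural query: read the number of alternatives off the stored data."""
    return len(template[part])


def possible_worlds(template):
    """Conceptual view: fix one alternative for every component."""
    names = sorted(template)
    for combo in product(*(template[name] for name in names)):
        yield dict(zip(names, combo))


def exists_design_cheaper_than(template, limit):
    """Existential conceptual query over all completed designs."""
    return any(sum(COST[v] for v in world.values()) < limit
               for world in possible_worlds(template))


worlds = list(possible_worlds(A1))
```

The structural query only inspects the stored lists, whereas the existential query has to range over all 3 · 2 · 2 = 12 completed designs of A1.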

A collection of tools for answering conceptual queries was developed in [12] and further investigated in [11]. These tools have come to be known under the name of normalization, and the collections of all possible worlds as normal forms. A normalized database is a collection of all possible worlds encoded by a database; a conceptual query is simply a structural query on a normalized database. In [12], a simple algorithm to compute normalized databases was given. However, it required exponential space.

That solution was refined in [11], where a polynomial-space normalization mechanism was presented. It was achieved by reusing space for possible worlds, and processing them one at a time. This requires keeping a special structure, called an annotated object, to indicate choices for all instances of disjunctive information in a database. A new primitive called norm, based on this idea, was suggested in [11]. It allows more control over the process of normalization. For example, it can stop iterating if a condition is satisfied. This has the potential of speeding up existential queries. However, the solution of [11] is still far from what we need in practical problems. There are at least two reasons for this.

Most importantly, a programmer may want a larger collection of primitives suitable for various kinds of queries. For example, if a normal form is so large that producing all its elements is infeasible, one may want to set a time limit and attempt to find an entry either satisfying a given condition, or optimizing a criterion for a given time. Moreover, one may want a mechanism for resuming this process from the point where it was stopped. In the case of optimizing criteria over an extremely large normal form, one may want to randomize this process, trying possible worlds from different "areas".

Some of the disjunctions may not be involved in conceptual queries. For instance, in the design example above, the designer may decide that the reliability of part A2 is irrelevant, and try to optimize the reliability of part A1. In current query evaluation methods, this would involve normalizing the whole object. So if part A2 has a complex structure, a lot of redundant computation will be done. Thus, we need tools for partial normalization that avoid such unnecessary computations. The solution of [11] was based on a notion of rewriting that is rather hard to grasp, and therefore very hard to incorporate into a query language.
Database Programming Languages, 1995

The main goal of this paper is to use the theoretical results of [11, 12] to come up with a collection of query language primitives suitable for a variety of conceptual queries against databases with disjunctive information; in particular, we want to address the shortcomings mentioned above. The main contributions are summarized below.
1. We define the concept of a partial normal form which represents incomplete possible worlds. That is, some of the disjunctions are still allowed in possible worlds. Our concept of partial normal form is less general but much more intuitive than that of [11] and can be easily incorporated into a query language.
2. We generalize the normalization mechanism in two aspects. First, we make it work with both normal forms and partial normal forms. Second, its output includes a special data structure, called an annotated object, that allows us to resume the normalization process from the point where it was stopped.
3. The normalization mechanism we present in this paper is suitable for extending the language with a number of primitives that are useful in various kinds of conceptual queries; moreover, as we shall show, it is easy to construct new primitives for new applications in a uniform way. For some applications, such as optimizing criteria over very large sets of possible worlds, we have to settle for operations with nondeterministic semantics. This is the price to pay for making the language more practical.
4. We briefly discuss the implementation of the operations presented in this paper. It is done as a library in OR-SML [7], the system for querying databases with disjunctive information.
Let us give a simple example to explain the gist of our approach. With each object, we associate an annotated object that indicates the choices made for each instance of disjunctive information that is relevant to the query. The idea of annotation is illustrated by the second picture in figure 1, where an arrow indicates the choice that was made. In this example we assume that a query only concerns A1 (for instance, what is the most reliable configuration of A1?). Hence, the subobject corresponding to A2 is not annotated. Note that simply picking an element from each disjunctive collection is not enough to list all possible worlds, as we must also know which ones have been looked at. For this, we translate collections (bags, or multisets in this paper) into lists, and mark each subobject with a label, indicating its type and whether all possible subworlds it encodes have been looked at.

In the example in figure 1, we assume the order of elements in collections to be "from left to right". The D node receives the (P, T) label. Here P stands for "pair", and T is true: there are still possible worlds to look at. The label of the A2 node is (I, F). Here I is "initial": we do not consider possible worlds encoded by this subobject. Hence, F (false) means that there are no additional objects that A2 may encode. The arrows point at the elements of disjunctive collections that are to be chosen. Since two arrows point to the last elements (in the lists), they are labeled by F. The key to the polynomial-space normalization is the algorithm that takes an annotation and produces the "next" one. In our example, the next annotation is produced by shifting the first arrow one position right (to point at b), and resetting the two other arrows by making them point at d and f. Also, they will be labeled by T because they will no longer be pointing at the last element.
To formalize this intuitive notion of annotation, we need a formal way of distinguishing instances of disjunctive information. Our approach to representation of disjunctive information is based on [9, 12, 15]: to distinguish ordinary sets from collections of disjunctive possibilities, we call the latter or-sets and use $\langle\ \rangle$ to denote them. In the design example, A1 can be represented as a set or multiset $\{B1, B2, B3\}$, while B1 is an or-set $\langle a, b, c \rangle$. Or-sets have two distinct representations. With respect to structural queries, or-sets behave like sets, but with respect to conceptual queries, an or-set denotes one of its elements. For example, $\langle 1, 2 \rangle$ is structurally a two-element set, but conceptually it is an integer that equals either 1 or 2.

A language for sets and or-sets was designed in [12] and refined in [11]. We use it here as an ambient language. Note that we use the version based on bags (multisets) rather than sets. This is necessary because keeping duplicates is very important for the normalization process [11]. Our ambient language contains standard languages for nested bags, such as BALG [5, 6] and BQL [13, 14], as its sublanguages. To obtain the corresponding results for sets, one can use the techniques of [11] in a straightforward way, so here we only present results for bags.
Organization. We define normal forms, partial normal forms, and the ambient language, and prove the generalized normalization theorem for partial normal forms in section 2. Annotated objects, the space-efficient normalization algorithm and a general programming primitive for iterating over partial normal forms are presented in section 3. Extending the language with a variety of normalization primitives based on the general iteration scheme is described in section 4. A brief description of the implementation project is given in section 5. Concluding remarks are given in section 6.

Normalization revisited
In this section we define our ambient language, the Nested Bag–OrSet Algebra NBOA, and explain the concept of normalization. We also give a new definition of partial normalization that is suitable for use in a query language, and is more intuitive than the one given in [11].

Types and Objects. Types of objects are given by the following grammar:

$t ::= b \mid \mathit{unit} \mid t \times t \mid \{|t|\} \mid \langle t \rangle$

Here $b$ ranges over a collection of base types such as integers (type int), booleans (type bool) and reals (type real). Type unit has one value denoted by (). Values of the product type $t \times t'$ are pairs $(x, y)$ where $x$ has type $t$ and $y$ has type $t'$. Values of the bag type $\{|t|\}$ (or-set type $\langle t \rangle$) are finite bags (or-sets) of values of type $t$.

Any object containing or-sets is also called an or-object. Any type that uses the $\langle\ \rangle$ constructor is called an or-type. Empty or-sets $\langle\rangle$ mean inconsistency. Handling empty or-sets was discussed in [12], and we do not touch it here, assuming throughout the paper that no object contains an empty or-set subobject $\langle\rangle$.
Normal forms and partial normal forms. First, following [12], we define the rewrite system (TRS) on types:

$s \times \langle t \rangle \to \langle s \times t \rangle$
$\langle s \rangle \times t \to \langle s \times t \rangle$
$\langle\langle t \rangle\rangle \to \langle t \rangle$
$\{|\langle s \rangle|\} \to \langle \{|s|\} \rangle$

We use the notation $s \to t$ if $s$ rewrites to $t$ in zero or more steps. A normal form (type) is a type that cannot be rewritten any further. The skeleton $sk(t)$ is defined as $t$ from which all or-set brackets have been removed. That is, $sk(b) = b$, $sk(t \times t') = sk(t) \times sk(t')$, $sk(\{|t|\}) = \{|sk(t)|\}$ and $sk(\langle t \rangle) = sk(t)$.

Lemma 1 ([12]) The rewrite system (TRS) is Church-Rosser and terminating; hence, every type has a unique normal form. For every or-type $t$, $\langle sk(t) \rangle$ is its normal form. □
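The rewrite system and Lemma 1 can be tried out on small types. The sketch below is only an illustration in Python: the tuple encoding of types and the names `sk`, `rewrite_once` and `normal_form` are assumptions made for this example. It applies the four rules until none is applicable and lets one compare the outcome with $\langle sk(t) \rangle$.

```python
# Types as nested tuples (an encoding assumed for this sketch):
# ("base", name), ("unit",), ("prod", t1, t2), ("bag", t), ("or", t).

def sk(t):
    """Skeleton: the type with all or-set brackets removed."""
    tag = t[0]
    if tag in ("base", "unit"):
        return t
    if tag == "prod":
        return ("prod", sk(t[1]), sk(t[2]))
    if tag == "bag":
        return ("bag", sk(t[1]))
    if tag == "or":
        return sk(t[1])
    raise ValueError(tag)


def rewrite_once(t):
    """Apply one rule of the rewrite system somewhere in t, or return None."""
    tag = t[0]
    if tag == "prod":
        s, u = t[1], t[2]
        if u[0] == "or":                      # s x <t>  ->  <s x t>
            return ("or", ("prod", s, u[1]))
        if s[0] == "or":                      # <s> x t  ->  <s x t>
            return ("or", ("prod", s[1], u))
    if tag == "or" and t[1][0] == "or":       # <<t>>    ->  <t>
        return t[1]
    if tag == "bag" and t[1][0] == "or":      # {|<s>|}  ->  <{|s|}>
        return ("or", ("bag", t[1][1]))
    # No rule applies at the root: try the subterms.
    for i, child in enumerate(t[1:], start=1):
        if isinstance(child, tuple):
            r = rewrite_once(child)
            if r is not None:
                return t[:i] + (r,) + t[i + 1:]
    return None


def normal_form(t):
    """Rewrite until no rule applies (terminates by Lemma 1)."""
    while (r := rewrite_once(t)) is not None:
        t = r
    return t


INT, BOOL = ("base", "int"), ("base", "bool")
example = ("bag", ("prod", ("or", INT), ("or", BOOL)))   # {| <int> x <bool> |}
```

For `example`, the rewriting ends in the or-set of bags of int-bool pairs, which is exactly `("or", sk(example))`.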
Intuitively, objects of type $sk(t)$ are those encoded by objects of type $t$. For example, if an incomplete design is stored as an object of type $t$, then the completed designs represented by it have type $sk(t)$. One can also assume that certain disjunctions may still be allowed in the conceptual representation, for the following reason. If a conceptual query asks only for possibilities encoded by certain disjunctions, others should not be unfolded in order to answer the query: that would be a redundant computation. Since the normalization process is very expensive, redundant computations may be too costly and may disallow some queries that are in fact answerable.

To provide a mechanism for partial unfolding, we define the concept of partial normalization. The intuition behind partial normalization is the following. We treat certain subtypes (perhaps involving or-sets) as base types and perform the usual normalization. This way those subtypes are not affected and consequently some of the disjunctions are not unfolded. To state this precisely, let $s[t/p]$ be $s$ in which the subtype at position $p$ is replaced by $t$, and let $s[t/t']$ be $s$ in which every occurrence of the subtype $t'$ is replaced by $t$. Let $s_p$ denote the subtype of $s$ at position $p$ and let $b_1, b_2, \ldots$ be uninterpreted base types.

Definition. Let $s$ and $t$ be two types, not involving $b_1, b_2, \ldots$. Then $s$ is called a partial normal form of $t$ if there exist $n \geq 0$ positions $p_1, \ldots, p_n$ in type $t$, no $p_i$ dominating $p_j$ for $i \neq j$, and two types $s'$ and $t'$ such that

1) $t' = t[b_1/p_1, \ldots, b_n/p_n]$;
2) $s'$ is the normal form of $t'$;
3) $s = s'[t_{p_1}/b_1, \ldots, t_{p_n}/b_n]$.

The definition can be read as a three-step procedure. We first replace the subtypes at the $p_i$'s with $b_i$'s, then normalize the type, and then restore the subtypes at the $p_i$'s in place of the $b_i$'s. Note that a type may have more than one partial normal form, but only one normal form.
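The three steps of the definition (protect, normalize, restore) can be sketched on the type level as follows. This is an illustrative Python fragment under several assumptions: types are encoded as nested tuples, a position is a tuple of child indices, the placeholder names `_b0, _b1, ...` are invented here, and step 2 uses the shortcut of Lemma 1 (the normal form of an or-type is the or-set of its skeleton) instead of running the rewrite system.

```python
# Types as nested tuples (assumed encoding):
# ("base", name), ("unit",), ("prod", t1, t2), ("bag", t), ("or", t).

def sk(t):
    """Skeleton: drop all or-set brackets."""
    if t[0] == "or":
        return sk(t[1])
    return (t[0],) + tuple(sk(c) if isinstance(c, tuple) else c for c in t[1:])


def contains_or(t):
    return t[0] == "or" or any(
        contains_or(c) for c in t[1:] if isinstance(c, tuple))


def subtype_at(t, pos):
    for i in pos:
        t = t[i]
    return t


def replace_at(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace_at(t[i], pos[1:], new),) + t[i + 1:]


def substitute(t, mapping):
    """Replace every placeholder base type by the subtype it stands for."""
    if t in mapping:
        return mapping[t]
    return (t[0],) + tuple(
        substitute(c, mapping) if isinstance(c, tuple) else c for c in t[1:])


def partial_normal_form(t, positions):
    """positions: child-index paths, none of which dominates another."""
    saved, t_prime = {}, t
    for k, pos in enumerate(positions):
        b_k = ("base", f"_b{k}")               # step 1: protect the subtype
        saved[b_k] = subtype_at(t, pos)
        t_prime = replace_at(t_prime, pos, b_k)
    # Step 2: normal form of t', via Lemma 1 when t' is an or-type.
    s_prime = ("or", sk(t_prime)) if contains_or(t_prime) else t_prime
    return substitute(s_prime, saved)           # step 3: restore


PART = ("base", "part")
A = ("bag", ("or", PART))      # a bag of or-sets of parts
D = ("prod", A, A)             # D = A1 x A2, as in the design example
```

Protecting position `(2,)` (the A2 component) yields an or-set of pairs whose first component is fully unfolded while the second still contains its or-sets; with no protected positions the result is the ordinary normal form.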
Our next goal is to define the concepts of normal form and partial normal form on objects. Intuitively, an object $x$, not involving disjunctions, is in the normal form of an or-object $y$, written as $x \lessdot y$, iff it is in the conceptual representation of $y$. For partial normal forms we define the relation $x \lessdot [\![ y : t, s ]\!]$, meaning that $x$ is in the conceptual representation of $y$ of type $t$ at type $\langle s \rangle$. That is, $x$ of type $s$ can be viewed as a representation of $y$ under unfolding of those disjunctions that are to be unfolded in order to transform $y$ into an object of type $\langle s \rangle$. It can also be viewed as an incomplete possible world for $y$. The formal definition of both versions of $\lessdot$ is given in figure 3, where $S_n$ denotes the group of permutations on $\{1, \ldots, n\}$.

Proposition 3 1) Suppose that for an object $y$ of type $t$ and an object $x$ there is a derivation, according to the rules of figure 3, for $x \lessdot [\![ y : t, s ]\!]$. Then $x$ is of type $s$. Moreover, either $s = t$, or $\langle s \rangle$ is a partial normal form of $t$.
2) Suppose that for some object $y$ of type $t$ there is a derivation for $x \lessdot y$. Then $x$ is of type $sk(t)$. □

Definition. 1) For any object $X$, its normal form $\mathit{nf}(X)$ is defined as the or-set $\langle x_1, \ldots, x_n \rangle$ of all objects $x_i$ such that $x_i \lessdot X$.
2) For any object $X$ of type $t$, its partial normal form over type $\langle s \rangle$, $\mathit{pnf}(X, s)$, is defined as the or-set of all $x$ of type $s$ such that $x \lessdot [\![ X : t, s ]\!]$.

Note that $\mathit{nf}(X)$ and $\mathit{pnf}(X, s)$ are always finite. Furthermore, $\mathit{nf}(X)$ can be alternatively defined as $\mathit{pnf}(X, sk(t))$ if the or-object $X$ is of type $t$.

Ambient language and normalization theorems. Normalization theorems provide us with a list of operations that can be applied to an object until the normal form is produced. We need a language that contains these operations. We adopt the framework of [12] based on [2, 3]. The operators and their most general types are given in figure 4.

Base: $x \lessdot x$, for $x$ of base type
Pair: $\dfrac{x_1 \lessdot y_1 \quad x_2 \lessdot y_2}{(x_1, x_2) \lessdot (y_1, y_2)}$
Bag: $\dfrac{x_i \lessdot y_i, \ i = 1, \ldots, n}{\{|x_1, \ldots, x_n|\} \lessdot \{|y_1, \ldots, y_n|\}}$

Figure 3: Rules for $\lessdot$
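As an illustration of normal forms on objects, the sketch below enumerates the or-free objects encoded by an or-object and collects them without duplicates. It is a Python approximation under stated assumptions: pairs are Python tuples, bags are Python lists, the class `OrSet` and the helpers `worlds`, `canon` and `nf` are names invented for this example, and all worlds are materialized at once (so it corresponds to the exponential-space approach, not to the iterator of section 3).

```python
from itertools import product


class OrSet:
    """An or-set <x1, ..., xn>: conceptually, exactly one of its items."""

    def __init__(self, *items):
        self.items = list(items)


def worlds(x):
    """Yield every or-free object in the conceptual representation of x."""
    if isinstance(x, OrSet):
        for item in x.items:
            yield from worlds(item)
    elif isinstance(x, (tuple, list)):
        parts = [list(worlds(c)) for c in x]
        for combo in product(*parts):
            yield combo if isinstance(x, tuple) else list(combo)
    else:
        yield x                      # base value


def canon(x):
    """Hashable canonical form, so that bags compare up to reordering."""
    if isinstance(x, list):
        return ("bag",) + tuple(sorted((canon(e) for e in x), key=repr))
    if isinstance(x, tuple):
        return ("pair",) + tuple(canon(e) for e in x)
    return x


def nf(x):
    """Normal form as a set of canonical objects (duplicates eliminated)."""
    return {canon(w) for w in worlds(x)}
```

For the bag `[OrSet(1, 2), OrSet(1, 2)]` the enumeration produces four choices, but only three distinct bags remain in the normal form, since the bags with elements 1, 2 and 2, 1 coincide.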
Semantics. For general operations: $f \circ g$ is function composition; $(f, g)$ is pair formation; $\pi_1$ and $\pi_2$ are the first and the second projections; $!$ always returns the unique element of type unit; eq is the equality test; id is the identity and cond is the conditional. For bag operations: b_empty is the function that represents the constant $\{||\}$; b_sng forms singletons: $\mathit{b\_sng}(x) = \{|x|\}$; $\uplus$ takes the additive union of two bags; b_flat flattens bags of bags, adding up multiplicities: $\mathit{b\_flat}(\{|\{|1, 2|\}, \{|2, 3|\}|\}) = \{|1, 2, 2, 3|\}$; $\mathit{b\_map}(f)$ applies $f$ to all elements of a bag; and $\mathit{b\_pair}_2$ is pair-with: $\mathit{b\_pair}_2(1, \{|2, 3|\}) = \{|(1, 2), (1, 3)|\}$. Operators on or-sets are exactly the same as operators on bags except that the prefix or_ is used, and duplicates are eliminated.
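The bag and or-set operators described above can be mimicked with Python lists. The following is only a sketch: representing both bags and or-sets as lists, the helper `dedupe`, and the uncurried argument style of `b_union` and `b_pair_2` are assumptions of this example.

```python
def dedupe(xs):
    """Keep the first occurrence of each element (or-sets have no duplicates)."""
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out


# Bag operators: duplicates are kept, multiplicities add up.
def b_sng(x):
    return [x]


def b_union(a, b):
    return a + b


def b_flat(bags):
    return [x for bag in bags for x in bag]


def b_map(f):
    return lambda bag: [f(x) for x in bag]


def b_pair_2(x, bag):
    return [(x, y) for y in bag]


# Or-set operators: same shape, but duplicates are eliminated.
def or_union(a, b):
    return dedupe(a + b)


def or_flat(orsets):
    return dedupe(b_flat(orsets))


def or_map(f):
    return lambda orset: dedupe([f(x) for x in orset])
```

The two examples from the text are reproduced by `b_flat([[1, 2], [2, 3]])` and `b_pair_2(1, [2, 3])`; the or-set counterpart `or_flat` drops the repeated 2.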
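It was suggested in [12] to assign functions in the language to the rewrite rules so that for every rewriting from $t$ to $s$ there would be an associated definable function of type $t \to s$. The goal of this assignment is to obtain a function of type $t \to \langle sk(t) \rangle$ that produces the normal forms for or-objects of type $t$.

We associate the following functions with the rewrite rules:

$\mathit{or\_pair}_2 : s \times \langle t \rangle \to \langle s \times t \rangle$
$\mathit{or\_pair}_1 : \langle s \rangle \times t \to \langle s \times t \rangle$
$\mathit{or\_flat} : \langle\langle t \rangle\rangle \to \langle t \rangle$
$\mathit{combin} : \{|\langle s \rangle|\} \to \langle \{|s|\} \rangle$

Here $\mathit{or\_pair}_1 = \mathit{or\_map}((\pi_2, \pi_1)) \circ \mathit{or\_pair}_2 \circ (\pi_2, \pi_1)$ is pair-with over the first argument. It is possible to define the function $\mathit{app}(r)$ that applies rewrite rules to objects using the above functions. For example, applying the rewriting $r = \{|\{|\langle s \rangle|\}|\} \to \{|\langle \{|s|\} \rangle|\}$ yields the function $\mathit{b\_map}(\mathit{combin})$. This function can be extended to rewrite strategies by composition. (Technical details of the definitions can be found in [11, 12].)

General operators
  $g : u \to s$ and $f : s \to t$ give $f \circ g : u \to t$
  $f : u \to s$ and $g : u \to t$ give $(f, g) : u \to s \times t$
  $! : t \to \mathit{unit}$
  $\pi_1 : s \times t \to s$
  $\pi_2 : s \times t \to t$
  $\mathit{eq} : t \times t \to \mathit{bool}$
  $\mathit{id} : t \to t$
  $c : s \to \mathit{bool}$, $f : s \to t$ and $g : s \to t$ give $\mathit{cond}(c, f, g) : s \to t$

Operators on bags
  $\mathit{b\_empty} : \mathit{unit} \to \{|t|\}$
  $\mathit{b\_pair}_2 : s \times \{|t|\} \to \{|s \times t|\}$
  $\uplus : \{|t|\} \times \{|t|\} \to \{|t|\}$
  $\mathit{b\_sng} : t \to \{|t|\}$
  $f : s \to t$ gives $\mathit{b\_map}\ f : \{|s|\} \to \{|t|\}$
  $\mathit{b\_flat} : \{|\{|t|\}|\} \to \{|t|\}$

Operators on or-sets
  $\mathit{or\_empty} : \mathit{unit} \to \langle t \rangle$
  $\mathit{or\_pair}_2 : s \times \langle t \rangle \to \langle s \times t \rangle$
  $\mathit{or\_}\cup : \langle t \rangle \times \langle t \rangle \to \langle t \rangle$
  $\mathit{or\_sng} : t \to \langle t \rangle$
  $f : s \to t$ gives $\mathit{or\_map}\ f : \langle s \rangle \to \langle t \rangle$
  $\mathit{or\_flat} : \langle\langle t \rangle\rangle \to \langle t \rangle$

Interaction
  $\mathit{combin} : \{|\langle t \rangle|\} \to \langle \{|t|\} \rangle$

Figure 4: Operators of NBOA

The following result is new. The normalization theorems of [11, 12] can be seen as its corollaries.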
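The functions attached to the rewrite rules can be illustrated in the same list-based style. The Python fragment below is a sketch under assumptions: bags and or-sets are both Python lists, bag elements are assumed to be orderable so that a sorted list can serve as a canonical bag, and the helper names `dedupe` and `swap` are introduced only for this example.

```python
from itertools import product


def dedupe(xs):
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out


def swap(pair):
    """The pair of projections (pi_2, pi_1)."""
    return (pair[1], pair[0])


def or_map(f):
    return lambda orset: dedupe([f(x) for x in orset])


def b_map(f):
    return lambda bag: [f(x) for x in bag]


def or_pair_2(pair):
    """s x <t> -> <s x t>: pair the first component with every alternative."""
    x, orset = pair
    return dedupe([(x, y) for y in orset])


def or_pair_1(pair):
    """<s> x t -> <s x t>, defined as or_map(swap) after or_pair_2 after swap."""
    return or_map(swap)(or_pair_2(swap(pair)))


def combin(bag_of_orsets):
    """{|<s>|} -> <{|s|}>: choose one alternative from every or-set in the bag."""
    choices = product(*bag_of_orsets)
    # sorted(...) is the canonical form of a bag (elements assumed orderable)
    return dedupe([sorted(choice) for choice in choices])
```

With these definitions, `b_map(combin)` plays the role of $\mathit{app}(r)$ for the rewriting mentioned in the text: it turns a bag of bags of or-sets into a bag of or-sets of bags.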
Theorem 4 (Partial Normalization) For any or-object $x$ of type $t$, any type $\langle s \rangle$ which is a partial normal form of $t$ and any rewrite strategy $r : t \to \langle s \rangle$, the following holds: $\mathit{app}(r)(x) = \mathit{pnf}(x, s)$.

Corollary 5 (Normalization [11]) For any or-object $x$ of type $t$ and any rewrite strategy $r : t \to \langle sk(t) \rangle$, the following holds: $\mathit{app}(r)(x) = \mathit{nf}(x)$.

Annotations and polynomial-space normalization
In this section we extend the polynomial-space normalization of [11] to partial normal forms. The idea of polynomial-space normalization is similar in spirit to that of the "pipeline" evaluation of queries in the powerset algebra of Abiteboul and Beeri, see [1]. Note that combining polynomial-space normalization primitives and partial normalization was an open problem mentioned in [11].

As the first step, we introduce annotated types. An annotated type denotes an and-or tree underlying an or-object, and it indicates a choice of element for certain or-sets. Using these choices in place of or-sets, we obtain elements of partial normal forms, or, if the choice is specified for all or-sets, elements of normal forms. In the grammar of annotated types, $K$ is a type that has four possible values: I (Initial case), P (Pair), B (Bag) and O (Or-set); $t$ is an object type; and square brackets denote list types.

For each pair of types $t$ and $s$ for which $\mathit{pnf}(t, \langle s \rangle)$ holds, we produce an annotated type $A(t, s)$ as explained below. First, though, we treat the simplified case in which $s$ is the skeleton of $t$ (i.e. $\langle s \rangle$ is the normal form of $t$). Then we use the notation $A(t)$. The translation is given by inductive rules on the structure of $t$. In the resulting annotations, the boolean value is true if not all entries encoded by the object have been looked at. For or-sets, the boolean component inside lists indicates the element that is currently used as the choice given by that or-set.

For any or-type $t$, $A(t, s)$ is defined if and only if $\langle s \rangle$ is a partial normal form of $t$. The idea of annotation is the same as above, except that some subtypes (maybe involving or-sets) are treated as base types and are not annotated. The positions of those subtypes in $t$ are determined by $s$; they are precisely the subtypes whose disjunctions are not to be unfolded in the process of normalization. The annotated types $A(t, s)$ are defined by the following rules, which are applied in the order in which they are given below.

$A(t, t) = K \times t$
$A(t_1 \times t_2, s_1 \times s_2) = K \times \mathit{bool} \times A(t_1, s_1) \times A(t_2, s_2)$
$A(\{|t|\}, \{|s|\}) = \ldots$

Proposition 6 If $t$ is an or-type, $t \neq s$, and $A(t, s)$ is defined, then $\langle s \rangle$ is a partial normal form of $t$. □

Objects of type $A(t, s)$ can be seen as and-or trees underlying or-objects, such that selection of possibilities for all or-nodes gives us a complex object in the partial normal form. Hence, for evaluation of conceptual queries, we need mechanisms for a) translating or-objects into annotated objects, b) obtaining (partial) normal form entries encoded by an annotation, and, most importantly, c) iterating over all possible annotations. The solution proposed in [11] can be readily adapted here. Moreover, the iteration mechanism remains unchanged for partial normalization.

We need three functions. The first, $\mathit{init}_s : t \to A(t, s)$, produces the initial annotation of an object, provided $A(t, s)$ is defined. It is given by the following rules:

$\mathit{init}_s\ x = (I, x)$ if $x$ is of type $s$.
$\mathit{init}_{s_1 \times s_2}\ (x, y) = (P, \mathit{true}, (\mathit{init}_{s_1}\ x, \mathit{init}_{s_2}\ y))$.
$\mathit{init}_{\{|s|\}}\ \{|x_1, \ldots, x_n|\} = (B, \mathit{true}, [\mathit{init}_s\ x_1, \ldots, \mathit{init}_s\ x_n])$.
$\mathit{init}_s\ \langle x_1, \ldots, x_n \rangle = (O, \mathit{true}, [(\mathit{init}_s\ x_1, v_1), \ldots, (\mathit{init}_s\ x_n, v_n)])$, where $v_1 = \mathit{false}$ and $v_2 = \ldots = v_n = \mathit{true}$.
The function $\mathit{pick} : A(t, s) \to s$ produces an element of the partial normal form given by an annotation. In the definition below, void indicates the end of traversing an annotated object, i.e., all possibilities have been looked at.

$\mathit{pick}\ (I, x) = x$.
$\mathit{pick}\ (P, c, (x, y)) = \mathbf{if}\ c\ \mathbf{then}\ (\mathit{pick}\ x, \mathit{pick}\ y)\ \mathbf{else}\ \mathit{void}$.
$\mathit{pick}\ (B, c, [x_1, \ldots, x_n]) = \mathbf{if}\ c\ \mathbf{then}\ \{|\mathit{pick}\ x_1, \ldots, \mathit{pick}\ x_n|\}\ \mathbf{else}\ \mathit{void}$.
$\mathit{pick}\ (O, c, [x_1, \ldots, x_n]) = \mathbf{if}\ c\ \mathbf{then}\ \mathit{pick}\ \pi_1(x_i)\ \mathbf{else}\ \mathit{void}$, if $\pi_2(x_i) = \mathit{true}$.

Finally, $\mathit{end} : A(t, s) \to \mathit{bool}$ returns true iff all possibilities encoded by its argument have been exhausted: $\mathit{end}\ (I, x) = \mathit{true}$, and on any annotated object $x = (k, c, v)$, $\mathit{end}\ x = \neg c$.

The key part of the normalization algorithm is the iterator $\mathit{next} : A(t, s) \to A(t, s)$ which provides the depth-first search on the and-or trees, obtaining all possible annotations (given by the positions of the boolean components in lists encoding or-sets). The version of [11] has type $A(t, \langle sk(t) \rangle) \to A(t, \langle sk(t) \rangle)$ but it can be easily modified to produce the one of type $A(t, s) \to A(t, s)$. Also, next can be implemented in a purely functional language. Now we can show that starting with $\mathit{init}_s(o : t)$ and repeatedly applying next to it, we obtain annotations for all elements in $\mathit{pnf}(o, s)$.
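The interplay of init, pick, end and next can be sketched in a few lines of Python. The fragment below is an illustration under several assumptions, not the algorithm of [11]: the iterator `next_annotation` is our own odometer-style reading of the design example (the rightmost choice varies fastest), the chosen element of an or-set is marked with `True`, the wrapper `Keep` is a hypothetical stand-in for the type index $s$ that marks subobjects left unfolded, and only the top-level flag is ever set to false.

```python
class OrSet:
    def __init__(self, *items):
        self.items = list(items)


class Keep:
    """Hypothetical marker: this subobject is treated as a base value."""
    def __init__(self, value):
        self.value = value


VOID = object()


def init(x):
    if isinstance(x, OrSet):       # list of (annotation, chosen?) pairs
        return ("O", True, [(init(e), i == 0) for i, e in enumerate(x.items)])
    if isinstance(x, tuple):
        return ("P", True, tuple(init(e) for e in x))
    if isinstance(x, list):
        return ("B", True, [init(e) for e in x])
    return ("I", x)                # base values and Keep-wrapped subobjects


def pick(a):
    if a[0] == "I":
        return a[1].value if isinstance(a[1], Keep) else a[1]
    kind, c, v = a
    if not c:
        return VOID
    if kind == "P":
        return tuple(pick(e) for e in v)
    if kind == "B":
        return [pick(e) for e in v]
    return next(pick(e) for e, chosen in v if chosen)


def end(a):
    return a[0] == "I" or not a[1]


def step(a):
    """Successor of a, plus a flag telling whether a wrapped around."""
    if a[0] == "I":
        return a, True
    kind, _, v = a
    if kind == "O":
        i = next(k for k, (_, chosen) in enumerate(v) if chosen)
        sub, carried = step(v[i][0])
        j = (i + 1) % len(v) if carried else i
        new = [(sub if k == i else e, k == j) for k, (e, _) in enumerate(v)]
        return (kind, True, new), carried and j == 0
    items, carried = list(v), True
    for k in reversed(range(len(items))):      # rightmost varies fastest
        items[k], carried = step(items[k])
        if not carried:
            break
    return (kind, True, tuple(items) if kind == "P" else items), carried


def next_annotation(a):
    new, carried = step(a)
    if carried and new[0] != "I":
        return (new[0], False, new[2])         # exhausted: end() is now true
    return new


def entries(o):
    """All (partial) normal form entries, holding one annotation at a time."""
    ao = init(o)
    while True:
        yield pick(ao)
        ao = next_annotation(ao)
        if end(ao):
            return
```

For the design example with A2 wrapped in `Keep`, the generator produces the 12 configurations of A1 in the order a d f, a d g, a e f, a e g, b d f, and so on, each paired with the unfolded A2.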

4 Extending the language
In this section we show how a number of desirable normalization primitives mentioned in the Introduction can be obtained if apnorm_cont is present in the language. We divide these primitives into four groups.

It is known that there exists a calculus version of NBOA, see [3, 12], in which expressions denote objects and not functions. This equally expressive version of the language allows the standard if-then-else construct, as well as using $\lambda$-abstraction to specify the function argument of b_map and or_map. In this section, we shall use both the if-then-else construct and $\lambda$-abstraction. However, this does not enrich the expressiveness of the language.

General normalization primitives
Partial normalization, starting with an annotation. For operations in this group, we require the presence of init. For our first operation, the idea is the same as for general partial normalization: we start with an annotation and iterate over all partial normal form entries, producing the result. If $P$ is a parameter tuple typed over $(s, u, v)$ and $\mathit{pnf}(t, \langle s \rangle)$ holds, it is typed and defined as follows:

$\mathit{apnorm}(P) : A(t, s) \to v \qquad \mathit{apnorm} = \pi_1 \circ \mathit{apnorm\_cont}$

Partial normalization, parameterized by types. The idea is the same as above, but no annotated objects are involved. Instead, these primitives are parameterized by types of partial normal forms. Under the same premises,

$\mathit{pnorm}_s(P) : t \to v \qquad \mathit{pnorm}_s(P) = \mathit{apnorm}(P) \circ \mathit{init}_s$

Standard normalization. Given an object, iterate over its normal form, checking for the condition and accumulating the result. This is the norm primitive of [11]. It is simply $\mathit{pnorm}_{sk(t)}(P)$.
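How apnorm and pnorm derive from one general iterator can be sketched as follows. The Python fragment is a hedged illustration: the exact behaviour of apnorm_cont is not reproduced here, so the loop below (update the accumulator with each entry, stop once the condition holds, return the result together with the annotation to resume from) is an assumption, and the `Cursor` over a small tuple of entries is a toy stand-in for a real annotated object.

```python
from typing import Any, Callable, NamedTuple


class Params(NamedTuple):
    condition: Callable[[Any], bool]      # stop after an entry satisfying it
    initial_acc: Any
    update: Callable[[Any, Any], Any]     # (accumulator, entry) -> accumulator
    out: Callable[[Any], Any]


class Cursor(NamedTuple):
    """Toy stand-in for an annotated object: a position in a tuple of entries."""
    entries: tuple
    pos: int = 0


def pick(c):
    return c.entries[c.pos]


def next_annotation(c):
    return c._replace(pos=c.pos + 1)


def end(c):
    return c.pos >= len(c.entries)


def apnorm_cont(p):
    """General iterator: returns the result and the annotation to resume from."""
    def run(ao):
        acc = p.initial_acc
        while not end(ao):
            entry = pick(ao)
            acc = p.update(acc, entry)
            ao = next_annotation(ao)
            if p.condition(entry):
                break
        return p.out(acc), ao
    return run


def apnorm(p):
    """apnorm = first projection of apnorm_cont."""
    return lambda ao: apnorm_cont(p)(ao)[0]


def pnorm(p, init):
    """pnorm = apnorm composed with init."""
    return lambda o: apnorm(p)(init(o))


# Existential query: find an entry (say, a design cost) below 4.
cheap = Params(condition=lambda x: x < 4,
               initial_acc=None,
               update=lambda acc, x: x if x < 4 else acc,
               out=lambda acc: acc)
```

Because `apnorm_cont` also returns the annotation reached so far, a second call on that annotation continues the search where the first one stopped.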

Normalization with time constraints
Large sizes of normal forms can make iterating over them impractical. It is then reasonable to set a time limit for the normalization process, and to return the result obtained so far together with an annotated object, so that the process of normalization can be resumed. To allow this, we use primitives of Standard ML of New Jersey [8] and define a new type timer and two functions: $\mathit{start\_timer} : \mathit{unit} \to \mathit{timer}$ starts a new timer, and $\mathit{get\_time} : \mathit{timer} \to \mathit{real}$ gives the time that has passed since the timer was started. We also use the let ... in ... end construct for local declarations, see [8].

Partial normalization with time constraints. The normalization process starts from an annotated object and runs for a given time, returning the result formed by out, and the last annotation processed. If $P$ is a parameter tuple typed over $(s, u, v)$ and $\mathit{pnf}(t, \langle s \rangle)$ holds, the typing (the definition is based on apnorm_cont) is

$\mathit{apnorm\_time}(P) : A(t, s) \times \mathit{real} \to v \times A(t, s)$

Optimization primitives

Often one has to find a (partial) normal form entry which is best according to some criterion (e.g., the most reliable design). For this we need optimizing versions of the normalization primitives. Now by $P$ we denote the pair $(F, F_{\leq})$, whose components are typed $F : s \to v$ and $F_{\leq} : v \times v \to \mathit{bool}$; these typings form the premise on $(s, v)$ in the rules below. Here $F$ is the criterion to be maximized with respect to the comparison function $F_{\leq}$. The main operator we use for this purpose is opt_apnorm. Under the premise on $(s, v)$ and $\mathit{pnf}(t, \langle s \rangle)$,

$\mathit{opt\_apnorm}(P) : A(t, s) \to s \times v$

The semantics is the following: starting with a given annotation, look at all annotations that can be obtained from it by applying next, and return the one with the maximal value of $F$, together with that $F$-value:

$\mathit{opt\_apnorm}(P)\ ao = \mathit{apnorm}(P[\lambda x.\mathit{false}/\mathit{condition};\ (\mathit{pick}\ ao, F(\mathit{pick}\ ao))/\mathit{initial\_acc};\ \lambda x.\lambda y.\ \mathbf{if}\ F_{\leq}(F(y), \pi_2(x))\ \mathbf{then}\ x\ \mathbf{else}\ (y, F(y))/\mathit{update};\ \pi_2/\mathit{out}])\ ao$

We can also define $\mathit{opt\_pnorm}_s(P) : t \to s \times v$ that optimizes $F$ for all elements of the partial normal form of type $\langle s \rangle$ (as $\mathit{opt\_pnorm}_s(P) = \mathit{opt\_apnorm} \circ \mathit{init}_s$), and $\mathit{opt\_norm}(P) : t \to sk(t) \times v$ which optimizes $F$ over the normal form (as $\mathit{opt\_pnorm}_{sk(t)}(P)$).
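The optimizing primitive is an instance of the general iterator with a particular choice of parameters. The sketch below illustrates this in Python under assumptions: annotations are replaced by a toy `Cursor` over a tuple of entries, `apnorm` is a simplified loop written for this example, the output function returns the whole accumulator (the best entry together with its value), and the reliability figures are invented.

```python
from itertools import product
from typing import NamedTuple


class Cursor(NamedTuple):
    """Toy stand-in for an annotated object."""
    entries: tuple
    pos: int = 0


def pick(ao):
    return ao.entries[ao.pos]


def apnorm(condition, initial_acc, update, out):
    """Simplified general iterator (an assumption of this sketch)."""
    def run(ao):
        acc = initial_acc
        while ao.pos < len(ao.entries):
            entry = pick(ao)
            acc = update(acc, entry)
            ao = ao._replace(pos=ao.pos + 1)
            if condition(entry):
                break
        return out(acc)
    return run


def opt_apnorm(F, F_leq):
    """Maximize F with respect to F_leq, starting from a given annotation."""
    def run(ao):
        first = pick(ao)
        return apnorm(
            condition=lambda x: False,                 # never stop early
            initial_acc=(first, F(first)),
            # keep the accumulator unless the new entry is strictly better
            update=lambda acc, y: acc if F_leq(F(y), acc[1]) else (y, F(y)),
            out=lambda acc: acc,
        )(ao)
    return run


# Invented reliabilities for a two-component design.
RELIABILITY = {"a": 0.9, "b": 0.95, "c": 0.8, "d": 0.7, "e": 0.99}
DESIGNS = tuple(product("abc", "de"))


def reliability(design):
    return RELIABILITY[design[0]] * RELIABILITY[design[1]]


best = opt_apnorm(reliability, lambda u, v: u <= v)(Cursor(DESIGNS))
```

Starting from a later annotation restricts the search to the entries that follow it, which is what the time-constrained and randomized variants exploit.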

Optimization with time constraints
We present two functions that evaluate optimization queries under time constraints. The first one starts with an annotated object, and uses next to produce new annotations for the time specified by a time limit. When the time has run out, it returns the best partial normal form entry found so far and the last annotation. Under the premise on $(s, v)$ and $\mathit{pnf}(t, \langle s \rangle)$,

$\mathit{opt\_apnorm\_time}(P) : A(t, s) \times \mathit{real} \to (s \times v) \times A(t, s)$

$\mathit{opt\_apnorm\_time}(P)(ao, T) = \mathbf{let}\ tm = \mathit{start\_timer}()\ \mathbf{in}\ \mathit{apnorm\_cont}(P[\lambda x.\ \mathit{get\_time}(tm) > T/\mathit{condition};\ (\mathit{pick}\ ao, F(\mathit{pick}\ ao))/\mathit{initial\_acc};\ \lambda x.\lambda y.\ \mathbf{if}\ F_{\leq}(F(y), \pi_2(x))\ \mathbf{then}\ x\ \mathbf{else}\ (y, F(y))/\mathit{update};\ \pi_2/\mathit{out}])\ ao\ \mathbf{end}$

Optimizing criteria with time constraints and random annotations. There is a more interesting approach to optimizing criteria on very large normal forms, when it is not feasible to calculate the value of $F$ for each normal form entry. Indeed, the simple time limit approach may not be sufficient, because optimal values may be "very far" from a given annotation in terms of the number of times next must be applied. Then, we believe (and experimental results confirm this), the right approach is to generate randomly a number of annotations and run the optimizing version of normalization from each of them for a given time. At the end, the best entry that was found is returned. This solution is given by the function

$\mathit{opt\_pnorm\_rand}_s(P) : t \times \mathit{real} \times \mathit{int} \to s \times v$

The semantics is as follows. On the argument $(o : t, T : \mathit{real}, n : \mathit{int})$, the following operation is performed $n$ times: a random annotation for $o$ is generated, such that applying pick to it produces an object of type $s$. Then, from this annotation, we generate new ones (repeatedly using next), until the last one is produced, or the time limit $T$ is reached, returning the best one with respect to $F$. Having done this, we have $n$ pairs of type $s \times v$ of objects of type $s$ and values of $F$ on them. The result of $\mathit{opt\_pnorm\_rand}(P)$ is the best one with respect to $F_{\leq}$.

The function $\mathit{opt\_pnorm\_rand}_s$ is implementable using the basic iteration mechanism and three auxiliary functions:

$\mathit{random}_s : t \to A(t, s)$, provided $\mathit{pnf}(t, \langle s \rangle)$ holds
$\mathit{select\_best}(P) : \{|s|\} \to s$, under the premise on $(s, v)$
$\mathit{gen} : \mathit{int} \to \{|\mathit{int}|\}$

The first one produces a random annotation of type $A(t, s)$. The second, select_best, selects the best element from a bag of type $\{|s|\}$ with respect to the criterion $F$. It is undefined on empty bags, and, if more than one element of a bag has a nondominated $F$-value, selects one nondeterministically. The semantics of gen is given by $\mathit{gen}(n) = \{|1, \ldots, n|\}$ (this function plays an important role in establishing equivalences between set and bag languages with structural recursion and power operators [13, 14]).

Now opt_pnorm_rand is defined in two steps. First, we define one iteration step

$\mathit{iter\_opt\_pnorm\_rand}_s(P)(o, T) = \mathit{opt\_apnorm\_time}_s(P)(\mathit{random}_s(o), T)$

and then

$\mathit{opt\_pnorm\_rand}_s(P)(o, T, n) = \mathbf{if}\ n \leq 1\ \mathbf{then}\ \mathit{iter\_opt\_pnorm\_rand}_s(P)(o, T)\ \mathbf{else}\ \mathit{select\_best}(\pi_2, F_{\leq})(\mathit{b\_map}(\lambda x.\ \mathit{iter\_opt\_pnorm\_rand}_s(P)(o, T))(\mathit{gen}(n)))$

Summing up, to obtain the list of desirable normalization primitives, we do not have to add them all to the language. Instead, it is enough to have one general iteration scheme apnorm_cont and a limited number of auxiliary functions. In this way it is easy to add new variations of normalization primitives.
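The randomized, time-limited optimization can be sketched as follows. This Python fragment makes several simplifying assumptions: the (partial) normal form is represented by an indexable sequence of entries standing in for annotations, a random annotation is a random start index, and the clock and the random source are injectable parameters (hypothetical names `clock` and `rng`) so that the behaviour can be checked deterministically.

```python
import itertools
import random
import time


def opt_apnorm_time(F, F_leq, entries, start, T, clock=time.monotonic):
    """Scan from `start` until the end or until more than T time units passed.

    Returns the best (entry, F-value) pair seen and the position reached.
    """
    tm = clock()                                   # plays start_timer()
    best = (entries[start], F(entries[start]))
    pos = start
    while pos < len(entries) and clock() - tm <= T:
        y = entries[pos]
        if not F_leq(F(y), best[1]):               # strictly better entry
            best = (y, F(y))
        pos += 1
    return best, pos


def select_best(results, F_leq):
    """Best (entry, value) pair of a non-empty list; ties keep the first."""
    best = results[0]                              # undefined on empty input
    for r in results[1:]:
        if not F_leq(r[1], best[1]):
            best = r
    return best


def opt_pnorm_rand(F, F_leq, entries, T, n, rng=random, clock=time.monotonic):
    """Run n time-limited searches from random start points, keep the best."""
    runs = []
    for _ in range(max(n, 1)):
        start = rng.randrange(len(entries))        # plays random_s
        best, _pos = opt_apnorm_time(F, F_leq, entries, start, T, clock)
        runs.append(best)
    return select_best(runs, F_leq)


class ScriptedRng:
    """Deterministic replacement for the random source, for demonstration."""
    def __init__(self, values):
        self._values = iter(values)

    def randrange(self, _n):
        return next(self._values)
```

With the default `rng` and `clock` the result is nondeterministic, which mirrors the nondeterministic semantics discussed in the text; the scripted versions are only there to make the example reproducible.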

Implementation project
The collection of normalization primitives discussed in this paper has been implemented as a library in the system OR-SML [7], which itself is a database programming language on top of Standard ML of New Jersey [8]. In OR-SML, complex objects are SML values, and one can take advantage of combining the features of a query language with the features of a fully-fledged programming language. For example, we can use an SML library that provides objects of type timer and functions on them to express time-constrained normalization primitives in the same way as it is done in section 4.

For the extended abstract, we mention just one experimental result. If a normal form is very big, optimizing a criterion over it may take weeks. Using a time limit, we may not reach a good result. In the example of [11], a criterion was optimized for 30 minutes, and a result within 4% of the optimal was produced. However, using the function opt_pnorm_rand, we can see entries in different "areas" of the normal form. In fact, in the example from [11], using 10 iterations, each running 30 seconds (for a total of only 5 minutes), we consistently obtained results within 0.5% of the optimal.

Conclusion
In this paper we have studied various techniques for normalizing databases with disjunctive information represented by or-sets. This problem is particularly important in application areas such as design and planning. Most previous work provided foundations for asking queries against such databases. However, the proposed solutions were impractical, mostly because of their complexity.

In this paper we took advantage of the polynomial-space normalization iterator, proposed in [11], and extended the idea behind it. As a result, we came up with a number of query language primitives that can help answer a variety of conceptual queries. In fact, all that must be added to the language is one general iterator and a small number of auxiliary functions. The resulting variants of normalization are suitable for various kinds of conceptual queries. In addition, they provide a mechanism for answering queries approximately, which is very helpful when one has to optimize some criteria over an extremely large number of encoded objects. In order to obtain such approximate solutions, we often have to settle for nondeterministic operations, which limits our ability to reason about the resulting language. This is the price to pay for making the language applicable in practice. Summing up, we believe that the techniques of this paper can provide good practical algorithms for dealing with large applications involving databases with disjunctive information.

Figure 1: An incomplete database and its annotation

The definition of apnorm_time based on apnorm_cont is

$\mathit{apnorm\_time}(P)(ao, T) = \mathbf{let}\ tm = \mathit{start\_timer}()\ \mathbf{in}\ \mathit{apnorm\_cont}(P[\lambda x.\ \mathit{condition}(x) \lor (\mathit{get\_time}(tm) > T)/\mathit{condition}])\ ao\ \mathbf{end}$

Partial normalization with time constraints, parameterized by types. Instead of an arbitrary annotation, we start with the initial one. Such a family of functions $\mathit{pnorm\_time}_s(P) : t \times \mathit{real} \to v \times A(t, s)$ is defined by $\mathit{pnorm\_time}_s(P)(o, T) = \mathit{apnorm\_time}(P)(\mathit{init}_s(o), T)$. The full normalization with time limit $\mathit{norm\_time}(P) : t \times \mathit{real} \to v \times A(t)$ is then simply $\mathit{pnorm\_time}_{sk(t)}(P)$.
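The time-constrained primitive only changes the stopping condition of the general iterator. The Python sketch below illustrates this under assumptions: `apnorm_cont` is a simplified loop written for this example, a toy `Cursor` over a tuple of entries replaces the annotated object, and the clock is an injectable parameter (hypothetical name `clock`) that defaults to `time.monotonic`.

```python
import itertools
import time
from typing import NamedTuple


class Cursor(NamedTuple):
    """Toy stand-in for an annotated object."""
    entries: tuple
    pos: int = 0


def apnorm_cont(condition, initial_acc, update, out):
    """Simplified general iterator (an assumption of this sketch)."""
    def run(ao):
        acc = initial_acc
        while ao.pos < len(ao.entries):
            entry = ao.entries[ao.pos]
            acc = update(acc, entry)
            ao = ao._replace(pos=ao.pos + 1)
            if condition(entry):
                break
        return out(acc), ao
    return run


def apnorm_time(condition, initial_acc, update, out, clock=time.monotonic):
    """Same iterator, with the condition extended by a time limit T."""
    def run(ao, T):
        tm = clock()                                   # plays start_timer()

        def timed(x):                                  # condition(x) or timeout
            return condition(x) or (clock() - tm > T)

        return apnorm_cont(timed, initial_acc, update, out)(ao)
    return run


def pnorm_time(condition, initial_acc, update, out, init, clock=time.monotonic):
    """Start from the initial annotation instead of an arbitrary one."""
    inner = apnorm_time(condition, initial_acc, update, out, clock)
    return lambda o, T: inner(init(o), T)


# Summing entries under a time limit, with a scripted clock for the demo.
ticks = itertools.count()
summing = apnorm_time(lambda x: False, 0, lambda acc, x: acc + x,
                      lambda acc: acc, clock=lambda: next(ticks))
partial_sum, resume_from = summing(Cursor((1, 2, 3, 4, 5)), 2)
```

The returned annotation `resume_from` can be passed to another call, which continues with the entries not yet processed; note that in this sketch the accumulator restarts from its initial value on each call.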