Modeling Information Retrieval with Probabilistic Argumentation Systems

Probabilistic Argumentation Systems (PAS) are a technique for representing uncertainty both symbolically and numerically. It is shown that this technique, which combines symbolic logic and probability, can be used as a general model of information retrieval. PAS provide a dual (symbolic and numerical) interpretation of the logical uncertainty principle, and form a flexible model for integrating various sources of information about query or document contents.


Introduction
Empirical results have shown that the performance of a retrieval system may be improved by proper integration of multiple query representations [TC91,BKFS95] and multiple document representations [RC95], relations between words [RY79,CvR95] and relations between documents [FNL88,Sav95]. These results suggest a new direction in designing IR systems: multiplying the sources of information should partly compensate for their fundamental uncertainty. As stated in the famous Principle of Combination [FNL88]: "Effective integration of more information should lead to better information retrieval". Indeed, the "Combining evidence" paradigm is more and more regarded as one of the most promising ways of improving IR performance. Nevertheless, at this time this evidence has not always been optimally exploited, because IR models are sometimes not general enough to model information sources for which they were not initially conceived. Non-classical sources of evidence are complex to model, and the added evidence is rather difficult to quantify. More general formalisms are needed to model and combine the evidence from various kinds of knowledge. The inference network of Turtle [TC91,92] has proven powerful at modeling various kinds of evidence.
A different and very promising approach to IR is the logical paradigm, in which relevance is computationally defined as the degree to which a query/document can be proved, having a document/query as evidence. This approach has led to very interesting theoretical results, notably by clarifying in a logical way the concept of relevance, and by providing a guideline for IR models. It has also been argued that logic can represent the flow of information, fundamental to IR [LB96,vRL96]. Logical models certainly have a very promising future in IR [CC92,LB96]. They are one of the most adequate formalisms for representing multimedia IR, which implies a very general and complex structure of documents [CMF96,Lal97].
While in the Combining evidence approach retrieval is done using multiple sources of evidence about document and query contents, in the logical approach it is done by transforming the initial information using a certain body of knowledge. Nevertheless, these approaches are more complementary than antagonistic: in both cases relevance is evaluated by drawing inference chains between documents and the query, and computing the overall degree of certainty of these chains.
Until now, the most developed implementations of the logical paradigm have used modal logic [Nie89,CvR95]. We propose here to take a rather different technique, namely Probabilistic Argumentation Systems (PAS) [KR96]. PAS combine symbolic logic with probability to model uncertain knowledge (facts and rules) both symbolically and numerically. In a model of IR based on PAS, the Combining evidence and logical paradigms can be unified. This model provides a dual (symbolic and numerical) interpretation of the logical uncertainty principle.
We will first present some theory on PAS. Then we will show how it can serve as a model of IR. In Section 4, query expansion will be treated using PAS. A discussion on the potential of PAS for modeling IR will end this paper.

An introduction to PAS
Classical logic cannot be used to handle, represent and compute numerical uncertainty: it is restricted to certain facts or rules. Nevertheless, it is one of the simplest as well as one of the most powerful ways to encode knowledge, for the purpose of reasoning (making inferences) from that knowledge. But is representing uncertainty with classical logic really impossible?
In fact, if we add a certain type of propositional symbols called assumptions to represent uncertainty, we can model uncertain facts and rules, as shown below. Facts and rules are true under the condition that specific assumptions are true. Table 1 shows how uncertainty can be represented with classical logic using assumptions.

Type of knowledge    Logical representation       Natural language equivalence
A fact               P1                           "P1 is true"
A rule               P1→P2                        "P1 entails P2"
An uncertain fact    a1→P1                        "if assumption a1 is true, then P1 is true"
An uncertain rule    a2→(P1→P2) ⇔ P1∧a2→P2       "if assumption a2 is true, then P1 entails P2"

Table 1: Representing uncertainty with classical logic using assumptions
A triple (P,A,Σ), where P={P1,...,PN} is the set of propositions representing the N variables of interest, A={a1,...,aM} the set of M propositions called assumptions used for representing the uncertainty, and Σ={ξ1,...,ξR} a set of facts and rules on literals from A and P, is called a Propositional Argumentation System [Hae96,Hae97]. A Propositional Argumentation System can represent uncertainty symbolically, which is useful for explaining the decisions taken and renders the inference process transparent. In this text, we will use capital letters for propositions and small letters for assumptions.
Arguments supporting or discounting certain hypotheses are derived from the knowledge base Σ. A hypothesis h is any logical formula with symbols in A∪P. An argument in favor of (or against) h is a conjunction of literals of assumptions for which h becomes true (or false). The hypothesis h is then said to be supported (or discarded) by the argument. The support of h is defined as the disjunction of all minimal arguments supporting h, and is denoted sp(h).
Example 1: Suppose we have a set of variables of interest P={P1, P2}, the uncertainty being represented by a set of assumptions A={a1,a2,a3} and a set of rules Σ={ξ1: a1→P1, ξ2: a2→P2, ξ3: P2∧a3→P1}, and we wish to test the hypothesis P1 (P1 true). We have two arguments in favor of P1: a1 and a2∧a3. The support of P1 is then sp(P1)=a1∨(a2∧a3).
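The symbolic support of Example 1 can be checked by brute force: enumerate every truth assignment of the assumptions and test whether the rules force P1 to be true. A minimal sketch in Python (the helper name and the forward-chaining encoding are ours, not part of the PAS formalism):

```python
from itertools import product

# Does the rule set of Example 1 force P1 under a given truth assignment
# of the assumptions a1, a2, a3?  Simple forward chaining over the rules.
def entails_P1(a1, a2, a3):
    P1 = a1                    # rule xi1: a1 -> P1
    P2 = a2                    # rule xi2: a2 -> P2
    P1 = P1 or (P2 and a3)     # rule xi3: P2 & a3 -> P1
    return P1

# Scenarios (assumption assignments) under which P1 is supported:
supporting = [s for s in product([False, True], repeat=3) if entails_P1(*s)]

# Every supporting scenario satisfies a1 v (a2 & a3), i.e. sp(P1):
assert all(a1 or (a2 and a3) for (a1, a2, a3) in supporting)
```

The enumeration confirms that the supporting scenarios are exactly those satisfying sp(P1)=a1∨(a2∧a3).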
We may also be interested in finding evidence against a hypothesis h, or reasons to doubt h. The doubt of h is defined as the disjunction of all arguments supporting ~h and not supporting h, and is denoted db(h). The fewer reasons we have to doubt h, the more plausible it seems. The plausibility of h is defined as pl(h)=~db(h). When both a hypothesis and its negation are supported, there is a contradiction in the knowledge base. The support of the contradiction is the disjunction of all arguments which, if true, entail the contradiction.
Example 2: To illustrate these new concepts, take Example 1 and add the rule ξ4: a4→~P1. Obviously a4 is an argument against P1. The support of P1 is now sp(P1)=(a1∨(a2∧a3))∧~a4. The doubt of P1 is db(P1)=a4∧~(a1∨(a2∧a3)), and the plausibility is pl(P1)=~db(P1)=~(~(a1∨(a2∧a3))∧a4)=(a1∨(a2∧a3))∨~a4. Since it is not possible for P1 and ~P1 to be true at the same time, there is a contradiction in the knowledge base. The support of the contradiction is sp(⊥)=(a1∨(a2∧a3))∧a4.

With Propositional Argumentation Systems, the reasoning process is fully described but the uncertainty is only represented symbolically, not assessed. To assess uncertainty, we need to assign probabilities to assumptions, e.g. p(a1)=x1, p(a2)=x2, etc. Assumptions are probabilistically independent, i.e. p(a1∧a2)=p(a1)·p(a2). Adding the set X of probabilities of assumptions to the triple (A,P,Σ), we obtain a Probabilistic Argumentation System (PAS). From the support of a hypothesis and the probabilities assigned to assumptions, we can compute a numerical degree of support dsp(h) of the hypothesis, but we first need to put its symbolic support in disjoint form. Different algorithms have been developed for transforming a logical expression into disjoint form, see [Hei89,Mon96,Abr79]. A numerical degree of support is always between 0 and 1, but must not be assimilated with a probability.

Example 3: In Example 1, we want to calculate the numerical support dsp(P1) of hypothesis P1 from the symbolic support sp(P1)=a1∨(a2∧a3). We have p(a1)=0.5, p(a2)=0.6, p(a3)=0.3. sp(P1) must first be put in disjoint form: sp(P1)=a1∨(a2∧a3)=a1∨(a2∧a3∧~a1). We then have dsp(P1)=0.5+0.6*0.3*(1-0.5)=0.59.
In the case of a partly contradictory knowledge base, the degree of support is normalized by taking into account the support of the contradiction in calculating the support of a hypothesis.

Example 4: Take Example 2 with p(a4)=0.2. We have sp(⊥)=(a1∨(a2∧a3))∧a4=(a1∨(a2∧a3∧~a1))∧a4, then dsp(⊥)=0.59*0.2=0.118. sp(P1)=(a1∨(a2∧a3))∧~a4=(a1∨(a2∧a3∧~a1))∧~a4, whose probability is 0.59*(1-0.2)=0.472. The normalized numerical degree of support of P1 is then dsp(P1)=0.472/(1-0.118)≈0.535.

This survey of the theory is sufficient for understanding its application to IR. The reader may have noticed a similarity between Assumption Truth Maintenance Systems (ATMS) [dKl86] and PAS. In fact PAS are an extension of ATMS. While ATMS are limited to Horn clauses, PAS can handle any kind of clauses, for instance rules like P1∧~P2∧a→~P3. Also, it can be shown that PAS are a concrete model of a general theory of evidence [Koh95].
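The normalized degree of support can be checked numerically by enumerating all 16 assumption scenarios, weighting each by its probability, and summing the weights of the scenarios satisfying sp(P1) and sp(⊥). A sketch, assuming the probabilities given in Examples 3 and 4 (all names are ours):

```python
from itertools import product

p = {'a1': 0.5, 'a2': 0.6, 'a3': 0.3, 'a4': 0.2}

def weight(assign):
    # Probability of one scenario; assumptions are independent.
    w = 1.0
    for name, val in assign.items():
        w *= p[name] if val else 1 - p[name]
    return w

def sp_P1(a):   # (a1 v (a2 & a3)) & ~a4
    return (a['a1'] or (a['a2'] and a['a3'])) and not a['a4']

def sp_bot(a):  # (a1 v (a2 & a3)) & a4
    return (a['a1'] or (a['a2'] and a['a3'])) and a['a4']

names = sorted(p)
p_sp = p_bot = 0.0
for vals in product([False, True], repeat=len(names)):
    a = dict(zip(names, vals))
    w = weight(a)
    if sp_P1(a):
        p_sp += w
    if sp_bot(a):
        p_bot += w

dsp_bot = p_bot                  # 0.118, the support of the contradiction
dsp_P1 = p_sp / (1 - dsp_bot)    # normalized: 0.472 / 0.882
```

Enumeration gives p_sp = 0.59·0.8 = 0.472 and dsp(⊥) = 0.59·0.2 = 0.118, so the normalized degree of support is 0.472/0.882 ≈ 0.535.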

Modeling IR with PAS

The logical approach and PAS
Comparison of different retrieval models of IR has led van Rijsbergen to argue that IR is a form of uncertain inference, each model having its own way to assess uncertainty [vR86]. This led to the logical approach to IR (for a survey, see [Lal96]). This approach, as reformulated in [CC92], states that:

1. In order to be relevant to a query Q, a document D must logically imply Q: D→Q.

2. Since information is by nature uncertain in IR, the truth of this implication cannot be established with certainty, and we can only measure a degree of certainty P(D→Q).

3. This degree of certainty is evaluated by means of a logic, following a general uncertainty principle, which in the present case can be stated as follows: given a query Q and a document D, a measure of the certainty of D→Q is given by the minimal amount of information that must be added to D in order that D→Q.
Nie [Nie89] proposed an extension to this approach, to take into account two different aspects of relevance. For some users, a document is relevant if it covers all aspects of the query, and relevance is interpreted as exhaustivity (D→Q). For others, a document must be specific to the query in order to be relevant: relevance is rather interpreted as exclusivity (Q→D). This leads to a more general evaluation of relevance, composed of these two properties: R(D,Q)=F[P(D→Q),P'(Q→D)]. We believe that in practice, either interpretation of relevance ((D→Q) or (Q→D)) can be used, depending on the specific IR problem to be modeled with logic. In some cases, rules for inferring the query are better suited, while in others, rules for inferring documents are more adequate.
With PAS, the evaluation of P(D→Q) (or P'(Q→D)) can take either a symbolic or a numerical form. For a detailed explanation of the inference process, we take the logical interpretation in which P(D→Q) is evaluated by the support of Q once D (and no other document) is considered true (we add D to the knowledge base), which is denoted spD(Q). The minimal amount of information that must be added to D for D→Q is expressed in the form of a set of arguments (a logical formula containing assumptions). For a numerical evaluation, we put the arguments in disjoint form and assign probabilities to assumptions, and thus compute a numerical degree of support denoted dspD(Q). Equivalently, P'(Q→D) is evaluated by the support of D once Q is set to true, denoted respectively spQ(D) and dspQ(D) for symbolic and numerical support.
This dual evaluation of uncertainty is rather new in IR. Nevertheless it corresponds to the profound nature of IR, which can be approached both logically and probabilistically. Different authors have pointed out that retrieval can be modeled by logical inference [vR86,vR89,CC92,Lal96], and it has been shown that many retrieval models can be reformulated with logic [vR86,Nie89]. But retrieval may also be viewed as an evidential reasoning process [TC90] based on multiple sources of evidence, where the probabilistic nature of information is fundamental. With PAS, the roles of logic and probability are clearly distinguished. Logic is used for representing uncertainty and drawing inferences, while probability is used for evaluating uncertainty. An interesting aspect of PAS is that the inference process is completely explicit: the retrieval system can "explain" why a document is judged relevant (or not).
An important issue has not been addressed: how are the probabilities of assumptions assessed? Since an assumption represents the uncertainty of a specific rule or fact, the probability of this assumption should be equal to the probability that this rule or fact is true. For example, for a rule P1∧a1→P2, the assumption a1 should be evaluated by p(a1)=p(P2|P1). Of course probability estimation is one of the fundamental problems of IR [TC97], and this is one of the main problems we face when modeling IR with PAS. We will discuss this issue in Section 5.
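Under this reading, p(a1)=p(P2|P1) could be estimated as a relative frequency over a collection, assuming we can observe which documents satisfy P1 and P2 (e.g. contain the corresponding terms). A purely illustrative sketch (the function and the toy collection are ours, not from the paper):

```python
# Estimate p(consequent | antecedent) as a relative frequency over the
# documents that contain the antecedent term.  Purely illustrative.
def rule_assumption_prob(docs, antecedent, consequent):
    with_ante = [d for d in docs if antecedent in d]
    if not with_ante:
        return 0.0
    return sum(1 for d in with_ante if consequent in d) / len(with_ante)

# Toy collection: each document is represented by its set of index terms.
docs = [{'information', 'retrieval'},
        {'information', 'system'},
        {'game', 'theory'}]

# p(a1) for a rule 'information' & a1 -> 'retrieval':
p_a1 = rule_assumption_prob(docs, 'information', 'retrieval')  # 1/2 = 0.5
```

Such frequency estimates are only a starting point; as noted above, principled probability estimation remains an open problem.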

An example
To see how a retrieval system can be modeled with PAS, consider the following example. A document D is represented by terms T1, T2 and T3, and a query Q by terms T1 and T3. The set of variables of interest is P={D, Q, T1, T2, T3}.
If we take the Q→D interpretation of relevance, we need rules for inferring D: the set of rules is reversed. Figure 1 shows the two PAS corresponding to the two interpretations of relevance. Until now, we have only a symbolic model of IR, and we need to assess the assumptions to obtain a complete model which can rank documents relative to a certain query. We will make two very simple assumptions:

• if a document or a query contains a term, there is a 0.5 probability that the document/query entails that term;
• if a document or a query contains N terms, there is a 1/N probability that each term entails the document/query.

We then have p(a1)=p(a2)=p(a3)=0.5, p(b1)=p(b2)=0.5, p(c1)=p(c2)=0.5, p(d1)=p(d2)=p(d3)=0.33. The two degrees of support are:

dspD(Q) = (0.5*0.5) + (0.5*0.5*(1-0.5*0.5)) = 0.4375
dspQ(D) = (0.5*0.33) + (0.5*0.33*(1-0.5*0.33)) = 0.3027

Logically the measure of exhaustivity is higher than the measure of specificity, since all query terms are document terms, while the converse is not true.
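These two figures can be reproduced by combining independent inference chains in disjoint form, which for chains sharing no assumptions reduces to 1 − ∏(1 − p_chain). A sketch under that assumption (the helper name is ours):

```python
def combine_chains(chain_probs):
    # Disjoint-form combination of inference chains with disjoint
    # assumption sets: dsp = 1 - prod(1 - p_i).
    out = 1.0
    for q in chain_probs:
        out *= 1.0 - q
    return 1.0 - out

# D -> Q: two chains D -> T1 -> Q and D -> T3 -> Q, each 0.5 * 0.5:
dsp_D_Q = combine_chains([0.5 * 0.5, 0.5 * 0.5])    # 0.4375

# Q -> D: two chains Q -> T1 -> D and Q -> T3 -> D, each 0.5 * 0.33:
dsp_Q_D = combine_chains([0.5 * 0.33, 0.5 * 0.33])  # ~0.3027
```

The disjoint-form sums in the text and the product form computed here are two writings of the same quantity.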

A model of IR based on PAS
To design a PAS, we must first choose our variables of interest, represented by a set P of propositions. For the sake of simplicity, we will consider only one document. The set P contains:
• D, which represents the original document.

PAS for query expansion
While there is great hope that uncertain logics may help build powerful IR models, they are often considered too computationally expensive to be used in large-scale IR (but see [CRSvR96]). Another handicap of logical models is that commercial retrieval systems are well established and it is costly to change them completely. But uncertain logics can be used to help solve specific IR problems, working at a precise stage of the retrieval process, without fundamentally changing the retrieval system. Also, a general model of IR should not be built solely on theoretical considerations: combining theoretical investigation with experiments on practical problems of IR should lead to a better understanding of the inference processes. Here we investigate the use of PAS for improving the query representation using positive and negative relationships between terms.

Modeling query expansion
Our modeling of query expansion is based on the Q→D interpretation of relevance: starting from the query, we try to infer the document. To design a PAS for query expansion, the variables of interest are the initial query Q, the document D and all the representation terms. These variables of interest are represented by the set of propositions P={Q, D, T1,...,TN}. The retrieval system computes a score on the terms representing the query, which must be converted into a probability. Assumption ai represents the uncertainty on term Ti for representing the query. For all Ti representing the query:

Q∧ai→Ti

Since in this model the query is set to true, we have the equivalent rule:

ai→Ti

A link between terms Ti and Tj is interpreted as information that the presence of Ti is evidence for the presence of Tj. The uncertainty of this information is represented by an assumption lij:

Ti∧lij→Tj

In a similar way, negative evidence (Ti is evidence for the absence of Tk) is modeled as:

Ti∧lik→~Tk

Note that multiple relationships between words T1 and T2, arising when combining different bodies of knowledge (for example relationships from thesauri, statistical co-occurrence and pseudo-classification), can be modeled by different assumptions: T1∧l12→T2, T1∧l'12→T2, T1∧l''12→~T2. Assumption bi represents the uncertainty on term Ti for representing the document. For all Ti representing the document:

Ti∧bi→D

The main purpose of using PAS here is to provide a theoretical framework for making inferences using term relationships. If the PAS is used solely for expanding the query in a well-established IR system based, for example, on the vector-space model, a simple matching can be done once the numerical support of each query term is computed. The rules for inferring document D are necessary only if the whole system is based on the PAS framework.
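The rule shapes above can be encoded uniformly. A minimal sketch (data layout and names are ours) representing each rule as (antecedent terms, assumption, consequent, polarity), with a naive forward chainer to trace an inference chain from query to document:

```python
# Each rule: (tuple of antecedent propositions, assumption name,
# consequent proposition, True for a positive consequent / False for ~).
rules = [
    ((), 'a1', 'T1', True),          # a1 -> T1          (query term, Q true)
    (('T1',), 'l12', 'T2', True),    # T1 & l12 -> T2    (positive link)
    (('T1',), 'l13', 'T3', False),   # T1 & l13 -> ~T3   (negative link)
    (('T2',), 'b2', 'D', True),      # T2 & b2 -> D      (document rule)
]

def forward_chain(rules, assumptions_true):
    # Derive all positive propositions reachable when the given
    # assumptions hold (negated consequents block, they do not derive).
    derived = set()
    changed = True
    while changed:
        changed = False
        for ante, assumption, cons, positive in rules:
            if (positive and assumption in assumptions_true
                    and all(t in derived for t in ante)
                    and cons not in derived):
                derived.add(cons)
                changed = True
    return derived

# With a1, l12 and b2 true, the chain Q -> T1 -> T2 -> D goes through:
assert 'D' in forward_chain(rules, {'a1', 'l12', 'b2'})
```

Collecting, for each derivable proposition, the assumption sets of all chains that reach it would yield its symbolic support in the sense of Section 2.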
We are presently investigating ways to assess the probabilities of link assumptions. Nie and Brisebois [96] propose a very interesting way to learn the strength (between 0 and 1) of thesaurus relationships using previous relevance judgments, within a fuzzy modal logic framework. On the CACM collection, with the WordNet™ thesaurus and a set of 50 queries for training, they find approximately a strength of 0.1 for synonymy relationships, 0.3 for holonymy and 0.85 for meronymy. Of course these values are thesaurus, collection, test query and system dependent. We intend to adapt their method to PAS.
Statistical co-occurrence information has not always proven useful to IR: Peat and Willett [91] give an explanation of this paradox. Second-order co-occurrence is more reliable: with that technique, a term is represented by a vector of all terms with which it occurs in a certain context. The context can be, for example, a sentence, a paragraph, a document or a sliding window. Then a measure of similarity (typically a cosine measure) can be computed for every term pair. With that method, two synonyms like "color" and "colour" (which rarely occur in the same document) should have a high measure of similarity since they are usually found with the same words. The complexity of computing second-order co-occurrence is O(N³) in the number of different terms N, but Schütze and Pedersen [97] show how to reduce it to O(N²) using singular value decomposition.
We are still investigating ways of converting a similarity measure into the probability of a link assumption, but the general idea is as follows: for a specific word, we compute its similarity with all the other words, and find the average similarity for this word. This word is then considered as positive/negative evidence for the presence of words with a similarity measure higher/lower than the average similarity.
In our preliminary investigation on the CISI collection, we have computed the cosine similarity measure of the word 'information' with all words found more than 10 times in the collection. The average similarity measure is 0.35. Two highly correlated terms are 'data' (0.61) and 'retrieval' (0.76), while the word 'game' has a similarity of 0.18. If we compute the probability of a link assumption as the difference between the similarity measure and the average similarity measure, we find that 'information' entails 'retrieval' with a probability of 0.41:

'information'∧l12→'retrieval' with p(l12)=0.76-0.35=0.41

The probability of a link assumption is not symmetric: 'retrieval' has an average similarity with the other words of only 0.24. It is then stronger evidence for the presence of 'information' than the converse:

'retrieval'∧l21→'information' with p(l21)=0.76-0.24=0.52

Since the word 'game' has a low similarity measure with 'information', the latter should be considered as negative evidence for the former:

'information'∧l14→~'game' with p(l14)=-(0.18-0.35)=0.17

Negative evidence should help discount "noisy" words which are added from a manual thesaurus but should not be related in the context of the query. Much investigation remains to find a proper way of computing link assumptions, to build an efficient inference engine to compute the support of each term, and to reduce the amount of computation required.
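The heuristic just described can be written down directly. This sketch (function name ours) reproduces the three CISI figures quoted above, taking the sign of similarity minus average to decide between positive and negative evidence:

```python
def link_assumption(similarity, avg_similarity):
    # Positive evidence when similarity is above the word's average,
    # negative evidence when below; the probability of the link
    # assumption is the absolute difference.
    kind = 'positive' if similarity >= avg_similarity else 'negative'
    return kind, round(abs(similarity - avg_similarity), 2)

# The CISI figures from the text:
assert link_assumption(0.76, 0.35) == ('positive', 0.41)  # information -> retrieval
assert link_assumption(0.76, 0.24) == ('positive', 0.52)  # retrieval -> information
assert link_assumption(0.18, 0.35) == ('negative', 0.17)  # information -> ~game
```

Note that the asymmetry of the link probabilities falls out of using each word's own average similarity as the threshold.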

An example
A query is represented by terms T1, T2 and T3. As in Section 3.2, there is a prior support of 0.5 on these terms. Assume that terms T1, T2 and T3 are linked by thesaurus relationships to T4 and T5, T6 and T7, and T8 and T9, respectively. For the sake of simplicity, we do not consider co-occurrence relationships, which can be negative. Also, we do not consider for this example words that would be linked to T4 to T9, but we do consider "inside" links among T1 to T9. Assume there are three types of relationships, with probabilities 0.6, 0.4 and 0.2.
The set of rules is :

In summary, we started from a query Q={T1:0.5, T2:0.5, T3:0.5}. The expanded query is Q'={T1:0.5, T2:0.5, T3:0.5, T4:0.3, T5:0.3, T6:0.37, T7:0.3827, T8:0.37, T9:0.3692}. T4 to T9 are the terms added in the expanded query. Computing the probability of T4 and T5 is straightforward. The support of terms T6 to T9 illustrates how evidence is propagated with PAS: a piece of evidence is never counted twice. In practice, query expansion with PAS will probably involve many more related terms and more complex computations.

Discussion
More general formalisms are needed to model and combine the evidence from various kinds of knowledge. The inference network model of Turtle [TC90,91] has emerged as powerful for modeling various kinds of uncertain knowledge. In this approach, probabilistic causal relationships between variables are combined in order to estimate the probability that a document meets a user's information need. Different formulations of a query, multiple document representations, etc. are very naturally modeled within this model. Nevertheless, some types of knowledge are not easily modeled: special attention must be paid to prevent cycles in the network (evidence would be propagated indefinitely), and multiple relationships between variables must be summarized in one link matrix. Also, this approach is purely numerical, and the system cannot easily explain its decisions. PAS could then be considered as an alternative to the inference network model for solving these specific problems.
While the need for a logic as a formal model of IR has been strongly justified by theoretical arguments, it is not always obvious how a logic would solve practical IR problems. We think that logic can serve for modeling specific knowledge which cannot be adequately addressed with classical methods. Non-classical sources of evidence, which seem to be a very promising way to improve IR performance, need special methods to be adequately modeled.
In further research, we will concentrate on assessing probabilities of assumptions, in order to make different practical applications of PAS to IR. It is of course possible to give ad hoc or "well-suited" values to probabilities, but our intention is to base probability estimates on strong theoretical arguments. In another paper [Pic98], the problem of modeling and combining evidence provided by document relationships is tackled with PAS. It is shown how prior probabilities on a document's relevance are assessed by a logistic regression using the rank. It is also shown how to assess the probability that a document is relevant if it is linked to a document known to be relevant. A practical implementation was made on the CACM collection, with satisfying results.
Our intention now is to develop the model of document relationships more thoroughly, and to make a practical implementation of PAS for query expansion. Then, we may think of implementing a complete IR system with PAS.

Figure 1: Representation of the PAS for the D→Q and Q→D interpretations of relevance

Figure 2 shows an example. Dashed circles represent assumptions. The document D has 3 different document representations. There are 4 indexing terms, of which two are related (T3 to T4). Query Q has 2 different representations. The support of Q can be found by tracing all paths (inference chains) from D to Q.

Table 2: Rules in the D→Q interpretation of relevance
• D1,...,DR, which represent the R different document representations.