Dialogue-Driven Information Retrieval

Web search engines are built to help users locate information quickly and efficiently. However, although search engines provide considerable assistance, one of the main difficulties remains the ambiguity and uncertainty involved in matching information needs against documents that might satisfy those needs. A possible solution is the application of natural language dialogue systems, an area that is becoming increasingly prominent in the field of natural language processing. Dialogue systems aim to converse with a human in a logical and articulate manner, and they have been shown to work well over structured knowledge sources. Imposing a dialogue system on an intranet, however, is a new challenge. Here we look at combining a dialogue system with the power of a standard search engine.


INTRODUCTION
Imagine you could interact with a university intranet search engine as follows:
User: Head of department
System: Which department are you looking for?
User: computer science
System: The head of the computer science department is Dr. Sam Steel. His contact details are as follows ... Do you want any further information?
User: How do I get to his office?
System: The quickest route from the University information point is as follows ...
This type of interaction seems well beyond what is currently possible, but the documents stored at the website do contain a lot of implicit structure that could be used to make such a system a reality.
One thing we should point out here is that a user is not forced to interact with the dialogue system, as there is no need to initiate a separate dialogue to satisfy every information need. Search engines perform very well on user queries in most cases. Therefore, with any response from the system the user will also see the best matching documents returned by the local search engine. A user is free to ignore all options proposed by the dialogue manager if the top matches contain the desired information.
However, even queries in document collections of limited size often return a large number of documents, many of them not relevant to the query (Kruschwitz, 2005). To stay with the example of the Essex intranet, consider the query "head of the computer science department". The search engine should be able to provide an excellent match for this simple query and retrieve Sam Steel's home page as the best match. However, we end up getting several other pages that are not relevant to this query.
One way of imposing a dialogue system on such a document collection is to enforce formatting guidelines on web site developers so that all documents are properly structured, syntactically as well as semantically. There are a number of problems with such an approach (Hawking and Zobel, 2007). We do not impose such restrictions and assume that most of the content is unstructured or only partially structured. Our aim is to automatically extract useful domain knowledge from the documents. This knowledge can then be used to guide the dialogue manager. For example, we are interested in extracting subject/object relations to construct domain-specific knowledge. Such domain knowledge would help to provide the user with precise and straightforward results.
The work proposed here represents a PhD project that has just started.

MOTIVATION
There has been little progress in the implementation of such dialogue systems in intranet search engines. Our work is based on UKSearch (Kruschwitz et al., 2008), a domain-specific dialogue search system developed at the University of Essex. In UKSearch, a domain model is constructed by extracting document markup structure. A user query is submitted to the search engine as well as to the domain model. Apart from showing the standard search engine results, the system assists users by suggesting query modification terms to refine or relax the query. However, the system has no "understanding" of these modification suggestions.
Our motivation stems from the user queries submitted to the UKSearch system on the University of Essex website. Queries have been collected for more than two years to study user search behaviour. As in other studies (Bendersky and Croft, 2009), queries were found to be very short, fewer than two words on average. In addition, it was found that room numbers, lab numbers, telephone numbers, course titles and similar items were routinely searched for.

RELATED WORK
Our research is closely related to areas such as dialogue systems, question answering systems and information extraction. Dialogue systems engage in some sort of conversation with a user in order to perform a domain-specific task. The task could be booking a flight, accessing yellow pages directory data or enquiring about train timetables. All these tasks can be categorized as information seeking tasks: users seek information by explicitly providing constraints to the system. Table 1 gives a quick overview of the related research areas.

Information Seeking Dialogue Systems
In recent years, the area of NLP has witnessed a rapid development of dialogue systems, from simple conversational agents to sophisticated multi-modal dialogue systems. Such systems can use various modalities to interact with a user, such as text, speech, graphics and gestures. We will not look at multi-modal dialogue systems. Early dialogue systems like ELIZA and PARRY are known as conversational agents. The main purpose of ELIZA (Weizenbaum, 1966) was to study the interaction between human and computer, with ELIZA playing the role of a psychotherapist. It is a script-based program built on decomposition and reassembly rules: it reads the input string to identify keywords, and the transformation rules associated with a keyword are applied to the input sentence to produce the system's response. Although conversational agents are a kind of dialogue system, we are not interested in this type of application. Instead, we are looking specifically at the information seeking aspect of dialogue systems.
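The keyword-driven decomposition and reassembly idea can be illustrated with a minimal sketch. The rules, the pronoun table and the default reply below are invented for illustration and are not taken from Weizenbaum's original script.

```python
import re

# Hypothetical rules: (decomposition pattern, reassembly template).
# Illustrative only; these are not Weizenbaum's original script.
RULES = [
    (re.compile(r".*\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r".*\bmy (.*)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Minimal pronoun reflection applied to the captured fragment (assumed table).
REFLECTIONS = {"my": "your", "i": "you", "me": "you", "am": "are"}

DEFAULT_RESPONSE = "Please go on."


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(utterance: str) -> str:
    """Apply the first rule whose keyword pattern matches the utterance."""
    cleaned = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            # Decomposition captured the rest of the sentence;
            # reassembly slots it into the response template.
            return template.format(reflect(match.group(1)))
    return DEFAULT_RESPONSE
```

The program has no model of what the user wants; it only rearranges the input, which is why such agents are of limited use for information seeking.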
Initially, dialogue systems concentrated on travel-related applications (ATIS) (Hemphill et al., 1990). Later, the field moved to other domains such as call routing (HMIHY) (Gorin et al., 1997) and intelligent tutoring systems (ITSPOKE) (Litman and Silliman, 2004). Examples of multi-modal dialogue systems include a bathroom designer (COMIC) (Catizone et al., 2003), the Queen's Communicator (O'Neill et al., 2003) and an unmanned robot helicopter (WITAS) (Lemon et al., 2001). Unlike these dialogue systems, which focus on structured databases, our work concentrates on unstructured data found on web pages. There are three significant differences between these dialogue systems and intranet search: first, we have to deal with web-style queries; second, queries are more focused in these dialogue systems; and finally, we do not have explicit domain-specific knowledge.

Dialogue System Modelling Approaches
Finite grammars are the simplest approach to modelling dialogue systems. Initially, finite grammars were used to model the entire structure of a conversation. In a finite grammar, each state represents a system utterance and the transitions between states are based on the user's response. The problem with this approach is its lack of flexibility and its poor portability to other domains (Wilks et al., 2006).
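A minimal sketch of such a model is given below. The state names, prompts and transition keywords are hypothetical and chosen only to echo the intranet example; they do not describe an existing system.

```python
# Hypothetical finite-state dialogue: each state carries a system utterance,
# and the user's response selects the next state.
STATES = {
    "start": {
        "prompt": "Are you looking for a person or a room?",
        "transitions": {"person": "ask_department", "room": "ask_building"},
    },
    "ask_department": {
        "prompt": "Which department are you looking for?",
        "transitions": {},
    },
    "ask_building": {
        "prompt": "Which building is the room in?",
        "transitions": {},
    },
}


def next_state(current: str, user_response: str) -> str:
    """Follow the transition matching the response; otherwise stay in place."""
    transitions = STATES[current]["transitions"]
    return transitions.get(user_response.strip().lower(), current)


def run_dialogue(responses):
    """Return the system prompts produced for a sequence of user responses."""
    state = "start"
    prompts = [STATES[state]["prompt"]]
    for response in responses:
        state = next_state(state, response)
        prompts.append(STATES[state]["prompt"])
    return prompts
```

Any response that the current state does not anticipate leaves the system where it was, which illustrates the rigidity noted above: every acceptable user move has to be enumerated in advance, and the whole graph has to be rewritten for a new domain.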
Frame-based approaches are an extension of finite grammars. They are also known as slot and filler structures. The slots within a frame are filled based on the user's utterance. The main advantage of this approach is that multiple slots can be filled in any order. The database is queried once all the slots have been filled. The approach is somewhat more flexible than finite grammars because it does not enforce any strict ordering on user utterances (De Roeck et al., 1998).
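The slot and filler idea can be sketched as follows. The slot names and the keyword lists used to recognise values are assumptions made for illustration; a real system would use a proper language understanding component.

```python
from typing import Dict, List, Optional

# Hypothetical frame for a staff lookup; slot names are illustrative.
SLOTS = ("role", "department")

# Hypothetical keyword lists used to recognise slot values in an utterance.
SLOT_VALUES = {
    "role": ["head of department", "secretary"],
    "department": ["computer science", "mathematics"],
}

Frame = Dict[str, Optional[str]]


def new_frame() -> Frame:
    """Create a frame with every slot empty."""
    return {slot: None for slot in SLOTS}


def fill_slots(frame: Frame, utterance: str) -> Frame:
    """Return a copy of the frame with any slot values found in the utterance."""
    text = utterance.lower()
    updated = dict(frame)
    for slot, values in SLOT_VALUES.items():
        for value in values:
            if value in text:
                updated[slot] = value
    return updated


def missing_slots(frame: Frame) -> List[str]:
    """Slots still to be asked for; the database is queried once this is empty."""
    return [slot for slot in SLOTS if frame[slot] is None]
```

Because `fill_slots` looks for every slot on every turn, the user may supply the department before the role, or both at once, without the system prescribing an order.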

Question Answering Systems
Question answering (QA) systems automatically extract answers to questions formulated by a user in natural language. Question answering is not a new research topic, and the main goal of these systems is to provide the user with short answers instead of a list of documents. Question answering systems can be broadly classified into two domains:
• Closed domain (limited to a particular domain)
• Open domain (can be anything and mostly fact based)
The first question answering systems were Baseball and LUNAR. Baseball answered questions pertaining to baseball games played in the American League over a period of one year (Green et al., 1961). LUNAR focused on questions related to moon rocks and soil collected by the Apollo moon missions (Woods, 1973). These two systems are good examples of closed domain question answering systems. In closed domain question answering systems, the domain-specific knowledge is stored in databases, and a natural language interface is provided to access that domain information.
Open domain question answering systems can be divided into factoid and complex QA systems. Most open domain question answering systems are factoid based. Factoid based QA systems deal with facts such as person names, dates of birth and organizations. Factoid question types include where, when and who. For complex and analytical tasks, on the other hand, interactive systems like HITIQA (Small et al., 2003) and FERRET (Hickl et al., 2006) have been developed. Both of these systems are meant for answering explanatory questions such as why, how, list and define.
The QuASM question answering system relies on HTML markup structures to extract answers to factoid questions (Pinto et al., 2002). The AQUA question answering system (Vargas-Vera et al., 2003) incorporates various knowledge sources, such as an ontology and WordNet, along with a domain-specific database. Questions are answered by means of the knowledge base. If AQUA fails to find an answer there, it uses a search engine to find an answer to the query. We are planning to implement a similar strategy.
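The fallback strategy described above might be sketched as follows. This is not AQUA's implementation: the knowledge base layout, the `answer_query` function and the `search` callable are hypothetical stand-ins for whatever components the final system uses.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical knowledge base mapping (entity, attribute) pairs to answers.
KnowledgeBase = Dict[Tuple[str, str], str]


def answer_query(
    entity: str,
    attribute: str,
    knowledge_base: KnowledgeBase,
    search: Callable[[str], List[str]],
) -> dict:
    """Try the knowledge base first; fall back to the search engine otherwise."""
    key = (entity.lower(), attribute.lower())
    if key in knowledge_base:
        return {
            "source": "knowledge_base",
            "answer": knowledge_base[key],
            "documents": [],
        }
    # No stored fact: hand the query to the (assumed) search engine interface.
    return {
        "source": "search_engine",
        "answer": None,
        "documents": search(f"{entity} {attribute}"),
    }
```

Passing the search engine in as a callable keeps the sketch independent of any particular engine and makes the fallback path easy to exercise with a stub.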
Another example of an open domain question answering system is AnswerBus. It accepts user queries in any of the following languages: English, German, French, Spanish, Italian and Portuguese, and returns results in English. AltaVista's translation tool Babel Fish is used for translation. Search engines and directories are used to find documents relevant to user queries. The major difference from other open domain QA systems is that AnswerBus returns a set of possible sentences instead of fixed length answers (Zheng, 2002).
Perhaps the most popular web-based natural language QA system available online today, and the first of its kind, is START. START (Katz and Lin, 2002) relies on Omnibase (Katz et al., 2002), a heterogeneous database in which documents are annotated in natural language to extract triples (subject, relation, object). In our proposed work, we will also use such triples to extract domain-specific knowledge.

RESEARCH QUESTIONS
The questions to be addressed by this proposed research are:
• How can a dialogue system be incorporated into the search process to provide the user with a more natural language interface?
• Can such a dialogue system on an intranet provide the user with more relevant information and offer a better user experience than a standard search engine?

PROPOSED RESEARCH
There is a wealth of literature on research in dialogue systems and question answering systems. Dialogue systems, however, typically rely on some sort of structured knowledge, whereas question answering systems are in most cases one-shot interactions. Dialogue systems for intranet search as proposed here differ from the related work discussed. The main differences are:
• Short queries in most cases (often just keywords rather than questions)
• Unstructured data to start with
• Domain-specific dialogue (without having lots of domain knowledge)
• Wide range of possible user queries
The past few years have also seen a rapid expansion of activity in information extraction (Jurafsky and Martin, 2008). Extracting named entities and simple relations has become much more robust, and this is another area of research that we will tap into. There are two main parts to the proposed research:
• Turn the document collection into a structured knowledge source, employing NLP techniques and methods of information extraction to automatically detect named entities (person names, room numbers, etc.) as well as facts, e.g. predicate-argument structures like: is(Sam Steel, head of department)
• Impose a dialogue system that employs the automatically extracted knowledge and appropriate domain-specific knowledge to assist a user in navigation, as outlined in the motivating example.
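To make the first part more concrete, the sketch below turns a sentence into a predicate-argument structure of the kind shown above and then looks it up again. The regular expression is a deliberately naive stand-in for named entity recognition and parsing, and the names `Triple`, `extract_triple` and `find_subjects` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical pattern for sentences of the form "<Person> is the <description>."
# A real system would use named entity recognition and parsing instead.
IS_PATTERN = re.compile(
    r"^(?:Dr\.\s+|Prof\.\s+)?"
    r"(?P<subject>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"
    r" is the (?P<obj>[^.]+)\.?$"
)


def extract_triple(sentence: str) -> Optional[Triple]:
    """Return is(subject, obj) for a matching sentence, otherwise None."""
    match = IS_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return Triple(match.group("subject"), "is", match.group("obj"))


def find_subjects(triples: List[Triple], relation: str, obj: str) -> List[str]:
    """Look up all subjects that stand in the given relation to obj."""
    return [t.subject for t in triples if t.relation == relation and t.obj == obj]
```

A dialogue manager could query such a store to answer directly when a fact is available, and fall back to the search engine results otherwise.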
Evaluating such a system is particularly difficult, as standard measures like precision and recall are not necessarily the best (and certainly not the only) measures to assess a dialogue system. We will perform a range of evaluations, from technical evaluations that investigate the quality of the extracted facts to full user evaluations. Our main methodology for evaluating the proposed dialogue system will be task-based evaluation. Some of the measures that will be used to compare the dialogue system against baselines (e.g. a standard search engine) include:
• Number of interaction steps required to arrive at an answer (dialogue length)
• Time taken to process a user query
• Precision and recall (success rate of retrieved results)
• User satisfaction
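Two of these measures are simple enough to state in code. The sketch below uses the standard set-based definitions of precision and recall; counting dialogue length as the number of user turns is an assumption made here, since the exact definition has not yet been fixed.

```python
from typing import Iterable, List, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall over document identifiers."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def dialogue_length(turns: List[Tuple[str, str]]) -> int:
    """Number of user turns in a dialogue (assumed definition of dialogue length)."""
    return sum(1 for speaker, _ in turns if speaker == "user")
```

For instance, if four documents are retrieved and two of the three relevant documents are among them, precision is 2/4 and recall is 2/3.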

TABLE 1: Related research areas