This paper is concerned with the use of linguistically motivated phrases as index terms in Information Retrieval applications. In addition to conventional noun phrases, we propose using verb phrases as index terms for text classification. We describe techniques for phrase matching through syntactic normalization and semantic matching. We discuss how the syntactic normalization of phrases can be realized by transduction to frames. Semantic normalization is based on lexico-semantic relations and takes into account certain properties of the classification algorithms used.
The ideas described here are being implemented in the Document Routing system DORO, in which statistical learning algorithms are applied to document profiles consisting of phrases. This paper describes the rationale behind work in progress, rather than presenting final results.