This paper proposes preliminary research on the retrieval of structured text, such as documents marked up in Extensible Markup Language (XML). We believe that capturing how a reader perceives the meaning of documents, especially of text genres, may have implications for information retrieval (IR), and in particular for cognitive IR and relevance. Previous research on ‘shallow’ features of structured text has shown that categorization by form is possible. Gibson’s theory of ‘affordances’, together with genre, suggests that the structure of a text conveys its meaning and purpose before the reader has even begun to read it, and should therefore provide a good basis for the ‘deep’ skimming and categorization of texts. We believe that Gibson’s ‘affordances’ will help the user to locate, examine and utilize shallow or deep features of genres, and so retrieve relevant output. Our proposal puts forward two hypotheses, with a list of research questions to test them, and culminates in experiments studying human categorization behaviour when viewing the structures of emails and web documents. Finally, we will examine the effectiveness of adding structural layout cues to a Yahoo discussion forum, which is rich in structure but is currently treated only as a bag-of-words and is searchable only through a Boolean search engine.
Author and article information
Contributors
Malcolm Clark
Conference
Publication date: August 2007
Publication date (Print): August 2007
Pages: 1-6
Affiliations
[0001] School of Computing, The Robert Gordon University