Fourth BCS-IRSG Symposium on Future Directions in Information Access (FDIA 2011)
31 August 2011
Recent years have witnessed an upsurge in the quantity of news, encyclopedic articles, blogs, forum posts, and social networking posts on the web. Some of these, such as news and Wikipedia articles, are carefully authored, edited, and quality controlled, while others, such as blogs and social networking posts, are not. A document in the former category is often explicitly decomposed into paragraphs or sections, each conveying a specific aspect of the overall information in a more focused way. The sub-topic-based organizational pattern is more implicit in a document of the latter category, due to the absence of explicitly demarcated paragraphs. Gaining insight into the sub-topical structure of documents is important for effectively managing and utilizing the different aspects of information they contain. The process of extracting the sub-topical structure of a document is known as text segmentation.