Information Retrieval is the field of Informatics concerned with the problems and challenges of storing and accessing information. Most work in this area assumes static document collections. However, many collections are dynamic: they evolve over time as documents are added, edited, or removed. Even in highly dynamic environments such as the World Wide Web, research tends to focus on the most recent version of each document, and past versions are usually discarded. Recognizing the changes in dynamic text collections, and exploiting them for document retrieval and presentation, raises new and relevant research challenges. This paper addresses one problem that gains relevance in this context: the summarization of changes in dynamic text collections. We first define the problem as producing a summary that describes the textual changes made to a single document, or to a set of related documents, over a user-defined time period. We then present an extensive overview of approaches in the literature that address related problems, and finally discuss open issues and directions for future work.
Author and article information
PhD in Computer Science (MAP-i), INESC TEC, Universidade do Porto
Rua Dr. Roberto Frias, s/n 4200-465 Porto, Portugal