Web Information Retrieval (WebIR) is the application of Information Retrieval concepts to the World Wide Web. The most successful approaches in this field model the web’s structure as a directed graph and exploit that structure in different ways. Within this line of research, HITS and PageRank are two of the best-known paradigms for evaluating the importance of web documents. Most of this research has its origins in citation analysis, but although time is an important dimension in the citation analysis literature, it has not been explored in depth within WebIR. Recent studies show that the web is a highly dynamic environment, with significant changes occurring weekly. The Blogospace is a good example of this very active behavior. In this work, temporal web evidence is identified and categorized into two classes, one based on features extracted from individual documents and the other based on features extracted from the whole web. A broad survey of previous work exploring temporal evidence is also presented. Finally, ideas for exploring temporal web evidence in typical web tasks are briefly discussed. The lack of suitable corpora containing temporal evidence has been a deterrent to research in this field. The recent availability of public datasets containing temporal information has raised public awareness of this topic.
Author and article information
Faculdade de Engenharia, Universidade do Porto
Rua Dr. Roberto Frias, s/n 4200-465 Porto PORTUGAL