PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval

Abstract

Background

Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web.

Results

We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics.

Conclusion

The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain.
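
The reranking idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy related-article graph, the damping factor of 0.85, the min-max normalization, and the interpolation weight are all assumptions chosen for illustration, since the abstract does not state how the scores were combined. The sketch computes PageRank by power iteration over a small network and then linearly interpolates the normalized link scores with hypothetical retrieval-engine scores.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph given as adjacency lists."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so that two score types are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rerank(retrieval_scores: Dict[str, float],
           link_scores: Dict[str, float],
           weight: float = 0.8) -> List[str]:
    """Order documents by weight * text score + (1 - weight) * link score.

    The linear interpolation and the default weight are illustrative
    assumptions, not settings reported in the abstract.
    """
    text = min_max(retrieval_scores)
    link = min_max({d: link_scores.get(d, 0.0) for d in retrieval_scores})
    combined = {d: weight * text[d] + (1.0 - weight) * link[d] for d in text}
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical related-article links: citation -> its related citations.
related = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
# Hypothetical scores from an off-the-shelf retrieval engine.
engine_scores = {"A": 2.0, "B": 2.1, "C": 1.9, "D": 1.0}

ranks = pagerank(related)
reranked = rerank(engine_scores, ranks, weight=0.8)
```

Setting `weight` to 1.0 reproduces the original retrieval ranking, while lower values give more influence to the structure of the related-article network.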

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2008

Publication date (Electronic): 6 June 2008

Volume: 9

Page: 270

Affiliations

[1] National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA

[2] The iSchool, University of Maryland, College Park, Maryland, USA

Article

Publisher ID: 1471-2105-9-270

DOI: 10.1186/1471-2105-9-270

PMC ID: 2442104

PubMed ID: 18538027

SO-VID: a8502b15-ab4e-473a-9939-8242d9e05464

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
