      Is Open Access

      Focused Retrieval Using Topical Language and Structure

      proceedings-article
      BCS IRSG Symposium: Future Directions in Information Access 2007 (FDIA)
      Future Directions in Information Access
      28-29 August 2007
      Focused Retrieval, Web Retrieval, Language Modeling, Relevance Feedback

            Abstract

            We investigate focused retrieval techniques that deal with the increasing amount of structure on the web. Our approach is to combine multiple representations of web information in a common framework based on statistical language models. In this framework, it will be possible to derive a topical language model of the actual language use on web pages about a certain topic (such as arts, business, entertainment, or education) using the unigrams and bigrams taken from the plain text of the web pages. Similarly, it will be possible to derive models of the structure of web pages to distinguish between blogs, FAQs, personal web pages, etc. Structural characteristics of a web page include, amongst others, tag-name statistics and parent-child tag relationships. We will build a multi-level language model to exploit the information contained in the topical language and structure models. The .GOV2 corpus will be used as a test collection, with queries run against different topical categories and against web pages with different structures. We plan to develop so-called parsimonious models to derive a compact representation and to handle dependencies between representations of the data.
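
            The abstract does not say how the parsimonious topical models would be estimated. As a minimal sketch only, the snippet below uses one common formulation, an EM procedure that credits a term to the topical model only to the extent that a background model does not already explain it. The function names, the mixing weight `lam`, the pruning threshold, and the toy texts are all assumptions made for illustration, not details from the paper.

```python
from collections import Counter


def mle(counts):
    """Maximum-likelihood unigram distribution from raw term counts."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def parsimonious_model(topic_counts, background, lam=0.1, threshold=1e-4,
                       iterations=50):
    """EM sketch of a parsimonious unigram model (assumed formulation).

    lam is the assumed mixing weight of the topical model; terms whose
    probability drops below threshold are pruned from the model.
    """
    model = mle(topic_counts)
    for _ in range(iterations):
        # E-step: expected share of each term count credited to the topic.
        expected = {}
        for term, p_topic in model.items():
            p_bg = background.get(term, 1e-9)  # tiny floor for unseen terms
            expected[term] = topic_counts[term] * (lam * p_topic) / (
                (1 - lam) * p_bg + lam * p_topic
            )
        # M-step: renormalise, prune low-probability terms, renormalise again.
        total = sum(expected.values())
        model = {t: e / total for t, e in expected.items()
                 if e / total >= threshold}
        norm = sum(model.values())
        model = {t: p / norm for t, p in model.items()}
    return model


# Toy data (hypothetical): a general background text and an "arts" topic text.
background = mle(Counter(
    "the of and the of a the and market the gallery a of the".split()))
topic_counts = Counter(
    "the gallery of the painting and the sculpture gallery painting".split())

baseline = mle(topic_counts)
topical = parsimonious_model(topic_counts, background)
```

            In this toy run, common words such as "the" lose probability mass relative to the plain maximum-likelihood estimate, while topic-specific words such as "painting" gain it, which is the compact representation the abstract alludes to. A structural model could in principle be estimated the same way over tag names instead of words, though the abstract leaves that design open.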


            Author and article information

            Contributors
            Conference
            August 2007
            Pages: 1-6
            Affiliations
            [0001]Archives and Information Studies, University of Amsterdam

            Turfdraagsterpad 9, 1012 XT Amsterdam, The Netherlands
            Article
            DOI: 10.14236/ewic/FDIA2007.9
            © A.M. Kaptein. Published by BCS Learning and Development Ltd. BCS IRSG Symposium: Future Directions in Information Access 2007, Glasgow

            This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

            BCS IRSG Symposium: Future Directions in Information Access 2007 (FDIA), Glasgow, 28-29 August 2007
            Electronic Workshops in Computing (eWiC)

            1477-9358 BCS Learning & Development

            Self URI (article page): https://www.scienceopen.com/hosted-document?doi=10.14236/ewic/FDIA2007.9
            Self URI (journal page): https://ewic.bcs.org/
            Categories
            Electronic Workshops in Computing

            Applied computer science,Computer science,Security & Cryptology,Graphics & Multimedia design,General computer science,Human-computer-interaction
