      Is Open Access

      Advanced Information Retrieval from Web Pages

      proceedings-article

      BCS IRSG Symposium: Future Directions in Information Access 2007 (FDIA)

      Future Directions in Information Access

      28-29 August 2007

      web information retrieval, information extraction from web

            Abstract

            This work proposes a lightweight, web-based algorithm that runs at near real-time speed. It retrieves the main parts (menu, main text, header and footer) of a randomly selected web page, making full use of CSS, JavaScript, frames, layers, images, etc. for retrieval. The proposal also discusses the shortcomings of well-known modern algorithms for content retrieval from web pages. The algorithm is useful for improving existing engines for searching, content matching, summarization, web graph calculation, and so on. It is also practical as a data provider for classification and data mining. Experimental results from a PHP implementation of the algorithm showed near real-time speed, a 20-25% error rate in the multipurpose mode, and an error rate below 1% in the specific mode.

            Author and article information

            Contributors
            Conference
            August 2007
            Pages: 1-6
            Affiliations
            Tallinn University of Technology

            Ehitajate tee 5,

            19086 Tallinn, Estonia
            Article
            DOI: 10.14236/ewic/FDIA2007.12
            © A. Vedeshin. Published by BCS Learning and Development Ltd. BCS IRSG Symposium: Future Directions in Information Access 2007, Glasgow

            This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

            Electronic Workshops in Computing (eWiC)
            Product
            Product Information: ISSN 1477-9358, BCS Learning & Development
            Self URI (journal page): https://ewic.bcs.org/
            Categories
            Electronic Workshops in Computing
