      Advanced Information Retrieval from Web Pages

      BCS IRSG Symposium: Future Directions in Information Access 2007 (FDIA)

      Future Directions in Information Access

      28-29 August 2007

      web information retrieval, information extraction from web

          Abstract

          This work proposes a lightweight, web-based algorithm that runs at near real-time speed. It retrieves the main parts (menu, main text, header and footer) of a randomly selected web page, drawing on the whole page, including CSS, JavaScript, frames, layers, images, etc., for retrieval. The proposal also discusses the shortcomings of well-known modern algorithms for content retrieval from web pages. The algorithm is useful for improving existing engines for searching, content matching, summarization, web graph calculation, and similar tasks. It is also practical as a data provider for classification and data mining. Experimental results from a PHP implementation of the algorithm showed near real-time speed, a 20-25% error rate in the multipurpose mode, and an error rate below 1% in the specific mode.

                Author and article information

                Contributors
                Conference
                August 2007
                Pages: 1-6
                Affiliations
                Tallinn University of Technology

                Ehitajate tee 5,

                19086 Tallinn, Estonia
                Article
                10.14236/ewic/FDIA2007.12
                © A. Vedeshin. Published by BCS Learning and Development Ltd. BCS IRSG Symposium: Future Directions in Information Access 2007, Glasgow

                This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

                Electronic Workshops in Computing (eWiC)
                Product
                Product Information: 1477-9358, BCS Learning & Development
                Self URI (journal page): https://ewic.bcs.org/
                Categories
                Electronic Workshops in Computing
