      Is Open Access

      Disentangling the Wikipedia Category Graph for Corpus Extraction

      research-article


          Abstract

          In several areas of research, such as knowledge management and natural language processing, domain-specific corpora are required for tasks such as terminology extraction and ontology learning. The investigations presented herein are based on the assumption that Wikipedia can be used for corpus extraction. Wikipedia has the advantage of possessing a semantic layer, which should ease the extraction of domain-specific corpora. Yet, because the Wikipedia category graph is scale-free, it cannot be used as is for this purpose. In this paper, we propose a novel approach to graph clustering called BorderFlow, which we apply to and evaluate on the Wikipedia category graph. Additional possible applications of these results in the area of information retrieval are presented.
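The abstract names local graph clustering but does not describe how BorderFlow works, so the following is only a generic sketch of the idea, not the authors' algorithm. It grows a cluster greedily from a seed node and stops when a score stops improving. The score (`border_ratio`), the helper names, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency dict from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def border_ratio(graph, cluster):
    """Hypothetical score: edge weight from border nodes into the cluster,
    divided by edge weight from border nodes to nodes outside the cluster."""
    inward = 0.0
    outward = 0.0
    for node in cluster:
        out_w = sum(w for nb, w in graph[node].items() if nb not in cluster)
        if out_w > 0:  # the node has an outside neighbour, so it is on the border
            outward += out_w
            inward += sum(w for nb, w in graph[node].items() if nb in cluster)
    return float("inf") if outward == 0 else inward / outward


def grow_cluster(graph, seed):
    """Greedily add the neighbour that most improves the score; stop otherwise."""
    cluster = {seed}
    while True:
        neighbours = {nb for n in cluster for nb in graph[n]} - cluster
        if not neighbours:
            return cluster
        current = border_ratio(graph, cluster)
        # sorted() makes tie-breaking deterministic
        best = max(sorted(neighbours),
                   key=lambda c: border_ratio(graph, cluster | {c}))
        if border_ratio(graph, cluster | {best}) <= current:
            return cluster
        cluster.add(best)


# Toy graph (assumed data): two triangles joined by a single bridge edge C-D.
toy_edges = [
    ("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0),
    ("C", "D", 1.0),
    ("D", "E", 1.0), ("E", "F", 1.0), ("D", "F", 1.0),
]
toy_graph = build_graph(toy_edges)
cluster_a = grow_cluster(toy_graph, "A")
cluster_d = grow_cluster(toy_graph, "D")
```

On this toy graph, seeding at `"A"` or `"D"` recovers the triangle containing the seed, because absorbing the node across the bridge would lower the score. A local method like this only inspects the neighbourhood of the growing cluster, which is one reason such approaches are attractive on large graphs with hub nodes.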


                Author and article information

                Journal
                Polibits
                Instituto Politécnico Nacional, Centro de Innovación y Desarrollo Tecnológico en Cómputo (México, DF, Mexico)
                1870-9044
                June 2009
                : 39
                : 5-10
                Affiliations
                [01] Department of Computer Science, University of Leipzig, Leipzig, Germany; ngonga@informatik.uni-leipzig.de
                Article
                S1870-90442009000100002 S1870-9044(09)00003900002
                34db6ace-49c5-4d38-9a3d-fd30618dbd3e

                This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

                History
                : 05 February 2009
                : 20 March 2009
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 13, Pages: 6
                Product

                SciELO Mexico


                Natural language processing, corpus extraction, local graph clustering
