      Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis

      Preprint

          Abstract

          We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e., for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a term as a vector of syntactic dependencies, which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering and with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of different measures for weighting the contribution of each attribute, as well as of a particular smoothing technique for coping with data sparseness.
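
          To make the FCA step in the abstract concrete, the following is a minimal sketch of how formal concepts can be derived from a term/attribute context. It is not the authors' implementation: the terms, the verb attributes, and the function names are hypothetical, the context is written by hand rather than extracted with a parser, and the brute-force enumeration is only practical for toy inputs. The conversion of the lattice into a concept hierarchy, attribute weighting, and smoothing are not shown.

```python
from itertools import combinations

# Hypothetical toy context: each term maps to the set of verbs it was
# (by assumption) observed with as a syntactic object.
context = {
    "hotel": {"book"},
    "apartment": {"book", "rent"},
    "car": {"book", "rent", "drive"},
    "bike": {"book", "rent", "drive", "ride"},
    "excursion": {"book", "join"},
}


def common_attributes(objects, context):
    """Return the attributes shared by all given objects.

    For an empty collection of objects this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Return all objects that carry every one of the given attributes."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Brute force: close every subset of objects. Exponential in the
    number of objects, so suitable only for small illustrations.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def subconcept_pairs(concepts):
    """Return (sub, super) pairs ordered by proper inclusion of extents."""
    return {(c, p) for c in concepts for p in concepts if c[0] < p[0]}


concepts = formal_concepts(context)
order = subconcept_pairs(concepts)

# List concepts from most general (largest extent) to most specific.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

          In this toy context, the terms that share the attributes book, rent, and drive form one concept, which sits below the more general concept of everything that can be booked. The subconcept order over such pairs is the lattice that the abstract refers to.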

                Author and article information

                Journal: Journal of Artificial Intelligence Research, Volume 24, pages 305-339, 2005
                Date: 09 September 2011
                DOI: 10.1613/jair.1648
                arXiv: 1109.2140
                Subject: cs.AI
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Source: jair.org
