
      Keyword extraction: Issues and methods


          Abstract

          Due to the considerable growth of the volume of text documents on the Internet and in digital libraries, manual analysis of these documents is no longer feasible. Having efficient approaches to keyword extraction in order to retrieve the ‘key’ elements of the studied documents is now a necessity. Keyword extraction has been an active research field for many years, covering various applications in Text Mining, Information Retrieval, and Natural Language Processing, and meeting different requirements. However, it is not a unified domain of research. In spite of the existence of many approaches in the field, there is no single approach that effectively extracts keywords from different data sources. This shows the importance of having a comprehensive review, which discusses the complexity of the task and categorizes the main approaches of the field based on the features and methods of extraction that they use. This paper presents a general introduction to the field of keyword/keyphrase extraction. Unlike the existing surveys, different aspects of the problem along with the main challenges in the field are discussed. This mainly includes the unclear definition of ‘keyness’, complexities of targeting proper features for capturing desired keyness properties and selecting efficient extraction methods, and also the evaluation issues. By classifying a broad range of state-of-the-art approaches and analysing the benefits and drawbacks of different features and methods, we provide a clearer picture of them. This review is intended to help readers find their way around all the works related to keyword extraction and guide them in choosing or designing a method that is appropriate for the application they are targeting.


                Author and article information

                Journal
                Natural Language Engineering
                Nat. Lang. Eng.
                Cambridge University Press (CUP)
                ISSN: 1351-3249, 1469-8110
                May 2020
                November 11 2019
                Volume: 26
                Issue: 3
                Pages: 259-291
                Article
                DOI: 10.1017/S1351324919000457
                © 2020

                https://www.cambridge.org/core/terms
