      Impact analysis of keyword extraction using contextual word embedding


          Abstract

          A document’s keywords provide high-level descriptions of its content, summarizing the document’s central themes, concepts, ideas, or arguments. These descriptive phrases make it easier for algorithms to find relevant information quickly and efficiently. Keyword extraction plays a vital role in document processing tasks such as indexing, classification, clustering, and summarization. Traditional keyword extraction approaches rely mostly on the statistical distribution of key terms in a document. Recent advances show that contextual information is critical in determining the semantics of the task at hand, so context-based features may also benefit keyword extraction. For example, the context of a phrase can be described simply by the word that precedes or follows it. This research presents several experiments to validate that context-based keyword extraction yields significant gains over traditional methods, and the proposed KeyBERT-based methodology also produces improved results. The proposed work identifies a group of important words or phrases in the document’s content that reflect the authors’ main ideas, concepts, or arguments, using contextual word embeddings to extract the keywords. The findings are compared with those obtained using older approaches such as TextRank, RAKE, Gensim, YAKE, and TF-IDF. The Journal of Universal Computer Science (JUCS) dataset was used in our research. Keywords were produced from the abstracts alone, and the KeyBERT model outperformed the traditional approaches at producing keywords similar to those provided by the authors. The average similarity between our approach and the author-assigned keywords is 51%.
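
The statistical baseline mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the authors' pipeline: it is a plain-Python TF-IDF ranker, assuming a tiny hand-picked stop-word list and a smoothed IDF term, plus a hypothetical `overlap` helper that measures exact-match recall against author-assigned keywords. The abstract does not specify how the reported 51% similarity was computed, so `overlap` is only a stand-in for that comparison step.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_keywords(docs, index, top_k=3):
    """Rank the terms of docs[index] by TF-IDF against the whole collection."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tf = Counter(tokenized[index])
    total = sum(tf.values())
    # Smoothed IDF (an assumption): a term present in every document scores 0.
    scores = {
        term: (count / total) * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def overlap(extracted, author_keywords):
    """Fraction of author keywords recovered by exact, case-insensitive match."""
    author_set = {k.lower() for k in author_keywords}
    if not author_set:
        return 0.0
    extracted_set = {k.lower() for k in extracted}
    return len(author_set & extracted_set) / len(author_set)
```

Calling `tfidf_keywords(abstracts, i)` on a list of abstracts and passing the result to `overlap` together with the author keywords of abstract `i` mimics the abstract-only evaluation described above. A contextual approach would replace the TF-IDF scores with similarities between embeddings of candidate phrases and the document.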


                Author and article information

                Journal: PeerJ Computer Science (PeerJ Comput Sci)
                Publisher: PeerJ Inc. (San Diego, USA)
                ISSN: 2376-5992
                Published: 30 May 2022
                Volume: 8
                Article number: e967
                Affiliations
                [1] Institute of Computing, Kohat University of Science & Technology, Kohat, Pakistan
                [2] Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
                [3] Department of Computer Science, College of Computer in Al-Leith, Umm Al-Qura University, Makkah, Saudi Arabia
                [4] College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia
                Article
                cs-967
                DOI: 10.7717/peerj-cs.967
                9202614
                35721401
                3bd0a9c1-70d0-4775-ba47-26bcaf1d3c6a
                ©2022 Khan et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

                History
                Received: 11 February 2022
                Accepted: 8 April 2022
                Funding
                Funded by: Taif University Researchers Supporting Project number (TURSP-2020/231), Taif University, Taif, Saudi Arabia
                This research was supported by Taif University Researchers Supporting Project number (TURSP-2020/231), Taif University, Taif, Saudi Arabia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Artificial Intelligence
                Data Mining and Machine Learning
                Emerging Technologies

                Keywords: text rank, yake, tf-idf, keyword extraction, contextual word embedding
