Impact analysis of keyword extraction using contextual word embedding


Abstract

A document’s keywords provide high-level descriptions of its content, summarizing its central themes, concepts, ideas, or arguments. These descriptive phrases help algorithms find relevant information quickly and efficiently. Keyword extraction plays a vital role in document processing tasks such as indexing, classification, clustering, and summarization. Traditional keyword extraction approaches rely mostly on the statistical distribution of key terms in a document. Recent advances suggest that contextual information is critical in determining the semantics of a text, so context-based features may also benefit keyword extraction. For example, the context of a phrase can be described simply by the word that precedes or follows it. This research presents several experiments to validate that context-based keyword extraction offers a significant gain over traditional methods. The proposed KeyBERT-based methodology also yields improved results. It identifies a group of important words or phrases in the document’s content that reflect the authors’ main ideas, concepts, or arguments, and it uses contextual word embeddings to extract them. The findings are compared with those obtained using older approaches: TextRank, RAKE, Gensim, YAKE, and TF-IDF. The Journal of Universal Computer Science (JUCS) dataset was used in our research. Keywords for each research article were produced from its abstract alone, and the KeyBERT model outperformed the traditional approaches in producing keywords similar to those provided by the authors. The average similarity between the keywords from our approach and the author-assigned keywords is 51%.
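The embedding-based ranking described in the abstract can be illustrated with a minimal sketch: embed the document and each candidate phrase, then rank candidates by cosine similarity to the document. This is not the authors' implementation or the KeyBERT library. The names `toy_embed`, `candidate_phrases`, and `extract_keywords` are hypothetical, and `toy_embed` is a bag-of-words stand-in for a real contextual encoder such as BERT, assumed here only so the example runs with the standard library.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not from the article).
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for",
              "on", "is", "are", "that", "it", "as", "with", "by", "from"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def toy_embed(text):
    """Stand-in for a contextual encoder: a sparse bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_phrases(text, max_n=2):
    """Unique n-grams (n = 1..max_n) over the stop-word-filtered tokens.

    Because stop words are removed first, an n-gram may bridge a removed word.
    """
    tokens = tokenize(text)
    phrases = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrases.append(" ".join(tokens[i:i + n]))
    return list(dict.fromkeys(phrases))  # de-duplicate, keep first-seen order


def extract_keywords(text, embed=toy_embed, top_n=5):
    """Rank candidate phrases by similarity to the whole-document vector."""
    doc_vec = embed(text)
    scored = [(p, cosine(embed(p), doc_vec)) for p in candidate_phrases(text)]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    abstract = ("Keyword extraction with contextual embeddings. "
                "Keyword extraction ranks candidate phrases.")
    for phrase, score in extract_keywords(abstract, top_n=3):
        print(f"{score:.3f}  {phrase}")
```

Passing a different `embed` function that returns a dict-like sparse vector swaps in another representation. With a real sentence encoder the vectors would be dense and contextual, which is the property the abstract credits for the improvement over frequency-based baselines.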

Author and article information

Contributors

Abdul Shahid

Journal

Journal ID (nlm-ta): PeerJ Comput Sci

Journal ID (iso-abbrev): PeerJ Comput Sci

Journal ID (publisher-id): peerj-cs

Title: PeerJ Computer Science

Publisher: PeerJ Inc. (San Diego, USA)

ISSN (Electronic): 2376-5992

Publication date (Electronic): 30 May 2022

Publication date Collection: 2022

Volume: 8

Electronic Location Identifier: e967

Affiliations

[1] Institute of Computing, Kohat University of Science & Technology, Kohat, Pakistan

[2] Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia

[3] Department of Computer Science, College of Computer in Al-Leith, Umm Al-Qura University, Makkah, Saudi Arabia

[4] College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia

Article

Publisher ID: cs-967

DOI: 10.7717/peerj-cs.967

PMC ID: 9202614

PubMed ID: 35721401

SO-VID: 3bd0a9c1-70d0-4775-ba47-26bcaf1d3c6a

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

History

Date received: 11 February 2022

Date accepted: 8 April 2022

Funding

Funded by: Taif University Researchers Supporting Project number (TURSP-2020/231), Taif University, Taif, Saudi Arabia

This research was supported by Taif University Researchers Supporting Project number (TURSP-2020/231), Taif University, Taif, Saudi Arabia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
