
      An automated approach to identify sarcasm in low-resource language

      research-article


          Abstract

          Sarcasm detection has attracted attention for its applications in natural language processing (NLP), but it remains underexplored in low-resource languages such as Urdu, Arabic, Pashto, and Roman-Urdu: most existing work targets English, and few studies have addressed low-resource languages. This research addresses the gap by evaluating the efficacy of diverse machine learning (ML) algorithms for identifying sarcasm in Urdu. A key obstacle is the scarcity of annotated datasets for low-resource languages. To overcome it, we curated and released a comparatively large dataset, the Urdu Sarcastic Tweets (UST) Dataset, comprising user-generated comments from X (formerly Twitter). Automatic sarcasm detection uses computational methods to determine whether a given statement is intended to be sarcastic. The task is difficult because sarcasm is shaped by the user's behavior, attitude, and expression of emotions. To address this, we employ various baseline ML classifiers and evaluate their effectiveness at detecting sarcasm in low-resource languages. The primary models evaluated in this study are support vector machine (SVM), decision tree (DT), K-nearest neighbor (K-NN), linear regression (LR), random forest (RF), Naïve Bayes (NB), and XGBoost. We validated the performance of these classifiers on two distinct datasets, the Tanz-Indicator dataset and the UST dataset. The SVM classifier consistently outperformed the other ML models, achieving an accuracy of 0.85 across various experimental setups. This research underscores the importance of sarcasm detection approaches tailored to the specific linguistic characteristics of low-resource languages and paves the way for future investigations. By providing open access to the UST dataset, we encourage its use as a benchmark for sarcasm detection research in similar linguistic contexts.
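The abstract frames sarcasm detection as binary text classification and reports SVM as the strongest baseline. As a rough illustration only, the sketch below shows what a minimal linear SVM text classifier of that kind can look like. It is not the authors' pipeline: the whitespace tokenizer, the binary bag-of-words features, the Pegasos-style subgradient solver, the regularization strength `lam`, and the toy sentences are all assumptions chosen for illustration, and the sketch makes no attempt to reproduce the reported 0.85 accuracy.

```python
"""Hypothetical sketch: binary sarcasm classification with a linear SVM.

Assumptions (not taken from the paper): whitespace tokenization, binary
bag-of-words features, and a Pegasos-style stochastic subgradient solver
for the L2-regularized hinge loss. Labels are +1 (sarcastic) or -1 (not).
"""
import random
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: feature name -> value


def featurize(text: str) -> Vector:
    """Binary bag-of-words over lower-cased whitespace tokens, plus a bias."""
    feats: Vector = {tok: 1.0 for tok in text.lower().split()}
    feats["<bias>"] = 1.0
    return feats


def dot(w: Vector, x: Vector) -> float:
    """Sparse dot product; features missing from w count as 0."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Tuple[str, int]], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> Vector:
    """Minimize lam/2 * ||w||^2 + mean hinge loss with Pegasos updates."""
    rng = random.Random(seed)
    examples = [(featurize(text), label) for text, label in data]
    w: Vector = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = y * dot(w, x)    # margin under the current weights
            # Gradient of the regularizer: shrink every existing weight.
            scale = 1.0 - eta * lam
            for k in w:
                w[k] *= scale
            # The hinge-loss subgradient is non-zero only inside the margin.
            if margin < 1.0:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w: Vector, text: str) -> int:
    """Return +1 (sarcastic) or -1 (not sarcastic)."""
    return 1 if dot(w, featurize(text)) >= 0.0 else -1


def accuracy(w: Vector, data: List[Tuple[str, int]]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(w, text) == label for text, label in data)
    return correct / len(data)


if __name__ == "__main__":
    # Hypothetical English stand-ins for labeled tweets (not UST data).
    train = [
        ("oh great another monday meeting", 1),
        ("wow what a wonderful traffic jam", 1),
        ("the meeting is on monday at ten", -1),
        ("traffic is light on the highway today", -1),
    ]
    held_out = [("great another traffic jam", 1),
                ("the highway is light today", -1)]
    w = train_linear_svm(train)
    print("held-out accuracy:", accuracy(w, held_out))
```

In practice a library implementation (for example scikit-learn's `LinearSVC` over `TfidfVectorizer` features, compared against the other baselines by cross-validation) would normally replace the hand-written solver; the pure-Python version simply keeps the example dependency-free and makes the hinge-loss update explicit.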


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Writing – original draft
                Role: Conceptualization
                Role: Writing – review & editing
                Role: Formal analysis
                Roles: Methodology, Supervision
                Roles: Funding acquisition, Project administration
                Roles: Formal analysis, Funding acquisition
                Role: Editor
                Journal
                PLOS ONE (PLoS One), Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 5 December 2024
                Volume 19, Issue 12, e0307186
                Affiliations
                [1] Institute of CS & IT, University of Science & Technology, Bannu, Pakistan
                [2] Department of Computer Science, School of Physics, Engineering & Computer Science, University of Hertfordshire, Hatfield, United Kingdom
                [3] Department of Informatics and Computer Systems, King Khalid University, Abha, Saudi Arabia
                [4] Department of Computer Science, Al Ain University, Al Ain, UAE
                University of Kurdistan Hewler, Iraq
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-3783-0871
                Article
                Manuscript ID: PONE-D-24-04057
                DOI: 10.1371/journal.pone.0307186
                PMCID: PMC11620596
                PMID: 39637015
                © 2024 Khan et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 30 January 2024
                Accepted: 2 July 2024
                Page count
                Figures: 6, Tables: 8, Pages: 29
                Funding
                Funded by: Deanship of Research and Graduate Studies at King Khalid University
                Award ID: RGP2/455/45
                The authors declare that there are no competing financial interests. The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through a Large Research Project under grant number RGP2/455/45. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Language
                Biology and Life Sciences > Psychology > Cognitive Psychology > Language
                Social Sciences > Psychology > Cognitive Psychology > Language
                Social Sciences > Sociology > Communications > Social Communication > Social Media > Twitter
                Computer and Information Sciences > Network Analysis > Social Networks > Social Media > Twitter
                Social Sciences > Sociology > Social Networks > Social Media > Twitter
                Social Sciences > Sociology > Culture
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Support Vector Machines
                Social Sciences > Linguistics > Semantics
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Custom metadata
                All relevant data are within the paper.
