
      Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying Twitter data

      research-article


          Abstract

          Text categorization and sentiment analysis are two of the most typical natural language processing tasks, with various emerging applications implemented and utilized in different domains, such as health care and policy making. At the same time, the tremendous growth in the popularity and usage of social media, such as Twitter, has resulted in an immense increase in user-generated data, mainly represented by the texts of users’ posts. However, analyzing these data and extracting actionable knowledge and added value from them is a challenging task due to their domain diversity and high multilingualism, which highlights the emerging need for domain-agnostic and multilingual solutions. To investigate a portion of these challenges, this research work performs a comparative analysis of multilingual approaches for classifying both the sentiment and the text of an examined multilingual corpus. In this context, four multilingual BERT-based classifiers and a zero-shot classification approach are utilized and compared in terms of their accuracy and applicability in the classification of multilingual data. Their comparison has unveiled insightful outcomes with a twofold interpretation. Multilingual BERT-based classifiers achieve high performance and transfer inference when trained and fine-tuned on multilingual data. The zero-shot approach, in turn, presents a novel technique for creating multilingual solutions in a faster, more efficient, and scalable way: it can easily be fitted to new languages and new tasks while achieving relatively good results across many languages. However, when efficiency and scalability are less important than accuracy, this model, and zero-shot models in general, cannot match fine-tuned and trained multilingual BERT-based classifiers.
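          As a rough illustration of the trade-off described in the abstract, the zero-shot route can be sketched as classification by similarity to label embeddings, with no task-specific training. Everything below is a toy stand-in: in the paper's setting a multilingual BERT-style encoder would supply the text and label vectors, whereas here a few hand-made two-dimensional vectors are used purely for illustration.

```python
import math

# Toy word vectors standing in for a multilingual sentence encoder
# (assumption: in practice these would come from a BERT-style model).
EMBEDDINGS = {
    "good": [0.9, 0.1], "great": [0.85, 0.2],
    "bad": [0.1, 0.9], "terrible": [0.05, 0.95],
}

# Label descriptions embedded in the same space; no labelled training data
# for the task is used, which is what makes the approach "zero-shot".
LABEL_EMBEDDINGS = {"positive": [1.0, 0.0], "negative": [0.0, 1.0]}

def embed(text):
    # Average the toy word vectors; unknown words are ignored.
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def zero_shot_classify(text, label_embeddings):
    # Pick the label whose embedding lies closest to the text embedding.
    v = embed(text)
    return max(label_embeddings, key=lambda lbl: cosine(v, label_embeddings[lbl]))
```

          A fine-tuned multilingual BERT classifier, by contrast, learns the text-to-label mapping from labelled multilingual data, which is why it tends to win on accuracy when training cost is acceptable.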


          Most cited references (11)


          Word2Vec

          My last column ended with some comments about Kuhn and word2vec. Word2vec has racked up plenty of citations because it satisfies both of Kuhn’s conditions for emerging trends: (1) a few initial (promising, if not convincing) successes that motivate early adopters (students) to do more, and (2) plenty of room left for early adopters to contribute and benefit by doing so. The fact that Google has so much to say on ‘How does word2vec work’ makes it clear that the definitive answer to that question has yet to be written. It also helps citation counts to distribute code and data, making it that much easier for the next generation to take advantage of the opportunities (and cite your work in the process).

            Pre-Training With Whole Word Masking for Chinese BERT


              A Survey of Zero-Shot Learning: Settings, Methods, and Applications

              Most machine-learning methods focus on classifying instances whose classes have already been seen in training. In practice, many applications require classifying instances whose classes have not been seen previously. Zero-shot learning is a powerful and promising learning paradigm, in which the classes covered by training instances and the classes we aim to classify are disjoint. In this paper, we provide a comprehensive survey of zero-shot learning. First of all, we provide an overview of zero-shot learning. According to the data utilized in model optimization, we classify zero-shot learning into three learning settings. Second, we describe different semantic spaces adopted in existing zero-shot learning works. Third, we categorize existing zero-shot learning methods and introduce representative methods under each category. Fourth, we discuss different applications of zero-shot learning. Finally, we highlight promising future research directions of zero-shot learning.
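              The disjoint-classes setting the survey describes can be sketched with a minimal attribute-based example: unseen classes are described only by vectors in a semantic (attribute) space, and an instance is assigned to the unseen class whose description it matches best. The class names and attributes below are purely illustrative assumptions, not from the survey.

```python
# Unseen classes described by attribute vectors -- a simple instance of the
# "semantic space" the survey discusses. No labelled examples of these
# classes were seen during training; only their attribute descriptions exist.
CLASS_ATTRIBUTES = {
    "zebra": {"stripes": 1, "hooves": 1, "flies": 0},
    "eagle": {"stripes": 0, "hooves": 0, "flies": 1},
}

def predict_unseen(instance_attributes, class_attributes):
    # Assign the instance to the unseen class whose attribute description
    # agrees with the instance's (predicted) attributes on the most fields.
    def score(cls):
        attrs = class_attributes[cls]
        return sum(1 for k in attrs if attrs[k] == instance_attributes.get(k))
    return max(class_attributes, key=score)
```

              In a real system the instance attributes would themselves be predicted from the input (e.g. an image or a text) by models trained on the seen classes; the hand-written dictionaries here only illustrate the matching step.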

                Author and article information

                Contributors
                gmanias@unipi.gr
                margy@unipi.gr
                kiourtis@unipi.gr
                simvoul@unipi.gr
                dimos@unipi.gr
                Journal
                Neural Comput Appl
                Neural Computing & Applications
                Springer London (London)
                0941-0643
                1433-3058
                8 May 2023
                1-17
                Affiliations
                GRID grid.4463.5, ISNI 0000 0001 0558 8585, University of Piraeus, Piraeus, Greece
                Author information
                http://orcid.org/0000-0003-0128-2022
                Article
                8629
                10.1007/s00521-023-08629-3
                10165589
                68623323-0554-4b1a-99dd-0ad7b8250ade
                © The Author(s) 2023

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                2 January 2023
                24 April 2023
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100010686, H2020 European Institute of Innovation and Technology;
                Award ID: 870675
                Award Recipient :
                Funded by: University of Piraeus
                Categories
                S.I.: Technologies of the 4th Industrial Revolution with applications

                Neural & Evolutionary computing
                multilingual classifiers, transfer learning, zero-shot classification, transformers
