      Reaching for upper bound ROUGE score of extractive summarization methods

      research-article

          Abstract

          Extractive text summarization (ETS) methods automatically find the salient information in a text by selecting exact sentences from the source text. In this article, we ask what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) a Greedy algorithm, (3) VNS initialized with the Greedy algorithm results, (4) a genetic algorithm, and (5) a genetic algorithm initialized with the Greedy algorithm results. We ran experiments on articles from the arXiv dataset. The best of the tested approaches, the genetic algorithm initialized with the Greedy algorithm results, achieved ROUGE-1 and ROUGE-2 scores of 0.59 and 0.25, respectively. These scores are higher than those obtained by current state-of-the-art text summarization models: the best ROUGE-1 score reported in the literature on the same dataset is 0.46. Therefore, there is room for the development of ETS methods, which are now undeservedly forgotten.
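          The abstract frames extractive summarization as a search for the subset of source sentences that maximizes ROUGE-1 against a reference summary. The Python sketch below illustrates two of the five approaches listed above: a greedy oracle (approach 2) and a genetic algorithm seeded with the greedy result (approach 5). It is not the authors' implementation. The tokenizer (lowercased `\w+` tokens, no stemming), the use of ROUGE-1 F1 as the fitness, the absence of a summary-length limit, and the GA settings (`pop_size`, `generations`, `p_mut`, one-point crossover, keep-the-best-half elitism) are all assumptions, and every function name is hypothetical.

```python
import random
import re
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Mask = List[int]  # 1 = sentence is in the summary, 0 = it is not


def tokenize(text: str) -> List[str]:
    # Simplified tokenizer (assumption): lowercased word characters, no stemming.
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-1 F1 from the clipped unigram overlap of candidate and reference."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def mask_score(mask: Mask, sents: List[List[str]], ref: List[str]) -> float:
    # Concatenate the selected sentences in document order and score them.
    cand = [tok for bit, toks in zip(mask, sents) if bit for tok in toks]
    return rouge1_f1(cand, ref)


def greedy_select(sents: List[List[str]], ref: List[str]) -> Tuple[Mask, float]:
    """Repeatedly add the sentence that most improves ROUGE-1; stop when none does."""
    mask, best = [0] * len(sents), 0.0
    while True:
        trials = []
        for i, bit in enumerate(mask):
            if not bit:
                trial = mask.copy()
                trial[i] = 1
                trials.append((mask_score(trial, sents, ref), i))
        if not trials:
            break
        score, i = max(trials)
        if score <= best:
            break
        mask[i], best = 1, score
    return mask, best


def genetic_select(sents: List[List[str]], ref: List[str], seed_mask: Mask,
                   pop_size: int = 20, generations: int = 50, p_mut: float = 0.1,
                   rng: Optional[random.Random] = None) -> Tuple[Mask, float]:
    """Evolve sentence masks from a population that contains the greedy seed.

    Assumes pop_size >= 4 so that the elite half holds at least two parents.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    n = len(seed_mask)

    def fitness(m: Mask) -> float:
        return mask_score(m, sents, ref)

    pop = [list(seed_mask)] + [[rng.randint(0, 1) for _ in range(n)]
                               for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit flips
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


# Toy usage (hypothetical data): pick sentences that best match the reference.
doc = [
    "The cat sat on the mat.",
    "Stock prices fell sharply today.",
    "A cat was sitting on a mat.",
]
sents = [tokenize(s) for s in doc]
ref = tokenize("The cat sat on a mat.")
g_mask, g_score = greedy_select(sents, ref)
ga_mask, ga_score = genetic_select(sents, ref, g_mask)
print(g_mask, round(g_score, 3))  # [1, 0, 0] 0.833
print(ga_mask, ga_score >= g_score)  # [1, 0, 0] True
```

          In this sketch the greedy mask enters the initial population and the best half of each generation is carried over unchanged, so the genetic search cannot return a lower score than its greedy seed. Because both searches score candidates against the reference summary, the resulting values are upper-bound (oracle) scores rather than the output of a summarizer that could run without a reference.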

                Author and article information

                Journal
                PeerJ Computer Science (PeerJ Comput Sci; peerj-cs)
                PeerJ Inc. (San Diego, USA)
                ISSN: 2376-5992
                Published: 26 September 2022
                Volume 8, article e1103
                Affiliations
                [1] Kazakh-British Technical University, Almaty, Almaty, Kazakhstan
                [2] Institute of Information and Computational Technologies, Almaty, Almaty, Kazakhstan
                [3] Instituto Politecnico Nacional, Mexico, Mexico
                Author information
                http://orcid.org/0000-0002-3221-9352
                http://orcid.org/0000-0001-7283-5144
                http://orcid.org/0000-0001-7845-9039
                Article
                cs-1103
                DOI: 10.7717/peerj-cs.1103
                9575858
                36262160
                b4a2fb36-2a84-4b33-ac7c-4822f709cbf7
                © 2022 Akhmetov et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

                History
                Received: 12 May 2022
                Accepted: 24 August 2022
                Funding
                Funded by: Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan
                Award ID: AP09058174
                Funded by: CONACYT, Mexico
                Award ID: A1-S-47854
                Funded by: Secretaria de Investigación y Posgrado of the Instituto Politecnico Nacional, Mexico
                Award ID: 20211784, 20211884, and 20211178
                This research was conducted under grant AP09058174 from the Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan, as part of the project “Development of language-independent unsupervised semantic analysis methods large amounts of text data”. The work was also supported by the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20211784, 20211884, and 20211178 of the Secretaria de Investigación y Posgrado of the Instituto Politecnico Nacional, Mexico. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Artificial Intelligence
                Data Science
                Natural Language and Speech
                Optimization Theory and Computation
                Text Mining

                text summarization, genetic algorithm, greedy algorithm, variable neighborhood search, ROUGE
