Reaching for upper bound ROUGE score of extractive summarization methods

Abstract

Extractive text summarization (ETS) methods find the salient information in a text automatically by selecting exact sentences from the source text. In this article, we ask what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) an adapted reduced variable neighborhood search (RVNS), (2) a greedy algorithm, (3) VNS initialized by the greedy algorithm's results, (4) a genetic algorithm, and (5) a genetic algorithm initialized by the greedy algorithm's results. We ran experiments on articles from the arXiv dataset. The genetic algorithm initialized by the greedy algorithm's results yielded the best results of the tested approaches, reaching scores of 0.59 for ROUGE-1 and 0.25 for ROUGE-2. These scores are higher than those obtained by current state-of-the-art text summarization models: the best ROUGE-1 score reported in the literature on the same dataset is 0.46. Therefore, there is room for further development of ETS methods, which are now undeservedly forgotten.
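The greedy approach named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (tokenize, rouge1_f1, greedy_extract), the whitespace tokenization, and the use of the F1 variant of ROUGE-1 are all assumptions made here for illustration. Starting from an empty summary, the sketch repeatedly adds the source sentence that most improves ROUGE-1 against the reference summary and stops when no sentence improves the score or a sentence limit is reached.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real ROUGE tooling
    # usually applies more careful tokenization and optional stemming.
    return text.lower().split()


def rouge1_f1(candidate_tokens, reference_tokens):
    # Unigram overlap with clipped counts, combined into an F1 score.
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_extract(sentences, reference, max_sentences):
    # Hypothetical helper: greedily pick sentence indices that raise ROUGE-1.
    ref_tokens = tokenize(reference)
    selected = []
    best_score = 0.0
    while len(selected) < max_sentences:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Score the candidate summary with sentences kept in source order.
            candidate = [
                tok
                for j in sorted(selected + [i])
                for tok in tokenize(sentences[j])
            ]
            score = rouge1_f1(candidate, ref_tokens)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            break  # no remaining sentence improves the score
        selected.append(best_idx)
    return sorted(selected), best_score


if __name__ == "__main__":
    source = ["the cat sat on the mat", "dogs bark loudly", "a cat and a mat"]
    print(greedy_extract(source, "the cat sat on the mat", max_sentences=2))
```

Because each step only accepts a strict improvement, the result is a local optimum. A greedy solution of this kind is the sort of starting point that approaches (3) and (5) in the abstract pass to VNS or a genetic algorithm for further refinement.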

Author and article information

Contributors

Alexander Gelbukh:

ORCID: http://orcid.org/0000-0001-7845-9039

Journal

Journal ID (nlm-ta): PeerJ Comput Sci

Journal ID (iso-abbrev): PeerJ Comput Sci

Journal ID (publisher-id): peerj-cs

Title: PeerJ Computer Science

Publisher: PeerJ Inc. (San Diego, USA)

ISSN (Electronic): 2376-5992

Publication date (Electronic): 26 September 2022

Publication date Collection: 2022

Volume: 8

Electronic Location Identifier: e1103

Affiliations

[1] Kazakh-British Technical University, Almaty, Almaty, Kazakhstan

[2] Institute of Information and Computational Technologies, Almaty, Almaty, Kazakhstan

[3] Instituto Politecnico Nacional, Mexico, Mexico

Author information

Iskander Akhmetov http://orcid.org/0000-0002-3221-9352

Rustam Mussabayev http://orcid.org/0000-0001-7283-5144

Alexander Gelbukh http://orcid.org/0000-0001-7845-9039

Article

Publisher ID: cs-1103

DOI: 10.7717/peerj-cs.1103

PMC ID: 9575858

PubMed ID: 36262160

SO-VID: b4a2fb36-2a84-4b33-ac7c-4822f709cbf7

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

History

Date received : 12 May 2022

Date accepted : 24 August 2022

Funding

Funded by: Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan

Award ID: AP09058174

Funded by: CONACYT, Mexico

Award ID: A1-S-47854

Funded by: Secretaria de Investigación y Posgrado of the Instituto Politecnico Nacional, Mexico

Award ID: 20211784, 20211884, and 20211178

This research was conducted under grant number AP09058174 of the Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan, as part of the project “Development of language-independent unsupervised semantic analysis methods large amounts of text data”. The work was also supported by the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20211784, 20211884, and 20211178 of the Secretaria de Investigación y Posgrado of the Instituto Politecnico Nacional, Mexico. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
