Text mining approaches for dealing with the rapidly expanding literature on COVID-19

Abstract

More than 50 000 papers have been published about COVID-19 since the beginning of 2020 and several hundred new papers continue to be published every day. This incredible rate of scientific productivity leads to information overload, making it difficult for researchers, clinicians and public health officials to keep up with the latest findings. Automated text mining techniques for searching, reading and summarizing papers are helpful for addressing information overload. In this review, we describe the many resources that have been introduced to support text mining applications over the COVID-19 literature; specifically, we discuss the corpora, modeling resources, systems and shared tasks that have been introduced for COVID-19. We compile a list of 39 systems that provide functionality such as search, discovery, visualization and summarization over the COVID-19 literature. For each system, we provide a qualitative description and assessment of the system’s performance, unique data or user interface features and modeling decisions. Many systems focus on search and discovery, though several systems provide novel features, such as the ability to summarize findings over multiple documents or to link scientific articles and clinical trials. We also describe the public corpora, models and shared tasks that have been introduced to help reduce repeated effort among community members; some of these resources (especially shared tasks) can provide a basis for comparing the performance of different systems. Finally, we summarize promising results and open challenges for text mining the COVID-19 literature.

Related collections

Novel Coronavirus Disease COVID-19

Author and article information

Contributors

Lucy Lu Wang

Kyle Lo

Journal

Journal ID (nlm-ta): Brief Bioinform

Journal ID (iso-abbrev): Brief Bioinform

Journal ID (publisher-id): bib

Title: Briefings in Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1467-5463

ISSN (Electronic): 1477-4054

Publication date (Electronic): 07 December 2020

Electronic Location Identifier: bbaa296

Affiliations

The Allen Institute for Artificial Intelligence, Seattle, WA 98112, USA

Author notes

Corresponding author: Lucy Lu Wang, The Allen Institute for Artificial Intelligence, Seattle, WA 98112, USA. Fax: +1 443 824 9725; E-mail: lucyw@allenai.org, mail@llwang.net

Article

Publisher ID: bbaa296

DOI: 10.1093/bib/bbaa296

PMC ID: 7799291

PubMed ID: 33279995

SO-VID: 87cd7cc6-4080-43d7-8f24-14a29324690b

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 5 August 2020

Date revision received: 2 October 2020

Date accepted: 7 October 2020

Page count

Pages: 19

Custom metadata

article-lifecycle PAP

ScienceOpen disciplines: Bioinformatics & Computational biology

Keywords: covid-19, text mining, natural language processing, information retrieval, information extraction, question answering, summarization, shared tasks, cord-19
