Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning

Abstract

The growing volume of scientific literature in biological and biomedical research makes it difficult to curate newly discovered knowledge continuously and reliably, and automatic biomedical text mining has been one response to this challenge. In this paper, we aim to further improve the reliability of biomedical text mining by training a system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text mining, the flexibility of deep reinforcement learning, and the large body of knowledge collected in the Unified Medical Language System (UMLS) into an integrative artificial intelligence reader that can automatically identify authentic articles and effectively acquire the knowledge they convey. The system's current primary task is to build a genetic association database linking genes and complex human traits. Our contributions are three-fold: 1) We propose to improve the reliability of text mining by building a system that directly simulates the behavior of a researcher, and we develop corresponding methods, such as a bidirectional LSTM for text mining and a Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example of constructing a genetic association database. 3) We release our implementation as a generic framework so that researchers in the community can conveniently construct other databases.
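
The idea of learning to organize the query, select, and read behaviors can be illustrated with a toy example. The sketch below is not the authors' implementation: it swaps the Deep Q-Network for plain tabular Q-learning, never contacts PubMed, and does no text mining. The state names, action names, reward values, and hyperparameters are all assumptions chosen for illustration.

```python
import random

# Hypothetical action and state names; the paper's actual spaces are not given here.
ACTIONS = ["query", "select", "read"]
STATES = ["start", "has_results", "has_article"]  # non-terminal states
TERMINAL = len(STATES)  # index of the terminal "knowledge acquired" state


def step(state, action):
    """Toy dynamics: only the action whose index matches the state advances."""
    if action == state:
        next_state = state + 1
        # Assumed reward: 1.0 only when reading completes the workflow.
        reward = 1.0 if next_state == TERMINAL else 0.0
        return next_state, reward
    return state, -0.1  # assumed small penalty for a wasted step


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0, max_steps=50):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in STATES]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action)
            # The terminal state has no future value.
            future = 0.0 if next_state == TERMINAL else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if state == TERMINAL:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action name for each non-terminal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: row[a])] for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

On this toy problem the greedy policy should settle on querying first, then selecting, then reading. A real system would replace the lookup table with a neural network over a much richer state, which is where a Deep Q-Network comes in.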

Author and article information

Journal

Journal ID (nlm-journal-id): 9711271

Journal ID (pubmed-jr-id): 20660

Journal ID (nlm-ta): Pac Symp Biocomput

Journal ID (iso-abbrev): Pac Symp Biocomput

Title: Pacific Symposium on Biocomputing

ISSN (Print): 2335-6928

ISSN (Electronic): 2335-6936

Publication date Nihms-submitted: 7 December 2018

Publication date (Print): 2019

Publication date PMC-release: 14 March 2019

Volume: 24

Pages: 112-123

Affiliations

[1] Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA

[3] Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA

[5] Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA

[2] Chinese University of Hong Kong, Shenzhen, China

[4] Tsinghua University, Beijing, China

[6] Google AI, Pittsburgh, PA, USA

[7] Petuum Inc., Pittsburgh, PA, USA

Author notes

[‡] The work was done while the author was at CMU.

Article

Manuscript ID: NIHMS999772

PMC ID: 6417822

PubMed ID: 30864315

SO-VID: 8eab31db-2e08-476c-a97f-71366259e0d3

License:

Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
