      Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning

      research-article


          Abstract

          The growing volume of scientific literature in biological and biomedical research makes continuous, reliable curation of newly discovered knowledge a challenge, and automatic biomedical text-mining has been one of the answers to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training a system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrated artificially intelligent reader that can automatically identify authentic articles and effectively acquire the knowledge they convey. The system's current primary task is to build a database of genetic associations between genes and complex human traits. Our contributions are three-fold: 1) We propose to improve the reliability of text-mining by building a system that directly simulates the behavior of a researcher, and we develop corresponding methods, such as a bi-directional LSTM for text mining and a Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example of constructing a genetic association database. 3) We release our implementation as a generic framework that researchers in the community can use to conveniently construct other databases.
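          To make the idea of "organizing behaviors" with reinforcement learning concrete, here is a minimal sketch of an agent learning the order query, select, read. It is not the authors' implementation: it swaps the Deep Q-Network for plain tabular Q-learning, and the state names, rewards, and hyperparameters are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy workflow: names and rewards are illustrative assumptions,
# not taken from the paper's released framework.
ACTIONS = ["query", "select", "read"]
STATES = ["start", "queried", "selected"]  # "done" is the terminal state

# The only action that makes progress from each state.
NEXT_STATE = {
    ("start", "query"): "queried",
    ("queried", "select"): "selected",
    ("selected", "read"): "done",
}


def step(state, action):
    """Return (next_state, reward) for one action in the toy workflow."""
    nxt = NEXT_STATE.get((state, action))
    if nxt is None:
        # Out-of-order action: small penalty and no progress.
        return state, -0.1
    # Knowledge is only acquired by reading a selected article.
    return nxt, (1.0 if nxt == "done" else 0.0)


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "start"
        for _ in range(20):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action)
            # The terminal state contributes no future value.
            future = 0.0 if nxt == "done" else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
            if state == "done":
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

          Under these assumptions the learned greedy policy queries first, then selects, then reads. A Deep Q-Network replaces the lookup table `q` with a neural network so that the same update can generalize over large state spaces, such as text-derived features of candidate articles.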


                Author and article information

                Journal
                9711271
                20660
                Pac Symp Biocomput
                Pacific Symposium on Biocomputing
                2335-6928
                2335-6936
                7 December 2018
                2019
                14 March 2019
                Volume: 24
                Pages: 112-123
                Affiliations
                [1] Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
                [2] Chinese University of Hong Kong, Shenzhen, China
                [3] Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
                [4] Tsinghua University, Beijing, China
                [5] Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
                [6] Google AI, Pittsburgh, PA, USA
                [7] Petuum Inc., Pittsburgh, PA, USA
                Author notes
                [‡] This work was done while the author was at CMU.

                Article
                NIHMS999772
                6417822
                30864315
                8eab31db-2e08-476c-a97f-71366259e0d3

                Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.

                Categories
                Article

                Keywords: biomedical text-mining, deep reinforcement learning, genetic association
