
      On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach


          Abstract

DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, DNA methylation and many other biological functions in both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods often devote themselves to extracting physicochemical features from sequences while ignoring motif information and the positional relationships between motifs. Meanwhile, small training datasets and noisy training data lead to predictions of low accuracy and reliability. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It uses two stages of convolutional neural networks to detect the functional domains of protein sequences, a long short-term memory neural network to identify their long-term dependencies, and binary cross-entropy to evaluate the quality of the neural networks. When the proposed method is tested on a realistic DNA-binding protein dataset, it achieves a prediction accuracy of 94.2% at a Matthews correlation coefficient of 0.961. Compared with LibSVM on the Arabidopsis and yeast datasets in independent tests, the accuracy rises by 9% and 4%, respectively. Comparative experiments using different feature extraction methods show that our model achieves accuracy similar to the best of the others, while its sensitivity, specificity and AUC increase by 27.83%, 1.31% and 16.21%, respectively. These results suggest that our method is a promising tool for identifying DNA-binding proteins.
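The pipeline the abstract describes — convolutional motif detectors over the raw sequence, a recurrent stage on top, and a binary cross-entropy objective — can be sketched in miniature. The NumPy toy below is illustrative only, not the authors' implementation: the kernel width, the random weights, and the linear read-out standing in for the LSTM stage are all assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def one_hot(seq):
    """Encode a protein sequence as a (length, 20) one-hot matrix."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    m = np.zeros((len(seq), 20))
    for j, aa in enumerate(seq):
        m[j, idx[aa]] = 1.0
    return m

def conv1d(x, kernels):
    """Valid 1-D convolution: x is (L, 20), kernels is (K, w, 20).
    Returns (L - w + 1, K) feature maps (motif-detector responses)."""
    K, w, _ = kernels.shape
    L = x.shape[0]
    out = np.zeros((L - w + 1, K))
    for t in range(L - w + 1):
        window = x[t:t + w]  # (w, 20) slice of the sequence
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def binary_cross_entropy(y_true, p):
    """Loss for a single binary label given predicted probability p."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

rng = np.random.default_rng(0)
x = one_hot("MKVLAADKGHTWY")                                   # toy sequence
feats = np.maximum(conv1d(x, rng.normal(size=(8, 5, 20))), 0)  # ReLU
pooled = feats.max(axis=0)               # global max pooling over positions
logit = pooled @ rng.normal(size=8)      # linear read-out (LSTM stand-in)
prob = 1.0 / (1.0 + np.exp(-logit))      # sigmoid: P(DNA-binding)
loss = binary_cross_entropy(1.0, prob)
```

In the paper's design the pooled convolutional features would instead be fed through a second convolutional stage and an LSTM before the sigmoid output; the loss function is the same binary cross-entropy shown here.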

Most cited references (30)


          Speech Recognition with Deep Recurrent Neural Networks

Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However, RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
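The long-range memory this abstract credits comes from the LSTM cell's gated cell state. A single forward step can be sketched in NumPy; the stacked-gate layout and the toy dimensions below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. x: input (d,), h: hidden (n,), c: cell state (n,).
    W: (4n, d), U: (4n, n), b: (4n,) hold the input/forget/output/candidate
    gate parameters stacked row-wise."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:n]))        # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))     # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))   # output gate
    g = np.tanh(z[3*n:])                # candidate cell state
    c_new = f * c + i * g               # cell state carries long-range context
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(1)
d, n = 3, 4                             # toy input and hidden sizes
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(6, d)):       # run over a toy 6-step sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

Stacking such layers — feeding the hidden state of one layer as the input of the next at every time step — gives the "deep" recurrent networks the paper investigates.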

            Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

            Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
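The core operation is simple to state: normalize each feature over the mini-batch, then apply a learnable scale and shift. A minimal NumPy sketch of the training-time behaviour (a real layer also keeps running statistics for use at inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.
    x: (batch, features); gamma, beta: learnable (features,) parameters."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(2)
x = 5.0 + 3.0 * rng.normal(size=(64, 8))   # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

Because the normalized activations always have roughly zero mean and unit variance regardless of how the previous layers' parameters drift, each layer sees a stable input distribution — the internal covariate shift the abstract describes is removed by construction.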

              Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy


                Author and article information

                Contributors
Roles: Methodology, Software
Roles: Writing – original draft, Writing – review & editing
Roles: Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing
Roles: Data curation, Writing – review & editing
Roles: Data curation, Software
Role: Editor
Journal
PLoS ONE (Public Library of Science, San Francisco, CA, USA)
ISSN: 1932-6203
Published: 29 December 2017
Volume 12, Issue 12: e0188129
                Affiliations
                [1 ] School of Computer Science and Technology, Tianjin University, Nankai, Tianjin, China, 30072
                [2 ] Tianjin Key Laboratory of Cognitive Computing and Application, Nankai, Tianjin, China, 30072
                [3 ] Beijing KEDONG Electric Power Control System Co. LTD, Qinghe, Beijing, China, 100192
                Harbin Institute of Technology Shenzhen Graduate School, CHINA
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

Author information
ORCID: http://orcid.org/0000-0001-9591-2237
Article
Manuscript ID: PONE-D-17-27178
DOI: 10.1371/journal.pone.0188129
PMCID: PMC5747425
PMID: 29287069
© 2017 Qu et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History
Received: 21 July 2017
Accepted: 1 November 2017
                Page count
                Figures: 8, Tables: 13, Pages: 18
                Funding
Funded by: National Natural Science Foundation of China
                Award ID: 61170177
                Award Recipient :
                Funded by: National Basic Research Program
                Award ID: 2013CB32930X
                Funded by: National High Technology Research and Development Program of China
                Award ID: 2013CB32930X
                Award Recipient :
This work was supported by: (1) National Natural Science Foundation of China, grant number 61170177, http://www.nsfc.gov.cn, funding institution: Tianjin University, authors: Xiu-Jun GONG, Hua Yu; (2) National Basic Research Program of China, grant number 2013CB32930X, http://www.most.gov.cn, funding institution: Tianjin University; and (3) National High Technology Research and Development Program of China, grant number 2013CB32930X, http://www.most.gov.cn, funding institution: Tianjin University, author: Xiu-Jun GONG. The funders did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.
                Categories
                Research Article
Biology and Life Sciences > Biochemistry > Proteins > DNA-binding proteins
Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Motif Analysis
Computer and Information Sciences > Neural Networks
Biology and Life Sciences > Neuroscience > Neural Networks
Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Protein Sequencing
Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Protein Sequencing
Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Learning
Biology and Life Sciences > Psychology > Cognitive Psychology > Learning
Social Sciences > Psychology > Cognitive Psychology > Learning
Biology and Life Sciences > Neuroscience > Learning and Memory > Learning
Computer and Information Sciences > Artificial Intelligence > Machine Learning
Research and Analysis Methods > Mathematical and Statistical Techniques > Mathematical Functions > Convolution
Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Custom metadata
                All the source codes and data used in this study are available from the figshare server https://doi.org/10.6084/m9.figshare.5231602.v1.
