
      On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach


          Abstract

DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, DNA methylation and many other biological functions in both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods often devote themselves to extracting physicochemical features from sequences while ignoring motif information and the positional relationships between motifs. Meanwhile, small training datasets and noisy training data lead to predictions of low accuracy and reliability. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It uses two stages of convolutional neural networks to detect the functional domains of protein sequences, a long short-term memory neural network to identify their long-term dependencies, and binary cross-entropy to evaluate the quality of the neural networks. When the proposed method is tested on a realistic DNA-binding protein dataset, it achieves a prediction accuracy of 94.2% at a Matthews correlation coefficient of 0.961. Compared with LibSVM on the Arabidopsis and yeast datasets in independent tests, the accuracy rises by 9% and 4%, respectively. Comparative experiments using different feature extraction methods show that our model achieves accuracy similar to the best of the others, while its sensitivity, specificity and AUC increase by 27.83%, 1.31% and 16.21%, respectively. These results suggest that our method is a promising tool for identifying DNA-binding proteins.
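The pipeline the abstract describes — convolutional motif detectors over the raw sequence, a recurrent stage on top, and a binary cross-entropy objective — can be sketched in miniature. The NumPy toy below is illustrative only, not the authors' implementation: the kernel width, the random weights, and the linear read-out standing in for the LSTM stage are all assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def one_hot(seq):
    """Encode a protein sequence as a (length, 20) one-hot matrix."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    m = np.zeros((len(seq), 20))
    for j, aa in enumerate(seq):
        m[j, idx[aa]] = 1.0
    return m

def conv1d(x, kernels):
    """Valid 1-D convolution: x is (L, 20), kernels is (K, w, 20).
    Returns (L - w + 1, K) feature maps (motif-detector responses)."""
    K, w, _ = kernels.shape
    L = x.shape[0]
    out = np.zeros((L - w + 1, K))
    for t in range(L - w + 1):
        window = x[t:t + w]  # (w, 20) slice of the sequence
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def binary_cross_entropy(y_true, p):
    """Loss for a single binary label given predicted probability p."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

rng = np.random.default_rng(0)
x = one_hot("MKVLAADKGHTWY")                                   # toy sequence
feats = np.maximum(conv1d(x, rng.normal(size=(8, 5, 20))), 0)  # ReLU
pooled = feats.max(axis=0)               # global max pooling over positions
logit = pooled @ rng.normal(size=8)      # linear read-out (LSTM stand-in)
prob = 1.0 / (1.0 + np.exp(-logit))      # sigmoid: P(DNA-binding)
loss = binary_cross_entropy(1.0, prob)
```

In the paper's design the pooled convolutional features would instead be fed through a second convolutional stage and an LSTM before the sigmoid output; the loss function is the same binary cross-entropy shown here.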

Most cited references (30)


          Speech Recognition with Deep Recurrent Neural Networks

Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However, RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
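The long-range memory this abstract credits comes from the LSTM cell's gated cell state. A single forward step can be sketched in NumPy; the stacked-gate layout and the toy dimensions below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. x: input (d,), h: hidden (n,), c: cell state (n,).
    W: (4n, d), U: (4n, n), b: (4n,) hold the input/forget/output/candidate
    gate parameters stacked row-wise."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:n]))        # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))     # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))   # output gate
    g = np.tanh(z[3*n:])                # candidate cell state
    c_new = f * c + i * g               # cell state carries long-range context
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(1)
d, n = 3, 4                             # toy input and hidden sizes
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(6, d)):       # run over a toy 6-step sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

Stacking such layers — feeding the hidden state of one layer as the input of the next at every time step — gives the "deep" recurrent networks the paper investigates.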

            Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

            Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
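The core operation is simple to state: normalize each feature over the mini-batch, then apply a learnable scale and shift. A minimal NumPy sketch of the training-time behaviour (a real layer also keeps running statistics for use at inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.
    x: (batch, features); gamma, beta: learnable (features,) parameters."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(2)
x = 5.0 + 3.0 * rng.normal(size=(64, 8))   # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

Because the normalized activations always have roughly zero mean and unit variance regardless of how the previous layers' parameters drift, each layer sees a stable input distribution — the internal covariate shift the abstract describes is removed by construction.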

              Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy


                Author and article information

                Contributors
Roles: Methodology, Software
Roles: Writing – original draft, Writing – review & editing
Roles: Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing
Roles: Data curation, Writing – review & editing
Roles: Data curation, Software
Role: Editor
Journal
PLoS ONE (Public Library of Science, San Francisco, CA, USA)
ISSN: 1932-6203
Published: 29 December 2017
Volume 12, Issue 12: e0188129
                Affiliations
                [1 ] School of Computer Science and Technology, Tianjin University, Nankai, Tianjin, China, 30072
                [2 ] Tianjin Key Laboratory of Cognitive Computing and Application, Nankai, Tianjin, China, 30072
                [3 ] Beijing KEDONG Electric Power Control System Co. LTD, Qinghe, Beijing, China, 100192
                Harbin Institute of Technology Shenzhen Graduate School, CHINA
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

Author information
ORCID: http://orcid.org/0000-0001-9591-2237
Article
Manuscript ID: PONE-D-17-27178
DOI: 10.1371/journal.pone.0188129
PMCID: PMC5747425
PMID: 29287069
© 2017 Qu et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History
Received: 21 July 2017
Accepted: 1 November 2017
                Page count
                Figures: 8, Tables: 13, Pages: 18
                Funding
Funded by: National Natural Science Foundation of China
                Award ID: 61170177
                Award Recipient :
                Funded by: National Basic Research Program
                Award ID: 2013CB32930X
                Funded by: National High Technology Research and Development Program of China
                Award ID: 2013CB32930X
                Award Recipient :
This work was supported by: (1) National Natural Science Foundation of China, grant number 61170177, http://www.nsfc.gov.cn, funding institution: Tianjin University, authors: Xiu-Jun GONG, Hua Yu; (2) National Basic Research Program of China, grant number 2013CB32930X, http://www.most.gov.cn, funding institution: Tianjin University; and (3) National High Technology Research and Development Program of China, grant number 2013CB32930X, http://www.most.gov.cn, funding institution: Tianjin University, author: Xiu-Jun GONG. The funders did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.
                Categories
                Research Article
Biology and Life Sciences > Biochemistry > Proteins > DNA-binding proteins
Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Motif Analysis
Computer and Information Sciences > Neural Networks
Biology and Life Sciences > Neuroscience > Neural Networks
Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Protein Sequencing
Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Protein Sequencing
Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Learning
Biology and Life Sciences > Psychology > Cognitive Psychology > Learning
Social Sciences > Psychology > Cognitive Psychology > Learning
Biology and Life Sciences > Neuroscience > Learning and Memory > Learning
Computer and Information Sciences > Artificial Intelligence > Machine Learning
Research and Analysis Methods > Mathematical and Statistical Techniques > Mathematical Functions > Convolution
Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Custom metadata
                All the source codes and data used in this study are available from the figshare server https://doi.org/10.6084/m9.figshare.5231602.v1.
