
      DEEPre: sequence-based enzyme EC number prediction by deep learning





          Motivation

          Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of diseases caused by enzyme deficiency. However, the time and resources required make it prohibitively expensive to determine the function of every enzyme experimentally. Computational enzyme function prediction has therefore become increasingly important. In this paper, we develop such an approach, which determines enzyme function by predicting the Enzyme Commission (EC) number.
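As background for the prediction target named above: an EC number is a four-level hierarchical code (main class, subclass, sub-subclass, serial number), so one label implies a chain of progressively finer labels. The sketch below is illustrative only and is not taken from the paper; the helper names `parse_ec` and `hierarchical_labels` are hypothetical.

```python
from typing import List, Tuple

# Top-level EC classes. Class 7 (Translocases) was added to the
# nomenclature in 2018, after this article was published.
EC_MAIN_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec: str) -> Tuple[int, ...]:
    """Split an EC number such as '3.4.21.4' into its numeric levels.

    Hypothetical helper. Partial annotations such as '3.4.-.-' are
    truncated at the first '-' placeholder.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    levels = []
    for part in parts:
        if part == "-":
            break
        levels.append(int(part))
    return tuple(levels)


def hierarchical_labels(ec: str) -> List[str]:
    """Return the label at every level of the hierarchy, coarse to fine."""
    levels = parse_ec(ec)
    return [".".join(str(x) for x in levels[:i]) for i in range(1, len(levels) + 1)]


if __name__ == "__main__":
    print(hierarchical_labels("3.4.21.4"))
    print(EC_MAIN_CLASSES[parse_ec("3.4.21.4")[0]])
```

A level-by-level classifier can treat each prefix as a separate target, starting with the main class.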


          Results

          We propose DEEPre, an end-to-end approach to feature selection and classification model training for enzyme function prediction, together with an automatic and robust method for making feature dimensionality uniform. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as input and extracts convolutional and sequential features from it, guided by the classification result, to directly improve prediction performance. Thorough cross-fold validation experiments on two large-scale datasets show that DEEPre improves prediction performance over previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference between enzyme isoforms.
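To make "raw sequence encoding" concrete, the following minimal sketch one-hot encodes a protein sequence over the 20 standard amino acids and brings every sequence to a fixed length. It is an illustration under stated assumptions, not the DEEPre implementation: the abstract does not specify the encoding, and simple padding and truncation stands in here for the paper's feature dimensionality uniformization method. The name `one_hot_encode` and its `length` parameter are hypothetical.

```python
from typing import List

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str, length: int) -> List[List[int]]:
    """Encode a protein sequence as a length x 20 binary matrix.

    Assumption for illustration: sequences longer than `length` are
    truncated and shorter ones are padded with all-zero rows. Residues
    outside the 20 standard letters (for example 'X') also become
    all-zero rows.
    """
    rows = []
    for residue in sequence.upper()[:length]:
        row = [0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(residue)
        if index is not None:
            row[index] = 1
        rows.append(row)
    while len(rows) < length:
        rows.append([0] * len(AMINO_ACIDS))
    return rows


if __name__ == "__main__":
    encoded = one_hot_encode("MKV", length=5)
    print(len(encoded), len(encoded[0]))
```

A fixed-shape matrix like this is the kind of input that convolutional and recurrent layers can consume directly, which is why some form of length uniformization is needed before training.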

          Availability and implementation

          The server can be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Oxford University Press
                01 March 2018
                23 October 2017
                23 October 2017
                Volume: 34
                Issue: 5
                Pages: 760-769
                [1] Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
                [2] Computer Science Department, Illinois Institute of Technology, Chicago, IL, USA
                [3] Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, China
                Author notes
                To whom correspondence should be addressed. Email: xin.gao@kaust.edu.sa
                © The Author 2017. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                Page count
                Pages: 10
                Funded by: King Abdullah University of Science and Technology 10.13039/501100004052
                Funded by: KAUST 10.13039/501100004052
                Funded by: National Natural Science Foundation of China 10.13039/501100001809
                Award ID: 61401131 and 61731008
                Original Papers
                Sequence Analysis

                Bioinformatics & Computational biology

