      A Computational Method for Classifying Different Human Tissues with Quantitatively Tissue-Specific Expressed Genes

      research-article


          Abstract

          Tissue-specific gene expression has long been recognized as key to understanding tissue development and function. Efforts have been made in the past decade to identify tissue-specific expression profiles, such as the Human Proteome Atlas and FANTOM5. However, these studies mainly focused on “qualitatively tissue-specific expressed genes”, which are highly enriched in one tissue or a group of tissues, and paid less attention to “quantitatively tissue-specific expressed genes”, which are expressed in all or most tissues but at different levels. In this study, we applied machine learning algorithms to build a computational method for identifying “quantitatively tissue-specific expressed genes” capable of distinguishing 25 human tissues by their expression patterns. Our results identified the expression levels of 432 genes as the optimal features for tissue classification; with these features, a support vector machine (SVM) yielded a Matthews correlation coefficient (MCC) of more than 0.99. This model was superior to an SVM model using tissue-enriched genes and yielded an MCC of 0.985 on an independent test dataset, indicating good generalization ability. These 432 genes were shown to be widely expressed in multiple tissues, and a literature review of the top 23 genes found support for the discriminating power of most of them. As a complement to previous studies, our discovery of these quantitatively tissue-specific genes provides insights into tissue development and function.
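
          The abstract reports performance as an MCC over 25 tissue classes but does not state which formula was used. The following is a minimal sketch, assuming the commonly used multiclass generalization of the MCC computed from true and predicted label counts. The function name `multiclass_mcc` and the toy tissue labels are illustrative and are not taken from the article.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (illustrative sketch).

    Assumes the standard multiclass generalization:
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k the number of samples truly in class k, and p_k the number of
    samples predicted as class k.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)  # true occurrences per class
    p_counts = Counter(y_pred)  # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_true = s * s - sum(v * v for v in t_counts.values())
    cov_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(cov_true * cov_pred)

    # By convention, return 0.0 when the coefficient is undefined
    # (for example, when every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


# Toy example with three hypothetical tissue labels (not data from the study).
truth = ["liver", "liver", "brain", "brain", "heart", "heart"]
perfect = list(truth)
one_error = ["liver", "liver", "brain", "heart", "heart", "heart"]

mcc_perfect = multiclass_mcc(truth, perfect)
mcc_one_error = multiclass_mcc(truth, one_error)
```

          A value of 1 means every sample was assigned to its true tissue, while values near 0 indicate agreement no better than chance. Unlike plain accuracy, this coefficient accounts for how often each class is predicted, so a single misclassification in a small class lowers it noticeably.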


                Author and article information

                Journal
                Genes (Basel)
                Publisher: MDPI
                ISSN: 2073-4425
                Published: 07 September 2018
                Volume: 9
                Issue: 9
                Article number: 449
                Affiliations
                [1] School of Life Sciences, Shanghai University, Shanghai 200444, China; jiaruili@shu.edu.cn
                [2] College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China; chen_lei1@163.com
                [3] Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China
                [4] Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; zhangyh825@163.com
                Author notes
                [*] Correspondence: xykong@sibs.ac.cn (X.K.); tohuangtao@126.com (T.H.); cai_yud@126.com (Y.-D.C.); Tel.: +86-021-6613-6132 (Y.-D.C.)
                [†] These authors contributed equally to this work.

                Author information
                https://orcid.org/0000-0003-3068-1583
                https://orcid.org/0000-0001-5664-7979
                Article
                Publisher ID: genes-09-00449
                DOI: 10.3390/genes9090449
                PMC ID: 6162521
                PubMed ID: 30205473
                Record ID: 05a80de8-467d-4f9d-b1d4-a6cae4a8dbe5
                © 2018 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 03 August 2018
                Accepted: 04 September 2018
                Categories
                Article

                Keywords: tissue-specific expressed genes, transcriptome, tissue classification, support vector machine, feature selection
