      Is Open Access

      Prediction of condition-specific regulatory genes using machine learning

      research-article


          Abstract

          Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
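          The abstract reports performance as auROC (area under the receiver operating characteristic curve). As a reading aid, the sketch below shows how an auROC value can be computed for a ranked list of candidate transcription factors. It is not the ConSReg implementation: the function name `auroc`, the labels, and the scores are hypothetical toy values chosen only to illustrate the metric.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a true regulator, 0 otherwise (hypothetical ground truth).
    scores: model scores, higher meaning "more likely a regulator".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six candidate transcription factors.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_auroc = auroc(labels, scores)
print(round(toy_auroc, 3))
```

          An auROC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives context for the reported average of 0.84.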


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res, nar)
                Oxford University Press
                0305-1048
                1362-4962
                19 June 2020
                24 April 2020
                Volume: 48
                Issue: 11
                Article number: e62
                Affiliations
                [1] Graduate Program in Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
                [2] School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
                [3] Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
                Author notes
                To whom correspondence should be addressed. Tel: +1 540 231 2756; Email: songli@vt.edu
                Author information
                http://orcid.org/0000-0002-8133-3944
                Article
                Publisher ID: gkaa264
                DOI: 10.1093/nar/gkaa264
                PMC ID: 7293043
                PubMed ID: 32329779
                © The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Accepted: 20 April 2020
                Revised: 19 February 2020
                Received: 06 November 2019
                Page count
                Pages: 17
                Funding
                Funded by: Jeffress Trust, DOI 10.13039/100006990;
                Funded by: United States Department of Energy;
                Award ID: DE-SC0020358
                Funded by: United States Department of Agriculture, DOI 10.13039/100000199;
                Categories
                AcademicSubjects/SCI00010
                Narese/7
                Narese/24
                Methods Online

                Genetics