      Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships


          Abstract

          Inferring phenotypic outcomes from genomic features is both a promise and challenge for systems biology. Using gene expression data to predict phenotypic outcomes, and functionally validating the genes with predictive powers are two challenges we address in this study. We applied an evolutionarily informed machine learning approach to predict phenotypes based on transcriptome responses shared both within and across species. Specifically, we exploited the phenotypic diversity in nitrogen use efficiency and evolutionarily conserved transcriptome responses to nitrogen treatments across Arabidopsis accessions and maize varieties. We demonstrate that using evolutionarily conserved nitrogen responsive genes is a biologically principled approach to reduce the feature dimensionality in machine learning that ultimately improved the predictive power of our gene-to-trait models. Further, we functionally validated seven candidate transcription factors with predictive power for NUE outcomes in Arabidopsis and one in maize. Moreover, application of our evolutionarily informed pipeline to other species including rice and mice models underscores its potential to uncover genes affecting any physiological or clinical traits of interest across biology, agriculture, or medicine.
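          The feature-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, which this record does not describe in detail. It assumes a hypothetical one-to-one ortholog map between two species and keeps only genes whose treatment response is shared across both, then restricts an expression table to those genes before any model is trained. All function names, gene identifiers, and sample names below are invented for illustration.

```python
from typing import Dict, List, Set


def conserved_responsive_genes(
    responsive_a: Set[str],
    responsive_b: Set[str],
    orthologs_a_to_b: Dict[str, str],
) -> List[str]:
    """Species-A genes that respond to treatment and whose species-B
    ortholog also responds (hypothetical one-to-one ortholog map)."""
    return sorted(
        gene for gene in responsive_a
        if orthologs_a_to_b.get(gene) in responsive_b
    )


def restrict_features(
    expression: Dict[str, Dict[str, float]],
    features: List[str],
) -> Dict[str, List[float]]:
    """Map each sample to a vector over the selected genes only.
    Genes missing from a sample are filled with 0.0 (an assumption)."""
    return {
        sample: [values.get(gene, 0.0) for gene in features]
        for sample, values in expression.items()
    }


# Toy data with made-up gene and sample names.
arabidopsis_responsive = {"At_g1", "At_g2", "At_g3"}
maize_responsive = {"Zm_gA", "Zm_gC"}
orthologs = {"At_g1": "Zm_gA", "At_g2": "Zm_gB"}

features = conserved_responsive_genes(
    arabidopsis_responsive, maize_responsive, orthologs
)
expression = {
    "accession_1": {"At_g1": 5.2, "At_g2": 1.1, "At_g3": 0.4},
    "accession_2": {"At_g1": 3.9, "At_g3": 2.0},
}
reduced = restrict_features(expression, features)
```

          In this toy case only one gene survives the cross-species filter, so each sample is reduced from up to three features to one. The reduced vectors could then be passed to any regression or classification model; the choice of model is left open here.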

          Abstract

          Predicting complex phenotypes from genomic information is still a challenge. Here, the authors use an evolutionarily informed machine learning approach within and across species to predict genes affecting nitrogen utilization in crops, and show their approach is also useful in mammalian systems.


                Author and article information

                Contributors
                gloria.coruzzi@nyu.edu
                Journal
                Nature Communications (Nat Commun)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 24 September 2021
                Volume 12, article 5627 (2021)
                Affiliations
                [1] Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA (GRID grid.137628.9, ISNI 0000 0004 1936 8753)
                [2] Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN, USA (GRID grid.169077.e, ISNI 0000 0004 1937 2197)
                [3] Purdue Center for Plant Biology, Purdue University, West Lafayette, IN, USA (GRID grid.169077.e, ISNI 0000 0004 1937 2197)
                [4] Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (GRID grid.35403.31, ISNI 0000 0004 1936 9991)
                [5] Present address: Department of Life Science, National Taiwan University, Taipei, Taiwan (GRID grid.19188.39, ISNI 0000 0004 0546 0241)
                Author information
                http://orcid.org/0000-0003-1051-6636
                http://orcid.org/0000-0002-2381-0853
                http://orcid.org/0000-0002-4416-7983
                http://orcid.org/0000-0003-2439-9012
                http://orcid.org/0000-0002-1011-2466
                http://orcid.org/0000-0001-6516-4995
                http://orcid.org/0000-0003-2608-2166
                Article
                Article ID: 25893
                DOI: 10.1038/s41467-021-25893-w
                PMCID: 8463701
                PMID: 34561450
                d4a8e2ef-80eb-4b71-a1e0-a9e74476bb03
                © The Author(s) 2021

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 7 October 2020
                Accepted: 3 September 2021
                Funding
                Funded by: FundRef https://doi.org/10.13039/100000001, National Science Foundation (NSF);
                Award ID: IOS-1339362
                Award ID: 2016-67011025167
                Funded by: FundRef https://doi.org/10.13039/100007917, United States Department of Agriculture | Agricultural Research Service (USDA Agricultural Research Service);
                Award ID: 1013620
                Categories
                Article
                Keywords
                machine learning, transcriptomics
