      A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species

      research-article


          Abstract

          Genomic selection is an integral tool that lets breeders select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models, and especially the markers they select, under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we confirm results from other studies that more complex neural network-based architectures are not superior to well-established methods for phenotype prediction. On real-world data, however, the picture is less clear: several prediction models yield comparable results, with slight advantages for Elastic Net, suggesting that there is considerable room for future research.
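
To make the evaluation idea in the abstract concrete, the following is a minimal sketch of how a phenotype could be simulated from marker data under a predefined setting and then scored by explained variance. It is not the authors' simulation pipeline: the genotype coding (0/1/2), the number of samples and markers, the causal marker indices and effect sizes, the target heritability of 0.7, and all function names are assumptions chosen purely for illustration. The "prediction" used here is the true genetic value, i.e. an oracle that knows the causal markers, so the score only shows the ceiling that any model could approach in this toy setting.

```python
import random


def simulate_genotypes(n_samples, n_markers, rng):
    # Hypothetical coding: each marker holds 0, 1, or 2 copies of an allele.
    return [[rng.choice((0, 1, 2)) for _ in range(n_markers)]
            for _ in range(n_samples)]


def variance(values):
    # Population variance (divides by n).
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate_phenotypes(genotypes, causal_effects, heritability, rng):
    # Additive genetic value: weighted sum over the chosen causal markers.
    genetic = [sum(row[j] * beta for j, beta in causal_effects.items())
               for row in genotypes]
    # Scale Gaussian noise so that Var(genetic) / Var(phenotype) is
    # approximately the requested heritability.
    noise_sd = (variance(genetic) * (1 - heritability) / heritability) ** 0.5
    phenotypes = [g + rng.gauss(0, noise_sd) for g in genetic]
    return genetic, phenotypes


def explained_variance(y_true, y_pred):
    # 1 - Var(y_true - y_pred) / Var(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
genotypes = simulate_genotypes(n_samples=500, n_markers=50, rng=rng)
causal_effects = {3: 1.0, 17: -0.5, 42: 0.8}  # assumed marker index -> effect
genetic, phenotypes = simulate_phenotypes(genotypes, causal_effects, 0.7, rng)

# Oracle score: should land near the simulated heritability.
score = explained_variance(phenotypes, genetic)
print(f"explained variance of the oracle predictor: {score:.2f}")
```

Because the causal markers are known in a simulation like this, one can check not only how much variance a model explains but also whether the markers it selects match the ones that were planted, which is the kind of controlled analysis the abstract refers to.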

                Author and article information

                Journal
                Title: Frontiers in Plant Science
                Abbreviated title: Front. Plant Sci.
                Publisher: Frontiers Media S.A.
                ISSN: 1664-462X
                Publication date: 04 November 2022
                Volume: 13
                Article number: 932512
                Affiliations
                [1] Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
                [2] Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
                [3] Computomics GmbH, Tübingen, Germany
                [4] Technical University of Munich, Department of Informatics, Garching, Germany
                Author notes

                Edited by: Dirk Walther, Max Planck Institute of Molecular Plant Physiology, Germany

                Reviewed by: Hao Tong, Max Planck Institute of Molecular Plant Physiology, Germany; Karansher Singh Sandhu, Bayer Crop Science, United States

                *Correspondence: Dominik G. Grimm, dominik.grimm@hswt.de

                †These authors have contributed equally to this work and share first authorship

                This article was submitted to Technical Advances in Plant Science, a section of the journal Frontiers in Plant Science

                Article
                DOI: 10.3389/fpls.2022.932512
                PMC ID: 9673477
                PubMed ID: 36407627
                Copyright © 2022 John, Haselbeck, Dass, Malisi, Ricca, Dreischer, Schultheiss and Grimm

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 29 April 2022
                Accepted: 25 July 2022
                Page count
                Figures: 4, Tables: 2, Equations: 8, References: 55, Pages: 16, Words: 11053
                Funding
                Funded by: Bundesministerium für Bildung und Forschung, doi 10.13039/501100002347
                Categories
                Plant Science
                Original Research

                Plant science & Botany
                phenotype prediction, genomic selection, plant phenotyping, machine learning, Arabidopsis thaliana
