How Population Structure Impacts Genomic Selection Accuracy in Cross-Validation: Implications for Practical Breeding

Abstract

Over the last two decades, the application of genomic selection has been extensively studied in various crop species, and it has become common practice to report prediction accuracies using cross-validation. However, genomic prediction accuracies obtained from random cross-validation can be strongly inflated by population or family structure, a characteristic shared by many breeding populations. Understanding the effect of population and family structure on prediction accuracy is essential for the successful application of genomic selection in plant breeding programs. The objective of this study was to make this effect and its implications for practical breeding programs comprehensible to breeders and scientists with a limited background in quantitative genetics and genomic selection theory. We therefore compared genomic prediction accuracies obtained from different random cross-validation approaches and from within-family prediction in three different prediction scenarios. We used a highly structured population of 940 Brassica napus hybrids from 46 testcross families and two subpopulations. Our demonstrations show how genomic prediction accuracies obtained from among-family predictions in random cross-validation and from within-family predictions capture different measures of prediction accuracy. Among-family prediction accuracy reflects how accurately both the parent average component and the Mendelian sampling term are predicted. Within-family prediction accuracy reflects only how accurately the Mendelian sampling term can be predicted. With this paper, we aim to foster a critical approach to different measures of genomic prediction accuracy and a careful analysis of the values observed in genomic selection experiments and reported in the literature.
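The distinction between among-family and within-family accuracy drawn in the abstract can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis or data. It assumes hypothetical family sizes (46 families of 20 individuals), equal variances of 0.5 for the parent average (PA) and the Mendelian sampling term (MS), and two toy predictors built directly from the simulated components, with no genomic model fitted. The pooled correlation across all individuals stands in for among-family accuracy as obtained in random cross-validation. The mean of per-family correlations stands in for within-family accuracy. All function names are hypothetical.

```python
import numpy as np


def simulate_families(n_fam=46, n_per_fam=20, var_pa=0.5, var_ms=0.5, seed=1):
    """Hypothetical simulation of true genetic values g = PA + MS.

    PA (parent average) is shared by all members of a family; MS (Mendelian
    sampling term) varies among members within a family.
    """
    rng = np.random.default_rng(seed)
    fam = np.repeat(np.arange(n_fam), n_per_fam)          # family label per individual
    pa = rng.normal(0.0, np.sqrt(var_pa), n_fam)[fam]     # one PA value per family
    ms = rng.normal(0.0, np.sqrt(var_ms), fam.size)       # one MS value per individual
    return fam, pa, ms


def among_family_accuracy(pred, g):
    """Pooled correlation across all individuals (ignores family structure)."""
    return float(np.corrcoef(pred, g)[0, 1])


def within_family_accuracy(pred, g, fam):
    """Mean of correlations computed separately inside each family."""
    rs = [np.corrcoef(pred[fam == f], g[fam == f])[0, 1] for f in np.unique(fam)]
    return float(np.mean(rs))


fam, pa, ms = simulate_families()
g = pa + ms
noise = np.random.default_rng(2).normal(0.0, 0.1, g.size)  # small prediction error

# Toy predictor A: recovers only the parent average (plus small noise).
pred_pa_only = pa + noise
# Toy predictor B: additionally captures part of the Mendelian sampling term.
pred_pa_ms = pa + 0.5 * ms + noise

for name, pred in [("PA only", pred_pa_only), ("PA + MS", pred_pa_ms)]:
    print(f"{name}: among-family r = {among_family_accuracy(pred, g):.2f}, "
          f"within-family r = {within_family_accuracy(pred, g, fam):.2f}")
```

In this toy setting, the PA-only predictor shows a sizeable pooled correlation but a within-family correlation close to zero. It can rank families but not individuals within a family. Only the predictor that captures part of the Mendelian sampling term scores well on both measures.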

Author and article information

Journal: Frontiers in Plant Science (Front. Plant Sci.)
Publisher: Frontiers Media S.A.
ISSN: 1664-462X
Published: 16 December 2020
Volume 11, Article 592977
Affiliations
[1] The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Research Centre, Midlothian, United Kingdom
[2] NPZ Innovation GmbH, Holtsee, Germany
[3] German Seed Alliance GmbH, Hohenlieth, Germany
[4] Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
[5] Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
Author notes

Edited by: Ryo Fujimoto, Kobe University, Japan

Reviewed by: Hans D. Daetwyler, La Trobe University, Australia; Shouvik Das, Indian Agricultural Research Institute (ICAR), India

*Correspondence: Andreas Stahl, andreas.stahl@julius-kuehn.de

This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science.

Article identifiers
DOI: 10.3389/fpls.2020.592977
PMCID: PMC7772221
PMID: 33391305

Copyright © 2020 Werner, Gaynor, Gorjanc, Hickey, Kox, Abbadi, Leckband, Snowdon and Stahl.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History
Received: 08 August 2020
Accepted: 24 November 2020

Page count
Figures: 5, Tables: 1, Equations: 6, References: 31, Pages: 14, Words: 8895

Funding
Funded by: Bundesministerium für Bildung und Forschung (funder ID 10.13039/501100002347)

Categories
Plant Science; Original Research; Plant science & Botany

Keywords
predictive breeding, genomic prediction, oilseed rape, nested association mapping population, structure