Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data


Abstract

The availability of whole genome sequencing (WGS) data enables the discovery of causative single nucleotide polymorphisms (SNPs) or of SNPs in high linkage disequilibrium with causative SNPs. This study investigated how adding SNPs selected from imputed WGS data to 54K chip data affects genomic prediction in Danish Jersey cattle. The WGS SNPs had been selected in previous analyses of dairy breeds in Denmark–Finland–Sweden (DFS) and France (FRA). They mainly comprised peaks of quantitative trait loci, structural variants, regulatory regions of genes, and SNPs within genes that the Variant Effect Predictor predicted to have strong effects. Animals genotyped with the 54K chip, a standard low-density (LD) chip, or a customized LD chip covering the selected WGS SNPs and the SNPs of the standard LD chip were imputed to 54K together with the DFS and FRA SNPs. Breeding values were predicted with genomic best linear unbiased prediction (GBLUP) and Bayesian four-distribution mixture models. These models treated the 54K and selected WGS SNPs either as one genetic component (a one-component model) or as two separate genetic components (a two-component model). For milk production traits and mastitis, both the DFS (0.025) and FRA (0.029) sets of additional WGS SNPs improved reliabilities, and including all selected WGS SNPs generally gave the largest improvement (0.034). A Bayesian four-distribution model yielded higher reliabilities than a GBLUP model for milk and protein. However, the extra gain in reliability from the selected WGS SNPs was smaller with the Bayesian four-distribution model than with GBLUP. Generally, no significant difference was observed between one-component and two-component models, except with GBLUP models for milk.
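The abstract contrasts two GBLUP variants. The one-component model pools the 54K and selected WGS SNPs into one genomic relationship matrix. The two-component model builds a separate matrix for each SNP set, each with its own genetic variance. The sketch below illustrates that difference under stated assumptions. It assumes VanRaden-style centering and scaling, with allele frequencies estimated from the genotypes at hand. The paper's exact matrix construction and variance estimation are not described here. All function names are hypothetical, and the variances are supplied as fixed inputs rather than estimated (for example by REML).

```python
from typing import List

Matrix = List[List[float]]


def allele_freqs(M: Matrix) -> List[float]:
    """Frequency of the counted allele per SNP; genotypes coded 0/1/2 (assumption)."""
    n = len(M)
    return [sum(row[j] for row in M) / (2.0 * n) for j in range(len(M[0]))]


def scale_factor(p: List[float]) -> float:
    """k = sum over SNPs of 2p(1 - p), the VanRaden-style scaling (assumption)."""
    return sum(2.0 * pj * (1.0 - pj) for pj in p)


def grm(M: Matrix) -> Matrix:
    """Genomic relationship matrix G = ZZ' / k with Z = M - 2p (hypothetical helper)."""
    p = allele_freqs(M)
    Z = [[g - 2.0 * pj for g, pj in zip(row, p)] for row in M]
    k = scale_factor(p)
    n = len(Z)
    return [[sum(a * b for a, b in zip(Z[i], Z[j])) / k for j in range(n)]
            for i in range(n)]


def one_component_grm(chip: Matrix, wgs: Matrix) -> Matrix:
    """One-component model: 54K and WGS SNP columns pooled into a single G."""
    pooled = [rc + rw for rc, rw in zip(chip, wgs)]
    return grm(pooled)


def two_component_cov(G1: Matrix, G2: Matrix, var1: float, var2: float) -> Matrix:
    """Two-component model: genetic covariance var1 * G1 + var2 * G2."""
    n = len(G1)
    return [[var1 * G1[i][j] + var2 * G2[i][j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Toy genotypes for 4 animals (illustrative only, not data from the study).
    chip = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1], [1, 2, 1, 0]]
    wgs = [[2, 0, 1], [1, 1, 1], [0, 2, 0], [1, 0, 2]]
    G1, G2 = grm(chip), grm(wgs)
    print(one_component_grm(chip, wgs))
    print(two_component_cov(G1, G2, var1=0.7, var2=0.3))
```

With this construction, pooling gives G_all = (k1 G1 + k2 G2) / (k1 + k2). Here k1 and k2 are the sums of 2p(1 - p) over the 54K and WGS SNP sets. The one-component model therefore fixes the relative weight of each SNP set by its number of informative SNPs. The two-component model instead lets the data determine that weight through the two variance components.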


Author and article information

Contributors

Yachun Wang (ORCID: http://orcid.org/0000-0003-3629-2802): +8615801595851, wangyachun@cau.edu.cn

Guosheng Su: +4587157985, guosheng.su@mbg.au.dk

Journal

Journal ID (nlm-ta): Heredity (Edinb)

Journal ID (iso-abbrev): Heredity (Edinb)

Title: Heredity

Publisher: Springer International Publishing (Cham)

ISSN (Print): 0018-067X

ISSN (Electronic): 1365-2540

Publication date (Electronic): 5 July 2019

Publication date PMC-release: 5 July 2019

Publication date (Print): January 2020

Volume: 124

Issue: 1

Pages: 37-49

Affiliations

[1] ISNI 0000 0001 1956 2722, GRID grid.7048.b, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark

[2] ISNI 0000 0004 0530 8290, GRID grid.22935.3f, Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA; National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193 Beijing, PR China

[3] ISNI 0000 0004 4910 6535, GRID grid.460789.4, GABI, INRA, AGROParisTech, Université Paris Saclay, 78350 Jouy-en-Josas, France

[4] ISNI 0000 0001 1956 2722, GRID grid.7048.b, Nordic Cattle Genetic Evaluation, 8200 Aarhus N, Denmark

[5] ISNI 0000 0004 4688 8316, GRID grid.426594.8, Seges, 8200 Aarhus N, Denmark

Author information

Didier Boichard http://orcid.org/0000-0003-0361-2961

Emre Karaman http://orcid.org/0000-0003-1010-683X

Yachun Wang http://orcid.org/0000-0003-3629-2802

Article

Publisher ID: 246

DOI: 10.1038/s41437-019-0246-7

PMC ID: 6906477

PubMed ID: 31278370

SO-VID: ded90510-f493-45f1-8ed0-8b6b587d9e7c

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received : 30 January 2019

Date revision received : 11 May 2019

Date accepted : 17 June 2019

Funding

Funded by: Genomic Selection in Animals and Plants (GenSAP)

Custom metadata

ScienceOpen disciplines: Human biology

Keywords: animal breeding, quantitative trait, genetic markers

