Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation

Abstract

Background

Genomic selection is a recently developed technology that is beginning to revolutionize animal breeding. The objective of this study was to estimate marker effects to derive prediction equations for direct genomic values for 16 routinely recorded traits of American Angus beef cattle and quantify corresponding accuracies of prediction.

Methods

Deregressed estimated breeding values were used as observations in a weighted analysis to derive direct genomic values for 3570 sires genotyped using the Illumina BovineSNP50 BeadChip. These bulls were clustered into five groups using K-means clustering on pedigree estimates of additive genetic relationships between animals, with the aim of increasing within-group and decreasing between-group relationships. All five combinations of four groups were used for model training, with cross-validation performed in the group not used in training. Bivariate animal models were used for each trait to estimate the genetic correlation between deregressed estimated breeding values and direct genomic values.
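
The grouping and cross-validation scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the 6-animal relationship matrix is made up, the clustering is a plain Lloyd-style K-means on the rows of that matrix with a deterministic farthest-first start (the abstract does not state the exact algorithm or distance measure used in the study), and the names `kmeans_groups` and `leave_one_group_out` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_groups(rel, k, n_iter=100):
    """Cluster animals by the rows of a relationship matrix (toy sketch).

    Assumption: Lloyd-style K-means with farthest-first initialization,
    not necessarily the algorithm used in the study.
    """
    n = len(rel)
    centers = [list(rel[0])]
    while len(centers) < k:
        # Pick the animal farthest from all centers chosen so far.
        far = max(range(n), key=lambda i: min(sq_dist(rel[i], c) for c in centers))
        centers.append(list(rel[far]))
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(rel[i], centers[c]))
            for i in range(n)
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [rel[i] for i in range(n) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def leave_one_group_out(labels):
    """Yield (held-out group, training indices, validation indices)."""
    for g in sorted(set(labels)):
        train = [i for i, lab in enumerate(labels) if lab != g]
        valid = [i for i, lab in enumerate(labels) if lab == g]
        yield g, train, valid


# Hypothetical pedigree relationships: three pairs of closely related animals.
rel = [
    [1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.5, 1.0],
]

labels = kmeans_groups(rel, k=3)
folds = list(leave_one_group_out(labels))
for group, train, valid in folds:
    print(f"validate on group {group}: train={train} valid={valid}")
```

In the study itself there were five groups, so each of the five folds trained on four groups and validated on the remaining one; the toy example uses three groups only to keep the matrix small.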

Results

Accuracies of direct genomic values ranged from 0.22 to 0.69 for the studied traits, with an average of 0.44. Predictions were more accurate when animals within the validation group were more closely related to animals in the training set. When training and validation sets were formed by random allocation, the accuracies of direct genomic values ranged from 0.38 to 0.85, with an average of 0.65, reflecting the closer relationships between animals in the training and validation sets. For most traits, the accuracies of direct genomic values obtained from training on older animals and validating in younger animals were intermediate between those obtained from K-means clustering and those obtained from random allocation. The genetic correlation between deregressed estimated breeding values and direct genomic values ranged from 0.15 to 0.80 for the traits studied.

Conclusions

These results suggest that genomic estimates of genetic merit can be produced in beef cattle at a young age, but genotyped sires will need to be included recurrently in retraining analyses to routinely provide the industry with direct genomic values of the highest accuracy.

Author and article information

Journal

Journal ID (nlm-ta): Genet Sel Evol

Title: Genetics, Selection, Evolution : GSE

Publisher: BioMed Central

ISSN (Print): 0999-193X

ISSN (Electronic): 1297-9686

Publication date (Collection): 2011

Publication date (Electronic): 28 November 2011

Volume: 43

Issue: 1

Page: 40

Affiliations

[1] Department of Animal Science, Iowa State University, Ames, 50011, USA

[2] Division of Animal Sciences, University of Missouri, Columbia, 65211, USA

[3] Bovine Functional Genomics Laboratory, ARS, USDA, Beltsville, MD 20705, USA

[4] American Angus Association, 3201 Frederick Avenue, Saint Joseph, 64506, USA

[5] Igenity Livestock Business Unit, Merial Limited, Duluth, 30096, USA

[6] Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North, New Zealand

Article

Publisher ID: 1297-9686-43-40

DOI: 10.1186/1297-9686-43-40

PMC ID: 3250932

PubMed ID: 22122853

SO-VID: ed3c30b4-7bb8-4fd4-8174-42d75a4ae103

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.