Strategies for genotype imputation in composite beef cattle

Abstract

Background

Genotype imputation has been used to increase genomic information, allow more animals in genome-wide analyses, and reduce genotyping costs. In Brazilian beef cattle production, many animals result from crossbreeding, which may alter linkage disequilibrium patterns. Thus, the challenge is to obtain accurately imputed genotypes in crossbred animals. The objective of this study was to identify the best-fitting and most accurate imputation strategy for the MA genetic group (the progeny of a Charolais sire mated with crossbred Canchim × Zebu cows) and Canchim cattle. The data set contained 400 animals (born between 1999 and 2005) genotyped with the Illumina BovineHD panel. Imputation accuracy of genotypes from the Illumina-Bovine3K (3K), Illumina-BovineLD (6K), GeneSeek-Genomic-Profiler (GGP) BeefLD (GGP9K), GGP-IndicusLD (GGP20Ki), Illumina-BovineSNP50 (50K), GGP-IndicusHD (GGP75Ki), and GGP-BeefHD (GGP80K) to Illumina-BovineHD (HD) SNP panels was investigated. Seven scenarios for reference and target populations were tested; the animals were grouped according to birth year (S1), genetic group (S2 and S3), genetic group and birth year (S4 and S5), gender (S6), and gender and birth year (S7). Analyses were performed using the FImpute and BEAGLE software, and computation run-time was recorded. Genotype imputation accuracy was measured by concordance rate (CR) and allelic R-squared (R²).
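The two accuracy measures named above can be illustrated with a minimal sketch. It assumes genotypes coded as allele counts (0, 1, 2) in an animals × SNPs array, CR defined as the proportion of imputed genotypes equal to the true genotypes, and allelic R² defined as the squared Pearson correlation between true and imputed allele counts, averaged over SNPs. These definitions, the function names, and the toy data are illustrative assumptions, not the authors' code; the article itself should be consulted for the exact definitions used.

```python
import numpy as np


def concordance_rate(true_geno, imputed_geno):
    """Proportion of genotype calls where the imputed value equals the true value."""
    true_geno = np.asarray(true_geno)
    imputed_geno = np.asarray(imputed_geno)
    return float(np.mean(true_geno == imputed_geno))


def allelic_r2(true_geno, imputed_geno):
    """Mean over SNPs of the squared correlation between true and imputed allele counts."""
    t = np.asarray(true_geno, dtype=float)
    m = np.asarray(imputed_geno, dtype=float)
    per_snp = []
    for j in range(t.shape[1]):
        # The correlation is undefined when a SNP shows no variation; skip it.
        if t[:, j].std() == 0 or m[:, j].std() == 0:
            continue
        r = np.corrcoef(t[:, j], m[:, j])[0, 1]
        per_snp.append(r * r)
    return float(np.mean(per_snp)) if per_snp else float("nan")


# Hypothetical toy data: 4 animals x 3 SNPs, with one imputation error.
true_g = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]])
imp_g = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 1, 2]])

cr = concordance_rate(true_g, imp_g)
r2 = allelic_r2(true_g, imp_g)
print(f"CR = {cr:.3f}, allelic R2 = {r2:.3f}")
```

CR counts every matching call equally, whereas a correlation-based measure is less influenced by allele frequency, which is one reason the two are often reported together.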

Results

The scenario with the highest imputation accuracy consisted of a reference population with males and females and a target population with young females. Among the SNP panels in the tested scenarios, the 50K, GGP75Ki and GGP80K were the most adequate to impute to HD in Canchim cattle. FImpute reduced the computation run-time for genotype imputation by a factor of 20 to 100 compared with BEAGLE.

Conclusion

Genotyping panels with at least 50 thousand markers are suitable for genotype imputation to HD with acceptable accuracy. The FImpute algorithm demonstrated a higher efficiency of imputed markers, especially in lower density panels. These considerations may help to increase genotypic information, reduce genotyping costs, and aid genomic selection evaluations in crossbred animals.

Electronic supplementary material

The online version of this article (doi:10.1186/s12863-015-0251-7) contains supplementary material, which is available to authorized users.

Most cited references 22

A new approach for efficient genotype imputation using information from relatives

Mehdi Sargolzaei, Jacques P Chesnais, Flavio S. Schenkel (2014)

Background Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships. Results The proposed method gave higher or similar imputation accuracy than Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy compared to the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals. Conclusions The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. 
In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.

Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle.

A de Roos, B. Hayes, R J Spelman … (2008)

When a genetic marker and a quantitative trait locus (QTL) are in linkage disequilibrium (LD) in one population, they may not be in LD in another population or their LD phase may be reversed. The objectives of this study were to compare the extent of LD and the persistence of LD phase across multiple cattle populations. LD measures r and r² were calculated for syntenic marker pairs using genomewide single-nucleotide polymorphisms (SNP) that were genotyped in Dutch and Australian Holstein-Friesian (HF) bulls, Australian Angus cattle, and New Zealand Friesian and Jersey cows. Average r² was approximately 0.35, 0.25, 0.22, 0.14, and 0.06 at marker distances 10, 20, 40, 100, and 1000 kb, respectively, which indicates that genomic selection within cattle breeds with r² ≥ 0.20 between adjacent markers would require approximately 50,000 SNPs. The correlation of r values between populations for the same marker pairs was close to 1 for pairs of very close markers (<10 kb) and decreased with increasing marker distance and the extent of divergence between the populations. To find markers that are in LD with QTL across diverged breeds, such as HF, Jersey, and Angus, would require approximately 300,000 markers.
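As an illustration of the LD measures mentioned in this abstract, the sketch below computes r and r² for one marker pair using the standard definition r = D / sqrt(pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. It assumes phased haplotypes coded 0/1 at each marker; the function name and toy haplotypes are hypothetical, and the cited study may have estimated LD differently.

```python
import numpy as np


def ld_r(hap_a, hap_b):
    """Signed LD correlation r between two biallelic markers from phased 0/1 haplotypes."""
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    p_a = a.mean()            # frequency of allele 1 at marker A
    p_b = b.mean()            # frequency of allele 1 at marker B
    p_ab = (a * b).mean()     # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b      # disequilibrium coefficient D
    denom = np.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        return float("nan")   # r is undefined for a monomorphic marker
    return float(d / denom)


# Hypothetical toy data: 8 haplotypes at two markers.
hap_a = [1, 1, 0, 0, 1, 0, 1, 0]
hap_b = [1, 1, 0, 0, 1, 0, 0, 1]

r = ld_r(hap_a, hap_b)
print(f"r = {r:.2f}, r2 = {r * r:.2f}")
```

The sign of r carries the LD phase, so persistence of phase between two populations can be assessed by correlating r values for the same marker pairs computed in each population.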

Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking.

Hans D Daetwyler, Mario Calus, Ricardo Pong-Wong … (2013)

The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.

Author and article information

Contributors

Tatiane C. S. Chud: tatischud@gmail.com

Ricardo V. Ventura: rvventura@gmail.com

Flavio S. Schenkel: schenkel@uoguelph.ca

Roberto Carvalheiro: rcar@fcav.unesp.br

Marcos E. Buzanskas: marcosbuz@yahoo.com.br

Jaqueline O. Rosa: jaqueolrosa@hotmail.com

Maurício de Alvarenga Mudadu: mauricio.mudadu@embrapa.br

Marcos Vinicius G. B. da Silva: marcos.vb.silva@embrapa.br

Fabiana B. Mokry: fabiana_barichello@yahoo.com.br

Cintia R. Marcondes: cintia.marcondes@embrapa.br

Luciana C. A. Regitano: luciana.regitano@embrapa.br

Danísio P. Munari: +55 1632092624, danisio@fcav.unesp.br

Journal

Journal ID (nlm-ta): BMC Genet

Journal ID (iso-abbrev): BMC Genet

Title: BMC Genetics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2156

Publication date (Electronic): 7 August 2015

Publication date PMC-release: 7 August 2015

Publication date Collection: 2015

Volume: 16

Electronic Location Identifier: 99

Affiliations

Departamento de Ciências Exatas, UNESP - Univ Estadual Paulista “Júlio de Mesquita Filho”, Jaboticabal, SP, Brazil

Departamento de Zootecnia, UNESP - Univ Estadual Paulista “Júlio de Mesquita Filho”, Jaboticabal, SP, Brazil

Beef Improvement Opportunities, Guelph, ON, Canada

University of Guelph, Guelph, ON, Canada

Embrapa Southeast Livestock - Brazilian Corporation of Agricultural Research, São Carlos, SP, Brazil

Embrapa Dairy Cattle - Brazilian Corporation of Agricultural Research, Juiz de Fora, MG, Brazil

Department of Genetics and Evolution, Federal University of São Carlos, São Carlos, SP, Brazil

Article

Publisher ID: 251

DOI: 10.1186/s12863-015-0251-7

PMC ID: 4527250

PubMed ID: 26250698

SO-VID: acefc4c1-2fce-4261-8066-228ffec358a5

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 9 March 2015

Date accepted: 9 July 2015

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: Canchim breed, crossbred cattle, genomic data, low-density panel, single nucleotide polymorphism
