
      Quality control of genotypes using heritability estimates of gene content at the marker.



          Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker. Gene content is the number of copies of a particular reference allele in the genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate the heritability of gene content at each SNP and also builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, estimates of the allele frequencies of markers in the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers were rejected) and a sensitivity of 0.99 (1% of wrong markers were accepted) when markers with heritability lower than 0.975 were discarded. Checking for Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The proposed method is further illustrated with a real data set with genotypes from 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip and a pedigree of 6473 individuals; those markers had undergone very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded based on our method, with associated estimates of heritability as low as 0.12. Unlike other techniques, our method uses all information in the population simultaneously, can be used in any population with markers and pedigree recordings, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.
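
          The idea that error-free gene content has a heritability of 1 can be illustrated with a small simulation. The sketch below is not the authors' REML procedure or their scripts: it swaps in a much simpler estimator, the regression of offspring gene content on mid-parent gene content, whose expected slope equals the additive heritability. The trio design, the allele frequency of 0.3, the sample size, and all function names are assumptions made for illustration only. Genotyping error is mimicked by shuffling the genotypes of a random 10% of offspring among themselves, which is one possible reading of "permutation error".

```python
import random


def simulate_trios(n, p, rng):
    """Simulate n sire-dam-offspring trios at one biallelic marker.

    Returns mid-parent and offspring gene contents (copies of the
    reference allele), assuming random mating and no mutation.
    """
    midparent, offspring = [], []
    for _ in range(n):
        sire = sum(rng.random() < p for _ in range(2))
        dam = sum(rng.random() < p for _ in range(2))
        # Each parent transmits the reference allele with probability
        # (its gene content) / 2.
        child = (rng.random() < sire / 2) + (rng.random() < dam / 2)
        midparent.append((sire + dam) / 2)
        offspring.append(child)
    return midparent, offspring


def permute_fraction(values, fraction, rng):
    """Shuffle the values of a random subset of individuals among themselves."""
    out = list(values)
    idx = rng.sample(range(len(out)), round(fraction * len(out)))
    shuffled = [out[i] for i in idx]
    rng.shuffle(shuffled)
    for i, v in zip(idx, shuffled):
        out[i] = v
    return out


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)  # fixed seed so the run is repeatable
mid, off = simulate_trios(20000, 0.3, rng)
off_noisy = permute_fraction(off, 0.10, rng)

h2_clean = regression_slope(mid, off)
h2_noisy = regression_slope(mid, off_noisy)
# Many permuted individuals receive the same genotype they already had,
# so the share of genotypes that actually change is below 10%.
actual_error = sum(a != b for a, b in zip(off, off_noisy)) / len(off)

print(f"slope without errors:    {h2_clean:.3f}")
print(f"slope with permutation:  {h2_noisy:.3f}")
print(f"genotypes actually wrong: {actual_error:.3%}")
```

          Under these assumptions the slope for clean genotypes should be close to 1 and the slope for permuted genotypes noticeably lower, which is the signal the heritability threshold exploits. The run also shows why a permutation rate can exceed the actual error rate: a shuffled individual often ends up with the same genotype it started with.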



                Author and article information

                Genetics Society of America
                Mar 2015
                Volume: 199
                Issue: 3
                [1] Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, C1417DSE Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1033AAJ Buenos Aires, Argentina.
                [2] INRA, Génétique, Physiologie et Systèmes d'Elevage (GenPhySE), F-31326 Castanet-Tolosan, France; Université de Toulouse, INP, ENSAT, Génétique, Physiologie et Systèmes d'Elevage (GenPhySE), F-31326 Castanet-Tolosan, France.
                [3] INRA, Génétique, Physiologie et Systèmes d'Elevage (GenPhySE), F-31326 Castanet-Tolosan, France; Université de Toulouse, INP, ENSAT, Génétique, Physiologie et Systèmes d'Elevage (GenPhySE), F-31326 Castanet-Tolosan, France.
                [4] Animal and Dairy Science, University of Georgia, Athens, Georgia 30602.
                [5] Instituto Nacional de Investigación Agropecuaria, Canelones 90200, Uruguay.
                Copyright © 2015 by the Genetics Society of America.

