Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies


Abstract

Many practical studies rely on hypothesis testing procedures applied to data sets with missing information. An important part of the analysis is to determine the impact of the missing data on the performance of the test, and this can be done by properly quantifying the amount of available information relative to the complete data. The problem is directly motivated by applications to studies, such as linkage analyses and haplotype-based association projects, designed to identify genetic contributions to complex diseases. In genetic studies, relative information measures are needed for experimental design, technology comparison, interpretation of the data, and understanding the behavior of some of the inference tools. The central difficulties in constructing such information measures arise from the multiple, and sometimes conflicting, aims in practice. For large samples, we show that a satisfactory, likelihood-based general solution exists by using appropriate forms of the relative Kullback-Leibler information, and that the proposed measures are computationally inexpensive given the maximized likelihoods with the observed data. Two measures are introduced, one under the null hypothesis and one under the alternative. We illustrate the measures on data from mapping studies of inflammatory bowel disease and diabetes. For small-sample problems, which appear rather frequently in practice and sometimes in disguised forms (e.g., measuring individual contributions to a large study), the robust Bayesian approach holds great promise, though the choice of a general-purpose "default prior" is a very challenging problem.


Author and article information

Publication date Created: 14 February 2011

Article

DOI: 10.1214/07-STS244

arXiv ID: 1102.2774

SO-VID: 38bd0920-aa38-41e6-ac67-8491e71623eb

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Report No: IMS-STS-STS244

Journal reference: Statistical Science 2008, Vol. 23, No. 3, 287-312

Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-STS244

Categories: stat.ME

Proxy: vtex