ParallABEL: an R library for generalized parallelization of genome-wide association studies

Abstract

Background

Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those that involve thousands of individuals and take hours to months to run, will benefit from parallel computation. However, acquiring the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files is arduous.

Results

Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP) or trait, such as SNP characterization statistics or association test statistics. The input data of this group consist of SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group consist of individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group consist of pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group consist of pairs of SNPs. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix fell roughly linearly with the number of processors, from approximately eight hours to one hour when ParallABEL employed eight processors.
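The partitioning idea behind these groups can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the ParallABEL or Rmpi API: the helper names (`split_indices`, `split_pairs`, `ibs_for_pairs`), the toy genotype matrix, and the simplified allele-sharing formula are all assumptions made for this example. Per-SNP or per-individual statistics are split by index, pair-wise statistics are split by pairs, and the partial results are merged at the end.

```python
from itertools import combinations


def split_indices(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-equal chunks
    (per-SNP or per-individual statistics)."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def split_pairs(n_items, n_workers):
    """Deal all unordered pairs (i, j) with i < j round-robin to workers
    (pair-wise statistics)."""
    pairs = list(combinations(range(n_items), 2))
    return [pairs[w::n_workers] for w in range(n_workers)]


def ibs_for_pairs(genotypes, pairs):
    """Toy identity-by-state: mean proportion of shared alleles over SNPs.
    genotypes[i][s] is an allele count in {0, 1, 2}; hypothetical helper,
    not GenABEL's implementation."""
    out = {}
    for i, j in pairs:
        shared = [2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j])]
        out[(i, j)] = sum(shared) / (2 * len(shared))
    return out


# Toy data (assumed): 4 individuals x 3 SNPs.
genotypes = [[0, 1, 2], [0, 1, 2], [2, 1, 0], [1, 1, 1]]

# Per-SNP work: 10 SNP indices split across 3 workers.
snp_chunks = split_indices(10, 3)

# Pair-wise work: each chunk would go to one worker; here the chunks are
# processed serially and the partial results are merged into one dict.
pair_chunks = split_pairs(len(genotypes), 2)
ibs = {}
for chunk in pair_chunks:
    ibs.update(ibs_for_pairs(genotypes, chunk))
```

On a real cluster each chunk would be sent to a separate process, for example over MPI, rather than looped over serially; the merge step stays the same.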

Conclusions

Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2010

Publication date (Electronic): 29 April 2010

Volume: 11

Page: 217

Affiliations

[1] Center for Genomics and Bioinformatics Research, Faculty of Science, Prince of Songkla University, Songkhla, 90112, Thailand

[2] Medical Genetic Section, National Institute of Health, Department of Medical Sciences, Ministry of Public Health, Nonthaburi, 11000, Thailand

[3] Department of Pathology, Faculty of Medicine, Ramathibodhi Hospital, Mahidol University, Bangkok, 10400, Thailand

[4] Department of Computer Engineering, Faculty of Engineering, Prince of Songkla University, Songkhla, 90112, Thailand

[5] Department of Epidemiology, Erasmus MC Rotterdam, Postbus 2040, 3000 CA Rotterdam, the Netherlands

[6] Quantitative Integrative Genomics Group, Institute of Cytology & Genetics SD RAS, Novosibirsk 630090, Russia

Article

Publisher ID: 1471-2105-11-217

DOI: 10.1186/1471-2105-11-217

PMC ID: 2879286

PubMed ID: 20429914

SO-VID: 534ac3e7-ff61-4efb-89ad-70e4e93173b8

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
