      BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions

      research-article


          Abstract

          Big datasets, accumulated from biomedical and agronomic studies, provide the potential to identify genes that control complex human diseases and agriculturally important traits through genome-wide association studies (GWAS). However, big datasets also lead to extreme computational challenges, especially when sophisticated statistical models are employed to simultaneously reduce false positives and false negatives. The newly developed fixed and random model circulating probability unification (FarmCPU) method uses a bin method under the assumption that quantitative trait nucleotides (QTNs) are evenly distributed throughout the genome. The estimated QTNs are used to separate a mixed linear model into a computationally efficient fixed effect model (FEM) and a computationally expensive random effect model (REM), which are then used iteratively. To completely eliminate the computationally expensive REM, we replaced REM with FEM by using Bayesian information criteria. To eliminate the requirement that QTNs be evenly distributed throughout the genome, we replaced the bin method with linkage disequilibrium information. The new method is called Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK). Both real and simulated data analyses demonstrated that BLINK improves statistical power compared to FarmCPU, in addition to remarkably reducing computing time. Now, a dataset with one million individuals and one-half million markers can be analyzed within three hours, instead of one week using FarmCPU.
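The two substitutions described in the abstract (linkage disequilibrium in place of bins, and a fixed effect model scored by a Bayesian information criterion in place of the random effect model) can be illustrated with a toy sketch. This is not the BLINK package's code. The function names, the `r2_max=0.7` cutoff, the use of squared correlation with the phenotype as a stand-in for single-marker test ranking, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation, used here as a simple LD / association proxy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def ld_prune(genotypes, ranked, r2_max=0.7):
    """Walk markers from most to least significant; keep a marker only if it
    is not in high LD with any marker already kept (r2_max is an assumption)."""
    kept = []
    for m in ranked:
        if all(r_squared(genotypes[m], genotypes[k]) <= r2_max for k in kept):
            kept.append(m)
    return kept


def rss_ols(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`.
    Assumes the columns are not collinear (no monomorphic markers)."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[a][p] / A[a][a] for a in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def bic(rss, n, k):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


def select_pseudo_qtns(genotypes, y, candidates):
    """Fit fixed effect models on growing prefixes of the LD-pruned candidates
    and return the prefix with the lowest BIC."""
    n = len(y)
    best_k, best_bic = 0, bic(rss_ols([], y), n, 1)
    for k in range(1, len(candidates) + 1):
        cols = [genotypes[m] for m in candidates[:k]]
        score = bic(rss_ols(cols, y), n, k + 1)
        if score < best_bic:
            best_k, best_bic = k, score
    return candidates[:best_k]


def simulate(n=200, seed=0):
    """Hypothetical data: markers 0 and 2 are causal, marker 1 is a near copy
    of marker 0 (high LD), markers 3 to 5 are unrelated to the trait."""
    rng = random.Random(seed)
    g = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]
         for _ in range(5)]
    g.insert(1, [(v + 1) % 3 if i % 40 == 0 else v for i, v in enumerate(g[0])])
    y = [2.0 * g[0][i] + 1.5 * g[2][i] + rng.gauss(0, 1) for i in range(n)]
    return g, y


genotypes, y = simulate()
ranked = sorted(range(len(genotypes)), key=lambda j: -r_squared(genotypes[j], y))
kept = ld_prune(genotypes, ranked)
print("LD-pruned candidates:", kept)
print("BIC-selected pseudo-QTNs:", select_pseudo_qtns(genotypes, y, kept))
```

In this sketch, LD pruning prevents both marker 0 and its near copy from entering the model together, and the BIC penalty of ln(n) per added covariate discourages including markers that barely reduce the residual sum of squares. The published method iterates such steps and tests every marker with the selected pseudo-QTNs as covariates, which the sketch omits.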


                Author and article information

                Journal: GigaScience
                Publisher: Oxford University Press
                ISSN: 2047-217X
                Published online: 11 December 2018
                Issue date: February 2019
                Volume: 8
                Issue: 2
                Article ID: giy154
                Affiliations
                [1 ]Department of Crop and Soil Sciences, Washington State University, 1170 NE Stadium Way, Pullman, Washington, 99164-6420, USA
                [2 ]Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, 1 Shizishan Street, Wuhan, Hubei, 430070, China
                [3 ]School of Electrical Engineering and Computer Science, Washington State University, 355 NE Spokane Street, Pullman, Washington, 99164-2752, USA
                Author notes
                Correspondence address. Zhiwu Zhang, Johnson Hall 105, Department of Crop and Soil Sciences, Washington State University, 1170 NE Stadium Way, Pullman, Washington, 99164-6420, USA. E-mail: zhiwu.zhang@wsu.edu
                Author information
                http://orcid.org/0000-0001-7295-9788
                http://orcid.org/0000-0002-5784-9684
                Article
                DOI: 10.1093/gigascience/giy154
                PMCID: PMC6365300
                PMID: 30535326
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 24 January 2018
                Revised: 18 June 2018
                Accepted: 27 November 2018
                Page count
                Pages: 12
                Funding
                Funded by: Natural Resource Sciences at Washington State University 10.13039/100007593
                Award ID: 126593
                Funded by: National Science Foundation 10.13039/100006445
                Award ID: 1661348
                Funded by: National Institute of Food and Agriculture 10.13039/100005825
                Award ID: 2018-70005-28792
                Funded by: United States Department of Agriculture 10.13039/100000199
                Award ID: 2016-68004-24770
                Categories
                Technical Note

                Keywords: GWAS, big datasets, complex traits, FarmCPU
