      Is Open Access

      The UK Biobank resource with deep phenotyping and genomic data

      research-article


          Abstract

          The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

          Abstract

          This article presents deep phenotype and genome-wide genetic data from 500,000 UK Biobank participants, describes population structure and relatedness in the cohort, and reports imputation that increases the number of testable variants to around 96 million.

          Most cited references 10


          Fast and accurate long-range phasing in a UK Biobank cohort

          Recent work has leveraged the extensive genotyping of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here, we develop a fast and accurate LRP method, Eagle, that extends this paradigm to populations with much smaller proportions of genotyped samples by harnessing long (>4 cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N≈150,000 samples (0.2% of the British population) from the UK Biobank, and we determined that it is 1–2 orders of magnitude faster than existing methods while achieving similar or better phasing accuracy (switch error rate ≈0.3%, corresponding to perfect phase in a majority of 10 Mb segments). We also observed that when used within an imputation pipeline, Eagle pre-phasing improved downstream imputation accuracy compared to pre-phasing in batches using existing methods (as necessary to achieve comparable computational cost).

            Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia.

            Searching for genetic variants with unusual differentiation between subpopulations is an established approach for identifying signals of natural selection. However, existing methods generally require discrete subpopulations. We introduce a method that infers selection using principal components (PCs) by identifying variants whose differentiation along top PCs is significantly greater than the null distribution of genetic drift. To enable the application of this method to large datasets, we developed the FastPCA software, which employs recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using the PC-based test for natural selection, we replicate previously known selected loci and identify three new genome-wide significant signals of selection, including selection in Europeans at ADH1B. The coding variant rs1229984*T has previously been associated with a decreased risk of alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents. We also detect selection signals at IGFBP3 and IGH, which have also previously been associated with human disease.
              Is Open Access

              Systematic Evaluation of Pleiotropy Identifies 6 Further Loci Associated With Coronary Artery Disease

              Background: Genome-wide association studies have so far identified 56 loci associated with risk of coronary artery disease (CAD). Many CAD loci show pleiotropy; that is, they are also associated with other diseases or traits. Objectives: This study sought to systematically test if genetic variants identified for non-CAD diseases/traits also associate with CAD and to undertake a comprehensive analysis of the extent of pleiotropy of all CAD loci. Methods: In discovery analyses involving 42,335 CAD cases and 78,240 control subjects we tested the association of 29,383 common (minor allele frequency >5%) single nucleotide polymorphisms available on the exome array, which included a substantial proportion of known or suspected single nucleotide polymorphisms associated with common diseases or traits as of 2011. Suggestive association signals were replicated in an additional 30,533 cases and 42,530 control subjects. To evaluate pleiotropy, we tested CAD loci for association with cardiovascular risk factors (lipid traits, blood pressure phenotypes, body mass index, diabetes, and smoking behavior), as well as with other diseases/traits through interrogation of currently available genome-wide association study catalogs. Results: We identified 6 new loci associated with CAD at genome-wide significance: on 2q37 (KCNJ13-GIGYF2), 6p21 (C2), 11p15 (MRVI1-CTR9), 12q13 (LRP1), 12q24 (SCARB1), and 16q13 (CETP). Risk allele frequencies ranged from 0.15 to 0.86, and odds ratio per copy of the risk allele ranged from 1.04 to 1.09. Of 62 new and known CAD loci, 24 (38.7%) showed statistical association with a traditional cardiovascular risk factor, with some showing multiple associations, and 29 (47%) showed associations at p < 1 × 10⁻⁴ with a range of other diseases/traits. Conclusions: We identified 6 loci associated with CAD at genome-wide significance. Several CAD loci show substantial pleiotropy, which may help us understand the mechanisms by which these loci affect CAD risk.
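
              The FastPCA abstract above describes approximating the top PCs at a cost that is linear, rather than quadratic, in the number of individuals. A minimal sketch of one generic randomized-projection scheme with that property is given below in Python with NumPy. It is not the FastPCA software itself, and the function name `randomized_pca` and its parameters (`oversample`, `n_iter`, `seed`) are assumptions made for illustration.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=4, seed=0):
    """Approximate the top-k principal components of X (individuals x variants).

    Illustrative sketch only: a generic randomized range finder with power
    iterations, not the FastPCA implementation.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center each variant (column)
    n_variants = Xc.shape[1]

    # Project onto a small random subspace; every product with Xc costs
    # O(n_individuals * n_variants * (k + oversample)), linear in individuals.
    omega = rng.standard_normal((n_variants, k + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)

    # Power iterations sharpen the estimate of the dominant subspace.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)

    # Exact SVD of the small projected matrix, then map back.
    B = Q.T @ Xc
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    scores = U[:, :k] * s[:k]                # PC scores per individual
    return scores, s[:k], Vt[:k]
```

              Called as `randomized_pca(X, k=4)`, the sketch returns per-individual PC scores, the leading singular values, and the variant loadings. The full individuals-by-individuals covariance matrix is never formed, which is what avoids the quadratic cost.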

                Author and article information

                Contributors
                marchini@stats.ox.ac.uk
                Journal
                Nature
                Nature Publishing Group UK (London)
                ISSN: 0028-0836 (print), 1476-4687 (electronic)
                Published: 10 October 2018
                Volume 562, Issue 7726, Pages 203-209
                Affiliations
                [1] Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK (ISNI 0000 0004 1936 8948; GRID grid.4991.5)
                [2] Department of Statistics, University of Oxford, Oxford, UK (ISNI 0000 0004 1936 8948; GRID grid.4991.5)
                [3] Melbourne Integrative Genomics and the Schools of Mathematics and Statistics, and BioSciences, The University of Melbourne, Parkville, Victoria, Australia (ISNI 0000 0001 2179 088X; GRID grid.1008.9)
                [4] Murdoch Children’s Research Institute, Parkville, Victoria, Australia (ISNI 0000 0000 9442 535X; GRID grid.1058.c)
                [5] Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland (ISNI 0000 0001 2322 4988; GRID grid.8591.5)
                [6] Swiss Institute of Bioinformatics, University of Geneva, Geneva, Switzerland (ISNI 0000 0001 2322 4988; GRID grid.8591.5)
                [7] Institute of Genetics and Genomics in Geneva, University of Geneva, Geneva, Switzerland (ISNI 0000 0001 2322 4988; GRID grid.8591.5)
                [8] Illumina Ltd, Chesterford Research Park, Little Chesterford, Essex, UK (GRID grid.434747.7)
                [9] Nuffield Department of Clinical Neurosciences, Division of Clinical Neurology, John Radcliffe Hospital, University of Oxford, Oxford, UK
                [10] UK Biobank, Adswood, Stockport, Cheshire, UK (ISNI 0000 0004 0396 0496; GRID grid.421945.f)
                [11] Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK (ISNI 0000 0004 1936 8948; GRID grid.4991.5)
                [12] Present Address: Procter & Gamble, Brussels, Belgium (GRID grid.425582.c)
                Article
                Publisher ID: 579
                DOI: 10.1038/s41586-018-0579-z
                PMCID: PMC6786975
                PMID: 30305743
                © Springer Nature Limited 2018

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                Categories
                Article

                Keywords: genome-wide association studies, genome, population genetics, genotype, haplotypes
