      Is Open Access

      Association Mapping across Numerous Traits Reveals Patterns of Functional Variation in Maize

      research-article


          Abstract

          Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. Despite the vast amount of genomic data becoming available, many pressing questions remain about the nature of genetic mutations that underlie functional variation. We present the results of combining genome-wide association analysis of 41 different phenotypes in ∼5,000 inbred maize lines to analyze patterns of high-resolution genetic association among 28.9 million single-nucleotide polymorphisms (SNPs) and ∼800,000 copy-number variants (CNVs). We show that genic and intergenic regions have opposite patterns of enrichment, minor allele frequencies, and effect sizes, implying tradeoffs among the probability that a given polymorphism will have an effect, the detectable size of that effect, and its frequency in the population. We also find that genes tagged by GWAS are enriched for regulatory functions and are ∼50% more likely to have a paralog than expected by chance, indicating that gene regulation and gene duplication are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.

          Author Summary

          We performed genome-wide association mapping analysis in maize for 41 different phenotypes in order to identify which types of variants are more likely to be important for controlling traits. We took advantage of a large mapping population (roughly 5000 recombinant inbred lines) and nearly 30 million segregating variants to identify ∼4800 variants that were significantly associated with at least one phenotype. While these variants are enriched in genes, most of them occur outside of genes, often in regions where regulatory elements likely lie. We also found a significant enrichment for paralogous (duplicated) genes, implying that functional divergence after gene duplication plays an important role in trait variation. Overall these analyses provide important insight into the unifying patterns of variation in traits across maize, and the results will likely also apply to other organisms with similarly large, complex genomes.
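          The paralog enrichment described above can be illustrated with a small calculation. The sketch below is not the authors' method; it assumes a simple hypergeometric model in which GWAS-tagged genes are drawn without replacement from a background gene set. The function names and all counts are hypothetical and chosen only to reproduce a 1.5-fold (about 50%) enrichment.

```python
from math import comb


def fold_enrichment(tagged_with, tagged_total, background_with, background_total):
    """Ratio of the paralog fraction among tagged genes to the background fraction."""
    observed = tagged_with / tagged_total
    expected = background_with / background_total
    return observed / expected


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which have a paralog (one-sided hypergeometric test)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts, NOT taken from the article:
N, K = 1000, 300  # background genes, and how many of them have a paralog
n, k = 100, 45    # GWAS-tagged genes, and how many of them have a paralog

fold = fold_enrichment(k, n, K, N)      # 0.45 / 0.30, i.e. about 1.5-fold
p_value = hypergeom_upper_tail(k, n, K, N)
print(f"fold enrichment: {fold:.2f}, one-sided p-value: {p_value:.2e}")
```

          A permutation test that resamples gene sets matched on gene length or local SNP density would be a reasonable alternative when the background is not uniform; the hypergeometric form is used here only because it is compact.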


                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS Genetics
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 4 December 2014
                Volume 10, Issue 12, e1004845
                Affiliations
                [1 ]Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
                [2 ]United States Department of Agriculture-Agricultural Research Service, Ithaca, New York, United States of America
                [3 ]Max Planck Institute of Molecular Plant Physiology, Golm-Potsdam, Germany
                [4 ]INRA, UMR 1332, Univ. Bordeaux, Villenave d'Ornon, France
                [5 ]Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
                The Australian National University, Australia
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: JGW MS YG ESB. Performed the experiments: JGW NZ YG. Analyzed the data: JGW NZ YG. Contributed reagents/materials/analysis tools: PJB NZ YG MS. Wrote the paper: JGW PJB NZ YG MS ESB.

                [¤a]

                Current address: BASF Plant Science, Research Triangle Park, North Carolina, United States of America

                [¤b]

                Current address: INRA, UMR 1332, Univ. Bordeaux, Villenave d'Ornon, France

                Article
                Manuscript ID: PGENETICS-D-14-01996
                DOI: 10.1371/journal.pgen.1004845
                PMCID: PMC4256217
                PMID: 25474422
                14cdcf00-b821-4c95-b701-aab27c7714a8
                Copyright © 2014

                This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 22 July 2014
                Accepted: 23 October 2014
                Page count
                Pages: 10
                Funding
                This study was supported by National Science Foundation (www.nsf.gov) grants DBI-0820619 (JGW), DBI-0501700 (NZ), and IOS-1238014 (JGW), the United States Department of Agriculture - Agricultural Research Service (www.ars.usda.gov) (PJB, ESB), and the Max Planck Society (www.mpg.de) (YG, MS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences
                Agriculture
                Crop Science
                Crops
                Cereal Crops
                Maize
                Computational Biology
                Genome Analysis
                Genome-Wide Association Studies
                Family-Based Association Studies
                Quantitative Trait Association Studies
                Statistical Analysis of Genetic Association
                Genetics
                Plant Genetics
                Crop Genetics
                Genomics
                Plant Science
                Custom metadata
                The authors confirm that all data underlying the findings are fully available without restriction. All relevant data is either available from the cited publications or included in the supplemental files.
