
      Recent advances in conservation and population genomics data analysis



          New computational methods and next‐generation sequencing (NGS) approaches have enabled the use of thousands or hundreds of thousands of genetic markers to address previously intractable questions. The methods and massive marker sets present both new data analysis challenges and opportunities to visualize, understand, and apply population and conservation genomic data in novel ways. The large scale and complexity of NGS data also increase the expertise and effort required to thoroughly and thoughtfully analyze and interpret data. To aid in this endeavor, a recent workshop entitled “Population Genomic Data Analysis,” also known as “ConGen 2017,” was held at the University of Montana. The ConGen workshop brought together 15 instructors with knowledge in a wide range of topics including NGS data filtering, genome assembly, genomic monitoring of effective population size, migration modeling, detecting adaptive genomic variation, genomewide association analysis, inbreeding depression, and landscape genomics. Here, we summarize the major themes of the workshop and the important take‐home points that were offered to students throughout. We emphasize increasing participation by women in population and conservation genomics as a vital step for the advancement of science. Important themes that emerged during the workshop included the need for data visualization and its importance in finding problematic data, the effects of data filtering choices on downstream population genomic analyses, and the increasing availability of whole‐genome sequencing and the new challenges it presents. Our goal here is to help motivate and educate a worldwide audience to improve population genomic data analysis and interpretation, and thereby advance the contribution of genomics to molecular ecology, evolutionary biology, and especially to the conservation of biodiversity.

          Most cited references 136

          Inference of population structure using multilocus genotype data.

          We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci (e.g., seven microsatellite loci in an example using genotype data from an endangered bird species). The software used for this article is available from approximately pritch/home. html.
            Fast model-based estimation of ancestry in unrelated individuals.

            Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
              GCTA: a tool for genome-wide complex trait analysis.

              For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.

                Author and article information

                Evol Appl
                Evolutionary Applications
                John Wiley and Sons Inc. (Hoboken)
                20 August 2018
                September 2018
                Volume: 11
                Issue: 8 (doiID: 10.1111/eva.2018.11.issue-8)
                Pages: 1197-1211
                [ 1 ] Institute for Bioinformatics and Evolutionary Studies University of Idaho Moscow Idaho
                [ 2 ] Fisheries Ecology Division Southwest Fisheries Science Center National Marine Fisheries Service National Oceanic and Atmospheric Administration Santa Cruz California
                [ 3 ] University of California Santa Cruz California
                [ 4 ] Division of Biological Sciences University of Montana Missoula Montana
                [ 5 ] Département de Biologie Institut de Biologie Intégrative et des Systèmes (IBIS) Université Laval Québec Québec Canada
                [ 6 ] Department of Biology Colorado State University Fort Collins Colorado
                [ 7 ] Flathead Lake Biological Station Montana Conservation Genomics Laboratory Division of Biological Science University of Montana Missoula Montana
                [ 8 ] Wildlife Program Fish and Wildlife Genomics Group College of Forestry and Conservation University of Montana Missoula Montana
                [ 9 ] Department of Biology Centre for Biomedical Research University of Victoria Victoria British Columbia Canada
                [ 10 ] Department of Biological Sciences California State University San Marcos San Marcos California
                [ 11 ] NOAA Fisheries Northwest Fisheries Science Center Seattle Washington
                Author notes
                [* ] Correspondence

                Sarah Hendricks, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID 83844.

                Email: shendri4@

                © 2018 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd

                This is an open access article under the terms of the License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

                Page count
                Figures: 2, Tables: 0, Pages: 12, Words: 12729
                Funded by: Bioinformatics and Computational Biology Program, University of Idaho
                Funded by: American Genetic Association (AGA)
                Funded by: the US Geological Survey (USGS)
                Funded by: NSF grant
                Award ID: DoB‐1639014
                Award ID: DEB‐1655809
                Funded by: NASA grant
                Award ID: NNX14AB84G
                Funded by: NSF
                Award ID: DEB‐1258203
                Meeting Report

