
      OptM: estimating the optimal number of migration edges on population trees using Treemix

      Research article
      Biology Methods & Protocols, Oxford University Press
      Keywords: likelihood, population genomics, SNPs, structure


          Abstract

          The software Treemix has become widely used to estimate the number of migration events, or edges (m), on population trees from genome-wide allele frequency data. However, the appropriate number of edges to include remains unclear. Here, I show that an optimal value of m can be inferred from the second-order rate of change in likelihood (Δm) across incremental values of m. Using simulated populations, I show that Δm, a statistic repurposed from its original use in estimating the number of population clusters (ΔK) in the software Structure, performs as well as current recommendations for Treemix. A demonstration using an empirical dataset from domestic dogs indicates that this method may be preferable for large, complex population histories and can prioritize migration events for subsequent investigation. The method has been implemented in a freely available R package called "OptM" and as a web application (https://rfitak.shinyapps.io/OptM/) that interfaces directly with the output files of Treemix.
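          The abstract describes Δm only in words. The sketch below is a minimal Python illustration, assuming Δm mirrors the ΔK statistic from Structure: the absolute second-order difference in mean log-likelihood across consecutive values of m, |L(m+1) - 2L(m) + L(m-1)|, divided by the standard deviation of the log-likelihood at m over replicate Treemix runs. The input layout (a dict named llik_by_m) and the function names delta_m and optimal_m are hypothetical; this is not the OptM package's code and it does not parse Treemix output files.

```python
from statistics import mean, stdev


def delta_m(llik_by_m):
    """Evanno-style second-order rate of change in likelihood (a sketch).

    Assumption: llik_by_m maps each number of migration edges m
    (consecutive integers) to a list of log-likelihoods from replicate
    Treemix runs, with at least two replicates per m so stdev is defined.
    Returns {m: delta} for interior m only (the smallest and largest m
    have no second difference). delta is nan when the replicates at m
    have zero spread.
    """
    ms = sorted(llik_by_m)
    if ms != list(range(ms[0], ms[-1] + 1)):
        raise ValueError("values of m must be consecutive integers")
    means = {m: mean(llik_by_m[m]) for m in ms}
    result = {}
    for m in ms[1:-1]:
        # |L(m+1) - 2L(m) + L(m-1)| computed on mean log-likelihoods
        second_diff = abs(means[m + 1] - 2 * means[m] + means[m - 1])
        sd = stdev(llik_by_m[m])
        result[m] = second_diff / sd if sd > 0 else float("nan")
    return result


def optimal_m(llik_by_m):
    """Return the m with the largest finite delta (hypothetical helper)."""
    deltas = delta_m(llik_by_m)
    finite = {m: d for m, d in deltas.items() if d == d}  # drop nan values
    return max(finite, key=finite.get)


# Made-up log-likelihoods for two replicate runs at m = 0..4
runs = {
    0: [-100.0, -101.0],
    1: [-50.0, -51.0],
    2: [-40.0, -41.0],
    3: [-38.0, -39.0],
    4: [-37.0, -38.0],
}
print(delta_m(runs))
print(optimal_m(runs))
```

          With these invented numbers the likelihood climbs steeply from m = 0 to m = 1 and then flattens, so the largest second-order change falls at m = 1. Dividing by the replicate standard deviation, as in ΔK, down-weights values of m whose likelihoods are unstable across runs.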


                Author and article information

                Journal: Biology Methods & Protocols (Biol Methods Protoc), Oxford University Press
                ISSN: 2396-8923
                Published: 16 September 2021
                Volume 6, Issue 1, Article bpab017
                Affiliation: Department of Biology, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
                Correspondence address: Department of Biology, Genomics and Bioinformatics Cluster, University of Central Florida, 4110 Libra Dr., Orlando, FL 32816, USA. E-mail: Robert.fitak@ucf.edu
                Author ORCID: https://orcid.org/0000-0002-7398-6259
                Article ID: bpab017
                DOI: 10.1093/biomethods/bpab017
                Other identifiers: 8476930; 34595352; 517c061b-3c20-4aee-a887-499062cd2d6a
                © The Author(s) 2021. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History: 20 August 2021; 09 September 2021; 10 September 2021; 13 September 2021; 23 September 2021
                Page count: 6
                Categories: Innovations; AcademicSubjects/SCI00960
