      Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data

      research-article
      PLoS ONE
      Public Library of Science

          Abstract

          Analysis of population genetic data often includes a search for genomic regions with signs of recent positive selection. One such approach involves the concept of extended haplotype homozygosity (EHH) and its associated statistics. These statistics typically require phased haplotypes, and some of them also require polarized variants. Here, we unify and extend previously proposed modifications to loosen these requirements. We compare the modified versions with the original ones by measuring the false discovery rate in simulated whole-genome scans and by quantifying the overlap of inferred candidate regions in empirical data. We find that phasing information is indispensable for accurate estimation of within-population statistics (for all but very large samples) and of cross-population statistics for small samples. Ancestry information, in contrast, is of lesser importance for both types of statistic. Our publicly available R package rehh incorporates the modified statistics presented here.
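
          For readers unfamiliar with the underlying quantity, the following is a minimal sketch of the classic EHH definition for phased haplotypes: among all chromosomes carrying a given core allele at a focal marker, EHH is the fraction of chromosome pairs that are identical on every marker between the focal marker and a second marker. This is not the modified (unphased or unpolarized) statistic described in the abstract, and it is not the rehh API. The function name `ehh`, its parameters, the toy data, and the convention of returning 0.0 when fewer than two chromosomes carry the core allele are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, focal, core_allele, upto):
    """Classic EHH of `core_allele` at marker index `focal`,
    evaluated out to marker index `upto` (either side of `focal`).

    `haplotypes` is a list of phased haplotypes, each a sequence of
    alleles (hypothetical input format chosen for this sketch).
    """
    lo, hi = sorted((focal, upto))
    # Keep only chromosomes carrying the core allele, restricted to
    # the segment between the focal marker and the target marker.
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes
                if h[focal] == core_allele]
    n = len(carriers)
    if n < 2:
        # No pairs can be formed; returning 0.0 is an assumed convention.
        return 0.0
    # Count pairs of chromosomes that share an identical segment.
    counts = Counter(carriers)
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)


# Toy data: four phased haplotypes over four biallelic markers.
haps = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

# EHH of allele 1 at marker 1, moving away from the focal marker.
for marker in (1, 2, 3):
    print(marker, ehh(haps, focal=1, core_allele=1, upto=marker))
```

          In this toy example EHH equals 1 at the focal marker and decays to 1/3 and then 0 as the segment is extended, which is the decay pattern that EHH-based scans summarize. Note that the computation needs to know which alleles sit on the same chromosome, which is why phasing matters for the original statistics.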

                Author and article information

                Contributors
                Roles: Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing
                Role: Writing – review & editing
                Role: Editor
                Journal
                PLoS One
                Public Library of Science (San Francisco, CA USA )
                ISSN: 1932-6203
                Published: 18 January 2022
                Volume: 17
                Issue: 1
                Article number: e0262024
                Affiliations
                [1 ] Institute for Genetics, University of Cologne, Cologne, Germany
                [2 ] CBGP, Univ Montpellier, CIRAD, INRAE, IRD, Institut Agro, Montpellier, France
                Government College University Faisalabad, PAKISTAN
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                [¤]

                Current address: Mathieu Gautier, UMR CBGP, CS30016, Montferrier sur lez Cedex, France

                Author information
                https://orcid.org/0000-0002-0175-1397
                https://orcid.org/0000-0001-7257-5880
                Article
                Manuscript ID: PONE-D-21-17689
                DOI: 10.1371/journal.pone.0262024
                PMCID: PMC8765611
                PMID: 35041674
                © 2022 Klassmann, Gautier

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 28 May 2021
                Accepted: 15 December 2021
                Page count
                Figures: 9, Tables: 3, Pages: 22
                Funding
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Genetics > Heredity > Genetic Mapping > Haplotypes
                Biology and Life Sciences > Genetics > Heredity > Homozygosity
                Biology and Life Sciences > Computational Biology > Genomics Statistics
                Biology and Life Sciences > Genetics > Genomics > Genomics Statistics
                Biology and Life Sciences > Genetics > Genomics
                Biology and Life Sciences > Genetics > Genetic Loci > Alleles
                Biology and Life Sciences > Genetics > Single Nucleotide Polymorphisms
                Research and Analysis Methods > Simulation and Modeling
                Biology and Life Sciences > Genetics > Genomics > Animal Genomics > Mammalian Genomics
                Custom metadata
                All relevant data are within the paper and its Supporting information files.
