Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better?


      Abstract

      Restriction site-associated DNA sequencing (RAD-seq) has become a powerful and widely used tool in molecular ecology studies because it allows the cost-effective recovery of thousands of polymorphic sites across individuals of non-model organisms. However, its successful implementation in population genetics relies on correct data processing that minimizes potential loci-assembly biases and the resulting genotyping error rates. When no reference genome is available, RAD-seq data processing involves the assembly of hundreds of thousands of high-throughput sequencing reads into orthologous loci, for which the researcher must select values for various key parameters. Previous studies exploring the effect of these parameter values found or assumed that a larger number of recovered polymorphic loci is associated with a better assembly. Here, using three RAD-seq datasets from different species, we explore the effect of read filtering, loci assembly and polymorphic site selection on the number of markers obtained and the genetic differentiation inferred using the Stacks software. We find (i) that recovery of higher numbers of polymorphic loci is not necessarily associated with higher genetic differentiation, (ii) that the presence of PCR duplicates, the selected loci assembly parameters and the selected SNP filtering parameters affect the number of recovered polymorphic loci and the degree of genetic differentiation, and (iii) that this effect is different in each dataset, meaning that defining a systematic universal protocol for RAD-seq data analysis may lead to missing relevant information about population differentiation.


            Author and article information

            Affiliations
            Marine Research Division, AZTI, Sukarrieta, Spain
            Author notes

            Edited by: Miguel Arenas, University of Vigo, Spain

            Reviewed by: Yukio Nagano, Saga University, Japan; Debabrata Sarkar, Indian Council of Agricultural Research, India; Manuel Vera, University of Santiago de Compostela, Spain; Xun Gong, Kunming Institute of Botany (CAS), China

            *Correspondence: Natalia Díaz-Arce, ndiaz@azti.es

            This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

            Journal: Frontiers in Genetics (Front. Genet.)
            Publisher: Frontiers Media S.A.
            ISSN: 1664-8021
            Published: 29 May 2019
            Volume: 10
            PMCID: 6549478
            DOI: 10.3389/fgene.2019.00533
            Copyright © 2019 Díaz-Arce and Rodríguez-Ezpeleta.

            This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

            Counts
            Figures: 5, Tables: 1, Equations: 0, References: 36, Pages: 10
            Categories
            Genetics
            Original Research
