      Supervised Machine Learning for Population Genetics: A New Paradigm

      D.R. Schrider [1], A.D. Kern [1]
      Trends in genetics : TIG


          Abstract

          As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.


                Author and article information

                Journal
                8507085
                7839
                Trends in genetics : TIG (Trends Genet.)
                ISSN: 0168-9525
                6 April 2018
                10 January 2018
                April 2018
                18 April 2018
                Volume: 34
                Issue: 4
                Pages: 301-312
                Affiliations
                [1] Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08554, USA
                Author notes
                [*] Correspondence: dan.schrider@rutgers.edu (D.R. Schrider) and kern@biology.rutgers.edu (A.D. Kern)
                Article
                NIHMSID: NIHMS957852
                DOI: 10.1016/j.tig.2017.12.005
                PMCID: PMC5905713
                PMID: 29331490

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
