      ARIADNA: machine learning method for ancient DNA variant discovery

      research-article


          Abstract

          Ancient DNA (aDNA) studies often rely on standard methods of mutation calling, which are optimized for high-quality contemporary DNA but not for the excessive contamination and time- or environment-related damage typical of aDNA. In the absence of validated datasets, and despite showing extreme sensitivity to aDNA quality, these methods have been used in many published studies, sometimes with the addition of arbitrary filters or modifications designed to overcome aDNA degradation and contamination problems. The general lack of best practices for aDNA mutation calling may lead to inaccurate results. To address these problems, we present ARIADNA (ARtificial Intelligence for Ancient DNA), a novel approach based on machine learning techniques that uses specific aDNA characteristics as features to yield improved mutation calls. In our comparisons of variant callers across several ancient genomes, ARIADNA consistently detected higher-quality genome variants with fast runtimes, while reducing the false positive rate compared with other approaches.


                Author and article information

                Contributors
                Role: Editor
                Journal
                DNA Res
                DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
                Oxford University Press
                1340-2838
                1756-1663
                December 2018
                11 September 2018
                Volume: 25
                Issue: 6
                Pages: 619-627
                Affiliations
                Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
                Author notes
                To whom correspondence should be addressed. Tel: +1 856 225 2960. Fax: +1 856 225 6312. Email: andrey.grigoriev@rutgers.edu
                Present address: Department of Biology, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA.
                Article
                Publisher ID: dsy029
                DOI: 10.1093/dnares/dsy029
                PMCID: PMC6289774
                PMID: 30215675
                © The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                History
                11 February 2018
                15 August 2018
                Page count
                Pages: 9
                Funding
                Funded by: National Science Foundation 10.13039/100000001
                Award ID: DBI-1458202
                Funded by: National Institutes of Health 10.13039/100000002
                Award ID: R15CA220059
                Funded by: New Jersey Health Foundation 10.13039/100001774
                Categories
                Full Papers

                Genetics
                ancient DNA, genome variants, machine learning
