Information theoretic alignment free variant calling


      Abstract

      While traditional methods for calling variants across whole genome sequence data rely on alignment to an appropriate reference sequence, alternative techniques are needed when a suitable reference does not exist. We present a novel alignment- and assembly-free variant calling method, based on information-theoretic principles, designed to detect variants that have strong statistical evidence for their ability to segregate samples in a given dataset. Our method uses the context surrounding a particular nucleotide to define variants. Given a set of reads, we model the probability of observing a given nucleotide, conditioned on the surrounding prefix and suffix of length k, as a multinomial distribution. We then estimate which of these contexts are stable intra-sample and varying inter-sample using a statistic based on the Kullback–Leibler divergence. The utility of the variant calling method was evaluated through analysis of a pair of bacterial datasets and a mouse dataset. We found that our variants are highly informative for supervised learning tasks, with performance similar to standard reference-based calls and to another reference-free method (DiscoSNP++). Comparisons against reference-based calls showed that our method was able to capture very similar population structure on the bacterial dataset. The algorithm's focus on discriminatory variants makes it suitable for many common analysis tasks for organisms that are too diverse to be mapped back to a single reference sequence.
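The context model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pseudocount smoothing, the `min_count` filter, and the use of a plain two-sample Kullback–Leibler divergence as the score are all assumptions made for illustration. The abstract states only that the statistic is "based on" the Kullback–Leibler divergence, and this sketch does not test intra-sample stability.

```python
from collections import Counter, defaultdict
from math import log

ALPHABET = "ACGT"


def context_counts(reads, k):
    """Count the central nucleotide for each (prefix, suffix) context of length k."""
    counts = defaultdict(Counter)
    for read in reads:
        for i in range(k, len(read) - k):
            prefix = read[i - k:i]
            base = read[i]
            suffix = read[i + 1:i + 1 + k]
            # Skip windows containing ambiguous bases such as N.
            if all(c in ALPHABET for c in prefix + base + suffix):
                counts[(prefix, suffix)][base] += 1
    return counts


def multinomial_estimate(counter, pseudocount=1.0):
    """Smoothed multinomial estimate of P(base | context); smoothing is an assumption."""
    total = sum(counter[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counter[b] + pseudocount) / total for b in ALPHABET}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) over the nucleotide alphabet."""
    return sum(p[b] * log(p[b] / q[b]) for b in ALPHABET)


def rank_contexts(reads_a, reads_b, k, min_count=2):
    """Rank contexts shared by two samples by how much their base distributions differ."""
    counts_a = context_counts(reads_a, k)
    counts_b = context_counts(reads_b, k)
    scores = {}
    for ctx in counts_a.keys() & counts_b.keys():
        # Hypothetical coverage filter: ignore contexts seen too rarely in either sample.
        if sum(counts_a[ctx].values()) < min_count:
            continue
        if sum(counts_b[ctx].values()) < min_count:
            continue
        p = multinomial_estimate(counts_a[ctx])
        q = multinomial_estimate(counts_b[ctx])
        scores[ctx] = kl_divergence(p, q)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_contexts(sample_a_reads, sample_b_reads, k)` on two lists of read strings returns contexts ordered from most to least divergent. A context whose central base differs consistently between the two samples scores high, while a context with the same base distribution in both samples scores zero.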


            Author and article information

            Affiliations
            [1 ]IBM Research—Australia , Carlton, VIC, Australia
            [2 ]Department of Computing and Information Systems, The University of Melbourne , Parkville, VIC, Australia
            [3 ]Centre For Epidemiology and Biostatistics, The University of Melbourne , Parkville, VIC, Australia
            [4 ]School of Mathematics and Statistics, The University of Melbourne , Parkville, VIC, Australia
            Journal
            PeerJ Computer Science (PeerJ Comput. Sci.)
            PeerJ Inc. (San Francisco, USA)
            ISSN: 2376-5992
            Published: 25 July 2016
            Volume: 2
            Article ID: cs-71
            DOI: 10.7717/peerj-cs.71
            ©2016 Bedo et al.

            This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

            Product
            Self URI (journal-page): https://peerj.com/computer-science/
            Funding
            The authors received no funding for this work.
            Categories
            Bioinformatics
            Computational Biology

            Computer science

            Bacteria, Feature extraction, Assembly free, Genome, Alignment free, Variant
