Blog
About

  • Record: found
  • Abstract: found
  • Article: found
Is Open Access

Beyond Homozygosity Mapping: Family-Control analysis based on Hamming distance for prioritizing variants in exome sequencing

Read this article at

Bookmark
      There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

      Abstract

      A major challenge in current exome sequencing in autosomal recessive (AR) families is the lack of an effective method to prioritize single-nucleotide variants (SNVs). AR families are generally too small for linkage analysis, and length of homozygous regions is unreliable for identification of causative variants. Various common filtering steps usually result in a list of candidate variants that cannot be narrowed down further or ranked. To prioritize shortlisted SNVs we consider each homozygous candidate variant together with a set of SNVs flanking it. We compare the resulting array of genotypes between an affected family member and a number of control individuals and argue that, in a family, differences between family member and controls should be larger for a pathogenic variant and SNVs flanking it than for a random variant. We assess differences between arrays in two individuals by the Hamming distance and develop a suitable test statistic, which is expected to be large for a causative variant and flanking SNVs. We prioritize candidate variants based on this statistic and applied our approach to six patients with known pathogenic variants and found these to be in the top 2 to 10 percentiles of ranks.

      Related collections

      Most cited references 17

      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      The Sequence Alignment/Map format and SAMtools

      Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: rd@sanger.ac.uk
        Bookmark
        • Record: found
        • Abstract: found
        • Article: not found

        The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

        Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
          Bookmark
          • Record: found
          • Abstract: found
          • Article: not found
          Is Open Access

          An integrated map of genetic variation from 1,092 human genomes

          Summary Through characterising the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help understand the genetic contribution to disease. We describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methodologies to integrate information across multiple algorithms and diverse data sources we provide a validated haplotype map of 38 million SNPs, 1.4 million indels and over 14 thousand larger deletions. We show that individuals from different populations carry different profiles of rare and common variants and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways and that each individual harbours hundreds of rare non-coding variants at conserved sites, such as transcription-factor-motif disrupting changes. This resource, which captures up to 98% of accessible SNPs at a frequency of 1% in populations of medical genetics focus, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
            Bookmark

            Author and article information

            Affiliations
            [1 ]Department of Cardiovascular Medicine, Osaka University Graduate School of Medicine , Osaka, Japan
            [2 ]Department of Genome Informatics, Osaka University Graduate School of Medicine , Osaka, Japan
            [3 ]McGill University and Genome Québec Innovation Centre , Montréal, QC H3A 1B1, Canada
            [4 ]Department of Medical Biochemistry, Osaka University Graduate School of Medicine , Osaka, Japan
            [5 ]Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; and Rockefeller University , New York, NY 10065, USA
            Author notes
            Journal
            Sci Rep
            Sci Rep
            Scientific Reports
            Nature Publishing Group
            2045-2322
            06 July 2015
            2015
            : 5
            26143870 5155624 srep12028 10.1038/srep12028
            Copyright © 2015, Macmillan Publishers Limited

            This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

            Categories
            Article

            Uncategorized

            Comments

            Comment on this article