27
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Evaluation of three read-depth based CNV detection tools using whole-exome sequencing data

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          Whole exome sequencing (WES) has been widely accepted as a robust and cost-effective approach for clinical genetic testing of small sequence variants. Detection of copy number variants (CNV) within WES data have become possible through the development of various algorithms and software programs that utilize read-depth as the main information. The aim of this study was to evaluate three commonly used, WES read-depth based CNV detection programs using high-resolution chromosomal microarray analysis (CMA) as a standard.

          Methods

          Paired CMA and WES data were acquired for 45 samples. A total of 219 CNVs (size ranged from 2.3 kb – 35 mb) identified on three CMA platforms (Affymetrix, Agilent and Illumina) were used as standards. CNVs were called from WES data using XHMM, CoNIFER, and CNVnator with modified settings.

          Results

          All three software packages detected an elevated proportion of small variants (< 20 kb) compared to CMA. XHMM and CoNIFER had poor detection sensitivity (22.2 and 14.6%), which correlated with the number of capturing probes involved. CNVnator detected most variants and had better sensitivity (87.7%); however, suffered from an overwhelming detection of small CNVs below 20 kb, which required further confirmation. Size estimation of variants was exaggerated by CNVnator and understated by XHMM and CoNIFER.

          Conclusion

          Low concordances of CNV, detected by three different read-depth based programs, indicate the immature status of WES-based CNV detection. Low sensitivity and uncertain specificity of WES-based CNV detection in comparison with CMA based CNV detection suggests that CMA will continue to play an important role in detecting clinical grade CNV in the NGS era, which is largely based on WES.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13039-017-0333-5) contains supplementary material, which is available to authorized users.

          Related collections

          Most cited references23

          • Record: found
          • Abstract: found
          • Article: not found

          Structural variation in the human genome.

          The first wave of information from the analysis of the human genome revealed SNPs to be the main source of genetic and phenotypic human variation. However, the advent of genome-scanning technologies has now uncovered an unexpectedly large extent of what we term 'structural variation' in the human genome. This comprises microscopic and, more commonly, submicroscopic variants, which include deletions, duplications and large-scale copy-number variants - collectively termed copy-number variants or copy-number polymorphisms - as well as insertions, inversions and translocations. Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth.

            Sequencing of gene-coding regions (the exome) is increasingly used for studying human disease, for which copy-number variants (CNVs) are a critical genetic component. However, detecting copy number from exome sequencing is challenging because of the noncontiguous nature of the captured exons. This is compounded by the complex relationship between read depth and copy number; this results from biases in targeted genomic hybridization, sequence factors such as GC content, and batching of samples during collection and sequencing. We present a statistical tool (exome hidden Markov model [XHMM]) that uses principal-component analysis (PCA) to normalize exome read depth and a hidden Markov model (HMM) to discover exon-resolution CNV and genotype variation across samples. We evaluate performance on 90 schizophrenia trios and 1,017 case-control samples. XHMM detects a median of two rare (<1%) CNVs per individual (one deletion and one duplication) and has 79% sensitivity to similarly rare CNVs overlapping three or more exons discovered with microarrays. With sensitivity similar to state-of-the-art methods, XHMM achieves higher specificity by assigning quality metrics to the CNV calls to filter out bad ones, as well as to statistically genotype the discovered CNV in all individuals, yielding a trio call set with Mendelian-inheritance properties highly consistent with expectation. We also show that XHMM breakpoint quality scores enable researchers to explicitly search for novel classes of structural variation. For example, we apply XHMM to extract those CNVs that are highly likely to disrupt (delete or duplicate) only a portion of a gene. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants.

              We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs.
                Bookmark

                Author and article information

                Contributors
                yaoruen@126.com
                evonne16@163.com
                ytt.007@163.com
                liniu0509@163.com
                hu_xuyun@126.com
                wangxiumin1019@126.com
                labwangjian@shsmu.edu.cn
                yiping.shen@childrens.harvard.edu
                Journal
                Mol Cytogenet
                Mol Cytogenet
                Molecular Cytogenetics
                BioMed Central (London )
                1755-8166
                23 August 2017
                23 August 2017
                2017
                : 10
                : 30
                Affiliations
                [1 ]GRID grid.415869.7, Department of Medical Genetics and Molecular Diagnostic Laboratory, Shanghai Children’s Medical Center, , Shanghai Jiaotong University School of Medicine, ; Shanghai, 200127 China
                [2 ]ISNI 0000 0004 0378 8438, GRID grid.2515.3, , Boston Children’s Hospital, ; Boston, MA 02115 USA
                [3 ]GRID grid.415869.7, Department of Endocrinology and Metabolism, Shanghai Children’s Medical Center, , Shanghai Jiaotong University School of Medicine, ; Shanghai, 200127 China
                Article
                333
                10.1186/s13039-017-0333-5
                5569469
                28852425
                5cd3aab8-10c9-449a-a142-ee3d22f781be
                © The Author(s). 2017

                Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 26 April 2017
                : 15 August 2017
                Categories
                Research
                Custom metadata
                © The Author(s) 2017

                Genetics
                clinical sequencing,copy number variants,whole exome sequencing,structural variation
                Genetics
                clinical sequencing, copy number variants, whole exome sequencing, structural variation

                Comments

                Comment on this article