      Is Open Access

      In silico prediction of splice-altering single nucleotide variants in the human genome

      research-article
      Nucleic Acids Research
      Oxford University Press


          Abstract

          In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
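The tool comparison described in the abstract relies on receiver operating characteristic (ROC) analysis. The following is a minimal sketch of how the area under the ROC curve (AUC) can be computed for two scoring tools, using the rank-sum (Mann-Whitney) identity. It is not the authors' code: the function name `roc_auc`, the tool names `tool_a` and `tool_b`, and all labels and scores are invented for illustration and do not come from the paper's 2959 scSNVs.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = splice-altering, 0 = splice-neutral (hypothetical coding).
    scores: higher means "more likely splice-altering".
    AUC equals the probability that a randomly chosen positive outscores
    a randomly chosen negative; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the paper).
labels = [1, 1, 1, 0, 0, 0]
tool_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # hypothetical tool scores
tool_b = [0.6, 0.3, 0.7, 0.5, 0.4, 0.2]  # hypothetical tool scores

for name, scores in (("tool_a", tool_a), ("tool_b", tool_b)):
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

A tool with a higher AUC ranks splice-altering variants above neutral ones more consistently. The pairwise loop is quadratic in the number of variants, which is acceptable for a sketch but would normally be replaced by a sort-based computation on larger sets.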


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN (print): 0305-1048
                ISSN (electronic): 1362-4962
                16 December 2014
                21 November 2014
                21 November 2014
                Volume: 42
                Issue: 22
                Pages: 13534-13544
                Affiliations
                [1] Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
                [2] Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
                [3] Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
                [4] Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
                Author notes
                [*] To whom correspondence should be addressed. Tel: +1 713 500 9820; Fax: +1 713 500 0900; Email: Xiaoming.Liu@uth.tmc.edu
                Article
                DOI: 10.1093/nar/gku1206
                PMCID: PMC4267638
                PMID: 25416802
                © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Accepted: 4 November 2014
                Revised: 12 October 2014
                Received: 27 August 2014
                Page count
                Pages: 11
                Categories
                22
                24
                Data Resources and Analyses
                Custom metadata
                16 December 2014

                Genetics
