In silico prediction of splice-altering single nucleotide variants in the human genome

Abstract

In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
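The receiver operating characteristic comparison described above can be illustrated with a minimal sketch. It assumes each tool emits one numeric score per variant, with higher scores meaning "more likely splice-altering", and computes the area under the ROC curve through the rank-sum identity (the probability that a randomly chosen splice-altering variant outscores a randomly chosen neutral one). The tool names `tool_a` and `tool_b`, the labels, and the scores are invented for illustration and are not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for splice-altering variants, 0 for neutral ones.
    scores: higher means "more likely splice-altering".
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
labels = [1, 1, 1, 0, 0, 0]
tool_scores = {
    "tool_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],  # hypothetical tool
    "tool_b": [0.6, 0.3, 0.7, 0.5, 0.4, 0.8],  # hypothetical tool
}

# One AUC per tool; the tool with the largest AUC ranks variants best.
aucs = {name: roc_auc(labels, s) for name, s in tool_scores.items()}
best_tool = max(aucs, key=aucs.get)
```

The pairwise loop is quadratic in the number of variants, which is acceptable for a few thousand scSNVs; a sort-based rank computation would be the usual choice for larger sets.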

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (hwp): nar

Journal ID (publisher-id): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 16 December 2014

Publication date (Electronic): 21 November 2014

Publication date PMC-release: 21 November 2014

Volume: 42

Issue: 22

Pages: 13534-13544

Affiliations

[1] Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA

[2] Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA

[3] Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA

[4] Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA

Author notes

[*] To whom correspondence should be addressed. Tel: +1 713 500 9820; Fax: +1 713 500 0900; Email: Xiaoming.Liu@uth.tmc.edu

Article

DOI: 10.1093/nar/gku1206

PMC ID: 4267638

PubMed ID: 25416802

SO-VID: e1e16164-8bca-4b8b-b30f-56128ac2666b

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 4 November 2014

Date revision received: 12 October 2014

Date received: 27 August 2014

Page count

Pages: 11

Custom metadata

cover-date 16 December 2014

ScienceOpen disciplines: Genetics
