High-throughput methods such as EST sequencing, microarrays and deep sequencing have identified large numbers of alternative splicing (AS) events, but studies have shown that only a subset of these may be functional. Here we report a sensitive bioinformatics approach that identifies exons with evidence of a strong RNA selection pressure ratio (RSPR), i.e., evolutionary selection against mutations that change only the mRNA sequence while leaving the protein sequence unchanged. Because RSPR is measured across an entire evolutionary family rather than a single pair of species, its predictive power is greatly amplified. Using the UCSC 28-vertebrate genome alignment, this approach correctly predicted one-half to three-quarters of the AS exons that are known binding targets of the NOVA splicing regulatory factor, and predicted 345 strongly selected AS events in human and 262 in mouse. These predictions were strongly validated by several criteria for functional AS, such as independent detection of the same AS event in other species, reading-frame preservation, and experimental evidence of tissue-specific regulation: 75% (15/20) of a sample of high-RSPR exons displayed tissue-specific regulation in a panel of ten tissues, vs. only 20% (4/20) of a sample of low-RSPR exons. These data suggest that RSPR can identify exons with functionally important splicing regulation, and they provide biologists with a dataset of over 600 such exons. We present several case studies, including both well-studied (GRIN1) and novel (EXOC7) examples. These data also show that RSPR strongly outperforms other approaches such as standard sequence conservation (which fails to distinguish amino acid selection pressure from RNA selection pressure) and pairwise genome comparison (which lacks adequate statistical power for predicting individual exons).
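The definition of RSPR above is qualitative. As a minimal sketch, assuming RSPR takes the form of a ratio of expected to observed synonymous (protein-silent) substitutions in an exon, summed over every branch of a species tree, the code below shows why pooling a whole evolutionary family adds statistical power compared with a single pairwise comparison. The `Branch` layout, the `expected_syn`, `rspr` and `poisson_cdf` helpers, the pseudocount, and all numbers are hypothetical illustrations, not the authors' method or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Branch:
    """One branch of a species tree (hypothetical layout, not the authors' format)."""
    length: float      # assumed neutral synonymous substitutions per site on this branch
    observed_syn: int  # synonymous changes inferred in the exon on this branch


def expected_syn(syn_sites: float, branches: list[Branch]) -> float:
    """Neutral expectation: synonymous sites times branch length, summed over the tree."""
    return sum(syn_sites * b.length for b in branches)


def rspr(syn_sites: float, branches: list[Branch], pseudocount: float = 0.5) -> float:
    """Assumed RSPR form: expected / observed synonymous substitutions.

    The pseudocount (an assumption) keeps the ratio finite when nothing is observed.
    """
    observed = sum(b.observed_syn for b in branches)
    return expected_syn(syn_sites, branches) / max(observed, pseudocount)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam): chance of so few changes under neutrality."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


if __name__ == "__main__":
    sites = 40  # made-up exon with 40 synonymous sites
    # Pairwise comparison: one branch, zero synonymous changes observed.
    pair = [Branch(length=0.05, observed_syn=0)]
    # Whole tree: 27 branches of the same made-up length, 5 changes in total.
    tree = [Branch(length=0.05, observed_syn=1 if i < 5 else 0) for i in range(27)]
    for name, branches in [("pairwise", pair), ("tree", tree)]:
        exp = expected_syn(sites, branches)
        obs = sum(b.observed_syn for b in branches)
        print(name, rspr(sites, branches), poisson_cdf(obs, exp))
```

With these made-up numbers, pooling over the tree raises the neutral expectation from 2 to 54 synonymous changes, so a deficit that is unremarkable in one pairwise comparison (P ≈ 0.14 even with zero changes observed) becomes extremely unlikely under neutrality. This illustrates, under the stated assumptions, why a single pairwise comparison lacks power for individual exons.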
Alternative splicing is an important mechanism for regulating gene function in complex organisms, and has been shown to play a key role in human diseases such as cancer. Recently, high-throughput technologies have been used to detect alternative splicing events throughout the human genome. However, validating the results of these automated detection methods, and showing that the minor splice forms they detect play an important role in regulating biological functions, have traditionally required time-consuming experiments. In this study we show that such regulatory functions can very often be detected by a distinctive pattern of strong selection on RNA sequence motifs within the alternatively spliced region. We measured this “RNA selection pressure ratio” (RSPR) across 28 animal species representing 400 million years of evolution, show that it successfully predicts known patterns of alternative splicing, and validate its predictions experimentally. For example, whereas 75% of the high-RSPR alternatively spliced exons tested showed tissue-specific regulation, only 20% of the low-RSPR exons did. Using RSPR, we have predicted over 600 human and mouse alternative splicing events that appear to be under strong selection. These data should be valuable for biologists seeking to understand the functional effects and underlying mechanisms of splicing regulation.
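To gauge how unlikely the 75% vs. 20% contrast (15/20 vs. 4/20 exons) would be if RSPR were unrelated to tissue-specific regulation, one standard option is Fisher's exact test on the 2×2 table of counts. This is a sketch of a test we chose for illustration; it is not necessarily the analysis the authors performed, and the helper names `hypergeom_pmf` and `fisher_exact` are hypothetical (written with only the standard library, not SciPy).

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, row2: int, col1: int) -> float:
    """P(top-left cell = a) for a 2x2 table with fixed row and column totals."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact(table: list[list[int]]) -> tuple[float, float]:
    """Return (one-sided 'greater', two-sided) Fisher exact p-values for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    # Range of feasible top-left values given the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: hypergeom_pmf(x, row1, row2, col1) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # One-sided: tables at least as enriched as the observed one.
    greater = sum(p for x, p in probs.items() if x >= a)
    # Two-sided: all tables no more likely than the observed one (tolerance for float ties).
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return greater, min(1.0, two_sided)


# Rows: high-RSPR, low-RSPR samples; columns: tissue-specific, not tissue-specific.
counts = [[15, 5], [4, 16]]
print(fisher_exact(counts))
```

For these counts the one-sided p-value is roughly 6 × 10⁻⁴, and because both samples have 20 exons the two-sided value is about twice that, so a difference this large is unlikely to be a sampling accident.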