
      Comprehensive splice-site analysis using comparative genomics



          We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT–AG and GC–AG, and U12-type GT–AG and AT–AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT–AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3′ splice sites (3′ss); and (iv) distinct evolutionary histories of 5′ and 3′ss. Our study proves that the identification of broad patterns in naturally-occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.


          Most cited references 67


          Understanding alternative splicing: towards a cellular code.

          In violation of the 'one gene, one polypeptide' rule, alternative splicing allows individual genes to produce multiple protein isoforms, thereby playing a central part in generating complex proteomes. Alternative splicing also has a largely hidden function in quantitative gene control, by targeting RNAs for nonsense-mediated decay. Traditional gene-by-gene investigations of alternative splicing mechanisms are now being complemented by global approaches. These promise to reveal details of the nature and operation of cellular codes that are constituted by combinations of regulatory elements in pre-mRNA substrates and by cellular complements of splicing regulators, which together determine regulated splicing pathways.

            Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.

             G Hertz, G Stormo (1999)
            Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring, or scoring, the relatedness of the aligned sequences. If the alignment is not known, one can be determined by finding an alignment that optimizes the scoring scheme. We describe four components to our approach for determining alignments of multiple sequences. First, we review a log-likelihood scoring scheme we call information content. Second, we describe two methods for estimating the P value of an individual information content score: (i) a method that combines a technique from large-deviation statistics with numerical calculations; (ii) a method that is exclusively numerical. Third, we describe how we count the number of possible alignments given the overall amount of sequence data. This count is multiplied by the P value to determine the expected frequency of an information content score and, thus, the statistical significance of the corresponding alignment. Statistical significance can be used to compare alignments having differing widths and containing differing numbers of sequences. Fourth, we describe a greedy algorithm for determining alignments of functionally related sequences. Finally, we test the accuracy of our P value calculations, and give an example of using our algorithm to identify binding sites for the Escherichia coli CRP protein. Programs were developed under the UNIX operating system and are available by anonymous ftp from ftp://beagle.colorado.edu/pub/consensus.
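
            The log-likelihood score called information content in the abstract above can be illustrated with a minimal sketch. This is not the authors' program: the function name `information_content`, the uniform background frequency of 0.25 per base, and the use of base-2 logarithms (bits) are assumptions made for illustration, and the toy sites are invented rather than taken from any dataset.

```python
import math
from collections import Counter


def information_content(sites, background=None):
    """Per-position information content (bits) of equal-length aligned DNA sites.

    Hypothetical helper: for each column, sums f_b * log2(f_b / p_b) over the
    bases b observed in that column, where f_b is the observed frequency and
    p_b the background frequency (assumed uniform unless given).
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumed uniform background
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    per_position = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        total = sum(counts.values())
        column_ic = 0.0
        for base, n in counts.items():
            freq = n / total
            column_ic += freq * math.log2(freq / background[base])
        per_position.append(column_ic)
    return per_position


# Toy 5' splice-site-like strings, invented for illustration only.
toy_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTATGT"]
profile = information_content(toy_sites)
total_bits = sum(profile)
```

            A fully conserved column scores 2 bits under a uniform background, and a column with all four bases equally frequent scores 0. The sketch accepts only the letters A, C, G and T; it applies no small-sample correction, which a real analysis would likely need.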

              RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression.

              A systematic analysis of the RNA splice junction sequences of eukaryotic protein coding genes was carried out using the GENBANK databank. Nucleotide frequencies obtained for the highly conserved regions around the splice sites for different categories of organisms closely agree with each other. A striking similarity among the rare splice junctions which do not contain AG at the 3' splice site or GT at the 5' splice site indicates the existence of special mechanisms to recognize them, and that these unique signals may be involved in crucial gene-regulation events and in differentiation. A method was developed to predict potential exons in a bare sequence, using a scoring and ranking scheme based on nucleotide weight tables. This method was used to find a majority of the exons in selected known genes, and also predicted potential new exons which may be used in alternative splicing situations.
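
              The idea of scoring and ranking candidate sites with nucleotide weight tables, as described in the abstract above, can be sketched as follows. This is a generic log-odds weight table, not the scoring formula of the cited paper: the function names, the pseudocount of 1, and the uniform background of 0.25 are assumptions, and the training sites and test sequence are invented.

```python
import math


def build_weight_table(sites, pseudocount=1.0):
    """Build a per-position base-frequency table from aligned sites.

    Hypothetical helper; the pseudocount keeps unseen bases from scoring -inf.
    """
    width = len(sites[0])
    table = []
    for i in range(width):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        table.append(
            {base: (column.count(base) + pseudocount) / total for base in "ACGT"}
        )
    return table


def score_window(window, table, background=0.25):
    """Log-odds score (bits) of one window against the weight table."""
    return sum(
        math.log2(table[i][base] / background) for i, base in enumerate(window)
    )


def rank_candidates(sequence, table):
    """Score every window of the sequence and return (score, offset), best first."""
    width = len(table)
    scored = [
        (score_window(sequence[i:i + width], table), i)
        for i in range(len(sequence) - width + 1)
    ]
    return sorted(scored, reverse=True)


# Invented toy data: three training sites and a short sequence to scan.
table = build_weight_table(["GTAAGT", "GTGAGT", "GTAAGT"])
ranking = rank_candidates("CCCGTAAGTCCC", table)
best_score, best_offset = ranking[0]
```

              Here the highest-ranked window starts at offset 3, where the sequence matches the training sites. The sketch assumes the scanned sequence contains only A, C, G and T.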

                Author and article information

                Journal: Nucleic Acids Research (Nucleic Acids Res)
                Publisher: Oxford University Press
                Published: 12 August 2006
                Volume: 34
                Issue: 14
                Pages: 3955-3967
                Cold Spring Harbor Laboratory 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
                Author notes
                *To whom correspondence should be addressed. Tel: +1 516 367 8864; Fax: +1 516 367 8389; Email: sachidan@cshl.edu

                Present address: Nihar Sheth, Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284-2030, USA

                The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors

                © 2006 The Author(s).

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.



