      Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine

      research-article

          Abstract

          Background

          MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.

          Results

          A set of novel features based on local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs from those of pseudo pre-miRNAs. A support vector machine (SVM) is applied to these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without using any comparative genomics information.
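
The abstract does not spell out how the features are computed. As a rough illustration only, one way to encode local contiguous structure-sequence information is to slide a three-position window along a hairpin, record whether each position is paired or unpaired, and combine that pattern with the middle nucleotide, which gives 4 × 8 = 32 possible features. The encoding below, the function name `triplet_features`, the use of every position in the hairpin, and the toy hairpin in the usage note are assumptions made for this sketch, not necessarily the authors' exact procedure.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"
# All 32 combinations of (middle nucleotide, 3-position pairing pattern).
# "(" stands for a paired position, "." for an unpaired one.
FEATURES = [n + "".join(p) for n in NUCLEOTIDES for p in product("(.", repeat=3)]


def triplet_features(seq, dot_bracket):
    """Return normalized frequencies of the 32 structure-sequence triplets.

    seq         -- RNA (or DNA) sequence of the candidate hairpin
    dot_bracket -- predicted secondary structure in dot-bracket notation
                   (assumed to come from an external folding program)
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    # Collapse "(" and ")" into a single "paired" symbol.
    paired = dot_bracket.replace(")", "(")
    counts = Counter(
        seq[i] + paired[i - 1:i + 2]      # middle nucleotide + local pattern
        for i in range(1, len(seq) - 1)   # skip the two end positions
        if seq[i] in NUCLEOTIDES
    )
    total = sum(counts.values())
    return [counts[f] / total if total else 0.0 for f in FEATURES]
```

For a toy hairpin such as `triplet_features("GGGAAACCC", "(((...)))")`, the result is a 32-element vector of frequencies. Vectors of this kind, computed for known pre-miRNAs and for pseudo hairpins, are what an SVM would be trained on.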

          Conclusion

          The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.

                Author and article information

                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 29 December 2005
                Volume: 6
                Article number: 310
                Affiliations
                [1] MOE Key Laboratory of Bioinformatics / Department of Automation, Tsinghua University, Beijing 100084, China
                [2] Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China
                [3] School of Electronics, University of Glamorgan, Pontypridd CF37 1DL, UK
                Article
                1471-2105-6-310
                10.1186/1471-2105-6-310
                1360673
                16381612
                d95da27a-3b8e-4bb2-9d8b-ad6dd0ccc686
                Copyright © 2005 Xue et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 5 August 2005
                : 29 December 2005
                Categories
                Methodology Article

                Bioinformatics & Computational biology
