40
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Predicting gene function in a hierarchical context with an ensemble of classifiers

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background:

          The wide availability of genome-scale data for several organisms has stimulated interest in computational approaches to gene function prediction. Diverse machine learning methods have been applied to unicellular organisms with some success, but few have been extensively tested on higher level, multicellular organisms. A recent mouse function prediction project (MouseFunc) brought together nine bioinformatics teams applying a diverse array of methodologies to mount the first large-scale effort to predict gene function in the laboratory mouse.

          Results:

          In this paper, we describe our contribution to this project, an ensemble framework based on the support vector machine that integrates diverse datasets in the context of the Gene Ontology hierarchy. We carry out a detailed analysis of the performance of our ensemble and provide insights into which methods work best under a variety of prediction scenarios. In addition, we applied our method to Saccharomyces cerevisiae and have experimentally confirmed functions for a novel mitochondrial protein.

          Conclusion:

          Our method consistently performs among the top methods in the MouseFunc evaluation. Furthermore, it exhibits good classification performance across a variety of cellular processes and functions in both a multicellular organism and a unicellular organism, indicating its ability to discover novel biology in diverse settings.

          Related collections

          Most cited references34

          • Record: found
          • Abstract: found
          • Article: not found

          The transcriptional landscape of the mammalian genome.

          This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
            Bookmark
            • Record: found
            • Abstract: not found
            • Book: not found

            The Jackknife, the Bootstrap and Other Resampling Plans

              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Pfam: clans, web tools and services

              Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (), the USA (), France () and Sweden ().
                Bookmark

                Author and article information

                Journal
                Genome Biol
                Genome Biology
                BioMed Central
                1465-6906
                1465-6914
                2008
                27 June 2008
                : 9
                : Suppl 1
                : S3
                Affiliations
                [1 ]Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
                [2 ]Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
                [3 ]Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
                Article
                gb-2008-9-s1-s3
                10.1186/gb-2008-9-s1-s3
                2447537
                18613947
                a30e97ea-7552-47af-81a3-53acc3d33722
                Copyright © 2008 Guan et al; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Categories
                Research

                Genetics
                Genetics

                Comments

                Comment on this article