      Is Open Access

      Semantic Similarity in Biomedical Ontologies

      review-article


          Abstract

          In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization.

          We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies.
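The two axes of this classification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, the annotation corpus, and the specific measures chosen (an inverse shortest-path score for the edge-based case, a Resnik-style information-content score for the node-based case, a best-match average for the pairwise case, and a Jaccard index over ancestor-extended term sets for the groupwise case). The abstract names only the categories, so this is not necessarily how the review defines or evaluates them.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (a DAG with one root).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "protein_binding": {"binding"},
    "dna_binding": {"binding"},
    "kinase": {"catalysis"},
}

# Hypothetical annotation corpus: entity -> directly annotated terms.
ANNOTATIONS = {
    "geneA": {"protein_binding"},
    "geneB": {"dna_binding", "kinase"},
    "geneC": {"dna_binding"},
    "geneD": {"kinase"},
}


def up_distances(term):
    """Map the term and each of its ancestors to the shortest upward edge count."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        nxt = []
        for t in frontier:
            for p in PARENTS[t]:
                if p not in dist:
                    dist[p] = dist[t] + 1
                    nxt.append(p)
        frontier = nxt
    return dist


def ancestors(term):
    """Return the term together with all of its ancestors."""
    return set(up_distances(term))


def edge_similarity(t1, t2):
    """Edge-based: 1 / (1 + shortest path through a common ancestor)."""
    d1, d2 = up_distances(t1), up_distances(t2)
    path = min(d1[a] + d2[a] for a in d1.keys() & d2.keys())
    return 1.0 / (1.0 + path)


def information_content(term):
    """IC = log(N / n), where n entities are annotated to the term or a descendant.

    Assumes every term has at least one annotated entity, as in the toy corpus.
    """
    hits = sum(
        1
        for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    return math.log(len(ANNOTATIONS) / hits)


def resnik_similarity(t1, t2):
    """Node-based: IC of the most informative common ancestor."""
    return max(information_content(a) for a in ancestors(t1) & ancestors(t2))


def pairwise_bma(e1, e2, term_sim):
    """Pairwise: combine term-to-term scores with a best-match average."""
    a, b = ANNOTATIONS[e1], ANNOTATIONS[e2]
    best_a = [max(term_sim(x, y) for y in b) for x in a]
    best_b = [max(term_sim(x, y) for x in a) for y in b]
    return (sum(best_a) / len(a) + sum(best_b) / len(b)) / 2


def groupwise_jaccard(e1, e2):
    """Groupwise: compare the ancestor-extended term sets directly."""
    s1 = set().union(*(ancestors(t) for t in ANNOTATIONS[e1]))
    s2 = set().union(*(ancestors(t) for t in ANNOTATIONS[e2]))
    return len(s1 & s2) / len(s1 | s2)


if __name__ == "__main__":
    print(pairwise_bma("geneB", "geneC", edge_similarity))
    print(pairwise_bma("geneB", "geneC", resnik_similarity))
    print(groupwise_jaccard("geneB", "geneC"))
```

In this sketch the pairwise function takes the term-level measure as a parameter, which reflects the idea that the two axes are independent: either term-level strategy can be plugged into a pairwise combination, while the groupwise function never computes term-to-term scores at all.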

          Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 31 July 2009
                Volume 5, Issue 7, e1000443
                Affiliations
                [1] LaSIGE, Faculty of Sciences, University of Lisboa, Lisboa, Portugal
                [2] School of Computing Science, Newcastle University, Newcastle-upon-Tyne, United Kingdom
                University of California San Diego, United States of America
                Article
                Manuscript number: 08-PLCB-RV-1011R3
                DOI: 10.1371/journal.pcbi.1000443
                PMCID: PMC2712090
                PMID: 19649320
                Pesquita et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                Pages: 12
                Categories
                Review
                Computational Biology
                Computer Science/Applications
                Genetics and Genomics/Bioinformatics
                Molecular Biology/Bioinformatics

                Quantitative & Systems biology
