KaBOB: ontology-based semantic integration of biomedical databases

Abstract

Background

The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work toward shared data formats and linked identifiers, significant problems persist in semantic data integration, where the goal is to establish shared identity and shared meaning across heterogeneous biomedical data sources.

Results

We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers that denote the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license.
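To make two of these processes more concrete, the following is a minimal Python sketch of identifier aggregation followed by a single forward-chaining rule. It is an illustration under stated assumptions, not KaBOB's actual implementation: the identifiers (`SRC_A:1` and so on), the predicate names (`denotes`, `record_interacts_with`, `interacts_with`), and the helper functions are all hypothetical, and plain Python tuples stand in for RDF triples.

```python
from collections import defaultdict


def aggregate_identifiers(xrefs):
    """Group identifiers linked by cross-references into equivalence sets.

    Uses a simple union-find; `xrefs` is an iterable of (id_a, id_b) pairs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in xrefs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for ident in parent:
        groups[find(ident)].add(ident)
    return [frozenset(g) for g in groups.values()]


def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new triples are derived."""
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts) - facts
        if not new:
            return facts
        facts |= new


def lift_interactions(facts):
    """Hypothetical rule: an interaction stated between two database records
    becomes an interaction between the concepts those records denote."""
    denotes = {s: o for s, p, o in facts if p == "denotes"}
    return {
        (denotes[s], "interacts_with", denotes[o])
        for s, p, o in facts
        if p == "record_interacts_with" and s in denotes and o in denotes
    }


# Hypothetical cross-references between records in three made-up sources.
xrefs = [
    ("SRC_A:1", "SRC_B:x9"),
    ("SRC_B:x9", "SRC_C:77"),
    ("SRC_A:2", "SRC_C:80"),
]
groups = aggregate_identifiers(xrefs)

# One concept per identifier set, kept distinct from the records themselves.
concept_of = {i: "concept:" + min(g) for g in groups for i in g}
facts = {(i, "denotes", c) for i, c in concept_of.items()}

# A source-specific statement expressed at the level of records.
facts.add(("SRC_B:x9", "record_interacts_with", "SRC_C:80"))

closed = forward_chain(facts, [lift_interactions])
```

In this sketch the record-level statement is preserved, and the rule adds a concept-level statement alongside it, which mirrors the distinction between database records and biomedical concepts drawn above. Naming a concept after the smallest identifier in its set is an arbitrary choice made here only to keep the example deterministic.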

Conclusions

KaBOB is an integrated knowledge base of biomedical data whose representations are grounded in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources, and it provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0559-3) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Kevin M Livingston: Kevin.Livingston@ucdenver.edu

Michael Bada: Mike.Bada@ucdenver.edu

William A Baumgartner Jr: William.Baumgartner@ucdenver.edu

Lawrence E Hunter: Larry.Hunter@ucdenver.edu

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Electronic): 23 April 2015

Publication date PMC-release: 23 April 2015

Publication date Collection: 2015

Volume: 16

Issue: 1

Electronic Location Identifier: 126

Affiliations

Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

Article

Publisher ID: 559

DOI: 10.1186/s12859-015-0559-3

PMC ID: 4448321

PubMed ID: 25903923

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 23 October 2014

Date accepted: 30 March 2015

Custom metadata

ScienceOpen disciplines: Bioinformatics & Computational biology

Keywords: knowledge representation and reasoning, semantic data integration, biomedical, databases, open biomedical ontologies, semantic web, owl, rdf
