A Chado case study: an ontology-based modular schema for representing genome-associated biological information

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).

Related collections

Most cited references 13

Record: found
Abstract: found
Article: not found

Gene Ontology: tool for the unification of biology

Michael Ashburner, Catherine A. Ball, Judith Blake … (2002)

Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

0 comments Cited 15324 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

The Bioperl toolkit: Perl modules for the life sciences.

Jason E Stajich, David Block, Kris Boulez … (2002)

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

0 comments Cited 715 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

The generic genome browser: a building block for a model organism system database.

Lincoln D. Stein, Christopher John Mungall, ShengQiang Shu … (2002)

The Generic Model Organism System Database Project (GMOD) seeks to develop reusable software components for model organism system databases. In this paper we describe the Generic Genome Browser (GBrowse), a Web-based application for displaying genomic annotations and other features. For the end user, features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of all features, and the ability to enable and disable tracks and change their relative order and appearance. The user can upload private annotations to view them in the context of the public ones, and publish those annotations to the community. For the data provider, features of the browser software include reliance on readily available open source components, simple installation, flexible configuration, and easy integration with other components of a model organism system Web site. GBrowse is freely available under an open source license. The software, its documentation, and support are available at http://www.gmod.org.

0 comments Cited 533 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Title: Bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Electronic): 1460-2059

ISSN (Print): 1367-4803

Publication date Created: July 2007

Publication date Created: July 01 2007

Publication date Created: July 2007

Publication date Other: July 2007

Publication date (Print): July 01 2007

Publication date (Electronic): July 2007

Volume: 23

Issue: 13

Pages: i337-i346

Article

DOI: 10.1093/bioinformatics/btm189

PubMed ID: 17646315

SO-VID: b656322f-79e0-44d5-addc-9c22d2aad736

History

Data availability:

Comments

Comment on this article

scite_

Cited by 124

See all cited by

Most referenced authors 1,189

See all reference authors

- Version 1

A Chado case study: an ontology-based modular schema for representing genome-associated biological information

Read this article at

Abstract

Related collections

Genome Integrity

Most cited references 13

Gene Ontology: tool for the unification of biology

The Bioperl toolkit: Perl modules for the life sciences.

The generic genome browser: a building block for a model organism system database.

Author and article information

Journal

Article

History

Comments

Comment on this article

Similar content 2,365

Cited by 124

Most referenced authors 1,189