      Is Open Access

      Development and use of Ontologies Inside the Neuroscience Information Framework: A Practical Approach

      research-article


          Abstract

          An initiative of the NIH Blueprint for neuroscience research, the Neuroscience Information Framework (NIF) project advances neuroscience by enabling discovery and access to public research data and tools worldwide through an open source, semantically enhanced search portal. One of the critical components for the overall NIF system, the NIF Standardized Ontologies (NIFSTD), provides an extensive collection of standard neuroscience concepts along with their synonyms and relationships. The knowledge models defined in the NIFSTD ontologies enable an effective concept-based search over heterogeneous types of web-accessible information entities in NIF’s production system. NIFSTD covers major domains in neuroscience, including diseases, brain anatomy, cell types, sub-cellular anatomy, small molecules, techniques, and resource descriptors. Since the first production release in 2008, NIF has grown significantly in content and functionality, particularly with respect to the ontologies and ontology-based services that drive the NIF system. We report here on the structure, design principles, community engagement, and current state of the NIFSTD ontologies.
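
          As a rough illustration of the concept-based search the abstract describes, the sketch below resolves a query to a concept and then matches records against the concept's label and all of its synonyms. It is not NIF's implementation: the `Concept` class, the `EX:` identifiers, the synonym lists, and the sample records are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A toy ontology concept: an identifier, a label, and exact synonyms."""
    concept_id: str
    label: str
    synonyms: tuple = ()

    def names(self):
        # Lower-cased label plus synonyms, used for case-insensitive matching.
        return {self.label.lower(), *(s.lower() for s in self.synonyms)}


# Hypothetical concepts; identifiers and synonyms are invented for illustration.
CONCEPTS = [
    Concept("EX:0001", "Purkinje cell", ("Purkinje neuron",)),
    Concept("EX:0002", "Parkinson's disease", ("Parkinson disease",)),
]

# Hypothetical resource descriptions to search over.
RECORDS = [
    "Electrophysiology of the Purkinje neuron in mouse cerebellum",
    "Dendritic morphology of a Purkinje cell",
    "Imaging dataset for hippocampal slices",
]


def find_concept(query, concepts):
    """Return the concept whose label or synonym equals the query, if any."""
    q = query.strip().lower()
    for concept in concepts:
        if q in concept.names():
            return concept
    return None


def concept_search(query, concepts, records):
    """Match records against every name of the query's concept.

    Falls back to a plain substring search when the query is not a known
    label or synonym.
    """
    concept = find_concept(query, concepts)
    terms = concept.names() if concept else {query.strip().lower()}
    return [r for r in records if any(t in r.lower() for t in terms)]


hits = concept_search("Purkinje cell", CONCEPTS, RECORDS)
```

          Here a query for "Purkinje cell" also retrieves the record that only mentions "Purkinje neuron", which a plain keyword match would miss.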

          Most cited references (8)

          Is Open Access

          An ontology for cell types

          Background

          One of the most challenging problems now facing the model organism databases is the formal description of phenotypic data. While some databases, for example those for mouse (Mus musculus) [1], corn (Zea mays) [2] and fruit fly (Drosophila melanogaster) [3], include a rich heritage of data describing the phenotypes of mutants, and some progress is being made to bring these data into a well structured computable representation [3-5], the annotation of these phenotypes is hampered by a lack of structured information describing a variety of other biological objects, including cell types. A structured vocabulary of cell types is also required by databases for the description of other biological objects, such as gene-expression data. In addition, using the same concepts for the description of these data in all of these databases would facilitate interoperability among them. To address these needs, we have developed an ontology that describes the cell types of the major model organisms, both animal and plant. Its use will allow a biologist to query a single database with such questions as: list all of the cell types in mouse that express the Notch gene and all of the cell types in Drosophila and Caenorhabditis elegans that express the closest homolog of this gene; list all of the genes in mouse, rat, human and zebrafish that are expressed in the cell type Schwann_cell; CL:0000218; list all of the genes in D. melanogaster and C. elegans that have a mutant phenotype in the cell types that develop from the cell type myoblast; CL:0000056. The use of the cell ontology will thereby promote the de facto integration of data from diverse databases. Since the development of the Gene Ontology (GO) for the annotation of attributes of gene products [6], many ontologies have been developed in the model organism informatics community. Several of these are available, in a choice of common formats, from the Open Biological Ontologies (OBO) site [7].
They include comprehensive developmental and anatomical ontologies for many model organisms (for example, mouse, Drosophila, Arabidopsis thaliana and C. elegans), and ontologies for mouse pathology and human disease. There are several other ontologies that include cell types such as Systematized Nomenclature of Medicine (SNOMED) [8], the Foundational Model of Anatomy (FMA) [9], the anatomy ontologies used in model organism databases at the OBO site [7], vocabularies used by the resources that hold cell lines such as the American Type Culture Collection (ATCC) or the European Collection of Cell Cultures (ECACC) [10,11], and others [12,13]. Our approach for handling cell types differs from that adopted by these resources. First, SNOMED, FMA and the species-specific anatomy ontologies explicitly assume that the cell types they include are associated with one particular organism. Their identifiers cannot therefore be used to annotate cell types from other organisms, even if these cell types are essentially identical to those in the organism-specific ontologies. Second, these resources, together with those that hold cell lines (for example, ECACC and ATCC), tend to define cell types as constituents of tissues rather than provide phenotypic information about their attributes - the knowledge that they encapsulate is severely limited. Third, some ontologies do not have publicly available identifiers for each term; hence they cannot be used for general annotation [10,11]. The Plant Ontology [14] provides a cell type node that shares some of the organizing principles of our cell ontology, but it is limited to those cell types found in plants.
For all these reasons, we set out to produce an organism-independent ontology of cell types based on their properties (such as functional, histological and lineage classes) and report here the availability on the Open Biological Ontologies site [7] of this ontology, which incorporates the cell types possessed by a broad range of phyla and is defined by a rich set of criteria.

Results

The ontology

The first design decision was whether we should attempt to integrate cell types from all phyla within a single ontology or build independent ontologies for different taxonomic groups. The former has the great advantage of facilitating de facto integration of data from diverse databases, as described above. This approach does, however, pose conceptual problems: for example, are a mammalian 'muscle_cell' and a nematode 'muscle_cell' homologous? In this particular example we have little doubt that the answer is 'yes'; both of these cell types are evolutionary descendants of the first metazoan's 'muscle_cell'. In other cases, however, matters are not quite as straightforward: a plant 'hair_cell', a 'hair_cell' of the mammalian cochlea and an insect 'hair_cell' are probably not homologous, despite some similarities in their functions and genes expressed within them [15]. Despite these problems in building an 'integrated' cell-type ontology, the advantages, were we to succeed, outweigh them, and we have therefore taken this approach to develop a single ontology that integrates cell types from different phyla. The ontology consists of concepts or terms (nodes) that are linked by two types of relationships (edges). This means that the ontology appears as a complex hierarchy (technically known as a directed acyclic graph, or DAG) where a given term (or concept) may not only have several children, but also several parents. The parent and child terms are connected to each other by is_a and develops_from relationships.
The former is a subsumption relationship, in which the child term is a more restrictive concept than its parent (thus chondrocyte is_a mesenchyme_cell). The latter is used to code developmental lineage relationships between concepts, for example that a hepatocyte develops_from a mesenchymal_cell. The is_a relationship implies inheritance, so that any properties of the parent concept are inherited by its children; the develops_from relationship carries no inheritance implications. The rules for building the ontology are the same as those defined by the GO Consortium. That is, each concept in the Cell Ontology has an identifier with the syntax CL:nnnnnnn, where nnnnnnn is a unique integer and CL identifies the Cell Ontology (concepts should always be cited with their full identifier when being used in the context of a database). In addition, if there are precisely equivalent terms in other databases, for example in the Fungal Anatomy [16], Arabidopsis [17], Plant Ontology [14] or FlyBase databases [3], then the unique identifiers from these databases are included in the Cell Ontology. Most concepts in the Cell Ontology are provided with free-text definitions and may have one or more synonyms. Within the context of this ontology, synonyms are precise; a concept and its synonym can be exchanged without changing the concept's meaning. We use the same stratagem as does the GO when we have concepts that are lexically identical but have different meanings in different communities [18]. Thus, it is far from obvious that vertebrate and invertebrate pigment cells are homologous and these concepts are therefore described as pigment_cell_(sensu_Vertebrata) and pigment_cell_(sensu_Nematoda_and_Protostoma), respectively. The two top-level nodes of the Cell Ontology are cell_in_vivo and experimentally_modified_cell. The former includes cell types that occur in nature, the latter those that are experimentally derived, including cell lines and such constructs as protoplasts.
Experimentally derived cells are under-represented in the current version of the ontology. Naturally occurring cells are classified both by organism-independent categories and by organism (animal cells, plant cells, prokaryotic cells). The organism-independent classification of cells follows several different criteria that include: function (for example, electrically_excitable_cell, secretory_cell, photosynthetic_cell), histology (for example, epithelial_cell, mesenchyme_cell), lineage (for example, ectodermal_cell, endodermal_cell) and ploidy (for example, haploid_cell, polyploid_cell). The present version of the Cell Ontology has an average 'depth' of about 10 nodes. The richness of the ontology can be illustrated by example (Figure 1). Kupffer cells are specialized vertebrate macrophages of the reticuloendothelial system. They function to filter small foreign particles (including bacteria) and old reticulocytes from the blood. In the Cell Ontology they are to be found by their function (they are a type of defensive_cell), by their lineage (they are derived from a mesodermal_cell derived from a hematopoietic_stem_cell, itself a type of stem_cell), by their morphology (they are a type of circulating_cell) and by their organism (they are a type of animal_cell).

Discussion

Ontologies in bioinformatics are intended to capture and formalize a domain of knowledge, and the ontology reported here attempts to do this within the domain of cell types. It is designed to be useful in the sense that a researcher should be able to find, in a rapid and intuitive way, any cell type in any of the major model organisms and, having found it, learn a considerable amount about that cell type and its relationships to other biological objects.
A core feature of the ontology, and one that differentiates it from other resources that contain cell types such as SNOMED and the FMA [8,9], and the Drosophila and Arabidopsis ontologies [3,17], is that the cell ontology explicitly sets out to include cell types from all the major model organisms within a common framework. In addition, it also seeks to incorporate a great deal of phenotypic information about these cell types and is thus far more comprehensive in its cellular detail than these other resources. The intention is that the new cell-type ontology should provide organism-independent knowledge as well as cell-type unique identifiers (ID) that can be incorporated into any database holding cell-type-associated knowledge. The formalized structure of the ontology, together with its set of unique IDs, will allow curators to incorporate cell-type data into their databases, integrate the data with the knowledge encapsulated in the ontology, and use the IDs to interoperate with other databases. While we expect such bioinformatics applications to be its immediate use, we hope that, in the longer term, all biologists will find the ontology useful. The expected short-term use of the ontology will thus be in cataloguing phenotypes and gene expression patterns. Indeed, it is quite surprising that those who work with model organisms still lack the bioinformatics resources needed to catalogue, archive and access the details of the phenotypes emerging from mutant screens and natural variations. A robust representation of normal and mutant phenotypes in all of the model organisms will require ontologies for a wide range of macroscopic properties (pathology, anatomy, abnormal quantifiers, and so on) and we view the cell ontology as a component of this programme that should be useful in cataloguing phenotypes (and other attributes) associated with cell types. 
In the long term we expect that molecular biology and biological databases will move beyond being gene-centric and that biological mechanisms will be studied at a more integrated level. Cells are the biological units with which tissues and organs and organ systems are built. A rich and explicit description of cell types across phyla that is adopted by biological databases will help facilitate this transition. Finally, it should be pointed out that, like many such resources, this ontology is not complete: although it contains all the common cell types, there will certainly be some that have been omitted. Most importantly, although many of the cell types are fully described by function, morphology, organism, and so on, others are inadequately described and more relationships need to be made. A particular weakness is the fact that the category identified as experimentally_modified_cell has yet to be populated, and doing this will involve consideration of the various cell lines held in the major collections. As with other community resources, community input is essential for the development and maintenance of the Cell Ontology; biologists with comments and additions are therefore welcome to contribute to the ontology and should contact the curator at ashburner@ebi.ac.uk.

Materials and methods

The ontology includes the major cell types from the major model organisms (for example, human, mouse, Drosophila, Caenorhabditis, zebrafish, Dictyostelium discoideum, Arabidopsis, fungi and prokaryotes). These cell types have been collated from our own knowledge, from major textbooks (for example [20-22]), from the embryo and anatomy ontologies available on the OBO site [7], and from colleagues (who are thanked in the acknowledgements). The ontology currently holds some 680 cell types, together with their synonyms and, in most cases, text definitions.
The ontology was constructed using the open source Java tool OBO-Edit (previously known as DAG-Edit) [23], which is convenient for building ontologies that are consistent with the GO formalism. The resulting ontology is available in both the GO 'flat-file' format [24] and the newly defined 'OBO format' [25], and can easily be viewed using the OBO-Edit or the COBrA open source Java tool [26].

Availability

The Cell Ontology is available from the OBO site [19]. Following the cell.obo link will take the user to a page in which the current version of the Ontology, and archived older versions, can be viewed (view) or downloaded (download). Differences between the current and previous version can be seen by following the 'Diff to' link.
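
The graph structure and identifier syntax described in this reference can be sketched in a few lines of Python. This is a minimal illustration, not the Cell Ontology's own tooling: the `CellOntology` class and `is_valid_cl_id` helper are hypothetical names, the regular expression assumes that CL:nnnnnnn means exactly seven digits (as in CL:0000218), and the edges other than "chondrocyte is_a mesenchyme_cell" and "hepatocyte develops_from mesenchymal_cell" are simplified assumptions added only to show a term with more than one parent.

```python
import re
from collections import defaultdict

# Assumption: "CL:nnnnnnn" is read as "CL:" followed by exactly seven digits.
CL_ID = re.compile(r"CL:\d{7}")


def is_valid_cl_id(identifier):
    """Check an identifier against the assumed CL:nnnnnnn syntax."""
    return CL_ID.fullmatch(identifier) is not None


class CellOntology:
    """Toy directed acyclic graph with two edge types."""

    def __init__(self):
        self.is_a = defaultdict(set)           # child -> parents (inherits)
        self.develops_from = defaultdict(set)  # cell -> precursors (no inheritance)

    def add_is_a(self, child, parent):
        self.is_a[child].add(parent)

    def add_develops_from(self, cell, precursor):
        self.develops_from[cell].add(precursor)

    def ancestors(self, term):
        """All terms reachable through is_a edges only.

        develops_from edges are deliberately ignored, because only is_a
        implies inheritance of the parent's properties.
        """
        seen = set()
        stack = [term]
        while stack:
            for parent in self.is_a.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


onto = CellOntology()
onto.add_is_a("chondrocyte", "mesenchyme_cell")          # from the text
onto.add_is_a("chondrocyte", "animal_cell")              # assumed second parent
onto.add_is_a("mesenchyme_cell", "cell_in_vivo")         # assumed, simplified
onto.add_is_a("animal_cell", "cell_in_vivo")             # assumed, simplified
onto.add_develops_from("hepatocyte", "mesenchymal_cell")  # from the text

chondrocyte_ancestors = onto.ancestors("chondrocyte")
hepatocyte_ancestors = onto.ancestors("hepatocyte")
```

In this sketch `chondrocyte` reaches `cell_in_vivo` by two different is_a paths, which is what makes the structure a DAG rather than a tree, while `hepatocyte` has no is_a ancestors because its only edge is develops_from.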

            The NIFSTD and BIRNLex vocabularies: building comprehensive ontologies for neuroscience.

            A critical component of the Neuroscience Information Framework (NIF) project is a consistent, flexible terminology for describing and retrieving neuroscience-relevant resources. Although the original NIF specification called for a loosely structured controlled vocabulary for describing neuroscience resources, as the NIF system evolved, the requirement for a formally structured ontology for neuroscience with sufficient granularity to describe and access a diverse collection of information became obvious. This requirement led to the NIF standardized (NIFSTD) ontology, a comprehensive collection of common neuroscience domain terminologies woven into an ontologically consistent, unified representation of the biomedical domains typically used to describe neuroscience data (e.g., anatomy, cell types, techniques), as well as digital resources (tools, databases) being created throughout the neuroscience community. NIFSTD builds upon a structure established by the BIRNLex, a lexicon of concepts covering clinical neuroimaging research developed by the Biomedical Informatics Research Network (BIRN) project. Each distinct domain module is represented using the Web Ontology Language (OWL). As much as has been practical, NIFSTD reuses existing community ontologies that cover the required biomedical domains, building the more specific concepts required to annotate NIF resources. By following this principle, an extensive vocabulary was assembled in a relatively short period of time for NIF information annotation, organization, and retrieval, in a form that promotes easy extension and modification. We report here on the structure of the NIFSTD, and its predecessor BIRNLex, the principles followed in its construction and provide examples of its use within NIF.

              The cognitive paradigm ontology: design and application.

              We present the basic structure of the Cognitive Paradigm Ontology (CogPO) for human behavioral experiments. While the experimental psychology and cognitive neuroscience literature may refer to certain behavioral tasks by name (e.g., the Stroop paradigm or the Sternberg paradigm) or by function (a working memory task, a visual attention task), these paradigms can vary tremendously in the stimuli that are presented to the subject, the response expected from the subject, and the instructions given to the subject. Drawing from the taxonomy developed and used by the BrainMap project ( www.brainmap.org ) for almost two decades to describe key components of published functional imaging results, we have developed an ontology capable of representing certain characteristics of the cognitive paradigms used in the fMRI and PET literature. The Cognitive Paradigm Ontology is being developed to be compliant with the Basic Formal Ontology (BFO), and to harmonize where possible with larger ontologies such as RadLex, NeuroLex, or the Ontology of Biomedical Investigations (OBI). The key components of CogPO include the representation of experimental conditions focused on the stimuli presented, the instructions given, and the responses requested. The use of alternate and even competitive terminologies can often impede scientific discoveries. Categorization of paradigms according to stimulus, response, and instruction has been shown to allow advanced data retrieval techniques by searching for similarities and contrasts across multiple paradigm levels. The goal of CogPO is to develop, evaluate, and distribute a domain ontology of cognitive paradigms for application and use in the functional neuroimaging community.
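
              One way to picture the stimulus, instruction, and response decomposition described above is as a small record type whose fields can be compared across paradigms. The sketch below is an assumption-laden illustration: the `Condition` class, the `shared_components` helper, and every field value are invented here and are not terms taken from CogPO or BrainMap.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Condition:
    """A toy experimental condition described by three components."""
    paradigm: str
    stimulus: str
    instruction: str
    response: str


def shared_components(a, b):
    """Names of the components (other than the paradigm name) that match."""
    return [
        f.name
        for f in fields(Condition)
        if f.name != "paradigm" and getattr(a, f.name) == getattr(b, f.name)
    ]


# Hypothetical descriptions; real paradigms vary and CogPO uses its own terms.
stroop = Condition("Stroop", "color words", "name the ink color", "overt speech")
sternberg = Condition("Sternberg", "letters", "judge whether the probe was shown", "button press")
flanker = Condition("Flanker", "arrows", "report the center arrow direction", "button press")

same = shared_components(sternberg, flanker)
different = shared_components(stroop, sternberg)
```

              Comparing conditions field by field is what lets a retrieval system find paradigms that differ by name but share, for example, the same response type.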

                Author and article information

                Journal
                Frontiers in Genetics (Front Genet)
                Frontiers Research Foundation
                ISSN: 1664-8021
                Published: 22 June 2012
                Volume: 3
                Article: 111
                Affiliations
                [1] Neuroscience Information Framework, Center for Research in Biological Systems, University of California San Diego, La Jolla, CA, USA
                Author notes

                Edited by: John Hancock, Medical Research Council, UK

                Reviewed by: Douglas M. Bowden, University of Washington School of Medicine, USA; Qiangfeng Cliff Zhang, Columbia University, USA

                *Correspondence: Fahim T. Imam and Maryann E. Martone, Neuroscience Information Framework, Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093-0446, USA. e-mail: mimam@ucsd.edu; memartone@ucsd.edu

                This article was submitted to Frontiers in Bioinformatics and Computational Biology, a specialty of Frontiers in Genetics.

                Article
                DOI: 10.3389/fgene.2012.00111
                PMCID: 3381282
                PMID: 22737162
                Copyright © 2012 Imam, Larson, Bandrowski, Grethe, Gupta and Martone.

                This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.

                History
                Received: 23 February 2012
                Accepted: 29 May 2012
                Page count
                Figures: 7, Tables: 3, Equations: 0, References: 12, Pages: 12, Words: 8250
                Categories
                Genetics
                Methods Article

                Keywords: semantic search, ontology reuse, ontologies, neuroscience ontology
