      Is Open Access

      Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources

      research-article
      Nucleic Acids Research
      Oxford University Press


          Abstract

          The Human Phenotype Ontology (HPO), a standardized vocabulary of phenotypic abnormalities associated with 7000+ diseases, is used by thousands of researchers, clinicians, informaticians and electronic health record systems around the world. Its detailed descriptions of clinical abnormalities and computable disease definitions have made HPO the de facto standard for deep phenotyping in the field of rare disease. The HPO's interoperability with other ontologies has enabled it to be used to improve diagnostic accuracy by incorporating model organism data. It also plays a key role in the popular Exomiser tool, which identifies potential disease-causing variants from whole-exome or whole-genome sequencing data. Since the HPO was first introduced in 2008, its users have become both more numerous and more diverse. To meet these emerging needs, the project has added new content, language translations, mappings and computational tooling, as well as integrations with external community data. The HPO continues to collaborate with clinical adopters to improve specific areas of the ontology and extend standardized disease descriptions. The newly redesigned HPO website (www.human-phenotype-ontology.org) simplifies browsing terms and exploring clinical features, diseases, and human genes.

          Most cited references (40)


          Resolution of Disease Phenotypes Resulting from Multilocus Genomic Variation.

          Background: Whole-exome sequencing can provide insight into the relationship between observed clinical phenotypes and underlying genotypes.

          Methods: We conducted a retrospective analysis of data from a series of 7374 consecutive unrelated patients who had been referred to a clinical diagnostic laboratory for whole-exome sequencing; our goal was to determine the frequency and clinical characteristics of patients for whom more than one molecular diagnosis was reported. The phenotypic similarity between molecularly diagnosed pairs of diseases was calculated with the use of terms from the Human Phenotype Ontology.

          Results: A molecular diagnosis was rendered for 2076 of 7374 patients (28.2%); among these patients, 101 (4.9%) had diagnoses that involved two or more disease loci. We also analyzed parental samples, when available, and found that de novo variants accounted for 67.8% (61 of 90) of pathogenic variants in autosomal dominant disease genes and 51.7% (15 of 29) of pathogenic variants in X-linked disease genes; both variants were de novo in 44.7% (17 of 38) of patients with two monoallelic variants. Causal copy-number variants were found in 12 patients (11.9%) with multiple diagnoses. Phenotypic similarity scores were significantly lower among patients in whom the phenotype resulted from two distinct mendelian disorders that affected different organ systems (50 patients) than among patients with disorders that had overlapping phenotypic features (30 patients) (median score, 0.21 vs. 0.36; P = 1.77×10⁻⁷).

          Conclusions: In our study, we found multiple molecular diagnoses in 4.9% of cases in which whole-exome sequencing was informative. Our results show that structured clinical ontologies can be used to determine the degree of overlap between two mendelian diseases in the same patient; the diseases can be distinct or overlapping. Distinct disease phenotypes affect different organ systems, whereas overlapping disease phenotypes are more likely to be caused by two genes encoding proteins that interact within the same pathway. (Funded by the National Institutes of Health and the Ting Tsung and Wei Fong Chao Foundation.)
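
          The abstract above does not say which similarity measure was used. As a rough illustration of how ontology terms can yield a similarity score between two diseases, the sketch below uses a swapped-in, simple measure: Jaccard overlap of the ancestor-closed term sets. The hierarchy and term names are hypothetical placeholders, not real HPO terms, and the function names are assumptions made for this example.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy: term -> set of direct parents.
# These labels are placeholders, not real HPO identifiers.
PARENTS: Dict[str, Set[str]] = {
    "abnormality": set(),
    "cardiac": {"abnormality"},
    "neuro": {"abnormality"},
    "arrhythmia": {"cardiac"},
    "cardiomyopathy": {"cardiac"},
    "seizure": {"neuro"},
}


def ancestors_inclusive(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def closure(terms: Iterable[str]) -> Set[str]:
    """Union of the ancestor-inclusive sets of every term."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors_inclusive(term)
    return result


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard overlap of the two ancestor-closed term sets."""
    closed_a, closed_b = closure(a), closure(b)
    union = closed_a | closed_b
    return len(closed_a & closed_b) / len(union) if union else 0.0
```

          With this toy hierarchy, two diseases annotated within the same organ system (`{"arrhythmia"}` and `{"cardiomyopathy"}`) score higher than two diseases annotated in different systems (`{"arrhythmia"}` and `{"seizure"}`), which mirrors the direction of the difference reported in the abstract, though not its actual values.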

            The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information

            Background Mammalian phenotypes are complex and the term itself is imprecise. Generally, we use the word phenotype in referring to the appearance or manifestation of a set of traits in an individual that result from the combined action and interaction of genotype and environment. Because mouse is the premier model organism for the study of human biology and disease, the goal of comparative phenotyping and building new animal models through genetic engineering holds great promise. The mouse has distinct advantages for studies that translate to humans. It is a small, short-lived mammal with a fully sequenced genome in which all life stages can be accessed, and for which myriad tools are available for precisely experimentally manipulating its genome. Further, the large collection of inbred strains of mice and the controlled environment in which the animals live provides the ability to confirm phenotype observations and to systematically perturb environmental factors and genetic input to measure effects under defined conditions. Current international efforts to 'make a mutation' for every gene through mutagenesis [1] and genetic engineering [2,3] make it imperative for phenotype data to be represented in standard descriptive formats to enable computational analysis and comparison. Mammalian phenotypes are frequently genetically complex. Mutation of even a single gene almost always produces pleiotropic effects. Conversely, non-allelic mutations can produce indistinguishable phenotypes. Modifier genes and epistatic interactions can markedly alter the phenotype. Combining different allelic combinations of different genes can produce unique phenotypes not found in the single-gene mutation genotype. Imprinting of genes can dramatically affect phenotype. Mutations expressed in different inbred strains of mice can manifest as an increase or decrease of severity or penetrance of the corresponding phenotype. 
Quantitative trait loci (QTL) can contribute in complex nonlinear ways to the phenotype. In addition, mutations that are 'genomic' in nature, either disrupting or deleting multiple genes or occurring in intergenic regions, can produce distinct phenotypes and challenge us to think beyond gene effects to genomic effects. The outcome of these complex interactions can be dissected and reproducibly examined by characterizing inbred strains that represent the combined phenotype of the 'whole-genome' genotype in its environmental context. The Mouse Genome Database (MGD) at the Mouse Genome Informatics website [4,5] serves as the model organism database for mouse, representing the genetics, genomics and biology of the mouse and as a community resource for mammalian studies. Significant reorganization and modeling of phenotypes is now underway to support these data robustly, to represent phenotypes in ways that are computationally accessible, and to provide human interfaces to these data that will enable knowledge building and hypothesis generation. One component of this work is the development of the Mammalian Phenotype (MP) Ontology, a structured vocabulary that will aid in standardizing annotations and, with its concept definitions, unambiguously describe phenotypic observations. Results and discussion The problems of text Written descriptions of phenotypes in higher organisms reflect the complexity of the subject, the richness of language, and the phenomenal diversity that these data represent. While text descriptions are commonly used in publications describing phenotype, and have been the basis of electronically accessible phenotypic descriptions (for example, Online Mendelian Inheritance in Man (OMIM) [6] and the Mouse Locus Catalog (MLC) [7]), text is unreliable for searching, either manually or computationally. 
From the user's perspective, even the best full-text search including Boolean operators will miss appropriate records (false negatives) and return unwanted records (false positives). Consider the example in Table 1 where searches were done to find spontaneous mutations in which mice were entirely or partially devoid of hair/fur. To obtain a complete result, the user would need to use a number of search terms and synonyms. The wording within the text depends upon the author of the record and his/her particular word usage and editorial style. A minimum of four search terms is needed to recover the 27 relevant mutations displayed in this table and it cannot be ascertained if this is a complete set of mutations for this phenotype. Conversely, the user is returned with 23 irrelevant results. Irrelevant results can be returned for many reasons including, but not limited to, the following: the author of the record is contrasting the phenotype of a mutation in one gene with a mutation in another gene; the author is making a statement that includes the negation of the trait; the match is based on gene name rather than phenotype; the mutant was used as a linkage marker to genetically map another gene. A further detriment to database text records is their difficulty to update and maintain. As new information is learned about a phenotypic mutant, the record must be continually rewritten. Although this practice might be sustained for a small number of records, it does not scale when thousands of mutant records are considered. The alternative of simply adding on another paragraph to existing text records becomes confusing, with potentially conflicting information and different writing styles appearing in one textual description, and unwieldy, with more and more text that may no longer represent a logical synthesis. Nomenclatures, vocabularies and ontologies Formal nomenclatures for genes, mutant alleles and inbred strains of mice have existed since the 1940s [8,9]. 
The MGD [4] serves as the authoritative source for the names and symbols associated with mouse genes, alleles and strains. The advantage of applying such nomenclatures has been increasingly recognized as genomes become better defined and the realized power of comparative genomics allows homologous and orthologous gene relationships to be explicitly defined. At present, human, mouse and rat gene nomenclatures operate in parallel, using coordinated symbols for all three species' genes. In addition, mouse and rat strain nomenclatures were merged to one standard strain nomenclature recently, making strain identity and nomenclature conventions consistent. Nomenclature guides for mouse and rat genes, mutant alleles, and strains are available online and regularly revised based on international nomenclature committees' reviews [10]. Beyond nomenclatures, which are key to object identities and relationships, are vocabularies that can be used to describe broader concepts and categorizations. Vocabularies can take many forms, including simple lists of controlled terms, such as the cytogenetic band designations used to name the bands defined by chromosome staining or the classes of genetic markers, such as gene, pseudogene, expressed sequence tag (EST), and so forth. The annotation of complex biological data and concepts requires more than lists and simple vocabularies. Ontologies, or 'descriptions of what there is', contain both concepts, with precise meanings, and relationships among those concepts. As such ontologies are able to support descriptions of complex biology and are useful in making these data more amenable to computational analyses. The first widely used ontology developed and adopted in the biological domain is the Gene Ontology (GO) [11-13] which contains concepts of molecular function, cellular localization and biological process for annotating the functional aspects of genes. 
The GO is structured as a directed acyclic graph (DAG), where each vocabulary term (node) may have both multiple parent term and multiple child term relationships. MGD uses GO extensively for gene annotation [14]. In addition, MGD has adopted the Mouse Embryo Anatomy Nomenclature Database [15] and the Anatomical Dictionary for the Adult Mouse [16] for annotating data that include anatomical attributes, such as tissue sources for clones and phenotypes. The Gene Expression Database (GXD) [17], integrated with MGD through the Mouse Genome Informatics (MGI) system [4], applies these anatomical ontologies as a central concept in the description of expression data. Mammalian Phenotype Ontology Although the need for vocabularies as key components to consistent phenotype annotations for mammals has been recognized for some time [18], and many smaller controlled vocabularies have been implemented to describe various aspects of phenotype in MGD (for example, class of mutation, embryonic stem (ES) cell lines used for generating targeted mutations, type of inheritance), much of the data has remained in text form. Over the past two years, the Mammalian Phenotype (MP) Ontology has emerged to more precisely describe phenotypes, and to allow easier access to phenotype-sequence interactions. Our goal is to describe the richness of phenotypes as precisely as they are known, recognizing that phenotype data are by nature complex and usually incomplete. Taking advantage of structural properties of a DAG, we have the ability to annotate phenotypes to the level of data resolution available, whether general or very specific and the ability to query with a high-level term, returning all phenotypes containing annotations to that term or to terms more specific than the query term. 
Thus, one can query for 'respiratory signs/symptoms' and retrieve all phenotypes annotated to this term and its hierarchical 'children' (abnormal breathing, abnormal respiratory sounds, anoxia, apnea, dyspnea, hypercapnia, and so on), or specifically request annotations to any of these sub-terms. The top level terms of the MP Ontology include physiological systems, behavior, developmental phenotypes and survival/aging. Physiological systems branch into morphological and physiological phenotypes at the level immediately below. A browser to view the ontology is available at [19] (Figure 1). In this browser the DAG structure is flattened into a hierarchy, with multiple hierarchies representing unique paths to a term displayed sequentially. MP terms and synonyms can be searched or users can browse the ontology starting from the high-level terms and open levels continuously to increasingly granular terms. Each MP ontology term has a unique identifier, a definition and synonyms. In the term detail pages, these data and the number of hierarchical paths of the vocabulary where the terms appear are displayed. A plus sign following the term indicates that children of this term exist. In this figure, displayed next to the term, is a link indicating the number of annotation instances in MGD using this term or children of this term. This feature, due to be publicly available in early 2005, will greatly improve phenotype-centric searching in MGD. Developing the MP vocabulary To initiate the vocabulary, we first developed a high-level categorization of phenotypes consisting of approximately 100 terms, such as heart/cardiovascular dysmorphology and skeletal axial defects. As we used this list for annotations, terms were refined and general organizing principles for the MP vocabulary were developed. An important component of our approach has been to address two practical implementation questions. 
From the biologist's perspective, the question is what term would be used to describe a specific phenotypic trait. From the curation perspective, we ask what terms reflect biological reality and maximize curator productivity. From a purely ontological perspective, every trait could be broken down into a core object, such as 'cornea' or 'gastrulation', defined by anatomical, behavioral or physiological terms, and a series of attribute vocabularies that describe the quality, quantity and character of a trait. For the practical reason of needing robust terms to describe phenotypes up-front to speed curation and the problem of losing biological meaning, particularly for clinical or dysmorphology terms, when terms are completely deconstructed (that is, the sum of the parts is less than the term itself), we have chosen to use compound terms in the MP Ontology. A few examples of terms where it is difficult to preserve the full biological meaning once they are deconstructed are shown in Table 2. In addition, it should be noted that each of these terms requires multiple annotations to recover all aspects that the single term provides. Use of complex terms in the MP Ontology, however, does not preclude also storing the decomposed version should this later prove desirable (see PATO model discussed in [20]). More important, the MP Ontology can currently hold, for each term, database cross-references to other ontologies. This is a common practice in GO when compound terms are developed. For the MP Ontology, these cross-references include anatomical terms from the Mouse Anatomy ontologies [15,16] and the GO process terms [21]. Three major strategies are being pursued to further develop the vocabulary itself. First and most important is through the ongoing process of curating phenotype data. As new phenotypic traits are described and published, the need for new terms is recognized. 
New terms added in this way may be a simple addition to an existing hierarchical path or may result in the addition of entire new branches in the hierarchy. Second, collaborative efforts between the MGD phenotype curators, the mouse mutagenesis centers and the rat genetics community identify new specific terms and suggest improved organization of terms within particular hierarchical branches. Third, we are recruiting individuals with expertise in specific biological domains to review and evaluate sections of the vocabulary for accuracy, completeness and systematic arrangement. The MP Ontology is a work in progress and remains incomplete in some areas. We welcome the participation of the mammalian research community so that the most useful, definitive and universally applicable terms will be included. Information can be obtained by sending e-mail to pheno@informatics.jax.org. While common pathological and clinical terms are used in the MP Ontology, considerations for term placement within the structure and for precise terminology is often derived from comparison with other open biological ontologies (OBO) [22]. Recently, a cell-type ontology has become available [23] and a comparison of terminology to this ontology has not yet been completed. We are working with the mutant mouse pathology database Pathbase [24,25] to map and cross-reference terms from their Pathology Ontology. Vocabulary tools The MP Ontology was built as a DAG using the DAG-Edit software written by John Richter and Suzanna Lewis [26]. The MP Ontology is updated daily and can be browsed or searched online at [19]. MP files also are available in flat file format and OBO format from our ftp site [27] and are posted at the OBO site [28]. Phenotype data annotation Phenotypes are described in the MGD relative to the genotype of the individual. 
Genotype objects specifically consist of one or more allele pairs describing mutations or QTLs and the genetic background strain(s) where the phenotype was observed. Each phenotype annotation associates a MP Ontology term with a genotype/strain and the reference or data source supporting this assertion. Additional modifying text may be annotated to describe detail that is not easily standardized. Examples include experimental conditions, age of onset and incidence, and trait penetrance, among others. The annotation note may also include specifics of the phenotype where such details are deemed to be too case specific to be a MP term. In addition, genotypes are associated with OMIM where a particular mouse genotype is a model for human diseases and syndromes. Figure 2 shows the portion of one phenotype record that uses the MP Ontology. Conclusions The MP Ontology and annotation schema was designed to minimize curatorial time, yet remain precise enough to describe phenotypic data. It supports robust phenotypic annotations and querying capabilities for mouse phenotype data. While this vocabulary is far from complete, we have designed strategies for its continued development as a collaborative effort for supporting the representation of existing mutations and those that continue to be created. As of 1 November 2004, over 11,150 phenotypic alleles representing mutations in 5,214 unique genes had been catalogued in MGD. For these alleles, 9,696 genotype records exist, with 21,556 phenotypic annotation instances. The MP Ontology is also used in phenotypic data annotations at the RGD [29]. As our database groups continue to accumulate annotations, it will be possible to mine these data to ask interesting questions about similarities and differences in comparable allele effects between the species, as well as within species. 
Comparative phenotype data will potentially uncover new modifier effects and point to new pathway relationships and genetic networks tied to disease processes. The MP Ontology will be critical for enabling computational analyses and providing a framework for improved web views and other human-comprehensible displays for the research community.
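
              The MP Ontology text above describes two mechanisms that fit together: an annotation that links a genotype, an ontology term, a supporting reference and optional note text, and a query on a high-level term that also returns annotations to more specific terms. The following is a minimal sketch of that idea, not MGD's implementation. The child labels under 'respiratory signs/symptoms' come from the text; the extra subtype node, the genotype names, the references and the 'abnormal coat' label are hypothetical, and a real system would key on term identifiers rather than labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Term -> direct children. "hypothetical apnea subtype" is invented here
# only to show that the query is transitive.
CHILDREN: Dict[str, Set[str]] = {
    "respiratory signs/symptoms": {
        "abnormal breathing",
        "abnormal respiratory sounds",
        "apnea",
        "dyspnea",
    },
    "apnea": {"hypothetical apnea subtype"},
}


@dataclass(frozen=True)
class Annotation:
    genotype: str   # allele pair(s) plus background strain (free-form here)
    term: str       # ontology term label
    reference: str  # reference or data source supporting the assertion
    note: str = ""  # optional modifying text, e.g. age of onset


def descendants_inclusive(term: str) -> Set[str]:
    """Return the term plus every term reachable through child links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(CHILDREN.get(current, set()))
    return seen


def query(term: str, annotations: List[Annotation]) -> List[Annotation]:
    """Annotations to the query term or to any more specific term."""
    wanted = descendants_inclusive(term)
    return [a for a in annotations if a.term in wanted]


# Hypothetical annotation records for illustration only.
ANNOTATIONS = [
    Annotation("genotype-A", "apnea", "ref-1"),
    Annotation("genotype-B", "hypothetical apnea subtype", "ref-2", note="late onset"),
    Annotation("genotype-C", "abnormal coat", "ref-3"),
]
```

              Calling `query("respiratory signs/symptoms", ANNOTATIONS)` returns the records for genotype-A and genotype-B but not genotype-C, because the traversal follows child links from the query term downward. Because a DAG allows a term to have several parents, the `seen` set also prevents a term from being visited twice when it is reachable along more than one path.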
              Is Open Access

              An ontology for cell types

              Background One of the most challenging problems now facing the model organism databases is the formal description of phenotypic data. While some databases, for example those for mouse (Mus musculus) [1], corn (Zea mays) [2] and fruit fly (Drosophila melanogaster) [3], include a rich heritage of data describing the phenotypes of mutants, and some progress is being made to bring these data into a well structured computable representation [3-5], the annotation of these phenotypes is hampered by a lack of structured information describing a variety of other biological objects, including cell types. A structured vocabulary of cell types is also required by databases for the description of other biological objects, such as gene-expression data. In addition, using the same concepts for the description of these data in all of these databases would facilitate interoperability among them. To address these needs, we have developed an ontology that describes the cell types of the major model organisms, both animal and plant. Its use will allow a biologist to query a single database with such questions as: list all of the cell types in mouse that express the Notch gene and all of the cell types in Drosophila and Caenorhabditis elegans that express the closest homolog of this gene; list all of the genes in mouse, rat, human and zebrafish that are expressed in the cell type Schwann_cell; CL:0000218; list all of the genes in D. melanogaster and C. elegans that have a mutant phenotype in the cell types that develop from the cell type myoblast; CL:0000056. The use of the cell ontology will thereby promote the de facto integration of data from diverse databases. Since the development of the Gene Ontology (GO) for the annotation of attributes of gene products [6], many ontologies have been developed in the model organism informatics community. Several of these are available, in a choice of common formats, from the Open Biological Ontologies (OBO) site [7]. 
They include comprehensive developmental and anatomical ontologies for many model organisms (for example, mouse, Drosophila, Arabidopsis thaliana and C. elegans), and ontologies for mouse pathology and human disease. There are several other ontologies that include cell types such as Systematized Nomenclature of Medicine (SNOMED) [8], the Foundational Model of Anatomy (FMA) [9], the anatomy ontologies used in model organism databases at the OBO site [7], vocabularies used by the resources that hold cell lines such as the American Type Culture Collection (ATCC) or the European Collection of Cell Cultures (ECACC) [10,11], and others [12,13]. Our approach for handling cell types differs from that adopted by these resources. First, SNOMED, FMA and the species-specific anatomy ontologies explicitly assume that the cell types they include are associated with one particular organism. Their identifiers cannot therefore be used to annotate cell types from other organisms, even if these cell types are essentially identical to those in the organism-specific ontologies. Second, these resources, together with those that hold cell lines (for example, ECACC and ATCC), tend to define cell types as constituents of tissues rather than provide phenotypic information about their attributes; the knowledge that they encapsulate is severely limited. Third, some ontologies do not have publicly available identifiers for each term; hence they cannot be used for general annotation [10,11]. The Plant Ontology [14] provides a cell type node that shares some of the organizing principles of our cell ontology, but it is limited to those cell types found in plants. 
For all these reasons, we set out to produce an organism-independent ontology of cell types based on their properties (such as functional, histological and lineage classes) and report here the availability on the Open Biological Ontologies site [7] of this ontology, which incorporates the cell types possessed by a broad range of phyla and is defined by a rich set of criteria. Results The ontology The first design decision was whether we should attempt to integrate cell types from all phyla within a single ontology or build independent ontologies for different taxonomic groups. The former has the great advantage of facilitating de facto integration of data from diverse databases, as described above. This approach does, however, pose conceptual problems: for example, are a mammalian 'muscle_cell' and a nematode 'muscle_cell' homologous? In this particular example we have little doubt that the answer is 'yes'; both of these cell types are evolutionary descendants of the first metazoan's 'muscle_cell'. In other cases, however, matters are not quite as straightforward, a plant 'hair_cell', a 'hair_cell' of the mammalian cochlea and an insect 'hair_cell' are probably not homologous, despite some similarities in their functions and genes expressed within them [15]. Despite these problems in building an 'integrated' cell-type ontology, the advantages, were we to succeed, outweigh them, and we have therefore taken this approach to develop a single ontology that integrates cell types from different phyla. The ontology consists of concepts or terms (nodes) that are linked by two types of relationships (edges). This means that the ontology appears as a complex hierarchy (technically known as a directed acyclic graph, or DAG) where a given term (or concept) may not only have several children, but also several parents. The parent and child terms are connected to each other by is_a and develops_from relationships. 
The former is a subsumption relationship, in which the child term is a more restrictive concept than its parent (thus chondrocyte is_a mesenchyme_cell). The latter is used to code developmental lineage relationships between concepts, for example that a hepatocyte develops_from a mesenchymal_cell. The is_a relationship implies inheritance, so that any properties of the parent concept are inherited by its children; the develops_from relationship carries no inheritance implications. The rules for building the ontology are the same as those defined by the GO Consortium. That is, each concept in the Cell Ontology has an identifier with the syntax CL:nnnnnnn, where nnnnnnn is a unique integer and CL identifies the Cell Ontology (concepts should always be cited with their full identifier when being used in the context of a database). In addition, if there are precisely equivalent terms in other databases, for example in the Fungal Anatomy [16], Arabidopsis [17], Plant Ontology [14] or FlyBase databases [3], then the unique identifiers from these databases are included in the Cell Ontology. Most concepts in the Cell Ontology are provided with free-text definitions and may have one or more synonyms. Within the context of this ontology, synonyms are precise; a concept and its synonym can be exchanged without changing the concept's meaning. We use the same stratagem as does the GO when we have concepts that are lexically identical but have different meanings in different communities [18]. Thus, it is far from obvious that vertebrate and invertebrate pigment cells are homologous and these concepts are therefore described as pigment_cell_(sensu_Vertebrata) and pigment_cell_(sensu_Nematoda_and_Protostoma), respectively. The two top-level nodes of the Cell Ontology are cell_in_vivo and experimentally_modified_cell. The former includes cell types that occur in nature, the latter those that are experimentally derived, including cell lines and such constructs as protoplasts. 
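
              The distinction drawn above between is_a (which implies inheritance) and develops_from (which does not), together with the CL:nnnnnnn identifier syntax, can be sketched in a few lines. This is an illustrative sketch only: the two relationships are the examples given in the text, the property name is a hypothetical placeholder, and the assumption that identifiers carry exactly seven digits is inferred from the pattern and from examples such as CL:0000218.

```python
import re
from typing import Dict, Set

# Assumption: identifiers are "CL:" followed by exactly seven digits.
CL_ID_PATTERN = re.compile(r"^CL:\d{7}$")

# Relationship examples taken from the text; each maps child -> parents.
IS_A: Dict[str, Set[str]] = {"chondrocyte": {"mesenchyme_cell"}}
DEVELOPS_FROM: Dict[str, Set[str]] = {"hepatocyte": {"mesenchymal_cell"}}

# Hypothetical property assignments, for illustration only.
PROPERTIES: Dict[str, Set[str]] = {
    "mesenchyme_cell": {"example_property"},
    "mesenchymal_cell": {"example_property"},
}


def is_valid_cl_id(identifier: str) -> bool:
    """Check an identifier against the assumed CL:nnnnnnn syntax."""
    return CL_ID_PATTERN.match(identifier) is not None


def inherited_properties(term: str) -> Set[str]:
    """Collect a term's own properties plus those of its is_a ancestors.

    develops_from links are deliberately not followed, because that
    relationship carries no inheritance implications.
    """
    props = set(PROPERTIES.get(term, set()))
    for parent in IS_A.get(term, set()):
        props |= inherited_properties(parent)
    return props
```

              Here `inherited_properties("chondrocyte")` picks up the parent's property through is_a, while `inherited_properties("hepatocyte")` returns an empty set even though its develops_from source has the same property.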
Experimentally derived cells are under-represented in the current version of the ontology. Naturally occurring cells are classified both by organism-independent categories and by organism (animal cells, plant cells, prokaryotic cells). The organism-independent classification of cells follows several different criteria that include: 'function' (for example, electrically_excitable_cell, secretory_cell, photosynthetic_cell), histology (for example, epithelial_cell, mesenchyme_cell), lineage (for example, ectodermal_cell, endodermal_cell) and ploidy (for example, haploid_cell, polyploid_cell). The present version of the Cell Ontology has an average 'depth' of about 10 nodes. The richness of the ontology can be illustrated by example (Figure 1). Kupffer cells are specialized vertebrate macrophages of the reticuloendothelial system. They function to filter small foreign particles (including bacteria) and old reticulocytes from the blood. In the Cell Ontology they are to be found by their function (they are a type of defensive_cell), by their lineage (they are derived from a mesodermal_cell derived from a hematopoietic_stem_cell, itself a type of stem_cell), by their morphology (they are a type of circulating_cell) and by their organism (they are a type of animal_cell). Discussion Ontologies in bioinformatics are intended to capture and formalize a domain of knowledge, and the ontology reported here attempts to do this within the domain of cell types. It is designed to be useful in the sense that a researcher should be able to find, in a rapid and intuitive way, any cell type in any of the major model organisms and, having found it, learn a considerable amount about that cell type and its relationships to other biological objects. 
A core feature of the ontology, and one that differentiates it from other resources that contain cell types such as SNOMED and the FMA [8,9], and the Drosophila and Arabidopsis ontologies [3,17], is that the cell ontology explicitly sets out to include cell types from all the major model organisms within a common framework. In addition, it also seeks to incorporate a great deal of phenotypic information about these cell types and is thus far more comprehensive in its cellular detail than these other resources. The intention is that the new cell-type ontology should provide organism-independent knowledge as well as cell-type unique identifiers (ID) that can be incorporated into any database holding cell-type-associated knowledge. The formalized structure of the ontology, together with its set of unique IDs, will allow curators to incorporate cell-type data into their databases, integrate the data with the knowledge encapsulated in the ontology, and use the IDs to interoperate with other databases. While we expect such bioinformatics applications to be its immediate use, we hope that, in the longer term, all biologists will find the ontology useful. The expected short-term use of the ontology will thus be in cataloguing phenotypes and gene expression patterns. Indeed, it is quite surprising that those who work with model organisms still lack the bioinformatics resources needed to catalogue, archive and access the details of the phenotypes emerging from mutant screens and natural variations. A robust representation of normal and mutant phenotypes in all of the model organisms will require ontologies for a wide range of macroscopic properties (pathology, anatomy, abnormal quantifiers, and so on) and we view the cell ontology as a component of this programme that should be useful in cataloguing phenotypes (and other attributes) associated with cell types. 
In the long term we expect that molecular biology and biological databases will move beyond being gene-centric and that biological mechanisms will be studied at a more integrated level. Cells are the biological units from which tissues, organs and organ systems are built. A rich and explicit description of cell types across phyla that is adopted by biological databases will help facilitate this transition.
Finally, it should be pointed out that, like many such resources, this ontology is not complete: although it contains all the common cell types, there will certainly be some that have been omitted. Most importantly, although many of the cell types are fully described by function, morphology, organism, and so on, others are inadequately described and more relationships need to be made. A particular weakness is that the category identified as experimentally_modified_cell has yet to be populated, and doing this will involve consideration of the various cell lines held in the major collections. As with other community resources, community input is essential for the development and maintenance of the Cell Ontology; biologists with comments and additions are therefore welcome to contribute to the ontology and should contact the curator at ashburner@ebi.ac.uk.
Materials and methods
The ontology includes the major cell types from the major model organisms (for example, human, mouse, Drosophila, Caenorhabditis, zebrafish, Dictyostelium discoideum, Arabidopsis, fungi and prokaryotes). These cell types have been collated from our own knowledge, from major textbooks (for example [20-22]), from the embryo and anatomy ontologies available on the OBO site [7], and from colleagues (who are thanked in the acknowledgements). The ontology currently holds some 680 cell types, together with their synonyms and, in most cases, text definitions.
The ontology was constructed using the open source Java tool OBO-Edit (previously known as DAG-Edit) [23], which is convenient for building ontologies that are consistent with the GO formalism. The resulting ontology is available in both the GO 'flat-file' format [24] and the newly defined 'OBO format' [25], and can easily be viewed using OBO-Edit or the open source Java tool COBrA [26].
Availability
The Cell Ontology is available from the OBO site [19]. Following the cell.obo link will take the user to a page on which the current version of the ontology, and archived older versions, can be viewed (view) or downloaded (download). Differences between the current and previous versions can be seen by following the Diff to link.
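For readers who want to inspect a downloaded file without OBO-Edit or COBrA, the sketch below reads `[Term]` stanzas from OBO-format text and collects the `id`, `name` and `is_a` tags. It is a minimal reader written for illustration, not a full implementation of the format: the function name `parse_obo_terms` is hypothetical, other tags (definitions, synonyms, relationships) are ignored, and the sample stanzas use made-up `EX:` identifiers rather than real Cell Ontology IDs.

```python
def parse_obo_terms(text):
    """Collect id, name and is_a parents for each [Term] stanza in OBO text."""
    terms = {}
    current = None  # the stanza being filled, or None outside a [Term] stanza
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"id": None, "name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines and non-Term stanzas are ignored
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! parent name" comment, if present.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


# Made-up sample text; the EX: identifiers are placeholders, not real IDs.
SAMPLE = """
format-version: 1.2

[Term]
id: EX:0000001
name: cell

[Term]
id: EX:0000002
name: stem_cell
is_a: EX:0000001 ! cell

[Typedef]
id: derived_from
name: derived_from
"""

if __name__ == "__main__":
    for term_id, term in parse_obo_terms(SAMPLE).items():
        print(term_id, term["name"], term["is_a"])
```

To use it on a real download, the file contents would be read into a string and passed to the same function; the local file name depends on where the ontology was saved.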

                Author and article information

                Journal
                Nucleic Acids Res
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                08 January 2019
                22 November 2018
                22 November 2018
                Volume: 47
                Issue: Database issue
                Pages: D1018-D1027
                Affiliations
                [1 ]Charité Centrum für Therapieforschung, Charité—Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin 10117, Germany
                [2 ]Einstein Center Digital Future, Berlin 10117, Germany
                [3 ]Monarch Initiative, monarchinitiative.org
                [4 ]The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
                [5 ]Oregon Health & Science University, Portland, OR 97217, USA
                [6 ]Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
                [7 ]Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
                [8 ]European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
                [9 ]Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
                [10 ]William Harvey Research Institute, Queen Mary University of London
                [11 ]UCL Genetics Institute, University College London
                [12 ]UCL Institute of Ophthalmology, University College London
                [13 ]Renaissance Computing Institute, University of North Carolina at Chapel Hill
                [14 ]Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, WA, Australia
                [15 ]School of Paediatrics and Telethon Kids Institute, University of Western Australia, Perth, WA, Australia
                [16 ]Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA, Australia
                [17 ]Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia
                [18 ]The Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia
                [19 ]SimulConsult, Chestnut Hill, MA, USA
                [20 ]Neurogenetics Research Group, Vrije Universiteit Brussel, Brussels, Belgium
                [21 ]Pediatric Neurology Unit, Department of Pediatrics, UZ Brussel, Brussels, Belgium
                [22 ]Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
                [23 ]Centre for Computational Medicine, Hospital for Sick Children and Department of Computer Science, University of Toronto, Toronto, Canada
                [24 ]National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
                [25 ]Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin & Marquette University, 8701 Watertown Plank Road Milwaukee, WI 53226, USA
                [26 ]Bioscientia GmbH, Ingelheim, Germany
                [27 ]CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona 08028, Spain
                [28 ]Universitat Pompeu Fabra (UPF), Barcelona, Spain
                [29 ]University of Manchester & Manchester Royal Eye Hospital, Manchester, UK
                [30 ]ICF, Rockville, MD, USA
                [31 ]National Center for Advancing Translational Sciences, Office of Rare Diseases Research, National Institutes of Health, Bethesda, MD, USA
                [32 ]INSERM, US14—Orphanet, Plateforme Maladies Rares, 75014 Paris, France
                [33 ]The Jackson Laboratory, Bar Harbor, ME, USA
                [34 ]Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
                [35 ]Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA
                [36 ]Department of Genetics, University Medical Center Utrecht, the Netherlands
                [37 ]Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
                [38 ]Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
                [39 ]Mount Sinai School of Medicine, New York, NY, USA
                [40 ]Institute of Cardiovascular Science, University College London, UK
                [41 ]Child Neurology Unit, Hospital Universitari Vall d’Hebron, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
                [42 ]Department of Neuropediatrics and Muscle Disorders, Medical Center—University of Freiburg, Faculty of Medicine, Freiburg, Germany
                [43 ]Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada
                [44 ]Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
                [45 ]Centre for Rare Eye Diseases CARGO, SENSGENE FSMR Network, Strasbourg University Hospital, Strasbourg, France
                [46 ]Immunology Service, Department of Laboratory Medicine, NIH Clinical Center, Bethesda, MD, USA
                [47 ]Department of Pediatrics, Division of Allergy Immunology, The Children’s Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, 3615 Civic Center Boulevard, Philadelphia, PA 19104, USA
                [48 ]Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
                Author notes
                To whom correspondence should be addressed. Tel: +1 860 837 2095; Email: peter.robinson@jax.org
                Author information
                http://orcid.org/0000-0002-5316-1399
                http://orcid.org/0000-0001-5208-3432
                http://orcid.org/0000-0001-9969-8610
                http://orcid.org/0000-0002-9353-5498
                http://orcid.org/0000-0002-0839-9955
                http://orcid.org/0000-0002-8688-6599
                http://orcid.org/0000-0003-0355-5581
                http://orcid.org/0000-0003-4557-5492
                http://orcid.org/0000-0001-5356-4174
                http://orcid.org/0000-0002-2085-5773
                http://orcid.org/0000-0002-2810-3445
                http://orcid.org/0000-0003-4606-0597
                http://orcid.org/0000-0001-8721-3022
                http://orcid.org/0000-0002-6889-0121
                http://orcid.org/0000-0002-9791-0064
                http://orcid.org/0000-0002-7284-3950
                http://orcid.org/0000-0003-2324-8001
                http://orcid.org/0000-0003-4308-6337
                http://orcid.org/0000-0003-3691-0324
                http://orcid.org/0000-0002-0736-9199
                Article
                gky1105
                10.1093/nar/gky1105
                6324074
                30476213
                f2f8fe4e-bf21-4229-a86e-1720cf861c7e
                © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 24 October 2018
                : 18 October 2018
                : 17 September 2018
                Page count
                Pages: 10
                Funding
                Funded by: National Institutes of Health 10.13039/100000002
                Award ID: OD #5R24OD011883
                Funded by: Forums for Integrative Phenomics
                Award ID: U13 CA221044-01
                Funded by: NCATS Data Translator
                Award ID: 1OT3TR002019
                Funded by: NCATS National Center for Digital Health Informatics Innovation 10.13039/100006108
                Award ID: U24 TR002306
                Funded by: National Institutes of Health 10.13039/100000002
                Award ID: 1 OT3 OD02464-01 UNCCH
                Funded by: British Heart Foundation 10.13039/501100000274
                Award ID: RG/13/5/30112
                Funded by: Division of Intramural Research NIAID 10.13039/100006492
                Award ID: 01GM1608
                Funded by: European Union’s Horizon 2020 Research and Innovation Programme 10.13039/501100007601
                Award ID: 779257
                Categories
                Database Issue

                Genetics
