
OpenBiodiv: an Implementation of a Semantic System Running on top of the Biodiversity Knowledge Graph


          Abstract

We present OpenBiodiv, an implementation of the Open Biodiversity Knowledge Management System. The need for an integrated information system serving the biodiversity community dates back at least to the adoption of the Bouchout Declaration in 2007, which proposes to make biodiversity knowledge freely available as Linked Open Data (LOD). At TDWG2016 (Fig. 1), we presented the prototype of the system, then called the Open Biodiversity Knowledge Management System (OBKMS). The specification and design of OpenBiodiv were outlined by Senderov and Penev (2016); in this talk we would like to showcase its pilot. We believe OpenBiodiv is possibly the first pilot-stage implementation of a semantic system running on top of the biodiversity knowledge graph.

OpenBiodiv has several components.

OpenBiodiv ontology: a general data model allowing the extraction of biodiversity knowledge from taxonomic articles or from databases such as GBIF. The ontology (in preparation for the Journal of Biomedical Semantics; available on GitHub) incorporates several pre-existing models: Darwin-SW (Baskauf and Webb 2016), SPAR (Peroni 2014), the Treatment Ontology, and several others. It defines classes, properties, and rules that interlink these disparate ontologies and make it possible to build a LOD dataset of biodiversity knowledge. New is the Taxonomic Name Usage class, accompanied by a Vocabulary of Taxonomic Statuses (created via an analysis of 4,000 Pensoft articles), which allows the taxonomic status of Latinized scientific names to be inferred automatically. The ontology allows for multiple backbone taxonomies via the introduction of a Taxon Concept class (equivalent to Darwin Core Taxon) and of Taxon Concept Labels as a subclass of biological name.

The Biodiversity Knowledge Graph: a LOD dataset of information extracted from taxonomic literature and databases.
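The automated inference of taxonomic status described for the ontology can be pictured as a lookup of status abbreviations that follow a Latinized name. The following is a minimal Python sketch under stated assumptions: the vocabulary entries are a hypothetical excerpt built from common taxonomic abbreviations (the actual OpenBiodiv vocabulary, derived from 4,000 Pensoft articles, is not reproduced here), and `infer_status`, `name_usage_triples`, and the `ex:` predicates are hypothetical placeholders, not OpenBiodiv ontology IRIs or ropenbio functions. Triples are plain tuples rather than objects from an RDF library.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical excerpt of a status vocabulary (illustrative only):
# abbreviation following a Latinized name -> status label.
STATUS_VOCABULARY = {
    "sp. n.": "new species",
    "sp. nov.": "new species",
    "subsp. n.": "new subspecies",
    "gen. n.": "new genus",
    "gen. nov.": "new genus",
    "comb. n.": "new combination",
    "comb. nov.": "new combination",
    "syn. n.": "new synonym",
    "stat. n.": "new status",
}

Triple = Tuple[str, str, str]


def infer_status(name_usage: str) -> Optional[str]:
    """Return the status label of the first vocabulary term found, or None."""
    # Lower-case and collapse runs of whitespace before matching.
    text = re.sub(r"\s+", " ", name_usage.lower())
    # Try longer terms first so that "subsp. n." is not read as "sp. n.".
    for term in sorted(STATUS_VOCABULARY, key=len, reverse=True):
        if term in text:
            return STATUS_VOCABULARY[term]
    return None


def name_usage_triples(usage_id: str, name: str, context: str) -> List[Triple]:
    """Emit tuple triples for one name usage (hypothetical ex: predicates)."""
    triples = [
        (usage_id, "rdf:type", "ex:TaxonomicNameUsage"),
        (usage_id, "ex:mentionsName", name),
    ]
    status = infer_status(context)
    if status is not None:
        triples.append((usage_id, "ex:hasTaxonomicStatus", status))
    return triples
```

For example, `name_usage_triples("ex:usage1", "Aus bus", "Aus bus sp. n.")` yields a type triple, a name triple, and `("ex:usage1", "ex:hasTaxonomicStatus", "new species")`; a usage without a recognized abbreviation gets no status triple.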
In practice, the graph realizes part of what was proposed during pro-iBiosphere and later discussed by Page (2016). Its main resources are articles, sub-article components (tables, figures, treatments, references), author names, institution names, geographical locations, biological names, taxon concepts, and occurrences. Authors have been disambiguated via their affiliations, using fuzzy logic based on the GraphDB Lucene connector. The graph interlinks:
(1) prospectively published literature, via Pensoft Publishers;
(2) legacy literature, via Plazi;
(3) well-known resources such as geographical places or institutions, via DBpedia;
(4) GBIF's backbone taxonomy, as a default but not preferential hierarchy of taxon concepts;
(5) nomenclators such as ZooBank: OpenBiodiv IDs are matched to nomenclator IDs whenever possible.

Names form two networks in the graph:
(1) a directed acyclic graph (DAG) of supersedence that can be followed to its sinks to infer the currently applicable scientific name for a given taxon;
(2) a network of bidirectional relations indicating the relatedness of names. These names may be compared to the related names inferred on the basis of distributional semantics by the co-organizers of this workshop (Nguyen et al. 2017).

ropenbio: an R package for the RDF-ization of biodiversity information resources according to the OpenBiodiv ontology. It will be submitted to the rOpenSci project. While many of its high-level functions are specific to OpenBiodiv, its low-level functions and its RDF-ization framework can be used for any R-based RDF-ization effort.

OpenBiodiv.net: a front end of the system allowing users to run low-level SPARQL queries as well as to use an extensible set of semantic apps running on top of the Biodiversity Knowledge Graph.

The talk will showcase the progress of the system from the prototype to the pilot stage since TDWG2016.
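Following the supersedence DAG to its sinks amounts to a graph traversal that stops at names with no outgoing edges. A minimal Python sketch, assuming the DAG is held in memory as a hypothetical mapping from each name to the names that supersede it (in OpenBiodiv the edges live in the knowledge graph and would be retrieved with SPARQL, which is not shown). `current_names` is a hypothetical helper, and the genus and species names are placeholders.

```python
from typing import Dict, List, Set


def current_names(name: str, superseded_by: Dict[str, List[str]]) -> Set[str]:
    """Follow supersedence edges from `name` and return the sinks reached.

    A sink is a name with no outgoing edge, i.e. one not superseded by
    anything. If the data record more than one successor, several sinks
    may be returned.
    """
    sinks: Set[str] = set()
    visited: Set[str] = set()
    stack = [name]  # iterative depth-first traversal
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        successors = superseded_by.get(current, [])
        if successors:
            stack.extend(successors)
        else:
            sinks.add(current)
    return sinks
```

With `{"Aus bus": ["Aus cus"], "Aus cus": ["Dus cus"]}`, `current_names("Aus bus", ...)` returns `{"Dus cus"}`, and a name that has not been superseded is its own sink. The visited set keeps the traversal finite even if the data accidentally contain a cycle, although names on such a cycle reach no sink.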
We will focus on the new features and on the web UI, which already allows researchers and other interested parties to use the system, and discuss several possible scenarios, including semantic search and finding related names.
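The related-names scenario can be sketched as a breadth-first search over the bidirectional relatedness network, for instance to expand a semantic search query with names within a few hops. This is a hypothetical in-memory Python illustration; `build_relatedness_index`, `related_names`, and the `max_hops` parameter are assumptions, not part of the OpenBiodiv.net interface, and the names are placeholders.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_relatedness_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Store each relation in both directions, since relatedness is symmetric."""
    index: Dict[str, Set[str]] = {}
    for a, b in pairs:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index


def related_names(name: str, index: Dict[str, Set[str]], max_hops: int = 1) -> Dict[str, int]:
    """Breadth-first search returning each related name with its hop distance."""
    distances = {name: 0}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in index.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[name]  # the query name is not related to itself
    return distances
```

Because the search is breadth-first, each name is recorded with its shortest hop distance, which a search interface could use to rank expansion candidates.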


          Most cited references (5)


Page R (2016). Towards a biodiversity knowledge graph.

Senderov V, Penev L (2016). The Open Biodiversity Knowledge Management System in Scholarly Publishing.

This project aims to develop and implement novel ways of publication, visualization, and dissemination of biodiversity and biodiversity-related data and thus bring the Open Biodiversity Knowledge Management System closer to fruition. In order to do so, we will develop new types of Enhanced Publications (EPs), which will allow automated data import into the manuscript and export from the manuscript and provide dynamic visualizations. These EPs will enable biodiversity researchers and taxonomists to streamline their work and publish more data-rich species descriptions.

Nguyen et al. (2017). Constructing a biodiversity terminological inventory.

The increasing growth of literature in biodiversity presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, making search systems likely to overlook documents mentioning lesser-known variants, which are nonetheless relevant to a query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variations are prevalent. We apply a number of distributional semantic techniques to all of the titles in the Biodiversity Heritage Library, to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, to incorporate into the search query. An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Its exploitation can benefit both users and developers of search engines and text mining applications.

                Author and article information

Journal: Proceedings of TDWG (TDWGProc)
Publisher: Pensoft Publishers
ISSN: 2535-0897
Date: August 07 2017
Volume: 1
Article number: e20084
Article type: Article
DOI: 10.3897/tdwgproceedings.1.20084
Copyright: © 2017
License: http://creativecommons.org/licenses/by/4.0/
