      Is Open Access

      Robust Integration of Biodiversity Data by Process- and State-based Representation of Object Histories and Modular Application Architecture


      Biodiversity Information Science and Standards

      Pensoft Publishers


          Abstract

Biodiversity data is obtained by a variety of methodological approaches, including observation surveys, environmental sampling and biological object collection, which employ diverse sample processing protocols and data transformations. While complete and accurate accounts of these data-generating processes are important to enable integration and informed reuse of data, the structure and content of published biodiversity data are currently often shaped by specific application goals. For example, data publishers that export specimen-based data from collection management systems for inclusion in aggregations like those in the Global Biodiversity Information Facility (GBIF) must frequently relax their internal models and produce unnatural joins to fit GBIF’s occurrences-based data structure. Third-party assertions over these aggregated data therefore assume the risk of irreproducibility or concept drift.

Here we introduce process- and state-based representation of object histories as the main organizing principle for data about specimens and samples in collection management software compliant with the Digital Information System for Natural History Data (DINA, Glöckler et al. 2020) (Fig. 1). Specimens, samples and objects in general are subjected to a variety of processes, including planned actions involving the object, such as collecting, preparing, subsampling and loaning. An object state is any particular mode of being of an object at a certain point in time. For example, any one intermediate step in preparing a collected specimen for long-term conservation in a collection would constitute an individual object state. An object’s history is the entire chain of these interrelated processes and states.

We argue that using object histories as the main conceptual modeling paradigm in DINA offers the generality required to accommodate a diverse, open set of use cases in biodiversity data representation, yet also offers the versatility to serve as a basis for use-case-specific data aggregation and presentation. Specifically, a representation based on object histories provides:

• a coherent structure for documenting individual processes and states for any given object and for linking this documentation (e.g., textual descriptions or images pertaining to a given process or state),
• a natural representational structure of the real-world sequence of processes an object participates in and for the data generated in these processes (e.g., a DNA-extraction procedure and sequence information generated on its basis),
• a straightforward structure to link data about related objects (e.g., tissue samples, the biological specimen a bone is derived from) in a network of connected object histories.

The approach is designed to be embedded in DINA’s modular application architecture, so that information on object histories can be accessed via corresponding APIs either through its own interfaces (Fig. 2) or by integration with external web services (Fig. 3). Viewing collection management tasks as part of object histories also informs the delineation of modules that support these tasks with specialized functions and interfaces. It also admits the use of persistent, dereferenceable identifiers for individual processes and states in object histories and for linking their representations to elements in ontologies and controlled vocabularies.

In this contribution to the symposium, DINA's object histories as a main organizing principle for collection object data will be discussed, and the utility of using them in the context of modular application architecture, data federation, and data integration in projects like BiCIKL will be illustrated.
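
The chain of processes and states described in the abstract can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not DINA's actual data model: the class names (`State`, `Process`, `ObjectHistory`), their fields, and all identifiers below are hypothetical. Each process links the state it started from to the state it produced, and a subsample starts its own history from the parent's current state, which gives the network of connected object histories.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class State:
    """A mode of being of an object at a certain point in time (hypothetical model)."""
    identifier: str
    description: str


@dataclass
class Process:
    """An action that takes an object from one state to the next."""
    identifier: str
    kind: str                      # e.g. "collecting", "preparing", "subsampling"
    input_state: Optional[State]   # None for the first process of a history
    output_state: State
    attachments: List[str] = field(default_factory=list)  # linked documents, images, data


@dataclass
class ObjectHistory:
    """The chain of processes and states of one object."""
    object_id: str
    processes: List[Process] = field(default_factory=list)
    derived_from: Optional["ObjectHistory"] = None  # link to a related object's history

    def current_state(self) -> Optional[State]:
        return self.processes[-1].output_state if self.processes else None

    def record(self, identifier: str, kind: str, resulting_state: State,
               attachments: Optional[List[str]] = None) -> Process:
        # The new process starts from whatever state the object is in now.
        process = Process(identifier, kind, self.current_state(),
                          resulting_state, list(attachments or []))
        self.processes.append(process)
        return process

    def subsample(self, object_id: str, process_id: str,
                  resulting_state: State) -> "ObjectHistory":
        # The derived object's history begins at the parent's current state.
        child = ObjectHistory(object_id, derived_from=self)
        child.processes.append(
            Process(process_id, "subsampling", self.current_state(), resulting_state))
        return child

    def states(self) -> List[State]:
        return [p.output_state for p in self.processes]


# Illustrative usage with made-up identifiers.
specimen = ObjectHistory("specimen-1")
specimen.record("p1", "collecting", State("s1", "freshly collected"))
specimen.record("p2", "preparing", State("s2", "fixed for long-term storage"))
tissue = specimen.subsample("tissue-1", "p3", State("s3", "tissue sample taken"))
tissue.record("p4", "DNA extraction", State("s4", "DNA extracted"),
              attachments=["sequence-record-1"])
```

Giving every process and state its own identifier is what would allow each of them to be addressed individually, in line with the abstract's point about persistent identifiers.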


          Most cited references 1

          Is Open Access

          DINA—Development of open source and open services for natural history collections & research

The DINA Consortium (DINA = “DIgital information system for NAtural history data”, https://dina-project.net) is a framework for like-minded practitioners of natural history collections to collaborate on the development of distributed, open source software that empowers and sustains collections management. Target collections include zoology, botany, mycology, geology, paleontology, and living collections. The DINA software will also permit the compilation of biodiversity inventories and will robustly support both observation and molecular data.

The DINA Consortium focuses on an open source software philosophy and on community-driven open development. Contributors share their development resources and expertise for the benefit of all participants. The DINA System is explicitly designed as a loosely coupled set of web-enabled modules. At its core, this modular ecosystem includes strict guidelines for the structure of Web application programming interfaces (APIs), which guarantees the interoperability of all components (https://github.com/DINA-Web). Important to the DINA philosophy is that users (e.g., collection managers, curators) be actively engaged in an agile development process. This ensures that the product is pleasing for everyday use, includes efficient yet flexible workflows, and implements best practices in specimen data capture and management.

There are three options for developing a DINA module:

• create a new module compliant with the specifications (Fig. 1),
• modify an existing code-base to attain compliance (Fig. 2), or
• wrap a compliant API around existing code that cannot be or may not be modified (e.g., infeasible, dependencies on other systems, closed code) (Fig. 3).

All three of these scenarios have been applied in the modules recently developed: a module for molecular data (SeqDB), modules for multimedia, documents and agents data and a service module for printing labels and reports:

• The SeqDB collection management and molecular tracking system (Bilkhu et al. 2017) has evolved through two of these scenarios. Originally, the required architectural changes were going to be added into the codebase, but after some time, the development team recognised that the technical debt inherent in the project wasn’t worth the effort of modification and refactoring. Instead a new codebase was created bringing forward the best parts of the system oriented around the molecular data model for Sanger Sequencing and Next Generation Sequencing (NGS) workflows.
• In the case of the Multimedia and Document Store module and the Agents module, a brand new codebase was established whose technology choices were aligned with the DINA vision. These two modules have been created from fundamental use cases for collection management and digitization workflows and will continue to evolve as more modules come online and broaden their scope.
• The DINA Labels & Reporting module is a generic service for transforming data in arbitrary printable layouts based on customizable templates. In order to use the module in combination with data managed in collection management software Specify (http://specifysoftware.org) for printing labels of collection objects, we wrapped the Specify 7 API with a DINA-compliant API layer called the “DINA Specify Broker”. This allows for using the easy-to-use web-based template engine within the DINA Labels & Reports module without changing Specify’s codebase.

In our presentation we will explain the DINA development philosophy and will outline benefits for different stakeholders who directly or indirectly use collections data and related research data in their daily workflows. We will also highlight opportunities for joining the DINA Consortium and how to best engage with members of DINA who share their expertise in natural science, biodiversity informatics and geoinformatics.
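
The third option above, wrapping a compliant API around code that cannot be modified, is essentially an adapter. The sketch below shows the general idea only. It does not reproduce the DINA Specify Broker or the Specify 7 API: `LegacyCollectionApi`, `CompliantBroker`, the field names and the returned resource shape are all hypothetical stand-ins.

```python
from typing import Any, Dict


class LegacyCollectionApi:
    """Hypothetical stand-in for an existing system whose code may not be changed."""

    def fetch(self, catalog_number: str) -> Dict[str, Any]:
        # Returns a record using the legacy system's own field names.
        return {"CatNo": catalog_number, "Taxon": "Aus bus", "Loc": "example locality"}


class CompliantBroker:
    """Wraps the legacy API and exposes its records in one uniform resource shape."""

    # Mapping from legacy field names to the names the other modules expect (assumed).
    FIELD_MAP = {"Taxon": "scientificName", "Loc": "locality"}

    def __init__(self, legacy: LegacyCollectionApi) -> None:
        self._legacy = legacy

    def get_collection_object(self, catalog_number: str) -> Dict[str, Any]:
        raw = self._legacy.fetch(catalog_number)
        attributes = {new: raw[old] for old, new in self.FIELD_MAP.items() if old in raw}
        return {"data": {"type": "collection-object",
                         "id": raw["CatNo"],
                         "attributes": attributes}}


# Illustrative usage: callers only ever see the broker's resource shape.
broker = CompliantBroker(LegacyCollectionApi())
result = broker.get_collection_object("X-001")
```

The legacy class is left untouched, and all translation lives in the broker, so a consuming module such as a label printer would depend only on the wrapper's output.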

            Author and article information

            Journal: Biodiversity Information Science and Standards (BISS)
            Publisher: Pensoft Publishers
            ISSN: 2535-0897
            Date: September 14 2021
            Volume: 5
            Article DOI: 10.3897/biss.5.75178
            © 2021
