DINA—Development of open source and open services for natural history collections & research

Abstract

The DINA Consortium (DINA = “DIgital information system for NAtural history data”, https://dina-project.net) is a framework for like-minded practitioners of natural history collections to collaborate on the development of distributed, open source software that empowers and sustains collections management. Target collections include zoology, botany, mycology, geology, paleontology, and living collections. The DINA software will also permit the compilation of biodiversity inventories and will robustly support both observation and molecular data.

The DINA Consortium focuses on an open source software philosophy and on community-driven open development. Contributors share their development resources and expertise for the benefit of all participants. The DINA System is explicitly designed as a loosely coupled set of web-enabled modules. At its core, this modular ecosystem includes strict guidelines for the structure of Web application programming interfaces (APIs), which guarantee the interoperability of all components (https://github.com/DINA-Web). Important to the DINA philosophy is that users (e.g., collection managers, curators) be actively engaged in an agile development process. This ensures that the product is pleasing for everyday use, includes efficient yet flexible workflows, and implements best practices in specimen data capture and management.

There are three options for developing a DINA module: (1) create a new module compliant with the specifications (Fig. 1), (2) modify an existing code-base to attain compliance (Fig. 2), or (3) wrap a compliant API around existing code that cannot or may not be modified (e.g., infeasible, dependencies on other systems, closed code) (Fig. 3).

All three of these scenarios have been applied in the recently developed modules: a module for molecular data (SeqDB), modules for multimedia, documents and agents data, and a service module for printing labels and reports.

The SeqDB collection management and molecular tracking system (Bilkhu et al. 2017) has evolved through two of these scenarios. Originally, the required architectural changes were going to be added to the existing codebase, but after some time the development team recognised that, given the technical debt inherent in the project, modification and refactoring were not worth the effort. Instead, a new codebase was created that brings forward the best parts of the system, oriented around the molecular data model for Sanger sequencing and Next Generation Sequencing (NGS) workflows.

In the case of the Multimedia and Document Store module and the Agents module, a brand new codebase was established whose technology choices were aligned with the DINA vision. These two modules have been created from fundamental use cases for collection management and digitization workflows and will continue to evolve as more modules come online and broaden their scope.

The DINA Labels & Reporting module is a generic service for transforming data into arbitrary printable layouts based on customizable templates. To use the module with data managed in the collection management software Specify (http://specifysoftware.org) for printing labels of collection objects, we wrapped the Specify 7 API with a DINA-compliant API layer called the “DINA Specify Broker”. This makes the web-based template engine of the DINA Labels & Reports module available without changing Specify’s codebase.

In our presentation we will explain the DINA development philosophy and outline benefits for different stakeholders who directly or indirectly use collections data and related research data in their daily workflows. We will also highlight opportunities for joining the DINA Consortium and how best to engage with members of DINA who share their expertise in natural science, biodiversity informatics and geoinformatics.
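
The third option, wrapping a compliant API around code that cannot be modified, is essentially the adapter pattern. The following is a minimal Python sketch of that idea, followed by template-based label rendering. It is not taken from the DINA or Specify codebases: the class names (`LegacySystem`, `Broker`), the method names, the resource shape, and the sample record are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from string import Template
from typing import Protocol


class CollectionObjectSource(Protocol):
    """Hypothetical interface that every compliant module would expose."""

    def get_collection_object(self, object_id: str) -> dict: ...


class LegacySystem:
    """Stand-in for an existing system whose code cannot be changed."""

    def __init__(self, rows: dict[int, tuple[str, str, str]]) -> None:
        self._rows = rows

    def fetch_row(self, pk: int) -> tuple[str, str, str]:
        # Legacy shape: (catalog number, taxon name, collector)
        return self._rows[pk]


class Broker:
    """Adapter: exposes the compliant interface on top of LegacySystem."""

    def __init__(self, legacy: LegacySystem) -> None:
        self._legacy = legacy

    def get_collection_object(self, object_id: str) -> dict:
        catalog_number, taxon, collector = self._legacy.fetch_row(int(object_id))
        # Translate the legacy tuple into the (invented) resource shape.
        return {
            "id": object_id,
            "type": "collection-object",
            "attributes": {
                "catalogNumber": catalog_number,
                "scientificName": taxon,
                "recordedBy": collector,
            },
        }


def render_label(source: CollectionObjectSource, object_id: str,
                 template: Template) -> str:
    """Fill a label template from any source that offers the interface."""
    record = source.get_collection_object(object_id)
    return template.substitute(record["attributes"])


# Illustrative usage with a made-up record.
legacy = LegacySystem({1: ("ABC-0001", "Quercus robur", "A. Collector")})
label_template = Template("$catalogNumber | $scientificName | leg. $recordedBy")
print(render_label(Broker(legacy), "1", label_template))
```

The label renderer depends only on the interface, so it works unchanged whether the data comes from a newly written module or from a broker in front of an untouched legacy system.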

Most cited references

SeqDB: Biological Collection Management with Integrated DNA Sequence Tracking

Satpal Bilkhu, Nazir El-Kayssi, Matthew Poff … (2017)

Agriculture and Agri-Food Canada (AAFC) is home to a world-class taxonomy program based on Canada’s national agricultural collections for Botany, Mycology and Entomology. These collections contain valuable resources, such as type specimens for authoritative identification using approaches that include phenotyping, DNA barcoding, and whole genome sequencing. These authoritative references allow for accurate identification of the taxonomic biodiversity found in environmental samples in fields such as metagenomics. AAFC’s internally developed web application, termed SeqDB, tracks the complete workflow and provenance chain from source specimen information through DNA extractions, PCR reactions, and sequencing leading to binary DNA sequence files. In the context of Next Generation Sequencing (NGS) of environmental samples, SeqDB tracks sampling metadata, DNA extractions, and library preparation workflow leading to demultiplexed sequence files. SeqDB implements the Taxonomic Databases Working Group (TDWG) Darwin Core standard (Wieczorek et al. 2012) for Biodiversity Occurrence Data, as well as the Genomic Standards Consortium (GSC) Minimum Information about any (x) Sequence (MIxS) specification (Yilmaz et al. 2011). When coupled with the built-in data standards validation system, this has led to the ability to search consistent metadata across multiple studies. Furthermore, the application enables tracking the physical storage of the aforementioned specimens and their derivative molecular extracts using an integrated barcode printing and reading system. All the information is presented using a graphical user interface that features intuitive molecular workflows as well as a RESTful API that facilitates integration with external applications and programmatic access of the data. 
The success of SeqDB has been due to the close collaboration with scientists and technicians undertaking molecular research involving the national collection, and the centralization of their data sets in an access controlled relational database implementing internationally recognized standards. We will describe the overall system, and some of our lessons learned in building it.
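
The provenance chain described above (source specimen, DNA extraction, PCR reaction, sequence file) can be pictured as a linked list of derivation steps. The Python sketch below illustrates that idea only. It does not reflect SeqDB's actual data model: the `Step` class, its fields, and the identifiers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One item in a hypothetical provenance chain."""

    kind: str                            # e.g. "specimen", "dna_extraction"
    identifier: str                      # made-up identifier for illustration
    derived_from: Optional[Step] = None  # the item this one was produced from


def provenance(step: Step) -> list[str]:
    """Walk back from a derived item to its source; return the chain source-first."""
    chain: list[str] = []
    current: Optional[Step] = step
    while current is not None:
        chain.append(f"{current.kind}:{current.identifier}")
        current = current.derived_from
    chain.reverse()
    return chain


# Illustrative chain with made-up identifiers.
specimen = Step("specimen", "S-1")
extraction = Step("dna_extraction", "E-1", derived_from=specimen)
pcr = Step("pcr_reaction", "P-1", derived_from=extraction)
seq_file = Step("sequence_file", "F-1", derived_from=pcr)

print(" -> ".join(provenance(seq_file)))
```

Because each item stores only a reference to its parent, any sequence file can be traced back to its source specimen without a separate lookup table.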