      A Comprehensive and Standards-Aware Common Data Model (CDM) for Taxonomic Research

      Proceedings of TDWG
      Pensoft Publishers


          Abstract

          The EDIT Common Data Model (CDM) (FUB, BGBM 2008) is the centrepiece of the EDIT Platform for Cybertaxonomy (FUB, BGBM 2011, Ciardelli et al. 2009). Building on modelling efforts reaching back to the 1990s, it aims to combine existing standards relevant to the taxonomic domain (many of which were designed for data exchange) with the requirements of modern taxonomic tools. Modelled in the Unified Modelling Language (UML) (Booch et al. 2005), it offers an object-oriented view of the information domain managed by expert taxonomists, and its implementation is independent of the operating system and database management system (DBMS) used. Having been used over the past decade in various national and international research projects with diverse foci, the model has evolved into the common base of a variety of taxonomic projects, such as floras, faunas and checklists (see FUB, BGBM 2016 for a number of data portals created and made publicly available by different projects). The CDM is strictly oriented towards the needs of the community of taxonomic experts. Where requirements are complex, it tries to reflect them reasonably rather than introducing ambiguity or reduced functionality through (over-)simplification. Where simplification is possible, it tries to stay or become simple. Simplification at the model level is achieved by implementing business rules via constraints rather than via typification and subclassing. Simplification at the user interface level is achieved through numerous options for customisation. Because it serves as a generic model for a variety of application types and use cases, it can be adapted and extended by users and developers. It combines static and dynamic typification to allow efficient handling both of complex but well-defined data domains, such as taxonomic classifications and nomenclature, and of less well-defined, flexible domains, such as factual and descriptive data.
Additionally, it allows the creation of more than 30 types of user-defined vocabularies, such as those for taxonomic rank, nomenclatural status, name-to-name relationships, geographic area and presence status. A strong focus is placed on good scientific practice by making the source of almost all data citable in detail and by offering data lineage to trace data back to its roots. It is also easy to reflect multiple opinions in parallel, e.g. differing taxonomic concepts (Berendsohn 1995, Berendsohn & al., this session) or several descriptive treatments obtained from different regional floras or faunas. The CDM attempts to cover comprehensively the data used in the taxonomic domain: nomenclature, taxonomy (including concepts), taxon distribution data, descriptive data of all kinds (including morphological data referring to taxa and/or specimens), images and multimedia data of various kinds, and a complex system covering specimens and specimen derivatives down to DNA samples and sequences (Kilian et al. 2015, Stöver and Müller 2015) that mirrors the complexity of knowledge accumulation in the taxonomic research process. In the context of the EDIT Platform, several applications have been developed on top of the CDM and the library that provides the API and web service interfaces based on it (see Kohlbecker & al. and Güntsch & al., this session). In some areas the CDM is still evolving: although the basic structures are present, questions of application development feed back into modelling decisions. The "no-shortcuts" approach to modelling has repeatedly delayed application development in the past, but it now pays off: the Platform can rapidly adapt to changing requirements from different projects and taxonomic specialists.
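
The modelling ideas described in the abstract can be illustrated with a minimal sketch. It is not the CDM or its library API: every class, field and function name below (`Term`, `TaxonName`, `Taxon`, `Fact`, `lineage`) is a hypothetical stand-in, and the two floras are invented placeholders. The sketch assumes that rank and feature are terms from user-defined vocabularies, that a business rule is expressed as a constraint check instead of a subclass per rank, and that each taxon concept and each fact carries its own citable source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Term:
    """A term from a user-defined vocabulary (rank, feature, status, ...)."""
    vocabulary: str
    label: str


@dataclass(frozen=True)
class Reference:
    """A citable source."""
    citation: str


@dataclass
class TaxonName:
    """Statically typed nomenclatural data; rank is a term, not a subclass."""
    genus: str
    rank: Term
    specific_epithet: Optional[str] = None

    def violations(self) -> List[str]:
        """Business rules expressed as constraints on a single class."""
        problems = []
        if self.rank.vocabulary != "rank":
            problems.append("rank must come from the 'rank' vocabulary")
        if self.rank.label == "species" and not self.specific_epithet:
            problems.append("a species name needs a specific epithet")
        if self.rank.label == "genus" and self.specific_epithet:
            problems.append("a genus name must not have a specific epithet")
        return problems


@dataclass
class Fact:
    """Dynamically typed descriptive data: the feature term sets the kind."""
    feature: Term
    value: str
    source: Reference


@dataclass
class Taxon:
    """A taxon concept: a name as used in (secundum) one reference."""
    name: TaxonName
    sec: Reference
    facts: List[Fact] = field(default_factory=list)


def lineage(taxon: Taxon) -> Set[str]:
    """Collect every citation that the concept and its facts trace back to."""
    return {taxon.sec.citation} | {f.source.citation for f in taxon.facts}


# Vocabulary terms are plain data, so users can add new ones at runtime.
RANK_SPECIES = Term("rank", "species")
FEATURE_DISTRIBUTION = Term("feature", "distribution")

name = TaxonName("Campanula", RANK_SPECIES, "rotundifolia")
flora_a = Reference("Hypothetical Flora A (2001)")  # invented source
flora_b = Reference("Hypothetical Flora B (2010)")  # invented source

# Two concepts share one name but record different, separately cited opinions.
concept_a = Taxon(name, sec=flora_a)
concept_b = Taxon(name, sec=flora_b)
concept_a.facts.append(Fact(FEATURE_DISTRIBUTION, "Region X", flora_a))
concept_b.facts.append(Fact(FEATURE_DISTRIBUTION, "Regions X and Y", flora_b))
```

In this sketch, adding a new rank or a new descriptive feature means adding a `Term`, not a class, while the shared `name` object keeps nomenclature separate from the competing concepts that use it.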


          Most cited references (2)


          The Concept of "Potential Taxa" in Databases


            Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens

            We present the model and implementation of a workflow that blazes a trail in systematic biology for the re-usability of character data (data on any kind of characters of pheno- and genotypes of organisms) and their additivity from specimen to taxon level. We take into account that any taxon characterization is based on a limited set of sampled individuals and characters, and that consequently any new individual and any new character may affect the recognition of biological entities and/or the subsequent delimitation and characterization of a taxon. Taxon concepts thus frequently change during the knowledge generation process in systematic biology. Structured character data are therefore not only needed for the knowledge generation process but also for easily adapting characterizations of taxa. We aim to facilitate the construction and reproducibility of taxon characterizations from structured character data of changing sample sets by establishing a stable and unambiguous association between each sampled individual and the data processed from it. Our workflow implementation uses the European Distributed Institute of Taxonomy Platform, a comprehensive taxonomic data management and publication environment to: (i) establish a reproducible connection between sampled individuals and all samples derived from them; (ii) stably link sample-based character data with the metadata of the respective samples; (iii) record and store structured specimen-based character data in formats allowing data exchange; (iv) reversibly assign sample metadata and character datasets to taxa in an editable classification and display them and (v) organize data exchange via standard exchange formats and enable the link between the character datasets and samples in research collections, ensuring high visibility and instant re-usability of the data. 
The workflow implemented will contribute to organizing the interface between phylogenetic analysis and revisionary taxonomic or monographic work. Database URL: http://campanula.e-taxonomy.net/

              Author and article information

              Journal: Proceedings of TDWG (TDWGProc)
              Publisher: Pensoft Publishers
              ISSN: 2535-0897
              Published: August 16 2017
              Volume: 1
              Article: e20367
              DOI: 10.3897/tdwgproceedings.1.20367
              © 2017
              License: http://creativecommons.org/licenses/by/4.0/
