Developing Standards for Improved Data Quality and for Selecting Fit for Use Biodiversity Data


Abstract

The quality of biodiversity data publicly accessible via aggregators such as GBIF (Global Biodiversity Information Facility), the ALA (Atlas of Living Australia), iDigBio (Integrated Digitized Biocollections), and OBIS (Ocean Biogeographic Information System) is often questioned, especially by the research community.

The Data Quality Interest Group, established by Biodiversity Information Standards (TDWG) and GBIF, has been engaged in four main activities: developing a framework for the assessment and management of data quality using a fitness-for-use approach; defining a core set of standardised tests and associated assertions based on Darwin Core terms; gathering and classifying user stories into use cases grouped by context, such as species distribution modelling, agrobiodiversity, and invasive species; and developing a standardised format for building and managing controlled vocabularies of values.

Using the framework, data quality profiles have been built from the use cases to represent user needs. Quality assertions can then be used to filter for data that are fit for a particular purpose. The assertions can also be used to provide feedback to data providers and custodians, helping them improve data quality at the source. In a case study, two different implementations of tests and assertions based on the Darwin Core "Event Date" terms were run against GBIF data to demonstrate that the tests are implementation-agnostic, can be run on large aggregated datasets, and can make biodiversity data more fit for typical research uses.
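To make the workflow in the abstract concrete, the following is a minimal Python sketch of tests that emit assertions, a data quality profile used as a fitness-for-use filter, and non-compliant assertions collected as feedback for data providers. It is not the Interest Group's implementation: the `Assertion` class, the test identifiers, the status and result strings, and the helper functions are hypothetical. It also assumes, for simplicity, that `eventDate` holds a single ISO 8601 date such as `2015-06-21`. Real Darwin Core `eventDate` values may also be date ranges or have reduced precision, which this sketch does not handle.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass(frozen=True)
class Assertion:
    """Hypothetical, simplified quality assertion produced by one test."""
    test: str              # illustrative test identifier, not an official one
    status: str            # "RUN_HAS_RESULT" or "PREREQUISITES_NOT_MET"
    result: Optional[str]  # "COMPLIANT", "NOT_COMPLIANT", or None if not run
    comment: str


Test = Callable[[Record], Assertion]


def parse_event_date(value: str) -> Optional[date]:
    """Parse a single ISO 8601 date such as 2015-06-21; None if invalid."""
    try:
        return date.fromisoformat(value.strip())
    except ValueError:
        return None


def validate_eventdate_standard(record: Record) -> Assertion:
    """Validation: is eventDate present and a valid ISO 8601 date?"""
    name = "VALIDATION_EVENTDATE_STANDARD"
    value = (record.get("eventDate") or "").strip()
    if not value:
        return Assertion(name, "PREREQUISITES_NOT_MET", None, "eventDate is empty")
    if parse_event_date(value) is None:
        return Assertion(name, "RUN_HAS_RESULT", "NOT_COMPLIANT",
                         f"{value!r} is not an ISO 8601 date")
    return Assertion(name, "RUN_HAS_RESULT", "COMPLIANT", "eventDate is valid")


def make_not_future_test(today: date) -> Test:
    """Build a validation checking that eventDate is not later than `today`."""
    def validate_eventdate_not_future(record: Record) -> Assertion:
        name = "VALIDATION_EVENTDATE_NOTFUTURE"
        parsed = parse_event_date(record.get("eventDate") or "")
        if parsed is None:
            # Cannot evaluate plausibility without a parseable date.
            return Assertion(name, "PREREQUISITES_NOT_MET", None,
                             "eventDate missing or not parseable")
        if parsed > today:
            return Assertion(name, "RUN_HAS_RESULT", "NOT_COMPLIANT",
                             f"{parsed} is after {today}")
        return Assertion(name, "RUN_HAS_RESULT", "COMPLIANT",
                         "eventDate is not in the future")
    return validate_eventdate_not_future


def fit_for_use(record: Record, profile: List[Test]) -> bool:
    """A record passes a profile only if every test in it is COMPLIANT."""
    return all(test(record).result == "COMPLIANT" for test in profile)


def provider_feedback(records: List[Record],
                      profile: List[Test]) -> Dict[str, List[str]]:
    """Collect NOT_COMPLIANT test names per record, e.g. for data custodians."""
    feedback: Dict[str, List[str]] = {}
    for record in records:
        failed = [a.test for a in (test(record) for test in profile)
                  if a.result == "NOT_COMPLIANT"]
        if failed:
            feedback[record["occurrenceID"]] = failed
    return feedback


# Usage with made-up records; a fixed "today" keeps the result deterministic.
records = [
    {"occurrenceID": "a", "eventDate": "2015-06-21"},
    {"occurrenceID": "b", "eventDate": "21/06/2015"},
    {"occurrenceID": "c", "eventDate": ""},
    {"occurrenceID": "d", "eventDate": "2999-01-01"},
]
profile = [validate_eventdate_standard, make_not_future_test(date(2024, 1, 1))]
fit = [r["occurrenceID"] for r in records if fit_for_use(r, profile)]
print(fit)  # ['a']
print(provider_feedback(records, profile))
# {'b': ['VALIDATION_EVENTDATE_STANDARD'], 'd': ['VALIDATION_EVENTDATE_NOTFUTURE']}
```

Record `c` is excluded by the profile but absent from the feedback, because both tests report unmet prerequisites rather than a non-compliant result. Keeping "could not run" distinct from "ran and failed" lets the same assertions serve both filtering and feedback without conflating missing data with bad data.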
