      Is Open Access

      Tackling Data Quality Challenges in the Finnish Biodiversity Information Facility (FinBIF)


      Biodiversity Information Science and Standards

      Pensoft Publishers


          Abstract

          The Finnish Biodiversity Information Facility (FinBIF) Research Infrastructure (Schulman et al. 2021) is a national service with a broad coverage of the components of biodiversity informatics (Bingham et al. 2017). Data flows are managed under a single information technology (IT) architecture. Services are available in a single, branded on-line portal. Data are collated from all relevant sources e.g., research institutes, scientific collections, public authorities and citizen science projects, whose data represent a major contribution. The challenge is to analyse, classify and share good quality data in a way that the user understands its utility.

          Need for quality data

          The philosophy of FinBIF is that all observation records are important, and that all data are assessed for quality and able to be annotated. The challenge is that, in practice, many users desire data with 100% reliability. In our experience, most user concerns about data quality are related to citizen science data. Researchers are usually able to manage raw data to serve their purposes. However, decision-making authorities often have less capacity to analyse the data and thus require data that can be used instantly. Therefore, we need tools to provide users the data that are the most relevant and reliable for their specific use. For all users, standardized metadata (information about datasets) are key, when the user has doubts about the fitness-for-use of a particular dataset. There is also a need to provide data in different formats to serve various users. Finally, the service has to be machine-actionable (using an application programming interface (API) and R-package) as well as human-accessible for viewing and downloading data.

          Quality assignment

          FinBIF data accuracy varies significantly within and between datasets, and observers. Two quality-based classifications suitable for filtering are therefore applied. The dataset origin filter is based on the quality of a whole dataset (e.g., citizen science project) and includes three broad classes assigned with an appropriate quality label: Datasets by Professionals, by Specialists and by Citizen Scientists. The observation reliability filter is based on a single observation and on annotations by FinBIF users. This classification includes Expert verified, Community verified, Unassessed (default for all records), Uncertain, and Erroneous. The dataset origin does not necessarily determine the quality of the individual records in it. Observations made by citizen scientists are often accurate, while there may be errors in the professionally collected data. Records are frequently subject to annotation, which raises their quality over time (e.g., iNaturalist). Naturally, evidence (e.g., media, detailed descriptions, specimens) is needed for reliable identification.

          Annotating data

          When observations are compiled at FinBIF’s portal (Laji.fi), they are initially “Unassessed” (unless they have otherwise been assessed at the original source). When annotating occurrences, volunteers can make various entries using the tools provided. The aim of the commentary is to improve the quality of the observation data. Annotators are divided into two categories with two different roles:

          • As a basic user, anyone who has logged in at Laji.fi can make comments or tag observations for review by experts.
          • Users defined as experts have wider rights than basic users and their comments carry more weight. The most desired actions of expert users are to classify observations into confidence levels or to give them new or refined identifications.

          Information about new comments passes to the observer if the observation is recorded by using the FinBIF Observation Management System “Notebook”. However, comments cannot yet be automatically forwarded e.g., to the primary data management systems at the original source.

          Annotations add extra indications of quality. They do not replace or delete the original information. Nevertheless, annotations can change a record’s taxonomic identification, and by default, a record will be handled based on its latest identification.

          R-package for researchers and Public Authority Portal (PAP) for decision makers

          FinBIF has produced an R programming language interface to its API, which makes the publicly available data in FinBIF accessible from within R. For authorities, the PAP offers direct access to all available species information to authorised users, including sensitive and restricted-use data.
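
          The two quality classifications and the annotation behaviour described in the abstract can be illustrated with a small, self-contained Python sketch. It is not FinBIF's data model or API: the class, field and function names (`Record`, `Annotation`, `filter_records` and so on) are hypothetical, and the rule that the most recent annotation determines the reliability class is an assumption made for illustration. Only the class labels and the "latest identification" default come from the text. Annotator roles are not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Optional, Set


class DatasetOrigin(Enum):
    # The three dataset-level classes named in the abstract.
    PROFESSIONALS = "Datasets by Professionals"
    SPECIALISTS = "Datasets by Specialists"
    CITIZEN_SCIENTISTS = "Datasets by Citizen Scientists"


class Reliability(Enum):
    # The five observation-level classes named in the abstract.
    EXPERT_VERIFIED = "Expert verified"
    COMMUNITY_VERIFIED = "Community verified"
    UNASSESSED = "Unassessed"
    UNCERTAIN = "Uncertain"
    ERRONEOUS = "Erroneous"


@dataclass(frozen=True)
class Annotation:
    # Hypothetical structure: an annotation may carry a comment,
    # a new identification, a reliability class, or any combination.
    author: str
    comment: str = ""
    identification: Optional[str] = None
    reliability: Optional[Reliability] = None


@dataclass
class Record:
    original_identification: str
    origin: DatasetOrigin
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, annotation: Annotation) -> None:
        # Annotations are appended; the original information is never
        # replaced or deleted.
        self.annotations.append(annotation)

    def current_identification(self) -> str:
        # By default a record is handled by its latest identification.
        for annotation in reversed(self.annotations):
            if annotation.identification is not None:
                return annotation.identification
        return self.original_identification

    def current_reliability(self) -> Reliability:
        # Assumption: the most recent reliability annotation wins;
        # records without one stay "Unassessed".
        for annotation in reversed(self.annotations):
            if annotation.reliability is not None:
                return annotation.reliability
        return Reliability.UNASSESSED


def filter_records(
    records: Iterable[Record],
    origins: Set[DatasetOrigin],
    reliabilities: Set[Reliability],
) -> List[Record]:
    # Apply both filters: dataset origin and observation reliability.
    return [
        r for r in records
        if r.origin in origins and r.current_reliability() in reliabilities
    ]
```

          In this sketch a citizen science record annotated with a new identification and the class `Reliability.EXPERT_VERIFIED` would pass a filter for expert-verified records while still keeping its `original_identification`, which mirrors the statement that dataset origin does not determine the quality of individual records.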

          Most cited references 2

          Is Open Access

          The Biodiversity Informatics Landscape: Elements, Connections and Opportunities

          There are a multitude of biodiversity informatics projects, datasets, databases and initiatives at the global level, and many more at regional, national, and sometimes local levels. In such a complex landscape, it can be unclear how different elements relate to each other. Based on a high-level review of global and European-level elements, we present a map of the biodiversity informatics landscape. This is a first attempt at identifying key datasets/databases and data services, and mapping them in a way that can be used to identify the links, gaps and redundancies in the landscape. While the map is predominantly focused on elements with a global scope, the sub-global focus at the European-level was incorporated in the map in order to demonstrate how a regional network such as the European Biodiversity Observation Network (EU BON) can usefully contribute to connecting some of the nodes within the landscape. We identify 74 elements, and find that the informatics landscape is complex in terms of the characteristics and diversity of these elements, and that there is high variability in their level of connectedness. Overall, the landscape is highly connected, with one element boasting 28 connections. The average "degrees of separation" between elements is low, and the landscape is deemed relatively robust to failures since there is no single point that information flows through. Examples of possible effort duplication are presented, and the inclusion of five policy-level elements in the map helps illustrate how informatics products can contribute to global processes that define and direct political targets. 
Beyond simply describing the existing landscape, this map will support a better understanding of the landscape’s current structure and functioning, enabling responsible institutions to establish or strengthen collaborations, work towards avoiding effort duplication, and facilitate access to the biodiversity data, information and knowledge required to support effective decision-making, in the context of comparatively limited funding for biodiversity knowledge and conservation. To support this, we provide the input matrix and code that created this map as supplementary materials, so that readers can more closely examine the links in the landscape, and edit the map to suit their own purposes.
            Is Open Access

            The Finnish Biodiversity Information Facility as a best-practice model for biodiversity data infrastructures

            Biodiversity informatics has advanced rapidly with the maturation of major biodiversity data infrastructures (BDDIs), such as the Global Biodiversity Information Facility sharing unprecedented data volumes. Nevertheless, taxonomic, temporal and spatial data coverage remains unsatisfactory. With an increasing data need, the global BDDIs require continuous inflow from local data mobilisation, and national BDDIs are being developed around the world. The global BDDIs are specialised in certain data types or data life cycle stages which, despite possible merits, renders the BDDI landscape fragmented and complex. That this often is repeated at the national level creates counterproductive redundancy, complicates user services, and frustrates funders. Here, we present the Finnish Biodiversity Information Facility (FinBIF) as a model of an all-inclusive BDDI. It integrates relevant data types and phases of the data life cycle, manages them under one IT architecture, and distributes the data through one service portal under one brand. FinBIF has experienced diverse funder engagement and rapid user uptake. Therefore, we suggest the integrated and inclusive approach be adopted in national BDDI development.

              Author and article information

              Contributors
              Journal
              Biodiversity Information Science and Standards
              BISS
              Pensoft Publishers
              2535-0897
              September 21 2021
              Volume: 5
              Article
              10.3897/biss.5.75559
              © 2021
