107
views
0
recommends
+1 Recommend
1 collections
    4
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We present the model and implementation of a workflow that blazes a trail in systematic biology for the re-usability of character data (data on any kind of characters of pheno- and genotypes of organisms) and their additivity from specimen to taxon level. We take into account that any taxon characterization is based on a limited set of sampled individuals and characters, and that consequently any new individual and any new character may affect the recognition of biological entities and/or the subsequent delimitation and characterization of a taxon. Taxon concepts thus frequently change during the knowledge generation process in systematic biology. Structured character data are therefore not only needed for the knowledge generation process but also for easily adapting characterizations of taxa. We aim to facilitate the construction and reproducibility of taxon characterizations from structured character data of changing sample sets by establishing a stable and unambiguous association between each sampled individual and the data processed from it. Our workflow implementation uses the European Distributed Institute of Taxonomy Platform, a comprehensive taxonomic data management and publication environment to: (i) establish a reproducible connection between sampled individuals and all samples derived from them; (ii) stably link sample-based character data with the metadata of the respective samples; (iii) record and store structured specimen-based character data in formats allowing data exchange; (iv) reversibly assign sample metadata and character datasets to taxa in an editable classification and display them and (v) organize data exchange via standard exchange formats and enable the link between the character datasets and samples in research collections, ensuring high visibility and instant re-usability of the data. The workflow implemented will contribute to organizing the interface between phylogenetic analysis and revisionary taxonomic or monographic work.

          Database URL: http://campanula.e-taxonomy.net/

          Related collections

          Most cited references7

          • Record: found
          • Abstract: found
          • Article: not found

          The availability of research data declines rapidly with article age.

          Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2-4], and journal [5, 6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8-11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested data sets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives. Copyright © 2014 Elsevier Ltd. All rights reserved.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            BioJava: an open-source framework for bioinformatics in 2012

            Motivation: BioJava is an open-source project for processing of biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality. Results: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model. Availability: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists Contact: andreas.prlic@gmail.com
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

              In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
                Bookmark

                Author and article information

                Journal
                Database (Oxford)
                Database (Oxford)
                databa
                databa
                Database: The Journal of Biological Databases and Curation
                Oxford University Press
                1758-0463
                2015
                30 September 2015
                30 September 2015
                : 2015
                : bav094
                Affiliations
                1Botanic Garden and Botanical Museum Berlin-Dahlem, Dahlem Centre of Plant Sciences, Freie Universität Berlin, Königin-Luise-Str. 6–8, 14195 Berlin, Germany and
                2Institut für Evolution und Biodiversität und Botanischer Garten Münster, Westfälische Wilhelms-Universität Münster, Hüfferstr. 1, 48149 Münster, Germany
                Author notes
                *Corresponding author: Tel: 0049 30 83850129; Fax: 0049 30 838450129; Email: n.kilian@ 123456bgbm.org

                Citation details: Kilian,N., Henning,T., Plitzner,P., et al. Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens. Database (2015) Vol. 2015: article ID bav094; doi:10.1093/database/bav094

                Article
                bav094
                10.1093/database/bav094
                4589695
                26424081
                e75a5e79-162a-4f7d-b5ac-73a8f994eabf
                © The Author(s) 2015. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 19 June 2015
                : 1 September 2015
                : 2 September 2015
                Page count
                Pages: 19
                Categories
                Original Article

                Bioinformatics & Computational biology
                Bioinformatics & Computational biology

                Comments

                Comment on this article