Mobilizing Data from Taxonomic Literature for an Iconic Species (Dinosauria, Theropoda, Tyrannosaurus rex)

Abstract

A vast amount of biodiversity data is reported in the primary taxonomic literature. In the past, we have demonstrated the use of semantic enhancement to extract data from taxonomic literature and make it available to a network of databases (Miller et al. 2015). For technical reasons, semantic enhancement of taxonomic literature is most efficient when customized according to the format of a particular journal. This journal-based approach captures and disseminates data on whatever taxa happen to be published therein. But if we want to extract all treatments on a particular taxon of interest, these are likely to be spread across multiple journals. Fortunately, the GoldenGATE Imagine document editor (Sautter 2019) is flexible enough to parse most taxonomic literature. Tyrannosaurus rex is an iconic dinosaur with broad public appeal, as well as the subject of more than a century of scholarship. The Naturalis Biodiversity Center recently acquired a specimen that has become a major attraction in the public exhibit space. For most species on earth, the primary taxonomic literature contains nearly everything that is known about them. Every described species on earth is the subject of one or more taxonomic treatments. A taxon-based approach to semantic enhancement can mobilize all this knowledge using the network of databases and resources that comprise the modern biodiversity informatics infrastructure. When a particular species is of special interest, a taxon-based approach to semantic enhancement can be a powerful tool for scholarship and communication. In light of this, we resolved to semantically enhance all taxonomic treatments on T. rex. Our objective was to make these treatments and associated data available to the broad range of stakeholders who might have an interest in this animal, including professional paleontologists, the curious public, and museum exhibit and public communications personnel.
Among the routine parsing and data sharing activities in the Plazi workflow (Agosti and Egloff 2009), taxonomic treatments, as well as cited figures, are deposited in the Biodiversity Literature Repository (BLR), and occurrence records are shared with the Global Biodiversity Information Facility (GBIF). Treatment citations were enhanced with hyperlinks to the cited treatment on TreatmentBank, and specimen citations were linked to their entries in public-facing collections databases. We used the OpenBiodiv biodiversity knowledge graph (Senderov et al. 2017) to discover other taxa mentioned together with T. rex, and to create a timeline of T. rex research for evaluating the impact of individual researchers and specimen repositories on that research. We contributed treatment links to Wikidata, and queried Wikidata to discover identifiers on the different platforms that hold data about T. rex. We used bloodhound-tracker.net to disambiguate human agents, such as collectors, identifiers, and authors. We evaluate the adequacy of the fields currently available to extract data from taxonomic treatments, and make recommendations for future standards.
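
The Wikidata lookup mentioned in the abstract can be illustrated with a SPARQL query that lists every external identifier attached to the item whose taxon name (Wikidata property P225) is Tyrannosaurus rex. This is a minimal sketch and not the authors' actual query: the helper names `build_identifier_query` and `build_request_url` are hypothetical, and the request URL is only constructed here, not sent.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_identifier_query(taxon_name: str) -> str:
    """Return a SPARQL query listing external identifiers for a taxon.

    Hypothetical helper, not taken from the article. P225 is Wikidata's
    "taxon name" property; the propertyType filter keeps only properties
    whose values are identifiers in other databases.
    """
    # Escape backslashes and double quotes so the name is a valid literal.
    escaped = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?propertyLabel ?value WHERE {{
  ?item wdt:P225 "{escaped}" .
  ?item ?claim ?value .
  ?property wikibase:directClaim ?claim ;
            wikibase:propertyType wikibase:ExternalId .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?propertyLabel
""".strip()


def build_request_url(taxon_name: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    params = {"query": build_identifier_query(taxon_name), "format": "json"}
    return f"{WDQS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_identifier_query("Tyrannosaurus rex"))
```

Pasting the printed query into the Wikidata Query Service web interface, or fetching the URL with any HTTP client, should return one row per identifier property, assuming the item carries the exact taxon name string.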

Most cited references 3

Taxonomic information exchange and copyright: the Plazi approach

Donat Agosti, Willi Egloff (2009)

Background: A large part of our knowledge on the world's species is recorded in the corpus of biodiversity literature with well over a hundred million pages, and is represented in natural history collections estimated at 2–3 billion specimens. But this body of knowledge is almost entirely in paper-print form and is not directly accessible through the Internet. For the digitization of this literature, new territories have to be charted in the fields of technical, legal and social issues that presently impede its advance. The taxonomic literature seems especially destined for such a transformation. Discussion: Plazi was founded as an association with the primary goal of transforming both the printed and, more recently, "born-digital" taxonomic literature into semantically enabled, enhanced documents. This includes the creation of a test body of literature, an XML schema modeling its logic content (TaxonX), the development of a mark-up editor (GoldenGATE) that also allows documents to be enhanced with links to external resources via Life Science Identifiers (LSID), a repository for publications and issuance of bibliographic identifiers, a dedicated server to serve the marked-up content (the Plazi Search and Retrieval Server, SRS) and semantic tools to mine information. Plazi's workflow is designed to respect copyright protection and achieves extraction by observing exceptions and limitations existing in international copyright law. Conclusion: The information found in Plazi's databases (taxonomic treatments as well as the metadata of the publications) is in the public domain and can therefore be used for further scientific research without any restriction, whether or not it is contained in copyrighted publications.

Integrating and visualizing primary data from prospective and legacy taxonomic literature

Jeremy Miller, Donat Agosti, Lyubomir Penev … (2015)

Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal, where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data and can extract structured primary biodiversity data that can be aggregated and jointly queried with data from other Darwin Core-compatible sources, and we show how visualization of these data can communicate key information contained in biodiversity literature. We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.

OpenBiodiv: an Implementation of a Semantic System Running on top of the Biodiversity Knowledge Graph

Viktor Senderov, Teodor Georgiev, Donat Agosti … (2017)

We present OpenBiodiv, an implementation of the Open Biodiversity Knowledge Management System. The need for an integrated information system serving the needs of the biodiversity community can be dated at least as far back as the sanctioning of the Bouchout declaration in 2007. The Bouchout declaration proposes to make biodiversity knowledge freely available as Linked Open Data (LOD). At TDWG2016 (Fig. 1) we presented the prototype of the system, then called Open Biodiversity Knowledge Management System (OBKMS). The specification and design of OpenBiodiv was outlined by Senderov and Penev (2016) and in this talk we would like to showcase its pilot. We believe OpenBiodiv is possibly the first pilot-stage implementation of a semantic system running on top of the biodiversity knowledge graph. OpenBiodiv has several components:

OpenBiodiv ontology: a general data model allowing the extraction of biodiversity knowledge from taxonomic articles or from databases such as GBIF. The ontology (in preparation, Journal of Biomedical Semantics, available on GitHub) incorporates several pre-existing models: Darwin-SW (Baskauf and Webb 2016), SPAR (Peroni 2014), Treatment Ontology, and several others. It defines classes, properties, and rules that make it possible to interlink these disparate ontologies and to create a LOD of biodiversity knowledge. New is the Taxonomic Name Usage class, accompanied by a Vocabulary of Taxonomic Statuses (created via an analysis of 4,000 Pensoft articles) allowing for the automated inference of the taxonomic status of Latinized scientific names. The ontology allows for multiple backbone taxonomies via the introduction of a Taxon Concept class (equivalent to DarwinCore Taxon) and Taxon Concept Labels as a subclass of biological name.

The Biodiversity Knowledge Graph: a LOD dataset of information extracted from taxonomic literature and databases. In practice, it has realized part of what has been proposed during pro-iBiosphere and later discussed by Page (2016). Its main resources are articles, sub-article components (tables, figures, treatments, references), author names, institution names, geographical locations, biological names, taxon concepts, and occurrences. Authors have been disambiguated via their affiliation with the use of fuzzy logic based on the GraphDB Lucene connector. The graph interlinks: (1) Prospectively published literature via Pensoft Publishers. (2) Legacy literature via Plazi. (3) Well-known resources such as geographical places or institutions via DBpedia. (4) GBIF's backbone taxonomy as a default but not preferential hierarchy of taxon concepts. (5) OpenBiodiv IDs are matched to nomenclator IDs (e.g. ZooBank) whenever possible. Names form two networks in the graph: (1) A directed acyclic graph (DAG) of supersedence that can be followed to the corresponding sinks to infer the currently applicable scientific name for a given taxon. (2) A network of bi-directional relations indicating the relatedness of names. These names may be compared to the related names inferred on the basis of distributional semantics by the co-organizers of this workshop (Nguyen et al. 2017).

ropenbio: an R package for RDF-ization of biodiversity information resources according to the OpenBiodiv ontology. It will be submitted to the rOpenSci project. While many of its high-level functions are specific to OpenBiodiv, the low-level functions and its RDF-ization framework can be used for any R-based RDF-ization effort.

OpenBiodiv.net: a front-end of the system allowing users to run low-level SPARQL queries as well as to use an extensible set of semantic apps running on top of the Biodiversity Knowledge Graph.

The talk will showcase the progress from prototype to pilot stage of the system since TDWG2016. It will focus on the new features and on the web UI that already allows researchers and other interested parties to use the system. We will discuss several possible scenarios including semantic search and finding related names.
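
The name-resolution idea in the abstract above, following a directed acyclic graph of superseded names to its sinks, can be sketched in a few lines. This is only an illustration under stated assumptions, not OpenBiodiv's implementation: the graph is a plain Python dictionary rather than RDF, the function name `current_names` is hypothetical, and the data are a toy example (Dynamosaurus imperiosus is a well-known junior synonym of Tyrannosaurus rex; "Placeholder name A" is invented).

```python
from typing import Dict, List, Set

# Toy supersedence graph: each name maps to the names that supersede it.
# Illustrative data only; "Placeholder name A" is a made-up entry.
SUPERSEDED_BY: Dict[str, List[str]] = {
    "Placeholder name A": ["Dynamosaurus imperiosus"],
    "Dynamosaurus imperiosus": ["Tyrannosaurus rex"],
    "Tyrannosaurus rex": [],
}


def current_names(name: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Follow supersedence edges from `name` and return the sink names.

    A sink is a name with no outgoing edge, i.e. one that nothing
    supersedes. Names missing from the graph are treated as sinks.
    """
    sinks: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        successors = graph.get(node, [])
        if successors:
            stack.extend(successors)
        else:
            sinks.add(node)
    return sinks


if __name__ == "__main__":
    print(current_names("Placeholder name A", SUPERSEDED_BY))
```

The function returns a set rather than a single name because a superseded name can lead to more than one sink, for example when a taxon has been split.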

Author and article information

Journal

Title: Biodiversity Information Science and Standards

Abbreviated Title: BISS

Publisher: Pensoft Publishers

ISSN (Electronic): 2535-0897

Publication date Created: June 13 2019

Publication date (Electronic): June 13 2019

Volume: 3

Article

DOI: 10.3897/biss.3.37078

SO-VID: aa428317-ba75-4eef-8ab4-52aa91530c10

License: http://creativecommons.org/licenses/by/4.0/
