The taxonomic name resolution service: an online tool for automated standardization of plant names


Abstract

Background

The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing biodiversity science.

Results

The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.
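The Results paragraph names several steps: parsing, fuzzy matching of misspellings, conversion of synonyms to accepted names, partial matching of higher taxa, and extraction of annotations such as morphospecies labels. A minimal Python sketch of how such steps can chain together is given here. It is not the TNRS implementation. It substitutes the standard library's difflib.get_close_matches for the fuzzy matcher the TNRS builds on, and the reference table, the annotation sets, the 0.85 cutoff, and the parse and resolve functions are all hypothetical, chosen only for illustration.

```python
import difflib

# Hypothetical toy reference taxonomy (illustrative only, not Tropicos data).
# Each known name maps to its accepted name; accepted names map to themselves.
REFERENCE = {
    "Quercus alba": "Quercus alba",
    "Solanum lycopersicum": "Solanum lycopersicum",
    "Lycopersicon esculentum": "Solanum lycopersicum",  # synonym -> accepted
}

# Hypothetical annotation vocabularies; a real service recognizes many more.
ANNOTATIONS = {"cf.", "aff."}    # identification qualifiers
MORPHO = {"sp.", "spp."}         # morphospecies markers, e.g. "sp. 1"


def parse(raw):
    """Return (canonical name, annotations) for a raw name string.

    Deliberately simplified: keeps at most genus + specific epithet and
    drops any trailing tokens such as author strings ("L.").
    """
    tokens = raw.split()
    notes = [t for t in tokens if t.lower() in ANNOTATIONS]
    core = [t for t in tokens if t.lower() not in ANNOTATIONS]
    if not core:
        return "", notes
    genus = core[0].capitalize()
    if len(core) == 1:
        return genus, notes
    if core[1].lower() in MORPHO:
        # Morphospecies: keep the label as an annotation, match at genus level.
        notes.append(" ".join(core[1:]))
        return genus, notes
    return f"{genus} {core[1].lower()}", notes


def resolve(raw, cutoff=0.85):
    """Fuzzy-match a raw name and convert synonyms to accepted names."""
    name, notes = parse(raw)
    if " " in name:
        rank, candidates = "species", list(REFERENCE)
    else:
        # Partial match: only a genus is available, so match against genera.
        rank, candidates = "genus", sorted({n.split()[0] for n in REFERENCE})
    hits = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not hits:
        return {"matched": None, "accepted": None, "rank": None, "notes": notes}
    matched = hits[0]
    # Genus-level synonymy is not modeled in this sketch.
    accepted = REFERENCE[matched] if rank == "species" else matched
    return {"matched": matched, "accepted": accepted, "rank": rank, "notes": notes}


for raw in ["Quercus albba", "Lycopersicon esculentum",
            "cf. Quercus alba L.", "Quercus sp. 1"]:
    print(raw, "->", resolve(raw))
```

String similarity alone cannot separate homonyms, which the abstract says the TNRS handles by taking family names into account; this sketch leaves that step out.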

Conclusions

We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.


Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2013

Publication date (Electronic): 16 January 2013

Volume: 14

Page: 16

Affiliations

[1] Department of Ecology and Evolutionary Biology, University of Arizona Tucson, P.O. Box 210088, Tucson, AZ, 85721, USA

[2] The iPlant Collaborative, Thomas W. Keating Bioresearch Building, 1657 East Helen Street, Tucson, AZ, 85721, USA

[3] BIO5 Institute, 1657 East Helen Street, PO Box 210240, Tucson, AZ, 85721-0240, USA

[4] Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724-2202, USA

[5] Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, 02543, USA

[6] Divisional Data Centre, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, Tasmania, 7001, Australia

[7] Yale-NUS College, 6 College Avenue East, Singapore, 138614, Singapore

[8] Missouri Botanical Garden, 4344 Shaw Blvd., St. Louis, MO, 63110, USA

[9] Department of Biology, CB 3280, University of North Carolina, Chapel Hill, NC, 27599-3280, USA

[10] The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA

Article

Publisher ID: 1471-2105-14-16

DOI: 10.1186/1471-2105-14-16

PMC ID: 3554605

PubMed ID: 23324024

SO-VID: 7e940bf2-2e5f-4b11-bf81-13eabd635bfb

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
