      The Taxon Hypothesis Paradigm—On the Unambiguous Detection and Communication of Taxa

          Abstract

          Here, we describe the taxon hypothesis (TH) paradigm, which covers the construction, identification, and communication of taxa as datasets. Defining taxa as datasets of individuals and their traits will make taxon identification and, most importantly, communication of taxa precise and reproducible. This will allow datasets with standardized and atomized traits to be used digitally in identification pipelines and communicated through persistent identifiers. Such datasets are particularly useful in the context of formally undescribed or even physically undiscovered species if data such as sequences from samples of environmental DNA (eDNA) are available. Implementing the TH paradigm will, to some extent, remove the impediment to hastily discover and formally describe all extant species, in that the TH paradigm allows discovery and communication of new species and other taxa even in the absence of formal descriptions. The TH datasets can be connected to a taxonomic backbone providing access to the vast information associated with the tree of life. In parallel to the description of the TH paradigm, we demonstrate how it is implemented in the UNITE digital taxon communication system. UNITE TH datasets include rich data on individuals and their rDNA ITS sequences. These datasets are equipped with digital object identifiers (DOIs) that serve to fix their identity in our communication. All datasets are also connected to a GBIF taxonomic backbone. Researchers processing their eDNA samples using UNITE datasets will thus be able to publish their findings as taxon occurrences in the GBIF data portal. UNITE species hypothesis (species-level TH) datasets are increasingly utilized in taxon identification pipelines, and even formally undescribed species can be identified and communicated by using UNITE. The TH paradigm seeks to achieve unambiguous, unique, and traceable communication of taxa and their properties at any level of the tree of life. It offers a rapid way to discover and communicate undescribed species in identification pipelines and data portals before they are lost to the sixth mass extinction.
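
          The idea of a taxon as a dataset with a persistent identifier can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class and field names, the placeholder DOIs, the toy position-by-position identity measure, and the 0.97 threshold are hypothetical and are not taken from the article or from UNITE. Real pipelines would use alignment-based sequence search rather than this simplified comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """One sampled individual with its sequence and atomized traits (hypothetical schema)."""
    individual_id: str
    its_sequence: str                      # e.g. an rDNA ITS sequence
    traits: dict = field(default_factory=dict)


@dataclass
class TaxonHypothesis:
    """A taxon expressed as a dataset of individuals, fixed by a persistent identifier."""
    doi: str                               # placeholder DOI, not a real one
    backbone_taxon: str                    # link to a taxonomic backbone entry
    individuals: list = field(default_factory=list)


def identity(a: str, b: str) -> float:
    """Toy measure: fraction of matching positions in two equal-length sequences."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query: str, hypotheses: list, threshold: float = 0.97):
    """Return the DOI of the best-matching TH, or None if no match reaches the threshold."""
    best_doi, best_score = None, 0.0
    for th in hypotheses:
        for ind in th.individuals:
            score = identity(query, ind.its_sequence)
            if score > best_score:
                best_doi, best_score = th.doi, score
    return best_doi if best_score >= threshold else None


# Two hypothetical TH datasets; the second has no formal species name.
th_a = TaxonHypothesis(
    doi="10.0000/example.th-a",
    backbone_taxon="Fungi",
    individuals=[Individual("ind-1", "ACGTACGTAC", {"source": "specimen"})],
)
th_b = TaxonHypothesis(
    doi="10.0000/example.th-b",
    backbone_taxon="Fungi",
    individuals=[Individual("ind-2", "TTTTTTTTTT", {"source": "eDNA"})],
)
```

          In this sketch, an eDNA read would be reported by the identifier of the matching dataset, for example `identify("TTTTTTTTTT", [th_a, th_b])`, so the result stays traceable even when the taxon has no formal description.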


                Author and article information

                Journal
                Microorganisms
                MDPI
                2076-2607
                30 November 2020
                December 2020
                Volume: 8
                Issue: 12
                Article number: 1910
                Affiliations
                [1] Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia; kadri.poldmaa@ut.ee (K.P.); ave.suija@ut.ee (A.S.); kristjan.adojaan@ut.ee (K.A.); filipp.ivanov@ut.ee (F.I.); timo.piirmann@ut.ee (T.P.); raivo.pohonen@ut.ee (R.P.); allan.zirk@ut.ee (A.Z.); kessy.abarenkov@ut.ee (K.A.)
                [2] Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia; leho.tedersoo@ut.ee (L.T.); irja.saar@ut.ee (I.S.); anton.savchenko@ut.ee (A.S.); iryna.yatsiuk@ut.ee (I.Y.)
                [3] Department of Biological and Environmental Sciences, Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden; henrik.nilsson@bioenv.gu.se (H.R.N.); k.h.larsson@nhm.uio.no (K.-H.L.)
                [4] Global Biodiversity Information Facility, 2100 Copenhagen, Denmark; dschigel@gbif.org (D.S.); tsjeppesen@gbif.org (T.S.J.)
                [5] Royal Botanic Gardens Victoria, Birdwood Ave, Melbourne, Victoria 3004, Australia; tom.may@rbg.vic.gov.au
                [6] The James Hutton Institute, Craigiebuckler, Aberdeen AB15 8QH, UK; Andy.Taylor@hutton.ac.uk
                [7] Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, St Machar Drive, Aberdeen AB24 3UU, UK
                [8] GLOBE Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 København, Denmark; tobiasgf@sund.ku.dk
                [9] Systematic Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden; bjorn.lindahl@slu.se
                Author notes
                [*] Correspondence: urmas.koljalg@ut.ee; Tel.: +372-53-412-823
                Author information
                https://orcid.org/0000-0002-8052-0107
                https://orcid.org/0000-0002-2919-1168
                https://orcid.org/0000-0002-1248-3674
                https://orcid.org/0000-0003-2214-4972
                https://orcid.org/0000-0003-1691-239X
                https://orcid.org/0000-0001-8453-9721
                Article
                Publisher ID: microorganisms-08-01910
                DOI: 10.3390/microorganisms8121910
                PMCID: PMC7760934
                PMID: 33266327
                © 2020 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 31 October 2020
                Accepted: 24 November 2020
                Categories
                Article

                microbial species, taxonomy, DNA taxonomy, biodiversity informatics, discovery of species, taxon hypotheses, species hypotheses, metabarcoding
