

      Is Open Access

      Flora Prepper: Preparing floras for morphological parsing and integration

      Biodiversity Information Science and Standards
      Pensoft Publishers


          Abstract

          The increased availability of digital floras and the application of optical character recognition (OCR) to digitized texts have created exciting opportunities for flora data mining. For example, the software package CharaParser has been developed for the semantic annotation of morphological descriptions from taxonomic treatments (Cui 2012). However, after digitization and OCR processing, and before parsing of morphological treatments can begin, content types must be annotated (i.e., whether a block of text represents names, morphology, discussion, or distribution). In addition to enabling morphological parsing, content type annotation facilitates content search and data linkage. For example, once the pieces of a floral treatment are annotated, assertions of the same type from various floras can be combined into a single document (i.e., a "mash-up" floral treatment). Several products and pipelines have been developed for the semantic annotation, or mark-up, of taxonomic documents (e.g., GoldenGATE, FlorML; Sautter et al. 2012, Hamann et al. 2014). However, these products do not combine ease of implementation (e.g., the ability to run as a script in a programmatic workflow) with modern parsing methods, such as text mining and Natural Language Processing (NLP) approaches. Here I present a pilot project, implemented in Python, that applies text mining and NLP approaches to marking up floras. I will describe the success of the project and summarize lessons learned, especially in relation to previous flora markup projects. Annotation of existing flora documents is an essential step towards building next-generation floras (i.e., mash-ups and enhanced floras as platforms) and enables automated trait extraction. Building an easy-to-use access point to modern text mining and NLP techniques for botanical literature will allow for more flexible and responsive flora annotation, and is an important step towards realizing botanical data integration goals.
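
          To make the content type annotation step concrete, here is a minimal sketch of one way it could be approached in Python. It is not the Flora Prepper implementation, which the abstract does not describe in detail. It assumes a small bag-of-words naive Bayes classifier with add-one smoothing, written with the standard library only. The training snippets, the three labels, and the function names (`tokenize`, `train`, `classify`) are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelled snippets, invented for illustration only.
TRAINING = [
    ("morphology", "Leaves alternate, blades ovate, margins serrate; petals 5, white."),
    ("morphology", "Stems erect, 10-30 cm; flowers yellow; sepals 4; fruits capsules."),
    ("distribution", "Found in British Columbia and Alberta; also in Alaska and Yukon."),
    ("distribution", "Native to western North America; introduced in Europe."),
    ("discussion", "This species is often confused with its close relative and needs further study."),
    ("discussion", "The taxonomy of this genus is disputed and needs revision."),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior for the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # Words never seen in training are ignored.
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    for block in [
        "Petals white, leaves ovate, stems erect.",
        "Introduced in Europe and native to Alaska.",
    ]:
        print(classify(block, model), "|", block)
```

          Because the sketch is a plain script with no external dependencies, it can run inside a programmatic workflow. A real pipeline would need far more training text and would likely rely on an established text mining library.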


          Most cited references (3)

          • Hong Cui (2012). CharaParser for fine-grained semantic annotation of organism morphological descriptions.
          • Detailed mark-up of semi-monographic legacy taxonomic works using FlorML.
          • Semi-automated XML markup of biosystematic legacy literature with the GoldenGATE editor. Conference proceedings.

          Author and article information

          Journal: Biodiversity Information Science and Standards (BISS)
          Publisher: Pensoft Publishers
          ISSN: 2535-0897
          Published: July 04 2019
          Volume: 3
          DOI: 10.3897/biss.3.37743
          © 2019
          License: https://creativecommons.org/share-your-work/public-domain/cc0/
