      Semantic representation of scientific literature: bringing claims, contributions and named entities onto the Linked Open Data cloud

      research-article
      PeerJ Computer Science
      PeerJ Inc.
      Natural language processing, Semantic web, Semantic publishing

          Abstract

          Motivation. Finding relevant scientific literature is one of the essential tasks researchers face on a daily basis. Digital libraries and web information retrieval techniques provide rapid access to a vast amount of scientific literature. However, no further automated support is available that would enable fine-grained access to the knowledge ‘stored’ in these documents. The emerging domain of Semantic Publishing aims at making scientific knowledge accessible to both humans and machines by adding semantic annotations to content, such as a publication’s contributions, methods, or application domains. Despite the promise of better knowledge access, manually annotating existing research literature is prohibitively expensive for widespread adoption. We argue that a novel combination of three distinct methods can significantly advance this vision in a fully automated way: (i) Natural Language Processing (NLP) for Rhetorical Entity (RE) detection; (ii) Named Entity (NE) recognition based on the Linked Open Data (LOD) cloud; and (iii) automatic knowledge base construction for both NEs and REs using semantic web ontologies that interconnect entities in documents with the machine-readable LOD cloud.

          Results. We present a complete workflow to transform scientific literature into a semantic knowledge base, based on the W3C standards RDF and RDFS. A text mining pipeline, built on the GATE framework, automatically extracts rhetorical entities of type Claim and Contribution from full-text scientific literature. These REs are further enriched with named entities, represented as URIs in the Linked Open Data cloud, by integrating the DBpedia Spotlight tool into our workflow. Text mining results are stored in a knowledge base through a flexible export process that dynamically maps semantic annotations to LOD vocabularies, using rules that are themselves stored in the knowledge base. We created a gold standard corpus from computer science conference proceedings and journal articles, in which Claim and Contribution sentences are manually annotated with their respective types using LOD URIs. Evaluated against this corpus, the RE detection phase achieves an average F-measure of 0.73. We further demonstrate a number of semantic queries that show how the generated knowledge base can support numerous use cases in managing scientific literature.
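
          To make the idea of a queryable knowledge base concrete, the following is a minimal sketch, not the authors' implementation. It stores a few rhetorical and named entity annotations as subject-predicate-object triples and answers one query: which Contribution sentences mention a given entity. The `http://example.org/` vocabulary, the `containsNE` property, the sentence identifiers, and the choice of DBpedia resources are all hypothetical and chosen only for illustration; a real workflow would use an RDF store and SPARQL rather than in-memory tuples.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical vocabulary and document namespaces (assumptions, not from the paper).
EX = "http://example.org/vocab#"
DOC = "http://example.org/doc/1#"


def build_triples() -> List[Triple]:
    """Return hand-written example annotations for two sentences."""
    return [
        (DOC + "sentence-12", RDF_TYPE, EX + "Contribution"),
        (DOC + "sentence-12", EX + "containsNE",
         "http://dbpedia.org/resource/Linked_data"),
        (DOC + "sentence-40", RDF_TYPE, EX + "Claim"),
        (DOC + "sentence-40", EX + "containsNE",
         "http://dbpedia.org/resource/Text_mining"),
    ]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def contributions_mentioning(triples: List[Triple], entity_uri: str) -> List[str]:
    """Find Contribution sentences linked to the given named entity URI."""
    contributions = {s for s, _, _ in
                     match(triples, p=RDF_TYPE, o=EX + "Contribution")}
    return sorted(s for s, _, _ in
                  match(triples, p=EX + "containsNE", o=entity_uri)
                  if s in contributions)


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    kb = build_triples()
    print(to_ntriples(kb))
    print(contributions_mentioning(kb, "http://dbpedia.org/resource/Linked_data"))
```

          Linking sentences to shared URIs rather than to surface strings is what would let such a query work across documents: two papers that phrase a topic differently can still resolve to the same entity.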

          Availability. All software presented in this paper is available under open source licenses at http://www.semanticsoftware.info/semantic-scientific-literature-peerj-2015-supplements. Development releases of individual components are additionally available on our GitHub page at https://github.com/SemanticSoftwareLab.

                Author and article information

                Contributors
                Journal
                peerj-cs
                PeerJ Computer Science
                PeerJ Comput. Sci.
                PeerJ Inc. (San Francisco, USA)
                ISSN: 2376-5992
                Published: 9 December 2015
                Volume: 1
                Article: e37
                Affiliations
                Semantic Software Lab, Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
                Article
                cs-37
                DOI: 10.7717/peerj-cs.37
                cd7b9247-9b9f-4876-960e-02d4e2f78584
                © 2015 Sateli and Witte

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

                History
                Received: 4 August 2015
                Accepted: 13 November 2015
                Funding
                Funded by: NSERC Discovery Grant
                This work was partially funded by an NSERC Discovery Grant. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Artificial Intelligence
                Digital Libraries
                Natural Language and Speech

                Computer science
                Natural language processing, Semantic web, Semantic publishing
