      Is Open Access

      Negation and Speculation Detection for Improving Information Retrieval Effectiveness

      Fifth BCS-IRSG Symposium on Future Directions in Information Access (FDIA 2013) (FDIA)


      3 September 2013

      Information retrieval, Negation and speculation detection, Biomedical and review domains



          The thesis proposed here aims to support information retrieval and text mining tasks through negation and speculation detection, focusing on two domains. In the biomedical domain, the existence of a corpus annotated with this kind of information has made it possible to develop an effective system that automatically detects these language forms. In the review domain, we have annotated a set of reviews for negation, speculation, and their scope.


          Most cited references 4


          Lexicon-Based Methods for Sentiment Analysis

            Is Open Access

            The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes

            Background: Detecting uncertain and negative assertions is essential in most BioMedical Text Mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus).
            Results: The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief linguist, also responsible for setting up the annotation guidelines, who resolved cases where the annotators disagreed. The resulting corpus consists of more than 20,000 sentences that were considered for annotation, and over 10% of them actually contain one (or more) linguistic annotation suggesting negation or uncertainty.
            Conclusion: Statistics are reported on corpus size, ambiguity levels and the consistency of annotations. The corpus is accessible for academic purposes and is free of charge. Apart from the intended goal of serving as a common resource for the training, testing and comparison of biomedical Natural Language Processing systems, the corpus is also a good resource for the linguistic analysis of scientific and clinical texts.

              A novel hybrid approach to automated negation detection in clinical radiology reports.

              Negation is common in clinical documents and is an important source of poor precision in automated indexing systems. Previous research has shown that negated terms may be difficult to identify if the words implying negations (negation signals) are more than a few words away from them. We describe a novel hybrid approach, combining regular expression matching with grammatical parsing, to address the above limitation in automatically detecting negations in clinical radiology reports. Negations are classified based upon the syntactical categories of negation signals, and negation patterns, using regular expression matching. Negated terms are then located in parse trees using corresponding negation grammar. A classification of negations and their corresponding syntactical and lexical patterns were developed through manual inspection of 30 radiology reports and validated on a set of 470 radiology reports. Another 120 radiology reports were randomly selected as the test set on which a modified Delphi design was used by four physicians to construct the gold standard. In the test set of 120 reports, there were a total of 2,976 noun phrases, of which 287 were correctly identified as negated (true positives), along with 23 undetected true negations (false negatives) and 4 mistaken negations (false positives). The hybrid approach identified negated phrases with sensitivity of 92.6% (95% CI 90.9-93.4%), positive predictive value of 98.6% (95% CI 96.9-99.4%), and specificity of 99.87% (95% CI 99.7-99.9%). This novel hybrid approach can accurately locate negated concepts in clinical radiology reports not only when in close proximity to, but also at a distance from, negation signals.
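As a quick arithmetic check, the sensitivity and positive predictive value reported in this abstract follow directly from the stated counts (287 true positives, 23 false negatives, 4 false positives). The sketch below is only an illustration of that calculation; the function names are hypothetical and are not taken from the cited paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly negated phrases that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of phrases flagged as negated that really are: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# Counts as stated in the abstract for the 120-report test set.
TP, FN, FP = 287, 23, 4

print(f"sensitivity: {sensitivity(TP, FN):.1%}")            # 92.6%
print(f"PPV: {positive_predictive_value(TP, FP):.1%}")      # 98.6%
```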

                Author and article information

                September 2013
                Pages: 46-48
                Universidad de Huelva

                E.T.S de ingeniería. Ctra. Palos de la Frontera s/n. 21819. Palos de la Frontera (Huelva)
                © Noa P. Cruz Díaz. Published by BCS Learning and Development Ltd. Fifth BCS-IRSG Symposium on Future Directions in Information Access (FDIA 2013), Granada, Spain

                This work is licensed under a Creative Commons Attribution 4.0 Unported License.

                Fifth BCS-IRSG Symposium on Future Directions in Information Access (FDIA 2013)
                Granada, Spain
                3 September 2013
                Electronic Workshops in Computing (eWiC)
                Product Information: 1477-9358, BCS Learning & Development

