      Heuristics for identification of acronym-definition patterns within text: towards an automated construction of comprehensive acronym-definition dictionaries.

      Methods of Information in Medicine
      Abbreviations as Topic, Algorithms, Dictionaries as Topic, Humans, Information Storage and Retrieval, MEDLINE, Pattern Recognition, Automated, Programming Languages, Software

          Abstract

          The aim of this work was to develop an automated, accurate and scalable method for identifying acronym-definition pairs within text. The method's primary advantage is that it enables information-processing methods to resolve author-defined acronyms, but it also allows a reference work of acronym definitions to be created automatically. Beyond saving time and effort, this has several advantages over manual or semi-automated methods: it makes it possible to identify the relative frequencies of alternate acronyms and definitions, as well as spelling, phrasing and hyphenation variants of a unique acronym-definition pair. It also helps users find acronym-definition variants that occur in the literature but are not necessarily present in biomedical databases.

          A set of heuristics for accurately locating acronym-definition pairs and identifying their boundaries was developed and refined for precision and recall on subsets of MEDLINE records. These training sets were gradually enlarged, and the heuristics were re-evaluated at each step to ensure scalability.

          When tested on over 12 million MEDLINE records, our final set of Acronym Resolving General Heuristics (ARGH) achieved a sample-based estimated precision of 96.5 ± 0.4% and recall of 93.0 ± 2.7%, identifying more than 174,000 unique acronyms and their 737,000 associated definitions. We estimate that as many as 36% of the acronyms in MEDLINE are associated with more than one definition and, conversely, that up to 10% of definitions are associated with more than one acronym. The number of unique acronyms in MEDLINE is increasing by approximately 11,000 per year, while the number of definitions associated with them is growing at approximately four times that rate.

          The ARGH database is accessible online at http://lethargy.swmed.edu/ARGH/argh.asp. The heuristic module and database are available upon request.
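The abstract does not list the ARGH rules themselves. As a rough illustration of the kind of heuristic involved, the sketch below finds candidates written as "definition (ACRONYM)", matches the acronym's characters right to left against the preceding words, and aggregates the results into an acronym-to-definitions dictionary from which an ambiguity fraction (the kind of quantity behind the 36% estimate) can be computed. Everything here is an assumption for illustration: the regular expression, the look-back window of min(n + 5, 2n) words, and the function names `find_definition`, `extract_pairs`, `build_dictionary` and `ambiguous_fraction` are hypothetical and are not the ARGH implementation.

```python
import re
from collections import defaultdict

# Hypothetical candidate pattern (an assumption, not ARGH's rule): a
# parenthesised token of 2-10 characters that starts with a letter,
# e.g. "(MRI)" or "(IL-2)".
PAREN_RE = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_definition(acronym, preceding):
    """Match the acronym's characters right to left against the text before
    the parenthesis; the acronym's first character must begin a word.
    Returns the candidate definition, or None if the match fails."""
    # Assumed look-back window: at most min(n + 5, 2n) words for an n-char acronym.
    window = min(len(acronym) + 5, 2 * len(acronym))
    text = " ".join(preceding.split()[-window:])
    s, t = len(acronym) - 1, len(text) - 1
    while s >= 0:
        ch = acronym[s].lower()
        if not ch.isalnum():  # skip hyphens inside the acronym
            s -= 1
            continue
        # Move left until the current acronym character is found; for the
        # first acronym character, the match must also start a word.
        while t >= 0 and (
            text[t].lower() != ch
            or (s == 0 and t > 0 and text[t - 1].isalnum())
        ):
            t -= 1
        if t < 0:
            return None
        t -= 1
        s -= 1
    # Start the definition at the beginning of the whitespace-delimited
    # token that contains the first matched character.
    start = text.rfind(" ", 0, t + 1) + 1
    return text[start:]


def extract_pairs(text):
    """Yield (acronym, definition) candidates written as 'definition (ACRONYM)'."""
    for m in PAREN_RE.finditer(text):
        acronym = m.group(1)
        if not any(c.isupper() for c in acronym):  # require an uppercase letter
            continue
        definition = find_definition(acronym, text[: m.start()])
        if definition:
            yield acronym, definition


def build_dictionary(abstracts):
    """Map each acronym to the set of distinct (lowercased) definitions found."""
    senses = defaultdict(set)
    for text in abstracts:
        for acronym, definition in extract_pairs(text):
            senses[acronym].add(definition.lower())
    return senses


def ambiguous_fraction(senses):
    """Fraction of acronyms associated with more than one definition."""
    if not senses:
        return 0.0
    return sum(len(defs) > 1 for defs in senses.values()) / len(senses)


if __name__ == "__main__":
    abstracts = [
        "Magnetic resonance imaging (MRI) was performed.",
        "Patients with multiple sclerosis (MS) were enrolled.",
        "Treatment of mitral stenosis (MS) is discussed.",
    ]
    senses = build_dictionary(abstracts)
    print(sorted(senses["MS"]))        # ['mitral stenosis', 'multiple sclerosis']
    print(ambiguous_fraction(senses))  # 0.5
```

Lowercasing definitions merges case variants only; spelling, phrasing and hyphenation variants, which the abstract says the method can help identify, would still be counted as distinct definitions here, so the normalisation step directly affects any ambiguity estimate.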
