
      A new computing method for extracting contiguous phraseological sequences from academic text corpora


          Abstract

          This study aims to develop a new computing method for extracting contiguous phraseological sequences (PSs) of various lengths from academic text corpora by measuring internal associations of n-grams. We construct a new normalizing algorithm, based on a probability-weighted average, to refine the mutual information (MI) measure and enhance precision in extracting PSs from corpora. This computing method is applied to the data in a medium-sized text corpus of academic English. Results indicate that the resulting MI measure provides statistics that better reveal internal associations within an n-gram, regardless of its size. Lexico-grammatical sequences extracted with this method are more complete and less arbitrary in terms of grammar and semantics. The method can be applied to a variety of linguistic phenomena, ranging from well-established phrases to likely phrasal entities, and thus has potential practical applications in corpus-based studies of phraseology and natural language processing.
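
The abstract does not give the formula for the refined measure, so the following is only a rough sketch of the general idea of scoring the internal association of an n-gram. It computes standard pointwise MI between the two parts of an n-gram at each binary split point, then averages those scores with weights proportional to the product of the parts' probabilities. The weighting scheme and the function names (`ngram_counts`, `split_mi`, `weighted_mi`) are assumptions made for illustration, not the authors' algorithm.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every contiguous n-gram of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def prob(gram, counts, total_tokens):
    """Relative frequency of an n-gram among all positions of its length."""
    positions = total_tokens - len(gram) + 1
    return counts[gram] / positions


def split_mi(gram, k, counts, total_tokens):
    """Standard pointwise MI between gram[:k] and gram[k:]."""
    p_whole = prob(gram, counts, total_tokens)
    p_left = prob(gram[:k], counts, total_tokens)
    p_right = prob(gram[k:], counts, total_tokens)
    return math.log2(p_whole / (p_left * p_right))


def weighted_mi(gram, counts, total_tokens):
    """Hypothetical probability-weighted average of MI over all split points.

    Assumption: each split is weighted by P(left) * P(right). The source
    abstract does not specify the actual normalization.
    """
    if counts[gram] == 0:
        raise ValueError("n-gram not observed in the corpus")
    weights, scores = [], []
    for k in range(1, len(gram)):
        left, right = gram[:k], gram[k:]
        w = prob(left, counts, total_tokens) * prob(right, counts, total_tokens)
        weights.append(w)
        scores.append(split_mi(gram, k, counts, total_tokens))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Toy corpus, invented for this example only.
tokens = "in terms of the results in terms of the data on the other hand".split()
counts = ngram_counts(tokens, max_n=3)
score = weighted_mi(("in", "terms", "of"), counts, len(tokens))
```

For a bigram there is only one split point, so this score reduces to ordinary pointwise MI. For longer n-grams, averaging over splits is one simple way to obtain a single association value whatever the sequence length.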

          Author and article information

          Journal: International Journal of Corpus Linguistics (IJCL)
          Publisher: John Benjamins Publishing Company
          ISSN: 1384-6655, 1569-9811
          Year: 2013
          Volume 18, Issue 4, Pages 506-535
          Type: Article
          DOI: 10.1075/ijcl.18.4.03wei
          © 2013