
      Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

      Preprint

          Abstract

          Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-shelf attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bidirectional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% \(F_{0.5}\) score. When attempting to determine if a given sentence is synthetic, a human annotator at best achieves 39.39 \(F_1\) score, indicating that our model generates mostly human-like instances.
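
          The abstract reports both \(F_{0.5}\) and \(F_1\) scores. As a minimal sketch of how these relate, the standard F-beta measure can be computed as below; with beta = 0.5, precision counts more heavily than recall. The function name `f_beta` and the sample precision/recall values are illustrative assumptions, not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Conventionally 0 when both precision and recall are 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Made-up values for illustration only (not results from the paper).
precise_system = f_beta(0.8, 0.4)      # high precision, low recall
recall_system = f_beta(0.4, 0.8)       # low precision, high recall
balanced_f1 = f_beta(0.8, 0.4, 1.0)    # F_1 weights both equally

print(precise_system, recall_system, balanced_f1)
```

          Under \(F_{0.5}\), the high-precision system scores higher than the high-recall one, although both have the same \(F_1\).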


                Author and article information

                26 September 2018
                arXiv:1810.00668

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Accepted as a short paper at EMNLP 2018
                cs.CL cs.LG stat.ML

                Theoretical computer science, Machine learning, Artificial intelligence
