
Is Open Access

Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

Preprint

Author(s): Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel

Publication date Created: 26 September 2018

Abstract

Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-rack attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bi-directional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% \(F_{0.5}\) score. When attempting to determine if a given sentence is synthetic, a human annotator at best achieves 39.39 \(F_1\) score, indicating that our model generates mostly human-like instances.
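The abstract reports results as \(F_{0.5}\) and \(F_1\) scores. As a reading aid, the sketch below computes the general \(F_\beta\) score, the weighted harmonic mean of precision and recall, and illustrates that \(\beta = 0.5\) favours precision over recall. The function name `f_beta` and the precision/recall values are illustrative assumptions; they are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta = 1 gives the usual F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid dividing by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical detectors (values made up for illustration only):
# one precise but conservative, one with the opposite trade-off.
precise = f_beta(precision=0.8, recall=0.4, beta=0.5)
recall_heavy = f_beta(precision=0.4, recall=0.8, beta=0.5)

print(f"F0.5, precise detector:      {precise:.4f}")
print(f"F0.5, recall-heavy detector: {recall_heavy:.4f}")
print(f"F1 for either detector:      {f_beta(0.8, 0.4, beta=1.0):.4f}")
```

Under \(F_1\) the two hypothetical detectors tie, while under \(F_{0.5}\) the precise one scores higher, which is why a precision-weighted metric suits a task where false alarms are costly.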

Author and article information

Article

arXiv ID: 1810.00668

SO-VID: cd970ac1-1836-4246-8b1e-183001295328

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments: Accepted as a short paper at EMNLP 2018

Categories: cs.CL, cs.LG, stat.ML

ScienceOpen disciplines: Theoretical computer science, Machine learning, Artificial intelligence
