      Is Open Access

      A Portuguese Native Language Identification Dataset

      Preprint


          Abstract

          In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language (L1) based on their writing in a second language. The dataset includes 1,868 student essays written by learners of European Portuguese who are native speakers of the following L1s: Chinese, English, Spanish, German, Russian, French, Japanese, Italian, Dutch, Tetum, Arabic, Polish, Korean, Romanian, and Swedish. NLI-PT includes the original student texts and four types of annotation: part-of-speech (POS) tags, fine-grained POS tags, constituency parses, and dependency parses. Beyond NLI, the dataset can support research on several topics in Second Language Acquisition and educational NLP. We discuss possible applications of NLI-PT and present results for the first lexical baseline system for Portuguese NLI.
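The abstract mentions a lexical baseline without describing it. As a hedged illustration of what a lexical NLI classifier can look like, the sketch below trains a multinomial Naive Bayes model over lowercased word unigrams and bigrams using only the Python standard library. This is an assumption chosen for brevity, not the system reported for NLI-PT; the helper `ngrams`, the class `NaiveBayesNLI`, the labels `L1_A`/`L1_B`, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercased word n-grams (n = 1..n_max) as lexical features."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayesNLI:
    """Hypothetical lexical baseline: multinomial Naive Bayes, add-one smoothing.

    Illustrative only; not the classifier evaluated on NLI-PT.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per L1 label
        self.feat_counts = defaultdict(Counter)   # label -> n-gram counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total n-gram tokens observed per label (smoothing denominator).
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label in self.class_counts:
            # Log prior plus summed log likelihoods of the text's n-grams.
            score = math.log(self.class_counts[label] / n_docs)
            for f in ngrams(text):
                if f not in self.vocab:
                    continue  # ignore n-grams never seen in training
                count = self.feat_counts[label][f]
                score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


# Toy usage with invented sentences and hypothetical L1 labels.
train_texts = [
    "eu gosto muito de estudar português",
    "eu gosto de ler livros",
    "a minha cidade é muito grande",
    "a minha família é grande",
]
train_labels = ["L1_A", "L1_A", "L1_B", "L1_B"]

model = NaiveBayesNLI().fit(train_texts, train_labels)
print(model.predict("eu gosto de português"))  # → L1_A
print(model.predict("a minha casa é grande"))  # → L1_B
```

Published NLI systems more commonly rely on character and word n-grams fed to discriminative linear classifiers such as SVMs; Naive Bayes is used here only because it fits in a few self-contained lines.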


                Author and article information

                Date: 30 April 2018
                arXiv: 1804.11346
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Venue: Proceedings of The 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA)
                Category: cs.CL

                Theoretical computer science
