      Is Open Access

      SQuAD: 100,000+ Questions for Machine Comprehension of Text

      Preprint


          Abstract

          We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research. The dataset is freely available at https://stanford-qa.com.
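The abstract reports F1 scores but does not define the metric. As a rough illustration only, the sketch below assumes the token-overlap F1 commonly used for span-extraction question answering; the paper's own evaluation may normalize or tokenize text differently. The function name `token_f1` and the example strings are hypothetical and not taken from the dataset.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer span and a reference span.

    Assumption: lowercased, whitespace-split tokens. This is a simplification
    and not necessarily the normalization used in the paper.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example: the reference answer is a segment of the passage,
# and the prediction includes one extra token.
passage = "The committee met in the old town hall on Friday."
reference = "old town hall"
prediction = "the old town hall"

assert reference in passage  # answers are spans of the reading passage
score = token_f1(prediction, reference)
```

Here the prediction covers all three reference tokens (recall 1.0) but adds one extra token (precision 0.75), so partial credit is given rather than the zero that an exact-match criterion would assign.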

          Author and article information

          Journal
          2016-06-16
          2016-10-06
          Article
          1606.05250

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          History
          Custom metadata
          10 pages
          cs.CL

          Theoretical computer science
