
      Learning Semantic Textual Similarity from Conversations

      Preprint


          Abstract

          We present a novel approach to learn representations for sentence-level semantic similarity using conversational data. Our method trains an unsupervised model to predict conversational input-response pairs. The resulting sentence embeddings perform well on the semantic textual similarity (STS) benchmark and SemEval 2017's Community Question Answering (CQA) question similarity subtask. Performance is further improved by introducing multitask training combining the conversational input-response prediction task and a natural language inference task. Extensive experiments show the proposed model achieves the best performance among all neural models on the STS benchmark and is competitive with the state-of-the-art feature engineered and mixed systems in both tasks.
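
          The core idea in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the paper's implementation: the encoder architecture, dimensions, and optimizer settings are assumptions. A shared encoder is trained so that each conversational input scores highest against its true response, with the other responses in the batch acting as negatives; once trained, STS-style similarity can be read off directly from the sentence embeddings.

```python
# Minimal sketch of the input-response prediction objective (hypothetical
# architecture; the paper's exact encoder is not reproduced here).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AveragingEncoder(nn.Module):
    """Toy sentence encoder: average word embeddings, then project."""
    def __init__(self, vocab_size=10000, embed_dim=128, out_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, out_dim)

    def forward(self, token_ids):              # (batch, seq_len) int tensor
        return self.proj(self.embed(token_ids).mean(dim=1))

encoder = AveragingEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Dummy batch of tokenized (input, response) pairs; row i of `responses`
# is the true response to row i of `inputs`.
inputs = torch.randint(0, 10000, (32, 12))
responses = torch.randint(0, 10000, (32, 12))

u = encoder(inputs)                               # input embeddings
v = encoder(responses)                            # response embeddings
scores = u @ v.t()                                # (32, 32) dot-product scores
loss = F.cross_entropy(scores, torch.arange(32))  # true response on the diagonal
loss.backward()
optimizer.step()

# After training, sentence similarity (as on the STS benchmark) is just
# similarity between the learned embeddings:
sim = F.cosine_similarity(encoder(inputs[:1]), encoder(inputs[1:2]))
```

          The in-batch softmax makes every other response in the batch a negative example for free, which is what lets the model train on raw input-response pairs without any similarity labels.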

          Most cited references (13)


          Supervised Learning of Universal Sentence Representations from Natural Language Inference Data


            A large annotated corpus for learning natural language inference

            Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

              Deep Unordered Composition Rivals Syntactic Methods for Text Classification
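
            These references pair naturally with a small illustration. The SNLI corpus described above supplies (premise, hypothesis) pairs labeled entailment, neutral, or contradiction, and the "universal sentence representations" reference trains a classifier over sentence embeddings of such pairs. The sketch below shows the common [u; v; |u − v|; u * v] pair featurization from that line of work; the dimensions and layer sizes are assumptions, not values from any of the cited papers.

```python
# Illustrative NLI classifier head over sentence embeddings, in the style of
# the sentence-representation reference above. All sizes are made up.
import torch
import torch.nn as nn

LABELS = ("entailment", "neutral", "contradiction")  # SNLI label set

classifier = nn.Sequential(
    nn.Linear(4 * 128, 256),   # input: [u; v; |u - v|; u * v] pair features
    nn.ReLU(),
    nn.Linear(256, len(LABELS)),
)

u = torch.randn(8, 128)        # premise embeddings (from any sentence encoder)
v = torch.randn(8, 128)        # hypothesis embeddings
features = torch.cat([u, v, (u - v).abs(), u * v], dim=1)
logits = classifier(features)  # (8, 3) scores over the three labels
```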


                Author and article information

                Published: 20 April 2018
                Article type: Article
                arXiv: 1804.07754
                Record ID: 8bbb513c-2f88-4f94-a2ac-666b49bb255f

                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata: 10 pages, 8 figures, 6 tables
                Subject: cs.CL (Computation and Language)

                Theoretical computer science
