
      Automatic Data Expansion for Customer-care Spoken Language Understanding

      Preprint

          Abstract

          Spoken language understanding (SLU) systems are widely used to handle customer-care calls. A traditional SLU system consists of an acoustic model (AM) and a language model (LM) that are used to decode the utterance, and a natural language understanding (NLU) model that predicts the intent. While the AM can be shared across different domains, the LM and NLU models need to be trained specifically for every new task. However, preparing enough data to train these models is prohibitively expensive. In this paper, we introduce an efficient method to expand the limited in-domain data. The process starts with training a preliminary NLU model based on logistic regression on the in-domain data. Since the features are based on n-grams with n = 1, 2, we can detect the most informative n-grams for each intent class. Using these n-grams, we find the samples in the out-of-domain corpus that 1) contain the desired n-gram and/or 2) have a similar intent label. The ones that meet the first constraint are used to train a new LM, and the ones that meet both constraints are used to train a new NLU model. Our results on two divergent experimental setups show that the proposed approach reduces the absolute classification error rate (CER) by 30% compared to the preliminary models, and that it significantly outperforms traditional data expansion algorithms such as those based on semi-supervised learning, TF-IDF, and embedding vectors.
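
          The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ngrams`, `train_one_vs_rest`, `top_ngrams`, `select_out_of_domain`), the one-vs-rest training loop, the hyperparameters, and the toy utterances are all assumptions made for illustration. It covers only the first constraint (an out-of-domain sample contains an informative n-gram); matching intent labels and retraining the LM and NLU models are left out.

```python
import math
from collections import defaultdict


def ngrams(text, orders=(1, 2)):
    """Return the set of word 1-grams and 2-grams in an utterance."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in orders
            for i in range(len(tokens) - n + 1)}


def train_one_vs_rest(samples, epochs=100, lr=0.1):
    """Fit one binary logistic regression per intent on binary n-gram features.

    samples is a list of (utterance, intent) pairs.
    Returns {intent: {ngram: weight}} (biases are fitted but not returned).
    """
    feats = [(ngrams(text), intent) for text, intent in samples]
    intents = sorted({intent for _, intent in samples})
    weights = {c: defaultdict(float) for c in intents}
    bias = {c: 0.0 for c in intents}
    for _ in range(epochs):
        for grams, intent in feats:
            for c in intents:
                z = bias[c] + sum(weights[c][g] for g in grams)
                p = 1.0 / (1.0 + math.exp(-z))
                grad = p - (1.0 if c == intent else 0.0)  # d(log loss)/dz
                bias[c] -= lr * grad
                for g in grams:
                    weights[c][g] -= lr * grad
    return weights


def top_ngrams(weights, k=2):
    """Pick the k highest-weighted n-grams per intent (ties broken alphabetically)."""
    return {c: [g for g, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
            for c, w in weights.items()}


def select_out_of_domain(corpus, informative):
    """Keep out-of-domain utterances containing at least one informative n-gram."""
    wanted = {g for grams in informative.values() for g in grams}
    return [text for text in corpus if ngrams(text) & wanted]


# Toy data, invented purely for illustration.
in_domain = [
    ("i want to pay my bill", "billing"),
    ("pay bill please", "billing"),
    ("my internet is down", "tech_support"),
    ("internet not working", "tech_support"),
]
out_of_domain = [
    "can i pay my bill online",
    "what is the weather today",
    "the internet connection is slow",
]

informative = top_ngrams(train_one_vs_rest(in_domain), k=2)
selected = select_out_of_domain(out_of_domain, informative)
print(informative)
print(selected)
```

          In this sketch the selected utterances would be the candidates for LM training; a fuller version would additionally filter by intent label before adding samples to the NLU training set.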


                Author and article information

                Journal
                27 September 2018
                Article
                1810.00670

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                10 pages, 4 figures, 5 tables
                cs.CL

                Theoretical computer science
