      A Purely End-to-end System for Multi-speaker Speech Recognition

      Preprint

          Abstract

          Interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture, has grown recently. Promising techniques have been proposed for this task, but earlier works have required additional training data, such as isolated source signals or senone alignments, for effective learning. In this paper, we propose a new sequence-to-sequence framework that directly decodes multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner. We further propose a new objective function that improves the contrast between the hidden vectors, so that the model avoids generating similar hypotheses. Experimental results show that the model is able to learn a mapping directly from a speech mixture to multiple label sequences, achieving an 83.1% relative improvement compared to a model trained without the proposed objective. Interestingly, the results are comparable to those produced by previous end-to-end works featuring explicit separation and recognition modules.
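
The abstract does not spell out the form of the contrast objective or how the relative improvement is computed. The sketch below is one plausible reading, not the authors' implementation: it assumes the contrast term is a negative symmetric KL divergence between softmax-normalized hidden vectors of two speaker branches, and that relative improvement is the usual relative error reduction. The names `contrast_penalty`, `relative_improvement`, and the weight `eta` are hypothetical, and the numbers in the usage lines are illustrative only, not results from the paper.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def contrast_penalty(h1, h2, eta=0.1):
    """Assumed contrast term: negative symmetric KL, summed over frames.

    h1, h2: lists of per-frame hidden vectors, one list per speaker branch.
    The value is 0 when the branches are identical and becomes more
    negative as they diverge, so minimizing it pushes the branches apart.
    """
    total = 0.0
    for f1, f2 in zip(h1, h2):
        p, q = softmax(f1), softmax(f2)
        total += kl(p, q) + kl(q, p)
    return -eta * total


def relative_improvement(baseline_err, new_err):
    """Relative error reduction in percent (assumed definition)."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Illustrative inputs only (not data from the paper).
same = contrast_penalty([[1.0, 2.0]], [[1.0, 2.0]])
diff = contrast_penalty([[3.0, 0.0]], [[0.0, 3.0]])
halved = relative_improvement(10.0, 5.0)
```

In a full training setup such a term would be added to the recognition loss with a small weight, so that the branches decode different speakers without the penalty dominating the objective.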

                Author and article information

                Date: 15 May 2018
                arXiv: 1805.05826
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Venue: ACL 2018
                Categories: cs.SD cs.CL eess.AS stat.ML
                Subjects: Theoretical computer science, Machine learning, Electrical engineering, Graphics & Multimedia design
