      Is Open Access

      Low-Latency Speaker-Independent Continuous Speech Separation

      Preprint


          Abstract

          Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is emitted from one of the SI-CSS output channels, chosen nondeterministically, and is never split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. Because the output signals contain no speech overlaps, they can be sent directly to a speech recognition engine. The previous SI-CSS method uses a neural network trained with permutation invariant training together with a data-driven beamformer, and thus incurs high processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method in a microphone array-based meeting transcription task. This is achieved (1) by using a new speech separation network architecture combined with a double buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural post-filter.
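
The abstract names permutation invariant training (PIT) as the training criterion of the previous method without spelling it out. As a rough illustration only, and not code from the paper, the sketch below shows the general idea of an utterance-level PIT loss: the error is evaluated under every assignment of network outputs to reference speakers, and the smallest value is used. The function name `pit_mse`, the mean-squared-error criterion, and the array shapes are assumptions made for this example; the actual networks and loss used by the authors may differ.

```python
import itertools

import numpy as np


def pit_mse(estimates, references):
    """Utterance-level permutation invariant MSE (illustrative sketch).

    estimates, references: arrays of shape (num_sources, num_samples).
    Returns (loss, perm), where perm[i] is the index of the reference
    matched to estimate i under the best assignment.
    """
    estimates = np.asarray(estimates, dtype=float)
    references = np.asarray(references, dtype=float)
    if estimates.shape != references.shape:
        raise ValueError("estimates and references must have the same shape")

    num_sources = estimates.shape[0]
    best_loss, best_perm = float("inf"), None
    # Try every output-to-speaker assignment and keep the cheapest one.
    for perm in itertools.permutations(range(num_sources)):
        loss = float(np.mean((estimates - references[list(perm)]) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Because the search covers all orderings, the loss does not depend on which output channel a given speaker lands on, which is consistent with the abstract's remark that each utterance appears on one output channel nondeterministically. The exhaustive search grows factorially with the number of sources, so this form is only practical for the small, fixed channel counts typical of this setting.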


                Author and article information

                Date: 13 April 2019
                arXiv ID: 1904.06478
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                arXiv categories: eess.AS cs.CL cs.SD
                Subjects: Theoretical computer science, Graphics & Multimedia design, Electrical engineering
