
Is Open Access

Low-Latency Speaker-Independent Continuous Speech Separation

Preprint

Author(s): Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis

Publication date (created): 13 April 2019


Abstract

Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is generated from one of the SI-CSS output channels, chosen nondeterministically, without being split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. Because the output signals contain no speech overlaps, they can be sent directly to a speech recognition engine. The previous SI-CSS method uses a neural network trained with permutation invariant training and a data-driven beamformer, and thus incurs substantial processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method in a microphone array-based meeting transcription task. This is achieved (1) by using a new speech separation network architecture combined with a double buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural post-filter.
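The output constraint described in the abstract (each utterance lands whole on exactly one channel, and no channel ever carries two overlapping utterances) can be illustrated with a toy sketch. This is not the paper's method, which separates audio signals with a neural network and beamformers; it only operates on hypothetical (start, end) utterance timestamps. The function name `assign_channels`, the greedy first-free-channel rule, and the default of two channels are all assumptions made for illustration.

```python
from typing import List, Tuple

Utterance = Tuple[float, float]  # (start_time, end_time) in seconds


def assign_channels(
    utterances: List[Utterance], num_channels: int = 2
) -> List[Tuple[Utterance, int]]:
    """Place each utterance whole on one channel so that no two
    utterances on the same channel overlap in time.

    Toy illustration of the SI-CSS output property only; the greedy
    rule here is an assumption, not the assignment used in the paper.
    """
    # Time at which each channel becomes free again.
    channel_free_at = [float("-inf")] * num_channels
    assignment = []
    for start, end in sorted(utterances):
        for ch in range(num_channels):
            if channel_free_at[ch] <= start:
                channel_free_at[ch] = end
                assignment.append(((start, end), ch))
                break
        else:
            # More simultaneous speakers than output channels.
            raise ValueError(
                f"utterance ({start}, {end}) overlaps all {num_channels} channels"
            )
    return assignment


if __name__ == "__main__":
    # Two overlapping utterances followed by a non-overlapping one.
    for utt, ch in assign_channels([(0.0, 3.0), (2.0, 5.0), (5.5, 7.0)]):
        print(f"utterance {utt} -> channel {ch}")
```

With a fixed number of channels, the constraint can only be met when the number of simultaneously active speakers never exceeds that number, which is why the sketch raises an error in that case.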


Author and article information


arXiv ID: 1904.06478

SO-VID: 29538206-a67c-46a4-92e7-d30c2f166d9e

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/


Categories: eess.AS, cs.CL, cs.SD

ScienceOpen disciplines: Theoretical computer science, Graphics & Multimedia design, Electrical engineering
