ScienceOpen: research and publishing network


Is Open Access

A Deep Generative Model of Speech Complex Spectrograms

Preprint

Author(s): Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii

Publication date (created): 07 March 2019

Abstract

This paper proposes an approach to the joint modeling of the short-time Fourier transform magnitude and phase spectrograms with a deep generative model. We assume that the magnitude follows a Gaussian distribution and the phase follows a von Mises distribution. To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency. Based on these assumptions, we explore and compare several combinations of loss functions for training our models. Built upon the variational autoencoder framework, our model consists of three convolutional neural networks acting as an encoder, a magnitude decoder, and a phase decoder. In addition to the latent variables, we propose to also condition the phase estimation on the estimated magnitude. Evaluated on a time-domain speech reconstruction task, our models could generate speech with high perceptual quality and high intelligibility.
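
As a rough illustration of the phase assumptions in the abstract, the sketch below computes von Mises negative log-likelihoods for the phase, for its difference along frequency (a group-delay proxy), and for its difference along time (an instantaneous-frequency proxy). It is not the authors' implementation: the function names, the fixed concentration `kappa`, the finite-difference approximation of the derivatives, and the plain list-of-lists spectrogram layout (frequency bins by frames) are all assumptions made for this example.

```python
import math


def bessel_i0(x, terms=30):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_nll(targets, estimates, kappa=1.0):
    """Mean negative log-likelihood of target angles under a von Mises
    distribution centred on the estimates, with an assumed fixed kappa."""
    log_norm = math.log(2.0 * math.pi * bessel_i0(kappa))
    # cos() is 2*pi-periodic, so the angles need no explicit wrapping.
    costs = [log_norm - kappa * math.cos(t - e)
             for t, e in zip(targets, estimates)]
    return sum(costs) / len(costs)


def flatten(phase):
    return [v for row in phase for v in row]


def freq_diff(phase):
    """Negative finite difference along frequency (group-delay proxy)."""
    return [-(phase[f + 1][t] - phase[f][t])
            for f in range(len(phase) - 1) for t in range(len(phase[0]))]


def time_diff(phase):
    """Finite difference along time (instantaneous-frequency proxy)."""
    return [phase[f][t + 1] - phase[f][t]
            for f in range(len(phase)) for t in range(len(phase[0]) - 1)]


def phase_losses(phase_true, phase_est, kappa=1.0):
    """Three candidate loss terms for a phase spectrogram in radians,
    laid out as phase[frequency_bin][frame]."""
    return {
        "phase": von_mises_nll(flatten(phase_true), flatten(phase_est), kappa),
        "group_delay": von_mises_nll(freq_diff(phase_true),
                                     freq_diff(phase_est), kappa),
        "inst_freq": von_mises_nll(time_diff(phase_true),
                                   time_diff(phase_est), kappa),
    }
```

In this sketch, adding a constant offset to every estimated phase value raises the `phase` term but leaves the `group_delay` and `inst_freq` terms unchanged, since the offset cancels in the differences. That is one way to see why derivative-based terms constrain the relative structure of the phase across neighbouring time-frequency bins rather than its absolute value.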

Author and article information

Article

arXiv ID: 1903.03269

SO-VID: 36df0374-06c1-4caa-b662-1bd438e20b3d

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Categories: cs.SD, cs.LG, eess.AS, stat.ML

ScienceOpen disciplines: Machine learning, Artificial intelligence, Electrical engineering, Graphics & Multimedia design
