A non-linear source-filter based vocoder with prosody control
P. Giridhar, G. Ramesh
Published by Institute of Electrical and Electronics Engineers Inc.
2023
Abstract
Reconstructing a speech signal from its compact acoustic representation is a challenging task. Although the acoustic representations produced by speech processing systems (such as text-to-speech synthesis and speech enhancement) are highly accurate, the performance of the vocoder still affects the naturalness of the synthesized speech. Conventional vocoders are based on the linear source-filter model of human speech production. However, they cannot be incorporated into the training of an end-to-end model, and they are sensitive to errors in the estimated acoustic representations. Neural vocoders such as WaveNet can be incorporated into end-to-end training, but their complexity and inference time are high, and they provide no means of controlling prosody. In this paper, we propose a compact, neural-network-based non-linear vocoder with prosody control that builds on the source-filter model of human speech production. The prosody of the synthesized speech can be controlled effectively by adjusting prosodic parameters such as the fundamental frequency (f_0) without degrading its naturalness. The model performs better, achieving a mean opinion score (MOS) of 4.09 with a much lower real-time factor and lower model complexity. © 2023 IEEE.
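For reference, the conventional linear source-filter model that the abstract contrasts against can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' neural non-linear vocoder: an impulse train at the fundamental frequency f_0 (the "source") excites a cascade of second-order formant resonators (the "filter"), and prosody control amounts to editing the f_0 contour before synthesis. The function names, the formant frequencies and bandwidths, and the 16 kHz sample rate are hypothetical choices made for illustration only.

```python
import math


def glottal_source(f0_contour, sr):
    """Impulse-train excitation: emit a unit pulse each time the pitch phase wraps.

    f0_contour holds one f0 value in Hz per output sample; f0 = 0 yields silence
    (a simplification: real vocoders inject noise for unvoiced segments).
    """
    out = []
    phase = 0.0
    for f0 in f0_contour:
        phase += f0 / sr  # advance phase by one sample's worth of pitch cycles
        if phase >= 1.0:
            phase -= 1.0
            out.append(1.0)
        else:
            out.append(0.0)
    return out


def resonator(x, freq, bandwidth, sr):
    """Two-pole (all-pole) resonator modelling a single formant."""
    r = math.exp(-math.pi * bandwidth / sr)  # pole radius set by bandwidth
    theta = 2.0 * math.pi * freq / sr        # pole angle set by centre frequency
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s - a1 * y1 - a2 * y2  # y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out


def source_filter_synth(f0_contour, formants, sr=16000):
    """Linear source-filter synthesis: excitation -> cascaded formant filters."""
    y = glottal_source(f0_contour, sr)
    for freq, bw in formants:
        y = resonator(y, freq, bw, sr)
    peak = max((abs(v) for v in y), default=0.0) or 1.0
    return [v / peak for v in y]  # peak-normalise to [-1, 1]


def scale_pitch(f0_contour, factor):
    """Prosody control in its simplest form: scale the f0 contour."""
    return [f0 * factor for f0 in f0_contour]


if __name__ == "__main__":
    sr = 16000
    f0 = [120.0] * (sr // 2)                            # 0.5 s at a flat 120 Hz
    vowel = [(700, 80), (1200, 90), (2600, 120)]        # hypothetical /a/-like formants
    normal = source_filter_synth(f0, vowel, sr)
    raised = source_filter_synth(scale_pitch(f0, 1.5), vowel, sr)  # 180 Hz
    print(len(normal), len(raised))
```

Because the filter here is fixed and linear, raising f_0 changes only the pulse spacing while the formant structure stays put. According to the abstract, the proposed vocoder keeps this source-filter decomposition but replaces the linear filter with a compact non-linear neural network; its architecture is not described in this summary.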
About the journal
Journal: 2023 National Conference on Communications, NCC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.