Relative Significance of Speech Sounds in Speaker Verification Systems
B.S.M. Rafi, S. Sankala,
Published by Birkhauser
2023
Volume: 42
Issue: 9
Pages: 5412 - 5427
Abstract
Automatic speaker verification (ASV) is the task of authenticating the claimed identity of a speaker from their voice characteristics. State-of-the-art ASV systems rely on capturing the voice signature of a speaker in a fixed-dimensional embedding. Recent studies have reported that the performance of an ASV system improves when phonetic information obtained from a phoneme recognizer is appended to the frame-level speech representations. This work analyzes the relative significance of various phonetic classes in extracting speaker-discriminative embeddings. We use a temporal attention mechanism to analyze the importance of different phonetic classes in speaker verification. We observe that vowels, fricatives, and nasals receive relatively higher attention in the speaker verification task. This observation agrees with earlier subjective studies, which point to the speaker-discriminative characteristics of vowels and nasals. In the process, we demonstrate the efficiency of self-supervised phonetic information in extracting robust speaker embeddings. The proposed self-supervised phonetic attentive ASV system achieved a relative improvement of 29.2% over the baseline x-vector system and 19.3% over its supervised counterpart. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
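As a rough illustration of the idea in the abstract, the sketch below shows how temporal attention can pool frame-level features into one fixed-dimensional vector, and how the attention weights can then be summed per phonetic class to see which classes receive more attention. This is a minimal sketch and not the authors' implementation: the function names, the toy features, the per-frame scores, and the frame-level class labels are all assumptions made for the example. In a real system the scores would come from a trained network.

```python
import math
from collections import defaultdict


def attention_weights(scores):
    """Softmax over per-frame scores, so the weights sum to 1 across time."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames, weights):
    """Attention-weighted mean of frame vectors: one fixed-dimensional vector."""
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def attention_per_class(weights, labels):
    """Total attention mass received by each phonetic class."""
    mass = defaultdict(float)
    for w, label in zip(weights, labels):
        mass[label] += w
    return dict(mass)


# Toy utterance: 4 frames of 2-dimensional features (made-up numbers).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
# Hypothetical per-frame attention scores (learned by a network in practice).
scores = [2.0, 0.0, 2.0, 0.0]
# Hypothetical frame-level phonetic class labels.
labels = ["vowel", "stop", "nasal", "stop"]

weights = attention_weights(scores)
embedding = attentive_pool(frames, weights)
class_mass = attention_per_class(weights, labels)
```

Here `class_mass` maps each class to its share of the total attention for the utterance. Averaging such shares over many utterances is one simple way to compare phonetic classes, under the assumptions stated above.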
About the journal
Journal: Circuits, Systems, and Signal Processing
Publisher: Birkhauser
ISSN: 0278-081X