Attention-based phonetic convolutional recurrent neural networks for language identification
R. Gundluru, V. Venkatesh,
Published in Institute of Electrical and Electronics Engineers Inc.
Language identification is the task of identifying the language of a spoken utterance. Deep neural models such as LSTM-RNNs with an attention mechanism have shown great potential in language identification. Language cues such as phonemes and their co-occurrences are important for distinguishing languages, but acoustic feature-based systems do not utilize phonetic information. Phonetic feature-based LSTM-RNN models have therefore shown improvement over models that use raw acoustic features. These methods require a large amount of transcribed speech data to train the phoneme discriminator, and obtaining transcribed speech data for low-resource Indian languages is difficult. To alleviate this issue, we investigate the use of phonetic discriminators pre-trained on rich-resource languages to extract phonetic features for low-resource target languages. We then trained an attention-based CRNN end-to-end utterance-level language identification (LID) system with these discriminative phonetic features. We used open-source LibriSpeech English data to train the phoneme discriminator with the sequence-discriminative lattice-free maximum mutual information (LF-MMI) objective. We achieved an overall 20% absolute improvement over the baseline CRNN model trained on acoustic features. We also investigate the significance of duration in LID. © 2021 IEEE.
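As a rough illustration of how an attention mechanism can turn frame-level features into a single utterance-level representation, the sketch below applies a generic dot-product attention pooling in plain Python. The abstract does not specify the form of attention used, so the scoring vector `w`, the function names, and the toy feature values are all hypothetical and not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, w):
    """Collapse frame-level features (T x D) into one utterance vector (D).

    frames: list of T feature vectors, each of length D
    w:      hypothetical learned scoring vector of length D (an assumption)
    """
    # One scalar relevance score per frame: dot product with w.
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in frames]
    # Normalize the scores into attention weights that sum to 1.
    alphas = softmax(scores)
    dim = len(frames[0])
    # The weighted sum of frames gives the utterance-level embedding.
    pooled = [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas

# Toy input: 3 frames of 2-dimensional features (made-up numbers).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.5, 0.5]
pooled, alphas = attention_pool(frames, w)
```

In a full system the pooled vector would feed a language classifier, and the frames would be outputs of the convolutional and recurrent layers rather than hand-written values.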
About the journal
Journal: 2021 National Conference on Communications, NCC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.