Attention-based phonetic convolutional recurrent neural networks for language identification

R. Gundluru; V. Venkatesh; Sri Rama Murty Kodukula

doi:10.1109/NCC52529.2021.9530030

Language identification is the task of identifying the language of a spoken utterance. Deep neural models such as LSTM-RNNs with an attention mechanism have shown great potential in language identification. Language cues such as phonemes and their co-occurrences are important for distinguishing languages. Acoustic feature-based systems do not utilize phonetic information, and phonetic feature-based LSTM-RNN models have shown improvement over models trained on raw acoustic features. However, these methods require a large amount of transcribed speech data to train the phoneme discriminator, and obtaining transcribed speech data for low-resource Indian languages is difficult. To alleviate this issue, we investigate using phoneme discriminators pre-trained on a rich-resource language to extract phonetic features for low-resource target languages. We then train an attention-based CRNN end-to-end utterance-level language identification (LID) system on these discriminative phonetic features. We use open-source LibriSpeech English data to train the phoneme discriminator with the sequence-discriminative lattice-free maximum mutual information (LF-MMI) objective. We achieve an overall 20% absolute improvement over the baseline CRNN model trained on acoustic features. We also investigate the significance of duration in LID. © 2021 IEEE.

Journal	2021 National Conference on Communications, NCC 2021
Publisher	Institute of Electrical and Electronics Engineers Inc.