Unsupervised acoustic segmentation and clustering using siamese network embeddings

S. Bhati; S. Nayak; Sri Rama Murty Kodukula; N. Dehak

doi:10.21437/Interspeech.2019-2981

Published in International Speech Communication Association

2019

Volume: 2019-September

Pages: 2668 - 2672

Abstract

Unsupervised discovery of acoustic units from the raw speech signal forms the core objective of zero-resource speech processing. It involves identifying the acoustic segment boundaries and consistently assigning unique labels to acoustically similar segments. In this work, the possible candidates for segment boundaries are identified in an unsupervised manner from the kernel Gram matrix computed from the Mel-frequency cepstral coefficients (MFCCs). These segment boundary candidates are used to train a siamese network that is intended to learn embeddings that minimize intrasegment distances and maximize intersegment distances. The siamese embeddings capture phonetic information from longer contexts of the speech signal and enhance intersegment discriminability. These properties make the siamese embeddings better suited for acoustic segmentation and clustering than the raw MFCC features. The Gram matrix computed from the siamese embeddings provides unambiguous evidence for boundary locations. The initial candidate boundaries are refined using this evidence, and siamese embeddings are extracted for the new acoustic segments. A graph-growing approach is used to cluster the siamese embeddings, and a unique label is assigned to acoustically similar segments. The performance of the proposed method for acoustic segmentation and clustering is evaluated on the Zero Resource 2017 database. Copyright © 2019 ISCA
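The first two steps of the abstract (boundary candidates from a kernel Gram matrix, then an objective that pulls same-segment frames together and pushes different-segment frames apart) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state which kernel is used, how boundaries are read off the Gram matrix, or which loss trains the siamese network, so the Gaussian kernel, the cross-boundary similarity score, the threshold, the standard contrastive loss, and all function and parameter names below are assumptions chosen for illustration.

```python
import math


def gaussian_gram(frames, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Assumption: a Gaussian kernel; the abstract only says "kernel Gram matrix".
    """
    n = len(frames)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(frames[i], frames[j]))
            gram[i][j] = gram[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return gram


def cross_similarity(gram, t, width):
    """Mean kernel similarity between the `width` frames before t and the
    `width` frames starting at t. A low value suggests a boundary at t."""
    total = sum(
        gram[i][j]
        for i in range(t - width, t)
        for j in range(t, t + width)
    )
    return total / (width * width)


def candidate_boundaries(frames, width=2, sigma=1.0, threshold=0.5):
    """Return frame indices where cross-boundary similarity is a local
    minimum below `threshold` (a hypothetical selection rule)."""
    gram = gaussian_gram(frames, sigma)
    n = len(frames)
    scores = {
        t: cross_similarity(gram, t, width)
        for t in range(width, n - width + 1)
    }
    boundaries = []
    for t, score in scores.items():
        prev_score = scores.get(t - 1, 1.0)
        next_score = scores.get(t + 1, 1.0)
        if score < threshold and score <= prev_score and score < next_score:
            boundaries.append(t)
    return boundaries


def contrastive_loss(emb_a, emb_b, same_segment, margin=1.0):
    """Standard contrastive pair loss (assumed, not taken from the paper):
    squared distance for same-segment pairs, squared hinge on the margin
    for different-segment pairs."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if same_segment:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


# Toy "MFCC" sequence: three flat segments of four 2-D frames each.
toy_frames = [[0.0, 0.0]] * 4 + [[3.0, 3.0]] * 4 + [[0.0, 3.0]] * 4
print(candidate_boundaries(toy_frames))
```

On the toy sequence the similarity across a window drops sharply only where the frame values change, so the candidates fall at the two segment changes. In practice the frames would be real MFCC vectors, and the candidate boundaries would define the same-segment and different-segment pairs fed to the siamese network.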

About the journal

Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher	International Speech Communication Association
ISSN	2308-457X

Authors (1)

Sri Rama Murty Kodukula
- Department of Electrical Engineering
