Phoneme class based feature adaptation for mismatch acoustic modeling and recognition of distant noisy speech
Abstract
Distant speech capture in lecture halls and auditoriums offers unique challenges in algorithm development for automatic speech recognition. In this study, a new adaptation strategy for distant noisy speech is created by the means of phoneme classes. Unlike previous approaches which adapt the acoustic model to the features, the proposed phoneme-class based feature adaptation (PCBFA) strategy adapts the distant data features to the present acoustic model which was previously trained on close microphone speech. The essence of PCBFA is to create a transformation strategy which makes the distributions of phoneme-classes of distant noisy speech similar to those of a close talk microphone acoustic model in a multidimensional MFCC space. To achieve this task, phoneme-classes of distant noisy speech are recognized via artificial neural networks. PCBFA is the adaptation of features rather than adaptation of acoustic models. The main idea behind PCBFA is illustrated via conventional Gaussian mixture model-Hidden Markov model (GMM-HMM) although it can be extended to new structures in automatic speech recognition (ASR). The new adapted features together with the new and improved acoustic models produced by PCBFA are shown to outperform those created only by acoustic model adaptations for ASR and keyword spotting. PCBFA offers a new powerful understanding in acoustic-modeling of distant speech.
Source
International Journal of Speech TechnologyVolume
20Issue
4Collections
- Makale Koleksiyonu [791]
- Scopus İndeksli Yayınlar Koleksiyonu [8325]
- WoS İndeksli Yayınlar Koleksiyonu [7605]