Robust automatic speech recognition employing phoneme-dependent multi-environment enhanced models based linear normalization-Edición Única
Abstract
This work presents a robust normalization technique that cascades a speech enhancement method with a feature vector normalization algorithm. An efficient scheme
for speech enhancement is the Spectral Subtraction (SS) algorithm, which reduces the effect of additive noise by subtracting an estimate of the noise spectrum
from the complete speech spectrum. On the other hand, a new and promising technique
known as PD-MEMLIN (Phoneme-Dependent Multi-Environment Models based LInear
Normalization) has also been shown to be effective. PD-MEMLIN is an empirical feature vector normalization technique that models the clean and noisy spaces with Gaussian Mixture Models
(GMMs) and estimates the compensating linear transformations to be applied to clean the signal. In this work, the integration of both approaches is proposed.
The final design is called PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models based LInear Normalization), which confirms and improves the effectiveness of both approaches. The results show that, for very highly degraded speech
(SNR between -5 dB and 0 dB), PD-MEEMLIN outperforms SS by between 11.4%
and 34.5%, PD-MEMLIN by between 11.7% and 24.84%, and SPLICE
by between 6.04% and 22.23%. Furthermore, at moderate SNR, i.e. 15 or 20
dB, PD-MEEMLIN performs as well as the PD-MEMLIN and SS techniques.
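
The spectral subtraction step described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the signal has already been converted to per-frame magnitude spectra, that the first few frames contain only noise, and it uses an over-subtraction factor `alpha` and a spectral floor `beta` as is common in the literature. The function names and default values are hypothetical.

```python
from typing import List, Sequence


def estimate_noise(frames: Sequence[Sequence[float]], n_noise_frames: int) -> List[float]:
    """Average the first n_noise_frames magnitude spectra.

    Assumption: those leading frames contain noise only (no speech).
    """
    if not 0 < n_noise_frames <= len(frames):
        raise ValueError("n_noise_frames must be between 1 and len(frames)")
    n_bins = len(frames[0])
    return [
        sum(frames[t][k] for t in range(n_noise_frames)) / n_noise_frames
        for k in range(n_bins)
    ]


def spectral_subtraction(
    frames: Sequence[Sequence[float]],
    noise: Sequence[float],
    alpha: float = 1.0,   # over-subtraction factor (assumed value)
    beta: float = 0.01,   # spectral floor as a fraction of the noisy magnitude (assumed value)
) -> List[List[float]]:
    """Subtract alpha * noise from every frame, bin by bin.

    Each result is floored at beta * (noisy magnitude) so that no bin
    becomes negative after the subtraction.
    """
    return [
        [max(mag - alpha * n, beta * mag) for mag, n in zip(frame, noise)]
        for frame in frames
    ]


# Toy example: two noise-only frames followed by one frame with speech energy.
noisy = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [5.0, 1.0, 3.0],
]
noise = estimate_noise(noisy, n_noise_frames=2)
cleaned = spectral_subtraction(noisy, noise)
print(cleaned[2])
```

In this toy example the noise estimate is 1.0 in every bin, so the bins of the third frame that carry speech energy keep their excess over the noise level, while the bin at the noise level is reduced to the spectral floor. In a cascade such as the one proposed here, the enhanced spectra would then be passed on to feature extraction and feature vector normalization.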