Robust automatic speech recognition employing phoneme-dependent multi-environment enhanced models based linear normalization - Single Edition

Hernández Ochoa, Igmar

Date

2006-12-01

Abstract

This work presents a robust normalization technique that cascades a speech enhancement method with a feature vector normalization algorithm. An efficient scheme for speech enhancement is the Spectral Subtraction (SS) algorithm, which reduces the effect of additive noise by subtracting an estimate of the noise spectrum from the complete speech spectrum. A newer and promising technique known as PD-MEMLIN (Phoneme-Dependent Multi-Environment Models based LInear Normalization) has also been shown to be effective. PD-MEMLIN is an empirical feature vector normalization that models the clean and noisy spaces with Gaussian Mixture Models (GMMs) and estimates the compensating linear transformations to be applied to clean the signal. This work proposes the integration of both approaches. The final design is called PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models based LInear Normalization), which confirms and improves the effectiveness of both approaches. The results show that for highly degraded speech (between -5 dB and 0 dB), PD-MEEMLIN outperforms SS by between 11.4% and 34.5%, PD-MEMLIN by between 11.7% and 24.84%, and SPLICE by between 6.04% and 22.23%. Furthermore, at moderate SNR, i.e. 15 or 20 dB, PD-MEEMLIN performs as well as the PD-MEMLIN and SS techniques.
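
The spectral subtraction step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes power spectra are already available as lists of per-bin values, that the noise estimate is a simple average over frames assumed to contain no speech, and that negative results are clamped to a small spectral floor. The function names and the `alpha` and `floor` parameters are hypothetical choices made for illustration.

```python
def estimate_noise(frames):
    """Average the power spectrum over frames assumed to be speech-free."""
    count = len(frames)
    bins = len(frames[0])
    return [sum(frame[k] for frame in frames) / count for k in range(bins)]


def spectral_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract a noise power estimate from one frame of a noisy power spectrum.

    alpha (over-subtraction factor) and floor (spectral floor) are
    hypothetical tuning parameters, not values taken from the thesis.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        diff = y - alpha * n
        # Clamp to a fraction of the noisy power so no bin goes negative.
        cleaned.append(max(diff, floor * y))
    return cleaned


# Toy data: two noise-only frames and one noisy speech frame, three bins each.
noise_frames = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
noise = estimate_noise(noise_frames)
noisy = [10.0, 2.5, 0.5]
enhanced = spectral_subtraction(noisy, noise)
```

In a cascade like the one the abstract describes, the enhanced spectrum would then be converted to feature vectors before any feature normalization is applied; that later stage is not shown here.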

URI

http://hdl.handle.net/11285/567599

Collections

Ciencias Exactas y Ciencias de la Salud 5426

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess