Sparse non-negative decomposition of speech power spectra for formant tracking

Details

Resource 1: BIB_DD27EAF39243.P001.pdf (207.10 KB)
State: Public
Version: author
Serval ID
serval:BIB_DD27EAF39243
Type
Inproceedings: an article in conference proceedings.
Collection
Publications
Title
Sparse non-negative decomposition of speech power spectra for formant tracking
Title of the conference
ICASSP 2011, International Conference on Acoustics, Speech and Signal Processing
Author(s)
Durrieu J.L., Thiran J.P.
Address
Prague, Czech Republic, May 22-27, 2011
ISSN
1520-6149
Publication state
Published
Issued date
2011
Series
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
5260-5263
Language
English
Abstract
Many works on speech processing have dealt with
auto-regressive (AR) models for spectral envelope and
formant frequency estimation, mostly focusing on the
estimation of the AR parameters. However, it is also
useful to estimate the formant frequencies directly, or
equivalently the poles of the AR filter.
To tackle this issue, we propose in this paper to
decompose the signal onto several bases, one for each
formant, taking advantage of recent work on non-negative
matrix factorization (NMF) for the estimation stage,
further refined by sparsity and smoothness penalties. The
results are encouraging, and the proposed system provides
formant tracks which seem robust enough to be used in
different applications, such as phonetic analysis, emotion
detection, or as a visual cue in computer-aided
pronunciation training. The model can also
be extended to deal with multiple-speaker signals.
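For reference, the classical AR route that the abstract contrasts with reads formants off the angles of the AR filter's poles. The sketch below illustrates that baseline only, not the paper's NMF method. It assumes the autocorrelation (Levinson-Durbin) method for the AR coefficients, and it uses heuristic pole-pruning thresholds (`min_freq`, `max_bw`) that are assumptions, not values from the paper. The function names are hypothetical.

```python
import numpy as np


def lpc_autocorr(frame: np.ndarray, order: int) -> np.ndarray:
    """AR (LPC) coefficients a = [1, a_1, ..., a_p] of A(z) = 1 + sum_j a_j z^-j,
    estimated with the autocorrelation method and the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for model order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # prediction error power shrinks at each order
    return a


def formants_from_ar(a: np.ndarray, fs: float, min_freq: float = 90.0,
                     max_bw: float = 400.0):
    """Formant frequencies and bandwidths (Hz) from the poles of 1 / A(z).
    min_freq and max_bw are assumed heuristics for discarding non-formant poles."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]            # one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)   # pole angle -> frequency (Hz)
    bws = -np.log(np.abs(poles)) * fs / np.pi    # pole radius -> 3 dB bandwidth (Hz)
    keep = (freqs > min_freq) & (bws < max_bw)
    order = np.argsort(freqs[keep])
    return freqs[keep][order], bws[keep][order]
```

Applied frame by frame, this produces one set of formant candidates per frame. Linking them into tracks then needs extra logic, and that is part of what a direct, per-formant decomposition aims to avoid.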
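This record does not give the paper's cost function, dictionaries or penalty weights, so the following is only a rough illustration of "one basis per formant" with a sparsity penalty. It makes several assumptions. The power spectrogram is modeled additively as V ≈ W H, with a squared-Euclidean cost and an L1 penalty on H. H is estimated with the Lee-Seung-style multiplicative update, where the penalty weight enters the denominator. W is a fixed, hypothetical dictionary of Lorentzian peaks placed on per-formant frequency grids. The smoothness term mentioned in the abstract is omitted. The paper's actual model may differ in the divergence, in how the per-formant bases combine, and in its penalties. All function names are hypothetical.

```python
import numpy as np


def build_dictionary(grids, freq_axis, bandwidth):
    """Stack one block of hypothetical Lorentzian resonance atoms per formant.

    grids: list of 1-D arrays of candidate centre frequencies (Hz), one per formant.
    Returns W (n_bins x n_atoms), the centre frequency of every atom, and the
    column indices belonging to each formant block."""
    centers = np.concatenate(grids)
    half = bandwidth / 2.0  # half width at half maximum (Hz)
    W = 1.0 / (1.0 + ((freq_axis[:, None] - centers[None, :]) / half) ** 2)
    edges = np.cumsum([0] + [len(g) for g in grids])
    blocks = [np.arange(edges[k], edges[k + 1]) for k in range(len(grids))]
    return W, centers, blocks


def sparse_activations(V, W, lam=0.01, n_iter=2000, eps=1e-12, seed=0):
    """Non-negative H approximately minimising
    0.5 * ||V - W H||_F^2 + lam * sum(H), with W held fixed.

    Multiplicative updates keep H >= 0; lam in the denominator shrinks
    weakly supported atoms, which promotes sparse activations."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV, WtW = W.T @ V, W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + lam + eps)
    return H


def track_formants(H, blocks, centers):
    """For each formant block, the centre frequency of its most active atom in
    every frame; the result has shape (n_formants, n_frames)."""
    return np.stack([centers[idx][np.argmax(H[idx], axis=0)] for idx in blocks])
```

For example, with `freq_axis = np.linspace(0, 4000, 513)` (an assumed 8 kHz sampling rate) and three grids covering assumed F1, F2 and F3 ranges, a power spectrogram `V` of shape `(513, n_frames)` yields three frequency tracks. A temporal smoothness penalty on `H`, as the abstract mentions, would be the natural addition to stabilize them.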
Web of Science
Create date
07/01/2014 9:11
Last modification date
20/08/2019 17:01