Sparse non-negative decomposition of speech power spectra for formant tracking

Details

Resource 1 Download: BIB_DD27EAF39243.P001.pdf (207.10 KB)
Status: Public
Version: Author's version
Serval ID
serval:BIB_DD27EAF39243
Type
Conference proceedings (part): original contribution to the scientific literature, published in connection with a scientific conference, in a proceedings volume, or in a special issue of a recognized journal (conference proceedings).
Collection
Publications
Title
Sparse non-negative decomposition of speech power spectra for formant tracking
Conference title
ICASSP 2011, International Conference on Acoustics, Speech and Signal Processing
Authors
Durrieu J.L., Thiran J.P.
Address
Prague, Czech Republic, May 22-27, 2011
ISBN
1520-6149
Publication status
Published
Publication date
2011
Series
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
5260-5263
Language
English
Abstract
Many works on speech processing have dealt with
auto-regressive (AR) models for spectral envelope and
formant frequency estimation, mostly focusing on the
estimation of the AR parameters. However, it is also
interesting to be able to directly estimate the formant
frequencies, or equivalently the poles of the AR filter.
To tackle this issue, we propose in this paper to
decompose the signal onto several bases, one for each
formant, taking advantage of recent works on nonnegative
matrix factorization (NMF) for the estimation stage,
further refined by sparsity and smoothness penalties. The
results are encouraging, and the proposed system provides
formant tracks which seem robust enough to be used in
different applications such as phonetic analysis, emotion
detection or as a visual cue for computer-aided
pronunciation training applications. The model can also
be extended to deal with multiple-speaker signals.
Record created
07/01/2014 8:11
Record last modified
20/08/2019 16:01