Sparse non-negative decomposition of speech power spectra for formant tracking

Durrieu, J.L.; Thiran, J.P.

doi:10.1109/ICASSP.2011.5947544

Details

Download: BIB_DD27EAF39243.P001.pdf (207.10 KB)
State: Public
Version: author

Serval ID

serval:BIB_DD27EAF39243

Type

Inproceedings: an article in a conference proceedings.

Collection

Publications

Institution

External production

Title

Sparse non-negative decomposition of speech power spectra for formant tracking

Title of the conference

ICASSP 2011, International Conference on Acoustics, Speech and Signal Processing

Author(s)

Durrieu J.L., Thiran J.P.

Address

Prague, Czech Republic, May 22-27, 2011

ISSN

1520-6149

Publication state

Published

Issued date

2011

Series

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing

Pages

5260-5263

Language

English

Abstract

Many works on speech processing have dealt with auto-regressive (AR) models for spectral envelope and formant frequency estimation, mostly focusing on the estimation of the AR parameters. However, it is also interesting to be able to directly estimate the formant frequencies, or equivalently the poles of the AR filter. To tackle this issue, we propose in this paper to decompose the signal onto several bases, one for each formant, taking advantage of recent works on nonnegative matrix factorization (NMF) for the estimation stage, further refined by sparsity and smoothness penalties. The results are encouraging, and the proposed system provides formant tracks which seem robust enough to be used in different applications such as phonetic analysis, emotion detection, or as a visual cue in computer-aided pronunciation training applications. The model can also be extended to deal with multiple-speaker signals.
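
To illustrate the equivalence between formant frequencies and AR filter poles mentioned in the abstract, the following minimal Python sketch converts AR coefficients into formant frequency and bandwidth estimates. It is not the NMF-based method of the paper. The function names `poles_to_formants` and `resonator_coeffs`, the 16 kHz sampling rate, and the two synthetic resonances are assumptions chosen only for this example.

```python
import numpy as np

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients [1, a1, a2] of a second-order AR section A(z)
    whose conjugate pole pair sits at the given frequency and bandwidth.
    (Hypothetical helper, used only to build a test filter.)"""
    r = np.exp(-np.pi * bw_hz / fs)       # pole radius from the bandwidth
    theta = 2 * np.pi * freq_hz / fs      # pole angle from the frequency
    return np.array([1.0, -2 * r * np.cos(theta), r ** 2])

def poles_to_formants(ar_coeffs, fs):
    """Map AR coefficients [1, a1, ..., ap] of A(z) to formant estimates.
    Returns (frequencies in Hz, bandwidths in Hz), sorted by frequency."""
    poles = np.roots(ar_coeffs)           # roots of A(z) are the poles of 1/A(z)
    poles = poles[np.imag(poles) > 0]     # keep one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bws = -np.log(np.abs(poles)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bws[order]

# Assumed example: two resonances cascaded into a 4th-order AR filter.
fs = 16000
a = np.convolve(resonator_coeffs(500, 80, fs),
                resonator_coeffs(1500, 120, fs))
freqs, bws = poles_to_formants(a, fs)
print(freqs, bws)
```

Each complex-conjugate pole pair corresponds to one resonance: the pole angle gives the center frequency and the pole radius gives the bandwidth, which is why estimating the poles directly amounts to estimating the formants.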

DOI

10.1109/ICASSP.2011.5947544

Web of science

000296062405217

Create date

07/01/2014 8:11

Last modification date

20/08/2019 16:01
