Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.
Details
Serval ID
serval:BIB_1DA76AB4F142
Type
Article: article from journal or magazine.
Collection
Publications
Institution
Title
Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.
Journal
Statistics and computing
ISSN
0960-3174 (Print)
ISSN-L
0960-3174
Publication state
Published
Issued date
01/01/2012
Peer-reviewed
Yes
Volume
22
Number
1
Pages
33-52
Language
English
Notes
Publication types: Journal Article
Publication Status: ppublish
Abstract
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
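The density construction described in the abstract can be illustrated with a minimal sketch. It assumes the standard Box-Cox form for positive data and a univariate component, whereas the article works with multivariate t distributions and may use a different variant of the transformation. All function names and parameters here (`box_cox`, `t_logpdf`, `boxcox_t_logpdf`, `mixture_logpdf`, `lam`, `mu`, `sigma`, `nu`) are hypothetical and not taken from the article, and the sketch does not cover the Expectation-Maximization estimation step.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform for y > 0 (an assumed form, not necessarily the article's)."""
    if y <= 0:
        raise ValueError("the standard Box-Cox transform requires y > 0")
    if abs(lam) < 1e-12:
        return math.log(y)  # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


def t_logpdf(x, mu, sigma, nu):
    """Log-density of a location-scale Student t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1.0) / 2.0 * math.log1p(z * z / nu)
    )


def boxcox_t_logpdf(y, lam, mu, sigma, nu):
    """Log-density of y on the original scale when box_cox(y, lam) is t distributed.

    The term (lam - 1) * log(y) is the log-Jacobian of the transform.
    Simplification: the bounded range of the transformed variable is ignored.
    """
    return t_logpdf(box_cox(y, lam), mu, sigma, nu) + (lam - 1.0) * math.log(y)


def mixture_logpdf(y, weights, components):
    """Log-density of a finite mixture; each component is a (lam, mu, sigma, nu) tuple."""
    terms = [
        math.log(w) + boxcox_t_logpdf(y, *comp)
        for w, comp in zip(weights, components)
    ]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

In this sketch a small `nu` gives heavy tails that absorb outliers, while `lam` away from 1 makes the component skewed on the original scale, which is the combination the abstract describes.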
PubMed
Web of Science
Create date
28/02/2022 11:45
Last modification date
23/03/2024 7:24