Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings

Details

Resource 1 Download: BIB_5A2CBDB06CA2.P001.pdf (1815.32 KB)
Status: Public
Version: author's version
Serval ID
serval:BIB_5A2CBDB06CA2
Type
Conference proceedings (part): original contribution to the scientific literature, published on the occasion of scientific conferences, in a proceedings volume, or in a special issue of a recognized journal (conference proceedings).
Collection
Publications
Title
Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings
Conference title
Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Author(s)
Cocco C.
Publisher
Association for Computational Linguistics
Organization
Université d'Avignon
Address
Stroudsburg
ISBN
978-1-937284-19-0
Publication status
Published
Publication date
04/2012
Peer-reviewed
Yes
Pages
55-63
Language
English
Notes
Online conference proceedings
Abstract
To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For the high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.
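The pipeline named in the abstract (POS n-gram profiles, chi-squared distances between clauses, then a power transformation) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the toy POS tags, the exponent `q`, and the choice to apply the power to the squared chi-squared distance are all assumptions made for illustration, and the clustering step itself (K-means or fuzzy clustering) is left out.

```python
from collections import Counter
from itertools import chain


def ngram_profile(tags, n):
    """Count the POS n-grams of one clause, given its list of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def chi2_distances(profiles):
    """Squared chi-squared distances between clauses (rows of a count table).

    Assumes every clause contains at least one n-gram, so no row is empty.
    """
    features = sorted(set(chain.from_iterable(profiles)))
    total = sum(sum(p.values()) for p in profiles)
    # Column masses: overall relative frequency of each n-gram.
    mass = {f: sum(p[f] for p in profiles) / total for f in features}
    # Row profiles: relative frequency of each n-gram within one clause.
    rows = []
    for p in profiles:
        size = sum(p.values())
        rows.append({f: p[f] / size for f in features})
    n = len(rows)
    return [[sum((rows[i][f] - rows[j][f]) ** 2 / mass[f] for f in features)
             for j in range(n)] for i in range(n)]


def power_transform(dist, q):
    """Raise every distance to the power q (hypothetical exponent)."""
    return [[d ** q for d in row] for row in dist]


# Toy clauses with made-up POS tags; real input would come from a tagger.
clauses = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB", "ADV"],
    ["PRON", "VERB", "DET", "NOUN"],
]
profiles = [ngram_profile(c, 1) for c in clauses]  # unigram profiles
D = chi2_distances(profiles)
D_q = power_transform(D, 0.5)
```

The transformed matrix `D_q` could then be handed to any clustering routine that works from pairwise dissimilarities. Switching `n` to 2 or 3 gives bigram or trigram profiles, whose feature space grows quickly, which is consistent with the sparsity concern raised in the abstract.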
Keywords
Discourse types, K-means, high-dimensional embeddings, fuzzy clustering
Record created
22/08/2012 14:18
Record last modified
20/08/2019 15:13