Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings

Cocco, C.

Details

Download: BIB_5A2CBDB06CA2.P001.pdf (1815.32 KB)
Status: Public
Version: Author's version

Serval ID

serval:BIB_5A2CBDB06CA2

Type

Conference proceedings (part): original contribution to the scientific literature, published on the occasion of scientific conferences, in a proceedings volume, or in a special issue of a recognized journal.

Collection

Publications

Institution

UNIL/CHUV

Title

Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings

Conference title

Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics

Author(s)

Cocco C.

Publisher

Association for Computational Linguistics

Organization

Université d'Avignon

Address

Stroudsburg

ISBN

978-1-937284-19-0

Editorial status

Published

Publication date

04/2012

Peer-reviewed

Yes

Pages

55-63

Language

English

Notes

Online conference proceedings

Abstract

To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For the high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.
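
As an illustration only, the following minimal Python sketch shows how POS n-gram profiles of clauses, chi-squared dissimilarities between them, and a power transformation could be computed. It is not the author's implementation. The chi-squared form used here is the usual correspondence-analysis one (row profiles weighted by inverse column mass), the exponent `q` is a free parameter, and the function names and POS tags are hypothetical; the paper's exact definitions may differ. The embedding and clustering steps themselves are not shown.

```python
from collections import Counter


def ngram_profile(tags, n):
    """Count the POS n-grams in one clause, given as a list of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def chi2_distances(profiles):
    """Chi-squared dissimilarities between clause profiles.

    Assumed form: D_ij = sum_k (p_ik - p_jk)^2 / rho_k, where p_ik is the
    relative frequency of n-gram k in clause i and rho_k is the overall
    relative frequency of n-gram k. Each profile must be non-empty.
    """
    features = sorted(set().union(*profiles))
    total = sum(sum(p.values()) for p in profiles)
    col_weight = {f: sum(p[f] for p in profiles) / total for f in features}
    rows = []
    for p in profiles:
        size = sum(p.values())
        rows.append({f: p[f] / size for f in features})
    n = len(rows)
    return [
        [
            sum((rows[i][f] - rows[j][f]) ** 2 / col_weight[f] for f in features)
            for j in range(n)
        ]
        for i in range(n)
    ]


def power_transform(dist, q):
    """Raise every dissimilarity to the power q (q is an assumed parameter)."""
    return [[d ** q for d in row] for row in dist]


# Hypothetical POS-tagged clauses; the tag names are illustrative only.
clauses = [
    ["DET", "NOM", "VER"],
    ["DET", "NOM", "VER"],
    ["PRO", "VER", "ADV"],
]
profiles = [ngram_profile(c, 1) for c in clauses]
dist = chi2_distances(profiles)
transformed = power_transform(dist, 0.5)
```

With bi- or trigrams (`n = 2` or `n = 3`), each short clause contains only a few of the many possible n-grams, which gives a concrete picture of the feature space sparsity mentioned in the abstract.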

Keywords

Discourse types, K-means, high-dimensional embeddings, fuzzy clustering

URN

urn:nbn:ch:serval-BIB_5A2CBDB06CA20

OAI-PMH

oai:serval.unil.ch:BIB_5A2CBDB06CA2

Publisher's website

http://aclweb.org/anthology-new/E/E12/E12-3.pdf

Record created

22/08/2012 14:18

Record last modified

20/08/2019 15:13
