Collecting and de-identifying half a million WhatsApp messages

Gupta, Prakhar; Doudot, Lliana; Loup, Romain; Xanthos, Aris

Details

Serval ID

serval:BIB_2B279EDD8929

Type

Conference proceedings (part): original contribution to the scientific literature, published on the occasion of scientific conferences, in a proceedings volume or in a special issue of a recognized journal (conference proceedings).

Collection

Publications

Institution

UNIL/CHUV

Title

Collecting and de-identifying half a million WhatsApp messages

Conference title

Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora 2023), 14–15 September 2023, University of Mannheim, Germany

Authors

Gupta Prakhar, Doudot Lliana, Loup Romain, Xanthos Aris

Publisher

Leibniz-Institut für Deutsche Sprache (IDS)

Publication status

Published

Publication date

07/09/2023

Peer-reviewed

Yes

Language

English

Notes

https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/12095/file/CMC_Corpora_2023_Proceedings_2023.pdf

Abstract

Instant messaging (IM) applications, especially WhatsApp, have become ubiquitous in contemporary computer-mediated communication practices. IM data have the potential to constitute a rich source of research material for corpus linguistics and cultural analytics, owing to their similarities with face-to-face conversations as well as their private nature. In this work, we outline the creation process of a large curated dataset of WhatsApp messages in French. The paper covers the protocol for collecting these messages as well as the de-identification process for removing sensitive information liable to identify the users in these messages. The de-identified dataset will ultimately be made available to researchers on request.

Keywords

WhatsApp, chats, instant messaging, IM, de-identification, corpus, French

OAI-PMH

oai:serval.unil.ch:BIB_2B279EDD8929

Open Access

Yes

Record created

19/09/2023 10:18

Record last modified

20/09/2023 5:55

SERVAL

Lausanne academic server