Collecting and de-identifying half a million WhatsApp messages

Gupta, Prakhar; Doudot, Lliana; Loup, Romain; Xanthos, Aris

Serval ID

serval:BIB_2B279EDD8929

Type

Inproceedings: an article in a conference proceedings.

Collection

Publications

Institution

UNIL/CHUV

Title

Collecting and de-identifying half a million WhatsApp messages

Title of the conference

Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora 2023), 14–15 September 2023, University of Mannheim, Germany

Author(s)

Gupta Prakhar, Doudot Lliana, Loup Romain, Xanthos Aris

Publisher

Leibniz-Institut für Deutsche Sprache (IDS)

Publication state

Published

Issued date

07/09/2023

Peer-reviewed

Yes

Language

English

Notes

https://ids-pub.bsz-bw.de/frontdoor/deliver/index/docId/12095/file/CMC_Corpora_2023_Proceedings_2023.pdf

Abstract

Instant messaging (IM) applications, especially WhatsApp, have become ubiquitous in contemporary computer-mediated communication practices. IM data have the potential to constitute a rich source of research material for corpus linguistics and cultural analytics, owing to their similarities with face-to-face conversations as well as their private nature. In this work, we outline the creation process of a large curated dataset of WhatsApp messages in French. The paper covers the protocol for collecting these messages as well as the de-identification process for removing sensitive information liable to identify the users in these messages. The de-identified dataset will ultimately be made available to researchers on request.

Keywords

WhatsApp, chats, instant messaging, IM, de-identification, corpus, French

OAI-PMH

oai:serval.unil.ch:BIB_2B279EDD8929

Open Access

Yes

Create date

19/09/2023 10:18

Last modification date

20/09/2023 5:55

SERVAL

Lausanne academic server
