Accelerating Clinical Text Annotation in Underrepresented Languages: A Case Study on Text De-Identification
Details
Download: 39176927.pdf (189.18 [KB])
State: Public
Version: Final published version
License: CC BY-NC 4.0
Serval ID
serval:BIB_9D47A8BCBA47
Type
A part of a book
Publication sub-type
Chapter: chapter or part
Collection
Publications
Institution
Title
Accelerating Clinical Text Annotation in Underrepresented Languages: A Case Study on Text De-Identification
Title of the book
Digital Health and Informatics Innovations for Sustainable Health Care Systems
Publisher
IOS Press
ISBN
9781643685335
ISSN
0926-9630
1879-8365
ISSN-L
0926-9630
Publication state
Published
Issued date
22/08/2024
Peer-reviewed
Yes
Volume
316
Series
Studies in Health Technology and Informatics
Pages
853-857
Language
English
Abstract
Clinical notes contain valuable information for research and for monitoring the quality of care. Named Entity Recognition (NER) is the process of identifying relevant pieces of information, such as diagnoses, treatments, and side effects, and bringing them into a more structured form. Although recent advances in deep learning have facilitated automated recognition, particularly in English, NER can still be challenging due to limited specialized training data. This is exacerbated in hospital settings, where annotations are costly to obtain without appropriate incentives and often depend on local specificities. In this work, we study whether this annotation process can be effectively accelerated by combining two practical strategies. First, we convert a usually passive annotation task into a proactive contest to motivate human annotators to perform a task often considered tedious and time-consuming. Second, we provide pre-annotations to the participants to evaluate how the recall and precision of the pre-annotations can boost or deteriorate annotation performance. We applied both strategies to a text de-identification task on French clinical notes and discharge summaries at a large Swiss university hospital. Our results show that a proactive contest and pre-annotations of average quality can significantly speed up annotation and increase annotation quality, enabling us to develop a text de-identification model for French clinical notes with high performance (F1 score 0.94).
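The abstract reports results as an F1 score and refers to the recall and precision of pre-annotations. As a minimal sketch, assuming exact matching of character offsets and entity labels (this record does not state the matching criterion used in the chapter), entity-level scores for a de-identification task could be computed as below. The span format and the labels NAME, DATE and LOCATION are hypothetical illustrations.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_offset, end_offset, label).
Span = Tuple[int, int, str]


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level scores under an assumed exact-match criterion:
    a predicted span counts as correct only if its offsets and label
    all match a gold span."""
    true_positives = len(predicted & gold)
    # Precision: share of predicted spans that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold spans that were found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 5, "NAME"), (10, 20, "DATE"), (25, 30, "LOCATION")}
pre_annotations = {(0, 5, "NAME"), (10, 20, "DATE"), (40, 45, "NAME")}
print(precision_recall_f1(pre_annotations, gold))
```

With three gold spans and three pre-annotated spans of which two match exactly, precision, recall and F1 are all 2/3. Lenient, overlap-based matching is a common alternative and would generally yield different numbers.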
Pubmed
Open Access
Yes
Create date
30/08/2024 9:45
Last modification date
05/09/2024 9:10