Procode: A Machine-Learning Tool to Support (Re-)coding of Free-Texts of Occupations and Industries.

Details

Resource 1 Download: Savic_PROCODE_Pre-print.pdf (189.97 KB)
Status: Public
Version: Author's version
License: Not specified
Serval ID
serval:BIB_24E830FACC56
Type
Article: article in a journal or magazine.
Collection
Publications
Institution
Title
Procode: A Machine-Learning Tool to Support (Re-)coding of Free-Texts of Occupations and Industries.
Journal
Annals of Work Exposures and Health
Authors
Savic N., Bovio N., Gilbert F., Paz J., Guseva Canu I.
ISSN
2398-7316 (Electronic)
ISSN-L
2398-7308
Editorial status
Published
Publication date
07/01/2022
Peer-reviewed
Yes
Volume
66
Issue
1
Pages
113-118
Language
English
Notes
Publication types: Journal Article ; Research Support, Non-U.S. Gov't
Publication Status: ppublish
Abstract
Procode is a free web tool that automatically codes occupational data (free-texts) using Complement Naïve Bayes (CNB), a machine-learning technique. The paper describes the algorithm, its performance evaluation, and future goals for the tool's development. Almost 30,000 free-texts with manually assigned codes from the French classification of occupations (PCS) and the French classification of activities (NAF) were used to train CNB. A 5-fold cross-validation found that Procode predicts the correct classification code in 57-81% of cases for PCS and 63-83% of cases for NAF. Procode also integrates recoding between two classifications. In the first version of Procode, however, this operation is only a simple search for recoding links in existing crosswalks. Future work will focus on collecting data to support automatic coding to other classifications and on establishing a more advanced recoding method.
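The approach summarised in the abstract (a Complement Naïve Bayes classifier trained on coded free-texts and evaluated by k-fold cross-validation) can be illustrated with a minimal pure-Python sketch. This is not Procode's implementation: the whitespace tokenizer, the smoothing value, the function names, the toy job titles, and the placeholder codes HEALTH, TEACH and DRIVE are all assumptions made for illustration and are not real PCS or NAF codes.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_cnb(texts, codes, alpha=1.0):
    """Fit Complement Naive Bayes weights from free-texts and their codes."""
    per_class = defaultdict(Counter)  # token counts for each code
    total = Counter()                 # token counts over all texts
    for text, code in zip(texts, codes):
        tokens = tokenize(text)
        per_class[code].update(tokens)
        total.update(tokens)
    vocab = sorted(total)
    grand_total = sum(total.values())
    weights = {}
    for code, counts in per_class.items():
        # Complement statistics: counts from every class except `code`.
        comp_total = grand_total - sum(counts.values())
        denom = comp_total + alpha * len(vocab)
        weights[code] = {
            tok: math.log((total[tok] - counts[tok] + alpha) / denom)
            for tok in vocab
        }
    return weights


def predict_cnb(weights, text):
    """Return the code whose complement is the poorest fit for the text."""
    tokens = tokenize(text)

    def score(code):
        w = weights[code]
        # Tokens never seen in training are ignored.
        return sum(w[t] for t in tokens if t in w)

    return min(sorted(weights), key=score)


def cross_val_accuracy(texts, codes, k=5):
    """Share of correct predictions over k interleaved folds."""
    n = len(texts)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_texts = [t for i, t in enumerate(texts) if i not in test_idx]
        train_codes = [c for i, c in enumerate(codes) if i not in test_idx]
        fold_weights = train_cnb(train_texts, train_codes)
        correct += sum(
            predict_cnb(fold_weights, texts[i]) == codes[i] for i in test_idx
        )
    return correct / n


# Toy data with hypothetical placeholder codes (assumption, not PCS/NAF).
TEXTS = ["hospital nurse", "night nurse", "school teacher",
         "primary school teacher", "truck driver", "delivery driver"]
CODES = ["HEALTH", "HEALTH", "TEACH", "TEACH", "DRIVE", "DRIVE"]

model = train_cnb(TEXTS, CODES)
print(predict_cnb(model, "nurse in hospital"))      # HEALTH
print(predict_cnb(model, "delivery truck driver"))  # DRIVE
print(cross_val_accuracy(TEXTS, CODES, k=3))
```

Scoring each code against the texts of all other codes, rather than its own, is what distinguishes the complement variant from standard multinomial Naïve Bayes; it tends to be more stable when some codes have very few training examples. With only six toy texts the cross-validated accuracy is not meaningful and is printed only to show the mechanics.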
Keywords
Bayes Theorem, Humans, Industry, Machine Learning, Occupational Exposure, Occupations, Naïve Bayes, cross-validation, epidemiology, machine learning, occupational classifications
Record created
29/06/2021 9:00
Record last modified
22/10/2022 6:08