Evaluating Synthetic Data Augmentation to Correct for Data Imbalance in Realistic Clinical Prediction Settings
Details
Download: 39176944.pdf (324.60 KB)
Status: Public
Version: Final published version
License: CC BY-NC 4.0
Serval ID
serval:BIB_F4F5679ACFD4
Type
Book part
Subtype
Chapter: chapter or section
Collection
Publications
Institution
Title
Evaluating Synthetic Data Augmentation to Correct for Data Imbalance in Realistic Clinical Prediction Settings
Book title
Digital Health and Informatics Innovations for Sustainable Health Care Systems
Publisher
IOS Press
ISBN
9781643685335
ISSN
0926-9630
1879-8365
ISSN-L
0926-9630
Editorial status
Published
Publication date
22/08/2024
Peer-reviewed
Yes
Volume
316
Series
Studies in Health Technology and Informatics
Pages
929-933
Language
English
Abstract
Predictive modeling holds great potential for clinical decision-making, yet its effectiveness can be hindered by the data imbalances inherent in clinical datasets. This study investigates the utility of synthetic data for improving the performance of predictive modeling on realistic, small, imbalanced clinical datasets. We compared various synthetic data generation methods, including Generative Adversarial Networks, Normalizing Flows, and Variational Autoencoders, to the standard baselines for correcting class underrepresentation on four clinical datasets. Although results show improvement in F1 scores in some cases, even over multiple repetitions, we do not obtain statistically significant evidence that synthetic data generation outperforms standard baselines for correcting class imbalance. This study challenges common beliefs about the efficacy of synthetic data for data augmentation and highlights the importance of evaluating new complex methods against simple baselines.
Pubmed
Open Access
Yes
Record created
30/08/2024 10:16
Record last modified
05/09/2024 9:14