Enhancing perinatal health patient information through ChatGPT - An accuracy study.
Details
Serval ID
serval:BIB_EDB8EA87D48F
Type
Article: article from journal or magazine.
Collection
Publications
Institution
Title
Enhancing perinatal health patient information through ChatGPT - An accuracy study.
Journal
PEC innovation
ISSN
2772-6282 (Electronic)
ISSN-L
2772-6282
Publication state
Published
Issued date
06/2025
Peer-reviewed
Yes
Volume
6
Pages
100381
Language
English
Notes
Publication types: Journal Article
Publication Status: epublish
Abstract
To evaluate ChatGPT's accuracy as an information source for women and maternity-care workers on "nutrition" and "red flags" in pregnancy.
Accuracy of ChatGPT-generated recommendations was assessed by a 5-point Likert scale by eight raters for ten indicators per topic in four languages (French, English, German and Dutch). Accuracy and interrater agreement were calculated per topic and language.
For both topics, median accuracy scores of ChatGPT-generated recommendations were excellent (5.0; IQR 4-5) regardless of language. Median accuracy scores varied by at most 1 point on the 5-point Likert scale depending on how the question was framed. Overall accuracy scores were 83-89 % for 'nutrition in pregnancy' versus 96-98 % for 'red flags in pregnancy'. Inter-rater agreement was good to excellent for both topics.
Although ChatGPT generated accurate recommendations regarding the tested indicators for nutrition and red flags during pregnancy, women should be aware of ChatGPT's limitations such as inconsistencies according to formulation, language and the woman's personal context.
Despite a growing interest in the potential use of artificial intelligence in healthcare, this is, to the best of our knowledge, the first study assessing potential limitations that may impact accuracy of ChatGPT-generated recommendations such as language and question-framing in key domains of perinatal health.
Keywords
Artificial intelligence, ChatGPT, Nutrition, Patient information, Perinatal health information, Pregnancy, Warning signs
Pubmed
Open Access
Yes
Create date
07/03/2025 16:45
Last modification date
08/03/2025 7:21