Mind the gap: Performance metric evaluation in brain-age prediction.

de Lange, A.G.; Anatürk, M.; Rokicki, J.; Han, L.K.M.; Franke, K.; Alnaes, D.; Ebmeier, K.P.; Draganski, B.; Kaufmann, T.; Westlye, L.T.; Hahn, T.; Cole, J.H.

doi:10.1002/hbm.25837

Details

State: Public
Version: Final published version
License: CC BY 4.0

Serval ID

serval:BIB_D714EB202852

Type

Article: article from journal or magazine.

Collection

Publications

Institution

UNIL/CHUV

Title

Mind the gap: Performance metric evaluation in brain-age prediction.

Journal

Human brain mapping

Author(s)

de Lange A.G., Anatürk M., Rokicki J., Han L.K.M., Franke K., Alnaes D., Ebmeier K.P., Draganski B., Kaufmann T., Westlye L.T., Hahn T., Cole J.H.

ISSN

1097-0193 (Electronic)

ISSN-L

1065-9471

Publication state

Published

Issued date

07/2022

Peer-reviewed

Yes

Volume

Number

Pages

3113-3129

Language

English

Notes

Publication types: Journal Article ; Research Support, Non-U.S. Gov't
Publication Status: ppublish

Abstract

Estimating age based on neuroimaging-derived data has become a popular approach to developing markers for brain integrity and health. While a variety of machine-learning algorithms can provide accurate predictions of age based on brain characteristics, there is significant variation in model accuracy reported across studies. We predicted age in two population-based datasets, and assessed the effects of age range, sample size and age-bias correction on the model performance metrics Pearson's correlation coefficient (r), the coefficient of determination (R²), Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The results showed that these metrics vary considerably depending on cohort age range; r and R² values are lower when measured in samples with a narrower age range. RMSE and MAE are also lower in samples with a narrower age range due to smaller errors/brain age delta values when predictions are closer to the mean age of the group. Across subsets with different age ranges, performance metrics improve with increasing sample size. Performance metrics further vary depending on prediction variance as well as mean age difference between training and test sets, and age-bias corrected metrics indicate high accuracy, also for models showing poor initial performance. In conclusion, performance metrics used for evaluating age prediction models depend on cohort and study-specific data characteristics, and cannot be directly compared across different studies. Since age-bias corrected metrics generally indicate high accuracy, even for poorly performing models, inspection of uncorrected model results provides important information about underlying model attributes such as prediction variance.
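The four metrics named in the abstract, and the age-range and age-bias effects it reports, can be illustrated with a small simulation. The following Python sketch is not the authors' code. The data are synthetic, and the model is a hypothetical one whose predictions shrink toward a fixed centre age. R² is computed as 1 − SS_res/SS_tot (an assumption, since some studies report the squared correlation instead). The age-bias correction shown is assumed for illustration: it fits predicted age = slope × age + intercept and then adds back age − (slope × age + intercept), which is one commonly used procedure. For brevity, the correction is fitted on the same sample it is applied to. In practice it would usually be estimated on training data.

```python
import math
import random


def metrics(age, pred):
    """Pearson's r, R^2, RMSE and MAE for chronological vs. predicted age."""
    n = len(age)
    mean_a = sum(age) / n
    mean_p = sum(pred) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(age, pred))
    ss_tot = sum((a - mean_a) ** 2 for a in age)      # spread of true ages
    ss_pred = sum((p - mean_p) ** 2 for p in pred)    # spread of predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(age, pred))
    return {
        "r": cov / math.sqrt(ss_tot * ss_pred),
        # Assumption: R^2 as coefficient of determination; it can be negative
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(a - p) for a, p in zip(age, pred)) / n,
    }


def simulate(n, lo, hi, slope, noise_sd, seed, center=50.0):
    """Hypothetical model: predictions shrink toward `center` plus Gaussian noise."""
    rng = random.Random(seed)
    age = [rng.uniform(lo, hi) for _ in range(n)]
    pred = [center + slope * (a - center) + rng.gauss(0.0, noise_sd) for a in age]
    return age, pred


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return slope, my - slope * mx


def bias_correct(age, pred, slope, intercept):
    """Assumed correction: pred + (age - (slope * age + intercept))."""
    return [p + (a - (slope * a + intercept)) for a, p in zip(age, pred)]


if __name__ == "__main__":
    # Same hypothetical model evaluated on a wide (20-80) and a narrow (45-55) age range.
    for lo, hi in [(20, 80), (45, 55)]:
        age, pred = simulate(1000, lo, hi, slope=0.7, noise_sd=5.0, seed=1)
        print(f"ages {lo}-{hi}:", {k: round(v, 2) for k, v in metrics(age, pred).items()})

    # A weak model (slope 0.1) evaluated before and after the assumed age-bias correction.
    age, pred = simulate(1000, 20, 80, slope=0.1, noise_sd=5.0, seed=2)
    slope, intercept = fit_line(age, pred)
    corrected = bias_correct(age, pred, slope, intercept)
    print("raw r:", round(metrics(age, pred)["r"], 2))
    print("corrected r:", round(metrics(age, corrected)["r"], 2))
```

Under these assumptions, the narrow-range sample gives a lower r and a lower (here negative) R² than the wide-range sample, but also a lower RMSE and MAE. The weak model's corrected predictions correlate far more strongly with age than its raw predictions. This matches the pattern the abstract describes: because the in-sample correction adds chronological age back into the prediction, a high corrected r says little about the underlying model.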

Keywords

Algorithms, Brain/diagnostic imaging, Cohort Studies, Humans, Machine Learning, brain-age prediction, machine learning, neuroimaging, statistics

URN

urn:nbn:ch:serval-BIB_D714EB2028527

OAI-PMH

oai:serval.unil.ch:BIB_D714EB202852

DOI

10.1002/hbm.25837

Pubmed

35312210

Web of science

000770936900001