Effects of simulated observation errors on the performance of species distribution models

Details

Resource 1 Download: Fernandes_et_al-2019-Diversity_and_Distributions.pdf (1281.19 KB)
State: Public
Version: Final published version
Serval ID
serval:BIB_B4715DCA6623
Type
Article: article from journal or magazine.
Collection
Publications
Title
Effects of simulated observation errors on the performance of species distribution models
Journal
Diversity and Distributions
Author(s)
Fernandes R.F., Scherrer D., Guisan A.
ISSN
1472-4642
ISSN-L
1366-9516
Publication state
Published
Issued date
2019
Peer-reviewed
Yes
Volume
25
Number
3
Pages
400–413
Language
English
Abstract
Aim: Species distribution information is essential under increasing global changes, and models can be used to acquire such information, but they can be affected by different errors/biases. Here, we evaluated the degree to which errors in species data (false presences/absences) affect model predictions and how this is reflected in commonly used evaluation metrics.

Location: Western Swiss Alps.

Methods: Using 100 virtual species and different sampling methods, we created observation datasets of different sizes (100, 400, 1,600) and added increasing levels of errors (creating false positives or negatives; from 0% to 50%). These degraded datasets were used to fit models using generalized linear models (GLM), random forests (RF) and boosted regression trees. Model fit (ability to reproduce calibration data) and predictive success (ability to predict the true distribution) were measured on probabilistic/binary outcomes using Kappa, TSS, MaxKappa, MaxTSS and Somers' D (rescaled AUC).

Results: The interpretation of models' performance depended on the data and metrics used to evaluate them, with conclusions differing depending on whether model fit or predictive success was measured. Added errors reduced model performance, with effects expectedly decreasing as sample size increased. Model performance was more affected by false positives than by false negatives. Models built with different techniques were differently affected by errors: models with high fit presented lower predictive success (RF), and vice versa (GLM). High evaluation metrics could still be obtained with 30% error added, indicating that some metrics (Somers' D) might not be sensitive enough to detect data degradation.

Main conclusions: Our findings highlight the need to reconsider the interpretation scale of some commonly used evaluation metrics: Kappa seems more realistic than Somers' D/AUC or TSS. High fits were obtained with high levels of error added, showing that RF overfits the data. When collecting occurrence databases, it is advisable to reduce the rate of false positives (or increase sample sizes) rather than false negatives.
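The abstract evaluates models with threshold-based metrics (Kappa, TSS) and with Somers' D as a rescaled AUC. A minimal sketch of these standard formulas, assuming a 2×2 confusion matrix against the known true distribution (the function names and the example counts are illustrative, not taken from the paper):

```python
# Hedged sketch of the binary evaluation metrics named in the abstract,
# computed from a confusion matrix: tp/fp/fn/tn are true/false
# positives/negatives relative to the (here, virtual) true distribution.

def tss(tp, fp, fn, tn):
    """True Skill Statistic = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def kappa(tp, fp, fn, tn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

def somers_d(auc):
    """Somers' D as the rescaled AUC used in the abstract: 2 * AUC - 1."""
    return 2 * auc - 1

# A balanced example: 80% sensitivity and 80% specificity on a 100-cell
# grid give TSS = Kappa = 0.6, and an AUC of 0.8 gives Somers' D = 0.6.
print(tss(40, 10, 10, 40), kappa(40, 10, 10, 40), somers_d(0.8))
```

Note that Kappa depends on prevalence while TSS and Somers' D do not, which is one reason the abstract finds the metrics disagree on how much added error degrades a model.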
Keywords
artificial data, AUC, ecological niche models, evaluation metric, habitat suitability models, Kappa, model fit, predictive accuracy, TSS, uncertainty
Web of Science
Open Access
Yes
Create date
28/09/2018 22:31
Last modification date
20/08/2019 15:22