Robust Nonlinear Mapping of Soil Contamination Using Support Regression

Kanevski, M.; Pozdnoukhov, A.; Timonin, V.; Maignan, M.

Abstract

Spatial data analysis, mapping, and visualization are of great importance
in various fields: environment, pollution, natural hazards and risks,
epidemiology, spatial econometrics, etc. A basic task of spatial
mapping is to make predictions based on some empirical data (measurements).
A number of state-of-the-art methods can be used for the task: deterministic
interpolations, geostatistical methods such as the family of kriging estimators
(Deutsch and Journel, 1997), machine learning algorithms such as
artificial neural networks (ANN) of different architectures, hybrid
ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et
al., 1996), etc. Environmental empirical
data are always contaminated by noise, often
of unknown nature. This is one of the reasons why deterministic models
can be inconsistent, since they treat the measurements as values
of some unknown function that should be interpolated. Kriging estimators
treat the measurements as a realization of some spatial random
process. To obtain a kriging estimate, one has to model the
spatial structure of the data: the spatial correlation function or (semi-)variogram.
This task can be complicated if the number of measurements is
insufficient, and the variogram is sensitive to outliers and extremes.
ANN is a powerful tool, but it also suffers from a number of drawbacks.
ANNs of a special type, multilayer perceptrons, are often used as a
detrending tool in hybrid (ANN+geostatistics) models (Kanevski and
Maignan, 2004). Therefore, the development and adaptation of a method
that is nonlinear, robust to noise in measurements, able to
deal with small empirical datasets, and built on a solid mathematical
background is of great importance. The present paper deals with such
a model, based on Statistical Learning Theory (SLT): Support Vector
Regression. SLT is a general mathematical framework devoted to the
problem of estimating dependencies from empirical data (Hastie
et al., 2004; Vapnik, 1998). SLT models for classification, Support
Vector Machines (SVM), have shown good results on different machine learning
tasks. The results of SVM classification of spatial data are also
promising (Kanevski et al., 2002). The properties of SVM for regression,
Support Vector Regression (SVR), are less studied. First results
of the application of SVR to spatial mapping of physical quantities
were obtained by the authors for mapping of medium porosity (Kanevski
et al., 1999) and for mapping of radioactively contaminated territories
(Kanevski and Canu, 2000). The present paper is devoted to further
understanding of the properties of the SVR model for spatial data analysis
and mapping. A detailed description of SVR theory can be found
in Cristianini and Shawe-Taylor (2000) and Smola (1996), and the basic equations
for nonlinear modeling are given in Section 2. Section 3 discusses
the application of SVR to spatial data mapping in a real case
study: soil pollution by the Cs137 radionuclide. Section 4 discusses
the properties of the model applied to noisy data or data with outliers.
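
As a rough illustration of the robustness argument, the sketch below
compares the epsilon-insensitive loss on which standard SVR is built with
the ordinary squared loss. It is not taken from the paper: the function
names and the tube width eps=0.1 are assumptions chosen only for this example.

```python
# Illustrative sketch only; names and eps=0.1 are assumptions, not the paper's.

def eps_insensitive_loss(residual, eps=0.1):
    """Zero inside the eps-tube, growing linearly outside it."""
    return max(0.0, abs(residual) - eps)


def squared_loss(residual):
    """Ordinary least-squares penalty, growing quadratically."""
    return residual ** 2


if __name__ == "__main__":
    # Small residual (noise-level), moderate residual, and an outlier.
    for r in (0.05, 1.0, 10.0):
        print(r, eps_insensitive_loss(r), squared_loss(r))
```

Residuals smaller than eps cost nothing, and a large residual is penalized
in proportion to its size rather than its square, so a single outlier
pulls the fit less than it would under a least-squares criterion.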
