Time-varying effective EEG source connectivity: the optimization of model parameters*

Adaptive estimation methods based on general Kalman filter are powerful tools to investigate brain networks dynamics given the non-stationary nature of neural signals. These methods rely on two parameters, the model order p and adaptation constant c, which determine the resolution and smoothness of the time-varying multivariate autoregressive estimates. A sub-optimal filtering may present consistent biases in the frequency domain and temporal distortions, leading to fallacious interpretations. Thus, the performance of these methods heavily depends on the accurate choice of these two parameters in the filter design. In this work, we sought to define an objective criterion for the optimal choice of these parameters. Since residual- and information-based criteria are not guaranteed to reach an absolute minimum, we propose to study the partial derivatives of these functions to guide the choice of p and c. To validate the performance of our method, we used a dataset of human visual evoked potentials during face perception where the generation and propagation of information in the brain is well understood and a set of simulated data where the ground truth is available.


I. INTRODUCTION
In the field of neuroscience, the interest in studying the dynamical causal interactions that characterize rest-or taskrelated brain networks has led to a remarkable increase in the application of adaptive estimation algorithms. In particular, Granger causality based on adaptive filtering algorithms represent a suitable approach to study dynamical networks composed by highly non-stationary neural signals (e.g., electroencephalogram (EEG) signals) [1] [2]. While adaptive filtering allows dealing with time-varying multivariate timeseries, the interpretation and test of direct causal influences strictly depends upon the optimal choice of parameters in the filter settings. A signal # Granger-causes another signal $ if the history of # contains information that helps to predict $ above and beyond the information contained in the history of $ alone [3]. Such dependency on the history of multiple signals implies a model that should take into account the optimal lag (model order) at which causal influences occur and the rapidity with which they evolve. In time-varying multivariate autoregressive (tv-MVAR) models based on the general Kalman filter [4], these two aspects need to be taken into account as a mandatory step during the filter design: a suboptimal filter that is unable to track the true dynamics of a system can easily increase the risk of wrong causality inference. For this reason, it is imperative to rely on objective criteria for the selection of the two key Kalman parameters that determine the structure and speed of the estimated causal influences. The model order & indicates the number of lags to be considered during modeling, and the adaptation constant ' is the tuning parameter that controls the tracking speed and smoothness of the tv-MVAR estimates [5] [6].
The model order selection in tv-MVAR models is not trivial [7] and has been usually performed by means of information-based criteria that neglect the non-stationary nature of the signals (e.g., Akaike information criterion, AIC [8], and Bayesian information criterion, BIC [9]) [10] [11] or through modified time-varying versions of the same criteria (e.g., the modified AIC, MAIC [12]). Other approaches have relied on choosing the model order a priori [13] or based on a comparison between parametric and non-parametric spectral density estimates [14] [15], thus, considering only the univariate part of the tv-MVAR system.
The optimal adaptation constant has been also determined via similar a priori and information-based approaches [15] [16] or through different cost functions that minimize residual errors [10] [17] which are prone to the risk of overfitting real data. In more recent years, a new method has been proposed to estimate the adaptation constant ', based on the predictive least square principle (PLS) [5]. Alternative, to overcome the use of adaptation constants, windowing approaches have also been proposed [18] [19], but with some critical limitations: high temporal resolution requires short time windows which lead to few residuals when assessing the quality of the fit, the model order also depends on the size of the window and both the choice of the size of the window and the overlap percentage are subjective.
Given the lack of a standard selection procedure and the tendency of residual-and information-based criteria to promote overfitting in which an absolute minimum is not guaranteed, in the present work we test a new method based on the partial derivatives of a residual minimization function.
This approach allows the determination of the inflection point, i.e., a point on a continuously differentiable plane curve at which the function changes from being concave to convex, which corresponds to our optimal choice of & and '. Because of the lack of an objective ground truth, the evaluation of the performance of effective connectivity methods on real data sets is challenging. As first step to demonstrate the validity of the proposed method, we used a dataset of visual evoked potentials (VEP) during face perception where the generation and propagation of information in the brain is relatively well understood. Complementarily, we tested the method in a set of simulated data imposing & and ' a priori.

A. Visual Evoked Potential of face perception
Many behavioral studies have investigated the process involved in visual stimuli such as face images [20] [21]. Traditional measures are based on the N170 face-sensitive evoked response component [22]. Human faces evoke a large negative potential (N170) over the occipital-parietal scalp, more prominent over the right than the left hemisphere, which is reduced in evoked potentials elicited by other animate and inanimate non-face stimuli [23]. Applying effective connectivity in face perception, i.e., describing the network of directional effects of one brain region over another, may be a powerful instrument to study this visual process. In order to study these causal effects, it is important to precisely reconstruct the face-response stimulus in the source space. The accuracy of the model reconstruction depends on the choice of the model parameters. We expected that our tv-MVAR estimates would reliably identify the main components of the VEP (i.e., the P100 and the N170 in the source space), and that the major driver of the network would be localized in the lateral, basal temporal and occipital cortices including the fusiform gyrus as shown trough EEG changes from implanted electrodes [24] [25] [26].

B. Experimental protocol
Participants (13 subjects, 2 males, age=24.15 ± 3.41 y) sat in a dimly lit sound-attenuated and electrically shielded room with their head positioned on a chinrest at ~70 cm from the monitor. Each trial lasted 1.2 s and started with a blank screen lasting 500 ms. After the blank interval, one image (either a face or a scramble image) was presented for 200 ms and participants had the remaining 1000 ms to respond. The task was to report whether they saw a face or not (Yes/No task) by pressing two buttons on a response box. Faces and scrambled images were randomly interleaved across trials. After the participant's response, there was a random interval (from 600 to 900 ms) before the beginning of a new trial. The experiment consisted of 4 blocks of 150 trials each, for a total of 600 trials, i.e., 300 with faces and 300 with scrambled images. For this study, we used the EEG data in response of the face images (300 trials per subject). During the experiment, EEG data were recorded continuously at 1024 Hz through a 128-channel Biosemi Active Two EEG system (Biosemi, Amsterdam, The Netherlands). Experimental procedures were conformed to Swiss Ethics Committee standards and were approved by the regional ethics committee (CER-VD).

C. Preprocessing
The VEP EEG signals were downsampled at fs=200 Hz and detrended to remove slow fluctuations and linear trends [57]. The line and monitor noise (50 and 75 Hz, plus harmonics) were attenuated with an adaptive multitaper filter (Cleanline plugin for EEGLAB). EEG epochs were then extracted from the continuous dataset and time-locked from -1000 ms to 1000 ms relative to the onset of each image. Noisy channels were identified by visual inspection and removed before preprocessing. Individual epochs containing nonstereotyped artifacts, peri-stimulus eye blinks and eye movements (occurring within ±500 ms from stimulus onset) were also identified by visual inspection and removed from further analysis (mean number of epochs removed across participants: 6±5). Data were cleaned from remaining physiological artifacts (eye blinks, horizontal and vertical eye movements, muscle potentials and other artifacts) through a PCA-informed ICA algorithm implemented in EEGLAB. After ICA cleaning, the identified artifact channels were interpolated using the nearest-neighbor spline method and the data were re-referenced to the average reference.

D. Simulated Data
To test if the proposed method is capable of detecting the correct values of the model order and the adaptation constant, we generated two sets of simulated data imposing & and ' a priori. Seventy-eight networks with 82 nodes were generated through a random discrete state-space model adding a singletrial VEP component and observation noise [27]. The amplitude of the surrogate tv-MVAR coefficients decreased exponentially with the lags following an underdamped oscillator. VEPs were generated perturbating the univariate values of the simulated tv-MVAR matrices. Asymptotical stability of the tv-system was imposed as in [28]. Each simulated network contains 100 trials of 800-sample signals with signal-to-noise ratio (SNR) equal to 1, model order equal to 5, 7 or 9 and the adaptation constant equal to 3 • 10 ,or 3 • 10 ,. .

III. METHOD
In the framework of a MATLAB toolbox (code available upon request to the authors) that implements the adaptive Kalman filtering [4] and information Partial Directed Coherence (iPDC) in the source space [29], we propose a new analytical approach for the optimal choice of the Kalman parameters. (We refer the reader to [30] for further details on the implementation of the effective connectivity pipeline in the source space from high-density EEG recordings). In order to estimate in parallel the adaptation constant and the tv-MVAR model order, we refined the predictive least square principle (PLS) [5]  However, as for other residual-and information-based criteria in the context of tv-MVAR models (AIC, BIC), the cost function is not ensured to reach its absolute minimum and tends to monotonically decrease with overfitting (Fig. 1a). This could leads to too high model order and adaptation constant that would increase both computational cost and noise sensitivity. If an absolute minimum does not exist, we propose to study the partial derivative of the cost function /01 2 (&, '). Our aim was to find the value at which the /@ does not significantly improve by increasing & and '. Thus, we To test the goodness of the data fitting using the values H& F4= , ' F4= I for each subject, we computed the residuals from the data and the model estimates, the J/KL outflow from the tv-MVAR coefficients, and parametric and non-parametric power spectra of VEP dataset.

IV. RESULTS
As a representative example, the cost function /01 2 (&, ') is reported in Fig.1a for a representative subject. Given that /01 2 (&, ') is a differentiable function and the lack of a global minimum, we could study its partial derivatives (Fig.1 b-c-d) and estimate the inflection point. Considering all subjects, the model order & was estimated in the interval [4: 9], i.e., [20: 45] TU, and the adaptation constant ' spanned between [1.6: 5.4]10 ,-. (In parallel, we also exploited the time-varying formulation of AIC and BIC as cost function varying p and c in the same intervals, the 96% of the estimated values tended to -¥, rendering a reliable estimation of the absolute minimum and the computation of the partial derivatives impossible.) The obtained values for the model order & satisfied the hypothesis on the average maximum human brain fiber length (circa 15 'T) and conduction delay (circa 6 T/U) [33]. Whereas, the values for the adaptation constant ' are in the same interval to the one suggested by Schögl in [6] and exploited in [4]. Fig. 2 reports a comparison between the model prediction (Fig. 2b) and the original data (Fig. 2a) for a representative subject. It is clear that there is a strong correlation between the model prediction and the original data, Y > 0.5 (Fig. 2c). Fig. 3 depicts the average across subject of the power in  Hz for our 82 regions of interest (ROIs). The power was computed both directly from the reconstructed source waveforms by a non-parametric approach (Fig. 3a) and from the tv-MVAR coefficients (Fig. 3b) during N170. We found a strong correlation between the distributions of power in the brain regions computed with the two different methods. Indeed, ROIs with the maximal power (> 95% percentile) were localized in the right lateral occipital cortex and in the right inferior temporal cortex with both methods. Finding the regions with highest power with a non-parametric approach might already give enough information about the main drivers in a network, but we should take into account that nonparametric approaches are less robust to artifacts and noise in the EEG than parametric ones.
Exploiting the tv-MVAR coefficients, we computed the |iPDC| values during the first 500 ms after the stimulus, and we compared the values of the outflow from each ROIs at N170. The connectivity patterns between the different cortical regions were summarized by representing the total outflow from a cortical region toward the others, generated by the sum of all the statistically significant links obtained by application of the iPDC to the cortical waveforms (with their values). In Fig. 4, the total outflow during N170 for each ROI is represented by a sphere centered on the cortical region, whose radius is linearly related to the magnitude of all the out-coming directed links to the other regions. Such information is also coded through a color scale. The greatest amount of   information outflow depicts the ROI as one of the main sources (drivers) of functional connections to the other ROIs [34]. The ROIs with the maximum outflow (> 95% percentile) were localized in the right lateral-occipital cortex, and in the inferior temporal cortex (Fig. 4). Our findings are in accordance with the ones in the literature. N170 usually displays right-hemisphere lateralization and its response in the scalp is maximal over occipito-temporal electrodes.

Non-parametric and parametric average power in all subjects
Regarding the source of N170, this event-related potential was shown to be generated in widely distributed occipitaltemporal-parietal cortical regions.
In Tab. 1 the results of the set of simulations are reported: the 2 nd column stands for the p and c imposed a priori; the 3 rd and the 4 th columns to c predicted and to the p predicted (mean±std) by 100 trials of each simulated network. Both the estimated values for the adaptation constant and the model order satisfied the hypothesis. V. CONCLUSION In this work, we proposed to study the partial derivatives of residual-and information-based criteria to guide the choice of p and c when these functions do not reach an absolute minimum. We reported the results in both surrogate and real data testing the partial derivatives of /01 2 . The consistence of the results of the simulations and in finding the main drivers during N170 prove the validity of adaptive estimation algorithms in estimating tv-Granger causality after an accurate choice of the model order and adaptation constant. This is an important first step towards automated model selection in tv-MVAR modeling.