Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making.

Liakoni, V.; Lehmann, M.P.; Modirshanechi, A.; Brea, J.; Lutti, A.; Gerstner, W.; Preuschoff, K.

doi:10.1016/j.neuroimage.2021.118780

Details

Under indefinite embargo.
UNIL restricted access
State: Public
Version: author
License: CC BY-NC-ND 4.0

Serval ID

serval:BIB_6EC8EB5DAAF6

Type

Article: article from journal or magazine.

Collection

Publications

Institution

UNIL/CHUV

Title

Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making.

Journal

NeuroImage

Author(s)

Liakoni V., Lehmann M.P., Modirshanechi A., Brea J., Lutti A., Gerstner W., Preuschoff K.

ISSN

1095-9572 (Electronic)

ISSN-L

1053-8119

Publication state

Published

Issued date

01/02/2022

Peer-reviewed

Yes

Volume

246

Pages

118780

Language

English

Notes

Publication types: Journal Article ; Research Support, Non-U.S. Gov't
Publication Status: ppublish

Abstract

Learning how to reach a reward over long series of actions is a remarkable capability of humans, and potentially guided by multiple parallel learning modules. Current brain imaging of learning modules is limited by (i) simple experimental paradigms, (ii) entanglement of brain signals of different learning modules, and (iii) a limited number of computational models considered as candidates for explaining behavior. Here, we address these three limitations and (i) introduce a complex sequential decision making task with surprising events that allows us to (ii) dissociate correlates of reward prediction errors from those of surprise in functional magnetic resonance imaging (fMRI); and (iii) we test behavior against a large repertoire of model-free, model-based, and hybrid reinforcement learning algorithms, including a novel surprise-modulated actor-critic algorithm. Surprise, derived from an approximate Bayesian approach for learning the world-model, is extracted in our algorithm from a state prediction error. Surprise is then used to modulate the learning rate of a model-free actor, which itself learns via the reward prediction error from model-free value estimation by the critic. We find that action choices are well explained by pure model-free policy gradient, but reaction times and neural data are not. We identify signatures of both model-free and surprise-based learning signals in blood oxygen level dependent (BOLD) responses, supporting the existence of multiple parallel learning modules in the brain. Our results extend previous fMRI findings to a multi-step setting and emphasize the role of policy gradient and surprise signalling in human learning.
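
The mechanism described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the abstract gives no equations, so every concrete choice below is an assumption made for illustration. In particular, the class name `SurpriseActorCritic`, the count-based world model, the use of the raw state prediction error as a stand-in for surprise, and the `1 + surprise` scaling of the actor learning rate are all hypothetical.

```python
import math
import random


class SurpriseActorCritic:
    """Toy tabular actor-critic whose actor learning rate grows with surprise.

    Hypothetical sketch: names, update rules and constants are assumptions,
    not the algorithm of the paper.
    """

    def __init__(self, n_states, n_actions, alpha_critic=0.1,
                 alpha_actor=0.1, gamma=0.9, prior_count=1.0):
        self.n_states, self.n_actions, self.gamma = n_states, n_actions, gamma
        self.alpha_critic, self.alpha_actor = alpha_critic, alpha_actor
        self.values = [0.0] * n_states                             # critic: V(s)
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor: softmax preferences
        # World model: pseudo-counts for P(s' | s, a) (assumed count-based form).
        self.counts = [[[prior_count] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]

    def policy(self, s):
        """Softmax over the action preferences of state s."""
        m = max(self.prefs[s])
        exps = [math.exp(p - m) for p in self.prefs[s]]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, s, rng=random):
        """Sample an action from the current policy."""
        return rng.choices(range(self.n_actions), weights=self.policy(s))[0]

    def transition_prob(self, s, a, s_next):
        """Estimated probability of reaching s_next after taking a in s."""
        c = self.counts[s][a]
        return c[s_next] / sum(c)

    def update(self, s, a, r, s_next, terminal=False):
        # State prediction error: how unexpected the observed transition was.
        spe = 1.0 - self.transition_prob(s, a, s_next)
        # Assumption: surprise (here simply the SPE) scales the actor's rate.
        modulation = 1.0 + spe
        # Critic: model-free reward prediction error (TD error).
        target = r + (0.0 if terminal else self.gamma * self.values[s_next])
        rpe = target - self.values[s]
        self.values[s] += self.alpha_critic * rpe
        # Actor: policy-gradient step for a softmax policy, driven by the RPE.
        probs = self.policy(s)
        for b in range(self.n_actions):
            grad = (1.0 if b == a else 0.0) - probs[b]
            self.prefs[s][b] += self.alpha_actor * modulation * rpe * grad
        # World model: record the observed transition.
        self.counts[s][a][s_next] += 1.0
        return rpe, spe


agent = SurpriseActorCritic(n_states=2, n_actions=2)
rpe, spe = agent.update(s=0, a=0, r=1.0, s_next=1, terminal=True)
```

In this sketch the world model influences behavior only through the size of the actor's update, never through planning, which mirrors the division of labor the abstract describes between a model-free actor and critic and a surprise signal derived from the world model.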

Keywords

Adult; Brain/diagnostic imaging; Brain/physiology; Decision Making/physiology; Female; Functional Neuroimaging/methods; Humans; Learning/physiology; Magnetic Resonance Imaging/methods; Male; Models, Biological; Reinforcement, Psychology; Young Adult; Behavior; Human learning; Reinforcement learning; Sequential decision making; Surprise; fMRI

OAI-PMH

oai:serval.unil.ch:BIB_6EC8EB5DAAF6

DOI

10.1016/j.neuroimage.2021.118780

Pubmed

34875383

Web of science

000736989900003

Open Access

Yes

Create date

11/12/2021 12:59