
Prefrontal cortex as a meta-reinforcement learning system

Matthew Botvinick
DeepMind, London, UK
Gatsby Computational Neuroscience Unit, UCL

Mnih et al., Nature (2015)

Yamins & DiCarlo (2016)

Schultz et al., Science (1997)

Jaderberg et al. (2016)

Mante et al., Nature (2013)

Song et al., eLife (2017)

Lake et al., Behavioral and Brain Sciences (2017)

Harlow, Psychological Review (1949)

“Learning to learn”

[Figure axis: training episodes]

Jaderberg et al. (2016)

https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/

[Figure: network receiving observation $o_t$, previous action $a_{t-1}$ and previous reward $r_{t-1}$]
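In the meta-RL setup cited here, the recurrent network receives the current observation together with its own previous action and previous reward. A minimal sketch of how such an input vector might be assembled, assuming a one-hot action code and a scalar reward; the function name `build_input` is hypothetical and not taken from the papers.

```python
from typing import List, Optional, Sequence


def build_input(obs: Sequence[float], prev_action: Optional[int],
                prev_reward: float, n_actions: int) -> List[float]:
    """Concatenate o_t, a one-hot a_{t-1} and r_{t-1} into one input vector.

    Hypothetical helper. prev_action is None on the first step of an
    episode, when no action has been taken yet; the action slots are
    then all zero.
    """
    action_one_hot = [0.0] * n_actions
    if prev_action is not None:
        action_one_hot[prev_action] = 1.0
    return list(obs) + action_one_hot + [float(prev_reward)]


# First step of an episode: no previous action, zero reward.
x0 = build_input([0.0], None, 0.0, n_actions=2)
# A later step: the agent chose action 1 and was rewarded.
x1 = build_input([0.0], 1, 1.0, n_actions=2)
print(x0)  # [0.0, 0.0, 0.0, 0.0]
print(x1)  # [0.0, 0.0, 1.0, 1.0]
```

Feeding the previous action and reward back in is what allows the recurrent dynamics, rather than weight changes, to carry learning within an episode.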

Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016); Duan et al., arXiv (2016)

[Figure: example two-armed bandit episodes with arm reward probabilities (0.7, 0.4), (0.6, 0.9), (0.3, 0.1), (0.8, 0.7)]

Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

[Figure: agent performance over trials 1 to 100 of an episode, compared with Gittins indices, UCB and Thompson sampling]
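Thompson sampling, one of the baselines named in the comparison, keeps a Beta posterior over each arm's reward probability and pulls the arm whose posterior sample is highest. A minimal sketch for Bernoulli bandits like those in the task, assuming Beta(1, 1) priors; this is a textbook version, not the implementation used for the comparison in the talk, and `thompson_bernoulli` is a hypothetical name.

```python
import random


def thompson_bernoulli(probs, n_trials, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors.

    probs are the true reward probabilities of the arms (unknown to the
    agent). Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_trials):
        # Draw a plausible reward probability for each arm from its posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < probs[arm] else 0
        # Conjugate update: Beta(s + 1, f + 1) posterior after s wins, f losses.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Two arms with reward probabilities 0.6 and 0.9, run for 100 trials.
print(thompson_bernoulli([0.6, 0.9], n_trials=100))
```

As evidence accumulates, the posterior for the better arm concentrates and it is sampled highest on most trials, so exploration tapers off without an explicit schedule.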

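[Figure: Left and Right arms; example episodes with arm reward probabilities (0.7, 0.3), (0.6, 0.4), (0.3, 0.7), (0.8, 0.2)]

[Figure: performance over trials 1 to 100 of an episode and across training episodes, compared with Gittins indices]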
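In this second set of example episodes, each pair of arm probabilities sums to 1, which suggests a task in which the two arms are anticorrelated. Assuming that structure (p_right = 1 - p_left, with p_left drawn uniformly for each episode, which is an assumption), an ideal observer can pool outcomes from both arms into a single estimate. A minimal sketch; the function names are hypothetical.

```python
import random


def sample_dependent_episode(rng):
    """Draw one episode's arm probabilities, assuming p_right = 1 - p_left."""
    p_left = rng.random()  # hypothetical: uniform over [0, 1) per episode
    return p_left, 1.0 - p_left


def estimate_p_left(left_outcomes, right_outcomes):
    """Posterior mean of p_left under a Beta(1, 1) prior, pooling both arms.

    Outcomes are 1 (rewarded) or 0 (unrewarded). Because p_right = 1 - p_left,
    a right-arm omission counts as evidence for the left arm and a right-arm
    reward as evidence against it.
    """
    hits = sum(left_outcomes) + sum(1 - r for r in right_outcomes)
    total = len(left_outcomes) + len(right_outcomes)
    return (hits + 1) / (total + 2)


rng = random.Random(0)
p_left, p_right = sample_dependent_episode(rng)
# Two left rewards, one left omission and two right omissions: 4 of 5 favor left.
print(estimate_p_left([1, 1, 0], [0, 0]))
```

Under this structure, pulling either arm is informative about both, which is the kind of regularity a meta-learner trained on such episodes could exploit.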

Volkmann et al., Nature Reviews Neurology (2010)

[Figure: two panels plotted against $\log_2(R_R/R_L)$, axis range -4 to 4]
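The axis log2(R_R/R_L) reads as the log ratio of rewards on the right and left, the usual x-axis of a matching-law analysis. Assuming the panels plot choice behavior against this ratio (the choice axis did not survive extraction), the generalized matching law $\log_2(C_R/C_L) = s \log_2(R_R/R_L) + b$ can be fit by ordinary least squares. A minimal sketch; `fit_matching` and the example ratios are hypothetical.

```python
import math


def fit_matching(reward_ratios, choice_ratios):
    """Least-squares fit of log2(C_R/C_L) = s * log2(R_R/R_L) + b.

    reward_ratios and choice_ratios hold R_R/R_L and C_R/C_L per block.
    Returns (s, b): s is the sensitivity to the reward ratio, b a side bias.
    """
    xs = [math.log2(r) for r in reward_ratios]
    ys = [math.log2(c) for c in choice_ratios]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s = sxy / sxx
    b = my - s * mx
    return s, b


# Perfect matching: choice ratios equal reward ratios, so s = 1 and b = 0.
ratios = [1 / 8, 1 / 2, 1, 2, 8]
print(fit_matching(ratios, ratios))
```

A slope below 1 indicates undermatching (choices less extreme than the reward ratio); a nonzero intercept indicates a bias toward one side.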

Tsutsui et al., Nature Communications (2016)

Wang et al., Nature Neuroscience (2018)

[Figure: variables $a_{t-1}$, $r_{t-1}$, $a_{t-1} \times r_{t-1}$ and $v_t$]
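The variables listed here are the previous action, the previous reward, their interaction and the current value. A minimal sketch of how the trial-history terms could be built as regressors for a decoding analysis, assuming actions coded +1 (right) / -1 (left) and rewards coded 1 / 0; this coding is an assumption, not necessarily the one used by Tsutsui et al. or Wang et al., and `history_regressors` is hypothetical. The value term $v_t$ would come from the agent's value output and is omitted.

```python
def history_regressors(actions, rewards):
    """Build per-trial regressors a_{t-1}, r_{t-1} and a_{t-1} * r_{t-1}.

    actions are coded +1 (right) / -1 (left); rewards are 1 / 0.
    The first trial has no history, so it produces no row.
    """
    rows = []
    for t in range(1, len(actions)):
        a_prev = actions[t - 1]
        r_prev = rewards[t - 1]
        # The interaction term is nonzero only after rewarded trials,
        # and its sign then encodes which side was rewarded.
        rows.append((a_prev, r_prev, a_prev * r_prev))
    return rows


print(history_regressors([1, -1, -1], [1, 0, 1]))
```

Each row could then be regressed against unit activity on the corresponding trial, one coefficient per variable.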


[Figure: reward probability, inferred/decoded volatility and learning rate over steps 0 to 200; inputs: action, feedback]
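Behrens et al. (2007) found that human learners use a higher learning rate when the environment is more volatile, and this slide pairs inferred/decoded volatility with learning rate. The sketch below is not their Bayesian learner; it is a fixed-learning-rate delta rule (Rescorla-Wagner style), included only to show why a higher learning rate helps after the reward probability changes. The function name and parameter values are hypothetical.

```python
def delta_rule(outcomes, alpha, v0=0.5):
    """Track a reward probability with V <- V + alpha * (r - V)."""
    v = v0
    trace = []
    for r in outcomes:
        v += alpha * (r - v)  # prediction error scaled by the learning rate
        trace.append(v)
    return trace


# 20 rewarded trials, then a reversal to 20 unrewarded trials.
outcomes = [1] * 20 + [0] * 20
slow = delta_rule(outcomes, alpha=0.05)
fast = delta_rule(outcomes, alpha=0.5)
# Five trials after the reversal, compare how far each estimate has moved.
print(round(slow[24], 3), round(fast[24], 3))
```

The high learning rate adapts quickly after the reversal but would also chase noise in a stable environment, which is why an adaptive learner should scale its learning rate with estimated volatility.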

Behrens et al., Nature Neuroscience (2007); Wang et al., Nature Neuroscience (2018)


Bromberg-Martin et al., J. Neurophysiol. (2010)

[Figure: reversal; panels "Left rewarded" and "Right rewarded"]

Miller, Botvinick & Brody, Nat. Neurosci. (2017); Daw et al., Neuron (2011)

[Figure: model-based RPE; panels "Stage 2" and "Reward"; $r^2 = 0.89$]

Model-based RL (from model-free RL)
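The two-step task of Daw et al. (2011), cited on this slide, is designed to separate model-based from model-free learning. A minimal sketch of a model-based first-stage value and the prediction error on reaching a second-stage state, assuming the common/rare transition probabilities 0.7/0.3 and valuing each second-stage state by its best action; the second-stage values and function names are hypothetical.

```python
def model_based_q(transition, q_stage2):
    """First-stage action values computed from a known transition model.

    transition[a][s] is P(second-stage state s | first-stage action a);
    q_stage2[s] lists the values of the actions available in state s.
    """
    return [sum(p * max(q_stage2[s]) for s, p in enumerate(row))
            for row in transition]


def stage2_rpe(transition, q_stage2, action, state):
    """Model-based prediction error on arriving in a second-stage state."""
    q1 = model_based_q(transition, q_stage2)
    return max(q_stage2[state]) - q1[action]


transition = [[0.7, 0.3], [0.3, 0.7]]  # common (0.7) vs rare (0.3) transitions
q_stage2 = [[0.8, 0.2], [0.4, 0.1]]    # hypothetical second-stage values

print(model_based_q(transition, q_stage2))
# A rare transition to the lower-valued state yields a negative prediction error.
print(stage2_rpe(transition, q_stage2, action=0, state=1))
```

A purely model-free learner would instead cache first-stage values from experienced outcomes, so its prediction errors would not depend on whether a transition was common or rare.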

• DA blocked upon food reward from large/risky option

• DA blocked upon food reward from small/certain option

• DA triggered upon food omission from large/risky option

Wang et al., arXiv (2018); Stopper et al., Neuron (2014)

Optogenetic manipulation of dopamine

Current / Future Work

• Richer environments / abstractions (Espeholt et al., arXiv, 2018)

• Architectural biases (e.g., Raposo et al., NIPS, 2017)

• Complementary forms of meta-learning (e.g., Fernando et al., under review)

• Episodic reinstatement (Ritter et al., in press)

Neuroscience and AI: A virtuous circle

Collaborators

Jane Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Chris Summerfield, Hubert Soyer, Joel Leibo, Sam Ritter, Adam Santoro, Tim Lillicrap, David Barrett, Dhruva Tirumala, Remi Munos, Charles Blundell, Demis Hassabis

