
Post on 19-Aug-2020


Prefrontal cortex as a meta-reinforcement learning system

Matthew Botvinick
DeepMind, London, UK
Gatsby Computational Neuroscience Unit, UCL

Mnih et al., Nature (2015)

Yamins & DiCarlo, 2016

Schultz et al., Science (1997)

Jaderberg et al., 2016

Mante et al., Nature, 2013

Song et al., Elife, 2017

Lake et al., BBS (2017)

Harlow, Psychological Review, 1949

“Learning to learn”


Training episodes


Mnih et al., Nature (2015)

Jaderberg et al., 2016

https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/

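[Figure: meta-RL architecture. A recurrent network (PFC) receives the observation o_t, the previous action a_{t-1}, and the previous reward r_{t-1}, and outputs an action a_t and a value estimate v_t; δ (DA) is the reward prediction error.]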

Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016); Duan et al., arXiv (2016)
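The architecture slide shows a network (labeled PFC) that takes the current observation o_t together with the previous action a_{t-1} and previous reward r_{t-1}, and emits an action a_t and a value estimate v_t, with the reward prediction error δ (labeled DA) as the learning signal. A minimal Python sketch of those two ingredients follows. It assumes a one-hot code for a_{t-1} and a one-step TD error as the form of δ. The function names and γ = 0.9 are hypothetical, and neither the recurrent network nor the training loop used in the original work is reproduced.

```python
from typing import List


def make_input(obs: List[float], prev_action: int, prev_reward: float,
               n_actions: int) -> List[float]:
    """Concatenate o_t, a one-hot code for a_{t-1}, and r_{t-1} (hypothetical helper)."""
    one_hot = [0.0] * n_actions
    if prev_action >= 0:  # -1 marks the first step of an episode (no previous action)
        one_hot[prev_action] = 1.0
    return list(obs) + one_hot + [float(prev_reward)]


def td_error(reward: float, value: float, next_value: float,
             gamma: float = 0.9, done: bool = False) -> float:
    """One-step reward prediction error (assumed form): r_t + gamma * v_{t+1} - v_t."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


# Previous step: chose action 1 and was rewarded.
print(make_input([0.0], prev_action=1, prev_reward=1.0, n_actions=2))  # [0.0, 0.0, 1.0, 1.0]
# Terminal step: reward 1 when the network predicted 0.5.
print(td_error(1.0, value=0.5, next_value=0.0, done=True))  # 0.5
```

Because the previous action and reward are part of the input, the network's recurrent state can carry information about the current task across trials within an episode, which is what allows behavior to adapt without changing the weights.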

[Figure: example bandit episodes; reward probabilities shown: 0.7, 0.4, 0.6, 0.9, 0.3, 0.1, 0.8, 0.7]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)


[Figure: cumulative regret over trials 1 to 100 for the agent, compared with Gittins indices, UCB, and Thompson sampling; a further panel shows Left/Right choices by trial and episode]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
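The regret plots compare the agent with standard bandit algorithms (Gittins indices, UCB, Thompson sampling). As a point of reference, here is a minimal Python sketch of one of those baselines: Beta-Bernoulli Thompson sampling on a two-armed bandit, tracking cumulative expected regret. The arm probabilities 0.7 and 0.4 are borrowed from the first two example values on the slide. The uniform Beta(1, 1) prior, 100 trials, the seed, and the function name are assumptions.

```python
import random
from typing import List


def thompson_regret(probs: List[float], n_trials: int, seed: int = 0) -> List[float]:
    """Beta-Bernoulli Thompson sampling; returns cumulative expected regret per trial."""
    rng = random.Random(seed)
    alpha = [1.0] * len(probs)  # Beta(1, 1) prior; alpha counts 1 + successes
    beta = [1.0] * len(probs)   # beta counts 1 + failures
    best = max(probs)
    regret, total = [], 0.0
    for _ in range(n_trials):
        # Sample a plausible reward rate for each arm and pull the highest.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total += best - probs[arm]  # expected loss relative to the best arm
        regret.append(total)
    return regret


curve = thompson_regret([0.7, 0.4], n_trials=100)
print(len(curve), round(curve[-1], 2))
```

Counting expected regret (the probability gap of each choice) rather than sampled reward keeps the curve non-decreasing; averaging over many seeds would give a curve comparable in form to those on the slides.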


[Figure: example bandit episodes; reward probabilities shown: 0.7, 0.3, 0.6, 0.4, 0.3, 0.7, 0.8, 0.2]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
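In this second bandit example the listed probabilities appear to pair into values that sum to 1 (0.7/0.3, 0.6/0.4, 0.3/0.7, 0.8/0.2), i.e. p_R = 1 - p_L, so a single parameter governs both arms. The Python sketch below is an ideal-observer Bayesian update under that assumption, not the meta-RL agent itself. It shows why the structure matters: every outcome, from either arm, updates the same Beta posterior over p_L. The Beta prior and the function name are hypothetical.

```python
from typing import Tuple


def update_correlated(alpha: float, beta: float, arm: int,
                      reward: int) -> Tuple[float, float]:
    """Update a Beta(alpha, beta) posterior over p_L, assuming p_R = 1 - p_L.

    arm 0 = left (pays with probability p_L), arm 1 = right (pays with 1 - p_L).
    """
    if arm == 0:
        # A left reward is evidence for a high p_L.
        return alpha + reward, beta + (1 - reward)
    # A right reward is evidence for a low p_L; a right-arm miss favors a high p_L.
    return alpha + (1 - reward), beta + reward


a, b = 1.0, 1.0  # uniform prior over p_L
for arm, reward in [(1, 0), (1, 0), (0, 1)]:  # two right-arm misses, one left-arm win
    a, b = update_correlated(a, b, arm, reward)
print(a / (a + b))  # posterior mean of p_L: 0.8
```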

[Figure: cumulative regret over trials 1 to 100, compared with Gittins indices, UCB, and Thompson sampling; trial-by-episode panel]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

Training episodes


Volkmann et al., Nature Reviews Neurology, 2010

[Figure: two panels relating log2(C_R/C_L) to log2(R_R/R_L), axes from -4 to 4]

Tsutsui et al., Nature Comms, 2016

Wang et al., Nature Neuroscience (2018)
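The matching slides relate choice ratios to reward ratios on log axes, log2(C_R/C_L) against log2(R_R/R_L). Under the generalized matching law these are related linearly, log2(C_R/C_L) = s · log2(R_R/R_L) + log2(b), where s is the sensitivity (s = 1 is strict matching) and b is a side bias. Below is a minimal least-squares fit in Python, assuming the axes show per-side choice counts C and reward counts R; the function name and the data are hypothetical.

```python
import math
from typing import List, Tuple


def matching_fit(choices_r: List[float], choices_l: List[float],
                 rewards_r: List[float], rewards_l: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log2(C_R/C_L) = s * log2(R_R/R_L) + log2(b).

    Returns (s, log2_b): s is the sensitivity (1 = strict matching),
    log2_b the side bias on the log2 scale. All counts must be positive.
    """
    x = [math.log2(rr / rl) for rr, rl in zip(rewards_r, rewards_l)]
    y = [math.log2(cr / cl) for cr, cl in zip(choices_r, choices_l)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    s = sxy / sxx
    return s, my - s * mx


# Hypothetical blocks whose choice ratios equal the reward ratios exactly.
s, log2_b = matching_fit([80, 50, 20], [20, 50, 80], [4, 1, 1], [1, 1, 4])
print(s, log2_b)  # 1.0 0.0
```

A fitted slope below 1 would indicate undermatching (choices less extreme than the reward ratios); a nonzero intercept indicates a preference for one side.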


[Figure: proportion of units coding a_{t-1}, r_{t-1}, a_{t-1} × r_{t-1}, and v_t]

Tsutsui et al., Nature Comms, 2016

[Figure: correlation for the same regressors a_{t-1}, r_{t-1}, a_{t-1} × r_{t-1}, and v_t]

Wang et al., Nature Neuroscience (2018)
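The regression slides summarize coding of the previous action a_{t-1}, the previous reward r_{t-1}, their interaction a_{t-1} × r_{t-1}, and the value estimate v_t. A Python sketch of the trial-history part of such a design matrix follows. The ±1 action coding, the 0/1 reward coding, and the function name are assumptions; v_t is omitted because it would come from the network's own value output.

```python
from typing import List


def history_regressors(actions: List[int], rewards: List[int]) -> List[List[int]]:
    """Per-trial regressors [a_{t-1}, r_{t-1}, a_{t-1} * r_{t-1}] for trials t = 1..T-1.

    Assumed coding: actions -1 (left) / +1 (right), rewards 0 / 1, so the
    interaction is nonzero only after rewarded trials and carries the rewarded side.
    """
    rows = []
    for t in range(1, len(actions)):
        a_prev, r_prev = actions[t - 1], rewards[t - 1]
        rows.append([a_prev, r_prev, a_prev * r_prev])
    return rows


print(history_regressors([1, -1, -1], [1, 0, 1]))  # [[1, 1, 1], [-1, 0, 0]]
```

Each column could then be correlated with a unit's activity on the same trials, which is the kind of analysis the proportion and correlation panels summarize.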


[Figure, panels A and B: reward probability, inferred/decoded volatility, and learning rate over steps 0 to 200, with action feedback]

Behrens et al., Nature Neuroscience (2007); Wang et al., Nature Neuroscience (2018)
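The volatility slides draw on Behrens et al. (2007), where human learning rates rise when the environment is volatile, and show volatility and learning rate decoded from the meta-RL network. The underlying trade-off can be seen with a fixed-learning-rate delta rule. The Python sketch below compares a low and a high learning rate after a reversal; the reward schedule, the α values, and the function name are hypothetical, and it does not model how volatility is inferred, only why a higher learning rate pays off when contingencies change.

```python
from typing import List


def delta_rule(outcomes: List[int], alpha: float, v0: float = 0.5) -> List[float]:
    """Delta-rule (Rescorla-Wagner) estimate: v <- v + alpha * (r - v) after each outcome."""
    v, trace = v0, []
    for r in outcomes:
        v += alpha * (r - v)  # (r - v) is the prediction error
        trace.append(v)
    return trace


# Hypothetical reversal: the option pays on every trial, then stops paying.
outcomes = [1] * 20 + [0] * 5
slow = delta_rule(outcomes, alpha=0.1)
fast = delta_rule(outcomes, alpha=0.5)
print(round(slow[-1], 3), round(fast[-1], 3))
```

After five unrewarded trials the α = 0.5 estimate has fallen to about 0.03, while the α = 0.1 estimate is still above 0.5; in a stable environment the low rate would instead give the less noisy estimate.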


Volkmann et al., Nature Reviews Neurology, 2010

Bromberg-Martin et al., J Neurophys, 2010

REVERSAL

Wang et al., Nature Neuroscience (2018)


Left rewarded / Right rewarded

Wang et al., Nature Neuroscience (2018)

Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011

[Figure: meta-RL RPE plotted against model-based RPE at Stage 2 and at Reward (axes from -1 to 1); r² = 0.89]

Model-based RL (from model-free RL)

Wang et al., Nature Neuroscience (2018)
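The two-step slides compare the meta-RL network's RPE with a model-based RPE. A minimal Python sketch of a model-based RPE at stage 2 follows. The first-stage value is computed from the transition model as Q_MB(a) = Σ_s' P(s' | a) · max_b Q2(s', b), and the stage-2 RPE is the value of the state actually reached minus Q_MB of the chosen action. The 0.7/0.3 common/rare transition probabilities, the second-stage values, and the function name are assumptions, not taken from the slides.

```python
from typing import List


def model_based_q(transition: List[List[float]],
                  stage2_values: List[List[float]]) -> List[float]:
    """Q_MB(a) = sum over s' of P(s' | a) * max_b Q2(s', b), for each first-stage action a."""
    best = [max(values) for values in stage2_values]
    return [sum(p * v for p, v in zip(row, best)) for row in transition]


T = [[0.7, 0.3],   # assumed: action 0 usually (common transition) leads to state 0
     [0.3, 0.7]]   # assumed: action 1 usually leads to state 1
Q2 = [[0.8, 0.2],  # hypothetical second-stage action values in state 0
      [0.4, 0.1]]  # ... and in state 1
q1 = model_based_q(T, Q2)

a1, reached = 0, 1  # chose action 0, but a rare transition led to state 1
rpe_stage2 = max(Q2[reached]) - q1[a1]  # value of state reached minus value expected
print(round(rpe_stage2, 2))  # -0.28
```

A rare transition into the less valuable state yields a negative stage-2 RPE computed from the transition model; the slides report r² = 0.89 between the network's RPE and a model-based RPE.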

• DA blocked upon food reward from large/risky option

• DA blocked upon food reward from small/certain option

• DA triggered upon food omission from large/risky option

Wang et al., arXiv (2018); Stopper et al., Neuron (2014)

Optogenetic manipulation of dopamine

Mnih et al., Nature (2015)

Current / Future Work

• Richer environments / abstractions (Espeholt et al., arXiv, 2018)

• Architectural biases (e.g., Raposo et al., NIPS, 2017)

• Complementary forms of meta-learning (e.g., Fernando et al., under review)

• Episodic reinstatement (Ritter et al., in press)

Neuroscience and AI: A virtuous circle

Collaborators

Jane Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Chris Summerfield, Hubert Soyer, Joel Leibo, Sam Ritter, Adam Santoro, Tim Lillicrap, David Barrett, Dhruva Tirumala, Remi Munos, Charles Blundell, Demis Hassabis

