
Page 1:

Normative solutions to the speed/accuracy trade-off in perceptual decision-making

Jan Drugowitsch, University of Geneva

FENS-Hertie Winter School 2015: The neuroscience of decision making

Ruben Moreno-Bote, Alex Pouget, Mike Shadlen

Page 2:

Perceptual decision-making?

Page 3:

Normative solutions?

Normative solutions can be parameterised to fit behaviour

Deviations from normative solutions give insight into specific deficits or biases

Normative = how things ought to be

Identify:
- the information on which decisions should be based
- the computations that need to be carried out on that information

Page 4:

Resources

Notes, derivations, and figure code (MATLAB) are at

https://github.com/jdrugo/FENS2015

Page 5:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 6:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 7:

Normative evidence accumulation

Random dot motion task: is the net motion "left" or "right"?
Example stimuli: 51.2% coherence (easy) and 12.8% coherence (hard).

Hidden state: $z \in \{-1, 1\}$ ("left" = $-1$, "right" = $+1$)

Evidence (observation / stimulus): $p(x \mid z) = \mathcal{N}(x \mid z, \sigma_\varepsilon^2)$

Posterior belief (likelihood $\times$ prior):
$p(z \mid x) \propto p(x \mid z)\, p(z) \propto \frac{1}{1 + e^{-2 x z / \sigma_\varepsilon^2}}$

[Figure: likelihoods $p(x \mid z = 1)$ and $p(x \mid z = -1)$ and posterior $p(z = 1 \mid x)$ as functions of $x$; code: 01evacc/single_obs.m]
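For concreteness, here is a minimal MATLAB sketch of this posterior computation (my own toy version, not the repository's 01evacc/single_obs.m; it assumes $\sigma_\varepsilon = 1$ and a flat prior):

```matlab
% Toy sketch: likelihoods and posterior for a single observation x,
% assuming sig_e = 1 and a flat prior p(z = 1) = 1/2.
sig_e = 1;
x = linspace(-4, 4, 201);
like_p = exp(-(x - 1).^2 / (2 * sig_e^2)) / sqrt(2 * pi * sig_e^2);  % p(x | z =  1)
like_m = exp(-(x + 1).^2 / (2 * sig_e^2)) / sqrt(2 * pi * sig_e^2);  % p(x | z = -1)
post = 1 ./ (1 + exp(-2 * x / sig_e^2));  % p(z = 1 | x) = like_p ./ (like_p + like_m)
plot(x, like_p, x, like_m, x, post);
legend('p(x|z=1)', 'p(x|z=-1)', 'p(z=1|x)'); xlabel('x');
```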

Page 8:

Evidence accumulation with multiple observations

Hidden state: $z \in \{-1, 1\}$ ("left" / "right")

$n$th evidence (observation / stimulus): $p(x_n \mid z) = \mathcal{N}(x_n \mid z, \sigma_\varepsilon^2)$

Posterior belief after $N$ observations (prior $\times$ i.i.d. likelihoods):
$p(z \mid x_{1:N}) \propto p(z) \prod_{n=1}^{N} p(x_n \mid z) \propto \frac{1}{1 + e^{-2 X_N z / \sigma_\varepsilon^2}}$
with sufficient statistic $X_N = \sum_{n=1}^{N} x_n$

[Figure: observations $x_n$, accumulated evidence $X_n$, and posterior $p(z = 1 \mid x_{1:n})$ over observations $n = 1, \dots, 10$; code: 01evacc/discrete_obs.m]
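A similar toy sketch for discrete accumulation (again my own, not 01evacc/discrete_obs.m; it assumes $z = 1$ and $\sigma_\varepsilon = 1$):

```matlab
% Toy sketch: accumulate N = 10 noisy observations and track the posterior.
sig_e = 1; z = 1; N = 10;
x = z + sig_e * randn(N, 1);              % x_n ~ N(z, sig_e^2)
X = cumsum(x);                            % sufficient statistic X_n
post = 1 ./ (1 + exp(-2 * X / sig_e^2));  % p(z = 1 | x_{1:n})
subplot(3,1,1); stem(x);    ylabel('x_n');
subplot(3,1,2); plot(X);    ylabel('X_n');
subplot(3,1,3); plot(post); ylabel('p(z=1|x_{1:n})'); xlabel('observation n');
```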

Page 9:

Evidence accumulation with stream of evidence

Discretise the stream into small chunks of duration $\delta t$.

Likelihood of the momentary evidence $\delta x_n$: $p(\delta x_n \mid z) = \mathcal{N}(\delta x_n \mid z\,\delta t, \sigma_\varepsilon^2\,\delta t)$

The information in $\delta x_n$ about $z$ goes to 0 as $\delta t \to 0$.

[Figure: momentary evidence $dx_n / dt$, accumulated evidence $X(t)$, and posterior $p(z = 1 \mid x_{0:t})$ over time $t$; code: 01evacc/continuous_obs.m]

Posterior upon observing $\delta x_1, \delta x_2, \dots$ for $t$ seconds (taking $\delta t \to 0$), with prior $\times$ i.i.d. likelihoods:
$p(z \mid \delta x_{0:t}) \propto p(z) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid z) \propto \frac{1}{1 + e^{-2 X(t) z / \sigma_\varepsilon^2}}$
with sufficient statistic $X(t) = \int_0^t \delta x(s)$

Since $X(t) \sim \mathcal{N}(z t, \sigma_\varepsilon^2 t)$, evidence accumulation follows a diffusion model.
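The same idea in continuous time, as a toy simulation of the resulting diffusion (assumed values $z = 1$, $\sigma_\varepsilon = 1$, $\delta t = 0.001$; not the repository's continuous_obs.m):

```matlab
% Toy sketch: a stream of momentary evidence simulated in small steps dt.
sig_e = 1; z = 1; dt = 0.001; T = 2;
t = (dt:dt:T)';
dx = z * dt + sig_e * sqrt(dt) * randn(size(t));  % dx_n ~ N(z dt, sig_e^2 dt)
X = cumsum(dx);                                   % X(t) ~ N(z t, sig_e^2 t): a diffusion
post = 1 ./ (1 + exp(-2 * X / sig_e^2));          % p(z = 1 | dx_{0:t})
subplot(2,1,1); plot(t, X);    ylabel('X(t)');
subplot(2,1,2); plot(t, post); ylabel('p(z=1|dx_{0:t})'); xlabel('time t');
```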

Page 10:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 11:

Maximising 0-1 reward

To get from the posterior to choices we need a loss/reward function: reward(choice a, hidden state z).

Perform the choice that maximises the expected reward.

0-1 reward (as a function of choice $a$ and hidden state $z$):

                z = -1   z = 1
  a = "left"      1        0
  a = "right"     0        1

Choose "right" if $p(z = 1 \mid X(t)) \geq \tfrac{1}{2}$ and "left" if $p(z = 1 \mid X(t)) < \tfrac{1}{2}$: always choose the more likely correct option.

[Figure: belief $p(z = 1 \mid x_{0:t})$ and expected reward over time $t$; code: 02lossfn/loss_01.m]

How long should we accumulate? Forever?

Page 12:

A cost for accumulating evidence

Cost $c$ for accumulating evidence: internal (e.g., effort / attention) or external (e.g., lost future opportunities)

[Figure: belief and expected reward over time $t$ for accumulation costs $c = 0.0$, $0.1$, and $0.2$; code: 02lossfn/loss_cost.m]

After some time, the marginal increase in choice accuracy no longer justifies the associated cost.

Expected reward: $ER(PC, RT) = PC - c\,RT$, where $PC$ is the probability of a correct choice and $RT$ the reaction time.

This is the reward function that we will use for the rest of the tutorial.

Page 13:

Maximising reward rate rather than expected reward

Loss of future reward is explicit in the reward rate:
$RR(PC, RT) = \frac{PC - c\,RT}{RT + t_i + (1 - PC)\,t_p}$
with inter-trial interval $t_i$ and penalty time $t_p$ (for wrong choices).

[Figure: belief and expected reward rate over time $t$ for inter-trial intervals $t_i = 0.0$, $2.0$, and $4.0$; code: 02lossfn/loss_rr.m]

Even $c = 0$ leads to early choices, as a large $RT$ reduces the reward rate through the denominator.

We will come back to this as a possible extension.

Page 14:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 15:

Dynamic Programming in a nutshell

We find the optimal speed/accuracy trade-off by Dynamic Programming

Ingredients (Markov Decision Process):
- a set of states, $s \in S$
- a set of actions, $a \in A$
- state transition probabilities, $p(s' \mid s, a)$
- a reward function, $r(s, a)$
- (a discount factor, $\gamma = 1$)

[Figure: an example MDP with states $s_1, \dots, s_6$, actions $a \in \{1, 2\}$, and rewards $r(s_1, 1) = r(s_2, \{1, 2\}) = r(s_3, 1) = r(s_4, 1) = -1$, $r(s_5, 1) = 10$, $r(s_6, 1) = 0$]

Aim: find the policy $\pi : S \to A$ that maximises the total reward
$V^{\pi}(s) = \mathbb{E}_{p(s_1, s_2, \dots \mid s_1 = s,\, \pi)}\left[ \sum_{n} r\left(s_n, \pi(s_n)\right) \right]$
from each state $s$.

from each state s.

This can be found recursively by solving Bellman's equation:
$V(s) = \max_a \left[ r(s, a) + \sum_{s'} V(s')\, p(s' \mid s, a) \right]$
The optimal action (i.e., the optimal policy) is the action that maximises the right-hand side.

[Figure annotation: the resulting values are $V(s_1) = 8$, $V(s_2) = 9$, $V(s_3) = 8$, $V(s_4) = 9$, $V(s_5) = 10$, $V(s_6) = 0$]
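To make the recursion concrete, here is a small value-iteration sketch; the transition structure is my own illustrative choice (the slide's graph did not survive extraction cleanly), but the rewards and resulting values match those shown:

```matlab
% Value iteration on a small deterministic MDP (gamma = 1).
% next(s,a): successor state; r(s,a): immediate reward.
next = [2 3; 5 5; 4 4; 5 5; 6 6; 6 6];            % illustrative transitions
r    = [-1 -1; -1 -1; -1 -1; -1 -1; 10 10; 0 0];  % rewards from the slide
V = zeros(6, 1);
for iter = 1:100
    Vnew = V;
    for s = 1:6
        Vnew(s) = max(r(s, :)' + V(next(s, :)));  % Bellman backup over actions
    end
    if max(abs(Vnew - V)) < 1e-9, break; end
    V = Vnew;
end
disp(V')   % 8  9  8  9  10  0, as on the slide
```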

Page 16:

States and actions in perceptual decision-making

States: all observations so far → sufficient statistic $X \in (-\infty, \infty)$. To obtain a bounded state space, use the belief instead, $g \in [0, 1]$:
$g(X) \equiv p(z = 1 \mid X) = \frac{1}{1 + e^{-2 X / \sigma_\varepsilon^2}}$

[Figure: the likelihood/posterior plot from before, illustrating the mapping from $X \in (-\infty, \infty)$ to $g \in [0, 1]$; code: 01evacc/single_obs.m]

Actions (each valued as $r(s, a) + \sum_{g'} V(g')\, p(g' \mid g, a)$):
- choose "right" ($z = 1$): value $g + 0$
- choose "left" ($z = -1$): value $(1 - g) + 0$
- accumulate for another $\delta t$: value $-c\,\delta t + \langle V(g') \rangle_{p(g' \mid g)}$

Bellman's equation for perceptual decision-making:
$V(g) = \max\left[\; g,\; 1 - g,\; \langle V(g') \rangle_{p(g' \mid g)} - c\,\delta t \;\right]$

[Figure: stopping value $\max(g, 1 - g)$ and accumulation value $\langle V \rangle - c\,\delta t$ against belief $g$; code: 03knownreliab/plot_valueintersect.m]

Optimal decision-making is therefore implemented by boundaries on the belief $g$, located where the value of accumulating intersects the value of stopping.

Page 17:

An example decision

[Figure: an example decision: the belief $g$ diffuses over time $t$ until it hits one of the bounds; code: 03knownreliab/plot_diffusion_example.m]

Page 18:

Finding the optimal bounds numerically

Find the belief transition probability $p(g' \mid g)$ for accumulating $\delta t$ more evidence.

Discretise the belief and value function into $k = 1, \dots, K$ bins and solve numerically by value iteration:
$V_{k,n} = \max\left[\; g_k,\; 1 - g_k,\; \sum_{j=1}^{K} p(g_j \mid g_k)\, V_{j, n-1} - c\,\delta t \;\right]$

[Figure: belief transition probabilities $p(g' \mid g)$ for decreasing $\delta t$ ($0.1$, $0.01$, $0.001$ at $\sigma_\varepsilon^2 = 1$) and for increasing task difficulty $\sigma_\varepsilon^2$ ($0.25$, $1$, $25$ at $\delta t = 0.01$); code: 03knownreliab/plot_belieftrans.m]
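A compact sketch of this procedure; the grid, the construction of $p(g' \mid g)$, and the parameter values ($\sigma_\varepsilon^2 = 1$, $\delta t = 0.01$, $c = 0.1$) are my own choices, so the repository's 03knownreliab code will differ in detail:

```matlab
% Sketch: optimal belief bound by value iteration over a discretised belief.
K = 201; c = 0.1; dt = 0.01; sig2 = 1;
g = linspace(0.001, 0.999, K)';          % belief grid
X = (sig2 / 2) * log(g ./ (1 - g));      % matching sufficient statistic X(g)
w = gradient(X);                         % bin widths on the (non-uniform) X grid
% Belief transitions: X' = X + dx, with dx a mixture of N(+dt, sig2*dt)
% (prob. g, i.e. z = 1) and N(-dt, sig2*dt) (prob. 1-g, i.e. z = -1).
P = zeros(K, K);
for k = 1:K
    d = X - X(k);
    dens = g(k) * exp(-(d - dt).^2 / (2 * sig2 * dt)) ...
         + (1 - g(k)) * exp(-(d + dt).^2 / (2 * sig2 * dt));
    P(k, :) = (dens .* w)' / sum(dens .* w);   % normalised over the grid
end
% Value iteration: V(g) = max[g, 1-g, <V(g')> - c dt]
V = max(g, 1 - g);
for n = 1:100000
    Vnew = max(max(g, 1 - g), P * V - c * dt);
    if max(abs(Vnew - V)) < 1e-10, break; end
    V = Vnew;
end
bound = g(find(P * V - c * dt >= max(g, 1 - g), 1, 'last'))  % upper belief bound
```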

Page 19:

Optimal boundary as a function of accumulation cost and task difficulty

[Figure: optimal bound in belief $g$ as a function of the evidence accumulation cost $c$ (03knownreliab/plot_bound_with_cost.m) and of the task difficulty $\sigma_\varepsilon$, from easy to hard (03knownreliab/plot_bound_with_sig2.m), each comparing the direct solution with DP]

An alternative to DP for finding the optimal boundaries: maximise
$ER(PC, RT) = PC - c\,RT$
directly, using the known expressions for diffusion models with bound $\pm\theta$ on $X$:
$PC = \frac{1}{1 + e^{-2\theta/\sigma_\varepsilon^2}}, \qquad RT = \theta \tanh\!\left(\frac{\theta}{\sigma_\varepsilon^2}\right)$
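In code, this direct approach is a one-dimensional optimisation (a sketch with assumed example values $\sigma_\varepsilon^2 = 1$, $c = 0.1$):

```matlab
% Sketch: optimal bound theta on X by maximising ER = PC - c*RT directly.
sig2 = 1; c = 0.1;
PC = @(th) 1 ./ (1 + exp(-2 * th / sig2));   % p(correct) for bound theta
RT = @(th) th .* tanh(th / sig2);            % mean decision time
th_opt = fminbnd(@(th) -(PC(th) - c * RT(th)), 0, 10);
g_bound = PC(th_opt);                        % corresponding bound in belief g
```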

Page 20:

Optimal decision-making with diffusion models

Boundaries on the belief $g$ correspond to boundaries on $X$ (a diffusion model).

[Figure: the likelihood/posterior plot from before, illustrating the mapping between $X \in (-\infty, \infty)$ and $g \in [0, 1]$; code: 01evacc/single_obs.m]

[Figure: the example decision as a diffusion of the sufficient statistic $X$ between constant bounds, and as the corresponding belief $g$, over time $t$; code: 03knownreliab/plot_diffusion_example.m]

We can thus perform optimal decision-making without ever computing the belief $g$.

Page 21:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 22:

Evidence accumulation with reliability changing across trials

[Figure: likelihoods and posterior for a single observation; the per-trial evidence mean defines the trial difficulty for fixed $\sigma_\varepsilon = 1$; code: 01evacc/single_obs.m]

Hidden state: $z \in \{-1, 1\}$
Trial difficulty: $\alpha \in [0, \infty)$ (small = hard)

The per-trial evidence mean $\mu = \alpha z$ is drawn for each trial from $p(\mu) = \mathcal{N}(\mu \mid 0, \sigma_\alpha^2)$, where $\sigma_\alpha$ sets the overall task difficulty (small = hard); hard trials are more likely than easy ones.

With this evidence, evidence accumulation proceeds by momentary evidence $\delta x_n \sim \mathcal{N}(\mu\,\delta t, \delta t)$.

The posterior over $\mu$ (prior $\times$ i.i.d. likelihoods) is Gaussian:
$p(\mu \mid \delta x_{0:t}) \propto p(\mu) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid \mu) \propto \mathcal{N}\!\left(\mu \;\Big|\; \frac{X(t)}{\sigma_\alpha^{-2} + t}, \frac{1}{\sigma_\alpha^{-2} + t}\right)$

$g(X, t) \equiv p(z = 1 \mid \delta x_{0:t}) = p\left(\mu \geq 0 \mid X(t), t\right) = \Phi\!\left(\frac{X(t)}{\sqrt{\sigma_\alpha^{-2} + t}}\right)$

such that the sufficient statistics are now both $X(t)$ and $t$.
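A toy simulation of a single trial under this model (assumes $\sigma_\alpha = 1$; uses erfc for the normal CDF to stay within base MATLAB):

```matlab
% Sketch: belief with unknown evidence reliability. The drift mu = alpha*z is
% drawn per trial, and the belief is the posterior mass of mu above zero.
sig_a = 1; dt = 0.001; T = 2;
t = (dt:dt:T)';
mu = sig_a * randn;                               % mu ~ N(0, sig_a^2)
dx = mu * dt + sqrt(dt) * randn(size(t));         % dx_n ~ N(mu dt, dt), sig_e = 1
X = cumsum(dx);
g = 0.5 * erfc(-X ./ sqrt(1 / sig_a^2 + t) / sqrt(2));  % Phi(X / sqrt(sig_a^-2 + t))
plot(t, g); xlabel('time t'); ylabel('belief g(X,t)');
```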

Page 23:

The resulting optimal policy

Bellman's equation over belief $g$ and time $t$:
$V(g, t) = \max\left[\; g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c\,\delta t \;\right]$

[Figure: stopping value $\max(g, 1-g)$ and accumulation values $\langle V \rangle - c\,\delta t$ at $t = 0.00$, $0.99$, and $1.98$, with the resulting bounds on belief $g$ over time $t$; code: 04knownreliab/plot_valuefn.m]

The future expected value takes the time-dependence into account.

For each fixed $t$ there are boundaries in belief $g$ → time-dependent boundaries.

Page 24:

Optimal decision-making with diffusion models

Using the time-dependent mapping between belief $g$ and the diffusing particle $X$,
$g(X, t) = \Phi\!\left(\frac{X}{\sqrt{\sigma_\alpha^{-2} + t}}\right), \qquad X(g, t) = \sqrt{\sigma_\alpha^{-2} + t}\;\Phi^{-1}(g),$
we can map a bound in belief $g$ to a bound on the diffusing particle $X$.

[Figure: an example decision shown as the sufficient statistic $X$ with its mapped, time-dependent bounds, and as the belief $g$ with its bounds, over time $t$; code: 04knownreliab/plot_diffusion_example.m]

As before, we get optimal decision-making without ever directly computing the belief. This naturally leads to errors being slower than correct choices.
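To illustrate the mapping, here is a sketch that converts a belief bound into the corresponding time-dependent bound on $X$ (the constant belief bound $g_b = 0.8$ and $\sigma_\alpha = 1$ are illustrative; the optimal belief bound from DP is itself time-dependent):

```matlab
% Sketch: mapping a belief bound g_b into a bound on the particle X via
% X(g,t) = sqrt(sig_a^-2 + t) * Phi^{-1}(g). Uses erfcinv for Phi^{-1}.
sig_a = 1; g_b = 0.8;
t = linspace(0, 2, 200);
Phi_inv = -sqrt(2) * erfcinv(2 * g_b);   % Phi^{-1}(g_b)
X_b = sqrt(1 / sig_a^2 + t) * Phi_inv;   % upper bound on X over time
plot(t, X_b, t, -X_b); xlabel('time t'); ylabel('bound on X');
```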

Page 25:

Urgency signal

An "urgency signal" as a collapsing bound?

Page 26:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 27:

Maximising reward rate instead of expected reward

Maximising the reward rate = maximising reward over an infinite sequence of structurally equal trials.

Problem: the value of the initial choice will be infinite (infinitely many, possibly rewarded, choices follow).

Solution: use the average-adjusted value, discounting each time step $\delta t$ by $-\rho\,\delta t$, where $\rho$ is the reward rate (reward per unit time). This value is relative to the reward that can be achieved on average.

For example, Bellman's equation for known evidence reliability becomes
$V(g) = \max\left[\; g - \left(t_i + (1 - g)t_p\right)\rho + V\!\left(\tfrac{1}{2}\right),\;\; 1 - g - \left(t_i + g\,t_p\right)\rho + V\!\left(\tfrac{1}{2}\right),\;\; \langle V(g') \rangle_{p(g' \mid g)} - (\rho + c)\,\delta t \;\right]$
where $t_i + (1 - g)t_p$ is the expected time from the decision to the onset of the next trial.

The policy is invariant to shifts in the value function, such that we can choose $V\!\left(\tfrac{1}{2}\right) = 0$. We then find both the value function and the reward rate $\rho$ using the consistency criterion $V\!\left(\tfrac{1}{2}\right) = 0$.
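One way to solve this numerically, sketched below: reuse the belief grid g and transition matrix P from the value-iteration sketch above, and bisect on the reward rate $\rho$ until the consistency criterion $V(1/2) = 0$ holds (the task parameters are illustrative):

```matlab
% Sketch: find the reward rate rho such that V(1/2) = 0 (with V(1/2) = 0
% already substituted on the right-hand side of Bellman's equation).
ti = 2; tp = 0; c = 0.1; dt = 0.01;              % illustrative task parameters
[~, i0] = min(abs(g - 0.5));                     % grid index of g = 1/2
lo = 0; hi = 1;                                  % bracket for rho
for iter = 1:50
    rho = (lo + hi) / 2;
    V = max(g, 1 - g);
    for n = 1:100000                             % value iteration at this rho
        Vr = g - (ti + (1 - g) * tp) * rho;      % choose "right"
        Vl = 1 - g - (ti + g * tp) * rho;        % choose "left"
        Va = P * V - (rho + c) * dt;             % accumulate
        Vnew = max(max(Vr, Vl), Va);
        if max(abs(Vnew - V)) < 1e-10, break; end
        V = Vnew;
    end
    if V(i0) > 0, lo = rho; else hi = rho; end   % V(1/2) decreases with rho
end
rho   % approximately the achievable reward rate
```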

Page 28:

Generalisations of the evidence model

A time-varying accumulation cost

The accumulation cost might rise or drop over time:
$V(g, t) = \max\left[\; g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c(t)\,\delta t \;\right]$
This is supported by human/animal behaviour (Drugowitsch et al., 2012).

Evidence reliability that varies within individual trials

In real-world decisions, the evidence reliability is practically never constant.

This requires simultaneous estimation of the hidden state and the momentary evidence reliability, so the sufficient statistics are at least two-dimensional.

It leads to a reliability-dependent bound on the decision-maker's belief; see Drugowitsch, Moreno-Bote & Pouget (NIPS, 2014). That paper also introduces a faster way to find the expected future reward, using PDE solvers.

Page 29:

Summary

Notes, derivations, and figure code (MATLAB) are at

https://github.com/jdrugo/FENS2015

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions
