
Page 1:

Normative solutions to the speed/accuracy trade-off in perceptual decision-making

Jan Drugowitsch, University of Geneva

FENS-Hertie Winter School 2015: The neuroscience of decision making

Ruben Moreno-Bote, Alex Pouget, Mike Shadlen

Page 2:

Perceptual decision-making?

Page 3:

Normative solutions?

Normative solutions can be parameterised to fit behaviour

Deviations from normative solutions give insight into specific deficits or biases

Normative = how things ought to be

Identify:
- the information on which decisions should be based
- the computations that need to be carried out on that information

Page 4:

Resources

Notes, derivations, and figure code (MATLAB) are at

https://github.com/jdrugo/FENS2015

Page 5:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 6:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 7:

Normative evidence accumulation

Random dot motion task: is the net motion "left" or "right"?
Example stimuli: 51.2% coherence (easy) and 12.8% coherence (hard).

Hidden state: $z \in \{-1, 1\}$ ("left" = $-1$, "right" = $+1$)

Evidence (observation / stimulus): $p(x \mid z) = \mathcal{N}(x \mid z, \sigma_\varepsilon^2)$

Posterior belief (likelihood $\times$ prior):
$p(z \mid x) \propto p(x \mid z)\, p(z) \propto \frac{1}{1 + e^{-2 x z / \sigma_\varepsilon^2}}$

[Figure: likelihoods $p(x \mid z = 1)$ and $p(x \mid z = -1)$ and posterior $p(z = 1 \mid x)$ as functions of $x$; code: 01evacc/single_obs.m]
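For concreteness, here is a minimal MATLAB sketch of this posterior computation (my own toy version, not the repository's 01evacc/single_obs.m; it assumes $\sigma_\varepsilon = 1$ and a flat prior):

```matlab
% Toy sketch: likelihoods and posterior for a single observation x,
% assuming sig_e = 1 and a flat prior p(z = 1) = 1/2.
sig_e = 1;
x = linspace(-4, 4, 201);
like_p = exp(-(x - 1).^2 / (2 * sig_e^2)) / sqrt(2 * pi * sig_e^2);  % p(x | z =  1)
like_m = exp(-(x + 1).^2 / (2 * sig_e^2)) / sqrt(2 * pi * sig_e^2);  % p(x | z = -1)
post = 1 ./ (1 + exp(-2 * x / sig_e^2));  % p(z = 1 | x) = like_p ./ (like_p + like_m)
plot(x, like_p, x, like_m, x, post);
legend('p(x|z=1)', 'p(x|z=-1)', 'p(z=1|x)'); xlabel('x');
```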

Page 8:

Evidence accumulation with multiple observations

Hidden state: $z \in \{-1, 1\}$ ("left" / "right")

$n$th evidence (observation / stimulus): $p(x_n \mid z) = \mathcal{N}(x_n \mid z, \sigma_\varepsilon^2)$

Posterior belief after $N$ observations (prior $\times$ i.i.d. likelihoods):
$p(z \mid x_{1:N}) \propto p(z) \prod_{n=1}^{N} p(x_n \mid z) \propto \frac{1}{1 + e^{-2 X_N z / \sigma_\varepsilon^2}}$
with sufficient statistic $X_N = \sum_{n=1}^{N} x_n$

[Figure: observations $x_n$, accumulated evidence $X_n$, and posterior $p(z = 1 \mid x_{1:n})$ over observations $n = 1, \dots, 10$; code: 01evacc/discrete_obs.m]
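A similar toy sketch for discrete accumulation (again my own, not 01evacc/discrete_obs.m; it assumes $z = 1$ and $\sigma_\varepsilon = 1$):

```matlab
% Toy sketch: accumulate N = 10 noisy observations and track the posterior.
sig_e = 1; z = 1; N = 10;
x = z + sig_e * randn(N, 1);              % x_n ~ N(z, sig_e^2)
X = cumsum(x);                            % sufficient statistic X_n
post = 1 ./ (1 + exp(-2 * X / sig_e^2));  % p(z = 1 | x_{1:n})
subplot(3,1,1); stem(x);    ylabel('x_n');
subplot(3,1,2); plot(X);    ylabel('X_n');
subplot(3,1,3); plot(post); ylabel('p(z=1|x_{1:n})'); xlabel('observation n');
```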

Page 9:

Evidence accumulation with stream of evidence

Discretise the stream into small chunks of duration $\delta t$.

Likelihood of the momentary evidence $\delta x_n$: $p(\delta x_n \mid z) = \mathcal{N}(\delta x_n \mid z\,\delta t, \sigma_\varepsilon^2\,\delta t)$

The information in $\delta x_n$ about $z$ goes to 0 as $\delta t \to 0$.

[Figure: momentary evidence $dx_n / dt$, accumulated evidence $X(t)$, and posterior $p(z = 1 \mid x_{0:t})$ over time $t$; code: 01evacc/continuous_obs.m]

Posterior upon observing $\delta x_1, \delta x_2, \dots$ for $t$ seconds (taking $\delta t \to 0$), with prior $\times$ i.i.d. likelihoods:
$p(z \mid \delta x_{0:t}) \propto p(z) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid z) \propto \frac{1}{1 + e^{-2 X(t) z / \sigma_\varepsilon^2}}$
with sufficient statistic $X(t) = \int_0^t \delta x(s)$

Since $X(t) \sim \mathcal{N}(z t, \sigma_\varepsilon^2 t)$, evidence accumulation follows a diffusion model.
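The same idea in continuous time, as a toy simulation of the resulting diffusion (assumed values $z = 1$, $\sigma_\varepsilon = 1$, $\delta t = 0.001$; not the repository's continuous_obs.m):

```matlab
% Toy sketch: a stream of momentary evidence simulated in small steps dt.
sig_e = 1; z = 1; dt = 0.001; T = 2;
t = (dt:dt:T)';
dx = z * dt + sig_e * sqrt(dt) * randn(size(t));  % dx_n ~ N(z dt, sig_e^2 dt)
X = cumsum(dx);                                   % X(t) ~ N(z t, sig_e^2 t): a diffusion
post = 1 ./ (1 + exp(-2 * X / sig_e^2));          % p(z = 1 | dx_{0:t})
subplot(2,1,1); plot(t, X);    ylabel('X(t)');
subplot(2,1,2); plot(t, post); ylabel('p(z=1|dx_{0:t})'); xlabel('time t');
```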

Page 10:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 11:

Maximising 0-1 reward

To get from the posterior to choices we need a loss/reward function: reward(choice a, hidden state z).

Perform the choice that maximises the expected reward.

0-1 reward (as a function of choice $a$ and hidden state $z$):

                z = -1   z = 1
  a = "left"      1        0
  a = "right"     0        1

Choose "right" if $p(z = 1 \mid X(t)) \geq \tfrac{1}{2}$ and "left" if $p(z = 1 \mid X(t)) < \tfrac{1}{2}$: always choose the more likely correct option.

[Figure: belief $p(z = 1 \mid x_{0:t})$ and expected reward over time $t$; code: 02lossfn/loss_01.m]

How long should we accumulate? Forever?

Page 12:

A cost for accumulating evidence

Cost $c$ for accumulating evidence: internal (e.g., effort / attention) or external (e.g., lost future opportunities)

[Figure: belief and expected reward over time $t$ for accumulation costs $c = 0.0$, $0.1$, and $0.2$; code: 02lossfn/loss_cost.m]

After some time, the marginal increase in choice accuracy no longer justifies the associated cost.

Expected reward: $ER(PC, RT) = PC - c\,RT$, where $PC$ is the probability of a correct choice and $RT$ the reaction time.

This is the reward function that we will use for the rest of the tutorial.

Page 13:

Maximising reward rate rather than expected reward

Loss of future reward is explicit in the reward rate:
$RR(PC, RT) = \frac{PC - c\,RT}{RT + t_i + (1 - PC)\,t_p}$
with inter-trial interval $t_i$ and penalty time $t_p$ (for wrong choices).

[Figure: belief and expected reward rate over time $t$ for inter-trial intervals $t_i = 0.0$, $2.0$, and $4.0$; code: 02lossfn/loss_rr.m]

Even $c = 0$ leads to early choices, as a large $RT$ reduces the reward rate through the denominator.

We will come back to this as a possible extension.

Page 14:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 15:

Dynamic Programming in a nutshell

We find the optimal speed/accuracy trade-off by Dynamic Programming

Ingredients (Markov Decision Process):
- a set of states, $s \in S$
- a set of actions, $a \in A$
- state transition probabilities, $p(s' \mid s, a)$
- a reward function, $r(s, a)$
- (a discount factor, $\gamma = 1$)

[Figure: an example MDP with states $s_1, \dots, s_6$, actions $a \in \{1, 2\}$, and rewards $r(s_1, 1) = r(s_2, \{1, 2\}) = r(s_3, 1) = r(s_4, 1) = -1$, $r(s_5, 1) = 10$, $r(s_6, 1) = 0$]

Aim: find the policy $\pi : S \to A$ that maximises the total reward
$V^{\pi}(s) = \mathbb{E}_{p(s_1, s_2, \dots \mid s_1 = s,\, \pi)}\left[ \sum_{n} r\left(s_n, \pi(s_n)\right) \right]$
from each state $s$.

from each state s.

This can be found recursively by solving Bellman's equation:
$V(s) = \max_a \left[ r(s, a) + \sum_{s'} V(s')\, p(s' \mid s, a) \right]$
The optimal action (i.e., the optimal policy) is the action that maximises the right-hand side.

[Figure annotation: the resulting values are $V(s_1) = 8$, $V(s_2) = 9$, $V(s_3) = 8$, $V(s_4) = 9$, $V(s_5) = 10$, $V(s_6) = 0$]
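To make the recursion concrete, here is a small value-iteration sketch; the transition structure is my own illustrative choice (the slide's graph did not survive extraction cleanly), but the rewards and resulting values match those shown:

```matlab
% Value iteration on a small deterministic MDP (gamma = 1).
% next(s,a): successor state; r(s,a): immediate reward.
next = [2 3; 5 5; 4 4; 5 5; 6 6; 6 6];            % illustrative transitions
r    = [-1 -1; -1 -1; -1 -1; -1 -1; 10 10; 0 0];  % rewards from the slide
V = zeros(6, 1);
for iter = 1:100
    Vnew = V;
    for s = 1:6
        Vnew(s) = max(r(s, :)' + V(next(s, :)));  % Bellman backup over actions
    end
    if max(abs(Vnew - V)) < 1e-9, break; end
    V = Vnew;
end
disp(V')   % 8  9  8  9  10  0, as on the slide
```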

Page 16:

States and actions in perceptual decision-making

States: all observations so far → sufficient statistic $X \in (-\infty, \infty)$. To obtain a bounded state space, use the belief instead, $g \in [0, 1]$:
$g(X) \equiv p(z = 1 \mid X) = \frac{1}{1 + e^{-2 X / \sigma_\varepsilon^2}}$

[Figure: the likelihood/posterior plot from before, illustrating the mapping from $X \in (-\infty, \infty)$ to $g \in [0, 1]$; code: 01evacc/single_obs.m]

Actions (each valued as $r(s, a) + \sum_{g'} V(g')\, p(g' \mid g, a)$):
- choose "right" ($z = 1$): value $g + 0$
- choose "left" ($z = -1$): value $(1 - g) + 0$
- accumulate for another $\delta t$: value $-c\,\delta t + \langle V(g') \rangle_{p(g' \mid g)}$

Bellman's equation for perceptual decision-making:
$V(g) = \max\left[\; g,\; 1 - g,\; \langle V(g') \rangle_{p(g' \mid g)} - c\,\delta t \;\right]$

[Figure: stopping value $\max(g, 1 - g)$ and accumulation value $\langle V \rangle - c\,\delta t$ against belief $g$; code: 03knownreliab/plot_valueintersect.m]

Optimal decision-making is therefore implemented by boundaries on the belief $g$, located where the value of accumulating intersects the value of stopping.

Page 17:

An example decision

[Figure: an example decision: the belief $g$ diffuses over time $t$ until it hits one of the bounds; code: 03knownreliab/plot_diffusion_example.m]

Page 18:

Finding the optimal bounds numerically

Find the belief transition probability $p(g' \mid g)$ for accumulating $\delta t$ more evidence.

Discretise the belief and value function into $k = 1, \dots, K$ bins and solve numerically by value iteration:
$V_{k,n} = \max\left[\; g_k,\; 1 - g_k,\; \sum_{j=1}^{K} p(g_j \mid g_k)\, V_{j, n-1} - c\,\delta t \;\right]$

[Figure: belief transition probabilities $p(g' \mid g)$ for decreasing $\delta t$ ($0.1$, $0.01$, $0.001$ at $\sigma_\varepsilon^2 = 1$) and for increasing task difficulty $\sigma_\varepsilon^2$ ($0.25$, $1$, $25$ at $\delta t = 0.01$); code: 03knownreliab/plot_belieftrans.m]
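A compact sketch of this procedure; the grid, the construction of $p(g' \mid g)$, and the parameter values ($\sigma_\varepsilon^2 = 1$, $\delta t = 0.01$, $c = 0.1$) are my own choices, so the repository's 03knownreliab code will differ in detail:

```matlab
% Sketch: optimal belief bound by value iteration over a discretised belief.
K = 201; c = 0.1; dt = 0.01; sig2 = 1;
g = linspace(0.001, 0.999, K)';          % belief grid
X = (sig2 / 2) * log(g ./ (1 - g));      % matching sufficient statistic X(g)
w = gradient(X);                         % bin widths on the (non-uniform) X grid
% Belief transitions: X' = X + dx, with dx a mixture of N(+dt, sig2*dt)
% (prob. g, i.e. z = 1) and N(-dt, sig2*dt) (prob. 1-g, i.e. z = -1).
P = zeros(K, K);
for k = 1:K
    d = X - X(k);
    dens = g(k) * exp(-(d - dt).^2 / (2 * sig2 * dt)) ...
         + (1 - g(k)) * exp(-(d + dt).^2 / (2 * sig2 * dt));
    P(k, :) = (dens .* w)' / sum(dens .* w);   % normalised over the grid
end
% Value iteration: V(g) = max[g, 1-g, <V(g')> - c dt]
V = max(g, 1 - g);
for n = 1:100000
    Vnew = max(max(g, 1 - g), P * V - c * dt);
    if max(abs(Vnew - V)) < 1e-10, break; end
    V = Vnew;
end
bound = g(find(P * V - c * dt >= max(g, 1 - g), 1, 'last'))  % upper belief bound
```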

Page 19:

Optimal boundary as a function of accumulation cost and task difficulty

[Figure: optimal bound in belief $g$ as a function of the evidence accumulation cost $c$ (03knownreliab/plot_bound_with_cost.m) and of the task difficulty $\sigma_\varepsilon$, from easy to hard (03knownreliab/plot_bound_with_sig2.m), each comparing the direct solution with DP]

An alternative to DP for finding the optimal boundaries: maximise
$ER(PC, RT) = PC - c\,RT$
directly, using the known expressions for diffusion models with bound $\pm\theta$ on $X$:
$PC = \frac{1}{1 + e^{-2\theta/\sigma_\varepsilon^2}}, \qquad RT = \theta \tanh\!\left(\frac{\theta}{\sigma_\varepsilon^2}\right)$
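In code, this direct approach is a one-dimensional optimisation (a sketch with assumed example values $\sigma_\varepsilon^2 = 1$, $c = 0.1$):

```matlab
% Sketch: optimal bound theta on X by maximising ER = PC - c*RT directly.
sig2 = 1; c = 0.1;
PC = @(th) 1 ./ (1 + exp(-2 * th / sig2));   % p(correct) for bound theta
RT = @(th) th .* tanh(th / sig2);            % mean decision time
th_opt = fminbnd(@(th) -(PC(th) - c * RT(th)), 0, 10);
g_bound = PC(th_opt);                        % corresponding bound in belief g
```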

Page 20:

Optimal decision-making with diffusion models

Boundaries on the belief $g$ correspond to boundaries on $X$ (a diffusion model).

[Figure: the likelihood/posterior plot from before, illustrating the mapping between $X \in (-\infty, \infty)$ and $g \in [0, 1]$; code: 01evacc/single_obs.m]

[Figure: the example decision as a diffusion of the sufficient statistic $X$ between constant bounds, and as the corresponding belief $g$, over time $t$; code: 03knownreliab/plot_diffusion_example.m]

We can thus perform optimal decision-making without ever computing the belief $g$.

Page 21:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 22:

Evidence accumulation with reliability changing across trials

[Figure: likelihoods and posterior for a single observation; the per-trial evidence mean defines the trial difficulty for fixed $\sigma_\varepsilon = 1$; code: 01evacc/single_obs.m]

Hidden state: $z \in \{-1, 1\}$
Trial difficulty: $\alpha \in [0, \infty)$ (small = hard)

The per-trial evidence mean $\mu = \alpha z$ is drawn for each trial from $p(\mu) = \mathcal{N}(\mu \mid 0, \sigma_\alpha^2)$, where $\sigma_\alpha$ sets the overall task difficulty (small = hard); hard trials are more likely than easy ones.

With this evidence, evidence accumulation proceeds by momentary evidence $\delta x_n \sim \mathcal{N}(\mu\,\delta t, \delta t)$.

The posterior over $\mu$ (prior $\times$ i.i.d. likelihoods) is Gaussian:
$p(\mu \mid \delta x_{0:t}) \propto p(\mu) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid \mu) \propto \mathcal{N}\!\left(\mu \;\Big|\; \frac{X(t)}{\sigma_\alpha^{-2} + t}, \frac{1}{\sigma_\alpha^{-2} + t}\right)$

$g(X, t) \equiv p(z = 1 \mid \delta x_{0:t}) = p\left(\mu \geq 0 \mid X(t), t\right) = \Phi\!\left(\frac{X(t)}{\sqrt{\sigma_\alpha^{-2} + t}}\right)$

such that the sufficient statistics are now both $X(t)$ and $t$.
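A toy simulation of a single trial under this model (assumes $\sigma_\alpha = 1$; uses erfc for the normal CDF to stay within base MATLAB):

```matlab
% Sketch: belief with unknown evidence reliability. The drift mu = alpha*z is
% drawn per trial, and the belief is the posterior mass of mu above zero.
sig_a = 1; dt = 0.001; T = 2;
t = (dt:dt:T)';
mu = sig_a * randn;                               % mu ~ N(0, sig_a^2)
dx = mu * dt + sqrt(dt) * randn(size(t));         % dx_n ~ N(mu dt, dt), sig_e = 1
X = cumsum(dx);
g = 0.5 * erfc(-X ./ sqrt(1 / sig_a^2 + t) / sqrt(2));  % Phi(X / sqrt(sig_a^-2 + t))
plot(t, g); xlabel('time t'); ylabel('belief g(X,t)');
```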

Page 23:

The resulting optimal policy

Bellman's equation over belief $g$ and time $t$:
$V(g, t) = \max\left[\; g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c\,\delta t \;\right]$

[Figure: stopping value $\max(g, 1-g)$ and accumulation values $\langle V \rangle - c\,\delta t$ at $t = 0.00$, $0.99$, and $1.98$, with the resulting bounds on belief $g$ over time $t$; code: 04knownreliab/plot_valuefn.m]

The future expected value takes the time-dependence into account.

For each fixed $t$ there are boundaries in belief $g$ → time-dependent boundaries.

Page 24:

Optimal decision-making with diffusion models

Using the time-dependent mapping between belief $g$ and the diffusing particle $X$,
$g(X, t) = \Phi\!\left(\frac{X}{\sqrt{\sigma_\alpha^{-2} + t}}\right), \qquad X(g, t) = \sqrt{\sigma_\alpha^{-2} + t}\;\Phi^{-1}(g),$
we can map a bound in belief $g$ to a bound on the diffusing particle $X$.

[Figure: an example decision shown as the sufficient statistic $X$ with its mapped, time-dependent bounds, and as the belief $g$ with its bounds, over time $t$; code: 04knownreliab/plot_diffusion_example.m]

As before, we get optimal decision-making without ever directly computing the belief. This naturally leads to errors being slower than correct choices.
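To illustrate the mapping, here is a sketch that converts a belief bound into the corresponding time-dependent bound on $X$ (the constant belief bound $g_b = 0.8$ and $\sigma_\alpha = 1$ are illustrative; the optimal belief bound from DP is itself time-dependent):

```matlab
% Sketch: mapping a belief bound g_b into a bound on the particle X via
% X(g,t) = sqrt(sig_a^-2 + t) * Phi^{-1}(g). Uses erfcinv for Phi^{-1}.
sig_a = 1; g_b = 0.8;
t = linspace(0, 2, 200);
Phi_inv = -sqrt(2) * erfcinv(2 * g_b);   % Phi^{-1}(g_b)
X_b = sqrt(1 / sig_a^2 + t) * Phi_inv;   % upper bound on X over time
plot(t, X_b, t, -X_b); xlabel('time t'); ylabel('bound on X');
```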

Page 25:

Urgency signal

An "urgency signal" as a collapsing bound?

Page 26:

Road map

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions

Page 27:

Maximising reward rate instead of expected reward

Maximising the reward rate = maximising reward over an infinite sequence of structurally equal trials.

Problem: the value of the initial choice will be infinite (infinitely many, possibly rewarded, choices follow).

Solution: use the average-adjusted value, discounting each time step $\delta t$ by $-\rho\,\delta t$, where $\rho$ is the reward rate (reward per unit time). This value is relative to the reward that can be achieved on average.

For example, Bellman's equation for known evidence reliability becomes
$V(g) = \max\left[\; g - \left(t_i + (1 - g)t_p\right)\rho + V\!\left(\tfrac{1}{2}\right),\;\; 1 - g - \left(t_i + g\,t_p\right)\rho + V\!\left(\tfrac{1}{2}\right),\;\; \langle V(g') \rangle_{p(g' \mid g)} - (\rho + c)\,\delta t \;\right]$
where $t_i + (1 - g)t_p$ is the expected time from the decision to the onset of the next trial.

The policy is invariant to shifts in the value function, such that we can choose $V\!\left(\tfrac{1}{2}\right) = 0$. We then find both the value function and the reward rate $\rho$ using the consistency criterion $V\!\left(\tfrac{1}{2}\right) = 0$.
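One way to solve this numerically, sketched below: reuse the belief grid g and transition matrix P from the value-iteration sketch above, and bisect on the reward rate $\rho$ until the consistency criterion $V(1/2) = 0$ holds (the task parameters are illustrative):

```matlab
% Sketch: find the reward rate rho such that V(1/2) = 0 (with V(1/2) = 0
% already substituted on the right-hand side of Bellman's equation).
ti = 2; tp = 0; c = 0.1; dt = 0.01;              % illustrative task parameters
[~, i0] = min(abs(g - 0.5));                     % grid index of g = 1/2
lo = 0; hi = 1;                                  % bracket for rho
for iter = 1:50
    rho = (lo + hi) / 2;
    V = max(g, 1 - g);
    for n = 1:100000                             % value iteration at this rho
        Vr = g - (ti + (1 - g) * tp) * rho;      % choose "right"
        Vl = 1 - g - (ti + g * tp) * rho;        % choose "left"
        Va = P * V - (rho + c) * dt;             % accumulate
        Vnew = max(max(Vr, Vl), Va);
        if max(abs(Vnew - V)) < 1e-10, break; end
        V = Vnew;
    end
    if V(i0) > 0, lo = rho; else hi = rho; end   % V(1/2) decreases with rho
end
rho   % approximately the achievable reward rate
```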

Page 28:

Generalisations of the evidence model

A time-varying accumulation cost

The accumulation cost might rise or drop over time:
$V(g, t) = \max\left[\; g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c(t)\,\delta t \;\right]$
This is supported by human/animal behaviour (Drugowitsch et al., 2012).

Evidence reliability that varies within individual trials

In real-world decisions, the evidence reliability is practically never constant.

This requires simultaneous estimation of the hidden state and the momentary evidence reliability, so the sufficient statistics are at least two-dimensional.

It leads to a reliability-dependent bound on the decision-maker's belief; see Drugowitsch, Moreno-Bote & Pouget (NIPS, 2014). That paper also introduces a faster way to find the expected future reward, using PDE solvers.

Page 29:

Summary

Notes, derivations, and figure code (MATLAB) are at

https://github.com/jdrugo/FENS2015

Normative evidence accumulation

What do we want to maximise?

Speed/accuracy trade-off for known reliability of evidence

Speed/accuracy trade-off for unknown reliability of evidence

Extensions
