Supplementary Modeling

Inferring decoding strategies from choice probabilities in the presence of correlated variability

Ralf M. Haefner, Sebastian Gerwinn, Jakob H. Macke, and Matthias Bethge

Contents

S1 Derivation of choice probability formula
S2 Choice probabilities for Poisson-like responses and integration-to-bound or attractor-based decision-making
S3 Notes on nonlinear read-out
S4 Pooling noise
S5 Notes on earlier simulation-based results
S5.1 Uniform read-out
S5.2 Uniform correlations
S6 Notes on weight reconstruction for populations of small and known size
S7 Smooth models overestimate choice probabilities
S8 Notes on optimal linear read-out and its relationship to zero-signal stimuli

Corresponding author ([email protected])


S1 Derivation of choice probability formula

The response distribution of neuron $j$ conditioned on choice 1 (i.e. on the decision variable $D$ being positive) is

$$P(r_j \mid D>0) = \frac{P(r_j,\,D>0)}{P(D>0)} = 2\,P(r_j)\,P(D>0 \mid r_j).$$

Since $r_j$ is assumed to be normally distributed around $f_j(0)$ we obtain:

$$P(r_j) = \mathcal{N}\!\left(r_j;\; f_j(0),\; f_j'(0)^2\sigma_S^2 + C_{jj}\right) =: \mathcal{N}(y;\,\mu_y,\Sigma_{yy})$$
$$P(D) = \mathcal{N}\!\left(D;\; 0,\; b^2\sigma_S^2 + \beta^\top C\beta\right) =: \mathcal{N}(x;\,\mu_x,\Sigma_{xx})$$

where $\sigma_S^2$ is the signal variance in the stimulus (e.g. coherence) and $f'$ is the rate of change of the response with the stimulus ($f' = df/ds$, where $s$ is the stimulus variable and $s=0$ denotes the ambiguous stimulus). $\sigma_S$ is assumed to be negligible in the main text for simplicity; however, when presenting a stimulus with high signal variance, for instance in order to perform psychophysical reverse correlation20,24, accounting for this component of choice probabilities becomes important (e.g. for Figure S4c). With $s$ the stimulus fluctuation on every trial, and $\eta_j$ the additional response fluctuation unrelated to the stimulus (e.g. intrinsic or top-down), we find:

$$\Sigma^{xy}_j = \mathrm{E}\!\left[(r_j - \langle r_j\rangle)(D - \langle D\rangle)\right] = \mathrm{E}\!\left[(r_j - \langle r_j\rangle)\,D\right] \quad\text{since } \langle D\rangle = 0$$
$$= \mathrm{E}\!\left[\bigl(f_j'(0)\,s + \eta_j\bigr)\Bigl(b\,s + \sum_{k=1}^{n}\beta_k\eta_k\Bigr)\right] \quad\text{with } \mathrm{E}[\eta_j\eta_k] = C_{jk},$$
$$\text{since}\quad D = \sum_{k=1}^{n}\beta_k r_k = s\sum_{k=1}^{n}\beta_k f_k'(0) + \sum_{k=1}^{n}\beta_k\eta_k =: b\,s + \sum_{k=1}^{n}\beta_k\eta_k$$
$$= b\,f_j'(0)\,\sigma_S^2 + \sum_{k=1}^{n}\beta_k C_{jk} = b\,f_j'(0)\,\sigma_S^2 + (C\beta)_j.$$

Since

$$P(x\mid y) = \mathcal{N}\!\left(x;\;\mu_x + \frac{\Sigma_{xy}}{\Sigma_{yy}}\bigl(y-\mu_y\bigr),\;\Sigma_{xx} - \frac{\Sigma_{xy}^2}{\Sigma_{yy}}\right)$$

and $\mu_x = 0$, we find:

$$P(D\mid r_j) = \mathcal{N}\!\left(D;\;\frac{(C\beta)_j + b f_j'(0)\sigma_S^2}{f_j'(0)^2\sigma_S^2 + C_{jj}}\,\bigl(r_j - f_j(0)\bigr),\;\; b^2\sigma_S^2 + \beta^\top C\beta - \frac{\bigl[(C\beta)_j + b f_j'(0)\sigma_S^2\bigr]^2}{f_j'(0)^2\sigma_S^2 + C_{jj}}\right).$$

With $\Sigma^I_j := f_j'(0)^2\sigma_S^2 + C_{jj}$, $\;\Sigma^{IC}_j := (C\beta)_j + b f_j'(0)\sigma_S^2$ and $\Sigma^C := b^2\sigma_S^2 + \beta^\top C\beta$ it follows:

$$P(r_j \mid D>0) = 2\,\phi\!\left(r_j;\, f_j(0),\,\Sigma^I_j\right)\,\Phi\!\left(0;\; -\frac{\Sigma^{IC}_j}{\Sigma^I_j}\bigl(r_j - f_j(0)\bigr),\;\Sigma^C - \frac{(\Sigma^{IC}_j)^2}{\Sigma^I_j}\right)$$
$$= \frac{2}{\sqrt{\Sigma^I_j}}\;\phi\!\left(\frac{r_j - f_j(0)}{\sqrt{\Sigma^I_j}};\,0,1\right)\,\Phi\!\left(\frac{\Sigma^{IC}_j}{\sqrt{\Sigma^I_j\,\Sigma^C - (\Sigma^{IC}_j)^2}}\;\frac{r_j - f_j(0)}{\sqrt{\Sigma^I_j}};\,0,1\right).$$


Therefore

$$P(r_j \mid D>0) = \frac{2}{\sqrt{\Sigma^I_j}}\;\phi\!\left(\frac{r_j - f_j(0)}{\sqrt{\Sigma^I_j}}\right)\Phi\!\left(\alpha\,\frac{r_j - f_j(0)}{\sqrt{\Sigma^I_j}}\right)
\quad\text{with}\quad \alpha = \frac{\Sigma^{IC}_j}{\sqrt{\Sigma^I_j\,\Sigma^C - (\Sigma^{IC}_j)^2}}. \tag{S1.1}$$

The choice probability can be computed as¹

$$\mathrm{CP}_j = \int_{-\infty}^{\infty} dr_j\; P(r_j \mid f_j(0),\,D>0)\int_{-\infty}^{r_j} dr_j'\; P(r_j' \mid f_j(0),\,D<0)$$
$$= \frac{4}{\Sigma^I_j}\int dx\;\phi\!\left(\frac{x}{\sqrt{\Sigma^I_j}}\right)\Phi\!\left(\frac{\alpha x}{\sqrt{\Sigma^I_j}}\right)\int_{-\infty}^{x} dy\;\phi\!\left(\frac{y}{\sqrt{\Sigma^I_j}}\right)\left[1-\Phi\!\left(\frac{\alpha y}{\sqrt{\Sigma^I_j}}\right)\right]$$
$$= 4\int dx\;\phi(x)\,\Phi(\alpha x)\int_{-\infty}^{x} dy\;\phi(y)\,\bigl[1-\Phi(\alpha y)\bigr]$$
$$= 4\int dx\;\phi(x)\,\Phi(x)\,\Phi(\alpha x) \;-\; 4\int dx\;\phi(x)\,\Phi(\alpha x)\int_{-\infty}^{x} dy\;\phi(y)\,\Phi(\alpha y)$$

where zero mean and unit variance have been omitted from $\phi$ and $\Phi$ for brevity. With

$$\int dx\;\phi(x)\,\Phi(x)\,\Phi(\alpha x) = \left[\tfrac{1}{2}\Phi(x)^2\,\Phi(\alpha x)\right]_{-\infty}^{\infty} - \frac{\alpha}{2}\int dx\;\phi(\alpha x)\,\Phi(x)^2 = \frac{1}{2} - \frac{\alpha}{2}\int dx\;\phi(\alpha x)\,\Phi(x)^2$$

and

$$\int dx\;\phi(x)\,\Phi(\alpha x)\int_{-\infty}^{x} dy\;\phi(y)\,\Phi(\alpha y) = \left[\frac{1}{2}\left(\int_{-\infty}^{x} dy\;\phi(y)\,\Phi(\alpha y)\right)^{2}\right]_{-\infty}^{\infty} = \frac{1}{2}\left(\frac{1}{2}\right)^{2} = \frac{1}{8}$$

it follows

$$\mathrm{CP}_j = 2 - 2\alpha\int dx\;\phi(\alpha x)\,\Phi(x)^2 - \frac{1}{2} \tag{S1.2}$$
$$= \frac{3}{2} - 2\alpha\int dx\;\phi(\alpha x)\,\Phi(x)^2. \tag{S1.3}$$

¹Remember that $1-\Phi(x) = \Phi(-x)$.


We solve $F(\alpha) := \int dx\;\phi(\alpha x)\,\Phi(x)^2$ analytically by differentiation, using $\frac{d\phi(\alpha x)}{d\alpha} = -\alpha x^2\,\phi(\alpha x)$ and $\frac{d\phi(\alpha x)}{dx} = -\alpha^2 x\,\phi(\alpha x)$:

$$\frac{dF(\alpha)}{d\alpha} = -\alpha\int dx\; x^2\,\phi(\alpha x)\,\Phi(x)^2 = \frac{1}{\alpha}\int dx\; x\,\frac{d\phi(\alpha x)}{dx}\,\Phi(x)^2$$
$$= -\frac{1}{\alpha}\int dx\;\phi(\alpha x)\left[\Phi(x)^2 + 2x\,\Phi(x)\,\phi(x)\right] = -\frac{1}{\alpha}\left[F(\alpha) + 2\int dx\; x\,\phi(\alpha x)\,\Phi(x)\,\phi(x)\right]$$

where the last term can again be computed by integration by parts:

$$\int dx\; x\,\phi(\alpha x)\,\Phi(x)\,\phi(x) = -\frac{1}{\alpha^2}\int dx\;\frac{d\phi(\alpha x)}{dx}\,\Phi(x)\,\phi(x) = \frac{1}{\alpha^2}\left[\int dx\;\phi(\alpha x)\,\phi(x)^2 - \int dx\; x\,\phi(\alpha x)\,\Phi(x)\,\phi(x)\right]$$
$$\Rightarrow\quad \int dx\; x\,\phi(\alpha x)\,\Phi(x)\,\phi(x) = \frac{1}{1+\alpha^2}\int dx\;\frac{1}{(2\pi)^{3/2}}\,\exp\!\left(-\frac{\alpha^2 x^2 + 2x^2}{2}\right) = \frac{1}{2\pi}\,\frac{1}{1+\alpha^2}\,\frac{1}{\sqrt{2+\alpha^2}}.$$

Therefore

$$\frac{dF(\alpha)}{d\alpha} = -\frac{1}{\alpha}\,F(\alpha) - \frac{1}{\pi}\,\frac{1}{\alpha\,(1+\alpha^2)\sqrt{2+\alpha^2}}. \tag{S1.4}$$

The homogeneous part of this differential equation yields

$$\frac{dF}{F} = -\frac{d\alpha}{\alpha}\quad\text{implying that}\quad F(\alpha) = \frac{A}{\alpha} \tag{S1.5}$$

where $A$ is an integration constant. Hence our ansatz for the original inhomogeneous differential equation (S1.4) is $F(\alpha) = g(\alpha)/\alpha$. We find:

$$\frac{g'(\alpha)}{\alpha} - \frac{g(\alpha)}{\alpha^2} = -\frac{g(\alpha)}{\alpha^2} - \frac{1}{\pi}\,\frac{1}{\alpha\,(1+\alpha^2)\sqrt{2+\alpha^2}}$$
$$g(\alpha) = -\frac{1}{\pi}\int \frac{d\alpha}{(1+\alpha^2)\sqrt{2+\alpha^2}} = -\frac{1}{\pi}\arctan\!\frac{\alpha}{\sqrt{2+\alpha^2}} + B$$
$$F(\alpha) = \frac{1}{\alpha}\left(-\frac{1}{\pi}\arctan\!\frac{\alpha}{\sqrt{2+\alpha^2}} + B\right)$$

where $B$ is again some (different) integration constant. Using equation (S1.3) we find

$$\mathrm{CP}_j = \frac{3}{2} + \frac{2}{\pi}\arctan\!\frac{\alpha}{\sqrt{2+\alpha^2}} - 2B.$$


We determine $B$ from the special case $\alpha = 0$, which corresponds to $(C\beta)_j = 0$, i.e. when neuron $j$ does not contribute to the decision ($\beta_j = 0$) and is not correlated with any neurons that do. It follows that the choice probability must be 0.5:

$$\mathrm{CP}_j(\alpha=0) = \frac{1}{2} = \frac{3}{2} + \frac{2}{\pi}\arctan 0 - 2B \quad\Rightarrow\quad B = \frac{1}{2}$$
$$\Rightarrow\quad \mathrm{CP}_j = \frac{1}{2} + \frac{2}{\pi}\arctan\!\frac{\alpha}{\sqrt{2+\alpha^2}}.$$

With equation (S1.1) we obtain:

$$\mathrm{CP}_j = \frac{1}{2} + \frac{2}{\pi}\arctan\!\frac{\Sigma^{IC}_j}{\sqrt{2\,\Sigma^I_j\,\Sigma^C - (\Sigma^{IC}_j)^2}}$$

from which equation (1) follows for the case of negligible stimulus-driven response variability, $f_j'(0)^2\sigma_S^2 \ll C_{jj}$. Another way to write the CP is

$$\mathrm{CP}_j = \frac{1}{2} + \frac{2}{\pi}\arctan\!\frac{\xi}{\sqrt{2-\xi^2}} \tag{S1.6}$$

with

$$\xi = \frac{\Sigma^{IC}}{\sqrt{\Sigma^I\,\Sigma^C}}. \tag{S1.7}$$
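For readers who want to check the result numerically, the following is a minimal sketch (not the authors' code) that compares equation (S1.6)/(S1.7) with a Monte-Carlo estimate of the choice probability under the Gaussian model with $\sigma_S = 0$; the covariance, weights and all variable names are illustrative assumptions.

```python
# Minimal numerical check of eq. (S1.6), assuming sigma_S = 0 (zero-signal case):
# simulate Gaussian responses, form D = beta^T r, and compare empirical CPs
# (ROC area between the two choice-conditioned distributions) with the formula.
import numpy as np

rng = np.random.default_rng(0)
n, n_trials = 8, 200_000

A = rng.normal(size=(n, n))
C = A @ A.T / n + np.eye(n)          # some positive-definite noise covariance
beta = rng.normal(size=n)            # arbitrary read-out weights

Cb = C @ beta                        # xi_k = (C beta)_k / sqrt(C_kk beta^T C beta)
xi = Cb / np.sqrt(np.diag(C) * (beta @ Cb))
cp_analytic = 0.5 + (2 / np.pi) * np.arctan(xi / np.sqrt(2 - xi**2))

r = rng.multivariate_normal(np.zeros(n), C, size=n_trials)
choice1 = (r @ beta) > 0             # choice 1 whenever the decision variable is positive

def auroc(x1, x2):
    # Rank-based ROC area = P(x1 > x2) for continuous data.
    labels = np.concatenate([np.ones(len(x1)), np.zeros(len(x2))])
    scores = np.concatenate([x1, x2])
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return (ranks[labels == 1].sum() - len(x1) * (len(x1) + 1) / 2) / (len(x1) * len(x2))

cp_empirical = np.array([auroc(r[choice1, j], r[~choice1, j]) for j in range(n)])
print(np.c_[cp_analytic, cp_empirical])   # the two columns should agree closely
```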


Figure S1: Analytical results. (a) Choice-conditioned response distribution. Black: distribution of the sensory response across all trials. Red & blue: sensory response distributions only for those trials in which the subject selected choice 1 and 2, respectively. For the case of Gaussian response variability, they are given by a point-wise multiplication of the total distribution (black) with cumulative Gaussians (shown dashed). (b) Comparison of the choice probability computed from equation (1) with its first-order approximation, equation (2). The dotted line is the identity line. (c) Choice probability as a function of the number of neurons n, for the case of constant correlations within each pool and across pools. The different curves correspond to different levels of the difference between the correlations within and across decision pools: Δc ∈ {0, 0.05, 0.1, 0.15, 0.2}. CP → 0.5 + √Δc/π for large n.

S2 Choice probabilities for Poisson-like responses and integration-to-bound or attractor-based decision-making

Now we will demonstrate that our results remain relevant in realistic circumstances, when the Gaussian assumption and the perfect-integration assumption made in our analytical derivation are violated. Our assumption of a Gaussian response distribution for each neuron, and hence for the hypothetical decision neuron, is challenged under realistic (e.g. Poisson) circumstances in two ways: (1) spike counts are discrete, not continuous, and (2) the response distribution is skewed rather than symmetric about its mean. (Note that we do not assume a constant response variance for different means, so this aspect of neuronal firing is unproblematic.) These two challenges are most severe for very low spike counts. In Figure S2a we plot the relationship between our analytical predictions and simulation results for a neuronal population of 128 neurons, assuming the same correlation structure as in the main text23. In Figure S2b we show how the factor by which the actual choice probabilities are reduced compared to the analytical solution depends on the average number of spikes per trial per neuron. We find that the Gaussian approximation is excellent for spike counts as low as 2 spikes per trial (error less than 3%) and only starts breaking down significantly below 1 spike per trial. However, as panel (a) demonstrates, even in those cases a simple scaling correction depending on the average spike count brings analytical and actual choice probabilities into agreement.
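The sketch below illustrates the kind of comparison behind Figure S2, under loudly stated assumptions: correlated, Poisson-like spike counts are generated here with a Gaussian copula (the paper's own count-generation procedure may differ), read out with uniform pool weights, and the empirical CP of one neuron is computed; parameters are illustrative.

```python
# Rough sketch (not the authors' code): copula-generated correlated Poisson
# counts, a +/-1 pool read-out, and an empirical CP to compare against the
# Gaussian analytical prediction.
import numpy as np
from scipy.stats import norm, poisson, mannwhitneyu

rng = np.random.default_rng(1)
n, n_trials, mean_rate = 16, 100_000, 2.0        # ~2 spikes/trial/neuron

pool = np.r_[np.ones(n // 2), -np.ones(n // 2)]  # two decision pools
R = 0.05 + 0.1 * (np.outer(pool, pool) > 0)      # higher latent correlation within pools
np.fill_diagonal(R, 1.0)

z = rng.multivariate_normal(np.zeros(n), R, size=n_trials)
counts = poisson.ppf(norm.cdf(z), mean_rate)     # Gaussian copula -> Poisson marginals

beta = pool.copy()                               # uniform read-out: +1 / -1 per pool
D = counts @ beta
keep = D != 0                                    # drop the few exact ties for simplicity
choice1, counts = D[keep] > 0, counts[keep]

# Empirical CP of neuron 0: tie-corrected ROC area between the two
# choice-conditioned count distributions (Mann-Whitney U / n1 n2).
x1, x2 = counts[choice1, 0], counts[~choice1, 0]
print(mannwhitneyu(x1, x2).statistic / (len(x1) * len(x2)))
```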

In addition to non-Gaussian response variability, our decision model of perfect evidence integration over the entire stimulus duration is likely to be an approximation. At least under time pressure, the behavior of subjects is better modeled using an integration-to-bound framework18. As soon as a sufficient number of spikes supporting one of the two choices has been collected, a decision is made and remains unchanged even in the presence of contradictory evidence later on. This means that spikes occurring after the decision time are ignored. Since our analytical prediction is based on all the spikes over the entire trial duration, we will overestimate the actually observed choice probabilities. In Figure S3a we illustrate the relationship between analytical and simulated choice probabilities in a population of 32 neurons for different values of the bound.
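As a concrete illustration of the decision rule just described, here is a minimal integration-to-bound sketch; the bin structure, bound value and helper name are illustrative assumptions, not the authors' simulation code.

```python
# Integration-to-bound sketch: evidence is accumulated bin by bin and the
# choice is frozen once |accumulated evidence| crosses the bound, so spikes
# after the decision time are ignored.
import numpy as np

def bounded_choice(bin_counts, beta, bound):
    """bin_counts: (n_bins, n_neurons) spike counts of a single trial."""
    acc = 0.0
    for t, r_t in enumerate(bin_counts):
        acc += r_t @ beta                     # momentary evidence from this bin
        if abs(acc) >= bound:                 # bound reached: decision is made
            return int(acc > 0), t            # choice and decision time
    return int(acc > 0), len(bin_counts) - 1  # no crossing: decide at the end

# Example: two pools of Poisson neurons, +/-1 read-out, 20 time bins.
rng = np.random.default_rng(5)
beta = np.r_[np.ones(16), -np.ones(16)]
trial = rng.poisson(0.1, size=(20, 32))       # ~2 spikes per trial per neuron in total
print(bounded_choice(trial, beta, bound=4.0))
```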


Figure S2: Analytical results apply to Poisson spiking down to 2 spikes/trial and deviate systematically below. (a) Comparison of analytical choice probabilities with simulated choice probabilities for three different levels of mean response per trial (0.1, 0.3 and 1.0 spikes/trial). Errorbar (in the bottom right corner) based on 100,000 repeats in the simulation (2σ). Population of 128 neurons with the realistic correlation structure used for the simulations in the main text. Each colored dot corresponds to a single neuron. (b) The proportionality factor between the analytical solution and the Poisson simulation, shown for different levels of the mean response. The colored dots correspond to the cases shown on the left.

Figure S3: Integration-to-bound decision model implies a simple scaling-down of the analytical solution. (a) Simulated choice probabilities for integration-to-bound decision making for 3 different bounds. In blue, a low bound (quick reaction time); red and green are successively higher bounds. Proportionality fits shown as lines. Combination of results from several populations of 32 neurons each, with different levels of correlations (to cover a wider range of choice probabilities). Mean spike count per trial was 2. (b) The proportionality factor shown for different levels of the bound. The x-axis shows the average implied reaction time, i.e. the average time at which the decision was made, as a fraction of the entire stimulus duration. Note that the proportionality factor measures the scaling of the choice probabilities after subtracting 0.5. (c) The relationship between analytical prediction and simulation for one large-scale realistic simulation with a heterogeneous population of 1024 neurons, as used for illustrating the read-out weight reconstruction below. In panels (a), (b) and (c), the standard error (2σ) for the simulation is shown in black.


Figure S4: Comparison of analytical results with simulations using attractor-based decision-making. (a) Simulated choice probabilities compared with analytical predictions for a weak (blue) and a strong (cyan) attractor. Mean responses 10 spikes/sec. Population size 32 neurons, with correlations varied to cover a range of choice probabilities from 0.5 to 0.8. Lines indicate the proportionality relationship. Dotted line is the identity. The 2σ errorbar for the simulation is shown in black. (b) Psychophysical kernels (the strength with which stimulus evidence is weighted over time) for the weak and the strong attractor. (c) The relationship between analytical prediction and simulation for a simulation with realistic correlations and a heterogeneous population of 128 neurons, as used for illustrating the read-out weight reconstruction in the main text, for uniform and optimal read-out. Choice probabilities were transformed to values between 0 and 0.5, and 0.5 and 1, respectively, for better visibility. The 2σ errorbar (100,000 repeats) for the simulation is shown in black.

In Figure S3a, a high bound is shown in green (most decisions are made at the end of the stimulus presentation), a low bound in blue (almost all decisions are made before the end of the stimulus presentation), and an intermediate level in red. We find that the agreement is still very good once one allows for an additional scaling factor depending on the level of the bound. We also see that the deviations from proportionality become larger as decisions are made earlier. The size of the errorbar from the simulation is shown in the upper left corner of the panels. However, even for the lowest bound investigated here (blue), where on average the decision is made after only a quarter of the available time, and three quarters of the spikes are ignored, the error in our analytical results is much smaller than that in high-quality measurements of choice probabilities. (To make the deviations visible in the plot, we had to simulate 10,000 trials, a number that is 1-2 orders of magnitude larger than what has been experimentally feasible.) In Figure S3b we plot the scaling factor by which the actual choice probabilities are reduced compared to the analytical prediction, as a function of the average decision time. Using this curve one could convert the analytical results for perfect integration with Gaussian variability into integration-to-bound choice probabilities with Poisson variability, given the average decision time. Figure S3c shows the agreement between the analytical solution and empirical choice probabilities for a realistic simulation like those used to illustrate the weight reconstruction in the main text. While the deviations appear slightly larger than what would be expected based on the simulation error (2σ shown in black), they are much smaller than the empirical error even in high-quality recordings.

Figure S4 indicates a similar effect if the decision mechanism is based on an attractor network rather than on integration-to-bound as used above. We test the accuracy of our analytical solution for 2 attractors of different strength in different scenarios. Figure S4a compares analytical and simulated choice probabilities for pool-wise uniform correlations of different strengths and for random read-out weights (to cover a range of choice probabilities). Figure S4b&c shows the simulation results using the realistic correlation structure employed before, for both uniform and linear optimal weights. We used the same parameters as for the reconstruction in the main text, except that the population contained only 128 neurons. However, in order to quantify the strength of the two attractors we added Gaussian noise to the signal and used psychophysical reverse correlation to compute the psychophysical kernel20.


Figure S5: Application of our weight reconstruction method to a system with a nonlinear read-out recovers a linear approximation. (a) Realistic correlation structure as used in the main text. (b) Two example read-out profiles were considered: one in which the response of all neurons was weighted linearly (blue), and one in which the response of 4 out of 36 neurons was squared before it entered the decision. The weights for both read-outs were uniform in each pool. (c) Choice probability profiles produced by the two read-out profiles (circles). Lines indicate the choice probability predictions from the reconstructed weights in panel (d). (d) Reconstructed weight profiles for both scenarios together with the ground truth (black).

The psychophysical kernel is a measure of how much the evidence in the stimulus is weighted at each point in time when arriving at the decision. Figure S4b shows the respective psychophysical kernels as a function of time. The results from the weaker attractor model are similar to what has been observed before (compare Figure 2B in ref. 20), while the stronger attractor model discards most evidence after about a quarter of the time. In all cases we find an excellent agreement between the analytical solution and the simulation results after adjusting for an overall scaling factor.

S3 Notes on nonlinear read-out

In Figure S5 we show what happens when applying our linear framework to a system with a nonlinear read-out rule, in particular one in which the decision depends on $D = \sum_{k=1}^{n}\beta_k\, r_k^{\alpha_k}$ for arbitrary exponents $\alpha_k$. As an example, we simulate a small population of 36 neurons with uniform read-out weights ($\beta_k = \pm 1$) in which the most relevant neurons in the middle of each pool are read out with an $\alpha = 2$ instead of $\alpha = 1$ as for all other neurons (panel b). We further assume that all neurons in our population have a mean response of


3 spikes/trial (zero-signal stimulus). This means that neurons with $\alpha > 1$ have a stronger influence on the decision than suggested by their $\beta_k$. This is reflected in the observed choice probabilities (panel c) as well as in the reconstructed weight profile (panel d). In general, the reconstruction finds the closest linear approximation to the nonlinear ground truth. This approximation is excellent in the cases that we have investigated (e.g. assuming random unknown exponents in addition to the simple case presented here for clarity). In order to understand why that is, let us consider the contribution of neuron $k$ in our case, $\beta_k r_k^2$, where $r_k = \langle r_k\rangle + \eta_k$ and $\eta_k$ is the neural noise. With $r_k^2 = \langle r_k\rangle^2 + 2\langle r_k\rangle\eta_k + \eta_k^2$ we can think of $\beta_k\bigl(2\langle r_k\rangle + \eta_k\bigr)$ as the 'effective' linear weight for this system, which now fluctuates from trial to trial depending on the particular noise $\eta_k$ in any one trial. (In a population consisting of two balanced pools as is usually assumed15,18, $\sum_{k=1}^{n}\beta_k\langle r_k\rangle^2 = 0$, and the contribution of $\langle r_k\rangle^2$ can therefore be ignored.) Since $\langle\eta_k\rangle = 0$, the effective linear weight reconstructed by our method will be $2\beta_k\langle r_k\rangle$, which is exactly what we observe in Figure S5d, where the neurons whose output is squared are assigned weights that are 6 times as large as those of the other neurons. To verify that a linear model with the reconstructed weights does in fact produce the choice probabilities observed in the nonlinear system (dots in panel (c)), we overlay the choice probabilities implied by the reconstructed weights as lines and find excellent agreement. This also holds for large neuronal populations, with the choice probability profiles closely resembling those shown in Figure 2g, where instead of the exponents the linear read-out weights were increased for a small subset of neurons.
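A quick numerical check of the effective-weight argument is sketched below; it uses a simple linear regression of the nonlinear decision variable onto the responses as a stand-in for the reconstruction, with Gaussian responses and illustrative parameter choices (not the paper's simulation).

```python
# Sketch: when neuron k contributes beta_k * r_k**2, the best linear weight for
# that neuron is close to 2 * beta_k * <r_k> (here <r_k> = 3, so ~6x larger).
import numpy as np

rng = np.random.default_rng(2)
n, n_trials, mean_rate = 36, 100_000, 3.0
beta = np.r_[np.ones(18), -np.ones(18)]            # uniform +/-1 weights
alpha = np.ones(n); alpha[[8, 9, 26, 27]] = 2      # 4 neurons read out squared

r = mean_rate + rng.normal(size=(n_trials, n))     # Gaussian responses, <r_k> = 3
D = (beta * r**alpha).sum(axis=1)                  # nonlinear decision variable

# Effective linear weights from least squares: solve cov(r) w = cov(r, D).
w = np.linalg.solve(np.cov(r.T), np.cov(r.T, D)[:-1, -1])
print((w * beta)[alpha == 2].mean(), (w * beta)[alpha == 1].mean())  # ~6 vs ~1
```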

S4 Pooling noise

Pooling noise is an additional source of variability during the decision-making stage that has been discussed previously15. Its effect within the context of our framework is to increase the variance of the decision variable $D$ by an amount $\sigma^2_{\mathrm{pool}}$, so that the total variability becomes $\mathrm{var}(D) = \sum_{j=1}^{n}\sum_{l=1}^{n}\beta_j\beta_l C_{jl} + \sigma^2_{\mathrm{pool}} = \beta^\top C\beta + \sigma^2_{\mathrm{pool}}$. It follows:

$$\mathrm{CP}_k = \frac{1}{2} + \frac{2}{\pi}\arctan\!\frac{(C\beta)_k}{\sqrt{2\,C_{kk}\bigl(\beta^\top C\beta + \sigma^2_{\mathrm{pool}}\bigr) - (C\beta)_k^2}} \tag{S1.8}$$

$$\mathrm{CP}_k \approx \frac{1}{2} + \frac{\sqrt{2}}{\pi}\,\frac{(C\beta)_k}{\sqrt{C_{kk}\bigl(\beta^\top C\beta + \sigma^2_{\mathrm{pool}}\bigr)}}. \tag{S1.9}$$

This shows that the only effect of pooling noise on the choice probabilities is a scaling factor that is applied uniformly to all neurons.
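The uniform-scaling property of equation (S1.9) is easy to verify numerically; the following sketch uses an assumed toy covariance and weights (illustrative only).

```python
# Tiny illustration: with pooling noise, the approximate CPs of eq. (S1.9) are
# all pulled toward 0.5 by the same factor sqrt(btCb / (btCb + sigma_pool^2)).
import numpy as np

def cp_approx(C, beta, sigma_pool2=0.0):
    # First-order CP of eq. (S1.9); sigma_pool2 = 0 recovers eq. (2).
    Cb = C @ beta
    return 0.5 + np.sqrt(2) / np.pi * Cb / np.sqrt(np.diag(C) * (beta @ Cb + sigma_pool2))

C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
beta = np.array([1.0, 1.0, -1.0])
cp0, cp1 = cp_approx(C, beta), cp_approx(C, beta, sigma_pool2=2.0)
print((cp1 - 0.5) / (cp0 - 0.5))        # identical scaling factor for all neurons
```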

S5 Notes on earlier simulation-based results

S5.1 Uniform read-out

In this section we use our analytical framework to explain earlier simulation-based results15–17. As usual, we split the population into two pools of neurons: one pool whose responses support decision 1, and one whose responses support decision 2. The two terms entering equation (2) can then each be split into two sums: one sum over the neurons within the same pool as neuron $k$, and one over the neurons in the other pool:

$$(C\beta)_k = \sum_{j\,\in\,\text{same pool as }k} C_{kj}\,\beta_j \;+\; \sum_{j\,\in\,\text{different pool from }k} C_{kj}\,\beta_j \tag{S1.10}$$

$$\beta^\top C\beta = \sum_{i,j\,\in\,\text{same pool}} C_{ij}\,\beta_i\beta_j \;+\; \sum_{i,j\,\in\,\text{different pools}} C_{ij}\,\beta_i\beta_j \tag{S1.11}$$


Let us now consider the case where the neurons in each pool all have the same read-out weight, plus and minus 1, respectively15,17. Let us further assume that neuron $k$ belongs to the pool with positive read-out weights, i.e. $\beta_k = +1$. This implies

$$(C\beta)_k = \sum_{j\,\in\,\text{same pool as }k} C_{kj} \;-\; \sum_{j\,\in\,\text{different pool from }k} C_{kj} \tag{S1.12}$$

$$\beta^\top C\beta = \sum_{i,j\,\in\,\text{same pool}} C_{ij} \;-\; \sum_{i,j\,\in\,\text{different pools}} C_{ij}. \tag{S1.13}$$

As a result, the choice probability of neuron $k$ depends only on the sum of the covariances of neuron $k$ with all the other neurons in its pool, and on the sum of its covariances with the neurons in the other pool (in addition to its individual variance). This explains why Shadlen et al. found in their simulations that the choice probabilities only depended on the average correlation values, not on their detailed structure15. It also explains why Nienborg & Cumming17 found that choice probabilities only depended on the difference between same-pool and different-pool correlations.

Assuming a homogeneous population in which all variances are the same ($C_{kk} = C_{11}$), defining $C_+$ ($c_+$) to be the average covariance (correlation) between neurons within a pool and $C_-$ ($c_-$) to be the average covariance (correlation) between neurons from different pools, and counting the terms in the respective quadrants of the correlation matrix $C$, equations (S1.12) & (S1.13) simplify to:

$$(C\beta)_k = C_{kk} + \left(\frac{n}{2}-1\right)C_+ - \frac{n}{2}\,C_- = C_{11}\left(1 + \frac{n}{2}\,\Delta c\right) \quad\text{where}\quad \Delta c := \frac{n-2}{n}\,c_+ - c_-$$

$$\beta^\top C\beta = n\,C_{11} + n\left(\frac{n}{2}-1\right)C_+ - \frac{n^2}{2}\,C_- = n\,C_{11}\left[1 + \left(\frac{n}{2}-1\right)c_+ - \frac{n}{2}\,c_-\right] = n\,C_{11}\left(1 + \frac{n}{2}\,\Delta c\right).$$

We can derive the general formula for the choice probability using equation (2) (Figure S1c):

$$\mathrm{CP}_k = \frac{1}{2} + \frac{\sqrt{2}}{\pi}\,\frac{(C\beta)_k}{\sqrt{C_{kk}\,\beta^\top C\beta}} = \frac{1}{2} + \frac{\sqrt{2}}{\pi}\,\sqrt{\frac{1}{n} + \frac{\Delta c}{2}}. \tag{S1.14}$$

Figure S1c shows the choice probability as a function of the number of neurons for different values of $\Delta c$. Numerically, such a relationship was first reported by ref. 15 for the special case of zero correlations between neurons in different pools; however, its parametric form had not been known. Our result also shows that a recent conjecture by ref. 17, that choice probabilities depend on the correlation structure only through the difference in average correlations between the pools, is true for large homogeneous populations for which $(n-2)/n \approx 1$.
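A few lines suffice to evaluate equation (S1.14) and its large-$n$ limit; the correlation values below are illustrative choices.

```python
# Sketch: eq. (S1.14) for a homogeneous two-pool population and its asymptote
# CP -> 0.5 + sqrt(dc)/pi for large n (Figure S1c).
import numpy as np

def cp_uniform(n, c_plus, c_minus):
    dc = (n - 2) / n * c_plus - c_minus
    return 0.5 + np.sqrt(2) / np.pi * np.sqrt(1 / n + dc / 2)

for n in (10, 100, 1000, 10000):
    print(n, cp_uniform(n, c_plus=0.2, c_minus=0.1))
print("limit:", 0.5 + np.sqrt(0.2 - 0.1) / np.pi)
```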

S5.2 Uniform correlations

Furthermore, if we now allow the read-out weights to vary and instead fix the covariances within each pool and across pools, we find that:

$$(C\beta)_k = C_{\text{same pool}}\sum_{j\,\in\,\text{same pool as }k}\beta_j \;+\; C_{\text{different pools}}\sum_{j\,\in\,\text{different pool from }k}\beta_j$$

$$\beta^\top C\beta = C_{\text{same pool}}\sum_{i,j\,\in\,\text{same pool}}\beta_i\beta_j \;+\; C_{\text{different pools}}\sum_{i,j\,\in\,\text{different pools}}\beta_i\beta_j$$


Here the situation is even more interesting: the terms on the right depend on the value of $k$ only through its pool identity, i.e. whether neuron $k$ belongs to pool 1 or pool 2. This means that, in large populations, all neurons within a pool have the same CP, regardless of their read-out weight! This is exactly what Cohen & Newsome16 found in their simulations, concluding that choice probabilities by themselves do not contain information about the read-out weights. In fact, even if we knew $C_{\text{same pool}}$ and $C_{\text{different pools}}$, we would not be able to infer the read-out weights in this scenario, as the choice probabilities for neurons within a pool are all the same, regardless of the actual weight. Fortunately, however, for realistic correlation structures, which are not constant within each pool, it is possible to infer information about the read-out weights.

Figure S6 illustrates why the reconstruction of read-out weights is virtually impossible in large neuronal populations if the correlations were uniform within each pool, while the same is not true for a more realistic correlation structure (shown in Figure S6a, equivalent to the one used in the main part of the paper). Figure S6b shows the two sets of weights that we compare: in blue are pool-wise constant weights that correspond to a simple average over the responses of all neurons within a pool. The profile in red corresponds to a read-out that only considers a small fraction of the neurons – those that are aligned with the task (i.e. the neurons that are typically the most informative for the task). In Figure S6c the implied choice probabilities are shown: in blue and red based on the correlation structure in panel (a), and in magenta and cyan based on the case of zero correlations. The case of zero correlations is equivalent to every quadrant-wise constant correlation structure in that the implied choice probabilities for different values of correlations in the quadrants will only differ by a constant offset. The magenta line in Figure S6d shows the mean absolute difference between the cyan and the magenta lines in panel (c) depending on the pool size $n$. For $n \to \infty$, the magenta line tends to 0, which means that in large populations the choice probabilities implied by the blue and by the red weights in panel (b) become identical, making it impossible to infer which of the two profiles the brain is using. On the other hand, the black line in Figure S6d, which shows the mean absolute difference between the red and the blue lines in panel (c), asymptotes at a positive value. This means that even in arbitrarily large populations there will be an appreciable difference between the choice probability profiles, allowing us to infer which profile is used by the brain. As this example indicates, the implied difference in choice probability may be quite small. However, since the choice probability profile that we need to reconstruct is only one-dimensional, a realistic amount of data (on the order of 100 neurons) is able to distinguish between the red and the blue lines in Figure S6c – especially for recordings with many trials.
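The weight-insensitivity under quadrant-wise constant correlations can be seen directly from equation (2); the sketch below (with illustrative correlation values and weights) shows that, for a large population, the CPs of neurons within a pool barely vary even when their individual weights differ by a factor of twenty.

```python
# Sketch: with pool-wise constant correlations, the approximate CPs of eq. (2)
# are essentially the same for all neurons of a pool, whatever their weights.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
pool = np.r_[np.ones(n // 2), -np.ones(n // 2)]
C = np.where(np.outer(pool, pool) > 0, 0.2, 0.1)   # 0.2 within, 0.1 across pools
np.fill_diagonal(C, 1.0)

beta = pool * rng.uniform(0.1, 2.0, n)             # widely varying weights
Cb = C @ beta
cp = 0.5 + np.sqrt(2) / np.pi * Cb / np.sqrt(np.diag(C) * (beta @ Cb))

print(cp[:n // 2].std(), cp[n // 2:].std())        # tiny spread within each pool
```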

S6 Notes on weight reconstruction for populations of small and known size

If the number of neurons, $n$, is known or can be estimated roughly, equations (1) & (2) can be inverted directly to infer the weights $\beta$:

$$\beta_k = \frac{\pi}{\sqrt{2}}\sum_{l=1}^{n}\left(C^{-1}\right)_{kl}\sqrt{C_{ll}}\left(\mathrm{CP}_l - \frac{1}{2}\right) \tag{S1.15}$$

We have assumed, without loss of generality, that $\beta^\top C\beta = 1$. This is possible since $\beta^\top C\beta = \sum_{k,l=1}^{n}\beta_k\beta_l C_{kl}$ scales with the square of the overall scale of $\beta$, so that this scale can always be chosen such that $\beta^\top C\beta = 1$. Such a scaling implies no loss of generality, since the overall scale of the weights is irrelevant for the behavior of the system: multiplying all weights by the same factor changes neither neuronal responses nor decisions. On the other hand, not every set of $\beta_k$ obtained from equation (S1.15) is guaranteed to obey the condition $\beta^\top C\beta = 1$. Given $\beta^\top C\beta = 1$, equation (2) becomes a linear equation in $\beta$ that can be inverted to yield equation (S1.15). In principle, this inversion is always possible since $C$, being a covariance matrix, is invertible.


Figure S6: Comparison of weight reconstruction for quadrant-wise constant and non-constant correlation structure. (a) Realistic correlation structure equivalent to that used in the main text. (b) Two sets of weights β shown: simple average across all neurons within each pool (blue), and average over a small subset of neurons whose preferred direction is aligned with the task (red). (c) Implied choice probabilities for the correlation structure shown in panel (a) (blue and red) and for zero correlations (cyan and magenta). (d) Absolute difference in implied choice probabilities between the two weight profiles, averaged across all neurons, as a function of pool size n.

Figure S7: Properties of pool-wise flat correlation structure in large populations. (a) Correlation structure with a uniform correlation of 0.2 between neurons within the same pool, and of 0.1 between neurons in different pools. (b) Largest 10 eigenvalues are shown. All but 2 eigenvalues are zero. (c) Eigenfunction associated with the largest eigenvalue. (d) Eigenfunction associated with the second-largest eigenvalue.


Converting linearly from the $\mathrm{CP}_k$ to the scale of the $\beta_k$, we obtain

$$\gamma_k := \frac{\pi}{\sqrt{2}}\,\sqrt{C_{kk}}\left(\mathrm{CP}_k - \frac{1}{2}\right) \quad\text{and hence}\quad \gamma = \frac{C\beta}{\sqrt{\beta^\top C\beta}} \quad\text{or}\quad \beta \propto C^{-1}\gamma.$$

For a valid $\beta$ to exist, $\gamma^\top C^{-1}\gamma = 1$ needs to hold. If we had actually measured all the choice probabilities and all the pairwise covariances $C_{ij}$, and $\gamma^\top C^{-1}\gamma$ were not 1, then this would imply that our model is wrong, or not incorporating some essential aspects of reality. However, if we have observed only a small subset of neurons in the population, and hence measured only a small set of the $\mathrm{CP}_k$ and $C_{jk}$, then the first candidate for the mismatch may be the way we extrapolated from the few measured neurons to the entire population. In particular, variability in the underlying weights and/or noise correlations will lead to variability in the choice probabilities. If this variability is not accounted for, but averaged away, then this will bias $\beta^\top C\beta$ to values larger or smaller than 1. For instance, adding variability to the $\beta$ will generally increase $\beta^\top C\beta$. This means that an extrapolation that averages away unobserved variability will lead to $\beta^\top C\beta < 1$ and hence imply choice probabilities that are larger than those that are actually observed. Fortunately, such a bias only affects the magnitude of the implied choice probabilities and would not affect the structure of the read-out weights, i.e. the decoding strategy.

Instead of basing the inversion on the first-order approximation, equation (2), it can also be based on the exact equation for the CP, equation (1), to yield:

$$\beta_k = \sum_{l=1}^{n}\left(C^{-1}\right)_{kl}\sqrt{C_{ll}}\;\frac{\sqrt{2}\,\tan\eta_l}{\sqrt{1+\tan^2\eta_l}} \quad\text{where}\quad \eta_l = \frac{\pi}{2}\bigl(2\,\mathrm{CP}_l - 1\bigr) \tag{S1.16}$$

However, the improvement in accuracy is small for realistic values of the choice probability (see also Figure S1b).
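The direct inversion of equation (S1.15) is easy to sanity-check when $n$ and $C$ are known; the sketch below uses the $\beta^\top C\beta = 1$ convention stated above, with a random covariance and weights as illustrative stand-ins (not the authors' code).

```python
# Sketch: generate weights, compute CPs from eq. (2), and recover the weights
# exactly via eq. (S1.15), assuming the full covariance C and n are known.
import numpy as np

rng = np.random.default_rng(3)
n = 12
A = rng.normal(size=(n, n))
C = A @ A.T / n + np.eye(n)

beta = rng.normal(size=n)
beta /= np.sqrt(beta @ C @ beta)            # convention beta^T C beta = 1

cp = 0.5 + np.sqrt(2) / np.pi * (C @ beta) / np.sqrt(np.diag(C))          # eq. (2)
beta_rec = np.pi / np.sqrt(2) * np.linalg.solve(C, np.sqrt(np.diag(C)) * (cp - 0.5))
print(np.allclose(beta_rec, beta))          # exact up to numerical precision
```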

S7 Smooth models overestimate choice probabilities

We show here that using only one eigenfunction for the reconstruction of a weight profile that is actually the superposition of two eigenfunctions will lead to an overestimation of the magnitude of the implied choice probabilities. The general case of omitting the smallest eigenfunctions from a large set of eigenfunctions that constitute the weight vector follows by induction.

We want to show that the magnitude of the choice probabilities implied by a weight function $\beta = \nu^{(1)}v^{(1)} + \nu^{(2)}v^{(2)}$ is overestimated if the component of $\beta$ with the smaller eigenvalue, i.e. $\lambda^{(2)} < \lambda^{(1)}$, is ignored. Since our convention is $\beta^\top C\beta = \sum_i \lambda^{(i)}\bigl(\nu^{(i)}\bigr)^2 = 1$, and since $v^{(1)}$ and $v^{(2)}$ are orthogonal, the magnitude of the true choice probability is given by:

$$\left\lVert \sqrt{\mathrm{diag}\,C}\circ\left(\mathbf{CP} - \tfrac{1}{2}\right)\right\rVert^2 = \frac{2}{\pi^2}\left\lVert C\beta\right\rVert^2 = \frac{2}{\pi^2}\left[\bigl(\nu^{(1)}\bigr)^2\bigl(\lambda^{(1)}\bigr)^2 + \bigl(\nu^{(2)}\bigr)^2\bigl(\lambda^{(2)}\bigr)^2\right]$$

where $\mathrm{diag}\,C$ is the vector of neuronal response variances, and $\circ$ denotes the Hadamard (element-wise) product. The choice probability implied by a profile $\beta = \eta^{(1)}v^{(1)}$ with $\beta^\top C\beta = \bigl(\eta^{(1)}\bigr)^2\lambda^{(1)} = 1$ follows as:

$$\left\lVert \sqrt{\mathrm{diag}\,C}\circ\left(\mathbf{CP} - \tfrac{1}{2}\right)\right\rVert^2 = \frac{2}{\pi^2}\bigl(\eta^{(1)}\bigr)^2\bigl(\lambda^{(1)}\bigr)^2 = \frac{2}{\pi^2}\,\lambda^{(1)}.$$


Figure S8: Example reconstruction of the correlation structure. (a) Reconstructed smooth correlation structure. (b) Reconstruction of correlation profiles from observed data. (c) Largest 10 eigenvalues. (d) Eigenfunctions corresponding to the largest 6 eigenvalues; color coding as in panel (c). (c,d) Highlighted are the eigenvalues and eigenfunctions that obey the task symmetries: symmetry around π/2 and 3π/2 within each pool, respectively, and anti-symmetry around π – the pool boundary.


Figure S9: Simulation results for the weight reconstruction in the heterogeneous case as in Figure 5, but with 4000 trials on 96 channels. (a) Reconstructed read-out profiles. Solid lines indicate mean reconstruction results, dashed lines indicate standard deviations across the simulations. Blue shows the reconstruction for the constant weight profile, red the reconstruction for the optimal weights case. (b) Coordinates of individual simulation results within the respective reconstruction space spanned by the eigenfunctions with the two largest eigenvalues (and correct symmetry). Color scheme as before. Black lines indicate 2 standard deviations based on 500 simulations. (c) Histogram of the ratios of the reconstructed projections onto the eigenfunctions with the two largest eigenvalues (plotted as arctan(ν(2)/ν(1)); as in panel b). Blue: constant weights; red: optimal weights.

Figure S10: Results corresponding to Figure 6, but based on 4000 trials on 96 channels. (a) Optimality test from equation (6), CP plotted against z-scored sensitivity d′, applied to the simulation in Figure 4. Blue and red dots represent the simulated data (constant and optimal weights, respectively). Thick curves represent running averages over 16 adjacent values (compare Figure 4 in ref. 6). (b) Deviations from the proportionality relationship for the two cases, as quantified by the mean-square error (histogram).


Figure S11: Illustration of population heterogeneity. (a) Distribution of tuning curve amplitudes in response to a coherent stimulus. (b) Distribution of von Mises tuning curve width parameters κ. (c) Example tuning curve shapes representing the widest, narrowest, and mean width. Note that the actual tuning curves used also had varying amplitudes (distribution in panel a). (d) Implied sensitivity d′ of the neurons depending on their preferred stimulus. Note that the underlying 'ground truth' is shown here; the actual measurements, based on a limited number of trials, will be significantly noisier.

We now need to show that

$$\frac{2}{\pi^2}\left[\bigl(\nu^{(1)}\bigr)^2\bigl(\lambda^{(1)}\bigr)^2 + \bigl(\nu^{(2)}\bigr)^2\bigl(\lambda^{(2)}\bigr)^2\right] \;\le\; \frac{2}{\pi^2}\,\lambda^{(1)}
\quad\Longleftrightarrow\quad 0 \;\le\; \bigl(\nu^{(2)}\bigr)^2\lambda^{(2)}\bigl(\lambda^{(1)} - \lambda^{(2)}\bigr)$$

(using the normalization $\bigl(\nu^{(1)}\bigr)^2\lambda^{(1)} + \bigl(\nu^{(2)}\bigr)^2\lambda^{(2)} = 1$), which is always true for $\lambda^{(1)} \ge \lambda^{(2)}$ (by definition) and $\lambda^{(2)} > 0$ (since $C$ is a correlation matrix/function). (Technically, $\lambda^{(2)} = 0$ is possible, e.g. for perfectly correlated neurons; however, that would correspond to a case in which at least one neuron has no intrinsic/private variability, which is unrealistic.)
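A three-line numeric check of this inequality is sketched below; the eigenvalues and coefficients are arbitrary illustrative values.

```python
# Sketch: dropping the smaller-eigenvalue component and renormalizing can only
# increase the implied choice-probability magnitude (S7 argument).
import numpy as np

lam1, lam2 = 1.0, 0.3                        # eigenvalues, lam1 >= lam2 > 0
nu1, nu2 = 0.8, 0.9                          # raw coefficients before normalization
norm = np.sqrt(nu1**2 * lam1 + nu2**2 * lam2)
nu1, nu2 = nu1 / norm, nu2 / norm            # enforce beta^T C beta = 1

full = nu1**2 * lam1**2 + nu2**2 * lam2**2   # ~ ||sqrt(diagC) o (CP - 1/2)||^2
truncated = lam1                             # one-component profile, eta1^2 = 1/lam1
print(full, truncated, truncated >= full)    # truncated profile overestimates
```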

S8 Notes on optimal linear read-out and its relationship to zero-signal stimuli

The task of distinguishing between stimuli $s_1$ and $s_2$ is only meaningful when $s_1$ or $s_2$ is actually present in the stimulus that is shown to the subject. In the absence of a signal, the subject has to guess and is typically rewarded on a random basis. However, since such zero-signal trials are always interleaved with low-signal trials, in which either $s_1$ or $s_2$ is present (or outweighs the other one if both are shown), the subject cannot be sure whether any one stimulus contains a signal, and tries to perform the task as if a signal were present. This means that the subject is essentially trying to classify the stimulus as one in which either $s_1$ or $s_2$ is present at a very low signal strength. It follows that even in zero-signal trials, the decision area is assumed to read out its sensory neurons with the same strategy that it employs in very-low-signal-strength trials.

While the notion of optimality does not make sense in the context of zero-signal trials, in which performance is at chance for any read-out (assuming the reward schedule is random, as it usually is), it does make sense in the context of those very-low-signal-strength trials. Some read-outs will be better able to extract the signal in the stimulus than others, and the linear read-out that maximizes performance is what we call the optimal linear read-out. Since the standard models we study here assume a linear read-out, we sometimes drop the 'linear' for brevity. The set of weights which maximizes performance is given by Fisher's linear discriminant (Bishop 2006, p. 189) – as defined by equation (5).

The neural responses $r(s_1)$ and $r(s_2)$ therefore strictly refer to either of the two stimuli that the subject expects when presented with a zero-signal stimulus. While we have no way of knowing the precise signal level of the expected $s_1$ and $s_2$, we know that the response of a neuron can be linearly approximated in the vicinity of zero-signal stimuli:


$$r(c) \approx r(c=0) + \left.\frac{dr}{dc}\right|_{c=0} c \tag{S1.17}$$

where $c$ quantifies signal strength (e.g. coherence) and interpolates between $c = \pm 1$, corresponding to $s_1$ and $s_2$ at 100% coherence, respectively. Since equation (5) is a proportionality relationship, i.e. it only depends on the direction of the vector $r(s_1) - r(s_2)$ representing the change in population response, not on its magnitude, the experimentally observed responses at any coherence level at which the response depends approximately linearly on the signal level (i.e. where equation (S1.17) holds) can be used for the optimality test described in the main text.
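The sketch below spells out this last point under clearly labeled assumptions: the optimal read-out is taken to be Fisher's linear discriminant as the text states (equation (5) of the main text), and a random vector stands in for the low-coherence response difference $r(s_1) - r(s_2)$; rescaling that difference, as using a different low coherence would, leaves the direction of the optimal weights unchanged.

```python
# Sketch of the optimal linear read-out (Fisher's linear discriminant):
# beta_opt proportional to C^{-1} (r(s1) - r(s2)); only the direction of the
# response difference matters, so any coherence in the linear regime works.
import numpy as np

def optimal_weights(C, mean_r_s1, mean_r_s2):
    return np.linalg.solve(C, mean_r_s1 - mean_r_s2)

rng = np.random.default_rng(6)
n = 8
A = rng.normal(size=(n, n))
C = A @ A.T / n + np.eye(n)                 # noise covariance (illustrative)
dr = rng.normal(size=n)                     # stand-in for r(s1) - r(s2)

b1 = optimal_weights(C, dr, np.zeros(n))
b2 = optimal_weights(C, 0.3 * dr, np.zeros(n))   # same difference, lower "coherence"
print(np.allclose(b1 / np.linalg.norm(b1), b2 / np.linalg.norm(b2)))  # same direction
```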
