1
Probabilistic Models of Cortical Computation
Rajesh P. N. Rao
Dept. of Computer Sci. and Engineering & Neurobio. and Behavior Program
University of Washington, Seattle, WA
Lab website: http://neural.cs.washington.edu
November, 2004
Funding: Sloan Foundation, Packard Foundation, ONR, and NSF
2
Why Consider Probabilistic Models?Computational Reasons
Sensory measurements are typically ambiguous, e.g., the projection from 3D to 2D in vision
Biological sensors and processing elements are noisy
Animal’s knowledge of the world is usually incomplete
There appears to be a need to represent, learn, and reason about probabilities
3
Example 1: Ambiguity of Stimuli
Is it an oval-shaped or a circular object?
Retinal Image
Eye
Eye
4
Bayesian Model: The Likelihood Function
(From Geisler & Kersten, 2002)
Retinal Image I
Likelihood = P(I | Slant, Aspect ratio)
5
Bayesian Model: The Posterior
(From Geisler & Kersten, 2002)
Posterior = Likelihood × Prior × k
(k = normalization constant)
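This multiply-and-normalize rule can be sketched in a few lines of Python. The slant grid and the likelihood/prior values below are made up for illustration, not taken from Geisler & Kersten (2002):

```python
import numpy as np

# Hypothetical discrete grid of surface slants (degrees) with made-up values.
slants = np.array([0.0, 20.0, 40.0, 60.0])
likelihood = np.array([0.10, 0.25, 0.50, 0.15])   # P(I | slant), illustrative
prior      = np.array([0.40, 0.30, 0.20, 0.10])   # P(slant), illustrative

unnormalized = likelihood * prior                 # Likelihood x Prior
posterior = unnormalized / unnormalized.sum()     # divide by k = P(I)

map_slant = slants[np.argmax(posterior)]          # MAP estimate
```

Note how a strong prior for shallow slants can pull the posterior away from the likelihood peak; here the likelihood still wins.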
6
What is this image depicting?
Example 2: Noise and Incomplete Knowledge
7
Bayesian Model
Likelihood: P(I | θ)
Prior probability: P(θ)
Posterior probability: P(θ | I) = P(I | θ) P(θ) / P(I)
Input Image
dog … street
??? (Bayesian decision)
sample
Okinawa beach
8
Bayesian Model with “Top-Down” Bias
Likelihood: P(I | θ)
Prior probability: P(θ)
Posterior probability: P(θ | I) = P(I | θ) P(θ) / P(I)
Input Image
dog
“Dog” (Bayesian decision)
sample
dog … street … Okinawa beach
9
Psychophysical Evidence for Bayesian Perception
Motion from cast shadows (Kersten et al., 1996)
Surface perception based on texture (Knill, 1998)
Inferring 3D shape from 2D images (Mamassian et al., 2002)
Color perception (Bloj et al., 1999)
Cue combination for depth perception (Jacobs, 2002)
Motion illusions (Weiss et al., 2002)
Motor Control (Körding and Wolpert, 2004)
10
Other Results: Contextual Modulation in V1
(Zipser et al., 1996 )
11
Attentional Modulation in V2 and V4
(Reynolds et al., 1999)
12
Decision Neurons in Areas LIP and FEF
t (ms)
(Roitman and Shadlen, 2002)
13
Rev. Thomas Bayes (1702-1761)
Can a network of neurons perform Bayesian inference?
• How is prior knowledge about the world (prior probabilities and likelihoods) stored in a network?
• How are posterior probabilities of states computed?
14
Generative Models for Bayesian Inference
Fundamental Idea: Inputs received by an organism are caused by external “states” of the world (hidden “causes”)
Goal: Estimate the probability of these causes (or states or “interpretations”) based on the inputs received thus far
15
Example: Linear Generative Models
16
Linear Generative Model
Spatial Generative Model: I(t) = U r(t) + n(t)
r(t) = representation vector, n = zero-mean Gaussian white noise with covariance Σ
Temporal Dynamics for Time-Varying Processes: r(t) = V r(t-1) + m(t-1)
V = transition matrix, m = zero-mean Gaussian white noise with covariance Σ_m
Goal: Find the optimal representation vector r(t) given inputs I(t), I(t-1), …, I(1).
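The generative direction of this model is easy to make concrete: sample a state trajectory r(t) and the images I(t) it generates. A minimal sketch with toy dimensions and noise levels (all values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): an 8-pixel "image" generated from a 3-dim code.
n_pix, n_rep = 8, 3
U = rng.standard_normal((n_pix, n_rep))      # generative (basis) matrix
V = 0.9 * np.eye(n_rep)                      # state transition matrix
sigma_n, sigma_m = 0.1, 0.05                 # isotropic noise std devs

T = 5
r = np.zeros((T, n_rep))
I = np.zeros((T, n_pix))
r[0] = rng.standard_normal(n_rep)
for t in range(T):
    if t > 0:
        # Temporal dynamics: r(t) = V r(t-1) + m(t-1)
        r[t] = V @ r[t - 1] + sigma_m * rng.standard_normal(n_rep)
    # Spatial model: I(t) = U r(t) + n(t)
    I[t] = U @ r[t] + sigma_n * rng.standard_normal(n_pix)
```

Inference (the goal stated above) runs this model in reverse: recover r(t) from the noisy I(t), which is what the Kalman filter on the next slides does.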
17
Optimization Functions
Find optimal r(t) by Minimizing Prediction Errors for all t:

E = (I - Ur)^T (I - Ur) + (r - r̄)^T (r - r̄)

r̄ = mean of r before measurement of I

Generalize to a Weighted Least Squares Function:

E = (I - Ur)^T Σ^{-1} (I - Ur) + (r - r̄)^T M^{-1} (r - r̄)

M = covariance of r before measurement of I
18
Minimizing E = Maximizing Posterior Probability
Minimizing E is equivalent to maximizing log P(r | I), which is equivalent to maximizing the posterior probability P(r | I):

log P(r | I) = log P(I | r) + log P(r) + k
             = -(I - Ur)^T Σ^{-1} (I - Ur) - (r - r̄)^T M^{-1} (r - r̄) + k
             = -E + k
19
Optimal Estimation and Kalman Filtering
Setting dE/dr = 0 and solving for the optimal r yields the Kalman Filter:
K(t) = “Kalman gain” matrix = N(t) U^T Σ^{-1}
N(t) = covariance of r after measurement of I(t) = (U^T Σ^{-1} U + M(t)^{-1})^{-1}
M(t) = V N(t-1) V^T + Σ_m

r̄(t) = V r̂(t-1)                          (prediction)
r̂(t) = r̄(t) + K(t) (I(t) - U r̄(t))       (correction)
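These update equations translate directly into code. A minimal Python sketch with toy dimensions, isotropic noise covariances, and identity dynamics so the filter tracks a static hidden state (all sizes and values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: I(t) = U r(t) + n,  r(t) = V r(t-1) + m
n_pix, n_rep = 4, 2
U = rng.standard_normal((n_pix, n_rep))
V = np.eye(n_rep)                 # identity dynamics: (nearly) static state
Sigma   = 0.1  * np.eye(n_pix)    # measurement noise covariance
Sigma_m = 0.01 * np.eye(n_rep)    # process noise covariance

def kalman_step(r_hat, N_prev, I_t):
    """One predict/correct cycle of the filter above."""
    r_bar = V @ r_hat                          # r̄(t) = V r̂(t-1)
    M = V @ N_prev @ V.T + Sigma_m             # M(t) = V N(t-1) Vᵀ + Σ_m
    Si = np.linalg.inv(Sigma)
    N = np.linalg.inv(U.T @ Si @ U + np.linalg.inv(M))  # posterior covariance
    K = N @ U.T @ Si                           # K(t) = N(t) Uᵀ Σ⁻¹
    r_new = r_bar + K @ (I_t - U @ r_bar)      # correct with prediction error
    return r_new, N

# Recover a fixed hidden state from a stream of noisy "images".
r_true = np.array([1.0, -0.5])
r_hat, N = np.zeros(n_rep), np.eye(n_rep)
for _ in range(50):
    I_t = U @ r_true + 0.1 * rng.standard_normal(n_pix)
    r_hat, N = kalman_step(r_hat, N, I_t)
```

After enough measurements the estimate r̂ settles near the true state, with N quantifying the remaining uncertainty.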
20
A Simplified Kalman Filter
If Σ is diagonal and equal to σ²·1, then K(t) = (N(t)/σ²) U^T = G(t) U^T
Kalman filter equation is of the form:
New Estimate = Prediction + Gain x Prediction Error
UT = Feedforward Matrix
U = Feedback Matrix
V = Recurrent Matrix (Lateral Connections)
r̄(t) = V r̂(t-1)                              (Prediction)
r̂(t) = r̄(t) + G(t) U^T (I(t) - U r̄(t))
21
Neural Implementation via Predictive Coding
(Rao & Ballard, 1997,1999; Rao, 1999)
Predictive Coding Model:
Feedback = Prediction
Feedforward = Prediction Error
22
Clues from Cortical Anatomy
HigherArea
LowerArea
23
Hierarchical Organization of the Visual Cortex
Lower
Higher
24
Hierarchical Generative Model (Rao & Ballard, 1999)
Original Generative Model: I = U r + n
Hierarchical Generalization: r = U_h r_h + n_h
r_h = representation at a higher level
With Temporal Dynamics: r(t) = V r(t-1) + U_h r_h(t-1) + m(t-1)
Can derive Kalman filter equations for each level; yields a Hierarchical Model for Predictive Coding
[Diagram: I is generated from r, which in turn is generated from r_h]
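The two-level inference can be sketched without the full Kalman machinery by doing gradient descent on the stacked prediction errors (a static, unit-noise-variance simplification of the model above; all sizes, weights, and the learning rate are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: 16-pixel image -> 6-dim level-1 code r -> 3-dim r_h.
U  = rng.standard_normal((16, 6)) * 0.5   # level-1 generative weights
Uh = rng.standard_normal((6, 3)) * 0.5    # level-2 generative weights
I  = rng.standard_normal(16)              # input "image"

r, rh = np.zeros(6), np.zeros(3)
lr = 0.05
for _ in range(500):
    e0 = I - U @ r        # prediction error at the input level
    e1 = r - Uh @ rh      # error between level-1 code and top-down prediction
    # Gradient descent on E = |e0|^2 + |e1|^2 (unit noise variances assumed):
    # r is pushed by the bottom-up error and pulled by the top-down prediction.
    r  += lr * (U.T @ e0 - e1)
    rh += lr * (Uh.T @ e1)

# Inference should reduce the total prediction error relative to r = rh = 0.
E_final = np.sum((I - U @ r) ** 2) + np.sum((r - Uh @ rh) ** 2)
E_zero  = np.sum(I ** 2)
```

The structure of the update mirrors the predictive coding claim: each level receives an error from below (U^T e0) and a prediction from above (Uh rh, entering through e1).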
25
Hierarchical Predictive Coding Model
[Diagram: hierarchical predictive coding network; input I is predicted as U r at the lower level, and r is predicted from above as U_h r_h]
(Rao & Ballard, 1997, 1999)
26
The Predictive Coding Hypothesis
Feedback connections from higher areas convey predictions of expected activity in lower areas
Feedforward connections convey the errors between actual and predicted responses
Model Prediction
Since feedforward connections to higher areas originate from layer 2+3, responses of layer 2+3 neurons should be
interpretable as prediction errors
27
Results from the Classic Studies of Hubel and Wiesel (1960s)
28
“Endstopping” in Cortical Neurons
29
Contextual Modulation in Visual Cortex
(Zipser et al., 1996 )
30
Example Network for Predictive Coding
31
Natural Images used for Training
32
Synaptic Weights after Learning
33
Endstopping as a Predictive Error Signal
34
Comparison with Layer 2+3 Cortical Neuron
35
Why Does Endstopping Occur in the Model?
Orientation-Dependent Correlations in Natural Images
36
Other Contextual Effects in the Model
37
Support for Predictive Coding from an Imaging Study
(Murray et al., 2002)
38
Predictive Coding in the Retina
From:Nicholls et al., 1992
Response of a retinal ganglion cell can be interpreted as the difference (error) between center pixel values and their prediction based on surrounding pixels (Srinivasan et al., 1982)
Receptive Fields:
On-center off-surround (+ center, − surround)
Off-center on-surround (− center, + surround)
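The center-minus-surround idea can be shown in a toy 1D "retina": each output is the center pixel minus a prediction formed from its neighbors. The equal-weight surround used here is an illustrative simplification, not the fitted weights of Srinivasan et al. (1982):

```python
import numpy as np

def ganglion_response(pixels):
    """Prediction-error code: center pixel minus the mean of its neighbors."""
    pixels = np.asarray(pixels, dtype=float)
    center = pixels[1:-1]
    surround_prediction = 0.5 * (pixels[:-2] + pixels[2:])
    return center - surround_prediction     # only the unpredicted part is sent

# A uniform patch is perfectly predicted -> zero response everywhere;
# an edge is unpredictable -> response concentrated at the discontinuity.
flat = ganglion_response([5, 5, 5, 5, 5])
edge = ganglion_response([0, 0, 0, 10, 10])
```

This is the efficiency argument in miniature: redundant (predictable) structure is removed before transmission, so only surprises cost spikes.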
39
Predictive Coding in the LGN
Temporal Receptive Field of LGN X-cell
From:Dan et al., 1996
LGN cell responses
Response of LGN cell can be interpreted as the difference (error) between current pixel values and their prediction based on past pixel values
40
Summary for Part I
Computational and experimental studies point to the need for probabilistic models of brain function
Probabilistic models typically rely on generative models of sensory (and motor) processes
We examined a simple linear generative model and its hierarchical generalization: Bayesian inference via Kalman filtering; neural implementation allows Hierarchical Predictive Coding
Feedback connections convey predictions; feedforward connections convey errors in prediction
Hierarchical predictive coding explains endstopping and other contextual surround effects based on natural image statistics
41
Break
Questions to Ponder over:
1. Can we go beyond linear generative models and Gaussian distributions?
2. Can a neural population encode an entire probability distribution rather than simply the mean or mode?
42
Generative Models II: Graphical Models
Graphical models depict the generative process as a graph:
Nodes denote random variables (states)
Edges denote dependencies
Example: If states are continuous, linear generative model: I = Ur + n
P(I | r) = N(I; Ur, Σ)
r
I
Earthquake Burglar
Radio Alarm
43
Continuous versus Discrete States
[Plots: a unimodal distribution (e.g., Normal N(x; μ, σ)), a multimodal distribution, and its discrete approximation over discrete states 1, …, i, …, M]
44
The Belief Propagation Algorithm
If states are discrete, probabilities of random variables can be calculated through “belief propagation” (Pearl, 1988):
Each node j sends a “message” (probability density) to every neighbor i
The message to neighbor i depends on the messages received from all other neighbors

m_{ji}(x_i) = Σ_{x_j} ψ(x_i, x_j) φ_j(x_j) Π_{k ∈ N(j)\i} m_{kj}(x_j)
Earthquake Burglar
Radio Alarm
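The message equation can be exercised on a toy tree. Rather than the Earthquake/Burglar network, this sketch uses a three-node chain of binary variables with made-up potentials, and checks the belief-propagation marginal against brute-force enumeration:

```python
import numpy as np

# Chain x1 - x2 - x3 of binary variables.
# psi(a, b) is the (symmetric) pairwise compatibility on both edges;
# phi_k is the local evidence at node k. All numbers are illustrative.
psi = np.array([[0.9, 0.1],
                [0.1, 0.9]])
phi1 = np.array([0.7, 0.3])
phi2 = np.array([0.5, 0.5])
phi3 = np.array([0.4, 0.6])

# m_{j->i}(x_i) = sum_{x_j} psi(x_i, x_j) phi_j(x_j) * (incoming messages)
m12 = psi @ phi1               # x1 has no other neighbors
m23 = psi @ (phi2 * m12)       # x2 folds in the message from x1
belief3 = phi3 * m23
belief3 /= belief3.sum()       # marginal P(x3)

# Brute-force marginal for comparison.
p = np.zeros(2)
for a in range(2):
    for b in range(2):
        for c in range(2):
            p[c] += phi1[a] * phi2[b] * phi3[c] * psi[a, b] * psi[b, c]
p /= p.sum()
```

On trees like this, belief propagation is exact; the two marginals agree.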
45
An Example: Hidden Markov Models (HMMs)
A Simple but Powerful Graphical Model for Temporal Data: The observed world can be in one of M states θ1, θ2, …, θM
The state θ_t at time step t depends only on the previous state θ_{t-1} and is given by the transition probabilities:
P(θ_t = i | θ_{t-1} = j)   (written P(θ_t^i | θ_{t-1}^j) for convenience)
The input I_t at time t is given by P(I_t | θ_t = j)
Graphical Model for an HMM:
[Diagram: states θ_{t-2} → θ_{t-1} → θ_t (top row), inputs I_{t-2}, I_{t-1}, I_t (bottom row)]
46
Inference in HMMs
P(θ_t^i, I_t | I_{t-1}, …, I_1) = P(I_t | θ_t^i) P(θ_t^i | I_{t-1}, …, I_1)
                                = P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t-1}^j) P(θ_{t-1}^j, I_{t-1} | I_{t-2}, …, I_1)

Likelihood of i at time t × Prediction for i at time t
[Diagram: states θ_{t-2} → θ_{t-1} → θ_t, inputs I_{t-2}, I_{t-1}, I_t]
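The forward recursion above runs in a few lines: predict with the transition matrix, weight by the likelihood, normalize. The transition matrix, likelihoods, and observation sequence are toy values chosen for illustration:

```python
import numpy as np

A = np.array([[0.8, 0.2],        # A[i, j] = P(theta_t = i | theta_{t-1} = j)
              [0.2, 0.8]])

def likelihood(obs):
    """P(I_t | theta_t = i) for a binary observation (toy values)."""
    return np.array([0.9, 0.2]) if obs == 1 else np.array([0.1, 0.8])

observations = [1, 1, 0, 1]
posterior = np.full(2, 0.5)          # uniform initial belief
for obs in observations:
    prediction = A @ posterior                # sum_j P(i | j) P(j | past)
    joint = likelihood(obs) * prediction      # likelihood x prediction
    posterior = joint / joint.sum()           # P(theta_t = i | I_1 .. I_t)
```

State 0 explains most of the observations here, so the filtered posterior ends up favoring it.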
47
Equivalence to Belief Propagation for HMMs
Equivalent to on-line (“forward”) belief propagation through time:

m_{ji}(x_i) = Σ_{x_j} ψ(x_i, x_j) φ_j(x_j) Π_{k ∈ N(j)\i} m_{kj}(x_j)

[Diagram: states θ_{t-2} → θ_{t-1} → θ_t, inputs I_{t-2}, I_{t-1}, I_t]

m^i_{t,t+1} = P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t-1}^j) m^j_{t-1,t}
48
Can a network of neurons perform this computation?
m^i_{t,t+1} = P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t-1}^j) m^j_{t-1,t}
49
Recurrent Network Model
Leaky Integrator Equation for Output Firing Rate v:

dv/dt = -v + W I + R v
(output decay + feedforward input W I + recurrent feedback R v)

W = feedforward synaptic weights, R = recurrent synaptic weights, I = input
50
Discrete Implementation
v_i(t+1) = v_i(t) + (-v_i(t) + w_i I_t + Σ_j R_ij v_j(t))

i.e.   v_i(t+1) = w_i I_t + Σ_j r_ij v_j(t)
       (new activity = input + prior activity)
51
Can this equation implement Belief Propagation for HMMs?
Recurrent network:    v_i(t+1) = w_i I_t + Σ_j r_ij v_j(t)
Belief propagation:   m^i_{t,t+1} = P(I_t | θ_t^i) Σ_j P(θ_t^i | θ_{t-1}^j) m^j_{t-1,t}
52
Consider Belief Propagation in Log Domain
Equation for a recurrent network:   v_i(t+1) = w_i I_t + Σ_j r_ij v_j(t)

Belief propagation in the log domain:
log m^i_{t,t+1} = log P(I_t | θ_t^i) + log Σ_j P(θ_t^i | θ_{t-1}^j) m^j_{t-1,t}
53
Bayesian Inference in a Recurrent Network
Network can perform Bayesian inference using:

w_i I_t = log P(I_t | θ_t^i)                                    (log likelihood)
Σ_j r_ij v_j(t) ≈ log Σ_j P(θ_t^i | θ_{t-1}^j) m^j_{t-1,t}      (log prior/prediction)

so that   v_i(t+1) = w_i I_t + Σ_j r_ij v_j(t) = log m^i_{t,t+1}
(log posterior = log likelihood + log prior + normalization)
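A small numerical check of this correspondence. The sketch stores v_i = log m_i and computes the recurrent term log Σ_j P(i | j) exp(v_j) exactly; the network in the slides approximates this nonlinearity with linear recurrent weights Σ_j r_ij v_j. Transition and likelihood values are toy numbers:

```python
import numpy as np

A = np.array([[0.8, 0.2],        # P(theta_t = i | theta_{t-1} = j)
              [0.2, 0.8]])

def log_domain_step(v, log_lik):
    """One update of a 'network' whose activities are log posteriors."""
    recurrent = np.log(A @ np.exp(v))       # log prediction from past belief
    v_new = log_lik + recurrent             # feedforward + recurrent terms
    return v_new - np.log(np.sum(np.exp(v_new)))   # normalization

v = np.log(np.full(2, 0.5))                 # start from a uniform belief
log_liks = [np.log([0.9, 0.2]),             # evidence for state 0
            np.log([0.1, 0.8])]             # then evidence for state 1
for ll in log_liks:
    v = log_domain_step(v, ll)

posterior = np.exp(v)   # should match the standard forward algorithm
```

Exponentiating the activities recovers exactly the posterior the forward algorithm would compute on the same observations, which is the equivalence the slide asserts.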
54
Example 1: Orientation Discrimination Task
Feedforward weights w_i (= F(θ_i)): a set of 36 oriented filters spanning orientations θ_i = 0°, 5°, 10°, …, 175°
Transition probabilities P(θ_t = i | θ_{t-1} = j) = 1 if i = j, 0 otherwise
Input images = oriented edge plus additive Gaussian noise
t = 1 t = 2 t = 3 t = 4 t = 5 t = 6
…
55
Demo: Orientation Discrimination
Input Image Sequence
Log likelihood computed from Feedforward Weights
Posterior computed by the Network over time
Orientation Estimation: Pick the preferred orientation of the neuron with maximum response (Maximum a Posteriori (MAP) Estimation)
[Plot: Response vs. Neurons]
56
Example 2: Motion Detection Task
• The Task: Guess the direction of motion of the coherently moving dots (UP/DOWN or LEFT/RIGHT)
Coherence of the dots controls task difficulty; this task is widely used to study decision making in humans and monkeys (e.g., Shadlen and Newsome, 2001)
Example Stimuli: 5% coherence, 50% coherence
57
Network for Motion Detection
Let θ_ij encode (stimulus location i, motion direction j)
We can create a network for detection of 1D motion direction by selecting appropriate transition probabilities P(θ_ij | θ_kl)
[Diagram: rightward-selective chain P(θ_iR | θ_kR), leftward-selective chain P(θ_kL | θ_jL); input image feeds feedforward filters F(θ_i)]
58
Feedforward Weights
Spatial Location
F(θ_1) F(θ_2) … F(θ_15)
59
Recurrent Weights
Transition Probabilities (θ_{t-1} → θ_t)  →  Recurrent Weights

Recurrent weights r_ij chosen such that:  Σ_j r_ij log m_j ≈ log Σ_j P(x_i | x_j) m_j

[Matrix plot: recurrent weights from neuron j to neuron i, with rightward- and leftward-selective blocks]
60
Network Output for Moving Inputs
[Plots: network output for Rightward and Leftward moving inputs; rows show right-selective and left-selective neurons; panels show log likelihoods (log P(I_t | θ_t^i) + b), log posterior, and posterior]
61
Solving the Random Dots Task
Neurons in the network compute log posterior probabilities:
Random dots task: Need to decide whether majority of dots are moving Left or Right
Compute posterior probability of L and R by summing over all locations xi (marginalize over xi)
log P(x_i, L | I_t, …, I_1)   and   log P(x_i, R | I_t, …, I_1)

P(L | I_t, …, I_1) = Σ_i P(x_i, L | I_t, …, I_1)
P(R | I_t, …, I_1) = Σ_i P(x_i, R | I_t, …, I_1)

[Decision neurons L and R sum over the location-selective neurons]
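The marginalization step is a single sum once the joint posteriors are in hand. The joint log posteriors below are made-up numbers for five locations, purely for illustration:

```python
import numpy as np

# Hypothetical joint log posteriors log P(x_i, d | I_1..t) for 5 locations
# and two directions d in {L, R}; the values are invented for illustration.
log_joint_L = np.log([0.02, 0.05, 0.04, 0.03, 0.06])
log_joint_R = np.log([0.10, 0.20, 0.15, 0.25, 0.10])

# Decision "neurons" marginalize over location: P(d | I) = sum_i P(x_i, d | I)
P_L = np.exp(log_joint_L).sum()
P_R = np.exp(log_joint_R).sum()
decision = "R" if P_R > P_L else "L"
```

Note the exponentiation before summing: since the neurons carry log probabilities, a literal sum of their activities would compute a product of probabilities, not the marginal.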
62
Probabilistic Motion Detection in a Model Network
Demo 1: Activities in a model network for noisy motion. Activities represent posterior probabilities of leftward/rightward motion.
Demo 2: Activities of model “decision” neurons. Decision neurons sum up log posterior probabilities over time. Solid line = Leftward motion, Dotted line = Rightward motion.
Demo 3: Effect of making the stimulus more noisy. Longer decision times for noisier stimuli.
63
Reaction Time depends on Coherency
Rate of evidence accumulation depends on stimulus coherency
Reaction Time (decision-making time)
Shorter reaction times for more coherent stimuli
40% coherency 60% coherency 80% coherency
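The coherence/reaction-time relationship can be reproduced with a toy accumulate-to-threshold model: each time step adds an evidence sample whose mean grows with coherence, and the decision fires at a fixed bound. The drift scaling, noise level, and threshold below are illustrative parameters, not fitted to the model network or to data:

```python
import numpy as np

rng = np.random.default_rng(3)

def reaction_time(coherence, threshold=3.0, noise=1.0, max_steps=10_000):
    """Steps until accumulated evidence crosses +/- threshold (toy model)."""
    evidence = 0.0
    for t in range(1, max_steps + 1):
        # Mean evidence per step scales with coherence; noise is additive.
        evidence += coherence * 0.1 + noise * rng.standard_normal()
        if abs(evidence) >= threshold:
            return t
    return max_steps

# Mean reaction times for three coherence levels ("40%, 60%, 80%").
rts = {c: np.mean([reaction_time(c) for _ in range(200)])
       for c in (4.0, 6.0, 8.0)}
```

Higher coherence means a steeper average climb toward the bound, hence shorter mean reaction times, matching the trend in the plots.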
64
Two Brain Areas involved in Visual Decision Making
65
“Decision Neurons” in cortical area LIP
Monkey deciding direction of motion in random dots task
Plot shows average response in LIP to stimuli with different noise levels
Model neuron responses resemble LIP activities
Slower rise to threshold for noisier stimuli
t (ms)
(Roitman and Shadlen, 2002)
66
“Decision” Neurons in Frontal Cortex
Monkey making an eye movement to an “odd-ball” target among a field of distractors
Monkey’s reaction time distribution can be predicted from threshold crossings!
Data from (Schall & Thompson, 1999)
67
Distribution of Reaction Times in the Model
[Histograms: Frequency vs. Reaction Times (number of time steps); 60% Coherence (range 0–200) and 90% Coherence (range 0–80)]
68
What if we increase the prior for Leftward motion?
Higher prior for L
(Based on www.physiol.cam.ac.uk/staff/carpente/recinormal.htm)
69
Model Prediction: Increasing Prior for Left Motion
[Histograms: Frequency vs. Reaction Times (number of time steps); Left/Right equally probable (range 0–200) vs. Left more probable than Right (range 0–100); 60% coherence in both]

Distribution shifts: shorter reaction times for Left trials
70
What if speed is more important than accuracy?
Lower threshold for making faster decisions
(Based on www.physiol.cam.ac.uk/staff/carpente/recinormal.htm)
71
Model Prediction: Imposing an “Urgency” Constraint
[Histograms: Frequency vs. Reaction Times (number of time steps); Decision Threshold = T (T = 0.03, range 0–200) vs. Decision Threshold = T/2 (T = 0.015, range 0–100)]

Distribution shifts: shorter reaction times
72
What about Spikes?
Recall the leaky integrator equation:

dv_i/dt = -v_i(t) + w_i I_t + Σ_j R_ij v_j(t)

Assume v_i is linearly related to the membrane potential V_m^i of neuron i as follows:

V_m^i = k v_i + T

For the standard integrate-and-fire model with additive noise, one can show (Plesser & Gerstner, 2000; Gerstner, 2000):

P(spike(t+1) | V_m^i(t+1)) ≈ e^{(V_m^i(t+1) - T)/k} = e^{v_i(t+1)} = m^i_{t,t+1} = Posterior probability of θ^i
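The decoding step is worth seeing numerically: if V_m = k·v + T with v = log m, then exp((V_m − T)/k) recovers the posterior itself, and spiking with that probability amounts to sampling from the posterior. The constants k and T and the posterior value below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
k, T = 2.0, -1.0                      # arbitrary scaling constants

posterior = 0.25                      # m: posterior prob. of the neuron's state
v = np.log(posterior)                 # network activity = log posterior
V = k * v + T                         # membrane potential encoding it

p_spike = np.exp((V - T) / k)         # invert the encoding: equals posterior
spikes = rng.random(100_000) < p_spike
rate = spikes.mean()                  # empirical spike probability
```

The long-run spike rate converges to the encoded posterior, so downstream neurons can read out probabilities simply by counting spikes.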
73
Example
Membrane Potential (log posterior):
V_m^i(t+1) ∝ log P(θ^i, I(t+1) | I(t), …, I(1))

Sampled Spikes:
P(spike(t+1) | V_m^i(t+1)) ≈ e^{(V_m^i(t+1) - T)/k}

Postsynaptic Membrane Potential (decoded log posterior): recipient neuron with alpha synapse
74
What about Top-Down Information?
Hypothesis: Top-down priors influence lower-level probability estimates
75
Probabilistic Graphical Model → Hierarchical Network
• Top-down feedback conveys prior probability for spatial locations
• Posterior probability at lower level computed from prior & image
Hierarchical Belief Propagation in Cortical Networks
(Rao, NIPS, 2004)
76
Attention can restore V4 responses in the presence of distractors (Reynolds et al., 1999)
Reference stimulus only
Reference and probe (No Attention)
Reference and probe (with Attention)
Example: Modeling Spatial Attention in V4
77
Attentional Restoration of Responses in the Model
Reference only Ref. and probe Ref. and probe with attention
(Rao, NIPS, 2004)
78
Related Work on Probabilistic Models
Linear Generative Model: Sparse Coding Models (Olshausen & Field, 1996; 1997); ICA (Bell & Sejnowski, 1997)
Hierarchical Model: MacKay, 1956; Mumford, 1992; Kawato et al., 1993; Dayan et al., 1995; Lee & Mumford, 2003; Friston, 2003; Hawkins, 2004
Encoding Uncertainty and Belief Propagation with Neurons: Anderson & Van Essen, 1994; Zemel et al., 1998; Pouget et al., 2000; Deneve, NIPS, 2004; Yu & Dayan, NIPS, 2004; Zemel et al., NIPS, 2004
79
Summary and Conclusions (“Posterior” for this lecture)
There is growing evidence that the brain utilizes probabilistic principles such as Bayesian inference
This lecture explored two neural models for Bayesian inference:
Predictive Coding: Feedback connections convey predictions while feedforward connections carry errors
Belief Propagation: The membrane potential encodes log posterior probability via belief propagation; spiking probability equals the posterior probability of the state encoded by the neuron
Some broad predictions of the models:
Cortical architecture implements a graphical model of the sensory (and motor) environment
Cortical networks perform hierarchical Bayesian inference
Corticocortical feedback conveys predictions or prior probabilities
80
(http://employees.csbsju.edu/tcreed/pb/pdoganim.html)
Open Problems:
Synaptic Plasticity: Role of STDP and short-term plasticity in Bayesian models
Neural Implementation of Sensorimotor Bayesian models
Incorporating rewards (Pavlovian conditioning, etc.) …
Future Directions