TRANSCRIPT
Information Theory Slides
Jonathan Pillow
Barlow’s “Efficient Coding Hypothesis” (Barlow 1961; Atick & Redlich 1990)
• goal of nervous system: maximize information about environment (one of the core “big ideas” in theoretical neuroscience)
• mutual information: avg # of yes/no questions you can answer about x given y (“bits”); I(x;y) = H(y) - H(y|x), i.e., response entropy minus “noise” entropy
• channel capacity: upper bound on mutual information, C = max over P(x) of I(x;y); determined by physical properties of the encoder
• redundancy: R = 1 - I(x;y)/C
(these quantities are computed numerically in the sketch below)
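These definitions can be checked numerically on a small discrete channel. A minimal Python sketch (the joint table p_xy is an invented example, not from the slides; it happens to define a binary symmetric channel, for which capacity has a closed form):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# invented joint distribution p(x, y): 2 stimuli x 2 responses
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)   # marginals

# mutual information: I = H(x) + H(y) - H(x, y) = H(y) - H(y|x)
I = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

# this p_xy is a binary symmetric channel with crossover 0.2,
# whose capacity is C = 1 - H(0.2); uniform input achieves it
C = 1 - entropy(np.array([0.2, 0.8]))
R = 1 - I / C                                    # redundancy
print(f"I = {I:.3f} bits, C = {C:.3f} bits, redundancy R = {R:.3f}")
```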
Barlow’s original version (Barlow 1961; Atick & Redlich 1990):
• mutual information: I(x;y) = H(y) - H(y|x) = response entropy - “noise” entropy
• if responses are noiseless, the “noise” entropy H(y|x) = 0, so I(x;y) = H(y)
• ⇒ brain should maximize response entropy (see the sketch after this list):
  • use full dynamic range
  • decorrelate (“reduce redundancy”)
• mega impact: huge number of theory and experimental papers focused on decorrelation / information-maximizing codes in the brain
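A quick numerical illustration of the noiseless argument: when H(y|x) = 0, information equals response entropy, so a code that uses the full dynamic range uniformly carries more bits than a skewed one. Both response distributions below are invented:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# two hypothetical noiseless codes over 8 discrete response levels
p_skewed = np.array([0.5, 0.2, 0.1, 0.08, 0.05, 0.04, 0.02, 0.01])
p_uniform = np.full(8, 1 / 8)   # full dynamic range, used equally

# noiseless => I(x;y) = H(y): the uniform code transmits more
print(f"skewed:  {entropy(p_skewed):.2f} bits")   # ~2.17
print(f"uniform: {entropy(p_uniform):.2f} bits")  # 3.00
```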
basic intuition:
[Figure: a natural image; scatter plot of pixel i vs. pixel i+1 shows that nearby pixels exhibit strong dependencies; the desired encoding produces a neural representation in which neural response i vs. neural response i+1 is decorrelated]
Application Example: single neuron encoding stimuli from a distribution P(x)
• stimulus prior: Gaussian P(x)
• noiseless, discrete encoding
• Q: what solution for infomax? A: histogram equalization: set the response levels using the cdf of the stimulus prior, so that every response level is used equally often (sketch below)
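A numerical check of the histogram-equalization answer: passing Gaussian stimuli through their own cdf makes the response uniform, so each discrete response level is used equally often. A minimal sketch (stimulus distribution and bin count invented):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)   # stimuli drawn from a Gaussian prior P(x)

# infomax encoding for a noiseless neuron: response = cdf of the stimulus
y = norm.cdf(x)                     # y is approximately Uniform(0, 1)

# discretize into 8 equal-width response levels; usage should be ~equal
counts, _ = np.histogram(y, bins=8, range=(0.0, 1.0))
print(counts / counts.sum())        # each entry ≈ 0.125
```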
Laughlin 1981: blowfly light response
[Figure: measured response vs. luminance data overlaid on the cdf of light level]
• first major validation of Barlow’s theory
luminance-dependent receptive fields
Atick & Redlich 1990: extended the theory to noisy responses
[Figure: receptive-field weighting as a function of space, at three SNR levels]
• High SNR: “whitening” / decorrelating
• Middle SNR: partial whitening
• Low SNR: averaging / correlating
(illustrated numerically in the sketch below)
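One way to see the SNR dependence numerically is a frequency-domain toy model: take a 1/f² natural-image power spectrum and write the filter gain as a whitening term times a Wiener-style low-pass term. This is a schematic stand-in chosen for illustration, not the exact Atick & Redlich solution:

```python
import numpy as np

freqs = np.linspace(0.1, 10.0, 200)   # spatial frequency (arbitrary units)
S = 1.0 / freqs**2                    # ~1/f^2 signal power spectrum

def filter_gain(noise_var):
    """Illustrative infomax-style gain: whitening x Wiener low-pass."""
    whiten = 1.0 / np.sqrt(S)         # flattens the signal spectrum
    wiener = S / (S + noise_var)      # suppresses noise-dominated frequencies
    return whiten * wiener

for noise_var, label in [(1e-4, "high SNR"), (1e-1, "mid SNR"), (1e1, "low SNR")]:
    g = filter_gain(noise_var)
    print(f"{label}: gain peaks at f = {freqs[np.argmax(g)]:.2f}")
# high SNR: gain keeps rising with frequency (whitening / decorrelating)
# low SNR: gain peaks at low frequency (averaging / correlating)
```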
estimating entropy and MI from data
1. the “direct method” (Strong et al 1998)
[Figure: raster of spike responses to a repeated stimulus]
• fix bin size Δ; fix word length N
• e.g., Δ = 10 ms, N = 3 ⇒ 2³ = 8 possible words
• discretize the responses into binary words (samples like 001, 010, 010, 110, ...) and estimate word probabilities from their histogram, i.e., from histogram-based estimates of the probabilities p(R|Sj); then H = -∑P log P
• total entropy H(R): from the word distribution pooled over all blocks of size N
• noise entropy: entropy of the words at a fixed time across repeats, averaged over all blocks of size N
• Estimate is: I = H(R) - ⟨H(R|Sj)⟩, i.e., total word entropy minus the averaged noise entropy over all words (see the sketch after this list)
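A minimal plug-in version of the direct method on a synthetic raster (all parameters invented; a real analysis must also correct the bias of plug-in entropy estimates, e.g., via the extrapolations in Strong et al):

```python
import numpy as np

def word_entropy(words):
    """Plug-in entropy (bits) of a sample of discrete words."""
    _, counts = np.unique(words, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# synthetic raster: 100 repeats x 300 bins, binary (spike / no spike)
rng = np.random.default_rng(1)
n_trials, n_bins, N = 100, 300, 3
p_spike = np.where(rng.random(n_bins) < 0.2, 0.5, 0.1)  # invented psth
raster = (rng.random((n_trials, n_bins)) < p_spike).astype(int)

# chop each trial into length-N binary words, coded as integers 0..2^N - 1
blocks = raster[:, : n_bins - n_bins % N].reshape(n_trials, -1, N)
codes = blocks @ (2 ** np.arange(N))          # shape (n_trials, n_words)

H_total = word_entropy(codes.ravel())         # H(R): pool over all times
H_noise = np.mean([word_entropy(codes[:, t])  # H(R|Sj): across repeats at
                   for t in range(codes.shape[1])])  # each time, then averaged
print(f"I ≈ {H_total - H_noise:.3f} bits per {N}-bin word")
```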
2. “single-spike information” (Brenner et al 2000)
[Figure: raster and psth for a repeated stimulus]
Information per spike:
I₁ = (1/T) ∫₀ᵀ (r(t)/r̄) log₂(r(t)/r̄) dt
where r(t) is the psth and r̄ is the mean rate
• equal to the information carried by an inhomogeneous Poisson process
derivation of single-spike information
• with no knowledge of the stimulus, a single spike time tsp is uniform on [0, T]; entropy of Unif([0, T]) = log₂ T
• given the stimulus, p(tsp|stim) = r(tsp)/(r̄T), the normalized psth (r̄ = mean rate)
• I₁ = entropy of Unif([0, T]) - entropy of p(tsp|stim)
     = log₂ T + ∫₀ᵀ p(t|stim) log₂ p(t|stim) dt
     = (1/T) ∫₀ᵀ (r(t)/r̄) log₂(r(t)/r̄) dt
(a discretized version appears in the sketch below)
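A discretized version of the formula on an invented psth (bin width, duration, and rate profile are all made up):

```python
import numpy as np

dt = 0.001                                    # bin width (s)
t = np.arange(0.0, 1.0, dt)                   # trial duration T = 1 s
r = 20.0 + 15.0 * np.sin(2 * np.pi * 3 * t)   # invented psth, spikes/s
T, rbar = len(t) * dt, r.mean()               # duration and mean rate

# I1 = (1/T) ∫ (r/rbar) log2(r/rbar) dt, in bits per spike
ratio = r / rbar
I1 = np.sum(ratio * np.log2(ratio)) * dt / T
print(f"single-spike information ≈ {I1:.3f} bits/spike")
```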
3. decoding-based methods
So far we have focused on the formulation: I(x;y) = H(y) - H(y|x)
Decoding-based approaches focus on the alternative version: I(x;y) = H(x) - H(x|y)
Suppose we have a decoder to estimate the stimulus from spikes (e.g., MAP, or the Optimal Linear Estimator):
Stimulus x → Response y → Decoder → x̂
• Bound #1 (Data Processing Inequality): I(x;y) ≥ I(x; x̂)
• Bound #2: H(x|y) is at most the entropy of a Gaussian with cov = cov(residual errors x - x̂), since that Gaussian is the Maximum Entropy distribution with this covariance; hence I(x;y) ≥ H(x) - ½ log₂ det(2πe · cov(x - x̂)) (see the numerical sketch below)
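A sketch of Bound #2 using the Optimal Linear Estimator on synthetic Gaussian data (the encoding matrix and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50_000, 2
x = rng.normal(size=(n, d))                    # Gaussian stimuli
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])                     # invented encoding matrix
y = x @ A + 0.5 * rng.normal(size=(n, d))      # noisy responses

# optimal linear estimator (least squares): xhat = y @ W
W, *_ = np.linalg.lstsq(y, x, rcond=None)
xhat = y @ W

def gauss_entropy_bits(cov):
    """Entropy (bits) of a Gaussian with covariance cov: the maximum-entropy
    distribution among all distributions with that covariance."""
    return 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * cov))

H_x = gauss_entropy_bits(np.cov(x.T))             # stimulus entropy (x Gaussian)
H_res = gauss_entropy_bits(np.cov((x - xhat).T))  # upper bound on H(x|y)
print(f"I(x;y) ≥ {H_x - H_res:.3f} bits")         # Bound #2: lower bound on MI
```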