TRANSCRIPT
Information Theory Slides
Jonathan Pillow
Barlow’s “Efficient Coding Hypothesis” (Barlow 1961; Atick & Redlich 1990)
• goal of nervous system: maximize information about environment (one of the core “big ideas” in theoretical neuroscience)
• mutual information: avg # of yes/no questions you can answer about x given y (“bits”); I(x;y) = H(y) - H(y|x), i.e., response entropy minus “noise” entropy
• channel capacity: upper bound on mutual information, C = max over P(x) of I(x;y); determined by physical properties of the encoder
• redundancy: R = 1 - I(x;y)/C
(these quantities are computed numerically in the sketch below)
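These definitions can be checked numerically on a small discrete channel. A minimal Python sketch (the joint table p_xy is an invented example, not from the slides; it happens to define a binary symmetric channel, for which capacity has a closed form):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# invented joint distribution p(x, y): 2 stimuli x 2 responses
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)   # marginals

# mutual information: I = H(x) + H(y) - H(x, y) = H(y) - H(y|x)
I = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

# this p_xy is a binary symmetric channel with crossover 0.2,
# whose capacity is C = 1 - H(0.2); uniform input achieves it
C = 1 - entropy(np.array([0.2, 0.8]))
R = 1 - I / C                                    # redundancy
print(f"I = {I:.3f} bits, C = {C:.3f} bits, redundancy R = {R:.3f}")
```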
Barlow’s original version (Barlow 1961; Atick & Redlich 1990):
• mutual information: I(x;y) = H(y) - H(y|x) = response entropy - “noise” entropy
• if responses are noiseless, the “noise” entropy H(y|x) = 0, so I(x;y) = H(y)
• ⇒ brain should maximize response entropy (see the sketch after this list):
  • use full dynamic range
  • decorrelate (“reduce redundancy”)
• mega impact: huge number of theory and experimental papers focused on decorrelation / information-maximizing codes in the brain
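A quick numerical illustration of the noiseless argument: when H(y|x) = 0, information equals response entropy, so a code that uses the full dynamic range uniformly carries more bits than a skewed one. Both response distributions below are invented:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# two hypothetical noiseless codes over 8 discrete response levels
p_skewed = np.array([0.5, 0.2, 0.1, 0.08, 0.05, 0.04, 0.02, 0.01])
p_uniform = np.full(8, 1 / 8)   # full dynamic range, used equally

# noiseless => I(x;y) = H(y): the uniform code transmits more
print(f"skewed:  {entropy(p_skewed):.2f} bits")   # ~2.17
print(f"uniform: {entropy(p_uniform):.2f} bits")  # 3.00
```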
basic intuition:
[Figure: a natural image; scatter plot of pixel i vs. pixel i+1 shows that nearby pixels exhibit strong dependencies; the desired encoding produces a neural representation in which neural response i vs. neural response i+1 is decorrelated]
Application Example: single neuron encoding stimuli from a distribution P(x)
• stimulus prior: Gaussian P(x)
• noiseless, discrete encoding
• Q: what solution for infomax? A: histogram equalization: set the response levels using the cdf of the stimulus prior, so that every response level is used equally often (sketch below)
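A numerical check of the histogram-equalization answer: passing Gaussian stimuli through their own cdf makes the response uniform, so each discrete response level is used equally often. A minimal sketch (stimulus distribution and bin count invented):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)   # stimuli drawn from a Gaussian prior P(x)

# infomax encoding for a noiseless neuron: response = cdf of the stimulus
y = norm.cdf(x)                     # y is approximately Uniform(0, 1)

# discretize into 8 equal-width response levels; usage should be ~equal
counts, _ = np.histogram(y, bins=8, range=(0.0, 1.0))
print(counts / counts.sum())        # each entry ≈ 0.125
```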
Laughlin 1981: blowfly light response
[Figure: measured response vs. luminance data overlaid on the cdf of light level]
• first major validation of Barlow’s theory
luminance-dependent receptive fields
Atick & Redlich 1990: extended the theory to noisy responses
[Figure: receptive-field weighting as a function of space, at three SNR levels]
• High SNR: “whitening” / decorrelating
• Middle SNR: partial whitening
• Low SNR: averaging / correlating
(illustrated numerically in the sketch below)
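One way to see the SNR dependence numerically is a frequency-domain toy model: take a 1/f² natural-image power spectrum and write the filter gain as a whitening term times a Wiener-style low-pass term. This is a schematic stand-in chosen for illustration, not the exact Atick & Redlich solution:

```python
import numpy as np

freqs = np.linspace(0.1, 10.0, 200)   # spatial frequency (arbitrary units)
S = 1.0 / freqs**2                    # ~1/f^2 signal power spectrum

def filter_gain(noise_var):
    """Illustrative infomax-style gain: whitening x Wiener low-pass."""
    whiten = 1.0 / np.sqrt(S)         # flattens the signal spectrum
    wiener = S / (S + noise_var)      # suppresses noise-dominated frequencies
    return whiten * wiener

for noise_var, label in [(1e-4, "high SNR"), (1e-1, "mid SNR"), (1e1, "low SNR")]:
    g = filter_gain(noise_var)
    print(f"{label}: gain peaks at f = {freqs[np.argmax(g)]:.2f}")
# high SNR: gain keeps rising with frequency (whitening / decorrelating)
# low SNR: gain peaks at low frequency (averaging / correlating)
```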
estimating entropy and MI from data
1. the “direct method” (Strong et al 1998)
[Figure: raster of spike responses to a repeated stimulus]
• fix bin size Δ; fix word length N
• e.g., Δ = 10 ms, N = 3 ⇒ 2³ = 8 possible words
• discretize the responses into binary words (samples like 001, 010, 010, 110, ...) and estimate word probabilities from their histogram, i.e., from histogram-based estimates of the probabilities p(R|Sj); then H = -∑P log P
• total entropy H(R): from the word distribution pooled over all blocks of size N
• noise entropy: entropy of the words at a fixed time across repeats, averaged over all blocks of size N
• Estimate is: I = H(R) - ⟨H(R|Sj)⟩, i.e., total word entropy minus the averaged noise entropy over all words (see the sketch after this list)
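A minimal plug-in version of the direct method on a synthetic raster (all parameters invented; a real analysis must also correct the bias of plug-in entropy estimates, e.g., via the extrapolations in Strong et al):

```python
import numpy as np

def word_entropy(words):
    """Plug-in entropy (bits) of a sample of discrete words."""
    _, counts = np.unique(words, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# synthetic raster: 100 repeats x 300 bins, binary (spike / no spike)
rng = np.random.default_rng(1)
n_trials, n_bins, N = 100, 300, 3
p_spike = np.where(rng.random(n_bins) < 0.2, 0.5, 0.1)  # invented psth
raster = (rng.random((n_trials, n_bins)) < p_spike).astype(int)

# chop each trial into length-N binary words, coded as integers 0..2^N - 1
blocks = raster[:, : n_bins - n_bins % N].reshape(n_trials, -1, N)
codes = blocks @ (2 ** np.arange(N))          # shape (n_trials, n_words)

H_total = word_entropy(codes.ravel())         # H(R): pool over all times
H_noise = np.mean([word_entropy(codes[:, t])  # H(R|Sj): across repeats at
                   for t in range(codes.shape[1])])  # each time, then averaged
print(f"I ≈ {H_total - H_noise:.3f} bits per {N}-bin word")
```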
2. “single-spike information” (Brenner et al 2000)
[Figure: raster and psth for a repeated stimulus]
Information per spike:
I₁ = (1/T) ∫₀ᵀ (r(t)/r̄) log₂(r(t)/r̄) dt
where r(t) is the psth and r̄ is the mean rate
• equal to the information carried by an inhomogeneous Poisson process
derivation of single-spike information
• with no knowledge of the stimulus, a single spike time tsp is uniform on [0, T]; entropy of Unif([0, T]) = log₂ T
• given the stimulus, p(tsp|stim) = r(tsp)/(r̄T), the normalized psth (r̄ = mean rate)
• I₁ = entropy of Unif([0, T]) - entropy of p(tsp|stim)
     = log₂ T + ∫₀ᵀ p(t|stim) log₂ p(t|stim) dt
     = (1/T) ∫₀ᵀ (r(t)/r̄) log₂(r(t)/r̄) dt
(a discretized version appears in the sketch below)
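A discretized version of the formula on an invented psth (bin width, duration, and rate profile are all made up):

```python
import numpy as np

dt = 0.001                                    # bin width (s)
t = np.arange(0.0, 1.0, dt)                   # trial duration T = 1 s
r = 20.0 + 15.0 * np.sin(2 * np.pi * 3 * t)   # invented psth, spikes/s
T, rbar = len(t) * dt, r.mean()               # duration and mean rate

# I1 = (1/T) ∫ (r/rbar) log2(r/rbar) dt, in bits per spike
ratio = r / rbar
I1 = np.sum(ratio * np.log2(ratio)) * dt / T
print(f"single-spike information ≈ {I1:.3f} bits/spike")
```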
3. decoding-based methods
So far we have focused on the formulation: I(x;y) = H(y) - H(y|x)
Decoding-based approaches focus on the alternative version: I(x;y) = H(x) - H(x|y)
Suppose we have a decoder to estimate the stimulus from spikes (e.g., MAP, or the Optimal Linear Estimator):
Stimulus x → Response y → Decoder → x̂
• Bound #1 (Data Processing Inequality): I(x;y) ≥ I(x; x̂)
• Bound #2: H(x|y) is at most the entropy of a Gaussian with cov = cov(residual errors x - x̂), since that Gaussian is the Maximum Entropy distribution with this covariance; hence I(x;y) ≥ H(x) - ½ log₂ det(2πe · cov(x - x̂)) (see the numerical sketch below)
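A sketch of Bound #2 using the Optimal Linear Estimator on synthetic Gaussian data (the encoding matrix and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50_000, 2
x = rng.normal(size=(n, d))                    # Gaussian stimuli
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])                     # invented encoding matrix
y = x @ A + 0.5 * rng.normal(size=(n, d))      # noisy responses

# optimal linear estimator (least squares): xhat = y @ W
W, *_ = np.linalg.lstsq(y, x, rcond=None)
xhat = y @ W

def gauss_entropy_bits(cov):
    """Entropy (bits) of a Gaussian with covariance cov: the maximum-entropy
    distribution among all distributions with that covariance."""
    return 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * cov))

H_x = gauss_entropy_bits(np.cov(x.T))             # stimulus entropy (x Gaussian)
H_res = gauss_entropy_bits(np.cov((x - xhat).T))  # upper bound on H(x|y)
print(f"I(x;y) ≥ {H_x - H_res:.3f} bits")         # Bound #2: lower bound on MI
```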