Information theoretic models of auditory coding

Michael S. Lewicki†

Joint work with Evan Smith‡

Departments of Computer Science† and Psychology‡

Center for the Neural Basis of Cognition
Carnegie Mellon University

We gratefully acknowledge the following support:
NSF CAREER Award 0238351
NIH Training Grant MH19983

from Warren, 1999

The cochlea and inner ear

A simple model of auditory coding

[Figure: example auditory nerve filters plotted against time (ms). deBoer and deJongh, 1978; Carney and Yin, 1988]

Auditory nerve filters can be estimated using reverse correlation (spike-triggered averaging)

sound waveform → filterbank → static non-linearities → stochastic spiking → population spike code
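Reverse correlation can be illustrated with a small simulation. The sketch below is not the procedure of the cited studies: the model neuron (a damped-cosine filter followed by a hard threshold), the white-noise stimulus, and all parameter values are assumptions chosen only to show that averaging the stimulus before each spike recovers the filter shape.

```python
import math
import random

def spike_triggered_average(stimulus, spike_times, window):
    """Average the `window` stimulus samples that precede each spike."""
    sta = [0.0] * window
    count = 0
    for t in spike_times:
        if t < window:
            continue  # not enough stimulus history before this spike
        for k in range(window):
            sta[k] += stimulus[t - window + k]
        count += 1
    return [v / count for v in sta] if count else sta

# Hypothetical model neuron: linear filter followed by a fixed threshold.
rng = random.Random(0)
window = 32
true_filter = [math.exp(-k / 8.0) * math.cos(2 * math.pi * k / 8.0)
               for k in range(window)]
stimulus = [rng.gauss(0.0, 1.0) for _ in range(20000)]

spike_times = []
for t in range(window, len(stimulus)):
    # true_filter[k] weights the stimulus sample k+1 steps in the past
    drive = sum(true_filter[k] * stimulus[t - 1 - k] for k in range(window))
    if drive > 2.0:
        spike_times.append(t)

sta = spike_triggered_average(stimulus, spike_times, window)
# Reversed in time, the STA is proportional to the filter (up to noise).
estimated_filter = sta[::-1]
```

With white Gaussian noise as the stimulus, the estimate is proportional to the underlying filter; for other stimulus statistics a correction for stimulus correlations would be needed.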

Models are data driven

data: revcor filter

model: gammatone

“gammatone” function:

$$g(t) = a\, t^{n-1} e^{-bt} \cos(2\pi f t + \phi)$$

[Figure: model fit and residual error]
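As a minimal sketch, the gammatone function above can be evaluated directly. The default parameter values below (order n = 4, decay b, carrier frequency f, and the 16 kHz sampling rate) are illustrative assumptions, not values taken from the slides.

```python
import math

def gammatone(t, a=1.0, n=4, b=2 * math.pi * 125.0, f=1000.0, phi=0.0):
    """g(t) = a * t^(n-1) * exp(-b t) * cos(2 pi f t + phi) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return a * t ** (n - 1) * math.exp(-b * t) * math.cos(2 * math.pi * f * t + phi)

fs = 16000  # assumed sampling rate in Hz
kernel = [gammatone(k / fs) for k in range(int(0.02 * fs))]

# The envelope t^(n-1) exp(-b t) peaks at t = (n - 1) / b.
peak_index = max(range(len(kernel)), key=lambda k: abs(kernel[k]))
envelope_peak_time = (4 - 1) / (2 * math.pi * 125.0)
```

The parameter b controls how quickly the envelope decays and therefore the bandwidth; f sets the center frequency.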


• The aim of both theories and models is to explain data

- Models are data driven.

- Theories are driven by principles.

- Theories require a definition of an ideal.

A theoretical approach

Theoretical questions:

• Why gammatones?

• Why spikes?

• How is sound coded by the spike population?

How do we develop a theory?

Theoretical models

● explain data from principles

● require idealization and abstraction

Efficient coding

Spike coding

Coding efficiency

Explaining physiological data

Current directions


Efficient coding theory

• Barlow, 1961; Attneave, 1954

- main goal of sensory coding is to code signals efficiently

- sensory codes are adapted to the sensory environment

- each code “feature” should have minimal redundancy

- each feature should describe independent information

• caveats:

- applies to behaviorally relevant information

- not all redundancy is bad, e.g. when compensating for noise

Efficient coding of natural sounds

Lewicki (2002) Nat. Neurosci. 5:356-363

Goal:

Predict optimal transformation of sound waveform from statistics of the natural acoustic environment

• use simplest model: linear

• derive optimal code for sound in analysis window

• use ICA to learn optimal linear transform

[Figure: analysis window $x_{1:N}$]


What sounds to use?

• What are auditory systems adapted for doing?

- localization ⇒ environmental sounds

- communication ⇒ vocalizations

- general sound recognition ⇒ variety of sounds

• Learn codes for a variety of sound ensembles:

- non-harmonic environmental sounds (e.g. footsteps, stream sounds, etc.)

- animal vocalizations (rainforest mammals, e.g. chirps, screeches, cries, etc.)

- speech (from 100 male & female speakers from the TIMIT corpus)

Learning optimal linear transforms for natural sounds

[Figure: learned transforms for environmental sounds, vocalizations, and speech]

The optimal code depends on the class of sounds being encoded:

- a wavelet-like transform is best for environmental sounds

- a Fourier-like transform is best for vocalizations

- an intermediate transform is best for speech or general natural sounds

Efficient coding explains auditory nerve population data

Filter sharpness:

$$Q_{10\,\mathrm{dB}} = f_c / w_{10\,\mathrm{dB}}$$

[Figure, Theory: Q10dB versus center frequency (kHz), both on log axes, for codes learned from vocalizations (+), speech/combined (o), and environmental sounds (x)]

[Figure, Data: Q10dB versus characteristic frequency (kHz), both on log axes. Evans, 1975; Rhode and Smith, 1985]
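The filter sharpness measure divides the center frequency by the bandwidth measured 10 dB below the peak. The following sketch computes it for a made-up tuning curve; the triangular response and the helper names `bandwidth_at_drop` and `q10` are hypothetical and serve only to make the arithmetic concrete.

```python
def bandwidth_at_drop(freqs, gains_db, drop_db=10.0):
    """Width (Hz) of the range where the gain is within `drop_db` of the peak."""
    peak = max(gains_db)
    inside = [f for f, g in zip(freqs, gains_db) if g >= peak - drop_db]
    return max(inside) - min(inside)

def q10(fc, w10):
    """Q10dB = fc / w10dB: larger values mean sharper tuning."""
    return fc / w10

# Hypothetical tuning curve: peak at 2 kHz, falling 1 dB for every 20 Hz.
freqs = list(range(1000, 3001, 10))
gains_db = [-abs(f - 2000) / 20 for f in freqs]

w10 = bandwidth_at_drop(freqs, gains_db)  # response is 10 dB down at 1800 and 2200 Hz
sharpness = q10(2000.0, w10)
```

Because Q10dB is a ratio, a filter whose bandwidth grows in proportion to its center frequency (wavelet-like) has constant Q10dB, while a filter with fixed bandwidth (Fourier-like) has Q10dB rising with frequency.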

Theoretical models

● explain data from principles

● require idealization and abstraction

Efficient coding

● codes signals accurately and efficiently

● adapted to natural sensory environment

Spike coding

Coding efficiency

Explaining physiological data

Current directions


Limitations of the linear model

• it’s linear

• code is optimal only within a block, not for whole signal

• offers no explanation of phase-locking and spikes

• representation depends on the relative alignment of the signal and block


A continuous filterbank does not form an efficient code


Goal:

find a representation that is both time-relative and efficient

Efficient signal representation using time-shiftable kernels (spikes)

Smith and Lewicki (2005) Neural Comp. 17:19-45

$$x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_{m,i}\, \phi_m(t - \tau_{m,i}) + \epsilon(t)$$

• Each spike encodes the precise time and magnitude of a particular kernel

• Spike population forms a non-redundant signal representation

• Two important theoretical abstractions for “spikes”

- not binary

- not probabilistic

• Can convert to a population of stochastic, binary spikes
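To make the kernel model concrete, the sketch below synthesizes a signal from a list of spikes, each a triple of kernel index m, time shift tau (in samples), and amplitude s. The noise term is omitted, and the two short kernels and the spike list are invented for illustration.

```python
def synthesize(spikes, kernels, length):
    """x[t] = sum over spikes (m, tau, s) of s * kernels[m][t - tau]."""
    x = [0.0] * length
    for m, tau, s in spikes:
        for k, value in enumerate(kernels[m]):
            if 0 <= tau + k < length:
                x[tau + k] += s * value
    return x

# Hypothetical kernels and spikes (kernel index, shift in samples, amplitude).
kernels = [[1.0, 0.5], [0.0, -1.0, 1.0]]
spikes = [(0, 1, 2.0), (1, 3, 0.5), (0, 3, -1.0)]
x = synthesize(spikes, kernels, length=7)
```

Note that the amplitudes are real-valued and the spike times are deterministic, which matches the two abstractions listed on the slide: these “spikes” are neither binary nor probabilistic.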

The spikegram

[Figure: input, reconstruction, and residual waveforms above a spikegram; axes are time (0 to 25 ms) and kernel CF (100 to 5000 Hz)]

Comparing a spike code to a spectrogram

[Figure, panels a to c: spikegram (kernel CF, 100 to 5000 Hz) and spectrogram (frequency, 1000 to 5000 Hz) of the same sound over 0 to 800 ms]

How do we compute the spikes?

Encoding signals with spikes

• There are many possible algorithms, with varying degrees of biological plausibility

• Here, we use a variation of Matching Pursuit (Mallat and Zhang, 1993)

- yields near optimal spike representation, but is not biologically plausible

- assume there exists a biologically plausible algorithm that achieves the same end


Spike Coding with Matching Pursuit

1. convolve signal with kernels

2. find largest peak over convolution set

3. fit signal with kernel

4. subtract kernel from signal, record spike, and adjust convolutions

5. repeat . . .

6. halt when desired fidelity is reached
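The six steps above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes unit-norm kernels, uses tiny made-up kernels in place of gammatones, and recomputes every correlation on each pass instead of incrementally adjusting the convolutions as step 4 describes.

```python
import math

def correlate(signal, kernel, tau):
    """Inner product of the kernel with the signal starting at sample tau."""
    return sum(kernel[k] * signal[tau + k] for k in range(len(kernel)))

def matching_pursuit(signal, kernels, target_snr_db, max_spikes=1000):
    """Greedy spike coding; kernels are assumed to have unit norm."""
    residual = list(signal)
    signal_power = sum(v * v for v in signal)
    spikes = []
    while len(spikes) < max_spikes:
        noise_power = sum(v * v for v in residual)
        # Step 6: halt when the desired fidelity is reached.
        if noise_power == 0 or 10 * math.log10(signal_power / noise_power) >= target_snr_db:
            break
        # Steps 1-2: correlate with all kernels and find the largest peak.
        best = None
        for m, kernel in enumerate(kernels):
            for tau in range(len(residual) - len(kernel) + 1):
                c = correlate(residual, kernel, tau)
                if best is None or abs(c) > abs(best[2]):
                    best = (m, tau, c)
        m, tau, s = best
        if s == 0:
            break
        # Steps 3-4: the peak value is the fitted amplitude; subtract and record.
        for k, value in enumerate(kernels[m]):
            residual[tau + k] -= s * value
        spikes.append((m, tau, s))
        # Step 5: repeat.
    return spikes, residual

# Toy demo with two hypothetical unit-norm kernels.
kernels = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
signal = [0.0] * 20
for k in range(4):
    signal[2 + k] += 3.0 * kernels[0][k]
    signal[10 + k] += -2.0 * kernels[1][k]
spikes, residual = matching_pursuit(signal, kernels, target_snr_db=60.0)
```

Raising `target_snr_db` yields more spikes and a smaller residual, which is the fidelity versus cost trade-off.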

[Figures: spike codes of “can” at increasing fidelity. Each panel shows input, reconstruction, and residual above a spikegram; axes are time (0 to 200 ms) and kernel CF (100 to 5000 Hz)]

“can” 5 dB SNR, 36 spikes, 145 sp/sec

“can” 10 dB SNR, 93 spikes, 379 sp/sec

“can” 20 dB SNR, 391 spikes, 1700 sp/sec

“can” 40 dB SNR, 1285 spikes, 5238 sp/sec

Varying the number of gammatone kernels

Figure 6 (from “Coding time-relative structure with spikes”, E Smith and M S Lewicki): The number of kernel functions affects both the spectral resolution and the temporal sparseness of the spike codes. The input signal (top) was encoded using matching pursuit with 8, 16, 32 or 64 kernel functions (A-D, respectively). The total number of spikes in each is (A) 12011, (B) 1167, (C) 497 and (D) 479.

8 kernels, 12011 spikes

16 kernels, 1167 spikes

32 kernels, 497 spikes

64 kernels, 479 spikes

Coding efficiency in terms of spikes

[Figure: SNR (dB) versus cost (spikes/sec), log axes. Legend: Optim. Matching Pursuit, Optim. Filter-Threshold, Matching Pursuit, Filter-Threshold]

[Figure: SNR (dB) versus cost (spikes/sec) for Matching Pursuit; curve labels: 8 Filters, 256 Filters]

Efficient auditory coding with optimized kernel shapes

Smith and Lewicki (2006) Nature 439:978-982

What are the optimal kernel shapes?

$$x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_{m,i}\, \phi_m(t - \tau_{m,i}) + \epsilon(t)$$

Adapt algorithm of Olshausen (2002)

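Optimizing the probabilistic model

$$x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_i^m\, \phi_m(t - \tau_i^m) + \epsilon(t), \qquad \epsilon(t) \sim \mathcal{N}(0, \sigma_\epsilon)$$

$$p(x|\Phi) = \int p(x|\Phi, s, \tau)\, p(s)\, p(\tau)\, ds\, d\tau \approx p(x|\Phi, \hat{s}, \hat{\tau})\, p(\hat{s})\, p(\hat{\tau})$$

Learning (Olshausen, 2002):

$$\frac{\partial}{\partial \phi_m} \log p(x|\Phi) = \frac{\partial}{\partial \phi_m} \left[ \log p(x|\Phi, \hat{s}, \hat{\tau}) + \log p(\hat{s})\, p(\hat{\tau}) \right]$$

$$= -\frac{1}{2\sigma_\epsilon} \frac{\partial}{\partial \phi_m} \left[ x - \sum_{m=1}^{M} \sum_{i=1}^{n_m} \hat{s}_i^m\, \phi_m(t - \hat{\tau}_i^m) \right]^2$$

$$= \frac{1}{\sigma_\epsilon} \sum_i \hat{s}_i^m\, [x - \hat{x}]_{\hat{\tau}_i^m}$$

Here $\hat{x}$ is the reconstruction from the fitted spikes, and $[x - \hat{x}]_{\hat{\tau}_i^m}$ is the residual over the extent of kernel $m$ placed at time $\hat{\tau}_i^m$.

Also adapt kernel lengths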
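The final gradient expression says that each kernel is updated in the direction of the residual at the positions where it was used, weighted by the spike amplitudes. A minimal sketch follows, assuming the spikes are triples of kernel index, sample shift, and amplitude that lie within the signal. The step size, the renormalization to unit length, and the function names are assumptions added here; kernel length adaptation is not shown.

```python
import math

def kernel_gradients(signal, kernels, spikes, noise_var=1.0):
    """Gradient for each kernel: (1/sigma) * sum_i s_i^m * residual at tau_i^m."""
    x_hat = [0.0] * len(signal)
    for m, tau, s in spikes:
        for k, value in enumerate(kernels[m]):
            x_hat[tau + k] += s * value
    residual = [x - xh for x, xh in zip(signal, x_hat)]
    grads = [[0.0] * len(kernel) for kernel in kernels]
    for m, tau, s in spikes:
        for k in range(len(kernels[m])):
            grads[m][k] += s * residual[tau + k] / noise_var
    return grads

def update_kernels(kernels, grads, step=0.1):
    """One gradient ascent step, then rescale to unit norm (an assumption)."""
    updated = []
    for kernel, grad in zip(kernels, grads):
        new = [v + step * g for v, g in zip(kernel, grad)]
        norm = math.sqrt(sum(v * v for v in new)) or 1.0
        updated.append([v / norm for v in new])
    return updated

# Toy example: one kernel, one spike, and a signal the kernel does not yet explain.
signal = [0.0, 1.0, 2.0, 0.0]
kernels = [[1.0, 0.0]]
spikes = [(0, 1, 1.0)]
grads = kernel_gradients(signal, kernels, spikes)
new_kernels = update_kernels(kernels, grads)
```

In the toy example the unexplained sample at the kernel's second position pulls that kernel coefficient upward, so repeated updates would make the kernel resemble the recurring structure in the signal.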


Kernel functions are initialized to random vectors


Kernel functions optimized for coding speech

Theoretical models

● explain data from principles

● require idealization and abstraction

Efficient coding

● codes signals accurately and efficiently

● adapted to natural sensory environment

Spike coding

● non-linear, efficient for time-varying signals

● idealization of binary action potentials

Coding efficiency

Explaining physiological data

Current directions


Quantifying coding efficiency

1. fit signal

2. quantize time and amplitude values

3. prune zero values

4. measure coding efficiency using the entropy of quantized values

5. reconstruct signal using quantized values

6. measure fidelity using signal-to-noise ratio (SNR) of residual error

• identical procedure for other codes (e.g. Fourier and wavelet)
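The measurement steps can be sketched with three small helpers: uniform quantization, the empirical entropy of the quantized symbols (bits per value), and the SNR of a reconstruction. The quantization step size and the sample values are made up, and a full rate estimate would also have to count the quantized spike times, which this sketch leaves out.

```python
import math
from collections import Counter

def quantize(values, step):
    """Map each value to the index of its nearest quantization level."""
    return [round(v / step) for v in values]

def entropy_bits(symbols):
    """Empirical entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def snr_db(signal, reconstruction):
    """Signal-to-noise ratio of the residual error, in dB."""
    signal_power = sum(v * v for v in signal)
    noise_power = sum((a - b) ** 2 for a, b in zip(signal, reconstruction))
    if noise_power == 0:
        return float("inf")
    return 10 * math.log10(signal_power / noise_power)

# Hypothetical spike amplitudes from a fitted code.
amplitudes = [0.9, 1.1, 0.04, -2.1, 1.0]
step = 0.5
levels = quantize(amplitudes, step)        # step 2: quantize
kept = [q for q in levels if q != 0]       # step 3: prune zero values
bits_per_value = entropy_bits(kept)        # step 4: entropy of quantized values
dequantized = [q * step for q in kept]     # step 5: values used for reconstruction
```

A coarser `step` lowers the entropy, and hence the bit rate, but increases the residual error, which traces out one point on a rate versus fidelity curve per step size.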

$$x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_{m,i}\, \phi_m(t - \tau_{m,i}) + \epsilon(t)$$

[Figure: original input, reconstruction, residual error, and spike code; axes are time (0 to 200 ms) and kernel CF (100 to 5000 Hz)]

Coding efficiency curves

[Figure: SNR (dB, 0 to 40) versus rate (Kbps, 0 to 90). Legend: Spike Code: adapted; Spike Code: gammatone; Block Code: wavelet; Block Code: Fourier. Annotations: “4x more efficient” and “+14 dB”]

Theoretical models

● explain data from principles

● require idealization and abstraction

Efficient coding

● codes signals accurately and efficiently

● adapted to natural sensory environment

Spike coding

● non-linear, efficient for time-varying signals

● idealization of binary action potentials

Coding efficiency

● much more efficient than linear block codes

Explaining physiological data

Current directions

Page 48: Information theoretic models of auditory codingweb4.cs.ucl.ac.uk/staff/D.Barber/workshops/amac/... · A simple model of auditory coding 1 2 3 4 5 6 7 8 9 10 time (ms) 1 2 3 4 5 time

Using efficient coding theory to make theoretical predictions

Natural Sound Environment

optimal kernels:
• properties
• coding efficiency

physiological data:
• auditory nerve filter shapes
• population trends

evolution

?

Michael S. Lewicki, Carnegie Mellon. Bad Zwischenahn, Aug 21, 2004

A simple model of auditory coding

[Figure: auditory revcor filters (gammatones) plotted against time (ms); deBoer and deJongh, 1978; Carney and Yin, 1988]

sound waveform → filterbank → static non-linearities → stochastic spiking → population spike code
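For reference, the gammatone form given earlier in the talk, g(t) = a t^(n-1) e^(-bt) cos(2πft + φ), can be evaluated directly. The sketch below is only an illustration: the function name and the default values (order `n=4`, unit gain, zero phase) are assumptions, not parameters taken from the slides.

```python
import numpy as np

def gammatone(t, f, b, n=4, a=1.0, phi=0.0):
    """Evaluate g(t) = a * t**(n-1) * exp(-b*t) * cos(2*pi*f*t + phi) for t >= 0.

    f is the center frequency (Hz) and b the decay rate (1/s), which sets
    the bandwidth; the defaults are illustrative only.
    """
    t = np.asarray(t, dtype=float)
    return a * t ** (n - 1) * np.exp(-b * t) * np.cos(2.0 * np.pi * f * t + phi)

# Example: a 1 kHz kernel sampled at 16 kHz over 10 ms (values chosen arbitrarily).
t = np.arange(0, 0.010, 1.0 / 16000.0)
kernel = gammatone(t, f=1000.0, b=800.0)
```

The envelope t^(n-1) e^(-bt) peaks at t = (n-1)/b, so a larger b gives a shorter kernel with a wider bandwidth.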

More theoretical questions:

• Why gammatones?

• Why spikes?

• How is sound coded by the spike population?

How do we develop a theory?


Comparing a spike code to a spectrogram

How do we compute the spikes?

[Figure: (a) spike code, kernel CF (Hz) from 100 to 5000; (b) spectrogram, frequency (Hz) from 1000 to 5000, time axis 100 to 800 ms; (c) bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5]

Speech prediction (theory) vs. auditory nerve filters

We do not fit the data; we only compare to the data after optimizing.

Prediction depends on sound ensemble.


Natural sounds

vocalizations

environmental sounds: transient, ambient

fox

squirrel

walking on leaves

rustling leaves

cracking branches

stream by waterfall


Learned kernels share features of auditory nerve filters

Optimized kernels (scale bar = 1 msec)

Auditory nerve filters (from Carney, McDuffy, and Shekhter, 1999)


Learned kernels closely match individual auditory nerve filters

For each kernel, find the closest matching auditory nerve filter in Laurel Carney’s database of ~100 filters.
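The slide does not say how "closest" is measured. One plausible choice, used here purely as an assumption, is the peak of the normalized cross-correlation, which tolerates a time shift between kernel and filter. The function names are hypothetical.

```python
import numpy as np

def match_score(kernel, filt):
    """Peak normalized cross-correlation between two waveforms (assumed metric)."""
    k = np.asarray(kernel, dtype=float)
    f = np.asarray(filt, dtype=float)
    k = k / np.linalg.norm(k)                 # unit energy, so scores lie in [-1, 1]
    f = f / np.linalg.norm(f)
    return float(np.max(np.correlate(k, f, mode="full")))

def closest_filter(kernel, filters):
    """Return (index, score) of the best-matching filter in a list of waveforms."""
    scores = [match_score(kernel, f) for f in filters]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Both waveforms are assumed to be sampled at the same rate and to be non-zero; a score of 1 means identical shape up to a shift and a gain.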


Learned kernels overlaid on selected auditory nerve filters

For almost all learned kernels there is a closely matching auditory nerve filter.

Spike kernels for natural sound mix match revcor filters

Optimal kernels for environmental sounds are very short


Spike kernels for vocalizations are much longer and symmetric


Comparing learned kernels to auditory nerve population

[Figure: bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5; legend: Revcor, Environ, Vocal, Natural]

Population distribution of kernels for natural sounds

[Figure: bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5; legend: Revcor, Environ, Vocal, Natural]

Population distribution of kernels for environmental sounds

[Figure: bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5; legend: Revcor, Environ, Vocal, Natural]

Population distribution of kernels for animal vocalizations

[Figure: bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5; legend: Revcor, Environ, Vocal, Natural]

Kernel distributions for different sound ensembles

[Figure: bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5; legend: Revcor, Environ, Vocal, Natural]

Population distribution of kernels for natural sounds

[Figure: bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5; legend: Revcor, Environ, Vocal, Natural]

Population distribution of kernels for speech (TIMIT)

[Figure: bandwidth (kHz) versus center frequency (kHz), both axes spanning 0.1 to 5; legend: Revcor, Environ, Vocal, Speech]

Speech matches composition of natural sounds

ambient environmental sounds

animal vocalizations

transient environmental sounds

Best mix for predicting auditory coding: 1.0 : 0.8 : 1.2

speech

Theoretical models
● explains data from principles
● requires idealization and abstraction

Efficient coding
● codes signals accurately and efficiently
● adapted to natural sensory environment

Spike coding
● non-linear, efficient for time-varying signals
● idealization of binary action potentials

Coding efficiency
● much more efficient than linear block codes

Explaining physiological data
● theory explains gammatone revcor shapes
● also explains population trends

Current directions

Coding of a speech consonant

[Figure: panels a to c: input, reconstruction, and residual waveforms; spike code, kernel CF (Hz) from 100 to 5000, with an inset spanning 38 to 40 ms and 3400 to 4800 Hz; spectrogram, frequency (Hz) from 1000 to 5000, time axis 10 to 60 ms]

How is this achieving an efficient, time-relative code?

[Figure: panels a and b; spike code, kernel CF (Hz) from 100 to 5000, time axis 0 to 100 ms]

Time-relative coding of glottal pulses


Learning higher-order structure

[Figure: panels a to c: kernel CF (Hz) versus time (0 to 120 msec); autocorrelation, correlogram, and spike-triggered periodicity versus lag (-20 to 0 msec)]

Spike interval alignment


Low-frequency kernels do not precisely match the signal period

[Figure sequence: spike code, filter CF (Hz) from 100 to 5000, over 300 to 330 ms, with original, reconstruction, and residual waveforms; successive slides show the reconstruction at 3, 4, 5, 6, and 7 dB SNR]

Figures 2 to 5, from "Hierarchical spike models" (M. S. Lewicki); each spikegram plots filter CF (Hz), 100 to 5000, against time:

Figure 2: Spikegram of the 't' in 'vietnamese' at 30 dB SNR. (Time axis: 180 to 230 ms.)

Figure 3: Spikegram of onset in 'can' at fidelities of 20, 30, and 40 dB SNR. As the fidelity increases, the high-frequency structure of the consonant becomes more prominent. (Time axis: 25 to 65 ms.)

Figure 4: Spikegram of the vowel in 'can' at fidelities of 5, 10, and 20 dB SNR. (Time axis: 105 to 145 ms.)

Figure 5: Spikegram of the /a/ vowel in 'vietnamese' at fidelities of 15, 20, and 25 dB SNR. (Time axis: 300 to 330 ms.)

Learning general higher-order acoustic structure


Non-stationary statistical regularity in acoustic features

[Figure: panels a to c; traces labeled y1, R1, R2, and R3, with index axes from 1 to 256 and vertical axes from -8 to 4]

Karklin and Lewicki (2005) Neural Comp. 17:397-423


Generalizing the standard ICA model

$$P(\mathbf{s}) = \prod_i P(s_i)$$

$$P(s_i) \propto \exp\left(-\left|\frac{s_i}{\lambda_i}\right|^{q_i}\right)$$

[Figure: plot of P(s_i)]

Generalizing the standard ICA model

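$$P(\mathbf{s}) = \prod_i P(s_i)$$

$$P(s_i) \propto \exp\left(-\left|\frac{s_i}{\lambda_i}\right|^{q_i}\right)$$

$$P(u_i \mid \lambda_i) \propto \exp\left(-\left|\frac{u_i}{\lambda_i}\right|^{q_i}\right)$$

$$\log \lambda_i = [B\mathbf{v}]_i$$

$$-\log p(\mathbf{u} \mid B, \mathbf{v}) \propto \sum_i \left|\frac{u_i}{\exp([B\mathbf{v}]_i)}\right|^{q_i}$$

[Figure labels: P(s_i), P(u_i | λ_i), log λ_i]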
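The last expression is straightforward to evaluate numerically. The sketch below computes only the summed term shown on the slide, so any terms absorbed by the proportionality are left out. The function name and the array layout (`B` as a matrix with one column per density component, `q` as a per-coefficient exponent vector) are assumptions for illustration.

```python
import numpy as np

def neg_log_likelihood(u, B, v, q):
    """Evaluate sum_i |u_i / exp([B v]_i)| ** q_i.

    u : coefficients, shape (n,)
    B : density-component matrix, shape (n, k)
    v : higher-order variables, shape (k,)
    q : exponents, shape (n,) or scalar
    """
    log_lam = B @ v                           # log lambda_i = [B v]_i
    return float(np.sum(np.abs(u / np.exp(log_lam)) ** q))
```

With `v = 0` every scale is 1 and the expression reduces to the fixed-scale model; a non-zero `v` rescales groups of coefficients together through the columns of `B`.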


Independent density components

[Figure: density component B_j, with the distributions P(u | v_j = 2.0), P(u | v_j = 0.0), and P(u | v_j = -2.0)]

$$-\log p(\mathbf{u} \mid B, \mathbf{v}) \propto \sum_i \left|\frac{u_i}{\exp([B\mathbf{v}]_i)}\right|^{q_i}$$

Density components of speech

Learning Density Components 411

Figure 7: A subset of density components of speech. The weights in a column of B are plotted as shaded patches in one of the nine panels. Each patch is placed according to the temporal and frequency distribution of the associated linear basis function and shaded according to the value of the weight, with white indicating positive weights, black negative weights, and gray weights that are close to zero. The axes represent time, 0 to 16 msec, horizontally, and frequency, 0 to 8 kHz, vertically. The density components form a distributed representation of the frequency of the signal and the location of energy within the sample window. Density components coding for multiple frequencies might capture harmonic regularities in the speech signal (see the text for details).


Higher-level representations show invariant properties

[Figure: panels a to c]

Theoretical models
● explains data from principles
● requires idealization and abstraction

Efficient coding
● codes signals accurately and efficiently
● adapted to natural sensory environment

Spike coding
● non-linear, efficient for time-varying signals
● idealization of binary action potentials

Coding efficiency
● much more efficient than linear block codes

Explaining physiological data
● theory explains gammatone revcor shapes
● also explains population trends

Current directions
● hierarchical models, higher-order structure

Q: (I. J. Good)

I cannot help wondering whether it is not largely a prejudice to analyse signals in terms of frequency.

A: (D. Gabor)

There are in fact two good reasons for this preference:

1. In communication problems we deal usually with the infinite or semi-infinite time-axis, and

2. the problems are usually homogenous in time.

Once one or the other of these conditions is dropped, it may be well worth while to carry out the analysis in terms of other functions.

Symposium on Information Theory, Imperial College London, 1950