Modelling of sensory integration with neural network systems

Lennart Gustafsson, Andrew Paplinski & Tamas Jantvik

Q: Why integrate sensory information?

A: Because biology does it, at least for higher-order animals, and the animals gain from it…

“… the major functions of multisensory convergence and integration seem aimed at enhancing the detection of behaviourally-relevant stimuli, and of promoting rapidity of behavioural responding (e.g. motoric orienting). We would add cognitive processing (e.g. attentional orienting and cognition) to the list of functions that benefit from multisensory processing.”
From Schroeder et al.: “Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing”, Int. J. of Psychophysiology, 2003

Thus, in biology, sensory integration of congruent stimuli can yield:

• Shorter reaction time to an event
• Lower threshold for detecting an event (e.g. integrated sensory information gives larger responses and/or makes additional neurons active)
• Better (faster) learning of how to handle an event
• A multimodal percept that is much more robust against corrupted stimuli (against ‘noise’) than the individual unimodal percepts
• Modulation of unimodal percepts using the multimodal representation

All these advantages apply to situations where sensory stimuli in different modalities are congruent, i.e. they are different aspects of the same event.

Example: Response time to stimuli (in cortex)

From Laurienti et al.: “Semantic congruence is a critical factor in multisensory behavioural performance”, Experimental Brain Research, 2004

Response for congruent visual and auditory stimuli is quicker than for the corresponding unimodal stimulation.

Example: Multisensory integration can be a very early event

Multisensory interaction in a single neuron in the superior colliculus (located in the midbrain of mammals). Information flows through here at a very early stage of sensory processing, before cortical processing. Sensory integration can thus increase the response to an event. The picture shows relative response levels to visual stimuli, auditory stimuli and the combination of the two.

From King & Calvert: “Multisensory integration: Perceptual grouping by eye and ear”, Current Biology, 2001

A: … and it can develop in an automatic way

Example: Multisensory integration in neocortex

From Beauchamp: “See me, hear me, touch me: multisensory integration in lateral occipital-temporal cortex”, Current Opinion in Neurobiology, 2005

A closer look at responses of neurons

From Beauchamp et al.: “Unraveling multisensory integration: patchy organization within human STS multisensory cortex”, Nature Neuroscience, 2004

The neocortex develops automatically by learning from incoming stimuli, so it wouldn’t be unreasonable to assume that this integration mechanism has also been developed automatically.

Bimodal integration of auditory and visual stimuli

What if we could mimic the brain’s way of sensory integration and use it in engineering applications? We’ll start by modelling this integration process for visual letters and speech sounds.

From van Atteveldt et al.: “Integration of Letters and Speech Sounds in the Human Brain”, Neuron, 2004

An attempt at mimicry

We want to create an architecture that features as many as possible of the advantageous properties found in biological sensory integration of letters and phonemes. It should:

• Communicate with the outside world via input and output signals
• Support integration of congruent signals
• Increase the robustness of the integrated signal against corruption/noise
• Modulate unimodal sensory signals with the integrated one to converge to a coherent state
• Be tuned using the input signals (i.e. support somewhat automatic setup)

Since we aim to use the architecture in engineering applications, it should also be:

• Fast, at least during application (application as opposed to training)
• Versatile

And, in an initial approach it is always wise to keep things simple.

Step 1: A way of integration

A bimodal, hierarchical system, consisting of three Kohonen maps connected in a bottom-up, or feed-forward, fashion. Each network outputs the coordinates of its winner neuron. The bimodal map is trained with the concatenated winner coordinates of congruent stimuli (xlt, xph). We call this a MuSON: Multimodal Self-Organizing Network.

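[Figure: MuSON architecture. SOMlt (input xlt, dimension nlt) and SOMph (input xph, dimension nph) pass their two-dimensional winner coordinates, ylt and yph, to the bimodal map SOMbm, which outputs ybm.]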

Letters: 23-element vector based on principal component analysis of > 1000 pixel patterns representing letters

Phonemes: 36-element vector consisting of Mel frequency cepstral coefficients of average phonemes as spoken by ten Swedish speakers
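The feed-forward MuSON of Step 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SOM` class, the 6 x 6 map size, the Euclidean winner rule, the training schedule and the random placeholder stimuli are all assumptions made for the example. Only the input dimensions (23 and 36) and the idea of training the bimodal map on concatenated winner coordinates come from the slides.

```python
import math
import random


class SOM:
    """Tiny Kohonen map on a rows x cols grid (illustrative only)."""

    def __init__(self, rows, cols, dim, rng):
        self.rows, self.cols = rows, cols
        # One weight vector per neuron, stored row by row.
        self.weights = [[rng.random() for _ in range(dim)]
                        for _ in range(rows * cols)]

    def winner(self, x):
        """Grid coordinate (row, col) of the neuron closest to x."""
        dists = [sum((w - xi) ** 2 for w, xi in zip(ws, x))
                 for ws in self.weights]
        return divmod(dists.index(min(dists)), self.cols)

    def train(self, data, rng, epochs=30, lr0=0.5):
        sigma0 = max(self.rows, self.cols) / 2
        for epoch in range(epochs):
            frac = 1 - epoch / epochs           # decays from 1 towards 0
            lr, sigma = lr0 * frac, max(sigma0 * frac, 0.5)
            for x in rng.sample(data, len(data)):
                wr, wc = self.winner(x)
                for k, ws in enumerate(self.weights):
                    r, c = divmod(k, self.cols)
                    # Gaussian neighbourhood around the winner.
                    h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2)
                                 / (2 * sigma ** 2))
                    for i, xi in enumerate(x):
                        ws[i] += lr * h * (xi - ws[i])


rng = random.Random(0)
# Placeholder stimuli: random vectors with the dimensions given above.
letters = [[rng.random() for _ in range(23)] for _ in range(5)]
phonemes = [[rng.random() for _ in range(36)] for _ in range(5)]

som_lt, som_ph = SOM(6, 6, 23, rng), SOM(6, 6, 36, rng)
som_bm = SOM(6, 6, 4, rng)   # bimodal map: input is 2 + 2 coordinates

som_lt.train(letters, rng)
som_ph.train(phonemes, rng)
# Congruent pairs: the i-th letter goes with the i-th phoneme.
bimodal_inputs = [list(som_lt.winner(l) + som_ph.winner(p))
                  for l, p in zip(letters, phonemes)]
som_bm.train(bimodal_inputs, rng)


def muson(letter, phoneme):
    """Feed-forward pass: two sensory winners in, bimodal winner out."""
    y = list(som_lt.winner(letter) + som_ph.winner(phoneme))
    return som_bm.winner(y)


print(muson(letters[0], phonemes[0]))
```

The sensory maps are trained first and the bimodal map afterwards, so the bimodal map only ever sees four-dimensional coordinate vectors, never the raw stimuli.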

Step 1 results

Bimodal Kohonen map when ‘o’ is the input to the system. Notice that there is high activation of neighbouring populations as well. This is unbiological – lateral inhibition is lacking in the Kohonen map. A patch thus consists of a group of neurons that give the highest activity for a stimulus. The winner neuron for a stimulus is the one with the maximal response to that stimulus.
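To make the notions of activity and winner concrete, here is a small sketch. It assumes that a neuron's activity is the cosine similarity between its weight vector and the stimulus; the slides do not state the activity function, and the four-neuron map and its weights are hypothetical.

```python
import math


def activity(weights, x):
    """Cosine similarity between a neuron's weight vector and the input
    (an assumed activity measure, not taken from the slides)."""
    dot = sum(w * xi for w, xi in zip(weights, x))
    norm = (math.sqrt(sum(w * w for w in weights))
            * math.sqrt(sum(xi * xi for xi in x)))
    return dot / norm


# Hypothetical one-dimensional map with four neurons and 2-D weights.
som_weights = [(1.0, 0.0), (0.8, 0.6), (0.6, 0.8), (0.0, 1.0)]
stimulus = (0.6, 0.8)

activities = [activity(w, stimulus) for w in som_weights]
# The winner is the neuron with the maximal response.
winner = max(range(len(activities)), key=activities.__getitem__)

print(winner, [round(a, 2) for a in activities])
```

Neuron 2 wins, but its neighbour, neuron 1, is almost as active. Nothing in the map suppresses that neighbouring activity, which is the missing lateral inhibition noted above.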

Step 1 results

Patches of maximum activity in response to stimuli after self-organization. (This is one example; the organization may change from one run to another, but the relations between patches belonging to similar stimuli are always there.)

Noteworthy things:

• Letter map: Positions of ‘i’ and ‘l’; of ‘a’, ‘ä’ and ‘å’; and of other look-alikes.
• Phoneme map: Positions of ‘p’ and ‘t’, of ‘n’ and ‘m’, and of other sound-alikes.
• Bimodal map: Has a topology too, one of coordinates. For instance, stimuli adjacent in both sensory maps are adjacent in the bimodal map: ‘f’ and ‘v’, ‘i’ and ‘l’.

[Figure: Three maps after self-organization: Bimodal map (4D), Phoneme map and Letter map. Each shows patches labelled with the stimuli a, ä, e, i, y, å, u, o, ö, s, S, f, b, d, g, k, l, m, n, p, r, t, v.]

Step 1 results

Activity in Kohonen maps for three different inputs.

Solid lines: Perfect letters
Dash-dot lines: Heavily corrupted letters

Conclusion: Some robustness has been achieved.

Step 2: Modelling feedback using a MuSON with feedback

Continuing to be influenced by the findings and interpretations of van Atteveldt et al. in “Integration of Letters and Speech Sounds in the Human Brain”, we add feedback from the bimodal SOM to a SOM in the speech sound processing system. This SOM, SOMrph, merges the sensory input with the feedback (again, by concatenation of coordinates). The sensory and the bimodal maps are trained first, as before, and then the map dealing with feedback. During application, the signal is passed straight through SOMrph in the first loop.
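The control flow of the feedback loop can be sketched as below. This is an illustration under stated assumptions: the trained maps are replaced by hypothetical lookup tables (`nearest`), the patch positions are invented, and the number of loops is arbitrary. Only the wiring (SOMrph bypassed in the first loop, then fed with the phoneme coordinates concatenated with the bimodal feedback) follows the description above.

```python
def nearest(prototypes):
    """Stand-in for a trained SOM: map an input to the output coordinate
    stored with the closest prototype (hypothetical lookup table)."""
    def respond(x):
        best = min(prototypes,
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)))
        return prototypes[best]
    return respond


def relax(y_lt, y_ph, som_bm, som_rph, loops=3):
    """Run the feedback loop on the winner coordinates of SOMlt and SOMph."""
    y_rph = y_ph                          # first loop: SOMrph is bypassed
    history = []
    for _ in range(loops):
        y_bm = som_bm(y_lt + y_rph)       # bimodal map sees (letter, phoneme)
        history.append((y_rph, y_bm))
        y_rph = som_rph(y_ph + y_bm)      # merge sensory input with feedback
    return history


# Hypothetical patch positions for two stimuli, 'i' and 'y'.
som_bm = nearest({(1, 1, 1, 1): (2, 2), (4, 4, 4, 4): (5, 5)})
som_rph = nearest({(1, 1, 2, 2): (1, 1), (4, 4, 5, 5): (4, 4)})

# Clean letter 'i' at (1, 1); corrupted phoneme lands off-patch at (2, 3).
history = relax((1, 1), (2, 3), som_bm, som_rph)
print(history)
```

In this toy run the corrupted phoneme starts at (2, 3), the bimodal map still settles on the ‘i’ patch, and from the second loop on the re-coded phoneme coordinate is pulled to the ‘i’ position (1, 1).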

[Figure: MuSON with feedback. SOMlt (input xlt, dimension nlt) outputs ylt and SOMph (input xph, dimension nph) outputs yph. SOMrph merges yph with the feedback from SOMbm and outputs yrph. SOMbm outputs ybm.]

Step 2 results

A complete set of maps after self-organization of MuSON with feedback

The organizations of the two phoneme maps are very similar, as they should be. Otherwise it wouldn’t work very well, would it?

[Figure: Four maps after self-organization: Letter map, Bimodal map, Phoneme map and Re-coded phoneme map. Each shows patches labelled with the stimuli a, ä, e, i, y, å, u, o, ö, s, S, f, b, d, g, k, l, m, n, p, r, t, v.]

Step 2 results

The processing of three corrupted phonemes in MuSON with feedback

[Figure: Letter map, Phoneme map and Re-coded phoneme map, each marked with the stimuli ‘å’, ‘i’ and ‘m’.]

Conclusion: Modulation of the unimodal sensory signal by the integrated one, so as to reach a coherent state, is modelled.

“Assuming the activity in auditory cortex generally corresponds to a perceptual experience of something heard, a likely function of a converging visual or somatosensory input would be to enhance auditory analysis of that stimulus. … but the perceptual experience would remain auditory.”
From Schroeder et al.: “Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing”, Int. J. of Psychophysiology, 2003

Step 3: Making the activity levels matter

The figure shows the response of a SOM to an input that it does not recognize. There is some activity, and hence a maximum activity and a winner neuron, but the activity levels are low. If the activity levels are allowed to matter in the architecture, the model becomes more complete.

[Figure: Phoneme map, relaxation loop 0. Activity (scale 0 to 1) over the map.]

Step 3: Making the activity levels matter

Goal: If the auditory processing (for example) yields low activity, its output should have low significance in the bi-modal processing.

Idea: When combining coordinates of winner neurons, make the combination weighted using the activities of these neurons.

v – position of the winner.
a – activity level of the winner.

[Figure: Architecture with activity levels. Blocks: SOMlt (input xlt), Auditory Processing (SumSOM Type 1, input xph) and Bi-modal Processing (SumSOM Type 2), connected with feedback. The blocks exchange (v, a) pairs: (vlt, alt), (vph, aph) and (vbm, abm).]

Step 3: Making the activity levels matter, using the SumSOMs

[Figure: Block diagrams of SumSOM Type 1 (inputs x, a1 in, v1 in) and SumSOM Type 2 (inputs a1 in, v1 in, a2 in, v2 in). In both, the weighted responses are combined with a factor 1/2 into a fused response Φfused, and a max/WTA stage gives the outputs aout and vout.]

Outline: Transform the v(s), weight the transformations with the a(s), combine the results, and classify it by determining the winner neuron.
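The outline above can be illustrated with a short sketch of a two-input unit. It is an assumption-laden reading, not the SumSOM itself: the transformation of each v into a response map is taken to be a Gaussian of distance (`response_map`), the stored positions are invented, and the names are hypothetical. The four stages (transform, weight by a, combine, pick the winner) follow the outline.

```python
import math


def response_map(v, prototypes, width=1.0):
    """Transform a winner position v into one response per output neuron.

    Assumption: a Gaussian of the distance between v and the position
    each output neuron has stored for this input channel.
    """
    return [math.exp(-sum((p - q) ** 2 for p, q in zip(proto, v))
                     / (2 * width ** 2))
            for proto in prototypes]


def sum_som(inputs, prototype_sets):
    """inputs: one (v, a) pair per channel; returns (winner index, activity)."""
    n = len(prototype_sets[0])
    fused = [0.0] * n
    for (v, a), prototypes in zip(inputs, prototype_sets):
        for k, phi in enumerate(response_map(v, prototypes)):
            fused[k] += a * phi / len(inputs)   # weight by a, then average
    winner = max(range(n), key=fused.__getitem__)  # winner-takes-all
    return winner, fused[winner]


# Hypothetical bimodal unit with two output neurons: 0 = 'p', 1 = 't'.
letter_protos = [(0, 0), (3, 3)]     # letter-map positions of 'p' and 't'
phoneme_protos = [(0, 0), (3, 0)]    # phoneme-map positions of 'p' and 't'
protos = [letter_protos, phoneme_protos]

# Letter says 'p' with full activity; phoneme says 't' with zero activity.
silent = sum_som([((0, 0), 1.0), ((3, 0), 0.0)], protos)
# Letter says 'p' weakly; phoneme says 't' strongly.
loud = sum_som([((0, 0), 0.2), ((3, 0), 0.9)], protos)
print(silent, loud)
```

With zero activity the phoneme channel contributes nothing, so the letter alone decides the winner (‘p’). When the phoneme channel is the more active one, it dominates and the winner becomes ‘t’.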

Step 3 results

[Figure: Phoneme map with patches ‘p’ and ‘t’ marked, labelled 0.425 ‘p’ + 0.575 ‘t’. Letter map with patch ‘p’ marked, labelled 1 ‘p’ + 0 ‘p’.]

Test with noisy input to the auditory processing unit.

Conclusion: Modulation of the unimodal sensory signal by the integrated one, so as to reach a coherent state, is still modelled.

Step 3 results

[Figure: Letter map with patch ‘i’ marked, labelled 1 ‘i’ + 0 ‘i’. Phoneme map marked ∅, labelled 1 ‘_’ + 0 ‘_’.]

Test with an input to the auditory processing unit that has not been learnt and that yields a low activity.

Conclusion: The input to the auditory unit has no effect on the bimodal unit. The auditory unit’s classification is modulated by the stronger bimodal signal.

Future work

[Figure: The architecture with SOMlt, Auditory Processing (SumSOM Type 1), Bi-modal Processing (SumSOM Type 2) and feedback, marked ATTENTION.]

Modelling attention

Goal: We would like our architecture to be able to pick out one of several stimuli, much as a living being can follow one sequence of events while ignoring another.

Example: If xph contains several stimuli combined in some way, we would like the architecture to indicate whether or not a pre-selected one is amongst them.
