Modelling of sensory integration with neural network systems Lennart Gustafsson, Andrew Paplinski & Tamas Jantvik
Q: Why integrate sensory information?
A: Because biology does it, at least for higher-order animals, and the animals gain from it…
“… the major functions of multisensory convergence and integration seem aimed at enhancing the detection of behaviourally-relevant stimuli, and of promoting rapidity of behavioural responding (e.g. motoric orienting). We would add cognitive processing (e.g. attentional orienting and cognition) to the list of functions that benefit from multisensory processing.”
From Schroeder et al.: “Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing”, Int. J. of Psychophysiology, 2003
Thus, in biology, sensory integration of congruent stimuli can yield:
• Shorter reaction time to an event
• Lower threshold for detecting an event (e.g. integrated sensory information gives larger responses and/or makes additional neurons active)
• Better (faster) learning of how to handle an event
• A multimodal percept that is much more robust against corrupted stimuli (against ‘noise’) than the individual unimodal percepts
• Modulation of unimodal percepts using the multimodal representation
All these advantages apply to situations where sensory stimuli in different modalities are congruent, i.e. they are different aspects of the same event.
Example: Response time to stimuli (in cortex)
From Laurienti et al.: “Semantic congruence is a critical factor in multisensory behavioural performance”, Experimental Brain Research, 2004
Response to congruent visual and auditory stimuli is quicker than to the corresponding unimodal stimulation.
Example: Multisensory integration can be a very early event
Multisensory interaction in a single neuron in the superior colliculus (located in the midbrain of mammals). Information flows through here at a very early stage of sensory processing, before cortical processing. Sensory integration can thus increase the response to an event. The picture shows relative response levels to visual stimuli, auditory stimuli, and the combination of the two.
From King & Calvert: “Multisensory integration: Perceptual grouping by eye and ear”, Current Biology, 2001
A: … and it can develop in an automatic way
Example: Multisensory integration in neocortex
From Beauchamp: “See me, hear me, touch me: multisensory integration in lateral occipital-temporal cortex”, Current Opinion in Neurobiology, 2005
A closer look at responses of neurons
From Beauchamp et al.: “Unraveling multisensory integration: patchy organization within human STS multisensory cortex”, Nature Neuroscience, 2004
The neocortex develops automatically by learning from incoming stimuli, so it wouldn’t be unreasonable to assume that this integration mechanism has also been developed automatically.
Bimodal integration of auditory and visual stimuli
What if we could mimic the brain’s way of sensory integration and use it in engineering applications? We’ll start by modelling the integration of visual letters and speech sounds.
From van Atteveldt et al.: “Integration of Letters and Speech Sounds in the Human Brain”, Neuron, 2004
An attempt at mimicry
We want to create an architecture that features as many as possible of the advantageous properties found in biological sensory integration of letters and phonemes. It should:
• Communicate with the outside world via input and output signals
• Support integration of congruent signals
• Increase the robustness of the integrated signal against corruption/noise
• Modulate unimodal sensory signals with the integrated one to converge to a coherent state
• Be tuned using the input signals (i.e. support somewhat automatic setup)
Since we aim to use the architecture in engineering applications, it should also be:
• Fast, at least during application (application as opposed to training)
• Versatile
And, in an initial approach, it is always wise to keep things simple.
Step 1: A way of integration
A bimodal, hierarchical system consisting of three Kohonen maps connected in a bottom-up (feed-forward) fashion. Each network outputs the coordinates of its winner neuron. The bimodal map is trained with the concatenated winner coordinates of congruent stimuli (xlt, xph). We call this a MuSON: Multimodal Self-Organizing Network.
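[Figure: MuSON architecture. The letter map SOMlt receives the nlt-element input xlt and the phoneme map SOMph receives the nph-element input xph; their 2-element winner coordinates ylt and yph feed the bimodal map SOMbm, which outputs ybm.]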
Letters: 23-element vector based on principal component analysis of > 1000 pixel patterns representing letters
Phonemes: 36-element vector consisting of Mel frequency cepstral coefficients of average phonemes as spoken by ten Swedish speakers
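The Step 1 scheme can be sketched in a few lines of Python. This is only a minimal illustration, assuming a plain Kohonen update with a Gaussian neighbourhood and a linearly decaying learning rate. The class `SOM`, the function `train_muson`, the 6 × 6 map size and the random stand-in data are hypothetical and not taken from the slides.

```python
import numpy as np


class SOM:
    """Tiny Kohonen map on a rows x cols grid (illustrative sketch only)."""

    def __init__(self, rows, cols, dim, rng):
        # Grid coordinate of every neuron, shape (rows*cols, 2).
        self.grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                             dtype=float)
        # One weight vector per neuron, shape (rows*cols, dim).
        self.weights = rng.normal(size=(rows * cols, dim))

    def winner(self, x):
        """Return the 2-D grid coordinate of the best-matching neuron."""
        distances = np.linalg.norm(self.weights - x, axis=1)
        return self.grid[np.argmin(distances)]

    def train(self, data, epochs=50, lr0=0.5, sigma0=2.0):
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            for x in data:
                w = self.winner(x)
                # Gaussian neighbourhood around the winner, measured on the grid.
                d2 = np.sum((self.grid - w) ** 2, axis=1)
                h = np.exp(-d2 / (2.0 * sigma ** 2))
                self.weights += lr * h[:, None] * (x - self.weights)


def train_muson(x_lt, x_ph, rng, shape=(6, 6)):
    """Train the two unimodal maps first, then the bimodal map."""
    som_lt = SOM(*shape, x_lt.shape[1], rng)
    som_ph = SOM(*shape, x_ph.shape[1], rng)
    som_lt.train(x_lt)
    som_ph.train(x_ph)
    # Concatenate the winner coordinates of congruent (letter, phoneme) pairs.
    y = np.array([np.concatenate([som_lt.winner(a), som_ph.winner(b)])
                  for a, b in zip(x_lt, x_ph)])
    som_bm = SOM(*shape, 4, rng)   # 4-D input: two 2-D winner coordinates
    som_bm.train(y)
    return som_lt, som_ph, som_bm, y


rng = np.random.default_rng(0)
x_lt = rng.normal(size=(5, 23))   # random stand-in for 23-element letter vectors
x_ph = rng.normal(size=(5, 36))   # random stand-in for 36-element phoneme vectors
som_lt, som_ph, som_bm, y = train_muson(x_lt, x_ph, rng)
```

Here row `i` of `x_lt` and row `i` of `x_ph` are assumed to be congruent, so each row of `y` is one 4-element training vector for the bimodal map.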
Step 1 results
Bimodal Kohonen map when ‘o’ is the input to the system. Notice that there is high activation of neighbouring populations as well. This is unbiological – lateral inhibition is lacking in the Kohonen map. A patch thus consists of a group of neurons that give the highest activity for a stimulus. The winner neuron for a stimulus is the one with the maximal response to that stimulus.
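The notions of activity, winner and patch can be illustrated with a short sketch. The slides do not state how activity is computed, so the cosine similarity used here is an assumption, as is the relative threshold that delimits a patch. The names `activity_map` and `winner_and_patch` are hypothetical.

```python
import numpy as np


def activity_map(weights, x):
    """Assumed activity: cosine similarity between stimulus and each weight vector."""
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    x = x / np.linalg.norm(x)
    return w @ x


def winner_and_patch(weights, x, rel_threshold=0.9):
    act = activity_map(weights, x)
    winner = int(np.argmax(act))          # neuron with the maximal response
    # Assumed patch definition: neurons whose activity is close to the maximum.
    patch = np.flatnonzero(act >= rel_threshold * act[winner])
    return winner, act[winner], patch


# Four neurons with 2-element weight vectors (toy values).
weights = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
winner, a, patch = winner_and_patch(weights, np.array([2.0, 0.0]))
```

In this toy case neuron 0 wins, and neuron 1, whose weight vector is nearly parallel to the stimulus, falls in the same patch. This mirrors the high activation of neighbouring populations noted above.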
Step 1 results
Patches of maximum activity in response to stimuli after self-organization. (This is one example; the organization may change from one run to another, but the relations between patches belonging to similar stimuli are always there.) Noteworthy things:
• Letter map: positions of ‘i’ and ‘l’, of ‘a’, ‘ä’ and ‘å’, and of other look-alikes.
• Phoneme map: positions of ‘p’ and ‘t’, of ‘n’ and ‘m’, and of other sound-alikes.
• Bimodal map: has a topology too, a topology of coordinates. For instance, stimuli adjacent in both sensory maps are adjacent in the bimodal map: ‘f’ and ‘v’, ‘i’ and ‘l’.
[Figure: Bimodal map (4D input), phoneme map and letter map, each showing the patches for the stimuli a, ä, e, i, y, å, u, o, ö, s, S, f, b, d, g, k, l, m, n, p, r, t, v.]
Step 1 results
Activity in Kohonen maps for three different inputs.
Solid lines: perfect letters. Dash-dot lines: heavily corrupted letters.
Conclusion: Some robustness has been achieved.
Step 2: Modelling feedback using a MuSON with feedback
Continuing to be influenced by the findings and interpretations of van Atteveldt et al. in “Integration of Letters and Speech Sounds in the Human Brain”, we add feedback from the bimodal SOM to a SOM in the speech sound processing system, SOMrph, which merges the sensory input with the feedback (again, by concatenation of coordinates). The sensory and bimodal maps are trained first, as before, and then the map dealing with feedback. In application, SOMrph is passed through during the first loop.
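[Figure: MuSON with feedback. SOMlt (input xlt, nlt elements) outputs ylt and SOMph (input xph, nph elements) outputs yph. SOMrph merges yph with the feedback from the bimodal map and outputs yrph. SOMbm outputs ybm, which is fed back to SOMrph.]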
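The application phase of the MuSON with feedback can be sketched as a small relaxation loop. This is a hand-built toy, not the trained system. It assumes that the bimodal map receives the concatenation of ylt and yrph, and that SOMrph receives the concatenation of yph and ybm. All weights below are set by hand for two hypothetical classes A and B, and the names `winner` and `relax` are illustrative.

```python
import numpy as np


def winner(som, x):
    """Grid coordinate of the neuron whose weight vector is closest to x."""
    weights, grid = som
    return grid[np.argmin(np.linalg.norm(weights - x, axis=1))]


def relax(x_lt, x_ph, som_lt, som_ph, som_rph, som_bm, n_loops=3):
    y_lt = winner(som_lt, x_lt)
    y_ph = winner(som_ph, x_ph)
    y_rph = y_ph                     # first loop: SOMrph is passed through
    trace = []
    for _ in range(n_loops):
        y_bm = winner(som_bm, np.concatenate([y_lt, y_rph]))
        y_rph = winner(som_rph, np.concatenate([y_ph, y_bm]))   # feedback
        trace.append((tuple(y_bm), tuple(y_rph)))
    return y_ph, trace


# Toy unimodal maps: a 1 x 5 grid whose weights blend from class A to class B.
cols = np.arange(5.0)
line = np.stack([np.zeros(5), cols], axis=1)          # coordinates (0,0)..(0,4)
proto = np.stack([1 - cols / 4, cols / 4], axis=1)    # hand-set weights
som_lt = (proto, line)
som_ph = (proto, line)

# Toy bimodal and re-coding maps: one neuron per congruent coordinate pair.
ends = np.array([[0.0, 0.0], [0.0, 4.0]])             # positions of A and B
pairs = np.array([[0.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 4.0]])
som_bm = (pairs, ends)
som_rph = (pairs, ends)

# Clean letter A, corrupted phoneme A (pulled part of the way towards B).
y_ph, trace = relax(np.array([1.0, 0.0]), np.array([0.7, 0.3]),
                    som_lt, som_ph, som_rph, som_bm)
```

In this toy, the corrupted phoneme first lands one position away from the patch of class A. After the first loop, the bimodal feedback moves the re-coded phoneme coordinate onto the patch of class A, where it stays in later loops.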
Step 2 results
A complete set of maps after self-organization of MuSON with feedback
The organization of the two phoneme maps is very similar, as it should be. Otherwise it wouldn’t work very well, would it?
[Figure: Letter map, bimodal map, phoneme map and re-coded phoneme map, each showing the patches for the stimuli a, ä, e, i, y, å, u, o, ö, s, S, f, b, d, g, k, l, m, n, p, r, t, v.]
Step 2 results
The processing of three corrupted phonemes in MuSON with feedback
[Figure: Letter map, phoneme map and re-coded phoneme map, showing the responses for the three stimuli ‘å’, ‘i’ and ‘m’.]
Conclusion: Modulation of the unimodal sensory signal with the integrated one, to reach a coherent state, is modelled.
“Assuming the activity in auditory cortex generally corresponds to a perceptual experience of something heard, a likely function of a converging visual or somatosensory input would be to enhance auditory analysis of that stimulus. … but the perceptual experience would remain auditory.”
From Schroeder et al.: “Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing”, Int. J. of Psychophysiology, 2003.
Step 3: Making the activity levels matter
The figure shows the response of a SOM to an input that is not recognized. There is some activity, and hence a maximum activity and a winner neuron, but these activities are low. If the activity levels are allowed to matter in the architecture, the model becomes more complete.
[Figure: Activity of the phoneme map at relaxation loop 0. Map axes with ticks at 10, 20 and 30; activity axis from 0 to 1.]
Step 3: Making the activity levels matter
Goal: If the auditory processing (for example) yields low activity, its output should have low significance in the bimodal processing.
Idea: When combining the coordinates of winner neurons, weight the combination by the activities of these neurons.
v – position of the winner. a – activity level of the winner.
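[Figure: Architecture with activity levels. SOMlt (input xlt), SumSOMType1 (auditory processing, input xph) and SumSOMType2 (bimodal processing) exchange pairs (v, a): (vlt, alt), (vph, aph) and (vbm, abm), with feedback from the bimodal processing to the auditory processing.]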
Step 3: Making the activity levels matter, using the SumSOMs
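[Figure: Block diagrams of SumSOMType1 and SumSOMType2. SumSOMType1 combines one incoming pair (a1 in, v1 in) with the response of its own map SOM0 to the input x. SumSOMType2 combines two incoming pairs (a1 in, v1 in) and (a2 in, v2 in). In both, the transformed inputs (blocks TM1, TM2) form fields Φ1, Φ2 that are merged into Φfused (with a factor 1/2), followed by a max / winner-take-all (WTA) stage that outputs (aout, vout).]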
Outline: Transform the v(s), weight the transformations with the a(s), combine the results, and classify it by determining the winner neuron.
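The outline above can be sketched as follows. The slides do not specify the transformation, so this sketch assumes that each incoming position v is turned into a Gaussian field over the output neurons, using a hypothetical table `assoc` of the input-map position associated with each output neuron. The normalisation by the summed activities and the name `sum_som` are likewise assumptions.

```python
import numpy as np


def sum_som(inputs, sigma=1.0):
    """Fuse (v, a, assoc) triples and pick the winner.

    v     : winner position coming from an input map
    a     : activity level of that winner
    assoc : array (n_out, 2), assumed input-map position linked to each output neuron
    """
    total = sum(a for _, a, _ in inputs)
    if total <= 0:
        raise ValueError("at least one input must have positive activity")
    fused = np.zeros(len(inputs[0][2]))
    for v, a, assoc in inputs:
        # Transform v into a field over the output neurons (assumed Gaussian).
        d2 = np.sum((assoc - np.asarray(v, dtype=float)) ** 2, axis=1)
        fused += a * np.exp(-d2 / (2.0 * sigma ** 2))   # weight by activity
    fused /= total                                      # combine
    v_out = int(np.argmax(fused))                       # winner-take-all
    return v_out, float(fused[v_out]), fused


# Two output neurons: index 0 stands for 'p', index 1 for 't' (toy positions).
assoc = np.array([[0.0, 0.0], [0.0, 3.0]])

# Weak, noisy auditory evidence for 't'; strong bimodal evidence for 'p'.
k, a_out, fused = sum_som([((0, 3), 0.3, assoc), ((0, 0), 0.9, assoc)])

# An input with zero activity has no influence on the fused result.
k2, a2, fused2 = sum_som([((0, 3), 0.0, assoc), ((0, 0), 0.9, assoc)])
```

With these toy numbers the stronger signal decides the classification. When the weaker input's activity drops to zero, it drops out of the fused field entirely, which is the behaviour stated in the goal of Step 3.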
Step 3 results
[Figure: Phoneme map with the patches ‘p’ and ‘t’ marked, annotated 0.425 ‘p’ + 0.575 ‘t’. Letter map with the patch ‘p’ marked, annotated 1 ‘p’ + 0 ‘p’.]
Test with noisy input to the auditory processing unit. Conclusion: Modulation of the unimodal sensory signal with the integrated one, to reach a coherent state, is still modelled.
Step 3 results
[Figure: Letter map with the patch ‘i’ marked, annotated 1 ‘i’ + 0 ‘i’. Phoneme map with an unlearnt input ∅, annotated 1 ‘_’ + 0 ‘_’.]
Test with input to the auditory processing unit that is not learnt and yields a low activity. Conclusion: The input to the auditory unit has no effect on the bimodal unit. The auditory unit’s classification is modulated by the stronger bimodal signal.
Future work
[Figure: The Step 3 architecture (SOMlt, SumSOMType1 for auditory processing, SumSOMType2 for bimodal processing, feedback) with an added element labelled ATTENTION.]
Modelling attention
Goal: We would like our architecture to be able to pick out one of several stimuli, just as a living being can follow one sequence of events while ignoring another.
Example: If xph contains several stimuli combined in some way, we would like the architecture to indicate whether or not a pre-selected one is amongst them.