Page 1: Exploiting cross-modal rhythm for robot perception of objects Artur M. Arsenio Paul Fitzpatrick MIT Computer Science and Artificial Intelligence Laboratory

Exploiting cross-modal rhythm for robot perception of objects

Artur M. Arsenio Paul Fitzpatrick

MIT Computer Science and Artificial Intelligence Laboratory



CIRAS 2003

Cog – the humanoid platform

• cameras on active vision head
• microphone array above torso
• periodically moving object (hammer)
• periodically generated sound (banging)


Motivation

• Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files, …

• Rhythmic information across the visual and acoustic sensory modalities has complementary properties

• Features extracted from visual and acoustic processing supply what is needed to build an object recognition system


Interacting with the robot


Talk outline

• Matching sound and vision
• Matching with visual distraction
• Matching with acoustic distraction
• Matching multiple sources
• Priming sound detection using vision
• Towards object recognition


Detecting periodic events

• Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files.
• Points tracked using the Lucas-Kanade algorithm
• Periodicity analysis
– FFTs of tracked trajectories
– Periodicity histograms
– Phase verification
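The FFT step in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the tracker has already produced a one-dimensional trajectory sampled at a fixed rate, and it leaves out the periodicity histograms and phase verification. The function name `dominant_period` and the `min_peak_ratio` threshold are hypothetical.

```python
import numpy as np

def dominant_period(signal, sample_rate_hz, min_peak_ratio=0.3):
    """Return the dominant period in seconds, or None if no clear periodicity.

    min_peak_ratio is an assumed threshold: the strongest frequency bin must
    hold at least this fraction of the total (non-DC) power.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2    # power spectrum of the trajectory
    power[0] = 0.0                         # ignore any residual DC component
    total = power.sum()
    if total == 0.0:
        return None                        # flat trajectory: nothing periodic
    k = int(np.argmax(power))
    if power[k] / total < min_peak_ratio:
        return None                        # energy too spread out to call periodic
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    return 1.0 / freqs[k]

# Example: a point oscillating at 2 Hz, tracked for 3 s at 30 frames per second.
t = np.arange(90) / 30.0
trajectory = np.sin(2 * np.pi * 2.0 * t)
period = dominant_period(trajectory, sample_rate_hz=30.0)
```

Here `period` comes out close to 0.5 s. A real trajectory would be noisier, which is presumably why the slide lists further checks after the FFT.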


Matching sound and vision

[Figure: spectrogram of the sound (frequency in kHz against time in ms), sound energy against time, and hammer position against time]

• The sound intensity peaks once per visual period of the hammer


Matching with visual distraction

• One object (the car) making noise
• Another object (the ball) in view
– Problem: which object goes with the sound?
– Solution: match periods of motion and sound


Comparing periods

[Figure: sound energy, car position, and ball position against time (ms), with a spectrogram of the sound (frequency in kHz)]

• The sound intensity peaks twice per visual period of the car
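Period matching under visual distraction could look like the sketch below. It assumes the sound period and each object's visual period have already been estimated, and that the sound may peak once or twice per visual period (once for the hammer, twice for the car). The function name, the tolerance, and the example periods are hypothetical, not taken from the experiments.

```python
def match_sound_to_object(sound_period, visual_periods, ratios=(1, 2), tolerance=0.1):
    """Pick the object whose visual period best explains the sound period.

    visual_periods maps an object name to its visual period in seconds
    (or None if no period was found). Returns (name, ratio) or None.
    """
    best = None
    best_error = tolerance
    for name, visual_period in visual_periods.items():
        if visual_period is None:
            continue                              # object is not moving periodically
        for ratio in ratios:
            expected = visual_period / ratio      # sound peaks `ratio` times per visual period
            error = abs(sound_period - expected) / expected
            if error <= best_error:
                best, best_error = (name, ratio), error
    return best

# Made-up periods: the car oscillates every 0.5 s, the ball every 0.8 s,
# and the sound energy peaks every 0.25 s.
binding = match_sound_to_object(0.25, {"car": 0.5, "ball": 0.8})
```

With these numbers the sound is bound to the car at a ratio of 2, and the ball is rejected because neither ratio brings its period within tolerance.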


Matching with acoustic distraction

[Figure: car position and snake position against time (ms), with a spectrogram of the sound (frequency in kHz)]


Matching multiple sources

• Two objects making sounds with distinct spectra
– Problem: which object goes with which sound?
– Solution: match periods of motion and sound


Binding periodicity features

[Figure: car position and rattle position against time (ms), with a spectrogram of the sound (frequency in kHz)]

• The sound intensity peaks twice per visual period of the car. For the cube rattle, the ratio between the sound and visual signals differs across frequency bands.
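The per-band behaviour noted above can be illustrated with a small sketch. It assumes each frequency band has already been reduced to an energy envelope over time (for instance by summing spectrogram rows, which is an assumption here) and that the object's visual period is known. The names `period_of` and `peaks_per_visual_period`, and the synthetic envelopes, are hypothetical.

```python
import numpy as np

def period_of(signal, sample_rate_hz):
    """Period (s) of the strongest non-DC FFT component, or None for a flat signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    power[0] = 0.0
    k = int(np.argmax(power))
    if power[k] == 0.0:
        return None
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    return 1.0 / freqs[k]

def peaks_per_visual_period(band_energy, sample_rate_hz, visual_period):
    """For each band, how many sound periods fit in one visual period."""
    ratios = {}
    for band, envelope in band_energy.items():
        p = period_of(envelope, sample_rate_hz)
        ratios[band] = None if p is None else visual_period / p
    return ratios

# Synthetic envelopes sampled at 100 Hz for 2 s: the low band pulses at 2 Hz,
# the high band at 4 Hz, and the object is assumed to move with a 0.5 s period.
t = np.arange(200) / 100.0
bands = {
    "low": 1.0 + np.cos(2 * np.pi * 2.0 * t),
    "high": 1.0 + np.cos(2 * np.pi * 4.0 * t),
}
ratios = peaks_per_visual_period(bands, 100.0, visual_period=0.5)
```

In this toy case the low band peaks once and the high band twice per visual period, so a single object can carry a different ratio in each band.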


Statistics

An evaluation of cross-modal binding for various objects and situations

Experiment | Visual period found | Sound period found | Bind made | Correct binds (%) | Incorrect binds (%) | Missed binds (%)
hammer | 6 | 8 | 6 | 100 | 0 | 0
Car & ball | 7 | 11 | 1 | 14 | 0 | 86
Car | 6 | 6 | 5 | 83 | 0 | 17
Car (Snake rattle in background) | 5 | 16 | 5 | 100 | 0 | 0
Snake rattle (Car in Background) | 4 | 9 | 4 | 100 | 0 | 0
Plane and Mouse | 23 | 27 | 16 | 46 | 41 | 13

The sound generated by a periodically moving object can be much more complex and ambiguous than its visual trajectory


Priming sound detection using vision

[Figure: two spectrograms, frequency (kHz) against time (ms)]

Signals in phase


[Figure: two spectrograms, frequency (kHz) against time (ms)]

Signals out of phase!


Object recognition

• Visual object segmentation

• Cross-modal object recognition
– Ratio between acoustic/visual fundamental frequencies
– Phase between acoustic and visual signals
– Range of acoustic frequency bands
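The first two features in this list might be computed along these lines. This is a sketch under stated assumptions: the visual trajectory and the sound energy envelope are sampled at the same rate, and the phase of each signal is read from its strongest FFT component. The slides do not say how phase is compared when the two fundamental frequencies differ, so the difference below is only directly meaningful when the ratio is 1. The third feature (range of acoustic frequency bands) is not covered. All names are hypothetical.

```python
import numpy as np

def fundamental(signal, sample_rate_hz):
    """Return (frequency in Hz, phase in radians) of the strongest non-DC FFT component."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    power = np.abs(spectrum) ** 2
    power[0] = 0.0
    k = int(np.argmax(power))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    return freqs[k], float(np.angle(spectrum[k]))

def cross_modal_features(visual, sound_energy, sample_rate_hz):
    """Return (frequency ratio sound/visual, phase difference in degrees)."""
    f_visual, phase_visual = fundamental(visual, sample_rate_hz)
    f_sound, phase_sound = fundamental(sound_energy, sample_rate_hz)
    ratio = f_sound / f_visual
    diff = np.degrees(phase_sound - phase_visual)
    diff = (diff + 180.0) % 360.0 - 180.0    # wrap to [-180, 180)
    return ratio, diff

# Synthetic signals at 100 Hz: both oscillate at 2 Hz, the sound lags by a quarter cycle.
t = np.arange(200) / 100.0
visual = np.cos(2 * np.pi * 2.0 * t)
sound = np.cos(2 * np.pi * 2.0 * t - np.pi / 2)
ratio, phase_deg = cross_modal_features(visual, sound, 100.0)
```

For these inputs the ratio is 1 and the phase difference is about -90 degrees.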


Cross-modal object recognition

Causes sound when changing direction, often quiet during remainder of trajectory (although bells vary)

Causes sound when changing direction after striking object; quiet when changing direction to strike again

Causes sound while moving rapidly with wheels spinning; quiet when changing direction


Clustering

[Figure: phase difference (sound/visual) against fundamental period ratio (sound/visual); a second panel adds frequency bands as a third axis]
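The slide does not name a clustering algorithm, so the sketch below uses plain k-means purely as a stand-in. It groups (period ratio, phase difference) pairs after rescaling each feature, so that phase in degrees does not swamp the ratio. The feature values are invented for illustration and are not the measurements plotted on the slide.

```python
import numpy as np

def kmeans(points, k, iterations=20):
    """Cluster rows of `points` into k groups; returns one integer label per row."""
    pts = np.asarray(points, dtype=float)
    std = pts.std(axis=0)
    std[std == 0.0] = 1.0
    z = (pts - pts.mean(axis=0)) / std         # zero mean, unit variance per feature

    # Deterministic farthest-point initialisation (an assumed choice).
    centroids = [z[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(z - c, axis=1) for c in centroids], axis=0)
        centroids.append(z[int(np.argmax(nearest))])
    centroids = np.array(centroids)

    labels = np.zeros(len(z), dtype=int)
    for _ in range(iterations):
        dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)          # assign each point to its nearest centroid
        for j in range(k):
            members = z[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels

# Invented features: (fundamental period ratio, phase difference in degrees).
features = [
    (0.50, -90.0), (0.52, -85.0), (0.48, -95.0),   # e.g. sound peaking twice per visual period
    (1.00, 5.0), (0.98, -5.0), (1.02, 0.0),        # e.g. sound peaking once per visual period
]
labels = kmeans(features, k=2)
```

The first three points end up in one cluster and the last three in the other. Adding the frequency-band feature would simply mean a third column in `features`.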


Conclusions

• Different objects have distinct acoustic-visual patterns, which are a rich source of information for object recognition.

• An object can be differentiated from both its visual and acoustic backgrounds by binding pixels and frequency bands that are oscillating together.

• There is cognitive evidence that, for humans, simple visual periodicity can aid the detection of acoustic periodicity.

• More features can be used for better discrimination, such as the ratio of the sound/visual peak amplitudes.

• Each type of feature is important for recognition when the other is absent. When both are present, we can do better by looking at the relationship between visual motion and the sound generated.


Questions?