characterizing and processing robot-directed speech
DESCRIPTION
Characterizing and Processing Robot-Directed Speech. Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group. Characterizing and Processing Robot-Directed Speech. Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal - PowerPoint PPT PresentationTRANSCRIPT
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
Speech Recognition
What speech does the robot evoke?
How can we process that speech?
– Support a growing vocabulary
– Without hurting recognition accuracy
Baseline System
Extending Vocabulary
Extending Vocabulary
Baseline System
“Magic word” to introduce vocabulary
Separation of introduction from use
Robot confused by unknown/filler words
Study: Characterizing Speech
13 children
5-10 years old
20 minute sessions
Some prompting (Turkle et al)
0 10 20 30 40 50 60 70 80 90 1000
1
2
3
4
5
6
7
8
9
10
children (cumulative %)
nu
mb
er o
f u
tter
ance
s p
er m
inu
te (approx)
Number of Utterances10 20 30 40 50 60 70 80 90 100
0 10 20 30 40 50 60 70 80 90 1000
10
20
30
40
50
60
70
80
90
100
children (cumulative %)
sin
gle
wo
rd u
tter
ance
s (%
)Single Word Utterances
10 20 30 40 50 60 70 80 90 100
0 10 20 30 40 50 60 70 80 90 1000
10
20
30
40
50
60
70
80
90
100
children (cumulative %)
utt
eran
ces
wit
h e
nu
nci
atio
n (
%)
Utterances with Enunciation10 20 30 40 50 60 70 80 90 100
Cooperative Speech Exists
Some clean word-learning scenarios– “Tiffany”
Can we use them as leverage?– “My name is Tiffany”
What recognition rates are possible?
Study: Processing Speech
“LCSINFO” domain (Glass, Weinstein)
Provide some initial vocabulary
Build language model without transcripts
Language Model
wordn
word2
word1
OOV
Out Of Vocabulary Model
wordn
word2
word1
Out Of Vocabulary Model
OOVphonen
phone2
phone1
(Bazzi)
Hypothesized transcript
N-Best hypotheses
Run recognizer
Extract OOV fragments
Identify competition
Identify rarely-used additions
Remove from lexiconAdd to lexicon
Update lexicon,
baseforms
Update Language Model
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
coverage (%)
keyw
ord
erro
r rat
e (%
)Baseline performance Performance after clustering
K
eyw
ord
Err
or
Rat
e (%
)
coverage (%)
Qualitative Results
Given:1600 Utterances
“email, phone, room, office, address”
Finds:1. n ah m b er 6. p l iy z2. w eh r ih z 7. ae ng k y uw3. w ah t ih z 8. n ow4. t eh l m iy 9. hh aw ax b
aw5. k ix n y uw 10. g r uw p
Conclusions
What about acoustic models?
Idiosyncratic vocabulary is important
Pick a good name for your robot!
Acknowledgements
MIT Initiative on Technology and Self
MIT Spoken Language Systems group
DARPA
NTT