![Page 1: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/1.jpg)
Spoken Dialogue for Social Robots
Tatsuya Kawahara(Kyoto University, Japan)
1
Kristiina Jokinen(AIST AI Center, Japan)
![Page 2: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/2.jpg)
Spoken Dialogue Systems (SDS) are prevailing
• Smartphone Assistants
• Smart Speakers
What about Social Robots?
• Social RobotsIntended for interaction with human
3
![Page 3: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/3.jpg)
A majority of Peppers are returnedwithout renewing rental contracts
2015 2018
© Softbank
3 years later
4
![Page 4: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/4.jpg)
In successful cases, speech input is not used
© Softbank 5
![Page 5: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/5.jpg)
Hen na Hotel with robot receptionists
https://youtu.be/zx13fyz3UNg ©価格.com
Dinosaur robot
Female android
Critical interaction such as check‐in isdone with touch panel
6
![Page 6: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/6.jpg)
Robots are in many nursing homes,but do not make speech interaction (effectively)
Paro Palro
7© Daiwa House © Fuji soft
![Page 7: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/7.jpg)
https://www.fastcompany.com/90315692/one‐of‐the‐decades‐most‐hyped‐robots‐sends‐its‐farewell‐message8
![Page 8: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/8.jpg)
Still 5 Robots are Chosenin TIME Magazine 100 Best Inventions 2019
9
Tutor Delivery Porter in Hospital
Companion for Elderly Home Robot
![Page 9: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/9.jpg)
Under COVID‐19Robots Became Essential Workers (IEEE Spectrum)
10
Monitoring visitors
Delivering goods Checking patients (online)
![Page 10: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/10.jpg)
Agenda (Research Questions)
0. Why social robots are not prevailing in society?1. What kind of tasks are social robots expected to conduct?2. What kind of social robots are suitable for the tasks?3. Why spoken dialogue is not working well with robots?4. What kind of non‐verbal and other modalities are useful?5. What kind of system architectures are suitable?6. What kind of ethical issues must be considered?
Who?
What?
How?
13
![Page 11: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/11.jpg)
Agenda (Research Questions)
0. Why social robots are not prevailing in society?1. What kind of tasks are social robots expected to conduct?2. What kind of social robots are suitable for the tasks?3. Why spoken dialogue is not working with robots?
1. ASR and TTS2. SLU+DM (end‐to‐end?)
4. What kind of non‐verbal and other modalities are useful?1. Backchannel, turn‐taking2. Eye‐gaze
5. What kind of system architectures are suitable?6. What kind of ethical issues must be considered?
14
Kawahara
Jokinen
breakoptional
![Page 12: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/12.jpg)
0. Why social robots are not prevailing in society?
• Basically cost issue• Hardware expensive & fragile maintenance• Much more expensive (>10 times) than smart speakers
• Benefit does NOT meet the cost
• Tasks (=what robots can do) are limited or irrelevant• Many tasks can be done via smartphones and smart speakers
• Spoken Language Interaction experience is poor• Compared with smartphones and smart speakers
• While expectation is high
15
![Page 13: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/13.jpg)
1. What kind of tasks are social robots expected to conduct?
16
![Page 14: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/14.jpg)
Expected Roles by RobotsReceptionist Attendant
Health care Teaching Children Senior
receive=welcome attend=care
Physical presence &Face‐to‐Face interaction
matters
17
![Page 15: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/15.jpg)
Still 5 Robots are Chosenin TIME Magazine 100 Best Inventions 2019
18
Tutor Delivery Porter in Hospital
Companion for Elderly Home Robot
![Page 16: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/16.jpg)
Other Scenarios?
1. Who are typical users?2. Where are they served?
19
![Page 17: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/17.jpg)
Dialogue Category (Tasks)
No Resource(Dialog is task)
Information Services
Physical Tasks
Goal observable Negotiation Search, OrderReceptionist
ManipulationPorter, Cleaner
Content definite DebateInterview
NewscasterTutor, Guide
Objective shared CounselingSpeed dating
Attendant Helper
No clear objective(socialization)
ChattingCompanion
20
SmartphoneSmart speaker
![Page 18: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/18.jpg)
Dialogue Category (Tasks)
No Resource(Dialog is task)
Information Services
Physical Tasks
Goal observable Negotiation Search, OrderReceptionist
ManipulationPorter, Cleaner
Content definite DebateInterview
NewscasterTutor, Guide
Objective shared CounselingSpeed dating
Attendant Helper
No clear objective(socialization)
ChattingCompanion
• User initiative• System initiative• Mixed initiative
21
![Page 19: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/19.jpg)
Dialogue Category (Tasks)
No Resource(Dialog is task)
Information Services
Physical Tasks
Goal observable Negotiation Search, OrderReceptionist
ManipulationPorter, Cleaner
Content definite DebateInterview
NewscasterTutor, Guide
Objective shared CounselingSpeed dating
Attendant Helper
No clear objective(socialization)
ChattingCompanion
Agent is OK?
Android effective?22
Mechanical Robot
![Page 20: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/20.jpg)
Dialogue Roles of Adult Androids
Counseling
Receptionist, Attendant
NewscasterRole of Listen
ing
Role of Talking (to)
One person Several persons Many people
Interview
Short and shallow interaction
Long & deepinteraction
Long butno interaction
Guide
23
![Page 21: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/21.jpg)
Chatting function
• Desired in many cases (most of the tasks)• Ice‐breaking in the first meeting• Relaxing during a long interaction• Keeping engagement
• Can be done without robots/agents (cf.) chatbot• Will be more engaging with robots/agents
24
![Page 22: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/22.jpg)
2. What kind of robots are suitable for the tasks?
25
![Page 23: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/23.jpg)
Robot’s Appearance Affordance
People assume robot's capabilities based on its appearance• Looks like a human expected to act like a human• Has eyes expected to see• Speaks expected to understand human language and converse
• Speaks fluently expected to communicate smoothly
• Expresses emotion with facial expressions expected to read emotions[Human Robot Interaction https://www.human‐robot‐interaction.org/Chapter 4]
26
![Page 24: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/24.jpg)
Animal (Non‐Humanoid) RobotsStuffed Animals Talking (some listening)
• Paro • ???
© Daiwa House
Substitute of a pet
© SONY
• Aibo
27
![Page 25: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/25.jpg)
Child‐like or Child‐size Humanoid Robots
• CommU • Nao
©VSTONE, Osaka U
Substitute of a grandchild© Softbank robotics © Fuji soft
• Palro
28
![Page 26: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/26.jpg)
Adult‐size Humanoid Robots
• Asimo • Pepper
© Softbank© HONDA
Expected to do somethingBut still child‐like! Implying not so intelligent30
• Robovie
© ATR
![Page 27: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/27.jpg)
Adult Androids
• ERICA
32Debut in 2015
![Page 28: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/28.jpg)
How long can you keep talking?
• Smart Speaker
• Virtual Agent
• Humanoid Robot
• Human(A person you meet for the first time)
MMD Agent ©NITECH33
![Page 29: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/29.jpg)
How long can you keep talking (about one story)?
• Pet
• Kid (~10 year old)
• Baby
• Humanoid Android
34
![Page 30: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/30.jpg)
Comparison of Dialogue InterfacesSmart Speaker
Virtual Agent
Pet Robot
HumanoidRobot
Adult Android
Would like to have at home?Would like to have at office?
Asking today’s scheduleTalking about your lifeCompanion for senior
35
![Page 31: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/31.jpg)
Comparison of Dialogue InterfacesSmart Speaker
Virtual Agent
Pet Robot
HumanoidRobot
Adult Android
Would you give a nickname????
?????????
36
![Page 32: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/32.jpg)
Difference between Virtual Agents vs. Humanoid Robots/Androids?• Physical presence + mobility• Multi‐modality + flexibility
• Hard to make mutual gaze with virtual agents• Robots are deemed to be more autonomous than agents
• Move and act autonomously• Can be a partner
• ???
BUT• Robots are expensive and difficult to install and maintain
37
![Page 33: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/33.jpg)
Physical Presence of Robots
• Attract people• Can robots hand out flyers on the street better than human?• Can robots attract people to (izakaya) restaurant better than human?• Effective in the beginning• Especially for kids and senior people
• Attachment
• Bullying by group of kids(cf.) Virtual agents cursed
hopefully
unfortunately
38
![Page 34: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/34.jpg)
Physical Presence of Robots NOT NECESSARY
• When the task goal is information exchange and the user is collaborative
• Information exchange tasks• Must be done efficiently/ASAP• Short interaction (command, query) smartphones, smart speakers• Long interaction (news, tutor) virtual agents
• Not ‘collaborative’ users• Kids and senior people who do not understand the protocol
39
![Page 35: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/35.jpg)
Dialogue Category (Tasks)
No Resource(Dialog is task)
Information Services
Physical Tasks
Goal observable Negotiation Search, OrderReceptionist
ManipulationPorter, Cleaner
Content definite DebateInterview
NewscasterTutor, Guide
Objective shared CounselingSpeed dating
Attendant Helper
No clear objective(socialization)
ChattingCompanion
Agent is OK?
Android effective?40
Mechanical Robot
![Page 36: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/36.jpg)
Face‐to‐Face Multimodal Interaction
• Necessary for long and deep interaction• Talk about troubles or life
(ex.) counseling• To know communication skills and personality
(ex.) job interview, speed dating
• Multimodality• Mutual gaze…possible only with adult androids (?)• Head/body orientation• Hand gesture• Nodding
41
![Page 37: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/37.jpg)
Dialogue Roles of Adult Androids
Counseling
Receptionist, Attendant
NewscasterRole of Listen
ing
Role of Talking (to)
One person Several persons Many people
Interview
Short and shallow interaction
Long & deepinteraction
Long butno interaction
Guide
Should be autonomous and multi‐modal
42
Agent is OK
![Page 38: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/38.jpg)
Tool
• Smartphone Assistants
• Smart Speakers
Companion, Partner
• Communicative Robots
43
![Page 39: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/39.jpg)
3. Why spoken dialogue is NOT working well with robots?
46
![Page 40: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/40.jpg)
Agenda (Research Questions)
0. Why social robots are not prevailing in society?1. What kind of tasks are social robots expected to conduct?2. What kind of social robots are suitable for the tasks?3. Why spoken dialogue is not working with robots?
1. ASR and TTS2. SLU+DM (end‐to‐end?)
4. What kind of non‐verbal and other modalities are useful?1. Backchannel, turn‐taking2. Eye‐gaze
5. What kind of system architectures are suitable?6. What kind of ethical issues must be considered?
47
Kawahara
Jokinen
breakoptional
![Page 41: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/41.jpg)
DialogManagement
AutomaticSpeech
Recognition
Text-To-SpeechSynthesis
SpokenLanguage
Understanding
LanguageGeneration
TTS engine
Ask_Weather(PLACE: kyoto, DAY: saturday)“What is the weatherof Kyoto tomorrow?”
“Kyoto will be fine on this Saturday”
DOMAIN: weatherPLACE: kyotoDAY: saturdayASR engine
Architecture of Spoken Dialogue System (SDS)
48
![Page 42: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/42.jpg)
DialogManagement
AutomaticSpeech
Recognition
Text-To-SpeechSynthesis
SpokenLanguage
Understanding
LanguageGeneration
TTS engine
???(PLACE: tokyo)“How about Tokyo?”
DOMAIN: weatherPLACE: tokyoDAY: saturdayASR engine
Architecture of Spoken Dialogue System (SDS)
49
“Tokyo will also be fine on this Saturday”
![Page 43: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/43.jpg)
Automatic Speech Recognition (ASR)
50
![Page 44: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/44.jpg)
Challenges for Automatic Speech Recognition (ASR) for Robots• Distant speech
• Speaker localization & identification• Detection of speech (addressed to the system)• Suppression of noise and reverberation
• Conversational speech• Speech similar to those uttered to human (pets, kids) rather than machines• Typical users are kids and senior people
• Realtime response• Cloud‐based ASR servers have better accuracy, but large latency Talking similar to international phone calls
51
![Page 45: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/45.jpg)
Problems in Distant Speech
Smart Speakers Don’t care Use magic words Implemented
Maybe applicable to small (personal) robots• One person• Not so distant
52
• Speaker localization & identification• Detection of speech (addressed to the system)• Suppression of noise and reverberation
![Page 46: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/46.jpg)
Problems in Distant Speech
• Speaker localization & identification• Detection of speech (addressed to the system)• Suppression of noise and reverberation
Adult humanoid robots with camera ??? Implemented
Multi‐modal processing
53
![Page 47: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/47.jpg)
Detection of Speech addressed to the System
• Eye‐gaze (head‐pose)…most natural and reliable• Content of speech• Prosody of speech
• Machine learning• Not accurate enough must be close to 100%
• Incorporation of turn‐taking model• Context is useful
54
![Page 48: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/48.jpg)
Example Implementation for ERICA
Microphonearray
Kinect v2
HD camera
55
![Page 49: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/49.jpg)
Example Implementation for ERICA
Input
Microphone array
Depth camera (Kinect)
Sound source localization
Prosody extraction
Speech recognition ERICA
Text‐to‐speech
Motor controller
Lip movement
Output
audio speaker
Dialogue Manager (incl.
SLU&LG)
Motion information
Utterance sentence
Speech signal
Lip motioninformation
Motorsignal
Speech enhancement
Position & head‐pose tracking
56
![Page 50: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/50.jpg)
Real Problem in Distant Talking
• When people speak without microphone, speaking style becomes so casual that it is NOT easy to detect utterance units.
• False starts, ambiguous ending and continuation
• Not addressed in conventional “challenges”• Circumvented in conventional products
• Smartphones: push‐to‐talk• Smart speakers: magic word “Alexa”, “OK Google”• Pepper: talk when flash
• Incorporation of turn‐taking model• Context is useful
57
![Page 51: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/51.jpg)
Distant & Conversational Speech Recognition
SmartphoneVoice searchApple Siri
Home applianceAmazon EchoGoogle Home
Lecture &MeetingParliamentSwitchboard
Human‐HumanConversation
Close‐talking Input Distant
Conversational
Speaking‐style
Query/command(one‐sentence)
90%90%
93%95%
Close‐talk 82%Gun‐mic 72%Distant 66%
Accuracy is degraded with the synergy of two factors
58
![Page 52: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/52.jpg)
Review of ASRError Robustness and Recovery• Task and interaction need to be designed to work with low ASR accuracy
• Attentive listening
• Confirmation of critical words for actions• Command & control• Ordering
• Error recovery is difficult• Start‐over is easier for users, too
• Use of GUI?
© Softbank59
![Page 53: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/53.jpg)
Review of ASRLatency is Critical for Human‐like Conversation• Turn‐switch interval in human dialogue
• Average ~500msec• 700msec is too late difficult for smooth conversation (cf.) oversea phone calls
• Many cloud‐based ASR hardly meets requirement
• Recent Development of Streaming End‐to‐End ASR• All downstream NLP modules must also be tuned
60
![Page 54: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/54.jpg)
End‐to‐End Automatic Speech Recognition (ASR)
Enhancement
Acoustic model
Phone model Lexicon Lang.
model
Joint optimization
Acoustic‐to‐Phone/Character
Acoustic‐to‐Word
P(S|X)
P(W|X)
X S W
Speech(acoustic sequence)
Text(word sequence)
61
![Page 55: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/55.jpg)
End‐to‐End Speech Understanding
Enhancement
Acoustic model
Phone model Lexicon Lang.
model
Acoustic‐to‐Concept/Emotion
Speech(acoustic sequence)
Intention,Emotion
Concept &Emotion model
Prosody
How to definein open domain?
62
![Page 56: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/56.jpg)
Text‐To‐Speech Synthesis (TTS)
63
![Page 57: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/57.jpg)
Requirements in Text‐To‐Speech Synthesis (TTS)
• Very high quality• Intelligibility• Naturalness matched to the character (pet, kid, mechanical, humanoid)
• Conversational style rather than text‐reading• Questions (direct/indirect)
• A variety of non‐lexical utterances with a variety of prosody• Backchannels• Fillers• Laughter
Hardly implementedin conventional TTS
64
![Page 58: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/58.jpg)
End‐to‐End Text‐To‐Speech Synthesis (TTS)Tacotron 2 (2017‐)• Seq2seq model: char. seq. acoustic features• Wavenet: acoustic features waveform• “Comparable‐to‐Human performance”
• Mean Opinion Score (MOS) 4.53 vs. 4.58
https://google.github.io/tacotron/publications/tacotron2/Turing Test: Tacotron 2 or Human?
65
![Page 59: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/59.jpg)
Voice of Android ERICA
Conversation-oriented• Backchannels• Filler• Laughter
http://voicetext.jp (ERICA)
66
![Page 60: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/60.jpg)
Spoken Language Understanding (SLU)and
Dialogue Management (DM)
67
![Page 61: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/61.jpg)
DialogManagement
AutomaticSpeech
Recognition
Text-To-SpeechSynthesis
SpokenLanguage
Understanding
LanguageGeneration
TTS engine
ASR engine
Architecture of Spoken Dialogue System (SDS)Analysis/Matching → seq2seq model
68
![Page 62: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/62.jpg)
Historical Shift of Methodology
Finite State Machine (FSM)
Rule‐based mapping (SQL)
Prefixed flow
Statistical LM(N‐gram)
Discriminative model (SVM/CRF)
Reinforcementlearning (POMDP)
Example‐based model (VSM)
Neural LM(RNN)
Neural Classifier (RNN)
Seq2Seq model(encoder‐decoder)
AutomaticSpeechRecognition
SpokenLanguageUnderstanding
DialogueManagement
End‐to‐End model w/o SLU
Rule‐based(1990s)
Statistical Model(2000s)
Neural Model(2010s)
69
![Page 63: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/63.jpg)
Semantic Analysis for SLU
• Domain(ex.) weather, access, restaurant
• Intent• Many domains accept only one intent
(ex.) weather, access• Some accepts many kinds of queries
(ex.) scheduler…where, when
• Slot/Entity• Named Entity (NE) tagger• Numerical values
“play queen bohemian rhapsody”
DOMAIN: music_player
INTENT: start_music
SINGER: queenSONG: bohemian rhapsody
70
![Page 64: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/64.jpg)
Semantic Analysis for SLU
• Domain(ex.) weather, access, restaurant
• Intent• Many domains accept only one intent
(ex.) weather, access• Some accepts many kinds of queries
(ex.) scheduler…where, when
“play queen bohemian rhapsody”
DOMAIN: music_playerweather information……
INTENT: start_musicstop_musicvolume_up…..
Classification problem, given entire sentence• Statistical Discriminative Model: SVM, Logistic Regression• Neural Classifier: CNN, RNN 71
![Page 65: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/65.jpg)
Semantic Analysis for SLU
• Slot/Entity• Named Entity (NE) tagger• Numerical values
“play queen bohemian rhapsody”O B‐singer B‐song I‐song
SINGER: queenSONG: bohemian rhapsody
Sequence labeling problem• Statistical Discriminative Model: CRF• Neural Tagger: RNNDomain‐independent NE tagger
72
![Page 66: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/66.jpg)
Dialogue Management
• Decide proper Action• Make query/command• Present results
• Maintain Context
“What is the weather of Kyoto tomorrow?”
DOMAIN: weatherPLACE: kyotoDAY: saturday
Ask_Weather(PLACE: kyoto, DAY: saturday)
“Kyoto will be fine on this Saturday”
“How about Tokyo?”
DOMAIN: weatherPLACE: tokyoDAY: saturday
“Tokyo will be cloudy on this Saturday”
73
![Page 67: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/67.jpg)
Dialogue Management
• Decide proper Action• Make query/command• Present results
“What is the weather of Kyoto tomorrow?”
DOMAIN: weatherPLACE: kyotoDAY: saturday
Ask_Weather(PLACE: kyoto, DAY: saturday)
“Kyoto will be fine on this Saturday”
• Prefixed (hand‐crafted) flow• still pragmatic• Google Dialogflow, Microsoft LUIS..
• Reinforcement learning of stochastic model (POMDP)• Considers uncertainty/errors in input/processing• Difficult for maintenance, minor fix
• Neural model? 74
![Page 68: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/68.jpg)
Incomplete or Ambiguous Queries
• Majority of actions can be done with required slots(ex.) Weather place (date), Access destination, origin,
Take_object object (place)
• If some slot is missing,or some entity is ambiguous,the system
• needs to ask usersOR• use a default value
• current location/time• most frequently used one
• present all in GUI
“Tell me the weather?”“Weather in Cambridge?”
“Which location?”“Cambridge in UK or MA, USA?”
Widely used in smartphone assistants,but not necessarily applicable to robots working in a real word (w/o GUI)
75
![Page 69: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/69.jpg)
Disambiguation by DialogueSHRDLU [Winograd1972]Operating blocks in virtual worldU: PICK UP A BIG RED BLOCK. S: OK. (does it) U: GRASP THE PYRAMID. S: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN. U: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX. S: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING. U: OK. (does it)
76
![Page 70: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/70.jpg)
Review of Orthodox SLU+DM
• Show intelligent behaviors• Complex and not easy to implement
• Hand‐crafted rules• Easy and practical solution• Not flexible and scalable
• Machine learning (Statistical & Neural)• Needs a lot of data with annotation• Not pragmatic for each task domain
• Except for standard benchmark datasets and big companies• Artificial data generation? or Simulation?
Only works for small tasks
Only possible for big companies
77
![Page 71: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/71.jpg)
Alternatives of Orthodox SLU+DM
• End‐to‐End Dialogue Model w/o SLU• Example‐Based Dialogue Model• Neural Conversation (Seq2Seq) model
78
![Page 72: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/72.jpg)
Dialogue based on Pattern MatchingELIZA [Weizenbaum1966]CounselingUser: Men are all alike.ELIZA: In what way?User: They’re always bugging us about something or other.ELIZA: Can you think of a specific example?User: Well, my boyfriend made me come here.ELIZA: Your boyfriend made you come here.
79
![Page 73: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/73.jpg)
Dialogue based on Pattern MatchingELIZA [Weizenbaum1966]
Word:Rankalike:10
always:5
what:2
OutputsIn what way?What resemblance do you see?
Can you think of a specific example?Really, always?
Why do you ask?Does that interest you?
80
![Page 74: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/74.jpg)
Example‐Based Dialogue Model
Input (example template) Action / Output
what is the weather of PLACE Weather(PLACE, today)is PLACE fine on DAY Weather(PLACE, DAY)I am going to PLACE Access(current, PLACE)Tell me how to get to PLACE Access(current, PLACE)It is hot today turn_on_airconditioner
“Why don’t you have some beer?”
“We are going to Tokyo for a meeting”
“we are going to PLACE for a meeting”
“Here is a direction to get to Tokyo” 81
![Page 75: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/75.jpg)
Example‐Based Dialogue Model
• Vector Space Model (VSM)• Feature: Bag‐Of‐Words model (1‐hot vector word embedding)• Metric: cosine distance weighted on content words
• Neural model• Compute similarity between input text and example templates (in shortlist)• Elaborate matching by considering context• Needs a training data set
82
![Page 76: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/76.jpg)
Incorporation of Information Retrieval (IR) and Question Answering (QA)• Example database…limited & hand‐crafted
• IR technology to search for relevant text• Large documents or Web
• Manuals, recipe “How can I change the battery?”• Wikipedia “I want to visit Kinkakuji temple”• news articles “How was New York Yankees yesterday?”
• Need to modify the text for response utterance• QA technology to find an answer
• Who, when, where…• When was Kinkakuji temple built?• How tall is Mt. Fuji?
• Works only with limited cases83
![Page 77: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/77.jpg)
Review of Example‐Based Dialogue Model
• Easy to implement and generate high‐quality responses• Pragmatic solution for working systems and robots
• Applicable only to a limited domain and not scalable• ~hundreds of patterns
• Does not consider dialogue context• One query One response• Need an anaphora resolution for “he/she/it”• Shallow interaction, Not so intelligent
84
![Page 78: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/78.jpg)
Neural Conversation Model
how are you <eos>
i
i
am
am
fine
fine
<eos>
LSTM
LSTM
LSTM
LSTM
LSTM
LSTM
LSTM
Input
Response
85
![Page 79: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/79.jpg)
Encoder‐Decoder (Seq2Seq) Model with Attention Mechanism
I am fine and how about you
Input Xt hello how are you doing in these days
Distributerep. Ht
Response Yj
Encoder LSTM
・・・・・・・
Decoder LSTMAttention mechanism
weight
ai =f(si, ht)(Σ aihi)
Word embedding
86
![Page 80: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/80.jpg)
Encoder‐Decoder (Seq2Seq) Model with Attention Mechanism
x1
h1
x2
h2
x3
h3
LSTM
LSTM
LSTM
LSTM
LSTMy1 y2
a1 a2 a3
Decoder
Encoder
c1 c2• Encode input sequence via LSTM• Decode with another LSTM
– Asynchronous with input
• Weights on encoded LSTMoutput (Σ aihi)– Weight ai are computed based on decoder state and output
• End‐to‐end joint training
Attentionweights
87
![Page 81: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/81.jpg)
Review of Neural Seq2Seq Model
• Needs a huge amount of training data• Ubuntu [Lowe et al 15] software support• OpenSubtitles [Lison et al 2016] Movie Subtitles• Reddit [Yang et al 2018] text on bulletin boards
• Consider dialogue context (by encoding)• Do NOT explicitly conduct SLU to infer intent and slot values• NOT straightforward to integrate with external DB & KB• Converge to generic responses with little diversity
• Frequent and acceptable in many cases“I see”, “really?”, “how about you?”
88
![Page 82: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/82.jpg)
Ground‐truth in Dialogue(?)
• Many choices in response given a user input• Trade‐off
• Safe (boring)• Elaborate (challenging)
• Simple retrieval or machine learning from human conversations is NOT sufficient
• Filter golden samples• Need a model of emotions, desire and characters
89
I like cheese.(a) That’s good. (Reaction)(b) I like blue cheese. (Statement)(c) What kind of cheese? (Question)
![Page 83: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/83.jpg)
(Summary) Review of Dialogue Models
• SLU + Dialog Flow• Suitable for goal‐oriented (complex) dialogue• Provide appropriate interactions for limited scenarios
• Example‐Based Dialogue and QA• Suitable for simple tasks and conversations• One response per one query
• Chatting based on Neural Seq2Seq Model• Very shallow but wide coverage• Useful for ice‐breaking, relaxing and keeping engagement
HybridCombination
90
![Page 84: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/84.jpg)
4. What kind of non‐verbal and other modalities are usefulfor human‐robot interaction?
91
![Page 85: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/85.jpg)
Non‐verbal Issuesin Dialogue
92
![Page 86: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/86.jpg)
Protocol of Spoken Dialogue
• Human‐Machine Interface• Command & Control• Database/Information Retrieval• One command/query One response• No user utterance No response
• Human‐Human Dialogue• Task goals are not definite• Many sentences per one turn• Backchannels
Half duplex
Full duplex93
![Page 87: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/87.jpg)
Non‐lexical utterances‐‐“Voice” beyond “Speech”‐‐• Continuer Backchannels: “right”, “はい”
• listening, understanding, agreeing to the speaker
• Assessment Backchannels: “wow”, “へー”• Surprise, interest and empathy
• Fillers: “well”, “えーと”• Attention, politeness
• Laughter• Funny, socializing, self‐pity
94
![Page 88: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/88.jpg)
Comparison of Dialogue InterfacesSmart Speaker
Virtual Agent
Pet Robot
Child Robot
Adult Android
Continuer BC “right”Assessment BC “wow”
Filler “well”
laughter
95???
![Page 89: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/89.jpg)
Role of Backchannels
• Feedback for smooth communication• Indicate that the listener is listening, understanding, agreeing to the speaker“right” , “はい” , “うん”
• Express listener’s reactions• Surprise, interest and empathy“wow”, “あー” , “へー”
• Produce a sense of rhythm and feelings of synchrony, contingency and rapport
96
![Page 90: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/90.jpg)
Factors in Backchannel Generation
• Timing (when)• Usually at the end of speaker’s utterances• Should predict before end‐point detection
• Lexical form (what)• Machine learning using prosodic and linguistic features
• Prosody (how)• Adjust according to preceding user utterance
(cf.) Many systems use same recorded pattern,giving monotonous impression to users
97
![Page 91: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/91.jpg)
Generating Backchannels
• Conventional: fixed patterns
• Random 4 kinds
• Machine learning: context‐dependent(proposed)
98
![Page 92: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/92.jpg)
Subjective Evaluation of Backchannels[Kawahara:INTERSPEECH16]
random proposed counselorAre backchannels natural? ‐0.42 1.04 0.79Are backchannels in good tempo? 0.25 1.29 1.00Did the system understand well? ‐0.13 1.17 0.79Did the system show empathy? 0.13 1.04 0.46Would like to talk to this system? ‐0.33 0.96 0.29
• obtained higher rating than random generation• even comparable to the counselor’s choice, though the scores are not sufficiently high
• Same voice files are used for each backchannel form• Need to change the prosody as well 99
![Page 93: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/93.jpg)
Role of Fillers
• Signals thinking & hesitation• Improves comprehension
• Provide time for comprehension
• Attracts attention & improves politeness• Mitigate abrupt speaking
• Smooth turn‐taking• Hold the current turn, or Take a turn
100
![Page 94: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/94.jpg)
Factors in Filler Generation
• Timing (when)• Usually at the beginning of speaker’s utterances
• Lexical form (what)• Machine learning using prosodic and linguistic features and also dialogue acts
• Prosody (how)???
(cf.) frequent generation of fillers (at every pause) is annoying
101
![Page 95: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/95.jpg)
Generating Fillers
• No filler
• Filler before moving to next question
102
![Page 96: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/96.jpg)
Generating Laugher
• People laugh not necessarily because funny• But to socialize and relax
• Should laugh together (shared‐laughter)
• Sometimes for masochistic• Should not respond to negative laugher
103
![Page 97: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/97.jpg)
Detection of Laughter, Backchannels & Fillers
Detectionresult
Groundtruth
104
![Page 98: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/98.jpg)
Turn‐taking
105
![Page 99: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/99.jpg)
Protocol of Spoken Dialogue
• Human‐Machine Interface• One command/query One response• No user utterance No response
• Human‐Human Dialogue• Many sentences per one turn• Backchannels
Half duplex
Full duplex106
![Page 100: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/100.jpg)
Tool
• Smartphone Assistants
• Smart Speakers
Companion, Partner
• Communicative Robots
107
![Page 101: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/101.jpg)
Flexible Turn‐taking
• Natural turn‐taking push‐to‐talk, magic words
• Avoid speech collision (of system utterance in user utterance) required• Latency of robot’s response
• Allow barge‐in (user utterance while system speaking)? challenging• ASR and SLU errors
• Machine learning using human conversation is not easy• Behavior is different between human‐human and human‐robot• Turn‐taking is arbitrary, no ground‐truth
108
![Page 102: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/102.jpg)
Turn‐switch after Question
Speaker A
Speaker B
Question
Response
“What do you like to eat?” “Do you like Sushi?”
Question
“I like Sushi.”
Obvious?
109
![Page 103: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/103.jpg)
Turn‐keep/switch after Statement?
Statement Statement
Response
Speaker A
Speaker B
“I like Tempura.”
Ambiguous
“I like Sushi very much.” “Sushi is best in Japan.”
110
![Page 104: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/104.jpg)
Turn‐keep/switch after Response?
ResponseResponseStatementQuestionQuestionResponseStatement
Speaker A
Speaker B
Very Ambiguous
“I like Sushi.“
• “I like Tempura, too.”• “Sushi is best in Japan.”• “What do you like?“
• “What kind of Sushi?”• “I like Sushi, too.”• “Sushi is best in Japan.”
“What do you like?”
111
![Page 105: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/105.jpg)
Turn‐taking Prediction Model
• System needs to determine if the user keeps talking or the system can (or should) take a turn
• Turn‐taking cue (features) can be different between human and robot• Prosody…pause, pitch, power• Eye‐gaze
• Machine learning model ground truth? Turn‐taking is arbitrary• Logistic regression…decision at each end of utterance• LSTM…frame‐wise prediction, but decision at each end of utterance
112
![Page 106: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/106.jpg)
Proactive Turn‐taking System
• Fuzzy decision Binary decision
• Use fillers and backchannels when ambiguous
User status System actionUser definitely holds a turn nothingUser maybe holds a turn continuer backchannelUser maybe yields a turn filler to take a turnUser definitely yields a turn response
confiden
ce
113
![Page 107: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/107.jpg)
Turn‐keep/switch after Statement?
Statement Statement
backchannel
Speaker A
Speaker B
“I like Sushi very much.” “Sushi is best in Japan.”
“right.”
Please go on!
114
![Page 108: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/108.jpg)
Turn‐keep/switch after Statement?
fillerStatement StatementSpeaker A
Speaker B
“I like Sushi very much.” “Well,“
Hold a turn
“Sushi is best in Japan.”
115
![Page 109: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/109.jpg)
Use Filler (+Gaze Aversion)for Proactive Turn‐taking
116
![Page 110: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/110.jpg)
References
1. Christoph Bartneck, Tony Belpaeme, Friederike Eyssel, Takayuki Kanda, Merel Keijsers, Selma Sabanovi.Human‐Robot Interaction — An Introduction.https://www.human‐robot‐interaction.org/
2. T. Kawahara.Spoken dialogue system for a human‐like conversational robot ERICA. In Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS), (keynote speech), 2018.
3. K. Jokinen. Dialogue Models for Socially Intelligent Robots.In Proc. ICSR, 2018.
137
![Page 111: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/111.jpg)
Agenda (Research Questions)
0. Why social robots are not prevalent in society?1. What kind of tasks are social robots expected to perform?2. What kind of social robots are suitable for the tasks?3. Why spoken dialogue is not working with robots?
1. ASR and TTS2. SLU+DM (end‐to‐end?)
4. What kind of non‐verbal and other modalities are useful?1. Backchannel, turn‐taking2. Eye‐gaze, gestures
5. What kind of system architectures are suitable?6. What kind of ethical issues must be considered?
4
Kawahara
Jokinen
breakoptional
![Page 112: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/112.jpg)
Gaze and Attention
• Visual processing systems (Unema et al. 2005, Findlay & Gilchrist 2003)• Global processing: long saccades and short fixations
early in the viewing to get a gist of a scene and the main regions of interest
• Local processing: short saccades and long fixations later in the viewing to examine the focus of attention in more details
• Eye‐gaze in everyday activities (Land, 2006)• Proactive and preparatory information• Gaze and gesture coordination:
look ahead before manipulating them
Visual Attention in Interaction
• Gaze in human‐human and human‐agent interactive situations (Kendon 1967, Argyle & Cook 1976, Nakano et al. 2007, Edlund et al. 2000, Mutlu et al. 2009, Jokinen et al. 2009, 2010, Andrist et al. 2014)• Focus of shared attention
• Coordination of turn‐taking (mutual gaze)
• “Pointing device”
• Conversational feedback
• Building trust and rapport
Attention: a process to select the information that enters working memory to be processed (Knudsen, 2007)
Primary gaze function is to get information. Also, gaze indicates one’s attention, engagement, and presence. It coordinates and organises interaction.
![Page 113: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/113.jpg)
Measurements• Frequency of fixations on AOI (area of interest)• Duration of individual fixations on the AOI• Accumulated fixations time on the AOI
• Average Gazing Ratio (GR):
1𝑛
𝑑𝑢𝑟𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑝𝑎𝑟𝑡𝑖𝑐𝑖𝑝𝑎𝑛𝑡′𝑠 𝑔𝑎𝑧𝑒 𝑡𝑜𝑤𝑎𝑟𝑑 𝑡ℎ𝑒 𝑟𝑜𝑏𝑜𝑡 𝑑𝑢𝑟𝑖𝑛𝑔 𝑖 𝑤𝑖𝑛𝑑𝑜𝑤𝑑𝑢𝑟𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑖 𝑤𝑖𝑛𝑑𝑜𝑤
Fixations (stops):fixation length 80‐120 ms, ave 3 fixations/secSaccades (jumps): fast eye‐movemens, during which we’re practically blind
![Page 114: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/114.jpg)
Eye‐gaze in predicting turn‐taking• Spoken dialogues studies
• Acoustic correlates at suitable turn‐taking places• Pause length: usually about 0.2 second pause, but if longer than
0.5 seconds, the current speaker is likely to continue talking
• Eye‐gaze in interactions• Speakers look at face area 99% of time• Mutual gaze to agree on the change of speaker• Gaze activity changes more at start of utterance than in middle or
end of utterance • Gaze wanders off quickly after start of utterance, but fixates on
partner a long time before end of utterance
• Eye‐gaze helps to distinguish who will speak after pauses• Gaze aversion and longer pause signals hesitation
=> the current speaker wants to hold the turn• Gaze at the partner and longer pause signals end of utterance
=> the speaker wants to give turn to the partner
Jokinen, K., Furukawa, H., Nishida, M., Yamamoto, S. (2013). Gaze and Turn‐taking behaviour in Casual Conversational Interactions. ACM Transactions on Interactive Intelligent Systems (TiiS) Journal, Special Section on Eye‐gaze and Conversational Engagement, Guest Editors: Elisabeth André and Joyce Chai. Vol 3, Issue 2.
![Page 115: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/115.jpg)
Gaze behavior in three‐party conversations
• Interlocutors: speaker, active partner, silent partner• More gazing to the active partner than to the silent one• More gaze activity to both partners when speaking than when listening or backchannelling• When speaking, gaze is divided between partners• When listening, gaze is directed at the active partner
• Silent partner’s impact on the conversation:• If silent partner is passive (not moving), Subjectgazes at the silent partner’s face less often but twice aslong than when this is engaged (seems to listen actively)
• If the silent partner passive, Subject gazes at the background more than engaged actively
Levitski, A., Radun, J., Jokinen, K. (2012). Visual Interaction and Conversational Activity. The 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Eye Gaze and Multimodality, at the 14th ACM International Conference on Multimodal Interaction. ICMI.
![Page 116: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/116.jpg)
Eye‐gaze and unexpected dialogue breakdownsUsing the AICO corpus (Jokinen 2020) and gaze ratio measurement:• Usually participants tend to gaze away from the robot after they finish speaking (up to 200ms) and start to gaze at the
robot when the robot gives feedback (after 200ms) ‐> Correctly understood or Misunderstood utterances
• Unexpected responses (robot gives no feedback or says something unexpected) produce a different gaze pattern: participants gaze at somewhere else than the robot (after 200ms) ‐> Not‐understood utterances
50.0%
55.0%
60.0%
65.0%
70.0%
75.0%
BEFORE ‐200~0MS DURING UTTERANCE AFTER 0~200MS AFTER 200~400MS AFTER 400~600MS
SPEA
KER’S GAZ
ING RAT
IO
All utterances Correct‐understood Miss‐understood None‐understood“All utterances” include Other and Backchannel
Ijuin, K., Jokinen, K. (2019). Utterances in Social Robot Interactions – correlation analyses between robot’s fluency and participant’s impression. HAI 2019
Gaze ratio:Ave. ratio between duration of looking
at the partner within a window and length
of the window
Cognitive processing demands are reflected in the eye‐gaze behaviour
![Page 117: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/117.jpg)
Human-Human Human-RobotStory-telling Face Face, Body
Instruction giving Face Face
Comparison of Eye Gazes in HHI and HRI
Laohakangvalvit, T., Jokinen, K. (2019). Eye‐gaze Behaviors between Human‐Human and Human‐Robot Interactions in Natural Scene. ERCEM, August 2019
• Participants tend to look at “face” in HHIand “body” in HRI.
• In HHI, humans monitor partner’s reactions expressed by face and eyes, which provide necessary information.
• In HRI, participants focus on the robot’s body rather than on its face, since the face does not provide “live” information about its internal state like its emotion or intention.
• Results concern Nao robotic face, may be different for Erica’s android face
![Page 118: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/118.jpg)
Comparison of Eye Gazes in HHI and HRI (results)
Laohakangvalvit, T., Jokinen, K. (2019). Eye‐gaze Behaviors between Human‐Human and Human‐Robot Interactions in Natural Scene. ERCEM, August 2019
![Page 119: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/119.jpg)
Eye‐gaze and perceived personality traits
6. Reserved, quiet7. Sympathetic, warm8. Disorganized, careless9. Calm, emotionally stable10. Conventional, uncreative
Ijuin & Jokinen (2020). Exploring Gaze Behaviour and Perceived Personality Traits. HCII 2020.
Questionnaire (7‐point Likert scale) composed on 10 questions on perceived personality, paired with Big5 personality scale:I believe the participant to be: Big 5 Model (McCrae 2002):
1. Extraverted, enthusiastic2. Critical, quarrelsome3. Dependable, self‐disciplined4. Anxious, easily upset5. Open to new experiences
Extraversion (1 & 6)Agreeableness (2 & 7)
Conscientiousness (3 & 8)Emotional stability (4 & 9)
Openness (5 & 10)
• Positive correlation between frequency of gaze to partner’s face in HHI & perceived emotional stability of the subject ‐> relaxed, interest in the partner, deals well with stress
• Positive correlation between average duration of gaze to partner’s body in HRI & perceived openness to experience of the subject ‐> curiosity towards the robot
• Other factors besides personality: social politeness, interest in robot gesturing, type of robot (Nao‐robot)
![Page 120: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/120.jpg)
Gesturing and intercultural aspects in HHI and HRI
LREC ONION Workhshop on peOple in laNguage, vIsiOn and the miNd.
https://onion2020.github.io/presentations/
• Hand, head and body gestures in• human‐human interaction vs human‐robot interaction• Japanese vs English interaction
• AICO Corpus (Jokinen, 2020)• 60 conversations (30 participants × 2 sessions)• 20 Japanese + 10 English speaking participants• 30 human‐human (HH) + 30 human‐robot (HR) interactions• Annotated with gesture form and function based on the
MUMIN annotation scheme (Allwood et al. 2007)
• Hand gestures are 6 times more frequent in English than Japanese HHI,with more varied form and function, but the difference is not big in HRI
• Head gestures reduced significantly in HRI, but the difference between Japanese and English speakers is not big
• Body gestures increased in HRI, and English speakers produced more body gestures than Japanese
• Gesture detection and activity analysis are important for HRI, but it is important to elicit human gesturing in HRI first
![Page 121: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/121.jpg)
Understanding
NLP Topic
Interaction
Decision Making
Planning
Learning
Memory
Reaction
TTS
Head Action
Gesturing
Moving
Contact
SignalDetection
Perception
Attention
ASR
Gaze recognition
Gesture recognition Dialogue Context
Digital DatabasesKnow
ledge bases
Ontologies
Awareness of communication Engagement
in interaction
Situational Awareness
Ability to communicate smoothly in a given situation• Knowledge of the world, partner, “what is going on”• States of knowledge (dialogue states)• Perception, feedback, and action
→
Endsley (1995)
Grounding and
Mutual Context
![Page 122: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/122.jpg)
WikiTalk and WikiListenTowards listening robots that can join in conversations with topically relevant contributions
Challenges• Progress on multiparty speech recognition• Progress on robust wikification of speech • Ethical, legal & social issues of listening
robots (saving/clearing short/long‐term memories)
WikiTalk: Wikipedia‐based talking• Robots can talk fluently about thousands of
topics using Wikipedia information• Robots make smooth shifts to related topics
predicted from Wikipedia links• Topics are disambiguated using the
continuously changing dialogue context
WikiListen: Wikipedia‐based listening• Robots will listen to multiparty human
conversations and track changing topics• Wikification of speech to link mentioned
entities and events to Wikipedia articles• Later, robots will learn to join in with
topically relevant dialogue contributions
ERICA and WikiTalk
NAO and WikiTalkD. Lala, G. Wilcock, K. Jokinen, T. Kawahara. ERICA and WikiTalk. IJCAI 2019.G. Wilcock, K. Jokinen. Multilingual WikiTalk: Wikipedia‐based talking robots that switch languages. SIGDial 2015.
![Page 123: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/123.jpg)
Agenda (Research Questions)
0. Why social robots are not prevalent in society?1. What kind of tasks are social robots expected to perform?2. What kind of social robots are suitable for the tasks?3. Why spoken dialogue is not working with robots?
1. ASR and TTS2. SLU+DM (end‐to‐end?)
4. What kind of non‐verbal and other modalities are useful?1. Backchannel, turn‐taking2. Eye‐gaze
5. What kind of system architectures are suitable?6. What kind of ethical issues must be considered?
16
Kawahara
Jokinen
breakoptional
![Page 124: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/124.jpg)
Co‐creating Interaction
Allwood (1976), Jokinen (2009)
•Production of one’s own action•Refinement and execution of the planReaction
•Meaning creation for the symbols in thecontext
•Establish joint goals, plansUnderstanding
•Recognition of communicative signals(symbols)
•Build representation of conceptsPerception
•Hearing/seeing/touching (distance)•Establish the communicative situationContact
Grounding and Mutual Context
![Page 125: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/125.jpg)
•Production of one’s own action•Refinement and execution of the planReaction
•Meaning creation for the symbols in thecontext
•Establish joint goals, plansUnderstanding
•Recognition of communicative signals(symbols)
•Build representation of conceptsPerception
•Hearing/seeing/touching (distance)•Establish the communicative situationContact
Co‐creating Interaction
Jokinen (2009)
![Page 126: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/126.jpg)
Constructive Dialogue Modelling (CDM)
Sets of modules which correspond to the four communicative enablementsJokinen (2009, 2019)
![Page 127: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/127.jpg)
More modularised system architecture
SignalDetectionModule
Contact Perception Understanding Reaction
ASRModule
GazeRecognitionModule
GestureRecognitionModule
NLPModule TTS
Module
HeadAction Module
GestureAction Module
Digital Knowledge Base
Externalworld External
world
PlanningModule
InteractionModule
Context
Memory
TopicModule
Attention
LearningModule
DecisionMakingModule
Ontology
ASR+keywordspotting
Event‐based semanticsTopicRec, ”Proposals”
Template Generation”Autonomous Life”
Jokinen (2009, 2019), Jokinen & Wilcock (2014)
![Page 128: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/128.jpg)
More end2end integration (deep learning)
![Page 129: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/129.jpg)
Yoshua Bengio (IJCAI 2018)• What’s Missing with Deep Learning? • Answer: Deep Understanding
• Learning « How the world ticks »• So long as our machine learning models “cheat” by relying only on superficial statistical regularities, they remain vulnerable to out‐of‐distribution examples
• Humans generalize better than other animals thanks to a more accurate internal model of the underlying causal relationships
• To predict future situations (e.g., the effect of planned actions) far from anything seen before while involving known concepts, an essential component of reasoning, intelligence and science
• Deep learning to expand from perception & system 1 cognition to reasoning & system 2 cognition (Kahneman (2011) Thinking, Fast and Slow.)
![Page 130: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/130.jpg)
Subsumption Architecture for Autonomous Robots
Brooks (1986), Li et al. (2016)
![Page 131: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/131.jpg)
Dialogue Acts (Speech, Text)
Physical Acts(Gaze,Gesturing,Moving)
DATA
FUSION
Speech
Gaze
Gesturing
Sensoring
IoT data
Goals, reason to interact
Task plans, memories
Topic
NLU, NLG
Reference
Signal detection
Attention
Dialogue State
Dialogue policy
MULTIMODAL INPUTS MULTIMODAL OUTPUTS
Context‐Aware Cognitive Agent Architecture
Jokinen (2020). Robotdial. IJCAI.Subsumption architecture with a hierarchy of layers which relate to behavioural competences = communicative enablements
![Page 132: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/132.jpg)
Agenda (Research Questions)
0. Why social robots are not prevalent in society?1. What kind of tasks are social robots expected to perform?2. What kind of social robots are suitable for the tasks?3. Why spoken dialogue is not working with robots?
1. ASR and TTS2. SLU+DM (end‐to‐end?)
4. What kind of non‐verbal and other modalities are useful?1. Backchannel, turn‐taking2. Eye‐gaze
5. What kind of system architectures are suitable?6. What kind of ethical issues must be considered?
25
Kawahara
Jokinen
breakoptional
![Page 133: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/133.jpg)
Human‐like, but not human communication
• Reeves and Nass: anthropomorphize inanimated objects
• Mori: Uncanny valley
• Moore 2012: Bayesian explanation of Uncanny Valley: CognitiveDissonance between conflicting categories
• Jokinen and Watanabe (2019): Robots as boundary‐crossing agents for new services
5/12/2019 NerdNite Tokyo / K.Jokinen
![Page 134: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/134.jpg)
Robot’s Dual Characteristics
• Processing capability• Accurate movement/ mobility
• Information from Internet
Robot as a smart computer
• Dialogue capability• Social awareness
Robot as a smart agent
![Page 135: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/135.jpg)
Boundary Crossing Robots• Facilitate interaction and mutual intelligibility between different perspectives• Novel ways to interact with social robots as cooperative agents
• Different boundaries to cross:• Conceptual categorization • Team membership• Expectations on understanding
• Experiments with various types of social robots• Assigning a clear role to the robot agent in the context
• Goal • not only to increase user’s positive experience with robot• but to minimize differences between the user’s expectations
and experience of social robot agents
• Symbiotic relation between humans and robots
Jokinen K., Watanabe K. (2019) Boundary‐Crossing Robots: Societal Impact of Interactions with Socially Capable Autonomous Agents. In: Salichs M. et al. (eds) Social Robotics. ICSR 2019. Watanabe, K., Jokinen, K., 2020. Interactive Robotic Systems as Boundary‐Crossing Robots – the User’s View. Ro‐Man
![Page 136: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/136.jpg)
Ethics, trust, and reliability• Data storing and privacy
• Encryption, secure identification, periodic deletion
• Issues in the development of dialogue systems• Unintentional biases in the data for the development of dialogue models • Data sharing, transfer learning, sensitive info, masking• Delivery of sensitive information: critical vs. less important info• System evaluation: new evaluation metrics, acceptance and suitability:
what is conversational adequacy?
• Legal issues• Awareness and conscious agreement of recording, logging• Ownership of the dialogue data and its use• Access rights to interactive situations (family, friends, staff, passers‐by, officials)• Responsibility for actions and information (inaccurate, unreliable, prejudiced, …)
![Page 137: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/137.jpg)
Ethical issues for social robot applications
• Personalisation and individual preferences • Embodiment, appearance, digital • True information vs. respecting the person’s
view‐point• Long‐term interactions
• Affection and emotional support• Mutual trust • Learning interaction strategies through
interaction• Recorded changes count towards long‐term
monitoring• Tradeoff between security & safety vs. privacy
• Social norms and general principles • Appropriate, trustworthy and acceptable
behaviour • Different cultures, different social norms
• Standards and standardisation• Maximize compatibility, interoperability,
safety, quality, explainability• Repeatability or creative variation
• Participatory research• Involving final users
Acceptance and impactWhat kind of robot systems are desirable? (function, appearance)
Where can the social robot make a difference?
![Page 138: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/138.jpg)
Future capabilities of social robots?
Can a robot be a counselor?
Can a robot assess a human?
Can AI assess a human?
Can robot have conflicting goals with humans?
Can a robot have a personality?
Can a robot be a soul mate of a senior person?
Can an AI agent be a soul mate (lover) of a young person?
![Page 139: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/139.jpg)
Future capabilities of social robots?
Can a robot be a counselor? yes
Can a robot assess a human? yes
Can AI assess a human? yes
Can robot have conflicting goals with humans? yes
Can a robot have a personality? yes
Can a robot be a soul mate of a senior person? yes
Can an AI agent be a soul mate (lover) of a young person? yes
![Page 140: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/140.jpg)
Examples of the capabilities
Can a robot be a counselor? Eliza (Weizenbaum 1966), Ellie (ICT, 2011, https://ict.usc.edu/prototypes/simsensei/
Can a robot assess a human?
DiNuovo et al. 2019 https://www.researchgate.net/publication/331673714_Assessment_of_Cognitive_skills_via_Human‐robot_Interaction and Cloud_Computing
Can AI assess a human? Personality tests, job interviews, implicit assessment of articles by author information, …
Can robot have conflicting goals with humans? HAL2000, navigation obstacles, recommendations unacceptable
Can a robot have a personality? Affection, humor, appearance, …
Can a robot be a soul mate of a senior person? Companion robots
Can an AI agent be a soul mate (lover) of a young person? Companion robots
![Page 141: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/141.jpg)
Future capabilities and ethics
Should a robot provide counseling?
Should a robot perform assessment of a human?
Should AI perform assessment a human?
Should robot be allowed to have conflicting goals with humans?
Should a robot have a personality?
Should a robot be a soul mate of a senior person?
Should an AI agent be a soul mate (lover) of a young person?
![Page 142: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/142.jpg)
Future capabilities and ethics
“It is important to reflect how the capabilities and characteristics of
current robot agents can shape the world and our reality (skills) and how
such agents should shape the future societies and services (needs)”
Jokinen, K. (2020). Exploring Boundaries among Interactive Robots and Humans. In: D'Haro et al. (Eds.) Conversational Dialogue Systems for the Next Decade. LNEE, Springer.
![Page 143: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/143.jpg)
Future capabilities and ethics
Should a robot provide counseling? ???
Should a robot perform assessment of a human? ???
Should AI perform assessment a human? ???
Should robot be allowed to have conflicting goals with humans? ???
Should a robot have a personality? ???
Should a robot be a soul mate of a senior person? ???
Should an AI agent be a soul mate (lover) of a young person? ???
![Page 144: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/144.jpg)
Development of Interactive Agents
![Page 145: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/145.jpg)
Development of Interactive Agents
![Page 146: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/146.jpg)
Some References• Allwood, J. 1976. Linguistic Communication as Action and Cooperation. Gothenburg Monographs in Linguistics, 2. Göteborg
University, Department of Linguistics.
• Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., Paggio, P. 2007. The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Special Issue, International Journal of Language Resources and Evaluation, pp. 273–287.
• Andrist, S., Mutlu, B., Gleicher, M. 2013. Conversational gaze aversion for virtual agents. In Proc. IVA 2013, pp 249–262.
• Argyle M., Cook, M. 1976. Gaze and Mutual Gaze. Cambridge University Press.
• Endsley, M.R.: Toward a Theory of Situation Awareness in Dynamic Systems. Human Factors Journal 37(1), 32‐64
• Findlay, J.M., Gilchrist, I.D. 2003. Active Vision: the Psychology of Looking and Seeing. Oxford University Press, Oxford.
• Ijuin, K., Jokinen, K. (2019). Utterances in Social Robot Interactions – correlation analyses between robot’s fluency and participant’s impression. HAI 2019
• Jokinen, K. 2009. Constructive Dialogue Modelling ‐ Speech Interaction with Rational Agents. John Wiley & Sons.
• Jokinen, K. 2018. Dialogue Models for Socially Intelligent Robots. ICSR 2018, Qingdao.
• Jokinen, K., Harada, K., Nishida, M. Yamamoto, S. 2010. Turn‐alignment using eye‐gaze and speech in conversational interaction.Proceedings of Interspeech‐2010. Makuhari, Japan.
• Jokinen, K., Furukawa, H., Nishida, M., Yamamoto, S. (2013). Gaze and Turn‐taking behaviour in Casual Conversational Interactions. , Special Issue on Eye‐gaze and Conversational Engagement. ACM Transactions on Interactive Intelligent Systems (TiiS), 3:2.
• Jokinen K., Watanabe K. (2019) Boundary‐Crossing Robots: Societal Impact of Interactions with Socially Capable Autonomous Agents. In: Salichs M. et al. (eds) Social Robotics. ICSR 2019.
![Page 147: Spoken Dialoguefor Social RobotsEye‐gaze 5.What kind of system architecturesare suitable? 6.What kind of ethical issues must be considered? 14 Kawahara Jokinen break optional 0](https://reader035.vdocuments.us/reader035/viewer/2022070205/60f761d1dd2c8c546e00c18a/html5/thumbnails/147.jpg)
• Jokinen, K., Hurtig, T. 2006. User Expectations and Real Experience on a Multimodal Interactive System. Proceedings of Interspeech2006, Pittsburgh, US.
• Jokinen, K., Scherer, S. 2012. Embodied Communicative Activity in Cooperative Conversational Interactions ‐ studies in Visual Interaction Management. Acta Polytechnica Hungarica. Journal of Applied Sciences. Vol 9, No. 1, pp. 19‐40.
• Kendon, A. 2004. Gesture: Visible Action as Utterance. Cambridge University Press
• Lala, D., Wilcock, G., Jokinen, K., Kawahara, T. 2019. ERICA and WikiTalk. IJCAI 2019.
• Laohakangvalvit, T., Jokinen, K. (2019). Eye‐gaze Behaviors between Human‐Human and Human‐Robot Interactions in Natural Scene. ERCEM, 2019.
• Levitski, A., Radun, J., Jokinen, K. 2012. Visual Interaction and Conversational Activity. The 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Eye Gaze and Multimodality Santa Monica, CA, USA
• Mori, T., Jokinen, K., Den, Y. 2020. Analysis of Body Behaviours in Human‐Human and Human‐Robot Interactions LREC ONION Workhshop on peOple in laNguage, vIsiOn and the mind.
• Mutlu, B. Kanda, T. Forlizzi, J. Hodgins, J., Ishiguro, H. 2012. Conversational gaze mechanisms for humanlike robots. ACM Transactions on Interactive Intelligent Systems, 1(2):12:1–12:33.
• Nakano, Y., Conati, C., Bader, T. 2013. Eye Gaze in Intelligent User Interfaces, Springer
• Watanabe, K., Jokinen, K., 2020. Interactive Robotic Systems as Boundary‐Crossing Robots – the User’s View. The 29th IEEE International Conference on Robot and Human Interactive Communication (Ro‐Man).
• Wilcock, G., Jokinen, K. 2015. Multilingual WikiTalk: Wikipedia‐based talking robots that switch languages. SIGDial.
Some References