eNTERFACE'08 Multimodal Communication with Robots and Virtual Agents
TRANSCRIPT
Overview
Context:
• Exploitation of multi-modal signals for the development of an active robot/agent listener
• Storytelling experience: speakers told the story of an animated cartoon they had just seen
1- See the cartoon
2- Tell the story to a robot or an agent
Active listening:
– During natural interaction, speakers check whether their statements have been correctly understood (or at least heard).
– Robots/agents should also have active listening skills…
• Characterization of multi-modal signals as inputs of the feedback model:
– Speech analysis: prosody, keyword recognition, pauses
– Partner analysis: face tracking, smile detection
• Robot/agent feedback (outputs):
– Lexical and non-verbal behaviors
• Dialog management:
– Feedback model: exploitation of both input and output signals
• Evaluation:
– Storytelling experiences are usually evaluated by annotation
Organization
Workpackages:
• WP1: Speech feature extraction and analysis
• WP2: Partner analysis: face tracking and analysis
• WP3: Robot and Agent Behavior Analysis
• WP4: Dialog management for feedback behaviors
• WP5: Evaluation and Annotation
• WP6: Deliverables, reports
Speech Analysis
Automatic detection of prominence during the interaction
Computational attention algorithms:
• Have more recently been tested for audio event detection:
M. Mancas, L. Couvreur, B. Gosselin, B. Macq, "Computational Attention for Event Detection", Proceedings of the ICVS Workshop on Computational Attention & Applications (WCAA-2007), Bielefeld, Germany, March 2007.
• In this project, we intend to test it for the automatic detection of salient speech events, in order to trigger avatar/robot feedback.
– Underlying hypothesis: the listener is a child with limited language knowledge, so we test the bottom-up approach, as opposed to the more language-driven top-down approach:
O. Kalinli and S. Narayanan, "A Top-Down Auditory Attention Model for Learning Task Dependent Influences on Prominence Detection in Speech", ICASSP'08, pp. 3981-3984.
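The bottom-up idea can be illustrated with a toy sketch: a purely signal-driven saliency measure that flags frames whose energy stands out against the local context, with no language knowledge involved. The function names, window sizes, and threshold below are illustrative, not the project's actual attention algorithm:

```python
# Toy bottom-up saliency sketch: a frame is "salient" when its energy
# deviates strongly from its recent context (no language model involved).
# Frame length, context size, and threshold factor are illustrative.

def frame_energies(samples, frame_len=160):
    """Short-time energy per non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def salient_frames(energies, context=5, factor=3.0):
    """Flag frames whose energy exceeds `factor` times the mean energy of
    the preceding `context` frames - a crude rarity/contrast measure."""
    flags = []
    for i, e in enumerate(energies):
        ctx = energies[max(0, i - context):i]
        baseline = sum(ctx) / len(ctx) if ctx else e
        flags.append(e > factor * baseline)
    return flags
```

A real system would of course use richer features (pitch, spectral contrast) and a proper multi-scale attention map, but the contrast-against-context principle is the same.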
Partner Analysis
Analysis of human behaviour (non-verbal interaction).
Development of a component able to detect the face and key features for feedback analysis: head shaking, smiling…
Methodology:
– Face detection: Viola & Jones face detector
– Head shaking: frequency analysis of interest points
– Smile detection: combining colorimetric and geometric approaches
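To make the head-shaking step concrete, here is a minimal sketch of how oscillation could be read off the horizontal face position delivered by a tracker (e.g. the Viola & Jones detector above). The reversal count and threshold are illustrative assumptions, not the project's actual frequency analysis:

```python
# Toy head-shake detector: a shake shows up as repeated left/right
# direction reversals of the face's x coordinate within a short window.
# The reversal threshold is illustrative, not the project's value.

def is_head_shake(x_positions, min_reversals=4):
    """Count sign changes in the frame-to-frame x displacement and
    call it a shake when enough reversals occur."""
    deltas = [b - a for a, b in zip(x_positions, x_positions[1:])]
    moves = [d for d in deltas if d != 0]          # ignore still frames
    reversals = sum(1 for a, b in zip(moves, moves[1:]) if a * b < 0)
    return reversals >= min_reversals

# Oscillating trajectory -> shake; steady drift -> no shake.
assert is_head_shake([0, 5, 0, -5, 0, 5, 0, -5, 0])
assert not is_head_shake([0, 1, 2, 3, 4, 5, 6])
```

A frequency-domain version would instead inspect the FFT of the x trajectory for a peak in the typical shake band, but the time-domain reversal count conveys the idea.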
Robot and Agent Behavior Analysis
Integration of existing tools to produce an ECA/robot able to display expressive backchannels.
The ECA architecture follows the SAIBA framework. It is composed of several modules connected to each other via representation languages: FML (Functional Markup Language) connects the 'intent planning' module to 'behavior planning', and BML (Behavior Markup Language) connects 'behavior planning' to the 'behavior realizer'. Modules communicate via Psyclone, a whiteboard architecture.
Tasks:
- define the capabilities the ECA/robot ought to have
- create BML (Behavior Markup Language) entries for the lexicon
- integrate modules that will endow the ECA with such expressive capabilities
- work out carefully the synchronization scheme between modules, in particular between the Speaker and Listener modules
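As an illustration of the lexicon task, a BML entry for a nodding backchannel might look roughly like the following; the element names follow the BML drafts, but the ids, namespace, and timing values here are purely illustrative:

```xml
<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <!-- Listener backchannel: two quick nods while gazing at the speaker -->
  <head id="nod1" lexeme="NOD" repetition="2" amount="0.4" start="0" end="1.0"/>
  <gaze id="gaze1" target="SPEAKER" start="nod1:start" end="nod1:end"/>
</bml>
```

Sync-point references such as `nod1:start` are what makes the cross-module synchronization task above non-trivial: the behavior realizer must resolve them across concurrently scheduled behaviors.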
Dialog Management
Development of a feedback model with respect to both the input signals (common to robot and agent) and the output capabilities (behaviors)
Methodology:
• Representation of input data:
– EMMA: Extensible MultiModal Annotation markup language
– Definition of a task-oriented representation
• Dialog management:
– State Chart XML (SCXML): State Machine Notation for Control Abstraction
– Interpretation of the speaker’s conversation
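A minimal sketch of the kind of event-driven state machine that SCXML expresses, written here in plain Python; the states, events, and transitions are hypothetical, not the project's actual dialog model:

```python
# Sketch of an SCXML-like feedback state machine.
# States, events, and transitions are illustrative placeholders.
TRANSITIONS = {
    ("listening", "prominence_detected"): "backchannel",
    ("listening", "long_pause"):          "prompt",
    ("backchannel", "feedback_done"):     "listening",
    ("prompt", "speaker_resumes"):        "listening",
}

class FeedbackDialog:
    def __init__(self):
        self.state = "listening"

    def handle(self, event):
        """Follow the transition for (state, event); unknown events
        leave the state unchanged, as SCXML ignores unmatched events."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

In the actual system the events would come from the speech and partner analysis modules (e.g. a detected prominence or a smile), and entering a state such as "backchannel" would emit an FML/BML request to the behavior pipeline.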
Evaluation and Annotation
Investigate the impact of the feedback provided by the robot and the virtual agent on the user.
A single model of feedback will be defined but implemented differently on the robot and the agent, since they have different communication capabilities. The system will be partly simulated (Wizard of Oz, WOZ). If time allows, a functional version of the system will be evaluated.
Tasks:
• Evaluation protocol: scenario, variables …
• System implementation: WOZ
• Data collection: recordings
• Data analysis: coding schemes, analysis of annotations, computation of evaluation metrics
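One evaluation metric commonly computed from annotation data is inter-annotator agreement, e.g. Cohen's kappa; a small sketch follows, with feedback labels that are illustrative rather than the project's actual coding scheme:

```python
# Sketch of one possible evaluation metric: Cohen's kappa for
# inter-annotator agreement on feedback annotations.
# The labels below are illustrative, not the project's coding scheme.
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Observed agreement between two annotators, corrected for the
    agreement expected by chance from each annotator's label frequencies."""
    assert len(ann_a) == len(ann_b) and ann_a
    n = len(ann_a)
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    ca, cb = Counter(ann_a), Counter(ann_b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    return (observed - expected) / (1 - expected)

# Two hypothetical annotators coding the same six feedback events:
a = ["nod", "smile", "nod", "none", "nod", "none"]
b = ["nod", "smile", "none", "none", "nod", "none"]
kappa = cohens_kappa(a, b)
```

Values near 1 indicate reliable annotation; a low kappa would suggest the coding scheme needs refinement before the evaluation metrics are computed from it.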