Mobile Multimodal Applications.
Dr. Roman Englert, Gregor Glass March 23rd, 2006
Agenda:
Motivation
Potentials through Multimodality
Use Cases Map & Sound Logo
Components & Modules for Mobile Multimodal Interaction
User Perspective
Challenges
Moore's Law (growth of technology): Technical capability will double approximately every 18 months.
Buxton's Law (growth of functionality): Technology designers promise functionality proportional to Moore's Law.
Multimodality. Motivation.
The challenge is how to deliver more functionality without breaking through the complexity barrier and making the systems so cumbersome as to be completely unusable.
God's Law (growth of human capability, the complexity barrier): Human capacity is limited and does not increase over time!
(Bill Buxton)
Multimodality – New User Interfaces. Composite Usage Scenario: Map.
Example:
The user selects a point of interest by clicking with a stylus and speaking in order to focus it:
"Zoom in here"
Multimodality – New User Interfaces. Composite Usage Scenario: Sound Logo.
Example:
The user selects a sound logo by clicking on its title with a stylus and speaking in order to hear it.
Sound logo = personalized call-connect signal
"Play this sound logo"
Multimodality – New User Interfaces. Components of a multimodal end-to-end connection.
Client (user):
Input: voice, stylus, gesture, …
Output: voice, text, graphics, video, …
User interface – types of multimodality: sequential, parallel
Server (content back-end, voice & data):
Dialog management
Synchronisation management
Media resource management (ASR/TTS)
Architecture layer: Internet / services
Multimodality – New User Interfaces. Main modules for parallel interaction.
Recognition: speech (via grammar), ink, mouse/keyboard, and system-generated input
Interpretation: one semantic interpretation per input channel
Integration: an integration processor merges the per-channel results; an interaction manager connects to the session component, the application functions, and the system and environment
EMMA is the common exchange format passed between recognition, interpretation, and integration.
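The recognition → interpretation → integration chain described above can be illustrated with a minimal sketch. The dicts below stand in for real EMMA documents, and all function names (`recognize`, `interpret`, `integrate`) are invented for this example, not taken from a concrete framework:

```python
def recognize(raw_input, mode):
    """Recognition: turn a raw signal into a mode-tagged result."""
    return {"mode": mode, "tokens": raw_input}

def interpret(recognition):
    """Interpretation: assign application-level semantics per channel."""
    if recognition["mode"] == "speech" and "zoom" in recognition["tokens"]:
        # Speech carries the action but leaves the location slot open.
        return {"mode": "speech", "action": "zoom_in", "location": None}
    if recognition["mode"] == "ink":
        # Ink carries only the location.
        return {"mode": "ink", "location": recognition["tokens"]}
    return {"mode": recognition["mode"]}

def integrate(interpretations):
    """Integration processor: merge per-mode interpretations into one
    multimodal interpretation, filling the open slots."""
    merged = {"mode": "multimodal"}
    for part in interpretations:
        for key, value in part.items():
            if key != "mode" and value is not None:
                merged[key] = value
    return merged

speech = interpret(recognize(["zoom", "in", "here"], "speech"))
ink = interpret(recognize({"x": 17, "y": 54}, "ink"))
result = integrate([speech, ink])
# result: {"mode": "multimodal", "action": "zoom_in",
#          "location": {"x": 17, "y": 54}}
```

The key design point is that integration is a separate stage: neither recognizer needs to know about the other channel, and only the integration processor resolves the open location slot.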
Multimodality – New User Interfaces. User Perspective:
Feedback in a nutshell from diverse previous innovation projects: "Give us speech control"
Composite interaction with a full prototype implementation for customer self-service: two campaigns (SMS & personalized call-connect signal)
The possibilities & advantages of the new multimodal interaction paradigm need to be actively communicated to users.
Real appreciation of speech control & good acceptance of the "push-to-talk" mode
Expectation: symmetry & consistency between the interaction modes
BUT: How do users really want to speak to the machine? How to provide feedback? How to correct input errors?
Great for context-dependent service interaction, BUT: Which mode is most suitable for which task? For whom? Under which circumstances?
Multimodality – New User Interfaces. Challenges:
Sequential vs. parallel I/O
Unique interpretation of multimodal hypotheses
Discourse phenomena such as anaphora resolution and generation
Input correction loops
Encapsulation of I/O tools to achieve a generic front end
Model-driven architecture
Thank you for your attention!
Multimodality – New User Interfaces. Sequential and Parallel Input.

Sequential input:
Multimodal applications may allow the user to choose between different input modalities, e.g. to speak or to click on a button.
Only one input channel is interpreted at a time, i.e. the user may speak or click on a button.
Multiple input channels are interpreted sequentially, as defined by the application.
Example: select a field and then speak "My number is …"; then click only on a button; afterwards, navigate "Back to main menu".

Parallel input:
Also known as composite input.
Multimodal applications allow the user to use multiple input modes at nearly the same time, e.g. the user may speak and tap on the screen.
The multimodal application combines the multiple inputs and interprets them together.
Example: the user navigates in a map and speaks "zoom in here".
Parallel input needs additional platform or application capabilities in order to combine (integrate) and interpret multiple inputs.
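One way such a platform capability can be sketched is a time window: inputs whose timestamps fall close together are treated as one composite event, while a lone input is handled sequentially. The 250 ms window and all names below are illustrative assumptions, not values from the original system:

```python
COMPOSITE_WINDOW = 0.25  # seconds; assumed grouping window

def combine(inputs):
    """Group time-stamped inputs given as (timestamp, mode, payload)
    tuples: inputs within the window form one composite event,
    lone inputs remain sequential events of their own."""
    events, group = [], []
    for stamp, mode, payload in sorted(inputs, key=lambda t: t[0]):
        # A gap larger than the window closes the current group.
        if group and stamp - group[-1][0] > COMPOSITE_WINDOW:
            events.append(group)
            group = []
        group.append((stamp, mode, payload))
    if group:
        events.append(group)
    return events

inputs = [
    (0.00, "speech", "zoom in here"),
    (0.10, "ink", (17, 54)),    # tap while speaking -> composite
    (2.00, "stylus", "back"),   # later click -> sequential
]
events = combine(inputs)
# events[0] holds the composite speech+ink pair, events[1] the lone click
```

Real platforms use richer criteria than a fixed window (e.g. overlapping input intervals or dialog state), but the grouping step itself always precedes integration.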
Multimodality – New User Interfaces. Example: composite input for voice and stylus.
The user speaks and clicks on the screen:
"Zoom in <here>."
[Diagram: recognition (speech via grammar, ink) → interpretation (semantic interpretation) → integration (integration processor, interaction manager), with EMMA passed between the stages]
Multimodality – New User Interfaces. Example: composite input for voice and stylus.
Semantic interpretation of the speech input:
action = zoom in
location = x, y from the stylus

Interpretation (EMMA):
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="speech">
    <action>zoom_in</action>
    <location emma:hook="ink"/>
  </emma:interpretation>
</emma:emma>
Multimodality – New User Interfaces. Example: composite input for voice and stylus.
The user clicks on the map while speaking:
x = 17
y = 54
Multimodality – New User Interfaces. Example: composite input for voice and stylus.
Interpretation (EMMA) of the ink input:
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="ink">
    <location>
      <type>point</type>
      <x>17</x>
      <y>54</y>
    </location>
  </emma:interpretation>
</emma:emma>
Multimodality – New User Interfaces. Example: composite input for voice and stylus.
Integration result (EMMA):
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="multimodal">
    <action>zoom_in</action>
    <location>
      <type>point</type>
      <x>17</x>
      <y>54</y>
    </location>
  </emma:interpretation>
</emma:emma>
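The integration step for these two EMMA documents can be sketched with the Python standard library's `xml.etree`. This is a minimal illustration, not the original implementation: it fills the speech interpretation's open `<location emma:hook="ink"/>` slot with the ink result and retags the merged interpretation as `emma:mode="multimodal"`:

```python
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"

speech_doc = """<emma:emma version="1.0" xmlns:emma="{0}">
  <emma:interpretation id="int1" emma:mode="speech">
    <action>zoom_in</action>
    <location emma:hook="ink"/>
  </emma:interpretation>
</emma:emma>""".format(EMMA)

ink_doc = """<emma:emma version="1.0" xmlns:emma="{0}">
  <emma:interpretation id="int1" emma:mode="ink">
    <location><type>point</type><x>17</x><y>54</y></location>
  </emma:interpretation>
</emma:emma>""".format(EMMA)

def integrate(speech_xml, ink_xml):
    speech = ET.fromstring(speech_xml)
    ink = ET.fromstring(ink_xml)
    interp = speech.find("{%s}interpretation" % EMMA)
    # Locate the placeholder slot marked with emma:hook="ink".
    hook = next(loc for loc in interp.findall("location")
                if loc.get("{%s}hook" % EMMA) == "ink")
    filled = ink.find("{%s}interpretation/location" % EMMA)
    interp.remove(hook)      # drop the placeholder
    interp.append(filled)    # splice in the ink point
    interp.set("{%s}mode" % EMMA, "multimodal")
    return speech

merged = integrate(speech_doc, ink_doc)
point = merged.find("{%s}interpretation/location" % EMMA)
# point now carries type=point, x=17, y=54 alongside the speech action
```

The `emma:hook` attribute comes from the slides' own speech interpretation; the merge strategy (replace the hooked element wholesale) is one plausible reading of the integration result shown above.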
Multimodality – New User Interfaces. Methods and functionalities: interaction manager.
Interaction manager (application-specific tasks):
Checking the input data: integrated input? speech only? ink/stylus only?
Checking the suitability of the integration results: is the input data compatible? (e.g. is the actual number of stylus inputs, say two, the same as the expected value?)
Mapping recognition results from different modalities, e.g.:
speech recognition error but stylus correct
speech recognition OK but stylus incorrect
speech confidence OK and stylus OK
Deciding on the error-handling output: graphical, audio, prompt, TTS
Handling redundant information and creating the related user reaction; prioritisation of input modalities
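The error-handling decision the interaction manager makes from these per-modality outcomes can be sketched as a small dispatch function. The confidence threshold, the tap counts, and all names below are invented for illustration; the original system's rules are not specified in the slides:

```python
def decide(speech_confidence, expected_taps, actual_taps):
    """Map recognition outcomes to a feedback channel."""
    if actual_taps != expected_taps:
        return "graphical"   # stylus input wrong: highlight on screen
    if speech_confidence < 0.5:
        return "prompt"      # speech unreliable: re-prompt the user
    return "tts"             # both OK: confirm via TTS

channel = decide(speech_confidence=0.9, expected_taps=2, actual_taps=2)
# channel == "tts": both modalities were acceptable
```

This mirrors the three cases listed above (stylus error, speech error, both OK), each mapped to one of the output channels the slide names.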