Dialog Design
Speech/Natural Language
Pen & Gesture
Dialog Styles
1. Command languages
2. WIMP - Window, Icon, Menu, Pointer
3. Direct manipulation
4. Speech/Natural language
5. Gesture, pen
Natural input
Universal design
Take advantage of familiarity, existing knowledge
Alternative input & output
Multi-modal interfaces
Getting “off the desktop”
Agenda
Speech
Natural Language
Other audio
PDAs & Pen input styles
Gesture
When to Use Speech
Hands busy
Mobility required
Eyes occupied
Conditions preclude use of keyboard
Visual impairment
Physical limitation
Waveform & Spectrogram
Speech does not equal written language
Parsing Sentences
"I told him to go back where he came from, but he wouldn't listen."
Speech Input
Speaker recognition
Speech recognition
Natural language understanding
Speaker Recognition
Tell which person it is (voice print)
Could also be important for monitoring meetings, determining speaker
Speech Recognition
Primarily identifying words
Improving all the time
Commercial systems: IBM ViaVoice, Dragon Dictate, ...
Recognition Dimensions
Speaker dependent/independent
• Parametric patterns are sensitive to speaker
• With training (dependent) can get better
Vocabulary
• Some have 50,000+ words
Isolated word vs. continuous speech
• Continuous: where words stop & begin
• Typically a pattern match, no context used
• “Did you” vs. “Didja”
Recognition Systems
Typical system has 5 components:
Speech capture device - Has analog -> digital converter
Digital Signal Processor - Gets word boundaries, scales, filters, cuts out extra stuff
Preprocessed signal storage - Processed speech buffered for recognition algorithm
Reference speech patterns - Stored templates or generative speech models for comparisons
Pattern matching algorithm - Goodness of fit from templates/model to user’s speech
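The pattern-matching component can be illustrated with a minimal sketch. It assumes the preprocessed signal and each stored template are short sequences of one-dimensional feature values, and it uses dynamic time warping (DTW) as the goodness-of-fit measure. The slides do not name a specific algorithm, so DTW, the function names, and the template values are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def recognize(signal, templates):
    """Return the label of the stored template that best fits the signal."""
    return min(templates, key=lambda word: dtw_distance(signal, templates[word]))


# Hypothetical reference patterns (made-up feature values, not real speech data)
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 1.0],
}

# A slower utterance of "yes" still matches, because DTW tolerates timing differences
print(recognize([1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0], TEMPLATES))
```

A speaker-dependent system would build such templates from the user's own training utterances. A real recognizer works on multi-dimensional spectral features, not single numbers.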
Errors
Systems make four types of errors:
Substitution - one for another
Rejection - detected, but not recognized
Insertion - added
Deletion - not detected
Problems with recovery
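As a rough illustration of how three of these error types can be counted, the sketch below aligns what was said with what was recognized using word-level edit distance. This is one common way to score a recognizer, not something the slides prescribe, and the function name is hypothetical. Rejection errors cannot be seen from the two transcripts alone, so they are not counted here.

```python
def count_errors(reference, hypothesis):
    """Align two transcripts word by word and count substitutions, insertions, deletions."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + mismatch,  # match or substitution
                dist[i - 1][j] + 1,             # deletion (word not detected)
                dist[i][j - 1] + 1,             # insertion (word added)
            )
    # Walk back through the table to label each edit
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        mismatch = 0 if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            counts["substitution"] += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return counts


print(count_errors("the way to manage things", "the way that managed things"))
```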
Recognition Example:
Spoken: My flight experience uh, I started off as a private pilot and then I spent 20 years flying as a navigator on C130s in the air force and I currently fly as a first officer on the SAAB 340 turboprop.
Recognized: my flight experience a and I started off as a private pilot and I spent 20 years flying is a navigator on C-130s in the Air Force and I currently fly as a first officer on the sound 340 turboprop.
And another example
Spoken: Here here's one of the thoughts I'd throw in on that. I'd agree with Howard that the combination of checklists and flows is the way way to manage things best.
Recognized: hair and here's one of the outside troodon and an eye to agree with power that the domination checklists and flows is the way that managed things best
Natural Language Understanding
Putting meaning to the words
Input might be spoken or typed
Holy grail of Artificial Intelligence problems
Eliza - a “Faked” Understanding
Person : Men are all alike.
ELIZA : In what way?
Person : They're always bugging us about something or other.
ELIZA : Can you think of a specific example?
Person : Well, my boyfriend made me come here.
ELIZA : Your boyfriend made you come here?
Person : He says I'm depressed much of the time.
ELIZA : I am sorry to hear that you are depressed.
Person : It's true. I am unhappy.
ELIZA : Do you think coming here will help you not to be unhappy?
Person : I need some help, that much seems certain.
ELIZA : What would it mean to you if you got some help?
Person : Perhaps I could learn to get along with my mother.
ELIZA : Tell me more about your family
http://www-ai.ijs.si/eliza/eliza.html
Weizenbaum, J., "ELIZA -- A computer program for the study of natural language communication between man and machine", Communications of the ACM 9(1):36-45, 1966
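The “faked” understanding in the transcript above comes from matching surface patterns and echoing parts of the input back. A minimal sketch of that idea follows. The rules here are hypothetical and written only to reproduce a few replies from the transcript; they are not Weizenbaum's original script.

```python
import re

# Hypothetical rules: (pattern, reply template). Captured text is echoed back.
RULES = [
    (re.compile(r"my (.+) made me (.+)", re.I), "Your {0} made you {1}?"),
    (re.compile(r"i am (.+)", re.I), "I am sorry to hear that you are {0}."),
    (re.compile(r"\b(mother|father|family)\b", re.I), "Tell me more about your family."),
]
DEFAULT_REPLY = "In what way?"


def respond(utterance):
    """Return a canned reply for the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    # No rule matched: fall back to a content-free prompt
    return DEFAULT_REPLY


print(respond("Well, my boyfriend made me come here."))
```

The program has no model of boyfriends or families. It only rearranges the user's words, which is why this counts as faked understanding.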
Natural Language Domains
Conceptual: total set of objects/actions
• Knows about managers and salaries
Functional: what can be expressed
• What is the salary of Joe’s manager? Who is Mary’s manager?
Syntactic: variety of forms
• What is the salary of the manager of Joe?
Lexical: word meanings, synonyms, vocabulary
• What were the earnings of Joe’s boss?
NL Factors/Terms
Syntactic: Grammar or structure
Prosodic: Inflection, stress, pitch, timing
Pragmatic: Situated context of utterance, location, time
Semantic: Meaning of words
SR/NLU Advantages
Easy to learn and remember
Powerful
Fast, efficient (not always)
Little screen real estate
SR/NLU Disadvantages
Assumes domain knowledge
Doesn’t work well enough yet
• Requires confirmation
• And recognition will always be error-prone
Expensive to implement
Unrealistic expectations
Generate mistrust/anger
Speech Output
Tradeoffs in speed, naturalness and understandability
Male or female voice?
• Technical issues (freq. response of phone)
• User preference (depends on the application)
Rate of speech
• Technically up to 550 wpm!
• Depends on listener
Synthesized or Pre-recorded?
• Synthesized: Better coverage, flexibility
• Recorded: Better quality, acceptance
Speech Output
Synthesis
• Quality depends on software ($$)
• Influence of vocabulary and phrase choices
• http://www.research.att.com/projects/tts/demo.html
Recorded segments
• Store tones, then put them together
• The transitions are difficult (e.g., numbers)
Designing the Interaction
Constrain vocabulary
• Limit valid commands
• Structure questions wisely (Yes/No)
• Manage the interaction
• Examples?
Slow speech rate, but concise phrases
Design for failsafe error recovery
Visual record of input/output
Design for the user – Wizard of Oz
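A small sketch of what constraining the vocabulary and structuring questions as Yes/No can look like in code. It assumes the recognizer hands back a text string. The command list, function names, and cutoff value are hypothetical, and fuzzy string matching with Python's difflib stands in for whatever scoring a real recognizer would provide.

```python
import difflib

# Hypothetical set of valid commands; anything else is rejected
COMMANDS = ["open", "save", "delete", "print"]


def interpret(heard, commands=COMMANDS, cutoff=0.7):
    """Map recognizer output onto the closest valid command, or None if nothing is close."""
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def confirmation_prompt(command):
    """Ask a Yes/No question instead of an open-ended one."""
    return f"Did you say '{command}'? Please answer yes or no."


def parse_yes_no(answer):
    """Accept only a tiny vocabulary; return None so the caller can re-prompt."""
    answer = answer.strip().lower()
    if answer in {"yes", "yeah", "yep"}:
        return True
    if answer in {"no", "nope"}:
        return False
    return None


command = interpret("delet")
if command is not None:
    print(confirmation_prompt(command))
```

Returning None on an unrecognized answer, instead of guessing, leaves room for failsafe error recovery: the system can repeat the question or fall back to another input mode.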
Speech Tools/Toolkits
Java Speech SDK
• FreeTTS 1.1.1: http://freetts.sourceforge.net/docs/index.php
IBM JavaBeans for speech
Microsoft speech SDK (Visual Basic, etc.)
OS capabilities (speech recognition and synthesis built in to OS) (TextEdit)
VoiceXML
General Issues in Choosing Dialogue Style
Who is in control - user or computer
Initial training required
Learning time to become proficient
Speed of use
Generality/flexibility/power
Special skills - typing
Gulf of evaluation / gulf of execution
Screen space required
Computational resources required
Non-speech audio
Traditionally used for warnings, alarms or status information
Sounds provide information that helps reduce errors, e.g. typing, video games
Multi-modal interfaces
Additional benefits of non-speech audio
Good for indicating changes, since we ignore continuous sounds
Provides secondary representation
Supports visual interface
Tradeoff in using natural (real) sounds vs. synthesized noises.
Non-speech audio examples
Error ding
Info beep
Email arriving ding
Recycle
Battery critical
Logoff
Logon
Others?
Pen and Gesture
PDAs
Becoming more common and widely used
Smaller display (160x160), (320x240)
Few buttons, interact through pen
Estimate: 14 million shipped by 2004
Improvements
• Wireless, color, more memory, better CPU, better OS
Palmtop versus Handheld
No Shredder…
http://sonyelectronics.sonystyle.com/micros/clie/
http://www.vaio.sony.co.jp/Products/VGN-U50/
http://www.oqo.com/
http://www.blackberry.com/
Input
Pen is dominant form for PDA and Tablet
Main techniques
• Free-form ink
• Soft keyboard
• Numeric keyboard => text
• Stroke recognition - strokes not in the shape of characters
• Hand printing / writing recognition
Sometimes can connect keyboard or has modified keyboard (e.g. cell phones)
Soft Keyboards
Common on PDAs and mobile devices
Tap on buttons on screen
Soft Keyboard
Presents a small diagram of keyboard
You click on buttons/keys with pen
QWERTY vs. alphabetical
• Tradeoffs?
• Alternatives?
Numeric Keypad - T9
Developed by Tegic Communications
You press out the letters of your word, it matches the most likely word, then gives optional choices
Faster than multiple presses per key
Used in mobile phones
http://www.t9.com/
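The one-press-per-letter idea can be sketched as a dictionary lookup. This is only an illustration of the general technique, not Tegic's actual implementation, and the word list and frequency counts are made up.

```python
# Standard phone keypad letter groups
KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYS.items() for letter in letters}

# Hypothetical dictionary with made-up usage counts
WORD_FREQ = {"good": 50, "home": 40, "gone": 20, "hood": 5, "cat": 30, "act": 25, "bat": 10}


def to_digits(word):
    """Key sequence produced by pressing each letter's key exactly once."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def candidates(digits, word_freq=WORD_FREQ):
    """Words matching the key sequence, most likely first."""
    matches = [word for word in word_freq if to_digits(word) == digits]
    return sorted(matches, key=lambda word: -word_freq[word])


# The first entry is offered as the default; the rest are the optional choices
print(candidates("4663"))
```

Typing "home" takes four key presses here, versus eight with multiple presses per key (4-4, 6-6-6, 6, 3-3). The cost is that the user must check the guess and sometimes step through the alternatives.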
Cirrin - Stroke Recognition
Developed by Jen Mankoff (GT -> Berkeley CS Faculty -> CMU CS Faculty)
Word-level unistroke technique
UIST ‘98 paper
Use stylus to go from one letter to the next
Quikwriting - Stroke Recognition
Developed by Ken Perlin
Quikwriting Example
http://mrl.nyu.edu/~perlin/demos/Quikwrite2_0.html
Said to be as fast as Graffiti, but there is more to learn
Hand Printing / Writing Recognition
Recognizing letters and numbers and special symbols
Lots of systems (commercial too)
English, kanji, etc.
Not perfect, but people aren’t either!
• People - 96% handprinted single characters
• Computer - >97% is really good
OCR (Optical Character Recognition)
OCR (Optical Character Recognition)
Recognition Issues
Off-line vs. On-line
Off-line: After all writing is done, speed not an issue, only quality.
• Work with either a bit map or vector sequence
On-line: Must respond in real-time - but have richer set of features - acceleration, velocity, pressure
More Issues
Boxed vs. Free-Form input
• Sometimes encounter boxes on forms
Printed vs. Cursive
• Cursive is much more difficult to impossible
Letters vs. Words
• Cursive is easier to do in words vs. individual letters, as words create more context
More Issues
Using context & words can help
• Usually requires existence of a dictionary
• Check to see if word exists
• Consider 1 vs. I vs. l
Training - Many systems improve a lot with training data
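The dictionary check for look-alike characters such as 1, I and l can be sketched as follows. The sketch assumes the character recognizer returns, for each position, a short list of possible characters ordered best guess first. The function name and the tiny dictionary are hypothetical.

```python
from itertools import product

# Hypothetical dictionary; a real system would load a full word list
DICTIONARY = {"like", "lie", "ill", "in"}


def best_word(char_candidates, dictionary=DICTIONARY):
    """Pick the first combination of per-position candidates that forms a known word."""
    for letters in product(*char_candidates):
        word = "".join(letters)
        if word.lower() in dictionary:
            return word
    # Nothing in the dictionary: fall back to the recognizer's top guess per position
    return "".join(options[0] for options in char_candidates)


# The first stroke could be the digit 1, a lowercase l, or a capital I
print(best_word([["1", "l", "I"], ["i"], ["k"], ["e"]]))
```

Word context is what resolves the ambiguity. The same stroke inside a phone number should stay a 1, so a real system would also need to know what kind of field is being filled in.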
Special Alphabets
Graffiti - Unistroke alphabet on Palm PDA
• What are your experiences with Graffiti?
Other alphabets or purposes
• Gestures for commands
Pen Gesture Commands
-Might mean delete
-Insert
-Paragraph
Define a series of (hopefully) simple drawing gestures that mean different commands in a system
Pen Use Modes
Often, want a mix of free-form drawing and special commands
How does user switch modes?
• Mode icon on screen
• Button on pen
• Button on device
Error Correction
Having to correct errors can slow input tremendously
Strategies
• Erase and try again (repetition)
• When uncertain, system shows list of best guesses (n-best list)
• Others??
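The n-best strategy can be sketched in a few lines, assuming the recognizer attaches a confidence score to each hypothesis. The scores, the margin threshold, and the function names below are made up for illustration.

```python
def n_best(scores, n=3):
    """Return the n highest-scoring hypotheses, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def needs_choice(scores, margin=0.15):
    """True when the top two hypotheses are too close to accept the best one silently."""
    top = sorted(scores.values(), reverse=True)
    return len(top) > 1 and top[0] - top[1] < margin


# Hypothetical recognizer output for one handwritten word
scores = {"clear": 0.41, "dear": 0.38, "cleat": 0.12, "deer": 0.09}
if needs_choice(scores):
    print("Did you mean:", n_best(scores))
```

Picking from a short list is usually quicker than erasing and rewriting, which matters because correction time can dominate overall input speed.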
More ink applications
Signature verification
Notetaking
Electronic whiteboards and large-scale displays
Sketching
Free-form Ink
Ink is the data, take as is
Human is responsible for understanding and interpretation
Like a sketch pad
Often time-stamped
Audio Notebook
Stifelman, MIT
affordances of paper
notetaking activity
indexing
scanning
Meeting Support
Natural Input
Indexing
Tivoli Domain
objects
eClass
Ubiquitous Access
Flatland
Support individual whiteboard use
Use study persistence segments Informal
Mynatt, E.D., Igarashi, T., Edwards, W.K., and LaMarca, A. (1999). "Flatland: New dimensions in office whiteboards." In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1999; Pittsburgh, Pennsylvania). New York: ACM Press, pp. 346-353.
Example
DENIM – Landay, Berkeley
Gesture Recognition
Tracking 3D hand-arm gestures
• fiber optic, e.g. dataglove
• magnetic tracker, e.g. Polhemus
Perceptual user interfaces
• emerging area
• mainly computer vision researchers
Other interesting interactions
3D interaction
• Stereoscopic displays
Virtual reality
• Immersive displays such as glasses, caves
Augmented reality
• Head trackers and vision based tracking