Speech User Interfaces
Outline
- Motivation for speech UIs
- Speech recognition
- UI problems with speech UIs
- SpeechActs: guidelines for speech UIs
- Speech UI design tools
- Multimodal UIs
Motivation for Speech UIs: Pervasive Information Access
[Figure: many devices connected to information & services (I-Land vision by Streitz et al.)]
UIs in the Pervasive Computing Era
- Future computing devices won't have the same UI as current PCs
  - wide range of devices
  - small or embedded in the environment
  - often w/ "alternative" I/O & w/o screens
  - information appliances
[Figure: I-Land vision by Streitz et al.]
Information Access via Speech
Example spoken request: "Read my important email."
Industry Leaders
- Speech: Nuance Corporation
  - applications: TellMe, …
  - users: government, …
- Computers: Microsoft, IBM, …
Speech UI Motivation
- Smaller devices -> difficult I/O
- People can talk at ~90 wpm -> high-speed input
- "Virtually unlimited" set of commands
- Frees the hands & eyes for other tasks
  - imagine you are working on your car & need to know something from the manual
- Natural: speech was evolutionarily selected for
  - reading, writing, & typing were not (too new)
Why are Speech UIs Hard to Get Right?
- Speech recognition is far from perfect
  - imagine issuing commands w/ the mouse & getting the wrong result 5-20% of the time
- Speech UIs have no visible state
  - you can't see what you have done before or what effect your commands have had
- Speech UIs are hard to learn
  - how do you explore the interface? how do you find out what you can say?
Speech UIs Require
- Speech recognition: the computer understanding what the customer is saying
- Speech production (or synthesis): the computer talking to the customer
Speech Recognition
- Continuous vs. non-continuous
- Speaker-independent vs. speaker-dependent
- Speech is often misunderstood even by people
  - people repair w/ feedback via speech, facial expressions, & gesture
- Recognizers trained w/ real samples
  - often get gender-based problems
- Based on probabilities (HMMs, Bayes)
  - trigrams of sounds or words (see the sketch below)
- Several popular recognizers
  - Nuance, SpeechWorks, IBM ViaVoice
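To make the trigram idea concrete, here is a minimal sketch (the toy corpus, vocabulary size, and hypotheses are invented for illustration); real recognizers combine language-model scores like these with HMM acoustic probabilities.

```python
from collections import Counter

# Toy corpus and counts; illustrative only.
corpus = "read my important email please read my important mail".split()
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def trigram_prob(w1, w2, w3, vocab_size=1000):
    # Add-one smoothing so unseen trigrams keep a small nonzero probability.
    return (trigrams[(w1, w2, w3)] + 1) / (bigrams[(w1, w2)] + vocab_size)

def score(words):
    # Probability of a hypothesis = product of its word-trigram probabilities.
    p = 1.0
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        p *= trigram_prob(w1, w2, w3)
    return p

# The model prefers the hypothesis whose trigrams it has actually seen.
print(score("read my important email".split()))   # higher
print(score("reed my important email".split()))   # lower
```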
Speech Production
- Three frequency regions of high intensity (formants), visible on a spectrogram, come from the larynx, throat, & mouth
- Two formants are enough for recognition, but the result sounds "tinny" (see the sketch below)
- Can generate emotional affect in synthesized speech
- Demo: anger, disgust, gladness, sadness, fear, & surprise
  http://cahn.www.media.mit.edu/people/cahn/emot-speech.html
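A minimal formant-synthesis sketch, assuming NumPy and SciPy are available: a pulse train standing in for the larynx is passed through one resonant filter per formant. The formant frequencies and bandwidths below roughly approximate a vowel and are illustrative, not values from the slides.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                                        # sample rate (Hz)
f0 = 120                                          # pitch of the pulse train (Hz)
formants = [(700, 130), (1220, 70), (2600, 160)]  # (center Hz, bandwidth Hz)

# Impulse train standing in for the glottal pulses from the larynx.
n = fs // 2                                       # half a second of audio
source = np.zeros(n)
source[::fs // f0] = 1.0

signal = source
for f, bw in formants:
    # Two-pole resonator with poles at r * e^(+/- j*theta).
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * f / fs
    signal = lfilter([1.0 - r], [1.0, -2 * r * np.cos(theta), r * r], signal)

# Filtering with only the first two resonators still yields a recognizable
# vowel, but it sounds "tinny", as noted above.
```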
Recognition Problems
- Good recognition
  - humans: <1% error rate on dictation
  - top recognition systems get <1-X% error rates
  - computers don't use much context
  - the key is to be application-specific for lower error rates
- Background noise
  - even worse recognition rates (20-40% error)
- Speed
  - getting better as hardware gets faster
  - in 10 years, gone from requiring 5 high-end workstations to speech systems running on laptops or even PDAs
More Recognition Problems
- Isolated, short words are difficult
  - common words tend to be short
- Segmentation
  - "silly" versus "sill lea"
- Spelling
  - "mail" vs. "male" -> need to understand the language
Speech UI Problems
Speech UI no-nos:
- modes (no feedback): certain commands only work when in specific states
- deep hierarchies (aka voice-mail hell)
- verbose feedback: wastes time & patience
  - only confirm consequential things (see the sketch after this list)
  - use meaningful, short cues
- no interruption: half-duplex communication (i.e., no barge-in support)
- too much speech on the part of the customer: tiring
- speech takes up space in working memory
  - can cause problems when problem solving
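A sketch of the "only confirm consequential things" rule: explicit confirmation for destructive actions, a short implicit cue folded into the next prompt otherwise. The action names and wording are hypothetical.

```python
# Actions worth an explicit yes/no; everything else gets a short cue.
CONSEQUENTIAL = {"delete message", "send message", "empty trash"}

def respond(action):
    if action in CONSEQUENTIAL:
        # Explicit confirmation: the dialogue pauses for a yes/no.
        return f"Are you sure you want to {action}?"
    # Implicit confirmation: a meaningful, short cue inside the next prompt.
    return f"OK, {action}. What next?"

print(respond("read message"))    # OK, read message. What next?
print(respond("delete message"))  # Are you sure you want to delete message?
```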
SpeechActs: Guidelines for Speech UIs
- Speech interface to computer tools: email, calendar, weather, stock quotes
- Establish common ground & shared context
  - make sure people know where they are in the conversation
- Pacing
  - recognition delays are unnatural; make it clear when one occurs
  - barge-in lets the user interrupt, as in real conversations
- Tapering of prompts
- Progressive assistance: short error messages at first, longer ones when the user needs more help (both sketched below)
- Implicit confirmation: fold the confirmation into the next prompt
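A minimal sketch of progressive assistance and tapering prompts; the wording and level structure are invented, not taken from SpeechActs itself.

```python
# Progressive assistance: error messages grow more helpful with each
# consecutive recognition failure.
HELP = [
    "Sorry?",
    "Sorry, I didn't catch that. Try 'read', 'next', or 'delete'.",
    "You can say 'read message', 'next message', 'delete message', "
    "or 'help' for the full list of commands.",
]

def error_prompt(consecutive_failures):
    return HELP[min(consecutive_failures, len(HELP)) - 1]

# Tapering: full wording the first time, a terse cue once it's familiar.
def tapered_prompt(times_heard, full, short):
    return full if times_heard == 0 else short

print(error_prompt(1))  # "Sorry?"
print(error_prompt(3))  # full command list
print(tapered_prompt(2, "What would you like to do with your mail?", "Mail?"))
```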
One Vision of Future User Interfaces
- Star Trek-style UI: verbally ask the computer for information
  - may be common in mobile / hands-busy situations
- Problem: hard to design, build, & use!
  - requires perfect speech recognition & language understanding
Our Vision of Future User Interfaces
Multimodal, context-aware UIs
- multimodal
  - uses multiple input modalities (speech & gesture) to disambiguate
  - e.g., the user says "move it to this screen" while pointing (see the sketch below)
- context-aware
  - apps can be aware of location, user, what they are doing, …
  - e.g., people nearby are talking -> don't rely on speech I/O
- Problem: how to prototype & test new ideas? Informal UI design tools!
  - combine Wizard of Oz & informal storyboarding
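A sketch of the disambiguation step: a deictic phrase in the recognized utterance is bound to the pointing gesture nearest in time. The event structure and time window are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    target: str   # e.g. a screen id the user pointed at
    time: float   # seconds

def fuse(utterance, gestures, speech_time, window=1.5):
    """Bind 'this screen' to the gesture closest in time, within `window` s."""
    if "this screen" not in utterance:
        return {"command": utterance}
    nearby = [g for g in gestures if abs(g.time - speech_time) <= window]
    if not nearby:
        return {"command": utterance, "target": None}  # still ambiguous
    g = min(nearby, key=lambda g: abs(g.time - speech_time))
    return {"command": utterance, "target": g.target}

gestures = [Gesture(target="screen-2", time=10.2)]
print(fuse("move it to this screen", gestures, speech_time=10.0))
# {'command': 'move it to this screen', 'target': 'screen-2'}
```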
Multimodal Error Correction
- A dictation error-correction study found users are better at correcting recognition errors w/ a different input modality
  - if the recognizer got it wrong the first time, it will likely get it wrong the second time
  - hyperarticulating makes recognition worse
- Correct dictation errors w/ vocal spelling, writing, typing, etc. (see the sketch below)
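A sketch of why switching modality helps, assuming the recognizer exposes a ranked n-best list (the lists and words here are invented): re-speaking tends to reproduce the same top hypothesis, while spelled letters constrain the candidates directly.

```python
def correct_by_respeaking(nbest, rejected):
    # Re-speaking draws from nearly the same ranked hypotheses,
    # so the same wrong word often comes back.
    for word in nbest:
        if word not in rejected:
            return word
    return None

def correct_by_spelling(nbest, letters):
    # A different modality: spelled letters filter the candidates.
    prefix = "".join(letters)
    matches = [w for w in nbest if w.startswith(prefix)]
    return matches[0] if matches else prefix

nbest = ["male", "mail", "mall"]
print(correct_by_respeaking(nbest, rejected={"male"}))  # next-best hypothesis
print(correct_by_spelling(nbest, ["m", "a", "i"]))      # "mail"
```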
Summary
Speech UIs
- may permit more natural computer access
- allow us to use computers in more situations
- are hard to get to work well
  - no visible state, tax on working memory, recognition problems, etc.
UI tools are needed for speech UI design
Multimodal UIs address some of the problems w/ pure speech UIs
- help disambiguate
- help w/ correction