designing for voice interactions (uxaustralia)

source: http://www.flickr.com/photos/altemark/304079314

Designing for Voice Interactions

UX Australia

Designing for Mobility

Melbourne, March 1 2013

Jonny Schneider

Lead Consultant

Mobile Experience Design & Strategy


When you think of voice recognition, you probably think of...

‘Understanding Moira’, AAMI TV Commercial, http://www.youtube.com/watch?v=EY_jL38HMy8

inaccurate

too slow

never works

it’s a gimmick

too tedious for me

“I won’t use it until it’s faster and more accurate than typing”

it can’t handle my accent

A lot of those things might be true, but this is default thinking, likely based on many bad experiences. However, there are two sides to every story.


https://twitter.com/bennyg/status/167192535305945088




http://www.flickr.com/photos/av_hire_london/5579125851

IDEA: Experience first-hand what it's like to interact with digital devices using predominantly your voice.

METHOD: A group of colleagues committed to using voice wherever possible, for an entire day.

Day of Voice

Let’s take a more objective look at what it’s like to use voice in our everyday interactions. Today.



✦ Controlling the device is tedious

✦ I’m sorry, I can’t do that for you

✦ Comprehension/recognition

✦ Expression

✦ Privacy

✦ Loss of context/paradigm

Day of Voice: what didn’t work

Control: “Dictation itself was fine, but getting to where notes are taken [was] very tedious.” “I couldn’t navigate to where I needed to be. It heard the command correctly, but didn’t know what to do with it.”

Limitations: Generally, it’s not pervasive enough to be relied upon

“I can’t...”
- “play games with voice”!
- attach to email
- dictate an email address ("schneider dot jonny at gmail com")
- edit an address

Recognition. i.e. Pam’s clips.

Expression. Exclamation marks, commas, full stops, slang etc. are possible, but not natural. As a result, “I found that everything tends to run together”.

Privacy. “On several occasions, I found myself wandering off to a small room or closet so that others couldn’t hear what I was talking about.”

Loss of context. Chat client. Using voice means I have to break-out of the normal short-messaging paradigm that I’m used to. It changes to asynchronous audible communication. Without those visual cues, I’m not sure where I’m up to, or what I want to say next.

A lot of this could just be that we’re not used to it.

✦ Google search with auto-suggest

✦ Dictation

✦ Accessibility*

✦ Control by command (XBox Kinect; Dragon for desktop)

Day of Voice: what worked

Examples of some useful and surprising experiences with voice

Google search. “brilliant for rarely used words like 'oesophagus' or 'onomatopoeia', and much faster than guessing letters and typing.”

Dictation. “Recording of notes is easy and I've done it on a number of occasions as I'd much prefer to talk than to type.” Can make light of a tedious task of typing on a mobile device. Even at 80% accuracy, this is way faster than typing, for longer messages.

Accessibility. Blind person using Instagram [video]

‘How Blind People Use Instagram’, Tommy Edison, 2012. http://bit.ly/YBmBmb


http://www.google.com/nexus/4/

✦ On-board hardware (microphone and speaker)

✦ hands busy + eyes busy context of use

✦ Personal and ‘always with you’ nature of device suits idea of ‘virtual assistants’

Why is this so relevant for mobile?



Data: http://isc.org; http://amta.org.au; http://wikipedia.org and various websites

[Timeline axis: 1983 to 2011, in two-year steps. Network generations: AMPS (analogue); GSM (2G), WAP/WML/i-mode; 3G, UMTS, NextG]

The beginnings of speech recognition technology predate mobile telephony. It goes back to the 50s, but let’s look at the last 30 years.

• Ray Kurzweil’s reading machine: speech synthesiser for blind people.
• Ten years later, the first commercial speech recogniser is created. It’s enormous, and very expensive.
• The next decade: mobile devices get smaller and more prolific. Internet starts to take off.
• (early 90s) SMS, then T9 later that decade
• (’95-2000) Dragon dictation, 1st IVR over DTMF, telephone banking
• Touch devices happen
• Google voice search (2008)
• Voice Control for iOS, then Voice Actions a year later
• Swype text input
• Voice-controlled virtual assistants (Siri and Google Now), 2012
• Visual IVR

Ray Kurzweil is now a Director of Engineering at Google, leading a search AI program. http://techcrunch.com/2013/01/06/googles-director-of-engineering-ray-kurzweil-is-building-your-cybernetic-friend/




[Timeline figure, 1983 to 2011, built up over several slides. Data: http://isc.org; http://amta.org.au]

Network generations: AMPS (analogue); 3G, UMTS, NextG

Labels added to the timeline, in order of appearance: Telecom ‘Walkabout’; Kurzweil Reading Machine (1976); 1st commercial large-vocabulary speech recogniser; Palm Treo; Motorola Brick; Nokia 5110; Motorola RAZR; SMS is born; Predictive Text (T9); Telephone banking; 1st dial-in IVR (DTMF); Dragon Dictate v1 for PC; HTC Dream (1st Android); iPhone 3; Google voice search app; Voice Control (iOS 3); Voice Actions (Froyo); Swype; Siri & Google Now; Visual IVR



http://www.flickr.com/photos/carnamah/5859235859

What do people want?

If I had asked people what they wanted,

they would have said faster horses.

Henry Ford, nineteen twenty never

Henry didn’t actually say this... Someone at Harvard Business Review went looking, and got a response from the Henry Ford Museum, who have researched the topic before, and had found no satisfactory result to suggest that Ford in fact said it!

The point is... I believe there’s a misconception that people don’t like voice as an interaction method. I would argue that people will use whatever input method gets the job done quickly and with minimum fuss - that can be ‘voice’.

I wonder what people said about:
• T9
• Touch
• Mobile telephony
• or even computers



Used with permission by Kenneth Johnson. http://kennethjohnson.us/

✦ All the robots!

✦ Google glass

Imagine the future...

if machines could understand.

A few examples:
- HAL 9000 (2001: A Space Odyssey)
- T-800 (Terminator)
- Johnny 5 (Short Circuit)
- Data (Star Trek)
- Robocop, ED-209 (Robocop)

Not just movies... CSI and other such shows are riddled with intelligent, understanding, all-singing, all-dancing, talking computers.

Sci-Fi movies have been spruiking the possibilities for decades. In reality, we’re moving at a much slower pace, but things like Google Glass are coming - in fact, you can participate in the trial study right now if you like.


Voice recognition technology

Main types of voice interaction

Design principles


Let’s talk about Voice




A (very) quick look at the technology

[Architecture diagram. Speech recognition & synthesis service: voice-to-text, text-to-speech. Other services: search engine, customer database, private APIs, transaction gateway, 3rd party APIs]

This is one configuration, which we used on a recent project. There are many other ways this could be done.

• sound clip recorded
• clip sent to VTT (voice-to-text)
• VTT interprets/translates
• sent back as text
• device sends text to other services (e.g. search engine)
• data sent back to the device (often multiple results, each with a confidence rating)
• device sends text to be voiced over (e.g. a summary of the data presented to the user)
• TTS (text-to-speech) creates a voice clip and sends it back to the device
• device presents the data and plays the voice clip
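The round trip in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `handle_utterance`, `Result` and the three `fake_*` services are hypothetical names standing in for whichever VTT, back-end and TTS services a project actually uses, and no real vendor API is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Result:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the back-end service


def handle_utterance(
    clip: bytes,
    voice_to_text: Callable[[bytes], str],
    query_services: Callable[[str], List[Result]],
    text_to_speech: Callable[[str], bytes],
) -> Tuple[List[Result], bytes]:
    """Run one recorded clip through the round trip described above."""
    text = voice_to_text(clip)        # clip sent to VTT, text comes back
    results = query_services(text)    # e.g. a search engine; may return several
    if results:
        best = max(results, key=lambda r: r.confidence)
        summary = f"Top result: {best.text}"
    else:
        summary = "Sorry, I found nothing."
    audio = text_to_speech(summary)   # TTS returns a voice clip
    return results, audio             # device shows the data, plays the clip


# Hypothetical stand-in services so the sketch runs without a network.
def fake_vtt(clip: bytes) -> str:
    return clip.decode("utf-8")


def fake_search(text: str) -> List[Result]:
    return [Result("Cafe deal, 200 m away", 0.9), Result("Shoe sale", 0.4)]


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Each of the three callables is a separate network hop in a real system, which is where the latency measured later comes from.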


[The same architecture diagram, repeated over three slides, with the labels A, B and C added one at a time]



http://www.flickr.com/photos/citychiccountrymouse/3856797711

PURPOSE: Measure accuracy and latency of current voice recognition solutions

METHOD:

✦ 4 vendor solutions

✦ 14 test phrases for translation

✦ 12 participants

✦ phrases recorded ‘fast’ and ‘slow’

Let’s Benchmark!



Test phrase: “Are there any good deals nearby”

✘ I’ll get any goodies nearby
✔ Are there any deals near me
✘ Adding any deals any of me
✔ Are there any good deals nearby

Objective (exact) and subjective matching.
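As a rough sketch of how the objective side of this scoring could be automated: the Python below checks for an exact word-for-word match and, as a crude stand-in for subjective judgement, reports how many of the expected words turned up. In the benchmark itself the subjective matching was done by people. The function names and the word-overlap measure are assumptions made for illustration, not the method used in the study.

```python
import re
from typing import List


def normalise(text: str) -> List[str]:
    """Lower-case the text and keep only its words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower().replace("’", "'"))


def exact_match(expected: str, heard: str) -> bool:
    """Objective match: the same words in the same order."""
    return normalise(expected) == normalise(heard)


def word_overlap(expected: str, heard: str) -> float:
    """Share of the expected words that appear anywhere in the transcript."""
    want = normalise(expected)
    got = set(normalise(heard))
    if not want:
        return 0.0
    return sum(word in got for word in want) / len(want)


def accuracy(expected: str, transcripts: List[str]) -> float:
    """Fraction of transcripts that match the expected phrase exactly."""
    if not transcripts:
        return 0.0
    return sum(exact_match(expected, t) for t in transcripts) / len(transcripts)
```

A transcript like “Are there any deals near me” fails the exact test but keeps most of the expected words, which is the kind of near miss a human scorer might still accept.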

Solution                      Average accuracy   Number of people tested   Comments
iSpeech                       10%                4                         Discarded after initial testing
Google                        47%                12                        Non-supported API
Nuance - high quality audio   56%                12                        10x file size
Nuance - low quality audio    50%                12                        1x file size
Siri                          64%                12                        Not a reusable product

Average accuracy of voice solutions

Average accuracy.

It’s a small number of participants. I’m sure you could find much more comprehensive test results from other sources. Knock yourself out!

[Bar chart: accuracy (0 to 100%) for participants P1 to P12. Series: Google Voice, Nuance Wav, Nuance Speex, Siri]

Accuracy of voice recognition by participant

Accuracy by participant. Here’s Google Voice in pink. And now Nuance. And the other two vendors tested.

This tells us there is significant variation in accuracy, from person to person.

[Bar chart: accuracy (0 to 100%) by accent, with number of participants: Australian (2), Indian (3), Singaporean (3), American (1), Hong Kong (1), Malaysian (1), Chinese (1). Series: Google Voice, Nuance Wav, Nuance Speex, Siri]

Average accuracy of voice recognition by accent

It’s a similar story across the different accents.



Remember A, B, C? We’re going to measure latency now.

2 weeks, sampling every 30 mins.

[Bar chart: latency in seconds (axis 0 to 60), Nuance vs Google, over 3G (in Asia) and WiFi (private). Data labels as extracted: 3, 16, 10, 21, 24]

Comparison of latency performance (seconds)

[Bar chart: latency in seconds (axis 0 to 60), broken down by step: voice-to-text, ‘stuff’ in the cloud, text-to-speech. Data labels as extracted: 3, 18, 10, 22, 4, 16]

Let’s measure latency of each of those steps.

Enormous latency! Over 40 seconds over 3G. Absurd.

One important note is that these times represent a whole phrase. The phrases are not broken down and processed incrementally, as is the case with products like the Google voice search app.

[Bar chart: latency in seconds (axis 0 to 60), Nuance vs Google. Data labels as extracted: 3, 16, 24]

Comparison of latency performance (seconds)

[Bar chart: latency in seconds (axis 0 to 60), voice-to-text and text-to-speech only. Data labels as extracted: 3, 18, 4, 16]

Even when we cut out the ‘other stuff’, and measure only VTT and TTS services, it’s still really very slow.

Some of this can be improved with colocation of servers and services. This test involved servers that were geographically spread over the globe. However, that isn’t always feasible, depending on the services you are connecting with, and where they are served from.

http://www.flickr.com/photos/lisovy/5415681393/

✦ Even the best recognisers struggle to achieve much better than 60% accuracy (the best average in this test was 64%)

✦ Latency is a problem, especially over slower networks

Conclusions

Consider the effect when these compound. It takes ages to get the result, and there’s a high likelihood it will be incorrect.

Not ideal.

My friend Rod Farmer kindly pointed out that it is possible to run concurrent requests - translating a few words at a time - in order to reduce latency significantly. For our limited prototype, this kind of engineering wasn’t feasible. Nonetheless, the recommendations that follow are helpful regardless of latency.
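The concurrent-request idea can be sketched as follows, assuming the audio has already been split into short chunks (itself a hard problem, and not shown here). `transcribe_in_chunks` and the `recognise` callable are hypothetical names; the callable stands in for a single request to any voice-to-text service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def transcribe_in_chunks(
    chunks: List[bytes],
    recognise: Callable[[bytes], str],
    max_workers: int = 4,
) -> str:
    """Send each audio chunk to the recogniser concurrently and rejoin the text."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, even when the underlying
        # requests finish out of order, so the words stay in sequence.
        parts = list(pool.map(recognise, chunks))
    return " ".join(part for part in parts if part)
```

With this shape the total wait is closer to the slowest single chunk than to the sum of all chunks. The trade-off is that the recogniser sees less context per request, which may hurt accuracy.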






Main ways of interacting with voice

✦ Commands
✦ Dictation
✦ Natural Language
✦ Identification

http://www.flickr.com/photos/bengrogan/2147048247

Command-based interactions

think of: Selective hearing.

✦ System only hears what it is listening for

✦ Structured/scripted

Command-based systems are like ‘selective hearing’.

The system only knows how to understand things that it is listening for. It’s a structured, generally tedious way of interacting. It often feels scripted and impersonal, which are the kind of attributes that typically offend customers.

This was typically the backbone of the early IVRs (late 90s-2000s).

AAMI, the Australian insurance company, has built its unique market position on exactly that. You might be familiar with the ‘Moira’ campaign.
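‘Selective hearing’ is easy to see in code. In the minimal sketch below, the system reacts only to words in a fixed vocabulary and ignores everything else. The command words, the action names and the `hear` function are all made up for illustration.

```python
from typing import Optional

# Hypothetical vocabulary: the only words this system is listening for.
COMMANDS = {
    "next": "NEXT_STEP",
    "back": "PREVIOUS_STEP",
    "repeat": "REPEAT_STEP",
}


def hear(transcript: str) -> Optional[str]:
    """Return the action for the first known command word, or None."""
    for word in transcript.lower().split():
        if word in COMMANDS:
            return COMMANDS[word]
    return None  # heard something, but nothing it was listening for
```

Anything outside the vocabulary returns `None`, which is exactly the “it heard me, but didn’t know what to do” experience: the design work is in deciding what happens next.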



Think about any time you’ve called your mobile provider. I know it feels tedious, but ask yourself - would it be any better if you spoke with a person?

Customers hate:
1. repeating themselves (usually because of a routing issue)
2. waiting in queue

Telstra has the 2nd biggest call centre, with:
- 600 unique reasons to call
- 200,000 inbound calls per day
- 1M transfers handled per month

I’d like to argue that speaking with a real agent may well be a poorer experience than a machine. Why? Humans aren’t perfect either:
- Attitude
- Accents
- Understanding
- Consistency

There are also times when we might simply prefer a machine. I can think of one or two times when I’ve really hoped to get to voicemail, because the person I was calling is difficult to talk with. Or perhaps you’re five weeks overdue on your invoice, and would prefer not to explain yourself, but instead get it paid through an IVR.

We’re talking about command-based interactions. Strictly, most IVRs today have moved beyond simple ‘commands’. They usually begin with an open prompt, before moving to menu mode. We’ll discuss that in more detail in a moment.


A very clever use of simple voice commands to control an interface - entirely appropriate for the context of use you’d expect for this scenario (sticky fingers etc.)

Other noteworthy examples:
- XBOX Kinect
- Dragon for desktop

✦ Great as a text-input replacement, particularly for mobile, where keypads are tedious

✦ It doesn’t need to ‘understand’

✦ Predictive dictation, based on data

http://www.flickr.com/photos/vivax_imago/5603582392

Dictation

think of: Hearing, but not understanding.

The machine hears what you tell it, but can’t make meaning from it.

I think we all understand how dictation works. The user says something, their speech is ‘recognised’ and then usually converted from voice to text.

If it is reasonably accurate, it’s easy to see how this can be helpful.Driving or walking down the street while composing SMS on a touch screen is hideously difficult. Dangerous, and possibly illegal. Dictation frees you up to focus on other things.

Complex vocabulary often also benefits from dictation. A word like oesophagus is difficult to spell, and you could be left guessing what letter it starts with a few times before T9 kicks in to save the day. Dictating it is likely to be quicker and easier.

Nuance’s Powerscribe360, for medical practitioners, is a great example of that in action.

It’s no coincidence that major mobile operating systems have this embedded right at the core.

Just as it’s no coincidence that Google have just employed Ray Kurzweil as Director of Engineering. Are they building SkyNet?

on a mac

Example of predictive dictation: “What does onomatopoeia mean?”

The machine still doesn’t “understand” in the way we mean it. But just like search engines, it can predict what we mean based on statistical modeling.

Think of how many billions of search queries Google has at hand, that are used to inform these statistical models.
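A toy version of that prediction, as a sketch only: each candidate transcript gets an acoustic score (how well it fits the sound) and is then weighted by how often the phrase has been seen before. The function name, the scores and the counts are all invented for illustration; real recognisers use far richer language models.

```python
from typing import Dict


def pick_transcript(candidates: Dict[str, float], phrase_counts: Dict[str, int]) -> str:
    """Choose the candidate with the best acoustic score x phrase frequency.

    candidates maps each possible transcript to an acoustic score (0 to 1).
    phrase_counts maps phrases to how often they have been seen before.
    """
    total = sum(phrase_counts.values())

    def score(text: str) -> float:
        # Add-one smoothing, so an unseen phrase is unlikely but not impossible.
        prior = (phrase_counts.get(text, 0) + 1) / (total + len(candidates))
        return candidates[text] * prior

    return max(candidates, key=score)


# Made-up numbers: the mis-hearing fits the audio slightly better...
heard = {
    "what does on a mat appear mean": 0.40,
    "what does onomatopoeia mean": 0.35,
}
# ...but people have asked the second question many times before.
seen = {"what does onomatopoeia mean": 900}
best = pick_transcript(heard, seen)
```

Here `best` comes out as the onomatopoeia phrase: the data outvotes the slightly better acoustic fit, without the machine understanding either sentence.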

on a mat

onomatopoeia mean?




http://bit.ly/XPJ7DC

✦ ‘natural language’ interactions

✦ The machine understands* meaning, and can then respond in a helpful, meaningful and personal way

Virtual Assistants

think of: hearing and understanding*

This is like hearing and understanding.

‘Understanding’ has an asterisk next to it, and you’ll see why over the next few slides. Machines have a really hard time trying to understand meaning - why...



‘Subliminal: How your unconscious mind rules your behaviour’, p. 34. Leonard Mlodinow, 2012.

The cooking teacher said the students

made good snacks.

Meaning is nuanced

The cannibal said the students made

good snacks.

It’s because human communication is complex and nuanced, and it can’t easily be automated or codified.

Herein lies one of the biggest challenges for ‘intelligent’ or ‘understanding’ voice systems.

“Teachers and Cannibals” is a basic example. As humans, we easily understand the meaning of these two statements that are only different by a single word. And you’re probably alarmed - I hope you’re alarmed - by the latter.

Machines don’t understand this as easily.


A common homily

The spirit is willing, but the flesh is weak

Here’s another example...



A common homily, when programmatically translated

The vodka is strong, but the meat is rotten


http://www.flickr.com/photos/lifementalhealthpics/8384573785

✦ Semantic classification

✦ Statistical probability modeling

✦ Creating a perception of understanding

What is machine ‘understanding’

Documents, conversations, or any kind of content can be manually classified or coded for meaning, and this becomes a model that the machine can use for matching.

Statistical algorithms similar to those used in search engines are also used to help the machine perform better, based on past behaviour of other people.

This creates a perception of understanding or intelligence. You might call that ‘Artificial Intelligence’.

Vocabulary is an important factor in the accuracy of probability modeling. The radiography reader was an early speech recognition system that succeeded because the vocabulary in radiography is constrained, and the acoustic signatures of the words are quite different. Therefore the algorithms are more successful.




✦ Can you access data to help do the thinking on behalf of your users?

✦ prediction of customer needs

✦ Personalisation

System awareness

When a customer interacts with a service, various bits of data may be available:
- identity
- account status
- location of call
- time of day
- device being used

This can be used to predict customer needs.

Example: An engineer cuts a cable that wipes out internet for all of Brunswick. 30,000 customers are affected. For customers calling in from that geographic area, the system has an automated response, telling them about the problem. The customer hangs up. Lots of money saved.

Doing this gives a 20% improvement in routing and/or task completion, compared with 2% from ‘tuning’ the semantic and statistical modeling.
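The Brunswick example amounts to a lookup before the conversation starts. A minimal sketch, assuming the system can map an incoming call to a suburb; the `KNOWN_OUTAGES` table, the `greet_caller` function and the wording of the messages are hypothetical.

```python
from typing import Optional

# Hypothetical table of current incidents, keyed by the caller's suburb.
KNOWN_OUTAGES = {
    "Brunswick": "We know about an internet outage in Brunswick and are working on it.",
}

DEFAULT_PROMPT = "How can I help you today?"


def greet_caller(suburb: Optional[str]) -> str:
    """Answer the likely question first if the caller is in an affected area."""
    if suburb in KNOWN_OUTAGES:
        return KNOWN_OUTAGES[suburb]
    return DEFAULT_PROMPT  # no relevant data, fall back to the open prompt
```

No speech recognition is involved at all in this step, which is part of why it pays off so well: the system uses what it already knows instead of asking.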



Blade Runner, 1982. Warner Bros. img: http://replicant976.tumblr.com/image/12757032749

The Uncanny Valley

is not something we need

worry about.

Yet.

The Uncanny Valley is a hypothesis in robotics that suggests that as robots approach human likeness, they incite repulsive emotions in humans.

It doesn’t really apply to virtual agents, and so far, our experience has been that there is a long way to go before voice synthesis approaches human likeness - so it’s really nothing to worry about yet.


‘Sneakers’, Universal Studios, 1992. img: http://lat.ms/ZlHtN0

✦ Voice biometrics

Identification

think of: “My voice is my passport, verify me!”

Who remembers the film Sneakers? One of my favourites.

A team of security specialists steal the keycard and vocal codes of Werner Brandes, an unsuspecting employee of the ‘front’ company operated by a bad guy who intends to become wealthy by using a decryption device to defraud companies for his own benefit.

In the end, the good guys win, and in a postscript, they use the Janek decryption device to steal from the rich and give to the poor. A modern day Robin Hood story.

This is a nice example of using voice biometrics for multi-factor authentication. There’re obvious applications for this, particularly for things like banking, where 2nd factor is often SMS, which has several limitations.

Some 20 years later, we’re starting to see this kind of security for real.





We’ve seen:
- opportunities for humans to interact with computers in helpful ways
- constraints in the capabilities of technology to deliver against this promise
- objectives in business to optimise operating costs and improve customer service

These are essentially the same ingredients as any design problem, aren’t they? So let’s look at some principles that apply specifically to voice...

AT&T Visual IVR Project http://www.att.com/gen/press-room?pid=23362

✦ High latency, low accuracy...

✦ Help users recover by offering alternatives

Design for failure

This could be a multi-modal interface, or it could be a translated interface like this example of visual IVR, which lets users traverse the IVR tree using a touch menu.
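One way to design for failure is to decide up front what happens when recognition comes back empty or with low confidence. The sketch below is an assumption-laden illustration: the `next_step` function, the 0.6 threshold and the list of alternatives are made up, and a real product would tune all three through testing.

```python
from typing import List, Tuple, Union

# Hypothetical alternatives to offer when voice input fails.
ALTERNATIVES = ["Try again", "Type instead", "Choose from a menu"]


def next_step(
    transcript: str, confidence: float, threshold: float = 0.6
) -> Tuple[str, Union[str, List[str]]]:
    """Act on a confident transcript; otherwise offer other ways forward."""
    if transcript.strip() and confidence >= threshold:
        return ("act", transcript)
    return ("offer_alternatives", ALTERNATIVES)
```

The point is that the failure path is a designed part of the flow, not an error message.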


✦ Don’t treat voice as a ‘me too’ feature (will your product or your customers actually benefit from voice... really?)

✦ Think twice before introducing redundancy

Would you like voice with that?

Voice is the hot new thing right now, but resist the hype. It’s not trivial to implement, and even if it were, does that validate it as a ‘must have’ feature for your product?

Voice is integrated into the OS of modern devices. That technology is mature. It can be used with any input field, any interface. The interaction design is polished, and extensively tested. Use that! If you can.


✦ Understand the various modes of voice interaction

✦ Be careful about mixing modes (is that a command or a conversation?)

Know when and how to use voice

When you are designing for voice, understand the modes.

Command, dictate, natural language, identity.

✦ Support multi-modal interactions and make it as seamless as possible (voice, gesture, type, other)

✦ test, iterate, test, iterate...

Let users decide how to interact

Don Norman, 2003: “I believe that voice interfaces hold their greatest promise as an additional component to a multi-modal dialogue, rather than as the only interface channel.”

Dictate and edit is a prime example of this. It’s beautifully crafted. Voice -> type; gesture -> voice.

Test and iterate. Voice still isn’t a common/normal interaction, so you will likely get it a bit wrong the first few times.

Don’t make me think

“A simple voice interface can only be as good as what the customer thinks they want. A better system is one that understands what their needs are likely to be, based on what’s known about them.”

✦ Personalisation

✦ Work on making the system ‘smarter’

Create a perception of understanding

The speech recognition and synthesis tools have become commodities. Focus your energies on helping the system seem smarter.

Jonny Schneider
Lead Consultant
Mobile Experience Design & Strategy
[email protected]
@jonnyschneider
au.linkedin.com/in/jonnyschneider/

All images used by permission
