VOICE: DESIGNING AND DEVELOPING FOR THE FUTURE OF USER EXPERIENCE
WHY ARE AMAZON TALKING ABOUT IOT AND VOICE?
If you aren’t interested in making your own connected devices, you can use the Alexa Skills Kit (ASK) to get your application to millions of users.
If you are making IoT devices, you can voice-enable them with the Alexa Voice Service (AVS).
If you’re making smart home products, you can control those products via voice with Alexa.
The IoT is growing rapidly, with some research estimating that there will be 50Bn connected devices by 2020.
VOICE WILL BE EVERYWHERE
“The age of touch could soon come to an end. From smartphones and smartwatches, to home devices, to in-car infotainment systems, touch is no longer the primary user interface.”
Source: Design News
Although voice technology is still in its relative infancy, the future is not as far off as you think.
RAPID ADVANCEMENT
ASR: Automated Speech Recognition. What is the user actually saying?
NLP: Natural Language Processing. What is the intent of the user?
AI: Machine Learning and Intelligence. Provide extraordinary and advanced customer experiences.
CLOUD: Scalable, Reliable & Secure. The ability to introduce features at scale that continuously add value over time.
ACCESS: Democratizing Voice Technology. Better ASR and NLP have led to better access and adoption.
[Figure: Machine ASR accuracy compared with human accuracy, 1970 to 2020. Source: MindMeld]
ASR accuracy has dramatically increased in the last 4-5 years. This inflection point has created sustained momentum in consumer adoption of voice technology.
“As speech recognition accuracy goes from 95% to 99%, all of us in the room will go from barely using it today to using it all the time. Most people underestimate the difference between 95% and 99% accuracy. 99% is a game changer.”
Andrew Ng, Chief Scientist at Baidu
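A quick back-of-the-envelope calculation shows why the jump from 95% to 99% matters more than it looks. This sketch assumes, purely for illustration, that every word in an utterance is recognized independently at the same per-word accuracy. Real ASR errors are not independent, so treat the numbers as indicative only.

```python
def utterance_success_rate(word_accuracy: float, words: int) -> float:
    """Chance that every word in an utterance is recognized correctly,
    assuming (for illustration only) that word errors are independent."""
    return word_accuracy ** words


for accuracy in (0.95, 0.99):
    success = utterance_success_rate(accuracy, words=10)
    print(f"{accuracy:.0%} word accuracy -> {1 - accuracy:.0%} word errors, "
          f"{success:.1%} of 10-word requests fully correct")
```

Under that assumption, going from 95% to 99% cuts word errors fivefold. It also raises the share of fully correct 10-word requests from about 60% to about 90%.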
[Figure: Performance over time. The chart compares computer performance & machine learning, machine learning & intelligence, and human performance, with a “We are here” marker.]
“Soon it will seem almost quaint there was a time we looked at voice assistants as virtual friends who lived in our pockets and answered our questions.”
Source: The Drum News – How Voice Tech Will Change Our Lives Forever
INTERFACE EVOLUTION EVENTS
It took generations and several major technological advances for touchscreens, GUI and VUI to achieve critical adoption.
Following non-commercial GUI milestones, the advances of the early ’80s and ’90s (Windows 95, Apple’s OS, the Internet) changed the trajectory of the GUI.
It wasn’t until the Palm Pilot of the late ’90s and the smartphones of the mid-2000s that touch emerged as a key interaction modality.
Human-to-human VUI arrived with the dawn of the telephone, but human-to-machine VUIs have only recently become viable.
TOUCH vs. VOICE
Touch:
• Can handle more info
• More familiar
• Harder to get lost
• Provides flexibility
Voice:
• Faster
• Less cumbersome
• Universal
• Removes noise
VOICE AS A KEY MODALITY
Where it’s going is limited only by one’s imagination, but voice WILL play a key role in how we control our homes and our outdoor spaces, and in how we access information… Why?
ACCESS: Voice is available across billions of devices, from phones, watches and cars to Alexa-powered devices such as the Amazon Echo, Echo Dot, Amazon Tap and Amazon Fire TV, making access to voice ubiquitous.
ACCURACY: Advances in the ability to understand user intention are changing the game for adoption, as interactions become faster and more reliable.
EFFICIENCY: Ease of use will make voice a powerful choice for quick access to anything inside and outside our environment.
SECURITY: Voice is a highly unique signature. As advances in biometrics are integrated with voice, our individuality will become a key to further personalization and security.
DEVELOPING FOR VOICE
THE ALEXA SERVICE: Supported by two powerful SDKs
ALEXA SKILLS KIT: Create Great Content. ASK is how you connect to your consumer.
ALEXA VOICE SERVICE: Unparalleled Distribution. AVS allows your content to be everywhere.
Lives In The Cloud:
• Automated Speech Recognition (ASR)
• Natural Language Understanding (NLU)
• Always Learning
ALEXA SKILLS KIT
UNDER THE HOOD OF THE ALEXA SKILLS KIT
A closer look at how the Alexa Skills Kit processes a request and returns an appropriate response:
• The user makes a request: “Alexa, what’s the weather?”
• The audio stream is sent up to Alexa.
• Alexa identifies the skill and recognizes the intent through ASR & NLU.
• Alexa sends the customer intent to your service.
• Your service processes the request.
• You respond to the intent through text & visual: you pass back a textual or audio response, and a graphical response.
• Alexa converts text-to-speech (TTS) and renders the graphical component.
[Figure: The Alexa Voice Service side of the same request. On the device: wake word detection, speech capture and text-to-speech output. In the speech platform, the user’s utterance goes through ASR and NLU to produce an intent for the Weather skill. The skill returns text/SSML, which TTS turns into Alexa’s voice and sends back to the device with a “speak” directive.]
WHAT COMPONENTS MAKE UP A SKILL?
Skills are made up of two components:
• Skill configuration in the Amazon Developer Portal
• Your skill code, hosted in AWS Lambda or your own HTTPS endpoint
INVOCATION NAMES
Invocation names are how we know to route traffic to your particular skill.
Interactions can be either:
One Shot – open your skill and perform an action, such as ‘Alexa, ask National Rail for my commute’
Conversational – ‘Alexa, ask National Rail to set up my commute’ – ‘OK, what is your regular departure station?’ – ‘Birmingham New Street’
Open Only – ‘Alexa, open National Rail’
Your skill can support all of these; it’s not one or the other.
‘Alexa, ask National Rail for my commute’
‘Alexa, open Just Eat’
‘Alexa, tell Uber to get me a ride’
‘Alexa, launch Cat Facts’
‘Alexa, play Reindeer Trivia’
INTENTS AND SLOTS
You define interactions for your voice app through intent schemas
Each intent consists of two fields. The intent field gives the name of the intent. The slots field lists the slots associated with that intent.
Slots can also include types such as LITERAL, NUMBER, DATE, etc.
Intent schemas are uploaded to your skill in the Amazon Developer Portal.
{
  "intents": [
    {
      "intent": "tubeinfo",
      "slots": [
        {
          "name": "LINENAME",
          "type": "LINENAMES"
        }
      ]
    }
  ]
}
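A skill's code can sanity-check the same intent schema it uploads. The sketch below is a hypothetical helper: `slots_by_intent` is not part of any Alexa SDK. It parses the schema with Python's standard `json` module and indexes slot types by intent name. Treating `"slots"` as optional for intents that take no arguments is an assumption of this sketch.

```python
from __future__ import annotations

import json

# The intent schema from the slide above, as a JSON string.
INTENT_SCHEMA = """
{"intents": [
  {"intent": "tubeinfo",
   "slots": [{"name": "LINENAME", "type": "LINENAMES"}]}
]}
"""


def slots_by_intent(schema_json: str) -> dict[str, dict[str, str]]:
    """Map each intent name to a {slot name: slot type} dictionary."""
    schema = json.loads(schema_json)
    result: dict[str, dict[str, str]] = {}
    for intent in schema["intents"]:
        name = intent["intent"]
        if name in result:
            raise ValueError(f"duplicate intent: {name}")
        # Intents without arguments may omit "slots" (assumed here).
        result[name] = {slot["name"]: slot["type"] for slot in intent.get("slots", [])}
    return result


print(slots_by_intent(INTENT_SCHEMA))  # {'tubeinfo': {'LINENAME': 'LINENAMES'}}
```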
CUSTOM SLOTS
Custom slots increase the accuracy of Alexa when identifying an argument within an intent.
They are created as a line-separated list of values.
It is recommended to include as many possible slot values as you can.
There are some built-in slots for things such as GB.City and GB.FirstName.
bakerloo
central
circle
district
hammersmith and city
jubilee
metropolitan
northern
piccadilly
victoria
waterloo and city
london overground
tfl rail
DLR
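A custom slot type is not a strict enumeration. Alexa can pass through a value that is not in the list, so the skill should still check what it receives. This sketch loads the line-separated LINENAMES values and does a case-insensitive membership test. The helper names are hypothetical.

```python
from __future__ import annotations

# The LINENAMES custom slot type values, one per line.
LINENAMES = """\
bakerloo
central
circle
district
hammersmith and city
jubilee
metropolitan
northern
piccadilly
victoria
waterloo and city
london overground
tfl rail
DLR
"""


def parse_slot_values(text: str) -> set[str]:
    """Turn a line-separated list into a set of lower-cased values, skipping blanks."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def is_known_line(spoken_value: str, known: set[str]) -> bool:
    """Case-insensitive check of a received slot value against the list."""
    return spoken_value.strip().lower() in known


KNOWN_LINES = parse_slot_values(LINENAMES)
print(len(KNOWN_LINES))                        # 14
print(is_known_line("Circle", KNOWN_LINES))    # True
print(is_known_line("elizabeth", KNOWN_LINES)) # False
```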
SAMPLE UTTERANCES
The mappings between intents and the typical utterances that invoke those intents are provided in a tab-separated text document of sample utterances.
Each possible phrase is assigned to one of the defined intents.
tubeinfo are there any disruptions on the {LINENAME} line
tubeinfo {LINENAME} line
“What is…”
“Are there…”
“Tell me…”
“Give me…”
“Give…”
“Find…”
“Find me…”
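The sample utterance file is simple enough to lint before uploading. The sketch below splits each line at the first tab into an intent name and a phrase. It then extracts `{SLOT}` references and reports any intent or slot the intent schema does not declare. The schema is hard-coded here for brevity, and both function names are hypothetical.

```python
from __future__ import annotations

import re

# Each line: intent name, a tab, then the sample phrase.
SAMPLE_UTTERANCES = (
    "tubeinfo\tare there any disruptions on the {LINENAME} line\n"
    "tubeinfo\t{LINENAME} line\n"
)

# Slots each intent declares in its intent schema (hard-coded here for brevity).
SCHEMA_SLOTS = {"tubeinfo": {"LINENAME"}}

SLOT_REF = re.compile(r"\{(\w+)\}")


def parse_utterances(text: str) -> list[tuple[str, str, list[str]]]:
    """Split each non-blank line into (intent, phrase, slot names referenced)."""
    parsed = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        intent, tab, phrase = line.partition("\t")
        if not tab:
            raise ValueError(f"line {line_no}: no tab after the intent name")
        parsed.append((intent.strip(), phrase.strip(), SLOT_REF.findall(phrase)))
    return parsed


def check_against_schema(parsed, schema_slots: dict[str, set[str]]) -> list[str]:
    """Report unknown intents and slots that the schema doesn't declare."""
    problems = []
    for intent, phrase, slots in parsed:
        if intent not in schema_slots:
            problems.append(f"unknown intent {intent!r}: {phrase}")
            continue
        for slot in slots:
            if slot not in schema_slots[intent]:
                problems.append(f"undeclared slot {slot!r} in {intent!r}: {phrase}")
    return problems


print(check_against_schema(parse_utterances(SAMPLE_UTTERANCES), SCHEMA_SLOTS))  # []
```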
Putting It All Together
Utterance: tubeinfo are there any delays on the {LINENAME} line
Intent: {"intent": "tubeinfo", "slots": [{"name": "LINENAME", "type": "LINENAMES"}]}
Slots: bakerloo, central, . . .
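The sketch below makes the mapping between utterance, intent and slots concrete. It is a deliberately naive toy: it turns each sample utterance into a regular expression and fills slots from the matched text. Alexa's NLU does not work this way. It uses the samples as training data and generalizes well beyond exact patterns. The function names are hypothetical, and the returned dictionary mimics the `intent` object in the request JSON.

```python
import re

SLOT_REF = re.compile(r"\{(\w+)\}")

# (intent, sample utterance) pairs, tried in order.
SAMPLES = [
    ("tubeinfo", "are there any delays on the {LINENAME} line"),
    ("tubeinfo", "{LINENAME} line"),
]


def utterance_to_regex(phrase: str) -> re.Pattern:
    """Compile a sample utterance into a regex with a named group per slot."""
    parts, pos = [], 0
    for m in SLOT_REF.finditer(phrase):
        parts.append(re.escape(phrase[pos:m.start()]))  # literal words
        parts.append(f"(?P<{m.group(1)}>.+?)")          # slot value
        pos = m.end()
    parts.append(re.escape(phrase[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def match_intent(utterance: str, samples=SAMPLES):
    """Return an intent dict shaped like the request JSON, or None."""
    for intent, phrase in samples:
        m = utterance_to_regex(phrase).fullmatch(utterance.strip())
        if m:
            slots = {name: {"name": name, "value": value}
                     for name, value in m.groupdict().items()}
            return {"name": intent, "slots": slots}
    return None


print(match_intent("are there any delays on the circle line"))
```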
REQUEST TYPES
LaunchRequest: Occurs when the user launches the skill without specifying what they want
IntentRequest: Occurs when the user specifies an intent
SessionEndedRequest: Occurs when the session ends, for example when the user ends it
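A common pattern is one handler per request type. This sketch shows it as a plain AWS Lambda Python entry point (`lambda_handler(event, context)`) that dispatches on `request.type`. It does not use any Alexa SDK. The handler names, prompts and the `speak` helper are hypothetical. Alexa does not accept a spoken response to a SessionEndedRequest, so that handler only cleans up.

```python
from typing import Optional


def speak(text: str, end_session: bool) -> dict:
    """Minimal response envelope with plain-text speech."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def on_launch(request: dict) -> dict:
    # LaunchRequest: the user opened the skill without saying what they want.
    return speak("Which line would you like disruption information for?", end_session=False)


def on_intent(request: dict) -> dict:
    # IntentRequest: route on the intent name from the intent schema.
    intent = request["intent"]
    if intent["name"] == "tubeinfo":
        line = intent.get("slots", {}).get("LINENAME", {}).get("value")
        if not line:
            return speak("Which line would you like to check?", end_session=False)
        # A real skill would look up live status for `line` here.
        return speak(f"You asked about the {line} line.", end_session=True)
    return speak("Sorry, I didn't catch that. Which line would you like to check?", end_session=False)


def on_session_ended(request: dict) -> None:
    # SessionEndedRequest: clean up only; no speech is returned.
    return None


HANDLERS = {
    "LaunchRequest": on_launch,
    "IntentRequest": on_intent,
    "SessionEndedRequest": on_session_ended,
}


def lambda_handler(event: dict, context: object) -> Optional[dict]:
    """AWS Lambda entry point: `event` is the request JSON."""
    request = event["request"]
    handler = HANDLERS.get(request["type"])
    if handler is None:
        raise ValueError(f"unsupported request type: {request['type']}")
    return handler(request)
```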
AN EXAMPLE REQUEST
If hosting your own service, you will need to handle POST requests to your service over port 443 and parse the JSON
With AWS Lambda, the event object that is passed when invoking your function is equal to the request JSON
Requests always include a type, requestId, and timestamp
IntentRequests also include the intent and its slots
The type field tells you which request type you received: LaunchRequest, IntentRequest or SessionEndedRequest
"request": {
  "type": "IntentRequest",
  "requestId": "string",
  "timestamp": "2016-05-13T13:19:25Z",
  "intent": {
    "name": "tubeinfo",
    "slots": {
      "LINENAME": {
        "name": "LINENAME",
        "value": "circle"
      }
    }
  },
  "locale": "en-GB"
}
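If you host your own HTTPS service instead of using Lambda, you receive this JSON as a raw POST body. The sketch parses the body, rejects requests whose timestamp is too far from the current time, and pulls out the fields listed above. The 150-second tolerance reflects Amazon's guidance for self-hosted endpoints as I understand it, so check the current documentation. A real endpoint must also verify the request signature, which is not shown here. `parse_request` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

# Assumed freshness tolerance for self-hosted endpoints (check current Amazon docs).
MAX_SKEW_SECONDS = 150


def parse_request(body: str, now: datetime) -> dict:
    """Parse a raw POST body and pull out the fields described on the slide."""
    request = json.loads(body)["request"]
    # Timestamps in the example look like "2016-05-13T13:19:25Z" (UTC).
    sent = datetime.strptime(request["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    sent = sent.replace(tzinfo=timezone.utc)
    if abs((now - sent).total_seconds()) > MAX_SKEW_SECONDS:
        raise ValueError("request timestamp is too far from the current time")
    slots = {}
    if request["type"] == "IntentRequest":
        for name, slot in request["intent"].get("slots", {}).items():
            slots[name] = slot.get("value")  # "value" is absent if the slot wasn't filled
    return {
        "type": request["type"],
        "requestId": request["requestId"],
        "intent": request.get("intent", {}).get("name"),
        "slots": slots,
        "locale": request.get("locale"),
    }


body = json.dumps({"request": {
    "type": "IntentRequest", "requestId": "string",
    "timestamp": "2016-05-13T13:19:25Z",
    "intent": {"name": "tubeinfo",
               "slots": {"LINENAME": {"name": "LINENAME", "value": "circle"}}},
    "locale": "en-GB"}})
print(parse_request(body, now=datetime(2016, 5, 13, 13, 20, tzinfo=timezone.utc)))
```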
AN EXAMPLE RESPONSE
Your app will need to build a response object that includes the relevant keys and values.
The alexa-sdk for Node.js makes this super simple.
outputSpeech, card and reprompt are the supported response objects.
shouldEndSession is a boolean value that determines whether the conversation is complete or not.
You can also keep data across the turns of a conversation in the sessionAttributes object. Alexa sends it back to your skill with the next request in the same session.
{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>There are currently no delays on the circle line.</speak>"
    },
    "shouldEndSession": true
  },
  "sessionAttributes": {}
}
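The slide points to the alexa-sdk for Node.js for building responses. To avoid guessing at that library's API, this sketch builds the same JSON by hand in Python. `build_response` is a hypothetical helper. It escapes `&`, `<` and `>` with the standard library's `xml.sax.saxutils.escape` so that plain text cannot break the SSML. It also adds a reprompt and a Simple card only when asked.

```python
import json
from typing import Optional
from xml.sax.saxutils import escape


def ssml(text: str) -> dict:
    # Escape &, < and > so plain text can't break the SSML markup.
    return {"type": "SSML", "ssml": f"<speak>{escape(text)}</speak>"}


def build_response(speech: str, *, end_session: bool,
                   reprompt: Optional[str] = None,
                   card_title: Optional[str] = None,
                   session_attributes: Optional[dict] = None) -> dict:
    """Assemble outputSpeech, optional reprompt/card, and sessionAttributes."""
    response: dict = {"outputSpeech": ssml(speech), "shouldEndSession": end_session}
    if reprompt is not None:
        # Spoken if the user says nothing while the session is still open.
        response["reprompt"] = {"outputSpeech": ssml(reprompt)}
    if card_title is not None:
        response["card"] = {"type": "Simple", "title": card_title, "content": speech}
    return {"version": "1.0", "response": response,
            "sessionAttributes": session_attributes or {}}


print(json.dumps(build_response("There are currently no delays on the circle line.",
                                end_session=True), indent=2))
```

The usage below reproduces the slide's example response exactly. Passing `reprompt` together with `end_session=False` keeps the conversation open. Any `session_attributes` you pass, such as an illustrative `{"lastLine": "circle"}`, come back to the skill with the next request.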
THE CODE
The promise for voice is great... but so is the potential for failure
DESIGNING FOR VOICE
WHAT EXACTLY IS VOICE USER INTERFACE (VUI) DESIGN?
Voice User Interface (VUI) design creates delightful experiences using voice and natural language. It designs voice interactions that fulfill a user’s request, engage them in conversation, and make the technology they’re using seem totally invisible.
SOME DO’s & DON’Ts OF VUI
DON’T: I can give you disruption information for the London Underground
DO: I can give you disruption information for the London Underground. What line would you like to check?
DON’T: Welcome to The Underground
DO: Welcome to The Underground skill. You can get disruption information by saying a London Underground line name. What line would you like to check?
DON’T: I can give disruption info for all of the London Underground lines. Which one would you like….
DO: Which line would you like disruption information for?
DON’T: You would like disruption information for the circle line, right?
DO: There are currently no delays on the circle line.
WHAT NEXT?
ENTER ONE OF OUR PROMOS
Hackster.io – API Mashup Contest
Attend an MLH Hackathon
Publish a Skill – Get a Free T-shirt
PLENTY MORE RESOURCES OVER AT DEVELOPER.AMAZON.COM/ASK
THANK YOU. QUESTIONS?