![Page 1: Multimodal Apps: Tablet PC & Speech Development in.NET casey chesnut brains-N-brawn.com Wisconsin.NET June 2005](https://reader035.vdocuments.us/reader035/viewer/2022062516/56649d835503460f94a6963e/html5/thumbnails/1.jpg)
Multimodal Apps: Tablet PC & Speech Development in .NET
casey chesnut
brains-N-brawn.com
Wisconsin .NET June 2005
Source Code
• The associated source can be found here:
– http://www.brains-n-brawn.com/artifacts/ugTabletSpeech.zip
Seamless Computing
• Advanced Web Services (MVP05)
• Compact Framework (MVP04)
• MapPoint
• Tablet PC (MVP03)
• Speech
• Artificial Intelligence
• Direct3D
• Media Center
Questions
• How many programmers?
– Tablet PC
– Speech
– Media Center
Outline
• Tablet PC
• Speech
– Speech API (SAPI)
– Speech Application SDK (SASDK)
– Speech Server
• Demo
– Tablet and Speech
– Media Center and Speech
Outline : Tablet PC
• Development environment
• How it works
• Working with Ink
• Opinion
• Future
Development Environment
• Windows XP Pro (non Tablet edition)
• Visual Studio .NET 1.1
• Tablet PC SDK 1.7
– http://www.microsoft.com/downloads/details.aspx?familyid=b46d4b83-a821-40bc-aa85-c9ee3d6e9699&displaylang=en
• Recognizer Pack
– http://www.microsoft.com/downloads/details.aspx?FamilyId=080184DD-5E92-4464-B907-10762E9F918B&displaylang=en
• Digitizer Board
– http://www.wacom.com/productinfo/index.cfm
• Tablet PC
How Ink works
• Digitizer collects stroke information
• Strokes are broken up into characters / words / drawings
• Character / word stroke info is transformed into some feature set
• Feature set is run through some sort of pre-trained AI
• Output is mapped to a dictionary of words
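The steps on this slide can be sketched end to end in a few lines of Python. This is a toy illustration only, not how the Tablet PC recognizer is implemented: the gap-based segmentation, the two-number feature set, the `TEMPLATES` table standing in for a pre-trained model, and every function name are assumptions made for the example.

```python
from difflib import get_close_matches

# A stroke is a list of (x, y) points, as a digitizer might report them.

def segment(strokes, gap=10.0):
    """Group strokes into characters: start a new group when a stroke
    begins more than `gap` units to the right of everything seen so far."""
    groups, right_edge = [], None
    for stroke in sorted(strokes, key=lambda s: min(x for x, _ in s)):
        left = min(x for x, _ in stroke)
        right = max(x for x, _ in stroke)
        if right_edge is None or left - right_edge > gap:
            groups.append([])
        groups[-1].append(stroke)
        right_edge = right if right_edge is None else max(right_edge, right)
    return groups

def features(char_strokes):
    """Toy feature set: (stroke count, width / height of the bounding box)."""
    xs = [x for s in char_strokes for x, _ in s]
    ys = [y for s in char_strokes for _, y in s]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    aspect = min(width / height, 3.0) if height else 3.0
    return (float(len(char_strokes)), aspect)

# Hypothetical stand-in for a pre-trained model: one feature vector per letter.
TEMPLATES = {"l": (1.0, 0.0), "o": (1.0, 1.0), "t": (2.0, 0.6)}

def classify(vector):
    """Nearest-template classification by squared distance."""
    return min(TEMPLATES, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(vector, TEMPLATES[c])))

def to_word(raw, dictionary):
    """Map raw recognizer output to the closest dictionary word."""
    return get_close_matches(raw, dictionary, n=1, cutoff=0.0)[0]

def recognize(strokes, dictionary):
    raw = "".join(classify(features(group)) for group in segment(strokes))
    return to_word(raw, dictionary)
```

The dictionary step is what lets a slightly wrong character sequence still resolve to a real word, for example `to_word("tablot", ["lot", "ink", "tablet"])` picks `"tablet"`.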
Demo
• Digitizer collects stroke information
• Tablet PC Inspector
– http://codebetter.com/blogs/peter.van.ooijen/archive/0001/01/01/56161.aspx
Demo
• Strokes are broken up into characters / words / drawings
• InkDivider
– Tablet PC SDK Sample
Demo
• Character / word stroke info is transformed into some feature set
• Feature set is run through some sort of pre-trained AI
• Demo
– /aiTabletOcr
• Article
– http://www.brains-N-brawn.com/aiTabletOcr/
Demo
• Output is mapped to a dictionary of words
• Dictionary Tool
– http://blogs.msdn.com/omars/archive/2004/04/15/113597.aspx
• Article
– http://www.brains-N-brawn.com/tabletDic/
Working with Ink
• InkControls
• InkOverlay
– Collection
– Recognition
• RealTimeStylus
• Ink on the web
Ink Controls
• InkEdit
• InkPicture
• Code from scratch
InkOverlay
• Collection
• Recognition
• Demo apps
RealTimeStylus
• RealTimeStylusPlugin
– Tablet PC SDK Sample
Ink on the Web
• IE only
• InkBlogWeb
– Tablet PC SDK Sample
• Article
– http://www.brains-N-brawn.com/tabletWeb/
Opinion
• Green Light
– Tablet PC Edition 2005 improved recognition and usability dramatically
– Recognizer Pack made development more accessible
– Language Support: Chinese (Traditional and Simplified), U.S. English, U.K. English, French, German, Italian, Japanese, Korean, Spanish
Possible Future
• VS.NET 2005?
• Avalon?
• Will IE7 have tighter integration with ink?
• Longhorn – baked in
• Possibility for training ink recognition
What about Pocket PCs
• Handwriting Recognition
• Form factors
Outline : Speech
• How does it work?
– Synthesis (TTS)
– Recognition (SR)
• Development
– Speech API (SAPI)
– Speech Application SDK (SASDK)
– Speech Server (MSS)
How Synthesis Works
• Text is converted to phonemes
• Phonemes are appended together
• Audio is played back
• Demo
– /ttSpeech app
• Article
– http://www.brains-N-brawn.com/ttSpeech/
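The three synthesis steps can be shown in miniature with the Python standard library. This is a minimal sketch under loud assumptions: the two-word `LEXICON`, the sine tone that stands in for each recorded phoneme unit, and all function names are hypothetical, and the audio is written to an in-memory WAV rather than played.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000       # samples per second
UNIT_SECONDS = 0.08      # assumed fixed length of every phoneme unit

# Hypothetical mini-lexicon: word -> phoneme symbols.
LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}

def text_to_phonemes(text):
    """Step 1: convert text to phonemes by dictionary lookup."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])   # raises KeyError for unknown words
    return phonemes

def phoneme_unit(symbol):
    """Stand-in for a recorded unit: a tone whose pitch depends on the symbol."""
    freq = 200 + 20 * (sum(map(ord, symbol)) % 20)
    count = int(SAMPLE_RATE * UNIT_SECONDS)
    return [int(12000 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
            for i in range(count)]

def synthesize(text):
    """Steps 2 and 3: append the units together and emit 16-bit mono WAV bytes."""
    samples = []
    for symbol in text_to_phonemes(text):
        samples.extend(phoneme_unit(symbol))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()
```

Calling `synthesize("hello world")` returns bytes that can be saved as a `.wav` file and opened in any audio player. A real engine would smooth the joins between units and adjust pitch and timing, which this sketch skips.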
How Recognition Works
• Audio wav is transformed to some meaningful form
• Phonemes are found in audio signals
• Phonemes are mapped to a dictionary of words
• Demo
– wavReader app
• Article
– http://www.brains-N-brawn.com/noReco/
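The recognition steps can be illustrated the same way on synthetic audio. In this sketch each phoneme is pretended to be a pure tone, the "meaningful form" is a zero-crossing frequency estimate per frame, and frames are assumed to line up exactly with phoneme boundaries. Real engines use spectral features and statistical models instead; the `PHONEME_TONES` and `WORDS` tables and the function names are invented for the example.

```python
import math

SAMPLE_RATE = 8000
FRAME = 640   # samples per frame; assumed equal to one phoneme's length

# Hypothetical acoustic model: each phoneme is a pure tone at this frequency.
PHONEME_TONES = {"K": 300.0, "AE": 500.0, "T": 700.0}

# Hypothetical pronunciation dictionary: phoneme sequence -> word.
WORDS = {("K", "AE", "T"): "cat", ("T", "AE", "K"): "tack"}

def make_audio(phonemes):
    """Build test audio: one tone-filled frame per phoneme."""
    samples = []
    for symbol in phonemes:
        freq = PHONEME_TONES[symbol]
        samples.extend(math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
                       for i in range(FRAME))
    return samples

def frame_frequency(frame):
    """Step 1: turn raw samples into a feature, here an estimate of the
    dominant frequency obtained by counting sign changes."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))
    return crossings * SAMPLE_RATE / (2 * len(frame))

def find_phonemes(samples):
    """Step 2: label each frame with the phoneme whose tone is nearest."""
    labels = []
    for start in range(0, len(samples), FRAME):
        freq = frame_frequency(samples[start:start + FRAME])
        labels.append(min(PHONEME_TONES,
                          key=lambda p: abs(PHONEME_TONES[p] - freq)))
    return labels

def recognize(samples):
    """Step 3: map the phoneme sequence to a dictionary word, if any."""
    return WORDS.get(tuple(find_phonemes(samples)))
```

For instance, `recognize(make_audio(["K", "AE", "T"]))` recovers the word `"cat"`, while a phoneme sequence missing from `WORDS` yields `None`.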
Speech API (SAPI)
• Old school COM
• Windows applications
• Can do dictation
• Demo
– SAPI app
Opinion
• Yellow light
– It works, but is aging
– Has to be trained for dictation
– Limited language support
• Green light for Tablet PCs
– Tablet PC has recognition and synthesis engines installed
– Some Tablets have microphone arrays built in
Future
• System.Speech
– Simple API
– Reflection capabilities
– Standards support (SSML, SRGS)
– Engines should be improved from all the Speech Server work
What about Pocket PCs
• OEMs can add VoiceCommand
• Windows Mobile has the SAPI API, but no engines
• Platform Builder is supposed to have engines
• There are 3rd party engines for purchase
Speech Application SDK
• VS.NET 1.1 integration
• For web based apps
– Voice-only telephony
– Multimodal browser
• Demo
– Code voice-only from scratch
• Article
– http://www.brains-N-brawn.com/noHands/
SASDK
• Speech Synthesis
– Inline
– Code behind
– Prompt functions
– Prompt databases
• Speech Recognition
– Inline
– Static Grammar
– Dynamic Grammar
– DTMF
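As an illustration of the "Dynamic Grammar" bullet, a grammar can be generated at run time from whatever phrases are valid at that moment, for example values read from a database. The sketch below assumes the grammar is expressed as W3C SRGS XML, a format the deck mentions only in connection with System.Speech. It uses Python's standard XML library rather than any SASDK API, and `build_grammar` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SRGS elements without a prefix (default namespace).
ET.register_namespace("", SRGS_NS)

def build_grammar(rule_id, phrases, lang="en-US"):
    """Return an SRGS XML grammar whose root rule accepts any one phrase."""
    grammar = ET.Element(f"{{{SRGS_NS}}}grammar", {
        "version": "1.0",
        "root": rule_id,
        "mode": "voice",
        f"{{{XML_NS}}}lang": lang,
    })
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule",
                         {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        # One <item> per phrase the recognizer should listen for.
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = phrase
    return ET.tostring(grammar, encoding="unicode")
```

A static grammar would be the same XML kept in a file. The dynamic variant trades that simplicity for a phrase list that can change per request, as in `build_grammar("city", ["Madison", "Milwaukee"])`.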
Speech Server
• Runs SASDK applications
• Primarily for Voice-only apps
• Also for Multimodal Pocket PC apps
• Speech Language Packs
– North American Spanish
– Canadian French
• Article
– http://www.brains-N-brawn.com/speechMulti/
Deployment
Opinion
• Green light for Voice-Only
– Great tool support
– Cheap hardware
– Language support
• Red light for Multimodal
– Standards battle with VoiceXML
– IE Speech Add-Ins are not accessible
– Pocket IE Speech Add-In not updated for R2 release, nor does it support Smartphone
Possible Future
• VS.NET 2005?
• XAML?
• Will IE7 have voice browsing built-in?
• Other browsers to add SALT support?
• Pocket IE Professional?
Combo Demos
• Ink and Speech (WinForm)
– InkCollection app
– http://www.brains-N-brawn.com/tabletStrator/
• Ink and Speech (WebForm)
– Video
– http://www.brains-N-brawn.com/tabletWeb/
• Remote and Speech (AddIn)
– http://www.brains-N-brawn.com/mceSAPI/
• Remote and Speech (HostedHTML)
– http://www.brains-N-brawn.com/mceSALT/
Questions