Speech Technologies and VoiceXML
try
Department of Computer Science
National Cheng-Chi University

Presentation Agenda
- Voice technologies
  - Backgrounds
  - ASR/TTS
- Voice browsing with VoiceXML
  - VoiceXML architecture
  - Implementations of VoiceXML Platform
  - VoiceXML document structure
- Bringing Voice Technologies into Virtual Environment

Voice Technologies
- In the mid- to late 1990s, personal computers started to become powerful enough to support automatic speech recognition (ASR).
- The two key underlying technologies behind these advances are speech recognition (SR) and text-to-speech synthesis (TTS).

Classification of Voice Applications
- Basic interactive voice response (IVR)
  - Computer: “For stock quotes, press 1. For trading, press 2. …”
  - Human: (presses DTMF “1”)
- Basic speech ASR
  - C: “Say the stock name for a price quote.”
  - H: “Lucent Technologies”
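
The difference between these two basic classes can be sketched in a few lines of Python. This is only an illustration: the menu table, the vocabulary, and both function names are hypothetical, and a real system would sit behind a telephony and recognition engine rather than plain strings.

```python
# Hypothetical menu for the basic IVR example: DTMF digit -> service.
DTMF_MENU = {"1": "stock quotes", "2": "trading"}

# Hypothetical fixed vocabulary for the basic speech ASR example.
STOCK_NAMES = {"lucent technologies", "ibm"}


def handle_dtmf(digit: str) -> str:
    """Basic IVR: the caller can only choose by pressing a key."""
    return DTMF_MENU.get(digit, "invalid choice")


def handle_basic_asr(utterance: str) -> str:
    """Basic speech ASR: the whole utterance must equal one known phrase."""
    phrase = utterance.strip().lower()
    return phrase if phrase in STOCK_NAMES else "no match"
```

Under these assumptions, a free-form sentence such as “Uh, what’s Lucent trading at?” is rejected by `handle_basic_asr`, which is exactly the gap the more advanced classes are meant to close.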

Classification of Voice Applications
- Advanced speech ASR
  - C: “Stock Services, how may I help you?”
  - H: “Uh, what’s Lucent trading at?”
- “Near-natural language” ASR
  - C: “How may I help you?”
  - H: “Um, yeah, I’d like to get the current price of Lucent Technologies”
  - C: “Lucent is up two at sixty eight and a half.”
  - H: “OK. I want to buy one hundred shares at market price.”
  - C: “…”

Speech Recognition
- Capturing speech (analog) signals.
- Digitizing the sound waves and converting them to basic language units, or phonemes.
- Constructing words from phonemes.
- Contextually analyzing the words to ensure correct spelling for words that sound alike (such as “write” and “right”).
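
The digitizing step listed above can be illustrated with a small sketch. It assumes a telephone-style sampling rate of 8 kHz and 8-bit signed samples; the function name `digitize` and the 440 Hz test tone are made up for the example and say nothing about how any particular recognizer does this.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=8000, bits=8):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    levels = 2 ** (bits - 1) - 1                     # 127 for 8-bit signed samples
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                       # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))   # clip to [-1, 1]
        samples.append(round(amplitude * levels))    # quantize
    return samples


def tone(t):
    """Stand-in for the analog input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)


samples = digitize(tone, duration_s=0.01)
```

Ten milliseconds at 8 kHz gives 80 integer samples; later stages (phoneme breakdown, word construction) would work on sequences like this one.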

Speech Recognition Process Flow
Source: Microsoft Speech.NET Home (http://www.microsoft.com/speech/)

Speech Recognition Process Flow
- Step 1: User Input
  - The system captures the user’s voice as an analog acoustic signal.
- Step 2: Digitization
  - The analog acoustic signal is digitized.
- Step 3: Phonetic Breakdown
  - The signal is broken into phonemes.

Speech Recognition Process Flow
- Step 4: Statistical Modeling
  - Phonemes are mapped to their phonetic representation using a statistical model (e.g., a hidden Markov model, HMM).
- Step 5: Matching
  - Based on the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., candidate words, each with a confidence score).
  - Grammar: the set of words or phrases that constrains the range of input or output in the voice application.
  - Dictionary: the mapping table between phonetic representations and words (e.g., thu, thee → the).
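
The matching step can be sketched as follows. This is a toy model under stated assumptions: the dictionary entries beyond the slide's “thu, thee → the” example, the grammar, the scores, and the function name `n_best` are all invented for illustration, and a real recognizer would obtain its scores from the statistical model of Step 4.

```python
# Dictionary: phonetic representation -> word. "thu"/"thee" follow the slide;
# the other entries are hypothetical.
DICTIONARY = {"thu": "the", "thee": "the", "rayt": "right", "stok": "stock"}

# Grammar: the words this (hypothetical) application is willing to accept.
GRAMMAR = {"the", "stock"}


def n_best(hypotheses, n=3):
    """Turn (phonetic representation, score) pairs into an n-best word list."""
    best = {}
    for phonetic, score in hypotheses:
        word = DICTIONARY.get(phonetic)
        if word is None or word not in GRAMMAR:
            continue                                  # unknown or out-of-grammar
        best[word] = max(score, best.get(word, 0.0))  # keep the best score per word
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


result = n_best([("thu", 0.6), ("thee", 0.8), ("rayt", 0.9), ("stok", 0.4)])
```

Note that “right” is dropped despite having the highest score, because the grammar does not allow it; this is how a grammar narrows the range of accepted input.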

Speech Synthesis
- Speech synthesis, or text-to-speech, is the process of converting text into spoken language:
  - Breaking down the words into phonemes.
  - Analyzing for special handling of text such as numbers and currency amounts.
  - Generating the digital audio for playback.
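
The “special handling” step is often called text normalization. The sketch below covers only that step, and only for whole numbers from 0 to 99 and dollar amounts in the same range; the function names are hypothetical, and phoneme breakdown and audio generation are left out entirely.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is handled in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def _currency(match):
    amount = int(match.group(1))
    unit = "dollar" if amount == 1 else "dollars"
    return f"{number_to_words(amount)} {unit}"


def _plain_number(match):
    return number_to_words(int(match.group(1)))


def normalize(text: str) -> str:
    """Rewrite small dollar amounts and numbers as speakable words."""
    text = re.sub(r"\$(\d{1,2})\b", _currency, text)     # currency first
    return re.sub(r"\b(\d{1,2})\b", _plain_number, text)  # then bare numbers
```

Larger numbers are left untouched by this sketch; a production synthesizer would also need rules for dates, abbreviations, and similar cases.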

Speech Synthesis
Source: Microsoft Speech.NET Home (http://www.microsoft.com/speech/)

Pervasive Computing Model
- E-business has changed from a client-server model to a web-centric model.
- Once connected to the Internet, one can get any information one wants.
- But people want a more convenient way to connect to the Internet.
- Lou Gerstner, CEO of IBM: the pervasive computing model is a billion people interacting with a million e-businesses with a trillion devices interconnected.

Voice Browsing
- VoiceXML instead of HTML.
- A voice browser instead of an ordinary web browser.
- Phone instead of PC.

Show: A Scenario of Using VoiceXML
Application

VoiceXML Overview
- A language for specifying voice dialogs.
- Voice dialogs use audio prompts and text-to-speech (TTS) for output; touch-tone keys (DTMF) and automatic speech recognition (ASR) for input.
- Main input/output device (initially) is the phone.
- Leverages the Internet for application development and delivery.
- Standard language enables portability (unifies dialog control languages).

History of VoiceXML
Source: VoiceXML Forum (http://www.voicexml.org)

Making use of mature Internet Technologies
- Leverage existing web application development tools.
- Leverage existing web infrastructure for application delivery.
- Clean separation of service logic from user interaction.

VoiceXML Platform Architecture

VoiceXML Platform Architecture-1
- Telephone and telephone network: connects the caller’s telephone with the telephony server.
- VoiceXML Gateway
  - Voice browser
  - Audio input: speech recognition (ASR), touch-tone (DTMF), audio recording.
  - Audio output: audio playback, speech synthesis (TTS).
  - Interface, call controls

VoiceXML Platform Architecture-2
- VoiceXML documents
  - Dialog and flow control
  - Client-side scripting (ECMAScript)
  - Speech recognition grammar
  - Speech synthesis pronunciation control
- Document servers (web servers)
  - Feed static VoiceXML documents or audio files.
- Application servers
  - Generate VoiceXML documents dynamically.
  - Server-side application logic
  - Connect to a database or a database interface
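
To make the application-server role concrete, here is a minimal sketch of generating a VoiceXML document dynamically. It is written as a Python WSGI application purely for illustration; the `QUOTES` table stands in for a database, and the names `render_vxml` and `application`, the `symbol` query parameter, and the sample data are all hypothetical.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical stand-in for the database behind the application server.
QUOTES = {"LUCENT": "68.5"}


def render_vxml(symbol: str) -> str:
    """Build a small VoiceXML 2.0 document that speaks one quote."""
    price = QUOTES.get(symbol)
    if price is None:
        message = f"No quote found for {symbol}."
    else:
        message = f"{symbol} is trading at {price}."
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        f"    <block><prompt>{escape(message)}</prompt></block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


def application(environ, start_response):
    """WSGI entry point: the voice browser fetches this like any web page."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    symbol = query.get("symbol", ["LUCENT"])[0]
    body = render_vxml(symbol).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/voicexml+xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point of the sketch is the separation shown in the architecture: the server-side logic decides what to say, while the voice browser in the gateway handles the actual speech input and output.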

Voice Gateway

VoiceXML Gateway (detail)

Implementations of VoiceXML Gateways
- In Taiwan:
  - Yes Mobile
  - Chunghwa Telecom Laboratories
  - eWings Technologies, Inc.
- Free
  - IBM VoiceServerSDK
- Open Source
  - CMU: OpenVXI

[DEMO] How to Write and Run VoiceXML Applications?

[DEMO] Generate VoiceXML Document Dynamically Using ASP.NET

VoiceXML Document Structure

A Simple VoiceXML Document
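
The slide's own example document is not preserved in this transcript. As a stand-in, the following sketch embeds a minimal VoiceXML 2.0 document in a Python string and parses it to confirm it is well-formed; the form `id` and the prompt text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A minimal VoiceXML 2.0 document: one form, one block, one spoken prompt.
HELLO_VXML = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="hello">
    <block>
      <prompt>Hello, world!</prompt>
    </block>
  </form>
</vxml>
"""

# Parse the document and list what a voice browser would be asked to speak.
root = ET.fromstring(HELLO_VXML.encode("utf-8"))
NS = {"v": "http://www.w3.org/2001/vxml"}
prompts = [p.text for p in root.findall(".//v:prompt", NS)]
```

Structurally, `<vxml>` plays the role of the document root, `<form>` is a dialog, and `<prompt>` holds text that the platform renders through TTS.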

[DEMO] VoiceXML/HTML Comparison

Bringing Voice Technologies to 3D Virtual Environment

Related Research
- Raymond L. Smith, III and Stephen D. Roberts:
  - Using voice input commands to operate simulation-animation.
  - The efficiency issues of ASR/TTS are taken into account.
- Satoru, Osamu, Katunobu, Takashi, Tomoyoshi, Hideki, Shotaro, Takio and Katsuhiko:
  - Create a 3D virtual user who can speak with the user via speaker and microphone.
  - The virtual user has the ability to learn words and recognize human faces.

We can do more..
- Speak to many users who are “moving” in the virtual environment.
- The system is built in a distributed environment (i.e., the web).
- Make use of XML technology (VoiceXML/SALT).

Problems to Solve
- Voice/animation synchronization.
- Protocol integration.
- ASR/TTS integration and its performance issues.
- Virtual user autonomy.
- The “voice propagation range” issues.
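
One possible reading of the “voice propagation range” issue is deciding which users in the 3D scene should hear a given speaker. The sketch below assumes a simple model that the slides do not specify: a fixed maximum range with linear volume falloff. The function name, coordinates, and range value are all hypothetical.

```python
import math


def listeners_in_range(speaker, listeners, max_range):
    """Return {name: volume} for listeners within max_range of the speaker.

    Volume falls off linearly from 1.0 at the speaker to 0.0 at max_range.
    This falloff model is an assumption made for illustration only.
    """
    audible = {}
    for name, position in listeners.items():
        distance = math.dist(speaker, position)   # Euclidean distance in 3D
        if distance <= max_range:
            audible[name] = 1.0 - distance / max_range
    return audible


heard = listeners_in_range(
    speaker=(0, 0, 0),
    listeners={"near": (3, 4, 0), "far": (30, 0, 0)},
    max_range=10,
)
```

With these made-up positions only the user five units away hears the speaker, at half volume; the user thirty units away is out of range.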

System Design Prototype

Summary
- Speech is the most natural way for humans to communicate, so it will become an important channel in HCI.
- VoiceXML has revolutionized speech recognition and telephony application development and deployment.
- Adding speech facilities to 3D virtual environments will make the UI friendlier and enable multimodal input/output.
- My research interest in this topic will focus on voice-animation synchronization and on enabling SR/TTS in distributed 3D virtual environments.

Q & A