Speech Technology
HOT!
What are the big players in the area up to?
• Google
  – http://googleblog.blogspot.com/2010/12/can-we-talk-better-speech-technology.html
• Microsoft
  – http://gigaom.com/2010/12/06/microsoft-claims-its-place-in-a-voice-enabled-world/
• Apple
  – http://www.dailyfinance.com/story/company-news/apples-siri-purchase-heats-up-the-race-toward-a-voice-activated/19458344/
• IBM
  – http://www.ibm.com/news/in/en/2010/08/20/a896686u56875f96.html
• Nuance
  – http://gigaom.com/2011/01/19/nuance-releases-mobile-sdk-to-speechify-apps/
• Voxeo
Apple, and the case of Siri
• Siri: http://www.youtube.com/watch?v=MpjpVAB06O4
• Review of Siri: http://www.youtube.com/watch?v=AohzWSkAU7c&feature=watch_response
Types of dialog systems
• by modality
  – text-based
  – spoken
  – graphical user interface
  – multi-modal
• by device
  – telephone-based systems
  – PDA systems
  – in-car systems
  – robot systems
  – desktop/laptop systems
    • native
    • in-browser systems
    • in-virtual machine
  – in-virtual environment
  – robots
• by style
  – command-based
  – menu-driven
  – natural language
• by initiative
  – system initiative
  – user initiative
  – mixed initiative
• by application
  – information service
  – command-and-control
  – entertainment
  – education/tutorial
  – edutainment
  – reminder systems
  – companion systems
  – healthcare
  – eldercare
  – assistive/access systems
More about application types
• Information-providing systems:
  – weather reports
  – stock quotes
  – timetables
  – ...
• Transaction-based systems:
  – calendar functions
  – shopping
  – financial transactions
  – travel reservations
  – ...
Why voice?
• Wireless devices have small screens and limited input capabilities.
• A telephone keypad can offer users only a limited number of choices.
• Speech technology is improving.
• The exchange of information between a person and a computer is becoming more like a real conversation.
• Users want hands-free or eyes-free use.
• From a business viewpoint, voice applications open up a host of new revenue opportunities.
• There are many more telephones than computers with the potential to access the Internet.
Traditional Interactive Voice Response (IVR)
Speech versus Touch Tone
Architecture 1
Architecture 2
Today
• Presentation of project ideas
• TTS evaluation
• Short intro to XML
• Speech technology standards overview
• Speech Synthesis Markup Language (SSML)
• Presentation of home assignment 3: ASR evaluation
Project ideas?
Intro to XML
W3C Speech Standards
Torbjörn Lager
VoiceXML – a part of the web
[Figure: web servers deliver VoiceXML documents to a VoiceXML browser (ASR, TTS, interpreter) and HTML documents to an HTML browser.]
The place of speech technology
• … speech technology itself has a very long way to go. … the most important thing may turn out to be not the speech technology itself, but the way in which speech technology connects to all the other technologies.
Tim Berners-Lee
The What and Why of Standards
• Software standards include terminology, languages and protocols specified by committees of experts for widespread use in the software industry. Software standards have both advantages and disadvantages.
• Advantages:
  – developers can create applications using the standard languages that are portable across a variety of platforms;
  – products from different vendors are able to interact with each other;
  – a community of experts evolves around the standard and is available to develop products and services based on the standard.
• Disadvantages:
  – some developers feel that standards may inhibit creativity and stall the introduction of superior technology.
• However, in the area of speech, vendors are enthusiastic about standards and frequently complain that standards are not developed fast enough.
• Emerging speech-technology standards could give a boost to an industry hampered by proprietary software and hardware.
W3C Speech Standards
• Speech Recognition Grammar Specification (SRGS)
  – What the user can say
• Semantic Interpretation for Speech Recognition (SISR)
  – What the user means
• Speech Synthesis Markup Language (SSML)
  – What the user hears
• VoiceXML
  – Dialog management: what the system is to do
Speech Recognition Grammar Specification (SRGS)
• Covers both speech and DTMF (Dual-Tone Multi-Frequency) input. (DTMF is valuable in noisy conditions or when the social context makes it awkward to speak.)
• Grammars can be specified in either an XML or an equivalent augmented BNF (ABNF) syntax.
– Speech recognition is an inherently uncertain process. Recognizers may report confidence values.
– If the utterance has several possible parses, the recognizer may be able to report the most likely alternatives (N-best results).
• What about statistical language models? Not covered by SRGS!
Semantic Interpretation for Speech Recognition (SISR)
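<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" root="answer" tag-format="semantics/1.0-literals">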
<rule id="answer" scope="public"> <one-of> <item><ruleref uri="#yes"/></item> <item><ruleref uri="#no"/></item> </one-of> </rule>
<rule id="yes"> <one-of> <item>yes</item> <item>yeah<tag>yes</tag></item> <item><token>you bet</token><tag>yes</tag></item> <item xml:lang="fr-CA">oui<tag>yes</tag></item> </one-of> </rule>
<rule id="no"> <one-of> <item>no</item> <item>nope</item> <item>no way</item> </one-of> <tag>no</tag> </rule>
</grammar>
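With literal tags, the text of a tag is itself the semantic result, so this grammar maps "yeah", "you bet" and "oui" to yes and every "no" variant to no. The sketch below is a toy interpreter for that literal style, using Python's standard xml.etree.ElementTree: it expands a small grammar without repeats or recursion into all of its sentences and returns the value for a matching utterance. It assumes the SISR default that a rule in which no tag was executed takes the value of its last matched rule reference, or else its matched text. The helper names are invented for the illustration; real recognizers work very differently.

```python
from itertools import product
import xml.etree.ElementTree as ET

GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US"
  root="answer" tag-format="semantics/1.0-literals">
  <rule id="answer" scope="public">
    <one-of><item><ruleref uri="#yes"/></item><item><ruleref uri="#no"/></item></one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>yes</tag></item>
      <item><token>you bet</token><tag>yes</tag></item>
      <item xml:lang="fr-CA">oui<tag>yes</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of><item>no</item><item>nope</item><item>no way</item></one-of>
    <tag>no</tag>
  </rule>
</grammar>"""

def local(tag):
    return tag.split("}")[-1]  # drop the SRGS namespace

def words(text):
    return tuple((text or "").split())

def expand(elem, rules):
    """All (tokens, tag_value, latest_rule_value) triples an element can produce."""
    kind = local(elem.tag)
    if kind == "tag":
        return [((), elem.text.strip(), None)]  # literal tag: its text is the value
    if kind == "token":
        return [(words(elem.text), None, None)]
    if kind == "one-of":
        return [r for item in elem for r in expand(item, rules)]
    if kind == "ruleref":
        rule = rules[elem.get("uri").lstrip("#")]
        return [(toks, None, value) for toks, value in expand_rule(rule, rules)]
    # item or rule: own text, then each child followed by its tail text
    parts = [[(words(elem.text), None, None)]]
    for child in elem:
        parts.append(expand(child, rules))
        parts.append([(words(child.tail), None, None)])
    results = []
    for combo in product(*parts):
        toks, tag, latest = (), None, None
        for t, g, lat in combo:
            toks += t
            tag = g if g is not None else tag
            latest = lat if lat is not None else latest
        results.append((toks, tag, latest))
    return results

def expand_rule(rule, rules):
    """(tokens, value) pairs: last tag, else last rule reference, else the matched text."""
    out = []
    for toks, tag, latest in expand(rule, rules):
        value = tag if tag is not None else latest if latest is not None else " ".join(toks)
        out.append((toks, value))
    return out

def interpret(grammar_xml, utterance):
    root = ET.fromstring(grammar_xml)
    rules = {r.get("id"): r for r in root.iter() if local(r.tag) == "rule"}
    target = tuple(utterance.lower().split())
    for toks, value in expand_rule(rules[root.get("root")], rules):
        if tuple(t.lower() for t in toks) == target:
            return value
    return None  # utterance not covered by the grammar

for said in ["yeah", "you bet", "oui", "no way", "maybe"]:
    print(said, "->", interpret(GRAMMAR, said))
```

Enumerating every sentence only works because this grammar is tiny and finite; it is meant to show what the literal tags compute, not how a recognizer searches.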
Semantic Interpretation for Speech Recognition (SISR)
• I would like a coca cola and three large pizzas with pepperoni and mushrooms
{ drink: { liquid:"coke", drinksize:"medium"}, pizza: { number: 3, pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] }}
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" root="order" tag-format="semantics/1.0">
<rule id="order" scope="public"> I would like a <ruleref uri="#drink"/> <tag>out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize;</tag> and <ruleref uri="#pizza"/> <tag>out.pizza=rules.pizza;</tag> </rule>
<rule id="kindofdrink"> <one-of> <item>coke</item> <item>pepsi</item> <item>coca cola<tag>out="coke";</tag></item> </one-of> </rule>
<rule id="foodsize"> <tag>out="medium";</tag> <item repeat="0-1"> <one-of> <item>small<tag>out="small";</tag></item> <item>medium</item> <item>large<tag>out="large";</tag></item> <item>regular<tag>out="medium";</tag></item> </one-of> </item> </rule>
<rule id="tops"> <tag>out=new Array;</tag> <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> <item repeat="1-"> and <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> </item> </rule>
<rule id="top"> <one-of> <item>anchovies</item> <item>pepperoni</item> <item>mushroom<tag>out="mushrooms";</tag></item> <item>mushrooms</item> </one-of> </rule>
<rule id="drink"> <ruleref uri="#foodsize"/> <ruleref uri="#kindofdrink"/> <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag> </rule>
<rule id="pizza"> <ruleref uri="#number"/> <ruleref uri="#foodsize"/> <tag>out.pizzasize=rules.foodsize; out.number=rules.number;</tag> pizzas with <ruleref uri="#tops"/> <tag>out.topping=rules.tops;</tag> </rule>
<rule id="number"> <one-of> <item> <tag>out=1;</tag> <one-of> <item>a</item> <item>one</item> </one-of> </item> <item>two<tag>out=2;</tag></item> <item>three<tag>out=3;</tag></item> </one-of> </rule>
</grammar>
I would like a coca cola and three large pizzas with pepperoni and mushrooms
{ drink: { liquid:"coke", drinksize:"medium" }, pizza: { number: 3, pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] }}
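The tags in this grammar are ordinary ECMAScript that build up the result object. As a rough cross-check of what they compute, here is a hand translation of the same grammar into a Python regular expression plus lookup tables. It is not an SISR processor, and the table and function names are made up for the illustration. Like the tops rule, it insists on at least two toppings, and like the foodsize rule, a missing size defaults to "medium".

```python
import re

# Spoken form -> semantic value, mirroring the grammar's rules and tags.
SIZE = {"small": "small", "medium": "medium", "large": "large", "regular": "medium"}  # foodsize
DRINK = {"coke": "coke", "pepsi": "pepsi", "coca cola": "coke"}                       # kindofdrink
NUMBER = {"a": 1, "one": 1, "two": 2, "three": 3}                                     # number
TOP = {"anchovies": "anchovies", "pepperoni": "pepperoni",
       "mushroom": "mushrooms", "mushrooms": "mushrooms"}                             # top

def alt(table):
    """Regex alternation over a table's spoken forms, longest first."""
    return "|".join(sorted(map(re.escape, table), key=len, reverse=True))

ORDER = re.compile(
    rf"i would like a (?:({alt(SIZE)}) )?({alt(DRINK)}) and "
    rf"({alt(NUMBER)}) (?:({alt(SIZE)}) )?pizzas with (.+)")

def interpret_order(utterance):
    """Return the same nested structure the SISR tags build, or None if out of grammar."""
    m = ORDER.fullmatch(utterance.lower())
    if m is None:
        return None
    drink_size, drink, number, pizza_size, tops = m.groups()
    toppings = tops.split(" and ")
    if len(toppings) < 2 or any(t not in TOP for t in toppings):
        return None  # the tops rule needs at least two known toppings
    return {
        "drink": {"liquid": DRINK[drink], "drinksize": SIZE[drink_size or "medium"]},
        "pizza": {"number": NUMBER[number],
                  "pizzasize": SIZE[pizza_size or "medium"],
                  "topping": [TOP[t] for t in toppings]},
    }

print(interpret_order(
    "I would like a coca cola and three large pizzas with pepperoni and mushrooms"))
```

Keeping the semantic mapping in tables is the same design choice the grammar makes with its per-item tags: the surface forms and their normalized values sit side by side.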
Foundational
• Grammar (CFG, PSG)
• Automata theory (FSMs, FSTs, etc.)
• Logic
• Phonetics
• Linguistics
• Computer science
Speech Synthesis Markup Language (SSML)
• The key concepts of SSML are:
  – interoperability, or interacting with other markup languages (VoiceXML, etc.);
  – consistency, or providing predictable control of voice output across platforms and across speech synthesis implementations; and
  – internationalization, or enabling speech output in a large number of languages within or across documents.
Speech Synthesis Markup Language (SSML) – An Example
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"> <p> <s xml:lang="en-US"> <voice name="David" gender="male" age="25"> For English, press <emphasis>one</emphasis>. </voice> </s> <s xml:lang="es-MX"> <voice name="Miguel" gender="male" age="25"> Para español, oprima el <emphasis>dos</emphasis>. </voice> </s></p>
</speak>
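When prompts like this one are generated by program code rather than written by hand, building them with an XML library avoids quoting and escaping mistakes. A minimal sketch using Python's standard xml.etree.ElementTree; the menu_prompt helper and its option tuple layout are hypothetical, and the voice names are simply the processor-specific names used in the example.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def menu_prompt(options):
    """Return an SSML string with one sentence per menu option.

    Each option is (lang, voice_name, text_before_key, key, text_after_key).
    """
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"})
    p = ET.SubElement(speak, "p")
    for lang, voice_name, before, key, after in options:
        s = ET.SubElement(p, "s", {"xml:lang": lang})
        voice = ET.SubElement(s, "voice", {"name": voice_name, "gender": "male", "age": "25"})
        voice.text = before + " "
        emphasis = ET.SubElement(voice, "emphasis")
        emphasis.text = key    # the word the caller should hear stressed
        emphasis.tail = after  # text that follows the emphasized word
    return ET.tostring(speak, encoding="unicode")

print(menu_prompt([
    ("en-US", "David", "For English, press", "one", "."),
    ("es-MX", "Miguel", "Para español, oprima el", "dos", "."),
]))
```

The library escapes characters such as "&" or "<" in the prompt text automatically, which hand-concatenated strings would not.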
Text Structure: p and s Elements
• A p element represents a paragraph. An s element represents a sentence.
<speak> <p> <s>This is the first sentence of the paragraph.</s> <s>Here's another sentence.</s> </p></speak>
The phoneme Element
• The phoneme element provides a phonemic/phonetic pronunciation for the contained text.
<speak>
<phoneme alphabet="ipa" ph="təmei̥ɾou̥">tomato</phoneme>
</speak>
The sub Element
• The sub element is employed to indicate that the text in the alias attribute value replaces the contained text for pronunciation. This allows a document to contain both a spoken and written form.
<?xml version="1.0"?><speak>
<sub alias="World Wide Web Consortium">W3C</sub>
</speak>
The voice Element
• The voice element is a production element that requests a change in speaking voice. A selection of attributes is:
  – gender: optional attribute indicating the preferred gender of the voice to speak the contained text. Enumerated values are: "male", "female", "neutral".
  – age: optional attribute indicating the preferred age in years (since birth) of the voice to speak the contained text.
  – name: optional attribute indicating a processor-specific voice name to speak the contained text.
<?xml version="1.0"?><speak>
<voice gender="female">Mary had a little lamb,</voice>
<!-- now request a different female child's voice --> <voice gender="female" age="7">Its fleece was white as snow.</voice>
<!-- processor-specific voice selection --> <voice name="Mike">I want to be like Mike.</voice>
</speak>
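The voice attributes express preferences; which installed voice is actually used, especially when nothing matches exactly, is largely left to the synthesis processor. The sketch below illustrates one possible fallback policy over a made-up voice inventory. The Voice class, the inventory and the ranking (exact name first, then gender, then closest age) are assumptions for illustration, not rules taken from the SSML specification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Voice:
    name: str
    gender: str  # "male", "female" or "neutral"
    age: int     # apparent age in years

def select_voice(voices: Sequence[Voice], name: Optional[str] = None,
                 gender: Optional[str] = None, age: Optional[int] = None) -> Optional[Voice]:
    """Return the installed voice that best matches the requested attributes.

    Hypothetical policy: an exact name match wins outright; otherwise prefer a
    matching gender, then the smallest age difference. Ties keep inventory order.
    """
    if not voices:
        return None
    if name is not None:
        for voice in voices:
            if voice.name == name:
                return voice

    def mismatch(voice: Voice):
        gender_penalty = 0 if gender is None or voice.gender == gender else 1
        age_gap = 0 if age is None else abs(voice.age - age)
        return (gender_penalty, age_gap)

    return min(voices, key=mismatch)

# A made-up inventory standing in for whatever voices a processor has installed.
INVENTORY = [Voice("Anna", "female", 35), Voice("Lily", "female", 9), Voice("Mike", "male", 40)]

print(select_voice(INVENTORY, gender="female").name)         # Anna
print(select_voice(INVENTORY, gender="female", age=7).name)  # Lily
print(select_voice(INVENTORY, name="Mike").name)             # Mike
```

The three calls correspond to the three voice elements in the example above: a female voice, a female child's voice, and a voice requested by name.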
The emphasis Element
• The emphasis element requests that the contained text be spoken with emphasis.
<speak>
That is a <emphasis> big </emphasis> car!
That is a <emphasis level="strong"> huge </emphasis> bank account!
</speak>
The break Element
• The break element is an empty element that controls the pausing or other prosodic boundaries between words.
<speak> Take a deep breath <break/> then continue. Press 1 or wait for the tone. <break time="3s"/> I didn't hear you! <break strength="weak"/> Please repeat.</speak>
The prosody Element
• The prosody element permits control of the pitch, speaking rate and volume of the speech output.
• The attributes, all optional, are:
– pitch: the baseline pitch for the contained text. Although the exact meaning of "baseline pitch" will vary across synthesis processors, increasing/decreasing this value will typically increase/decrease the approximate pitch of the output. Legal values are: a number followed by "Hz", a relative change or "x-low", "low", "medium", "high", "x-high", or "default". Labels "x-low" through "x-high" represent a sequence of monotonically non-decreasing pitch levels.
– contour: sets the actual pitch contour for the contained text. The format is specified in Pitch contour below.
– range: the pitch range (variability) for the contained text. Although the exact meaning of "pitch range" will vary across synthesis processors, increasing/decreasing this value will typically increase/decrease the dynamic range of the output pitch. Legal values are: a number followed by "Hz", a relative change or "x-low", "low", "medium", "high", "x-high", or "default". Labels "x-low" through "x-high" represent a sequence of monotonically non-decreasing pitch ranges.
– rate: a change in the speaking rate for the contained text. Legal values are: a relative change or "x-slow", "slow", "medium", "fast", "x-fast", or "default". Labels "x-slow" through "x-fast" represent a sequence of monotonically non-decreasing speaking rates. When a number is used to specify a relative change it acts as a multiplier of the default rate. For example, a value of 1 means no change in speaking rate, a value of 2 means a speaking rate twice the default rate, and a value of 0.5 means a speaking rate of half the default rate. The default rate for a voice depends on the language and dialect and on the personality of the voice. The default rate for a voice should be such that it is experienced as a normal speaking rate for the voice when reading aloud text. Since voices are processor-specific, the default rate will be as well.
– duration: a value in seconds or milliseconds for the desired time to take to read the element contents. Follows the time value format from the Cascading Style Sheet Level 2 Recommendation [CSS2], e.g. "250ms", "3s".
– volume: the volume for the contained text in the range 0.0 to 100.0 (higher values are louder and specifying a value of zero is equivalent to specifying "silent"). Legal values are: number, a relative change or "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default". The volume scale is linear amplitude. The default is 100.0. Labels "silent" through "x-loud" represent a sequence of monotonically non-decreasing volume levels.
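Two of these attributes have simple arithmetic behind them: duration (like the time attribute of break) uses CSS2-style time values such as "250ms" or "3s", and a numeric rate multiplies the voice's default rate. A small sketch of both; the function names are invented, and the 160 words-per-minute default is an assumed figure, since the real default rate is processor- and voice-specific.

```python
import re

# A non-negative number followed by "ms" or "s", as in "250ms", "3s" or "1.5s".
TIME_VALUE = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ms|s)")

def time_to_ms(value: str) -> float:
    """Convert an SSML time value to milliseconds."""
    match = TIME_VALUE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not an SSML time value: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number * 1000 if unit == "s" else number

def scaled_rate(default_rate: float, multiplier: float) -> float:
    """Apply a numeric rate value: 1 means no change, 2 twice as fast, 0.5 half as fast."""
    if multiplier <= 0:
        raise ValueError("rate multiplier must be positive")
    return default_rate * multiplier

ASSUMED_DEFAULT_WPM = 160  # hypothetical default speaking rate for some voice

print(time_to_ms("250ms"), time_to_ms("3s"))  # 250.0 3000.0
print(scaled_rate(ASSUMED_DEFAULT_WPM, 0.5))  # 80.0
```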
The prosody Element (cont’d)
• Pitch contour. The pitch contour is defined as a set of white space-separated targets at specified time positions in the speech output.
• The algorithm for interpolating between the targets is processor-specific.
• In each pair of the form (time position,target), the first value is a percentage of the period of the contained text (a number followed by "%") and the second value is the value of the pitch attribute (a number followed by "Hz", a relative change, or a label value).
<?xml version="1.0"?><speak> <prosody contour="(0%,+20Hz) (10%,+30%) (40%,+10Hz)"> good morning </prosody></speak>
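The contour syntax is easy to process mechanically. The sketch below parses a contour string into (position, target) pairs and, because the interpolation algorithm is processor-specific, assumes straight-line interpolation between targets. It also assumes a baseline pitch of 100 Hz for resolving relative targets and rejects label targets such as "high"; all three are illustrative choices, and the function names are invented.

```python
import re

# One "(position%,target)" pair, e.g. "(10%,+30%)".
PAIR = re.compile(r"\(\s*([\d.]+)%\s*,\s*([^)]+?)\s*\)")

CONTOUR = "(0%,+20Hz) (10%,+30%) (40%,+10Hz)"

def parse_contour(contour):
    """Split a contour string into (position_percent, target) pairs."""
    pairs = [(float(pos), target) for pos, target in PAIR.findall(contour)]
    if not pairs:
        raise ValueError("empty or malformed contour")
    return pairs

def target_hz(target, baseline_hz):
    """Resolve "+20Hz", "-10Hz", "+30%" or an absolute "180Hz" against a baseline pitch."""
    if target.endswith("%"):
        return baseline_hz * (1 + float(target[:-1]) / 100)
    if target.endswith("Hz"):
        number = float(target[:-2])
        return baseline_hz + number if target[0] in "+-" else number
    raise ValueError(f"label targets such as {target!r} are not handled here")

def pitch_at(contour, position, baseline_hz):
    """Pitch at a position (0-100 %), interpolating linearly between targets (an assumption)."""
    points = [(pos, target_hz(t, baseline_hz)) for pos, t in parse_contour(contour)]
    if position <= points[0][0]:
        return points[0][1]
    for (p0, f0), (p1, f1) in zip(points, points[1:]):
        if p0 <= position <= p1:
            if p1 == p0:
                return f1
            return f0 + (f1 - f0) * (position - p0) / (p1 - p0)
    return points[-1][1]  # after the last target, hold its pitch

for pos in (0, 5, 25, 90):
    print(pos, round(pitch_at(CONTOUR, pos, baseline_hz=100), 1))
```

With a 100 Hz baseline the three targets resolve to 120, 130 and 110 Hz, so halfway between the 10% and 40% targets the sketch returns 120 Hz.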
Today
• Project reminder
• Presentation of the results of the TTS evaluation
• Speech Synthesis Poetry Slam
• Wrapping up TTS (stages of TTS)
• Presentation of home assignment 3: ASR evaluation
• Automatic speech recognition (ASR)
• Natural language understanding (NLU)
• Speech Recognition Grammar Specification (SRGS)
• Semantic Interpretation for Speech Recognition (SISR)
• Thursday's lab session
Architecture 1
Wrapping up TTS
• Stages of TTS:
  – Structure analysis (sentence splitting)
  – Text normalisation
  – Text-to-phoneme conversion
  – Prosody analysis
  – Waveform production
• Speech Synthesis Markup Language
  – enables developers to override default behavior
TTS stages and SSML elements
Stage                                      SSML elements
Structure analysis (sentence splitting)    <p>, <s>, and the punctuation marks . ? !
Text normalisation                         <sub>, <say-as>
Text-to-phoneme conversion                 <phoneme>
Prosody analysis                           <prosody>, <emphasis>, <break>, and the punctuation marks . ? !
Waveform production                        <voice>, <audio>
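A toy version of the first stages makes the table concrete: sentence splitting on . ? !, a sub-style alias table for normalisation, and a coarse sentence-type decision for prosody. The alias table and function names are invented for the example; text-to-phoneme conversion and waveform production are left out because they need a pronunciation lexicon and a synthesizer.

```python
import re

# Hypothetical written-form -> spoken-form table, in the spirit of <sub alias="...">.
ALIASES = {"W3C": "World Wide Web Consortium", "TTS": "text to speech"}

def split_sentences(text):
    """Structure analysis: break after ., ? or ! when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]

def normalise(sentence):
    """Text normalisation: replace whole-word written forms by their spoken aliases."""
    for written, spoken in ALIASES.items():
        sentence = re.sub(rf"\b{re.escape(written)}\b", spoken, sentence)
    return sentence

def sentence_type(sentence):
    """Prosody analysis, very coarsely: final punctuation selects the intonation pattern."""
    return {"?": "interrogative", "!": "exclamatory"}.get(sentence[-1], "declarative")

def analyse(text):
    return [(normalise(s), sentence_type(s)) for s in split_sentences(text)]

for sentence, kind in analyse("The W3C publishes SSML. Is TTS hard? Not really!"):
    print(f"{kind:13} {sentence}")
```

Splitting before normalising mirrors the order of the stages in the table; a real system must also cope with abbreviations such as "Dr." that contain a period but do not end a sentence.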
Prosody analysis
• Prosody covers pitch (intonation or melody), timing (rhythm), pauses, speech rate, emphasis on words, and the relative timing of segments and pauses.
• Most TTS engines have a prosody analysis algorithm responsible for producing the prosody of synthesized speech, which is often based on the parts of speech. For example, nouns, verbs, and adjectives may be accented, whereas auxiliary verbs and prepositions may be de-accented.
• Synthesized speech pauses at commas and is inflected according to whether the sentence is declarative, interrogative, or exclamatory.
• Prosody rules and algorithms are not perfect and are a topic of ongoing research. Prosody rules for different languages and dialects may be quite different. For example, the prosody of American, British, Indian, and Jamaican English differs.
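The part-of-speech heuristic described above can be sketched in a few lines. The sketch assumes the words have already been tagged with Universal Dependencies-style labels (NOUN, VERB, ADJ, AUX, ADP, DET, ...) by some upstream tagger; the accented and de-accented categories come from the example in the text, and every other category is left undecided for the engine's defaults. Function names are invented for the illustration.

```python
from typing import List, Optional, Tuple

# Categories from the example above, expressed as Universal Dependencies POS tags.
ACCENTED = {"NOUN", "VERB", "ADJ"}
DEACCENTED = {"AUX", "ADP"}  # auxiliary verbs and prepositions

def assign_accents(tagged: List[Tuple[str, str]]) -> List[Tuple[str, Optional[bool]]]:
    """Map each (word, pos) pair to (word, True/False/None): accent, de-accent, or no decision."""
    marks = []
    for word, pos in tagged:
        if pos in ACCENTED:
            marks.append((word, True))
        elif pos in DEACCENTED:
            marks.append((word, False))
        else:
            marks.append((word, None))  # e.g. determiners: leave to the engine
    return marks

def show(marks):
    """Render accented words in capitals, purely for inspection."""
    return " ".join(w.upper() if accent else w for w, accent in marks)

sentence = [("the", "DET"), ("dog", "NOUN"), ("has", "AUX"), ("slept", "VERB"),
            ("on", "ADP"), ("the", "DET"), ("old", "ADJ"), ("mat", "NOUN")]
print(show(assign_accents(sentence)))  # the DOG has SLEPT on the OLD MAT
```

Rules this blunt are exactly why prosody remains imperfect: "has" would deserve an accent in a contrastive reply such as "But he HAS slept".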
Speech Recognition (ASR)
Architecture 1
ASR Input and Output
• A speech recognizer is a component with the following inputs and outputs:
• Input
– A grammar or multiple grammars as defined by the SRGS specification. These grammars inform the recognizer of the words and patterns of words to listen for.
– An audio stream that may contain speech content that matches the grammar(s).
– Parameters: timeouts, recognition thresholds, or N-best result counts.
• Output
– Descriptions of results that indicate details about the speech content detected by the speech recognizer. Recognizers will include at least a transcription of any detected words.
– Errors and other performance information, such as confidence scores.
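A dialog system typically consumes these outputs as an N-best list of hypotheses with confidence scores and compares the best score against a threshold passed in as a parameter. A small sketch of such a result object; the class names, the 0 to 1 confidence scale and the 0.5/0.8 thresholds are assumptions chosen for illustration, since each recognizer defines its own scores.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Hypothesis:
    transcript: str
    confidence: float              # assumed to lie in [0.0, 1.0]
    interpretation: object = None  # e.g. a semantic result built by SISR tags

@dataclass
class RecognitionResult:
    nbest: List[Hypothesis] = field(default_factory=list)

    def best(self, threshold: float = 0.5) -> Optional[Hypothesis]:
        """Highest-confidence hypothesis, or None if the list is empty or it falls below threshold."""
        if not self.nbest:
            return None
        top = max(self.nbest, key=lambda h: h.confidence)
        return top if top.confidence >= threshold else None

    def needs_confirmation(self, accept: float = 0.5, trust: float = 0.8) -> bool:
        """True when the best hypothesis is accepted but not confident enough to use unconfirmed."""
        top = self.best(accept)
        return top is not None and top.confidence < trust

result = RecognitionResult([Hypothesis("austin", 0.31), Hypothesis("boston", 0.62)])
print(result.best().transcript, result.needs_confirmation())  # boston True
```

The middle band between the two thresholds is where a dialog manager might ask "Did you say Boston?" instead of rejecting or silently accepting the input.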
SRGS
SRGS
<grammar root="s">
<rule id="s">
hello
</rule>
</grammar>
s -> "hello"
SRGS
<grammar root="s">
<rule id="s">
<one-of>
<item>hello</item>
<item>goodbye</item>
</one-of>
</rule>
</grammar>
s -> "hello"
s -> "goodbye"
s -> "hello"
| "goodbye"
SRGS
<grammar root="s">
<rule id="s">
hello
<item repeat="0-1">
how are you
</item>
</rule>
</grammar>
s -> "hello" ("how are you")
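As long as a grammar has no unbounded repetition, the set of sentences it accepts is finite and can simply be listed. A sketch that does this for a hypothetical Python encoding of three SRGS constructs: a sequence, one-of, and an optional item (repeat="0-1"). It is not an SRGS parser; the tuple encoding is invented for the illustration.

```python
from itertools import product

# Hypothetical encoding of a few SRGS constructs as plain Python values:
#   "some words"            a token sequence
#   ("seq", a, b, ...)      a followed by b ...
#   ("one-of", a, b, ...)   exactly one of the alternatives
#   ("optional", a)         <item repeat="0-1">a</item>
def sentences(node):
    """List every word string the (finite) grammar fragment accepts."""
    if isinstance(node, str):
        return [node]
    kind, *parts = node
    if kind == "one-of":
        return [s for part in parts for s in sentences(part)]
    if kind == "optional":
        return [""] + sentences(parts[0])
    if kind == "seq":
        combos = product(*(sentences(part) for part in parts))
        return [" ".join(w for w in combo if w) for combo in combos]
    raise ValueError(f"unknown construct: {kind!r}")

print(sentences(("seq", "hello", ("optional", "how are you"))))
# ['hello', 'hello how are you']
```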
SRGS
<grammar root="s">
<rule id="s">
<item repeat="1-">
hello
</item>
</rule>
</grammar>
s -> "hello"
s -> "hello" s
s -> "hello"+
NOTE: Because "hello" can now be repeated any number of times, the grammar accepts infinitely many sentences, so listing them all is no longer possible.
SRGS
<grammar root="s">
<rule id="s">
<item repeat="1-">
<one-of>
<item>hello</item>
<item>goodbye</item>
</one-of>
</item>
</rule>
</grammar>
SRGS
<grammar root="s">
<rule id="s">
<item repeat="1-">
<ruleref uri="#greeting"/>
</item>
</rule>
<rule id="greeting">
<one-of>
<item>hello</item>
<item>goodbye</item>
</one-of>
</rule>
</grammar>
s -> greeting+
greeting -> "hello"
| "goodbye"
SRGS
<grammar root="city_state">
<rule id="city">
<one-of>
<item>Boston</item>
<item>Philadelphia</item>
<item>Fargo</item>
</one-of>
</rule>
<rule id="state">
<one-of>
<item>Florida</item>
<item>North Dakota</item>
<item>New York</item>
</one-of>
</rule>
<rule id="city_state">
<ruleref uri="#city"/>
<ruleref uri="#state"/>
</rule>
</grammar>
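Once repetition is unbounded, a grammar has to be used the other way round: instead of listing its sentences, we check whether a given word sequence is accepted. The sketch below does that for a subset of SRGS XML (plain text tokens, token, one-of, item with repeat, ruleref to a local rule, and tag, which consumes no words) by tracking the set of word positions reachable after each element. It ignores weights, special rules and external references, and would not terminate on left-recursive rules; it is an illustration, not a conforming grammar processor, and its function names are invented.

```python
import xml.etree.ElementTree as ET

def local(tag):
    return tag.split("}")[-1]  # tolerate grammars with or without the SRGS namespace

def tokens(text):
    return (text or "").lower().split()

def parse_repeat(spec):
    """'0-1' -> (0, 1), '1-' -> (1, None), '3' -> (3, 3); no attribute -> (1, 1)."""
    if spec is None:
        return 1, 1
    lo, sep, hi = spec.partition("-")
    if not sep:
        return int(lo), int(lo)
    return int(lo), (int(hi) if hi else None)

def match_words(words, i, expected):
    """{i + n} if the n expected words occur at position i, else no positions."""
    end = i + len(expected)
    return {end} if words[i:end] == expected else set()

def match_seq(elem, words, starts, rules):
    """Match elem's mixed content (text, then each child followed by its tail) from each start."""
    ends = {e for s in starts for e in match_words(words, s, tokens(elem.text))}
    for child in elem:
        ends = {e for s in ends for e in match(child, words, s, rules)}
        ends = {e for s in ends for e in match_words(words, s, tokens(child.tail))}
    return ends

def match(elem, words, i, rules):
    """Set of positions where elem can finish if it starts at position i."""
    kind = local(elem.tag)
    if kind == "one-of":
        return {e for item in elem for e in match(item, words, i, rules)}
    if kind == "ruleref":
        return match_seq(rules[elem.get("uri").lstrip("#")], words, {i}, rules)
    if kind == "tag":
        return {i}  # semantic tags consume no words
    if kind == "token":
        return match_words(words, i, tokens(elem.text))
    if kind == "item":
        lo, hi = parse_repeat(elem.get("repeat"))
        ends = {i} if lo == 0 else set()
        frontier, count = {i}, 0
        while frontier and (hi is None or count < hi):
            frontier = match_seq(elem, words, frontier, rules)
            count += 1
            if count >= lo:
                new = frontier - ends
                ends |= frontier
                if hi is None and not new:
                    break  # unbounded repeat reached a fixed point
        return ends
    return match_seq(elem, words, {i}, rules)

def accepts(grammar_xml, utterance):
    root = ET.fromstring(grammar_xml)
    rules = {r.get("id"): r for r in root.iter() if local(r.tag) == "rule"}
    words = tokens(utterance)
    return len(words) in match_seq(rules[root.get("root")], words, {0}, rules)

CITY_STATE = """<grammar root="city_state">
  <rule id="city"><one-of><item>Boston</item><item>Philadelphia</item><item>Fargo</item></one-of></rule>
  <rule id="state"><one-of><item>Florida</item><item>North Dakota</item><item>New York</item></one-of></rule>
  <rule id="city_state"><ruleref uri="#city"/> <ruleref uri="#state"/></rule>
</grammar>"""

print(accepts(CITY_STATE, "Fargo North Dakota"))  # True
print(accepts(CITY_STATE, "Fargo"))               # False
```

Note that the grammar happily accepts "Boston Florida": SRGS constrains which word sequences are possible, not whether they make sense.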
SRGS + SISR
<grammar root="s">
<rule id="s">
hello
</rule>
</grammar>
<grammar root="s">
<rule id="s">
<item>
hello
<tag>hi</tag>
</item>
</rule>
</grammar>
SRGS + SISR
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" root="answer" tag-format="semantics/1.0-literals">
<rule id="answer"> <one-of> <item><ruleref uri="#yes"/></item> <item><ruleref uri="#no"/></item> </one-of> </rule>
<rule id="yes"> <one-of> <item>yes</item> <item>yeah<tag>yes</tag></item> <item><token>you bet</token><tag>yes</tag></item> <item xml:lang="fr-CA">oui<tag>yes</tag></item> </one-of> </rule>
<rule id="no"> <one-of> <item>no</item> <item>nope</item> <item>no way</item> </one-of> <tag>no</tag> </rule>
</grammar>
SISR
• I would like a coca cola and three large pizzas with pepperoni and mushrooms
{ drink: { liquid:"coke", drinksize:"medium"}, pizza: { number: 3, pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] }}
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-US" root="order" tag-format="semantics/1.0">
<rule id="order" scope="public"> I would like a <ruleref uri="#drink"/> <tag>out.drink={}; out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize;</tag> and <ruleref uri="#pizza"/> <tag>out.pizza=rules.pizza;</tag> </rule>
<rule id="kindofdrink"> <one-of> <item>coke</item> <item>pepsi</item> <item>coca cola<tag>out="coke";</tag></item> </one-of> </rule>
<rule id="foodsize"> <tag>out="medium";</tag> <item repeat="0-1"> <one-of> <item>small<tag>out="small";</tag></item> <item>medium</item> <item>large<tag>out="large";</tag></item> <item>regular<tag>out="medium";</tag></item> </one-of> </item> </rule>
<rule id="tops"> <tag>out=[];</tag> <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> <item repeat="1-"> and <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> </item> </rule>
<rule id="top"> <one-of> <item>anchovies</item> <item>pepperoni</item> <item>mushroom<tag>out="mushrooms";</tag></item> <item>mushrooms</item> </one-of> </rule>
<rule id="drink"> <ruleref uri="#foodsize"/> <ruleref uri="#kindofdrink"/> <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag> </rule>
<rule id="pizza"> <ruleref uri="#number"/> <ruleref uri="#foodsize"/> <tag>out.pizzasize=rules.foodsize; out.number=rules.number;</tag> pizzas with <ruleref uri="#tops"/> <tag>out.topping=rules.tops;</tag> </rule>
<rule id="number"> <one-of> <item> <tag>out=1;</tag> <one-of> <item>a</item> <item>one</item> </one-of> </item> <item>two<tag>out=2;</tag></item> <item>three<tag>out=3;</tag></item> </one-of> </rule>
</grammar>
I would like a coca cola and three large pizzas with pepperoni and mushrooms
{ drink: { liquid:"coke", drinksize:"medium" }, pizza: { number: 3, pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] }}