VoiceXML: Speech Recognition Grammars
Posted on 26-Dec-2015
Acknowledgements
Prof. McTear, Natural Language Processing, http://www.infj.ulst.ac.uk/nlp/index.html, University of Ulster.
BeVocal documentation
Overview
Types of grammar
Grammar design and use
Optional items in a grammar
Semantic tags
DTMF grammars
Grammar rules
Built-in grammars
Grammar scope
What is a grammar?
A grammar defines the words and patterns of words that a user can say at any particular point in a dialogue.
Uses:
– speech recognition: to constrain the speech recognition process by specifying permissible sequences of words
– language understanding: to determine the structure and/or meaning of a sequence of words, e.g. "Transfer one hundred dollars from my checking to my savings account" might be parsed and transformed into the structure:
<transfer>
  <command> transfer </command>
  <destination> savings </destination>
  <source> checking </source>
  <amount> 100 </amount>
</transfer>
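To show the idea of turning a word sequence into that structure, here is a minimal Python sketch. It is a toy, assuming a single fixed sentence pattern (a regular expression standing in for a real grammar) and a hypothetical NUMBER_WORDS lookup that covers only a few amounts; it is not how a VoiceXML platform performs understanding.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy pattern: one fixed sentence shape standing in for a real grammar.
TRANSFER = re.compile(
    r"transfer (?P<amount>.+?) dollars from my (?P<source>\w+) to my (?P<destination>\w+) account",
    re.IGNORECASE,
)

# Hypothetical lookup covering only the number words needed for this illustration.
NUMBER_WORDS = {"one hundred": "100", "fifty": "50", "ten": "10"}

def parse_transfer(utterance):
    """Return a <transfer> element built from the utterance, or None if it does not match."""
    m = TRANSFER.fullmatch(utterance.strip())
    if m is None:
        return None
    root = ET.Element("transfer")
    ET.SubElement(root, "command").text = "transfer"
    ET.SubElement(root, "destination").text = m.group("destination").lower()
    ET.SubElement(root, "source").text = m.group("source").lower()
    amount = m.group("amount").lower()
    ET.SubElement(root, "amount").text = NUMBER_WORDS.get(amount, amount)
    return root

tree = parse_transfer("Transfer one hundred dollars from my checking to my savings account")
print(ET.tostring(tree, encoding="unicode"))
```

Anything outside the single pattern (e.g. "what is my balance") returns None, which is the toy equivalent of a nomatch.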
Types of grammar
Finite-state and phrase structure grammars take the form of rules with a left-hand side and a right-hand side, e.g.
noun_phrase -> determiner adjective noun
flight -> <destination> <date> <time>
These are used in both language understanding and speech recognition.
N-gram grammars (used in speech recognition) are based on the probabilities of word combinations, e.g. bigrams, trigrams.
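To make the n-gram idea concrete, here is a minimal Python sketch that estimates bigram probabilities by counting adjacent word pairs in a tiny, made-up corpus (the corpus and function names are hypothetical). It uses plain maximum-likelihood estimates with no smoothing, so any unseen word pair gets probability zero; real recognisers train on far larger corpora and smooth their estimates.

```python
from collections import Counter

# Hypothetical three-utterance training corpus, for illustration only.
corpus = [
    "transfer money to savings",
    "transfer money to checking",
    "check my balance",
]

def bigram_model(sentences):
    """Maximum-likelihood estimates of P(w2 | w1), with <s> and </s> marking utterance boundaries."""
    history_counts, pair_counts = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        history_counts.update(words[:-1])          # every word that is followed by another
        pair_counts.update(zip(words, words[1:]))  # adjacent word pairs
    return {(w1, w2): c / history_counts[w1] for (w1, w2), c in pair_counts.items()}

def sentence_probability(model, sentence):
    """Product of bigram probabilities; unseen pairs get 0 because this sketch does no smoothing."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for pair in zip(words, words[1:]):
        p *= model.get(pair, 0.0)
    return p

model = bigram_model(corpus)
print(model[("<s>", "transfer")])                                # 2 of the 3 utterances start with "transfer"
print(sentence_probability(model, "transfer money to savings"))  # (2/3) * 1 * 1 * (1/2) * 1
```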
Grammar in VoiceXML
Grammars may be specified:
– inline, i.e. embedded in a VoiceXML page
– externally, i.e. stored as files on Web servers, etc.
Grammar formats: XML, ABNF (Augmented BNF syntax), Java Speech Grammar Format (JSGF), GSL (Nuance's Grammar Specification Language)
The W3C specification covers the XML and ABNF forms.
The IBM Voice Toolkit supports the XML and ABNF grammar formats.
BeVocal Café, Voxpilot and Tellme support the XML and GSL grammar formats.
For further details on the W3C Speech Recognition Grammar Specification, see http://www.w3.org/TR/speech-grammar/
Inline and External Grammar Definitions
An inline grammar is defined within the <grammar> element in a VoiceXML document.
In an inline grammar, if the grammar consists of exactly one rule, that rule does not have to have a name.
Because GSL grammars use characters that are special in XML, wrap an inline GSL grammar in a CDATA section:
<grammar ...usage attributes...>
<![CDATA[
...grammar header...
...grammar rule definitions...
]]>
</grammar>
An external grammar is defined in an external file and referenced in the VoiceXML document.
In an external grammar document, all rules must be named.
In an external GSL grammar file, the contents should not be inside a CDATA section and should not contain a <grammar> element:
;GSL2.0
...grammar rule definitions...
<option> element
Specifies a set of possible responses for a field. If the number of possible responses is small, a set of <option> elements can be used instead of a <grammar> element:
<form>
  <field name="choice">
    <prompt> Say students, courses, or reports </prompt>
    <option>students</option>
    <option>courses</option>
    <option>reports</option>
  </field>
</form>
<option> can also be used for alternative DTMF input, e.g.
<option dtmf="1" value="balance"> balance </option>
Grammar Design
A grammar should cover all the ways that a user might say something:
1. Alternative choices within a category, e.g. studentname [john rosemary etc]
2. Alternative words for the same concept, e.g. [comms communications]
3. Alternative sentences that have the same meaning, e.g. [(student john scott taking databases) (databases john scott) (john scott taking the course databases)]
Note: careful wording of prompts can constrain the user to saying what the grammar designer has predicted.
These examples use the GSL grammar format, which is more compact than XML for presenting examples.
Grammars for words
Simple words (or touch-tone strings): tokens
GSL
<grammar type=…> (student name) </grammar>
XML
<grammar>
<token>student name</token>
</grammar>
Alternative words
GSL
Choice [
  students
  courses
  reports
]
XML
<rule id="choice">
  <one-of>
    <item> students </item>
    <item> courses </item>
    <item> reports </item>
  </one-of>
</rule>
Making items optional
GSL
Name (?firstname lastname)
XML
<rule id="name">
  <item repeat="0-1"> firstname </item>
  <item> lastname </item>
</rule>
Making items optional (continued)
( [ news weather sports ] ?please )
( ?[ (i'd like) (tell me) ] ?the [ news weather sports ] ?please )
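Each optional item doubles the number of accepted sentences, and each set of alternatives multiplies it. The Python sketch below expands the second GSL expression by hand into a list of "slots" (this slot representation is an assumption made for illustration, not GSL syntax) and enumerates every word sequence it accepts: 3 × 2 × 3 × 2 = 36 sentences.

```python
from itertools import product

# ( ?[ (i'd like) (tell me) ] ?the [ news weather sports ] ?please )
# written as a sequence of slots; "" stands for "optional item omitted".
slots = [
    ["", "i'd like", "tell me"],    # ?[ (i'd like) (tell me) ]
    ["", "the"],                    # ?the
    ["news", "weather", "sports"],  # [ news weather sports ]  (required)
    ["", "please"],                 # ?please
]

def expand(slots):
    """Yield every word sequence the grammar accepts, skipping omitted optional items."""
    for choice in product(*slots):
        yield " ".join(w for w in choice if w)

sentences = list(expand(slots))
print(len(sentences))   # 3 * 2 * 3 * 2 = 36
print(sentences[:3])    # ['news', 'news please', 'weather']
```

The same approach applied to the first expression, ( [ news weather sports ] ?please ), gives 3 × 2 = 6 sentences.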
Repeating items
XML:
– repeat="0-1" means the item is optional, i.e. it occurs zero times or once
– repeat="n-" means the item is repeated n or more times, e.g. "0-" = zero or more times
– repeat="m-n" means the item is repeated between m and n times (inclusive), e.g. "1-3" = between one and three times
– repeat="n" means the item is repeated exactly n times
GSL:
– +(item): the item is repeated 1 or more times
– *(item): the item is repeated 0 or more times
– ?(item): the item is optional
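The four forms of the XML repeat attribute can be read as a pair of (minimum, maximum) bounds. This is a small Python sketch of that reading; the function names and the GSL_EQUIVALENTS table are hypothetical helpers, and a real SRGS processor also accepts a repeat-prob attribute, which is ignored here.

```python
import math

def parse_repeat(repeat):
    """Translate an SRGS repeat attribute into (min, max) bounds; max is math.inf for 'n-'."""
    repeat = repeat.strip()
    if repeat.endswith("-"):        # "n-"  : n or more times
        return int(repeat[:-1]), math.inf
    if "-" in repeat:               # "m-n" : between m and n times, inclusive
        low, high = repeat.split("-")
        return int(low), int(high)
    n = int(repeat)                 # "n"   : exactly n times
    return n, n

def repeat_ok(repeat, count):
    """True if an item occurring `count` times satisfies the repeat attribute."""
    low, high = parse_repeat(repeat)
    return low <= count <= high

# The GSL operators expressed as XML repeat values.
GSL_EQUIVALENTS = {"?": "0-1", "*": "0-", "+": "1-"}

for gsl_op, xml_repeat in GSL_EQUIVALENTS.items():
    print(gsl_op, xml_repeat, parse_repeat(xml_repeat))
```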
Grammar Slots (Tags)
Grammar slots are used in grammars to return a value representing the meaning of the word(s) recognised, e.g. ‘checking account’ and ‘checking’ should return the same value.
GSL:
<field name="MainMenu">
…
<![CDATA[
( ?[ (i'd like) (tell me) ] ?the
  [ (news ?reports) { <selection news> }
    (weather ?[info information]) { <selection weather> }
    (sports ?[updates news]) { <selection sports> }
  ] ?please )
]]>
…
<filled>
  <assign name="selected" expr="MainMenu.selection"/>
…
Grammars often consist of sub-grammars, e.g.
;GSL 2.0;
ColoredObject:public (Color Object)
Color [
  [red pink] { <color red> }
  [yellow canary] { <color yellow> }
  [green khaki] { <color green> }
]
Object [
  [truck car] { <object vehicle> }
  [ball block] { <object toy> }
  [shirt blouse] { <object clothing> }
]
Both "yellow shirt" and "canary blouse" => { color: yellow; object: clothing; }
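The slot-filling behaviour of ColoredObject can be mimicked in a few lines of Python. This is a sketch only: the word tables are copied from the GSL rules above, but the matching logic (exactly two words, colour first) is a simplification of what a recogniser does with the compiled grammar, and the function name is hypothetical.

```python
# Word -> slot value tables copied from the GSL Color and Object sub-grammars.
COLOR = {"red": "red", "pink": "red", "yellow": "yellow", "canary": "yellow",
         "green": "green", "khaki": "green"}
OBJECT = {"truck": "vehicle", "car": "vehicle", "ball": "toy", "block": "toy",
          "shirt": "clothing", "blouse": "clothing"}

def colored_object(utterance):
    """Mimic ColoredObject:public (Color Object): one colour word followed by one object word."""
    words = utterance.lower().split()
    if len(words) != 2 or words[0] not in COLOR or words[1] not in OBJECT:
        return None  # not covered by the grammar: a platform would raise nomatch
    return {"color": COLOR[words[0]], "object": OBJECT[words[1]]}

print(colored_object("yellow shirt"))   # {'color': 'yellow', 'object': 'clothing'}
print(colored_object("canary blouse"))  # {'color': 'yellow', 'object': 'clothing'}
```

Both utterances fill the slots identically, which is exactly why the application only needs to look at the slot values rather than the words spoken.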
Grammar rules: sentences (diagram: the ColoredObject rule expands into the sub-rules Color and Object)
Grammar with sub-rules
Sub-grammars and rules are referenced in XML form using a rule reference. A rule reference can point to a rule in the local grammar, or to an external grammar rule contained in another file or even on another server on the Internet.
Designing a grammar that consists of sub-grammars requires considerable planning to ensure that all possible utterances are covered, and to avoid redundancy and repetition in the grammar.
It is often useful to map out the grammar diagrammatically or using a simple format such as GSL or ABNF before attempting to code the rules in XML format.
Rule Scope - GSL
Each defined rule has a scope of either private or public.
A rule with public scope:
– is visible outside its grammar and can be referenced by name from other grammars
– can be activated for recognition (can serve as a top-level rule)
A rule with private scope:
– is visible only within its containing grammar
– may be referenced only by other rules within the same grammar
To mark a rule as public, the format is: RuleName:public ruleExpansion
If no rules in the grammar are explicitly marked with :public, then all rules in the grammar are public. If any rule in the grammar is marked with :public, then all public rules must be so marked. The root rule in a GSL grammar is always the first public rule.
For example, the following set of definitions creates one public rule named Snapper and two private rules named SnapperType and FishColors:
SnapperType [mutton FishColors]
FishColors [black gray red]
Snapper:public (SnapperType snapper)
Rule scope - XML
By default, VoiceXML 2.0 grammar rules are private, which means that they can only be referenced within the same grammar file.
To allow a grammar rule to be referenced from an external source, such as a VoiceXML document or another grammar, the rule must be declared public using the scope attribute:
<!-- public: can be referenced from outside the grammar -->
<rule id="choice" scope="public">
  <!-- references a rule in the same grammar -->
  <ruleref uri="#studentname"/>
</rule>
<!-- not public: can only be referenced by a rule in the same grammar -->
<rule id="studentname">
  <one-of>
    <item> john </item>
    <item> rosemary </item>
  </one-of>
</rule>
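How a local rule reference is resolved can be sketched with Python's xml.etree.ElementTree. This is an illustration under stated assumptions: the rules are those from the slide, wrapped in a namespace-free <grammar> element added so the text parses; only local "#id" references, <one-of> and plain item text are handled; and the phrases helper is hypothetical, not part of any VoiceXML toolkit.

```python
import xml.etree.ElementTree as ET

# The slide's two rules, wrapped in a minimal <grammar> element (an assumption for this sketch).
SRGS = """
<grammar version="1.0" root="choice">
  <rule id="choice" scope="public"><ruleref uri="#studentname"/></rule>
  <rule id="studentname">
    <one-of><item> john </item><item> rosemary </item></one-of>
  </rule>
</grammar>
"""

def phrases(element, rules):
    """Return every phrase an element can match, following local #rule references.
    Text after child elements (tails) is ignored; this grammar does not use it."""
    if element.tag == "ruleref":
        return phrases(rules[element.get("uri").lstrip("#")], rules)
    if element.tag == "one-of":
        return [p for item in element for p in phrases(item, rules)]
    # <rule> or <item>: its own text followed by each child in sequence
    results = [(element.text or "").strip()]
    for child in element:
        results = [f"{a} {b}".strip() for a in results for b in phrases(child, rules)]
    return results

root = ET.fromstring(SRGS.strip())
rules = {r.get("id"): r for r in root.findall("rule")}
public = [rid for rid, r in rules.items() if r.get("scope") == "public"]
print(public)                           # ['choice']
print(phrases(rules["choice"], rules))  # ['john', 'rosemary']
```

Only "choice" is listed as public, so it is the only rule an outside document could activate; "studentname" is still reachable through the reference inside the same grammar.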
Grammar Headers - GSL
Inline: <grammar type="application/x-nuance-gsl">
External: ;GSL2.0
...grammar rule definitions...
There is no definition of a top-level rule in the header.
Referencing an external grammar, or a top-level rule in a grammar:
<grammar src="foo.gsl"/>
<grammar src="foo.gsl#Month"/>
Grammar Headers - XML
Inline
<grammar type="application/srgs+xml" root="source" version="1.0">
  <!-- grammar rule(s) -->
</grammar>
External
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0" mode="voice" root="transfer">
  <!-- grammar rule(s) -->
</grammar>
Note: the root rule for the grammar must be defined.
Grammar Scope
Grammar elements can be included within any VoiceXML element that receives user input:
– field
– link: for transitions to other documents, e.g. operator.vxml
– menu: grammar implicitly specified by the <choice> element
– form: for mixed-initiative dialogues
By default, the scope of a grammar is limited to the element in which it is defined.
Scope can be set using the scope attribute, e.g. grammars defined within forms or menus can be given document scope.
Grammars defined in the application root document can be scoped to the entire application.
Using Grammars Effectively
A grammar should effectively cover the range of responses that can be encountered in reply to a prompt:
– this can include the essential input as well as extraneous words and phrases
– a grammar that is too large will hinder speech processing and can lead to more misrecognitions
– scope is important: grammars should not overlap, and excessive use of global grammars (defined in the root document) increases the possibility of overlap
Tutorial Exercise 1. Using tags
Integrate the following rule and its grammar into an application that takes in the name of a student and the name of a course and outputs the student's name along with a course code.
<rule id="rule2" scope="public">
  <one-of>
    <item> <one-of> <item> comms </item> <item> communications </item> </one-of> <tag>$="01"</tag> </item>
    <item> algorithms <tag>$="02"</tag></item>
    <item> programming <tag>$="03"</tag></item>
    <item> databases <tag>$="04"</tag></item>
  </one-of>
</rule>
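To see what the tags in rule2 return, here is a Python sketch that looks up the value assigned by the matching item's tag. It is a simulation under assumptions: rule2 is wrapped in a minimal <grammar> element so it parses, only tags of the literal form $="..." are understood (no general ECMAScript evaluation), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# rule2 from the exercise, wrapped in a minimal <grammar> element (an assumption) so it parses.
GRAMMAR = """
<grammar version="1.0" root="rule2" tag-format="semantics/1.0">
<rule id="rule2" scope="public"><one-of>
  <item> <one-of> <item> comms </item> <item> communications </item> </one-of> <tag>$="01"</tag> </item>
  <item> algorithms <tag>$="02"</tag></item>
  <item> programming <tag>$="03"</tag></item>
  <item> databases <tag>$="04"</tag></item>
</one-of></rule>
</grammar>
"""

def words_of(item):
    """All words an <item> can match (only the nesting used by rule2 is handled)."""
    inner = item.find("one-of")
    if inner is not None:
        return [w for sub in inner.findall("item") for w in words_of(sub)]
    return [(item.text or "").strip()]

def course_code(utterance, grammar=GRAMMAR):
    """Return the string a $="..." tag assigns for the matching item, or None (nomatch)."""
    rule = ET.fromstring(grammar.strip()).find("rule")
    for item in rule.find("one-of").findall("item"):
        if utterance.strip().lower() in words_of(item):
            tag_text = item.find("tag").text          # e.g. $="01"
            return tag_text.split("=", 1)[1].strip().strip('"')
    return None

for course in ["comms", "communications", "databases", "history"]:
    print(course, course_code(course))
```

Both "comms" and "communications" yield "01", which is the point of placing one tag after the nested one-of.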
DTMF
DTMF (touch-tone) can be used as an alternative to speech input, particularly when speech recognition is unreliable or problematic.
In VoiceXML 2.0, dtmf is included as a value of the mode attribute of the <grammar> element:
<grammar mode="dtmf" type="application/srgs+xml" version="1.0" root="digit">
  <rule id="digit" scope="public">
    <one-of>
      <item> 1 <tag>$="students"</tag> </item>
      <item> 2 <tag>$="courses"</tag> </item>
      <item> 3 <tag>$="reports"</tag> </item>
    </one-of>
  </rule>
</grammar>
DTMF and / or speech in GSL
;GSL 2.0;
Rating (
  ?[(i feel ?like) (it is ?a) (its ?a)]
  [
    [one dtmf-1] { <numRating 1> }
    [two dtmf-2] { <numRating 2> }
    [three dtmf-3] { <numRating 3> }
    …
  ]
)
DTMF after counts
Event counts (the count attribute on <nomatch>) can be used, e.g. to give the user an opportunity to choose using speech, then advise use of the keypad if speech is unsuccessful:
<nomatch count="1">
<reprompt/>
</nomatch>
<nomatch count="2">
please use your keypad
</nomatch>
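Which <nomatch> handler runs is decided by count: VoiceXML selects the handler with the highest count attribute that is less than or equal to the current number of times the event has occurred. The Python sketch below applies that selection rule to the two handlers above; the function and dictionary names are hypothetical, and the interpreter's own event counter is simulated by the loop.

```python
def select_handler(handler_counts, current_count):
    """Pick the handler count VoiceXML would use: the largest count <= the current event count."""
    eligible = [c for c in handler_counts if c <= current_count]
    return max(eligible) if eligible else None

# The two <nomatch> handlers from the slide, keyed by their count attribute.
HANDLERS = {1: "<reprompt/>", 2: "please use your keypad"}

for attempt in range(1, 5):
    chosen = select_handler(HANDLERS, attempt)
    print(attempt, HANDLERS[chosen])
```

From the second nomatch onwards the keypad message is repeated, because count="2" remains the highest eligible handler.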
Tutorial Exercise 2: DTMF and speech
Create a file with choices (student details | course details | reports) that allows speech as well as DTMF input.
Include a nomatch (or noinput) event that asks the user to use the keypad on the second time that speech input is unsuccessful.
The system should confirm with words rather than DTMF.
<grammar mode="dtmf" type="application/srgs+xml" version= "1.0" root="digit">
<rule id = "digit" scope = "public">
<one-of>
<item> 1 <tag>$= "student details" </tag> </item>
<grammar type="application/srgs+xml" root="choice" version="1.0">
<rule id = "choice" scope = "public">
<one-of>
<item> student details <tag>$= "student details" </tag> </item>
Built-In Grammars
Built-in grammars are provided in VoiceXML:
– boolean (true or false; in DTMF, 1 is true and 2 is false)
– date
– digits (e.g. "three four seven")
– currency
– number (e.g. "three hundred and forty seven")
– phone
– time
A built-in grammar is specified within the <field> element, e.g. <field name="age" type="number">
Built-In Grammar: Digits
Digit recognition is performed in VoiceXML by using a built-in grammar for digits that is declared as a field type. For example:
<field name="pin" type="digits">
The user can say one or more digits between 0 and 9 and the result will be a string of digits.
If the field value is used in a prompt, it will be spoken as a sequence of digits, e.g. "one five six four".
You can also parameterise the digits built-in grammar as follows:
– digits?minlength=n: a string of at least n digits
– digits?maxlength=n: a string of at most n digits
– digits?length=n: a string of exactly n digits
e.g.
<field type="digits?minlength=3;maxlength=5">
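The parameters can be read as simple length checks on the recognised digit string. This Python sketch splits a type such as digits?minlength=3;maxlength=5 into its parameters and checks a candidate string against them. It is an illustration only: the function names are hypothetical, and letting length take precedence over minlength/maxlength is this sketch's own choice, not a statement about any platform.

```python
def parse_digits_type(field_type):
    """Split e.g. 'digits?minlength=3;maxlength=5' into {'minlength': 3, 'maxlength': 5}."""
    name, _, query = field_type.partition("?")
    if name.strip() != "digits":
        raise ValueError(f"not a digits type: {field_type}")
    params = {}
    for part in filter(None, query.split(";")):
        key, value = part.split("=")
        params[key.strip()] = int(value)
    return params

def digits_ok(value, params):
    """Check a recognised digit string against length, minlength and maxlength."""
    if not value or any(c not in "0123456789" for c in value):
        return False
    n = len(value)
    if "length" in params:          # exact length takes precedence in this sketch
        return n == params["length"]
    return params.get("minlength", 0) <= n <= params.get("maxlength", n)

p = parse_digits_type("digits?minlength=3;maxlength=5")
print(p, digits_ok("1234", p), digits_ok("12", p))
```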
Digits grammar example
<form>
  <field name="pin" type="digits?length=4">
    <prompt>what is your pin?</prompt>
  </field>
  <block>
    <prompt>Confirming your pin is
      <say-as interpret-as="vxml:digits"><value expr="pin"/></say-as>
    </prompt>
  </block>
</form>
Built-in grammar: boolean
The boolean grammar contains ways of saying ‘yes’ or ‘no’. The particular words within the boolean grammar depend on the ‘locale’, i.e. the language type, e.g. US English, UK English, etc.
The words may also vary from one platform to another.
IBM Voice Toolkit UK English: yes, true, positive, right, ok, sure, affirmative, check, yep, correct, no, false, negative, wrong, not, nope, incorrect
The return value is a boolean true or false. If the field name is subsequently used in a value element within a prompt, the TTS engine will speak either yes or no. Users can also provide DTMF input: 1 is yes, and 2 is no.
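The behaviour described above can be summarised as a lookup from words or keys to a boolean. This Python sketch uses the IBM Voice Toolkit UK English word list from the slide; other locales and platforms use different word lists, and the function names are hypothetical.

```python
# IBM Voice Toolkit UK English words from the slide; other platforms and locales differ.
YES_WORDS = {"yes", "true", "positive", "right", "ok", "sure", "affirmative",
             "check", "yep", "correct"}
NO_WORDS = {"no", "false", "negative", "wrong", "not", "nope", "incorrect"}
DTMF = {"1": True, "2": False}

def boolean_value(token):
    """Return True/False as the built-in boolean grammar would, or None for no match."""
    token = token.strip().lower()
    if token in DTMF:
        return DTMF[token]
    if token in YES_WORDS:
        return True
    if token in NO_WORDS:
        return False
    return None

def spoken(value):
    """How the field value is read back in a prompt: 'yes' or 'no'."""
    return "yes" if value else "no"

for heard in ["yep", "nope", "1", "2", "maybe"]:
    print(heard, boolean_value(heard))
```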
Boolean grammar example
<form scope="dialog">
  <field name="pin" type="digits?length=4" modal="false">
    <prompt version="1.0">what is your pin?</prompt>
  </field>
  <field name="confirm" type="boolean" modal="false">
    <prompt version="1.0">Please confirm your pin is
      <say-as interpret-as="vxml:digits"><value expr="pin"/></say-as>
    </prompt>
  </field>
</form>
Built-in field type: sample input
currency: three twenty five; sixteen dollars and fifty seven cents; ten dollars; nine million two hundred thousand dollars
date: may fifth; march; the thirty first of december two thousand; yesterday; today; tomorrow
phone: seven three five eight four nine zero; two one two four nine six two seven oh six
Sample input for built-in field types
Built-in field type: sample input
number: ten million five hundred thousand and fifty three; minus one point five; plus one point five; point seven
digits: zero, oh, one, two, three, four, five, six, seven, eight, nine
time: one o’clock; five past one; three fifteen; seven thirty; half past eight; oh four hundred hours; sixteen fifty; twelve noon; midnight
Sample input for UK English built-in field types (continued)
Tutorial Exercise 3. Built-in grammars
Aim: to include built-in grammars
Create an application in which the user has to speak their account number, which consists of 6 digits (use the built-in digits grammar). Extend the application with other built-in grammars, such as date. Experiment with the use of the DTMF simulator to enter the values for account number, date, etc.