Speech recognition grammars as TRINDIKIT resources David Hjelm 2003-12-12

Page 1: Speech recognition grammars as TRINDIKIT resources

Speech recognition grammars as TRINDIKIT resources

David Hjelm

2003-12-12

Page 2: Speech recognition grammars as TRINDIKIT resources

TRINDIKIT

• Framework for building dialogue systems
• Written in SICStus Prolog
• Contains predefined modules for input, output, interpretation, etc.
• The Total Information State (TIS) holds information accessible by modules
• As long as different modules behave similarly with respect to the TIS, they are interchangeable

Page 3: Speech recognition grammars as TRINDIKIT resources

Nuance

• Speech recognition, voice authentication and text-to-speech engines
• APIs for creating speech recognition and text-to-speech clients in Java, C++ and C
• Clients can read and write audio in several ways:
  – native sound card
  – telephony card
  – IP telephony
  – from audio files

Page 4: Speech recognition grammars as TRINDIKIT resources

Speech recognition basics

[Figure: recognition pipeline. Speech → feature extraction → acoustic features → Viterbi search, using the acoustic model (N-gram) → phoneme or word lattice → Viterbi search or parsing, using the language model (N-gram or PCFG) → word lattice or n-best list of sentences.]

Page 5: Speech recognition grammars as TRINDIKIT resources

Nuance SR models

• Acoustic models (master packages)
  – one or several for each language, plus some multilingual ones
• Language models
  – written in Nuance's Grammar Specification Language (GSL)
  – PCFG, but statistical language models (SLMs) can also be used as categories
  – SLMs are trained separately from corpus data
  – compiled using a specific master package into a recognition package (acoustic + language model)

Page 6: Speech recognition grammars as TRINDIKIT resources

Nuance GSL

• EBNF variant augmented with
  – optional probabilities
  – optional rudimentary slot-filling semantics
  – a number of other special features, e.g.
    • SLM inclusion
    • external grammar references
    • external rule references
    • special words for e.g. pauses and telephony touch-tones
• Must not be left-recursive

Page 7: Speech recognition grammars as TRINDIKIT resources

Example Nuance grammars

• Without probabilities or semantics a grammar can look like this:

.Top [ Cmd Q ]

Cmd ( [ stop play pause ] ?it)

Q ( is [ (the vcr) it ] [stopped playing paused] )

• Start symbol(s) are preceded by '.'
• Nonterminals are uppercase
• Terminals are lowercase
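To make the bracket conventions concrete, here is a minimal sketch of the same grammar as a Prolog DCG. It is an illustration only, not part of Nuance or TRINDIKIT: square brackets in GSL become alternative clauses, parentheses become sequences, and `?it` becomes a rule that may consume nothing. The predicate names (`top`, `cmd`, `opt_it`, ...) are made up for this sketch.

```prolog
% DCG rendering of the example GSL grammar (illustrative names).

% .Top [ Cmd Q ]            -- [ ... ] is a choice between alternatives
top --> cmd.
top --> q.

% Cmd ( [ stop play pause ] ?it )   -- ( ... ) is a sequence, ? is optional
cmd --> verb, opt_it.

verb --> [stop].
verb --> [play].
verb --> [pause].

opt_it --> [].
opt_it --> [it].

% Q ( is [ (the vcr) it ] [stopped playing paused] )
q --> [is], device, state.

device --> [the, vcr].
device --> [it].

state --> [stopped].
state --> [playing].
state --> [paused].
```

With these clauses loaded, `phrase(top, [stop, it])` and `phrase(top, [is, the, vcr, paused])` succeed, while `phrase(top, [vcr, stop])` fails.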

Page 8: Speech recognition grammars as TRINDIKIT resources

More example Nuance grammars

• Probabilistic grammar:

.Top [ Cmd~0.6 Q~0.4 ]
Cmd ( [ stop~0.2 play~0.4 pause~0.3 ] ?it~0.3 )
Q ( is [ (the vcr)~0.3 it~0.7 ] [ stopped playing paused ] )

• Slot-filling grammar:

.Top [ Cmd {<cmd $return>} Q {<q $return>} ]
Cmd ( [ stop {return(stop)} play {return(play)} pause {return(pause)} ] ?it )
Q ( is [ (the vcr) it ]
    [ stopped {return(stop)} playing {return(play)} paused {return(pause)} ] )

• Of course they can be combined…

Page 9: Speech recognition grammars as TRINDIKIT resources

Static or dynamic grammar compilation

• Nuance's recognize function takes one argument, which is one of the following:
  – a start symbol in the current statically compiled recognition package. Recognition is then performed using the grammar that the symbol names.
  – a GSL expression. The expression is then compiled dynamically, on the fly.
• A dynamically compiled GSL expression cannot contain recursive rules, but it can point to a precompiled 'grammar object' which does.

Page 10: Speech recognition grammars as TRINDIKIT resources

Current TRINDIKIT – Nuance interface

• TRINDIKIT modules exist for Nuance speech input and Nuance speech output.

• OAA (Open Agent Architecture) is used for the communication between TRINDIKIT (Prolog) and the Nuance client (Java).

• Each OAA agent connects to a facilitator and declares a set of capabilities. Agents can then pose queries to the facilitator, which delegates each query to the appropriate agent(s) and returns an answer to the requesting agent.

Page 11: Speech recognition grammars as TRINDIKIT resources

Current TRINDIKIT – Nuance interface

[Figure: architecture diagram. Components: TRINDIKIT, OAA gateway, OAA facilitator, Nuance Java client, ASR server, TTS server. Audio sources: telephony card, native sound card, IP telephony.]

Page 12: Speech recognition grammars as TRINDIKIT resources

Current TRINDIKIT – Nuance interface

• Nuance Java client
  – provides (partial) access to the Nuance Java API via OAA
  – loads a recognition package at startup
  – performs SR using one of its top-level grammars
• TRINDIKIT input module
  – checks the dummy resource $asr_grammar for the name of the top-level grammar
  – calls the OAA solvable nscPlayAndRecognize(+Grammar,?Result)
• Major disadvantages:
  – The recognition package must be compiled before the system is used, and specified when the Java application is started
  – The actual ASR grammar is not a part of TRINDIKIT, so it cannot be modified or checked for coverage by modules

Page 13: Speech recognition grammars as TRINDIKIT resources

Upcoming TRINDIKIT – Nuance interface

• Nuance Java client
  – provides (partial) OAA access to the Nuance Java API
  – loads an empty recognition package at startup
  – can compile GSL into a Nuance Grammar Object (NGO) via OAA
  – performs SR using a GSL expression which points at an NGO
• TRINDIKIT input module
  – checks the resource $sr_grammar for the actual speech recognition grammar
  – makes sure $sr_grammar is compiled into an NGO at start-up
  – calls the OAA solvable nscPlayAndRecognize(+GSL,?Result) where GSL = '<file:/path/to/ngo>'

Page 14: Speech recognition grammars as TRINDIKIT resources

Upcoming TRINDIKIT – Nuance interface

[Figure: architecture diagram. Components: TRINDIKIT, OAA gateway, OAA facilitator, Nuance Java client, ASR server, TTS server, compilation server. Audio sources: telephony card, native sound card, IP telephony.]

Page 15: Speech recognition grammars as TRINDIKIT resources

Different ways for implementing sr_grammar resource

1. Keep the GSL expression making up the Nuance grammar as a Prolog string or atom
  • Easy for the Nuance input module
  • Really hard for other modules trying to reason about the SR grammar

2. Define the EBNF rules as Prolog terms
  • Quite easy for the Nuance input module (convert EBNF to GSL)
  • Enables reasoning about rules and categories by other modules
  • Hard to find a working EBNF Prolog notation

Page 16: Speech recognition grammars as TRINDIKIT resources

Different ways for implementing sr_grammar resource

3. Define the grammar as a set of context-free grammar rules (chosen method)
  • Some computation by the Nuance input module (needs to convert CFG to BNF to GSL)
  • Enables reasoning about rules and categories by other modules
  • Enables efficient parsing (if needed)
  • Easy to find a Prolog notation
  • Portable: the same grammar can be ported to many different speech recognizer grammar formats, as long as they are CFG-equivalent

Page 17: Speech recognition grammars as TRINDIKIT resources

CFG resource definition

• Resource relations:
  – start_symbol(S), where S is a nonterminal
  – rule(LHS,RHS), where LHS is a nonterminal and RHS is a list of nonterminals/terminals
  – rules(Rules), where Rules is the set of rules in the resource
• Resource operations (not yet implemented):
  – add_rule(rule(LHS,RHS))
  – delete_rule(rule(LHS,RHS))
  – add_rules(Rules)
  – delete_rules(Rules)
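The relations and operations listed above could be sketched with dynamic predicates roughly as follows. This is a standalone illustration under stated assumptions, not the TRINDIKIT resource interface: the real resource would be wrapped in a TRINDIKIT resource module, the operations are described in the slides as not yet implemented, and terminals are written as atoms here (an assumption made to keep the sketch portable).

```prolog
% Standalone sketch of a CFG resource; rules passed in are assumed ground.
:- dynamic rule/2, start_symbol/1.

% Relations: a tiny example grammar.
start_symbol(nonterminal(np)).
rule(nonterminal(np),  [nonterminal(det), nonterminal(n)]).
rule(nonterminal(det), [terminal(a)]).
rule(nonterminal(n),   [terminal(car)]).

% rules(-Rules): all rules currently stored in the resource.
rules(Rules) :-
    findall(rule(LHS, RHS), rule(LHS, RHS), Rules).

% add_rule(+Rule): store the rule unless it is already present,
% so that the rule base keeps set semantics.
add_rule(rule(LHS, RHS)) :-
    (   rule(LHS, RHS)
    ->  true
    ;   assertz(rule(LHS, RHS))
    ).

% delete_rule(+Rule): remove the rule; succeeds even if it was absent.
delete_rule(rule(LHS, RHS)) :-
    retractall(rule(LHS, RHS)).

add_rules([]).
add_rules([R|Rs]) :- add_rule(R), add_rules(Rs).

delete_rules([]).
delete_rules([R|Rs]) :- delete_rule(R), delete_rules(Rs).
```

After `add_rule(rule(nonterminal(n), [terminal(bus)]))`, a call to `rules(Rs)` returns four rules; adding the same rule a second time leaves the rule base unchanged.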

Page 18: Speech recognition grammars as TRINDIKIT resources

CFG rule format

• Example rules:

rule( nonterminal(np), [ nonterminal(det), nonterminal(n) ] ).
rule( nonterminal(det), [ terminal("a") ] ).
rule( nonterminal(n), [ terminal("car") ] ).

• Convenient when reasoning about the rules in a grammar, but not very convenient when writing grammars…

• Solution:
  – write rules in EBNF-ish notation using operators
  – convert the EBNF-ish rules to CFG rules
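One way the conversion step might work is sketched below. It is not the ebnf2cfg module: to stay independent of operator declarations, the EBNF expression is given as an explicit term built from the hypothetical constructors `t/1` (terminal), `nt/1` (nonterminal), `seq/1`, `alt/1`, `opt/1` and `star/1`. Options and alternatives are multiplied out into separate CFG rules, and each `star` introduces a fresh right-recursive helper nonterminal (`rep_0`, `rep_1`, ...), so the conversion itself adds no left recursion.

```prolog
:- use_module(library(lists)).   % append/3, member/2

% ebnf_rule_to_cfg(+Name, +Expr, -Rules)
% Rules is a list of rule(nonterminal(_), RHS) terms equivalent to Name => Expr.
ebnf_rule_to_cfg(Name, Expr, Rules) :-
    expand(Expr, Alts, 0, _, Extra),
    alts_to_rules(Name, Alts, Main),
    append(Main, Extra, Rules).

alts_to_rules(_, [], []).
alts_to_rules(Name, [RHS|Alts], [rule(nonterminal(Name), RHS)|Rules]) :-
    alts_to_rules(Name, Alts, Rules).

% expand(+Expr, -Alternatives, +Counter0, -Counter, -ExtraRules)
% Alternatives is a list of right-hand sides; the counter names helpers.
expand(t(Word),  [[terminal(Word)]],    N, N, []).
expand(nt(Name), [[nonterminal(Name)]], N, N, []).
expand(opt(E), [[]|Alts], N0, N, Extra) :-        % either nothing or E
    expand(E, Alts, N0, N, Extra).
expand(alt([]), [], N, N, []).
expand(alt([E|Es]), Alts, N0, N, Extra) :-
    expand(E, Alts1, N0, N1, Extra1),
    expand(alt(Es), Alts2, N1, N, Extra2),
    append(Alts1, Alts2, Alts),
    append(Extra1, Extra2, Extra).
expand(seq([]), [[]], N, N, []).
expand(seq([E|Es]), Alts, N0, N, Extra) :-        % cross product of the parts
    expand(E, Alts1, N0, N1, Extra1),
    expand(seq(Es), Alts2, N1, N, Extra2),
    findall(RHS,
            ( member(A, Alts1), member(B, Alts2), append(A, B, RHS) ),
            Alts),
    append(Extra1, Extra2, Extra).
expand(star(E), [[], [nonterminal(Rep)]], N0, N, Extra) :-
    fresh_name(N0, Rep),
    N1 is N0 + 1,
    expand(E, Alts, N1, N, Extra1),
    % Rep => A  and  Rep => A Rep  for every alternative A of E
    findall(rule(nonterminal(Rep), RHS),
            ( member(A, Alts),
              ( RHS = A ; append(A, [nonterminal(Rep)], RHS) ) ),
            RepRules),
    append(RepRules, Extra1, Extra).

% fresh_name(+N, -Name): rep_0, rep_1, ...
fresh_name(N, Name) :-
    number_codes(N, Codes),
    atom_codes(Suffix, Codes),
    atom_concat(rep_, Suffix, Name).
```

For an expression such as `seq([nt(det), star(nt(adj)), nt(n), opt(nt(loc))])` under the name `np`, this yields four `np` rules (with and without the adjective repetition, with and without `loc`) plus two helper rules for `rep_0`. A top-level `opt` still produces an empty right-hand side, which a real converter would have to handle separately.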

Page 19: Speech recognition grammars as TRINDIKIT resources

’blockworld’ - example CFG resource

• ebnf2cfg:assert_rules/0 converts EBNF rules to CFG rules and asserts them

:- module( blockworld , [rules/1,rule/2,start_symbol/1] ).
:- ensure_loaded( ebnf2cfg ).

top( np ).
np => det, adj* , n, loc? .
adj => colour | size.
colour => "blue" | "red" | "green".
size => "big" | "small".
det => "a".
n => "sphere" | "cube" | "pyramid".
loc => prep , np.
prep => "in" | "on" | "under" | "above".

:- assert_rules.

Page 20: Speech recognition grammars as TRINDIKIT resources

Using CFG resource with Nuance input module

% At start-up: make sure the grammar has been compiled into an NGO.
input:init :-
    check_condition( $sr_grammar::start_symbol(Start) ),
    check_condition( $sr_grammar::rules(set(Rules)) ),
    cfg2gsl(dynamic,Start,Rules,GSL),
    oaag:solve(nscCurrentMasterPackage(Package)),
    (   oaag:solve(nscGslCompiledToNGO(GSL,Package,Path))
    ->  true
    ;   oaag:solve(nscCompileGslToNGO(GSL,Package,Path))
    ),
    !.

% On each input turn: recognize using a GSL expression that points at the NGO.
input:input :-
    check_condition( $sr_grammar::start_symbol(Start) ),
    check_condition( $sr_grammar::rules(set(Rules)) ),
    cfg2gsl(dynamic,Start,Rules,GSL),
    oaag:solve(nscCurrentMasterPackage(Package)),
    oaag:solve(nscGslCompiledToNGO(GSL,Package,Path)),
    join_atoms(['<file:/',Path,'>'],NGOGSL),
    recognize_score(NGOGSL,String,Score),
    apply_update( set( input, String ) ),
    apply_update( score := Score ).
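The cfg2gsl step is not shown on the slides. As a rough idea of what it involves, the sketch below prints GSL-like text for a list of CFG rules: rules are grouped by left-hand side, each right-hand side becomes a parenthesised sequence inside a bracketed choice, the start symbol gets a leading '.', and nonterminal names are capitalised. The predicate `print_gsl/2` and its helpers are hypothetical names, terminals are assumed to be lower-case atoms, and empty right-hand sides are not treated specially.

```prolog
:- use_module(library(lists)).   % member/2

% print_gsl(+Start, +Rules): Start is nonterminal(Name),
% Rules is a list of rule(nonterminal(Name), RHS) terms.
print_gsl(Start, Rules) :-
    findall(N, member(rule(nonterminal(N), _), Rules), All),
    sort(All, Names),                      % sorted, duplicates removed
    print_rules(Names, Start, Rules).

print_rules([], _, _).
print_rules([N|Ns], Start, Rules) :-
    findall(RHS, member(rule(nonterminal(N), RHS), Rules), Alts),
    ( nonterminal(N) == Start -> write('.') ; true ),
    gsl_name(N, GslName),
    write(GslName), write(' ['),
    print_alts(Alts),
    write(' ]'), nl,
    print_rules(Ns, Start, Rules).

% One parenthesised sequence per alternative.
print_alts([]).
print_alts([RHS|Alts]) :-
    write(' ('),
    print_symbols(RHS),
    write(' )'),
    print_alts(Alts).

print_symbols([]).
print_symbols([S|Ss]) :-
    write(' '),
    print_symbol(S),
    print_symbols(Ss).

print_symbol(terminal(W))    :- write(W).
print_symbol(nonterminal(N)) :- gsl_name(N, G), write(G).

% gsl_name(+Name, -GslName): capitalise the first letter of an atom.
gsl_name(Name, GslName) :-
    atom_codes(Name, [C|Cs]),
    (   C >= 0'a, C =< 0'z
    ->  U is C - 32
    ;   U = C
    ),
    atom_codes(GslName, [U|Cs]).
```

Called with the start symbol `nonterminal(np)` and the three rules np → det n, det → a, n → car, this prints one line per nonterminal, for example `.Np [ ( Det N ) ]` for the start symbol.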

Page 21: Speech recognition grammars as TRINDIKIT resources

What must be done before CFG resource can be used with Nuance?

• Write the actual code of the input module (some parts are missing)
• Implement the nscGetMasterPackage(?Pkg) solvable
• Make sure that all nonterminals are upper-case and all terminals are lower-case in GSL
• Write a real CFG resource (use an existing Nuance grammar)
• Testing, testing and testing…

Page 22: Speech recognition grammars as TRINDIKIT resources

What should be done?

• Documentation of Java and Prolog code
• TRINDIKIT manual
• Eliminate left-recursion
• Convert to Chomsky Normal Form (?)
• Parser/generator for testing CFGs inside of Prolog
• Multilingual Nuance input module
• Batch scripts for running with ease
• Asynchronous input algorithm
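Before left recursion can be eliminated it has to be found. A minimal detection sketch over the rule(LHS,RHS) format might look as follows; `left_recursive/2` and `left_reach/4` are hypothetical names, and the check only follows the first symbol of each right-hand side, so left recursion hidden behind a nonterminal that can derive the empty string is not detected.

```prolog
:- use_module(library(lists)).   % member/2

% left_recursive(+Rules, ?Name): the nonterminal Name can reach itself
% by repeatedly following the leftmost symbol of a right-hand side.
left_recursive(Rules, Name) :-
    member(rule(nonterminal(Name), _), Rules),
    left_reach(Rules, Name, [Name], Name).

% left_reach(+Rules, +From, +Visited, +Target)
% Visited prevents looping on cycles that do not involve Target.
left_reach(Rules, From, Visited, Target) :-
    member(rule(nonterminal(From), [nonterminal(Next)|_]), Rules),
    (   Next == Target
    ->  true
    ;   \+ member(Next, Visited),
        left_reach(Rules, Next, [Next|Visited], Target)
    ).
```

A grammar passes the check when `\+ left_recursive(Rules, _)` succeeds. With the rules np → np pp and pp → on np, `left_recursive(Rules, N)` gives `N = np`; the same solution may be reported more than once on backtracking.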

Page 23: Speech recognition grammars as TRINDIKIT resources

What can be done?

• PCFG resource
  – if the EBNF format is used, how should weights be calculated when converting to PCFG? (Nuance has solved this, but is it a proper solution?)
• SLM resource
  – would probably not store the entire model in memory
• Nuance semantics + CFG/PCFG
  – can GoDiS semantics be expressed?
• Convert typed unification grammars to CFG resources
  – DCG with typed features (Regulus), SKVATT(?), HPSG
• Grammatical Framework CFG approximation
  – e.g. by limiting sentence length or letting the grammar overgenerate
  – problem: any interesting grammar will overgenerate a lot

Page 24: Speech recognition grammars as TRINDIKIT resources

What can be done?

• Write modules for Java Speech API, ViaVoice, etc. using the same CFG resource…

• Use several recognition grammars in sequence (one after the other on the same input)

• Dynamically generate the recognition grammar based on IS contents and/or system expectations

• Let the system learn new words: "How do you spell that?"