DIALOG SYSTEMS FOR AUTOMOTIVE ENVIRONMENTS
• Presenter:
Joseph Picone
Inst. for Signal and Info. Processing
Dept. Electrical and Computer Eng.
Mississippi State University
Email: [email protected]
• Co-Authors:
Julie Baca, Feng Zheng, Hualin Gao
Center for Advanced Vehicular Systems
Mississippi State University
Mississippi State, Mississippi 39762
• URL: http://www.isip.msstate.edu/projects/speech
EUROSPEECH 2003
Email: {baca,zheng,gao}@isip.msstate.edu
• In-vehicle dialog systems improve information access.
• Advanced user interfaces enhance workforce training and increase manufacturing efficiency.
• Noise robustness is needed in both environments to improve recognition performance.
• Advanced statistical models and machine learning technology
• Multidisciplinary team (IE, ECE, CS).
INTRODUCTION: IN-VEHICLE DIALOG SYSTEMS
DIALOG SYSTEM ARCHITECTURE
SYSTEM ARCHITECTURE: DARPA COMMUNICATOR FRAMEWORK
• Uses the publicly available ISIP speech recognition toolkit.
• Implements a standard HMM-based, speaker-independent continuous speech recognition system.
• Complete toolkits available for many popular tasks, including conversational speech.
• On-line educational materials.
• Extensive documentation.
SYSTEM ARCHITECTURE: PUBLIC DOMAIN ASR
• Transduction: Andrea NC-65 head-mounted microphone
• Feature extraction: standard 39-element MFCCs
• Acoustic modeling: 8-mixture Gaussian HMMs
• Lexicon: 7,100 words (5K WSJ, 2K names)
• Language modeling: interpolated bigram (perplexity: ~70)
• Search: hierarchical Viterbi beam
SYSTEM ARCHITECTURE: ASR SYSTEM COMPONENTS
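The language-modeling entry above names an interpolated bigram and quotes a perplexity. The slides do not give the interpolation scheme or the training data, so the following is only a minimal sketch of how the two quantities relate, assuming simple linear interpolation with a fixed, made-up weight `lam` and a toy two-sentence corpus:

```python
import math
from collections import Counter

def train_counts(sentences):
    """Count words, bigram histories, and bigrams; <s> marks the sentence start."""
    word_counts, history_counts, bigrams = Counter(), Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words
        word_counts.update(words)
        history_counts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return word_counts, history_counts, bigrams

def interp_prob(prev, word, word_counts, history_counts, bigrams, lam=0.7):
    """P(word | prev) = lam * P_bigram + (1 - lam) * P_unigram; lam is an assumed weight."""
    p_uni = word_counts[word] / sum(word_counts.values())
    if history_counts[prev] == 0:
        return p_uni  # unseen history: fall back to the unigram estimate
    p_bi = bigrams[(prev, word)] / history_counts[prev]
    return lam * p_bi + (1 - lam) * p_uni

def perplexity(sentences, word_counts, history_counts, bigrams, lam=0.7):
    """2 ** (average negative log2 probability per word); assumes no OOV test words."""
    log_sum, n = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p = interp_prob(prev, word, word_counts, history_counts, bigrams, lam)
            log_sum += math.log2(p)
            n += 1
    return 2 ** (-log_sum / n)

# Toy corpus, invented for illustration only.
corpus = [["go", "to", "campus"], ["drive", "to", "campus"]]
model = train_counts(corpus)
print(round(perplexity(corpus, *model), 2))
```

A perplexity near 70 can be read as the recognizer facing, on average, about 70 equally likely word choices at each position.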
• Uses the Phoenix semantic case frame parser from the University of Colorado (CU).
• Employs semantic grammar consisting of case frames with named slots.
• FRAME: Drive
    [route]
    [distance]
  [route]
    (*IWANT [go_verb] [arrive_loc])
  IWANT
    (I want *to) (I would *like *to)
    (I will) (I need *to)
  [go_verb]
    (go) (drive) (get) (reach)
  [arrive_loc]
    [*to [placename] [cityname]]
SYSTEM ARCHITECTURE: NATURAL LANGUAGE UNDERSTANDING
“I want to drive from Columbus Mississippi to New York.”
SYSTEM ARCHITECTURE: NATURAL LANGUAGE UNDERSTANDING
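To make the idea of case frames with named slots concrete, here is a toy slot filler built on regular expressions. It is not the Phoenix parser and does not use its API or grammar format; the pattern names, the `depart_loc` slot, and the `parse_route` function are all hypothetical, loosely modeled on the grammar fragment shown earlier.

```python
import re

# Hypothetical patterns; optional words mirror the "*" markers in the grammar fragment.
IWANT = r"(?:i want(?: to)?|i would(?: like)?(?: to)?|i will|i need(?: to)?)"
GO_VERB = r"(?:go|drive|get|reach)"
PLACE = r"[a-z ]+?"

ROUTE = re.compile(
    rf"(?:{IWANT}\s+)?\b(?P<go_verb>{GO_VERB})"
    rf"(?:\s+from\s+(?P<depart_loc>{PLACE}))?"
    rf"\s+to\s+(?P<arrive_loc>{PLACE})\s*$"
)

def parse_route(utterance):
    """Return a filled Drive frame, or None if no route pattern is found."""
    # Normalize: lowercase, replace punctuation with spaces, collapse whitespace.
    text = re.sub(r"[^a-z ]", " ", utterance.lower())
    text = re.sub(r"\s+", " ", text).strip()
    # search() skips leading words that do not fit, e.g. a false start.
    match = ROUTE.search(text)
    if match is None:
        return None
    slots = {name: value for name, value in match.groupdict().items() if value}
    return {"frame": "Drive", "slots": slots}

print(parse_route("I want to drive from Columbus Mississippi to New York."))
# {'frame': 'Drive', 'slots': {'go_verb': 'drive',
#  'depart_loc': 'columbus mississippi', 'arrive_loc': 'new york'}}
```

Because the pattern is searched rather than anchored at the start, a restart such as "I want… I need to drive to the campus post office" still fills the `arrive_loc` slot, which is the kind of robustness a semantic grammar aims for.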
• Accepts ungrammatical input, e.g.,
“I want… I need to drive to the campus post office.”
• Current version of the semantic grammar contains over 500 rules and 2000 words.
• Developed from pilot test corpus of sentence patterns.
[Figure: parse tree for the example utterance]
  Route
    IWANT: “I need to”
    go_verb: “drive”
    arrive_loc
      placename: “post office”
      cityname: “campus”
SYSTEM ARCHITECTURE: NLU MODULE
• Controls interaction between user and system.
• Accepts parsed input from NLU module.
• Determines data requested, obtains data and controls presentation to user.
SYSTEM ARCHITECTURE: DIALOG MANAGER
User: “How can I get to campus?”
System: “Are you going to a specific location on campus?”
User: “Where is engineering?”
System: “What department?”
• Derived from CU toolkit. Bulk of development lies in construction of domain-specific frames, rules, and slots.
• Example frames and associated queries:
Drive_Direction: “How can I get from Lee Boulevard to Kroger?”
Drive_Address: “Where is the campus bakery?”
Drive_Distance: “How far is China Garden?”
Drive_Quality: “Find me the most scenic route to Scott Field.”
Drive_Turn: “I am on Nash Street. What’s my next turn?”
SYSTEM ARCHITECTURE: DIALOG MANAGER
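The dialog manager's frame-and-slot behavior can be sketched as follows. This is an illustration only, not the CU toolkit's API: the slot names, the required-slot table, and the prompt wording are all assumptions; only the frame names come from the slides.

```python
# Hypothetical frame definitions: each frame lists the slots it needs filled.
REQUIRED_SLOTS = {
    "Drive_Direction": ["depart_loc", "arrive_loc"],
    "Drive_Address": ["arrive_loc"],
    "Drive_Distance": ["arrive_loc"],
}

# Hypothetical clarification prompts, one per slot.
PROMPTS = {
    "depart_loc": "Where are you starting from?",
    "arrive_loc": "Where would you like to go?",
}

def next_action(frame, slots):
    """Ask for the first missing slot, or issue a backend query once the frame is complete."""
    for slot in REQUIRED_SLOTS[frame]:
        if not slots.get(slot):
            return {"action": "ask", "prompt": PROMPTS[slot]}
    return {"action": "query", "frame": frame, "slots": dict(slots)}

# Two turns: the first parse lacks a starting point, the second supplies it.
slots = {"arrive_loc": "Kroger"}
first = next_action("Drive_Direction", slots)
slots["depart_loc"] = "Lee Boulevard"
second = next_action("Drive_Direction", slots)
print(first["action"], second["action"])  # ask query
```

Keeping the frame definitions in plain tables reflects the point made above: most of the effort goes into the domain-specific frames, rules, and slots rather than the control loop.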
• Geographic Information System (GIS) contains map routing data for MSU and surrounding area.
• Dialog manager (DM) first determines the nature of the query, then:
    obtains route data from the GIS database
    handles presentation of the data to the user
APPLICATION DEVELOPMENT: GIS BACKEND
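The slides do not say how the GIS backend computes a route. As a stand-in, the sketch below runs Dijkstra's shortest-path algorithm over a small road graph; the graph, its node labels, and its distances are invented for illustration and do not correspond to real map data.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra over a dict-of-dicts road graph; returns (distance, path) or None."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [neighbor]))
    return None

# Hypothetical intersections A-D with made-up segment lengths (one-way edges).
ROADS = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.5, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}

print(shortest_route(ROADS, "A", "D"))  # (3.5, ['A', 'B', 'C', 'D'])
```

The dialog manager would then turn the returned path into spoken directions, which corresponds to the presentation step listed above.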
• Obtained domain-specific data by:
1. Initial data gathering and system testing
2. Retesting after enhancing LM and semantic grammar
• Initial efforts focused on reducing OOV utterances and parsing errors for NLU module.
APPLICATION DEVELOPMENT: PILOT SYSTEM
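One way to measure the out-of-vocabulary (OOV) problem mentioned above is to count the utterances that contain at least one word missing from the lexicon. The slides do not define how the OOV figure was computed, so the per-utterance definition, the function name, and the tiny lexicon here are assumptions.

```python
def oov_report(utterances, lexicon):
    """Return (fraction of utterances with an OOV word, sorted list of OOV words)."""
    oov_words = set()
    oov_utterances = 0
    for utterance in utterances:
        missing = {w for w in utterance.lower().split() if w not in lexicon}
        if missing:
            oov_utterances += 1
            oov_words |= missing
    rate = oov_utterances / len(utterances) if utterances else 0.0
    return rate, sorted(oov_words)

# Hypothetical lexicon that is missing two place-name words.
lexicon = {"where", "is", "the", "campus", "bakery", "how", "far"}
queries = ["where is the campus bakery", "how far is china garden"]
print(oov_report(queries, lexicon))  # (0.5, ['china', 'garden'])
```

Adding the reported OOV words to the lexicon and language model, then retesting, matches the two-step procedure described above.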
Refinements to NLU System:

Vers.     1.0           2.0           3.0
Test      Pre   Post    Pre   Post    Pre   Post
OOV       25%   0%      36%   0%      4%    0%
Parser    80%   3%      60%   5%      46%   11%

Overall System Enhancements:

Test No.   NLU Parser Error Rate   DM Error Rate
1          43%                     49%
2          6%                      3%

APPLICATION DEVELOPMENT: RESULTS
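The overall results can also be expressed as relative error reductions. The short calculation below takes the figures as 43% to 6% for NLU parser error and 49% to 3% for DM error between Test 1 and Test 2; the relative-reduction metric is an added convenience, not something the slides report.

```python
def relative_reduction(before, after):
    """Fraction of the original error that was eliminated."""
    return (before - after) / before

# Error rates (in percent) for Test 1 and Test 2.
results = [("NLU parser", 43, 6), ("DM", 49, 3)]

for name, before, after in results:
    print(f"{name}: {before}% -> {after}% "
          f"({relative_reduction(before, after):.1%} relative reduction)")
# NLU parser: 43% -> 6% (86.0% relative reduction)
# DM: 49% -> 3% (93.9% relative reduction)
```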
• Users participate in multiple scenarios in which they query for information (e.g., hotel and meeting locations).
• Tasks vary across scenarios according to the role the user plays:
First-time visitors
New residents
Long-time residents
SUMMARY AND CONCLUSIONS: WIZARD OF OZ DATA
SUMMARY AND CONCLUSIONS: FURTHER DEVELOPMENT
• Established a preliminary dialog system for future data collection and research.
• Demonstrated significant domain-specific improvements for in-vehicle dialog systems.
• Created a testbed for future studies of workforce training applications.
• Extended the ISIP public domain toolkit and released relevant resources into the public domain.
SUMMARY
RELEVANT RESOURCES
• CAVS Dialog System: review our experimental results and download the in-vehicle prototype architecture and associated components.
• Natural Language and Dialog Management Toolkits (CU): explore tools to build NLU and DM components for a specific domain.
• Speech Recognition Toolkit (ISIP): examine a state-of-the-art public domain ASR toolkit for integration in a dialog system.