siri, watson and natural language...
TRANSCRIPT
-
Siri, Watson and Natural Language Processing
Prof. Gary Geunbae Lee, Intelligent Software Lab., POSTECH
-
Contents
• Siri, Watson and NLP
• Multi-domain ontology reasoning dialog systems for intelligent assistant
• Multi-strategy knowledge search question answering systems
• Multi-party open proactive dialog systems
• Dialog system for English education
-
Siri, Watson and NLP
-
Apple Siri
• Siri (iOS)
– Released on October 4, 2011
– Conversational dialog interface, using web search engines (Bing, Google, Yahoo)
– Offshoot of the CALO project funded by DARPA
– Implemented with Nuance speech recognition
-
Siri – your wish is its command
• Services
– Built-in apps
• Worldwide: Phone, FaceTime, Music, Mail, Messages, Calendar, Reminders, Notes, Contacts, Weather, Stocks, Web Search, Find My Friends, Alarms, World Clock and Timer, Wolfram Alpha, Wikipedia search
• In the US: Maps, local search with Yelp
– Chatting
• (maybe) it does not support discourse contexts
– Ecosystem of partners (before Apple purchased Siri)
• Restaurant and business questions: OpenTable, Gayot, CitySearch, BooRah, Yahoo Local, ReserveTravel, Localeze
• Events and concert information: Eventful, StubHub, LiveKick
• Movie information: MovieTickets, RottenTomatoes, New York Times
• Factual question answering: Bing Answers, Wolfram Alpha
• Web search: Bing, Yahoo, Google
-
Sample Dialogs (chatting)
U: Who are you?
S: I'm just a humble virtual assistant.
U: Seriously, who are you?
S: Who I am isn't important.
U: No, for real, I really want to know who you are.
S: OK, maybe not.
U: Are you Siri?
S: That's me!
U: What is the meaning of life?
S: I can't answer that now, but give me some time to write a very long play in which nothing happens.
U: Once again, what is the meaning of life?
S: Try and be nice to people, avoid eating fat, read a good book every now and then, get some walking in, and try to live together in peace and harmony with people of all creeds and nations.
-
Sample Dialogs (tasks)
U: Remind me to order some flowers for my lovely wife and adorable daughter tomorrow at 9 AM.
S: Here's your reminder for tomorrow at 9 AM. Shall I create it?
U: Yes, please.
S: OK, I'll remind you.
U: Check Apple stock price.
S: Apple Inc. went up significantly today. 400 dollars and 29 cents.
U: Send to Ewdison.
S: What do you want to say to Ewdison Then?
U: Hey, hope you're feeling better, see you tomorrow at work … (Siri barges in)
S: I updated your message. Ready to send it?
• (Message: Hey hope you're feeling better)
U: No.
S: To continue, you can Send, Cancel, Review, or Change it.
U: Change it…
Reminder
Stock
Messages
-
Architecture
• Active ontologies as a method and apparatus for building Siri
Survey on Siri Patent
Fig. 1 Intelligent Automated Assistant (from Siri Patent)
-
Google Now
• Google Now (Android)
– Initially released in July 2012 (Galaxy Nexus)
– Conversational dialog interface, using Google Voice Search and web search
– Leverages the Google Knowledge Graph project, analyzing the meaning and connections of results
– Context understanding for proactive service
-
MS Cortana
• Cortana (Windows Phone)
– Released on April 2, 2014
– Conversational dialog interface, using the Bing search engine and the Azure cloud service
– Can also recognize music
– Well known for predicting the winners of the first 14 matches of the 2014 World Cup
– Demonstrated alongside MS deep neural network work (e.g., identifying cats)
-
Dan Jurafsky
Question Answering: IBM’s Watson
• Won Jeopardy on February 16, 2011!
WILLIAM WILKINSON'S "AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDAVIA" INSPIRED THIS AUTHOR'S MOST FAMOUS NOVEL
Bram Stoker
-
Dan Jurafsky
Types of Questions in Modern Systems
• Factoid questions
• Who wrote "The Universal Declaration of Human Rights"?
• How many calories are there in two slices of apple pie?
• What is the average age of the onset of autism?
• Where is Apple Computer based?
• Why, how (procedure), what is (definition), list up, etc.
• Complex (narrative) questions:
• In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
• What do scholars think about Jefferson's position on dealing with pirates?
-
KB
-
IBM Watson Platform and Application
GenieMD Inc.health care app
Majestyk Apps.edu support app
Red Ant.retail sale business intelligence app
-
IBM Watson - Recent Applications
Watson Engagement Advisor
WatsonDiscovery Advisor
Watson Explorer
Presenter notes:
-- KNOW-ME: Reflexis StorePulse (Reflexis Systems, Inc.). Predicts consumer trends from sources such as weather, social media trends, local events, and news, then derives and reports a sales strategy in response. http://www.businesswire.com/news/home/20141008006500/en/IBM-Reflexis-Tap-Power-Watson-Transform-Retail#.VDktWPl_swA
-- EMPOWER-ME: Watson Discovery Advisor (Baylor College of Medicine, Johnson & Johnson). Uses natural language understanding to read and analyze the large volumes of literature produced across many fields, surfacing hypotheses or connections in the data that humans have missed. http://www.ibm.com/smarterplanet/us/en/ibmwatson/discovery-advisor.html
-- ENGAGE-ME: Recipe generation demo at SXSW 2014 (Institute of Culinary Education). Given the main ingredients and a cultural style for a menu, the Watson system generates new recipes using the popularity of existing recipes and information about how ingredients combine. http://asmarterplanet.com/blog/2014/02/food-thought-ibm-watson-whips-creativity.html
-
IBM Watson – Ecosystem
Recipe generation
• Watson Developer Cloud• Public API
• Watson Content Store• Content providing network
• Watson Talent Hub• Talent expert matching
-
Dan Jurafsky
Language Technology
Coreference resolution
Question answering (QA)
Part-of-speech (POS) tagging
Word sense disambiguation (WSD)
Paraphrase
Named entity recognition (NER)
Parsing
Summarization
Information extraction (IE)
Machine translation (MT)
Dialog
Sentiment analysis
mostly solved
making good progress
still really hard
Spam detection
Let’s go to Agra!
Buy V1AGRA …
✓
✗
Colorless green ideas sleep furiously.
ADJ ADJ NOUN VERB ADV
Einstein met with UN officials in Princeton
PERSON ORG LOC
You’re invited to our dinner party, Friday May 27 at 8:30
Party, May 27, add
Best roast chicken in San Francisco!
The waiter ignored us for 20 minutes.
Carter told Mubarak he shouldn’t run again.
I need new batteries for my mouse.
The 13th Shanghai International Film Festival…
第13届上海国际电影节开幕…
The Dow Jones is up
Housing prices rose
Economy is good
Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
I can see Alcatraz from the window!
XYZ acquired ABC yesterday
ABC has been taken over by XYZ
Where is Citizen Kane playing in SF?
Castro Theatre at 7:30. Do you want a ticket?
The S&P500 jumped
-
What’s hard – ambiguities, ambiguities, all different levels of ambiguities
John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. [from J. Eisner]
- donut: To get a donut (doughnut; spare tire) for his car?
- donut store: a store where donuts shop? or that is run by donuts? or that looks like a big donut? or that is made of donut?
- from work: Well, actually, he stopped there from hunger and exhaustion, not just from work.
- every few hours: That's how often he thought it? Or that's for the coffee?
- it: the particular coffee that was good every few hours? the donut store? the situation?
- too expensive: too expensive for what? what are we supposed to conclude about what John did?
-
Dan Jurafsky
non-standard English
Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
segmentation issues idioms
dark horse, get cold feet
lose face, throw in the towel
neologisms
unfriend, Retweet, bromance
tricky entity names
Where is A Bug's Life playing …
Let It Be was recorded …
… a mutation on the for gene …
world knowledge
Mary and Sue are sisters.Mary and Sue are mothers.
But that’s what makes it fun!
the New York-New Haven Railroad
Why else is natural language understanding difficult?
-
Levels of Language
• Phonetics/phonology/morphology: what words (or subwords) are we dealing with?
• Syntax: What phrases are we dealing with? Which words modify one another?
• Semantics: What’s the literal meaning?• Pragmatics: What should you conclude from the
fact that I said something? How should you react?
-
Recent Trends of Applications using NLP
• Summary of Gartner Report, 2014
-
Recent Trends of Applications using NLP/AI
• Summary of Gartner Report (cont.)
– Scale of market:
• $53 billion in 2012, projected to grow to $113 billion by 2017
• About $6 billion in the domestic (Korean) market in 2015 (2012 KISTI report)
– Ripple effect:
• About 1.1 billion users will use intelligent personal assistant systems in 2015
• About 1 billion vehicles will use artificial intelligence
– NLP using deep learning:
• Recent Watson versions adopted a cloud system for distributed computing
• MS launched the "Adam" project using neural network techniques
-
IOT2H (Internet of things to Human)
Siri KGSDS UI/UX
WATSON
Red antMajestykapps
genieMD
IOT2H platform
- communication (logos-pathos-ethos): natural language processing / emotion
- thinking (smart): reasoning / ontology
- knowledge (exo-brain): knowledge question answering / retrieval
IOT2H service
- co-op service (human in the loop) for health, home, mobile, education
-
Multi-domain ontology reasoning
dialog systems for intelligent
assistant
-
SPOKEN DIALOG SYSTEM (SDS)
-
Interactive Question Answering
New challenges for question answering systems [TREC ciQA; HLT-NAACL 2006 workshop]
Series of related questions in a session / interaction with other people
Should handle anaphora, ellipses, and other discourse-related problems
But still mainly user initiative; no dialog "management"
POS Tagging
Answer TypeIdentification
AnswerJustification
Query Formation
Dynamic AnswerPassage Selection
Answer Finding
DocumentRetrieval
Answer Type
Answer1
Question-m
Question2Question1
……..
Answer2…….Answer-m
-
Tele-service
Car-navigation Home networking
Robot interface
SDS APPLICATIONS
-
ASR (automatic speech recognition)
FeatureExtraction Decoding
AcousticModel
PronunciationModel
LanguageModel
버스 정류장이 어디에 있나요? ("Where is the bus stop?")
Speech Signals / Word Sequence
버스 정류장이 어디에 있나요?
NetworkConstruction
SpeechDB
TextCorpora
HMMEstimation
G2P
LMEstimation
WO
Ŵ = argmax_{W ∈ L} P(O|W) P(W)
-
SPEECH UNDERSTANDING (in general)
Computer Program
Speaker ID /Language ID
Sentiment / Opinion
Named Entity / Relation
Topic / Intent
Speech Segment
Summary
Syntactic / Semantic Role
SQL
Meaning Representation
Dave /English
Nervous
LOC = pod bay, OBJ = door
Control the Spaceship
Open the doors.
Open=Verb, the=Det. ...
select * from DOORS where ...
-
REPRESENTATION Semantic frame (slot/value structure) [Gildea and Jurafsky, 2002]
An intermediate semantic representation to serve as the interface between user and dialog system
Each frame contains several typed components called slots. The type of a slot specifies what kind of fillers it is expecting.
“Show me flights from Seattle to Boston”
ShowFlight
Subject Flight
FLIGHT Departure_City Arrival_City
SEA BOS
FLIGHT
SEABOS
Semantic representation on ATIS task; XML format (left) and hierarchical representation (right) [Wang et al., 2005]
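The slot/value structure above maps naturally onto a nested record; a minimal sketch of the ShowFlight frame for the sample utterance, with slot names taken from the figure (the dict layout itself is an assumption for illustration):

```python
# Semantic frame for "Show me flights from Seattle to Boston",
# mirroring the hierarchical slot/value structure in the figure.
frame = {
    "frame": "ShowFlight",
    "slots": {
        "Subject": "FLIGHT",
        "Flight": {
            "Departure_City": "SEA",
            "Arrival_City": "BOS",
        },
    },
}
```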
-
Knowledge-based Systems Knowledge-based systems:
Developers write a syntactic/semantic grammar A robust parser analyzes the input text with the grammar Without a large amount of training data
Previous works: MIT TINA (natural language understanding) [Seneff, 1992]; CMU PHOENIX [Pellom et al., 1999]; SRI GEMINI [Dowding et al., 1993]
Disadvantages
1) Grammar development is an error-prone process
2) It takes multiple rounds to fine-tune a grammar
3) Combined linguistic and engineering expertise is required to construct a grammar with good coverage and optimized performance
4) Such a grammar is difficult and expensive to maintain
-
Two Classification Problems
HOW TO SOLVE: STATISTICAL APP
Dialog Act Identification
Input: Find Korean restaurants in Daeyidong, Pohang
Output: SEARCH_RESTAURANT

Named Entity Recognition
Input: Find Korean restaurants in Daeyidong, Pohang
Output: FOOD_TYPE (Korean), ADDRESS (Daeyidong), CITY (Pohang)
-
Encoding:
x is an input (word), y is an output (NE), and z is another output (DA).
Vector x = {x1, x2, x3, …, xT} Vector y = {y1, y2, y3, …, yT} Scalar z
Goal: modeling the functions y=f(x) and z=g(x)
PROBLEM FORMALIZATION
x: Find | Korean | restaurants | in | Daeyidong | , | Pohang | .
y: O | FOOD_TYPE-B | O | O | ADDRESS-B | O | CITY-B | O
z: SEARCH_RESTAURANT
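Given the BIO-style tag sequence y, the named entities can be read off by collecting each token whose tag ends in -B; the slide's example uses only single-token entities, so this minimal sketch is enough (the helper name is ours, not from the slides):

```python
# Extract (token, entity_type) pairs from the word sequence x and
# BIO tags y of the example; tags like FOOD_TYPE-B mark beginnings.
x = ["Find", "Korean", "restaurants", "in", "Daeyidong", ",", "Pohang", "."]
y = ["O", "FOOD_TYPE-B", "O", "O", "ADDRESS-B", "O", "CITY-B", "O"]
z = "SEARCH_RESTAURANT"  # dialog act for the whole sentence

def extract_entities(tokens, tags):
    """Return (token, entity_type) pairs for tags of the form TYPE-B."""
    return [(tok, tag[:-2]) for tok, tag in zip(tokens, tags) if tag.endswith("-B")]

entities = extract_entities(x, y)
```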
-
MACHINE LEARNING FOR SLU Background: Maximum Entropy (a.k.a logistic regression)
Conditional and discriminative manner Unstructured! (no dependency in y) Dialog act classification problem
Conditional Random Fields [Lafferty et al. 2001] Structured versions of MaxEnt (argmax search in inference) Undirected graphical models Popular in language and text processing Linear-chain structure for practical implementation Named entity recognition problem
z
x
yt-1 yt yt+1
xt-1 xt xt+1
fk
gk
hk
-
DIALOG MANAGEMENT GOAL Answer your query (e.g., question and order)
given the task domain It includes : Provide query results Ask further slot information Confirm user utterance Notify invalid query Suggest the alternative
Related to dialog complexity and task complexity.
In practice Find the best system action a given the dialog state s
-
DESIGN ISSUES Task complexity How hard is the task? How much domain knowledge does the system have?
Simple Complex
Call Routing
CollaborativePlanning
WeatherInformation
Conversational English Tutoring
AutomaticBanking
-
DESIGN ISSUES Dialog complexity Which dialog phenomena are allowed
Initiative strategies e.g., system-initiative vs. user-initiative vs. mixed-initiative
Meta-dialogs; the dialog itself e.g., Could you hold on for a minute?
Subdialogs; clarification/confirmation e.g., You selected KE airlines, is it right?
Multiple dialog threads e.g., domain switching
-
DIALOG EXAMPLES Example 3
U: I'd like to have African food in Gangnam, Seoul.
S: Sorry, there are no African restaurants.
S: How about American restaurants in Gangnam, Seoul?
U: No, I don't like it.
S: What is your favorite food?
U: I like grilled and seasoned beef.
S: So, how about Korean restaurants?
U: Good.
Mixed-initiative; implicit/explicit confirmation; recommends an alternative when the query fails; most natural dialog flow
-
KNOWLEDGE-BASED DM (KBDM)
Rule-based approaches: early KBDMs were developed with handcrafted rules (e.g., information state update). Simple example [Larsson and Traum, 2003]
Agenda-based approaches: recent KBDMs were developed with domain-specific knowledge and a domain-independent dialog engine.
-
AGENDA-BASED DM RavenClaw DM (CMU) Using Hierarchical Task Decomposition
A set of all possible dialogs in the domain Tree of dialog agents Each agent handles the corresponding part of the dialog
task
[Bohus and Rudnicky, 2003]
-
Vanilla EXAMPLE-BASED DM (EBDM) Example-based approaches
Dialog State Space
Domain = Building_Guidance
Dialog Act = WH-QUESTION
Main Goal = SEARCH-LOC
ROOM-TYPE = 1 (filled), ROOM-NAME = 0 (unfilled)
LOC-FLOOR = 0, PER-NAME = 0, PER-TITLE = 0
Previous Dialog Act = , Previous Main Goal =
Discourse History Vector = [1,0,0,0,0]
Lexico-semantic Pattern = "ROOM_TYPE 이 어디지?" ("Where is the ROOM_TYPE?")
System Action = inform(Floor)
Dialog Corpus
USER: 회의실이 어디지? ("Where is the meeting room?")
[Dialog Act = WH-QUESTION][Main Goal = SEARCH-LOC][ROOM-TYPE = 회의실 (meeting room)]
SYSTEM: 3층에 교수회의실, 2층에 대회의실, 소회의실이 있습니다. ("The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.") [System Action = inform(Floor)]
Turn #1 (Domain=Building_Guidance)
Dialog Example
Indexed by using semantic & discourse features
Having the similar state
e* = argmax_{e_i ∈ E} S(e_i, h)
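The argmax above retrieves the indexed dialog example e* most similar to the current dialog state h. A minimal sketch in which S simply counts overlapping semantic/discourse features; the example records and the similarity function are simplified assumptions, not the system's actual index:

```python
# Example-based DM: pick the dialog example whose indexed features
# best match the current dialog state h (toy feature sets).

def similarity(example, h):
    """S(e_i, h): number of shared feature key/value pairs."""
    return sum(1 for k, v in h.items() if example["features"].get(k) == v)

def select_example(examples, h):
    return max(examples, key=lambda e: similarity(e, h))

examples = [
    {"features": {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC"},
     "system_action": "inform(Floor)"},
    {"features": {"dialog_act": "REQUEST", "main_goal": "RESERVE-ROOM"},
     "system_action": "confirm(Reservation)"},
]
h = {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC"}
best = select_example(examples, h)
```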
Cheongjae Lee, Sangkeun Jung, Seokhwan Kim, Gary Geunbae Lee. Example-based dialog modeling for practical multi-domain dialog system. speech communications, 51:5 (466-484), May 2009
-
STOCHASTIC DM Supervised approaches [Griol et al., 2008] Find the best system action to maximize the
conditional probability P(a|s) given the dialog state Based on supervised learning algorithms
MDP/POMDP-based approaches [Williams and Young, 2007] Find the optimal system action to maximize the reward
function R(a|s) given the belief state Based on reinforcement learning algorithms
In general, a dialog state space is too large So, generalizing the current dialog state is important
-
Template-based System Utterance Generation
System Utterance Generator
SystemTemplate
DB
System Action
Dialog Frame
Retrieved Result
Inform_cast
Program: Secret Garden (시크릿 가든)
Cast: Hyun Bin, Ha Ji-won (현빈, 하지원)
Template: "The leads of __ are __." (__의 주인공은 __입니다.)
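Template-based generation is then just slot substitution: look up the template for the system action and fill in values from the dialog frame and the retrieved DB result. A sketch with an English rendering of the Inform_cast template (template wording is our gloss):

```python
# Template-based NLG: fill a system template with slot values,
# as in the Inform_cast example above.
templates = {"Inform_cast": "The leads of {program} are {cast}."}

def generate(action, slots):
    """Render the template for a system action with the given slots."""
    return templates[action].format(**slots)

utterance = generate("Inform_cast",
                     {"program": "Secret Garden", "cast": "Hyun Bin, Ha Ji-won"})
```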
-
OOD/DD (Out-of-Domain/Domain Detection)
Utterance
Domain Detection
IN-DOMAIN
Task Dialog Service
OOD-CHAT
Chat Dialog Service
OOD-TASK
Rejection Message
-
OOD Utterance Rejection (Confidence Combination Approach)
Score: S(i) = λFOR * SFOR(i) + λDOD * SDOD(i) + λDAC * SDAC(i) + λIDV * SIDV(i)
FOR
DAC
IDV
DOD
NER
Positive example : IN-DOMAIN corpusNegative example : OOD-CHAT corpusFeature : lexical unigram & bigram
Data : IN-DOMAIN corpusFeature : lexical unigram & bigram
Corpus : TID corpusFeature : lexical unigram & bigram
Data : TID corpusFeature : lexical features+ Named entity dictionary
Positive example : TID corpusNegative example : OOD-CHAT corpusFeature : OOV-LSP unigram & bigram
ScoreFOR
ScoreDOD
ScoreDAC
ScoreIDV
FinalIn-DomainVerification
IN-DOMAIN
OOD
λ
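The final in-domain verification combines the four verifier scores with the weighted sum above and thresholds the result. A sketch of that combination; the component scores, interpolation weights, and the 0.5 threshold below are all illustrative assumptions:

```python
# Combine per-verifier confidence scores S_FOR, S_DOD, S_DAC, S_IDV
# with interpolation weights (lambdas); all numbers are illustrative.

def combined_score(scores, weights):
    """S(i) = sum_k lambda_k * S_k(i)."""
    return sum(weights[k] * scores[k] for k in scores)

scores  = {"FOR": 0.9, "DOD": 0.8, "DAC": 0.7, "IDV": 0.6}
weights = {"FOR": 0.25, "DOD": 0.25, "DAC": 0.25, "IDV": 0.25}

s = combined_score(scores, weights)
label = "IN-DOMAIN" if s >= 0.5 else "OOD"  # threshold is an assumption
```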
Seonghan Ryu, Jaiyoun Song, Sangjoon Koo, Soonchoul Kwon, Gary Geunbae Lee. Detecting multiple domains from user’s utterance in spoken dialog system. Proceedings of the international workshop series on spoken dialog systems (IWSDS 2015), Jan 2015, Busan
-
MULTI-MODAL DIALOG SYSTEM
x y
InputGesture
OutputSystem
Response
(x, y)
Training examples
Learning algorithm
InputSpeech
Inputface
-
TASK PERFORMANCE AND USER PREFERENCE Task performance and user preference for
multimodal over speech only interfaces [Oviatt et al., 1997] 10% faster task completion, 23% fewer words, (Shorter and simpler linguistic constructions) 36% fewer task errors, 35% fewer spoken disfluencies, 90-100% user preference to interact this way.
• Speech-only dialog system
Speech: Bring the drink on the table to the side of the bed
• Multimodal dialog System
Speech: Bring this to here
Pen gesture: (pointing)
Easy, simplified user utterances!
-
Dialog System Development Toolkit Features
Web-based Interface Providing easy-to-use interfaces for developers Controlling complicated processes in an efficient and stable manner
Domain Dialog Corpus
Definition SLU Corpus
NLG Template
Contents
Statistics
Validation
Training
Evaluation
Dialog System
Log Analysis
Design Acquisition& Annotation
RunningTraining Maintenance
WorkflowScreen shot
Donghyeon Lee, Kyungduk Kim, Cheongjae Lee, Junhwi Choi, Gary Geunbae Lee. D3 toolkit: A development toolkit for daydreaming spoken dialog system. Proceedings of the 2nd International Workshop on Spoken Dialog Systems Technology (IWSDS 2010), Oct 2010, Japan. (LNAI 6392, Springer)
-
AUTOMATED DIALOG SYSTEM EVALUATION
Sangkeun Jung, Cheongjae Lee, Kyungduk Kim, Minwoo Jeong, Gary Geunbae Lee. Data-driven user simulation for automated evaluation of spoken dialog systems, computer speech and language, 23(4): 479-509, Oct 2009
-
Querying with Inference Engine
Match | Entry | Channel
Feb 5 ManU vs Chelsea | football | KBS
Let’s watch Wayne Rooney’s game
SLU
Wayne Rooney : Person name
Query Generation
SELECT ?match ?entry ?channel
FROM
WHERE {
  ?match owl:hasMonth owl:Dec .
  ?match owl:hasDay owl:d_12 .
  owl:Rooney owl:isMemberOf ?t .
  ?match owl:hasTeam ?t .
  ?match owl:hasEntry ?entry .
  ?match owl:hasChannel ?channel
}
Result
HyeongJong Nho, Cheongjae Lee, Gary Geunbae Lee. Ontology-based inference for information-seeking in natural language dialog system. Proceedings of the 6th IEEE international conference on industrial informatics (IEEE INDIN 2008), July 2008, Daejeon, Korea
-
51
Platform: Multi-Domain Ontology Reasoning Intelligent Assistant Dialog System Platform
Spoken Language Understanding (SLU)
Input Sentence
Knowledge Graph
Intent Determination Named Entity Recognition
Output
Action Selection
Response Generation
Service Execution
Complete
POMDP-based Disambiguation
Discourse & Anaphor Processing
YesNo
Ontology / Reasoning Service AgentA
PITask DB/KB
-
52
Open-Domain Spoken Language Understanding
• Traditional spoken dialog systems first detect a domain from the input sentence and perform domain-specific SLU
Ontology
Input Sentence
Spoken Language Understanding (SLU)
Intent Determination Named Entity Recognition
Domain Selection
Semantic Representation
Domain
Input Sentence
Domain Detection
SLU: TV Program Guide
SLU: Music Guide
SLU: Restaurant Guide
• However, we first perform open-domain SLU
• We exploit ontology as important resource in understanding processes
Patent pending
-
53
• Open named entity recognition (AIDA)
– 1. Mentions are detected using the Stanford NER tagger
– 2. Mentions are mapped onto canonical entities in a knowledge base
Open-Domain Spoken Language Understanding
Mentions
Candidate Entities
Knowledge Bases
Mention-Entity Pair
Entity-Entity Pair
Yosef et al. “AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables,” Proc. VLDB 2011
-
Open-Domain Named Entity Recognition
Detection of NE Mentions
Input Sentence
Dictionary
Filtering of NE Candidates
NE Candidates
Filtered NE Candidates
Evaluation of NE Combinations Semantic LM
Generation of NE Combinations
NE Combinations
Best NE Combination
Overall Architecture Goals
Mendes et al., "DBpedia Spotlight: Shedding Light on the Web of Documents", Proc. International Conference on Semantic Systems 2011
Yosef et al., "AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables", Proc. International Conference on Very Large Databases 2011
Roth et al., "Wikification and Beyond: The Challenges of Entity and Concept Grounding", Tutorial at ACL 2014
– Large-scale named entity dictionary from knowledge bases (e.g., DBpedia, Freebase, YAGO)
– Entity type disambiguation is performed based on a semantic language model
-
Detection of Multi-intents from a Sentence
Traditional spoken dialog systems focus on processing simple input sentences that express only one intent → single-intent (SI) type
However, in the real world, users often express multiple intents (MIs) within one dialog turn → MI conjunctive (MI.C) and MI non-conjunctive (MI.N) types
We named this task MI detection (MID)
“what is the genre of big bang theory and tell me the story about it”
Detection of multi-intents
search-genre
search-introduction
User’s Utterance
-
Detection of Multi-intents from a Sentence
POS Tagging
Detection of Conjunction
Disambiguation of Sentence Boundary
Restoration of Original Sentences
Evaluation of multi-intent hypotheses
Detection of single-intent
Input Sentence
POS-tagged Input Sentence
Multi-intent hypotheses
Single-intent
Final answer
Multi-intent hypotheses
Korean:
         | SI     | MI.C   | MI.N   | Avg.
Baseline | 97.04% | 65.37% | 65.08% | 87.50%
Proposed | 96.62% | 92.11% | 94.40% | 95.61%

English:
         | SI     | MI.C   | MI.N   | Avg.
Baseline | 96.64% | 60.32% | 63.02% | 86.15%
Proposed | 95.95% | 94.17% | 92.07% | 95.10%
Overall Architecture Results
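The two-stage idea (detect a conjunction, hypothesize sentence boundaries, then run single-intent detection on each part) can be sketched crudely as follows; real boundary disambiguation uses the trained models described above, so splitting on "and" with a keyword-based intent detector is a deliberate simplification:

```python
# Naive multi-intent detection: split the utterance at the conjunction
# and classify each piece with a toy keyword-based intent detector.

def detect_single_intent(text):
    """Toy single-intent detector (keyword rules, for illustration only)."""
    if "genre" in text:
        return "search-genre"
    if "story" in text:
        return "search-introduction"
    return "unknown"

def detect_multi_intents(utterance):
    parts = [p.strip() for p in utterance.split(" and ")]
    return [detect_single_intent(p) for p in parts]

intents = detect_multi_intents(
    "what is the genre of big bang theory and tell me the story about it")
```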
Seonghan Ryu, Junhwi Choi, Younghee Kim, Sangjoon Koo, Gary Geunbae Lee. A two-stage approach to multi-intent detection for spoken language understanding. Submitted to the 40th international conference on acoustics, speech and signal processing (ICASSP 2015), April 2015, Brisbane
-
Out-of-Domain / Domain Detection
Traditional spoken dialog systems assumed that all user utterances belong to only one domain
Extraction of Features
“I want news now”
Binary Classification
...Feature vector: X
xi = [0 ... 1]y = {positive, negative}
Word sequence: W
x1 x2 xn-1 xn
PositivePositive or negative: y
Ryu et al. “A hierarchical domain model-based multi-domain selection framework for multi-domain dialog systems,” Proc. Coling 2012 Ryu et al. “Exploiting out-of-vocabulary words for out-of-domain detection in dialog systems,” Proc. BigComp 2014 Ryu et al. “Detecting Multiple Domains from User’s Utterance in Spoken Dialog System,” Proc. IWSDS 2015
However, in the real world, users often express multi-domain requests or out-of-domain requests
We proposed a framework that performs multi-domain detection and out-of-domain detection
In each domain, various features are extracted from an input sentence and perform binary classification
Any news is on now?
User’s Utterance
Spoken Language Understanding
TV epg
Radio epg
-
Out-of-Domain / Domain Detection
Extraction of features
Input sentence
Part-of-speech tagging
Preprocessed sentence
Intent determination
NER
Intent determination model
NER model
Lexical LM scoring
Intent and NEco-occurrence table
Lexical LM score
LSP LM scoring
LSP LM score
Intent
Named entities
Mapping
Semantic consistency
Lexical LM
LSP LM
LSP lexicon
x1: confidence score of intent determination
x2: confidence score of named entity recognition
x3: semantic consistency of intent and named entities
x4: probability of the input sentence (lexical LM)
x5: probability of the lexico-semantic pattern of the input sentence (LSP LM)
※We are currently working on exploiting distributed word representation in language modeling
-
POMDP-DM with Hybrid Architecture
• Motivation of proposed method
– Uncertainty problem in deterministic DM
• Difficulty in taking proper actions for a given ambiguous input
– Scalability problem in POMDP-DM
• Difficulty in designing / tracking the dialog state
• Difficulty in training the POMDP policy
• Difficulty in eliciting system actions
• Core idea of the hybrid architecture
– Generate summary meta-actions with the POMDP framework
– Translate the actions into system output with the deterministic framework
-
POMDP-DM with Hybrid Architecture
• Concept diagram of proposed architecture
Ambiguous Input Meta Action
Meta Action Selector Service DM
Input
CorrespondingComponent
OutputMeta Action = Confirm
System Action
POMDPAction Selector
Service DM(Rule-based DM,
Example-based DM)
Meta Action = Submit
-
POMDP-DM with Hybrid Architecture
• Main architecture of proposed architecture
Tracker Part
TrackerModel
FeatureExtractor
Meta action selector
POMDPAction
Selector
SummaryState
Service Provider
ServiceDialog
Management
SlotDB
ResponseDB
User Input Recognition
ASR/NLUResult
Corresponding Architecture
POMDPModel
Ambiguous User Input
ASR/NLUResult
Tracked Result
MetaAction
PhonemeMatcher
Confirm 1st value
Request Slot Value
Provide Service Sentence
POMDPArchitecture
-
POMDP-DM with Hybrid Architecture
• Tracking Belief State
– Estimation of observation o from the NLU hypothesis H: P(o|H)
• Phoneme/word-level matcher
• Example: P(o_west | area = west) ≈ 0.78, P(o_east | area = west) ≈ 0.0
– Estimation of belief update: b'(s') = P(s'|s, a, o)
• Rule-based tracking to relieve computational complexity
S1: How may I help you?
U1: western food please?
S2: You mean west restaurant?
U2: No, I don't mean it.
(Figure: the belief over the Area slot values south / north / east / west / none, before and after each update.)
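A rule-style belief update over the Area slot can be sketched with the textbook POMDP filter b'(s') ∝ P(o|s')·Σ_s P(s'|s,a)·b(s); assuming the user's goal does not change between turns, the transition term drops out. The observation likelihoods below are illustrative, not the system's real matcher scores:

```python
# Belief update over the Area slot (south/north/east/west/none),
# assuming a static user goal, so b'(s) is proportional to P(o|s) * b(s).

def update_belief(belief, obs_likelihood, floor=0.01):
    """Multiply in observation likelihoods and renormalize."""
    new = {s: obs_likelihood.get(s, floor) * p for s, p in belief.items()}
    total = sum(new.values())
    return {s: p / total for s, p in new.items()}

belief = {s: 0.2 for s in ["south", "north", "east", "west", "none"]}
# U1: "western food please?" -> observation strongly suggests west
belief = update_belief(belief, {"west": 0.78})
```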
-
POMDP-DM with Hybrid Architecture
• Generating Meta-Action
– Construction of the summary state
• Build a summary state for the 1st and 2nd values of each slot
• Also build a "user intention slot" summary state for the 1st and 2nd values
Blaise Thomson and Steve Young. "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems." Computer Speech & Language, 24(4):562–588, 2010
-
POMDP-DM with Hybrid Architecture
• Generating Meta-Action (cont.)– Construction of POMDP framework
• Construct separate POMDP framework for UI slot and NE slots• Train each POMDP framework independently
(Figure: for the UI slot and each NE slot, the belief b'(s) over the 1st and 2nd candidate values feeds a separate POMDP action selector; the selected meta responses (e.g., Submit, Restart) pass through the system action model, with the template DB and slot DB, to produce the output sentence.)
-
POMDP-DM with Hybrid Architecture
• Generating Meta-Action (multiple slot values)– Construction of POMDP framework
• Construct separate POMDP framework for UI slot and NE slots• Train each POMDP framework independently
(Figure: the same architecture as above, extended with model construction and independent training stages for the UI-slot POMDP model and the NE-slot POMDP models.)
-
POMDP-DM with Hybrid Architecture
• Experiment (change of reward over the learning curve)
– Observing the learning curve during training
• Each POMDP component was trained for 400 epochs; convergence of the average reward was observed
(Plots: average reward vs. epoch over 400 training epochs, "Average Reward [UI Slot]" and "Average Reward [NE Slot]"; both curves converge.)
Sangjun Koo, Seonghan Ryu, Kyusong Lee, Gary Geunbae Lee. Scalable summary-state pomdp hybrid dialog system for multiple goal drifting requests and massive slot entity instances. Proceedings of the international workshop series on spoken dialog systems (IWSDS 2015), Jan 2015, Busan
-
Ontology-based Inference System [1/3]
• Ontology-based inference– Integrate cross-domain knowledge by ontology and its inference rules
• Used for :– IOT dialog system– Smart home– Smart healthcare
-
Ontology-based Inference System [2/3]
• Ontology
– OWL: a family of knowledge representation languages for knowledge bases
• Inference rules
– SWRL: OWL-DL + RuleML (the Datalog sublanguage of Horn clauses)
FastComputer(?c) ← Computer(?c) ∧ hasCPU(?c, ?cpu) ∧ hasSpeed(?cpu, ?sp) ∧ HighSpeed(?sp)
-
Ontology-based Inference System [3/3]
Spoken Language Understanding
Natural Language Generation
Dialog Manager
Dialog Modeling
Knowledge Manager
Inference Engine
Ontology Resource
ASR output
Named entitiesUser intentions
System response
KnowledgeSystem action
Generated querystatement
-
An Example Scenario of Inference Process
Semantic Representation for Input Sentence
raw utterance sentence: I want to eat something spicy.
intent: ask_food_recommendation
named entity: something spicy
After Searching Ontologies
intent: ask_food_recommendation_in_fridge
named entity: something spicy
Knowledge for User/Environments
speaker: Tom
favorite food: Tteok-bokki, Spaghetti, ...
fridge materials: Tteok, hot pepper paste, spring onion, ...
Knowledge for Reasoning
premise 1: Tom likes Tteok-bokki, Spaghetti, ...
premise 2: In the fridge, there are Tteok, hot pepper paste, spring onion, ...
premise 3: The recipe of Tteok-bokki is Tteok, hot pepper paste, and spring onion
premise 4: Tteok-bokki is a kind of spicy food
premise 5: Tom now wants to eat something spicy
Output of DM
system action: suggest_food
output sentence template: How about eating {food}?
output sentence: How about eating Tteok-bokki?
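The reasoning over the listed premises can be sketched as naive forward chaining; the Python sets and the single rule below are simplified stand-ins for the actual OWL/SWRL machinery, kept only to show how the premises combine into the suggestion:

```python
# Toy inference over the scenario's premises: suggest a spicy food
# that the speaker likes and the fridge contents can make.
likes = {"Tom": {"Tteok-bokki", "Spaghetti"}}
fridge = {"Tteok", "hot pepper paste", "spring onion"}
recipes = {"Tteok-bokki": {"Tteok", "hot pepper paste", "spring onion"}}
spicy_foods = {"Tteok-bokki"}

def suggest(speaker):
    """Rule: recipe ingredients in fridge AND spicy AND liked -> suggest."""
    for food, ingredients in recipes.items():
        if ingredients <= fridge and food in spicy_foods and food in likes[speaker]:
            return "How about eating {}?".format(food)
    return None

answer = suggest("Tom")
```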
-
Experimental Result
Before inference
Added weather info
After inference
Jaiyoun Song, Seonghan Ryu, Sangjun Koo, Gary Geunbae Lee. Ontology reasoning-based intelligent assistant for smart home. Proceedings of the IEEE spoken language technology workshop (SLT 2014), Dec 2014, Nevada (demo presentation)
-
Deep Neural Network
Deep Neural Network
Neural network + Multiple non-linear hidden layers
-
Why Deep Neural Network?
Deep levels of abstraction mimic the human cognition process:
from low level (simple) to high level (complex)
-
Why Deep Neural Network?
Integrated learning Automatic feature extraction
Traditional machine learning methods
Deep neural network : Feature extractor + Classifier
-
Deep Neural Network Today
Difficulties of DNN
1. Difficult to train: plain back-propagation struggles in deep networks (vanishing gradients) → unsupervised pre-training
2. Computation-intensive: many parameters → GPGPU, cloud computing
3. Over-fitting → regularization, drop-out technique
-
Deep Neural Network Today
Deep Belief Network [Hinton 06]
Pre-train the layers with an unsupervised learning algorithm
Then fine-tune the whole network by supervised learning
DBNs are stacks of restricted Boltzmann machines (RBMs)
-
Deep Neural Network Today
Autoencoder
An NN whose output is the same as its input
To learn is to compress the data
The autoencoder learns an encoding (representation) of the data
Learn to minimize Σ_i ‖y_i − x_i‖²
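The reconstruction objective Σ_i ‖y_i − x_i‖² can be computed directly; a sketch with toy vectors (no actual encoder/decoder network, just the loss the autoencoder minimizes):

```python
# Autoencoder reconstruction loss: sum of squared errors between the
# input x and its reconstruction y (toy numbers, no real network).

def reconstruction_loss(xs, ys):
    """Sum of squared differences between inputs and reconstructions."""
    return sum((y - x) ** 2 for x, y in zip(xs, ys))

x = [1.0, 0.0, 2.0]
y = [0.5, 0.0, 2.5]  # imperfect reconstruction
loss = reconstruction_loss(x, y)
```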
-
Deep Neural Network Today
Drop-out method [Hinton 12]
Drop out some weights randomly
Can reduce over-fitting problem
-
Deep Neural Network Today
Rectified linear unit (ReLU) Activation function used instead of sigmoid
Sparse coding : only some neurons have non-zero values
g(x) = max(0, x)
79
-
Deep Neural Network Today
Convolutional neural network [LeCun 98] Sparse network with local features within the window only
The weights are shared between windows
Popular for image recognition
Two kinds of layers, alternating Convolution layer : Extract features from the previous layer
Max-pooling layer : Sub-sample by taking the maximum
80
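The two layer types can be sketched in one dimension (a toy illustration with hand-picked weights, not LeNet itself):

```python
def conv1d(xs, w):
    """Convolution layer: slide the shared weight window w over xs.

    The same weights are used at every position (weight sharing), and each
    output value depends only on a local window of the input.
    """
    k = len(w)
    return [sum(w[j] * xs[i + j] for j in range(k))
            for i in range(len(xs) - k + 1)]

def max_pool(xs, size=2):
    """Max-pooling layer: sub-sample by taking the maximum of each window."""
    return [max(xs[i:i + size]) for i in range(0, len(xs), size)]
```

For example, the window `[1, -1]` acts as a simple difference detector over the input sequence.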
-
Deep Neural Network Today for NLP
Word representation 1. One-hot vector
Ex. [0 0 0 0 0 …… 0 0 0 1 0 0 …… 0 0 0 0 0 0 0]
High dimension : 20K (speech) ~ 3M (Google 1T)
2. Class-based word representation
Hard clustering
Ex. Brown clustering (Brown et al. 1992)
3. Continuous representation
Ex. Latent semantic analysis (LSA)
Random projection
Latent Dirichlet Allocation (LDA)
HMM clustering
Distributed representations (Neural word embedding)
Dense vector
Used for pre-training; supervised training further improves the representation
81
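Option 1 (the one-hot vector) is easy to make concrete; the sketch below is illustrative, with a tiny vocabulary standing in for the 20K–3M-word case:

```python
def one_hot(word, vocab):
    """One-hot representation: a |vocab|-dimensional vector with a single 1
    at the word's index. Note every pair of distinct words is equally
    dissimilar, which is why dense (distributed) representations help.
    """
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec
```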
-
Deep Neural Network Today
Neural network language model (NNLM) Language model
Model to predict the next word given the context
NN language model
Two hidden layers
Training complexity is high
Between hidden output
Ex. Hierarchical softmax
Negative sampling
Ranking (hinge loss)
w(t-3)
w(t-2)
w(t-1)
w(t)
input projection hidden output
82
-
Deep Neural Network Today
Neural network language model (NNLM) Negative sampling (unsupervised training)
A word and its context is a positive sample
A random word in that context is a negative sample
Trained to be Score(positive) > Score(negative)
w(t-3)
w(t-2)
w(t-1)
w(t)
input projection hidden output
83
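The ranking objective behind negative sampling can be sketched as a hinge loss (a simplified stand-in for the NNLM's actual training criterion):

```python
def hinge_loss(score_pos, score_neg, margin=1.0):
    """Ranking (hinge) loss for one (positive, negative) pair.

    Zero once Score(positive) exceeds Score(negative) by at least the
    margin, so training pushes real (word, context) pairs above pairs
    with a randomly sampled word in the same context.
    """
    return max(0.0, margin - score_pos + score_neg)
```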
-
Deep Neural Network Today
Word2vec Remove the hidden layer
1000x speed-up
Continuous bag-of-words (CBoW)
Predicts the current word given the context
Skip-gram
Predicts the context given the word
w(t-3)
w(t-2)
w(t-1)
w(t)
input projection output
84
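The difference between the two architectures shows up already in how training pairs are generated (a sketch; the real word2vec implementation also subsamples frequent words, shrinks windows randomly, etc.):

```python
def training_pairs(tokens, window=2, mode="skipgram"):
    """CBOW: (context words) -> current word.
    Skip-gram: current word -> each context word.
    """
    pairs = []
    for i, w in enumerate(tokens):
        # Context = up to `window` words on each side of position i.
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if mode == "cbow":
            pairs.append((ctx, w))
        else:
            pairs.extend((w, c) for c in ctx)
    return pairs
```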
-
DNN based Korean Dependency Parsing [Changki Lee,2014]
Transition-based + Backward parsing O(N)
Constituency corpus → Dependency corpus (after pre-processing)
Deep learning based ReLU + Dropout
Better than sigmoid
Korean Word Embedding NNLM, Ranking (hinge loss, logit loss)
Word2vec
Feature Embedding Auto tagged PoS (stack + buffer)
Dependency label (stack)
Distance information
Valency information
Mutual information (from massive automatically parsed corpus)
85
-
DNN based Multi-lingual Multi-task NLP [postech on-going]
Collecting corpus for all languages, all tasks is impossible
Adaptive learning
Methods to use one language/task’s information for another language/task
Distributed word embedding is essential
Language transfer One language to another language
Pre-train with one language and further train with another language
Multi-task learning
One task to another task
Hidden layers : trained with all tasks’ data
Output layer : trained with task-specific data Parsing
Tagging
Semanticrole labeling
*Sooncheol Kwon, Byungsoo Kim, Seonyeong Park, Sangdo Han, Gary Geunbae Lee. Multi-lingual knowledge transfer for dependency parsing using deep neural network, submitted
86
-
Demo – youtube postech isoft https://www.youtube.com/watch?v=4jg0Tknl-Rw
multi-domain dialog system multi-modal dialog system (smart home) Inference dialog system One-step asr error correction
-
Multi-strategy knowledge search
Question Answering (QA) systems
-
89
Multi-Source Hybrid Question Answering (QA) System
[Architecture diagram] User input is first classified as a natural-language question or a keyword query.
Question processing: question analysis extracts keywords, answer type, and slot/template information.
SPARQL query search: keywords are mapped to entities and properties, a SPARQL query is generated and run against DBpedia and a web-based RDF triple database (triples extracted offline from web documents (Wikipedia) by Open Information Extraction, then indexed).
IR/web search (Lucene): document processing retrieves relevant web documents; passage retrieval and possible answer extraction & formulation follow.
Answer processing: answer ranking, merging, and selection; a natural language generator with a template DB produces the final report/answer.
-
90
Section
[Pipeline diagram] RDF knowledge base (human-curated knowledge base)
Features for query: focus, LAT, query template, graph pattern for SPARQL
Lexical alteration: synonyms/alternative words
Disambiguation: KB property/entity
Answer candidate generation → merge retrieved candidates → real SPARQL generation
• Template-based approach [Unger et al, WWW 2012]
Knowledge based QA System
Seonyeong Park, Hyosup Shim, Gary Geunbae Lee. Isoft at QALD-4: Semantic similarity based question answering system over linked data. in Cappellato, L., Ferro, N., Halvey, M., and Kraaij, W., (eds.), CLEF 2014 Labs and Workshops, Notebook Papers (Qald task). CEUR Workshop Proceedings, vol-1180, CEUR-WS.org (2014). Sept 2014, Sheffield
-
91
Section
Knowledge based QA System
Semantic parsing based approach [Berant et al, EMNLP 2013]
Where was Obama born?
-
92
QA on Knowledge base Section
• Semantic parsing based approach
– Maps natural language sentences to formal semantic representations
– Independent of word order and paraphrasing
– Translating into a KB query language is relatively easy
– Requires
• Well-defined formal representation• Set of concepts (Knowledge base)
– Previous research focused on toy-sized KBs; recent work utilizes bigger, more general KBs such as DBpedia and Freebase (e.g., SEMPRE, ParaSempre).
-
93
QA on Knowledge base Section
• Semantic parsing based approach [Berant & Liang, ACL 2014]– Process
• Making segmentation– Generate segmentations of sentence
• Translate segments into KB vocabulary– Lookup each segment from KB vocabulary– Keep two dictionaries of KB concepts
» Dictionary of entity» Dictionary of property
– Match named entity by string similarity– Match property by natural language – property model
• Combining– Combine segment into single formal representation– Performed based on combining rule
• Rewrite to query language
-
94
QA on Knowledge base Section
• Semantic parsing based approach– Example
• Question : Where was Barack Obama born?• Segmentation
– [Where] was [Barack Obama] [born] ?
• Disambiguation
– [Where] → Type.Location
– [Barack Obama] → Barack_Obama
– [born] → PeopleBornHere
• Combining
– Type.Location ^ PeopleBornHere.Barack_Obama
-
95
QA on Knowledge base Section
• Compiling natural language-to-property mapping– Aligning approach
• Align pseudo triples from text to ones from KB– Extract pseudo triple from text
» Ex> Mary Todd married Abraham Lincoln on November 4, 1842»
– Disambiguate to KB entities» »
– Align to existing “real” triples in KB» pseudo triple» real triple from KB
– Collect matched phrase-property pairs from aligned triples» prefix “!” means reverse order
-
96
QA on Knowledge base Section
• Experiment
– Test set• 138 questions from the WebQuestions training set• Wh-questions about figures with a single answer• Containing no alternative forms of named entities
precision recall F1-score
0.7301 0.8911 0.8025
Seonyeong PARK, Hyosup SHIM, Sangdo HAN, Byeongsoo KIM, Gary Geunbae LEE. Multi-source hybrid question answering system. Proceedings of the international workshop series on spoken dialog systems (IWSDS 2015), Jan 2015, Busan (demo presentation)
-
97
Open Information Extraction [Etzioni, Wu@UW] Section
• Extract triples from a sentence.– triple format : < argument1 ; relation ; argument2 >– argument : noun phrases in a sentence– relation : phrase showing the relationship between two arguments
• Ex) sentence : Gautama Buddha taught primarily in Northeastern Indiatriple : < Gautama Buddha ; taught in ; Northeastern India >
• Open IE does not require any pre-specified relations.• Suitable for IE on the Web scale.
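A toy illustration of the triple format using one hard-coded surface pattern (real Open IE systems such as WOE or ReVerb learn their extraction patterns from parses instead of hard-coding them):

```python
import re

# Toy pattern "<Arg1> <relation ending in 'in'> <Arg2>", where arguments
# are capitalized word sequences and the relation is lowercase words.
PATTERN = re.compile(
    r"^(?P<arg1>[A-Z]\w+(?: [A-Z]\w+)*) "
    r"(?P<rel>[a-z]\w*(?: [a-z]\w*)* in) "
    r"(?P<arg2>[A-Z]\w+(?: [A-Z]\w+)*)$"
)

def extract_triple(sentence):
    """Return < arg1 ; relation ; arg2 > as a tuple, or None if no match."""
    m = PATTERN.match(sentence)
    return (m.group("arg1"), m.group("rel"), m.group("arg2")) if m else None
```

On the example sentence above, this yields the triple < Gautama Buddha ; taught primarily in ; Northeastern India >.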
-
98
Open Information ExtractionSection
Dependency Pattern based IE [WOE, Wu&Weld, ACL 2010]• Extraction template :
• Ex)
• Learning extraction template– collect training data (triple-sentence pair) automatically (bootstrapping)– learn extraction template from training data
arg1 arg2rel
nsubj prep_in< arg1 ; rel in ; arg2 >
Gautama Buddha taught primarily in Northeastern India
nsubjadvmod
nnamod
prep_in
triple : < Guatama Buddha ; taught in ; Northeastern India >
-
99
Open Information ExtractionSection
SRL based IE [Christensen et al, NAACL workshop 2010]• SRL : identifying arguments of a predicate with their roles
– possible to convert SRL result to Open IE triples
• Ex) Eli Whitney created the cotton gin in 1793– SRL result
• predicate : created• arg0 : Whitney (Eli Whitney)• arg1 : gin (cotton gin)• argm-TMP : in (in 1793)
– conversion to Open IE triple style• < Eli Whitney ; created ; cotton gin >• < Eli Whitney ; created cotton gin in ; 1793 >
-
100
Open Information ExtractionSection
Current Implementation (postech, combined)• Bootstrapped Dependency pattern + SRL result
– SRL : can only extract verb-mediated relations, with relatively high precision
– Bootstrapped dependency pattern : can extract both verb- and noun-mediated relations
– Ex) Princeton economist Paul Krugman was awarded the Nobel prize• verb-mediated relation :
– < Princeton economist Paul Krugman ; was awarded ; the Nobel prize >
• noun-mediated relation : – < Princeton ; economist ; Paul Krugman >
– Apply SRL-based extraction to verb-mediated relations, and dependency-pattern-based extraction to noun-mediated relations
-
101
Open Information ExtractionSection
Experiment • Test Data
– Data from “Fader et al, Identifying Relations for Open Information Extraction, 2011, EMNLP”
– Randomly selected 500 web sentences; we used 100 of them
• Result–
*Byungsoo KIM, Hyosup SHIM, Sangdo HAN, Soonchoul KWON, Seonyeong PARK, Gary Geunbae LEE. Relation disambiguation using ontology type checking and semantic relatedness. Submitted
-
102
Applying Open IE to Knowledge Base Section
Knowledge Base Augmentation• Triples extracted from Open IE can be used to augment
existing knowledge bases– need argument and relation mapping to canonical form on the ontology
(disambiguation)
• Ex) Einstein married Elsa Lowenthal on 2 June 1919.– triple from Open IE
• < Einstein ; married ; Elsa Lowenthal >– disambiguation
• Einstein → Albert_Einstein• Elsa Lowenthal → Elsa_Einstein• married → spouse
– DBpedia ontology RDF triple• < dbr:Albert_Einstein ; dbo:spouse ; dbr:Elsa_Einstein >
-
103
Applying Open IE to Knowledge BaseSection
Disambiguation with Constraint• Relation phrases on the ontology have proper argument type.• Ex) < Alain_Connes ; birthPlace ; Draguignan >
< Ayn_Rand ; birthPlace ; Saint_Petersburg >< type:Person ; birthPlace ; type:Place >
< The_Birth_of_a_Nation ; director ; D._W._Griffith >
< type:Film ; director ; type:Person >
• Use this argument type constraint when disambiguating relation– disambiguate arguments first, then use argument type information for
relation disambiguation
-
104
Scenario• Find the appropriate answer to the user question using raw text
data
Information Retrieval-based QA
1. Where was Kim Yuna born?
2. When did Kim Yuna win a gold medal?
3. Who is Sotnikova?
User’s Question Raw Text
Text data
Answer
Bucheon, South Korea
Section
-
105
• Architecture
Information Retrieval-based QA
Question Processing
Answer Type Detection
Entity extraction
(NER)
Triple extraction
(Parser, SRL)
Document Processing
Question
PassagesScoring
Answer type Mapping to DBpedia
Answer Type, Keywords
Answer
Answer ProcessingAnswer Candidates
Extraction
Documents scoring
Passages
Text(Wikipedia) Database
Relevant Documents
Answer Selection
Answer Type
Section
-
106
• Answer Type is important !– It reduces the search space for finding the answer.– Regard the answer type as the type of a named entity – Use the ontology in the knowledge base.
Open Domain Semantic Answer Type
[Ontology fragment] Thing → {Activity, Agent, Drug, Event, …}; Agent → Person → Athlete → {Golf player, Swimmer, Tennis player, Wrestler, …}; Activity → {Game, Sport}
Get more detailed information about the answer
Section
Seonyeong Park, Donghyeon Lee, Seonghan Ryu, Byungsoo Kim, Gary Geunbae Lee. Hierarchical dirichlet process topic modelling for flexible answer type classification in open domain question answering. Proceedings of the 10th Asian information retrieval society conference (AIRS 2014), Sarawak, Dec 2014
-
107
• Hard to detect the answer type in the question using only lexical information.– Ex) Q: Who composed the “magic waltz”? Answer Type: composer– Ex) Q: What did Bruce Carver die from? Answer type : reason
• Use Semantic Information in the Question
Open Domain Semantic Answer Type
Extract various information of input question
Input Questionquestion –answer pair
web log
Map the information to the class in ontology (need Inference)
Answer Type
Section
-
108
– Semantic answer type detector using Knowledgebase
Open Domain Semantic Answer Type Section
Extract property semantically similar with main verb in
DBpedia
Main verb, Focus, parsing result
DBpedia
Previous Hybrid(rule + supervised learning) Answer
type classifier
Question Parsing(Parser, SRL)
user question
Detect type of focus using type info of each property in
DBpedia
No
yes
Answer type Previous small size of answer type Ontology
Measuring semantic similarity between previous ontology and
DBpedia ontology
Answer type in DBpedia ontology
Answer type in DBpedia ontology
Focus: the focus is the word that will be replaced by the answer; therefore, the type of the focus is the same as the answer type.
-
109
– Example of Semantic answer type detector using Knowledgebase• Example 1
– Q: Who has been married to Tom Cruise?– Main verb: married– Focus: “Who”– Parsing information: who is the subject of married (main verb)– property semantically similar with main verb: wife– type of property information:– Answer type : person
• Example 2– Q: Who resides in the high-rise?– Main verb: resides– Focus: “Who”– Parsing information: who is the subject of resides(main verb)– property semantically similar with main verb: residence– type of property information:– Answer type: person
Open Domain Semantic Answer Type Section
Gabrilovich, Evgeniy, and Shaul Markovitch. "Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis." IJCAI. Vol. 7. 2007.
-
110
– Extract Answer Candidates using open domain semantic answer type detector
Open Domain Semantic Answer Type Section
Keywords of the user question
Answer Type check
Answer Type
passages
Extract entities & recognize the type of entities
Answer Candidates
Open DomainSemantic Answer
Type Detector
user question
DBpediaSparql
Hoffart, Johannes, et al. "Robust disambiguation of named entities in text."Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011.
DBpediaOntology
-
111
• Sentence Scoring in Passage– Kidman has been married twice: first to actor Tom Cruise, and now to country singer Keith Urban.
She has an adopted son and daughter with Cruise as well as two biological daughters. …. Cruise Kidman is the adopted son of actors Tom Cruise and Nicole Kidman. ….. An example of the extravagance of wedding locations is when Tom Cruise married Katie Holmes…… / …He was replaced as team leader by [[Ethan Hunt]] ([[Tom Cruise]]) after he was revealed: Impossible (film)……………………
• Measuring sentence importance– Measuring similarity between the text and the question– Content Selection : choose sentences to extract from the document– Query-Focused Multi-Document Summarization
Answer Candidate Selection Section
-
112
• Select Important Sentence– We consider various similarity measures
• Term similarity (Jaccard coefficient)
– We can use not only answer type but also other information
• Compare not only Answer type and entity similarity but also semantic and syntactic structure of sentences.
• syntactic (Dependency Parser ) • semantic level (Semantic role labeler)
Answer Candidate Selection Section
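The term-similarity measure mentioned above (the Jaccard coefficient over word sets) in a minimal sketch:

```python
def jaccard(s1, s2):
    """Jaccard coefficient between the term sets of two sentences:
    |intersection| / |union| of their lowercased word sets."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0
```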
-
113
Textual Entailment Answer Selection
T: Passages
H: Substituted sentence
Check the type of entities in the sentences (same as answer type or not) and select
important sentences
Answer Type
Answer Candidates
Answer Candidates scoring using Textual Entailment
Substitute Focus with Answer candidates
Answer
Section
Patent pending
KBQA Answer
-
114
• Textual Entailment– Text(t) : Entailing text ( Candidate sentence)– Hypothesis(h) : Entailed text (question)– Ex)
• true entailment– t : John is a fluent French speaker– h : John speaks French
• false entailment– t : John was born in France– h : John speaks French
– given t/h pair, cast the textual entailment task as a classification problem
Textual Entailment Answer Selection Section
-
115
• Textual Entailment– Other Textual Entailment Architecture
Textual Entailment Answer Selection Section
Collections of NLPApache UIMA framework
Standardized algorithms and knowledge resources (lexical/syntactic resources). Different approaches: transformation-based, edit-distance-based, and classification-based
Magnini, Bernardo, et al. "The Excitement Open Platform for Textual Inferences." ACL demo
-
Keyword QA
Goal Get keywords as input, return report answer
messi team manager
Lionel_Messi play at FC_Barcelona, Argentina_national_football_team.
Tito_Vilanova is manager of FC_Barcelona.
Extracted Data
Search from Database
116
-
Keyword QA
Query Generator
Natural Language Generator
KeywordTo
Entity
KeywordTo
Property
TemplateDB
Knowledge DB
Triple Extractor
Keyword Process
DB Entity DB PropertyProperty
Candidates
Keyword Input
Report
SPARQL QueryQuery
TripleTriple Set
Messi team manager
Lionel_Messi team
Person/heightbirthDatebirthPlacecareerStationNumberPositionTeam…
SELECT ?p ?o WHERE…
Lionel_Messi team FC_Barcelona
Lionel_Messi team FC_BarcelonaFC_Barcelona manager Tito_Vilanova
…
FC_Barcelona is Lionel_Messi’s team. FC_Barcelona’s manager is Tito_Vilanova.
Keyword Segmentator Messi, team, manager
117
http://...lionel_messi/
-
Keyword QA
118
Keyword Segmentation Segmentation using Lexicon
Wikipedia Lexicon + additional lexicon
Longest Match
Lionel messi team manager
Lionel messi, messi, team, manager,
birthday, …
Lionel messi, team, manager
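The greedy longest-match segmentation can be sketched as follows (a simplification; the actual system matches against a Wikipedia-derived lexicon plus an additional lexicon):

```python
def segment(query_tokens, lexicon):
    """Greedy longest-match segmentation: at each position take the longest
    token span found in the lexicon; if none matches, emit the single token."""
    segments, i = [], 0
    while i < len(query_tokens):
        # Try spans from longest to shortest starting at position i.
        for j in range(len(query_tokens), i, -1):
            cand = " ".join(query_tokens[i:j])
            if cand in lexicon or j == i + 1:
                segments.append(cand)
                i = j
                break
    return segments
```

For the example above, `segment("lionel messi team manager".split(), {"lionel messi", "team", "manager"})` yields `["lionel messi", "team", "manager"]`.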
-
Keyword QA
Keyword-Entity/Property Matching Module
Match user input keyword to entity/property of DB messi -> Lionel_Messi
Kim yuna -> Kim_Yu_Na
manager -> manager (property)
birth -> birthDate
AIDA (open source) Named entity disambiguation module
Match to wikipedia entity
ESA (open source) Word semantic similarity
Keyword-Entity Matching Module
Messi team
Lionel_Messi team
119
-
Keyword to Entity AIDA module (open-source)
Accurate Online Disambiguation of Named Entities
Find named entity, and match to Wikipedia page
Entity Matching
Input : “When did Barack Obama graduated Harvard Law School?”
120
-
Keyword to Property Semantic match between keyword & property
Explicit Semantic Analysis Module (open source)
Property Matching
Tom cruise
birthDate 1962-07-03
birthPlace Syracuse, New York, United States
religion Scientology
spouse Mimi_Rogers
starring Interview_with_the_vampirestarring Top_Gun
… …
Tom cruise’s triple example
Keyword ExampleTom cruise, birthday
121
-
Keyword to Property Semantic match between keyword & property
Explicit Semantic Analysis Module
Property Matching
Tom cruise
birthDate 1962-07-03
birthPlace Syracuse, New York, United States
religion Scientology
spouse Mimi_Rogers
starring Interview_with_the_vampirestarring Top_Gun
… …
Tom cruise’s triple example
Keyword ExampleTom cruise, wife
122
-
Keyword QA
Query Generator
SPARQL query generation Extract related triples
Rule-based
Query Generator
Lionel_Messi team
SELECT , ?p, ?o WHERE { …
123
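A rule-based generator can be as simple as filling a query skeleton (a hypothetical sketch; the `dbr:`/`dbo:` prefixes follow DBpedia convention, and the real system's rules are richer):

```python
def sparql_for(entity, prop=None):
    """Rule-based SPARQL generation: with a matched property, ask for its
    value; with only an entity, list all triples about that entity."""
    if prop:
        return f"SELECT ?o WHERE {{ dbr:{entity} dbo:{prop} ?o . }}"
    return f"SELECT ?p ?o WHERE {{ dbr:{entity} ?p ?o . }}"
```

For the keywords "messi, team", after entity/property matching this would produce a query for the `dbo:team` value of `dbr:Lionel_Messi`.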
-
Query Generating Policy
Lionel_Messi
Person/heightteam
birthPlace…
Argentina_national_football_team
FC_BarcelonaFC_Barcelona_B
…
coachstadiummanager
…169.2
ArgentinaRosario,_Santa_FeSanta_Fe_Province
…
capacitychairmanmanager
…
areaTotalcapital
currency…
: Entity
: Property
………
Input keywords : messi, fc barcelona, manager
124
-
Query Generating Policy
Lionel_Messi
Person/heightteam
birthPlace…
Argentina_national_football_teamFC_Barcelona
FC_Barcelona_B…
coachstadium
manager…
169.2
ArgentinaRosario,_Santa_FeSanta_Fe_Province
…
capacitychairmanmanager
…
areaTotalcapital
currency…
: Entity
: Property
………
Input keywords : messi, team, manager
125
-
Keyword QA
Report Generator
Report triple set, template Property-template matching data
534 templates generated
Report Generator
Barak Obama graduated ColumbiaUniversity, and
HarvardLawSchool. …
Barak_Obama almaMater ColumbiaUniversity
Barak_Obama almaMater HarvardLawSchool
… … …
almaMater graduated
birthPlace borned in
… …
Extracted Triple Set
Template Set
126
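The property-to-template matching can be sketched like this (the template strings are illustrative stand-ins for the 534 generated templates):

```python
# Hypothetical property -> sentence-template table.
TEMPLATES = {
    "almaMater": "{subj} graduated {obj}.",
    "birthPlace": "{subj} was born in {obj}.",
    "team": "{obj} is {subj}'s team.",
}

def generate_report(triples):
    """Fill the per-property template for each retrieved (subj, prop, obj)
    triple; triples whose property has no template are skipped."""
    sentences = []
    for subj, prop, obj in triples:
        tpl = TEMPLATES.get(prop)
        if tpl:
            sentences.append(tpl.format(subj=subj.replace("_", " "),
                                        obj=obj.replace("_", " ")))
    return " ".join(sentences)
```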
-
Keyword QA
NLG Template Generator
Automatic Template Extraction Wikipedia-dbpedia
Template Generator
teamposition
…(properties)
play at plays as a plays as a
127
-
Keyword QA
Keyword Segmentation (81.64%) Whole data – 670 keyword queries
Well-segmented – 547 queries
Error - 123 queries
Out of vocabulary (segmentation lexicon)
System Answer Accuracy (95.1%) Whole data – 670 keyword queries
Right answers – 637 answers
Wrong answers – 33 answers
Error case : property / entity matching error
Sangdo Han, Hyosup Shim, Byungsoo Kim, Seonyeong Park, Seonghan Ryu, Gary Geunbae Lee. Keyword question answering system with report generation for linked data. Proceedings of the 2015 International Conference on Big Data and Smart Computing (BigComp 2015), Jeju, Feb 2015 (short paper) 128
-
Demo movie Youtube postech isoft QA
http://www.youtube.com/watch?v=P6yL5QiJQo0 KBQA IRQA Keyword QA OpenIE
http://www.youtube.com/watch?v=P6yL5QiJQo0
-
Multi-party Open Proactive Dialog Systems
-
131
• Overall Architecture
Overall Architecture
Module Description
Situation Feature Extraction Extract Feature from Situation (Voice Activity Detection, Speaker Detection, Previous Info Stacking, ETC..)
Dialog Engagement Classify Dialog Engagement for Each Speaker ID
Speaker Identification Classify Speaker and Assign New Speaker ID
Always Listening ASR Recognize All Speech Sound
Sentence Formation Regularity Checking Check Sentence Formation Regularity for Dialog Situation Feature
Speaker ID Assignment to Sentence Assign Speaker ID to All Sentence from ASR
ASR Error Correction Correct ASR Error before Passing Sentence to the Next Step
Multiparty Language Understanding Language Understanding for Multiparty
Multiparty Dialog System Dialog System for Multiparty
Dialog Engagement
(CNN)
Always Listening ASR (Voice
Activity Detection )
Sound Signal
SituationFeature
Extraction
Dialog Situation(Vision)
Speaker Identification
Natural Language
Understanding &
Dialog Management
For Multiparty
Speaker IDAssignment to
Sentence
Sentence Formation Regularity
Checking (RNN)
ASR Error Correction
(RNN + Several Method)
-
132
• Dialog Engagement– Dialog Engagement to PC– Classify Dialog Engagement for Each Speaker ID
• ScenarioA: Let’s have a dinner outside, in some fancy restaurant!B: Great! Where should we go to?C: I like FANCYFANCY restaurant.A: Yeah, FF restaurant is good.B: Then let’s go there to have dinner.A: But we brought our car in for servicing this morning. How can we go there?B: Maybe we should take a taxi and go to the repair shop. Where was it?PosChat: NiceCar repair shop is where you brought your car in.B: Okay. Then we go to the shop and go to there for dinner. Make a reservation at 7 p.m., PosChat.PosChat: Okay. I’ll make a reservation to FF restaurant for three people at 7 p.m.
Dialog Engagement
Engage
Engage
Engage
Non-engageNon-engage
Bohus, Dan, and Eric Horvitz. "Models for multiparty engagement in open-world dialog." Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, 2009.
-
133
• Dialog Engagement Architecture
Dialog Engagement
Engage
Non-engageNon-engage
Camera Face Direction
Lips Tracking
Voice Activity Detection
Always Listening ASR
Recognition Result
EngagementClassifier
Dialog Engagement
Model
DialogManagement
Learning by DNN
EngagementPrevious
State
Dialog ManagementPrevious
State
Junhwi Choi, Jeesoo Bang, Gary Geunbae Lee. “Multiparty open-world dialog system on NAO robot”. Proceedings of SLT 2014, Dec 2014, Nevada (demo presentation)
-
134
• Automatic ASR Error Correction– Two Step Process– ASR Error Detection
• Part of Speech Information based detection• Context based detection
– ASR Error Correction• Word Sequence Matching based Correction• Recurrent Neural Network based Correction
ASR Error Correction
Current Syllable
Previous Syllable Context
Confused Phoneme of Next Syllable
Probabilityof Next Syllable
ASR Error Correction Performance 1-WER
Baseline (no correction) 0.8357
Word Sequence Pattern Matching 0.8813 (27.8% Error Reduction)
Syllable RNN Only 0.8382 (1.5% Error Reduction)
Combined RNN (Syllable RNN + Phoneme RNN) 0.8480 (7.5% Error Reduction)
Word Sequence Pattern Matching + Combined RNN 0.8820 (28.1% Error Reduction)
ASR Error Detection Performance
F-Score DetectionAccuracy
POS Label Pattern 0.4744 0.8266
Word Dictionary by POS 0.3452 0.8653
Word Co-occurrence 0.4143 0.7587
Voting (threshold 2) 0.4967 0.8761
Voting (threshold 1) 0.4879 0.7337
Junhwi Choi, Donghyeon Lee, Seonghan Ryu, Kyusong Lee, Gary Geunbae Lee, ”Engine-independent ASR error management for dialog systems”, IWSDS 2014Junhwi Choi, Seonghan Ryu, Kyusong Lee, Younghee Kim, Jeesoo Bang, Seonyeong Park, and Gary Geunbae Lee. “ASR Independent Hybrid Recurrent Neural Network based Error Correction for Dialog System Applications”, Proceedings of the MA3HMI 2014 Workshop, Satellite workshop of INTERSPEECH 2014.
-
135
• Manual ASR Error Correction• ASR Error Correction Interface with Voice-only
• ScenarioA: Let’s have a dinner outside, in some fancy restaurant!B: Great! Where should we go to?C: I like FANCYFANCY restaurant.A: Yeah, FF restaurant is good.B: Then let’s go there to have dinner. Okay. Then we go to the shop and go to there for dinner. Make a reservation at 7 p.m., PosChat.PosChat: Okay. I’ll make a reservation to Effa restaurant for three people at 7 p.m.A: FF restaurant.PosChat: Okay. I’ll make a reservation to FF restaurant for three people at 7 p.m.
One-step Error Correction
User Utterance
Analysis Region
Detection
User Intention Understanding Correction
Proceed Dialog
Management
Confirmation(Optional)
Junhwi Choi, et al. "Seamless error correction interface for voice word processor." Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012.Junhwi Choi, Seonghan Ryu, Kyusong Lee, Younghee Kim, Jeesoo Bang, Seonyeong Park, and Gary Geunbae Lee. “ASR Independent Hybrid Recurrent Neural Network based Error Correction for Dialog System Applications”, Proceedings of the MA3HMI 2014 Workshop, Satellite workshop of INTERSPEECH 2014.
Understand User Intention
Proceed Correction
-
136
• User Intention Understanding– Characteristic of Clear Speech (Prosodic: Pitch, Duration, Intensity)
– Characteristic of ASR Error (Pronunciation Similarity)– Accuracy of User Intention: 84.62%
One-step Error Correction
Junhwi Choi, Seonghan Ryu, Younghee Kim, Gary Geunbae Lee. “One-step error detection and correction approach for voice word processor”, (In preparation)
-
137
• Scenario
Long-term Memory Chatting System
Hi, I’m John.
Nice to meet you.
I’ll remember that.
Do you know where I live?
You are living in Pohang.
I’m not good at foreign language.
I live in Pohang.
(I, be, John)(I, live in, Pohang)
(…, …, …)
Long-term Memory
User Utterance System Response
Hi, I’m Jane. Nice to meet you.
Do you know where I live? You are living in Seoul.
Do you know where I live? I don’t know about that.
I’m not good at foreign language.
But I heard that you can speak Chinese.
I’m not good at foreign language.
Then, can I help you?
Example Database
Then, can I help you?
Jeesoo Bang, Hyungjong Noh, Yonghee Kim, and Gary Geunbae Lee. Example-based Chat-oriented Dialogue System with Personalized Long-term Memory. Proceedings of the 2nd International Conference on Big Data and Smart Computing (BigComp 2015), 2015.
-
138
• Architecture of the Chatting System
Architecture of the Chatting System
-
139
1. Extract user-related facts (triples) from user inputs, and store them into the long-term memory
2. Modify the system response by applying user-related facts
3. Select the most appropriate response
Long-term Memory Chatting System
I’ll remember that. I live in Pohang.
(I, be, John)(I, live in, Pohang)
Long-term Memory
User utterance System response
Do you know where I live?
You are living in Seoul.
Do you know where I live?
You are living in Pohang.
-
140
1. Knowledge Extractor• Extract user-related facts from user inputs, and store them into the long-
term memory
• RDF-style triple: trp = (arg1, rel, arg2)– arg: noun phrase– rel: textual fragment indicating semantic relation between two args– E.g. I like red apples. (I, like, red apple)
Long-term Memory Chatting System
-
141
1. Knowledge Extractor• Long-term Memory (LTM)
– Define two types of triple patterns• Triple pattern with SBJ slot (e.g. (SBJ, be, my friend))• Triple pattern with OBJ slot (e.g. (I, like, OBJ))
– Matched triples are stored in the Long-term Memory
Long-term Memory Chatting System
Triple patterns(SBJ, be, my friend)
(I, like, OBJ)(My name, be, OBJ)(I, can speak, OBJ)
…(Harry, be, my friend)
Matched triples(Harry, be, my friend) Long-term
Memory
Personal Knowledge Manager
User InputHarry is my friend.
Knowledge Extractor
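Matching an extracted triple against the SBJ/OBJ slot patterns can be sketched as follows (the helper names are hypothetical; the real extractor works over RDF-style triples from parsed user input):

```python
def match_pattern(triple, pattern):
    """True if every pattern field is a slot ('SBJ'/'OBJ') or matches the
    corresponding triple field exactly."""
    slots = {"SBJ", "OBJ"}
    return all(p in slots or p == t for t, p in zip(triple, pattern))

# Two of the triple patterns from the slide.
PATTERNS = [("SBJ", "be", "my friend"), ("I", "like", "OBJ")]

def store_if_matched(triple, memory, patterns=PATTERNS):
    """Store a user-related triple in the long-term memory on any match."""
    if any(match_pattern(triple, p) for p in patterns):
        memory.append(triple)
```

For the input "Harry is my friend.", the extracted triple (Harry, be, my friend) matches the first pattern and is stored.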
-
142
2. Personal Knowledge Applier• Apply User-related facts to System Response Candidates
– For 𝑡𝑡𝑡𝑡𝑡𝑡𝑠𝑠𝑠𝑠 extracted from a response (candidate) 𝑠𝑠𝑠𝑠– Replace arg2 (arg1) of the 𝑡𝑡𝑡𝑡𝑡𝑡𝑠𝑠𝑠𝑠 with the arg2 (arg1) of a user-related triple,
when the two triples are similar enough except those arg2 (arg1)
Long-term Memory Chatting System
objectmy
name
is
Chuck
subject
predicateTriple extracted
from system response(candidate)
objectMy
name
is
Bruce
subject
predicateTriple inLong-term Memory
Oh, your name is Chuck. → Bruce
-
143
3. General Score• Put weight on the system response which is general• Assumption: general response has many similar responses in the example
database.
– E: the example database– e = (su, ss): an example; su is a user utterance, ss is a system response– sim(s1, s2): weighted dice similarity between two sentences s1, s2– … for any sentence s = {w1, w2, …, wn}; w is a word
– userIDF(w) = log(|E|/cnt(w)) … approximation for short sentences– cnt(w): the frequency (the number of occurrence) of the word w in E
Long-term Memory Chatting System
‖s‖ = Σ_{w∈s} userIDF(w)
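The definitions above can be sketched directly (treating the example database as a list of sentences for simplicity):

```python
import math

def user_idf(word, example_sentences):
    """userIDF(w) = log(|E| / cnt(w)), where cnt(w) is the number of
    occurrences of w in the example database E."""
    cnt = sum(s.split().count(word) for s in example_sentences)
    return math.log(len(example_sentences) / cnt) if cnt else 0.0

def sentence_weight(sentence, example_sentences):
    """||s|| = sum of userIDF(w) over the words w in sentence s."""
    return sum(user_idf(w, example_sentences) for w in sentence.split())
```

A word that appears in every example contributes nothing, while rarer words dominate the weight, so a "general" response scores low on content words it does not share with the database.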
-
144
• Anaphor: a word or phrase that refers back to an earlier word or phrase– My mother said she was leaving.
• ScenarioA: My best friend is Seonghan. His favorite fruit is strawberry.B: I like strawberry, too.A: Oh, I didn’t know that.B: My best friend is Sangdo. He likes computer games. His favorite game is FIFA online.A: Today is Seonghan’s birthday. Could you recommend a present for Seonghan?PosChat: You can give him what he likes. You said Seonghan’s favorite fruit is strawberry.A: That’s a good idea. B: Hmm… I am bored. Do you have any recommendations?PosChat: You can play computer games with your best friend. Sangdo’s favorite game is FIFA online.
Multi-party Chatting System
-
145
• Discourse stack– Stores the contents of multiparty dialog texts in structured format for
anaphora resolution
Multi-party Chatting System
Sentence Information
That’s good idea. DA: statement
Could you recommend a present for Seonghan?
DA: yn_q
Today is Seonghan’s birthday. DA: statement
Oh, I didn’t know that. DA: statement
His favorite fruit is strawberry.
DA: statement
My best friend is Seonghan. DA: statement Person: Seonghan
Sentence Information
Do you have any recommendations?
DA: yn_q
Hmm… I am bored. DA: statement
His favorite game is FIFA online.
DA: statement
He likes computer games. DA: statement
My best friend is Sangdo. DA: statementPerson: Sangdo
I like strawberry, too. DA: statement
Sentence
You can play computer games with your best friend. Sangdo’s favorite game is FIFA online.
You can give him what he likes. You said Seonghan’sfavorite fruit is strawberry.
[I, like, strawberry]
…
[My best friend, be, Seonghan]
…
A stack B stack Poschat
A LTM B LTM
Junhwi Choi, Jeesoo Bang, Gary Geunbae Lee. “Multiparty open-world dialog system on NAO robot”. Proceedings of SLT 2014, Dec 2014, Nevada (demo presentation)
-
146
Distributed Word Representation Matching
Distributed word representation: an n-dimensional vector
– Can capture distributional syntactic and semantic information
Recursive Autoencoder (RAE)
– Combines word representations into vector representations of longer phrases
The cats catch mice
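A single RAE composition step can be sketched as below (a toy sketch with random, untrained parameters; a real RAE learns W and b by minimizing reconstruction error, and the tree structure is chosen greedily rather than fixed):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension

# Composition parameters: parent = tanh(W @ [c1; c2] + b)
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)

def compose(c1, c2):
    # Combine two child vectors into one parent vector of the same dimension
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

# Toy word vectors for "The cats catch mice"
words = {w: rng.standard_normal(d) for w in ["the", "cats", "catch", "mice"]}

# One possible tree: (the (cats (catch mice)))
p1 = compose(words["catch"], words["mice"])
p2 = compose(words["cats"], p1)
root = compose(words["the"], p2)  # vector for the whole phrase
```

Because every parent vector has the same dimension as a word vector, the composition can be applied recursively up the tree, yielding a vector for each node and ultimately for the whole sentence.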
-
147
Distributed Word Representation Matching
Paraphrase identification using distributed word representation
"The cats catch mice" vs. "Cats eat mice"

[Figure: pairwise similarities between the RAE tree nodes of the two sentences form a variable-sized similarity matrix; dynamic pooling reduces it to a fixed-sized matrix, which a softmax classifier uses for the paraphrase decision]
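Dynamic pooling itself is easy to sketch: split the variable-sized similarity matrix into an n_p × n_p grid of regions and pool each region (Socher et al. use min pooling, since their matrix holds Euclidean distances). A minimal numpy version, assuming the input matrix is at least n_p × n_p:

```python
import numpy as np

def dynamic_pool(S, n_p):
    """Min-pool a variable-sized matrix S down to a fixed n_p x n_p matrix."""
    rows = np.array_split(np.arange(S.shape[0]), n_p)
    cols = np.array_split(np.arange(S.shape[1]), n_p)
    pooled = np.empty((n_p, n_p))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            pooled[i, j] = S[np.ix_(r, c)].min()
    return pooled

# 7 RAE tree nodes for one sentence vs. 5 for the other
S = np.random.default_rng(1).random((7, 5))
fixed = dynamic_pool(S, 4)  # always 4 x 4, whatever the input size
```

The fixed output size is what allows a standard softmax classifier to be trained on sentence pairs of arbitrary lengths.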
Socher, Richard, et al. "Dynamic pooling and unfolding recursive autoencoders for paraphrase detection." Advances in Neural Information Processing Systems. 2011.
-
Emotional Dialog System - Issue: Emotion-Based Dialog Strategy
• Emotional Strategy
– Video / Audio input
• Facial expression: angle of mouth, angle of eyes
• Prosody, accent
– Emotion detection
• Angry, Sad, Happy, Scared
– Emotional keyword detection
• Ex) 슬퍼 ("I'm sad") → Sad
• Dialog Strategy
– Appropriate strategy for each emotion
Example (Emotion: Sad)
User: My dog died yesterday
Agent: You look sad, cheer up!

Sangdo Han, Kyusong Lee, Donghyeon Lee, Gary Geunbae Lee. Counselling dialog system with 5W1H extraction. Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2013), Aug 2013, Metz
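The keyword-based part of the emotion detection can be sketched as follows (the lexicon and strategy table are invented for illustration; the actual system also uses facial-expression and prosody cues):

```python
# Hypothetical emotion keyword lexicon (e.g. Korean "슬퍼" maps to Sad)
EMOTION_KEYWORDS = {
    "sad": "Sad", "슬퍼": "Sad", "died": "Sad",
    "happy": "Happy", "scared": "Scared", "angry": "Angry",
}

# Hypothetical emotion -> dialog strategy table
STRATEGIES = {
    "Sad": "You look sad, cheer up!",
    "Happy": "Glad to hear that!",
}

def detect_emotion(utterance):
    # Return the emotion of the first matching keyword, if any
    for token in utterance.lower().split():
        if token in EMOTION_KEYWORDS:
            return EMOTION_KEYWORDS[token]
    return "Neutral"

def respond(utterance):
    return STRATEGIES.get(detect_emotion(utterance), "I see. Tell me more.")

reply = respond("My dog died yesterday")
```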
-
Humor Dialog System - Solution Architecture
[Architecture diagram: User Utterance → Natural Language Understanding (Linguistic Analysis: DA, NE; Keyword Analysis) → Dialog Manager (Chat-oriented DM; Humor Manager dispatching to Humor Generation Modules 1…N) → Natural Language Generation / Utterance Selection → System Utterance]
Korean Patent filed: 10-2013-0008478
-
Counseling System Features
Based on counseling skills
– Open-ended questioning
– Paraphrasing
– Reflecting feelings
Based on user's emotion
– Strategy based on emotion
Virtual agent
– Talk with a 3D virtual agent
– Emotion-based attitude
Speaker Utterance
User My friend bugs me a lot
Agent Why does your friend bug you?
User He beats me saying I am dirty
Agent Wow! You must be angry
User Ya! I am depressed
Agent Since when did it happen?
Sangdo Han, Younghee Kim, Gary Geunbae Lee. Micro-counselling dialog system based on semantic content. Proceedings of the international workshop series on spoken dialog systems (IWSDS 2015), Jan 2015, Busan
-
Demo video (postech isoft, NAO robot)
Counselling dialog system demo: https://www.youtube.com/watch?v=pcz228RDTlk
-
Dialog system for English education
Gary Geunbae LeeIntelligent Software Lab.
-
CALL:POSTECH approach to CALL
Intelligent Software Lab. 153
[Diagram: the Learner interacts with a Game Environment; Pronunciation / Prosody / Grammar Error Detection produces Error Feedback; DB-CALL provides Utterance Suggestion; a Student Model and a Data Server support the loop]
-
CALL:POSTECH approach to CALL
Intelligent Software Lab. 154
PRONUNCIATION PROSODY GRAMMAR Dialog system / Game
DETECTION & FEEDBACK
DETECTION & FEEDBACK
DETECTION & FEEDBACK DB-CALL & Gameplay
Pronunciation Training & Assessment
Intonation
Phrase Break
Grammar Error Simulation
Grammatical Error Detection
Data Collection
Various Platforms (Mobile, Tablet PC)
Student Model
Pronunciation Detection / Feedback
Pronunciation Error Simulation
Stress/Rhythm
English Tutoring System
3D virtual Environment
-
Pronunciation Assessment and Training:Architecture & data flow
Jisoo Bang, Jonghoon Lee, Gary Geunbae Lee, Minhwa Chung. A pronunciation variants prediction method for Korean learner's mispronunciation detection. (accepted) ACM Trans. on Asian Language Information Processing (TALIP)
-
Prosody Assessment and Training: Definition
– What is prosody?
• English is one of the stress-timed languages
• Prosody consists of rhythm, stress and intonation
– Rhythm
• Determined by the beats occurring in regular patterns between stressed and unstressed syllables
• We derive rhythm from sentence stress patterns
– Intonation
• Pitch fluctuations in utterances, which show a high degree of freedom and therefore require prosodic components with a relatively low degree of freedom
• Modeled by integrating pitch accent, phrase accent and boundary tone
156
-
Prosody Assessment and Training:Architecture including feedback provision
[Architecture diagram: Text → Text Analysis and Speech Signal → Speech Analysis, followed by Alignment; a Prosody Prediction Model plus Rule Application (Rules) yields Predicted Prosody; Model Training yields a Prosody Detection model producing Detected Prosody; the difference (Diff.) between predicted and detected prosody drives Feedback]
Sechun Kang, Gary Geunbae Lee, Ho-Young Lee, Byeongchag Kim. An automatic pitch accent feedback system for English learners with adaptation of an English corpus spoken by Koreans. Proceedings of the 2012 IEEE workshop on spoken language technology (SLT 2012), Dec 2012, Miami
157
-
Prosody Assessment and Training:Rhythm’s user interface
– Component interface view
• Words: the recognized (or given) text
• Canonical: sentence stress prediction results
• Actual: sentence stress detection results
• Score:
  Score = 1 − |{B | Conf(B) ≥ τ, B ∉ (Prediction ∩ Detection)}| / |Prediction ∪ Detection|
158
-
Collecting Grammar Error Data: POLC:Picture description task
• From Korean learners of English
• Storytelling based on pictures
• 80 students (5 tasks for each student)
Hongsuck Seo, Kyusong Lee, Gary Geunbae Lee, Soo-Ok Kweon, Hae-Ri Kim. Grammatical error annotation for Korean learners of spoken English. Proceedings of the 8th international conference on language resources and evaluation (LREC2012), May 2012, Istanbul
-
Collecting Grammar Error Data: Error tagsets
• JLE Tagset
– 46 tags
– Systematic tag structure
– Some ambiguity caused by the POS-specific error tag structure
• CLC Tagset
– Widely used worldwide; 76 tags
– Systematic & taxonomic tag structure
– The JLE ambiguity issue is resolved by the taxonomic tag structure
• NUCLE Tagset
– 27 error tags
– Quite arbitrary tag structure
• UIUC Tagset– Only for articles and prepositions
-
[Architecture diagram: Text → Grammatical Error Simulation → Erroneous Text; ASR and ASR′ (each with an N-gram LM) produce Merged Hypotheses; a Grammaticality Checker and an Error-type Classifier then generate Feedback using Error Patterns and Error Frequency]
Grammar Assessment and Training:Grammar error detector architecture
Sungjin Lee, Hyungjong Noh, Kyusong Lee, Gary Geunbae Lee. Grammatical error detection for corrective feedback provision in oral conversations. Proceedings of the 25th AAAI conference on artificial intelligence (AAAI-11), Aug 2011, San Francisco
-
Grammar Assessment and Training:Grammatical Error Simulation
[Diagram: Correct Sentences + Error Types → Grammar Error Simulator → Incorrect Sentences → Automatic Speech Recognizer]
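The paper models error generation with Markov logic; a much cruder sketch conveys the idea: sample an error transformation according to observed error frequencies and apply it to a correct sentence (the rules and weights here are invented for illustration):

```python
import random

# Hypothetical error patterns with relative frequencies
ERROR_RULES = [
    # Drop articles ("the cats catch mice" -> "cats catch mice")
    ("article_omission", 0.5,
     lambda tokens: [t for t in tokens if t not in ("a", "an", "the")]),
    # Break number agreement by stripping plural -s
    ("agreement", 0.5,
     lambda tokens: [t[:-1] if t.endswith("s") else t for t in tokens]),
]

def simulate_error(sentence, rng):
    """Apply one randomly chosen error transformation to a correct sentence."""
    tokens = sentence.split()
    _, weights, funcs = zip(*ERROR_RULES)
    func = rng.choices(funcs, weights=weights, k=1)[0]
    return " ".join(func(tokens))

erroneous = simulate_error("the cats catch mice", random.Random(0))
```

The simulated erroneous sentences can then serve as realistic learner-like input for training or testing the recognizer and error detector.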
Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using Markov logic. Proceedings of the ACL 2009, Aug 2009, Singapore (short paper)
-
Spoken Dialog System DB-CALL System
Dialog-based Language Learning System: Dialog-based CALL system
Cheongjae Lee, Sangkeun Jung, Kyungduk Kim, Gary Geunbae Lee. Hybrid approach to robust dialog management using agenda and dialog examples. computer speech and language, 24 (4): 609-631, Oct 2010
-
Dialog-based Language Learning System:The Framework of Ranking DM
[Diagram: SLU N-best user intentions (with candidate system intentions) → Scoring Module → Calculated Scores → Next System Intention]
• Ranking various scores yields robust system action
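The ranking idea can be sketched as a weighted combination of scorers over candidate system intentions (the candidates, scorer names and weights below are invented for illustration):

```python
def rank_intentions(candidates, scorers, weights):
    """Rank candidate system intentions by a weighted sum of scorer outputs."""
    def total(cand):
        return sum(w * s(cand) for s, w in zip(scorers, weights))
    return sorted(candidates, key=total, reverse=True)

# Hypothetical candidates from the SLU n-best list
candidates = [
    {"intention": "confirm_order", "slu_conf": 0.6, "coherence": 0.9},
    {"intention": "ask_repeat",    "slu_conf": 0.9, "coherence": 0.2},
]

# One scorer per evidence source: SLU confidence and discourse coherence
scorers = [lambda c: c["slu_conf"], lambda c: c["coherence"]]
ranked = rank_intentions(candidates, scorers, weights=[0.5, 0.5])
```

Combining multiple scores lets a discourse-coherent but lower-confidence candidate outrank a high-confidence but incoherent one, which is the source of the robustness.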
Hyungjong Noh, Sungjin Lee, Kyusong Lee. Gary Geunbae Lee. Ranking dialog acts using discourse coherence indicator for English tutoring dialog systems. Proceedings of the 3rd international workshop on spoken dialog systems technology (IWSDS 2011), Sept 2011, Granada Spain
-
Dialog-based Language Learning System:POMY system architecture
Kyusong Lee, Soo-ock Kweon, Hyungjong Noh, Gary Geunbae Lee. Postech Immersive English Study (POMY): Dialog-based Language Learning Game.(accepted) IEICE transactions on information and systems
-
Table 1. Overall cognitive effect (N = 25 per category)

Category                  Pre-test M (SD)   Post-test M (SD)   Mean diff.   p
Listening                 56.4 (16.6)       71.2 (20.9)        14.8         0.0001**
Vocabulary                74.0 (31.4)       117.6 (32.7)       43.6         0.0001**
Speaking: Pronunciation   42.08 (6.80)      44.48 (6.80)       2.40         0.0001**
Speaking: Grammar         36.56 (8.45)      42.40 (6.95)       5.84         0.0001**
Speaking: # of Words      136.31 (55.30)    170.04 (80.88)     33.73        0.003**
Dialog-based Language Learning System:Cognitive effect on overalls students
• Significantly improved across categories
• Students spoke more words in the post-test
-
Demo video (Postech isoft dbcall / postechisoft pesaa)
Robot DB-CALL system (2013) and PESAA system demo: https://www.youtube.com/watch?v=k0TAdfngZpU
-
Thank You & QA