Siri, Watson and Natural Language Processing Intelligent Software Lab. POSTECH Prof. Gary Geunbae Lee



  • Contents
    – Siri, Watson and NLP
    – Multi-domain ontology reasoning dialog systems for intelligent assistants
    – Multi-strategy knowledge search question answering systems
    – Multi-party open proactive dialog systems
    – Dialog system for English education

  • Siri, Watson and NLP

  • Apple Siri

    • Siri (iOS)
      – Introduced on October 4, 2011
      – Collaborative dialog interface using web search engines (Bing, Google, Yahoo)
      – Offshoot of the CALO project funded by DARPA
      – Speech recognition implemented with Nuance technology

  • Siri – your wish is its command

    • Services – built-in apps
      – Worldwide: Phone, FaceTime, Music, Mail, Messages, Calendar, Reminders, Notes, Contacts, Weather, Stocks, Web Search, Find My Friends, Alarms, World Clock and Timer, Wolfram Alpha, Wikipedia search
      – In the US: Maps, local search with Yelp!
      – Chatting (apparently without support for discourse context)
    • Ecosystem of partners (before Apple acquired Siri)
      – Restaurant and business questions: OpenTable, Gayot, CitySearch, BooRah, Yahoo Local, ReserveTravel, Localeze
      – Events and concert information: Eventful, StubHub, LiveKick
      – Movie information: MovieTickets, RottenTomatoes, New York Times
      – Factual question answering: Bing Answers, Wolfram Alpha
      – Web search: Bing, Yahoo, Google

  • Sample Dialogs (chatting)
    U: Who are you?
    S: I’m just a humble virtual assistant.
    U: Seriously, who are you?
    S: Who I am isn’t important.
    U: No, for real, I really want to know who you are.
    S: OK, maybe not.
    U: Are you Siri?
    S: That’s me!
    U: What is the meaning of life?
    S: I can’t answer that now, but give me some time to write a very long play in which nothing happens.
    U: Once again, what is the meaning of life?
    S: Try and be nice to people, avoid eating fat, read a good book every now and then, get some walking in, and try to live together in peace and harmony with people of all creeds and nations.

  • Sample Dialogs (tasks)
    U: Remind me to order some flowers for my lovely wife and adorable daughter tomorrow at 9 AM.
    S: Here’s your reminder for tomorrow at 9 AM. Shall I create it?
    U: Yes, please.
    S: OK, I’ll remind you.
    U: Check Apple stock price.
    S: Apple Inc. went up significantly today: 400 dollars and 29 cents.
    U: Send to Ewdison.
    S: What do you want to say to Ewdison Then?
    U: Hey, hope you’re feeling better, see you tomorrow at work … (Siri barges in)
    S: I updated your message. Ready to send it? (Message: Hey hope you’re feeling better)
    U: No.
    S: To continue, you can Send, Cancel, Review, or Change it.
    U: Change it…
    (Screenshots: Reminder, Stock, Messages)

  • Architecture
    • Active ontologies as a method and apparatus for building Siri (survey of the Siri patent)
    Fig. 1 Intelligent Automated Assistant (from the Siri patent)

  • Google Now
    • Google Now (Android)
      – Initially released in July 2012 (Galaxy Nexus)
      – Collaborative dialog interface using Google Voice Search and web search
      – Leverages the Google Knowledge Graph project, analyzing the meaning of results and the connections between them
      – Context understanding for proactive service

  • MS Cortana
    • Cortana (Windows Phone)
      – Introduced on April 2, 2014
      – Collaborative dialog interface using the Bing search engine and the Azure cloud service
      – Can also recognize music
      – Well known for predicting the winners of the first 14 matches of the 2014 World Cup
      – Demonstrated with a Microsoft deep neural network for identifying cats

  • Dan Jurafsky

    Question Answering: IBM’s Watson

    • Won Jeopardy! on February 16, 2011
    Clue: WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL
    Answer: Bram Stoker

  • Types of Questions in Modern Systems
    • Factoid questions
      – Who wrote “The Universal Declaration of Human Rights”?
      – How many calories are there in two slices of apple pie?
      – What is the average age of the onset of autism?
      – Where is Apple Computer based?
    • Why, how (procedure), what is (definition), list questions, etc.
    • Complex (narrative) questions
      – In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
      – What do scholars think about Jefferson’s position on dealing with pirates?


  • IBM Watson Platform and Applications
    – GenieMD Inc.: health care app
    – Majestyk Apps: education support app
    – Red Ant: retail sales business intelligence app

  • IBM Watson – Recent Applications
    – Watson Engagement Advisor
    – Watson Discovery Advisor
    – Watson Explorer

    Presenter notes:
    – KNOW-ME: Reflexis StorePulse (Reflexis Systems, Inc.). Predicts consumer trends from sources such as weather, social media trends, local events and news, then derives and reports sales strategies to respond to them. http://www.businesswire.com/news/home/20141008006500/en/IBM-Reflexis-Tap-Power-Watson-Transform-Retail#.VDktWPl_swA
    – EMPOWER-ME: Watson Discovery Advisor (Baylor College of Medicine, Johnson & Johnson). Uses natural language understanding to read and analyze large volumes of literature produced in many fields, then derives and reports hypotheses or connections in the data that humans have not yet discovered. http://www.ibm.com/smarterplanet/us/en/ibmwatson/discovery-advisor.html
    – ENGAGE-ME: Recipe generation demo at SXSW 2014 (Institute of Culinary Education). Given the main ingredient and a style (e.g., a cuisine or culture) for a menu, Watson generates new recipes using information about preferences for existing recipes and about how well ingredients go together. http://asmarterplanet.com/blog/2014/02/food-thought-ibm-watson-whips-creativity.html

  • IBM Watson – Ecosystem
    (Figure: recipe generation)
    – Watson Developer Cloud: public API
    – Watson Content Store: content-providing network
    – Watson Talent Hub: talent and expert matching

  • Language Technology
    • Mostly solved
      – Spam detection: “Let’s go to Agra!” (legitimate) vs. “Buy V1AGRA …” (spam)
      – Part-of-speech (POS) tagging: Colorless green ideas sleep furiously. → ADJ ADJ NOUN VERB ADV
      – Named entity recognition (NER): Einstein met with UN officials in Princeton → PERSON ORG LOC
    • Making good progress
      – Sentiment analysis: “Best roast chicken in San Francisco!” (positive) vs. “The waiter ignored us for 20 minutes.” (negative)
      – Coreference resolution: Carter told Mubarak he shouldn’t run again.
      – Word sense disambiguation (WSD): I need new batteries for my mouse.
      – Parsing: I can see Alcatraz from the window!
      – Machine translation (MT): 第13届上海国际电影节开幕… → The 13th Shanghai International Film Festival…
      – Information extraction (IE): You’re invited to our dinner party, Friday May 27 at 8:30 → Party, May 27 (add to calendar)
    • Still really hard
      – Question answering (QA): Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
      – Paraphrase: XYZ acquired ABC yesterday ≈ ABC has been taken over by XYZ
      – Summarization: The Dow Jones is up; The S&P500 jumped; Housing prices rose → Economy is good
      – Dialog: Where is Citizen Kane playing in SF? → Castro Theatre at 7:30. Do you want a ticket?

  • What’s hard – ambiguities, ambiguities, all different levels of ambiguities

    John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. [from J. Eisner]

    – donut: To get a donut (doughnut; spare tire) for his car?
    – Donut store: a store where donuts shop? Or one run by donuts? Or one that looks like a big donut? Or one made of donuts?
    – From work: Well, actually, he stopped there from hunger and exhaustion, not just from work.
    – Every few hours: Is that how often he thought it? Or how often the coffee was good?
    – it: The particular coffee that was good every few hours? The donut store? The situation?
    – Too expensive: Too expensive for what? What are we supposed to conclude about what John did?

  • Why else is natural language understanding difficult?
    – Non-standard English: Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
    – Segmentation issues: the New York-New Haven Railroad ([New York]-[New Haven] vs. New [York-New] Haven)
    – Idioms: dark horse, get cold feet, lose face, throw in the towel
    – Neologisms: unfriend, Retweet, bromance
    – Tricky entity names: Where is A Bug’s Life playing … / Let It Be was recorded … / … a mutation on the for gene …
    – World knowledge: Mary and Sue are sisters. / Mary and Sue are mothers.
    But that’s what makes it fun!

  • Levels of Language
    • Phonetics/phonology/morphology: What words (or subwords) are we dealing with?
    • Syntax: What phrases are we dealing with? Which words modify one another?
    • Semantics: What’s the literal meaning?
    • Pragmatics: What should you conclude from the fact that I said something? How should you react?

  • Recent Trends in Applications Using NLP
    • Summary of Gartner Report, 2014

  • Recent Trends in Applications Using NLP/AI
    • Summary of Gartner Report (cont.)
      – Market size: $53 billion in 2012, projected to grow to $113 billion in 2017; about $6 billion in the domestic (Korean) market in 2015 (2012 KISTI report)
      – Ripple effect: about 1.1 billion users will use intelligent personal assistant systems in 2015; about 1 billion vehicles will use artificial intelligence
      – NLP using deep learning: Watson recently adopted a cloud system for distributed computing; MS launched the “Adam” project using neural network techniques

  • IOT2H (Internet of Things to Human)
    (Figure: Siri, KG, SDS, UI/UX, Watson, Red Ant, Majestyk Apps, GenieMD)
    IOT2H platform
    – Communication (logos-pathos-ethos): natural language processing / emotion
    – Thinking (smart): reasoning / ontology
    – Knowledge (exo-brain): knowledge question answering / retrieval
    IOT2H service
    – Co-op service (human in the loop) for health, home, mobile, education

  • Multi-domain ontology reasoning dialog systems for intelligent assistants

  • SPOKEN DIALOG SYSTEM (SDS)

  • Interactive Question Answering
    • New challenges for question answering systems [TREC ciQA; HLT-NAACL 2006 workshop]
      – A series of related questions in a session / interaction with other people
      – Should handle anaphora, ellipses and other discourse-related problems
      – But still mainly user initiative; no dialog “management”
    (Figure: a session of questions Question1, Question2, …, Question-m, each answered (Answer1, Answer2, …, Answer-m) by a pipeline of POS tagging, answer type identification, query formation, document retrieval, dynamic answer passage selection, answer finding and answer justification)

  • SDS APPLICATIONS
    – Tele-service
    – Car navigation
    – Home networking
    – Robot interface

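  • ASR (automatic speech recognition)
    (Figure: speech signals → feature extraction → decoding → word sequence, e.g., 버스 정류장이 어디에 있나요? (“Where is the bus stop?”). The decoding network is constructed from an acoustic model (HMM estimation from a speech DB), a pronunciation model (G2P) and a language model (LM estimation from text corpora).)

    $\hat{W} = \arg\max_{W \in L} P(O \mid W)\, P(W)$

    where O is the observed acoustic feature sequence and L is the set of candidate word sequences.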
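The decoding rule can be illustrated with a toy example. This is a minimal sketch, not a real decoder: it assumes a small, explicit list of candidate transcriptions (English glosses in the Korean word order of the example) with hypothetical acoustic log-likelihoods and a hypothetical bigram LM, whereas a real recognizer searches a network built from the acoustic, pronunciation and language models.

```python
import math
from typing import Dict, List, Tuple

Words = Tuple[str, ...]

# Hypothetical acoustic scores log P(O|W) for three candidate transcriptions of one utterance
ACOUSTIC_LOGLIK: Dict[Words, float] = {
    ("bus", "stop", "where"): -12.0,
    ("bus", "top", "where"): -11.5,
    ("buzz", "stop", "where"): -12.5,
}

# Hypothetical bigram language model P(w_i | w_{i-1}); "<s>" marks the sentence start
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "bus"): 0.2, ("bus", "stop"): 0.5, ("bus", "top"): 0.01,
    ("stop", "where"): 0.3, ("top", "where"): 0.3,
    ("<s>", "buzz"): 0.001, ("buzz", "stop"): 0.05,
}
UNSEEN = 1e-6  # crude floor for unseen bigrams instead of real smoothing


def lm_logprob(words: Words) -> float:
    """log P(W) under the toy bigram model."""
    logp, prev = 0.0, "<s>"
    for w in words:
        logp += math.log(BIGRAM.get((prev, w), UNSEEN))
        prev = w
    return logp


def decode(candidates: List[Words]) -> Words:
    """argmax over W of log P(O|W) + log P(W): the decoding rule in log space."""
    return max(candidates, key=lambda w: ACOUSTIC_LOGLIK[w] + lm_logprob(w))


best = decode(list(ACOUSTIC_LOGLIK))
```

Here the acoustic model alone would prefer ("bus", "top", "where"); adding log P(W) recovers ("bus", "stop", "where"), which is why the two terms are combined.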

  • SPEECH UNDERSTANDING (in general)
    (Figure: one speech segment, “Open the doors.”, analyzed at several levels)
    – Speaker ID / Language ID: Dave / English
    – Sentiment / Opinion: Nervous
    – Named Entity / Relation: LOC = pod bay, OBJ = door
    – Topic / Intent: Control the Spaceship
    – Syntactic / Semantic Role: Open=Verb, the=Det. ...
    – SQL (computer program): select * from DOORS where ...
    – Summary, meaning representation

  • REPRESENTATION: Semantic frame (slot/value structure) [Gildea and Jurafsky, 2002]
    – An intermediate semantic representation that serves as the interface between the user and the dialog system
    – Each frame contains several typed components called slots. The type of a slot specifies what kind of fillers it expects.

    “Show me flights from Seattle to Boston”
    ShowFlight
      Subject: FLIGHT
        Departure_City: SEA
        Arrival_City: BOS

    Semantic representation on the ATIS task; XML format (left) and hierarchical representation (right) [Wang et al., 2005]
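A minimal sketch of the ShowFlight example as a typed slot/value structure, assuming a Python dataclass in which each slot declares the type of filler it expects. The class, method and type names (Frame, fill, CITY) are hypothetical and are not the ATIS XML schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """A semantic frame: a name plus typed slots that accept only fillers of the declared type."""
    name: str
    slot_types: Dict[str, str]                               # slot name -> expected filler type
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def fill(self, slot: str, value: str, value_type: str) -> None:
        expected = self.slot_types.get(slot)
        if expected is None:
            raise KeyError(f"unknown slot: {slot}")
        if value_type != expected:                           # reject fillers of the wrong type
            raise TypeError(f"slot {slot} expects {expected}, got {value_type}")
        self.slots[slot] = value

    def missing(self) -> List[str]:
        """Slots that still have no filler."""
        return [s for s in self.slot_types if self.slots.get(s) is None]


# "Show me flights from Seattle to Boston"
flight = Frame("FLIGHT", {"Departure_City": "CITY", "Arrival_City": "CITY"})
flight.fill("Departure_City", "SEA", "CITY")
flight.fill("Arrival_City", "BOS", "CITY")
show_flight = {"frame": "ShowFlight", "Subject": flight}
```

The type check is what makes the slots "typed": a TIME filler offered to Arrival_City is rejected rather than silently stored.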

  • Knowledge-based Systems
    • Knowledge-based systems:
      – Developers write a syntactic/semantic grammar
      – A robust parser analyzes the input text with the grammar
      – Works without a large amount of training data
    • Previous works
      – MIT: TINA (natural language understanding) [Seneff, 1992]
      – CMU: PHOENIX [Pellom et al., 1999]
      – SRI: GEMINI [Dowding et al., 1993]
    • Disadvantages
      1) Grammar development is an error-prone process
      2) It takes multiple rounds to fine-tune a grammar
      3) Combined linguistic and engineering expertise is required to construct a grammar with good coverage and optimized performance
      4) Such a grammar is difficult and expensive to maintain

  • Two Classification Problems
    HOW TO SOLVE: STATISTICAL APPROACH
    • Dialog act identification
      Input: Find Korean restaurants in Daeyidong, Pohang
      Output: SEARCH_RESTAURANT
    • Named entity recognition
      Input: Find Korean restaurants in Daeyidong, Pohang
      Output: Korean = FOOD_TYPE, Daeyidong = ADDRESS, Pohang = CITY

  • PROBLEM FORMALIZATION
    Encoding:
    – x is an input (word), y is an output (NE), and z is another output (DA).
    – Vector x = {x1, x2, x3, …, xT}; vector y = {y1, y2, y3, …, yT}; scalar z
    – Goal: modeling the functions y = f(x) and z = g(x)

    x: Find | Korean | restaurants | in | Daeyidong | , | Pohang | .
    y: O | FOOD_TYPE-B | O | O | ADDRESS-B | O | CITY-B | O
    z: SEARCH_RESTAURANT
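The encoding above can be decoded back into a semantic frame by grouping the NE tags into spans. A small sketch, assuming the slide's TYPE-B / TYPE-I / O tag convention; the helper name bio_to_slots and the output dictionary layout are hypothetical.

```python
from typing import List, Tuple


def bio_to_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tagged tokens into (slot type, value) pairs.

    Tags follow the slide's convention: TYPE-B begins a span, TYPE-I continues it,
    and O marks tokens outside any named entity.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, List[str]]] = []
    prev_type = None
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            prev_type = None
            continue
        ne_type, _, position = tag.rpartition("-")
        if position == "I" and ne_type == prev_type:
            spans[-1][1].append(tok)            # continue the open span
        else:
            spans.append((ne_type, [tok]))      # TYPE-B (or a stray TYPE-I) opens a new span
        prev_type = ne_type
    return [(ne_type, " ".join(words)) for ne_type, words in spans]


x = ["Find", "Korean", "restaurants", "in", "Daeyidong", ",", "Pohang", "."]
y = ["O", "FOOD_TYPE-B", "O", "O", "ADDRESS-B", "O", "CITY-B", "O"]
z = "SEARCH_RESTAURANT"
frame = {"dialog_act": z, "slots": dict(bio_to_slots(x, y))}
```

Multi-token entities work the same way: ["New", "York"] tagged CITY-B, CITY-I becomes one CITY value, "New York".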

  • MACHINE LEARNING FOR SLU
    • Background: Maximum Entropy (a.k.a. logistic regression)
      – Conditional and discriminative
      – Unstructured! (no dependency in y)
      – Used for the dialog act classification problem
    • Conditional Random Fields [Lafferty et al. 2001]
      – Structured version of MaxEnt (argmax search in inference)
      – Undirected graphical models
      – Popular in language and text processing
      – Linear-chain structure for practical implementation
      – Used for the named entity recognition problem
    (Figure: graphical models with z connected to x for dialog act classification, and a linear chain y_{t-1}, y_t, y_{t+1} over x_{t-1}, x_t, x_{t+1} for NER; feature functions f_k, g_k, h_k)
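MaxEnt scores each decision independently, while a linear-chain CRF scores whole label sequences, so inference needs an argmax search over sequences. A minimal Viterbi sketch of that search, assuming the emission and transition scores have already been computed as weighted feature sums; the scores below are hypothetical and training is not shown.

```python
from typing import Dict, List, Tuple


def viterbi(tokens: List[str], labels: List[str],
            emit: Dict[Tuple[str, str], float],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Exact argmax over label sequences for a linear-chain model.

    emit[(label, token)] and trans[(prev_label, label)] are log-space scores
    (in a CRF, weighted feature sums); missing entries score 0. "<s>" is the start state.
    """
    # best[t][y]: score of the best label prefix ending in y at position t
    best = [{y: emit.get((y, tokens[0]), 0.0) + trans.get(("<s>", y), 0.0) for y in labels}]
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev, s = max(((p, best[-1][p] + trans.get((p, y), 0.0)) for p in labels),
                          key=lambda item: item[1])
            scores[y] = s + emit.get((y, tok), 0.0)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the highest-scoring final label
    y = max(best[-1], key=best[-1].get)
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]


# Hypothetical scores: "Pohang" alone looks slightly more like FOOD_TYPE,
# but the transition O -> CITY-B (the label of "in" is O) tips the decision to CITY-B.
labels = ["O", "FOOD_TYPE-B", "CITY-B"]
emit = {("FOOD_TYPE-B", "Korean"): 2.0, ("O", "restaurants"): 2.0, ("O", "in"): 2.0,
        ("FOOD_TYPE-B", "Pohang"): 1.5, ("CITY-B", "Pohang"): 1.4}
trans = {("O", "CITY-B"): 0.5}
path = viterbi(["Korean", "restaurants", "in", "Pohang"], labels, emit, trans)
```

The contrast with an unstructured classifier is visible on the last token: scored alone, "Pohang" would get FOOD_TYPE-B; scored as part of a sequence, it gets CITY-B.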

  • DIALOG MANAGEMENT
    • GOAL: Answer your query (e.g., a question or an order) given the task domain. This includes:
      – Providing query results
      – Asking for further slot information
      – Confirming the user utterance
      – Notifying of an invalid query
      – Suggesting an alternative
    • Related to dialog complexity and task complexity
    • In practice: find the best system action a given the dialog state s
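A hand-written policy makes the mapping from dialog state s to system action a concrete. This is a minimal rule-based sketch under assumed names (DialogState, select_action, the threshold value and the action strings are all hypothetical) covering the five actions listed above; statistical managers learn this mapping instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SUPPORTED_INTENTS = {"SEARCH_RESTAURANT"}  # hypothetical task domain
CONFIRM_THRESHOLD = 0.5                    # hypothetical SLU-confidence cutoff


@dataclass
class DialogState:
    intent: Optional[str]                  # dialog act / main goal from SLU
    slots: Dict[str, Optional[str]]        # required slot -> value (None = unfilled)
    results: List[str] = field(default_factory=list)           # DB results for the query
    confidence: Dict[str, float] = field(default_factory=dict)  # slot -> SLU confidence


def select_action(s: DialogState) -> str:
    """Hand-written policy a = pi(s) covering the actions listed above."""
    if s.intent not in SUPPORTED_INTENTS:
        return "notify_invalid_query"
    for slot, value in s.slots.items():
        if value is None:
            return f"request({slot})"                 # ask for further slot information
    for slot, value in s.slots.items():
        if s.confidence.get(slot, 1.0) < CONFIRM_THRESHOLD:
            return f"confirm({slot}={value})"         # confirm a low-confidence value
    if not s.results:
        return "suggest_alternative"                  # query failed: recommend something else
    return "inform(results)"                          # provide query results
```

The order of the checks encodes a design choice: missing information is requested before uncertain information is confirmed, and alternatives are suggested only for a complete, confident query that returned nothing.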

  • DESIGN ISSUES
    • Task complexity: How hard is the task? How much domain knowledge does the system have?
      Simple → Complex: Call Routing, Weather Information, Automatic Banking, Conversational English Tutoring, Collaborative Planning

  • DESIGN ISSUES
    • Dialog complexity: which dialog phenomena are allowed
      – Initiative strategies, e.g., system-initiative vs. user-initiative vs. mixed-initiative
      – Meta-dialogs about the dialog itself, e.g., Could you hold on for a minute?
      – Subdialogs for clarification/confirmation, e.g., You selected KE Airlines, is that right?
      – Multiple dialog threads, e.g., domain switching

  • DIALOG EXAMPLES: Example 3
    U: I’d like to have African food in Gangnam, Seoul.
    S: Sorry, there are no African restaurants.
    S: How about American restaurants in Gangnam, Seoul?
    U: No, I don’t like it.
    S: What is your favorite food?
    U: I like grilled and seasoned beef.
    S: So, how about Korean restaurants?
    U: Good.
    – Mixed-initiative
    – Implicit/explicit confirmation
    – Recommends an alternative when the query fails
    – Most natural dialog flow

  • KNOWLEDGE-BASED DM (KBDM)
    • Rule-based approaches
      – Early KBDMs were developed with handcrafted rules (e.g., information state update)
      – Simple example [Larsson and Traum, 2003]
    • Agenda-based approaches
      – Recent KBDMs are developed with domain-specific knowledge and a domain-independent dialog engine

  • AGENDA-BASED DM
    • RavenClaw DM (CMU)
      – Uses hierarchical task decomposition
      – A set of all possible dialogs in the domain
      – Tree of dialog agents
      – Each agent handles the corresponding part of the dialog task
    [Bohus and Rudnicky, 2003]
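The hierarchical task decomposition can be sketched as a tree whose leaves each obtain one concept, with the next agent chosen by a depth-first, left-to-right search for unfinished work. This is a simplified illustration, not RavenClaw's actual engine (which is written in C++ and maintains a dialog stack and an expectation agenda); the agent names and the booking task are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Agent:
    """Node in a hierarchical task tree; each leaf is responsible for one concept."""
    name: str
    children: List["Agent"] = field(default_factory=list)
    concept: Optional[str] = None  # leaf only: the concept this agent obtains or delivers

    def done(self, concepts: Dict[str, str]) -> bool:
        if self.children:
            return all(child.done(concepts) for child in self.children)
        return self.concept in concepts


def next_agent(node: Agent, concepts: Dict[str, str]) -> Optional[Agent]:
    """Depth-first, left-to-right search for the first unfinished leaf agent."""
    if node.done(concepts):
        return None
    for child in node.children:
        found = next_agent(child, concepts)
        if found is not None:
            return found
    return node  # an unfinished leaf executes next


# Hypothetical room-booking task
task = Agent("BookRoom", [
    Agent("GetQuery", [Agent("AskDate", concept="date"), Agent("AskTime", concept="time")]),
    Agent("GiveResults", concept="results_given"),
])
```

Because completion is judged from the filled concepts rather than from turn order, a user who volunteers the time first still gets asked only for the date, which is a small taste of the mixed initiative such tree-based managers support.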

  • Vanilla EXAMPLE-BASED DM (EBDM)
    • Example-based approaches
    Dialog example (turn #1, Domain = Building_Guidance), indexed by semantic and discourse features:
    – Dialog corpus:
      USER: 회의실이 어디지? (“Where is the meeting room?”) [Dialog Act = WH-QUESTION] [Main Goal = SEARCH-LOC] [ROOM-TYPE = 회의실 (meeting room)]
      SYSTEM: 3층에 교수회의실, 2층에 대회의실, 소회의실이 있습니다. (“There is a faculty meeting room on the 3rd floor, and a large and a small meeting room on the 2nd floor.”) [System Action = inform(Floor)]
    – Dialog state space entry:
      Domain = Building_Guidance; Dialog Act = WH-QUESTION; Main Goal = SEARCH-LOC
      ROOM-TYPE = 1 (filled), ROOM-NAME = 0 (unfilled), LOC-FLOOR = 0, PER-NAME = 0, PER-TITLE = 0
      Previous Dialog Act = (empty), Previous Main Goal = (empty)
      Discourse History Vector = [1,0,0,0,0]
      Lexico-semantic Pattern = ROOM_TYPE이 어디지? (“Where is ROOM_TYPE?”)
      System Action = inform(Floor)
    The system selects the example e* whose state is most similar to the current dialog state h:

    $e^{*} = \arg\max_{e_i \in E} S(e_i, h)$

    Cheongjae Lee, Sangkeun Jung, Seokhwan Kim, Gary Geunbae Lee. Example-based dialog modeling for practical multi-domain dialog system. Speech Communication, 51(5): 466–484, May 2009.
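Example selection, e* = argmax S(e_i, h), can be sketched as a nearest-neighbor lookup over indexed dialog states. The similarity function below is an assumption for illustration (exact match on domain, dialog act and main goal, then agreement of the discourse history vectors); the published system also uses lexico-semantic patterns, and the second and third corpus entries are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    domain: str
    dialog_act: str
    main_goal: str
    history: Tuple[int, ...]  # discourse history vector: 1 = slot filled


@dataclass(frozen=True)
class Example:
    state: State
    system_action: str


def similarity(e: State, h: State) -> float:
    """Hypothetical S(e, h): semantic keys must match; then score history-vector agreement."""
    if (e.domain, e.dialog_act, e.main_goal) != (h.domain, h.dialog_act, h.main_goal):
        return 0.0
    agree = sum(1 for a, b in zip(e.history, h.history) if a == b)
    return agree / len(h.history)


def select_example(corpus: List[Example], h: State) -> Example:
    """e* = argmax over e_i in E of S(e_i, h)."""
    return max(corpus, key=lambda ex: similarity(ex.state, h))


# Slot order of the history vector: ROOM-TYPE, ROOM-NAME, LOC-FLOOR, PER-NAME, PER-TITLE
corpus = [
    Example(State("Building_Guidance", "WH-QUESTION", "SEARCH-LOC", (1, 0, 0, 0, 0)), "inform(Floor)"),
    Example(State("Building_Guidance", "WH-QUESTION", "SEARCH-LOC", (0, 0, 0, 1, 0)), "inform(Room)"),
    Example(State("Building_Guidance", "REQUEST", "SEARCH-PHONE", (0, 0, 0, 1, 0)), "inform(Phone)"),
]
```

The selected example's system action is then reused as the system's response action, which is what lets an EBDM work without hand-written rules.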

  • STOCHASTIC DM
    • Supervised approaches [Griol et al., 2008]
      – Find the best system action, i.e., the one maximizing the conditional probability P(a|s) given the dialog state
      – Based on supervised learning algorithms
    • MDP/POMDP-based approaches [Williams and Young, 2007]
      – Find the optimal system action, i.e., the one maximizing the expected reward (with reward function R(s, a)) given the belief state
      – Based on reinforcement learning algorithms
    • In general, the dialog state space is too large, so generalizing the current dialog state is important
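For the supervised approach, the simplest estimate of P(a|s) is a relative frequency over an annotated corpus. The sketch below also shows one way to generalize the state: it backs off to the dialog act alone when the full state was never seen. The state abstraction, the back-off scheme and the toy corpus are assumptions, not the cited method.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

State = Tuple[str, FrozenSet[str]]  # hypothetical abstraction: (dialog act, set of filled slots)


class CountPolicy:
    """Relative-frequency estimate of P(a|s), backing off to the dialog act alone for unseen states."""

    def __init__(self, corpus: List[Tuple[State, str]]):
        self.full: Dict[State, Counter] = defaultdict(Counter)
        self.coarse: Dict[str, Counter] = defaultdict(Counter)
        for (act, filled), action in corpus:
            self.full[(act, filled)][action] += 1
            self.coarse[act][action] += 1

    def _counts(self, state: State) -> Optional[Counter]:
        # .get() on a defaultdict does not insert missing keys
        return self.full.get(state) or self.coarse.get(state[0])

    def prob(self, action: str, state: State) -> float:
        counts = self._counts(state)
        return counts[action] / sum(counts.values()) if counts else 0.0

    def best_action(self, state: State) -> Optional[str]:
        counts = self._counts(state)
        return counts.most_common(1)[0][0] if counts else None


corpus = [
    (("WH-QUESTION", frozenset({"ROOM-TYPE"})), "inform(Floor)"),
    (("WH-QUESTION", frozenset({"ROOM-TYPE"})), "inform(Floor)"),
    (("WH-QUESTION", frozenset({"ROOM-TYPE"})), "request(ROOM-NAME)"),
    (("WH-QUESTION", frozenset()), "request(ROOM-TYPE)"),
]
policy = CountPolicy(corpus)
```

A real system would replace the counts with a trained classifier, but the back-off illustrates why generalizing the state matters: without it, any state absent from the corpus has no action at all.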

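  • Template-based System Utterance Generation
    (Figure: the system utterance generator takes the system action, the dialog frame and the retrieved DB result, and fills a template from the system template DB)
    Example:
    – System action: Inform_cast
    – Dialog frame / retrieved result: Program: 시크릿 가든 (Secret Garden); Cast: 현빈, 하지원 (Hyun Bin, Ha Ji-won)
    – Template: [Program]의 주인공은 [Cast]입니다. (“The leads of [Program] are [Cast].”)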
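A minimal sketch of template filling, assuming a template DB keyed by system action and placeholders written as {Slot}. The English template stands in for the Korean one on the slide, and the function name generate is hypothetical.

```python
import re
from typing import Dict

# Hypothetical template DB keyed by system action (English rendering of the slide's template)
TEMPLATES: Dict[str, str] = {
    "Inform_cast": "The leads of {Program} are {Cast}.",
}


def generate(action: str, values: Dict[str, str]) -> str:
    """Fill the template of `action` with slot values from the dialog frame and DB result."""
    template = TEMPLATES[action]
    needed = re.findall(r"\{(\w+)\}", template)
    missing = [name for name in needed if name not in values]
    if missing:
        raise KeyError(f"no value for slot(s): {missing}")
    return template.format(**values)


sentence = generate("Inform_cast", {"Program": "Secret Garden", "Cast": "Hyun Bin and Ha Ji-won"})
```

Checking for missing slots before formatting gives a clearer error than letting str.format fail, which matters when templates and frames are authored separately.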

  • OOD/DD (Out-of-Domain/Domain Detection)
    Utterance → Domain Detection
    – IN-DOMAIN → Task Dialog Service
    – OOD-CHAT → Chat Dialog Service
    – OOD-TASK → Rejection Message

  • OOD Utterance Rejection (Confidence Combination Approach)
    Score: $S(i) = \lambda_{FOR} S_{FOR}(i) + \lambda_{DOD} S_{DOD}(i) + \lambda_{DAC} S_{DAC}(i) + \lambda_{IDV} S_{IDV}(i)$
    (Figure: the component models FOR, DOD, DAC, NER and IDV produce Score_FOR, Score_DOD, Score_DAC and Score_IDV, which are weighted by λ and combined in a final in-domain verification that outputs IN-DOMAIN or OOD)
    Training data and features of the component models:
    – Positive examples: IN-DOMAIN corpus; negative examples: OOD-CHAT corpus; features: lexical unigrams & bigrams
    – Data: IN-DOMAIN corpus; features: lexical unigrams & bigrams
    – Corpus: TID corpus; features: lexical unigrams & bigrams
    – Data: TID corpus; features: lexical features + named entity dictionary
    – Positive examples: TID corpus; negative examples: OOD-CHAT corpus; features: OOV-LSP unigrams & bigrams
    Seonghan Ryu, Jaiyoun Song, Sangjoon Koo, Soonchoul Kwon, Gary Geunbae Lee. Detecting multiple domains from user’s utterance in spoken dialog system. Proceedings of the International Workshop Series on Spoken Dialog Systems (IWSDS 2015), Jan 2015, Busan.
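The combination step itself is a weighted sum followed by a threshold. A sketch with hypothetical weights and threshold (the slide does not give their values); each component score S_k(i) is assumed to be a confidence in [0, 1] produced by the corresponding model.

```python
from typing import Dict

# Hypothetical weights (lambda) and threshold; in practice they would be tuned on held-out data
WEIGHTS: Dict[str, float] = {"FOR": 0.3, "DOD": 0.2, "DAC": 0.2, "IDV": 0.3}
THRESHOLD = 0.5


def combined_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """S(i) = sum over components k of lambda_k * S_k(i)."""
    return sum(weights[k] * scores[k] for k in weights)


def verify(scores: Dict[str, float], threshold: float = THRESHOLD) -> str:
    """Final in-domain verification: IN-DOMAIN if the combined score clears the threshold."""
    return "IN-DOMAIN" if combined_score(scores) >= threshold else "OOD"
```

A linear combination keeps each component's contribution interpretable; the weights decide how much a single confident component can override the others.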

  • MULTI-MODAL DIALOG SYSTEM
    (Figure: inputs x (speech, gesture, face) and outputs y (system response); training examples (x, y) are fed to a learning algorithm)

  • TASK PERFORMANCE AND USER PREFERENCE
    • Task performance and user preference for multimodal over speech-only interfaces [Oviatt et al., 1997]
      – 10% faster task completion
      – 23% fewer words (shorter and simpler linguistic constructions)
      – 36% fewer task errors
      – 35% fewer spoken disfluencies
      – 90–100% user preference for interacting this way
    • Speech-only dialog system
      Speech: Bring the drink on the table to the side of the bed
    • Multimodal dialog system
      Speech: Bring this to here + pen gesture
      Easy, simplified user utterance!

  • Dialog System Development Toolkit Features
    • Web-based interface
      – Provides easy-to-use interfaces for developers
      – Controls complicated processes in an efficient and stable manner
    (Figure: workflow Design → Acquisition & Annotation → Training → Running → Maintenance, covering domain definition, dialog corpus, SLU corpus, NLG templates, contents, statistics, validation, training, evaluation, the running dialog system and log analysis; screenshot)
    Donghyeon Lee, Kyungduk Kim, Cheongjae Lee, Junhwi Choi, Gary Geunbae Lee. D3 toolkit: A development toolkit for daydreaming spoken dialog system. Proceedings of the 2nd International Workshop on Spoken Dialog Systems Technology (IWSDS 2010), Oct 2010, Japan (LNAI 6392, Springer).

  • AUTOMATED DIALOG SYSTEM EVALUATION
    Sangkeun Jung, Cheongjae Lee, Kyungduk Kim, Minwoo Jeong, Gary Geunbae Lee. Data-driven user simulation for automated evaluation of spoken dialog systems. Computer Speech and Language, 23(4): 479–509, Oct 2009.

  • Querying with Inference Engine
    User: Let’s watch Wayne Rooney’s game
    SLU: Wayne Rooney: person name
    Query generation:
      SELECT ?match ?entry ?channel
      WHERE {
        ?match owl:hasMonth owl:Dec .
        ?match owl:hasDay owl:d_12 .
        owl:Rooney owl:isMemberOf ?t .
        ?match owl:hasTeam ?t .
        ?match owl:hasEntry ?entry .
        ?match owl:hasChannel ?channel
      }
    Result (columns Match, Entry, Channel): Feb 5, ManU vs Chelsea football, KBS
    HyeongJong Nho, Cheongjae Lee, Gary Geunbae Lee. Ontology-based inference for information-seeking in natural language dialog system. Proceedings of the 6th IEEE International Conference on Industrial Informatics (IEEE INDIN 2008), July 2008, Daejeon, Korea.

  • Platform: Multi-Domain Ontology Reasoning Intelligent Assistant Dialog System Platform
    (Figure components: input sentence; spoken language understanding (SLU: intent determination, named entity recognition) backed by a knowledge graph; discourse & anaphor processing; POMDP-based disambiguation; action selection; service execution; response generation; output; a “complete?” (yes/no) check; ontology/reasoning; service agent A; PI; task DB/KB)

  • Open-Domain Spoken Language Understanding
    • Traditional spoken dialog systems first detect a domain from the input sentence and then perform domain-specific SLU
      (Figure: input sentence → domain detection → SLU for TV program guide / music guide / restaurant guide)
    • However, we first perform open-domain SLU
      (Figure: input sentence → SLU (intent determination, named entity recognition) using the ontology → domain selection → semantic representation and domain)
    • We exploit the ontology as an important resource in the understanding process
    Patent pending

  • Open-Domain Spoken Language Understanding
    • Open named entity recognition (AIDA)
      1. Mentions are detected using the Stanford NER Tagger
      2. Mentions are mapped onto canonical entities in a knowledge base
    (Figure: mentions and candidate entities from knowledge bases, linked by mention-entity and entity-entity pairs)
    Yosef et al. “AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables,” Proc. VLDB 2011

  • Open-Domain Named Entity Recognition
    Overall architecture: input sentence → detection of NE mentions (using a dictionary) → NE candidates → filtering of NE candidates → filtered NE candidates → generation of NE combinations → NE combinations → evaluation of NE combinations (semantic LM) → best NE combination
    Goals:
    – Large-scale named entity dictionary from knowledge bases (e.g., DBpedia, Freebase, YAGO)
    – Entity type disambiguation based on a semantic language model
    Mendes et al., “DBpedia Spotlight: Shedding Light on the Web of Documents,” Proc. International Conference on Semantic Systems 2011
    Yosef et al., “AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables,” Proc. International Conference on Very Large Databases 2011
    Roth et al., “Wikification and Beyond: The Challenges of Entity and Concept Grounding,” Tutorial at ACL 2014
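Generation and evaluation of NE combinations can be sketched as an exhaustive search over non-overlapping candidate spans, scoring each resulting lexico-semantic pattern with a semantic LM. Everything here is illustrative: the candidates stand in for dictionary hits, the semantic LM is a hypothetical lookup table, and exhaustive enumeration is only feasible after the filtering step has pruned the candidates.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Candidate = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def non_overlapping(combo: Tuple[Candidate, ...]) -> bool:
    spans = sorted((s, e) for s, e, _ in combo)
    return all(spans[i][1] <= spans[i + 1][0] for i in range(len(spans) - 1))


def to_pattern(tokens: List[str], combo: Tuple[Candidate, ...]) -> str:
    """Replace each chosen span by its entity type, giving a lexico-semantic pattern."""
    starts = {s: (e, t) for s, e, t in combo}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            end, ne_type = starts[i]
            out.append(ne_type)
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def best_combination(tokens: List[str], cands: List[Candidate],
                     score: Callable[[str], float]) -> Tuple[Candidate, ...]:
    """Enumerate every non-overlapping subset of candidates and keep the best-scoring pattern."""
    best, best_score = (), float("-inf")
    for r in range(len(cands) + 1):
        for combo in combinations(cands, r):
            if non_overlapping(combo):
                s = score(to_pattern(tokens, combo))
                if s > best_score:
                    best, best_score = combo, s
    return best


# Hypothetical semantic LM: log-scores for a few patterns, a low default otherwise
SEMANTIC_LM = {"play SONG by ARTIST": -2.0, "play ALBUM by ARTIST": -3.0, "play let it be by ARTIST": -9.0}
tokens = "play let it be by the beatles".split()
candidates = [(1, 4, "SONG"), (1, 4, "ALBUM"), (5, 7, "ARTIST")]
best = best_combination(tokens, candidates, lambda p: SEMANTIC_LM.get(p, -20.0))
```

"Let It Be" is both a song and an album title, so the dictionary alone cannot decide its type; the pattern-level score is what resolves it.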

  • Detection of Multi-intents from a Sentence
    • Traditional spoken dialog systems focus on processing simple input sentences that express only one intent → single intent (SI) type
    • However, in the real world, users often express multiple intents (MIs) within one dialog turn → MI conjunctive (MI.C) and MI non-conjunctive (MI.N) types
    • We named this task MI detection (MID)
    (Figure: user’s utterance “what is the genre of big bang theory and tell me the story about it” → detection of multi-intents → search-genre, search-introduction)

  • Detection of Multi-intents from a Sentence
    Overall architecture (figure): POS tagging, detection of conjunction, disambiguation of sentence boundary, restoration of original sentences, evaluation of multi-intent hypotheses and detection of single-intent, taking the input sentence to a final answer.
    Results:
      Korean     SI      MI.C    MI.N    Avg.
      Baseline   97.04%  65.37%  65.08%  87.50%
      Proposed   96.62%  92.11%  94.40%  95.61%

      English    SI      MI.C    MI.N    Avg.
      Baseline   96.64%  60.32%  63.02%  86.15%
      Proposed   95.95%  94.17%  92.07%  95.10%
    Seonghan Ryu, Junhwi Choi, Younghee Kim, Sangjoon Koo, Gary Geunbae Lee. A two-stage approach to multi-intent detection for spoken language understanding. Submitted to the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), April 2015, Brisbane.
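For intuition about why sentence-boundary disambiguation matters, here is a deliberately naive conjunction-splitting baseline. It is not the two-stage method above: a hypothetical keyword table stands in for the single-intent classifier, and anaphora restoration (e.g., resolving "it") is not attempted.

```python
import re
from typing import List

# Hypothetical keyword -> intent map standing in for a trained single-intent classifier
INTENT_KEYWORDS = {"genre": "search-genre", "story": "search-introduction", "channel": "search-channel"}


def classify(clause: str) -> str:
    words = clause.split()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in words:
            return intent
    return "unknown"


def detect_intents(sentence: str) -> List[str]:
    """Naive baseline: split at 'and'; keep the split only if every clause gets an intent."""
    clauses = [c.strip() for c in re.split(r"\band\b", sentence) if c.strip()]
    if len(clauses) > 1:
        intents = [classify(c) for c in clauses]
        if "unknown" not in intents:
            return intents              # multi-intent (conjunctive) hypothesis accepted
    return [classify(sentence)]         # otherwise treat the sentence as a single intent
```

The fallback matters for inputs like "what channel is tom and jerry on", where "and" sits inside a title: splitting would yield a clause with no intent, so the whole sentence is kept as one.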

  • Out-of-Domain / Domain Detection
    • Traditional spoken dialog systems assumed that every user utterance belongs to exactly one domain
    • However, in the real world, users often make multi-domain or out-of-domain requests
    • We proposed a framework that performs both multi-domain detection and out-of-domain detection
    • In each domain, various features are extracted from the input sentence and a binary classifier decides whether the sentence belongs to that domain
    (Figure: the user’s utterance, e.g., “I want news now” or “Any news is on now?”, is turned from a word sequence W into a feature vector X = (x1, x2, …, xn) with each xi in [0, 1]; binary classification then outputs y ∈ {positive, negative} for each domain (e.g., TV EPG, Radio EPG; example scores 0.8, 0.2, 0.3, 0.9) before spoken language understanding)
    Ryu et al. “A hierarchical domain model-based multi-domain selection framework for multi-domain dialog systems,” Proc. COLING 2012
    Ryu et al. “Exploiting out-of-vocabulary words for out-of-domain detection in dialog systems,” Proc. BigComp 2014
    Ryu et al. “Detecting Multiple Domains from User’s Utterance in Spoken Dialog System,” Proc. IWSDS 2015
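A sketch of the per-domain binary classification, assuming one logistic-regression model per domain over three hypothetical features (intent confidence, NER confidence and semantic consistency, each in [0, 1]) and a 0.5 threshold; the weights are invented for illustration. Several positive domains signal a multi-domain request, and no positive domain signals an out-of-domain utterance.

```python
import math
from typing import Dict, List, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of one per-domain logistic-regression classifier


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_probability(x: List[float], model: Model) -> float:
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def detect_domains(features: Dict[str, List[float]], models: Dict[str, Model],
                   threshold: float = 0.5) -> List[str]:
    """One binary classifier per domain; several positives = multi-domain, none = out-of-domain."""
    return [d for d, x in features.items() if domain_probability(x, models[d]) >= threshold]


# Hypothetical models; features are computed separately with each domain's SLU models
MODELS: Dict[str, Model] = {"TV_EPG": ([2.0, 2.0, 1.0], -2.5), "RADIO_EPG": ([2.0, 2.0, 1.0], -2.5)}
```

Running independent classifiers, rather than one multi-class classifier, is what allows zero or several domains to be selected for the same utterance.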

  • Out-of-Domain / Domain Detection
    Extraction of features (figure): the input sentence is part-of-speech tagged; the preprocessed sentence then goes through intent determination (intent determination model), NER (NER model), lexical LM scoring (lexical LM) and LSP LM scoring (LSP LM, LSP lexicon); the resulting intent and named entities are mapped through an intent/NE co-occurrence table to measure semantic consistency.
    – x1: confidence score of intent determination
    – x2: confidence score of named entity recognition
    – x3: semantic consistency of intent and named entities
    – x4: probability of the input sentence (lexical LM score)
    – x5: probability of the lexico-semantic pattern of the input sentence (LSP LM score)
    ※ We are currently working on exploiting distributed word representations in language modeling.

  • POMDP-DM with Hybrid Architecture
    • Motivation of the proposed method
      – Uncertainty problem in deterministic DM: difficulty in choosing proper actions for ambiguous input
      – Scalability problem in POMDP-DM: difficulty in designing/tracking the dialog state, in training the POMDP policy, and in eliciting system actions
    • Core idea of the hybrid architecture
      – Generate summary meta-actions with the POMDP framework
      – Translate the meta-actions into system output with the deterministic framework

  • POMDP-DM with Hybrid Architecture
    • Concept diagram of the proposed architecture
      (Figure: the meta action selector, a POMDP action selector, maps an ambiguous input to a meta action. If the meta action is Confirm, the corresponding component produces the output; if it is Submit, the input passes to the service DM (rule-based DM or example-based DM), which produces the system action.)

  • POMDP-DM with Hybrid Architecture
    • Main architecture of the proposed system
      (Figure: user input recognition produces an ASR/NLU result for an ambiguous user input. The tracker part (feature extractor, phoneme matcher, tracker model) produces a tracked result and a summary state; the meta action selector (POMDP action selector, POMDP model) chooses a meta action such as “confirm 1st value”, “request slot value” or “provide service sentence”; the service provider (service dialog management, slot DB, response DB) produces the output.)

  • POMDP-DM with Hybrid Architecture
    • Tracking the belief state
      – Estimation of the observation o from the NLU hypothesis H: $P(o \mid H)$
        • Phoneme/word-level matcher; e.g., for the area slot, one value matches the hypothesis with P ≈ 0.78 and another with P ≈ 0.0
      – Estimation of the belief update: $b'(s') = P(s' \mid s, a, o)$
        • Rule-based tracking to reduce computational complexity
    (Figure: probability over the area values south, north, east, west and none after each turn. S1: How may I help you? U1: western food please? S2: You mean west restaurant? U2: No, I don’t mean it)

  • POMDP-DM with Hybrid Architecture
    • Generating meta-actions: construction of the summary state
      – Build a summary state for the 1st and 2nd values of each slot
      – Also build a “User Intention Slot” summary state for the 1st and 2nd values
    Blaise Thomson and Steve Young. “Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems.” Computer Speech & Language, 2010, vol. 24, issue 4, pp. 562–588.
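The belief update and the 1st/2nd-value summary state for a single slot can be sketched with the standard Bayesian update $b'(s') \propto P(o \mid s') \sum_s P(s' \mid s, a)\, b(s)$. Assumptions: a static-goal transition model with a small change probability, no dependence on the system action, and hand-set observation likelihoods standing in for the phoneme/word-level matcher (all numbers hypothetical). The slides use rule-based tracking instead, so this illustrates the idea rather than the authors' tracker.

```python
from typing import Dict, Tuple

Belief = Dict[str, float]


def belief_update(belief: Belief, obs_lik: Belief, p_change: float = 0.1) -> Belief:
    """b'(s') ∝ P(o|s') * sum_s P(s'|s) b(s) for one slot.

    Hypothetical transition model: the user's goal stays the same with probability
    1 - p_change and otherwise moves uniformly to one of the other values.
    """
    n = len(belief)
    predicted = {
        s2: sum(b * ((1 - p_change) if s == s2 else p_change / (n - 1)) for s, b in belief.items())
        for s2 in belief
    }
    unnorm = {s2: obs_lik[s2] * p for s2, p in predicted.items()}
    z = sum(unnorm.values())
    return {s2: v / z for s2, v in unnorm.items()}


def summary_state(belief: Belief) -> Tuple[str, float, str, float]:
    """Summary state: the 1st and 2nd most probable values with their probabilities."""
    (v1, p1), (v2, p2) = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)[:2]
    return v1, p1, v2, p2


values = ["south", "north", "east", "west", "none"]
b0 = {v: 1 / len(values) for v in values}
# U1 "western food please?": hypothetical matcher likelihoods favouring "west"
b1 = belief_update(b0, {"south": 0.02, "north": 0.02, "east": 0.05, "west": 0.78, "none": 0.10})
# U2 "No, I don't mean it" after S2 "You mean west restaurant?": treated as a denial of "west"
b2 = belief_update(b1, {"south": 0.5, "north": 0.5, "east": 0.5, "west": 0.05, "none": 0.5})
```

After U1 the summary state's 1st value is "west"; the denial in U2 moves the mass away from it, which a POMDP action selector could read as a cue to re-ask rather than submit.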

  • POMDP-DM with Hybrid Architecture
    • Generating meta-actions (cont.): construction of the POMDP framework
      – Construct separate POMDP frameworks for the UI (user intention) slot and the NE slots
      – Train each POMDP framework independently
    (Figure: the beliefs b′(s) over the 1st and 2nd values of the UI slot, NE #1 and NE #2 each feed a POMDP action selector that outputs Submit or Restart; the meta responses go to the system action model, which uses the template DB and slot DB to produce the output sentence)

  • POMDP-DM with Hybrid Architecture
    • Generating meta-actions (multiple slot values): construction of the POMDP framework
      – Construct separate POMDP frameworks for the UI slot and the NE slots
      – Train each POMDP framework independently
    (Figure: the beliefs b′(s) over the 1st and 2nd values of the UI slot, NE #1 and NE #2 each feed a POMDP action selector that outputs Submit or Restart, followed by the system action model, template DB and slot DB producing the output sentence; model construction is split into UI–POMDP model training and NE–POMDP model training)

  • POMDP-DM with Hybrid Architecture
    • Experiment (change of reward along the learning curve): observing the learning curve during training
      – Each POMDP component was trained for 400 epochs; convergence of the reward was observed
    (Figure: average reward vs. epoch for the UI slot and for the NE slot)
    Sangjun Koo, Seonghan Ryu, Kyusong Lee, Gary Geunbae Lee. Scalable summary-state POMDP hybrid dialog system for multiple goal drifting requests and massive slot entity instances. Proceedings of the International Workshop Series on Spoken Dialog Systems (IWSDS 2015), Jan 2015, Busan.

  • 67

    Ontology-based Inference System [1/3]

    • Ontology-based inference
      – Integrates cross-domain knowledge through an ontology and its inference rules
    • Used for:
      – IoT dialog system
      – Smart home
      – Smart healthcare

  • 68

    Ontology-based Inference System [2/3]

    • Ontology
      – OWL: a family of knowledge representation languages for knowledge bases
    • Inference rules
      – SWRL: OWL-DL + RuleML, using the Datalog sublanguage of Horn clauses

    FastComputer(?c) ← Computer(?c) ∧ hasCPU(?c, ?cpu) ∧ hasSpeed(?cpu, ?sp) ∧ HighSpeed(?sp)
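    The FastComputer rule is a Horn clause: whenever every body atom holds, the head atom is derived. A minimal forward-chaining sketch, assuming facts are stored as tuples such as ("hasCPU", "pc1", "cpu1"); the individuals pc1, cpu1 and so on are made up, and a real OWL/SWRL reasoner would also use class hierarchies and repeat rule application until no new facts appear.

```python
def apply_fast_computer_rule(facts):
    """One application of:
    Computer(?c) ∧ hasCPU(?c, ?cpu) ∧ hasSpeed(?cpu, ?sp) ∧ HighSpeed(?sp) → FastComputer(?c)
    """
    computers = {f[1] for f in facts if f[0] == "Computer"}
    high_speeds = {f[1] for f in facts if f[0] == "HighSpeed"}
    has_cpu = [(f[1], f[2]) for f in facts if f[0] == "hasCPU"]
    has_speed = [(f[1], f[2]) for f in facts if f[0] == "hasSpeed"]
    derived = set()
    for c, cpu in has_cpu:              # bind ?c and ?cpu
        if c not in computers:
            continue
        for cpu2, sp in has_speed:      # join on ?cpu, bind ?sp
            if cpu2 == cpu and sp in high_speeds:
                derived.add(("FastComputer", c))
    return derived


facts = {
    ("Computer", "pc1"), ("hasCPU", "pc1", "cpu1"),
    ("hasSpeed", "cpu1", "speed1"), ("HighSpeed", "speed1"),
    ("Computer", "pc2"), ("hasCPU", "pc2", "cpu2"),
    ("hasSpeed", "cpu2", "speed2"),     # speed2 is not asserted to be HighSpeed
}
print(apply_fast_computer_rule(facts))
```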

  • 69

    Ontology-based Inference System [3/3]

    [Figure: system architecture. Components: Spoken Language Understanding (ASR output → user intentions, named entities); Dialog Manager with Dialog Modeling (→ system action); Knowledge Manager (generated query statement, knowledge); Inference Engine; Ontology Resource; Natural Language Generation (→ system response)]

  • 70

    An Example Scenario of Inference Process

    Semantic representation for the input sentence
      raw utterance sentence: I want to eat something spicy.
      intent: ask_food_recommendation
      named entity: something spicy
    After searching ontologies
      intent: ask_food_recommendation_in_fridge
      named entity: something spicy
    Knowledge for user/environments
      speaker: Tom
      favorite food: Tteok-bokki, Spaghetti, ...
      fridge materials: Tteok, hot pepper paste, spring onion, ...
    Knowledge for reasoning
      premise 1: Tom likes Tteok-bokki, Spaghetti, ...
      premise 2: In the fridge, there are Tteok, hot pepper paste, spring onion, ...
      premise 3: The recipe of Tteok-bokki is Tteok, hot pepper paste, and spring onion
      premise 4: Tteok-bokki is a kind of spicy food
      premise 5: Tom now wants to eat something spicy
    Output of DM
      system action: suggest_food
      output sentence template: How about eating {food}?
      output sentence: How about eating Tteok-bokki?
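    The premises can be read as a simple filter: a favorite food of the requested type whose recipe ingredients are all in the fridge. This is a hypothetical sketch of that reasoning step, not the ontology engine itself; the Spaghetti recipe is invented so the example contains a food that fails the check.

```python
def recommend_food(favorites, fridge, recipes, food_types, wanted_type):
    """Favorite foods of the wanted type whose ingredients are all in the fridge."""
    available = set(fridge)
    return [
        food for food in favorites
        if wanted_type in food_types.get(food, set())    # premises 4 and 5
        and set(recipes.get(food, ())) <= available       # premises 2 and 3
    ]


favorites = ["Tteok-bokki", "Spaghetti"]                           # premise 1
fridge = ["Tteok", "hot pepper paste", "spring onion"]             # premise 2
recipes = {"Tteok-bokki": ["Tteok", "hot pepper paste", "spring onion"],
           "Spaghetti": ["pasta", "tomato sauce"]}                 # Spaghetti recipe is hypothetical
food_types = {"Tteok-bokki": {"spicy"}}                            # premise 4
template = "How about eating {food}?"

candidates = recommend_food(favorites, fridge, recipes, food_types, "spicy")
if candidates:
    print(template.format(food=candidates[0]))
```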

  • 71

    Experimental result

    [Figure panels: Before inference; Added weather info; After inference]

    Jaiyoun Song, Seonhan Ryu, Sangjun Koo, Gary Geunbae Lee. Ontology reasoning-based intelligent assistant for smart home. Proceedings of the IEEE spoken language technology workshop (SLT 2014), Dec 2014, Nevada (demo presentation)

  • Deep Neural Network

    Deep Neural Network

    Neural network + Multiple non-linear hidden layers

  • Why Deep Neural Network?

    • Deep levels of abstraction
      – Mimics the human cognition process
      – From low-level (simple) to high-level (complex) features

    73

  • Why Deep Neural Network?

    • Integrated learning: automatic feature extraction
      – Traditional machine learning methods: hand-crafted features + classifier
      – Deep neural network: feature extractor + classifier, learned together

    74

  • Deep Neural Network Today

    Difficulties of DNN

    1. Difficult to train
       – Plain back-propagation trains deep networks poorly (vanishing gradients)
       – Remedy: unsupervised pre-training
    2. Computation-intensive: many parameters
       – Remedy: GPGPU, cloud computing
    3. Over-fitting
       – Remedy: regularization, drop-out technique

    75

  • Deep Neural Network Today

    Deep Belief Network [Hinton 06]

    • Pre-train the layers one at a time with an unsupervised learning algorithm
    • Then fine-tune the whole network with supervised learning
    • DBNs are stacks of restricted Boltzmann machines (RBMs)

    76

  • Deep Neural Network Today

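    Autoencoder

    • A NN trained so that its output reproduces its input
    • To learn is to compress data
    • The autoencoder learns an encoding (representation) of the data
    • Learns to minimize the reconstruction error, where x_m is the m-th input and y_m the network's output for it:

      $$\min \sum_{m} \lVert y_m - x_m \rVert^2$$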

    77

  • Deep Neural Network Today

    Drop-out method [Hinton 12]

    • Randomly drop out (temporarily remove) hidden units during training
    • Reduces the over-fitting problem

    78
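    A minimal sketch of dropout applied to a vector of hidden activations. It uses the common "inverted dropout" variant (surviving units are scaled by 1/(1 − p) during training, so nothing changes at test time); the original formulation instead scales the weights at test time. The function name and drop probability are illustrative.

```python
import random


def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p_drop while training,
    scaling the survivors by 1 / (1 - p_drop); identity at test time."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


h = [0.5, 1.2, -0.3, 2.0, 0.7]
print(dropout(h, p_drop=0.5, rng=random.Random(0)))   # a random subset is zeroed
print(dropout(h, p_drop=0.5, training=False))         # unchanged at test time
```

    Because a different random subset is dropped for every example, no unit can rely on a specific set of other units being present, which is what discourages over-fitting.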

  • Deep Neural Network Today

    Rectified linear unit (ReLU)

    • Activation function used instead of sigmoid
    • Sparse activation: only some neurons have non-zero output

      $$g(x) = \max(0, x)$$

    79
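    For positive inputs ReLU has a gradient of exactly 1, which is one reason it eases the vanishing-gradient problem of sigmoid units; negative inputs give exactly zero, which is where the sparsity comes from. A small sketch with made-up pre-activations:

```python
def relu(x):
    """g(x) = max(0, x)"""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0): 1 for positive inputs, else 0."""
    return 1.0 if x > 0 else 0.0


pre_activations = [-1.5, 0.3, -0.2, 2.0, -0.7, 0.0]
activations = [relu(v) for v in pre_activations]
n_active = sum(1 for a in activations if a > 0)
print(activations, f"{n_active}/{len(activations)} units active")
```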

  • Deep Neural Network Today

    Convolutional neural network [LeCun 98]

    • Sparse network: each unit sees only the local features within its window
    • The weights are shared between windows
    • Popular for image recognition
    • Two kinds of layers, alternating:
      – Convolution layer: extracts features from the previous layer
      – Max-pooling layer: sub-samples by taking the maximum

    80
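    For text, the same idea is usually applied in one dimension over a sequence of word vectors. The sketch shows one shared filter sliding over windows of k words (local connectivity, weight sharing), followed by max-pooling. The vectors, kernel and pool size are illustrative assumptions; a real CNN uses many filters learned by back-propagation.

```python
def conv1d(seq, kernel, bias=0.0):
    """Valid 1-D convolution + ReLU over a sequence of word vectors.

    The same kernel (k x dim weights) is applied to every window of k
    consecutive vectors, i.e. the weights are shared between windows.
    """
    k, dim = len(kernel), len(seq[0])
    out = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        s = bias + sum(kernel[j][d] * window[j][d] for j in range(k) for d in range(dim))
        out.append(max(0.0, s))
    return out


def max_pool(values, size):
    """Non-overlapping max-pooling: keep the maximum of each block of `size` values."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
features = conv1d(seq, kernel)
print(features, max_pool(features, 2), max(features))
```

    Taking max(features) over the whole sequence ("max-over-time" pooling) gives a fixed-size output regardless of sentence length.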

  • Deep Neural Network Today for NLP

    Word representation

    1. One-hot vector
       Ex. [0 0 0 0 0 …… 0 0 0 1 0 0 …… 0 0 0 0 0 0 0]
       High dimension: 20K (speech) ~ 3M (Google 1T)
    2. Class-based word representation
       Hard clustering, e.g. Brown clustering (Brown et al. 1992)
    3. Continuous representation
       e.g. Latent semantic analysis (LSA), random projection, Latent Dirichlet Allocation (LDA), HMM clustering
       Distributed representations (neural word embedding)
       – Dense vectors
       – Used as pre-training; supervised training then further improves the representation

    81
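    The link between a one-hot vector and a dense embedding: multiplying a one-hot row vector by an embedding matrix E selects one row of E, which is why embedding layers are implemented as table lookups. The vocabulary and the 2-dimensional vectors below are made up.

```python
def one_hot(index, vocab_size):
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


def one_hot_times_matrix(one_hot_vec, matrix):
    """Compute one_hot_vec^T · matrix explicitly (one weighted sum per column)."""
    n_cols = len(matrix[0])
    return [sum(one_hot_vec[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(n_cols)]


vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
E = [[0.1, 0.3], [0.7, -0.2], [0.0, 0.5], [0.6, -0.1]]   # hypothetical embeddings

x = one_hot(vocab["cat"], len(vocab))
dense = one_hot_times_matrix(x, E)
print(x, dense, E[vocab["cat"]])
```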

  • Deep Neural Network Today

    Neural network language model (NNLM)

    • Language model: a model that predicts the next word given its context
    • NN language model
      – Two hidden layers
      – Training complexity is high, dominated by the hidden-to-output layer (a softmax over the whole vocabulary)
      – Remedies, e.g. hierarchical softmax, negative sampling, ranking (hinge loss)

    [Figure: NNLM with inputs w(t-3), w(t-2), w(t-1), projection and hidden layers, and output w(t)]

    82

  • Deep Neural Network Today

    Neural network language model (NNLM)

    • Negative sampling (unsupervised training)
      – A word together with its context is a positive sample
      – A random word placed in that context is a negative sample
      – Trained so that Score(positive) > Score(negative)

    [Figure: the same NNLM with inputs w(t-3), w(t-2), w(t-1), projection and hidden layers, and output w(t)]

    83
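    A sketch of the ranking objective: with a dot-product score, the hinge loss max(0, margin − s(pos) + s(neg)) is zero once the positive sample outscores each negative by the margin. The vectors, margin and sampler are illustrative assumptions; word2vec's own negative sampling uses a logistic loss rather than this hinge loss.

```python
import random


def score(context_vec, word_vec):
    """Dot-product score between a context representation and a word vector."""
    return sum(c * w for c, w in zip(context_vec, word_vec))


def ranking_hinge_loss(context_vec, pos_vec, neg_vecs, margin=1.0):
    """Sum over negatives of max(0, margin - s(pos) + s(neg))."""
    s_pos = score(context_vec, pos_vec)
    return sum(max(0.0, margin - s_pos + score(context_vec, n)) for n in neg_vecs)


def sample_negatives(vocab, positive, k, rng):
    """Draw k random words other than the observed one."""
    candidates = [w for w in vocab if w != positive]
    return rng.sample(candidates, k)


context = [1.0, 0.0]
positive = [2.0, 0.0]
negatives = [[0.5, 1.0], [1.5, 0.0]]   # the second one violates the margin
loss = ranking_hinge_loss(context, positive, negatives)
print(loss, sample_negatives(["the", "cat", "sat", "mat"], "cat", 2, random.Random(0)))
```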

  • Deep Neural Network Today

    Word2vec

    • Removes the hidden layer
      – 1000x speed-up
    • Continuous bag-of-words (CBoW)
      – Predicts the current word given the context
    • Skip-gram
      – Predicts the context given the current word

    [Figure: inputs w(t-3), w(t-2), w(t-1), projection, output w(t); no hidden layer]

    84
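    The two word2vec architectures differ mainly in how (input, target) training pairs are built from a sentence. A sketch of that pair generation with a symmetric window; the window size and sentence are arbitrary, and the embedding training itself is omitted.

```python
def cbow_pairs(tokens, window):
    """CBoW: (context words, current word) pairs."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window):
    """Skip-gram: (current word, one context word) pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["the", "cat", "sat", "on", "mat"]
print(cbow_pairs(tokens, 1))
print(skipgram_pairs(tokens, 1))
```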

  • DNN based Korean Dependency Parsing [Changki Lee, 2014]

    • Transition-based + backward parsing, O(N)
    • Constituency corpus → dependency corpus, after pre-processing
    • Deep learning based: ReLU + dropout
      – Better than sigmoid
    • Korean word embedding
      – NNLM, ranking (hinge loss, logit loss)
      – Word2vec
    • Feature embedding
      – Auto-tagged PoS (stack + buffer)
      – Dependency label (stack)
      – Distance information
      – Valency information
      – Mutual information from automatic parsing of a massive corpus

    85

  • DNN based Multi-lingual Multi-task NLP [postech on-going]

    Collecting corpora for all languages and all tasks is impossible

    • Adaptive learning
      – Methods that use one language's or task's information for another language or task
      – Distributed word embedding is essential
    • Language transfer: from one language to another
      – Pre-train with one language and further train with another language
    • Multi-task learning: from one task to another
      – Hidden layers: trained with all tasks' data
      – Output layer: trained with task-specific data

    [Figure: shared hidden layers with task-specific output layers for parsing, tagging and semantic role labeling]

    *Sooncheol Kwon, Byungsoo Kim, Seonyeong Park, Sangdo Han, Gary Geunbae Lee. Multi-lingual knowledge transfer for dependency parsing using deep neural network, submitted

    86
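    A forward-pass sketch of the hard parameter sharing described for multi-task learning: one hidden layer shared by every task, plus one output layer per task. The layer sizes, weights and task names are hypothetical, and the training loop is omitted; trainable_params only shows which parameters a batch from a given task would update.

```python
def linear(weights, bias, x):
    """y = W x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu_vec(v):
    return [max(0.0, a) for a in v]


class MultiTaskNet:
    def __init__(self, shared, heads):
        self.shared = shared   # (W, b) of the hidden layer, shared by all tasks
        self.heads = heads     # {task: (W, b)} task-specific output layers

    def forward(self, x, task):
        h = relu_vec(linear(*self.shared, x))    # shared representation
        return linear(*self.heads[task], h)      # task-specific output

    def trainable_params(self, task):
        """A batch of `task` data updates the shared layer and that task's head only."""
        return [self.shared, self.heads[task]]


net = MultiTaskNet(
    shared=([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    heads={"tagging": ([[1.0, 1.0]], [0.0]),
           "parsing": ([[1.0, -1.0]], [0.5])},
)
x = [2.0, -1.0]
print(net.forward(x, "tagging"), net.forward(x, "parsing"))
```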

  • Demo – YouTube postech isoft: https://www.youtube.com/watch?v=4jg0Tknl-Rw

    Multi-domain dialog system, multi-modal dialog system (smart home), inference dialog system, one-step ASR error correction

  • Multi-strategy knowledge search

    Question Answering (QA) systems

  • 89

    Multi-Source Hybrid Question Answering (QA) System

    [Figure: system architecture with these components. Input handling: User Input → Detect Question or Keywords. Question Analysis: Keyword, Answer Type Extractor; Slot, Template Extractor. Knowledge-base search: Sparql Query generator and Sparql Query Search over a Web-based RDF triple database built from DBpedia and from RDF triples extracted offline from Web documents (wiki) by Open Information Extraction. Document search: Document Processing with documents indexing and IR/web search (Lucene) → relevant Web documents → Passage Retrieval → passages; Answer Processing with possible answer extraction & formulation and Answer Ranking. Answer Merging and Answer Selection produce the Answer. Keyword path: Keyword Process (Keyword to Entity, Keyword to Property), Query Generator, Triple Extractor, Natural Language Generator with Template DB → Report.]

  • 90

    Knowledge based QA System

    • Template-based approach [Unger et al, WWW 2012]

    [Figure: template-based pipeline over an RDF knowledge base (human knowledge base): features for query (focus, LAT, query template, graph pattern for SPARQL); lexical alteration (synonym/alternative words); disambiguation (KB property/entity); real SPARQL generation; merging the retrieved results; answer candidates generation → answer candidates]

    Seonyeong Park, Hyosup Shim, Gary Geunbae Lee. Isoft at QALD-4: Semantic similarity based question answering system over linked data. in Cappellato, L., Ferro, N., Halvey, M., and Kraaij, W., (eds.), CLEF 2014 Labs and Workshops, Notebook Papers (Qald task). CEUR Workshop Proceedings, vol-1180, CEUR-WS.org (2014). Sept 2014, Sheffield

  • 91

    Knowledge based QA System

    Semantic parsing based approach [Berant et al, EMNLP 2013]
    Example: Where was Obama born?

  • 92

    QA on Knowledge base

    • Semantic parsing based approach
      – Maps natural language sentences to formal semantic representations
      – Independent of word order and paraphrase
      – Translating into a KB query language is relatively easy
      – Requires
        • A well-defined formal representation
        • A set of concepts (knowledge base)
      – Earlier research focused on toy-sized KBs; recent systems such as Sempre and paraSempre use bigger, more general KBs like DBpedia and Freebase

  • 93

    QA on Knowledge base

    • Semantic parsing based approach [Berant & Liang, ACL 2014]
      – Process
        • Segmentation
          – Generate segmentations of the sentence
        • Translate segments into KB vocabulary
          – Look up each segment in the KB vocabulary
          – Keep two dictionaries of KB concepts: a dictionary of entities and a dictionary of properties
          – Match named entities by string similarity
          – Match properties with a natural language-to-property model
        • Combining
          – Combine the segments into a single formal representation
          – Performed according to combining rules
        • Rewrite into a query language

  • 94

    QA on Knowledge base

    • Semantic parsing based approach: example
      – Question: Where was Barack Obama born?
      – Segmentation: [Where] was [Barack Obama] [born] ?
      – Disambiguation:
        [Where] → Type.Location
        [Barack Obama] → Barack_Obama
        [born] → PeopleBornHere
      – Combining: Type.Location ^ PeopleBornHere.Barack_Obama
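    Once the logical form is built, the last step rewrites it into a query language. A sketch targeting DBpedia, assuming Type.Location maps to the class dbo:Place and PeopleBornHere.e denotes the places x with e dbo:birthPlace x; both mappings are hypothetical stand-ins for the lexicon a real semantic parser learns.

```python
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"

TYPE_MAP = {"Type.Location": "Place"}                    # hypothetical answer-type mapping
REVERSE_PROPERTY_MAP = {"PeopleBornHere": "birthPlace"}  # hypothetical: e birthPlace ?x


def logical_form_to_sparql(lf):
    """Rewrite 'Type.T ^ R.entity' (R the reverse of a KB property) into SPARQL."""
    type_part, rel_part = (p.strip() for p in lf.split("^"))
    relation, entity = rel_part.split(".", 1)
    cls = TYPE_MAP[type_part]
    prop = REVERSE_PROPERTY_MAP[relation]
    return "\n".join([
        f"PREFIX dbo: <{DBO}>",
        f"PREFIX dbr: <{DBR}>",
        "SELECT DISTINCT ?x WHERE {",
        f"  dbr:{entity} dbo:{prop} ?x .",
        f"  ?x a dbo:{cls} .",
        "}",
    ])


query = logical_form_to_sparql("Type.Location ^ PeopleBornHere.Barack_Obama")
print(query)
```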

  • 95

    QA on Knowledge base

    • Compiling the natural language-to-property mapping: aligning approach
      – Align pseudo triples from text to triples from the KB
        • Extract pseudo triples from text
          » Ex> Mary Todd married Abraham Lincoln on November 4, 1842
        • Disambiguate them to KB entities
        • Align them to existing "real" triples in the KB (pseudo triple ↔ real triple from KB)
        • Collect matched phrase-property pairs from the aligned triples (prefix "!" means reverse order)
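    The alignment step can be sketched as a join between disambiguated pseudo triples and KB triples on their argument pairs, counting which phrase co-occurs with which property; a match with the arguments reversed gets the "!" prefix. The KB triples below are a tiny hypothetical sample, not DBpedia or Freebase data.

```python
from collections import Counter


def collect_phrase_property_pairs(pseudo_triples, kb_triples):
    """Count (phrase, property) pairs from pseudo triples aligned to KB triples."""
    by_args = {}
    for s, p, o in kb_triples:
        by_args.setdefault((s, o), set()).add(p)
    pairs = Counter()
    for e1, phrase, e2 in pseudo_triples:
        for p in by_args.get((e1, e2), ()):
            pairs[(phrase, p)] += 1          # same argument order
        for p in by_args.get((e2, e1), ()):
            pairs[(phrase, "!" + p)] += 1    # "!" marks reversed argument order
    return pairs


kb = [("Abraham_Lincoln", "spouse", "Mary_Todd_Lincoln"),
      ("Albert_Einstein", "spouse", "Elsa_Einstein")]
pseudo = [("Mary_Todd_Lincoln", "married", "Abraham_Lincoln"),
          ("Albert_Einstein", "married", "Elsa_Einstein")]
pairs = collect_phrase_property_pairs(pseudo, kb)
print(pairs)
```

    Over a large corpus the counts become the natural language-to-property model used to match properties during semantic parsing.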

  • 96

    QA on Knowledge base

    • Experiment
      – Test set
        • 138 questions from the WebQuestions training set
        • Wh-questions about a figure, with a single answer
        • No alternative forms of named entities

      Precision   Recall   F1-score
      0.7301      0.8911   0.8025

    Seonyeong PARK, Hyosup SHIM, Sangdo HAN, Byeongsoo KIM, Gary Geunbae LEE. Multi-source hybrid question answering system. Proceedings of the international workshop series on spoken dialog systems (IWSDS 2015), Jan 2015, Busan (demo presentation)

  • 97

    Open Information Extraction [Etzioni, Wu@UW]

    • Extracts triples from a sentence
      – Triple format: < argument1 ; relation ; argument2 >
      – Argument: a noun phrase in the sentence
      – Relation: a phrase that shows the relationship between the two arguments
    • Ex) sentence: Gautama Buddha taught primarily in Northeastern India
          triple: < Gautama Buddha ; taught in ; Northeastern India >
    • Open IE does not require any pre-specified relations
    • Suitable for IE at Web scale

  • 98

    Open Information Extraction

    Dependency pattern based IE [WOE, Wu & Weld, ACL 2010]
    • Extraction template: arg1 ←nsubj– rel –prep_in→ arg2 ⇒ < arg1 ; rel in ; arg2 >
    • Ex) Gautama Buddha taught primarily in Northeastern India
      Dependencies: nsubj(taught, Buddha), nn(Buddha, Gautama), advmod(taught, primarily), prep_in(taught, India), amod(India, Northeastern)
      triple: < Gautama Buddha ; taught in ; Northeastern India >
    • Learning extraction templates
      – Collect training data (triple-sentence pairs) automatically (bootstrapping)
      – Learn extraction templates from the training data
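    A sketch of applying the nsubj / prep_in template to a dependency parse given as (head, label, dependent) edges. It assumes each word occurs once in the sentence and that nn/amod modifiers are listed in surface order; WOE itself learns many such patterns automatically rather than hard-coding one.

```python
def expand(word, edges, modifier_labels=("nn", "amod")):
    """Prepend compound/adjectival modifiers, e.g. Buddha -> Gautama Buddha."""
    mods = [d for h, label, d in edges if h == word and label in modifier_labels]
    return " ".join(mods + [word])


def extract_triples(edges):
    """Template: arg1 <-nsubj- rel -prep_X-> arg2  =>  < arg1 ; rel X ; arg2 >"""
    triples = []
    for head, label, subj in edges:
        if label != "nsubj":
            continue
        for head2, label2, obj in edges:
            if head2 == head and label2.startswith("prep_"):
                prep = label2[len("prep_"):]
                triples.append((expand(subj, edges), f"{head} {prep}", expand(obj, edges)))
    return triples


edges = [("taught", "nsubj", "Buddha"), ("Buddha", "nn", "Gautama"),
         ("taught", "advmod", "primarily"), ("taught", "prep_in", "India"),
         ("India", "amod", "Northeastern")]
print(extract_triples(edges))
```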

  • 99

    Open Information Extraction

    SRL based IE [Christensen et al, NAACL workshop 2010]
    • SRL: identifying the arguments of a predicate together with their roles
      – SRL results can be converted to Open IE triples
    • Ex) Eli Whitney created the cotton gin in 1793
      – SRL result
        • predicate: created
        • arg0: Whitney (Eli Whitney)
        • arg1: gin (cotton gin)
        • argm-TMP: in (in 1793)
      – Conversion to Open IE triple style
        • < Eli Whitney ; created ; cotton gin >
        • < Eli Whitney ; created cotton gin in ; 1793 >
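    The conversion in the example can be sketched as follows. The frame dictionary is a hypothetical format for one predicate and its arguments, not the output format of any particular SRL tool.

```python
def srl_to_triples(frame):
    """Turn one SRL frame into Open IE style triples."""
    pred = frame["predicate"]
    arg0, arg1 = frame.get("ARG0"), frame.get("ARG1")
    triples = []
    if arg0 and arg1:
        triples.append((arg0, pred, arg1))                  # core relation
    for _role, (prep, obj) in frame.get("modifiers", {}).items():
        if arg0:
            # Fold ARG1 and the preposition into the relation phrase.
            rel = f"{pred} {arg1} {prep}" if arg1 else f"{pred} {prep}"
            triples.append((arg0, rel, obj))
    return triples


frame = {"predicate": "created", "ARG0": "Eli Whitney", "ARG1": "cotton gin",
         "modifiers": {"ARGM-TMP": ("in", "1793")}}
print(srl_to_triples(frame))
```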

  • 100

    Open Information Extraction

    Current implementation (POSTECH, combined): bootstrapped dependency patterns + SRL results
    – SRL: extracts only verb-mediated relations, with relatively high precision
    – Bootstrapped dependency patterns: extract both verb- and noun-mediated relations
    – Ex) Princeton economist Paul Krugman was awarded the Nobel prize
      • Verb-mediated relation: < Princeton economist Paul Krugman ; was awarded ; the Nobel prize >
      • Noun-mediated relation: < Princeton ; economist ; Paul Krugman >
    – SRL-based extraction is applied to verb-mediated relations, and dependency pattern based extraction to noun-mediated relations

  • 101

    Open Information Extraction

    Experiment
    • Test data
      – Data from Fader et al., "Identifying Relations for Open Information Extraction", EMNLP 2011
      – 500 randomly selected web sentences, of which we used 100
    • Result

    *Byungsoo KIM, Hyosup SHIM, Sangdo HAN, Soonchoul KWON, Seonyeong PARK, Gary Geunbae LEE. Relation disambiguation using ontology type checking and semantic relatedness. Submitted

  • 102

    Applying Open IE to Knowledge Base

    Knowledge base augmentation
    • Triples extracted by Open IE can be used to augment existing knowledge bases
      – Arguments and relations must be mapped to canonical forms in the ontology (disambiguation)
    • Ex) Einstein married Elsa Lowenthal on 2 June 1919.
      – Triple from Open IE: < Einstein ; married ; Elsa Lowenthal >
      – Disambiguation: Einstein → Albert_Einstein, Elsa Lowenthal → Elsa_Einstein, married → spouse
      – DBpedia ontology RDF triple: < dbr:Albert_Einstein ; dbo:spouse ; dbr:Elsa_Einstein >

  • 103

    Applying Open IE to Knowledge Base

    Disambiguation with constraints
    • Relations in the ontology have proper argument types
    • Ex) < Alain_Connes ; birthPlace ; Draguignan >, < Ayn_Rand ; birthPlace ; Saint_Petersburg > → < type:Person ; birthPlace ; type:Place >
          < The_Birth_of_a_Nation ; director ; D._W._Griffith > → < type:Film ; director ; type:Person >
    • Use this argument type constraint when disambiguating relations
      – Disambiguate the arguments first, then use their type information for relation disambiguation
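    A sketch of the type-constraint check: once the arguments are disambiguated to entities with known types, only candidate properties whose (domain, range) signature fits those types are kept. The signatures and entity types are a small hand-written sample in the style of the DBpedia ontology, not data pulled from it.

```python
SIGNATURES = {                       # hypothetical (domain, range) per property
    "birthPlace": ("Person", "Place"),
    "director": ("Film", "Person"),
    "spouse": ("Person", "Person"),
}
ENTITY_TYPES = {                     # hypothetical types of disambiguated arguments
    "Albert_Einstein": {"Person", "Agent"},
    "Elsa_Einstein": {"Person", "Agent"},
    "Ayn_Rand": {"Person", "Agent"},
    "Saint_Petersburg": {"Place", "PopulatedPlace"},
}


def disambiguate_relation(candidates, arg1, arg2,
                          signatures=SIGNATURES, entity_types=ENTITY_TYPES):
    """Keep candidate properties whose domain/range match the argument types."""
    t1 = entity_types.get(arg1, set())
    t2 = entity_types.get(arg2, set())
    return [p for p in candidates
            if p in signatures and signatures[p][0] in t1 and signatures[p][1] in t2]


print(disambiguate_relation(["spouse", "birthPlace"], "Albert_Einstein", "Elsa_Einstein"))
print(disambiguate_relation(["birthPlace", "spouse"], "Ayn_Rand", "Saint_Petersburg"))
```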

  • 104

    Information Retrieval-based QA

    Scenario
    • Find the appropriate answer to the user's question using raw text data
      – User's questions:
        1. Where was Kim Yuna born?
        2. When did Kim Yuna get a gold medal?
        3. Who is Sotnikova?
      – Raw text data → answer, e.g. Bucheon, South Korea

  • 105

    Information Retrieval-based QA

    • Architecture

    [Figure: Question Processing (answer type detection, answer type mapping to DBpedia, entity extraction (NER), triple extraction (parser, SRL)) produces the answer type and keywords; Document Processing (documents scoring over a text (Wikipedia) database → relevant documents; passages scoring → passages); Answer Processing (answer candidates extraction, answer selection) → answer]

  • 106

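    Open Domain Semantic Answer Type

    • The answer type is important!
      – It can reduce the search space for finding the answer
      – Regard the answer type as the type of a named entity
      – Use the ontology in the knowledge base

    [Figure: fragment of the knowledge-base ontology used as the answer-type hierarchy: Thing → Agent, Activity, Drug, Event, …; Agent → Person → Athlete → Golf player, Swimmer, Tennis player, Wrestler, …; Activity → Game, Sport. Finer-grained types give more detailed information about the answer.]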

    Seonyeong Park, Donghyeon Lee, Seonghan Ryu, Byungsoo Kim, Gary Geunbae Lee. Hierarchical dirichlet process topic modelling for flexible answer type classification in open domain question answering. Proceedings of the 10th Asian information retrieval society conference (AIRS 2014), Sarawak, Dec 2014
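    With the answer type taken from an ontology, checking a candidate means walking up its class's parent chain. A sketch using the class fragment on this slide, with parent links assumed to follow the DBpedia ontology; the candidate entities and their types are illustrative.

```python
PARENT = {  # child class -> parent class (assumed DBpedia-style fragment)
    "Agent": "Thing", "Activity": "Thing", "Drug": "Thing", "Event": "Thing",
    "Person": "Agent", "Athlete": "Person",
    "GolfPlayer": "Athlete", "Swimmer": "Athlete",
    "TennisPlayer": "Athlete", "Wrestler": "Athlete",
    "Game": "Activity", "Sport": "Activity",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is one of its (transitive) subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def filter_by_answer_type(candidates, answer_type):
    return [entity for entity, cls in candidates if is_a(cls, answer_type)]


candidates = [("Rafael_Nadal", "TennisPlayer"), ("Tennis", "Sport"),
              ("Wimbledon_Championships", "Event")]
print(filter_by_answer_type(candidates, "Person"))
```

    A finer answer type (e.g. Athlete instead of Person) prunes more candidates, which is the search-space reduction the slide refers to.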

  • 107

    Open Domain Semantic Answer Type

    • It is hard to detect the answer type of a question using only lexical information
      – Ex) Q: Who composed the "magic waltz"? Answer type: composer
      – Ex) Q: What did Bruce Carver die from? Answer type: reason
    • Use semantic information in the question

    [Figure: extract various information from the input question (using question-answer pairs and web logs), then map it to a class in the ontology (requires inference) to obtain the answer type]

  • 108

    Open Domain Semantic Answer Type

    – Semantic answer type detector using the knowledge base

    [Figure: flowchart. User question → question parsing (parser, SRL) → main verb, focus, parsing result → extract the DBpedia property semantically similar to the main verb → detect the type of the focus using the type information of each property in DBpedia. Yes: answer type in the DBpedia ontology. No: previous hybrid (rule + supervised learning) answer type classifier → answer type in the previous small answer-type ontology → measure semantic similarity between the previous ontology and the DBpedia ontology → answer type in the DBpedia ontology.]

    Focus: the word that will be replaced by the answer. Therefore, the type of the focus is the answer type.

  • 109

    Open Domain Semantic Answer Type

    – Example of the semantic answer type detector using the knowledge base
      • Example 1
        – Q: Who has been married to Tom Cruise?
        – Main verb: married
        – Focus: "Who"
        – Parsing information: "who" is the subject of married (main verb)
        – Property semantically similar to the main verb: wife
        – Type information of the property:
        – Answer type: person
      • Example 2
        – Q: Who resides in the high-rise?
        – Main verb: resides
        – Focus: "Who"
        – Parsing information: "who" is the subject of resides (main verb)
        – Property semantically similar to the main verb: residence
        – Type information of the property:
        – Answer type: person

    Gabrilovich, Evgeniy, and Shaul Markovitch. "Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis." IJCAI. Vol. 7. 2007.
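    Both examples follow the same recipe: pick the KB property most similar to the main verb, then read the answer type off that property's signature according to the focus's syntactic role (subject → domain, object → range). In the sketch the similarity scores are made-up stand-ins for a measure such as ESA, and the (domain, range) signatures are hypothetical.

```python
SIMILARITY = {  # hypothetical verb-property relatedness scores (stand-in for ESA)
    "married": {"wife": 0.82, "residence": 0.10, "birthPlace": 0.05},
    "resides": {"wife": 0.07, "residence": 0.77, "birthPlace": 0.31},
}
SIGNATURE = {   # hypothetical (domain, range) of each property
    "wife": ("Person", "Person"),
    "residence": ("Person", "Place"),
    "birthPlace": ("Person", "Place"),
}


def detect_answer_type(main_verb, focus_role, similarity=SIMILARITY, signature=SIGNATURE):
    """Return (property, answer type) for a focus that is the verb's subject or object."""
    scores = similarity[main_verb]
    prop = max(scores, key=scores.get)       # most similar property
    domain, range_ = signature[prop]
    answer_type = domain if focus_role == "subject" else range_
    return prop, answer_type


print(detect_answer_type("married", "subject"))   # Example 1
print(detect_answer_type("resides", "subject"))   # Example 2
```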

  • 110

    Open Domain Semantic Answer Type

    – Extract answer candidates using the open-domain semantic answer type detector

    [Figure: user question → Open Domain Semantic Answer Type Detector (DBpedia ontology, DBpedia SPARQL) → answer type; keywords of the user question → passages → extract entities & recognize the type of entities → answer type check → answer candidates]

    Hoffart, Johannes, et al. "Robust disambiguation of named entities in text." Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011.

  • 111

    Answer Candidate Selection

    • Sentence scoring in passages
      – Kidman has been married twice: first to actor Tom Cruise, and now to country singer Keith Urban. She has an adopted son and daughter with Cruise as well as two biological daughters. …. Cruise Kidman is the adopted son of actors Tom Cruise and Nicole Kidman. …..An example of the extravagance of a wedding locations is when Tom Cruise married Katie Holmes……/ …He was replaced as team leader by [[Ethan Hunt]] ([[Tom Cruise]]) after he was revealed: Impossible (film)……………………
    • Measuring sentence importance
      – Measuring the similarity between the text and the question
      – Content selection: choose the sentences to extract from the document
      – Query-focused multi-document summarization

  • 112

    Answer Candidate Selection

    • Select important sentences
      – We consider various similarity measures
        • Term similarity (Jaccard coefficient)
      – We can use not only the answer type but also other information
        • Compare not only answer types and entity similarity but also the semantic and syntactic structure of sentences
        • Syntactic level (dependency parser), semantic level (semantic role labeler)
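    The term-similarity measure can be sketched as the Jaccard coefficient |A ∩ B| / |A ∪ B| between the token sets of the question and a sentence, used to rank passage sentences. The tokenizer is a deliberately simple assumption (lowercased alphanumeric runs, no stemming or stop-word removal).

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over the token sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


question = "Who has been married to Tom Cruise?"
sentences = [
    "An example of the extravagance of a wedding locations is when Tom Cruise married Katie Holmes",
    "He was replaced as team leader by Ethan Hunt",
    "Kidman has been married twice: first to actor Tom Cruise, and now to country singer Keith Urban.",
]
ranked = sorted(sentences, key=lambda s: jaccard(question, s), reverse=True)
for s in ranked:
    print(f"{jaccard(question, s):.3f}  {s}")
```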

  • 113

    Textual Entailment Answer Selection

    [Figure: check the type of the entities in the passages (same as the answer type or not) and select important sentences → answer candidates; substitute the focus with each answer candidate; score the answer candidates using textual entailment with T: passages and H: the substituted sentence → answer (together with the KBQA answer)]

    Patent pending
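    A sketch of the selection loop: substitute the focus with each candidate to form the hypothesis H, score it against the passage T, and keep the best candidate. The scorer is only a lexical-overlap baseline standing in for a trained textual entailment classifier; the question, passage and candidates are illustrative.

```python
import re


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def make_hypothesis(question, focus, candidate):
    """Replace the focus word (e.g. "Who") with an answer candidate."""
    return question.replace(focus, candidate, 1)


def overlap_entailment_score(text, hypothesis):
    """Fraction of hypothesis tokens found in the text (lexical baseline, not a real TE model)."""
    t = set(tokens(text))
    h = tokens(hypothesis)
    return sum(1 for w in h if w in t) / len(h) if h else 0.0


def select_answer(question, focus, candidates, passage):
    return max(candidates,
               key=lambda c: overlap_entailment_score(passage, make_hypothesis(question, focus, c)))


question = "Who married Katie Holmes?"
passage = "Tom Cruise married Katie Holmes in Italy."
print(select_answer(question, "Who", ["Nicole Kidman", "Tom Cruise"], passage))
```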

  • 114

    Textual Entailment Answer Selection

    • Textual entailment
      – Text (t): entailing text (candidate sentence)
      – Hypothesis (h): entailed text (question)
      – Ex)
        • True entailment
          t: John is a fluent French speaker
          h: John speaks French
        • False entailment
          t: John was born in France
          h: John speaks French
      – Given a t/h pair, cast the textual entailment task as a classification problem

  • 115

    Textual Entailment Answer Selection

    • Textual entailment: other textual entailment architecture
      – Collection of NLP components on the Apache UIMA framework
      – Standardized algorithms and knowledge resources (lexical and syntactic resources)
      – Different approaches: transformation based, edit distance based, and classification based

    Magnini, Bernardo, et al. "The Excitement Open Platform for Textual Inferences." ACL demo

  • Keyword QA

    Goal: take keywords as input and return a report as the answer

    • Keyword input: messi team manager
    • Extracted data, searched from the database:
      Lionel_Messi play at FC_Barcelona, Argentina_national_football_team.
      Tito_Vilanova is manager of FC_Barcelona.

    116

  • Keyword QA

    [Figure: Keyword QA pipeline with an example. Keyword input "Messi team manager" → Keyword Process (Keyword Segmentator: Messi, team, manager; Keyword to Entity against the DB entities; Keyword to Property against the DB properties, e.g. property candidates of Lionel_Messi: Person/height, birthDate, birthPlace, careerStation, Number, Position, Team, …) → "Lionel_Messi team" → Query Generator (SPARQL query: SELECT ?p ?o WHERE…) over the Knowledge DB → Triple Extractor (triple set: Lionel_Messi team FC_Barcelona; FC_Barcelona manager Tito_Vilanova) → Natural Language Generator with Template DB → Report: "FC_Barcelona is Lionel_Messi's team. FC_Barcelona's manager is Tito_Vilanova."]

    117

    http://...lionel_messi/

  • Keyword QA

    118

    Keyword segmentation: segmentation using a lexicon

    • Wikipedia lexicon + additional lexicon
    • Longest match
      Input: Lionel messi team manager
      Lexicon entries: Lionel messi, messi, team, manager, birthday, …
      Output: Lionel messi, team, manager
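    A sketch of greedy longest-match segmentation: at each position, take the longest word n-gram found in the lexicon, falling back to a single word. The lexicon and the maximum n-gram length are illustrative assumptions; unmatched single words correspond to the out-of-vocabulary case.

```python
def longest_match_segment(text, lexicon, max_len=5):
    """Greedy left-to-right segmentation preferring the longest lexicon entry."""
    words = text.lower().split()
    segments, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lexicon or n == 1:   # single words are kept even if OOV
                segments.append(candidate)
                i += n
                break
    return segments


lexicon = {"lionel messi", "messi", "team", "manager", "birthday"}
print(longest_match_segment("Lionel messi team manager", lexicon))
```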

  • Keyword QA

    Keyword-Entity/Property Matching Module

    • Match user input keywords to DB entities/properties
      – messi → Lionel_Messi
      – Kim yuna → Kim_Yu_Na
      – manager → manager (property)
      – birth → birthDate
    • AIDA (open source): named entity disambiguation module; matches to Wikipedia entities
    • ESA (open source): word semantic similarity

    [Figure: Keyword-Entity Matching Module: "Messi team" → "Lionel_Messi team"]

    119

  • Keyword to Entity: AIDA module (open-source)

    Accurate Online Disambiguation of Named Entities
    • Finds named entities and matches them to Wikipedia pages

    Entity matching
    Input: "When did Barack Obama graduate from Harvard Law School?"

    120

  • Keyword to Property: semantic match between keyword & property

    Explicit Semantic Analysis module (open source)
    Property matching

    Tom cruise's triple example (subject: Tom cruise)
      birthDate    1962-07-03
      birthPlace   Syracuse, New York, United States
      religion     Scientology
      spouse       Mimi_Rogers
      starring     Interview_with_the_vampire
      starring     Top_Gun
      …            …

    Keyword example: Tom cruise, birthday

    121

  • Keyword to Property: semantic match between keyword & property

    Explicit Semantic Analysis module
    Property matching

    Tom cruise's triple example (subject: Tom cruise)
      birthDate    1962-07-03
      birthPlace   Syracuse, New York, United States
      religion     Scientology
      spouse       Mimi_Rogers
      starring     Interview_with_the_vampire
      starring     Top_Gun
      …            …

    Keyword example: Tom cruise, wife

    122

  • Keyword QA

    Query Generator

    • SPARQL query generation: extract related triples
    • Rule-based

    [Figure: Query Generator: "Lionel_Messi team" → SELECT , ?p, ?o WHERE { …]

    123

  • Query Generating Policy

    [Figure: graph of entities and properties reachable from Lionel_Messi: properties Person/height (169.2), team, birthPlace, …; team values Argentina_national_football_team, FC_Barcelona, FC_Barcelona_B, …; birthPlace values Argentina, Rosario,_Santa_Fe, Santa_Fe_Province; further properties such as coach, stadium, manager, capacity, chairman, areaTotal, capital, currency. Input keywords: messi, fc barcelona, manager]

    124

  • Query Generating Policy

    [Figure: the same entity-property graph from Lionel_Messi (Person/height, team, birthPlace, …; team values Argentina_national_football_team, FC_Barcelona, FC_Barcelona_B, …; birthPlace values Argentina, Rosario,_Santa_Fe, Santa_Fe_Province; further properties such as coach, stadium, manager, capacity, chairman, areaTotal, capital, currency). Input keywords: messi, team, manager]

    125
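    For a keyword chain such as messi → team → manager, the generated query follows one property per hop, with each hop's object becoming the next hop's subject. A rule-based sketch producing such a query for DBpedia; the property names dbo:team and dbo:manager are assumptions about the KB vocabulary, not taken from the generator described here.

```python
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
)


def chain_query(entity, properties):
    """SELECT ?x1 ... ?xn for the path entity -p1-> ?x1 -p2-> ?x2 ..."""
    patterns, variables = [], []
    subject = f"dbr:{entity}"
    for i, prop in enumerate(properties, start=1):
        var = f"?x{i}"
        patterns.append(f"  {subject} dbo:{prop} {var} .")
        variables.append(var)
        subject = var  # the next hop starts from this hop's object
    return (PREFIXES + "SELECT " + " ".join(variables) + " WHERE {\n"
            + "\n".join(patterns) + "\n}")


query = chain_query("Lionel_Messi", ["team", "manager"])
print(query)
```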

  • Keyword QA

    Report Generator

    • Input: report triple set and templates (property-template matching data)
    • 534 templates generated

    [Figure: Report Generator example. Extracted triple set: Barack_Obama almaMater Columbia_University; Barack_Obama almaMater Harvard_Law_School; … Template set: almaMater → "graduated"; birthPlace → "born in"; … Output report: "Barack Obama graduated Columbia University, and Harvard Law School. …"]

    126
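    A sketch of the template-filling step: triples are grouped by subject and property, multiple objects are joined into one list, and the property's template is filled. The two templates and the humanize rule (underscores to spaces) are hypothetical; the system described here uses 534 automatically extracted templates.

```python
TEMPLATES = {  # hypothetical property -> sentence templates
    "almaMater": "{s} graduated from {o}",
    "birthPlace": "{s} was born in {o}",
}


def humanize(resource):
    return resource.replace("_", " ")


def generate_report(triples, templates=TEMPLATES):
    grouped = {}  # (subject, property) -> objects, in insertion order
    for s, p, o in triples:
        grouped.setdefault((s, p), []).append(humanize(o))
    sentences = []
    for (s, p), objs in grouped.items():
        if p not in templates:
            continue  # no template for this property: skip it
        obj_text = objs[0] if len(objs) == 1 else ", ".join(objs[:-1]) + " and " + objs[-1]
        sentences.append(templates[p].format(s=humanize(s), o=obj_text) + ".")
    return " ".join(sentences)


triples = [("Barack_Obama", "almaMater", "Columbia_University"),
           ("Barack_Obama", "almaMater", "Harvard_Law_School"),
           ("Barack_Obama", "birthPlace", "Honolulu")]
print(generate_report(triples))
```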

  • Keyword QA

    NLG Template Generator

    • Automatic template extraction from Wikipedia-DBpedia

    [Figure: Template Generator maps properties to templates, e.g. team → "play at", position → "plays as a", …]

    127

  • Keyword QA

    • Keyword segmentation (81.64%)
      – Whole data: 670 keyword queries
      – Well-segmented: 547 queries
      – Errors: 123 queries, caused by words out of the segmentation lexicon's vocabulary
    • System answer accuracy (95.1%)
      – Whole data: 670 keyword queries
      – Right answers: 637
      – Wrong answers: 33
      – Error cases: property/entity matching errors

    Sangdo Han, Hyosup Shim, Byungsoo Kim, Seonyeong Park, Seonghan Ryu, Gary Geunbae Lee. Keyword question answering system with report generation for linked data. Proceedings of the 2015 International Conference on Big Data and Smart Computing (BigComp 2015), Jeju, Feb 2015 (short paper)

    128

  • Demo movie: YouTube postech isoft QA

    http://www.youtube.com/watch?v=P6yL5QiJQo0
    KBQA, IRQA, Keyword QA, OpenIE

  • Multi-party Open Proactive Dialog Systems

  • 131

    • Overall Architecture

    Module: Description
    • Situation Feature Extraction: extracts features from the situation (voice activity detection, speaker detection, previous-information stacking, etc.)
    • Dialog Engagement: classifies dialog engagement for each speaker ID
    • Speaker Identification: classifies speakers and assigns new speaker IDs
    • Always Listening ASR: recognizes all speech sounds
    • Sentence Formation Regularity Checking: checks sentence formation regularity as a dialog situation feature
    • Speaker ID Assignment to Sentence: assigns a speaker ID to every sentence from ASR
    • ASR Error Correction: corrects ASR errors before passing the sentence to the next step
    • Multiparty Language Understanding: language understanding for multiparty dialog
    • Multiparty Dialog System: dialog system for multiparty dialog

    [Figure: module diagram. Inputs: sound signal and dialog situation (vision). Modules: Always Listening ASR (voice activity detection), Situation Feature Extraction, Speaker Identification, Dialog Engagement (CNN), Speaker ID Assignment to Sentence, Sentence Formation Regularity Checking (RNN), ASR Error Correction (RNN + several methods), Natural Language Understanding & Dialog Management for Multiparty]

  • 132

    • Dialog Engagement
      – Dialog engagement with the PC
      – Classify dialog engagement for each speaker ID
    • Scenario
      A: Let's have a dinner outside, in some fancy restaurant!
      B: Great! Where should we go to?
      C: I like FANCYFANCY restaurant.
      A: Yeah, FF restaurant is good.
      B: Then let's go there to have dinner.
      A: But we brought our car in for servicing this morning. How can we go there?
      B: Maybe we should take a taxi and go to the repair shop. Where was it?
      PosChat: NiceCar repair shop is where you brought your car in.
      B: Okay. Then we go to the shop and go to there for dinner. Make a reservation at 7 p.m., PosChat.
      PosChat: Okay. I'll make a reservation to FF restaurant for three people at 7 p.m.

    [Figure: utterances labeled Engage or Non-engage with respect to the system]

    Bohus, Dan, and Eric Horvitz. "Models for multiparty engagement in open-world dialog." Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, 2009.

  • 133

    • Dialog Engagement Architecture

    [Figure: camera features (face direction, lips tracking), voice activity detection and the always-listening ASR recognition result, together with the previous engagement state and the previous dialog management state, feed the engagement classifier (dialog engagement model learned by a DNN), which labels each speaker Engage or Non-engage for dialog management]

    Junhwi Choi, Jeesoo Bang, Gary Geunbae Lee. “Multiparty open-world dialog system on NAO robot”. Proceedings of SLT 2014, Dec 2014, Nevada (demo presentation)

  • 134

    ASR Error Correction

    • Automatic ASR error correction: a two-step process
      – ASR error detection
        • Part-of-speech information based detection
        • Context based detection
      – ASR error correction
        • Word sequence matching based correction
        • Recurrent neural network based correction

    [Figure: RNN correction model. Inputs: current syllable, previous syllable context, confused phoneme of the next syllable; output: probability of the next syllable]

    ASR error correction performance
      Method                                           1-WER
      Baseline (no correction)                         0.8357
      Word sequence pattern matching                   0.8813 (27.8% error reduction)
      Syllable RNN only                                0.8382 (1.5% error reduction)
      Combined RNN (syllable RNN + phoneme RNN)        0.8480 (7.5% error reduction)
      Word sequence pattern matching + combined RNN    0.8820 (28.1% error reduction)
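    The error-reduction figures are relative reductions in WER, where WER = 1 − (1-WER). A small helper reproduces them from the table, e.g. 0.8357 → 0.8813 gives (0.1643 − 0.1187) / 0.1643 ≈ 27.8%:

```python
def relative_error_reduction(baseline, system):
    """Relative WER reduction, given 1-WER scores for the baseline and the system."""
    baseline_wer = 1.0 - baseline
    system_wer = 1.0 - system
    return (baseline_wer - system_wer) / baseline_wer


results = {
    "Word sequence pattern matching": 0.8813,
    "Syllable RNN only": 0.8382,
    "Combined RNN": 0.8480,
}
for name, score in results.items():
    print(f"{name}: {100 * relative_error_reduction(0.8357, score):.1f}% error reduction")
```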

    ASR error detection performance
      Method                     F-score   Detection accuracy
      POS label pattern          0.4744    0.8266
      Word dictionary by POS     0.3452    0.8653
      Word co-occurrence         0.4143    0.7587
      Voting (threshold 2)       0.4967    0.8761
      Voting (threshold 1)       0.4879    0.7337
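    One plausible reading of the voting rows is that a word is flagged as an ASR error when at least `threshold` of the three detectors flag it; this interpretation is an assumption, consistent with threshold 1 flagging more words and reaching lower detection accuracy. A sketch with hypothetical detector outputs:

```python
def vote(flags_per_detector, threshold):
    """Flag a word when at least `threshold` detectors flag it.

    flags_per_detector: one list of per-word booleans for each detector.
    """
    return [sum(flags) >= threshold for flags in zip(*flags_per_detector)]


pos_pattern = [True, False, False, True]     # hypothetical outputs for a 4-word sentence
pos_dictionary = [True, False, True, False]
co_occurrence = [False, False, True, False]
detectors = [pos_pattern, pos_dictionary, co_occurrence]

print(vote(detectors, threshold=2))
print(vote(detectors, threshold=1))
```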

    Junhwi Choi, Donghyeon Lee, Seonghan Ryu, Kyusong Lee, Gary Geunbae Lee. "Engine-independent ASR error management for dialog systems", IWSDS 2014.
    Junhwi Choi, Seonghan Ryu, Kyusong Lee, Younghee Kim, Jeesoo Bang, Seonyeong Park, and Gary Geunbae Lee. "ASR Independent Hybrid Recurrent Neural Network based Error Correction for Dialog System Applications", Proceedings of the MA3HMI 2014 Workshop, Satellite workshop of INTERSPEECH 2014.

  • 135

    • Manual ASR Error Correction
      • Voice-only ASR error correction interface

    • Scenario
      A: Let’s have dinner out at some fancy restaurant!
      B: Great! Where should we go?
      C: I like FANCYFANCY restaurant.
      A: Yeah, FF restaurant is good.
      B: Then let’s go there for dinner. Okay. Then we go to the shop and go there for dinner. Make a reservation at 7 p.m., PosChat.
      PosChat: Okay. I’ll make a reservation at Effa restaurant for three people at 7 p.m.
      A: FF restaurant.
      PosChat: Okay. I’ll make a reservation at FF restaurant for three people at 7 p.m.

    One-step Error Correction

    User Utterance → Analysis Region Detection → Understand User Intention
      – Correction: Proceed Correction → Confirmation (optional)
      – Not a correction: Proceed Dialog Management

    Junhwi Choi, et al. "Seamless error correction interface for voice word processor." Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012.

    Junhwi Choi, Seonghan Ryu, Kyusong Lee, Younghee Kim, Jeesoo Bang, Seonyeong Park, and Gary Geunbae Lee. “ASR Independent Hybrid Recurrent Neural Network based Error Correction for Dialog System Applications”, Proceedings of the MA3HMI 2014 Workshop, Satellite workshop of INTERSPEECH 2014.

  • 136

    • User Intention Understanding
      – Characteristics of clear speech (prosodic: pitch, duration, intensity)
      – Characteristics of ASR errors (pronunciation similarity)
      – Accuracy of user intention understanding: 84.62%

    One-step Error Correction

    Junhwi Choi, Seonghan Ryu, Younghee Kim, Gary Geunbae Lee. “One-step error detection and correction approach for voice word processor”, (In preparation)
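A minimal sketch of the one-step idea follows. It uses plain character similarity (Python's `difflib.SequenceMatcher`) as a stand-in for pronunciation similarity and ignores the prosodic clear-speech cues. The new utterance is slid over the previous recognition result (analysis region detection). A region that is similar but not identical is treated as a correction; otherwise the turn goes to dialog management. The 0.5 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    # Character-level ratio as a crude proxy for pronunciation similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def one_step_correct(previous: str, utterance: str, min_sim: float = 0.5) -> Optional[str]:
    """Return the corrected previous hypothesis if `utterance` re-speaks part of it,
    or None if it should be handled as a new dialog turn."""
    prev_words, new_words = previous.split(), utterance.split()
    n = len(new_words)
    best_start, best_sim = None, 0.0
    # Analysis region detection: slide an n-word window over the previous hypothesis.
    for start in range(len(prev_words) - n + 1):
        s = similarity(" ".join(prev_words[start:start + n]), utterance)
        if s > best_sim:
            best_start, best_sim = start, s
    # Intention decision: similar but not identical -> treat as a correction.
    if best_start is None or best_sim < min_sim or best_sim == 1.0:
        return None
    return " ".join(prev_words[:best_start] + new_words + prev_words[best_start + n:])


hyp = "I'll make a reservation at Effa restaurant for three people at 7 p.m."
print(one_step_correct(hyp, "FF restaurant"))
# I'll make a reservation at FF restaurant for three people at 7 p.m.
```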

  • 137

    • Scenario

    Long-term Memory Chatting System

    User: Hi, I’m John.
    System: Nice to meet you.
    User: I live in Pohang.
    System: I’ll remember that.
    User: Do you know where I live?
    System: You are living in Pohang.
    User: I’m not good at foreign languages.
    System: Then, can I help you?

    Long-term Memory: (I, be, John), (I, live in, Pohang), (…, …, …)

    Example Database
    User Utterance                        System Response
    Hi, I’m Jane.                         Nice to meet you.
    Do you know where I live?             You are living in Seoul.
    Do you know where I live?             I don’t know about that.
    I’m not good at foreign languages.    But I heard that you can speak Chinese.
    I’m not good at foreign languages.    Then, can I help you?

    Jeesoo Bang, Hyungjong Noh, Yonghee Kim, and Gary Geunbae Lee. Example-based Chat-oriented Dialogue System with Personalized Long-term Memory. Proceedings of the 2nd International Conference on Big Data and Smart Computing (BigComp 2015), 2015.

  • 138

    • Architecture of the Chatting System


  • 139

    1. Extract user-related facts (triples) from user inputs and store them in the long-term memory

    2. Modify system responses by applying user-related facts

    3. Select the most appropriate response

    Long-term Memory Chatting System

    User: I live in Pohang. → System: I’ll remember that.
    Long-term Memory: (I, be, John), (I, live in, Pohang)

    Example database entry: "Do you know where I live?" → "You are living in Seoul."
    After applying the user-related fact: "Do you know where I live?" → "You are living in Pohang."

  • 140

    1. Knowledge Extractor
      • Extract user-related facts from user inputs and store them in the long-term memory
      • RDF-style triple: trp = (arg1, rel, arg2)
        – arg: noun phrase
        – rel: textual fragment indicating the semantic relation between the two args
        – E.g. "I like red apples." → (I, like, red apple)

    Long-term Memory Chatting System

  • 141

    1. Knowledge Extractor
      • Long-term Memory (LTM)
        – Two types of triple patterns are defined:
          • Triple pattern with an SBJ slot (e.g. (SBJ, be, my friend))
          • Triple pattern with an OBJ slot (e.g. (I, like, OBJ))
        – Matched triples are stored in the long-term memory

    Long-term Memory Chatting System

    Example: user input "Harry is my friend." → Knowledge Extractor → (Harry, be, my friend)
    Triple patterns: (SBJ, be, my friend), (I, like, OBJ), (My name, be, OBJ), (I, can speak, OBJ), …
    Matched triple (Harry, be, my friend) → Personal Knowledge Manager → Long-term Memory
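Matching extracted triples against the triple patterns can be sketched as follows. The sketch assumes that triples have already been extracted with the relation lemmatized (the extractor itself is not shown) and treats SBJ/OBJ as wildcard slots:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
SBJ, OBJ = "SBJ", "OBJ"  # slot markers, as on the slide

PATTERNS: List[Triple] = [(SBJ, "be", "my friend"), ("I", "like", OBJ),
                          ("My name", "be", OBJ), ("I", "can speak", OBJ)]


def matches(triple: Triple, pattern: Triple) -> bool:
    """A slot matches anything; other elements must agree (case-insensitively)."""
    return all(p in (SBJ, OBJ) or p.lower() == t.lower()
               for t, p in zip(triple, pattern))


def store_user_facts(triples: List[Triple], memory: List[Triple]) -> List[Triple]:
    # Triples are assumed to come from an open information extractor with the
    # relation already lemmatized ("is" -> "be").
    for triple in triples:
        if any(matches(triple, p) for p in PATTERNS) and triple not in memory:
            memory.append(triple)
    return memory


ltm: List[Triple] = []
store_user_facts([("Harry", "be", "my friend"), ("The weather", "be", "nice")], ltm)
print(ltm)  # [('Harry', 'be', 'my friend')]
```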

  • 142

    2. Personal Knowledge Applier
      • Apply user-related facts to system response candidates
        – For a triple trp_ss extracted from a response candidate ss, replace arg2 (arg1) of trp_ss with arg2 (arg1) of a user-related triple when the two triples are similar enough except for arg2 (arg1)

    Long-term Memory Chatting System

    Example:
      Triple extracted from the system response candidate: (my name, is, Chuck)   [subject, predicate, object]
      Triple in long-term memory: (My name, is, Bruce)
      Response: "Oh, your name is Chuck." → "Oh, your name is Bruce."
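A minimal sketch of the applier follows. The "similar enough" test is assumed to be character-level similarity (`difflib.SequenceMatcher`) with a 0.8 threshold, since the slide does not give the actual measure. It reproduces the Chuck → Bruce example:

```python
from difflib import SequenceMatcher
from typing import Tuple

Triple = Tuple[str, str, str]


def text_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def apply_fact(response: str, resp_triple: Triple, user_triple: Triple,
               min_sim: float = 0.8) -> str:
    """Swap arg2 (or arg1) of a response triple for the user's value when the
    relation and the other argument are similar enough."""
    a1_r, rel_r, a2_r = resp_triple
    a1_u, rel_u, a2_u = user_triple
    if text_sim(rel_r, rel_u) < min_sim:
        return response                        # different relation: leave as is
    if text_sim(a1_r, a1_u) >= min_sim and a2_r.lower() != a2_u.lower():
        return response.replace(a2_r, a2_u)    # OBJ differs: replace arg2
    if text_sim(a2_r, a2_u) >= min_sim and a1_r.lower() != a1_u.lower():
        return response.replace(a1_r, a1_u)    # SBJ differs: replace arg1
    return response


print(apply_fact("Oh, your name is Chuck.",
                 ("my name", "is", "Chuck"), ("My name", "is", "Bruce")))
# Oh, your name is Bruce.
```

Note that `str.replace` swaps every occurrence of the argument string; a fuller version would replace only the span the triple was extracted from.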

  • 143

    3. General Score
      • Put weight on system responses that are general
      • Assumption: a general response has many similar responses in the example database
        – E: the example database
        – e = (su, ss): an example; su is a user utterance, ss is a system response
        – sim(s1, s2): weighted Dice similarity between two sentences s1 and s2
        – For any sentence s = {w1, w2, …, wn}, where each w is a word:
          $$\|s\| = \sum_{w \in s} \mathrm{userIDF}(w)$$
        – userIDF(w) = log(|E| / cnt(w)), an approximation of IDF for short sentences
        – cnt(w): the frequency (number of occurrences) of the word w in E

    Long-term Memory Chatting System
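A sketch of the general-score quantities follows. It rests on stated assumptions that the slide does not confirm. The weighted Dice similarity is taken as 2‖s1 ∩ s2‖ / (‖s1‖ + ‖s2‖), with ‖·‖ the userIDF-weighted norm defined above. Negative userIDF values (words occurring more than |E| times) are clamped to 0. The general score of a response is taken to be its mean similarity to the other system responses in E.

```python
import math
import re
from collections import Counter
from typing import List, Set, Tuple


def words(sentence: str) -> Set[str]:
    return set(re.findall(r"[a-z']+", sentence.lower()))


class GeneralScorer:
    def __init__(self, examples: List[Tuple[str, str]]):
        self.examples = examples                # E: list of (su, ss) pairs
        self.cnt = Counter()                    # cnt(w): occurrences of w in E
        for su, ss in examples:
            self.cnt.update(re.findall(r"[a-z']+", f"{su} {ss}".lower()))

    def user_idf(self, w: str) -> float:
        # userIDF(w) = log(|E| / cnt(w)); unseen words use cnt = 1 and negative
        # values are clamped to 0 (both are assumptions of this sketch).
        return max(0.0, math.log(len(self.examples) / max(self.cnt[w], 1)))

    def norm(self, ws: Set[str]) -> float:
        return sum(self.user_idf(w) for w in ws)   # ||s|| = sum of userIDF(w)

    def sim(self, s1: str, s2: str) -> float:
        # Weighted Dice: 2 * ||s1 & s2|| / (||s1|| + ||s2||)  (assumed form)
        w1, w2 = words(s1), words(s2)
        denom = self.norm(w1) + self.norm(w2)
        return 2 * self.norm(w1 & w2) / denom if denom > 0 else 0.0

    def general_score(self, response: str) -> float:
        # Mean similarity to the other system responses in E (assumed aggregation).
        others = [ss for _, ss in self.examples if ss != response]
        if not others:
            return 0.0
        return sum(self.sim(response, ss) for ss in others) / len(others)


examples = [("Hi, I'm Jane.", "Nice to meet you."),
            ("Do you know where I live?", "You are living in Seoul."),
            ("Do you know where I live?", "I don't know about that."),
            ("I'm not good at foreign languages.", "But I heard that you can speak Chinese."),
            ("I'm not good at foreign languages.", "Then, can I help you?")]
scorer = GeneralScorer(examples)
for _, ss in examples:
    print(f"{scorer.general_score(ss):.3f}  {ss}")
```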

  • 144

    • Anaphor: a word or phrase that refers back to an earlier word or phrase
      – E.g. "My mother said she was leaving." (she → my mother)

    • Scenario
      A: My best friend is Seonghan. His favorite fruit is strawberry.
      B: I like strawberry, too.
      A: Oh, I didn’t know that.
      B: My best friend is Sangdo. He likes computer games. His favorite game is FIFA online.
      A: Today is Seonghan’s birthday. Could you recommend a present for Seonghan?
      PosChat: You can give him what he likes. You said Seonghan’s favorite fruit is strawberry.
      A: That’s a good idea.
      B: Hmm… I am bored. Do you have any recommendations?
      PosChat: You can play computer games with your best friend. Sangdo’s favorite game is FIFA online.

    Multi-party Chatting System

  • 145

    • Discourse stack
      – Stores the contents of multiparty dialog texts in a structured format for anaphora resolution

    Multi-party Chatting System

    A stack (top to bottom; sentence information):
      That’s a good idea. (DA: statement)
      Could you recommend a present for Seonghan? (DA: yn_q)
      Today is Seonghan’s birthday. (DA: statement)
      Oh, I didn’t know that. (DA: statement)
      His favorite fruit is strawberry. (DA: statement)
      My best friend is Seonghan. (DA: statement; Person: Seonghan)

    B stack (top to bottom; sentence information):
      Do you have any recommendations? (DA: yn_q)
      Hmm… I am bored. (DA: statement)
      His favorite game is FIFA online. (DA: statement)
      He likes computer games. (DA: statement)
      My best friend is Sangdo. (DA: statement; Person: Sangdo)
      I like strawberry, too. (DA: statement)

    PosChat responses:
      You can give him what he likes. You said Seonghan’s favorite fruit is strawberry.
      You can play computer games with your best friend. Sangdo’s favorite game is FIFA online.

    Long-term memory: A LTM [My best friend, be, Seonghan]; B LTM [I, like, strawberry]

    Junhwi Choi, Jeesoo Bang, Gary Geunbae Lee. “Multiparty open-world dialog system on NAO robot”. Proceedings of SLT 2014, Dec 2014, Nevada (demo presentation)
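A minimal sketch of per-speaker discourse stacks for pronoun resolution follows. It assumes person names are already tagged (the `person` field) and resolves a pronoun to the most recent person pushed by the same speaker. There is no gender or number agreement check, and "her" is always treated as possessive.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    sentence: str
    dialog_act: str                  # e.g. "statement", "yn_q"
    person: Optional[str] = None     # person named in the sentence, if any


@dataclass
class DiscourseStacks:
    stacks: Dict[str, List[Entry]] = field(default_factory=dict)  # one stack per speaker

    def push(self, speaker: str, entry: Entry) -> None:
        self.stacks.setdefault(speaker, []).append(entry)

    def latest_person(self, speaker: str) -> Optional[str]:
        # Search from the top of this speaker's stack down to the first person mention.
        for entry in reversed(self.stacks.get(speaker, [])):
            if entry.person:
                return entry.person
        return None

    def resolve(self, speaker: str, sentence: str) -> str:
        """Replace third-person pronouns with the most recent person (no gender check)."""
        person = self.latest_person(speaker)
        if person is None:
            return sentence
        out = []
        for token in sentence.split():
            low = token.lower()
            if low in ("his", "her"):        # treated as possessive (simplification)
                out.append(person + "'s")
            elif low in ("he", "him", "she"):
                out.append(person)
            else:
                out.append(token)
        return " ".join(out)


ds = DiscourseStacks()
ds.push("A", Entry("My best friend is Seonghan.", "statement", "Seonghan"))
ds.push("B", Entry("I like strawberry, too.", "statement"))
ds.push("B", Entry("My best friend is Sangdo.", "statement", "Sangdo"))
print(ds.resolve("A", "His favorite fruit is strawberry."))
# Seonghan's favorite fruit is strawberry.
print(ds.resolve("B", "He likes computer games."))
# Sangdo likes computer games.
```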

  • 146

    Distributed Word Representation Matching

    • Distributed word representation: an n-dimensional vector per word
      – Can capture distributional syntactic and semantic information
    • Recursive Autoencoder (RAE): combines word representations into vector representations of longer phrases
      – E.g. "The cats catch mice"
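The RAE composition step can be sketched as p = tanh(W_e[c1; c2] + b_e), with a decoder that reconstructs [c1; c2] from p. The sketch below assumes a given binary parse tree, toy 4-dimensional random embeddings and untrained weights. Training (minimising reconstruction error) and the normalisation of p used by Socher et al. are omitted.

```python
import math
import random
from typing import Dict, List, Tuple, Union

Vector = List[float]
Tree = Union[str, Tuple["Tree", "Tree"]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class RecursiveAutoencoder:
    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.W_e = rand_matrix(dim, 2 * dim, rng)   # encoder: [c1; c2] -> p
        self.b_e = [0.0] * dim
        self.W_d = rand_matrix(2 * dim, dim, rng)   # decoder: p -> [c1'; c2']
        self.b_d = [0.0] * (2 * dim)

    def encode(self, c1: Vector, c2: Vector) -> Vector:
        x = c1 + c2                                  # list concatenation = [c1; c2]
        return [math.tanh(z + b) for z, b in zip(matvec(self.W_e, x), self.b_e)]

    def reconstruction_error(self, c1: Vector, c2: Vector) -> float:
        # Training would minimise this over all internal nodes (not shown).
        recon = [z + b for z, b in zip(matvec(self.W_d, self.encode(c1, c2)), self.b_d)]
        return sum((a - r) ** 2 for a, r in zip(c1 + c2, recon))

    def node_vectors(self, tree: Tree, emb: Dict[str, Vector]) -> List[Vector]:
        """Vectors for every node in post-order; the last one is the root (whole phrase)."""
        if isinstance(tree, str):
            return [emb[tree]]
        left = self.node_vectors(tree[0], emb)
        right = self.node_vectors(tree[1], emb)
        return left + right + [self.encode(left[-1], right[-1])]


DIM = 4  # toy embedding size
rng = random.Random(1)
emb = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in ["The", "cats", "catch", "mice"]}
rae = RecursiveAutoencoder(DIM)
nodes = rae.node_vectors((("The", "cats"), ("catch", "mice")), emb)
print(len(nodes))  # 7: four words plus three phrases
```

Keeping every node vector, not only the root, is what the paraphrase step needs: it compares all nodes of one sentence with all nodes of the other.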

  • 147

    Distributed Word Representation Matching

    Paraphrase identification using distributed word representation

    Figure: RAE trees for "The cats catch mice" (nodes 1–7) and "Cats eat mice" (nodes 1–5); the variable-sized similarity matrix between all node pairs is reduced by dynamic pooling to a fixed-sized matrix, which a softmax classifier labels as paraphrase or not.

    Socher, Richard, et al. "Dynamic pooling and unfolding recursive autoencoders for paraphrase detection." Advances in Neural Information Processing Systems. 2011.
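Dynamic pooling turns the variable-sized node-by-node matrix into a fixed n_p × n_p matrix. It splits the rows and the columns into roughly equal groups and takes the minimum of each region, which is the variant Socher et al. describe for Euclidean distance matrices. A minimal sketch follows. The stand-in 7 × 5 matrix (matching the 7 and 5 nodes of the example sentences) holds arbitrary numbers, and the softmax classifier is not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def groups(n: int, n_p: int) -> List[List[int]]:
    """Split indices 0..n-1 into n_p contiguous, roughly equal groups.
    If n < n_p, indices are repeated so that every group is non-empty."""
    if n < n_p:
        return [[k * n // n_p] for k in range(n_p)]
    base, extra = divmod(n, n_p)
    out, start = [], 0
    for k in range(n_p):
        size = base + (1 if k < extra else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out


def distance_matrix(nodes1: Sequence[Sequence[float]],
                    nodes2: Sequence[Sequence[float]]) -> Matrix:
    # Euclidean distance between every node of sentence 1 and every node of sentence 2.
    return [[math.dist(a, b) for b in nodes2] for a in nodes1]


def dynamic_min_pool(S: Matrix, n_p: int) -> Matrix:
    # Min-pool each region, so a small (close) distance anywhere in it survives.
    rows, cols = groups(len(S), n_p), groups(len(S[0]), n_p)
    return [[min(S[i][j] for i in rg for j in cg) for cg in cols] for rg in rows]


S = [[float(i * 5 + j) for j in range(5)] for i in range(7)]  # stand-in 7 x 5 matrix
print(dynamic_min_pool(S, 3))
# [[0.0, 2.0, 4.0], [15.0, 17.0, 19.0], [25.0, 27.0, 29.0]]
```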

  • Emotional Dialog System - Issue: Emotion-Based Dialog Strategy

    Video / audio input → Emotion detection → Emotional strategy
      – Facial expression: angle of mouth, angle of eyes
      – Prosody, accent
      – Emotional keyword detection, e.g. 슬퍼 (Korean for "I'm sad") → Sad
      – Detected emotions: Angry, Sad, Happy, Scared
    Dialog strategy: an appropriate strategy for each emotion

    Example (Emotion: Sad)
      User: My dog died yesterday
      Agent: You look sad, cheer up!

    Sangdo Han, Kyusong Lee, Donghyeon Lee, Gary Geunbae Lee. Counselling dialog system with 5W1H extraction. Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2013), Aug 2013, Metz
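A minimal sketch of the emotion-to-strategy step follows. The keyword lexicon and the response templates are hypothetical. When facial or prosodic labels are also available, they are combined by simple majority vote, an assumption since the slide does not say how the detectors are combined.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical lexicon; "슬퍼" is Korean for "(I'm) sad", as in the slide example.
EMOTION_KEYWORDS = {
    "sad": "Sad", "died": "Sad", "슬퍼": "Sad",
    "angry": "Angry", "furious": "Angry",
    "happy": "Happy", "glad": "Happy",
    "scared": "Scared", "afraid": "Scared",
}
# Hypothetical response templates, one strategy per emotion.
STRATEGY = {
    "Sad": "You look sad, cheer up!",
    "Angry": "That sounds frustrating. What happened?",
    "Happy": "Great! Tell me more.",
    "Scared": "Don't worry, you are safe here.",
}


def keyword_emotion(utterance: str) -> Optional[str]:
    # \w matches Hangul as well as Latin letters in Python 3 regexes.
    for token in re.findall(r"\w+", utterance.lower()):
        if token in EMOTION_KEYWORDS:
            return EMOTION_KEYWORDS[token]
    return None


def fuse(*labels: Optional[str]) -> Optional[str]:
    # Majority vote over the available detectors; ties go to the first one listed.
    votes = Counter(label for label in labels if label is not None)
    return votes.most_common(1)[0][0] if votes else None


def respond(utterance: str, face: Optional[str] = None, prosody: Optional[str] = None) -> str:
    emotion = fuse(face, prosody, keyword_emotion(utterance))
    return STRATEGY.get(emotion, "I see. Please go on.")


print(respond("My dog died yesterday"))  # You look sad, cheer up!
```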

  • Humor Dialog System - Solution Architecture

    User Utterance → Natural Language Understanding (linguistic analysis: DA, NE) → Dialog Manager → Natural Language Generation → System Utterance
      – Dialog Manager components: Chat-oriented DM, Humor Manager, Utterance Selection
      – Humor Manager: keyword analysis and Humor Generation (Humor Generation Modules 1, 2, …, N)

    Korean Patent filed: 10-2013-0008478

  • Counseling System Features

    Based on counseling skills: open-ended questioning, paraphrasing, reflecting feelings
    Based on the user’s emotion: strategy based on emotion
    Virtual agent: talk with a 3D virtual agent; emotion-based attitude

    Speaker   Utterance
    User      My friend bugs me a lot
    Agent     Why does your friend bug you?
    User      He beats me, saying I am dirty
    Agent     Wow! You must be angry
    User      Ya! I am depressed
    Agent     Since when did it happen?

    Sangdo Han, Younghee Kim, Gary Geunbae Lee. Micro-counselling dialog system based on semantic content. Proceedings of the international workshop series on spoken dialog systems (IWSDS 2015), Jan 2015, Busan

  • Demo video (postech isoft): counselling dialog system on the NAO robot

    https://www.youtube.com/watch?v=pcz228RDTlk

  • Dialog system for English education

    Gary Geunbae Lee, Intelligent Software Lab.

  • CALL: POSTECH approach to CALL

    Components: Learner, Student Model, DB-CALL (game environment, utterance suggestion), Error Detection (pronunciation, prosody, grammar), Error Feedback (pronunciation, prosody, grammar), Data Server

  • CALL: POSTECH approach to CALL

    PRONUNCIATION (detection & feedback): pronunciation training & assessment, pronunciation error detection / feedback, pronunciation error simulation
    PROSODY (detection & feedback): intonation, phrase break, stress/rhythm
    GRAMMAR (detection & feedback): grammar error simulation, grammatical error detection
    Dialog system / Game (DB-CALL & gameplay): English tutoring system, 3D virtual environment, various platforms (mobile, tablet PC)
    Also shown: student model, data collection

  • Pronunciation Assessment and Training: Architecture & data flow

    Jisoo Bang, Jonghoon Lee, Gary Geunbae Lee, Minhwa Chung. A pronunciation variants prediction method for Korean learner’s mispronunciation detection.(accepted) ACM Trans. on Asian Language Information Processing (TALIP)

  • Prosody Assessment and Training: Definition
    – What is prosody?
      • English is a stress-timed language
      • Prosody consists of rhythm, stress and intonation
    – Rhythm
      • Determined by beats occurring in regular patterns between stressed and unstressed syllables
      • We derived rhythm from sentence stress patterns
    – Intonation
      • Pitch fluctuations in utterances; these show a high degree of freedom, so they are modeled with prosodic components that have a relatively low degree of freedom
      • Integrates pitch accent, phrase accent and boundary tone

  • Prosody Assessment and Training: Architecture including feedback provision

    Components: text and text analysis; speech signal and speech analysis; alignment; prosody prediction (model and rule application with rules) → predicted prosody; prosody detection (model) → detected prosody; model training; feedback from the difference between predicted and detected prosody

    Sechun Kang, Gary Geunbae Lee, Ho-Young Lee, Byeongchag Kim. An automatic pitch accent feedback system for English learners with adaptation of English corpus spoken by Koreans. Proceedings of the 2012 IEEE workshop on spoken language technology (SLT2012), Dec 2012, Miami

  • Prosody Assessment and Training: Rhythm’s user interface
    – Component interface view
      • Words: the recognized (or given) text
      • Canonical: sentence stress prediction results
      • Actual: sentence stress detection results
      • Score:
        $$\mathrm{Score} = 1 - \frac{\left|\{B \mid B \notin \mathrm{Prediction} \cap \mathrm{Detection},\ \mathrm{Conf}(B) \geq \tau\}\right|}{\left|\mathrm{Prediction} \cup \mathrm{Detection}\right|}$$
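The rhythm score counts confident disagreements between predicted (canonical) and detected (actual) sentence stress. A sketch follows. It reads B as ranging over words in Prediction ∪ Detection and assumes Conf(B) is the detector's confidence for word B, with missing confidences treated as 0.

```python
from typing import Dict, Set


def rhythm_score(predicted: Set[int], detected: Set[int],
                 conf: Dict[int, float], tau: float = 0.5) -> float:
    """1 - (confident mismatches) / |Prediction ∪ Detection|.

    predicted / detected: indices of words with sentence stress
    (canonical vs. actual); conf: detector confidence per word index.
    """
    union = predicted | detected
    if not union:
        return 1.0                                   # nothing stressed on either side
    agreed = predicted & detected
    mismatches = {b for b in union if b not in agreed and conf.get(b, 0.0) >= tau}
    return 1.0 - len(mismatches) / len(union)


# Stress predicted on words 0, 2, 4; detected on 0, 3, 4.
print(rhythm_score({0, 2, 4}, {0, 3, 4}, {2: 0.9, 3: 0.3}))  # 0.75
```

Here word 2 is a confident mismatch and word 3 is not (0.3 < τ), so one of the four words in the union counts against the learner.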

  • Collecting Grammar Error Data: POLC: Picture description task

    • From Korean learners of English
    • Storytelling based on pictures
    • 80 students (5 tasks per student)

    Hongsuck Seo, Kyusong Lee, Gary Geunbae Lee, Soo-Ok Kweon, Hae-Ri Kim. Grammatical error annotation for Korean learners of spoken English. Proceedings of the 8th international conference on language resources and evaluation (LREC2012), May 2012, Istanbul

  • Collecting Grammar Error Data: Error tagsets

    • JLE Tagset
      – Consists of 46 tags
      – Systematic tag structure
      – Some ambiguity caused by the POS-specific error tag structure

    • CLC Tagset
      – Widely used worldwide; 76 tags
      – Systematic and taxonomic tag structure
      – The JLE ambiguity issue is resolved by the taxonomic tag structure

    • NUCLE Tagset
      – 27 error tags
      – Quite arbitrary tag structure

    • UIUC Tagset
      – Only for articles and prepositions

  • Grammar Assessment and Training: Grammar error detector architecture

    Components: Text, Grammatical Error Simulation, Erroneous Text, ASR / ASR’, N-gram LMs, Merged Hypotheses, Grammaticality Checker, Error-type Classifier, Error Patterns, Error Frequency, Feedback

    Sungjin Lee, Hyungjong Noh, Kyusong Lee, Gary Geunbae Lee. Grammatical error detection for corrective feedback provision in oral conversations. Proceedings of the 25th AAAI conference on artificial intelligence (AAAI-11), Aug 2011, San Francisco

  • Grammar Assessment and Training: Grammatical Error Simulation

    Components: Automatic Speech Recognizer; Grammar Error Simulator (correct sentences + error types → incorrect sentences)

    Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using markov logic. Proceedings of the ACL 2009, August 2009 Singapore (short paper)
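The cited work simulates errors with Markov logic. The sketch below swaps in two hand-written rewrite rules instead (article deletion and preposition substitution, under hypothetical error-type labels). It only illustrates the interface: a correct sentence plus an error type goes in, and an incorrect sentence comes out.

```python
import random
from typing import Callable, Dict, List, Optional

ARTICLES = {"a", "an", "the"}
PREPOSITIONS = ["in", "on", "at", "to", "for"]


def drop_article(tokens: List[str], rng: random.Random) -> Optional[List[str]]:
    idx = [i for i, t in enumerate(tokens) if t.lower() in ARTICLES]
    if not idx:
        return None                                  # error type not applicable
    i = rng.choice(idx)
    return tokens[:i] + tokens[i + 1:]


def substitute_preposition(tokens: List[str], rng: random.Random) -> Optional[List[str]]:
    idx = [i for i, t in enumerate(tokens) if t.lower() in PREPOSITIONS]
    if not idx:
        return None
    i = rng.choice(idx)
    wrong = rng.choice([p for p in PREPOSITIONS if p != tokens[i].lower()])
    return tokens[:i] + [wrong] + tokens[i + 1:]


# Hypothetical error-type labels mapped to rewrite rules.
SIMULATORS: Dict[str, Callable[[List[str], random.Random], Optional[List[str]]]] = {
    "ART_DEL": drop_article,
    "PREP_SUB": substitute_preposition,
}


def simulate(sentence: str, error_type: str, seed: int = 0) -> Optional[str]:
    """Turn a correct sentence into an incorrect one for the given error type."""
    out = SIMULATORS[error_type](sentence.split(), random.Random(seed))
    return " ".join(out) if out is not None else None


print(simulate("I went to the park", "ART_DEL"))   # I went to park
print(simulate("I went to the park", "PREP_SUB"))
```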

  • Spoken Dialog System DB-CALL System

    Dialog-based Language Learning System: Dialog-based CALL system

    Cheongjae Lee, Sangkeun Jung, Kyungduk Kim, Gary Geunbae Lee. Hybrid approach to robust dialog management using agenda and dialog examples. Computer Speech and Language, 24(4): 609-631, Oct 2010

  • Dialog-based Language Learning System: The Framework of Ranking DM

    User intention: SLU N-best (system intention) → Scoring Module → Calculated Scores → Next System Intention (User Intention)
    Ranking various scores yields a robust system action

    Hyungjong Noh, Sungjin Lee, Kyusong Lee, Gary Geunbae Lee. Ranking dialog acts using discourse coherence indicator for English tutoring dialog systems. Proceedings of the 3rd international workshop on spoken dialog systems technology (IWSDS 2011), Sept 2011, Granada Spain
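A minimal sketch of ranking follows. It assumes a weighted sum over two hypothetical scores (SLU confidence and a discourse coherence score); the slide says only that various scores are calculated and ranked, and the weights here are illustrative. In the example, the SLU top hypothesis loses because it fits the discourse poorly:

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical score names and weights; the slide only says that several
# scores are calculated and ranked.
WEIGHTS = {"slu_confidence": 0.6, "discourse_coherence": 0.4}


@dataclass
class Candidate:
    intention: str            # a user intention from the SLU N-best list
    scores: Dict[str, float]  # scores from the scoring module


def total_score(c: Candidate, weights: Dict[str, float] = WEIGHTS) -> float:
    return sum(w * c.scores.get(name, 0.0) for name, w in weights.items())


def rank(candidates: List[Candidate]) -> List[Candidate]:
    return sorted(candidates, key=total_score, reverse=True)


nbest = [Candidate("request(price)", {"slu_confidence": 0.55, "discourse_coherence": 0.2}),
         Candidate("inform(size)", {"slu_confidence": 0.45, "discourse_coherence": 0.9})]
best = rank(nbest)[0]
print(best.intention)  # inform(size)
```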

  • Dialog-based Language Learning System: POMY system architecture

    Kyusong Lee, Soo-ock Kweon, Hyungjong Noh, Gary Geunbae Lee. Postech Immersive English Study (POMY): Dialog-based Language Learning Game.(accepted) IEICE transactions on information and systems

  • Dialog-based Language Learning System: Cognitive effect on overall students

    Table 1. Overall

                                 Pre-test         Post-test        Mean
    Category              N      Mean     SD      Mean     SD      difference   p
    Listening             25     56.4     16.6    71.2     20.9    14.8         0.0001**
    Vocabulary            25     74.0     31.4    117.6    32.7    43.6         0.0001**
    Speaking
      Pronunciation       25     42.08    6.80    44.48    6.80    2.40         0.0001**
      Grammar             25     36.56    8.45    42.40    6.95    5.84         0.0001**
      # of Words          25     136.31   55.30   170.04   80.88   33.73        0.003**

    • Significantly improved
    • Students spoke more words in the post-test

  • Demo video (postech isoft dbcall, postech isoft pesaa): robot DB-CALL system 2013 and PESAA system

    https://www.youtube.com/watch?v=k0TAdfngZpU

  • Thank You & QA
