Question Answering over Pattern-Based User Models
Georgios Meditskos, Stamatia Dasiopoulou, Stefanos Vrochidis, Leo Wanner, Ioannis Kompatsiaris
12th International Conference on Semantic Systems (SEMANTiCS), Leipzig, Germany, September 12-15, 2016
2
Outline
• Overview & Motivation
• Proposed Framework
• Semantic Language Analysis
• Ontology-based Question Answering
• Example
• Conclusions
3
Natural Language Interfaces (NLI) & Question Answering (QA) (1/2)
• Allow users to express their information needs in an intuitive manner
  • over structured data/knowledge bases
  • hide the complexity of formal knowledge representation and query languages
• The key challenge is to bridge the gap between the way users communicate with the system and the way knowledge is captured
  • Usually involves the translation of questions into semantically enriched structures that capture the meaning of requests
  • Formulation of pertinent queries (e.g. SPARQL) in accordance with the conceptualisation of the underlying structured data sources
  • Answers can be retrieved from the underlying knowledge bases
Overview & Motivation
4
Natural Language Interfaces (NLI) & Question Answering (QA) (2/2)
• The focus has mainly been on simple, factoid questions
  • who, lists, yes/no, when, etc.
  • NL inputs comprise primarily light linguistic constructions
  • answers target respective bindings on (chains of) binary properties
  • Goal: to overcome the conceptual mismatch and ambiguities between the triple-based question representations and the underlying knowledge model
• Question answering over more conceptually demanding domains (?)
  • involve complex relational contexts that go beyond (chains of) binary associations and abide by ontology design pattern principles
  • such as habits and daily routines profiling
5
Ontology Design Patterns
• Usually describe abstract roles and relationships
  • can be applied in a wide variety of situations
• This level of generalization fosters reusability and extensibility, but imposes certain challenges
  • in the formalisation of the natural language questions
  • in the subsequent content matching and retrieval
• This is mainly due to the encapsulation of domain semantics inside conceptual layers of abstraction
  • e.g. using reification or container classes that demand flexible, context-aware approaches for query analysis and interpretation
6
Example: Event-Model-F (DnS)
• Highly axiomatised descriptions and rich structures
• Querying relies on coping with NL questions that allow capturing complex relations between entities and roles
7
PROHOW
• Web of Know-How dataset contains activities and instructions collected from WikiHow and Snapguide
• Example: information about recipes
• Conceptual mismatch between question and dataset
8
KRISTINA Project
• A dialogue-based agent for conversational assistance in healthcare
  • Elderly people use the dialogue system (usually at home) to acquire information and suggestions related to basic care and healthcare (e.g. symptoms, treatments, etc.).
  • Clinicians and caregivers can use the agent to acquire information about the person
    • e.g. migrants, difficulties in communication (e.g. cognitive impairment)
• RDF knowledge bases with profile information
  • e.g. habits before sleep, medical profile, activities of daily living, etc.
9
Basic Components
• Recognition of non-verbal modalities
  • Gesture analysis, facial expressions
• Speech Recognition
  • Transformation of user input into textual form (speech-to-text)
• Language Analysis and Understanding
  • Formalize user utterances in a structured representation that allows for automated reasoning and interpretation
• Dialogue Management
  • Coordinates the components, controlling the dialog flow and communicating with external applications
  • Discourse analysis, clarifications, system actions, etc.
• Question Answering
  • Retrieval of information relevant to user’s question/request
• Speech generation and avatar
  • Responses are typically generated as natural language with content retrieved from knowledge databases
10
Language Analysis (1/2)
• Most ontology-based NLI approaches capture only partially the underlying user utterance semantics
  • main focus is on light linguistic constructions & syntactic (e.g. subject, object) rather than semantic dependencies
• Expressive, frame-based ontological representations of text have been proposed for knowledge extraction tasks
  • varying modelling choices, tailored to intended application context
• Need for expressive, principled representations that capture the user inputs
The framework / Language Analysis
11
Language Analysis (2/2)
• Two stages
  • Extraction of entities and their interrelations
  • Translation of the extracted entities and interrelations into structured OWL representations
• Approach
  • Semantic predicate-argument extraction
  • DUL-based mapping to OWL representations
12
Predicate-argument extraction
• Graph transducers pipeline to extract incrementally abstract representation structures (surface syntax, deep syntax, semantic)
  • https://github.com/talnsoftware/FrameSemantics parser (Pompeu Fabra Univ.)
  • Frame-based representation (events, objects, frames, frame elements etc.)
• Example: Apply_heat frame describes a cooking situation involving, among others, a Cook, some Food and a Heating_Instrument
  • the roles of the involved participants, i.e. cook, food and heating instrument, comprise the frame elements (FEs) of the frame, while words that evoke it, such as fry, bake, boil, and broil, are its lexical units (LUs).
• SemLink mappings for labelling/enriching semantic predicate-argument structures with FrameNet-based annotations
  • Word-sense disambiguation (BabelNet)
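The Apply_heat example can be pictured as a small data structure. The sketch below is only an illustration: the `Frame` class, its field names and the `evoked_frames` helper are hypothetical and not part of the FrameSemantics parser; the frame elements and lexical units are the ones named on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A FrameNet-style frame: a situation type with roles and evoking words."""
    name: str
    frame_elements: frozenset  # roles of the involved participants (FEs)
    lexical_units: frozenset   # words that evoke the frame (LUs)

    def is_evoked_by(self, word: str) -> bool:
        # Case-insensitive lookup; a real parser works on lemmas and senses.
        return word.lower() in self.lexical_units


# Values taken from the Apply_heat example on the slide.
APPLY_HEAT = Frame(
    name="Apply_heat",
    frame_elements=frozenset({"Cook", "Food", "Heating_Instrument"}),
    lexical_units=frozenset({"fry", "bake", "boil", "broil"}),
)


def evoked_frames(tokens, frames):
    """Return the names of the frames evoked by at least one token."""
    return [f.name for f in frames if any(f.is_evoked_by(t) for t in tokens)]
```

For instance, `evoked_frames(["Ann", "will", "bake", "bread"], [APPLY_HEAT])` picks out Apply_heat because "bake" is one of its lexical units.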
13
Mapping to OWL representations
• DnS-based translation, where:
  • frames as contextual views
  • frame elements as role classifiers
  • frame occurrences as relational contexts

:IngestionFrame rdfs:subClassOf dul:Situation .
:ingestion1 rdf:type :IngestionFrame ;
    dul:isSettingFor :coffee1, :Ann ;
    dul:includesEvent :drink1 .
:event1 dul:classifies :drink1 .
:drink1 rdf:type :Drink .
:ingestibles1 dul:classifies :coffee1 .
:coffee1 rdf:type :Coffee .
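A minimal sketch of how one frame occurrence could be turned into the DnS-style triples shown above. The function name, its parameters and the tuple-based triple representation are assumptions made for illustration; the slides do not describe the actual implementation.

```python
def frame_to_triples(frame, instance_id, event, participants):
    """Translate one frame occurrence into DnS-style triples (hypothetical helper).

    frame:        frame name, e.g. "Ingestion"
    instance_id:  local name of the situation individual, e.g. "ingestion1"
    event:        (role individual, event individual, event class)
    participants: list of (role individual or None, individual, class or None)
    """
    situation = f":{instance_id}"
    frame_class = f":{frame}Frame"
    # The frame is a contextual view (a kind of dul:Situation);
    # the occurrence is an individual of that frame class.
    triples = [
        (frame_class, "rdfs:subClassOf", "dul:Situation"),
        (situation, "rdf:type", frame_class),
    ]
    event_role, ev, ev_class = event
    triples += [
        (situation, "dul:includesEvent", f":{ev}"),
        (f":{event_role}", "dul:classifies", f":{ev}"),
        (f":{ev}", "rdf:type", f":{ev_class}"),
    ]
    for role, individual, cls in participants:
        triples.append((situation, "dul:isSettingFor", f":{individual}"))
        if role is not None:
            # Frame elements act as role classifiers of the participants.
            triples.append((f":{role}", "dul:classifies", f":{individual}"))
        if cls is not None:
            triples.append((f":{individual}", "rdf:type", f":{cls}"))
    return triples


# Rebuilds the nine triples of the Ingestion example from the slide.
INGESTION_TRIPLES = frame_to_triples(
    "Ingestion",
    "ingestion1",
    event=("event1", "drink1", "Drink"),
    participants=[("ingestibles1", "coffee1", "Coffee"), (None, "Ann", None)],
)
```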
14
Knowledge Extraction
• Capitalizes on the graph traversal paradigm
  • Returns a set of triples that conceptually match the input (language analysis)
• Aim/challenge is to decouple graph expansion from predicate ranking
  • in pattern-based modelling, additional layers of axiomatisation are introduced that encapsulate conceptual dependencies and links among resources
  • These dependencies are usually not relevant to the structure and semantics of questions
  • cannot be uncovered by graph expansion approaches that are based on predicate ranking
The framework / Knowledge Extraction
15
Domain Modelling
• No restrictions on the domain ontologies used to capture background knowledge
  • Existing foundational ontologies can be used, according to the domain
  • DUL, SEM, Event-Model-F, SUMO, MeSH, etc.
• Examples
[Figures: Event-Model-F CPO ontology; Activity duration pattern in DUL. Recoverable diagram labels: Aspect View, Role, Domain Vocabulary, Duration, TemporalContext, time:DurationDescription, time:hasDurationDescription, hasView, defines, interprets, involves, hasRole, rdfs:subClass, value (literal)]
16
Context Extraction
1. Extraction of key entities
  • from question analysis
2. Resource Identification
  • find relevant resources in the underlying KBs
3. Resource unfolding and local context
  • identification of neighboring set of triples
4. Context links
  • links among local context
5. Context ranking and final responses
  • traverse local context and collect the triples
17
1. Key Entity Extraction
• Entities that participate in DnS classification relations
  • such axiomatizations encapsulate information about the context of questions.
• They are extracted by traversing the frame situation model, collecting the resources classified through dul:classifies property assertions.

:IngestionFrame rdfs:subClassOf dul:Situation .
:ingestion1 rdf:type :IngestionFrame ;
    dul:isSettingFor :coffee1, :Ann ;
    dul:includesEvent :drink1 .
:event1 dul:classifies :drink1 .
:drink1 rdf:type :Drink .
:ingestibles1 dul:classifies :coffee1 .
:coffee1 rdf:type :Coffee .
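The extraction rule stated above (collect whatever is classified through dul:classifies) can be sketched over a plain list of triples. The function name and the tuple representation are illustrative assumptions; only the rule itself comes from the slide.

```python
def key_entities(triples):
    """Collect resources classified via dul:classifies, together with their types."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, []).append(o)
    entities = []
    for s, p, o in triples:
        # Only the objects of dul:classifies assertions are key entities.
        if p == "dul:classifies" and o not in entities:
            entities.append(o)
    return {entity: types.get(entity, []) for entity in entities}


# The Ingestion situation from the slide, as (subject, predicate, object) tuples.
QUESTION_GRAPH = [
    (":IngestionFrame", "rdfs:subClassOf", "dul:Situation"),
    (":ingestion1", "rdf:type", ":IngestionFrame"),
    (":ingestion1", "dul:isSettingFor", ":coffee1"),
    (":ingestion1", "dul:isSettingFor", ":Ann"),
    (":ingestion1", "dul:includesEvent", ":drink1"),
    (":event1", "dul:classifies", ":drink1"),
    (":drink1", "rdf:type", ":Drink"),
    (":ingestibles1", "dul:classifies", ":coffee1"),
    (":coffee1", "rdf:type", ":Coffee"),
]
```

On this graph, the classified resources are :drink1 (typed :Drink) and :coffee1 (typed :Coffee). Under this literal reading of the rule, :Ann is not collected because no dul:classifies assertion points to it.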
18
2. Resource Identification
• Find relevant resources in the underlying KBs
• Assign URIs to key entities
• UMBC Semantic Similarity Service
  • combines Latent Semantic Analysis (LSA) word similarity and WordNet knowledge

Drink -> http://….#Drink
Coffee -> http://….#Coffee
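A rough sketch of the URI assignment step. It does not call the UMBC Semantic Similarity Service; as a stand-in it uses plain string similarity from Python's `difflib`, which is an assumption made only to keep the example self-contained. The `http://example.org/kb#` namespace, the 0.8 threshold and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def label_of(uri):
    """Use the fragment (or last path segment) of a URI as its label."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def string_similarity(a, b):
    # Stand-in for a semantic similarity service: character-level overlap only.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def identify_resources(entity_labels, kb_uris, similarity=string_similarity, threshold=0.8):
    """Map each key entity label to the best-scoring KB URI at or above threshold."""
    mapping = {}
    for label in entity_labels:
        best = max(kb_uris, key=lambda uri: similarity(label, label_of(uri)), default=None)
        if best is not None and similarity(label, label_of(best)) >= threshold:
            mapping[label] = best
    return mapping


# Hypothetical KB resources; the real URIs are elided on the slide.
EXAMPLE_KB = [
    "http://example.org/kb#Drink",
    "http://example.org/kb#Coffee",
    "http://example.org/kb#Sleep",
]
```

The `similarity` parameter is left pluggable so that a semantic measure (for example one combining LSA and WordNet, as the slide describes) could replace the string-based stand-in.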
19
3. Resource Unfolding and Local Context
• Local context
  • captures information relevant to the neighbouring resources (triples)
• It is built by taking into account all the connected triples (h threshold), without examining the similarity of the predicate labels
  • ensures that the local contexts contain information that is part of the conceptual model of the pattern
  • for example, the question “How to make a pancake” does not directly entail the predicates requires or has_method
  • they should be part of the graph expansion algorithm
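The unfolding step can be sketched as a breadth-first expansion that keeps every triple within h hops of a resource and never inspects predicate labels. Treating the graph as undirected (following both subject and object positions) is an assumption, as are the function name and the toy recipe triples; only requires and has_method are predicates mentioned on the slide.

```python
def local_context(triples, resource, h=2):
    """Collect all triples within h hops of resource, ignoring predicate labels."""
    frontier = {resource}
    visited = set()
    context = set()
    for _ in range(h):
        next_frontier = set()
        for s, p, o in triples:
            # A triple is connected if it touches the frontier on either side.
            if s in frontier or o in frontier:
                context.add((s, p, o))
                next_frontier.update({s, o})
        visited |= frontier
        frontier = next_frontier - visited
        if not frontier:
            break
    return context


# Hypothetical know-how triples for the pancake question.
RECIPE_KB = [
    (":pancake", ":requires", ":flour"),
    (":pancake", ":has_method", ":method1"),
    (":method1", ":has_step", ":step1"),
    (":step1", ":uses", ":pan"),
    (":omelette", ":requires", ":egg"),
]
```

With h=1 the context of :pancake holds only its two direct triples; with h=2 it also reaches the has_step triple of :method1, even though no word in the question resembles that predicate.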
20
4. Context Links
• Connects local contexts based on the contained triples
  • Should have at least one common subject, predicate or object (OWL schema predicates are ignored)
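One possible reading of the linking rule, sketched below: two local contexts are linked when they share at least one subject, predicate or object, after skipping triples whose predicate is a schema predicate. Which predicates count as schema predicates, and whether the whole triple is skipped, are assumptions; the names and example data are hypothetical.

```python
from itertools import combinations

# Assumed set of schema-level predicates to ignore.
SCHEMA_PREDICATES = {"rdf:type", "rdfs:subClassOf", "rdfs:domain", "rdfs:range"}


def shared_terms(context_a, context_b, ignored=SCHEMA_PREDICATES):
    """Subjects, predicates and objects that occur in both contexts."""
    def terms(context):
        found = set()
        for s, p, o in context:
            if p in ignored:
                continue  # schema-level triples do not contribute links
            found.update({s, p, o})
        return found
    return terms(context_a) & terms(context_b)


def context_links(contexts):
    """Link every pair of named local contexts that share at least one term."""
    links = []
    for (name_a, ctx_a), (name_b, ctx_b) in combinations(contexts.items(), 2):
        if shared_terms(ctx_a, ctx_b):
            links.append((name_a, name_b))
    return links


# Hypothetical local contexts keyed by the key entity they were built from.
EXAMPLE_CONTEXTS = {
    "drink": {(":habit1", ":involves", ":drink1"), (":drink1", "rdf:type", ":Drink")},
    "coffee": {(":habit1", ":hasObject", ":coffee1"), (":coffee1", "rdf:type", ":Coffee")},
    "sleep": {(":sleep1", "rdf:type", ":Sleep")},
}
```

Here the "drink" and "coffee" contexts are linked through the shared subject :habit1, while "sleep" stays unlinked because its only overlap with the others is the ignored rdf:type predicate.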
21
5. Context Ranking and Final Responses
• Local context merging
  • Traverse the paths defined by context links, collecting the triples of local contexts
  • Each set is semantically compared to language analysis results
  • Depends on the number of key entity URIs that exist in the set
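The slides only say that the score depends on the number of key entity URIs present in a merged set. A simple scoring that is consistent with that statement, though not necessarily the one actually used, is the fraction of key entity URIs found among the subjects and objects of the set. All names and example triples below are hypothetical.

```python
def context_score(triple_set, key_uris):
    """Fraction of key entity URIs that occur as subject or object in the set."""
    if not key_uris:
        return 0.0
    present = {term for s, _, o in triple_set for term in (s, o)}
    return sum(1 for uri in key_uris if uri in present) / len(key_uris)


def rank_contexts(candidate_sets, key_uris):
    """Sort candidate triple sets by descending score (ties keep input order)."""
    return sorted(candidate_sets, key=lambda ts: context_score(ts, key_uris), reverse=True)


# Hypothetical key entity URIs and two merged candidate sets.
KEY_URIS = [":Drink", ":Coffee"]
FULL_MATCH = [(":habit1", ":involves", ":Drink"), (":habit1", ":hasObject", ":Coffee")]
PARTIAL_MATCH = [(":habit2", ":involves", ":Drink")]
```

Under this scoring a set containing every key entity URI gets a score of 1, and a set containing one of two gets 0.5, so `rank_contexts([PARTIAL_MATCH, FULL_MATCH], KEY_URIS)` places the full match first.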
22
Example: How often does Ann like to drink coffee?
23
Language Analysis
Key Entities
KB Resources
24
Local Context (h=2)
Final Response
The semantic similarity equals 1, since all key entities are exactly matched to the resources of the response
25
Conclusions
• We propose a language analysis and question answering framework over conceptually complex, pattern-based KBs
  • Combines the frame-based reified representation of NL questions with a context-aware, graph-based paradigm
• We are currently building rich KBs capturing user models of participants in KRISTINA pilots
  • The collected data will allow us to evaluate our framework with realistic data, identifying possible limitations that have not been foreseen so far.
• In parallel, we are working towards further enrichment of the analysis and interpretation of complex relational context
  • support additional constructions, such as negation, superlatives and aggregation, that will allow for more expressive QA over the profiled users' routines.
26
Thank you for your attention
http://kristina-project.eu/