Semantic Relation Classification: Task Formalisation and Refinement

NLP & Semantic Computing Group
Vivian S. Silva, Manuela Hürliman, Brian Davis, Siegfried Handschuh, André Freitas

Upload: andre-freitas

Post on 16-Jan-2017

TRANSCRIPT

Page 1: Semantic Relation Classification: Task Formalisation and Refinement

NLP & Semantic Computing Group

Semantic Relation Classification: Task Formalisation and Refinement
Vivian S. Silva, Manuela Hürliman, Brian Davis, Siegfried Handschuh, André Freitas

Page 2: Semantic Relation Classification: Task Formalisation and Refinement

Outline

•Motivation

•Revisiting Semantic Relation Classification Using Foundational Ontologies (DOLCE)

•Systematic Analysis

•Summary

Page 3: Semantic Relation Classification: Task Formalisation and Refinement

Motivation

Page 4: Semantic Relation Classification: Task Formalisation and Refinement

Introduction

Semantic Relation Classification (SRC) is a fundamental task in NLP, allowing the induction of semantic representation models for both commonsense and domain-specific data.

[Figures; sources: Ontotext, W3C, Semantrix]

Page 5: Semantic Relation Classification: Task Formalisation and Refinement

Our Goals

• Improve the coverage, description and formalisation of the semantic relation classification task

•Provide a critique and generalization of the existing SemEval-2010 Task 8

Page 6: Semantic Relation Classification: Task Formalisation and Refinement

Semantic Relation Classification

•SemEval-2010 Task 8: the most common semantic relation set

Relations covered:
• Cause-Effect (CE)
• Instrument-Agency (IA)
• Product-Producer (PP)
• Content-Container (CC)
• Entity-Origin (EO)
• Entity-Destination (ED)
• Component-Whole (CW)
• Member-Collection (MC)
• Message-Topic (MT)

Page 7: Semantic Relation Classification: Task Formalisation and Refinement

Semantic Relations Classification

The <e1> burst </e1> has been caused by water hammer <e2> pressure </e2>.
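
The tagged sentence above marks the two nominals to be related with <e1> and <e2>. As a minimal sketch of how such an input could be read, the Python below pulls out the two marked spans and recovers the plain sentence. The function names are illustrative assumptions, not part of any SemEval tooling.

```python
import re

# Matches <e1> ... </e1> or <e2> ... </e2>, tolerating spaces inside the tags.
ENTITY_PATTERN = re.compile(r"<(e[12])>\s*(.*?)\s*</\1>")


def extract_nominals(sentence: str) -> dict:
    """Return the text marked as e1 and e2 in a tagged sentence."""
    return {tag: text for tag, text in ENTITY_PATTERN.findall(sentence)}


def strip_tags(sentence: str) -> str:
    """Remove the entity markers and normalise whitespace."""
    plain = re.sub(r"</?e[12]>", " ", sentence)
    plain = re.sub(r"\s+", " ", plain).strip()
    # Re-attach punctuation that was separated from the preceding word.
    return re.sub(r"\s+([.,;:!?])", r"\1", plain)


example = "The <e1> burst </e1> has been caused by water hammer <e2> pressure </e2>."
nominals = extract_nominals(example)
plain_sentence = strip_tags(example)
```

A classifier for this task would then receive the pair in `nominals` together with `plain_sentence` and predict one of the nine relations listed earlier.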

Page 8: Semantic Relation Classification: Task Formalisation and Refinement

Semantic Relations Classification

• Despite the obvious intuition around the utility of SRC…

1. The semantic relation set and its expressive coverage have not been fully grounded with regard to an ontological framework

2. When projecting these semantic relations back to the corpus level, it can be observed that the majority of the words within a text do not have a direct semantic relationship connecting them

Page 9: Semantic Relation Classification: Task Formalisation and Refinement

Semantic Relations Classification

•SemEval-2010 Task 8 has some constraints… that bring some limitations:

• Focus on Nominals: only noun phrases are considered -> No relations between events, when represented by verbs, and their objects

• Locality Constraint: only relations for arguments in the same clause -> No relations between terms belonging to different, subordinate clauses

• Focus on Concrete Relations: most relations refer to physical objects -> No relations for abstract entities or quantitative/qualitative roles

• Exclusion of Conditionals: conditional clauses not considered -> No relations expressing conditional dependencies

Page 10: Semantic Relation Classification: Task Formalisation and Refinement

Revisiting Semantic Relation Classification

Page 11: Semantic Relation Classification: Task Formalisation and Refinement

Main question

•Given two sets of content words in a sentence, can we provide a semantic relation between them?

•Can this task be useful as a semantic interpretation mechanism?

Page 12: Semantic Relation Classification: Task Formalisation and Refinement

Main strategy

•Start using foundational ontologies for this task

•Define relation compositions

•Expand the model with custom abstract relations that stand on the interface between dependency relations and an ontology-based representation

Page 13: Semantic Relation Classification: Task Formalisation and Refinement

Why Foundational Ontologies?

Foundational ontologies are intended to represent the world in the way people perceive it, classifying entities into categories that are familiar to people’s common sense.

• Representation: can represent data in a formal way
• Reasoning: can reason over data using high-level restrictions

Page 14: Semantic Relation Classification: Task Formalisation and Refinement

When is a foundational ontology useful?

1. When subtle distinctions are important
2. When recognizing disagreement is important
3. When rigorous referential semantics is important
4. When general abstractions are important
5. When careful explanation and justification of ontological commitment is important
6. When mutual understanding is more important than interoperability.

(Guarino, 2006)

Page 15: Semantic Relation Classification: Task Formalisation and Refinement

DOLCE

•DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)

•Strong cognitive/linguistic bias:
  - Descriptive (as opposed to prescriptive) attitude
  - Categories mirror cognition, common sense, and the lexical structure of natural language
  - Emphasis on cognitive invariants

Page 16: Semantic Relation Classification: Task Formalisation and Refinement

DOLCE

•Any term can be mapped to a DOLCE high-level category (class)

• It’s always possible to find a relation between any two DOLCE categories, and, therefore, between the entities mapped to them

Page 17: Semantic Relation Classification: Task Formalisation and Refinement

Page 18: Semantic Relation Classification: Task Formalisation and Refinement

Page 19: Semantic Relation Classification: Task Formalisation and Refinement

DOLCE Relations

• 23 immediate relations and 25 mediated (composed) relations, many of them having sub-relations. Some examples:

[Diagram of the relation hierarchy. Relation names shown: immediate-relation, mediated-relation, instrument, performed-by, target, functional-participant, part, references, resource, temporally-coincides, precedes, temporal-relation, abstract-location, co-participates-with, temporally-overlaps, temporally-includes, …]

Page 20: Semantic Relation Classification: Task Formalisation and Refinement

Applications: Simple Text Entailment Example

Assumption: Mary is a mother
Hypothesis: Mary gave birth
Commonsense KB: a mother is a woman who has given birth

Foundational Ontology Mapping

Commonsense concepts: Mary, mother, give birth
Foundational classes: agent, role, action
Foundational relations: (agent plays role), (role performs action), (agent performs action)

(agent plays role) and (role performs action) -> (agent performs action)
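
The entailment rule on this slide can be read as a single inference step over triples. The sketch below is only an illustration under that reading: facts are stored as (subject, relation, object) tuples, and the relation names "plays" and "performs" come from the slide, while the function names and data layout are assumptions.

```python
# Facts are (subject, relation, object) triples.
facts = {
    ("Mary", "plays", "mother"),           # from the assumption "Mary is a mother"
    ("mother", "performs", "give birth"),  # from the commonsense KB definition
}


def apply_role_rule(known):
    """Apply once: (agent plays role) and (role performs action)
    -> (agent performs action)."""
    derived = set(known)
    for agent, rel1, role in known:
        if rel1 != "plays":
            continue
        for role2, rel2, action in known:
            if rel2 == "performs" and role2 == role:
                derived.add((agent, "performs", action))
    return derived


def entails(known, hypothesis):
    """Check whether the hypothesis triple follows after one rule application."""
    return hypothesis in apply_role_rule(known)


hypothesis = ("Mary", "performs", "give birth")
result = entails(facts, hypothesis)
```

The rule is applied once rather than to a fixpoint, which is enough for this three-concept example; a real reasoner over a foundational ontology would need a more general inference procedure.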

Page 21: Semantic Relation Classification: Task Formalisation and Refinement

Systematic Analysis

Page 22: Semantic Relation Classification: Task Formalisation and Refinement

Corpus-based Analysis

1. Corpus construction

• Focused on the financial domain (merges commonsense with domain-specific discourse)
• Contains both factoid and definition types of discourse
• We created a financial corpus by crawling two distinct types of sources:
a) definitions, from three sources: Bloomberg Financial Glossary, SGM Glossary, Investopedia Definitions
b) articles, from two sources: Wikipedia, Investopedia

Page 23: Semantic Relation Classification: Task Formalisation and Refinement

Corpus Construction

•Definitions
  - Bloomberg Financial Glossary (8,324 definitions; 212,421 tokens)
  - SGM Glossary (1,007 definitions; 43,638 tokens)
  - Investopedia Definitions (15,476 definitions; 2,462,801 tokens)

•Articles
  - Investopedia (5,890 articles; 5,129,793 tokens)
  - Wikipedia (articles on Investment and Finance; 8,306 articles; 6,714,129 tokens)

Page 24: Semantic Relation Classification: Task Formalisation and Refinement

Corpus-based Analysis

1. Corpus construction

• We created a financial corpus by crawling two distinct types of sources:
a) definitions, from three sources: Bloomberg Financial Glossary, SGM Glossary, Investopedia Definitions
b) articles, from two sources: Wikipedia, Investopedia

• Word pair selection:
  - Corpus split into sentences
  - First word randomly selected among the sentence tokens
  - Second word manually selected
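
The automatic half of the word pair selection (sentence splitting and random choice of the first word) could look roughly like the sketch below. The regex-based splitter and tokenizer are simplifying assumptions, since the slides do not say which tools were used, and the second word is left to the human annotator as described. The two sample sentences are taken from the custom-relation examples later in the deck.

```python
import random
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_first_word(sentence, rng):
    """Randomly pick one alphabetic token of the sentence as the first word."""
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    return rng.choice(tokens) if tokens else None


rng = random.Random(0)  # fixed seed so the sampling is repeatable
text = (
    "The lessor is the legal owner of the asset. "
    "Operating activities include net income, accounts receivable, "
    "accounts payable and inventory."
)
# One (sentence, first word) candidate per sentence; the second word
# of each pair would then be chosen manually.
candidates = [(s, pick_first_word(s, rng)) for s in split_sentences(text)]
```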

Page 25: Semantic Relation Classification: Task Formalisation and Refinement

Corpus-based Analysis

2. Manual Classification Analysis

• 300 pairs of words manually annotated
• Words mapped to DOLCE classes
• Relation between them chosen among the set of relations that exist between the classes assigned to the words
• 3 different scenarios occurred:
a) Direct relationship
b) Relation composition
c) No relation found: concepts too far away

Example sentences shown on the slide:
[…] the legislation's include a lifting of a 40-year ban on the United States' exporting of crude oil
After 30 days the trustee can then use the contributions to pay the insurance policy premium

Relation labels shown in the figure: target; target, target; indirect-target
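
The three annotation scenarios can be sketched as a small decision procedure over the chain of relations that links the two words. This is a hedged illustration only: it assumes, as the labels on the slide seem to suggest, that a chain of two target relations is summarised by the custom indirect-target relation. The composition table, function names and the extra relation used in the last call are hypothetical.

```python
# Hypothetical composition table: (first relation, second relation) -> composite.
# Only the entry suggested by the slide is included.
COMPOSITIONS = {
    ("target", "target"): "indirect-target",
}


def compose(path):
    """Fold a chain of relation labels into one composite label, or None."""
    if not path:
        return None
    result = path[0]
    for relation in path[1:]:
        result = COMPOSITIONS.get((result, relation))
        if result is None:
            return None
    return result


def classify(path):
    """Map a relation chain to one of the three scenarios from the slide."""
    if len(path) == 1:
        return ("direct", path[0])
    composite = compose(path)
    if composite is not None:
        return ("composite", composite)
    return ("unclassified", None)


direct_case = classify(["target"])
composite_case = classify(["target", "target"])
unclassified_case = classify(["part", "target"])
```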

Page 26: Semantic Relation Classification: Task Formalisation and Refinement

Custom Relations

• DOLCE relations can be defined specifically for a class, or be inherited from an ancestor class
  - In the second case, the kind of relationship can become too general
  - To avoid semantically vague relations, we proposed a small set of custom relations. A few examples:

Relation | Example
Correlated variation | It also decreases the value of the currency - potentially stimulating exports and decreasing imports - improving the balance of trade.
Ownership | The lessor is the legal owner of the asset.
Sibling concept | Operating activities include net income, accounts receivable, accounts payable and inventory.
Value component | Valuation of life annuities may be performed by calculating the actuarial present value of the future life contingent payments.

Page 27: Semantic Relation Classification: Task Formalisation and Refinement

Some Statistics

• Most common DOLCE relations: patient, patient-of, target, target-of

• Most common custom relations: qualifier, indirect-target, ownership

Relation type | DOLCE Relations | Custom Relations | Total
Direct | 35.32% | 64.68% | 72.67%
Composite | 48.65% | 51.35% | 24.67%
Unclassified | - | - | 2.66%
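
As a rough consistency check, the percentages in the table can be converted back into pair counts. The sketch below assumes that the "Total" column is a share of the 300 annotated pairs and that the DOLCE column is a share within each relation type. The resulting counts are inferred from the percentages, not reported in the slides.

```python
TOTAL_PAIRS = 300  # number of annotated pairs stated in the slides

# Percentages copied from the table above.
total_share = {"direct": 72.67, "composite": 24.67, "unclassified": 2.66}
dolce_share = {"direct": 35.32, "composite": 48.65}


def to_count(percentage, base):
    """Convert a percentage of `base` into the nearest whole count."""
    return round(percentage / 100 * base)


# Inferred number of pairs per relation type.
pair_counts = {kind: to_count(pct, TOTAL_PAIRS) for kind, pct in total_share.items()}

# Inferred number of pairs per type that were labelled with a DOLCE relation.
dolce_counts = {kind: to_count(pct, pair_counts[kind]) for kind, pct in dolce_share.items()}
```

Under these assumptions the three relation types correspond to 218, 74 and 8 pairs, which sum to the 300 annotated pairs.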

Page 28: Semantic Relation Classification: Task Formalisation and Refinement

Semantic Relation vs. Semantic Relatedness

• The corpus was further annotated by two domain experts in finance
• Two human annotators scored each of the 300 concept pairs for semantic relatedness on a scale from 0 (unrelated) to 10 (identical or highly related)
  - Average of their scores taken as final score
  - Comparing the semantic relation to the semantic relatedness score assigned to the same pair:

Page 29: Semantic Relation Classification: Task Formalisation and Refinement

Summary

• This work described a preliminary study on the improvement of the coverage, description and formalisation of the semantic relation classification task

• A foundational ontology (DOLCE), composite relations and custom semantic abstract relations were used

• DOLCE accounted for 38.2% of the semantic relations
• 67% of the pairs were assigned to a direct relation
• 2.66% of the pairs could not be classified

• Relevant research questions:
  - The impact of foundational ontology models in distributional and compositional-distributional semantics.

Data available at: http://bit.ly/2gpTkHT

Page 30: Semantic Relation Classification: Task Formalisation and Refinement

Some Limitations (Currently being addressed)

•Scaling corpus annotation size (currently 300 elements)

•Grounding the custom relations into the foundational ontology

Page 31: Semantic Relation Classification: Task Formalisation and Refinement

Work in Progress

• Train an automatic annotator, capable of identifying FO classes and semantic relations in text.