PhD Research Proposal - Qualifying Exam
DESCRIPTION
Understanding which events are mentioned in unstructured natural language texts, and which relations connect them, is a fundamental task for many applications in natural language processing (NLP), such as personalized news systems, question answering and summarization. A notably challenging problem related to event processing is recognizing the relations that hold between events, in particular temporal and causal relations. Having knowledge about such relations is necessary to build event timelines from text and could be useful for future event prediction, risk analysis and decision making support. While there has been some research on temporal relations, the aspect of causality between events from an NLP perspective has hardly been touched, even though it has a long-standing tradition in psychology and formal linguistic fields. We propose an annotation scheme to cover different types of causality between events, techniques for extracting such relations and an investigation into the connection between temporal and causal relations. The latter will be the focus of this thesis work because causality clearly has a temporal constraint. We claim that injecting this precondition may be beneficial for the recognition of both temporal and causal relations.
TRANSCRIPT
Extracting Temporal and Causal Relations between Events
Paramita
Under the supervision of Sara Tonelli
10 December 2013
Overview
• Introduction to Event Extraction
• Event Relation Extraction
  – Problem Statements
  – State-of-the-Art
• Research Goals and Plan
  – Preliminary Result
Information Extraction
Typhoon Haiyan, one of the most powerful typhoons ever recorded slammed into the Philippines on Friday, setting off landslides, knocking out power in one entire province and cutting communications in the country's central region of island provinces.
What?  Typhoon Haiyan
Where? The Philippines
When?  Friday
Natural Language Text: unstructured
Knowledge Base: structured
“A thing that happens or takes place, especially one of importance”─ Oxford dictionary
A Philippine volcano, dormant for six centuries, exploded last Monday. During the eruption, lava, rocks and red-hot ash are spewed onto surrounding villages. The explosion claimed at least 30 lives.
Event Extraction
What is an event?
A Philippine volcano, dormant for six centuries, exploded last Monday. During the eruption, lava, rocks and red-hot ash are spewed onto surrounding villages. The explosion claimed at least 30 lives.
Annotation frameworks for events:
• TimeML
• ACE
event: “something that happens/occurs or a state that holds true”
Events and temporal expressions:• TimeML
• ACE
A Philippine volcano, dormant for six centuries, exploded last Monday. During the eruption, lava, rocks and red-hot ash are spewed onto surrounding villages. The explosion claimed at least 30 lives.
dormant
• arg-time: six centuries
exploded
• arg-time: last Monday
[Diagram: temporal links connect dormant with six centuries, exploded with last Monday, and dormant with exploded]
TempEval-3 (2013)
• Shared task on temporal and event processing
• Automatic identification of temporal expressions, events, and temporal relations within a text annotated with TimeML

Task                                                            F1       Precision  Recall
Task A – Temporal Expression                                    90.30%   93.09%     87.68%
Task B – Event Extraction                                       81.05%   81.44%     80.67%
Task ABC – Temporal Awareness                                   30.98%   34.08%     28.40%
Task C1 – Temporal Relations (identification + classification)  36.26%   37.32%     35.25%
Task C2 – Temporal Relations (only classification)              56.45%   55.58%     57.35%

Low performance on temporal relation extraction!
Overview
• Introduction to Automatic Event Extraction
• Event Relation Extraction
  – Problem Statements
  – State-of-the-Art
• Research Goals and Plan
  – Preliminary Result
The Relationship between Events
• Temporal Relations: e.g. BEFORE, IS_INCLUDED
• Causal Relations: e.g. CAUSE
Typhoon Haiyan struck the eastern Philippines on Friday, killing thousands of people.
Temporal Constraint of Causality: cause BEFORE effect
• Temporal relations: creating event timelines, multi-document summarization
• Causal relations: predicting future events, risk analysis, decision making support
Research Questions
“Given a text annotated with events and time expressions, how to automatically extract temporal relations and causal relations between them?”
“Given the temporal constraint of causality, how to utilize the interaction between temporal relations and causal relations for building an integrated extraction system for both types of relations?”
Temporal Relation Types: TimeML
• Based on Allen’s interval algebra (James F. Allen, 1983): a calculus for temporal reasoning, capturing 13 relations between two intervals

Allen’s Relation    TimeML Relation
X < Y , Y > X       X BEFORE Y , Y AFTER X
X m Y , Y mi X      X IBEFORE Y , Y IAFTER X
X o Y , Y oi X      X overlaps with Y (no TimeML equivalent)
X s Y , Y si X      X BEGINS Y , Y BEGUN_BY X
X d Y , Y di X      X DURING Y , Y DURING_INV X / X IS_INCLUDED Y , Y INCLUDES X
X f Y , Y fi X      X ENDS Y , Y ENDED_BY X
X = Y               X SIMULTANEOUS Y , X IDENTITY Y
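The interval mapping above can be sketched in code. The following is an illustrative fragment (not part of the proposal) that decides the TimeML-style relation between two intervals given as (start, end) pairs; the overlap case is returned as a plain "OVERLAP" label since TimeML has no equivalent relation.

```python
# Illustrative sketch: Allen-style relation between two intervals,
# expressed with the TimeML relation labels from the table above.
def allen_relation(x, y):
    """Return the relation of interval x = (start, end) w.r.t. interval y."""
    xs, xe = x
    ys, ye = y
    if xe < ys:
        return "BEFORE"                      # X < Y
    if ye < xs:
        return "AFTER"                       # Y < X
    if xe == ys:
        return "IBEFORE"                     # X meets Y
    if ye == xs:
        return "IAFTER"                      # Y meets X
    if (xs, xe) == (ys, ye):
        return "SIMULTANEOUS"                # X = Y
    if xs == ys:
        return "BEGINS" if xe < ye else "BEGUN_BY"
    if xe == ye:
        return "ENDS" if xs > ys else "ENDED_BY"
    if ys < xs and xe < ye:
        return "IS_INCLUDED"                 # X during Y
    if xs < ys and ye < xe:
        return "INCLUDES"                    # Y during X
    return "OVERLAP"                         # o/oi: no TimeML equivalent
```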
Expressing Temporal Order
• Temporal anchoring
  – John drove back home for 20 minutes.
• Explicit temporal connectives
  – John went shopping before he drove back home.
• Implicit (and ambiguous) temporal connectives
  – John arrived at home. He parked the car and saw his son waiting at the front door.
Temporal Relation Extraction
• Common approach dividing the task:
  – Identifying the pairs of entities having a temporal link
    • Often a simplified, rule-based approach:
      – Main events of consecutive sentences
      – Pairs of events in the same sentence
      – An event and a time expression in the same sentence
      – An event and the document creation time
  – Determining the relation types
    • Often regarded as a classification problem with a supervised learning approach:
      – Given an ordered pair of entities (e1, e2), the classifier has to assign a certain label (temporal relation type)
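The classification setup above can be illustrated with a minimal feature-extraction sketch; the attribute names below are hypothetical, not the proposal's exact feature set. The resulting dictionaries would then be one-hot vectorized and fed to a supervised classifier such as an SVM.

```python
# Illustrative sketch: features for an ordered entity pair (e1, e2),
# where each entity is a dict of (hypothetical) token-level attributes.
def pair_features(e1, e2):
    """Build a feature dict for temporal relation classification."""
    return {
        "e1_tense": e1.get("tense", "NONE"),
        "e2_tense": e2.get("tense", "NONE"),
        "same_tense": e1.get("tense") == e2.get("tense"),
        "e1_class": e1.get("class", "NONE"),
        "e2_class": e2.get("class", "NONE"),
        # Temporal signal between the entities, e.g. "before", "during"
        "signal": e2.get("signal", "NONE"),
        # Textual order and distance of the pair in the sentence
        "entity_order": e1["offset"] < e2["offset"],
        "token_distance": abs(e2["offset"] - e1["offset"]),
    }
```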
TempEval-3 (2013)
• Shared task on temporal and event processing
• Automatic identification of temporal expressions, events, and temporal relations within a text annotated with TimeML

Task                                                            F1       Precision  Recall
Task A – Temporal Expression                                    90.30%   93.09%     87.68%
Task B – Event Extraction                                       81.05%   81.44%     80.67%
Task ABC – Temporal Awareness                                   30.98%   34.08%     28.40%
Task C1 – Temporal Relations (identification + classification)  36.26%   37.32%     35.25%
Task C2 – Temporal Relations (only classification)              56.45%   55.58%     57.35%

Low performance on temporal relation extraction!
Modelling Causality
• Counterfactual Model (Lewis, 1973)
  – “C is the cause of E iff it holds true that if C had not occurred, E would not have occurred”
• Probabilistic Contrast Model (Cheng & Novick, 1991)
  – “C is the cause of E if the covariation ΔP = P(E|C) − P(E|¬C) is positive”, i.e. the probability of E in the presence of C exceeds the probability of E in the absence of C
• Dynamics Model (Wolff & Song, 2003)

           Patient tendency  Affector-patient  Occurrence
           for result        concordance       of result
  CAUSE    N                 N                 Y
  ENABLE   Y                 Y                 Y
  PREVENT  Y                 N                 N
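The probabilistic contrast model above reduces to simple arithmetic over co-occurrence counts. The following is a minimal sketch; the counts in the example are invented for illustration only.

```python
# Probabilistic contrast (Cheng & Novick): dP = P(E|C) - P(E|not C).
def delta_p(n_effect_with_cause, n_cause, n_effect_without_cause, n_no_cause):
    """Compute dP from counts: n_effect_with_cause of n_cause cause-present
    cases show the effect; n_effect_without_cause of n_no_cause cause-absent
    cases show the effect. A positive dP suggests C causes E."""
    return (n_effect_with_cause / n_cause
            - n_effect_without_cause / n_no_cause)

# Toy example (invented counts): effect follows the cause in 8 of 10
# cause-present cases, but occurs in only 1 of 10 cause-absent cases.
# delta_p(8, 10, 1, 10) gives a clearly positive contrast of 0.7.
```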
Causal Relations: Language Resources
• Penn Discourse Treebank (PDTB) 2.0
  – Focuses on encoding discourse relations
  – “It was approved when a test showed some positive results, officials said.” CONTINGENCY:Cause:reason
• PropBank
  – Annotates verbal propositions and their arguments
  – “Five countries remained on that so-called priority watch list [because of an interim review]ARGM-CAU.”
• SemEval 2007 Task 4 “Classification of Semantic Relations between Nominals”
  – Contains nominal causal relations as a subset
  – “The period of [tumor shrinkage]e1 after [radiation therapy]e2 is often long and varied.” Cause-Effect(e2,e1) = "true"
Causal Relations between Events: Language Resources (2)
• Bethard et al. (2008)
  – 1000 conjoined event pairs (with conjunctive and) are manually annotated with BEFORE, AFTER, CAUSE, or NO-REL relations
  – Build classification model using SVMs (697 training pairs)
  – Causal relation extraction evaluation: F-score 37.4%
• Do et al. (2011)
  – Detection of causality between verb-verb, verb-noun, and noun-noun triggered event pairs, using PMI (based on the probabilistic contrast model)
  – Causal relation extraction evaluation: F-score 46.9%
• Riaz & Girju (2013)
  – Identification of causal relations between verbal events (with conjunctives because and but, for causal and non-causal resp.)
  – Resulting in a knowledge base containing 3 classes of causal association: strongly causal, ambiguous, strongly non-causal
Causal Relation Extraction
• No standard benchmarking corpus for evaluating event causality extraction
• Causal relations in TimeML?
  – “The [rains]e1 [caused]e2 the [flooding]e3.”
  – IDENTITY (e1,e2), BEFORE (e1,e3)
Temporal and Causal: the Interaction
• Temporal constraint of causal relations: the cause happened BEFORE the effect
• Bethard et al. (2008), corpus analysis:
  – 32% of CAUSAL relations in the corpus did not have an underlying BEFORE relation
  – “The walls were shaking because of the earthquake."
• Rink et al. (2010) make use of temporal relations as a feature for a classification model of causal relations
  – Causal relation extraction evaluation: F-score 57.9%
Overview
• Introduction to Automatic Event Extraction
• Event Relation Extraction
  – Problem Statements
  – State-of-the-Art
• Research Goals and Plan
  – Preliminary Result
Research Objectives & Time Plan
1. Temporal Relation Extraction
   – Finding ways to improve the current state-of-the-art performance on temporal relation extraction: 1st year
2. Causal Relation Extraction
   – Creating a standard benchmarking corpus for evaluating causal relation extraction: 2nd year, 4 months
   – Building an automatic extraction system for event causality: 2nd year, 8 months
3. Integrated Event Relation Extraction
   – Utilizing the interaction between temporal and causal relations to build an integrated system for both relation types: 3rd year, 8 months
Temporal Relation Extraction Preliminary Result
• Temporal Relation Classification: “Given a pair of entities (event-event, event-timex or timex-timex*), the classifier has to assign a certain label (temporal relation type).”
  *) timex-timex pairs are so few in the dataset that they are not included
  – Supervised classification approach
  – Support Vector Machines (SVMs) algorithm
  – Feature engineering: event attributes, temporal signals, event duration, temporal connectives (disambiguation), etc.
  – Bootstrapping the training data: inverse relations and closure
• TempEval-3 task evaluation setup
*) Paper submitted to EACL 2014
System F-Score Precision Recall
TRelPro* 58.48% 58.80% 58.17%
UTTime 56.45% 55.58% 57.35%
NavyTime 46.83% 46.59% 47.07%
JU-CSE 34.77% 35.07% 34.48%
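The bootstrapping step above (inverse relations and closure) can be sketched as follows. This is a simplified fragment: the relation inventory is reduced to a few labels, and only the BEFORE transitivity rule of the full temporal closure is applied.

```python
# Sketch of training-data bootstrapping: expand a set of gold TLINKs
# with inverse relations and a fragment of the temporal closure.
INVERSE = {"BEFORE": "AFTER", "AFTER": "BEFORE",
           "INCLUDES": "IS_INCLUDED", "IS_INCLUDED": "INCLUDES",
           "SIMULTANEOUS": "SIMULTANEOUS"}

def bootstrap(tlinks):
    """tlinks: set of (e1, rel, e2) triples. Adds inverse relations and
    applies A BEFORE B, B BEFORE C => A BEFORE C until a fixpoint."""
    links = set(tlinks)
    # Inverse relations: every link is also stated in the other direction.
    links |= {(b, INVERSE[r], a) for (a, r, b) in links if r in INVERSE}
    changed = True
    while changed:
        changed = False
        before = {(a, b) for (a, r, b) in links if r == "BEFORE"}
        for (a, b) in before:
            for (b2, c) in before:
                if b == b2 and (a, "BEFORE", c) not in links:
                    links |= {(a, "BEFORE", c), (c, "AFTER", a)}
                    changed = True
    return links

links = bootstrap({("e1", "BEFORE", "e2"), ("e2", "BEFORE", "e3")})
# Closure derives the link ("e1", "BEFORE", "e3") and its inverse.
```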
Temporal Relation Extraction (2): Preliminary Result
• TempEval-3 test data annotated by TRelPro
Can it be improved by including causality as a feature?
Causal Relation Extraction
• Create an annotation format for causal relations based on TimeML, in order to have a unified annotation scheme for both temporal and causal relations
  – Take the same definitions of events and time expressions
  – Introduce CLINK tags, in addition to TimeML TLINK tags for temporal relations
• Map existing resources (e.g. PDTB, PropBank, SemEval-2007 Task 4 nominal causal corpus) to the newly created annotation scheme
• Build a causal relation extraction system
  – Consider a similar approach (and features) as for the temporal relation extraction system
  – New features relevant for causality extraction: causal signals/connectives, lexical information (WordNet, VerbOcean)
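One such new feature, the causal signal/connective occurring between two event mentions, can be sketched as below. The signal list here is an illustrative assumption, not the proposal's actual lexicon.

```python
# Hypothetical sketch of a causal-signal feature: find a causal
# connective in the token span between two event mentions.
CAUSAL_SIGNALS = {"because", "since", "as a result of", "due to",
                  "therefore", "consequently", "led to", "caused"}

def causal_signal_between(tokens, i1, i2):
    """Return the first causal signal found between token positions i1
    and i2 (the two events), or None; tokens is a list of lowercased
    words. Longer signals are tried first to prefer specific matches."""
    lo, hi = min(i1, i2), max(i1, i2)
    span = " ".join(tokens[lo + 1:hi])
    for sig in sorted(CAUSAL_SIGNALS, key=len, reverse=True):
        if sig in span:
            return sig
    return None

# Example: events "fell" (index 1) and "sat" (index 4) are linked by
# the connective "because".
tokens = "she fell because she sat on a broken chair".split()
```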
Expressing Causality
• Affect verbs (affect, influence, determine, change)
  – Age influences cancer spread in mice.
• Link verbs (linked to, led to, depends on)
  – The earthquake was linked to a tsunami in Japan.
• Causal conjunctives
  – She fell because she sat on a broken chair.
  – John drank a lot of coffee. Consequently, he stayed awake all night. (conjunctive adverb)
  – I will go around the world if I win the lottery. (conditional)
  – She stopped the car when she saw the runaway goose. (temporal)
  – Ralph broke the car and his father went ballistic. (coordinating)
• Causal prepositions
  – He likely died because of a heart attack.
  – She was tired from running around all day.
• Periphrastic causative verbs
  – The earthquake prompts people to stay out of buildings. (CAUSE)
  – The pole restrains the tent from collapsing. (PREVENT)
  – The oxygen lets the fire get bigger. (ENABLE)

These markers are ambiguous!
Integrated Temporal & Causal Relation System
[Diagram: system pipeline with the components Temporal Expression and Event Extraction, Temporal Relation Classification, Explicit Causal Relation Classification, and a combined Temporal & Causal Relation Classification module]
Thank you!
[Closing slide: “Paramita closes the presentation” is linked by CAUSE and BEGINS to “the question-answering session starts”.]
Expressing Causality: Implicit
• Lexical causatives
  – John broke the clock.
• Resultatives
  – John hammered the metal flat.
• Implicit
  – Max switched off the light. The room became pitch dark.