
1

Semantic Role Labeling CS 224N, Spring 2008

Dan Jurafsky

Slides mainly from a tutorial from Scott Wen-tau Yih and Kristina Toutanova (Microsoft Research) with additional slides from Sameer Pradhan (BBN) as well as Chris Manning and myself

2

Syntactic Variations versus Semantic Roles

Yesterday, Kristina hit Scott with a baseball

Scott was hit by Kristina yesterday with a baseball

Yesterday, Scott was hit with a baseball by Kristina

With a baseball, Kristina hit Scott yesterday

Yesterday Scott was hit by Kristina with a baseball

The baseball with which Kristina hit Scott yesterday was hard

Kristina hit Scott with a baseball yesterday

In every variant the roles stay the same: Kristina = Agent (hitter), a baseball = Instrument, Scott = Patient (thing hit), yesterday = temporal adjunct.

3

Syntactic Variations (as trees)

[Parse trees for “Kristina hit Scott with a baseball yesterday” and “With a baseball, Kristina hit Scott yesterday”, showing that the same arguments appear under different syntactic configurations (NP, VP, PP attachments).]

4

Semantic Role Labeling – Giving Semantic Labels to Phrases

• [AGENT John] broke [THEME the window]
• [THEME The window] broke
• [AGENT Sotheby’s] .. offered [RECIPIENT the Dorrance heirs] [THEME a money-back guarantee]
• [AGENT Sotheby’s] offered [THEME a money-back guarantee] to [RECIPIENT the Dorrance heirs]
• [THEME a money-back guarantee] offered by [AGENT Sotheby’s]
• [RECIPIENT the Dorrance heirs] will [ARGM-NEG not] be offered [THEME a money-back guarantee]

5

What is SRL good for? Question Answering

Q: What was the name of the first computer system that defeated Kasparov?

A: [PATIENT Kasparov] was defeated by [AGENT Deep Blue] [TIME in 1997].

Q: When was Napoleon defeated? Look for: [PATIENT Napoleon] [PRED defeat-synset] [ARGM-TMP *ANS*]

More generally:

6

What is SRL good for? Applications as a simple meaning rep’n

• Machine Translation

  English (SVO)             Farsi (SOV)
  [AGENT The little boy]    [AGENT pesar koocholo]   boy-little
  [PRED kicked]             [THEME toop germezi]     ball-red
  [THEME the red ball]      [ARGM-MNR moqtam]        hard-adverb
  [ARGM-MNR hard]           [PRED zaad-e]            hit-past

• Document Summarization
  • Predicates and heads of roles summarize content

• Information Extraction
  • SRL can be used to construct useful rules for IE


7

Application: Semantically precise search

Query: afghans destroying opium poppies

8

Some typical semantic roles

9

Some typical semantic roles

10

Diathesis alternations

11

Problems with those semantic roles

• It’s very hard to produce a formal definition of a role
• There are all sorts of arbitrary role splits, e.g. intermediary instruments (1-2) vs. enabling instruments (3-4):
  1. The cook opened the jar with the new gadget
  2. The new gadget opened the jar
  3. Sally ate the sliced banana with a fork
  4. *The fork ate the sliced banana

12

Solutions to the difficulty of defining semantic roles

• Ignore semantic role labels, and just mark arguments of individual verbs as 0, 1, 2 → PropBank
• Define semantic role labels for a particular semantic domain → FrameNet


13

Proposition Bank (PropBank) [Palmer et al. 05]

• Transfer sentences to propositions: Kristina hit Scott → hit(Kristina, Scott)
• Penn TreeBank → PropBank: add a semantic layer on the Penn TreeBank
• Define a set of semantic roles for each verb; each verb’s roles are numbered

… [A0 the company] to … offer [A1 a 15% to 20% stake] [A2 to the public]
… [A0 Sotheby’s] … offered [A2 the Dorrance heirs] [A1 a money-back guarantee]
… [A1 an amendment] offered [A0 by Rep. Peter DeFazio] …
… [A2 Subcontractors] will be offered [A1 a settlement] …

14

PropBank
• A corpus of labeled sentences
• The arguments of each verb are labeled with numbers rather than names

15

Application of PropBank labels

16

Proposition Bank (PropBank) Define the Set of Semantic Roles

• It’s difficult to define a general set of semantic roles for all types of predicates (verbs).
• PropBank defines semantic roles for each verb and sense in the frame files.
• The (core) arguments are labeled by numbers: A0 – Agent; A1 – Patient or Theme; other arguments – no consistent generalizations.
• Adjunct-like arguments are universal to all verbs: AM-LOC, TMP, EXT, CAU, DIR, PNC, ADV, MNR, NEG, MOD, DIS.

17

Proposition Bank (PropBank) Frame Files

Proposition: a sentence and a target verb.

• hit.01 “strike”  A0: agent, hitter; A1: thing hit; A2: instrument, thing hit by or with
  [A0 Kristina] hit [A1 Scott] [A2 with a baseball] [AM-TMP yesterday].   (AM-TMP: Time)
• look.02 “seeming”  A0: seemer; A1: seemed like; A2: seemed to
  [A0 It] looked [A2 to her] like [A1 he deserved this].
• deserve.01 “deserve”  A0: deserving entity; A1: thing deserved; A2: in-exchange-for
  It looked to her like [A0 he] deserved [A1 this].
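To make the frame-file idea concrete, here is a minimal sketch of how such rolesets might be represented and used in code. The dictionary layout, the describe function, and the example proposition are illustrative assumptions for this handout, not PropBank’s actual frame-file (XML) format or any real API.

# Illustrative sketch only: a Python stand-in for PropBank-style rolesets.
# The real frame files are XML; this dict layout is an assumption for exposition.

ROLESETS = {
    "hit.01": {   # "strike"
        "A0": "agent, hitter",
        "A1": "thing hit",
        "A2": "instrument, thing hit by or with",
    },
    "look.02": {  # "seeming"
        "A0": "seemer",
        "A1": "seemed like",
        "A2": "seemed to",
    },
}

def describe(proposition):
    """Expand the numbered arguments of one proposition into role descriptions."""
    roles = ROLESETS[proposition["roleset"]]
    for label, phrase in proposition["args"]:
        # Adjunct labels such as AM-TMP are shared across all verbs,
        # so they are not listed in the verb-specific roleset.
        meaning = roles.get(label, "adjunct (" + label + ")")
        print(f"{label:8s} {phrase!r:20s} -> {meaning}")

describe({
    "roleset": "hit.01",
    "args": [("A0", "Kristina"), ("A1", "Scott"),
             ("A2", "with a baseball"), ("AM-TMP", "yesterday")],
})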

18

Proposition Bank (PropBank) Add a Semantic Layer

[Parse tree for “Kristina hit Scott with a baseball yesterday”, with the constituent for Kristina labeled A0, Scott labeled A1, with a baseball labeled A2, and yesterday labeled AM-TMP.]

[A0 Kristina] hit [A1 Scott] [A2 with a baseball] [AM-TMP yesterday].


19

Proposition Bank (PropBank) Add a Semantic Layer – Continued

[Parse tree for “The worst thing about him,” said Kristina, “is his laziness.”, illustrating a discontinuous argument: the quoted material forms a split A1 whose continuation is labeled C-A1.]

[A1 The worst thing about him] said [A0 Kristina] [C-A1 is his laziness].

20

Proposition Bank (PropBank) Final Notes

• Current release (Mar 4, 2005): Proposition Bank I
  • Verb Lexicon: 3,324 frame files
  • Annotation: ~113,000 propositions
  http://www.cis.upenn.edu/~mpalmer/project_pages/ACE.htm
• Alternative format: CoNLL-04, 05 shared task
  • Represented in table format
  • Has been used as the standard data set for the shared tasks on semantic role labeling
  http://www.lsi.upc.es/~srlconll/soft.html

21

Other Corpora

• Chinese PropBank http://www.cis.upenn.edu/~chinese/cpb/
  • Similar to PropBank, it adds a semantic layer onto the Chinese Treebank
• NomBank http://nlp.cs.nyu.edu/meyers/NomBank.html
  • Labels arguments that co-occur with nouns in PropBank
  • [A0 Her] [REL gift] of [A1 a book] [A2 to John]

22

1.  lie(“he”,…)

2.  leak(“he”, “information obtained from … he supervised”)

3.  obtain(X, “information”, “from a wiretap he supervised”)

4.  supervise(“he”, “a wiretap”)

23

Problem with PropBank Labels

• PropBank has no nouns
• NomBank adds nouns, but
• We’d like to also get similar meanings in cases using: increase (verb), rose (verb), rise (noun)

24

FrameNet [Fillmore et al. 01]

Frame: Hit_target (hit, pick off, shoot)

Lexical units (LUs): words that evoke the frame (usually verbs).
Frame elements (FEs): the involved semantic roles, divided into core and non-core: Agent, Target, Instrument, Manner, Means, Place, Purpose, Subregion, Time.

[Agent Kristina] hit [Target Scott] [Instrument with a baseball] [Time yesterday].
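As a small illustration only, the frame above could be represented in code roughly like this. The dataclass fields and the flat set of frame elements are assumptions made for readability; FrameNet’s actual data is distributed as richly structured XML and records the core/non-core split explicitly.

# Illustrative sketch only: field names and the flat FE set are assumptions,
# not FrameNet's actual data model (which also records core vs. non-core FEs).

from dataclasses import dataclass

@dataclass
class Frame:
    name: str
    lexical_units: set    # words that evoke the frame (LUs)
    frame_elements: set   # the involved semantic roles (FEs)

HIT_TARGET = Frame(
    name="Hit_target",
    lexical_units={"hit", "pick off", "shoot"},
    frame_elements={"Agent", "Target", "Instrument", "Manner", "Means",
                    "Place", "Purpose", "Subregion", "Time"},
)

annotation = [("Agent", "Kristina"), ("Target", "Scott"),
              ("Instrument", "with a baseball"), ("Time", "yesterday")]
assert all(fe in HIT_TARGET.frame_elements for fe, _ in annotation)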


25

FrameNet

  A frame is a semantic structure based on a set of participants and events

  Consider the “change_position_on_scale” frame

26

Roles in this frame

27

Examples from this frame

28

Problems with FrameNet

• Example sentences are chosen by hand, not randomly selected
• Complete sentences are not labeled
• Since the TreeBank wasn’t used, there are no perfect parses for each sentence
• Still ongoing (that’s good and bad)

29

Some History

• Fillmore 1968: The case for case
  • Proposed semantic roles as a shallow semantic representation
• Simmons 1973:
  • Built the first automatic semantic role labeler
  • Based on first parsing the sentence

30

Methodology for FrameNet

While (remaining funding > 0) do:
  1. Define a frame (e.g. DRIVING)
  2. Find some sentences for that frame
  3. Annotate them

• Corpora
  • FrameNet I – British National Corpus only
  • FrameNet II – LDC North American Newswire corpora
• Size
  • >8,900 lexical units, >625 frames, >135,000 sentences

http://framenet.icsi.berkeley.edu


31

FrameNet vs PropBank -1

32

FrameNet vs PropBank -2

33

Information Extraction versus Semantic Role Labeling

Characteristic                       IE          SRL
Coverage                             narrow      broad
Depth of semantics                   shallow     shallow
Directly connected to application    sometimes   no

34

Overview of SRL Systems

• Definition of the SRL task
• Evaluation measures
• General system architectures
• Machine learning models
• Features & models
• Performance gains from different techniques

35

Subtasks

• Identification: separate the argument substrings from the rest of the exponentially sized set of substrings. A very hard task: usually only 1 to 9 (avg. 2.7) substrings have an ARG label for a predicate, and the rest have NONE.
• Classification: given the set of substrings that have an ARG label, decide the exact semantic label.
• Core argument semantic role labeling (easier): label phrases with core argument labels only; the modifier arguments are assumed to have label NONE.

36

Evaluation Measures

Correct: [A0 The queen] broke [A1 the window] [AM-TMP yesterday]
Guess:   [A0 The queen] broke the [A1 window] [AM-LOC yesterday]

Gold labeling:     {The queen} → A0, {the window} → A1, {yesterday} → AM-TMP, all other → NONE
Guessed labeling:  {The queen} → A0, {window} → A1, {yesterday} → AM-LOC, all other → NONE

• Precision, Recall, F-measure on the full task: {tp=1, fp=2, fn=2}, so p = r = f = 1/3
• Measures for subtasks:
  • Identification (Precision, Recall, F-measure): {tp=2, fp=1, fn=1}, so p = r = f = 2/3
  • Classification (Accuracy): acc = .5 (labeling of correctly identified phrases)
  • Core arguments (Precision, Recall, F-measure): {tp=1, fp=1, fn=1}, so p = r = f = 1/2

(A small sketch reproducing these numbers follows.)
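To make the arithmetic concrete, here is a minimal sketch in plain Python that reproduces the full-task and identification numbers from the example above. Representing each argument as a (phrase, label) pair is a simplification; a real scorer compares token spans.

# A minimal sketch of the evaluation in the example above. Arguments are
# represented as (phrase, label) pairs; span indices would be used in practice.

def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold  = {("The queen", "A0"), ("the window", "A1"), ("yesterday", "AM-TMP")}
guess = {("The queen", "A0"), ("window", "A1"), ("yesterday", "AM-LOC")}

# Full task: a guess counts only if both the span and the label match.
tp = len(gold & guess)
print("full task P/R/F =", prf(tp, len(guess) - tp, len(gold) - tp))      # 1/3 each

# Identification only: labels are ignored, only the spans must match.
gold_spans, guess_spans = {s for s, _ in gold}, {s for s, _ in guess}
tp_id = len(gold_spans & guess_spans)
print("identification P/R/F =", prf(tp_id, len(guess_spans) - tp_id,
                                     len(gold_spans) - tp_id))            # 2/3 each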


37

What’s the problem with these evaluations?

• Approximating human evaluations is dangerous
  • Humans don’t always agree
  • Not clear if it’s good for anything
  • Sometimes called the “match-a-linguist” task
• What’s a better evaluation?

38

Basic Architecture of a Generic SRL System

[Pipeline diagram: (sentence s, predicate p) → annotations (adding features) → (s, p, A) → local scoring, score(l | c, s, p, A) → joint scoring → semantic roles.]

• Local scores for phrase labels do not depend on the labels of other phrases
• Joint scores take into account dependencies among the labels of multiple phrases

39

SRL architecture: Walk the tree, labeling each parse tree node

• Given a parse tree t, label the nodes (phrases) in the tree with semantic labels

[Parse tree for “She broke the expensive vase” (PRP VBD DT JJ NN), with the NP “She” labeled A0 and non-argument nodes labeled NONE.]

Alternative approach: labeling chunked sentences, e.g.
[NP Yesterday], [NP Kristina] [VP hit] [NP Scott] [PP with] [NP a baseball].

40

Why this parse-tree architecture?

• Semantic role chunks tend to correspond to syntactic constituents
• PropBank:
  • 96% of arguments correspond to exactly one (gold) parse tree constituent
  • 90% of arguments correspond to exactly one (Charniak) parse tree constituent
  • Simple rules can recover the missing 4-10%
• FrameNet: 87% of arguments correspond to exactly one (Collins) parse tree constituent
• Why? They were labeled from parse trees by humans trained in syntax

41

Parsing Algorithm

• Use a syntactic parser to parse the sentence
• For each predicate (non-copula verb):
  • For each node in the syntax tree:
    • Extract a feature vector relative to the predicate
    • Classify the node
• Do a second pass informed by global information
(A skeleton of this per-node loop follows below.)

Slide from Sameer Pradhan
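Below is a skeleton of the per-node loop just described, using NLTK parse trees. The extract_features and classifier objects are hypothetical stand-ins for the feature extractor and the trained local model; they are not part of NLTK or of any specific SRL system.

# Skeleton of the per-node labeling loop, assuming NLTK-style trees.
# `extract_features` and `classifier` are hypothetical stand-ins.

from nltk.tree import Tree

def label_nodes(tree: Tree, predicate_index: int, extract_features, classifier):
    """Assign a role label (or NONE) to every constituent for one predicate."""
    labels = []
    for node in tree.subtrees():
        feats = extract_features(tree, node, predicate_index)  # path, position, voice, ...
        labels.append((node, classifier.predict(feats)))       # e.g. "A0", "AM-TMP", "NONE"
    return labels

sentence = Tree.fromstring(
    "(S (NP (PRP She)) (VP (VBD broke) (NP (DT the) (JJ expensive) (NN vase))))")
# label_nodes(sentence, predicate_index=1, extract_features=..., classifier=...)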

42
Slide from Sameer Pradhan

Baseline Features [Gildea & Jurafsky, 2000]

• Predicate (verb)
• Path from constituent to predicate, e.g. NP↑S↓VP↓VBD (a sketch of computing this feature follows below)
• Position (before/after the predicate)
• Phrase type (syntactic)
• Sub-categorization, e.g. VP → VBD-PP
• Head word
• Voice (active/passive)
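As an illustration of the path feature, here is a small sketch that computes it from an NLTK tree, where the constituent and the predicate are identified by their tree positions (tuples of child indices). This is a sketch under those assumptions, not code from Gildea & Jurafsky.

# A sketch of the classic path feature (e.g. NP↑S↓VP↓VBD), computed from an
# NLTK tree. Positions are tuples of child indices into the tree.

from nltk.tree import Tree

def path_feature(tree: Tree, constituent_pos, predicate_pos):
    """Category path from a candidate constituent up to the lowest common
    ancestor and back down to the predicate's preterminal node."""
    # Longest common prefix of the two positions = lowest common ancestor.
    lca = 0
    while (lca < min(len(constituent_pos), len(predicate_pos))
           and constituent_pos[lca] == predicate_pos[lca]):
        lca += 1
    up = [tree[constituent_pos[:i]].label()
          for i in range(len(constituent_pos), lca - 1, -1)]
    down = [tree[predicate_pos[:i]].label()
            for i in range(lca + 1, len(predicate_pos) + 1)]
    return "↑".join(up) + "↓" + "↓".join(down)

t = Tree.fromstring("(S (NP (NNP Kristina)) (VP (VBD hit) (NP (NNP Scott))))")
print(path_feature(t, constituent_pos=(0,), predicate_pos=(1, 0)))  # NP↑S↓VP↓VBD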


43 Slide from Sameer Pradhan

Pradhan et al. (2004) Features

• Predicate cluster
• Noun head and POS of PP constituent
• Verb sense
• Partial path
• Named entities in constituent (7) [Surdeanu et al., 2003]
• Head word POS [Surdeanu et al., 2003]
• First and last word in constituent and their POS
• Parent and sibling features
• Constituent tree distance
• Ordinal constituent position
• Temporal cue words in constituent
• Previous 2 classifications

44 Slide from Sameer Pradhan

Predicate cluster, automatic or WordNet (e.g. spoke, lectured, chatted, explained fall in one cluster)

45 Sameer Pradhan

Noun Head and POS of PP (e.g. a PP constituent headed by “for” is represented as PP-for)

46 Sameer Pradhan

Partial Path

47 Sameer Pradhan

Named Entities and Head Word POS [Surdeanu et al., 2003]

(e.g. pronouns such as she, it, they have head-word POS PRP; phrases such as half an hour or 60 seconds carry the named-entity tag TIME)

48 Sameer Pradhan

First and Last Word and POS


49 Sameer Pradhan

Parent and Sibling features

(features of the constituent’s parent and left sibling in the parse tree)

50 Sameer Pradhan

Constituent tree distance


51 Sameer Pradhan

Ordinal constituent position

(e.g. the first NP to the right of the predicate has ordinal position 1)

52 Sameer Pradhan

Temporal Cue Words (~50)

e.g. time, recently, days, end, period, years, ago, night, hour, decade, late

53 Sameer Pradhan

Previous 2 classifications

54

Combining Identification and Classification Models

[The same pipeline diagram as before, illustrated on parse trees for “She broke the expensive vase” (PRP VBD DT JJ NN).]

Step 1. Pruning: a hand-specified filter removes unlikely candidate nodes.
Step 2. Identification: an identification model filters out candidates with a high probability of NONE.
Step 3. Classification: a classification model assigns one of the argument labels to the selected nodes (or sometimes possibly NONE), e.g. “She” → A0 and “the expensive vase” → A1.

(A sketch of this three-stage pipeline follows.)
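A minimal sketch of the three stages above, assuming an NLTK-style tree with a subtrees() method. The prune, id_model, and class_model objects and the threshold theta are hypothetical stand-ins; real systems differ in how the pruning filter and the two models are defined.

# Sketch only: prune, id_model, class_model, and theta are illustrative stand-ins.

def srl_three_stage(tree, predicate, prune, id_model, class_model, theta=0.9):
    # Step 1. Pruning: a hand-specified filter keeps plausible candidate nodes.
    candidates = [n for n in tree.subtrees() if prune(n, predicate)]
    # Step 2. Identification: drop nodes whose probability of NONE is high.
    candidates = [n for n in candidates
                  if id_model.prob_none(tree, n, predicate) < theta]
    # Step 3. Classification: assign an argument label to the surviving nodes.
    return [(n, class_model.predict(tree, n, predicate)) for n in candidates]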


55

Combining Identification and Classification Models – Continued

Alternatively, in one step: simultaneously identify and classify, assigning each node of the tree for “She broke the expensive vase” either an argument label (A0, A1, …) or NONE in a single pass.

56

Joint Scoring Models

• These models have scores for a whole labeling of a tree (not just individual labels)
• Encode some dependencies among the labels of different nodes

[Example parse tree for “Yesterday, Kristina hit Scott hard”, with candidate constituent labels A0, A1, AM-TMP, and NONE shown; a joint model scores the entire label assignment rather than each label in isolation.]

57

Combining Local and Joint Scoring Models

• Tight integration of local and joint scoring in a single probabilistic model with exact search [Cohn & Blunsom 05], [Màrquez et al. 05], [Thompson et al. 03]
  • When the joint model makes strong independence assumptions
• Re-ranking or approximate search to find the labeling which maximizes a combination of the local and the joint score [Gildea & Jurafsky 02], [Pradhan et al. 04], [Toutanova et al. 05] (a re-ranking sketch follows below)
  • Usually exponential search is required to find the exact maximizer
• Exact search for the best assignment by the local model satisfying hard joint constraints, using Integer Linear Programming [Punyakanok et al. 04, 05] (worst case NP-hard)
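Here is a sketch of the re-ranking strategy from the second bullet: take the top-k labelings under the local model and pick the one that maximizes a combination of local and joint scores. local_model, joint_model, and the simple weighted combination with alpha are illustrative assumptions, not the formulation of any particular paper.

# Sketch only: local_model, joint_model, k, and alpha are illustrative stand-ins.

def rerank(tree, predicate, local_model, joint_model, k=20, alpha=0.5):
    # Top-k complete labelings with their local log-scores: [(labeling, local_score), ...]
    candidates = local_model.topk_labelings(tree, predicate, k)

    def combined(candidate):
        labeling, local_score = candidate
        return alpha * local_score + (1 - alpha) * joint_model.score(tree, predicate, labeling)

    return max(candidates, key=combined)[0]   # return the best labeling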

58

Joint Scoring: Enforcing Hard Constraints

• Constraint 1: argument phrases do not overlap (a greedy sketch follows below)
  By [A1 working [A1 hard], he] said, you can achieve a lot.
  • Pradhan et al. (04) – greedy search for a best set of non-overlapping arguments
  • Toutanova et al. (05) – exact search for the best set of non-overlapping arguments (dynamic programming, linear in the size of the tree)
  • Punyakanok et al. (05) – exact search for the best non-overlapping arguments using integer linear programming
• Other constraints [Punyakanok et al. 04, 05]:
  • no repeated core arguments (a good heuristic)
  • phrases do not overlap the predicate
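The following is a minimal sketch of the greedy non-overlap strategy, in the spirit of Pradhan et al. (04) but not their exact procedure: keep the highest-scoring candidates first and drop any later candidate whose span overlaps one already kept. Candidates here are (start, end, label, score) tuples over token indices with exclusive end; that encoding is an assumption made for the example.

# Sketch only: greedy selection of non-overlapping arguments by score.

def overlaps(a, b):
    # spans are (start, end, ...) with exclusive end
    return a[0] < b[1] and b[0] < a[1]

def greedy_non_overlapping(candidates):
    chosen = []
    for cand in sorted(candidates, key=lambda c: c[3], reverse=True):
        if not any(overlaps(cand, kept) for kept in chosen):
            chosen.append(cand)
    return sorted(chosen)

cands = [
    (0, 4, "A1", 0.7),   # wide span
    (1, 2, "A1", 0.4),   # nested span, overlaps the one above -> dropped
    (5, 6, "A0", 0.9),
]
print(greedy_non_overlapping(cands))  # [(0, 4, 'A1', 0.7), (5, 6, 'A0', 0.9)]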


59

Joint Scoring: Integrating Soft Preferences

• There are many statistical tendencies in the sequence of roles and their syntactic realizations
  • When both are before the verb, AM-TMP is usually before A0
  • Usually there aren’t multiple temporal modifiers
  • Many others, which can be learned automatically

[Example parse tree for “Yesterday, Kristina hit Scott hard”, with constituent labels A0, A1, and AM-TMP shown.]

60

Joint Scoring: Integrating Soft Preferences

• Gildea and Jurafsky (02) – a smoothed relative frequency estimate of the probability of frame element multi-sets (a sketch follows below)
  • Gains relative to the local model: 59.2 → 62.9 (FrameNet, automatic parses)
• Pradhan et al. (04) – a language model on argument label sequences (with the predicate included)
  • Small gains relative to the local model for a baseline system: 88.0 → 88.9 on core arguments (PropBank, correct parses)
• Toutanova et al. (05) – a joint model based on CRFs with a rich set of joint features of the sequence of labeled arguments
  • Gains relative to the local model on PropBank correct parses: 88.4 → 91.2 (24% error reduction); gains on automatic parses: 78.2 → 80.0
• Tree CRFs [Cohn & Blunsom] have also been used
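For the first bullet, here is a small sketch of a smoothed relative-frequency estimate of the probability of a role multiset given a predicate, in the spirit of Gildea and Jurafsky (02). The linear interpolation with a global back-off distribution and the weight lam are illustrative assumptions, not the paper’s exact estimator.

# Sketch only: smoothed relative frequency of role multisets per predicate.

from collections import Counter

def multiset_key(labels):
    return tuple(sorted(labels))            # e.g. ('A0', 'A1', 'AM-TMP')

def train(propositions):
    """propositions: iterable of (predicate, [role labels])."""
    by_pred, overall = {}, Counter()
    for pred, labels in propositions:
        key = multiset_key(labels)
        by_pred.setdefault(pred, Counter())[key] += 1
        overall[key] += 1
    return by_pred, overall

def prob(pred, labels, by_pred, overall, lam=0.5):
    key = multiset_key(labels)
    counts = by_pred.get(pred, Counter())
    p_pred = counts[key] / sum(counts.values()) if counts else 0.0
    p_all = overall[key] / sum(overall.values()) if overall else 0.0
    return lam * p_pred + (1 - lam) * p_all   # interpolate with a global back-off

by_pred, overall = train([
    ("hit", ["A0", "A1", "A2"]), ("hit", ["A0", "A1"]), ("break", ["A1"]),
])
print(prob("hit", ["A0", "A1"], by_pred, overall))   # 0.5*0.5 + 0.5*(1/3) ≈ 0.42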



61

Semantic roles: joint models boost results [Toutanova et al. 2005]
Accuracies of local and joint models on core arguments

[Bar chart of f-measure for the best previous result (Xue and Palmer 04), the local model, and the joint model on the identification, classification, and integrated tasks; the plotted values include 90.6, 91.8, 94.8, 95.0, 95.1, 95.7, 96.1, and 97.6.]

Error reduction from the best published result: 44.6% on Integrated, 52% on Classification.

62

System Properties

• Features
  • Most modern systems use the standard set of Gildea, Pradhan, and Surdeanu features listed above
  • Lots of features are important for building a good system
• Learning methods
  • SNoW, MaxEnt, AdaBoost, SVM, CRFs, etc.
  • The choice of learning algorithm is less important

63

System Properties – Continued

• Syntactic information
  • Charniak’s parser, Collins’ parser, clauser, chunker, etc.
  • Top systems use Charniak’s parser or some mixture
  • The quality of the syntactic information is important
• System/information combination
  • Greedy, re-ranking, stacking, ILP inference
  • Combining systems or sources of syntactic information is a good strategy to reduce the influence of incorrect syntactic information

64

Per Argument Performance: CoNLL-05 Results on WSJ-Test
(data from Carreras & Màrquez’s slides, CoNLL 2005)

Core Arguments (Freq. ~70%)
  Arg   Best F1   Freq.
  A0    88.31     25.58%
  A1    79.91     35.36%
  A2    70.26      8.26%
  A3    65.26      1.39%
  A4    77.25      1.09%

Adjuncts (Freq. ~30%)
  Arg   Best F1   Freq.
  TMP   78.21      6.86%
  ADV   59.73      3.46%
  DIS   80.45      2.05%
  MNR   59.22      2.67%
  LOC   60.99      2.48%
  MOD   98.47      3.83%
  CAU   64.62      0.50%
  NEG   98.91      1.36%

Arguments that need to be improved.

65

Summary

• Semantic role labeling
  • An important attempt at shallow semantic extraction
  • Relatively successful in terms of approximating
    • human FrameNet labels
    • human PropBank labels
• Are these good for anything?
  • We don’t know yet