Annotation of Semantic Roles
TRANSCRIPT
Annotation of Semantic Roles
Paola Monachesi
In collaboration with Gerwert Stevens and Jantine Trapman
Utrecht University
Overview
• Semantic roles in the linguistic literature
• Annotation of semantic roles
– FrameNet
– PropBank
• Merging approaches
• Annotation in the D-coi project
• Automatic Semantic Role Labeling
• Hands-on: annotation of the 1984 English-Romanian corpus
Semantic Roles: a general introduction
• Semantic roles capture the relationship between a predicate and syntactic constituents
• Semantic roles assign meaning to syntactic constituents
• Linking theory: interaction between syntax and semantics
• How can roles be inferred from syntax?
Generic semantic roles: characteristics
• Fixed set
• Roles are atomic
• Each verbal argument is assigned only one role
• Roles are uniquely assigned
• Roles are non-relational
Fillmore 1968
• Nine roles: agent, experiencer, instrument, object, source, goal, location, time and path
• Direct relation between roles and grammatical functions
• Small set of roles not sufficient
• Frame semantics -> FrameNet
Jackendoff 1990
• Four roles: theme, source, goal, agent
• Meaning represented by conceptual structure based on conceptual constituents
• Relation between syntactic constituent and conceptual constituent
Dowty 1991
• Thematic roles as prototypical concepts
• Two proto-roles: Proto-Agent and Proto-Patient
• Each proto-role characterized by properties
• Flexible system
Levin 1993
• Syntactic frames reflect the semantics of verbs
• Verb classes based on syntactic frames which are meaning-preserving
• VerbNet (Kipper et al. 2000)
• PropBank (Palmer et al. 2005)
Verb specific roles
• Situation Semantics in HPSG (Pollard and Sag 1987)
• Frame semantics (Fillmore 1968)
• No fixed set of roles
• Role sets specific to:
– Verb
– Concept of a given verb
Semantic role assignment
Some emerging projects as basis:
• Proposition Bank (Kingsbury et al. 2002)
• FrameNet (Johnson et al. 2002)
PropBank
• Semantic layer of the Penn Treebank
• Goal: consistent argument labeling for automatic extraction of relational data
• Set of semantic roles related to the accompanying syntactic realizations
PropBank
Arg0: external argument (proto-Agent)
Arg1: internal argument (proto-Patient)
Arg2: indirect object / beneficiary / instrument / attribute / end state
Arg3: start point / beneficiary / instrument / attribute
Arg4: end point
ArgA: external causer
Additional tags (ArgMs)
• ArgM-TMP: temporal marker (when?)
• ArgM-LOC: location (where?)
• ArgM-DIR: direction (where to?)
• ArgM-MNR: manner (how?)
• Etc.
PropBank
• Frame files are developed on the basis of individual verbs
• All the possible roles are spelled out
• The frame file includes all the possible senses of a word
Frame files
Mary left the room
Mary left her daughter-in-law her pearls in her will

Frameset leave.01 "move away from":
Arg0: entity leaving
Arg1: place left

Frameset leave.02 "give":
Arg0: giver
Arg1: thing given
Arg2: beneficiary
PropBank Frame File Example
Roleset give.01 "transfer"
Roles:
Arg0: giver
Arg1: thing given
Arg2: entity given to
Example:
[The executives]Arg0 gave [the chefs]Arg2 [a standing ovation]Arg1
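Concretely, a roleset like the one above can be modelled as a small data structure. The sketch below is a hypothetical in-memory representation (the names `FRAME_FILES` and `describe` are invented for illustration; real PropBank frame files are XML):

```python
# Hypothetical in-memory representation of a PropBank frame file (sketch,
# not PropBank's actual XML schema): one entry per roleset of a lemma.
FRAME_FILES = {
    "give": {
        "give.01": {  # sense "transfer"
            "name": "transfer",
            "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
        },
    },
}

def describe(lemma: str, roleset: str, arg: str) -> str:
    """Look up the mnemonic description of a numbered argument."""
    return FRAME_FILES[lemma][roleset]["roles"][arg]

# The annotated example from the slide:
# [The executives]Arg0 gave [the chefs]Arg2 [a standing ovation]Arg1
print(describe("give", "give.01", "Arg0"))  # giver
```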
FrameNet
• http://framenet.icsi.berkeley.edu
• Lexicon-building project
• Corpus-based
• Words grouped into semantic classes which represent prototypical situations (frames)
FrameNet
• 8,900 lexical units
• 625 semantic frames
• 135,000 annotated sentences
FrameNet
Three components:
• Frame ontology
• Set of annotated sentences
• Set of lexical entries
FrameNet
• Lexical units
• Frame ontology
• Frame:
– Definition
– List of frame elements
– Set of lexical units (Frame Evoking Elements)
• Corpus of example sentences
FrameNet
Example: Leave evokes Departing.
Definition:
"An object (the Theme) moves away from a Source. The Source may be expressed or it may be understood from context, but its existence is always implied by the departing word itself."
FrameNet
Frame elements:
Source, Theme, Area, Depictive, Distance, Manner, Goal, etc.
Example sentence:
[Theme We all] left [Source the school] [Time at four o’clock].
FrameNet frame example
Frame: Giving
Lexical units: give.v, give_out.v, hand_in.v, hand.v, hand_out.v, hand_over.v, pass.v, ...
Frame elements:
Donor: the person that begins in possession of the Theme and causes it to be in the possession of the Recipient
Recipient: the entity that ends up in possession of the Theme
Theme: the object that changes ownership
Example:
[300 euro]Theme was given [to John]Recipient [by his mother]Donor
Comparing approaches
Differences in:
– Methodology
– Construction
– Structure
Comparing approaches
FrameNet[Buyer Chuck] bought [Goods a car] [Seller from Jerry] [Payment for $1000].
[Seller Jerry] sold [Goods a car] [Buyer to Chuck] [Payment for $1000].
PropBank[Arg0 Chuck] bought [Arg1 a car] [Arg2 from Jerry] [Arg3 for $1000].
[Arg0 Jerry] sold [Arg1 a car] [Arg2 to Chuck] [Arg3 for $1000].
FrameNet: methodology
• Frame-by-frame basis
• Choose a semantic frame
• Define the frame
• Define its participants (frame elements)
• List lexical predicates which invoke the frame
• Find relevant sentences in a corpus
• Annotate each frame element in the sentence
PropBank: methodology
• Examine relevant sentences from the corpus containing the verb under consideration
• Group verbs into major senses
• Semantic roles assigned on a verb-by-verb basis
• Frame files created on the basis of all possible senses of a predicate
• Attempt to label semantically related verbs consistently
• Less emphasis on the definition of the semantics of the class
• Creates the basis for training statistical systems
PropBank vs. FrameNet
• PB: classification based on word senses (corpus-driven)
• FN: classification based on semantic classes (concept-driven)
Comparing approaches
PropBank
– Word senses
– Shallow layering
– Restricted set of argument labels
– Reflecting syntactic relations

FrameNet
– Concepts
– Deep hierarchy
– Exhaustive list of frame elements
– Semantic roles
Semantic roles and NLP
• Semantic roles help to answer questions like "Who?", "When?", "What?", "Where?", "Why?", etc. in NLP applications.
• Semantic role labeling (SRL) is useful in a range of applications such as:
– Question Answering
– Machine translation
– Information extraction
• Projects have emerged in which corpora are annotated with semantic roles
Semantic roles and corpora
• Can the PB and FN methodologies be adopted for the annotation of corpora in different languages?
• What changes are necessary?
• FN: SALSA project (Erk and Pado 2004), Spanish FrameNet (Subirats and Petruck, 2003) and Japanese FrameNet (Ohara et al., 2004)
• PB: Arabic PB, Spanish PB
Dutch Corpus Initiative(D-coi)
• Pilot of 50 M words, written language
• September 2005 – December 2006
• Blueprint for a 500 MW corpus
– Schemes
– Protocols
– Procedures
– Testing adequacy & practicability
STEVIN
• Dutch-Flemish cooperation
• 2004 – 2009
• 8.5 M euro
• Goals:
– Realization of an adequate digital language infrastructure for Dutch
– Research within the area of LST
– Train new experts, exchange knowledge, stimulate demand
STEVIN
Priority list of needed facilities
– In Speech Technology:
• Speech and multimodal corpora
• Text corpora
• Tools and data
– In Language Technology:
• Dutch corpora
• Electronic lexica
• Aligned parallel corpora
D-coi
• Applications:
– Information extraction
– QA
– Document classification
– Automatic abstracting
– Linguistic research
D-coi
• Various annotation layers:
– PoS
– Lemmatization
– Syntax
– (Semantics)
Semantic annotation
• Current projects focus mainly on English
• Need for a Dutch scheme
• Role assignment, temporal and spatial annotation
• ±3,000 words
• Utrecht University: role assignment
Integration in D-coi
• Separate annotation levels
• One comprehensive scheme for semantic annotations
• Integration with other annotation layers
Several options
Option 1: Dutch FrameNet
+ Exploit SALSA results
- Construction of new frames necessary
- Not a very transparent annotation
- Difficult for annotators to use
Several options
Option 2: Dutch PropBank
+ Transparent annotation
+ At least semi-automatic
- No classification within a frame ontology
Several options
Option 3: Union of FrameNet and PropBank
• FrameNet – conceptual structure
• PropBank – role assignment
D-coi: semantic role assignment
Reconcile:
• The PropBank approach, which is corpus-based and syntactically driven
• The FrameNet approach, which is semantically driven and based on a network of relations between frames
• The necessity to make the annotation process automatic
• The necessity to have a transparent annotation for annotators and users
Questions
• Is it possible to merge FN frames with PB role labels (manual or semi-automatic)?
• To what extent can we use existing resources?
• Can we extend existing resources? Should we include language-specific features in the original source?
• Is it possible to extend the merged resources by exploiting the best features of both?
Three pilot studies
• The Communication frame• The Transitive_action frame• The adjunct middle in Dutch
The Communication frame
• Aims:
– Convert FN frames to a simpler form
– Make PB argument labels more uniform
• Assume Levin’s classes and diathesis alternations
• Construct one role set for verbs that share the same class
The Communication frame
• Test: Communication and daughter frames
• Example from Communication_noise (FrameNet → PropBank):
Speaker → Arg0: speaker, communicator
Message → Arg1: utterance
Addressee → Arg2: hearer
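A correspondence like the one above is essentially a lookup table. The sketch below encodes just the three Communication_noise pairings shown here; `FN_TO_PB` and `to_propbank` are illustrative names, not part of any released tool:

```python
# Hypothetical role mapping for the Communication_noise frame,
# taken from the slide: FrameNet frame element -> PropBank label.
FN_TO_PB = {
    "Speaker": "Arg0",
    "Message": "Arg1",
    "Addressee": "Arg2",
}

def to_propbank(frame_elements):
    """Relabel a list of (frame_element, text) pairs with PropBank labels."""
    return [(FN_TO_PB[fe], text) for fe, text in frame_elements]

print(to_propbank([("Speaker", "she"), ("Message", "hello")]))
# [('Arg0', 'she'), ('Arg1', 'hello')]
```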
The Transitive_action frame
• Definition: ”This frame characterizes, at a very abstract level, an Agent or Cause affecting a Patient.”
• More abstract, more challenging
• 29 daughter frames
• Five frames investigated
The Transitive_action frame
Example from Cause_harm (FrameNet → PropBank):
Agent → Arg0: agent, hitter (animate only!)
Victim, Body_part → Arg1: thing hit
Instrument → Arg2: instrument, thing hit by/with
Degree → Arg3: intensifier of action
The Transitive_action frame
• Classification sometimes not straightforward
• Role sets can be very specific
• Be careful not to create too general role sets

Specific:
Arg0: entity causing harm
Arg1: thing being harmed
Arg2: instrument
Arg3: pieces

Too general:
Arg0: V-er
Arg1: thing being V-ed
Arg2: instrument
Arg3: pieces
The adjunct middle
Object middle:
(1) De winkel verkocht zijn laatste roman helemaal niet.
‘The store didn’t sell his last novel at all.’
(2) Zijn laatste roman verkocht helemaal niet.
‘His last novel didn’t sell at all.’
Adjunct middle:
(3) Men zit lekker op deze stoel.
‘One sits comfortably on this chair.’
(4) Deze stoel zit lekker.
‘This chair sits comfortably.’
The adjunct middle
(1) Deze stoel zit lekker.‘This chair sits comfortably.’
(2) Deze zee vaart rustig.‘This sea sails peacefully.’
(3) Regenweer wandelt niet gezellig.‘Rainy weather does not walk pleasantly.’
Middles in FrameNet
a. [Goods Zijn laatste roman] verkocht helemaal niet (CNI: Seller).
‘His last novel didn’t sell at all.’
b. [Location Deze stoel] zit lekker (CNI: Agent).
‘This chair sits comfortably.’
c. [Area De zee] vaart rustig (CNI: Driver).
‘The sea sails peacefully.’
d. [Depictive? Regenweer] wandelt niet prettig (CNI: Self-mover).
‘Rainy weather does not walk pleasantly.’
Middles in PropBank
a. [Arg1 Zijn laatste roman] verkocht [ArgM-MNR helemaal niet].
‘His last novel didn’t sell at all.’
b. [? Deze stoel] zit [ArgM-MNR lekker].‘This chair sits comfortably.’
c. [? De zee] vaart [ArgM-MNR rustig].‘The sea sails peacefully.’
d. [? Regenweer] wandelt [ArgM-MNR niet prettig].‘Rainy weather does not walk pleasantly.’
Observations
• FrameNet: more specific role labels, semantically driven
• PropBank: less specific, syntactically driven
• Both approaches have their own problems
• Merging might provide a solution
• Language-specific problems need to be addressed
Omega
http://omega.isi.edu
• 120,000-node terminological ontology
• It includes:
– WordNet
– Mikrokosmos (conceptual resource)
• FrameNet and PropBank are included to assign frame information to each word sense of the predicate
• The link between the frames and the word senses is created manually, as is the alignment between FrameNet and PropBank
• Omega seems to align, while we merge
Concepts vs. word senses
[Diagram: in Omega, concepts (Mikrokosmos) are linked to word senses (WordNet), semantic frames (FrameNet) and word senses (PropBank).]
Alignment
• Linking of schemes
• Schemes stay separate modules
• Problems arise when a scheme is modified
Merging
• Implies alignment
• Integrates one scheme into another
• Integrates two schemes into a third, new scheme
How to proceed
• Omega can be used
• Possibility to use the link with WN and its Dutch equivalent to automatically translate the word senses
• The PB methodology can be employed to automatically assign roles to various predicates
Semantic annotation in D-coi: considerations
• Can existing methodologies be adopted?
– PropBank
– FrameNet
• Our choice: a combination of both (Monachesi and Trapman 2006)
• But for the time being: PropBank
Automatic SRL
• Manual annotation of a large corpus such as D-Coi is too expensive
• Is automatic semantic role labeling feasible?
Automatic SRL
• Classification algorithms
• Mapping between a set of features and a set of classes
• Two phases:
– Training phase
– Evaluation phase
Classification algorithms
• Probability estimation (Gildea and Jurafsky, 2002)
• Assignment of FrameNet roles
• 65% precision
• 61% recall
Classification algorithms
• Support Vector Machines (SVMs) (Vapnik, 1995)
• Binary classifiers: a problem for SRL, which is a multi-class task
• Lower classification speed
– Solution: filter out instances with a high probability of being null
Classification algorithms
• Memory-based learning (MBL) (Daelemans et al., 2004)
• Learning component: training examples stored in memory
• Performance component: similarity-based
MBL
• Instances are loaded in memory
• Instance: a vector of feature-value pairs plus a class assignment
• Unseen examples are compared with the training data
• A distance metric is used for comparison
• k-nearest neighbors algorithm
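The procedure above (store all training instances, compare unseen vectors with a distance metric, vote among the k nearest neighbors) can be sketched in a few lines. This is a toy illustration with invented feature values, not the actual TiMBL implementation:

```python
from collections import Counter

# Toy memory-based learner: instances are (feature_vector, class) pairs
# kept verbatim in memory; classification is a k-NN vote using a simple
# overlap distance (number of mismatching features).
MEMORY = [
    (("NP", "before_verb", "active"), "Arg0"),
    (("NP", "after_verb", "active"), "Arg1"),
    (("PP", "after_verb", "active"), "Arg2"),
]

def overlap_distance(a, b):
    """Count the positions where the two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(features, k=1):
    """Return the majority class among the k nearest stored instances."""
    neighbors = sorted(MEMORY, key=lambda inst: overlap_distance(features, inst[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(classify(("NP", "before_verb", "passive")))  # Arg0
```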
Automatic SRL
• Previous research on automatic SRL showed encouraging results
– Best published results for PropBank labeling of an English corpus: 84% precision, 75% recall and 79 F-score (Pradhan et al., 2005)
• Generally, machine learning methods are used, which require training data
Automatic SRL in a Dutch corpus
• Main problem:
– There is no Dutch annotated corpus available that can be used as training data
• Solution:
– Create new training data semi-automatically (bootstrapping) by using a rule-based tagger on unannotated data (dependency structures)
– Manually correct the output of the rule-based tagger
SRL approach
• Define a mapping between dependency structures and PropBank
• Implement the mapping in a rule-based automatic argument tagger
• Manually correct the tagger output
• Use the manually corrected corpus as input for a memory-based classifier (TiMBL)
Dependency structures
John geeft het boek aan Marie — “John gives the book to Marie”
[Dependency tree: SMAIN with daughters SU/name “John”, HD/verb “geeft”, OBJ1/NP “het boek”, and OBJ2/PP “aan Marie” (HD/prep “aan”, OBJ1/noun “Marie”)]
Augmenting dependency nodes with PropBank labels
John geeft het boek aan Marie — “John gives the book to Marie”
[Same tree, with PropBank labels added: SU “John” → Arg0, HD “geeft” → PRED, OBJ1 “het boek” → Arg1, OBJ2 “aan Marie” → Arg2]
Basic mapping
Dependency category → PropBank label
Head → Predicate
Modifier → ArgM-xxx
Complement → Arg0 … ArgN
Mapping numbered arguments: a mapping for subject and object complements

Dependency label → Thematic role → PropBank label
SU (subject) → Agent → Arg0
OBJ1 (direct object) → Patient → Arg1
OBJ2 (indirect object) → Instrument / Attribute → Arg2
“No consistent generalizations can be made across verbs for the higher numbered arguments” (Palmer et al. 2005)
Mapping numbered arguments: heuristically mapping higher numbered arguments
• Mapping complements to numbered arguments higher than Arg2 is difficult
• Complements that are candidate arguments are:
– PREDs (purpose clauses)
– VCs (verbal complements)
– MEs (complements indicating a quantity)
– PCs (prepositional complements)
• These complements are mapped to the first available numbered argument
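Putting the basic and heuristic mappings together, a minimal sketch of the labeling logic might look as follows (illustrative only; the real tagger, XARA, operates on Alpino dependency trees rather than flat label lists):

```python
# Fixed dependency-label -> PropBank mappings from the slides.
FIXED = {"su": "Arg0", "obj1": "Arg1", "obj2": "Arg2"}
# Complement types that are candidates for higher numbered arguments.
CANDIDATES = {"pred", "vc", "me", "pc"}

def label_arguments(deps):
    """deps: list of (dependency_label, text) pairs for one clause."""
    labels = []
    used = set()
    for rel, text in deps:
        if rel in FIXED:
            labels.append((FIXED[rel], text))
            used.add(FIXED[rel])
    for rel, text in deps:
        if rel in CANDIDATES:
            # Heuristic: first available numbered argument (from Arg1 upward).
            n = 1
            while f"Arg{n}" in used:
                n += 1
            labels.append((f"Arg{n}", text))
            used.add(f"Arg{n}")
    return labels

# "Ik denk aan je": SU "Ik" -> Arg0; PC "aan je" -> first free slot = Arg1
print(label_arguments([("su", "Ik"), ("pc", "aan je")]))
# [('Arg0', 'Ik'), ('Arg1', 'aan je')]
```

This reproduces the heuristic-mapping example on the next slide, where the prepositional complement receives Arg1 because Arg0 is already taken by the subject.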
Heuristic mapping example
Ik denk aan je — “I think of you”
[Dependency tree: SMAIN with SU/pron “Ik” → Arg0, HD/verb “denk” → PRED, and PC/PP “aan je” → Arg1, the first available numbered argument]
Mapping modifiers
PropBank label → Description → Corresponding dependency nodes
ArgM-LOC → Locative modifiers → nodes with dependency label LD
ArgM-PNC → Purpose clauses → “om te” clauses (c-label OTI)
ArgM-PRD → Predication markers → nodes with dependency label PREDM
ArgM-REC → Reciprocals → mezelf, zichzelf, etc.
ArgM-NEG → Negation markers → niet, nooit, geen, nergens
XARA overview
• The mapping is implemented in XARA: XML-based Automatic Role-labeler for Alpino Trees (Stevens 2006, 2007)
• XARA performs automatic annotation of XML files based on a set of rules
• The purpose of XARA is to create training data for a learning system
• XARA is written in Java
• Rule definitions are based on XPath queries
– Rules consist of an XPath expression and a target label
– XPath expressions are used to select nodes in an XML file
XARA annotation process
John geeft het boek aan Marie
[Dependency tree as before; each rule pairs an XPath expression with a target label:]
(./node[@rel='su'], Arg0)
(./node[@rel='obj1'], Arg1)
(./node[@rel='obj2'], Arg2)
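Rules of this shape, an XPath expression paired with a target label, can be mimicked with the Python standard library. The fragment below is a toy over a made-up Alpino-like tree, not XARA itself (XARA is written in Java):

```python
import xml.etree.ElementTree as ET

# Made-up Alpino-style dependency fragment for "John geeft het boek aan Marie".
XML = """
<node cat="smain">
  <node rel="su" word="John"/>
  <node rel="hd" word="geeft"/>
  <node rel="obj1" word="het boek"/>
  <node rel="obj2" word="aan Marie"/>
</node>
"""

# XARA-style rules: (XPath expression, target label).
RULES = [("./node[@rel='su']", "Arg0"),
         ("./node[@rel='obj1']", "Arg1"),
         ("./node[@rel='obj2']", "Arg2")]

root = ET.fromstring(XML)
for xpath, label in RULES:
    # ElementTree's findall supports this limited XPath subset,
    # including [@attr='value'] predicates.
    for node in root.findall(xpath):
        node.set("pb", label)
        print(node.get("word"), "->", label)
```

Running it annotates the three complement nodes with Arg0, Arg1 and Arg2, mirroring the rule applications shown on this slide.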
XARA’s reusability
• Rules are based on XPath expressions; as a result:
– XARA can be adapted to any XML-based treebank
– Creating rule definitions does not require programming skills
• XARA is not restricted to a specific set of role labels
Evaluation of XARA
Precision: 65.11%
Recall: 45.83%
F-score: 53.80%

• The relatively low recall score is due to the fact that XARA’s rules cover only a subset of the PropBank argument labels
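As a sanity check, the F-score is the harmonic mean of precision and recall; computing it from the two values above reproduces the reported figure (up to rounding of the inputs):

```python
def f_score(precision, recall, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall (beta = 1 here)."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

p, r = 0.6511, 0.4583  # XARA's precision and recall from the slide
print(round(100 * f_score(p, r), 1))  # 53.8, the reported F-score
```

The same formula applied to the Pradhan et al. figures quoted earlier (84% precision, 75% recall) gives approximately 79, matching that slide's F-score as well.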
Manual correction
• Sentences from a corpus annotated by XARA were manually corrected
• Correction was done in accordance with the PropBank guidelines
• The manually corrected corpus can be used to train a semantic role classifier
Consequences
• Adapt the PB guidelines to Dutch
• Extend the guidelines if needed
• A Dutch PB frame index?
Guidelines
• PB guidelines largely applicable to Dutch without problems (Trapman and Monachesi 2006)
• More linguistic research/background needed about the interpretation of modifiers
• Differences mainly caused by different tree structures:
– D-coi: dependency structure
– Penn Treebank: constituent structure
• Structural issue: traces
Traces
• General rule: traces do not get any label
– Passives:
[Arg1 Degene die sterft], wordt *trace* [Arg2 erflater] [PRED genoemd].
‘The one who dies is called the testator.’
– Conjunctions:
[Arg0 Jaap] [PRED leest] [Arg1 een boek] en [Arg0 Piet] *trace* [Arg1 een magazine].
‘Jaap reads a book and Piet a magazine.’
Traces
– Wh-questions:
[Arg1 Wat] kunt [Arg0 u] *trace* [PRED doen] [Arg2 om de luchtkwaliteit in uw woning te verbeteren]?
‘What can you do to improve the air quality in your home?’
– Relative clauses:
Daarnaast moet er regionaal extra aandacht komen voor [Arg0 kinderen] [Arg0 die] *trace* [Arg1 tot een risicogroep] [PRED behoren].
‘In addition, extra regional attention is needed for children who belong to a risk group.’
Annotation tools
• CLaRK:http://www.bultreebank.org/clark/index.html
• Salto:http://www.coli.uni-saarland.de/projects/salsa/
• TrEd:http://ufal.mff.cuni.cz/~pajas/tred/
Methodology
• Partly automatic annotation: Arg0, Arg1 and some modifiers
• Manual correction based on “Dutch” PB guidelines
– Check the automatic annotation
– Add the remaining labels
• Support from the PB frame files (English):
– Partial setup of a Dutch frame index
– Check the role set when uncertain about the argument structure
– Check the verb sense
![Page 88: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/88.jpg)
88
Result
• Semantic layer with labeled predicates, arguments and modifiers
• 2,088 sentences:
– 1,773 NL
– 315 VL
• 12,147 labels (NL):
– 3,066 PRED labels (= verbs)
– 5,271 arguments
– 3,810 modifiers
![Page 89: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/89.jpg)
89
Example
![Page 90: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/90.jpg)
90
Annotation problems
• Ellipsis:
Indien u toch mocht besluiten [naar *trace* en in Angola te reizen], wordt aangeraden ...
(“Should you nevertheless decide to travel [to and within Angola], it is advised ...”)
![Page 91: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/91.jpg)
91
Annotation problems
• Ellipsis:
De man komt dichterbij en *trace* zegt: ...
(“The man comes closer and says: ...”)
![Page 92: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/92.jpg)
92
Annotation problems
• Syntactic errors, e.g. wrong PP-attachment
• One annotator
• English frame files
![Page 93: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/93.jpg)
93
Automatic SRL classification
• Automatic SRL in earlier research is based on classification algorithms, e.g.:
– Support Vector Machines (SVMs)
– Decision Trees
– Maximum Entropy Models
– Memory-Based Learning (MBL) (Daelemans et al. 2004)
• In semantic role classification, text chunks are described by a set of features
– e.g. phrase type, POS tag
• Text chunks are assigned a semantic role based on their feature set
![Page 94: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/94.jpg)
94
Semantic role classification
• Classification is a two-step process:
– Training the classifier on training data
– Applying the trained classifier to unseen (test) data
• Previous research focused on English training data based on constituent structures
• This approach is based on dependency structures from a Dutch corpus (Stevens (2006), Stevens, Monachesi and van den Bosch (2007), Monachesi, Stevens and Trapman (2007))
![Page 95: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/95.jpg)
95
Classification approach
• Approach based on earlier research by van den Bosch et al.:
– Predicates are paired with candidate arguments
– (predicate features, argument features) pairs are called instances
– Instances are classified into a set of PropBank labels and “null” labels
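The pairing step above can be sketched as follows; the function name, dictionary keys and example sentence are illustrative assumptions, not taken from the D-coi implementation.

```python
# Hypothetical sketch of the instance-building step: each (predicate,
# candidate argument) pair becomes one feature vector ("instance") that
# a classifier later maps to a PropBank label or "null".

def build_instances(predicate, candidates):
    """Pair one predicate with each candidate argument node."""
    pred_feats = (predicate["root"], predicate["voice"])
    instances = []
    for arg in candidates:
        arg_feats = (arg["dlabel"], arg["head"], arg["position"])
        instances.append(pred_feats + arg_feats)
    return instances

predicate = {"root": "geef", "voice": "active"}
candidates = [
    {"dlabel": "SU",   "head": "John", "position": "before"},
    {"dlabel": "OBJ1", "head": "boek", "position": "after"},
]
print(build_instances(predicate, candidates))
# Each tuple would then be classified as Arg0, Arg1, ..., or "null".
```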
![Page 96: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/96.jpg)
96
TiMBL
• TiMBL (Tilburg Memory-Based Learner) is used for classification
– MBL is a descendant of the classical k-Nearest Neighbor (k-NN) approach
– Adapted to NLP applications by the ILK research group at Tilburg University
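A toy illustration of the memory-based idea (not TiMBL itself): training instances are stored verbatim, and a new instance receives the majority label of its k nearest neighbours under the overlap metric, i.e. the number of mismatching symbolic features. The training examples are invented.

```python
# Minimal k-NN with the overlap distance used in memory-based learning.
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions where two instances disagree."""
    return sum(x != y for x, y in zip(a, b))

def knn_classify(train, instance, k=1):
    """Label an instance by majority vote among its k nearest neighbours."""
    neighbours = sorted(train, key=lambda ex: overlap_distance(ex[0], instance))
    return Counter(label for _, label in neighbours[:k]).most_common(1)[0][0]

train = [
    (("SU", "name", "before"), "Arg0"),
    (("OBJ1", "NP", "after"), "Arg1"),
    (("OBJ2", "PP", "after"), "Arg2"),
]
print(knn_classify(train, ("SU", "name", "before")))  # -> Arg0
```

Real TiMBL additionally weights features (e.g. by information gain), which this sketch omits.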
![Page 97: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/97.jpg)
97
Features used
Predicate features:
1. Predicate’s root form
2. Predicate’s voice (active/passive)
Argument features:
3. Argument’s part-of-speech tag
4. Argument’s c-label
5. Argument’s d-label
6. Argument’s position (before/after predicate)
7. Argument’s relation head word
8. Head word POS tag
9. c-label pattern of argument
10. d-label pattern of argument
11. c-/d-label combined
![Page 98: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/98.jpg)
98
An example instance
Dependency tree for “John geeft het boek aan Marie” (“John gives the book to Marie”), root node SMAIN:
– SU (name): John → Arg0
– HD (verb): geeft → PRED
– OBJ1 (NP): het boek → Arg1
– OBJ2 (PP): aan Marie → Arg2

The resulting instance:
geef,active,#,SU,name,before,John,verb,name*verb*NP*PP,SU*HD*OBJ1*OBJ2,SU*name,Arg0
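The instance line can be decoded by splitting on commas and pairing each field with a feature name; the field names below and their mapping to fields are assumptions based on the ordering of the “Features used” slide.

```python
# Sketch: split a comma-separated instance record into named features
# plus its PropBank label. Feature names are illustrative.
FEATURES = [
    "pred_root", "pred_voice",                       # predicate features
    "arg_pos", "arg_clabel", "arg_dlabel",           # argument features
    "arg_position", "arg_head", "head_pos",
    "clabel_pattern", "dlabel_pattern", "cd_combined",
]

def parse_instance(line):
    """Return ({feature: value}, label) for one instance line."""
    *fields, label = line.split(",")
    return dict(zip(FEATURES, fields)), label

line = ("geef,active,#,SU,name,before,John,verb,"
        "name*verb*NP*PP,SU*HD*OBJ1*OBJ2,SU*name,Arg0")
features, label = parse_instance(line)
print(label)                   # -> Arg0
print(features["pred_root"])   # -> geef
```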
![Page 99: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/99.jpg)
99
Training procedure
• TiMBL with default parameters; parameter optimization to prevent overfitting
• Relatively little training data was available:
– 12,113 instances extracted from 2,395 sentences
– 3,066 verbs, 5,271 arguments, 3,810 modifiers
• Leave-One-Out (LOO) method to overcome the data sparsity problem
– Every data item is in turn selected once as a test item; the classifier is trained on the remaining items
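A minimal sketch of the LOO procedure, with a trivial 1-nearest-neighbour classifier standing in for TiMBL and invented toy instances:

```python
# Leave-One-Out: classify every item once, training on all other items.

def overlap(a, b):
    """Number of mismatching symbolic features."""
    return sum(x != y for x, y in zip(a, b))

def loo_accuracy(data):
    correct = 0
    for i, (feats, gold) in enumerate(data):
        rest = data[:i] + data[i + 1:]   # train on all other items
        predicted = min(rest, key=lambda ex: overlap(ex[0], feats))[1]
        correct += predicted == gold
    return correct / len(data)

data = [
    (("SU", "before"), "Arg0"), (("SU", "before"), "Arg0"),
    (("OBJ1", "after"), "Arg1"), (("OBJ1", "after"), "Arg1"),
]
print(loo_accuracy(data))  # -> 1.0
```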
![Page 100: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/100.jpg)
100
Evaluation measures
• Measures commonly used in information extraction:
– Precision: proportion of instances labeled with a non-null label that were labeled correctly
– Recall: proportion of instances correctly labeled with a non-null label, out of all non-null instances
– F-Score: harmonic mean of precision and recall: 2·precision·recall / (precision+recall)
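These definitions can be computed directly; a minimal sketch over toy (gold, predicted) label pairs, with "null" marking instances that received no role. The example data is invented.

```python
# Precision, recall and F-score exactly as defined above.

def evaluate(pairs, null="null"):
    predicted_nonnull = [(g, p) for g, p in pairs if p != null]
    correct = sum(g == p for g, p in predicted_nonnull)
    gold_nonnull = sum(g != null for g, _ in pairs)
    precision = correct / len(predicted_nonnull)
    recall = correct / gold_nonnull
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

pairs = [("Arg0", "Arg0"), ("Arg1", "Arg2"), ("Arg1", "null"), ("null", "Arg0")]
p, r, f = evaluate(pairs)
print(round(p, 2), round(r, 2), round(f, 2))  # -> 0.33 0.33 0.33
```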
![Page 101: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/101.jpg)
101
Evaluation of the TiMBL classifier
| Precision | Recall | F-Score |
| --- | --- | --- |
| 70.27% | 70.59% | 70.43% |
![Page 102: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/102.jpg)
102

Per-label evaluation:

| Label | Precision | Recall | F(β=1) |
| --- | --- | --- | --- |
| Arg0 | 90.44% | 86.82% | 88.59 |
| Arg1 | 87.80% | 84.63% | 86.18 |
| Arg2 | 63.34% | 59.10% | 61.15 |
| Arg3 | 21.21% | 19.18% | 20.14 |
| Arg4 | 54.05% | 54.05% | 54.05 |
| ArgM-ADV | 54.98% | 51.85% | 53.37 |
| ArgM-CAU | 47.24% | 43.26% | 45.16 |
| ArgM-DIR | 36.36% | 33.33% | 34.78 |
| ArgM-DIS | 74.27% | 70.71% | 72.45 |
| ArgM-EXT | 29.89% | 28.57% | 29.21 |
| ArgM-LOC | 57.95% | 54.53% | 56.19 |
| ArgM-MNR | 52.07% | 47.57% | 49.72 |
| ArgM-NEG | 68.00% | 65.38% | 66.67 |
| ArgM-PNC | 68.61% | 64.83% | 66.67 |
| ArgM-PRD | 45.45% | 40.63% | 42.90 |
| ArgM-REC | 86.15% | 84.85% | 85.50 |
| ArgM-TMP | 55.95% | 53.29% | 54.58 |
![Page 103: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/103.jpg)
103
Comparison with CoNLL-05 systems

• CoNLL = Conference on Computational Natural Language Learning
• Shared task: “competition” between automatic PropBank role labeling systems
• CoNLL shared task 2005:
– Best performing system reached an F-Score of 80
– Seven systems reached an F-Score in the 75-78 range, seven more in the 70-75 range
– Five systems reached an F-Score between 65 and 70
• Dependency structure (Hacioglu, 2004): F-Score 84.6
![Page 104: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/104.jpg)
104
Future work
• Further work is needed to improve performance:
– Larger training corpus
– Improvements to the feature set
– Optimization of algorithmic parameters
– Experimentation with different learning algorithms (e.g. SVMs)
![Page 105: Annotation of Semantic Roles](https://reader030.vdocuments.us/reader030/viewer/2022021009/6203b1a1da24ad121e4c6091/html5/thumbnails/105.jpg)
105
Conclusions
• Dependency structures prove to be a valuable resource both for rule-based and for learning systems
• Automatic SRL in a Dutch corpus is feasible given the currently available resources
• The current system shows encouraging results; many improvements are still possible
• Adapting PB guidelines to Dutch is not problematic
• Follow-up project: SONAR, a 500-million-word corpus, 1 million words semantically annotated