Multilingual Event Extraction and Semi-automatic Acquisition of Related Resources
Hristo Tanev
Joint Research Centre
Ispra, Italy
NEXUS: News Event eXtraction Using language Structures
Event Extraction
• Event extraction was introduced as a language processing task at MUC-2 in 1989.
• An event is something that happens; an event description is a template that describes an event.
• The goal of automatic event extraction is to fill an event description template automatically from a text or a set of texts.
• An event description usually includes:
  • Event type
  • Time and place of the event
  • Participating entities, which have specific roles that depend on the event type, e.g. perpetrator, victim, instrument
  • Cause
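The template described above can be pictured as a small data structure. The following Python sketch is only an illustration: the class name `EventDescription`, its field names, and the `add_participant` helper are assumptions made for this example, not part of NEXUS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventDescription:
    """Hypothetical event template; field names are illustrative only."""
    event_type: str
    time: Optional[str] = None
    place: Optional[str] = None
    # Maps a role name (e.g. "perpetrator", "victim", "instrument")
    # to the entities that fill it.
    participants: Dict[str, List[str]] = field(default_factory=dict)
    cause: Optional[str] = None

    def add_participant(self, role: str, entity: str) -> None:
        """Attach an entity to the given role, creating the role if needed."""
        self.participants.setdefault(role, []).append(entity)


# Filling a template by hand; an extraction system would do this from text.
event = EventDescription(event_type="terrorist attack",
                         time="18 June 2008",
                         place="Baghdad, Iraq")
event.add_participant("instrument", "car bomb")
```

Which roles are available depends on the event type, which is why the roles are kept in a dictionary rather than as fixed fields.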
Event Extraction in the Context of EMM
• The purpose of automatic event extraction from online news is to facilitate the crisis-management efforts of the European Commission and other related political institutions.
• NEXUS detects security-related events and disasters.
• NEXUS monitors online news in near real time in English, French, Spanish, Italian, Russian, Portuguese, and Arabic (after automatic translation into English).
• Medical NEXUS detects news about disease outbreaks in English (soon to be deployed in French).
EMM Event Extraction from Online News
News cluster:

Car bomb kills 50 in Iraq
HindustanTimes, Wednesday, June 18, 2008 5:07:00 AM CEST
A car bomb blast in northern Baghdad left more than 50 people dead and 80 wounded on Tuesday, a police source said…

Biggest blast in months leaves at least 50 dead in Iraq
reliefWeb, Wednesday, June 18, 2008 5:05:00 AM CEST
A car bomb blast in northern Baghdad, the largest in months, left more than 50 people dead and 80 wounded on Tuesday, a police source said...
EMM Event Extraction from Online News
Event Description
• Date: 18 June 2008
• Place: Baghdad, Iraq
• Event type: terrorist attack
• Number killed: 50
• Number wounded: 80
• Number kidnapped: 0
• Perpetrators: not reported
• Weapons: car bomb
NEXUS
EMM Event Extraction Architecture
[Architecture diagram. Components: News; Entity Match, Geo-Tagging, Clustering; Text Processing (NER, Parsing, Pattern Matching); Information Aggregation; Visualization; Events]
Partial Parsing
Example of a multilingual rule that recognizes noun phrases (NPs) such as "a French volunteer and an Italian military":

coordination_rule :> (
    person_group & [NAME:#name1, AMOUNT:"1" #amount1]
    (token & [SURFACE: ","]? person_group & [NAME:#name2, AMOUNT:"1" #amount2])?
    (token & [SURFACE: ","]? person_group & [NAME:#name3, AMOUNT:"1" #amount3])?
    conjunction
    person_group & [NAME:#name4, AMOUNT:"1" #amount4]):c
c: person_group & [NAME:#final, AMOUNT:#amount, NUMBER:"p"]
    & #final := ConcForSum(#name1,#name2,#name3,#name4)
    & #amount := ConcForSum(#amount1,#amount2,#amount3,#amount4).
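The effect of the coordination rule can be sketched in Python. The slide does not define `ConcForSum`; this sketch assumes it joins the names and adds up the amounts. `PersonGroup` and `coordinate` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonGroup:
    """Hypothetical stand-in for the person_group structure in the rule."""
    name: str
    amount: int
    number: str = "s"  # "s" = singular, "p" = plural


def coordinate(groups: List[PersonGroup]) -> PersonGroup:
    """Merge two to four single-person groups into one plural group.

    Mirrors the rule: every input must have AMOUNT "1"; the output has
    NUMBER "p", the joined names, and the summed amounts (assumed
    behaviour of ConcForSum).
    """
    if not 2 <= len(groups) <= 4:
        raise ValueError("the rule covers between two and four person groups")
    if any(g.amount != 1 for g in groups):
        raise ValueError("the rule only matches groups with AMOUNT 1")
    return PersonGroup(
        name=" and ".join(g.name for g in groups),
        amount=sum(g.amount for g in groups),
        number="p",
    )


merged = coordinate([PersonGroup("French volunteer", 1),
                     PersonGroup("Italian military", 1)])
```

The rule itself works on surface tokens and already-recognized person groups; the sketch starts from the recognized groups and shows only the merging step.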
Annotating Participating Entities
• This is one of the most important tasks: labelling person groups and other phrases with event-specific semantic roles, e.g. Perpetrator, Dead victim, Displaced people, Weapons used.
• Linear patterns work well for English.
• We also use linear patterns for Russian.
• We use more elaborate event extraction grammars for Arabic, Italian, French, Spanish and Portuguese.
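To make the idea of a linear pattern concrete, here is a minimal Python sketch that labels person groups with roles by surface matching. The three patterns, the role names, and the tiny person-group expression are invented for this example; they are not taken from the NEXUS pattern set.

```python
import re
from typing import List, Tuple

# Very small, assumed stand-in for a person-group recognizer.
PERSON_GROUP = (r"(?P<pg>(?:\d+|one|two|three|four|five)\s+"
                r"(?:people|persons|civilians|soldiers|men|women))")

# Each hypothetical linear pattern pairs a surface template with a role.
PATTERNS = [
    (re.compile(PERSON_GROUP + r"\s+(?:were|was)\s+killed", re.I), "dead_victim"),
    (re.compile(PERSON_GROUP + r"\s+(?:were|was)\s+(?:wounded|injured)", re.I), "wounded"),
    (re.compile(r"left\s+(?:more than\s+|at least\s+)?" + PERSON_GROUP + r"\s+dead", re.I),
     "dead_victim"),
]


def annotate(text: str) -> List[Tuple[str, str]]:
    """Return (role, person group) pairs in order of appearance in the text."""
    hits = []
    for pattern, role in PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), role, match.group("pg")))
    hits.sort()
    return [(role, group) for _, role, group in hits]


roles = annotate("A car bomb blast left more than 50 people dead, "
                 "and 80 people were wounded.")
```

Because each pattern is a flat sequence of words and slots, it can be read and validated by a person, which fits the manual validation step mentioned later in the slides.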
Event-specific Grammars
Rule: <person-group> [introduce-passive] Verb[baseform: rimanere]? Adv? Verb[sem: injured-obj, passive-voice] <person-group> : injured

Matching examples:
• Cinque persone sono state ferite (Five people were injured)
• Cinque persone sono state gravemente ferite (Five people were seriously injured)
• Cinque persone sono rimaste ferite (Five people were left injured)

For details see [Zavarella et al., Event Extraction for Italian Using a Cascade of Finite State Grammars, FSMNLP 2008]
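The rule can be approximated with a regular expression to show how one grammar rule covers all three Italian sentences. This is a rough Python sketch, not the finite-state grammar from the cited paper: the word lists below are small assumed lexicons, and the real system works on linguistic features (base form, semantics, voice) rather than surface strings.

```python
import re
from typing import Optional

# Assumed miniature lexicons, for illustration only.
NUMBER = r"(?:\d+|due|tre|quattro|cinque|sei|sette|otto|nove|dieci)"
PERSON_NOUN = r"(?:persone|civili|soldati|uomini|donne)"
PERSON_GROUP = rf"(?P<group>{NUMBER}\s+{PERSON_NOUN})"
PASSIVE_AUX = r"(?:sono\s+stat[ie]|sono)"   # plays the role of [introduce-passive]
RIMANERE = r"(?:rimast[ie])"                # Verb[baseform: rimanere]
ADVERB = r"(?:gravemente|lievemente)"       # Adv
INJURED = r"(?:ferit[ie])"                  # Verb[sem: injured-obj, passive-voice]

INJURED_RULE = re.compile(
    rf"{PERSON_GROUP}\s+{PASSIVE_AUX}(?:\s+{RIMANERE})?(?:\s+{ADVERB})?\s+{INJURED}",
    re.IGNORECASE,
)


def extract_injured(sentence: str) -> Optional[str]:
    """Return the person group labelled 'injured', or None if the rule fails."""
    match = INJURED_RULE.search(sentence)
    return match.group("group") if match else None


examples = [
    "Cinque persone sono state ferite",
    "Cinque persone sono state gravemente ferite",
    "Cinque persone sono rimaste ferite",
]
results = [extract_injured(s) for s in examples]
```

The optional elements in the rule (the `rimanere` verb and the adverb) are what let a single rule replace several separate linear patterns.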
Multilingual Lexical Acquisition
• Automatic learning of language-specific lexical resources
• Weakly supervised statistical approaches that make use of large quantities of unannotated news
• We learn patterns, keywords and keyphrases, which can be manually validated, rather than statistical models such as SVMs
• Pattern learning
• Learning domain-specific lexica
• Learning semantic classes
Linear Pattern Learning
• For English we use the linear patterns as the algorithm learns them.
• We learned more than 3000 linear patterns for English.
• For Italian and other languages, linear patterns are a starting point for grammar development.
Learning Semantic Classes
• Sometimes it is necessary to learn specific semantic classes, e.g. vehicles, disasters, weapons, facilities.
• We built a statistical system for automatic acquisition of semantic classes.
• The system is language-independent; the only language-specific resource it uses is a list of stop words.
Ontopopulis
INPUT:
feelings: hatred, love, fear, sadness
contrasting classes: taste, (style, outlook), character, thoughts
Extracting New Terms
Newly learnt terms are ranked and then given to the user for evaluation.
Top 20 terms from the category feelings, in rank order:
grief, sorrow, sadness, condolences, fear, disappointment, regret, sympathy, shock, hatred, gratitude, frustration, anger, deep sorrow, profound, dismay, condolence, satisfaction, profound grief, deep grief
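The slides do not describe how Ontopopulis scores candidate terms. The Python sketch below is therefore not the Ontopopulis algorithm; it only illustrates one simple way a system could use seed terms, contrasting classes, and a stop-word list to rank new terms from unannotated text. All names (`STOP_WORDS`, `context_features`, `rank_terms`) and the toy corpus are assumptions made for this example.

```python
from collections import Counter
from typing import List, Set

STOP_WORDS = {"the", "a", "of", "and", "his", "her", "with"}  # assumed toy list


def context_features(tokens: List[str], i: int) -> Set[str]:
    """Left and right neighbours of tokens[i], ignoring stop words."""
    feats = set()
    if i > 0 and tokens[i - 1] not in STOP_WORDS:
        feats.add("L:" + tokens[i - 1])
    if i + 1 < len(tokens) and tokens[i + 1] not in STOP_WORDS:
        feats.add("R:" + tokens[i + 1])
    return feats


def rank_terms(corpus: List[str], seeds: Set[str], contrast: Set[str]) -> List[str]:
    """Rank unseen terms by how strongly their contexts resemble seed contexts."""
    # Step 1: score each context feature, +1 per seed term, -1 per contrasting term.
    feature_score = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in seeds:
                for feat in context_features(tokens, i):
                    feature_score[feat] += 1
            elif token in contrast:
                for feat in context_features(tokens, i):
                    feature_score[feat] -= 1
    # Step 2: score candidate terms by the positive features around them.
    term_score = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in seeds or token in contrast or token in STOP_WORDS:
                continue
            for feat in context_features(tokens, i):
                if feature_score[feat] > 0:
                    term_score[token] += feature_score[feat]
    return [term for term, _ in term_score.most_common()]


corpus = [
    "she expressed sadness today",
    "he expressed fear today",
    "they expressed grief today",
    "critics praised his style today",
]
ranked = rank_terms(corpus, seeds={"hatred", "love", "fear", "sadness"},
                    contrast={"taste", "style", "outlook", "character", "thoughts"})
```

In this toy corpus, "grief" is ranked first because it shares the context "expressed ... today" with the seed terms, while the contrasting term "style" lowers the weight of contexts that are not specific to feelings.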
Using Learnt Semantic Classes for Event Extraction
• We use Ontopopulis to learn terms, which we then add to our domain-specific dictionaries.
• Some rules that require a domain-specific dictionary:
  • Rules for parsing person-reference noun phrases, such as "two engineers"
  • Rules that detect weapons used: killed with a [WEAPON] (killed with a gun)
  • Detection of vehicles used: [PEOPLE] in a [VEHICLE] were stopped (three men in a boat were stopped)
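The way a learnt dictionary plugs into such rules can be sketched in Python. The dictionaries, `compile_rule`, and `match_rule` below are hypothetical; the sketch assumes that each bracketed slot is simply expanded into the terms of the corresponding dictionary.

```python
import re
from typing import Dict

# Assumed toy dictionaries; in practice these would hold validated learnt terms.
DICTIONARIES = {
    "WEAPON": ["gun", "knife", "car bomb"],
    "VEHICLE": ["boat", "car", "truck"],
    "PEOPLE": ["three men", "two engineers"],
}


def compile_rule(template: str) -> "re.Pattern[str]":
    """Turn a template such as 'killed with a [WEAPON]' into a regex.

    Each [SLOT] becomes a named group that matches any term in
    DICTIONARIES[SLOT]; longer terms are tried first.
    """
    parts = re.split(r"\[([A-Z]+)\]", template)  # literal, slot, literal, ...
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            terms = sorted(DICTIONARIES[part], key=len, reverse=True)
            regex += "(?P<%s>%s)" % (part, "|".join(re.escape(t) for t in terms))
    return re.compile(regex, re.IGNORECASE)


def match_rule(template: str, text: str) -> Dict[str, str]:
    """Return the slot fillers found in the text, or an empty dict."""
    match = compile_rule(template).search(text)
    return match.groupdict() if match else {}


weapon = match_rule("killed with a [WEAPON]", "He was killed with a gun")
vehicle = match_rule("[PEOPLE] in a [VEHICLE] were stopped",
                     "Three men in a boat were stopped")
```

Adding a newly learnt term to a dictionary extends every rule that uses that slot, without touching the rules themselves.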
NEXUS Evaluation for English
| Detection Task | Accuracy |
|---|---|
| Dead counting | 70% |
| Injured counting | 57% |
| Event classification | 80% |
| Geo-tagging (country) | 90% |
| Geo-tagging (place name) | 61% |
NEXUS Multilingual Evaluation
| F1 measure | Dead | Wounded | Kidnapped | Arrested |
|---|---|---|---|---|
| Italian | 0.87 | 0.62 | - | 0.67 |
| Portuguese | 0.69 | 0.51 | 0.67 | 0.47 |
Evaluation of Ontopopulis
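| Accuracy (%) top 20 | Person | Weapon | Politician | Vehicle | Watercraft | Edged weapon | Crime | Building |
|---|---|---|---|---|---|---|---|---|
| Portuguese | 90 | 60 | 75 | 85 | 70 | 20 | 85 | 75 |
| Spanish | 95 | 60 | - | - | - | - | - | - |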