![Page 1: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/1.jpg)
NYU: Description of the Proteus/PET System as Used for MUC-7 ST
Roman Yangarber & Ralph Grishman
Presented by Jinying Chen
10/04/2002
![Page 2: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/2.jpg)
Outline
• Introduction
• Proteus IE System
• PET User Interface
• Performance on the Launch Scenario
![Page 3: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/3.jpg)
Introduction
• Problem: portability and customization of IE engines at the scenario level
• To address this problem
– NYU built a set of tools that allow the user to adapt the system to new scenarios rapidly through example-based learning
– The present system operates on two tiers: Proteus & PET
![Page 4: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/4.jpg)
Introduction (Cont.)
• Proteus
– Core extraction engine, an enhanced version of the one employed at MUC-6
• PET
– GUI front end, through which the user interacts with Proteus
– The user provides the system with examples of events in text, and examples of the associated database entries to be created
![Page 5: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/5.jpg)
Proteus IE System
• Modular design
– Control is encapsulated in immutable, domain-independent core components
– Domain-specific information resides in the knowledge bases
![Page 6: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/6.jpg)
![Page 7: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/7.jpg)
Proteus IE System (Cont.)
• Lexical analysis module
– Assigns each token a reading or a list of alternative readings by consulting a set of on-line dictionaries
• Name Recognition
– Identifies proper names in the text by using local contextual cues
![Page 8: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/8.jpg)
Proteus IE System (Cont.)
• Partial Syntax
– Finds small syntactic units, such as basic NPs and VPs
– Marks each phrase with semantic information, e.g. the semantic class of the head of the phrase
• Scenario Patterns
– Find higher-level syntactic constructions using local semantic information: apposition, prepositional phrase attachment, limited conjunctions, and clausal constructions
![Page 9: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/9.jpg)
Proteus IE System (Cont.)
• Note:
– The above three modules are pattern-matching phases; they operate by deterministic, bottom-up, partial parsing or pattern matching
– The output is a sequence of logical forms (LFs) corresponding to the entities, relationships, and events encountered in the analysis
![Page 10: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/10.jpg)
Figure 2: LF for the NP: “a satellite built by Loral Corp. of New York for Intelsat”
![Page 11: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/11.jpg)
Proteus IE System (Cont.)
• Reference Resolution (RefRes)
– Links anaphoric pronouns to their antecedents and merges other co-referring expressions
• Discourse Analysis
– Uses higher-level inference rules to build more complex event structures
– E.g. a rule that merges a Mission entity with a corresponding Launch event
• Output Generation
![Page 12: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/12.jpg)
PET User Interface
• A disciplined method of customization of knowledge bases, and the pattern base in particular
• Organization of Patterns
– The pattern base is organized in layers
– Proteus treats the patterns at the different levels differently
– Acquires the most specific patterns directly from the user, on a per-scenario basis
![Page 13: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/13.jpg)
[Figure: layers of the pattern base, from most specific to most general]
• Clausal patterns that capture events (scenario-specific)
• Patterns that find relationships among entities, such as between persons and organizations
• Patterns that perform partial syntactic analysis
• Most general patterns, which capture the most basic constructs, such as proper names, temporal expressions, etc.
Figure labels: Domain-dependent; Domain-independent; Core part of System; User; Pattern Lib
![Page 14: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/14.jpg)
PET User Interface (Cont.)
• Pattern Acquisition
– Enter an example
– Choose an event template
– Apply existing patterns (step 3)
– Tune pattern elements (step 4)
– Fill event slots (step 5)
– Build pattern
– Syntactic generalization
![Page 15: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/15.jpg)
[Figure: Step 3, Step 4, and Step 5 of pattern acquisition]
![Page 16: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/16.jpg)
Performance on the Launch Scenario
• Scenario Patterns
– Basically two types: launch events and mission events
– In cases where there is no direct connection between these two events, the post-processing inference rules attempt to tie the mission to a launch event
• Inference Rules
– Involve many-to-many relations (e.g. multiple payloads correspond to a single event)
– The inference rule set is extended with heuristics, e.g. to find the date and site
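The idea of tying a mission to a launch event when no pattern connects them directly can be illustrated with a small sketch. Everything here is hypothetical: the `Mission` and `LaunchEvent` classes, their fields, and the "nearest sentence" heuristic are assumptions made for illustration and are not taken from the Proteus rule set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mission:
    payload: str
    sentence: int  # index of the sentence where the mission was found


@dataclass
class LaunchEvent:
    vehicle: str
    sentence: int  # index of the sentence where the launch was found
    payloads: List[str] = field(default_factory=list)


def attach_missions(missions, launches):
    """Tie each mission to the closest launch event (assumed heuristic).

    Several missions may attach to the same launch, which models the
    "multiple payloads correspond to a single event" case.
    """
    for mission in missions:
        if not launches:
            break
        nearest = min(launches, key=lambda l: abs(l.sentence - mission.sentence))
        if mission.payload not in nearest.payloads:
            nearest.payloads.append(mission.payload)
    return launches


launches = attach_missions(
    [Mission("satellite A", 2), Mission("satellite B", 3)],
    [LaunchEvent("rocket X", 1)],
)
print(launches[0].payloads)
```

A real rule set would also need the reverse direction (one payload mentioned across several candidate launches) and the date and site heuristics mentioned above.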
![Page 17: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/17.jpg)
Conclusion:
• Example-based pattern acquisition is appropriate for ST-level tasks, especially when training data is quite limited
• Pattern editing tools are useful and effective
![Page 18: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/18.jpg)
NYU: Description of the MENE Named Entity System as Used in MUC-7
Andrew Borthwick, John Sterling, et al.
Presented by Jinying Chen
10/04/2002
![Page 19: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/19.jpg)
Outline
• Maximum Entropy
• MENE’s Feature Classes
• Feature Selection
• Decoding
• Results
• Conclusion
![Page 20: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/20.jpg)
Maximum Entropy
• Problem Definition
The problem of named entity recognition can be reduced to the problem of assigning one of 4*n+1 tags to each token
– n: the number of name categories, such as company, product, etc. For MUC-7, n=7
– 4 states: x_start, x_continue, x_end, x_unique
– other: not part of a named entity
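The tag inventory described above can be enumerated directly. This is a minimal sketch; the slide only states n = 7, so the category names below (the seven MUC-7 named entity types) and the `build_tag_set` helper are supplied here for illustration.

```python
# Category names are an assumption: the slide gives only n = 7.
CATEGORIES = ["person", "organization", "location", "date", "time", "money", "percent"]
STATES = ["start", "continue", "end", "unique"]


def build_tag_set(categories):
    """Return the 4*n + 1 tags: four states per category plus 'other'."""
    tags = [f"{category}_{state}" for category in categories for state in STATES]
    tags.append("other")
    return tags


TAGS = build_tag_set(CATEGORIES)
print(len(TAGS))  # 4 * 7 + 1 = 29
```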
![Page 21: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/21.jpg)
Maximum Entropy (cont.)
• Maximum Entropy Solution
– Compute p(f | h), where f is the prediction among the 4*n+1 tags and h is the history
– The computation of p(f | h) depends on a set of binary-valued features, e.g.
![Page 22: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/22.jpg)
Maximum Entropy (cont.)
– Given a set of features and some training data, the maximum entropy estimation process produces a model:
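The model formula itself did not survive in this transcript. As a hedged illustration, the sketch below uses the standard product form of a maximum entropy model, p(f | h) = (1 / Z(h)) * prod_i alpha_i^g_i(h, f), where each g_i is a binary feature and Z(h) normalizes over all futures. The two feature functions, the weights 3.0 and 5.0, and the two-tag future set are made up for the example.

```python
from math import prod


def maxent_probability(history, futures, features, alphas):
    """Return {future: p(future | history)} under the product-form model."""
    scores = {
        f: prod(alpha ** g(history, f) for g, alpha in zip(features, alphas))
        for f in futures
    }
    z = sum(scores.values())  # normalizer Z(h)
    return {f: score / z for f, score in scores.items()}


# Hypothetical binary features g(h, f): each returns 0 or 1.
def g_capitalized_location(h, f):
    return 1 if h["token"][:1].isupper() and f == "location_start" else 0


def g_lowercase_other(h, f):
    return 1 if h["token"].islower() and f == "other" else 0


FUTURES = ["location_start", "other"]
probs = maxent_probability(
    {"token": "Paris"},
    FUTURES,
    [g_capitalized_location, g_lowercase_other],
    [3.0, 5.0],  # made-up weights; real ones come from training
)
print(probs)
```

For the token "Paris" only the first feature fires, so the unnormalized scores are 3 and 1 and the probabilities are 0.75 and 0.25.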
![Page 23: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/23.jpg)
MENE’s Feature Classes
• Binary Features
• Lexical Features
• Section Features
• Dictionary Features
• External Systems Features
![Page 24: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/24.jpg)
Binary Features
• Features whose “history” can be considered to be either on or off for a given token.
• Example:
– The token begins with a capitalized letter
– The token is a four-digit number
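The two example tests can be written as simple predicates on a token. This is a sketch only; the function names and the `active_binary_features` helper are hypothetical and do not come from MENE.

```python
import re


def begins_with_capital(token):
    """True if the first character is an uppercase letter."""
    return token[:1].isupper()


def is_four_digit_number(token):
    """True if the token consists of exactly four digits."""
    return re.fullmatch(r"\d{4}", token) is not None


def active_binary_features(token):
    """Return the names of the binary tests that are 'on' for this token."""
    checks = {
        "begins_with_capital": begins_with_capital,
        "four_digit_number": is_four_digit_number,
    }
    return [name for name, check in checks.items() if check(token)]


print(active_binary_features("Boeing"))
print(active_binary_features("1998"))
```

In the model, each such test is paired with a particular future (tag) to form one binary feature g(h, f).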
![Page 25: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/25.jpg)
Lexical Features
• Example:
![Page 26: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/26.jpg)
Section Features
• Features make predictions based on the current section of the article, like “Date”, “Preamble”, and “Text”.
• Play a key role by establishing the background probability of the occurrence of the different futures (predictions).
![Page 27: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/27.jpg)
Dictionary Features
• Make use of a broad array of dictionaries of useful single or multi-word terms such as first names, company names, and corporate suffixes.
• Require no manual editing
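One way to turn a term list into per-token evidence is to mark where each token falls inside a matched dictionary entry. The sketch below is an assumption about how such a lookup could work: the start/continue/end/unique/other labels mirror the tag states used elsewhere in the talk, and the `dictionary_tags` function and the sample terms are hypothetical.

```python
def dictionary_tags(tokens, dictionary):
    """Label each token relative to a list of single- or multi-word terms.

    Longer terms are tried first so that a multi-word entry wins over a
    shorter entry starting at the same position.
    """
    tags = ["other"] * len(tokens)
    terms = sorted((t.split() for t in dictionary if t.split()), key=len, reverse=True)
    i = 0
    while i < len(tokens):
        for term in terms:
            n = len(term)
            if tokens[i:i + n] == term:
                if n == 1:
                    tags[i] = "unique"
                else:
                    tags[i] = "start"
                    tags[i + n - 1] = "end"
                    for j in range(i + 1, i + n - 1):
                        tags[j] = "continue"
                i += n
                break
        else:
            i += 1  # no term starts here
    return tags


tokens = ["built", "by", "Loral", "Corp.", "for", "Intelsat"]
print(dictionary_tags(tokens, {"Loral Corp.", "Intelsat"}))
```

Because the dictionary label is only one feature among many, a noisy or unedited term list can still be useful: training decides how much weight each dictionary deserves.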
![Page 28: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/28.jpg)
![Page 29: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/29.jpg)
External Systems Features
• MENE incorporates the outputs of three NE taggers
– A significantly enhanced version of the traditional, hand-coded “Proteus” named-entity tagger
– Manitoba
– IsoQuest
![Page 30: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/30.jpg)
• Example:
![Page 31: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/31.jpg)
Feature Selection
• Simple
• Select all features which fire at least 3 times in the training corpus
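The count-based cutoff can be sketched in a few lines. The data layout is an assumption: each training event is represented as a list of the feature names that fired on it, and `select_features` is a hypothetical helper.

```python
from collections import Counter


def select_features(firings, min_count=3):
    """Keep every feature that fires at least `min_count` times in training."""
    counts = Counter(name for event in firings for name in event)
    return {name for name, count in counts.items() if count >= min_count}


training_firings = [
    ["cap", "loc_dict"],
    ["cap"],
    ["cap", "four_digits"],
    ["loc_dict"],
    ["four_digits"],
    ["loc_dict"],
]
print(sorted(select_features(training_firings)))
```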
![Page 32: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/32.jpg)
Decoding
• Simple
– For each token, check all the active features for this token and compute p(f | h)
– Run a Viterbi search to find the highest-probability coherent path through the lattice of conditional probabilities
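The search step can be sketched as a Viterbi pass in which incoherent transitions are simply disallowed. The coherence rules below are an assumption based on the tag definitions: x_start or x_continue must be followed by x_continue or x_end of the same category, and a sequence must not begin or end in the middle of a name. The function names and the toy lattice are hypothetical.

```python
import math


def allowed(prev, cur):
    """True if tag `cur` may follow tag `prev` in a coherent sequence."""
    def split(tag):
        if tag == "other":
            return None, "other"
        category, state = tag.rsplit("_", 1)
        return category, state

    prev_cat, prev_state = split(prev)
    cur_cat, cur_state = split(cur)
    if prev_state in ("start", "continue"):
        # A name is open: it must continue or end in the same category.
        return cur_cat == prev_cat and cur_state in ("continue", "end")
    return cur_state in ("start", "unique", "other")


def viterbi(lattice, tags):
    """lattice[i][tag] = p(tag | history of token i); returns the best coherent path."""
    if not lattice:
        return []
    opening = [t for t in tags if allowed("other", t)]
    closing = {t for t in tags if t == "other" or t.endswith(("_end", "_unique"))}
    best = {
        t: (math.log(lattice[0][t]), [t])
        for t in opening
        if lattice[0].get(t, 0.0) > 0
    }
    for probs in lattice[1:]:
        step = {}
        for cur in tags:
            p = probs.get(cur, 0.0)
            if p <= 0:
                continue
            candidates = [
                (score + math.log(p), path + [cur])
                for prev, (score, path) in best.items()
                if allowed(prev, cur)
            ]
            if candidates:
                step[cur] = max(candidates, key=lambda c: c[0])
        best = step
    finals = [entry for tag, entry in best.items() if tag in closing]
    return max(finals, key=lambda c: c[0])[1] if finals else []


TOY_TAGS = ["person_start", "person_continue", "person_end", "person_unique", "other"]
toy_lattice = [
    {"person_start": 0.6, "other": 0.4},
    {"other": 0.7, "person_end": 0.3},
]
print(viterbi(toy_lattice, TOY_TAGS))
```

In the toy lattice, picking the most probable tag for each token independently would give person_start followed by other, which is incoherent; the search instead compares the coherent paths person_start, person_end (0.18) and other, other (0.28).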
![Page 33: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/33.jpg)
Results
• Training set: 350 aviation disaster articles (about 270,000 words)
• Test set:
– Dry run: within-domain corpus
– Formal run: out-of-domain corpus
![Page 34: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/34.jpg)
Result (cont.)
![Page 35: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/35.jpg)
Result (cont.)
![Page 36: NYU: Description of the Proteus/PET System as Used for MUC-7 ST](https://reader035.vdocuments.us/reader035/viewer/2022062500/56814fe5550346895dbdaeba/html5/thumbnails/36.jpg)
Conclusion
• A new, still immature system. Performance can be improved by:
– Adding long-range reference-resolution features
– Exploring compound features
– Using more sophisticated methods of feature selection
• Highly portable
• An efficient method to combine NE systems