A SURVEY ON INFORMATION EXTRACTION FROM DOCUMENTS USING STRUCTURES OF SENTENCES
Chikayama Taura Lab. M1 Mitsuharu Kurita
1
INTRODUCTION
Current search systems are based on 2 assumptions
1. Users send words, not sentences
2. The aim is to find documents that are related to the query words
We unconsciously learn to select words that will appear near the target information
In some cases this clue doesn’t work well
2
INTRODUCTION
For more convenient access to information
Analysis of the details of the question
  To know the target information
Analysis of the information in retrieved documents
  To find the requested information
Information Extraction
3
OUTLINE
Introduction
Overview of Information Extraction (IE)
IE with pattern matching
IE with sentence structures
  Frequent substructure
  Shortest path between 2 words
  Applying the kernel method for structured data
Conclusion
4
INFORMATION EXTRACTION
What is Information Extraction?
  A kind of task in natural language processing
  Addresses extraction of information from texts
  The goal is not to retrieve documents
  Originated with an international conference named MUC
Message Understanding Conference (MUC)
  Competition of IE among research groups
  Set an information extraction task for each conference, held between 1987 and 1997
5
MUC COMPETITION
An example of a MUC task
MUC-3 terrorism domain
Input: news articles (some of them include terrorism events)
Output: the instances involved in each incident
6
MUC COMPETITION
Pattern matching or linguistic analysis
At that time (1987-1997), it was difficult to use advanced natural language processing
Therefore, most competitors adopted pattern matching to find instances
7
EXAMPLE OF PATTERN MATCHING
CIRCUS [92 Lehnert et al.]
Each pattern consists of a “trigger word” and a “linguistic pattern”
Pattern: kidnap-passive
Trigger: “kidnap”
Linguistic pattern: “<subject> passive-verb”
Variable: “target”
“The mayor was kidnapped by terrorists.”
1. “kidnap” activates the pattern
2. “was kidnapped” is a passive verb phrase
3. The subject “mayor” is the target
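The three steps above can be sketched in a few lines of Python. This is only an illustration, not the CIRCUS implementation: the dictionary layout, the regular expression standing in for the linguistic pattern, and the function name `apply_pattern` are all assumptions made for this sketch.

```python
import re

# Hypothetical, simplified stand-in for a CIRCUS-style pattern.
# The field names mirror the slide; the regex is an assumption of this sketch.
KIDNAP_PASSIVE = {
    "name": "kidnap-passive",
    "trigger": "kidnap",
    # "<subject> passive-verb": subject noun phrase, a form of "be", a past participle
    "linguistic_pattern": re.compile(
        r"^(?:the|a|an)?\s*(?P<subject>[\w ]+?)\s+(?:was|were|is|are|been)\s+(?P<verb>\w+ed)\b",
        re.IGNORECASE,
    ),
    "variable": "target",
}


def apply_pattern(pattern, sentence):
    """Return {variable: subject} if the pattern fires on the sentence, else None."""
    # Step 1: the trigger word activates the pattern.
    if pattern["trigger"] not in sentence.lower():
        return None
    # Step 2: look for a passive verb phrase preceded by its subject.
    match = pattern["linguistic_pattern"].search(sentence)
    if match is None or not match.group("verb").lower().startswith(pattern["trigger"]):
        return None
    # Step 3: the subject fills the pattern's variable.
    return {pattern["variable"]: match.group("subject").strip()}


print(apply_pattern(KIDNAP_PASSIVE, "The mayor was kidnapped by terrorists."))
```

An active sentence such as "Terrorists kidnapped the mayor." contains the trigger but not the passive construction, so the pattern does not fire on it.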
9
PROBLEMS OF PATTERN MATCHING
It takes a huge amount of time to create patterns
  In many cases, they were handwritten
It depends a lot on the target domain
  It is difficult to adapt to a new task
Automatic construction of patterns
10
THE EARLIEST AUTOMATIC PATTERN GENERATION
AutoSlog [93 Riloff et al.]
  Creates the patterns for CIRCUS automatically
  Training data: articles in which the target words are tagged
  Created 1237 patterns from 1500 tagged texts
  Only 450 of them were judged to be valid by a human
“The mayor was kidnapped by terrorists.”
Pattern: kidnap-passive
Trigger: “kidnap”
Linguistic pattern: “<subject> passive-verb”
Variable: “target”
11
Recently it has become possible to use deeper linguistic analysis
Some studies are addressing new IE tasks using these linguistic resources and machine learning approaches
12
SENTENCE STRUCTURES
Dependency structure
  Describes modification relations between words
  One sentence makes up a tree structure
Predicate-argument structure
  Describes the semantic relations between predicate and argument
  One sentence makes up a graph structure
14
DIFFICULTIES TO USE STRUCTURED DATA
Most of the machine learning algorithms deal with the data as feature vectors
It is difficult to express structured data (e.g. trees, graphs) as vectors
The ways to use sentence structures for IE
  Frequent substructures
  Shortest paths between 2 words
  Applying the kernel method for structured data
15
IE WITH SUBGRAPH OF SENTENCE STRUCTURES
On-Demand Information Extraction [06 Sekine et al.]
Create extraction patterns on demand and extract information with them
[Figure labels: query, article database, relevant articles, dependency analyzer, dependency trees, frequent subtree mining, subtree patterns, table of information]
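To give a feel for the frequent subtree mining step, here is a deliberately reduced sketch. It counts only two-node subtrees (single head-dependent edges), whereas a real miner also enumerates larger subtrees. The toy trees, the placeholder names, and the function `frequent_edges` are assumptions for illustration and are not taken from the ODIE system.

```python
from collections import Counter

# Hypothetical toy input: each dependency tree is a set of (head, dependent) edges,
# with named entities already replaced by placeholders such as COM1, COM2, MNY.
TREES = [
    {("buy", "COM1"), ("buy", "COM2"), ("buy", "for"), ("for", "MNY")},
    {("acquire", "COM1"), ("acquire", "COM2"), ("acquire", "for"), ("for", "MNY")},
    {("buy", "COM1"), ("buy", "COM2"), ("buy", "today")},
]


def frequent_edges(trees, min_support):
    """Return the edges that occur in at least `min_support` trees, with their support."""
    support = Counter()
    for tree in trees:
        # A set guarantees each edge is counted at most once per tree.
        support.update(set(tree))
    return {edge: n for edge, n in support.items() if n >= min_support}


print(frequent_edges(TREES, min_support=2))
```

Edges that survive the support threshold are candidates for extraction patterns; rare edges such as ("buy", "today") are dropped.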
17
EXPERIMENTAL RESULTS
Generated patterns
  Found patterns for a query “merger and acquisition” (M&A)
    <COM1> <agree to buy> <COM2> <for MNY>
    <COM1> <will acquire> <COM2> <for MNY>
    <a MNY merger> <of COM1> <and COM2>
Extracted Information
  For the query “acquire, acquisition, merger, buy, purchase”
18
EXPERIMENTAL RESULTS
Very quick construction of patterns
  In MUC, participants were allowed one month
  ODIE takes only a few minutes to return the result
No training corpus is needed
  ODIE learns extraction patterns from the data
Information about recurring events can be extracted well
  Merger and acquisition
  Nobel prize winners
19
IE WITH SHORTEST PATH BETWEEN WORDS
Extraction of interacting protein pairs [06 Yakushiji et al.]
Extract the interacting protein pairs from biomedical articles
Focus on the shortest path between 2 protein names on the predicate-argument structure
Discriminate with Support Vector Machine (SVM)
Example: Entity1 is interacted with a hydrophilic loop region of Entity2.
[Figure: predicate-argument structure of the example sentence; node labels: be, entity1, interact, with, region, of, a, hydrophilic, loop, entity2]
21
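The shortest path between the two entity nodes can be found with a breadth-first search. The sketch below assumes a hand-written, undirected edge list for the example sentence; a real predicate-argument parser may link the words differently, and the names `EDGES` and `shortest_path` are introduced here only for illustration.

```python
from collections import deque

# Assumed, hand-written edges for "Entity1 is interacted with a hydrophilic loop
# region of Entity2."; these are not the output of an actual parser.
EDGES = [
    ("be", "entity1"), ("be", "interact"), ("interact", "entity1"),
    ("interact", "with"), ("with", "region"), ("region", "a"),
    ("region", "hydrophilic"), ("region", "loop"), ("region", "of"),
    ("of", "entity2"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected graph; returns a node list or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(EDGES, "entity1", "entity2"))
```

With these assumed edges, modifiers such as "a", "hydrophilic", and "loop" fall off the path, which is the point of using the shortest path as the pattern.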
PATTERN GENERATION
Variation of Patterns
  The extracted patterns are not enough
  Divide the patterns and combine them into new patterns
[Figure: a pattern divided into parts labeled Main, Prep, Entity, Entity; node labels: X, interact, Y, with, protein, region, of]
22
PATTERN GENERATION
Validation of patterns
  Some of these patterns are inappropriate
  Each pattern is scored by its adequacy to the learning data
[Figure labels: Feature vector; TP: True Positive; FP: False Positive]
23
SUPPORT VECTOR MACHINE (SVM)
2-class linear classifier
Divides the data space with a hyperplane
Margin maximization
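As a small illustration of what "margin" means here, the sketch below computes the margin of a given hyperplane on toy data. It does not train an SVM; it only evaluates the quantity that margin maximization tries to make as large as possible. The data points and the function name `margin` are assumptions for this example.

```python
import math

# Hypothetical toy data: (point, label) pairs with labels +1 / -1.
DATA = [((2.0, 0.0), 1), ((3.0, 1.0), 1), ((0.0, 0.0), -1), ((-1.0, 1.0), -1)]


def margin(w, b, data):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A negative value means at least one point lies on the wrong side.
    """
    norm = math.hypot(*w)
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x, y in data)


# Two candidate hyperplanes that both separate the data.
print(margin((1.0, 0.0), -1.0, DATA))   # the line x1 = 1
print(margin((1.0, 0.0), -0.5, DATA))   # the line x1 = 0.5
```

Both lines separate the two classes, but the first one keeps every point farther away, so a margin-maximizing classifier would prefer it.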
24
EXPERIMENTAL RESULTS
Learning: AImed corpus
  225 abstracts of biomedical papers
  Annotated with protein names and interactions
Extraction: MEDLINE
  14 million titles and 8 million abstracts
Extracted data
  7775 protein pairs
  64.0% precision
  83.8% recall
25
IE WITH THE KERNEL METHOD ON SENTENCE STRUCTURES
Kernel Method
  e.g. SVM
  Data are used only in the form of dot products
  If you can calculate the dot product directly, you do not have to calculate the vector
  Furthermore, you can use other functions as long as they meet some conditions
27
[Figure labels: raw data, vector space, classifier, kernel function]
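The claim that the dot product can be calculated without building the vector can be checked on a standard textbook case, the degree-2 polynomial kernel on 2-dimensional inputs. The function names `phi` and `poly_kernel` are chosen for this sketch and do not come from the slides.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for 2-D input whose dot product equals (x . y) ** 2."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, 2 ** 0.5 * x1 * x2)


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed without building phi(x) or phi(y)."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)
print(dot(phi(x), phi(y)))   # dot product in the mapped space
print(poly_kernel(x, y))     # same value up to rounding, without the mapping
```

For structured data such as trees, the explicit vector may be huge or awkward to define, so only the kernel function side of this equality is implemented.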
RELATION EXTRACTION
Relation Extraction with Tree Kernel [04 Culotta et al.]
  Classify the relation between 2 entities
    5 entity types (person, organization, geo-political-entity, location, facility)
    5 major types of relations (at, near, part, role, social)
  Classify the smallest subtree of the dependency tree which includes the entities
28
TREE KERNEL
Represents the similarity between 2 tree-shaped data
Calculated as the sum of similarity of nodes
29
[Flowchart]
Start
1. Enqueue root node pair
2. Dequeue a node pair
3. Add the similarity
4. Find all child node sequence pairs whose nodes have common main features
5. Enqueue the child node pairs
6. Is the queue empty? No: go back to step 2. Yes: return the similarity
End
CALCULATION OF TREE KERNEL
Features of nodes
The similarity between nodes is defined as the number of common features (except the main features)
[Figure label: Main features]
30
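The queue-based calculation can be sketched as follows. This is a simplification and not the kernel of the cited paper: it pairs individual children whose main feature matches instead of enumerating child node sequences, it treats the main feature as a single string, and it returns 0 when the roots do not match. The `Node` class and function names are assumptions for this sketch.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    main: str                              # main feature; must match for nodes to be paired
    features: frozenset = frozenset()      # remaining features, used for the similarity
    children: list = field(default_factory=list)


def node_similarity(a, b):
    """Number of non-main features the two nodes share."""
    return len(a.features & b.features)


def tree_kernel(root_a, root_b):
    """Sum node similarities over pairs reached by matching main features top-down."""
    if root_a.main != root_b.main:         # assumption: unmatched roots give 0
        return 0
    total = 0
    queue = deque([(root_a, root_b)])      # enqueue root node pair
    while queue:                           # repeat until the queue is empty
        a, b = queue.popleft()             # dequeue a node pair
        total += node_similarity(a, b)     # add the similarity
        for child_a in a.children:         # pair children with a common main feature
            for child_b in b.children:
                if child_a.main == child_b.main:
                    queue.append((child_a, child_b))
    return total


tree_1 = Node("A", frozenset({"noun", "person"}),
              [Node("B", frozenset({"verb"})), Node("C", frozenset({"noun"}))])
tree_2 = Node("A", frozenset({"noun", "org"}),
              [Node("B", frozenset({"verb", "past"})), Node("D", frozenset({"noun"}))])
print(tree_kernel(tree_1, tree_2))
```

In the usage example the roots share one feature and the two B nodes share one feature, while C and D are never compared because their main features differ.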
CALCULATION OF TREE KERNEL
31
[Figure: worked example on two trees, one with nodes A, B, C, D, E and one with nodes A’, B’, C’, D’, E’, F’, showing the matched node sequence pairs]
X and X’ denote the nodes whose main features are common
EXPERIMENTAL RESULTS
Data set: ACE corpus
  800 annotated documents (gathered from newspapers and broadcasts)
  5 entity types (person, organization, geo-political-entity, location, facility)
  5 major types of relations (at, near, part, role, social)
32
Kernel                Precision (%)   Recall (%)
Bag-of-words kernel   47.0            10.0
Tree kernel           69.6            25.3
CONCLUSION
Overview of Information Extraction
  The aim of information extraction
  Recent movement to use deep linguistic resources
The way to use sentence structures for IE
  Difficulties of using structured data in machine learning
  Three different approaches to exploit them
34