An Alignment-based Approach to Semi-supervised Relation Extraction Including Multiple Arguments
Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee, Kwangil Ko, and Zino Lee
{megaup, stardust, gblee}@postech.ac.kr, {kik, zino}@alticast.com
Abstract - We present an alignment-based approach to the semi-supervised relation extraction task for relations with more than two arguments. We aim to improve not only the precision of the extracted results but also the coverage of the method. Our relation extraction method is based on alignment-based pattern matching, which makes the method more flexible. In addition, we extract all relationships with two or more arguments at once in order to obtain an integrated, high-quality result. Experimental results indicate the effectiveness of our method.
Alignment-based Information Extraction
- Information Extraction
  - Extracting a defined number of relevant arguments from natural-language documents
  - Subtasks
      # of arguments   Subtask
      1                named-entity recognition
      2                binary relation extraction
      more than 2      relation/event extraction
  - Approaches
    - Supervised
    - Un/Semi-Supervised
- Sentence Alignment for Information Extraction
  - Example
      sentence: the character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is
      pattern:  the character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is
  - Alignment Matrix
      columns (sentence tokens): the character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is
      rows (pattern tokens):
      character    0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
      <ROLE>       1 1 2 2 2 2 2 2 2 2 2 2 2 2 2
      portrayed    1 1 2 2 3 3 3 3 3 3 3 3 3 3 3
      by           1 1 2 2 3 4 4 4 4 4 4 4 4 4 4
      <ACTOR>      1 2 2 3 3 4 5 5 5 5 5 5 5 5 5
      in           1 2 2 3 3 4 5 5 6 6 6 6 6 6 6
      the          1 2 2 3 3 4 5 5 6 7 7 7 7 7 7
      television   1 2 2 3 3 4 5 5 6 7 7 7 7 7 7
      series       1 2 2 3 3 4 5 5 6 7 7 8 8 8 8
      <PROGRAM>    1 2 3 3 4 4 5 6 6 7 8 8 9 9 9
      is           1 2 3 3 4 4 5 6 6 7 8 8 9 9 10
  - Matrix Computation
  - Trace Back
Semi-supervised Relation Extraction Including Multiple Arguments
Experimental Results
[Figure: Overall architecture. Seed Data -> Extracting Context Patterns -> Relation Extraction, repeated for n arguments, …, k arguments, …, 2 arguments; the results then go through Validation & Integration to produce the n-argument results.]
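Matrix computation:
$$
M_{i,j} = \max\begin{cases}
M_{i-1,j-1} + \mathrm{sim}_{i,j} \\
M_{i-1,j} + gp \\
M_{i,j-1} + gp \\
0
\end{cases}
\qquad
\mathrm{sim}_{i,j} = \begin{cases}
1, & \text{if } PTN_i = RAW_j \text{ or } PTN_i = \langle label \rangle \\
0, & \text{otherwise}
\end{cases}
$$
Trace back (next position from M_{i,j}):
$$
\text{next}(i,j) = \begin{cases}
[i,\, j-1], & \text{if } M_{i,j} = M_{i,j-1} + gp \\
[i-1,\, j-1], & \text{if } M_{i,j} = M_{i-1,j-1} + \mathrm{sim}_{i,j} \\
[i-1,\, j], & \text{if } M_{i,j} = M_{i-1,j} + gp
\end{cases}
$$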
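The recurrence and trace-back rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes whitespace tokenization, treats any token written as `<...>` as an argument label, and uses gp = 0, which reproduces the example alignment matrix cell for cell. All function names are hypothetical. The rule that a label covers the sentence tokens up to the next aligned token (`extract_arguments`) is also an assumption.

```python
# Sketch of pattern-to-sentence alignment (hypothetical helper names).
# Assumptions: whitespace tokens, "<...>" tokens are argument labels, gp = 0.


def is_label(token):
    # Argument labels such as <ROLE> or <ACTOR> match any sentence token.
    return token.startswith("<") and token.endswith(">")


def sim(ptn_tok, raw_tok):
    # sim_{i,j} = 1 if PTN_i = RAW_j or PTN_i is a label, else 0
    return 1 if ptn_tok == raw_tok or is_label(ptn_tok) else 0


def alignment_matrix(ptn, raw, gp=0):
    # Row 0 and column 0 stand for empty prefixes and stay 0.
    m = [[0] * (len(raw) + 1) for _ in range(len(ptn) + 1)]
    for i in range(1, len(ptn) + 1):
        for j in range(1, len(raw) + 1):
            m[i][j] = max(m[i - 1][j - 1] + sim(ptn[i - 1], raw[j - 1]),
                          m[i - 1][j] + gp,
                          m[i][j - 1] + gp,
                          0)
    return m


def trace_back(m, ptn, raw, gp=0):
    # Start at the maximum cell; test [i, j-1], then [i-1, j-1], then [i-1, j].
    i, j = max(((a, b) for a in range(len(m)) for b in range(len(m[0]))),
               key=lambda cell: m[cell[0]][cell[1]])
    matches = []  # 0-based (pattern index, sentence index) pairs with sim = 1
    while i > 0 and j > 0 and m[i][j] > 0:
        s = sim(ptn[i - 1], raw[j - 1])
        if m[i][j] == m[i][j - 1] + gp:
            j -= 1
        elif m[i][j] == m[i - 1][j - 1] + s:
            if s:
                matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        else:
            i -= 1
    return matches[::-1]


def extract_arguments(ptn, raw, gp=0):
    # Hypothetical rule: a label covers the sentence tokens from its aligned
    # position up to the next aligned position (or the end of the sentence).
    matches = trace_back(alignment_matrix(ptn, raw, gp), ptn, raw, gp)
    found = {}
    for k, (pi, rj) in enumerate(matches):
        if is_label(ptn[pi]):
            end = matches[k + 1][1] if k + 1 < len(matches) else len(raw)
            found[ptn[pi]] = " ".join(raw[rj:end])
    return found


ptn = "character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is".split()
raw = "the character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is".split()
print(alignment_matrix(ptn, raw)[-1][-1])  # 10
print(extract_arguments(ptn, raw))
# {'<ROLE>': 'Michael Scofield', '<ACTOR>': 'Wentworth Miller', '<PROGRAM>': 'Prison Break'}
```

Checking [i, j-1] first makes each label align to the leftmost token it can cover when gp = 0. That is what lets the span rule recover multi-token arguments such as "Michael Scofield".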
Alignment score:
$$
\mathrm{score}(PTN, RAW) = \frac{\max\{M(PTN, RAW)\}}{\mathrm{length}(PTN)}
$$
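The following is a compact sketch of the alignment score. It again assumes whitespace tokens, `<...>` labels and gp = 0. The score needs only the maximum cell of M, not a trace back, so the sketch keeps two rows of the matrix at a time. Comparing the score with th = 0.85 from the experimental setup assumes that this is the score the threshold applies to.

```python
def is_label(token):
    # "<...>" tokens are argument labels and match any sentence token.
    return token.startswith("<") and token.endswith(">")


def alignment_score(ptn, raw, gp=0):
    # score(PTN, RAW) = max{M(PTN, RAW)} / length(PTN), keeping two rows of M.
    prev = [0] * (len(raw) + 1)  # row i-1 of M (row 0 is all zeros)
    best = 0
    for p in ptn:
        cur = [0] * (len(raw) + 1)
        for j, r in enumerate(raw, start=1):
            s = 1 if p == r or is_label(p) else 0
            cur[j] = max(prev[j - 1] + s, prev[j] + gp, cur[j - 1] + gp, 0)
        best = max(best, max(cur))
        prev = cur
    return best / len(ptn)


ptn = "character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is".split()
raw = "the character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is".split()
score = alignment_score(ptn, raw)  # 10 / 11 for the 11-token pattern in the matrix
print(round(score, 3), score >= 0.85)  # 0.909 True
```

The one unmatched pattern token ("television" against "TV") costs 1/11 of the score. This is the kind of near-miss that an exact pattern match would reject outright.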
Similarity between two arguments:
$$
\mathrm{similarity}(A, B) = \frac{\max\{M(A, B)\} \times 2}{\mathrm{length}(A) + \mathrm{length}(B)}
$$
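Argument similarity can be sketched the same way. Here sim is plain equality, because no labels are involved when two extracted arguments are compared, and gp = 0. The inputs are token lists. Passing plain strings compares characters instead, which might suit Korean text better, but the poster does not state which unit is used. Returning 0.0 for two empty arguments is a hypothetical convention.

```python
def max_alignment(a, b, gp=0):
    # max{M(A, B)} with sim = plain equality (no argument labels here).
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = max(prev[j - 1] + (1 if x == y else 0),
                         prev[j] + gp, cur[j - 1] + gp, 0)
        best = max(best, max(cur))
        prev = cur
    return best


def similarity(a, b, gp=0):
    # similarity(A, B) = max{M(A, B)} * 2 / (length(A) + length(B))
    if not a and not b:
        return 0.0  # hypothetical convention for two empty arguments
    return max_alignment(a, b, gp) * 2 / (len(a) + len(b))


print(similarity("Wentworth Miller".split(), "Miller".split()))  # 1 * 2 / 3
print(similarity("Prison Break".split(), "Prison Break".split()))  # 1.0
print(similarity("Miller", "Miler"))  # character level: 5 * 2 / 11
```

Normalizing by the combined length makes the measure symmetric and bounded by 1. Identical arguments score exactly 1.0.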
Similarity between two tuples:
$$
\mathrm{sim}(tuple_1, tuple_2) = \frac{1}{|args|} \sum_{i=1}^{|args|} \mathrm{similarity}(tuple_{1,i}, tuple_{2,i})
$$
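Tuple similarity averages the argument similarities position by position. The sketch below assumes that both tuples list their arguments in the same order and that arguments are whitespace-tokenized. With gp = 0 and 0/1 equality as sim, max{M(A, B)} equals the longest-common-subsequence length, so the sketch uses that compact form. The function names are hypothetical.

```python
def similarity(a, b):
    # similarity(A, B) = 2 * max{M(A, B)} / (length(A) + length(B)); with gp = 0,
    # max{M(A, B)} is the longest-common-subsequence length computed below.
    a, b = a.split(), b.split()
    if not a and not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return 2 * prev[-1] / (len(a) + len(b))


def tuple_similarity(tuple1, tuple2):
    # sim(tuple1, tuple2) = (1 / |args|) * sum_i similarity(tuple1_i, tuple2_i)
    if len(tuple1) != len(tuple2):
        raise ValueError("tuples must have the same number of arguments")
    return sum(similarity(x, y) for x, y in zip(tuple1, tuple2)) / len(tuple1)


t1 = ("Prison Break", "Wentworth Miller")
t2 = ("Prison Break", "Miller")
print(round(tuple_similarity(t1, t2), 3))  # (1.0 + 2/3) / 2 -> 0.833
```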
- Overall Architecture
- Context Patterns Extraction
  1) Searching the source documents for sentences that contain all arguments of each tuple
  2) Segmenting out a sub-part of each such sentence with window size w
  3) Replacing the argument parts of the sub-sentence with argument labels
- Relation Extraction based on Pairwise Alignment
  - Alignment score: score(PTN, RAW), the maximum value of the alignment matrix M normalized by the pattern length
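The three Context Patterns Extraction steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace tokenization, exact token matching for arguments, and reading "window size w" as w extra tokens on each side of the span covered by the arguments. For convenience, the sketch replaces arguments with labels (step 3) before cutting the window (step 2). `replace_arguments` and `extract_context_patterns` are hypothetical names.

```python
def replace_arguments(tokens, args):
    # Step 3: replace each occurrence of an argument's tokens with <LABEL>.
    spans = {label: value.split() for label, value in args.items() if value.split()}
    out, i = [], 0
    while i < len(tokens):
        for label, arg_tokens in spans.items():
            if tokens[i:i + len(arg_tokens)] == arg_tokens:
                out.append(f"<{label}>")
                i += len(arg_tokens)
                break
        else:  # no argument starts at position i
            out.append(tokens[i])
            i += 1
    return out


def extract_context_patterns(sentences, args, w):
    # args maps a label (e.g. "ACTOR") to its value in one seed tuple.
    labels = {f"<{label}>" for label in args}
    patterns = []
    for sentence in sentences:
        tokens = replace_arguments(sentence.split(), args)
        positions = [k for k, tok in enumerate(tokens) if tok in labels]
        # Step 1: keep only sentences that contain every argument of the tuple.
        if not positions or {tokens[k] for k in positions} != labels:
            continue
        # Step 2 (assumed reading of "window size w"): keep w tokens on each
        # side of the region spanned by the arguments.
        start = max(0, min(positions) - w)
        end = min(len(tokens), max(positions) + w + 1)
        patterns.append(" ".join(tokens[start:end]))
    return patterns


seed = {"ROLE": "Michael Scofield", "ACTOR": "Wentworth Miller", "PROGRAM": "Prison Break"}
docs = ["the character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is popular",
        "Prison Break airs on Fox"]
print(extract_context_patterns(docs, seed, 1))
# ['character <ROLE> portrayed by <ACTOR> in the TV series <PROGRAM> is']
```

The second sentence in the usage example contains only the PROGRAM argument, so step 1 discards it.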
- Alignment-based Verification
  - Aligning two candidate arguments: similarity(A, B)
  - Clustering tuples based on sim(tuple1, tuple2)
  - Selecting the most probable tuple for each cluster
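The verification steps above can be sketched as a single pass of clustering over candidate tuples. The poster does not state the clustering algorithm, the clustering threshold, or what makes a tuple "most probable". The sketch therefore assumes three things. First, each candidate joins the first cluster whose first member has sim(tuple1, tuple2) ≥ a hypothetical `cluster_th`. Second, otherwise the candidate starts a new cluster. Third, the tuple with the highest extraction score in each cluster is kept. The tuples and scores in the usage example are illustrative inputs, not data from the experiments.

```python
def similarity(a, b):
    # 2 * LCS(A, B) / (length(A) + length(B)); equals the M-based form when gp = 0.
    a, b = a.split(), b.split()
    if not a and not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return 2 * prev[-1] / (len(a) + len(b))


def tuple_similarity(t1, t2):
    # Mean of the position-wise argument similarities.
    return sum(similarity(x, y) for x, y in zip(t1, t2)) / len(t1)


def verify(candidates, cluster_th=0.8):
    # candidates: list of (argument_tuple, extraction_score) pairs.
    clusters = []
    for cand in candidates:
        for cluster in clusters:
            # Compare against the cluster's first member (an assumption).
            if tuple_similarity(cand[0], cluster[0][0]) >= cluster_th:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    # Keep the highest-scoring tuple of each cluster (assumed "most probable").
    return [max(cluster, key=lambda c: c[1])[0] for cluster in clusters]


candidates = [(("Prison Break", "Wentworth Miller"), 0.95),
              (("Prison Break", "Miller"), 0.90),
              (("Lost", "Matthew Fox"), 0.88)]
print(verify(candidates))
# [('Prison Break', 'Wentworth Miller'), ('Lost', 'Matthew Fox')]
```

In this sketch, the partial tuple ("Prison Break", "Miller") has a similarity of 5/6 to the full tuple, so it is absorbed into the full tuple's cluster instead of surviving as a separate result.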
- Comparison on the Coverage for Various Threshold Values
  [Figure: # of correct results (0 to 90) versus threshold (1.00 down to 0.70) for relations including 2 arguments, including 3 arguments, and including 4 arguments]
- Result of the verification
    Type of relations   Before verification        After verification
                        |tuples|   Precision (%)   |tuples|   Precision (%)
    (A,R)               249        36.55           79         73.42
    (P,R)               19         52.63           17         58.82
    (P,A)               10         60              10         60
    (C,P)               12         33.33           6          66.67
    (P,A,R)             7          42.86           5          60
    (C,P,R)             18         55.56           16         81.25
    (C,P,A)             8          62.5            8          75
    (C,P,A,R)           15         60              14         85.71
- Result of the integration
    Type of relations   With only binary relations    With all intermediates
                        |tuples|   Precision (%)      |tuples|   Precision (%)
    (P,A,R)             9          77.78              9          88.89
    (C,P,R)             11         81.82              16         87.5
    (C,P,A)             12         58.33              9          77.78
    (C,P,A,R)           8          87.5               16         87.5
- Experimental Setup
  - 930 Korean news documents (13,175 sentences) about TV series
  - Only one tuple with 4 arguments (CHANNEL, PROGRAM, ACTOR, ROLE) is used as the seed
  - Each result is collected after the first iteration and evaluated manually
  - Threshold th = 0.85
  - C (Channel), P (Program), A (Actor), R (Role)