
Upload: seokhwan-kim

Post on 26-Jun-2015


An Alignment-based Approach to Semi-supervised Relation Extraction Including Multiple Arguments

Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee, Kwangil Ko, and Zino Lee
{megaup, stardust, gblee}@postech.ac.kr, {kik, zino}@alticast.com

Abstract - We present an alignment-based approach to semi-supervised relation extraction for relations with more than two arguments. We aim to improve not only the precision of the extracted results but also the coverage of the method. The method is built on alignment-based pattern matching, which makes it more flexible. In addition, we extract all relationships with two or more arguments at once in order to obtain a high-quality integrated result. Experimental results indicate the effectiveness of our method.

Alignment-based Information Extraction

- Information Extraction
  - Extracting the defined number of relevant arguments from natural language documents
  - Subtasks

      # of arguments    subtask
      1                 named-entity recognition
      2                 binary relation extraction
      more than 2       relation/event extraction

  - Approaches
    - Supervised
    - Un/Semi-Supervised

- Sentence Alignment for Information Extraction
  - Example

      Pattern (PTN): the character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is
      Raw (RAW):     the character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is

  - Alignment Matrix (rows: pattern tokens; columns: raw-sentence tokens)

      Columns: 1 the, 2 character, 3 Michael, 4 Scofield, 5 portrayed, 6 by, 7 Wentworth, 8 Miller, 9 in, 10 the, 11 TV, 12 series, 13 Prison, 14 Break, 15 is

                  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      character   0  1  1  1  1  1  1  1  1  1  1  1  1  1  1
      <ROLE>      1  1  2  2  2  2  2  2  2  2  2  2  2  2  2
      portrayed   1  1  2  2  3  3  3  3  3  3  3  3  3  3  3
      by          1  1  2  2  3  4  4  4  4  4  4  4  4  4  4
      <ACTOR>     1  2  2  3  3  4  5  5  5  5  5  5  5  5  5
      in          1  2  2  3  3  4  5  5  6  6  6  6  6  6  6
      the         1  2  2  3  3  4  5  5  6  7  7  7  7  7  7
      television  1  2  2  3  3  4  5  5  6  7  7  7  7  7  7
      series      1  2  2  3  3  4  5  5  6  7  7  8  8  8  8
      <PROGRAM>   1  2  3  3  4  4  5  6  6  7  8  8  9  9  9
      is          1  2  3  3  4  4  5  6  6  7  8  8  9  9 10

  - Matrix Computation
  - Trace Back
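The matrix computation and trace back can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the recurrence M[i][j] = max(M[i-1][j-1] + sim, M[i-1][j] + gp, M[i][j-1] + gp, 0), a gap penalty gp = 0 (the value that reproduces the matrix shown above), and an extra all-zero border row and column for the empty prefix. The rule that raw tokens consumed while the trace back stays on a label row belong to that label is also an assumption, as are all function names.

```python
def is_label(token):
    """Argument labels are written like <ROLE> in a pattern."""
    return token.startswith("<") and token.endswith(">")


def sim(ptn_tok, raw_tok):
    """1 if the tokens are equal or the pattern token is a label, else 0."""
    return 1 if is_label(ptn_tok) or ptn_tok == raw_tok else 0


def alignment_matrix(ptn, raw, gp=0):
    """Fill the alignment matrix; row 0 and column 0 are a zero border."""
    m = [[0] * (len(raw) + 1) for _ in range(len(ptn) + 1)]
    for i in range(1, len(ptn) + 1):
        for j in range(1, len(raw) + 1):
            m[i][j] = max(
                m[i - 1][j - 1] + sim(ptn[i - 1], raw[j - 1]),  # align tokens
                m[i - 1][j] + gp,                               # skip pattern token
                m[i][j - 1] + gp,                               # skip raw token
                0,
            )
    return m


def trace_back(m, ptn, raw, gp=0):
    """Walk back from the bottom-right cell and collect label fillers.

    Assumption: moves are tried in the order left, diagonal, up, and a raw
    token consumed on a label row is assigned to that label.
    """
    i, j = len(ptn), len(raw)
    args = {}
    while i > 0 and j > 0:
        label = ptn[i - 1] if is_label(ptn[i - 1]) else None
        if m[i][j] == m[i][j - 1] + gp:                  # next position [i, j-1]
            if label:
                args.setdefault(label, []).insert(0, raw[j - 1])
            j -= 1
        elif m[i][j] == m[i - 1][j - 1] + sim(ptn[i - 1], raw[j - 1]):
            if label:                                    # next position [i-1, j-1]
                args.setdefault(label, []).insert(0, raw[j - 1])
            i -= 1
            j -= 1
        else:                                            # next position [i-1, j]
            i -= 1
    return {label: " ".join(tokens) for label, tokens in args.items()}


# Pattern rows as they appear in the alignment matrix above.
PTN = "character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is".split()
RAW = ("the character Michael Scofield portrayed by Wentworth Miller "
       "in the TV series Prison Break is").split()

M = alignment_matrix(PTN, RAW)
ARGS = trace_back(M, PTN, RAW)
print(M[-1][1:])  # last matrix row, without the border column
print(ARGS)
```

Under these assumptions the last row printed matches the "is" row of the matrix, and the trace back fills <ROLE>, <ACTOR> and <PROGRAM> with multi-token arguments because left moves on a label row absorb the extra raw tokens.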

Semi-supervised Relation Extraction Including Multiple Arguments

Experimental Results

[Figure: overall architecture. Box labels: Seed Data (n arguments); Extracting Context Patterns, Relation Extraction, and Seed Data, repeated for 2 arguments, ..., k arguments, ..., n arguments; n-argument Results; Validation & Integration.]

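Matrix computation:

\[
M_{i,j} = \max \left\{
\begin{array}{l}
M_{i-1,j-1} + sim_{i,j} \\
M_{i-1,j} + gp \\
M_{i,j-1} + gp \\
0
\end{array}
\right.
\qquad
sim_{i,j} =
\begin{cases}
1, & \text{if } PTN_i = RAW_j \text{ or } PTN_i = \langle label \rangle \\
0, & \text{otherwise}
\end{cases}
\]

Trace back (the term that produced M_{i,j} determines the next position):

\[
\begin{array}{ll}
M_{i,j} & \text{next position} \\
M_{i,j-1} + gp & [i, j-1] \\
M_{i-1,j-1} + sim_{i,j} & [i-1, j-1] \\
M_{i-1,j} + gp & [i-1, j]
\end{array}
\]

Alignment score:

\[
score(PTN, RAW) = \frac{\max\{M(PTN, RAW)\}}{length(PTN)}
\]

Similarity between two candidate arguments:

\[
similarity(A, B) = \frac{\max\{M(A, B)\} \times 2}{length(A) + length(B)}
\]

Similarity between two tuples:

\[
sim(tuple_1, tuple_2) = \frac{1}{|arguments|} \sum_{i=1}^{|args|} similarity(tuple_{1i}, tuple_{2i})
\]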
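The score, similarity and tuple-level sim formulas can be sketched as small Python functions. This is an illustration under stated assumptions rather than the authors' code: the gap penalty is taken as gp = 0, arguments are compared as token lists, tuple similarity is read as the average over arguments, and every function name is hypothetical.

```python
def is_label(token):
    """Argument labels are written like <ROLE> in a pattern."""
    return token.startswith("<") and token.endswith(">")


def max_alignment(a, b, gp=0, wildcard=None):
    """Largest cell of the alignment matrix M(a, b), computed row by row."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            match = 1 if x == y or (wildcard is not None and wildcard(x)) else 0
            cur.append(max(prev[j - 1] + match, prev[j] + gp, cur[j - 1] + gp, 0))
        best = max(best, max(cur))
        prev = cur
    return best


def score(ptn, raw):
    """Alignment score of a pattern against a raw sentence."""
    return max_alignment(ptn, raw, wildcard=is_label) / len(ptn)


def similarity(a, b):
    """Similarity between two candidate arguments (token lists)."""
    return 2 * max_alignment(a, b) / (len(a) + len(b))


def tuple_sim(t1, t2):
    """Average argument similarity between two tuples of equal length."""
    return sum(similarity(x, y) for x, y in zip(t1, t2)) / len(t1)


PTN = "character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is".split()
RAW = ("the character Michael Scofield portrayed by Wentworth Miller "
       "in the TV series Prison Break is").split()

print(score(PTN, RAW))  # 10 matched pattern tokens out of 11
print(similarity(["Wentworth", "Miller"], ["Miller"]))
print(tuple_sim((["Prison", "Break"], ["Wentworth", "Miller"]),
                (["Prison", "Break"], ["Miller"])))
```

With gp = 0 and a 0/1 match function, the largest matrix cell equals the length of the longest common subsequence, so `similarity` behaves like a Dice-style overlap on token sequences.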

- Overall Architecture
- Context Patterns Extraction
  1) Searching the sentences containing all arguments of each tuple in source documents
  2) Segmenting out the subpart of the sentence with the window size w
  3) Replacing the parts of arguments in the sub-sentence with argument labels
- Relation Extraction based on Pairwise Alignment
  - Alignment score
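The three context-pattern extraction steps can be sketched as follows. This is a rough illustration, not the authors' implementation. It assumes whitespace tokenization, exact token matching of each argument, and a window that keeps w tokens on either side of the outermost arguments. The names `find_span` and `extract_pattern` are hypothetical.

```python
def find_span(tokens, arg_tokens):
    """Return (start, end) of the first exact occurrence of arg_tokens, or None."""
    n = len(arg_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == arg_tokens:
            return start, start + n
    return None


def extract_pattern(sentence, seed, w=1):
    """Build a context pattern from one sentence and one seed tuple.

    seed maps an argument label (e.g. "ROLE") to its surface string.
    Returns None when the sentence does not contain every argument.
    """
    tokens = sentence.split()

    # Step 1: the sentence must contain all arguments of the tuple.
    spans = {}
    for label, value in seed.items():
        span = find_span(tokens, value.split())
        if span is None:
            return None
        spans[label] = span

    # Step 2: keep w tokens of context around the outermost arguments.
    lo = max(0, min(start for start, _ in spans.values()) - w)
    hi = min(len(tokens), max(end for _, end in spans.values()) + w)

    # Step 3: replace each argument span with its label.
    starts = {start: (label, end) for label, (start, end) in spans.items()}
    pattern = []
    i = lo
    while i < hi:
        if i in starts:
            label, end = starts[i]
            pattern.append(f"<{label}>")
            i = end
        else:
            pattern.append(tokens[i])
            i += 1
    return pattern


SENTENCE = ("the character Michael Scofield portrayed by Wentworth Miller "
            "in the TV series Prison Break is")
SEED = {"ROLE": "Michael Scofield", "ACTOR": "Wentworth Miller", "PROGRAM": "Prison Break"}

PATTERN = extract_pattern(SENTENCE, SEED, w=1)
print(" ".join(PATTERN))
```

A larger w keeps more context and makes the pattern more specific. A smaller w gives shorter patterns that can align with more sentences.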

- Alignment-based Verification
  - Aligning between two candidate arguments
  - Tuple clustering based on sim(tuple1, tuple2)
  - Selecting the most probable tuple for each cluster
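The verification steps can be illustrated with the sketch below. The poster names neither the clustering algorithm nor how the "most probable" tuple is chosen, so both are assumptions here: clustering is greedy against the first member of each cluster, and the most frequent tuple in a cluster is selected. The threshold 0.8, the candidate tuples and all function names are illustrative only.

```python
from collections import Counter


def lcs_length(a, b):
    """Longest common subsequence length; equals max{M(a, b)} when gp = 0."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(max(prev[j - 1] + (1 if x == y else 0), prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity between two candidate arguments given as strings."""
    a, b = a.split(), b.split()
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def tuple_sim(t1, t2):
    """Average argument similarity between two tuples of equal length."""
    return sum(similarity(x, y) for x, y in zip(t1, t2)) / len(t1)


def cluster_tuples(candidates, th):
    """Greedy clustering (assumed): join the first cluster whose first
    member is at least th similar, otherwise start a new cluster."""
    clusters = []
    for cand in candidates:
        for cluster in clusters:
            if tuple_sim(cluster[0], cand) >= th:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    return clusters


def select_tuple(cluster):
    """Pick the most frequent tuple in a cluster (assumed selection rule)."""
    return Counter(cluster).most_common(1)[0][0]


# Made-up candidate (PROGRAM, ACTOR) tuples, for illustration only.
CANDIDATES = [
    ("Prison Break", "Wentworth Miller"),
    ("Prison Break", "Miller"),
    ("Prison Break", "Wentworth Miller"),
    ("Other Program", "Someone Else"),
]

CLUSTERS = cluster_tuples(CANDIDATES, th=0.8)
SELECTED = [select_tuple(c) for c in CLUSTERS]
print(SELECTED)
```

Here the partial candidate ("Prison Break", "Miller") joins the cluster of the full tuple and is then discarded in favour of the more frequent full form.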

- Comparison on the Coverage for Various Threshold Values

  [Figure: number of correct results (y-axis, 0 to 90) plotted against the threshold (x-axis, 1.00 down to 0.70 in steps of 0.05), with one series each for relations including 2, 3, and 4 arguments.]

- Result of the verification

                         before verification     after verification
    type of relations    |tuples|    P           |tuples|    P
    (A,R)                249         36.55       79          73.42
    (P,R)                19          52.63       17          58.82
    (P,A)                10          60          10          60
    (C,P)                12          33.33       6           66.67
    (P,A,R)              7           42.86       5           60
    (C,P,R)              18          55.56       16          81.25
    (C,P,A)              8           62.5        8           75
    (C,P,A,R)            15          60          14          85.71

- Result of the integration

                         with only binary relations    with all intermediates
    type of relations    |tuples|    P                 |tuples|    P
    (P,A,R)              9           77.78             9           88.89
    (C,P,R)              11          81.82             16          87.5
    (C,P,A)              12          58.33             9           77.78
    (C,P,A,R)            8           87.5              16          87.5

- Experimental Setup
  - 930 Korean news documents (13,175 sentences) about TV series
  - Only a tuple with 4 arguments (CHANNEL, PROGRAM, ACTOR, ROLE) is used as a seed
  - Each result is collected after the first iteration and evaluated manually
  - Threshold th = 0.85
  - C (Channel), P (Program), A (Actor), R (Role)