Automatic Selection of Predicates for Common Sense Knowledge Expression
Ai Makabi, Hiroshi Matsumoto and Kazuhide Yamamoto. Automatic Selection of Predicates for Common Sense Knowledge Expression. Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING 2013), September 2013 (no page numbers).

TRANSCRIPT
Automatic Selection of Predicates for Common Sense Knowledge Expression
Ai Makabi, Kazuhide Yamamoto, Hiroshi Matsumoto
Nagaoka University of Technology
Background
• To develop an intelligent computer
  – Grammatical knowledge
  – Large amount of common sense knowledge (CSK)
To train a dog (e.g., a conversational system):
  User's remark: I am suffering from Pochi's biting.
  Response: Do you browse a Web site for dog discipline?
• "Pochi" is the name of a dog
• Overcome the biting behavior
Focus on methods:
– Building a common sense knowledge base (CSKB)
– Providing an accessible representation for use in natural language processing tasks
Related Works 1/2
• Existing upper ontologies (SUMO, Cyc, etc.) – contain many general concepts – e.g., Collection: book
• A type of: information-bearing object in the form of paper • Instance of: kind of artifact not distinguished by brand or model
• Merits: – exploit rigorously defined CSK
• Demerits: – the knowledge representation cannot be fully matched with actual expressions
Related Works 2/2 • Defining CSK as relations added to
sentences/words (ConceptNet) – e.g., 犬 (dog)
• CapableOf: 散歩 (walk), 寝る (sleep) • SymbolOf: 忠誠 (loyalty)
• Merits: – the definition is better suited to natural language processing tasks
• Demerits: – for the Japanese ConceptNet, most concepts are collected manually
• Coverage of CSK is exceptionally low
Goal of the Study
• Automatically construct a Japanese CSKB that can be utilized for semantic analysis in natural language processing
Set of predicates that co-occur with a noun = CSK
  verb: bark, run
  adjective: pretty, cute
  verbal noun: to train, to breed
CSK of 「dog」
Final goal – Overview of the CSKB
  cat: meow, to breed, pretty
  dog: bark, to breed, pretty
  puppy: yelp, to breed, pretty
Compute a similarity between nouns: compare at the predicate level
Aggregate concepts under an upper concept 「animal」 based on the similarity (concept = noun; CSK = predicate)
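The predicate-level comparison between nouns can be illustrated with a small sketch. This is not the authors' code: the predicate counts and the cosine measure below are illustrative assumptions for comparing two nouns by the CSK predicates they share.

```python
# Illustrative sketch (not from the paper): comparing two nouns at the
# predicate level with cosine similarity over predicate-count dicts.
# The counts are toy stand-ins for the CSK shown in the overview.
import math

def cosine(u, v):
    """Cosine similarity between two sparse count dictionaries."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

dog = {"bark": 3, "to breed": 2, "pretty": 1}
cat = {"meow": 3, "to breed": 2, "pretty": 1}
piano = {"play": 4, "tune": 1}

# "dog" and "cat" share predicates, so they are closer than "dog" and "piano";
# an upper concept such as "animal" would group the similar pair.
print(cosine(dog, cat) > cosine(dog, piano))
```

Under this sketch, nouns whose predicate profiles overlap (dog, cat) end up similar, which is the signal the overview uses to aggregate them under an upper concept.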
Specific Property of CSK
• We make three hypotheses:
1) The predicate a is CSK of the noun n when the pair of a and n frequently co-occur in sentences.
2) A predicate a that co-occurs with arbitrary nouns is not appropriate CSK.
3) Whether the predicate a is correct CSK or not depends on the number of unique nouns that co-occur with a.
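The quantities behind the three hypotheses can be sketched in a few lines. The corpus here is a hypothetical handful of (noun, predicate) pairs, not the paper's data: hypothesis 1 looks at per-pair frequency, hypotheses 2 and 3 at how many unique nouns a predicate spreads over.

```python
# Minimal illustration (toy data, not the paper's corpus) of the counts
# behind the three hypotheses.
from collections import Counter, defaultdict

# (noun, predicate) pairs extracted from sentences -- hypothetical examples.
pairs = [
    ("dog", "bark"), ("dog", "bark"), ("dog", "walk"),
    ("dog", "be"), ("cat", "be"), ("school", "be"),
    ("school", "attend"), ("school", "attend"),
]

pair_freq = Counter(pairs)            # hypothesis 1: pair co-occurrence frequency
nouns_per_pred = defaultdict(set)     # hypotheses 2 and 3: unique nouns per predicate
for noun, pred in pairs:
    nouns_per_pred[pred].add(noun)

print(pair_freq[("dog", "bark")])     # frequent pair -> CSK candidate
print(len(nouns_per_pred["be"]))      # "be" spreads over many nouns -> not CSK
```

A versatile predicate like "be" co-occurs with many distinct nouns, while "bark" sticks to "dog"; the unique-noun count is exactly what hypothesis 3 later uses to decide deletions.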
"I attend elementary school every day" – predicate: attend; noun: elementary school
「every day」 co-occurs with high frequency with many nouns → the predicate is not appropriate CSK
The predicate 「attend」 is the CSK with high probability
Automatic Selection of Predicates
The top 10 predicates added to the noun 「小学校 (elementary school)」, from high to low:
(to enroll in school), (to educate), (be), (become), (to finish school), (to give lessons), (to take an exam), (attend), (to learn), (to coach)
The predicates placed higher in the list are considered more appropriate as CSK
Among the top 10 predicates added to the noun 「小学校 (elementary school)」, some have a high co-occurrence frequency with the noun but cannot characterize it.
Incorrect CSK: • versatile words • co-occur with many nouns
[Figure: Emergence distribution of predicates in the top 1,000 nouns. Horizontal axis: number of unique nouns co-occurring with a predicate (0–1,000); vertical axis: number of unique predicates (0–2,500).]
The predicates that fall under a certain scope co-occur with many nouns → delete these predicates as "deleting predicates"
The number of unique predicates that co-occur with about 400 nouns is 1,000
[Figure repeated with annotations: left region = predicates co-occurring with few nouns; right region = predicates co-occurring with many nouns. The many-noun region contains the incorrect predicates of hypothesis (2) (sharply increased).]
[Figure repeated with curve fits: a power-approximated curve and a logarithmic curve, crossing at an inflection point. Horizontal axis: number of unique nouns co-occurring with a predicate (logarithmic); vertical axis: number of unique predicates (logarithmic).]
Sort the nouns by the number of co-occurring predicates:
information, person, product, runner, database, piano
The predicate 「run」 could not characterize the noun 「person」, but it could characterize 「runner」.
A noun that co-occurs with many predicates cannot be characterized by generic predicates; hence, the number of its deleting predicates increases more than for nouns that co-occur with few predicates.
[Figure: emergence distribution of the top N predicates co-occurring with each noun (N = 100, 1,000). Horizontal axis: top N nouns sorted by the number of co-occurring predicates; vertical axis: number of deleting predicates, decreasing in a staircase pattern. When N = 100, only the top 100 nouns are taken into account.]
High-ranked nouns have more deleting predicates when nouns are sorted in the order of predicate co-occurrence.
The number of deleting predicates for each noun is decided based on hypothesis (3); singular points appear at N = 700, 1,100, 1,600, 2,500 and 3,600.
(1) N=100  (2) N=1,000  (3) N=10,000
Figure 4. Emergence distribution of the top N predicates co-occurring with each noun (horizontal axis: number of unique nouns co-occurring with a predicate; vertical axis: number of unique predicates, logarithmic)
Table I
THE NUMBER OF DELETING PREDICATES FOR EACH NOUN (N = THE UNIQUE NUMBER OF CO-OCCURRING PREDICATES)

Scope of the nouns     Number of deletions
N ≤ 700                427
700 < N ≤ 1,100        267
1,100 < N ≤ 1,600      143
1,600 < N ≤ 2,500      73
others                 33
However, the 33 predicates that get deleted when N = 3,600 can be used with nearly all nouns, so we consider that they are not common sense knowledge and delete them from all nouns as incorrect predicates. Figure 6 shows the list of predicates deleted for all nouns.
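The per-noun deletion scheme of Table I can be sketched as a lookup: for a noun whose unique co-occurring predicate count is N, the top-k most widely co-occurring predicates are dropped, with k taken from the N ranges. The function name is our own; only the thresholds come from the table.

```python
# Sketch of the Table I deletion scheme (helper name is ours, thresholds are
# from the table): map a noun's unique predicate count N to how many of its
# most widely co-occurring predicates get deleted.
def deletions_for(n):
    """Number of predicates to delete for a noun with N unique predicates."""
    if n <= 700:
        return 427
    if n <= 1100:
        return 267
    if n <= 1600:
        return 143
    if n <= 2500:
        return 73
    return 33  # the 33 predicates deleted for every noun

print(deletions_for(500))   # -> 427
print(deletions_for(3600))  # -> 33
```

Nouns with fewer unique predicates lose more candidates, matching the observation that nouns co-occurring with many predicates cannot be characterized by generic ones.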
わかる (understand),もつ (have),みる (see, look),なる(become),ない (nothing),とる (take, adopt, prefer),できる (can),つく (get, prick, arrive, strike),しる (know),くる (come), おもう (think), おおい (many), いる (be,need, shoot), いう (say), ある (be), 良い (good), 入る (enter, join), でる (leave, go out, attend), つくる(make), つかう (use), きく (hear, ask, be effective), かく (write, scratch), おこなう (do), 紹介 (introduce), よい (good), ゆく (go), たつ (stand, build, pass), たかい(high, expensive), おる (be, fold), いい (good), 関係(to relate), やる (do), かける (build, hang, run, lack)
Figure 6. The deleting predicates for all nouns
We use the selected predicates as common sense knowledge and add them to each noun. In particular, we calculate weighted scores for the predicates co-occurring with a noun using the Harman normalized frequency. A predicate is correct common sense knowledge for a noun when its score is high. The Harman normalized frequency is defined as follows (n: noun, a: predicate, n_{a,n}: appearance frequency of predicate a with noun n).
TF(a, n) = log2(n_{a,n} + 1) / log2(Σ_k n_{k,n})    (1)
IV. EVALUATION
A. Evaluation method
We compare the added common sense knowledge to the following baselines.
(1) Do not delete any predicates; just use the predicates weighted by Harman normalized frequency (baseline1).
(2) Do not delete any predicates; just use the predicates weighted by Equation 2, which is based on TF-IDF (baseline2).
(3) Remove the 427 deleting predicates in N ≤ 700 shown in Table I, and use the predicates weighted by Harman normalized frequency (baseline3).
We compare our method with baseline1 and baseline2 to verify the effect of deleting predicates that frequently co-occur with unique nouns. Moreover, by comparing the proposed method to baseline3, we check the effect of shifting the number of deleting predicates for each noun.
The following equation shows the weighted score for a predicate a that co-occurs with a noun n, as used by baseline2 (|N|: the number of nouns; |N_a|: the number of nouns that co-occur with predicate a).
wgt(a, n) = TF(a, n) × (log2(|N| / |N_a|) + 1)    (2)
B. Evaluation Result
We take three different nouns for evaluation and show their assigned predicates ranked in the top 10 (Table II). The proposed method can assign appropriate predicates as common sense knowledge to most nouns. In baseline1 and baseline2, on the other hand, predicates that frequently co-occur with arbitrary nouns are ranked much higher than in the proposed method. For example, for the noun 「犬 (dog)」, the predicates 「なる (become)」, 「いる (be)」 and 「一緒 (be together)」, which co-occur with arbitrary nouns, appear at higher ranks in baseline1 and baseline2. In the proposed method, however, such incorrect predicates are deleted.
Deletion: the 33 predicates are considered not to be CSK and are deleted from all nouns as incorrect predicates.
Example of deleting predicates: (understand), (have), (see, look), (become), (nothing), (take, adopt, prefer), (can), (know), (come), (think), (many), (be, need, shoot)
Added CSK for each noun
• The following equation computes weighted scores for predicates co-occurring with a noun using the Harman normalized frequency.
A predicate is accepted as correct CSK for a noun when its score is high.
(n: noun, a: predicate, n_{a,n}: appearance frequency of predicate a with noun n)
Baselines
1) Do not delete any predicates; just use the predicates weighted by Harman normalized frequency (baseline 1)
2) Do not delete any predicates; just use the predicates weighted by the TF-IDF score (baseline 2)
3) Remove the 427 deleting predicates in N ≤ 700, and use the predicates weighted by Harman normalized frequency (baseline 3)
[Table II. Example of assigned predicates added to 「dog」. Baseline 1 and baseline 2 rank generic predicates such as have, become, be, to live, see, say, (be together), (fun), (cheap), (understand); baseline 3 and the proposed approach rank dog-specific predicates such as (do not eat), (to take out for a walk), (do not breed), (bring up), (bite to death), (be sick), (do not bark), (to give a lethal injection), (to tether), (to train), (bark), (cute).]
Annotation: incorrect predicates that co-occurred frequently with many nouns
Inappropriate predicates are deleted
Error Analysis 1/3
• Although a predicate may co-occur with a noun many times, some pairs are unrelated – we do not check the dependency relation between them
Solution: use only the predicates that depend on the target noun as CSK candidates
Error Analysis 2/3
• Could not assign nouns that can also be used as suffixes to appropriate predicates – 美しい月です (This is a beautiful moon) – 月ごとに決済する (We make a charge for each month)
Solution: utilize the relations of other co-occurring nouns; e.g., if 「月」 co-occurs with the noun 「太陽 (sun)」, it likely means the moon
Error Analysis 3/3
• Includes nouns that are used for defining relations between nouns – 原因 (cause) – 理由 (reason)
Solution: discuss how to limit the nouns targeted for addition
Conclusion
• Described a selection method for appropriate predicates as CSK for constructing a CSKB
  – A method for statistically selecting the CSK of nouns utilizing the unique number of co-occurring predicates
• Evaluated the sets of CSK assigned to each noun against three baselines
  – Demonstrated the assumed characteristics of CSK in our study
  – Gave a subjective evaluation
• Plan to make a quantitative evaluation