Automatic Selection of Predicates for Common Sense Knowledge Expression

Ai Makabi, Kazuhide Yamamoto, Hiroshi Matsumoto, Nagaoka University of Technology

Post on 06-Jul-2015


DESCRIPTION

Ai Makabi, Hiroshi Matsumoto and Kazuhide Yamamoto. Automatic Selection of Predicates for Common Sense Knowledge Expression. Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING 2013), no page numbers (2013.9)

TRANSCRIPT

Page 1: Automatic Selection of Predicates for Common Sense Knowledge Expression

Automatic selection of predicates for common sense knowledge expression

Ai Makabi, Kazuhide Yamamoto, Hiroshi Matsumoto

Nagaoka University of Technology

Page 2: Automatic Selection of Predicates for Common Sense Knowledge Expression

Background

•  To develop an intelligent computer
  –  Grammatical knowledge
  –  Large amount of common sense knowledge (CSK)

e.g. Conversational system

User's Remark: I am suffering from Pochi's biting

Response: Do you browse a Web site for dog to discipline?

–  Pochi is a name of a dog
–  Overcome biting behavior: To train a dog

Page 3: Automatic Selection of Predicates for Common Sense Knowledge Expression

Background

Focus on methods:
  –  Building a common sense knowledge base (CSKB)
  –  Providing accessible representation for use in natural language processing tasks

Page 4: Automatic Selection of Predicates for Common Sense Knowledge Expression

Related Works 1/2

•  Existing Upper Ontologies (SUMO, Cyc, etc.)
  –  Contain many general concepts
  –  e.g. Collection: book
    •  A Type of: Information bearing object in the form of paper
    •  Instance of: Kind of artifact not distinguished by brand or model

•  Merits:
  –  Exploit rigorously-defined CSK

•  Demerits:
  –  Knowledge representation cannot be matched fully with actual expressions

Page 5: Automatic Selection of Predicates for Common Sense Knowledge Expression

Related Works 2/2

•  Defining the CSK as relations added to sentences/words (ConceptNet)
  –  e.g. 犬 (dog)
    •  CapableOf: 散歩 (walk), 寝る (sleep)
    •  SymbolOf: 忠誠 (loyalty)

•  Merits:
  –  Definition is better suited to natural language processing tasks

•  Demerits:
  –  For the Japanese ConceptNet, most concepts are collected manually
    •  Coverage of CSK is exceptionally low

Page 6: Automatic Selection of Predicates for Common Sense Knowledge Expression

Goal of the Study

•  Automatically construct a Japanese CSKB that can be utilized for semantic analysis in natural language processing

Set of predicates that co-occur with a noun → CSK

Predicates: verb, adjective, verbal noun

CSK of “dog”
  –  verb: bark, run
  –  adjective: pretty, cute
  –  verbal noun: to train, to breed

Page 7: Automatic Selection of Predicates for Common Sense Knowledge Expression

Final goal - Overview of the CSKB

Concept (noun): CSK (predicate)
  –  cat: mew, meow, to breed, pretty
  –  puppy: yelp, to breed, pretty
  –  dog: bark, to breed, pretty

Compute a similarity between nouns (compare at the predicate level)

Aggregate concepts as an upper concept “animal” based on the similarity
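The comparison at the predicate level could be sketched as follows. The slide does not state which similarity measure is used, so the Jaccard overlap between predicate sets is an assumption here, as are the function name `jaccard` and the toy `csk` dictionary (its entries are taken from the figure).

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two predicate sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy CSK entries: each noun maps to the set of predicates assigned to it.
csk = {
    "cat": {"mew", "meow", "to breed", "pretty"},
    "puppy": {"yelp", "to breed", "pretty"},
    "dog": {"bark", "to breed", "pretty"},
}

# Nouns that share many predicates score high and are candidates for
# aggregation under a common upper concept such as "animal".
sim_dog_puppy = jaccard(csk["dog"], csk["puppy"])
sim_dog_cat = jaccard(csk["dog"], csk["cat"])
print(sim_dog_puppy, sim_dog_cat)
```

Any set or vector similarity could be substituted; the point is only that nouns are compared through their predicates rather than through their surface forms.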

Page 8: Automatic Selection of Predicates for Common Sense Knowledge Expression

Specific Property of CSK

•  We make three hypotheses:
  1)  The predicate a is the CSK of the noun n when the pair of a and n frequently co-occurs in sentences.
  2)  A predicate a which co-occurs with any noun is not appropriate CSK.
  3)  Whether the predicate a is correct CSK or not depends on the number of unique nouns which co-occur with a.
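The two statistics these hypotheses rely on, the co-occurrence frequency of a (noun, predicate) pair and the number of unique nouns a predicate co-occurs with, can be sketched as below. This is only an illustration: it assumes the corpus has already been reduced to (noun, predicate) pairs, and the function name `count_cooccurrences` and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict


def count_cooccurrences(pairs):
    """Return (pair frequency, number of unique nouns per predicate)."""
    # Hypothesis 1: how often each (noun, predicate) pair co-occurs.
    pair_freq = Counter(pairs)
    # Hypotheses 2 and 3: how many distinct nouns each predicate co-occurs with.
    nouns_per_predicate = defaultdict(set)
    for noun, predicate in pairs:
        nouns_per_predicate[predicate].add(noun)
    unique_noun_count = {p: len(ns) for p, ns in nouns_per_predicate.items()}
    return pair_freq, unique_noun_count


# Hypothetical (noun, predicate) pairs extracted from sentences.
pairs = [
    ("dog", "bark"), ("dog", "bark"), ("dog", "be"),
    ("school", "attend"), ("school", "be"), ("cat", "be"),
]
pair_freq, unique_noun_count = count_cooccurrences(pairs)
print(pair_freq[("dog", "bark")], unique_noun_count["be"])
```

In this toy data "be" co-occurs with every noun, which is the kind of predicate hypothesis 2 treats as inappropriate CSK.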

Page 10: Automatic Selection of Predicates for Common Sense Knowledge Expression

Specific Property of CSK

Hypothesis 1) The predicate a is the CSK of the noun n when the pair of a and n frequently co-occurs in sentences.

Example: “I attend elementary school everyday” (predicate: attend; noun: elementary school)

The pair co-occurs with high frequency
→ The predicate “attend” is the CSK with high probability

Page 11: Automatic Selection of Predicates for Common Sense Knowledge Expression

Automatic Selection of Predicates

The top 10 predicates adding to a noun “(elementary school)”

(to enroll in school), (to educate), (be), (become), (to finish school), (to give lessons), (to take an exam), (attend), (to learn), (to coach)

Axis: high to low

The predicates placed higher in the list are considered more appropriate as the CSK

Page 13: Automatic Selection of Predicates for Common Sense Knowledge Expression

Automatic Selection of Predicates

The top 10 predicates adding to a noun “(elementary school)”

(to enroll in school), (to educate), (be), (become), (to finish school), (to give lessons), (to take an exam), (attend), (to learn), (to coach)

Predicates with high co-occurrence frequency with a noun, but which cannot characterize the noun:

Incorrect CSK
•  Versatile words
•  Co-occurred with many nouns

Page 14: Automatic Selection of Predicates for Common Sense Knowledge Expression

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Horizontal axis: number of unique nouns co-occurring with predicate (0 to 1,000). Vertical axis: number of unique predicates (0 to 2,500).]

The predicates which fall under a certain scope co-occur with many nouns → Delete the predicates as “deleting predicates”
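A distribution of this kind could be tabulated as in the sketch below. It assumes that the number of unique nouns per predicate is already known, and that the vertical axis counts predicates with exactly that many nouns; the slides do not say whether the plotted count is exact or cumulative. The function name `emergence_distribution` and the sample counts are hypothetical.

```python
from collections import Counter


def emergence_distribution(unique_noun_count):
    """Map k -> number of predicates that co-occur with exactly k unique nouns."""
    return Counter(unique_noun_count.values())


# Hypothetical input: predicate -> number of unique nouns it co-occurs with.
unique_noun_count = {"bark": 1, "attend": 1, "be": 3, "become": 3, "have": 2}
dist = emergence_distribution(unique_noun_count)

# Print the distribution sorted by the number of unique nouns (horizontal axis).
for k in sorted(dist):
    print(k, dist[k])
```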

Page 15: Automatic Selection of Predicates for Common Sense Knowledge Expression

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Horizontal axis: number of unique nouns co-occurring with predicate. Vertical axis: number of unique predicates.]

The number of unique predicates which co-occur with 800 nouns is 1000

Page 16: Automatic Selection of Predicates for Common Sense Knowledge Expression

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Horizontal axis: number of unique nouns co-occurring with predicate. Vertical axis: number of unique predicates. Region labels: “co-occurring with many nouns”, “co-occurring with few nouns”.]

Page 17: Automatic Selection of Predicates for Common Sense Knowledge Expression

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Horizontal axis: number of unique nouns co-occurring with predicate. Vertical axis: number of unique predicates.]

This contains the incorrect predicates based on hypothesis (2) (sharply increased)

Page 18: Automatic Selection of Predicates for Common Sense Knowledge Expression

[Figure: Emergence distribution of predicates in the top 1,000 nouns. Horizontal axis: number of unique nouns co-occurring with predicate (logarithm). Vertical axis: number of unique predicates (logarithm). Annotations: power approximated curve, logarithmic curve, inflection point.]

Page 20: Automatic Selection of Predicates for Common Sense Knowledge Expression

Specific Property of CSK

Hypothesis 3) Whether the predicate a is correct CSK or not depends on the number of unique nouns which co-occur with a.

Sort the nouns by the number of co-occurring predicates:
information, person, product ... runner, database, piano

The predicate “run” could not characterize the noun “person”

The predicate “run” could characterize the noun “runner”

A noun which co-occurs with many predicates cannot be characterized by generic predicates; hence, the number of its deleting predicates is larger than for nouns co-occurring with a few predicates.

Page 21: Automatic Selection of Predicates for Common Sense Knowledge Expression

[Figure: Emergence distribution of the top N predicates co-occurring with noun. Panels: N=100, N=1000. Annotation on the N=1000 panel: “Deleting predicates”.]

High-ranked nouns will have more deleting predicates when nouns are sorted in the order of predicate co-occurrence.

In the case where the value of N is 100, only the top 100 nouns are taken into account

Page 22: Automatic Selection of Predicates for Common Sense Knowledge Expression

[Figure: The number of deleting predicates (vertical axis) against the top N nouns co-occurring with many predicates (horizontal axis), decreasing in a staircase pattern.]

The number of deleting predicates for each noun is decided based on hypothesis (3).

Singular points at N=700, 1,100, 1,600, 2,500 or 3,600.

Page 23: Automatic Selection of Predicates for Common Sense Knowledge Expression

Number of deleting predicates for each noun

Figure 4. Emergence distribution of the top N predicates co-occurring with noun: (1) N=100, (2) N=1,000, (3) N=10,000 (horizontal axis: number of unique nouns co-occurring with predicates; vertical axis: number of unique predicates (logarithm))

Table I. THE NUMBER OF DELETING PREDICATES FOR EACH NOUN (N = THE UNIQUE NUMBER OF CO-OCCURRED PREDICATES)

Scope of the nouns    Deletion
N≤700                 427
700<N≤1,100           267
1,100<N≤1,600         143
1,600<N≤2,500         73
others                33

However, the 33 predicates which get deleted when N=3,600 can be used with nearly all nouns, so we consider that they are not common sense knowledge, and delete them from all nouns as incorrect predicates. Figure 6 shows a list of the deleting predicates for all nouns.

わかる (understand), もつ (have), みる (see, look), なる (become), ない (nothing), とる (take, adopt, prefer), できる (can), つく (get, prick, arrive, strike), しる (know), くる (come), おもう (think), おおい (many), いる (be, need, shoot), いう (say), ある (be), 良い (good), 入る (enter, join), でる (leave, go out, attend), つくる (make), つかう (use), きく (hear, ask, be effective), かく (write, scratch), おこなう (do), 紹介 (introduce), よい (good), ゆく (go), たつ (stand, build, pass), たかい (high, expensive), おる (be, fold), いい (good), 関係 (to relate), やる (do), かける (build, hang, run, lack)

Figure 6. The deleting predicates for all nouns
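The scopes of Table I amount to a step function from N to the number of deleting predicates. A minimal sketch of that lookup follows; the names `UPPER_BOUNDS`, `DELETIONS` and `num_deleting_predicates` are hypothetical, and `n` stands for whatever value N takes for the noun under the table's definition.

```python
import bisect

# Inclusive upper bounds of the scopes in Table I, in ascending order.
UPPER_BOUNDS = [700, 1100, 1600, 2500]
# Number of deleting predicates per scope; the last entry is the "others" row.
DELETIONS = [427, 267, 143, 73, 33]


def num_deleting_predicates(n: int) -> int:
    """Return the Table I deletion count for a noun whose N equals n."""
    # bisect_left returns the index of the first bound >= n, which keeps
    # the upper bounds inclusive (e.g. n = 700 falls in the first scope).
    return DELETIONS[bisect.bisect_left(UPPER_BOUNDS, n)]


for n in (700, 701, 1600, 3600):
    print(n, num_deleting_predicates(n))
```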

We use the selected predicates as common sense knowledge, and add them to each noun. In particular, we calculate the weighted scores for predicates co-occurring with a noun using Harman normalized frequency. A predicate is correct common sense knowledge for a noun when the predicate score is high. The equation of Harman normalized frequency is as follows (n: noun, a: predicate, $n_{a,n}$: appearance frequency of predicate a with noun n).

$$\mathrm{TF}(a, n) = \frac{\log_2(n_{a,n} + 1)}{\log_2\left(\sum_k n_{k,n}\right)} \tag{1}$$
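Equation (1) can be evaluated directly from the co-occurrence frequencies of a single noun. The sketch below is an illustration only: the function name `harman_tf` and the sample frequencies are hypothetical, and raising an error when the total frequency is 1 or less (where the denominator is zero or undefined) is a choice made here, not something the paper specifies.

```python
import math


def harman_tf(freq: dict[str, int], predicate: str) -> float:
    """Harman normalized frequency of `predicate` for one noun.

    `freq` maps each predicate k to n_{k,n}, its co-occurrence
    frequency with the noun n.
    """
    total = sum(freq.values())
    if total <= 1:
        raise ValueError("total frequency must exceed 1")
    return math.log2(freq.get(predicate, 0) + 1) / math.log2(total)


# Hypothetical frequencies of predicates co-occurring with the noun "dog".
freq_dog = {"bark": 7, "walk": 3, "be": 6}
tf_bark = harman_tf(freq_dog, "bark")  # log2(8) / log2(16)
tf_walk = harman_tf(freq_dog, "walk")  # log2(4) / log2(16)
print(tf_bark, tf_walk)
```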

IV. EVALUATION

A. Evaluation method

We compare the added common sense knowledge with the following baselines.

(1) Do not delete any predicates; just use the predicates weighted by Harman normalized frequency (baseline1).

(2) Do not delete any predicates; just use the predicates weighted by equation 2, which is based on TF-IDF (baseline2).

(3) Remove the 427 deleting predicates for N≤700 shown in Table I, and use the predicates weighted by Harman normalized frequency (baseline3).

We compare our method with baseline1 and baseline2 to verify the effect of deleting predicates which frequently co-occurred with unique nouns. Moreover, by comparing the proposed method with baseline3, we check the effect of shifting the number of deleting predicates for each noun.

The following equation shows the weighted score, used by baseline2, for a predicate a which co-occurred with a noun n ($|N|$: the total number of nouns, $|N_a|$: the number of nouns which co-occurred with the predicate a).

$$\mathrm{wgt}(a, n) = \mathrm{TF}(a, n) \times \left(\log_2 \frac{|N|}{|N_a|} + 1\right) \tag{2}$$
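As a worked instance of equation (2), the sketch below multiplies a given TF value by the IDF-style factor. The function name `tfidf_weight` and the sample numbers are hypothetical; the TF value is taken as an input so that the example stands on its own.

```python
import math


def tfidf_weight(tf: float, num_nouns: int, num_nouns_with_predicate: int) -> float:
    """Baseline2 weight: TF(a, n) * (log2(|N| / |N_a|) + 1)."""
    if num_nouns_with_predicate <= 0:
        raise ValueError("the predicate must co-occur with at least one noun")
    return tf * (math.log2(num_nouns / num_nouns_with_predicate) + 1)


# A predicate seen with 250 of 1,000 nouns: the factor is log2(4) + 1 = 3.
w_specific = tfidf_weight(0.75, 1000, 250)
# A predicate seen with every noun: the factor is log2(1) + 1 = 1.
w_generic = tfidf_weight(0.75, 1000, 1000)
print(w_specific, w_generic)
```

Because of the added 1, a predicate that co-occurs with every noun still keeps its full TF score under this weighting rather than dropping to zero.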

B. Evaluation Result

We take three different nouns for evaluation, and show their assigned predicates which ranked in the top 10 as follows (Table II). The proposed method can assign appropriate predicates to a noun as common sense knowledge for most nouns. On the other hand, in baseline1 and baseline2, predicates which frequently co-occurred with any noun are ranked much higher than in the proposed method. For example, for the noun “犬 (dog)”, the predicates “なる (become)”, “いる (be)” and “一緒 (be together)”, which co-occurred with any noun, appear at higher ranks in baseline1 and baseline2. However, in the proposed method, incorrect predicates are deleted

Deletion: Consider that the 33 predicates are not CSK, and delete them from all nouns as incorrect predicates

Example of deleting predicates: (understand), (have), (see, look), (become), (nothing), (take, adopt, prefer), (can), (know), (come), (think), (many), (be, need, shoot)

Page 24: Automatic Selection of Predicates for Common Sense Knowledge Expression

Added  CSK  for  each  noun


•  The following equation computes weighted scores for predicates co-occurring with a noun, using Harman normalized frequency:

$$\mathrm{TF}(a, n) = \frac{\log_2(n_{a,n} + 1)}{\log_2\left(\sum_k n_{k,n}\right)} \tag{1}$$

n: noun; a: predicate; $n_{a,n}$: appearance frequency of predicate a with noun n

A predicate is regarded as correct CSK for a noun when the predicate score is high.

Page 25: Automatic Selection of Predicates for Common Sense Knowledge Expression

Baselines

1)  Do not delete any predicates; just use the predicates weighted by Harman normalized frequency (baseline 1)

2)  Do not delete any predicates; just use the predicates weighted by TF-IDF score (baseline 2)

3)  Remove the 427 deleting predicates for N≤700, and use the predicates weighted by Harman normalized frequency (baseline 3)

Page 26: Automatic Selection of Predicates for Common Sense Knowledge Expression

Example of assigned predicates adding to “dog”

Baseline 1    Baseline 2       Baseline 3                      Approach
have          have             (do not eat)                    (to take out for a walk)
become        (be together)    (do not breed)                  (bring up)
be            (to live)        (bite to death)                 (be sick)
be            (to sale)        (do not bark)                   (take someone to tow)
to live       (fun)            (to give a lethal injection)    (live)
see           (cheap)          (to tether)                     (to train)
be none       (understand)     (to train)                      (bark)
say           (to register)    (get all thin)                  (cute)

Page 27: Automatic Selection of Predicates for Common Sense Knowledge Expression

Example of assigned predicates adding to “dog” (Baseline 1, Baseline 2, Baseline 3, Approach)

Highlighted: incorrect predicates that co-occurred frequently with many nouns

Page 28: Automatic Selection of Predicates for Common Sense Knowledge Expression

Example of assigned predicates adding to “dog” (Baseline 1, Baseline 2, Baseline 3, Approach)

Highlighted: appropriate predicates are deleted

Page 29: Automatic Selection of Predicates for Common Sense Knowledge Expression

Error  Analysis  1/3

•  Although a predicate co-occurs with a noun many times, some pairs are unrelated
  –  The dependency relation between them is not checked

Solution: Use only the predicates which depend on the target nouns as candidates for CSK

Page 30: Automatic Selection of Predicates for Common Sense Knowledge Expression

Error  Analysis  2/3

•  Could not assign appropriate predicates to nouns which can also be used as a suffix
  –  美しい月です (This is the beautiful moon)
  –  月ごとに決済する (We make a charge for each month)

Solution: Utilize the relation with other co-occurring nouns, e.g., if “月” co-occurs with the noun “太陽 (sun)”, it may mean the moon

Page 31: Automatic Selection of Predicates for Common Sense Knowledge Expression

Error  Analysis  3/3

•  Includes nouns which are used for defining the relation between nouns
  –  原因 (cause)
  –  理由 (reason)

Solution: Discuss how to limit the nouns to which CSK is added

Page 32: Automatic Selection of Predicates for Common Sense Knowledge Expression

Conclusion

•  Described the method for selecting appropriate predicates as CSK for constructing the CSKB.
  –  Method for statistically selecting the CSK of nouns utilizing the unique number of co-occurred predicates.

•  Evaluated the sets of CSK assigned to each noun, compared with three baselines
  –  Demonstrated the assumed characteristics of CSK in our study
  –  Gave a subjective evaluation

•  Plan to make a quantitative evaluation