E-HowNet: A Lexical Knowledge Representation System
Keh-Jiann Chen
Principal Investigator, Core Platforms for Digital Contents Project, TELDAP
Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica
Outline
– What is E-HowNet?
– E-HowNet Sense Representation
– Major Features
– Current Status of E-HowNet
– Automatic Construction of Ontology
– Apply the Framework to Metadata Representation of Digital Collections
– Conclusion and Future Work
What is E-HowNet?
– E-HowNet is an entity-relation model for lexical semantic representation, extended from HowNet.
– E-HowNet is designed to support automatic semantic composition and decomposition.
E-HowNet Sense Representation
Word sense definition: decompose a sense into simpler senses and sense relations.
– 果盤 (fruit plate)
  def: {plate|盤:telic={put|放置:location={~},patient={fruit|水果}}}
– 玻璃盤 (glass plate)
  def: {plate|盤:material={glass|玻璃}}
– 圓盤 (round plate)
  def: {plate|盤:shape={round|圓}}
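The bracketed notation above is regular enough to be read into a tree. The following is a minimal sketch, not E-HowNet's own tooling: the names `Node` and `parse` are hypothetical, and the grammar (a head, then optional comma-separated `role={...}` features) is inferred from the examples on this slide. Conjunctions written with `.and.` are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One concept: a head such as 'plate|盤' plus its role -> Node features."""
    head: str
    features: dict = field(default_factory=dict)


def parse(text):
    """Parse one expression such as '{plate|盤:shape={round|圓}}' into a Node."""
    s = "".join(text.split())  # the slides contain stray spaces; drop them
    node, pos = _parse_node(s, 0)
    if pos != len(s):
        raise ValueError(f"trailing text at position {pos}")
    return node


def _parse_node(s, pos):
    if pos >= len(s) or s[pos] != "{":
        raise ValueError(f"expected '{{' at position {pos}")
    pos += 1
    start = pos
    while pos < len(s) and s[pos] not in ":}":
        pos += 1
    node = Node(head=s[start:pos])
    if pos < len(s) and s[pos] == ":":
        pos += 1
        while True:
            eq = s.index("=", pos)  # raises ValueError if no role name follows
            child, pos_after = _parse_node(s, eq + 1)
            node.features[s[pos:eq]] = child
            pos = pos_after
            if pos < len(s) and s[pos] == ",":
                pos += 1
                continue
            break
    if pos >= len(s) or s[pos] != "}":
        raise ValueError(f"expected '}}' at position {pos}")
    return node, pos + 1


fruit_plate = parse("{plate|盤:telic={put|放置:location={~},patient={fruit|水果}}}")
print(fruit_plate.head)
print(sorted(fruit_plate.features["telic"].features))
```

In this sketch the self-reference `{~}` is kept as an ordinary node whose head is `~`; resolving it to the concept being defined is left out.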
Principles for sense definitions
– Use the hypernym and prominent properties to define concepts.
  • Qualia structure: agentive, telic, formal, and constitutive
– Use well-defined or primitive concepts and relations to define new concepts.
Telic
– 狗食 (dog food)
  def: {食物:telic={餵:target={狗},patient={~}}}
  def: {food|食品:telic={feed|餵:target={livestock|牲畜:telic={TakeCare|照料:patient={family|家庭},agent={~}}},patient={~}}}
Agentive
– 早產兒 (premature baby)
  def: {嬰兒:agentive={早產:patient={~}}}
  def: {human|人:age={child|少兒},agentive={labour|臨產:manner={early|早},patient={~}}}
Formal
– 彩霞 (rosy clouds)
  def: {CloudMist|雲霧:color={colored|彩}}
– 酸辣湯 (spicy and sour soup)
  def: {湯:taste={酸}.and.{辣}}
  def: {food|食品:material={StateLiquid|液態},taste={sour|酸}.and.{peppery|辣}}
Constitutive
– 草裙 (grass skirt)
  def: {裙:material={草}}
  def: {clothing|衣物:telic={PutOn|穿戴:instrument={~},location={leg|腿:whole={human|人:gender={female|女}}}},material={FlowerGrass|花草}}
Major Features
– Lexical senses are expressed by either primitive concepts (sememes) or basic concepts.
– Semantic relations are explicitly expressed in E-HowNet representations.
– A uniform representation for function words, content words and phrases.
– Taxonomy for both entities and relations.
– Semantic composition and decomposition capabilities.
Uniform representation and compositional semantics
– Preposition: 把 (ba)
  def: goal={}
– Noun: 文章 (article)
  def: {text|語文}
– Verb: 寫好 (have written)
  def: {write|寫:aspect={Vachieve|達成}}
– Phrase: 把文章寫好 (The article has been written.)
  def: {write|寫:goal={text|語文},aspect={Vachieve|達成}}
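The composition of 把文章寫好 can be walked through in a few lines. This is only an illustrative sketch under assumptions: concepts are plain dictionaries, and the helpers `concept`, `fill_role`, `attach` and `render` are hypothetical names, not part of any E-HowNet software. It treats the function word 把 as contributing an empty `goal` slot, the noun as filling that slot, and the verb as the head that receives the filled role.

```python
import copy


def concept(head, **features):
    """Build a concept as {'head': ..., 'features': {role: concept}}."""
    return {"head": head, "features": dict(features)}


def fill_role(role, filler):
    """A function word such as 把 (def: goal={}) offers an empty role; fill it."""
    return {role: filler}


def attach(event, role_fillers):
    """Return a copy of the event concept with the filled roles added."""
    result = copy.deepcopy(event)
    for role, filler in role_fillers.items():
        if role in result["features"]:
            raise ValueError(f"role {role!r} is already filled")
        result["features"][role] = copy.deepcopy(filler)
    return result


def render(c):
    """Write a concept back in the bracketed notation used on the slides."""
    if not c["features"]:
        return "{" + c["head"] + "}"
    feats = ",".join(f"{role}={render(v)}" for role, v in c["features"].items())
    return "{" + c["head"] + ":" + feats + "}"


article = concept("text|語文")                                     # 文章
written = concept("write|寫", aspect=concept("Vachieve|達成"))      # 寫好
ba_article = fill_role("goal", article)                            # 把 + 文章
print(render(attach(written, ba_article)))
```

The rendered result lists `aspect` before `goal`, whereas the slide lists `goal` first; the sketch treats features as an unordered set, so the two are taken to be the same expression.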
Taxonomy of E-HowNet
http://ehownet.iis.sinica.edu.tw
• All|全
  • entity|事物
    – event|事件
      • state|狀態
      • Act|行動
      • AttributeValue|屬性值
    – object|物體
      • thing|萬物
      • time|時間
      • space|空間
  • relation|關係
    – Semantic Role|語意角色
    – function|函數
Current status of E-HowNet
– Coarse-grained E-HowNet sense representations for about 95,000 word-sense entries of the CKIP Chinese dictionary
  • About 45,000 different sense expressions
  • About 2,600 semantic primitives (sememes, 義原)
  • About 200 semantic roles for objects
  • About 70 semantic roles for events
– An automatically constructed ontology, built by appending all word senses to the HowNet top-level ontology and structuring them
Automatic construction of ontology
Starting from the top-level ontology (modified from the HowNet ontology), the lower-level ontology is created from the subsumption relations among E-HowNet expressions.
– Attach lexical senses: words and their associated sense expressions are first attached to the top-level ontology nodes according to their head concepts.
– Sub-categorization by attribute-values: lexical concepts with the same semantic head are further sub-categorized (creating a new node) according to their attribute-values.
– Repeat the sub-categorization step if a node still contains many lexical concepts that share the same extended feature values.
Examples:
衣衫, {clothing|衣物}
木屐, {clothing|衣物:location={foot|腳},material={wood|木}}
木鞋, {clothing|衣物:location={foot|腳},material={wood|木}}
球鞋, {clothing|衣物:location={foot|腳},while={exercise|鍛鍊}}
溜冰鞋, {clothing|衣物:location={foot|腳},while={slide|滑:location={ice|冰},purpose={exercise|鍛鍊:domain={sport|體育}}}}
靴子, {clothing|衣物:location={foot|腳},length={LengthLong|長}}
運動褲, {clothing|衣物:location={leg|腿},while={exercise|鍛鍊}}
褲子, {clothing|衣物:location={leg|腿}}
內衣, {clothing|衣物:qualification={private|私}}
禮服, {clothing|衣物:qualification={formal|正式}}
白紗, {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}}
婚紗, {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}}
Attach all lexical senses:
{clothing|衣物} [衣衫, 木屐, 木鞋, 球鞋, 溜冰鞋, 靴子, 運動褲, 褲子, 內衣, 禮服, 白紗, 婚紗]

Sub-categorization by attribute-values:
{clothing|衣物} [衣衫]
– 鞋子|shoes [木屐, 木鞋, 球鞋, 溜冰鞋, 靴子]
– 褲子|trousers [褲子, 運動褲]
– 內衣|underwear [內衣]
– 禮服|ceremonial robe/dress [禮服, 白紗, 婚紗]

Repeat sub-categorization step:
{clothing|衣物} [衣衫]
– 鞋子|shoes [球鞋, 溜冰鞋, 靴子]
  • {木屐} [木屐, 木鞋]
– 褲子|trousers [褲子, 運動褲]
– 內衣|underwear [內衣]
– 禮服|ceremonial robe/dress [禮服]
  • {白紗} [白紗, 婚紗]
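The clothing example can be reproduced with a short recursive grouping routine. This is a sketch of one possible reading of the slides, not the authors' implementation. Its assumptions: each word's features are kept as an ordered list of strings, grouping always uses the next feature in that list, the first pass opens a node for every attribute-value, and later passes open a node only when at least two words share the value. The names `LEXICON` and `build` are hypothetical.

```python
from collections import defaultdict

OWNER = "owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}"
SKATE = "while={slide|滑:location={ice|冰},purpose={exercise|鍛鍊:domain={sport|體育}}}"

# Features of each word under the shared head {clothing|衣物}, in slide order.
LEXICON = {
    "衣衫": [],
    "木屐": ["location={foot|腳}", "material={wood|木}"],
    "木鞋": ["location={foot|腳}", "material={wood|木}"],
    "球鞋": ["location={foot|腳}", "while={exercise|鍛鍊}"],
    "溜冰鞋": ["location={foot|腳}", SKATE],
    "靴子": ["location={foot|腳}", "length={LengthLong|長}"],
    "運動褲": ["location={leg|腿}", "while={exercise|鍛鍊}"],
    "褲子": ["location={leg|腿}"],
    "內衣": ["qualification={private|私}"],
    "禮服": ["qualification={formal|正式}"],
    "白紗": ["qualification={formal|正式}", OWNER],
    "婚紗": ["qualification={formal|正式}", OWNER],
}


def build(lexicon, words, depth=0):
    """Group words by their feature at position `depth`, then recurse."""
    node = {"words": [], "children": {}}
    groups = defaultdict(list)
    for word in words:
        feats = lexicon[word]
        if len(feats) <= depth:
            node["words"].append(word)         # nothing left to split on
        else:
            groups[feats[depth]].append(word)
    min_size = 1 if depth == 0 else 2          # assumption, see lead-in
    for value, members in groups.items():
        if len(members) >= min_size:
            node["children"][value] = build(lexicon, members, depth + 1)
        else:
            node["words"].extend(members)      # singleton stays in this node
    return node


tree = build(LEXICON, list(LEXICON))
shoes = tree["children"]["location={foot|腳}"]
print(tree["words"])
print(shoes["words"], list(shoes["children"]))
```

The sketch labels each new node by its attribute-value string; choosing a representative word such as 鞋子 or 木屐 as the node name, as the slides do, is not attempted here.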
Apply the Framework to Metadata Representation of Digital Collections
Glosses: 瓶子 vase, 瓷瓶 porcelain vase, 盆 basin, 瓷盆 porcelain basin, 瓷 porcelain, 北宋 Northern Song, 汝窯 Ru ware, 水仙 narcissus.

– 奉華紙槌瓶 (paper-mallet vase with "Fenghua" inscription)
  奉華紙槌瓶={瓷瓶:Time={北宋},Type={汝窯}}
  瓷瓶={瓶子:material={瓷}}
  Substituting the definition of 瓷瓶 gives:
  奉華紙槌瓶={瓶子:material={瓷},Time={北宋},Type={汝窯}}

– 青瓷水仙盆 (celadon narcissus basin)
  青瓷水仙盆={瓷盆:Time={北宋},Type={汝窯},Telic={水仙}}
  瓷盆={盆:material={瓷}}
  Substituting the definition of 瓷盆 gives:
  青瓷水仙盆={盆:material={瓷},Time={北宋},Type={汝窯},Telic={水仙}}
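The substitution step used for both artifacts can be sketched as follows. This is an illustration under assumptions: definitions are stored as a (head, features) pair per term, features are flat strings rather than nested expressions, and `DEFS` and `expand` are hypothetical names. A head is replaced by its own definition until an undefined head is reached, and inherited features are merged with the term's own.

```python
# Each term maps to (head concept, {attribute: value}); data taken from the slides.
DEFS = {
    "奉華紙槌瓶": ("瓷瓶", {"Time": "北宋", "Type": "汝窯"}),
    "瓷瓶": ("瓶子", {"material": "瓷"}),
    "青瓷水仙盆": ("瓷盆", {"Time": "北宋", "Type": "汝窯", "Telic": "水仙"}),
    "瓷盆": ("盆", {"material": "瓷"}),
}


def expand(term, defs):
    """Replace the head by its definition repeatedly, merging features."""
    head, features = defs[term]
    merged = dict(features)
    seen = {term}
    while head in defs:
        if head in seen:
            raise ValueError(f"circular definition at {head!r}")
        seen.add(head)
        head, inherited = defs[head]
        # Inherited features come first; the more specific term wins on conflict.
        merged = {**inherited, **merged}
    return head, merged


def render(head, features):
    """Write the expanded definition in the notation used on the slides."""
    feats = ",".join(f"{k}={{{v}}}" for k, v in features.items())
    return "{" + head + ":" + feats + "}"


for name in ("奉華紙槌瓶", "青瓷水仙盆"):
    print(name + "=" + render(*expand(name, DEFS)))
```

Letting the more specific term override an inherited attribute is a design choice of this sketch; the slides do not show a conflicting case.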
Conclusion and Future Work
– E-HowNet sense representations are updated from time to time.
– The ontology can be rebuilt automatically based on the refined expressions.
– New categories in the taxonomy can be identified and characterized by their specific attribute-values.
– Uniform representations of function words and content words facilitate semantic composition and decomposition.
– Because of E-HowNet's semantic decomposition capability, the primitive representations for surface sentences with the same deep semantics are nearly canonical.