e-hownet- a lexical knowledge representation system

Post on 30-Dec-2015



E-HowNet: A Lexical Knowledge Representation System

Keh-Jiann Chen
Principal Investigator, Core Platforms for Digital Contents Project, TELDAP
Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica

Outline

• What is E-HowNet?
• E-HowNet: Sense Representation
• Major Features
• Current Status of E-HowNet
• Automatic Construction of Ontology
• Apply the Framework to Metadata Representation of Digital Collections
• Conclusion and Future Work

What is E-HowNet?

• E-HowNet is an entity-relation model for lexical semantic representation, extended from HowNet.
• E-HowNet is designed for automatic semantic composition and decomposition.

E-HowNet: Sense Representation

Word sense definition: decompose a sense into simpler senses and sense relations.

果盤 (fruit plate)
def: {plate|盤:telic={put|放置:location={~},patient={fruit|水果}}}

玻璃盤 (glass plate)
def: {plate|盤:material={glass|玻璃}}

圓盤 (round plate)
def: {plate|盤:shape={round|圓}}
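The bracketed definitions above have a regular shape: a head concept, optionally followed by a colon and a comma-separated list of role={...} features. As a rough illustration only, the sketch below parses that shape into a small tree. The names Node, parse and _parse_node are hypothetical and not part of E-HowNet, and the grammar is inferred from the examples on this slide, so constructs such as the .and. conjunction are deliberately not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One concept: a head plus role -> sub-expression features."""
    head: str
    features: dict = field(default_factory=dict)


def parse(text: str) -> Node:
    """Parse one brace expression such as {plate|盤:shape={round|圓}}."""
    s = "".join(text.split())  # whitespace carries no meaning in the examples
    node, pos = _parse_node(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected trailing text at {pos}: {s[pos:]!r}")
    return node


def _parse_node(s: str, pos: int):
    """Parse the expression starting at s[pos]; return (node, next position)."""
    if pos >= len(s) or s[pos] != "{":
        raise ValueError(f"expected '{{' at position {pos}")
    pos += 1
    start = pos
    while pos < len(s) and s[pos] not in ":}":
        pos += 1
    node = Node(head=s[start:pos])
    if pos < len(s) and s[pos] == ":":
        pos += 1
        while True:
            eq = s.index("=", pos)          # role name runs up to '='
            role = s[pos:eq]
            child, pos = _parse_node(s, eq + 1)
            node.features[role] = child
            if pos < len(s) and s[pos] == ",":
                pos += 1                    # another feature follows
            else:
                break
    if pos >= len(s) or s[pos] != "}":
        raise ValueError(f"expected '}}' at position {pos}")
    return node, pos + 1


fruit_plate = parse("{plate|盤 :telic={put| 放置 :location={~},patient={fruit| 水果 }}}")
print(fruit_plate.head, list(fruit_plate.features))
```

Here "~" is stored as an ordinary head; resolving it back to the concept being defined is left out of the sketch.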

Principles for sense definitions

• Use the hypernym and prominent properties to define concepts.
• Qualia structure: agentive, telic, formal, and constitutive.
• Use well-defined or primitive concepts and relations to define new concepts.

Telic

狗食 (dog food)
def: {食物:telic={餵:target={狗},patient={~}}}
def (expanded): {food|食品:telic={feed|餵:target={livestock|牲畜:telic={TakeCare|照料:patient={family|家庭},agent={~}}},patient={~}}}

Agentive

早產兒 (premature baby)
def: {嬰兒:agentive={早產:patient={~}}}
def (expanded): {human|人:age={child|少兒},agentive={labour|臨產:manner={early|早},patient={~}}}

Formal

彩霞 (rosy clouds)
def: {CloudMist|雲霧:color={colored|彩}}

酸辣湯 (spicy and sour soup)
def: {湯:taste={酸}.and.{辣}}
def (expanded): {food|食品:material={StateLiquid|液態},taste={sour|酸}.and.{peppery|辣}}

Constitutive

草裙 (grass skirt)
def: {裙:material={草}}
def (expanded): {clothing|衣物:telic={PutOn|穿戴:instrument={~},location={leg|腿:whole={human|人:gender={female|女}}}},material={FlowerGrass|花草}}

Major Features

• Lexical senses are expressed by either primitive concepts (sememes) or basic concepts.
• Semantic relations are explicitly expressed in E-HowNet representations.
• A uniform representation for function words, content words, and phrases.
• Taxonomy for both entities and relations.
• Semantic composition and decomposition capabilities.

Uniform representation and compositional semantics

Preposition: 把|ba
def: goal={}

Noun: 文章|article
def: {text|語文}

Verb: 寫好|have written
def: {write|寫:aspect={Vachieve|達成}}

Phrase: 把文章寫好|The article has been written.
def: {write|寫:goal={text|語文},aspect={Vachieve|達成}}
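One way to picture the composition in this example: the preposition 把 contributes an empty goal slot, the noun fills that slot, and the filled role is attached to the verb's expression. The sketch below is only an illustration under that reading. The dictionary layout and the helper names fill_role, attach and render are hypothetical, and the order of features in the rendered string is not claimed to be significant.

```python
def fill_role(role, filler):
    """A function word such as 把 supplies a role; its argument fills the slot."""
    return {role: filler}


def attach(head, feature):
    """Return a copy of the head concept with the extra role features added."""
    merged = {"head": head["head"], "features": dict(head["features"])}
    for role, filler in feature.items():
        if role in merged["features"]:
            raise ValueError(f"role {role!r} is already filled")
        merged["features"][role] = filler
    return merged


def render(node):
    """Serialize a concept back into the brace notation used on the slides."""
    feats = ",".join(f"{role}={render(sub)}" for role, sub in node["features"].items())
    return "{" + node["head"] + (":" + feats if feats else "") + "}"


# 文章|article and 寫好|have written, as defined on the slide
article = {"head": "text|語文", "features": {}}
have_written = {
    "head": "write|寫",
    "features": {"aspect": {"head": "Vachieve|達成", "features": {}}},
}

ba_phrase = fill_role("goal", article)      # 把 + 文章  ->  goal={text|語文}
sentence = attach(have_written, ba_phrase)  # 把文章 + 寫好
print(render(sentence))
```

The verb's own definition is copied rather than modified, so the lexicon entry for 寫好 stays reusable in other phrases.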

Taxonomy of E-HowNet

http://ehownet.iis.sinica.edu.tw

All|全
• entity|事物
  – event|事件
    • state|狀態
    • Act|行動
    • AttributeValue|屬性值
  – object|物體
    • thing|萬物
    • time|時間
    • space|空間
• relation|關係
  – Semantic Role|語意角色
  – function|函數
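For readers who want to experiment with the top of the taxonomy, the sketch below encodes the nodes listed on this slide as a nested dictionary and looks up the path from the root to a node. The nesting follows the bullet levels as they appear in the transcript, which is an assumption about the original slide layout. TAXONOMY, path_to and is_a are hypothetical names, and only the nodes named here are included.

```python
TAXONOMY = {
    "All|全": {
        "entity|事物": {
            "event|事件": {
                "state|狀態": {},
                "Act|行動": {},
                "AttributeValue|屬性值": {},
            },
            "object|物體": {
                "thing|萬物": {},
                "time|時間": {},
                "space|空間": {},
            },
        },
        "relation|關係": {
            "Semantic Role|語意角色": {},
            "function|函數": {},
        },
    }
}


def path_to(label, tree=TAXONOMY, trail=()):
    """Return the tuple of nodes from the root down to label, or None if absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == label:
            return here
        found = path_to(label, children, here)
        if found:
            return found
    return None


def is_a(label, ancestor):
    """True if ancestor lies on the path from the root to label."""
    path = path_to(label)
    return path is not None and ancestor in path


print(" > ".join(path_to("time|時間")))
```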

Current Status of E-HowNet

• Coarse-grained E-HowNet sense representations for about 95,000 word-sense entries of the CKIP Chinese dictionary:
  – About 45,000 different sense expressions
  – About 2,600 semantic primitives (sememes, 義原)
  – About 200 semantic roles for objects
  – About 70 semantic roles for events
• An automatically constructed ontology, built by appending all word senses to the HowNet top-level ontology and structuring them.

Automatic Construction of Ontology

Starting from the top-level ontology (modified from the HowNet ontology), the lower levels of the ontology are created from the subsumption relations among E-HowNet expressions.

– Attach lexical senses: Words and their associated sense expressions are first attached to the top-level ontology nodes according to their head concepts.
– Sub-categorization by attribute-values: Lexical concepts with the same semantic head are further sub-categorized (creating a new node) according to their attribute-values.
– Repeat the sub-categorization step: If many lexical concepts in one node share the same extended feature values, they are sub-categorized again.

Examples:

衣衫 (clothes), {clothing|衣物}
木屐 (clogs), {clothing|衣物:location={foot|腳},material={wood|木}}
木鞋 (wooden shoes), {clothing|衣物:location={foot|腳},material={wood|木}}
球鞋 (sneakers), {clothing|衣物:location={foot|腳},while={exercise|鍛鍊}}
溜冰鞋 (ice skates), {clothing|衣物:location={foot|腳},while={slide|滑:location={ice|冰},purpose={exercise|鍛鍊:domain={sport|體育}}}}
靴子 (boots), {clothing|衣物:location={foot|腳},length={LengthLong|長}}
運動褲 (sports trousers), {clothing|衣物:location={leg|腿},while={exercise|鍛鍊}}
褲子 (trousers), {clothing|衣物:location={leg|腿}}
內衣 (underwear), {clothing|衣物:qualification={private|私}}
禮服 (formal dress), {clothing|衣物:qualification={formal|正式}}
白紗 (white wedding gown), {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}}
婚紗 (wedding dress), {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}}

Attach all lexical senses:

{clothing|衣物} [衣衫, 木屐, 木鞋, 球鞋, 溜冰鞋, 靴子, 運動褲, 褲子, 內衣, 禮服, 白紗, 婚紗]

Sub-categorization by attribute-values:

{clothing|衣物} [衣衫]
– 鞋子|shoes [木屐, 木鞋, 球鞋, 溜冰鞋, 靴子]
– 褲子|trousers [褲子, 運動褲]
– 內衣|underwear [內衣]
– 禮服|ceremonial robe/dress [禮服, 白紗, 婚紗]

Repeat sub-categorization step:

{clothing|衣物} [衣衫]
– 鞋子|shoes [球鞋, 溜冰鞋, 靴子]
  • {木屐} [木屐, 木鞋]
– 褲子|trousers [褲子, 運動褲]
– 內衣|underwear [內衣]
– 禮服|ceremonial robe/dress [禮服]
  • {白紗} [白紗, 婚紗]
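The three steps can be imitated with a small grouping routine. The sketch below is a simplified reading of the slides, not the actual E-HowNet procedure: it assumes attribute-values are compared in the order they are written, treats nested values as opaque strings, creates a node for every attribute-value on the first pass, and on repeated passes creates a node only when at least two words share the value. ENTRIES, build and build_ontology are hypothetical names.

```python
from collections import defaultdict

HEAD = "clothing|衣物"
FOOT, LEG = ("location", "foot|腳"), ("location", "leg|腿")
FORMAL = ("qualification", "formal|正式")
BRIDE = ("owner", "human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}")
SKATE = ("while", "slide|滑:location={ice|冰},purpose={exercise|鍛鍊:domain={sport|體育}}")

# (word, head concept, attribute-values in written order)
ENTRIES = [
    ("衣衫", HEAD, []),
    ("木屐", HEAD, [FOOT, ("material", "wood|木")]),
    ("木鞋", HEAD, [FOOT, ("material", "wood|木")]),
    ("球鞋", HEAD, [FOOT, ("while", "exercise|鍛鍊")]),
    ("溜冰鞋", HEAD, [FOOT, SKATE]),
    ("靴子", HEAD, [FOOT, ("length", "LengthLong|長")]),
    ("運動褲", HEAD, [LEG, ("while", "exercise|鍛鍊")]),
    ("褲子", HEAD, [LEG]),
    ("內衣", HEAD, [("qualification", "private|私")]),
    ("禮服", HEAD, [FORMAL]),
    ("白紗", HEAD, [FORMAL, BRIDE]),
    ("婚紗", HEAD, [FORMAL, BRIDE]),
]


def build(entries, depth=0, min_size=1):
    """Split entries on the attribute-value at position depth.

    Returns {"words": [...], "children": {(role, value): subtree}}.
    """
    node = {"words": [], "children": {}}
    groups = defaultdict(list)
    for entry in entries:
        word, _, feats = entry
        if depth < len(feats):
            groups[feats[depth]].append(entry)
        else:
            node["words"].append(word)      # nothing left to split on
    for key, members in groups.items():
        if len(members) >= min_size:
            # Repeated passes (assumption): require at least two sharing words.
            node["children"][key] = build(members, depth + 1, min_size=2)
        else:
            node["words"].extend(word for word, _, _ in members)
    return node


def build_ontology(entries):
    """Step 1: attach words to their head concept; then sub-categorize."""
    by_head = defaultdict(list)
    for entry in entries:
        by_head[entry[1]].append(entry)
    return {head: build(members) for head, members in by_head.items()}


ontology = build_ontology(ENTRIES)
shoes = ontology[HEAD]["children"][FOOT]
print(shoes["words"], list(shoes["children"]))
```

On these twelve entries the routine reproduces the grouping shown in the final tree, though the node labels used on the slide (鞋子|shoes, {木屐} and so on) are not derived here.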

Apply the Framework to Metadata Representation of Digital Collections

Example 1: 奉華紙槌瓶 (Fenghua paper-mallet vase)

奉華紙槌瓶={瓷瓶:Time={北宋},Type={汝窯}}
瓷瓶={瓶子:material={瓷}}

Substituting the definition of 瓷瓶:
奉華紙槌瓶={瓶子:material={瓷},Time={北宋},Type={汝窯}}

Example 2: 青瓷水仙盆 (celadon narcissus basin)

青瓷水仙盆={瓷盆:Time={北宋},Type={汝窯},Telic={水仙}}
瓷盆={盆:material={瓷}}

Substituting the definition of 瓷盆:
青瓷水仙盆={盆:material={瓷},Time={北宋},Type={汝窯},Telic={水仙}}

Glosses: 瓷瓶 porcelain vase; 瓶子 vase; 瓷盆 porcelain basin; 盆 basin; 瓷 porcelain; 北宋 Northern Song dynasty; 汝窯 Ru kiln; 水仙 narcissus.
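The expansion from 瓷瓶 to 瓶子 and from 瓷盆 to 盆 is a substitution of one definition into another. A minimal sketch of that substitution follows, assuming each entry is stored as a head plus a flat set of features. DEFS, expand and render are hypothetical names, and letting the more specific entry override an inherited feature is an assumption made for the sketch.

```python
# name -> (head concept, features), copied from the two collection items above
DEFS = {
    "瓷瓶": ("瓶子", {"material": "瓷"}),
    "瓷盆": ("盆", {"material": "瓷"}),
    "奉華紙槌瓶": ("瓷瓶", {"Time": "北宋", "Type": "汝窯"}),
    "青瓷水仙盆": ("瓷盆", {"Time": "北宋", "Type": "汝窯", "Telic": "水仙"}),
}


def expand(name):
    """Replace a defined head by its own definition until the head is undefined."""
    head, features = DEFS[name]
    features = dict(features)
    seen = {name}
    while head in DEFS:
        if head in seen:
            raise ValueError(f"circular definition at {head!r}")
        seen.add(head)
        head, inherited = DEFS[head]
        # Inherited features are listed first; the specific entry wins on conflict.
        features = {**inherited, **features}
    return head, features


def render(head, features):
    """Serialize back into the brace notation used on the slides."""
    body = ", ".join(f"{key}={{{value}}}" for key, value in features.items())
    return f"{{{head}: {body}}}" if body else f"{{{head}}}"


for item in ("奉華紙槌瓶", "青瓷水仙盆"):
    print(item, "=", render(*expand(item)))
```

Because both items expand to the same feature names, they can then be compared or queried attribute by attribute.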

Conclusion and Future Work

• E-HowNet sense representations are updated from time to time.
• The ontology can be rebuilt automatically based on the refined expressions.
• New categories in the taxonomy can be identified and characterized by their specific attribute-values.
• Uniform representations of function words and content words facilitate semantic composition and decomposition.
• Because of E-HowNet's semantic decomposition capability, the primitive representations for surface sentences with the same deep semantics are nearly canonical.
