Page 1: E-HowNet- a Lexical Knowledge Representation System

E-HowNet- a Lexical Knowledge Representation System

Keh-Jiann Chen
Principal Investigator, Core Platforms for Digital Contents Project, TELDAP
Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica

Page 2: E-HowNet- a Lexical Knowledge Representation System

Outline
• What is E-HowNet?
• E-HowNet- Sense Representation
• Major Features
• Current status of E-HowNet
• Automatic Construction of Ontology
• Apply the Framework to Metadata Representation of Digital Collections
• Conclusion and Future Work

Page 3: E-HowNet- a Lexical Knowledge Representation System

What is E-HowNet?
• E-HowNet is an entity-relation model for lexical semantic representation, extended from HowNet.
• E-HowNet is designed for automatic semantic composition and decomposition.

Page 4: E-HowNet- a Lexical Knowledge Representation System

E-HowNet- Sense Representation
Word sense definition: decompose a sense into simpler senses and sense relations.

果盤 fruit plate
def: {plate|盤:telic={put|放置:location={~},patient={fruit|水果}}}

玻璃盤 glass plate
def: {plate|盤:material={glass|玻璃}}

圓盤 round plate
def: {plate|盤:shape={round|圓}}
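The brace notation above can be read as a head concept plus a set of role=value relations, where each value is itself a concept. The following is a minimal sketch of that reading, not E-HowNet's actual data format: the `Concept` class and its `render` method are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Concept:
    """Hypothetical container: a head concept plus role=value relations."""
    head: str                                             # e.g. "plate|盤"
    features: Dict[str, Union["Concept", str]] = field(default_factory=dict)

    def render(self) -> str:
        """Serialize to the brace notation used on the slides."""
        if not self.features:
            return "{" + self.head + "}"
        parts = []
        for role, value in self.features.items():
            # A plain string such as "~" (the self-reference marker on the
            # slides) is wrapped in braces as-is; a Concept is rendered
            # recursively.
            if isinstance(value, str):
                rendered = "{" + value + "}"
            else:
                rendered = value.render()
            parts.append(f"{role}={rendered}")
        return "{" + self.head + ":" + ",".join(parts) + "}"


fruit_plate = Concept("plate|盤", {
    "telic": Concept("put|放置", {
        "location": "~",
        "patient": Concept("fruit|水果"),
    }),
})
glass_plate = Concept("plate|盤", {"material": Concept("glass|玻璃")})

print(fruit_plate.render())
print(glass_plate.render())
```

Both plates share the head `plate|盤` and differ only in their relations, which is what later allows word senses to be grouped under a common ontology node.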

Page 5: E-HowNet- a Lexical Knowledge Representation System

Principles for sense definitions
• Use hypernym and prominent properties to define concepts.
• Qualia structure: agentive, telic, formal, and constitutive.
• Use well-defined/primitive concepts and relations to define new concepts.

Page 6: E-HowNet- a Lexical Knowledge Representation System

Telic

狗食 dog food
def: {食物:telic={餵:target={狗},patient={~}}}
def: {food|食品:telic={feed|餵:target={livestock|牲畜:telic={TakeCare|照料:patient={family|家庭},agent={~}}},patient={~}}}

Page 7: E-HowNet- a Lexical Knowledge Representation System

Agentive

早產兒 premature baby
def: {嬰兒:agentive={早產:patient={~}}}
def: {human|人:age={child|少兒},agentive={labour|臨產:manner={early|早},patient={~}}}

Page 8: E-HowNet- a Lexical Knowledge Representation System

Formal

彩霞 rosy clouds
def: {CloudMist|雲霧:color={colored|彩}}

酸辣湯 spicy and sour soup
def: {湯:taste={酸}.and.{辣}}
def: {food|食品:material={StateLiquid|液態},taste={sour|酸}.and.{peppery|辣}}

Page 9: E-HowNet- a Lexical Knowledge Representation System

Constitutive

草裙 grass skirt
def: {裙:material={草}}
def: {clothing|衣物:telic={PutOn|穿戴:instrument={~},location={leg|腿:whole={human|人:gender={female|女}}}},material={FlowerGrass|花草}}

Page 10: E-HowNet- a Lexical Knowledge Representation System

Major Features
• Lexical senses are expressed by either primitive concepts (sememes) or basic concepts.
• Semantic relations are explicitly expressed in E-HowNet representations.
• A uniform representation for function words, content words and phrases.
• Taxonomy for both entities and relations.
• Semantic composition and decomposition capabilities.

Page 11: E-HowNet- a Lexical Knowledge Representation System

Uniform representation and compositional semantics
Preposition: 把|ba def: goal={}
Noun: 文章|article def: {text|語文}
Verb: 寫好|have written def: {write|寫:aspect={Vachieve|達成}}
Phrase: 把文章寫好|The article has been written. def: {write|寫:goal={text|語文},aspect={Vachieve|達成}}
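The phrase-level expression is obtained by filling the preposition's empty `goal` slot with the noun and attaching the result to the verb. The sketch below illustrates that one composition step under simplifying assumptions: expressions are plain `(head, features)` tuples, and `compose` and `render` are hypothetical helper names, not part of E-HowNet.

```python
def render(expr):
    """Serialize a (head, features) tuple to the slides' brace notation."""
    head, features = expr
    if not features:
        return "{" + head + "}"
    body = ",".join(f"{role}={render(value)}" for role, value in features.items())
    return "{" + head + ":" + body + "}"


def compose(verb, role, noun):
    """Fill the function word's open role with the noun and attach it to the verb.

    Assumption: the function word contributes exactly one open role
    (here `goal`, from 把 def: goal={}).
    """
    head, features = verb
    merged = {role: noun}
    merged.update(features)          # keep the verb's own relations (e.g. aspect)
    return (head, merged)


article = ("text|語文", {})                                   # 文章
have_written = ("write|寫", {"aspect": ("Vachieve|達成", {})})  # 寫好
phrase = compose(have_written, "goal", article)               # 把文章寫好

print(render(phrase))
```

Because the function word, the noun and the verb all use the same expression format, composition reduces to merging relations into one structure.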

Page 12: E-HowNet- a Lexical Knowledge Representation System

Taxonomy of E-HowNet
• http://ehownet.iis.sinica.edu.tw
• All|全
  • entity|事物
    – event|事件
      • state|狀態
      • Act|行動
      • AttributeValue|屬性值
    – object|物體
      • thing|萬物
      • time|時間
      • space|空間
  • relation|關係
    – Semantic Role|語意角色
    – function|函數
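A taxonomy like this supports subsumption queries such as "is time|時間 a kind of entity|事物?". The following is a minimal sketch using a child-to-parent map over a subset of the nodes listed above; the names `PARENT`, `ancestors` and `is_a` are hypothetical, and the real E-HowNet taxonomy is far larger.

```python
# Child -> parent links for a subset of the top-level nodes on the slide.
PARENT = {
    "entity|事物": "All|全",
    "relation|關係": "All|全",
    "event|事件": "entity|事物",
    "object|物體": "entity|事物",
    "state|狀態": "event|事件",
    "Act|行動": "event|事件",
    "thing|萬物": "object|物體",
    "time|時間": "object|物體",
    "space|空間": "object|物體",
    "Semantic Role|語意角色": "relation|關係",
    "function|函數": "relation|關係",
}


def ancestors(node):
    """Return the chain of ancestors from the node's parent up to the root."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def is_a(node, ancestor):
    """True if `ancestor` subsumes `node` (every node subsumes itself)."""
    return node == ancestor or ancestor in ancestors(node)


print(ancestors("time|時間"))
print(is_a("Act|行動", "entity|事物"))
print(is_a("function|函數", "entity|事物"))
```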

Page 13: E-HowNet- a Lexical Knowledge Representation System

Current status of E-HowNet
• Coarse-grained E-HowNet sense representations for about 95,000 word-sense entries of the CKIP Chinese dictionary.
  – About 45,000 different sense expressions
  – About 2,600 semantic primitives (sememes 義原)
  – About 200 semantic roles for objects
  – About 70 semantic roles for events
• An automatically constructed ontology, built by appending and structuring all word senses under the HowNet top-level ontology.

Page 14: E-HowNet- a Lexical Knowledge Representation System

Automatic construction of ontology
Starting from the top-level ontology (modified from the HowNet ontology), the lower-level ontology is created from the subsumption relations between E-HowNet expressions.
– Attach lexical senses: Words and their associated sense expressions are first attached to the top-level ontology nodes according to their head concepts.
– Sub-categorization by attribute-values: Lexical concepts with the same semantic head are further sub-categorized (creating a new node) according to their attribute-values.
– Repeat sub-categorization step: If a node still contains many lexical concepts that share the same extended feature values, sub-categorize that node again.

Page 15: E-HowNet- a Lexical Knowledge Representation System

Examples:
衣衫 (clothes), {clothing|衣物}
木屐 (clogs), {clothing|衣物:location={foot|腳},material={wood|木}}
木鞋 (wooden shoes), {clothing|衣物:location={foot|腳},material={wood|木}}
球鞋 (sneakers), {clothing|衣物:location={foot|腳},while={exercise|鍛鍊}}
溜冰鞋 (ice skates), {clothing|衣物:location={foot|腳},while={slide|滑:location={ice|冰},purpose={exercise|鍛鍊:domain={sport|體育}}}}
靴子 (boots), {clothing|衣物:location={foot|腳},length={LengthLong|長}}
運動褲 (sweatpants), {clothing|衣物:location={leg|腿},while={exercise|鍛鍊}}
褲子 (trousers), {clothing|衣物:location={leg|腿}}
內衣 (underwear), {clothing|衣物:qualification={private|私}}
禮服 (formal dress), {clothing|衣物:qualification={formal|正式}}
白紗 (white wedding gown), {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}}
婚紗 (wedding dress), {clothing|衣物:qualification={formal|正式},owner={human|人:gender={female|女},predication={GetMarried|結婚:agent={~}}}}

Page 16: E-HowNet- a Lexical Knowledge Representation System

Attach all lexical senses:

{clothing|衣物} [衣衫, 木屐, 木鞋, 球鞋, 溜冰鞋, 靴子, 運動褲, 褲子, 內衣, 禮服, 白紗, 婚紗]

Page 17: E-HowNet- a Lexical Knowledge Representation System

Sub-categorization by attribute-values:
{clothing|衣物} [衣衫]
– 鞋子|shoes [木屐, 木鞋, 球鞋, 溜冰鞋, 靴子]
– 褲子|trousers [褲子, 運動褲]
– 內衣|underwear [內衣]
– 禮服|ceremonial robe/dress [禮服, 白紗, 婚紗]

Page 18: E-HowNet- a Lexical Knowledge Representation System

Repeat sub-categorization step:
{clothing|衣物} [衣衫]
– 鞋子|shoes [球鞋, 溜冰鞋, 靴子]
  • {木屐} [木屐, 木鞋]
– 褲子|trousers [褲子, 運動褲]
– 內衣|underwear [內衣]
– 禮服|ceremonial robe/dress [禮服]
  • {白紗} [白紗, 婚紗]
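The three steps walked through above (attach by head, sub-categorize by attribute-value, repeat) can be sketched as a recursive grouping. This is an illustrative approximation, not the authors' implementation. It assumes that each word's attribute-values are given as an ordered list of flat `role=value` strings, and that a new node is created only when at least `min_members` words share a value. The slides state neither the feature order nor the threshold, so both are assumptions, as are the names `build` and `build_ontology`.

```python
from collections import defaultdict


def build(words, depth=0, min_members=2):
    """Sub-categorize words by their depth-th attribute-value, recursively.

    words: dict mapping word -> ordered list of "role=value" strings.
    Returns {"members": [...], "children": {attribute_value: subtree}}.
    """
    node = {"members": [], "children": {}}
    groups = defaultdict(dict)
    for word, feats in words.items():
        if depth < len(feats):
            groups[feats[depth]][word] = feats
        else:
            node["members"].append(word)       # no further feature to split on
    for feat, group in groups.items():
        if len(group) >= min_members:
            node["children"][feat] = build(group, depth + 1, min_members)
        else:
            node["members"].extend(group)      # too few words for a new node
    return node


def build_ontology(lexicon):
    """Step 1: attach each word to its head concept; steps 2-3: build()."""
    by_head = defaultdict(dict)
    for word, (head, feats) in lexicon.items():
        by_head[head][word] = feats
    return {head: build(ws) for head, ws in by_head.items()}


# A subset of the slide's examples, with nested values omitted.
LEXICON = {
    "衣衫": ("clothing|衣物", []),
    "木屐": ("clothing|衣物", ["location=foot|腳", "material=wood|木"]),
    "木鞋": ("clothing|衣物", ["location=foot|腳", "material=wood|木"]),
    "球鞋": ("clothing|衣物", ["location=foot|腳", "while=exercise|鍛鍊"]),
    "靴子": ("clothing|衣物", ["location=foot|腳", "length=LengthLong|長"]),
    "褲子": ("clothing|衣物", ["location=leg|腿"]),
    "運動褲": ("clothing|衣物", ["location=leg|腿", "while=exercise|鍛鍊"]),
}

ontology = build_ontology(LEXICON)
clothing = ontology["clothing|衣物"]
shoes = clothing["children"]["location=foot|腳"]
print(clothing["members"])
print(shoes["members"], list(shoes["children"]))
```

On this subset the sketch reproduces the shape of the slide's result: 木屐 and 木鞋 end up in their own node under the shoes node, while 球鞋 and 靴子 stay attached to the shoes node itself.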

Page 20: E-HowNet- a Lexical Knowledge Representation System

Apply the Framework to Metadata Representation of Digital Collections

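奉華紙槌瓶={瓷瓶:Time={北宋},Type={汝窯}}

瓷瓶={瓶子:material={瓷}}

奉華紙槌瓶={瓶子:material={瓷},Time={北宋},Type={汝窯}}

Glosses: 奉華紙槌瓶 = mallet-shaped vase inscribed "Fenghua"; 瓷瓶 = porcelain vase; 瓶子 = vase; 瓷 = porcelain; 北宋 = Northern Song; 汝窯 = Ru ware.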

Page 21: E-HowNet- a Lexical Knowledge Representation System

Apply the Framework to Metadata Representation of Digital Collections

青瓷水仙盆={瓷盆:Time={北宋},Type={汝窯},Telic={水仙}}

瓷盆={盆:material={瓷}}

青瓷水仙盆={盆:material={瓷},Time={北宋},Type={汝窯},Telic={水仙}}

Glosses: 青瓷水仙盆 = celadon narcissus basin; 瓷盆 = porcelain basin; 盆 = basin; 水仙 = narcissus.

Page 22: E-HowNet- a Lexical Knowledge Representation System

Apply the Framework to Metadata Representation of Digital Collections

奉華紙槌瓶={瓶子:material={瓷},Time={北宋},Type={汝窯}}

青瓷水仙盆={盆:material={瓷},Time={北宋},Type={汝窯},Telic={水仙}}
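The expansion on these slides replaces a defined head (such as 瓷瓶) with its own definition and merges the attribute-values. The following is a minimal sketch of that substitution, assuming a hypothetical `DEFS` table of `(head, features)` entries with no cyclic definitions; `expand` and `render` are illustrative names, not part of E-HowNet.

```python
# Hypothetical definition table built from the slides' examples.
DEFS = {
    "瓷瓶": ("瓶子", {"material": "瓷"}),
    "奉華紙槌瓶": ("瓷瓶", {"Time": "北宋", "Type": "汝窯"}),
    "瓷盆": ("盆", {"material": "瓷"}),
    "青瓷水仙盆": ("瓷盆", {"Time": "北宋", "Type": "汝窯", "Telic": "水仙"}),
}


def expand(word, defs):
    """Expand a word's definition until its head has no definition of its own.

    Assumes the definitions are acyclic; a cycle would loop forever.
    """
    head, features = defs[word]
    while head in defs:
        parent_head, parent_features = defs[head]
        # Inherited features come first; the more specific entry wins on conflict.
        features = {**parent_features, **features}
        head = parent_head
    return head, features


def render(head, features):
    """Serialize to the notation used on the slides."""
    body = ",".join(f"{role}={{{value}}}" for role, value in features.items())
    return "{" + head + ":" + body + "}"


print(render(*expand("奉華紙槌瓶", DEFS)))
print(render(*expand("青瓷水仙盆", DEFS)))
```

After expansion the two collection items share the attribute-values material={瓷}, Time={北宋} and Type={汝窯}, so they can be compared or grouped on those fields directly.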

Page 23: E-HowNet- a Lexical Knowledge Representation System

Conclusion and Future Work

E-HowNet sense representations are updated from time to time.

The ontology can be rebuilt automatically based on the refined expressions.

New categories in the taxonomy can be identified and characterized by their specific attribute-values.

Uniform representations of function words and content words facilitate semantic composition and decomposition.

Because of E-HowNet’s semantic decomposition capability, the primitive representations for surface sentences with the same deep semantics are nearly canonical.
