Bucharest, 30 July 2003 Computational Lexicons and the Semantic Web Alessandro Lenci Università di Pisa – Department of Linguistics & Istituto di Linguistica Computazionale - CNR

Page 1

Bucharest, 30 July 2003

Computational Lexicons and

the Semantic Web

Alessandro Lenci

Università di Pisa – Department of Linguistics

&

Istituto di Linguistica Computazionale - CNR

Page 2

Tutorial Outline

Computational lexicons for the Semantic Web (SW): how they are, how they should be

The SW for computational lexicons: lexicon design in the age of the SW

Training session: case study on lexical modelling in RDF/S

Page 3

The Semantic Web Vision

Semantic Web

Turning the WWW into a machine-understandable knowledge base

Ontologies

Knowledge Markup

Intelligent Agents

Applications

Documents

Databases

Page 4

Six Challenges for the SW (Benjamins et al. 2002)

1. Content availability

2. Ontology availability

3. Multilinguality

4. Scalability

5. Visualization

6. Stability of SW languages

Page 5

Six Challenges for the SW (Benjamins et al. 2002)

1. Content availability

2. Ontology availability

3. Multilinguality

4. Scalability

5. Visualization

6. Stability of SW languages

Human Language Technology (HLT)

Page 6

Lexical Information and HLT

All language analysis involves determining meaning at some level: anything from groups of related words to a full-blown representation of each sentence

Information retrieval: … bank … account … money … ==> Topic = financial

John went to the store ==> GO (AGENT: John, TARGET: store)

Page 7

Computational Lexicons and HLT

Explicit representation of word meaning: word content accessible to computational agents

Word meaning linked to word syntax and morphology

Multilingual lexical links

Computational lexicons provide machine-understandable word knowledge

Page 8

Computational Lexicons and HLT

Computational lexicons contain the linguistic information required to build meaning representations

… bank … account … money …

Lexicon:
  account n. domain: [financial]
  account v. …
  bank_1 n. domain: [financial]
  bank_2 n. domain: [geography]
  money n. domain: [financial]

Topic = financial

John went to the store

Lexicon:
  went v. past GO
  go v. (NP_SUBJ ((role AGENT) (sem +animate)) (VP ((verb GO) (PP ((prep TO) (NP ((role TARGET) (sem +loc)))))
  John n. sem: human
  store n. sem: loc

GO (AGENT: John, TARGET: store)
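The topic example on this slide can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `guess_topic` are assumptions, and the simple voting scheme is one possible way to use the domain labels, not a method described in the tutorial.

```python
from collections import Counter

# Toy lexicon mirroring the entries on the slide; the data layout is an assumption.
LEXICON = {
    "account": [{"pos": "n", "domain": "financial"}],
    "bank": [
        {"pos": "n", "sense": "bank_1", "domain": "financial"},
        {"pos": "n", "sense": "bank_2", "domain": "geography"},
    ],
    "money": [{"pos": "n", "domain": "financial"}],
}


def guess_topic(tokens):
    """Return the domain label that the most tokens can carry, or None."""
    votes = Counter()
    for token in tokens:
        # Count each domain at most once per token, so an ambiguous word
        # like "bank" votes once for every domain it may belong to.
        domains = {entry["domain"] for entry in LEXICON.get(token.lower(), [])}
        votes.update(domains)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(guess_topic(["bank", "account", "money"]))
```

The ambiguous word "bank" contributes to both domains, and the unambiguous neighbours decide the outcome.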

Page 9

Computational Lexicons and HLT

Critical language resources for NLP systems:
  syntactic subcategorization frames for parsing
  semantic selectional preferences for ambiguity reduction
  semantic classes for WSD, semantic tagging, etc.

Key components of HLT:
  monolingual lexicons for IE, QA, etc.
  multilingual lexicons for MT, CLIR, etc.

Page 10

Ontologies and Computational Lexicons

Semantic Web

Ontologies

Computational Lexicons

HLT

Access to Content

?

Page 11

Ontologies

An ontology is a system of concepts relevant for knowledge and action in (a portion of) the world:
  categorization of objects and processes
  inference
  action planning
  …

“An ontology is a specification of a conceptualization” (Gruber 1993)

Page 12

Ontologies

OBJECT

EVENT

LOCATION

ARTIFACT

ANIMAL

ENTITY

“A set of knowledge terms, including the vocabulary, the semantic interconnections, and some simple rules of inference and logic” (Hendler 2001)

Page 13

Types of Ontologies

Vertical typology:
  Foundational Ontology (e.g. OBJECT)
  Domain Core Ontology (e.g. SOFTWARE)
  Domain Specific Ontology (e.g. WORD_PROCESSOR)

Horizontal typology:
  Information System ontology
  AI ontology
  Linguistic ontology

Page 14

Linguistic Ontology

A system of symbols representing the concepts (meanings) encoded by NL expressions (lexical units, terms, etc.):
  specify semantic classes grouping semantically similar terms
  semantic representation language
  interlingua

OBJECT

EVENT

LOCATION

ARTIFACT

ANIMAL

ENTITY

VEHICLE

MAMMAL

BEACH

CONCERT

dog, cat, horse

car, van, truck

beach

piano concert, rock concert

spiaggia

Page 15

Ontologies and Computational Lexicons

Concept Space

Ontology

Language/s

Computational Lexicon

Semantics

Syntax

Morphology

Multilinguality

Page 16

Computational Lexicons: typology

Monolingual vs. multilingual

General purpose vs. domain (application) specific

Content type: (Morpho)-Syntactic, Semantic, Mixed, Terminological

Page 17

Syntactic Computational Lexicons

Syntactic lexical information is distilled in subcategorization frames

ComLex, PAROLE, etc.

Syntactic frames typically include:
  number of selected arguments
  syntactic categories of their realizations (PP, NP, etc.)
  lexical constraints on argument realization (e.g. preposition heading a PP)
  argument functional role (Subj, Obj, etc.)
  optionality, control, auxiliary selection, etc.

hit [V: (Subj: NP) (Objd: NP)]

answer [N: (Obji: PP_to)]
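As a rough illustration of the frame contents listed above, the two example frames could be held in a small data structure like the following. The class and field names (`Slot`, `Frame`, `preposition`, `optional`) are assumptions made for this sketch and do not come from ComLex or PAROLE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    function: str                      # argument functional role, e.g. "Subj"
    category: str                      # syntactic category of the realization
    preposition: Optional[str] = None  # lexical constraint on the head of a PP
    optional: bool = False             # optionality of the argument


@dataclass(frozen=True)
class Frame:
    lemma: str
    pos: str
    slots: Tuple[Slot, ...]

    def arity(self) -> int:
        """Number of selected arguments."""
        return len(self.slots)

    def render(self) -> str:
        """Print the frame in the bracket notation used on the slide."""
        parts = []
        for s in self.slots:
            cat = s.category if s.preposition is None else f"{s.category}_{s.preposition}"
            parts.append(f"({s.function}: {cat})")
        return f"{self.lemma} [{self.pos}: {' '.join(parts)}]"


hit = Frame("hit", "V", (Slot("Subj", "NP"), Slot("Objd", "NP")))
answer = Frame("answer", "N", (Slot("Obji", "PP", preposition="to"),))

print(hit.render())
print(answer.render())
```

A parser could use `arity()` and the slot categories to filter candidate attachments, which is the role the slides assign to subcategorization frames.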

Page 18

Semantic Computational Lexicons

Representing the meaning of a word (minimally) requires:

Distinguishing different senses of the word
  E.g. bank: financial institution vs. geographical configuration

Capturing inferences
  E.g. being human implies being animate

Representing similarity of meaning with other words
  E.g. bank, account, money all related to finances

Page 19

Semantic Computational Lexicons

Mikrokosmos (Nirenburg, Mahesh et al.)

WordNet (Miller, Fellbaum et al.)

EuroWordNet (Vossen et al.)

SIMPLE (Calzolari, Lenci et al.)

FrameNet (Fillmore et al.)

Page 20

Computational Lexicons: design issues

Network based
  hierarchy (taxonomy): WordNet
  heterarchy: EuroWordNet

Frame based: Mikrokosmos, FrameNet

Hybrid: SIMPLE

Page 21

EuroWordNet

Page 22

EuroWordNet Top Ontology

Top

1stOrderEntity, 2ndOrderEntity

SituationType, SituationComponent

Other nodes shown: Origin, Form, Composition, Function, Natural, Living, Human, Object, Covering, Part, Group, Static, Dynamic, Location, Experience, Physical, Mental, etc.

Example words attached to nodes:
  skin, hair, body-covering
  body part, cell, muscle, organ
  direction, distance, spatial property, spatial relation, course, path
  change of position, divide, locomotion, motion
  feel, desire, disturbance, emotion, feeling, humor, pleasance
  church, company, institute, organization, party, union
  human, adult, adult female, adult male, child, native, offspring

Page 23

EuroWordNet

Page 24

PAROLE-SIMPLE Lexicons

12 EU monolingual core lexicons built according to a harmonized model and further extended at the national level

Integrated combinations of syntactic and semantic information:
  syntactic subcategorization frames
  semantic type (“Ontology”)
  semantic frames linked to syntax: semantic roles, selectional preferences, etc.
  semantic relations: Pustejovsky’s “qualia roles”, etc.
  regular polysemy
  event structure

Page 25

SIMPLE Architecture

Language Independent Module: Ontology, Lexical Templates

PAROLE Syntax

Italian lexicon, Greek lexicon, Catalan lexicon, etc.

SemU: Semantic Relations, Event Structure, Polysemy, Semantic Frame (semantic roles, etc.)

Page 26

SIMPLE semantic relations

Top: Formal, Constitutive, Agentive, Telic

Formal: Is_a, ...

Constitutive: Is_a_part_of, Property, Contains, ...

Agentive: Created_by, Agentive_cause, ...

Telic: Indirect_telic, Activity, Instrumental, Is_the_habit_of, Used_for, Used_as, ...

Page 27

SIMPLE semantic network

Ala (wing)

SemU: 3232, Type: [Part], Part of an airplane
  Isa <parte> part; Is_a_part_of <aeroplano> airplane; Used_for <volare> fly; Agentive <fabbricare> make

SemU: 3268, Type: [Part], Part of a building
  Isa <parte> part; Is_a_part_of <edificio> building; Agentive <fabbricare> make

SemU: D358, Type: [Body_part], Organ of birds for flying
  Isa <parte> part; Is_a_part_of <uccello> bird; Used_for <volare> fly

SemU: 3467, Type: [Role], Role in football
  Isa <giocatore> player
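A semantic network of this kind can be queried once it is stored as data. The sketch below is a hypothetical encoding: the SemU identifiers, types and glosses are taken from the slide, but the assignment of each relation to a particular SemU is an assumption inferred from the glosses, and the names `ALA` and `senses_with` are invented for the example.

```python
# SemU identifiers, types and glosses come from the slide; which relation
# belongs to which SemU is an assumption based on the glosses.
ALA = {
    "3232": {
        "type": "Part",
        "gloss": "Part of an airplane",
        "relations": [("Isa", "parte"), ("Is_a_part_of", "aeroplano"),
                      ("Used_for", "volare"), ("Agentive", "fabbricare")],
    },
    "3268": {
        "type": "Part",
        "gloss": "Part of a building",
        "relations": [("Isa", "parte"), ("Is_a_part_of", "edificio"),
                      ("Agentive", "fabbricare")],
    },
    "D358": {
        "type": "Body_part",
        "gloss": "Organ of birds for flying",
        "relations": [("Isa", "parte"), ("Is_a_part_of", "uccello"),
                      ("Used_for", "volare")],
    },
    "3467": {
        "type": "Role",
        "gloss": "Role in football",
        "relations": [("Isa", "giocatore")],
    },
}


def senses_with(relation, target):
    """Return the SemU ids of 'ala' that hold `relation` to `target`."""
    return sorted(
        semu for semu, entry in ALA.items()
        if (relation, target) in entry["relations"]
    )


print(senses_with("Used_for", "volare"))
print(senses_with("Is_a_part_of", "edificio"))
```

A word sense disambiguation step could use such queries to prefer the senses whose related words occur in the context.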

Page 28

SIMPLE semantic frames

PRED employ#1
  Arg#1 <AGENT - HUMAN>
  Arg#2 <PATIENT - HUMAN>

SemU employer: agent nominalization

SemU employee: patient nominalization

SemU employment: event nominalization

SemU to employ: master link

Page 29

SIMPLE semantic frames

Comprendere V

SemU: 61725, Type: [Cognitive_event], To understand
  PRED Comprendere#1 <Arg1 [+human]>, <Arg2 [+semiotic]>

SemU: 6962, Type: [Constitutive_state], To include
  PRED Comprendere#2 <Arg1 [+Entity]>, <Arg2 [Entity]>

Comprensione N

SemU: 61726, Type: [Cognitive_event], Understanding

Page 30

SIMPLE semantic frames

il difensore di Berlusconi (Berlusconi's defender)

il difensore del Milan (the Milan fullback)

Difensore N

SemU: 4125, Type: [Role], Defender
  agent nominalization of PRED Difendere#1 <Arg1>, <Arg2>

SemU: 3526, Type: [Role], Fullback
  Is_a_member_of <squadra> team

Page 31

Semantic multidimensionality

Identifying the semantic contribution of an NP requires access to a rich representation of the semantic content of the nominal head

The “semantic structure” of the nominal head determines the semantic relation expressed by a modifying PP (in Italian):

1. la pagina del libro (the page of the book): PART-OF

2. il difensore del Milan (the Milan fullback): MEMBER-OF

3. il suonatore di liuto (the lute player): TELIC

4. il tavolo di legno (the wooden table): MADE-OF

Page 32

SIMPLE sample entries

semantic frame

semantic relations

ontology

Page 33

Computational Lexicons: loose ends

Non-compositional aspects in the lexicon: collocations, terms, MWEs, etc.

Integration between lexicons and corpus data: lexical tuning, data-driven lexicon population, etc.

Semantic dynamics (polysemy, lexical creativity, etc.)
  “context-sensitivity” of meaning as a challenge for lexical semantics
  sense enumeration vs. sense generation
  heavy smoker, heavy book, heavy road, heavy sea, heavy wine, heavy sky, heavy artillery, etc.

Page 34

Computational Lexicons: loose ends

Semantic type system for lexical senses must account for a non-static kaleidoscope of senses

Salience of aspects of meaning differs for different types: natural kinds (Is-a); artifacts (function)

Possible solutions:
  multiple layers of representation
  explicit identification of information so that NLP systems can access what is needed at a given time
  “dynamic type systems”

Page 35

Computational Lexicons: new challenges from the SW

From language resources for HLT to knowledge resources for inferential engines: in-depth lexical description for better content understanding

Content interoperability between computational lexicons: better integration between lexical information from different sources

Beyond the lexical information bottleneck: automatic lexical knowledge acquisition

Page 36

Lexical Inferences

“Midfielder Scott Sellars was sold to Blackburn for $35,000 and was bought back in the summer for $750,000.” (FrameNet Corpus)

e1: [event: buy, buyer: Blackburn, goods: Midfielder Scott Sellars, money: $35,000]

e2: [event: buy, seller: Blackburn, goods: Midfielder Scott Sellars, money: $750,000]

after e1: OWN (buyer, goods), NOT(OWN (buyer, money))

after e2: NOT(OWN (seller, goods)), OWN (seller, money)

e1 < e2

TIME e2 = SUMMER
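The ownership inferences on this slide can be mimicked with a small state-update sketch. Everything below is an assumption made for illustration: the `Transfer` class, the `after` and `owners` functions, and the placeholder `other_club` for the party the sentence leaves unnamed. It is not a description of how FrameNet or any lexicon encodes these inferences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A commercial transaction; field names follow the slide's role labels."""
    buyer: str
    seller: str
    goods: str
    money: str


def after(event):
    """Ownership facts that hold once the transaction has taken place."""
    return {
        ("OWN", event.buyer, event.goods): True,
        ("OWN", event.buyer, event.money): False,
        ("OWN", event.seller, event.goods): False,
        ("OWN", event.seller, event.money): True,
    }


def owners(events):
    """Apply events in temporal order (e1 < e2); later facts override earlier ones."""
    state = {}
    for event in events:
        state.update(after(event))
    return state


# "other_club" is a placeholder: the sentence does not name the other party.
e1 = Transfer(buyer="Blackburn", seller="other_club",
              goods="Scott Sellars", money="$35,000")
e2 = Transfer(buyer="other_club", seller="Blackburn",
              goods="Scott Sellars", money="$750,000")

state_after_e1 = owners([e1])
state_after_e2 = owners([e1, e2])
print(state_after_e1[("OWN", "Blackburn", "Scott Sellars")])
print(state_after_e2[("OWN", "Blackburn", "Scott Sellars")])
```

The point of the sketch is that the verb's lexical entry, not the sentence itself, supplies the facts about who owns what before and after each event.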

Page 37

Hot Topics

In-depth lexical analysis
  e.g. X buys Y from Z at t ==> Z owns Y before t & X owns Y after t

Key issues at the lexicon-grammar interface:
  predicate event structure: states, processes, accomplishments, etc.
  temporal adverbs and temporal expressions: e.g. in three years, etc.
  quantificational expressions, etc.
  syntax-semantics argument linking

To provide SW agents with high inferential capacities in accessing linguistic content

Page 38

Computational Lexicons and

the Semantic Web

Part 2

Lexicon Design in the Age of the Semantic Web

Page 39

Lexicons of the Future

General purpose: portable over different domains

Multilingual: relations among lexical entities in different languages

Flexible and extensible:
  enable use of information at appropriate granularity for the application
  enable continual extension: “dynamic”

Integrated with Web technology: content interoperability

Page 40

Lexical Content Interoperability

The Lexical Web: enable universal access to lexical information

SIMPLE

WordNet

FrameNet

EuroWordNet

Intelligent Agents

Page 41

Some Requirements for Lexical Content Interoperability

Compatibility between different models of lexical analysis:
  relational semantic models (e.g. WordNet)
  syntactic and semantic frames
  …

Compatibility between different degrees of lexical specification:
  deep lexical representations (e.g. PAROLE-SIMPLE)
  shallow semantic descriptions

Compatibility between different paradigms of multilinguality:
  lexicons for transfer-based MT
  interlingua-based lexicons
  …

Page 42

The Need for Standards

To represent common information … while keeping flexibility

To enhance the sharing and reusability of multilingual lexical resources

To establish an open environment for the development and integration of multilingual resources

Information must be consistent with related technologies in order to take advantage of them: XML, RDF/S, etc.

Page 43

International Standards for Language Engineering

Computational Lexicon Working Group (CLWG)

Definition of standards for multilingual computational lexicons both at the content and at the representational level

Page 44

EAGLES guidelines for syntactic and semantic lexicons

GENELEX Model

PAROLE-SIMPLE Lexicons

Multilingual Lexicons (EuroWordNet, etc.)

MILE Lexical Model

ISLE

Page 45

The MILE Lexical Model

A general architecture to foster content interoperability between multilingual computational lexicons

Key issues: modularity, user-adaptability, resource sharing, reusability

SW technologies and standards applied to lexicon modelling

Page 46

The MILE Lexical Model (MLM)

The MLM core is the Multilingual ISLE Lexical Entry (MILE):
  a general schema for multilingual lexical resources
  a lexical meta-entry as a common representational layer for multilingual lexicons

Computational lexicons can be viewed as different instances of the MILE schema

MILE Lexical Model: lexicon#1, lexicon#2, lexicon#3

Page 47

MILE: the building-block model

The MILE architecture is designed according to the building-block model:

Lexical entries are obtained by combining various types of lexical objects (atomic and complex)

Users design their lexicon by:
  selecting and/or specifying the relevant lexical objects
  combining the lexical objects into lexical entries

Lexical objects may be shared:
  within the same lexicon (intra-lexicon reusability)
  among different lexicons (inter-lexicon reusability)
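The building-block idea can be illustrated with a minimal sketch in which lexical objects are defined once and then referenced by several entries. The class names, the `uri` and `kind` fields, and the sample objects are assumptions made for this example, not part of the MILE specification.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class LexicalObject:
    """An atomic building block; `uri` and `kind` are illustrative field names."""
    uri: str
    kind: str


@dataclass
class LexicalEntry:
    lemma: str
    objects: List[LexicalObject] = field(default_factory=list)


def shared_objects(a: LexicalEntry, b: LexicalEntry) -> Set[str]:
    """URIs of the lexical objects that two entries have in common."""
    return {o.uri for o in a.objects} & {o.uri for o in b.objects}


# Objects are defined once and then reused by several entries.
HUMAN = LexicalObject("#HUMAN", "SemFeature")
NOUN = LexicalObject("#Noun", "SynFeature")
VERB = LexicalObject("#Verb", "SynFeature")

employer = LexicalEntry("employer", [HUMAN, NOUN])
employee = LexicalEntry("employee", [HUMAN, NOUN])
employ = LexicalEntry("employ", [VERB])

print(sorted(shared_objects(employer, employee)))
```

Because entries hold references to shared objects rather than private copies, the same objects could equally be referenced from a second lexicon, which is the inter-lexicon reusability mentioned above.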

Page 48

MILE: the building-block model

Lexical Objects: syntactic frame, phrase, slot, Synfeature, Semfeature

Lexical entry 1, Lexical entry 2, Lexical entry 3

Page 49

Modularity in MILE

multiple levels of modularity

mono-MILE: morphological layer, syntactic layer, semantic layer, linking conditions

multi-MILE: multilingual correspondence conditions

Page 50

The Mono-MILE

Each monolingual layer within Mono-MILE identifies a basic unit of lexical description:

morphological layer: MU
  basic unit to describe the inflectional and derivational morphological properties of the word

syntactic layer: SynU
  basic unit to describe the syntactic behavior of the MU

semantic layer: SemU
  basic unit to describe the semantic properties of the MU

Page 51

The Mono-MILE

[Diagram: one MU linked to four SynUs and seven SemUs]

Page 52

Syntax-Semantics Linking

SynU: Self, Slot_0, Slot_1

SemU: Predicate, Arg_0, Arg_1

CorrespSynUSemU
  linking: Slot_0:Arg_1, Slot_1:Arg_0
  filters & conditions

Expressed by pointing to syntactic and semantic elements

Page 53

Syntax-Semantics Linking

John gave the book to Mary
  SynU#1: subj_NP, obj_NP, obl_PP_to

John gave Mary the book
  SynU#2: subj_NP, obj_NP, obj_NP

SemU#1
  Semantic_Frame: GIVE
  Arg1: Agent
  Arg2: Theme
  Arg3: Goal
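The linking between the two syntactic units and the single semantic frame can be sketched as two slot-to-argument mappings. The slot and argument names follow the slide; the concrete linking tables, the positional encoding of slots, and the function name `roles` are assumptions, since the arrows of the original diagram are not recoverable from the transcript.

```python
# Arguments of the GIVE semantic frame, as listed on the slide.
GIVE_ARGS = {"Arg1": "Agent", "Arg2": "Theme", "Arg3": "Goal"}

# Slots of each syntactic unit in surface order.
SYNU_1 = ["subj_NP", "obj_NP", "obl_PP_to"]   # John gave the book to Mary
SYNU_2 = ["subj_NP", "obj_NP", "obj_NP"]      # John gave Mary the book

# Assumed linking: slot position -> semantic argument.
LINK_1 = {0: "Arg1", 1: "Arg2", 2: "Arg3"}
LINK_2 = {0: "Arg1", 1: "Arg3", 2: "Arg2"}


def roles(fillers, linking):
    """Map the fillers of the syntactic slots onto semantic roles."""
    return {GIVE_ARGS[linking[i]]: filler for i, filler in enumerate(fillers)}


print(roles(["John", "the book", "Mary"], LINK_1))
print(roles(["John", "Mary", "the book"], LINK_2))
```

Both sentences end up with the same role assignment, which is what linking two SynUs to one SemU is meant to capture.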

Page 54

The Multi-MILE

Open to various approaches to multilinguality:

transfer-based
  monolingual descriptions are used to state complex correspondences (tests and actions) between source and target entries

interlingua-based
  monolingual entries linked to language-independent lexical objects (e.g. semantic frames, “primitive predicates”, etc.)

Page 55

Multi-MILE

Italian mono-MILE: MU_1, SynU_1, SynU_2, SemU_1, SemU_2

English mono-MILE: MU_1, SynU_1, SemU_1

IT-to-EN multi-MILE:
  IT_SemU_2 -> EN_SemU_1
  IT_SynU_2 -> EN_SynU_1
  IT_Slot_0 -> EN_Slot_1
  IT_Slot_1 -> EN_Slot_0

AddFeature to source SemU: +HUMAN

AddSlot to target SynU: MODIF [PP_with]

Page 56

Multi-MILE

IT Lexicon -> EN Lexicon, with multilingual conditions:

dito + modif(mano) -> finger

dito + modif(piede) -> toe

entrare (“to enter”) + PP_di_corsa -> run + PP_into
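The dito example can be read as a correspondence table in which each link carries a condition on the source context. The following is a hedged sketch: the table layout, the `translate` function, and the fallback of returning every candidate when no condition is met are assumptions, not behaviour defined by MILE.

```python
# (source lemma, required modifier, target lemma); layout is an assumption.
CORRESPONDENCES = [
    ("dito", "mano", "finger"),
    ("dito", "piede", "toe"),
]


def translate(lemma, modifiers=()):
    """Return the targets whose condition holds, or all candidates otherwise."""
    candidates = [(cond, tgt) for src, cond, tgt in CORRESPONDENCES if src == lemma]
    matched = [tgt for cond, tgt in candidates if cond in modifiers]
    # With no satisfied condition the lemma stays ambiguous, so every
    # candidate is returned and the choice is left to the caller.
    return matched or [tgt for _, tgt in candidates]


print(translate("dito", ["mano"]))
print(translate("dito", ["piede"]))
print(translate("dito"))
```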

Page 57

Defining the MLM

The MLM is designed as an E-R model (MILE Entry Schema): it defines the lexical objects and the ways they can be combined into a lexical entry

The MLM includes two types of lexical objects:
  MILE Lexical Classes (MLC)
  MILE Lexical Data Categories (MDC)

Page 58

MILE Lexical Classes

Represent the main building blocks of lexical entries

Define an ontology of lexical objects: represent lexical notions such as semantic unit, syntactic feature, syntactic frame, semantic predicate, semantic relation, synset, etc.

Similar to class definitions in OO languages:
  specify the relevant attributes
  define the relations with other classes
  hierarchically structured

Page 59

MILE Lexical Classes: an ontology of lexical objects

MLM:SemU

Attributes:
  id: xs:anyURI
  comment: xs:string
  example: xs:string

Relations (cardinality):
  correspondsToSynset: MLM:Synset (*)
  hasSemanticFrame: MLM:SemanticFrame (0..1)
  semFeature: MLM:semValues (*)
  semURelation: MLM:SemURelation, MLM:SemU (*)
  hasCollocation: MLM:Collocation (*)

Page 60

MILE Lexical Data Categories

MDC are instances of the MILE Lexical Classes

Each MDC represents a resource uniquely identified by a URI

Two types of MDC:

Core MDC
  belong to shared repositories (Lexical Data Category Registry)
  lexical objects and linguistic notions with wide consensus

User Defined MDC
  user-specific or language-specific lexical objects

Page 61

MILE Lexical Data Categories

MLM:Feature

MLM:SemFeature
  MDC (instance_of), Core and User Defined: HUMAN, ARTIFACTUAL, EVENT, DURATION, GROUP; AGE, ANIMATE

MLM:SynFeature
  MDC (instance_of), Core and User Defined: GENDER, CASE, PERSON, TENSE, CONTROL; ASPECT

MLM:GrammaticalFunction
  MDC (instance_of), Core and User Defined: SUBJ, OBJ, IOBJ, PRED, X_COMP, C_COMP

Page 62

Defining the MLM

MILE Entry Schema

MILE Lexical Classes

User Defined MDC

MDC Registry

RDF/S Descriptions

Monolingual/Multilingual Lexicon

Page 63

RDF Instantiation of the MLM

Resources: Lexicon#1, Lexicon#2, Lexicon#3

Resources: Lexical Objects (Lexical Classes, Lexical Data Categories)

Metadata

Page 64

General Means

W3C standards:
  Resource Description Framework (RDF/S)
  Web Ontology Language (OWL)
  …

Built on the XML web infrastructure to enable the creation of a Semantic Web:
  web objects are classified according to their properties
  semantics of relations (links) to other web objects precisely defined

Page 65

MILE Lexical Model

Ideal structure for rendering in RDF: hierarchy of lexical objects built up by combining atomic data categories via clearly defined relations

Proof of concept:
  Create an RDF schema for the MILE Lexical Model version 1.2
  Instantiate MILE Lexical Data Categories

Page 66

The RDF Schema

Defines classes of objects (MLC) and their relations to other objects

Like a class definition in Java, etc.

Classes and properties in the schema correspond to the E-R model

Can specify sub-classes/sub-properties and inheritance

Page 67

MILE Lexical Data Category Registry (MDC)

Instantiation of pre-defined lexical objects

Extension of the shared class schema with lexicon-specific sub-classes and sub-properties

Can be used “off the shelf” or as a departure point for the definition of new or modified categories

Enables modular specification of lexical entities:
  eliminate redundancy
  identify lexical entries or sub-entries with shared properties

MLC in RDF/S: features

[Diagram]
  mlm:LexObject --mlm:feature--> mlm:Values
  mlm:SemValues rdfs:subClassOf mlm:Values
  mlm:SynValues rdfs:subClassOf mlm:Values
  mlm:semFeature rdfs:subPropertyOf mlm:feature
  mlm:synFeature rdfs:subPropertyOf mlm:feature

Features are properties of lexical objects
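A minimal Python sketch of what the sub-property links imply: any statement made with a specific feature also holds for the more general feature above it. The triple layout, the `ex:` subject and the placement of `synCat` and `domain` under the `mdc:` prefix are assumptions made for illustration. Only the sub-property links themselves come from the slides.

```python
# Sub-property links as described on the slides. The prefixes on synCat
# and domain, and the single-parent table, are illustrative assumptions.
SUBPROPERTY_OF = {
    "mlm:synFeature": "mlm:feature",
    "mlm:semFeature": "mlm:feature",
    "mdc:synCat": "mlm:synFeature",
    "mdc:domain": "mlm:semFeature",
}


def infer_superproperties(triples, subproperty_of=SUBPROPERTY_OF):
    """Add (s, q, o) for every stated (s, p, o) where p is a sub-property of q."""
    inferred = set(triples)
    for subj, prop, obj in triples:
        seen = {prop}
        # Walk up the sub-property chain, stopping if a cycle is detected.
        while prop in subproperty_of and subproperty_of[prop] not in seen:
            prop = subproperty_of[prop]
            seen.add(prop)
            inferred.add((subj, prop, obj))
    return inferred


# "ex:dog" is a hypothetical lexical object used only for this example.
stated = {("ex:dog", "mdc:synCat", "mdc:Noun")}
closure = infer_superproperties(stated)
```

A query for every `mlm:feature` of `ex:dog` would then also find the syntactic category, even though it was stated only with `mdc:synCat`.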

MLC in RDF/S: syntactic features

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- synCat: a syntactic feature whose range is SynCatValues -->
  <rdf:Property rdf:ID="synCat">
    <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#synFeature"/>
    <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynCatValues"/>
  </rdf:Property>
  <!-- feature values -->
  <rdfs:Class rdf:ID="SynCatValues">
    <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynValues"/>
    <owl:oneOf rdf:parseType="Collection">
      <owl:Thing rdf:about="#Noun"/>
      <owl:Thing rdf:about="#Verb"/>
      <owl:Thing rdf:about="#Adjective"/>
      ...
    </owl:oneOf>
  </rdfs:Class>
</rdf:RDF>
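One way a lexicon tool might use such an enumerated value class is as a closed list to check entries against. The Python sketch below assumes exactly that. Note that RDFS treats rdfs:range as a basis for inference rather than as a validation constraint, so the closed-world check here is a design choice of the example, not something the schema enforces. The enumeration is also truncated on the slide, so only the three listed values are included.

```python
# Values listed on the slide. The original enumeration continues ("..."),
# so this set is knowingly incomplete.
ENUMERATIONS = {
    "SynCatValues": {"Noun", "Verb", "Adjective"},
}

# Property -> value class, mirroring the rdfs:range statement for synCat.
RANGES = {
    "synCat": "SynCatValues",
}


def is_valid(prop, value, ranges=RANGES, enumerations=ENUMERATIONS):
    """Check value against the enumerated range of prop.

    Properties without a known enumerated range are left unconstrained.
    """
    value_class = ranges.get(prop)
    if value_class is None or value_class not in enumerations:
        return True
    return value in enumerations[value_class]
```

For example, `is_valid("synCat", "Noun")` holds, while a misspelt or unlisted category is rejected.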

MLC in RDF/S: semantic features

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- domain: a semantic feature whose range is DomainValues -->
  <rdf:Property rdf:ID="domain">
    <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#semFeature"/>
    <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#DomainValues"/>
  </rdf:Property>
  <rdfs:Class rdf:ID="DomainValues">
    <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SemValues"/>
    <owl:oneOf rdf:parseType="Collection">
      <owl:Thing rdf:about="#Finance"/>
      <owl:Thing rdf:about="#Medicine"/>
      <owl:Thing rdf:about="#Sport"/>
      ...
    </owl:oneOf>
  </rdfs:Class>
</rdf:RDF>

“domain ontology”

Synsets in RDF/S

[Diagram]
  mlm:Synset --mlm:word--> rdfs:Literal
  mlm:Synset --mlm:gloss--> rdfs:Literal
  mlm:Synset --mlm:synsetRelation--> mlm:Synset
  mlm:Synset --mlm:feature--> mlm:Values

cf. also http://www.semanticweb.org/library/wordnet/wordnet-20000620.rdfs

Synsets in RDF/S

<rdfs:Class rdf:ID="Synset">
  <rdfs:label>Synset</rdfs:label>
  <rdfs:comment>This class formalizes the notion of synset as defined in WordNet (Fellbaum 1998).</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#LexObject"/>
</rdfs:Class>

<!-- relation between synsets -->
<rdf:Property rdf:ID="synsetRelation">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Synset"/>
</rdf:Property>

<!-- different types of synset relations -->
<rdf:Property rdf:ID="hypernym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet hypernym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
<rdf:Property rdf:ID="meronym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet meronym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
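Because hypernym and meronym are sub-properties of synsetRelation, both ends of such a link are expected to be synsets. A small Python sketch of a consistency check built on that reading follows. The type table, the `wn:` and `ex:` identifiers and the function name are hypothetical. Only the two synset numbers are taken from the WordNet 1.7 example in this tutorial.

```python
# Properties that relate a Synset to a Synset, per the schema fragment.
SYNSET_RELATIONS = {"synsetRelation", "hypernym", "meronym"}

# Hypothetical typing information for a handful of resources.
TYPES = {
    "wn:01752990": "Synset",
    "wn:01752283": "Synset",
    "ex:dog-entry": "LexicalEntry",  # invented non-synset resource
}


def ill_typed(triples, types=TYPES):
    """Return synset-relation triples whose subject or object is not a Synset."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p in SYNSET_RELATIONS
        and (types.get(s) != "Synset" or types.get(o) != "Synset")
    ]


links = [
    ("wn:01752990", "hypernym", "wn:01752283"),  # synset to synset: fine
    ("ex:dog-entry", "hypernym", "wn:01752283"),  # subject is not a synset
]
problems = ill_typed(links)
```

An RDFS reasoner would instead infer that `ex:dog-entry` is a Synset. Flagging it as an error, as done here, is a stricter closed-world policy chosen for the example.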

WordNet 1.7 Synsets

<mlm:Synset rdf:about="http://www.cogsci.princeton.edu/~wn1.7/concept#01752990"
            mlm:source="WordNet1.7">
  <mlm:gloss>A member of the genus Canis</mlm:gloss>
  <mlm:word>dog</mlm:word>
  <mlm:word>domestic dog</mlm:word>
  <mlm:word>Canis familiaris</mlm:word>
  <!-- features -->
  <mdc:synCat rdf:resource="#Noun"/>
  <mdc:domain rdf:resource="#Zoology"/>
  <!-- hypernym -->
  <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/~wn1.7/concept#01752283"/>
</mlm:Synset>
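To show how such an entry could be consumed, here is a small Python sketch that reads the synset with the standard library XML parser. Two things are assumptions: the `mlm:` prefix is bound to the schema URI used elsewhere in these slides, and the `mdc:` namespace, which the slides do not give, is replaced by a placeholder. The sketch reads the XML serialization directly, so a full RDF parser would be the more robust choice in practice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed binding for mlm:, reusing the schema URI shown on the slides.
MLM = "http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#"
# Placeholder: the slides do not state the mdc: namespace.
MDC = "http://example.org/mdc#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:mlm="{MLM}" xmlns:mdc="{MDC}">
  <mlm:Synset rdf:about="http://www.cogsci.princeton.edu/~wn1.7/concept#01752990"
              mlm:source="WordNet1.7">
    <mlm:gloss>A member of the genus Canis</mlm:gloss>
    <mlm:word>dog</mlm:word>
    <mlm:word>domestic dog</mlm:word>
    <mlm:word>Canis familiaris</mlm:word>
    <mdc:synCat rdf:resource="#Noun"/>
    <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/~wn1.7/concept#01752283"/>
  </mlm:Synset>
</rdf:RDF>"""


def read_synset(xml_text):
    """Extract identifier, gloss, words and hypernym from one synset element."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{MLM}}}Synset")
    hypernym = node.find(f"{{{MDC}}}hypernym")
    return {
        "id": node.get(f"{{{RDF}}}about"),
        "gloss": node.findtext(f"{{{MLM}}}gloss"),
        "words": [w.text for w in node.findall(f"{{{MLM}}}word")],
        "hypernym": hypernym.get(f"{{{RDF}}}resource"),
    }


synset = read_synset(DOC)
```

The returned dictionary keeps the hypernym as a URI, so the parent synset can be looked up in the same way.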

Conclusions and Future Work

The MILE Lexical Model is oriented towards open, distributed lexical resources:
  Lexical Information Servers for multiple access to lexical information repositories
  Enhance user-adaptivity and resource sharing
  Develop integration and interchange tools
  Promote interchange with the Semantic Web and Ontology communities

Related projects and initiatives:
  ISO, INTERA, ENABLER, etc.

Acknowledgements

S. Atkins, N. Bel, F. Bertagna, P. Bouillon, N. Calzolari, C. Fellbaum, R. Grishman, N. Ide, M. Palmer, W. Peters, G. Thurmair, M. Villegas, P. Wittenburg, A. Zampolli

and many others …