GWC 2004 - Brno, January 20-23, 2004

ArchiWordNet: Integrating WordNet with Domain-Specific Knowledge

Luisa Bentivogli (1), Andrea Bocco (2), Emanuele Pianta (1)

(1) ITC-irst, Trento, Italy
(2) Politecnico di Torino, Italy

Outline

• ArchiWordNet: a WordNet-like thesaurus

• Adopting and adapting the MultiWordNet model

• Integrating ArchiWordNet with MultiWordNet

• Conclusion and future work

ArchiWordNet: a WordNet-like thesaurus

• A bilingual English/Italian thesaurus for the “Architecture and Construction” domain
  – structured according to the WordNet model
  – fully integrated with MultiWordNet

MultiWordNet: a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton’s English WordNet.

Motivation

• Still Image Server, an architecture image archive available at the Polytechnic of Turin
  – need for a thesaurus:
    • Image cataloguing (minimize subjectivity)
    • Image retrieval (minimize ambiguity)
• No exhaustive thesauri for the architecture domain are available

Why the (Multi)WordNet model?

• A rich and rigorous structure
  – synonyms
  – many relations explicitly and homogeneously encoded
• Allows for a more powerful and expressive retrieval mechanism
  – no ambiguities
  – extended search with related concepts
• Is more suitable for educational purposes

Why integrated with MultiWN?

• General and multilingual framework for the specialized knowledge

• Integrated access allowing for a more flexible retrieval of the information

• Information already existing in the generic (Multi)WordNet can be exploited in the creation of the specialized one

Adopting MultiWN model

• Sources:
  – Specialized sources
    • Art and Architecture Thesaurus (AAT)
    • Construction Indexing Manual of CI/SfB
    • International and National standards (ISO, CEN, UNI)
    • Architecture and Building Dictionaries
    • Domain literature
  – MultiWN itself
• Issues:
  – Reorganize specialized sources to make them compatible with the MultiWN model
  – Modify MultiWN synsets to make them suitable for representing the specialized domain

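Reorganizing domain-specific sources

[Figure: two hierarchies for METAL, side by side.
AAT hierarchy: METAL, with the guide terms <by composition or origin>, <by quality> and <by form> grouping the terms noble_metal, aluminum, steel, alloy, powder and leaf.
ArchiWN hierarchy: METAL, with ISA links among the nodes noble_metal, steel, aluminum, alloy, metal_powder, metal_leaf, leaf and powder; the guide terms no longer appear.]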

Tailoring MultiWN synsets

• MultiWN synsets considered appropriate by the domain experts are included in ArchiWN

• Several options are available:
  – add or delete synonyms in MultiWN synsets
  – modify the MultiWN definitions of the synsets
  – delete and add relations between synsets

New relations for ArchiWN

• HAS FORM (n/n) – {tympanum} HAS-FORM {triangle, trigon, …}

• HAS ROLE (n/n) – {metal section} HAS-ROLE {upright, vertical}

• HAS FUNCTION (n/v) – {beam} HAS-FUNCTION {to hold, to support,…}
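
As a rough illustration of the (n/n) and (n/v) annotations above, the following Python sketch stores each new relation with its part-of-speech signature and rejects a link whose endpoints do not match it. The `Synset` class, the `make_relation` helper and the triple representation are hypothetical and chosen only for this example; they are not the MultiWN data model.

```python
from dataclasses import dataclass

# Part-of-speech signature of each new relation, as given on the slide:
# (source POS, target POS), with "n" = noun and "v" = verb.
NEW_RELATIONS = {
    "HAS-FORM": ("n", "n"),
    "HAS-ROLE": ("n", "n"),
    "HAS-FUNCTION": ("n", "v"),
}


@dataclass(frozen=True)
class Synset:
    """Hypothetical minimal synset: a tuple of lemmas plus a POS tag."""
    lemmas: tuple
    pos: str


def make_relation(source, name, target):
    """Return a (source, name, target) triple after checking the POS signature."""
    src_pos, tgt_pos = NEW_RELATIONS[name]
    if (source.pos, target.pos) != (src_pos, tgt_pos):
        raise ValueError(
            f"{name} expects {src_pos}/{tgt_pos}, got {source.pos}/{target.pos}"
        )
    return (source, name, target)


# Examples taken from the slide.
tympanum = Synset(("tympanum",), "n")
triangle = Synset(("triangle", "trigon"), "n")
beam = Synset(("beam",), "n")
support = Synset(("to hold", "to support"), "v")

relations = [
    make_relation(tympanum, "HAS-FORM", triangle),
    make_relation(beam, "HAS-FUNCTION", support),
]
```

Linking {beam} to a verb synset through HAS-FORM would raise a `ValueError` here, because HAS-FORM is declared noun-to-noun.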

Integrating ArchiWN with MultiWN

• 5,000 terms grouped in 13 semantic areas => the main ArchiWN hierarchies
  • Architectural styles
  • Materials
  • Construction products
  • Techniques
  • Tools
  • Components of buildings
  • Single buildings and building complexes
  • Physical properties
  • Conditions
  • Disciplines
  • People
  • Documents
  • Drawings and representations

Integration issues

• Identify the MultiWN nodes where the ArchiWN hierarchies are to be inserted
• Include ArchiWN hierarchies in MultiWN
• Handle the overlaps between terms present in both MultiWN and ArchiWN
• Handle the possible inconsistencies in the hierarchies

The integration methodology

• Basic operations
  – performed on single MultiWN synsets
• Complex procedures (plug-in)
  – apply to entire hierarchies

Basic operations

• eclipse a synset

• tag a synset with the “architecture and construction” domain label

• add or delete relations to a synset

• add or delete synonyms in a synset

• modify the synset definition
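
The five basic operations can be pictured as methods on a single synset record. The Python sketch below is only an illustration: the `Synset` class, its field names, and the choice to model eclipsing as a flag that hides the synset without deleting it are all assumptions made for this example, not a description of the MultiWN implementation.

```python
from dataclasses import dataclass, field

DOMAIN_LABEL = "architecture and construction"


@dataclass
class Synset:
    """Hypothetical synset record used only for this illustration."""
    synonyms: list
    definition: str
    relations: set = field(default_factory=set)  # (relation name, target id) pairs
    domains: set = field(default_factory=set)
    eclipsed: bool = False  # assumption: an eclipsed synset is hidden, not deleted

    def eclipse(self):
        self.eclipsed = True

    def tag_domain(self, label=DOMAIN_LABEL):
        self.domains.add(label)

    def add_relation(self, name, target_id):
        self.relations.add((name, target_id))

    def delete_relation(self, name, target_id):
        self.relations.discard((name, target_id))

    def add_synonym(self, lemma):
        if lemma not in self.synonyms:
            self.synonyms.append(lemma)

    def delete_synonym(self, lemma):
        if lemma in self.synonyms:
            self.synonyms.remove(lemma)

    def modify_definition(self, text):
        self.definition = text


# Usage on a toy synset (identifiers such as "partition/1" are made up).
wall = Synset(["wall"], "an architectural partition")
wall.tag_domain()
wall.add_relation("ISA", "partition/1")
wall.add_synonym("partition wall")
wall.delete_synonym("partition wall")
wall.modify_definition("an architectural partition used to divide or enclose an area")
```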

Complex procedures

• Substitutive plug-in

• Integrative plug-in

• Hyponymic plug-in

• Inverse plug-in
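
The slides name the four procedures without defining them, so the Python sketch below is one possible reading inferred from the names alone, not the authors' definition. It assumes a hierarchy is a plain dictionary mapping each synset to its single hypernym, that a hyponymic plug-in hangs an ArchiWN hierarchy below a MultiWN synset, that an inverse plug-in hangs a MultiWN sub-hierarchy below an ArchiWN synset, and that a substitutive plug-in replaces a MultiWN sub-hierarchy with an ArchiWN one. The integrative plug-in is left out because merging two hierarchies needs decisions the slides do not spell out. All function names and node labels are hypothetical.

```python
def descendants(parents, root):
    """All nodes whose hypernym chain passes through root (root excluded)."""
    found = set()
    changed = True
    while changed:
        changed = False
        for child, parent in parents.items():
            if child not in found and (parent == root or parent in found):
                found.add(child)
                changed = True
    return found


def hyponymic_plugin(mwn, awn, awn_root, mwn_synset):
    """Assumed reading: attach the whole AWN hierarchy below an MWN synset."""
    merged = dict(mwn)
    merged.update(awn)
    merged[awn_root] = mwn_synset
    return merged


def inverse_plugin(awn, mwn, mwn_root, awn_synset):
    """Assumed reading: attach an MWN sub-hierarchy below an AWN synset."""
    sub = descendants(mwn, mwn_root)
    merged = dict(awn)
    merged.update({c: p for c, p in mwn.items() if c in sub})
    merged[mwn_root] = awn_synset
    return merged


def substitutive_plugin(mwn, awn, awn_root, mwn_root):
    """Assumed reading: the AWN hierarchy replaces the MWN sub-hierarchy."""
    eclipsed = descendants(mwn, mwn_root) | {mwn_root}
    merged = {c: p for c, p in mwn.items() if c not in eclipsed}
    merged.update(awn)
    merged[awn_root] = mwn[mwn_root]  # AWN root takes over the old root's hypernym
    return merged, eclipsed


# Toy hierarchies: child -> hypernym. The root of each has no entry of its own.
mwn = {"mwn:b": "mwn:a", "mwn:c": "mwn:b", "mwn:d": "mwn:b"}
awn = {"awn:y": "awn:x", "awn:z": "awn:x"}

hypo = hyponymic_plugin(mwn, awn, "awn:x", "mwn:b")
inv = inverse_plugin(awn, mwn, "mwn:b", "awn:y")
subst, hidden = substitutive_plugin(mwn, awn, "awn:x", "mwn:b")
```

In this toy run the substitutive call leaves only "mwn:a" from the original MultiWN side and reports "mwn:b", "mwn:c" and "mwn:d" as eclipsed.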

[Figures: eight diagram slides follow the list of complex procedures, apparently one before/after pair per plug-in. Only the node labels survive, MWN for MultiWordNet and AWN for ArchiWordNet. In slide order:
Substitutive plug-in: six MWN nodes, then four AWN and two MWN nodes.
Integrative plug-in: six MWN nodes, then three AWN and three MWN nodes.
Hyponymic plug-in: five MWN nodes, then five MWN and three AWN nodes.
Inverse plug-in: five AWN nodes, then five AWN and three MWN nodes.]

Results

• 13 ArchiWN semantic areas plugged in 18 MultiWN synsets
  – 11 ArchiWN semantic areas (12 hierarchies) directly plugged in MultiWN
    • 4 substitutive plug-ins
    • 8 integrative plug-ins
  – 2 ArchiWN semantic areas (6 hierarchies) required a reorganization of some MultiWN sub-hierarchies
    • 4 hyponymic plug-ins
    • 2 inverse plug-ins
    • large synset eclipsing
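
The figures on this slide can be cross-checked with a few lines of Python. The only assumption is that each plug-in attaches exactly one hierarchy; under that reading the per-type counts reproduce the hierarchy totals given in parentheses, and the four plug-in counts add up to 18, the same number as the MultiWN synsets mentioned in the first bullet.

```python
# Counts copied from the Results slide.
plug_ins = {"substitutive": 4, "integrative": 8, "hyponymic": 4, "inverse": 2}
areas = {"direct": 11, "reorganized": 2}

# Assumption: one plug-in attaches one hierarchy.
direct_hierarchies = plug_ins["substitutive"] + plug_ins["integrative"]
reorganized_hierarchies = plug_ins["hyponymic"] + plug_ins["inverse"]

total_plug_ins = sum(plug_ins.values())
total_areas = sum(areas.values())

print(direct_hierarchies, reorganized_hierarchies, total_plug_ins, total_areas)
# prints: 12 6 18 13
```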

ArchiWN up to now

• “Single buildings and building complexes” sub-hierarchy
  – 900 synsets
  – Italian and English synonyms
  – accurate definitions
• Work done manually using the MultiWN graphical interface, which allows the user
  – to modify existing synsets and relations
  – to create new synsets

Conclusions

• It is possible to integrate ArchiWN with MultiWN
• MultiWN itself can be widely exploited in the creation of ArchiWN hierarchies
• Advantages of interdisciplinary cooperation
  – with respect to specialized thesauri
    • formalized structure
    • inheritance of linguistic-oriented information from the generic WordNet
  – with respect to lexical resources
    • many synsets will be associated with images

Future work

• Continue enriching the “Single buildings and building complexes” hierarchy and populate the remaining hierarchies

• Industrial applications: multilingual specialized lexicon of approximately 1,000 synsets for the window and curtain wall industry

• Agreement for the future usage of ArchiWN by the Piemonte region in the cataloguing of its architectural cultural heritage

Details

Direct plug-ins

ArchiWN semantic area          MultiWN synset(s)             Plug-in
Architectural styles           architectural style/1         Sub
Materials                      material/1, substance/1       Sub
Construction products          building material/1           Sub
Techniques                     technique/1                   Int
Tools                          tool/1                        Int
Physical properties            physical property/1           Int
Conditions                     condition/1                   Int
Disciplines                    discipline/1                  Int
People                         person/                       Int
Documents                      document/1                    Int
Drawings and representations   drawing/2, representation/2   Int

Sub = substitutive plug-in, Int = integrative plug-in

Reorganizations

ArchiWN semantic area                     Synset               Plug-in
Components of buildings                   structure/1          Hypo
                                          component/3          Hypo
                                          region/1             Hypo
Single buildings and building complexes   structure/AWN        Hypo
                                          building/1           Inverse
                                          building complex/1   Inverse

Hypo = hyponymic plug-in, Inverse = inverse plug-in

Term overlapping

ITC-irst provides the Polytechnic with lists of terms:

– synsets tagged with the “architecture” label in WN-Domains
– hyponyms of the WordNet plug-in synsets

WN-Domains: 2,595 synsets in total
• Architecture = 155 synsets
  – Town planning = 444 synsets
  – Building industry = 1,541 synsets
  – Furniture = 455 synsets
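
A quick check in Python confirms that the four per-domain counts on this slide add up to the 2,595 figure given next to WN-Domains. The dictionary name is made up for the example; the numbers are copied from the slide.

```python
# Synset counts per WN-Domains label, copied from the slide.
wn_domains_counts = {
    "Architecture": 155,
    "Town planning": 444,
    "Building industry": 1541,
    "Furniture": 455,
}

total = sum(wn_domains_counts.values())
print(total)  # 2595
```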

Hyponyms of plug-in synsets

ArchiWN semantic area          MultiWN synset          Plug-in   Hyponyms
Architectural styles           architectural style/1   S         12
Materials                      material/1              S         1,266
                               substance/1             S         6,054
Construction products          building material/1     S         95
Techniques                     technique/1             I         3
Tools                          tool/1                  I         301
Physical properties            physical property/1     I         103
Conditions                     condition/1             I         1,721
Disciplines                    discipline/1            I         464
People                         person/                 I         6,068
Documents                      document/1              I         328
Drawings and representations   drawing/2               I         26
                               representation/2        I         159

S = substitutive plug-in, I = integrative plug-in

Reorganization of:
– Components of buildings
– Single buildings and building complexes

[Figure: hierarchy diagram. Nodes shown: entity/1, object/1, artifact/1, structure/1, part/4, location/1, region/1, component/3, structure (AWN), architectural component, architectural space, building/1, building complex/1, building element, open space, and {room, area, building space}. Edge labels: four "hypo", two "inverse", and one "eclipsing".]

Modifying MultiWN definition

WordNet: {wall – “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area or to support another structure”}

Modified definitions:
{wall} – “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area”
{structural_wall, bearing_wall} – “any wall supporting a floor or the roof of a building”

[Figure: {structural_wall, bearing_wall} with two ISA links; the other nodes shown are {wall}, {support} and {partition, divider}.]
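
The change to the {wall} definition amounts to dropping the final clause about support, which on this slide is carried instead by the more specific bearing-wall definition. A minimal Python sketch of that edit follows; the `drop_clause` helper is hypothetical and only illustrates the string change, not how the MultiWN interface stores definitions.

```python
# Original WordNet definition of {wall}, as quoted on the slide.
WORDNET_WALL = (
    "an architectural partition with a height and length greater "
    "than its thickness; used to divide or enclose an area "
    "or to support another structure"
)

# Clause that the tailored definition leaves out.
SUPPORT_CLAUSE = " or to support another structure"


def drop_clause(definition, clause):
    """Remove one occurrence of clause; fail loudly if it is not present."""
    if clause not in definition:
        raise ValueError("clause not found in definition")
    return definition.replace(clause, "", 1)


tailored_wall = drop_clause(WORDNET_WALL, SUPPORT_CLAUSE)
print(tailored_wall)
```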