GWC 2004 - Brno, January 20-23, 2004
ArchiWordNet: Integrating WordNet with Domain-Specific Knowledge
Luisa Bentivogli (1), Andrea Bocco (2), Emanuele Pianta (1)
(1) ITC-irst, Trento, Italy; (2) Politecnico di Torino, Italy
Outline
• ArchiWordNet: a WordNet-like thesaurus
• Adopting and adapting the MultiWordNet model
• Integrating ArchiWordNet with MultiWordNet
• Conclusion and future work
ArchiWordNet: a WordNet-like thesaurus
• A bilingual English/Italian thesaurus for the “Architecture and Construction” domain
– structured according to the WordNet model
– fully integrated with MultiWordNet
MultiWordNet
A multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton’s English WordNet.
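The alignment idea can be illustrated with a small lookup sketch. It assumes, for illustration only, that each Italian synset reuses the identifier of the English synset it is aligned with; the IDs (`n#wall_1`, `n#beam_1`) and the function `translate` are hypothetical and are not the MultiWordNet data format or API.

```python
# Illustrative model of MultiWordNet-style alignment (assumption): each
# Italian synset reuses the ID of the English WordNet synset it is aligned
# with, so translation is a lookup through the shared ID.
# The IDs and lemma lists below are hypothetical examples, not MultiWN data.
ALIGNED_SYNSETS: dict[str, dict[str, list[str]]] = {
    "n#wall_1": {"en": ["wall"], "it": ["muro", "parete"]},
    "n#beam_1": {"en": ["beam"], "it": ["trave"]},
}


def translate(lemma: str, source: str = "en", target: str = "it") -> list[str]:
    """Return the target-language lemmas of every synset containing `lemma`."""
    result: list[str] = []
    for synset in ALIGNED_SYNSETS.values():
        if lemma in synset.get(source, []):
            result.extend(synset.get(target, []))
    return result


print(translate("wall"))               # ['muro', 'parete']
print(translate("trave", "it", "en"))  # ['beam']
```

Because the two languages share one synset inventory, any relation encoded on the English side is immediately available to the Italian lemmas, which is what makes the model attractive for a bilingual thesaurus.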
Motivation
• Still Image Server, an architecture image archive available at the Polytechnic of Turin
– need for a thesaurus:
• Image cataloguing (minimize subjectivity)
• Image retrieval (minimize ambiguity)
• No exhaustive thesauri for the architecture domain are available
Why (Multi)WordNet model?
• A rich and rigorous structure
– synonyms
– many relations, explicitly and homogeneously encoded
• Allows for a more powerful and expressive retrieval mechanism
– no ambiguities
– extended search with related concepts
• Is more suitable for educational purposes
Why integrated with MultiWN?
• General and multilingual framework for the specialized knowledge
• Integrated access allowing for a more flexible retrieval of the information
• Information already existing in the generic (Multi)WordNet can be exploited in the creation of the specialized one
Adopting the MultiWN model
• Sources:
– Specialized sources
• Art and Architecture Thesaurus (AAT)
• Construction Indexing Manual of CI/SfB
• International and national standards (ISO, CEN, UNI)
• Architecture and building dictionaries
• Domain literature
– MultiWN itself
• Issues:
– Reorganize specialized sources to make them compatible with the MultiWN model
– Modify MultiWN synsets to make them suitable for representing the specialized domain
Reorganizing domain-specific sources
[Figure: the METAL hierarchy in AAT and in ArchiWN. AAT hierarchy: METAL, with the guide terms <by composition or origin>, <by quality> and <by form> grouping noble_metal, aluminum, steel, alloy, powder and leaf. ArchiWN hierarchy: ISA links only; noble_metal, steel, aluminum, alloy, metal_powder and metal_leaf ISA METAL, with leaf and powder also shown as nodes.]
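The figure suggests that AAT guide terms, which group terms without being concepts, have no place in an ISA-only hierarchy. A minimal sketch of that conversion follows, assuming a nested-dict tree encoding and an illustrative assignment of terms to guide terms; `to_isa_pairs` and the `rename` map are hypothetical helpers, not part of any AAT or MultiWN tool.

```python
from typing import Optional

# Sketch (assumption): an AAT-style tree in which guide terms such as
# "<by form>" group terms but are not concepts. Converting it to an ISA-only
# hierarchy re-attaches their children to the nearest real concept; an
# optional rename map yields compound labels such as "metal_powder".


def is_guide_term(label: str) -> bool:
    """Guide terms are written in angle brackets, e.g. "<by quality>"."""
    return label.startswith("<") and label.endswith(">")


def to_isa_pairs(tree: dict, parent: Optional[str] = None,
                 rename: Optional[dict[str, str]] = None) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs, dropping guide-term nodes."""
    rename = rename or {}
    pairs: list[tuple[str, str]] = []
    for label, children in tree.items():
        if is_guide_term(label):
            # A guide term adds no node: its children attach to `parent`.
            pairs.extend(to_isa_pairs(children, parent, rename))
            continue
        name = rename.get(label, label)
        if parent is not None:
            pairs.append((name, parent))
        pairs.extend(to_isa_pairs(children, name, rename))
    return pairs


# Hypothetical placement of terms under guide terms, for illustration only.
aat_metal = {
    "metal": {
        "<by composition or origin>": {"aluminum": {}, "steel": {}, "alloy": {}},
        "<by quality>": {"noble_metal": {}},
        "<by form>": {"powder": {}, "leaf": {}},
    }
}
pairs = to_isa_pairs(aat_metal, rename={"powder": "metal_powder", "leaf": "metal_leaf"})
print(pairs)
# [('aluminum', 'metal'), ('steel', 'metal'), ('alloy', 'metal'),
#  ('noble_metal', 'metal'), ('metal_powder', 'metal'), ('metal_leaf', 'metal')]
```

The rename step stands in for the manual decision, visible in the figure, to turn a form-based AAT term like "powder" into a proper hyponym of METAL.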
Tailoring MultiWN synsets
• MultiWN synsets considered appropriate by the domain experts are included in ArchiWN
• Several options are available:
– add synonyms to, or delete synonyms from, MultiWN synsets
– modify the MultiWN definitions of the synsets
– add or delete relations between synsets
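The tailoring options can be sketched as an operation that copies a MultiWN synset into ArchiWN and edits only the copy. The `Synset` dataclass, the `tailor` function and the relation target `"partition"` are hypothetical; the wall glosses are the ones quoted in the "Modifying MultiWN definition" material of this deck.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Synset:
    synonyms: list[str]
    definition: str
    # (relation name, target synset id); ids are hypothetical labels here
    relations: set[tuple[str, str]] = field(default_factory=set)


def tailor(mwn: Synset, add_syn=(), del_syn=(), definition=None,
           add_rel=(), del_rel=()) -> Synset:
    """Return an ArchiWN copy of a MultiWN synset with the requested edits.

    The MultiWN synset itself is not modified.
    """
    awn = deepcopy(mwn)
    awn.synonyms = [s for s in awn.synonyms if s not in del_syn]
    awn.synonyms += [s for s in add_syn if s not in awn.synonyms]
    if definition is not None:
        awn.definition = definition
    awn.relations -= set(del_rel)
    awn.relations |= set(add_rel)
    return awn


GLOSS = ("an architectural partition with a height and length greater than "
         "its thickness; used to divide or enclose an area")
mwn_wall = Synset(["wall"], GLOSS + " or to support another structure",
                  {("ISA", "partition")})
awn_wall = tailor(mwn_wall, definition=GLOSS)
print(awn_wall.definition.endswith("enclose an area"))    # True
print(mwn_wall.definition.endswith("another structure"))  # True
```

Working on a copy keeps the generic MultiWN entry intact, which is one plausible way to let domain experts diverge from it without affecting general-purpose users.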
New relations for ArchiWN
• HAS FORM (n/n) – {tympanum} HAS-FORM {triangle, trigon, …}
• HAS ROLE (n/n) – {metal section} HAS-ROLE {upright, vertical}
• HAS FUNCTION (n/v) – {beam} HAS-FUNCTION {to hold, to support,…}
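The part-of-speech constraints given for the three new relations (n/n, n/n, n/v) can be enforced when relation instances are created. This is a minimal sketch; encoding a synset as a `(pos, lemma)` pair and the `make_relation` helper are assumptions made for the example.

```python
# Sketch: the three ArchiWN relations with the part-of-speech constraints
# listed on the slide. Synsets are encoded as (pos, lemma) pairs, which is an
# assumption made for this example only.
RELATION_POS = {
    "HAS-FORM": ("n", "n"),
    "HAS-ROLE": ("n", "n"),
    "HAS-FUNCTION": ("n", "v"),
}


def make_relation(source, relation, target):
    """Build a (source, relation, target) triple after checking POS constraints."""
    expected = RELATION_POS[relation]  # KeyError for an unknown relation
    actual = (source[0], target[0])
    if actual != expected:
        raise ValueError(f"{relation} requires {expected[0]}/{expected[1]}, "
                         f"got {actual[0]}/{actual[1]}")
    return (source, relation, target)


edges = [
    make_relation(("n", "tympanum"), "HAS-FORM", ("n", "triangle")),
    make_relation(("n", "metal section"), "HAS-ROLE", ("n", "upright")),
    make_relation(("n", "beam"), "HAS-FUNCTION", ("v", "support")),
]
```

Rejecting, for example, a noun target for HAS-FUNCTION keeps the new relations as homogeneously encoded as the standard WordNet ones.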
Integrating ArchiWN with MultiWN
• 5,000 terms grouped into 13 semantic areas, which form the main ArchiWN hierarchies:
– Architectural styles
– Materials
– Construction products
– Techniques
– Tools
– Components of buildings
– Single buildings and building complexes
– Physical properties
– Conditions
– Disciplines
– People
– Documents
– Drawings and representations
Integration issues
• Identify the MultiWN nodes at which to insert the ArchiWN hierarchies
• Include the ArchiWN hierarchies in MultiWN
• Handle the overlaps between terms present in both MultiWN and ArchiWN
• Handle possible inconsistencies in the hierarchies
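A first pass at the overlap issue can be sketched as a string comparison between ArchiWN terms and MultiWN lemmas. This only finds candidate overlaps: whether a matching lemma denotes the same sense still needs an expert decision, since WordNet lemmas are often polysemous. The term lists and helper names below are hypothetical.

```python
def normalize(term: str) -> str:
    """Compare terms case-insensitively, treating spaces and underscores alike."""
    return term.strip().lower().replace(" ", "_")


def split_overlaps(archiwn_terms, multiwn_lemmas):
    """Split ArchiWN terms into (already in MultiWN, not yet in MultiWN)."""
    known = {normalize(lemma) for lemma in multiwn_lemmas}
    overlap, new = [], []
    for term in archiwn_terms:
        (overlap if normalize(term) in known else new).append(term)
    return overlap, new


# Toy term lists, for illustration only.
overlap, new = split_overlaps(
    ["bearing wall", "Tympanum", "curtain wall"],
    ["bearing_wall", "tympanum", "wall"],
)
print(overlap)  # ['bearing wall', 'Tympanum']
print(new)      # ['curtain wall']
```

Overlapping terms are candidates for reusing (and possibly tailoring) an existing MultiWN synset; the remaining terms are candidates for new synsets.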
The integration methodology
• Basic operations
– performed on single MultiWN synsets
• Complex procedures (plug-ins)
– applied to entire hierarchies
Basic operations
• eclipse a synset
• tag a synset with the “architecture and construction” domain label
• add or delete relations to a synset
• add or delete synonyms in a synset
• modify the synset definition
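The five basic operations can be sketched as methods on a small in-memory store. It is assumed here that eclipsing hides a synset from navigation rather than deleting it; the class `IntegratedWordNet`, its methods and the synset ids are hypothetical and are not the MultiWN interface.

```python
from dataclasses import dataclass, field

DOMAIN = "architecture and construction"


@dataclass
class Synset:
    synonyms: list[str]
    definition: str
    domains: set[str] = field(default_factory=set)
    eclipsed: bool = False  # assumed: hidden from view, not deleted


class IntegratedWordNet:
    """Hypothetical store of MultiWN synsets plus typed relations."""

    def __init__(self) -> None:
        self.synsets: dict[str, Synset] = {}
        self.relations: set[tuple[str, str, str]] = set()  # (source, relation, target)

    def eclipse(self, sid: str) -> None:
        self.synsets[sid].eclipsed = True

    def tag(self, sid: str, domain: str = DOMAIN) -> None:
        self.synsets[sid].domains.add(domain)

    def add_relation(self, src: str, rel: str, tgt: str) -> None:
        self.relations.add((src, rel, tgt))

    def delete_relation(self, src: str, rel: str, tgt: str) -> None:
        self.relations.discard((src, rel, tgt))

    def add_synonym(self, sid: str, lemma: str) -> None:
        if lemma not in self.synsets[sid].synonyms:
            self.synsets[sid].synonyms.append(lemma)

    def delete_synonym(self, sid: str, lemma: str) -> None:
        if lemma in self.synsets[sid].synonyms:
            self.synsets[sid].synonyms.remove(lemma)

    def set_definition(self, sid: str, text: str) -> None:
        self.synsets[sid].definition = text

    def hyponyms(self, sid: str) -> list[str]:
        """Visible ISA children of `sid`; eclipsed synsets are skipped."""
        return sorted(src for src, rel, tgt in self.relations
                      if rel == "ISA" and tgt == sid
                      and not self.synsets[src].eclipsed)


db = IntegratedWordNet()
for sid in ("structure/1", "building/1", "bridge/1"):  # hypothetical ids
    db.synsets[sid] = Synset([sid.split("/")[0]], "")
db.add_relation("building/1", "ISA", "structure/1")
db.add_relation("bridge/1", "ISA", "structure/1")
db.tag("building/1")
db.eclipse("bridge/1")  # hypothetical: hide a synset the experts rejected
print(db.hyponyms("structure/1"))  # ['building/1']
```

Keeping eclipsed synsets in the store, rather than deleting them, would let the generic MultiWN view and the domain view coexist over the same data.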
Complex procedures
• Substitutive plug-in
• Integrative plug-in
• Hyponymic plug-in
• Inverse plug-in
[Figure: before/after diagrams of the four plug-in procedures, each marking which nodes come from MultiWN (MWN) and which from ArchiWN (AWN)]
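The diagrams are only summarized above, so the following is a simplified reading of the four plug-ins, not the authors' implementation: substitutive replaces the MWN nodes below the plug-in synset with AWN ones (the MWN ones are eclipsed), integrative keeps both and merges shared nodes, hyponymic hangs the AWN root under an MWN synset, and inverse moves MWN synsets under an AWN synset. The dict-of-lists encoding and all function names are assumptions; the node names in the second example come from the "Reorganizations" material of this deck.

```python
# Simplified reading of the plug-in diagrams (assumption, not the authors'
# code): a hierarchy maps each node to the list of its hyponyms.


def copy_h(h: dict[str, list[str]]) -> dict[str, list[str]]:
    return {node: list(kids) for node, kids in h.items()}


def descendants(h: dict[str, list[str]], root: str) -> set[str]:
    """Every node below `root` (root itself excluded)."""
    found: set[str] = set()
    stack = list(h.get(root, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(h.get(node, []))
    return found


def graft(target, h, skip=None) -> None:
    """Merge the edges of `h` into `target` in place, ignoring node `skip`."""
    for node, kids in h.items():
        if node == skip:
            continue
        bucket = target.setdefault(node, [])
        for kid in kids:
            if kid not in bucket:  # shared nodes are merged, not duplicated
                bucket.append(kid)


def substitutive(mwn, plug, awn, awn_root):
    """Eclipse the MWN nodes below `plug`; AWN hyponyms of `awn_root` replace them."""
    eclipsed = descendants(mwn, plug)
    result = {n: k for n, k in copy_h(mwn).items() if n not in eclipsed}
    result[plug] = []
    graft(result, awn, skip=awn_root)
    graft(result, {plug: awn.get(awn_root, [])})
    return result, eclipsed


def integrative(mwn, plug, awn, awn_root):
    """Keep the MWN nodes below `plug` and add the AWN ones beside them."""
    result = copy_h(mwn)
    graft(result, awn, skip=awn_root)
    graft(result, {plug: awn.get(awn_root, [])})
    return result


def hyponymic(mwn, plug, awn, awn_root):
    """Attach the AWN root itself as a new hyponym of the MWN node `plug`."""
    result = copy_h(mwn)
    graft(result, awn)
    graft(result, {plug: [awn_root]})
    return result


def inverse(mwn, awn, awn_node, moved):
    """Detach the MWN nodes in `moved` from their hypernyms and re-attach
    them as hyponyms of the AWN node `awn_node`."""
    result = copy_h(mwn)
    for kids in result.values():
        kids[:] = [k for k in kids if k not in moved]
    graft(result, awn)
    graft(result, {awn_node: list(moved)})
    return result


mwn = {"root": ["plug"], "plug": ["m1", "m2"], "m1": ["m3"]}
awn = {"A": ["a1", "a2"], "a1": ["a3"]}
sub, gone = substitutive(mwn, "plug", awn, "A")
print(sub["plug"], sorted(gone))                   # ['a1', 'a2'] ['m1', 'm2', 'm3']
print(integrative(mwn, "plug", awn, "A")["plug"])  # ['m1', 'm2', 'a1', 'a2']

# "Single buildings" pattern (toy MWN fragment): hyponymic, then inverse.
h = hyponymic({"structure/1": ["building/1", "building complex/1"]},
              "structure/1", {"structure (AWN)": []}, "structure (AWN)")
h = inverse(h, {}, "structure (AWN)", ["building/1", "building complex/1"])
print(h["structure/1"], h["structure (AWN)"])
# ['structure (AWN)'] ['building/1', 'building complex/1']
```

All four functions return new dictionaries, so the original MultiWN hierarchy can still be compared with the integrated one.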
Results
• 13 ArchiWN semantic areas plugged into 18 MultiWN synsets (4 + 8 + 4 + 2 plug-ins)
– 11 ArchiWN semantic areas (12 hierarchies) directly plugged into MultiWN
• 4 substitutive plug-ins
• 8 integrative plug-ins
– 2 ArchiWN semantic areas (6 hierarchies) required a reorganization of some MultiWN sub-hierarchies
• 4 hyponymic plug-ins
• 2 inverse plug-ins
• large-scale synset eclipsing
ArchiWN up to now
• “Single buildings and building complexes” sub-hierarchy
– 900 synsets
– Italian and English synonyms
– accurate definitions
• Work done manually using the MultiWN graphical interface, which allows the user
– to modify existing synsets and relations
– to create new synsets
Conclusions
• It is possible to integrate ArchiWN with MultiWN
• MultiWN itself can be widely exploited in the creation of ArchiWN hierarchies
• Advantages of interdisciplinary cooperation
– with respect to specialized thesauri
• formalized structure
• inheritance of linguistically oriented information from the generic WordNet
– with respect to lexical resources
• many synsets will be associated with images
Future work
• Go on enriching the “Single buildings and building complexes” hierarchy and populating the remaining hierarchies
• Industrial applications: multilingual specialized lexicon of approximately 1,000 synsets for the window and curtain wall industry
• Agreement for the future usage of ArchiWN by the Piemonte region in the cataloguing of its architectural cultural heritage
Direct plug-ins
Semantic area | MultiWN plug-in synset(s) | Plug-in type
Architectural styles | architectural style/1 | Sub
Materials | material/1, substance/1 | Sub
Construction products | building material/1 | Sub
Techniques | technique/1 | Int
Tools | tool/1 | Int
Physical properties | physical property/1 | Int
Conditions | condition/1 | Int
Disciplines | discipline/1 | Int
People | person/ | Int
Documents | document/1 | Int
Drawings and representations | drawing/2, representation/2 | Int
(Sub = substitutive plug-in; Int = integrative plug-in)
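Reorganizations
Semantic area | Plug-in synset | Plug-in type
Components of buildings | structure/1 | Hypo
Components of buildings | component/3 | Hypo
Components of buildings | region/1 | Hypo
Single buildings and building complexes | structure (AWN) | Hypo
Single buildings and building complexes | building/1 | Inverse
Single buildings and building complexes | building complex/1 | Inverse
(Hypo = hyponymic plug-in; Inverse = inverse plug-in)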
Term overlapping
ITC-irst provides the Polytechnic with lists of terms:
- synsets tagged with the “architecture” label in WN-Domains
- hyponyms of the WordNet plug-in synsets
WN-Domains: 2,595 synsets in total (155 + 444 + 1,541 + 455)
• Architecture = 155 synsets
– Town planning = 444 synsets
– Building industry = 1,541 synsets
– Furniture = 455 synsets
Hyponyms of plug-in synsets
Semantic area | Plug-in synset | Type | Hyponyms
Architectural styles | architectural style/1 | S | 12
Materials | material/1 | S | 1,266
Materials | substance/1 | S | 6,054
Construction products | building material/1 | S | 95
Techniques | technique/1 | I | 3
Tools | tool/1 | I | 301
Physical properties | physical property/1 | I | 103
Conditions | condition/1 | I | 1,721
Disciplines | discipline/1 | I | 464
People | person/ | I | 6,068
Documents | document/1 | I | 328
Drawings and representations | drawing/2 | I | 26
Drawings and representations | representation/2 | I | 159
(S = substitutive plug-in; I = integrative plug-in)
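Hyponym counts like those above are presumably obtained by collecting every synset reachable from a plug-in synset through hyponym links. A minimal breadth-first sketch over a toy fragment follows; the fragment and the function name are hypothetical. With Princeton WordNet loaded through NLTK, `wn.synset('tool.n.01').closure(lambda s: s.hyponyms())` performs a comparable traversal, although its counts may differ from those of the WordNet version underlying MultiWN.

```python
from collections import deque


def hyponym_closure(hyponyms: dict[str, list[str]], root: str) -> set[str]:
    """All synsets reachable from `root` via hyponym links (root excluded).

    The `seen` set ensures a synset with several hypernyms is counted once.
    """
    seen: set[str] = set()
    queue = deque(hyponyms.get(root, []))
    while queue:
        synset = queue.popleft()
        if synset in seen:
            continue
        seen.add(synset)
        queue.extend(hyponyms.get(synset, []))
    return seen


toy = {  # hypothetical fragment, not the real WordNet tool/1 sub-hierarchy
    "tool/1": ["hand tool", "power tool"],
    "hand tool": ["hammer", "trowel"],
    "power tool": ["drill", "power drill"],
    "drill": ["power drill"],
}
print(len(hyponym_closure(toy, "tool/1")))  # 6
```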
[Figure: reorganization of “Components of buildings” and “Single buildings and building complexes”. Nodes shown include entity/1, object/1, artifact/1, structure/1, part/4, location/1, component/3, region/1, building/1, building complex/1, structure (AWN), architectural component, architectural space, building element, room/area/building space and open space; links are marked as hyponymic plug-ins (4), inverse plug-ins (2) and eclipsing.]
Modifying MultiWN definition
WordNet: {wall – “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area or to support another structure”}
ArchiWN: {wall} – “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area”
{structural_wall, bearing_wall} – “any wall supporting a floor or the roof of a building”
[Figure: ISA links connecting {structural_wall, bearing_wall}, {wall}, {support} and {partition, divider}]