Ewa Rudnicka, Wojciech Witkowski, Maciej Piasecki G4.19 Research Group
Institute of Informatics,
Wrocław University of Technology
nlp.pwr.wroc.pl
plwordnet.pwr.wroc.pl
Large Polish-English Lexico-Semantic Resource Based on plWordNet - Princeton WordNet Mapping
Outline
• What is a wordnet?
• Mapping plWordNet on Princeton WordNet
• Extending Princeton WordNet
• Applications
• Conclusions
What is a wordnet? (1)
A huge electronic lexico-semantic database (a kind of thesaurus)
Basic building blocks:
- lemma – the base form that stands for all inflectional forms and all meanings of a word
  e.g. czwórka – 'good'
- lexical unit – a lemma-sense pair (in wordnets the sense is marked with a number)
  e.g. czwórka 3 (por – 'communication')
- synset – a set of synonymous lexical units
  e.g. {czwórka 3 (por), czwóra 1 (por)}
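The three building blocks can be sketched as a small data model. This is only an illustration: the class and field names (`LexicalUnit`, `Synset`, `domain`) are assumptions made for the example, not the schema used by plWordNet or Princeton WordNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """A lemma paired with a sense number and a domain label (hypothetical model)."""
    lemma: str
    sense: int
    domain: str

    def __str__(self) -> str:
        return f"{self.lemma} {self.sense} ({self.domain})"


@dataclass(frozen=True)
class Synset:
    """A set of synonymous lexical units."""
    units: frozenset

    def lemmas(self) -> set:
        # A synset may contain several lemmas; collect them without senses.
        return {unit.lemma for unit in self.units}


# The example from the slide: two lexical units forming one synset.
czworka_3 = LexicalUnit("czwórka", 3, "por")
czwora_1 = LexicalUnit("czwóra", 1, "por")
grade_four = Synset(frozenset({czworka_3, czwora_1}))

print(str(czworka_3))
print(grade_four.lemmas())
```

Making the units immutable (`frozen=True`) lets them be stored in sets, which matches the definition of a synset as a set of lexical units.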
What is a wordnet? (2)
Both lexical units and synsets are linked via different lexico-semantic relations, such as:
synonymy, near-synonymy, hypernymy/hyponymy, meronymy/holonymy, fuzzynymy
Examples:
Lexical relations:
- czwórka 3 (por) has a derivativity relation to czwórka 4 (por)
- czwórka 3 (por) has an expressiveness relation to czwóra 1 (por)
Synset relations:
- {czwórka 3 (por), czwóra 1 (por)} is a hyponym of {stopień 3 (il), ocena 1 (il), nota 3 (il)}
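The two layers of relations can be sketched as labelled edges in two separate graphs, one over lexical units and one over synsets. The `RelationGraph` class and the relation labels used as strings are assumptions for this illustration; they do not reproduce the storage format of either wordnet.

```python
from collections import defaultdict


class RelationGraph:
    """Labelled, directed edges: (source, relation) -> set of targets."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def targets(self, source, relation):
        # Return a copy so callers cannot modify the stored edges.
        return set(self._edges.get((source, relation), set()))


# Lexical relations hold between individual lexical units.
lexical = RelationGraph()
lexical.add("czwórka 3 (por)", "derivativity", "czwórka 4 (por)")
lexical.add("czwórka 3 (por)", "expressiveness", "czwóra 1 (por)")

# Synset relations hold between whole synsets (frozensets are hashable).
synsets = RelationGraph()
grade_four = frozenset({"czwórka 3 (por)", "czwóra 1 (por)"})
grade = frozenset({"stopień 3 (il)", "ocena 1 (il)", "nota 3 (il)"})
synsets.add(grade_four, "hyponymy", grade)  # grade_four is a hyponym of grade

print(lexical.targets("czwórka 3 (por)", "expressiveness"))
print(synsets.targets(grade_four, "hyponymy") == {grade})
```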
Princeton WordNet
Princeton WordNet (Fellbaum 1998):
• the first wordnet ever, built on psycholinguistic principles – mapping the structure of human lexical memory (cf. Miller 1998)
• taxonomic hierarchies for nouns, entailment relations for verbs, antonym relations for adjectives
• synsets represent 'lexicalised concepts' (cf. Miller 1998)
• synsets built of lexical units linked by the synonymy relation, understood as a conceptual relation established on the basis of linguists' intuitions and dictionary definitions
• no major changes since 2006, last version 2012
plWordNet - Słowosieć
• plWordNet (plWN)
  • developed fairly independently of Princeton WordNet (PWN) by applying a unique corpus-based method
  • one of the biggest existing wordnets

| Number of | plWN | PWN | enWN |
|---|---|---|---|
| lemmas | 156,402 | 155,593 | 157,541 |
| lexical units | 220,129 | 206,978 | 209,147 |
| synsets | 162,629 | 117,659 | 119,290 |
plWordNet vs. Princeton WordNet
• the emphasis on relations between lexical units; new relations specially designed to cover the peculiarities of the morphosyntactic structure of Polish
  • cf. Piasecki et al. 2009, Maziarz et al. 2012
• synsets built of lexical units sharing the same set of constitutive relations, such as hyponymy, hypernymy, meronymy, holonymy
• partly linked to Princeton WordNet
  • cf. Rudnicka et al. 2012
Mapping plWordNet on Princeton WordNet
• Goal: linking plWordNet synsets with Princeton WordNet synsets
• Steps:
  • Defining a set of inter-lingual relations and setting their hierarchy
  • Designing mapping procedures for nouns and adjectives
• Mapping direction: plWordNet > Princeton WordNet
• Bottom-up approach – starting from the lowest levels in the hierarchy
• Currently mapped lexical categories:
  • nouns (most of them), adjectives (about a half)
Automatic prompts
Two systems, based on:
1) relaxation labeling algorithm (nouns)
2) rules relying on the network of the existing intra- and inter-lingual relations (adjectives)
Resource: cascade dictionary
Generated prompts:
- visible in the form of special links in the WordNetLoom editing system
- verified by lexicographers
A set of inter-lingual relations and current statistics
• A set of inter-lingual relations between plWN and PWN
  • inspired by:
    • inter-lingual relations from EuroWordNet (Vossen 2002)
    • intra-lingual relations from plWordNet (Maziarz et al. 2011)
• Statistics of the established inter-lingual links:

| Relation | Nouns | Adjectives |
|---|---|---|
| 1. Synonymy | 28 736 | 3 199 |
| 2. Partial synonymy | 2 580 | 1 003 |
| 3. Inter-register synonymy | 1 510 | 35 |
| 4. Hyponymy | 57 029 | 6 561 |
| 5. Hypernymy | 3 744 | 34 |
| 6. Meronymy | 6 034 | |
| 7. Holonymy | 1 204 | |
| 8. Cross-categorial synonymy | | 3 891 |
Motivation for the extension of Princeton WordNet
• The high percentage of inter-lingual hyponymy links between plWordNet and Princeton WordNet synsets
  • established because of a number of lexical coverage gaps in Princeton WordNet
  • these gaps make it impossible to establish the much more informative and useful inter-lingual synonymy links
  • the hyponymy links can be used as 'pointers' to specific Princeton WordNet gaps ('missing' lexical units) and to whole 'empty nests' (several missing co-hyponyms of one hypernym synset) in the network
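How high the percentage is can be checked directly against the noun statistics reported on the previous slide. The short calculation below is a sketch under one assumption: the single-valued meronymy and holonymy counts are read as belonging to the noun column.

```python
# Noun inter-lingual link counts as reported in the statistics slide.
# Assumption: meronymy and holonymy counts belong to the noun column.
noun_links = {
    "synonymy": 28736,
    "partial synonymy": 2580,
    "inter-register synonymy": 1510,
    "hyponymy": 57029,
    "hypernymy": 3744,
    "meronymy": 6034,
    "holonymy": 1204,
}

total = sum(noun_links.values())
hyponymy_share = noun_links["hyponymy"] / total
synonymy_share = noun_links["synonymy"] / total

print(f"{total} noun links in total")
print(f"hyponymy: {hyponymy_share:.1%}, synonymy: {synonymy_share:.1%}")
```

Under that reading, hyponymy accounts for more than half of all noun links, roughly twice the share of synonymy.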
Inter-lingual hyponymy links
General extension procedure
• The starting point: existing inter-lingual hyponymy links
• Lemmas of plWordNet synsets are translated by a cascade dictionary
  • which combines several traditional dictionaries, with the data ordered in a hierarchy of importance; the topmost gain priority
• The results are filtered by the lemmas of Princeton WordNet, to obtain:
  • a list of plWN lemmas with the 'equivalent' cascade dictionary lemmas absent from PWN
  • a list of plWN lemmas without 'equivalent' cascade dictionary lemmas
  • a list of plWN lemmas with the 'equivalent' cascade dictionary lemmas present in PWN
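The translation and filtering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the cascade is assumed to merge equivalents in priority order (the slides do not say how the dictionaries are combined), and the toy dictionaries and the toy PWN lemma set are invented for the example.

```python
def cascade_translate(lemma, dictionaries):
    """Collect equivalents from all dictionaries, highest priority first.

    Assumption: dictionaries are ordered from most to least important and
    their equivalents are merged without duplicates.
    """
    seen, ordered = set(), []
    for dictionary in dictionaries:
        for equivalent in dictionary.get(lemma, []):
            if equivalent not in seen:
                seen.add(equivalent)
                ordered.append(equivalent)
    return ordered


def partition_lemmas(plwn_lemmas, dictionaries, pwn_lemmas):
    """Split plWN lemmas into the three lists described on the slide."""
    absent_from_pwn, untranslated, present_in_pwn = {}, [], {}
    for lemma in plwn_lemmas:
        equivalents = cascade_translate(lemma, dictionaries)
        if not equivalents:
            untranslated.append(lemma)
            continue
        found = [e for e in equivalents if e in pwn_lemmas]
        if found:
            present_in_pwn[lemma] = found
        else:
            absent_from_pwn[lemma] = equivalents
    return absent_from_pwn, untranslated, present_in_pwn


# Toy data, invented for the example (not real dictionary or PWN content).
toy_dictionaries = [
    {"kot": ["cat"]},
    {"kot": ["tomcat", "cat"], "bigos": ["bigos"]},
]
toy_pwn_lemmas = {"cat", "tomcat"}

absent, untranslated, present = partition_lemmas(
    ["kot", "bigos", "czwóra"], toy_dictionaries, toy_pwn_lemmas
)
print(absent, untranslated, present)
```

The first list is the interesting one for the extension: its lemmas have an English equivalent that PWN does not yet contain.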
Extension procedure
• Start at the lowest level of the hierarchy
  • in order not to change the structure of the original Princeton WordNet
• Verification of the suggested English equivalent(s)
  • in corpora and other reliable sources
  • on the basis of
    • the researcher's knowledge
    • dictionaries
    • frequency lists from corpora
• Creation of the new Princeton WordNet synset
• The synset is linked
  • via an intra-lingual hyponymy relation to a proper PWN hypernym synset
  • via an inter-lingual synonymy relation to its direct counterpart in plWordNet
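The last two steps, creating the synset and attaching its two links, can be sketched like this. The `Synset` class, the `add_pwn_synset` helper, the relation labels and all identifiers are hypothetical; verification by a lexicographer is assumed to have happened before the function is called.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Hypothetical synset record with outgoing relations stored by label."""
    ident: str
    lemmas: list
    relations: dict = field(default_factory=dict)

    def link(self, relation, target_id):
        self.relations.setdefault(relation, []).append(target_id)


def add_pwn_synset(new_id, verified_lemmas, pwn_hypernym, plwn_counterpart):
    """Create a new English synset and attach the two links from the slide."""
    new_synset = Synset(new_id, list(verified_lemmas))
    # Intra-lingual link: the new synset becomes a hyponym of an existing
    # PWN synset, so the original PWN structure above it is left unchanged.
    new_synset.link("hyponymy", pwn_hypernym.ident)
    # Inter-lingual link: synonymy with the direct plWordNet counterpart.
    new_synset.link("inter-lingual synonymy", plwn_counterpart.ident)
    plwn_counterpart.link("inter-lingual synonymy", new_synset.ident)
    return new_synset


# Placeholder identifiers and lemmas, invented for the example.
hypernym = Synset("pwn:hypernym", ["existing English lemma"])
counterpart = Synset("plwn:counterpart", ["Polish lemma"])
added = add_pwn_synset("enwn:new", ["verified English lemma"], hypernym, counterpart)
print(added.relations)
```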
Extension results
• Each added synset is provided with:
  • a definition
    • major source: English Wikipedia
  • a usage example
    • from a corpus or other reliable English source
• Total number of selected plWN synsets: 42785
• Domains selected for the first stage:
  • shape (156)
  • substance (1181)
  • quantity (547)
  • food (885)
  • property (1492)
Extension via plWN. Pros and cons
• Pros:
  • There is a definite vocabulary basis for the extension
  • New synsets can be easily and safely located in the structure of the original PWN
• Cons:
  • Polish orientation of the extension
  • Addition of lexical units related to strictly Polish domains
Extension via corpus data. An alternative strategy
• This extension procedure uses frequency lists derived from:
  • British National Corpus
  • WaCky corpus
  • Corpus of Contemporary American English
  • American National Corpus
  • English Wikipedia
• Independent of plWordNet
• Criterion for inclusion of a new lexical unit
  • its appearance in five different texts
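The inclusion criterion can be sketched as a document-frequency filter. The function name and the toy texts are invented for the example, and the threshold is read as "at least five" texts, which is an interpretation of the slide rather than something it states.

```python
from collections import Counter


def candidate_lemmas(texts, known_lemmas, min_texts=5):
    """Return lemmas missing from the wordnet that occur in enough texts.

    `texts` is an iterable of token lists, one per text. Each lemma is
    counted once per text, so repetition within a text does not help.
    """
    text_counts = Counter()
    for tokens in texts:
        text_counts.update(set(tokens))
    return sorted(
        lemma
        for lemma, count in text_counts.items()
        if count >= min_texts and lemma not in known_lemmas
    )


# Toy input, invented for the example.
toy_texts = [["selfie", "cat", "selfie"]] * 5 + [["blog", "cat"]] * 4
print(candidate_lemmas(toy_texts, known_lemmas={"cat"}))
```

Here "cat" is skipped because it is already known, and "blog" because it occurs in only four texts.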
Pros and cons
• Pros:
  • English oriented
  • no Polish bias
• Cons:
  • new synsets have to be introduced at different levels of the PWN hierarchy
  • there is a risk of changing the structure of the original PWN
Cross-lingual Applications
Cross-lingual:
• semantic searching
• semantic indexing of texts
• text classification
• statistical semantic analysis of corpora in different languages
• information extraction
• machine translation
Multi-lingual:
• Princeton WordNet 3.1 is linked to more than 60 languages
Conclusions
• The created bilingual resource will become a gateway to CLARIN bilingual resources
• It has a number of practical applications
• Princeton WordNet can be enriched and updated
• Extension of Princeton WordNet allows one to replace the existing inter-lingual hyponymy links between plWN and PWN synsets with more precise and useful inter-lingual synonymy links
References
Fellbaum, Ch. (ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press: Cambridge, Massachusetts.
Maziarz, M., Piasecki, M. and Szpakowicz, S. 2012. Approaching plWordNet 2.0. In Proceedings of the 6th Global Wordnet Conference, Matsue.
Piasecki, M., Maziarz, M., Szpakowicz, S. and Rudnicka, E. 2014. plWordNet as the Cornerstone of a Toolkit of Lexico-semantic Resources. In Proceedings of the 7th International Global Wordnet Conference.
Princeton WordNet http://wordnet.princeton.edu/wordnet/
Rudnicka, E., Maziarz, M., Piasecki, M. and Szpakowicz, S. 2012. A Strategy of Mapping Polish WordNet onto Princeton WordNet. In Proceedings of COLING 2012. ACL.
Słowosieć http://plwordnet.pwr.wroc.pl/wordnet/
Vossen, P. (ed.). 2002. EuroWordNet. General Document. Amsterdam.