Interconnecting lexicographic resources. In search for a model
Dan Cristea
“Alexandru Ioan Cuza” University of Iași
Institute of Computer Science of the Romanian Academy
Topics
• Why would one want to connect linguistic resources?
• Parameterising the needs
• Standardisation helps interconnecting
• A bunch of well-known resources
• How would this work?
• Final remarks
COST-ENeL, Bled, 29-30 September 2014
Why would one want to connect linguistic resources?
• Use case 1: 100 Romanian dictionaries aligned
  – CLRE, the Essential Romanian Lexicographical Corpus: 100 dictionaries aligned at entry level and, partially, at sense level (2010–2013, at the A. Philippide Institute of the Romanian Academy, in Iași)
    • list of dictionaries at: http://85.122.23.90/resurse/Lista-dictionarelor.doc
    • written in 3 types of alphabet: Cyrillic, transition and Latin
    • large diversity of formatting styles
Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014
CLRE: Essential Romanian Lexicographical Corpus
ISER
The CLRE project
Petri
The CLRE project
DN II
The CLRE project
Bălăşescu
The CLRE project
Lexicon Militar
The CLRE project
Dicționar de informatică
The CLRE project
Processing in CLRE
• Scanning
• OCR – ABBYY FineReader 9
• Parsing entries => XML
• Manual verification
• Indexing and alignment
Iași, 25-26 September 2013 COST-ENeL, Bled, 29-30 September 2014
CLRE – manual verification
Why would one want to connect linguistic resources?
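• Use case 2: align WN with an explanatory dictionary
  – a WN synset: pos (def, ex, $w_1^{s_1} \dots w_k^{s_k} \dots w_n^{s_n}$)
  – an explanatory dictionary entry: $w_k$, pos, $\langle w_k^{s_1}, \mathrm{def}_1, \mathrm{ex}_1 \rangle \dots \langle w_k^{s_k}, \mathrm{def}_k, \mathrm{ex}_k \rangle \dots \langle w_k^{s_m}, \mathrm{def}_m, \mathrm{ex}_m \rangle$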
Why would one want to connect linguistic resources?
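• Use case 2: align WN with an explanatory dictionary
  – synsets of a word $w_k$, pos:
    ($\mathrm{def}_1$, $\mathrm{ex}_1$, … $w_k^{s_1}$ …)
    …
    ($\mathrm{def}_k$, $\mathrm{ex}_k$, … $w_k^{s_k}$ …)
    …
    ($\mathrm{def}_m$, $\mathrm{ex}_m$, … $w_k^{s_m}$ …)
  – the explanatory dictionary entry of the word $w_k$, pos: $w_k$, pos, $\langle w_k^{s_1}, \mathrm{def}_1, \mathrm{ex}_1 \rangle \dots \langle w_k^{s_k}, \mathrm{def}_k, \mathrm{ex}_k \rangle \dots \langle w_k^{s_m}, \mathrm{def}_m, \mathrm{ex}_m \rangle$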
Why would one want to connect linguistic resources?
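• Use case 2: align WN with an explanatory dictionary
  – a WN synset: pos (def, ex, $w_1^{s_1} \dots w_k^{s_k} \dots w_n^{s_n}$)
  – explanatory dictionary entries:
    $w_1$, pos, … $\langle w_1^{s_1}, \mathrm{def}, \mathrm{ex} \rangle$ …
    $w_k$, pos, … $\langle w_k^{s_k}, \mathrm{def}, \mathrm{ex} \rangle$ …
    $w_n$, pos, … $\langle w_n^{s_n}, \mathrm{def}, \mathrm{ex} \rangle$ …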
Why would one want to connect linguistic resources?
• Use case 3: the TOT problem or the forgotten word
Michael Zock
The first thought: standardisation
• Lexical Markup Framework (LMF)
  – What is it?
    • a common model for the creation and use of lexical resources
  – With what goal?
    • to manage the exchange of data between and among these resources
    • to enable the merging of a large number of individual electronic resources to form extensive global electronic resources
Near-standard
• Text Encoding Initiative (TEI)
  – What is it?
    • an inventory of the features most often deployed for computer-based text processing
    • recommendations about suitable ways of representing these features
  – With what goal?
    • to facilitate processing by computer programs
    • to facilitate the loss-free interchange of data amongst individuals and research groups using different programs, computer systems, or application software
Standardisation
• Text Encoding Initiative (TEI)
  – Example of a dictionary entry serialisation (from the TEI Guidelines)

disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED

<entry>
  <form>
    <orth>disproof</orth>
    <pron>dIs"pru:f</pron>
  </form>
  <gramGrp>
    <pos>n</pos>
  </gramGrp>
  <sense n="1">
    <def>facts that disprove something.</def>
  </sense>
  <sense n="2">
    <def>the act of disproving.</def>
  </sense>
</entry>
LMF and TEI content and… discontent
• The TEI format may be used as an interchange format, permitting sharing of resources even when their local encoding schemes differ.
• Both LMF and TEI model lexical material in great representational detail…
LMF and TEI content and… discontent
• TEI intention:
  – guidance for individual or local practice in text creation and data capture
  – support of data interchange
  – support of application-independent local processing
• Opening good possibilities of querying
• But how would the interconnection work?...
Parameterising the needs
• If I want to connect two resources, simply merge the contents
• Then be able to interrogate the merged resource by taking advantage of peculiarities in each resource
Parameterising the needs
• Able to represent variations in word forms, alternate orthography, diachronic morphology
• Easy navigation by applying various filtering criteria
Parameterising the needs
• Very often lexicographic data is hierarchical
  – for instance, a sense of a dictionary entry contains a definition, examples, but also sub-senses
• Organise even recursive searches
  – give me the definition neighbourhood of depth 2 of the word captain (take all senses of the entry captain and form the list of words in the corresponding definitions, then for each of them take all their senses and collect again the words in their definitions)
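The depth-2 search described above can be sketched in a few lines. This is only an illustration under stated assumptions: `TOY_DICT`, `STOPWORDS` and both function names are hypothetical, the definitions are invented toy data, and no lemmatisation is attempted (inflected forms such as "leads" are kept as they appear).

```python
import re

# Hypothetical toy dictionary: lemma -> list of sense definitions (invented data).
TOY_DICT = {
    "captain": ["the officer in command of a ship", "the leader of a team"],
    "officer": ["a person holding a position of command"],
    "ship": ["a large vessel"],
    "leader": ["a person who leads"],
    "team": ["a group of players"],
}

# Hypothetical stop list, just enough for the toy definitions.
STOPWORDS = {"a", "an", "the", "of", "in", "who"}


def definition_words(lemma, dictionary):
    """Collect the content words found in all definitions of `lemma`."""
    words = set()
    for definition in dictionary.get(lemma, []):
        for token in re.findall(r"[a-z]+", definition.lower()):
            if token not in STOPWORDS:
                words.add(token)
    return words


def neighbourhood(lemma, dictionary, depth):
    """Words reachable from `lemma` through definitions in at most `depth` steps."""
    frontier = {lemma}
    seen = set()
    for _ in range(depth):
        next_frontier = set()
        for word in frontier:
            next_frontier |= definition_words(word, dictionary)
        next_frontier -= seen      # do not expand a word twice
        seen |= next_frontier
        frontier = next_frontier
    seen.discard(lemma)
    return seen


depth_one = neighbourhood("captain", TOY_DICT, 1)
depth_two = neighbourhood("captain", TOY_DICT, 2)
```

With the toy data, `depth_one` holds the words of the two definitions of captain, and `depth_two` adds the words of their own definitions.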
The idea
• Representing lexical information as feature structures centred on word lemmas
– disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED
[lemma=disproof, entry=[pron=dIs"pru:f, pos=n, sense=[n=1, def=facts that disprove something], sense=[n=2, def=the act of disproving], res=CED]]
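As a rough sketch of this idea, the TEI entry for disproof can be turned into a nested Python dictionary playing the role of the feature structure. The function name `tei_entry_to_fs` is hypothetical, and representing the repeated `sense` feature as a list is an assumption made here for convenience, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

# The TEI serialisation of the CED entry, as quoted from the TEI Guidelines.
TEI_ENTRY = """
<entry>
  <form><orth>disproof</orth><pron>dIs"pru:f</pron></form>
  <gramGrp><pos>n</pos></gramGrp>
  <sense n="1"><def>facts that disprove something.</def></sense>
  <sense n="2"><def>the act of disproving.</def></sense>
</entry>
"""


def tei_entry_to_fs(xml_text, resource):
    """Build a lemma-centred feature structure (nested dict) from a TEI <entry>."""
    root = ET.fromstring(xml_text)
    senses = [
        # Strip the final full stop so the value matches the slide's notation.
        {"n": int(sense.get("n")), "def": sense.findtext("def").rstrip(".")}
        for sense in root.findall("sense")
    ]
    return {
        "lemma": root.findtext("form/orth"),
        "entry": {
            "pron": root.findtext("form/pron"),
            "pos": root.findtext("gramGrp/pos"),
            "sense": senses,   # assumption: repeated feature kept as a list
            "res": resource,   # the resource label is supplied by the caller
        },
    }


fs = tei_entry_to_fs(TEI_ENTRY, "CED")
```

A list keeps the order of the senses, which plain attribute-value pairs with a repeated `sense` key would not express in a Python dictionary.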
Representing lexical entries as feature structures
lemma = disproof
entry = [ pron = dIs"pru:f
          pos = n
          sense = [ n = 1, def = facts that disprove something ]
          sense = [ n = 2, def = the act of disproving ]
          res = CED ]
(CED: Cambridge English Dictionary)
Representing lexical entries as feature structures
• Graph representation
[Figure: the CED entry drawn as a graph. lemma → disproof; entry → pron → dIs"pru:f, pos → n, sense → (n → 1, def → facts that disprove something), sense → (n → 2, def → the act of disproving), res → CED]
Representing lexical entries as feature structures
lemma = disproof
entry = [ pron = dIs"pru:f
          pos = n
          sense = [ n = 1, def = the action of disproving ]
          sense = [ n = 2, def = evidence that disproves ]
          res = MWCD ]
(MWCD: Merriam-Webster’s Collegiate Dictionary)
Representing lexical entries as feature structures
• Graph representation
[Figure: the MWCD entry drawn as a graph. lemma → disproof; entry → pron → dIs"pru:f, pos → n, sense → (n → 1, def → the action of disproving), sense → (n → 2, def → evidence that disproves), res → MWCD]
How could lexical entries be merged?
• Entries of the same word from different dictionaries
[Figure: the CED graph and the MWCD graph of disproof, shown together]
Merging lexical entries
• Distinct parts
[Figure: the CED graph and the MWCD graph of disproof again; lemma, pron and pos coincide, while the sense and res parts differ]
Merging lexical entries
[Figure: the merged graph. lemma → disproof; entry → pron → dIs"pru:f, pos → n, plus two new arcs labelled X: one holds the CED senses (1: facts that…, 2: the act of…) with res → CED, the other holds the MWCD senses (1: the action of…, 2: evidence that…) with res → MWCD]
Representation of the merged feature structure
lemma = disproof
entry = [ pron = dIs"pru:f
          pos = n
          X = [ sense = [ n = 1, def = facts that disprove something ]
                sense = [ n = 2, def = the act of disproving ]
                res = CED ]
          X = [ sense = [ n = 1, def = the action of disproving ]
                sense = [ n = 2, def = evidence that disproves ]
                res = MWCD ] ]
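A minimal sketch of this merge, assuming feature structures are held as nested Python dictionaries: features on which the two entries agree stay at the `entry` level, and whatever differs moves under a new `X` node per resource. The function name `merge_entries` and the choice of a list for the two `X` nodes are assumptions made for illustration.

```python
def merge_entries(fs_a, fs_b):
    """Merge two feature structures of the same lemma taken from two dictionaries."""
    if fs_a["lemma"] != fs_b["lemma"]:
        raise ValueError("entries describe different lemmas")
    entry_a, entry_b = fs_a["entry"], fs_b["entry"]

    # Features with identical values in both entries are kept once.
    shared = {k: v for k, v in entry_a.items() if k in entry_b and entry_b[k] == v}

    def distinct(entry):
        """The part of an entry that is not shared with the other one."""
        return {k: v for k, v in entry.items() if k not in shared}

    # Each resource keeps its own distinct part under a new X node.
    return {
        "lemma": fs_a["lemma"],
        "entry": {**shared, "X": [distinct(entry_a), distinct(entry_b)]},
    }


ced = {"lemma": "disproof", "entry": {
    "pron": 'dIs"pru:f', "pos": "n", "res": "CED",
    "sense": [{"n": 1, "def": "facts that disprove something"},
              {"n": 2, "def": "the act of disproving"}]}}
mwcd = {"lemma": "disproof", "entry": {
    "pron": 'dIs"pru:f', "pos": "n", "res": "MWCD",
    "sense": [{"n": 1, "def": "the action of disproving"},
              {"n": 2, "def": "evidence that disproves"}]}}

merged = merge_entries(ced, mwcd)
```

Here `pron` and `pos` remain directly under `entry`, while `sense` and `res` end up inside the two `X` nodes, mirroring the merged structure above.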
The WordNet search for disproof
Feature structures representation for the WN synsets of disproof
lemma = disproof
synsets = [ pos = n
            synset = [ lex = (*, falsification, refutation)
                       gloss = any evidence that helps to establish the falsity of something ]
            synset = [ lex = (falsification, falsifying, *, refutation, refutal)
                       gloss = the act of determining that something is false ] ]
The WordNet search for discount
Representing WordNet synsets
lemma = discount
synsets = [ pos = n
            synset = [ lex = (*, price reduction, deduction)
                       gloss = the act of reducing the selling price of merchandise ]
            synset = [ lex = (discount rate, *, bank discount)
                       gloss = interest on an annual basis deducted in advance on a loan ]
            … ]
synsets = [ pos = v
            synset = [ lex = (dismiss, disregard, brush aside, brush off, *, push aside, ignore)
                       gloss = bar from attention or consideration
                       ex = "She dismissed his advances" ]
            … ]
How could dictionary entries be merged with WN synsets?
[Figure: the WN synsets graph of disproof (pos → n; two synset nodes, each with lex and gloss) next to the CED entry graph of disproof]
Merging a dictionary entry with a WN entry
[Figure: the CED entry graph and the WN synsets graph of disproof, both rooted in lemma → disproof]
Merging a dictionary entry with a WN entry
lemma = disproof
entry = [ pron = dIs"pru:f
          pos = n
          sense = [ n = 1, def = facts that disprove something ]
          sense = [ n = 2, def = the act of disproving ]
          res = CED ]
synsets = [ pos = n
            synset = [ lex = (*, falsification, refutation)
                       gloss = any evidence that helps to establish the falsity of something ]
            synset = [ lex = (falsification, falsifying, *, refutation, refutal)
                       gloss = the act of determining that something is false ] ]
Going one step further
• Feature structures are hierarchical data
• Codd: hierarchical data can be represented as relational tables
  – Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.
Representing feature structures as relational tables
[Figure: the generic entry graph (lemma → w; entry → pron, pos, sense, res; sense → n, def, cit; cit → orth, auth, yr) next to an illustration of relational tables, from http://en.wikipedia.org/wiki/Relational_database]
Representing feature structures as relational tables
[Figure: the generic entry graph with the first table derived from it; illustration from http://en.wikipedia.org/wiki/Relational_database]
WORD (id, lemma, entry)
Representing feature structures as relational tables
[Figure: the generic entry graph with the tables derived so far; illustration from http://en.wikipedia.org/wiki/Relational_database]
WORD (id, lemma, entry)
ENTRY (entry, pron, pos, sense, res)
Representing feature structures as relational tables
[Figure: the generic entry graph with the tables derived so far; illustration from http://en.wikipedia.org/wiki/Relational_database]
WORD (id, lemma, entry)
ENTRY (entry, pron, pos, sense, res)
SENSE (sense, n, def, cit)
Representing feature structures as relational tables
[Figure: the generic entry graph with all four tables derived from it; illustration from http://en.wikipedia.org/wiki/Relational_database]
WORD (id, lemma, entry)
ENTRY (entry, pron, pos, sense, res)
SENSE (sense, n, def, cit)
CIT (cit, orth, auth, yr)
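One possible way to realise these four tables is an in-memory SQLite database. The sketch below is an assumption-laden illustration, not the author's implementation: the column types, the integer surrogate keys and the function `store_feature_structure` are all hypothetical. Because the ENTRY table carries a `sense` column, an entry with several senses is stored as one ENTRY row per sense.

```python
import itertools
import sqlite3

# Table and column names follow the slide; the types are assumptions.
SCHEMA = """
CREATE TABLE WORD  (id INTEGER, lemma TEXT, entry INTEGER);
CREATE TABLE ENTRY (entry INTEGER, pron TEXT, pos TEXT, sense INTEGER, res TEXT);
CREATE TABLE SENSE (sense INTEGER, n INTEGER, "def" TEXT, cit INTEGER);
CREATE TABLE CIT   (cit INTEGER, orth TEXT, auth TEXT, yr INTEGER);
"""


def store_feature_structure(conn, fs, keys):
    """Flatten one lemma-centred feature structure into the four tables."""
    entry = fs["entry"]
    word_id, entry_id = next(keys), next(keys)
    conn.execute("INSERT INTO WORD VALUES (?, ?, ?)", (word_id, fs["lemma"], entry_id))
    for sense in entry["sense"]:
        sense_id = next(keys)
        # One ENTRY row per sense, linked to SENSE through the sense key.
        conn.execute("INSERT INTO ENTRY VALUES (?, ?, ?, ?, ?)",
                     (entry_id, entry["pron"], entry["pos"], sense_id, entry["res"]))
        citations = sense.get("cit", [])
        if not citations:
            conn.execute("INSERT INTO SENSE VALUES (?, ?, ?, NULL)",
                         (sense_id, sense["n"], sense["def"]))
        for cit in citations:
            cit_id = next(keys)
            conn.execute("INSERT INTO SENSE VALUES (?, ?, ?, ?)",
                         (sense_id, sense["n"], sense["def"], cit_id))
            conn.execute("INSERT INTO CIT VALUES (?, ?, ?, ?)",
                         (cit_id, cit["orth"], cit["auth"], cit["yr"]))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ced = {"lemma": "disproof", "entry": {
    "pron": 'dIs"pru:f', "pos": "n", "res": "CED",
    "sense": [{"n": 1, "def": "facts that disprove something"},
              {"n": 2, "def": "the act of disproving"}]}}
store_feature_structure(conn, ced, itertools.count(1))

# Natural joins follow the shared column names (entry, then sense).
definitions = [row[0] for row in conn.execute(
    'SELECT "def" FROM WORD NATURAL JOIN ENTRY NATURAL JOIN SENSE '
    "WHERE lemma = ? ORDER BY n", ("disproof",))]
```

Sharing column names between a parent table and its child table is what lets natural joins walk down the hierarchy without explicit join conditions.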
Relational operators
• Projection: $\pi_{a_1,\dots,a_n}(R)$ => a relation containing only the values of attributes $a_1,\dots,a_n$ from the relation R
• Selection: $\sigma_{\varphi}(R)$, where $\varphi$ is a logical condition => only the tuples satisfying the condition $\varphi$ are retained from the relation (or the set) R
• Join: $R \bowtie S$ => the set of all combinations of tuples in R and S that are equal on their common attributes
• Union: $R \cup S$ => a table representing the union of the two relations
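The four operators can be illustrated on relations held as lists of Python dictionaries (one dictionary per tuple). This is a teaching sketch, not how a DBMS implements them; the function names and the toy relations are assumptions.

```python
def project(relation, attributes):
    """Projection: keep only the given attributes, dropping duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        projected = tuple((a, row[a]) for a in attributes)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(projected))
    return result


def select(relation, condition):
    """Selection: keep only the tuples for which `condition` holds."""
    return [row for row in relation if condition(row)]


def natural_join(r, s):
    """Natural join: combine tuples of r and s that agree on their common attributes."""
    result = []
    for row_r in r:
        for row_s in s:
            common = row_r.keys() & row_s.keys()
            if all(row_r[a] == row_s[a] for a in common):
                result.append({**row_r, **row_s})
    return result


def union(r, s):
    """Union: all tuples of r and s, without duplicates."""
    result = []
    for row in r + s:
        if row not in result:
            result.append(row)
    return result


# Hypothetical toy relations.
WORD = [{"id": 1, "lemma": "symphony", "entry": 10}]
ENTRY = [{"entry": 10, "pos": "n"}, {"entry": 11, "pos": "v"}]

joined = natural_join(WORD, ENTRY)
nouns = project(select(joined, lambda row: row["pos"] == "n"), ["lemma"])
both = union([{"a": 1}], [{"a": 1}, {"a": 2}])
```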
Interrogating a dictionary
• Citations before 1850 of the entry “symphony”.
[Figure: the generic entry graph mapped to the tables WORD, ENTRY, SENSE and CIT]
$\pi_{orth}(\sigma_{lemma = \text{“symphony”} \,\wedge\, yr < 1850}(\mathrm{WORD} \bowtie \mathrm{ENTRY} \bowtie \mathrm{SENSE} \bowtie \mathrm{CIT}))$
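The same interrogation can be written in SQL, here tried out on an in-memory SQLite database. The rows are invented placeholders (not real citations), the column types are assumptions, and `SELECT DISTINCT` stands in for the projection, which removes duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WORD  (id INTEGER, lemma TEXT, entry INTEGER);
CREATE TABLE ENTRY (entry INTEGER, pron TEXT, pos TEXT, sense INTEGER, res TEXT);
CREATE TABLE SENSE (sense INTEGER, n INTEGER, "def" TEXT, cit INTEGER);
CREATE TABLE CIT   (cit INTEGER, orth TEXT, auth TEXT, yr INTEGER);

-- Placeholder rows, invented for the example.
INSERT INTO WORD  VALUES (1, 'symphony', 10);
INSERT INTO ENTRY VALUES (10, NULL, 'n', 100, 'toy');
INSERT INTO SENSE VALUES (100, 1, 'placeholder definition', 1000);
INSERT INTO SENSE VALUES (100, 1, 'placeholder definition', 1001);
INSERT INTO CIT   VALUES (1000, 'placeholder citation A', 'author A', 1790);
INSERT INTO CIT   VALUES (1001, 'placeholder citation B', 'author B', 1900);
""")

# Selection on lemma and yr, projection on orth, over the four-way natural join.
QUERY = """
SELECT DISTINCT orth
FROM WORD NATURAL JOIN ENTRY NATURAL JOIN SENSE NATURAL JOIN CIT
WHERE lemma = ? AND yr < ?
"""

early_citations = [row[0] for row in conn.execute(QUERY, ("symphony", 1850))]
```

Only the citation dated 1790 passes the `yr < 1850` condition in this toy data.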
Interrogating a combination between a dictionary and a wordnet
• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.
[Figure: the generic entry graph mapped to the tables WORD, ENTRY, SENSE and CIT]
Interrogating a combination between a dictionary and a wordnet
• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.
[Figure: the generic entry graph mapped to the tables WORD, ENTRY, SENSE and CIT]
The citations of the title word w:
$\pi_{orth}(\sigma_{lemma = w \,\wedge\, yr < 1850}(\mathrm{WORD} \bowtie \mathrm{ENTRY} \bowtie \mathrm{SENSE} \bowtie \mathrm{CIT}))$
Interrogating a combination between a dictionary and a wordnet
• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.
[Figure: the generic entry graph mapped to the tables WORD, ENTRY, SENSE and CIT]
Unify and lemmatise the words belonging to the citations:
$\mathrm{lem}(U(\pi_{orth}(\sigma_{lemma = w \,\wedge\, yr < 1850}(\mathrm{WORD} \bowtie \mathrm{ENTRY} \bowtie \mathrm{SENSE} \bowtie \mathrm{CIT}))))$
Interrogating a combination between a dictionary and a wordnet
• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.
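$\pi_{lex}(\sigma_{pos = n \,\wedge\, lemma \,\in\, \mathrm{lem}(U(\pi_{orth}(\sigma_{lemma = w \,\wedge\, yr < 1850}(\mathrm{WORD} \bowtie \mathrm{ENTRY} \bowtie \mathrm{SENSE} \bowtie \mathrm{CIT}))))}(\mathrm{WORD} \bowtie \mathrm{SYNS} \bowtie \mathrm{SYN}))$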
[Figure: the generic entry graph extended with a synsets branch (pos; synset → lex, gloss), mapped to the tables WORD, ENTRY, SENSE, CIT, SYNS and SYN]
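A rough end-to-end sketch of this combined interrogation follows. Several things are assumptions not given on the slide: the column layout of SYNS and SYN, the extra `synsets` column in WORD that links a lemma to its synsets, one SYN row per synonym, the toy lemmatiser `lem`, and all data rows (the citation text is an invented placeholder). The inner query runs in SQL, the unification and lemmatisation step in Python, and the outer query again in SQL.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WORD  (id INTEGER, lemma TEXT, entry INTEGER, synsets INTEGER);
CREATE TABLE ENTRY (entry INTEGER, pron TEXT, pos TEXT, sense INTEGER, res TEXT);
CREATE TABLE SENSE (sense INTEGER, n INTEGER, "def" TEXT, cit INTEGER);
CREATE TABLE CIT   (cit INTEGER, orth TEXT, auth TEXT, yr INTEGER);
CREATE TABLE SYNS  (synsets INTEGER, pos TEXT, synset INTEGER);
CREATE TABLE SYN   (synset INTEGER, lex TEXT, gloss TEXT);

-- Invented toy rows: a headword with one early citation, and the WN data of disproof.
INSERT INTO WORD  VALUES (1, 'argument', 10, NULL), (2, 'disproof', NULL, 20);
INSERT INTO ENTRY VALUES (10, NULL, 'n', 100, 'toy');
INSERT INTO SENSE VALUES (100, 1, 'placeholder definition', 1000);
INSERT INTO CIT   VALUES (1000, 'Toy citation with disproofs', 'author A', 1800);
INSERT INTO SYNS  VALUES (20, 'n', 200), (20, 'n', 201);
INSERT INTO SYN   VALUES
  (200, 'falsification', 'any evidence that helps to establish the falsity of something'),
  (200, 'refutation',    'any evidence that helps to establish the falsity of something'),
  (201, 'falsification', 'the act of determining that something is false'),
  (201, 'falsifying',    'the act of determining that something is false'),
  (201, 'refutation',    'the act of determining that something is false'),
  (201, 'refutal',       'the act of determining that something is false');
""")

# Hypothetical lemmatiser: a lookup table standing in for a real tool.
TOY_LEMMAS = {"disproofs": "disproof"}


def lem(tokens):
    """Map each token to its lemma (identity when the token is unknown)."""
    return {TOY_LEMMAS.get(token, token) for token in tokens}


def synonyms_from_early_citations(conn, headword, year):
    """Synonyms of nouns occurring in citations of `headword` dated before `year`."""
    inner = conn.execute(
        "SELECT DISTINCT orth FROM WORD NATURAL JOIN ENTRY "
        "NATURAL JOIN SENSE NATURAL JOIN CIT WHERE lemma = ? AND yr < ?",
        (headword, year))
    tokens = set()                      # U(...): union of the citation words
    for (orth,) in inner:
        tokens |= set(re.findall(r"[a-z]+", orth.lower()))
    lemmas = sorted(lem(tokens))
    if not lemmas:
        return []
    placeholders = ", ".join("?" for _ in lemmas)
    outer = conn.execute(
        "SELECT DISTINCT lex FROM WORD NATURAL JOIN SYNS NATURAL JOIN SYN "
        f"WHERE pos = 'n' AND lemma IN ({placeholders}) ORDER BY lex",
        lemmas)
    return [row[0] for row in outer]


synonyms = synonyms_from_early_citations(conn, "argument", 1850)
```

The `ORDER BY lex` clause covers the "sorted lexicographically" part of the request, which the relational expression itself does not state.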
Conclusions
• I did not propose any model, I simply made some observations (nothing is really new)
• Linking lexicographic resources:
  – one resource => TEI representation => as feature structures => hierarchical graphs => relational tables
  – more resources => unifications of tables
  – use query and relational operators for interrogation
Discussion
• Only a sketch
  – a lot of details should still be filled in
  – the good news: XML structures (the native language of TEI) accept direct representations as database records: XSLT => opening direct access to a complex querying language: XQuery => mimicking the relational operators and adding more facilities
Discussion
• More good news
  – representing variable depth structures
  – recursive hierarchies: Kamfonas
    • Fixed depth dimensions are simpler to implement, maintain and query… Hierarchies that have variable depth or an uncertain number of levels… can often benefit if implemented as recursive hierarchies. http://www.kamfonas.com/id3.html
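One common way to store a hierarchy of variable depth, such as senses with sub-senses, is a self-referencing table queried with a recursive common table expression. The sketch below uses SQLite; the table `SENSE_NODE`, its columns and its rows are hypothetical, and this is not claimed to be the design described in the cited article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each node points to its parent; the depth of the tree is not fixed in the schema.
CREATE TABLE SENSE_NODE (id INTEGER PRIMARY KEY, parent INTEGER, label TEXT);
INSERT INTO SENSE_NODE VALUES
  (1, NULL, 'entry'),
  (2, 1, 'sense 1'),
  (3, 1, 'sense 2'),
  (4, 2, 'sense 1.a'),
  (5, 4, 'sense 1.a.i');
""")

# Walk down from a starting node, recording how deep each descendant sits.
QUERY = """
WITH RECURSIVE subtree(id, label, depth) AS (
    SELECT id, label, 0 FROM SENSE_NODE WHERE id = ?
    UNION ALL
    SELECT child.id, child.label, subtree.depth + 1
    FROM SENSE_NODE AS child JOIN subtree ON child.parent = subtree.id
)
SELECT label, depth FROM subtree ORDER BY depth, id
"""

rows = conn.execute(QUERY, (1,)).fetchall()
```

Starting the walk from node 2 instead of node 1 returns only the subtree of the first sense, whatever its depth.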
Discussion
• Even more good news (hopes)
  – interrogations can be formulated in natural language => an interpreter translates them into the query language of a DBMS
  – as such, a handy tool for the benefit of lexicographers
Acknowledgements
• Work partially supported by the project The Computational Representative Corpus of Contemporary Romanian Language, a project of the Romanian Academy and partially by the COST-ENeL project
• I thank Isabelle Tamba and Mădălin Pătrașcu for the slides describing the CLRE project
Thank you!