![Page 1: Martin Benjamin The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space 21 May, 2015 – CERN, Geneva 1](https://reader036.vdocuments.us/reader036/viewer/2022062407/56649f3f5503460f94c5fc57/html5/thumbnails/1.jpg)
1
Martin Benjamin
The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space
21 May, 2015 – CERN, Geneva
2
Kamusi is Swahili for “dictionary”
3
Goal: A complete matrix of human expression across time and space
• As a knowledge resource
• As a data resource
4
In service since 1994 (originally at Yale Council on African Studies)
International NGO since 2009
• Registered non-profit in USA and Switzerland
Academic home since 2013:
EPFL - Swiss Federal Institute of Technology in Lausanne
LSIR - Distributed Information Systems Laboratory
5
White House Big Data Initiative:
Launch Partner for Building the Data Innovation Ecosystem
Networking and Information Technology R&D Program
Office of Science and Technology Policy
6
What is the overlap between Kamusi and CERN?
• Big goals, small particles
• Big collaboration
• 7000 languages
• “Human Languages Project”
• Pure science – data for knowledge
• Practical science – data for use
• High energy particle detectors
7
Problems for Lexicography
What are Concepts?
• How to explain an idea in its own language
• How to express an idea across languages
• How to account for variation
What are Words?
• A set of letters?
• A set of sounds?
• A “canonical” form?
• A single entity?
8
9
10
Problems for Lexicography: What are Words? A set of letters?
C-L-I-E-N-T
11
Problems for Lexicography: What are Words? A set of sounds?
whined / wind / wined
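Slides 10 and 11 suggest that neither spelling nor sound alone can key a dictionary entry. A minimal Python sketch, using a hypothetical four-row mini-lexicon and broad IPA for a variety where "wh" and "w" sound alike (both assumptions, not Kamusi data), shows one spelling splitting into two pronunciations while one pronunciation gathers three spellings:

```python
from collections import defaultdict

# Hypothetical mini-lexicon: (spelling, broad IPA, gloss); one row per sense.
# The IPA assumes a variety where "wh" and "w" are pronounced the same.
LEXICON = [
    ("whined", "waɪnd", "complained in a high, nasal voice"),
    ("wined", "waɪnd", "entertained with wine"),
    ("wind", "waɪnd", "to turn or coil"),
    ("wind", "wɪnd", "moving air"),
]


def group_by(field_index):
    """Group lexicon rows by one field: 0 = spelling, 1 = sound."""
    groups = defaultdict(list)
    for row in LEXICON:
        groups[row[field_index]].append(row)
    return dict(groups)


by_spelling = group_by(0)
by_sound = group_by(1)

# One spelling, two sounds (and two unrelated meanings).
print([ipa for _, ipa, _ in by_spelling["wind"]])  # ['waɪnd', 'wɪnd']
# One sound, three spellings.
print(sorted(spelling for spelling, _, _ in by_sound["waɪnd"]))  # ['whined', 'wind', 'wined']
```

Keying entries by either field alone merges or splits senses wrongly, which is the point the two slides make.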
12
Problems for Lexicography: What are Words? A “canonical” form?
SEE: sees, saw, seen, seeing
Kinyarwanda: 900 million forms for every verb
13
Problems for Lexicography: What are Words? A single entity?
African fish eagle
drive up the wall
14
light
15
light
why multilingual dictionaries were impossible
16
light
lumineux
léger
allégé
léger
why multilingual dictionaries were impossible
17
light
lumineux
léger
allégé
léger
why multilingual dictionaries were impossible
WOLF 02121424-a: léger, lumière
WOLF 01186408-a: léger
WOLF 00993117-a: léger, allégé, lumière, light
WOLF 00269989-a: lumière, lumineux, clair
PWN (English WordNet): light × 47
WOLF (French WordNet): light = lumière × 44; light = léger × 37
18
light
léger
why multilingual dictionaries were impossible
lumineux
allégé
léger
19
why multilingual dictionaries were impossible
20
why multilingual dictionaries were impossible
lumineux
21
light
fr: lumineux
fr: léger
fr: allégé
fr: léger
why multilingual dictionaries were impossible
th: ที่มีแคลอรี่ต่ำ
fi: kaloriton
sw: pungufu
th: เบา
fi: kevyt
sw: -epesi
th: สว่าง
fi: valoisa
sw: -enye mwanga
th: ซึ่งไร้สาระ
fi: tyhjänpäiväinen
sw: -a kuchekesha
22
en: light
fr: lumineux
fr: léger
fr: allégé
fr: léger
why multilingual dictionaries were impossible
th: ที่มีแคลอรี่ต่ำ
fi: kaloriton
sw: pungufu
th: เบา
fi: kevyt
sw: -epesi
th: สว่าง
fi: valoisa
sw: -enye mwanga
th: ซึ่งไร้สาระ
fi: tyhjänpäiväinen
sw: -a kuchekesha
en: light
en: light
en: light
23
fr: lumineux
fr: léger
fr: allégé
why multilingual dictionaries were impossible
th: ที่มีแคลอรี่ต่ำ
fi: kaloriton
sw: pungufu
th: เบา
fi: kevyt
sw: -epesi
th: สว่าง
fi: valoisa
sw: -enye mwanga
light
fr: léger
th: ซึ่งไร้สาระ
fi: tyhjänpäiväinen
sw: -a kuchekesha
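The failure these slides describe can be reproduced with a naive word-level pivot through English. In this Python sketch the dictionaries are reduced to the French and Finnish words shown on the slides, and the `pivot` helper is hypothetical. Because every pair is keyed only by the English string "light", each French word receives every Finnish sense at once:

```python
# Word-level bilingual lists: each pair is keyed only by the English spelling.
FR_EN = {"lumineux": ["light"], "léger": ["light"], "allégé": ["light"]}
EN_FI = {"light": ["valoisa", "kevyt", "kaloriton", "tyhjänpäiväinen"]}


def pivot(word, src_to_en, en_to_tgt):
    """Translate source -> English -> target by matching English spellings only."""
    results = set()
    for english in src_to_en.get(word, []):
        results.update(en_to_tgt.get(english, []))
    return results


print(sorted(pivot("allégé", FR_EN, EN_FI)))
# ['kaloriton', 'kevyt', 'tyhjänpäiväinen', 'valoisa']

# Every French word ends up paired with every Finnish word.
pairs = {(fr, fi) for fr in FR_EN for fi in pivot(fr, FR_EN, EN_FI)}
print(len(pairs))  # 12
```

Only four of those twelve pairs share a sense (lumineux/valoisa, léger/kevyt, léger/tyhjänpäiväinen, allégé/kaloriton); the other eight are noise introduced by the pivot.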
24
why multilingual dictionaries were impossible
25
light
how Kamusi makes a multilingual dictionary possible
26
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
27
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
fr: lumineux
fr: léger
fr: allégé
fr: léger
28
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
fr: lumineux, th: สว่าง, fi: valoisa, sw: -enye mwanga
29
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
fr: léger, th: เบา, fi: kevyt, sw: -epesi
30
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
fr: léger, th: ซึ่งไร้สาระ, fi: tyhjänpäiväinen, sw: -a kuchekesha
31
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
fr: allégé, th: ที่มีแคลอรี่ต่ำ, fi: kaloriton, sw: pungufu
32
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
fr: allégé, th: ที่มีแคลอรี่ต่ำ, fi: kaloriton, sw: pungufu
fr: léger, th: ซึ่งไร้สาระ, fi: tyhjänpäiväinen, sw: -a kuchekesha
fr: léger, th: เบา, fi: kevyt, sw: -epesi
fr: lumineux, th: สว่าง, fi: valoisa, sw: -enye mwanga
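These slides resolve the problem by attaching translations to a disambiguated concept instead of to the spelling "light". Below is a minimal Python 3.9+ sketch of that idea. The `Concept` class, its identifiers and the `translate` helper are hypothetical, and Thai is left out for brevity:

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One sense of an English headword, with its translations per language."""
    concept_id: str  # hypothetical identifier
    gloss: str       # disambiguating gloss, as on the slides
    terms: dict[str, list[str]] = field(default_factory=dict)  # language code -> terms


CONCEPTS = [
    Concept("light-1", "not serious",
            {"en": ["light"], "fr": ["léger"], "fi": ["tyhjänpäiväinen"], "sw": ["-a kuchekesha"]}),
    Concept("light-2", "not fattening",
            {"en": ["light"], "fr": ["allégé"], "fi": ["kaloriton"], "sw": ["pungufu"]}),
    Concept("light-3", "not heavy",
            {"en": ["light"], "fr": ["léger"], "fi": ["kevyt"], "sw": ["-epesi"]}),
    Concept("light-4", "not dark",
            {"en": ["light"], "fr": ["lumineux"], "fi": ["valoisa"], "sw": ["-enye mwanga"]}),
]


def translate(term, src, tgt):
    """Map each concept that lists `term` in language `src` to its `tgt` terms, keyed by gloss."""
    return {
        c.gloss: c.terms.get(tgt, [])
        for c in CONCEPTS
        if term in c.terms.get(src, [])
    }


print(translate("allégé", "fr", "fi"))  # {'not fattening': ['kaloriton']}
print(translate("léger", "fr", "sw"))   # {'not serious': ['-a kuchekesha'], 'not heavy': ['-epesi']}
```

Any two languages now meet through the concept, so a French-Finnish or Swahili-Finnish lookup never passes through the ambiguous English string. "léger" still returns two results, but each one carries the sense it belongs to.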
33
how Kamusi makes a multilingual dictionary possible
light (not heavy)
fr: léger, th: เบา, fi: kevyt, sw: -epesi
fr: léger (sandy)
fr: léger (low alcohol)
fr: léger (without much luggage)
34
35
36
37
38
39
light (not serious)
light (not fattening)
light (not heavy)
light (not dark)
how Kamusi makes a multilingual dictionary possible
fr: lumineux, th: สว่าง, fi: valoisa, sw: -enye mwanga
40
how Kamusi makes a multilingual dictionary possible
light (not dark)
Catalan: brillant, illuminós
Japanese: 明るい, 明らか
Croatian: svjetleći, svijetao
Spanish: claro, luminoso
41
how Kamusi makes a multilingual dictionary possible
light
42
43
44
light
45
light
46
light
47
light
meaning
shape
sound
place
time
relationships
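The six facets on this slide suggest a per-sense record. The Python 3.9+ sketch below is one hypothetical way to hold them. The field names follow the slide; the types, the `filled_facets` helper and the sample values (including the Proto-Germanic ancestor *linhtaz, which appears on a later slide) are illustrative assumptions:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Entry:
    """One sense of a headword; each facet from the slide is a separate field."""
    headword: str
    meaning: str                                         # definition of this sense only
    shape: list[str] = field(default_factory=list)       # inflections, alternates
    sound: list[str] = field(default_factory=list)       # IPA or audio references
    place: list[str] = field(default_factory=list)       # dialects, sightings
    time: list[str] = field(default_factory=list)        # ancestors, dated examples
    relationships: dict[str, list[str]] = field(default_factory=dict)  # relation -> headwords


def filled_facets(entry):
    """List the facets (every field after headword and meaning) that hold data."""
    return [f.name for f in fields(entry)[2:] if getattr(entry, f.name)]


light_not_heavy = Entry(
    headword="light",
    meaning="of little weight; not heavy",
    shape=["lighter", "lightest"],
    sound=["laɪt"],
    time=["*linhtaz (Proto-Germanic)"],
    relationships={"antonym": ["heavy"]},
)

print(filled_facets(light_not_heavy))  # ['shape', 'sound', 'time', 'relationships']
```

Storing facets on the sense rather than on the spelling matters here: the "not dark" sense of light descends from a different Proto-Germanic word, so a `time` facet stored at the spelling level would be wrong for one of the two senses.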
48
49
light
lighter
lightest
meaning
shape
sound
place
time
relationships
light
lights
lighted
lit
lighting
50
light
meaning
shape
sound
place
time
relationships
robot
51
52
light
meaning
shape
sound
place
time
relationships
linhtaz
53
light
meaning
shape
sound
place
time
relationships
torch (hyponym)
lamp (synonym)
lighthouse (spawn)
dark (antonym)
car (holonym)
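The relations on this slide are typed, and most of them imply a reciprocal link on the other word. Here is a hypothetical Python sketch that records each relation once and derives its inverse automatically. The inverse table is an assumption. "spawn" (a derived word such as lighthouse) is deliberately left one-way, because the slide gives no inverse for it:

```python
from collections import defaultdict

# Inverse of each relation type; symmetric relations map to themselves.
INVERSE = {
    "synonym": "synonym",
    "antonym": "antonym",
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "holonym": "meronym",
    "meronym": "holonym",
}


class RelationGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # word -> {(relation, other word)}

    def add(self, word, relation, other):
        """Record `other` as the `relation` of `word`, plus the inverse link if one is defined."""
        self.edges[word].add((relation, other))
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self.edges[other].add((inverse, word))

    def related(self, word, relation):
        return sorted(o for r, o in self.edges.get(word, ()) if r == relation)


g = RelationGraph()
g.add("light", "hyponym", "torch")    # a torch is a kind of light
g.add("light", "synonym", "lamp")
g.add("light", "spawn", "lighthouse")
g.add("light", "antonym", "dark")
g.add("light", "holonym", "car")      # a car has lights

print(g.related("torch", "hypernym"))    # ['light']
print(g.related("car", "meronym"))       # ['light']
print(g.related("lighthouse", "spawn"))  # []
```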
54
(difference)
meaning
shape
sound
place
time
relationships
lamp (synonym)
light
55
56
57
light
meaning
definition examples
translations
58
light
meaning
translations
59
light
meaning
translations
60
equivalence
• Parallel
• Similar
• Explanatory
translations
61
equivalence
• Parallel
• Similar
• Explanatory
hand (English) = main (French)
✓: transitive across languages
translations
62
equivalence
• Parallel
• Similar
• Explanatory
mkono (Swahili) = hand + arm (English)
⁇ : might be transitive across languages
translations
difference difference translation
63
equivalence
• Parallel
• Partial
• Similar
hand (English) = 10.2 cm (most languages)
✗: not transitive across languages
translations
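One way to put these three verdicts into practice is to let an inferred link inherit the weakest equivalence along its path. The Python sketch below assumes that rule; the slides do not state it. Under it, parallel links chain freely, any similar link downgrades the result to a candidate for human review, and an explanatory link blocks inference entirely:

```python
from enum import IntEnum


class Equivalence(IntEnum):
    EXPLANATORY = 0  # not transitive, e.g. hand = 10.2 cm
    SIMILAR = 1      # might be transitive, e.g. mkono = hand + arm
    PARALLEL = 2     # transitive, e.g. hand = main


def compose(first, second):
    """Equivalence of an inferred A->C link built from A->B and B->C,
    or None if no link should be inferred."""
    if Equivalence.EXPLANATORY in (first, second):
        return None
    return min(first, second)  # the weaker link decides


print(compose(Equivalence.PARALLEL, Equivalence.PARALLEL).name)  # PARALLEL
print(compose(Equivalence.PARALLEL, Equivalence.SIMILAR).name)   # SIMILAR
print(compose(Equivalence.PARALLEL, Equivalence.EXPLANATORY))    # None
```

Under this rule, main (French) = hand (English) followed by hand ≈ mkono (Swahili) would yield a French-Swahili link flagged SIMILAR rather than accepted outright.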
64
light
meaning
definition examples
translations
definition translations
example translations
65
light
meaning
definition examples
time
hard
easy
place
notes
66
light
shape
inflections multiple words
alternates
67
light
lighter
lightest
shape
inflections
sound
translation shape
separability (MWEs)
• Simple: configurable form, e.g., English verbs
• Complex: fixed table, e.g., French verbs
• Agglutinative: rule-based coding, e.g., Swahili verbs
alternate spellings
place
spelling sets: polysemous terms often have the same inflections.
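The three inflection classes on this slide call for different storage strategies. The hypothetical Python sketch below gives one illustration per class. English adjectives use a configurable suffix pattern. A French verb (the present tense of *être*) is stored as a fixed table. Swahili forms are generated by rule, concatenating a subject marker, a tense marker and the verb stem. The Swahili rule covers only four tenses and regular stems such as *soma* (read); negation, object markers and monosyllabic verbs are out of scope, and none of this is Kamusi's actual coding scheme:

```python
from itertools import product


# Simple: a configurable form (regular English comparative and superlative suffixes).
def english_degrees(adjective, comparative="er", superlative="est"):
    return [adjective, adjective + comparative, adjective + superlative]


# Complex: a fixed table stored as-is (French "être", present indicative).
ETRE_PRESENT = {
    "je": "suis", "tu": "es", "il/elle": "est",
    "nous": "sommes", "vous": "êtes", "ils/elles": "sont",
}

# Agglutinative: forms generated by rule from morpheme slots (Swahili).
SUBJECT_MARKERS = {"1sg": "ni", "2sg": "u", "3sg": "a", "1pl": "tu", "2pl": "m", "3pl": "wa"}
TENSE_MARKERS = {"present": "na", "past": "li", "future": "ta", "perfect": "me"}


def swahili_verb(subject, tense, stem):
    """Concatenate subject marker + tense marker + stem, e.g. ni + li + soma."""
    return SUBJECT_MARKERS[subject] + TENSE_MARKERS[tense] + stem


all_forms = [swahili_verb(s, t, "soma") for s, t in product(SUBJECT_MARKERS, TENSE_MARKERS)]

print(english_degrees("light"))               # ['light', 'lighter', 'lightest']
print(swahili_verb("1sg", "past", "soma"))    # nilisoma  (I read)
print(swahili_verb("3pl", "future", "soma"))  # watasoma  (they will read)
print(len(all_forms))                         # 24
```

Each additional slot (object markers, negation, relative markers) multiplies the number of forms rather than adding to it. That is why agglutinative languages call for rule-based coding rather than stored paradigms.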
68
light
lite
shape
alternates
金魚 (Kanji), きんぎょ (Hiragana), キンギョ (Katakana), kingyo (Rōmaji): goldfish (English)
https://en.wikipedia.org/wiki/Japanese_writing_system
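At lookup time, alternates and shared spelling sets raise the same question: which entries can a given surface form reach? The following hypothetical Python sketch builds a reverse index to answer it. The entry names are illustrative, and assigning "lite" only to the "not fattening" sense is an assumption:

```python
from collections import defaultdict

# A spelling set is shared by every sense spelled (and inflected) the same way.
SPELLING_SETS = {"light (adj.)": ["light", "lighter", "lightest"]}

ENTRIES = {
    "light (not heavy)": {"spelling_set": "light (adj.)", "alternates": []},
    "light (not fattening)": {"spelling_set": "light (adj.)", "alternates": ["lite"]},
    "金魚 (goldfish)": {"spelling_set": None, "alternates": ["金魚", "きんぎょ", "キンギョ", "kingyo"]},
}


def build_index(entries, spelling_sets):
    """Map every casefolded surface form to the set of entries it can reach."""
    index = defaultdict(set)
    for entry_id, info in entries.items():
        forms = list(info["alternates"])
        if info["spelling_set"] is not None:
            forms += spelling_sets[info["spelling_set"]]
        for form in forms:
            index[form.casefold()].add(entry_id)
    return index


def lookup(index, form):
    return sorted(index.get(form.casefold(), set()))


index = build_index(ENTRIES, SPELLING_SETS)
print(lookup(index, "Lighter"))   # ['light (not fattening)', 'light (not heavy)']
print(lookup(index, "lite"))      # ['light (not fattening)']
print(lookup(index, "キンギョ"))  # ['金魚 (goldfish)']
```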
69
shape
multiple words
inflections (+separability)
drives || up the wall
drove || up the wall
driven || up the wall
driving || up the wall
separability
drive || up the wall
Research question: Can we determine Separability Sets?
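Separability means other words can come between the inflected head and the fixed tail ("drove my neighbours up the wall"). The minimal Python sketch below treats a regular expression as a first approximation and allows at most three intervening words; both are assumptions. It matches any listed head form, a short gap, and then the tail. It cannot tell the idiom from a literal reading, such as a car driving up a wall, which is part of why the research question remains open:

```python
import re


def mwe_pattern(head_forms, tail, max_gap=3):
    """Compile a regex for a separable multiword expression:
    any head form, then up to max_gap words, then the fixed tail."""
    heads = "|".join(map(re.escape, head_forms))
    tail_words = r"\s+".join(map(re.escape, tail.split()))
    return re.compile(
        rf"\b(?:{heads})\b(?:\s+\S+){{0,{max_gap}}}?\s+{tail_words}\b",
        re.IGNORECASE,
    )


DRIVE_UP_THE_WALL = mwe_pattern(
    ["drive", "drives", "drove", "driven", "driving"], "up the wall"
)

for sentence in [
    "That noise drives me up the wall.",
    "The delays drove the whole team up the wall.",
    "He drove up the hill.",
]:
    match = DRIVE_UP_THE_WALL.search(sentence)
    print(match.group(0) if match else None)
# drives me up the wall
# drove the whole team up the wall
# None
```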
70
shape
sign languages, e.g., Uganda Sign Language, Solomon Islander Sign Language
• no sound
• no spelling
• need for gesture recognition (future research)
ideograms: 光
• no relation between shape and sound
• no sequencing
• ontological relationships
71
light
place
dialect dialect word sightings
sound sightings
72
light
sound
audio tone
IPA (phonetics)
place
73
light
time
ancestors (other languages)
ancestors (own language)
datings (examples)
74
light
relationships
synonyms
ontologies
terminologies
transitivity with translations
hierarchies or reciprocity
75
Lexicography vs. Terminology
Lexicography:
• General terms
• Variability of concepts among languages
• Describes indigenous words
Terminology:
• Domain-specific terms
• Fixed meaning within context
• Prescribes words
sabilli
76
Collecting Data
• Gathering new data
  • For languages with zero digitized data (most world languages)
  • For languages with incomplete data (all languages)
• Aligning existing data
  • To separate terms at concept level
  • To match concepts across languages
77
Collecting Data: Existing Data
• Copyright restrictions
• Data structure
• Data alignment
78
Collecting Data
Expert Interface: Edit Engine
79
Crowdsourcing Lexicography
• Gathering new data
  • For languages with zero digitized data (most of the world's languages)
  • For languages with incomplete data (all languages)
• Aligning existing data
  • To separate terms at concept level
  • To match concepts across languages
People are very good at these tasks
Machines are very bad at them
Scholars are very busy
80
Crowdsourcing with Games
• Engage the public in producing raw data
• Data can be built upon and refined over time
• Collecting “facts” that
  • can best come from native informants
  • can be verified by consensus as fulfilling a communicative role
• Wrong data and bad actors can be removed
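One way to read "verified by consensus" and "bad actors can be removed" is as vote counting: an answer is accepted once enough distinct players give it, and a player whose answers to settled questions rarely match the accepted ones is flagged so their submissions can be dropped. This is a minimal sketch; the thresholds (`min_votes=3`, `min_agreement=0.5`, `min_answers=3`), the normalization step and all function names are assumptions, since the slides do not say how consensus is computed.

```python
from collections import defaultdict

def normalize(answer):
    """Hypothetical normalization so 'Chien ' and 'chien' count as one answer."""
    return answer.strip().lower()

def accepted_answers(submissions, min_votes=3):
    """(question, answer) pairs given by at least min_votes distinct players.

    submissions: iterable of (player, question, answer) tuples. Several answers
    to one question may pass (e.g. two genuine synonyms).
    """
    voters = defaultdict(set)
    for player, question, answer in submissions:
        voters[(question, normalize(answer))].add(player)
    return {key for key, players in voters.items() if len(players) >= min_votes}

def flag_bad_actors(submissions, accepted, min_agreement=0.5, min_answers=3):
    """Players whose answers to settled questions rarely match the consensus."""
    settled = {question for question, _ in accepted}
    hits, total = defaultdict(int), defaultdict(int)
    for player, question, answer in submissions:
        if question in settled:
            total[player] += 1
            hits[player] += int((question, normalize(answer)) in accepted)
    return {p for p in total
            if total[p] >= min_answers and hits[p] / total[p] < min_agreement}

def clean(submissions, min_votes=3):
    """Drop flagged players' submissions, then recompute consensus without them."""
    accepted = accepted_answers(submissions, min_votes)
    bad = flag_bad_actors(submissions, accepted)
    kept = [s for s in submissions if s[0] not in bad]
    return accepted_answers(kept, min_votes), bad

# Toy data: three players agree on French translations; one submits junk.
subs = [(p, q, a)
        for q, a in [("dog", "chien"), ("cat", "chat"), ("house", "maison")]
        for p in ("ana", "ben", "cy")]
subs += [("spam", q, "banana") for q in ("dog", "cat", "house")]

accepted, bad = clean(subs)
print(sorted(accepted))  # [('cat', 'chat'), ('dog', 'chien'), ('house', 'maison')]
print(bad)               # {'spam'}
```

Allowing several answers per question to pass the threshold leaves room for genuine synonyms, and recomputing consensus after dropping flagged players keeps their votes from propping up answers on their own.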
81
Game Architecture
• Simple tasks the public can understand
• “Word” questions to stimulate the mind
• Competition elements to stimulate the heart
• Answers validated by consensus
• Starts with English concept set to have a shared realm of ideas
• Grows progressively – winning answers in one mode generate more advanced questions in the next
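The "grows progressively" point can be sketched as a question queue: the game is seeded from English concepts, and a winning answer in one mode enqueues questions for later modes. The `NEXT_MODES` table (which mode feeds which), the `(mode, language, term)` question tuple and the function names are hypothetical; only the mode names come from the talk's list of game modes.

```python
from collections import deque

# Hypothetical follow-up rules (not from the slides): a winning answer in the
# key mode unlocks questions in each listed mode.
NEXT_MODES = {
    "translation": ["synonyms", "definitions"],
    "definitions": ["examples"],
}

def seed(english_concepts, lang):
    """Start every language from the shared English concept set."""
    return deque(("translation", lang, concept) for concept in english_concepts)

def on_winning_answer(queue, question, answer):
    """Enqueue the more advanced questions unlocked by a winning answer."""
    mode, lang, term = question
    # A translation answer introduces a new target-language term;
    # later modes keep asking about that same term.
    new_term = answer if mode == "translation" else term
    for next_mode in NEXT_MODES.get(mode, []):
        queue.append((next_mode, lang, new_term))

queue = seed(["water"], "sw")
question = queue.popleft()  # ('translation', 'sw', 'water')
on_winning_answer(queue, question, "maji")
print(list(queue))
# [('synonyms', 'sw', 'maji'), ('definitions', 'sw', 'maji')]
```

Here the winning Swahili (`sw`) translation `maji` for the English concept "water" then feeds a synonym question and a definition question about `maji`.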
82
83
Game Modes
1. Translation
2. Synonyms
3. Word Forms
4. Definitions
5. Examples
6. Alignment
7. Equivalence
8. Difference
84
Translation Game
85
Translation Game
86
Definition Game
87
Definition Game
88
Definition Game
89
Example Game
90
Martin Benjamin
The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space