collaborative multilingual dictionaries

32
20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 1 Collaborative Multilingual Dictionaries Christian M. Meyer Lexicografía multilingüe en rede (MULTILEX). October 20–21, 2015. Santiago de Compostela, Spain.

Upload: carlos-valcarcel-riveiro

Post on 27-Jan-2017

260 views

Category:

Education


0 download

TRANSCRIPT

Page 1: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 1

Collaborative Multilingual Dictionaries Christian M. Meyer

Lexicografía multilingüe en rede (MULTILEX). October 20–21, 2015. Santiago de Compostela, Spain.

Page 2: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 2

Outline

Collaborative Multilingual Dictionaries

Page 3: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 3

Outline

Collaborative Multilingual Dictionaries

Dictionaries created by users

Typically, dictionaries are the product of few expert lexicographers

Page 4: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 4

Outline

Collaborative Multilingual Dictionaries

Dictionaries for > 2 languages

Dictionaries created by users

Most dictionaries cover one or two languages

Page 5: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 5

Outline

Collaborative Multilingual Dictionaries

Dictionaries for > 2 languages

Dictionaries created by users

1. User participation

2. Quality of user contributions

3. Multilingual Structures

Page 6: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 6

Involving users in the dictionary-making process is not new

• Oxford English Dictionary started already in the 19th century (cf. Thier, 2014)

But with the advent of the World Wide Web, user participation became

• possible at a large scale

• easier

• faster

• more diverse

User Participation

Page 7: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 7

1. Direct User Contributions

Creation, modification, and deletion of dictionary articles by users

• Contributions to open-collaborative dictionaries

• Completely built by dictionary users

• No editorial control

• Contributions to collaborative-institutional dictionaries

• Often no modification of existing articles

• Reuse in other publisher-owned products

• Contributions to semi-collaborative dictionaries

• Checked by editorial staff

• Typically no extensive revision

(Abel/Meyer, 2013)

Page 8: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 8

Feedback from users to dictionary makers

• Explicit Feedback

• Suggestions for articles or the dictionary as a whole

• Additions (e.g., quotations, usage examples)

• Corrections or quality flaws

• Implicit Feedback

• Users are often not aware of this type of feedback

• Log file analysis documenting dictionary use

• Integrating external content (e.g., from Flickr)

2. Indirect User Contributions

(Abel/Meyer, 2013)

Page 9: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 9

Exchange beyond the dictionary content

• Exchange between dictionary makers and users

• Blogs, newsletter, social networks, etc.

• Dictionary games and usage guides

• Language consultation services

• Exchange between dictionary users themselves

• Forums

• Discussion pages

• User comments

3. Accessory User Contributions

(Abel/Meyer, 2013)

Page 10: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 10

Types of User Contributions

1. Direct user contributions

2. Indirect user contributions

3. Accessory user contributions

Contributions to

• open-collaborative dictionaries

• collaborative-institutional dictionaries

• semi-collaborative dictionaries

• Explicit feedback

• Implicit feedback

• Exchange between dictionary makers and dictionary users

• Exchange among dictionary users

(Abel/Meyer, 2013)

Page 11: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 11

Multilingual Dictionaries

Users can…

• add/suggest translations

• revise (modify/delete) translations

• assume responsibility for a certain language pair, language variety, or domain

• check the quality of entries (collaborative quality assessment)

• comment the dictionary as a whole, give implicit and explicit feedback

• answer language-related questions beyond the dictionary (e.g., in a forum)

• …

Page 12: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 12

Strengths

• Rapid growth

• Language varieties

• Many language (pairs)

• Subjective language intuitions

• Open licenses

• Quality assessment

Weaknesses

• Inconsistency

• Incorrect, unspecific,

old-fashioned descriptions

• Plagiarism

• “Complicated” articles missing

• Corpus evidence

htt

ps:

//p

ixab

ay.c

om

/de/

waa

ge-g

erec

hti

gkei

t-au

sgeg

lich

en-3

10

96

2/

(pu

blic

do

mai

n, C

C0

)

(Meyer, 2013; Meyer/Gurevych, 2014)

Quality of User Contributions

Page 13: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 13

Strengths

• Rapid growth

• Language varieties

• Many language (pairs)

• Subjective language intuitions

• Open licenses

• Quality assessment

Weaknesses

• Inconsistency

• Incorrect, unspecific,

old-fashioned descriptions

• Plagiarism

• “Complicated” articles missing

• Corpus evidence

htt

ps:

//p

ixab

ay.c

om

/de/

waa

ge-g

erec

hti

gkei

t-au

sgeg

lich

en-3

10

96

2/

(pu

blic

do

mai

n, C

C0

)

(Meyer, 2013; Meyer/Gurevych, 2014)

Quality of User Contributions

How to steer user contributions to get the best ones only?

Page 14: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 14

Collaborative Quality Assessment

(Meyer/Gurevych, 2014)

Page 15: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 15

Outline

Collaborative Multilingual Dictionaries

Dictionaries for > 2 languages

Dictionaries created by users

1. User participation

2. Quality of user contributions

3. Multilingual Structures

Page 16: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 16

LEO Dictionary Portal: Example

DE EN

DE ES

LEO , http://dict.leo.org

Page 17: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 17

Bilingual Translation Tables

DE EN

DE ES

Verteidiger | Verteidigerin defender [law]

Verteidiger | Verteidigerin apologist

Verteidiger [Fußball] back [sport.]

Verteidiger | Verteidigerin el defensor | la defensora

Verteidiger | Verteidigerin el apologista | la apologista

Libero [Fußball] el líbero [dep.]

? • Sense distinctions and usage comments? • Translation shifts? • Multilingual queries difficult (e.g., EN ES)

LEO , http://dict.leo.org

Page 18: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 18

Refugee Phrasebook

• Open-collaborative dictionary by the Open Knowledge Foundation

• Targeted at refugees and voluntary helpers

• Phrases from everyday life, medicine&healthcare, and law

• Available as public domain (Creative Commons CC0)

• > 500 phrases in > 30 languages since September 2015

• http://www.refugeephrasebook.de/project/

Page 19: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 19

Refugee Phrasebook: Example

Page 20: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 20

Multilingual Translation Table

Richter القاضي

Visum تأشيرة دخول

judge

visa

El juez / La jueza

Visa / visado

• Multiple translations or variants (e.g., gender)? • Translation shifts? • Sense distinctions and usage comments?

Verteidiger محامي الدفاع counsel

for defence Abogado defensor

Page 21: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 21

Wiktionary: Example

htt

ps:

//d

e.w

ikti

on

ary.

org

/wik

i/V

erte

idig

er

Page 22: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 22

htt

ps:

//d

e.w

ikti

on

ary.

org

/wik

i/V

erte

idig

er

Fixed Source Language

Verteidiger1 защитник

back

• Meaning of the translated word? • Consistency? • Effort?

Verteidiger2

Verteidiger3

Verteidiger4

defence

defensa

abogado

Verteidigung

Wehr

?

Page 23: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 23

Wikipedia: Example

htt

ps:

//d

e.w

ikip

edia

.org

/wik

i/St

rafv

erte

idig

er_(

Deu

tsch

lan

d)

Page 24: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 24

htt

ps:

//d

e.w

ikip

edia

.org

/wik

i/St

rafv

erte

idig

er_(

Deu

tsch

lan

d)

• Translation shift? • Multiple translations?

Müdafi1

Strafverteidiger (Deutschland)1

Защитник в уголовном процессе1

Criminal defense lawyer1 { } Abwehrspieler1

Защитник (футбол)1

Defender (association football)1

Defensa (fútbol)1 { }

Interwiki Links

Defans1

(قدمكرة )مدافع 1

Page 25: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 25

OmegaWiki: Example

htt

p:/

/ww

w.o

meg

awik

i.org

/Exp

ress

ion

:Ver

teid

iger

Page 26: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 26

htt

p:/

/ww

w.o

meg

awik

i.org

/Exp

ress

ion

:Ver

teid

iger

Multilingual Synsets

Verteidiger1

advocate1

• Translation shift? • Missing lexicalizations in a language?

Befürworter1 Vertreter1 partidaria1 { proponent1 } Verteidiger2

защитник1 defender1

Verfechter2 defensor1 defensora1 { supporter1 } partidario1 proponente1

Page 27: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 27

Context-specific Organization?

• Organization types are usually static

• Either defense lawyer is a translation of Verteidiger or not

• Can you imagine more flexible types of organization?

• Users need to solve problems

• Ideally, a dictionary understand the problem and returns ONLY the relevant information

Page 28: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 28

Context-specific Organization?

What some of my colleagues consider a “dictionary”:

• At first sight: no descriptions; we stick with erroneous output…

• But: translations are chosen based on context!

Page 29: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 29

Context-specific Organization?

Deliver dictionary knowledge which exactly fits this (translation) context

Page 30: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 30

Take-Home Message

Diverse forms of user participation

• Direct, indirect, accessory contributions

• Strengths and weaknesses

A variety of multilingual structures

• Bilingual and multilingual tables

• Fixed source language

• Interwiki links and multilingual synsets

• Context-specific organization

• More flexible structures?

• Lexicographers–users cooperation?

htt

ps:

//p

ixab

ay.c

om

/en

/han

d-l

eav

e-p

en-p

aper

-th

ank-

you

-22

63

58

/ (C

C0

Pu

blic

Do

mai

n)

Page 31: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 31

Further Readings

Ch.M. Meyer/A. Abel: User Contribution in the Era of the Internet, chapter 27 in P.A. Fuertes Olivera (Ed.): The Routledge Handbook of Lexicography, London: Routledge, (in preparation).

Ch.M. Meyer, I. Gurevych: Methoden bei kollaborativen Wörterbüchern, Lexicographica 30(1): 187–212, 2014.

Ch.M. Meyer: Wiktionary: The Metalexicographic and the Natural Language Processing Perspective (= tuprints 3654). Dissertation, Darmstadt: Technische Universität Darmstadt 2013. http://tuprints.ulb.tu-darmstadt.de/3654/

A. Abel, Ch.M. Meyer: The dynamics outside the paper: user contributions to online dictionaries, in: Proceedings of the 3rd eLex conference ‘Electronic lexicography in the 21st century: thinking outside the paper’, pp. 179–194, 2013. Tallinn, Estland.

M. Matuschek, Ch.M. Meyer, I. Gurevych: Multilingual Knowledge in Aligned Wiktionary and OmegaWiki for Translation Applications, Translation: Computation, Corpora, Cognition – Special Issue “Language Technology for a Multilingual Europe” 3(1):87–118, 2013.

I. Gurevych, J. Eckle-Kohler, S. Hartmann, M. Matuschek, Ch.M. Meyer, C. Wirth: UBY – A Large-Scale Unified Lexical-Semantic Resource Based on LMF, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 580–590, April 2012. Avignon, France.

Ch.M. Meyer, I. Gurevych: Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography. chapter 13 in S. Granger, M. Paquot (Hrsg.): Electronic Lexicography, pp. 259–291, Oxford: Oxford University Press, November 2012.

Page 32: Collaborative Multilingual Dictionaries

20.10.2015 | Fachbereich Informatik | Ubiquitous Knowledge Processing (UKP) Lab | Christian M. Meyer | 32

Kontakt / Contact Christian M. Meyer Technische Universität Darmstadt Ubiquitous Knowledge Processing Lab Hochschulstr. 10, 64289 Darmstadt, Germany +49 (0)6151 16–5386 +49 (0)6151 16–5455 meyer (at) ukp.informatik.tu-darmstadt.de

Rechtliche Hinweise

Die Folien sind für den persönlichen Gebrauch der Vortragsteilnehmer

gedacht. Im Vortrag verwendete Photographien, Illustrationen, Wort- und

Bildmarken sind Eigentum der jeweiligen Rechteinhaber oder Lizenzgeber. Um

Missverständnisse zu vermeiden, wäre eine kurze Kontaktaufnahme vor

Weitergabe oder -nutzung der Vortragsmaterialien empfehlenswert. Sofern Sie

Ihre Rechte verletzt sehen, bitte ich ebenfalls um Kontaktaufnahme zur

Klärung der Sachlage.

Legal Issues

The slides are intended for personal use by the audience of the talk.

Photographies, illustrations, tradedmarks, or logos are property of the holder of

rights. To avoid any misconceptions, I would strongly recommend to get in

touch before reusing or redistributing the slides or any additional material of the

talk. The same applies if you consider your rights infringed – please let me

know to initiate further clarification.