growing wikipedia across languages via recommendation · 1/30/2017  · recommendation jure...

Post on 25-Aug-2020

8 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Growing Wikipedia Across Languages via

RecommendationJure Leskovec

StanfordEllery Wulczyn

WikimediaBob West

EPFLLeila Zia

Wikimedia

Wikipedia is the biggest encyclopedia in history and the 5th most visited website worldwide.

Given that you speak language X, how much can you learn about the world?

All Wikipedia languages 2M geo-located articles

3

Lausanne

Markus Krötzsch, TU Dresden

German Wikipedia 356K geo-located articles 132M native speakers

4Markus Krötzsch, TU Dresden

Spanish Wikipedia 261k Geo Tagged Items 389M Native speakers:

Spanish Wikipedia 261K geo-located articles 389M native speakers 5Markus Krötzsch, TU Dresden

https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en

Arabic Wikipedia 87K geo-located articles 467M native speakers 6Markus Krötzsch, TU Dresden

Problem: Access to Knowledge

Depending on the language you speak, you can only learn about a small subset

of topics

How do we make access to all knowledge universal?

7

Problem: Access to Knowledge

Depending on the language you speak, you can only learn about a small subset

of topics

How do we democratize access to knowledge?

8

How do we grow language-specific Wikipedias faster?

Goal: Recommend important missing articles to human editorsSystem overview:

Wikipedia: Filling knowledge gaps

9

Find Rank Recommend

Bottomline: 3x growth rate with no loss in article quality

Find Rank Recommend

10

How do we identify missing articles in language X?

Task: Find Missing Articles

● Wikidata: Knowledge base for Wikipedia

● Every article in every language is mapped into a node in a concept graph

11

en:Apple sl:Jabolko

Q32

nl:Appel

Task: Find Missing Articles

Q32

English Wikipedia French Wikipedia

Apple Switzerland SuisseAfrique

Q45

12

ConceptsQ46

Wikipedia articles

Task: Find Missing Articles

Q32 Q46

English Wikipedia French Wikipedia

Apple

Q45

13

Concepts

Wikipedia articles

PommeSwitzerland SuisseAfrique

14

We want editors to create important missing articles. How can we rank articles by importance?

Find Rank Recommend

Ranking Missing Articles

Goal: Given a missing article, predict popularity rank in the target language (once it’s created)

Problem: The article doesn’t exist yet (so we don’t have its features for prediction)

15

Ranking Missing Articles

Approach:- Predict popularity of existing French articles- using features extracted from the

corresponding non-French articlesExample features: pageviewstopicslinks

16

Classification algorithm: Random forests

17

Identify interested human editors who can work on a given important missing article

Find Rank Recommend

Recommending Missing Articles

Q22

Q32

Q34

Q12

Top missing articles

Editors

E1

E2

Affinity(E1, Q34) 18

Bipartite matching between editors and missing items

Computing Affinities: Concept Embedding

Q99

Q67

19

Q67

Q34

Computing Affinities

Q99

Q67

Q34

20

Q67E1

Computing Affinities

Q99

Q67

Q34

21

Q67E1

ϴ

cos(ϴ) = Affinity(E1, Q34)

22

Identify missing concepts using Wikidata knowledge base

Predict language- specific popularity

Match missing articles to editors considering their interest

Method Summary

Find Rank Recommend

Experiments

Large-scale experiment targeting French Wikipedia EditorsSent 12K editors 5 recommendations each by email

Research questions: RQ1: Do recommendations boost article creation rate?RQ2: How much does personalization help?RQ3: How do recommendations affect article quality?

23

Large-Scale In Vivo Experiment

24

Articles EditorsCondition

Personalized recommendation

Random recommendations

No recommendations

Large-Scale In Vivo Experiment

24

Articles EditorsCondition

Personalized recommendation

Random recommendations

No recommendations

RQ1: Boost in Creation rate?

25

Yes! 3x boost in creation rate

Personalized No recs.

Creation rate 1.05% 0.32%

*** p < 0.00001

RQ2: Does personalizing help?

26

Yes! 2x boost in activity

Personalized Not personalized

# Editors 6K 6K

Activation rate 2.3% 1.1%

*** p < 0.00001

RQ3: Impact on article quality?

27

No loss in quality!

Deployment at Wikipedia

28

https://recommend.wmflabs.org

Agriculture in Farsi

29

Climate Change in Malagasy

30

ML in Simple English

31

Conclusions

32

Massive knowledge gaps in Wikipedia

System for recommending important missing articles

3x boost in article creation rate with no loss in article quality

Deployed at Wikipedia!

Thank you!

top related