wikimedia presentation data mining meetup pub

Upload: howiewiki

Posted on 08-May-2015

DESCRIPTION

Presentation given at the SF Data Mining meetup in November 2010

TRANSCRIPT

Page 1

data and

Page 2

Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s our commitment.

Jimmy Wales, Founder of Wikipedia

Page 3

is: Bigger than you think

Smaller than you think

Page 4

477,000,000

Readers every month

Page 5

272

Number of Wikipedia Language Versions

Page 6

The English Wikipedia: 10 years of data (as of September 2011)

3,754,533 articles

3,806,293 people have edited

293,893,801 total edits

2,337,355,406 words (estimated) = 9+ million pages!
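The "9+ million pages" figure can be checked with simple arithmetic. The sketch below assumes roughly 250 words per printed page; that number is our assumption, since the slide does not say which conversion it used. It also divides total edits by the article count as a rough ratio, which overstates per-article activity if the edit total includes non-article pages.

```python
# Figures copied from the slide (English Wikipedia, as of September 2011).
ARTICLES = 3_754_533
TOTAL_EDITS = 293_893_801
WORDS_ESTIMATED = 2_337_355_406

# Assumption: about 250 words per printed page (not stated on the slide).
WORDS_PER_PAGE = 250


def printed_pages(words: int, words_per_page: int = WORDS_PER_PAGE) -> int:
    """Convert a word count to printed pages, rounding up to whole pages."""
    return -(-words // words_per_page)  # integer ceiling division


def edits_per_article(edits: int = TOTAL_EDITS, articles: int = ARTICLES) -> float:
    """Rough ratio of total edits to articles (edits may include non-article pages)."""
    return edits / articles


if __name__ == "__main__":
    print(f"{printed_pages(WORDS_ESTIMATED):,} printed pages")  # 9,349,422 printed pages
    print(f"{edits_per_article():.1f} edits per article")      # 78.3 edits per article
```

At 250 words per page the estimate comes to about 9.35 million pages, consistent with the slide's "9+ million".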

Page 7

User Funnel: English Wikipedia per month

200-300M readers

35,000 active editors

3,500 very active editors (~80% of edits)

91% male

College educated

Average age: 32

Predominantly from North America and Western Europe
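The funnel tiers can be computed from one month of per-editor edit counts. This sketch assumes the thresholds commonly used in Wikimedia's statistics (at least 5 edits in the month for an active editor, at least 100 for a very active one); the slide does not state its cutoffs, and the function name and sample data are hypothetical.

```python
# Assumed thresholds (the usual Wikistats convention; the slide does not state them).
ACTIVE_MIN_EDITS = 5
VERY_ACTIVE_MIN_EDITS = 100


def funnel(monthly_edits: dict[str, int]) -> dict[str, float]:
    """Summarise one month of per-editor edit counts as funnel tiers.

    Tiers are nested: every very active editor is also counted as active.
    """
    counts = list(monthly_edits.values())
    total_edits = sum(counts)
    very_active_edits = sum(n for n in counts if n >= VERY_ACTIVE_MIN_EDITS)
    return {
        "editors": len(counts),
        "active": sum(1 for n in counts if n >= ACTIVE_MIN_EDITS),
        "very_active": sum(1 for n in counts if n >= VERY_ACTIVE_MIN_EDITS),
        # Share of the month's edits made by very active editors.
        "very_active_edit_share": very_active_edits / total_edits if total_edits else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical usernames and counts, for illustration only.
    sample = {"A": 1, "B": 3, "C": 7, "D": 12, "E": 150, "F": 400}
    print(funnel(sample))
```

With the made-up sample, 2 of the 6 editors are very active but make 550 of the 573 edits (about 96%), the kind of concentration the slide's "~80% of edits" describes.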

Page 8

Page 9

Most Edited Wikipedia Article?

George W. Bush

Page 10

Most Edited Pages

Total Edits   Total Unique Editors   Article
43,648        13,783                 George W. Bush
33,534        4,306                  Barack Obama (discussion)
30,567        3,817                  List of World Wrestling Entertainment employees
27,433        8,242                  United States
25,308        2,609                  Global warming (discussion)
25,224        1,821                  Sarah Palin (discussion)
23,241        5,672                  Michael Jackson
21,768        5,933                  Jesus
21,501        4,647                  George W. Bush (discussion)
21,343        753                    Gaza War (discussion)
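Dividing total edits by unique editors is one rough way to see how concentrated the activity on each page was. The figures below are copied from the Most Edited Pages table; the ratio itself is a derived measure of our own, not something the slide reports.

```python
# (total edits, total unique editors, page) copied from the "Most Edited Pages" table.
MOST_EDITED = [
    (43_648, 13_783, "George W. Bush"),
    (33_534, 4_306, "Barack Obama (discussion)"),
    (30_567, 3_817, "List of World Wrestling Entertainment employees"),
    (27_433, 8_242, "United States"),
    (25_308, 2_609, "Global warming (discussion)"),
    (25_224, 1_821, "Sarah Palin (discussion)"),
    (23_241, 5_672, "Michael Jackson"),
    (21_768, 5_933, "Jesus"),
    (21_501, 4_647, "George W. Bush (discussion)"),
    (21_343, 753, "Gaza War (discussion)"),
]


def edits_per_editor(rows: list[tuple[int, int, str]]) -> list[tuple[str, float]]:
    """Return (page, edits per unique editor) pairs, highest ratio first."""
    ratios = [(page, edits / editors) for edits, editors, page in rows]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for page, ratio in edits_per_editor(MOST_EDITED):
        print(f"{ratio:5.1f}  {page}")
```

By this measure the Gaza War discussion page stands out at roughly 28 edits per editor, while the George W. Bush article, the most edited page overall, has the lowest ratio in the list at about 3.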

In the month surrounding the release of An Inconvenient Truth:

116 people edited >1

32 people edited >5

Page 11

Page 12

Page 13

Why do editors leave Wikipedia?

Page 14

70% of new users receive their first message from a bot

Page 15

How we use data

Past

Descriptive analysis

•Why do people edit?

•Why do they stop?

•How can we make them stay longer?

•What types of social interactions correlate with longevity?
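Answering "Why do they stop?" first requires an operational definition of stopping. One simple option, assumed here and not taken from the talk, is to treat an editor as gone once their last edit is more than a fixed number of days old. `INACTIVITY_DAYS` and `departure_status` are hypothetical names, and the 90-day cutoff is arbitrary.

```python
from datetime import date, timedelta

# Hypothetical cutoff: this many days without an edit counts as having left.
INACTIVITY_DAYS = 90


def departure_status(edit_dates: list[date], as_of: date,
                     inactivity_days: int = INACTIVITY_DAYS) -> tuple[bool, int]:
    """Return (has_left, tenure_days) for one editor.

    has_left is True when the last edit is more than `inactivity_days` before `as_of`;
    tenure_days is the span from the first to the last observed edit.
    """
    if not edit_dates:
        raise ValueError("editor has no recorded edits")
    first, last = min(edit_dates), max(edit_dates)
    has_left = (as_of - last) > timedelta(days=inactivity_days)
    return has_left, (last - first).days


if __name__ == "__main__":
    edits = [date(2011, 1, 1), date(2011, 1, 20), date(2011, 3, 1)]
    print(departure_status(edits, as_of=date(2011, 9, 1)))  # (True, 59)
```

The cutoff is a trade-off: a short window labels editors who are merely on a break as departed, while a long one delays any analysis of recent cohorts.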

Present

Experimentation

•How can we create on-ramps into editing?

•How can we improve interactions between new and experienced editors?

•How can we acculturate new editors more effectively?
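A common way to evaluate an on-ramp experiment is to compare the share of new editors still editing after some period in a control group and a variant group with a two-proportion z-test. This is a generic statistical technique offered as an illustration, not necessarily what the Foundation used, and the counts in the example are hypothetical.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). The standard error uses the pooled proportion,
    i.e. it assumes no difference under the null hypothesis.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both groups all-success or all-failure: no detectable difference
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical: 120 of 1,000 control users vs 150 of 1,000 variant users still editing.
    z, p = two_proportion_z_test(120, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

These made-up counts (12% vs. 15% retained) give z of about 1.96, right at the conventional 0.05 significance threshold.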

Future

Predictive modeling

•How can we predict whether someone will be an active editor?

•How can we predict when an editor is going to leave?
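Predicting whether someone will become an active editor can be framed as binary classification. The sketch fits a plain logistic regression by batch gradient descent using only the standard library. The single feature (edits in a new account's first week) and the training rows are hypothetical placeholders; a real model would need real features, a maintained ML library, and evaluation on held-out data.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_proba(w: list[float], row: list[float]) -> float:
    """Estimated probability that the editor becomes active (w[0] is the bias)."""
    return sigmoid(w[0] + sum(w_j * x_j for w_j, x_j in zip(w[1:], row)))


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.05, epochs: int = 5000) -> list[float]:
    """Fit logistic-regression weights (bias first) by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label  # derivative of log loss w.r.t. the logit
            grad[0] += err
            for j, x_j in enumerate(row):
                grad[j + 1] += err * x_j
        w = [w_j - lr * g / n for w_j, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical training rows: [edits in first week] -> became active (1) or not (0).
    X = [[0], [1], [2], [3], [8], [10], [12], [15]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    w = fit_logistic(X, y)
    for edits in (0, 5, 15):
        print(edits, round(predict_proba(w, [edits]), 3))
```

The small learning rate keeps plain gradient descent stable with unscaled counts; standardizing features would allow a larger step.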

Page 16

Get Involved!

Our data is open:

•http://stats.wikimedia.org/ (Excel)

•http://toolserver.org/ (queries)

•http://dumps.wikimedia.org/ (XML dumps, advanced)

• https://github.com/whym/wikihadoop
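The XML dumps are large, so they are usually processed as a stream rather than loaded whole. A minimal sketch with Python's standard `xml.etree.ElementTree.iterparse` counts revisions per page title. It assumes the MediaWiki export layout, in which each `<page>` holds a `<title>` followed by one or more `<revision>` elements inside a versioned XML namespace; the inline sample is made up, and `revisions_per_page` is a hypothetical helper. For a real compressed dump you could pass a binary file object such as `bz2.open(path)`.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def revisions_per_page(source) -> Counter:
    """Stream a MediaWiki XML export and count <revision> elements per page title.

    `source` is a filename or a binary file object.
    """
    counts: Counter = Counter()
    title = None
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            counts[title] += 1
            elem.clear()  # drop the revision's children (text, contributor, ...) once counted
        elif name == "page":
            elem.clear()  # release the finished page's remaining children
            title = None
    return counts


# Made-up miniature export for illustration; real dumps are far larger.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example A</title>
    <revision><id>1</id><text>first</text></revision>
    <revision><id>2</id><text>second</text></revision>
  </page>
  <page>
    <title>Example B</title>
    <revision><id>3</id><text>only</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(revisions_per_page(io.BytesIO(SAMPLE)))
```

Clearing elements as they finish keeps memory roughly flat per page, which is what makes a single-machine pass over a full dump feasible.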

Research hub: http://meta.wikimedia.org/wiki/Research

Survey: http://bit.ly/WikimediaData

Work with the Foundation!
