A new research agenda for Wikimedia – Big Dive 2015
![Page 1: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/1.jpg)
The sum of all human knowledge in the age of machines
A new research agenda for Wikimedia
Dario Taraborelli • Wikimedia Foundation
Big Dive, 16 June 2015
![Page 2: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/2.jpg)
Non-profit running Wikipedia and sister projects
Mission: support the creation and dissemination of collaboratively produced free knowledge.
250+ employees, mostly based in San Francisco
6th most popular web property on the planet by traffic
![Page 3: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/3.jpg)
35M articles in 288 languages
26M media files
60M triples
![Page 4: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/4.jpg)
A conversation
![Page 5: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/5.jpg)
Academic research on Wikipedia
rise and decline of the editor population
gender gap and content biases
contributor motivation
asymmetries in content and provenance of contributions
socio-technical systems governing quality control.
![Page 6: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/6.jpg)
Wikipedia’s rise and decline
https://meta.wikimedia.org/wiki/Research:The_Rise_and_Decline
![Page 7: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/7.jpg)
Human curated knowledge in the age of machines
![Page 8: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/8.jpg)
the long-form encyclopedia
![Page 9: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/9.jpg)
Outline
1. sourcing information
2. consuming information
3. distributing content
A new research agenda
Distributed innovation: how we work
![Page 10: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/10.jpg)
1. Sourcing information
![Page 11: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/11.jpg)
Goats
![Page 12: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/12.jpg)
https://en.wikipedia.org/wiki/Goat#Life_expectancy
![Page 13: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/13.jpg)
![Page 14: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/14.jpg)
![Page 15: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/15.jpg)
https://www.wikidata.org/wiki/Q42
![Page 16: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/16.jpg)
https://tools.wmflabs.org/wikidata-todo/stats.php
85%
![Page 17: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/17.jpg)
1. Sourcing information
● What’s the role of humans in sourcing and verifying information when answers to most questions are readily available from search engines?
● Should Wikipedia start integrating algorithmically extracted sources in its contents?
● Should Wikipedia further invest in supporting human generated citations?
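One way to make the sourcing questions above measurable is to count how many statements in a structured-data item carry at least one reference. The sketch below assumes an entity shaped like Wikidata's JSON export (a `claims` mapping from property IDs to lists of statements, each with an optional `references` list). The sample entity and the function name `reference_coverage` are made up for illustration and do not describe any real item.

```python
def reference_coverage(entity: dict) -> float:
    """Return the fraction of statements that carry at least one reference."""
    total = 0
    sourced = 0
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            total += 1
            # An empty or missing "references" list counts as unsourced.
            if statement.get("references"):
                sourced += 1
    return sourced / total if total else 0.0


# Made-up sample shaped like a Wikidata entity; not real data for any item.
sample_entity = {
    "id": "Q0",
    "claims": {
        "P31": [{"references": [{"snaks": {}}]}],
        "P569": [{}],
        "P19": [{"references": []}, {"references": [{"snaks": {}}]}],
    },
}

coverage = reference_coverage(sample_entity)
print(f"{coverage:.0%} of statements have a reference")
```

The same count could be compared across human-added and algorithmically extracted statements, provided the two can be told apart in the data.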
![Page 18: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/18.jpg)
2. Consuming information
![Page 19: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/19.jpg)
O. Keyes (2015) The Mobile Singularity is already here. Wikipedia and the Mobile Web
![Page 20: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/20.jpg)
Bite-sized consumption
![Page 21: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/21.jpg)
Structured contributions
![Page 22: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/22.jpg)
Manipulating fragments
![Page 23: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/23.jpg)
Decoupling the article
[Diagram: a long-form article decoupled into long-form text, fragments, media, structured data, references, and geocoordinates]
![Page 24: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/24.jpg)
2. Consuming information
● Can we transform Wikipedia contents to make them suitable for bite-sized consumption?
● How to accelerate extraction of structured data from Wikipedia and its use in Wikidata?
● How to design effective lightweight contribution funnels around structured data and content fragments?
● How to support programmatic manipulation of content fragments?
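As a rough illustration of what programmatic manipulation of content fragments could start from, the sketch below splits raw wikitext into (heading, body) fragments using the `== Heading ==` convention. It is a simplification under stated assumptions: it ignores templates, comments, and `<nowiki>` blocks, and the function name `split_sections` and the sample text are hypothetical.

```python
import re

# Matches wikitext headings such as "== Title ==" or "=== Subtitle ===".
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(wikitext: str) -> list:
    """Split wikitext into (heading, body) fragments; the lead gets heading ''."""
    fragments = []
    heading = ""
    start = 0
    for match in HEADING.finditer(wikitext):
        # Close the fragment that ends where this heading begins.
        fragments.append((heading, wikitext[start:match.start()].strip()))
        heading = match.group(2)
        start = match.end()
    fragments.append((heading, wikitext[start:].strip()))
    return fragments


# Placeholder text, not the content of any real article.
sample_wikitext = (
    "Placeholder lead paragraph.\n\n"
    "== Life expectancy ==\n"
    "Placeholder text about lifespan.\n\n"
    "== Diet ==\n"
    "Placeholder text about diet.\n"
)

fragments = split_sections(sample_wikitext)
for title, body in fragments:
    print(repr(title), "->", body)
```

Each fragment can then be served, edited, or recombined independently of the long-form article it came from.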
![Page 25: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/25.jpg)
3. Distributing content
![Page 26: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/26.jpg)
The paradox of reuse
![Page 27: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/27.jpg)
Routing attention
Women in Science
Wikipedia needs your help
The English Wikipedia article Women in Science needs contributors from a more global perspective. Help expand it!
![Page 28: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/28.jpg)
Routing attention
![Page 29: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/29.jpg)
Routing attention
![Page 30: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/30.jpg)
3. Distributing content
● How can we design content distribution systems that do not intermediate Wikipedia?
● How do we leverage content syndication to route (expert) attention to the source?
![Page 31: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/31.jpg)
A new research agenda
Designing and evaluating systems to:
1. preserve and increase transparent sourcing of information
2. break down long-form articles into their constituents
3. optimize content consumption, as a function of access
4. enable lightweight contribution/manipulation of structured data / fragments
5. leverage content distributed / syndicated by 3rd parties
6. prioritize work and route contributors to the site, as a function of demand
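Item 6 can be sketched as a simple ranking problem. The example below assumes two inputs per article, a demand signal (monthly views) and a quality score between 0 and 1, and combines them with an assumed heuristic: demand multiplied by the quality gap. The scoring rule, the `Article` class, and all numbers are illustrative assumptions, not a method described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    monthly_views: int  # demand signal (illustrative numbers only)
    quality: float      # hypothetical score in [0, 1]; 1.0 = highest quality


def priority(article: Article) -> float:
    """Assumed heuristic: high demand combined with low quality ranks first."""
    return article.monthly_views * (1.0 - article.quality)


def rank_worklist(articles: list) -> list:
    """Return articles sorted from highest to lowest priority."""
    return sorted(articles, key=priority, reverse=True)


articles = [
    Article("Article A", 50_000, 0.9),
    Article("Article B", 20_000, 0.2),
    Article("Article C", 1_000, 0.1),
]

worklist = rank_worklist(articles)
for article in worklist:
    print(article.title, round(priority(article)))
```

A heavily read but well-developed article ("Article A") ends up below a moderately read, poorly developed one ("Article B"), which is the behaviour a demand-driven worklist would want.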
![Page 32: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/32.jpg)
Distributed innovation: how we work
![Page 33: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/33.jpg)
Open knowledge curation ecosystem
Humans
Cyborgs
Machines
![Page 34: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/34.jpg)
Wikimedia Research as a platform
Wikimedia Research & Data team
Edit/article quality classifiers
Automated link recommendations
Article creation recommendations
Fundraiser testing and optimization
![Page 35: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/35.jpg)
Scaling Wikimedia Research
1:100,000,000
Approximate ratio of full-time data scientists at WMF to monthly unique visitors
![Page 36: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/36.jpg)
Formal collaborations
Stanford University
GroupLens, University of Minnesota
Oxford Internet Institute
Los Alamos National Laboratory
https://wikimediafoundation.org/wiki/Open_access_policy
![Page 37: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/37.jpg)
Open data
https://meta.wikimedia.org/wiki/Research:Data
![Page 38: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/38.jpg)
Open data: pageviews
http://www.wikipediatrends.com
![Page 39: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/39.jpg)
Open data: clickstream
Wulczyn, E; Taraborelli, D (2015): Wikipedia Clickstream. http://dx.doi.org/10.6084/m9.figshare.1305770
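A minimal sketch of how a clickstream file of this kind could be explored: it aggregates the top referrers for one article from tab-separated rows. The column names `prev_title`, `curr_title`, and `n` are assumptions about the file layout and should be checked against the dataset's own documentation; the rows below are invented sample data, not values from the dataset.

```python
import csv
import io
from collections import Counter

# Invented sample rows; column names are assumed, not confirmed by the source.
SAMPLE_TSV = (
    "prev_title\tcurr_title\tn\n"
    "other-google\tExample_article\t120\n"
    "Related_article\tExample_article\t45\n"
    "other-google\tAnother_article\t30\n"
    "Main_Page\tExample_article\t15\n"
)


def top_referrers(tsv_file, target: str, k: int = 3) -> list:
    """Return the k most frequent referrers leading to the target article."""
    counts = Counter()
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["curr_title"] == target:
            counts[row["prev_title"]] += int(row["n"])
    return counts.most_common(k)


referrers = top_referrers(io.StringIO(SAMPLE_TSV), "Example_article")
for source, clicks in referrers:
    print(source, clicks)
```

For a real file, `io.StringIO(SAMPLE_TSV)` would be replaced by an open file handle; streaming row by row keeps memory use flat even for large dumps.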
![Page 40: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/40.jpg)
Open data: tuples
https://www.wikidata.org/wiki/Wikidata:Data_access
http://tools.wmflabs.org/wikidata-todo/tempo_spatial_display.html
![Page 41: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/41.jpg)
Open data: real-time changes
https://wikitech.wikimedia.org/wiki/RCStream
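A consumer of a real-time change feed typically filters events before doing anything with them. The sketch below shows only that filtering step, on hand-written events, without connecting to any service. The field names (`wiki`, `type`, `bot`, `title`) are assumptions about the event payload and should be verified against the stream's documentation; `is_human_edit` is a hypothetical helper.

```python
def is_human_edit(event: dict, wiki: str = "enwiki") -> bool:
    """Keep only non-bot edit events on the given wiki (assumed field names)."""
    return (
        event.get("wiki") == wiki
        and event.get("type") == "edit"
        and not event.get("bot", False)
    )


# Hand-written events for illustration; not captured from a live stream.
sample_events = [
    {"wiki": "enwiki", "type": "edit", "bot": False, "title": "Page one"},
    {"wiki": "enwiki", "type": "edit", "bot": True, "title": "Page two"},
    {"wiki": "dewiki", "type": "edit", "bot": False, "title": "Seite drei"},
    {"wiki": "enwiki", "type": "log", "bot": False, "title": "Page four"},
]

human_titles = [e["title"] for e in sample_events if is_human_edit(e)]
print(human_titles)
```

In a live setting the same predicate would be applied to each event as it arrives, so the client code stays independent of the transport used to deliver the stream.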
![Page 42: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/42.jpg)
Conclusions
![Page 44: A new research agenda for Wikimedia – Big Dive 2015](https://reader034.vdocuments.us/reader034/viewer/2022042716/55cefe77bb61ebaf438b480b/html5/thumbnails/44.jpg)
Image credits
Election Night Crowd, Wellington, 1931. https://www.flickr.com/photos/nationallibrarynz_commons/3326203787 (CC0)
King Billy of Dalkey Island. https://www.flickr.com/photos/paulodonnell/5937678226 (CC BY)
Secretary at typewriter, 1912. https://www.flickr.com/photos/muohio_digital_collections/3192197470 (CC0)
"Getting em up" at U.S. Naval Training Camp, Seattle, Washington, ca. 1917 - ca. 1918. https://www.flickr.com/photos/usnationalarchives/5505933145 (CC0)