Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research
Posted on 14-Sep-2014
The keynote talk at CrowdKDD 2012: http://www.cse.ust.hk/~nliu/crowdkdd12/
Crowdsourcing beyond …
Kuan-Ta Chen
Institute of Information Science, Academia Sinica
Building Crowdmining Services for Your Own Research
CrowdKDD’12 Aug 12, 2012
What I’m going to talk about
Crowdsourcing?
Crowdsourcing + Data Mining Research?
Common Fallacies of CS4DM (Crowdsourcing for Data Mining) Research
Pomics: A Crowdmining Service
Conclusion
CrowdKDD’12: Crowdsourcing beyond Mechanical Turk / Kuan-Ta Chen 3
Crowdsourcing = Crowd + Outsourcing
“soliciting solutions via open calls to large-scale communities”
A more formal definition
“Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” [1]
[1] Jeff Howe, “Crowdsourcing: A Definition,” http://crowdsourcing.typepad.com/
General Questions
Reward: points on Yahoo! Answers
When crowdsourcing meets data mining…
Crowdsourcing ∩ Data mining: what’s in here?
Crowdsourcing for Data Mining: Issues
Purposes: annotation (ground-truth generation), evaluation, retrieval, human-in-the-loop computation
Methodologies: recruiting, incentives, task design, workflow, learning from the crowd, quality control, cheat detection
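Of the methodology items above, quality control and cheat detection are the easiest to make concrete. The following Python sketch shows one common pattern, offered as an illustration rather than a method from the talk: score each worker on a few gold (known-answer) questions, discard workers below an accuracy threshold as suspected cheaters, and majority-vote the remaining labels. The (worker, item, label) input format, the function names and the 0.7 threshold are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical input format: (worker_id, item_id, label) triples.
Answer = tuple[str, str, str]


def gold_accuracy(answers: list[Answer], gold: dict[str, str]) -> dict[str, float]:
    """Fraction of gold (known-answer) items each worker labelled correctly."""
    hits: dict[str, int] = defaultdict(int)
    seen: dict[str, int] = defaultdict(int)
    for worker, item, label in answers:
        if item in gold:
            seen[worker] += 1
            if label == gold[item]:
                hits[worker] += 1
    return {w: hits[w] / seen[w] for w in seen}


def aggregate(answers: list[Answer], gold: dict[str, str],
              min_accuracy: float = 0.7) -> dict[str, str]:
    """Drop suspected cheaters, then take a majority vote per item.

    Workers who answered no gold items are kept (an assumption; a stricter
    policy could exclude them instead).
    """
    accuracy = gold_accuracy(answers, gold)
    votes: dict[str, Counter] = defaultdict(Counter)
    for worker, item, label in answers:
        if item in gold:
            continue  # gold items only serve to check workers
        if accuracy.get(worker, 1.0) < min_accuracy:
            continue  # below threshold: treat as a suspected cheater
        votes[item][label] += 1
    # most_common(1) returns the top label; ties go to the label seen first
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}
```

For example, if two workers disagree on an item and one of them failed a gold question, the filter removes that worker's vote before the tie is broken arbitrarily. Majority vote is the simplest aggregator; weighting votes by estimated worker accuracy is a common refinement.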
Crowdsourcing Uses in Data Mining Research
Image Semantics
Reward: 0.04 USD/task
Main theme? Key objects? Unique attributes?
Reward: 0.02 USD/task
Find photos of revolvers!
Photo Orientation
Reward: 0.01 USD/task
Perspectives for 3D Objects
Thi Phuong Nghiem, Axel Carlier, Geraldine Morin, and Vincent Charvillat, "Enhancing online 3D products through crowdsourcing," ACM CrowdMM'12.
Web Site Classifier
Reward: 12 USD/hour
Panos Ipeirotis, “Crowdsourcing using Mechanical Turk: Quality Management and Scalability,” Invited Talk at CSDM 2011.
Photographers’ Intentions
To support a task? To capture a bad feeling? To preserve a good feeling? To recall later on? To publish it online? To show it to friends and family?
Mathias Lux, Mario Taschwer, and Oge Marques, “A Closer Look at Photographers’ Intentions: a Test Dataset,” ACM CrowdMM’12.
Linguistic Affective Judgement
Affective response (Snow et al. 2008)
Cost: 0.40 USD to label 20 headlines (140 labels)
Example headline: “Closing and cancellations top advice on flu outbreak”
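The Snow et al. figure on this slide can be turned into a per-label rate and a rough budget. The arithmetic below uses only the numbers quoted on the slide (0.40 USD, 20 headlines, 140 labels); the `projected_cost` helper is hypothetical and assumes the per-label price and labels per headline stay constant as the dataset grows. `Fraction` keeps the money arithmetic exact.

```python
from fractions import Fraction

# Figures quoted on the slide (Snow et al. 2008):
# 0.40 USD bought 140 labels covering 20 headlines.
TOTAL_USD = Fraction("0.40")
HEADLINES = 20
LABELS = 140

labels_per_headline = LABELS // HEADLINES  # 7
usd_per_label = TOTAL_USD / LABELS         # 1/350 USD, about 0.0029
usd_per_headline = TOTAL_USD / HEADLINES   # 1/50 USD = 0.02


def projected_cost(n_headlines: int) -> Fraction:
    """Hypothetical budget for n headlines, assuming the same number of
    labels per headline and the same per-label price."""
    return n_headlines * labels_per_headline * usd_per_label


if __name__ == "__main__":
    print(labels_per_headline, float(usd_per_label), float(projected_cost(1000)))
```

At that rate, a 1,000-headline set would cost 20 USD, which illustrates why such annotation tasks are attractive on paid platforms.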
A Lot More Examples
Document relevance evaluation: Alonso et al. (2008)
Document rating collection: Kittur et al. (2008)
Noun compound paraphrasing: Nakov (2008)
Person name resolution: Su et al. (2007)
And so on...
THE COMMON FALLACIES -- EXPERIENCES FROM CROWDMM’12
Thanks to the CrowdMM’12 co-organizers Wei-Tsang Ooi, Martha Larson, and Wei-Ta Chu, and to Paul Bennett and Matt Lease, co-guest editors of the “Crowdsourcing for Multimedia” special issue.
Common Fallacies #1
Crowdsourcing is NOT JUST conducting user studies
The crowd cannot be controlled, and its tasks are performed in uncontrolled conditions.
How to manage the crowd?
Common Fallacies #2
Crowdsourcing is NOT JUST analyzing user-generated content
Cope with the noise in UGC, not only with the information it carries.
How to manage the imperfection and diversity of UGC?
Common Fallacies #2
Crowdsourcing is NOT JUST analyzing user-generated content
Put the task element in the loop
Repurpose the creation of UGC as your own microtasks
Common Fallacies #3
Crowdsourcing is NOT JUST posting tasks on Mechanical Turk
Explicit crowdsourcing
Implicit crowdsourcing
Piggyback crowdsourcing
Doan et al., “Crowdsourcing Systems on the World-Wide Web,” CACM, vol. 54, no. 4, 2011.
The Era of Too Many Photos
With the prevalence of digital cameras, people today use pictures to record their daily experiences.
3 Common Ways
Photo browsing, photo/video slideshow, illustrated text
Media Comparison
Medium            Creation Cost  Viewer Req.  Viewer Control  Richness  Portability
Photo browsing    Low            Low          High            Low       Low
Slideshow         Medium         Low          Low             Medium    Low
Illustrated text  High           High         High            High      High
Comic             High           Low          High            High      High
How can the creation cost of comics be lowered?
Computer-Aided Storytelling
Picture: location/timing analysis, aesthetics analysis, semantics analysis
User preference: own rating, popularity
Auto storytelling: automated adjustment, machine learning
Draft story → User editing → Final story
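The storytelling pipeline above can be sketched as a scoring-and-selection loop. This Python sketch is a simplification under stated assumptions, not the Pomics implementation: each photo carries pre-computed scores (the analysis stages themselves are out of scope), a fixed hypothetical weight vector stands in for the machine-learned adjustment, the draft keeps the top-k photos in chronological order, and user editing is modelled as removing or adding photos. `Photo`, `WEIGHTS`, `draft_story` and `apply_user_edits` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    photo_id: str
    taken_at: float    # capture timestamp, used for chronological order
    aesthetics: float  # 0..1 score from aesthetics analysis (hypothetical scale)
    own_rating: float  # 0..1, the author's own rating
    popularity: float  # 0..1, e.g. normalised likes (assumption)


# Hypothetical fixed weights standing in for the machine-learned adjustment.
WEIGHTS = {"aesthetics": 0.5, "own_rating": 0.3, "popularity": 0.2}


def score(p: Photo) -> float:
    """Weighted sum of picture-analysis and user-preference signals."""
    return (WEIGHTS["aesthetics"] * p.aesthetics
            + WEIGHTS["own_rating"] * p.own_rating
            + WEIGHTS["popularity"] * p.popularity)


def draft_story(photos: list[Photo], k: int) -> list[Photo]:
    """Keep the k highest-scoring photos, then restore chronological order."""
    best = sorted(photos, key=score, reverse=True)[:k]
    return sorted(best, key=lambda p: p.taken_at)


def apply_user_edits(draft: list[Photo], removed: tuple[str, ...] = (),
                     added: tuple[Photo, ...] = ()) -> list[Photo]:
    """Final story: the draft minus photos the user removed, plus photos
    the user added, kept in chronological order."""
    removed_ids = set(removed)
    kept = [p for p in draft if p.photo_id not in removed_ids]
    return sorted(kept + list(added), key=lambda p: p.taken_at)
```

In a fuller system, the user's edits would feed back into the weights; here they only shape the final story.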
Technical Challenges #1
Semantics analysis: human recognition, emotion recognition, behavior recognition, object recognition, location identification, natural language processing
Aesthetics analysis: exposure, composition
Timing analysis, contextual analysis
Technical Challenges #2
Automatic storytelling: significant photo selection, pagination and page layout, narrative design
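Paginating is the most mechanical of these challenges. As an illustration only (the talk does not describe how Pomics paginates), the Python sketch below greedily fills pages with chronologically sorted photos and starts a new page when the current page is full or when the time gap to the previous photo suggests a scene change. The `paginate` function, the four-photos-per-page limit and the one-hour gap are hypothetical.

```python
def paginate(timestamps: list[float], max_per_page: int = 4,
             gap_seconds: float = 3600.0) -> list[list[int]]:
    """Group chronologically sorted photo timestamps into pages.

    A new page starts when the current page is full or when the gap since
    the previous photo exceeds gap_seconds (a crude scene-change heuristic).
    Returns pages as lists of indices into `timestamps`.
    """
    if any(b < a for a, b in zip(timestamps, timestamps[1:])):
        raise ValueError("timestamps must be sorted")
    pages: list[list[int]] = []
    current: list[int] = []
    for i, t in enumerate(timestamps):
        page_full = len(current) >= max_per_page
        scene_break = bool(current) and t - timestamps[current[-1]] > gap_seconds
        if current and (page_full or scene_break):
            pages.append(current)
            current = []
        current.append(i)
    if current:
        pages.append(current)
    return pages
```

A greedy pass keeps each page within one time cluster where possible; layout within a page (frame sizes, balloon placement) is a separate problem not covered here.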
Pomics as a Social Service
Web albums, web resources, publish & share
HOW IS THIS RELATED TO CROWDSOURCING?
USERS ARE IMPLICITLY DOING IMAGE ANNOTATION AND EVALUATION
What pictures are used?
Why were these 3 pictures used?
Aesthetics information
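The idea that authors implicitly rate pictures by choosing them can be sketched as a selection rate: how often a picture is used in comics built from albums that contain it. The snippet below is a hypothetical illustration, not the talk's actual measure. The input format and the `selection_rates` name are assumed, and the add-one/add-two smoothing (a Laplace-style prior) simply keeps pictures offered only once from receiving extreme scores.

```python
from collections import Counter


def selection_rates(albums: dict[str, list[str]],
                    comics: list[tuple[str, list[str]]],
                    prior_used: int = 1, prior_offered: int = 2) -> dict[str, float]:
    """Smoothed rate at which each picture is chosen when it is available.

    albums: album_id -> picture ids in that album (hypothetical format)
    comics: (album_id, picture ids the author actually used) per comic
    rate = (times used + prior_used) / (times offered + prior_offered)
    """
    offered: Counter = Counter()
    used: Counter = Counter()
    for album_id, picked in comics:
        offered.update(albums[album_id])  # every album picture was a candidate
        used.update(set(picked))          # count each picture once per comic
    return {pid: (used[pid] + prior_used) / (offered[pid] + prior_offered)
            for pid in offered}
```

Ranking pictures by this rate gives a crude, crowd-derived quality signal; it mixes aesthetics with relevance to each author's story, so it is best treated as a noisy label.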
Wizard Interface
Aesthetics information
The Page Layout
Semantics
Saliency info
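Where an author positions or crops a picture inside a comic frame hints at which region matters. One simple way to aggregate that signal, assumed here for illustration rather than taken from the talk, is to count for each pixel how many crops kept it visible and normalise by the number of crops. The `crop_heatmap` name and the half-open rectangle format are hypothetical.

```python
def crop_heatmap(width: int, height: int,
                 crops: list[tuple[int, int, int, int]]) -> list[list[float]]:
    """Per-pixel fraction of user crops that kept the pixel visible.

    crops: (x0, y0, x1, y1) rectangles in pixel coordinates, half-open,
    i.e. x0 <= x < x1 and y0 <= y < y1 (hypothetical format).
    Returns a height x width grid of values in [0, 1].
    """
    if not crops:
        return [[0.0] * width for _ in range(height)]
    grid = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in crops:
        # Clip each rectangle to the image bounds before counting.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] += 1
    n = len(crops)
    return [[v / n for v in row] for row in grid]
```

Pixels near 1.0 were kept by almost every author and are candidate salient regions; pure-Python loops are fine for a sketch, but an array library would be the practical choice at real image sizes.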
Usage Statistics of Pomics (since July 15, 2012)
352 authors, 434 comic books
4,362 frames, 4,332 images used, 1,057 image annotations, 3,789 text balloons
3,000+ shares on Facebook
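The raw counts become easier to interpret as per-unit averages. The snippet below derives a few ratios from the figures on the slide; the ratio names and the `usage_ratios` helper are illustrative, and averages alone say nothing about how activity is distributed across authors.

```python
# Usage counts reported on the slide (Pomics, since July 15, 2012).
STATS = {
    "authors": 352,
    "books": 434,
    "frames": 4362,
    "images": 4332,
    "annotations": 1057,
    "balloons": 3789,
}


def usage_ratios(s: dict[str, int]) -> dict[str, float]:
    """Per-unit averages derived from the raw counts, rounded to 2 decimals."""
    return {
        "books_per_author": round(s["books"] / s["authors"], 2),
        "frames_per_book": round(s["frames"] / s["books"], 2),
        "balloons_per_frame": round(s["balloons"] / s["frames"], 2),
        "annotations_per_image": round(s["annotations"] / s["images"], 2),
    }


if __name__ == "__main__":
    for name, value in usage_ratios(STATS).items():
        print(f"{name}: {value}")
```

The numbers work out to roughly 10 frames per comic book and a little over one book per author, which gives a sense of how much implicit annotation each book contributes.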
Picture Semantics
Love / Like / Dear, Happy, Sleepy / sleeping, Tears, Wearing a hat, NO!
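One plausible way to obtain semantic labels like these, assumed here rather than stated in the talk, is to mine the words authors write in the text balloons placed over each picture. The sketch below counts non-stopword terms per picture and keeps the most frequent ones. The (picture_id, text) input format, the `picture_tags` name and the tiny stopword list are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "i", "am", "is", "so", "my", "to"}


def picture_tags(balloons: list[tuple[str, str]], top_n: int = 3) -> dict[str, list[str]]:
    """Most frequent non-stopword terms in the balloons attached to each picture.

    balloons: (picture_id, balloon_text) pairs (hypothetical format).
    Ties keep the order in which terms were first seen.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for pid, text in balloons:
        words = re.findall(r"[a-z']+", text.lower())
        counts[pid].update(w for w in words if w not in STOPWORDS)
    return {pid: [w for w, _ in c.most_common(top_n)] for pid, c in counts.items()}
```

Raw word counts are noisy (sarcasm, dialogue unrelated to the image), so such tags would need the same kind of quality control as paid annotations.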
Can Pomics Do Microtasks?
The answer is YES! Users were asked to create comics from a specific album and were rewarded with a 200 MB quota if their books were “shared” by 20+ Facebook users.
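The incentive rule on this slide is simple enough to state as code. In this sketch, the 200 MB bonus and the 20-share threshold come from the slide; reading “20+” as “at least 20” and granting the bonus once per user when any of their books built from the target album qualifies are assumptions, since the slide does not say whether the reward is per book or per user. `campaign_reward_mb` is a hypothetical name.

```python
QUOTA_BONUS_MB = 200  # reward quoted on the slide
MIN_SHARES = 20       # "20+" on the slide, read here as "at least 20"


def campaign_reward_mb(books: list[tuple[str, int]], target_album: str) -> int:
    """Bonus quota (MB) earned by one user under the microtask campaign.

    books: (album_id, facebook_share_count) for each of the user's comic books.
    Assumes the bonus is granted once per user if any book qualifies.
    """
    qualifies = any(album == target_album and shares >= MIN_SHARES
                    for album, shares in books)
    return QUOTA_BONUS_MB if qualifies else 0
```

Tying the reward to shares rather than to mere submission lets the crowd's own audience act as a first-pass quality filter, though it also rewards popularity over annotation accuracy.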
Picture Aesthetics from Microtasks
Picture Saliency from Microtasks
Crowdmining Services
Advantages: little or no hiring cost once the right incentives are in place; easy to scale up; the game rules can be changed to fit the research.
Disadvantages: high development cost; less flexible; hard to find the right incentives (besides money).
Conclusion
Crowdmining is a promising and exciting area.
Crowdsourcing != Mechanical Turking.
A lot more can be done with crowdmining services.
Build your own crowdmining service today!
CrowdMM 2012
Keynote: Prof. Masataka Goto (AIST, Japan)
11 oral + poster presentations
Annotation, Evaluation, Novel applications
An industrial panel discussion
Welcome to join us!
(in conjunction with ACM Multimedia 2012)
http://crowdmm.org/
Kuan-Ta Chen Academia Sinica
Unleash the power of the crowd!
Thank You!
http://www.iis.sinica.edu.tw/~swc