
Page 1: Crowdsourcing & Human Computationir.ischool.utexas.edu/crowd/mlease-advisory-council-041511.pdf · Crowdsourcing & Human Computation Matt Lease School of Information ... The Turing

Crowdsourcing & Human Computation

Matt Lease

School of Information

University of Texas at Austin

[email protected]


“Amazon Remembers”

April 15, 2011 Matt Lease - [email protected] 2

J. Pontin. Artificial Intelligence, With Help From the Humans. NY Times (March 25, 2007)


Amazon Mechanical Turk (MTurk)

• “Micro-task” crowdsourcing marketplace

• On-demand, scalable, real-time workforce

• Online since 2005 (and still in “beta”)

• Programmer’s API & “Dashboard” GUI


From Outsourcing to Crowdsourcing

http://www.mturk-tracker.com (P. Ipeirotis’10)

From 1/09 to 4/10: 7M HITs from 10K requesters, worth $500,000 USD (a significant underestimate)
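To put the totals above in per-unit terms, here is a small back-of-the-envelope calculation. It uses only the three figures quoted on the slide; the variable names are illustrative, and the averages are lower bounds because the slide describes the dollar total as an underestimate.

```python
# Figures quoted on the slide (mturk-tracker.com, 1/09 to 4/10).
total_hits = 7_000_000       # "7M HITs"
total_requesters = 10_000    # "10K" requesters
total_value_usd = 500_000    # "$500,000 USD", stated to be an underestimate

# Simple averages; the real distributions are likely very skewed.
value_per_hit = total_value_usd / total_hits
hits_per_requester = total_hits / total_requesters
value_per_requester = total_value_usd / total_requesters

print(f"Average value per HIT:       ${value_per_hit:.3f}")
print(f"Average HITs per requester:  {hits_per_requester:.0f}")
print(f"Average spend per requester: ${value_per_requester:.2f}")
```

On these numbers the average HIT is worth roughly seven cents, which is what "micro-task" means in practice.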


Road Map

• Example and Introduction

• Crowdsourcing

• Human Computation

• Rethinking Application Design

• The Road Ahead

• Wrap-up


Crowdsourcing


Crowdsourcing


• Take a job traditionally performed by a known agent (often an employee)

• Outsource it to an undefined, generally large group of people via an open call

• New application of principles from open source movement


Other Crowdsourcing Examples


Human Computation


The Mechanical Turk


The original, constructed and unveiled in 1770 by Wolfgang von Kempelen (1734–1804)


The Turing Test (Alan Turing, 1950)


What is a Computer?


Princeton University Press, 2005

• What was old becomes new

• “Crowdsourcing: A New Branch of Computer Science” (March 29, 2011)


HPU

Davis et al. (2010). The HPU.

Human Computation

• People become ‘computists’ once more

– Do tasks computers cannot (do well)

• 1. Detect robots (Captcha – “reverse Turing test”)

• 2. Micro-tasks and data labeling (at scale)

– Game changer for practical AI, which is starved for labeled data

• 3. Rethink what is possible in application design

– Integrate CPU + HPU = new capabilities
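One common way to read "CPU + HPU" is a pipeline in which an automated step handles the cases it is confident about and hands the rest to a person. The sketch below illustrates that idea only; `hybrid_label`, `toy_classifier`, `toy_worker` and the 0.8 threshold are hypothetical names and values chosen for illustration, not anything from the talk.

```python
from typing import Callable, Tuple


def hybrid_label(
    item: str,
    machine: Callable[[str], Tuple[str, float]],
    human: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (label, source): keep the machine label if confident, else ask a person."""
    label, confidence = machine(item)
    if confidence >= threshold:
        return label, "cpu"
    return human(item), "hpu"


# Hypothetical stand-ins, for illustration only.
def toy_classifier(text: str) -> Tuple[str, float]:
    # Pretend the model is only confident about short inputs.
    label = "spam" if "win" in text else "ham"
    confidence = 0.95 if len(text) < 20 else 0.5
    return label, confidence


def toy_worker(text: str) -> str:
    # In practice this would post a micro-task and wait for a worker's answer.
    return "ham"


messages = ["win cash now", "a long, ambiguous message about winter plans"]
results = [hybrid_label(m, toy_classifier, toy_worker) for m in messages]
print(results)
```

The threshold is the design knob: raising it sends more items to people (slower and costlier, but usually more accurate), lowering it leans on automation.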


Blending Automation & Human Computation


Soylent: A Word Processor with a Crowd Inside

• Bernstein et al., UIST 2010


Translation by monolingual speakers

• C. Hu, CHI 2009


fold.it

• S. Cooper et al. (2010).


Invisible By-product

L. von Ahn et al. (2008). recaptcha… In Science.


CrowdSearch and mCrowd

• T. Yan, MobiSys 2010


Wisdom of Crowds Computing

Pre-conditions

• Diversity

• Independence

• Decentralization

• Aggregation

Input: large, diverse sample (increases likelihood of overall pool quality)

Output: consensus, selection, distribution
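The aggregation step can be as simple as a majority vote over independent answers. The sketch below shows two of the outputs named above, a consensus label and a label distribution; the function names and the toy votes are illustrative assumptions, and real systems often weight workers by estimated reliability instead.

```python
from collections import Counter
from typing import Dict, Hashable, List


def majority_vote(labels: List[Hashable]) -> Hashable:
    """Consensus: the most common label (ties go to the label seen first)."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels: List[Hashable]) -> Dict[Hashable, float]:
    """Distribution: the fraction of votes each label received."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy data: three independent judgments per item.
votes = {
    "img1": ["cat", "cat", "dog"],
    "img2": ["dog", "dog", "dog"],
}
consensus = {item: majority_vote(v) for item, v in votes.items()}
print(consensus)
print(label_distribution(votes["img1"]))
```

Keeping the distribution alongside the consensus preserves how contested an item was, which is useful for deciding where to collect more judgments.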


Unreasonable Effectiveness of Data

• Massive free Web data changed how we train learning systems

– Banko and Brill (2001). Human Language Tech.

– Halevy et al. (2009). IEEE Intelligent Systems.


• How might access to cheap & plentiful labeled data change the balance again?


CrowdForge: MapReduce for Automation + Human Computation
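The slide gives only the title, so the following is a generic sketch of the partition / map / reduce shape that the MapReduce analogy suggests, not CrowdForge's actual interface. Every function name here is hypothetical, and the "workers" are plain Python stand-ins for steps that would really be posted as micro-tasks or run automatically.

```python
from typing import Callable, List


def crowd_map_reduce(
    task: str,
    partition: Callable[[str], List[str]],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    """Split a large task, solve each piece independently, then merge the results."""
    subtasks = partition(task)                 # e.g. a worker drafts an outline
    partials = [map_fn(s) for s in subtasks]   # e.g. one worker per section
    return reduce_fn(partials)                 # e.g. a worker or a script merges them


# Hypothetical stand-ins, for illustration only.
def outline_worker(topic: str) -> List[str]:
    return [f"{topic}: history", f"{topic}: uses"]


def paragraph_worker(heading: str) -> str:
    return f"[paragraph about {heading}]"


def merge_step(paragraphs: List[str]) -> str:
    return "\n".join(paragraphs)  # an automated step in this sketch


article = crowd_map_reduce(
    "Mechanical Turk", outline_worker, paragraph_worker, merge_step
)
print(article)
```

Because each step is just a callable, any of the three can be swapped between a human task and an automated one without changing the overall flow.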


Wrap-up


Conclusions

• Shift in practice that’s here to stay

– Fast, cheap, easy: collect data or do work

– Emerging phenomenon to study and guide

• New capabilities in application design from automation + human computation

• Hot area, fast changing, many open problems


Thank You!

• Students

– Catherine Grady (iSchool)

– Hyunjoon Jung (ECE)

– Adriana Kovashka (CS)

– Abhimanu Kumar (CS)

• Omar Alonso, Microsoft Bing

• Support – John P. Commons

ir.ischool.utexas.edu/crowd

UT Mechanical Turk & Crowdsourcing Google Group


Resources & Upcoming Events

Special issue of Springer’s Information Retrieval journal on Crowdsourcing (papers due May 6, 2011)

Upcoming Conferences & Workshops

• 3rd HCOMP at AAAI 2011

• CHI 2011 Workshop (proceedings online)

• SIGIR 2011 Workshop

• TREC 2011 Crowdsourcing Track

• CrowdConf 2011 (TBA)


Data Collection


Snow et al. (2008). EMNLP

• 5 Tasks

– Affect recognition

– Word similarity

– Recognizing textual entailment

– Event temporal ordering

– Word sense disambiguation

• 22K labels for $26

• high agreement between Turk annotations and expert “gold” labels
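Two quick calculations make these bullets concrete: the cost per label implied by the slide's totals, and one simple way to measure agreement with gold labels. The cost figures come from the slide; the `agreement_rate` helper and the toy labels are illustrative assumptions and are not the evaluation measures used in the paper.

```python
from typing import List, Sequence

# Totals quoted on the slide.
total_labels = 22_000    # "22K labels"
total_cost_usd = 26.0    # "$26"

cost_per_label = total_cost_usd / total_labels
labels_per_dollar = total_labels / total_cost_usd
print(f"Cost per label:    ${cost_per_label:.4f}")
print(f"Labels per dollar: {labels_per_dollar:.0f}")


def agreement_rate(crowd: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items where the crowd label matches the expert gold label."""
    if len(crowd) != len(gold) or len(gold) == 0:
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(c == g for c, g in zip(crowd, gold))
    return matches / len(gold)


# Toy labels, invented for illustration.
crowd_labels: List[str] = ["pos", "neg", "neg", "pos", "neg"]
gold_labels: List[str] = ["pos", "neg", "pos", "pos", "neg"]
rate = agreement_rate(crowd_labels, gold_labels)
print(f"Agreement with gold: {rate:.0%}")
```

Raw agreement is the simplest possible measure; chance-corrected statistics are usually preferred when label classes are imbalanced.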


Example – Dialect Identification


Example – Spelling correction


Kovashka & Lease, CrowdConf’10


The Crowd


Models & Incentives

• Pay (e.g. MTurk)

• Fun (or avoid boredom)

• Socialize

• Earn acclaim/prestige

• Altruism

• Learn something new (e.g. English)

• Invisible by-product (e.g. re-Captcha)

• Create self-serving resource (e.g. Wikipedia)

Multiple incentives are often offered in tandem


Altruism

• Contribute knowledge

• Help others (who need knowledge)

• Help workers (e.g. SamaSource)

• Charity


Games with a Purpose (L. von Ahn)

– Players have fun, creators get data as by-product

• distinct from Serious Gaming / Edutainment

– Player learning / training / education is by-product


Worker Demographics

• 2008-2009 studies found the workforce less global and diverse than previously thought

– US

– Female

– Educated

– Bored

– Money is secondary


2010 shows increasing diversity

47% US, 34% India, 19% other (P. Ipeirotis, March 2010)
