
Anand Kulkarni, Björn Hartmann (University of California, Berkeley)
Matthew Can (Stanford University)

Collaboratively Crowdsourcing Complex Work With Turkomatic

Microtask marketplaces excel at simple, repetitive work.

Transcribe a business card.

Look up a fact online.

Much of the work we do in our daily lives is not simple or repetitive.

“Create algebra problems for my mathematics exam.”

“Write a research paper.”

“Create a small piece of software.”

“Arrange my trip to Seattle.”

“Write a blog about Mechanical Turk with a few good entries.”

How do we crowdsource complex work?

Complex work with crowds

Soylent: Editing word processing documents (Bernstein et al. ’10)

VizWiz: Answering queries about visual scenes (Bigham et al. ’10)

More complex applications: PlateMate [NHZG11], Adrenaline [BBMK11], CrowdForge [KSK11], …

Workflows: Crowd Algorithms

Divide complex tasks into a sequence of microtasks arranged in a workflow

Soylent, Bernstein et al, UIST 2010

Workflow design is labor-intensive

1. Design individual HITs
2. Implement parallelism to make sure tasks are done correctly
3. Write software to launch HITs and parse worker results
4. Test workflow by running program
5. Identify errors
6. Iterate from step 1

Difficult and domain-specific: Workflow design requires extensive up-front iteration and experimentation and is specific to a given task domain.

Inaccessible to non-experts: Few have the patience to implement this process in code

Turkomatic is a system for crowdsourcing high-level complex and creative work where the crowd designs the workflow.

What is Turkomatic?

Create a new blog about Mechanical Turk with two posts.

Price-Divide-Solve (PDS)

How do we induce the crowd to design a workflow?

PDS is a divide and conquer algorithm to create workflows.

Price: Can this task be solved for 20 cents?

If yes: Solve task and return the answer.

If no: Divide task into multiple steps.

For each step, recurse.

Merge steps into solution.
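The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the Turkomatic implementation: the `Crowd` class and its four callables are hypothetical stand-ins for HITs posted to workers, and the `max_depth` guard is an added assumption that does not appear in the slides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Crowd:
    """Hypothetical stand-ins for the four kinds of HIT; each is a plain callable."""
    can_solve_for_price: Callable[[str], bool]  # Price: "solvable for 20 cents?"
    solve: Callable[[str], str]                 # Solve: return an answer
    divide: Callable[[str], List[str]]          # Divide: return simpler steps
    merge: Callable[[str, List[str]], str]      # Merge: combine step answers


def pds(task: str, crowd: Crowd, depth: int = 0, max_depth: int = 5) -> str:
    """Price-Divide-Solve as a recursive divide-and-conquer procedure."""
    # max_depth is an assumed safeguard against endless subdivision;
    # it is not part of the algorithm as stated on the slides.
    if depth >= max_depth or crowd.can_solve_for_price(task):
        return crowd.solve(task)
    steps = crowd.divide(task)
    answers = [pds(step, crowd, depth + 1, max_depth) for step in steps]
    return crowd.merge(task, answers)


# Toy crowd (illustrative only): a task is "too big" while it contains " and ".
toy_crowd = Crowd(
    can_solve_for_price=lambda t: " and " not in t,
    solve=lambda t: f"done({t})",
    divide=lambda t: t.split(" and "),
    merge=lambda t, parts: " | ".join(parts),
)
print(pds("create blog and write post one and write post two", toy_crowd))
```

In a real deployment each callable would post a HIT and wait for worker responses, so the recursion would be asynchronous rather than a direct function call.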

Price-Divide-Solve (PDS)

Redundancy is used at each step to ensure quality.

Price Task: Price check, with a majority giving the consensus on price.

Divide Task: Vote for the best subdivision.

Solve Task: Vote for the best solution.
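A minimal sketch of how redundant worker responses might be aggregated at each step. The function names are hypothetical, and the tie-handling rules (a tied price check counts as "no"; a tied vote goes to the candidate seen first) are assumptions, since the slides do not say how ties are resolved.

```python
from collections import Counter
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_price(judgments: Sequence[bool]) -> bool:
    """Return True only if a strict majority of workers say the task
    can be solved for the price. Ties count as "no" (an assumption)."""
    return sum(judgments) * 2 > len(judgments)


def best_by_vote(votes: Sequence[T]) -> T:
    """Return the candidate (subdivision or solution) with the most votes.
    On a tie, the candidate encountered first wins (an assumption)."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


print(majority_price([True, False, True]))
print(best_by_vote(["division A", "division B", "division A"]))
```

Treating a tied price check as "no" errs toward further subdivision, which costs more HITs but avoids handing a worker a task that half the crowd judged too large.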

Price-Divide-Solve (PDS)

Create a new blog about Mechanical Turk with two posts.

Price: Can we solve it for 20 cents? No.

Divide: Divide it into two or more steps.

Create a new blog on Wordpress.com.

Write one entry for a blog.

Write a second entry for a blog.

Price (each step): Can we solve it for 20 cents? Yes. Yes. Yes.

Solve:

“Welcome to my blog about Mechanical Turk! Here, I’ll be posting some of my favorite recipes for Mechanical Turk. You’ll be able to follow along at home and create delicious HITs. From the comfort of your own home! Stay tuned and i’ll show you some of the best strategies for keeping your Turk workers engaged.”

“You may be inclined to price your HITs at the lowest possible rate, but this isn’t always the best choice. Instead, you should base your pricing on:
-How long will the HIT take?
-Is the HIT similar to other HITs? If so, price it slightly less than theirs.
-If the HIT involves a lot of qualifications, you may want to price it higher, to attract more qualified workers.”

Merge: Combine the results of solved steps.

mtworker.wordpress.com

Can this task be solved for 20 cents?

Yes / No

Write a blog about Mechanical Turk

Submit

Break down the following task.

Write a blog about Mechanical Turk

Step 1:

Step 2:

Add Step Submit

Solve the following task.

Create a new blank blog on Wordpress

Submit

Merge the following subtasks.

Workers previously divided this task into simpler steps and solved each step. Combine their work into a complete solution.

Write a blog about Mechanical Turk

Step 1: Write a blog post about Mechanical Turk. [answer: This post is…]

Step 2: Create a blank blog about Mechanical Turk [answer: www...]

Submit

Price-Divide-Solve (PDS)

PDS guides the crowd to design workflows in a particular way.

It can attempt to create a workflow for any task, but it can’t produce all workflows.

Write a sentence.

Improve the previous worker’s answer.

Check that the previous answer was improved.

System Recap

Price, Divide, Solve

Requester Interface

System Output

Algorithm

Worker Interface

Experiment 1: Can the crowd plan and execute workflows using PDS?

Over 150 trials, including:

• Java programming
• Booking restaurants
• Sorting and cleaning data
• Blogging
• Creating self-portraits
• Solving an SAT
• Logo design
• Travel planning
• Writing essays
• Web research

Experiment 1: Success Modes

Write a 3-paragraph essay about whether it’s ever OK to lie.
- Write one paragraph arguing it’s OK to lie sometimes.
- Write one paragraph suggesting it’s never OK to lie.
- Write a conclusion reconciling the two.
  - Write one sentence to open the conclusion.
  - Write 2-3 sentences in the middle of the conclusion.
  - Write a concluding sentence.

Data:
• 6 subnodes were produced
• 44 separate worker judgments were used
• Task completed with a full essay

“…although many people believe it is always essential to tell the truth, sometimes it may be better to lie. There is credibility in both views. And like many ethical decisions, sometimes the circumstances dictate.

When you tell the truth you develop a stronger bond of trust with those around you. A relationship can not exist without trust. If you lie, you end up telling more lies to cover the first….”

Experiment 1: Failure Modes

We found two ways the algorithm could fail:

- Failing to terminate at all
- Completing, but producing wrong answers

Experiment 1: Failing to terminate

Plan a trip from New York to S.F. that visits 5 interesting places.

Think about where to go next in Ohio.

Think about where to go next in Ohio.

Experiment 1: Wrong answers

List the department chairs of the top 20 US programs in CS.

aalto armchair, poang lounge chair, adirondack chair, aeron chair, balans chair, ball chair…

Why does the crowd lose context?

Turkomatic worker: “…I’ve taken a look at your instructions, and I understand them perfectly. However, this task seems to have been inadvertently sabotaged by other turkers who do not understand what you are asking them to do…”

Long workflows involve increasing chains of trust.

Each individual worker has a ~30% probability of failure [Chi/Kittur/Suh ’08, Bernstein et al ’10]

Weakest link problem: If one worker early in the workflow design process makes mistakes, the subsequent decompositions will fail.
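To make the chain-of-trust concern concrete, here is a back-of-the-envelope calculation. It rests on simplifying assumptions that are not in the slides: each step is done by one worker, failures are independent, there is no redundancy or voting, and any single failure sinks the whole chain. The function name is hypothetical.

```python
def chain_success_probability(n_steps: int, p_fail: float = 0.3) -> float:
    """Probability that every one of n_steps dependent steps succeeds,
    assuming independent failures and no redundancy (simplifying assumptions)."""
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    return (1.0 - p_fail) ** n_steps


for n in (1, 3, 5, 10):
    print(f"{n:2d} dependent steps: {chain_success_probability(n):.3f}")
```

Under these assumptions the chance of an error-free chain falls to about 34% at three dependent steps and below 3% at ten. PDS does use redundancy at each step, so the real figures should be better, but the direction of the effect is the same.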

Including context doesn’t suffice

One explanation

What if we used more competent workers?

Experiment 2: Can expert workers make Turkomatic work?

Setup: We recruited five graduate students with experience as requesters on Mechanical Turk.

We ran the PDS algorithm on three complex tasks with this crowd: online research, essay writing, and creating a blog.

Results: Each of the three tested tasks completed correctly when we used only expert workers!

Conclusion: PDS works well with qualified crowds.

How can we successfully run PDS with unskilled workers?

Experiment 3: Can requester management help the crowd?

Workflow visualizer: Monitor the workflow in real-time.

Interactive task editor: Selectively invalidate parts of a workflow.

Workflow seeding: Run previously-designed parts of workflows in the crowd.

Task Graphs (Requester)

Task Graph Nodes

Each node shows the task prompt, its status (queued, in progress, or completed), and the submitted answer.

Task Graph Edges: Parallel

[Diagram: Parent Task (Split) with Sub Task 1 (Solve) and Sub Task 2 (Decide)]

Task Graph Edges: Sequential

[Diagram: Parent Task (Split) with Sub Task 1 (Solve) and Sub Task 2 (Decide)]

Task Graph Example

Write an essay (Split)

Write an outline (Solve): 1. Thesis: …

Expand the outline (Decide)

Task Graph Editing

Write a 3-paragraph essay… (Split)

Think about the topic… (Split)

Collect information about… (Decide)

Write the paragraphs… (Decide)

Pick one of the topics (Split)

List possible topics (Solve): 1. The word…

Edit Task Details dialog: Edit Task, Edit Solution, Edit Subtask, Delete Node

Task: Think about the topic you want to write about

Status: Split

The same graph, now with an additional node:

List three main topics… (Solve)

Recomputing Task Graphs

• Delete subtree of edited task
• Recursively:
  – Delete stale solutions in parent tasks
  – Delete stale solutions in subsequent sibling tasks (for serial decompositions)
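The recomputation rule above can be sketched on a small tree structure. This is an illustrative reading, not Turkomatic's code: `TaskNode`, its fields, and `invalidate` are hypothetical names, and two details are assumptions: the edited node's own solution is cleared along with its subtree, and clearing a later sibling also clears solutions beneath it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, so list.index() is safe
class TaskNode:
    prompt: str
    solution: Optional[str] = None
    serial: bool = False  # True if the children form a serial decomposition
    children: List["TaskNode"] = field(default_factory=list)
    parent: Optional["TaskNode"] = field(default=None, repr=False)

    def add_child(self, child: "TaskNode") -> "TaskNode":
        child.parent = self
        self.children.append(child)
        return child


def clear_solutions(node: TaskNode) -> None:
    """Drop the solution of a node and of everything beneath it."""
    node.solution = None
    for child in node.children:
        clear_solutions(child)


def invalidate(edited: TaskNode) -> None:
    """Apply the recomputation rule after a requester edits a task."""
    # Delete the subtree of the edited task (and, by assumption, its solution).
    for child in edited.children:
        child.parent = None
    edited.children.clear()
    edited.solution = None

    # Walk upward: each ancestor's merged solution is now stale, and in a
    # serial decomposition later siblings depended on the edited branch.
    node = edited
    while node.parent is not None:
        parent = node.parent
        parent.solution = None
        if parent.serial:
            position = parent.children.index(node)
            for sibling in parent.children[position + 1:]:
                clear_solutions(sibling)
        node = parent
```

Earlier siblings keep their solutions, so only the work that could have depended on the edited task is sent back to the crowd.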

Seeding workflows

We mitigate poor worker performance by starting with partial workflows.

Run Workflow with Crowd

Experiment 3: Collaboration

Setup: We ran the PDS algorithm using Turkers on three sets of tasks, but actively monitored the workflow and intervened only to eliminate errors.

Outcomes: Each of the three tested tasks completed correctly with 1 to 4 requester interventions.

Paragraph 1 / Paragraph 2 / Paragraph 3

Crowdsourcing is a term…

Crowdsourcing is a term… / Chaordix crowd consulting is…

Crowdsourcing is a term…

Crowdsourcing is a term… / Crowdsourcing works best on tasks where…

Crowdsourcing is a term… / Crowdsourcing works best on tasks where… / One of the best known crowdsourcing platforms…

Conclusion

We presented Turkomatic, a system that lets requesters harness the crowd to design complex workflows.

Our first experiment showed that letting the crowd design its own tasks can produce both successful and unsuccessful workflows.

Our second experiment showed that expert workers could successfully design workflows using PDS.

Last, we showed that an interactive, real-time interface for visualizing and selectively editing worker interfaces could produce viable workflows.

One finding of note

In Turkomatic, highly motivated workers had no way to correct others’ errors.

Excessive structure in workflow design prevents the emergence of leaders.

To scale, we may consider giving editing abilities to more capable workers.

Contributions

A simplified interface for crowdsourcing that lowers the threshold for crowdsourcing complex tasks

A new algorithm, techniques, and interfaces enabling the crowd to decompose complex tasks

A new interface for letting requesters edit, visualize, and seed workflows