AWS re:Invent 2016: Leverage the Power of the Crowd to Work with Amazon Mechanical Turk (BDA204)

Post on 06-Jan-2017


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Russell Smith, co-founder/CTO/CIO, Rainforest QA

November 2016

BDA204

Leverage the Power of the Crowd To Work with Amazon Mechanical Turk

What to Expect from the Session

• Learn what Mechanical Turk (MTurk) is

• Understand the basics

• Learn about scaling beyond the basics

• How Rainforest leverages MTurk

Who am I?

Russell Smith

• CTO & Co-Founder of Rainforest QA

• Programmer

• MTurk Requester for ~5 years

• More than ~250 million questions through MTurk

• Follow me on Twitter: @rhs

What is Rainforest?

QA-as-a-Service: Fast Crowdsourced Testing for Web and Mobile Apps thanks to Mechanical Turk:

• Customers write tests in plain English

• Results in ~30 minutes, anytime, 24x7

• Powered by humans

What is Mechanical Turk?

• Super early AWS service

• Public since 2005

• First invented in 2001

• 24 x 7, on-demand, programmatic interface to do Human Intelligence Tasks (HITs)

• “Automate” the un-automatable

What is Mechanical Turk?

• Pay (lots of) humans to do (lots of) things. Classic things:

  • Extract data from receipts

  • Identify things in photos

  • Search for data for you (find the phone number of XYZ restaurant)

  • Transcribe audio

• More hip / upcoming things:

  • Data science – build ground truth for machine learning and AI

Basics

Marketplace

• Connects Workers and Requesters

• You are the Requester!

• Web interface where Workers execute your tasks

• Searchable list of HITs; Workers pick the ones they want

Requester interface

1. Select a template

2. Provide info on your task and how much you want to pay.

3. Design the layout of your task

4. Load your variables

5. Publish

Requester interface

- The results of your task can be viewed in the Manage tab.

- This is also where you can view and manage your Workers.

Worker interface

- Workers visit mturk.com to find HITs they want to work on.

- Description, reward, and reputation all matter in determining if your work gets done.

Worker interface

- Workers can choose to Accept a HIT or Skip to the next one in a set.

- Once they’ve accepted the HIT they have until the allotted time has expired to Submit.

- Workers can also Return the task if they decide they don’t want to complete it.

Basics - task design

Basics - Task design

Design is critical:

• Bad tasks = bad reputation + bad results

• Unclear tasks = bad reputation + bad results

• Good tasks ~= good reputation + good results

Basics - Task design

My rules:

1. Have instructions and/or rules

2. Must be easy to understand (note: not necessarily simple)

3. Must protect against mistakes or fraud

4. Have a fair price

5. Include a feedback field

Basics - Task design

Ask:

• Can the worker get in a groove and churn through tasks?

• Can anyone read the instructions and do this right?

• Do we need to qualify the workers?

Basics - Task design

Pricing iteration:

1. Work out a budget per assignment

2. Do a small run

3. Verify quality vs speed* of results

4. Fix your task, optimize spend**, and go to 2 (repeat forever)

* Qualifications, SEO, # of workers

** Payment, repetition, requirements
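Step 1 of the pricing loop (working out a budget per assignment) comes down to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and `fee_rate` is a placeholder for whatever marketplace fee applies to your account, so check the current MTurk pricing before relying on any number.

```python
from decimal import Decimal


def run_cost(reward, assignments_per_hit, hits, fee_rate):
    """Total cost of a run: Worker rewards plus the marketplace fee.

    fee_rate is an assumed input (for example "0.20" for 20%), not a
    value taken from the talk.
    """
    reward = Decimal(str(reward))
    fee_rate = Decimal(str(fee_rate))
    rewards = reward * assignments_per_hit * hits
    return rewards * (1 + fee_rate)


def hits_within_budget(budget, reward, assignments_per_hit, fee_rate):
    """How many HITs a fixed budget covers at the given reward."""
    cost_per_hit = run_cost(reward, assignments_per_hit, 1, fee_rate)
    return int(Decimal(str(budget)) // cost_per_hit)
```

For a small trial run, compute `hits_within_budget` for a modest budget, publish that many HITs, then revisit the reward after checking quality and speed.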

Workers

Workers

• Motivations

  • Earn money

  • Status

  • Incentives

  • Leveling up

  • Pride

• Expectations

  • Traditionally: being treated like an API

  • Now: being treated like a human

  • Fairness, transparency

Workers

• Lifecycle

• Custom Qualifications / Training

• Master Workers / Premium Qualifications

Community

Community

- Retention is key

- Finding the leaders

- Worker enablement

  - Help Workers improve

  - We do: video tutorials, community forum, clear rules, automated training, re-training

  - Ask them what they need!

- Listen to complaints

  - Add a comment box to your tasks to collect feedback

  - NPS
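NPS (Net Promoter Score) is computed from 0-10 survey answers using the standard definition: the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). A minimal sketch follows; how the survey is delivered to Workers is not covered in the slides, and the function name is hypothetical.

```python
def net_promoter_score(scores):
    """Return NPS in the range -100..100 for a list of 0-10 ratings."""
    if not scores:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)
```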

Community

- Handling Workers that you don’t want doing your tasks

- Rejecting

- Qualifications

- Blocking

- Finding spammers and cheaters

- Join the external forums

- Your reputation matters

Intermediate

HITs

- HITType

- HIT

- Assignments

- Notifications

[Diagram: one HITType grouping several HITs, each HIT with several Assignments. Notification: Reviewable]
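The HITType / HIT / Assignment hierarchy can be sketched as plain data classes. This is a simplified illustration, not the MTurk data model: the field names and the "Accepted" status are assumptions, and a HIT is treated as reviewable only once every requested assignment has been submitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assignment:
    worker_id: str
    status: str = "Accepted"  # simplified: Accepted, Submitted, Approved, Rejected


@dataclass
class HIT:
    hit_id: str
    max_assignments: int
    assignments: List[Assignment] = field(default_factory=list)

    def is_reviewable(self) -> bool:
        """True once all requested assignments have been handed in."""
        done = [a for a in self.assignments if a.status != "Accepted"]
        return len(done) >= self.max_assignments


@dataclass
class HITType:
    # Shared properties (title, reward) live on the type; HITs hang off it.
    title: str
    reward: str
    hits: List[HIT] = field(default_factory=list)
```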

Useful API operations

• CreateHIT – Create new tasks for Workers to do.

• GetAccountBalance – Check the funding available for publishing new tasks.

• RevokeQualification / GrantQualification – Modify the Qualifications assigned to Workers.

• ForceExpireHIT – Immediately remove a HIT from MTurk.

• GetAssignment – The status and results from an Assignment.

• NotifyWorkers – Send a message to your Workers.

• GrantBonus – Provide a bonus payment to Workers.

Use the Sandbox environment to experiment with creating and responding to HITs without spending money.
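As a rough sketch of publishing a HIT against the Sandbox, the helper below wraps a `create_hit` call on an MTurk client object. It assumes the boto3 `mturk` client, whose snake_case method names differ from the operation names listed on the slide; the helper name, title, description, and default values are all hypothetical.

```python
# Sandbox endpoint used by the boto3 MTurk client (no real money is spent).
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"


def create_test_hit(client, question_xml, reward="0.05", max_assignments=3):
    """Publish one HIT through the given client and return its HIT ID."""
    response = client.create_hit(
        Title="Example task (sandbox)",
        Description="A short test task used to try out the workflow.",
        Keywords="test, example",
        Reward=reward,                    # dollars, passed as a string
        MaxAssignments=max_assignments,   # how many Workers answer this HIT
        LifetimeInSeconds=3600,           # how long the HIT stays listed
        AssignmentDurationInSeconds=600,  # time a Worker has to submit
        Question=question_xml,
    )
    return response["HIT"]["HITId"]


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("mturk", region_name="us-east-1",
#                       endpoint_url=SANDBOX_ENDPOINT)
# print(client.get_account_balance()["AvailableBalance"])
# hit_id = create_test_hit(client, question_xml)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at the Sandbox.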

Question types

• QuestionForm – XML-defined questions.

• HTMLQuestion – HTML-form-based questions.

• ExternalQuestion – Questions hosted on your own website.
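For the ExternalQuestion type, the question payload is a small XML document that points at your own page. A minimal sketch of building it is shown below; the helper name and default frame height are assumptions, and the URL must be XML-escaped because query strings often contain `&`.

```python
from xml.sax.saxutils import escape

EXTERNAL_QUESTION_XSD = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def external_question(url, frame_height=600):
    """Return ExternalQuestion XML for a task hosted at `url`."""
    return (
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_XSD}">'
        f"<ExternalURL>{escape(url)}</ExternalURL>"
        f"<FrameHeight>{int(frame_height)}</FrameHeight>"
        "</ExternalQuestion>"
    )
```

The returned string is what would be passed as the question when creating a HIT.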

Review Policies

- Review Policies can be specified in your CreateHIT call to automatically evaluate Worker submissions.

- Assignment-level policies can be used to validate Worker responses to known answers.

- HIT-level policies look for consensus amongst Workers on each HIT.
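The idea behind an assignment-level check against known answers can be illustrated locally. This sketch is not the MTurk Review Policy implementation; the function names and the 0.8 threshold are assumptions chosen for the example.

```python
def known_answer_score(submitted, answer_key):
    """Fraction of known-answer questions the Worker got right.

    Returns None when the submission covers none of the known answers.
    """
    checked = [q for q in answer_key if q in submitted]
    if not checked:
        return None
    correct = sum(1 for q in checked if submitted[q] == answer_key[q])
    return correct / len(checked)


def should_approve(submitted, answer_key, threshold=0.8):
    """Approve unless the known-answer score falls below the threshold."""
    score = known_answer_score(submitted, answer_key)
    return score is None or score >= threshold
```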

[Figure: Worker answers B B C / B C B, followed by two additional answers B B]

• Imagine you want to ask six Workers and get 75% agreement.

• If two Workers disagree, the policy will add additional Assignments until there is agreement.
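The consensus example can be worked through in a few lines. This is only an illustration of the arithmetic, not how MTurk evaluates a HIT-level policy, and reading the figure as six initial answers plus two extra ones is an assumption. With answers B, B, C, B, C, B the top answer has 4 of 6 votes (about 67%), which is below 75%; two more B answers bring it to 6 of 8, exactly 75%.

```python
from collections import Counter


def agreement(answers):
    """Return the most common answer and its share of all answers."""
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers)


def needs_more_assignments(answers, threshold=0.75, max_assignments=10):
    """True while agreement is below the threshold and a cap is not hit."""
    _, share = agreement(answers)
    return share < threshold and len(answers) < max_assignments


initial = ["B", "B", "C", "B", "C", "B"]
extended = initial + ["B", "B"]
```

Here `needs_more_assignments(initial)` is true, while `needs_more_assignments(extended)` is false because the 75% threshold has been reached.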

How Rainforest QA Uses Mechanical Turk

Write tests, in plain English

Automatically trained testers

• Fully automated training

• Course + class-based

• Automatic re-training

• Always expanding

• Per-customer training, for special situations

Super fast

Human results

Accurate human results, ML / AI backed

Scaling

Scaling - Rainforest v1

• Initially linked jobs to HITs 1:1

• Balanced a list of HITs against an internal list of jobs

• Constantly pulling HITs on and off MTurk whenever jobs were added, cancelled, or changed.

[Diagram: Jobs and HITs]
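Balancing a list of HITs against an internal list of jobs in a 1:1 setup amounts to a set difference. The sketch below is a guess at the general shape of such a reconciliation step, not Rainforest's code; the function name and the job-to-HIT mapping are hypothetical.

```python
def reconcile(job_ids, hit_by_job):
    """Compare current jobs with published HITs (one HIT per job).

    Returns (jobs that still need a HIT, HIT ids to take off MTurk).
    """
    jobs = set(job_ids)
    to_create = sorted(jobs - set(hit_by_job))
    to_expire = sorted(hit for job, hit in hit_by_job.items() if job not in jobs)
    return to_create, to_expire
```

Every added, cancelled, or changed job shows up in one of the two lists, which is why this design keeps pulling HITs on and off the marketplace.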

Scaling - Rainforest v2

• Decoupled jobs from HITs

• Balanced a list of HITs against an internal list of jobs

• Qualifications, constantly pulling on / off MTurk

[Diagram: Jobs and HITs]

Scaling - Rainforest v3

• Unbalanced job / HITs - no 1:1 ratio, allowing for more SEO and higher chance of workers finding us

• Stopped using Qualifications

[Diagram: Jobs and HITs]
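One way to picture jobs that are no longer tied 1:1 to HITs is a fixed pool of open HITs plus an internal job queue, with a job handed out only when a Worker picks up a HIT. The slides do not describe how jobs are dispatched, so everything below (class name, pool size, queue order) is an assumption for illustration.

```python
from collections import deque


class JobDispatcher:
    """Keeps a target number of HITs listed, independent of the job count."""

    def __init__(self, target_open_hits):
        self.target_open_hits = target_open_hits
        self.jobs = deque()

    def add_job(self, job_id):
        self.jobs.append(job_id)

    def hits_to_post(self, open_hits):
        """How many HITs to publish to get back to the target pool size."""
        return max(0, self.target_open_hits - open_hits)

    def next_job(self):
        """Job for the Worker who just accepted a HIT, or None if idle."""
        return self.jobs.popleft() if self.jobs else None
```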

Questions

Thank you!

Remember to complete your evaluations!
