
Crowdsourcing Software Engineering Studies:

Opportunities and Perils

Sebastian Elbaum (based on work performed with Kathryn Stolee)

[Slide graphics: relative populations of Software Engineers and ICSE Researchers, annotated 90% and 20%]
Crowdsourcing Services (examples)

Companies with hard problems connect with people interested in solving them: 1,000+ problems, 200,000+ solvers

Photographers connect with people who need stock photography: 3,000,000+ members

Companies with scientific problems connect with retired scientists: 1,000+ companies, 5,000+ scientists

People with many small tasks connect with a scalable workforce: 100,000+ tasks, 100,000+ workers

Who are the workers? (“Conducting behavioral research on Amazon’s Mechanical Turk”, Winter Mason & Siddharth Suri, Behavior Research Methods, 2011)

• Median 30 years old and $30K salary

• 69% of U.S. workers: “Mechanical Turk is a fruitful way to spend free time and get some cash”

• Majority from the US and India

• Work for at least $1.40/hour, $4.50/hour on average (see the sketch below)

• Completion time is correlated with pay, but not linearly
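To make the pay figures concrete, here is a minimal Python sketch that turns a target hourly wage into a per-task reward; the hourly rates are the ones Mason & Suri report, while the 10-minute task duration is a hypothetical estimate for a study of your own.

```python
# Minimal sketch: derive a per-task reward from a target hourly wage.
# Hourly figures from Mason & Suri (2011); the 10-minute duration is a
# hypothetical estimate, not a value from the talk.

def reward_per_task(hourly_wage_usd: float, minutes_per_task: float) -> float:
    """Reward (USD) that pays the target wage for the estimated duration."""
    return round(hourly_wage_usd * minutes_per_task / 60.0, 2)

print(reward_per_task(1.40, 10))  # observed floor   -> 0.23
print(reward_per_task(4.50, 10))  # reported average -> 0.75
```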

MTurk workflow: researchers create tasks and set up the study; workers search for, select, complete, and submit tasks; researchers then verify the results. A minimal sketch of the task-creation step follows.
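The researcher side of that workflow is scriptable. Below is a hedged sketch of the task-creation step using boto3's MTurk client (a real AWS API); the title, reward, study URL, and qualification threshold are illustrative placeholders, not values from the talk.

```python
# Sketch of the "create tasks" step via boto3 (AWS SDK for Python).
# All study-specific values below are hypothetical placeholders.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    # Sandbox endpoint: rehearse the study without paying real workers.
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An ExternalQuestion embeds your own study page inside the MTurk frame.
QUESTION_XML = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.org/my-se-study</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Compare two small programs (10 min)",
    Description="Judge which of two code snippets is easier to understand.",
    Reward="0.75",                     # USD, passed as a string
    MaxAssignments=25,                 # distinct workers per task
    AssignmentDurationInSeconds=30 * 60,
    LifetimeInSeconds=14 * 24 * 3600,  # keep the task up for two weeks
    Question=QUESTION_XML,
    QualificationRequirements=[{
        # Built-in "PercentAssignmentsApproved" system qualification:
        # only workers with >= 95% approved work can see the task.
        "QualificationTypeId": "000000000000000000L0",
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
    }],
)
print("Created HIT:", hit["HIT"]["HITId"])
```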

[Slide graphic: ICSE researchers crowdsourcing their studies to software engineers]

Potential

• Access to a population of software engineers

• Low cost

• Speedy / adaptive experimentation

Initial Try

• I may have solved the SE empirical challenge! Look how many answers I am getting with a few dollars!

• Oh… some of those answers are not that useful. Are these real software engineers?

• They are completing my exercise in seconds. How? Damn… they are gaming the system.

• Ouch, I need to check thousands of answers.

• OK, now let’s give them a “real” SE task.

Kathryn T. Stolee, Sebastian G. Elbaum: Exploring the use of crowdsourcing to support empirical studies in software engineering. ESEM 2010

Goal: evaluate the impact of smells/refactoring on end-user programmers’ preferences and understanding.

Workflow in Mechanical Turk

Workers: Search for Tasks → Select Task → Complete Task → Submit Task

Experimental Task in Mechanical Turk

Experiment Definition → Design → Selection → Instrumentation → Operation → Analysis

• 22 participants, 188 tasks completed, 2 weeks, $42

• Results supported the hypothesis

Potential (revisited)

• Access to a population of software engineers

• Low cost

• Speedy / adaptive experimentation

“… Academics are now taking advantage of Turk, and, from my own experience with the difficulties of recruiting students to experiments, I suspect Turk’s use will only increase.” — Scientific American, 2011

Venue | Task | Soft. Engs. | Cost
ESEM 2010 / TSE 2013 | Compare two mashups, determine outcome (10 min) | ~25 (30% SE), 188 tasks | $42
FSE NIER 2012 / ESEM 2013 | Write a small program specification as input/output (compared with a class exercise, 15 min) | ~25, ~100 tasks | $25
TOSEM 2014 | Rank code search results from various tools, provide qualified feedback (10 min) | ~50, ~300 tasks | $300
… | Survey on competing scenarios for an emerging technology (15 min) | 1000+, 1000+ tasks | $600

Cost per task under a dollar!
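A quick back-of-the-envelope check of that claim, using the approximate figures from the table:

```python
# Cost per task for each row of the table above (approximate figures).
studies = [
    ("ESEM 2010 / TSE 2013",        42,  188),
    ("FSE NIER 2012 / ESEM 2013",   25,  100),
    ("TOSEM 2014",                 300,  300),
    ("Emerging-technology survey", 600, 1000),
]
for venue, cost_usd, tasks in studies:
    print(f"{venue}: ${cost_usd / tasks:.2f} per task")
# -> $0.22, $0.25, $1.00, $0.60 per task: at or under a dollar throughout
```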

Can we get …

• X software engineers to participate? → Yes, given $ and T

• the K kind of software engineers? → Yes, given QA and $

• X software engineers to do T? → For some Ts

• X software engineers to do a T seriously? → For some Ts, given QA and $

• …

$

• Setting a baseline pay

• Market forces

• Enough to motivate software engineers

• Enough not to motivate others

• Too high, and it is perceived as coming from “baiting” requesters

• Ethical concerns

• Multiple deployments

• Still an IRB “cost” (with good reason)

Task

• Decompose the SE problem into tasks

• Small enough to attract participants / keep cost low

• Large enough to be valuable to SE research

• Provide motivation other than money

• Design of experiments

• Tasks are small: many are needed to test a hypothesis, and even more to study the SE problem

• Bundled tasks help attract subjects

• Control for learning, gaming, … (see the sketch below)
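As one concrete way to bundle tasks while controlling for learning, here is a minimal sketch (task names are hypothetical) that rotates task order across workers, so no task systematically benefits from practice on the others:

```python
# Minimal sketch: bundle small tasks and rotate their order per worker,
# so learning effects are spread evenly across tasks (a simple
# Latin-square-style rotation). Task names are hypothetical.
TASKS = ["mashup_A_vs_B", "mashup_C_vs_D", "mashup_E_vs_F"]

def bundle_for(worker_index: int) -> list[str]:
    """Tasks for one worker, rotated by their arrival order."""
    shift = worker_index % len(TASKS)
    return TASKS[shift:] + TASKS[:shift]

for w in range(3):
    print(w, bundle_for(w))
# 0 ['mashup_A_vs_B', 'mashup_C_vs_D', 'mashup_E_vs_F']
# 1 ['mashup_C_vs_D', 'mashup_E_vs_F', 'mashup_A_vs_B']
# 2 ['mashup_E_vs_F', 'mashup_A_vs_B', 'mashup_C_vs_D']
```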

Quality

• Qualification checks and pre-tests

• Embed obvious/repeated, mostly verifiable questions to check for robots, gamers, and level of attention (see the sketch below)

• Set a performance threshold for payment, or pay a little to all but use the good workers to seed the next study

• Control revision costs with tasks on tasks

• Multiple deployments

• Compare the performance of subjects inside/outside MTurk
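One way to implement the embedded-check idea is a “gold” question with a known answer, screened before approval. A hedged sketch follows: list_assignments_for_hit and approve_assignment are real boto3 MTurk calls, while the HIT id, question identifier, and expected answer are placeholders.

```python
# Sketch: screen submissions against an embedded "gold" question before
# paying. Placeholders: "HIT_ID", "gold_check", and the expected answer.
import boto3
import xml.etree.ElementTree as ET

NS = {"mt": "http://mechanicalturk.amazonaws.com/"
            "AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"}

def gold_answer(answer_xml: str, question_id: str = "gold_check") -> str | None:
    """Pull the worker's answer to the gold question out of the answer XML."""
    root = ET.fromstring(answer_xml)
    for ans in root.findall("mt:Answer", NS):
        if ans.findtext("mt:QuestionIdentifier", namespaces=NS) == question_id:
            return ans.findtext("mt:FreeText", namespaces=NS)
    return None

mturk = boto3.client("mturk", region_name="us-east-1")
resp = mturk.list_assignments_for_hit(
    HITId="HIT_ID", AssignmentStatuses=["Submitted"])

for a in resp["Assignments"]:
    if gold_answer(a["Answer"]) == "3":   # e.g., "how many lines print?"
        mturk.approve_assignment(AssignmentId=a["AssignmentId"])
    else:
        # Flag for manual review rather than auto-rejecting: attention
        # checks produce false positives too.
        print("suspect submission from", a["WorkerId"])
```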

Limitations

Accidental:

• Assignment of subjects to tasks (self-selection bias)

• Single users (small tasks, no collaboration)

• Rewards are mainly monetary

Essential:

• Separation from subjects (cannot observe or interact much)

• Very limited context information

• Mismatch with many SE problems

Alternative Infrastructures

Tempered enthusiasm

• Not all SE problems can be broken into small tasks

• Many SE problems require a team and communication

• Many SE problems require time to develop

• Proof by MTurking

• Balancing task design, $, and thresholds is tricky

• Lack of contact with, and context about, subjects

Charge

• Great as an initial empirical vehicle (better than ugrads :)

• Could be better:

• A pool of pre-qualified workers

• Capabilities to design more complex studies

• A communication stream to the worker for follow-up (+context)

• Ability to control the development environment

• …

MTurk for Software Engineering Studies?

Crowdsourcing Software Engineering Studies:

Opportunities and Perils

Sebastian Elbaum (based on work performed with Kathryn Stolee)