
Page 1: Designing Incentive for Cooperative Problem …ai.soc.i.kyoto-u.ac.jp/publications/thesis/PHD_H26_huan...Amit Pariyar, Kemas Muslim Lhaksmana, Xin Zhou, Trang Mai Xuan, Shinsuke Goto,

Designing Incentive for Cooperative Problem Solving

in Crowdsourcing

Huan Jiang
February 2015

Department of Social Informatics
KYOTO UNIVERSITY


Doctoral Thesis of Ishida and Matsubara Laboratory
Department of Social Informatics
Kyoto University

© Copyright by Huan Jiang 2015. All Rights Reserved.



Abstract

The goal of this thesis is to design incentives for workers in crowdsourcing-based cooperative task solving, with the aim of improving the quality of the task solution.

Crowdsourcing is an online, distributed task-solving and web-based business model that has emerged in recent years. It is now admired as one of the most lucrative paradigms for leveraging collective intelligence to carry out a wide variety of highly complex tasks. This success depends on the potential to replace a limited number of skilled experts with a multitude of unskilled workers by decomposing complex tasks into smaller "micro-tasks" (or subtasks), such that each subtask is low in complexity, requires little specialized skill, time, or cognitive effort from an individual, and models one step or operation in the sequence that produces a final solution to the complex task. However, crowdsourcing-based work can also fail to achieve this potential. Facing a geographically distributed workforce with diverse knowledge domains and ability levels, task requesters often have difficulty ascertaining the quality of submitted results. Moreover, the strategic behaviors of rational workers who aim to maximize their own utilities can greatly affect the quality of the task solution.

From an incentive design perspective, this thesis addresses the following issues in crowdsourcing with respect to task-solving model construction, worker behavior analysis, incentive design, and experimental implementation.

1. Designing an efficient task decomposition strategy for solving complex tasks.

To facilitate crowdsourcing-based task solving, complex tasks are decomposed into smaller subtasks that workers execute either sequentially or in parallel. These two task decompositions have attracted plenty of empirical exploration in crowdsourcing. Our research focuses on complex tasks that can be decomposed into a collection of both independent and dependent subtasks. Of particular interest is work that analyzes workers' strategic behaviors, compares the efficiency of the two task decomposition strategies in terms of final quality, and generates explicit instructions for optimal task decomposition. To achieve these goals, we formally present and analyze the two decompositions as vertical and horizontal task decomposition models, in which the final quality is defined as the sum of the qualities of the solutions to all decomposed subtasks. Our focus is the efficiency (i.e., the quality of the task's solution) of task decomposition when self-interested workers are paid in two different ways: equally (group-based revenue sharing) or in proportion to their contributions (contribution-based revenue sharing). By combining theoretical analyses of workers' strategic behavior with simulation-based exploration of decomposition efficiency, our study 1) shows the superiority of vertical task decomposition over horizontal task decomposition in improving the quality of the task's solution under this definition of final quality; and 2) gives explicit instructions on strategies for optimal vertical task decomposition under both revenue sharing schemes to maximize the quality of the task's solution.
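As a rough illustration of the sum-of-qualities objective, the two decompositions can be contrasted as follows. This is a sketch only: the linear quality model and the `dependence` parameter are our assumptions, not the thesis's actual formalization, and the thesis derives its result from equilibrium effort choices rather than fixed abilities.

```python
import random

def horizontal_quality(abilities):
    # Horizontal decomposition: independent subtasks, so the final
    # quality is simply the sum of each worker's own quality.
    return sum(abilities)

def vertical_quality(abilities, dependence=0.5):
    # Vertical decomposition: subtasks are chained, and a fraction of
    # the previous solution's quality carries over, raising the quality
    # attainable on the next subtask.
    total = prev = 0.0
    for ability in abilities:
        quality = ability + dependence * prev
        total += quality
        prev = quality
    return total

random.seed(0)
workers = [random.uniform(0.0, 1.0) for _ in range(5)]
print(horizontal_quality(workers), vertical_quality(workers))
```

Under any positive dependence, the vertical sum dominates the horizontal one for the same abilities, which is the direction of the comparison described above.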

Furthermore, we design and conduct proofreading experiments on Amazon Mechanical Turk. The first series of experiments compares the efficacy of vertical and horizontal decomposition on a proofreading task. The second series focuses on vertical task decomposition and compares situations in which the first subtask has different difficulties. Our experiments ran for two months and were highly visible on the MTurk crowdsourcing platform. Based on the collected results,


we 1) verify the superiority of the vertical task decomposition strategy over the horizontal one on the proofreading task; and 2) apply some of our proposed instructions for vertically decomposing the proofreading task, which show promise for improving the quality of the final outcome.

2. Limited budget allocation among workers in cooperative task solving.

As discussed for vertical task decomposition, to facilitate crowdsourcing-based task solving, complex tasks are recommended to be decomposed into smaller subtasks that are chained sequentially with interdependence. These subtasks must be executed cooperatively by individual workers. Aiming to maximize the quality of the final solution subject to self-interested workers' utility maximization, a key challenge is allocating the limited budget among the sequential subtasks. This is particularly difficult in crowdsourcing marketplaces where the task requester posts subtasks with their payments sequentially and pays the predetermined payment to the worker once the subtask is finished. This study is the first attempt to show the value of Markov Decision Processes (MDPs) for optimizing the quality of the final solution by dynamically determining the budget allocation over sequentially dependent subtasks under budget constraints and uncertainty about workers' abilities. Our simulation-based approach verifies that, compared with several benchmark approaches, the proposed MDP-based payment planning is more efficient at optimizing the final quality under the same limited budget.

3. Quality control on subtask solving.

Even though higher-quality output can be selected or aggregated by harnessing appropriate mechanisms, the quality of the final output depends heavily on the qualities of the individual outputs. Therefore, quality control on subtasks plays an important role in determining the success of complex task solving. Crowdsourcing competitions have been found to be conducive to acquiring high-quality solutions by encouraging redundancy. Such crowdsourcing marketplaces exhibit a similar structure: a micro-task is specified, a reward and time period are stated, and during the period workers compete to provide the best solution. After the competition, workers are asked to verify the redundant solutions for quality control, each selecting and going through some of the solutions. Finally, at the conclusion of the period, a subset of the selected solutions is accepted as final solutions, and the corresponding workers are granted the reward.

However, the verification procedure has a weakness in terms of the distribution of tasks that get verified. Because the varying difficulties of the micro-tasks determine the expected rewards associated with verification, hard micro-tasks risk long completion times, since workers' task selections are congested on easy micro-tasks. These strategic task selection behaviors result in an uneven distribution of solved micro-tasks, which can be viewed as low efficiency in crowdsourcing.
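The congestion effect can be sketched with a toy greedy model. This is purely illustrative: the payoff rule reward / (difficulty × number of competitors) is our assumption, not the contest model analyzed in this thesis.

```python
def assign_workers(tasks, n_workers):
    # tasks: {name: (reward, difficulty)}. Each worker in turn joins the
    # micro-task with the highest expected payoff, taken here to be
    # reward / (difficulty * (competitors already there + 1)).
    counts = {name: 0 for name in tasks}
    for _ in range(n_workers):
        pick = max(tasks, key=lambda t: tasks[t][0] / (tasks[t][1] * (counts[t] + 1)))
        counts[pick] += 1
    return counts

# Equal rewards, unequal difficulty: selections congest on the easy task.
print(assign_workers({"easy": (10, 1), "hard": (10, 5)}, 6))  # more workers on "easy"
```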

The goal of this work is to mitigate the uneven distribution problem and thereby improve crowdsourcing-based task solving from an efficiency standpoint. We explore the crowdsourcing software development process, in which a crowdsourcing-based bug detection contest is introduced as a solution examination phase. This bug detection contest can be viewed as a particular mechanism for redundant solution examination, and it readily extends to the cooperative task-solving process we formalized in the first two issues.

To achieve this goal, we first formalize and analyze a bug detection model in which strategic players select code and compete in bug detection contests. Specifically, we model the bug detection contests as all-pay auctions and rigorously analyze the relationship between strategic code selection behaviors and the offered rewards. Secondly, a division strategy is proposed that divides the workers into small divisions and constrains each worker's code selection exclusively to the division she belongs to. Our study shows that the division strategy can control two features of the bug detection contest, the expected reward classes and the scales of ability levels, by intentionally assembling workers with a particular ability distribution in one division. In this way, the division strategy determines workers' strategic code selection behaviors and thus improves bug detection efficiency. We analyze the division strategy characterized by skill mixing degree and skill similarity degree and find an explicit correspondence between the division strategy and the bug detection efficiency. Based on our simulation results, we verified that the skill mixing degree, serving as the determinant factor of the division strategy, controls the trend of the bug detection efficiency, while the skill similarity degree plays an important role in shaping it.
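As a minimal sketch of the two extremes of such a division strategy (the function names and the even-split assumption are ours, not the thesis's notation): a low skill mixing degree puts workers of neighboring ability in the same division, while a high one spreads every ability level across each division.

```python
def similar_divisions(abilities, k):
    # Low skill mixing degree: consecutive chunks of the ability ranking,
    # so each division holds workers of similar skill.
    ranked = sorted(abilities)
    size = len(ranked) // k
    return [ranked[i * size:(i + 1) * size] for i in range(k)]

def mixed_divisions(abilities, k):
    # High skill mixing degree: deal the ranked workers round-robin,
    # so each division spans the whole ability range.
    ranked = sorted(abilities)
    return [ranked[i::k] for i in range(k)]

workers = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(similar_divisions(workers, 2))  # [[0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.9]]
print(mixed_divisions(workers, 2))    # [[0.1, 0.3, 0.6, 0.8], [0.2, 0.4, 0.7, 0.9]]
```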


Acknowledgements

First and foremost, I would like to express my sincere gratitude to my supervisor, Associate Professor Matsubara Shigeo, for his continuous support of my Ph.D. study and research. I greatly appreciate his encouragement, enthusiasm, and immense knowledge, which always helped me whenever I encountered difficulties in my research. From Associate Professor Matsubara Shigeo, I gained a love of and energy for scientific research that will last a lifetime. I could not have imagined having a better supervisor and mentor for my Ph.D. study.

I owe sincere thanks to Professor Toru Ishida for providing me such an excellent opportunity to study in Japan. I am also deeply grateful to my advisory committee members at Kyoto University, Professor Keishi Tajima and Professor Yoshinori Hara, for all their valuable advice and interesting discussions. I would also like to thank the remaining member of my thesis committee, Professor Hisashi Kashima, for his encouragement, insightful comments, and hard questions. Besides, I would like to express my thanks to Professor Makoto Yokoo and Associate Professor Yuko Sakurai at Kyushu University for giving me the opportunity to visit and for their generous care and attention to my research.

I would also like to show gratitude to the faculty members of the Ishida and Matsubara Laboratory: Associate Professor David Kinny, Associate Professor Hiromitsu Hattori, Associate Professor Yohei Murakami, Assistant Professor Yuu Nakashima, Assistant Professor Arisa Ema, Researcher Masayuki Otani, and Researcher Takao Nakaguchi. My sincere thanks also go to Assistant Professor Donghui Lin, who gave me a lot of help and support in my research life in Japan. I also greatly appreciate our coordinators, Ms. Terumi Kosugi, Ms. Hiroko Yamaguchi, Ms. Yoko Kubota, and Ms. Yoko Iwama, for their help with administrative affairs throughout the academic process.

I thank all the members and alumni of the Ishida and Matsubara Laboratory. Special thanks go to all my fellow Ph.D. students: Ari Hautasaari, Julien Bourdon-Miyamoto, Andrew W. Vargo, Chunqi Shi, Mairidan Wushouer, Amit Pariyar, Kemas Muslim Lhaksmana, Xin Zhou, Trang Mai Xuan, Shinsuke Goto, Hiroaki Kingetsu, Xun Cao, and Nguyen Cao Hong Ngoc; and to all my dearest girlfriends: Linsi Xia, Juan Zhou, Meile Wang, Nan Jin, Ling Xu, Wenya Wu, Meng Zhao, Yating Zhang, Bei Liu, and Jie Zhou. Thank you all so much for enriching my life in Japan. We'll be friends forever.

Getting through my dissertation required more than academic support. I would like to thank my parents, who have given me the strongest love and support. I would also like to thank my fiancé, Peng Chen, for listening to me, accompanying me, encouraging me and, at times, tolerating me, but never forcing me. I cannot begin to express my gratitude and appreciation for their unconditional love.

Last but not least, I would like to show gratitude to Professor Zili Zhang and Professor Huiwen Deng at Southwest University, China, for their continuous guidance on my research. Special thanks also go to Mr. and Mrs. Nomura for treating me as a family member during my stay in Japan.

This research was partially supported by a Grant-in-Aid for Scientific Research (S) (24220002, 2012-2016) from the Japan Society for the Promotion of Science (JSPS). My stay at Kyoto University was supported by the Japanese Government (Monbukagakusho) Scholarship.


Contents

Abstract

Acknowledgements

List of Figures

List of Tables

1 Introduction
1.1 Objectives
1.2 Approaches and Issues
1.3 Thesis Outline

2 Background
2.1 Characters of Crowdsourcing Applications
2.1.1 Crowdsourcing Community
2.1.2 Crowdsourcing Platforms
2.2 Research Issues in Crowdsourcing
2.2.1 Strategic Behavior
2.2.2 Reputation Systems
2.2.3 Efficiency and Quality Issue
2.3 Mechanism Design in Crowdsourcing
2.4 Summary

3 Efficient Task Decomposition in Crowdsourcing
3.1 Introduction
3.2 Related Work
3.3 The Model
3.3.1 Vertical Task Decomposition
3.3.2 Horizontal Task Decomposition
3.3.3 Revenue Sharing Schemes
3.4 Strategic Behaviors Analysis
3.4.1 Independent Behaviors of Individuals
3.4.2 Dependent Behaviors in Vertical Task Decomposition
3.5 Task Decomposition Strategy Analysis and Comparison
3.5.1 Utility Specification
3.5.2 Efficiency Comparison
3.5.3 Vertical Task Decomposition Strategy
3.6 Proofreading Experiments on MTurk
3.6.1 Experimental Design: Comparison of Task Decomposition Strategies
3.6.2 Experimental Design: First Subtasks with Different Difficulties in Vertical Task Decomposition
3.7 Empirical Results
3.7.1 Experiment 1
3.7.2 Experiment 2
3.8 Summary
3.8.1 Discussion on Two Revenue Sharing Schemes

4 MDP-Based Reward Design for Efficient Cooperative Problem Solving
4.1 Introduction
4.2 Related Work
4.3 The Model and Problem Statement
4.3.1 Player-Specific Ability
4.3.2 Value of Accomplishing A New Subtask
4.4 MDP-based Payment Planning
4.4.1 Markov Decision Processes
4.4.2 Payment Planning in MDPs
4.4.3 Continuous States and Discretization
4.5 Planning Algorithm
4.5.1 Solving an MDP
4.5.2 UCT algorithm
4.6 Efficiency Analysis
4.6.1 Settings
4.6.2 Simulation results
4.7 Summary

5 A Division Strategy for Efficient Quality Control
5.1 Introduction
5.2 Related Work
5.3 Preliminaries
5.3.1 Bug Detection Model
5.3.2 Contest Selection
5.4 Bug Detection Efficiency Based on Division Strategy
5.4.1 Division Strategy
5.4.2 Bug Detection Efficiency
5.4.3 Example Study: Key Factors in Efficient Division Strategy
5.5 Simulation and Analysis
5.5.1 Setting
5.5.2 Results and Discussion
5.6 Summary

6 Conclusion
6.1 Contributions
6.2 Future Directions

Bibliography

Publications


List of Figures

3.1 Positive quality dependence in a two-subtask situation.
3.2 Efficiency comparison of vertical and horizontal task decompositions.
3.3 Efficiency comparison of two revenue sharing schemes for vertical task decomposition strategy.
3.4 Efficiency estimation of vertical task decomposition strategy under group-based revenue sharing scheme.
3.5 Efficiency estimation of vertical task decomposition strategy under contribution-based revenue sharing scheme.
3.6 Results of proofreading experiment under horizontal task decomposition strategy.
3.7 Results of proofreading experiment under vertical task decomposition strategy where errors are identified phrase-wise.
3.8 Results of proofreading experiment under vertical task decomposition strategy where errors are identified sentence-wise.
3.9 Correct rate comparison between phrase-wise and sentence-wise proofreading.
4.1 Subtask difficulty and positive quality dependence.
4.2 Decision tree representation of MDP-based payment planning with discretized state space for a two-subtask solving process.


4.3 Efficiency comparison for average difficulties.
4.4 Efficiency comparison for increasing difficulties.
4.5 Efficiency comparison for decreasing difficulties.
5.1 Group players and contests into different classes.
5.2 Contest selection behavior in equilibrium.
5.3 Relative importance of two key factors of efficient division strategy.
5.4 Bug detection efficiency under linear skill distribution.
5.5 Bug detection efficiency under concave skill distribution.
5.6 Bug detection efficiency under convex skill distribution.


List of Tables

2.1 Examples of crowdsourcing applications.
3.1 Weight combinations for 2-subtask situation.
3.2 Weight combinations for 3-subtask situation.
3.3 English articles for proofreading task.
3.4 Information for experiment 1.
3.5 Information for experiment 2.
4.1 Summary of parameters in MDP-based budget allocation simulation.
5.1 An example where bug detection efficiency increases with the skill mixing degree of division strategy.
5.2 An example where bug detection efficiency increases with the skill similarity degree.
5.3 Summary of parameters in division strategy simulation.


Chapter 1

Introduction

Crowdsourcing is an online, distributed problem-solving and web-based business model that has emerged in recent years. Jeff Howe and Mark Robinson coined the term "crowdsourcing" in an issue of Wired magazine in 2006 [Howe, 2006]. Generally, tasks or problems that need to be solved are broadcast to an unknown pool of people, and an open call is announced to collect solutions. The pool of people, known as the workers, select tasks based on their own preferences, exert their efforts independently or cooperatively, and finally submit solutions. The solutions are then verified, and the best ones are selected and taken by the task requesters who broadcast the task. The individuals who provided the best solutions are usually rewarded, monetarily or non-monetarily. Crowdsourcing is now admired as one of the lucrative paradigms of leveraging the collective intelligence of crowds, including both professionals and amateurs, which outperforms individual experts provided that certain controls are placed on the workers' behavior and the working process [Howe, 2009] [Kittur, 2010]. In addition, crowdsourcing-based task solving has the added advantage of being cheaper to conduct [Kittur et al., 2008].


1.1 Objectives

As a successful business model, crowdsourcing applications have been implemented by various organizations. Amazon Mechanical Turk (mturk.com), one of the most popular crowdsourcing marketplaces, was introduced by Amazon in 2005 [Ipeirotis, 2010]. It allows easy distribution of various types of tasks to a multitude of unskilled workers, and it shows considerable promise in having the broadcast tasks executed rapidly and successfully. The tasks requiring human intelligence, such as identifying objects in images, finding relevant information, or performing natural language processing, are typically "micro-tasks," characterized by low complexity and requiring little specialized skill, time, or cognitive effort to be completed by independent individuals.

However, these features raise a barrier for employers who want to request complex tasks. In contrast to the micro-tasks posted on Amazon Mechanical Turk, much of the work demanded in the real world is often far more complex: it consists of interdependent subtasks and requires various specialized skills, significant time, and cognitive effort. Therefore, complex tasks require substantially more cooperative work among co-workers than do independent, self-contained micro-tasks [Malone and Crowston, 1994].

To take advantage of micro-task marketplaces for accomplishing complex tasks, there has recently been work on task-solving workflow design. Kittur et al. [Kittur et al., 2011] made the first contribution to conceptualizing a framework for accomplishing complex, interdependent tasks using micro-task markets. They defined a general-purpose framework called CrowdForge, which treats partition (breaking a larger task down into discrete subtasks), map (processing a specified task by one or more workers), and reduce (merging the results of multiple workers' tasks into an overall output) as the basic building blocks for distributed workflows, enabling complex tasks to be solved by cooperative workers. CrowdForge delineates a case study on article writing, which consists of three sequential subtasks: outline construction, information collection, and writing. On average, the five articles produced by the above process are of higher quality than those produced by independent individuals. Related studies expand on this finding. Turkomatic introduces the innovative idea of asking the crowd itself to help decompose tasks into sets of subtasks that can be posted in parallel or must be executed serially [Kulkarni et al., 2011] [Kulkarni et al., 2012]. Soylent contributes the three-stage pattern Find-Fix-Verify to text editing services in word processing, with high feasibility and efficiency [Bernstein et al., 2010]. Similarly, PlateMate provides Tag-Identify-Measure for nutritional analysis from photos of meals [Noronha et al., 2011].
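The partition-map-reduce pattern can be sketched as follows. This is a toy stand-in: in CrowdForge each step is posted to a micro-task market and executed by workers, whereas here trivial local functions play that role.

```python
def partition(task):
    # Partition: break the larger task into discrete subtasks.
    return task.split(". ")

def map_step(subtask):
    # Map: one or more workers process each subtask (a placeholder edit here).
    return subtask.strip().capitalize()

def reduce_step(results):
    # Reduce: merge the workers' outputs into one overall result.
    return ". ".join(results)

article_task = "outline the topic. collect facts. write the draft"
final = reduce_step(map_step(s) for s in partition(article_task))
print(final)  # Outline the topic. Collect facts. Write the draft
```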

Although all these workflows show the potential to aggregate micro-tasks into significant productive labor, fully leveraging these systems to acquire high-quality task solutions remains challenging. First, preparing complex tasks for crowdsourcing marketplaces requires careful attention to the strategy of decomposing tasks into multiple subtasks, since it determines how the subtasks depend on each other and how multiple workers are organized during the task-solving process. However, hindered by the absence of a formalized model that accounts for managing the dependencies among subtasks, explicit analysis of efficient task decomposition has been neglected in previous research.

Secondly, a significant challenge in the above-mentioned crowdsourcing-based complex task solving has been designing an incentive scheme for the task requester to maximize expected output quality subject to a limited budget for the whole complex task, taking into account the self-interested individual worker's utility maximization. This is particularly difficult when task requesters face workers with unknown and heterogeneous abilities. Given these difficulties, existing approaches typically provide task requesters with simple rules of thumb for budgeting tasks based on the historical data of the crowdsourcing marketplaces. By comparing the nature of the tasks, the historical activity levels of the workers, and the qualities of outcomes, task requesters struggle to distribute appropriate rewards among their tasks, hoping for the best possible task solution [Faradani et al., 2011] [Ipeirotis, 2010]. However, without formal models, improper budget allocation commonly leads to task starvation or inefficient use of capital.

Third, quality assurance on subtask execution can substantially improve the quality of the final task solution. Crowdsourcing competitions have been found to be conducive to acquiring high-quality solutions by encouraging redundancy [Archak, 2010] [DiPalantino and Vojnovic, 2009] [Yang et al., 2008a]. Such crowdsourcing marketplaces exhibit a similar structure: a micro-task is specified, a reward and time period are stated, and during the period workers compete to provide the best solution. After the competition, workers are asked to verify the redundant solutions for quality control, each selecting and going through some of the solutions. Finally, at the conclusion of the period, a subset of the selected solutions is accepted as final solutions, and the corresponding workers are granted the reward. However, the verification procedure has a weakness in terms of the distribution of tasks that get verified. Because the varying difficulties of the micro-tasks determine the expected rewards associated with verification, hard micro-tasks risk long completion times, since workers' task selections are congested on easy micro-tasks. These strategic task selection behaviors result in an uneven distribution of solved micro-tasks, which can be viewed as low efficiency in crowdsourcing.

Therefore, the objective of this thesis is to design worker incentives in crowdsourcing-based cooperative task solving, with the aim of improving the quality of the task solution. Of particular interest are works that aim at task-solving model construction, worker behavior analysis, incentive design, and experimental implementation.

1.2 Approaches and Issues

In order to address these issues on incentive design in crowdsourcing-based cooperative task solving, we take the following steps in this thesis.

First, we construct the cooperative problem-solving model by defining two task decompositions: horizontal task decomposition for independent subtasks, and vertical task decomposition for dependent subtasks. We also show that the dependence among subtasks can be formalized as the degree to which a subtask's difficulty depends on the qualities of the other subtasks. Based on the formalized model, we rigorously analyze the strategic behaviors of the workers, with the aim of understanding the relationship between workers' strategic effort exertion and the reward in cooperative task solving. Simulations are then conducted to analyze and compare the efficiency of the two task decompositions, aiming at generating explicit instructions on strategies for optimal task decomposition. It is worth noting that we view the workers as self-interested throughout, as this is the fundamental premise of incentive design. Finally, proofreading experiments on Amazon Mechanical Turk are designed and implemented. The empirical results collected in these experiments verify the reasonableness of the model, and show the promise of our proposed instructions for vertically decomposing the proofreading task to improve the quality of the final output.
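The intuition behind the two decompositions can be illustrated with a deliberately simplified toy: under horizontal decomposition each subtask contributes independently to the final quality, while under vertical decomposition each worker inherits, and is constrained by, the quality produced so far. The quality functions and the `dependence` parameter below are hypothetical stand-ins for illustration only, not the formal model developed in Chapter 3.

```python
# Toy illustration (hypothetical quality functions, not the thesis's model).

def horizontal_quality(efforts):
    """Independent subtasks: final quality aggregates subtask qualities."""
    return sum(efforts) / len(efforts)

def vertical_quality(efforts, dependence=0.5):
    """Dependent subtasks: each worker improves the inherited quality.
    `dependence` sets how much a low inherited quality makes the next
    step harder (the dependence-degree idea, in caricature)."""
    quality = 0.0
    for effort in efforts:
        effective = effort * (1 - dependence * (1 - quality))
        quality = quality + effective * (1 - quality)
    return quality
```

With `dependence=0` the vertical chain simply compounds improvements, which is why a sequence of equal efforts can end above the horizontal average.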

Secondly, we address the design of an incentive scheme that allows the task requester to maximize the expected output quality, taking into account the sequential quality dependence among subtasks formalized in the above model. This is particularly difficult in crowdsourcing marketplaces where the task requester posts subtasks with their payments sequentially and pays the predetermined payment to the worker once the subtask is finished. Markov Decision Processes (MDPs) are capable of capturing the task-solving process's uncertainty about the real abilities of a succession of workers, that is, the uncertainty about the qualities of the sequential subtasks. To the best of our knowledge, our work in this thesis is the first attempt to apply MDPs to payment allocation for sequentially dependent subtasks in crowdsourcing. By modeling this problem as an MDP, we show the value of MDPs for optimizing the quality of the final solution by dynamically determining the budget allocation for sequential and dependent subtasks under budget constraints. Finally, a simulation-based approach verifies that, compared to several benchmark approaches, our MDP-based payment planning is more efficient at optimizing the final quality under the same budget constraints.
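To convey the flavor of such a planner, the sketch below runs finite-horizon value iteration over a deliberately simplified state space. The discrete quality levels, unit payments, and the improvement probability `p_improve` are illustrative assumptions of this sketch, not the MDP formulated later in the thesis.

```python
# Toy finite-horizon dynamic program for sequential payment planning.
# Assumption (illustrative only): paying a worker `pay` budget units
# raises quality one level with probability p_improve(pay); otherwise
# the quality is unchanged.

def plan_payments(n_subtasks, budget, qualities, payments, p_improve):
    """V[k][b][q] = best expected final quality with k subtasks left,
    b budget units left, and current quality index q."""
    V = [[[0.0] * len(qualities) for _ in range(budget + 1)]
         for _ in range(n_subtasks + 1)]
    for b in range(budget + 1):              # terminal values
        for qi, q in enumerate(qualities):
            V[0][b][qi] = q
    for k in range(1, n_subtasks + 1):
        for b in range(budget + 1):
            for qi in range(len(qualities)):
                best = V[k - 1][b][qi]       # option: pay nothing this step
                for pay in payments:
                    if pay > b:
                        continue
                    hi = min(qi + 1, len(qualities) - 1)
                    p = p_improve(pay)
                    best = max(best, p * V[k - 1][b - pay][hi]
                                     + (1 - p) * V[k - 1][b - pay][qi])
                V[k][b][qi] = best
    return V
```

Under these toy assumptions, the same table also yields the payment to post at every state, which is exactly what the requester needs at posting time.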


The third issue we address in this thesis is mitigating the uneven distribution problem in solution examination, so as to manage the quality of the subtasks' solutions. To achieve this goal, we take on the challenge of balancing freedom and restriction in solution selection. It is noteworthy that crowdsourcing provides a policy of maximum freedom of task selection, which encourages crowds to solve tasks based on their own preferences and has thus become indispensable to crowdsourcing's prosperity. Because of that, the traditional assignment problem [Kuhn, 1955], which has been well studied for assigning a given set of workers with varying preferences to a given set of jobs so that maximum efficiency is achieved, is not appropriate for the crowdsourcing situation. By contrast, a division strategy is designed and shown to be highly effective. The basic idea of the division strategy is to control the workers' skill distribution in each division by strategically dividing workers with different skills into divisions, and to restrict solution examination to within a division, i.e., workers can only select from the solutions produced by workers in the same division. By doing this, we are able to guide workers to select codes whose quality levels correspond to their own skill levels. Therefore, the division strategy can reliably produce a more uniform distribution of examined solutions, and thus an improvement in the subtasks' qualities.
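As a concrete, purely illustrative sketch of the division step, the helper below sorts workers by skill and deals them out round-robin, so every division receives a similar spread of skill levels. The skill scores and division count are hypothetical parameters of this sketch, not quantities prescribed by the thesis model.

```python
# Illustrative division strategy: deal skill-ranked workers round-robin
# into divisions; solution examination is then restricted to a division.

def make_divisions(worker_skills, n_divisions):
    """Return lists of worker ids; each division gets a similar skill spread."""
    ranked = sorted(range(len(worker_skills)),
                    key=lambda w: worker_skills[w], reverse=True)
    divisions = [[] for _ in range(n_divisions)]
    for rank, worker_id in enumerate(ranked):
        divisions[rank % n_divisions].append(worker_id)
    return divisions
```

Changing how the dealing is done changes the skill mixing and skill similarity of the resulting divisions, which is exactly the lever the strategy tunes.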

1.3 Thesis Outline

This thesis consists of six chapters, including this introduction.

Chapter 2 provides a brief review of the literature on mechanism design in crowdsourcing. By exploring the related literature, we identify both the essential characteristics of practical crowdsourcing systems and the crucial research issues in crowdsourcing mechanism design. Furthermore, we classify the dominant mechanism design approaches according to those issues. Since there are many crucial research topics in mechanism design for crowdsourcing, we mainly focus on the issue of incentive design for solution quality assurance and efficiency improvement.

In order to facilitate crowdsourcing-based task solving, complex tasks are decomposed into smaller subtasks that can be executed either sequentially or in parallel by workers. These two task decompositions have attracted plenty of empirical exploration in crowdsourcing. However, the absence of a formal study makes it difficult to provide task requesters with explicit guidelines on task decomposition. In Chapter 3, we formally present and analyze these two task decompositions as the vertical and horizontal task decomposition models. Our focus is on addressing the efficiency (i.e., the quality of the task's solution) of task decomposition when the self-interested workers are paid in two different ways: paid equally, or paid based on their contributions. By combining theoretical analyses of worker behavior with simulation-based exploration of the efficiency of task decomposition, our study 1) shows the superiority of vertical task decomposition over horizontal task decomposition in improving the quality of the task's solution; and 2) gives explicit instructions on strategies for optimal vertical task decomposition under both revenue sharing schemes to maximize the quality of the task's solution. Furthermore, proofreading experiments on Amazon Mechanical Turk are designed and implemented. The empirical results collected in these experiments verify the reasonableness of the model, and show the promise of our proposed instructions for vertically decomposing the proofreading task to improve the quality of the final output.

Aiming to maximize the quality of the final solution subject to the self-interested workers' utility maximization, a key challenge is to allocate the limited budget among the sequential subtasks. This is particularly difficult in crowdsourcing marketplaces where the task requester posts subtasks with their payments sequentially and pays the predetermined payment to the worker once the subtask is finished. Our study in Chapter 4 is the first attempt to show the value of Markov Decision Processes for optimizing the quality of the final solution by dynamically determining the budget allocation for sequentially dependent subtasks under budget constraints and uncertainty about the workers' abilities. Our simulation-based approach verifies that, compared to several benchmark approaches, the proposed MDP-based payment planning is more efficient at optimizing the final quality under the same limited budget.

In Chapter 5, we construct and analyze a crowdsourcing-based bug detection model in which strategic players select code and compete in bug detection contests. We model the contests as all-pay auctions, and our focus is on addressing the low-efficiency problem in bug detection through the division strategy. Our study shows that the division strategy can control two features of a bug detection contest, the expected reward classes and the scales of skill levels, by intentionally assembling players with a particular skill distribution in one division. In this way, the division strategy is able to shape the players' strategic code selection behaviors, and thus improve bug detection efficiency. We analyze the division strategy as characterized by the skill mixing degree and the skill similarity degree, and find an explicit correspondence between the division strategy and bug detection efficiency. Based on our simulation results, we verify that the skill mixing degree, serving as the determinant factor of the division strategy, controls the trend of the bug detection efficiency, while the skill similarity degree plays an important role in indicating the shape of the bug detection efficiency.

Finally, Chapter 6 concludes the thesis by summarizing the results of this research. We also discuss possible future work in that chapter.


Chapter 2

Background

Crowdsourcing means getting thousands of talented people to work on large or small business-related tasks that a company would normally either perform itself or outsource to a professional third-party provider. Crowdsourcing is now admired as one of the best ways of leveraging the collective intelligence of crowds, where groups of people, both professionals and amateurs, outperform individual experts. Although crowdsourcing promises task providers significant cost savings, quicker task completion times, and the formation of expert communities, many aspects of its decentralized management are still under vigorous refinement, in order to motivate the participants to exert their best effort, independently or cooperatively.

Mechanism design, which emerged from economic game theory in the 1970s, is now shaking hands with information technology and coming to the rescue of crowdsourcing management. Indeed, the basic ideas of mechanism design and its theoretical conclusions, built on a formal mathematical base, are enabling advances in crowdsourcing management technology. In this chapter, we review the current research on incentive design for crowdsourcing, trying to 1) identify both the essential characteristics of practical crowdsourcing systems and the crucial research issues in crowdsourcing incentive design, and 2) classify the dominant mechanism design approaches according to those issues, and point out feasible or potential (relatively new) approaches to mechanism design in crowdsourcing. This survey mainly focuses on the issue of incentive design in crowdsourcing.

This chapter is arranged as follows. Section 2.1 goes through the current research literature on real crowdsourcing platforms and applications, aiming to classify these platforms based on their uses and to characterize their main features. Research issues in crowdsourcing incentive design are discussed in Section 2.2. In Section 2.3, we classify the dominant incentive design approaches according to those research issues, focusing especially on incentive design for solution quality assurance and efficiency improvement. A summary is given at the end of the chapter.

2.1 Characteristics of Crowdsourcing Applications

As an emerging and successful business model [Howe, 2009], crowdsourcing applications are implemented by various organizations in the hope that participation in an online open call results in the design of products or the solving of problems. Such crowdsourcing platforms have mushroomed in the past few years (see Table 2.1). The table shows a set of basic crowdsourcing system types. The set is not meant to be exhaustive; it shows only those platforms that have received the most attention.

Table 2.1: Examples of crowdsourcing applications.

TopCoder (topcoder.com): As on InnoCentive, companies can post software development problems on TopCoder.com, and the crowd competes to provide the best solution. Additionally, TopCoder.com itself offers contests to help find the best programmers.

Amazon Mechanical Turk (mturk.com): An online labor market where workers are recruited by employers for the execution of tasks in exchange for a reward.

Taskcn (taskcn.com): One of the biggest Witkey websites in China, where users offer monetary awards for solutions to problems and other users provide solutions in the hope of winning the awards. Taskcn.com has more than 1.7 million registered users.

Stack Overflow (stackoverflow.com): A language-independent, collaboratively edited question-and-answer site for programmers. It encourages professional and enthusiast programmers to ask practical, answerable questions based on actual problems that they face.

Yahoo! Answers (answers.yahoo.com): An online question-and-answer forum that allows users to ask questions about a variety of topics. Other users compete to provide the best answer to a posted question; the winner is decided by the questioner.

Threadless (threadless.com): A web-based t-shirt company that crowdsources the design process for its shirts through ongoing online crowd contests. The winner is decided by the votes of the same crowd.

Wikipedia (wikipedia.org): An online encyclopedia established in 2001. Mainly because its content is publicly editable, it has become the most prolific collaborative authoring project ever sustained in an online environment.

2.1.1 Crowdsourcing Community

Some researchers tend to clearly differentiate the crowdsourcing model from the virtual community model; Haythornthwaite [Haythornthwaite, 2009] declares that a crowdsourcing model is based on micro-participation from individuals who are unconnected, whereas a virtual community model is based on strong connections among a committed set of connected members. We respectfully disagree with this opinion. Although crowdsourcing starts with decentralization, by sourcing tasks traditionally performed by specific individuals within a company to a group of people through an open call (unlike sites such as Twitter or Facebook, which do not begin with an open call for contributions), the people (crowds) engaged in a crowdsourcing open-call task are "connected" in the following ways [Zhang et al., 2007].

− Similar domain knowledge. Crowds working on the same task, whether competitively or cooperatively, possess similar domain knowledge, which provides the foundation for them to communicate. Moreover, some crowdsourcing websites tend to provide support and information to people in certain fields. For instance, Yahoo! Answers and Stack Overflow are both Q&A platforms, as mentioned in Table 2.1; however, the latter specializes in programming questions.


− Communication. The Web is an instant communications platform on which exchanging messages, and thus ideas, is convenient. Most crowdsourcing websites that display content now also allow user comments and discussions, and attract a huge volume of user posts. For instance, TopCoder provides a public forum for program developers in every competition category, and it also allows face-to-face meetings.

− Public opinion. Since crowdsourcing platforms involve significant interactions between users, such as forum discussions and face-to-face meetings, public opinion toward a person or a group of people forms gradually.

2.1.2 Crowdsourcing Platforms

Collective intelligence [Wolpert and Tumer, 1999] refers to a large distributed collection of interacting computational processes with little to no centralized communication or control, involving groups of individuals collaborating or competing to create synergy, something greater than each individual part [Surowiecki, 2004]. Crowdsourcing is a special case of such collective intelligence. It leverages the wisdom of crowds and is already changing the way groups of people produce knowledge, generate ideas, and make them actionable [Surowiecki, 2005].

General-purpose.

General-purpose crowdsourcing platforms provide a market in which anyone can post tasks to be completed and specify the prices paid for completing them. The inspiration for such systems was to have human users complete simple tasks that would otherwise be extremely difficult (if not impossible) for computers to perform. Examples of such problems in practice include object recognition, language understanding, text summarization, ranking, and labeling [Von Ahn and Dabbish, 2004] [Fuxman et al., 2008].


− Amazon's Mechanical Turk. The Amazon Mechanical Turk system manages task submission, assignment, and completion, matches qualified people with tasks that require particular skills, provides a feedback mechanism to encourage quality work, and stores task details and results, all behind a Web services interface. The first paper investigating Mechanical Turk as a user study platform has amassed over one hundred citations in the past years [Kittur et al., 2008].

− Witkey websites are a new type of knowledge market website, in which a user offers a monetary award for a question or task and other users compete for the award. Taskcn.com [Yang et al., 2008b] is one of the biggest Witkey communities in China.

Knowledge sharing.

Crowdsourcing not only lets users capture and validate data, but also lets them share vast amounts of "data" such as products, services, textual knowledge, and structured knowledge [Doan et al., 2011].

− Systems that share products and services include Napster, YouTube,and CPAN.

− Systems that share textual knowledge include mailing lists, Twitter, how-to repositories (such as ehow.com, which lets users contribute and search how-to articles), Q&A websites [Nam et al., 2009] [Harper et al., 2009] (such as Stack Overflow, Yahoo! Answers [Adamic et al., 2008], Yelp [Luca, 2011], and Google Answers, which is no longer accepting questions), and online customer support systems (such as QUIQ, which powered Ask Jeeves' AnswerPoint, a Yahoo! Answers-like site).

− Systems that share structured knowledge (for example, relational, XML, or RDF data) include Swivel, Many Eyes, Google Fusion Tables, Google Base, and many e-science websites (such as bmrb.wisc.edu and galaxyzoo.org).

Creative co-creation.

Crowdsourcing also uses the innovative potential of customers to drive innovation. For example, the demands, wishes, and requirements of customers are often used systematically for new product development. In crowdsourcing, customers are treated not only as the recipients of products, but also as a source of innovation. Examples of such crowdsourcing platforms include:

− Spreadshirt - shirt community
− JuJups - personalized gifts
− Threadless - create and sell your t-shirts
− Naked&Angry - Threadless for ties and wall coverings
− Cafepress - shop, create or sell what's on your mind
− Zazzle - create and sell products
− CreateMyTattoo - crowdsourced tattoo design
− Sellaband - crowdfunded bands
− Artistshare - fans funding new artists
− Quirky - community product development
− Jovoto - co-creation and mass collaboration
− Dream Heels - design your dream heels

2.2 Research Issues in Crowdsourcing

2.2.1 Strategic Behavior

Task selection

It is noteworthy that crowdsourcing provides a policy of maximum freedom of task selection for the problem answerers. Empirical studies of task selection data yield the following main conclusions:


− Yang et al. [Yang et al., 2008b] explored the relationship between task properties and users' participation on Taskcn.com. Task property variables that influence participation include task category, skill requirement, workload, and reward; details can be found in their work.

− Statistics show that tasks with relatively low rewards have a high probability of being chosen and solved. Problem posters, or seekers in InnoCentive parlance, pay solvers anywhere from $10,000 to $100,000 per solution. According to Lakhani et al. [Lakhani et al., 2007], 30% of all problems on InnoCentive have been solved. Dean [Dean and Image, 2008] shows that the rewards for those solved problems are typically in the $10,000 to $25,000 range.

− An empirical study of Topcoder.com [Archak, 2010] shows that, in order to deter the entry of opponents into the contests they like, higher-rated contestants register early in a contest, while lower-rated contestants prefer to wait until the higher-rated opponents have made their choices.

− Users are increasingly likely to choose tasks with fewer competitors [Yang et al., 2008b]. Intentionally choosing less popular tasks to participate in can potentially enhance winning probabilities, even if one's own expertise remains the same.

These findings imply that users with high abilities tend to compete with users with low abilities on tasks requiring low skill. This brings some serious problems; for example:

1. Users with low abilities have a small probability of winning. Such users, though interested in participating, might be hindered by the very likely futility of their efforts, since winning plays a crucial role in continued contribution [Yang et al., 2008b].

2. Posted problems that require high skill are unlikely to be solved, even if they offer a high reward.

Timing of submission

The timing of users' submissions is an important participation dynamic, since users can choose to submit early, or wait and see how many other submissions a task receives. Empirical analysis of the solution submission data from Taskcn.com shows [Yang et al., 2008b]:

− For all users, a higher task award correlates with users submitting solutions later.

− Users, winners (those who have won at least once) and non-winners alike, are likely to submit slightly later as they participate over time.

− The number of submissions is negatively correlated with the time at which people submit.

Uneven participation and outcome

Investigating the users' structural prestige [Barabasi and Albert, 1999] reveals an evident scale-free nature, which indicates uneven interactions among the users. The majority of users participate in a few tasks and win fewer, while an extremely small group of users accounts for many submissions and wins. Specifically, by examining the behavior of users of the web knowledge-sharing market Taskcn.com, Yang et al. [Yang et al., 2008b] [Yang et al., 2008a] find significant variation in the expertise and productivity of the participating users: a very small core of successful users contributes nearly 20% of the winning solutions on the site.


2.2.2 Reputation Systems

Crowds are engaged, and the wisdom of crowds is used, in a number of ways, e.g., by providing members with access to knowledge of interest in return for their contributions, commonly in monetary form, or by enabling them to build their own digital reputations. Jøsang, Ismail and Boyd [Jøsang et al., 2007] give an overview of existing and proposed systems that can be used to derive measures of trust and reputation for Internet transactions, analyze the current trends and developments in this area, and propose a research agenda for trust and reputation systems.

Typical proposals in reputation system design research are:

− Providing some measure of protection for market participants against dishonest traders, since in the field of multi-agent systems the success of an agent may depend on reliable partners. A particular focus has been on electronic marketplaces such as eBay [Resnick et al., 2006] [Houser and Wooders, 2006].

− Providing an incentive for good behavior to achieve a positive effect on market quality [Zhang and Van der Schaar, 2012].

− Using user reputation as a heuristic to identify and promote high-quality contributions [Tausczik and Pennebaker, 2011] [Sylvan, 2010] [Harper et al., 2008].
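One simple and widely cited scheme from this literature is the beta reputation model, in which an agent's reputation is the posterior mean of a Beta distribution over its counts of positive and negative feedback. The sketch below shows only this basic form, without the discounting and feedback-filtering refinements that deployed systems add.

```python
# Basic beta reputation: with r positive and s negative feedback reports,
# the expected reputation is (r + 1) / (r + s + 2), i.e. 0.5 for a
# newcomer with no history.

def beta_reputation(positive, negative):
    """Posterior mean of Beta(positive + 1, negative + 1)."""
    return (positive + 1) / (positive + negative + 2)
```

Because the prior pulls estimates toward 0.5, a newcomer cannot instantly claim a perfect score, which is one reason this form is popular for marketplace reputation.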

2.2.3 Efficiency and Quality Issue

Nielsen [Nielsen, 2006] analyzed the phenomenon of "participation inequality" in online communities. Nielsen advocates a general 90-9-1 rule: in most online communities, 90% of users are lurkers who never contribute, 9% of users contribute a little, and 1% of users account for almost all the action. On this basis, participation inequality is deemed to be a function of human behavior and one that is almost impossible to overcome.

Stewart et al. [Stewart et al., 2010] argue for a SCOUT (Super contributor, Contributor, OUTlier) framework wherein 33% are the (OUT)liers who do very little, 66% are the (C)ontributors who provide moderate contributions, and 1% are the (S)uper contributors who offer super effort.

2.3 Mechanism Design in Crowdsourcing

The development of mechanism design theory began with the work of Leonid Hurwicz [Hurwicz, 1960]. He defined a mechanism as a communication system in which participants send messages to each other and/or to a "message center", and where a pre-specified rule assigns an outcome (an allocation of resources) to every collection of received messages. Since then, much of the interest has focused on two distinguishing strands: (i) the informational and computational costs of this decentralized resource allocation mechanism; and (ii) how the desired outcome can be achieved when individuals act according to their own best interests, that is, how to deal with the goal conflicts arising from the multiplicity of individuals. Hurwicz introduced the key notion of incentive compatibility [Hurwicz, 1972], which enables the analysis to incorporate the incentives of agents who are self-interested and have private information.

Mechanism design based on game theory is applied to tackle the issues we discussed in Section 2.2 [Jain and Parkes, 2009]. In this survey, we mainly focus on incentive design with the purpose of improving solution quality and efficiency in crowdsourcing.

In terms of designing incentives to influence an agent's behavior when the agent's preferences are unknown, this work is related to the work of Zhang et al. [Zhang et al., 2009] on environment design and policy teaching. Environment design considers the problem of perturbing agents' decision problems in order to influence their behavior. Policy teaching considers the particular problem of influencing the policy of an agent following a Markov Decision Process by assigning rewards to states.

Traditional economic theory has generally espoused the view that rational workers will choose to improve their performance in response to a scheme that rewards such improvements with financial gain. Empirical research shows evidence that monetary rewards do encourage people's participation [Yang et al., 2008b] [Mason and Watts, 2010] and motivate people to state their real preferences [Jurca and Faltings, 2007]. DiPalantino and Vojnovic [DiPalantino and Vojnovic, 2009] suggest that a monetary award mechanism should be designed to induce appropriate levels of participation and quality of submissions. They model contests as all-pay auctions and demonstrate the precise relationship between incentives and participation, including task selection in crowdsourcing contests.
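As a point of reference for this all-pay auction modeling, the standard textbook benchmark (not the specific contest model of [DiPalantino and Vojnovic, 2009]) considers n risk-neutral bidders whose values are drawn i.i.d. uniformly on [0, 1]; in the symmetric equilibrium, a bidder with value v sinks a bid of b(v) = ((n - 1) / n) v^n.

```python
# Symmetric all-pay auction benchmark: n bidders, i.i.d. uniform values
# on [0, 1]. Every bidder forfeits their bid; only the highest bid wins.

def allpay_bid(value, n_bidders):
    """Equilibrium bid b(v) = ((n - 1) / n) * v ** n."""
    return (n_bidders - 1) / n_bidders * value ** n_bidders
```

Bids rise steeply with value but, for any value below the maximum, shrink as more bidders enter, which is the participation-versus-effort trade-off such contest models quantify.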

Contests in which multiple prizes are awarded are ubiquitous and have become an effective way to motivate all users to maximize their effort. Research on understanding the incentive effects of multiple prizes on effort investment has attracted great attention [Clark and Riis, 1998a] [Clark and Riis, 1998b] [Moldovanu and Sela, 2001] [Cason et al., 2010]. The very general conclusion is that a first prize always results in a positive incentive to invest effort, while second and later prizes have ambiguous effects.

Crowdsourcing platforms such as Wikipedia harness collaborative effort to accomplish comparatively complicated tasks. Cooperative game theory and the Shapley value have been used to design revenue sharing schemes that incentivize users [Abbassi and Misra, 2011].
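For intuition, the Shapley value pays each participant the average of their marginal contribution over all orders in which the coalition could form. The brute-force sketch below (with a made-up characteristic function in mind, and factorial cost, so only for toy sizes) illustrates the quantity such revenue sharing schemes allocate.

```python
from itertools import permutations

# Exact Shapley values for a small cooperative game; v maps a frozenset
# of players to the value that coalition produces on its own.

def shapley_values(players, v):
    """Average each player's marginal contribution over all join orders."""
    shares = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            shares[p] += v(with_p) - v(coalition)  # marginal contribution
            coalition = with_p
    return {p: s / len(orders) for p, s in shares.items()}
```

By construction the shares sum to the grand coalition's value, which is exactly the budget-balance property a revenue sharing scheme needs.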

Although some research reveals a positive relationship between financial incentives and performance in crowdsourcing systems, other work reaches different conclusions [Hars and Ou, 2002], indicating that increased financial incentives increase the quantity, but not the quality, of work performed by participants [Mason and Watts, 2010]. Similar conclusions were obtained by research on Google Answers: Chen et al. [Chen et al., 2010] [Jain et al., 2009] find that posting a higher prize leads to a significantly longer answer, but not a better one.

Instead of relying on monetary rewards, some incentive designs have turned to non-monetary rewards that take the form of reputation points and confer a measure of social status within the community. Zhang and van der Schaar [Zhang and Van der Schaar, 2012] provide incentives for workers to exert effort using a novel game-theoretic model based on a reputation system. Yan and Van Roy [Yan and Van Roy, 2008] develop a rational expectations equilibrium model to study how incentives created by reputation markets influence community behavior.

For the crowdsourcing contest situation, another effective approach to incentive design is to control the contest architecture [Chawla et al., 2012] [Cavallo and Jain, 2012]. A contest architecture specifies how the contestants are split among several sub-contests whose winners compete against each other (while the other players are eliminated). Moldovanu and Sela [Moldovanu and Sela, 2006] compare the performance of such dynamic schemes to that of static winner-take-all contests from the point of view of a designer who maximizes either the expected total effort or the expected highest effort. There is a sizable literature on the optimal (effort-maximizing) structure of multi-stage contests [Moldovanu and Sela, 2006] [Kaplan and Sela, 2010] [Fu and Lu, 2012] [Fu and Lu, 2009].

The traditional job assignment problem has been well studied: it assigns a given set of workers with varying abilities to a given set of jobs in such a way that the overall working time is kept to a minimum. The assignment literature, surveyed by Sattinger [Sattinger, 1993], distinguishes between models in which assignment is based on workers' preferences over job characteristics, comparative advantage, and production complementarities across jobs. All three types of assignment models share one fundamental assumption: workers are endowed with complete information about their own abilities, or there are activities (schooling is an example) that generate information about a worker's ability. Based on this ability information, workers of higher ability are optimally assigned to jobs with more capital, unilaterally by firms. However, that is not the case in the crowdsourcing situation. First of all, as mentioned before, the crowds are "undefined". It is hard to distinguish professionals from amateurs, since crowd workers can register with very basic personal information. Secondly, firms cannot designate a worker whom they consider competent to do a specific job, because crowdsourcing grants the crowds maximum freedom of task selection; otherwise, the incentives for the crowds to participate could be dramatically hurt.


2.4 Summary

Some researchers point out pitfalls in mechanism design for crowdsourcing. Robert Kleinberg, a computer science professor at Cornell University, notes that mechanisms may be so complicated that users do not understand them and hence will not participate; it is not sufficient to tell them, "don't worry, I have proved a theorem that the best thing for you to do is such-and-such". This indicates that mechanisms designed for crowdsourcing systems should be as simple and explainable as possible, which is a tough problem.

This chapter provides a brief review of the literature on mechanism design in crowdsourcing. By exploring the related literature, we identify both the essential characteristics of practical crowdsourcing systems and the crucial research issues in crowdsourcing mechanism design. Furthermore, we classify some dominant mechanism design approaches according to those diverse issues. Since there are many crucial research topics in mechanism design in crowdsourcing, we mainly focus on the issue of incentive design for solution quality assurance and efficiency improvement.


Chapter 3

Efficient Task Decomposition in Crowdsourcing

3.1 Introduction

A crowdsourcing system is an online, distributed problem-solving and web-based business model. It is now admired as one of the most lucrative paradigms for leveraging collective intelligence to carry out a wide variety of tasks of various complexity. This success depends on the potential to replace a limited number of skilled experts with a multitude of unskilled crowd workers, by decomposing complex tasks into smaller subtasks, such that each subtask becomes low in complexity and requires little specialized skill, time, and cognitive effort to be completed by an individual.

After decomposing a complex task into multiple small subtasks, a collective of crowd workers execute the subtasks either independently or dependently, depending on whether there are dependencies among the subtasks [Malone et al., 2009]. When the subtasks are structured independently, multiple workers are recruited to collaborate in parallel, and a subtask's quality depends only upon the effort of the worker who performs it. By contrast, when there are dependencies among the subtasks, workers are organized to collaborate sequentially, and a subtask's quality depends on the efforts of the multiple workers who jointly produce the output. In the sequential process, subtask dependence is mainly characterized by the striking feature that one worker's output is used as the starting point for the following worker, which supports the assumption that the following worker can do better when building on a high-quality solution from the previous worker (see [Kulkarni et al., 2012] [Kulkarni et al., 2011]). The sequential process has been implemented for various types of tasks and exhibits outstanding effectiveness. For instance, CrowdForge [Aniket Kittur and Kraut, 2011] and Soylent [Bernstein et al., 2010] delineate case studies on article writing and word processing, respectively, with sequential processes, and both produce high-quality final outcomes.

Our research focuses on complex task decomposition in crowdsourcing, wherein complex tasks can be decomposed and executed in both independent and dependent ways. Of particular interest is work that aims at analyzing workers' strategic behaviors, comparing the efficiency of the two task decompositions in terms of final quality, and finally generating explicit instructions on strategies for optimal task decomposition. We define the two task decompositions as horizontal task decomposition for independent subtasks and vertical task decomposition for dependent subtasks. To illustrate the concepts, we refer to the following crowdsourcing-based proofreading task as an example, wherein an article containing three paragraphs requires spelling, style, and grammar error correction. In this context, the article could be horizontally decomposed into three subtasks, each containing one paragraph and performed independently by one worker, after which the revised paragraphs are combined into a coherent article. Meanwhile, the original article could also be vertically decomposed into three sequential subtasks, as in the "Find-Fix-Verify" pattern proposed by Bernstein et al. [Bernstein et al., 2010]. Specifically, the Find stage asks a worker to identify the patches that need proofreading throughout the whole article. The Fix stage recruits a worker to correct as many of the errors in the patches as she can. In the Verify stage, all the corrections made in the Fix stage are verified by another worker.

Absent explicit task decomposition guidelines, it is difficult for task requesters to decide how to decompose a task properly. Although some comparative studies of horizontal and vertical task decomposition strategies have been conducted experimentally [Aniket Kittur and Kraut, 2011], only implicit or even ambiguous conclusions have been drawn. Aiming at providing task requesters with explicit guidelines on efficient task decomposition, we first construct formal models for vertical and horizontal task decompositions, which respectively specify the relationship between subtask quality and the worker's effort level in the presence of positive and no dependence among subtasks. Then, the efficiency of the task decompositions is explored under two revenue sharing schemes, group-based sharing (i.e., equal sharing) and contribution-based sharing, which have received most of the attention so far [Mason and Watts, 2010].

At a broad level, the task requester first decomposes the complex task into smaller subtasks, and then presents them to the workers with associated rewards through crowdsourcing websites, seeking to maximize the quality of the task's solution, which is positively related to the efforts devoted by all the workers. The workers perform the subtasks either sequentially or in parallel, and each of them is self-interested and devotes an amount of effort to her own subtask for individual utility maximization. A complex task is decomposed into subtasks with varying intrinsic difficulties. We define subtask dependence by characterizing how a subtask's difficulty can be altered by the quality of the previous subtask.

We summarize our main contributions as follows. In Section 3.3, we formally construct the models for both vertical and horizontal task decompositions. We show that the dependence among subtasks can be formalized as the degree to which a subtask's difficulty depends on the qualities of the other subtasks. In Section 3.4, we rigorously analyze the strategic behaviors of the workers, and find that the contribution-based revenue sharing scheme provides more incentive for workers to exert higher effort on difficult subtasks. In Section 3.5, we conduct simulations to analyze and compare the efficiency of the two task decompositions, aiming at generating explicit instructions on strategies for optimal task decomposition. We conclude that when the final quality is defined as the sum of the qualities of the solutions to all decomposed subtasks, the vertical task decomposition strategy outperforms the horizontal one in improving the quality of the final solution, and we give explicit instructions on the optimal strategy (i.e., the arrangement of subtasks with different difficulties for final quality maximization) under vertical task decomposition from the task requester's point of view. Specific examples are also demonstrated to show how the instructions can be utilized for constructing an optimal strategy and for selecting the best one from given strategies.

Furthermore, based on the above studies and results, we design and conduct two series of proofreading experiments on Amazon Mechanical Turk. The first series of experiments compares the efficacy of vertical and horizontal decomposition on a proofreading task. The second series focuses on vertical task decomposition and compares situations in which the first subtask has different difficulties. The experimental design is detailed in Section 3.6. We present the collected experiment data in Section 3.7. According to the data analysis, we 1) verify the superiority of the vertical task decomposition strategy over the horizontal one on the proofreading task; and 2) apply some of our proposed instructions to vertically decomposing the proofreading task, which shows promise in improving the quality of the final outcome.

3.2 Related Work

In order to harness crowdsourcing marketplaces, such as MTurk [Ipeirotis, 2010], to accomplish complex tasks, efficient task decomposition in workflow design has received considerable attention from many researchers.

CrowdForge is a rigidly defined framework for task decomposition [Aniket Kittur and Kraut, 2011]. It delineates a case study on article writing, which consists of three sequential subtasks: outline construction, information collection, and writing. On average, the five articles produced by the above process are of higher quality than those produced by independent individuals. Turkomatic comes up with the innovative idea of asking the crowd to help decompose tasks into sets of subtasks that can be posted in parallel or must be executed serially [Kulkarni et al., 2011]. Afterwards, to guarantee the success of task decomposition, real-time expert intervention was introduced in Turkomatic [Kulkarni et al., 2012]. For specific tasks, particular sequential processes are executed in a previously determined manner. Soylent contributes the three-stage pattern, Find-Fix-Verify, to text editing services in word processing, with high feasibility and efficiency [Bernstein et al., 2010]. Similarly, PlateMate provides Tag-Identify-Measure for nutritional analysis from photos of meals [Noronha et al., 2011]. Although all these workflows show the superiority of vertical task decomposition over horizontal task decomposition in complex task solving, and are evaluated to be efficient empirically, further and explicit analysis of efficient task decomposition is still hindered by the absence of a formal model that takes into account the dependence among subtasks.

Furthermore, the iterative paradigm can be introduced into task decomposition as sub-decompositions that break subtasks down into smaller pieces until they can be solved individually. At the same time, the iterative paradigm can serve as a standalone workflow for complex task solving without task decomposition. TurKit presents the first examples in the literature of iterative workflows for complex tasks [Greg Little and Miller, 2010] [Little et al., 2009]. Such iterative task solving workflows without task decomposition are beyond the scope of this study.

In addition, it is worth noting that, for efficient complex task solving in crowdsourcing, online task assignment has also attracted attention from many researchers. Different from the works that provide efficient solutions for applications with independent subtasks [Ho and Vaughan, 2012] [Tran-Thanh et al., 2013], Tran-Thanh et al. [Tran-Thanh et al., 2014a] were the first to investigate interdependent subtask allocation in crowdsourcing systems, which is the background most relevant to our research.


3.3 The Model

In this section we consider a complex task, e.g., proofreading, that can be either vertically or horizontally decomposed into N (N > 1) subtasks. In both situations, N workers contribute their efforts, such as time and resources, to the N subtasks respectively. The amount of effort exerted by worker i on subtask i is denoted by e_i. In the following, we formally introduce the models for vertical and horizontal task decompositions respectively, by specifying the relationship between subtask quality and the worker's effort level in the presence of positive and no dependence among subtasks.

3.3.1 Vertical Task Decomposition

We first take the Find-Fix-Verify workflow for proofreading as a specific example to highlight the relationship between the subtasks' qualities and the quality of the final solution in the presence of positive dependence among sequential subtasks.

Find-Fix-Verify: This sequential task solving process is mainly characterized by the striking feature that one worker's output is used as the starting point for the following worker. The output of the Find stage is the set of patches that may contain spelling and grammar errors and need corrections or edits. The quality of the patches can be evaluated by how well they cover the true positions (i.e., the positions with actual errors) [Tran-Thanh et al., 2014a]. The output of the Fix stage is the corrections of the errors in the patches identified in the Find stage. The quality of the fix task can be evaluated by the number and the average validity of the proposed corrections. Last, in the Verify stage, workers perform quality control. By either accepting or rejecting the corrections and edits, the verify task further improves the proofreading result.

It is worth noting that although the output of the Verify stage is the final solution of the proofreading task, the quality obtained in the Verify stage is not the quality of the final solution: since the quality of the Verify stage is due solely to the verification effort, it merely contributes an additive portion to the final quality. Find and Fix also contribute to the final quality in this way. In other words, the quality of the final solution can be viewed as the cumulative qualities obtained from all subtasks. Formally, we define the quality of the final solution as follows.

Definition 3.1 (Final quality). The quality of the final solution (Q) to the complex task is the cumulative quality of all the N decomposed subtasks, i.e.,

    Q(e) = ∑_{i=1}^{N} q_i^vertical    (3.1)

where e = (e_1, ···, e_N), and q_i^vertical is the quality function of subtask i.

It is reasonable to assume that the quality of each subtask is positively related to the effort of the worker. Furthermore, since each subtask takes the output of the previous subtask as input, the quality of each subtask is also positively related to the quality of the previous subtask. Formally, we assume the subtask quality function takes the following form.

Assumption 3.1. The quality of subtask i's solution depends not only on the effort exerted by worker i, but also on the quality of the previous subtask's solution, i.e.,

    q_i^vertical = f^i(q_{i-1}^vertical, e_i)    (3.2)

where f^i increases with e_i at a decreasing rate, i.e.,

    ∂f^i/∂e_i > 0  and  ∂²f^i/∂e_i² < 0    (3.3)

Remark 3.1. The recursive definition of f^i directly implies that the quality of subtask i's solution also depends on the efforts of all the prior workers. Hence, we can rewrite Eq. (3.2) equivalently as q_i^vertical = f^i(e_1, ···, e_i), and any increase in the efforts of the previous workers also leads to an improvement in the quality of subtask i, i.e., ∀k ∈ {1, ···, i}, ∂f^i/∂e_k > 0. Last, note that, for the first subtask (i = 1), the quality function simplifies to q_1^vertical = f^1(e_1).
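The recursive structure of Eqs. (3.1)–(3.2) can be sketched in a few lines of Python. The functional form f^i = a_i·√e_i·(1 + q_{i−1}) used below is a hypothetical stand-in, chosen only because it is increasing and concave in e_i and increasing in the previous quality; the thesis does not commit to a specific form.

```python
import math

def q_vertical(efforts, a):
    """Subtask qualities q_i = f^i(q_{i-1}, e_i) as in Eq. (3.2), using the
    hypothetical form f^i = a_i * sqrt(e_i) * (1 + q_{i-1}): increasing and
    concave in e_i, and increasing in the previous subtask's quality."""
    q_prev, qualities = 0.0, []
    for a_i, e_i in zip(a, efforts):
        q_i = a_i * math.sqrt(e_i) * (1.0 + q_prev)
        qualities.append(q_i)
        q_prev = q_i
    return qualities

def final_quality(efforts, a):
    """Final quality Q(e) = sum of all subtask qualities, as in Eq. (3.1)."""
    return sum(q_vertical(efforts, a))

# Remark 3.1: extra effort by an early worker raises every later quality.
base = q_vertical([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
more = q_vertical([2.0, 1.0, 1.0], [0.5, 0.5, 0.5])
assert all(m > b for m, b in zip(more, base))
```

Running the sketch confirms the direction of Remark 3.1: raising an early worker's effort propagates through the recursion and lifts every later subtask's quality.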


Before we illustrate how q_i^vertical depends on q_{i-1}^vertical, we first introduce the concept of subtask difficulty. Take the Find stage for example: articles that consist more frequently of long and compound sentences tend to contain more grammatical errors, and thus require considerable effort for locating the true positions. Also, when the patches are required to be identified at phrase level instead of sentence level, greater effort is demanded. Thus, high difficulty indicates low marginal contribution for the same level of effort, which is formalized as follows.

Definition 3.2 (Subtask difficulty). We endow subtask i with a weight ω_i ∈ (0,1) as its difficulty. Subtask i is said to be more difficult than subtask j (i.e., ω_i > ω_j) iff for any effort level l

    ∂f^i(e_i, e_{-i} = e^{N-1})/∂e_i |_{e_i=l} < ∂f^j(e_j, e_{-j} = e^{N-1})/∂e_j |_{e_j=l}    (3.4)

where e^{N-1} denotes the effort levels of all other N−1 workers. Furthermore, ∑_{i=1}^{N} ω_i = 1.

Remark 3.2. The constraints on the weights (∑_{i=1}^{N} ω_i = 1 and ω_i ∈ (0,1)) indicate that no subtask is replaceable or dispensable, and only if all the subtasks are accomplished is the complex task regarded as accomplished.

Now, we continue with the Find-Fix-Verify example to illustrate how q_{i-1}^vertical affects q_i^vertical by altering the difficulty of subtask i. Like the Find stage, the Fix stage has its own intrinsic difficulty; for example, interacting grammatical errors in multiple patches result in a high level of difficulty. Nevertheless, its difficulty can be altered by the quality of the Find task. For example, a high-quality Find task with phrase-level error locations reduces the difficulty of the Fix task, whereas a low-quality one with noisy patches makes the Fix task more difficult. Combined with Eq. (3.3), we formalize the relationship between previous efforts and the current subtask's difficulty as follows.


[Figure 3.1 omitted: two panels plotting the quality of subtask 1 and the quality of subtask 2 against effort, with points A and B marking the slopes of f^2 in e_2 for e'_1 and e_1.]

Figure 3.1: For a two-subtask situation, where the quality functions for subtasks 1 and 2 are both increasing and concave. The quality increase of the first subtask due to the increase of the effort exerted by the first worker (from e_1 to e'_1) leads to 1) a quantity increase in subtask 2's quality (from f^2(e_1, e_2) to f^2(e'_1, e_2)), and 2) a slope increase (the slope at point A is greater than that at point B), which indicates a productivity improvement for the second worker.

Assumption 3.2 (Quality dependency). The difficulty of subtask i decreases as the efforts on previous subtasks increase, i.e., ∀k ≤ i−1, if e_k < e′_k, then

    ∂f^i(e_{-k}, e_k)/∂e_i < ∂f^i(e_{-k}, e′_k)/∂e_i    (3.5)

Remark 3.3. It is worth noting that an increase in the effort on previous subtasks not only increases the quality of subtask i in quantity (Eq. (3.3)), but also, according to Eq. (3.5), improves worker i's productivity, which enables worker i to make a greater quality improvement by exerting the same level of effort. Fig. 3.1 illustrates the positive quality dependence in a two-subtask situation.
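Assumption 3.2 and the effect shown in Fig. 3.1 can be reproduced numerically. The form f^2(e_1, e_2) = √e_2·(1 + √e_1) below is a hypothetical example that satisfies Eqs. (3.3) and (3.5): it is increasing and concave in e_2, and its slope in e_2 grows with e_1.

```python
import math

def f2(e1, e2):
    """Hypothetical stage-2 quality: increasing in both efforts, concave in
    e2, with a positive cross-effect from the prior worker's effort."""
    return math.sqrt(e2) * (1.0 + math.sqrt(e1))

def slope_e2(e1, e2, h=1e-6):
    """Central-difference estimate of the partial derivative of f2 w.r.t. e2."""
    return (f2(e1, e2 + h) - f2(e1, e2 - h)) / (2 * h)

# Raising worker 1's effort lifts subtask 2's quality and, per Eq. (3.5),
# steepens its slope in e2 (point A vs. point B in Fig. 3.1).
e2 = 1.0
assert f2(2.0, e2) > f2(1.0, e2)
assert slope_e2(2.0, e2) > slope_e2(1.0, e2)
```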

3.3.2 Horizontal Task Decomposition

At a broad level, in the horizontal task decomposition situation, the complex task is decomposed into N subtasks with no interdependencies, which implies ∂²f^i/∂e_j∂e_i = 0 for ∀i, j ∈ {1, ···, N}.

Then, these subtasks are presented to N workers, who devote efforts simultaneously and independently to their own subtasks for individual utility maximization. Finally, the solutions of all subtasks are recomposed into an overall solution, also in a cumulative way. Thus, the quality function of the final solution is defined as in Eq. (3.1), i.e.,

    Q(e) = ∑_{i=1}^{N} q_i^horizontal,

where q_i^horizontal is the quality function of subtask i.

In contrast to vertical task decomposition, where each worker concentrates on a single stage of the workflow (for proofreading: Find, Fix, or Verify), in horizontal task decomposition each worker gives consideration to all stages. This forces each worker to divide her effort among all stages. We assume that the effort e_i, exerted by worker i on subtask i, can be viewed as being distributed among the N stages as in the vertical task decomposition situation, proportionally to the difficulties of the stages. This assumption simplifies the results without sacrificing much in terms of generality. Thus, we can define the subtask quality functions by reusing the form of the quality functions defined for vertical task decomposition, as follows.

Assumption 3.3 (Horizontal subtask quality function). The quality of the solution to subtask i depends only on the effort exerted by worker i, which is distributed among the N stages proportionally to their difficulties. Hence,

    q_i^horizontal = ∑_{k=1}^{N} f^k(ω_1 e_i, ···, ω_k e_i)    (3.6)
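Eq. (3.6) can be sketched under the same hypothetical stage-quality form used earlier; the proportional split ω_k·e_i of the worker's effort is the only structural element taken from Assumption 3.3.

```python
import math

def q_horizontal(e_i, omega, a):
    """Eq. (3.6): worker i spreads effort e_i over the N stages in proportion
    to their difficulties omega_k; stage qualities then accumulate with the
    hypothetical vertical form f^k = a_k * sqrt(omega_k * e_i) * (1 + q_prev)."""
    q_prev, total = 0.0, 0.0
    for omega_k, a_k in zip(omega, a):
        q_k = a_k * math.sqrt(omega_k * e_i) * (1.0 + q_prev)
        total += q_k
        q_prev = q_k
    return total

omega = [0.5, 0.3, 0.2]   # stage difficulties, summing to 1 (Definition 3.2)
a = [0.5, 0.5, 0.5]       # hypothetical stage coefficients
# The subtask quality is increasing in the worker's own effort.
assert q_horizontal(2.0, omega, a) > q_horizontal(1.0, omega, a)
```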

3.3.3 Revenue Sharing Schemes

We formally define two revenue sharing schemes and the correspondingutilities of workers as follows.


Definition 3.3 (Group-based revenue sharing). Under the group-based revenue sharing scheme, each worker receives an equal share of the total revenue given by the task requester, i.e., R_i = Q(e)/N. Therefore, worker i's utility is

    π(e_i) = Q(e)/N − c(e_i).    (3.7)

Definition 3.4 (Contribution-based revenue sharing). Under the contribution-based revenue sharing scheme, each worker receives a reward determined by her marginal contribution to the final task solution, i.e., R_i(e_i) = Q_{e_i}(e_i) · e_i. Thus, worker i's utility is

    π(e_i) = Q_{e_i}(e_i) · e_i − c(e_i)    (3.8)

Each worker incurs a cost for exerting effort to accomplish her own subtask. The more effort one devotes, the more time and resources are required. Since an individual's time and resources are limited, the cost of providing better solutions to subtasks escalates. Based on this argument and previous literature [Holmstrom and Milgrom, 1991], the cost is assumed to increase at an increasing rate and to take the same functional form for all workers.

Assumption 3.4. In order to execute subtask i, worker i exerts an effort level e_i with a cost c(e_i). The cost increases with the effort, and it increases at an increasing rate. That is, the cost function is convex, i.e.,

    c_{e_i} = dc/de_i > 0  and  c_{e_i e_i} = d²c/de_i² > 0    (3.9)
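The two schemes in Definitions 3.3 and 3.4, together with the convex cost of Assumption 3.4, can be written down directly. The quality form q_i = a_i·√e_i and the quadratic cost below are hypothetical choices that satisfy the stated curvature conditions, not the thesis's committed forms.

```python
import math

def cost(e):
    """Convex effort cost (Assumption 3.4); c(e) = e**2 is a hypothetical choice."""
    return e ** 2

def Q(efforts, a):
    """Final quality as the sum of concave subtask qualities a_i * sqrt(e_i)."""
    return sum(a_i * math.sqrt(e_i) for a_i, e_i in zip(a, efforts))

def utility_group(i, efforts, a):
    """Definition 3.3: an equal share Q(e)/N of the revenue, minus own cost."""
    return Q(efforts, a) / len(efforts) - cost(efforts[i])

def utility_contribution(i, efforts, a, h=1e-6):
    """Definition 3.4: marginal contribution dQ/de_i times e_i, minus cost."""
    bumped = list(efforts)
    bumped[i] += h
    dQ = (Q(bumped, a) - Q(efforts, a)) / h
    return dQ * efforts[i] - cost(efforts[i])

# At symmetric efforts, the two schemes pay the same worker differently.
e, a = [1.0, 1.0], [0.5, 0.5]
assert utility_group(0, e, a) > utility_contribution(0, e, a)
```

For these particular numbers the equal share exceeds the marginal-contribution reward; which scheme pays more in general depends on the quality function's curvature.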

3.4 Strategic Behavior Analysis

In this section we aim to demonstrate the relationship between the incentive and the performance, i.e., the level of effort. This section consists of two parts. Individual behavior analysis focuses on the effort devoted to subtasks with different difficulties, and dependent behavior analysis explores the dependent relationship between the effort levels devoted by sequential workers. Our two conclusions from the individual behavior analysis are similar to previous theoretical analyses of team-based worker behaviors [Barua et al., 1995] [Wageman and Baker, 1997]. However, the conclusions on workers' dependent behaviors in the sequential task solving process are our contributions.

In the following discussion, we assume that the exact efforts exerted by all workers are common knowledge. This assumption simplifies the results without sacrificing much in terms of generality, since similar results readily hold when we extend the exact utilities of workers with commonly known exact efforts to the expected utilities of workers with a commonly known effort distribution.

3.4.1 Independent Behaviors of Individuals

Proposition 3.1. Under both vertical and horizontal task decomposition strategies, when the group-based sharing scheme is applied, workers devote more effort to easy subtasks than to difficult subtasks, for individual utility maximization.

Proof. Let subtask i be more difficult than subtask j. Suppose each worker chooses an effort level to maximize her utility, and denote her optimal efforts for subtasks i and j by e*_i and e*_j, respectively. We want to prove e*_i < e*_j. For group-based revenue sharing, the first-order conditions for the workers on subtasks i and j are Q_{e_i}/N − c_{e_i} = 0 and Q_{e_j}/N − c_{e_j} = 0, respectively. Therefore, under vertical task decomposition, we have

    f^i_{e_i}(e*_i)/N − f^j_{e_j}(e*_j)/N = c_{e_i}(e*_i) − c_{e_j}(e*_j).

Suppose e*_i > e*_j; then c_{e_i}(e*_i) > c_{e_j}(e*_j), such that f^i_{e_i}(e*_i) > f^j_{e_j}(e*_j). However, since subtask i is more difficult than subtask j, according to Eq. (3.3) and Eq. (3.4),

    f^i_{e_i}(e_i = e*_i, e_j = e*_j) < f^j_{e_j}(e_i = e*_j, e_j = e*_i) < f^j_{e_j}(e_i = e*_j, e_j = e*_j)

which leads to a contradiction. The supposition e*_i = e*_j also leads to a contradiction. So e*_i < e*_j. Similarly, we can prove e*_i < e*_j for the horizontal task decomposition situation.

Proposition 3.2. Under both vertical and horizontal task decomposition strategies, when the contribution-based sharing scheme is applied, workers devote no less effort to easy subtasks than to difficult subtasks, for individual utility maximization.

Proof. The proof is similar to that for Proposition 3.1.

According to Propositions 3.1 and 3.2, although contribution-based revenue sharing may provide workers with more incentive to perform difficult subtasks than group-based revenue sharing, both schemes indicate that workers are more inclined to perform easy subtasks. This is consistent with findings in worker behavior studies in crowdsourcing (e.g., [DiPalantino and Vojnovic, 2009] [Yang et al., 2008a] [Zheng et al., 2011]), and highlights the need for task design, which we explore with respect to task decomposition in Section 3.5.
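Propositions 3.1 and 3.2 can be spot-checked by a brute-force grid search over effort levels, again using the hypothetical concave quality a_i·√e and quadratic cost (these forms are illustrative, not the thesis's).

```python
import math

cost = lambda e: e ** 2                      # convex cost (Assumption 3.4)
grid = [k / 1000 for k in range(1, 2001)]    # candidate effort levels
N = 3                                        # number of workers

def best_effort_group(a_i):
    """Optimal effort under group-based sharing: maximize a_i*sqrt(e)/N - c(e)
    (only the worker's own term of the equal share varies with her effort)."""
    return max(grid, key=lambda e: a_i * math.sqrt(e) / N - cost(e))

def best_effort_contrib(a_i):
    """Optimal effort under contribution-based sharing: the reward is
    (dQ/de)*e = a_i*sqrt(e)/2 for the quality form a_i*sqrt(e)."""
    return max(grid, key=lambda e: a_i * math.sqrt(e) / 2 - cost(e))

a_easy, a_hard = 0.9, 0.3   # larger coefficient = easier subtask
# Propositions 3.1 and 3.2: easy subtasks attract more effort under both schemes,
assert best_effort_group(a_easy) > best_effort_group(a_hard)
assert best_effort_contrib(a_easy) > best_effort_contrib(a_hard)
# and contribution-based sharing incites more effort on the difficult subtask.
assert best_effort_contrib(a_hard) > best_effort_group(a_hard)
```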

3.4.2 Dependent Behaviors in Vertical Task Decomposition

Under the vertical task decomposition strategy, subtasks are chained together into complementary and sequentially dependent stages in the way specified in Assumption 3.1: as the effort exerted on the prior subtask becomes higher, the increase in the quality of the posterior subtask from the same amount of effort exerted by the following worker becomes larger. In order to explore the dependent relationship between the effort levels of sequential workers, and to avoid notational complexity, we use a two-worker situation in the discussions below. The conclusions readily extend to the N-worker case.

Proposition 3.3. Consider a two-worker task under the group-based revenue sharing scheme. Higher effort on the prior subtask from worker 1 incites worker 2, on the posterior subtask, to exert a higher level of effort, and vice versa. The same holds when the contribution-based revenue sharing scheme is applied.

Proof. Consider a two-worker situation under the group-based revenue sharing scheme. The utility of worker 2 is π(e_2) = (f^1(e_1) + f^2(e_1, e_2))/2 − c(e_2). Her optimal efforts e*_21 for e_1 = e_11 and e*_22 for e_1 = e_12 satisfy the first-order conditions f^2_{e_2}(e_11, e*_21)/2 − c_{e_2}(e*_21) = 0 and f^2_{e_2}(e_12, e*_22)/2 − c_{e_2}(e*_22) = 0. Therefore,

    (1/2) f^2_{e_2}(e_11, e*_21) − c_{e_2}(e*_21) = (1/2) f^2_{e_2}(e_12, e*_22) − c_{e_2}(e*_22).

When e_11 < e_12, suppose e*_21 > e*_22; we have

    c_{e_2}(e*_21) > c_{e_2}(e*_22) ⇒ f^2_{e_2}(e_11, e*_21) > f^2_{e_2}(e_12, e*_22).

However, f^2_{e_2}(e_11, e*_21) < f^2_{e_2}(e_12, e*_22) for e*_21 > e*_22 and e_12 > e_11, which leads to a contradiction. Therefore, when e_11 < e_12, e*_21 < e*_22. Similarly, when e_11 > e_12, we have e*_21 > e*_22.

In the same way, we can prove that when e_11 < e_12, e*_21 < e*_22, and when e_11 > e_12, e*_21 > e*_22 for the contribution-based revenue sharing scheme.

Since, under the vertical task decomposition strategy, workers are able to observe the effort levels exerted by the workers before them, any extra effort devoted by the first worker not only benefits the second worker, but also elicits extra effort from the second worker; in an N-worker situation, this leads to a "chain effect" that triggers a sequence of extra efforts from all the following workers, which can significantly improve the final quality of the task. By contrast, the horizontal task decomposition strategy inherently cannot have this advantage.

Proposition 3.4. Consider a two-worker task under the group-based revenue sharing scheme. Worker 1 is incited to exert higher effort on the prior subtask when she believes that worker 2 will exert high effort on the posterior subtask, and vice versa. The same holds when the contribution-based revenue sharing scheme is applied.

Proof. Consider a two-worker situation under the group-based revenue sharing scheme. The utility for worker 1 is $\pi(e_1) = \frac{1}{2}[f^1(e_1) + f^2(e_1,e_2)] - c(e_1)$. Her optimal effort levels, $e^*_{11}$ for $e_2 = e_{21}$ and $e^*_{12}$ for $e_2 = e_{22}$, satisfy the first-order conditions

$$\pi_{e_1}(e^*_{11}) = \frac{1}{2}[f^1_{e_1}(e^*_{11}) + f^2_{e_1}(e^*_{11}, e_{21})] - c_{e_1}(e^*_{11}) = 0,$$

and

$$\pi_{e_1}(e^*_{12}) = \frac{1}{2}[f^1_{e_1}(e^*_{12}) + f^2_{e_1}(e^*_{12}, e_{22})] - c_{e_1}(e^*_{12}) = 0.$$

Suppose $e^*_{11} < e^*_{12}$ for $e_{21} > e_{22}$; then $c_{e_1}(e^*_{11}) < c_{e_1}(e^*_{12})$. Therefore,

$$f^1_{e_1}(e^*_{11}) + f^2_{e_1}(e^*_{11}, e_{21}) < f^1_{e_1}(e^*_{12}) + f^2_{e_1}(e^*_{12}, e_{22}).$$

However, when $e^*_{11} < e^*_{12}$ and $e_{21} > e_{22}$, we have

$$f^1_{e_1}(e^*_{11}) > f^1_{e_1}(e^*_{12})$$

and

$$f^2_{e_1}(e^*_{11}, e_{21}) > f^2_{e_1}(e^*_{12}, e_{21}) > f^2_{e_1}(e^*_{12}, e_{22}),$$

which leads to a contradiction. Therefore, when $e_{21} > e_{22}$, $e^*_{11} > e^*_{12}$. Similarly, we can prove that when $e_{21} < e_{22}$, $e^*_{11} < e^*_{12}$.

In the same way, we can prove that when $e_{21} > e_{22}$, $e^*_{11} > e^*_{12}$, and when $e_{21} < e_{22}$, $e^*_{11} < e^*_{12}$ for the contribution-based revenue sharing scheme.

Proposition 3.5. Consider a two-worker task under the group-based revenue sharing scheme. When the two subtasks have the same difficulty, the same increase in reward incites more effort from worker 1 than from worker 2.

Proof. Without loss of generality, suppose that each worker's maximization problem is given by

$$\arg\max_{e_i} \left\{ \frac{\alpha}{2} Q(e) - c(e_i) \right\}, \quad \alpha \in (0,1],$$

where the reward can be modified by adjusting the value of $\alpha$. For worker 1, the first-order condition is

$$\pi_{e_1}(e^*_1) = \frac{\alpha}{2}\left[f^1_{e_1}(e^*_1) + f^2_{e_1}(e^*_1, e_2)\right] - c_{e_1}(e^*_1) = 0,$$

where $e^*_1$ represents her optimal effort level. Then we have

$$\frac{de_1}{d\alpha} = -\frac{f^1_{e_1} + f^2_{e_1}}{\alpha\left(f^1_{e_1 e_1} + f^2_{e_1 e_1}\right) - 2 c_{e_1 e_1}} > 0.$$

The positive value of $de_1/d\alpha$ shows that the effort of worker 1 increases with the reward. Similarly, for worker 2, the optimal effort level increases as the reward increases, i.e.,

$$\frac{de_2}{d\alpha} = -\frac{f^2_{e_2}}{\alpha f^2_{e_2 e_2} - 2 c_{e_2 e_2}} > 0.$$

Notice that we assume that the cost function is the same for all workers. When the two subtasks have the same difficulty, i.e., $f^1_{e_1} = f^2_{e_1} = f^2_{e_2}$ and $f^1_{e_1 e_1} = f^2_{e_1 e_1} = f^2_{e_2 e_2}$, we have

$$\frac{de_1}{d\alpha} > \frac{de_2}{d\alpha},$$

i.e., the same increase in reward incites more effort from worker 1 than from worker 2.
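As a hedged numerical companion to Proposition 3.5, the sketch below holds the other worker's effort fixed, uses an assumed equal-difficulty specification (q1 = e1^0.5, q2 = (e1·e2)^0.5, c(e) = e²; our choice, not the thesis's), and checks that the same reward increase raises worker 1's optimal effort more than worker 2's.

```python
# Finite-difference check of Proposition 3.5 (group-based sharing).
# Assumed equal-difficulty specification: q1 = e1^0.5, q2 = (e1*e2)^0.5,
# c(e) = e^2; the other worker's effort is held fixed at 0.5.
def argmax(f, grid=20_000):
    return max((i / grid for i in range(1, grid + 1)), key=f)

def e1_star(alpha, e2=0.5):
    return argmax(lambda e1: alpha / 2 * (e1**0.5 + (e1 * e2)**0.5) - e1**2)

def e2_star(alpha, e1=0.5):
    return argmax(lambda e2: alpha / 2 * (e1**0.5 + (e1 * e2)**0.5) - e2**2)

gain1 = e1_star(0.6) - e1_star(0.5)   # worker 1's response to the raise
gain2 = e2_star(0.6) - e2_star(0.5)   # worker 2's response to the raise
assert gain1 > gain2 > 0
```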

Proposition 3.6. Consider a two-worker task under the contribution-based revenue sharing scheme. When the two subtasks have the same difficulty, the same increase in reward incites less effort from worker 1 than from worker 2.


Proof. Without loss of generality, suppose that for worker $i$ the utility function is $\pi(e_i) = Q_{e_i} \cdot e_i - c(e_i) + \alpha^2 e_i$, $\alpha \in (0,1]$. For worker 1, the maximization problem is given by

$$\arg\max_{e_1} \left\{ f^1_{e_1}(e_1) \cdot e_1 + f^2_{e_1}(e_1) \cdot e_1 - c(e_1) + \alpha^2 e_1 \right\}$$

$$\Rightarrow \frac{de_1}{d\alpha} = -\frac{2\alpha}{f^1_{e_1 e_1 e_1} e_1 + f^2_{e_1 e_1 e_1} e_1 + 2 f^1_{e_1 e_1} + 2 f^2_{e_1 e_1} - c_{e_1 e_1}} > 0.$$

That is, for worker 1, the optimal effort level increases as the reward increases.

For worker 2, the maximization problem is given by

$$\arg\max_{e_2} \left\{ f^2_{e_2}(e_2) \cdot e_2 - c(e_2) + \alpha^2 e_2 \right\}$$

$$\Rightarrow \frac{de_2}{d\alpha} = -\frac{2\alpha}{f^2_{e_2 e_2 e_2} e_2 + 2 f^2_{e_2 e_2} - c_{e_2 e_2}} > 0.$$

That is, for worker 2, the optimal effort level also increases as the reward increases.

Notice that we assume that the cost function is the same for all workers. When the two subtasks have the same difficulty, i.e., $f^1_{e_1 e_1} = f^2_{e_1 e_1} = f^2_{e_2 e_2}$ and $f^1_{e_1 e_1 e_1} = f^2_{e_1 e_1 e_1} = f^2_{e_2 e_2 e_2}$, we have

$$\frac{de_1}{d\alpha} < \frac{de_2}{d\alpha}.$$

That is, when the workers are provided with the same amount of extra reward for the same extra effort, the first worker increases her effort by less than the second worker does.
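The same finite-difference exercise illustrates Proposition 3.6 under contribution-based sharing. The specification below is ours: each worker is paid her marginal contribution $f_{e_i} \cdot e_i$, plus the $\alpha^2 e_i$ reward term from the proof, with $f^1 = e_1^{0.5}$, $f^2 = (e_1 e_2)^{0.5}$ and $c(e) = e^2$.

```python
# Finite-difference check of Proposition 3.6 (contribution-based sharing).
# Assumed specification (ours): f1 = e1^0.5, f2 = (e1*e2)^0.5, c(e) = e^2,
# extra reward alpha^2 * e_i; each worker is paid f_{e_i} * e_i.
def argmax(f, grid=20_000):
    return max((i / grid for i in range(1, grid + 1)), key=f)

def e1_star(alpha, e2=0.5):
    # f1_{e1}*e1 + f2_{e1}*e1 = 0.5 * e1^0.5 * (1 + e2^0.5)
    return argmax(lambda e1: 0.5 * e1**0.5 * (1 + e2**0.5)
                  - e1**2 + alpha**2 * e1)

def e2_star(alpha, e1=0.5):
    # f2_{e2}*e2 = 0.5 * (e1*e2)^0.5
    return argmax(lambda e2: 0.5 * (e1 * e2)**0.5 - e2**2 + alpha**2 * e2)

gain1 = e1_star(0.6) - e1_star(0.5)
gain2 = e2_star(0.6) - e2_star(0.5)
assert 0 < gain1 < gain2   # the raise moves worker 2 more than worker 1
```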

According to Proposition 3.5, under the group-based revenue sharing scheme, an increase in reward for the first worker is more beneficial to the requester. By contrast, according to Proposition 3.6, under the contribution-based revenue sharing scheme, awarding a marginal reward to the second worker is more beneficial to the requester.

3.5 Task Decomposition Strategy Analysis and Comparison

Different from the behavior analysis, which focuses on individual strategic effort devotion, we construct simulations that explicitly evaluate and compare the efficiencies, in terms of the final quality (Eq. (3.1)), of the two task decomposition strategies under the two revenue sharing schemes. It is worth noting that, besides the 2-subtask and 3-subtask situations presented in the following, we also explored the situations for N = 4, ..., 9, which are not shown here due to space limitations but yield similar results.

3.5.1 Utility Specification

Subtask quality functions

We consider the simulated task solving process under the assumption of subtask quality functions in the Cobb-Douglas form. We assume there are N workers for N subtasks. Specifically, for subtask i, the quality functions under vertical and horizontal task decompositions are respectively defined as

$$q^i_{\mathrm{vertical}}(e_1, \cdots, e_i) = \prod_{k=1}^{i} e_k^{\omega_k}$$

and

$$q^i_{\mathrm{horizontal}}(e_i) = \sum_{k=1}^{N} \prod_{r=1}^{k} (\omega_r e_i)^{\omega_r N},$$

where $\omega_i \in (0,1)$ is the difficulty of subtask $i$, and $\sum_{i=1}^{N} \omega_i = 1$.
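As a quick sanity check of the vertical quality function (the code and names below are ours), the complementarity of Assumption 3.1 can be verified by finite differences: a higher prior effort raises the marginal product of the posterior effort.

```python
# The vertical Cobb-Douglas quality defined above, plus a finite-difference
# check of the Assumption 3.1 complementarity.
from functools import reduce

def q_vertical(efforts, weights):
    # q^i(e_1, ..., e_i) = prod_k e_k^{w_k}
    return reduce(lambda acc, ew: acc * ew[0] ** ew[1],
                  zip(efforts, weights), 1.0)

W = (0.4, 0.6)   # example weights summing to 1
H = 1e-5

def marginal_q2(e1):
    # dq^2/de2 at e2 = 0.5, by central differences
    return (q_vertical((e1, 0.5 + H), W)
            - q_vertical((e1, 0.5 - H), W)) / (2 * H)

# a higher e1 yields a larger marginal product of e2
assert marginal_q2(0.9) > marginal_q2(0.3) > 0
```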

Remark 3.4. According to the specified functions, a high value of $\omega_i$ corresponds to a high difficulty of subtask $i$, and it can be easily verified that both functions satisfy the requirements given in Assumption 3.1.

[Figure 3.2: Efficiency comparison of vertical and horizontal task decompositions (two-subtask situation): (a) final quality comparison under contribution-based revenue sharing; (b) final quality comparison under group-based revenue sharing. Each panel plots the final quality against the weight combinations. Generally, the vertical task decomposition strategy outperforms the horizontal task decomposition strategy in terms of final quality.]

Cost functions. We simply specify the cost function for worker $i$ as

$$c(e_i) = e_i^2, \quad i = 1, 2, \cdots, N,$$

which satisfies the requirements given in Assumption 3.4.
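Under these specifications, the two-subtask vertical simulation can be sketched as follows. This is our own minimal reconstruction, not the thesis code: worker 2 best-responds to the observed e1, worker 1 anticipates that response, and, as an assumption, the quality of the last (complete) stage stands in for the final quality of Eq. (3.1).

```python
# Minimal 2-subtask vertical simulation under group-based sharing
# (our sketch): q1 = e1^w1, q2 = e1^w1 * e2^w2, c(e) = e^2.
def br2(e1, w1, w2, grid=1_000):
    # worker 2 observes e1 and maximizes (q1 + q2)/2 - e2^2
    return max((i / grid for i in range(1, grid + 1)),
               key=lambda e2: (e1**w1 + e1**w1 * e2**w2) / 2 - e2**2)

def final_quality(w1, w2, grid=100):
    def u1(e1):  # worker 1 anticipates worker 2's best response
        e2 = br2(e1, w1, w2)
        return (e1**w1 + e1**w1 * e2**w2) / 2 - e1**2
    e1 = max((i / grid for i in range(1, grid + 1)), key=u1)
    return e1**w1 * br2(e1, w1, w2)**w2  # quality of the last stage

# final quality rises with the difficulty of the first subtask
assert final_quality(0.7, 0.3) > final_quality(0.3, 0.7)
```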

3.5.2 Efficiency Comparison

Efficiency Comparison of Two Task Decompositions

The two-subtask situation is depicted in Fig. 3.2 as a concise case to illustrate the efficiency difference between the vertical and horizontal task decomposition strategies. We endow each of the two subtasks with a weight ($\omega_1, \omega_2 \in (0,1)$), restricted to one decimal place. We then exhaustively examine the final quality under all combinations of the two weights, sorted in increasing order of the first subtask's weight, as given in Table 3.1.

Table 3.1: Weight combinations for the 2-subtask situation.

X-axis  1  2  3  4  5  6  7  8  9
ω1      1  2  3  4  5  6  7  8  9
ω2      9  8  7  6  5  4  3  2  1

Note: weight = ωi/10

Fig. 3.2 (a) and (b) illustrate the final qualities of the two task decomposition strategies under the contribution-based and group-based revenue sharing schemes, respectively. As can be seen, the final quality in the vertical situation increases with the difficulty of the first subtask under both revenue sharing schemes. By contrast, the efficiency in the horizontal situation increases with the difficulty of the first subtask only when the second subtask is more difficult than the first one, i.e., ω2 > ω1; otherwise, it decreases with the difficulty of the first subtask.

It is clear from Fig. 3.2 that, in terms of final quality, the vertical task decomposition strategy is superior to the horizontal one under both the group-based and contribution-based revenue sharing schemes. Furthermore, as the difficulty of the first subtask gets higher, the vertical strategy exhibits clearer superiority over the horizontal one. It is worth noting that, for group-based revenue sharing, at the first point of Fig. 3.2 (b), corresponding to the weight combination (0.1, 0.9), horizontal task decomposition defeats vertical task decomposition. Intuitively, the reason is that, when the second subtask has high difficulty, the second worker refuses to make much effort because it would not bring much marginal increase to the final quality; moreover, half of the total revenue goes to the first worker under group-based revenue sharing, which does not adequately compensate the second worker. Thus, the vertical decomposition is less effective.

To sum up, vertical task decomposition generally outperforms horizontal task decomposition in terms of final quality. Besides the two-subtask situation, we also explored the situations for N = 3, ..., 9, which are not shown here due to space limitations but yield similar results.


Table 3.2: Weight combinations for the 3-subtask situation.

(a)
X-axis  1  2  ···  8  9  10  ···  15  16  ···  21  22  23
ω1      1  1  ···  1  2   2  ···   2   3  ···   3   4   4
ω2      1  2  ···  8  1   2  ···   7   1  ···   6   1   5
ω3      8  7  ···  1  7   6  ···   1   6  ···   1   5   1

(b)
X-axis  1  ···  3  4  5  6  7  8  ···  10  11  12  13
ω1      4  ···  4  5  5  5  5  6  ···   6   7   7   8
ω2      2  ···  4  1  2  3  4  1  ···   3   1   2   1
ω3      4  ···  2  4  3  2  1  3  ···   1   2   1   1

Note: weight = ωi/10
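The combination count stated in the text can be cross-checked by a small enumeration (our sketch): ordered triples of tenths, each at least 1, summing to 10, listed in lexicographic order.

```python
# Enumerating the Table 3.2 combinations: ordered triples of tenths
# (each at least 1) summing to 10, in lexicographic order.
combos = [(i, j, 10 - i - j)
          for i in range(1, 9)
          for j in range(1, 10 - i)]

assert len(combos) == 36                      # as stated in the text
assert combos[0] == (1, 1, 8)                 # first combination
assert combos[-1] == (8, 1, 1)                # last combination
assert all(a + b + c == 10 for a, b, c in combos)
```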

Vertical Task Decomposition Efficiency under Two Revenue Sharing Schemes

In Fig. 3.3 we explore the 3-subtask situation to compare the efficiencies of the two revenue sharing schemes for the vertical task decomposition strategy. In the following, we first present the lessons we learn by analysing the data illustrated in Fig. 3.3, then give examples of revenue sharing scheme selection based on what we learn.

As we do for the two-subtask situation, we respectively endow the 3 subtasks with weights ω1, ω2, ω3, also restricted to one decimal place, and then exhaustively examine all combinations of the weights, i.e., the triples of weights from {0.1, 0.2, ···, 0.9} whose sum equals 1. As shown in Table 3.2, for the 3-subtask situation there are a total of 36 combinations, sorted in lexicographic order, i.e., (ω1, ω2, ω3) occurs before (ω′1, ω′2, ω′3) iff ω1 < ω′1, or ω1 = ω′1 and ω2 < ω′2, or ω1 = ω′1 and ω2 = ω′2 and ω3 < ω′3.

Lesson 1. When the first subtask is not the most difficult subtask, for the series of subtasks with strictly concave weights (i.e., ω1 < ω2 and ω2 > ω3), the contribution-based revenue sharing scheme leads to higher final qualities than the group-based revenue sharing scheme.


As can be observed in Fig. 3.3 (a) (for the x-axis and weight combinations, please refer to Table 3.2 (a)), when the first subtask is not the most difficult subtask, the contribution-based revenue sharing scheme is at first inferior to the group-based scheme, but, moving along the weight combinations sorted in lexicographic order, it becomes superior. The turning point lies at the weight combination that begins to be strictly concave. For example, in Fig. 3.3 (a), for the eight series of subtasks beginning with weight 0.1 (x-axis scale 1 to 8), the final qualities brought by the contribution-based sharing scheme exceed those brought by the group-based sharing scheme at the fifth series of weights (5. (0.1, 0.5, 0.4)), and remain superior thereafter.

Lesson 2. When the first subtask is the most difficult subtask, the contribution-based revenue sharing scheme leads to higher final qualities than the group-based revenue sharing scheme.

As illustrated in Fig. 3.3 (b) (for the x-axis and weight combinations, please refer to Table 3.2 (b)), when the first subtask is one of the most difficult subtasks, the contribution-based sharing scheme is superior to the group-based sharing scheme. Furthermore, as the weight of the first subtask gets higher, the contribution-based sharing scheme exhibits clearer superiority over the group-based one.

Example 3.1. A task requester who aims at optimizing the final quality has to determine the revenue sharing scheme first. For the very similar task decompositions (0.3, 0.3, 0.4), (0.3, 0.4, 0.3) and (0.4, 0.3, 0.3), the revenue sharing scheme can be decided based on Lessons 1 and 2. For (0.3, 0.3, 0.4), which is neither strictly concave nor starts with the most difficult subtask, the group-based scheme is selected. For (0.3, 0.4, 0.3), which is strictly concave, the contribution-based scheme is selected according to Lesson 1. For (0.4, 0.3, 0.3), the contribution-based scheme is also selected, but for a different reason: according to Lesson 2, for a series of subtasks starting with the most difficult subtask, the contribution-based scheme is superior. All selections can be verified in Fig. 3.3.

[Figure 3.3: Efficiency comparison of the two revenue sharing schemes for the vertical task decomposition strategy: (a) final quality comparison when the first subtask is not the most difficult subtask; (b) final quality comparison when the first subtask is the most difficult subtask. Each panel plots the final quality under contribution-based and group-based revenue sharing against the weight combinations. In general, when the series of subtasks begins with a difficult subtask, the contribution-based revenue sharing scheme is superior to the group-based revenue sharing scheme.]
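Lessons 1 and 2 can be condensed into a small rule-of-thumb selector. The function below is a hypothetical sketch of ours, not part of the thesis; it reproduces the three choices of Example 3.1.

```python
# Rule-of-thumb scheme selection for a 3-subtask vertical decomposition,
# encoding Lessons 1 and 2 (a sketch; the function names are ours).
def choose_scheme(w1, w2, w3):
    first_is_hardest = w1 >= max(w2, w3)      # Lesson 2
    strictly_concave = w1 < w2 and w2 > w3    # Lesson 1
    if first_is_hardest or strictly_concave:
        return "contribution-based"
    return "group-based"

assert choose_scheme(0.3, 0.3, 0.4) == "group-based"
assert choose_scheme(0.3, 0.4, 0.3) == "contribution-based"
assert choose_scheme(0.4, 0.3, 0.3) == "contribution-based"
```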

3.5.3 Vertical Task Decomposition Strategy

In the case of vertical task decomposition, the multiple subtasks are chained into sequentially dependent stages. In this study, we restrict our attention to the case where prior subtasks positively support the subsequent subtasks, as assumed by Eq. (3.5). As the qualities of the prior subtasks improve, the positive support they provide becomes stronger, which makes the subsequent subtasks more dependent on the prior ones. In the extreme situation where all subtasks have the highest qualities, with all workers exerting their highest effort (e = 1), the dependence among subtasks becomes strongest; this can be viewed as the intrinsic dependence among the sequential subtasks. Formally, we define the dependence of two sequential subtasks, and then the total dependence among all subtasks, as follows.

Definition 3.5. Given a succession of N subtasks and corresponding weights, the sequential dependence between subtasks $i-1$ and $i$ is defined as

$$d(e_{i-1}, e_i) = \left. \frac{\partial}{\partial e_{i-1}} \frac{\partial q^i}{\partial e_i} \right|_{e_1 = \cdots = e_{i-1} = e_i = 1} = \omega_{i-1} \omega_i, \qquad (3.10)$$

and the total dependence among all subtasks is

$$\sum_{i=2}^{N} d(e_{i-1}, e_i). \qquad (3.11)$$
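Definition 3.5 reduces to a sum of products of consecutive weights, which a few lines of code can evaluate; the check below reproduces the dependence values cited in the discussion of Fig. 3.4 (the helper name is ours).

```python
# Task dependence per Definition 3.5: d(e_{i-1}, e_i) = w_{i-1} * w_i,
# summed over consecutive pairs (Eq. (3.11)).
def total_dependence(weights):
    return sum(a * b for a, b in zip(weights, weights[1:]))

# values cited in the text for series beginning with 0.5:
assert abs(total_dependence([0.5, 0.1, 0.4]) - 0.09) < 1e-9
assert abs(total_dependence([0.5, 0.2, 0.3]) - 0.16) < 1e-9
assert abs(total_dependence([0.5, 0.4, 0.1]) - 0.24) < 1e-9
```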

Lessons Learned on Group-based Revenue Sharing Scheme

In Fig. 3.4, we explore the 3-subtask situation and illustrate the final quality of the vertical task decomposition strategy under the group-based revenue sharing scheme. In the following, we first present the lessons we learn by analysing the data illustrated in Fig. 3.4, then integrate them into explicit guidelines on task decomposition from the task requester's point of view, and lastly give examples of task decomposition selection based on the lessons we learn.


Lesson 3. Under the group-based revenue sharing scheme, the highest final quality brought by a series of subtasks beginning with high difficulty is superior to that brought by a series of subtasks beginning with low difficulty.

As can be observed in Fig. 3.4 (a), when the first subtask is not the most difficult subtask, the highest quality of the series of subtasks beginning with weight 0.4 (x-axis scale 22 to 23) is superior to that of the series beginning with weight 0.3, which in turn is superior to that of the series beginning with weight 0.2, and so on. Similar patterns can be observed in Fig. 3.4 (b), which illustrates the situation where the first subtask is one of the most difficult subtasks.

Lesson 4. When the first subtask is not the most difficult subtask, for the series of subtasks beginning with the same difficulty, the lowest task dependence leads to the lowest final quality.

For instance, in Fig. 3.4 (a), for the eight series of subtasks beginning with weight 0.1 (x-axis scale 1 to 8), the lowest task dependence, corresponding to x-axis value 1, gives the lowest final quality among these eight series of subtasks.

Lesson 5. When the first subtask is the most difficult subtask, for the series of subtasks beginning with the same difficulty, i) the highest task dependence leads to the lowest final quality; and ii) the highest task dependence among the convex weights (i.e., ω1 ≥ ω2 and ω2 ≤ ω3) leads to the highest final quality.

For instance, as can be observed in Fig. 3.4 (b), for the four series of subtasks beginning with weight 0.5 (x-axis scale 4 to 7), the highest task dependence, corresponding to x-axis value 7, gives the lowest final quality among these four series of subtasks. Furthermore, among these four series, two have convex weights (4. (0.5, 0.1, 0.4) and 5. (0.5, 0.2, 0.3), with task dependence 0.09 and 0.16, respectively), and the higher task dependence of the two, corresponding to x-axis value 5, gives the highest final quality among these four series of subtasks.

[Figure 3.4: Efficiency estimation of the vertical task decomposition strategy under the group-based revenue sharing scheme: (a) when the first subtask is not the most difficult subtask; (b) when the first subtask is the most difficult subtask. Each panel plots the final quality and the task dependence against the weight combinations. In general, series of subtasks beginning with high difficulties generate high efficiency with respect to the final quality.]

Example 3.2. Suppose the task requester selects the group-based revenue sharing scheme and has to choose among three very similar task decompositions: (0.3, 0.4, 0.3), (0.4, 0.3, 0.3) and (0.4, 0.4, 0.2). According to Lesson 3, he would prefer the series of subtasks starting with a more difficult subtask and eliminate the option (0.3, 0.4, 0.3). Furthermore, according to Lesson 5, the convex weights (0.4, 0.3, 0.3) form the decomposition, among all series of subtasks starting with difficulty 0.4, that leads to the highest final quality. So the task requester can construct her preference as (0.4, 0.3, 0.3) ≻ (0.4, 0.4, 0.2) ≻ (0.3, 0.4, 0.3).
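The pairwise comparisons of Example 3.2 can be sketched as a small preference function. This is our simplified encoding of Lessons 3 and 5 (not an exhaustive ranking rule), with hypothetical names.

```python
# A simplified pairwise preference under group-based sharing,
# encoding Lesson 3 (harder first subtask) and Lesson 5 (convex weights).
def is_convex(w):
    # Lesson 5's convexity condition: w1 >= w2 and w2 <= w3
    return w[0] >= w[1] and w[1] <= w[2]

def prefer(a, b):
    if a[0] != b[0]:                 # Lesson 3: harder start wins
        return a if a[0] > b[0] else b
    return a if is_convex(a) else b  # Lesson 5: convex weights win

assert prefer((0.4, 0.3, 0.3), (0.3, 0.4, 0.3)) == (0.4, 0.3, 0.3)
assert prefer((0.4, 0.3, 0.3), (0.4, 0.4, 0.2)) == (0.4, 0.3, 0.3)
```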

Lessons Learned on Contribution-based Revenue Sharing Scheme

We explore the 3-subtask situation under the contribution-based revenue sharing scheme, as shown in Fig. 3.5. In the following, we present the lessons we learn by analysing the data illustrated in Fig. 3.5, and integrate them into explicit guidelines on task decomposition from the task requester's point of view.

Lesson 6. When the first subtask is not the most difficult subtask, for an N-subtask situation, given a segment of weights on the first k (k < N) subtasks, assigning the highest of all possible weights to the (k+1)-th subtask leads to the highest final quality.

Take the 3-subtask situation in Fig. 3.5 (a) as an example. Given the weight on the first subtask equaling 0.1 (x-axis scale 1 to 8), the possible weights on the second subtask are 0.1, 0.2, ···, 0.8; the highest weight on the second subtask, 0.8 (x-axis value 8), gives the highest final quality among all the series of subtasks beginning with weight 0.1.

Lesson 7. When the first subtask is the most difficult subtask, the more difficult the first subtask is, the more efficient the contribution-based revenue sharing scheme is.

[Figure 3.5: Efficiency estimation of the vertical task decomposition strategy under the contribution-based revenue sharing scheme: (a) when the first subtask is not the most difficult subtask; (b) when the first subtask is the most difficult subtask. Each panel plots the final quality and the task dependence against the weight combinations.]

As can be observed in Fig. 3.5 (b), the final quality brought by the series of subtasks beginning with weight 0.8 (x-axis value 13) is higher than those brought by the series beginning with weight 0.7, which in turn are higher than those brought by the series beginning with weight 0.6, and so on.

Lesson 8. When the first subtask is the most difficult subtask, for the series of subtasks beginning with the same difficulty, the highest task dependence leads to the highest final quality.

For instance, as can be observed in Fig. 3.5 (b), for the four series of subtasks beginning with weight 0.5 (x-axis scale 4 to 7), the highest task dependence, corresponding to x-axis value 7, gives the highest final quality among these four series of subtasks.

Example 3.3. Suppose the task requester is restricted to starting with a subtask of a given difficulty. When the first subtask is not the most difficult subtask (for example, with difficulty 0.3), according to Lesson 6, the optimal decomposition is the one whose second subtask's difficulty is the highest among all possible difficulties (in this case, 0.1, ···, 0.6), so the optimal decomposition is (0.3, 0.6, 0.1). When the first subtask is the most difficult subtask (for example, with difficulty 0.5), according to Lesson 8, the optimal decomposition is (0.5, 0.4, 0.1), with the highest dependence, 0.24, among all series of subtasks starting with difficulty 0.5.
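The Lesson 8 selection of Example 3.3 can be sketched as a direct search over the admissible decompositions. The helper names below are ours; weights are tenths summing to 1.0, each at least 0.1.

```python
# Sketch of Lesson 8 (contribution-based scheme): among decompositions
# starting with the given (most difficult) weight, pick the one with the
# highest task dependence of Definition 3.5.
def total_dependence(w):
    return sum(a * b for a, b in zip(w, w[1:]))

def best_by_lesson8(w1):
    cands = [(w1, j / 10, round(1.0 - w1 - j / 10, 1)) for j in range(1, 10)]
    cands = [c for c in cands if c[2] >= 0.1]   # every weight at least 0.1
    return max(cands, key=total_dependence)

# reproduces the optimal decomposition of Example 3.3
assert best_by_lesson8(0.5) == (0.5, 0.4, 0.1)
```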

3.6 Proofreading Experiments on MTurk

MTurk is a crowdsourcing web service that coordinates the supply and demand of tasks that require human intelligence to complete. The inspiration of the system was to have human users complete simple tasks that would otherwise be extremely difficult (if not impossible) for computers to perform. Those simple tasks lie in various domains, for example, identifying objects in images, finding relevant information, or doing natural language processing.

Table 3.3: English articles for the proofreading task.

Article             Words  Grammar  Spelling  Logically       Total
                           errors   errors    misused words   errors
Tuberculosis        215    7        0         3               10
Change in fashion   167    4        1         5               10
Job interview       263    6        2         2               10
American music      273    5        1         4               10
Weather forecast    190    8        0         2               10
City and county     237    6        0         4               10

Proofreading tasks, i.e., correcting articles to make them error-free, are conducted on MTurk. The six articles that require proofreading are in English. They all come from the proofreading and error correction part of the Test for English Majors, Band 8 (TEM-8)1. TEM-8 is the highest-level test to measure the English proficiency of Chinese university undergraduates majoring in English Language and Literature and to examine whether these students meet the required levels of English language ability as specified in the National College English Teaching Syllabus for English Majors [Jin and Fan, 2011]. The vocabulary requirement for TEM-8 is 13,000 words.

In the experiment, the six articles were chosen to cover various domains, such as medical care, culture, technology, and everyday life. We characterize three different types of error in each article: grammar errors, spelling errors and logically misused words. Table 3.3 gives the number of errors of each type for all articles.
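The per-type counts of Table 3.3 can be cross-checked in a few lines (our sketch; each article was constructed to contain exactly 10 errors).

```python
# Cross-checking the Table 3.3 error counts: each article has
# (grammar, spelling, logically misused) errors summing to 10.
errors = {
    "Tuberculosis": (7, 0, 3),
    "Change in fashion": (4, 1, 5),
    "Job interview": (6, 2, 2),
    "American music": (5, 1, 4),
    "Weather forecast": (8, 0, 2),
    "City and county": (6, 0, 4),
}
assert all(sum(v) == 10 for v in errors.values())
assert len(errors) == 6
```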

1http://en.wikipedia.org/wiki/College_English_Test


3.6.1 Experimental Design: Comparison of Task Decomposition Strategies

In the first experiment, we attempted to 1) verify the efficacy of vertical task decomposition in improving the quality of the task solution; and 2) show the superiority of the vertical task decomposition strategy over the horizontal task decomposition strategy.

To these aims, we conducted two contrast groups of proofreading experiments. One is under the horizontal task decomposition strategy, where the proofreading task for the workers is to "read the English article, and correct as many grammar errors, spelling errors and logically misused words as possible". In other words, the workers are instructed to use their own judgment on which words to correct and how to correct them. One article is proofread independently by an individual worker. The article is presented in text format in multiple lines, and each line is given a line number. Workers are required to identify their corrections as

Line No., incorrect word / phrase→ correct word / phrase.
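A submission in this format can be validated mechanically. The parser below is a hypothetical sketch of ours (field names and arrow handling are assumptions; both "→" and "->" are accepted), not part of the experimental setup.

```python
# Sketch of parsing a worker's submission line:
# "Line No., incorrect word / phrase -> correct word / phrase"
import re

PATTERN = re.compile(r"^\s*(\d+)\s*,\s*(.+?)\s*(?:→|->)\s*(.+?)\s*$")

def parse_correction(line):
    m = PATTERN.match(line)
    if not m:
        return None   # malformed submission line
    return int(m.group(1)), m.group(2), m.group(3)

assert parse_correction("12, recieve → receive") == (12, "recieve", "receive")
assert parse_correction("not a correction") is None
```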

The other group is under the vertical task decomposition strategy, where the proofreading task is decomposed into three sequential subtasks, Find-Fix-Verify, as formally modeled in Section 3.3.1. Specifically, in the experiment, the Find task for the workers is to "read the English articles and identify as many grammar errors, spelling errors and logically misused words as possible". Furthermore, the workers are notified of the subsequent working process as "you don't need to provide any correction. Other workers will fix the errors you identified to finally make the articles correct". The article is also presented in text format in multiple lines, and each line is given a line number. Workers are required to identify each error as

Line No., incorrect word / phrase.

Table 3.4: Information for experiment 1.

Experiment: Vertical Task Decomposition
  Workflow: Find-Fix-Verify (granularity level: phrase-wise; number of corrections: 1; voting method: vote / veto)
  Articles 1, 2, 3 — Find: $0.5 (0.1), 20 min; Fix: $0.5 (0.1), 20 min; Verify: $0.5 (0.1), 20 min
  Articles 4, 5, 6 — Find: $0.5 (0.1), 20 min; Fix: $0.5 (0.1), 20 min; Verify: $0.5 (0.1), 20 min
  Qualifications required: none

Experiment: Horizontal Task Decomposition
  Workflow: one article for an individual
  Articles 1~6 — $0.5 (0.1), 20 min/article
  Qualifications required: none

*When correctness ≥ 90%, a bonus is awarded.

After the Find task is finished, the Fix task is announced on MTurk, with the potential errors identified in the Find task. It is worth noting that, unlike the exact workflow proposed by Bernstein et al. [Bernstein et al., 2010], the error correction in our Fix task is not constrained to the errors identified in the Find task (see Section 3.3.1). The Fix task for workers is to "read the article and correct as many of the identified grammar errors, spelling errors and logically misused words as possible. Some (but not all) possible mistakes are indicated in parentheses; if you think something is not a mistake, please leave it as it is". Similarly, the workers are notified of the subsequent working process as "other workers will vote on the corrections you provide to finally make the articles correct". Workers are required to identify their corrections as

Line No., incorrect word / phrase→ correct word / phrase.

The last phase is the Verify task, where the corrections given in the Fix task are presented to the workers in the above form, and they are asked to "read the article and examine the corrections". Workers are required to vote or veto on every correction as

If the correction is correct, tag ’O’; otherwise tag ’X’.
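Tallying these votes can be sketched as follows. The acceptance threshold (a simple majority of 'O' over 'X') is our assumption; the text above does not specify how ties or thresholds are handled.

```python
# Minimal sketch of tallying Verify-phase votes: a correction is
# accepted when 'O' votes outnumber 'X' vetoes (assumed threshold).
from collections import Counter

def accept(tags):
    counts = Counter(tags)
    return counts["O"] > counts["X"]

assert accept(["O", "O", "X"]) is True
assert accept(["O", "X", "X"]) is False
```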


3.6.2 Experimental Design: First Subtasks with Different Difficulties in Vertical Task Decomposition

Explicit instructions on the optimal strategy (i.e., the arrangement of subtasks with different difficulties for final quality maximization) under the vertical task decomposition situation, from the task requester's point of view, are discussed and proposed in Section 3.5. In this experiment, we applied them to the proofreading task decomposition, with the aim of verifying and revising the instructions so that they hold up in real applications.

To this aim, we conducted two contrast groups of proofreading experiments. In both groups, the proofreading task is vertically decomposed into three sequential subtasks, as Find-Fix-Verify. As we discussed in Section 3.3, when workers are required to identify the errors phrase-wise, the Find task becomes more difficult than identifying the errors sentence-wise. In the experiment, the Find task with low difficulty requires workers to identify the errors sentence-wise. When they find an error, they simply identify it as

Line No.

By contrast, the Find task with high difficulty requires workers to identify the errors phrase-wise, as

Line No., incorrect word / phrase.

For both groups, the Fix task and Verify task are the same as described in experiment 1. Other information is summarized in Table 3.5.

3.7 Empirical Results

3.7.1 Experiment 1

Under the horizontal task decomposition situation, 32 different workers provided 32 proofreading results for the six articles. Upon completion of a work


Table 3.5: Information for experiment 2.

Experiment: Vertical Task Decomposition (both groups)
Articles: 1~6 (both groups)
Qualifications required: none

Group 1 workflow: Find-Fix-Verify; granularity level: phrase-wise; number of corrections: 1; voting methods: vote / veto
  Payment ($) (bonus*) / duration (min): Find: 0.5 (0.1), 20; Fix: 0.5 (0.1), 20; Verify: 0.5 (0.1), 20

Group 2 workflow: Find-Fix-Verify; granularity level: sentence-wise; number of corrections: 1; voting methods: vote / veto
  Payment ($) (bonus*) / duration (min): Find: 0.5 (0.1), 20; Fix: 0.5 (0.1), 20; Verify: 0.5 (0.1), 20

*When correctness ≥ 90%, the bonus is awarded.

item, the task requester can review the submitted results and approve or reject payment for the work item. Eight of the above 32 workers were rejected due to poor quality (i.e., a correct rate of less than 10%). The remaining 24 proofreading results were accepted, i.e., four proofreading results for each article. Worker response was extremely fast: 87.5% of the proofreading results were received within one hour after the task was posted, and the remaining ones within the following hour.

In Fig. 3.6, we illustrate statistics for the proofreading results under the horizontal task decomposition strategy, including the average number of submitted edits, the average number of correct corrections among the submitted edits, and the average number of new errors introduced during proofreading. The high number of submitted edits for each article indicates that all the workers devoted fairly good effort to proofreading the articles (8.54 edits on average). These submitted edits cover the true errors in the articles well, with an average correct rate of 43.8% (i.e., on average, 4.38 corrections for each article). Nevertheless, a substantial number of new errors were also introduced. Specifically, the average number of new errors introduced into each article is 3.83, which is close to the average number of corrections. Due to these newly introduced errors, although the horizontal task decomposition strategy performs well at correcting errors, the final correct rate2 is poor, which equals

2 final correct rate = (number of corrections) / (number of original errors + number of new errors) × 100%
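The final correct rate defined in the footnote above can be computed directly. A minimal sketch in Python; the figure of 10 original errors per article is our own assumption, chosen only to be consistent with the reported averages (4.38 corrections, 3.83 new errors, ≈31.5% final correct rate):

```python
def final_correct_rate(num_corrections, num_original_errors, num_new_errors):
    """Final correct rate as defined in the footnote:
    corrections / (original errors + new errors) * 100%."""
    return 100.0 * num_corrections / (num_original_errors + num_new_errors)

# Averages reported for horizontal decomposition: 4.38 corrections and
# 3.83 new errors per article; 10 original errors is an assumed value.
rate = final_correct_rate(4.38, 10, 3.83)
```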


[Figure: bar chart per article (Articles 1–6) showing the number of submitted edits, the number of corrections, and the number of new errors.]

Figure 3.6: Results of proofreading experiment under horizontal task decomposition strategy.

to 31.5% in our experiments.

Under the vertical task decomposition strategy, we executed the proofreading task for each article using the Find-Fix-Verify procedure described in the experimental design in Section 3.6.1. It is worth noting that, for each Verify task, three workers were employed to examine the submitted edits independently. Specifically, an edit wins a vote from a worker only if the worker judges it as correct, and an edit is kept as a correction only if it wins at least two votes3. In total, 68 work items, covering all the Find, Fix, and Verify tasks, were executed. As before, we rejected 24 of them due to poor quality (i.e., a correct rate of less than 10%) or duplicate submissions by the same worker. Finally, as in the horizontal task decomposition experiment, each article received four proofreading results.

In Fig. 3.7, we illustrate statistics for the proofreading results under the vertical task decomposition strategy, including the average number of submitted edits proposed in the Fix task, the average number of correct corrections among the submitted edits, and the average number of new errors introduced during proofreading. In contrast to the horizontal

3 Majority rule is applied.
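The vote aggregation described above — an edit is kept when at least two of the three verifiers judge it correct — is ordinary majority voting. A minimal sketch; the function name, edits, and vote patterns are made-up examples:

```python
def keep_edit(votes, threshold=2):
    """Keep an edit iff at least `threshold` verifiers tagged it 'O' (correct)."""
    return sum(1 for v in votes if v == "O") >= threshold

# Three independent verifiers per edit, as in the experiment.
votes_per_edit = {
    "Line 3, recieve -> receive": ["O", "O", "X"],  # kept (2 of 3 votes)
    "Line 7, their -> there":     ["X", "X", "O"],  # discarded (1 of 3 votes)
}
kept = [edit for edit, votes in votes_per_edit.items() if keep_edit(votes)]
```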


[Figure: bar chart per article (Articles 1–6) showing the number of submitted edits, the number of corrections, and the number of new errors.]

Figure 3.7: Results of proofreading experiment under vertical task decomposition strategy where errors are identified phrase-wise.

task decomposition situation, the number of submitted edits in the vertical situation is smaller (7.50 on average). However, these submitted edits cover the true errors better, with an average correct rate of 57.1% (i.e., on average, 5.71 corrections for each article). Moreover, compared to the substantial number of new errors under horizontal decomposition (3.83), the average number of new errors introduced into each article by the Fix task is only 0.92. Furthermore, the examination done by workers in the Verify tasks decreases the number of new errors even further: on average, only 0.58 new errors are introduced into each article. The final correct rate of the proofreading task achieved under the vertical task decomposition strategy is 54.1%.

Putting the above observations together leads us to the conclusion that the vertical task decomposition strategy outperforms the horizontal strategy in terms of final quality, which is consistent with the conclusion obtained in Section 3.5.2.


[Figure: bar chart per article (Articles 1–6) showing the number of submitted edits, the number of corrections, and the number of new errors.]

Figure 3.8: Results of proofreading experiment under vertical task decomposition strategy where errors are identified sentence-wise.

3.7.2 Experiment 2

Aiming at verifying and revising the instructions on the vertical task decomposition strategy, in this experiment we conducted the proofreading task starting with a sentence-wise Find task, where workers are required to identify the lines that contain errors. This sentence-wise Find task has lower difficulty than the phrase-wise Find task of experiment 1, where workers are required to identify the incorrect word or phrase. As in experiment 1, for each Verify task, three workers were employed to examine and vote on the submitted edits independently, and majority rule was applied to determine whether an edit is kept as a correction. In total, 58 work items, covering all the Find, Fix, and Verify tasks, were executed. As before, we rejected 16 of them due to poor quality (i.e., a correct rate of less than 10%) or duplicate submissions by the same worker. Finally, in this experiment, each article received four proofreading results.

In Fig. 3.8, we illustrate statistics for the proofreading results under the vertical task decomposition strategy, including the average number of submitted edits proposed in the Fix task, the average number of correct corrections among the submitted edits, and the average number of new errors


[Figure: line plot of the correct rate (50%–58%) after the Find, Fix, and Verify phases, comparing phrase-wise and sentence-wise proofreading.]

Figure 3.9: Correct rate comparison between phrase-wise and sentence-wise proofreading.

introduced during proofreading. In the Fix task, the number of submitted edits lies between those of phrase-wise proofreading and horizontal proofreading (8.17 on average). Although these submitted edits cover the true errors better than those in phrase-wise proofreading, with an average correct rate of 59.6% (i.e., on average, 5.96 corrections for each article), the average number of new errors introduced into each article by the Fix task, 1.13, is larger than that in phrase-wise proofreading. Finally, after the examination by workers in the Verify tasks, the final correct rate of sentence-wise proofreading under the vertical task decomposition strategy is 53.6%, slightly smaller than that of phrase-wise proofreading (54.1%).

3.8 Summary

In this study we have formally presented and analyzed vertical and horizontal task decomposition models, which respectively specify the relationship between subtask quality and the worker's effort level in the presence of


positive and no dependence among subtasks. Our focus is on addressing the efficiency of task decomposition when workers are paid equally or according to their contributions. We conducted simulations to analyze and compare the efficiency of the two task decompositions, aiming at generating explicit instructions on strategies for optimal task decomposition. We conclude that when the final quality is defined as the sum of the qualities of the solutions to all decomposed subtasks, the vertical task decomposition strategy outperforms the horizontal one in improving the quality of the final solution, and we furthermore give explicit instructions on the optimal strategy under the vertical task decomposition situation from the task requester's point of view.

Furthermore, based on the previous studies and results, we designed and conducted proofreading experiments on Amazon Mechanical Turk. The first series of experiments compares the efficacy of vertical decomposition and horizontal decomposition on the proofreading task. The second series of experiments focuses on vertical task decomposition and compares situations where the first subtasks have different difficulties. Our experiments ran for two months and were highly visible on the MTurk crowdsourcing platform. According to the results we collected, we 1) verify the superiority of the vertical task decomposition strategy over the horizontal one on the proofreading task; and 2) apply some of our proposed instructions for vertically decomposing the proofreading task, which show promise for improving the quality of the final outcome.

3.8.1 Discussion on Two Revenue Sharing Schemes

Nonetheless, we have noted that although group-based and contribution-based revenue sharing schemes have received most of the attention, they are mainly used as reward distribution methods in traditional companies, where the staff receive rewards only after the final solution is worked out. This is not the case in crowdsourcing marketplaces, where rewards for tasks are determined and announced by the task requesters in advance, and workers are rewarded as soon as they accomplish their tasks and the solutions are qualified. However, task requesters can partially simulate these revenue sharing


schemes by offering bonus rewards or negative rewards [Chen et al., 2011]. Formal studies on optimal task decomposition taking more parameters into consideration, such as the number of subtasks, and experimental studies that utilize MTurk could be potential topics for future research.

Definition 3.6. Shirking is said to occur when the worker's effort level falls short of that required for the employer's utility maximization, i.e., e_i^E < e_i^O, where e_i^E is the worker's optimal effort level for her own utility maximization, and e_i^O represents the optimal effort level from the employer's standpoint.

Proposition 3.7. Group-based revenue sharing leads to shirking on the part of workers in both horizontal task decomposition and vertical task decomposition.

Proof. For each worker, the first-order condition for utility maximization is given by

(1/N) f_{e_i} − c_{e_i} = 0.

For the employer, the first-order condition for utility maximization is given by

f_{e_i} − c_{e_i} = 0.

Let e_i^E and e_i^O denote the worker's and the employer's optimal effort levels, respectively, for worker i. Therefore,

(1/N) f_{e_i}(e_i^E) − c_{e_i}(e_i^E) = f_{e_i}(e_i^O) − c_{e_i}(e_i^O) = 0.

Suppose e_i^O ≤ e_i^E. Then c_{e_i}(e_i^O) ≤ c_{e_i}(e_i^E). Therefore,

f_{e_i}(e_i^O) ≤ (1/N) f_{e_i}(e_i^E) < f_{e_i}(e_i^E), for N > 1.

However, since f is concave,

f_{e_i}(e_i^O) ≥ f_{e_i}(e_i^E), for e_i^O ≤ e_i^E,

which leads to a contradiction. Therefore, e_i^O > e_i^E.


Proposition 3.8. Contribution-based revenue sharing leads to shirking on the part of workers in both horizontal task decomposition and vertical task decomposition.

Proof. For each worker, the first-order condition for utility maximization is given by

f_{e_i e_i} · e_i + f_{e_i} − c_{e_i} = 0.

For the employer, the first-order condition for utility maximization is given by

f_{e_i} − c_{e_i} = 0.

Let e_i^E and e_i^O denote the worker's and the employer's optimal effort levels, respectively, for worker i. Therefore,

f_{e_i e_i}(e_i^E) · e_i^E + f_{e_i}(e_i^E) − c_{e_i}(e_i^E) = f_{e_i}(e_i^O) − c_{e_i}(e_i^O) = 0.

Suppose e_i^O ≤ e_i^E. Then f_{e_i}(e_i^O) ≥ f_{e_i}(e_i^E). Therefore,

c_{e_i}(e_i^O) ≥ c_{e_i}(e_i^E) − f_{e_i e_i}(e_i^E) · e_i^E
⇒ c_{e_i}(e_i^O) > c_{e_i}(e_i^E),

since f is concave and hence f_{e_i e_i} < 0. However,

c_{e_i}(e_i^O) ≤ c_{e_i}(e_i^E), for e_i^O ≤ e_i^E,

which leads to a contradiction. Therefore, e_i^E < e_i^O.

According to the analysis, the shirking problem exists under both the group-based and the contribution-based revenue sharing schemes, which means that by dividing rewards equally or according to marginal contribution among the workers, the individual effort level always falls short of that required for maximum employer payoff. This implies that, from the employer's point of view, neither of the two revenue sharing schemes is effective for encouraging workers to exert high levels of effort.
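To make the shirking gap concrete, here is a toy numerical instance under group-based sharing. The quality and cost functions below are our own illustrative choices, not those used in the thesis; they merely satisfy the concavity/convexity assumptions:

```python
# Toy instance: quality f(e) = 2*sqrt(e) (increasing, concave),
# cost c(e) = e^2 / 2 (convex), N workers sharing revenue equally.
# First-order conditions give closed-form optima:
#   worker under group sharing: (1/N) * f'(e) = c'(e)  ->  e_E = N**(-2/3)
#   employer:                           f'(e) = c'(e)  ->  e_O = 1
N = 4                       # number of workers sharing the revenue
e_E = N ** (-2.0 / 3.0)     # worker's chosen effort (about 0.397)
e_O = 1.0                   # effort the employer would want
# e_E < e_O: the worker shirks, as Proposition 3.7 states.
```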


Chapter 4

MDP-Based Reward Design for Efficient Cooperative Problem Solving

4.1 Introduction

Crowdsourcing is now admired as one of the most lucrative paradigms for leveraging collective intelligence to carry out a wide variety of tasks of various complexity. When a complex task is beyond the capability of unskilled individuals (crowds) to deal with on their own, the task requester (or principal) decomposes it into smaller subtasks, which are low in complexity and can be completed by individual workers (or agents). After the task decomposition, the task requester presents the subtasks to the workers with associated rewards through crowdsourcing websites, such as Amazon Mechanical Turk (MTurk) [Ipeirotis, 2010], seeking to maximize the quality of the task's final solution.

On the other hand, the recruited workers are organized to perform the subtasks cooperatively, which makes a subtask's quality depend on the efforts of the multiple workers who jointly produce the output. In particular, in crowdsourcing, these individuals usually cooperate with each other to generate portions of a satisfactory solution in sequential workflows. A typical example is Soylent [Bernstein et al., 2010], which first contributed the three-phase workflow Find-Fix-Verify for crowdsourcing-based text editing tasks. Taking proofreading as an example: in the Find phase, workers take an article with spelling and grammar errors as input and try to identify the patches that may need corrections. Workers in the Fix phase take the patches as input and provide alternative corrections. Finally, in the Verify phase, workers accept or reject the corrections to improve the quality of the proofreading results. In this way, the Find, Fix, and Verify subtasks are chained together into positively and sequentially interdependent subtask units. The striking characteristic of this solving process is that each subtask takes the output from the previous subtask as input, which makes the sequential cooperation assumption hold: the following workers can do better based on the quality solutions provided by previous workers (see [Kulkarni et al., 2011] [Kulkarni et al., 2012]). It is worth noting that the reward for each subtask is required to be announced before subtask execution and paid to the worker who performs it once the subtask is finished.

A significant challenge in the above-mentioned crowdsourcing-based complex task solving is designing an incentive scheme for the task requester to maximize expected output quality subject to the self-interested individual workers' utility maximization, taking into account the sequential quality dependence among subtasks. This is particularly difficult when task requesters face workers with unknown, heterogeneous abilities and have a limited budget to spend on the whole complex task. Given these challenges, existing approaches typically provide task requesters with simple rules of thumb for budgeting the tasks based on the historical data of the crowdsourcing marketplaces. By comparing the nature of the tasks, the historical activity levels of the workers, and the qualities of outcomes, task requesters struggle to distribute appropriate rewards among their tasks, hoping for the best possible task solution [Faradani et al., 2011] [Ipeirotis, 2010]. However, without formal models, improper budget allocation commonly leads to task starvation or inefficient use of capital.

Against this background, this study addresses the challenge of allocating a limited budget to multiple subtasks that are sequentially interdependent. We first formalize a general model of such crowdsourcing subtasks by specifying the relationship between subtask quality and the worker's effort level, and illustrate sequential cooperation by formalizing how sequential subtasks explicitly depend on each other. Then we pose the problem of maximizing the quality of the final solution with a limited budget. The key challenge in this problem is to allocate the limited budget among the sequential subtasks. This is particularly difficult in crowdsourcing marketplaces, where the task requester posts subtasks with their payments sequentially and pays the predetermined payment to the worker once the subtask is finished. Markov Decision Processes (MDPs) are capable of capturing the task-solving process's uncertainty about the real abilities of a succession of workers, that is, the uncertainty about the qualities of the sequential subtasks. To the best of our knowledge, our study is the first attempt to apply MDPs to payment allocation for sequentially dependent subtasks in crowdsourcing. By modeling this problem as an MDP, we show the value of MDPs for optimizing the quality of the final solution by dynamically determining the budget allocation over sequential and dependent subtasks under the budget constraints. Our simulation-based evaluation verifies that, compared to several benchmark approaches, our MDP-based payment planning is more efficient at optimizing the final quality under the same budget constraints.

The rest of this chapter is organized as follows. Section 4.2 summarizes the related work, and Section 4.3 describes our model and research problem. We detail in Sections 4.4 and 4.5 how decision-theoretic techniques can be applied to define and solve our problem. Section 4.6 shows the superiority of our MDP-based approach to payment planning over several benchmark approaches. Finally, we summarize our conclusions and propose future work in Section 4.7.


4.2 Related Work

Workers in crowdsourcing marketplaces are motivated by a variety of incentives. In this work, we particularly focus on utilizing financial strategies as the incentive to improve the quality of crowd work. Work by [Ho and Vaughan, 2012] first formalized the online task assignment problem, which addresses the challenge that a single task requester faces when assigning heterogeneous tasks to workers with unknown, heterogeneous abilities, taking into account aspects of task quality and budget constraints. Like other works (see [Azaria et al., 2012] [Karger et al., 2014] [Tran-Thanh et al., 2014b] [Tran-Thanh et al., 2013]), it deals with independent subtasks and does not take into account the sequential quality dependence among subtasks. BudgetFix [Tran-Thanh et al., 2014a] is the first work to formalize the Find-Fix-Verify process; it proposes an algorithm that guarantees the quality of the task at minimal cost by determining the number of interdependent subtasks and the price to pay for each subtask given budget constraints. However, it uses a non-adaptive scheme that makes all payment assignments before any worker arrives, and is thus not suitable for our setting, where the payment for each subtask is determined dynamically during the process of task solving. Decision-theoretic techniques have shown their value for dynamic decisions in crowdsourcing. Work by [Dai et al., 2013] designs AI agents that use Partially Observable Markov Decision Processes (POMDPs) to optimize workflows used in crowdsourcing and obtains excellent cost-quality tradeoffs. However, like TurKit, it explores iterative workflows for complex task solving, which is also not suitable for our setting.

4.3 The Model and Problem Statement

At a broad level, the task requester first decomposes the complex task into smaller subtasks. These subtasks are supposed to be positively dependent on each other in a sequential way. Specifically, this requires a sequential


task solving process where one worker's output is used as the starting point for the following worker. Furthermore, any increase in the efforts of previous workers also leads to an improvement in the quality of the current subtask. These subtasks are then presented to the workers with predetermined payments through crowdsourcing websites, seeking to maximize the quality of the task's solution (under a certain limited budget), which is positively related to the efforts devoted by all the workers. The workers perform the subtasks sequentially, and each of them is self-interested and devotes an amount of effort to her own subtask for individual utility maximization.

4.3.1 Player-Specific Ability

One striking feature of most crowdsourcing marketplaces is that the workers available for a task can come from a diverse pool, and they are free to choose the tasks that interest them. The pool includes genuine experts, novices, and, at the extreme, "spammers", who submit arbitrary answers independent of the question in order to collect their fee. The task requesters do not have control over the abilities of the workers. On MTurk, workers remain anonymous to task requesters, and all payment occurs through the MTurk platform. However, since task requesters are able to accept submitted solutions or reject those that do not meet their standards, workers on MTurk are only paid if a task requester accepts their work. Based on this regulation, we assume that each worker is diligent and self-interested. That is, malicious behavior in workers' task execution is outside the scope of this research, and under a given reward system, each worker selects an effort level (at a cost that depends on her ability) such that her utility is maximized.

Associated with each worker i from the huge pool of potential labor force is a vector of abilities ~v_i = (v_{i1}, v_{i2}, · · · , v_{iN}), where v_{in} (n ≤ N, and N is the number of decomposed subtasks) represents worker i's ability to do subtask n. We suppose that the ability vector for each worker is drawn from a continuous joint probability distribution over (0,1]^N, where 1 represents the highest ability. The ability vectors for different workers are drawn independently from each other, and the distribution is known to the task requester (or principal), but the ability vector ~v_i is known only to worker i herself. For simplicity, we shall assume in this research that each worker is endowed with an ability that applies across all subtasks. More concretely, for each worker i, the skill vector ~v_i is simplified to (v_i, v_i, · · · , v_i), where v_i is drawn from a continuous distribution F(v_i) over (0,1], identically and independently of the abilities of the other workers.

In order to solve the subtasks, workers contribute their efforts, such as time and resources, to their respective subtasks. The amount of effort exerted by worker i on subtask i is denoted by e_i. It is the worker's problem to decide the optimal amount of effort to exert in order to maximize her utility. After exerting e_i units of effort, and thereby incurring a cost of C(e_i), worker i receives a payment, denoted m_i, which is positively related to the amount of her effort.

Definition 4.1 (Worker’s utility function). For subtask i, the worker’s util-ity, u(ei), is the individual reward mi minus her cost, i.e.

u(ei) = mi−C(ei), (4.1)

where C(ei)> 0 is the total cost of effort ei.

In our setting, a worker’s ability vi can be interpreted as theamount of effort she is able to exert per unit cost (denoted as costi)[DiPalantino and Vojnovic, 2009]. Therefore, the utility of worker i can beexpressed by

u(ei) = mi− ei · costi

= mi−ei

vi.

(4.2)

In our model, all workers are supposed to be diligent, so they solve theirown subtasks to the best of their efforts e∗, where

e∗i (vi) = mi · vi. (4.3)
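The effort rule in Eq. (4.3) is the largest effort a worker can exert while keeping her utility (Eq. (4.2)) non-negative. A small sketch; the payment value and the uniform ability distribution below are illustrative assumptions:

```python
import random

def optimal_effort(payment, ability):
    """Diligent worker's best effort (Eq. 4.3): e* = m_i * v_i."""
    return payment * ability

def utility(payment, effort, ability):
    """Worker's utility (Eq. 4.2): reward minus effort cost e_i / v_i."""
    return payment - effort / ability

random.seed(0)
abilities = [random.uniform(0.01, 1.0) for _ in range(5)]  # v_i over (0, 1]
for v in abilities:
    e_star = optimal_effort(0.5, v)              # payment m_i = 0.5 (assumed)
    assert abs(utility(0.5, e_star, v)) < 1e-12  # utility is zero at e*
```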


Sequential payment decisions introduce a new subtlety into the payment planning problem: each payment decision now has an expected value instead of a certain value, because of the uncertainty about future workers' abilities.
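The sequential payment planning problem can be sketched as backward induction over states (subtask index, remaining budget), with the payment for the next subtask as the action. Everything below — function names, the cent-level discretization, the square-root expected-gain — is an illustrative assumption; the thesis's MDP additionally models the uncertainty over worker abilities rather than a deterministic gain:

```python
def plan_payments(n_subtasks, budget_cents, payment_options, expected_gain):
    """Finite-horizon backward induction over states (subtask i, budget b).

    expected_gain(i, m): expected quality contribution of subtask i when its
    worker is paid m cents (assumed interface). Returns the value table V and
    a policy mapping (i, b) to the payment chosen for subtask i."""
    V = [[0.0] * (budget_cents + 1) for _ in range(n_subtasks + 1)]
    policy = {}
    for i in range(n_subtasks - 1, -1, -1):   # last subtask first
        for b in range(budget_cents + 1):
            best_value, best_m = 0.0, 0
            for m in payment_options:
                if m <= b:
                    value = expected_gain(i, m) + V[i + 1][b - m]
                    if value > best_value:
                        best_value, best_m = value, m
            V[i][b] = best_value
            policy[(i, b)] = best_m
    return V, policy

# Toy instance: 3 subtasks, a 60-cent budget, concave gain sqrt(m) per subtask.
V, policy = plan_payments(3, 60, [10, 20, 30, 40], lambda i, m: m ** 0.5)
# With a concave gain, the plan splits the budget evenly: 20 cents each.
```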

4.3.2 Value of Accomplishing A New Subtask

We now focus on the sequential task solving process where the subtasks are carried out one at a time. Consider the situation where the original task is vertically decomposed into N (N > 1) sequential subtask units. N workers contribute their efforts, such as time and resources, to the N subtasks respectively. The amount of effort exerted by worker i is characterized by e_i ∈ [0,1]. As we discussed in Chapter 3, the quality of the final solution can be viewed as the cumulative quality obtained from all subtasks. Furthermore, without loss of generality, we suppose that the benefit of the task requester from the task solution and the quality of the final solution are numerically equal. Formally, the benefit function of the task requester is defined as follows.

Definition 4.2 (Benefit function of the task requester). For the task requester, the benefit (U) from the final task solution is the cumulative quality obtained from all N decomposed subtasks:

U(e^N) = Σ_{i=1}^{N} f^i(e_i),    (4.4)

where e^N = (e_1, · · · , e_N) is the effort vector consisting of the efforts exerted by all workers on their own subtasks, and f^i(e_i) is the quality function of subtask i.

Remark 4.1. It is worthwhile to note that since all the subtasks are essential subdivisions of the original complex task, none of them is replaceable or dispensable. Hence, it is reasonable to assume that only if all the subtasks are accomplished is the original complex task regarded as accomplished and the benefit of the task requester positive. Formally,

U(e^{N′}) = Σ_{i=1}^{N′} f^i(e_i) = 0, ∀ N′ < N,    (4.5)

where N′ is an integer indexing any subtask except the last one.

The quality functions of the subtasks are thoroughly discussed in Section 3.3. Attempting to make each chapter as self-contained as possible, in the following we summarize the striking properties of the subtask quality functions and provide the necessary explanations in an intuitive way.

− Increasing and concave. The quality of subtask i is positively related to the amount of effort (e_i) exerted by the corresponding worker i, and since the time and resources possessed by individuals are limited, the marginal contribution (∂f^i/∂e_i) of each exerted effort unit to the subtask quality is decreasing [Holmstrom and Milgrom, 1991]. These endow the subtask quality functions with two basic properties, increasing and concave, i.e., the subtask's quality increases with the worker's effort at a decreasing rate (see Eq. (3.3)).

− Positive quality dependence. The positive dependence of subtask qualities manifests in two aspects. 1) Since each subtask takes the output from the previous subtask as input, the quality of subtask i's solution depends not only on the effort exerted by worker i, but also on the quality of the previous subtask's solution. Specifically, any increase in the efforts of previous workers also leads to an improvement in the quality of the current subtask (see Eq. (3.2)). 2) Each subtask has an intrinsic difficulty. High difficulty indicates a low marginal contribution for the same level of effort. The difficulty of the current subtask decreases as the efforts on previous subtasks increase (see Eq. (3.5)).

In this research, we apply the Cobb-Douglas production function as the form of our subtask quality function. The Cobb-Douglas production function is a widely used form for calculating the contribution rates of various influential factors to final efficiency growth, such as economic growth under a Cobb-Douglas production function model [Cheng and Han, 2013]. Its general form is expressed as

Y = A · X_1^{β_1} X_2^{β_2} · · · X_N^{β_N},

where X_i (i = 1, 2, · · · , N) denotes the input of the i-th factor and Y denotes the output; β_i (i = 1, 2, · · · , N) is the output elasticity of factor X_i, and A denotes the level of technical progress.

We now map the elements of the Cobb-Douglas production function onto the sequential task solving process to justify choosing it as the form of our subtask quality function. First of all, the input of the ith factor X_i is the effort exerted by worker i on the ith subtask. For example, for subtask k (k = 1, 2, · · · , N), the quality function in Cobb-Douglas form is f^k = A e_1^{β_1} e_2^{β_2} · · · e_k^{β_k}. This can be equivalently rewritten in a recursive way as f^k = f^{k-1} · e_k^{β_k}, which expresses the positive quality dependence among the sequential subtasks. Secondly, the output elasticity of the factor X_i can be represented by the intrinsic difficulty of subtask i, which is decided at the task decomposition stage and denoted ω_i. Once we assume normalization and constraints (ω_i ∈ (0,1) and ∑_{i=1}^{N} ω_i = 1), the resulting Cobb-Douglas production function, i.e., for subtask k, f^k = A e_1^{ω_1} e_2^{ω_2} · · · e_k^{ω_k}, satisfies the increasing and concave properties. Finally, since the difficulty of the first subtask cannot be modified at the very beginning, and for simplicity of analysis, we set A = 1. Formally, we define the subtask quality function in the following form.

Definition 4.3. For subtask i, the quality function f^i is

f^i(e_1, · · · , e_i) = ∏_{k=1}^{i} e_k^{ω_k}    (4.6)


[Figure 4.1: two panels plotting subtask quality f^i(e_i) (y-axis, 0–1) against effort by worker i, e_i (x-axis, 0–1). (a) Subtask difficulties: curves for ω_i = 0.2, 0.4, 0.6, 0.8. (b) Positive quality dependence: curves for f^{i-1} = 0.2, 0.4, 0.6, 0.8.]

Figure 4.1: (a) illustrates the subtask qualities given the same ability of workers under various subtask difficulties. With a fixed effort, the lower the difficulty value (ω_i), the higher the quality of the subtask (f^i(e_i)). (b) illustrates the positive dependence among subtask qualities. With a fixed effort, higher quality of the previous subtask (f^{i-1}) enables the successive subtask to achieve higher quality (f^i).

where ω_i ∈ (0,1) is the difficulty of subtask i, and ∑_{i=1}^{N} ω_i = 1.

Remark 4.2. Eq. (4.6) satisfies the requirements in Definition 3.2: a high value of ω_i corresponds to high difficulty and indicates low marginal contribution for the same amount of effort. Fig. 4.1(a) illustrates the qualities of subtasks under various subtask difficulties. For a fixed level of effort, lower difficulty (i.e., a lower value of ω_i) leads to higher quality. Furthermore, Fig. 4.1(b) illustrates the positive quality dependence between two successive subtasks: when subtask i−1 comes up with a higher-quality solution, the difficulty of subtask i becomes lower. It is worth noting that the constraints on the difficulty coefficient indicate that any subtask is neither replaceable (i.e., ω > 0) nor dispensable (i.e., ω < 1), and only if all the subtasks are accomplished is the original task regarded as accomplished.
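As a quick sanity check of the two properties, the following minimal sketch evaluates Eq. (4.6) for a short chain of subtasks; the effort and difficulty values here are illustrative, not taken from the thesis:

```python
from itertools import accumulate

def subtask_quality(efforts, difficulties):
    """Quality f^i of each subtask i per Eq. (4.6): f^i = prod_{k<=i} e_k^{w_k}."""
    factors = [e ** w for e, w in zip(efforts, difficulties)]
    return list(accumulate(factors, lambda f, x: f * x))

w = [0.2, 0.3, 0.5]  # illustrative difficulties summing to 1 (N = 3)

# Increasing + positive dependence: raising worker 1's effort lifts every later quality.
low = subtask_quality([0.4, 0.5, 0.5], w)
high = subtask_quality([0.8, 0.5, 0.5], w)
assert all(h > l for h, l in zip(high, low))

# Concave: equal effort increments yield shrinking quality gains.
q = [subtask_quality([e], [0.5])[0] for e in (0.2, 0.4, 0.6)]
assert q[1] - q[0] > q[2] - q[1]
```

The running product makes the positive dependence explicit: each factor e_k^{ω_k} multiplies into every subsequent subtask's quality.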


We now formalize the problem addressed in this study. The task requester, who decomposes a complex task into multiple small dependent subtasks and posts them on the crowdsourcing system sequentially, can be viewed as a principal. The requester has a budget M = ∑_{i=1}^{N} m_i > 0 and a benefit function (Eq. (4.4)). Due to their low complexity, the small subtasks can be performed by individual workers. As assumed in the above model, each subtask performed by a worker has a value, which equals the subtask quality; thus the requester's objective is to maximize the final quality of the complex task subject to a fixed budget constraint. More formally, the requester's utility maximization problem can be represented by the following mathematical programming formulation:

max U(e_1, e_2, · · · , e_N), where e_i = m_i · v_i,

s.t. m_1 + m_2 + · · · + m_N = M,
     m_i > 0, i = 1, · · · , N.    (4.7)
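For intuition, problem (4.7) can be solved by brute force once payments are discretized (anticipating the discretization of Section 4.4.3). The sketch below is illustrative only: it assumes the requester's utility U is the accumulated subtask quality (as later used in Definition 4.5) and that the worker abilities v_i are known in advance, which the online setting does not allow.

```python
from itertools import product

def utility(m, v, w):
    """U = sum of subtask qualities f^i with e_i = m_i * v_i (cf. Eqs. (4.6) and (4.7))."""
    u, f = 0.0, 1.0
    for mi, vi, wi in zip(m, v, w):
        f *= (mi * vi) ** wi  # f^i = f^{i-1} * e_i^{w_i}
        u += f
    return u

def best_allocation(M, v, w):
    """Exhaustive search over positive integer payments m_1 + ... + m_N = M."""
    n = len(w)
    best_u, best_m = float('-inf'), None
    for m in product(range(1, M - n + 2), repeat=n):
        if sum(m) == M:
            u = utility(m, v, w)
            if u > best_u:
                best_u, best_m = u, m
    return best_m, best_u

# Illustrative: with equal abilities and difficulties, the first subtask draws
# more budget, because its effort enters every later subtask's quality.
m, u = best_allocation(4, v=[1.0, 1.0], w=[0.5, 0.5])
assert m == (3, 1)
```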

4.4 MDP-based Payment Planning

4.4.1 Markov Decision Processes

We are interested in online crowdsourcing settings, where task requesters are required to post tasks along with specified payments based on an expected quality of the task solution. Individual workers can then elect to perform any of these tasks, and receive the corresponding payments as soon as they successfully complete them. Due to the uncertainty of workers' abilities, the task requester confronts the problem of deciding the payment for a task or subtask before she observes either the ability of the worker or the quality of the task solution. Moreover, in a budget-constrained sequential subtask solving process, this uncertainty makes payment allocation planning even more difficult.

The sequential payment decision problem induces a Markov Decision Process (MDP), which can represent the uncertainty about the real abilities of a succession of workers throughout the sequential task solving process, and which, when solved, can be used to implement an efficient payment allocation that guarantees the quality of the final solution to the original task.

Definition 4.4 ([Puterman, 2009]). An MDP is defined as a four-tuple 〈S, A, P, R〉, where

− S is a finite set of discrete states.
− A is a finite set of all actions.
− P: S × A × S → [0,1] (transition probabilities): for each state-action pair (s,a), a next-state distribution P_a(s′|s) specifies the probability of transitioning to each state s′ upon execution of action a from state s.
− R: S × A → ℝ (reward distributions): for each state-action pair (s,a), a distribution R(s,a) on the real-valued reward for taking action a in state s.

The Markov Decision Process is commonly used to formulate fully observable decision-making problems under uncertainty [Kolobov, 2012]. Putting incentives to one side for now, the MDP execution is described as follows. An agent executes actions in discrete steps starting from some initial state. At each step, the system is in one distinct state s ∈ S. The agent can execute any action a from the action set A, and receives a reward R(s,a). The action takes the system to a new state s′ stochastically. The transition process exhibits the Markov property, i.e., the new state s′ is independent of all previous states given the current state s. The transition probability is defined by P_a(s′|s). The model assumes full observability, i.e., after executing an action and transitioning stochastically to a next state as governed by P, the agent has full knowledge of the state.


4.4.2 Payment Planning in MDPs

In our sequential task solving setup, the agent's (i.e., the task requester's) payment planning problem is defined as follows. As input the agent is given a set of subtasks, and the agent is asked to return a final solution with the highest quality. In pursuit of this goal, the agent is allowed to determine the payment for each subtask within the limited budget. We now model the agent's sequential payment decision problem as an MDP. To do this, we first define a generative model for subtask execution by crowdsourcing workers of various uncontrolled abilities.

We assume workers are self-interested, so they exert effort on their own subtasks to the best of their abilities to maximize their individual utilities. The quality of each subtask increases as the effort gets larger. Moreover, the quality of the current subtask increases as the quality of the previous subtask's solution becomes higher. Specifically, according to Eq. (4.6), given the previous subtask's solution with quality f^{i-1}, the quality of the current subtask i with difficulty ω_i is governed by the following probabilistic equation:

P_{m_i}(f^i | f^{i-1}) = P_{m_i}(f^i = f^{i-1} · e_i^{ω_i})    (4.8)

Furthermore, when subtask i is performed by worker i with ability v_i and a predetermined payment m_i is provided, the above transition probability function (Eq. (4.8)) is specified, according to the optimal effort of worker i given by Eq. (4.3), as

P_{m_i}(f^i | f^{i-1}) = P_{m_i}(f^i = f^{i-1} · (m_i v_i)^{ω_i}) = F(v_i)    (4.9)

where F(v_i) is the distribution function from which v_i is drawn, independently of the abilities of the other workers.

We now formally define our sequential payment allocation decision problem as a planning problem in MDPs.

Definition 4.5. The MDP for our sequential payment allocation problem with a limited budget M is a four-tuple 〈S, A, P, R〉, where

− S = {(i, U(e_{i-1}), M_i)}. The three elements of a state are:
  1. i ∈ {1, · · · , N}, the subtask whose turn it is to be executed by worker i;
  2. U(e_{i-1}) = ∑_{k=1}^{i-1} f^k, the quality of the partially completed task, i.e., the accumulated quality of subtasks 1 to i−1;
  3. M_i = M − ∑_{k=1}^{i-1} m_k, the remaining budget available to the task requester for the rest of the subtasks.
− A = {set payment m_i | m_i ∈ (0, M_i]}, where m_i is the admissible payment for performing subtask i.
− P = {P_{m_i}(f^i | f^{i-1}) = F(v_i)}.
− R = {f^i | f^i > 0}, where f^i is the quality of subtask i.
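One step of this MDP's dynamics can be sampled directly from Eq. (4.9): given the previous quality f^{i-1} and a chosen payment m_i, draw an ability v_i and compute the next quality. The sketch below is a hypothetical generative model; the truncated-normal ability draw anticipates the distribution adopted in Section 4.6.1, and the μ, σ, payment, and difficulty values are illustrative.

```python
import random

def sample_transition(f_prev, m_i, w_i, mu=0.5, sigma=0.2, rng=random):
    """One sample of Eq. (4.9): draw ability v_i, return f^i = f^{i-1} * (m_i * v_i)^{w_i}."""
    v_i = min(1.0, max(1e-6, rng.gauss(mu, sigma)))  # ability clipped into (0, 1]
    return f_prev * (m_i * v_i) ** w_i

# Simulate one three-subtask episode with illustrative payments and difficulties.
random.seed(0)
f = 1.0  # no work done yet
for m_i, w_i in zip([0.4, 0.3, 0.3], [0.2, 0.3, 0.5]):
    f = sample_transition(f, m_i, w_i)
assert 0.0 < f <= 1.0  # with payments and abilities in (0, 1], quality stays in (0, 1]
```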

4.4.3 Continuous States and Discretization

As we see in Definition 4.5, the MDP for the online payment planning problem unavoidably leads to a large state space, for two intrinsic reasons. The first is the continuous action space, since the requester can assign an arbitrary amount of payment (m_i ≤ M) to subtask i. The second is the continuous state space. In the payment planning setting, assigning payment m_i cannot guarantee the quality f^i of the subtask, since the quality also depends on the worker's ability v_i according to Eq. (4.6) and Eq. (4.3), which is drawn from a continuous probability distribution F(v_i) over (0,1]. The infinite number of actions and possible worker abilities makes S = ℝ^{2N} an infinite set of states.

In order to solve the continuous-state MDP, we first discretize both the action space and the state space. By discretizing the budget into M (> N) units, the payment for each subtask is restricted to the integer set {1, · · · , M}. Furthermore, we maintain the assumption that ability values are independently distributed from F(v), but divide them into L (> 1) ability levels. We can then approximate the continuous-space MDP by a discrete-state one, 〈S, A, {P_sa}, R〉, where S = (M × L)^N is the set of discrete states,


[Figure 4.2: a two-step decision tree. From the initial state s_1 = (1, 0, M), payment choices m_{1,1}, · · · , m_{1,k} for subtask 1 lead, via the sampled ability v_1, to states of the form (2, (m_1 v_1)^{ω_1}, M − m_1); the payment for subtask 2 then leads, via v_2, to the final solution.]

Figure 4.2: Decision tree representation of MDP-based payment planning with discretized state space for a two-subtask solving process.

{P_sa} are our state transition probabilities over the discrete states and actions. When the actual payment planning is in some continuous-valued state and we need to decide a payment for the subtask, we compute the corresponding discretized state s, and execute the action π*(s) given by Eq. (4.14).

In Fig. 4.2 we visualize the MDP-based payment planning model with discretized state space for a two-subtask solving process (i.e., N = 2). It is a two-step-horizon decision tree. From the starting point s_1 there are k (< M) levels of payment, m_{1,1}, · · · , m_{1,k}. The chosen payment gives the probabilities of moving to the next states. The rewards for moving to the next states depend on the abilities of the workers, which are discretized into two levels (i.e., L = 2). Notice that, since there are only two subtasks, the payment for subtask 2 admits only one choice on each branch, constrained by m_{1,i} + m_{2,i} = M, where i = 1, · · · , k.


4.5 Planning Algorithm

4.5.1 Solving an MDP

Methods for solving an MDP include value iteration [Bellman, 1956], policy iteration, and linear programming [Bertsekas et al., 1995] [Puterman, 2009]. Any solution to an MDP is in the form of a policy.

Definition 4.6 ([Dai et al., 2013]). A policy, π : S → A, of an MDP is a mapping from the state space to the action space.

A policy is static if the action taken at each state is the same at every time step. π(s) indicates which action to execute when the system is in state s. Solving the planning problem in an MDP consists of computing an optimal policy (π*: S → A), an execution plan that associates with any state s ∈ S an optimal action π*(s) and achieves the maximum expected sum of rewards. A policy π is evaluated by its value function:

V^π(s) = R(s, π(s)) + γ ∑_{s′∈S} P_{π(s)}(s′|s) V^π(s′),    (4.10)

where γ ∈ [0,1) is the discount factor, which makes the model a discounted MDP. Any optimal policy's value function satisfies the following Bellman equations:

V*(s) = max_{a∈A} [ R(s,a) + γ ∑_{s′∈S} P_a(s′|s) V*(s′) ].    (4.11)

The corresponding optimal policy can be extracted from the Bellman equation:

π*(s) = argmax_{a∈A} [ R(s,a) + γ ∑_{s′∈S} P_a(s′|s) V*(s′) ].    (4.12)

Given an implicit optimal policy π* in the form of its optimal value function V*(s), we measure the superiority of an action by a Q-function: S × A → ℝ.

Definition 4.7 ([Dai et al., 2013]). The Q*-value of a state-action pair (s,a) is the value of state s if the immediate action a is performed, followed by π* afterwards. More concretely,

Q*(s,a) = R(s,a) + γ ∑_{s′∈S} P_a(s′|s) V*(s′).    (4.13)

Therefore, the optimal value function and the optimal policy can be expressed as:

V*(s) = max_{a∈A} Q*(s,a),  π*(s) = argmax_{a∈A} Q*(s,a), for all s ∈ S.    (4.14)
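For a small discretized MDP, the fixed point of the Bellman equations (4.10)–(4.14) can be computed directly with the value iteration method mentioned above. The two-state toy MDP below is purely illustrative and has no connection to the payment domain:

```python
def value_iteration(S, A, P, R, gamma=0.9, eps=1e-8):
    """Iterate the Bellman backup (Eq. (4.11)) to a fixed point; extract pi* (Eq. (4.12))."""
    V = {s: 0.0 for s in S}
    while True:
        V_new = {s: max(R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in S) for a in A)
                 for s in S}
        delta = max(abs(V_new[s] - V[s]) for s in S)
        V = V_new
        if delta < eps:
            break
    pi = {s: max(A, key=lambda a: R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in S))
          for s in S}
    return V, pi

# Toy 2-state MDP: in state 0, 'go' pays 1 and moves to state 1; in state 1, 'stay' pays 2.
S, A = [0, 1], ['stay', 'go']
P = {0: {'stay': {0: 1.0, 1: 0.0}, 'go': {0: 0.0, 1: 1.0}},
     1: {'stay': {0: 0.0, 1: 1.0}, 'go': {0: 1.0, 1: 0.0}}}
R = {0: {'stay': 0.0, 'go': 1.0}, 1: {'stay': 2.0, 'go': 0.0}}
V, pi = value_iteration(S, A, P, R)
assert pi == {0: 'go', 1: 'stay'}
```

At the fixed point, V*(1) = 2/(1 − γ) = 20 and V*(0) = 1 + γ·20 = 19, matching Eq. (4.11).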

4.5.2 UCT algorithm

Although we simplified the continuous-state payment planning problem by discretizing both the action space and the state space, the state space remains large and grows exponentially with the number of subtasks (S = (M × L)^N).

UCT [Kocsis and Szepesvari, 2006] is a planning algorithm that can successfully cope even with exponentially sized state spaces. It has achieved remarkable success in the challenging game of Go [Gelly et al., 2006]. Exponential state spaces plague other MDP solvers ultimately because those solvers attempt to enumerate the domain of the transition function for various state-action pairs, e.g., when summing over the transition probabilities or when generating action outcomes to build determinizations. As a consequence, they need the transition function to be explicitly known and efficiently enumerable. Instead, being a Monte Carlo planning technique, UCT only needs to be able to efficiently sample from the transition function and the cost function.

Based on the multi-armed bandit algorithm UCB [Auer et al., 2002], UCT effectively selects an action in each state based on a combination of two characteristics: an approximation of the action's Q-value (Q(s,a)) and a measure of how well explored the action is in this state (C√(ln(n_s)/n_{s,a})). More specifically, UCT selects the action that maximizes an upper confidence bound on the action value,

a* = argmax_{a∈A} { Q(s,a) + C √(ln(n_s)/n_{s,a}) },    (4.15)

where Q(s,a) is the estimated value of action a in state s, n_s is the number of times state s has been visited, and n_{s,a} is the number of times action a was selected when state s was visited. Clearly, for each s, n_s = ∑_{a∈A} n_{s,a}. Details are shown in Algorithm 1.

Algorithm 1: PaymentPlanning
Data: initial state s
Result: optimal payment a
1 repeat
2     Search(initial state)
3 until timeout
4 return bestPayments

4.6 Efficiency Analysis

4.6.1 Settings

We explore the efficiency of MDP-based payment planning in settings that take into consideration the conclusions obtained from numerous empirical and theoretical studies of crowdsourcing task solving and user behavior. The parameters used in the simulations are summarized in Table 4.1.


Function Search(state s)
Result: optimal reward Q
1  if s == terminal then
2      return 0
3  end if
4  if s has not been visited before then
5      n_s ← 0
6      for ∀a ∈ A do
7          n_{s,a} ← 0
8          Q(s,a) ← 0
9      end for
10 end if
11 if random number < 0.01 then
12     a′ ← random action(s)
13 else if there is an untried action in s then
14     a′ ← random action of untried(s)
15 else
16     a′ ← argmax_{a∈A} { Q(s,a) + C√(ln(n_s)/n_{s,a}) }
17 end if
18 /* State transition from s to s′ based on Eq. (4.9) */
19 s′ ← simulate action a′ in state s
20 n_s ← n_s + 1
21 n_{s,a′} ← n_{s,a′} + 1
22 Q(s,a′) ← Q(s,a′) + γ · Search(s′)
23 return Q(s,a′)
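The Search recursion can be rendered compactly in Python. The sketch below is illustrative rather than the thesis's actual payment domain: the toy reward model, the constants C and γ, the rollout count, and the choice to fold exploration into an "untried actions first" rule plus a running-average Q update (a common UCT variant, instead of the 0.01 random-action step above) are all assumptions.

```python
import math
import random

def uct_search(s, depth, actions, simulate, stats, C=1.0, gamma=0.95, rng=random):
    """One rollout of the Search routine: returns a sampled discounted return from s."""
    if depth == 0:  # terminal test (fixed horizon here)
        return 0.0
    if s not in stats:  # first visit: initialize counters
        stats[s] = {'n': 0,
                    'na': {a: 0 for a in actions(s)},
                    'Q': {a: 0.0 for a in actions(s)}}
    node = stats[s]
    untried = [a for a in actions(s) if node['na'][a] == 0]
    if untried:  # try each action once before trusting UCB
        a = rng.choice(untried)
    else:        # UCB1 action selection, Eq. (4.15)
        a = max(actions(s), key=lambda act: node['Q'][act] +
                C * math.sqrt(math.log(node['n']) / node['na'][act]))
    r, s2 = simulate(s, a, rng)  # stochastic transition, cf. Eq. (4.9)
    ret = r + gamma * uct_search(s2, depth - 1, actions, simulate, stats, C, gamma, rng)
    node['n'] += 1
    node['na'][a] += 1
    node['Q'][a] += (ret - node['Q'][a]) / node['na'][a]  # running average of returns
    return ret

# Toy domain: per step, 'safe' pays 0.5; 'risky' pays 1.0 with probability 0.3.
def actions(s):
    return ['safe', 'risky']

def simulate(s, a, rng):
    r = 0.5 if a == 'safe' else (1.0 if rng.random() < 0.3 else 0.0)
    return r, s + 1

random.seed(1)
stats = {}
for _ in range(5000):
    uct_search(0, 3, actions, simulate, stats)
assert stats[0]['Q']['safe'] > stats[0]['Q']['risky']
```

As in Algorithm 1, the planner would run such rollouts from the initial state until timeout and then return the root action with the highest Q estimate.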


Table 4.1: Summary of parameters

Parameter               Symbol   Value
No. of ability levels      L       5
No. of subtasks            N       5
No. of budget units        M      10

Normal ability distribution

We choose the ability distribution function based on conclusions from empirical user studies of crowdsourcing marketplaces. Following studies of both a crowdsourcing market for general-purpose tasks, MTurk [Ipeirotis, 2010], and a crowdsourcing portal with strong market orientation, TopCoder [Archak, 2010], we assume the ability distribution has a concave cumulative distribution function (F(v)) of normal form with mean µ and variance σ² > 0, as follows.

F(v) = Φ((v−µ)/σ) (4.16)

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt. The normal ability distribution indicates that there are more workers with relatively low abilities.
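The probability mass of each of the L discretized ability levels can be computed from Eq. (4.16) using the standard-library error function. The μ and σ values below are illustrative choices, not the thesis's parameters (a mean below 0.5 reflects the observation that low-ability workers are more numerous):

```python
import math

def normal_cdf(x, mu, sigma):
    """F(v) of Eq. (4.16), via Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def level_probs(L=5, mu=0.4, sigma=0.15):
    """Probability mass of L equal-width ability levels over (0, 1], renormalized."""
    edges = [i / L for i in range(L + 1)]
    raw = [normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
           for a, b in zip(edges, edges[1:])]
    total = sum(raw)  # discard the mass falling outside (0, 1]
    return [p / total for p in raw]

p = level_probs()
assert abs(sum(p) - 1.0) < 1e-9
assert p[1] > p[3]  # more mass at the lower ability levels when mu < 0.5
```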

Discretized state space

Online reputation systems are a key component of virtually all crowdsourcing websites. [Archak, 2010] provides evidence that a worker's reputation is a significant indicator of his ability. A reputation system collects and aggregates the historical records of workers' behavior to form their reputations. For example, users on TopCoder are divided into five groups according to their ratings. Without loss of generality, we assume the number of ability levels to be 5 (L = 5). Furthermore, as discussed in Subsection 4.4.3, we discretize the budget into M = 10 units.


Subtask types

Crowdsourcing applications that apply a sequential task solving process and exhibit outstanding effectiveness typically have a small set of subtask types. For example, CrowdForge provides a high-level structure for task decomposition that generally consists of three types of subtasks [Kittur et al., 2011]. Soylent likewise contributes a specific three-stage pattern, Find-Fix-Verify, to text editing services [Bernstein et al., 2010]. Furthermore, this three-stage pattern is shown to be effective not only for word processing but also for a photo analysis application [Noronha et al., 2011]. Guided by these empirical studies, we consider a five-subtask (N = 5) situation in the simulation, since the task requester may prefer additional stages for quality control.

Difficulty combinations

Through task decomposition, the task requester decomposes the complex task into subtasks with various difficulty combinations. Since these subtasks represent the elementary units composing the original task, their difficulties are determined intrinsically by the characteristics of the task. Without loss of generality, we explore MDP-based payments under three difficulty combinations: 1) average difficulties, which gives all subtasks the same difficulty; 2) increasing difficulties, which starts with an easy subtask and leaves the most difficult one to the last stage; and 3) decreasing difficulties, which, on the contrary, starts with the most difficult subtask.

Benchmark approaches

Consider an (unrealistic) offline payment planning algorithm with complete access to the true abilities of the succession of workers. The maximal final quality in this setting can be achieved by a constrained optimization algorithm. We denote this benchmark by OPT-Quality, i.e., the optimal payment quality. Other ways to predetermine the payments for all subtasks are to divide the total payment proportionally to the subtasks' difficulties or to the abilities of the workers, which we denote by D-Quality, i.e., difficulty-based payment quality, and A-Quality, i.e., ability-based payment quality, respectively.

[Figure 4.3: task quality (y-axis, 0.4–1.0) versus subtask number n = 1–5, comparing OPT-Quality, MDP-Quality, A-Quality, and D-Quality.]

Figure 4.3: Efficiency comparison for average difficulties.

4.6.2 Simulation results

In Figs. 4.3, 4.4 and 4.5, we compare the efficiency of MDP-based payment planning with the three benchmark approaches for the three difficulty combinations. For each combination, we test 20 randomly generated data sets; for each data set we run an average of at least 3000 trials of the UCT algorithm, and we plot the final qualities averaged over the 20 data sets. The constant factor C in Eq. (4.15) is set to 0.06, determined by trial and error as yielding relatively fast convergence. We now present and discuss the findings from our simulation.

[Figure 4.4: task quality (y-axis, 0.4–1.0) versus subtask number n = 1–5, comparing OPT-Quality, MDP-Quality, A-Quality, and D-Quality.]

Figure 4.4: Efficiency comparison for increasing difficulties.

As shown in Figs. 4.3, 4.4 and 4.5, when our approach (MDP-Quality) is applied, the task's final quality generally approximates the optimal quality produced by the offline optimal payments (OPT-Quality). For all difficulty combinations, the figures clearly show that, of the two offline payment planning approaches, ability-based payment planning always outperforms difficulty-based payment planning, and the latter is inferior to all the other approaches. This indicates that the abilities of workers play a more significant role than the difficulties of subtasks in determining the quality of the final solution.

[Figure 4.5: task quality (y-axis, 0.2–1.0) versus subtask number n = 1–5, comparing OPT-Quality, MDP-Quality, A-Quality, and D-Quality.]

Figure 4.5: Efficiency comparison for decreasing difficulties.

We can see in Figs. 4.3, 4.4 and 4.5 that MDP-based payments always generate relatively high qualities for the first two sequential subtasks. In particular, for the average and increasing difficulty combinations (see Fig. 4.3 and Fig. 4.4), the qualities of the first two subtasks exceed the corresponding qualities brought about by the optimal payments. Since, in the sequential subtask solving model, the first subtask's quality is crucial to the final quality, the MDP-based payment planning tends to allocate a relatively high payment to the first subtask. This strategy works well under the average and decreasing difficulty combinations (see Fig. 4.3 and Fig. 4.5). However, when the difficulties of the subtasks are increasing, this strategy, which allocates high payments in the beginning, leaves an inadequate budget for maintaining the quality growth rate on the increasingly difficult later subtasks, as shown in Fig. 4.4.

4.7 Summary

This study addresses the challenge of allocating a limited budget to multiple sequentially interdependent subtasks. Our first contribution is a general model of such crowdsourcing subtasks that specifies the relationship between subtask quality and the worker's effort level, and how sequential subtasks explicitly depend on each other. We then pose the problem of maximizing the quality of the final solution under a limited budget. A key challenge in this problem is to allocate the limited budget among the sequential subtasks. This is particularly difficult in crowdsourcing marketplaces, where the task requester posts subtasks with their payments sequentially and pays the predetermined payment to the worker once the subtask is finished. By modeling this problem as a Markov Decision Process (MDP) that captures the task-solving process's uncertainty about the real abilities of the succession of workers, that is, the uncertainty about the qualities of the sequential subtasks, we show the value of MDPs for optimizing the quality of the final solution by dynamically determining the budget allocation across sequential dependent subtasks under the budget constraints and the uncertainty of workers' abilities. Our simulation-based evaluation verifies that, compared to several benchmark approaches, MDP-based payment planning is more efficient at optimizing the final quality under the same budget constraints.


Chapter 5

A Division Strategy for Efficient Quality Control

5.1 Introduction

Crowdsourcing is an online, distributed problem-solving and web-based business model that has emerged in recent years [Brabham, 2008], and it is now admired as one of the most lucrative paradigms for leveraging the collective intelligence of crowds. This very success depends on the diversity of crowds: not only the diversity of opinion and skill, but also the diversity of roles played by the crowds in the crowdsourcing process.

A common crowdsourcing process proceeds in three phases: (i) a task is announced, usually with its requirements, reward and time period; (ii) crowds work effortfully and compete to provide the best solution; (iii) all of the solutions are examined, a subset of solutions is selected, and the corresponding users are granted the rewards. Although the crowd's primary role is as task solvers in the second phase, crowds have become more and more important in the solution examination phase for a wide variety of tasks, working as solution examiners to help screen or pick out the best solutions. For example, Stack Overflow, a Q&A site for programmers, introduced a voting system to help valuable pieces of technical knowledge become more visible [Mamykina et al., 2011]. Specifically, users earn the right to examine the contents of other users' posts (questions and answers) and to vote on posts that they consider especially useful and appropriate. An upvote on a post helps it become more visible, i.e., appear nearer the front of the post list. Another example is Threadless, which allows its community members to score the designs of t-shirts submitted by other members [Brabham, 2010].

In this study, we explore a model for the crowdsourcing software development process, where crowdsourcing-based bug detection is introduced as a solution examination phase. Specifically, we model this process as a two-stage process in which the crowds who selected the same programming problem announced by the crowdsourcing center (i) in the submission stage, work as coders independently and submit their unique pieces of program code to the crowdsourcing center, and (ii) in the bug detection stage, are presented with all of the code that has been submitted to the center, select one piece of it, and work as bug detectors. Each piece of code is endowed with a reward and has a single chance to be proved incorrect. Hence, the reward is granted to the bug detector who is the first to provide a test case that proves his selected code to be incorrect. For instance, the algorithm competition on TopCoder.com includes a coding stage, where crowds are presented with the same programming problem and are expected to submit their own code within a limited time, and a challenge stage, where each of the users has a chance to challenge the functionality of the code of others. A successful challenge results in a 50-point reward for the challenging user [Lakhani et al., 2010].

We are of the opinion that this bug detection process has some strong advantages. Above all, it is a crowdsourcing-based process that can eliminate the incorrect code and retain the correct code more efficiently and economically than employing a small number of experts to test the large volume of code. Secondly, bug detection by going through others' programs can benefit the programmers, since it gives them a chance to learn from others and improve their skills. In contrast, crowds fail to improve in crowdsourcing without such processes [Yang et al., 2008b].

In spite of the above-mentioned advantages, the crowdsourcing-based bug detection paradigm has an inherent weakness in terms of the distribution of the code selected to be debugged. Because the qualities of the submitted code vary widely, crowds tend to congregate on the pieces with relatively low quality, which give them high probabilities of successful bug detection; this leaves a substantial part of the solutions unchecked and leads to redundant examination of the other part. From an efficiency standpoint, adopting the perspective of a principal whose goal is to identify the maximum number of incorrect pieces of code, we view the uneven distribution of debugged code as low efficiency.

The goal of this study is to mitigate the uneven distribution problem and thus improve crowdsourcing-based bug detection from an efficiency standpoint. To achieve this goal, we face the challenge of balancing freedom and restriction in code selection. It is noteworthy that crowdsourcing adopts a policy of maximum freedom of task selection, which encourages crowds to solve tasks based on their own preferences and has thus become indispensable for crowdsourcing's prosperity. Because of that, the traditional assignment problem [Kuhn, 1955], which has been well studied for assigning a given set of workers with varying preferences to a given set of jobs so as to achieve maximum efficiency, is not appropriate for the crowdsourcing setting.

By contrast, a division strategy is considered to be highly effective. The basic idea of the division strategy is to control the crowds' skill distribution in each division by strategically dividing crowds with different skills into divisions, and to restrict code selection to the division, i.e., the crowds can only select from the codes produced by the crowds in the same division. By doing this, we are able to guide the crowds to select from codes whose quality levels correspond to their skill levels. The division strategy can therefore reliably work out a more uniform distribution of debugged codes and thus an improvement in bug detection efficiency.

We construct this chapter as follows. In Section 5.3, we model the bug detection contest as an all-pay auction, and rigorously analyze the relationship between strategic code selection behaviors and the offered rewards. In Section 5.4, we present the basic idea of the division strategy and explore two key features of efficient division strategies. In Section 5.5, we conduct a simulation that verifies the effectiveness of the division strategy on bug detection efficiency. Finally, Section 5.6 concludes our results, discusses the limitations of our model, and suggests potential topics for future research.

5.2 Related Work

There has recently been work on addressing the efficiency problem of crowdsourcing in which crowdsourcing-based contests are modeled as all-pay auctions, a well-studied mechanism that is frequently employed in the contest literature [Baye et al., 1996]. To date, most previous research has focused on crowdsourcing contests in which the principal only benefits from the submission with the highest bid (i.e., the solution with the highest quality), with the aim of optimizing the quality of the best submission [Archak and Sundararajan, 2009] [Chawla et al., 2012] [Ghosh and McAfee, 2012].

Instead of suffering the loss of the submissions provided by non-winners, our research takes full advantage of the all-pay auction, with the goal of optimizing the sum of qualities for the principal; in connection to bug detection, this is the total number of debugged codes. Moldovanu and Sela [Moldovanu and Sela, 2001] study a contest model with multiple prizes for the sum-of-qualities objective. Another work of theirs [Moldovanu and Sela, 2006] studies both the highest-quality-submission objective and the sum-of-qualities objective in a multi-stage contest model. Minor [Minor, 2011] studies incentive design for contests with heterogeneous players to elicit the maximum sum of qualities. In contrast, Cavallo and Jain [Cavallo and Jain, 2012] consider crowdsourcing from the perspective of a social planner who seeks to maximize social welfare.

Empirical research shows evidence that both monetary rewards and non-monetary rewards, taking the form of reputation points, provide incentives for crowds to submit high-quality solutions [Mason and Watts, 2010] [Yang et al., 2008b]. DiPalantino and Vojnovic [DiPalantino and Vojnovic, 2009] theoretically demonstrate that the qualities of submissions increase logarithmically as a function of the offered reward. These results motivate reward allocation design in crowdsourcing contests for improving efficiency. In particular, contests with multiple prizes have become an effective way to maximize the sum of effort investment [Cason et al., 2010] [Clark and Riis, 1998a] [Moldovanu and Sela, 2001] [Szymanski and Valletti, 2005]. In addition, contest architecture design is also used for improving contest efficiency. A contest architecture specifies a multi-stage contest in which players are split among several sub-contests whose winners compete against each other. Optimal structures and reward allocations have been studied in a substantial body of literature [Moldovanu and Sela, 2006] [Fu and Lu, 2012].

Our work differs from both of the above methods mainly in the following two aspects. 1) Instead of considering one contest with multiple rewards, our bug detection model is more akin to a collection of separate contests, each with a single reward. 2) Instead of allocating multiple prizes to the winners, we indirectly control the rewards of a set of contests using the division strategy. A notably relevant work is [DiPalantino and Vojnovic, 2009].

5.3 Preliminaries

5.3.1 Bug Detection Model

It is natural to view the competitive nature of the bug detection stage as a contest, i.e., crowds who selected the same piece of code for bug detection compete among themselves for the reward. We model the contest as an all-pay auction, and consider a highest-bid-wins single-item all-pay auction, in which bidders simultaneously submit sealed bids for an item. All bidders forfeit their bids. The auctioneer awards the item to the bidder with the highest bid, and keeps all the bids.

In connection to the bug detection contest, the item is the reward, the bids are the actions of bug detection (e.g., constructing test cases), and the sealed value of the bids is the effort exerted in the bug detection. Specifically, we consider the bug detection stage as a game in which each player, i.e., the bug detector, selects a contest, i.e., a unique piece of code, and exerts effort, and in each contest the player with the highest effort wins the contest. In the event of a tie, a winner is selected uniformly at random among the highest bidders. Consider N players. Associated with each player i is his skill parameter, vi, where vi ∈ (0,1] is drawn from a non-decreasing distribution function F(v) independently of the skills of the other players. Player i's skill is higher than player j's if vi > vj.
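The winner-determination rule just described is easy to simulate. The sketch below is our own illustration, not part of the thesis model; the effort values are hypothetical inputs.

```python
import random

def all_pay_auction_winner(efforts, rng=random):
    """Return the index of the winning bidder in a highest-bid-wins
    single-item all-pay auction. Every bidder forfeits their bid;
    ties are broken uniformly at random among the highest bidders."""
    top = max(efforts)
    tied = [i for i, e in enumerate(efforts) if e == top]
    return rng.choice(tied)

# Three bug detectors exert (hypothetical) effort on the same piece of code.
efforts = [0.4, 0.9, 0.9]
winner = all_pay_auction_winner(efforts)
# All three forfeit their effort; only the winner receives the reward.
```

Note that, unlike a first-price auction, losers here pay their bids too, which is what makes exerted debugging effort (rather than a promised payment) the natural interpretation of a bid.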

The players can be naturally partitioned into K (K ≥ 2) classes, from low skill to high skill, according to the skill class thresholds θ⃗ = (θ0, θ1, ..., θK), θ0 = 0 < θ1 < ... < θK = 1. Thus the k-th class contains the players with the k-th lowest skills, v ∈ (θk−1, θk]. Let Nk be the number of players in class k; then we must have ∑_{k=1}^{K} Nk = N.

Let R denote the reward for the first successful bug detection for any of the codes; the value of the reward is common knowledge. However, simply treating the rewards as the same fails to capture the quality differences among codes written by players with different levels of skill, since it is generally acknowledged that there are variations in individual programming performance. One reasonable assumption is that the codes produced by low-skill coders have higher probabilities of incorrectness than those produced by high-skill coders. (It is implied that the skill parameter vi stands for both the coding and the debugging skills of player i.) Based on this assumption, we furthermore endow each piece of code with a weight coefficient according to its quality, which is positively correlated with the player's skill. Specifically, the codes produced by players from the k-th class are associated with weight βk ∈ [0,1]. Intuitively, since the associated weight of a piece of code can be viewed as its probability of having bugs, we have β1 > β2 > ... > βK. The expected reward for successful bug detection for the codes produced by the k-th class of players can be calculated as ERk = βk · R. That is, for

Figure 5.1: Grouping players and contests into different classes. The skill distribution of players, in terms of skill classes and the number of players in each class, determines the number of bug detection contests in each corresponding class and the associated rewards.

K classes of codes produced by K classes of players, we have K classes of rewards, ER⃗ = (ER1, ..., ERK), where ER1 > ER2 > ... > ERK. Thus, the bug detection contests are naturally partitioned into K classes, each taking one of these values as its reward.

In Fig. 5.1, we provide an illustration of grouping contests into four classes. It is of particular importance that the skill distribution of players, in terms of skill classes and the number of players in each class, determines the number of bug detection contests in each corresponding class and the associated rewards. For example, since there are N1 players of the lowest-skill class, we have N1 lowest-quality pieces of code, which become the N1 highest-reward contests.
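The mapping from skill classes to expected rewards can be sketched directly from the definitions above. The threshold and weight values below are illustrative, and the function names are our own.

```python
from bisect import bisect_left

def skill_class(v, thresholds):
    """Return the 1-indexed skill class k such that v lies in (θ_{k-1}, θ_k]."""
    return bisect_left(thresholds, v, lo=1)

def expected_rewards(betas, reward):
    """ER_k = β_k · R for each class k (β is decreasing in skill class)."""
    return [b * reward for b in betas]

thresholds = (0, 0.25, 0.5, 0.75, 1)   # θ_0 .. θ_K, with K = 4
betas = (0.8, 0.6, 0.4, 0.2)           # β_1 > β_2 > β_3 > β_4
ers = expected_rewards(betas, reward=1.0)
# Codes by the lowest-skill class carry the highest expected reward,
# so the reward ordering is the reverse of the skill-class ordering.
```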

5.3.2 Contest Selection

Proposition 5.1. (Propositions 3.1 and 4.1 in [DiPalantino and Vojnovic, 2009]). There exists a unique symmetric equilibrium of the bug detection contest game.


Theorem 1. (Theorem 4.2 in [DiPalantino and Vojnovic, 2009]). In the equilibrium, players select codes as given in the following.

− Skill levels. Players are partitioned over L skill levels. A skill level l corresponds to the interval of skill values [v_{l+1}, v_l), where

F(v_l) = 1 − N_[1,l] · (1 − ER_l^(−1/(N−1)) · H_[1,l](ER⃗)), for l = 1, ..., L    (5.1)

where

H_[1,l](ER⃗) = ( ∑_{k=1}^{l} (N_k / N_[1,l]) · ER_k^(−1/(N−1)) )^(−1)  and  N_[1,l] = ∑_{k=1}^{l} N_k.

− Code selection vs. skill level. A player of skill v ∈ [v_{l+1}, v_l) selects a particular contest/code of class j with probability π_j(v) given by

π_j(v) = ER_j^(−1/(N−1)) / ( ∑_{k=1}^{l} N_k · ER_k^(−1/(N−1)) ), for j = 1, ..., l,
π_j(v) = 0, for j = l+1, ..., K.    (5.2)

Remark 5.1. In item 1, Eq. (5.1) shows that the intervals of the skill levels are determined by the players' skill classes and the number of players in each skill class. It is worthwhile to note that, according to Eq. (5.1), in the equilibrium two players in the same skill class could be partitioned into different skill levels, and one skill level could be constituted by players from more than one skill class. Item 2 shows that, in the equilibrium, a player of skill level l selects a contest that provides one of the l highest rewards with probability given by Eq. (5.2). Specifically, a player of skill level l selects the contests that provide the corresponding expected reward, that is, the l-th highest reward, with the highest probability, and those that provide larger expected rewards are selected with lower probabilities.

Figure 5.2: Contest selection behavior in equilibrium. Players are partitioned into 4 skill levels. A player of skill level 3 selects the contests that provide the 3rd highest expected rewards with the largest probability.

Fig. 5.2 provides an illustration of the contest selection behavior for K = 4 and L = 4. The thickness of the arrows indicates the probability that a player selects that particular contest. A player of skill level 3 will select a contest that offers one of the 3 highest expected rewards. Moreover, she selects contests that offer the 3rd highest expected reward with the largest probability, and those that offer the highest expected reward with the smallest probability.
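The selection probabilities of Eq. (5.2) can be computed directly. The sketch below is our own transcription with illustrative reward and population values; it checks only the internal consistency of the formula — the per-contest probabilities, summed over all contests a level-l player may choose, add up to one.

```python
def selection_probs(ers, counts, n_players, level):
    """Per-contest selection probabilities pi_j for a player of skill
    level `level` (Eq. (5.2)): pi_j is proportional to ER_j^(-1/(N-1))
    for j = 1..level, normalized so that sum_j N_j * pi_j = 1,
    and pi_j = 0 for j > level."""
    exp = -1.0 / (n_players - 1)
    weights = [er ** exp for er in ers]
    denom = sum(counts[k] * weights[k] for k in range(level))
    return [weights[j] / denom if j < level else 0.0
            for j in range(len(ers))]

# Two reward classes (ER_1 > ER_2), four contests each, eight players.
ers, counts, n = [0.8, 0.4], [4, 4], 8
pi = selection_probs(ers, counts, n, level=2)
# The level-2 player picks her "own" (lower-reward) class with the higher
# per-contest probability, as stated in Remark 5.1.
```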

5.4 Bug Detection Efficiency Based on Division Strategy

Aimed at identifying the key features of the division strategy that determine bug detection efficiency, we explore division strategies in two dimensions, horizontal and vertical. For the horizontal comparison, we explore the skill mixing degree, which varies from none (separating: assembling agents of the same skill class in the same division) to very high (mixing: assembling agents of various skill classes in one division). Based on the example study, we make our first conjecture as follows.

− Skill mixing degree is the most significant feature of an efficient division strategy. Specifically, bug detection efficiency is positively related to the skill mixing degree;

Although the skill mixing degree controls the trend of the bug detection efficiency, the relationship between skill mixing degree and bug detection efficiency is not a one-to-one correspondence. This inspires us to explore a feature that can differentiate division strategies in which the same skill mixing degree leads to different bug detection efficiencies. For this vertical analysis, we propose the concept of the skill similarity degree, which represents the closeness of the players' skill values. Based on the example study, we make our second conjecture as follows.

− Skill similarity degree plays a subordinate role. Specifically, in situations where the same skill mixing degree leads to different bug detection efficiencies, a high skill similarity degree indicates a high level of bug detection efficiency.

We arrange this section as follows. First, we formally construct the definitions of skill mixing degree, skill similarity degree, and bug detection efficiency. Then we conduct an example study. We first compare the bug detection efficiencies under different skill mixing degrees, which indicates that bug detection efficiency increases along with the skill mixing degree. Second, a comparison among the situations in which the same skill mixing degree leads to different bug detection efficiencies indicates that high bug detection efficiency also depends on a high skill similarity degree. Finally, a weighted average of skill mixing degree and skill similarity degree is constructed to evaluate their relative importance.


5.4.1 Division Strategy

Definition 5.1. The skill mixing degree is defined as the sum of the degrees of skill mixing of all divisions:

M = ∑_{i=1}^{I} M_i    (5.3)

where I is the number of divisions, and

M_i = ( ∏_{k∈Γ_i} N_ik ) / ( ∑_{k∈Γ_i} N_ik² )    (5.4)

where Γ_i = {k : N_ik > 0, k ∈ {1, ..., K}}, and N_ik is the number of players of the k-th skill class in division i.

Definition 5.2. The skill similarity degree is defined as the sum of the degrees of skill similarity of all divisions:

S = ∑_{i=1}^{I} S_i    (5.5)

where I is the number of divisions, and

S_i = 1 − √( (1/(N_i − 1)) ∑_{k=1}^{N_i−1} (v_k − v_{k+1})² )    (5.6)

where N_i is the number of players in division i, and the players' skills within the division are sorted in non-decreasing order, i.e., v_1 ≤ v_2 ≤ ... ≤ v_{N_i}.
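Definitions 5.1 and 5.2 can be transcribed directly into code. The sketch below (our own, for illustration) evaluates M and S for the three example division strategies used later in Table 5.1, where eight players with skills 0.1, ..., 0.8 fall into two skill classes split at 0.4.

```python
import math
from collections import Counter

def mixing_degree(division, class_of):
    """M_i of Eq. (5.4): product of per-class counts over the sum of their squares."""
    counts = Counter(class_of(v) for v in division)
    return math.prod(counts.values()) / sum(n * n for n in counts.values())

def similarity_degree(division):
    """S_i of Eq. (5.6): one minus the RMS gap between adjacent sorted skills."""
    v = sorted(division)
    gaps = [(a - b) ** 2 for a, b in zip(v[1:], v)]
    return 1 - math.sqrt(sum(gaps) / (len(v) - 1))

class_of = lambda v: 1 if v <= 0.4 else 2          # two skill classes split at 0.4
strategies = {
    "I":   [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],  # separated
    "II":  [[0.1, 0.2, 0.3, 0.7], [0.4, 0.5, 0.6, 0.8]],
    "III": [[0.1, 0.2, 0.7, 0.8], [0.3, 0.4, 0.5, 0.6]],  # fully mixed
}
M = {s: sum(mixing_degree(d, class_of) for d in divs) for s, divs in strategies.items()}
S = {s: sum(similarity_degree(d) for d in divs) for s, divs in strategies.items()}
# Reproduces the values reported in Table 5.1: M = 0.5, 0.6, 1 for
# Strategies I, II, III, and S = 1.8 and 1.6 for Strategies I and III.
```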

To clarify the efficacy of the division strategy, let us first consider the problem of the uneven distribution of debugged codes in bug detection contests without a division strategy, which reduces to the situation in which all players are assembled in a single division. When players of the highest skill class and the lowest skill class are assembled in the same division, according to Eq. (5.2), players with the highest skills will exclusively select the codes produced by the lowest-skill players, which offer the highest expected rewards. Consequently, in order to avoid competing with the players having higher skills, players with the lowest skills would probably select the codes with the highest qualities. Moreover, due to their low skills, they can hardly catch the bugs in those codes. This results in low efficiency: from the principal's point of view, although the codes produced by low-skill players are examined, those produced by high-skill players are left unchecked or checked only superficially due to the debuggers' limited skills. In addition, from the players' point of view, it leads to significant inequality and harms the benefits gained by low-skill players.

By applying the division strategy, we can mitigate the uneven distribution of the checked codes, and the inequality, by restricting the players to select codes with more appropriate levels of reward. Specifically, in our bug detection model, we divide the N players into I divisions and restrict contest selection to within the divisions, i.e., the players can only select from the codes produced by the players in the same division. By intentionally assembling players of particular skill classes in one division, the principal can control the essential features of the collection of contests in the division, in terms of expected reward classes and the scales of the skill levels, which, according to Eq. (5.1) and Eq. (5.2), determine the players' strategic code selection behaviors. For instance, when the players of the highest skill class and those of the lowest skill class are partitioned into different divisions, the former are restricted to selecting codes produced by players of higher skill classes instead of the lowest skill class.

Within a division, the expected reward class is the most significant determinant of the players' strategic code selection behaviors, and there is a one-to-one correspondence between it and the composition of the players' skill classes, in terms of the number of skill classes and the number of players in each skill class. Hence, the concept of skill mixing degree is constructed to distinguish division strategies that vary in the composition of players' skill classes, and it becomes the most important factor in our bug detection model.


Moreover, it is worthwhile to note that exchanging two players belonging to two different divisions can have different effects on the skill mixing degree and the skill similarity degree. Specifically, if the two players have similar skill values but belong to different skill classes (since the skill values of players are continuous, two players with similar skill values could be in two adjacent but different skill classes), the exchange of these two players will cause small similarity degree changes in both divisions, but a relatively large change in mixing degree. This is reasonable given the definitions in Eq. (5.4) and Eq. (5.6): the skill mixing degree concerns the numbers of players in different skill classes, whereas the skill similarity degree concerns the specific values of the players' skills. In the same way, we can deduce that if there is a large difference between the skill values of two players in two divisions who belong to different skill classes, exchanging them will cause a relatively small change in mixing degree, but a large change in similarity degree.

5.4.2 Bug Detection Efficiency

Definition 5.3. The bug detection efficiency is defined as the sum of the bug detection efficiencies of all divisions:

E = ∑_{i=1}^{I} E_i    (5.7)

where E_i is the efficiency of the i-th division, which is defined as

E_i = ∑_{n=1}^{L} ∑_{m=n}^{L} R · β_m · N_im · π_{L−m+1}(v_{L−n+1}) · ∫_{v_{L−n+2}}^{v_{L−n+1}} v dv    (5.8)

where
− R: the reward;
− β_m: the weight coefficient endowed to the m-th class of codes, representing its probability of having bugs;
− N_im: the number of players of the m-th skill class in division i;
− π_{L−m+1}(v_{L−n+1}): the probability for a player of skill level L−n+1 to select a particular code of class L−m+1;
− [v_{L−n+2}, v_{L−n+1}): the interval of skill level L−n+1.

Remark 5.2. The integrals over the skill parameter imply another reasonable assumption: that a player's coding skill is proportional to her bug detection skill. Therefore, a player's skill can be viewed as her probability of successful bug detection. Then π · v can be understood as the probability that one piece of code is selected and successfully proved to be flawed by a player with skill v, and β · N is the number of incorrect codes produced by the players in the same division. Hence, when the reward is normalized to 1, the bug detection efficiency defined above can be understood as the expected number of codes proved to be flawed over all divisions.

Given reward normalization (R = 1), the bug detection efficiency can be understood as the expected number of codes proved to be flawed. Consider a particularly simple situation in which one division contains players of two skill levels ([0, v_threshold) and [v_threshold, 1]) and codes of two expected reward classes (β_high, β_low, N_high, N_low). Players with high skills only contribute to the set of codes with high expected rewards, i.e., they select high-expected-reward codes with probability equal to 1/N_high. The expected number of incorrect codes contributed by high-skill players is calculated as

β_high · N_high · (1/N_high) ∫_{v_threshold}^{1} v dv = β_high ∫_{v_threshold}^{1} v dv.

In contrast, players of the low skill level contribute to both classes of codes, with probabilities π_high and π_low for high-expected-reward codes and low-expected-reward codes, respectively. Note that π_low > π_high, i.e., the low-expected-reward codes are their first choice. The expected number of incorrect codes contributed by low-skill players is calculated as

β_low · N_low · π_low ∫_{0}^{v_threshold} v dv + β_high · N_high · π_high ∫_{0}^{v_threshold} v dv.
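Under a uniform skill distribution, the two contributions above can be evaluated in closed form. The following sketch is our own transcription of the two-level decomposition; the β, N, and π values are illustrative inputs, not equilibrium quantities.

```python
def uniform_integral(a, b):
    """Integral of v dv from a to b, for skills uniform on [0, 1]."""
    return (b * b - a * a) / 2

def two_level_efficiency(beta_high, beta_low, n_high, n_low,
                         pi_high, pi_low, v_threshold):
    """Expected number of codes proved flawed (R = 1) in a division with
    two skill levels and two expected-reward classes of codes."""
    # High-skill players each pick a high-expected-reward code (prob 1/n_high).
    high = beta_high * n_high * (1 / n_high) * uniform_integral(v_threshold, 1)
    # Low-skill players mix over both classes, favoring the low-reward codes.
    low = (beta_low * n_low * pi_low + beta_high * n_high * pi_high) \
          * uniform_integral(0, v_threshold)
    return high + low

# Illustrative parameter values (beta_high is the weight of the
# high-expected-reward, i.e. lower-quality, class of codes).
e = two_level_efficiency(beta_high=0.8, beta_low=0.4, n_high=4, n_low=4,
                         pi_high=0.05, pi_low=0.2, v_threshold=0.5)
```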

Table 5.1: An example of bug detection efficiency increasing with the skill mixing degree of the division strategy. Each division lists its players' skills and, in parentheses, the expected reward class of the code each player produces (R1 = 0.8, R2 = 0.4).

Division initialization:
− Strategy I (separating): Division 1_1 = {0.1, 0.2, 0.3, 0.4} (all R1); Division 1_2 = {0.5, 0.6, 0.7, 0.8} (all R2).
− Strategy II: Division 2_1 = {0.1 (R1), 0.2 (R1), 0.3 (R1), 0.7 (R2)}; Division 2_2 = {0.4 (R1), 0.5 (R2), 0.6 (R2), 0.8 (R2)}.
− Strategy III (mixing): Division 3_1 = {0.1 (R1), 0.2 (R1), 0.7 (R2), 0.8 (R2)}; Division 3_2 = {0.3 (R1), 0.4 (R1), 0.5 (R2), 0.6 (R2)}.

                              Strategy I         Strategy II          Strategy III
Skill mixing degree (M)       M1 = 0.5           M2 = 0.6             M3 = 1
Skill similarity degree (S)   S1 = 1.8           S2 = 1.62            S3 = 1.6
Skill levels (L)              L1_1 = L1_2 = 1    L2_1 = L2_2 = 2      L3_1 = L3_2 = 2
Skill level thresholds        ⟨v2,v1⟩ = ⟨0,1⟩    ⟨v3,v2,v1⟩ = ⟨0,0.38,1⟩   ⟨v3,v2,v1⟩ = ⟨0,0.59,1⟩
  (per division)              ⟨v2,v1⟩ = ⟨0,1⟩    ⟨v3,v2,v1⟩ = ⟨0,0.79,1⟩   ⟨v3,v2,v1⟩ = ⟨0,0.59,1⟩
Efficiency per division (E)   E1_1 = 0.8         E2_1 = 0.68          E3_1 = 1.32
                              E1_2 = 1.04        E2_2 = 1.24          E3_2 = 0.84
Total efficiency              E1 = 1.84          E2 = 1.92            E3 = 2.16


5.4.3 Example Study: Key Factors in Efficient Division Strategy

In Table 5.1, we consider a two-division situation in which eight players, endowed with skills v1, v2, ..., v8 = 0.1, 0.2, ..., 0.8, belong to two skill classes, (0, 0.4] and (0.4, 0.8]. The two classes of codes, produced by the low-skill players and the high-skill players, are endowed, without loss of generality, with the specific expected reward values R1 = 0.8 and R2 = 0.4, respectively. We initialize the two divisions with equal numbers of players. In Division Strategy I, the two classes of players are completely separated into two divisions, which leads to the minimum skill mixing degree M1 = 0.5. In the equilibrium, players in both divisions are partitioned into only one skill level (L1_1 = L1_2 = 1). Therefore, in Division 1_1, players have equal probability of selecting among the four high-reward contests.

Consider the common case in which 1) players of the high skill level select contests with high rewards and players of the low skill level select those providing low rewards, and 2) the "congestion problem" — more than one player competing on the same piece of code — is minimized. Thus, in Division 1_1, no players are expected to compete on any of the contests, and each piece of code is debugged by a unique player. By contrast, the situation is different in Division 2_1 under Division Strategy II, where players are partitioned into two skill levels in the equilibrium. There is one player (v = 0.7) in the high skill level, who is also the only one who can select from the three high-reward contests. The other three players of the low skill level have to compete for the only contest with the low reward R2. The player with the highest skill among them (v = 0.3) wins the reward.

The horizontal comparison among the outcomes under different division strategies demonstrates our first conjecture, that bug detection efficiency is positively related to the skill mixing degree. Specifically, the skill mixing degrees of Division Strategies I, II, and III are M1 = 0.5, M2 = 0.6, M3 = 1, and the corresponding bug detection efficiencies are E1 = 1.84, E2 = 1.92, E3 = 2.16.

Table 5.2: An example of bug detection efficiency increasing with the skill similarity degree.

Division initialization:
− Strategy II_A: Division 2_A_1 = {0.1 (R1), 0.2 (R1), 0.3 (R1), 0.8 (R2)}; Division 2_A_2 = {0.4 (R1), 0.5 (R2), 0.6 (R2), 0.7 (R2)}.
− Strategy III_A: Division 3_A_1 = {0.1 (R1), 0.2 (R1), 0.5 (R2), 0.6 (R2)}; Division 3_A_2 = {0.3 (R1), 0.4 (R1), 0.7 (R2), 0.8 (R2)}.

                              Strategy II_A              Strategy III_A
Skill mixing degree (M)       M2_A = 0.6                 M3_A = 1
Skill similarity degree (S)   S2_A = 1.6                 S3_A = 1.8
Skill level thresholds        ⟨v3,v2,v1⟩ = ⟨0,0.38,1⟩    ⟨v3,v2,v1⟩ = ⟨0,0.59,1⟩
  (per division)              ⟨v3,v2,v1⟩ = ⟨0,0.79,1⟩    ⟨v3,v2,v1⟩ = ⟨0,0.59,1⟩
Efficiency per division (E)   E2_A_1 = 0.76              E3_A_1 = 0.76
                              E2_A_2 = 0.72              E3_A_2 = 1.48
Total efficiency              E2_A = 1.48                E3_A = 2.24

In Table 5.2, we re-initialize the player allocation of Division Strategy II. By exchanging the two players with skills v = 0.7 and v = 0.8, the bug detection efficiency decreases to E2_A = 1.48 (< E2 = 1.92), without any change in the skill mixing degree (M2_A = M2 = 0.6). By contrast, a change in the player allocation of Division Strategy III leads to the contrary outcome. By exchanging the high-skill-class players of the two divisions, the bug detection efficiency increases to E3_A = 2.24 (> E3 = 2.16), with the same skill mixing degree (M3_A = M3 = 1).

The one-to-many correspondence between skill mixing degree and bug detection efficiency inspires us to make the vertical comparison by checking the skill similarity degrees of division strategies with the same skill mixing degree. For Division Strategies II and II_A, with the same skill mixing degree (M2_A = M2 = 0.6), II_A, with the lower skill similarity degree (S2_A = 1.6 < S2 = 1.62), works out a lower bug detection efficiency (E2_A = 1.48 < E2 = 1.92). For Division Strategies III and III_A, with the same skill mixing degree of 1, the increase in the skill similarity degree of III_A (S3_A = 1.8 > S3 = 1.6) leads to a higher bug detection efficiency (E3_A = 2.24 > E3 = 2.16). This vertical comparison demonstrates our second conjecture: in situations where the same skill mixing degree leads to different bug detection efficiencies, a higher skill similarity degree indicates a higher level of bug detection efficiency.

Based on the above analysis, we go one step further by incorporating the relative importance of the skill mixing degree and the skill similarity degree. Using a weighted average to combine these two determinant factors allows us to consider their relative importance in the division strategy for bug detection efficiency. Fig. 5.3 visualizes the values of the weighted average of the skill mixing degrees and skill similarity degrees, (M + 0.9S)/2, in the example. By weighting the skill mixing degree and the skill similarity degree with coefficients 1 and 0.9, respectively, the weighted average is demonstrated to be consistent with the shape of the bug detection efficiency. Although 0.9 is just one instance of the weight on the skill similarity degree, one can easily find that once the coefficient given to the skill similarity degree is equal to or larger than that of the skill mixing degree, the weighted average becomes inconsistent with the bug detection efficiency. These weight values demonstrate that the skill mixing degree plays a more important role than the skill similarity degree in the bug detection efficiency.

5.5 Simulation and Analysis

In this section, we experimentally evaluate the performance of division strategies with different skill mixing degrees and skill similarity degrees. First, we randomly generate N players with different skills. Second, we divide them into I divisions in a way that initializes the division strategy with the lowest skill mixing degree. Without loss of generality, we suppose each division contains the same number of players, i.e., N/I. Based on our model, the bug detection efficiency of each division can be computed. Third, by strategically changing the allocation of players, we develop a new division strategy with a higher skill mixing degree. By repeatedly conducting the second and third steps, we can then investigate the conjectures given in the above section.
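The three steps above can be skeletonized as follows. This is our own sketch (uniform skills, I equal-sized divisions), with the mixing degree computed per Definition 5.1; the "strategic reallocation" of step three is simplified here to a full random shuffle, which maximally mixes the skill classes.

```python
import math
import random
from collections import Counter

def mixing_degree(division, thresholds):
    """M_i of Eq. (5.4), with skill classes given by the thresholds."""
    counts = Counter(sum(v > t for t in thresholds[1:-1]) + 1 for v in division)
    return math.prod(counts.values()) / sum(n * n for n in counts.values())

rng = random.Random(0)
N, I = 200, 4
thresholds = (0, 0.25, 0.5, 0.75, 1)

# Step 1: generate players. Step 2: sort into divisions (lowest mixing degree).
skills = sorted(rng.uniform(0, 1) for _ in range(N))
divisions = [skills[d * (N // I):(d + 1) * (N // I)] for d in range(I)]
m_separated = sum(mixing_degree(d, thresholds) for d in divisions)

# Step 3: raise the mixing degree by reallocating players across divisions.
shuffled = skills[:]
rng.shuffle(shuffled)
mixed = [shuffled[d * (N // I):(d + 1) * (N // I)] for d in range(I)]
m_mixed = sum(mixing_degree(d, thresholds) for d in mixed)
# The shuffled allocation spreads all four classes through every division,
# so its mixing degree is far higher than that of the sorted allocation.
```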

5.5.1 Setting

We explore the above two features in three environments.

Linear skill distribution function. Supposing the skills of players are uniformly distributed on the unit interval [0,1], we assume the skill distribution function is linear, i.e.,

Flinear(v) = v, v ∈ [0,1].

Concave skill distribution function. In situations where the contests attract many players with low skills rather than high skills — for example, a relatively new crowdsourcing platform, or the early stages of multi-stage sequential-elimination contests [Fu and Lu, 2012] — we assume the skill distribution function is concave; concretely, we use a normal distribution with mean µ and variance σ² > 0, i.e.,

Fconcave(v) = Φ((v−µ)/σ),

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt is the standard normal cumulative distribution function.

Convex skill distribution function. By contrast, in situations where the contest attracts more high-skill players than low-skill players — for example, crowdsourcing contests with a minimum skill requirement — we assume the skill distribution function is convex; without loss of generality, it can be the following power function:

Fconvex(v) = v^α, α > 1.
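The three environments can be sampled directly: the linear case is uniform, the concave case draws from a normal distribution (clipped to the unit interval), and the convex case uses inverse-CDF sampling of F(v) = v^α. The µ, σ, and α values below are illustrative assumptions, not the thesis's exact settings.

```python
import random

def sample_linear():
    # F(v) = v on [0,1]: uniformly distributed skills.
    return random.random()

def sample_concave(mu=0.28, sigma=0.15):
    # F(v) = Phi((v - mu)/sigma): normal skills, clipped to [0, 1].
    # mu and sigma are illustrative values only.
    return min(1.0, max(0.0, random.gauss(mu, sigma)))

def sample_convex(alpha=3.0):
    # F(v) = v**alpha, alpha > 1: inverse-CDF sampling, v = U**(1/alpha).
    return random.random() ** (1.0 / alpha)

random.seed(0)
concave = [sample_concave() for _ in range(10000)]
convex = [sample_convex() for _ in range(10000)]
# Low-skill-heavy (concave) versus high-skill-heavy (convex) populations.
assert sum(concave) / len(concave) < 0.5 < sum(convex) / len(convex)
```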


Table 5.3: Summary of parameters

    Parameter                     Symbol   Value
    No. of Players                N        200
    Skill of Players              v        (0,1]
    No. of Skill Classes          K        4
    No. of Divisions              I        4
    Thresholds of Skill Classes   ~θ       (0, 0.25, 0.5, 0.75, 1)
    Weight Coefficients           ~β       (0.8, 0.6, 0.4, 0.2)

Parameters used in the simulation are presented in Table 5.3. 200 players of 4 skill classes are equally divided into 4 divisions. As the weight coefficients show, we do not assume that high-quality codes have no bugs, nor that all low-quality codes are incorrect. We normalize the reward to 1, and we also normalize the simulation results, including bug detection efficiency, skill mixing degree and skill similarity degree, in order to compare the effects of different division strategies.
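A minimal sketch of how the parameters in Table 5.3 map a skill value v ∈ (0,1] to one of the K = 4 skill classes via the thresholds ~θ (the helper name is ours, for illustration):

```python
# Thresholds and per-class weight coefficients from Table 5.3.
thresholds = (0, 0.25, 0.5, 0.75, 1)  # boundaries of the K = 4 skill classes
weights = (0.8, 0.6, 0.4, 0.2)        # one weight coefficient per class

def skill_class(v):
    """Return the class index k in 1..4 such that
    thresholds[k-1] < v <= thresholds[k]."""
    for k in range(1, len(thresholds)):
        if v <= thresholds[k]:
            return k
    return len(thresholds) - 1

assert skill_class(0.1) == 1
assert skill_class(0.3) == 2
assert skill_class(0.9) == 4
```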

5.5.2 Results and Discussion

The bug detection efficiency of division strategies with different skill mixing degrees is investigated under three kinds of skill distributions. Different from the common case illustrated in the example in Section 5.4, where players in skill level l are only allowed to select contests providing the l-th class of rewards, we conducted the simulation in a less controlled setting. In the simulation, players are supposed to strategically select contests with the probabilities given by Eq. (5). That is, a player of skill level l can randomly select a contest that provides one of the l highest rewards. Specifically, the player with skill level l selects the contest that provides the corresponding class reward, that is, the l-th highest reward, with the highest probability, and the contests that provide larger expected rewards are selected with lower probabilities.
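This selection rule can be sketched as weighted random choice over the l highest-reward classes. The probability vectors below are hypothetical placeholders; in the model, the actual probabilities come from Eq. (5).

```python
import random

def select_contest(l, probs_by_level):
    """A player of skill level l picks among the l highest-reward classes
    (class 1 = highest reward). probs_by_level[l] is a probability vector
    over classes 1..l, with the largest mass on the player's own class l."""
    classes = list(range(1, l + 1))
    return random.choices(classes, weights=probs_by_level[l])[0]

# Hypothetical probabilities: mass decreases toward higher-reward classes.
probs_by_level = {
    1: [1.0],
    2: [0.3, 0.7],
    3: [0.1, 0.3, 0.6],
    4: [0.05, 0.15, 0.3, 0.5],
}

random.seed(1)
picks = [select_contest(3, probs_by_level) for _ in range(5000)]
# A level-3 player selects its own reward class most often.
assert picks.count(3) > picks.count(2) > picks.count(1)
```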

    Division   Skill Mixing   Skill Similarity   Bug Detection    Weighted
    Strategy   Degree (M)     Degree (S)         Efficiency (E)   Average
    I          0.50           1.00               0.82             0.70
    II         0.60           0.90               0.85             0.71
    II_A       0.60           0.88               0.66             0.69
    III        1.00           0.88               0.96             0.89
    III_A      1.00           1.00               1.00             0.95

Figure 5.3: Relative importance of the two key factors of an efficient division strategy. The shape of the weighted average ((M+0.9S)/2) over skill mixing degree and skill similarity degree is consistent with that of the bug detection efficiency (E).

First, we can easily confirm the conjecture that the bug detection efficiency increases with the skill mixing degree for all three types of skill distributions. In Fig. 5.4, (a.1) presents the linear skill distribution with mean skill value 0.5, and (a.2) shows the mean efficiency of bug detection versus the skill mixing degree. The linear regression on the bug detection efficiencies indicates an increasing trend throughout the whole interval of skill mixing degree. Fig. 5.5 (a.2) and Fig. 5.6 (a.2) lead to the same conclusion under both the concave and the convex skill distributions. It is noticeable that the increase in bug detection efficiency under the concave skill distribution is particularly

– 111 –

Page 130: Designing Incentive for Cooperative Problem …ai.soc.i.kyoto-u.ac.jp/publications/thesis/PHD_H26_huan...Amit Pariyar, Kemas Muslim Lhaksmana, Xin Zhou, Trang Mai Xuan, Shinsuke Goto,

[Figure 5.4 comprises six panels: (a.1) Linear skill distribution. (a.2) Mean efficiency versus the skill mixing degree. (b.1) The mean efficiency in the common case. (b.2) Mean skill similarity degree versus skill mixing degree. (c.1) Efficiency versus skill similarity degree in the common case. (c.2) Efficiency versus skill similarity degree in the simulation. Minimum skill: 0.0050; maximum skill: 0.9950; mean: 0.5000.]

Figure 5.4: Linear skill distribution. Bug detection efficiency increases with the skill mixing degree, and is shaped by the skill similarity degree.


[Figure 5.5 comprises six panels: (a.1) Concave skill distribution. (a.2) Mean efficiency versus the skill mixing degree. (b.1) The mean efficiency in the common case. (b.2) Mean skill similarity degree versus skill mixing degree. (c.1) Efficiency versus skill similarity degree in the common case. (c.2) Efficiency versus skill similarity degree in the simulation. Minimum skill: 0.0089; maximum skill: 0.9323; mean: 0.2782.]

Figure 5.5: Concave skill distribution. Bug detection efficiency increases with the skill mixing degree, and is shaped by the skill similarity degree.


[Figure 5.6 comprises six panels: (a.1) Convex skill distribution. (a.2) Mean efficiency versus the skill mixing degree. (b.1) The mean efficiency in the common case. (b.2) Mean skill similarity degree versus skill mixing degree. (c.1) Efficiency versus skill similarity degree in the common case. (c.2) Efficiency versus skill similarity degree in the simulation. Minimum skill: 0.2161; maximum skill: 0.9783; mean: 0.7198.]

Figure 5.6: Convex skill distribution. Bug detection efficiency increases with the skill mixing degree, and is shaped by the skill similarity degree.


remarkable compared with the linear and convex skill distributions. Specifically, for the concave skill distribution, Fig. 5.5 (a.2) shows that the increase in efficiency is about 0.2 (from 0.6 to 0.8), which is larger than those for the linear skill distribution (Fig. 5.4 (a.2)) and the convex skill distribution (Fig. 5.6 (a.2)), wherein the increases in efficiency are about 0.1 (from 0.8 to 0.9 and from 0.6 to 0.7, respectively). The reason is easy to understand: for the concave skill distribution, with a low mean skill equal to 0.2782, the number of potential bugs in all codes is higher than under both the linear and convex skill distributions. In Fig. 5.4 (b.1), Fig. 5.5 (b.1) and Fig. 5.6 (b.1) we investigate the mean efficiency versus the skill mixing degree in the common case. The conjecture that bug detection efficiency is positively related to the skill mixing degree also holds under all three types of skill distributions in this case.

Secondly, we illustrate the mean skill similarity degree versus the skill mixing degree in (b.2). As we initialize the divisions with the lowest skill mixing degree by allocating the players of the same skill class to the same division, the skill similarity degree is high at first. Then, by exchanging a small number of players of different skill classes among divisions, the skill similarity degree decreases dramatically along with a slight increase in skill mixing degree. As the exchange process continues, the numbers of players in different skill classes become closer, which makes the mean skill similarity degree rebound gradually. Compared with the skill similarity degree, the corresponding bug detection efficiency in the common case develops in a consistent way. This consistency confirms our conjecture that, in situations where the same skill mixing degree leads to different bug detection efficiencies, a high skill similarity degree indicates a high level of bug detection efficiency.

Furthermore, we present the bug detection efficiency versus skill similarity degree under the same skill mixing degree (simply choosing the median value of efficiency for illustration) in Fig. 5.4 (c.1)(c.2), Fig. 5.5 (c.1)(c.2) and Fig. 5.6 (c.1)(c.2). They show that in the common case, as we conjectured, the efficiency increases with the skill similarity degree. However, this conjecture is unlikely to hold in the simulation, which may


be due to the random selection of contests. Finally, contrary to the bug detection efficiency, the linear regression lines of the mean skill similarity degree versus skill mixing degree (Fig. 5.4 (b.2), Fig. 5.5 (b.2) and Fig. 5.6 (b.2)) show that the skill similarity degree decreases as the skill mixing degree increases. However, this does not reverse the increasing trend of bug detection efficiency, which indicates that the skill mixing degree controls the trend, while the skill similarity degree indicates the shape.

5.6 Summary

In this study we have presented and analyzed a crowdsourcing-based bug detection model in which strategic players select code and compete in bug detection contests. Our focus is on addressing the low efficiency problem in bug detection. Our study shows that by intentionally assembling players with a particular skill distribution in one division, and limiting the code selection and bug detection contests to within the division, the division strategy can, to some extent, control the essential features of the contests, in terms of expected reward classes and the scales of skill levels, which determine the players' strategic behaviors in code selection. By exploring the key factors of division strategy, we conclude that the skill mixing degree, serving as the determinant factor, controls the trend of the bug detection efficiency; specifically, a high degree of skill mixing leads to a high level of bug detection efficiency, and the skill similarity degree plays an important role in indicating the shape of the bug detection efficiency.

Nonetheless, we have noted that although the skill distribution, in the form of skill mixing degree and skill similarity degree, is the main determinant of players' strategic behavior in code selection, it is not the only factor that can influence the bug detection efficiency. Future work could consider the impact of the size and the number of the divisions, and how they might affect the bug detection efficiency. In addition, one limitation on the assignment of codes' rewards in the proposed model could also be addressed in future research. Aiming at finding as many incorrect codes as possible, we construct expected rewards from the player's point of view to differentiate the qualities of codes, and apply the relationship between reward and the player's strategic behavior to encourage more appropriate code selection behavior. However, for other situations wherein the principal requires a guarantee of high quality for codes produced by high-skill agents, the rewards assigned to the codes produced by high-skill players should be higher than those provided by low-skill players.


Chapter 6

Conclusion

6.1 Contributions

In this thesis, we have proposed incentive approaches that motivate workers who cooperatively solve complex tasks in a crowdsourcing environment to come up with high-quality solutions. The main contributions of this thesis are summarized as follows:

1. Efficient task decomposition strategy design for complex tasks.

We have formally presented and analyzed vertical and horizontal task decomposition models, which respectively specify the relationship between subtask quality and the worker's effort level in the presence of positive dependence and no dependence among subtasks. Our focus is on the efficiency of task decomposition when the workers are paid equally and when they are paid based on contribution. We conducted simulations and experiments on Amazon Mechanical Turk to analyze and compare the efficiency of the two task decompositions, aiming at generating explicit instructions on strategies for optimal task decomposition. We conclude that when the final quality is defined as the sum of the qualities of the solutions to all decomposed subtasks, the vertical task decomposition strategy outperforms the horizontal one in improving the quality of the final solution. We also give explicit instructions on the optimal strategy under the vertical task decomposition situation to help the task requester achieve the highest quality of the final solution.

2. MDP-Based reward design for efficient cooperative problem solving.

We address the challenge of allocating a limited budget to multiple subtasks that are sequentially interdependent. Our first contribution is a general model of such crowdsourcing subtasks that specifies the relationship between subtask quality and the worker's effort level, and how sequential subtasks explicitly depend on each other. We then pose the problem of maximizing the quality of the final solution with a limited budget. A key challenge in this problem is to allocate the limited budget among the sequential subtasks. This is particularly difficult in crowdsourcing marketplaces where the task requester posts subtasks with their payments sequentially and pays the predetermined payment to the worker once the subtask is finished. By modeling this problem as a Markov Decision Process (MDP), which captures the task-solving process's uncertainty about the real abilities of a succession of workers, that is, the uncertainty about the qualities of the sequential subtasks, we show the value of MDPs for optimizing the quality of the final solution by dynamically determining the budget allocation over sequentially dependent subtasks under budget constraints and uncertainty about the workers' abilities. Our simulation-based evaluation verifies that, compared with several benchmark approaches, our MDP-based payment planning is more efficient at optimizing the final quality under the same budget constraints.
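The backward-induction idea behind such a planner can be sketched in a few lines. The worker-response model below (quality gain saturating in payment) is a toy assumption for illustration, not the thesis's actual quality function; the state is (subtask index, remaining budget) and the action is the payment for the current subtask.

```python
from functools import lru_cache

T = 3                 # number of sequential subtasks
BUDGET = 6            # total budget, in payment units
PAYMENTS = (1, 2, 3)  # allowed per-subtask payments

def expected_gain(payment):
    # Placeholder worker-response model: diminishing returns in payment.
    return payment / (payment + 1)

@lru_cache(maxsize=None)
def value(t, budget):
    """Maximal expected final quality from subtask t onward with the
    given remaining budget (Bellman backup over feasible payments)."""
    if t == T or budget <= 0:
        return 0.0
    best = 0.0
    for p in PAYMENTS:
        if p <= budget:
            best = max(best, expected_gain(p) + value(t + 1, budget - p))
    return best

# With a concave gain, spending evenly (2 per subtask) is optimal here:
# 3 * (2/3) = 2.0.
assert abs(value(0, BUDGET) - 2.0) < 1e-9
```

In the thesis's setting the transition would additionally be stochastic in the worker's ability, so the backup averages over observed subtask qualities rather than taking a deterministic gain.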

3. A division strategy for quality control.

For this issue we have presented and analyzed a crowdsourcing-based bug detection model in which strategic players select code and compete in bug detection contests. Our focus is on addressing the low efficiency problem in bug detection. This bug detection contest can be viewed as a particular mechanism for redundant solution examination, and is readily extended to the cooperative task-solving process we formalized in the first two issues.


Our study shows that by intentionally assembling players with a particular skill distribution in one division and limiting the code selection and bug detection contests to within the division, the division strategy can, to some extent, control the essential features of the contests, in terms of expected reward classes and the scales of skill levels, which determine the players' strategic behaviors in code selection. By exploring the key factors of the division strategy, we conclude that the skill mixing degree, serving as the determinant factor, controls the trend of the bug detection efficiency; specifically, a high degree of skill mixing leads to a high level of bug detection efficiency, and the skill similarity degree plays an important role in indicating the shape of the bug detection efficiency.

6.2 Future Directions

Our future research directions for incentive design in cooperative problem solving in crowdsourcing are listed below.

− Vertical and horizontal task decomposition strategy combination.

In this thesis we mainly concentrated on modeling and analyzing vertical and horizontal task decomposition independently. Although, in terms of quality improvement, the vertical task decomposition strategy is superior to the horizontal one, the latter is comparatively easier to conduct on a crowdsourcing platform, and is still powerful in crowdsourcing-based problem solving. It is necessary to combine these two strategies in workflow design for complex problem solving. Empirical studies exist on this issue (e.g., [Kulkarni et al., 2011]); however, absent explicit guidelines, it is difficult for task requesters to decide how to take full advantage of the two task decompositions when they work in coordination.

− Budget-constrained crowdsourcing.

In this thesis we considered how to allocate a limited budget to complex workflows that interleave heterogeneous micro-tasks (e.g., Find-Fix-Verify).


In our process, the entire budget is supposed to be spent. In future work, it is valuable to consider how to balance the final quality and the amount of budget, that is, how to make the process cheaper while still achieving a threshold of quality. The MDP-based budget allocation algorithm proposed in this thesis has the potential to deal with this new research problem.


Bibliography

[Abbassi and Misra, 2011] Abbassi, Z. and Misra, V. (2011). Multi-level revenue sharing for viral marketing. In Proceedings of ACM NetEcon.

[Adamic et al., 2008] Adamic, L. A., Zhang, J., Bakshy, E., and Ackerman, M. S. (2008). Knowledge sharing and Yahoo Answers: everyone knows something. In Proceedings of the 17th International Conference on World Wide Web, pages 665–674.

[Aniket Kittur and Kraut, 2011] Kittur, A., Smus, B., Khamkar, S., and Kraut, R. E. (2011). Crowdforge: crowdsourcing complex work. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pages 16–19.

[Archak, 2010] Archak, N. (2010). Money, glory and cheap talk: analyzing strategic behavior of contestants in simultaneous crowdsourcing contests on topcoder.com. In Proceedings of the 19th International Conference on World Wide Web, pages 21–30.

[Archak and Sundararajan, 2009] Archak, N. and Sundararajan, A. (2009). Optimal design of crowdsourcing contests. ICIS 2009 Proceedings, page 200.

[Auer et al., 2002] Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256.

[Azaria et al., 2012] Azaria, A., Aumann, Y., and Kraus, S. (2012). Automated strategies for determining rewards for human work. In Proceedings of the 26th AAAI Conference on Artificial Intelligence.

[Barabasi and Albert, 1999] Barabasi, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439):509–512.

[Barua et al., 1995] Barua, A., Lee, C.-H. S., and Whinston, A. B. (1995). Incentives and computing systems for team-based organizations. Organization Science, 6(4):487–504.

[Baye et al., 1996] Baye, M. R., Kovenock, D., and De Vries, C. G. (1996). The all-pay auction with complete information. Economic Theory, 8(2):291–305.

[Bellman, 1956] Bellman, R. (1956). Dynamic programming and Lagrange multipliers. In Proceedings of the National Academy of Sciences of the United States of America, volume 42, page 767.

[Bernstein et al., 2010] Bernstein, M. S., Little, G., Miller, R. C., Hartmann, B., Ackerman, M. S., Karger, D. R., Crowell, D., and Panovich, K. (2010). Soylent: a word processor with a crowd inside. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, pages 313–322.

[Bertsekas et al., 1995] Bertsekas, D. P. (1995). Dynamic Programming and Optimal Control, volume 1. Belmont, MA: Athena Scientific.

[Brabham, 2008] Brabham, D. C. (2008). Crowdsourcing as a model for problem solving: an introduction and cases. Convergence: The International Journal of Research into New Media Technologies, 14(1):75–90.

[Brabham, 2010] Brabham, D. C. (2010). Moving the crowd at Threadless: motivations for participation in a crowdsourcing application. Information, Communication & Society, 13(8):1122–1145.

[Cason et al., 2010] Cason, T. N., Masters, W. A., and Sheremeta, R. M. (2010). Entry into winner-take-all and proportional-prize contests: an experimental study. Journal of Public Economics, 94(9):604–611.

[Cavallo and Jain, 2012] Cavallo, R. and Jain, S. (2012). Efficient crowdsourcing contests. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), pages 677–686.

[Chawla et al., 2012] Chawla, S., Hartline, J. D., and Sivan, B. (2012). Optimal crowdsourcing contests. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 856–868.

[Chen et al., 2011] Chen, J. J., Menezes, N. J., Bradley, A. D., and North, T. (2011). Opportunities for crowdsourcing research on Amazon Mechanical Turk. Interfaces, 5(3).

[Chen et al., 2010] Chen, Y., Ho, T.-H., and Kim, Y.-M. (2010). Knowledge market design: A field experiment at Google Answers. Journal of Public Economic Theory, 12(4):641–664.

[Cheng and Han, 2013] Cheng, M. L. and Han, Y. (2013). A modified Cobb–Douglas production function model and its application. IMA Journal of Management Mathematics, pages 353–365.

[Clark and Riis, 1998a] Clark, D. J. and Riis, C. (1998a). Competition overmore than one prize. The American Economic Review, 88(1):276–289.

[Clark and Riis, 1998b] Clark, D. J. and Riis, C. (1998b). Influence and the discretionary allocation of several prizes. European Journal of Political Economy, 14(4):605–625.

[Dai et al., 2013] Dai, P., Lin, C. H., Weld, D. S., et al. (2013). POMDP-based control of workflows for crowdsourcing. Artificial Intelligence, 202:52–85.

[Dean and Image, 2008] Dean, C. and Image, E. (2008). If you have a problem, ask everyone. New York Times, 22.

[DiPalantino and Vojnovic, 2009] DiPalantino, D. and Vojnovic, M. (2009). Crowdsourcing and all-pay auctions. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 119–128.

[Doan et al., 2011] Doan, A., Ramakrishnan, R., and Halevy, A. Y. (2011). Crowdsourcing systems on the world-wide web. Communications of the ACM, 54(4):86–96.

[Faradani et al., 2011] Faradani, S., Hartmann, B., and Ipeirotis, P. G. (2011). What's the right price? Pricing tasks for finishing on time. In Proceedings of HCOMP11: The 3rd Workshop on Human Computation.

[Fu and Lu, 2009] Fu, Q. and Lu, J. (2009). The beauty of "bigness": On optimal design of multi-winner contests. Games and Economic Behavior, 66(1):146–161.

[Fu and Lu, 2012] Fu, Q. and Lu, J. (2012). The optimal multi-stage contest. Economic Theory, 51(2):351–382.

[Fuxman et al., 2008] Fuxman, A., Tsaparas, P., Achan, K., and Agrawal, R. (2008). Using the wisdom of the crowds for keyword generation. In Proceedings of the 17th International Conference on World Wide Web, pages 61–70.

[Gelly et al., 2006] Gelly, S., Wang, Y., Munos, R., and Teytaud, O. (2006). Modification of UCT with patterns in Monte-Carlo Go.

[Ghosh and McAfee, 2012] Ghosh, A. and McAfee, P. (2012). Crowdsourcing with endogenous entry. In Proceedings of the 21st International Conference on World Wide Web (WWW'12), pages 999–1008.

[Greg Little and Miller, 2010] Little, G., Chilton, L. B., Goldman, M., and Miller, R. C. (2010). Exploring iterative and parallel human computation processes. In Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 68–76.

[Harper et al., 2009] Harper, F. M., Moy, D., and Konstan, J. A. (2009). Facts or friends? Distinguishing informational and conversational questions in social Q&A sites. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 759–768.

[Harper et al., 2008] Harper, F. M., Raban, D., Rafaeli, S., and Konstan, J. A. (2008). Predictors of answer quality in online Q&A sites. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 865–874.


[Hars and Ou, 2002] Hars, A. and Ou, S. (2002). Working for free? Motivations for participating in open-source projects. International Journal of Electronic Commerce, pages 25–39.

[Haythornthwaite, 2009] Haythornthwaite, C. (2009). Crowds and communities: Light and heavyweight models of peer production. In The 42nd Hawaii International Conference on System Sciences (HICSS'09), pages 1–10.

[Ho and Vaughan, 2012] Ho, C.-J. and Vaughan, J. W. (2012). Online task assignment in crowdsourcing markets. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2012), pages 45–51.

[Holmstrom and Milgrom, 1991] Holmstrom, B. and Milgrom, P. (1991). Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, & Organization, 7:24–52.

[Houser and Wooders, 2006] Houser, D. and Wooders, J. (2006). Reputation in auctions: Theory, and evidence from eBay. Journal of Economics & Management Strategy, 15(2):353–369.

[Howe, 2006] Howe, J. (2006). The rise of crowdsourcing. Wired Magazine, 14(6):1–4.

[Howe, 2009] Howe, J. (2009). Crowdsourcing: Why the power of the crowd is driving the future of business. Three Rivers Press.

[Hurwicz, 1960] Hurwicz, L. (1960). Optimality and informational efficiency in resource allocation processes. Stanford University Press.

[Hurwicz, 1972] Hurwicz, L. (1972). On informationally decentralized systems. Decision and Organization, pages 297–336.

[Ipeirotis, 2010] Ipeirotis, P. G. (2010). Analyzing the Amazon Mechanical Turk marketplace. XRDS: Crossroads, The ACM Magazine for Students, 17(2):16–21.

[Jain et al., 2009] Jain, S., Chen, Y., and Parkes, D. C. (2009). Designingincentives for online question and answer forums. In Proceedings of the

– 127 –

Page 146: Designing Incentive for Cooperative Problem …ai.soc.i.kyoto-u.ac.jp/publications/thesis/PHD_H26_huan...Amit Pariyar, Kemas Muslim Lhaksmana, Xin Zhou, Trang Mai Xuan, Shinsuke Goto,

10th ACM Conference on Electronic Commerce (EC’09), pages 129–138.[Jain and Parkes, 2009] Jain, S. and Parkes, D. C. (2009). The role of

game theory in human computation systems. In Proceedings of the ACMSIGKDD Workshop on Human Computation, pages 58–61.

[Jin and Fan, 2011] Jin, Y. and Fan, J. (2011). Test for english majors (tem)in china. Language Testing, 28(4):589–596.

[Jøsang et al., 2007] Jøsang, A., Ismail, R., and Boyd, C. (2007). A surveyof trust and reputation systems for online service provision. DecisionSupport Systems, 43(2):618–644.

[Jurca and Faltings, 2007] Jurca, R. and Faltings, B. (2007). Collusion-resistant, incentive-compatible feedback payments. In Proceedings of the 8th ACM Conference on Electronic Commerce (EC’07), pages 200–209.

[Kaplan and Sela, 2010] Kaplan, T. R. and Sela, A. (2010). Effective contests. Economics Letters, 106(1):38–41.

[Karger et al., 2014] Karger, D. R., Oh, S., and Shah, D. (2014). Budget-optimal task allocation for reliable crowdsourcing systems. Operations Research, 62(1):1–24.

[Kittur, 2010] Kittur, A. (2010). Crowdsourcing, collaboration and creativity. ACM Crossroads, 17(2):22–26.

[Kittur et al., 2008] Kittur, A., Chi, E. H., and Suh, B. (2008). Crowdsourcing user studies with mechanical turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 453–456.

[Kittur et al., 2011] Kittur, A., Smus, B., Khamkar, S., and Kraut, R. E. (2011). Crowdforge: Crowdsourcing complex work. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pages 43–52.

[Kocsis and Szepesvari, 2006] Kocsis, L. and Szepesvari, C. (2006). Bandit based monte-carlo planning. In Machine Learning: ECML 2006, pages 282–293.

[Kolobov, 2012] Kolobov, A. (2012). Planning with markov decision processes: an ai perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1):1–210.

[Kuhn, 1955] Kuhn, H. W. (1955). The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83–97.

[Kulkarni et al., 2012] Kulkarni, A., Can, M., and Hartmann, B. (2012). Collaboratively crowdsourcing workflows with turkomatic. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, pages 1003–1012.

[Kulkarni et al., 2011] Kulkarni, A. P., Can, M., and Hartmann, B. (2011). Turkomatic: automatic recursive task and workflow design for mechanical turk. In CHI’11 Extended Abstracts on Human Factors in Computing Systems, pages 2053–2058.

[Lakhani et al., 2010] Lakhani, K., Garvin, D., and Lonstein, E. (2010). Topcoder (a): Developing software through crowdsourcing. Harvard Business School General Management Unit Case No. 610-032.

[Lakhani et al., 2007] Lakhani, K. R., Jeppesen, L. B., Lohse, P. A., and Panetta, J. A. (2007). The value of openness in scientific problem solving. HBS Working Paper Number: 07-050, Harvard Business School.

[Little et al., 2009] Little, G., Chilton, L. B., Goldman, M., and Miller, R. C. (2009). Turkit: tools for iterative tasks on mechanical turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 29–30.

[Luca, 2011] Luca, M. (2011). Reviews, reputation, and revenue: The case of yelp.com. Technical report, Harvard Business School.

[Malone and Crowston, 1994] Malone, T. W. and Crowston, K. (1994). The interdisciplinary study of coordination. ACM Computing Surveys (CSUR), 26(1):87–119.

[Malone et al., 2009] Malone, T. W., Laubacher, R., and Dellarocas, C. (2009). Harnessing crowds: mapping the genome of collective intelligence. In MIT Sloan School of Management Working Paper No. 4732-09.


[Mamykina et al., 2011] Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., and Hartmann, B. (2011). Design lessons from the fastest q&a site in the west. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’11), pages 2857–2866.

[Mason and Watts, 2010] Mason, W. and Watts, D. J. (2010). Financial incentives and the “performance of crowds”. ACM SIGKDD Explorations Newsletter, 11(2):100–108.

[Minor, 2011] Minor, D. B. (2011). Increasing effort through softening incentives in contests. Technical report, Working Paper, University of California, Berkeley.

[Moldovanu and Sela, 2001] Moldovanu, B. and Sela, A. (2001). The optimal allocation of prizes in contests. American Economic Review, pages 542–558.

[Moldovanu and Sela, 2006] Moldovanu, B. and Sela, A. (2006). Contest architecture. Journal of Economic Theory, 126(1):70–96.

[Nam et al., 2009] Nam, K. K., Ackerman, M. S., and Adamic, L. A. (2009). Questions in, knowledge in?: a study of naver’s question answering community. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 779–788.

[Nielsen, 2006] Nielsen, J. (2006). Participation inequality: Encouraging more users to contribute. Jakob Nielsen’s alertbox, http://www.useit.com/alertbox/participation_inequality.

[Noronha et al., 2011] Noronha, J., Hysen, E., Zhang, H., and Gajos, K. Z. (2011). Platemate: crowdsourcing nutritional analysis from food photographs. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pages 1–12.

[Puterman, 2009] Puterman, M. L. (2009). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.

[Resnick et al., 2006] Resnick, P., Zeckhauser, R., Swanson, J., and Lockwood, K. (2006). The value of reputation on ebay: A controlled experiment. Experimental Economics, 9(2):79–101.

[Sattinger, 1993] Sattinger, M. (1993). Assignment models of the distribution of earnings. Journal of Economic Literature, pages 831–880.

[Stewart et al., 2010] Stewart, O., Lubensky, D., and Huerta, J. M. (2010). Crowdsourcing participation inequality: a scout model for the enterprise domain. In Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 30–33.

[Surowiecki, 2004] Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies and nations.

[Surowiecki, 2005] Surowiecki, J. (2005). The wisdom of crowds. Random House LLC.

[Sylvan, 2010] Sylvan, E. (2010). Predicting influence in an online community of creators. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1913–1916.

[Szymanski and Valletti, 2005] Szymanski, S. and Valletti, T. M. (2005). Incentive effects of second prizes. European Journal of Political Economy, 21(2):467–481.

[Tausczik and Pennebaker, 2011] Tausczik, Y. R. and Pennebaker, J. W. (2011). Predicting the perceived quality of online mathematics contributions from users’ reputations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1885–1888.

[Tran-Thanh et al., 2014a] Tran-Thanh, L., Huynh, T. D., Rosenfeld, A., Ramchurn, S. D., and Jennings, N. R. (2014a). Budgetfix: budget limited crowdsourcing for interdependent task allocation with quality guarantees. In Proceedings of the 2014 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), pages 477–484.

[Tran-Thanh et al., 2014b] Tran-Thanh, L., Stein, S., Rogers, A., and Jennings, N. R. (2014b). Efficient crowdsourcing of unknown experts using bounded multi-armed bandits. In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pages 768–773.

[Tran-Thanh et al., 2013] Tran-Thanh, L., Venanzi, M., Rogers, A., and Jennings, N. R. (2013). Efficient budget allocation with accuracy guarantees for crowdsourcing classification tasks. In Proceedings of the 2013 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2013), pages 901–908.

[Von Ahn and Dabbish, 2004] Von Ahn, L. and Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 319–326.

[Wageman and Baker, 1997] Wageman, R. and Baker, G. (1997). Incentives and cooperation: the joint effects of task and reward interdependence on group performance. Journal of Organizational Behavior, 18:139–158.

[Wolpert and Tumer, 1999] Wolpert, D. H. and Tumer, K. (1999). An introduction to collective intelligence. Technical report, NASA-ARC-IC-99-63, NASA Ames Research Center.

[Yan and Van Roy, 2008] Yan, X. and Van Roy, B. (2008). Reputation markets. In Proceedings of the 3rd International Workshop on Economics of Networked Systems, pages 79–84.

[Yang et al., 2008a] Yang, J., Adamic, L. A., and Ackerman, M. S. (2008a). Competing to share expertise: The taskcn knowledge sharing community. In Proceedings of the International Conference on Weblogs and Social Media.

[Yang et al., 2008b] Yang, J., Adamic, L. A., and Ackerman, M. S. (2008b). Crowdsourcing and knowledge sharing: strategic user behavior on taskcn. In Proceedings of the 9th ACM conference on Electronic commerce (EC’08), pages 246–255.

[Zhang et al., 2009] Zhang, H., Chen, Y., and Parkes, D. C. (2009). A general approach to environment design with one agent. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), pages 2002–2014.

[Zhang et al., 2007] Zhang, J., Ackerman, M. S., and Adamic, L. (2007). Expertise networks in online communities: structure and algorithms. In Proceedings of the 16th International Conference on World Wide Web, pages 221–230.

[Zhang and Van der Schaar, 2012] Zhang, Y. and Van der Schaar, M. (2012). Reputation-based incentive protocols in crowdsourcing applications. In INFOCOM, 2012 Proceedings IEEE, pages 2140–2148.

[Zheng et al., 2011] Zheng, H., Li, D., and Hou, W. (2011). Task design, motivation, and participation in crowdsourcing contests. International Journal of Electronic Commerce, 15(4):57–88.


Publications

Journals

1. Huan Jiang and Shigeo Matsubara, “A Division Strategy for Achieving Efficient Crowdsourcing Contest”, Journal of Information Processing, pp. 202-209, Vol. 22-2, 2014.

International Conference

1. Huan Jiang and Shigeo Matsubara, “Efficient Task Decomposition in Crowdsourcing”, In PRIMA 2014: Principles and Practice of Multi-Agent Systems, pp. 65-73. Springer International Publishing, 2014.

2. Huan Jiang and Shigeo Matsubara, “Improving Crowdsourcing Efficiency Based on Division Strategy”, In Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology, pp. 425-429. December, 2012.

Workshops and Symposiums

1. Huan Jiang and Shigeo Matsubara, “MDP-based Reward Design for Efficient Cooperative Problem Solving”, In Proceedings of the 18th International Workshop on Coordination, Organizations, Institutions and Norms (COIN 2014), December, 2014, Australia.



2. Huan Jiang and Shigeo Matsubara, “Crowdsourcing Software Bug Detection and Division Strategy”, In Proceedings of the Second Joint Agent Workshop & Symposium (JAWS2011), October, 2011, Japan.
