arXiv:1803.01256v5 [cs.HC] 7 May 2020

IEEE TRANSACTIONS ON XXXX, VOL. XX, NO. XX, XXX 20XX

ZebraLancer: Decentralized Crowdsourcing of Human Knowledge atop Open Blockchain

Yuan Lu, Qiang Tang and Guiling Wang

Abstract—We design and implement ZebraLancer, the first private and anonymous decentralized crowdsourcing system, and overcome two fundamental challenges of decentralizing crowdsourcing, i.e., data leakage and identity breach. First, our outsource-then-prove methodology resolves the tension between blockchain transparency and data confidentiality to guarantee the basic utility and fairness requirements of data crowdsourcing, thus ensuring that: (i) a requester will not pay more than what the data deserve, according to a policy announced when her task is published via the blockchain; (ii) each worker indeed gets a payment based on the policy, if he submits data to the blockchain; (iii) the above properties are realized not only without a central arbiter, but also without leaking the data to the open blockchain. Second, the transparency of the blockchain allows one to infer private information about workers and requesters through their participation history. Simply enabling anonymity is seemingly tempting, but it would allow malicious workers to submit multiple times to reap rewards. ZebraLancer overcomes this problem by allowing anonymous requests/submissions without sacrificing accountability. The underlying idea is a subtle linkability: if a worker submits twice to a task, anyone can link the submissions; otherwise he stays anonymous and unlinkable across tasks. To realize this delicate linkability, we put forward a novel cryptographic concept, i.e., common-prefix-linkable anonymous authentication. We remark that the new anonymous authentication scheme might be of independent interest. Finally, we implement our protocol for a common image annotation task and deploy it in an Ethereum test net. The experimental results show the applicability of our protocol atop an existing real-world blockchain.

Index Terms—Blockchain, crowdsourcing, confidentiality, anonymity, accountability.

1 INTRODUCTION

Crowdsourcing empowers open collaboration over the Internet. One remarkable example is the solicitation of annotated data: the benchmark of the famous ImageNet challenge [1] was created via Amazon’s crowdsourcing marketplace, Mechanical Turk (MTurk) [2]. Another notable example is mobile crowdsensing [3], where one party (called the “requester”) can request a group of individuals (called “workers”) to use mobile devices to gather information that fosters data-driven applications [4, 5]. Various monetary incentive mechanisms were introduced [6–12] to motivate workers to make real efforts. To facilitate these mechanisms, the state-of-the-art solution necessarily requires a trusted third party to host crowdsourcing tasks and fulfill the fair exchange between the crowd-shared data and the rewards; otherwise, the effectiveness of incentive mechanisms can be hindered by so-called “free-riding” (i.e., dishonest workers reap rewards without making real efforts) and “false-reporting” (i.e., dishonest requesters try to repudiate the payment).

It is well known that reducing the reliance on a trusted third party is desirable in practice, and the same goes for crowdsourcing. First, numerous real-world incidents reveal that the party might silently misbehave in-house for self-interest [13]; or some of its employees [14] or attackers [15] can compromise its functionality. Second, the party often fails to resolve disputes. For instance, requesters have a good chance to collect data without paying at MTurk, because the platform is biased toward requesters over workers [16]. Third, a centralized platform inevitably inherits all the vulnerabilities of a single point of failure. For example, Waze, a crowdsourcing map app, suffered from 3 unexpected server downtimes and 11 scheduled service outages during 2010-2013 [4]. Last but not least, a central platform hosting all tasks also increases the worry of massive privacy breaches. A recent lesson is the tremendous data leakage of Uber [17].

• The authors are with the Department of Computer Science, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, NJ, 07102, USA. E-mail: {yl768, qiang, gwang}@njit.edu

• Two popular hypotheses about zebra stripes were: (i) camouflage used to confuse predators by motion dazzle, and (ii) visual cues used by herd peers to identify one another. The delicate anonymity in our system is analogous, as it overcomes the natural tension between anonymity and accountability.

Manuscript received Dec xx, 2018. An abridged version of this paper appeared in the 38th IEEE ICDCS, Vienna, Austria, July 2018.

In contrast, an open blockchain¹ is a distributed, transparent and immutable “bulletin board” organized as a chain of blocks. It is usually managed and replicated collectively by a peer-to-peer (P2P) network. Each block includes some messages committed by network peers, and is validated by the whole network according to a pre-defined consensus protocol. This ensures reliable message delivery via the untrusted Internet. More interestingly, the messages contained in each block can be program code, the execution of which is enforced and verified by all P2P network peers; hence, a more exotic application, the smart contract [18], is enabled. Essentially, a smart contract can be viewed as a “decentralized computer” that faithfully handles all computations and message deliveries related to a specified task. It becomes enticing to build a decentralized crowdsourcing platform atop it.

¹ We remark that “blockchain” refers to an open blockchain throughout this paper. An open/permissionless/public blockchain is a blockchain network that allows any party to participate and leave, as opposed to the less ambitious approach of building a blockchain atop permissioned parties. ZebraLancer can inherit the P2P network of an open blockchain as its underlying infrastructure.

Unfortunately, this fascinating new technology also brings about new privacy challenges [19], which were never that severe in the centralized setting and could even harm the basic system utilities, as one notable feature of the blockchain is its transparency. The whole chain is replicated to the whole network to ensure consistency; thus the data submitted to the blockchain will be visible to the public.

First, blockchain transparency causes an immediate problem violating data privacy, considering that much of the crowdsourced data may be sensitive. For example, even in the intuitively “safe” image annotation tasks, if there are some special ambiguous pictures (e.g., Thematic Apperception Test pictures [20]), the answers to them can be used to infer the personality profiles of workers. Sometimes, the data are simply valuable to the requester who paid to get them. What is worse, since block confirmation (which corresponds to the time when the submitted answers are actually recorded in a block) normally takes some time after the data are submitted to the network, a malicious worker can simply copy the data committed by others and submit the same data as his own to run the free-riding attack. Without data confidentiality, the incentive mechanisms could be rendered completely ineffective in decentralized settings.

Second, most crowdsourcing systems [2, 4] and incentive mechanisms [7–9] implicitly require requesters/workers to authenticate to prevent misbehaviors caused by (colluding) counterfeit identities [21]. When crowdsourcing is decentralized, this basic requirement will cause the history of submitting/requesting to be known by everyone (it was previously “protected” in a data center, such as the breached one of Uber). Thus considerable information about workers/requesters [22] will leak to the public through their participation history, which seriously impairs their privacy and even demotivates them from joining. Notably, if a user frequently joins traffic monitoring tasks, then anyone can read the blockchain and figure out his location traces.

Clearly, to decentralize crowdsourcing in a meaningful way that realizes basic system utilities, the above fundamental privacy challenges have to be overcome, which requires us to resolve two natural tensions: (i) the tension between blockchain transparency and data confidentiality, and (ii) the tension between anonymity and accountability. Simple solutions utilizing standard cryptographic tools (e.g., encryption and/or group signatures) to protect data confidentiality and anonymity do not work well: encrypting the data immediately prevents smart contracts from enforcing the reward policy; allowing fully anonymous participation gives a dishonest worker the opportunity to make multiple submissions in one crowdsourcing task, and thus he may claim more rewards than he is supposed to (similarly for a malicious requester). See more details about the challenges in Section 2.

Our contributions. In this paper, we construct, analyze and implement a general blockchain-based protocol to enable the first private and anonymous decentralized data crowdsourcing system². Without relying on any third-party information arbiter, our protocol can still guarantee the faithful execution of a class of incentive mechanisms, once they are announced as pre-specified policies in the blockchain. More importantly, we also protect confidential data and identities from the blockchain network, while the underlying blockchain audits the correct execution of crowdsourcing tasks. Specifically,

1) A blockchain-based protocol is proposed to realize decentralized crowdsourcing that satisfies: (i) the fair exchange between data and rewards, i.e., a worker will be paid the correct amount according to the pre-defined policy of evaluating data, if he submits to the blockchain; (ii) data confidentiality, i.e., the submitted data are confidential to anyone other than the requester; and (iii) anonymity and accountability.

The intuition behind the fairness and confidentiality is an outsource-then-prove methodology in which: (i) the requester is forced to deposit the budget of her incentive policy to a smart contract; (ii) submissions are encrypted under the requester’s public key and collected by the blockchain; (iii) the evaluation of rewards is outsourced to the requester, who then needs to send an instruction about how to reward workers. The instruction is ensured to follow the promised incentive policy, because the requester is also required to attach a valid succinct zero-knowledge proof.

The worker anonymity of our protocol ensures that: (i) the public, including the requester and the implicit registration authority, is not able to tell whether a piece of data comes from a given worker or not; (ii) if a worker joins multiple tasks announced via the blockchain, no one can link these tasks. More importantly, we also address the threat of multiple submissions exacerbated by anonymity misuse. In particular, if a worker anonymously submits more than the number allowed in one task, our scheme allows the blockchain to detect and drop these invalid submissions. Similarly, we achieve accountable anonymity for requesters.

2) To achieve the above goal of anonymity while preserving accountability, we propose, define and construct a new cryptographic primitive, called common-prefix-linkable anonymous authentication. Most of the time, a user can authenticate messages and attest to the validity of his identity without being linked. The only exception, in which anyone can link two authenticated messages, is when they share the same prefix and are authenticated by the same user.

To utilize the new primitive in our protocol, a worker has to submit to a task via an anonymous authentication. The reference of the task is unique and serves as the common prefix, so that the special linkability prevents multiple submissions to a task. A requester can also use it to authenticate in each task she publishes, and to convince workers that she cannot maliciously submit to intentionally downgrade their rewards. We remark that such a scheme can be used to anonymously authenticate in constant time, independent of the tasks.

Such a primitive may be of independent interest for the special flavor of its accountability-yet-anonymity.

3) To showcase the feasibility of applying our protocol, we implement the system, which we call ZebraLancer, for a common image annotation task on top of Ethereum, a real-world blockchain infrastructure. Intensive experiments and performance evaluations are conducted in an Ethereum test net. Since current smart contracts support only primitive operations, tailoring such protocols to be compatible with existing blockchain platforms is non-trivial.

² A couple of recent attempts at decentralized crowdsourcing [23] have been made; however, none resolved the fundamental privacy issues, which makes even their basic functionalities questionable, as in these systems malicious workers can simply free-ride by copying other submissions. See Section 3 for the detailed insufficiencies.
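The outsource-then-prove flow described in contribution 1) can be sketched as a small in-memory simulation. This is a sketch under stated assumptions rather than the protocol itself: the names `TaskContract`, `policy` and `make_mock_verifier` are hypothetical, the XOR "encryption" is a toy placeholder for encryption under the requester's public key, and the zero-knowledge proof is replaced by an idealized check that recomputes the policy.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Toy XOR "encryption" so the example runs; it offers no security.
def toy_encrypt(key: int, answer: int) -> int:
    return answer ^ key


def toy_decrypt(key: int, ciphertext: int) -> int:
    return ciphertext ^ key


# Assumed policy: an equal share of the budget for answers matching the majority.
def policy(answers: List[int], budget: int) -> List[int]:
    majority, _ = Counter(answers).most_common(1)[0]
    share = budget // len(answers)
    return [share if a == majority else 0 for a in answers]


def make_mock_verifier(key: int, budget: int) -> Callable[[List[int], List[int]], bool]:
    """Idealized stand-in for checking the requester's zero-knowledge proof.

    A real verifier would check a succinct proof and never hold the key;
    this closure simply recomputes the policy on the decrypted answers.
    """
    def verify(ciphertexts: List[int], rewards: List[int]) -> bool:
        answers = [toy_decrypt(key, c) for c in ciphertexts]
        return rewards == policy(answers, budget)
    return verify


@dataclass
class TaskContract:
    budget: int                                      # (i) deposited up front
    verify: Callable[[List[int], List[int]], bool]   # proof-check stand-in
    workers: List[str] = field(default_factory=list)
    ciphertexts: List[int] = field(default_factory=list)
    paid: Dict[str, int] = field(default_factory=dict)

    def submit(self, worker: str, ciphertext: int) -> None:
        # (ii) only ciphertexts are stored in the public state
        self.workers.append(worker)
        self.ciphertexts.append(ciphertext)

    def reward(self, rewards: List[int]) -> None:
        # (iii) pay only if the requester's instruction passes verification
        if not self.verify(self.ciphertexts, rewards):
            raise ValueError("instruction does not follow the policy")
        self.paid = dict(zip(self.workers, rewards))


def demo() -> Dict[str, int]:
    key, budget = 0x5A, 90
    contract = TaskContract(budget, make_mock_verifier(key, budget))
    for worker, answer in [("w1", 1), ("w2", 1), ("w3", 0)]:
        contract.submit(worker, toy_encrypt(key, answer))
    # Off-chain: the requester decrypts and evaluates the policy herself.
    answers = [toy_decrypt(key, c) for c in contract.ciphertexts]
    contract.reward(policy(answers, budget))
    return contract.paid
```

Calling `demo()` pays the two workers who agree with the majority and nothing to the third, while an instruction that deviates from the policy makes `reward` raise instead of paying.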

2 PROBLEM FORMULATION

In this section, we give more precise definitions of the problem and its security requirements.

TABLE 1
Key notations related to the crowdsourcing model

| Notation | Represents |
|---|---|
| $R$ | A requester that can be uniquely identified by ID $R$ |
| $W_j$ | A worker who can be uniquely identified by ID $W_j$ |
| $T$ | The crowdsourcing task published by $R$ to collect $n$ answers from $n$ different workers |
| $A_j$ | The answer that is submitted to $T$ by $W_j$ |
| $R(A_j; \tau)$ | The process that defines the value of the answer $A_j$ |
| $R_j$ | The reward for answer $A_j$ according to its value |

Data crowdsourcing model. As illustrated in Fig. 1, there are four roles in the model of data crowdsourcing, i.e., requesters, workers, a platform and a registration authority. A requester, uniquely identified by $R$, can post a task to collect a certain number of answers from the crowd. When announcing the task, the requester promises a concrete reward policy to incentivize workers to contribute (see the definition of the reward policy below). A worker with a unique ID $W_j$ submits his answer $A_j$ and expects to receive the corresponding reward. The platform, a medium assisting the exchange between requesters and workers, is either a trusted party or emulated by a network of peers. The platform considered in this paper is jointly maintained by a collection of network peers; in particular, we build it atop an open blockchain network. The registration authority (RA) plays the important role of verifying and managing the unique identities of workers/requesters, by binding each identity to a unique credential (e.g., a digital certificate). Note that an answer $A_j$ is chosen from a well-defined set $A$.

We remark that well-established identities are a necessary demand of real-world crowdsourcing systems, for example MTurk and Waze, to prevent misbehaviors such as Sybil attacks. Moreover, many auction-based incentive mechanisms [8, 9] are built upon non-collusive game theory, which implicitly requires established identities to ensure one bid from one unique ID. We employ an RA to establish identities. In practice, an RA can be instantiated by (i) the platform itself, (ii) a certificate authority that provides an authentication service, or (iii) a hardware manufacturer that makes trusted devices that can faithfully sign messages [24]. Our solution should be able to inherit these established RAs in the real world.

Fig. 1. The model of data crowdsourcing: workers and requesters obtain unique credentials bound to their identities at a registration authority (RA); authenticated requesters and authenticated workers can make a fair exchange between data and rewards through a third-party platform (arbiter).

In this paper, w.l.o.g., we assume that each unique identity is only allowed to submit one answer to a task. Also, we consider settings where the value of crowd-shared answers, and hence the corresponding rewards, can be evaluated by a well-defined process such as auctions and quality-aware rewards; cf. [10–12] on quality-aware rewards and [8, 9] on auction-based incentives. These incentives share the same essence, as follows.

Suppose an authenticated requester publishes a task $T$ with a budget $\tau$ to collect $n$ answers from $n$ workers. An authenticated worker interested in it will then submit his answer. The reward of an answer $A_j$ follows a well-defined process determined by some auxiliary variables, i.e., the reward of $A_j$ can be defined as $R_j := R(A_j; a_1, \ldots, a_m, \tau)$, where $R$ is a function parameterized by the auxiliary variables $a_1, \ldots, a_m, \tau$. Note that $\tau$ is the budget of the requester, and we write $R_j := R(A_j; \tau)$ for short.

In particular, in some simple tasks (e.g., multiple-choice problems), the quality of an answer can be straightforwardly evaluated from all answers to the same task, i.e., $R_j = R(A_j; A_1, \ldots, A_n, \tau)$, using majority voting or expectation-maximization iterations [10–12]. More generally, [25] proposed a universal method to evaluate quality: (i) some workers are requested to answer a complex task; (ii) other workers are then requested to grade each answer collected in the previous stage. Moreover, our model captures the essence of auction-based incentive mechanisms such as [8, 9], when the parameters $a_1, \ldots, a_m$ represent the bids of workers (and other necessary auxiliary inputs such as their reputation scores).

Our work considers the general definition above and is extendable to auction-based incentives, even though the protocol design and implementation in this paper mainly focus on how to instantiate quality-aware incentives. Also note that the flat-rate incentive, in which each submitted answer gets a flat-rate payment [2], can be considered a special case of the above abstraction as well.
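As a worked illustration of the abstraction $R_j := R(A_j; a_1, \ldots, a_m, \tau)$, the following sketch instantiates it twice. The concrete payout rules (an equal share $\tau/n$ for answers that agree with the majority, and $\tau/n$ for every answer in the flat-rate case) and the helper $\mathrm{maj}(\cdot)$ are assumptions made for this example; the text only states that majority voting and flat-rate payment fit the model.

```latex
% Assumed quality-aware policy based on majority voting
R(A_j; A_1, \ldots, A_n, \tau) =
\begin{cases}
  \tau / n, & \text{if } A_j = \mathrm{maj}(A_1, \ldots, A_n), \\
  0,        & \text{otherwise.}
\end{cases}

% Assumed flat-rate policy
R(A_j; \tau) = \tau / n .

% Example: n = 3, \tau = 90, (A_1, A_2, A_3) = (\text{cat}, \text{cat}, \text{dog})
% Majority voting:
R_1 = R_2 = 30, \quad R_3 = 0, \quad R_1 + R_2 + R_3 = 60 \le \tau .
% Flat rate:
R_1 = R_2 = R_3 = 30, \quad R_1 + R_2 + R_3 = 90 = \tau .
```

In both instances the total payout stays within the budget $\tau$ that the requester announces with the task.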

Security models. Next, we specify the basic security requirements for our (decentralized) crowdsourcing system on top of the existing infrastructure of an open blockchain.

Data confidentiality. Ideally, this property requires that the communication transcripts (including the blocks in the blockchain) do not leak anything to anyone (except the requester) about the input parameters $a_1, \ldots, a_m$ of the incentive policy $R$, because these parameters might actually be confidential data or sealed bids. We can adapt the classical semantic-security-style definition [26] from cryptography for this ideal purpose: the distribution of the public communication can be simulated with only public knowledge. However, an adversarial worker can infer information after receiving his payment. This is an inherent issue of incentive mechanisms, and cannot be resolved even if there is a fully trusted third party. So our protocol focuses on the best-possible data confidentiality, as if there were a fully trusted party. Informally speaking, our protocol runs in the “real” world, and we imagine an “ideal” world in which a fully trusted party facilitates the incentives; following the real-ideal world paradigm, our data confidentiality requires that the two worlds be computationally indistinguishable.

Anonymity. Private information of a worker/requester can be exposed by linking the tasks they join/publish [22]. Intuitively, we might require two anonymity properties for workers: (i) unlinkability between a submission and a particular worker, and (ii) unlinkability among all tasks joined by a particular worker. However, (i) is in fact implied by (ii), because breaking the first obviously leads to breaking the second. Similarly, the anonymity of a requester can be understood as the unlinkability among all tasks published by her. The requirement of worker anonymity can be formulated via the following game. An adversary $\mathcal{A}$ corrupts a requester, the registration authority (RA), and the platform (e.g., the blockchain); suppose there are only two honest workers, $W_0$ and $W_1$. In the beginning, the adversary announces $n$ tasks. For each task $T_i$, suppose there is a set of participating workers $W_{T_i}$. After seeing all the communications, for any $T_i \neq T_j$, $\mathcal{A}$ cannot tell whether $W_{T_i} \cap W_{T_j} = \emptyset$ better than guessing. We note that the anonymity should hold even if all entities other than $W_0$ and $W_1$, including the requester and the platform, are corrupted. The requester anonymity can be defined similarly via the above game, and we omit the details.

Security against a malicious requester. A malicious requester may avoid paying rewards (defined by the policy $R$), e.g., by launching the false-reporting attack. Security in this case can be formulated via the following security game: an adversary $\mathcal{A}$ corrupts the requester and executes the protocol to publish a task with a promised reward policy $R$ and a budget $\tau$. Let us define a bad event $B_1$ to be that there exists a worker $W_j$ who submits an answer $A_j$ and receives a payment smaller than $R_j = R(A_j; \tau)$. We require that for every polynomial-time adversary $\mathcal{A}$, the probability $\Pr[B_1]$ is negligibly small.

Security against malicious workers. A dishonest worker may try to harvest more rewards than he deserves. Security in this case can be formulated as follows. An adversary $\mathcal{A}$ corrupts one worker,³ and participates in the protocol interacting with a requester (and the platform): (i) $\mathcal{A}$ submits some answers $\{A_1, \ldots, A_n\}$, $n \geq 1$; let us define the bad event $B_2$ as that $\mathcal{A}$ receives a payment greater than $\max_{A_j \in \{A_1, \ldots, A_n\}} R_j$, where $R_j := R(A_j; \tau)$, from the requester; (ii) $\mathcal{A}$ also sees the transcripts corresponding to an honest worker $W_i$’s submission (e.g., the ciphertext of $A_i$), and then submits her answer $A_j$ to the platform; let us define the bad event $B_3$ as that $\mathcal{A}$ receives a payment $R_j = R(A_i; \tau)$. We require that for all polynomial-time $\mathcal{A}$, $\Pr[B_2]$ and $\Pr[B_3] - \Pr[(A_{\mathcal{A}} \leftarrow \mathcal{A}) = A_i]$ are negligibly small.

We remark that the above security requirements against a malicious requester and malicious workers capture the special fairness of the exchange between crowd-shared data and rewards. For example, Sybil attackers who forge identities, and front-running attackers who copy and paste others’ submissions (which can be either ciphertexts or plaintexts), shall be prevented.

Technical challenges. The main advantage that the blockchain offers is the smart contract, which can be automatically executed as a piece of pre-defined agreement. Let us first look at a naive decentralized solution to a crowdsourcing task: the requester codes her incentive policy into a task’s smart contract $C$, and then broadcasts the contract $C$ to the blockchain network to collect $n$ answers; the incentive policy uses an incentive mechanism defined by a function $R$ and a budget $\tau$; after the contract $C$ is included in a block, workers can submit answers to it in the clear; finally, $C$ can evaluate the reward $R_i = R(A_i; A_1, \ldots, A_n, \tau)$ for each answer $A_i$; more importantly, the contract can then automatically transfer $R_i$ to the blockchain address bound to answer $A_i$.
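The naive solution above can be sketched as a small in-memory simulation. This is a sketch under stated assumptions, not the paper's contract: the class `NaiveTaskContract`, the `majority_policy` payout rule and the worker addresses are all hypothetical, and a Python object stands in for on-chain state. Its only purpose is to make visible that every answer sits in publicly readable storage before rewards are evaluated.

```python
from collections import Counter
from typing import Callable, Dict, List


# Hypothetical policy R(A_i; A_1..A_n, tau): an equal share of the budget
# for every answer that agrees with the most common answer.
def majority_policy(answer: str, all_answers: List[str], budget: int) -> int:
    most_common_answer, _ = Counter(all_answers).most_common(1)[0]
    return budget // len(all_answers) if answer == most_common_answer else 0


class NaiveTaskContract:
    """In-memory stand-in for a contract that collects answers in the clear."""

    def __init__(self, budget: int, n: int,
                 policy: Callable[[str, List[str], int], int]) -> None:
        self.budget = budget                 # deposited by the requester
        self.n = n                           # number of answers to collect
        self.policy = policy
        self.answers: Dict[str, str] = {}    # address -> plaintext answer (public)

    def submit(self, address: str, answer: str) -> None:
        if len(self.answers) >= self.n:
            raise ValueError("task already has n answers")
        if address in self.answers:
            raise ValueError("one answer per address")
        self.answers[address] = answer

    def pay_out(self) -> Dict[str, int]:
        if len(self.answers) < self.n:
            raise ValueError("still collecting answers")
        all_answers = list(self.answers.values())
        return {addr: self.policy(ans, all_answers, self.budget)
                for addr, ans in self.answers.items()}


def demo() -> Dict[str, int]:
    contract = NaiveTaskContract(budget=90, n=3, policy=majority_policy)
    contract.submit("0xW1", "cat")
    contract.submit("0xW2", "dog")
    # A free-rider reads the public state and copies an existing answer
    # instead of doing the work.
    copied = next(iter(contract.answers.values()))
    contract.submit("0xFreeRider", copied)
    return contract.pay_out()
```

In `demo()`, the free-rider is paid as much as the honest worker whose answer it copied, which is exactly the weakness of submitting answers in the clear.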

The above naive solution looks appealing to resolve thefairness issues such as free-riding and false-reporting. Noticethat an implicit requirement is that answers (or bids) shouldbe submitted in the clear, as the contract needs all thoseas the inputs (i.e. a1, . . . , am) of incentive policy R to auditthe right amount of the reward for each answer. However,those inputs can be valuable and sensitive crowd-sharedanswers, and should be kept confidential. Or they could beauction bids, and should be confidential as well, because amalicious worker can learn the distribution of truthful bidsof honest workers, and further break auctions by craftingoptimized malicious (untruthful) bids. One may suggestthat a worker encrypts his answer (or his bid) under therequester’s public key, and submits the ciphertext instead.Unfortunately, this immediately renders the above naive so-lution to fail, as the contract (more precisely, the blockchainpeer) only sees ciphertexts and thus cannot evaluate thereward. Another proposal to hardcode the secret key inthe smart contract also fails, as the secret key will becometransparent. Also remark that more advanced encryptionschemes such as fully homomorphic encryption do not help,

3We remark that we focus on resolving the new challenges intro-duced by blockchain, and put forth the best possible security, as if thereis a fully trusted third-party serving as the crowdsourcing platform.For example, it is not quite clear how to handle the collusion of manyidentities, even in the centralized setting; thus such a problem is out ofthe scope of this paper.

Page 5: 1 ZebraLancer: Crowdsource Knowledge atop Open Blockchain ... · Crowdsourcing empowers open collaboration over the Internet. One remarkable example is the solicitation of an-notated

IEEE TRANSACTIONS ON XXXX, VOL. XX, NO. XX, XXX 20XX 5

because the blockchain peers still cannot know the plaintextof rewards without the secret key, even if they can computethe ciphertext of rewards. Therefore, the tension between thedata confidentiality and the transparent execution of smartcontracts brings in our first challenge:

Challenge 1. Leveraging the blockchain to enforce incentives, but not revealing the confidential inputs of the incentive mechanism (e.g. answers or bids) to the public.

As briefly pointed out in the introduction, when the anonymity of workers/requesters is not well protected, their private information can be compromised, because the participation history of tasks leaks to the public through the blockchain. What's worse, the breached privacy adds negative payoffs that reduce the motivation of workers/requesters to participate. To mitigate the risk of a privacy breach, one might suggest anonymous submission/request. However, if a worker is allowed to submit data without authenticating his unique identity, a malicious one may exploit this to flood a task with multiple submissions. For example, suppose a requester specifies a task to collect 20 answers from 20 anonymous workers; a malicious worker might submit 20 colluded answers while claiming that they are from 20 different workers, as all answers are anonymously submitted. Worse still, an anonymous requester might send (colluded) answers to her own task to repudiate payments. For instance, a requester anonymously publishes a task to collect 50 answers, and she submits 30 colluded anonymous answers to downgrade the other 20 answers, such that she avoids paying but still gets 20 useful answers. These scenarios cause another tension, between the participants' anonymity and their accountability, which brings in the second challenge:

Challenge 2. Ensuring that participants remain anonymous while keeping them accountable in decentralized crowdsourcing.
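To make the requester-side attack concrete, the following minimal sketch replays the 50-answer scenario under a hypothetical majority-vote policy (the policy, the labels "cat"/"dog" and the function name `majority_rewards` are illustrative assumptions, not the incentive mechanism fixed by this paper). It only shows why unauthenticated anonymous submissions let 30 colluded answers outvote 20 honest ones.

```python
from collections import Counter

def majority_rewards(answers, budget):
    """Hypothetical policy: pay budget/n to each answer equal to the majority label."""
    majority, _ = Counter(answers).most_common(1)[0]
    share = budget / len(answers)
    return [share if a == majority else 0.0 for a in answers]

# 20 honest workers all report the true label.
honest = ["cat"] * 20
# The anonymous requester floods her own task with 30 colluded answers.
colluded = ["dog"] * 30

rewards = majority_rewards(honest + colluded, budget=50.0)
paid_to_honest = sum(rewards[:20])   # what honest workers receive in total
paid_to_self = sum(rewards[20:])     # what flows back to the requester's own identities
```

Under these assumptions the honest workers receive nothing although their 20 answers are still collected, which is exactly the accountability gap that Challenge 2 describes.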

3 RELATED WORK

We thoroughly review related works, and briefly discuss the insufficiencies of the state-of-the-art solutions.

Centralized crowdsourcing systems. MTurk [2] is the most commercially successful crowdsourcing platform. But it has a well-known vulnerability that allows false-reporters to gain a short-term advantage [16]. Also, MTurk collects plaintexts of answers, which causes considerable worry about data leakage. Last, the pseudo IDs in MTurk can be trivially linked by a malicious requester. Dynamo [27] was designed as a privacy wrapper of MTurk. Its pseudo IDs can only be linked by the pseudo ID issuer, but it still inherited all other weaknesses of MTurk. SPPEAR [28] considered a couple of privacy issues in data crowdsourcing, and thus introduced a couple more authorities, each of which handled a different functionality. Distributing one authority into multiple ones reduces the excessive trust, but, unfortunately, it is still not clear how to instantiate all those different authorities in practice.

Decentralized crowdsourcing. We also note there are several attempts [23, 29–31] at using blockchain to decentralize crowdsourcing, but none of them considers privacy and anonymity, which are arguably fundamental for basic utility: the authors of [29] built a crowd-shared service on top of the blockchain, but the system is not compatible with incentive mechanisms and is not privacy-preserving either; the authors of [30] leveraged the blockchain as a payment channel in their crowdsourcing framework, but it is neither secure against malicious workers and dishonest requesters, nor privacy-preserving; a couple of recent attempts [23, 31] took advantage of the public blockchain to enforce incentives, but these frameworks are neither private nor anonymous, i.e., the collected data and the unique identities (such as certificates) will leak to the whole network of the open blockchain.

Anonymous crowdsourcing. Li and Cao [32] proposed a framework that allows workers to generate their own pseudonyms based on their device IDs. But the protocol sacrificed the accountability of workers, because workers can forge pseudonyms without attesting that they are associated with real IDs, which gives a malicious worker the chance to forge fake pseudo IDs and cheat for rewards. Rahaman et al. [33] proposed an anonymous-yet-accountable protocol for crowdsourcing based on group signatures, and focused on how to revoke the anonymity of misbehaving workers. Misbehaving workers could be identified and further revoked by the group manager. The authors of [28] similarly relied on group signatures but introduced a couple of separate authorities. Our solution can be considered a proactive version that prevents worker misbehavior without relying on a group manager.

Accountable anonymous authentication. The pioneering works on anonymous e-cash [34, 35] first proposed the notion of one-time anonymous authentication. The concept was later studied in the context of one-show anonymous credentials [36]. Some works [37, 38] further extended the notion of one-time use to k-time use, and therefore enabled a more general accountability for anonymous authentications. In [39], the authors considered a special flavor of accountability that periodically allows k-time anonymous authentications.

Our new primitive provides a more fine-grained conditional linkage of anonymous authentications, which is needed for preventing multi-submission in each crowdsourcing task.

Linkable group/ring signature. Conceptually similar to the linkability that appears in one-show credentials [36], linkable group/ring signatures [40, 41] were proposed to allow a user to sign messages on behalf of his group unlinkably up to twice.

In [42], a more general concept of event-oriented linkable group signature was formulated to realize a more fine-grained trade-off between accountability and anonymity: a user can sign on behalf of his group unlinkably up to k times per event, where an event could be a common reference string (e.g., the unique address to call a smart contract deployed in the blockchain). But its main disadvantage is that the group manager can reveal the actual identities of users. Similar event-oriented linkability was discussed in the context of ring signatures [43] as well. Even though that construction enjoys unbreakable anonymity, its main drawback is the impracticality of "hiding" the real identity behind a large group (mainly because the verification of signatures might require the public keys of all group members).


Our new primitive can be considered a special cryptographic notion that formalizes the subtle balance between event-oriented linkability and irrevocable anonymity (within a large and dynamic group). Specifically, our scheme ensures that no one can link the actual identity to any authenticated message (which is strictly stronger than [42]), and it also enables a user to "hide" behind a large group of users (i.e. all users registered at RA, which is impossible in practice via [43]).

Privacy-preserving smart contracts. Privacy-preserving smart contracts are a recent hot topic in blockchain research. Most existing works target general-purpose settings [44, 45], and thus deploy heavy cryptographic tools including general secure multi-party computation (MPC). Hawk [46] did provide a general framework for privacy-preserving smart contracts using lightweight zk-SNARK, but mainly for the reward receiver to prove to the contract. Our work can be considered a very specially designed MPC protocol, and many dedicated optimizations of zk-SNARK exist that can directly benefit our protocol. Last, cryptocurrencies like Zcash [47] and Ethereum [48] also leverage zk-SNARK to build a public ledger that supports anonymous transactions. We note that they consider more basic blockchain infrastructures, on top of which we may build our application for crowdsourcing.

4 PRELIMINARIES

Blockchain and smart contracts. A blockchain is a global ledger maintained by a P2P network collectively following a pre-defined consensus protocol. Each block in the chain aggregates some transactions containing use-case specific data (e.g., monetary transfers, or program code). In general, we can view the blockchain as an ideal public ledger [49] where one can write and read data in the clear. Moreover, it will faithfully carry out certain pre-defined functionalities. The last property captures the essence of smart contracts: every blockchain node runs them as programs and updates its local replica according to the execution results, collectively. More specifically, the properties of the blockchain can be informally abstracted as the following ideal public ledger model [46]:

1) Reliable delivery of messages. The blockchain can be modeled as an ideal public ledger that ensures the liveness and persistence of messages committed to it [49]. In detail, a message sent to the blockchain takes the form of a validly signed transaction broadcast to the whole blockchain network, which is then included in a block and written into the blockchain. We assume the synchronous network model [46, 50] of blockchain throughout the paper, meaning that a transaction will appear in any honest node's replica within a certain time after it appears in the blockchain network. Essentially, the blockchain delivers any message in the form of a valid transaction to any blockchain node within an a-priori known delay. We also remark that a network adversary can reorder transactions that are broadcast to the network but not yet written into a block.

2) Correct computation. The blockchain can be seen as a state machine driven by messages included in each block [51]. Specifically, miners and full nodes will persistently receive newly proposed blocks, and faithfully execute "programs" defined by current states, taking messages in new blocks as inputs. Moreover, the computing results can be reliably delivered to the whole network.

3) Transparency. All internal states of the blockchain will be visible to the whole blockchain (intuitively, anyone). Therefore, all message deliveries and computations via the blockchain are in the clear.

4) Blockchain address (Pseudonym). By default, the sender of a message in the blockchain is referred to by a pseudonym, a.k.a. blockchain address. In practice, a blockchain address is usually bound to the hash of a public key; more importantly, the security of digital signatures further ensures that one cannot send messages in the name of a blockchain address unless she has the corresponding secret key. Also, the program code of a smart contract deployed in the blockchain can be referred to by a unique address, such that one can call the contract to be executed by committing a message pointing to this unique address.
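As a rough illustration of properties 3) and 4), the sketch below models a ledger whose log is readable by anyone and whose senders appear only as hashes of their public keys. The names `address_of` and `ToyLedger`, the use of SHA-256 truncated to 20 bytes, and the way the contract address is derived are all assumptions made for this example; they do not describe any particular blockchain, and signature checking is omitted.

```python
import hashlib
from dataclasses import dataclass, field

def address_of(public_key: bytes) -> str:
    # Assumed derivation: last 20 bytes of SHA-256(public key), hex-encoded.
    return hashlib.sha256(public_key).digest()[-20:].hex()

@dataclass
class ToyLedger:
    # Transparency: every entry (sender address, target address, payload) is public.
    log: list = field(default_factory=list)

    def send(self, sender_pk: bytes, to: str, payload: bytes) -> None:
        # A real chain would also verify a signature under sender_pk here.
        self.log.append((address_of(sender_pk), to, payload))

ledger = ToyLedger()
contract_addr = address_of(b"contract code bytes")   # hypothetical contract address
ledger.send(b"worker-public-key", contract_addr, b"answer in the clear")
```

Anyone reading `ledger.log` sees the payload and the pseudonym of its sender, which is why both the data and the participation history need extra protection.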

zk-SNARK. A zero-knowledge proof (zk-proof) allows a party (i.e. prover) to generate a cryptographic proof convincing another party (i.e. verifier) that some values are obtained by faithfully executing a pre-defined computation on some private inputs (i.e. witness) without revealing any information about the private state. The security guarantees are: (i) soundness, that no prover can convince a verifier if she did not compute the results correctly; sometimes, we require a stronger soundness that for any prover, there exists an extractor algorithm which interacts with the prover and can actually output the witness (a.k.a. proof-of-knowledge); (ii) zero-knowledge, that the proof distribution can be simulated without seeing any secret state, i.e., it leaks nothing about the witness. Both of the above hold with overwhelming probability.

The zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) further allows such a proof to be generated non-interactively. More importantly, the proof is succinct, i.e., the proof size is independent of the complexity of the statement to be proved, and is always a small constant. More precisely, zk-SNARK is a tuple of three algorithms. A Setup algorithm outputs the public parameters to establish a SNARK for an NP-complete language $L = \{\vec{x} \mid \exists \vec{w} \text{ s.t. } C(\vec{x}, \vec{w}) = 1\}$. The Prover algorithm can leverage the established SNARK to generate a constant-size proof attesting the truth of a statement $\vec{x} \in L$ with witness $\vec{w}$. The Verifier algorithm can efficiently check the proof.
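To fix intuition for the language L, the sketch below writes one possible relation C(x, w) as an ordinary predicate: the public statement x is a digest and the private witness w is its preimage. This is only the relation that a proof system would be set up for, evaluated in the clear; the choice of SHA-256 and the variable names are assumptions for the example, and no zk-SNARK is actually run.

```python
import hashlib

def C(x: bytes, w: bytes) -> bool:
    """Relation C(x, w) = 1 iff w is a SHA-256 preimage of the public digest x."""
    return hashlib.sha256(w).digest() == x

witness = b"secret witness"                      # known only to the prover
statement = hashlib.sha256(witness).digest()     # public; statement is in L

holds = C(statement, witness)                    # the prover can satisfy C
fails = C(statement, b"some other value")        # a wrong witness does not
```

A zk-SNARK Prover would output a short proof that some w with C(x, w) = 1 exists and is known, and the Verifier would check that proof from x alone, never seeing `witness`.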

5 PROTOCOL: PRIVATE AND ANONYMOUS DECENTRALIZED DATA CROWDSOURCING

In this section, we will construct a private and anonymous protocol to address the critical challenges of decentralizing crowdsourcing, without sacrificing security against "free-riders" and "false-reporters". The procedures of crowdsourcing will be decentralized atop an existing network of


blockchain. More specifically, we will tackle the new privacy and anonymity challenges brought by the blockchain.

As we briefly mentioned in previous sections, the system implicitly has a separate registration service that validates each participant's unique identity before issuing a certificate. Such a setup addresses the basic requirement that every worker be allowed to submit no more than a fixed number k of answers. For simplicity, we consider here k = 1.

Intuitions. Our basic strategy is to let the smart contract (which is hosted by the blockchain network) enforce the fair exchange between the submitted answers and their corresponding rewards, but without revealing any data or any identity to the blockchain. Let us walk through the high-level ideas first.

The requester first codifies a reward policy parameterized by her budget (i.e. R(·; τ)) into a smart contract. She broadcasts a transaction containing the contract code and the budget. Once the smart contract is included in the blockchain, it can be referred to by a unique blockchain address, and the budget should be deposited to this address (otherwise, no one would participate). After that, any worker who is interested in contributing can simply submit his answer to the blockchain, via a transaction pointing to the contract's address.

As pointed out before, we have to protect the confidentiality of the answers, in order to ensure that answers from different workers are independent. So the workers encrypt the answers under the requester's public key. Now the contract cannot see the answers, so it cannot calculate the corresponding rewards. But the requester can retrieve all the encrypted answers and decrypt them off-chain, and further learn the rewards they deserve. It is then necessary that the requester correctly instruct the smart contract how to proceed. Concretely, we will leverage the practical cryptographic tool of zk-SNARK to force the requester to prove that she indeed followed the pre-specified reward policy when calculating the rewards. In detail, the requester should prove that her instruction for rewarding is computed as follows: (i) obtain all answers by decrypting all encrypted answers using a secret key corresponding to the public key contained in the smart contract; (ii) use all those answers and the announced R(·; τ) to compute the quality of each answer, and thus its reward.
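The statement the requester must prove can be sketched as a plain predicate over public inputs (encryption key, ciphertexts, claimed rewards, budget) and a private witness (the decryption key). The sketch below is illustrative only: it uses a toy, insecure ElGamal over a small prime field and a hypothetical majority-vote `policy`, neither of which is prescribed by this paper, and it evaluates the relation in the clear rather than inside a zk-SNARK.

```python
import secrets
from collections import Counter

P = 2**127 - 1   # a Mersenne prime; toy-sized and NOT secure
G = 3            # assumed base element

def keygen():
    esk = secrets.randbelow(P - 2) + 1
    return esk, pow(G, esk, P)

def encrypt(epk, answer):
    # Answers are small non-negative ints; shift by 1 so the plaintext is nonzero.
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), ((answer + 1) * pow(epk, r, P)) % P

def decrypt(esk, ct):
    c1, c2 = ct
    # c1^(P-1-esk) equals c1^(-esk) mod P by Fermat's little theorem.
    return (c2 * pow(c1, P - 1 - esk, P)) % P - 1

def policy(answers, budget):
    # Hypothetical R(.; tau): equal share to every answer matching the majority.
    majority, _ = Counter(answers).most_common(1)[0]
    return [budget // len(answers) if a == majority else 0 for a in answers]

def reward_relation(epk, ciphertexts, rewards, budget, esk):
    """Public statement: epk, ciphertexts, rewards, budget. Private witness: esk."""
    if pow(G, esk, P) != epk:                # esk must match the key in the contract
        return False
    answers = [decrypt(esk, ct) for ct in ciphertexts]   # step (i)
    return rewards == policy(answers, budget)            # step (ii)

esk, epk = keygen()
cts = [encrypt(epk, a) for a in [1, 1, 0]]
honest_instruction = policy([1, 1, 0], budget=30)
```

With these definitions, `reward_relation(epk, cts, honest_instruction, 30, esk)` holds, while an instruction such as `[0, 0, 30]` does not satisfy the relation, so no valid proof for it should exist.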

A more challenging issue arises regarding anonymity-yet-accountability. As briefly pointed out before, we would like to achieve a balance between anonymity and accountability. Here we put forth a new cryptographic primitive to resolve this natural tension. A user can anonymously authenticate messages (each composed of a fixed-length prefix and the remaining part). If two authenticated messages share a common prefix, anyone can tell whether they were authenticated by the same user or not. Moreover, no one can link any two message-authentication pairs, as long as these messages have different prefixes. Having this new primitive in hand, a simple and intuitive solution to the anonymous-yet-accountable protocol is to let each worker anonymously authenticate a message $\alpha_C \| C_i$ whenever the encrypted answer $C_i$ is submitted to a task contract $C$ that can be uniquely addressed by $\alpha_C$. This implies that all submissions from the same worker to one task can be linked and then counted, but any two submissions to two different tasks will be provably unlinkable, even if they are submitted by the same user. Also, the maximum number of allowed submissions in each task can be easily tuned (by counting linked submissions).

Last, we also need to augment the smart contract by building into it the general algorithm for verifying zero-knowledge proofs. In particular, when the smart contract receives an instruction regarding rewards together with its proof, the verification algorithm will be executed. All inputs of the verification algorithm are common knowledge stored in the open blockchain, e.g., the budget, the encrypted answers and the public key of encryption. If a dishonest requester reports a false instruction, her proof cannot be verified and the contract will simply drop the instruction. What's more, if the smart contract does not receive a correct instruction within a time limit, it can directly distribute the budget to all workers evenly as punishment (which can be considered part of the pre-specified incentive mechanism), since the budget has been deposited. In this way, the requester cannot gain any benefit by deviating from the protocol, and she is incentivized to respond properly and in time, so that each worker will receive the expected reward. On the other hand, a dishonest worker can never claim more reward than he is supposed to get, as the reward is calculated by the requester herself.
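The contract-side behavior described above can be sketched as a small state machine. In this illustration the class `TaskContract`, its field names, the integer "block height" deadline and the `verify` callback (a stand-in for the zk-SNARK Verifier) are all hypothetical; the sketch only mirrors the two rules in the text: drop any instruction whose proof fails, and split the deposited budget evenly once the time limit passes without a valid instruction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class TaskContract:
    budget: int
    deadline: int                                   # assumed unit: block height
    verify: Callable[[List[int], bytes], bool]      # stand-in for the SNARK Verifier
    workers: List[str] = field(default_factory=list)
    payouts: Dict[str, int] = field(default_factory=dict)
    settled: bool = False

    def submit(self, worker_addr: str) -> None:
        self.workers.append(worker_addr)

    def instruct(self, rewards: List[int], proof: bytes, now: int) -> bool:
        if self.settled or now > self.deadline or len(rewards) != len(self.workers):
            return False
        if sum(rewards) > self.budget or not self.verify(rewards, proof):
            return False                            # false instruction is dropped
        self.payouts = dict(zip(self.workers, rewards))
        self.settled = True
        return True

    def timeout(self, now: int) -> bool:
        if self.settled or now <= self.deadline or not self.workers:
            return False
        share = self.budget // len(self.workers)    # even split as punishment
        self.payouts = {w: share for w in self.workers}
        self.settled = True
        return True

# Usage with a stub verifier that accepts only the byte string b"ok".
contract = TaskContract(budget=30, deadline=10, verify=lambda r, p: p == b"ok")
for addr in ["w1", "w2", "w3"]:
    contract.submit(addr)
rejected = contract.instruct([30, 0, 0], b"forged", now=5)
expired = contract.timeout(now=11)
```

In this run the forged instruction is rejected and, after the deadline, each of the three workers is paid 10.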

Fig. 2. Subtle linkability of the common-prefix-linkable anonymous authentication scheme. All involved algorithms except Setup are shown in bold.

5.1 Common-prefix-linkable anonymous authentication

Before the formal description of ZebraLancer's protocol, let us first introduce the new primitive for achieving anonymous-yet-accountable authentication. As briefly shown in Fig. 2, our new primitive can be built atop any certification procedure, thus we include a certificate generation procedure that can be inherited from any existing one. Also, we insist on non-interactive authentication, thus all the steps (including the authentication step) are described as algorithms instead of protocols. Formally, a common-prefix-linkable anonymous authentication scheme is composed of the following algorithms:


TABLE 2: Notations related to common-prefix-linkable anonymous authentication

| Notation | Represents |
|---|---|
| (mpk, msk) | RA's public-secret key pair, which is generated by the Setup algorithm |
| (pki, ski) | The public key certified by RA to bind the unique ID i, and its corresponding secret key |
| CertGen | The algorithm executed by RA to issue a certificate associated with a registered public key |
| CertVrfy | The algorithm that can be run by anyone to check the validity of a certificate |
| Auth | The algorithm to authenticate a message using a certified public key |
| Verify | The algorithm to verify whether a message is authenticated by a certified public key or not |
| Link | The algorithm to link two authenticated messages, if and only if they share a common prefix |
| Prover | The zk-SNARK algorithm that generates zk-proofs attesting the statements to be proven |
| Verifier | The zk-SNARK algorithm that checks whether zk-proofs are generated faithfully or not |

- Setup(1^λ). This algorithm outputs the system's master public key mpk and master secret key msk, where λ is the security parameter.

- CertGen(msk, pk). This algorithm outputs a certificate cert to validate the public key.

- Auth(m, sk, pk, cert, mpk). This algorithm generates an attestation π on a message m, showing that the sender of m indeed owns a secret key corresponding to a valid certificate.

- Verify(m, mpk, π). This algorithm outputs 0/1 to decide whether the attestation is valid or not for the attested message, using the system's master public key.

- Link(mpk, m1, π1, m2, π2). This algorithm takes as input two valid message-attestation pairs. If m1, m2 have a common prefix of length λ, and π1, π2 are generated from the same certificate, it outputs 1; otherwise it outputs 0.

We also define the properties of a common-prefix-linkable anonymous authentication scheme. Correctness is straightforward: an honestly generated authentication can be verified. Unforgeability also follows from the standard notion for authentications: if one does not own any valid certificate, she cannot authenticate any message. We mainly focus on formalizing the security notions of common-prefix-linkability and anonymity.

The first one characterizes a special accountability requirement in anonymous authentication. It requires that no efficient adversary can authenticate two messages with a common prefix without being linked, if using the same certificate. More generally, if an attacker corrupts q users, she cannot authenticate q + 1 messages sharing a common prefix without being noticed. Formally, consider the following cryptographic game between a challenger C and an attacker A:

1) Setup. The challenger C runs the Setup algorithm and obtains the master keys.

2) CertGen queries. The adversary A submits q public keys with different identities and obtains q different certificates: cert1, . . . , certq.

3) Auth. The adversary A chooses q + 1 messages p||m1, . . . , p||mq+1 sharing a common prefix p (with |p| = λ) and authenticates to the challenger C by generating the corresponding attestations π1, . . . , πq+1.

Adversary A wins if all q + 1 authentications pass the verification, and no pair of those authentications is linked.

Definition 1 (Common-prefix-linkability). For all probabilistic polynomial-time algorithms A, Pr[A wins in the above game] is negligible in the security parameter λ.
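The winning condition of this game can be written down directly. The sketch below does so for an idealized scheme in which an "attestation" is simply the index of the certificate used, Verify accepts exactly the q issued certificates, and Link is perfect; these idealizations, the prefix length and all names are assumptions for illustration. Enumerating every way the adversary can assign her q certificates to q + 1 messages shows, by the pigeonhole principle, that she never wins against such an ideal scheme.

```python
from itertools import combinations, product

PREFIX_LEN = 4   # hypothetical prefix length in bytes
Q = 2            # number of corrupted users, i.e. certificates held by A

def verify(message, cert_index):
    # Idealized Verify: valid iff one of the Q issued certificates was used.
    return cert_index in range(Q)

def link(m1, c1, m2, c2):
    # Idealized Link: same prefix and same certificate.
    return m1[:PREFIX_LEN] == m2[:PREFIX_LEN] and c1 == c2

def adversary_wins(pairs):
    all_valid = all(verify(m, c) for m, c in pairs)
    none_linked = not any(link(m1, c1, m2, c2)
                          for (m1, c1), (m2, c2) in combinations(pairs, 2))
    return all_valid and none_linked

# Q + 1 messages sharing the common prefix b"task".
messages = [b"task" + bytes([i]) for i in range(Q + 1)]
outcomes = [adversary_wins(list(zip(messages, certs)))
            for certs in product(range(Q), repeat=Q + 1)]
```

A real construction can only approximate this ideal behavior, which is why Definition 1 asks for a negligible rather than zero winning probability.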

Next is the anonymity guarantee in normal cases. We would like to ensure anonymity against any party, including the public, the registration authority, and the verifier who can ask for multiple (and potentially correlated) authentication queries. Also, our strong anonymity requires that no one can even tell whether a user is authenticating different messages, if these messages have different prefixes. The basic requirement for anonymity is that no one can recognize the real identity from the authentication transcript. But our unlinkability requirement is strictly stronger, as if one can recognize identity, obviously, she can link two authentications by first recovering the actual identities.

To capture the unlinkability (among the authentications of different-prefix messages), we can imagine the most stringent setting where, even if there are only two honest users in the system, the adversary still cannot properly link any of them from a sequence of authentications. Formally, consider the following game between the challenger C and adversary A.

1) Setup. The adversary A generates the master key pair.

2) CertGen. The adversary A runs the certificate generation procedure as a registration authority with the challenger. The challenger submits two public keys pk0, pk1 and the adversary generates the corresponding certificates cert0, cert1 for them. A can always generate certificates for public keys generated by herself.

3) Auth-queries. The adversary A asks the challenger to serially use (sk0, pk0, cert0) and (sk1, pk1, cert1) to do a sequence of authentications on messages chosen by her. Also, the number q of authentication queries is chosen by A. The adversary obtains 2q message-attestation pairs.

4) Challenge. The adversary A chooses a new message m∗ which does not have a common prefix with any of the messages asked in the Auth-queries, and asks the challenger to do one more authentication. C picks a random bit b and authenticates m∗ using skb, pkb, certb. After receiving the attestation πb, A outputs her guess b′.

The adversary wins if b′ = b.

Definition 2 (Anonymity). An authentication scheme is unlinkable if, for all probabilistic polynomial-time algorithms A, $\left|\Pr[A \text{ wins in the above game}] - \frac{1}{2}\right|$ is negligible.⁴

⁴ We remark that our definition of anonymity is strictly stronger than that of the event-oriented linkable group signature [42], in which the RA can revoke user anonymity under certain conditions.


Construction. Now we proceed to construct such a primitive. As in many anonymous authentication constructions, we will use the zero-knowledge proof technique to obtain anonymity. In particular, we will leverage zk-SNARK to give an efficient construction. For the above concept of common-prefix-linkable anonymous authentication, we need to further support the special accountability requirement. The idea is as follows: since the condition that triggers the linkability is a common prefix, the authentication treats the prefix specially. In particular, the authentication shows a tag committing to the prefix together with the user's secret key, and then presents a zero-knowledge proof that such a tag is properly formed, i.e., computed by hashing the prefix and a secret key. To ensure the other basic security notions, we also compute a second tag that commits to the whole message. The user further proves in zero-knowledge that the secret key corresponds to a certified public key.

Note that our main goal is to decentralize crowdsourcing; such a new anonymous authentication primitive could be further studied systematically in future works. Concretely, we present the detailed construction as follows:

- Setup(1^λ). This algorithm establishes the public parameters PP that will be needed for the zk-SNARK system. Also, the algorithm generates a key pair (msk, mpk) for a digital signature scheme.⁵

- CertGen(msk, pki): This algorithm runs a signing algorithm on pki,⁶ and obtains a signature σi. It outputs certi := σi.

- Auth(p||m, ski, pki, certi, PP): On input a message p||m having a prefix p, the algorithm first computes two tags (interchangeably called headers later), $t_1 = H(p, sk_i)$ and $t_2 = H(p\|m, sk_i)$, where H is a secure hash function. Then, letting $\vec{w} = (sk_i, pk_i, cert_i)$ represent the private witness and $\vec{x} = (p\|m, mpk)$ be all common knowledge, the algorithm runs the zk-SNARK proving algorithm Prover($\vec{x}, \vec{w}, PP$) for the following language: $L_T := \{ t_1, t_2, \vec{x} = (p\|m, mpk) \mid \exists \vec{w} = (sk_i, pk_i, cert_i) \text{ s.t. } \mathsf{CertVrfy}(cert_i, pk_i, mpk) = 1 \wedge \mathsf{pair}(pk_i, sk_i) = 1 \wedge t_1 = H(p, sk_i) \wedge t_2 = H(p\|m, sk_i) \}$, where the CertVrfy algorithm checks the validity of the certificate using a signature verification, and the pair algorithm verifies whether two keys are a consistent public-secret key pair. This proving algorithm yields a proof η for the statement $\vec{x} \in L_T$ (also for the proof-of-knowledge of $\vec{w}$). Finally, the algorithm outputs $\pi := (t_1, t_2, \eta)$.

- Verify(p||m, π, mpk, PP): This algorithm runs the zk-SNARK verifying algorithm Verifier on $\vec{x}$, π and PP, and outputs the decision bit d ∈ {0, 1}.

⁵ To be more precise, the public parameter generation could be from another algorithm. For simplicity, we put it here. In the security game for anonymity, the adversary only generates msk, mpk, not the public parameters.

⁶ We assume an external identification procedure that can check the actual identity bound to each public key, and the user key pairs are generated using common algorithms, e.g., for a digital signature. We ignore the details here.

- Link(m1, π1, m2, π2): On input two attestations $\pi_1 := (t_1^1, t_2^1, \eta_1)$ and $\pi_2 := (t_1^2, t_2^2, \eta_2)$, the algorithm simply checks $t_1^1 \stackrel{?}{=} t_1^2$. If yes, output 1; otherwise, output 0. We also use Link(π1, π2) for short.
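The tag computation in Auth and the comparison in Link can be sketched in a few lines. This illustration instantiates H with SHA-256 over length-prefixed inputs and omits the zk-SNARK proof η, the certificate and Verify entirely, so it demonstrates only the linkability behavior of the tags; the function names, the encoding and the example byte strings are assumptions made for the example.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    # Length-prefix each part so distinct input tuples have distinct encodings.
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "big") + part)
    return h.digest()

def auth_tags(prefix: bytes, body: bytes, sk: bytes):
    t1 = H(prefix, sk)            # depends only on the prefix and the secret key
    t2 = H(prefix + body, sk)     # commits to the whole message p||m
    return t1, t2                 # the proof eta is omitted in this sketch

def link(pi1, pi2) -> int:
    # Two attestations are linked iff their first tags coincide.
    return int(pi1[0] == pi2[0])

sk_worker = b"worker secret key"                       # hypothetical key material
task_a, task_b = b"addr-of-task-A", b"addr-of-task-B"  # hypothetical prefixes alpha_C

pi_1 = auth_tags(task_a, b"ciphertext-1", sk_worker)
pi_2 = auth_tags(task_a, b"ciphertext-2", sk_worker)   # same task, same worker
pi_3 = auth_tags(task_b, b"ciphertext-1", sk_worker)   # different task, same worker
```

Here `link(pi_1, pi_2)` returns 1 because both submissions target the same task, whereas `link(pi_1, pi_3)` returns 0, matching the intended behavior that submissions are linkable within a task but not across tasks.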

Security analysis (sketch). Here we briefly sketch the security analysis for the construction. As the scheme is of independent interest, we defer detailed analyses/reductions to an extended paper where the underlying theory will be formally studied.

Regarding correctness, it is trivial, because of the completeness of the underlying SNARK.

Regarding unforgeability, we require that an uncertified attacker cannot authenticate. The only transcripts the adversary can see are the headers and the zero-knowledge attestation. The headers include one generated by hashing the concatenation of p||m and sk. In order to provide a header, the attacker has to know the corresponding sk, as it can be extracted from the random oracle queries. Thus there are only two different ways for the attacker: (i) the attacker forges the certificate, which clearly violates the signature security; (ii) the attacker forges the attestation using an invalid certificate, which clearly violates the proof-of-knowledge of the zk-SNARK.

Regarding the common-prefix-linkability, it is also fairly straightforward, as the final authentication transcript contains a header computed as H(p, sk), which is invariant for a common prefix p under the same secret key sk.

Regarding the anonymity/unlinkability, we require that after seeing a number of authentication transcripts from one user, the attacker cannot figure out whether a new authentication comes from the same user. This holds even if the attacker is the registration authority that issues all the certificates. To see this, note that the attacker is not able to figure out the value of sk from all the public values, thus the headers/tags can be considered random values. It follows that H(p, sk) and a random value r cannot be distinguished (similarly for H(p||m, sk)). More importantly, due to the zero-knowledge property of zk-SNARK, given r, a simulator can simulate a valid proof η∗ by controlling the common reference string of the zk-SNARK. That said, the public transcript t1, t2, η can be simulated by r1, r2, η∗, where r1, r2 are uniform values and η∗ is a simulated proof, all of which have nothing to do with the actual witness sk.

Summarizing the above intuitive analyses, we have the following theorem:

Theorem 1. Conditioned on the hash function being modeled as a random oracle and the zk-SNARK being zero-knowledge, the construction of the common-prefix-linkable anonymous authentication satisfies anonymity. Conditioned on the underlying digital signature scheme being secure and the zk-SNARK satisfying proof-of-knowledge, our construction of the common-prefix-linkable anonymous authentication is unforgeable. It is also correct and common-prefix-linkable.

5.2 The protocol of ZebraLancer

Now we are ready to present a general protocol for a class of crowdsourcing tasks having proper quality-aware incentive mechanisms defined as in Section 2. As zk-SNARK


requires a setup phase, we consider that a setup algorithm generated the public parameters PP for this purpose, and published them as common knowledge.⁷ Our description focuses on the application atop the open blockchain, and therefore omits details of sending messages through the underlying blockchain infrastructure. For example, "one uses blockchain address α to send a message m to the blockchain" means that he broadcasts a blockchain transaction containing the message m, the public key associated with α, and the signature properly generated under the corresponding secret key.

Fig. 3. The schematic diagram of the protocol of ZebraLancer for the quality-aware incentive mechanisms as proof-of-concept. Encrypted answers and the instruction of rewarding point to the task published as a smart contract.

Remark that here we let each worker/requester generate a different blockchain address for each task (i.e. a one-task-only address) as a simple solution to avoid de-anonymization in the underlying blockchain.⁸ For concrete instantiations of the underlying infrastructures, see the implementations in Section 6. We further remark that the protocol can be extended to private and anonymous auction-based incentives trivially (see Appendix B for a concrete example), although it mainly focuses on quality-aware ones.

Protocol details. As shown in Fig.3, the details of Ze-braLancer protocol can be described as follows:

7This in practice can be done via a secure multiparty computationprotocol [52] to eliminate potential backdoors.

8Our anonymous protocol mainly focuses on the application layersuch as the crowdsourcing functionality that is built on top of theblockchain infrastructure. If the underlying blockchain layer supportsanonymous transaction, such as Zcash [47], the worker and the re-quester can re-use account addresses. We further remark that theanonymity in network layer are out the scope of this paper, we maydeploy our protocol on existing infrastructure such as Tor.

TABLE 3
Key notations related to the protocol of ZebraLancer

| Notation | Represents |
|---|---|
| C | The smart contract programmed for the crowdsourcing task to be published |
| αC | The blockchain address of C, which is also used as the prefix for authentication |
| αR | The one-time blockchain address used by the requester R to anonymously interact with C |
| αi | The one-time blockchain address used by the worker Wi to anonymously interact with C |
| (esk, epk) | The one-time public-secret key pair used in C for encrypting/decrypting crowd-shared answers |
| Ci | The encrypted answer submitted to C by the worker Wi, i.e., the ciphertext of answer Ai |
| R | The instruction sent from the requester to instruct the smart contract to reward each crowd-shared answer |
| πreward | The zk-SNARK proof generated to attest the correctness of the instruction of paying rewards |
| πi | The attestation sent from a user i to anonymously authenticate a message having a prefix αC |
| PP | Public parameters for zero-knowledge proofs, including the ones for common-prefix-linkable anonymous authentication and the ones for the incentive mechanism |

• Register. Everyone registers at the RA to get a certificate bound to his/her unique ID; this is done off-line, only once per participant.

A requester, having a unique ID denoted by R, creates a public-secret key pair (pkR, skR), and registers at the registration authority (RA) to obtain a certificate certR binding pkR to R. Each worker, having a unique ID denoted by Wi, also generates his public-secret key pair (pki, ski), and registers his public key at the RA to obtain a certificate certi binding pki to Wi.

• TaskPublish. A requester anonymously authenticates and publishes a task contract with a promised reward policy.

When the requester R has a crowdsourcing task, she generates a fresh blockchain account address αR and a key pair (epk, esk) (which workers will use to encrypt submissions) for this task only. R then prepares parameters Param, including the encryption key epk, the number of answers to collect (denoted by n), the deadline, the budget τ, the reward policy R, SNARK’s public parameters PP, the RA’s public key mpk, and also πR = Auth(αC||αR, skR, pkR, certR, PP).9

9We remark that the requester should authenticate her blockchain address αR along with the task contract, and workers will join the task only if the task contract is indeed sent from a blockchain address identical to the authenticated αR. So a malicious requester cannot “authenticate” a task by copying other valid authentications. In addition, each worker has to authenticate his blockchain address αi along with his answer submission as well. The task contract checks that the submission is indeed sent from a blockchain address identical to the authenticated αi. Otherwise, a malicious worker could free-ride by copying and re-sending authenticated submissions that have been broadcast but not yet confirmed in a block.


The requester then codes a smart contract C that contains all the above information for her task. After compiling C, she puts C’s code and a transfer of the budget into a blockchain transaction, and uses the one-task-only address αR to send the transaction into the blockchain network. When a block containing C is appended to the blockchain, C gets an immutable blockchain address αC to hold the budget and interact with anyone.10

See Algorithm 1 below for a concrete example of a task contract. (The important component of verifying zk-proofs is done by calling a library, libsnark.Verifier, integrated into the blockchain infrastructure; implementation details are explained in Section 6.)

• AnswerCollection. The contract collects anonymously authenticated encrypted answers from workers who have not submitted before.

If a registered worker Wi is interested in contributing, he first validates the contract content (e.g., checking the parameters), then generates a one-time blockchain address αi. He signs his answer Ai using the private key associated with the blockchain address αi to obtain a signature σi, then encrypts the signed answer Ai||σi under the task’s public key epk to obtain the ciphertext Ci. Note that σi can be verified with the blockchain address αi, e.g., it is generated by the same signing algorithm used to sign transactions.

He then uses the common-prefix-linkable anonymous authentication scheme to generate an attestation πi = Auth(αC||αi||Ci, ski, pki, certi, PP).9 Then he uses his one-time address αi to send Ci, πi to the blockchain network (with a pointer to αC, i.e., the unique address of the contract C).

Then, C runs Verify(αC||αi||Ci, πi, mpk, PP), and also executes Link(πi, π∗) for each valid authentication attestation π∗ that was received before (including the requester’s, namely πR). In this way, C can ensure that Ci is the first submission of a registered worker. C simply drops unauthenticated or double submissions.11

The contract C keeps collecting answers until it receives n answers or the deadline (in units of blocks) passes. It also records each address αi that sends Ci. Note that the Link algorithm will be executed O(n²) times, but it is a simple equality check over a pair of hashes, so the cost of running it repeatedly is negligible in practice.

Along the way, the requester keeps listening to the blockchain and decrypts each submission Ci that is sent by αi and accepted by the contract. In this way, the requester obtains the signed answer Ai||σi bound to Ci, and verifies the signature σi using the blockchain address αi. If Ai is not properly signed, the requester generates a zero-knowledge proof πfake to convince the contract that the plaintext bound to Ci is not a correctly signed message. In particular, the proof is generated by the zk-SNARK’s Prover algorithm to attest the statement that (Ci, epk, αi) ∈ {(Ci, epk, αi) | ∃ esk s.t. verifying(αi, Ai, σi) ≠ 1 ∧ Ai||σi = Dec(esk, Ci) ∧ pair(esk, epk) = 1}, where verifying is a function that outputs 1 iff its inputs correspond to a properly signed message. Once the contract receives and validates such a proof πfake for a given submission from αi, it removes Ci and αi. Note that the above proof and verification are necessary to thwart network adversaries who can delay others’ submissions and then copy and paste others’ ciphertexts to reap rewards.

10We emphasize that αC is unique for each contract. In practice, αC can be computed via H(αR||counter), where H is a secure hash function and counter is governed by the blockchain and increased by exactly one for each contract created by the blockchain address αR. It is also clear that the requester R can predict αC before C is on-chain, so she can compute πR off-line and make it a parameter of the contract C.

11We remark that our protocol can be extended trivially to allow each worker to submit k answers in one task by modifying the checking condition programmed in the smart contract of the crowdsourcing task.

Algorithm 1: Example using quality-aware incentive

Require: This contract’s address αC; requester’s one-time blockchain address αR; requester’s authenticating attestation πR; RA’s public key mpk; budget τ; public key epk for encrypting answers; SNARK’s public parameters PP; number of requested answers n; deadline of answering in units of blocks TA; deadline of instructing reward in units of blocks TI.

     1  List keeping answers’ ciphertexts, C ← ∅;
     2  Map of anonymous attestations and authenticated one-time blockchain addresses of workers, W ← ∅;
     3  if getBalance(αC) < τ ∨ ¬Verify(αC||αR, πR, mpk, PP) then
     4      goto 24;    ⊲ Unidentified requester or no deposit.
     5  timerA ← a timer expires after TA;    ⊲ Collect answers
     6  while ||C|| < n ∧ timerA NOT expired do
     7      if αi sends πi, Ci then
     8          if ¬Link(πi, πR) ∧ ∀π∗ ∈ W.keys() ¬Link(πi, π∗) ∧ Verify(αC||αi||Ci, πi, mpk, PP) then
     9              W.add(πi → αi); C.add(Ci);
    10      if αR sends i, πfake then
    11          if libsnark.Verifier((Ci, σi), πfake, PP) then
    12              W.remove(πi → αi); C.remove(Ci);
    13  timerI ← a timer expires after TI;    ⊲ Wait instruction
    14  while timerI NOT expired do
    15      if αR sends R := (R1, ..., Rn) and πreward then
    16          P ← (epk, τ, C1, ..., Cn);
    17          if libsnark.Verifier((P, R), πreward, PP) then
    18              for each (πi → αi) ∈ W do
    19                  transfer(αC, αi, Ri);
    20              goto 24;
    21  R ← τ/||W||;    ⊲ Reward all if no correct instruction
    22  for each (πi → αi) ∈ W do
    23      transfer(αC, αi, R);
    24  transfer(αC, αR, getBalance(αC));    ⊲ Refund the rest
    25  function getBalance(addr)
    26      return the balance of addr in the ledger;
    27  function transfer(src, dst, v)
    28      if getBalance(src) < v then
    29          return false;
    30      src’s balance − v; dst’s balance + v; return true;
    31  ⊲ The correctness and availability of this task contract are governed by the blockchain network; the contract program is driven by a “discrete” clock that increments with the validation of each newly proposed block
    32  ⊲ libsnark.Verifier is a library embedded in the runtime environment of smart contracts, such as the EVM

• Reward. The requester computes and proves how to reward properly authenticated anonymous answers.

The requester R keeps listening to the blockchain, and once C collects n submissions, she retrieves and decrypts all of them to obtain the corresponding answers A1, ..., An (if there are not enough submissions when the deadline passes, the requester simply sets the remaining answers to ⊥, a case that the incentive mechanism R takes into account).

Next, the requester computes the reward for each answer, Ri = R(Ai; A1, ..., An, τ), as specified by the policy codified in C. More importantly, she generates a zero-knowledge proof πreward, with the secret key esk as witness, to attest the validity of the instruction. In particular, the proof is for the following NP-language: L = {R, P | ∃ esk s.t. ⋀_{j=1}^{n} (Aj||σj = Dec(esk, Cj)) ∧ ⋀_{j=1}^{n} (Rj = R(Aj; A1, ..., An, τ)) ∧ pair(esk, epk) = 1}, where P denotes Param together with the ciphertexts C1, ..., Cn, while R := (R1, ..., Rn) is the instruction about how to reward each answer. After computing R and πreward, R puts them into a blockchain transaction, and again uses her one-task-only blockchain address αR to send the transaction to C (by using a pointer to αC). This completes the outsource-then-prove methodology.

Once a newly proposed block contains the reward instruction R and its attestation πreward, the contract C first checks that they are indeed sent from αR (by verifying the digital signature of the underlying blockchain transaction). Then it uses the SNARK’s Verifier algorithm to verify the proof πreward regarding the correctness of R. If the verification passes, it transfers each amount Ri to the corresponding account αi, and refunds the remaining balance to αR. Otherwise, it pauses. If it receives no valid instruction within a predefined time (in units of blocks), the contract simply transfers τ/n to each αi as part of the policy R.
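The control flow of the task contract described above (and in Algorithm 1) can be illustrated with a small off-chain model. The following is only a sketch under stated assumptions: the class `TaskContract`, its method names, and the three callables standing in for Verify, Link and the SNARK verifier are hypothetical placeholders rather than the paper's implementation, amounts are integers, the equal-split fallback uses integer division, and the budget check on the instruction is an extra guard added for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TaskContract:
    """Toy off-chain model of the task contract (hypothetical names)."""
    budget: int                      # tau, assumed already deposited
    n: int                           # number of answers to collect
    requester_attestation: str       # pi_R
    verify_auth: Callable[[str, str, str], bool]   # stand-in for Verify
    link: Callable[[str, str], bool]               # stand-in for Link
    verify_reward: Callable[[List[str], List[int], str], bool]  # stand-in for the SNARK verifier
    workers: Dict[str, str] = field(default_factory=dict)  # attestation -> address
    ciphertexts: List[str] = field(default_factory=list)

    def submit(self, addr: str, ciphertext: str, attestation: str) -> bool:
        """Accept a submission only if it is authenticated and unlinked."""
        if len(self.ciphertexts) >= self.n:
            return False
        earlier = [self.requester_attestation, *self.workers]
        if any(self.link(attestation, p) for p in earlier):
            return False  # double submission, or the requester herself
        if not self.verify_auth(addr, ciphertext, attestation):
            return False
        self.workers[attestation] = addr
        self.ciphertexts.append(ciphertext)
        return True

    def settle(self, rewards: Optional[List[int]] = None,
               proof: Optional[str] = None) -> Tuple[Dict[str, int], int]:
        """Return (payouts per address, refund to the requester)."""
        addrs = list(self.workers.values())
        if not addrs:
            return {}, self.budget
        valid = (rewards is not None and proof is not None
                 and len(rewards) == len(addrs)
                 and sum(rewards) <= self.budget  # extra guard, an assumption
                 and self.verify_reward(self.ciphertexts, rewards, proof))
        if not valid:
            # No correct instruction: split the budget equally.
            rewards = [self.budget // len(addrs)] * len(addrs)
        return dict(zip(addrs, rewards)), self.budget - sum(rewards)


def toy_link(p1: str, p2: str) -> bool:
    # Toy attestations look like "tag:nonce"; equal tags are linked.
    return p1.split(":")[0] == p2.split(":")[0]


contract = TaskContract(
    budget=90, n=3, requester_attestation="tagR:0",
    verify_auth=lambda addr, c, att: True,
    link=toy_link,
    verify_reward=lambda cs, rs, proof: proof == "ok",
)
accepted = [
    contract.submit("a1", "c1", "tag1:0"),
    contract.submit("a1x", "c1x", "tag1:1"),  # same worker again: dropped
    contract.submit("aR", "cR", "tagR:1"),    # requester posing as worker: dropped
    contract.submit("a2", "c2", "tag2:0"),
    contract.submit("a3", "c3", "tag3:0"),
]
```

In this model, calling `contract.settle([45, 45, 0], "ok")` pays according to the instruction, while `contract.settle()` or an instruction with a rejected proof falls back to an equal split, mirroring the last step of the Reward phase.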

5.3 Analysis of the protocol

Correctness and efficiency. It is clear that, if all parties follow the protocol, the requester will obtain the data and the workers will receive the right amount of payment, under the conditions that (i) the blockchain can be modeled as an ideal public ledger, (ii) the underlying zk-SNARK satisfies completeness, (iii) the public key encryption is correct, and (iv) the common-prefix-linkable anonymous authentication satisfies correctness. Regarding efficiency, we note that the on-chain computation and storage (two of the major obstacles to applying blockchain in general) are actually very light, as the contract essentially only carries out a verification step. Thanks to zk-SNARK, the verification can be executed efficiently by checking only a few pairing equalities; moreover, the special library can be optimized in various dedicated ways [53].

Security analysis (sketch). We briefly discuss security here and defer the details to an extended version. The underlying primitives, including our common-prefix-linkable anonymous authentication scheme, are well abstracted, which enables us to argue security in a modular way.

Regarding the data confidentiality of answers, all related public transcripts are simply the ciphertexts C1, ..., Cn and the zk-SNARK proof π. The ciphertexts are easily simulatable owing to the semantic security of the public key encryption, and the proof π can also be simulated without seeing the secret witness because of the zero-knowledge property.

Regarding anonymity, an adversary has two ways to break it: (i) link a worker/requester through his blockchain addresses; (ii) link answers/tasks of a worker/requester through his authenticating attestations. The first case is trivial, simply because every worker/requester interacts with each task through a randomly generated one-task-only blockchain address (and the corresponding public key). The second case is more involved, but the anonymity of workers and requesters follows from the anonymity of the common-prefix-linkable anonymous authentication scheme.

Regarding security against a malicious requester, a malicious requester has three chances to gain an advantage: (i) deny the policy announced in the TaskPublish phase; (ii) cheat in the Reward phase; (iii) submit answers to intentionally downgrade others in the AnswerCollection phase. The first threat is prevented because the smart contract is public, and the requester cannot deny it once it is posted in the immutable blockchain. The second threat is ruled out by the soundness of the underlying zk-SNARK, since any incorrect instruction that passes the verification in the smart contract directly violates proof-of-knowledge (i.e., strong soundness). The last threat is simply handled by the unforgeability and common-prefix-linkability of our common-prefix-linkable anonymous authentication scheme.

Security against malicious workers is straightforward. The only ways that malicious workers can cheat are: (i) submitting more than one answer in the AnswerCollection phase; (ii) copying and pasting others’ answers to earn rewards; (iii) sending the contract a fake instruction in the name of the requester in the Reward phase; (iv) altering the policy specified in the contract. The first threat is simply handled by the common-prefix-linkability and unforgeability of the common-prefix-linkable anonymous authentication. The second threat can be attempted by predicting the plaintexts of answers or by copying the ciphertexts, but both are prevented: predicting plaintexts is prevented by the semantic security of the public key encryption; copying ciphertexts is prevented by the security of the digital signature and the zk-SNARK. The third threat is simply handled by the security of digital signatures. The last issue is trivial, because blockchain security ensures that the announced policy is immutable.

Theorem 2. The data confidentiality of our protocol holds if the underlying public key encryption is semantically secure and the zk-SNARK in use is zero-knowledge.

The anonymity of our protocol holds for both workers and requesters if the underlying common-prefix-linkable anonymous authentication satisfies the anonymity defined in Definition 2 and the zk-SNARK is zero-knowledge.


Provided that the blockchain infrastructure we rely on can be modeled as an ideal public ledger, the underlying common-prefix-linkable anonymous authentication satisfies unforgeability and common-prefix-linkability, the zk-SNARK satisfies proof-of-knowledge, and the digital signature in use is secure, our protocol satisfies security against a malicious requester and security against malicious workers.

6 ZEBRALANCER: IMPLEMENTATION AND EXPERIMENTAL EVALUATION

We implement the ZebraLancer protocol atop Ethereum and use it to instantiate a series of typical image annotation tasks [11]. Furthermore, we run these tasks in an Ethereum test net to evaluate applicability.

Fig. 4. The system-level view of ZebraLancer. Our DApp layer can be built on top of an existing blockchain, e.g., the Ethereum Byzantium release [48].

System in a nutshell. As shown in Fig. 4, the decentralized application (DApp) of our system is composed of an on-chain part and an off-chain part. The on-chain part consists of crowdsourcing task contracts and an interface contract of the registration authority (RA). The RA’s contract simply posts the system’s master public key as common knowledge stored in the blockchain. The off-chain part consists of requester clients and worker clients. These clients can be blockchain clients wrapped with the functionalities required by our system. Specifically, a requester client should codify a specific task with a given incentive mechanism and announce it as a smart contract. Note that we, as the designers of the DApp, can provide contract templates to requesters for easier instantiation of incentive mechanisms, cf. [54]. The clients further need an integrated zk-SNARK prover to produce the anonymous authenticating attestations; moreover, a requester client should also use the SNARK prover to generate proofs attesting the correct execution of incentive policies.

We also compare ZebraLancer with some existing crowdsourcing systems in Table 4. Our system provides stronger security guarantees than the others, as it considers the strictest notions of fairness of exchange, user anonymity and data confidentiality under the condition of minimum trust. For example, our design realizes fair exchange without leaking data to a third-party information arbiter. ZebraLancer also guarantees the strongest user anonymity, which cannot be broken by any third party (even the registration authority), while the anonymity of other systems can be broken by a third-party authority (or a few colluding third parties). Note that identities established via a registration service are required by all systems in Table 4.

TABLE 4
Comparison between our system ZebraLancer and some other crowdsourcing platforms

| | Ours | MTurk [2] | Dynamo [27] | SPPEAR [28] | CrowdBC [23] |
|---|---|---|---|---|---|
| Preventing false-reporting | √ | × | × | © | √ |
| Preventing free-riding | √ | © | © | © | √ |
| Data confidentiality | √ | × | × | × | × |
| User anonymity | √ | × | © | © | × |

Note: √ denotes a functionality realized without relying on any central trust except the established identities; © denotes a (partially) realized function relying on a central trust (other than the registration authority); × denotes an unrealized feature. Note that data confidentiality is marked as × if any third party other than the requesters can access the submitted data.

Implementation challenges. The main challenge of deploying smart contracts in general is that they can only support very light on-chain operations for both computing and storing.12 Our protocol takes this into consideration. In particular, our on-chain computation consists only of SNARK verifications, while the heavy computation of SNARK proofs is all done off the blockchain. Even so, building an efficient privacy-preserving DApp compatible with an existing blockchain platform such as Ethereum is not straightforward. For instance, in order to allow smart contracts to call a zk-SNARK verification library, a contract containing this library would have to be deployed in a block, but this library is a general-purpose tool that can be too complex to be executed in the smart contract runtime environment, e.g., the Ethereum Virtual Machine (EVM). Instead, we modify the runtime environment of smart contracts, so that an optimized zk-SNARK verification library [53] is embedded in it as a primitive operation. Our modified Ethereum client is written in Java 1.8 with the Spring framework, and is available at github.com/maxilbert/ethereumj.

We remark that the Ethereum project recently integrated some new cryptographic primitives into the EVM to enable SNARK verification as well [48], which ensures that our DApp can essentially inherit all Ethereum users to maintain the blockchain infrastructure and govern the faithful execution of the smart contracts in our DApp.

Establishments of zk-SNARKs (off-line). As the feasibility of ZebraLancer highly depends on the small size of SNARK proofs and the efficiency of SNARK verifications, it becomes critical to establish the necessary zk-SNARKs off-line. As formally discussed before, the authentication scheme and nearly all incentive mechanisms can be stated as well-defined deterministic constraint relationships. We first translate these mathematical statements into their corresponding boolean circuit satisfiability representations. We then establish a zk-SNARK for each boolean circuit, so that all required public parameters are generated. All the above steps are done off-line, as they are executed only once, when the system is launched. Note that potential backdoors in these zk-SNARK public parameters could be further eliminated via an off-line protocol based on secure multi-party computation [52]. However, such an off-line setup is beyond the scope of demonstrating the feasibility of our system.

12We remark that the communication overhead is not a serious worry, because: (i) a blockchain network such as Ethereum does not require fully meshed connections, i.e., requesters and workers need only connect to a constant number of Ethereum peers; (ii) if necessary, requesters and workers can even run on top of so-called light-weight nodes, which allow them to receive and send only messages related to crowdsourcing tasks; (iii) even if there were a trusted arbiter facilitating incentive mechanisms, the only saving in communication would be an instruction about how to reward answers (and its attestation).

An image annotation crowdsourcing task. To showcase the usability of our system, we implement a concrete crowdsourcing task of image annotation similar to the ones in [11]. The task is to solicit labels for an image, which can later be used to train a machine learning model. The task requests n answers from n workers, and can be considered a multiple-choice problem. Majority voting is used to estimate the “truth”. An answer is seen as “correct” if it equals the “truth”. The reward of a worker is τ/n if he answers correctly; otherwise, he receives nothing. In our terminology, the reward Ri := R(Ai; A1, ..., An, τ) = τ/n if Ai equals the majority; otherwise, Ri = 0. Following [11], we implement and deploy 5 contracts in the test net to collect 3 answers, 5 answers, 7 answers, 9 answers and 11 answers from anonymous-yet-accountable workers, respectively.
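The majority-voting policy can be written down directly. The snippet below is a minimal sketch rather than the deployed contract logic: the function name `majority_rewards` is hypothetical, missing answers (⊥) are modeled as `None`, and ties (which the source does not discuss) are broken in favor of the answer seen first.

```python
from collections import Counter
from fractions import Fraction
from typing import Hashable, List, Optional


def majority_rewards(answers: List[Optional[Hashable]], budget: int) -> List[Fraction]:
    """Reward tau/n to each answer equal to the majority, 0 otherwise.

    `None` models a missing answer and never earns a reward.
    Tie-breaking by first occurrence is an assumption of this sketch.
    """
    n = len(answers)
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return [Fraction(0)] * n
    truth, _ = counts.most_common(1)[0]   # estimated "truth"
    share = Fraction(budget, n)           # tau / n
    return [share if a == truth else Fraction(0) for a in answers]


rewards = majority_rewards(["cat", "dog", "cat"], budget=90)
```

With three answers and a budget of 90, the two workers who answered "cat" each receive 30 and the third receives nothing, so the unspent 30 stays with the contract and is refunded to the requester.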

The smart contracts are written in Solidity, a high-level language that compiles to Ethereum smart contracts. We also modify the Solidity compiler, so that a programmer can write a contract involving zk-SNARK verifications at a high level. We instantiate the encryption as RSA-OAEP-2048, the DApp-layer hash function as SHA-256, and the DApp-layer digital signature as RSA signatures. Moreover, for zk-SNARK, we choose the construction of libsnark from [53]. We deploy a test network consisting of four PCs: three PCs are equipped with an Intel Xeon E3-1220V2 CPU (PC-As), and the other one is equipped with an Intel i7-4790 CPU (PC-B); all PCs have 16 GB of main memory and run Ubuntu 14.04 LTS. In the test net, a PC-A and the PC-B play the role of miners, and the other two PC-As only validate blocks (i.e., full nodes that do not mine). One full node plays the role of the requester and anonymously publishes crowdsourcing tasks to the blockchain; the other full node mimics workers and sends each anonymously authenticated answer from a different blockchain address. Miners are only responsible for maintaining the test net and are not involved in the tasks.

Performance evaluation. As the main bottleneck is the on-chain computation of the smart contract, we first measure the time cost and the space cost for miners of executing the zk-SNARK verifications used in the above annotation tasks. These zk-SNARKs are established for common-prefix-linkable anonymous authentication and for the incentive mechanisms, respectively. The time costs are listed in Table 5. It is clear that the zk-SNARK verifications in our system can be executed efficiently with respect to verification time. Moreover, our experimental results also reveal that the space cost of zk-SNARK verifications is constant and tiny on both types of PCs (exactly 17 MB of main memory). Also, the required on-chain storage for the task contracts is on the acceptable order of kilobytes. Therefore, the on-chain performance of the system is clearly practical, considering both time and space.

TABLE 5
Execution time of in-contract zk-SNARK verifications.

| Verification for | Proof length | Key length | Inputs length | Time @ PC-A | Time @ PC-B |
|---|---|---|---|---|---|
| Our authentication | 729 B | 1.2 KB | 1.5 KB | 10.9 ms | 6.2 ms |
| Majority (3-Worker) | 729 B | 16.0 KB | 3.4 KB | 15.5 ms | 9.1 ms |
| Majority (5-Worker) | 730 B | 21.6 KB | 4.7 KB | 16.3 ms | 9.8 ms |
| Majority (7-Worker) | 731 B | 27.3 KB | 6.0 KB | 17.0 ms | 10.3 ms |
| Majority (9-Worker) | 729 B | 32.9 KB | 7.3 KB | 17.5 ms | 12.1 ms |
| Majority (11-Worker) | 730 B | 38.6 KB | 8.6 KB | 17.9 ms | 13.1 ms |

Note: the lengths of proofs, keys and inputs are their sizes in bytes after being serialized in unicode.
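As a quick reading aid for Table 5, the following sketch computes how much verification time grows per additional worker, using only the endpoints of the reported measurements. The helper name `marginal_ms_per_worker` is hypothetical, and the endpoint slope is a rough summary rather than a fitted model.

```python
# Verification times from Table 5, in milliseconds:
# number of workers -> (time at PC-A, time at PC-B)
MAJORITY_TIMES_MS = {
    3: (15.5, 9.1),
    5: (16.3, 9.8),
    7: (17.0, 10.3),
    9: (17.5, 12.1),
    11: (17.9, 13.1),
}


def marginal_ms_per_worker(times: dict, column: int) -> float:
    """Average extra milliseconds per added worker between the smallest
    and largest task sizes (a simple endpoint slope)."""
    lo, hi = min(times), max(times)
    return (times[hi][column] - times[lo][column]) / (hi - lo)


pc_a_slope = round(marginal_ms_per_worker(MAJORITY_TIMES_MS, 0), 2)
pc_b_slope = round(marginal_ms_per_worker(MAJORITY_TIMES_MS, 1), 2)
```

Under this reading, going from 3 to 11 workers adds roughly 0.3 ms per worker on PC-A and 0.5 ms per worker on PC-B, which is consistent with the claim that on-chain verification stays cheap as the task grows.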

On the requester’s off-chain side, we reformulate the statement attesting correct payments to improve off-chain efficiency, as the inherently heavy NP-reductions involved in generating zk-SNARK proofs could be prohibitively expensive for the requester without careful optimization. Briefly speaking, we take L = {R, P | ∃ esk s.t. ⋀_{j=1}^{n} (Aj||σj = Dec(esk, Cj)) ∧ ⋀_{j=1}^{n} (Rj = R(Aj; A1, ..., An, τ)) ∧ pair(esk, epk) = 1}, and translate it into L′ = {R, P | ∃ esk s.t. ⋀_{j=1}^{n} (Cj = Enc(epk, Aj||σj)) ∧ ⋀_{j=1}^{n} (Rj = R(Aj; A1, ..., An, τ)) ∧ pair(esk, epk) = 1}, because epk, as the exponent of the modular exponentiation operations, is significantly smaller than esk in RSA; the latter language therefore corresponds to a more compact arithmetic circuit and yields more efficient proving. The proving cost at the requester’s end thus becomes acceptable; for example, to prove the majority of 11 workers’ binary voting, the requester spends 56 GB of memory plus swap and less than 2 hours, compared with hundreds of gigabytes and about one day without this optimization [55].
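To see why a small exponent helps, one can count the modular multiplications that plain square-and-multiply exponentiation performs. The sketch below rests on assumptions: the public exponent 65537 is a common RSA choice that the paper does not state, the "private" exponent is just a random 2048-bit stand-in rather than a real key, and OAEP padding and the actual circuit encoding are ignored.

```python
import random


def modmul_count(exponent: int) -> int:
    """Modular multiplications used by left-to-right square-and-multiply:
    one squaring per bit after the leading one, plus one multiplication
    for every further set bit."""
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    squarings = exponent.bit_length() - 1
    multiplications = bin(exponent).count("1") - 1
    return squarings + multiplications


# Assumed public exponent (a common choice, not taken from the paper).
PUBLIC_EXPONENT = 65537

# Stand-in for a 2048-bit private exponent: top and bottom bits forced to 1.
rng = random.Random(0)
private_like = (1 << 2047) | rng.getrandbits(2047) | 1

public_cost = modmul_count(PUBLIC_EXPONENT)
private_cost = modmul_count(private_like)
```

Here `public_cost` is 17, while `private_cost` is at least 2048, so a statement phrased in terms of Enc needs far fewer multiplication steps per ciphertext than one phrased in terms of Dec. This is the intuition behind moving from L to L′.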

Fig. 5. The time of generating common-prefix-linkable anonymous authentications on two PCs. The box plot is derived from 12 different experiments. (Vertical axis: execution time of generating attestations for anonymous authentications, in seconds, with ticks at 61, 62, 78 and 79; categories: PC-A (3.1 GHz E3-1220V2) and PC-B (3.6 GHz i7-4790).)

We also measure the off-chain cost of anonymity when one uses the common-prefix-linkable anonymous authentication. We measure the running time of generating the authenticating attestations on the PCs. As shown in Fig. 5, our experiments show that about 78 seconds are spent generating an anonymous attestation on PC-A (3.1 GHz CPU). On PC-B (3.6 GHz CPU), the running time is shortened to about 62 seconds. These times are not ideal, but are acceptable to anonymity-sensitive workers. We remark that our protocol can be trivially extended to support a non-anonymous mode, in case one gives up the anonymity privilege: s/he can generate a public-private key pair (for digital signatures), and then register the public key at the RA to receive a certificate bound to the public key; to authenticate, s/he can simply show the certified public key and the certificate, along with a message properly signed under the corresponding secret key, which costs nearly nothing computationally.

7 EXTENSION FOR AUCTION-BASED INCENTIVES

The pseudocode in Algorithm 2 showcases how to extend the ZebraLancer protocol to instantiate a private and anonymous crowdsourcing task that uses an auction-based incentive mechanism to pre-select workers. The main difference is the use of an auction to pre-select workers before they submit data (lines 6-12). As shown in the algorithm, the workers can sign their bids (using the private keys associated with their blockchain addresses), and then encrypt these signed bids under the requester’s public key. The ciphertexts can be sent to the task contract (lines 7-9). Then, the requester is allowed to convince the contract that the plaintext bound to a submission is not properly signed by the corresponding worker (lines 10-12). Moreover, the requester is then incentivized to use the outsource-then-prove method to prove the result of the auction (i.e., the selected workers) to the task contract (lines 13-18). After the workers are selected, they can submit answers to the contract (lines 20-25).

The main motivation for the above “private” bidding process is to prevent a malicious worker from learning the bidding strategies of honest workers by exploring all historic bids stored in the open blockchain, which is an implicit requirement of most auction-based incentive mechanisms (mainly for truthfulness). For example, when the requester uses an incentive mechanism (e.g., a variant of the reverse Vickrey auction [8]) to make workers give their truthful bids, a malicious worker who can estimate the distribution of all other bids well can find an optimized bid, which is usually untruthful, to break the requester’s auction mechanism. Note that the public parameter of the zk-SNARK for verifying the result of the auction (i.e., the pre-selected workers) is established off-line and published as common knowledge, so that it can be referred to in the contract as PP.

Note that private auctions atop blockchain are an interesting orthogonal problem [56] per se. Nevertheless, ZebraLancer is general and can be adapted to a very broad variety of incentive mechanisms, including auctions, although we did not specifically focus on private auctions.

Algorithm 2: Example of auction-based incentive

Require: This contract’s address α_C; requester’s one-time blockchain address α_R; requester’s authenticating attestation π_R; RA’s public key mpk; budget τ; public key epk for encrypting bids and answers; SNARK’s public parameters PP; deadline of bidding in unit of block T_B; deadline of instructing auction result in unit of block T_I; deadline of answering in unit of block T_A.

 1  List keeping the ciphertexts of bids, B ← ∅;
 2  Map of anonymous attestations and authenticated one-time blockchain addresses of workers, W ← ∅;
 3  if getBalance(α_C) < τ ∨ ¬Verify(α_C||α_R, π_R, mpk, PP) then
 4      goto 26;    ⊲ Unidentified requester or no deposit.
 5  timer_B ← a timer that expires after T_B;    ⊲ Collect secret bids.
 6  while timer_B NOT expired do
 7      if α_i sends π_i and his encrypted bid B_i then
 8          if ¬Link(π_i, π_R) ∧ ∀π∗ ∈ W.keys() ¬Link(π_i, π∗) ∧ Verify(α_C||α_i, π_i, mpk, PP) then
 9              W.add(W_i := (π_i → α_i)); B.add(B_i);
10      if α_R sends i, π_fake then
11          if libsnark.Verifier((B_i, σ_i), π_fake, PP) then
12              W.remove(π_i → α_i); B.remove(B_i);
13  timer_I ← a timer that expires after T_I;    ⊲ Wait for instruction.
14  while timer_I NOT expired do
15      if α_R sends selected workers S := (W_1′, . . . , W_n′) and π_auction then
16          P ← (epk, τ, B_1, . . . , B_{||B||}, W_1, . . . , W_{||W||});
17          if libsnark.Verifier((P, S), π_auction, PP) then
18              goto 20;
19  S ← W;    ⊲ All workers are “selected” if no instruction.
20  timer_A ← a timer that expires after T_A;    ⊲ Collect answers.
21  while ||S|| ≠ 0 ∧ timer_A NOT expired do
22      if α_i sends encrypted answer C_i then
23          if (∗ → α_i) ∈ S then
24              transfer(α_C, α_i, B_i);
25              S.remove((∗ → α_i));
26  transfer(α_C, α_R, getBalance(α_C));    ⊲ Refund the remaining balance.

It is also clear that our common-prefix-linkable anonymous authentication scheme can be straightforwardly used to make the auction-based extension anonymous-yet-accountable, as multiple bidding by a worker is prevented by its subtle common-prefix linkability. To further allow a worker to submit k bids in a single task, we can instantiate a counter in the smart contract to track the number of bids from each worker, which is possible because of the common-prefix-linkability property.
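The counter idea can be sketched in a few lines of Python. This is a hedged illustration rather than contract code from this work: it assumes that every attestation exposes a value, here called `link_tag`, that is identical for all attestations produced by the same worker within one task, so that Link reduces to comparing tags. The class `BidCounter` and its method `try_accept` are hypothetical names.

```python
from collections import Counter


class BidCounter:
    """Allow at most `max_bids` bids per linkable identity within one task."""

    def __init__(self, max_bids: int) -> None:
        if max_bids < 1:
            raise ValueError("max_bids must be at least 1")
        self.max_bids = max_bids
        self._counts: Counter = Counter()  # link_tag -> number of accepted bids

    def try_accept(self, link_tag: str) -> bool:
        """Return True and count the bid if this identity is still under the cap."""
        if self._counts[link_tag] >= self.max_bids:
            return False
        self._counts[link_tag] += 1
        return True


if __name__ == "__main__":
    counter = BidCounter(max_bids=2)
    # The third bid carrying the same tag is refused; a different worker is unaffected.
    print([counter.try_accept(tag) for tag in ("tagA", "tagA", "tagA", "tagB")])
```

Because tags of the same worker differ across tasks (different prefixes), such a counter bounds bids per task without letting anyone link a worker's participation across tasks.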

8 CONCLUSION & OPEN PROBLEMS

ZebraLancer can facilitate a large variety of incentive mechanisms to realize the fair exchange between crowd-shared data and the corresponding rewards, without the involvement of any third-party arbiter. Moreover, it shows the practicability of resolving two natural tensions in the use-case of decentralized crowdsourcing atop open blockchain: one between data confidentiality and blockchain transparency, and the other between the participants’ anonymity and their accountability.

Along the way, we put forth a new anonymous authentication scheme. Besides strong anonymity that cannot be revoked by any authority, it also supports a delicate linkability only for messages that share a common prefix and are authenticated by the same user. A concrete construction of the scheme is proposed, and it is shown to be compatible with real-world blockchain infrastructure. The delicate linkability of the scheme is subtly different from that of state-of-the-art anonymous-yet-accountable authentication schemes [34–39, 42], and we envision that the scheme might be of independent interest. We also develop a general outsource-then-prove technique to use smart contracts in a privacy-preserving way. This technique can further extend the scope of applications atop some existing privacy-preserving blockchain infrastructures such as [46–48].

Open problems. Since this work is the first attempt at decentralizing data crowdsourcing atop a real-world blockchain in a privacy-preserving way, the area remains largely unexplored. Here we name a few open questions and defer their solutions to future work. First, many incentive mechanisms use reputation systems; can we further extend our implementation to support those incentives? Second, as the current smart contract technology is at an infant stage and allows only very small on-chain storage, can we further optimize our implementation for particular crowdsourcing tasks to support larger-scale tasks, e.g., collecting annotations for millions of images (i.e., the scale of the ImageNet dataset)? Third, our anonymous protocol currently either relies on the underlying blockchain to support anonymous transactions or requires workers/requesters to use one-time blockchain accounts to submit data and receive rewards. Can we design a (DApp-layer) protocol that removes these drawbacks? Last but not least, our protocol relies on a trusted registration authority (RA) to establish identities. Although such a trusted RA could be a reasonable assumption (in view of real-world experiences), it is more tempting to develop an alternative methodology that removes the third-party RA without sacrificing security. For example, can we adapt the successful invention of proof-of-work to build a crowdsourcing framework from literally “zero” trust, without any established identity?

REFERENCES

[1] J. Deng, W. Dong, R. Socher et al., “Imagenet: A large-scale hierarchical image database,” in Proc. IEEE CVPR 2009, pp. 248–255.

[2] Mechanical Turk. [Online]. Available: https://www.mturk.com/mturk/

[3] R. Ganti, F. Ye, and H. Lei, “Mobile crowdsensing: current state and future challenges,” IEEE Commun. Mag., vol. 49, no. 11, pp. 32–39, 2011.

[4] Waze. [Online]. Available: https://status.waze.com

[5] S. Devarakonda, P. Sevusu, H. Liu, R. Liu, L. Iftode, and B. Nath, “Real-time air quality monitoring through mobile sensing in metropolitan areas,” in Proc. ACM UrbComp 2013, pp. 15:1–15:8.

[6] Y. Wen, J. Shi, Q. Zhang et al., “Quality-driven auction-based incentive mechanism for mobile crowd sensing,” IEEE Trans. Veh. Technol., vol. 64, no. 9, pp. 4203–4214, 2015.

[7] Y. Zhang and M. Van der Schaar, “Reputation-based incentive protocols in crowdsourcing applications,” in Proc. IEEE INFOCOM 2012, pp. 2140–2148.

[8] D. Zhao, X.-Y. Li, and H. Ma, “How to crowdsource tasks truthfully without sacrificing utility: Online incentive mechanisms with budget constraint,” in Proc. IEEE INFOCOM 2014, pp. 1213–1221.

[9] D. Yang, G. Xue, X. Fang, and J. Tang, “Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing,” in Proc. ACM MobiCom 2012, pp. 173–184.

[10] D. Peng, F. Wu, and G. Chen, “Pay as how well you do: A quality based incentive mechanism for crowdsensing,” in Proc. ACM MobiHoc 2015, pp. 177–186.

[11] N. Shah and D. Zhou, “Double or Nothing: multiplicative Incentive Mechanisms for Crowdsourcing,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 5725–5776, 2016.

[12] H. Jin, L. Su, D. Chen, K. Nahrstedt, and J. Xu, “Quality of information aware incentive mechanisms for mobile crowd sensing systems,” in Proc. ACM MobiHoc 2015, pp. 167–176.

[13] P. Muncaster. China’s Internet wunderkind in the dock over alleged fraud. [Online]. Available: http://www.theregister.co.uk/2012/07/03/qihoo_fraud_traffic_comscore

[14] H. Wei. Alipay apologizes for leak of personal info. [Online]. Available: http://www.chinadaily.com.cn/china/2014-01/07/content_17219203.htm

[15] H. Kelly. Apple account hack raises concern about cloud storage. [Online]. Available: http://www.cnn.com/2012/08/06/tech/mobile/icloud-security-hack/

[16] B. McInnis, D. Cosley, C. Nam, and G. Leshed, “Taking a HIT: Designing around rejection, mistrust, risk, and workers’ experiences in Amazon Mechanical Turk,” in Proc. ACM CHI 2016, pp. 2271–2282.

[17] M. Isaac, K. Benner, and S. Frenkel. Uber hid 2016 breach, paying hackers to delete stolen data. [Online]. Available: https://www.nytimes.com/2017/11/21/technology/uber-hack.html

[18] N. Szabo, “Formalizing and securing relationships on public networks,” First Monday, vol. 2, no. 9, 1997.

[19] A. Kate. Blockchain privacy: Challenges, solutions, and unresolved issues. [Online]. Available: http://www.isical.ac.in/~rcbose/blockchain2017/lecture/Kate_Slides.pdf

[20] H. A. Murray, “Thematic apperception test.” 1943.

[21] J. R. Douceur, “The sybil attack,” in International workshop on peer-to-peer systems. Springer, 2002, pp. 251–260.

[22] K. Yang, K. Zhang, J. Ren, and X. Shen, “Security and privacy in mobile crowdsourcing networks: challenges and opportunities,” IEEE Commun. Mag., vol. 53, no. 8, pp. 75–81, 2015.

[23] M. Li, J. Weng, A. Yang, and W. Lu, “CrowdBC: A Blockchain-based Decentralized Framework for Crowdsourcing,” 2017. [Online]. Available: http://eprint.iacr.org/2017/444.pdf

[24] S. Saroiu and A. Wolman, “I am a sensor, and I approve this message,” in Proc. ACM HotMobile 2010, pp. 37–42.

[25] Y. Baba and H. Kashima, “Statistical quality estimation for general crowdsourcing tasks,” in Proc. ACM SIGKDD 2013, pp. 554–562.

[26] S. Goldwasser and S. Micali, “Probabilistic encryption & how to play mental poker keeping secret all partial information,” in Proc. ACM STOC 1982, pp. 365–377.

[27] N. Salehi, L. C. Irani, M. S. Bernstein et al., “We are dynamo: Overcoming stalling and friction in collective action for crowd workers,” in Proc. ACM CHI 2015, pp. 1621–1630.

[28] S. Gisdakis, T. Giannetsos, and P. Papadimitratos, “Security, privacy, and incentive provision for mobile crowd sensing systems,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 839–853, 2016.

[29] F. Buccafurri, G. Lax, S. Nicolazzo, and A. Nocera, “Tweetchain: An alternative to blockchain for crowd-based applications,” in International Conference on Web Engineering 2017. Springer, pp. 386–393.

[30] C. Tanas, S. Delgado-Segura, and J. Herrera-Joancomartí, “An integrated reward and reputation mechanism for MCS preserving users privacy,” in International Workshop on Data Privacy Management and Security Assurance 2015, pp. 83–99.

[31] A. Muehlemann, “Sentiment protocol: A decentralized protocol leveraging crowd sourced wisdom,” 2017. [Online]. Available: https://eprint.iacr.org/2017/1133.pdf

[32] Q. Li and G. Cao, “Providing efficient privacy-aware incentives for mobile sensing,” in Proc. IEEE ICDCS 2014, pp. 208–217.

[33] S. Rahaman, L. Cheng, D. D. Yao, H. Li, and J.-M. J. Park, “Provably secure anonymous-yet-accountable crowdsensing with scalable sublinear revocation,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 384–403, 2017.

[34] D. Chaum, “Blind signatures for untraceable payments,” in Proceedings on Advances in cryptology, 1983, pp. 199–203.

[35] D. Chaum, A. Fiat, and M. Naor, “Untraceable electronic cash,” in Proceedings on Advances in cryptology. Springer, 1990, pp. 319–327.

[36] J. Camenisch and A. Lysyanskaya, “An efficient system for non-transferable anonymous credentials with optional anonymity revocation,” in Advances in Cryptology–EUROCRYPT. Springer, 2001, pp. 93–118.

[37] S. Xu and M. Yung, “K-anonymous secret handshakes with reusable credentials,” in Proc. ACM CCS 2004, pp. 158–167.

[38] I. Teranishi, J. Furukawa, and K. Sako, “K-times anonymous authentication,” in Asiacrypt, 2004, vol. 3329, pp. 308–322.

[39] J. Camenisch, S. Hohenberger, M. Kohlweiss, A. Lysyanskaya, and M. Meyerovich, “How to win the clonewars: efficient periodic n-times anonymous authentication,” in Proc. ACM CCS 2006, pp. 201–210.

[40] T. Nakanishi, T. Fujiwara, and H. Watanabe, “A linkable group signature and its application to secret voting,” Trans. of Information Processing Society of Japan, vol. 40, no. 7, pp. 3085–3096, 1999.

[41] J. K. Liu, V. K. Wei, and D. S. Wong, “Linkable spontaneous anonymous group signature for ad hoc groups,” in Australasian Conference on Information Security and Privacy 2004, pp. 325–335.

[42] M. H. Au, W. Susilo, and S.-M. Yiu, “Event-oriented k-times revocable-iff-linked group signatures,” in Australasian Conference on Information Security and Privacy 2006, pp. 223–234.

[43] P. P. Tsang, V. K. Wei, T. K. Chan, M. H. Au, J. K. Liu, and D. S. Wong, “Separable linkable threshold ring signatures,” in International Conference on Cryptology in India, 2004, pp. 384–398.

[44] A. Kiayias, H.-S. Zhou, and V. Zikas, “Fair and robust multi-party computation using a global transaction ledger,” in Proc. Eurocrypt 2016, pp. 705–734.

[45] G. Zyskind, O. Nathan, and A. Pentland, “Enigma: Decentralized computation platform with guaranteed privacy,” 2015. [Online]. Available: https://arxiv.org/abs/1506.03471

[46] A. Kosba, A. Miller, E. Shi et al., “Hawk: The blockchain model of cryptography and privacy-preserving smart contracts,” in Proc. IEEE S&P 2016, pp. 839–858.

[47] D. Hopwood, S. Bowe, T. Hornby, and N. Wilcox, “Zcash Protocol Specification.” [Online]. Available: https://github.com/zcash/zips/blob/master/protocol/protocol.pdf

[48] Ethereum Team. Byzantium HF Announcement. [Online]. Available: https://blog.ethereum.org/2017/10/12/byzantium-hf-announcement/

[49] J. A. Garay, A. Kiayias, and N. Leonardos, “The bitcoin backbone protocol: Analysis and applications,” in Proc. EUROCRYPT 2015, pp. 281–310.

[50] R. Pass, L. Seeman, and A. Shelat, “Analysis of the blockchain protocol in asynchronous networks,” in Proc. EUROCRYPT 2017, pp. 643–673.

[51] V. Buterin, “A next-generation smart contract and decentralized application platform,” 2014. [Online]. Available: https://github.com/ethereum/wiki/wiki/White-Paper

[52] E. Ben-Sasson, A. Chiesa, M. Green, E. Tromer, and M. Virza, “Secure sampling of public parameters for succinct zero knowledge proofs,” in Proc. IEEE S&P 2015, pp. 287–304.

[53] E. Ben-Sasson, A. Chiesa, D. Genkin et al., “SNARKs for C: Verifying program executions succinctly and in zero knowledge,” in Advances in Cryptology–CRYPTO. Springer, 2013, pp. 90–108.

[54] C. Frantz and M. Nowostawski, “From institutions to code: Towards automated generation of smart contracts,” in Proc. IEEE FAS*W 2016, pp. 210–215.

[55] Y. Lu, Q. Tang, and G. Wang, “On Enabling Machine Learning Tasks atop Public Blockchains: a Crowdsourcing Approach,” in Proc. IEEE ICDM Workshops (BlockSEA) 2018.

[56] H. Galal and A. Youssef, “Verifiable sealed-bid auction on the ethereum blockchain,” in International Conference on Financial Cryptography and Data Security, Trusted Smart Contracts Workshop. Springer, 2018.