
A Belief Propagation-based Method for Task Allocation in Open and Dynamic Cloud Environments

Yan Kong^a, Minjie Zhang^b, Dayong Ye^c

^a [email protected], Faculty of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China

^b School of Computer Science and Software Engineering, University of Wollongong, Wollongong, 2522, Australia

^c School of Computer Science and Software Engineering, University of Wollongong, Wollongong, 2522, Australia

Abstract

We propose a decentralized belief propagation-based method, PD-LBP, for multi-agent task allocation in open and dynamic grid and cloud environments where both the set of agents and the set of tasks change constantly. PD-LBP aims to accelerate the online response to unpredicted changes in the environment, improve resilience to such changes, and reduce the message passing required for task allocation. To do this, PD-LBP introduces two phases, pruning and decomposition. The pruning phase reduces the search space by pruning resource providers, and the decomposition phase decomposes the network into multiple independent parts on which belief propagation can be run in parallel. PD-LBP is compared with two state-of-the-art methods, the Loopy Belief Propagation-based method and the Reduced Binary Loopy Belief Propagation-based method. The evaluation results demonstrate the efficiency of PD-LBP in terms of both shorter problem-solving time and a smaller communication requirement for task allocation in dynamic environments.

Keywords: Belief Propagation, Task Allocation, Dynamism and Openness

Preprint submitted to Elsevier October 17, 2016

1. Introduction

Task allocation in open and dynamic network environments, especially in cloud computing environments, is an important issue [1, 2, 3, 4] because it arises in various contexts, such as supply chain formation [5, 6, 7, 8], electronic commerce¹, RoboCup rescue [9, 10, 11, 12, 13], and computation platforms [15, 16]. In such contexts, task allocation helps regulate resource management and utilization. Specifically, task allocation in supply chain formation addresses the problem of determining who (i.e., which participants in a supply chain) will exchange which resources with whom [7]. In electronic commerce, task allocation means allocating the tasks of consumers to resources offered by resource providers, who rent out their resources to earn profits. For example, EC2 (Elastic Compute Cloud) [17] is a cloud computing platform of Amazon, on which Amazon rents out its resources to earn profits and resource consumers use the provided resources to perform their tasks. The tasks of resource consumers need to be allocated to suitable resource providers to meet the goals of both the consumers and the providers. Unlike the resource competition in electronic commerce, task allocation in RoboCup is always distributed and addresses the coordination among teams of agents that have only local views in disaster scenarios [10]. GENI [18] is a representative open and dynamic computation platform where people and organizations earn profits by exchanging or renting resources to accomplish tasks.

In cloud environments, many tasks with deadlines require resources from multiple administratively independent resource providers. In addition, a task may consist of more than one subtask, and the subtasks may have dependency constraints (i.e., a subtask cannot start until some other subtasks are finished). Considering the dynamism and openness of environments in which both resource providers and consumers can come and leave freely, two main challenges of task allocation arise. First, due to the time constraint, a quick online response to unpredicted changes in the environment, and strong resilience to them, are required [19, 20, 21]. Second, the communication requirement must be reduced when changes keep taking place and the allocation therefore has to be re-run frequently [22, 23, 24].

Much work has been done on this type of task allocation, and many methods have been proposed in the past years, such as multi-resource negotiation-based methods [25, 16, 5, 26, 27, 15], double auction-based methods [28, 29], combinatorial auction-based methods [6, 30, 31, 32], belief propagation-based methods [33, 7], and an evolutionary algorithm-based method [34]. In addition, Jiang et al. proposed a novel cloud resource auto-scaling scheme at the virtual machine (VM) level for web application providers to achieve both true elasticity and cost-effectiveness in the pay-per-use cloud business model [35]. In the distributed negotiation mechanism proposed in [16], agents negotiate over both a contract price and a decommitment penalty. The decommitment penalty allows agents to decommit from contracts at a cost, so an agent can sign contracts with multiple resource providers to increase the probability of successful task allocation. However, it is hard for the agent to decide how many contracts to sign to reach the optimal solution. To address the problem that most negotiation strategies cannot guarantee an equilibrium in real applications, Gatti et al. proposed an efficient bargaining algorithm [26] that achieves an equilibrium in uncertain environments. Their bargaining algorithm has been very successful but still has some limitations, e.g., the bargaining is carried out in one-sided uncertain environments, whereas real applications are always two-sided uncertain. In [29], Walsh et al. defined a market protocol based on distributed, myopic, progressive auctions and non-strategic agent bidding policies to determine prices in supply chain formation. The proposed protocol can always approximate overall optimal solutions through locally optimized resource provider selection.

In multi-resource negotiation-based methods, the consumer obtains each required resource by negotiating with the corresponding providers separately, which makes resource acquisition flexible. One drawback of such methods, however, is that the resource consumer risks not obtaining the complementary resources in follow-up negotiation threads after successfully obtaining a partial set of the needed resources. Most combinatorial auctions require a central controller (i.e., the auctioneer), which not only hinders the scalability of these methods but is also hard for other selfish participants to trust. A large number of evaluations have shown that belief propagation-based methods work well [33, 7], owing not only to the distributed nature of belief propagation but also to its robustness to the dynamism of the environment. Unfortunately, neither the method in [33] nor the one in [7] pays enough attention to quick online response and resilience to changes in highly dynamic environments, even though belief propagation can work in dynamic environments.

A ranking model was proposed in [36] to rank the tasks to be assigned resources by calculating the weights of the tasks; resources are then assigned to the ranked tasks accordingly. This model works well in fairly steady environments but is not desirable for highly dynamic ones, because any change can result in the re-ranking of all the tasks. The resource allocation method proposed by Piraghaj et al. maps groups of tasks to customized virtual machine types, where the mapping is based on task usage patterns obtained by analyzing historical data extracted from utilization traces [37]. However, because of the growing number of hierarchies and types of tasks and resources in cloud environments, the task usage patterns need to be updated frequently. This may inhibit cloud scalability and decrease the time efficiency of task allocation, which is important for task allocation with time constraints. Guo et al. proposed a workflow task scheduling model in which the processors are pre-processed by a fuzzy clustering method in order to partition the processor network reasonably, which can greatly reduce the cost of deciding which processor should execute the current task [38].

¹ www.ebay.com

Against this background, the novelty of this paper is that, in order to address the challenges of openness and dynamism for task allocation in cloud environments, the proposed belief propagation-based task allocation method (i.e., PD-LBP) introduces two phases into the task allocation process: pruning and decomposition. The pruning phase prunes some alternative providers to decrease the number of providers involved in belief propagation. The decomposition phase decomposes the whole network into multiple independent sub-networks on which belief propagation can run at the same time and thus converge quickly.

This paper is organized as follows. The problem is formulated in Section 2, and Section 3 introduces our belief propagation-based task allocation method. The evaluation is presented in Section 4, and we conclude in Section 5.

2. Problem Definition

Assume that the task to be allocated is $T = \{t_1, t_2, ..., t_m\}$, where $t_1, t_2, ..., t_m$ are the subtasks of $T$. The allocation of $T$ is considered successful only when all of its subtasks are successfully allocated. $T$ has a deadline $dl$, but there is no deadline for individual subtasks. Similarly, the consumer has a reserve price $p_{res}$ for $T$ but no reserve price for any individual subtask. The provider agents, each of which can execute one subtask of $T$, are called alternatives, and there may be multiple alternatives for each subtask. Task allocation in this paper means selecting a provider for each subtask from the corresponding alternatives so that the selected providers collaboratively finish the task, with the aim of maximizing a predefined objective function. In other words, the solution of our problem is a configuration of providers that optimizes the task allocation according to a predefined criterion. Note that some alternatives may be unable to collaborate with each other for reasons (e.g., geography or traffic) that are beyond the scope of this paper. Due to the dynamism and openness of the environment, the sets of both alternatives and tasks change constantly.

Formally, we use $S = \{S_1, S_2, ..., S_p\}$ to denote the set of all the configurations that can finish $T$. $S_k \in S$ ($1 \le k \le p$) is the $k$th configuration, and $S_k = \{a_k^1, a_k^2, ..., a_k^m\}$, where $a_k^j \in S_k$ ($1 \le j \le m$) is the selected alternative for subtask $t_j$. If the execution times needed by the selected alternatives to finish the related subtasks are $t_k^1, t_k^2, ..., t_k^m$, and the quotes of the selected alternatives are $q_k^1, q_k^2, ..., q_k^m$, respectively, then the goal of our problem is to find the best configuration $S^*$ ($S^* \in S$) according to:

$$
S^* = \arg\max_{S_k \in S} \sum_{j=1}^{m} u_j
\quad \text{subject to} \quad
\begin{cases}
\sum_{j=1}^{m} q^{j*} \le p_{res} \\
t + \sum_{j=1}^{m} t^{j*} \le dl
\end{cases}
\tag{1}
$$

where $u_j$ is the utility that the consumer gains from subtask $t_j$ (defined later), and $t$ is the time at which this equation is evaluated. The constraints in Equation (1) state that the sum of the quotes of the selected alternatives for all the subtasks must not exceed the reserve price $p_{res}$ of the task, and that the sum of the execution times of all the selected alternatives, counted from the current time $t$, must not exceed the deadline $dl$ of the task.
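To make Equation (1) concrete, the following is a minimal brute-force sketch, not the method proposed in this paper. It enumerates one alternative per subtask, discards configurations that violate the reserve price or the deadline, and keeps the one with the highest total utility. The function name `best_configuration`, the tuple layout `(name, quote, exe_time)`, the placeholder `utility` function, and all numbers are assumptions made for illustration; collaboration constraints between alternatives are ignored here.

```python
from itertools import product


def best_configuration(alternatives, utility, p_res, dl, now=0.0):
    """Exhaustively pick one alternative per subtask, following Equation (1).

    alternatives: list (one entry per subtask) of lists of (name, quote, exe_time).
    utility: hypothetical callable mapping (quote, exe_time) to the utility u_j.
    """
    best, best_value = None, float("-inf")
    for config in product(*alternatives):
        total_quote = sum(q for _, q, _ in config)
        total_time = sum(e for _, _, e in config)
        if total_quote > p_res or now + total_time > dl:
            continue  # violates the reserve price or the deadline
        value = sum(utility(q, e) for _, q, e in config)
        if value > best_value:
            best, best_value = [name for name, _, _ in config], value
    return best, best_value


# Illustrative data only: two subtasks with two alternatives each.
example_alternatives = [
    [("a1", 4, 2), ("a2", 3, 5)],
    [("a3", 5, 3), ("a4", 2, 6)],
]


def example_utility(quote, exe_time):
    # Placeholder utility: cheaper and faster is better.
    return -(quote + exe_time)


chosen, value = best_configuration(example_alternatives, example_utility, p_res=8, dl=10)
```

The exhaustive search grows exponentially with the number of subtasks, which is one reason a decentralized message-passing approach is attractive for this problem.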

The task allocation problem described above is illustrated in Fig. 1, where the task to be allocated is $T = \{t_1, t_2, t_3, t_4\}$ and the dependency constraint of the subtasks is $t_1 \to t_2 \to t_3 \to t_4$. The set of alternatives for all the subtasks is $A = \{a_1, a_2, a_3, a_4, a_5, a_6, a_7\}$, and the respective alternative sets for $t_1$, $t_2$, $t_3$, and $t_4$ are $A_1 = \{a_1, a_2\}$, $A_2 = \{a_3, a_4\}$, $A_3 = \{a_5, a_6\}$, and $A_4 = \{a_7\}$. As shown in Fig. 1, either $a_1$ or $a_2$ can execute $t_1$. If $a_1$ is selected to execute $t_1$, it can pass the task to either $a_3$ or $a_4$ to finish $t_2$ after finishing $t_1$. If $a_3$ is selected to execute $t_2$, it can pass the task only to $a_5$ to finish $t_3$ after finishing $t_2$. Ignoring the reserve price and deadline constraints, the set of solutions (i.e., configurations) is $S = \{\{a_1, a_3, a_5, a_7\}, \{a_1, a_4, a_6, a_7\}, \{a_2, a_3, a_5, a_7\}, \{a_2, a_4, a_6, a_7\}\}$. The purpose of task allocation is to find the configuration $S^*$ that maximizes an objective function in a dynamic environment where both the alternatives and the tasks keep changing over time.

Figure 1: Subtasks and Alternatives

Figure 2: Markov Random Field Form

From the above analysis, the selection of provider alternatives for all the subtasks of a task can be modeled as a Markov Random Field (MRF). An MRF (also called an undirected graphical model) is a set of variables having Markov properties described by an undirected graph [39]. In brief, given an undirected graph $G = (V, E)$, a set of variables forms an MRF with respect to $G$ if the variables satisfy the following Markov properties [40]:

1: Pairwise Markov property: given all other variables, any two non-adjacent variables are conditionally independent. For example, suppose that in Fig. 2 $p_1$, $p_2$, and $p_3$ are resource providers, and $c_1$ is the resource consumer whose task needs to be executed by $p_1$ first and then completed by either $p_2$ or $p_3$. Once the states of $p_2$ and $p_3$ are both decided, whether $c_1$ receives the completed task is decided as well. In other words, the state of $c_1$ depends only on the states of its adjacent neighbors (i.e., $p_2$ and $p_3$) and is conditionally independent of the state of $p_1$.

2: Local Markov property: given its neighbors, a variable is conditionally independent of all other variables.

3: Global Markov property: given a separating subset, any two subsets of variables are conditionally independent.

3. PD-LBP

Based on the problem description in the previous section, we propose a pruning-decomposition loopy belief propagation-based task allocation method (PD-LBP). PD-LBP follows the state and belief definitions of the loopy belief propagation-based (LBP) supply chain formation method [33] proposed by Winsper and Chli. PD-LBP improves on LBP in two main respects. First, PD-LBP simplifies LBP through two simplification phases (i.e., pruning and decomposition), which aim to speed up the convergence of belief propagation, reduce the communication requirement, and accelerate the online response and improve resilience to unpredicted changes when the environment is highly dynamic. Second, unlike LBP, where the consumer considers only the quotes of the alternatives, PD-LBP considers both the reserve price and the deadline.

3.1. State and Belief Definition

Following the state definition of LBP, the possible states of an alternative in PD-LBP comprise inactive and the possible partnerships of the alternative. The inactive state means that the alternative is not selected to execute the related subtask. A possible partnership of an alternative specifies from whom the alternative takes over the task and to whom it passes the task on. For example, in Fig. 2, the task to be allocated consists of two subtasks, $t_1$ and $t_2$; $p_1$ is the alternative for subtask $t_1$, $p_2$ and $p_3$ are the alternatives for subtask $t_2$, and $c_1$ is the consumer. The states of $p_2$ are 'inactive' and 'taking over the task from $p_1$ and passing the finished task to the consumer'. We list the possible states of $p_1$ and $p_2$ in Table 1, where $s_m^n$ stands for the $n$th state of alternative $a_m$.

Table 1: States of $p_1$ and $p_2$

| States of $p_1$ | States of $p_2$ |
|---|---|
| $s_1^1$: inactive | $s_2^1$: inactive |
| $s_1^2$: executing $t_1$ and then passing the task to $p_2$ to execute $t_2$ | $s_2^2$: taking over the task from $p_1$ to execute $t_2$ and passing the finished task to $c_1$ |
| $s_1^3$: executing $t_1$ and then passing the task to $p_3$ to execute $t_2$ | |

Because the objective of PD-LBP is to maximize the utility gained from the task allocation, the belief of an alternative about a state is the overall utility that would result if the alternative were assigned to that state. Still following LBP, two types of utility are defined in PD-LBP: the unary utility of a state (formulated later) and the pairwise utility between two states of two adjacent agents. The pairwise utility represents the compatibility between two states of two adjacent agents. Two states of two adjacent alternatives are incompatible only when one of the alternatives wants to trade with the other but the other does not. For example, according to the state definitions of $p_1$ and $p_2$ in Table 1, $s_1^2$ and $s_2^1$ are incompatible, because the former means that $p_1$ passes the task to $p_2$ to execute $t_2$ after finishing $t_1$, whereas the latter means that $p_2$ is inactive. Note that two inactive states are considered compatible. If two states are incompatible, the pairwise utility between them is $-\infty$; otherwise it is zero. Taking $p_1$ and $p_2$ as examples again, we list the pairwise utilities between their states in Table 2.

Table 2: Pairwise utility between states of $p_1$ and $p_2$

| State pair ($p_1 \leftrightarrow p_2$) | Pairwise utility |
|---|---|
| $g(s_1^1, s_2^1)$ | $0$ |
| $g(s_1^1, s_2^2)$ | $-\infty$ |
| $g(s_1^2, s_2^1)$ | $-\infty$ |
| $g(s_1^2, s_2^2)$ | $0$ |
| $g(s_1^3, s_2^1)$ | $0$ |
| $g(s_1^3, s_2^2)$ | $-\infty$ |
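The compatibility rule behind Table 2 can be sketched in a few lines of Python. This is an illustrative encoding rather than the paper's implementation: a state is assumed to be either `INACTIVE` or a tuple `(source, target)` naming the partner the task is taken from and the partner it is passed to, and `None` is used when a partner is not modeled. The names `INACTIVE` and `pairwise_utility` are hypothetical.

```python
import math

INACTIVE = ("inactive",)


def pairwise_utility(agent_u, state_u, agent_v, state_v):
    """Return 0 if the two states are compatible and -inf otherwise.

    Two states are incompatible only when exactly one of the two agents
    names the other as a trading partner.
    """
    u_wants_v = state_u != INACTIVE and agent_v in state_u
    v_wants_u = state_v != INACTIVE and agent_u in state_v
    return 0.0 if u_wants_v == v_wants_u else -math.inf


# States of p1 and p2 as in Table 1 (assumed tuple encoding).
p1_states = [INACTIVE, (None, "p2"), (None, "p3")]
p2_states = [INACTIVE, ("p1", "c1")]

# Rows follow the order of Table 2.
table2 = [pairwise_utility("p1", s1, "p2", s2) for s1 in p1_states for s2 in p2_states]
```

Evaluating the six state pairs in this order reproduces the entries of Table 2.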

The unary utility of the inactive state is set to zero. We now formulate the unary utility of the active states of an alternative. Unlike LBP, where the consumer considers only one attribute (i.e., the quote) of the service of an alternative, in PD-LBP the consumer takes into account two attributes of the service delivered by an alternative: the quote and the time needed by the alternative to finish the related subtask. Therefore, the simple multi-attribute rating technique (SMART) [41, 42] is employed to formulate the unary utility in three main steps.


Step 1: Identify the decision maker, the alternatives, and the attributes. The decision maker is the consumer, and the alternatives are the provider agents. The attributes are the quote and the execution time.

Step 2: Scale the attributes and develop weights for each attribute.

We assume that, for each individual subtask, the consumer has expected ranges for the quote and the required execution time, denoted by $[q_{low}, q_{high}]$ and $[t_{low}, t_{high}]$, respectively. This assumption is reasonable and feasible in real-life applications, because the consumer usually has explicit and fixed constraints only on the total cost and the total execution time of the whole task, not on the quote or the execution time of each individual subtask. $[q_{low}, q_{high}]$ and $[t_{low}, t_{high}]$ are used as the respective scales of quote and execution time. The consumer assigns weights to the quote and the execution time according to its preference, which is beyond the scope of this paper. We use $w_1$ and $w_2$, subject to $w_1 + w_2 = 1$, to denote the weights assigned to the quote and the execution time, respectively.

Step 3: Score each alternative on each attribute separately, and then value the alternatives by combining the scores. Suppose that $q_i$ and $exe_i$ are the quote and execution time of alternative $a_i$, respectively. The score of $a_i$ on $q_i$ is formulated as:

$$
s_{q_i} = \frac{q_{high} - q_i}{q_{high} - q_{low}} \tag{2}
$$

The score of $a_i$ on $exe_i$ is formulated as:

$$
s_{exe_i} = \frac{t_{high} - exe_i}{t_{high} - t_{low}} \tag{3}
$$

Equations (2) and (3) are formulated in this way to reflect the fact that the higher the quote (the longer the execution time), the lower the reputation of the related alternative.

A widely used additive value function [42] is adopted to score alternative $a_i$ on both $q_i$ and $exe_i$:

$$
scor(a_i) = w_1 \times s_{q_i} + w_2 \times s_{exe_i} \tag{4}
$$


$scor(a_i)$ is used as the unary utility of all the states of $a_i$ except the 'inactive' state, whose unary utility is set to zero.
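The three SMART steps above can be sketched as follows. The function names `smart_score` and `unary_utility`, the string label `"inactive"`, and the numeric ranges and weights in the example are assumptions chosen for illustration only.

```python
def smart_score(quote, exe_time, q_range, t_range, w_quote=0.5):
    """Additive SMART score of Equation (4), built from Equations (2) and (3).

    q_range and t_range are the consumer's expected ranges (low, high);
    w_quote is w_1, and w_2 is taken as 1 - w_1.
    """
    q_low, q_high = q_range
    t_low, t_high = t_range
    s_q = (q_high - quote) / (q_high - q_low)        # Equation (2)
    s_exe = (t_high - exe_time) / (t_high - t_low)   # Equation (3)
    return w_quote * s_q + (1.0 - w_quote) * s_exe   # Equation (4)


def unary_utility(state, quote, exe_time, q_range, t_range, w_quote=0.5):
    """Unary utility of a state: zero for 'inactive', the SMART score otherwise."""
    if state == "inactive":
        return 0.0
    return smart_score(quote, exe_time, q_range, t_range, w_quote)


# Illustrative numbers: quote 6 in [4, 8], execution time 3 in [2, 6], w_1 = 0.4.
example_score = smart_score(6, 3, q_range=(4, 8), t_range=(2, 6), w_quote=0.4)
```

With these numbers the two attribute scores are 0.5 and 0.75, so the combined score is 0.4 × 0.5 + 0.6 × 0.75 = 0.65.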

Having defined the states and the corresponding beliefs for belief propagation, we next introduce the simplification phases of PD-LBP in detail, before introducing belief updating and message passing in Section 3.3.

3.2. Computational Simplification

If the network formed by all the alternatives and the consumer can be separated into more than one independent part, belief propagation can be run locally in these parts in parallel rather than sequentially over the whole network, and the convergence of belief propagation (i.e., the optimized solution) can probably be reached more quickly. In addition, when changes in the environment take place frequently, the resulting large amount of re-computation and message passing may prevent belief propagation from converging. Even after belief propagation has converged, the consumer may still want to re-run it before the task deadline whenever a new provider (alternative) arrives, to check whether a higher utility can be obtained, or may have to re-run it because some committed alternatives default. Motivated by these observations, two phases, pruning and decomposition, are designed to mitigate the problem. In the pruning phase, given both the reserve price and the task deadline, the alternatives (if any) that will never maximize the overall utility, regardless of the configuration of all other alternatives, are pruned before belief propagation starts. After pruning, the network is decomposed into independent parts (if possible); this is the decomposition phase. We introduce the two phases in detail below.

3.2.1. Phase 1: pruning

The principles of pruning come from both the reserve price and the deadline constraints. If, no matter what the configuration of all the other alternatives is, the sum of the quote (execution time) of an alternative for the subtask it is interested in and the quotes (execution times) for all the other subtasks is higher (longer) than the reserve price (task deadline), this alternative will never be selected and should therefore be pruned before belief propagation in order to simplify it. Such an alternative is called a dominated alternative. We assume that the consumer has the information (i.e., quote and execution time) of all the alternatives. This assumption is feasible and practical, because it is common for alternatives to register with the consumer when they arrive in the environment. Note that the consumer having information about all the alternatives does not mean that PD-LBP is centralized, because belief propagation is decentralized and does not need the control of the consumer. Therefore, it is reasonable to designate the consumer to perform the pruning and the decomposition, which will be introduced later. Formally, assume that the quote and execution time of alternative $a_i \in A_i$ ($A_i$ is the alternative set of subtask $t_i$) are $q_i$ and $exe_i$, respectively. If at least one of Equations (5) and (6) is met, $a_i$ is dominated and will be pruned.

$$
q_i + \sum_{t_k \in T \setminus t_i} \min\{q_k \mid a_k \in A_k\} > p_{res} \tag{5}
$$

or

$$
exe_i + t + \sum_{t_k \in T \setminus t_i} \min\{exe_k \mid a_k \in A_k\} > dl \tag{6}
$$

where $T$ is the subtask set, $t$ is the time at which Equation (6) is evaluated, $A_k$ is the alternative set of subtask $t_k$, and $p_{res}$ and $dl$ are the reserve price and deadline of $T$, respectively. Equation (5) states that if the sum of the quote of alternative $a_i$ for subtask $t_i$ and the lowest quotes for all other subtasks is higher than the reserve price $p_{res}$, $a_i$ will be pruned. Similarly, as Equation (6) indicates, if the current time plus the sum of the execution time of $a_i$ for $t_i$ and the shortest execution times for all other subtasks exceeds the deadline $dl$, $a_i$ will also be pruned.
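A minimal sketch of the pruning test in Equations (5) and (6) is given below. The function name `prune_dominated`, the dictionary layout, and the example numbers are assumptions for illustration. The minima are taken over the full, unpruned alternative sets in a single pass, which is how the two equations read.

```python
def prune_dominated(alternative_sets, p_res, dl, now=0.0):
    """Split alternatives into kept and dominated ones per Equations (5) and (6).

    alternative_sets: list indexed by subtask; each entry maps an alternative
    name to its (quote, exe_time).
    """
    min_quote = [min(q for q, _ in alts.values()) for alts in alternative_sets]
    min_time = [min(e for _, e in alts.values()) for alts in alternative_sets]
    kept, dominated = [], []
    for i, alts in enumerate(alternative_sets):
        # Best case for all the other subtasks: cheapest quotes, shortest times.
        other_quote = sum(min_quote) - min_quote[i]
        other_time = sum(min_time) - min_time[i]
        kept.append({})
        for name, (quote, exe_time) in alts.items():
            too_expensive = quote + other_quote > p_res      # Equation (5)
            too_slow = exe_time + now + other_time > dl      # Equation (6)
            if too_expensive or too_slow:
                dominated.append(name)
            else:
                kept[i][name] = (quote, exe_time)
    return kept, dominated


# Illustrative data only.
example_sets = [
    {"a1": (4, 2), "a2": (9, 1)},
    {"a3": (3, 3), "a4": (2, 9)},
]
kept_sets, dominated_list = prune_dominated(example_sets, p_res=10, dl=8)
```

Here `a2` is dominated on price (9 + 2 > 10) and `a4` on time (9 + 1 > 8), so only `a1` and `a3` take part in belief propagation.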

In dynamic environments, a dominated alternative may become non-dominated owing to the arrival of new alternatives. In order to reactivate such alternatives, the consumer stores a list of all the dominated alternatives and, whenever a new alternative arrives, checks whether any dominated alternative has become non-dominated. If so, the consumer informs this alternative and its immediate neighbors so that the alternative is involved in belief propagation from the next iteration.

3.2.2. Phase 2: decomposition


Decomposition in belief propagation has received increasing attention recently [43]. Before introducing the decomposition phase of PD-LBP in detail, we first clarify some concepts. If subtask $t_{i+1}$ cannot start until subtask $t_i$ is finished, then $t_i$ is called the predecessor of $t_{i+1}$, and $t_{i+1}$ is the successor of $t_i$. For example, in Fig. 1, $t_2$ is the predecessor of $t_3$, and $t_3$ is the successor of $t_2$. According to the problem description in Section 2, it is possible that all the alternatives of the predecessor subtask can collaborate with all the alternatives of the successor subtask. In this case, the alternative selection for the predecessor subtask does not affect that for the successor. According to the Markov properties presented in Section 2, non-adjacent nodes in an MRF are conditionally independent. Consequently, if the alternative selections for two adjacent subtasks are independent, the MRF (i.e., the corresponding network formed by the alternatives and the consumer) can be decomposed between these two adjacent subtasks. To find such adjacent subtasks, we define the dependency weight between two adjacent subtasks. Assume that $t_i$ and $t_{i+1}$ are adjacent. When the dependency weight between them, $w_{dep}(i, i+1)$, is zero, the network can be separated between them, and the connection between them is called a separation link. Formally, if $A_i$ and $A_{i+1}$ are the alternative sets of $t_i$ and $t_{i+1}$, respectively, $w_{dep}(i, i+1)$ is zero if the following condition is met: $\forall (a_m \in A_i, s_m \in S_m \setminus \{\text{inactive}\}, a_k \in A_{i+1}), \exists s_k \in S_k, g(s_m, s_k) = 0$, where $S_m$ and $S_k$ are the state sets of $a_m$ and $a_k$, respectively.

Assume that the whole network is decomposed into two parts: the left part and the right part. The decomposition raises a problem: more than one chain may be formed for the subtasks in the left part. To prevent this, an agent is added to the end of the left part to play the role of the consumer and guarantee that only one chain is formed. As analyzed earlier, the committed alternatives may default after belief propagation has converged, and new alternatives keep arriving. How to deal with this dynamism, in terms of both the leaving and the arrival of alternatives, is not trivial and is the problem we study next.
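The decomposition idea can be sketched as follows under a simplifying assumption: instead of evaluating the state-level condition on $g$, the sketch treats the dependency weight between two adjacent subtasks as zero exactly when every alternative of the predecessor can collaborate with every alternative of the successor, which is the informal reading given in the text. The names `separation_links`, `decompose`, and `can_collaborate` are hypothetical, and the collaboration pairs for the Fig. 1 example are read off the configuration set $S$ in Section 2.

```python
def separation_links(alternative_sets, can_collaborate):
    """Return the indices i whose link between subtask i and subtask i+1 is a separation link."""
    links = []
    for i in range(len(alternative_sets) - 1):
        if all((a, b) in can_collaborate
               for a in alternative_sets[i]
               for b in alternative_sets[i + 1]):
            links.append(i)
    return links


def decompose(num_subtasks, links):
    """Cut the chain of subtasks 0..num_subtasks-1 at every separation link."""
    parts, current = [], [0]
    for i in range(1, num_subtasks):
        if i - 1 in links:
            parts.append(current)
            current = []
        current.append(i)
    parts.append(current)
    return parts


# Fig. 1 example with 0-based subtask indices (t1 -> 0, ..., t4 -> 3).
fig1_sets = [["a1", "a2"], ["a3", "a4"], ["a5", "a6"], ["a7"]]
fig1_pairs = {
    ("a1", "a3"), ("a1", "a4"), ("a2", "a3"), ("a2", "a4"),
    ("a3", "a5"), ("a4", "a6"), ("a5", "a7"), ("a6", "a7"),
}
fig1_links = separation_links(fig1_sets, fig1_pairs)
fig1_parts = decompose(len(fig1_sets), fig1_links)
```

Under this reading, the chain of Fig. 1 is cut between $t_1$ and $t_2$ and between $t_3$ and $t_4$, while $t_2$ and $t_3$ stay in the same part because $a_3$ can pass the task only to $a_5$.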

3.2.3. Dealing with the dynamism

Suppose that $t_{i-1}$ is the predecessor of $t_i$, and $t_i$ is the predecessor of $t_{i+1}$. In addition, both $w_{dep}(t_{i-1}, t_i)$ and $w_{dep}(t_i, t_{i+1})$ are zero. We now study separately how to deal with the leaving and the arrival of alternatives.

The Leaving of an Alternative


The leaving of an alternative of $t_i$ affects neither $w_{dep}(t_{i-1}, t_i)$ nor $w_{dep}(t_i, t_{i+1})$. Formally, we have:

Theorem 1: After an alternative $a_m \in A_i$ ($A_i$ is the alternative set of $t_i$) leaves, both $w_{dep}(t_{i-1}, t_i)$ and $w_{dep}(t_i, t_{i+1})$ are still zero.

Proof: Because $w_{dep}(t_{i-1}, t_i) = 0$, according to the analysis in Section 3.2.2, we have $\forall (a_n \in A_{i-1}, s_n \in S_n \setminus \{\text{inactive}\}, a_k \in A_i), \exists s_k \in S_k, g(s_n, s_k) = 0$ ($S_n$ and $S_k$ are the state sets of $a_n$ and $a_k$, respectively). Consequently, when $a_m \in A_i$ leaves, it still holds that $\forall (a_n \in A_{i-1}, s_n \in S_n \setminus \{\text{inactive}\}, a_k \in A_i \setminus \{a_m\}), \exists s_k \in S_k, g(s_n, s_k) = 0$. Therefore, $w_{dep}(t_{i-1}, t_i)$ is still zero.

Similarly, because $w_{dep}(t_i, t_{i+1}) = 0$, we know that $\forall (a_n \in A_i, s_n \in S_n \setminus \{\text{inactive}\}, a_k \in A_{i+1}), \exists s_k \in S_k, g(s_n, s_k) = 0$. Consequently, when $a_m \in A_i$ leaves, it still holds that $\forall (a_n \in A_i \setminus \{a_m\}, s_n \in S_n \setminus \{\text{inactive}\}, a_k \in A_{i+1}), \exists s_k \in S_k, g(s_n, s_k) = 0$. Therefore, $w_{dep}(t_i, t_{i+1}) = 0$ still holds.

The Arrival of a New Alternative

When a new agent $a_p \in A_i$ arrives, if $\forall (s_p \in S_p \setminus \{\text{inactive}\}, a_k \in A_{i+1}), \exists s_k \in S_k, g(s_p, s_k) = 0$, then we have $\forall (a_n \in A_i \cup \{a_p\}, s_n \in S_n \setminus \{\text{inactive}\}, a_k \in A_{i+1}), \exists s_k \in S_k, g(s_n, s_k) = 0$. In this situation, $w_{dep}(t_i, t_{i+1})$ is still zero. Otherwise, $t_i$ and $t_{i+1}$ are no longer independent owing to the arrival of $a_p$. As a consequence, the two parts, which contain $t_i$ and $t_{i+1}$ respectively, are merged into one. Upon merging, the agent that was added to the end of the left part (i.e., the part that contains $t_i$) when the two parts were separated is removed.

Similarly, if $\forall (a_m \in A_{i-1}, s_m \in S_m \setminus \{\text{inactive}\}), \exists s_p \in S_p, g(s_m, s_p) = 0$, then we have $\forall (a_m \in A_{i-1}, s_m \in S_m \setminus \{\text{inactive}\}, a_k \in A_i \cup \{a_p\}), \exists s_k \in S_k, g(s_m, s_k) = 0$. In this situation, the separation between $t_{i-1}$ and $t_i$ still holds. Otherwise, the separation becomes invalid, and the two corresponding parts are merged into one. In addition, the agent added to the end of the left part is removed as well.
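The handling of an arrival can be sketched as below, again under the simplifying assumption that a separation link is valid exactly when all alternatives on its two sides can collaborate pairwise. The function name `merge_after_arrival` and the alternatives `a8` and `a9` are hypothetical. By Theorem 1, a leaving alternative never invalidates a separation link, so no merging logic is needed for that case.

```python
def merge_after_arrival(parts, i, new_alt, alternative_sets, can_collaborate):
    """Merge the parts whose separation link is broken when new_alt joins subtask i.

    parts: list of lists of consecutive 0-based subtask indices, in chain order.
    alternative_sets: alternatives per subtask before the arrival.
    Returns a new list of parts; the inputs are not modified.
    """
    # Link (i, i+1) stays valid if new_alt can hand over to every successor alternative.
    right_ok = i + 1 >= len(alternative_sets) or all(
        (new_alt, b) in can_collaborate for b in alternative_sets[i + 1])
    # Link (i-1, i) stays valid if every predecessor alternative can hand over to new_alt.
    left_ok = i == 0 or all(
        (a, new_alt) in can_collaborate for a in alternative_sets[i - 1])
    merged = [list(parts[0])]
    for part in parts[1:]:
        first = part[0]  # the separation link before this part is (first - 1, first)
        broken = (first == i and not left_ok) or (first == i + 1 and not right_ok)
        if broken:
            merged[-1].extend(part)
        else:
            merged.append(list(part))
    return merged


# Fig. 1 example after decomposition into [[t1], [t2, t3], [t4]] (0-based indices).
sets_before = [["a1", "a2"], ["a3", "a4"], ["a5", "a6"], ["a7"]]
pairs = {
    ("a1", "a3"), ("a1", "a4"), ("a2", "a3"), ("a2", "a4"),
    ("a3", "a5"), ("a4", "a6"), ("a5", "a7"), ("a6", "a7"),
    ("a8", "a3"),                      # hypothetical a8 for t1, works only with a3
    ("a5", "a9"), ("a6", "a9"),        # hypothetical a9 for t4, works with a5 and a6
}
parts_before = [[0], [1, 2], [3]]
after_a8 = merge_after_arrival(parts_before, 0, "a8", sets_before, pairs)
after_a9 = merge_after_arrival(parts_before, 3, "a9", sets_before, pairs)
```

The arrival of `a8` breaks the link between $t_1$ and $t_2$, so the first two parts are merged, whereas the arrival of `a9` leaves the decomposition unchanged and only the part containing $t_4$ needs to re-run belief propagation.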

3.3. Belief Updating and Message Passing

After pruning and decomposition, belief propagation is run in parallel on the pruned and decomposed parts of the network. The beliefs about all the states are initialized to 0. Inspired by [44], after receiving messages from all of its adjacent agents, agent $a_u$ updates its belief about its state $s_u^i$, denoted as $bel_u(s_u^i)$, by:

$$
bel_u(s_u^i) = uti(s_u^i) + \sum_{a_v \in N_u} m_{v \to u}(s_u^i) \tag{7}
$$

where $uti(s_u^i)$ is the unary utility of $s_u^i$, which was defined in Section 3.1, $N_u$ is the set of neighbors (i.e., adjacent agents) of $a_u$, and $m_{v \to u}(s_u^i)$ is the belief of $a_v$ about $s_u^i$ contained in the message passed from $a_v$ to $a_u$, which is defined next.

The message passed from $a_v$ to $a_u$, denoted by $m_{v \to u}$, contains a vector consisting of the beliefs of $a_v$ about all the states of $a_u$. The belief of $a_v$ about the state $s_u^i$ of $a_u$ contained in $m_{v \to u}$ is calculated by [44]:

$$
m_{v \to u}(s_u^i) = \max_{s_v^j} \Big( uti(s_v^j) + g(s_v^j, s_u^i) + \sum_{a_p \in N_v \setminus a_u} m_{p \to v}(s_v^j) \Big) \tag{8}
$$

where $uti(s_v^j)$ is the unary utility of the state $s_v^j$, $g(s_v^j, s_u^i)$ is the pairwise utility between $s_v^j$ and $s_u^i$, $N_v$ is the neighbor set of $a_v$, and $m_{p \to v}(s_v^j)$ is the belief of $a_p$ about the state $s_v^j$ contained in the message passed from $a_p$ to $a_v$.

Belief propagation converges when the belief values of all the agents about all their states remain the same as in the previous iteration.
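Equations (7) and (8) together with the convergence test can be sketched as a synchronous max-sum loop. This is an illustrative, centralized simulation of the message exchange on a toy two-agent network, not the distributed implementation evaluated in the paper. The function name `run_max_sum`, the state labels, and the unary utilities 0.6 and 0.5 are assumptions.

```python
import math


def run_max_sum(states, unary, pairwise, neighbors, max_iters=50):
    """Iterate Equations (7) and (8) until the beliefs stop changing.

    states[u]: list of states of agent u; unary[u][s]: unary utility of state s;
    pairwise(v, s_v, u, s_u): 0 or -inf; neighbors[u]: adjacent agents of u.
    """
    # messages[(v, u)][s_u]: belief of v about state s_u of u, initialised to 0.
    messages = {(v, u): {s: 0.0 for s in states[u]}
                for v in neighbors for u in neighbors[v]}
    beliefs = {u: {s: 0.0 for s in states[u]} for u in states}
    for _ in range(max_iters):
        new_messages = {}
        for (v, u) in messages:
            new_messages[(v, u)] = {
                s_u: max(  # Equation (8)
                    unary[v][s_v] + pairwise(v, s_v, u, s_u)
                    + sum(messages[(p, v)][s_v] for p in neighbors[v] if p != u)
                    for s_v in states[v])
                for s_u in states[u]}
        new_beliefs = {  # Equation (7)
            u: {s: unary[u][s] + sum(new_messages[(v, u)][s] for v in neighbors[u])
                for s in states[u]}
            for u in states}
        converged = new_beliefs == beliefs
        messages, beliefs = new_messages, new_beliefs
        if converged:
            break
    return beliefs


# Toy network: p1 may hand the task to p2 (labels and utilities are illustrative).
toy_states = {"p1": ["inactive", "to_p2"], "p2": ["inactive", "from_p1"]}
toy_unary = {"p1": {"inactive": 0.0, "to_p2": 0.6},
             "p2": {"inactive": 0.0, "from_p1": 0.5}}
toy_neighbors = {"p1": ["p2"], "p2": ["p1"]}


def toy_pairwise(v, s_v, u, s_u):
    # Compatible only if both agents are inactive or both are trading.
    return 0.0 if (s_v == "inactive") == (s_u == "inactive") else -math.inf


toy_beliefs = run_max_sum(toy_states, toy_unary, toy_pairwise, toy_neighbors)
toy_choice = {u: max(b, key=b.get) for u, b in toy_beliefs.items()}
```

On this toy network both agents end up preferring their active states, since trading yields a positive combined utility while the inactive states yield zero.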

3.4. Task Allocation

Upon the convergence of belief propagation, task allocation is performed according to the convergence results. If belief propagation has not converged even once by the time the task deadline arrives, task allocation fails. We emphasize 'even once' because the consumer may also re-run belief propagation after it has already converged, in an attempt to obtain a higher utility following the arrival of new alternatives. If the result of the re-run is not better than the supply chain already obtained, or the re-run has not converged when the task deadline arrives, the consumer adopts the chain obtained in the previous run of belief propagation. In this situation, task allocation is still considered successful. Upon the convergence of belief propagation, the alternatives whose states are not 'inactive' form the supply chain. As analyzed earlier, belief propagation may be run in parallel on multiple independent parts produced by the decomposition. In this situation, the integrated supply chain is obtained by merging the sub-chains obtained in all the independent parts. When belief propagation is re-run because of changes in the environment, the scope of the re-run is as analyzed in Section 3.2.3. Note that belief propagation need not be re-run for every change. For example, a newly arrived alternative may be pruned in the pruning phase before belief propagation starts.

To make the whole task allocation procedure clear, the procedure and a flowchart are presented in Algorithm 1 and Figure 3, respectively.

3.5. Time Complexity Analysis

Assume that there are $A$ agents (i.e., resource providers) connected with each subtask, $T$ subtasks connected with each agent, and $n$ agents in total for all the subtasks. According to the pruning and decomposition algorithms analyzed in Sections 3.2.1 and 3.2.2, respectively, the time complexity of pruning is $O(AT)$, and that of decomposition is $O((T-1)A^2)$. Provided that $n/p$ agents remain after the pruning and decomposition phases and the network is decomposed into $q$ sub-networks, the time complexity of message forming in PD-LBP is $O(GA^{2G+1}(n/p)(q/s))$. Therefore, the time complexity of PD-LBP at each iteration is $O(AT + (T-1)A^2 + GA^{2G+1}(n/p)(q/s))$. Because $G \ge 1$ and thus $2G+1 \ge 3$, the time complexity is consequently $O(GA^{2G+1}(n/p)(q/s))$.

4. Evaluation

4.1. Benchmarks

LBP is the method that PD-LBP is based on and tries to improve, and is thus picked as one of the evaluation benchmarks. Penya-Alba et al. proposed a Reduced Binary Loopy Belief Propagation based method (RB-LBP) in [44]. By encoding the network into a binary factor graph, RB-LBP has been experimentally shown to outperform LBP in terms of communication, computation, and memory requirements. Because RB-LBP is also an extension of LBP, it should not be ignored when PD-LBP is evaluated.

Algorithm 1: Task Allocation Procedure

We assume that the task to be allocated is T = {t1, t2, ..., tm}, the alternative set is A = {a1, a2, ..., an}, and Ai is the alternative set of ti.

Pruning
    for (i = 1...n)
        if (ai meets Equation (5) or (6))
            prune ai;
Decomposition
    for (i = 1...m-1)
        if (wdep(ti, ti+1) = 0)
            decompose the network at the connection between ti and ti+1;
Belief propagation is operated locally and in parallel in the decomposed parts
Dealing with dynamism after the convergence of belief propagation
    when (an alternative ai ∈ Ai leaves)
        belief propagation is locally re-operated in the part that includes ti;
    when (a new alternative ai ∈ Ai comes)
        check whether ai can be pruned;
        if yes,
            prune ai;
        if no,
            check whether the separation link between ti and ti+1 is still valid;
            if yes,
                only the part where the change happens needs to re-operate belief propagation;
            if no,
                the corresponding two parts are combined into one part, and belief propagation needs to be re-operated in the newly formed part;
            check whether the separation link between ti−1 and ti is still valid;
            if yes,
                only the part where the change happens needs to re-operate belief propagation;
            if no,
                the corresponding two parts are combined into one part, and belief propagation needs to be re-operated in the newly formed part;
Task allocation according to belief propagation result
    allocate subtasks upon and according to the convergence results of belief propagation.

Figure 3: Flowchart of the Task Allocation Procedure

The time complexity of LBP at each iteration is $O(nGA^{2G+1})$, and that of RB-LBP is $O(nGA^2)$. Obviously, $O(GA^{2G+1}(n/p)(q/s))$ is smaller than $O(nGA^{2G+1})$. We now compare the time complexities of PD-LBP and RB-LBP. In the task allocation problem addressed in this paper, each subtask is actually connected with two agents, the one it takes the task from and the one it passes the task to; thus G = 2 and the time complexity of PD-LBP is $O(GA^5(n/p)(q/s))$. When $ps \ge A^3 q$, $O(GA^5(n/p)(q/s)) \le O(nGA^2)$. This shows that, for a given A, the larger s is, the more likely it is that PD-LBP takes less time than RB-LBP. In other words, when a task contains many subtasks, PD-LBP theoretically outperforms RB-LBP; this is tested experimentally later.
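The condition $ps \ge A^3 q$ can be checked numerically. The Python sketch below evaluates only the leading terms of the three per-iteration complexities, with constant factors dropped; the function names and all numeric values are illustrative assumptions, and p is used as in Section 3.5 (n/p agents remain after pruning).

```python
def lbp_cost(n: float, A: float, G: int) -> float:
    """Leading term of LBP per iteration: n * G * A^(2G+1)."""
    return n * G * A ** (2 * G + 1)


def rb_lbp_cost(n: float, A: float, G: int) -> float:
    """Leading term of RB-LBP per iteration: n * G * A^2."""
    return n * G * A ** 2


def pd_lbp_cost(n: float, A: float, G: int,
                p: float, q: float, s: float) -> float:
    """Leading term of PD-LBP per iteration: G * A^(2G+1) * (n/p) * (q/s)."""
    return G * A ** (2 * G + 1) * (n / p) * (q / s)


def pd_not_slower_than_rb(A: float, p: float, q: float, s: float) -> bool:
    """Condition p*s >= A^3 * q, derived for G = 2."""
    return p * s >= A ** 3 * q
```

With G = 2, dividing $GA^5(n/p)(q/s)$ by $nGA^2$ gives $A^3 q/(ps)$, which is at most 1 exactly when $ps \ge A^3 q$. For the made-up values n = 800, A = 2, p = 2, q = 4, the two leading terms coincide at s = 16, and larger s favours PD-LBP.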

4.2. Evaluation Criteria and Settings

With the three goals of PD-LBP in mind (i.e., quick convergence of belief propagation, quick response to and resilience from changes in dynamic environments, and a small communication requirement), the evaluation criteria are determined accordingly. First, the time used to successfully allocate the task is measured. Second, the total bandwidth of all the agents at each iteration is measured to evaluate the communication requirement. Furthermore, some intermediate results are measured to evaluate the devised pruning and decomposition algorithms: the percentage of agents remaining after pruning is measured to evaluate the pruning algorithm, and the number of sub-networks after decomposition is measured to evaluate the decomposition algorithm.

Obtaining satisfactory performance in the face of environmental dynamism is an important motivation of PD-LBP. Consequently, the evaluation (which is implemented in the Java programming language) should be carried out under various dynamism levels (denoted by dyn) of the environment. In the evaluation, four different problems are generated: 50 subtasks and 200 agents (i.e., resource providers), 100 subtasks and 400 agents, 150 subtasks and 600 agents, and 200 subtasks and 800 agents. An indicator, a binary variable, is assigned to each provider to simulate the state of the provider. Specifically, the provider is in the environment when its indicator is 1 and not in the environment otherwise. We simulate the dynamism and openness of the environment, i.e., the leaving and entering of providers, by changing the values of the providers' indicators. In particular, a provider enters the environment when its indicator changes from 0 to 1, and leaves the environment when its indicator changes from 1 to 0. The indicators of all the providers are initialised to 1 at the start of the evaluation. The value of dyn is the total number of leaving and entering events of all the providers per time unit (i.e., 100 seconds). For simplicity, the dynamism is classified into the five levels listed in Table 3. The evaluation is performed at the five levels of dynamism for each of the four problems mentioned above.

The environment used to implement our simulation is as follows: our algorithm, PD-LBP, and the comparative algorithms, LBP and RB-LBP, are all implemented in the simulation environment OMNeT++ 4.0. The simulation is run on a Celeron(R) dual-core CPU running at 2.10 GHz with 2 GB of RAM, under Windows 7.

Table 3: Definition of Dynamism Level

Dynamism (dyn)    Level
(0, 2]            1
(2, 4]            2
(4, 6]            3
(6, 8]            4
(8, 10]           5
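The indicator-based simulation of dynamism described in Section 4.2 and the level mapping of Table 3 can be sketched in Python as follows. This is an illustrative sketch rather than the evaluation code: the event format `(time_in_seconds, provider_index)` and the function names are assumptions, while the initial indicator value of 1, the 100-second time unit, and the level boundaries come from the text.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

TIME_UNIT = 100.0  # seconds per time unit, as stated in Section 4.2


def apply_events(n_providers: int,
                 events: List[Tuple[float, int]]
                 ) -> Tuple[List[int], Dict[int, int]]:
    """Flip provider indicators and count dyn for each time unit.

    Every event toggles one indicator: 1 -> 0 means the provider leaves,
    0 -> 1 means it enters. All indicators start at 1.
    """
    indicators = [1] * n_providers
    dyn: Counter = Counter()
    for time, provider in sorted(events):
        indicators[provider] ^= 1
        dyn[int(time // TIME_UNIT)] += 1
    return indicators, dict(dyn)


def dynamism_level(dyn: float) -> int:
    """Map dyn in (0, 10] to Level 1..5 following Table 3."""
    if not 0 < dyn <= 10:
        raise ValueError("dyn must lie in (0, 10]")
    return math.ceil(dyn / 2)
```

For example, three leave or enter events inside the first 100 seconds give dyn = 3 for that time unit, which falls in (2, 4] and therefore corresponds to Level 2.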

4.3. Observations

Figure 4: Performance Based on Various Dynamism Levels when Nsub = 50 and n = 200

Fig. 4, 5, 6, and 7 depict the communication requirement of all the agents at each iteration and the problem solving time (i.e., the total time used to successfully allocate a task) obtained under various dynamism levels in the four different problems mentioned in Section 4.2, respectively. Fig. 8 presents some intermediate results, including the percentage of agents remaining after pruning (denoted by p) and the number of sub-networks after decomposition (denoted by q). Nsub and n denote the numbers of subtasks and agents, respectively.

From Fig. 4 (a), Fig. 5 (a), Fig. 6 (a), and Fig. 7 (a), it can be seen that the total bandwidths of all the agents per iteration of both RB-LBP and PD-LBP are lower than that of LBP. The general reason is that RB-LBP mitigates the communication requirement by encoding the TDN model into a binary factor graph in which each factor node is binary (i.e., has only two states). As for PD-LBP, the pruning and decomposition phases significantly decrease the communication requirement. It can also be seen that the higher the dynamism level is, the more PD-LBP outperforms both LBP and RB-LBP. This is because the advantage of the simplification (i.e., the pruning and decomposition) of PD-LBP becomes more obvious as the dynamism level increases. Specifically, when a change takes place, the already converged belief propagation needs to be re-operated through the whole network in both LBP and RB-LBP, while PD-LBP narrows the scope in which belief propagation must be re-operated through decomposition. As a result, the total bandwidth of PD-LBP does not increase as dramatically as those of LBP and RB-LBP when the dynamism level increases, as can be seen from Fig. 4 (a), Fig. 5 (a), Fig. 6 (a), and Fig. 7 (a).

Furthermore, when Nsub is 50 and the dynamism level is not high, the bandwidth of PD-LBP is slightly higher than that of RB-LBP. A likely explanation is that when Nsub is small, the advantage of pruning and decomposition in PD-LBP is not obvious compared with the binary nodes adopted in RB-LBP. However, with Nsub increasing from 50 to 200, PD-LBP outperforms RB-LBP in both communication and time used. Because fewer messages need to be calculated and passed, the convergence of belief propagation is accelerated. This explains why PD-LBP takes the shortest time compared with both LBP and RB-LBP, as shown in Fig. 4 (b), Fig. 5 (b), Fig. 6 (b), and Fig. 7 (b).

Notably, the evaluation is carried out in a simulation platform (i.e., OMNeT++) that is widely used for simulations due to its high fidelity, and the obtained problem solving times are acceptable; we therefore infer that the time cost is also acceptable in real applications. In summary, Fig. 4, 5, 6, and 7 show that, compared with LBP and RB-LBP, PD-LBP is more suitable for large-scale tasks (i.e., tasks containing more subtasks) and highly dynamic environments.


Figure 8: Performance Based on Dynamism Level

Fig. 8 presents the percentages of agents remaining after pruning (i.e., p) and the numbers of sub-networks after decomposition (i.e., q), which are statistically obtained at six time points during the task allocation procedure. From Fig. 8 (a), it can be seen that p varies from 0.3 to 0.67, and the largest value of q is 45 while the smallest is 22. In Fig. 8 (b), p varies from 0.32 to 0.56, and q varies from 33 to 50. Note that because p and q are statistically obtained at six time points, their values depend only on the unpredicted changes in the environment that take place at the corresponding time points and are thus statistical. The significantly decreased p can explain the largely alleviated communication and time requirements of PD-LBP presented in Fig. 4, 5, 6, and 7. In addition, from Fig. 8 (a), it can be predicted that when some agents come or leave, belief propagation in PD-LBP may only need to be re-operated in 1/22 to 1/45 of the network, while in both LBP and RB-LBP it needs to be re-operated through the whole network. In summary, Fig. 8 explains the shorter problem solving time and smaller communication requirement of PD-LBP in a more direct way.

5. Conclusion and Future Work

Aiming at improving the performance of task allocation (e.g., mitigating the communication requirement and shortening the problem solving time) in highly dynamic environments, PD-LBP is proposed in this paper. Compared with other state-of-the-art methods, the main innovation of our method is that it takes the dynamism and openness of the environment into consideration when allocating tasks. Accordingly, PD-LBP devises two phases, pruning and decomposition, to withstand the dynamism and openness, and has been experimentally shown to succeed. Specifically, the total bandwidth used by all the agents per iteration in PD-LBP is smaller than that of the two other state-of-the-art benchmarks. Furthermore, the problem solving time of PD-LBP is also shorter. The evaluation results demonstrate the two main contributions of PD-LBP: first, compared with state-of-the-art task allocation methods, PD-LBP works better in allocating large-scale tasks; second, PD-LBP works well in highly dynamic environments.

However, some assumptions that PD-LBP is based on do not always hold (e.g., providers quote truthfully and do not change their quotes throughout the whole task allocation process, and each agent can execute only one subtask). Our future work will therefore relax these assumptions by allowing for strategic quotes from providers and allowing an agent to perform more than one subtask. In addition, the lack of testing in real applications is a limitation of our work, and testing our method in the real world is also part of our future work.

6. Acknowledgement

This work is supported by the Chinese National Natural Science Foundation (Fund No. 61602254), the Jiangsu Province Natural Science Foundation, China (Fund No. BK2160968), and the Startup Foundation for Introducing Talent of NUIST (No. 2015r050) from Nanjing University of Information Science and Technology, China.

References

[1] M. J. Mataric, G. S. Sukhatme, E. H. Østergaard, Multi-robot task allocation in uncertain environments, Autonomous Robots 14 (2-3) (2003) 255–263.

[2] A. V. Chandak, B. Sahoo, A. K. Turuk, Heuristic task allocation strategies for computational grid.

[3] J. Kołodziej, F. Xhafa, Modern approaches to modeling user requirements on resource and task allocation in hierarchical computational grids, International Journal of Applied Mathematics and Computer Science 21 (2) (2011) 243–257.

[4] F. Xhafa, A. Abraham, Computational models and heuristic methods for grid scheduling problems, Future Generation Computer Systems 26 (4) (2010) 608–621.

[5] H. S. Kim, J. H. Cho, Supply chain formation using agent negotiation, Decision Support Systems 49 (1) (2010) 77–90.

[6] W. E. Walsh, M. P. Wellman, F. Ygge, Combinatorial auctions for supply chain formation, in: Proceedings of the 2nd ACM Conference on Electronic Commerce, EC ’00, ACM, New York, NY, USA, 2000, pp. 260–269. doi:10.1145/352871.352900. URL http://doi.acm.org/10.1145/352871.352900

[7] T. Penya-Alba, From supply chain formation to multi-agent coordination, in: Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS ’13, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 2013, pp. 1447–1448. URL http://dl.acm.org/citation.cfm?id=2484920.2485269

[8] T. Penya-Alba, M. Vinyals, J. Cerquides, J. A. Rodriguez-Aguilar, A scalable message-passing algorithm for supply chain formation, in: AAAI, 2012.

[9] S. Ramchurn, A. Farinelli, K. Macarthur, N. Jennings, Decentralized coordination in robocup rescue, The Computer Journal 53 (9) (2010) 1447–1461.

[10] P. R. Ferreira Jr, F. Dos Santos, A. L. Bazzan, D. Epstein, S. J. Waskow, Robocup rescue as multiagent task allocation among teams: experiments with task interdependencies, Autonomous Agents and Multi-Agent Systems 20 (3) (2010) 421–443.

[11] R. Nair, T. Ito, M. Tambe, S. Marsella, Task allocation in the robocup rescue simulation domain: A short note, in: RoboCup 2001: Robot Soccer World Cup V, Springer, 2002, pp. 751–754.

[12] S. Ramchurn, M. Polukarov, A. Farinelli, C. Truong, N. Jennings, Coalition formation with spatial and temporal constraints, in: Proc. of AAMAS, 2010, pp. 1181–1188.

[13] A. Chapman, R. Micillo, R. Kota, N. Jennings, Decentralised dynamic task allocation: a practical game-theoretic approach, in: Proc. of AAMAS, 2009, pp. 915–922.

[14] B. An, V. Lesser, D. Westbrook, M. Zink, Agent-mediated multi-step optimization for resource allocation in distributed sensor networks, in: AAMAS, 2011, pp. 609–616.

[15] B. An, V. Lesser, K. M. Sim, Strategic agents for multi-resource negotiation, Autonomous Agents and Multi-Agent Systems 23 (1) (2011) 114–153.

[16] B. An, V. Lesser, D. Irwin, M. Zink, Automated negotiation with decommitment for dynamic resource allocation in cloud computing, in: Proc. of AAMAS, 2010, pp. 981–988.

[17] Amazon elastic compute cloud (ec2). URL http://aws.amazon.com/ec2/

[18] Geni system overview. URL http://www.geni.net/

[19] K. Kc, K. Anyanwu, Scheduling hadoop jobs to meet deadlines, in: Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, IEEE, 2010, pp. 388–392.

[20] R. I. Davis, A. Burns, A survey of hard real-time scheduling for multiprocessor systems, ACM Computing Surveys (CSUR) 43 (4) (2011) 35.

[21] P. Minder, S. Seuken, A. Bernstein, M. Zollinger, Crowdmanager-combinatorial allocation and pricing of crowdsourcing tasks with time constraints, in: Workshop on Social Computing and User Generated Content in conjunction with ACM Conference on Electronic Commerce (ACM-EC 2012), 2012.

[22] S. Ponda, J. Redding, H.-L. Choi, J. P. How, M. Vavrina, J. Vian, Decentralized planning for complex missions with dynamic communication constraints, in: American Control Conference (ACC), 2010, IEEE, 2010, pp. 3998–4003.

[23] P. V. Krishna, Honey bee behavior inspired load balancing of tasks in cloud computing environments, Applied Soft Computing 13 (5) (2013) 2292–2303.

[24] C. Li, L. Li, Energy constrained resource allocation optimization for mobile grids, Journal of Parallel and Distributed Computing 70 (3) (2010) 245–258.

[25] J. Collins, W. Ketter, M. Gini, B. Mobasher, A multi-agent negotiation testbed for contracting tasks with temporal and precedence constraints, International Journal of Electronic Commerce 7 (2002) 35–58.

[26] N. Gatti, D. Giunta, S. Marino, Alternating-offers bargaining with one-sided uncertain deadlines: an efficient algorithm, Artificial Intelligence 172 (8) (2008) 1119–1157. doi:10.1016/j.artint.2007.11.007. URL http://dl.acm.org/citation.cfm?id=1354945.1355159

[27] N. Jennings, P. Faratin, A. Lomuscio, S. Parsons, M. Wooldridge, C. Sierra, Automated negotiation: prospects, methods and challenges, Group Decision and Negotiation 10 (2) (2001) 199–215.

[28] J. Cerquides, U. Endriss, A. Giovannucci, J. A. Rodríguez-Aguilar, Bidding languages and winner determination for mixed multi-unit combinatorial auctions, Institute for Logic, Language and Computation (ILLC), University of Amsterdam, 2006.

[29] W. E. Walsh, M. P. Wellman, Decentralized supply chain formation: A market protocol and competitive equilibrium analysis, J. Artif. Intell. Res. (JAIR) 19 (2003) 513–567.

[30] C. Li, K. Sycara, A. Scheller-Wolf, Combinatorial coalition formation for multi-item group-buying with heterogeneous customers, Decision Support Systems 49 (1) (2010) 1–13.

[31] N. Edalat, W. Xiao, N. Roy, S. K. Das, M. Motani, Combinatorial auction-based task allocation in multi-application wireless sensor networks, in: The 9th International Conference on Embedded and Ubiquitous Computing (EUC), 2011, pp. 174–181.

[32] S. Kraus, O. Shehory, G. Taase, Coalition formation with uncertain heterogeneous information, in: Proc. of AAMAS, 2003, pp. 1–8.

[33] M. Winsper, M. Chli, Decentralized supply chain formation using max-sum loopy belief propagation, Computational Intelligence 29 (2) (2013) 281–309.

[34] F. Ramezani, J. Lu, J. Taheri, F. K. Hussain, Evolutionary algorithm-based multi-objective task scheduling optimization model in cloud environments, World Wide Web 18 (6) (2015) 1737–1757.

[35] J. Jiang, J. Lu, G. Zhang, G. Long, Optimal cloud resource auto-scaling for web applications, in: Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on, IEEE, 2013, pp. 58–65.

[36] D. Ergu, G. Kou, Y. Peng, Y. Shi, Y. Shi, The analytic hierarchy process: task scheduling and resource allocation in cloud computing environment, The Journal of Supercomputing 64 (3) (2013) 835–848.

[37] S. F. Piraghaj, R. N. Calheiros, J. Chan, A. V. Dastjerdi, R. Buyya, Virtual machine customization and task mapping architecture for efficient allocation of cloud data center resources, The Computer Journal (2015) bxv106.

[38] F. Guo, L. Yu, S. Tian, J. Yu, A workflow task scheduling algorithm based on the resources’ fuzzy clustering in cloud computing environment, International Journal of Communication Systems 28 (6) (2015) 1053–1067.

[39] R. Kindermann, J. L. Snell, et al., Markov random fields and their applications, Vol. 1, American Mathematical Society Providence, RI, 1980.

[40] M. Iosifescu, Finite Markov processes and their applications, Courier Corporation, 2014.

[41] W. Edwards, How to use multiattribute utility measurement for social decisionmaking, Systems, Man and Cybernetics, IEEE Transactions on 7 (5) (1977) 326–340.

[42] D. Von Winterfeldt, W. Edwards, et al., Decision analysis and behavioral research, Vol. 604, Cambridge University Press Cambridge, 1986.

[43] Y. Kim, M. Krainin, V. Lesser, Effective variants of the max-sum algorithm for radar coordination and scheduling, in: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology-Volume 02, IEEE Computer Society, 2011, pp. 357–364.

[44] T. Penya-Alba, M. Vinyals, J. Cerquides, J. A. Rodriguez-Aguilar, A scalable message-passing algorithm for supply chain formation, in: AAAI, 2012.
