
Practical Load Balancing for Content Requests in Peer-to-Peer Networks

Mema Roussopoulos    Mary Baker
Department of Computer Science

Stanford University
Stanford, California, 94305

{mema, mgbaker}@cs.stanford.edu

http://mosquitonet.stanford.edu/

Abstract—This paper studies the problem of load-balancing the demand for content in a peer-to-peer network across heterogeneous peer nodes that hold replicas of the content. Previous decentralized load-balancing techniques in distributed systems base their decisions on periodic updates containing information about load or available capacity observed at the serving entities. We show that these techniques do not work well in the peer-to-peer context; either they do not address peer node heterogeneity, or they suffer from significant load oscillations. We propose a new decentralized algorithm, Max-Cap, based on the maximum inherent capacities of the replica nodes, and show that unlike previous algorithms, it is not tied to the timeliness or frequency of updates. Yet Max-Cap can handle the heterogeneity of a peer-to-peer environment without suffering from load oscillations.

I. Introduction

Peer-to-peer networks are becoming a popular architecture for content distribution [Ora01]. The basic premise in such networks is that any one of a set of “replica” nodes can provide the requested content, increasing the availability of interesting content without requiring the presence of any particular serving node.

Many peer-to-peer networks push index entries throughout the overlay peer network in response to lookup queries for specific content [gnu], [RFH+01], [RD01], [SMK+01], [ZKJ01]. These index entries point to the locations of replica nodes where the particular content can be served, and are typically cached for a finite amount of time, after which they are considered stale. Until now, however, there has been little focus on how an individual peer node should choose among the returned index entries to forward client requests.

One reason for considering this choice is load balancing. Some replica nodes may have more capacity to answer queries for content than others, and the system can serve content in a more timely manner by directing queries to more capable replica nodes.

In this paper we explore the problem of load-balancing the demand for content in a peer-to-peer network. This problem is challenging for several reasons. First, in the peer-to-peer case there is no centralized dispatcher that performs the load-balancing of requests; each peer node individually makes its own decision on how to allocate incoming requests to replicas. Second, nodes do not typically know the identities of all other peer nodes in the network, and therefore they cannot coordinate this decision with those other nodes. Finally, replica nodes in peer-to-peer networks are not necessarily homogeneous. Some replica nodes may be very powerful with great connectivity, whereas others may have limited inherent capacity to handle content requests.

Previous load-balancing techniques in the literature base their decisions on periodic or continuous updates containing information on load or available capacity. We refer to this information as load-balancing information (LBI). These techniques have not been designed with peer-to-peer networks in mind and thus

• do not take into account the heterogeneity of peer nodes (e.g., [GC00], [Mit97]), or

• use techniques such as migration or handoff of tasks that cannot be used in a peer-to-peer environment (e.g., [LL96]), or

• suffer from significant load oscillations, or “herd behavior” [Mit97], where peer nodes simultaneously forward an unpredictable number of requests to replicas with low reported load or high reported available capacity, causing them to become overloaded. This herd behavior defeats the attempt to provide load-balancing.

Most of these techniques also depend on the timeliness of LBI updates. The wide-area nature of peer-to-peer networks and the variation in transfer delays among peer nodes make guaranteeing the timeliness of updates difficult. Peer nodes will experience varying degrees of staleness in the LBI updates they receive depending on their distance from the source of updates. Moreover, maintaining the timeliness of LBI updates is costly, since all updates must travel across the Internet to reach interested peer nodes. The smaller the inter-update period and the larger the overlay peer network, the greater the network traffic overhead incurred by LBI updates. Therefore, in a peer-to-peer environment, an effective load-balancing algorithm should not be critically dependent on the timeliness of updates.

In this paper we propose a practical load-balancing algorithm, Max-Cap, that makes decisions based on the inherent maximum capacities of the replica nodes. We define maximum capacity as the maximum number of content requests per time unit that a replica claims it can handle. Alternative measures such as maximum (allowed) connections can be used. The maximum capacity is like a contract by which the replica agrees to abide. If the replica cannot sustain its advertised rate, then it may choose to advertise a new maximum capacity. Max-Cap is not critically tied to the timeliness or frequency of LBI updates, and as a result, when applied in a peer-to-peer environment, it outperforms algorithms based on load or available capacity, whose benefits are heavily dependent on the timeliness of the updates.

We show that Max-Cap takes peer node heterogeneity into account, unlike algorithms based on load. While algorithms based on available capacity take heterogeneity into account, we show that they can suffer from load oscillations in a peer-to-peer network in the presence of small fluctuations in the workload, even when the workload request rate is well below the total maximum capacities of the replicas. On the other hand, Max-Cap avoids overloading replicas in such cases and is more resilient to very large fluctuations in workload. This is because a key advantage of Max-Cap is that it uses information that is not affected by changes in the workload.

Since it is most probable that each replica node will run other applications besides the peer-to-peer content distribution application, Max-Cap must also be able to handle fluctuations in “extraneous load” observed at the replicas. This is load caused by external factors such as other applications the users of the replica node are running or network conditions occurring at the replica node.

We modify Max-Cap to perform load-balancing using the “honored maximum capacity” of each replica. This is the maximum capacity minus the extraneous load observed at the replica. Although the honored maximum capacities may change frequently, the changes are independent of fluctuations in the content request workload. As a result, Max-Cap continues to provide better load-balancing than availability-based algorithms even when there are large fluctuations in the extraneous load.

In a peer-to-peer environment the expectation is that the set of participating nodes changes constantly. Since replica arrivals to and departures from the peer network can affect the information carried in LBI updates, we also compare Max-Cap against availability-based algorithms when the set of replicas continuously changes. We show that Max-Cap is less affected by changes in the replica set than the availability-based algorithms.

We evaluate load-based and availability-based algorithms and compare them with Max-Cap in the context of CUP [RB02], a protocol that asynchronously builds and maintains caches of index entries in peer-to-peer networks through Controlled Update Propagation. The index entries for a particular content contain IP addresses that point to replica nodes serving the content. Load-balancing decisions are made from amongst these cached indices to determine to which of the replica nodes a request for that content should be forwarded. CUP periodically propagates updates of desired index entries down a conceptual tree (similar to an application-level multicast tree) whose vertices are interested peer nodes. We leverage CUP's propagation mechanism by piggybacking LBI such as load or available capacity onto the updates CUP propagates.
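As a rough illustration of the idea (the paper does not specify a message format, so the field names below are hypothetical), an update pushed down the CUP tree might carry the LBI alongside the index entry it refreshes:

    from dataclasses import dataclass

    @dataclass
    class IndexUpdate:
        """Hypothetical CUP update message with piggybacked LBI."""
        content_key: str    # content item the index entry refers to, e.g. "K1"
        replica_addr: str   # IP address of a replica node serving the content
        kind: str           # "birth", "refresh", or "invalidation"
        lbi: float          # piggybacked load-balancing information: load,
                            # available capacity, or maximum capacity,
                            # depending on the algorithm in use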

The rest of this paper is organized as follows. Section II briefly describes the CUP protocol and how we use it to propagate the load-balancing information necessary to implement the various load-balancing algorithms across replica nodes. Section III introduces the algorithms compared. Section IV presents experimental results showing that in a peer-to-peer environment, Max-Cap outperforms the other algorithms with much less or no overhead. Section V describes related work, and Section VI concludes the paper.

II. CUP Protocol Design

In this section we briefly describe how we leverage the CUP protocol to study the load-balancing problem in a peer-to-peer context. CUP is a protocol for maintaining caches of index entries in peer-to-peer networks through Controlled Update Propagation.

CUP supports both structured and unstructured networks. In structured networks, lookup queries for particular content follow a well-defined path from the querying node toward an authority node, which is guaranteed to know the location of the content within the network. In unstructured networks, lookup queries are flooded haphazardly throughout the network until a node that knows the location of the content is met. In this paper, we describe how CUP works within structured networks [RFH+01], [RD01], [SMK+01], [ZKJ01].

In CUP every node in the peer-to-peer network maintains two logical channels per neighbor: a query channel and an update channel. The query channel is used to forward lookup queries for content of interest to the neighbor that is closest to the authority node for that content. The update channel is used to forward query responses asynchronously to a neighbor. These query responses contain sets of index entries that point to nodes holding the content in question. The update channel is also used to update the index entries that are cached at the neighbor.

Figure 1 shows a snapshot of CUP in progress in a network of seven nodes. The four logical channels are shown between each pair of nodes. The left half of each node shows the set of content items for which the node is the authority. The right half shows the set of content items for which the node has cached index entries as a result of handling lookup queries. For example, node A is the authority node for content K1, and nodes C, D, E, F, and G have cached index entries for content K1. The process of querying and updating index entries for a particular content k forms a CUP tree whose root is the authority node for content k. The branches of the tree are formed by the paths traveled by lookup queries from other nodes in the network. For example, in Figure 1, node A is the root of the CUP tree for K1, and branch {F, D, C, A} has grown as a result of a lookup query for K1 at node F.

It is the authority node A for content K1 which is guaranteed to know the location of all nodes, called content replica nodes or simply replicas, that serve content K1. Replica nodes first send birth messages to authority A to indicate they are serving content K1. They may also send periodic refreshes or invalidation messages to A to indicate they are still serving or no longer serving the content. A then forwards on any birth, refresh, or invalidation messages it receives, which are propagated down the CUP tree to all interested nodes in the network. For example, in Figure 1 any update messages for index entries associated with content K1 that arrive at A from replica nodes are forwarded down the K1 CUP tree to C at level 1, D and E at level 2, and F and G at level 3.

CUP has been extensively studied in [RB02]. While the specific update propagation protocol CUP uses has been shown to provide benefits such as greatly reducing the latency of lookup queries, the specific CUP protocol semantics are not required for the purposes of load-balancing. We simply leverage the update propagation mechanism of CUP to push LBI such as replica load or capacity to interested peer nodes throughout the overlay network. These peer nodes can then use this information when choosing to which replica a client request should be forwarded.


[Figure 1 depicts seven nodes, A through G, with the four logical channels shown between each pair of neighbors; each node is labeled with the content items (K1-K9) for which it is the authority and those for which it caches index entries.]

Fig. 1. CUP Trees

III. The Algorithms

We evaluate two different algorithms, Inv-Load and Avail-Cap. Each is representative of a different class of algorithms that have been proposed in the distributed systems literature. We study how these algorithms perform when applied in a peer-to-peer context and compare them with our proposed algorithm, Max-Cap. These three algorithms depend on different LBI being propagated, but their overall goal is the same: to balance the demand for content fairly across the set of replicas providing the content. In particular, the algorithm should avoid overloading some replicas while underloading others, especially when the aggregate capacity of all replicas is enough to handle the content request workload. Moreover, the algorithm should prevent individual replicas from oscillating between being overloaded and underloaded.

Oscillation is undesirable for two reasons. First, many applications limit the number of requests a host can have outstanding. This means that when a replica node is overloaded, it will drop any request it receives. This forces the requesting client to resend its request, which has a negative impact on response time. Even for applications that allow requests to be queued while a replica node is overloaded, the queueing delay incurred will also increase the average response time. Second, in a peer-to-peer network, the issue of fairness is sensitive. The owners of replica nodes are likely not to want their nodes to be overloaded while other nodes in the network are underloaded. An algorithm that can fairly distribute the request workload without causing replicas to oscillate between being overloaded and underloaded is preferable.

We describe each of the algorithms we evaluate in turn:

Allocation Proportional to Inverse Load (Inv-Load). There are many load-balancing algorithms that base the allocation decision on the load observed at and reported by each of the serving entities (see the Related Work, Section V). The representative load-based algorithm we examine in this paper is Inv-Load, based on the algorithm presented by Genova et al. [GC00]. In this algorithm, each peer node in the network chooses to forward a request to a replica with probability inversely proportional to the load reported by the replica. This means that the replica with the smallest reported load (as of the last report received) will receive the most requests from the node. Load is defined as the number of request arrivals at the replica per time unit. Other possible load metrics include the number of request connections open at the replica at reporting time [AB00] or the request queue length at the replica [Dah99].

The Inv-Load algorithm has been shown to perform as well as or better than other proposed algorithms in a homogeneous environment, and for this reason we focus on this algorithm in this study. As we will see in Section IV-A, Inv-Load is not designed to handle replica node heterogeneity.

Allocation Proportional to Available Capacity (Avail-Cap). In this algorithm, each peer node chooses to forward a request to a replica with probability proportional to the available capacity reported by the replica. Available capacity is the maximum request rate a replica can handle minus the load (actual request rate) experienced at the replica. This algorithm is based on the algorithm proposed by Zhu et al. [ZYZ+98] for load sharing in a cluster of heterogeneous servers. Avail-Cap takes into account heterogeneity because it distinguishes between nodes that experience the same load but have different maximum capacities.

Intuitively, Avail-Cap seems like it should work; it handles heterogeneity by sending more requests to the replicas that are currently more capable. Replicas that are overloaded report an available capacity of zero and are excluded from the allocation decision until they once more report a positive available capacity. Unfortunately, as we will show in Section IV-B, this exclusion can cause Avail-Cap to suffer from wild load oscillations.

Both Inv-Load and Avail-Cap implicitly assume that the load or available capacity reported by a replica remains roughly constant until the next report. Since both these metrics are directly affected by changes in the request workload, both algorithms require that replicas periodically update their LBI. (We assume replicas are not synchronized in when they send reports.) Decreasing the period between two consecutive LBI updates increases the timeliness of the LBI at a cost of higher overhead (in number of updates pushed through the peer-to-peer network). In large peer-to-peer networks, there may be several levels in the CUP tree down which updates will have to travel, and the time to do so could be on the order of seconds.

Allocation Proportional to Maximum Capacity (Max-Cap). This is the algorithm we propose. In this algorithm, each peer node chooses to forward a request to a replica with probability proportional to the maximum capacity of the replica. The maximum capacity is a contract each replica advertises indicating the number of requests the replica claims to handle per time unit. Unlike load and available capacity, the maximum capacity of a replica is not affected by changes in the content request workload. Therefore, Max-Cap does not depend on the timeliness of the LBI updates. In fact, replicas only push updates down the CUP tree when they choose to advertise a new maximum capacity. This choice depends on extraneous factors that are unrelated to and independent of the workload (see Section IV-D). If replicas rarely choose to change contracts, Max-Cap incurs near-zero overhead. We believe that this independence from the timeliness and frequency of updates makes Max-Cap practical and elegant for use in peer-to-peer networks.
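To make the three allocation rules concrete, the sketch below (Python; our illustration, not the authors' simulator code) implements the weighted random choice each peer node performs. The `+ 1.0` smoothing for Inv-Load and the fallback when every replica reports zero available capacity are our assumptions:

    import random

    def pick_replica(replicas, algorithm):
        """Choose a replica by weighted random selection.
        `replicas` maps replica ID -> (last reported load, maximum capacity)."""
        weights = {}
        for rid, (load, cap) in replicas.items():
            if algorithm == "inv-load":
                weights[rid] = 1.0 / (load + 1.0)    # inverse of reported load
            elif algorithm == "avail-cap":
                weights[rid] = max(cap - load, 0.0)  # reported available capacity
            else:                                    # "max-cap"
                weights[rid] = float(cap)            # advertised maximum capacity
        total = sum(weights.values())
        if total == 0.0:  # e.g. every replica reported zero available capacity
            return random.choice(list(weights))
        r = random.uniform(0.0, total)
        acc = 0.0
        for rid, w in weights.items():
            acc += w
            if r <= acc:
                return rid
        return rid  # guard against floating-point round-off

Note that only Inv-Load and Avail-Cap need the reported load to be fresh; for Max-Cap the weights change only when a replica advertises a new contract.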

IV. Experiments

In this section we describe experiments that measure the ability of the Inv-Load, Avail-Cap, and Max-Cap algorithms to balance requests for content fairly across the replicas holding the content. We simulate a content-addressable network (CAN) [RFH+01] using the Stanford Narses simulator [MGB01]. A CAN is an example of a structured peer-to-peer network, as defined in Section II. In each of these experiments, requests for a specific piece of content are posted at nodes throughout the CAN network for 3000 seconds. Using the CUP protocol described in Section II, a node that receives a content request from a local client retrieves a set of index entries pointing to replica nodes that serve the content. The node applies a load-balancing algorithm to choose one of the replica nodes. It then points the local client making the content request at the chosen replica.

The simulation input parameters include: the number of nodes in the overlay peer-to-peer network, the number of replica nodes holding the content of interest, the maximum capacities of the replica nodes, the distribution of content request inter-arrival times, a seed to feed the random number generators that drive the content request arrivals and the allocation decisions of the individual nodes, and the LBI update period, which is the amount of time each replica waits before sending the next LBI update for the Inv-Load and Avail-Cap algorithms.

We assign maximum capacities to replica nodes by applying results from recent work that measures the upload capabilities of nodes in Gnutella networks [SGG02]. This work found that for the Gnutella network measured, around 10% of nodes are connected through dial-up modems, 60% are connected through broadband connections such as cable modem or DSL where the upload speed is about ten times that of dial-up modems, and the remaining 30% have high-end connections with upload speed at least 100 times that of dial-up modems. Therefore we assign maximum capacities of 1, 10, and 100 requests per second to nodes with probability 0.1, 0.6, and 0.3, respectively.
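A minimal sketch of this assignment (Python; our illustration):

    import random

    def assign_max_capacity():
        """Sample a replica's maximum capacity (requests/second) from the
        Gnutella-derived mix: 10% dial-up, 60% broadband, 30% high-end."""
        u = random.random()
        if u < 0.1:
            return 1     # dial-up class
        if u < 0.7:
            return 10    # broadband class (cable/DSL)
        return 100       # high-end class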

In all the experiments we present in this paper, the number of nodes in the network is 1024, each individually deciding how to distribute its incoming content requests across the replica nodes. We use both Poisson and Pareto request inter-arrival distributions, both of which have been found to hold in peer-to-peer networks [Cao02], [Mar02].


We present five experiments. First we show that Inv-Load cannot handle heterogeneity. We then show that while Avail-Cap takes replica heterogeneity into account, it can suffer from significant load oscillations caused by even small fluctuations in the workload. We compare Max-Cap with Avail-Cap for both Poisson and bursty Pareto arrivals. We also compare the effect on the performance of Avail-Cap and Max-Cap when replicas continuously enter and leave the system. Finally, we study the effect on Max-Cap when replicas cannot always honor their advertised maximum capacities because of significant extraneous load.

A. Inv-Load and Heterogeneity

In this experiment, we examine the performance of Inv-Load in a heterogeneous peer-to-peer environment. We use a fairly short inter-update period of one second, which is quite aggressive in a large peer-to-peer network. We have ten replica nodes that serve the content item of interest, and we generate request rates for that item according to a Poisson process with an arrival rate that is 80% of the total maximum capacities of the replicas. Under such a workload, a good load-balancing algorithm should be able to avoid overloading some replicas while underloading others. Figure 2 shows a scatterplot of how the utilization of each replica proceeds with time when using Inv-Load. We define utilization as the request arrival rate observed by the replica divided by the maximum capacity of the replica. In this graph, we do not distinguish among points of different replicas. We see that throughout the simulation, at any point in time, some replicas are severely overutilized (over 250%) while others are lightly underutilized (around 25%).
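In symbols, writing $\lambda_i(t)$ for the request arrival rate observed by replica $i$ and $C_i$ for its maximum capacity, the plotted utilization is

$$ u_i(t) = \frac{\lambda_i(t)}{C_i}, $$

so a value above 1 (100%) means the replica is receiving requests faster than it claims to handle them.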

Figure 3 shows, for each replica, the percentage of all received requests that arrive while the replica is overloaded. This measurement gives a true picture of how well a load-balancing algorithm works for each replica. In Figure 3, the replicas that receive almost 100% of their requests while overloaded (i.e., replicas 0-6) are the low and middle-end replicas. The replicas that receive almost no requests while overloaded (i.e., replicas 7-9) are the high-end replicas. We see that Inv-Load penalizes the less capable replicas while giving the high-end replicas an easy time.


Fig. 2. Replica Utilization versus Time for Inv-Load with heterogeneous replicas.


Fig. 3. Percentage Overloaded Queries versus Replica ID for Inv-Load with heterogeneous replicas.

Inv-Load is designed to perform well in a homogeneous environment. When applied in a heterogeneous environment such as a peer-to-peer network, it fails. As we will see in the next section, Max-Cap is much better suited. Apart from showing that Max-Cap has comparable load-balancing capability with no overhead in a homogeneous environment (see the Appendix), we do not consider Inv-Load in the remaining experiments, as our focus here is on heterogeneous environments.

B. Avail-Cap versus Max-Cap

In this set of experiments we examine the performance of Avail-Cap and compare it with Max-Cap.

1) Poisson Request Arrivals: In Figures 4 and 5 we show the replica utilization versus time for an experiment with ten replicas and a Poisson request arrival rate that is 80% of the total maximum capacities of the replicas.



Fig. 4. Replica Utilization versus Time for Avail-Cap with heterogeneous replicas.

For Avail-Cap, we use an inter-update period of one second. For Max-Cap, this parameter is inapplicable since replica nodes do not send updates unless they experience extraneous load (see Section IV-D). We see that Avail-Cap consistently overloads some replicas while underloading others. In contrast, Max-Cap tends to cluster replica utilization at around 80%. We ran this experiment with a range of Poisson lambda rates and found similar results for rates that were 60-100% of the total maximum capacities of the replicas. Avail-Cap consistently overloads some replicas while underloading others, whereas Max-Cap clusters replica utilization at around X%, where X is the overall request rate divided by the total maximum capacities of the replicas.

It turns out that in Avail-Cap, unlike Inv-Load, it is not the same replicas that are consistently overloaded or underloaded throughout the experiment. Instead, from one instant to the next, individual replicas oscillate between being overloaded and severely underloaded.

We can see a sampling of this oscillation by looking at the utilizations of some individual replicas over time. In Figures 6-11, we plot the utilization over a one-minute period in the experiment for a representative replica from each of the replica classes (low, medium, and high maximum capacity). We also plot the ratio of the overall request rate to the total maximum capacities of the replicas and the line showing 100% utilization.


Fig. 5. Replica Utilization versus Time for Max-Cap with heterogeneous replicas.


Fig. 6. Low-end Replica Utilization versus Time for Avail-Cap, Poisson arrivals.

We see that for all replica classes, Avail-Cap suffers from significant oscillation compared with Max-Cap, which causes little or no oscillation. This behavior occurs throughout the experiment.

Figures 12 and 13 show the percentage of requests that arrive at each replica while the replica is overloaded for Avail-Cap and Max-Cap, respectively. We see that Max-Cap achieves much lower percentages than Avail-Cap.

We also see in Figure 13 that Max-Cap exhibits a step-like behavior where the low-capacity replica (replica 1) is overloaded for about 35% of its queries, the middle-capacity replicas (replicas 0 and 2-6) are each overloaded for about 14% of their queries, and the high-capacity replicas (replicas 7-9) are each overloaded for about 0.1% of their queries.



Fig. 7. Low-end Replica Utilization versus Time for Max-Cap, Poisson arrivals.


Fig. 8. Medium-end Replica Utilization versus Time for Avail-Cap, Poisson arrivals.


Fig. 9. Medium-end Replica Utilization versus Time for Max-Cap, Poisson arrivals.


Fig. 10. High-end Replica Utilization versus Time for Avail-Cap, Poisson arrivals.


Fig. 11. High-end Replica Utilization versus Time for Max-Cap, Poisson arrivals.


Fig. 12. Percentage Overloaded Queries versus Replica ID for Avail-Cap, with inter-update period of 1 second.



Fig. 13. Percentage Overloaded Queries versus Replica ID for Max-Cap.

To verify that this step effect is not a random coincidence, we ran a series of experiments, with ten replicas per experiment and Poisson arrivals at 80% of the total maximum capacity, each time varying the seed fed to the simulator. In Figure 14, we show the overloaded percentages for ten of these experiments. On the x-axis we order replicas according to maximum capacity, with the low-capacity replicas plotted first (replica IDs 1 through 10), followed by the middle-capacity replicas (replica IDs 11-70), followed by the high-capacity replicas (replica IDs 71-100). From the figure we see that the step behavior consistently occurs. This step behavior occurs because the lower-capacity replicas have less tolerance for noise in the random coin tosses the nodes perform while assigning requests. They also have less tolerance for small fluctuations in the request rate. As a result, lower-capacity replicas are overloaded more easily than higher-capacity replicas.
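One way to see why (our gloss; the paper states the effect without a derivation): under Max-Cap each of the $N$ requests arriving in a time window is sent to replica $i$ with fixed probability $p_i = C_i / \sum_j C_j$, so the count it receives is binomial with mean $N p_i$ and standard deviation $\sqrt{N p_i (1 - p_i)}$. The fluctuation relative to the mean,

$$ \frac{\sqrt{N p_i (1 - p_i)}}{N p_i} \approx \frac{1}{\sqrt{N p_i}}, $$

is therefore about ten times larger for a capacity-1 replica than for a capacity-100 replica, which is why the low-end replicas cross their proportionally smaller capacity share far more often.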

Figure 12 shows that Avail-Cap with an inter-update period of one second causes much higher percentages than Max-Cap (more than twice as high for the medium and high-end replicas). Avail-Cap also causes fairly even overloaded percentages at around 40%. Again, to verify this evenness, in Figure 15 we show, for a series of ten experiments, the percentage of requests that arrive at each replica while the replica is overloaded. We see that Avail-Cap consistently achieves roughly even percentages (at around 40%) across all replica types, in contrast to the step effect observed with Max-Cap. This can be explained by looking at the oscillations observed by replicas in Figures 6-11.


Fig. 14. Percentage Overloaded Queries versus Replica ID for Max-Cap for ten experiments.


Fig. 15. Percentage Overloaded Queries versus Replica ID for Avail-Cap with inter-update period of 1 second, for ten experiments.

In Avail-Cap, each replica is overloaded for roughly the same amount of time regardless of whether it is a low, medium, or high-capacity replica. This means that while each replica is getting the correct proportion of requests, it is receiving them at the wrong time, and as a result all the replicas experience roughly the same overloaded percentages. In Max-Cap, we see that replicas with lower maximum capacity are overloaded for more time than higher-capacity replicas. Consequently, higher-capacity replicas tend to have smaller overload percentages than lower-capacity replicas.

The performance of Avail-Cap is highly dependent on the inter-update period used. We find that as we increase the period and available capacity updates grow more stale, the performance of Avail-Cap suffers more.



Fig. 16. Percentage Overloaded Queries versus Replica ID for Avail-Cap with inter-update period of 10 seconds, for ten experiments.

As an example, in Figure 16 we show the overloaded query percentages in the same series of ten experiments for Avail-Cap with a period of ten seconds. The overloaded percentages jump up to about 80% across the replicas.

In a peer-to-peer environment, we argue that Max-Cap is a more practical choice than Avail-Cap. First, Max-Cap typically incurs no overhead. Second, Max-Cap can handle workload rates that are below 100% of the total maximum capacities and can handle small fluctuations in the workload, as are typical with Poisson arrivals.

A question remains: how do Avail-Cap and Max-Cap compare when workload rates fluctuate beyond the total maximum capacities of the replicas? Such a scenario can occur, for example, when requests are bursty, as when inter-request arrival times follow a Pareto distribution. We examine Pareto arrivals next.

2) Pareto Request Arrivals: Recent work has observed that in some peer-to-peer networks, request inter-arrivals exhibit burstiness on several time scales [Mar02], making the Pareto distribution a good candidate for modeling these inter-arrival times.

The Pareto distribution has two parameters associated with it: the shape parameter $\alpha > 0$ and the scale parameter $\beta > 0$. The cumulative distribution function of inter-arrival time durations is

$$ F(t) = 1 - \left(\frac{\beta}{t}\right)^{\alpha}, \qquad t \ge \beta. $$

This distribution is heavy-tailed with unbounded variance when $\alpha \le 2$. For $\alpha > 1$, the average number of query arrivals per time unit is equal to $(\alpha - 1)/(\alpha \beta)$. For $\alpha \le 1$, the expectation of an inter-arrival duration is unbounded and therefore the average number of query arrivals per time unit is 0.

Typically, Pareto request arrivals are characterized by frequent and intense bursts of requests followed by idle periods of varying lengths. During the bursts, the average request arrival rate can be many times the total maximum capacities of the replicas. We present a representative experiment in which $\alpha$ and $\beta$ are 1.1 and 0.000346, respectively. These particular settings cause bursts of up to 230% of the total maximum capacities of the replicas. With such intense bursts, no load-balancing algorithm can be expected to keep replicas underloaded. Instead, the best an algorithm can do is to have the oscillation observed in each replica's utilization match the oscillation of the ratio of overall request rate to total maximum capacities.
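For concreteness, inter-arrival times with this distribution can be drawn by inverse-transform sampling from the CDF above (a sketch in Python, not the authors' simulator code):

    import random

    ALPHA = 1.1       # shape parameter used in the representative experiment
    BETA = 0.000346   # scale parameter (seconds)

    def pareto_interarrival(alpha=ALPHA, beta=BETA):
        """Inverse-transform sample: solving u = F(t) = 1 - (beta/t)**alpha
        for t gives t = beta * (1 - u)**(-1/alpha), with t >= beta."""
        u = random.random()
        return beta * (1.0 - u) ** (-1.0 / alpha)

With these parameters the mean inter-arrival time is $\alpha\beta/(\alpha - 1) \approx 3.8$ ms, i.e. roughly 263 requests per second on average, but the heavy tail produces the intense bursts described above.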

In Figures 17-22 we plot the same representative replica utilizations over a one-minute period in the experiment. We also plot the ratio of the overall request rate to the total maximum capacities as well as the 100% utilization line. From the figures we see that Avail-Cap suffers from much wilder oscillation than Max-Cap, causing much higher peaks and lower valleys in replica utilization than Max-Cap. Moreover, Max-Cap adjusts better to the fluctuations in the request rate; the utilization curves for Max-Cap tend to follow the ratio curve more closely than those for Avail-Cap.

(Note that idle periods contribute to the drops in utilization of replicas in this experiment. For example, an idle period occurs between times 324 and 332, at which point we see a decrease in both the ratio and the replica utilization.)

3) Why Avail-Cap Can Suffer: From the experiments above we see that Avail-Cap can suffer from severe oscillation even when the overall request rate is well below (e.g., 80% of) the total maximum capacities of the replicas. The reason why Avail-Cap does not balance load well here is that a vicious cycle is created where the available capacity update of one replica affects a subsequent update of another replica. This in turn affects later allocation decisions made by nodes, which in turn affects later replica updates. This description becomes more concrete if we consider what happens when a replica is overloaded.



Fig. 17. Low-capacity Replica Utilization versus Time for Avail-Cap, Pareto arrivals.


Fig. 18. Low-capacity Replica Utilization versus Time for Max-Cap, Pareto arrivals.


Fig. 19. Medium-capacity Replica Utilization versus Time for Avail-Cap, Pareto arrivals.


Fig. 20. Medium-capacity Replica Utilization versus Time for Max-Cap, Pareto arrivals.


Fig. 21. High-capacity Replica Utilization versus Time for Avail-Cap, Pareto arrivals.


Fig. 22. High-capacity Replica Utilization versus Time for Max-Cap, Pareto arrivals.


In Avail-Cap, if a replica becomes overloaded, it reports an available capacity of zero. This report eventually reaches all peer nodes, causing them to stop redirecting requests to the replica. The exclusion of the overloaded replica from the allocation decision shifts the entire burden of the workload to the other replicas. This can cause other replicas to overload and report zero available capacity while the excluded replica experiences a sharp decrease in its utilization. This sharp decrease causes the replica to begin reporting positive available capacity, which begins to attract requests again. Since in the meantime other replicas have become overloaded and excluded from the allocation decision, the replica receives a flock of requests which cause it to become overloaded again. As we observed in previous sections, a replica can experience wild and periodic oscillation where its utilization continuously rises above its maximum capacity and falls sharply.

In Max-Cap, if a replica becomes overloaded, the overload condition is confined to that replica. The same is true in the case of underloaded replicas. Since the overload/underload situations of the replicas are not reported, they do not influence follow-up LBI updates of other replicas. It is this key property that allows Max-Cap to avoid herd behavior.

There are situations, however, where Avail-Cap performs well without suffering from oscillation (see Section IV-C). We next describe the factors that affect the performance of Avail-Cap to get a clearer picture of when the reactive nature of Avail-Cap is beneficial (or at least not harmful) and when it causes oscillation.

4) Factors Affecting Avail-Cap: There are four factors that affect the performance of Avail-Cap: the inter-update period U, the inter-request period R, the amount of time D it takes for all nodes in the network to receive the latest update from a replica, and the ratio of the overall request rate to the total maximum capacities of the replicas. We examine these factors by considering three cases:

Case 1: U is much smaller than R (U ≪ R), and D is sufficiently small that when a replica pushes an update, all nodes in the CUP tree receive the update before the next request arrival in the network. In this case, Avail-Cap performs well since all nodes have the latest load-balancing information whenever they receive a request.

Case 2: U is long relative to R (R ≪ U) and the overall request rate is less than about 60% of the total maximum capacities of the replicas. (This 60% threshold is specific to the particular configuration of replicas we use: 10% low, 60% medium, 30% high. Other configurations have different threshold percentages that are typically well below the total maximum capacities of the replicas.) In this case, when a particular replica overloads, the remaining replicas are able to cover the proportion of requests intended for the overloaded replica because there is a lot of extra capacity in the system. As a result, Avail-Cap avoids oscillations. We see experimental evidence for this in Section IV-C. However, over-provisioning to have enough extra capacity in the system so that Avail-Cap can avoid oscillation in this particular case seems a high price to pay for load stability.

Case 3: U is long relative to R (R ≪ U) and the overall request rate is more than about 60% of the total maximum capacities of the replicas. In this case, as we observe in the experiments above, Avail-Cap can suffer from oscillation. This is because every request that arrives directly affects the available capacity of one of the replicas. Since the request rate is greater than the update rate, an update becomes stale shortly after a replica has pushed it out. However, the replica does not inform the nodes of its changing available capacity until the end of its current update period. By that point many requests have arrived and have been allocated using the previous, stale available capacity information.

In Case 3, Avail-Cap can suffer even if D were zero and updates were to arrive at all nodes immediately after being issued. This is because all nodes would simultaneously exclude an overloaded replica from the allocation decision until the next update is issued. As D increases, the staleness of the report only worsens the performance of Avail-Cap.

In a large peer-to-peer network (more than 1000 nodes) we expect that D will be on the order of seconds, since current peer-to-peer networks with more than 1000 nodes have diameters ranging from a handful to several hops [RF02]. We consider U = 1 second to be as small (and aggressive) an inter-update period as is practical in a peer-to-peer network. In fact, even one second may be too aggressive due to the overhead it generates. This means that when particular content experiences high popularity, we expect that typically R ≪ U ≤ D. Under such circumstances Avail-Cap is not a good load-balancing choice. For less popular content, where U ≪ R, Avail-Cap is a feasible choice, although it is unclear whether load-balancing across the replicas is as urgent here, since the request rate is low.
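To put numbers on this (an illustrative calculation from this paper's own parameters, not a figure reported in the paper): ten replicas drawn from the 1/10/100 requests-per-second capacity mix have an expected total maximum capacity of $10 \times (0.1 \cdot 1 + 0.6 \cdot 10 + 0.3 \cdot 100) = 361$ requests per second, so an 80% workload is roughly 289 requests per second, giving

$$ R \approx \frac{1}{289 \text{ req/s}} \approx 3.5 \text{ ms} \ll U = 1 \text{ s}. $$

The popular-content experiments in this paper therefore sit squarely in Case 3.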

The performance of Max-Cap is independent of the values of U, R, and D. More importantly, Max-Cap does not require continuous updates; replicas issue updates only if they choose to re-issue new contracts to report changes in their maximum capacities (see Section IV-D). Therefore, we believe that Max-Cap is a more practical choice in a peer-to-peer context than Avail-Cap.

C. Dynamic Replica Set

A key characteristic of peer-to-peer networks is that they are subject to constant change; peer nodes continuously enter and leave the system. In this experiment we compare Max-Cap with Avail-Cap when replicas enter and leave the system. We present results here for a Poisson request arrival rate that is 80% of the total maximum capacities of the replicas.

We present two dynamic experiments. In both experiments, the network starts with ten replicas, and after a period of 600 seconds, movement into and out of the network begins. In the first experiment, one replica leaves and one replica enters the network every 60 seconds. In the second and much more dynamic experiment, five replicas leave and five replicas enter the network every 60 seconds. The replicas that leave are randomly chosen. The replicas that enter the network enter with maximum capacities of 1, 10, and 100 with probability 0.10, 0.60, and 0.30, respectively, as in the initial allocation. This means that the total maximum capacity of the active replicas in the network varies throughout the experiment, depending on the capacities of the entering replicas.

Figures 23 and 24 show, for the first dynamic experiment, the utilization of active replicas through time as observed for Avail-Cap and Max-Cap.


Fig. 23. Replica Utilization versus Time for Avail-Cap with a dynamic replica set. One replica enters and leaves every 60 seconds.


Fig. 24. Replica Utilization versus Time for Max-Cap with a dynamic replica set. One replica enters and leaves every 60 seconds.

Note that points with zero utilization indicate newly entering replicas. The jagged line plots the ratio of the current sum of maximum capacities in the network, SumMax_t, to the original sum of maximum capacities, SumMax_0. With each change in the replica set, the replica utilizations for both Avail-Cap and Max-Cap change. Replica utilizations rise when SumMax_t falls, and vice versa.

From the figure we see that between times 1000 and 1820, SumMax_t is between 1.75 and 2 times SumMax_0, and is more than double the overall workload rate of 0.8 SumMax_0. During this time period, Avail-Cap performs quite well because the workload rate is not very demanding and there is plenty of extra capacity in the system (Case 2 above). However, when at time 1940 SumMax_t falls back to SumMax_0, we see that both algorithms exhibit the same behavior as they do at the start, between times 0 and 600. Max-Cap readjusts nicely and clusters replica utilization at around 80%, while Avail-Cap starts to suffer again.



Fig. 25. Percentage Overloaded Queries versus Replica ID for Avail-Cap with a dynamic replica set. One replica enters and leaves every 60 seconds.

Figures 25 and 26 show, for the first dynamic experiment, the percentage of queries that were received by each replica while the replica was overloaded, for Avail-Cap and Max-Cap. Replicas that entered and departed the network throughout the simulation were chosen from a pool of 50 replicas. Those replicas in the pool that did not participate in this experiment do not have a bar associated with their ID in the figure. From the figure, we see that Max-Cap achieves smaller overloaded query percentages across all replica IDs.

Figures 27 and 28 show the utilization scatterplot and Figures 29 and 30 show the overloaded query percentage for the second dynamic experiment. We see that changing half the replicas every 60 seconds can dramatically affect SumMax_t. For example, when SumMax_t drops to 0.2 SumMax_0 at time 2161, we see the utilizations rise dramatically for both Avail-Cap and Max-Cap. This is because during this period the workload rate is four times SumMax_t. However, by time 2401, SumMax_t has risen again, which allows both Avail-Cap and Max-Cap to adjust and decrease the replica utilization. At the next replica set change at time 2461, SumMax_t equals SumMax_0. During the next minute we see that Max-Cap overloads very few replicas whereas Avail-Cap does not recuperate as well. Similarly, when examining the overloaded query percentages we see that Max-Cap achieves smaller percentages when compared with Avail-Cap.


Fig. 26. Percentage Overloaded Queries versus Replica ID for Max-Cap with a dynamic replica set. One replica enters and leaves every 60 seconds.


Fig. 27. Replica Utilization versus Time for Avail-Cap with a dynamic replica set. Half the replicas enter and leave every 60 seconds.

The two dynamic experiments we have described above show two things. First, when the workload is not very demanding and there is unused capacity, the behaviors of Avail-Cap and Max-Cap are similar; however, Avail-Cap suffers more as overall available capacity decreases. Second, Avail-Cap is affected more by short-lived fluctuations (in particular, decreases) in total maximum capacity than Max-Cap. This is because the reactive nature of Avail-Cap causes it to adapt abruptly to changes in capacities, even when these changes are short-lived.



Fig. 28. Replica Utilization versus Time for Max-Cap with a dynamic replica set. Half the replicas enter and leave every 60 seconds.


Fig. 29. Percentage Overloaded Queries versus Replica ID for Avail-Cap with a dynamic replica set. Half the replicas enter and leave every 60 seconds.


Fig. 30. Percentage Overloaded Queries versus Replica ID for Max-Cap with a dynamic replica set. Half the replicas enter and leave every 60 seconds.

D. Extraneous Load

When replicas can honor their maximum capacities, Max-Cap avoids the oscillation that Avail-Cap can suffer, and does so with no update overhead. Occasionally, some replicas may not be able to honor their maximum capacities because of extraneous load caused by other applications running on the replicas or network conditions unrelated to the content request workload.

To deal with the possibility of extraneous load, we modify the Max-Cap algorithm slightly to work with honored maximum capacities. A replica's honored maximum capacity is its maximum capacity minus the extraneous load it is experiencing. The algorithm changes slightly; a peer node chooses a replica to which to forward a content request with probability proportional to the honored maximum capacity advertised by the replica. This means that replicas may choose to send updates to indicate changes in their honored maximum capacities. However, as we will show, the behavior of Max-Cap is not tied to the timeliness of updates in the way Avail-Cap is.
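A sketch of the modified algorithm's replica side (Python; the hysteresis threshold for deciding when a new contract is worth advertising is our assumption, since the paper leaves this choice to the replica):

    def honored_max_capacity(max_cap, extraneous_load):
        """Honored maximum capacity: the advertised maximum capacity minus
        the extraneous load currently observed at the replica (floored at
        zero, our assumption for robustness)."""
        return max(max_cap - extraneous_load, 0.0)

    def maybe_reissue_contract(replica, threshold=0.1):
        """Re-advertise only when the honored capacity has drifted by more
        than `threshold` (a hypothetical hysteresis parameter) of the
        original maximum capacity since the last contract."""
        honored = honored_max_capacity(replica.max_cap, replica.extraneous_load)
        if abs(honored - replica.advertised) > threshold * replica.max_cap:
            replica.advertised = honored
            replica.push_update(honored)  # travels down the CUP update channel

Peer nodes then simply substitute the latest advertised honored capacity for the maximum capacity in the Max-Cap weights.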

We view the honored maximum capacity reported by a replica as a contract. If the replica cannot adhere to the contract or has extra capacity to give, but does not report the deficit or surplus, then that replica alone will be affected and may be overloaded or underloaded, since it will be receiving a request share that is proportional to its previously advertised honored maximum capacity.

If, on the other hand, a replica chooses to issue a new contract with the new honored maximum capacity, then this new update can affect the load-balancing decisions of the nodes in the peer network, and the workload could shift to the other replicas. This shift in workload is quite different from that experienced by Avail-Cap when a replica reports overload and is excluded. The contract of any other replica will not be affected by this workload shift. Instead, each replica's contract is solely affected by the extraneous load that replica experiences, which is independent of the extraneous load experienced by the replica issuing the new contract. This is unlike Avail-Cap, where the available capacity reported by one replica directly affects the available capacities of the others.


[Figure: Utilization (0–6) versus Time (0–2500 seconds); series: SumMaxFluctuation, Replica Utilization.]

Fig. 31. Replica Utilization versus Time for Max-Cap with extraneous load and an inter-update period of one second.

In this section we study the performance of Max-Cap in an experiment where all replica nodes are continuously issuing new contracts. Specifically, for each of ten replicas, we inject extraneous load into the replica once a second. The extraneous load injected is randomly chosen to be anywhere between 0% and 50% of the replica's original maximum capacity. Figures 31 and 32 show the replica utilization versus time and the overloaded query percentages for Max-Cap with an inter-update period of 1 second. The jagged line in Figure 31 shows the total honored maximum capacities over time. Since throughout the experiment each replica's honored maximum capacity varies between 50% and 100% of its original maximum capacity, the total maximum capacity is expected to hover at around 75% of the original total maximum capacity, and we see that the jagged line hovers around this value. We therefore generate Poisson request arrivals with an average rate that is 80% of this value, to keep consistent with our running example of 80% workload rates.
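
For readers who want to reproduce this kind of setup, a minimal sketch of the arrival generation follows. This is our illustration, not the authors' simulator; the rate value shown is a hypothetical stand-in for 80% of the hovering total honored maximum capacity.

    import random

    def poisson_arrival_times(rate, duration):
        # Homogeneous Poisson process: interarrival gaps are exponential
        # with mean 1/rate; return all arrival times within [0, duration).
        t, times = 0.0, []
        while True:
            t += random.expovariate(rate)
            if t >= duration:
                return times
            times.append(t)

    # Hypothetical numbers: if the honored total hovers near 75 req/s,
    # the request rate is 0.8 * 75 = 60 req/s over a 2500-second run.
    arrivals = poisson_arrival_times(rate=60.0, duration=2500.0)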

From the figures, we see that Max-Cap continues to cluster replica utilization at around 80%, but there are more overloaded replicas throughout time than when compared with the experiment in which all replicas adhere to their contracts all the time (Figure 5). We also see that the overloaded percentages are higher than before (Figure 13). The reason for this performance degradation is that the randomly injected load (of 0% to 50%) can cause sharp rises and falls in the reported contract of each replica from one second to the next. Since the change is so rapid, and updates take on the order of seconds to reach all allocating nodes, allocation decisions are continuously being made using stale information.

[Figure: % Queries Overloaded (0–1) versus Replica ID (0–10).]

Fig. 32. Percentage Overloaded Queries versus Replica ID for Max-Cap with extraneous load and an inter-update period of one second.


In the next experiment we use the same parameters as above, but we change the update period to 10 seconds. Figures 33 and 34 show the utilization and overloaded percentages for this experiment. We see that the overloaded percentages increase only slightly, while the overhead of pushing the updates decreases by a factor of ten. In contrast, when we perform the same experiment for Avail-Cap, we find that the overloaded query percentages for Avail-Cap increase from about 55% to more than 80% across all the replicas when the inter-update period changes from 1 to 10 seconds. However, this performance degradation is not so much due to the fluctuation of the extraneous load as it is due to Avail-Cap's tendency to oscillate when the request rate is greater than the update rate.

We purposely choose this scenario to test how Max-Cap performs under widely fluctuating extraneous load on every replica. We generally expect that extraneous load will not fluctuate so wildly, nor will all replicas issue new contracts every second. Moreover, we expect the inter-update period to be on the order of several seconds or even minutes, which further reduces overhead.

We can view the effect of extraneous load on the performance of Max-Cap as similar to that seen in the dynamic replica experiments. When a replica advertises a new honored maximum capacity, it is as if that replica were leaving and being replaced by a new replica with a different maximum capacity.


[Figure: Utilization (0–6) versus Time (0–2500 seconds); series: SumMaxFluctuation, Replica Utilization.]

Fig. 33. Replica Utilization versus Time for Max-Cap with extraneous load and an inter-update period of ten seconds.

[Figure: % Queries Overloaded (0–1) versus Replica ID (0–10).]

Fig. 34. Percentage Overloaded Queries versus Replica ID for Max-Cap with extraneous load and an inter-update period of ten seconds.


V. Related Work

Load-balancing has been the focus of many studies described in the distributed systems literature. We first describe load-balancing techniques that could be applied in a peer-to-peer context. We classify these into two categories: algorithms where the allocation decision is based on load, and algorithms where the allocation decision is based on available capacity. We then describe other load-balancing techniques (such as process migration) that cannot be directly applied in a peer-to-peer context.

A. Load-Based Algorithms

Of the load-balancing algorithms based on load, a very common approach is to choose the server with the least reported load from among a set of servers. This approach performs well in a homogeneous system where the task allocation is performed by a single centralized entity (dispatcher) that has complete, up-to-date load information [Web78], [Win77]. In a system where multiple dispatchers are independently performing the allocation of tasks, however, this approach has been shown to behave badly, especially if the load information used is stale [ELZ86], [MTS89], [Mit97], [SKS92]. Mitzenmacher describes the "herd behavior" that can occur when servers that have reported low load are inundated with requests from dispatchers until new load information is reported [Mit97].

Dahlin proposes load interpretation algorithms [Dah99]. These algorithms take into account the age (staleness) of the load information reported by each of a set of distributed homogeneous servers, as well as an estimate of the rate at which new requests arrive at the whole system, to determine to which server to allocate a request.

Many studies have focused on the strategy of using a subset of the load information available. This involves first randomly choosing a small number, d, of homogeneous servers and then choosing the least loaded server from within that set [Mit96], [ELZ86], [VDK96], [ABKU94], [KLH92]. In particular, for homogeneous systems, Mitzenmacher [Mit96] studies the tradeoffs of various choices of d and various degrees of staleness of the load information reported. As the degree of staleness increases, smaller values of d are preferable.
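
As a reference point, a minimal sketch of this subset strategy follows. It is our illustration with hypothetical names; reported_load stands for whatever possibly stale load figures the dispatcher has cached.

    import random

    def least_loaded_of_d(reported_load, d):
        # reported_load: server id -> last reported load (possibly stale).
        # Sample d servers uniformly at random (without replacement),
        # then pick the least loaded server within that sample.
        sample = random.sample(list(reported_load), d)
        return min(sample, key=lambda s: reported_load[s])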

Genova et al. [GC00] propose an algorithm, which we call Inv-Load, that first randomly selects d servers. The algorithm then weighs the servers by load information and chooses a server with probability that is inversely proportional to the load reported by that server. When d = n, where n is the total number of servers, the algorithm is shown to perform better than previous load-based algorithms, and for this reason we focus on this algorithm in this paper.
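
Under our reading of [GC00], the selection rule can be sketched as follows; treating d = n as the default and flooring zero loads with a small epsilon (so the inverse weight stays finite) are our own assumptions, not details from the original paper.

    import random

    EPSILON = 1e-6  # assumed floor so an idle server's weight is finite

    def inv_load_choose(reported_load, d=None):
        # reported_load: server id -> last reported load.
        servers = list(reported_load)
        if d is not None:
            servers = random.sample(servers, d)  # restrict to d candidates
        # Weight each candidate by the inverse of its reported load.
        weights = [1.0 / max(reported_load[s], EPSILON) for s in servers]
        return random.choices(servers, weights=weights)[0]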


As we see in Section IV-A, algorithms that base the decision on load do not handle heterogeneity.

B. Available-Capacity-Based Algorithms

Of the load-balancing algorithms based on available capacity, one common approach has been to choose amongst a set of servers based on the available capacity of each server [ZYZ+98] or the available bandwidth in the network to each server [CC97]. The server with the highest available capacity/bandwidth is chosen by a client with a request. The assumption here is that the reported available capacity/bandwidth will continue to be valid until the chosen server has finished servicing the client's request. This assumption does not always hold; external traffic caused by other applications can invalidate the assumption, but more surprisingly, the traffic caused by the application whose workload is being balanced can also invalidate the assumption. We see this in Section IV-B.

Another approach is to exclude servers that exceed some utilization threshold and to choose from the remaining servers. Mirchandaney et al. [MTS90] and Shivaratri et al. [SKS92] classify machines as lightly-utilized or heavily-utilized and then choose randomly from the lightly-utilized servers. This work focuses on local-area distributed systems. Colajanni et al. use this approach to enhance round-robin DNS load-balancing across a set of widely distributed heterogeneous web servers [CYC98]. Specifically, when a web server surpasses a utilization threshold, it sends an alarm signal to the DNS system indicating it is out of commission. The server is excluded from DNS resolution until it sends another signal indicating it is below the threshold and free to service requests again. In this work, the maximum capacities of the most capable servers are at most a factor of three greater than those of the least capable servers.

As we see in Section IV-B, when applied in the context of a peer-to-peer network, where many nodes are making the allocation decision and where the maximum capacities of the replica nodes can differ by two orders of magnitude, excluding a serving node temporarily from the allocation decision can result in load oscillation.

C. Other Load-balancing Techniques

We now describe load-balancing techniques that appear in the literature but cannot be directly applied in a peer-to-peer context.

There has been a large body of work devoted to the problem of load-balancing across a set of servers residing within a cluster. In some cluster systems there is one centralized dispatcher through which all incoming requests to the cluster arrive. The dispatcher has full control over the allocation of requests to servers [DKMT96], [cis]. In other systems there are multiple dispatchers that make the allocation decision. One common approach is to have front-end servers sit at the entrance of the cluster, intercepting incoming requests and allocating them to the back-end servers within the cluster that actually satisfy the requests [CDR99]. Still others have requests routed evenly to servers within the cluster via DNS rotation (described below) or via a single IP-switch sitting at the front of the cluster (e.g., [fou98]); upon receiving a request, each server then decides whether to satisfy the request or to dispatch it to another server [ZYZ+98]. Some cluster systems have the dispatcher(s) poll each server or a random set of servers for load/availability information just before each allocation decision [AAFL96], [SYC02]. Others have the dispatcher(s) periodically poll servers, while still others have servers periodically broadcast their load-balancing information. Studies that compare the tradeoffs among these information dissemination options within a cluster include [ZYZ+98], [SYC02].

Regardless of the way this information is exchanged, cluster-based algorithms take advantage of the local-area nature of the cluster network to deliver timely load-balancing updates. This characteristic does not apply in a peer-to-peer network, where load-balancing updates may have to travel across the Internet.

Most cluster algorithms assume that servers are homogeneous. The exceptions to this rule include work by Castro et al. [CDR99]. This work assumes that servers will have different processing capabilities and allows each server to stipulate a maximum desirable utilization that is incorporated into the load-balancing algorithm. The algorithm they use assumes that servers are synchronized and send their load updates at the same time. This is not true in a peer-to-peer network, where replicas cannot be synchronized. Zhu et al. [ZYZ+98] assume servers are heterogeneous and use a metric that combines available disk capacity and CPU cycles to choose a server within the cluster to handle a task. Their algorithm uses a combination of random polling before selection and random multicasting of load-balancing information to a select few servers. Both are techniques that would not scale in a large peer-to-peer network.

Another well-studied load-balancing cluster approach is to have heavily loaded servers hand off requests they receive to other servers within the cluster that are less loaded, or to have lightly loaded servers attempt to get tasks from heavily loaded servers (e.g., [Dan95], [SK90]). This can be achieved through techniques such as HTTP redirection (e.g., [CCY99], [AYI96], [CCY00]), packet header rewriting (e.g., [AB00]), or remote script execution [ZYZ+98]. HTTP redirection adds additional client round-trip latency for every rescheduled request. TCP/IP hand-off and packet header rewriting require changes in the OS kernel or network interface drivers. Remote script execution requires trust between the serving entities.

Similar to task handoff is the technique of process migration. Process migration to spread job load across a set of servers in a local-area distributed system has been widely studied in both the theoretical and the systems literature (e.g., [DO91], [LM93], [DHB95], [PL95], [LL96]). In these systems, overloaded servers migrate some of their currently running processes to more lightly loaded servers in an attempt to achieve a more equitable distribution of work across the servers.

Both task handoff and process migration require close coordination amongst serving entities, which can be afforded in a tightly-coupled communication environment such as a cluster or local-area distributed system. In a peer-to-peer network, where the replica nodes serving the content may be widely distributed across the Internet, these techniques are not possible.

A lot of work has looked at balancing load across multi-server homogeneous web sites by leveraging the DNS service used to provide the mapping between a web page's URL and the IP address of a web server serving the URL. Round-robin DNS was proposed, where the DNS system maps requests to web servers in a round-robin fashion [KBM94], [AYHI96]. Because DNS mappings have a Time-to-Live (TTL) field associated with them and tend to be cached at the local name server in each domain, this approach can lead to a large number of client requests from a particular domain getting mapped to the same web server during the TTL period. Thus, round-robin DNS achieves good balance only so long as each domain has the same client request rate. Moreover, round-robin load-balancing does not work in a heterogeneous peer-to-peer context because each serving replica gets a uniform rate of requests regardless of whether it can handle this rate. Work that takes into account domain request rate improves upon round-robin DNS and is described by Colajanni et al. [CYD97].

Colajanni et al. later extend this work to balance load across a set of widely distributed heterogeneous web servers [CYC98]. This work proposes the use of adaptive TTLs, where the TTL for a DNS mapping is set inversely proportional to the domain's local client request rate for the mapping of interest (as reported by the domain's local name server). The TTL is at the same time set to be proportional to the chosen web server's maximum capacity. So web servers with high maximum capacity will have DNS mappings with longer TTLs, and domains with low request rates will receive mappings with longer TTLs. Max-Cap, the algorithm proposed in this paper, also uses the maximum capacities of the serving replica nodes to allocate requests proportionally. The main difference is that in the work by Colajanni et al., the root DNS scheduler acts as a centralized dispatcher setting all DNS mappings and is assumed to know what the request rate in the requesting domain is like. In the peer-to-peer case, the authority node has no idea what the request rate throughout the network is like, nor how large the set of requesting nodes is.
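
The adaptive-TTL rule above reduces to a simple proportionality. The sketch below is our paraphrase rather than the exact formula from [CYC98]; base_ttl is a hypothetical scaling constant.

    def adaptive_ttl(server_max_capacity, domain_request_rate, base_ttl=1.0):
        # TTL grows with the chosen server's maximum capacity and shrinks
        # with the requesting domain's local client request rate.
        return base_ttl * server_max_capacity / domain_request_rate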

Lottery scheduling is another technique that, like Max-Cap, uses proportional allocation. This approach has been proposed in the context of resource allocation within an operating system (the Mach microkernel) [WW94]. Client processes hold tickets that give them access to particular resources in the operating system. Clients are allocated resources by a centralized lottery scheduler proportionally to the number of tickets they own, and they can donate their tickets to other clients in exchange for tickets at a later point. Max-Cap is similar in that it allocates requests to a replica node proportionally to the maximum capacity of the replica node. The main difference is that in Max-Cap the allocation decision is completely distributed, with no opportunity for exchange of resources across replica nodes.

VI. Conclusions

In this paper we examine the problem of load-balancing in a peer-to-peer network, where the goal is to distribute the demand for a particular content fairly across the set of replica nodes that serve that content. Existing load-balancing algorithms proposed in the distributed systems literature are not appropriate for a peer-to-peer network. We find that load-based algorithms do not handle the heterogeneity that is typical in a peer-to-peer network. We also find that algorithms based on available capacity reports can suffer from load oscillations even when the workload request rate is as low as 60% of the total maximum capacity of the replicas.

We propose and evaluate Max-Cap, a practical algorithm for load-balancing. Max-Cap handles heterogeneity, yet does not suffer from oscillations when the workload rate is below 100% of the total maximum capacity of the replicas; it adjusts better to very large fluctuations in the workload and to constantly changing replica sets; and it incurs less overhead than algorithms based on available capacity, since its reports are affected only by extraneous load on the replicas. We believe this makes Max-Cap a practical and elegant algorithm to apply in peer-to-peer networks.

VII. Acknowledgments

This research is supported by the Stanford Networking Research Center, and by DARPA (contract N66001-00-C-8015).

The work presented here has benefited greatly from discussions with Armando Fox and Rajeev Motwani. We thank them for their invaluable feedback. We also thank Petros Maniatis for his detailed comments on earlier drafts of this paper.

References

[AAFL96] B. Awerbuch, Y. Azar, A. Fiat, and T. Leighton. Making Commitments in the Face of Uncertainty: How to Pick a Winner Almost Every Time. In Twenty-eighth ACM Symposium on Theory of Computing, 1996.

[AB00] L. Aversa and A. Bestavros. Load Balancing a Cluster of Web Servers Using Distributed Packet Rewriting. In IEEE International Performance, Computing, and Communications Conference, February 2000.

[ABKU94] Y. Azar, A. Broder, A. Karlin, and E. Upfal. Balanced Allocations. In Twenty-sixth ACM Symposium on Theory of Computing, 1994.

[AYHI96] D. Andresen, T. Yang, V. Holmedahl, and O.H. Ibarra. SWEB: Towards a Scalable WWW Server on MultiComputers. In IEEE International Symposium on Parallel Processing, April 1996.

[AYI96] D. Andresen, T. Yang, and O.H. Ibarra. Towards a Scalable Distributed WWW Server on Networked Workstations. Journal of Parallel and Distributed Computing, 42:91–100, 1996.

[Cao02] P. Cao. Search and Replication in Unstructured Peer-to-Peer Networks, February 2002. Talk at http://netseminar.stanford.edu/sessions/2002-01-31.html.

[CC97] R. Carter and M. Crovella. Server Selection Using Dynamic Path Characterization in Wide-Area Networks. In Infocom, 1997.

[CCY99] V. Cardellini, M. Colajanni, and P.S. Yu. Redirection Algorithms for Load Sharing in Distributed Web Server Systems. In ICDCS, June 1999.

[CCY00] V. Cardellini, M. Colajanni, and P.S. Yu. Geographic Load Balancing for Scalable Distributed Web Systems. In Proceedings of Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Mascots), August 2000.

[CDR99] M. Castro, M. Dwyer, and M. Rumsewicz. Load balancing and control for distributed World Wide Web servers. In Proceedings of the IEEE International Conference on Control Applications, August 1999.

[cis] Scaling the Internet Web Servers. Cisco Systems Whitepaper, November 1997.

[CYC98] M. Colajanni, P.S. Yu, and V. Cardellini. Dynamic Load Balancing in Geographically Distributed Heterogeneous Web Servers. In ICDCS, 1998.

[CYD97] M. Colajanni, P.S. Yu, and D.M. Dias. Scheduling Algorithms for Distributed Web Servers. In ICDCS, 1997.

[Dah99] M. Dahlin. Interpreting Stale Load Information. In ICDCS, 1999.


[Dan95] S. Dandamudi. Performance Impact of Scheduling Discipline on Adaptive Load Sharing in Homogeneous Distributed Systems. In ICDCS, 1995.

[DHB95] A. Downey and M. Harchol-Balter. A Note on 'The Limited Performance Benefits of Migrating Active Processes for Load Sharing'. Technical Report UCB/CSD-95-888, UC Berkeley, November 1995.

[DKMT96] D.M. Dias, W. Kish, R. Mukherjee, and R. Tewari. A Scalable and Highly Available Web Server. In Proceedings of IEEE COMPCON'96, 1996.

[DO91] F. Douglis and J. Ousterhout. Transparent Process Migration: Design Alternatives and the Sprite Implementation. Software - Practice and Experience, 21(8):757–785, 1991.

[ELZ86] D. Eager, E. Lazowska, and J. Zahorjan. Adaptive Load Sharing in Homogeneous Distributed Systems. IEEE Transactions on Software Engineering, 12(5):662–675, 1986.

[fou98] Foundry Networks ServerIron Server Load Balancing Switch, 1998. http://www.foundrynet.com.

[GC00] Z. Genova and K.J. Christensen. Challenges in URL Switching for Implementing Globally Distributed Web Sites. In Workshop on Scalable Web Services, 2000.

[gnu] The Gnutella Protocol Specification v0.4. http://gnutella.wego.com.

[KBM94] E.D. Katz, M. Butler, and R. McGrath. A Scalable HTTP server: the NCSA prototype. Computer Networks and ISDN Systems, 27:155–164, 1994.

[KLH92] R. Karp, M. Luby, and F.M. Heide. Efficient PRAM Simulation on a Distributed Memory Machine. In Twenty-fourth ACM Symposium on Theory of Computing, 1992.

[LL96] C. Lu and S.M. Lau. An Adaptive Load Balancing Algorithm for Heterogeneous Distributed Systems with Multiple Task Classes. In ICDCS, 1996.

[LM93] R. Luling and B. Monien. A Dynamic Distributed Load Balancing Algorithm with Provable Good Performance. In ACM Symposium on Parallel Algorithms and Architectures, 1993.

[Mar02] E.P. Markatos. Tracing a large-scale Peer-to-Peer System: an hour in the life of Gnutella. In Second IEEE/ACM International Symposium on Cluster Computing and the Grid, 2002.

[MGB01] P. Maniatis, T.J. Giuli, and M. Baker. Enabling the Long-Term Archival of Signed Documents through Time Stamping. Technical Report cs.DC/0106058, Stanford University, June 2001. http://www.arxiv.org/abs/cs.DC/0106058.

[Mit96] M. Mitzenmacher. The Power of Two Choices in Randomized Load Balancing. PhD thesis, UC Berkeley, September 1996.

[Mit97] M. Mitzenmacher. How Useful is Old Information? In Sixteenth Symposium on the Principles of Distributed Computing, 1997.

[MTS89] R. Mirchandaney, D. Towsley, and J. Stankovic. Analysis of the Effects of Delays on Load Sharing. IEEE Transactions on Computers, 38:1513–1525, 1989.

[MTS90] R. Mirchandaney, D. Towsley, and J. Stankovic. Adaptive Load Sharing in Heterogeneous Distributed Systems. Journal of Parallel and Distributed Computing, 9:331–346, 1990.

[Ora01] Andy Oram. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly Publishing Company, March 2001.

[PL95] S. Petri and H. Langendorfer. Load Balancing and Fault Tolerance in Workstation Clusters - Migrating Groups of Communicating Processes. Operating Systems Review, 29(4):25–36, October 1995.

[RB02] Mema Roussopoulos and Mary Baker. CUP: Controlled Update Propagation in Peer-to-Peer Networks. Technical Report cs.NI/0202008, Stanford University, February 2002. http://arXiv.org/abs/cs.NI/0202008.

[RD01] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Middleware, November 2001.

[RF02] Matei Ripeanu and Ian Foster. Mapping the Gnutella Network: Macroscopic Properties of Large-Scale Peer-to-Peer Systems. In First International Workshop on Peer-to-Peer Systems (IPTPS), 2002.

[RFH+01] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A Scalable Content-Addressable Network. In SIGCOMM, 2001.

[SGG02] S. Saroiu, P.K. Gummadi, and S.D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. In Proceedings of Multimedia Computing and Networking (MMCN), 2002.

[SK90] N.G. Shivaratri and P. Krueger. Two Adaptive Location Policies for Global Scheduling Algorithms. In IEEE International Conference on Distributed Computing Systems (ICDCS), 1990.

[SKS92] N. Shivaratri, P. Krueger, and M. Singhal. Load Distributing for Locally Distributed Systems. IEEE Computer, pages 33–44, December 1992.

[SMK+01] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. In SIGCOMM, 2001.

[SYC02] K. Shen, T. Yang, and L. Chu. Cluster Load Balancing for Fine-Grain Network Services. In International Parallel and Distributed Processing Symposium, 2002.

[VDK96] N. Vvedenskaya, R. Dobrushin, and F. Karpelevich. Queuing Systems with Selection of the Shortest of Two Queues: an Asymptotic Approach. Problems of Information Transmission, 32:15–27, 1996.

[Web78] R. Weber. On the Optimal Assignment of Customers to Parallel Servers. Journal of Applied Probability, 15:406–413, 1978.


[Win77] W. Winston. Optimality of the Shortest Line Discipline. Journal of Applied Probability, 14:181–189, 1977.

[WW94] C.A. Waldspurger and W.E. Weihl. Lottery scheduling: Flexible proportional-share resource management. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), November 1994.

[ZKJ01] B.Y. Zhao, J.D. Kubiatowicz, and A.D. Joseph. Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing. Technical Report UCB/CSD-01-1141, U.C. Berkeley, April 2001.

[ZYZ+98] H. Zhu, T. Yang, Q. Zheng, D. Watson, O.H. Ibarra, and T. Smith. Adaptive load sharing for clustered digital library services. In 7th IEEE Intl. Symposium on High Performance Distributed Computing (HPDC), 1998.

Appendix

It should not surprise the reader that Inv-Load does not handle heterogeneity, since the same load at one replica may have a different effect on another replica with a different maximum capacity. Surprisingly, however, it turns out that when replicas are homogeneous, the performance of Inv-Load and Max-Cap is comparable.

In this set of experiments, there are ten replicas, each of whose maximum capacity we set at 10 requests per second, for a total maximum capacity of 100 requests per second. Queries are generated according to a Poisson process with a lambda rate that is 80% of the total maximum capacity of the replicas.

Figures 35 and 36 show scatterplots of how the utilization of each replica proceeds with time when using Inv-Load with a refresh period of one time unit and when using Max-Cap, respectively. Inv-Load and Max-Cap have similar scatterplots.

Figures 37 and 38 show, for each replica, the percentage of queries that arrived at the replica while the replica was overloaded. Again, we see that Inv-Load and Max-Cap have comparable performance.

The difference is that Inv-Load incurs the extra overhead of one load update per replica per second. In a CUP tree of 100 nodes, this translates to 1000 updates per second being pushed down the CUP tree; in a tree of 1000 nodes, it translates to 10000 updates per second. Thus, the larger the CUP tree, the larger the overall network overhead. The overhead incurred by Inv-Load could be reduced by increasing the period between two consecutive updates at each replica, but increasing the period results in staler load updates. Experimenting with a range of periods (one to sixty seconds), we confirm earlier studies [Mit97] that have found that as load information becomes more stale with increasing periods, the performance of load-based balancing algorithms decreases.
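
The overhead arithmetic can be made explicit with a one-liner (our sketch; the function name is hypothetical, and the model assumes each replica's update is propagated to every node in the CUP tree):

    def cup_update_overhead(num_replicas, tree_nodes, period_s):
        # Updates pushed per second: each of the num_replicas replicas
        # sends one update per period, and each update reaches every
        # node in the CUP tree.
        return num_replicas * tree_nodes / period_s

    print(cup_update_overhead(10, 100, 1.0))   # 1000 updates/s
    print(cup_update_overhead(10, 1000, 1.0))  # 10000 updates/s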

[Figure: Utilization (0–6) versus Time (0–2500 seconds); series: 100% Utilization, Replica Utilization.]

Fig. 35. Replica Utilization versus Time for Inv-Load with an inter-update period of one second and homogeneous replicas.

[Figure: Utilization (0–6) versus Time (0–2500 seconds); series: 100% Utilization, Replica Utilization.]

Fig. 36. Replica Utilization versus Time for Max-Cap with homogeneous replicas.

We ran experiments with Pareto(α, k) query interarrivals for a wide range of α and k values (the Pareto distribution shape and scale parameters) and found that with homogeneous replicas, Inv-Load with a period of one and Max-Cap continue to be comparable. However, Max-Cap is preferable in these cases because it incurs no overhead.


[Figure: % Queries Overloaded (0–1) versus Replica ID (0–10).]

Fig. 37. Percentage Overloaded Queries versus Replica ID for Inv-Load with an inter-update period of one second and homogeneous replicas.

[Figure: % Queries Overloaded (0–1) versus Replica ID (0–10).]

Fig. 38. Percentage Overloaded Queries versus Replica ID for Max-Cap with homogeneous replicas.
