


Time-Critical Event Dissemination in Geographically Distributed Clouds

Chi-Jen Wu∗†, Jan-Ming Ho† and Ming-Syan Chen∗†‡

∗Department of Electrical Engineering, National Taiwan University, Taiwan
†Institute of Information Science, Academia Sinica, Taiwan
‡Research Center for Information Technology Innovation, Academia Sinica, Taiwan

{cjwu,hoho}@iis.sinica.edu.tw, [email protected]

Abstract—Cloud computing has rapidly become a new infrastructure for organizations to reduce their capital cost in IT investment and to develop planetary-scale distributed applications. One of the fundamental challenges in geographically distributed clouds is to provide efficient algorithms for supporting intercloud data management and dissemination. In this paper, we present Plume, a generic distributed intercloud overlay for time-critical event dissemination services. Plume aims at improving the interoperability of interclouds in time-critical event dissemination services, such as computing policy updates, message sharing, and event notifications. Plume organizes these distributed clouds into a novel quorum ring overlay to support a constant event dissemination latency. Our numerical results show that Plume greatly improves efficiency compared to a DHT-based overlay approach and provides better scalability than the fully-meshed approach.

I. INTRODUCTION

In the last few years, cloud computing has rapidly become a new infrastructure for organizations to reduce their capital cost in IT investment and to develop planetary-scale distributed applications over the Internet. Cloud service providers, exemplified by Amazon Simple Storage Service (S3), Microsoft's Azure Services Platform, and Google App Engine, have to scatter datacenters across many geographic locations to support the great number of users spread around the world. Because of this demand, a growing number of vendors are introducing datacenter containers [1] to support modular datacenters that provide predictable, repeatable components to the organizations that need them. Moreover, IT organizations, enterprises, government departments, and universities may deploy their own clouds in the future [2], [3]. We can thus expect that many distributed clouds, both public and private, will soon be built around the globe.

One of the fundamental challenges in large-scale geographically distributed clouds is to provide efficient algorithms for supporting intercloud [4] data management and dissemination [5]. In other words, intercloud applications and systems have to interact with a set of clouds via the Internet. Especially for time-critical services spanning a large geographic area, service providers prefer to deliver events as early as possible. As the number of clouds continues to grow, distributed clouds may reach a planetary scale, with thousands or even hundreds of thousands of clouds. One of the major challenges in such planetary-scale distributed clouds is to promptly disseminate time-critical events to other clouds.

A typical example of such time-critical services is disaster early warning systems [6], [7]. Assume that each country owns a disaster alert cloud, a private cloud built by the country's weather bureau, for hosting its disaster early warning system, and that these disaster alert clouds are connected via the Internet. The disaster early warning system allows end users to monitor large disasters, such as earthquakes, hurricanes, and tsunamis, that affect many countries. In such a system, disaster measurement data is collected by a large number of deployed sensors and gathered into a country's disaster alert cloud, where a summary is presented to disaster observers. However, if the monitored disaster may cause damage to other countries, an alert notification should be rapidly disseminated to the clouds of those countries to minimize the damage.

Another example: financial computing or social networking services, such as SalesForce, Facebook and Twitter, may operate many geographically distributed datacenters (clouds) or rent a number of datacenters across a large geographic area. When a user shares an activity update with friends, the update event should quickly be reflected to those friends, who may be served by other clouds. Such business and social networking services need an efficient way to disseminate update events among geographically distributed clouds to maintain the quality of the user experience. Thus, how to share and disseminate information among geographically distributed datacenters is critical for providing time-sensitive services in a distributed cloud computing environment.

To manage and deliver content well, these distributed clouds have to organize themselves into a federated intercloud overlay network [8]–[11]. Currently, each large-scale cloud service provider uses its own approach to manage or deliver content. For instance, Facebook aggregates server log data using a tree-based overlay [12], and Amazon's Dynamo leverages a fully-meshed topology for managing data across multiple datacenters [5]. However, to meet the demands of time-critical services, such an overlay must be designed with scalability, constant dissemination delay, and robustness in mind. Many existing overlay techniques can support event dissemination in distributed systems. The fully-meshed design is the simplest way to disseminate time-critical messages to

IEEE INFOCOM 2011 Workshop on Cloud Computing

978-1-4244-9920-5/11/$26.00 ©2011 Crown


all other nodes, and it easily achieves constant transmission delay. However, each node in the fully-meshed design must maintain O(n) connections with other nodes in a distributed system of n nodes, so this scheme does not scale well in large-scale distributed systems.

An alternative is the tree-based design. Many tree-based approaches have been proposed with scalability as the main design criterion in the peer-to-peer paradigm [13]–[15]. However, most of these designs do not aim at time-critical event dissemination; instead, they focus on reducing the communication cost incurred by delivering events among the participating nodes. A tree-based design needs O(log n) hops to deliver events to all nodes, where n is the number of participating nodes. Another possible way to efficiently interconnect distributed clouds is a Distributed Hash Table (DHT) overlay. DHTs, such as Chord [16], are inherently scalable and efficient at Internet scale; however, most DHTs also require O(log n) delivery steps [14]. Gossip-based event dissemination protocols [15] are likewise scalable and fault-tolerant, but most of them incur long message delivery delays. Time-critical services, such as processing financial market data or disseminating disaster events, have tight delivery delay demands, so tree-based, DHT-based, and gossip-based approaches may not meet the new challenge of intercloud environments in a timely manner.

In this paper, our main focus is the following problem: how quickly can an updated event be made available at every cloud in a large-scale distributed cloud environment? The goal is to improve the interoperability of interclouds in computing policy updating, message sharing, event notification, and so forth. Specifically, we present Plume, a generic distributed intercloud overlay for event dissemination that forms a planetary-scale, self-configuring overlay of clouds connected via the Internet to support time-critical event dissemination. The cloud nodes in Plume are organized into a novel quorum ring overlay that is similar to the Chord ring [16]. However, the finger tables of the nodes are constructed in a very different way: each cloud node independently builds its finger table based on the concepts of an overlay ring and a quorum system [17]. This eliminates the need for complex distributed overlay tree construction protocols, yet results in an efficient low-diameter overlay that delivers events without long propagation delays. In addition, for scalability, the size of the finger table each cloud node maintains is O(√n). Extensive simulations are conducted to evaluate the event dissemination efficiency of Plume and a DHT-based design. Our numerical results show that Plume greatly improves efficiency compared to a DHT-based overlay approach and provides better scalability than the fully-meshed approach.

In summary, this paper makes the following contributions: 1) We present a distributed intercloud protocol, called Plume, based on a novel quorum ring, that constructs a low-complexity cloud-level overlay for time-critical event dissemination; Plume is simple and easy to implement. 2) To the best of our knowledge, this paper provides the first study to address time-critical event dissemination in intercloud environments. Moreover, event dissemination provides essential functionality for cloud computing applications that require collaboration among geographically distributed clouds. 3) We analyze the performance of Plume and other distributed architecture designs, and also evaluate them empirically to show the performance advantages of Plume.

The rest of this paper is organized as follows. In the next section, we describe the background and related work. In Section III, we present the detailed design of Plume. In Section IV, we present a performance analysis of Plume and other schemes, together with performance results on Plume and a DHT-based scheme. In Section V, we discuss further considerations and conclude this paper.

II. RELATED WORK

Intercloud [4] is one of the latest fields in the cloud computing research community. One motivation for intercloud is that when a cloud runs out of resources, it cannot satisfy further requests from its users. However, intercloud may create new business niches among cloud service providers if these clouds can interoperate to share resources. Thus, the demand for interoperability among clouds will soon arise in the future Internet. Intercloud is still in its infancy: only a few results in this research area have recently been proposed by the cloud computing community. As interest in intercloud grows, so will the need for developing technologies and tools as its building blocks.

Most research on intercloud communication is still at an early stage, focusing on the standardization of communication protocols, dynamic computing workload migration, etc., and there are no initial empirical results [18]. In 2010, Vint Cerf, one of the pioneers of the Internet, pointed out that the need for interoperability between different clouds will emerge in the future [8]. Cisco researchers, Bernstein et al. [9], discussed research issues around intercloud protocols and presented an architecture for intercloud networks. They also indicated that presence and messaging mechanisms for interclouds should be designed. Buyya et al. [10] also presented research challenges of distributed clouds and designed a federated cloud computing environment for dynamically coordinating computing load among distributed clouds. The Cross-Cloud Federation Manager (CCFM) [11] is another resource management design for federated clouds.

The ADAMANT project [19] is a pub/sub middleware that uses supervised machine learning to autonomously configure available resources among distributed clouds. It aims at designing a cloud middleware for supporting on-demand requests for cloud resources as part of a disaster management system. Aneka-Federation [20] is a fully decentralized cloud overlay based on a DHT. It is used to coordinate application scheduling and to federate resources from multiple clouds. Volley [21]



presents a data replication mechanism for reducing inter-datacenter traffic and improving user latency; it analyzes user logs to produce data migration recommendations for datacenter managers. Laoutaris et al. [3] presented the notion of nano datacenters, which are deployed at the edge of the network, such as in set-top boxes. They argue that these novel nano datacenters are better suited to new emerging applications, e.g., online gaming and interactive IPTV. However, this conceptual design requires an efficient way to interoperate the distributed nano datacenters.

Compared with the above efforts, we focus on the design of intercloud network protocols and an intercloud overlay architecture for time-critical event dissemination, as a basic building block of cloud computing for many-to-many message sharing. We tackle the scalability and efficiency problems of the intercloud overlay by organizing distributed clouds into a low-complexity quorum ring overlay that reduces the transmission latency of intercloud communications.

III. DESIGN OF PLUME

We propose a new intercloud overlay design, referred to as Plume intercloud, for distributed cloud computing environments. The resulting overlay is used for efficient and scalable many-to-many event dissemination. In this section, we begin with an overview of Plume intercloud and its assumptions. We then present the detailed design of the quorum ring overlay on which Plume is built, and describe how Plume intercloud disseminates events among distributed cloud environments.

A. Overview of Plume Intercloud and its Assumptions

We consider an intercloud network with a set of geographically distributed clouds connected by the Internet, where each cloud is composed of the following components: a cloud manager, applications/services, and a Plume node. We illustrate a simple overview of a Plume intercloud in Figure 1. As shown in Figure 1, these distributed clouds are organized into an intercloud overlay via their Plume nodes. In each cloud, there is a physical datacenter management service from a cloud platform vendor, such as Amazon EC2, VMware or IBM Tivoli. An application or service refers to a user-level application or a running service, e.g., a virtual cloud management service¹, or Facebook and Twitter hosted by multiple clouds in geographically dispersed locations.

Hence, in the scenario of Figure 1, if the user changes the cloud computing policy at the cloud manager, the local Plume node acts as a middleware service that disseminates the new policy to the Plume nodes of the remaining clouds through the Plume intercloud. In other words, a Plume node serves as a connectivity gateway for a cloud, using control and data plane mechanisms to coordinate with other distributed clouds. Moreover, when a new cloud is deployed and joins a Plume intercloud, it needs to initiate a Plume node for communication with the other existing clouds.

¹Like CloudStatus [22] or CloudSwitch [23], this is designed as a hosted service for monitoring and managing a user's applications or services.

Fig. 1. An overview of a Plume intercloud network.

In this work, we treat communications in a Plume intercloud as event dissemination. We explore the one-to-many event dissemination scenario; Plume easily extends to support many-to-many event dissemination. Formally, we assume that an intercloud network is an undirected graph G = (V, E) with nodes V corresponding to Plume nodes and edges E corresponding to bidirectional overlay links. For u, v ∈ V, an edge e(u, v) ∈ E indicates that u and v know each other and can communicate directly through the intercloud overlay. In addition, we assume that distributed clouds are uniquely identified. In what follows, we describe our Plume intercloud in detail.

B. Plume Intercloud

A Plume intercloud organizes the Plume nodes of these distributed clouds into an intercloud overlay based on the Quorum Ring Overlay (QRO) topology, described below. Plume intercloud ensures that any Plume node needs at most 2 hops to reach any other Plume node in the intercloud network. Thus it can offer a constant dissemination latency for time-critical event delivery in large-scale distributed cloud computing environments. Specifically, Plume intercloud, as middleware, provides an interface for cloud services or applications to share events among these distributed clouds, consisting of the following basic primitives: 1) a Disseminate(e) operation to disseminate an event e to other clouds; 2) Join() and Leave() operations for a Plume node to join or depart a Plume intercloud; 3) a Relay(e) operation that is invoked when a Plume node receives an event e; and 4) a simple Stabilize() process that runs periodically at each Plume node and maintains the QRO topology.

The QRO topology of Plume intercloud exploits the simple concepts of an overlay ring [16] and the grid quorum system [17] to make Plume intercloud scalable and efficient in event dissemination. As in the Chord ring, the overlay ring topology in QRO is easily imposed by defining Successor(i) and Predecessor(i) operations to maintain the ring edges of a Plume node i. The two operations return the successor and predecessor of the position of a Plume



Fig. 2. An example of Quorum Ring Overlay of 9 distributed clouds.

node i, maintaining the adjacency relations in the overlay ring. However, an event dissemination scheme that uses only the simple ring overlay requires up to n/2 hops to deliver events in an intercloud network of size n.

To accelerate event dissemination, each Plume node maintains a set of Plume nodes, referred to as its q-table, to expedite dissemination operations. The q-table is constructed based on the grid quorum system, formally defined as follows.

DEFINITION: A grid quorum system Q is a universal set of logical points U = {1, 2, . . . , n} arranged as a √n × √n array. A subset qi ⊆ Q is defined as the set of points on the row and the column passing through logical point i in the array. Thus, it is clear that qi ∩ qj ≠ ∅ for any points i and j, 1 ≤ i, j ≤ n. Based on this definition, for any logical point i, the size of qi is 2√n − 1 in a grid quorum system.

For instance, in Figure 2 the grid quorum Q of size √9 × √9 is filled with the numbers 1, . . . , 9. Here q2 is {1,3,5,8} and q7 is {1,4,8,9}; it is clear that q2 ∩ q7 ≠ ∅.

Before a Plume node joins the Plume intercloud, it has to obtain an identifier in the grid quorum, either by manual configuration or by centralized assignment. In a √n × √n grid quorum system, a Plume node with a grid identifier can easily pick one column and one row of entries, and these entries become its q-table in the Plume intercloud overlay. More precisely, the q-table of Plume node i consists of the rowi and coli entries. The node can then obtain its ring edges and q-table by asking any existing Plume node in the Plume intercloud. Thus, the Plume nodes can construct the QRO topology from their ring edges and q-tables. Note that the combined size of the ring edges and q-table at some Plume nodes is 2√n, e.g., when a Plume node has a grid point i ≡ 0 (mod √n) or i ≡ 1 (mod √n). As an example, suppose we have nine distributed clouds as in Figure 2. The Plume node in Cloud 2 has q-table {1,3,5,8} and ring edges {1,3}. The Plume node in Cloud 7 has q-table {1,4,8,9} and ring edges {6,8}.
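The q-table and ring-edge construction above can be sketched in a few lines; this is our own illustrative code (helper names `q_table` and `ring_edges` are ours), assuming 1-indexed identifiers laid out row-major in the grid as in Figure 2 and a q-table that excludes the node itself:

```python
import math

def q_table(i, n):
    """q-table of Plume node i in a sqrt(n) x sqrt(n) row-major grid:
    the row and column passing through i, excluding i itself."""
    s = int(math.isqrt(n))
    r, c = (i - 1) // s, (i - 1) % s
    row = {r * s + k + 1 for k in range(s)}
    col = {k * s + c + 1 for k in range(s)}
    return (row | col) - {i}

def ring_edges(i, n):
    """Successor and predecessor of node i on the ring 1..n."""
    return {i % n + 1, (i - 2) % n + 1}

# The 3x3 example of Figure 2:
print(sorted(q_table(2, 9)))     # [1, 3, 5, 8]
print(sorted(q_table(7, 9)))     # [1, 4, 8, 9]
print(sorted(ring_edges(2, 9)))  # [1, 3]
print(sorted(ring_edges(7, 9)))  # [6, 8]
```

Excluding the node itself, each q-table holds 2√n − 2 entries, matching the four-element sets of the Figure 2 example.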

We now show that constructing the quorum ring overlay from the grid quorum lets each Plume node reach any other Plume node in at most two hops.

LEMMA 1: In a stable Plume intercloud network of n nodes, each Plume node can reach any other Plume node via a two-hop route.

Proof: In a grid quorum system of √n × √n points, each grid point i has a quorum set qi that contains a full column and a full row of elements in the array. By the definition of a grid quorum system, qi ∩ qj ≠ ∅ for any grid points i and j, 1 ≤ i, j ≤ n. Let k be one of the grid points in qi ∩ qj. Then, for any grid point j ∉ qi, grid point i can reach j by passing through k. It is thus clear that any Plume node i in a quorum ring overlay can reach any other Plume node via a two-hop route.
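Lemma 1 can also be checked exhaustively for small grids. The sketch below is our own verification code (helper names are ours), using q-tables that exclude the node itself as in the Figure 2 example; the relay node in the two-hop case sits at the intersection of i's row and j's column:

```python
import math

def q_table(i, n):
    """Row and column through node i in a sqrt(n) x sqrt(n) grid, minus i."""
    s = int(math.isqrt(n))
    r, c = (i - 1) // s, (i - 1) % s
    return ({r * s + k + 1 for k in range(s)} |
            {k * s + c + 1 for k in range(s)}) - {i}

def hops(i, j, n):
    """q-table hops needed to reach j from i (at most 2 by Lemma 1)."""
    if i == j:
        return 0
    if j in q_table(i, n):
        return 1
    # otherwise some relay k on i's row/column shares j's row/column
    assert any(j in q_table(k, n) for k in q_table(i, n))
    return 2

for n in (9, 25, 49):  # perfect-square grid sizes
    assert max(hops(i, j, n) for i in range(1, n + 1)
               for j in range(1, n + 1)) == 2
print("every pair reachable within 2 hops")
```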

Next, we discuss how a Plume node maintains its q-table. In a Plume intercloud, every Plume node runs a Stabilize() procedure to handle disconnected Plume nodes in its q-table. The liveness of the Plume nodes in the q-table can easily be tracked by periodically exchanging heartbeat messages with the ring edges and the Plume nodes in the q-table. If a failed Plume node j is in the column of Plume node i's q-table, then Plume node i can replace it with any active Plume node in rowj of the failed node j. For example, in Figure 2, if Cloud 5 is disconnected from the Plume intercloud, then Cloud 2 replaces it with Cloud 4 to maintain overlay connectivity; similarly, if Cloud 1 is disconnected, then Cloud 7 replaces it with Cloud 2.
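The replacement rule can be sketched as follows; this is our own illustrative code (helper names are ours, and picking the lowest-numbered live candidate is an assumption for determinism), reproducing the Figure 2 failure examples:

```python
import math

def row_of(j, n):
    """Nodes on failed node j's row in a sqrt(n) x sqrt(n) grid, minus j."""
    s = int(math.isqrt(n))
    r = (j - 1) // s
    return {r * s + k + 1 for k in range(s)} - {j}

def replacement(j, n, alive):
    """Per the Stabilize() rule, replace failed node j with a live node
    from row_j; returns None if the whole row is down."""
    candidates = sorted(k for k in row_of(j, n) if k in alive)
    return candidates[0] if candidates else None

alive = set(range(1, 10)) - {5}  # Cloud 5 fails
print(replacement(5, 9, alive))  # 4: Cloud 2 replaces 5 with 4
alive = set(range(1, 10)) - {1}  # Cloud 1 fails
print(replacement(1, 9, alive))  # 2: Cloud 7 replaces 1 with 2
```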

LEMMA 2: If each Plume node in the Plume intercloud retains the correct ring edges, i.e., successor and predecessor, then the q-tables of the Plume nodes can be recovered from any Plume node failure after a stabilization time.

Proof: In a stable Plume intercloud network of n nodes, the Stabilize() procedure starts at a Plume node p by stabilizing its ring edges, i.e., successor and predecessor, which already hold the correct entries. After another run of stabilization, the successor of p's successor becomes correct, and so on. After at most √n stabilization runs, Plume node p has corrected all of the row entries in its q-table. Once the row entries are stabilized, each Plume node has correct row entries, and the remaining column entries can be retrieved by querying these stabilized row entries.

C. Event Dissemination Mechanism

Minimizing the event dissemination time, we contend, is central to a time-critical event service. We develop a quick event dissemination algorithm in which time-critical events are disseminated to all Plume nodes via the q-table edges of the QRO topology. Based on the QRO topology, Plume intercloud typically provides swift event dissemination in large-scale distributed clouds. For simplicity, we assume the QRO has a perfect grid quorum, although the algorithm can tolerate node failures. The event dissemination algorithm can be invoked by any Plume node to initiate a dissemination operation among its q-table edges. The algorithm is presented below. In Algorithm 1, we assume that a Plume node needs to



Algorithm 1 Plume Disseminate(e) Algorithm
1: /* Plume node s is invoked to disseminate an event e */
2: for all Plume node i ∈ s's q-table do
3:   e.ttl ← 2
4:   Send(e, i)
5: end for

Algorithm 2 Plume Relay(e) Algorithm
1: /* Plume node r receives an event e from s */
2: if e.ttl − 1 ≤ 0 then
3:   return
4: end if
5: if Plume node r ∈ cols then
6:   for all Plume node i ∈ rowr do
7:     e.ttl ← 1
8:     Send(e, i)
9:   end for
10: else if Plume node r ∉ cols then
11:   /* the event dissemination operation is done */
12:   return
13: end if

disseminate events to every other node in the Plume intercloud, and Algorithm 2 presents the Relay(e) operation at a Plume node r that receives an event e from s.

As previously mentioned, the event dissemination algorithm of Plume intercloud does not require complex machinery; this simple technique achieves a much improved dissemination time thanks to the low-diameter property of the QRO topology. For instance, in Figure 2, the Plume node s of Cloud 2 is invoked to disseminate an event to the other clouds in the Plume intercloud. The Plume nodes of Clouds 1, 3, 5 and 8 receive the event from s directly, and the Plume nodes of Clouds 5 and 8 ∈ col2 relay the event to Clouds 4, 6 ∈ row5 and Clouds 7, 9 ∈ row8, respectively.

In discussing the average dissemination delay in the Plume intercloud, we simplify the transmission delay to hop counts, i.e., one overlay hop costs a delay of 1.
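Algorithms 1 and 2 can be replayed directly in simulation. The sketch below is our own code (function name and 1-indexed row-major grid layout are assumptions) reproducing the Figure 2 example from originator s = 2 and recording each node's hop delay:

```python
import math

def disseminate(s, n):
    """Hop delay at which each of the n nodes receives an event from s,
    following Plume's Disseminate/Relay over the grid quorum."""
    sdim = int(math.isqrt(n))
    r, c = (s - 1) // sdim, (s - 1) % sdim
    row_s = {r * sdim + k + 1 for k in range(sdim)}
    col_s = {k * sdim + c + 1 for k in range(sdim)}
    delay = {s: 0}
    for i in (row_s | col_s) - {s}:   # Disseminate(e): send to s's q-table
        delay[i] = 1
    for relay in col_s - {s}:         # Relay(e): column nodes forward along their rows
        rr = (relay - 1) // sdim
        for j in range(rr * sdim + 1, rr * sdim + sdim + 1):
            delay.setdefault(j, 2)
    return delay

d = disseminate(2, 9)
print(sorted(k for k, v in d.items() if v == 1))  # [1, 3, 5, 8]
print(sorted(k for k, v in d.items() if v == 2))  # [4, 6, 7, 9]
```

The column relays cover every row of the grid, so all n nodes receive the event within two hops, as Lemma 1 promises.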

LEMMA 3: In a stable Plume intercloud network of n nodes, the average dissemination delay caused by the Plume intercloud is equal to 2(n − √n)/(n − 1).

Proof: In a stable intercloud network of n nodes, the originator Plume node disseminates the event to its q-table in one hop, and the Plume nodes in its q-table relay the event with one further hop. Therefore, for any originating Plume node, the nodes in its q-table receive the event with a delay of 1; the size of the q-table is 2√n − 1. A total of (n − 2√n) Plume nodes receive the event with a delay of 2. Thus, the total dissemination delay in the Plume intercloud is approximately 1 × (2√n) + 2 × (n − 2√n) = 2(n − √n), and the average dissemination delay is 2(n − √n)/(n − 1).
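As a quick sanity check of Lemma 3, the proof's tally can be computed directly; this is our own arithmetic sketch (counting 2√n − 2 one-hop receivers, i.e., the originator's q-table excluding the originator, with which the closed form comes out exact):

```python
import math

def avg_delay(n):
    """Average hop delay over the n-1 receivers in an n-node Plume intercloud."""
    s = math.isqrt(n)
    one_hop = 2 * s - 2           # originator's q-table, excluding itself
    two_hop = n - 1 - one_hop     # every remaining node
    total = 1 * one_hop + 2 * two_hop
    return total / (n - 1)

for n in (9, 100, 10_000):
    s = math.isqrt(n)
    closed_form = 2 * (n - s) / (n - 1)
    assert math.isclose(avg_delay(n), closed_form)
    print(n, avg_delay(n))
# n=9 gives 1.5; the average approaches 2 as n grows
```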

Fig. 3. End-to-end latency distribution (CDF) over all pairs in the King-topology and the Brite-topology.

IV. PERFORMANCE EVALUATION

In this section, we present the details of the framework used for the experiments. Our implementation of a packet-level simulator and two intercloud overlay designs, including Plume and a DHT-based intercloud [20], was written in Java. We implement one of the most popular DHT systems, Chord, as the event dissemination overlay [14]. We apply two physical topologies to simulate Internet networks. 1) King-topology: This is a real Internet topology from the King data set. The King data set delay matrix is derived from Internet measurements using the techniques described by Gummadi et al. [24]. It consists of 2,048 DNS servers, and the latencies are measured as RTTs between the DNS servers. 2) Brite-topology: This is an AS topology generated by the BRITE topology generator [25] using the Waxman model, where alpha and beta are set to 0.15 and 0.2, respectively. In addition, HS (the size of one side of the plane) is set to 1,000 and LS (the size of one side of a high-level square) is set to 100. In total, the Brite-topology consists of 1,000 nodes.
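For reference, the Waxman model used by BRITE connects two nodes u and v with probability P(u, v) = α · e^(−d/(βL)), where d is their Euclidean distance and L is the maximum distance between any two nodes in the plane. A minimal sketch with the parameters quoted above; the function name and the choice of L for a 1,000 × 1,000 plane are our assumptions:

```python
import math

def waxman_edge_prob(d, alpha=0.15, beta=0.2, L=1000 * math.sqrt(2)):
    """Waxman model edge probability: P(u, v) = alpha * exp(-d / (beta * L)).
    d is the Euclidean distance between u and v; L is the maximum
    possible distance in the plane (the diagonal of a 1,000 x 1,000
    plane, matching the HS parameter in the text)."""
    return alpha * math.exp(-d / (beta * L))
```

With alpha = 0.15 and beta = 0.2, nearby nodes connect with probability close to 0.15 and the probability decays exponentially with distance, producing the sparse, locality-biased AS graphs used in the simulations.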

The simulated topology places every Plume node at a position on the King-topology or the Brite-topology, chosen uniformly at random. Every simulation result is the average of 50 runs. The average delay is 77.4 milliseconds in the King-topology and 96.2 milliseconds in the Brite-topology. In Figure 3, we show the CDFs of the King-topology and the Brite-topology.

First, we investigate the event dissemination latency of the intercloud overlay designs. Figure 4 plots the average event dissemination latency of both intercloud designs in the Brite-topology with the 95% confidence interval, and Figure 5 plots these results in the King-topology. As shown in Figures 4 and 5, for the Plume intercloud the average event dissemination latency does not grow with the number of Plume nodes. In the Chord-based design, however, an event dissemination operation may visit a logarithmic number of hops to deliver an event to the other clouds [14]. The expected event dissemination latency of the Plume intercloud is about 150 milliseconds in the King-topology and approximately 200 milliseconds in the Brite-topology. The confidence intervals are computed over 50 independent runs.
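The shape of these curves follows from the hop counts alone. The rough model below is our own illustration, not the simulator: it contrasts Chord's expected (1/2) · log2 n lookup hops with Plume's constant two hops:

```python
import math

def expected_hops(n):
    """Illustrative hop-count model behind the latency comparison:
    a Chord lookup takes about (1/2) * log2(n) hops on average, while
    Plume's quorum ring delivers every event within 2 hops regardless
    of n. Returns (chord_hops, plume_hops)."""
    chord = 0.5 * math.log2(n)
    plume = 2
    return chord, plume

# Chord's hop count grows with n; Plume's stays constant
for n in (600, 1200, 2400, 4200):
    chord, plume = expected_hops(n)
```

Multiplying each hop count by the average per-hop delay of the underlying topology gives the qualitative trend in Figures 4 and 5: a logarithmically growing curve for Chord and a flat one for Plume.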




Fig. 4. Average event dissemination latency vs. number of Plume nodes in the Brite-topology


Fig. 5. Average event dissemination latency vs. number of Plume nodes in the King-topology

Figure 6 plots the relative performance of the intercloud overlay designs in terms of event dissemination latency for the two network topologies. We run experiments in which the number of Plume nodes is set to 1,024. As shown in Figure 6, the curve for the Chord-based design has a flatter tail in the King-topology. This is due to the fact that the distribution of physical link latency in the King-topology is more skewed than that of the Brite-topology. These results show that Plume provides a suitable technique for time-critical event dissemination services.

V. CONCLUSIONS

We present Plume, an initial design of an intercloud overlay for efficient time-critical event dissemination in large-scale geographically distributed cloud computing environments. The design of Plume avoids the bottlenecks of centralized approaches and the long dissemination latency of tree-based and DHT-based approaches. Overall, the Plume intercloud achieves constant event dissemination latency, and it can be used as a building block of an intercloud for deploying efficient time-critical event dissemination services in distributed cloud environments.

REFERENCES

[1] K. Church, A. Greenberg, and J. Hamilton, "On delivering embarrassingly distributed cloud services," Proc. of ACM HotNets, 2008.

[2] "Datacenter dynamics - gartner: private clouds to enjoy more investment through 2012 than public services," http://www.datacenterdynamics.com, 2009.


Fig. 6. The CDF of event dissemination latency in the Brite and King topologies

[3] N. Laoutaris, P. Rodriguez, and L. Massoulie, "ECHOS: Edge capacity hosting overlays of nano data centers," ACM SIGCOMM CCR, 2008.

[4] "Wikipedia - Intercloud," http://en.wikipedia.org/wiki/Intercloud.

[5] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's highly available key-value store," Proc. of ACM SOSP, 2007.

[6] "Disaster alert network," http://www.ubalert.com/.

[7] "The Australian early warning network," http://www.ewn.com.au/.

[8] V. Cerf, "Technology review: Integrating the clouds," http://www.technologyreview.com/computing/24173/, 2010.

[9] D. Bernstein, E. Ludvigson, K. Sankar, S. Diamond, and M. Morrow, "Blueprint for the intercloud - protocols and formats for cloud computing interoperability," Proc. of IEEE International Conference on Internet and Web Applications and Services, pp. 328-336, 2009.

[10] R. Buyya, R. Ranjan, and R. N. Calheiros, "InterCloud: Scaling of applications across multiple cloud computing environments," Proc. of IEEE International Conference on Algorithms and Architectures for Parallel Processing, 2010.

[11] A. Celesti, F. Tusa, M. Villari, and A. Puliafito, "How to enhance cloud architectures to enable cross-federation," Proc. of IEEE International Conference on Cloud Computing, pp. 337-345, 2010.

[12] "Facebook developers page," http://developers.facebook.com/opensource/.

[13] S. Girdzijauskas, G. Chockler, Y. Vigfusson, Y. Tock, and R. Melamed, "Magnet: practical subscription clustering for internet-scale publish/subscribe," Proc. of ACM DEBS, pp. 172-183, 2010.

[14] W. W. Terpstra, S. Behnel, L. Fiege, A. Zeidler, and A. P. Buchmann, "A peer-to-peer approach to content-based publish/subscribe," Proc. of ACM DEBS, pp. 1-8, 2003.

[15] D. Kempe, A. Dobra, and J. Gehrke, "Gossip-based computation of aggregate information," Proc. of IEEE FOCS, p. 482, 2003.

[16] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," Proc. of ACM SIGCOMM, 2001.

[17] M. Maekawa, "A sqrt(N) algorithm for mutual exclusion in decentralized systems," ACM Trans. Comput. Syst., 1985.

[18] I. Sriram and A. Khajeh-Hosseini, "Research agenda in cloud technologies," Proc. of ACM Symposium on Cloud Computing, 2010.

[19] J. Hoffert, D. Schmidt, and A. Gokhale, "Adapting distributed real-time and embedded publish/subscribe middleware for cloud-computing environments," Proc. of ACM/IFIP/USENIX Middleware, 2010.

[20] R. Ranjan and R. Buyya, "Decentralized overlay for federation of enterprise clouds," Handbook of Research on Scalable Computing Technologies, IGI Global, 2010.

[21] S. Agarwal, J. Dunagan, N. Jain, S. Saroiu, A. Wolman, and H. Bhogan, "Volley: Automated data placement for geo-distributed cloud services," Proc. of USENIX NSDI, 2010.

[22] "CloudStatus technologies, Hyperic, Inc.," http://www.cloudstatus.com/.

[23] "CloudSwitch, Inc.," http://www.cloudswitch.com/.

[24] K. P. Gummadi, S. Saroiu, and S. D. Gribble, "King: Estimating latency between arbitrary internet end hosts," Proc. of ACM IMW, 2002.

[25] A. Medina, A. Lakhina, I. Matta, and J. Byers, "BRITE: An approach to universal topology generation," Proc. of ACM MASCOTS, 2001.
