

Network-Aware Operator Placement for Stream-Processing Systems

Peter Pietzuch, Jonathan Ledlie, Jeffrey Shneidman, Mema Roussopoulos, Matt Welsh, Margo Seltzer
Division of Engineering and Applied Science, Harvard University

[email protected]

Abstract

To use their pool of resources efficiently, distributed stream-processing systems push query operators to nodes within the network. Currently, these operators, ranging from simple filters to custom business logic, are placed manually at intermediate nodes along the transmission path to meet application-specific performance goals. Determining placement locations is challenging because network and node conditions change over time and because streams may interact with each other, opening venues for reuse and repositioning of operators.

This paper describes a stream-based overlay network (SBON), a layer between a stream-processing system and the physical network that manages operator placement for stream-processing systems. Our design is based on a cost space, an abstract representation of the network and on-going streams, which permits decentralized, large-scale multi-query optimization decisions. We present an evaluation of the SBON approach through simulation, experiments on PlanetLab, and an integration with Borealis, an existing stream-processing engine. Our results show that an SBON consistently improves network utilization, provides low stream latency, and enables dynamic optimization at low engineering cost.

1. Introduction

Distributed stream-processing systems (DSPSs), such as Borealis [7], Medusa [9], PIER [14], GATES [8], and IrisNet [13], collect, process, and aggregate data across massive numbers of real-time streams. These systems support applications such as monitoring of financial markets, detecting network intrusion, and connecting geographically-diverse sensor networks. DSPSs move query operators into the network, eliminating centralized processing. Pushing filtering, compression, and aggregation logic downstream can greatly reduce network traffic and increase performance. Even though identifying good placement nodes to host operators for a query is crucial, this operator placement problem is not addressed by current DSPSs.

DSPSs face three challenges when solving the operator placement problem. First, the placement of operators must result in good query performance for the application, such as low delay. Second, the DSPS should use the network efficiently, minimizing the global impact of all its running queries on the network. Doing so makes the DSPS more scalable in the number of supported queries and avoids squandering network resources that could be used by other applications. Especially in shared hosting environments with a large number of streams such as PlanetLab [24], socially-acceptable use of network links is important because resources are scarce and shared between participants. Third, when queries submitted by different users overlap, load and network traffic can be further reduced by finding and reusing existing operators. Although DSPSs have the need for decentralized, efficient operator placement and operator reuse, no common infrastructure exists for optimizing distributed data streams in a network-aware fashion.

In this paper, we describe a network abstraction called a stream-based overlay network (SBON). Designed for large-scale placement of query operators in DSPSs, an SBON manages operator placement within a pool of wide-area overlay nodes in order to make efficient use of network resources. The SBON makes placement decisions based on its on-going knowledge of stream, network, and node conditions and continuously optimizes placement without global knowledge of the system. By hiding the details of network measurement, operator placement decisions, and dynamic adaptation, the SBON layer simplifies the development of efficient, network-aware DSPSs.

Our SBON architecture uses a scalable, decentralized, and adaptive optimization technique based on a multidimensional metric space we call a cost space. Every node in the SBON maintains its cost space coordinate such that the distance between two nodes represents the overhead for routing data between them. The SBON determines the placement of a query operator in this virtual space using a spring relaxation algorithm, then maps its decision back to a physical node. This algorithm minimizes the network usage of a query, while keeping the query delay low and picking nodes with adequate bandwidth. Existing operators are also

located through the cost space and re-used for new queries when possible. SBON nodes dynamically re-evaluate operator placement decisions and migrate operators to new hosts based on changing conditions.

[Figure 1. An example of a DSPS: producers (P), including seismic monitors, feed streams S1, S2, and S3 through filtering, aggregation, and intrusion-detection operators hosted on SBON nodes to consumers (C).]

We evaluate our SBON implementation in simulation and with a deployment on PlanetLab. Our results show that, in simulation, our placement optimization technique exhibits close to optimal network usage. On PlanetLab, we benchmark the migration and reuse capabilities, in particular, and measure the engineering effort required to integrate it with the Borealis stream-processing system [7]. In our experiments, we found that operator migration reduced network usage by 17% and that re-use reduced it by 21%.

The rest of the paper is structured as follows. In Section 2, we give the problem background, and in Section 3 we describe our architecture. In Section 4, we evaluate the SBON implementation, and in Section 6 we conclude.

2. Distributed Stream-Processing Systems

A distributed stream-processing system streams data from multiple producers to multiple consumers via in-network processing operators, an example of which is shown in Figure 1. The figure shows three query streams that belong to two applications. S1 performs intrusion detection for several networks, and S2 and S3 aggregate seismic data from multiple sensor network deployments. Note that S2 and S3 share operators because the consumers are interested in the same data.

Query Model. We use the term query to denote the expression that is submitted by a single user describing her information need. Our basic model is one of multiple queries, each interconnecting multiple operators. The DSPS instantiates the query in the network in the form of a query stream of data tuples flowing through the network. Operators that are part of a query stream are interconnected by overlay operator links, each with a certain latency and a data rate.

Operator Model. Generic operators in a DSPS can be categorized into three classes: producers, consumers, and operators, which act as data generators, receivers, and processors, respectively. If an operator permanently resides at a physical network location, it is called a pinned operator. Producers and consumers are typically pinned. In contrast, an unpinned operator, such as a join or a select operator, can be instantiated at an arbitrary node in the network. Examples of processing operators include an aggregation operator for seismic data, a join operator for relational data, and a face recognition operator for video surveillance.

Operator Placement Problem. In Figure 1, there are many nodes that can host the four pictured unpinned operators. One of the main tasks of a DSPS is operator placement, or the selection of the physical node that should host the operator. The quality of a given placement is quantified by an operator placement metric.

2.1 Metrics for Operator Placement

In Section 1, we described three challenges when facing the operator placement problem: achieving good application query performance, using the network efficiently to service a query, and reusing existing operators when appropriate. These challenges should guide the choice of operator placement metric.

Achieving good application-perceived query performance, such as low delay, is a basic necessity. However, optimizing for individual application performance alone ignores the need to support a large number of streams. In DSPSs for financial markets, network monitoring, or wide-area sensor data collection, we expect many large concurrent streams to be a common use case. Individual streams must use network resources efficiently by conserving bandwidth and reusing existing operators where possible in order to support the largest number of concurrent streams. Furthermore, there is a monetary argument for minimizing bandwidth if the owners of a DSPS pay for the bandwidth usage of their system.

A tension exists between satisfying the application-perceived performance needs and minimizing overall bandwidth usage. At one extreme, an application-perceived delay metric will greedily blast data from producers to consumers, rather than favoring a lower bandwidth stream that uses in-network operator placement. At the other extreme, a metric that seeks to maximize the number of concurrent streams could make room by routing an individual stream on a circuitous path, increasing that stream's application-perceived delay.

In this paper, we introduce a metric for operator placement, network usage, that trades off application delay and consumed network bandwidth. This metric is similar to what others have proposed for the evaluation of application-level multicast, where minimizing application delay and consumed network bandwidth have been seen to collide [10].

2.2 Current Techniques for Operator Placement

Even though all DSPSs are faced with the problem of in-network operator placement, current placement metrics do not satisfy the three operator placement goals listed above.

Most DSPSs, such as Borealis [7] and GATES [8], currently avoid the operator placement problem by supporting only pre-defined operator locations with pinned operators in the network. This leaves the burden of efficient operator placement to the system administrator, which is infeasible for a dynamic, large-scale system with thousands of queries.

Other DSPSs, such as Medusa [9], place operators to improve application performance by balancing load. This is appropriate within a single data center but leads to poor performance on a wide-area network, in which communication latencies can dominate processing costs. In addition, these approaches are not scalable to DSPSs with thousands of nodes that must be considered for placement.

The location of operators and corresponding relational tables in PIER [14], a distributed database built on top of a DHT, is determined through hashing, leading to effectively random placement of operators in the network. Such random distribution has good load-balancing properties but causes large query delays when operators are placed at nodes distant from both producers and consumers.

To our knowledge, the only prior work on network-aware operator placement in DSPSs is SAND [1, 2], which is proposed as an extension to Borealis. Here, operators are placed either at the consumer side, at the producer side, or in-network on a DHT routing path between the two endpoints, depending on the bandwidth usage of a query. Applications can also specify delay constraints on the placement path in the DHT. In previous work [20], we have shown that DHT routing paths can lead to inefficient candidate sets for operator placement. This is because DHT routing tables are optimized for minimizing hop count and not for delay or bandwidth usage. Our approach of performing operator placement in a cost space is more general than SAND because placements are not tied to DHT routing paths.

2.3 Network Usage: A Blended Metric

We suggest a more appropriate metric for operator placement, network usage, that trades off overall application delay and network bandwidth consumption. The network usage u(q) is the amount of data that is in-transit for query q at a given instant:

u(q) = Σ_{l∈L} DR(l) · Lat(l), (1)

where L is the set of links used by the stream, DR(l) is the data rate over link l, and Lat(l) is the latency. This captures the bandwidth-delay product of the query. The network usage U for the entire DSPS is simply the sum over all queries in the system.

[Figure 2. Architecture Overview: the DSPS layer (query plan generation, operator instantiation, data stream management, operator migration) sits above the SBON layer (cost space maintenance, full and local query placement), which exchanges latency measurements, cost space updates, stream monitoring data, and placement decisions with the underlying network.]
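To make the metric concrete, the bandwidth-delay product of Equation (1) can be computed per query from the measured data rates and latencies of its links. The sketch below is our illustration, not the SBON implementation, and the link values are invented:

```python
def network_usage(links):
    """u(q): sum over links of data-rate x latency (Equation 1),
    i.e. the amount of stream data in transit for the query."""
    return sum(data_rate * latency for data_rate, latency in links)

# Hypothetical placements of one query (rate in KB/s, latency in ms):
# pushing a filter in-network halves the downstream rate, lowering u(q)
# even though the two-hop path has slightly higher total latency.
direct = [(100.0, 50.0)]                    # producer -> consumer, 50 ms
in_network = [(100.0, 30.0), (50.0, 30.0)]  # producer -> filter -> consumer
assert network_usage(direct) == 5000.0
assert network_usage(in_network) == 4500.0
```

Summing `network_usage` over all queries gives the system-wide usage U that the SBON tries to minimize.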

From an individual application's perspective, this metric may appear frightening, as it seemingly says little about an individual stream's perceived delay. However, in this paper we show that negative effects on application-perceived delay are small. Intuitively, this is true because of the way in which the network usage metric effectively favors nodes that are "close" to the centroid of the producers and the consumers in a query.

From the network perspective, one can think of this equation as scaling the network bandwidth by the relevant link latencies. This interpretation captures the idea that the longer data stays in the network, the more likely it is to traverse nodes and links that could be used for other queries. At the same time, a path that requires extra latency to reach an in-network operator is justified if the operator can commensurately reduce the overall stream bandwidth.

While high latency links are obviously poor choices for latency-sensitive applications, they are poor choices for networking reasons as well. High latency between two nodes indicates that the routing path traverses a long geographical distance or passes through multiple physical links with intermediate routers. In both cases, the monetary cost of operating such a path is likely higher compared to shorter ones. Also, if path congestion is causing a link's high latency, this link should be avoided for the global good.

3. Architecture

In Figure 2, we show how we extend the functionality of a DSPS with an SBON layer. Our approach augments existing DSPSs with efficient operator placement, while exposing their full features to users. In particular, the query optimizer of the DSPS remains unchanged and creates an efficient query plan before passing it to the SBON layer for placement. A network node runs two components: (1) the DSPS handles operator instantiation, migration, state, and stream data transport, and (2) the SBON layer monitors local performance, manages the cost space, and informs the DSPS when to migrate its services.

Stream instantiation and optimization take place over several stages. First, the DSPS defines a query plan through its internal query optimizer, taking into account estimates of data rates and selectivity of operators, for example. Second, it passes the query plan to the SBON layer, which performs operator placement based on current network and node conditions, binding every operator to a node. During the lifetime of a query, nodes hosting operators periodically reconsider local placements, potentially migrating operators.

As illustrated in Figure 2, the SBON layer interacts with the DSPS through a simple interface: the SBON requests the DSPS (1) to instantiate and destroy operators on the local node, (2) to connect and disconnect the input and output links of an operator to other operators, and (3) to migrate an operator to a remote node. The DSPS is responsible for tearing down connections, packaging state, and instantiating the operator on the new node during a migration operation.

In turn, the SBON layer monitors information about the operators hosted on the local node in order to make optimization decisions: (1) the SBON is aware of the number of input and output links that an operator supports, (2) an operator may advertise its selectivity (i.e., the ratio of the input and output data rates), or the SBON may obtain this information through runtime measurements of data rates on operator links, and (3) an operator exports information about whether it can be migrated between nodes and reused between multiple queries.
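The interface described above amounts to a handful of callbacks. The sketch below is our Python rendering, not the actual Borealis or SBON API; every class and method name is invented, and we additionally show one plausible way a migration could decompose into the primitive calls (the paper leaves the tear-down and re-instantiation sequence to the DSPS):

```python
from abc import ABC, abstractmethod

class DSPSInterface(ABC):
    """Calls the SBON layer makes into the stream-processing engine.
    Names are illustrative; they are not taken from Borealis."""

    @abstractmethod
    def instantiate(self, op, node): ...    # create operator `op` on `node`

    @abstractmethod
    def destroy(self, op, node): ...        # remove operator `op` from `node`

    @abstractmethod
    def connect(self, src_op, dst_op): ...  # create an operator link

    @abstractmethod
    def disconnect(self, src_op, dst_op): ...

    def migrate(self, op, old_node, new_node, inputs, outputs):
        """One plausible migration sequence (ours): tear down links,
        move the operator (state packaging omitted), re-establish links."""
        for src in inputs:
            self.disconnect(src, op)
        for dst in outputs:
            self.disconnect(op, dst)
        self.destroy(op, old_node)
        self.instantiate(op, new_node)
        for src in inputs:
            self.connect(src, op)
        for dst in outputs:
            self.connect(op, dst)
```

Keeping the interface this narrow is what lets the SBON layer be bolted onto an existing engine at low engineering cost.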

3.1. Operator Placement Algorithm

The main challenge for an operator placement algorithm is the potentially large number of nodes that need to be considered for placement. No entity has complete information about current network and node conditions to make an optimal decision. Unfortunately, simple heuristics that consider only a subset of all nodes for placement [1] or perform a localized search [5] risk never finding a good placement.

We wanted to design our operator placement algorithm to satisfy three basic requirements. First, it must be scalable in the number of concurrent queries, as well as the number of network nodes where operators can be placed. This implies that the algorithm must be decentralized, not depend on global knowledge of network conditions, and have low communication overhead. Second, the placement algorithm should be efficient and yield "good" placements in terms of latency and network usage. Finally, placement decisions should be adaptive to changing conditions, such as latency, stream data rates, and load.

The SBON achieves efficient, decentralized in-network operator placement and optimization through two mechanisms: (1) a cost space, which is a metric space that captures the cost for routing data between nodes, and (2) a relaxation placement algorithm, which places operators using a spring relaxation technique that manages single- and multi-query optimization.

3.1.1 Cost Space

A cost space encodes network and node measurements from all nodes and is constructed in a decentralized fashion. By projecting the placement problem into a virtual cost space and then mapping the solution back to physical nodes, SBON nodes can compute operator placement decisions in a continuous mathematical space with the use of sophisticated optimization techniques. In addition, any SBON node can approximate globally optimal decisions. For example, an SBON node in Europe making a placement decision for a query with producers located in Japan can determine that a good placement node may be on the US West coast without having to probe the node directly.

A cost space is a d-dimensional metric space where the Euclidean distance between two nodes is an estimate of the cost of routing data between those nodes. Each SBON node is responsible for maintaining its own coordinate. A wide range of performance metrics can be used to define a cost space. We examine latency and load, but other resource measures, such as availability, bandwidth, memory, or processing power, could have been included instead. Our SBON implementation uses a combined latency/load space with three latency dimensions and one load dimension.

Latency. The latency dimensions of a cost space form a latency space [16], where the distance between two coordinates is a reasonable prediction of the latency between the nodes. Several instances of synthetic network coordinates to estimate Internet latencies exist [16, 17]. The overhead of maintaining a latency space is small because a node can calculate its network coordinate after probing the latency to only a small subset of nodes. That subset consists of either well-known landmark nodes [17] or is randomly chosen [11]. Since Internet latencies often violate the triangle inequality and change dynamically over time, network coordinates can only provide an estimate of true latency. However, simulation results suggest that network coordinates, even with low dimensionality, have a small prediction error with a median of 11% [12].

Figure 3 shows a 3-dimensional latency space as calculated by the Vivaldi algorithm [11] with 115 North American nodes on PlanetLab. As annotated, three geographic clusters of nodes can be identified. Each node running Vivaldi keeps track of its 3-dimensional coordinate and its confidence in that coordinate. The algorithm works by each node successively refining its coordinates through periodic measurements to random other nodes. Each update consists of two nodes measuring the current latency to one another, plus an exchange of their coordinates and confidences. With this information, nodes develop a metric space where two nodes can approximate their true latency even if they have never exchanged measurements directly.
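As a rough illustration of the coordinate refinement described above, the following is a heavily simplified Vivaldi-style update step. This is our sketch, not the paper's code: the confidence weighting and other refinements of the real algorithm [11] are omitted, and the step constant is invented.

```python
import math

def vivaldi_step(local, remote, measured_rtt, step=0.25):
    """Move `local` along the line through `remote` so that the
    coordinate distance tracks the measured latency (simplified:
    no confidence weighting)."""
    dist = math.dist(local, remote)
    if dist == 0.0:
        return list(local)  # coincident nodes; real Vivaldi perturbs here
    error = measured_rtt - dist          # > 0 means "too close" in the space
    direction = [(l - r) / dist for l, r in zip(local, remote)]
    return [l + step * error * d for l, d in zip(local, direction)]

# Repeated measurements pull the predicted distance toward the true RTT.
a, b = [0.0, 0.0, 0.0], [10.0, 0.0, 0.0]
for _ in range(50):
    a = vivaldi_step(a, b, measured_rtt=40.0)
assert abs(math.dist(a, b) - 40.0) < 1e-3
```

In the full system every node runs such updates continuously against random peers, so the space as a whole converges without any coordinator.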

[Figure 3. 3-D latency space on PlanetLab (axes in ms), showing US East, US Central, and US West clusters.]

[Figure 4. Latency samples on PlanetLab: raw and smoothed UDP latency samples over time, with a CDF of all samples.]

The latency space dynamically adapts to changing network conditions as nodes continuously refine their coordinates. This raises the issue of stability of the latency space: if latency measurement variance is high, the latency space might not converge. Especially on nodes with high CPU load, application-level latency measurements may lead to skewed results [23]. On the other hand, a certain amount of dynamism is desired for the latency space to adapt to changing network conditions.

As part of developing the latency space, we found a simple mechanism to create stable, adaptive coordinates. We feed each latency observation into a moving percentile filter, an instance of a low-pass non-linear filter. Each link maintains its own filter and, even though latency observations have significant variance and their distribution is heavily skewed, the filter captures the consistent baseline of each link. Figure 4 shows the raw and smoothed application-level UDP latency samples gathered between two SBON nodes on PlanetLab over the course of 23 hours, together with the CDF of all samples. The observed baseline latency varies due to Internet route changes. The variance in measurements is caused by the load on the nodes, which averaged 40 during our sampling, and congestion on the network paths. We found the filter dampens measurement variance on the latency space without losing sensitivity to changes in Internet routing paths. We determined empirically that the 20th percentile from a history of 10 samples leads to a stable latency space on PlanetLab [18].

[Figure 5. Rate of coordinate updates per minute over time.]

We also found empirically that the latency space converges quickly. As shown in Figure 5, we observed that it takes around 30 minutes for a latency space with 116 simultaneously-added North American PlanetLab nodes to reach stability. The median relative prediction error for latencies was 9% for these nodes. In addition, the latency space can gracefully handle node churn because new nodes can learn their coordinates after a small number of measurements.

Load. The performance of a placed operator may be dominated by the CPU load of the hosting node. Therefore, an SBON node monitors its own load average and includes this as another dimension in its coordinate. A scaling function determines the weight given to load with respect to latency. As a result, nodes with a high load will move away in the cost space, making them less likely candidates for placement decisions. If particularly load-sensitive operators were to be run on the system, a multiplicative or non-linear additive function could be used to further penalize nodes with high load. For simplicity, we adopt a linear mapping in our experiments.

3.1.2 Relaxation Placement

Our placement algorithm, called RELAXATION, makes efficient operator placement decisions within the cost space. The main idea behind RELAXATION is to partition the placement problem into two phases. First, an unpinned operator in a query is placed using a spring relaxation technique in the latency dimensions of the virtual cost space, and then the solution is mapped to the closest physical SBON node in the cost space.

Using spring relaxation to place services in the cost space has several advantages. First, its iterative nature allows placement decisions to adapt to changing network, node, and query conditions. Second, spring relaxation is decentralized and does not require coordination between nodes. Finally, it naturally supports cross-query optimization, which considers the impact of shared placement decisions.

RELAXATION models the overlay network of queries as a collection of massless bodies (operators) connected by springs (operator links). Pinned operators have a fixed location, whereas unpinned operators can move freely. The goal is to compute the rest lengths of this system of interconnected springs. The average force F_i experienced by a spring i is F_i = (1/2) k_i s_i, where k_i is the spring constant and s_i is the extension vector. To minimize network usage in the latency space with spring relaxation, we set the spring extension to the latency, s_l = Lat(l), and the spring constant to the data rate transferred over that link, k_l = DR(l).

[Figure 6. Example of RELAXATION: (a) before relaxation, (b) after relaxation, with producers P1 and P2, unpinned operator S, consumer C, and new host node n.]

Optimizing placement then becomes solving for the low energy state of the system, minimizing the sum of the potential energies E_i stored in the springs:

arg min_{s_i} Σ_i E_i = Σ_i F_i s_i = Σ_i (1/2) k_i s_i², (2)

Thus, spring relaxation minimizes

Σ_{l∈L} DR(l) Lat(l)², (3)

which includes the network usage metric u(q). The additional squared exponent in the function ensures that there is a unique solution from a set of placements with equal network usage.

We illustrate the placement of a query in latency space in Figure 6. The thickness of each link is proportional to its data rate, and its length to the link latency. In (a), the location of the unpinned operator S has become sub-optimal due to network dynamics and "stretches" the operator links to producers P1 and P2. This causes the operator to migrate to a better node n, which is shown in (b). In general, a strong force pulling an unpinned operator in a particular direction can either be caused by multiple operator links or a single operator link that has a high data rate.

Spring relaxation can be implemented efficiently with a decentralized algorithm. Each spring is relaxed independently by moving it a small amount, potentially affecting the extensions of other springs in the system. After a number of such relaxation iterations, the system of springs converges towards a low energy state because each iteration decreases the total force in the system. Another advantage of this approach is that it naturally supports placement decisions when queries are interconnected with shared operators.

VIRTUAL-PLACE(S)
1  repeat
2      F ← 0
3      for each S_parent in Parents(S)
4          do F ← F + (S_parent − S) × DR(S, S_parent)
5      for each S_child in Children(S)
6          do F ← F + (S_child − S) × DR(S_child, S)
7      S ← S + F × δ
8  until ‖F‖ < F_t

Figure 7. RELAXATION algorithm

By viewing the entire query graph as a network of springs, the placement of an operator can then potentially affect the placements of other operators with transitively shared operator links. In practice, placement effects will be more localized because we factor in the operator migration cost to ensure that placement decisions do not create oscillation in the system (see Section 3.2).

3.1.3 Algorithm

The SBON layer employs a two-step placement algorithm that has a straightforward decentralized implementation. Consider an SBON with its queries. For each unpinned operator, the following two steps are executed: the operator is first placed using RELAXATION in the latency dimensions of the cost space with respect to the location of its neighbors in the query; next, the computed cost space coordinate is mapped to a physical SBON node. To make the placement decision adaptable, the placement and mapping steps are repeated continuously for all unpinned operators. This does not cause a large communication overhead because each SBON node can perform placement decisions with local knowledge after learning only the cost space coordinates of its direct query neighbors. Because all nodes are a part of the cost space, any node can perform the virtual operator placement and physical operator mapping stages. We now describe the two steps in more detail.

Virtual Operator Placement. Figure 7 shows the pseudo-code used by RELAXATION to find the virtual placement for an unpinned operator in the cost space. The VIRTUAL-PLACE function is executed by the SBON optimizers on every SBON node. The current cost space coordinate of the operator is ~S. Only the latency dimensions of the cost space coordinate are considered because, ideally, the operator should be placed at a node with the lowest possible load. A new unpinned operator without a coordinate is assigned a provisional location close to the coordinate origin at random. The total force ~F that this operator experiences is calculated by iterating over its parent and child operators connected through input and output operator links and retrieving their current cost space coordinates over the network. The force ~F is updated with the distance in cost space between the current coordinate ~S and the remote coordinates, scaled by the data rate used by the query link (lines 3–6).

The data rates of a new query are based on estimates provided by the DSPS, which are refined through network measurements once stream data is flowing. The movement of the operator through latency space is dampened by a factor δ to avoid unnecessary oscillation around the optimal location. The cost space coordinate of the operator is updated iteratively (line 7) until the magnitude of the force ~F falls below a force threshold F_t (line 8). The force threshold sets the desired precision of the virtual operator placement. After the operator coordinate has been computed, the operator is mapped to a physical SBON node. We determined empirically that δ = 0.1 and F_t = 1 work well on PlanetLab and used these values in our experiments.

Physical Operator Mapping. The second stage of the placement algorithm maps the target cost space coordinate to a physical node. It uses the virtual placement coordinate (with the load dimension equal to zero) to identify a region of the cost space and then proceeds to select a node (with potentially non-zero load) within that region based on application requirements.
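The relaxation loop of Figure 7 can be sketched as follows. This is a minimal sketch, assuming 2-D latency-space coordinates represented as plain tuples and a `neighbors` list covering both parent and child operator links; the force is taken toward each neighbor so that the update moves the operator in the direction of the net pull. DELTA and F_T use the values the paper reports as working well on PlanetLab.

```python
import math

DELTA = 0.1  # dampening factor (delta in the paper)
F_T = 1.0    # force threshold F_t

def virtual_place(coord, neighbors):
    """Relax an unpinned operator toward its query neighbors.

    `neighbors` is a list of (neighbor_coord, data_rate) pairs, one per
    operator link (parents and children alike).
    """
    x, y = coord
    while True:
        fx = fy = 0.0
        for (nx, ny), rate in neighbors:
            # Spring force: displacement toward the neighbor, scaled by
            # the data rate of the operator link (lines 3-6 of Figure 7).
            fx += (nx - x) * rate
            fy += (ny - y) * rate
        # Dampened move (line 7), then the stop threshold (line 8).
        x, y = x + fx * DELTA, y + fy * DELTA
        if math.hypot(fx, fy) < F_T:
            return (x, y)
```

With two equal-rate neighbors the operator converges near their midpoint, the weighted centroid that minimizes the squared objective.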

The stage falls into five steps:

1. Find k nodes whose coordinates are near the target cost space coordinate.

2. Contact this small set of nodes directly to discover their current operators and resources.

3. Sort the list by distance to the target coordinate.

4. Walk the sorted list, returning the first node already running the operator.

5. Failing that, return the nearest node that meets the application's resource criteria.

Step (1) is done by a k-nearest-neighbor search in the cost space. We used k = 10 in our experiments. This problem has been well studied in the literature, and our prototype currently solves it centrally by performing a simple brute-force search. Several distributed solutions have been deployed with success and are directly applicable. The two primary approaches have been based on space-filling curves [3, 22] and on greedy geographic routing in a multi-dimensional metric space [21]. We have developed initial implementations based on each of these approaches [19], and comparing them is part of on-going work. If no nodes are found, we expand the search parameter k.

Step (5) ensures that operators are only placed on nodes that have sufficient resources to support the operator. Resources include node resources, such as CPU, memory, and disk space, and also network resources, such as available network bandwidth. Due to a lack of resources, operators are sometimes placed sub-optimally in terms of our network usage metric. However, this distributes the processing load of operators across nodes, avoiding localized hotspots.
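The five mapping steps can be sketched as below. This is a minimal sketch, assuming the prototype's central brute-force k-nearest-neighbor search; the node-record layout (coordinate, running operators, resources) and the `has_resources` predicate are illustrative stand-ins for what a deployment would discover by contacting candidates directly.

```python
from math import dist

def map_to_node(target, nodes, operator, has_resources, k=10):
    """Map a virtual cost-space coordinate to a physical SBON node.

    `nodes` is a list of (coord, running_operators, resources) tuples;
    `has_resources` is an application-supplied predicate on resources.
    """
    # Step 1: brute-force k-nearest-neighbor search in the cost space.
    # Steps 2-3: (a deployment would contact each candidate here); the
    # list is already sorted by distance to the target coordinate.
    candidates = sorted(nodes, key=lambda n: dist(n[0], target))[:k]
    # Step 4: prefer a candidate already running the operator (reuse).
    for coord, running, res in candidates:
        if operator in running:
            return coord, running, res
    # Step 5: otherwise the nearest node meeting the resource criteria.
    for coord, running, res in candidates:
        if has_resources(res):
            return coord, running, res
    return None  # caller would widen k and retry
```

Note how reuse (step 4) takes precedence over distance among the k candidates, which is what makes operator sharing fall out of the mapping stage.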

3.2. Placement Optimization

Operators are initially placed by a full placement optimizer and the placements are periodically re-evaluated by local placement optimizers.

Full Placement Optimizer. To create a new stream, a DSPS passes a query plan to the SBON layer on any node. The full placement optimizer finds an initial placement for all unpinned operators in the new query using RELAXATION. The full placement optimizer runs only once and has complete knowledge of the entire query plan. After receiving physical placements for all unpinned operators from the full placement optimizer, the SBON layer instructs the DSPS to instantiate the new operators and create the stream.

When a new query is added that has the same structure and operators as an existing query, network and node resources can be saved by reusing operators between queries. This is related to multi-query optimization in traditional database systems [15]. The cost space can guide the search for reusable operators and reduce the complexity of having to consider all operators in the system: when the full placement optimizer performs the physical operator mapping for a desired placement coordinate, it also retrieves the currently hosted operators at nodes in that region of the cost space. If it finds a reusable operator that produces the same data as the operator to be placed, it reuses this operator and the corresponding sub-query instead of instantiating a new instance.

Local Placement Optimizer. The local placement optimizer re-evaluates the placement decisions of locally running operators and initiates migrations of operators when necessary. It runs periodically on every SBON node and iterates over all local unpinned operators, re-placing them in the latency space. It then maps the new virtual placement coordinate to a physical node only when the displacement from the previous coordinate is larger than a cost space threshold, to avoid unnecessary nearest-neighbor lookups. This threshold value depends on the perceived cost of a lookup. After mapping the coordinate to a physical node in the latency/load space, the local optimizer calculates the saving in network usage for this new placement. To prevent needless migrations, an operator is migrated only if the saving in network usage is higher than a minimum migration threshold. This threshold depends on the cost of operator migrations and ensures that migrations are amortized over the lifetime of a query.
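One pass of the local optimizer over a locally hosted operator can be sketched as follows. This is a sketch under stated assumptions: the threshold values are illustrative (the paper only says they depend on lookup and migration costs), and the function arguments stand in for the relaxation, mapping, usage-estimation, and migration machinery described above.

```python
from math import dist

COORD_THRESHOLD = 5.0       # illustrative: cost-space displacement before a lookup
MIGRATION_THRESHOLD = 10.0  # illustrative: minimum network-usage saving to migrate

def reevaluate(op, replace_fn, map_fn, usage_fn, migrate_fn):
    """Re-evaluate one locally running unpinned operator.

    `op` carries its current cost-space coordinate and host. Returns True
    if the operator was migrated.
    """
    new_coord = replace_fn(op)                       # re-run RELAXATION
    if dist(new_coord, op["coord"]) <= COORD_THRESHOLD:
        return False                                 # skip the k-NN lookup
    new_host = map_fn(new_coord)                     # physical mapping
    saving = usage_fn(op["host"]) - usage_fn(new_host)
    if saving <= MIGRATION_THRESHOLD:
        return False                                 # migration not amortized
    migrate_fn(op, new_host)
    op["coord"], op["host"] = new_coord, new_host
    return True
```

The two early returns encode the paper's two guards: the displacement threshold gates the (expensive) nearest-neighbor lookup, and the minimum migration threshold gates the (even more expensive) migration itself.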

One might think that two local optimizers running on different nodes could potentially lead to oscillations of operator placements. Since all local optimizers have access to the same cost space, however, their placement decisions agree with each other because they are based on the same information. In rare cases, rounding errors determine the outcome of a placement that is exactly in the middle between physical nodes in the cost space. Then the minimum migration threshold will ensure that an operator remains at its current location.

Since local optimizers run concurrently, it is important that they do not interfere with each other. Therefore, migrations are done atomically in the SBON. When a local optimizer decides to migrate an operator, it first notifies the DSPS to stop the data flow through the query. While a query is stopped, local optimizers running on other SBON nodes do not attempt optimizations for the same query. After an optimizer has finished running (and carried out any migrations), the flow of data is restarted. An additional benefit is that no data is lost during the migration because operators are explicitly notified to stop or buffer data production.
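The stop/migrate/restart sequence can be sketched as below. This is a sketch, not the SBON's implementation: the per-query lock stands in for whatever mechanism keeps other local optimizers away from a stopped query, and the `dsps` stop/restart calls are hypothetical placeholders for the DSPS notification interface.

```python
import threading

_query_locks = {}  # one lock per query; other optimizers back off while held

def migrate_atomically(query_id, dsps, do_migration):
    """Perform a migration with the query's data flow stopped.

    Returns False if another optimizer is already working on this query.
    """
    lock = _query_locks.setdefault(query_id, threading.Lock())
    if not lock.acquire(blocking=False):
        return False
    try:
        dsps.stop_flow(query_id)     # operators buffer instead of sending
        do_migration()               # move the operator; no tuples in flight
        dsps.restart_flow(query_id)  # resume; buffered data is not lost
        return True
    finally:
        lock.release()
```

The non-blocking acquire mirrors the paper's behavior: a concurrent optimizer simply skips the query rather than queueing up a second migration.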

4. Evaluation

We evaluated our implementation of the SBON layer in simulation and with experiments on PlanetLab. Our results fall into three groups: we show in simulation that RELAXATION minimizes network usage while providing low delay to applications (4.1); we investigate how the SBON layer migrates and reuses operators on PlanetLab (4.2); and we link the SBON layer to a current DSPS that does not perform network-aware operator placement (4.3). Our experiments show how a DSPS can benefit from our placement optimization techniques with little implementation effort.

4.1. Placement Efficiency in Simulation

We first wanted to compare the performance of RELAXATION to other placement algorithms. We implemented the algorithms in a discrete-event simulator. Using simulation has the advantage of giving us complete control over the network topology and the experiment, so we could ensure that the placement algorithms were compared under identical conditions.

We examined five alternative placement approaches: OPTIMAL chooses the best possible placement with the lowest network usage based on an exhaustive search over all possible placements. This requires global knowledge of the entire network and is not feasible in practice, but it gives a baseline for the performance of the algorithms. IP MULTICAST places unpinned operators at nodes that would be routers hosting branches of an IP multicast tree rooted at each producer. For each query, we calculate the IP multicast tree by taking the union of all the IP unicast routes from the producers to the consumers. PRODUCER randomly picks one of the nodes hosting producers for placement of the unpinned operator. CONSUMER places the operator at the node hosting the consumer. Finally, RANDOM picks a physical node for each operator at random and serves as a worst-case comparison.

We simulated 1550 nodes in a transit-stub topology generated by the Georgia Tech topology generator [25]. The topology has 10 transit domains with 5 nodes, each connected to 150 stub domains with 10 nodes on average. Routing tables for the topology were calculated using the routing policy weights assigned by the topology generator to reflect Internet routing policy. The network diameter of the topology was 878 ms. Producers and consumers were randomly distributed across the network. Only a single producer operator was hosted at any physical node. A query consisted of four producers, an intermediary join and select operator, and a consumer. We refer to the join and select as a single aggregator operator. For simplicity, all producers sent data at a rate of 2 KB/s and the aggregator had a selectivity of 8:1. We placed the aggregators on routers in the transit-stub topology; in a real deployment, they would be part of low-latency networks connected to the routers. The placement of operators in transit domains reflects that some nodes in a large-scale DSPS might be hosted at Internet exchanges.

Table 1. Increase in network usage and delay

Algorithm       Network Usage Penalty   Delay Penalty
OPTIMAL         0%                      13%
RELAXATION      15%                     24%
IP MULTICAST    27%                     0%
PRODUCER        43%                     75%
CONSUMER        60%                     0%
RANDOM          81%                     76%

Network Usage. We show the average increase in network usage after placing 1000 queries in Table 1, where OPTIMAL minimizes network usage. RELAXATION performs well, which is expected because network usage is the metric that it optimizes for in its cost space. IP MULTICAST is less efficient because it minimizes routing hops and not necessarily routing latency. Therefore, RELAXATION manages to find better placement nodes that are not on the direct IP routing path from the producer to consumer. PRODUCER does better than CONSUMER because, by placing the selective operators on one of the producers, data from only three producers needs to be sent through the network to the operator. RANDOM exhibits poor network usage and CONSUMER is only marginally better.

The distribution of network usage over all queries is shown in Figure 8. It illustrates the clear division between the evaluated placement algorithms. Looking at the 80th percentile, RELAXATION manages to cause only 14% more network traffic than OPTIMAL, whereas PRODUCER results in 48% more traffic.

[Figure: cumulative percentage of queries against network usage (in KB) for Optimal, IP Multicast, Relaxation, Producer, Consumer, and Random.]

Figure 8. Network usage

Delay Penalty. We examined the delay penalty that an application experiences due to each placement strategy. We define delay penalty as the increase in delay when compared to the IP routing delay on the longest path from any producer to the consumer. Placing all unpinned operators on the consumer node (as done by CONSUMER) achieves the lowest delay penalty of 0%.

Table 1 shows the average delay penalty after placing 1000 queries. As expected, CONSUMER and IP MULTICAST have no delay penalty because they place operators only along IP routing paths. RELAXATION and OPTIMAL introduce only a small delay penalty of 24% and 13%, respectively. PRODUCER and RANDOM perform worst because they add a random delay overhead.

Figure 9 shows the distribution of delay penalty. An interesting result is that both OPTIMAL and RELAXATION achieve lower delay penalties than CONSUMER for about 15% of the queries. This is because the routing weights in the transit-stub topology reflect that IP routes sometimes have sub-optimal latencies. OPTIMAL and RELAXATION are then able to reduce the delay by choosing different overlay routes with a higher hop count but lower delay. Note that PRODUCER and RANDOM have a long tail of bad placement decisions with a large delay penalty. The results from the simulations show that, while RELAXATION is optimizing for network usage, its impact on delay penalty, an application-oriented metric, remains low.

4.2. Placement Optimization on PlanetLab

Our second set of experiments examines operator migration and reuse on PlanetLab. We first wanted to monitor the behavior of a single pair of queries over a long duration, to see in detail if, and to what extent, migration driven by a changing cost space would change a single continuous query.

We started two simple queries and monitored them for 20 hours. Both queries had the same pair of producers P1 and P2, a single operator that performed a select and a join, and the same consumer. The producers produced monotonically numbered tuples at a constant rate. The SQL for the query was SELECT * FROM P1,P2 WHERE P1.data=P2.data AND mod(P1.data,4)=0, effectively selecting 1/4 of the tuples for forwarding.

[Figure: cumulative percentage of queries against delay penalty for Consumer/IP Multicast, Optimal, Relaxation, Producer, and Random.]

Figure 9. Delay penalty

We used 130 PlanetLab nodes from diverse geographic locations as our experimental testbed. We used a latency/load cost space, including load as a fourth dimension with equal weight. We measured application-perceived tuple delay by having the recipient send a small reply packet for each tuple sent. This practical measure approximates "true" delay, given that the error in local clocks on PlanetLab can exceed tuple delay. The producers sent a data packet once a second for each query.
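The experiment's join/select operator, as given by the SQL query above, can be sketched as a generator. This is a minimal sketch, assuming tuples are dictionaries keyed by a `data` attribute; the output record layout is illustrative.

```python
def join_select(p1_tuples, p2_tuples):
    """Join the two producer streams on `data`, keeping tuples where
    data % 4 == 0, i.e. forwarding one quarter of the matched tuples."""
    # Index one input by the join attribute (hash join on P2).
    p2_by_data = {t["data"]: t for t in p2_tuples}
    for t1 in p1_tuples:
        t2 = p2_by_data.get(t1["data"])
        # Apply the join predicate P1.data = P2.data and the selection
        # mod(P1.data, 4) = 0 from the SQL query.
        if t2 is not None and t1["data"] % 4 == 0:
            yield {"data": t1["data"], "p1": t1, "p2": t2}
```

Because both producers emit monotonically numbered tuples, every tuple finds a join partner and the selection alone determines the 1/4 output rate.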

The pair of single queries had two producers on the US West Coast (berkeley and ucsc) and had the consumer in Europe (upc.es). Starting at the same time, each query ran for 20 hours. Figure 10 shows the network usage for each query. Each point depicts the instantaneous network usage of the query. Vertical bars denote operator migrations. The results show that, after 4 hours, the migrating query reduces its network usage by 30% from migration C onward by placing the operator next to one of the producers. Further migrations, which occur due to changes in the trans-Atlantic link and the load on the West Coast nodes, keep the operator near the producers.

A more detailed look at the effect of each placement decision (A, B, ...) is illustrated in Figure 11, which shows the effect of the migrations A through F on the query. The numbers on top of each bar graph give the network usage normalized by the network usage of the first placement, A. This placement is the same as for the non-migratable query and therefore provides a comparison point for how migration has improved network usage while also reducing delay by a small margin. The bar graphs denote each link's contribution to total network usage over time, and the length of the lines denotes latency. In column A, the data initially travels from two producers (ucsc and berkeley) through an aggregator (uoregon), where it is joined and selected, and is sent across the Atlantic to upc.es. Over 20 hours, the operator migrates to washington (B), then berkeley (C), then ucsc (D), then back to berkeley (E), and then ucsc (F). Operator migration minimizes the length of the portions of the query before the selection, reducing network usage.

[Figure: network usage (bytes) over 20 hours for the migratable query, with migrations A–F marked (left), and for the non-migratable query (right).]

Figure 10. Network usage of a pair of queries, one with a migration (left) and one without (right)

[Figure: per-link network usage and latency for placements A–F, with the operator hosted at uoregon (A), washington (B), berkeley (C), ucsc (D), berkeley (E), and ucsc (F); normalized network usage 1.00, 1.04, 0.72, 0.71, 0.72, and 0.71, respectively.]

Figure 11. Effect of operator migration

4.2.1 Operator Migration

The third experiment examines the aggregate effect of migrations due to changes in network and node conditions over an extended period of time. Given that PlanetLab was heavily loaded when we ran our experiments, we also wanted to quantify the effect that incorporating node load into the cost space would have on network usage. As in the previous experiment, each query streamed data from two producers through an aggregator/selector, and out to a consumer. Data rates, tuple sizes, and constituent nodes were the same as in the previous experiment. Because of potentially high variation between one ⟨producer, producer, consumer⟩ triple and another, we created our migratable and non-migratable queries in pairs, each with the same set of endpoints. We created 24 pairs of queries, which ran for 5 hours, and recorded the network usage of each.

We show the change in the relative network usage of each migratable query in Figure 12, compared to the query with the same producers and consumer without migration enabled. Values below 0 indicate a decrease in network usage when migration is enabled. Permitting migration often leads to lower mean network usage, although in a small number of cases the usage increases. We find that migrations were effective in decreasing network usage for 75% of the query pairs. Data from the same experiment also show that the set of queries that are permitted to migrate use 16.9% less of the total network capacity compared to non-migratable queries. The migratable queries performed an average of 3.5 migrations each, suggesting that migrations are occurring at a moderate rate. Allowing queries to migrate also reduced aggregate query delay by 10.5%.

We also ran the same experiment with a pure latency space. The results show that including load resulted in a modest improvement in per-query latency and aggregate network usage with similar numbers of migrations. We have omitted the results due to space constraints.

4.2.2 Operator Reuse

Another advantage of the SBON model is that the processing of operators can be reused across different queries. In this way, the SBON layer avoids transmitting multiple redundant copies of the same data across the network. This is especially important when transporting data over long-haul trans-oceanic links, which may experience high congestion and long latencies. The SBON automatically detects opportunities for data sharing across queries and combines operators with the same inputs whenever those operators are placed in a similar region of the cost space by RELAXATION.
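The reuse check during physical mapping can be sketched as below. This is an illustrative sketch: the operator "signature" (type plus input set) is a hypothetical stand-in for whatever test the SBON uses to decide that an existing operator produces the same data, and the node-record layout is invented for the example.

```python
def find_reusable(op, nearby_nodes):
    """Return an existing equivalent operator hosted on a node in the
    target region of the cost space, or None if there is none.

    `nearby_nodes` is the candidate set from the k-nearest-neighbor
    search; each node record lists its currently hosted operators.
    """
    # Two operators produce the same data if they have the same type and
    # the same (unordered) set of input streams -- an illustrative test.
    wanted = (op["type"], frozenset(op["inputs"]))
    for node in nearby_nodes:
        for running in node["operators"]:
            if (running["type"], frozenset(running["inputs"])) == wanted:
                return running
    return None
```

When a match is found, the new query attaches to the existing operator (and its sub-query) instead of instantiating a new instance, which is exactly what saves the redundant long-haul traffic described above.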

To demonstrate this, we created four pairs of queries. Each query contained a consumer on the US West Coast requesting aggregated data from the same two producers in Europe. In the first case, we disabled operator reuse, requiring each query to instantiate its own aggregation operator. In the second case, operator reuse was enabled, and aggregation was shared across three of the four queries (as determined by RELAXATION). Because the shared aggregation operator was generating only a single copy of its output data, the query also included a multicast operator, which redistributed the shared data to the individual consumers. Figure 13 shows the logical topology of the two sets of queries.

As expected, the overall network usage with operator reuse enabled (119.55 bytes) was lower than when reuse was disabled (152.32 bytes), a savings of 21%. We expect the savings to be greater with a larger number of queries sharing the same data. In many stream applications, we anticipate data and operator popularity to follow Zipf-like distributions, as has been demonstrated for other types of Internet traffic [6], further highlighting the importance of operator reuse in DSPSs.

[Figure: relative network usage increase with migration for each of the 24 query pairs; most bars fall below 0, indicating decreased network usage.]

Figure 12. Aggregate effect of migration

[Figure: logical query topologies without reuse (independent join/select operators between the producers and consumers) and with reuse (a shared join/select operator feeding a multicast operator).]

Figure 13. Reuse Topology

4.3. Borealis Extension

We believe that current DSPSs can benefit from an SBON layer to achieve network-aware operator placement. To test this claim, we extended Borealis [7], an existing distributed stream-processing engine, with our SBON implementation. This section describes how, with few code changes, the resulting system retains the full query functionality of Borealis while achieving the network usage and delay benefits shown in the previous experiments.

In Borealis, part of a running query, or chunk of boxes, can be migrated to different nodes. This migration capability is used by an agoric contract-based optimizer [4] that decides to buy or sell units of work based on internal queue usage. Borealis can also handle manual operator reuse, in which a single query can serve multiple consumers. This means that Borealis can take full advantage of the adaptive placement decisions made by the SBON.

In an attempt to obtain a useful measure of integration effort, the Borealis extension was performed with no prior code-level knowledge of the Borealis code base. Once the Borealis code base was understood, the actual code changes were minimal and were completed and tested over the span of several days. The integration involved fewer than 400 lines of code, mostly for communication between our SBON layer (written in Java) and Borealis (written in C++).

Our extension is structured as follows. When a user creates a new query, Borealis creates a query plan that is passed to the SBON layer. Since Borealis requires a fixed initial location for operators, the initial placement result of the SBON full placement optimizer is used as the starting point for the query passed to Borealis. This starting location is automatically sent to Borealis, which in turn starts the Borealis data flow. From this point onwards, the SBON optimizer sends migrate commands to the Borealis operator execution environment via their operator interface. This interface directly invokes the migration code implemented in Borealis.

[Figure: network usage (in bytes) for the query operator pinned at each of the 19 candidate PlanetLab nodes, with the SBON placement marked among the best placements and the worst placement indicated.]

Figure 14. Operator placement with SBON/Borealis.

On PlanetLab, we used Borealis to deploy a set of identical queries that had two pinned producers, one consumer, and an unpinned windowing aggregate operator. The pinned producers were at utexas3 and harvard1, and the pinned consumer was at ucdavis1.¹ The producers published data at the same rate, while the operator had a measured selectivity of 12%. One query was placed by the SBON optimizer on one of 19 PlanetLab nodes. To compare against the efficiency of manual placement, we simultaneously added queries that pinned their operator on each of the PlanetLab nodes in turn, which is akin to conducting an exhaustive search for the best placement in terms of network usage.

As can be seen in Figure 14, the query placed by the SBON optimizer has one of the lowest network usages of any of the placement possibilities. While there are a number of nodes that are relatively good choices, RELAXATION favors nodes farther from the query endpoints (because of spring pull) when there are good mid-cost-space candidates. This figure is representative of how the SBON makes a good initial placement decision without needing to perform the exhaustive search, and how it can help Borealis with the initial placement.

5. Future Work

As future work, we plan to explore extensions of our cost space approach to include other metrics, such as available bandwidth and node reliability. We also intend to analyze the convergence properties of relaxation placement in more detail with the placement of more complex queries. Especially with bursty workloads, additional dampening of placement decisions may be necessary.

¹The PlanetLab overlay involves machines placed at different universities and research institutions. The notation utexas3 signifies the third machine placed at the University of Texas at Austin.

Another area for future investigation is the applicability of classical database optimization techniques in the setting of a cost space. This introduces an interesting tension between the desire to make the SBON's placement optimizer agnostic to operator semantics and the need to give the SBON enough information to make good optimization decisions. For example, some stream operators, such as joins, can be decomposed, which would allow the SBON to place them closer to producers. Appropriate interfaces between the DSPS's query optimizer and the SBON's placement optimizer could be designed for a tighter integration.

6. Conclusions

We introduced an SBON layer as a tool for distributed stream-processing systems to assist with the operator placement problem. A DSPS generates a query plan with pinned and unpinned operators, and the SBON manages the task of positioning unpinned operators efficiently using its on-going knowledge of network and node conditions. We solve the operator placement problem with a two-step approach: (1) the placement decision is made in a virtual cost space, and (2) the cost space coordinate is mapped to a physical node. The cost space encodes network and node measurements from all nodes in a scalable and decentralized manner, enabling individual nodes to make adaptive placement decisions with local information.

Through PlanetLab-based experimentation and simulation, we demonstrated that the SBON creates data streams that minimize overall network usage, a characteristic that is increasingly important as the number of data streams in the network increases. We showed that SBONs do so with a combination of intelligent placement, based on the characteristics of the stream, and of operator reuse. We believe that the relatively easy integration of our SBON layer with an existing stream-processing system demonstrates its benefits. We hope this work will spur future research into large-scale query optimization for DSPSs that leverages our approach.

References

[1] Y. Ahmad and U. Cetintemel. Network-Aware Query Processing for Stream-based Applications. In Proc. of VLDB'04, Toronto, Canada, Aug. 2004.

[2] Y. Ahmad, U. Cetintemel, J. Jannotti, A. Zgolinski, and S. Zdonik. Network Awareness in Internet-Scale Stream Processing. IEEE Data Eng. Bulletin, 28(1), Mar. 2005.

[3] A. Andrzejak and Z. Xu. Scalable, Efficient Range Queries for Grid Information Services. In Proc. of P2P, Linkoping, Sweden, Sept. 2002.

[4] M. Balazinska, H. Balakrishnan, and M. Stonebraker. Contract-Based Load Management in Federated Distributed Systems. In Proc. of NSDI'04, Mar. 2004.

[5] B. J. Bonfils and P. Bonnet. Adaptive and Decentralized Operator Placement for In-Network Query Processing. In Proc. of IPSN, Palo Alto, CA, Apr. 2003.

[6] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web Caching and Zipf-like Distributions: Evidence and Implications. In Proc. of INFOCOM, New York, NY, Mar. 1999.

[7] U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In Proc. of CIDR, Asilomar, CA, Jan. 2005.

[8] L. Chen, K. Reddy, and G. Agrawal. GATES: A Grid-Based Middleware for Processing Distributed Data Streams. In Proc. of HPDC, Honolulu, HI, June 2004.

[9] M. Cherniack, H. Balakrishnan, M. Balazinska, et al. Scalable Distributed Stream Processing. In Proc. of CIDR, Asilomar, CA, Jan. 2003.

[10] Y.-H. Chu, S. G. Rao, and H. Zhang. A Case for End System Multicast. In Proc. of SIGMETRICS, June 2000.

[11] R. Cox, F. Dabek, F. Kaashoek, J. Li, and R. Morris. Practical, Distributed Network Coordinates. In Proc. of HotNets-II, Cambridge, MA, Nov. 2003.

[12] F. Dabek, R. Cox, F. Kaashoek, and R. Morris. Vivaldi: A Decentralized Network Coordinate System. In Proc. of SIGCOMM, Portland, OR, Aug. 2004.

[13] P. B. Gibbons, B. Karp, Y. Ke, S. Nath, and S. Seshan. IrisNet: An Architecture for a World-Wide Sensor Web. IEEE Pervasive Computing, 2(4), Oct. 2003.

[14] R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker, and I. Stoica. Querying the Internet with PIER. In Proc. of VLDB, Berlin, Germany, Sept. 2003.

[15] D. Kossmann. The State of the Art in Distributed Query Processing. ACM Computing Surveys, 32(4), Dec. 2000.

[16] T. E. Ng and H. Zhang. Predicting Internet Network Distance with Coordinates-Based Approaches. In Proc. of INFOCOM, New York, NY, June 2002.

[17] M. Pias, J. Crowcroft, S. Wilbur, S. Bhatti, and T. Harris. Lighthouses for Scalable Distributed Location. In Proc. of IPTPS, Berkeley, CA, Feb. 2003.

[18] P. Pietzuch, J. Ledlie, and M. Seltzer. Supporting Network Coordinates on PlanetLab. In Proc. of WORLDS'05, San Francisco, CA, Nov. 2005.

[19] P. Pietzuch, J. Ledlie, J. Shneidman, M. Welsh, M. Roussopoulos, and M. Seltzer. Service Placement in Stream-Based Overlay Networks. Poster, NSDI'05, May 2005.

[20] P. Pietzuch, J. Shneidman, J. Ledlie, M. Welsh, M. Seltzer, and M. Roussopoulos. Evaluating DHT-Based Service Placement for Stream-Based Overlays. In Proc. of IPTPS'05.

[21] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A Scalable Content Addressable Network. In Proc. of SIGCOMM, San Diego, CA, Aug. 2001.

[22] C. Schmidt and M. Parashar. Flexible Information Discovery in Decentralized Distributed Systems. In Proc. of HPDC, Seattle, WA, June 2003.

[23] K. Shanahan and M. J. Freedman. Locality Prediction for Oblivious Clients. In Proc. of IPTPS, Ithaca, NY, Feb. 2005.

[24] The PlanetLab Consortium. http://www.planetlab.org, 2004.

[25] E. W. Zegura, K. L. Calvert, and S. Bhattacharjee. How to Model an Internetwork. In Proc. of INFOCOM, San Francisco, CA, Mar. 1996.

This material is based upon work supported by the National Science Foundation under Grant Number 0330244.