tag: a tiny aggregation service for ad-hoc sensor networksdga/15-849/papers/madden-osdi2002.pdf ·...

Draft submitted for publication. Copyright 2002 Samuel Madden, et al. 1

TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks

Samuel Madden, Michael J. Franklin and Joseph Hellerstein Wei Hong�madden, franklin, jmh � @cs.berkeley.edu [email protected]

UC Berkeley Intel Berkeley Research Lab

Abstract

We present the Tiny AGgregation (TAG) service for ag-gregation in TinyOS. TAG allows users to express sim-ple, declarative queries and have them distributed andexecuted efficiently in networks of low-power, wirelesssensors. We discuss various generic properties of ag-gregates, and show how those properties affect the per-formance of our in-network approach. We include aperformance study demonstrating the advantages of ourapproach over traditional centralized, out-of-networkmethods, and discuss a variety of optimizations for im-proving the performance and fault-tolerance of the basicsolution.

1 Introduction

Recent advances in computing technology have led tothe production of a new class of computing device: thewireless, battery powered, smart sensor. These new sen-sors are active, full fledged computers, capable not onlyof measuring real world phenomena but also filtering,sharing, and combining those measurements. One ex-ample of such small sensor devices are the motes underdevelopment at UC Berkeley. Current generation motesare roughly 2cm x 4cm x 1cm and are equipped with aradio, a processor, memory, a small battery pack, and asuite of sensors. The mote operating system, TinyOS,provides a set of primitives designed to facilitate the de-ployment of motes in ad-hoc networks. In such net-works, devices can locate each other and route datawithout prior knowledge of or assumptions about thenetwork topology, thereby allowing the network topol-ogy to change as devices move, run out of power, orexperience shifting waves of interference.

Due to the relative ease of deployment of mote-basedsensor networks, practitioners in a variety of fields havebegun considering them for a range of monitoring anddata collection tasks. For example: civil engineers areusing motes to monitor building integrity during earth-quakes [1]; biologists are planning a mote deploymentfor habitat monitoring of Storm Petrels on Great DuckIsland [14] off the coast of Maine; administrators oflarge computer clusters are interested in using motes tomonitor the temperature and power usage in their datacenters.

Sensor applications depend on the ability to extractneeded data from the network. Often, this data consistsof summaries (or aggregations) rather than raw sensorreadings. Other researchers have noted the importanceof data aggregation in sensor networks [9, 7, 8], butthis previous work has tended to view aggregation asan application-specific mechanism that would be pro-grammed into the devices on an as needed basis. In con-trast, our position is that because aggregation is so cen-tral to emerging sensor network applications, it must beprovided as a core service by the system software. Fur-thermore, we believe that such a service can and shouldprovide generic aggregation functionality, allowing it tobe easily invoked and manipulated by applications tosatisfy a varied and dynamic set of user requirements.

1.1 The TAG Approach

We have developed Tiny AGgregation (TAG), a genericaggregation service for ad hoc networks of TinyOSmotes. There are two essential attributes of this service.First, it provides a simple, declarative interface for ag-gregation, inspired by aggregation operators in databasequery languages. Second, it intelligently distributes andexecutes aggregation operators in the sensor network ina time and power-efficient manner, and is sensitive tothe resource constraints and lossy communication prop-erties of wireless sensor mote networks. TAG processesaggregates in network by computing over the data as itflows through the sensors, discarding irrelevant data andcombining relevant readings into more compact recordswhenever possible.

TAG operates as follows: users pose aggregationqueries from a powered, storage-rich basestation. Op-erators that implement the query are distributed into thenetwork by piggybacking on the existing ad hoc net-working protocol. Sensors route data back towards theuser through a routing tree rooted at the basestation. Asdata flows up this tree, it is aggregated according to anaggregation function and value-based partitioning spec-ified in the query. For example, consider the problem ofcounting the number of nodes in a network of indeter-minate size. First, the request to count is injected intothe network. Then, each leaf node in the tree reports acount of 1 to their parent; interior nodes sum the countof their children, add 1 to it, and report that value to


their parent. Counts propagate up the tree in this man-ner, and flow out at the root.

1.2 Overview of the Paper

The contributions of this paper are four-fold: first, wepropose a simple, SQL-like declarative language for ex-pressing aggregation queries over streaming sensor dataand identify key properties of aggregation functions thataffect the extent to which they can be efficiently pro-cessed in-network. Second, we demonstrate how exe-cuting queries in-network can yield an order of mag-nitude reduction in communication compared to cen-tralized approaches. Third, we show that, by adoptinga well-defined, declarative query language as a levelof abstraction between the user and specific network-ing and routing protocols, a number of optimizationscan be transparently applied to further reduce the data-demands on the system. Finally, we show that our fo-cus on a high-level language leads to useful end-to-endtechniques for reducing the effects of network loss onaggregate results.

The remainder of the paper is structured as follows. Inthe next section, we briefly overview the TinyOS hard-ware and software environment. Then, we discuss thesyntax and semantics of queries in TAG and classifythe types of aggregates supported by the system, focus-ing on the characteristics of aggregates that impact theirperformance and fault tolerance. We then present thecore TAG algorithm and show how our solution sat-isfies the query requirements while providing perfor-mance and tolerance to network faults. We discuss sev-eral optimizations for improving the performance of thebasic approach. Additionally, we include experimentalresults demonstrating the effectiveness and robustnessof our algorithms in a simulation environment, as wellas a brief study of a real-world deployment on TinyOSmotes. Finally, we discuss related work and conclude.

2 Motes and Ad-Hoc NetworksIn this section, we provide a brief overview of the motehardware architecture, the TinyOS system, and an adhoc routing algorithm for mote-based sensor networks.

Current generation TinyOS motes are equipped with a4Mhz Atmel microprocessor with 4 kB of RAM and128 kB of code space, a 917 MHz RFM radio runningat 50 kb/s, and 512kB of EEPROM. An expansion slotaccommodates a variety of sensor boards by exposinga number of analog input lines as well as popular chip-to-chip serial busses. Current sensor options include:light, temperature, magnetic field, acceleration (and vi-bration), sound, and power.

The single-channel radio is half duplex, meaning motescannot send and receive at the same time. Currently,the default TinyOS implementation uses a CSMA-likemedia access protocol with a random backoff scheme.

Message delivery is unreliable by default, though appli-cations can build up an acknowledgment layer. Often, amessage acknowledgment can be obtained for free (seebelow in Section 2.1).

Power is supplied via a free-hanging AA battery packor a coin-cell attached through the expansion slot. Theeffective lifetime of the sensor is determined by thispower supply. In turn, the power consumption of eachsensor node tends to be dominated by the cost of trans-mitting and receiving messages. In terms of power con-sumption, transmitting a single bit of data is equivalentto 800 instructions. This energy tradeoff between com-munication and computation implies that many applica-tions will benefit by processing the data inside the net-work rather than simply transmitting the sensor read-ings. A AA battery pack will allow a sensor to send5.52 million messages (if it does no other computationand only powers its radio up to transmit) which is equiv-alent to one message per second every day for abouttwo months – not particularly long if the goal is to de-ploy long lived, zero-maintenance ad-hoc sensor net-works. Hence, power-conserving algorithms are partic-ularly important. As we will discuss in Section 4.1 , ourdesign is amenable to very low power modes in whichthe radio is kept powered down for long periods of time.

To understand how data is routed in our ad-hoc ag-gregation network, two properties of radio communica-tion need to be emphasized. First, radio is a broadcastmedium, such that any sensor within hearing distancehears any message, irrespective of whether or not thatsensor is the intended recipient. Second, radio links aretypically symmetric: if sensor � can hear sensor

�, we

assume�

can hear � . 1

Messages in the current generation of TinyOS are afixed size – by default, 30 bytes. Each sensor has aunique sensor ID that distinguishes it from others. Allmessages specify their recipient (or specify broadcast,meaning all available recipients), allowing sensors toignore messages not intended for them, although non-broadcast messages are received by all sensors withinrange – unintended recipients simply drop messages notaddressed to them.

2.1 Ad-Hoc Routing AlgorithmGiven this overview of the mote environment, we nowdiscuss how sensors route data. One common tech-nique, which we describe here, is to build a routing tree.Note that a number of alternative techniques exist forthis purpose; the reader is referred to [20, 9, 8, 10, 2]for work on routing in sensor networks. In general, werequire our routing algorithm to provide two capabili-ties: first, it must be able to deliver query requests to

1Note that this may not be a valid assumption in some cases.However, most ad-hoc routing algorithms in the networking litera-ture also make this assumption [20, 9] .


all nodes in a network. Second, it must be able to pro-vide one or more routes from every node to the root ofthe network where aggregation data is being collected.These routes must guarantee that one or fewer copiesof every message arrive (no duplicates are generated).Work from UCLA on greedy aggregation [8] is relevanthere: it discusses how topology affects the quality ofin-network aggregates, and presents several alternativesfor topology construction.

In the tree-based routing scheme, one sensor is ap-pointed to be the root, usually because it is the pointwhere the user interfaces to the network. The rootbroadcasts a message asking sensors to organize intoa routing tree; in that message it specifies its own idand its level, or distance from the root (in this case,zero.) Any sensor without an assigned level that hearsthis message assigns its own level to be the level in themessage plus one. It also chooses the sender of the mes-sage as its parent, through which it will route messagesto the root.

Each of these sensors then rebroadcasts the routingmessage, inserting their own ids and levels. The rout-ing message floods down the tree in this fashion, witheach node rebroadcasting the message until all nodeshave been assigned a level and a parent. These routingmessages are periodically broadcast from the root, sothat the process of topology discovery goes on contin-uously. This constant topology maintenance makes itrelatively easy to adapt to network changes caused bymobility of certain nodes, or to the addition or deletionof sensors. To maintain stability in the network, parentsare retained unless a child does not hear from them forsome long period of time, at which point it selects a newparent using this same process. We will look in moredetail at the robustness of this approach with respect toloss and its effect on aggregate values in Section 7.

When a sensor wishes to send a message to the root, itbroadcasts a message addressed to its parent, which inturn forwards the message on to its parent, and so on,eventually reaching the root. In the Section 4, we showhow, as data is routed towards the root, it can be com-bined with data from other sensors to efficiently com-bine routing and aggregation. Now, however, we turn tothe syntax and semantics of aggregate queries in TAG.

3 Query Model and Environment3.1 Query Model

Given our goal of allowing users to pose declarativequeries over sensor networks, we needed a languagefor expressing such queries. Rather than inventing ourown, we chose to adopt a SQL-style query syntax. Wesupport SQL-style queries (without joins) over a singletable called sensors, whose schema is known at thebase station. As is the case in Cougar [15], this table can

be thought of as an append-only relational table withone attribute per input of the motes (e.g., temperature,light.) In TAG, we focus on the problem of aggregatesensor readings, though facilities for collecting individ-ual sensor readings also exist. Queries in TAG have theform:

SELECT �� ( �� ), attrs FROM sensors

WHERE � selPreds GROUP BY � attrs HAVING � havingPreds EPOCH DURATION �

With the exception of the EPOCH DURATION clause,the semantics of this statement are similar to SQL ag-gregate queries. The SELECT clause specifies an arbi-trary arithmetic expression over one or more aggrega-tion attributes. We expect that the common case here isthat �� will simply be the name of a single attribute.Attrs (optionally) selects the attributes by which thesensor readings are partitioned ; these are the same at-trs that appear in the GROUP BY clause. The syn-tax of the �� clause is discussed below. The WHEREclause filters out individual sensor readings before theyare aggregated. Such predicates can typically be exe-cuted locally at the sensor before readings are commu-nicated, as in [15, 12]. The GROUP BY clause specifiesan attribute based partitioning of the sensors. Logically,each sensor reading belongs to exactly one group in aquery, and the evaluation of the query is a table of groupidentifiers and aggregate values. The HAVING clausefilters that table by suppressing groups that do not sat-isfy the havingPreds predicates.

The primary semantic difference between TAG queriesand SQL queries is that the output of a TAG queryis a stream of values, rather than a single aggregatevalue (or batched result). In monitoring applications,such continuous results are often more useful than asingle, isolated aggregate, as they allow users to un-derstand how the network is behaving over time andobserve transient effects (such as message losses) thatmake individual results, taken in isolation, hard to inter-pret. In these stream semantics, each record consists ofone � group id,aggregate value � pair per group. Eachgroup is time-stamped and the readings used to computeaggregate record all belong to the same time interval,or epoch. The duration of each epoch is the argumentof the EPOCH DURATION clause, which specifies theamount of time (in seconds) sensors wait before acquir-ing and transmitting each successive sample. This valuemay be as large as the user desires; it must be at leastas long as the time it takes for a sensor to process andtransmit a single radio message and do some local pro-cessing – between 10 and 20 ms for current generationsensors. In section 4.1, we will discuss situations inwhich a longer lower bound on epoch duration is re-quired.


As an example, consider a user who wishes to monitorthe occupancy of the conference rooms on a particularfloor of a building, which she chooses to do by usingmicrophone sensors attached to motes, and looking forrooms where the average volume is over some threshold(assuming that rooms can have multiple sensors). Herquery could be expressed as:

SELECT AVG(volume),room FROM sensors

WHERE floor = 6

GROUP BY room

HAVING AVG(volume) > thresh

EPOCH DURATION 30s

This query partitions sensors on the 6th floor accordingto the room in which they are located (which may bea hard-coded constant in each sensor, or may be deter-mined via some localization component available to thesensors), and then reports all rooms where the averagevolume is over a specified threshold. Updates are deliv-ered every 30 seconds, although the user may deregisterher query at any time.

3.2 Structure of Aggregates

The problem of computing aggregate queries in largeclusters of nodes has been addressed in the context ofshared-nothing parallel query processing environments[16]. Like sensor networks, those environments requirethe coordination of a large number of nodes to processaggregations. Thus, while the severe bandwidth limita-tions, lossy communications, and variable topology ofsensor networks means that the specific implementationtechniques used in the two environments must differ, itis still useful to leverage the techniques for aggregatedecomposition used in database systems [3, 23].

The approach used in such systems (and followed inTAG) is to implement �� via three functions: a merg-ing function � , an initializer � , and an evaluator, � .In general, � has the following structure:�� where � � � and �� are multi-valued partial staterecords, computed over one or more sensor values, rep-resenting the intermediate state over those values thatwill be required to compute an aggregate. �� isthe partial-state record resulting from the application offunction � to � � � and �� . For example, if � isthe merging function for AVERAGE, each partial staterecord will consist of a pair of values: SUM and COUNT,and � is specified as follows, given two state records�� "!#� � and ��%$&�"!'$ � :�� (*)+�-,�).��/��(10"�2,.03��4�5��(6)*78(109�2,�)*78,.03�The initializer � is needed to specify how to instantiate astate record for a single sensor value; for the an AVER-AGE over a sensor value of � , the initializer �": �%; returnsthe tuple � ��=< � . Finally, the evaluator � takes a par-tial state record and computes the actual value of the

aggregate. For AVERAGE, the evaluator �>: �?�3�"! �@;simply returns �BAC! .

These three functions can easily be derived for the ba-sic SQL aggregates; in general, any operation that canbe expressed as commutative applications of a binaryfunction is expressible.

3.3 Taxonomy of Aggregates

Given our basic syntax and structure of aggregates, anobvious question remains: what aggregate functionscan be expressed in TAG? The original SQL specifica-tion offers just five options: COUNT, MIN, MAX, SUM,and AVERAGE. Although these basic functions are suit-able for a wide range of database applications, we didnot wish to constrain TAG to only these choices. Forthis reason, we present a general classification of ag-gregate functions and show how the dimensions of thatclassification affect the performance of TAG throughoutthe paper. We will assume that when aggregation func-tions are registered with TAG, they are classified alongthe dimensions described below.2

We classify aggregates according to four properties thatare particularly important to sensor networks. Table 1shows how specific aggregation functions can be clas-sified according to these properties, and indicates thesections of the paper where the various dimensions ofthe classification are emphasized.

The first dimension is duplicate sensitivity. Duplicateinsensitive aggregates are unaffected by duplicate read-ings from a single sensor while duplicate sensitive ag-gregates will change when a duplicate reading is re-ported. Duplicate sensitivity implies restrictions on net-work properties and on certain optimizations, as de-scribed in Section 7.4.

Second, exemplary aggregates return one or more repre-sentative values from the set of all values; summary ag-gregates compute some property over all values. Thisdistinction is important because exemplary aggregatesbehave unpredictably in the face of loss, and, for thesame reason, are not amenable to sampling. Conversely,for summary aggregates, the aggregate applied to a sub-set can be treated as a robust approximation of the trueaggregate value, assuming that either the subset is cho-sen randomly, or that the correlations in the subset canbe accounted for in the approximation logic.

Third, monotonic aggregates have the property thatwhen two partial state records, D1� and DE$ , are combinedvia � , the resulting state record D&F will have the prop-erty that either G%D � �HD $ � �>:IDJFK;@LNM�O#PQ: �>:ID � ;"� �>:ID $ ;+; orG%DR� �HDJ$&� �>:ID F ;�SNM�T>UQ: �>:IDR�V;"� �>:IDE$E;+; . This is importantwhen determining how far predicates (such as HAV-

2We omit a detailed discussion of how new aggregate functionsare registered with sensors. For now, assume aggregates are pre-compiled into sensors.


MAX, MIN COUNT, SUM AVERAGE MEDIAN COUNT DISTINCT 3 HISTOGRAM 4 SectionDuplicate Sensitive No Yes Yes Yes No Yes Section 7.4Exemplary (E), Summary (S) E S S E S S Section 6.2Monotonic Yes Yes No No Yes No Section 4.2Partial State Distributive Distributive Algebraic Holistic Unique Content-Sensitive Section 5.1

Table 1: Classes of aggregates

ING) can be pushed into the network.

The fourth dimension relates to the amount of state re-quired for each partial state record. For example, a par-tial AVERAGE record consists of a pair of values, whilea partial COUNT record constitutes only a single value.Though TAG correctly computes any aggregate thatconforms to the specification of � in Section 3.1 above,its performance is inversely related to the amount of in-termediate state required per aggregate. The first threecategories of this dimension were initially presented inthe original work on data-cubes [6].

� In Distributive aggregates, the partial state is sim-ply the aggregate for the partition of data over whichthey are computed. Hence the size of the partial staterecords is the same as the size of the final aggregate.

� In Algebraic aggregates, the partial state records arenot themselves aggregates for the partitions, but are ofconstant size.

� In Holistic aggregates, the partial state records areproportional in size to the set of data in the parti-tion. In essence, for holistic aggregates no useful par-tial aggregation can be done, and all the data must bebrought together to be aggregated by the evaluator.

� Unique aggregates are similar to holistic aggregates,except that the amount of state that must be propa-gated is proportional to the number of distinct valuesin the partition.

� In Content-Sensitive aggregates, the partial staterecords are proportional in size to some (perhaps sta-tistical) property of the data values in the partition.Many approximate aggregates proposed recently inthe database literature are content-sensitive.

In summary, we have classified aggregates accordingto their state requirements, their tolerance of loss, andduplicate sensitivity, and their monotonicity. We willrefer back to this classification throughout the text, asthese properties will determine the applicability of com-munication optimizations we present later. Understand-ing how aggregates fit into these categories is a cross-cutting issue that is critical (and generically useful) inmany aspects of sensor data collection.

3The HISTOGRAM aggregate sorts sensor readings into fixed-width buckets and returns the size of each bucket; it is content-sensitive because the number of buckets varies depending on howwidely spaced sensor readings are.

4COUNT DISTINCT returns the number of distinct values re-ported across all sensors.

3.4 Attribute Catalog

Queries in TAG contain named attributes. Some mech-anism is needed to allow external users to determine theset of attributes they may query, and to allow sensorsto advertise the attributes they can provide. In TAG,we include on each sensor a small catalog of attributes.This catalog can be searched for attributes of a specificname, or iterated through. To limit the burden of re-porting catalog information from motes, we assume thecentral query processor caches or stores the attributes ofall motes it may access.

When a TAG sensor receives a query, it converts namedfields into local catalog identifiers. Sensors lacking at-tributes specified in the query simply tag missing at-tributes as NULL in their result records. As in relationaldatabases, partial state records resulting from the eval-uation of a query have the same layout across all sen-sors. Thus, tuples in TAG need not be self-describing;attribute names are not carried with results, leading toa significant reduction in the amount of data that mustbe propagated with each tuple. At the same time, it isnot necessary for all sensors to have identical catalogs,which allows heterogeneous sensing capabilities and in-cremental deployment of motes.

Attributes in TAG may be direct representations of sen-sor values, such as light or temperature, or may be in-trospective, such as remaining energy or network neigh-borhood information. More generally, they can rep-resent time-varying statistics over local sensor values,such as an exponentially decaying average of the last �

light readings, or more complicated attributes such asa room number from a localization component. Indi-vidual software components in TinyOS choose whichattributes they will make available, and provide an ac-cessor function for acquiring the next attribute reading.

4 In-Network Aggregates

Given the simple routing protocol from Section 2.1 andour SQL-like query model, we now discuss the imple-mentation of the core TAG algorithm for in-network ag-gregation.

A naive implementation of sensor network aggregationwould be to use a centralized, server-based approachwhere all sensor readings are sent to the base station,which then computes the aggregates. In TAG, however,we compute aggregates in-network whenever possible,because, if properly implemented, this approach can belower in number of message transmissions, latency, and


power consumption than the server-based approach. Wewill measure the advantage of in-network aggregationin Section 5 below; first, we present the basic algorithmin detail. We first consider the operation of the basicapproach in the absence of grouping; we show how toextend it with grouping in Section 4.2.

4.1 Tiny Aggregation

TAG consists of two phases: a distribution phase, inwhich aggregate queries are pushed down into the net-work, and a collection phase, where the aggregate val-ues are continually routed up from children to par-ents. Recall that our query semantics partition time intoepochs of duration � , and that we must produce a singleaggregate value (when not grouping) that combines thereadings of all sensors in the network during that epoch.

Given our goal of using as few messages as possible,the collection phase must insure that parents in the rout-ing tree wait until they have heard from their childrenbefore propagating an aggregate value for the currentepoch. We will accomplish this by having parents sub-divide the epoch such that children are required to de-liver their partial state records during a parent-specifiedtime interval. This interval is selected such that thereis enough time for the parent to combine partial staterecords and propagate its own record to its parent.

When a sensor � receives a request to aggregate, � , ei-ther from another sensor or from the user, it awakens,synchronizes its clock according to timing informationin the message, and prepares to participate in aggrega-tion. In the tree based routing scheme, � chooses thesender of the message as its parent. In addition to the in-formation in the query, � includes the interval when thesender is expecting to hear partial state records from � .� then forwards the query request � down the network,setting this delivery interval for children to be slightlybefore the time its parent expects to see � ’s partial staterecord. In the tree-based approach, this forwarding con-sists of a broadcast of � , to include any nodes that didnot hear the previous round, and include them as chil-dren (if it has any.) These nodes continue to forward therequest in this manner, until the query has been propa-gated throughout the network.

During the epoch after query propagation, each sensorlistens for messages from its children during the inter-val it specified when forwarding the query. It then com-putes a partial state record consisting of the combina-tion of any child values it heard with its own local sen-sor readings. Finally, during the transmission intervalrequested by its parent, the sensor transmits this par-tial state record up the network. Figure 1 illustrates theprocess. Notice that parents listen for longer than thetransmission interval they specified, to overcome limita-tions in the quality of clock synchronization algorithmsbetween parents and children. In this way, aggregates

Level 1

Level 2

Level 3

Level 4

Level 5

Time

TreeDepth

Sensing and Processing,Radio IdleDelivery Interval(Transmitting)

Listening/Receiving

Radio and Processor Idle

End ofEpoch

Start ofEpoch

Root

Figure 1: Partial state records flowing up the tree dur-ing an epoch.

flow back up the tree interval-by-interval. Eventually,a complete aggregate arrives at the root. During eachsubsequent epoch, a new aggregate is produced. Noticethat, for a significant portion of every epoch, sensors arecompletely idle and can enter a low power state.

This scheme begs the question of how parents choosethe duration of the interval in which they will receivevalues. It needs to be long enough such that all ofa node’s children can report, but not so long that theepoch ends before nodes deep in the tree can scheduletheir communication. Furthermore, longer intervals re-quire radios to be powered up for more time, which con-sumes precious energy. In general, the proper choiceof duration for these intervals is somewhat environmentspecific, as it depends on the density of radio cells andbushiness of the network topology. For the purposes ofthe simulations and experiments in this paper, we as-sume the network has a maximum depth

�, and set the

duration of each interval to be (EPOCH DURATION)/�,

with nodes at level � transmitting during the �� inter-val. We rely on the TinyOS MAC layer [20] to avoidcollisions between sensors transmitting during the sameinterval. Note that this provides a lower-bound on theEPOCH DURATION and constrains the maximum sam-ple rate of the network, since the epoch must be longenough for partial state records from the bottom of thetree to propagate to the root.

To increase the sample rate, one could consider pipelin-ing the communications schedule shown in Figure 1.With pipelining, the output of the network would be de-layed by one or more epochs, as some nodes would waituntil the next epoch to report the aggregates they col-lected during the current epoch. In exchange for suchdelays, the effective sample rate of the system is in-creased (for the same reason that pipelining a long pro-cessor stage increases the clock rate of a CPU.) We donot consider such schemes in detail here; we discusseda fully-pipelined approach to aggregation in a workshopsubmission [13].


In Section 5.1 we show how TAG can provide an orderof magnitude decrease in communications costs over acentralized approach. Before presenting performanceresults, however, we show how to extend the approachto support grouping.

4.2 Grouping

Grouping in TAG is functionally equivalent to theGROUP BY clause in SQL: each sensor reading isplaced into exactly one group, and groups are parti-tioned according to an expression over one or more at-tributes. The basic grouping technique is to push the ex-pression down with the query, ask sensors to choose thegroup they belong to, and then, as answers flow back,update aggregate values in the appropriate groups.

Partial state records are aggregated just as in the ap-proach described above, except that those records arenow tagged with a group id. When a sensor is a leaf,it applies the grouping expression to compute a groupid. It then tags its partial state record with the groupand forwards it on to its parent. When a sensor re-ceives an aggregate from a child, it checks the groupid. If the child is in the same group as the sensor, itcombines the two values using the combining function� . If it is in a different group, it stores the value of thechild’s group along with its own value for forwarding inthe next epoch. If another child message arrives with avalue in either group, the sensor updates the appropriateaggregate. During the next epoch, the sensor will sendout the value of all the groups it collected informationabout during the previous interval, combining informa-tion about multiple groups into a single message as longas the message size permits. Figure 2 shows an exam-ple of computing a query grouped by temperature thatselects average light readings.

Recall that queries can also contain a HAVING clause,which constrains the set of groups in the final query re-sult. We sometimes pass this predicate into the networkalong with the grouping expression. The predicate isonly sent into the network if it can potentially be usedto reduce the number of messages that must be sent: for,

Group AVG

Group AVG

Group AVG

Temp: 20Light: 10

Temp: 20Light: 50

Temp: 10Light: 15

Temp: 30Light: 25

Temp: 10Light: 15

Temp: 10Light: 5

1

2

3

4

5

6

1

2

3

4

5

6

AggregateAVG(light)

Groups1 : 0 < temp 102 : 10 < temp 203 : 20 < temp 30 1 10

13

1025

Group AVG123

105025

123

103025

(6,5,2)(3,1)(4)

(6,5)(4)

(6,5)

SELECT AVG(light),temp/10FROM sensorsGROUP BY temp/10

Figure 2: A sensor network (left) with an in-network,grouped aggregate applied to it (right). Parenthesizednumbers represent sensors that contribute to the average

example, if the predicate is of the form MAX(attr) �x, then information about groups with MAX(attr) Lx need not be transmitted up the tree, and so the predi-cate is sent down into the network. When a node detectsthat a group does not satisfy a HAVING clause, it cannotify other nodes in the network of this information tosuppress transmission and storage of values from thatgroup. Note that HAVING clauses can be pushed downonly for monotonic aggregates; non-monotonic aggre-gates are not amenable to this technique. However, notall HAVING predicates on monotonic aggregates can bepushed down; for example, MAX(attr) � x, cannotbe applied in the network because a node cannot knowthat, just because its local value of � � �� is less than � ,the MAX over the entire group is less than � .

Because the number of groups can exceed availablestorage on any one (non-leaf) sensor, a way to evictgroups is needed. Once an eviction victim is selected, itis forwarded to the sensor’s parent, which may chooseto hold on to the group or continue to forward it upthe tree. Notice that a single sensor may evict severalgroups in a single epoch (or the same group multipletimes, if a bad victim is selected). This is because, oncegroup storage is full, if only one group is evicted at atime, a new eviction decision must be made every timea value representing an unknown or previously evictedgroup arrives. Because groups can be evicted, the basestation at the top of the network may be called upon tocombine partial groups to form an accurate aggregatevalue. Evicting partially computed groups is known aspartial preaggregation, as described in [11].

Thus, we have shown how to partition sensor readingsinto a number of groups and properly compute aggre-gates over those groups, even when the amount of groupinformation exceeds available storage in any one sensor.We will discuss experiments with grouping and groupeviction policies in Section 5.2. First, we summarizesome of the additional benefits of TAG.

4.3 Additional Advantages of TAG

The principal advantage of TAG is its ability to dramati-cally decrease the amount of communication required tocompute an aggregate versus a centralized aggregationapproach. However, TAG has a number of additionalbenefits.

One of these is its ability to tolerate loss. In sensor en-vironments, it is very likely that some aggregation re-quests or partial state records will be garbled, or thatsensors will move or run out of power. These losses willinvariably result in some sensors becoming lost, eitherwithout a parent or not incorporated into the aggrega-tion network during the initial flooding phase. If we in-clude information about queries in partial state records,lost nodes can reconnect by listening to other sensor’sstate records – not necessarily intended for them – as


they flow up the tree. We revisit the issue of loss inSection 7.

A second advantage of the TAG approach is that, inmost cases, each sensor is required to transmit onlya single message per epoch, regardless of its depth inthe routing tree. In the centralized (non TAG) case, asdata converges towards the root, nodes at the top of thetree are required to transmit significantly more data thannodes at the leaves; their batteries are drained faster andthe lifetime of the network is limited. Furthermore, be-cause the top of the routing tree must forward messagesfor every node in the network, the maximum samplerate of the system is inversely proportional to the totalnumber of nodes. To see this, consider a radio channelwith a capacity of � messages per second. If � sensorsare participating in a centralized aggregate, to obtain asample rate of

�samples per second, ��

�messages

must flow through the root during each epoch. ��

must be no larger than � , so the sample rate�

can be atmost ��A � messages per sensor per epoch, regardless ofthe network density. When using TAG, the maximumtransmission rate is limited instead by the occupancy ofthe largest radio-cell; in general, we expect that eachcell will contain far fewer than � motes.

Yet another advantage of TAG is that, by explicitly di-viding time into epochs, a convenient mechanism foridling the processor is obtained. The long idle times inFigure 1 show how this is possible; during these inter-vals, the radio and processor can be put into deep sleepmodes that use very little power. Of course, some boot-strapping phase is needed where motes can learn aboutqueries currently in the system, acquire a parent, andsynchronize clocks; a simple strategy involves requiringthat every node wake up infrequently but periodically toadvertise this information and that sensors that have notreceived advertisements from their neighbors listen forseveral times this period between sleep intervals. Re-search on energy aware MAC protocols [22] presents asimilar scheme in detail. That work also discusses is-sues such as required time synchronization resolutionand the maximum sleep duration to avoid the adverseeffects of clock skew on individual devices.

Taken as a whole, these properties provide users with astream of aggregate values that changes as sensor read-ings and the underlying network change. These read-ings are provided in an energy and radio-bandwidth ef-ficient manner.

5 Simulation-Based Evaluation

In this section, we present a simulation environment forTAG and evaluate its behavior using this simulator. Wealso have an initial, real-world deployment; we discussits performance at the end of the paper, in Section 8.

To study the algorithms presented in this paper, we

simulated TAG in Java. The simulator models sensorbehavior at a coarse level: time is divided into unitsof epochs, messages are encapsulated into Java objectsthat are passed directly into sensors without any modelof the time to send or decode. Sensors are allowed tocompute or transmit arbitrarily within a single epoch,and each sensor executes serially. Messages sent by allsensors during one epoch are delivered in random orderduring the next epoch to model a parallel execution.

Our simulation includes an interchangeable communi-cation model that defines connectivity based on geo-graphic distance. Figure 3 shows screenshots of a vi-sualization component of our simulation; each squarerepresents a single sensor, and shading (in these images)represents the number of radio hops the sensor is fromthe root (center); darker is closer. We measure the sizeof networks in terms of diameter, or width of the sensorgrid (in terms of number of nodes). Thus, a diameter 50network contains 2500 sensors.

We have run experiments with three communicationsmodels; 1) a simple model, where sensors have perfect(lossless) communication with their immediate neigh-bors, which are regularly placed (Figure 3(a)), 2) a ran-dom placement model (Figure 3(b)), and 3) a realisticmodel that attempts to capture the actual behavior ofthe radio on TinyOS motes (Figure 3(c).) In the lattermodel, notice that the number of hops from a particu-lar node to the root is no longer directly proportional tothe distance between the node and the root, althoughthe two values are still related. This model uses re-sults from real world experiments [4] to approximatethe actual loss characteristics of the TinyOS radio. Lossrates are high in in the realistic model: a pair of adja-cent sensors loses more than 20% of the traffic betweenthem. Sensors separated by larger distances lose stillmore traffic. Note that the simulator does not model ra-dio contention – we assume that the data delivery rateof the sensors is low enough that the MAC layer caneffectively eliminate contention.

The simulator also models the costs of topology mainte-nance: if a sensor does not transmit a reading for severalepochs (which will be the case in some of our optimiza-tions below), that sensor must periodically send a heart-beat to advertise that it is still alive, so that its parentsand children know to keep routing data through it. Theinterval between heartbeats can be chosen arbitrarily;choosing a longer interval means fewer messages mustbe sent, but requires sensors to wait longer before de-ciding that a parent or child has disconnected, makingthe network less adaptable to rapid change.

This simulation allows us to measure the the numberof bytes, messages, and partial state records sent overthe radio by each mote. Since we do not simulate themote CPU, it does not give us an accurate measurement


(a) Simple (b) Random (c) Realistic

Figure 3: The TAG Simulator, with Three DifferentCommunications Models, Diameter = 20.

of the number of instructions executed in each mote. Itdoes, however, allow us to obtain an approximate mea-sure of the amount of state required for various algo-rithms, based on the size of the data structures allocatedby each mote.

Unless otherwise specified, our experiments are overthe simple radio topology in which there is no loss. Wealso assume sensor values do not change over the courseof a single simulation run.

5.1 Performance of TAG

In the first set of experiments, we compare the per-formance of the TAG in-network approach to central-ized approaches on queries for the different classes ofaggregates discussed in Section 3.3. Centralized ag-gregates have the same communications cost irrespec-tive of the aggregate function, since all data must berouted to the root. For this experiment, we comparedthis cost to the number of bytes required for distributiveaggregates (MAX and COUNT), an algebraic aggregate(AVERAGE), a holistic aggregate (MEDIAN), a content-sensitive aggregate (HISTOGRAM), and a unique ag-gregate (COUNT DISTINCT); the results are shown inFigure 4.

Values for in this experiment represent the steady-statecost to extract an additional aggregate from the networkonce the query has been propagated; the cost to flood arequest down the tree in not considered.

MAX and COUNT have the same cost in-network, about5000 bytes per epoch, since they both send just a sin-gle integer per partial state record; similarly AVERAGErequires just two integers, and thus always has doublethe cost of the distributive aggregates. MEDIAN coststhe same as a centralized aggregate, about 90000 bytesper epoch, which is significantly more expensive thanother aggregates, especially for larger networks, as par-ents have to forward all of their children’s values to theroot. COUNT DISTINCT is only slightly less expen-sive (73000 bytes), as there are few duplicate sensorvalues; a less uniform sensor-value distribution wouldreduce the cost of this aggregate. For the HISTOGRAMaggregate, we set the size of the fixed-width buckets tobe 10; sensor values ranged over the interval [0..1000].At about 9000 messages per epoch, HISTOGRAM pro-

CO

UN

T

MIN

HIS

TO

GR

AM

AV

ER

AG

E

CO

UN

T D

IST

INC

T

ME

DIA

N

Cen

tral

ized

(no

t TA

G)

Aggregation Function

0

20000

40000

60000

80000

Byt

es T

rans

mitt

ed /

Epo

ch, A

ll Se

nsor

s

TAG (In Network)Any Centralized Aggregate

In-network vs. Centralized AggregationNetwork Diameter = 50, No Loss

Figure 4: In-network Vs. Centralized Aggregates

vides an efficient means for extracting a density sum-mary of readings from the network.

Note that the benefit of TAG will be more or less pro-nounced depending on the topology. In a flat, single-hop environment, where all sensors are directly con-nected to the root, TAG is no better than the centralizedapproach. For a topology where � sensors are arrangedin a line, centralized aggregates will require �

$ A�� par-tial state records to be transmitted, whereas TAG willrequire only � records.

Thus, we have shown that, for our simulation topol-ogy, in-network aggregation can reduce communicationcosts by an order of magnitude over centralized ap-proaches, and that, even in the worst case (such as withMEDIAN), it always provides performance equal to thecentralized approach.

5.2 Grouping Experiments

We also ran several experiments to measure the perfor-mance of grouping in TAG, focusing on the behavior ofvarious eviction techniques in the case that the numberof groups exceeds the storage available on a single sen-sor. We tried a number of simple eviction policies, suchas evicting the group with the fewest members, evictinga random group, and evicting the group with the mostmembers. We found that the choice of policy made lit-tle difference for any of the sensor-value distributionswe tested. This is largely due to the tree topology: nearthe leaves of the tree, most sensors will see only a fewgroups, and the eviction policy will matter very little.At the top levels of the tree, the eviction policy becomesimportant, but the cost of forwarding messages from theleaves of the tree tends to dominate the savings obtainedat the top. In the most extreme case, the difference be-tween the best and worst case eviction policy accounted


for less than 10% of the total messages. We also ob-served that, when evicting, the best policy was to evictmultiple groups at a time, up to the number of grouprecords that will fit into a single radio message. Dueto space limitations, we omit detail discussion of theseexperiments.

6 Optimizations

In this section, we present several techniques to improvethe performance and accuracy of the basic approach de-scribed above. Some of these techniques are functiondependent; that is, they can only be used for certainclasses of aggregates. Also note that, in general, thesetechniques can be applied in a user-transparent fashion,since they are not explicitly a part of the query syntaxand do not affect the semantics of the results.

6.1 Taking Advantage of A Shared Channel

In our discussion of aggregation algorithms up to thispoint, we have largely ignored the fact that sensors com-municate over a shared radio channel. The fact that ev-ery message is effectively broadcast to all other sensorswithin range enables a number of optimizations that cansignificantly reduce the number of messages transmit-ted and increase the accuracy of aggregates in the faceof transmission failures.

In Section 4.3, we saw an example of how a sharedchannel can be used to increase message efficiencywhen a sensor misses an initial request to begin ag-gregation: it can initiate aggregation even after missingthe start request by snooping on the network traffic ofnearby sensors. When it sees another sensor reportingan aggregate, it can assume it too should be aggregat-ing. By allowing sensors to examine messages not di-rectly addressed to them, sensors are automatically in-tegrated into the aggregation. Note that snooping doesnot require sensors to listen all the time; by listening atpredefined intervals (which can be short once a sensorhas time-synchronized with its neighbors), sensors cankeep their duty cycles quite low.

Snooping can also be used to reduce the number of mes-sages sent for certain classes of aggregates. Considercomputing a MAX over a group of sensors; if a sensorhears a peer reporting a maximum value greater than itslocal maximum, it can elect to not send its own valueand be assured of not affecting the value of the final ag-gregate.

6.2 Hypothesis Testing

The snooping example above showed that we only needto hear from a particular sensor if that sensor’s valuewill affect the end value of the aggregate. For someaggregates, this fact can be exploited to significantlyreduce the number of nodes that need to report. Thistechnique can be generalized to an approach we call hy-

pothesis testing. For certain classes of aggregates, if anode is presented with a guess as to the proper value ofan aggregate, it can decide locally whether contributingits reading and the readings of its children will affectthe value of the aggregate.

For MAX, MIN and other monotonic, exemplary aggre-gates, this technique is directly applicable. There area number of ways it can be applied – the snooping ap-proach, where sensors suppress their local aggregatesif they hear other aggregates that invalidate than theirown, is one. Alternatively, the root of the network (orany subtree of the network) seeking an exemplary sen-sor value, such as a MIN, might compute the minimumsensor value � over the highest levels of the subtree,and then abort the aggregate and issue a new requestasking for values less than � over the whole tree. Inthis approach, leaf nodes need not send a message iftheir value is greater than the minimum observed overthe top

�levels; intermediate nodes, however, must still

forward partial state records, so even if their value issuppressed, they may still have to transmit.

Assuming for a moment that sensor values are indepen-dent and uniformly distributed, then a particular leafnode must transmit with probability <JA ��

(where�

isthe branching factor, so <JA � �

is the number of sensorsin the top

�levels), which is quite low for even small

values of�

. For bushy routing trees, this technique of-fers a significant reduction in message transmissions –a completely balanced routing tree would cut the num-ber of messages required to <JA � . Of course, the per-formance benefit may not be as substantial for other,non-uniform, sensor value distributions; for instance, adistribution in which all sensor readings are clusteredaround the minimum will not allow many messages tobe saved by hypothesis testing. Similarly, less balancedtopologies (e.g. a line of sensors) will not benefit fromthis approach.

For summary aggregates, such as AVERAGE or VARI-ANCE, hypothesis testing via a guess from the root canbe applied, although the message savings are not asdramatic as with monotonic aggregates. Note that thesnooping approach cannot be used: it only applies tomonotonic, exemplary aggregates where values can besuppressed locally without any information from a cen-tral coordinator. To obtain any benefit with summaryaggregates and hypothesis testing, the user must definea fixed-size error bound that he or she is willing to tol-erate over the value of the aggregate; this error is sentinto the network along with the hypothesis value.

Consider the case of an AVERAGE: any sensor that iswithin the error bound of the hypothesis value need notanswer – its parent will then assume its value is thesame as the approximate answer and count it accord-ingly (for AVERAGE parents must know how many chil-


0

500

1000

1500

2000

2500

10 15 20 25 30 35 40 45 50

Mes

sage

s / E

poch

/ S

enso

r

Network Diameter

Steady State Messages/Epoch Max Query With Hypothesis Testing

No HypothesisHypothesis : 50Hypothesis : 90

Hypothesis via Snooping

Figure 5: Benefit of Hypothesis Testing for MAX

dren they have.) It can be shown that the total computedaverage will not be off from the actual average by morethan the error bound, and leaf sensors with values closeto the average will not be required to report. Obviously,the value of this scheme depends on the distribution ofsensor values. If values are uniformly distributed, thefraction of leaves that need not report approximates thesize of the error bound divided by the size of the sensorvalue distribution interval. If values are normally dis-tributed, a much larger fraction of leaves do not report.

We conducted a simple experiment to measure the ben-efit of hypothesis testing and snooping for a MAX ag-gregate. The results are shown in Figure 5. In thisexperiment, sensor values were uniformly distributedover the range [0..100], and a hypothesis was madeat the root. Notice that the performance savings arenearly two-fold for a hypothesis of 90. We comparedthe hypothesis testing approach with the snooping ap-proach (which will be effective even in a non-uniformdistribution); surprisingly, snooping beat the other ap-proaches by offering a nearly three-fold performanceincrease over the no-hypothesis case. This is becausein the densely packed simple sensor distribution, mostsensors have three or more neighbors to snoop on, sug-gesting that only about one in four sensors will have totransmit. With topology maintenance and forwardingof child values by parents, the savings by snooping isreduced to a factor of three.

7 Improving Tolerance to Loss

Up to this point in our experiments we used a reliableenvironment where no messages were dropped and nosensors were lost. In this section, we address the prob-lem of loss and its effect on the algorithms presentedthus far. Unfortunately, loss is a a fact of life in the sen-sor domain; the techniques described in the section seekto mitigate that loss.

7.1 Effects of A Single Loss

We first study the effect that a single sensor going of-fline has on the value of the aggregate; this is an impor-tant measurement because it gives some intuition about

the magnitude of error that a single loss can generate.Note that, because we are doing hierarchical aggrega-tion, a single sensor going offline causes the entire sub-tree rooted at the sensor to be (at least temporarily) dis-connected. In this first experiment we used the sim-ple topology, with sensor readings chosen from the uni-form distribution over [1..1000]. After running the sim-ulation for several epochs, we selected, uniformly andat a random, a sensor to disable. In this environment,children of the disabled node were temporarily discon-nected but eventually their values were reintegrated intothe aggregate once they rediscovered their parents. Notethat the amount of time taken for lost nodes to rein-tegrate is directly proportional to the depth of the lostsensor, so we did not measure it experimentally. In-stead, we measured the maximum temporary deviationfrom the true value of the aggregate that the loss causedin the perceived aggregate value at the root during anyepoch. This maximum was computed by performing100 runs at each data point and selecting the largest er-ror reported in any run. We also report the average ofthe maximum error across all 100 runs.

Figure 6 shows the results of this experiment. Note thatthe maximum loss (Figure 6(a)) is highly variable andthat some aggregates are considerably more sensitive toloss than others. COUNT, for instance, has a very largeerror in the worst case: if a node that connects the rootto a large portion of the network is lost, the temporaryerror will be very high. The variability in maximum er-ror is because a well connected subtree is not alwaysselected as the victim. Indeed, assuming some unifor-mity of placement (e.g. the sensors are not arrangedin a line), as the network size increases, the chances ofselecting such a node go down, since a larger propor-tion of the sensors are towards the leaves of the tree. Inthe average case(Figure 6(b)), the error associated witha COUNT is not as high: most losses do not result in alarge number of disconnections. Note that MIN is insen-sitive to loss in this uniform distribution, since severalnodes are at or near the true minimum. The error forMEDIAN and AVERAGE is less than COUNT and morethan MIN: both are sensitive to the variations in thenumber of sensors, but not as dramatically as COUNT.

7.2 Effect of the Realistic Communication Model

In the second experiment, we examine how well TAGperforms in the realistic simulation environment (dis-cussed in Section 5 above). In such an environment,without some technique to counteract loss, a large num-ber of partial state records will invariably be droppedand not reach the root of the tree. We ran an experi-ment to measure the effect of this loss in the realisticenvironment. The simulation ran until the first aggre-gate arrived at the root, and then the average number ofsensors involved in the aggregate over the next several


0

5

10

15

20

25

30

35

40

45

10 15 20 25 30 35 40 45 50

Per

cent

Err

or

Network Diameter

Maximum Error vs. Aggregation Function

AVERAGECOUNT

MINIMUMMEDIAN

(a) Maximum Error

-0.5

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

10 15 20 25 30 35 40 45 50

Network Diameter

Average Error vs. Aggregation Function

AVERAGECOUNT

MINIMUMMEDIAN

(b) Average ErrorFigure 6: Effect of a Single Loss on Various AggregateFunctions. Computed over a total of 100 runs at eachpoint. Error bars indicate standard error of the mean,95% confidence intervals.

epochs was measured. The “No Cache” line of Figure7 shows the performance of this approach; at diameter10, about 40% of the partial state records are reflected inthe aggregate at the root; by diameter 50, this percent-age has fallen to less than 10%. Performance falls offas the number of hops between the average sensor andthe root increases, since the probability of loss is com-pounded by each additional hop. Thus, the basic TAGapproach presented so far, running on current prototypehardware (with its very high loss rates), is not highlytolerant to loss, especially for large networks. Note thatany centralized approach would suffer from the sameloss problems.

7.3 Child Cache

To improve the quality of aggregates, we propose asimple caching scheme: parents remember the partialstate records their children reported for some number ofrounds, and use those previous values when new valuesare unavailable due to lost child messages. As long asthe duration of this memory is shorter than the intervalat which children select new parents, this technique willincrease the number of nodes included in the aggregatewithout over-counting any nodes. Of course, cachingtends to temporally smear the aggregate values that arecomputed, and so may not always be applicable. Gener-ally, however, we believe that using old values in placeof missing will be desirable.

We conducted some experiments to show the improve-ment this technique offers over the basic approach; weallocate a fixed size buffer at each node and measure theaverage number of sensors involved in the aggregationas in Section 7.2 above. The results are shown in Figure7 – notice that even five epochs of cached state offer asignificant increase in the number of nodes counted inany aggregate, and that 15 rounds increases the num-ber of sensors involved in the diameter 50 network to70% (versus less than 10% without a cache). There aretwo drawbacks to caching; First, it uses memory thatcould be used for group storage. Second, it sets a mini-mum bound on the time sensors must wait before deter-

0

10

20

30

40

50

60

70

80

90

100

10 15 20 25 30 35 40 45 50

Per

cent

age

of N

etw

ork

Invo

lved

Network Diameter

Percentage of Network Involved, Child Caching

No Cache5 Epochs Cache

9 Epochs Cache15 Epochs Cache

Figure 7: Percentage of Network Participating in Ag-gregate For Varying Amounts of Child Cache

mining their parent has gone offline; given the benefit itprovides in terms of accuracy, however, we believe it tobe useful despite these disadvantages. The substantialbenefit of this technique suggests that allocating RAMto application level caching may be more beneficial thanallocating it to lower-level schemes for reliable messagedelivery, as such schemes cannot take advantage of thesemantics of the data being transmitted.

7.4 Using Available Redundancy

Because there may be situations where the RAM or la-tency costs of the child cache are not desirable, it isworthwhile to look at alternative approaches for im-proving loss tolerance. In this section, we show how thenetwork topology can be leveraged to increase the qual-ity of aggregates. Consider a sensor with two possiblechoices of parent parents: instead of sending its aggre-gate value to just one parent, it can send it to both par-ents. It is easy for a node to discover that it has multipleparents, since it can simply build a list of nodes it hasheard that are one step closer to the root. Of course, forduplicate-sensitive aggregates (see Section 3.3), send-ing results to multiple parents has the undesirable effectof causing the node to be counted multiple times. Thesolution to this is to send part of the aggregate to oneparent and the rest to the other. Consider a COUNT; asensor with ��N< children and two parents can send aCOUNT of �=A�� to both parents instead of a count of � toa single parent. Note that, in general, if the aggregatecan be linearly decomposed in this fashion, it is possi-ble to broadcast just a single message that is receivedand processed by both parents, so this scheme incurs nomessage overheads, as long as both parents are at thesame level and request data to be delivered during thesame sub-interval of the epoch.

A simple statistical analysis reveals the advantage ofdoing this: assume that a message is transmitted withprobability � , and that losses are independent, so that ifa message � from sensor D is lost in transition to par-ent

� � , it is no more likely to be lost in transit to� $ .

First, consider the case where D sends � to a single par-ent; the expected value of the transmitted count is � � �


(0 with probability :/< � �%; and � with probability � ),and the variance is � $ � � � :/< � �%; , since these arestandard Bernoulli trials with a probability of success �multiplied by a constant � . For the case where D sends�=A�� to both parents, linearity of expectation tells us theexpected value is the sum of the expected value througheach parent, or � � � � �=A�� . Similarly, we cansum the variances through each parent to get:

var =�� 0 ��

=� 0 � ��

Thus, the variance of the multiple parent COUNT ismuch less than with just a single parent, although itsexpected value is the same. This is because it is muchless likely (assuming independence) for the message toboth parents to be lost, and a single loss will less dra-matically affect the computed value.

We ran an experiment to measure the benefit of thisapproach in the realistic topology for COUNT with anetwork diameter of 50. We measured the number ofsensors involved in the aggregation over a 50 epochperiod. When sending to multiple parents, the meanCOUNT was 974 ( � �� ), while when sending toonly one parent, the mean COUNT was 94 ( � �� < ).Surprisingly, sending to multiple parents substantiallyincreases the mean aggregate value; most likely this isdue to the fact that losses are not truly independent aswe assumed above.

This technique applies equally well to any distributiveor algebraic aggregate. For holistic aggregates, likeMEDIAN, this technique cannot be applied, since par-tial state records cannot be easily decomposed.

8 Real-World Experiments

Based on the encouraging simulation results presentedabove, we have built an implementation of TAG forTinyOS Mica motes. The implementation does notcurrently include many of the optimizations discussedin this paper, but contains the core TAG aggrega-tion algorithm and catalog support for querying arbi-trary attributes with simple predicates. In this section,we briefly summarize results from experiments withthis implementation, to demonstrate that the simulationnumbers given above are consistent with actual behav-ior and to show that substantial real-world message re-ductions over a centralized approach are possible.

These experiments involved sixteen sensors arranged ina depth four tree, computing a COUNT aggregate over150, 4 second epochs (a 10 minute run.) No childcaching or snooping techniques were used. Figure 8shows the COUNT observed at the root for a centralizedapproach, where all messages are forwarded to the root,versus the in-network TAG approach. Notice that thequality of the aggregate is substantially better for TAG;this is due to reduced radio contention. To measure theextent of contention and compare the message costs of

0

5

10

15

20

0 20 40 60 80 100 120 140

CO

UN

T

Epoch

Count Per Epoch, 16 Nodes (Epoch Duration = 4 Seconds)

TAGCentralized

Figure 8: Comparison of Centralized and TAG basedAggregation Approaches in Lossy, Real-World Environ-ment Computing a COUNT over a 16 node network.

the two schemes, we instrumented motes to report thenumber of messages sent and received. The central-ized approach required 4685 messages, whereas TAGrequired just 2330, representing a 50% communica-tions reduction. This is less than the order-of-magnitudeshown in Figure 4 for COUNT because our real-worldnetwork topology had a higher average fanout than thesimulated environment, so messages in the centralizedcase had to be retransmitted fewer times to reach theroot. Per hop loss rates were about 5% in the in-networkapproach. In the centralized approach, increased net-work contention drove these loss rates to 15%. The cu-mulative nature of loss meant that messages from nodesat the bottom of the routing tree successfully deliveredonly about 45% of messages to the root in the central-ized case (accounting for its poor performance.)

This completes our discussion of algorithms for TAG.We now summarize the extensive related work in thenetworking and database communities.

9 Related Work

The database community has proposed a number of dis-tributed and push-down based approaches for aggre-gates in database systems [16, 21], but these universallyassume a well-connected, low-loss topology that is un-available in sensor networks. The partial preaggrega-tion techniques [11] used to enable group eviction wereproposed as a technique to deal with very large numbersof groups to improve the efficiency of hash joins andother bucket-based database operators. The partial-staterequirements aggregates presented in Section 3.3 wereoriginally partially developed as a part of the researchon data-cubes [6]. [18] discusses online aggregationin the context of nested-queries; it proposes optimiza-tions to reduce the flow of tuples between outer and in-ner queries that bear some similarities to our techniquefor pushing HAVING clauses into the network. Withrespect to query language, our epoch based approachis related to languages and models from the Tempo-ral Database literature; see [17] for a survey of rele-vant work. The Cougar project at Cornell [15] discusses


queries over sensor networks, as does our own work onFjords [12], although the former only considers mov-ing selections (not aggregates) onto sensors and neitherpresents specific algorithms for use in sensor networks.

Literature on active networks [19] identified the ideathat the network could simultaneously route and trans-form data, rather than simply serving as an end-to-enddata conduit. Within the sensor network community,work on networks that perform data analysis has beenlargely confined to the USC/ISI and UCLA commu-nities. Their work on directed diffusion [9] discussestechniques for moving specific pieces of informationfrom one place in a network to another, and proposesaggregation-like operations that nodes may perform asdata flows through them. Work on low-level-naming[7]proposes a scheme for imposing names onto relatedgroups of sensors in a network, in much the way that ourscheme partitions sensor networks into groups. Workon greedy aggregation [8] discusses networking pro-tocols for routing data to improve the extent to whichdata can be combined as it flows up a sensor network– it provides low level techniques for building routingtrees that could be useful in computing database styleaggregates. These papers recognize that aggregationdramatically reduces the amount of data routed throughthe network but present application specific solutionsthat, unlike the declarative query approach approach ofTAG, do not offer an application independent interface,naming system, or aggregation mechanism. Finally, weinitially noted the advantages of a database style ap-proach to aggregation in a workshop publication [13].This work did not include simulation or real-world ex-periments, and was missing the taxonomy which lendsTAG much of its generality.

Networking protocols for routing data in wireless net-works are very popular within the literature [10, 2, 5],however, none of them address higher level issues ofdata processing, merely techniques for data routing.Our tree-based routing approach is clearly inferior tothese approaches for peer to peer routing, but workswell for the aggregation scenarios we are focusing on.

10 Conclusions

In summary, we have shown how declarative aggre-gate queries can be distributed and efficiently executedover sensor networks. Our in-network approach canprovide an order of magnitude reduction in bandwidthconsumption over approaches where data is aggregatedand processed centrally. The declarative query inter-face allows end-users to take advantage of this benefitfor a wide range of aggregate operations without hav-ing to modify low-level code or confront the difficul-ties of topology construction, data routing, loss toler-ance, or distributed computing. Furthermore, this inter-face, combined with tight integration with the network

enables transparent optimizations that further decreasemessage costs and improve tolerance to failure and loss.

As sensor networks become more widely deployed,especially in remote, difficult to administer locations,bandwidth and power sensitive methods to extract datafrom those networks will become increasingly impor-tant. In such scenarios, we see TAG as the central datadelivery service of TinyOS: the simplicity of declara-tive queries, combined with the ability of TAG to effi-ciently optimize and execute such queries makes it anideal choice for a wide range of sensor network dataprocessing situations.

Acknowledgments

Robert Szewczyk, David Culler, and Ramesh Govindancontributed to the design of the networking protocolsdiscussed in this paper. Per Ake Larson suggested theuse of partial preaggregation for group eviction.References[1] Smart buildings admit their faults. Web Page, November 2001. Lab

Notes: Research from the College of Engineering, UC Berkeley.http://coe.berkeley.edu/labnotes/1101.smartbuildings.html.

[2] W. Adjue-Winoto, E. Schwartz, H. Balakrishnan, and J. Lilley. Thedesign and implementation of an intentional naming system. In ACMSOSP, December 1999.

[3] F. Bancilhon, T. Briggs, S. Khoshafian, and P. Valduriez. FAD, a pow-erful and simple database language. In VLDB, 1987.

[4] D. Ganesan. Network dynamics in rene motes. PowerPoint Presenta-tion, January 2002.

[5] T. Goff, N. Abu-Ghazaleh, D. Phatak, and R. Kahvecioglu. Preemptiverouting in ad hoc networks. In ACM MobiCom, July 2001.

[6] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: Arelational aggregation operator generalizing group-by, cross-tab, andsub-total. February 1996.

[7] J. Heidemann, F. Silva, C. Intanagonwiwat, R. Govindan, D. Estrin,and D. Ganesan. Building efficient wireless sensor networks with low-level naming. In SOSP, October 2001.

[8] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann. Impactof network density on data aggregation in wireless sensor networks.Submitted for Publication, ICDCS-22, November 2001.

[9] C. Intanagonwiwat, R. Govindan, , and D. Estrin. Directed diffusion:A scalable and robust communication paradigm for sensor networks.In MobiCOM, Boston, MA, August 2000.

[10] J. Kulik, W. Rabiner, and H. Balakrishnan. Adaptive protocols forinformation dissemination in wireless sensor networks. In MobiCOM,Seattle, WA, 1999.

[11] P.-A. Larson. Data reduction by partial preaggregation. In ICDE,2002. (to appear).

[12] S. Madden and M. J. Franklin. Fjording the stream: An architechturefor queries over streaming sensor data. In ICDE, 2002. (to appear).

[13] S. Madden, R. Szewczyk, M. Franklin, and D. Culler. Supportingaggregate queries over ad-hoc wireless sensor networks. Submitted,Workshop on Mobile Computing and Systems Applications, 2002.

[14] A. Mainwaring, 2002. Personal Communication.[15] P.Bonnet, J.Gehrke, and P.Seshadri. Towards sensor database systems.

In Conference on Mobile Data Management, January 2001.[16] A. Shatdal and J. Naughton. Adaptive parallel aggregation algorithms.

In ACM SIGMOD, 1995.[17] R. T. Snodgrass, editor. The TSQL2 Temporal Query Language.

Kluwer Academic Publisher, 1995.[18] K.-L. Tan, C. H. Goh, and B. C. Ooi. Online feedback for nested

aggregate queries with multi-threading. In VLDB, 1999.[19] D. Tennenhouse. Active networks. In OSDI, October 1996.[20] A. Woo and D. Culler. A transmission control scheme for media access

in sensor networks. In ACM Mobicom, July 2001.

[21] W. P. Yan and P. A. Larson. Eager aggregation and lazy aggregation.In VLDB, 1995.

[22] W. Ye, J. Heidemann, and D. Estrin. An energy-efficient MAC proto-col for wireless sensor networks. In IEEE Infocom, 2002.

[23] A. Yu and J. Chen. The POSTGRES95 User Manual. UC Berkeley,1995.

tag: a tiny aggregation service for ad-hoc sensor networksdga/15-849/papers/madden-osdi2002.pdf ·...

Documents