
Accepted Manuscript

Flow updating: Fault-tolerant aggregation for dynamic networks

Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

PII: S0743-7315(15)00041-6
DOI: http://dx.doi.org/10.1016/j.jpdc.2015.02.003
Reference: YJPDC 3388

To appear in: J. Parallel Distrib. Comput.

Received date: 14 October 2014
Revised date: 31 January 2015
Accepted date: 16 February 2015

Please cite this article as: P. Jesus, C. Baquero, P.S. Almeida, Flow updating: Fault-tolerant aggregation for dynamic networks, J. Parallel Distrib. Comput. (2015), http://dx.doi.org/10.1016/j.jpdc.2015.02.003

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights:

• We describe a fault-tolerant data aggregation algorithm for dynamic networks;

• Experimental results show it outperforms previous averaging techniques;

• It self-adapts to churn and input value changes;

• It tolerates node crashes and high levels of message loss;

• It works in asynchronous settings.



Flow Updating: Fault-Tolerant Aggregation for Dynamic Networks

Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

HASLab, INESC TEC and Universidade do Minho, Portugal.

Abstract

Data aggregation is a fundamental building block of modern distributed systems. Averaging-based approaches, commonly designated gossip-based, are an important class of aggregation algorithms as they allow all nodes to produce a result, converge to any required accuracy, and work independently from the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper describes and evaluates a fault-tolerant distributed aggregation technique, Flow Updating, which overcomes the problems of previous averaging approaches and is able to operate on faulty dynamic networks. Experimental results show that this novel approach outperforms previous averaging algorithms: it self-adapts to churn and input value changes without requiring any periodic restart, supports node crashes and high levels of message loss, and works in asynchronous networks. Realistic concerns have been taken into account in evaluating Flow Updating, like the use of unreliable failure detectors and asynchrony, targeting its application to real environments.

1. Introduction

With the advent of multi-hop ad-hoc networks, sensor networks and large-scale overlay networks, there is a demand for tools that can abstract meaningful system properties from given assemblies of nodes. In such settings, aggregation plays an essential role in the design of distributed applications [1], allowing the determination of network-wide properties like network size, total storage capacity, average load, and majorities. Although apparently simple, in practice aggregation has revealed itself to be a non-trivial problem in distributed settings, where no single element holds a global view of the whole system. In recent years, several algorithms have addressed the problem with diverse approaches, exhibiting different


characteristics in terms of accuracy, time and communication trade-offs. A useful class of aggregation algorithms is based on averaging techniques. Such algorithms start from a set of input values spread across the network nodes, and iteratively average their values with neighbors. Eventually all nodes will converge to the same value and can estimate some useful metric.

Averaging techniques allow the derivation of different aggregation functions besides average (like counting and summing), according to the initial combinations of input values. For example, if one node starts with input 1 and all other nodes with input 0, eventually all nodes will end up with the same average 1/n, and the network size n can be directly estimated by all of them [2].
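To make the count example concrete, the following Python sketch (ours, not from the paper) runs a naive synchronous neighborhood-averaging scheme on a ring and recovers the network size from the converged average; it illustrates only the general averaging idea, not Flow Updating itself.

    # Naive synchronous averaging on a ring, used for count aggregation:
    # one node starts at 1, the rest at 0, and the average converges to 1/n.
    n = 8
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(200):  # each round, average with both ring neighbors
        x = [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3 for i in range(n)]
    # every node now holds roughly 1/n and can estimate the network size
    print([round(1 / e) for e in x])  # -> [8, 8, 8, 8, 8, 8, 8, 8]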

Distributed data aggregation becomes particularly difficult to achieve when faults are taken into account (i.e., message loss and node crashes), and especially if dynamic settings are considered (nodes arriving/leaving). Few have approached the problem under these settings [3, 4, 5, 6, 7, 8], and it has proven hard to efficiently obtain accurate and reliable aggregation results in faulty and dynamic environments.

This paper extends the previous work on Flow Updating [9, 10], a novel averaging approach, by presenting asynchronous versions of the algorithm and providing extensive evaluation results considering practical concerns such as: dynamic input value changes, realistic failure detectors and asynchronous execution with message loss. The evaluation shows that: it outperforms classic averaging-based aggregation algorithms; it is fault-tolerant (to both message loss and node crashes); it is able to efficiently support network dynamism (churn); it can be used with realistic failure detectors (and shows how these should be tuned); it can continuously aggregate under changes of input values with no need for a restart; it can be used in asynchronous settings, with variable transmission latency (and shows how timeouts can be chosen for a typical latency distribution, in a practical implementation).

The remainder of this paper is organized as follows. We briefly refer to the related work on aggregation algorithms in Section 2. Section 3 describes Flow Updating, a robust distributed aggregation algorithm able to work in dynamic networks. In Section 4, we evaluate the proposed approach. Finally, we make some concluding remarks in Section 5.

2. Related Work

Classic approaches, like TAG [3], perform a tree-based aggregation where partial aggregates are successively computed from child nodes to their parents until the root of the aggregation tree is reached (requiring the existence of a specific routing topology). This kind of aggregation technique is often applied in practice to Wireless Sensor Networks (WSN) [11]. Other tree-based aggregation approaches can be found in [4], [12], and [13]. We should point out that, although energy-efficient, the reliability of these approaches may be strongly affected by the inherent presence of single points of failure in the aggregation structure. Moreover, in order to operate in dynamic settings, a tree maintenance protocol is required to handle node arrival/departure, which may lead to temporary disconnection during the parent switching process.

Alternative aggregation algorithms based on the application of probabilistic methods can also be found in the literature. This is the case of Extrema Propagation [14] and COMP [15], which reduce the computation of an aggregation function to the determination of the minimum/maximum of a collection of random numbers. These techniques tend to emphasize speed, being less accurate than averaging approaches.

Specialized probabilistic algorithms can also be used to compute specific aggregation functions, such as count (e.g., to determine the network size). This type of algorithm essentially relies on the results of a sampling process to produce an approximate estimate of the aggregate, using properties of random walks, capture-recapture methods and other statistical tools [16, 5, 17, 6]. These approaches can provide some flexibility in dynamic settings, but are not accurate. The estimation error, present even in fault-free settings, depends on the quality of the collected sample and the estimator used. Moreover, a sample is made available at a single node, and it can take several rounds to collect one sample. For example, the estimation error can reach 20% in Sample & Collide [16, 5], and a single sampling step takes dT (where d is the average connection degree and T is a timer value that must be sufficiently large to provide a good sample quality) and must be repeated until l new samples have been observed.

The averaging approach to distributed aggregation is based on an iterative averaging process between small sets of nodes [18, 7, 2, 19, 20]. Eventually, all nodes will converge to the correct value by performing the averaging process across the whole network. These approaches are independent from the network routing topology, are often based on a gossip (or epidemic) communication scheme, and are able to produce an estimate of the resulting aggregate at every network node. Averaging techniques are considered to be robust and accurate (converging over time) when compared to other aggregation techniques, but in practice they exhibit relevant problems that have been overlooked, supporting neither message loss nor node crashes (see [21] for more details). Moreover, most existing approaches rely on inefficient strategies to handle network dynamism, like the use of a restart mechanism that loses all progress.

A technique which combines the basic idea from Flow Updating with mass distribution is the MDFU algorithm, presented in [22]. It keeps a pair of incoming/outgoing flow-like values, which however increase unboundedly. It inherits the convergence properties of the underlying mass distribution, while also being fault-tolerant and allowing input value changes. Another algorithm which keeps a pair of incoming/outgoing values that summarize past messages is the more recent Limosense [23], which adapts the classic Push-Sum [18], inheriting its convergence properties, while also being fault-tolerant and allowing input value changes and network dynamism. Recently, an algorithm named Push-Flow that combines Flow Updating with Push-Sum was described in [24].

A comprehensive survey of distributed data aggregation algorithms can be found in [25].

3. Flow Updating

Flow Updating [9, 10] is a recent averaging-based aggregation approach, which works for any network topology and tolerates faults. Like existing gossip-based approaches, it iteratively averages values during the aggregation process, converging towards the global network average. But unlike them, it is based on the concept of flow, providing unique fault-tolerant characteristics by performing idempotent updates.

The key idea in Flow Updating is to use the flow concept from graph theory (which serves as an abstraction for many things like water flow or electric current), and instead of storing in each node the current estimate in a variable, compute it from the input value and the contribution of the flows along edges to the neighbors:

ei = vi − ∑j∈ni fij.   (1)

This can be read as: the current estimate ei in a node i is the input value vi less the flows fij from the node to each neighbor j. The algorithm aims to enforce and exploit the skew symmetry property of the flow along an edge, i.e., fij = −fji.

The essence of the algorithm is: each node i stores the flow fij to each neighbor j; node i sends flow fij to j in a message; a node j receiving fij updates its own fji with −fij. Messages simply update flows, being idempotent: the value in a subsequent message overwrites the previous one; it does not add to the previous value. If the skew symmetry of flows holds, the sum of the estimates over all nodes (the global mass) will remain constant:

nodes (the global mass) will remain constant:

i∈V

ei =∑

i∈V

(vi −∑

j∈ni

fij) =∑

i∈V

vi. (2)
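As a small numeric illustration (our example, with made-up values), the following Python snippet checks Equations (1) and (2) on a three-node line topology: for any skew-symmetric assignment of flows, the global mass equals the sum of the inputs.

    # Three nodes on a line, 1 -- 2 -- 3, with arbitrary flows
    # satisfying skew symmetry (f_ij = -f_ji) on each edge.
    v = {1: 4.0, 2: 8.0, 3: 6.0}                  # input values
    f = {(1, 2): 1.5, (2, 1): -1.5,
         (2, 3): -0.5, (3, 2): 0.5}
    nbrs = {1: [2], 2: [1, 3], 3: [2]}
    est = {i: v[i] - sum(f[(i, j)] for j in nbrs[i]) for i in v}  # Eq. (1)
    print(est)                # {1: 2.5, 2: 10.0, 3: 5.5}
    print(sum(est.values()))  # 18.0 == 4 + 8 + 6: mass conserved (Eq. (2))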

The intuition is that if a message is lost the skew symmetry is temporarily broken, but as long as a subsequent message arrives, it re-establishes the symmetry. The reality is somewhat more complex: due to concurrent execution, messages between two nodes along a link may cross each other and both nodes may update their flows concurrently; therefore, the symmetry may not hold, but what happens is that fij + fji converges to 0, and the global mass converges to the sum of the input values of all nodes. Message loss only delays convergence; it does not impact the convergence direction towards the correct value.

Enforcing the skew symmetry of flows along edges through idempotent messages is what confers on Flow Updating its unique fault tolerance characteristics, which distinguish it from previous approaches. It tolerates message loss by design, without requiring additional mechanisms to detect and recover mass from lost messages. It solves the mass conservation problem not by instantaneous mass invariance, but by mass convergence.

In order to operate on dynamic networks, Flow Updating maintains a dynamic mapping of flows according to the current set of neighbors: removing the entries relative to leaving (or crashing) nodes, and adding entries for newly arrived nodes. The averaging process in each node uses only the current set of neighbors. This straightforward strategy allows Flow Updating to cope with node departure/crash and node arrival, extending its fault tolerance properties.

Node departure, crash or arrival are modeled by a failure detector [26] that gives, for each node at each moment, the set of neighbors considered to be alive. The interesting thing is that Flow Updating allows the use of practical implementations of failure detectors, which can be incorrect many times (falsely suspecting correct nodes or vice-versa) without compromising correctness.

New nodes are immediately allowed to participate in the averaging process, and leaving or crashed nodes implicitly stop participating in it. The algorithm runs continuously, without requiring restarts in order to adapt to network changes/failures, and simply makes use of the set of neighbors ni given by the failure detector. No concept of epoch is required, and the algorithm is always converging towards the average according to the current set of participants, allowing fast self-adaptation to network changes. Another advantage of Flow Updating is that it also allows the input values vi being aggregated to change over time (e.g., a temperature). Again, the algorithm will converge to the aggregation of the most recent value at each node without requiring a restart.

3.1. Algorithm – Synchronous Version

The algorithm is now described under the synchronous network model (as in Chapter 2 of [27]). Computation proceeds in synchronous rounds. At each round, nodes first look at their state and compute what messages are sent, through a message-generation function; then nodes take their state and the messages received and compute a new state, through a state-transition function. Each node needs only to be able to distinguish its neighbors, not requiring the use of globally unique identifiers.

1  inputs:
2    vi, value to aggregate
3    ni, set of neighbors given by failure detector
4  state:
5    Fi, flows: initially, Fi = {}
6  message-generation function:
7    msgi(Fi, j) = (i, Fi(j), est(vi, Fi))
8  state-transition function:
9    transi(Fi, Mi) = F'i
10   with
11     F = {j ↦ −f | j ∈ ni ∧ (j, f, _) ∈ Mi} ∪
12         {j ↦ f | j ∈ ni ∧ (j, _, _) ∉ Mi ∧ (j, f) ∈ Fi}
13     E = {i ↦ est(vi, F)} ∪
14         {j ↦ e | j ∈ ni ∧ (j, _, e) ∈ Mi} ∪
15         {j ↦ est(vi, Fi) | j ∈ ni ∧ (j, _, _) ∉ Mi}
16     a = (∑{e | (_, e) ∈ E}) / |E|
17     F'i = {j ↦ f + a − E(j) | (j, f) ∈ F}
18 estimation function:
19   est(v, F) = v − ∑{f | (_, f) ∈ F}

Algorithm 1: Flow Updating algorithm for dynamic networks in the synchronous network model.

The algorithm, presented in Algorithm 1, makes use of inputs, which are values that can change due to external factors, whose current value can be read at any round, but are not updated by the algorithm itself. The two inputs of each node i are the value to aggregate (vi) and the current set of neighbors (ni) as given by the failure detector.

The state of each node i consists of a mapping Fi from node ids to flows; it stores, for each current neighbor, the flow along the edge to that node.

The message-generation function takes the state (the flows) and a neighbor id, and returns the message to be sent to that node. Every round, all nodes send messages to all of their neighbors. A single type of message is sent, containing the self id i, the flow Fi(j) to the respective neighbor j, and the aggregate estimate (line 7). When no flow value is available for a given neighbor, initially or when a new node starts participating, the value 0 is used. We assume for notational convenience that applying a mapping M to a non-mapped key k yields 0, i.e., M(k) = 0. The estimate is computed by making use of the estimation function (line 19), a function of the input value and the flows (Equation 1).

The state-transition function (lines 9–17) takes a state Fi and the set of messages Mi received by the node in the round, and returns a new state F'i. We make use of some auxiliary variables to compute the new state: the flows F updated according to the messages received, and the estimates E (last received) used to compute the new average a. In more detail:

• F is a mapping from current neighbor ids to: the symmetric of the flow in messages, for those neighbors whose messages arrived (line 11); the current value, if any, in the case of message loss (line 12).

• E is a mapping from node ids to estimates: for the self node i, according to the estimation function, using the newly updated flows in F (line 13); for neighbors whose messages arrived, the estimate sent (line 14); otherwise, the estimate according to the estimation function, using the flows at the beginning of the round (this is the estimate sent to all neighbors at the beginning of the round) (line 15).

• a is simply the average of the estimates in the mapping E, and represents the new estimate towards which the node will lead its neighbors to converge in the next round (line 16).

Finally, the new mapping F'i is calculated by adjusting each flow in F so that the estimates move towards a: the estimate for node i at the beginning of the next round will be a; each neighbor would compute a as its estimate at the end of the next round if it did not receive other messages from its own neighbors (e.g., if it has node i as its only neighbor).
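This round structure translates almost directly into code. The following Python sketch (our transcription of Algorithm 1; the function and variable names are ours) implements the estimation, message-generation and state-transition functions for one node. Here received maps each neighbor id to the (flow, estimate) pair carried by its message, with absent keys modeling lost messages.

    def est(v, F):
        # Estimation function (line 19): input value minus outgoing flows.
        return v - sum(F.values())

    def msg(i, v, F, j):
        # Message-generation function (line 7): sender id, flow to j, estimate.
        return (i, F.get(j, 0.0), est(v, F))

    def trans(i, v, neighbors, F, received):
        # State-transition function (lines 9-17).
        # Flows: negate the flow carried by arrived messages (line 11);
        # keep the current value on message loss (line 12).
        newF = {}
        for j in neighbors:
            if j in received:
                newF[j] = -received[j][0]
            elif j in F:
                newF[j] = F[j]
        # Estimates: own, from the updated flows (line 13); the received
        # ones (line 14); or the estimate sent at the round start (line 15).
        E = {i: est(v, newF)}
        for j in neighbors:
            E[j] = received[j][1] if j in received else est(v, F)
        a = sum(E.values()) / len(E)  # new average to converge to (line 16)
        # Adjust each flow so the estimates move towards a (line 17).
        return {j: f + a - E[j] for j, f in newF.items()}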

3.2. Algorithm – Asynchronous Version

The first description of Flow Updating used the synchronous network model. Such a model is useful to reason only in terms of the number of communication rounds (while taking concurrent execution into account), and is therefore useful to perform a quantitative evaluation, namely to compare the performance of different algorithms without having to make assumptions about latencies. Real distributed systems, however, give weaker timing guarantees, and the issue of how to practically implement Flow Updating arises.

As we are talking about a fault-tolerant algorithm, namely one that works under message loss, the classic technique of using a synchronizer [28] to transform a synchronous algorithm into an asynchronous one cannot be applied. Moreover, even though it is advantageous (as we will see in the evaluation) to compute an average after having collected contributions from many neighbors, lockstep computation as in the synchronous version is not an inherent requirement of Flow Updating.

These considerations lead us to present Flow Updating for the asynchronous network model (see, e.g., Chapters 8 and 14 of [27]) directly in two variants. The first one is the more natural in the asynchronous setting, in which the algorithm reacts to each message, performing a pairwise averaging of two values; the second mimics the spirit of the synchronous one, and tries to collect messages from all neighbors before performing an averaging step.

The pairwise asynchronous version of Flow Updating is presented in Algorithm 2, while the collect-all version is presented in Algorithm 3. The algorithms have inputs, which are not only read but can change arbitrarily and continuously due to external factors; state variables, which are manipulated by the algorithm; and events to which the algorithms react, namely: initi when node i starts computing, receivej,i representing node i receiving a message sent by node j, and ticki representing a clock tick at node i. A node i reacts to an event by changing local state and possibly sending messages to neighbors, through sendi,j.


1  inputs:
2    vi, value to aggregate
3    ni, set of neighbors given by failure detector
4    ti, timeout value (in number of ticks)
5  state:
6    Fi, flows: initially, Fi = {}
7    Ei, neighbors estimates: initially, Ei = {}
8    Ci, ticks since last averaging, for each neighbor: initially, Ci = {}
9  on initi()
10   foreach j ∈ ni do
11     sendi,j(0, vi)
12 on receivej,i(f, e)
13   Ei(j) := e
14   Fi(j) := −f
15   averageAndSend(j)
16 on ticki()
17   foreach j ∈ ni do
18     Ci(j) := Ci(j) + 1
19     if Ci(j) ≥ ti then averageAndSend(j)
20
21 procedure averageAndSend(j)
22   e = vi − ∑j∈ni Fi(j)
23   a = (Ei(j) + e)/2
24   Fi(j) := Fi(j) + a − Ei(j)
25   Ei(j) := a
26   Ci(j) := 0
27   sendi,j(Fi(j), a)

Algorithm 2: Flow Updating algorithm for dynamic networks in the asynchronous network model, pairwise version.

Common behavior upon receive and tick events is factored out in the averageAndSend procedure. As before, we assume for notational convenience that applying a mapping M to a non-mapped key k yields 0, i.e., M(k) = 0. We also update individual keys through a simple assignment, i.e., M(k) := x, regardless of whether they were already mapped.
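In a concrete implementation this convention is easy to reproduce; for instance, in Python (our illustration, not part of the paper) a defaultdict gives exactly this behavior:

    from collections import defaultdict

    F = defaultdict(float)  # any non-mapped key yields 0.0, as assumed above
    print(F["j3"])          # 0.0, even though "j3" was never assigned
    F["j3"] = -1.5          # assignment works whether or not the key was mapped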

In both versions each node starts by sending a message to each neighbor. In the pairwise version, the basic behavior, if no failures occurred and the timeout mechanism did not exist, is for each node to "reply" back with a new flow and estimate, after averaging, as soon as it receives new knowledge from a given node. The idea is that this lets communication along different links adapt to the respective link latency, with messages bouncing back and forth at different rates for different links, without forcing a global lockstep. The second version, by making each node wait for all neighbors before averaging, would behave as the synchronous algorithm under no message failures and no timeout mechanism, i.e., it would force a global lockstep.

1  inputs:
2    vi, value to aggregate
3    ni, set of neighbors given by failure detector
4    ti, timeout value (in number of ticks)
5  state:
6    Fi, flows: initially, Fi = {}
7    Ei, neighbors estimates: initially, Ei = {}
8    Mi, neighbor ids of messages received since last averaging: initially, Mi = {}
9    ci, ticks since last averaging: initially, ci = 0
10 on initi()
11   foreach j ∈ ni do
12     sendi,j(0, vi)
13 on receivej,i(f, e)
14   Ei(j) := e
15   Fi(j) := −f
16   Mi := Mi ∪ {j}
17   if Mi ⊇ ni then averageAndSend()
18
19 on ticki()
20   ci := ci + 1
21   if ci ≥ ti then averageAndSend()
22
23 procedure averageAndSend()
24   e = vi − ∑j∈ni Fi(j)
25   a = (e + ∑j∈ni Ei(j))/(|ni| + 1)
26   foreach j ∈ ni do
27     Fi(j) := Fi(j) + a − Ei(j)
28     Ei(j) := a
29     sendi,j(Fi(j), a)
30   Mi := {}
31   ci := 0

Algorithm 3: Flow Updating algorithm for dynamic networks in the asynchronous network model, collect-all version.

In both versions, tolerance to message loss is obtained through a "timeout" mechanism, which forces the averaging step and message sending if a given number of local clock ticks have occurred since the last averaging step. In the asynchronous model this says nothing about real time, as a tick is just an event of which we only know that it will occur infinitely often in an infinite trace. We have, however, presented the algorithms this way (instead of using a more vague "periodically") so that they can be more directly translated to practical implementations, where a tick will correspond to the physical clock ticking of the computing node.

An interesting difference between the two versions is that the pairwise version uses a per-link timeout (as opposed to the single timeout of the other version). This is because otherwise, if each of two linked nodes a and b kept communicating successfully with all their neighbors except b and a respectively, we could have an infinite trace where a single timeout variable was constantly being reset by the successful communications. This would imply that messages would stop being exchanged between a and b after the loss of the initial messages.

Only one flow and estimate is kept per neighbor. In particular, in the collect-all version, if between averaging steps a newer message arrives from a given neighbor, it will overwrite the previously stored values. This means that the values used in the averaging may not be the last ones sent, in the general case of non-FIFO channels; this can also occur in the pairwise version. This is not a problem, and no effort was made to enforce FIFO semantics through sequence numbers.
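To make the event-driven structure concrete, the following is a minimal Python sketch of the pairwise version (our transcription of Algorithm 2; the class name and the send callback are invented, with send standing in for an unreliable messaging layer).

    from collections import defaultdict

    class PairwiseFUNode:
        def __init__(self, v, neighbors, timeout, send):
            self.v = v                    # input value to aggregate
            self.n = set(neighbors)       # neighbor set (from the FD)
            self.t = timeout              # timeout, in ticks, per link
            self.send = send              # send(j, flow, estimate) callback
            self.F = defaultdict(float)   # flows; non-mapped keys yield 0
            self.E = defaultdict(float)   # last known neighbor estimates
            self.C = defaultdict(int)     # ticks since last averaging, per link

        def init(self):                   # on init_i
            for j in self.n:
                self.send(j, 0.0, self.v)

        def receive(self, j, f, e):       # on receive_{j,i}(f, e)
            self.E[j] = e
            self.F[j] = -f
            self.average_and_send(j)

        def tick(self):                   # on tick_i
            for j in self.n:
                self.C[j] += 1
                if self.C[j] >= self.t:   # per-link timeout fired
                    self.average_and_send(j)

        def average_and_send(self, j):
            e = self.v - sum(self.F[k] for k in self.n)  # own estimate, Eq. (1)
            a = (self.E[j] + e) / 2                      # pairwise average
            self.F[j] += a - self.E[j]                   # lead j towards a
            self.E[j] = a
            self.C[j] = 0
            self.send(j, self.F[j], a)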

4. Evaluation

In this section, we provide experimental results to evaluate Flow Updating under demanding faulty scenarios, with both churn and message loss. We also compare it against existing averaging-based techniques, and take into consideration some practical concerns, like the use of realistic failure detectors and asynchrony, to assess its implementation in real environments.

For this purpose, we use a custom discrete event simulator which allows evaluating both synchronous and asynchronous algorithms, computing the count aggregation function (determination of network size). The synchronous algorithm is used everywhere, except in the section devoted to asynchrony. The count aggregation function is used in all simulations, unless stated otherwise. Convergence speed depends on the initial data distribution across the network; count represents an extreme scenario where only one node starts with the value 1 and all others with 0. We chose this aggregation function for evaluation because it is the one with the worst performance. The algorithm will perform better when computing an average of uniformly distributed input values. Sum will have the same performance as count, as it is computed by combining average and count.

We consider two different network topologies: random (where all nodes are randomly linked to each other, according to a predefined degree d), and 2D/mesh (geographical networks, with random uniform node placement, where communication links are established according to a predefined radio communication range, an approximation to the topologies occurring in WSN). The results for each scenario are drawn from 30 trials of the execution of the algorithms under identical settings. In each trial, different randomly generated networks with the same characteristics (topology, size and average degree) are used.

The main metric used in most simulation scenarios is the CV(RMSE) (Coefficient of Variation of the Root Mean Square Error), which expresses the global accuracy reached by an algorithm: the root of the mean squared differences between the estimate ei at each node i and the correct result a, divided by the correct result, i.e., (1/a) √((1/n) ∑i=1..n (ei − a)²). This metric allows the analysis of the speed and message load of the tested algorithms, when combined with the proper criteria, respectively: time (or number of rounds) and number of messages sent (by each node). Message load can be interpreted as an approximation to energy expenditure in WSN, as message transmission is often the dominating factor in terms of energy consumption (as referred in [29], the energy consumed to transmit a single bit corresponds roughly to that required to execute thousands of instructions).
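For reference, the metric is straightforward to compute; a small Python helper (ours, following the definition above, with made-up sample values):

    import math

    def cv_rmse(estimates, a):
        # Root of the mean squared error of the estimates against the
        # correct result a, normalized by a.
        n = len(estimates)
        return math.sqrt(sum((e - a) ** 2 for e in estimates) / n) / a

    print(cv_rmse([990.0, 1005.0, 1010.0], 1000.0))  # ~0.0087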


4.1. Performance Comparison Under No Faults

Here, Flow Updating (FU) is compared to three significant distributed aggregation algorithms from the same class (i.e., averaging): Push-Sum Protocol (PSP) [18], Push-Pull Gossiping (PPG) [7], and Distributed Random Grouping (DRG) [19]. This evaluation is performed under strictly identical simulation settings (same networks and initial distribution of input values), aiming for a fair comparison. In addition, the specific parameters of each algorithm were tuned to grant them the best performance in each simulated scenario (e.g., the probability of becoming leader in DRG).

A comparison with the recent Limosense [23] approach was attempted but was not viable, since simulations with concurrent executions quickly lead to runs where the algorithm crashes due to divisions by 0. The root cause in the algorithm formulation is probably quite addressable, but it precluded a direct comparison here. However, since Limosense inherits the convergence behavior of PSP, our comparison with that protocol provides a suitable reference in the no-faults scenario.

The comparison was performed on fixed and reliable network topologies (i.e., without dynamism or faults) with the same size n = 1000, but two different average connection degrees, i.e., d ≈ 3 and d ≈ 10. The results are depicted by Figures 1 and 2 for random networks, and Figures 3 and 4 for 2D/mesh. The first feature observed in all results is that PPG does not converge over time (even without faults). This issue was already reported and more details can be found in [21].

On random networks with a low connection degree (i.e., d ≈ 3), FU clearly outperforms the other compared algorithms, both in terms of convergence speed and message load. However, a degradation of the performance of FU is observed in networks with a higher connection degree (i.e., d ≈ 10), unlike the other compared algorithms, which exhibit the opposite behavior (i.e., better performance for the higher connection degree). Nonetheless, a distinct behavior is perceived on 2D/mesh networks, where the performance degradation of FU for the higher connection degree is no longer verified. In fact, the performance of FU increases for d ≈ 10. In this type of network (which more closely corresponds to WSN), FU considerably outperforms the other techniques.

It was also observed in [24] that in large hypercube topologies FU exhibits worse performance than PSP. In this type of topology, and in other networks with a high connection degree, it is possible to improve the performance of FU by applying simple heuristics that use a subset of the available neighbors for the aggregation process (i.e., ignoring some links), as shown in [30]. A detailed study of the performance issues in these scenarios, and of heuristics for their improvement, is left for future work.

4.2. Churn and Message Loss

We now consider the count aggregate computation in dynamic settings. Computing the count aggregate is particularly demanding, and useful, in networks where the number of nodes is actively changing. First, we will consider this task in the absence of message loss, and later introduce that additional perturbation.

All networks considered in every churn scenario start with the same size (n = 1000) and the same approximate average connection degree d ≈ log n (where log is the natural logarithm). The choice of d was influenced by [31], where it is stated that some nodes must have a degree Ω(log n) in order to keep the network connected with constant probability, considering that all nodes fail with a probability of 0.5. In general, the value used was enough to avoid network partitioning for the simulated churn scenarios (e.g., failure of one quarter of the nodes).

Figure 1: Comparison of averaging algorithms, on random networks with n = 1000 and d ≈ 3. (a) Convergence speed: CV(RMSE) vs. rounds; (b) message load: CV(RMSE) vs. messages sent (by each node). [plots omitted]

Figure 2: Comparison of averaging algorithms, on random networks with n = 1000 and d ≈ 10. (a) Convergence speed; (b) message load. [plots omitted]

We start by considering a random network scenario, when subject to both drastic and continuous changes of the network membership. For this purpose, we successively applied a sudden departure (catastrophic crash) and arrival of 25% of the initial nodes, followed by an arrival and departure of the same portion of nodes at a constant rate (10 nodes per round). For the sake of clarity, a stability period of 50 rounds is introduced between each network change.

First, we compare Flow Updating (FU) with Push-Pull Gossiping [7] (PPG) and Push-Pull Ordered Wait [21] (PPOW is a fix of PPG that solves its atomicity problems), in the described random dynamic network scenario without message loss. PPG implements a restart mechanism to cope with churn, starting a new instance of the algorithm after a predefined number of rounds (epoch), and prevents new nodes from participating in the current epoch. Similarly to PPG, PPOW was extended with a restart mechanism, but instead of delaying the participation of new nodes to the next epoch, joining nodes are allowed to participate immediately. This modification was applied since it yielded more favorable results for PPOW in all performed experiments.

Figure 3: Comparison of averaging algorithms, on 2D/mesh networks with n = 1000 and d ≈ 3. (a) Convergence speed; (b) message load. [plots omitted]

Figure 4: Comparison of averaging algorithms, on 2D/mesh networks with n = 1000 and d ≈ 10. (a) Convergence speed; (b) message load. [plots omitted]

Figure 5 shows the results obtained, using an epoch length of 50 rounds. We can observe that an overestimate is produced by PPG due to its atomicity problems, even without network changes (e.g., between rounds 0 and 50), which is solved by PPOW, which converges to the expected value. More importantly, these results expose the effect of the restart mechanism, which introduces an undesirable delay in the response to network change. This delay is observed even if only the estimate at the end of each epoch is considered valid (points at the end of each PPG and PPOW epoch, every 50 rounds). In the particular case of PPG, the delay is present for both node departure and arrival. However, in PPOW the response time to changes is reduced in the case of node arrival, by allowing joining nodes to immediately participate in the current epoch.

The restart mechanism introduces a trade-off between the response time to network change and the accuracy of the push-pull algorithms, preventing them from following the network change with high accuracy. In contrast, FU is able to closely follow the network changes without requiring any restart mechanism. FU clearly outperforms the other approaches (PPG and PPOW), which are unable to adapt to network changes. For this reason, the remainder of the evaluation focuses exclusively on Flow Updating.

Figure 5: Comparison of Flow Updating (FU), Push-Pull Gossiping (PPG) and Push-Pull Ordered Wait (PPOW): average of estimates in a dynamic random network, with no message loss. [plot omitted]

Figure 6: Average of estimates in a dynamic random network with message loss. [plot omitted]

We now evaluate the behavior of FU on the same dynamic random network. But besides churn, we also consider that each individual message can be lost according to a given probability. Figure 6 shows more clearly that the mean of the estimates produced by FU closely follows the network changes.

Figure 7 shows the CV(RMSE) over time, allowing the observation of the global accuracy variation due to network dynamism. This metric compares each individual estimate against the actual network size, as perceived by an external observer that can inspect the whole network in 0 rounds. This is a very demanding metric, since in any actual distributed algorithm nodes would incur a delay proportional to the network diameter (in rounds) before knowing the network size.

Figure 7: Coefficient of variation of the RMSE in a dynamic random network with message loss. [plot omitted]

From Figures 6 and 7, we observe that even considerable message loss (20% and 40%) only slightly affects convergence speed and the ability of the algorithm to cope with churn. Curiously, in some situations the algorithm even benefits from message loss, increasing its convergence speed (e.g., 20% loss in rounds 175 to 225 being better than no loss). We found out that it is possible to increase convergence speed by "deactivating" some communication links [30]. This deactivation also provides a considerable reduction in the number of messages required to reach a given accuracy level. In some cases, message loss reproduces this effect.

The results confirm the fast convergence of the algorithm during stability periods, and show the expected accuracy decreases (increases of the CV(RMSE)) resulting from network changes. Abrupt changes lead to momentary perturbations which are rapidly reduced, while continuous changes provoke an accuracy reduction that persists during the continuous churn period. In this particular case, for the considered churn rate (10 nodes per round), the arrival of nodes will increase the global error from less than 0.01% to about 3.5%, and node departures will increase it from less than 0.01% to about 50%. Node departures (or crashes) induce higher perturbations than node arrivals; in both cases, the higher the number of nodes involved, the bigger the impact on node estimation accuracy. The effect of churn on each node estimate is clearly depicted by Figure 8, which shows the distribution of individual estimates over time, considering 20% of message loss (the distribution of node estimates in a scenario without message loss is very similar to the 20% case).

Figure 8: Estimates distribution in a dynamic random network with 20% of message loss. [plot omitted]

Figure 9: Average of estimates in a dynamic 2D/mesh network with message loss. [plot omitted]

The previous simulation scenarios are now applied using 2D/mesh network topologies. Since the convergence speed of this kind of network is much slower, a bigger stability period (500 rounds) and a slower churn rate (1 node per round) were considered. Results are presented in Figures 9 and 10. In these settings, the behavior of Flow Updating is similar to the one previously described for random networks, although a deeper contrast between the effects of node arrival and departure is observed.

Figure 10: Coefficient of variation of the RMSE in a dynamic 2D/mesh network with message loss. [plot omitted]

Namely, the perturbation introduced by a sudden (round 1000) or continuous (rounds 1500 to 1750) arrival of nodes is very small. On the contrary, node departure/crash has a greater impact in this kind of network.

Node departure/crash breaks the flows established between nodes, and can result in the removal of links connecting different clusters, breaking the equilibrium of the whole network. This may lead to a global rearrangement of flows across the network, in order to reach a new equilibrium state. On the other hand, new nodes will simply provide new links (alternative paths), without breaking existing ones, leading to a smaller adjustment of the existing flows in order to converge to the new aggregate.

4.3. Failure Detection

Failure Detectors (FD) are oracles providing information about whether processes have failed [26]; however, they do not necessarily provide correct information. Two main types of mistakes may occur: incorrect suspicions, when the FD incorrectly suspects a correct process; and non-suspicions, when a faulty process is not suspected by the FD. Here, the impact of realistic unreliable FDs on the execution of FU in dynamic settings is evaluated.

Practical implementations of FD are commonly timeout-based [32]. Therefore, a simple timeout-based implementation was considered, marking a node as suspected if no message is received from it after a predefined timeout value. The evaluation was carried out using the same succession of churn events as in the previous simulations (i.e., sudden departure/arrival of a large number of nodes, and continuous arrival/departure of a small number of nodes at a constant rate), on random networks with the same settings (i.e., n = 1000 and d ≈ log n). Several FDs with different timeout values were compared, ranging from 1 round (aggressive FD) to 4 rounds (conservative FD), and including a perfect FD as a baseline. Three message loss scenarios were evaluated: no loss, 10% and 20% message loss. Figure 11 shows how the performance of FU is affected by FDs with different timeout values, when subjected to churn and message loss.
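A minimal sketch of such a timeout-based detector (our illustration; the paper does not prescribe a particular implementation) could look as follows in Python:

    class TimeoutFD:
        # Suspect a neighbor once `timeout` consecutive rounds pass
        # without receiving any message from it.
        def __init__(self, neighbors, timeout):
            self.timeout = timeout
            self.silent = {j: 0 for j in neighbors}  # consecutive silent rounds

        def end_of_round(self, heard):
            # `heard`: the set of neighbors whose messages arrived this round.
            for j in self.silent:
                self.silent[j] = 0 if j in heard else self.silent[j] + 1

        def neighbors_alive(self):
            # The set n_i handed to FU: all currently non-suspected neighbors.
            return {j for j, s in self.silent.items() if s < self.timeout}

With timeout = 1, a single lost message is enough to (incorrectly) suspect a node; larger timeouts require proportionally longer silences, matching the behavior discussed below.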

Each FD takes timeout rounds to detect the departure/crash of a node, never suspecting the leaving node during that time. Therefore, only after timeout rounds will FU be informed of the departure/crash of nodes, incurring a delay proportional to the FD timeout to react to departures/crashes, as shown by Figures 11(a) and 11(b). Nonetheless, the impact of this delay is not very significant, and FU is still able to closely adapt to changes.

On the other hand, message loss can significantly impact the performance of FU when using aggressive FDs. Message loss may cause the incorrect suspicion of some nodes, making FU remove the flows of a correct process and leading the whole system to start converging to a new (incorrect) average. Upon the reception of a message from an incorrectly suspected node, its flow will be immediately restored, and the convergence will be back on track towards the correct result. However, since message loss occurs continuously over time, this situation will also occur recurrently, especially with aggressive FDs (i.e., with a small timeout), introducing a constant perturbation on the execution of FU and impairing its convergence towards the correct value.

As shown by Figures 11(c)–11(d) and 11(e)–11(f), the higher the amount of message loss, the higher the impact on FU, especially when using a FD with a small timeout. This is because FDs with small timeout values require only a few consecutive message losses to incorrectly mark a node as suspected (e.g., a single message loss is enough for the FD with timeout 1), while FDs with larger timeouts require a proportional number of consecutive message losses before incorrectly suspecting a node, which is less likely to happen for moderate message loss rates. Therefore, it is more appropriate to use conservative FD timeouts.

The results obtained show that the selection of an appropriate FD timeout is fundamental to ensure good performance and accuracy of FU. It is important to use a practical FD that minimizes the number of incorrect suspicions, in order to avoid an undesired performance degradation of FU. Despite the additional delay introduced by a conservative FD to react to network changes (i.e., node departure/crash), this kind of FD should therefore be preferred. This recommendation is valid for any network topology, as fault detection is expected to affect FU in the same way.

4.4. Input Value Change

Here, the behavior of FU is experimentally evaluated when subjected to dynamic changes of the input values at the network nodes. For that purpose, a simple dynamic input value change scenario was defined, to compute the network average. Initially, each node starts with an input value chosen uniformly at random between 25 and 35; after 50 rounds, 50% of the nodes (randomly chosen) start increasing their input value by 5% each round, during 50 rounds; finally, they reduce their value by the same amount during another 50 rounds. These simulation settings intend to represent a possible variation of the temperature sensed by some nodes in a hypothetical monitoring environment. The execution of FU was compared considering different amounts of message loss (i.e., 0%, 20% and 40%), over random networks (n = 1000 and d ≈ 3).

Figure 11: Effect of FD on the execution of FU in dynamic settings with message loss, on random networks (n = 1000, d ≈ log n). (a) Average estimate (no loss); (b) convergence speed (no loss); (c) average estimate (10% loss); (d) convergence speed (10% loss); (e) average estimate (20% loss); (f) convergence speed (20% loss). [plots omitted]

Figure 12: Average of estimates in a random network with input value changes and message loss. [plot omitted]

Figure 13: Estimates distribution in a random network with input value changes and 20% message loss. [plot omitted]

Figure 12 shows that the average of the estimates produced by all nodes closely follows the change of the global average (with a small delay), even under considerable message loss. A more precise view of the estimates of all nodes over time is given by Figure 13, for a simulation with 20% message loss. The results confirm that the estimates at all nodes closely follow the input changes, and that the difference between node estimates is small. The graphs for the scenarios with 0% and 40% message loss (not shown) are very similar: only a slight variation in the difference between the estimates can be observed, being even smaller in the scenario without loss and a bit bigger with 40% message loss.

4.5. Asynchrony

We now evaluate the asynchronous version of FU, when used in realistic settings. The algorithm was described for asynchronous networks, working under all timing assumptions. For evaluation purposes, given that there is no need for a global clock, that processing time is negligible, and assuming that local clock drift will also be negligible, we focus on the effect of variable latency in communication, assuming that a practical implementation will have the timeout variable reflecting "elapsed real time".

The evaluation aims to answer two questions: 1) which of the two asynchronous versions of FU is preferable in practice; and 2) how the timeout should be chosen, for a given latency distribution. As before, the criteria used are convergence speed and number of messages sent.

The transmission time of each message was chosen according to a fixed probability distribution, with no attempt to distinguish different links. In particular, a rough approximation to the distribution of message latencies observed in PlanetLab [33] was defined, according to the RTT (Round Trip Time) measurements presented in [34]. Inspired by [35], we approximated transmission times with the sum of two components: a queuing delay given by a Weibull distribution, and a minimum transmission delay. More precisely, a Weibull with shape s = 2 and scale r = 45 was used to generate the queuing delays, and a minimum transmission delay of 50 ms was added, resulting in a distribution with an average of 89 ms and with most of the transmission times below 150 ms, as presented in Figure 14.
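Under the stated parameters, the latency model can be sampled directly; for instance, in Python (a sketch of our reading of the model, not the authors' simulator code):

    import random

    def transmission_time_ms():
        # Queuing delay ~ Weibull(scale=45, shape=2), plus the fixed
        # 50 ms minimum transmission delay described above.
        return 50.0 + random.weibullvariate(45.0, 2.0)

    samples = [transmission_time_ms() for _ in range(100000)]
    print(sum(samples) / len(samples))  # close to the reported 89 ms average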

Timeout values of 25, 50, 100, 125, 150, and 300 ms were considered to evaluate the performance of both asynchronous versions of FU, under 20% message loss. The results are presented in Figures 15 and 16.

The first conclusion that can be reached is that very small timeouts are not worthwhile, because a small improvement in convergence speed is paid for by a significant increase in the number of messages transmitted.

Figure 14: Distribution of message latencies (fraction vs. message transmission time in ms). [plot omitted]

Comparing the two asynchronous versions, we can see that the version that mimics the synchronous one and waits for messages from all neighbors is preferable. The pairwise version, even though slightly faster, pays a high price in messages transmitted: it can be around 30% faster, but at the cost of sending from 3 to more than 10 times as many messages.

Considering the choice of timeout, we can see that sensible choices are values above the average message latency, in a high percentile of the distribution, and that there is a trade-off between convergence speed and messages transmitted; the appropriate choice depends on the objective pursued.

5. Conclusions

Average-based approaches constitute an important segment of aggregation algorithms, due to their independence from network topology and convergence to any desired precision. Our previous work on averaging by Flow Updating already introduced fault tolerance and dynamism: in static settings, achieving up to an order of magnitude improvement in convergence speed without increasing the message load [9]; and self-adapting to network changes even with high levels of message loss [10].

Figure 15: Execution of the asynchronous pairwise version on random networks with 20% of message loss. (a) Convergence speed; (b) message load. [plots omitted]

Here we have considered relevant practical concerns, like the use of unreliable failure detectors, and have introduced and evaluated an asynchronous version, showing that Flow Updating can be effectively applied in real environments. We have also brought attention to vulnerabilities of popular averaging techniques when faced with failures and dynamic environments. It is our belief that these shortcomings are not easy to fix, and that "mass exchange" must give way to idempotent flow management in order to address these demanding scenarios.

Figure 16: Execution of the asynchronous collect-all version on random networks with 20% message loss. (a) Convergence speed: CV(RMSE) versus time (ms), for timeouts t = 25, 50, 100, 125, 150, and 300 ms. (b) Message load: CV(RMSE) versus messages successfully sent by each node, for the same timeouts.

Flow Updating uses a simple strategy to support dynamism, where entries for neighbor nodes are added or removed according to the current participants (given by a realistic failure detector implementation). This simple design adapts to abrupt changes of network membership and tracks continuous variations of network size.
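To make this concrete, here is a minimal sketch of that neighbor-maintenance step, assuming the failure detector periodically reports the set of currently live neighbors; the class and method names are ours, and the flow-averaging exchange itself is elided:

```python
class FlowUpdatingState:
    """Per-node state: the input value and one flow entry per neighbor."""

    def __init__(self, input_value):
        self.value = input_value
        self.flows = {}  # neighbor id -> flow contributed towards it

    def on_membership_change(self, live_neighbors):
        # Drop entries for departed or suspected neighbors: forgetting
        # a flow is idempotent and restores conservation of global mass.
        for n in list(self.flows):
            if n not in live_neighbors:
                del self.flows[n]
        # Start tracking newly arrived neighbors with a zero flow.
        for n in live_neighbors:
            self.flows.setdefault(n, 0.0)

    def estimate(self):
        # Local estimate: input value minus the sum of outgoing flows.
        return self.value - sum(self.flows.values())
```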

Evaluation showed that Flow Updating clearly outperforms previous strategies; unlike most, it adapts in a continuous fashion without requiring protocol restarts. It allows all nodes to continuously adjust their estimates according to network changes (i.e., churn) and input value changes, quickly converging to the current network average, even with very high levels of message loss.

Finally, we have shown that Flow Updating can be successfully applied in practice, relying on realistic failure detector implementations and using a simple timeout strategy to operate in asynchronous settings with unbounded communication latency. In particular, failure detectors that reduce the number of incorrect suspicions (i.e., conservative timeout-based implementations) should be preferred. In asynchronous settings, better performance is achieved by an algorithm that mimics the synchronous one, collecting and averaging values from all neighbors (as opposed to reacting to individual messages), and using timeout values in a high percentile of the underlying message latency distribution.
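The control flow of that collect-all variant can be sketched as follows; this is a simplified round loop under our own assumptions, where `inbox.poll`, `update_flows_and_estimate`, and `broadcast_state` are hypothetical helpers and the averaging step itself is elided:

```python
import time

def collect_all_round(node, timeout_s):
    """One round: wait for a message from every neighbor, or until the
    timeout fires, then average over whatever arrived."""
    deadline = time.monotonic() + timeout_s
    received = {}
    while len(received) < len(node.neighbors):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: proceed with what was received
        msg = node.inbox.poll(timeout=remaining)  # blocking receive
        if msg is not None:
            received[msg.sender] = msg
    node.update_flows_and_estimate(received)  # flow averaging, elided
    node.broadcast_state()  # send updated flows/estimates to neighbors
```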

Acknowledgment

This work was partially funded by FCT PhD grant SFRH/BD/33232/2007 and by project Norte-01-0124-FEDER-000058, co-financed by the North Portugal Regional Operational Program (ON.2 O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF).

References

[1] R. V. Renesse, The importance of aggregation, Future Directions in Distributed Computing 2584 (2003) 87–92.
[2] M. Jelasity, A. Montresor, Epidemic-style proactive aggregation in large overlay networks, in: Proc. 24th International Conference on Distributed Computing Systems, 2004, pp. 102–109.
[3] S. Madden, M. Franklin, J. Hellerstein, W. Hong, TAG: a Tiny AGgregation service for ad-hoc sensor networks, ACM SIGOPS Operating Systems Review 36 (SI) (2002) 131–146.
[4] J. Li, K. Sollins, D. Lim, Implementing aggregation and broadcast over distributed hash tables, ACM SIGCOMM Computer Communication Review 35 (1) (2005) 81–92.
[5] A. Ganesh, A. Kermarrec, E. L. Merrer, L. Massoulie, Peer counting and sampling in overlay networks based on random walks, Distributed Computing 20 (4) (2007) 267–278.
[6] D. Kostoulas, D. Psaltoulis, I. Gupta, K. Birman, A. Demers, Decentralized schemes for size estimation in large and dynamic groups, in: Proc. 4th IEEE International Symposium on Network Computing and Applications, 2005, pp. 41–48.
[7] M. Jelasity, A. Montresor, O. Babaoglu, Gossip-based aggregation in large dynamic networks, ACM Transactions on Computer Systems (TOCS) 23 (3) (2005) 219–252.
[8] O. Kennedy, C. Koch, A. Demers, Dynamic approaches to in-network aggregation, in: Proc. 25th IEEE International Conference on Data Engineering (ICDE), 2009, pp. 1331–1334.
[9] P. Jesus, C. Baquero, P. S. Almeida, Fault-tolerant aggregation by flow updating, in: Proc. 9th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Vol. 5523 of Lecture Notes in Computer Science, Springer, Lisbon, Portugal, 2009, pp. 73–86.
[10] P. Jesus, C. Baquero, P. S. Almeida, Fault-tolerant aggregation for dynamic networks, in: Proc. 29th IEEE Symposium on Reliable Distributed Systems, 2010, pp. 37–43.
[11] S. Madden, R. Szewczyk, M. Franklin, D. Culler, Supporting aggregate queries over ad-hoc wireless sensor networks, in: Proc. 4th IEEE Workshop on Mobile Computing Systems and Applications, 2002, pp. 49–58.
[12] Y. Birk, I. Keidar, L. Liss, A. Schuster, R. Wolff, Veracity radius: capturing the locality of distributed computations, in: Proc. 25th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2006, pp. 102–111.
[13] Y. Sun, H. Luo, S. K. Das, A trust-based framework for fault-tolerant data aggregation in wireless multimedia sensor networks, IEEE Transactions on Dependable and Secure Computing 9 (6) (2012) 785–797. doi:10.1109/TDSC.2012.68.
[14] C. Baquero, P. Almeida, R. Menezes, P. Jesus, Extrema propagation: Fast distributed estimation of sums and network sizes, IEEE Transactions on Parallel and Distributed Systems 23 (4) (2012) 668–675. doi:10.1109/TPDS.2011.209.
[15] D. Mosk-Aoyama, D. Shah, Computing separable functions via gossip, in: Proc. 25th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2006, pp. 113–122.
[16] L. Massoulie, E. Merrer, A.-M. Kermarrec, A. Ganesh, Peer counting and sampling in overlay networks: random walk methods, in: Proc. 25th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2006, pp. 123–132.
[17] S. Mane, S. Mopuru, K. Mehra, J. Srivastava, Network size estimation in a peer-to-peer network, Tech. rep., University of Minnesota, Department of Computer Science (Sep. 2005).
[18] D. Kempe, A. Dobra, J. Gehrke, Gossip-based computation of aggregate information, in: Proc. 44th Annual IEEE Symposium on Foundations of Computer Science, 2003, pp. 482–491.
[19] J.-Y. Chen, G. Pandurangan, D. Xu, Robust computation of aggregates in wireless sensor networks: Distributed randomized algorithms and analysis, IEEE Transactions on Parallel and Distributed Systems 17 (9) (2006) 987–1000.
[20] F. Wuhib, M. Dam, R. Stadler, A. Clemm, Robust monitoring of network-wide aggregates through gossiping, in: Proc. 10th IFIP/IEEE International Symposium on Integrated Network Management, 2007, pp. 226–235.
[21] P. Jesus, C. Baquero, P. S. Almeida, Dependability in aggregation by averaging, in: Proc. Simposio de Informatica (INForum), Lisboa, Portugal, 2009, pp. 457–470.
[22] P. S. Almeida, C. Baquero, M. Farach-Colton, P. Jesus, M. A. Mosteiro, Fault-tolerant aggregation: Flow-updating meets mass-distribution, in: A. F. Anta, G. Lipari, M. Roy (Eds.), OPODIS, Vol. 7109 of Lecture Notes in Computer Science, Springer, 2011, pp. 513–527.
[23] I. Eyal, I. Keidar, R. Rom, Limosense – live monitoring in dynamic sensor networks, in: Algorithms for Sensor Systems, Vol. 7111 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2012, pp. 72–85.
[24] W. N. Gansterer, G. Niederbrucker, H. Strakova, S. S. Grotthoff, Scalable and fault tolerant orthogonalization based on randomized distributed data aggregation, Journal of Computational Science 4 (6) (2013) 480–488. doi:10.1016/j.jocs.2013.01.006.
[25] P. Jesus, C. Baquero, P. S. Almeida, A survey of distributed data aggregation algorithms, IEEE Communications Surveys and Tutorials (PrePrint). doi:10.1109/COMST.2014.2354398.
[26] T. Chandra, S. Toueg, Unreliable failure detectors for reliable distributed systems, Journal of the ACM (JACM) 43 (2) (1996) 225–267.
[27] N. A. Lynch, Distributed Algorithms, Morgan Kaufmann Publishers Inc., 1996.
[28] B. Awerbuch, Complexity of network synchronization, Journal of the ACM 32 (4) (1985) 804–823.
[29] S. Aslam, F. Farooq, S. Sarwar, Power consumption in wireless sensor networks, in: Proc. 7th International Conference on Frontiers of Information Technology (FIT), Abbottabad, Pakistan, 2009, pp. 14:1–14:9.
[30] P. Jesus, C. Baquero, P. S. Almeida, Using less links to improve fault-tolerant aggregation, in: 4th Latin-American Symposium on Dependable Computing (LADC), Fast Abstract, 2009.
[31] M. F. Kaashoek, D. R. Karger, Koorde: A simple degree-optimal distributed hash table, in: Proc. 2nd International Workshop on Peer-to-Peer Systems (IPTPS), Vol. 2735 of Lecture Notes in Computer Science, Springer, 2003, pp. 98–107.
[32] M. Dixit, A. Casimiro, Adaptare-FD: A dependability-oriented adaptive failure detector, in: Proc. 29th IEEE Symposium on Reliable Distributed Systems, 2010, pp. 141–147.
[33] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, M. Bowman, PlanetLab: An overlay testbed for broad-coverage services, ACM SIGCOMM Computer Communication Review 33 (3) (2003) 3–12.
[34] L. Tang, Y. Chen, F. Li, H. Zhang, J. Li, Empirical study on the evolution of PlanetLab, in: Proc. 6th International Conference on Networking (ICN), IEEE, 2007, pp. 64–70.
[35] J. Hernandez, I. Phillips, Weibull mixture model to characterise end-to-end Internet delay at coarse time-scales, IEE Proceedings - Communications 153 (2) (2006) 295–304.


Author Biographies

Paulo Jesus is currently a Software Developer at Oracle in the MySQL Utilities team, and an Invited Researcher at HASLab / INESC TEC (Universidade do Minho). He received the BEng degree in systems and informatics in 2001, and the MSc degree in mobile systems in 2007, both from the University of Minho (Portugal). He obtained his PhD degree in 2012, from the MAP-i doctoral program in computer science by the Universities of Minho, Aveiro and Porto (Portugal). His research interests include distributed algorithms, fault tolerance, and mobile systems.

Carlos Baquero is currently Assistant Professor at the Computer Science Department in Universidade do Minho (Portugal). He obtained his MSc and PhD degrees from Universidade do Minho in 1994 and 2000. His research interests are focused on distributed systems, in particular in causality tracking, peer-to-peer systems and distributed data aggregation. Recent research is focused on highly dynamic distributed systems, both in internet P2P settings and in mobile and sensor networks.

Paulo Sérgio Almeida is currently Assistant Professor at the Computer Science Department in Universidade do Minho (Portugal). He obtained his MSc in Electrical Engineering and Computing from Universidade do Porto in 1994 and his PhD degree in Computer Science from Imperial College London in 1998. His research interests are focused on distributed systems, in particular distributed algorithms (namely aggregation) and logical clocks for causality tracking (with applications to optimistic replication).
