secure data aggregation technique for

1545-5971 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citationinformation: DOI 10.1109/TDSC.2014.2316816, IEEE Transactions on Dependable and Secure Computing

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 1

Secure Data Aggregation Technique forWireless Sensor Networks in the Presence of

Collusion AttacksMohsen Rezvani, Student Member, IEEE, Aleksandar Ignjatovic, Elisa Bertino, Fellow, IEEE

and Sanjay Jha, Senior Member, IEEE,

Abstract—Due to limited computational power and energy resources, aggregation of data from multiple sensor nodes done atthe aggregating node is usually accomplished by simple methods such as averaging. However such aggregation is known tobe highly vulnerable to node compromising attacks. Since WSN are usually unattended and without tamper resistant hardware,they are highly susceptible to such attacks. Thus, ascertaining trustworthiness of data and reputation of sensor nodes is crucialfor WSN. As the performance of very low power processors dramatically improves, future aggregator nodes will be capableof performing more sophisticated data aggregation algorithms, thus making WSN less vulnerable. Iterative filtering algorithmshold great promise for such a purpose. Such algorithms simultaneously aggregate data from multiple sources and provide trustassessment of these sources, usually in a form of corresponding weight factors assigned to data provided by each source. Inthis paper we demonstrate that several existing iterative filtering algorithms, while significantly more robust against collusionattacks than the simple averaging methods, are nevertheless susceptive to a novel sophisticated collusion attack we introduce.To address this security issue, we propose an improvement for iterative filtering techniques by providing an initial approximationfor such algorithms which makes them not only collusion robust, but also more accurate and faster converging.

Index Terms—Wireless sensor networks, robust data aggregation, collusion attacks.

F

1 INTRODUCTION

Due to a need for robustness of monitoring and lowcost of the nodes, wireless sensor networks (WSNs)are usually redundant. Data from multiple sensorsis aggregated at an aggregator node which then for-wards to the base station only the aggregate val-ues. At present, due to limitations of the computingpower and energy resource of sensor nodes, data isaggregated by extremely simple algorithms such asaveraging. However, such aggregation is known tobe very vulnerable to faults, and more importantly,malicious attacks [1]. This cannot be remedied bycryptographic methods, because the attackers gener-ally gain complete access to information stored in thecompromised nodes. For that reason data aggregationat the aggregator node has to be accompanied by anassessment of trustworthiness of data from individualsensor nodes. Thus, better, more sophisticated algo-rithms are needed for data aggregation in the futureWSN. Such an algorithm should have two features.

1) In the presence of stochastic errors such algo-rithm should produce estimates which are closeto the optimal ones in information theoretic

• M. Rezvani, A. Ignjatovic and S. Jha are with School of ComputerScience and Engineering, University of New South Wales, Sydney,Australia. E-mail: {mrezvani,ignjat,sanjay}@cse.unsw.edu.au

• E. Bertino is with Department of Computer Science, Purdue Univer-sity. E-mail: [email protected]

sense. Thus, for example, if the noise presentin each sensor is a Gaussian independently dis-tributed noise with a zero mean, then the es-timate produced by such an algorithm shouldhave a variance close to the Cramer-Rao lowerbound (CRLB) [2], i.e, it should be close to thevariance of the Maximum Likelihood Estima-tor (MLE). However, such estimation should beachieved without supplying to the algorithm thevariances of the sensors, unavailable in practice.

2) The algorithm should also be robust in thepresence of non-stochastic errors, such as faultsand malicious attacks, and, besides aggregatingdata, such algorithm should also provide anassessment of the reliability and trustworthinessof the data received from the sensor nodes.

Trust and reputation systems have a significantrole in supporting operation of a wide range ofdistributed systems, from wireless sensor networksand e-commerce infrastructure to social networks, byproviding an assessment of trustworthiness of par-ticipants in such distributed systems. A trustworthi-ness assessment at any given moment represents anaggregate of the behaviour of the participants up tothat moment and has to be robust in the presence ofvarious types of faults and malicious behaviour. Thereare a number of incentives for attackers to manip-ulate the trust and reputation scores of participantsin a distributed system, and such manipulation can




severely impair the performance of such a system [3].The main target of malicious attackers are aggregationalgorithms of trust and reputation systems [4].

Trust and reputation have been recently suggestedas an effective security mechanism for Wireless Sen-sor Networks (WSNs) [5]. Although sensor networksare being increasingly deployed in many applicationdomains, assessing trustworthiness of reported datafrom distributed sensors has remained a challengingissue. Sensors deployed in hostile environments maybe subject to node compromising attacks by adver-saries who intend to inject false data into the system.In this context, assessing the trustworthiness of thecollected data and announcing decision makers for thedata trustworthiness becomes a challenging task [6].

As the computational power of very low powerprocessors dramatically increases, mostly driven bydemands of mobile computing, and as the cost ofsuch technology drops, WSNs will be able to affordhardware which can implement more sophisticateddata aggregation and trust assessment algorithms; anexample is the recent emergence of multi-core andmulti-processor systems in sensor nodes [7].

Iterative Filtering (IF) algorithms are an attractiveoption for WSNs because they solve both problems- data aggregation and data trustworthiness assess-ment - using a single iterative procedure [8]. Suchtrustworthiness estimate of each sensor is based onthe distance of the readings of such a sensor from theestimate of the correct values, obtained in the previousround of iteration by some form of aggregation of thereadings of all sensors. Such aggregation is usuallya weighted average; sensors whose readings signif-icantly differ from such estimate are assigned lesstrustworthiness and consequently in the aggregationprocess in the present round of iteration their readingsare given a lower weight.

In recent years, there has been an increasing amountof literature on IF algorithms for trust and reputationsystems [9], [8], [10], [11], [12], [13], [14], [15]. Theperformance of IF algorithms in the presence of dif-ferent types of faults and simple false data injectionattacks has been studied, for example in [16] where itwas applied to compressive sensing data in WSNs. Inthe past literature it was found that these algorithmsexhibit better robustness compared to the simple av-eraging techniques; however, the past research didnot take into account more sophisticated collusionattack scenarios. If the attackers have a high levelof knowledge about the aggregation algorithm andits parameters, they can conduct sophisticated attackson WSNs by exploiting false data injection through anumber of compromised nodes. This paper presentsa new sophisticated collusion attack scenario againsta number of existing IF algorithms based on the falsedata injection. In such an attack scenario, colludersattempt to skew the aggregate value by forcing suchIF algorithms to converge to skewed values provided

by one of the attackers.Although such proposed attack is applicable to a

broad range of distributed systems, it is particularlydangerous once launched against WSNs for two rea-sons. First, trust and reputation systems play criticalrole in WSNs as a method of resolving a numberof important problems, such as secure routing, faulttolerance, false data detection, compromised nodedetection, secure data aggregation, cluster head elec-tion, outlier detection, etc [17]. Second, sensors whichare deployed in hostile and unattended environmentsare highly susceptible to node compromising attacks[18]. While offering better protection than the simpleaveraging, our simulation results demonstrate thatindeed current IF algorithms are vulnerable to suchnew attack strategy.

As we will see, such vulnerability to sophisticatedcollusion attacks comes from the fact that these IFalgorithms start the iteration process by giving anequal trust value to all sensor nodes. In this paper, wepropose a solution for such vulnerability by providingan initial trust estimate which is based on a robustestimation of errors of individual sensors. When thenature of errors is stochastic, such errors essentiallyrepresent an approximation of the error parametersof sensor nodes in WSN such as bias and variance.However, such estimates also prove to be robust incases when the error is not stochastic but due tocoordinated malicious activities. Such initial estima-tion makes IF algorithms robust against describedsophisticated collusion attack, and, we believe, alsomore robust under significantly more general circum-stances; for example, it is also effective in the presenceof a complete failure of some of the sensor nodes.This is in contrast with the traditional non iterativestatistical sample estimation methods which are notrobust against false data injection by a number ofcompromised nodes [18] and which can be severelyskewed in the presence of a complete sensor failure.

Since readings keep streaming into aggregatornodes in WSNs, and since attacks can be very dy-namic (such as orchestrated attacks [4]), in order toobtain trustworthiness of nodes as well as to identifycompromised nodes we apply our framework onconsecutive batches of consecutive readings. Sensorsare deemed compromised only relative to a particularbatch; this allows our framework to handle on-offtype of attacks (called orchestrated attacks in [4]).

We validate the performance of our algorithm bysimulation on synthetically generated datasets. Oursimulation results illustrate that our robust aggre-gation technique is effective in terms of robustnessagainst our novel sophisticated attack scenario as wellas efficient in terms of the computational cost.

Our contributions can be summarized as follows1:

1. An extended version of this paper has been published as atechnical report in [19]




1) Identification of a new sophisticated collusionattack against IF based reputation systems whichreveals a severe vulnerability of IF algorithms;

2) A novel method for estimation of sensors’ errorswhich is effective in a wide range of sensor faultsand not susceptible to the described attack;

3) Design of an efficient and robust aggregationmethod inspired by the MLE, which utilises anestimate of the noise parameters obtained usingcontribution 2 above;

4) Enhanced IF schemes able to protect againstsophisticated collusion attacks by providing aninitial estimate of trustworthiness of sensors us-ing inputs from contributions 2 and 3 above;

We provide a thorough empirical evaluation of ef-fectiveness and efficiency of our proposed aggregationmethod. The results show that our method providesboth higher accuracy and better collusion resistancethan the existing methods.

The rest of this paper is organized as follows.Section 2 describes the problem statement and theassumptions. Section 3 presents our novel robust dataaggregation framework. Section 4 describes our exper-imental results. Section 5 presents the related work.Finally, the paper is concluded in Section 6.

2 BACKGROUND, ASSUMPTIONS, THREATMODEL AND PROBLEM STATEMENT

In this section, we present our assumptions, discussIF algorithms, describe a collusion attack scenarioagainst IF algorithms, and state the problems that weaddress in this paper.

2.1 Network Model

For the sensor network topology, we consider theabstract model proposed by Wagner in [20]. Fig. 1shows our assumption for network model in WSN.The sensor nodes are divided into disjoint clusters,and each cluster has a cluster head which acts asan aggregator. Data are periodically collected andaggregated by the aggregator. In this paper we assumethat the aggregator itself is not compromised andconcentrate on algorithms which make aggregationsecure when the individual sensor nodes might becompromised and might be sending false data to theaggregator. We assume that each data aggregator hasenough computational power to run an IF algorithmfor data aggregation.

2.2 Iterative Filtering in Reputation Systems

Kerchove and Dooren proposed in [8] an IF algorithmfor computing reputation of objects and raters in arating system. We briefly describe the algorithm in thecontext of data aggregation in WSN and explain thevulnerability of the algorithm for a possible collusion

Base Station

Aggregator Node/ Cluster Head

Aggregator Node/ Cluster Head

Cluster

Cluster

Fig. 1: Network model for WSN.

attack. We note that our improvement is applicable toother IF algorithms as well.

We consider a WSN with n sensors Si, i = 1, . . . , n.We assume that the aggregator works on one block ofreadings at a time, each block comprising of readingsat m consecutive instants. Therefore, a block of read-ings is represented by a matrix X = {x1,x2, . . . ,xn}where xi = [x1i x

2i . . . xmi ]

T , (1 ≤ i ≤ n) representsthe ith m-dimensional readings reported by sensornode Si. Let r = [r1 r2 . . . rm]T denote the aggregatevalues for instants t = 1, . . . ,m, which authors of[8] call a reputation vector2, computed iteratively andsimultaneously with a sequence of weights w =[w1 w2 . . . wn]T reflecting the trustworthiness ofsensors. We denote by r(l), w(l) the approximationsof r, w obtained at lth round of iteration (l ≥ 0).

The iterative procedure starts with giving equalcredibility to all sensors, i.e., with an initial valuew(0) = 1. The value of the reputation vector r(l+1) inround of iteration l + 1 is obtained from the weightsof the sensors obtained in the round of iteration l as

r(l+1) =X ·w(l)

n∑i=1

w(l)i

Consequently, the initial reputation vector is r(1) =1nX ·1, i.e., r(1) is just the sequence of simple averagesof the readings of all sensors at each particular instant.The new weight vector w(l+1) to be used in round ofiteration l + 1 is then computed as a function g(d) ofthe normalized belief divergence d which is the distancebetween the sensor readings and the reputation vectorr(l). Thus, d = [d1 d2 . . . dn]T , di = 1

m

∥∥xi − r(l+1)∥∥22

and w(l+1)i = g(di), (1 ≤ i ≤ n).

Function g(x) is called the discriminant functionand it provides an inverse relationship of weights todistances d. Our experiments show that selecting adiscriminant function has a significant role in stabilityand robustness of IF algorithms. A number of alter-natives for this function are studied in [8]:

2. We find such terminology confusing, because reputationshould pertain to the level of trustworthiness rather than theaggregate value, but have decided to keep the terminology whichis already in use.




Algorithm 1: Iterative filtering algorithm.Input: X,n,m.Output: The reputation vector rl← 0;w(0) ← 1;repeat

Compute r(l+1);Compute d;Compute w(l+1);l← l + 1;

until reputation has converged;

• reciprocal: g(d) = d−k;• exponential: g(d) = e−d;• affine: g(d) = 1 − kld, where kl > 0 is chosen so

that g(maxi{d(l)i }) = 0.Algorithm 1 illustrates the iterative computation of

the reputation vector based on the above formulas.TABLE 1 shows a trace example of this algorithm.The sensor readings in the first three rows of this tableare from sensed temperatures in Intel Lab dataset [21]at three different time instants. We executed the IFalgorithm on the readings; the discriminant functionin the algorithm was a reciprocal of the distancebetween sensor readings and the current computedreputation. The lower part of the table illustrates theweight vector in each iteration as well as the obtainedreputation values for the three different time instants(t1, t2, t3) in the last three columns. As can be seen,the algorithm converges after six iterations.

2.3 Adversary ModelIn this paper, we use a Byzantine attack model, wherethe adversary can compromise a set of sensor nodesand inject any false data through the compromisednodes [22]. We assume that sensors are deployed in ahostile unattended environment. Consequently, somenodes can be physically compromised. We assumethat when a sensor node is compromised, all theinformation which is inside the node becomes acces-sible by the adversary. Thus, we cannot rely on cryp-tographic methods for preventing the attacks, sincethe adversary may extract cryptographic keys fromthe compromised nodes. We assume that through thecompromised sensor nodes the adversary can sendfalse data to the aggregator with a purpose of dis-torting the aggregate values. We also assume that allcompromised nodes can be under control of a singleadversary or a colluding group of adversaries, en-abling them to launch a sophisticated attack. We alsoconsider that the adversary has enough knowledgeabout the aggregation algorithm and its parameters.Finally, we assume that the base station and aggrega-tor nodes cannot be compromised in this adversarymodel; there is an extensive literature proposing howto deal with the problem of compromised aggregators;in this paper we limit our attention to the lower layer

problem of false data being sent to the aggregatorby compromised individual sensor nodes, which hasreceived much less attention in the existing literature.

2.4 Collusion Attack ScenarioMost of the IF algorithms employ simple assumptionsabout the initial values of weights for sensors. In caseof our adversary model, an attacker is able to misleadthe aggregation system through careful selection ofreported data values. We use visualisation techniquesfrom [18] to present our attack scenario.

Assume that ten sensors report the values of tem-perature which are aggregated using the IF algorithmproposed in [8] with the reciprocal discriminant func-tion. We consider three possible scenarios; see Fig. 2.• In scenario 1, all sensors are reliable and the result

of the IF algorithm is close to the actual value.• In scenario 2, an adversary compromises two

sensor nodes, and alters the readings of thesevalues such that the simple average of all sensorreadings is skewed towards a lower value. Asthese two sensor nodes report a lower value, IFalgorithm penalises them and assigns to themlower weights, because their values are far fromthe values of other sensors. In other words,the algorithm is robust against false data injec-tion in this scenario because the compromisednodes individually falsify the readings withoutany knowledge about the aggregation algorithm.TABLE 2 illustrates a trace example of the at-tack scenario on Intel dataset; sensors 9 and 10are compromised by an adversary. As one cansee, the algorithm assigns very low weights tothese two sensor nodes and consequently theircontributions decrease. Thus, the IF algorithm isrobust against the simple outlier injection by thecompromised nodes.

• In scenario 3, an adversary employs three com-promised nodes in order to launch a collusionattack. It listens to the reports of sensors in thenetwork and instructs the two compromised sen-sor nodes to report values far from the true valueof the measured quantity. It then computes theskewed value of the simple average of all sensorreadings and commands the third compromisedsensor to report such skewed average as its read-ings. In other words, two compromised nodesdistort the simple average of readings, while thethird compromised node reports a value veryclose to such distorted average thus making suchreading appear to the IF algorithm as a highlyreliable reading. As a result, IF algorithms willconverge to the values provided by the thirdcompromised node, because in the first iterationof the algorithm the third compromised node willachieve the highest weight, significantly dominat-ing the weights of all other sensors. This is re-inforced in every subsequent iteration; therefore,




TABLE 1: A trace example of iterative filtering algorithm.

sensor readings aggregate valuesinstant s1 s2 s3 s4 s5 s6 s7 s8

t=1 19.3612 19.42 19.0084 18.5674 17.95 22.153 18.0088 20.4t=2 19.3612 19.4102 19.0084 18.5478 21.282 21.347 18.0088 20.4098t=3 19.3612 19.42 19.0084 17.117 21.3408 20.813 21.625 19.7924

round# sensor weights t=1 t=2 t=31 1 1 1 1 1 1 1 1 19.3586 19.6719 19.80972 1.01E+01 1.34E+01 2.4896 0.3282 0.4335 0.2581 0.3806 1.8413 19.4008 19.439 19.43183 2.38E+02 2.24E+03 5.7843 0.4381 0.328 0.2286 0.3412 1.4486 19.4137 19.4052 19.41394 4.01E+02 2.96E+04 6.1705 0.446 0.3199 0.2267 0.3404 1.4116 19.4192 19.4095 19.41925 3.31E+02 1.59E+06 6.02 0.4433 0.3206 0.2278 0.3403 1.4273 19.42 19.4102 19.426 3.22E+02 6.47E+09 5.9971 0.4428 0.3207 0.2279 0.3402 1.4297 19.42 19.4102 19.42

the algorithm quickly converges to a reputationwhich is very close to the initial skewed simpleaverage, as shown in Fig. 2. TABLE 3 shows thesame attack scenario on Intel Lab dataset; sensors8, 9 and 10 are compromised by an adversary.As one can see, the algorithm converges quicklyto the readings of sensor 10 which is essentiallyequal to the simple average value of the sensors.

In the third scenario, how much the aggregatevalue is skewed directly depends on the number ofcompromised nodes which distort the sample averageof readings. Moreover, in this scenario, the attackerneeds to gain control over at least two sensor nodes;one which will reports readings which distort thesample average and another one which reports suchdistorted average. In our experiments, we investigatehow the behaviour of the IF algorithm depends onthe number of compromised nodes; see Section 4.4.

Clearly, the main source of the above vulnerabilitycomes from the fact that the algorithm assigns anequal initial weight to all sensor nodes in the first iter-ation. Moreover, the reciprocal discriminant functionhas a pole at zero which makes the algorithm unstablein the presence of sensors exhibiting a very smallbelief divergence at any given round of iteration.Therefore, under an attack of the kind described, thereputation value of the first iteration is equal to thesimple average of readings, and the second vectorof weights is computed based on the distance ofeach sensor to the simple average provided by thefirst iteration. As most of the IF algorithms in theliterature make the same assumption about the initialtrustworthiness of sensors, we argue that an adver-sary with sufficient knowledge of such algorithms canlaunch an attack as we have described and deceive theaggregator node.

In the case in which the nodes use cryptographyto ensure the confidentiality of readings they send tothe aggregator, the adversary can still estimate thesereadings by sensing the measured quantity using themalicious nodes.

To address the shortcoming of existing IF methods,we focus on estimating an initial trust vector based on

Reliable node

Compromised node

Reputation Value

Actual Value

Scenario 1:

Scenario 2:

Scenario 3:

Fig. 2: Attack scenario against IF algorithm.

an estimate of error parameters of sensor nodes. Afterthat, we use the new trust vector as the initial sensortrustworthiness in order to consolidate the algorithmsagainst an attack scenario of the type described.

3 ROBUST DATA AGGREGATION

In this section, we present our robust data aggregationmethod. TABLE 4 contains a summary of notationsused in this paper.

3.1 Framework Overview

In order to improve the performance of IF algorithmsagainst the aforementioned attack scenario, we pro-vide a robust initial estimation of the trustworthinessof sensor nodes to be used in the first iteration ofthe IF algorithm. Most of the traditional statisticalestimation methods for variance involve use of thesample mean. For this reason, proposing a robustvariance estimation method in the case of skewedsample mean is an essential part of our methodology.

In the rest of this paper, we assume that the stochas-tic components of sensor errors are independent ran-dom variables with a Gaussian distribution; however,our experiments show that our method works quitewell for other types of errors without any modifica-tion. Moreover, if error distribution of sensors is eitherknown or estimated, our algorithms can be adapted toother distributions to achieve an optimal performance.

Fig. 3 illustrates the stages of our robust aggre-gation framework and their interconnections. As wehave mentioned, our aggregation method operates on




TABLE 2: A trace example of a simple attack scenario.

sensor readingsinstant s1 s2 s3 s4 s5 s6 s7 s8 s9 s10

t=1 19.7336 19.6160 19.7728 20.2040 20.4196 19.4494 20.1354 19.0084 13.2001 13.5609

round# sensor weights t=11 1 1 1 1 1 1 1 1 1 1 18.50972 6.68 8.17 6.27 3.48 2.74 11.41 3.78 40.21 0.35 0.41 19.33903 64.21 130.29 53.13 13.36 8.56 872.81 15.77 91.52 0.27 0.30 19.48114 156.81 549.28 117.50 19.13 11.35 8.1E+3 23.36 44.76 0.25 0.29 19.46765 141.35 454.18 107.37 18.44 11.03 2.1E+4 22.42 47.42 0.25 0.29 19.45366 127.57 379.24 98.16 17.76 10.72 1.7E+5 21.51 50.45 0.26 0.29 19.44687 121.61 349.49 94.12 17.44 10.57 1.4E+7 21.09 52.02 0.26 0.29 19.44608 120.91 346.06 93.64 17.40 10.55 1.0E+11 21.04 52.22 0.26 0.29 19.4460

TABLE 3: A trace example of the proposed collusion attack scenario.

sensor readingsinstant s1 s2 s3 s4 s5 s6 s7 s8 s9 s10

t=1 19.7336 19.6160 19.7728 20.2040 20.4196 19.4494 20.1354 13.2001 13.5609 18.4546

round# sensor weights t=11 1 1 1 1 1 1 1 1 1 1 18.45462 0.6113 0.7414 0.5755 0.3268 0.2590 1.0106 0.3540 0.0362 0.0418 6.25E+08 18.45463 0.6113 0.7414 0.5755 0.3268 0.2590 1.0105 0.3540 0.0362 0.0418 1.78E+16 18.4546

TABLE 4: Notation used in this paper.

n number of sensors

m number of readings for each sensor

rt true value of the signal at time t

xts data from sensor s at time t

ets noise (error) of sensor s at time t

bs bias of sensor s

σs standard deviation of noise of sensor s

vs variance of sensor s

batches of consecutive readings of sensors, proceedingin several stages. In the first stage we provide aninitial estimate of two noise parameters for sensornodes, bias and variance; details of the computationsfor estimating bias and variance of sensors are pre-sented in Section 3.2 and 3.3, respectively.

Based on such an estimation of the bias and vari-ance of each sensor, the bias estimate is subtractedfrom sensors readings and in the next phase of theproposed framework, we provide an initial estimateof the reputation vector calculated using the MLE. Thedetailed computation operations of such estimationare described in Section 3.4.

In the third stage of the proposed framework, theinitial reputation vector provided in the second stageis used to estimate the trustworthiness of each sensorbased on the distance of sensor readings to suchinitial reputation vector. This idea will be describedin Section 3.5.

3.2 Estimating Bias

We assume that all sensors in WSN can have someerror; such error ets of a sensor s is modelled by the

Noise Parameters Estimation

Bias Estimation

Variance Estimation

Maximum Likelihood Estimation

Iterative Filtering with Initial Values

Iterative Filtering

Unbiasing Sensor Readings

MLE with Known Variance

Fig. 3: Our robust data aggregation framework.

Gaussian distribution random variable with a sensorbias bs and sensor variance σs, ets ∼ N (bs, σ

2s). Let rt

denotes the true value of the signal at time t. Eachsensor reading xts can be written as:

xts = rt + ets (1)

The main idea is that, since we have no access tothe true value rt we cannot obtain the value of theerror ets; however, we can obtain the values of thedifferences of such errors. Thus, if we define δ(i, j) =1m

m∑t=1

(xti − xtj

), we get:

δ(i, j) =1

m

m∑t=1

(xti − xtj

)=

1

m

m∑t=1

eti −1

m

m∑t=1

etj

where eti is a random variable with Gaussian distri-

bution eti ∼ N (bi, σ2i ). Let ei = 1

m

m∑t=1

eti be the sample

mean of this random variable. As the sample meanis an unbiased estimator of the expected value of arandom variable, we have

δ(i, j) = ei − ej ≈ bi − bj

Let δ = {δ(i, j) : 1 ≤ i, j ≤ n}; this matrix is anestimator for mutual difference of sensor bias. In orderto obtain the sensor bias from this matrix, we solve




the following minimization problem.

minimizeb

n∑i=1

i−1∑j=1

(bi − bjδ(i, j)

− 1

)2

subject ton∑i=1

bi = 0.

(2)

To justify our constraint, it is clear that if the meanof the bias of all sensors is not zero, then there wouldbe no way to account for it on the basis of sensorreadings. On the other hand, bias of sensors, undernormal circumstances, comes from imperfections inmanufacture and calibration of sensors as well asfrom the fact that they might be deployed in placeswith different environmental circumstances where thesensed scalar might in fact have a slightly differentvalue. Since by the very nature we are interested inobtaining a most reliable estimate of an average valueof the variable sensed, it is reasonable to assume thatthe mean bias of all sensors is zero (without faultsor malicious attacks). We chose the above objective

rather thann∑i=1

i−1∑j=1

(bi − bj − δ(i, j))2 to improve the

performance in case when biases can be of verydifferent magnitudes.

We introduce a Lagrangian multiplier λ and look atextremal values of the following function:

F (~b) =

n∑i=1

i−1∑j=1

(bi − bjδ(i, j)

− 1

)2

+ λ

n∑i=1

bi

By setting the gradient of F (~b) to zero we obtaina system of linear equations whose solution is ourapproximation of the the bias values. If we let

d(i, j) =

{−δ(j, i) i < j

δ(j, i) i ≥ j

then these equations can be written in the followingcompact form:

n∑i=1i 6=k

2d(i,k)2

bi −n∑i=1i 6=k

2d(i,k)2

bk − λ = 2n∑i=1i 6=k

1d(i,k) ,

for all k = 1, · · · , nn∑i=1

bi = 0.

(3)Note that the obtained value of bi is actually an

approximation of the sample mean of the error ofsensor i, which, in turn is an unbiased estimator ofthe bias of such a sensor.

3.3 Estimating Variance

In this section, we propose a similar method to esti-mate variance of the sensor noise using the estimatedbias from previous section. Given the bias vector

b = [b1, b2, · · · , bn] and sensor readings {xts}, we candefine matrices {xts} and β = {β(i, j)} as follows:

xts = xts − bs (4)

β(i, j) =1

m− 1

m∑t=1

(xti − xtj

)2=

1

m− 1

m∑t=1

((xti − xtj

)− (bi − bj)

)2By (1) we have xti −xtj = (rt + eti)− (rt + etj) = eti − etj ;thus, we obtain

β(i, j) =1

m− 1

m∑t=1

(eti − bi

)2+

1

m− 1

m∑t=1

(etj − bj

)2− 2

m− 1

m∑t=1

(eti − bi

) (etj − bj

)We assume that the sensors noise is generated by

independent random variables3; as we have men-tioned, our approximations of the bias bi are actuallyapproximations of the sample mean; thus

1

m− 1

m∑t=1

(eti − bi

) (etj − bj

)≈ Cov(ei, ej) = 0

and similarly

β(i, j) =1

m− 1

m∑t=1

(eti − bi

)2+

1

m− 1

m∑t=1

(etj − bj

)2≈ σ2

i + σ2j

The above formula shows that we can estimate thevariance of sensors noise by computing the matrix β.We also compute the sum of variances of all sensorsusing the following Lemma.

Lemma 3.1 (Total Variance). Let xt be the mean ofreadings in time t, then, using (4) and our assumption

thatn∑i=1

bi = 0 , we have

xt =1

n

n∑j=1

xtj =1

n

n∑j=1

xtj ,

and the statistic

S(t) =n

m(n− 1)

n∑i=1

m∑t=1

(xti − xt

)2is an unbiased estimator of the sum of the variances of all

sensors,n∑i=1

vi.

We presented the proof of the lemma in [19]. Toobtain an estimation of variances of sensors from

3. We analyze our estimation method with synthetic correlateddata and the experimental results show that the our methodproduces excellent results even for correlated noise.




the matrix β = {β(i, j)} we solve the followingminimization problem:

minimizev

n∑i=1

i−1∑j=1

(vi + vjβ(i, j)

− 1

)2

subject ton∑i=1

vi =n

m(n− 1)

n∑i=1

m∑t=1

(xti − xt

)2 (5)

Note that the constrain of the minimisation problemcomes from Lemma 3.1. We again introduce a La-grangian multiplier λ and by solving the minimiza-tion problem, we obtain linear Equations (6):

n∑i=1i 6=k

1β(i,k)2

vi +n∑i=1i 6=k

1β(i,k)2

vk + λ2 =

n∑i=1

1β(i,k) ,

for all k = 1, · · · , nn∑i=1

vi = nm(n−1)

n∑i=1

m∑t=1

(xti − xt)2.

(6)

3.4 MLE with Known VarianceIn the previous sections, we proposed a novel ap-proach for estimating the bias and variance of noisefor sensors based on their readings. The variance andthe bias of a sensor noise can be interpreted as thedistance measures of the sensor readings to the truevalue of the signal. In fact, the distance measuresobtained as our estimates of the bias and variancesof sensors also make sense for non-stochastic errors.

Given matrix {xts} where xts ∼ rt + N (bs, σ2s) and

estimated bias and variance vectors b and σ, wepropose to recover rt using (an approximate form of)the MLE applied to the values obtained by subtractingthe bias estimates from sensors readings. As it is wellknown, in this case the MLE has the smallest possiblevariance as it attains the CRLB.

From a heuristic point of view, we removed the“systematic component” of the error by subtractinga quantity which in the case of a stochastic errorcorresponds to an estimate of bias; this allows usto estimate the variability around such a systematiccomponent of the error, which, in case of stochasticerrors, corresponds to variance. We can now obtainan estimation which corresponds to MLE formula forthe case of zero mean normally distributed errors, butwith estimated rather than true variances. Therefore,we assume that the expected value rt of the mea-surements is the true value of the quantity measured,and is the only parameter in the likelihood function.Thus, in the expression for the likelihood function fornormally distributed unbiased case,

Ln(rt) =

n∏i=1

1

σi√

2πe− 1

2

(xti−rt)2

σ2i

we replace σ2i by the obtained variance vi from

Equation (6). Moreover, by differentiating the above

formula with respect to rt and setting the derivativeequal to zero we get

rt =

n∑i=1

1vi∑nj=1

1vj

xti for all t = 1, · · · ,m. (7)

Equation (7) provide an estimate of the true value ofthe quantity measured in a form of a weighted aver-age of sensor readings, with the sensor readings givena weight inversely proportional to the estimation oftheir error variance provided by our method:

r =

n∑s=1

wsxs (8)

Note that this method estimates the reputationvector without any iteration. Thus, the computationalcomplexity of the estimation is considerably less thanthe existing IF algorithms.

3.5 Enhanced Iterative Filtering

According to the proposed attack scenario, the at-tacker exploits the vulnerability of the IF algorithmswhich originates from a wrong assumption about theinitial trustworthiness of sensors. Our contribution toaddress this shortcomings is to employ the results ofthe proposed robust data aggregation technique as theinitial reputation for these algorithms. Moreover, theinitial weights for all sensor nodes can be computedbased on the distance of sensors readings to such aninitial reputation. Our experimental results illustratethat this idea not only consolidates the IF algorithmsagainst the proposed attack scenario, but using thisinitial reputation improves the efficiency of the IF al-gorithms by reducing the number of iterations neededto approach a stationary point within the prescribedtolerance; see Section 4.2.

4 SIMULATION RESULTS

In this section, we report on a detailed numericalsimulation study that examines robustness and effi-ciency of our data aggregation method. 4 The objec-tive of our experiments is to evaluate the robustnessand efficiency of our approach for estimating thetrue values of signal based on the sensor readingsin the presence of faults and collusion attacks. Foreach experiment, we evaluate the accuracy based onRoot Mean Squared error (RMS error) metric andefficiency based on the number of iterations neededfor convergence of IF algorithms.

4. Unfortunately, we are unable to prove mathematically thatour system is secure; however we demonstrated higher robustnesscompared to the state of the art by thorough empirical evaluation.We are working on obtaining a rigorous proof but this appears tobe a challenging problem.




4.1 Experimental Settings

All the experiments have been conducted on an HPPC with 3.30GHz Intel Core i5-2500 processor with8Gb RAM running a 64-bit Windows 7 Enterprise. Theprogram code has been written in MATLAB R2012b.Although there are a number of real world datasets forevaluating reputation systems and data aggregationin sensor networks such as Intel dataset [21], none ofthem provides a clear ground truth. Thus, we conductour experiments by generating synthetic datasets. Theexperiments are based on simulations performed withboth correlated and uncorrelated sensor errors. If notmentioned otherwise, we generate synthetic datasetsaccording to the following parameters:• Each simulation experiment was repeated 200

times and then results were averaged;• Number of sensor nodes is n = 20;• Number of readings for each sensor is m = 400;• For statistical parameters of the errors (noise)

used to corrupt the true readings, we considerseveral ranges of values for bias, variance andcovariance of noise for each experiment;

• The level of significance in K-S test is α = 0.05.In all experiments, we compare our robust ag-

gregation method against three other IF techniquesproposed for reputation systems. For all parametersof other algorithms used in the experiments, we setthe same values as used in the original papers wherethey were introduced.

The first IF method considered computes the trust-worthiness of sensor nodes based on the distance oftheir readings to the current state of the estimatedreputation [8]. We described the details of this ap-proach in Section 2.2. We investigate two discrimi-nant functions in our experiments g(d) = d−1 andg(d) = 1−kld, and call these methods dKVD-Reciprocaland dKVD-Affine, respectively.

The second IF method we consider is a correlationbased ranking algorithm proposed by Zhou et al. in[9]. In this algorithm, trustworthiness of each sensoris obtained based on the correlation coefficient be-tween the sensors readings and the current estimateof the true value of the signal. In other words, thismethod gives credit to sensor nodes whose readingscorrelate well with the estimated true value of thesignal. Based on this idea, the authors proposed aniterative algorithm for estimating the true value of thesignal by applying a weighted averaging technique.They argued that correlation coefficient is a goodway to quantify the similarity between two vectors.Thus, they employed Pearson correlation coefficientbetween sensor readings and the current state ofestimate signal in order to compute the sensor weight.We call this method Zhou.

The third algorithm considered has been proposedby Laureti et al. in [10] and is an IF algorithm basedon a weighted averaging technique similar to the

TABLE 5: Summary of different IF algorithms.

Name Discriminant Function

dKVD-Reciprocal wl+1i =

(1

m

∥∥∥xi − rl+1∥∥∥22

)−1

dKVD-Affine wl+1i = 1− k

1

m

∥∥∥xi − rl+1∥∥∥22

Zhou wl+1i =

1

m

m∑i=1

(xti − xt

σxi

)(rt − r

σr

)

Laureti wl+1i =

(1

m

∥∥∥xi − rl+1∥∥∥22

)−0.5

algorithm described in Section 2.2. The only differencebetween these two algorithms is in the discriminantfunction. The authors in [10] exploited discriminantfunction g(d) = d−0.5. We call this method Laureti.

We apply dKVD-Reciprocal, dKVD-Affine, Zhou, Lau-reti and our robust aggregation approach to syn-thetically generated data. Although we can simplyapply our robust framework to all existing IF ap-proaches, in this paper we investigate the improve-ment which addition of our initial trustworthinessassessment method produces on the robustness ofdKVD-Reciprocal and dKVD-Affine methods (We callthem RobustAggregate-Reciprocal and RobustAggregate-Affine, respectively.).

TABLE 5 shows a summary of discriminant func-tions for all of the above four different IF methods.

We first conduct experiments by injecting onlyGaussian noise into sensor readings. In the secondpart of the experiments, we investigate the behaviourof these approaches by emulating a simple, non-colluding attack scenario presented in the second caseof Fig. 2. We then evaluate these approaches in thecase of our sophisticated attack scenario.

4.2 Accuracy and Efficiency without an AttackIn the first batch of experiments we assume thatthere are no sensors with malicious behaviour. Thus,the errors are fully stochastic; we consider Gaussiansensors errors. In order to evaluate the performanceof our algorithm in comparison with the existingalgorithms, we produce the following four differentsynthetic datasets.

1) Unbiased error: We considered various distri-butions of the variance across the set of sensorsand obtained similar results. We have chosento present the case with the error of a sensors at time t is given by ets ∼ N (0, s × σ2), con-sidering different values for the baseline sensorvariance σ2. Fig. 4(a) shows the results of theMLE with our noise parameter estimation (steps¬ and in Fig. 3) and the information theoreticlimit for the minimal variance provided by theCRLB, achieved, for example, using the MLEwith the actual, exact variances of sensors, whichare NOT available to our algorithm. As onecan see in this figure, our proposed approach




nearly exactly achieves the minimal possiblevariance coming from the information theoreticlower bound. Furthermore, Fig. 4(b) illustratesthe performance of our approach for the initialtrustworthiness assessment of sensors with dif-ferent discriminant functions as well as other IFalgorithms. It shows that in this experiment, theperformance of our approach with both discrim-inant functions is very similar to the original IFalgorithm.

2) Bias error: In this scenario, we inject bias er-ror to sensor readings, generated by Gaussiandistribution with different variances. Therefore,the error of sensor s in time t is generatedby ets ∼ N (N (0, σ2

b ), s × σ2) with the varianceof the bias σ2

b = 4 and increasing values forvariances, where the variance of sensor s is equalto s × σ2. Thus, the sensors bias is producedby a zero mean Gaussian distribution randomvariable. Fig. 4(c) shows the RMS error for allalgorithms in this scenario. As can be seen in thisfigure, since all of the IF algorithms, along withour approach, generate an error close to theirerrors in the unbiased scenario, we can concludethat the methods are stable against bias but fullystochastic noise.

3) Correlated noise: The heuristics behind our ini-tial variance estimation assumed that the errorsof sensors are uncorrelated. Thus, we tested howthe performance of our method degrades if thenoise becomes correlated and how it comparesto the existing methods under the same circum-stances. So in this scenario, we assume that theerrors of sensors are no longer uncorrelated.Possible covariance functions can be of differenttypes, such as Spherical, Power Exponential, Ra-tional Quadratic, and Matern; see [23]. Althoughour proposed method can be applied to all co-variance functions, we present here the resultsfor the case of the Power Exponential functionρ(i, j) = e

−|i−j|n . Moreover, the variance of a

sensor s is again set to σ2s = s × σ2. From the

corresponding covariance matrix Σ = {Σij =ρ(i, j)σiσj : i, j = 1 · · ·n}, the noise values ofsensors are generated from multivariate Normaldistribution Noise ∼ N (Bias,Σ). In this sce-nario, we take into account different values ofσ for generating the noise values of sensors inorder to analyse the accuracy of the data aggre-gation under various levels of noise. Fig. 4(d)shows the RMS error of the algorithms for thisscenario. As can be seen in this figure, ourapproach with reciprocal discriminant functionimproves dKVD-Reciprocal algorithm for all dif-ferent values of variance, although our methodwith affine function generates very similar RMSerror to the original dKVD-Affine algorithm.

Moreover, the scale of RMS error is in generallarger than in scenarios with uncorrelated noise,as one would expect. This can be explainedby our assumption that the sensors noise isgenerated by independent random variables; seeSection 3.3.

The results of our simulations also show that theuse of our initial variance estimation in the secondphase of our proposed framework as the initial rep-utation of IF algorithms decreases the number ofiterations for the algorithms. We evaluate the numberof iterations for the IF algorithm proposed in [8] byproviding the initial reputation from the results ofthe our approach for both unbiased and biased sen-sors errors. The results of this experiment show thatthe proposed initial reputation for the IF algorithmimproves the efficiency of the algorithm in termsof the number of iterations until the procedure hasconverged. In other words, by providing this initialreputation, the number of iterations for IF algorithmdecreases approximately 9% for reciprocal and around8% for affine discriminant functions in both biasedand unbiased circumstances. This can be explainedby the fact that the new initial reputation is close tothe true value of signal and the IF algorithm needsfewer iterations to reach its stationary point. In thenext part of our experiments, we employ this idea forconsolidating the IF algorithm against the proposedattack scenario.

1 1.5 2 2.5 3 3.5 4 4.5 50

0.5

1

1.5

2

2.5

3Unbiased2: Cramer−Rao lower band and our approach

σ, Es = N (0, sσ2)

RMSError

RobustAggregateMLE with Actual Variances

(a) Unbiased error

1 1.5 2 2.5 3 3.5 4 4.5 50.5

1

1.5

2

2.5

3

3.5

4

4.5

5Unbiased2: Unbiased sensors with different variances

σ, Es = N (0, sσ2)

RMSError

dKVD-ReciprocaldKVD-AffineZhouLauretiRobustAggregate-ReciprocalRobustAggregate-Affine

(b) Unbiased error

1 1.5 2 2.5 3 3.5 4 4.5 50.5

1

1.5

2

2.5

3

3.5

4

4.5Biased2: Biased sensors with different variances

σ, Es = N (0, 4) +N (0,σ2s)

RMSError


(c) Bias error

1 1.5 2 2.5 3 3.5 4 4.5 52

4

6

8

10

12

14Correlated1: Correlated noises with different variances

σ, σs = σ

√s, ρ(i, j) = exp(− | i− j |/N)

RMSError


(d) Correlated noise

Fig. 4: Accuracy for No Attack scenarios.

4.3 Accuracy with Simple Attack ScenarioLim et al. in [18] introduced an attack scenario againsttraditional statistical aggregation approaches. We de-scribed the scenario in Section 2.4 and the secondround of Fig. 2 as a simple attack scenario usinga number of compromised node for skewing thesimple average of sensors readings. In this section,we investigate the behavior of IF algorithms againstthe simple attack scenario. Note that the objective ofthis attack scenario is to skew the sample mean of




sensors readings through reporting outlier readingsby the compromised nodes.

In order to evaluate the accuracy of the IF algo-rithms against the simple attack scenario, we assumethat the attacker compromises c (c < n) sensor nodesand reports outlier readings by these nodes. We gen-erate synthetically datasets for this attack scenario bytaking into account different values of variance forsensors errors as well as employing various numberof compromised nodes. Moreover, we generate biasedreadings for all sensor nodes with bias provided bya random variable with a distribution N (0, σ2

b ) withthe variance of bias chosen to be σ2

b = 4.Fig. 5 shows the accuracy of the IF algorithms and

our approach in the presence of such simple attackscenario. It can be seen that the estimates provided bythe three approaches, dKVD-Affine, Zhou and Lauretiare significantly skewed by this attack scenario andtheir accuracy significantly decreases by increasingthe number of compromised nodes. On the otherhand, dKVD-Reciprocal provides a reasonable accuracyfor all parameter values of this simple attack scenario(see Fig. 5(a)). The robustness of this discriminantfunction can be explained by the fact that the functionsharply diminishes the contributions of outlier read-ings through assigning very low values of weights tothem. In our sophisticated collusion attack scenario,we exploit this property in order to compromise sys-tems employing such discriminant function.

The results of this experiment clearly show thatour initial trustworthiness has no negative effectson the performance of the IF algorithm with bothdiscriminant functions in the case of the simple attackscenario. In next section, we show that how thisinitial values improve the IF algorithm in the case ofproposed collusion attack scenario, while both dKVD-Affine and dKVD-Reciprocal algorithms are compro-mised against such an attack scenario.

4.4 Accuracy with a Collusion AttackIn order to illustrate the robustness of the pro-posed data aggregation method in the presence of so-phisticated attacks, we synthetically generate severaldatasets by injecting the proposed collusion attacks.Therefore, we assume that the adversary employs c(c < n) compromised sensor nodes to launch thesophisticated attack scenario proposed in Section 2.4.The attacker uses the first c−1 compromised nodes togenerate outlier readings in order to skew the simpleaverage of all sensor readings. The adversary thenfalsifies the last sensor readings by injecting the valuesvery close to such skewed average. This collusionattack scenario makes the IF algorithm to convergeto a wrong stationary point. In order to investigatethe accuracy of the IF algorithms with this collu-sion attack scenario, we synthetically generate severaldatasets with different values for sensors variances aswell as various number of compromised nodes (c).

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Kerchove−Reciprocal

number of colluders

RM

S E

rror

(a) dKVD-Reciprocal

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Kerchove−Affine

number of colluders

RM

S E

rror

(b) dKVD-Affine

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Zhou

number of colluders

RM

S E

rror

(c) Zhou

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Laureti

number of colluders

RM

S E

rror

(d) Laureti

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: RobustAggregate−Reciprocal

number of colluders

RM

S E

rror

(e) RobustAggregate-Reciprocal

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: RobustAggregate−Affine

number of colluders

RM

S E

rror

(f) RobustAggregate-Affine

Fig. 5: Accuracy with a simple attack scenario.

Fig. 6 shows the accuracy of the IF algorithms andour approach in the presence of the collusion attackscenario. It can be seen that the IF algorithms withreciprocal discriminant function are highly vulnerableto such attack scenario (see Fig. 6(a) and Fig. 6(d)),while the affine discriminant function generates morerobust results in this case (see Fig. 6(b)). However,the accuracy of the affine discriminant function is stillmuch worse than the previous experiment without thecollusion attack.

This experiment shows that the collusion attackscenario can circumvent all the IF algorithms we tried.Moreover, the accuracy of the algorithms dramaticallydecreases by increasing the number of compromisednodes participated in the attack scenario. As explainedbefore, the algorithms converge to the readings of oneof the compromised nodes, namely, to the readingsof the node which reports values very close to theskewed mean. This demonstrates that an attacker withenough knowledge about the aggregation algorithmemployed can launch a sophisticated collusion attackscenario which defeats IF aggregation systems.

Fig. 6(e) and Fig. 6(f) show the accuracy of ourapproach by taking into account the IF algorithm in[8] with reciprocal and affine discriminant functions,respectively. As one can see, our proposed approachis superior to all other algorithms in terms of theaccuracy for reciprocal discriminant functions, whilethe approach has a very small improvement on affinefunction. Moreover, comparing the accuracy of ourapproach in this experiment with the results from no




attack and simple attack experiments in Fig. 4 andFig. 5, we can argue that our approach with reciprocaldiscriminant function is robust against the collusionattack scenario. The reason is that our approach notonly provides the highest accuracy for this discrimi-nant functions, it actually approximately reaches theaccuracy of No Attack scenarios.

As we described, the main shortcoming of the IFalgorithms in the proposed attack scenario is thatthey quickly converge to the sample mean in thepresence of the attack scenario. In order to investigatethe shortcoming, we conducted an experiment byincreasing the sensor variances as well as the num-ber of colluders. In this experiment, we quantifiedthe number of iterations for the IF algorithm withreciprocal discriminant function (dKVD-Reciprocal andRobustAggregate-Reciprocal algorithms). The results ob-tained from this experiment show that the originalversion of the IF algorithm quickly converges (afteraround five iterations) to the skewed values providedby one of the attackers, while starting with an initialreputation provided by our approach, the algorithmsrequire around 29 iterations, and, instead of converg-ing to the skewed values provided by one of theattackers, it provides a reasonable accuracy.

The results of this experiment validate that oursophisticated attack scenario is caused by the discov-ered vulnerability in the IF algorithms which sharplydiminishes the contributions of benign sensor nodeswhen one of the sensor nodes reports a value veryclose to the simple average.

5 RELATED WORK

Robust data aggregation is a serious concern in WSNsand there are a number of papers investigating mali-cious data injection by taking into account the variousadversary models. There are three bodies of workrelated to our research: IF algorithms, trust and repu-tation systems for WSNs, and secure data aggregationwith compromised node detection in WSNs.

There are a number of published studies intro-ducing IF algorithms for solving data aggregationproblem [8], [9], [10], [11], [12], [13], [14], [15]. Wereviewed three of them in our comparative exper-iments in Section 4. Li et al. in [12] proposed sixdifferent algorithms, which are all iterative and aresimilar. The only difference among the algorithms istheir choice of norm and aggregation function. Aydayet al. proposed a slight different iterative algorithm in[13]. Their main differences from the other algorithmsare: 1) the ratings have a time-discount factor, soin time, their importance will fade out; and 2) thealgorithm maintains a black-list of users who areespecially bad raters. Liao et al. in [14] proposed aniterative algorithm which beyond simply using therating matrix, also uses the social network of users.The main objective of author in [15] is to introduce

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: Kerchove−Reciprocal

number of colluders

RM

S E

rror

(a) dKVD-Reciprocal1

23

45

4

6

80

10

20

30

40

50

standard deviation

Attack5: Kerchove−Affine

number of colluders

RM

S E

rror

(b) dKVD-Affine

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: Zhou

number of colluders

RM

S E

rror

(c) Zhou

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: Laureti

number of colluders

RM

S E

rror

(d) Laureti

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: RobustAggregate−Reciprocal

number of colluders

RM

S E

rror

(e) RobustAggregate-Reciprocal

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: RobustAggregate−Affine

number of colluders

RM

S E

rror

(f) RobustAggregate-Affine

Fig. 6: Accuracy with our collusion attack.

a “Bias-smoothed tensor model”, which is a Bayesianmodel of rather high complexity. Although the exist-ing IF algorithms consider simple cheating behaviourby adversaries, none of them take into account sophis-ticated malicious scenarios such as collusion attacks.

Our work is also closely related to the trust and rep-utation systems in WSNs. Authors in [24] proposed ageneral reputation framework for sensor networks inwhich each node develops a reputation estimation forother nodes by observing its neighbors which makea trust community for sensor nodes in the network.Xiao et al. in [25] proposed a trust based frameworkwhich employs correlation to detect faulty readings.Moreover, they introduced a ranking framework toassociate a level of trustworthiness with each sensornode based on the number of neighboring sensornodes are supporting the sensor. Li et al. in [26] pro-posed PRESTO, a model-driven predictive data man-agement architecture for hierarchical sensor networks.PRESTO is a two tier framework for sensor data man-agement in sensor networks. The main idea of thisframework is to consider a number of proxy nodesfor managing sensed data from sensor nodes. Authorsin [6] proposed an interdependency relationship be-tween network nodes and data items for assessingtheir trust scores based on a cyclical framework. Themain contribution of authors in [27] is to proposea combination of trust mechanism, data aggregation,and fault tolerance to enhance data trustworthinessin Wireless Multimedia Sensor Networks (WMSNs)




which considers both discrete and continuous datastreams. Tang et al. in [28] proposed a trust frameworkfor sensor networks in cyber physical systems suchas a battle-network in which the sensor nodes areemployed to detect approaching enemies and sendalarms to a command center. Although fault detectionproblems have been addressed by applying trust andreputation systems in the above research, none ofthem take into account sophisticated collusion attacksscenarios in adversarial environments.

Reputation and trust concepts can be used to over-come the compromised node detection and securedata aggregation problems in WSNs. Ho et al. in[29] proposed a framework to detect compromisednodes in WSN and then apply a software attesta-tion for the detected nodes. They reported that therevocation of detected compromised nodes can notbe performed due to a high risk of false positivein the proposed scheme. The main idea of false ag-gregator detection in the scheme proposed in [30]is to employ a number of monitoring nodes whichare running aggregation operations and providing aMAC value of their aggregation results as a part ofMAC in the value computed by the cluster aggregator.High computation and transmission cost required forMAC-based integrity checking in this scheme makesit unsuitable for deployment in WSN. Lim et al. in[18] proposed a game-theoretical defense strategy toprotect sensor nodes and to guarantee a high levelof trustworthiness for sensed data. Moreover, thereis a large volume of published studies in the areaof secure tiny aggregation in WSNs [31], [32], [33].These studies focus on detecting false aggregationoperations by an adversary, that is, on data aggre-gator nodes obtaining data from source nodes andproducing wrong aggregated values. Consequently,they address neither the problem of false data beingprovided by the data sources nor the problem ofcollusion. However, when an adversary injects falsedata by a collusion attack scenario, it can affects theresults of the honest aggregators and thus the basestation will receive skewed aggregate value. In thiscase, the compromised nodes will attest their falsedata and consequently the base station assumes thatall reports are from honest sensor nodes. Although theaforementioned research take into account false datainjection for a number of simple attack scenarios, tothe best of our knowledge, no existing work addressesthis issue in the case of a collusion attack by compro-mised nodes in a manner which employs high levelknowledge about data aggregation algorithm used.

6 CONCLUSIONS

In this paper, we introduced a novel collusion attackscenario against a number of existing IF algorithms.Moreover, we proposed an improvement for the IFalgorithms by providing an initial approximation of

the trustworthiness of sensor nodes which makes thealgorithms not only collusion robust, but also more ac-curate and faster converging. In future work, We willinvestigate whether our approach can protect againstcompromised aggregators. we also plan to implementour approach in a deployed sensor network.

REFERENCES[1] S. Ozdemir and Y. Xiao, “Secure data aggregation in wireless

sensor networks: A comprehensive overview,” Comput. Netw.,vol. 53, no. 12, pp. 2022–2037, Aug. 2009.

[2] L. Wasserman, All of statistics : a concise course in statisticalinference. New York: Springer.

[3] A. Jøsang and J. Golbeck, “Challenges for robust trust andreputation systems,” in Proceedings of the 5 th InternationalWorkshop on Security and Trust Management, Saint Malo, France,2009.

[4] K. Hoffman, D. Zage, and C. Nita-Rotaru, “A survey of attackand defense techniques for reputation systems,” ACM Comput.Surv., vol. 42, no. 1, pp. 1:1–1:31, Dec. 2009.

[5] R. Roman, C. Fernandez-Gago, J. Lopez, and H. H. Chen,“Trust and reputation systems for wireless sensor networks,”in Security and Privacy in Mobile and Wireless Networking,S. Gritzalis, T. Karygiannis, and C. Skianis, Eds. TroubadorPublishing Ltd, 2009, pp. 105–128.

[6] H.-S. Lim, Y.-S. Moon, and E. Bertino, “Provenance-basedtrustworthiness assessment in sensor networks,” in Proceedingsof the Seventh International Workshop on Data Management forSensor Networks, ser. DMSN ’10, 2010, pp. 2–7.

[7] H.-L. Shi, K. M. Hou, H. ying Zhou, and X. Liu, “Energyefficient and fault tolerant multicore wireless sensor network:E2MWSN ,” in Wireless Communications, Networking and Mo-bile Computing (WiCOM), 2011 7th International Conference on,2011, pp. 1–4.

[8] C. de Kerchove and P. Van Dooren, “Iterative filtering inreputation systems,” SIAM J. Matrix Anal. Appl., vol. 31, no. 4,pp. 1812–1834, Mar. 2010.

[9] Y. Zhou, T. Lei, and T. Zhou, “A robust ranking algorithm tospamming,” CoRR, vol. abs/1012.3793, 2010.

[10] P. Laureti, L. Moret, Y.-C. Zhang, and Y.-K. Yu, “Informationfiltering via Iterative Refinement,” EPL (Europhysics Letters),vol. 75, pp. 1006–1012, Sep. 2006.

[11] Y.-K. Yu, Y.-C. Zhang, P. Laureti, and L. Moret, “Decodinginformation from noisy, redundant, and intentionally distortedsources,” Physica A Statistical Mechanics and its Applications, vol.371, pp. 732–744, Nov. 2006.

[12] R.-H. Li, J. X. Yu, X. Huang, and H. Cheng, “Robust reputation-based ranking on bipartite rating networks,” in SDM’12, 2012,pp. 612–623.

[13] E. Ayday, H. Lee, and F. Fekri, “An iterative algorithm for trustand reputation management,” in Proceedings of the 2009 IEEEinternational conference on Symposium on Information Theory -Volume 3, ser. ISIT’09, 2009, pp. 2051–2055.

[14] H. Liao, G. Cimini, and M. Medo, “Measuring quality, repu-tation and trust in online communities,” ArXiv e-prints, Aug.2012.

[15] B.-C. Chen, J. Guo, B. Tseng, and J. Yang, “User reputation in acomment rating environment,” in Proceedings of the 17th ACMSIGKDD international conference on Knowledge discovery and datamining, ser. KDD ’11, 2011, pp. 159–167.

[16] C. T. Chou, A. Ignatovic, and W. Hu, “Efficient computationof robust average of compressive sensing data in wirelesssensor networks in the presence of sensor faults,” Parallel andDistributed Systems, IEEE Transactions on, vol. 24, no. 8, pp.1525–1534, Aug 2013.

[17] Y. Yu, K. Li, W. Zhou, and P. Li, “Trust mechanisms in wire-less sensor networks: Attack analysis and countermeasures,”Journal of Network and Computer Applications, vol. 35, no. 3, pp.867 – 880, 2012, ¡ce:title¿Special Issue on Trusted Computingand Communications¡/ce:title¿.

[18] H.-S. Lim, G. Ghinita, E. Bertino, and M. Kantarcioglu, “Agame-theoretic approach for high-assurance of data trustwor-thiness in sensor networks,” in Data Engineering (ICDE), 2012IEEE 28th International Conference on, april 2012, pp. 1192 –1203.




[19] M. Rezvani, A. Ignjatovic, E. Bertino, and S. Jha, “Secure dataaggregation technique for wireless sensor networks in thepresence of collusion attacks,” School of Computer Scienceand Engineering, UNSW, Tech. Rep. UNSW-CSE-TR-201319,July 2013.

[20] D. Wagner, “Resilient aggregation in sensor networks,” inProceedings of the 2nd ACM workshop on Security of ad hoc andsensor networks, ser. SASN ’04, 2004, pp. 78–87.

[21] “The Intel lab data,” Data set available at: http://berkeley.intel-research.net/ labdata/ , 2004.

[22] B. Awerbuch, R. Curtmola, D. Holmer, C. Nita-rotaru, andH. Rubens, “Mitigating byzantine attacks in ad hoc wirelessnetworks,” Department of Computer Science, Johns HopkinsUniversity, Tech, Tech. Rep., 2004.

[23] M. C. Vuran and I. F. Akyildiz, “Spatial correlation-basedcollaborative medium access control in wireless sensor net-works,” IEEE/ACM Trans. Netw., vol. 14, no. 2, pp. 316–329,Apr. 2006.

[24] S. Ganeriwal, L. K. Balzano, and M. B. Srivastava, “Reputation-based framework for high integrity sensor networks,” ACMTrans. Sen. Netw., vol. 4, no. 3, pp. 15:1–15:37, Jun. 2008.

[25] X.-Y. Xiao, W.-C. Peng, C.-C. Hung, and W.-C. Lee, “UsingSensorRanks for in-network detection of faulty readings inwireless sensor networks,” in Proceedings of the 6th ACMinternational workshop on Data engineering for wireless and mobileaccess, ser. MobiDE ’07, 2007, pp. 1–8.

[26] M. Li, D. Ganesan, and P. Shenoy, “PRESTO: feedback-drivendata management in sensor networks,” in Proceedings of the3rd conference on Networked Systems Design & Implementation -Volume 3, ser. NSDI’06, 2006, pp. 23–23.

[27] Y. Sun, H. Luo, and S. K. Das, “A trust-based framework forfault-tolerant data aggregation in wireless multimedia sensornetworks,” IEEE Trans. Dependable Secur. Comput., vol. 9, no. 6,pp. 785–797, Nov. 2012.

[28] L.-A. Tang, X. Yu, S. Kim, J. Han, C.-C. Hung, and W.-C. Peng,“Tru-Alarm: Trustworthiness analysis of sensor networks incyber-physical systems,” in Proceedings of the 2010 IEEE Inter-national Conference on Data Mining, ser. ICDM ’10, 2010, pp.1079–1084.

[29] J.-W. Ho, M. Wright, and S. Das, “ZoneTrust: Fast zone-basednode compromise detection and revocation in wireless sensornetworks using sequential hypothesis testing,” Dependable andSecure Computing, IEEE Transactions on, vol. 9, no. 4, pp. 494–511, july-aug. 2012.

[30] S. Ozdemir and H. Cam, “Integration of false data detectionwith data aggregation and confidential transmission in wire-less sensor networks,” IEEE/ACM Trans. Netw., vol. 18, no. 3,pp. 736–749, Jun. 2010.

[31] H. Chan, A. Perrig, and D. Song, “Secure hierarchical in-network aggregation in sensor networks,” in Proceedings of the13th ACM Conference on Computer and Communications Security,ser. CCS ’06. New York, NY, USA: ACM, 2006, pp. 278–287.

[32] Y. Yang, X. Wang, S. Zhu, and G. Cao, “SDAP: a secure hop-by-hop data aggregation protocol for sensor networks,” inMobiHoc, 2006, pp. 356–367.

[33] S. Roy, M. Conti, S. Setia, , and S. Jajodia, “Secure dataaggregation in wireless sensor networks,” Information Forensicsand Security, IEEE Transactions on, vol. 7, no. 3, pp. 1040–1052,2012.

Mohsen Rezvani got his Bachelors andMasters degrees in Computer Engineeringat the Amirkabir University of Technologyand Sharif University of Technology, respec-tively. He is a PhD candidate in the Schoolof Computer Science and Engineering atthe University of New South Wales, Sydney,Australia. His research focuses on trust andreputation systems in WSNs. He is also astudent member of the IEEE and the IEEECommunications Society.

Aleksandar Ignjatovic got his Bachelor andMaster degrees in Mathematics at the Uni-versity of Belgrade, former Yugoslavia, andPh.D. in Mathematical Logic at the Universityof California at Berkeley. After graduation hewas an Assistant Professor at the CarnegieMellon University, where he taught for 5years at the Department of Philosophy andsubsequently had a startup in the Silicon Val-ley. Aleks joined in 2002 the School of Com-puter Science and Engineering at UNSW,

where he is teaching algorithms. His research interests includesampling theory and signal processing, applications of mathematicallogic to computational complexity theory, algorithms for embeddedsystems design and most recently trust based data aggregationalgorithms.

Elisa Bertino is Professor of Computer Sci-ence at Purdue University, and serves asresearch director of the Center for Educationand Research in Information Assurance andSecurity (CERIAS) and Interim Director ofCyber Center (Discovery Park). Previously,she was a faculty member and departmenthead at the Department of Computer Sci-ence and Communication of the Universityof Milan. Her main research interests includesecurity, privacy, digital identity management

systems, database systems, distributed systems, and multimediasystems. She is currently serving as chair of the ACM SIGSAC andas a member of the editorial board of the following international jour-nals: IEEE Security and Privacy, IEEE TRANSACTIONS ON SER-VICE COMPUTING, ACM Transactions on Web. She also served aseditor-in-chief of the VLDB Journal and editorial board member ofACM TISSEC and IEEE TDSC. She coauthored the book IdentityManagementConcepts, Technologies, and Systems. Dr. Bertino isa fellow of the ACM. She received the 2002 IEEE Computer So-ciety Technical Achievement Award for outstanding contributions todatabase systems and database security and advanced data man-agement systems and the 2005 IEEE Computer Society TsutomuKanai Award for pioneering and innovative research contributions tosecure distributed systems.

Sanjay Jha received the PhD degree fromthe University of Technology, Sydney, Aus-tralia. He is a professor and head of theNetwork Group at the School of ComputerScience and Engineering at the University ofNew South Wales. He is an associate editorof the IEEE Transactions on Mobile Com-puting. He was a Member-at-Large, Tech-nical Committee on Computer Communica-tions (TCCC), IEEE Computer Society for anumber of years. He has served on program

committees of several conferences. He was cochair and generalchair of the Emnets-1 and Emnets-II workshops, respectively. Hewas also the general chair of the ACM Sensys 2007 symposium.His research activities include a wide range of topics in network-ing including wireless sensor networks, adhoc/community wirelessnetworks, resilience/quality of service (QoS) in IP networks, andactive/programmable networks. He has published more than 100articles in high quality journals and conferences. He is the principalauthor of the book Engineering Internet QoS and a coeditor of thebook Wireless Sensor Networks: A Systems Perspective. He is asenior member of the IEEE and the IEEE Computer Society.

secure data aggregation technique for

Documents