ieee transactions on dependable and secure …cs4121/paper 1.pdf · secure data aggregation...

18
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 1 Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani, Student Member, IEEE, Aleksandar Ignjatovic, Elisa Bertino, Fellow, IEEE and Sanjay Jha, Senior Member, IEEE, Abstract—At present, due to limited computational power and energy resources of sensor nodes, aggregation of data from multiple sensor nodes done at the aggregating node is usually accomplished by simple methods such as averaging. However, such aggregation has been known to be highly vulnerable to node compromising attacks. Since WSN are usually unattended and without tamper resistant hardware, they are highly susceptible to such attacks. Thus, ascertaining trustworthiness of data and reputation of sensor nodes has become crucially important for WSN. As the performance of very low power processors dramatically improves and their cost is drastically reduced, future aggregator nodes will be capable of performing more sophisticated data aggregation algorithms, which will make WSN less vulnerable to severe impact of compromised nodes. Iterative filtering algorithms hold great promise for such a purpose. Such algorithms simultaneously aggregate data from multiple sources and provide trust assessment of these sources, usually in a form of corresponding weight factors assigned to data provided by each source. In this paper we demonstrate that a number of existing iterative filtering algorithms, while significantly more robust against collusion attacks than the simple averaging methods, are nevertheless susceptive to a novel sophisticated collusion attack we introduce. To address this security issue, we propose an improvement for iterative filtering techniques by providing an initial approximation for such algorithms which makes them not only collusion robust, but also more accurate and faster converging. We believe that such modified iterative filtering algorithms have a great potential for deployment in the future WSN. Index Terms—Wireless sensor networks, robust data aggregation, collusion attacks. 1 I NTRODUCTION Due to a need for robustness of monitoring, wireless sensor networks (WSN) are usually redundant. Data from multiple sensors is aggregated at an aggregator node which then forwards to the base station only the aggregate values. At present, due to limitations of the computing power and energy resource of sen- sor nodes, data is aggregated by extremely simple algorithms such as averaging. However, such aggre- gation is known to be very vulnerable to faults, and more importantly, malicious attacks [1]. This cannot be remedied by cryptographic methods, because the attackers generally gain complete access to informa- tion stored in the compromised nodes. For that reason data aggregation at the aggregator node has to be accompanied by an assessment of trustworthiness of data from individual sensor nodes. Thus, better, more sophisticated algorithms are needed for data aggregation in the future WSN. Such an algorithm should have two important features. M. Rezvani, A. Ignjatovic and S. Jha are with School of Computer Science and Engineering, University of New South Wales, Sydney, Australia. E-mail: {mrezvani,ignjat,sanjay}@cse.unsw.edu.au E. Bertino is with Department of Computer Science, Purdue Univer- sity. E-mail: [email protected] 1) In the presence of stochastic errors such al- gorithm should produce estimates which are close to the optimal ones in information the- oretic sense. Thus, for example, if the noise present in each sensor is a Gaussian indepen- dently distributed noise with a zero mean, then the estimate produced by such an aggregation algorithm should have a variance close to the Cramer - Rao bound, i.e, it should be close to the variance of the Maximum Likelihood Estimator. However, such estimation should be achieved without supplying to the algorithm the variances of the sensors. 2) The algorithm should also be robust in the presence of non-stochastic errors, such as faults and malicious attacks, and, besides aggregating data, such algorithm should also provide an assessment of the reliability and trustworthiness of the data received from the individual sensor nodes. Trust and reputation systems have a significant role in supporting operation of a wide range of distributed systems, from wireless sensor networks and e-commerce infrastructure to social networks, by providing an assessment of trustworthiness of par- ticipants in such distributed systems. A trustworthi-

Upload: others

Post on 17-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 1

Secure Data Aggregation Technique forWireless Sensor Networks in the Presence of

Collusion AttacksMohsen Rezvani, Student Member, IEEE, Aleksandar Ignjatovic, Elisa Bertino, Fellow, IEEE

and Sanjay Jha, Senior Member, IEEE,

Abstract—At present, due to limited computational power and energy resources of sensor nodes, aggregation of data frommultiple sensor nodes done at the aggregating node is usually accomplished by simple methods such as averaging. However,such aggregation has been known to be highly vulnerable to node compromising attacks. Since WSN are usually unattendedand without tamper resistant hardware, they are highly susceptible to such attacks. Thus, ascertaining trustworthiness of dataand reputation of sensor nodes has become crucially important for WSN. As the performance of very low power processorsdramatically improves and their cost is drastically reduced, future aggregator nodes will be capable of performing moresophisticated data aggregation algorithms, which will make WSN less vulnerable to severe impact of compromised nodes.Iterative filtering algorithms hold great promise for such a purpose. Such algorithms simultaneously aggregate data from multiplesources and provide trust assessment of these sources, usually in a form of corresponding weight factors assigned to dataprovided by each source. In this paper we demonstrate that a number of existing iterative filtering algorithms, while significantlymore robust against collusion attacks than the simple averaging methods, are nevertheless susceptive to a novel sophisticatedcollusion attack we introduce. To address this security issue, we propose an improvement for iterative filtering techniques byproviding an initial approximation for such algorithms which makes them not only collusion robust, but also more accurate andfaster converging. We believe that such modified iterative filtering algorithms have a great potential for deployment in the futureWSN.

Index Terms—Wireless sensor networks, robust data aggregation, collusion attacks.

F

1 INTRODUCTION

Due to a need for robustness of monitoring, wirelesssensor networks (WSN) are usually redundant. Datafrom multiple sensors is aggregated at an aggregatornode which then forwards to the base station onlythe aggregate values. At present, due to limitationsof the computing power and energy resource of sen-sor nodes, data is aggregated by extremely simplealgorithms such as averaging. However, such aggre-gation is known to be very vulnerable to faults, andmore importantly, malicious attacks [1]. This cannotbe remedied by cryptographic methods, because theattackers generally gain complete access to informa-tion stored in the compromised nodes. For that reasondata aggregation at the aggregator node has to beaccompanied by an assessment of trustworthinessof data from individual sensor nodes. Thus, better,more sophisticated algorithms are needed for dataaggregation in the future WSN. Such an algorithmshould have two important features.

• M. Rezvani, A. Ignjatovic and S. Jha are with School of ComputerScience and Engineering, University of New South Wales, Sydney,Australia. E-mail: {mrezvani,ignjat,sanjay}@cse.unsw.edu.au

• E. Bertino is with Department of Computer Science, Purdue Univer-sity. E-mail: [email protected]

1) In the presence of stochastic errors such al-gorithm should produce estimates which areclose to the optimal ones in information the-oretic sense. Thus, for example, if the noisepresent in each sensor is a Gaussian indepen-dently distributed noise with a zero mean, thenthe estimate produced by such an aggregationalgorithm should have a variance close to theCramer - Rao bound, i.e, it should be close to thevariance of the Maximum Likelihood Estimator.However, such estimation should be achievedwithout supplying to the algorithm the variancesof the sensors.

2) The algorithm should also be robust in thepresence of non-stochastic errors, such as faultsand malicious attacks, and, besides aggregatingdata, such algorithm should also provide anassessment of the reliability and trustworthinessof the data received from the individual sensornodes.

Trust and reputation systems have a significantrole in supporting operation of a wide range ofdistributed systems, from wireless sensor networksand e-commerce infrastructure to social networks, byproviding an assessment of trustworthiness of par-ticipants in such distributed systems. A trustworthi-

Page 2: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 2

ness assessment at any given moment represents anaggregate of the behaviour of the participants up tothat moment and has to be robust in the presence ofvarious types of faults and malicious behaviour. Thereare a number of incentives for attackers to manip-ulate the trust and reputation scores of participantsin a distributed system, and such manipulation canseverely impair the performance of such a system [2].The main target of malicious attackers are aggregationalgorithms of trust and reputation systems [3].

Trust and reputation have been recently suggestedas an effective security mechanism for Wireless Sen-sor Networks (WSNs) [4]. Although sensor networksare being increasingly deployed in many applicationdomains, assessing trustworthiness of reported datafrom distributed sensors has remained a challengingissue. Sensors deployed in hostile environments maybe subject to node compromising attacks by adver-saries who intend to inject false data into the system.In this context, assessing the trustworthiness of thecollected data and making decision makers aware ofthe trustworthiness of data becomes a challengingtask [5].

As the computational power of very low powerprocessors dramatically increases, mostly driven bydemands of mobile computing, and as the cost ofsuch technology drops, WSNs will be able to affordhardware which can implement both more sophis-ticated data aggregation and trustworthiness assess-ment algorithms; an example is the recent emergenceof multi-core and multi-processor systems in sensornodes [6].

Iterative Filtering (IF) algorithms are an attractiveoption for WSNs because they solve both problems- data aggregation and data trustworthiness assess-ment - using a single iterative procedure [7]. Suchtrustworthiness estimate of each sensor is based onthe distance of the readings of such a sensor from theestimate of the correct values, obtained in the previousround of iteration by some form of aggregation of thereadings of all sensors. Such aggregation is usuallya weighted average; sensors whose readings signif-icantly differ from such estimate are assigned lesstrustworthiness and consequently in the aggregationprocess in the present round of iteration their readingsare given a lower weight.

In recent years, there has been an increasing amountof literature on IF algorithms for trust and reputationsystems [8], [7], [9], [10], [11], [12], [13], [14], [15].The performance of IF algorithms in the presence ofdifferent types of faults and simple false data injectionattacks has been studied, for example in [16] where itwas applied to compressive sensing data in WSNs. Inthe past literature it was found that these algorithmsexhibit better robustness compared to the simple av-eraging techniques; however, the past research didnot take into account more sophisticated collusionattack scenarios. If the attackers have a high level

of knowledge about the aggregation algorithm andits parameters, they can conduct sophisticated attackson WSNs by exploiting false data injection through anumber of compromised nodes. This paper presentsa new sophisticated collusion attack scenario againsta number of existing IF algorithms based on the falsedata injection. In such an attack scenario, colludersattempt to skew the aggregate value by forcing suchIF algorithms to converge to skewed values providedby one of the attackers.

Although such proposed attack is applicable to abroad range of distributed systems, it is particularlydangerous once launched against WSNs for two rea-sons. First, trust and reputation systems play criticalrole in WSNs as a method of resolving a numberof important problems, such as secure routing, faulttolerance, false data detection, compromised nodedetection, secure data aggregation, cluster head elec-tion, outlier detection, etc [17]. Second, sensors whichare deployed in hostile and unattended environmentsare highly susceptible to node compromising attacks[18]. While offering better protection than the simpleaveraging, our simulation results demonstrate thatindeed current IF algorithms are vulnerable to suchnew attack strategy.

As we will see, such vulnerability to sophisticatedcollusion attacks comes from the fact that these IFalgorithms start the iteration process by giving anequal trust value to all sensor nodes. In this paper, wepropose a solution for such vulnerability by providingan initial trust estimate which is based on a robustestimation of errors of individual sensors. When thenature of errors is stochastic, such errors essentiallyrepresent an approximation of the error parametersof sensor nodes in WSN such as bias and variance.However, such estimates also proves to be robust incases when the error is not stochastic but due to co-ordinated malicious activities. Such initial estimationmakes IF algorithms robust against such a sophisti-cated collusion attack, and, we believe, more robustunder significantly more general circumstances; it isalso effective in the presence of complete failure ofsome of the sensor nodes. This is in contrast withthe traditional non iterative statistical sample estima-tion methods which are not robust against false datainjection by a number of compromised nodes [18]and which can be severely skewed in the presenceof complete sensor failure.

Furthermore, we augment IF algorithms with anovel approach for collusion detection and revocation.Thus, we run our improved IF algorithm to obtain aninitial approximation of the aggregate values; we thenconsider distribution of differences of each sensorreadings and such approximations, rather than justthe average magnitude of such differences, to iden-tify the compromised nodes and eliminate them. Wefinally re-run our method to obtain the final aggregatevalues.

Page 3: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 3

Since readings keep streaming into aggregatornodes in WSNs, and since attacks can be very dy-namic (such as orchestrated attacks [3]), in order toobtain trustworthiness of sensor nodes as well as toidentify compromised nodes we apply our frame-work on consecutive batches of consecutive readings.Sensors are deemed compromised only relative to aparticular batch; this allows our framework to handleon-off type of attacks (called orchestrated attacks in[3]).

We validate the performance of our algorithm bysimulation on synthetically generated datasets. Oursimulation results illustrate that our robust aggre-gation technique is effective in terms of robustnessagainst our novel sophisticated attack scenario as wellas efficient in terms of the computational cost.

Our contributions can be summarized as follows:

1) Identification of a new sophisticated and pow-erful attack against IF based reputation systemswhich reveals a severe vulnerability in iterativefiltering algorithms;

2) A novel method for error estimation of sensorsnodes which is effective in a wide range ofsensor faults and not susceptible to the describedattack;

3) Design of an efficient and robust aggregationtechnique inspired by the Maximum LikelihoodEstimation (MLE), which utilises an estimate ofthe noise parameters obtained using contribu-tion 2 above;

4) Enhanced IF schemes able to protect againstsophisticated collusion attacks by providing aninitial estimate of trustworthiness of sensors us-ing inputs from contributions 2 and 3 above;

5) A novel collusion detection method based onan estimate of normality of sensor errors in theproposed robust aggregation framework.

We provide a thorough empirical evaluation of ef-fectiveness and efficiency of our proposed aggregationmethod. The results show that our method providesboth higher accuracy and better collusion resistancethan the existing methods.

The rest of this paper is organized as follows.Section 2 describes the problem statement and theassumptions. Section 3 presents our novel robust dataaggregation framework. Section 4 describes our exper-imental results. Section 5 presents the related work.Finally, the paper is concluded in Section 6.

2 BACKGROUND, ASSUMPTIONS, THREATMODEL AND PROBLEM STATEMENT

In this section, we present our assumptions, discussIF algorithms, describe a collusion attack scenarioagainst IF algorithms, and state the problems that weaddress in this paper.

Base Station

Aggregator Node/ Cluster Head

Aggregator Node/ Cluster Head

Cluster

Cluster

Fig. 1: Network model for WSN.

2.1 Network ModelFor the sensor network topology, we consider theabstract model proposed by Wagner in [19]. Fig. 1shows our assumption for network model in WSN.The sensor nodes are divided into clusters, and eachcluster has a cluster head which acts as an aggregator.Data are periodically collected and aggregated bythe aggregator. In this paper we assume that theaggregator itself is not compromised and concentrateon algorithms which make aggregation secure whenthe individual sensor nodes might be compromisedand might be sending false data to the aggregator. Weassume that each data aggregator has enough compu-tational power to run an iterative filtering algorithmfor data aggregation.

2.2 Iterative Filtering in Reputation SystemsKerchove and Dooren proposed in [7] an IF algorithmfor computing reputation of objects and raters in arating system. We briefly describe the algorithm in thecontext of data aggregation in WSN and explain thevulnerability of the algorithm for a possible collusionattack. We note that our improvement is applicable toother IF algorithms as well.

We consider a WSN with N sensors Sp, p =1, · · · , N . We also assume that the aggregator workson one block of readings, each block comprising ofreadings at T consecutive instants. Therefore, a blockof readings is represented by a matrix X = {xp :p = 1 · · ·N} where xp = {xtp : t = 1 · · ·T}is a sequence of readings from sensor Sp. Let r =〈r1, r2, . . . , rT 〉 denote the aggregate values at instantst = 1, · · · , T , which authors of [7] call a reputationvector1, computed iteratively and simultaneously witha sequence of weights w = 〈w1, w2, · · · , wN 〉 reflectingthe trustworthiness of sensors. We denote by rl, wl

the approximations of r, w obtained at lth round ofiteration (l ≥ 0).

The iterative procedure starts with giving equalcredibility to all sensors, i.e., with an initial value

1. We find such terminology confusing, because reputationshould pertain to the level of trustworthiness rather than theaggregate value, but have decided to keep the terminology whichis already in use.

Page 4: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 4

w0 = 1. The value of the reputation vector rl+1 inround of iteration l + 1 is obtained from the weightsof the sensors obtained in the round of iteration l as

rl+1 =X ·wl

N∑i=1

wli

Consequently, the initial reputation vector is r1 =1NX ·1, i.e., r1 is just the sequence of simple averagesof the readings of all sensors at each particular instant.The new weight vector wl+1 to be used in roundof iteration l + 1 is then computed as a functiong(d) of the normalized distance d between the sensorreadings and the reputation vector rl. Thus,

d =

d1...dN

=1

T

∥∥x1 − rl+1

∥∥22

...∥∥xN − rl+1∥∥22

.

wl+1 =

g(d1)

...g(dN )

;

Function g(x) is called the discriminant function andit provides an inverse relationship of weights and thedistances d. Our experiments show that selecting adiscriminant function has a significant role in stabilityand robustness of IF algorithms. A number of alter-natives for this function are studied in [7]:• reciprocal: g(d) = d−k;• exponential: g(d) = e−d;• affine: g(d) = 1 − kld, where kl > 0 is chosen so

that g(maxi{dli}) = 0.Algorithm 1 illustrates the iterative computation of

the reputation vector based on the above formulas.TABLE 1 shows a trace example of this algorithm.The sensor readings in the first three rows of this tableare from sensed temperatures in Intel Lab dataset [20]at three different time instants. We executed the IFalgorithm on the readings; the discriminant functionin the algorithm was a reciprocal of the distancebetween sensor readings and the current computedreputation. The lower part of the table illustrates theweight vector in each iteration as well as the obtainedreputation values for the three different time instants(t1, t2, t3) in the last three columns. As can be seen,the algorithm converges after six iterations.

2.3 Adversary Model

For describing the threat model, we assume thatsensors are deployed in a hostile unattended envi-ronment. Consequently, some nodes can be physicallycompromised. We assume that when a sensor nodeis compromised, all the information which is insidethe node becomes accessible by the adversary. Thus,

Algorithm 1: Iterative filtering algorithm.Input: X,N, T .Output: The reputation vector rl← 0;w0 ← 1;repeat

Compute rl+1;Compute d;Compute wl+1;l← l + 1;

until reputation has converged;

TABLE 1: A trace example of iterative filtering algo-rithm.

sensor readings aggregate valuesinstant s1 s2 s3 s4 s5 s6 s7 s8

t=1 19.3612 19.42 19.0084 18.5674 17.95 22.153 18.0088 20.4t=2 19.3612 19.4102 19.0084 18.5478 21.282 21.347 18.0088 20.4098t=3 19.3612 19.42 19.0084 17.117 21.3408 20.813 21.625 19.7924

round# sensor weights t=1 t=2 t=31 1 1 1 1 1 1 1 1 19.3586 19.6719 19.80972 1.01E+01 1.34E+01 2.4896 0.3282 0.4335 0.2581 0.3806 1.8413 19.4008 19.439 19.43183 2.38E+02 2.24E+03 5.7843 0.4381 0.328 0.2286 0.3412 1.4486 19.4137 19.4052 19.41394 4.01E+02 2.96E+04 6.1705 0.446 0.3199 0.2267 0.3404 1.4116 19.4192 19.4095 19.41925 3.31E+02 1.59E+06 6.02 0.4433 0.3206 0.2278 0.3403 1.4273 19.42 19.4102 19.426 3.22E+02 6.47E+09 5.9971 0.4428 0.3207 0.2279 0.3402 1.4297 19.42 19.4102 19.42

we cannot rely on cryptographic methods for pre-venting the attacks, since the adversary may extractcryptographic keys from the compromised nodes. Weassume that through the compromised sensor nodesthe adversary can send false data to the aggregatorwith a purpose to distort the aggregate values. Weassume that all compromised nodes can be undercontrol of a single adversary or a colluding group ofadversaries, enabling them to launch a sophisticatedattack. We also assume that the adversary has enoughknowledge about the aggregation algorithm and itsparameters. Finally, we assume that the base stationand aggregator nodes cannot be compromised inthis adversary model; there is an extensive literatureproposing how to deal with the problem of compro-mised aggregators; in this paper we limit our attentionto the lower layer problem of false data being sentto the aggregator node by compromised individualsensor nodes, which has received much less attentionin the existing literature.

2.4 Collusion Attack ScenarioMost of the IF algorithms employ simple assumptionsabout the initial values of weights for sensors. In caseof our adversary model, an attacker is able to misleadthe aggregation system through careful selection ofreported data values. We use visualisation techniquesfrom [18] to present our attack scenario.

Assume that ten sensors report the values of tem-perature which are aggregated using Kerchove andDooren algorithm proposed in [7] with the recipro-cal discriminant function. We consider three possiblescenarios; see Fig. 2.• In scenario 1, all sensors are reliable and the value

estimated by the IF algorithm is close to the actual

Page 5: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 5

value.• In scenario 2, an adversary compromises two

sensor nodes, and alters the readings of thesevalues such that the simple average of all sensorreadings is skewed towards a lower value. Asthese two sensor nodes report a lower value, IFalgorithm penalises them and assigns to themlower weights, because their values are far fromthe values of other sensors. In other words, thealgorithm is robust against the false data injec-tion in this scenario because the compromisednodes individually falsify the readings withoutany knowledge about the aggregation algorithm.TABLE 2 illustrates a trace example of the attackscenario on Intel dataset; sensors 9 and 10 arecompromised by an adversary. As one can see,the algorithm assigns very low weights to thesetwo sensor nodes and consequently their contri-butions decrease. Thus, the iterative algorithm isrobust against the simple outlier injection by thecompromised nodes.

• In scenario 3, an adversary employs three com-promised nodes in order to launch a collusionattack. It listens to the reports of sensors in thenetwork and instructs the two compromised sen-sor nodes to report values far from the true valueof the measured quantity. It then computes theskewed value of the simple average of all sensorreadings and commands the third compromisedsensor to report such skewed average as its read-ings. In other words, two compromised nodesdistort the simple average of readings, while thethird compromised node reports a value veryclose to such distorted average thus making suchreading appear to the IF algorithm as a highlyreliable reading. As a result, IF algorithms willconverge to the values provided by the thirdcompromised node, because in the first iterationof the algorithm the third compromised node willachieve the highest weight, significantly dominat-ing the weights of all other sensors. This is re-inforced in every subsequent iteration; therefore,the algorithm quickly converges to a reputationwhich is very close to the initial skewed simpleaverage, as shown in Fig. 2. TABLE 3 shows thesame attack scenario on Intel Lab dataset; sensors8, 9 and 10 are compromised by an adversary.As one can see, the algorithm converges quicklyto the readings of sensor 10 which is essentiallyequal to the simple average value of the sensors.

In the third scenario, how much the aggregatevalue is skewed directly depends on the number ofcompromised nodes which distort the sample averageof readings. Moreover, in this scenario, the attackerneeds to gain control over at least two sensor nodes;one which will reports readings which distort thesample average and another one which reports such

Reliable node

Compromised node

Reputation Value

Actual Value

Scenario 1:

Scenario 2:

Scenario 3:

Fig. 2: Attack scenario against iterative filtering algo-rithm.

TABLE 2: A trace example of a simple attack scenario.sensor readings

instant s1 s2 s3 s4 s5 s6 s7 s8 s9 s10t=1 19.7336 19.6160 19.7728 20.2040 20.4196 19.4494 20.1354 19.0084 13.2001 13.5609

round# sensor weights t=11 1 1 1 1 1 1 1 1 1 1 18.50972 6.68 8.17 6.27 3.48 2.74 11.41 3.78 40.21 0.35 0.41 19.33903 64.21 130.29 53.13 13.36 8.56 872.81 15.77 91.52 0.27 0.30 19.48114 156.81 549.28 117.50 19.13 11.35 8.1E+3 23.36 44.76 0.25 0.29 19.46765 141.35 454.18 107.37 18.44 11.03 2.1E+4 22.42 47.42 0.25 0.29 19.45366 127.57 379.24 98.16 17.76 10.72 1.7E+5 21.51 50.45 0.26 0.29 19.44687 121.61 349.49 94.12 17.44 10.57 1.4E+7 21.09 52.02 0.26 0.29 19.44608 120.91 346.06 93.64 17.40 10.55 1.0E+11 21.04 52.22 0.26 0.29 19.4460

TABLE 3: A trace example of the proposed collusionattack scenario.

sensor readingsinstant s1 s2 s3 s4 s5 s6 s7 s8 s9 s10

t=1 19.7336 19.6160 19.7728 20.2040 20.4196 19.4494 20.1354 13.2001 13.5609 18.4546

round# sensor weights t=11 1 1 1 1 1 1 1 1 1 1 18.45462 0.6113 0.7414 0.5755 0.3268 0.2590 1.0106 0.3540 0.0362 0.0418 6.25E+08 18.45463 0.6113 0.7414 0.5755 0.3268 0.2590 1.0105 0.3540 0.0362 0.0418 1.78E+16 18.4546

distorted average. In our experiments, we investigatehow the behaviour of the IF algorithm depends onthe number of compromised nodes; see Section 4.4.

Clearly, the main source of the above vulnerabilitycomes from the fact that the algorithm assigns anequal initial weight to all sensor nodes in the firstiteration. Therefore, under an attack, as we havedescribed, the reputation value of the first iterationis always extremely close to the simple average ofreadings, and the second vector of weights is com-puted based on the distance of each sensor to thesimple average provided by the first iteration. Asmost of the IF algorithms in the literature make thesame assumption about the initial trustworthiness ofsensors, we argue that an adversary with sufficientknowledge of such algorithms can launch an attack aswe have described and deceive the aggregator node.

To address the shortcoming of existing IF tech-niques, we focus on estimating an initial trust vectorbased on an estimate of noise (i.e., error) parametersof sensor nodes. After that, we use the new trustvector as the initial sensor trustworthiness in order toconsolidate the algorithms against an attack scenarioof the type described in this paper.

3 ROBUST DATA AGGREGATION

In this section, we present our robust data aggregationmethod. TABLE 4 contains a summary of notationsused in this paper.

Page 6: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 6

TABLE 4: Notation used in this paper.

N number of sensorsT number of readings for each sensorrt true value of the signal at time txts data from sensor s at time tets noise (error) of sensor s at time tbs bias of sensor sσs standard deviation of noise of sensor svs variance of sensor s

3.1 Framework Overview

In order to improve the performance of IF algorithmsagainst the aforementioned attack scenario, we pro-vide a robust initial estimation of the trustworthinessof sensor nodes to be used in the first iteration ofthe IF algorithm. Most of the traditional statisticalestimation methods for variances involve use of thesample mean. For this reason, proposing a robustvariance estimation method in the case of skewedsample mean is essential part of our methodology.

In the rest of this paper, we assume that the stochas-tic components of sensor errors are independent ran-dom variables with a Gaussian distribution; however,our experiments show that our method works quitewell for other types of errors without any modifi-cation; however, if error distribution of sensors isknown, our algorithms can be adapted to other ran-dom distributions to achieve an optimal performance.

Fig. 3 illustrates the stages of our robust aggre-gation framework and their interconnections. As wehave mentioned, our aggregation method operates onbatches of consecutive readings of sensors, proceedingin several stages. In the first stage we provide aninitial estimate of two noise parameters for sensornodes, bias and variance; details of the computationsfor estimating bias and variance of sensors are pre-sented in Section 3.2 and 3.3, respectively.

Based on such an estimation of the bias and vari-ance of each sensor, in the next phase of the proposedframework, we provide an initial estimate of thereputation vector calculated using the Maximum Like-lihood Estimation. The detailed computation opera-tions of such estimation are described in Section 3.4.

In the third stage of the proposed framework, theinitial reputation vector provided in the second stageis used to estimate the trustworthiness of each sensorbased on the distance of sensor readings to suchinitial reputation vector. This idea will be describedin Section 3.5.

Although using such initial reputation makes IFalgorithm more robust than its original version withequal weights for all sensors, our experiments showthat the attacker can still skew the reputation resultsconsiderably. Thus, in the fourth stage we suggest anovel collusion detection mechanism for eliminatingthe contributions of the compromised nodes.

Noise Parameters Estimation

Bias Estimation

Variance Estimation

MLE with Known Bias and Variances

Maximum Likelihood Estimation

Sensors Readings

Unbiasing Sensors Readings

MLE with Known Variances

Normality Test on Sensors Errors

Eliminate Readings of

Colluders

Collusion Detection

Compute Initial Trustworthiness

of Sensors

Original Iterative Filtering

Iterative Filtering

Compute Sensors Errors

1 2 3 4

Fig. 3: Our robust data aggregation framework.

The idea behind detection of colluders in a sophis-ticated collusion attack is that at least one of thecompromised nodes will have highly non stochasticbehaviour; for example, in our attack scenario, oneof the compromised nodes is constrained to reportingvalues which must be very close to the skewed mean.On the other hand, the error of non-compromisednodes, even when it is large, comes from a large num-ber of independent factors, and thus must roughlyhave a Gaussian distribution. Consequently, insteadof looking just at the Root Mean Square (RMS) magni-tude of errors of each sensor, we look at the statisticaldistribution of such errors, assessing the likelihoodwhether they came from a normally distributed ran-dom variable. Nodes that are highly unlikely to havecome from a normally distributed random variable,possibly with a bias, are eliminated.

Finally, after revoking the readings of untrustedsensors, we re-run our noise parameters estimationas well as the MLE with known variances on theremaining readings (stage 1 and 2). The details ofour collusion detection method will be described inSection 3.6.

3.2 Estimating BiasWe assume that all sensors in WSN can have someerror; such error ets of a sensor s is modelled by theGaussian distribution random variable with a sensorbias bs and sensor variance σs, ets ∼ N (bs, σ

2s). Let rt

denotes the true value of the signal at time t. Eachsensor reading xts can be written as:

xts = rt + ets (1)

The main idea is that, since we have no access tothe true value rt we cannot obtain the value of theerror ets; however, we can obtain the values of thedifferences of such errors. Thus, if we define δ(i, j) =

1T

T∑t=1

(xti − xtj

), we get:

δ(i, j) =1

T

T∑t=1

(xti − xtj

)=

1

T

T∑t=1

((rt + eti)− (rt + etj)

)=

1

T

T∑t=1

eti −1

T

T∑t=1

etj

Page 7: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 7

where eti is a random variable with Gaussian distri-

bution eti ∼ N (bi, σ2i ). Let ei = 1

T

T∑t=1

eti be the sample

mean of this random variable. As the sample meanis an unbiased estimator of the expected value of arandom variable, we have

δ(i, j) = ei − ej ≈ bi − bj

Let δ = {δ(i, j) : 1 ≤ i, j ≤ N}; this matrix is anestimator for mutual difference of sensor bias. In orderto obtain the sensor bias from this matrix, we couldsolve the following minimization problem.

minimizeb

N∑i=1

i−1∑j=1

(bi − bj − δ(i, j))2

subject toN∑i=1

bi = 0.

To justify our constraint, it is clear that if the mean ofthe bias of all sensors is not zero, then there wouldbe no way to account for it on the basis of sensorreadings. On the other hand, bias of sensors, undernormal circumstances, comes from imperfections inmanufacture and calibration of sensors as well asfrom the fact that they might be deployed in placeswith different environmental circumstances where thesensed scalar might in fact have a slightly differentvalue. Since by the very nature we are interested inobtaining a most reliable estimate of an average valueof the variable sensed, it is reasonable to assume thatthe mean bias of all sensors is zero (without faultsor malicious attacks), as we are looking for a robustaverage of sensor readings.

If relative magnitudes of bias can vary a lot fromsensor to sensor, then it is better to look at relativeerrors in bias estimates. Therefore, the following isa more robust version of the above minimisationproblem.

minimizeb

N∑i=1

i−1∑j=1

(bi − bjδ(i, j)

− 1

)2

subject toN∑i=1

bi = 0.

(2)

We introduce a Lagrangian multiplier λ and look atextremal values of the following function:

F (~b) =

N∑i=1

i−1∑j=1

(bi − bjδ(i, j)

− 1

)2

+ λ

N∑i=1

bi

By setting the gradient of F (~b) to zero we obtaina system of linear equations whose solution is ourapproximation of the values of the bias of sensors. Ifwe let

d(i, j) =

{−δ(j, i) i < j

δ(j, i) i ≥ j

then these equations can be written in the followingcompact form:

N∑i=1i6=k

2d(i,k)2

bi −N∑i=1i 6=k

2d(i,k)2

bk − λ = 2N∑i=1i 6=k

1d(i,k) ,

for all k = 1, · · · , N

N∑i=1

bi = 0.

(3)Note that the obtained value of bi is actually an

approximation of the sample mean of the error ofsensor i, which, in turn is an unbiased estimator ofthe bias of such a sensor.

3.3 Estimating VarianceIn this section, we propose a similar technique forestimating variance of the sensor noise using theestimated bias from previous section. Given the biasvector b = [b1, b2, · · · , bN ] and sensor readings {xts},we can define matrices {xts} and β = {β(i, j)} asfollows:

xts = xts − bs (4)

β(i, j) =1

T − 1

T∑t=1

(xti − xtj

)2=

1

T − 1

T∑t=1

((xti − bi

)−(xtj − bj

))2=

1

T − 1

T∑t=1

((xti − xtj

)− (bi − bj)

)2By (1) we have xti −xtj = (rt + eti)− (rt + etj) = eti − etj ;thus, we obtain

β(i, j) =1

T − 1

T∑t=1

((eti − etj

)− (bi − bj)

)2=

1

T − 1

T∑t=1

((eti − bi

)−(etj − bj

))2=

1

T − 1

T∑t=1

(eti − bi

)2+

1

T − 1

T∑t=1

(etj − bj

)2− 2

T − 1

T∑t=1

(eti − bi

) (etj − bj

)We assume that the sensors noise is generated by

independent random variables2; as we have men-tioned, our approximations of the bias bi are actuallyapproximations of the sample mean; thus

2. We analyze our estimation method with synthetic correlateddata and the experimental results show that the our methodproduces excellent results even for correlated noise.

Page 8: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 8

1

T − 1

T∑t=1

(eti − bi

) (etj − bj

)≈ 1

T − 1

T∑t=1

(eti − ei

) (etj − ej

)≈ Cov(ei, ej) = 0

and similarly

β(i, j) =1

T − 1

T∑t=1

(eti − bi

)2+

1

T − 1

T∑t=1

(etj − bj

)2≈ 1

T − 1

T∑t=1

(eti − ei

)2+

1

T − 1

T∑t=1

(etj − ej

)2≈ σ2

i + σ2j

The above formula shows that we can estimate thevariance of sensors noise by computing the matrix β.We also compute the sum of variances of all sensorsusing the following Lemma.

Lemma 3.1 (Total Variance). Let xt be the mean ofreadings in time t, then, using (4) and our assumption

thatN∑i=1

bi = 0 , we have

xt =1

N

N∑j=1

xtj =1

N

N∑j=1

xtj ,

and the statistic

S(t) =N

T (N − 1)

N∑i=1

T∑t=1

(xti − xt

)2is an unbiased estimator of the sum of the variances of all

sensors,N∑i=1

vi.

Proof: We define xi = {xti : t = 1 · · ·T} asthe unbiased readings of sensor i. Now we form thesecond central moment of the sum of xi for all sensorsas follows:

E

N∑i=1

xi −1

N

N∑j=1

xj

2

=1

N2E

N∑i=1

N∑j=1

(xi − xj)

2

=1

N2E

N∑i=1

N∑j=1

N∑k=1

(xi − xj)(xi − xk)

=

1

N2

N∑i=1

N∑j=1

N∑k=1

(E[x2i ]− E[xixk]− E[xixj ]

+ E[xjxk])

Note that the readings xi are unbiased, thereforeE[x2

i ] is equal to variance vi of sensor i. In addition,

we assume that the sensor noise is generated byindependent random variables, thus

E[xixj ] =

{0 if i 6= j ,

vi if i = j .

Given the above equations, we have:

E

N∑i=1

xi −1

N

N∑j=1

xj

2

=1

N2

N∑i=1

(N2 E[x2

i ]−N E[x2i ])

=

N∑i=1

vi −1

N

N∑i=1

vi

=N − 1

N

N∑i=1

vi

Thus, we obtain

N∑i=1

vi =N

N − 1E

N∑i=1

xi −1

N

N∑j=1

xj

2 .

By approximating the expected value with the samplemean we get

N∑i=1

vi ≈N

T (N − 1)

T∑t=1

N∑i=1

(xti − xt

)2.

To obtain an estimation of variances of sensorsfrom the matrix β = {β(i, j)} we solve the followingminimization problem:

minimizev

N∑i=1

i−1∑j=1

(vi + vjβ(i, j)

− 1

)2

subject toN∑i=1

vi =N

T (N − 1)

N∑i=1

T∑t=1

(xti − xt

)2 (5)

Note that the constrain of the minimisation problemcomes from theorem 3.1. We again introduce a La-grangian multiplier λ and look for the extremal of thefollowing function:

G(~v) =

N∑i=1

i−1∑j=1

(vi + vjβ(i, j)

− 1

)2

+ λ

(N∑i=1

vi −N

T (N − 1)

N∑i=1

T∑t=1

(xti − xt

)2) (6)

The minimum of G is obtained by setting the gra-dient of G(~v) to zero and solving the resulting linearequations (7),

Page 9: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 9

thus obtaining

N∑i=1i 6=k

1β(i,k)2

vi +N∑i=1i 6=k

1β(i,k)2

vk + λ2 =

N∑i=1

1β(i,k) ,

for all k = 1, · · · , N

N∑i=1

vi = NT (N−1)

N∑i=1

T∑t=1

(xti − xt)2.

(7)

3.4 MLE with Known VarianceGiven matrix {xts} where xts ∼ rt + N (bs, σ

2s) and

estimated bias and variance vectors b and σ, wepropose to recover rt using (an approximate formof) the Maximum Likelihood Estimation, treating sep-arately cases of unbiased and biased sensor errors,respectively.

3.4.1 Unbiased Sensor ErrorsIn the previous sections, we proposed a novel ap-proach for estimating the bias and variance of noisefor sensors based on their readings. The variance andthe bias of a sensor noise can be interpreted as thedistance measures of the sensor readings to the truevalue of the signal. In fact, the distance measuresobtained as our estimates of the bias and variancesof sensors also make sense for non-stochastic errors.

From a heuristic point of view, we removed the“systematic component” of the error by subtractinga quantity which in the case of a stochastic errorcorresponds to an estimate of bias; this allows usto estimate the variability around such a systematiccomponent of the error, which, in case of stochasticerrors, corresponds to variance. We can now obtainan estimation which corresponds to MLE formula forthe case of zero mean normally distributed errors, butwith estimated rather than true variances. Therefore,we assume that the expected value rt of the mea-surements is the true value of the quantity measured,and is the only parameter in the likelihood function.Thus, in the expression for the likelihood function fornormally distributed unbiased case,

LN (rt) =

N∏i=1

1

σi√

2πe− 1

2

(xti−rt)2

σ2i

=

(N∏i=1

1

σi√

)e− 1

2

∑Ni=1

(xti−rt)2

σ2i

we replace σ2i by the obtained variance vi from

equation (7). Moreover, by differentiating the aboveformula with respect to rt and setting the derivativeequal to zero we get

∂rtLN (rt) =

(N∏i=1

1√2πvi

)e− 1

2

∑Ni=1

(xti−rt)2

vi

N∑i=1

(xti − rt)vi

and consequently

∂rtLN (rt) = 0⇔

N∑i=1

xtivi− rt

N∑j=1

1

vj= 0

⇔ rt =

∑Ni=1

xtivi∑N

j=11vj

Thus,

rt =

N∑i=1

1vi∑Nj=1

1vj

xti for all t = 1, · · · , T. (8)

Equation (8) provide an estimate of the true value ofthe quantity measured in a form of a weighted aver-age of sensor readings, with the sensor readings givena weight inversely proportional to the estimation oftheir error variance provided by our method:

r =

N∑s=1

wsxs (9)

Instead of using weights from (8) , we can includea small regularisation factor λ which is needed tohandle the fact that the reciprocal function has a poleat zero which can make the calculation unstable if theestimate of one of the variances is very small:

ws =1

vs+λ

N∑i=1

1vi+λ

for all s = 1, · · · , N (10)

This is in fact a technique proposed in [16] for makingIF algorithms more robust.

Note that this method estimates the reputationvector without any iteration. Thus, the computationalcomplexity of the estimation is considerably less thanthe existing IF algorithms.

3.4.2 Biased Sensor ReadingsAs the MLE can be only applied to unbiased sensorreadings, we eliminate bias from the biased readingsbased on the results from the previous bias estimationprocess. Thus, we replace equation (8) with:

rt =

N∑i=1

1vi+λ∑Nj=1

1vj+λ

(xti− bi) for all t = 1, · · · , T (11)

Another method for obtaining an estimate of truevalues from biased sensor readings is to estimate thedistance of the readings to the true values of the signalusing a combination of bias and variance of the error.Thus, let us represent the error of sensor s in time t assum of the errors due to its bias and to its variance:

ets = bs + ets

where ets ∼ N (0, vs). We now obtain

Page 10: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 10

E[(es)2] = E[(bs + es)

2] = E[b2s] + 2bsE[es] + E[(es)2]

From our assumptions we have E[es] = 0 andE[(es)

2] = vs. Consequently, the expected value of thesquare of the distance of the reading xs satisfies

E[(es)2] = vs + b2s.

and we obtain an estimator for the true value as aweighted average of the form:

rt =

N∑i=1

1vi+b2i+λ∑Nj=1

1vj+b2j+λ

xti, for all t = 1, · · · , T

(12)Our experiment results show that both above ap-

proaches (equations (11) and (12)) provide similarrobustness against faults and attacks for biased sensorreadings.

3.5 Enhanced Iterative Filtering

According to the proposed attack scenario, the at-tacker exploits the vulnerability of the IF algorithmswhich originates from a wrong assumption about theinitial trustworthiness of sensors. Our contribution toaddress this shortcomings is to employ the results ofthe proposed robust data aggregation technique as theinitial reputation for these algorithms. Moreover, theinitial weights for all sensor nodes can be computedbased on the distance of sensors readings to such aninitial reputation. Our experimental results illustratethat this idea not only consolidate the IF algorithmsagainst the proposed attack scenario, but using thisinitial reputation improves the efficiency of the IF al-gorithms by reducing the number of iterations neededto approach a stationary point within the prescribedtolerance; see Section 4.2.

3.6 Collusion Detection and Revocation

Although using the initial reputation results providedby our method makes IF algorithms more robust thantheir original version, our experiments show that theattacker can still alter considerably the reputationresults of the IF algorithms (see Section 4.5). Thus,in this section we propose a novel attacker detectiontechnique in order to further diminish the impactof the compromised nodes. We will first describeour proposed collusion detection scheme and thendiscuss the proposed compromised nodes revocationapproach.

3.6.1 Detection MethodUpon computing the reputation values from the pre-vious consolidated IF approach, we carry out a col-lusion detection and revocation method based on ananalysis of the features of error distribution of the

sensor nodes. In the existing approaches, compro-mised sensor nodes are usually detected as outliersfrom some form of average of all readings. Instead,we propose a finer analysis based on a sequence ofsensor readings, by considering how differences be-tween readings of the individual sensor nodes and theestimate obtained by an iterative filtering techniqueare distributed. The main idea behind our methodis that, while faulty or compromised sensors mightskew the estimate, their action can only make non-compromised sensor appear biased, but the variabilityof such sensors around such a value will still havea distribution close to a normal distribution; on theother hand, the difference between the values pro-vided by the compromised nodes will have highlynon normal distribution, reflecting their essentiallynon-stochastic (colluding) behaviour. Accordingly, weassume a sensor with a non-Gaussian error distribu-tion is likely to be a compromised node.

In order to analyse the error behaviour of sensornodes, we first compute the sensors errors based onthe distances of each sensor readings to the obtainedreputation from our proposed consolidated versionof IF algorithm. After that, we employ a hypothesistesting method to assess the normality of the obtainederror values for each sensor node.

Thus, let es = {ets : t = 1, · · · ,m} be the vector oferror terms for a sensor s, defined as

es = xs − r

where xs = {xts : t = 1, · · · , T} is a sequence of read-ings from sensor s and r = 〈r1, r2, . . . , rT 〉 denote theaggregate values obtained from the previous phase ofour framework.

The problem of deciding whether a sensor node sis compromised can be formulated as a hypothesistesting problem with null and alternative hypothesesas follows:• Null hypothesis H0: The sequence of errors es is

drawn from a Normal distribution.• Alternative hypothesis H1: The sequence of er-

rors es is not drawn from a Normal distribution.In order to judge the compromise sensor nodes,

we employ the Kolmogorov-Smirnov test (K-S test)on sample errors of each sensor. Using the estimatesfor the sample mean and the sample variance wenormalise the errors; the Kolmogorov-Smirnov statis-tic then quantifies a distance between the empiricaldistribution of such normalised samples of sensorerrors es and the N (0, 1) Normal distribution.

3.6.2 Revocation MethodThe proposed collusion detection scheme classifiessensor nodes in two disjoint sets: the set of thecompromised, and the set of the non-compromisednodes. We can now re-apply our proposed estimationmethod on non-compromised sensors readings only

Page 11: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 11

to produce a more accurate estimation of the truevalue of the signal. Our extensive experiments showthat we should re-run both first and second phase ofour framework on readings from non-compromisedsensors; i.e., we should do our variance and biasestimation and then either apply (12), or (11), but wedo NOT run the IF algorithm again. Our experimentsshow that this approach generates more accurate re-sults than those obtained by subsequently running theIF algorithm again. In fact, when the errors come fromstochastic sources only applying the IF algorithms notonly produces no positive effect on the accuracy, but,in fact, it often degrades it. Thus, IF is used in thefirst round because it provides a superior robustnessagainst collusion attacks; however, after the readingsfrom the compromised nodes are removed, applyingjust our non iterative method produces the best re-sults.

4 SIMULATION RESULTS

In this section, we report on a detailed numericalsimulation study that examines the robustness andefficiency of our data aggregation method. The objec-tive of our experiments is to evaluate the robustnessand efficiency of our approach for estimating thetrue values of signal based on the sensor readingsin the presence of faults and collusion attacks. Foreach experiment, we evaluate the accuracy based onroot mean squared of error (RMS error) metric andefficiency based on the number of iterations neededto reach the convergence in the iterative filteringalgorithms.

4.1 Experimental Settings

All the experiments have been conducted on an HPPC with 3.30GHz Intel Core i5-2500 processor with8Gb RAM running a 64-bit Windows 7 Enterprise. Theprogram code has been written in MATLAB R2012b.Although there are a number of real world datasets forevaluating reputation systems and data aggregationin sensor networks such as Intel dataset [20], none ofthem provides a clear ground truth. Thus, we conductour experiments by generating synthetic datasets. Theexperiments are based on simulations performed onboth correlated and uncorrelated sensor readings.If not mentioned otherwise, we generate syntheticdatasets according to the following parameters for allexperiments:• Each simulation experiment was repeated 200

times and then results were averaged;• Number of sensor nodes is N = 20;• Number of readings for each sensor T = 400;• For statistical parameters of the errors (noise)

used to corrupt the true readings, we considerseveral ranges of values for bias, variance andcovariance of noise for each experiment.

In all experiments, we compare our robust aggre-gation method against three other iterative filteringtechniques proposed for reputation systems. For allparameters of other algorithms used in the experi-ments, we set the same values as used in the originalpapers where they were introduced.

The first IF method considered computes the trust-worthiness of sensor nodes based on the distance oftheir readings to the current state of the estimatedreputation [7]. We described the details of this ap-proach in Section 2.2. We investigate two discriminantfunctions g(d) = d−1 and g(d) = 1−kld in our exper-iments and call these methods as Kerchove-Reciprocaland Kerchove-Affine, respectively.

The second IF method we consider is a correlationbased ranking algorithm proposed by Zhou et al. in[8]. In this algorithm, trustworthiness of each sensoris obtained based on the correlation coefficient be-tween the sensors readings and the current estimateof the true value of the signal. In other words, thismethod gives credit to sensor nodes whose readingscorrelate well with the estimated true value of thesignal. Based on this idea, the authors proposed aniterative algorithm for estimating the true value of thesignal by applying a weighted averaging technique.They argued that correlation coefficient is a goodway to quantify the similarity between two vectors.Thus, they employed Pearson correlation coefficientbetween sensor readings and the current state ofestimate signal in order to compute the sensor weight.We call this method as Zhou.

The third algorithm considered has been proposedby Laureti et al. in [9] and is an IF algorithm basedon a weighted averaging technique similar to thealgorithm described in Section 2.2. The only differencebetween these two algorithms is in the discriminantfunction. The authors in [9] exploited discriminantfunction g(d) = d−β and β = 0.5. We call this methodas Laureti.

We apply Kerchove-Reciprocal, Kerchove-Affine, Zhou,Laureti and our robust aggregation approach to syn-thetically generated data. Although we can simplyapply our robust framework to all existing IF ap-proaches (see ® in Fig. 3), in this paper we investi-gate the improvement which addition of our initialtrustworthiness assessment method produces on therobustness of Kerchove-Reciprocal and Kerchove-Affinemethods (We call them RobustAggregate-Reciprocal andRobustAggregate-Affine, respectively.).

TABLE 5 shows a summary of aggregation and dis-criminant functions for all of the above four differentIF methods.

We first conduct experiments by injecting onlyGaussian noise into sensor readings. In the secondpart of the experiments, we investigate the behaviourof these approaches by emulating a simple, non-colluding attack scenario presented in the second caseof Fig. 2. We then evaluate these approaches in the

Page 12: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 12

TABLE 5: Summary of different IF algorithms.

Name Discriminant Function

Kerchove-Reciprocal wl+1i =

(1

T

∥∥∥xi − rl+1∥∥∥22

)−1

Kerchove-Affine wl+1i = 1− k

1

T

∥∥∥xi − rl+1∥∥∥22

Zhou wl+1i =

1

T

T∑i=1

(xti − xt

σxi

)(rt − r

σr

)

Laureti wl+1i =

√1

T

∥∥xi − rl+1∥∥22

case of our sophisticated attack scenario. Finally, weinvestigate the performance of our collusion detectionand sensor revocation method using several evalua-tion metrics.

4.2 Accuracy and Efficiency without an AttackIn the first batch of experiments we assume that thereare no sensors with malicious behaviour. Thus, theerrors are fully stochastic; we concentrate on the carof errors of sensors with Gaussian distributions. Inorder to evaluate the performance of our algorithm incomparison with the existing algorithms, we producethe following four different synthetic datasets.

1) Unbiased error with different variances forsensors: In this scenario, we consider unbiasederrors with different variances for sensor nodes.We considered various distributions of the vari-ance across the set of sensors and obtained sim-ilar results. We have chosen to present the casewith the error of a sensor s at time t is given byets ∼ N (0, s×σ2), considering different values forthe baseline sensor variance σ2. Fig. 4(a) showsthe results of our robust aggregation approachand the information theoretic limit for the min-imal variance provided by the Cramer - Raobound, achieved, for example, using the Maxi-mum Likelihood Estimator with the actual, exactvariances of sensors, which are NOT available toour algorithm. As one can see in this figure, ourproposed approach nearly exactly achieves theminimal possible variance coming from the in-formation theoretic lower bound. Furthermore,Fig. 4(b) illustrates the performance of our ap-proach for the initial trustworthiness assessmentof sensors with different discriminant functionsas well as other IF algorithms. It clearly showsthat our approach outperforms the other meth-ods by having smaller RMS value of error.

2) Bias error: In this scenario, we inject bias er-ror to sensor readings, generated by Gaussiandistribution with different variances. Therefore,the error of sensor s in time t is generatedby ets ∼ N (N (0, σ2

b ), s × σ2) with the varianceof the bias σ2

b = 4 and increasing values forvariances, where the variance of sensor s is equal

to s × σ2. Thus, the sensors bias is producedby a zero mean Gaussian distribution randomvariable. Fig. 4(c) shows the RMS error for allalgorithms in this scenario. As can be seen in thisfigure, our proposed approach provides the bestaccuracy in terms of the RMS error. Moreover,since all of the IF algorithms, along with ourapproach, generate an error close to their errorsin the unbiased scenario, we can conclude thatthe methods are stable against biased but fullystochastic noise.

3) Correlated noise: The heuristics behind our ini-tial variance estimation assumed that the errorsof sensors are uncorrelated. Thus, we tested howthe performance of our method degrades if thenoise becomes correlated and how it comparesto the existing methods under the same circum-stances. So in this scenario, we assume that theerrors of sensors are no longer uncorrelated.Possible covariance functions can be of differenttypes, such as Spherical, Power Exponential, Ra-tional Quadratic, and Matern; see [21]. Althoughour proposed method can be applied to all co-variance functions, we present here the resultsfor the case of the Power Exponential functionρ(i, j) = e

−|i−j|N . Moreover, the variance of a

sensor s is again set to σ2s = s × σ2. From the

corresponding covariance matrix Σ = {Σij =ρ(i, j)σiσj : i, j = 1 · · ·N}, the noise val-ues of sensors are generated from multivari-ate Normal distribution Noise ∼ N (Bias,Σ).In this scenario, we take into account differentvalues of σ for generating the noise values ofsensors in order to analyse the accuracy of thedata aggregation under various levels of noise.Fig. 4(d) shows the RMS error of the algorithmsfor this scenario. As can be seen in this fig-ure, our approach with reciprocal discriminantfunction improves Kerchove-Reciprocal algorithmfor all different values of variance, althoughour method with affine function generates verysimilar RMS error to the original Kerchove-Affinealgorithm. Moreover, the scale of RMS error isin general larger than in scenarios with uncorre-lated noise, as one would expect. It can be de-scribed by our assumption that the sensors noiseis generated by independent random variables;see Section 3.3. Thus, the error of our varianceestimation for correlated data is more than theerror for uncorrelated data.

The results of our simulations show that the use ofour initial variance estimation in the second phase ofour proposed framework as the initial reputation ofIF algorithms decreases the number of iterations forthe algorithms. We evaluate the number of iterationsfor the IF algorithm proposed in [7] by providing theinitial reputation from the results of the our approach

Page 13: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 13

for both unbiased and biased sensors errors. Theresults of this experiment show that the proposedinitial reputation for the IF algorithm improves theefficiency of the algorithm in terms of the number ofiterations until the procedure has converged. In otherwords, by providing this initial reputation, the num-ber of iterations for IF algorithm decreases approx-imately 9% for reciprocal and around 8% for affinediscriminant functions in both biased and unbiasedcircumstances. This can be explained by the fact thatthe new initial reputation is close to the true valueof signal and the IF algorithm needs fewer iterationsto reach its stationary point. In the next part of ourexperiments, we employ this idea for consolidatingthe iterative filtering algorithm against the proposedattack scenario.

1 1.5 2 2.5 3 3.5 4 4.5 50

0.5

1

1.5

2

2.5

3Unbiased2: Cramer−Rao lower band and our approach

σ, Es = N (0, sσ2)

RMSError

RobustAggregateMLE with Actual Variances

(a) Unbiased error

1 1.5 2 2.5 3 3.5 4 4.5 50.5

1

1.5

2

2.5

3

3.5

4

4.5

5Unbiased2: Unbiased sensors with different variances

σ, Es = N (0, sσ2)

RMSError

Kerchove−ReciprocalKerchove−AffineZhouLauretiRobustAggregate−ReciprocalRobustAggregate−Affine

(b) Unbiased error

1 1.5 2 2.5 3 3.5 4 4.5 50.5

1

1.5

2

2.5

3

3.5

4

4.5Biased2: Biased sensors with different variances

σ, Es = N (0, 4) +N (0,σ2s)

RMSError

Kerchove−ReciprocalKerchove−AffineZhouLauretiRobustAggregate−ReciprocalRobustAggregate−Affine

(c) Bias error

1 1.5 2 2.5 3 3.5 4 4.5 52

4

6

8

10

12

14Correlated1: Correlated noises with different variances

σ, σs = σ√s, ρ(i, j) = exp(− | i− j |/N)

RMSError

Kerchove−ReciprocalKerchove−AffineZhouLauretiRobustAggregate−ReciprocalRobustAggregate−Affine

(d) Correlated noise

Fig. 4: Accuracy for No Attack scenarios with differentvariances.

4.3 Accuracy with Simple Attack ScenarioLim et al. in [18] introduced an attack scenario againsttraditional statistical aggregation approaches. We de-scribed the scenario in Section 2.4 and the secondround of Fig. 2 as a simple attack scenario usinga number of compromised node for skewing thesimple average of sensors readings. In this section,we investigate the behavior of IF algorithms againstthe simple attack scenario. Note that the objective ofthis attack scenario is to skew the sample mean ofsensors readings through reporting outlier readingsby the compromised nodes.

In order to evaluate the accuracy of the IF algo-rithms against the simple attack scenario, we assumethat the attacker compromises m (m < N ) sensornodes and reports outlier readings by these nodes. Wegenerate synthetically datasets for this attack scenarioby taking into account different values of variance forsensors errors as well as employing various number

of compromised nodes. Moreover, we generate biasedreadings for all sensor nodes with bias provided bya random variable with a distribution N (0, σ2

b ) withthe variance of bias chosen to be σ2

b = 4.Fig. 5 shows the accuracy of the IF algorithms and

our approach in the presence of such simple attackscenario. It can be seen that the estimates providedby the three approaches, Kerchove-Affine, Zhou andLaureti are significantly skewed by this attack sce-nario and their accuracy significantly decreases byincreasing the number of compromised nodes. On theother hand, Kerchove-Reciprocal provides a reasonableaccuracy for all parameter values of this simple at-tack scenario (see Fig. 5(a)). The robustness of thisdiscriminant function can be explained by the fact thatthe discriminant function sharply diminishes the con-tributions of outlier readings through assigning verylow values of weights to them. In our sophisticatedcollusion attack scenario, we exploit this property inorder to compromise systems employing such dis-criminant function.

Interestingly, the results of our approach showsa considerable improvement on Kerchove-Affine algo-rithm (see Fig. 5(f)), while it experiences no negativeimpacts on the accuracy of Kerchove-Reciprocal algo-rithm.

Moreover, comparing the RMS errors of our ap-proach for this attack scenario and the previous biasedexperiments (see Fig. 4(c)), it can clearly be seen thatour approach achieves the accuracy of without Attackscenario for both discriminant functions in this attackscenario; thus, this validates the robustness of ourapproach against this attack scenario. In next section,we show that this improvement is stable in the case ofproposed collusion attack scenario as well, while bothKerchove-Affine and Kerchove-Reciprocal algorithms arecompromised against such an attack scenario.

4.4 Accuracy with a Sophisticated Collusion At-tackIn order to illustrate the robustness of the pro-posed data aggregation method in the presence of so-phisticated attacks, we synthetically generate severaldatasets by injecting the proposed collusion attacks.Therefore, we assume that the adversary employs m(m < N ) compromised sensor nodes to launch thesophisticated attack scenario proposed in Section 2.4.The attacker uses the first m−1 compromised nodes togenerate outlier readings in order to skew the simpleaverage of all sensor readings. The adversary thenfalsifies the last sensor readings by injecting the valuesvery close to such skewed average. This collusionattack scenario makes the IF algorithm to convergeto a wrong stationary point. In order to investigatethe accuracy of the IF algorithms with this collu-sion attack scenario, we synthetically generate severaldatasets with different values for sensors variances aswell as various number of compromised nodes (m).

Page 14: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 14

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Kerchove−Reciprocal

number of colluders

RM

S E

rror

(a) Kerchove-Reciprocal

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Kerchove−Affine

number of colluders

RM

S E

rror

(b) Kerchove-Affine

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Zhou

number of colluders

RM

S E

rror

(c) Zhou

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: Laureti

number of colluders

RM

S E

rror

(d) Laureti

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: RobustAggregate−Reciprocal

number of colluders

RM

S E

rror

(e) RobustAggregate-Reciprocal

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack6: RobustAggregate−Affine

number of colluders

RM

S E

rror

(f) RobustAggregate-Affine

Fig. 5: Accuracy with a simple attack scenario.

Fig. 6 shows the accuracy of the IF algorithms andour approach in the presence of the collusion attackscenario. It can be seen that the IF algorithms withreciprocal discriminant function are highly vulnerableto such attack scenario (see Fig. 6(a) and Fig. 6(d)),while the affine discriminant function generates morerobust results in this case (see Fig. 6(b)). However,the accuracy of the affine discriminant function is stillmuch worse than the previous experiment without thecollusion attack.

This experiment shows that the collusion attackscenario can circumvent all the IF algorithms we tried.Moreover, the accuracy of the algorithms dramaticallydecreases by increasing the number of compromisednodes participated in the attack scenario. As explainedbefore, the algorithms converge to the readings of oneof the compromised nodes, namely, to the readingsof the node which reports values very close to theskewed mean. This demonstrates that an attacker withenough knowledge about the aggregation algorithmemployed can launch a sophisticated collusion attackscenario which defeats IF aggregation systems.

Fig. 6(e) and Fig. 6(f) show the accuracy of ourapproach by taking into account the IF algorithm in[7] with reciprocal and affine discriminant functions,respectively. As one can see, our proposed approachis superior to all other algorithms in terms of theaccuracy for both discriminant functions. Moreover,comparing the accuracy of our approach in this exper-

iment with the results from no attack and simple attackexperiments in Fig. 4 and Fig. 5, we can argue thatour approach is robust against the collusion attackscenario. The reason is that our approach not onlyprovides the highest accuracy for both discriminantfunctions, it actually approximately reaches the ac-curacy of the scenarios without any false data bycolluders.

As we described, the main shortcoming of the IFalgorithms in the proposed attack scenario is thatthey quickly converge to the sample mean in thepresence of the attack scenario. In order to investigatethe shortcoming, we conducted an experiment byincreasing the sensor variances as well as the numberof colluders. In this experiment, we quantified thenumber of iterations for the IF algorithm with re-ciprocal discriminant function (Kerchove-Reciprocal andRobustAggregate-Reciprocal algorithms). The results ob-tained from this experiment show that the originalversion of the IF algorithm quickly converges (afteraround five iterations) to the skewed values providedby one of the attackers, while starting with an initialreputation provided by our approach, the algorithmsrequire around 29 iterations, and, instead of converg-ing to the skewed values provided by one of theattackers, it provides a reasonable accuracy.

The results of this experiment validate that oursophisticated attack scenario is caused by the discov-ered vulnerability in the IF algorithms which sharplydiminishes the contributions of benign sensor nodeswhen one of the sensor nodes reports a value veryclose to the simple average.

4.5 Collusion Detection Performance

As we described, the third module of our robust dataaggregation framework is a novel collusion detectionsystem, which is a binary classification technique forclassifying the sensor nodes in two groups: compro-mised and non-compromised nodes. Based on theresults of this collusion detection system, we eliminatethe contributions of detected compromised nodes andthen re-run the first and second phases of our frame-work in order to obtain the final reputation based ononly the readings of non-compromised sensor nodes.Therefore, the performance of the collusion detectionsystem has a significant role in improving the accu-racy of the proposed robust aggregation framework.

In order to show how much the collusion detectionmodule improves the accuracy of the proposed robustaggregation framework, we investigate the RMS errorof the framework in presence of our sophisticatedattack scenario. The number of compromised nodes isset to 8 and the discriminant function is set to Affine.Fig. 7 shows the performance of proposed robustframework with the collusion detection module andwithout the module. It can be clearly seen that thecollusion detection module dramatically improved the

Page 15: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 15

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: Kerchove−Reciprocal

number of colluders

RM

S E

rror

(a) Kerchove-Reciprocal1

23

45

4

6

80

10

20

30

40

50

standard deviation

Attack5: Kerchove−Affine

number of colluders

RM

S E

rror

(b) Kerchove-Affine

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: Zhou

number of colluders

RM

S E

rror

(c) Zhou

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: Laureti

number of colluders

RM

S E

rror

(d) Laureti

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: RobustAggregate−Reciprocal

number of colluders

RM

S E

rror

(e) RobustAggregate-Reciprocal

12

34

5

4

6

80

10

20

30

40

50

standard deviation

Attack5: RobustAggregate−Affine

number of colluders

RM

S E

rror

(f) RobustAggregate-Affine

Fig. 6: Accuracy with our sophisticated collusion at-tack.

1 1.5 2 2.5 3 3.5 4 4.5 50

1

2

3

4

5

6

7

8

9

10Sophisticated Attack, with and without collusion detection

standard deviation

RMSError

Without Collusion DetectionWith Collusion Detection

Fig. 7: Impact of collusion detection module on theperformance of our aggregation framework.

accuracy of our robust aggregation framework. Thisis due to that the module can accurately detect thecompromised nodes. Therefore, the next steps of therobust framework only applied on the benign read-ings.

The detection performance of the module is eval-uated by its accuracy, precision, and recall measure-ments for each experimental scenario. A higher valueshows that the collusion detection module is superior.The accuracy is the proportion of the total numberof predictions that were correct; the recall or truepositive rate is the proportion of colluders that werecorrectly detected; precision is the proportion of thedetected colluders that were correct. Accuracy, preci-

TABLE 6: Confusion matrix for collusion detection.

Actual Class Predicted ClassColluder Benign

Colluder True Positive (TP) False Negative (FN)Benign False Positive (FP) True Negative (TN)

TABLE 7: Accuracy of our collusion detection moduleon No Attack scenarios.

Unbiased Biased Correlatedσ Reciprocal Affine Reciprocal Affine Reciprocal Affine

1.5 95.125 98.675 97.075 99.325 98.65 98.7752 94.9 98.95 96.95 98.75 98.7 99

2.5 95.5 99 96.55 98.975 98.8 993 95.125 98.925 96.9 99.075 98.875 98.65

3.5 95.075 98.825 96.125 99 99 99.1754 94.85 99.275 96.4 99.1 98.625 98.975

4.5 95.15 98.825 96.2 98.65 98.675 99.0255 94.925 98.65 96.075 99.1 98.675 99.25

sion and recall measurements are calculated based ona confusion matrix in TABLE 6 as follows:

Accuracy =TP + TN

TP + TN + FP + FN× 100 (13)

Precision =TP

TP + FP× 100 (14)

Recall =TP

TP + FN× 100 (15)

For all experiments described in previous sections,we obtained the confusion matrix as well as theaccuracy, precision and recall measurements for col-lusion detection module. We first investigate the per-formance of this module in the no-attack experimentalscenarios described in Section 4.2. Note that we canonly investigate the accuracy metric for those scenar-ios, because there is no compromised node for themand therefore TP = 0. Consequently, the precisionand recall measurements are zero for all the cases.

The accuracy results of the collusion detectionmodule for the No Attack scenarios are presented inTABLE 7. The table shows that for all experiments,the collusion detection mechanism generates a veryhigh accuracy. Thus, applying this mechanism oncompletely clean readings has no impact on the per-formance of data aggregation process. As one can seein this table, the integration of the collusion detectionmodule with affine discriminant function generateshigher accuracy than reciprocal function. The reasonis that our data aggregation with affine function pro-vides more accurate estimation of the true value ofthe signal as well (see Fig. 4(b) and Fig. 4(c)) andit therefore, leads to more accurate error values forsensors nodes. As our collusion detection method isbased on the error behaviour of sensor nodes, themore accurate estimation of the errors are, the moreeffective the collusion detection module is.

TABLE 8 shows the performance results of thecollusion detection module for described simple and

Page 16: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 16

TABLE 8: Performance of our collusion detectionmodule against attack scenarios.

Simple Attack Sophisticated AttackMetric Reciprocal Affine Reciprocal Affine

Accuracy 96.848 97.662 99.192 99.245Precision 94.573 96.721 96.725 96.900

Recall 94.852 94.966 100 100

sophisticated collusion attack scenarios based on theaverage values of three metrics accuracy, precisionand recall for previous experiments. The table showsthat for both attack scenarios, the collusion detectionmechanism is able to successfully detect the compro-mised nodes with high accuracy, precision and recallvalues.

5 RELATED WORK

Robust data aggregation is a serious concern in WSNsand there are a number of papers investigating mali-cious data injection by taking into account the variousadversary models. There are three bodies of workrelated to our research: IF algorithms, trust and repu-tation systems for WSNs, and secure data aggregationwith compromised node detection in WSNs.

There are a number of published studies intro-ducing IF algorithms for solving data aggregationproblem [8], [7], [9], [10], [11], [12], [13], [14], [15].The primary idea of the algorithm proposed in [8]is to compute correlation coefficients between usersand objects, which gives credit to users which ratingscorrelate nicely with the estimated true ratings ofobjects. Laureti et al. in [9] proposed an IF algorithmbased on a weighted averaging technique which theweights are computed through a simple reciprocaldiscriminant function. Li et al. in [11] proposed sixdifferent algorithms, which are all iterative and arevery similar. The only difference among the algo-rithms is their choice of norm and aggregation func-tion. Ayday et al. proposed a slight different iterativealgorithm in [12]. Their main differences from theother algorithms are: 1) the ratings have a time-discount factor, so in time, their importance will fadeout; and 2) the algorithm maintains a black-list ofusers who are especially bad raters. Liao et al. in[13] proposed an iterative algorithm which beyondsimply using the rating matrix, also uses the socialnetwork of users. The main objective of author in[14] is to introduce a “Bias-smoothed tensor model”,which is a Bayesian model, of rather high complexity.Although the existing IF algorithms consider simplecheating behaviour by adversaries, none of them takeinto account sophisticated malicious scenarios such ascollusion attacks.

Our work is also closely related to the trust and rep-utation systems in WSNs. Authors in [22] proposed ageneral reputation framework for sensor networks inwhich each node develops a reputation estimation forother nodes by observing its neighbors which make

a trust community for sensor nodes in the network.Xiao et al. in [23] proposed a trust based frameworkwhich employs correlation to detect faulty readings.Moreover, they introduced a ranking framework toassociate a level of trustworthiness with each sensornode based on the number of neighboring sensornodes are supporting the sensor. Li et al. in [24]proposed PRESTO, a model-driven predictive datamanagement architecture for hierarchical sensor net-works. PRESTO is a two tier framework for sen-sor data management in sensor networks. The mainidea of this framework is to consider a number ofproxy nodes for managing sensed data from sensornodes. Authors in [5] proposed an interdependencyrelationship between network nodes and data itemsfor assessing their trust scores based on a cyclicalframework. The main contribution of authors in [25]is to propose a combination of trust mechanism, dataaggregation, and fault tolerance to enhance data trust-worthiness in Wireless Multimedia Sensor Networks(WMSNs) which considers both discrete and continu-ous data streams. Tang et al. in [26] proposed a trustframework for sensor networks in Cyber PhysicalSystem (CPS). An example of deployment of sensorsin CPS is a battle-network system in which the sensornodes are employed to detect approaching enemiesand send alarms to a command center. Although faultdetection problems have been addressed by applyingtrust and reputation systems in the above research,none of them take into account sophisticated mali-cious scenarios such as collusion attacks in adversarialenvironments.

Reputation and trust concepts can be used to over-come the compromised node detection and securedata aggregation problems in WSNs. Alzaid in [27]proposed a secure aggregation scheme to addressbad mouthing, ballot stuffing, replay and newcomerattacks; however the scheme is limited to detectingthe On/Off attack launched from only one child cell.Ho et al. in [28] proposed a framework to detectcompromised sensor nodes in WSN and then applya software attestation for the detected nodes. Theyreported that the revocation of detected compromisednodes can not be performed due to a high risk of falsepositive in the proposed scheme. The main idea offalse aggregator detection in the scheme proposed in[29] is to employ a number of monitoring nodes whichare running aggregation operations and providing aMAC value of their aggregation results as a part ofMAC in the value computed by the cluster aggregator.High computation and transmission cost required forMAC-based integrity checking in this scheme makesit unsuitable for deployment in WSN. Lim et al.in [18] proposed a game-theoretical defense strategyto protect sensor nodes and to guarantee a highlevel of trustworthiness for sensed data. Although theaforementioned research take into account false datainjection for a number of simple attack scenarios, to

Page 17: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 17

the best of our knowledge, no existing work addressesthis issue in the case of a sophisticated attack of col-luding adversaries compromising a number of nodesin a manner which employs high level knowledgeabout data aggregation algorithm used.

6 CONCLUSIONSIn this paper, we introduced a novel sophisticatedcollusion attack scenario against a number of existingIF algorithms. We also showed how an attacker withenough knowledge about aggregation algorithm candistort the aggregation process by compromising anumber of sensor nodes in a WSN. Moreover, weproposed an improvement for the IF algorithms byproviding an initial approximation of the trustwor-thiness of sensor nodes which makes the algorithmsnot only collusion robust, but also more accurate andfaster converging. We also extended the IF algorithmswith a novel approach for collusion detection andrevocation based on an initial approximation of theaggregate values as well as distribution of differencesof each sensor readings and such an approximation.In future work, we will extend the proposed robustaggregation framework for WSNs in the presence ofcorrelated noise. We also plan to deploy our approachin a deployed sensor network.

REFERENCES[1] S. Ozdemir and Y. Xiao, “Secure data aggregation in wireless

sensor networks: A comprehensive overview,” Comput. Netw.,vol. 53, no. 12, pp. 2022–2037, Aug. 2009.

[2] A. Jøsang and J. Golbeck, “Challenges for robust trust andreputation systems,” in Proceedings of the 5 th InternationalWorkshop on Security and Trust Management, Saint Malo, France,2009.

[3] K. Hoffman, D. Zage, and C. Nita-Rotaru, “A survey of attackand defense techniques for reputation systems,” ACM Comput.Surv., vol. 42, no. 1, pp. 1:1–1:31, Dec. 2009.

[4] R. Roman, C. Fernandez-Gago, J. Lopez, and H. H. Chen,“Trust and reputation systems for wireless sensor networks,”in Security and Privacy in Mobile and Wireless Networking,S. Gritzalis, T. Karygiannis, and C. Skianis, Eds. TroubadorPublishing Ltd, 2009, pp. 105–128.

[5] H.-S. Lim, Y.-S. Moon, and E. Bertino, “Provenance-basedtrustworthiness assessment in sensor networks,” in Proceedingsof the Seventh International Workshop on Data Management forSensor Networks, ser. DMSN ’10, 2010, pp. 2–7.

[6] H.-L. Shi, K. M. Hou, H. ying Zhou, and X. Liu, “Energyefficient and fault tolerant multicore wireless sensor network:E2MWSN ,” in Wireless Communications, Networking and Mo-bile Computing (WiCOM), 2011 7th International Conference on,2011, pp. 1–4.

[7] C. de Kerchove and P. Van Dooren, “Iterative filtering inreputation systems,” SIAM J. Matrix Anal. Appl., vol. 31, no. 4,pp. 1812–1834, Mar. 2010.

[8] Y. Zhou, T. Lei, and T. Zhou, “A robust ranking algorithm tospamming,” CoRR, vol. abs/1012.3793, 2010.

[9] P. Laureti, L. Moret, Y.-C. Zhang, and Y.-K. Yu, “Informationfiltering via Iterative Refinement,” EPL (Europhysics Letters),vol. 75, pp. 1006–1012, Sep. 2006.

[10] Y.-K. Yu, Y.-C. Zhang, P. Laureti, and L. Moret, “Decodinginformation from noisy, redundant, and intentionally distortedsources,” Physica A Statistical Mechanics and its Applications, vol.371, pp. 732–744, Nov. 2006.

[11] R.-H. Li, J. X. Yu, X. Huang, and H. Cheng, “Robust reputation-based ranking on bipartite rating networks,” in SDM’12, 2012,pp. 612–623.

[12] E. Ayday, H. Lee, and F. Fekri, “An iterative algorithm for trustand reputation management,” in Proceedings of the 2009 IEEEinternational conference on Symposium on Information Theory -Volume 3, ser. ISIT’09, 2009, pp. 2051–2055.

[13] H. Liao, G. Cimini, and M. Medo, “Measuring quality, repu-tation and trust in online communities,” ArXiv e-prints, Aug.2012.

[14] B.-C. Chen, J. Guo, B. Tseng, and J. Yang, “User reputation in acomment rating environment,” in Proceedings of the 17th ACMSIGKDD international conference on Knowledge discovery and datamining, ser. KDD ’11, 2011, pp. 159–167.

[15] M. Allahbakhsh and A. Ignjatovic, “Rating through Voting:An Iterative Method for Robust Rating,” ArXiv e-prints, Nov.2012.

[16] C. T. Chou, A. Ignatovic, and W. Hu, “Efficient computation ofrobust average of compressive sensing data in wireless sensornetworks in the presence of sensor faults,” IEEE Transactionson Parallel and Distributed Systems, vol. 99, no. PrePrints, p. 1,2012.

[17] Y. Yu, K. Li, W. Zhou, and P. Li, “Trust mechanisms in wirelesssensor networks: Attack analysis and countermeasures,”Journal of Network and Computer Applications, vol. 35,no. 3, pp. 867 – 880, 2012, ¡ce:title¿Special Issueon Trusted Computing and Communications¡/ce:title¿.[Online]. Available: http://www.sciencedirect.com/science/article/pii/S1084804511000592

[18] H.-S. Lim, G. Ghinita, E. Bertino, and M. Kantarcioglu, “Agame-theoretic approach for high-assurance of data trustwor-thiness in sensor networks,” in Data Engineering (ICDE), 2012IEEE 28th International Conference on, april 2012, pp. 1192 –1203.

[19] D. Wagner, “Resilient aggregation in sensor networks,”in Proceedings of the 2nd ACM workshop on Security ofad hoc and sensor networks, ser. SASN ’04. New York,NY, USA: ACM, 2004, pp. 78–87. [Online]. Available:http://doi.acm.org/10.1145/1029102.1029116

[20] “The Intel lab data,” Data set available at: http://berkeley.intel-research.net/ labdata/ , 2004.

[21] M. C. Vuran and I. F. Akyildiz, “Spatial correlation-basedcollaborative medium access control in wireless sensornetworks,” IEEE/ACM Trans. Netw., vol. 14, no. 2, pp.316–329, Apr. 2006. [Online]. Available: http://dx.doi.org/10.1109/TENT.2006.872544

[22] S. Ganeriwal, L. K. Balzano, and M. B. Srivastava, “Reputation-based framework for high integrity sensor networks,” ACMTrans. Sen. Netw., vol. 4, no. 3, pp. 15:1–15:37, Jun. 2008.

[23] X.-Y. Xiao, W.-C. Peng, C.-C. Hung, and W.-C. Lee,“Using SensorRanks for in-network detection of faultyreadings in wireless sensor networks,” in Proceedings ofthe 6th ACM international workshop on Data engineeringfor wireless and mobile access, ser. MobiDE ’07. NewYork, NY, USA: ACM, 2007, pp. 1–8. [Online]. Available:http://doi.acm.org/10.1145/1254850.1254852

[24] M. Li, D. Ganesan, and P. Shenoy, “PRESTO: feedback-drivendata management in sensor networks,” in Proceedings of the3rd conference on Networked Systems Design & Implementation -Volume 3, ser. NSDI’06, 2006, pp. 23–23.

[25] Y. Sun, H. Luo, and S. K. Das, “A trust-based framework forfault-tolerant data aggregation in wireless multimedia sensornetworks,” IEEE Trans. Dependable Secur. Comput., vol. 9, no. 6,pp. 785–797, Nov. 2012.

[26] L.-A. Tang, X. Yu, S. Kim, J. Han, C.-C. Hung, and W.-C. Peng,“Tru-Alarm: Trustworthiness analysis of sensor networks incyber-physical systems,” in Proceedings of the 2010 IEEE Inter-national Conference on Data Mining, ser. ICDM ’10, 2010, pp.1079–1084.

[27] H. M. Alzaid, “Secure data aggregation in wireless sensornetworks,” Ph.D. dissertation, Queensland University of Tech-nology, 2011.

[28] J.-W. Ho, M. Wright, and S. Das, “ZoneTrust: Fast zone-basednode compromise detection and revocation in wireless sensornetworks using sequential hypothesis testing,” Dependable andSecure Computing, IEEE Transactions on, vol. 9, no. 4, pp. 494–511, july-aug. 2012.

[29] S. Ozdemir and H. Cam, “Integration of false data detectionwith data aggregation and confidential transmission in wire-less sensor networks,” IEEE/ACM Trans. Netw., vol. 18, no. 3,pp. 736–749, Jun. 2010.

Page 18: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE …cs4121/Paper 1.pdf · Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks Mohsen Rezvani,

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (TDSC) 18

Mohsen Rezvani is a PhD candidate in the School of ComputerScience and Engineering at the University of New South Wales,Australia. His research focuses on trust and reputation systems inWSNs. He received the M.Sc. degree in software engineering fromSharif University of Technology, Iran, and is a student member ofIEEE. Contact him at [email protected].

Aleksandar Ignjatovic is a senior lecturer in the School of ComputerScience and Engineering at the University of New South Wales, Aus-tralia. His current research interests include approximation theory,sampling theory and applied harmonic analysis, algorithm designand applications of mathematical logic to computational complexitytheory. He has a PhD in mathematical logic from the University ofCalifornia, Berkeley. Contact him at [email protected].

Elisa Bertino is a professor of computer science at Purdue Univer-sity and serves as research director for the Center for Educationand Research in Information Assurance and Security (CERIAS).Her main research interests include security, privacy, digital identitymanagement systems, database systems, distributed systems, andmultimedia systems. She is a fellow of IEEE and ACM and has beennamed a Golden Core Member for her service to the IEEE ComputerSociety. Contact her at [email protected].

Sanjay Jha is a professor of computer science at the University ofNew South Wales, Australia. His research activities cover a widerange of topics in networking including Wireless Sensor Networks,Adhoc/Community wireless networks, Resilience and Multicastingin IP Networks and Security protocols for wired/wireless networks.He has a PhD in computer science from University of Technology,Sydney, Australia and s a senior member of IEEE. Contact him [email protected].