
Purdue University
Purdue e-Pubs

Cyber Center Publications Cyber Center

1-13-2015

Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks

Mohsen Rezvani, University of New South Wales, [email protected]

Aleksandar Ignjatovic, University of New South Wales, [email protected]

Sanjay Jha, University of New South Wales, [email protected]

Elisa Bertino, Purdue University, [email protected]

Follow this and additional works at: http://docs.lib.purdue.edu/ccpubs

Part of the Engineering Commons, Life Sciences Commons, Medicine and Health Sciences Commons, and the Physical Sciences and Mathematics Commons

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information.

Rezvani, Mohsen; Ignjatovic, Aleksandar; Jha, Sanjay; and Bertino, Elisa, "Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks" (2015). Cyber Center Publications. Paper 640. http://dx.doi.org/10.1109/TDSC.2014.2316816


Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks

Mohsen Rezvani, Student Member, IEEE, Aleksandar Ignjatovic, Elisa Bertino, Fellow, IEEE, and Sanjay Jha, Senior Member, IEEE

Abstract: Due to limited computational power and energy resources, aggregation of data from multiple sensor nodes done at the aggregating node is usually accomplished by simple methods such as averaging. However, such aggregation is known to be highly vulnerable to node compromising attacks. Since WSNs are usually unattended and without tamper resistant hardware, they are highly susceptible to such attacks. Thus, ascertaining trustworthiness of data and reputation of sensor nodes is crucial for WSNs. As the performance of very low power processors dramatically improves, future aggregator nodes will be capable of performing more sophisticated data aggregation algorithms, thus making WSNs less vulnerable. Iterative filtering algorithms hold great promise for such a purpose. Such algorithms simultaneously aggregate data from multiple sources and provide trust assessment of these sources, usually in the form of corresponding weight factors assigned to data provided by each source. In this paper we demonstrate that several existing iterative filtering algorithms, while significantly more robust against collusion attacks than the simple averaging methods, are nevertheless susceptible to a novel sophisticated collusion attack we introduce. To address this security issue, we propose an improvement for iterative filtering techniques by providing an initial approximation for such algorithms which makes them not only collusion robust, but also more accurate and faster converging.

Index Terms—Wireless sensor networks, robust data aggregation, collusion attacks


1 INTRODUCTION

DUE to a need for robustness of monitoring and low cost of the nodes, wireless sensor networks (WSNs) are usually redundant. Data from multiple sensors is aggregated at an aggregator node which then forwards to the base station only the aggregate values. At present, due to limitations of the computing power and energy resource of sensor nodes, data is aggregated by extremely simple algorithms such as averaging. However, such aggregation is known to be very vulnerable to faults, and more importantly, malicious attacks [1]. This cannot be remedied by cryptographic methods, because the attackers generally gain complete access to information stored in the compromised nodes. For that reason data aggregation at the aggregator node has to be accompanied by an assessment of trustworthiness of data from individual sensor nodes. Thus, better, more sophisticated algorithms are needed for data aggregation in the future WSN. Such an algorithm should have two features.

1. In the presence of stochastic errors such an algorithm should produce estimates which are close to the optimal ones in the information theoretic sense. Thus, for example, if the noise present in each sensor is independent Gaussian noise with zero mean, then the estimate produced by such an algorithm should have a variance close to the Cramer-Rao lower bound (CRLB) [2], i.e., it should be close to the variance of the Maximum Likelihood Estimator (MLE). However, such estimation should be achieved without supplying to the algorithm the variances of the sensors, which are unavailable in practice.

2. The algorithm should also be robust in the presence of non-stochastic errors, such as faults and malicious attacks, and, besides aggregating data, such an algorithm should also provide an assessment of the reliability and trustworthiness of the data received from each sensor node.
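To make the first feature concrete, the following is a standard textbook calculation rather than anything specific to this paper. It assumes a single instant, readings $x_i = r + e_i$ with independent zero-mean Gaussian errors of known variances $\sigma_i^2$, and compares the MLE (which attains the CRLB in this setting) with the simple average. The variances used in the numerical line are purely illustrative.

```latex
\hat{r}_{\mathrm{MLE}} = \frac{\sum_{i=1}^{n} x_i/\sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2},
\qquad
\operatorname{Var}(\hat{r}_{\mathrm{MLE}}) = \frac{1}{\sum_{i=1}^{n} 1/\sigma_i^2},
\qquad
\operatorname{Var}\Big(\tfrac{1}{n}\sum_{i=1}^{n} x_i\Big) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_i^2 .

% Illustrative example with (\sigma_1^2, \sigma_2^2, \sigma_3^2) = (1, 1, 16):
\operatorname{Var}(\hat{r}_{\mathrm{MLE}}) = \frac{1}{1 + 1 + 1/16} = \frac{16}{33} \approx 0.48,
\qquad
\operatorname{Var}(\text{simple average}) = \frac{1 + 1 + 16}{9} = 2 .
```

In this example one noisy sensor makes the simple average roughly four times worse than the bound, which is the gap the first feature asks an aggregation algorithm to close without being told the variances.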

Trust and reputation systems have a significant role in supporting operation of a wide range of distributed systems, from wireless sensor networks and e-commerce infrastructure to social networks, by providing an assessment of trustworthiness of participants in such distributed systems. A trustworthiness assessment at any given moment represents an aggregate of the behaviour of the participants up to that moment and has to be robust in the presence of various types of faults and malicious behaviour. There are a number of incentives for attackers to manipulate the trust and reputation scores of participants in a distributed system, and such manipulation can severely impair the performance of such a system [3]. The main targets of malicious attackers are the aggregation algorithms of trust and reputation systems [4].

• M. Rezvani, A. Ignjatovic, and S. Jha are with the School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia. M. Rezvani acknowledges the support of CSIRO. E-mail: {mrezvani, ignjat, sanjay}@cse.unsw.edu.au, [email protected].

• E. Bertino is with the Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907-2107. E-mail: [email protected].

Manuscript received 30 July 2013; revised 19 Dec. 2013; accepted 2 Apr. 2014. Date of publication 10 Apr. 2014; date of current version 16 Jan. 2015. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TDSC.2014.2316816

98 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 1, JANUARY/FEBRUARY 2015

1545-5971 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Trust and reputation have been recently suggested as an effective security mechanism for Wireless Sensor Networks [5]. Although sensor networks are being increasingly deployed in many application domains, assessing trustworthiness of reported data from distributed sensors has remained a challenging issue. Sensors deployed in hostile environments may be subject to node compromising attacks by adversaries who intend to inject false data into the system. In this context, assessing the trustworthiness of the collected data becomes a challenging task [6].

As the computational power of very low power processors dramatically increases, mostly driven by demands of mobile computing, and as the cost of such technology drops, WSNs will be able to afford hardware which can implement more sophisticated data aggregation and trust assessment algorithms; an example is the recent emergence of multi-core and multi-processor systems in sensor nodes [7].

Iterative Filtering (IF) algorithms are an attractive option for WSNs because they solve both problems (data aggregation and data trustworthiness assessment) using a single iterative procedure [8]. The trustworthiness estimate of each sensor is based on the distance of the readings of that sensor from the estimate of the correct values, obtained in the previous round of iteration by some form of aggregation of the readings of all sensors. Such aggregation is usually a weighted average; sensors whose readings significantly differ from such an estimate are assigned less trustworthiness and consequently, in the aggregation process in the present round of iteration, their readings are given a lower weight.

In recent years, there has been an increasing amount of literature on IF algorithms for trust and reputation systems [8], [9], [10], [11], [12], [13], [14], [15]. The performance of IF algorithms in the presence of different types of faults and simple false data injection attacks has been studied, for example in [16] where it was applied to compressive sensing data in WSNs. In the past literature it was found that these algorithms exhibit better robustness compared to the simple averaging techniques; however, the past research did not take into account more sophisticated collusion attack scenarios. If the attackers have a high level of knowledge about the aggregation algorithm and its parameters, they can conduct sophisticated attacks on WSNs by exploiting false data injection through a number of compromised nodes. This paper presents a new sophisticated collusion attack scenario against a number of existing IF algorithms based on the false data injection. In such an attack scenario, colluders attempt to skew the aggregate value by forcing such IF algorithms to converge to skewed values provided by one of the attackers.

Although the proposed attack is applicable to a broad range of distributed systems, it is particularly dangerous once launched against WSNs for two reasons. First, trust and reputation systems play a critical role in WSNs as a method of resolving a number of important problems, such as secure routing, fault tolerance, false data detection, compromised node detection, secure data aggregation, cluster head election, outlier detection, etc. [17]. Second, sensors which are deployed in hostile and unattended environments are highly susceptible to node compromising attacks [18]. Our simulation results demonstrate that current IF algorithms, while offering better protection than simple averaging, are indeed vulnerable to this new attack strategy.

As we will see, such vulnerability to sophisticated collusion attacks comes from the fact that these IF algorithms start the iteration process by giving an equal trust value to all sensor nodes. In this paper, we propose a solution for such vulnerability by providing an initial trust estimate which is based on a robust estimation of errors of individual sensors. When the nature of errors is stochastic, such errors essentially represent an approximation of the error parameters of sensor nodes in WSN such as bias and variance. However, such estimates also prove to be robust in cases when the error is not stochastic but due to coordinated malicious activities. Such initial estimation makes IF algorithms robust against the described sophisticated collusion attack, and, we believe, also more robust under significantly more general circumstances; for example, it is also effective in the presence of a complete failure of some of the sensor nodes. This is in contrast with the traditional non iterative statistical sample estimation methods which are not robust against false data injection by a number of compromised nodes [18] and which can be severely skewed in the presence of a complete sensor failure.

Since readings keep streaming into aggregator nodes in WSNs, and since attacks can be very dynamic (such as orchestrated attacks [4]), in order to obtain trustworthiness of nodes as well as to identify compromised nodes we apply our framework on consecutive batches of consecutive readings. Sensors are deemed compromised only relative to a particular batch; this allows our framework to handle on-off type of attacks (called orchestrated attacks in [4]).

We validate the performance of our algorithm by simulation on synthetically generated data sets. Our simulation results illustrate that our robust aggregation technique is effective in terms of robustness against our novel sophisticated attack scenario as well as efficient in terms of the computational cost.

Our contributions can be summarized as follows:¹

1. Identification of a new sophisticated collusion attack against IF based reputation systems which reveals a severe vulnerability of IF algorithms.

2. A novel method for estimation of sensors' errors which is effective in a wide range of sensor faults and not susceptible to the described attack.

3. Design of an efficient and robust aggregation method inspired by the MLE, which utilises an estimate of the noise parameters obtained using contribution 2 above.

4. Enhanced IF schemes able to protect against sophisticated collusion attacks by providing an initial estimate of trustworthiness of sensors using inputs from contributions 2 and 3 above.

¹ An extended version of this paper has been published as a technical report in [19].

REZVANI ET AL.: SECURE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR NETWORKS IN THE PRESENCE OF COLLUSION... 99


We provide a thorough empirical evaluation of effectiveness and efficiency of our proposed aggregation method. The results show that our method provides both higher accuracy and better collusion resistance than the existing methods.

The remainder of this paper is organized as follows. Section 2 describes the problem statement and the assumptions. Section 3 presents our novel robust data aggregation framework. Section 4 describes our experimental results. Section 5 presents the related work. Finally, the paper is concluded in Section 6.

2 BACKGROUND, ASSUMPTIONS, THREAT MODEL AND PROBLEM STATEMENT

In this section, we present our assumptions, discuss IF algorithms, describe a collusion attack scenario against IF algorithms, and state the problems that we address in this paper.

2.1 Network Model

For the sensor network topology, we consider the abstract model proposed by Wagner in [20]. Fig. 1 shows our assumption for network model in WSN. The sensor nodes are divided into disjoint clusters, and each cluster has a cluster head which acts as an aggregator. Data are periodically collected and aggregated by the aggregator. In this paper we assume that the aggregator itself is not compromised and concentrate on algorithms which make aggregation secure when the individual sensor nodes might be compromised and might be sending false data to the aggregator. We assume that each data aggregator has enough computational power to run an IF algorithm for data aggregation.

2.2 Iterative Filtering in Reputation Systems

Kerchove and Van Dooren proposed in [8] an IF algorithm for computing reputation of objects and raters in a rating system. We briefly describe the algorithm in the context of data aggregation in WSN and explain the vulnerability of the algorithm to a possible collusion attack. We note that our improvement is applicable to other IF algorithms as well.

We consider a WSN with $n$ sensors $S_i$, $i = 1, \ldots, n$. We assume that the aggregator works on one block of readings at a time, each block comprising readings at $m$ consecutive instants. Therefore, a block of readings is represented by a matrix $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ where $\mathbf{x}_i = [x_i^1\ x_i^2\ \ldots\ x_i^m]^T$, $(1 \le i \le n)$ represents the $i$th $m$-dimensional reading reported by sensor node $S_i$. Let $\mathbf{r} = [r_1\ r_2\ \ldots\ r_m]^T$ denote the aggregate values for instants $t = 1, \ldots, m$, which the authors of [8] call a reputation vector,² computed iteratively and simultaneously with a sequence of weights $\mathbf{w} = [w_1\ w_2\ \ldots\ w_n]^T$ reflecting the trustworthiness of sensors. We denote by $\mathbf{r}^{(l)}, \mathbf{w}^{(l)}$ the approximations of $\mathbf{r}, \mathbf{w}$ obtained at the $l$th round of iteration ($l \ge 0$).

The iterative procedure starts with giving equal credibility to all sensors, i.e., with an initial value $\mathbf{w}^{(0)} = \mathbf{1}$. The value of the reputation vector $\mathbf{r}^{(l+1)}$ in round of iteration $l+1$ is obtained from the weights of the sensors obtained in the round of iteration $l$ as

$$\mathbf{r}^{(l+1)} = \frac{X \cdot \mathbf{w}^{(l)}}{\sum_{i=1}^{n} w_i^{(l)}}.$$

Consequently, the initial reputation vector is $\mathbf{r}^{(1)} = \frac{1}{n} X \cdot \mathbf{1}$, i.e., $\mathbf{r}^{(1)}$ is just the sequence of simple averages of the readings of all sensors at each particular instant. The new weight vector $\mathbf{w}^{(l+1)}$ to be used in round of iteration $l+1$ is then computed as a function $g(\mathbf{d})$ of the normalized belief divergence $\mathbf{d}$ which is the distance between the sensor readings and the reputation vector $\mathbf{r}^{(l+1)}$. Thus, $\mathbf{d} = [d_1\ d_2\ \ldots\ d_n]^T$, $d_i = \frac{1}{m}\left\|\mathbf{x}_i - \mathbf{r}^{(l+1)}\right\|_2^2$ and $w_i^{(l+1)} = g(d_i)$, $(1 \le i \le n)$.

Function $g(x)$ is called the discriminant function and it provides an inverse relationship of weights to distances $d$. Our experiments show that selecting a discriminant function has a significant role in stability and robustness of IF algorithms. A number of alternatives for this function are studied in [8]:

• reciprocal: $g(d) = d^{-k}$;

• exponential: $g(d) = e^{-d}$;

• affine: $g(d) = 1 - k_l d$, where $k_l > 0$ is chosen so that $g(\max_i \{d_i^{(l)}\}) = 0$.

Algorithm 1 illustrates the iterative computation of the reputation vector based on the above formulas. Table 1 shows a trace example of this algorithm. The sensor readings in the first three rows of this table are from sensed temperatures in Intel Lab data set [21] at three different time instants. We executed the IF algorithm on the readings; the discriminant function in the algorithm was a reciprocal of the distance between sensor readings and the current computed reputation. The lower part of the table illustrates the weight vector in each iteration as well as the obtained reputation values for the three different time instants ($t_1$, $t_2$, $t_3$) in the last three columns. As can be seen, the algorithm converges after six iterations.
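The update rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function name `iterative_filtering`, the stopping tolerance `tol`, the iteration cap `max_iter`, and the small constant `eps` (added so the reciprocal discriminant stays finite when a distance is exactly zero) are all hypothetical choices, and the readings are made-up values rather than the Intel Lab data.

```python
def iterative_filtering(readings, k=1.0, eps=1e-12, tol=1e-9, max_iter=100):
    """Iterative filtering with the reciprocal discriminant g(d) = d**(-k).

    readings[i][t] is the reading of sensor i at instant t.
    Returns (reputation, weights).
    """
    n, m = len(readings), len(readings[0])
    weights = [1.0] * n            # w^(0) = 1: equal initial credibility
    reputation = [0.0] * m
    for _ in range(max_iter):
        total = sum(weights)
        # r^(l+1) = X . w^(l) / sum_i w_i^(l)
        new_reputation = [
            sum(weights[i] * readings[i][t] for i in range(n)) / total
            for t in range(m)
        ]
        # d_i = (1/m) * ||x_i - r^(l+1)||^2
        distances = [
            sum((readings[i][t] - new_reputation[t]) ** 2 for t in range(m)) / m
            for i in range(n)
        ]
        # w_i^(l+1) = g(d_i); eps is an assumption that avoids division by zero
        weights = [(d + eps) ** (-k) for d in distances]
        change = max(abs(a - b) for a, b in zip(new_reputation, reputation))
        reputation = new_reputation
        if change < tol:
            break
    return reputation, weights


# Hypothetical readings: three consistent sensors and one outlying sensor.
readings = [
    [20.0, 20.2, 20.1],
    [20.1, 20.1, 20.2],
    [19.9, 20.3, 20.0],
    [30.0, 30.0, 30.0],
]
reputation, weights = iterative_filtering(readings)
print("reputation:", reputation)
print("weights   :", weights)
```

With these inputs the outlying sensor ends up with the smallest weight and the aggregate stays near the three consistent sensors, which mirrors the behaviour described for simple outliers.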

2.3 Adversary Model

In this paper, we use a Byzantine attack model, where the adversary can compromise a set of sensor nodes and inject any false data through the compromised nodes [22]. We assume that sensors are deployed in a hostile unattended environment. Consequently, some nodes can be physically compromised. We assume that when a sensor node is compromised, all the information which is inside the node becomes accessible by the adversary. Thus, we cannot rely on cryptographic methods for preventing the attacks, since the adversary may extract cryptographic keys from the compromised nodes. We assume that through the compromised sensor nodes the adversary can send false data to the aggregator with a purpose of distorting the aggregate values. We also assume that all compromised nodes can be under control of a single adversary or a colluding group of adversaries, enabling them to launch a sophisticated attack. We also consider that the adversary has enough knowledge about the aggregation algorithm and its parameters. Finally, we assume that the base station and aggregator nodes cannot be compromised in this adversary model; there is an extensive literature proposing how to deal with the problem of compromised aggregators; in this paper we limit our attention to the lower layer problem of false data being sent to the aggregator by compromised individual sensor nodes, which has received much less attention in the existing literature.

Fig. 1. Network model for WSN.

2. We find such terminology confusing, because reputation should pertain to the level of trustworthiness rather than the aggregate value, but have decided to keep the terminology which is already in use.

2.4 Collusion Attack Scenario

Most of the IF algorithms employ simple assumptions about the initial values of weights for sensors. In case of our adversary model, an attacker is able to mislead the aggregation system through careful selection of reported data values. We use visualisation techniques from [18] to present our attack scenario.

Assume that 10 sensors report the values of temperature which are aggregated using the IF algorithm proposed in [8] with the reciprocal discriminant function. We consider three possible scenarios; see Fig. 2.

TABLE 1. A Trace Example of Iterative Filtering Algorithm

Fig. 2. Attack scenario against IF algorithm.

• In scenario 1, all sensors are reliable and the result of the IF algorithm is close to the actual value.

• In scenario 2, an adversary compromises two sensor nodes, and alters the readings of these nodes such that the simple average of all sensor readings is skewed towards a lower value. As these two sensor nodes report a lower value, the IF algorithm penalises them and assigns to them lower weights, because their values are far from the values of other sensors. In other words, the algorithm is robust against false data injection in this scenario because the compromised nodes individually falsify the readings without any knowledge about the aggregation algorithm. Table 2 illustrates a trace example of the attack scenario on Intel data set; sensors 9 and 10 are compromised by an adversary. As one can see, the algorithm assigns very low weights to these two sensor nodes and consequently their contributions decrease. Thus, the IF algorithm is robust against the simple outlier injection by the compromised nodes.

• In scenario 3, an adversary employs three compromised nodes in order to launch a collusion attack. It listens to the reports of sensors in the network and instructs the two compromised sensor nodes to report values far from the true value of the measured quantity. It then computes the skewed value of the simple average of all sensor readings and commands the third compromised sensor to report such skewed average as its readings. In other words, two compromised nodes distort the simple average of readings, while the third compromised node reports a value very close to such distorted average, thus making such reading appear to the IF algorithm as a highly reliable reading. As a result, IF algorithms will converge to the values provided by the third compromised node, because in the first iteration of the algorithm the third compromised node will achieve the highest weight, significantly dominating the weights of all other sensors. This is reinforced in every subsequent iteration; therefore, the algorithm quickly converges to a reputation which is very close to the initial skewed simple average, as shown in Fig. 2. Table 3 shows the same attack scenario on Intel Lab data set; sensors 8, 9 and 10 are compromised by an adversary. As one can see, the algorithm converges quickly to the readings of sensor 10 which is essentially equal to the simple average value of the sensors.
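Scenario 3 can be reproduced with a short, self-contained Python sketch. Everything here is illustrative: the helper names `iterative_filtering` and `add_colluders`, the fixed number of rounds, the guard constant `eps`, and the synthetic readings are assumptions, not the authors' experimental setup. The third colluder reports the mean of all other readings, which makes its own reading coincide with the simple average of all ten sensors.

```python
def iterative_filtering(readings, k=1.0, eps=1e-12, rounds=20):
    """Compact IF with reciprocal discriminant; readings[i][t] = sensor i, instant t."""
    m = len(readings[0])
    weights = [1.0] * len(readings)          # equal initial trust for every sensor
    for _ in range(rounds):
        total = sum(weights)
        reputation = [sum(w * x[t] for w, x in zip(weights, readings)) / total
                      for t in range(m)]
        # eps (an assumption) keeps the reciprocal finite at zero distance
        weights = [(sum((x[t] - reputation[t]) ** 2 for t in range(m)) / m + eps) ** (-k)
                   for x in readings]
    return reputation, weights


def add_colluders(honest, false_value, n_distorting=2):
    """Append distorting nodes plus one node that reports the skewed average."""
    m = len(honest[0])
    distorting = [[false_value] * m for _ in range(n_distorting)]
    others = honest + distorting
    # The mean of the other readings equals the simple average of all sensors
    # once this value is itself included among the readings.
    skewed_average = [sum(x[t] for x in others) / len(others) for t in range(m)]
    return others + [skewed_average]


# Seven hypothetical honest sensors reading roughly 20 degrees at three instants.
honest = [[20.0 + 0.1 * i + 0.05 * t for t in range(3)] for i in range(7)]
attacked = add_colluders(honest, false_value=5.0)
reputation, weights = iterative_filtering(attacked)
honest_mean = [sum(x[t] for x in honest) / len(honest) for t in range(3)]
print("honest mean :", honest_mean)
print("IF aggregate:", reputation)
print("colluder    :", attacked[-1])
```

In this sketch the aggregate locks onto the colluder's readings from the first round, because the colluder's distance to the initial simple average is essentially zero and the reciprocal discriminant rewards it with a dominant weight.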

In the third scenario, how much the aggregate value is skewed directly depends on the number of compromised nodes which distort the sample average of readings. Moreover, in this scenario, the attacker needs to gain control over at least two sensor nodes: one which reports readings which distort the sample average and another one which reports such distorted average. In our experiments, we investigate how the behaviour of the IF algorithm depends on the number of compromised nodes; see Section 4.4.

Clearly, the main source of the above vulnerability comes from the fact that the algorithm assigns an equal initial weight to all sensor nodes in the first iteration. Moreover, the reciprocal discriminant function has a pole at zero which makes the algorithm unstable in the presence of sensors exhibiting a very small belief divergence at any given round of iteration. Therefore, under an attack of the kind described, the reputation value of the first iteration is equal to the simple average of readings, and the second vector of weights is computed based on the distance of each sensor to the simple average provided by the first iteration. As most of the IF algorithms in the literature make the same assumption about the initial trustworthiness of sensors, we argue that an adversary with sufficient knowledge of such algorithms can launch an attack as we have described and deceive the aggregator node.

In the case in which the nodes use cryptography to ensure the confidentiality of readings they send to the aggregator, the adversary can still estimate these readings by sensing the measured quantity using the malicious nodes.

To address the shortcoming of existing IF methods, we focus on estimating an initial trust vector based on an estimate of error parameters of sensor nodes. After that, we use the new trust vector as the initial sensor trustworthiness in order to consolidate the algorithms against an attack scenario of the type described.

3 ROBUST DATA AGGREGATION

In this section, we present our robust data aggregation method. Table 4 contains a summary of notations used in this paper.

3.1 Framework Overview

In order to improve the performance of IF algorithms against the aforementioned attack scenario, we provide a robust initial estimation of the trustworthiness of sensor nodes to be used in the first iteration of the IF algorithm. Most of the traditional statistical estimation methods for variance involve use of the sample mean. For this reason, proposing a robust variance estimation method in the case of skewed sample mean is an essential part of our methodology.

TABLE 2. A Trace Example of a Simple Attack Scenario

TABLE 3. A Trace Example of the Proposed Collusion Attack Scenario

In the remainder of this paper, we assume that the stochastic components of sensor errors are independent random variables with a Gaussian distribution; however, our experiments show that our method works quite well for other types of errors without any modification. Moreover, if error distribution of sensors is either known or estimated, our algorithms can be adapted to other distributions to achieve an optimal performance.

Fig. 3 illustrates the stages of our robust aggregation framework and their interconnections. As we have mentioned, our aggregation method operates on batches of consecutive readings of sensors, proceeding in several stages. In the first stage we provide an initial estimate of two noise parameters for sensor nodes, bias and variance; details of the computations for estimating bias and variance of sensors are presented in Sections 3.2 and 3.3, respectively.

Based on such an estimation of the bias and variance of each sensor, the bias estimate is subtracted from sensors readings and in the next phase of the proposed framework, we provide an initial estimate of the reputation vector calculated using the MLE. The detailed computation operations of such estimation are described in Section 3.4.

In the third stage of the proposed framework, the initial reputation vector provided in the second stage is used to estimate the trustworthiness of each sensor based on the distance of sensor readings to such initial reputation vector. This idea will be described in Section 3.5.

3.2 Estimating Bias

We assume that all sensors in WSN can have some error; such error $e_s^t$ of a sensor $s$ is modelled by a Gaussian random variable with a sensor bias $b_s$ and sensor variance $\sigma_s^2$, $e_s^t \sim N(b_s, \sigma_s^2)$. Let $r_t$ denote the true value of the signal at time $t$. Each sensor reading $x_s^t$ can be written as

$$x_s^t = r_t + e_s^t. \tag{1}$$

The main idea is that, since we have no access to the true value $r_t$ we cannot obtain the value of the error $e_s^t$; however, we can obtain the values of the differences of such errors. Thus, if we define $\delta(i,j) = \frac{1}{m}\sum_{t=1}^{m}(x_i^t - x_j^t)$, we get

$$\delta(i,j) = \frac{1}{m}\sum_{t=1}^{m}\left(x_i^t - x_j^t\right) = \frac{1}{m}\sum_{t=1}^{m} e_i^t - \frac{1}{m}\sum_{t=1}^{m} e_j^t,$$

where $e_i^t$ is a random variable with Gaussian distribution $e_i^t \sim N(b_i, \sigma_i^2)$. Let $\bar{e}_i = \frac{1}{m}\sum_{t=1}^{m} e_i^t$ be the sample mean of this random variable. As the sample mean is an unbiased estimator of the expected value of a random variable, we have

$$\delta(i,j) = \bar{e}_i - \bar{e}_j \approx b_i - b_j.$$
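The cancellation of the unknown signal in these differences can be checked with a short Python sketch. The function name `mean_differences`, the signal values, and the biases below are hypothetical; the readings are generated without random noise so that each difference can be compared directly with the corresponding bias difference.

```python
def mean_differences(readings):
    """delta[i][j] = (1/m) * sum_t (x_i^t - x_j^t) for every pair of sensors."""
    n, m = len(readings), len(readings[0])
    return [[sum(readings[i][t] - readings[j][t] for t in range(m)) / m
             for j in range(n)]
            for i in range(n)]


signal = [21.0, 21.5, 22.3, 21.9]      # true values r_t, unknown to the aggregator
biases = [-1.0, 0.5, 0.2, 0.3]         # hypothetical sensor biases
# Noise-free readings x_i^t = r_t + b_i, used only to make the identity visible.
readings = [[r + b for r in signal] for b in biases]

delta = mean_differences(readings)
print(delta[1][0])   # approximately biases[1] - biases[0]
```

The signal values never enter the result, so the matrix of differences carries information about relative biases only; the constraint introduced next is what pins down the absolute values.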

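Let $\boldsymbol{\delta} = \{\delta(i,j) : 1 \le i, j \le n\}$; this matrix is an estimator for the mutual differences of sensor biases. In order to obtain the sensor bias from this matrix, we solve the following minimization problem:

$$\begin{aligned} \underset{\mathbf{b}}{\text{minimize}} \quad & \sum_{i=1}^{n}\sum_{j=1}^{i-1}\left(\frac{b_i - b_j}{\delta(i,j)} - 1\right)^2 \\ \text{subject to} \quad & \sum_{i=1}^{n} b_i = 0. \end{aligned} \tag{2}$$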

To justify our constraint, it is clear that if the mean of the bias of all sensors is not zero, then there would be no way to account for it on the basis of sensor readings. On the other hand, bias of sensors, under normal circumstances, comes from imperfections in manufacture and calibration of sensors as well as from the fact that they might be deployed in places with different environmental circumstances where the sensed scalar might in fact have a slightly different value. Since by the very nature we are interested in obtaining a most reliable estimate of an average value of the variable sensed, it is reasonable to assume that the mean bias of all sensors is zero (without faults or malicious attacks). We chose the above objective rather than $\sum_{i=1}^{n}\sum_{j=1}^{i-1}\left(b_i - b_j - \delta(i,j)\right)^2$ to improve the performance in case when biases can be of very different magnitudes.

We introduce a Lagrangian multiplier $\lambda$ and look at extremal values of the following function:

$$F(\vec{b}) = \sum_{i=1}^{n}\sum_{j=1}^{i-1}\left(\frac{b_i - b_j}{\delta(i,j)} - 1\right)^2 + \lambda\sum_{i=1}^{n} b_i.$$

By setting the gradient of $F(\vec{b})$ to zero we obtain a system of linear equations whose solution is our approximation of the bias values. If we let

$$\delta(i,j) = \begin{cases} -\delta(j,i), & i < j, \\ \delta(j,i), & i \ge j, \end{cases}$$

then these equations can be written in the following compact form:

Fig. 3. Our robust data aggregation framework.

TABLE 4. Notation Used in This Paper


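$$\begin{cases} \displaystyle\sum_{\substack{i=1 \\ i \ne k}}^{n} \frac{2}{\delta(i,k)^2}\, b_i - \sum_{\substack{i=1 \\ i \ne k}}^{n} \frac{2}{\delta(i,k)^2}\, b_k - \lambda = 2\sum_{\substack{i=1 \\ i \ne k}}^{n} \frac{1}{\delta(i,k)}, & \text{for all } k = 1, \ldots, n, \\[2ex] \displaystyle\sum_{i=1}^{n} b_i = 0. \end{cases} \tag{3}$$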

Note that the obtained value of $b_i$ is actually an approximation of the sample mean of the error of sensor $i$, which, in turn, is an unbiased estimator of the bias of such a sensor.
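As an illustration, the linear system (3) can be assembled and solved directly. The sketch below is a hedged example, not the authors' implementation: `solve_linear` and `estimate_bias` are hypothetical helper names, the solver is plain Gaussian elimination with partial pivoting, and the code assumes that no pairwise difference $\delta(i,k)$ is exactly zero (otherwise the coefficients are undefined). The readings are synthetic and noise-free, with biases that sum to zero, so the recovered values can be compared with the ones used to generate the data.

```python
def solve_linear(a, y):
    """Solve a z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def estimate_bias(readings):
    """Estimate sensor biases from system (3); unknowns are b_1..b_n and lambda."""
    n, m = len(readings), len(readings[0])
    delta = [[sum(readings[i][t] - readings[j][t] for t in range(m)) / m
              for j in range(n)] for i in range(n)]
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    y = [0.0] * (n + 1)
    for k in range(n):
        for i in range(n):
            if i == k:
                continue
            coeff = 2.0 / delta[i][k] ** 2   # assumes delta(i, k) != 0
            a[k][i] += coeff                 # coefficient of b_i
            a[k][k] -= coeff                 # coefficient of b_k
            y[k] += 2.0 / delta[i][k]        # right-hand side of row k
        a[k][n] = -1.0                       # coefficient of lambda
    for i in range(n):
        a[n][i] = 1.0                        # constraint: sum_i b_i = 0
    return solve_linear(a, y)[:n]


signal = [21.0, 21.5, 22.3, 21.9, 20.7]      # hypothetical true values r_t
true_biases = [-1.0, 0.5, 0.2, 0.3]          # hypothetical biases, summing to zero
readings = [[r + b for r in signal] for b in true_biases]
estimated = estimate_bias(readings)
print(estimated)
```

When the true biases do not sum to zero, the constraint means the estimates can only recover the biases relative to their mean, which is the limitation the text acknowledges when justifying the constraint.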

3.3 Estimating Variance

In this section, we propose a similar method to estimate variance of the sensor noise using the estimated bias from the previous section. Given the bias vector $\mathbf{b} = [b_1, b_2, \ldots, b_n]$ and sensor readings $\{x_s^t\}$, we can define matrices $\{\hat{x}_s^t\}$ and $\boldsymbol{\beta} = \{\beta(i,j)\}$ as follows:

$$\hat{x}_s^t = x_s^t - b_s, \tag{4}$$

$$\beta(i,j) = \frac{1}{m-1}\sum_{t=1}^{m}\left(\hat{x}_i^t - \hat{x}_j^t\right)^2 = \frac{1}{m-1}\sum_{t=1}^{m}\left(\left(x_i^t - x_j^t\right) - \left(b_i - b_j\right)\right)^2.$$

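By (1) we have $x_i^t - x_j^t = (r_t + e_i^t) - (r_t + e_j^t) = e_i^t - e_j^t$; thus, we obtain

$$\beta(i,j) = \frac{1}{m-1}\sum_{t=1}^{m}\left(e_i^t - b_i\right)^2 + \frac{1}{m-1}\sum_{t=1}^{m}\left(e_j^t - b_j\right)^2 - \frac{2}{m-1}\sum_{t=1}^{m}\left(e_i^t - b_i\right)\left(e_j^t - b_j\right).$$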

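We assume that the sensors' noise is generated by independent random variables (see footnote 3); as we have mentioned, our approximations of the bias $b_i$ are actually approximations of the sample mean; thus

$$\frac{1}{m-1}\sum_{t=1}^{m}\left(e_i^t - b_i\right)\left(e_j^t - b_j\right) \approx \mathrm{Cov}(e_i, e_j) = 0$$

and similarly

$$\beta(i,j) \approx \frac{1}{m-1}\sum_{t=1}^{m}\left(e_i^t - b_i\right)^2 + \frac{1}{m-1}\sum_{t=1}^{m}\left(e_j^t - b_j\right)^2 \approx \sigma_i^2 + \sigma_j^2.$$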

The above formula shows that we can estimate the variance of sensors' noise by computing the matrix $\beta$. We also compute the sum of variances of all sensors using the following lemma.
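The relation $\beta(i,j) \approx \sigma_i^2 + \sigma_j^2$ can be checked numerically. The sketch below computes the matrix $\beta$ from bias-corrected readings; the function name `beta_matrix` and the array layout (one row per sensor) are assumptions made for illustration only.

```python
import numpy as np

def beta_matrix(x, b):
    """Compute beta(i, j) = 1/(m-1) * sum_t (xhat_i^t - xhat_j^t)^2.

    x is an (n, m) array of readings, b the length-n estimated bias vector.
    """
    m = x.shape[1]
    x_hat = x - np.asarray(b, dtype=float)[:, None]   # Equation (4)
    diff = x_hat[:, None, :] - x_hat[None, :, :]      # all pairwise differences
    return (diff ** 2).sum(axis=2) / (m - 1)
```

For independent zero-bias Gaussian noise with known standard deviations, the off-diagonal entries returned by this function should be close to the sums of the corresponding variances, and the diagonal is zero by construction.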

Lemma 3.1 (Total Variance). Let $\bar{x}^t$ be the mean of readings at time $t$; then, using (4) and our assumption that $\sum_{i=1}^{n} b_i = 0$, we have

$$\bar{x}^t = \frac{1}{n}\sum_{j=1}^{n} x_j^t = \frac{1}{n}\sum_{j=1}^{n} \hat{x}_j^t,$$

and the statistic

$$S(t) = \frac{n}{m(n-1)}\sum_{i=1}^{n}\sum_{t=1}^{m}\left(\hat{x}_i^t - \bar{x}^t\right)^2$$

is an unbiased estimator of the sum of the variances of all sensors, $\sum_{i=1}^{n} v_i$.

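We presented the proof of the lemma in [19]. To obtain an estimation of variances of sensors from the matrix $\beta = \{\beta(i,j)\}$ we solve the following minimization problem:

$$\begin{aligned}
\underset{v}{\text{minimize}}\quad & \sum_{i=1}^{n}\sum_{j=1}^{i-1}\left(\frac{v_i + v_j}{\beta(i,j)} - 1\right)^2\\
\text{subject to}\quad & \sum_{i=1}^{n} v_i = \frac{n}{m(n-1)}\sum_{i=1}^{n}\sum_{t=1}^{m}\left(\hat{x}_i^t - \bar{x}^t\right)^2.
\end{aligned} \tag{5}$$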

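Note that the constraint of the minimization problem comes from Lemma 3.1. We again introduce a Lagrangian multiplier $\lambda$ and, by solving the minimization problem, we obtain the linear Equations (6):

$$\begin{cases}
\displaystyle\sum_{\substack{i=1\\ i\neq k}}^{n}\frac{1}{\beta(i,k)^2}\,v_i \;+\; \sum_{\substack{i=1\\ i\neq k}}^{n}\frac{1}{\beta(i,k)^2}\,v_k \;+\; \frac{\lambda}{2} \;=\; \sum_{\substack{i=1\\ i\neq k}}^{n}\frac{1}{\beta(i,k)}, & \text{for all } k = 1,\dots,n,\\[2ex]
\displaystyle\sum_{i=1}^{n} v_i = \frac{n}{m(n-1)}\sum_{i=1}^{n}\sum_{t=1}^{m}\left(\hat{x}_i^t - \bar{x}^t\right)^2.
\end{cases} \tag{6}$$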
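The system (6) has the same bordered structure as (3) and can be solved in the same way. The following sketch is an illustration under assumptions rather than the authors' code: the function name `estimate_variance` is hypothetical, readings are stored one row per sensor, and at least three sensors are assumed so that the system is non-singular.

```python
import numpy as np

def estimate_variance(x, b):
    """Solve the linear system (6) for the variance vector v.

    x is an (n, m) array of readings (n >= 3 assumed), b the bias estimates.
    """
    n, m = x.shape
    x_hat = x - np.asarray(b, dtype=float)[:, None]          # Equation (4)
    diff = x_hat[:, None, :] - x_hat[None, :, :]
    beta = (diff ** 2).sum(axis=2) / (m - 1)                  # beta(i, j)

    # Right-hand side of the constraint, i.e. the statistic of Lemma 3.1.
    total = n / (m * (n - 1)) * ((x_hat - x_hat.mean(axis=0)) ** 2).sum()

    off = ~np.eye(n, dtype=bool)
    w = np.zeros((n, n))
    w[off] = 1.0 / beta[off] ** 2                             # 1 / beta(i, k)^2
    inv = np.zeros((n, n))
    inv[off] = 1.0 / beta[off]                                # 1 / beta(i, k)

    # Unknowns are (v_1, ..., v_n, lambda).
    a = np.zeros((n + 1, n + 1))
    a[:n, :n] = w + np.diag(w.sum(axis=0))
    a[:n, n] = 0.5                                            # lambda / 2
    a[n, :n] = 1.0                                            # sum of v_i
    rhs = np.concatenate([inv.sum(axis=0), [total]])

    return np.linalg.solve(a, rhs)[:n]
```

With exact inputs $\beta(i,j) = \sigma_i^2 + \sigma_j^2$ and an exact total, the true variances satisfy (6) with $\lambda = 0$; with sampled data the solution is only approximate.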

3.4 MLE with Known Variance

In the previous sections, we proposed a novel approach for estimating the bias and variance of noise for sensors based on their readings. The variance and the bias of a sensor noise can be interpreted as the distance measures of the sensor readings to the true value of the signal. In fact, the distance measures obtained as our estimates of the bias and variances of sensors also make sense for non-stochastic errors.

Given matrix $\{x_s^t\}$ where $x_s^t \sim r_t + N(b_s, \sigma_s^2)$ and estimated bias and variance vectors $\mathbf{b}$ and $\boldsymbol{\sigma}$, we propose to recover $r_t$ using (an approximate form of) the MLE applied to the values obtained by subtracting the bias estimates from sensors' readings. As is well known, in this case the MLE has the smallest possible variance as it attains the CRLB.

From a heuristic point of view, we removed the “systematic component” of the error by subtracting a quantity which in the case of a stochastic error corresponds to an estimate of bias; this allows us to estimate the variability around such a systematic component of the error, which, in case of stochastic errors, corresponds to variance. We can now obtain an estimation which corresponds to the MLE formula for the case of zero mean normally distributed errors, but with estimated rather than true variances. Therefore, we assume that the expected value $r_t$ of the measurements is the true value of the quantity measured, and is the only parameter in the likelihood function. Thus, in the expression for the likelihood function for the normally distributed unbiased case,

$$L_n(r_t) = \prod_{i=1}^{n}\frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{\left(\hat{x}_i^t - r_t\right)^2}{\sigma_i^2}},$$

we replace $\sigma_i^2$ by the obtained variance $v_i$ from Equation (6). Moreover, by differentiating the above formula with respect to $r_t$ and setting the derivative equal to zero we get

$$r_t = \sum_{i=1}^{n}\frac{\frac{1}{v_i}}{\sum_{j=1}^{n}\frac{1}{v_j}}\,\hat{x}_i^t \quad \text{for all } t = 1,\dots,m. \tag{7}$$

Footnote 3. We analyze our estimation method with synthetic correlated data, and the experimental results show that our method produces excellent results even for correlated noise.

Equation (7) provides an estimate of the true value of the quantity measured in the form of a weighted average of sensor readings, with the sensor readings given a weight inversely proportional to the estimation of their error variance provided by our method:

$$\mathbf{r} = \sum_{s=1}^{n} w_s \hat{\mathbf{x}}_s, \qquad w_s = \frac{1/v_s}{\sum_{j=1}^{n} 1/v_j}. \tag{8}$$

Note that this method estimates the reputation vector without any iteration. Thus, the computational complexity of the estimation is considerably less than that of the existing IF algorithms.
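The non-iterative estimate of Equations (7) and (8) amounts to an inverse-variance weighted average of the bias-corrected readings. A minimal sketch follows; the function name `mle_aggregate` and the row-per-sensor layout are assumptions for illustration.

```python
import numpy as np

def mle_aggregate(x, b, v):
    """Inverse-variance weighted average of bias-corrected readings.

    x: (n, m) readings, b: length-n bias estimates, v: length-n variance
    estimates. Returns the length-m estimate r and the weights w.
    """
    x_hat = np.asarray(x, dtype=float) - np.asarray(b, dtype=float)[:, None]
    inv_v = 1.0 / np.asarray(v, dtype=float)
    w = inv_v / inv_v.sum()          # weights sum to one
    return w @ x_hat, w
```

A sensor whose estimated variance is three times larger than another's receives one third of its weight, so noisy sensors still contribute but are discounted.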

3.5 Enhanced Iterative Filtering

According to the proposed attack scenario, the attacker exploits the vulnerability of the IF algorithms which originates from a wrong assumption about the initial trustworthiness of sensors. Our contribution to address this shortcoming is to employ the results of the proposed robust data aggregation technique as the initial reputation for these algorithms. Moreover, the initial weights for all sensor nodes can be computed based on the distance of sensors' readings to such an initial reputation. Our experimental results illustrate that this idea not only consolidates the IF algorithms against the proposed attack scenario, but also improves the efficiency of the IF algorithms by reducing the number of iterations needed to approach a stationary point within the prescribed tolerance; see Section 4.2.
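The idea of seeding an IF algorithm with an initial reputation can be sketched as follows. This is an illustration under assumptions, not the algorithm of Section 2.2 verbatim: it assumes an update in which each sensor's distance is its mean squared deviation from the current estimate, weights are obtained through a discriminant function `g`, and the new estimate is the weighted average of the readings. The names `iterative_filtering`, `r0`, `tol` and the small clipping constant are hypothetical.

```python
import numpy as np

def reciprocal(d):
    """Reciprocal discriminant function g(d) = 1 / d."""
    return 1.0 / d

def iterative_filtering(x, r0=None, g=reciprocal, tol=1e-9, max_iter=1000):
    """Iterative filtering started from an optional initial reputation r0.

    x is an (n, m) array of readings. If r0 is None, the simple average
    of all sensors is used as the starting point.
    Returns the estimate and the number of iterations performed.
    """
    x = np.asarray(x, dtype=float)
    r = x.mean(axis=0) if r0 is None else np.asarray(r0, dtype=float)
    for iteration in range(1, max_iter + 1):
        d = ((x - r) ** 2).mean(axis=1)       # distance of each sensor to r
        w = g(np.maximum(d, 1e-12))           # clip to avoid division by zero
        r_new = (w @ x) / w.sum()             # weighted average of readings
        if np.linalg.norm(r_new - r) < tol:
            return r_new, iteration
        r = r_new
    return r, max_iter
```

Passing the estimate of Equation (8) as `r0` corresponds to the enhancement described above. The sketch also shows why the starting point matters for the reciprocal function: if the starting estimate coincides with one sensor's readings, that sensor receives an overwhelming weight and the iteration stops there.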

4 SIMULATION RESULTS

In this section, we report on a detailed numerical simulation study that examines robustness and efficiency of our data aggregation method (see footnote 4). The objective of our experiments is to evaluate the robustness and efficiency of our approach for estimating the true values of the signal based on the sensor readings in the presence of faults and collusion attacks. For each experiment, we evaluate the accuracy based on the Root Mean Squared error (RMS error) metric and efficiency based on the number of iterations needed for convergence of IF algorithms.
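For reference, the accuracy metric can be written as a short function. The paper does not spell out the formula, so the sketch below assumes the standard definition of RMS error between the estimated and the true signal; the function name `rms_error` is hypothetical.

```python
import numpy as np

def rms_error(r_est, r_true):
    """Root mean squared error between an estimate and the ground truth."""
    diff = np.asarray(r_est, dtype=float) - np.asarray(r_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```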

4.1 Experimental Settings

All the experiments have been conducted on an HP PC with a 3.30 GHz Intel Core i5-2500 processor and 8 GB RAM, running 64-bit Windows 7 Enterprise. The program code has been written in MATLAB R2012b. Although there are a number of real-world data sets for evaluating reputation systems and data aggregation in sensor networks, such as the Intel data set [21], none of them provides a clear ground truth. Thus, we conduct our experiments by generating synthetic data sets. The experiments are based on simulations performed with both correlated and uncorrelated sensor errors. If not mentioned otherwise, we generate synthetic data sets according to the following parameters:

• Each simulation experiment was repeated 200 times and then results were averaged.

• Number of sensor nodes is $n = 20$.

• Number of readings for each sensor is $m = 400$.

• For statistical parameters of the errors (noise) used to corrupt the true readings, we consider several ranges of values for bias, variance and covariance of noise for each experiment.

• The level of significance in the K-S test is $\alpha = 0.05$.

In all experiments, we compare our robust aggregation method against three other IF techniques proposed for reputation systems. For all parameters of other algorithms used in the experiments, we set the same values as used in the original papers where they were introduced.

The first IF method considered computes the trustworthiness of sensor nodes based on the distance of their readings to the current state of the estimated reputation [8]. We described the details of this approach in Section 2.2. We investigate two discriminant functions in our experiments, $g(d) = d^{-1}$ and $g(d) = 1 - k_l d$, and call these methods dKVD-Reciprocal and dKVD-Affine, respectively.

The second IF method we consider is a correlation-based ranking algorithm proposed by Zhou et al. in [9]. In this algorithm, trustworthiness of each sensor is obtained based on the correlation coefficient between the sensor's readings and the current estimate of the true value of the signal. In other words, this method gives credit to sensor nodes whose readings correlate well with the estimated true value of the signal. Based on this idea, the authors proposed an iterative algorithm for estimating the true value of the signal by applying a weighted averaging technique. They argued that the correlation coefficient is a good way to quantify the similarity between two vectors. Thus, they employed the Pearson correlation coefficient between sensor readings and the current state of the estimated signal in order to compute the sensor weight. We call this method Zhou.
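The correlation-based weighting described above can be sketched as a single weight-update step. This is a simplified illustration and not the exact procedure of [9]: clipping negative correlations to zero is an assumption made here, and the function name `correlation_weights` is hypothetical. Readings and the estimate are assumed to be non-constant so that the correlation is defined.

```python
import numpy as np

def correlation_weights(x, r):
    """Pearson correlation between each sensor's readings and the estimate r.

    x is an (n, m) array of readings, r the length-m current estimate.
    Negative correlations are clipped to zero (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)    # centre each sensor's readings
    rc = r - r.mean()                         # centre the estimate
    corr = (xc @ rc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(rc))
    return np.clip(corr, 0.0, None)
```

Note that a sensor reporting a scaled copy of the estimate is perfectly correlated with it, so this weighting rewards agreement in shape rather than in magnitude.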

The third algorithm considered has been proposed by Laureti et al. in [10] and is an IF algorithm based on a weighted averaging technique similar to the algorithm described in Section 2.2. The only difference between these two algorithms is in the discriminant function. The authors in [10] used the discriminant function $g(d) = d^{-0.5}$. We call this method Laureti.

We apply dKVD-Reciprocal, dKVD-Affine, Zhou, Laureti and our robust aggregation approach to synthetically generated data. Although we can simply apply our robust framework to all existing IF approaches, in this paper we investigate the improvement which the addition of our initial trustworthiness assessment method produces on the robustness of the dKVD-Reciprocal and dKVD-Affine methods (we call them RobustAggregate-Reciprocal and RobustAggregate-Affine, respectively).

Footnote 4. Unfortunately, we are unable to prove mathematically that our system is secure; however, we demonstrated higher robustness compared to the state of the art by thorough empirical evaluation. We are working on obtaining a rigorous proof but this appears to be a challenging problem.

Table 5 shows a summary of discriminant functions for all of the above four different IF methods.

We first conduct experiments by injecting only Gaussian noise into sensor readings. In the second part of the experiments, we investigate the behaviour of these approaches by emulating a simple, non-colluding attack scenario presented in the second case of Fig. 2. We then evaluate these approaches in the case of our sophisticated attack scenario.

4.2 Accuracy and Efficiency without an Attack

In the first batch of experiments we assume that there are no sensors with malicious behaviour. Thus, the errors are fully stochastic; we consider Gaussian sensor errors. In order to evaluate the performance of our algorithm in comparison with the existing algorithms, we produce the following different synthetic data sets.

1. Unbiased error. We considered various distributions of the variance across the set of sensors and obtained similar results. We have chosen to present the case where the error of a sensor $s$ at time $t$ is given by $e_s^t \sim N(0, s \times \sigma^2)$, considering different values for the baseline sensor variance $\sigma^2$. Fig. 4a shows the results of the MLE with our noise parameter estimation (steps 1 and 2 in Fig. 3) and the information theoretic limit for the minimal variance provided by the CRLB, achieved, for example, using the MLE with the actual, exact variances of sensors, which are NOT available to our algorithm. As one can see in this figure, our proposed approach nearly exactly achieves the minimal possible variance coming from the information theoretic lower bound. Furthermore, Fig. 4b illustrates the performance of our approach for the initial trustworthiness assessment of sensors with different discriminant functions as well as other IF algorithms. It shows that in this experiment, the performance of our approach with both discriminant functions is very similar to the original IF algorithm.

2. Bias error. In this scenario, we inject bias error into sensor readings, generated by a Gaussian distribution with different variances. Therefore, the error of sensor $s$ at time $t$ is generated by $e_s^t \sim N(N(0, \sigma_b^2), s \times \sigma^2)$ with the variance of the bias $\sigma_b^2 = 4$ and increasing values for variances, where the variance of sensor $s$ is equal to $s \times \sigma^2$. Thus, the sensors' bias is produced by a zero mean Gaussian random variable. Fig. 4c shows the RMS error for all algorithms in this scenario. As can be seen in this figure, since all of the IF algorithms, along with our approach, generate an error close to their errors in the unbiased scenario, we can conclude that the methods are stable against biased but fully stochastic noise.

3. Correlated noise. The heuristics behind our initial variance estimation assumed that the errors of sensors are uncorrelated. Thus, we tested how the performance of our method degrades if the noise becomes correlated and how it compares to the existing methods under the same circumstances. So in this scenario, we assume that the errors of sensors are no longer uncorrelated. Possible covariance functions can be of different types, such as Spherical, Power Exponential, Rational Quadratic, and Matern; see [23]. Although our proposed method can be applied to all covariance functions, we present here the results for the case of the Power Exponential function $\rho(i,j) = e^{-|i-j|/n}$. Moreover, the variance of a sensor $s$ is again set to $\sigma_s^2 = s \times \sigma^2$. From the corresponding covariance matrix $\Sigma = \{\Sigma_{ij} = \rho(i,j)\,\sigma_i\sigma_j : i,j = 1,\dots,n\}$, the noise values of sensors are generated from the multivariate Normal distribution $\mathit{Noise} \sim N(\mathit{Bias}, \Sigma)$. In this scenario, we take into account different values of $\sigma$ for generating the noise values of sensors in order to analyse the accuracy of the data aggregation under various levels of noise. Fig. 4d shows the RMS error of the algorithms for this scenario. As can be seen in this figure, our approach with the reciprocal discriminant function improves the dKVD-Reciprocal algorithm for all different values of variance, although our method with the affine function generates a very similar RMS error to the original dKVD-Affine algorithm. Moreover, the scale of RMS error is in general larger than in scenarios with uncorrelated noise, as one would expect. This can be explained by our assumption that the sensors' noise is generated by independent random variables; see Section 3.3.

TABLE 5. Summary of Different IF Algorithms

Fig. 4. Accuracy for No Attack scenarios.

The results of our simulations also show that the use of our initial variance estimation in the second phase of our proposed framework as the initial reputation of IF algorithms decreases the number of iterations for the algorithms. We evaluate the number of iterations for the IF algorithm proposed in [8] by providing the initial reputation from the results of our approach for both unbiased and biased sensor errors. The results of this experiment show that the proposed initial reputation for the IF algorithm improves the efficiency of the algorithm in terms of the number of iterations until the procedure has converged. In other words, by providing this initial reputation, the number of iterations for the IF algorithm decreases by approximately 9 percent for the reciprocal and around 8 percent for the affine discriminant function in both biased and unbiased circumstances. This can be explained by the fact that the new initial reputation is close to the true value of the signal, so the IF algorithm needs fewer iterations to reach its stationary point. In the next part of our experiments, we employ this idea for consolidating the IF algorithm against the proposed attack scenario.
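The synthetic data sets of the three no-attack scenarios above can be generated along the following lines. This is a sketch in Python rather than the authors' MATLAB code; the function names, the 0-based sensor indexing inside the covariance function, and the choice of true signal passed in as `r_true` are assumptions for illustration.

```python
import numpy as np

def power_exponential_cov(n, sigma):
    """Covariance matrix with rho(i, j) = exp(-|i - j| / n) and
    per-sensor variance s * sigma^2 for sensors s = 1, ..., n."""
    std = np.sqrt(np.arange(1, n + 1)) * sigma
    idx = np.arange(n)
    rho = np.exp(-np.abs(idx[:, None] - idx[None, :]) / n)
    return rho * np.outer(std, std)

def make_noisy_readings(r_true, cov, sigma_b=0.0, rng=None):
    """Return an (n, m) array of readings and the sampled bias vector.

    r_true: length-m true signal (hypothetical input of this sketch).
    cov: (n, n) noise covariance; a diagonal matrix gives uncorrelated noise.
    sigma_b: standard deviation of the per-sensor bias (0 for unbiased).
    """
    rng = np.random.default_rng() if rng is None else rng
    r_true = np.asarray(r_true, dtype=float)
    n = cov.shape[0]
    bias = rng.normal(0.0, sigma_b, size=n)
    # One n-dimensional noise vector per time instant, then transpose to (n, m).
    noise = rng.multivariate_normal(bias, cov, size=len(r_true)).T
    return r_true[None, :] + noise, bias
```

Setting `sigma_b = 2.0` corresponds to the bias variance $\sigma_b^2 = 4$ used in the bias scenario, and replacing `cov` by its diagonal reproduces the uncorrelated scenarios.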

4.3 Accuracy with Simple Attack Scenario

Lim et al. in [18] introduced an attack scenario against traditional statistical aggregation approaches. We described the scenario in Section 2.4 and the second round of Fig. 2 as a simple attack scenario using a number of compromised nodes to skew the simple average of sensors' readings. In this section, we investigate the behavior of IF algorithms against the simple attack scenario. Note that the objective of this attack scenario is to skew the sample mean of sensors' readings through outlier readings reported by the compromised nodes.

In order to evaluate the accuracy of the IF algorithms against the simple attack scenario, we assume that the attacker compromises $c$ ($c < n$) sensor nodes and reports outlier readings by these nodes. We synthetically generate data sets for this attack scenario by taking into account different values of variance for sensors' errors as well as employing various numbers of compromised nodes. Moreover, we generate biased readings for all sensor nodes with bias provided by a random variable with distribution $N(0, \sigma_b^2)$, with the variance of bias chosen to be $\sigma_b^2 = 4$.

Fig. 5 shows the accuracy of the IF algorithms and our approach in the presence of such a simple attack scenario. It can be seen that the estimates provided by the three approaches, dKVD-Affine, Zhou and Laureti, are significantly skewed by this attack scenario and their accuracy significantly decreases as the number of compromised nodes increases. On the other hand, dKVD-Reciprocal provides a reasonable accuracy for all parameter values of this simple attack scenario (see Fig. 5a). The robustness of this discriminant function can be explained by the fact that the function sharply diminishes the contributions of outlier readings by assigning very low weights to them. In our sophisticated collusion attack scenario, we exploit this property in order to compromise systems employing such a discriminant function.

The results of this experiment clearly show that our initial trustworthiness has no negative effects on the performance of the IF algorithm with both discriminant functions in the case of the simple attack scenario. In the next section, we show how these initial values improve the IF algorithm in the case of the proposed collusion attack scenario, while both the dKVD-Affine and dKVD-Reciprocal algorithms are compromised by such an attack scenario.

4.4 Accuracy with a Collusion Attack

In order to illustrate the robustness of the proposed data aggregation method in the presence of sophisticated attacks, we synthetically generate several data sets by injecting the proposed collusion attacks. Therefore, we assume that the adversary employs $c$ ($c < n$) compromised sensor nodes to launch the sophisticated attack scenario proposed in Section 2.4. The attacker uses the first $c - 1$ compromised nodes to generate outlier readings in order to skew the simple average of all sensor readings. The adversary then falsifies the last sensor's readings by injecting values very close to such a skewed average. This collusion attack scenario makes the IF algorithm converge to a wrong stationary point. In order to investigate the accuracy of the IF algorithms with this collusion attack scenario, we synthetically generate several data sets with different values for sensors' variances as well as various numbers of compromised nodes ($c$).
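The collusion attack described above can be injected into a data set as in the following sketch. The outlier model (a constant offset added to the readings), the choice of the first `c` rows as the compromised nodes, and the function name `inject_collusion` are assumptions made for illustration; the paper does not specify them in this section.

```python
import numpy as np

def inject_collusion(x, c, offset=10.0):
    """Return a copy of the (n, m) readings x with a collusion attack injected.

    The first c - 1 rows report outliers (hypothetical model: readings
    shifted by a constant offset). Row c - 1 then reports the average of
    all other sensors, which equals the skewed simple average of all n rows.
    """
    x = np.array(x, dtype=float)             # work on a copy
    n = x.shape[0]
    if not 1 <= c < n:
        raise ValueError("c must satisfy 1 <= c < n")
    x[: c - 1] += offset                      # outliers skew the simple average
    others = np.delete(np.arange(n), c - 1)
    x[c - 1] = x[others].mean(axis=0)         # colluder sits on the skewed mean
    return x
```

Because the last compromised node reports exactly the mean of the remaining rows, its readings coincide with the simple average of all sensors, which is the starting point of the original IF algorithms. Dropping the final assignment yields the simple attack scenario of Section 4.3.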

Fig. 5. Accuracy with a simple attack scenario.


Fig. 6 shows the accuracy of the IF algorithms and our approach in the presence of the collusion attack scenario. It can be seen that the IF algorithms with the reciprocal discriminant function are highly vulnerable to such an attack scenario (see Figs. 6a and 6d), while the affine discriminant function generates more robust results in this case (see Fig. 6b). However, the accuracy of the affine discriminant function is still much worse than in the previous experiment without the collusion attack.

This experiment shows that the collusion attack scenario can circumvent all the IF algorithms we tried. Moreover, the accuracy of the algorithms dramatically decreases as the number of compromised nodes participating in the attack scenario increases. As explained before, the algorithms converge to the readings of one of the compromised nodes, namely, to the readings of the node which reports values very close to the skewed mean. This demonstrates that an attacker with enough knowledge about the aggregation algorithm employed can launch a sophisticated collusion attack scenario which defeats IF aggregation systems.

Figs. 6e and 6f show the accuracy of our approach by taking into account the IF algorithm in [8] with reciprocal and affine discriminant functions, respectively. As one can see, our proposed approach is superior to all other algorithms in terms of the accuracy for the reciprocal discriminant function, while the approach yields a very small improvement for the affine function. Moreover, comparing the accuracy of our approach in this experiment with the results from the no attack and simple attack experiments in Figs. 4 and 5, we can argue that our approach with the reciprocal discriminant function is robust against the collusion attack scenario. The reason is that our approach not only provides the highest accuracy for this discriminant function, it also approximately reaches the accuracy of the No Attack scenarios.

As we described, the main shortcoming of the IF algorithms in the proposed attack scenario is that they quickly converge to the sample mean in the presence of the attack scenario. In order to investigate this shortcoming, we conducted an experiment by increasing the sensor variances as well as the number of colluders. In this experiment, we quantified the number of iterations for the IF algorithm with the reciprocal discriminant function (dKVD-Reciprocal and RobustAggregate-Reciprocal algorithms). The results obtained from this experiment show that the original version of the IF algorithm quickly converges (after around five iterations) to the skewed values provided by one of the attackers, while starting with an initial reputation provided by our approach, the algorithm requires around 29 iterations and, instead of converging to the skewed values provided by one of the attackers, provides a reasonable accuracy.

The results of this experiment validate that our sophisticated attack scenario is caused by the discovered vulnerability in the IF algorithms, which sharply diminishes the contributions of benign sensor nodes when one of the sensor nodes reports a value very close to the simple average.

5 RELATED WORK

Robust data aggregation is a serious concern in WSNs and there are a number of papers investigating malicious data injection by taking into account various adversary models. There are three bodies of work related to our research: IF algorithms, trust and reputation systems for WSNs, and secure data aggregation with compromised node detection in WSNs.

There are a number of published studies introducing IF algorithms for solving the data aggregation problem [8], [9], [10], [11], [12], [13], [14], [15]. We reviewed three of them in our comparative experiments in Section 4. Li et al. in [12] proposed six different algorithms, which are all iterative and are similar. The only difference among the algorithms is their choice of norm and aggregation function. Ayday et al. proposed a slightly different iterative algorithm in [13]. Its main differences from the other algorithms are: 1) the ratings have a time-discount factor, so in time, their importance will fade out; and 2) the algorithm maintains a blacklist of users who are especially bad raters. Liao et al. in [14] proposed an iterative algorithm which, beyond simply using the rating matrix, also uses the social network of users. The main objective of Chen et al. in [15] is to introduce a “Bias-smoothed tensor model”, which is a Bayesian model of rather high complexity. Although the existing IF algorithms consider simple cheating behaviour by adversaries, none of them takes into account sophisticated malicious scenarios such as collusion attacks.

Our work is also closely related to the trust and reputation systems in WSNs. Ganeriwal et al. in [24] proposed a general reputation framework for sensor networks in which each node develops a reputation estimation for other nodes by observing its neighbors, which makes a trust community for sensor nodes in the network. Xiao et al. in [25] proposed a trust based framework which employs correlation to detect faulty readings. Moreover, they introduced a ranking framework to associate a level of trustworthiness with each sensor node based on the number of neighboring sensor nodes that support the sensor. Li et al. in [26] proposed PRESTO, a model-driven predictive data management architecture for hierarchical sensor networks. PRESTO is a two tier framework for sensor data management in sensor networks. The main idea of this framework is to consider a number of proxy nodes for managing sensed data from sensor nodes. Lim et al. in [6] proposed an interdependency relationship between network nodes and data items for assessing their trust scores based on a cyclical framework. The main contribution of Sun et al. in [27] is to propose a combination of trust mechanism, data aggregation, and fault tolerance to enhance data trustworthiness in Wireless Multimedia Sensor Networks (WMSNs) which considers both discrete and continuous data streams. Tang et al. in [28] proposed a trust framework for sensor networks in cyber physical systems such as a battle-network in which the sensor nodes are employed to detect approaching enemies and send alarms to a command center. Although fault detection problems have been addressed by applying trust and reputation systems in the above research, none of them takes into account sophisticated collusion attack scenarios in adversarial environments.

Fig. 6. Accuracy with our collusion attack.

Reputation and trust concepts can be used to overcome the compromised node detection and secure data aggregation problems in WSNs. Ho et al. in [29] proposed a framework to detect compromised nodes in a WSN and then apply software attestation to the detected nodes. They reported that the revocation of detected compromised nodes cannot be performed due to a high risk of false positives in the proposed scheme. The main idea of false aggregator detection in the scheme proposed in [30] is to employ a number of monitoring nodes which are running aggregation operations and providing a MAC value of their aggregation results as a part of the MAC in the value computed by the cluster aggregator. The high computation and transmission cost required for MAC-based integrity checking in this scheme makes it unsuitable for deployment in WSNs. Lim et al. in [18] proposed a game-theoretical defense strategy to protect sensor nodes and to guarantee a high level of trustworthiness for sensed data. Moreover, there is a large volume of published studies in the area of secure tiny aggregation in WSNs [31], [32], [33]. These studies focus on detecting false aggregation operations by an adversary, that is, on data aggregator nodes obtaining data from source nodes and producing wrong aggregated values. Consequently, they address neither the problem of false data being provided by the data sources nor the problem of collusion. However, when an adversary injects false data by a collusion attack scenario, it can affect the results of the honest aggregators and thus the base station will receive a skewed aggregate value. In this case, the compromised nodes will attest their false data and consequently the base station assumes that all reports are from honest sensor nodes. Although the aforementioned research takes into account false data injection for a number of simple attack scenarios, to the best of our knowledge, no existing work addresses this issue in the case of a collusion attack by compromised nodes in a manner which employs high level knowledge about the data aggregation algorithm used.

6 CONCLUSIONS

In this paper, we introduced a novel collusion attack scenario against a number of existing IF algorithms. Moreover, we proposed an improvement for the IF algorithms by providing an initial approximation of the trustworthiness of sensor nodes which makes the algorithms not only collusion robust, but also more accurate and faster converging. In future work, we will investigate whether our approach can protect against compromised aggregators. We also plan to implement our approach in a deployed sensor network.

REFERENCES

[1] S. Ozdemir and Y. Xiao, “Secure data aggregation in wireless sensor networks: A comprehensive overview,” Comput. Netw., vol. 53, no. 12, pp. 2022–2037, Aug. 2009.

[2] L. Wasserman, All of Statistics: A Concise Course in Statistical Inference. New York, NY, USA: Springer.

[3] A. Jøsang and J. Golbeck, “Challenges for robust trust and reputation systems,” in Proc. 5th Int. Workshop Security Trust Manage., Saint Malo, France, 2009, pp. 253–262.

[4] K. Hoffman, D. Zage, and C. Nita-Rotaru, “A survey of attack and defense techniques for reputation systems,” ACM Comput. Surveys, vol. 42, no. 1, pp. 1:1–1:31, Dec. 2009.

[5] R. Roman, C. Fernandez-Gago, J. Lopez, and H. H. Chen, “Trust and reputation systems for wireless sensor networks,” in Security and Privacy in Mobile and Wireless Networking, S. Gritzalis, T. Karygiannis, and C. Skianis, Eds. Leicester, U.K.: Troubador Publishing Ltd, 2009, pp. 105–128.

[6] H.-S. Lim, Y.-S. Moon, and E. Bertino, “Provenance-based trustworthiness assessment in sensor networks,” in Proc. 7th Int. Workshop Data Manage. Sensor Netw., 2010, pp. 2–7.

[7] H.-L. Shi, K. M. Hou, H. ying Zhou, and X. Liu, “Energy efficient and fault tolerant multicore wireless sensor network: E2MWSN,” in Proc. 7th Int. Conf. Wireless Commun., Netw. Mobile Comput., 2011, pp. 1–4.

[8] C. de Kerchove and P. Van Dooren, “Iterative filtering in reputation systems,” SIAM J. Matrix Anal. Appl., vol. 31, no. 4, pp. 1812–1834, Mar. 2010.

[9] Y. Zhou, T. Lei, and T. Zhou, “A robust ranking algorithm to spamming,” Europhys. Lett., vol. 94, p. 48002, 2011.

[10] P. Laureti, L. Moret, Y.-C. Zhang, and Y.-K. Yu, “Information filtering via iterative refinement,” Europhys. Lett., vol. 75, pp. 1006–1012, Sep. 2006.

[11] Y.-K. Yu, Y.-C. Zhang, P. Laureti, and L. Moret, “Decoding information from noisy, redundant, and intentionally distorted sources,” Physica A: Statist. Mech. Appl., vol. 371, pp. 732–744, Nov. 2006.

[12] R.-H. Li, J. X. Yu, X. Huang, and H. Cheng, “Robust reputation-based ranking on bipartite rating networks,” in Proc. SIAM Int. Conf. Data Mining, 2012, pp. 612–623.

[13] E. Ayday, H. Lee, and F. Fekri, “An iterative algorithm for trust and reputation management,” Proc. IEEE Int. Conf. Symp. Inf. Theory, vol. 3, 2009, pp. 2051–2055.

[14] H. Liao, G. Cimini, and M. Medo, “Measuring quality, reputation and trust in online communities,” in Proc. 20th Int. Conf. Found. Intell. Syst., Aug. 2012, pp. 405–414.

[15] B.-C. Chen, J. Guo, B. Tseng, and J. Yang, “User reputation in a comment rating environment,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 159–167.

[16] C. T. Chou, A. Ignatovic, and W. Hu, “Efficient computation of robust average of compressive sensing data in wireless sensor networks in the presence of sensor faults,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 8, pp. 1525–1534, Aug. 2013.

[17] Y. Yu, K. Li, W. Zhou, and P. Li, “Trust mechanisms in wireless sensor networks: Attack analysis and countermeasures,” J. Netw. Comput. Appl., vol. 35, no. 3, pp. 867–880, 2012.


[18] H.-S. Lim, G. Ghinita, E. Bertino, and M. Kantarcioglu, “A game-theoretic approach for high-assurance of data trustworthiness insensor networks,” in Proc. IEEE 28th Int. Conf. Data Eng., Apr.2012, pp. 1192–1203.

[19] M. Rezvani, A. Ignjatovic, E. Bertino, and S. Jha, “Secure dataaggregation technique for wireless sensor networks in the pres-ence of collusion attacks,” School Comput. Sci. and Eng., Univ.New South Wales, Kensington, NSW, Australia, Tech. Rep.UNSW-CSE-TR-201319, Jul. 2013.

[20] D. Wagner, “Resilient aggregation in sensor networks,” in Proc. 2nd ACM Workshop Security Ad Hoc Sens. Netw., 2004, pp. 78–87.

[21] (2004). The Intel Lab data set [Online]. Available: http://berkeley.intel-research.net/labdata/

[22] B. Awerbuch, R. Curtmola, D. Holmer, C. Nita-Rotaru, and H. Rubens, “Mitigating Byzantine attacks in ad hoc wireless networks,” Dept. Comput. Sci., Johns Hopkins Univ., Baltimore, MD, USA, Tech. Rep., 2004.

[23] M. C. Vuran and I. F. Akyildiz, “Spatial correlation-based collaborative medium access control in wireless sensor networks,” IEEE/ACM Trans. Netw., vol. 14, no. 2, pp. 316–329, Apr. 2006.

[24] S. Ganeriwal, L. K. Balzano, and M. B. Srivastava, “Reputation-based framework for high integrity sensor networks,” ACM Trans. Sens. Netw., vol. 4, no. 3, pp. 15:1–15:37, Jun. 2008.

[25] X.-Y. Xiao, W.-C. Peng, C.-C. Hung, and W.-C. Lee, “Using SensorRanks for in-network detection of faulty readings in wireless sensor networks,” in Proc. 6th ACM Int. Workshop Data Eng. Wireless Mobile Access, 2007, pp. 1–8.

[26] M. Li, D. Ganesan, and P. Shenoy, “PRESTO: Feedback-driven data management in sensor networks,” in Proc. 3rd Conf. Netw. Syst. Des. Implementation, vol. 3, 2006, pp. 23–23.

[27] Y. Sun, H. Luo, and S. K. Das, “A trust-based framework for fault-tolerant data aggregation in wireless multimedia sensor networks,” IEEE Trans. Dependable Secure Comput., vol. 9, no. 6, pp. 785–797, Nov. 2012.

[28] L.-A. Tang, X. Yu, S. Kim, J. Han, C.-C. Hung, and W.-C. Peng, “Tru-Alarm: Trustworthiness analysis of sensor networks in cyber-physical systems,” in Proc. IEEE Int. Conf. Data Mining, 2010, pp. 1079–1084.

[29] J.-W. Ho, M. Wright, and S. Das, “ZoneTrust: Fast zone-based node compromise detection and revocation in wireless sensor networks using sequential hypothesis testing,” IEEE Trans. Dependable Secure Comput., vol. 9, no. 4, pp. 494–511, Jul./Aug. 2012.

[30] S. Ozdemir and H. Cam, “Integration of false data detection with data aggregation and confidential transmission in wireless sensor networks,” IEEE/ACM Trans. Netw., vol. 18, no. 3, pp. 736–749, Jun. 2010.

[31] H. Chan, A. Perrig, and D. Song, “Secure hierarchical in-network aggregation in sensor networks,” in Proc. 13th ACM Conf. Comput. Commun. Security, 2006, pp. 278–287.

[32] Y. Yang, X. Wang, S. Zhu, and G. Cao, “SDAP: A secure hop-by-hop data aggregation protocol for sensor networks,” in Proc. 7th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., 2006, pp. 356–367.

[33] S. Roy, M. Conti, S. Setia, and S. Jajodia, “Secure data aggregation in wireless sensor networks,” IEEE Trans. Inf. Forensics Security, vol. 7, no. 3, pp. 1040–1052, Jun. 2012.

Mohsen Rezvani received the bachelor’s and master’s degrees both in computer engineering from the Amirkabir University of Technology and the Sharif University of Technology, respectively. He is currently working toward the PhD degree in the School of Computer Science and Engineering at the University of New South Wales, Sydney, Australia. His research interests include trust and reputation systems in WSNs. He is a student member of the IEEE and the IEEE Communications Society.

Aleksandar Ignjatovic received the bachelor’s and master’s degrees both in mathematics from the University of Belgrade, former Yugoslavia, and the PhD degree in mathematical logic from the University of California at Berkeley. After graduation, he was an assistant professor at Carnegie Mellon University, where he taught for 5 years at the Department of Philosophy, and subsequently had a startup in Silicon Valley. He joined the School of Computer Science and Engineering at the University of New South Wales (UNSW) in 2002, where he teaches algorithms. His research interests include sampling theory and signal processing, applications of mathematical logic to computational complexity theory, algorithms for embedded systems design, and most recently trust-based data aggregation algorithms.

Elisa Bertino is a professor of computer science at Purdue University, and serves as the research director of the Center for Education and Research in Information Assurance and Security (CERIAS) and interim director of Cyber Center (Discovery Park). Previously, she was a faculty member and department head at the Department of Computer Science and Communication of the University of Milan. Her main research interests include security, privacy, digital identity management systems, database systems, distributed systems, and multimedia systems. She is currently the chair of the ACM SIGSAC and a member of the editorial board of the following international journals: the IEEE Security and Privacy, IEEE Transactions on Service Computing, and the ACM Transactions on Web. She was the editor-in-chief of the VLDB Journal and an editorial board member of ACM Transactions on Information and System Security and IEEE Transactions on Dependable and Secure Computing. She coauthored the book Identity Management: Concepts, Technologies, and Systems. She received the 2002 IEEE Computer Society Technical Achievement Award for outstanding contributions to database systems and database security and advanced data management systems, and the 2005 IEEE Computer Society Tsutomu Kanai Award for pioneering and innovative research contributions to secure distributed systems. She is a fellow of the IEEE.

Sanjay Jha received the PhD degree from the University of Technology, Sydney, Australia. He is a professor and head of the Network Group at the School of Computer Science and Engineering at the University of New South Wales. He is an associate editor of the IEEE Transactions on Mobile Computing. He was a member-at-large, Technical Committee on Computer Communications (TCCC), IEEE Computer Society for a number of years. He has served on program committees of several conferences. He was the cochair and general chair of the Emnets-1 and Emnets-II workshops, respectively. He was also the general chair of the ACM Sensys 2007 symposium. His research activities include a wide range of topics in networking including wireless sensor networks, ad hoc/community wireless networks, resilience/quality of service (QoS) in IP networks, and active/programmable networks. He has published more than 100 articles in high-quality journals and conferences. He is the principal author of the book Engineering Internet QoS and a coeditor of the book Wireless Sensor Networks: A Systems Perspective. He is a senior member of the IEEE and the IEEE Computer Society.

110 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 1, JANUARY/FEBRUARY 2015