Understanding and Utilizing Multi-Dimensional Correlations in Sensor Networks: A Protocol Design Perspective
Ahmed Helmy
Department of Electrical Engineering, USC Viterbi School of Engineering, University of Southern California
[email protected]
Web: ceng.usc.edu/~helmy, Lab: nile.usc.edu

Post on 20-Jan-2016


Page 1: Ahmed Helmy

UNIVERSITY OF SOUTHERN CALIFORNIA
Understanding and Utilizing Multi-Dimensional Correlations in Sensor Networks: A Protocol Design Perspective
Ahmed Helmy
Department of Electrical Engineering
USC Viterbi School of Engineering
University of Southern California
[email protected]
Web: ceng.usc.edu/~helmy, Lab: nile.usc.edu

Page 2: Ahmed Helmy

Outline
• Classifying Correlations
• How to Utilize Correlations?
• Insights for Protocol Design
– Gradient-based Routing (RUGGED)
– Active Query Routing (ACQUIRE)
– Abnormality Detection and Filtering Inserted Data
• WLANs as Sensor Networks (IMPACT)
– Sensing access and usage patterns
– Analyzing correlations in wireless users' behavior
• Issues

Page 3: Ahmed Helmy

Correlation Classification
• Dimensions of Correlation:
– Spatial: between neighboring nodes
– Temporal: across time (different samples) for the same node
– Spatio-temporal: moving target (e.g., vehicle), moving phenomenon (e.g., fire)
• What is correlated?
– Sensor readings (e.g., temperature, light, gradients)
– Communication channel (e.g., loss, fading)
– Localization information, …

Page 4: Ahmed Helmy

How Can We Utilize Correlations?
• In-network processing
– Aggregation
– Abstraction / adaptive fidelity / zoom-in
• Prediction (model-based), enables caching
• Routing (gradients in time and space, etc.)
• Abnormality detection (attacks, failures, mis-calibration)
• Equivalence
– Sampling a smaller set of nodes (sleep/wake-up)
– Topology control

Page 5: Ahmed Helmy

RUGGED: RoUting on finGerprint Gradients in sEnsor Networks
Jabed Faruque, Ahmed Helmy
Department of Electrical Engineering
University of Southern California
[email protected], [email protected]
URL: http://nile.usc.edu, http://ceng.usc.edu/~helmy
- Faruque, Psounis, Helmy, IEEE/ACM DCOSS 2005.
- Faruque, Helmy, IEEE ICPS 2004.

Page 6: Ahmed Helmy

Introduction
• Sensor networks are envisioned to be widely used for habitat and environmental monitoring, among others
• Every physical event produces a fingerprint in the environment
• Diffusion laws are usually an inherent property of many physical phenomena
– $f(d) \propto 1/d^{\alpha}$, where d = distance from the source and α = diffusion parameter, which depends on the type of effect (e.g., α = 1 for temperature, α = 2 for light)
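The diffusion law above can be illustrated with a short sketch. The function name and the `source_strength` scaling constant are assumptions made for illustration; the slides only state the proportionality.

```python
def effect_magnitude(distance, alpha, source_strength=1.0):
    """Magnitude of an event's effect at `distance`, assuming f(d) = s / d**alpha.

    `source_strength` (s) is a hypothetical proportionality constant.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return source_strength / distance ** alpha


# With alpha = 2 (the value the slides give for light),
# doubling the distance quarters the reading.
light_readings = [effect_magnitude(d, alpha=2) for d in (1, 2, 4)]
```

A larger α gives a steeper gradient near the source and a flatter one far away, which is why the finite tail mentioned later matters for routing.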

Page 7: Ahmed Helmy

Example (of diffusion): Isoseismal (intensity) maps (North Palm Springs earthquake of July 8, 1986)
Ref.: Southern California Earthquake Center. (http://www.scec.org)

Page 8: Ahmed Helmy

Why Is the Natural Information Gradient Important?
• This natural information gradient is FREE
• Routing protocols can use it to forward query packets (greedily)
– Locate event(s), e.g., fire, nuclear leakage.
• The diffusion property is not limited to natural phenomena
– Time gradient
• Existing approaches (flooding, expanding ring search, random walk, etc.) do not utilize this information gradient

Page 9: Ahmed Helmy

Challenges
- Erroneous readings from malfunctioning sensors
- Calibration errors and obstacles, which cause local maxima/minima
- Environmental noise
- In real life, sensors are unable to measure below a certain threshold, so the diffusion curve has a finite tail
- Non-uniform sensor distribution (gaps)
[Figure: magnitude of effect (0 to 100) vs. distance (0 to 200), with a local maximum, a dip, and a gap marked on the diffusion curve]

Page 10: Ahmed Helmy

Objective
Design an efficient algorithm to locate source(s) in sensor networks, utilizing the natural information gradient, i.e., the diffusion pattern of the event's effect
- Gradient based
- Fully distributed
- Robust to node or sensor failure or malfunction
- Capable of finding multiple sources
Environment Model
• Event's effect follows the diffusion law
• Discontinuity exists in the diffusion curve with finite tail
• Environmental noise

Page 11: Ahmed Helmy

Basic Protocol
• A node can be in one of two modes:
– flat region mode
– gradient region mode
• A node forwards the query to its neighbors together with its information level
• To forward the query, each node uses the following algorithm:
1. In the information gradient region, follow a greedy approach
– Forward the query to the neighbors if the information level about the event improves
2. In an unsmooth gradient region, use probabilistic forwarding based on the Simulated Annealing concept
– The probabilistic function is $f_p(x) = 1/x^{a}$, where x = hop count in the information gradient region and 'a' depends on the diffusion parameter (α)
3. Use flooding for the flat (i.e., zero) information region
– Decreases latency to reach the gradient information region
– Handles queries in the absence of an event
• Query ID prevents looping
• Once the query is resolved, the node uses the reverse path to reply
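The three forwarding rules above can be sketched as a per-node decision. This is a minimal illustration, not the RUGGED implementation: the function names, the way information levels are compared with the sender's level, and the injectable `rng` are all assumptions.

```python
import random


def forward_probability(hop_count, a):
    """f_p(x) = 1 / x**a, with x the hop count in the gradient region (x >= 1)."""
    return 1.0 / max(hop_count, 1) ** a


def should_forward(own_level, sender_level, hop_count, a, rng=random):
    """Decide whether a node relays a query it received from a neighbour.

    own_level / sender_level are information levels (e.g., sensor readings).
    """
    if own_level == 0 and sender_level == 0:
        return True  # rule 3: flat (zero-information) region, flood
    if own_level > sender_level:
        return True  # rule 1: information improves, forward greedily
    # rule 2: no improvement (possibly a local maximum), forward with f_p(x)
    return rng.random() < forward_probability(hop_count, a)
```

Because f_p shrinks as the hop count grows, probabilistic forwarding dies out quickly away from the point where the gradient stopped improving; duplicate suppression by query ID is assumed to happen before this decision.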

Page 12: Ahmed Helmy

[Figure: two example neighborhoods around an event E and a query Q. In one, node M_n is surrounded by neighbors n_g that forward the query as Q'. In the other, node M_x is surrounded by neighbors n_p.]
• All neighbors (n_g) of M_n have more information, so they forward the query to their neighbors
• All neighbors (n_p) of M_x have less information, so they forward the query to their neighbors probabilistically

Page 13: Ahmed Helmy

Query Types
• I. Single-value query
– Search for a specific value and have a single response
• II. Global Maxima search
– Search for the maximum value of information in the system
– Intermediate nodes suppress non-promising replies
• III. Multiple Events detection (still presents a challenge)
– Search for multiple events of the same type
Performance Metrics
• Reachability, i.e., success probability
– Probability that the query will reach the source
• Overhead in terms of average energy dissipation
– Number of transmissions to forward the query and to get the reply
• For the probabilistic function $f_p(x) = 1/x^{a}$, a < α is recommended, but a close to α gives the optimal trade-off between reachability and overhead
– Reachability of ~98% is achievable in the presence of noise, gaps and flat regions

Page 14: Ahmed Helmy

Comparisons
• Existing gradient-based routing protocols can be categorized into two major approaches
• Single-path approach
– CADR [Chu2002], Min-hop [Liu2003], …
• Multiple-path approach
– GRAB [Ye2003], RUGGED [Faruque2004]
Which approach to choose?

Page 15: Ahmed Helmy

Objective
• Analyze the performance of these general approaches to route a query
– Model query success rate and overhead using probability tools
– For ideal and lossy wireless link conditions
• Simulate the protocols based on these approaches in more realistic scenarios
– Also investigate the path quality metric
• Compare both approaches using analytical and simulation results

Page 16: Ahmed Helmy

Brief Description of Routing Approaches
[Figure: two grids of sensor readings (values from about 3 to 100), each with a query Q and the label S. Left panel: single-path query forwarding with look-ahead = 1. Right panel: multiple-path query forwarding. Legend: active node(s), candidate node.]

Page 17: Ahmed Helmy

Variations of Single-path Approach
Depends on the next active node selection policy
1. Basic single-path approach
– Selects the candidate node having maximum information, provided it is higher than that of the current active node
– Sensitive to local maxima
2. Improved single-path approach
– Selects the candidate node having maximum information
– Information of the selected node can be less than that of the current active node
[Figure: example neighborhoods with node information levels, marking the active node and the candidate nodes]
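The two selection policies can be contrasted in a few lines. This is a sketch under assumptions: candidates are given as a mapping from node id to information level, and the `visited` filter in the improved policy is a hypothetical addition to keep the query from bouncing back, not something the slides specify.

```python
def next_active_basic(active_level, candidates):
    """Basic policy: best candidate, but only if it beats the active node.

    Returns None when the query is stuck (e.g., at a local maximum).
    """
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best if candidates[best] > active_level else None


def next_active_improved(candidates, visited=()):
    """Improved policy: best unvisited candidate, even if it is worse."""
    unvisited = {n: v for n, v in candidates.items() if n not in visited}
    return max(unvisited, key=unvisited.get) if unvisited else None


# An active node with level 13 whose candidates are all lower:
candidates = {"a": 9, "b": 12, "c": 10}
stuck = next_active_basic(13, candidates)   # None: basic policy stops here
escape = next_active_improved(candidates)   # "b": improved policy moves on
```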

Page 18: Ahmed Helmy

Comparisons - Query Success Rate (ideal and lossy link case, p_c = 0.05)
[Figures: ideal link case, analytical result; lossy link case, analytical result]
• Query success rate of the improved single-path approach drops drastically for lossy links while the multiple-path approach is quite resilient
• ARQ may improve success rate of the improved single-path approach

Page 19: Ahmed Helmy

Comparisons - Overhead
[Figures: overhead of both approaches; energy saving of the multiple-path approach over the improved single-path approach]
• Multiple-path approach creates extra paths due to probabilistic forwarding, so overhead increases
• Single-path approach uses 1-hop look-ahead at every step to decide on the forwarder
• With the increase of malfunctioning nodes, the overhead of the single-path approach increases
– The length of the path increases

Page 20: Ahmed Helmy

Results - Path Quality (ideal link case)
• Path quality: ratio of the average path length due to a routing approach over the shortest path length between a source and a sink
• Multiple-path approach results in shorter paths, which are close to the shortest path
• With the increase of malfunctioning nodes, the path length of the single-path approach increases
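The path quality ratio defined above is simple to compute. The function name and the sample path lengths below are made up for illustration.

```python
def path_quality(path_lengths, shortest_length):
    """Average routed path length divided by the shortest path length.

    A value of 1.0 means the routing approach always found a shortest path.
    """
    average = sum(path_lengths) / len(path_lengths)
    return average / shortest_length


# Hypothetical hop counts from three runs, against a 10-hop shortest path.
ratio = path_quality([10, 12, 14], shortest_length=10)
```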

Page 21: Ahmed Helmy

Conclusions
• Multiple-path approach causes less overhead when a source is < 20 hops from the sink
• Multiple-path approach yields shorter paths
• With the increase of malfunctioning nodes, the query success rate of the multiple-path approach degrades gracefully
• With lossy links:
– Query success rate of the single-path approach drops drastically
– Multiple-path approach is quite resilient

Page 22: Ahmed Helmy

Future work
• Combine the benefits of both routing approaches in a hybrid routing approach
• Develop a more adaptive multiple-path approach to reduce the number of extra paths due to probabilistic forwarding
• Implementation & evaluation in a test-bed
– On-going 150 sensor node new test-bed at USC
– Continued work under the NSF-funded ACQUIRE project

Page 23: Ahmed Helmy

ACQUIRE: ACtive QUery Forwarding In Sensor Networks
Original team: Narayanan Sadagopan, Bhaskar Krishnamachari, Ahmed Helmy
Current: Sundeep Pattem, Jabed Faruque, Rahul Orgaonkar, Yongjin Kim, Jung-Hyun Jun, Sapon Tanachaiwiwat, Shao-Cheng Wang
Department of Electrical Engineering
USC Viterbi School of Engineering
University of Southern California
URL: http://ceng.usc.edu/~acquire
Funding: NSF NETS NOSS, Intel (equipment)

Page 24: Ahmed Helmy

Develop a model of variation over time (or space) using measurements

Use the model to predict data/readings. Only trigger updates or queries when data/readings deviate from the predicted value.

Depending on the data dynamics, we may be able to cache information collected earlier and answer queries without having to trigger new data collection.
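A minimal sketch of the predict-and-suppress idea above, assuming a simple linear model over time and strictly increasing timestamps. The class name, the `tolerance` parameter and the linear model are illustrative assumptions; the slides do not name a specific model.

```python
class PredictiveReporter:
    """Sensor-side filter: report a reading only when the model misses it."""

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.last_value = None  # last reported reading
        self.last_time = None   # time of the last report
        self.slope = 0.0        # estimated change per unit time

    def predict(self, t):
        """Value the sink would extrapolate for time t from the last report."""
        return self.last_value + self.slope * (t - self.last_time)

    def observe(self, t, value):
        """Return True if `value` must be sent as an update."""
        if self.last_value is None:
            self.last_value, self.last_time = value, t
            return True
        if abs(value - self.predict(t)) <= self.tolerance:
            return False  # prediction is good enough, stay silent
        self.slope = (value - self.last_value) / (t - self.last_time)
        self.last_value, self.last_time = value, t
        return True


reporter = PredictiveReporter(tolerance=0.5)
samples = [(0, 20.0), (1, 20.2), (2, 21.0), (3, 21.5), (4, 22.1)]
sent = [reporter.observe(t, v) for t, v in samples]
```

In this run only two of the five samples trigger an update; the other three can be answered from the cached model.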

Page 25: Ahmed Helmy

ACtive QUery forwarding In sensoR nEtworks (ACQUIRE)*
• A mechanism for answering one-shot, complex queries for replicated data in sensor nets:
– One-shot (vs. continuous): answers are given based on explicit queries about current readings.
– Complex (vs. simple): the query can contain several sub-queries, e.g., (x OR y) AND z.
– Replicated data: several sensors might have the answer to a sub-query.
• Example: Micro Climate Data Collection
– Different sensor modalities
– Give a location where (Temp > 80 degrees OR Humidity > 40%) AND Wind speed > 20 mph

* N. Sadagopan, B. Krishnamachari, A. Helmy, “Active Query Forwarding In Sensor Networks (ACQUIRE)”, AdHoc Networks Journal - Elsevier, Jan 2005 [Earlier version in SNPA ‘03]
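The micro-climate example is a complex query of the form (x OR y) AND z. A small sketch of evaluating it over collected readings follows; the field names and the sample readings are hypothetical.

```python
def matches(reading):
    """(Temp > 80 degrees OR Humidity > 40%) AND Wind speed > 20 mph."""
    warm_or_humid = reading["temp_f"] > 80 or reading["humidity_pct"] > 40
    return warm_or_humid and reading["wind_mph"] > 20


# Made-up readings keyed by location.
readings = {
    "ridge":  {"temp_f": 85, "humidity_pct": 30, "wind_mph": 25},
    "valley": {"temp_f": 70, "humidity_pct": 55, "wind_mph": 10},
    "shore":  {"temp_f": 72, "humidity_pct": 60, "wind_mph": 22},
}
locations = [loc for loc, r in readings.items() if matches(r)]
```

Since the query asks for one location and several sensors may satisfy it, the first match is enough; that is what makes the data "replicated" from the query's point of view.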

Page 26: Ahmed Helmy

Page 27: Ahmed Helmy

Flooding Based Queries (Directed Diffusion)
[Figure (a): Flooding of interest query [QA, QC] from querier node (sink x*) to nodes 1 to 10, which hold data items A to E]
[Figure (b): Response to query; responses such as [RA], [RC] and [RA, RC, RC] flow back toward x*]
Flooding:
• Useful for long-standing (continuous) queries
• Replicated responses might make it very inefficient.

Page 28: Ahmed Helmy

ACQUIRE
[Figure (d): Sample trajectory of active query (solid) and response (dashed) in basic ACQUIRE (zero look-ahead). The query leaves x* as [QA, QC], becomes [QA, RC] once C is resolved, and returns as the complete response [RA, RC]. Legend: active query, complete response, update messages.]
• An active node “refreshes” data from its “neighborhood”.
• The query is then forwarded to a node on the edge of the neighborhood

Page 29: Ahmed Helmy

ACQUIRE
• Key Features
– In-network processing
– Does not rely on geographic information or a unicast routing protocol
• The existence of these may considerably improve performance
– The look-ahead parameter d lets us span the space from random walk (d = 0) to flooding (d = D, the network diameter)

Page 30: Ahmed Helmy

ACQUIRE
• Look-ahead parameter, d
– Determines the size of the “neighborhood” in hops.
– Effects a tradeoff between the number of steps taken to resolve the query and the energy consumed.
– Optimal look-ahead, d*
• Depends on the query rate, refresh rate and the data dynamics (captured by the amortization factor, c)
• May be achieved by localized schemes.
• The higher the query rate and the lower the data dynamics, the higher the optimal look-ahead.
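The role of the look-ahead parameter d can be sketched as follows. This is an illustrative toy, not the ACQUIRE protocol: the graph representation, the `holds` mapping, and the deterministic choice of the next active node (lowest id on the edge of the neighborhood) are all assumptions, and energy is not modeled.

```python
from collections import deque


def neighborhood(graph, start, d):
    """Hop distance of every node within d hops of start (breadth-first)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == d:
            continue  # do not expand beyond d hops
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def acquire(graph, holds, querier, subqueries, d, max_steps=100):
    """Return (steps, resolved) for one active query.

    holds maps a node to the set of sub-queries it can answer.
    """
    pending = set(subqueries)
    active, visited = querier, {querier}
    for step in range(1, max_steps + 1):
        # The active node refreshes data from its d-hop neighborhood.
        for node in neighborhood(graph, active, d):
            pending -= holds.get(node, set())
        if not pending:
            return step, True
        # Forward to an unvisited node on the edge of the neighborhood.
        hop = max(1, d)
        ring = neighborhood(graph, active, hop)
        edge = sorted(n for n, h in ring.items() if h == hop and n not in visited)
        if not edge:
            return step, False
        active = edge[0]
        visited.add(active)
    return max_steps, False


# A 7-node line topology; only node 6 can answer sub-query "A".
line = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
holds = {6: {"A"}}
steps_by_d = {d: acquire(line, holds, 0, {"A"}, d)[0] for d in (0, 1, 3)}
```

Larger d resolves the query in fewer steps, but each step refreshes a bigger neighborhood, which is the steps-versus-energy tradeoff the slide describes.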

Page 31: Ahmed Helmy

Performance of ACQUIRE
[Figure: average energy per query (0 to 4000) vs. look-ahead parameter d (1 to 27) for N=1000, M=200, with one curve per value of c from 0.01 to 0.07]
c is the refresh/query ratio (e.g., 0.01 means refresh once every 100 queries) [the refresh overhead is amortized over the saving in queries]

Page 32: Ahmed Helmy

ACQUIRE
• Efficiency
– 60-75% energy savings over Expanding Ring Search (analytical results)
– Order of magnitude savings over flooding.
• Future Work
– Develop ACQUIRE into a full-fledged protocol that actively adapts the ‘d’ parameter for optimal performance
– Evaluation over an experimental sensor network test bed.
– ceng.usc.edu/~acquire

Page 33: Ahmed Helmy

Correlations and Inserted Data
• Main purpose of sensor networks: Collect Data
• Sybil attacks may insert false data that affect the operation of sensor networks:
– Impersonating multiple IDs (at same/different times)
– Outlier detection alone will not work
• Approach:
– Understand normal correlations between data
– Detect outliers based on reference to normal behavior
– Design a protocol robust to massive amounts of forged data

Page 34: Ahmed Helmy

Single Attacker Scenario I
Data: X from location (x,y) -- Interesting events
MobiQuitous 2005

Page 35: Ahmed Helmy

Single Attacker Scenario II
Data: X’ from location (x,y) -- Normal events
MobiQuitous 2005

Page 36: Ahmed Helmy

Sybil Attack Scenario I
Data: W_i from location (x_i, y_i) -- Interesting events
[Figure legend: source; source/forwarder; attackers (sybil nodes); inactive node; aggregator; sink]
MobiQuitous 2005

Page 37: Ahmed Helmy

Sybil Attack Scenario II
Data: W_i’ from location (x_i, y_i) -- Normal events
[Figure legend: source/forwarder; inactive node; aggregator; sink; attackers (sybil nodes)]
MobiQuitous 2005

Page 38: Ahmed Helmy

Data Correlation (Great Duck Island)

ID  | 111: T  P  H  | 116: T  P  H  | 122: T  P  H  | 126: T  P  H
111 | 1    1    1   |               |               |
116 | .74  .64  .74 | 1    1    1   |               |
122 | .83  .42  .91 | .84  .67  .80 | 1    1    1   |
126 | .67  .41  .56 | .55  .50  .64 | .70  .55  .77 | 1    1    1

T: Temperature, P: Pressure, H: Humidity
ID: Sensor ID (only 4 neighboring sensors are shown)
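Pairwise values like those in the table could be obtained with a standard correlation coefficient. The sketch below uses the Pearson coefficient, which is an assumption about how the table was computed, and the sample readings are made up rather than taken from the Great Duck Island data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length reading series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical temperature series from two neighboring sensors.
sensor_a = [14.0, 15.5, 17.0, 16.0, 13.5]
sensor_b = [13.8, 15.0, 17.4, 16.1, 13.0]
r_temperature = pearson(sensor_a, sensor_b)
```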

Page 39: Ahmed Helmy

Anomaly Relationship Test (ART) Architecture
Statistical Analysis Module
– T*-test (outlier threshold)
– Correlation-coefficient analysis
Authentication Module
– Distributed Interactive Proof
S. Tanachaiwiwat, A. Helmy, MobiQuitous 2005

Page 40: Ahmed Helmy

Anomaly Relationship Test (ART) Protocol
[Figure: source, Sybil and compromised/failed nodes, prover (attacker), verifiers (aggregator and forwarder), and sink]
(1) Correlation/T*-test
(2) Request valid credential
(3) Response with valid/invalid/no response
(4) Send report to sink
(5) Cross verify
Perform at verifiers only!
MobiQuitous 2005

Page 41: Ahmed Helmy

Summary
• Dynamic sliding window correlation analysis and the T*-test can alleviate the attack effectively, even under a full-scale attack from sybil nodes.
• Remarks
– Recognition of normal/abnormal/malicious events is based on statistical analysis
– Malicious data insertion can cause problems for critical missions in WSNs
– Error is reduced by using the Dynamic Sliding Window and a careful choice of correlation threshold
MobiQuitous 2005
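A simplified sketch of correlation checking over a sliding window follows. It uses a fixed window size and a plain Pearson threshold; the dynamic window sizing and the T*-test from the ART work are not reproduced, and all names and sample data are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mean_x) ** 2 for x in xs) *
                  sum((y - mean_y) ** 2 for y in ys))
    return cov / spread if spread else 0.0


def suspicious_windows(reference, reported, window, threshold):
    """Start indices of windows whose correlation with the reference is low."""
    flagged = []
    for start in range(len(reference) - window + 1):
        r = pearson(reference[start:start + window],
                    reported[start:start + window])
        if r < threshold:
            flagged.append(start)
    return flagged


# A trusted neighbor's readings vs. a source whose later data is forged.
reference = [1, 2, 3, 4, 5, 6, 7, 8]
reported = [1, 2, 3, 4, 8, 7, 6, 5]
flagged = suspicious_windows(reference, reported, window=4, threshold=0.0)
```

A verifier would only escalate to the authentication step for sources whose windows are flagged, which keeps the statistical test as a cheap first filter.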

Page 42: Ahmed Helmy

WLANs as Sensor Networks
Total Population: ~25,000 students
Wireless Users: ~6,000 students
Access Points: ~400

Page 43: Ahmed Helmy

IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis*
• Classes of future sensor networks will be attached to humans
• What kinds of correlations exist between users?
• Analyze measurements of wireless networks
– Understand wireless users' behavior (individual and group)
– Develop models to understand associations and friendship
• Study of relationships and user behavior based on measurements of various university WLANs

* W. Hsu, A. Helmy, “IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis”, USC TR, July ‘05 (Under Submission)

Page 44: Ahmed Helmy

Statistics of Studied Traces
- Four major campuses
- Month-long traces studied
- Total users in the study: over 12,000
- Total access points in the study: over 1,300

Page 45: Ahmed Helmy

Observations: On-line Time

On-off behavior is very common for wireless users. This seems especially true for small handheld devices. There are clear categories of heavy and light users, the distribution of which is skewed and heavily depends on the campus.

Page 46: Ahmed Helmy

Observations: Visited Access Points (APs)

•Individual users access only a very small portion of APs in the network, less than 35% in all campuses. The long-term mobility of users is highly skewed in terms of time associated with each AP. On average a user spends more than 95% of time at its top five most visited APs.

[percentage of visited APs]

Page 47: Ahmed Helmy

Observations: Visited APs

•The majority of users experience low mobility while using the network. This is even true for portable devices such as PDAs. The actual handoff statistics depend heavily on the environment.

Page 48: Ahmed Helmy

Observations: Similarity Index

• We observe clear repetitive patterns of association in wireless network users. Typically, user association patterns show the strongest repetitive pattern at a time gap of one day or one week.

Page 49: Ahmed Helmy

Observations: Encounters

• In all the traces, the MNs encounter a relatively small fraction of the user population: below 40% in most cases and never above 60%. Except for the UCSD trace, on average an MN only encounters 1.88%-5.94% of the whole population. The number of total encounters for the users follows a BiPareto distribution, the parameters of which depend on the campus.

Page 50: Ahmed Helmy

Encounter-graphs
• Definition
– When 2 nodes access the same AP at the same time we call this an ‘encounter’
– The encounter graph has all the mobile nodes as vertices and its edges link all those vertices that encounter each other
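The definition above translates directly into a construction from association records. The record layout `(node, ap, start, end)` and the sample sessions are assumptions; real WLAN traces would need parsing into this form first.

```python
from collections import defaultdict
from itertools import combinations


def encounter_graph(sessions):
    """Build the encounter graph from (node, ap, start, end) association records.

    Two nodes encounter each other when their sessions at the same AP overlap.
    Returns an adjacency mapping that includes nodes with no encounters.
    """
    graph = {}
    by_ap = defaultdict(list)
    for node, ap, start, end in sessions:
        graph.setdefault(node, set())  # every mobile node is a vertex
        by_ap[ap].append((node, start, end))
    for records in by_ap.values():
        for (u, s1, e1), (v, s2, e2) in combinations(records, 2):
            if u != v and s1 < e2 and s2 < e1:  # time intervals overlap
                graph[u].add(v)
                graph[v].add(u)
    return graph


sessions = [
    ("a", "ap1", 0, 10),
    ("b", "ap1", 5, 15),   # overlaps with a at ap1
    ("c", "ap1", 20, 30),  # same AP, but later
    ("d", "ap2", 0, 10),   # same time, different AP
]
graph = encounter_graph(sessions)
```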

Page 51: Ahmed Helmy

Regular Graph: high path length, high clustering
Random Graph: low path length, low clustering
Small World Graph: low path length, high clustering
- In Small Worlds, a few shortcuts contract the diameter (i.e., path length) of a regular graph to resemble the diameter of a random graph without affecting the graph structure (i.e., clustering)
[Figure: clustering and path length (0 to 1) vs. probability of re-wiring p (0.0001 to 1)]
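The small-world effect can be reproduced on a toy graph. Instead of random re-wiring with probability p, this sketch adds two fixed shortcuts to a ring lattice so the result is deterministic; that substitution, the graph size, and the function names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Regular graph: each node is linked to its k nearest ring neighbours."""
    half = k // 2
    return {i: {(i + j) % n for j in range(-half, half + 1) if j != 0}
            for i in range(n)}


def clustering(graph):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in graph.values():
        if len(nbrs) < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2 * links / (len(nbrs) * (len(nbrs) - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean hop distance over all reachable ordered node pairs (BFS per node)."""
    total = pairs = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


regular = ring_lattice(40, 4)
small_world = {node: set(nbrs) for node, nbrs in regular.items()}
for u, v in [(0, 20), (10, 30)]:  # two long-range shortcuts
    small_world[u].add(v)
    small_world[v].add(u)
```

Comparing `average_path_length` and `clustering` for the two graphs shows the path length dropping while clustering stays close to the lattice value.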

Page 52: Ahmed Helmy

Encounter-graphs and Friendship
• Encounters link most of the MNs together in a connected graph:
– Even though each MN encounters only a small portion of the population.
– The encounter graph is a Small World graph
– Even for a short time period (1 day), its clustering coefficient, average path length, and connectivity are all close to those for longer traces.
• Friendship between MNs is highly asymmetric.
– The distribution of the friendship index is exponential for all the traces, regardless of the friendship definition (based on time, encounter, or location).
– Among all node pairs, fewer than 5% have a friendship index larger than 0.01, and fewer than 1% have a friendship index larger than 0.4.

Page 53: Ahmed Helmy

Page 54: Ahmed Helmy

Encounter-graphs using Friends

• Top-ranked friends tend to form cliques, and low-ranked friends are the key to providing random links and reducing the degree of separation in the encounter graph.

Page 55: Ahmed Helmy

Encounter-based Information Diffusion

• Encounter patterns are rich enough to support information diffusion. Specifically, information can be delivered to more than 94% of users within two days. The reachability and average delay do not degrade significantly until at least ~40% of nodes are selfish.
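A minimal sketch of encounter-based diffusion, assuming an epidemic-style rule in which every informed node passes the message on at each encounter and selfish nodes receive but never forward. The record format, the rule itself and the sample encounters are assumptions, not the method used in the study.

```python
def diffuse(encounters, source, selfish=frozenset()):
    """Spread a message over timestamped encounters.

    encounters: iterable of (time, u, v) tuples with non-negative times.
    Returns a mapping from each informed node to the time it was informed.
    """
    informed = {source: 0}
    for t, u, v in sorted(encounters):
        for giver, taker in ((u, v), (v, u)):
            can_forward = giver == source or giver not in selfish
            if giver in informed and taker not in informed and can_forward:
                informed[taker] = t
    return informed


encounters = [(2, "b", "c"), (1, "a", "b"), (3, "c", "d")]
everyone = diffuse(encounters, "a")
with_selfish_b = diffuse(encounters, "a", selfish={"b"})
reachability = len(with_selfish_b) / 4  # fraction of the 4 nodes informed
```

Reachability and average delay as functions of the selfish fraction can then be read off the returned mapping.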

Page 56: Ahmed Helmy

Vision: Building Community-wide Wireless/Mobility Library

• Library of measurements from WLANs, mobility and associations from potential wireless societies (e.g., universities, vehicular nets)

• Library of realistic models of user behavior (e.g., mobility, traffic, friendship, encounter models, … )

• Library of benchmarks and guidelines for simulation and evaluation

• How much insight can we get by analyzing the traces?

• Can we use the insight to ‘design’ protocols of the future (not only for evaluation)?

• Currently 20 major universities willing to share their traces

• …. more to come: http://nile.usc.edu/MobiLib (under heavy update)

• If you have traces: [email protected] !

Page 57: Ahmed Helmy

Issues
• How can we model correlations accurately?
• How can we further utilize correlations?
• Context-aware protocols:
– Phenomenon-aware protocols
– Socially-aware protocols
• Other kinds of correlations:
– Sensor network test-beds: correlation between radio connectivity and phenomenon (e.g., rain)
– …

Page 58: Ahmed Helmy

Thank You!
• Related Links
– ACQUIRE: ceng.usc.edu/~acquire
– Mobility Library: nile.usc.edu/MobiLib
– Lab: nile.usc.edu
– Homepage: ceng.usc.edu/~helmy