Memento: Efficient Monitoring of Sensor Network Health
Stanislav Rost and Hari Balakrishnan, CSAIL, MIT. SECON, September 2006.
![Page 1: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/1.jpg)
Memento: Efficient Monitoring of Sensor Network Health
Stanislav Rost and Hari Balakrishnan
CSAIL, MIT
SECON, September 2006
![Page 2: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/2.jpg)
“Sed quis custodiet ipsos custodes?”
“But who watches the watchmen?”
- Juvenal, Satire VI
![Page 3: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/3.jpg)
Goals and Challenges of Monitoring
• Goals
  – Accuracy: minimize false alarms, maintenance
  – Timeliness: repair quickly, preserve sensor coverage
  – Efficiency: in power, bandwidth, to help longevity
• Challenges
  – Packet loss: inherent to wireless medium
  – Dynamic routing topology: adapts to link quality
  – Resource constraints: internal monitoring is not the primary application
![Page 4: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/4.jpg)
Memento Monitoring Suite Breakdown
• Failure detection: which nodes have failed?
• Collection protocol: gathering network-wide health status
• Watchdogs
• Logging
• Remote inspection
![Page 5: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/5.jpg)
Typical Sensornet Framework
• Assume a routing protocol, optimized by path metric to the root
• Example metric: ETX
  – expected transmission count to reliably transfer a packet
• Periodic communication
  – Protocol advertisements
  – Collection sweeps (1 per sweep period)
[Figure labels: root node, data collection server, gateway node, sensor nodes]
![Page 6: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/6.jpg)
Two Modules of Memento
• Failure detectors
  – Track communication of a subset of neighbors
  – Detect failures
  – Form liveness beliefs
• Collection protocol
  – Send liveness updates to the root
  – Aggregate along the way, vote on status by aggregation
• Fail-stop failure: node permanently stops communicating (until reset or repaired)
• Heartbeats: periodic beacons of other protocols, or Memento’s own
  – Known period of transmission
  – Packets include the source address
• Liveness update: a bitmap such that bit k = 1 if some node in the subtree thinks node k is alive
• Scope of opportunistic monitoring: all? children? some?
• Failure set calculation: at the gateway, [roster – live]
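The liveness bitmap and the gateway's [roster – live] calculation can be illustrated with a minimal sketch. It assumes bitmaps are plain integers and that aggregation along the tree is a bitwise OR; the function names `merge_liveness` and `failure_set` are hypothetical and not taken from the Memento implementation.

```python
def merge_liveness(own_bitmap: int, child_bitmaps: list[int]) -> int:
    """Aggregate liveness beliefs: bit k is 1 if any contributor thinks node k is alive."""
    merged = own_bitmap
    for bitmap in child_bitmaps:
        merged |= bitmap
    return merged


def failure_set(roster: set[int], live_bitmap: int) -> set[int]:
    """Gateway-side calculation: roster minus the nodes marked live."""
    return {k for k in roster if not (live_bitmap >> k) & 1}


# Two subtrees report to the root, which itself is node 0.
subtree_a = 0b00110   # some node believes nodes 1 and 2 are alive
subtree_b = 0b01000   # some node believes node 3 is alive
root_view = merge_liveness(0b00001, [subtree_a, subtree_b])
failed = failure_set({0, 1, 2, 3, 4}, root_view)   # node 4 has no live bit
```

Because OR is associative and idempotent, intermediate nodes can merge in any order without changing what the gateway sees.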
![Page 7: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/7.jpg)
Part I: Failure Detection
Problem Statement
• Given a maximum false positive rate parameter, develop a scheme which minimizes detection time
• Using distributed failure detection:
  – every node is a participant
  – each node may monitor a number of other nodes
![Page 8: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/8.jpg)
Adaptive Failure Detectors
• Declare a neighbor failed after an abnormally long gap in its sequence of heartbeat arrivals
• Estimate “normal” vs. “abnormal” loss bursts for each neighbor
• May produce false positives: beliefs that a node has failed when it is alive
![Page 9: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/9.jpg)
Variance-Bound Detector
• Samples and estimates mean, stdev of loss burst
• Provides a guarantee on rate of false positives
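• Based on the one-sided Chebyshev’s inequality:
$$P\left(X - \mu_X \ge t\,\sigma_X\right) \le \frac{1}{1 + t^2}$$
• Resulting timeout, with $\mu_{G_i}$ and $\sigma_{G_i}$ the sampled mean and stdev of $G_i$:
$$HTO_i = \mu_{G_i} + \sigma_{G_i}\sqrt{\frac{1 - FP_{req}}{FP_{req}}}$$
FPreq: goal for maximum false positive rate
Gi: number of consecutive missed heartbeats from neighbor i
HTOi: Heartbeat “TimeOut” (in hb) indicating failure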
![Page 10: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/10.jpg)
Loose Bounds Lead to Long Timeouts
• Chebyshev’s inequality provides the worst case for the extremes
Example data set: PMF of loss burst durations from a neighbor
target FP rate = 5%
mean = 4.61
stdev = 3.76
Heartbeat timeout = 22
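The example can be checked with a short calculation. This sketch assumes the Variance-Bound timeout is obtained by setting the one-sided Chebyshev bound 1/(1 + t²) equal to the target false positive rate and solving for t, so the timeout is mean + stdev · sqrt((1 − FP)/FP). The function name `variance_bound_timeout` is hypothetical.

```python
import math


def variance_bound_timeout(mean_gap: float, std_gap: float, fp_req: float) -> float:
    """Timeout (in heartbeats) from the one-sided Chebyshev bound.

    Solving 1 / (1 + t^2) = fp_req gives t = sqrt((1 - fp_req) / fp_req).
    """
    if not 0.0 < fp_req < 1.0:
        raise ValueError("fp_req must be strictly between 0 and 1")
    t = math.sqrt((1.0 - fp_req) / fp_req)
    return mean_gap + t * std_gap


# Values from the example data set: mean 4.61, stdev 3.76, 5% target.
timeout = variance_bound_timeout(4.61, 3.76, 0.05)
```

With the rounded statistics shown on the slide this evaluates to roughly 21 heartbeats. The slide reports 22, which may reflect unrounded sample statistics or a round-up convention; that explanation is an assumption, not something the slides state.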
![Page 11: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/11.jpg)
Empirical-CDF Detector
• Samples gap lengths, maintains counters
• If we want FPreq =X%, calculate an HTO that has less than X% chance of occurring
$$HTO_i = \min c \ \ \text{s.t.} \ \ \frac{\sum_{j=0}^{c} Count[G_i = j]}{\sum_{k} Count[G_i = k]} \ge (1 - FP_{req})$$
FPreq: goal for maximum false positive rate
Gi: number of consecutive missed heartbeats from neighbor i
HTOi: Heartbeat “TimeOut” (in hb) indicating failure
Count: vector of counters of occurrences of each gap length
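A minimal sketch of the Empirical-CDF rule, assuming `counts[j]` holds the number of observed gaps of exactly j missed heartbeats and that the timeout is the smallest gap length whose cumulative fraction reaches 1 − FPreq. The function name and the sample counters are hypothetical.

```python
def empirical_cdf_timeout(counts: list[int], fp_req: float) -> int:
    """Smallest gap length c whose empirical CDF is at least 1 - fp_req."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no gap samples collected yet")
    target = (1.0 - fp_req) * total
    running = 0
    for gap_length, occurrences in enumerate(counts):
        running += occurrences
        if running >= target:
            return gap_length
    return len(counts) - 1  # defensive fallback; the loop returns at the last index


# Hypothetical counters for gaps of 0..5 missed heartbeats (100 samples).
gap_counts = [50, 30, 10, 4, 3, 3]
hto_5pct = empirical_cdf_timeout(gap_counts, 0.05)
hto_1pct = empirical_cdf_timeout(gap_counts, 0.01)
```

A tighter false positive target pushes the timeout further into the tail of the observed distribution, which is the detection-time cost of higher confidence.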
![Page 12: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/12.jpg)
Same Example, Better Bound
$$\min c \ \ \text{s.t.} \ \ \frac{\sum_{j=0}^{c} Count[G_i = j]}{\sum_{k} Count[G_i = k]} \ge 0.95$$
Example data set: CDF of loss burst durations from a neighbor
target FP rate = 5%
Heartbeat timeout = 12
• Bounds the timeout by the outliers within the requisite percentile
![Page 13: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/13.jpg)
Testing the Tradeoffs on the Experimental Testbed
• Deployed 55-node testbed
• 16,076 square feet
• Implemented in TinyOS v1.4
• Runs on mica2 motes, crickets, EmStar
![Page 14: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/14.jpg)
Failure Detector Comparison
• 45 minutes in duration
• Pick X nodes randomly (X ∈ {2,4,6,8})
• Schedule their failure at a random time
• Run same failure schedule for all detector algorithms
• Routing = ETX-based tree
  – Routing stability threshold = 1.5
• Sweep period = 30 seconds, heartbeat (hb) period = 10 seconds
![Page 15: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/15.jpg)
Contenders
• Direct-Heartbeat
  – Sends descendants’ liveness bitmaps to the root, with aggregation à la TinyDB
  – If root hears no update about X, assumes X is dead
• Variance-Bound, 1% FP target
  – Each node monitors its children
• Empirical-CDF, 1% FP target
  – Each node monitors its children
• Opportunistic Variance-Bound, 1% FP
  – Each node monitors any neighbor whose packet loss < 30%
![Page 16: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/16.jpg)
Evaluating Failure Detectors: False Positive Rate
![Page 17: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/17.jpg)
Evaluating Failure Detectors: Detection Time
![Page 18: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/18.jpg)
Explaining the Results
• Empirical-CDF has trouble during the learning phase
• The learning happens whenever a node gets new children
  – After another node has failed
  – After routing reconfiguration
• Opportunistic monitoring inflates the detection time
  – Neighbors with higher loss need more time to achieve confidence in failure
![Page 19: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/19.jpg)
Meeting the False Positive Guarantee
• How far can we push our FP target?
![Page 20: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/20.jpg)
Tradeoffs and Limits of Guarantees
![Page 21: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/21.jpg)
Tradeoffs and Limits of Guarantees
![Page 22: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/22.jpg)
Take-Home Lessons
• 5x patience gets you 1000x confidence
• Neighborhood opportunism is a must to make failure detection practically useful in wireless environments
![Page 23: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/23.jpg)
Part II: Collecting the Network Status
Aggregation
• TinyAggregation [TinyDB]
![Page 24: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/24.jpg)
Our Collection Protocol
Memento
Aggregation
• Parent caches result
• Node sends an update only if its result or parent changes
![Page 25: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/25.jpg)
Collection Protocol Summary
• Uses caching to suppress unnecessary communication
• Network-associative cache coherence is tricky; we propose mechanisms to maintain it
• Saves 80-90% bandwidth relative to state-of-the-art
• More sensitive to the rate of change in the update than to routing reconfigurations
![Page 26: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/26.jpg)
Conclusions
• Memento collection protocol is very efficient in terms of bandwidth/energy, and well-suited for monitoring
• [In paper] Monitoring more neighbors does not lead to better performance
• New failure detectors, based on application needs
• Need to use neighborhood opportunism to get acceptably low false positive rate
![Page 27: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/27.jpg)
End of Talk
• Questions?
![Page 28: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/28.jpg)
Memento’s Approach to Cache Coherence
• Children switch away?
  – snoop routing packets with parent address
• Node failures
  – failure detectors clear the cache
• Parent cache out of sync?
  – snoop parent updates, see if consistent with your results
  – parents advertise a vector of result sequence #’s
• Finite cache slots for child results?
  – augment routing to subscribe to parents
![Page 29: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/29.jpg)
Collection Protocol Modules
![Page 30: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/30.jpg)
Collection Protocol Evaluation
• Sensitivity to rate of switching parents?
  – Use ETX, vary the stability threshold (the minimum improvement in “goodness” necessary to switch parents)
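How a stability threshold damps parent switching can be shown with a small sketch. It assumes lower path ETX is better and that "improvement" is measured as a plain difference in path ETX; the slides do not say whether the threshold is a difference or a ratio, so treat that as an assumption. The function `choose_parent` and the ETX values are hypothetical.

```python
def choose_parent(current: str, path_etx: dict[str, float], stability_threshold: float) -> str:
    """Switch parents only if the best candidate improves path ETX by the threshold."""
    best = min(path_etx, key=path_etx.get)
    if current not in path_etx:
        return best  # current parent is no longer a candidate
    improvement = path_etx[current] - path_etx[best]
    return best if improvement >= stability_threshold else current


candidates = {"A": 4.0, "B": 3.0, "C": 2.2}
from_a = choose_parent("A", candidates, 1.5)  # improvement 1.8: switch to C
from_b = choose_parent("B", candidates, 1.5)  # improvement 0.8: stay with B
```

A larger threshold means fewer parent changes, and therefore fewer cache invalidations in the collection protocol, at the cost of staying on slightly worse routes.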
![Page 31: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/31.jpg)
Collection Protocol Performance vs Routing Stability
![Page 32: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/32.jpg)
Collection Protocol Evaluation
• Sensitivity to the rate of change in node results?
  – Fix the topology
  – Vary the fraction of nodes whose result changes every sweep
![Page 33: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/33.jpg)
Collection Protocol Performance vs Rate of Change of State
![Page 34: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/34.jpg)
Status Collection Byte Overhead
![Page 35: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/35.jpg)
Related Work
• Sympathy for the Sensor Network Debugger [Ramanathan, Kohler, Estrin | SENSYS ‘05]
• Nucleus [Tolle, Culler | EWSN ‘05]
• TiNA: Temporal Coherency-Aware In-Network Aggregation [Sharaf, Beaver, Labrinidis, Chrysanthis | MobiDE ‘03]
• On Failure Detection Algorithms in Overlay Networks [Zhuang, Geels, Stoica, Katz | INFOCOM ‘05]
• Unreliable Failure Detectors [Chandra, Toueg | JACM ‘96] [Gupta, Chandra, Goldszmidt | PODC ‘01]
![Page 36: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/36.jpg)
More Memento
• Symptom alerts: similar to liveness bitmaps
  – Watchdogs: core health metrics crossing danger thresholds trigger alarms
• Logging to stable storage, to neighbors
• Inspection:
  – Cached alert aggregates serve as “breadcrumbs” on the way back to the sources, prune query floods
• Example app: detecting network partitioning
  – Node X dies, becomes point of fracture
  – Its parent P sends bitmap of children as “partitioned”
![Page 37: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/37.jpg)
Future Work
• State management in ad-hoc networks
  – Dynamic, yet stateful protocols
  – Working on: management of transfers of large samples
• Static statistical properties of non-mobile deployments
  – Leverage models of group sampling to reduce redundancy, provide load-balancing
  – Working on: statistical modeling, building local models representative of global behavior
![Page 38: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/38.jpg)
Simple Failure Detectors
• “Direct-Heartbeat”
  – A neighbor is alive if one or more of its heartbeats is received since last sweep
  – A neighbor has failed if failure detector has missed last K consecutive heartbeats
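The K-consecutive-misses rule can be sketched as a small per-neighbor counter. The class name `KMissDetector` and the sample heartbeat trace are hypothetical; the sketch assumes `on_period` is called exactly once per heartbeat period.

```python
class KMissDetector:
    """Declares a neighbor failed after K consecutive missed heartbeats."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.missed = 0

    def on_period(self, heard: bool) -> bool:
        """Record one heartbeat period; return True if the neighbor is believed failed."""
        self.missed = 0 if heard else self.missed + 1
        return self.missed >= self.k


detector = KMissDetector(k=3)
trace = [True, False, False, True, False, False, False]
beliefs = [detector.on_period(heard) for heard in trace]
```

A single received heartbeat resets the counter, so short loss bursts never trigger an alarm; only the final run of three misses in the trace does.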
![Page 39: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/39.jpg)
Dilemma: False Failure Alarms vs Detection Time
• Choose network-wide K given CDF of loss bursts:
![Page 40: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/40.jpg)
Memento Performance Summary
• Intended for non-mobile deployments
• When node status fluctuates, approaches the costs of the cache-less scheme
• Results so far for a long, narrow tree
  – 6 hops max depth
  – 2.5 average children
![Page 41: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/41.jpg)
Scope of the Opportunism
• Which neighbors are worth monitoring?
![Page 42: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/42.jpg)
All Children
Picking Neighbors to Monitor
Pick neighbors whose heartbeat delivery probability > X
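The selection rule amounts to a threshold filter over per-neighbor delivery estimates. In this sketch the function `pick_neighbors` and the probability values are hypothetical; a threshold of 0.7 corresponds to the "packet loss < 30%" setting used for the opportunistic detector in the comparison.

```python
def pick_neighbors(delivery_prob: dict[int, float], threshold: float) -> set[int]:
    """Keep neighbors whose estimated heartbeat delivery probability exceeds the threshold."""
    return {node for node, prob in delivery_prob.items() if prob > threshold}


# Hypothetical delivery estimates for three neighbors.
estimates = {2: 0.95, 3: 0.72, 4: 0.40}
monitored = pick_neighbors(estimates, 0.7)
```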
![Page 43: Memento: Efficient Monitoring of Sensor Network Health](https://reader030.vdocuments.us/reader030/viewer/2022013101/56814dc7550346895dbb1948/html5/thumbnails/43.jpg)
Tradeoffs in Failure Detection