Observations and Remarks Concerning the Development of Joint WSN Protocol Benchmarks
Jan Beutel, ETH Zurich
[Figure: platforms/protocols and testbeds/methodology, evolving from prototype to 2nd- and 3rd-generation systems]
My Background
Long-term (Monitoring) Applications
[Figure: an observer network deployed alongside the target sensor network]
Past chair TinyOS Testbed Working Group
A Big Success – The Rise of WSN Testbeds
The FlockLab Testbed
[Figure: FlockLab architecture – observers on an Ethernet/WLAN backbone, each hosting targets 1–4]
• http://www.flocklab.ethz.ch
• 4 Target HW Architectures
• 31-Node Testbed
  − Ethernet/WLAN backbone
  − In- & Outdoor
FlockLab’s Target-Observer Model
• Stateful observer, out-of-band backchannel, multiple services
• Fast, distributed tracing and actuation of logic
• Deep local storage
• Synchronized power tracing
• Voltage control
• Sensor stimuli and references
• Time synchronization to ~20 µs (NTP)
[R. Lim, F. Ferrari, M. Zimmerling, C. Walser, P. Sommer and J. Beutel: FlockLab: A Testbed for Distributed, Synchronized Tracing and Profiling of Wireless Embedded Systems. Proc. 12th Int'l Conf. Information Processing in Sensor Networks (IPSN 2013), Philadelphia, Pennsylvania, USA, pp. 153–165, April 2013.]
Observe:
• GPIO tracing (see the parsing sketch below)
• Power profiles
• Serial communication

Control:
• GPIO actuation
• Voltage control
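As a concrete illustration of how GPIO traces from such a testbed can be consumed, here is a minimal Python sketch. The file name and column layout (timestamp, observer_id, node_id, pin_name, value) are assumptions for illustration, not FlockLab's authoritative result format.

```python
# Minimal sketch: extracting per-node event timing from a FlockLab-style
# GPIO trace. The CSV layout (timestamp, observer_id, node_id, pin_name,
# value) is an assumption for illustration, not the authoritative format.
import csv
from collections import defaultdict

def load_gpio_events(path, pin="LED1"):
    """Return {node_id: [timestamps]} of rising edges on `pin`."""
    events = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["pin_name"] == pin and int(row["value"]) == 1:
                events[int(row["node_id"])].append(float(row["timestamp"]))
    return events

if __name__ == "__main__":
    events = load_gpio_events("gpiotracing.csv")  # hypothetical file name
    for node, ts in sorted(events.items()):
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        if gaps:
            print(f"node {node}: {len(ts)} edges, "
                  f"mean period {sum(gaps)/len(gaps)*1e3:.2f} ms")
```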
Recent Extensions to FlockLab
Observe: 56 kHz, 24 bits; 100 MHz (285 kHz continuous)
Control: 1.8 V – 3.3 V (100 mV steps)
Sub-1 µs synchronization
• High-throughput sampling (FPGA-based); see the data-rate arithmetic below
• Low-jitter, accurate time synchronization
• Integration with the Cooja simulation toolchain
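To put the sampling figures above in perspective, a back-of-the-envelope sketch of the raw data rates they imply (illustrative arithmetic only; per channel, ignoring framing and protocol overhead):

```python
# Back-of-the-envelope data rates implied by the extended observer specs
# (illustrative arithmetic only; per channel, no protocol overhead).
POWER_FS = 56_000        # power sampling rate [Hz]
POWER_BITS = 24          # ADC resolution [bits]
GPIO_FS_CONT = 285_000   # continuous GPIO sampling rate [Hz]

power_rate = POWER_FS * POWER_BITS / 8   # bytes/s for power tracing
gpio_rate = GPIO_FS_CONT / 8             # bytes/s for 1-bit GPIO samples

print(f"power tracing: {power_rate/1e3:.0f} kB/s "
      f"(~{power_rate*3600/1e9:.1f} GB/h)")
print(f"continuous GPIO (1 pin): {gpio_rate/1e3:.1f} kB/s")
```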
FlockLab Extension Implementations
[Figure: extended observer architecture – a data acquisition system (FPGA), time synchronization via a low-power radio, a Gumstix embedded computer (Linux), and distributed GPS time references; each observer hosts targets 1–4]
Observe:
• GPIO tracing
• Power profiles
• Serial communication

Control:
• GPIO actuation
• Voltage control
FlockLab Testbed Statistics
[Chart: FlockLab tests per target platform, 2012–2017 (log scale, 1–10,000); platforms: Tmote, Mica2, TinyNode, Opal, Iris, CC430, ACM2, OpenMote, Wismote, DPP]
[Chart: FlockLab number of active users per target platform, 2012–2017]
[Chart: FlockLab test service usage, 2012–2017; services: serial, GPIO tracing, GPIO actuation, power profiling; plus total tests]
[Chart: time the FlockLab testbed was occupied, 2012–2017, in hours and percent]
Number of Users: 190
Total Tests: 31,168
Time Occupied [h]: 15,155
Time Occupied [%]: 34.8
Average Test Duration [min]: 29.2
(Data as of 2017-05-09)
FlockLab Testbed User Demographics
[Charts: FlockLab users by country vs. SenSys 2010 participants by country]
FlockLab users: Switzerland, United States, Germany, India, Sweden, China, Ireland, Singapore, United Kingdom, Brazil, Taiwan, Australia, France, Italy, Japan, Netherlands, Algeria, Austria
The RocketLogger
• Environmental sensor hub: extensible measurement platform
• Mixed-signal capability: digital inputs for state monitoring
• High dynamic range measurement: 2× each, 4 nA – 500 mA and 13 µV – 5.5 V (see the arithmetic below)
• Portable size: in-situ measurements
• Remote web interface: remote control and observation
• Seamless range switching: ≤1.4 µs, at ≤430 mV drop
• Hardware and software open source: https://rocketlogger.ethz.ch/
[L. Sigrist, A. Gomez, R. Lim, S. Lippuner, M. Leubin and L. Thiele: Measurement and Validation of Energy Harvesting IoT Devices. Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE 2017), Lausanne, Switzerland, March 2017.]
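Why seamless range switching is essential becomes clear from the arithmetic behind these ranges: no single ADC range spans them. A small illustrative computation:

```python
# Illustrative arithmetic: the dynamic range behind "4 nA - 500 mA"
# and "13 uV - 5.5 V" (not taken from the RocketLogger documentation).
import math

ranges = [("current", 4e-9, 500e-3),   # [A]
          ("voltage", 13e-6, 5.5)]     # [V]

for name, lo, hi in ranges:
    ratio = hi / lo
    db = 20 * math.log10(ratio)        # dynamic range in dB
    bits = math.log2(ratio)            # equivalent ADC bits to cover it
    print(f"{name}: {ratio:.2e} ({db:.0f} dB, ~{bits:.0f} bits)")
```

The current range alone spans roughly 162 dB (about 27 bits), which is why the device switches ranges on the fly rather than relying on one converter.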
Performance Comparison to State of the Art
The Application Space (aka Requirements)
• All applications are different.
• All (environmental) embeddings are different.
• Today, attention to a very high level of detail is required (the bar has been raised).
• Increasingly, applications have high-dynamic-range requirements.
• Conflicting goals typically exist.
Conflicting Goals
[Figure: design space spanned by power consumption, bandwidth, and reactivity, locating cell phones, smartphones, high-performance computing, data centers, and sensor networks]

But also:
• Battery life
• Cost
• Avg. vs. max speed
• Resource usage
• Latency
• Reliability
• Size
• …
Our System Design Community
• Protocols are typically custom-developed for specific application requirements.
• Often based on custom-developed, prototype system architectures.
• Mix of open- and closed-source/IP solutions.
• Adoption of standards is slow.
• Persistent, strong disagreement on a "gold standard".
• This has held from the beginning of the field in the late '90s until now.
• Skepticism toward other people's solutions.
• Raw data from tests are usually not made available.
Simple Metrics – Tables Lack Context and Detail
[Figure: excerpts from a SPOTS paper table, the Atmel datasheet, and the original Crossbow datasheets; e.g., the CPU clock is listed as 7.3 MHz for the Mica2 and 4 MHz for the Mica2Dot]
An Arduous Art – Tables Redone Properly
[Figure: carefully redone metric tables for the Mica2, Mica2Dot, Tmote Sky, and Imote platforms]
[J. Beutel: Metrics for Sensor Network Platforms. Proc. ACM Workshop on Real-World Wireless Sensor Networks (REALWSN 2006), pp. 26–30, June 2006.]
Benchmarks/Metrics
• Must be…
  … based on measurements
  … reproducible
  … system-independent
  … include descriptive statistics
  … unambiguously citable
• The multi-dimensionality of benchmark results requires methods for further exploration (and ranking) of Pareto-optimal solutions (see the sketch below).
• Probably only a bottom-up approach (with obvious utility) will be convincing to a broad audience.
Image source: Wikipedia
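A minimal sketch of what such exploration can start from: extracting the Pareto front from multi-dimensional benchmark results. The protocol names and metric values below are hypothetical, and all metrics are framed so that lower is better.

```python
# Minimal sketch of Pareto-front extraction over multi-dimensional
# benchmark results (all metrics framed so that lower is better).
def pareto_front(results):
    """results: {name: (metric1, metric2, ...)}; returns non-dominated names."""
    def dominates(a, b):
        # a dominates b if it is no worse in every metric and not identical
        return all(x <= y for x, y in zip(a, b)) and a != b
    return [n for n, v in results.items()
            if not any(dominates(w, v) for w in results.values())]

# Hypothetical results: (energy per packet [mJ], latency [ms], loss [%])
results = {
    "proto_A": (1.2, 40.0, 0.5),
    "proto_B": (0.8, 95.0, 0.3),
    "proto_C": (1.5, 42.0, 0.9),   # dominated by proto_A
}
print(pareto_front(results))        # ['proto_A', 'proto_B']
```

Ranking within the front then requires an explicit weighting or a scenario, which is exactly where benchmark design choices become visible.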
WSN Benchmarks: Aspects to Consider
Primary Metrics
• Bandwidth
• Throughput (goodput)
• Latency (short-term, long-term)
• Jitter/burstiness
• Error rate (efficiency, reliability)

Ratios as Metrics
• Duty cycle (active vs. idle)
• Computed amount vs. energy used
• Success rate vs. packet length

Secondary Metrics
• Signal-to-noise ratio (SNR)
  • TX signal power, RX sensitivity, level of ambient noise, etc.
• Spectral aspects
  • Frequency, modulation, signaling
  • Spectral efficiency (bits/s/Hz)
• Radio control mechanisms
  • e.g., radio ready/switch-on timing

(A computation sketch for several of these metrics follows below.)
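A minimal Python sketch of how several of the primary metrics and the duty-cycle ratio could be computed from a packet log; the log format, the values, and the radio-on time are hypothetical.

```python
# Minimal sketch computing several of the listed metrics from a packet
# log; the field layout and all values are hypothetical.
from statistics import mean, pstdev

# (seq, t_sent [s], t_recv [s] or None if lost, payload [bytes])
log = [(1, 0.00, 0.012, 64), (2, 0.10, 0.115, 64),
       (3, 0.20, None,  64), (4, 0.30, 0.309, 64)]

received = [p for p in log if p[2] is not None]
duration = max(p[1] for p in log) - min(p[1] for p in log)

prr = len(received) / len(log)                        # packet reception ratio
goodput = sum(p[3] * 8 for p in received) / duration  # useful bits/s
latencies = [p[2] - p[1] for p in received]
jitter = pstdev(latencies)                            # latency variation

radio_on, total = 0.42, 10.0                          # assumed radio-on time [s]
duty_cycle = radio_on / total                         # active vs. idle ratio

print(f"PRR {prr:.0%}, goodput {goodput:.0f} bit/s, "
      f"latency {mean(latencies)*1e3:.1f} ms ± {jitter*1e3:.1f} ms, "
      f"duty cycle {duty_cycle:.1%}")
```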
But Hey – Where is Power/Energy?
• Power consumption should be factored out (of a benchmark metric).
• Yes, it is important: WSNs are all about power, low-power design, battery lifetime, etc.
• I too like power and work on low-power aspects; it is the common thread through my scientific career. Most of my papers give (extensive) power figures, often power traces.
• But power consumption (as a scalar value) is only a consequence of an implementation, not a property of a protocol/algorithm.
• In practice, few people really know what they are doing. Exact measurements are very hard and require the right equipment/approach. Results are often displayed in funny/incomplete/doubtful contexts.
• If you tune your application longer, power consumption will be lower. Almost always. And there are many tweaks to tune…
• Examples (see the arithmetic sketch below):
  1. Implementation X runs on battery A for 5 years. Everyone applauds. You change the battery to 3×A and it runs longer. More applause. But scalar power consumption values do not characterize the application/protocol or whatever else uses the energy (unless everyone agrees to use the same resources/platform/energy supply).
  2. You implement protocol Y on platform B. Measured power ranks this implementation 3rd in the all-time listing. A new chip comes out and you re-implement Y on platform B*. Measured power will be different (lower), but this has nothing to do with the protocol, just with the underlying hardware (Moore's Law).
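The arithmetic behind example 1 makes the point explicit: tripling the battery capacity triples the lifetime while saying nothing about the protocol. The current draw and capacity values below are assumed for illustration.

```python
# Illustrative arithmetic for example 1: tripling the battery capacity
# triples the lifetime without telling us anything about the protocol.
def lifetime_years(capacity_mah, avg_current_ma):
    """Idealized lifetime: capacity / average draw, converted to years."""
    return capacity_mah / avg_current_ma / (24 * 365)

avg_current_ma = 0.059   # assumed average draw of "implementation X" [mA]
battery_a = 2600         # assumed capacity of "battery A" [mAh]

print(f"battery A:   {lifetime_years(battery_a, avg_current_ma):.1f} years")
print(f"battery 3xA: {lifetime_years(3 * battery_a, avg_current_ma):.1f} years")
```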
Open Questions
• Who is the intended audience? (Where do we expect to have impact?)
• Micro vs. macro benchmarks?
  • Both?
  • Focus on one (first)?
  • Scenario-driven? If yes, which?
• What is the relation of this benchmarking initiative to existing standardization bodies, e.g., the IETF?