Data Quality and Query Cost in Pervasive Sensing Systems
David J. Yates
Bentley College, Computer Information Systems Dept.
Waltham, Massachusetts
Joint Work With …
Erich Nahum, IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, New York, USA
James Kurose and Prashant Shenoy, Dept. of Computer Science
University of Massachusetts, Amherst, Massachusetts, USA
Talk Outline
• Data quality and query cost for pervasive sensing systems
• Motivation and introduction
• Pervasive sensing applications
• Resource-constrained sensor fields
• Sensor networks and backbone networks
• Data management techniques to conserve resources
• Sensor network data server and cache
• Query cost, data quality, delay, value deviation
• Cost and quality performance
• Summary and Conclusions
Research Contributions
• Define and quantify data quality and query cost performance in pervasive sensing systems
• Develop policies that approximate sensor field values using cached values for nearby locations
• Prove analytic upper bound on sensor field query rate
• Show cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes
Pervasive Sensing Applications
• Microsensors, on-board processing, wireless interfaces feasible at very small scale – can monitor phenomena “up close”
• Enables spatially and temporally dense monitoring and control
Pervasive sensing will reveal previously unobservable phenomena
Data center management
Manufacturing engineering
Environmental monitoring
Natural disaster response
Embedded, energy-constrained (wireless, small form-factor), unattended systems
Sensors Embedded in Infrastructure
• The day after a moderate earthquake jolts the city of San Francisco, building inspectors check on the structural integrity of an office building in the financial district. Sensors embedded in the walls of the building to monitor and record vibration data confirm that the structure is safe to enter. (Intel 2005)
From Sensor Networks to Applications
• Sensor fields (blue), backbone (yellow), monitoring & control applications (red)
• Queries submitted from sensing applications
• Replies received from sensor fields
• Our focus – data management at the data server
[Figure: light and sound sensor fields connect through a data server / gateway (and cache) and a backbone of routers & switches to a sensing application; the sensor fields are embedded, energy-constrained (wireless, small form-factor), unattended systems]
Data Server Node Without Cache
[Figure: queries arriving at the data server pass through the sensor network query queue into the sensor field (s = sensor); replies return through the gateway reply queue. Query locations l1 and l2 are marked with timestamps {t1} and {t2}.]
l_i = query location i; t_i = timestamp associated with the value sampled in the sensor field at location i
Data Server Node Without Cache
[Figure: data server node without cache, tracing one query, Query_m, for location l_i through the sensor network query queue into the sensor field (s = sensor), and its reply, Reply_m, back through the gateway reply queue; l_i = query location i; t_i = timestamp associated with the value sampled in the sensor field at location i]
End-to-end delay is the time between Query_m and Reply_m. Value deviation is the difference between the value in Reply_m and the value at l_i at the moment Reply_m leaves the gateway reply queue.
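Both per-query metrics can be written down directly. The sketch below makes several assumptions: time is in seconds, `true_value_at` is a hypothetical ground-truth function for location l_i, and absolute error is the deviation metric. The talk does not fix the metric, and none of these names come from it.

```python
from typing import Callable


def end_to_end_delay(query_time: float, reply_time: float) -> float:
    """Time from Query_m arriving at the data server to Reply_m leaving
    the gateway reply queue (seconds)."""
    return reply_time - query_time


def value_deviation(reply_value: float,
                    true_value_at: Callable[[float], float],
                    reply_time: float) -> float:
    """Gap between the value carried in Reply_m and the value at location
    l_i when Reply_m leaves the gateway reply queue.
    Assumption: absolute error is used as the deviation metric."""
    return abs(reply_value - true_value_at(reply_time))


def light_at_li(t: float) -> float:
    """Hypothetical ground truth: light at l_i rises 1 lux/s from 250 lux."""
    return 250.0 + t


if __name__ == "__main__":
    print(end_to_end_delay(10.0, 12.5))                # 2.5 (seconds)
    print(value_deviation(255.0, light_at_li, 12.5))   # 7.5 (lux)
```

The same two functions apply with or without a cache. Only the source of `reply_value` changes.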
Data Server Node With Cache
[Figure: queries enter the gateway query queue and are looked up in the cache. Hits are answered from the cache via the gateway reply queue; misses and prefetches go through the sensor network query queue to the sensor field (s = sensor), and the resulting updates or replies return through the cache update queue. Query locations l1, l2, and l3 lie in the sensor field; Query_m and Reply_m mark one query and its reply.]
l_i = query location i; e_li = cache entry for query location l_i
t_i = timestamp of the value associated with location i; v_i = value in cache associated with location i
e_li = {l_i, v_i, t_i}
For a cache hit or a miss, end-to-end delay is the time between Query_m and Reply_m, and value deviation is the difference between the value in Reply_m and the value at l_i at the moment Reply_m leaves the gateway reply queue.
Locations l1 and l2 are cached in entries e_l1 and e_l2.
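The cache entry e_li = {l_i, v_i, t_i} maps naturally onto a small record keyed by location. This is a minimal sketch, not the authors' implementation. `DataServerCache`, `query_sensor_field`, and the hit/miss counters are hypothetical names. Any existing entry counts as a hit regardless of its age, and freshness rules are left to the lookup policies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    """Cache entry e_li = {l_i, v_i, t_i}."""
    location: int     # l_i
    value: float      # v_i
    timestamp: float  # t_i


class DataServerCache:
    """Hypothetical data server cache keyed by query location."""

    def __init__(self, query_sensor_field: Callable[[int, float], float]):
        # Stand-in for sending a query through the sensor network query queue.
        self.query_sensor_field = query_sensor_field
        self.entries: Dict[int, CacheEntry] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, location: int, now: float) -> Tuple[float, bool]:
        """Return (value, was_hit) for a query at time `now`."""
        entry = self.entries.get(location)
        if entry is not None:
            self.hits += 1
            return entry.value, True
        # Miss: query the sensor field and install the update as a new entry.
        # Assumes the value is sampled at `now` (network delay ignored).
        self.misses += 1
        value = self.query_sensor_field(location, now)
        self.entries[location] = CacheEntry(location, value, now)
        return value, False


if __name__ == "__main__":
    cache = DataServerCache(lambda loc, t: 100.0 + loc)
    print(cache.lookup(1, 0.0))  # (101.0, False): miss, sensor field queried
    print(cache.lookup(1, 5.0))  # (101.0, True): hit on entry e_l1
```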
Query Cost and Data Quality
Cost to query location l_i is normalized such that $\min(\mathrm{Cost}_{l_i}) = 1$ unit.
Normalized quality using softmax normalization:
$Q_n = A\,S'_d + (1 - A)\,D'_v$
$S'_d = \frac{1}{1 + e^{b}}$, where $b = \frac{d - \mathrm{mean}(d)}{\mathrm{stddev}(d)}$, and
$D'_v = \frac{1}{1 + e^{c}}$, where $c = \frac{v - \mathrm{mean}(v)}{\mathrm{stddev}(v)}$
where d is system end-to-end delay, v is value deviation, and A is the relative importance of S'_d vs. D'_v.
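The quality measure can be sketched in a few lines. This sketch assumes Q_n = A·S'_d + (1 − A)·D'_v, with S'_d and D'_v computed as logistic functions of z-scores. It also assumes the sign convention that higher delay or deviation lowers quality, and that the mean and standard deviation (population form) are taken over a hypothetical reference sample of observed delays and deviations. The talk's exact softmax scaling may differ.

```python
import math
from statistics import mean, pstdev
from typing import List


def softmax_normalize(x: float, samples: List[float]) -> float:
    """Map x into [0, 1] via a logistic function of its z-score over
    `samples`. Larger x gives a smaller result, so high delay or high
    deviation lowers quality."""
    sd = pstdev(samples)
    z = (x - mean(samples)) / sd if sd > 0 else 0.0
    # Numerically stable form of 1 / (1 + e^z).
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


def normalized_quality(d: float, v: float, delays: List[float],
                       deviations: List[float], A: float) -> float:
    """Q_n = A * S'_d + (1 - A) * D'_v."""
    s_d = softmax_normalize(d, delays)      # S'_d
    d_v = softmax_normalize(v, deviations)  # D'_v
    return A * s_d + (1.0 - A) * d_v


if __name__ == "__main__":
    delays = [1.0, 2.0, 3.0]          # hypothetical observed delays (s)
    deviations = [10.0, 20.0, 30.0]   # hypothetical observed deviations (lux)
    # A reply with average delay and average deviation scores 0.5.
    print(normalized_quality(2.0, 20.0, delays, deviations, A=0.9))
```

The logistic squashing keeps a single outlier delay or deviation from dominating Q_n, while A still controls which term matters more.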
Caching and Lookup Policies
• All hits (no queries)
• All misses
• Simple lookup
• Piggyback queries
• Greedy age-based lookup
• Greedy distance-based lookup
• Median-of-3 lookup
All misses, simple lookup, and piggyback queries use precise lookups and queries; the greedy and median-of-3 policies use approximate lookups and queries.
Policies incorporate an age parameter T; T can be 0, finite, or infinite.
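The slide names the policies but does not define them, so the following is only one plausible reading, labeled as assumptions. Locations are hypothetical 1-D coordinates, and an entry is "fresh" if its age is at most T. Under that reading, simple lookup answers only from a fresh entry for the exact location. Greedy distance-based lookup answers from the nearest fresh location, with no distance cap. Median-of-3 lookup takes the median of the three nearest fresh values. The greedy age-based variant is omitted. All function names are hypothetical.

```python
from statistics import median
from typing import Callable, Dict, Tuple

# Hypothetical cache: 1-D location l_i -> (v_i, t_i).
Cache = Dict[float, Tuple[float, float]]
Query = Callable[[float, float], float]  # (location, now) -> field value


def fresh_entries(cache: Cache, now: float, T: float) -> Dict[float, float]:
    """Values whose age is at most T seconds (T may be 0 or infinite)."""
    return {loc: v for loc, (v, t) in cache.items() if now - t <= T}


def simple_lookup(cache: Cache, loc: float, now: float, T: float,
                  query: Query) -> float:
    """Precise: use the entry for loc only if fresh, else query loc."""
    fresh = fresh_entries(cache, now, T)
    if loc in fresh:
        return fresh[loc]
    value = query(loc, now)
    cache[loc] = (value, now)
    return value


def greedy_distance_lookup(cache: Cache, loc: float, now: float, T: float,
                           query: Query) -> float:
    """Approximate: answer with the value of the nearest fresh location."""
    fresh = fresh_entries(cache, now, T)
    if fresh:
        nearest = min(fresh, key=lambda other: abs(other - loc))
        return fresh[nearest]
    return simple_lookup(cache, loc, now, T, query)


def median_of_3_lookup(cache: Cache, loc: float, now: float, T: float,
                       query: Query) -> float:
    """Approximate: median of the three nearest fresh values."""
    fresh = fresh_entries(cache, now, T)
    if len(fresh) < 3:
        return greedy_distance_lookup(cache, loc, now, T, query)
    nearest3 = sorted(fresh, key=lambda other: abs(other - loc))[:3]
    return median([fresh[other] for other in nearest3])
```

In this sketch, T = 0 makes cached values usable only at the instant they were stored, and T = float("inf") makes every cached value usable forever. These correspond to the "0, finite, or infinite" settings on the slide.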
Research Contributions
• Defined and quantified data quality and query cost performance in pervasive sensing systems
• Developed policies that approximate sensor field values using cached values for nearby locations
• Prove analytic upper bound on sensor field query rate
• Show cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes
Lab Trace Data
Trace data from multi-sensor motes deployed at Intel Berkeley lab (Deshpande 2004)
Lab Environment and Workload
• 2.3 million readings taken over 35+ days
• Use readings with the largest changes in value in our simulator (light measured in lux)
• Changes occur slowly relative to correlated changes (about 1 location every 1.4 seconds)
• But the range of values is large
• Applications determine values for A and T
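As a rough check on the trace's scale, assume exactly 35 days and count readings from all motes together. The average reading rate is then:

```latex
\frac{2.3 \times 10^{6}\ \text{readings}}{35 \times 86\,400\ \text{s}}
  = \frac{2.3 \times 10^{6}}{3.024 \times 10^{6}}\ \text{readings/s}
  \approx 0.76\ \text{readings/s}
```

Because the trace spans 35+ days, the true average is at most this figure.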
Bounded Resource Consumption
• N is the set of locations in the sensor field
• The cache entry for each location is used by multiple queries for periods of T seconds (requires blocking behind pending queries)
• The sensor field query rate can be bounded by |N| / T queries per second
• Proof: induction on the size of N
• Sensor field transmissions dominate resource consumption
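The |N| / T bound can be checked empirically. This is a simulation, not the induction proof from the talk. It assumes that a location's cache entry serves every query for that location during the T seconds after the sensor field query that created it. Within a run of length W, successive sensor field queries for one location are then at least T apart, so each location contributes at most ⌊W/T⌋ + 1 of them. The function name and all parameter values are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def count_sensor_field_queries(arrivals: List[Tuple[float, int]],
                               T: float) -> int:
    """Count queries sent to the sensor field when each location's cache
    entry serves all queries for that location during the T seconds after
    the query that created it (later queries block behind it or reuse it)."""
    window_start: Dict[int, float] = {}
    sent = 0
    for t, loc in sorted(arrivals):
        start = window_start.get(loc)
        if start is None or t - start >= T:
            window_start[loc] = t  # new sensor field query opens a T-second window
            sent += 1
    return sent


if __name__ == "__main__":
    random.seed(0)
    n_locations, T, horizon = 10, 9.0, 900.0  # hypothetical |N|, T, run length (s)
    arrivals = [(random.uniform(0.0, horizon), random.randrange(n_locations))
                for _ in range(10_000)]
    sent = count_sensor_field_queries(arrivals, T)
    # Per location, sent queries are >= T apart, so at most horizon // T + 1.
    bound = n_locations * (int(horizon // T) + 1)
    print(sent, "<=", bound, "| rate:", sent / horizon,
          "vs |N|/T =", n_locations / T)
```

As the run length grows, the per-run bound |N|(W/T + 1) divided by W approaches |N| / T, however heavy the query workload.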
Data Quality Driven by Response Time
Picking a large value of A means delay is more important than value deviation. Consider normalized quality when A = 0.9.
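This worked example assumes the normalized quality Q_n = A S'_d + (1 − A) D'_v. It uses hypothetical scores for a reply that is fast but inaccurate (S'_d = 0.8, D'_v = 0.3); these are illustrative values, not measurements from the experiments. Setting A = 0.9 gives:

```latex
Q_n = 0.9\,S'_d + 0.1\,D'_v
    = 0.9(0.8) + 0.1(0.3)
    = 0.72 + 0.03 = 0.75
```

The delay term supplies almost all of the score, so a policy that answers quickly from the cache scores well even with poor accuracy.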
Cost and Quality Performance when Response Time drives Quality
[Figure: cost vs. quality for the seven caching and lookup policies (all hits, all misses, simple lookup, greedy age lookup, greedy dist lookup, median-of-3 lookup, piggyback queries) under trace-driven changes]
A = 0.9, T = 90 sec, query rate = 0.9 lps, change rate = 1.4 lps
Approximate greedy lookups outperform the other policies. There is a win-win here!
Delay when Response Time drives Quality
[Figure: delay vs. quality for the seven caching and lookup policies under trace-driven changes]
Research Contributions
• Defined and quantified data quality and query cost performance in pervasive sensing systems
• Developed policies that approximate sensor field values using cached values for nearby locations
• Proved analytic upper bound on sensor field query rate
• Showed cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes
Data Quality Driven by Accuracy
Choosing a small value of A means value deviation is more important to data quality than delay. For example, consider normalized quality when A = 0.1.
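This worked example assumes the normalized quality Q_n = A S'_d + (1 − A) D'_v. It uses hypothetical scores for a reply that is fast but inaccurate (S'_d = 0.8, D'_v = 0.3); these are illustrative values, not measurements from the experiments. Setting A = 0.1 gives:

```latex
Q_n = 0.1\,S'_d + 0.9\,D'_v
    = 0.1(0.8) + 0.9(0.3)
    = 0.08 + 0.27 = 0.35
```

Here the accuracy term dominates, so a fast but inaccurate answer scores poorly. Reaching high quality then requires fresher values, which generally means more sensor field queries and higher cost.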
Cost vs. Quality when Accuracy drives Quality
[Figure: cost vs. quality for the seven caching and lookup policies under trace-driven changes]
A = 0.1, T = 90 sec, query rate = 0.9 lps, change rate = 1.4 lps
There is a tradeoff between cost and quality here
Value Deviation when Accuracy drives Quality
[Figure: value deviation vs. quality for the seven caching and lookup policies under trace-driven changes]
Significant differences in accuracy between policies
Cost and Quality Trends when Response Time drives Quality
[Figure: three panels of cost vs. quality for the seven caching and lookup policies, one panel per query rate]
Trace-driven changes; A = 0.9, T = 9 sec; query rate = 90, 9, and 0.9 lps
Again, there is a win-win here!
Cost vs. Quality Trends when Accuracy drives Quality
[Figure: three panels of cost vs. quality for the seven caching and lookup policies, one panel per query rate]
Trace-driven changes; A = 0.1, T = 9 sec; query rate = 90, 9, and 0.9 lps
Same relative performance
Talk Summary
• Define and quantify data quality and query cost performance in pervasive sensing systems
• Develop policies that approximate sensor field values using cached values for nearby locations
• Prove analytic upper bound on sensor field query rate
• Show cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes
Thank You!
• Further questions?
• …
David J. Yates
Bentley College, Computer Information Systems Dept.
Waltham, Massachusetts