Control-Based Load Shedding in Data Stream Management Systems
Yicheng Tu and Sunil Prabhakar
Department of Computer Sciences, Purdue University
April 3, 2006
Data stream management systems
• Applications
    • Financial analysis
    • Mobile services
    • Sensor networks
    • Network monitoring
    • More …
• Continuous data, discarded after being processed
• Continuous queries
• Data-active, query-passive model
[Figure: users register queries with the DSMS; data streams flow in and query results flow back to the users]
DSMS architecture
• Network of query operators (O1 – O3)
• Each operator has its own queue (q1 – q4)
• A scheduler decides which operator to execute
• Query results (Q1, Q2) are pushed to clients
• Example systems:
    • Aurora/Borealis
    • STREAM
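A minimal sketch of the architecture above: operators with their own FIFO queues, and a round-robin scheduler that repeatedly picks the next operator to execute. The `Operator` class, the `run_round_robin` function and the wiring (O1 feeding two outputs) are hypothetical; they do not reproduce the Aurora/Borealis API or the exact operator network on the slide.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Operator:
    """A query operator with its own FIFO input queue (hypothetical model)."""
    name: str
    fn: Callable[[int], Optional[int]]  # returns None when a tuple is filtered out
    downstream: List["Operator"] = field(default_factory=list)
    queue: deque = field(default_factory=deque)


def run_round_robin(operators: List[Operator], rounds: int) -> Dict[str, List[int]]:
    """Visit operators in a fixed cyclic order; each visit processes one queued tuple."""
    results: Dict[str, List[int]] = {}
    for _ in range(rounds):
        for op in operators:
            if not op.queue:
                continue  # nothing waiting, move on to the next operator
            out = op.fn(op.queue.popleft())
            if out is None:
                continue
            if op.downstream:
                for nxt in op.downstream:
                    nxt.queue.append(out)  # feed the next operator's queue
            else:
                results.setdefault(op.name, []).append(out)  # push result to client
    return results


# Hypothetical wiring: O1 feeds two query outputs, O2 (Q1) and O3 (Q2).
o2 = Operator("O2", lambda x: x if x % 20 == 0 else None)
o3 = Operator("O3", lambda x: x + 1)
o1 = Operator("O1", lambda x: x * 10, downstream=[o2, o3])
o1.queue.extend([1, 2, 3])
results = run_round_robin([o1, o2, o3], rounds=5)
```

Here `results` maps each output operator to the tuples pushed to its client; tuples waiting in a queue are exactly the "outstanding" tuples that contribute to delay.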
Qualities in DSMS data processing
• Data processing in DSMS is quality-critical
    • tuple delay
    • data loss
    • sampling rate, window size, …
• Overloading during spikes degrades quality (delay)
• Solution: adjust data loss (i.e., load shedding)
    • On the DSMS side
    • Eliminate excessive load by dropping data items
• The real problem is:
    • Tuple delay is the major concern: results generated from old data are useless!
    • How to maintain processing delays while minimizing data loss?
Related work
• Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04)
• Data triage (Reiss & Hellerstein, ICDE05)
    • Put data into an asylum upon overloading
• LoadStar (Chi et al., VLDB05)
• QoS-driven load shedding (Tatbul et al., VLDB03)
    • Key questions: When? How much? Where?
    • Use a load shedding roadmap to decide where
    • Simple, intuitive algorithm to decide when and how much
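To make "when and how much" concrete, here is a simplified open-loop rule in the spirit of the approach listed above: shed only when the estimated load exceeds capacity, and then drop just the estimated excess. It is an illustration, not the published algorithm; the function name and the `headroom` parameter are assumptions.

```python
def open_loop_drop_fraction(input_rate: float, est_cost: float, headroom: float = 1.0) -> float:
    """Fraction of incoming tuples to drop so the *estimated* load fits the CPU.

    input_rate: tuples per second; est_cost: estimated CPU seconds per tuple;
    headroom: assumed fraction of CPU available for query processing.
    """
    load = input_rate * est_cost  # CPU seconds demanded per second of wall time
    if load <= headroom:
        return 0.0                # "when": shed only if overloaded
    return 1.0 - headroom / load  # "how much": drop just the estimated excess


print(open_loop_drop_fraction(1000, 0.002))  # estimated load 2.0: drop half
print(open_loop_drop_fraction(400, 0.002))   # estimated load 0.8: no shedding
```

Note that the decision depends only on the rate and the cost estimate; nothing in it observes the queue that actually builds up.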
What’s wrong?
• A highly dynamic environment is the reality
    • Bursty data input
    • Variable unit processing cost
• Previous solutions fail to capture the current system status (queue length) and output (delay)
    • Delay is positively related to queue length
• Example 1: unbounded increase of delay
• Example 2: unnecessary data loss
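Both failure modes can be reproduced with a small fluid simulation, assuming (hypothetically) a constant input rate, a true per-tuple cost that differs from the estimate, and the delay approximation "queue length × cost". The shedding fraction is fixed from the estimate alone, as an open-loop scheme would do.

```python
def simulate_open_loop(rate, true_cost, est_cost, steps, period=1.0):
    """Hypothetical fluid model of open-loop shedding.

    Returns (delays, dropped): per-period delay estimate q * true_cost and the
    total number of tuples dropped.
    """
    load = rate * est_cost
    drop = 0.0 if load <= 1.0 else 1.0 - 1.0 / load  # decided from the estimate only
    capacity = period / true_cost                     # tuples the CPU really processes per period
    q, dropped, delays = 0.0, 0.0, []
    for _ in range(steps):
        admitted = rate * period * (1.0 - drop)
        dropped += rate * period * drop
        q = max(0.0, q + admitted - capacity)         # backlog carried into the next period
        delays.append(q * true_cost)                  # delay ~ queue length x cost
    return delays, dropped


# Example 1: cost underestimated, so backlog and delay grow every period.
delays_1, _ = simulate_open_loop(rate=1000, true_cost=0.002, est_cost=0.0015, steps=10)
# Example 2: cost overestimated, so no backlog, but more tuples dropped than needed.
delays_2, dropped_2 = simulate_open_loop(rate=1000, true_cost=0.002, est_cost=0.003, steps=10)
```

In Example 2 the CPU could handle 500 of the 1000 tuples arriving each period, yet about two thirds are discarded.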
Our approach
• The feedback control loop:
    • Plant
    • Monitor
    • Controller
    • Actuator
    • Disturbances
• How it works
    • Error (e) = desired output (y_r) - measured output (y)
    • Focal point: the controller, which maps e to a control signal u
• View load shedding as a control problem
    • Control: manipulation of system behavior by adjusting system input
    • Cruise control of automobiles, room temperature control, etc.
    • Open-loop vs. closed-loop (feedback) control
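The loop above can be sketched for load shedding as follows. This is a hypothetical fluid model with a hand-picked proportional law, not the authors' controller: the monitor measures delay y = q · cost, the controller turns e = y_r - y into a number of tuples to admit, the actuator can only drop (never add) tuples, and the offered load plays the role of the disturbance.

```python
def feedback_loop(y_ref, cost, capacity, arrivals, gain, period=1.0):
    """Generic monitor -> controller -> actuator -> plant loop (hypothetical model).

    arrivals: offered tuples per period (the disturbance); capacity: tuples/s
    the system can process; delay y = q * cost. Returns (delays, total_shed).
    """
    q, delays, shed_total = 0.0, [], 0.0
    for offered in arrivals:
        y = q * cost                                    # monitor: measured output
        e = y_ref - y                                   # error e = y_r - y
        u = capacity * period + gain * e / cost         # controller: tuples to admit
        admitted = min(max(u, 0.0), offered)            # actuator: can only drop tuples
        shed_total += offered - admitted
        q = max(0.0, q + admitted - capacity * period)  # plant: queue evolves
        delays.append(q * cost)
    return delays, shed_total


delays, shed = feedback_loop(y_ref=0.5, cost=0.001, capacity=1000,
                             arrivals=[1500] * 50, gain=0.5)
```

When the actuator is not saturated, each period shrinks the error by the factor (1 - gain), so the delay settles at the 0.5 s target. Below the target the loop admits more than it can process, letting the queue grow instead of dropping data needlessly.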
Why feedback control?
[Diagram: open-loop control (controller 1/a) vs. closed-loop control (controller gain K); for each, the output y is derived in terms of the reference y_r and the disturbance terms d_i, d_o, d_m]
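A generic static-gain version of this comparison, assuming a plant y = a(u + d_i) + d_o with true gain a, a model estimate \hat a used by the open-loop controller, an input disturbance d_i and an output disturbance d_o. This is the standard textbook argument, not necessarily the exact derivation on the slide:

```latex
\begin{aligned}
\text{Plant:}\quad & y = a\,(u + d_i) + d_o \\
\text{Open loop, } u = y_r/\hat a:\quad & y = \frac{a}{\hat a}\,y_r + a\,d_i + d_o \\
\text{Closed loop, } u = K\,(y_r - y):\quad & y = \frac{aK}{1+aK}\,y_r + \frac{a}{1+aK}\,d_i + \frac{1}{1+aK}\,d_o
\end{aligned}
```

The open-loop output tracks y_r only if \hat a = a and passes both disturbances through unattenuated. In the closed loop, both disturbance terms shrink by a factor of 1 + aK, and y approaches y_r as the loop gain aK grows, without needing an exact model of a.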
Challenges
• Can we model the system?
    • An analytical model may not be easy to derive
    • System identification: experimental methods
• How to design the controller?
    • Use control-theoretical tools for guaranteed performance
• DSMS-specific problems
    • Lack of real-time measurement of the output signal (y)
    • How to set the control period (T)
• Real system evaluation
    • We use Borealis in our study
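System identification fits model parameters to measurements instead of deriving them analytically. As a minimal sketch, assuming the linear model y = qc used later in these slides, the per-tuple cost c can be estimated by least squares from pairs of observed queue length and delay. The sample values below are made up for illustration.

```python
def estimate_cost(queue_lengths, delays):
    """Least-squares estimate of c in y = q * c (a line through the origin)."""
    num = sum(q * y for q, y in zip(queue_lengths, delays))
    den = sum(q * q for q in queue_lengths)
    if den == 0:
        raise ValueError("need at least one sample with a non-empty queue")
    return num / den


# Hypothetical measurements: outstanding tuples and observed delay (seconds).
qs = [100, 250, 400, 800]
ys = [0.21, 0.49, 0.81, 1.59]
c_hat = estimate_cost(qs, ys)  # close to 0.002 s per tuple
```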
Modeling a DSMS
• Borealis data stream manager
    • Round-robin operator scheduler
    • FIFO waiting queues
    • For now, fix the per-tuple processing cost c
• Proposed model: y = qc, where q is the number of outstanding data tuples
• Discrete form: y(k) = q(k-1)c
• Denote the input load as f_i and the system processing power as f_o:
[Equation: y(k) in terms of q(k-1), c, the control period T, a factor H, and the accumulated difference between f_i(j) and f_o(j) over periods j up to k]
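The discrete model can be simulated directly. This sketch assumes an initially empty FIFO backlog that changes by T·(f_i(k) - f_o(k)) tuples per period and never goes negative; it omits the factor H that appears in the slide's equation, and the function name is hypothetical.

```python
def model_delay(f_in, f_out, cost, period):
    """Predict per-period delay y(k) = c * q(k-1) from input load and processing power.

    f_in, f_out: per-period input load and processing power (tuples/s);
    cost: per-tuple processing cost c (s); period: control period T (s).
    """
    q_prev, ys = 0.0, []
    for fi, fo in zip(f_in, f_out):
        ys.append(cost * q_prev)                        # y(k) = c * q(k-1)
        q_prev = max(0.0, q_prev + period * (fi - fo))  # backlog after period k
    return ys


# Overload for three periods, then underload for three periods.
predicted = model_delay([1200] * 3 + [800] * 3, [1000] * 6, cost=0.001, period=1.0)
```

The predicted delay rises while f_i exceeds f_o and falls back once the backlog drains, which is the behaviour the load shedder has to regulate.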
Controller design
• Design based on pole placement
• Guaranteed performance targeting
    • Convergence rate: responsiveness
    • Damping: smoothness
• The controller:
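As a generic illustration of pole placement (not the specific controller from this slide), assume a simplified first-order model y(k+1) = y(k) + b·u(k) with proportional control u(k) = K·e(k). The closed-loop error then obeys e(k+1) = (1 - bK)·e(k), so choosing the pole p fixes K = (1 - p)/b. Under the y = qc model, for instance, admitting one extra tuple per second over a period T adds cT to the delay, so b = cT would be a natural (assumed) choice.

```python
def place_pole(b, pole):
    """Gain K that puts the closed-loop pole of y(k+1) = y(k) + b*K*e(k) at `pole`."""
    if not -1.0 < pole < 1.0:
        raise ValueError("pole must lie inside the unit circle for stability")
    return (1.0 - pole) / b


def closed_loop(y0, y_ref, b, gain, steps):
    """Simulate the assumed first-order plant under u = K * e."""
    ys = [y0]
    for _ in range(steps):
        e = y_ref - ys[-1]
        ys.append(ys[-1] + b * gain * e)
    return ys


k_smooth = place_pole(b=2.0, pole=0.5)   # K = 0.25: monotone convergence
k_osc = place_pole(b=2.0, pole=-0.5)     # K = 0.75: alternating overshoot
ys_smooth = closed_loop(0.0, 1.0, 2.0, k_smooth, steps=20)
ys_osc = closed_loop(0.0, 1.0, 2.0, k_osc, steps=20)
```

The two targets on the slide map onto the pole: its magnitude sets the convergence rate (closer to 0 is faster), and a pole in [0, 1) avoids the overshoot seen with a negative pole.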
Control period
• Provides a complete answer to the question “when to shed load?”
• Arbitrarily set in previous studies
• A case-by-case decision with some systematic rules
• In our problem, a tradeoff between:
    • Sampling theory (Nyquist-Shannon theorem): to capture the moving trends of the disturbances, a higher sampling frequency (shorter period) is preferred
    • Stochastic features of the output (y) and the parameter (c): more samples are needed, so a longer period is preferred
• The first factor should be given more weight
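One way to turn this tradeoff into a rule, offered as a hypothetical sketch rather than the authors' procedure: the Nyquist-Shannon theorem caps the period at 1/(2·f_max) for disturbances up to frequency f_max, while estimating the mean cost c to a relative standard error ε needs n ≥ (σ/(ε·c))² tuples, i.e. a period of at least n/rate. Giving the first factor priority means the Nyquist cap wins whenever the two conflict.

```python
def choose_period(f_max, rate, cost_mean, cost_std, rel_err):
    """Pick a control period T in seconds (hypothetical rule).

    f_max: highest disturbance frequency to track (Hz); rate: tuples/s;
    cost_mean, cost_std: mean and std of per-tuple cost; rel_err: target
    relative standard error of the cost estimate.
    """
    t_nyquist = 1.0 / (2.0 * f_max)                      # sample faster than 2 * f_max
    n_needed = (cost_std / (rel_err * cost_mean)) ** 2  # samples for the target std error
    t_stats = n_needed / rate                           # time to collect those samples
    return min(t_nyquist, t_stats)                      # Nyquist bound takes priority


fast_stream = choose_period(f_max=0.1, rate=1000, cost_mean=0.002, cost_std=0.002, rel_err=0.05)
slow_stream = choose_period(f_max=0.1, rate=50, cost_mean=0.002, cost_std=0.002, rel_err=0.05)
```

With 1000 tuples/s the 400 required samples arrive in 0.4 s, well inside the 5 s Nyquist cap; with 50 tuples/s they would take 8 s, so the period is capped at 5 s and the cost estimate is simply noisier.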
Experiments
• Controller and load shedder implemented in Borealis
• Synthetic (“pareto”) and real (“Web”) data streams
• Small query network with variable average processing cost
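The slides do not describe how the synthetic "pareto" stream was generated. One plausible way to produce heavy-tailed, bursty per-period arrival rates is to scale a base rate by Pareto-distributed factors using Python's `random.paretovariate`; the function name and parameters here are assumptions.

```python
import random


def pareto_rates(base_rate, alpha, periods, seed=0):
    """Per-period arrival rates with heavy-tailed bursts; smaller alpha is burstier."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    # paretovariate(alpha) returns values >= 1, so base_rate is the minimum rate.
    return [base_rate * rng.paretovariate(alpha) for _ in range(periods)]


rates = pareto_rates(base_rate=500, alpha=1.5, periods=400)
```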
Experimental results
• Experiments for comparison
    • Aurora: open-loop solution
    • Baseline: a simple feedback method
• Target delay: 2000 ms
• Control period: 1 second
• Total time: 400 seconds
• For both input types, data loss is almost the same for all three load shedding strategies
Future work
• Time-varying DSMS model
    • For example, time-varying cost c
    • Possible solution: adaptive control
• Adaptation other than load shedding
    • New disturbances?
    • Model changes?
• Other database problems?
[Diagram: nested control loops, with an internal controller and internal dynamics inside an external controller and external dynamics, each loop subject to a disturbance]
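A minimal sketch of one possible adaptive scheme for a time-varying cost c: re-estimate c each period with exponential forgetting, then recompute the controller gain by pole placement. The class, the forgetting factor, and the assumed first-order plant y(k+1) = y(k) + b·K·e(k) with b = c·T are all hypothetical; the slide only names adaptive control as a possible direction.

```python
class AdaptiveGain:
    """Track a time-varying per-tuple cost and re-tune a controller gain (hypothetical)."""

    def __init__(self, c0, pole, period, forget=0.9):
        self.c_hat = c0        # current cost estimate (seconds per tuple)
        self.pole = pole       # desired closed-loop pole, 0 <= pole < 1
        self.period = period   # control period T (seconds)
        self.forget = forget   # weight kept on the old estimate

    def update(self, busy_time, tuples_done):
        """Blend in the cost measured over the last period, then return the new gain."""
        if tuples_done > 0:
            measured = busy_time / tuples_done
            self.c_hat = self.forget * self.c_hat + (1 - self.forget) * measured
        return self.gain()

    def gain(self):
        # Pole placement for the assumed plant with b = c_hat * T.
        return (1.0 - self.pole) / (self.c_hat * self.period)


adaptive = AdaptiveGain(c0=0.001, pole=0.5, period=1.0)
for _ in range(200):
    adaptive.update(busy_time=1.0, tuples_done=250)  # true cost has risen to 0.004 s
```

After the cost rises from 0.001 s to 0.004 s, the estimate converges to the new value and the gain drops accordingly, so the closed-loop pole stays where it was placed.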
Backup - 1
Backup - 2
• Lack of robustness of the open-loop solution
    • More optimistic policy adopted in Aurora
    • Unstable performance
• Our solution is robust
    • Under input streams with different burstiness
Backup - 3
Backup - 4: Model verification
• Feed Borealis with synthetic streams
    • Input rate: step function or sinusoidal function of time
    • Average processing cost is fixed
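The two input-rate profiles used for model verification can be written as simple functions of the period index k. The function names, levels, and cycle length below are hypothetical; the slide specifies only the shapes.

```python
import math


def step_rate(k, k_step, low, high):
    """Input rate (tuples/s) that jumps from `low` to `high` at period k_step."""
    return high if k >= k_step else low


def sine_rate(k, mean, amplitude, periods_per_cycle):
    """Input rate oscillating around `mean` with the given amplitude."""
    return mean + amplitude * math.sin(2 * math.pi * k / periods_per_cycle)


step_profile = [step_rate(k, k_step=100, low=400, high=1200) for k in range(400)]
sine_profile = [sine_rate(k, mean=800, amplitude=300, periods_per_cycle=60) for k in range(400)]
```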
Summary
• Load shedding is an important quality adaptation method
• Ad hoc solutions do not work under dynamic load and system features
• We propose an approach, based on feedback control theory, to guide load shedding in a highly dynamic environment
• Initial experimental results on a real-world DSMS show the promising potential of our approach
Acknowledgements
• Dr. Song Liu, Hurco Companies, Inc., Indianapolis, IN.
• Prof. Bin Yao, School of Mechanical Engineering, Purdue University
• Ms. Nesime Tatbul, Profs. Ugur Cetintemel and Stan Zdonik, CS Department, Brown University