Real-Time Databases
Timeline
1:40 - 1:50: Introduction
1:50 - 3:00: Real-Time Databases/Scheduling
3:00 - 3:10: Break
3:10 - 4:00: Operator Scheduling in Aurora
4:00 - 4:25: Discussion
4:25 - 4:30: Comments
Real-Time Databases/Scheduling
• General Introduction
• Scheduling Policies
• Resource Allocation
• Properties of Data: consistency and validity
• Conclusions
References
• http://www.fpa.org/newsletter_info2584/newsletter_info.htm (info on scud missiles)
• http://www.fas.org/spp/starwars/gao/im92026.htm (info on Patriot Missile System)
Imagine this…
• We are at war with Iraq
• Our soldiers find a potential target
• Military intelligence consults a database to determine course of action
Imagine this…
• We are at war with Iraq
• Air control system constantly monitors hundreds of aircraft and records them in a database
• Intelligence systems constantly query the database for potential threats
Suddenly…
• Hundreds of missiles are launched
• We suspect some are nuclear
• Need info which will allow us to determine a course of action
• Need this info to make a rapid decision
• The costs of indecision are catastrophic
What could go wrong?
• Limited number of missiles we can intercept
• Once they’re launched, we have limited time to react
• Our traditional database is slowed by less critical queries
• Finally, our queries may not be answered in time due to system load
We need a system that:
• Handles time-sensitive queries
• Returns only temporally valid data
• Supports priority scheduling
• Solution: Real-Time Databases!
Real-Time Databases and Streams
• Scheduling
  – Streams: priority based on QoS optimization
  – Real-Time: priority based on deadlines and user-supplied values
• Load Shedding
  – Streams: dropping tuples from queues
  – Real-Time: missing deadlines, dropping transactions
• Freshness of data:
  – Streams: not guaranteed
  – Real-Time: resample
Real-Time Databases
• An extension to traditional databases
• Motivated by a class of applications that require reliable responses
• Predictable (not necessarily fast)
Real-Time Database Features
• Priority
  – Classification of transactions
  – Assigns value to transactions
• Deadlines
  – Transactions specify explicit time requirements
  – Transaction scheduling takes time requirements into account
  – Predictability that transactions will complete by deadline or not at all
Transactions and Streams
• Transaction: an operation on the database that performs a combination of reads/writes in an atomic step
  – Queries are a subset of transactions
• Streams are read-only data (may create new tuples)
• Data Consistency
Characteristics of Transactions
• Manner in which transactions use data
• Nature of time constraints
• Significance of executing a transaction by its deadline
  – consequence of missing specified time constraints
Transaction Classification
• Effect of missing transaction deadlines
• Value to user is dependent on timeliness:
  – Soft: have some value after deadline
  – Firm: have no value after deadline
  – Hard: have negative value after deadline
• Special case: no deadline
• Idea for Streams: Queries have periodic deadlines
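The soft/firm/hard distinction can be read as three value functions over completion time. The sketch below is only an illustration: the names `DeadlineClass` and `value_at`, the exponential decay used for soft deadlines, and the size of the hard-deadline penalty are all assumptions, since the slides only state the sign of the value after the deadline.

```python
from enum import Enum


class DeadlineClass(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


def value_at(finish_time, deadline, full_value, kind,
             soft_decay=0.5, hard_penalty=1.0):
    """Value delivered by a transaction that finishes at finish_time."""
    if finish_time <= deadline:
        # On time: every class delivers its full value.
        return full_value
    if kind is DeadlineClass.SOFT:
        # Assumed shape: value shrinks by soft_decay per time unit of lateness.
        lateness = finish_time - deadline
        return full_value * soft_decay ** lateness
    if kind is DeadlineClass.FIRM:
        # Late firm transactions are worthless, so they can simply be dropped.
        return 0.0
    # Hard: a late result is harmful (assumed penalty proportional to value).
    return -hard_penalty * full_value
```

A scheduler that maximizes total value would, under this reading, drop late firm transactions, keep running late soft ones while their value is still useful, and never admit a hard transaction it cannot finish on time.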
Scheduling and Streams
• Streams: schedule queries in terms of QoS
• Real-Time Databases: schedule transactions in terms of scheduling policy
Scheduling Policies
• Earliest deadline first (PMM, PAQRS)
• Highest value first
• Highest value per unit computation time first
• Longest executed transaction first
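Each of the four policies amounts to a different sort key over the ready transactions. The following is a minimal sketch under assumed names (`Txn`, `POLICIES`, `schedule`); it also assumes the estimated computation time and the time already executed are known to the scheduler, which the slides do not spell out.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float   # absolute deadline
    value: float      # user-supplied value
    est_cost: float   # estimated computation time (assumed known)
    executed: float   # computation time already consumed


# Smaller key means "run earlier"; descending orders negate the quantity.
POLICIES = {
    "earliest_deadline_first": lambda t: t.deadline,
    "highest_value_first": lambda t: -t.value,
    "highest_value_density_first": lambda t: -(t.value / t.est_cost),
    "longest_executed_first": lambda t: -t.executed,
}


def schedule(txns, policy):
    """Return transaction names in the order the policy would run them."""
    return [t.name for t in sorted(txns, key=POLICIES[policy])]


txns = [
    Txn("a", deadline=30, value=5, est_cost=1, executed=0),
    Txn("b", deadline=10, value=8, est_cost=4, executed=2),
    Txn("c", deadline=20, value=9, est_cost=9, executed=5),
]
edf_order = schedule(txns, "earliest_deadline_first")
density_order = schedule(txns, "highest_value_density_first")
```

The same three transactions come out in different orders under different policies, which is the point of choosing a policy to match the application's notion of value.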
PMM
• Priority Memory Management
• Admission Control
  – Decide if we run a query.
• Memory Allocation
  – How much memory does each running query get.
Memory Allocation: Two Strategies
• Max
  – Queries get their maximum required memory or no memory at all.
• MinMax
  – High priority queries get their maximum required memory and low priority queries get their minimum.
Admission Control
• Goal: minimize the miss ratio (number of queries that miss their deadline/total queries).
• MultiProgramming Level (MPL) = number of queries to run.
• Optimize system resource use: optimal MPL.
Relating MPL to Streams
• Real-Time: One time queries
• Stream: Continuous Queries
• Possibilities for future DSMS:
  – Using MPL for QoS
Oh no! Missiles are launched again.
• We are running two types of queries:
  – Query1 – Where should CNN’s cameras face to see the missile?
  – Query2 – Should we shoot the missile down?
• Queries of type 2 are obviously more important, but how does the db know?
• Consider: Applications for relative query values in stream systems.
PAQRS – extension of PMM
• Priority Adaptation Query Resource Scheduling.
• PMM only minimizes miss ratio for the entire system.
• We would like to be able to specify a ratio between query classes for missed deadlines.
• RelMissRatio (Relative Miss Ratio) = {99:1} Query1:Query2.
Why do we care?
• Think of the missile example.
  – Same problems still exist in stream systems.
• Potential Stream Additions:
  – Relative Priority Scheduling.
    • Not all queries are equal
    • Another form of QoS
  – Periodic Query Deadlines.
    • Deadlines for continuous queries
Bias Control
• Puts queries into two groups:
  – Regular – Queries run with normal priority
  – Reserve – Queries run with priority lower than regular.
• Manages groups on a per query basis
  – Each class gets RegQuota regular queries.
  – The rest have to run as reserve queries.
Relative Weights
Weight should reflect a class’ RelMissRatio.
$\mathrm{Weight}_i = \dfrac{1/\mathrm{RelMissRatio}_i}{\sum_j 1/\mathrm{RelMissRatio}_j}$
$\mathrm{Weight}_{cnn} = \dfrac{1/99}{1/99 + 1} = 0.01$
$\mathrm{Weight}_{mis} = \dfrac{1}{1/99 + 1} = 0.99$
Bias Control using Relative Weights
$\mathrm{WeightedMissRatio} = \sum_i (\mathrm{Weight}_i \cdot \mathrm{MissRatio}_i)$
All terms are equal when the ratio is correct.
$\mathrm{WeightedMissRatio}_{ex} = (0.01 \cdot 99x\%) + (0.99 \cdot x\%) = 0.99x\% + 0.99x\%$
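The weight formula is easy to check numerically. This is a small sketch, not code from PAQRS itself; the function names `relative_weights` and `weighted_miss_ratio` and the class labels `"cnn"` and `"mis"` are chosen here for illustration.

```python
def relative_weights(rel_miss_ratio):
    """Weight_i = (1 / RelMissRatio_i) / sum_j (1 / RelMissRatio_j)."""
    inverse = {cls: 1.0 / ratio for cls, ratio in rel_miss_ratio.items()}
    total = sum(inverse.values())
    return {cls: inv / total for cls, inv in inverse.items()}


def weighted_miss_ratio(weights, miss_ratio):
    """Sum over classes of Weight_i * MissRatio_i."""
    return sum(weights[cls] * miss_ratio[cls] for cls in weights)


# Target RelMissRatio of 99:1 between the CNN and missile query classes.
weights = relative_weights({"cnn": 99, "mis": 1})

# When the observed miss ratios really are 99:1, both terms are equal.
on_target = {"cnn": 0.99, "mis": 0.01}
terms = {cls: weights[cls] * on_target[cls] for cls in weights}
```

Equal terms are the signal that no quota adjustment is needed; unequal terms show which class is missing more deadlines than its share.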
Back to Missiles and CNN
• The observed miss ratio does not match the target: it is 50:50, not 99:1!
• $\mathrm{RegQuota}_i^{new} = \mathrm{RegQuota}_i^{old} \cdot \dfrac{\mathrm{Weight}_i \cdot \mathrm{MissRatio}_i}{\mathrm{WeightedMissRatio}/\mathrm{NumClasses}}$
Missiles and CNN Calculations
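$\mathrm{WeightedMissRatio} = (0.01 \cdot 0.50) + (0.99 \cdot 0.50) = 0.5$
The two terms are not equal: $0.005 \neq 0.495$
$\mathrm{RegQuota}_{cnn}^{new} = \mathrm{RegQuota}_{cnn}^{old} \cdot \dfrac{0.01 \cdot 0.50}{0.5/2} = \mathrm{RegQuota}_{cnn}^{old} \cdot 0.02$ (98% less)
$\mathrm{RegQuota}_{mis}^{new} = \mathrm{RegQuota}_{mis}^{old} \cdot \dfrac{0.99 \cdot 0.50}{0.5/2} = \mathrm{RegQuota}_{mis}^{old} \cdot 1.98$ (98% more)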
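The quota update can be replayed in a few lines. This is a sketch of the formula as written on the slides, with assumed names (`update_reg_quota`) and an assumed starting quota of 100 regular queries per class; the no-miss guard is an addition here, and quotas are left as floats rather than rounded to query counts.

```python
def update_reg_quota(old_quota, weights, miss_ratio):
    """RegQuota_new = RegQuota_old * (Weight_i * MissRatio_i)
                                     / (WeightedMissRatio / NumClasses)."""
    classes = list(old_quota)
    weighted = sum(weights[c] * miss_ratio[c] for c in classes)
    if weighted == 0:
        # No deadline misses at all: nothing to rebalance (assumed behavior).
        return dict(old_quota)
    per_class_target = weighted / len(classes)
    return {
        c: old_quota[c] * (weights[c] * miss_ratio[c]) / per_class_target
        for c in classes
    }


weights = {"cnn": 0.01, "mis": 0.99}      # from RelMissRatio 99:1
observed = {"cnn": 0.50, "mis": 0.50}     # both classes miss half their deadlines
new_quota = update_reg_quota({"cnn": 100, "mis": 100}, weights, observed)
```

With these inputs the CNN class keeps about 2% of its regular quota and the missile class nearly doubles its own, matching the factors 0.02 and 1.98 above.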
Essence of Real Time
• Although adaptive systems give better throughput, IT DOESN’T MATTER!
• RT is about dependability, not throughput. 1% miss rate is (usually) unacceptable.
• Throughput can be handled (usually) with extra hardware (i.e. money). Dependability needs a special design.
Resources in Databases
• Logical
  – Locks
• Physical
  – CPU(s)
  – Memory
    • Cache
    • Work Area
  – I/O Bandwidth
    • Disks & Storage
    • Network for Distributed Processing
  – Time...
Cost of a Transaction
• Waiting for locks to release
• Work memory needed (e.g. O(n) for in-memory hash join, O(sqrt(n)) for disk-assisted)
• I/O amount (e.g. worst case join: multiplication)
• CPU needed to process
• Cost of aborting a transaction (negligible for queries)
If success cannot be guaranteed, don't start!
Physical Resources – Now and Then
Resource                1995 (Paper)   2003 (opt. RAID)
# Disks                 10             4 (16)
Disk Latency (ms)       16.7           6
Disk Cache (kB)         256            8192 (1GB)
Disk Size (GB)          1              120 (1TB)
I/O Bandwidth (MB/s)    10             100
Memory Buffers (MB)     20             800
CPU Speed (MIPS)        40             2 X 2500

Latency is Forever…
Memory Allocation Strategies (I)
• Max
  – all memory needed or nothing (don't admit)
• MinMax
  – all memory needed for high-priority
  – min memory needed for low-priority
• M&M
  – feedback-based allocation – adaptive
  – small amount of memory set aside for small transactions
Memory Allocation Strategies (II)
• Multiclass Dependent
  – Small get all the memory they need
  – Large get a minimum amount
  – Medium get memory according to the load level
• Classes are:
  – Small – need less than 10% of memory
  – Large – need more than the available memory
  – Medium – between them
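The class boundaries translate directly into a small classifier. This sketch assumes the comparison is made on a transaction's maximum memory requirement against total system memory; the slides do not say which measure is used, and `memory_class` is a hypothetical name.

```python
def memory_class(max_memory_need, total_memory):
    """Classify a transaction as small, medium, or large by memory demand."""
    # Small: needs less than 10% of memory (integer-friendly comparison).
    if max_memory_need * 10 < total_memory:
        return "small"
    # Large: needs more than all of memory, so it can never get its maximum.
    if max_memory_need > total_memory:
        return "large"
    # Medium: everything in between.
    return "medium"
```

Under the multiclass policy described above, a `"small"` result means the transaction gets everything it asks for, a `"large"` one gets only its minimum, and a `"medium"` one is sized according to current load.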
Multi-Class Resource Allocation
[Figure: single-queue vs. multiple-queue allocation of resources among small (S), medium (M), and large (L) transactions; queue labels SP, MP, LP]
Locking Strategies for Transactions
• Wait patiently…
  – Bad idea – can wait forever for lower priority or deadlock
• Upgrade priority of lock holder
  – Will complete less important job and then continue
• Abort transaction with lower priority
  – Need to assess time of abort…
RT systems are not tolerant about lock delays!
Locks for Queries (Cursors)
• Grab all locks
  – Long wait, holds other transactions
• Disregard locks (“dirty read”)
  – May read inconsistent data or data to be discarded
• Read only committed data (“committed read”)
  – May read stale data
  – Data may change while acting upon it
• Lock current record (“cursor stability”)
  – Other parts of the set may change while active
  – May interfere with transactions
Costs of Distributed Processing
• Two phase commit protocol
• Aborting a transaction
• I/O for Queries
• Network delays (use dedicated connections)
Relevance to Streams
NO
• Locks
• Rollbacks

YES
• Memory allocation
• Disk latency
• I/O Bandwidth
• Deadlines?
Properties of Data
• Consistency
  – Temporal
  – Absolute
  – Relative
• Validity and Timestamp
  – Validity interval = how long a reading stays accurate after it arrives in the system (timestamp)
Temporal Consistency
• R-T imposes temporal constraints not present in streaming systems
• Need to preserve temporal validity of data to reflect state of environment
• If transaction must meet deadline, valid data must be in R-T system
• Consists of absolute and relative consistency
Looking at Fresh Data
• How do we look at relevant data in streams?
  – No guarantee data is fresh
  – Shed older data; as new data comes in, older data is flushed from system to make room
• How do we look at fresh data in Real-Time databases?
  – Make sure data hasn’t expired – absolute consistency
Absolute Consistency
• Between the state of the environment and its reflection in the database
• Necessary to ensure controlling system is aware of actual state of environment
• Example:
  – A reading is taken indicating which reporter is with the 3rd infantry on April 8th; this reading is valid for 24 hours
Formal Definition: Absolute Consistency
• Data item d is described by: (value, avi, timestamp)
  – $d_{value}$ = current state of d
  – $d_{timestamp}$ = time when the observation concerning d was made
  – $d_{avi}$ = d’s absolute validity interval: length of time following $d_{timestamp}$ during which d has absolute validity
Validity b/w Data in Streams
• Suppose: Want tuple1 and tuple2 to have been created within a given time interval
• Implicit notion of relative validity
  – Data isn’t persistent in streaming systems: relative inconsistency is less likely because as data becomes stale, it is less likely to still be in the system
  – If time interval is important, specify in query
    • E.g. window joins
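A window join makes the "created within a given time interval" requirement explicit in the query. The sketch below is a naive nested-loop version over two finite lists of `(timestamp, payload)` tuples; the function name `window_join` and the tuple layout are assumptions, and a real DSMS would evaluate this incrementally over bounded windows rather than over whole lists.

```python
def window_join(left, right, window):
    """Pair tuples from two streams whose timestamps differ by at most window."""
    return [
        (lt, rt)
        for lt in left
        for rt in right
        if abs(lt[0] - rt[0]) <= window
    ]


left = [(1, "a"), (10, "b")]
right = [(2, "x"), (20, "y")]
pairs = window_join(left, right, window=2)
```

Only tuples created close together in time are paired, which plays the same role in a stream query that the relative validity interval plays in a real-time database.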
Relative Consistency: R-T
• Data must be consistent in a group used to derive other data
  – Data used to derive other data must be produced close together
• Example:
  – If we are taking the average temperature of 3 locations, readings for the 3 areas should be taken close together in time
Formal Definition: Relative Consistency
• Set of data items used to derive other data is a relative consistency set, R
• $R_{rvi}$ = relative validity interval
• R is relatively consistent if:
  – $\forall\, d, d' \in R:\ |d_{timestamp} - d'_{timestamp}| \le R_{rvi}$
Illustration
• Scud missile is traveling at 500 mph SW at –45º, at an altitude of 100 ft
• Patriot missile is traveling at 1500 mph NE at 60º
• Can compute certain calculations to see if they will intercept
• Readings must be taken within some time interval I, to ensure computation is possible
Observe…
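• $\mathit{Scud\_speed}_{avi}$ = 4 ms, $\mathit{Patriot\_speed}_{avi}$ = 2 ms, and $R_{rvi}$ = 1; time = 12:33
• Scud_speed = (500, 4, 12:30)
• Patriot_speed1 = (1500, 2, 12:31)
• Patriot_speed2 = (1500, 2, 12:32)
• All have absolute consistency, but R’s relative consistency is violated (the timestamps of Scud_speed and Patriot_speed2 are 2 apart, which exceeds $R_{rvi}$ = 1)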
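The example can be checked mechanically. The sketch below reads absolute consistency as "current time minus timestamp is at most avi", which is an interpretation of the definition rather than a formula from the slides. The slide states the intervals in ms while the timestamps read as clock minutes, so the code assumes one common unit (minutes) for all three quantities; `DataItem`, `hhmm`, and the two check functions are hypothetical names.

```python
from itertools import combinations
from typing import NamedTuple


class DataItem(NamedTuple):
    value: float
    avi: int        # absolute validity interval (assumed unit: minutes)
    timestamp: int  # observation time, in minutes since midnight


def hhmm(hours, minutes):
    """Convert a clock time to minutes since midnight."""
    return hours * 60 + minutes


def absolutely_consistent(d, now):
    # Assumed reading: the item is valid until timestamp + avi.
    return now - d.timestamp <= d.avi


def relatively_consistent(items, rvi):
    # Every pair in the set must have timestamps within rvi of each other.
    return all(abs(a.timestamp - b.timestamp) <= rvi
               for a, b in combinations(items, 2))


now = hhmm(12, 33)
scud_speed = DataItem(500, 4, hhmm(12, 30))
patriot_speed1 = DataItem(1500, 2, hhmm(12, 31))
patriot_speed2 = DataItem(1500, 2, hhmm(12, 32))
R = [scud_speed, patriot_speed1, patriot_speed2]

all_absolute = all(absolutely_consistent(d, now) for d in R)
relative_ok = relatively_consistent(R, rvi=1)
```

Under these assumptions every item is still within its own validity interval, yet the set as a whole fails the pairwise check, so any value derived from R should not be trusted.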
Achieving Validity Intervals
• avi:
  – R-T: realized by frequent sampling of real-world data
  – Streams: can’t do this
• rvi:
  – R-T: rvi together with avi: the smallest avi belonging to the relative consistency set will prevail
  – Streams: specified in query
  – Note: only necessary to achieve rvi of RC Set R if data is being derived from R
Timestamps of Derived Data
• How to assign a timestamp to derived data d’?
• One possibility: give d’ the timestamp of the oldest item from which it was derived: $d'_{timestamp} = \min_{d \in R}(d_{timestamp})$
• Alternative: $d'_{timestamp}$ = some function of the data from which it was derived
Another Note on Consistency
• avi and rvi may change with system dynamics
  – Streams: if querying a soldier’s heartbeat and we see it stabilize, can issue the query less often
  – R-T: if system notices heartbeat is steady, may increase validity interval
Real-Time relation to Streams
• Real-Time can have streaming queries
• Temporal validity of data vs. window joins
• Load shedding based on user-defined priorities (QoS); R-T sheds transactions vs. Streams shed tuples
Conclusions
• Parallels between Streams and Real-Time Databases:
  – Scheduling (Streams: based on QoS; R-T: based on priority and deadlines)
  – Load Shedding (Streams: tuple-based; R-T: based on ability to meet deadlines)
  – Freshness of Data (Streams: defined in query; R-T: defined in data)
Discussion Questions
• What are the pros and cons of building a notion of relative consistency into the DSMS itself instead of its queries?
• Is it worthwhile to define QoS for a DSMS in terms of a ratio between queries? For example a relative periodic query scheduling policy.
Discussion Questions
• Should it be possible to dynamically update the Relative Miss Ratio? What are some situations that would benefit from this? In streams?
• Low priority queries miss their deadlines and do not run, what is the parallel to this in a DSMS?