Real-Time Databases

Meghan Russ, Miriam Speert, Pete Dempsey, Sedat Behar, Yevgeny Ioffe, Zachi Klopman




Timeline

1:40 - 1:50: Introduction
1:50 - 3:00: Real-Time Databases/Scheduling
3:00 - 3:10: Break
3:10 - 4:00: Operator Scheduling in Aurora
4:00 - 4:25: Discussion
4:25 - 4:30: Comments

Real-Time Databases/Scheduling

• General Introduction
• Scheduling Policies
• Resource Allocation
• Properties of Data: consistency and validity
• Conclusions

Imagine this…

• We are at war with Iraq
• Our soldiers find a potential target
• Military intelligence consults a database to determine course of action

Imagine this…

• We are at war with Iraq
• Air control system constantly monitors hundreds of aircraft and records them in a database
• Intelligence systems constantly query the database for potential threats

Suddenly…

• Hundreds of missiles are launched
• We suspect some are nuclear
• Need info which will allow us to determine a course of action
• Need this info to make rapid decision
• The costs of indecision are catastrophic

What could go wrong?

• Limited number of missiles we can intercept

• Once they’re launched, we have limited time to react

• Our traditional database is slowed by less critical queries

• Finally, our queries may not be answered in time due to system load

We need a system that:

• Handles time-sensitive queries
• Returns only temporally valid data
• Supports priority scheduling

• Solution: Real-Time Databases!


Real-Time Databases and Streams

• Scheduling
  – Streams: priority based on QoS optimization
  – Real-Time: priority based on deadlines and user-supplied values
• Load Shedding
  – Streams: dropping tuples from queues
  – Real-Time: missing deadlines, dropping transactions
• Freshness of data:
  – Streams: not guaranteed
  – Real-Time: resample

Real-Time Databases

• An extension to traditional databases
• Motivated by class of applications that require reliable responses
• Predictable (not necessarily fast)

Real-Time Database Features

• Priority
  – Classification of transactions
  – Assigns value to transactions
• Deadlines
  – Transactions specify explicit time requirements
  – Transaction scheduling takes time requirements into account
  – Predictability that transactions will complete by deadline or not at all

Transactions and Streams

• Operations on the database that perform combinations of reads/writes in an atomic step
  – Queries are a subset of transactions

• Streams are read-only data (may create new tuples)

• Data Consistency

Characteristics of Transactions

• Manner in which transactions use data
• Nature of time constraints
• Significance of executing a transaction by its deadline
  – consequence of missing specified time constraints

Transaction Classification

• Effect of missing transaction deadlines
• Value to user is dependent on timeliness:
  – Soft: have some value after deadline
  – Firm: have no value after deadline
  – Hard: have negative value after deadline
• Special case: no deadline
• Idea for Streams: Queries have periodic deadlines
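The soft/firm/hard distinction can be read as three value functions over completion time. A minimal sketch, assuming a hypothetical `value_at` helper, a linear decay for soft transactions, and a fixed penalty for hard ones; none of these specifics come from the slides.

```python
from enum import Enum


class DeadlineClass(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


def value_at(kind, finish, deadline, full_value=1.0, decay=0.1, penalty=1.0):
    """Value of a transaction that completes at time `finish`.

    `decay` and `penalty` are illustrative assumptions, not from the slides.
    """
    if finish <= deadline:
        return full_value  # on time: every class delivers its full value
    lateness = finish - deadline
    if kind is DeadlineClass.SOFT:
        # Some value remains after the deadline, shrinking toward zero.
        return max(0.0, full_value - decay * lateness)
    if kind is DeadlineClass.FIRM:
        return 0.0  # no value after the deadline
    return -penalty  # hard: a late result has negative value
```

A scheduler that knows the class can drop a late firm transaction outright, while a late soft transaction may still be worth finishing.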

Scheduling and Streams

• Streams: schedule queries in terms of QoS

• Real-Time Databases: schedule transactions in terms of scheduling policy


Scheduling Policies

• Earliest deadline first (PMM, PAQRS)
• Highest value first
• Highest value per unit computation time first
• Longest executed transaction first
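Each of the four policies can be seen as a different sort key over the same ready queue. A minimal sketch, assuming a hypothetical `Txn` record and `pick_next` helper; the field names and sample numbers are illustrative, not from the slides.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float   # absolute deadline
    value: float      # user-supplied value
    est_cost: float   # estimated computation time
    executed: float   # computation time already consumed


# Each policy maps a transaction to a sort key; the smallest key runs first.
POLICIES = {
    "earliest_deadline_first": lambda t: t.deadline,
    "highest_value_first": lambda t: -t.value,
    "highest_value_per_unit_time_first": lambda t: -t.value / t.est_cost,
    "longest_executed_first": lambda t: -t.executed,
}


def pick_next(ready, policy):
    """Return the transaction the given policy would run next."""
    return min(ready, key=POLICIES[policy])


ready = [
    Txn("a", deadline=5.0, value=1.0, est_cost=1.0, executed=0.0),
    Txn("b", deadline=9.0, value=8.0, est_cost=10.0, executed=3.0),
    Txn("c", deadline=7.0, value=4.0, est_cost=2.0, executed=1.0),
]
```

On this queue the policies disagree: earliest deadline picks `a`, highest value and longest executed pick `b`, and highest value per unit computation time picks `c`.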

PMM

• Priority Memory Management
• Admission Control
  – Decide if we run a query.
• Memory Allocation
  – How much memory does each running query get.

Memory Allocation: Two Strategies

• Max
  – Queries get their maximum required memory or no memory at all.
• MinMax
  – High priority queries get their maximum required memory and low priority queries get their minimum.
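The difference between the two strategies shows up once memory is scarce. A minimal sketch, assuming queries arrive as `(name, min_mem, max_mem)` tuples sorted highest priority first, and assuming MinMax works in two passes (minimums, then top-ups); the function names and the two-pass reading are assumptions, not taken from the slides.

```python
def allocate_max(queries, total):
    """Max: each query gets its maximum memory or nothing.

    `queries` is a list of (name, min_mem, max_mem), highest priority first.
    """
    grants, free = {}, total
    for name, _min_mem, max_mem in queries:
        grant = max_mem if max_mem <= free else 0
        grants[name] = grant
        free -= grant
    return grants


def allocate_minmax(queries, total):
    """MinMax: admit at minimum memory, then top up in priority order."""
    grants, free = {}, total
    for name, min_mem, _max_mem in queries:   # pass 1: hand out minimums
        grant = min_mem if min_mem <= free else 0
        grants[name] = grant
        free -= grant
    for name, _min_mem, max_mem in queries:   # pass 2: raise to maximum
        extra = max_mem - grants[name]
        if grants[name] and extra <= free:
            grants[name] = max_mem
            free -= extra
    return grants


queries = [("q_high", 10, 60), ("q_mid", 10, 50), ("q_low", 10, 50)]
```

With 100 units of memory, Max runs only `q_high`, while MinMax runs all three: `q_high` at its maximum and the other two at their minimum.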

Admission Control

• Goal: minimize the miss ratio (number of queries that miss their deadline/total queries).

• MultiProgramming Level (MPL) = number of queries to run.

• Optimize system resource use: optimal MPL.

Relating MPL to Streams

• Real-Time: One time queries
• Stream: Continuous Queries
• Possibilities for future DSMS:
  – Using MPL for QoS

Oh no! Missiles are launched again.

• We are running two types of queries:
  – Query1 – Where should CNN’s cameras face to see the missile?
  – Query2 – Should we shoot the missile down?

• Queries of type 2 are obviously more important, but how does the db know?

• Consider: Applications for relative query values in stream systems.

PAQRS – extension of PMM

• Priority Adaptation Query Resource Scheduling.

• PMM only minimizes miss ratio for the entire system.

• We would like to be able to specify a ratio between query classes for missed deadlines.

• RelMissRatio (Relative Miss Ratio), e.g. Query1:Query2 = 99:1.

Why do we care?

• Think of the missile example.
  – Same problems still exist in stream systems.
• Potential Stream Additions:
  – Relative Priority Scheduling.
    • Not all queries are equal
    • Another form of QoS
  – Periodic Query Deadlines.
    • Deadlines for continuous queries

Bias Control

• Puts queries into two groups:
  – Regular – Queries run with normal priority
  – Reserve – Queries run with priority lower than regular.
• Manages groups on a per query basis
  – Each class gets RegQuota regular queries.
  – The rest have to run as reserve queries.

Relative Weights

Weight should reflect a class’ RelMissRatio.

$\mathit{Weight}_i = \dfrac{1/\mathit{RelMissRatio}_i}{\sum_j 1/\mathit{RelMissRatio}_j}$

$\mathit{Weight}_{cnn} = \dfrac{1/99}{1/99 + 1} = 0.01$

$\mathit{Weight}_{mis} = \dfrac{1}{1/99 + 1} = 0.99$

Bias Control using Relative Weights

$\mathit{WeightedMissRatio} = \sum_i \mathit{Weight}_i \cdot \mathit{MissRatio}_i$

All terms are equal when the ratio is correct.

$\mathit{WeightedMissRatio}_{ex} = (0.01 \cdot 99x\%) + (0.99 \cdot x\%) = 0.99x\% + 0.99x\%$
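The weight formula and the "all terms are equal" property are easy to check numerically. A minimal sketch, assuming hypothetical helper names `relative_weights` and `weighted_terms` and the class labels `cnn` and `mis` from the example; the sample miss ratios of 0.99 and 0.01 are chosen only to sit exactly on the 99:1 target.

```python
def relative_weights(rel_miss_ratio):
    """Weight_i = (1/RelMissRatio_i) / sum_j (1/RelMissRatio_j)."""
    inverse = {c: 1.0 / r for c, r in rel_miss_ratio.items()}
    total = sum(inverse.values())
    return {c: inv / total for c, inv in inverse.items()}


def weighted_terms(weights, miss_ratio):
    """Per-class terms Weight_i * MissRatio_i of the WeightedMissRatio sum."""
    return {c: weights[c] * miss_ratio[c] for c in weights}


weights = relative_weights({"cnn": 99, "mis": 1})
# Misses in exactly a 99:1 ratio: the two weighted terms coincide.
on_target = weighted_terms(weights, {"cnn": 0.99, "mis": 0.01})
```

When the observed miss ratios drift away from the target, the terms diverge, and that gap is what bias control reacts to.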

Back to Missiles and CNN

• The actual miss ratio is not the target 99:1; it is 50:50!

• $\mathit{RegQuota}_i^{new} = \mathit{RegQuota}_i^{old} \cdot \dfrac{\mathit{Weight}_i \cdot \mathit{MissRatio}_i}{\mathit{WeightedMissRatio} / \mathit{NumClasses}}$

Missiles and CNN Calculations

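$\mathit{WeightedMissRatio} = (0.01 \cdot 0.50) + (0.99 \cdot 0.50) = 0.5$

The two terms are not equal: $0.005 \neq 0.495$

$\mathit{RegQuota}_{cnn}^{new} = \mathit{RegQuota}_{cnn}^{old} \cdot \dfrac{0.01 \cdot 0.50}{0.5 / 2} = \mathit{RegQuota}_{cnn}^{old} \cdot 0.02$ (98% less)

$\mathit{RegQuota}_{mis}^{new} = \mathit{RegQuota}_{mis}^{old} \cdot \dfrac{0.99 \cdot 0.50}{0.5 / 2} = \mathit{RegQuota}_{mis}^{old} \cdot 1.98$ (98% more)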
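The quota update above is a single multiplicative step per class. A minimal sketch, assuming a hypothetical `update_reg_quota` helper and starting quotas of 100 for both classes (the slides give no absolute quota values); the zero-miss guard is also an added assumption.

```python
def update_reg_quota(old_quota, weights, miss_ratio):
    """RegQuota_new = RegQuota_old * (Weight_i * MissRatio_i)
                                   / (WeightedMissRatio / NumClasses)."""
    terms = {c: weights[c] * miss_ratio[c] for c in weights}
    weighted_miss_ratio = sum(terms.values())
    if weighted_miss_ratio == 0:
        return dict(old_quota)  # nothing missed: leave quotas alone (assumption)
    fair_share = weighted_miss_ratio / len(weights)
    return {c: old_quota[c] * terms[c] / fair_share for c in weights}


new_quota = update_reg_quota(
    old_quota={"cnn": 100.0, "mis": 100.0},   # hypothetical starting quotas
    weights={"cnn": 0.01, "mis": 0.99},
    miss_ratio={"cnn": 0.50, "mis": 0.50},
)
```

Under these inputs the CNN class shrinks to 2 regular slots and the missile class grows to 198, matching the factors 0.02 and 1.98.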

Does it really work?


Essence of Real Time

• Although adaptive systems give better throughput, IT DOESN’T MATTER!

• RT is about dependability, not throughput. 1% miss rate is (usually) unacceptable.

• Throughput can be handled (usually) with extra hardware (i.e. money). Dependability needs a special design.

Resources in Databases

• Logical
  – Locks
• Physical
  – CPU(s)
  – Memory
    • Cache
    • Work Area
  – I/O Bandwidth
    • Disks & Storage
    • Network for Distributed Processing
  – Time...

Cost of a Transaction

• Waiting for locks to release
• Work memory needed (e.g. O(n) for in-memory hash join, O(sqrt(n)) for disk-assisted)
• I/O amount (e.g. worst case join: multiplication)
• CPU needed to process
• Cost of aborting a transaction (negligible for queries)

If success cannot be guaranteed, don't start!

Physical Resources – Now and Then

Resource                 1995 (Paper)    2003 (opt. RAID)
CPU Speed (MIPS)         40              2 × 2500
Memory Buffers (MB)      20              800
I/O Bandwidth (MB/s)     10              100
Disk Size (GB)           1               120 (1 TB)
Disk Cache (kB)          256             8192 (1 GB)
Disk Latency (ms)        16.7            6
# Disks                  10              4 (16)

Latency is Forever…

Memory Allocation Strategies (I)

• Max
  – all memory needed or nothing (don't admit)
• MinMax
  – all memory needed for high-priority
  – min memory needed for low-priority
• M&M
  – feedback-based allocation – adaptive
  – small amount of memory set aside for small transactions

Memory Allocation Strategies (II)

• Multiclass Dependent
  – Small get all the memory they need
  – Large get a minimum amount
  – Medium get according to level load
• Classes are:
  – Small – less than 10% of memory
  – Large – more than memory
  – Medium – between them.
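The class boundaries translate directly into a threshold test. A minimal sketch, assuming a hypothetical `memory_class` helper that compares a transaction's memory demand against total memory; treating the exact boundary values (10% and 100%) as medium is an assumption.

```python
def memory_class(demand, total_memory):
    """Classify a transaction by the memory it needs.

    small  : needs less than 10% of total memory
    large  : needs more than total memory
    medium : everything in between (boundaries included, by assumption)
    """
    if demand < 0.10 * total_memory:
        return "small"
    if demand > total_memory:
        return "large"
    return "medium"
```

For example, with 100 MB of memory, a 5 MB transaction is small, a 50 MB one is medium, and a 150 MB one is large.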

Allocating Memory

[Figure: a sequence of small (S), medium (M), and large (L) transactions awaiting memory: S M L M S L S]

Multi-Class Resource Allocation

[Figure: Single Queue vs. Multiple Queues. S, M, and L transactions feed the resources either from one shared queue or from separate queues; diagram labels include SP, MP, and LP.]

Locking Strategies for Transactions

• Wait patiently…
  – Bad idea – can wait forever for lower priority or deadlock
• Upgrade priority of lock holder
  – Will complete less important job and then continue
• Abort transaction with lower priority
  – Need to assess time of abort…

RT systems are not tolerant about lock delays!

Locks for Queries (Cursors)

• Grab all locks
  – Long wait, holds other transactions
• Disregard locks (“dirty read”)
  – May read inconsistent data or data to be discarded
• Read only committed data (“committed read”)
  – May read stale data
  – data may change while acting upon it
• Lock current record (“cursor stability”)
  – Other parts of the set may change while active
  – may interfere with transactions

Costs of Distributed Processing

• Two phase commit protocol
• Aborting a transaction
• I/O for Queries
• Network delays (use dedicated connections)

Relevance to Streams

NO
• Locks
• Rollbacks

YES
• Memory allocation
• Disk latency
• I/O Bandwidth
• Deadlines?


Properties of Data

• Consistency
  – Temporal
  – Absolute
  – Relative
• Validity and Timestamp
  – Validity interval = how long a reading remains accurate after it arrives in the system (timestamp)

Temporal Consistency

• R-T imposes temporal constraints not present in streaming systems

• Need to preserve temporal validity of data to reflect state of environment

• If transaction must meet deadline, valid data must be in R-T system

• Consists of absolute and relative consistency

Looking at Fresh Data

• How do we look at relevant data in streams?
  – No guarantee data is fresh
  – Shed older data; as new data comes in, older data is flushed from system to make room
• How do we look at fresh data in Real-Time databases?
  – Make sure data hasn’t expired – absolute consistency

Absolute Consistency

• Between the state of the environment and its reflection in the database

• Necessary to ensure controlling system is aware of actual state of environment

• Example:
  – A reading is taken indicating which reporter is with the 3rd infantry on April 8th; this reading is valid for 24 hours

Formal Definition: Absolute Consistency

• Data item d is described by: (value, avi, timestamp)
  – $d_{value}$ = current state of d
  – $d_{timestamp}$ = time when the observation concerning d was made
  – $d_{avi}$ = d’s absolute validity interval: length of time following $d_{timestamp}$ during which d has absolute validity

Validity between Data in Streams

• Suppose: Want tuple1 and tuple2 to have been created within a given time interval
• Implicit notion of relative validity
  – Data isn’t persistent in streaming systems, so relative inconsistency is less likely: as data becomes stale, it is less likely to still be in the system
  – If the time interval is important, specify it in the query
    • E.g. window joins

Relative Consistency: R-T

• Data must be consistent in a group used to derive other data
  – Data used to derive other data must be produced close together
• Example:
  – If we are taking the average temperature of 3 locations, readings for the 3 areas should be taken close together in time

Formal Definition: Relative Consistency

• Set of data items used to derive other data is a relative consistency set, R

• $R_{rvi}$ = relative validity interval
• R is relatively consistent if:
  – $\forall\, d, d' \in R:\ |d_{timestamp} - d'_{timestamp}| \le R_{rvi}$

Illustration

• Scud missile is traveling at 500 mph SW at –45º, at an altitude of 100 ft

• Patriot missile is traveling at 1500 mph NE at 60º

• Can run calculations to see whether they will intercept

• Readings must be taken within some time interval I, to ensure computation is possible

Observe…

• $\mathit{Scud\_speed}_{avi}$ = 4 ms, $\mathit{Patriot\_speed}_{avi}$ = 2 ms, and $R_{rvi}$ = 1; time = 12:33
• Scud_speed = (500, 4, 12:30)
• Patriot_speed1 = (1500, 2, 12:31)
• Patriot_speed2 = (1500, 2, 12:32)
• All have absolute consistency, but R’s relative consistency is violated
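The example can be checked mechanically. A minimal sketch, assuming that a data item is absolutely consistent when `now - timestamp <= avi` (the usual reading of the definition, not spelled out on the slides) and that avi, rvi, and timestamps share one abstract time unit, here minutes past 12:00; the slide labels avi in ms but gives clock times in minutes, so the unit is an assumption. `DataItem` and both helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    value: float
    avi: int         # absolute validity interval
    timestamp: int   # observation time, minutes past 12:00


def absolutely_consistent(d, now):
    """Assumed test: the reading is no older than its avi."""
    return now - d.timestamp <= d.avi


def relatively_consistent(items, rvi):
    """Every pair of timestamps in the set differs by at most rvi."""
    stamps = [d.timestamp for d in items]
    return max(stamps) - min(stamps) <= rvi


scud = DataItem(500, avi=4, timestamp=30)
patriot1 = DataItem(1500, avi=2, timestamp=31)
patriot2 = DataItem(1500, avi=2, timestamp=32)
now, rvi = 33, 1
```

At 12:33 every reading is still inside its own avi, yet pairing `scud` with `patriot2` spans two time units, which exceeds the rvi of 1.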

Achieving Validity Intervals

• avi:
  – R-T: realized by frequent sampling of real-world data
  – Streams: can’t do this
• rvi:
  – R-T: rvi w/ avi: smallest avi belonging to relative consistency set will prevail
  – Streams: specified in query
  – Note: only necessary to achieve rvi of RC Set R if data is being derived from R

Timestamps of Derived Data

• How to assign timestamp to derived data d’?

• One possibility: give d’ the timestamp of the oldest item from which it is derived:
  $d'_{timestamp} = \min_{d \in R}(d_{timestamp})$
• Alternative: $d'_{timestamp}$ = some function of the data from which it is derived
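The first option, inheriting the oldest timestamp, can be sketched as follows. This assumes readings are `(value, timestamp)` pairs and uses a hypothetical `derive_average` helper with made-up temperature readings; averaging is just one possible derivation.

```python
def derive_average(readings):
    """Average the values of (value, timestamp) readings.

    The derived item inherits the timestamp of the oldest reading it was
    computed from, i.e. the minimum timestamp in the set.
    """
    values = [value for value, _ts in readings]
    oldest = min(ts for _value, ts in readings)
    return sum(values) / len(values), oldest


average, stamp = derive_average([(70.0, 31), (74.0, 30), (72.0, 32)])
```

Taking the minimum is the conservative choice: the derived value expires no later than its stalest input would.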

Another Note on Consistency

• avi and rvi may change with system dynamics
  – Streams: if querying soldier’s heartbeat and see it stabilize, can issue query less often
  – R-T: if system notices heartbeat is steady, may increase validity interval

Real-Time relation to Streams

• Real-Time can have streaming queries
• Temporal validity of data vs. window joins
• Load shedding based on user-defined priorities (QoS); R-T sheds transactions vs. Streams shed tuples

Conclusions

• Parallels between Streams and Real-Time Databases:
  – Scheduling (Streams: based on QoS; R-T: based on priority and deadlines)
  – Load Shedding (Streams: tuple-based; R-T: based on ability to meet deadlines)
  – Freshness of Data (Streams: defined in query; R-T: defined in data)

Discussion Questions

• What are the pros and cons of building notion of relative consistency into DSMS itself instead of its queries?

• Is it worthwhile to define QoS for a DSMS in terms of a ratio between queries? For example a relative periodic query scheduling policy.

Discussion Questions

• Should it be possible to dynamically update the Relative Miss Ratio? What are some situations that would benefit from this? In streams?

• When low priority queries miss their deadlines and do not run, what is the parallel to this in a DSMS?


Persistence of Memory (Dali, 1931)

databases are persistent.

data streams, like memory, fade with time…