adaptive query processing
DESCRIPTION
Adaptive Query Processing. Adapted from a Tutorial given at SIGMOD 2006 by: Amol Deshpande, University of Maryland Joseph M. Hellerstein, University of California, Berkeley Vijayshankar Raman, IBM Almaden Research Center. Data Independence Redux. d app. d env.TRANSCRIPT
![Page 1: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/1.jpg)
Adaptive Query Processing
Adapted from a Tutorial given at SIGMOD 2006 by:
Amol Deshpande, University of Maryland
Joseph M. Hellerstein, University of California, Berkeley
Vijayshankar Raman, IBM Almaden Research Center
![Page 2: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/2.jpg)
Data Independence Redux The taproot of modern database technology Separation of specification (“what”) from implementation (“how”)
Refamiliarizing ourselves: Why do we care about data independence?
d appdt
d env
dt
![Page 3: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/3.jpg)
D. I. Adaptivity Query Optimization: the key to data independence
– bridges specification and implementation– isolates static applications from dynamic environments
How does a DBMS account for dynamics in the environment?
This tutorial is on a 30-year-old topic– With a 21st-Century renaissance
ADAPTIVITYADAPTIVITY
![Page 4: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/4.jpg)
Why the Renaissance? Breakdown of traditional query optimization
– Queries over many tables– Unreliability of traditional cost estimation– Success & maturity make problems more apparent, critical
• c.f. Oracle v6!
Query processing in new environments– E.g. data integration, web services, streams, P2P, sensornets, hosting, etc.– Unknown and dynamic characteristics for data and runtime– Increasingly aggressive sharing of resources and computation– Interactivity in query processing
Note two separate themes? – Unknowns: even static properties often unknown in new environments
• and often unknowable a priori– Dynamics: can be very high -- motivates intra-query adaptivity
?
denvdt
![Page 5: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/5.jpg)
20th Century Summary System R’s optimization scheme deemed the winner for 25 years
Nearly all 20thC research varied System R’s individual steps– More efficient measurement (e.g. sampling)– More efficient/effective models (samples, histograms, sketches)– Expanded plan spaces (new operators, bushy trees, richer queries and data
models, materialized views, parallelism, remote data sources, etc)– Alternative planning strategies (heuristic and enumerative)
Speaks to the strength of the scheme– independent innovation on multiple fronts– as compared with tight coupling of INGRES
But… minimal focus on the interrelationship of the steps– Which, as we saw from Ingres, also affects the plan space
![Page 6: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/6.jpg)
21st Century Adaptive Query Processing (well, starts in late 1990’s)
Revisit basic architecture of System R– In effect, change the basic adaptivity loop!
As you examine schemes, keep an eye on:– Rate of change in the environment that is targeted– How radical the scheme is wrt the System R scheme
• ease of evolutionary change– Increase in plan space: are there new, important opportunities?
• even if environment is ostensibly static!– New overheads introduced– How amenable the scheme is to independent innovation at each step
• Measure/Analyze/Plan/Actuate
![Page 7: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/7.jpg)
Tangentially Related Work An incomplete list!!!
Competitive Optimization [Antoshenkov93]– Choose multiple plans, run in parallel for a time, let the most promising finish
• 1x feedback: execution doesn’t affect planning after the competition Parametric Query Optimization [INSS92, CG94, etc.]
– Given partial stats in advance. Do some planning and prune the space. At runtime, given the rest of statistics, quickly finish planning.• Changes interaction of Measure/Model and Planning• No feedback whatsoever, so nothing to adapt to!
“Self-Tuning”/“Autonomic” Optimizers [CR94, CN97, BC02, etc.]– Measure query execution (e.g. cardinalities, etc.)
• Enhances measurement, on its own doesn’t change the loop– Consider building non-existent physical access paths (e.g. indexes, partitions)
• In some senses a separate loop – adaptive database design• Longer timescales
![Page 8: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/8.jpg)
Tangentially Related Work II Robust Query Optimization [CHG02, MRS+04, BC05, etc.]
– Goals: • Pick plans that remain predictable across wide ranges of scenarios• Pick least expected cost plan
– Changes cost function for planning, not necessarily the loop.• If such functions are used in adaptive schemes, less fluctuation [MRS+04]
– Hence fewer adaptations, less adaptation overhead
Adaptive query operators [NKT88, KNT89, PCL93a, PCL93b]– E.g. memory-adaptive sort and hash-join– Doesn’t address whole-query optimization problems– However, if used with AQP, can result in complex feedback loops
• Especially if their actions affect each other’s models!
![Page 9: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/9.jpg)
Extended Topics in Adaptive QP An incomplete list!!
Parallelism & Distribution– River [A-D03] – FLuX [SHCF03, SHB04]– Distributed eddies [TD03]
Data Streams– Adaptive load shedding– Shared query processing
![Page 10: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/10.jpg)
Adaptive Selection Ordering
![Page 11: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/11.jpg)
Selection Ordering
Complex predicates on relations common– Eg., on an employee relation:
((salary > 120000) AND (status = 2)) OR ((salary between 90000 and 120000) AND (age < 30) AND (status = 1)) OR …
Selection ordering problem Decide the order in which to evaluate the individual predicates against the tuples
We focus on evaluating conjunctive predicates (containing only AND’s) Example Query
select * from R where R.a = 10 and R.b < 20 and R.c like ‘%name%’;
![Page 12: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/12.jpg)
Why Study Selection Ordering
Many join queries reduce to this problem– Queries posed against a star schema– Queries where only pipelined left-deep plans are considered– Queries involving web indexes
Increasing interest in recent years– Web indexes [CDY’95, EHJKMW’96, GW’00]– Web services [SMWM’06]– Data streams [AH’00, BMMNW’04]– Sensor Networks [DGMH’05]
Similar to many problems in other domains– Sequential testing (e.g. for fault detection) [SF’01, K’01]– Learning with attribute costs [KKM’05]
![Page 13: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/13.jpg)
Execution StrategiesPipelined execution (tuple-at-a-time)
R.a = 10 R.b < 20 R.c like …
For each tuple r Є R Apply predicate R.a = 10 first; If tuple satisfies the selection, apply R.b < 20; If both satisfied, apply R.c like ‘%name%’;
R result
Static optimization ? 1. Using the KBZ algorithm Order by c/(1-p) Assumes predicate independence 2. A greedy algorithm Known to be 4-approximate
![Page 14: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/14.jpg)
Adaptive Greedy [BMMNW’04] Context: Pipelined query plans over streaming data Example:
R.a = 10 R.b < 20 R.c like …
Initial estimated selectivities
0.05 0.1 0.2
Costs 1 unit 1 unit 1 unit
Three independent predicates
R.a = 10 R.b < 20R resultR.c like …R1 R2 R3
Optimal execution plan orders by selectivities (because costs are identical)
![Page 15: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/15.jpg)
Adaptive Greedy [BMMNW’04] Monitor the selectivities Switch order if the predicates not ordered by selectivities
R.a = 10 R.b < 20R resultR.c like …R1 R2 R3
Rsample
Randomly sample R.a = 10
R.b < 20
R.c like …
estimate selectivities of the predicatesover the tuples of the profile
ReoptimizerIF the current plan not optimal w.r.t. these new selectivitiesTHEN reoptimize using the Profile
Profile
![Page 16: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/16.jpg)
Adaptive Greedy [BMMNW’04] Correlated Selections
– Must monitor conditional selectivities
monitor selectivities sel(R.a = 10), sel(R.b < 20), sel(R.c …)
monitor conditional selectivities sel(R.b < 20 | R.a = 10) sel(R.c like … | R.a = 10) sel(R.c like … | R.a = 10 and R.b < 20)
R.a = 10 R.b < 20R resultR.c like …R1 R2 R3
Rsample
Randomly sample R.a = 10
R.b < 20
R.c like …(Profile)
ReoptimizerUses conditional selectivities to detect violationsUses the profile to reoptimize
O(n2) selectivities need to be monitored
![Page 17: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/17.jpg)
Adaptive Greedy [BMMNW’04]
Advantages: – Can adapt very rapidly– Theoretical guarantees on performance
• Not known for any other AQP protocols
Disadvantages:– Limited applicability
• Only applies to selection ordering and specific types of join queries– Possibly high runtime overheads
• Several heuristics described in the paper
![Page 18: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/18.jpg)
Eddies [AH’00] Treat query processing as routing of tuples through operators
Pipelined query execution using an eddy
An eddy operator• Intercepts tuples from sources and output tuples from operators• Executes query by routing source tuples through operators
A traditional pipelined query plan
R.a = 10 R.b < 20R resultR.c like …R1 R2 R3
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 19: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/19.jpg)
Eddies [AH’00]
a b c …15 10 AnameA …
An R Tuple: r1
r1
r1
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 20: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/20.jpg)
ready bit i : 1 operator i can be applied 0 operator i can’t be applied
Eddies [AH’00]
a b c … ready done15 10 AnameA … 111 000
An R Tuple: r1
r1
Operator 1
Operator 2
Operator 3
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 21: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/21.jpg)
done bit i : 1 operator i has been applied 0 operator i hasn’t been applied
Eddies [AH’00]
a b c … ready done15 10 AnameA … 111 000
An R Tuple: r1
r1
Operator 1
Operator 2
Operator 3
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 22: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/22.jpg)
Eddies [AH’00]
a b c … ready done15 10 AnameA … 111 000
An R Tuple: r1
r1
Operator 1
Operator 2
Operator 3
Used to decide validity and need of applying operators
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 23: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/23.jpg)
Eddies [AH’00]
a b c … ready done15 10 AnameA … 111 000
An R Tuple: r1
r1
Operator 1
Operator 2
Operator 3
satisfiedr1
r1
a b c … ready done15 10 AnameA … 101 010
r1
not satisfied
eddy looks at the next tuple
For a query with only selections, ready = complement(done)
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 24: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/24.jpg)
Eddies [AH’00]
a b c …10 15 AnameA …
An R Tuple: r2
Operator 1
Operator 2
Operator 3
r2EddyR
result
R.a = 10
R.c like …
R.b < 20
satisfied
satisfied
satisfied
![Page 25: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/25.jpg)
Eddies [AH’00]
a b c … ready done10 15 AnameA … 000 111
An R Tuple: r2
Operator 1
Operator 2
Operator 3
r2
if done = 111, send to output
r2
EddyR
result
R.a = 10
R.c like …
R.b < 20
satisfied
satisfied
satisfied
![Page 26: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/26.jpg)
Eddies [AH’00] Adapting order is easy
– Just change the operators to which tuples are sent– Can be done on a per-tuple basis– Can be done in the middle of tuple’s “pipeline”
How are the routing decisions made ?– Using a routing policy
Operator 1
Operator 2
Operator 3
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 27: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/27.jpg)
Routing Policy 1: Non-adaptive Simulating a single static order
– E.g. operator 1, then operator 2, then operator 3
Routing policy: if done = 000 route to 1 100 route to 2 110 route to 3
table lookups very efficient
Operator 1
Operator 2
Operator 3
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 28: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/28.jpg)
Overhead of Routing PostgreSQL implementation of eddies using bitset lookups [Telegraph Project] Queries with 3 selections, of varying cost
– Routing policy uses a single static order, i.e., no adaptation
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 μsec 10 μsec 100 μsec
Selection cost
Norm
aliz
ed C
ost
No-eddies
Eddies
![Page 29: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/29.jpg)
Routing Policy 2: Deterministic Monitor costs and selectivities continuously Reoptimize periodically using KBZ
Statistics Maintained: Costs of operators Selectivities of operators
Routing policy: Use a single order for a batch of tuples Periodically apply KBZ
Operator 1
Operator 2
Operator 3
EddyR
result
R.a = 10
R.c like …
R.b < 20
Can use the A-Greedy policy for correlated predicates
![Page 30: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/30.jpg)
Overhead of Routing and Reoptimization Adaptation using batching
– Reoptimized every X tuples using monitored selectivities– Identical selectivities throughout experiment measures only the overhead
0
1
2
3
4
5
6
0 μsec 10 μsec 100 μsec
Selection Cost
Norm
aliz
ed C
ost No-eddies
Eddies - No reoptimization
Eddies - Batch Size = 100 tuples
Eddies - Batch Size = 1 tuple
![Page 31: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/31.jpg)
Routing Policy 3: Lottery Scheduling Originally suggested routing policy [AH’00] Applicable when each operator runs in a separate “thread”
– Can also be done single-threaded, via an event-driven query executor Uses two easily obtainable pieces of information for making routing
decisions:– Busy/idle status of operators– Tickets per operator
Operator 1
Operator 2
Operator 3
EddyR
result
R.a = 10
R.c like …
R.b < 20
![Page 32: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/32.jpg)
Routing Policy 3: Lottery Scheduling Routing decisions based on busy/idle status of operators
Rule: IF operator busy, THEN do not route more tuples to it
Rationale: Every thread gets equal time SO IF an operator is busy, THEN its cost is perhaps very high
Operator 1
Operator 2
Operator 3
EddyR
result
R.a = 10
R.c like …
R.b < 20
BUSY
IDLE
IDLE
![Page 33: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/33.jpg)
Routing Policy 3: Lottery Scheduling Routing decisions based on tickets
Rules: 1. Route a new tuple randomly weighted according to the number of tickets
tickets(O1) = 10tickets(O2) = 70tickets(O3) = 20
Will be routed to: O1 w.p. 0.1 O2 w.p. 0.7 O3 w.p. 0.2
Operator 1
Operator 2
Operator 3
Eddy
result
R.a = 10
R.c like …
R.b < 20
r
![Page 34: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/34.jpg)
Routing Policy 3: Lottery Scheduling Routing decisions based on tickets
Rules: 1. Route a new tuple randomly weighted according to the number of tickets
tickets(O1) = 10tickets(O2) = 70tickets(O3) = 20
r
Operator 1
Operator 2
Operator 3
Eddy
result
R.a = 10
R.c like …
R.b < 20
![Page 35: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/35.jpg)
Routing Policy 3: Lottery Scheduling Routing decisions based on tickets
Rules: 1. Route a new tuple randomly weighted according to the number of tickets 2. route a tuple to an operator Oi
tickets(Oi) ++; Operator 1
Operator 2
Operator 3
Eddy
result
R.a = 10
R.c like …
R.b < 20
tickets(O1) = 11tickets(O2) = 70tickets(O3) = 20
![Page 36: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/36.jpg)
Routing Policy 3: Lottery Scheduling Routing decisions based on tickets
r
Rules: 1. Route a new tuple randomly weighted according to the number of tickets 2. route a tuple to an operator Oi
tickets(Oi) ++; 3. Oi returns a tuple to eddy tickets(Oi) --;
Operator 1
Operator 2
Operator 3
Eddy
result
R.a = 10
R.c like …
R.b < 20
tickets(O1) = 11tickets(O2) = 70tickets(O3) = 20
![Page 37: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/37.jpg)
Routing Policy 3: Lottery Scheduling Routing decisions based on tickets
r
Rules: 1. Route a new tuple randomly weighted according to the number of tickets 2. route a tuple to an operator Oi
tickets(Oi) ++; 3. Oi returns a tuple to eddy tickets(Oi) --;
Operator 1
Operator 2
Operator 3
Eddy
result
R.a = 10
R.c like …
R.b < 20
tickets(O1) = 10tickets(O2) = 70tickets(O3) = 20
Will be routed to: O2 w.p. 0.777 O3 w.p. 0.222
![Page 38: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/38.jpg)
Routing Policy 3: Lottery Scheduling Routing decisions based on tickets
Rationale: Tickets(Oi) roughly corresponds to (1 - selectivity(Oi)) So more tuples are routed to the more selective operators
Rules: 1. Route a new tuple randomly weighted according to the number of tickets 2. route a tuple to an operator Oi
tickets(Oi) ++; 3. Oi returns a tuple to eddy tickets(Oi) --;
Operator 1
Operator 2
Operator 3
Eddy
result
R.a = 10
R.c like …
R.b < 20
tickets(O1) = 10tickets(O2) = 70tickets(O3) = 20
![Page 39: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/39.jpg)
Routing Policy 3: Lottery Scheduling
Effect of the combined lottery scheduling policy:– Low cost operators get more tuples– Highly selective operators get more tuples– Some tuples are randomly, knowingly routed according to sub-optimal
orders• To explore• Necessary to detect selectivity changes over time
![Page 40: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/40.jpg)
Routing Policy 4: Content-based Routing Routing decisions made based on the values of the attributes [BBDW’05] Also called “conditional planning” in a static setting [DGHM’05] Less useful unless the predicates are expensive
– At the least, more expensive than r.d > 100
Example Eddy notices: R.d > 100 sel(op1) > sel(op2) & R.d < 100 sel(op1) < sel(op2)
Routing decisions for new tuple “r”: IF (r.d > 100): Route to op1 first w.h.p ELSE Route to op2 first w.h.p
Operator 1
Operator 2
Eddy result
Expensive predicates
![Page 41: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/41.jpg)
Eddies: Post-Mortem
Cost of adaptivity– Routing overheads
• Minimal with careful engineering– E.g. using bitset-indexed routing arrays
• “Batching” helps tremendously– Statistics Maintenance– Executing the routing policy logic
![Page 42: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/42.jpg)
Discussion
Benefits for AQP techniques come from two places– Increased explored plan space
• Can use different plans for different parts of data – Adaptation
• Can change the plan according to changing data characteristics
Selection ordering is STATELESS– No inherent “cost of switching plans”
• Can switch the plan without worrying about operator states– Key to the simplicity of the techniques
![Page 43: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/43.jpg)
Discussion
Adaptation is not free– Costs of monitoring and maintaining statistics can be very high
• A selection operation may take only 1 instruction to execute• Comparable to updating a count
– “Sufficient statistics”• May need to maintain only a small set of statistics to detect
violations • E.g. The O(n2) matrix in Adaptive-Greedy [BBMNW’04]
![Page 44: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/44.jpg)
Adaptive Join Processing
![Page 45: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/45.jpg)
Outline
20th Century Adaptivity: Intuition from the Classical Systems
Adaptive Selection Ordering
Adaptive Join Processing– Additional complexities beyond selection ordering– Four plan spaces
• Simplest: pipelines of Nested Loop Joins• Traditional: Trees of Binary Operators (TOBO)• Multi-TOBO: horizontal partitioning• Dataflows of unary operators
– Handling asynchrony Research Roundup
![Page 46: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/46.jpg)
Select-Project-Join Processing Query: select count(*) from R, S, T
where R.a=S.a and S.b=T.b and S.c like ‘%name%’ and T.d = 10
An execution plan
Cost metric: CPU + I/O Plan Space:
– Traditionally, tree of binary join operators (TOBO):• access methods• Join algorithms• Join order
– Adaptive systems: • Some use the same plan space, but switch between plans during execution• Others use much larger plan spaces
– different adaptation techniques adapt within different plan spaces
S
T
R
SMJ
NLJ
![Page 47: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/47.jpg)
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
![Page 48: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/48.jpg)
Pipelined Nested Loops Join
Simplest method of joining tables– Pick a driver table (R). Call the rest driven tables– Pick access methods (AMs) on the driven tables– Order the driven tables– Flow R tuples through the driven tables
For each r R do:look for matches for r in A;for each match a do:
look for matches for <r,a> in B;…
RB
NLJ
C
NLJ
A
NLJ
![Page 49: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/49.jpg)
Adapting a Pipelined Nested Loops Join
Simplest method of joining tables– Pick a driver table (R). Call the rest driven tables– Pick access methods (AMs) on the driven tables– Order the driven tables– Flow R tuples through the driven tables
For each r R do:look for matches for r in A;for each match a do:
look for matches for <r,a> in B;…
RB
NLJ
C
NLJ
A
NLJ
Keep this fixed for now
Almost identical to selection
ordering
![Page 50: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/50.jpg)
Tree Of Binary Join Operators (TOBO) recap
Standard plan space considered by most DBMSs today– Pick access methods, join order, join algorithms– search in this plan space done by dynamic programming
A
B
R
MJ
NLJ
NLJ
DC
HJ
![Page 51: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/51.jpg)
Switching plans during query execution
Monitor cardinalities at well-defined checkpoints
If actual cardinality is too different from estimated,
re-optimize to switch to a new plan
Most widely studied technique:-- Federated systems (InterViso 90, MOOD 96), Red Brick,
Query scrambling (96), Mid-query re-optimization (98), Progressive Optimization (04), Proactive Reoptimization (05), …
A
C
B
R
MJ
NLJ
MJ
B
C
HJ
MJ
sort
sort
![Page 52: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/52.jpg)
Questions
Where to place checkpoints ?– Typically at materialization points
When to reoptimize ?– State reuse a key – The cost of switching shouldn’t be more than cost of continuing
How to switch ?– Try to reuse the work
![Page 53: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/53.jpg)
Next Eddies…
![Page 54: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/54.jpg)
Example Database
Name Level
Joe Junior
Jen Senior
Name Course
Joe CS1
Jen CS2
Course Instructor
CS2 Smith
select *from students, enrolled, courseswhere students.name = enrolled.name and enrolled.course = courses.course
Students Enrolled
Name Level Course
Joe Junior CS1
Jen Senior CS2
Enrolled Courses
Students Enrolled
Courses
Name Level Course Instructor
Jen Senior CS2 Smith
![Page 55: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/55.jpg)
Symmetric Hash Join
Name Level
Jen Senior
Joe Junior
Name Course
Joe CS1
Jen CS2
Joe CS2
select * from students, enrolled where students.name = enrolled.name
Name Level CourseJen Senior CS2
Joe Junior CS1
Joe Senior CS2
Students Enrolled
Pipelined hash join [WA’91] Simultaneously builds and probes hash tables on
both sides
Widely used: adaptive qp, stream joins, online aggregation, …
Naïve version degrades to NLJ once memory runs out– Quadratic time complexity– memory needed = sum of inputs
Improved by XJoins [UF 00] Needs more investigation
![Page 56: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/56.jpg)
Query Execution using Eddies
EddySEC
Probe to find matches
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Joe Junior
Joe JuniorJoe Junior
No matches; Eddy processesthe next tuple
Output
Insert with key hash(joe)
![Page 57: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/57.jpg)
Query Execution using Eddies
EddySEC
InsertProbe
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Joe Jr
Jen Sr
Joe CS1
Joe CS1Joe CS1
Joe Jr CS1
Joe Jr CS1Joe Jr CS1
Output
CS2 Smith
![Page 58: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/58.jpg)
Query Execution using Eddies
EddySEC
Output
Probe
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Joe Jr
Jen Sr
CS2 Smith
Jen CS2
Joe CS1
Joe Jr CS1Jen CS2
Jen CS2Jen CS2 Smith
Probe
Jen CS2 SmithJen CS2 SmithJen Sr. CS2 Smith
Jen Sr. CS2 Smith
![Page 59: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/59.jpg)
Eddies: Postmortem (1)
• Eddy executes different TOBO plans for different parts of data
Students Enrolled
Output
Courses
E C
S E
Courses Enrolled
Output
Students
E S
C E
Course InstructorCS2 Smith
Course InstructorCS2 Smith
Name CourseJoe CS1
Name LevelJoe Junior
Jen Senior
Name LevelJoe Junior
Jen Senior
Name CourseJen CS2
![Page 60: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/60.jpg)
Routing Policies routing of tuples determines join order
same policies as described for selections– can simulate static order– lottery scheduling– can tune policy for interactivity metric [RH02]
much more work remains to be done
![Page 61: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/61.jpg)
Summary
Eddies dynamically reorder pipelined operators, on a per-tuple basis– selections, nested loop joins, symmetric hash joins– extends naturally to combinations of the above
This can also be extended to non-pipelined operators (e.g. hybrid hash)
But, eddy can adapt only when it gets control (e.g., after hash table build)– just like with mid-query reoptimization
Next we study two extensions that widen the plan space still further– STAIRs: moving state across join operators– SteMs: unary operators for adapting access methods,
join algorithms dynamically
![Page 62: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/62.jpg)
N-way Symmetric Hash Join
Name Level
Jen Senior
Joe Junior
Name Course
Joe CS1
Jen CS2
Joe CS2
select * from students, enrolled where students.name = enrolled.name
Name Level Course InstructorJen Senior CS2 Prof. B
Joe Junior CS1 Prof. A
Joe Senior CS2 Prof. B
Students Enrolled
Pipelined join– XJoin-like disk spilling possible
[VNB’03] Simplest version atomically does
one build + (n-1) probes Breaking this atomicity is key to
adaptation
Cour Inst
CS1 P. A
CS2 P. B
Levels
![Page 63: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/63.jpg)
N-way Joins State Modules (SteMs)
Name Level
Jen Senior
Joe Junior
Name Course
Joe CS1
Jen CS2
Joe CS2
select * from students, enrolled where students.name = enrolled.name
Name Level Course InstructorJen Senior CS2 Prof. B
Joe Junior CS1 Prof. A
Joe Senior CS2 Prof. B
Students Enrolled
Cour Inst
CS1 P. A
CS2 P. B
Levels
![Page 64: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/64.jpg)
References [A-D03] R. Arpaci-Dusseau. Runtime Adaptation in River. ACM TOCS 2003. [AH’00] R. Avnur, J. M. Hellerstein: Eddies: Continuously Adaptive Query Processing SIGMOD
Conference 2000: 261-272 [Antoshenkov93] G. Antoshenkov: Dynamic Query Optimization in Rdb/VMS. ICDE 1993: 538-547. [BBD’05] S. Babu, P. Bizarro, D. J. DeWitt. Proactive Reoptimization. VLDB 2005: 107-118 [BBDW’05] P. Bizarro, S. Babu, D. J. DeWitt, J. Widom: Content-Based Routing: Different Plans for
Different Data. VLDB 2005: 757-768 [BC02] N. Bruno, S. Chaudhuri: Exploiting statistics on query expressions for optimization. SIGMOD
Conference 2002: 263-274 [BC05] B. Babcock, S. Chaudhuri: Towards a Robust Query Optimizer: A Principled and Practical
Approach. SIGMOD Conference 2005: 119-130 [BMMNW’04] S. Babu, et al: Adaptive Ordering of Pipelined Stream Filters. SIGMOD Conference 2004:
407-418 [CDHW06] Flow Algorithms for Two Pipelined Filter Ordering Problems; Anne Condon, Amol
Deshpande, Lisa Hellerstein, and Ning Wu. PODS 2006. [CDY’95] S. Chaudhuri, U. Dayal, T. W. Yan: Join Queries with External Text Sources: Execution and
Optimization Techniques. SIGMOD Conference 1995: 410-422 [CG94] R. L. Cole, G. Graefe: Optimization of Dynamic Query Evaluation Plans. SIGMOD Conference
1994: 150-160. [CHG02] F. C. Chu, J. Y. Halpern, J. Gehrke: Least Expected Cost Query Optimization: What Can We
Expect? PODS 2002: 293-302 [CN97] S. Chaudhuri, V. R. Narasayya: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL
Server. VLDB 1997: 146-155
![Page 65: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/65.jpg)
References (2) [CR94] C-M Chen, N. Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD
Conference 1994: 161-172 [DGHM’05] A. Deshpande, C. Guestrin, W. Hong, S. Madden: Exploiting Correlated Attributes in
Acquisitional Query Processing. ICDE 2005: 143-154 [DGMH’05] A. Deshpande, et al.: Model-based Approximate Querying in Sensor Networks. In VLDB
Journal, 2005 [DH’04] A. Deshpande, J. Hellerstein: Lifting the Burden of History from Adaptive Query Processing.
VLDB 2004. [EHJKMW’96] O. Etzioni, et al: Efficient Information Gathering on the Internet. FOCS 1996: 234-243 [GW’00] R. Goldman, J. Widom: WSQ/DSQ: A Practical Approach for Combined Querying of Databases
and the Web. SIGMOD Conference 2000: 285-296 [INSS92] Y. E. Ioannidis, R. T. Ng, K. Shim, T. K. Sellis: Parametric Query Optimization. VLDB 1992. [Ives02] Z. G. Ives. Efficient Query Processing for Data Integration. Ph.D., U. Washington, 2002. [K’01] M.S. Kodialam. The throughput of sequential testing. In Integer Programming and Combinatorial
Optimization (IPCO) 2001. [KBZ’86] R. Krishnamurthy, H. Boral, C. Zaniolo: Optimization of Nonrecursive Queries. VLDB 1986. [KD’98] N. Kabra, D. J. DeWitt: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution
Plans. SIGMOD Conference 1998: 106-117 [KKM’05] H. Kaplan, E. Kushilevitz, and Y. Mansour. Learning with attribute costs. In ACM STOC, 2005. [KNT89] Masaru Kitsuregawa, Masaya Nakayama and Mikio Takagi, "The Effect of Bucket Size Tuning
in the Dynamic Hybrid GRACE Hash Join Method”. VLDB 1989.
![Page 66: Adaptive Query Processing](https://reader036.vdocuments.us/reader036/viewer/2022062305/568159e3550346895dc72d16/html5/thumbnails/66.jpg)
References (3) [LEO 01] M. Stillger, G. M. Lohman, V. Markl, M. Kandil: LEO - DB2's LEarning Optimizer. VLDB 2001. [MRS+04] Volker Markl, et al.: Robust Query Processing through Progressive Optimization. SIGMOD
Conference 2004: 659-670 [MSHR’02] S. Madden, M. A. Shah, J. M. Hellerstein, V. Raman: Continuously adaptive continuous
queries over streams. SIGMOD Conference 2002: 49-60 [NKT88] M. Nakayama, M. Kitsuregawa, and M. Takagi. Hash partitioned join method using dynamic
destaging strategy. In VLDB 1988. [PCL93a] H. Pang, M. J. Carey, M. Livny: Memory-Adaptive External Sorting. VLDB 1993: 618-629 [PCL93b] H. Pang, M. J. Carey, M. Livny: Partially Preemptive Hash Joins. SIGMOD Conference 1993. [RH’05] N. Reddy, J. Haritsa: Analyzing Plan Daigrams of Database Query Optimizers; VLDB 2005. [SF’01] M.A. Shayman and E. Fernandez-Gaucherand: Risk-sensitive decision-theoretic diagnosis.
IEEE Trans. Automatic Control, 2001. [SHB04] M. A. Shah, J. M. Hellerstein, E. Brewer. Highly-Available, Fault-Tolerant, Parallel Dataflows ,
SIGMOD, June 2004. [SHCF03] M. A. Shah, J. M. Hellerstein, S. Chandrasekaran and M. J. Franklin. Flux: An Adaptive
Partitioning Operator for Continuous Query Systems, ICDE, March 2003. [SMWM’06] U. Srivastava, K. Munagala, J. Widom, R. Motwani: Query Optimization over Web Services;
VLDB 2006. [TD03] F. Tian, D. J. Dewitt. Tuple Routing Strategies for Distributed Eddies. VLDB 2003. [UFA’98] T. Urhan, M. J. Franklin, L. Amsaleg: Cost Based Query Scrambling for Initial Delays. SIGMOD
Conference 1998: 130-141 [UF 00] T. Urhan, M. J. Franklin: XJoin: A Reactively-Scheduled Pipelined Join Operator. IEEE Data
Eng. Bull. 23(2): 27-33 (2000) [VNB’03] S. Viglas, J. F. Naughton, J. Burger: Maximizing the Output Rate of Multi-Way Join Queries
over Streaming Information Sources. VLDB 2003: 285-296 [WA’91] A. N. Wilschut, P. M. G. Apers: Dataflow Query Execution in a Parallel Main-Memory
Environment. PDIS 1991: 68-77