To Tune or not to Tune?A Lightweight Physical Design Alerter
Nico Bruno, Surajit ChaudhuriDMX Group, Microsoft Research
VLDB’06
2
A DBA’s Dilemma Physical design tuning is important
Workloads and data change over time Installations often become suboptimal
Current tools: good but expensive
DBAs: Avoid suboptimal installations Periodically run expensive tools If no improvement, wasted resources
Tuner
DBMS
SELECT …INSERT …SELECT …
Recommendation:{Index1, Index2,
View1, View2}
3
A Lightweight Alerter
Low-overhead diagnostics Reliable lower-bound improvement
No false positives “Proof” with valid configuration
Upper-bound improvement Reduce false negatives
4
Outline
Instrumenting the optimizer Access path selection Index requests
Lower bounds Local transformations Alerting algorithm
Upper bounds Experimental results
5
Access Path Selection
Access Path Generation Module
Available indexes
Logical sub-plan Physical plans
Project(Filter(…))πc,d (σ a=10 (T))
Tag logical subplan with index request
Instrumentation Original optimizer
{(a, 0.85)}, Ø, {c,d}
Single entry-point for access-path selection (System-R, Cascades)
Intercept requests during optimization, save logical properties for later
6
Access Path Requests
ρ1 =( {(T1.a, 2500)}, Ø, {T1.a,T1.x,T1.w}, 1 )
ρ2 =({(T2.y, 0.2)}, Ø, {T2.y}, 2500)
ρ3 =({(T3.z, 1)}, Ø, {T3.z,T3.b}, 500)
ρ4 =({(T3.z, 0.2)}, Ø, {T3.z,T3.b}, 2500)
ρ5 =( {(T3.b, 5000)}, Ø, {T3.b,T3.z}, 1 )
SELECT T.bFROM T1, T2, T3WHERE T1.x=T2.y AND T1.w=T3.z AND T1.a=5 AND T3.b=8
T1.a=5 T3.b=8
T1 T2 T3
T1.x=T2.y
T1.w=T3.z
T1.w=T3.z
T1.x=T2.y
7
T1.a=5
T1 T2
T1.x=T2.y
T1.w=T3.z
T3.b=8
T3
T1.w=T3.z
T1.x=T2.y
ρ1 =( {(T1.a, 2500)}, Ø, {T1.a,T1.x,T1.w}, 1 )
ρ2 =({(T2.y, 0.2)}, Ø, {T2.y}, 2500)
ρ3 =({(T3.z, 1)}, Ø, {T3.z,T3.b}, 500)
ρ4 =({(T3.z, 0.2)}, Ø, {T3.z,T3.b}, 2500)
ρ5 =( {(T3.b, 5000)}, Ø, {T3.b,T3.z}, 1 )
T1.x=T2.y
T1.w=T3.z
ρ1, 0.08 secs
ρ2, 0.23 secs(left=0.08 secs)
ρ3, 0.45 secs(left=0.23 secs)
ρ5, 0.05 secs
Hash Join
Hash Join
Filter(T1.a=5) Filter(T3.b=8)
Scan(T1) Scan(T3)Scan(T2)
Monitoring Access Path Requests
Ø
OR
ρ1 Ø
OR
ρ2
AND
Ø
OR
ρ5
OR
ρ3
ANDAND
ρ1
ρ5ρ3
ρ2OR
“AND/OR trees” Encode
relationships between requests
Aggregated across queries
2-level normalized AND/OR tree.
8
T1.x=T2.y
T1.w=T3.z
ρ1, 0.08 secs
ρ2, 0.23 secs(left=0.08 secs)
ρ3, 0.45 secs(left=0.23 secs)
ρ5, 0.05 secs
Hash Join
Hash Join
Filter(T3.b=8)
Scan(T3)Scan(T2)
Filter(T1.a=5)
Scan(T1)
Local Transformations
Seek(I1,a=5)
I1(a,x,w)
Requests encode properties of any physical plan rooted at the corresponding operator
Allow cost inferences for varying physical designs without calling the optimizer
Result is upper bound of query cost after true optimization
If cost is 0.02, query is 0.08-0.02 = 0.06
faster
9
Impact of Hypothetical Indexes Single index, single request
Exploits logical information about request Safe inferences on subset of valid plans Only need costs, do not “build” plans
Multiple indexes, multiple requests Analyze all available indexes for each request Exploit AND/OR tree for multiple requests
Measures lower bound in difference between current and original configurations
10
Alerting Algorithm
For each request in T, obtain index that results in best strategy
Repeat while space constraint is not satisfied and improvement
still large enough.
AND/OR tree gathered during original optimization
No additional optimizer calls!
If size between storage bounds and improvement is big enough, save configuration for alert.
Transformations:- Index Merge.- Index Deletion.
11
Upper Bounds Reduce false negatives
Alert if: improvement is at least 25% OR maximum improvement
is 75%
Fast Upper Bounds Track all requests (not only AND/OR tree) Group requests by table Calculate “required work”
Tighter Upper Bounds Add new optimization phase that only
considers viable plans More expensive, but tightest upper bound
12
Handling Updates
Update queries are handled as:(select core) + (update shell) Optimizer instrumentation:
also gathers update information Lower bounds: small changes to main
algorithm (skyline of alternatives, non-monotonic improvement)
Upper bounds: Add necessary work for update shells
13
Experimental Evaluation
Real and synthetic databases Metrics: Execution time and Improvement
Experiments: Monitoring Overhead (server optimization) Diagnostics Overhead (alerting client) Quality of bounds/recommendation
14
Performance
0%
10%
20%
30%
40%
Op
tim
izat
ion
Ove
rhea
d Tighter Mode
Fast Mode
Server Overhead for Upper Bounds (Lower Bound Overhead << 1%)
Client Overhead for lower + upper bounds
TPC-H Database and workloads
15
Varying Workloads
TPC-H workloads W1 (first 11 queries) W2 (last 11 queries) W3 (mix).
Initial design tuned for W1
0%
20%
40%
60%
80%
1000 1500 2000 2500 3000 3500 4000 4500 5000
Configuration Size (MB)
Exp
ecte
d Im
pro
vem
ent W1 follows W0
W2 different from W0W3 is W1 union W2
16
Varying Initial Physical Design
TPC-H database and workloads Ci is recommendation of alerter after
executing the workload under Ci-1
0%
20%
40%
60%
80%
100%
1000 2000 3000 4000 5000 6000
Confi guration Size (MB)
C0 C1 (1.5GB)C2 (2GB) C3 (2.5GB)C4 (3GB) C5 (3.5GB)
17
Conclusions
Alerter fills gap in automatic physical design tools Low server/client overhead, can
monitor/diagnose very efficiently Lower bounds are supported by valid
(applicable) configurations Upper bounds provide additional
flexibility for defining policies
18
0%
25%
50%
75%
100%Q
1
Q3
Q5
Q7
Q9
Q11
Q13
Q15
Q17
Q19
Q21
Per
cent
age
Impr
ovem
ent
Lower Bound Tight Upper Bound Fast Upper Bound
Lower and Upper bounds for improvement
Single-Query Workloads
TPC-H Database and workloads
19
Complex Workloads
0%
25%
50%
75%
100%
1000 3000 5000 7000
Configuration Size (MB)
Exp
ecte
d Im
pro
vem
ent
Lower Bound
Tight Upper Bound
Tuning Tool
0%
25%
50%
75%
100%
720 770 820 870 920
Configuration Size (MB)
Exp
ecte
d Im
pro
vem
ent
Lower Bound
Tighter Upper Bound
Tuning Tool
TPCH
MIRMS