28-execution plan optimization techniques stroffek kovarik

Upload: sreekanthkata

Post on 10-Apr-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    1/49

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    2/49

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    3/49

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    4/49

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    5/49

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    6/49

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    7/49

    Join Graph

    Denition

    A node represents arelationAn edge indicates that acondition was used torestrict records of theconnected relationsArrows might be used toreect outer joins

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    8/49

    Example

    We have the following tables

    Question/Query?What is the salary and cell phone number of my bossesboss?

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    9/49

    Optimizer and Execution Plan

    Lets have a query like thisselect

    bboss.EmpId, sal.Salary,sal.Currency, con.Value

    fromEmployees as emp,

    Employees as boss,Employees as bboss,Salaries as sal,Contacts as con

    whereemp.LastName=MyNameand emp.ReportsTo = boss.EmpId

    and boss.ReportsTo = bboss.EmpIdand bboss.EmpId = sal.EmpIdand con.EmpId = bboss.EmpIdand con.Type = CELL

    How to process the query?

    There are 5 relations tobe joinedIf we do not allow product joins we have 10 optionsWith product joinsallowed we have 60options

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    10/49

  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    11/49

    Execution PlanLeft-Deep Tree Example

    Hash Join (cost=13.84..17.81 rows=1 width=151)Hash Cond: (outer.empid = inner.reportsto) Seq Scan on employees bboss (cost=0.00..3.64 rows=64 width=4) Hash (cost=13.84..13.84 rows=1 width=159)

    Hash Join (cost=10.71..13.84 rows=1 width=159)

    Hash Cond: (outer.empid = inner.reportsto) Seq Scan on contacts con (cost=0.00..2.80 rows=64 width=136)

    Filter: (type = CELL::bpchar) Hash (cost=10.71..10.71 rows=1 width=23)

    Hash Join (cost=7.78..10.71 rows=1 width=23)Hash Cond: (outer.empid = inner.reportsto) Seq Scan on sallaries sal (cost=0.00..2.28 rows=128 width=19) Hash (cost=7.77..7.77 rows=1 width=4)

    Hash Join (cost=3.80..7.77 rows=1 width=4)

    Hash Cond: (outer.empid = inner.reportsto) Seq Scan on employees boss (cost=0.00..3.64 rows=64 width=8) Hash (cost=3.80..3.80 rows=1 width=4)

    Seq Scan on employees emp (cost=0.00..3.80 rows=1 width=4)Filter: (lastname = MyName::bpchar)

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    12/49

    Execution PlanLeft-Deep Tree Example - pgAdmin

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    13/49

    Execution PlanBushy Tree Example

    Nested Loop (cost=1.00..13.17 rows=1 width=151)Join Filter: (inner.empid = outer.empid) Hash Join (cost=1.00..4.13 rows=1 width=155)

    Hash Cond: (outer.empid = inner.empid) Seq Scan on contacts con (cost=0.00..2.80 rows=64 width=136)

    Filter: (type = CELL::bpchar) Hash (cost=1.00..1.00 rows=1 width=19)

    Seq Scan on sallaries sal (cost=0.00..1.00 rows=1 width=19) Nested Loop (cost=0.00..9.02 rows=1 width=8)

    Join Filter: (outer.reportsto = inner.empid) Nested Loop (cost=0.00..6.01 rows=1 width=4)

    Join Filter: (outer.reportsto = inner.empid) Seq Scan on employees emp (cost=0.00..3.00 rows=1 width=4)

    Filter: (lastname = MyName::bpchar) Seq Scan on employees boss (cost=0.00..3.00 rows=1 width=8)

    Seq Scan on employees bboss (cost=0.00..3.00 rows=1 width=4)

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    14/49

    Execution PlanBushy Tree Example - pgAdmin

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    15/49

    Rule Based Optimization

    Deterministic - produce the same result at every callfor the same queryMostly produces a plan based on a join graphDoes not search through the large spaceThus it is relatively fastDoes not nd the best plan in most casesMany approaches how to nd the plan based on a

    use case and database schemaDeprecated Approach

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    16/49

    Cost Based Optimization

    Evaluate every execution plan by its costUsing relation statistics to calculate estimates andcost

    Exponential search space sizeHave to use a proper method for searching the spaceFinds the best plan for a small number of relationsNot possible to nd the best plan for too manyrelations

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    17/49

    Deterministic Algorithms

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    18/49

    Dynamic Programming

    Checks all possible orders of relationsApplicable only for small number of relationsPioneered by IBMs System R database projectIteratively creates and computes costs for everycombination of relations. Start with an access to anevery single relation. Later, compute the cost of joinsof every two relations, afterwards do the same for

    more relations.

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    19/49

    Dynamic ProgrammingExample

    Four relations to be joined A, B, C and D.

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    20/49

    Dijkstras algorithmDescription

    A graph algorithm which nds the shortest path from asource node to any destination node in a weightedoriented graph

    A graph has to have a non-negative weight functionon edgesThe weight of the path is the sum of weights on edgeson the path

    QuestionHow can we manage that the path will go through all thenodes/relations in a join graph?

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    21/49

    Dijkstras algorithmDescription II

    We will search for the path in a join graph directly but wewill search for a shortest path in a different graph.

    Nodes all possible subsets of relations to be joined

    Edges directed; connect every node to all othernodes where it is possible to go by performingexactly one joinBeing in a node on the path means that thecorresponding relations are joinedGoing through the edge means that we areperforming a join

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    22/49

    Dijkstras algorithmGraph Example

    We are joining four relations - A, B, C and D.

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    23/49

    Dijkstras algorithmDescription III

    For each node V we will maintainThe lowest cost c (V ) of already found path to achieve

    the nodeThe state of the node open/closed

    We will also maintain a set of nodes M which will holdopen nodes

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    24/49

    A* Search Algorithm

    Modication of Dijkstras algorithmIntroduces a heuristic function h (V ) which calculatesthe lower bound of the path cost from the node V to

    the destination nodeWe will choose the node to be processed withminimal c (V ) + h (V ) and mark it as closedIf we have better heuristic function we do not have

    to go through the whole space

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    25/49

    A* Search AlgorithmPossible Heuristic Functions

    The number of tuples in the current nodeReduces paths where too many tuples are prodcedApplicable only in materialization nodes

    The number of tuples that need to be fetched from theremaining relationsA realistic lower boundMight be extended to get a lower bound of the cost ofremaining joins

    Any other approximation?

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    26/49

    Nearest Neighbor

    1. Lets start in the rst node2. Chose an edge with the lowest cost going to an

    unreached node3. Repeat the previous step until there is any unvisited

    node4. If the path found has lower cost than the best path

    already known remember the path5. Choose the next node as a starting one and continue

    with 2nd step

    It is possible to check all the paths of the constant lengthinstead of only edges.

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    27/49

    Nearest Neighbor for Bushy Trees

    1. Build a set of nodes with an access to every relation2. Choose the cheapest join from all the combinations of

    possible joins of any two relations from the set

    3. Add the result node to the set of relations to be joined4. Remove the originating relations from the set5. Go back to 2nd step if we still have more than one

    relation

    It is also possible to look forward for more joins at once.

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    28/49

    Non-Deterministic Algorithms

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    29/49

    Random Walk

    Randomly goes through the search space

    1. Set the MAX VALUE as a cost for the best knownpath

    2. Choose a random path3. If the cost of the chosen path is better than the

    currently best known remember the path as the best4. Go back to step 2

    Very naive, not very usable in practise but can be used forcomparison with other algorithms

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    30/49

    Simulated AnnealingIt is an analogy with physics a cooling process ofa crystal.

    A crystal has lots of energy and small random changesare happening in its structure. These small changes are

    more stable if the crystal emits the energy afterwards.An execution plan has a high cost at the begining. A smallrandom change is generated in every iteration of theprocess. Depending upon the cost change the probability

    whether the change is preserved or discarded iscomputed. The change is kept with the computedprobability.

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    31/49

    Hill-Climbing

    1. Start with a random path2. Check all the neighbor paths and nd the one with the

    lowest cost3. If the cost of the best neighboring path is lower than

    the actual one change the actual one to that neighbor4. Go back to 2nd step if we found a better path in the

    previous step

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    32/49

    DemoTravelling Salesman Problem

    A travelling salesman has N cities which he needs to visit.He would like to nd the shortest path going through allthe cities.

    Similar to the execution plan optimization problemThe evaluation function has different properties

    The evaluation of an edge between two cities does notdepend on the cities already visitedDifferent behavior of heuristic algorithms

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    33/49

    Implemented Algorithms

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    34/49

    Implemented Algorithms

    Number of algorithms for optimization of large joinqueries was described and studied, e.g. in:

    M. Steinbrunn, G. Moerkotte, A. Kemper. Heuristic andRandomized Optimization for the Join Ordering Problem. InVLDB Journal, 6(3), 1997Ioannidis, Y. E. and Kang, Y. 1990. Randomized algorithmsfor optimizing large join queries. In Proceedings of the 1990ACM SIGMOD international Conference on Management ofData

    Performance evaluation and comparison of thesealgorithms is availableImplementation is often experimental and only forpurposes of the given research

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    35/49

    Studied algorithms

    We will look at the following algorithms in more detail

    Genetic Algorithm

    Simulated AnnealingIterative ImprovementTwo Phase OptimizationToured Simulated Annealing

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    36/49

    Genetic Algorithm

    Well known to the PostgreSQL community - GEQO -GEnetic Query OptimizerFundamentally different approach designed tosimulate biological evolutionWorks always on a set of solutions called populationSolutions are represented as character strings byappropriate encoding.

    In our case query processing trees

    Quality, or tness of the solution is measured by anobjective function

    Evaluation cost of processing tree, has to be minimized

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    37/49

    Genetic AlgorithmSteps

    Zero population of random character strings isgeneratedEach next generation is obtained by a combination of

    Selection - Selected ttest members of the populationsurvive to the next generationCrossover - Selected ttest members of the population arecombined producing an offspringMutation - Certain fraction of the population is randomlyaltered (mutated)

    This is repeated untilthe population has reached desired qualitypredetermined number of populations has been generatedno improvement has been detected for several generations

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    38/49

    Simulated Annealing

    Doesnt have to improve on every moveDoesnt get trapped in local minimum so easilyExact behavior determined by parameters:

    starting temperature (e.g. function of node cost)temperature reduction (e.g. 90% of the old value)stopping condition (e.g. no improvement in certain numberof temperature reductions)

    Performance of the algorithm is dependent on thevalues of these parameters

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    39/49

    Simulated AnnealingDetermining neighbor states

    Join method choiceLeft-deep processing trees solution spacerepresented by an ordered list of relations to be joined

    Swap exchange position of two relations in the list3Cycle Cyclic rotation of three relations in the list

    Complete solution space including bushy treesJoin commutativity A B B AJoin associativity (A B) C A (B C)

    Left join exchange (A B) C

    (A C) BRight join exchange A (B C) B (A C)

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    40/49

    Iterative Improvement

    Strategy similar to Hill climbing algorithmOvercomes the problem of reaching local minimumSteps of the algorithm

    Select random starting pointChoose random neighbor, if the cost is lower than thecurrent node, carry out the moveRepeat step 2 until local minimum is reachedRepeat all steps until stopping condition is metReturn the local minimum with the lowest cost

    Stopping condition can be processing xed number ofstarting points or reaching the time limit

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    41/49

    Two Phase Optimization

    Combination of SA and II, uses advantages of bothRapid local minimum discoverySearch of the neighborhood even with uphill moves

    Steps of the algorithmPredened number of starting points selected randomlyLocal minima sought using Iterative improvementLowest of these minima is used as a starting point ofsimulated annealing

    Initial temperature of the SA is lower, since only smallproximity of the minimum needs to be covered

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    42/49

    Toured Simulated Annealing

    Simulated annealing algorithm is run several timeswith different starting pointsStarting points are generated using somedeterministic algorithm with reasonable heuristicThe initial temperature is much lower than with thegeneric simulated annealing algorithmBenets over plain simulated annealing

    Covering different parts of the search spaceReduced running time

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    43/49

    Experimental Performance

    Comparision

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    44/49

    Experimental Performance ComparisonWhat to consider

    Evaluation plan quality vs. running timeDifferent join graph types

    ChainStarCycleGrid

    Left-deep tree optimization vs. Bushy treeoptimization

    Algorithms involvedGenetic algorithm (Bushy genetic algorithm)Simulated annealing, Iterative improvement, Two phaseoptimization

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    45/49

    Experimental Performance ComparisonResults

    Bushy tree optimization yields better resultsespecially for chain and cycle join graphs.

    If solution quality is preferred, 2PO achieves betterresults (although slightly sacricing the running time).Iterative improvement and Genetic algorithm aresuitable in case the running time is more important.

    Pure Simulated annealing requires higher runningtime without providing better solutions.

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    46/49

    Demo

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    47/49

    Conclusions

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    48/49

    Conclusions

    We have presented an overview of algorithms usablefor query optimizationChoosing the best algorithm is difcultPostgreSQL has conguration parameters for GEQO

    ThresholdPopulation sizeNumber of populationsEffort based approach

    More congurable optimizer with different algorithms

    OLAP - would prefer better plan optimizationOLTP - would prefer faster plan optimization

    http://find/http://goback/
  • 8/8/2019 28-Execution Plan Optimization Techniques Stroffek Kovarik

    49/49

    11

    J ulius [email protected]@stroffek.net

    Tom a s Kova r [email protected]

    http://find/http://goback/