db2 t11 query evaluation overview - new mexico state

22
Query Evaluation References: [RG-3ed] Chapter 12, 13, 14, 15 [SKS-6ed] Chapter 12, 13

Upload: others

Post on 21-Oct-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Query Evaluation!

References: !q  [RG-3ed] Chapter 12, 13, 14, 15!q  [SKS-6ed] Chapter 12, 13!

2!Database Management Systems II, Huiping Cao!

Query Evaluation!

q Overview!q Optimization!q Measures of Query Cost!

q Sorting !q  Join Operation !q Other Operations!q Query Optimization!

3!Database Management Systems II, Huiping Cao!

Basic Steps in Query Processing!

1. !Parsing and translation!2. !Optimization!3. !Evaluation!

queryoutput

query parser andtranslator

evaluation engine

relational-algebraexpression

execution plan

optimizer

data statisticsabout data

4!Database Management Systems II, Huiping Cao!

Basic Steps in Query Processing!

q  Parsing and translation: translate the query into its internal form. This is then translated into relational algebra.!q  Parser checks syntax, verifies relations!q  Plan: Tree of Relational Algebra operators, with choice

of algorithms for each operator (Operator evaluation!)!q  Optimization!q  Evaluation!

queryoutput

query parser andtranslator

evaluation engine

relational-algebraexpression

execution plan

optimizer

data statisticsabout data

5!Database Management Systems II, Huiping Cao!

Parser stage in PostgreSQL!

q  Two stages!q  Parser!

! Built using the Unix tools yacc and tex!! Check the query string, valid syntax?!!  If syntax is correct, build a parser tree!

q  Transformation process!! Parser tree + semantic interpretation => query tree!

q  Ref: http://www.postgresql.org/docs/8.3/static/parser-stage.html!

6!Database Management Systems II, Huiping Cao!

Basic Steps in Query Processing!

q  Parsing and translation: translate the query into its internal form. This is then translated into relational algebra.!q  Parser checks syntax, verifies relations!q  Plan: Tree of Relational Algebra operators, with choice

of algorithms for each operator (Operator evaluation!)!q  Optimization!q  Evaluation: The query-execution engine takes a query-

evaluation plan, executes that plan, and returns the answers to the query.!

queryoutput

query parser andtranslator

evaluation engine

relational-algebraexpression

execution plan

optimizer

data statisticsabout data

7!Database Management Systems II, Huiping Cao!

Query Evaluation!

q Overview!q Optimization!

q Why is optimization needed?!q Measures of Query Cost!

q Sorting !q  Join Operation !q Other Operations!q Query Optimization!

8!Database Management Systems II, Huiping Cao!

q  Why do we need optimization? !q  A relational algebra expression may have many equivalent

expressions!

! E.g., σsalary<75000(∏salary(instructor)) is equivalent to ∏salary(σsalary<75000(instructor))!

q  Each relational algebra operation can be evaluated using one of several different algorithms!! Correspondingly, a relational-algebra expression can be

evaluated in many ways. !

Optimization!

9!Database Management Systems II, Huiping Cao!

Optimization!

q  Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.!q  E.g., can use an index on salary to find instructors with

salary < 75000,!q  or can perform complete relation scan and discard

instructors with salary ≥ 75000!

10!Database Management Systems II, Huiping Cao!

q  Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost. !q  Cost is estimated using statistical information from the

database catalog!! e.g. number of tuples in each relation, size of tuples, etc.!! Introduce later!

Optimization!

11!Database Management Systems II, Huiping Cao!

Basic Steps in Query Processing (Cont.)!

q  Two main issues in query optimization!q For a given query, what plans are considered?!

q Algorithm to search plan space for cheapest (estimated) plan.!

q How is the cost of a plan estimated?!q  Ideally: Want to find best plan. !q Practically: Avoid worst plans!!

12!Database Management Systems II, Huiping Cao!

Query Evaluation!

q Overview!q Optimization!q Measures of Query Cost!

q System catalog!

q Sorting !q  Join Operation !q Other Operations!q Query Optimization!

13!Database Management Systems II, Huiping Cao!

Measures of Query Cost!

q  Cost is generally measured as total elapsed time for answering query!q  Many factors contribute to time cost!

! disk accesses, CPU, or even network communication!q  Typically disk access is the predominant cost, and is also

relatively easy to estimate. !q  Measured by taking into account (details)!

q  Number of seeks * average-seek-cost!q  Number of blocks read * average-block-read-cost!q  Number of blocks written * average-block-write-cost!

! Cost to write a block is greater than cost to read a block !–  data is read back after being written to ensure that the

write was successful!

14!Database Management Systems II, Huiping Cao!

Measures of Query Cost (Cont.)!

q  Ignore CPU costs for simplicity!q  Real systems do take CPU cost into account!

q  Cost to writing output to disk in our cost formulae!q  Consider or not?!

q  Do not differentiate block read and block write!

15!Database Management Systems II, Huiping Cao!

Measures of Query Cost (Cont.)!

q  Several algorithms can reduce disk IO by using extra buffer space !q  Amount of real memory available to buffer depends on other

concurrent queries and OS processes, known only during execution!

q  We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is available!

q  Required data may be buffer resident already, avoiding disk I/O!q  But hard to take into account for cost estimation!

System Catalog!

17!Database Management Systems II, Huiping Cao!

Statistics and Catalogs!

q  Need information about the relations and indexes involved. !q  Catalogs typically contain at least:!

q  # tuples (NTuples) and # pages (NPages) for each relation.!q  # distinct key values (NKeys) and NPages for each index.!q  Index height, low/high key values (Low/High) for each tree

index.!q  Catalogs updated periodically.!

q  Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency is ok.!

q  More detailed information (e.g., histograms of the values in some field) are sometimes stored.!

18!Database Management Systems II, Huiping Cao!

Some Common Techniques!

q  Algorithms for evaluating relational operators using some simple ideas extensively:!q  Indexing: Can use WHERE conditions to retrieve small set

of tuples (selections, joins)!q  Iteration: Sometimes, faster to scan all tuples even if there

is an index. (And sometimes, we can scan the data entries in an index instead of the table itself.)!

q  Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.!

q  * Watch for these techniques as we discuss query evaluation!!

19!Database Management Systems II, Huiping Cao!

Access Paths!

q  An access path is a method of retrieving tuples!q  File scan, or index that matches a selection (in the query) !

q  A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.!q  E.g., Tree index on <a, b, c> matches the selection a=5

AND b=3, and a=5 AND b>6, but not b=3.!

20!Database Management Systems II, Huiping Cao!

Access Paths!

q  An access path is a method of retrieving tuples!q  File scan, or index that matches a selection (in the query) !

q  A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.!q  E.g., Tree index on <a, b, c> matches the selection a=5

AND b=3, and a=5 AND b>6, but not b=3.!q  A hash index matches (a conjunction of) terms that has a term

attribute = value for every attribute in the search key of the index.!q  E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND

c=5; !q  but it does not match b=3, or a=5 AND b=3, or a>5 AND

b=3 AND c=5.!

21!Database Management Systems II, Huiping Cao!

Selectivity of Access Path!

q  Selectivity of an access path is the number of pages retrieved (index pages and data pages)!

q  Most selective access path is the one that retrieves the fewest pages. !

22!Database Management Systems II, Huiping Cao!

Schema for Examples!

q  Sailors (sid: integer, sname: string, rating: integer, age: real)!q  Reserves (sid: integer, bid: integer, day: dates, rname: string)!

!q  Reserves:!

q  Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.!

q  Sailors:!q  Each tuple is 50 bytes long, 80 tuples per page, 500 pages. !