understandung firebird optimizer, by dmitry yemanov (in english)
DESCRIPTION
Understanding Firebird optimizer, by Dmitry Yemanov (in English)TRANSCRIPT
- 1. Understanding Firebird optimizer Dmitry Yemanov [email_address] Firebird Project
2. Optimizer Keypoints
-
- Allow the data to be retrieved in the most efficient way possible
-
- Analyze the existing statistical information
- 3. Inject additional predicates
-
- Order operations by priority
-
- Try different join permutations
-
- Strategies
-
- Rule-based (heuristics)
- 4. Cost-based (statistics)
5. Mixed 6. Optimizer Algorithm
-
- Preparation
-
- Expand views
- 7. Separate predicates: base, parent, missing
8. Distribute equalities 9. Generate index mappings
-
- Main stage
-
- Calculate cost for different join orders
- 10. Choose the best index coverage for the given join order
11. Ensure early predicates evaluation 12. Decide about navigation or sorting 13. Rule-based Approach
-
- Heuristical assumptions
-
- Indexed retrieval is better than a full table scan
- 14. Loop join (indexed) is better than a merge join
-
- Index b-tree has three levels of depth
-
- Compound indices are better than a few simple ones
-
- Drawbacks
-
- Indices could be not really good for some operations
- 15. Not ready for ad hoc queries
16. Cost-based Approach
-
- Key ideas
-
- Every operation has an associated cost value
- 17. Cost is calculated using the statistical information
-
- Cost is aggregated from bottom up in the access path
-
- Drawbacks
-
- Complex implementation
- 18. Slower optimization process
19. Requires up-to-date statistics 20. Basic Terms
-
- Selectivity
-
- Represents a fraction of rows from a row set
- 21. Value range is 0.0 to 1.0
-
- Cardinality
-
- Represents number of rows in a row set
- 22. Base cardinality is a number of rows in a base table
-
- Cost
-
- Represents computational complexity of the retrieval
- 23. Is a function of the estimated cardinalities
24. Linearly depends on the number of logical reads (page fetches) 25. Cost Measurement
-
- Full table scan
-
- cost =
-
- Unique index scan + table scan
-
- cost = + 1
-
- Range index scan + table scan
-
- cost = + N + *
- 26. N represents a number of the leaf pages to be scanned and thus depends on the average key length
27. Cost Aggregation SELECT * FROM T1 JOIN T2 ON T1.PK = T2.FK WHERE T1.VAL < 100 ORDER BY T1.RANK PLAN SORT ( JOIN ( T1 NATURAL, T2 INDEX (FK) ) ) Table T1: base cardinality = 1000 Table T2: base cardinality = 5000 Index FK: selectivity = 0.001 Final Row Set cost = 5000 cardinality = 2500 Sort cost = 5000 cardinality = 2500 Full Scan cost = 1000 cardinality = 1000 Filter cost = 1000 cardinality = 500 Index Scan cost = 7 cardinality = 5 Loop Join cost = 4000 cardinality = 2500 28. Statistics
-
- What is it?
-
- Information that describes data amounts and distribution of values on different levels (table / index / column)
-
- Where is located?
-
- Stored in the database
- 29. Calculated on the fly
-
- How is updated?
-
- By user's request (SET STATITICS)
- 30. On index creation / activation
31. On database restore 32. Core Statistics
-
- Base cardinality (number of rows in a table)
-
- For small tables: number of used record slots on data pages
- 33. For large tables: number of data pages / average record length
34. Estimated at runtime using a page scan
-
- Index selectivity
-
- 1 / number of distinct keys
- 35. Maintained per segment: (A), (A, B), (A, B, C)
36. Uniform value distribution is assumed 37. Stored on the index root page, visible through RDB$INDICES and RDB$INDEX_SEGMENTS 38. Advanced Statistics
-
- Table level
-
- Average page fill factor
- 39. Average record length
-
- Index level
-
- B-tree depth
- 40. Average key length
41. Clustering factor
-
- Column level
-
- Number of NULLs
- 42. Value distribution histograms
43. Clustering Factor Bad Clustering Factor Good Clustering Factor Index Key 1 Index Key 2 Index Key 3 Index Key 5 Index Key 4 Data Page 12 Data Page 25 Data Page 28 Data Page 57 Data Page 44 Data Page 12 Data Page 13 Data Page 14 44. Decisions Based on Statistics
-
- Full table scan vs indexed retrieval
-
- Big selectivity value suggests a full table scan
-
- Order of streams in loop joins
-
- Calculate costs of different stream permutations and choose the cheapest one
-
- Loop join vs merge join
-
- Calculate costs of different stream permutations
-
- Index navigation vs external sorting
-
- Depends on the clustering factor
45. Decisions Based on Statistics (cont'd)
-
- What indices to use
-
- Compare index selectivities and index scan costs
- 46. Estimate how many indices would work best
47. Consider segment operations for compound indices 48. Special handling of different comparisons 49. Calculate selectivities for AND / OR operations 50. Thank you!