understandung firebird optimizer, by dmitry yemanov (in english)

1. Understanding Firebird optimizer Dmitry Yemanov [email_address] Firebird Project

2. Optimizer Keypoints

Allow the data to be retrieved in the most efficient way possible

Analyze the existing statistical information

3. Inject additional predicates

Order operations by priority

Try different join permutations

Strategies

Rule-based (heuristics)

4. Cost-based (statistics)

5. Mixed 6. Optimizer Algorithm

Preparation

Expand views

7. Separate predicates: base, parent, missing

8. Distribute equalities 9. Generate index mappings

Main stage

Calculate cost for different join orders

10. Choose the best index coverage for the given join order

11. Ensure early predicates evaluation 12. Decide about navigation or sorting 13. Rule-based Approach

Heuristical assumptions

Indexed retrieval is better than a full table scan

14. Loop join (indexed) is better than a merge join

Index b-tree has three levels of depth

Compound indices are better than a few simple ones

Drawbacks

Indices could be not really good for some operations

15. Not ready for ad hoc queries

16. Cost-based Approach

Key ideas

Every operation has an associated cost value

17. Cost is calculated using the statistical information

Cost is aggregated from bottom up in the access path

Drawbacks

Complex implementation

18. Slower optimization process

19. Requires up-to-date statistics 20. Basic Terms

Selectivity

Represents a fraction of rows from a row set

21. Value range is 0.0 to 1.0

Cardinality

Represents number of rows in a row set

22. Base cardinality is a number of rows in a base table

Represents computational complexity of the retrieval

23. Is a function of the estimated cardinalities

24. Linearly depends on the number of logical reads (page fetches) 25. Cost Measurement

Full table scan

cost =

Unique index scan + table scan

cost = + 1

Range index scan + table scan

cost = + N + *

26. N represents a number of the leaf pages to be scanned and thus depends on the average key length

27. Cost Aggregation SELECT * FROM T1 JOIN T2 ON T1.PK = T2.FK WHERE T1.VAL < 100 ORDER BY T1.RANK PLAN SORT ( JOIN ( T1 NATURAL, T2 INDEX (FK) ) ) Table T1: base cardinality = 1000 Table T2: base cardinality = 5000 Index FK: selectivity = 0.001 Final Row Set cost = 5000 cardinality = 2500 Sort cost = 5000 cardinality = 2500 Full Scan cost = 1000 cardinality = 1000 Filter cost = 1000 cardinality = 500 Index Scan cost = 7 cardinality = 5 Loop Join cost = 4000 cardinality = 2500 28. Statistics

What is it?

Information that describes data amounts and distribution of values on different levels (table / index / column)

Where is located?

Stored in the database

29. Calculated on the fly

How is updated?

By user's request (SET STATITICS)

30. On index creation / activation

31. On database restore 32. Core Statistics

Base cardinality (number of rows in a table)

For small tables: number of used record slots on data pages

33. For large tables: number of data pages / average record length

34. Estimated at runtime using a page scan

Index selectivity

1 / number of distinct keys

35. Maintained per segment: (A), (A, B), (A, B, C)

36. Uniform value distribution is assumed 37. Stored on the index root page, visible through RDB$INDICES and RDB$INDEX_SEGMENTS 38. Advanced Statistics

Table level

Average page fill factor

39. Average record length

Index level

B-tree depth

40. Average key length

41. Clustering factor

Column level

Number of NULLs

42. Value distribution histograms

43. Clustering Factor Bad Clustering Factor Good Clustering Factor Index Key 1 Index Key 2 Index Key 3 Index Key 5 Index Key 4 Data Page 12 Data Page 25 Data Page 28 Data Page 57 Data Page 44 Data Page 12 Data Page 13 Data Page 14 44. Decisions Based on Statistics

Full table scan vs indexed retrieval

Big selectivity value suggests a full table scan

Order of streams in loop joins

Calculate costs of different stream permutations and choose the cheapest one

Loop join vs merge join

Calculate costs of different stream permutations

Index navigation vs external sorting

Depends on the clustering factor

45. Decisions Based on Statistics (cont'd)

What indices to use

Compare index selectivities and index scan costs

46. Estimate how many indices would work best

47. Consider segment operations for compound indices 48. Special handling of different comparisons 49. Calculate selectivities for AND / OR operations 50. Thank you!

understandung firebird optimizer, by dmitry yemanov (in english)

Technology