understandung firebird optimizer, by dmitry yemanov (in english)

Download Understandung Firebird optimizer, by Dmitry Yemanov (in English)

If you can't read please download the document

Upload: nataly-polyanskaya

Post on 16-Jun-2015

2.137 views

Category:

Technology


0 download

DESCRIPTION

Understanding Firebird optimizer, by Dmitry Yemanov (in English)

TRANSCRIPT

  • 1. Understanding Firebird optimizer Dmitry Yemanov [email_address] Firebird Project

2. Optimizer Keypoints

    • Allow the data to be retrieved in the most efficient way possible
    • Analyze the existing statistical information
  • 3. Inject additional predicates
    • Order operations by priority
    • Try different join permutations
    • Strategies
    • Rule-based (heuristics)
  • 4. Cost-based (statistics)

5. Mixed 6. Optimizer Algorithm

    • Preparation
    • Expand views
  • 7. Separate predicates: base, parent, missing

8. Distribute equalities 9. Generate index mappings

    • Main stage
    • Calculate cost for different join orders
  • 10. Choose the best index coverage for the given join order

11. Ensure early predicates evaluation 12. Decide about navigation or sorting 13. Rule-based Approach

    • Heuristical assumptions
    • Indexed retrieval is better than a full table scan
  • 14. Loop join (indexed) is better than a merge join
    • Index b-tree has three levels of depth
    • Compound indices are better than a few simple ones
    • Drawbacks
    • Indices could be not really good for some operations
  • 15. Not ready for ad hoc queries

16. Cost-based Approach

    • Key ideas
    • Every operation has an associated cost value
  • 17. Cost is calculated using the statistical information
    • Cost is aggregated from bottom up in the access path
    • Drawbacks
    • Complex implementation
  • 18. Slower optimization process

19. Requires up-to-date statistics 20. Basic Terms

    • Selectivity
    • Represents a fraction of rows from a row set
  • 21. Value range is 0.0 to 1.0
    • Cardinality
    • Represents number of rows in a row set
  • 22. Base cardinality is a number of rows in a base table
    • Cost
    • Represents computational complexity of the retrieval
  • 23. Is a function of the estimated cardinalities

24. Linearly depends on the number of logical reads (page fetches) 25. Cost Measurement

    • Full table scan
    • cost =
    • Unique index scan + table scan
    • cost = + 1
    • Range index scan + table scan
    • cost = + N + *
  • 26. N represents a number of the leaf pages to be scanned and thus depends on the average key length

27. Cost Aggregation SELECT * FROM T1 JOIN T2 ON T1.PK = T2.FK WHERE T1.VAL < 100 ORDER BY T1.RANK PLAN SORT ( JOIN ( T1 NATURAL, T2 INDEX (FK) ) ) Table T1: base cardinality = 1000 Table T2: base cardinality = 5000 Index FK: selectivity = 0.001 Final Row Set cost = 5000 cardinality = 2500 Sort cost = 5000 cardinality = 2500 Full Scan cost = 1000 cardinality = 1000 Filter cost = 1000 cardinality = 500 Index Scan cost = 7 cardinality = 5 Loop Join cost = 4000 cardinality = 2500 28. Statistics

    • What is it?
    • Information that describes data amounts and distribution of values on different levels (table / index / column)
    • Where is located?
    • Stored in the database
  • 29. Calculated on the fly
    • How is updated?
    • By user's request (SET STATITICS)
  • 30. On index creation / activation

31. On database restore 32. Core Statistics

    • Base cardinality (number of rows in a table)
    • For small tables: number of used record slots on data pages
  • 33. For large tables: number of data pages / average record length

34. Estimated at runtime using a page scan

    • Index selectivity
    • 1 / number of distinct keys
  • 35. Maintained per segment: (A), (A, B), (A, B, C)

36. Uniform value distribution is assumed 37. Stored on the index root page, visible through RDB$INDICES and RDB$INDEX_SEGMENTS 38. Advanced Statistics

    • Table level
    • Average page fill factor
  • 39. Average record length
    • Index level
    • B-tree depth
  • 40. Average key length

41. Clustering factor

    • Column level
    • Number of NULLs
  • 42. Value distribution histograms

43. Clustering Factor Bad Clustering Factor Good Clustering Factor Index Key 1 Index Key 2 Index Key 3 Index Key 5 Index Key 4 Data Page 12 Data Page 25 Data Page 28 Data Page 57 Data Page 44 Data Page 12 Data Page 13 Data Page 14 44. Decisions Based on Statistics

    • Full table scan vs indexed retrieval
    • Big selectivity value suggests a full table scan
    • Order of streams in loop joins
    • Calculate costs of different stream permutations and choose the cheapest one
    • Loop join vs merge join
    • Calculate costs of different stream permutations
    • Index navigation vs external sorting
    • Depends on the clustering factor

45. Decisions Based on Statistics (cont'd)

    • What indices to use
    • Compare index selectivities and index scan costs
  • 46. Estimate how many indices would work best

47. Consider segment operations for compound indices 48. Special handling of different comparisons 49. Calculate selectivities for AND / OR operations 50. Thank you!