robert haas query planning gone wrong presentation @ postgres open

Click to edit Master subtitle style

1

Query Planning Gone Wrong

Robert Haas

Chief Architect, Database Server

Why This Talk?

● 2010: The PostgreSQL Query Planner (Robert Haas)

● How does the query planner actually work from a user perspective? What does it really do?

● Very common audience question: What do I do when the query planner fails? How do I fix my query?

● 2011: Hacking the Query Planner (Tom Lane)

● How does the query planner actually work from a developer perspective? What does it *really* do?

● Plea for help to improve the query planner.

● But... what should we be improving?

Methodology: Which Problems Matter?

● Read hundreds of email threads on pgsql-performance over a period of almost two years.

● Disregarded all those that were not about query performance problems.

● Decided what I thought the root cause (or, occasionally, causes) of each complaint was.

● Skipped a very small number where I couldn't form an opinion.

● Counted the number of times each problem was reported.

Methodology: Possible Critiques

● The problems reported on pgsql-performance aren't necessarily representative of all the problems PostgreSQL users encounter (reporting bias).

● In particular, confusing problems might be more likely to be reported.

● I might not have correctly identified the cause of each problem (researcher bias).

● Others?

Statistically Speaking, Why Is My Query Slow? (168)

● Settings (23). Includes anything you can fix with postgresql.conf changes, DDL, or operating systems settings changes.

● Just Plain Slow (23). Includes anything that amounts to an unreasonable expectation on the part of the user. These are often questions of the form “why is query A slower than query B?” when A is actually doing something much more expensive than B.

● We're Bad At That (22). Includes anything that could be faster in some other database product, but isn't fast in PostgreSQL for some reason (not implemented yet, or architectural artifact).

● Planner Error (83). Bad decisions about the cost of one plan vs. another plan due to limitations of the optimizer.

● Bugs (14). Bugs in the query planner, or in one case, the Linux kernel.

● User Error (3). User got confused and did something illogical.

Settings (23)

● Planner Cost Constants (8). Adjustments needed to seq_page_cost, random_page_cost, and perhaps cpu_tuple_cost to accurately model real costs.

● Missing Index (4)

● Cost for @@ Operator Is Too Low (2)

● work_mem Too Low (2)

● Statistics Target Too Low (2)

● Statistics Target Too High (1)

● n_distinct Estimates Aren't Accurate On Large Tables (1)

● Not Analyzing Tables Often Enough (1)

● TOAST Decompression is Slow (1)

● vm.zone_reclaim_mode = 1 Causes Extra Disk I/O (1)

Just Plain Slow (23)

● It Takes a While to Process a Lot of Data (6)

● Disks Are Slower Than Memory (6)

● Clauses Involving Multiple Tables Can't Be Pushed Down (2)

● Random I/O is Slower Than Sequential I/O (1)

● Linearly Scanning an Array is O(n) (1)

● One Regular Expression is Faster Than Two (1)

● Can't Figure Out Which Patterns Match a String Without Trying Them All (1)

● xmlagg Is Much Slower Than string_agg (1)

● Scanning More Tables is Slower Than Scanning Fewer Tables (1)

● Replanning Isn't Free (1)

● Repeated Concatenation Using xmlconcat Is Slow (1)

● UNION is Slower than UNION ALL (1)

We're Bad At That (22)

● Plan Types We Can't Generate (11)

● Parameterized Paths (7). Two of these are post-9.2 complaints, involving cases where 9.2 can't parameterize as needed.

● Merge Append (3). Fixed in 9.1.

● Batched Sort of Data Already Ordered By Leading Columns (1).

● Executor Limitations (3)

● Indexing Unordered Data Causes Random I/O (1)

● <> is Not Indexable (1)

● DISTINCT + HashAggregate Reads All Input Before Emitting Any Results (1). This matters if there is a LIMIT.

● Architecture (8)

● No Parallel Query (2), Table Bloat (1), Backend Startup Cost (1), Redundant Updates Are Expensive (1), AFTER Trigger Queue Size (1), On-Disk Size of numeric (1), Autovacuum Not Smart About Inherited Tables (1)

Planner Errors (83)

● Any guesses?

Planner Errors (83)

● Conceptual Errors (28). The planner isn't able to recognize that two different queries are equivalent, so it doesn't even consider the best plan.

● Estimation Errors (55). The planner considers the optimal plan, but rejects it as too expensive.

● Row Count Estimation Errors (48). The planner mis-estimates the number of rows that will be returned by some scan, join, or aggregate.

● Cost Estimation Errors (7). The planner estimates the row count correctly but incorrectly estimates the relative cost.

Grand Prize Winners

● Selectivity of filter conditions involving correlated columns is estimated inaccurately (13)

● Suppose we want all the rows from a table where a = 1 and b = 1 and c = 1 and d = 1 and e = 1. The planner must estimate the number of rows that will match, but only has statistics on each column individually.

● Planner incorrectly thinks that “SELECT * FROM foo WHERE a = 1 ORDER BY b LIMIT n” will fill the limit after reading a small percentage of the index (11)

● It can scan an index on b and filter for rows where a = 1.

● Or it can scan an index on a, find all rows where a = 1, and perform a top-N sort.

● It often prefers the former when the latter would be faster.

● Can often be worked around with a composite or functional index.

Planner Error: Row Count Estimation – Others (24)

● Using WITH Results in a Bad Plan (5). Some of these are query fattening issues, while others result from failure to dig out variable statistics.

● Generic Plans Can Have Wildly Wrong Estimates (4). Improved.

● Selectivity Estimates on Arbitrary Estimates are Poor (4)

● Join Selectivity Doesn't Know about Cross-Table Correlations (3)

● Uncommitted Tuples Don't Affect Statistics (2)

● No Stats for WITH RECURSIVE (1) or GROUP BY (1) Results

● Redundant Equality Constraints Not Identified As Such (1)

● IN/NOT IN Estimation Doesn't Assume Array Elements Distinct (1). Fixed.

● Histogram Bounds Can Slide Due to New Data (1). Fixed.

● Inheritance Parents Aren't Assumed to be Completely Empty (1). Fixed.

Planner Error: Cost Estimation (7)

● Planner doesn't account for de-TOASTing cost (4)

● Plan change causes volume of data to exceed server memory (2)

● Hash join sometimes decides to hash the larger table when it should probably be hashing the smaller one (1)

Planner Error: Conceptual (28)

● Cross-data type comparisons are not always indexable (3)

● Inlining the same thing multiple times can lose (3)

● NOT IN is hard to optimize – and we don't try very hard (3)

● Target lists are computed too early or unnecessary targets are computed (3)

● Can't rewrite SELECT max(a) FROM foo WHERE b IN (…) as max of index scans (2)

● Can't rearrange joins and aggregates relative to one another (2)

● Can't deduce implied inequalities (2)

● Ten other issues that came up once each

Thank You

● Any questions?

robert haas query planning gone wrong presentation @ postgres open

Technology