mysql 5.7: what’s new in the optimizer? - percona€¦ · mysql optimizer team, oracle september,...

48
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Copyright © 2014, Oracle and/or its affiliates. All rights reserved. Manyi Lu, Olav Sandstaa MySQL Optimizer Team, Oracle September, 2015 MySQL 5.7: What’s New in the Optimizer?

Upload: lyhanh

Post on 06-Jun-2018

229 views

Category:

Documents


0 download

TRANSCRIPT

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Manyi Lu, Olav Sandstaa MySQL Optimizer Team, Oracle September, 2015

MySQL 5.7: What’s New in the Optimizer?

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL Optimizer

SELECT a, b FROM t1, t2, t3 WHERE t1.a = t2.b AND t2.b = t3.c AND t2.d > 20 AND t2.d < 30;

MySQL Server

Cost based optimizations

Heuristics

Cost Model

Op

tim

izer

Table/index info (data dictionary)

Statistics (storage engines)

t2 t3

t1

Table scan

Range scan

Ref access

JOIN

JOIN

Pars

er

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL Optimizer: Design Principles

• Best out of the box performance

• Easy to use, minimum tuning needed

• When you need to understand: explain and trace

• Flexibility through optimizer hints, switches, and plugins

• Fast evolving

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7 Optimizer Improvements

• Generated columns and functional indexes

• New JSON datatype and functions

• Improved cost model

• New hint syntax and improved hint support

• Query rewrite plugin

• UNION ALL queries do not always use temporary tables

• Improved optimizations for queries with IN expressions

• Explain on a running query

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Generated Columns

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7 Generated Columns

• Column generated from the expression

• VIRTUAL: computed when read, not stored, indexable

• STORED: computed when inserted/updated, stored in SE, indexable

• Useful for: – Functional index

– Materialized cache for complex conditions

– Simplify query expression

CREATE TABLE order_lines (orderno integer, lineno integer, price decimal(10,2), qty integer, sum_price decimal(10,2) GENERATED ALWAYS AS (qty * price) STORED );

Kodus to Andrey Zhakov for his contribution!

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7 Functional Index

• Online index creation

• Composite index on a mix of ordinary, virtual and stored columns

CREATE TABLE order_lines (orderno integer, lineno integer, price decimal(10,2), qty integer, sum_price decimal(10,2) GENERATED ALWAYS AS (qty * price) VIRTUAL); ALTER TABLE order_lines ADD INDEX idx (sum_price);

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Generated column: STORED vs VIRTUAL

• STORED –Requires table rebuild at creation

–Updating table data at INSERT/UPDATE

–Fast retrieval

9

• VIRTUAL –Metadata change only, instant

–Faster INSERT/UPDATE, no change to table

–Compute when read

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Indexing Generated column: STORED vs VIRTUAL (InnoDB)

• STORED –Primary & secondary index

–B-TREE, Full text search, R-TREE

–Duplication of data in base table and index

– Independent of SE

–Online operation

10

• VIRTUAL –Secondary index only

–B-TREE only

–Less storage

–Requires SE support

–Online operation

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

JSON support

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

JSON Support

• Seemless integration of relational and schema-less data

• Leverage existing database infrastructure for new applications

• Provide a native JSON datatype

• Provide a set of built-in JSON functions

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

New JSON Datatype

• You can always store JSON data in TEXT or VARCHAR column, but

• native JSON datatype has many benefits: – Validation on insert

– Efficient access

CREATE TABLE employees (data JSON); INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}'); INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}');

SELECT * FROM employees; +-------------------------------------+ | data | +-------------------------------------+ | {"id": 1, "name": "Jane"} | | {"id": 2, "name": "Joe"} | +-------------------------------------+ 2 rows in set (0,00 sec)

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

New functions to handle JSON data • Info

– JSON_VALID()

– JSON_TYPE()

– JSON_KEYS()

– JSON_LENGTH()

– JSON_DEPTH()

– JSON_CONTAINS_PATH()

– JSON_CONTAINS()

14

• Modify

– JSON_REMOVE()

– JSON_ARRAY_APPEND()

– JSON_SET()

– JSON_INSERT()

– JSON_ARRAY_INSERT()

– JSON_REPLACE()

• Create

– JSON_MERGE()

– JSON_ARRAY()

– JSON_OBJECT()

• Get data

– JSON_EXTRACT()

– JSON_SEARCH()

• Helper

– JSON_QUOTE()

– JSON_UNQUOTE()

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Inlining JSON PATH Expressions in SQL

[[database.]table.]column->”$<path spec>” (GA version)

SELECT * FROM employees WHERE data->”$.name” = “Jane"; Is a short hand for SELECT * FROM employees WHERE JSON_EXTRACT(data, “$.name” ) = “Jane”;

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Indexing JSON data

• Use Functional Indexes,

• STORED and VIRTUAL types are supported

16

CREATE TABLE employees (data JSON); ALTER TABLE employees ADD COLUMN name VARCHAR(30) AS (JSON_UNQUOTE(data->”$.name”)) VIRTUAL, ADD INDEX name_idx (name);

• Functional index approach

• Use JSON_EXTRACT to specify field to be indexed

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Using Real Life Data

• Via SF OpenData

• 206K JSON objects representing subdivision parcels.

• Imported from https://github.com/zemirco/sf-city-lots-json + small tweaks

17

CREATE TABLE features ( id INTEGER NOT NULL auto_increment primary key, feature JSON NOT NULL );

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 18

{

"type":"Feature",

"geometry":{

"type":"Polygon",

"coordinates":[

[

[-122.42200352825247,37.80848009696725,0],

[-122.42207601332528,37.808835019815085,0],

[-122.42110217434865,37.808803534992904,0],

[-122.42106256906727,37.80860105681814,0],

[-122.42200352825247,37.80848009696725,0]

]

]

},

"properties":{

"TO_ST":"0",

"BLKLOT":"0001001",

"STREET":"UNKNOWN",

"FROM_ST":"0",

"LOT_NUM":"001",

"ST_TYPE":null,

"ODD_EVEN":"E",

"BLOCK_NUM":"0001",

"MAPBLKLOT":"0001001"

}

}

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Naive Performance Comparison

19

# as JSON type SELECT DISTINCT feature->"$.type" as json_extract FROM features; +------------------+ | json_extract | +------------------+ | "Feature" | +-----------------+ 1 row in set (1.25 sec)

Unindexed traversal of 206K documents

# as TEXT type SELECT DISTINCT feature->"$.type" as json_extract FROM features; +------------------+ | json_extract | +------------------+ | "Feature" | +------------------+ 1 row in set (12.85 sec)

Explanation: Binary format of JSON type is very efficient at searching. Storing as TEXT performs over 10x worse at traversal.

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Create Index

20

ALTER TABLE features ADD feature_type VARCHAR(30) AS (feature->"$.type") VIRTUAL; Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 ALTER TABLE features ADD INDEX (feature_type); Query OK, 0 rows affected (0.73 sec) Records: 0 Duplicates: 0 Warnings: 0 SELECT DISTINCT feature_type FROM features; +-------------------+ | feature_type | +-------------------+ | "Feature" | +-------------------+ 1 row in set (0.06 sec)

From table scan on 206K documents to index scan on 206K materialized values

Down from 1.25 sec to 0.06 sec

Creates index online. Does not modify table rows.

Meta data change only (FAST). Does not need to touch table.

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

JSON Path Search

• Provides a novice way to know the path. To retrieve via: [[database.]table.]column->"$<path spec>" (GA VERSION ONLY)

21

SELECT JSON_SEARCH(feature,'one', 'MARKET') AS extract_path FROM features WHERE id = 121254; +--------------------------------+ | extract_path | +--------------------------------+ | "$.properties.STREET" | +--------------------------------+ 1 row in set (0.00 sec)

SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254; +-----------------------+ | property_street | +-----------------------+ | "MARKET" | +-----------------------+ 1 row in set (0.00 sec)

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

JSON Array Creation

22

SELECT JSON_ARRAY(id, feature->"$.properties.STREET", feature->"$.type") AS json_array FROM features ORDER BY RAND() LIMIT 3; +------------------------------------------+ | json_array | +------------------------------------------+ | [65298, "10TH", "Feature"] | | [122985, "08TH", "Feature"] | | [172884, "CURTIS", "Feature"] | +------------------------------------------+ 3 rows in set (2.66 sec)

Evaluates a (possibly empty) list of values and returns a JSON array containing those values

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Cost Model

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Motivation for Changing the Cost Model

• Adopt to new hardware architectures

– SSD, larger memories, caches

• Allows storage engines to provide accurate and dynamic cost estimate

– Whether the data is in RAM, SSD, HDD?

• More maintainable cost model implementation – Avoid hard coded constants

– Refactoring of existing cost model code

• Tunable/configurable

• Replace heuristics with cost based decisions

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Cost Model: Main Focus in 5.7

Address the following pain points in current cost model:

• Inaccurate record estimation for JOIN

Solution: condition filtering

• Hard-coded cost constants

Solution: adjustable cost parameters

• Imprecise cardinality/records per key estimates from SE

Solution: Integer value replaced by floating point

• Hard to obtain detailed cost numbers

Solution: Added to JSON explain and MySQL WorkBench

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.6: Record Estimates for JOIN

• t1 JOIN t2

• Total cost = cost (access method t1) + Prefix_rows_t1 * cost (access method t2)

• Prefix_rows_t1 is records read by t1

– Overestimation if where conditions apply!->Suboptimial join order

Without condition filtering

t1 t2

Acc

ess

Met

ho

d

Prefix_rows_t1 Number of records read

from t1

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7 Improved Record Estimates for JOIN

• t1 JOIN t2

• Prefix_rows_t1 Takes into account the entire query condition

– More accurate record estimate -> improved JOIN order

Condition filter

t1 t2

Acc

ess

Met

ho

d

Number of records read

from t1

Co

nd

itio

n f

ilter

Prefix_rows_t1 Records passing the table

conditions on t1

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

• 10 000 rows in the emp table • 100 rows in the office table • 100 rows with first_name=”John” AND hire_date BETWEEN “2014-01-

01″ AND “2014-06-01″

MySQL 5.7 Improved Record Estimates for JOIN

CREATE TABLE emp ( id INTEGER NOT NULL PRIMARY KEY, office_id INTEGER NOT NULL, first_name VARCHAR(20), hire_date DATE NOT NULL, KEY office (office_id) ) ENGINE=InnoDB;

CREATE TABLE office ( id INTEGER NOT NULL PRIMARY KEY, officename VARCHAR(20) ) ENGINE=InnoDB;

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Table Type Possible keys Key Ref Rows Filtered Extra

office ALL PRIMARY NULL NULL 100 100.00 NULL

employee ref office office office.id 99 100.00 Using where

MySQL 5.7 Improved Record Estimates for JOIN

Explain for 5.6: Total Cost = cost(scan office) + 100 * cost(ref_access emp)

Explain for 5.7: Total Cost = cost(scan emp) + 9991*1.23% * cost(eq_ref_access office)

SELECT office_name

FROM office JOIN employee ON office.id = employee.office

WHERE employee.name LIKE “John” AND

hire_date BETWEEN “2014-01-01” AND “2014-06-01”;

Table Type Possible keys Key Ref Rows Filtered Extra

employee ALL NULL NULL NULL 9991 1.23 NULL

office eq_ref PRIMARY PRIMARY employee.office 1 100.00 Using where JOIN ORDER HAS CHANGED!

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

HINT

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Improved HINTs

• Introduced new hint syntax /*+ ...*/

– Flexibility over optimizer switch, effect individual statement only

– Hints within one statement take preceduence over optimizer switch

– Hints apply at different scope levels: global, query block, table, index

• Extended hint support

– BKA, BNL, MRR, ICP, SEMIJOIN, SUBQUERY,MAX_EXECUTION_TIME++

– Disabling prevents optimizer to use it

– Enabling means optimizer is free to use it, but is not forced to use it

• Will gradually replace the old hint syntax in upcoming releases

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Hint Example: MAX_EXECUTION_TIME

SELECT /*+ MAX_EXECUTION_TIME(1) */ * FROM t1 a, t1 b, t1 c, t1 d, t1 e LIMIT 1; ERROR 3024 (HY000): Query execution was interrupted, maximum statement execution time exceeded SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM t1 a, t1 b, t1 c, t1 d, t1 e LIMIT 1; +---+-----+----+------+----+-----+----+------+---+------+ | a | b | a | b | a | b | a | b | a | b | +---+-----+----+------+----+-----+----+------+---+------+ | 1 | 10 | 1 | 10 | 1 | 10 | 1 | 10 | 1 | 10 | +---+------+---+------+----+-----+----+------+---+------+ 1 row in set (0,00 sec)

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Hint Example: SEMIJOIN

EXPLAIN SELECT * FROM t2 WHERE t2.a IN (SELECT a FROM t3);

id Select_type Table Type Possible_keys Key Key_len Ref Rows Extra

1 simple t3 index a a 4 null 3 Using where; LooseScan

1 simple t2 ref a a 4 test.t3.a 1 Using index

EXPLAIN SELECT * FROM t2 WHERE t2.a IN (SELECT /*+ NO_SEMIJOIN() */ a FROM t3);

id Select_type Table Type Possible_keys Key Key_len Ref Rows Extra

1 primary t2 index null a 4 null 4 Using where; Using index

2 dependent subquery

t3 Index_ subquery

a a 4 func 1 Using index

No hint, optimizer chooses semi-join algorithm loosescan

Semi-join disabled with hint, subquery is executed for each row of outer table

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Hint Example: SEMIJOIN

EXPLAIN SELECT /*+ SEMIJOIN(@subq MATERIALIZATION) */ * FROM t2 WHERE t2.a IN (SELECT /*+ QB_NAME(subq) */ a FROM t3);

id Select_type Table Type Possible_keys

Key Key_len Ref Rows Extra

1 simple t2 index a a 4 null 4 Using where;using index

1 simple <subquery2> eq_ref <auto_key> <auto_key> 4 test.t2.a 1 null

2 materialized t3 index a a 4 null 3 Using index 3 rows in set, 1 warning (0.01 sec)

Hint on a particular algorithm, in this case semi-join materialization

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

Query Rewrite Plugin

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Why Query Rewrite Plugin?

• Problem

– Optimizer choses a suboptimal plan

– Users can change the query plan by adding hints or rewrite the query

– However, database application code cannot be changed

• Solution: query rewrite plugin!

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Query Rewrite Plugin

• New pre and post parse query rewrite APIs

– Users can write their own plug-ins

• Provides a post-parse query plugin

– Rewrite problematic queries without the need to make application changes • Add hints

• Modify join order

• Rewrite rules are defined in a table

• Improve problematic queries from ORMs, third party apps, etc

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7 How Rewrites Happen?

For query

SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2

Replace parameter markers in Replacement with actual literals:

Pattern is:

SELECT *

FROM t1 JOIN t2 ON t1.keycol =

t2.keycol

WHERE col1 = ? AND col2 = ?

Replacement is:

SELECT a, b, c

FROM t1 STRAIGHT_JOIN t2 FORCE

INDEX (col1)

ON t1.keycol = t2.keycol

WHERE col1 = ? AND col2 = ?

SELECT a, b, c

FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1)

ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7Query Rewrite Plugin cont. query_rewrite.rewrite_rules table:

Pattern Pattern_ database

Replacement Enabl

ed Message

SELECT name, department_name FROM employee JOIN department USING ( department_id ) WHERE salary > ?

employees

SELECT name, department_name FROM employee STRAIGHT_JOIN department USING ( department_id ) WHERE salary > ?

Y NULL

SELECT name, department_name FROXM employee JOIN department USING ( department_id ) WHERE salary > ?

employees

SELECT name, department_name FROM employee STRAIGHT JOIN department USING ( department_id ) WHERE salary > ?

N

Parse error in pattern:……near ……at line 1

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7 Query Rewrite Plugin: Performance Impact

What is the Cost of Rewriting queries?

• Designed for rewriting problematic queries only!

• ~ Zero cost for queries not to be rewritten

– Statement digest computed for performance schema anyway

• Cost of queries to be rewritten is insignificant compared to performance gain

– Cost of generating query + reparsing max ~5% performance overhead

– Performance gain potentially x times

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

User Requested

Performance Improvements

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Avoid Creating Temporary Table for UNION ALL

SELECT * FROM table_a UNION ALL SELECT * FROM table_b;

• 5.6: Always materialize results of UNION ALL in temporary tables

• 5.7: Do not materialize in temporary tables unless used for sorting, rows are sent directly to client

• 5.7: Client will receive the first row faster, no need to wait until the last query block is finished

• 5.7: Less memory and disk consumption

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Optimizations for IN Expressions

• 5.6: Certain queries with IN predicates can’t use index scans or range scans even though all the columns in the query are indexed.

• 5.6: Range optimizer ignores lists of rows

• 5.6: Needs to rewrite to de-normalized form

SELECT a, b FROM t1 WHERE ( a = 0 AND b = 0 ) OR ( a = 1 AND b = 1 )

• 5.7: IN queries with row value expressions executed using range scans.

• 5.7: Explain output: Index/table scans change to range scans

CREATE TABLE t1 (a INT, b INT, c INT, KEY x(a, b)); SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7: Optimizations for IN Expressions

• A table has 10 000 rows, 2 match the where condition

MySQL 5.6:

**************1. row *****************

select_type: SIMPLE

table: t1

type: index

key: x

key_len: 10

ref: NULL

rows: 10 000

Extra: Using where; Using index

MySQL 5.7:

*************1. row *****************

select_type: SIMPLE

table: t1

type: range

key: x

key_len: 10

ref: NULL

rows: 2

Extra: Using where; Using index

SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

EXPLAIN

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL 5.7 Explain on a Running Query

EXPLAIN [FORMAT=(JSON|TRADITIONAL)] FOR CONNECTION <id>;

• Shows query plan on connection <id>

• Useful for diagnostic on long running queries

• Plan isn’t available when query plan is under creation

• Applicable to SELECT/INSERT/DELETE/UPDATE

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

What is on Our Roadmap?

• Improve prepared statement support

• Add histogram

• Support parallel queries

• Common table expression(WITH RECURSIVE)

• Windowing functions

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |