mysql query tuning for the squeemish -- fossetcon orlando sep 2014
DESCRIPTION
An introduction to MySQL query tuning for those who are new to the subject. This is not a dark art but a skill that can be developed.TRANSCRIPT
MySQL Query Tuning MySQL Query Tuning For the SqueemishFor the Squeemish
@stoker
Copyright © 2014 Dave Stokes 2
MySQL Optimizer
● The MySQL Cost Based Optimizer wants to optimize each query, every time. Query plans can not be locked in as with Oracle.
● Cost is/was based on disk I/O
● FusionIO, hybrid drives changing cost paradigm
– Cost when cost = 0?!?
● Checks Syntax
● Guesstimates costs, looks for cheapest way to do things
● May not be the best way
● How good are your statistics? When was your last ANALYSE TABLE?
– Need to run to update stats
Copyright © 2014 Dave Stokes 3
Logging – find slow queries
Log slow queries (time configurable) and queries w/o indexes.
MySQL 5.7 allows levels of verbosity.
PERFORMANCE_SCHEMA, INFORMATION_SCEMA has tons of useful data.
Copyright © 2014 Dave Stokes 4
Chapter 8 of the MySQL Manual
● All about optimization
● EXPLAIN
MySQL 5.6.3 and later permits explainable statements for EXPLAIN are SELECT, DELETE, INSERT, REPLACE, and UPDATE. Before MySQL 5.6.3, SELECT is the only explainable statement.
● Many developers have been trained in their language of choice, JavaScript, JSON/REST, etc. but very, very few have any SQL training. And many try to use ORMs to make up for their rotten SQL skills or try to force NoSQL technologies for relational data.
Copyright © 2014 Dave Stokes 5
Please note
Copyright © 2014 Dave Stokes 6
Example 1
This is the ASCII style output of explain versus VE
Copyright © 2014 Dave Stokes 7
From the Manual – If you want details
8.8.4 Estimating Query Performance
● In most cases, you can estimate query performance by counting disk seeks. For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row: log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1.
● In MySQL, an index block is usually 1,024 bytes and the data pointer is usually four bytes. For a 500,000-row table with a key value length of three bytes (the size of MEDIUMINT), the formula indicates log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.
● This index would require storage of about 500,000 * 7 * 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so you probably have much of the index in memory and so need only one or two calls to read data to find the row.
● For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row.
Copyright © 2014 Dave Stokes 8
Previous slide in simpler terms
● Use smallest data type possible. No BIGINT for age or customer id numbers
– Are you really going to live to 18,446,744,073,709,551,615?
– Use PROCEDUE ANALYSE to find optimal size/type● Wasted space is moved from disk to memory, memoery to
buffer, buffer to network, network to buffer, and/or buffer to disk
– Reading off disk 100,000 slower than memory
Copyright © 2014 Dave Stokes 9
From the Manual – If you want details
8.8.4 Estimating Query Performance
● In most cases, you can estimate query performance by counting disk seeks. For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row: log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1.
● In MySQL, an index block is usually 1,024 bytes and the data pointer is usually four bytes. For a 500,000-row table with a key value length of three bytes (the size of MEDIUMINT), the formula indicates log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.
● This index would require storage of about 500,000 * 7 * 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so you probably have much of the index in memory and so need only one or two calls to read data to find the row.
● For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row.
Copyright © 2014 Dave Stokes 10
Chapter 8 is a treasure trove of information.
Copyright © 2014 Dave Stokes 11
The dreaded B-tree index
● Docs.Oracle.com
Copyright © 2014 Dave Stokes 12
Why your query stinks!
SELECT CONCAT(customer.last_name, ', ', customer.first_name) AS customer,
address.phone, film.title
FROM rental INNER JOIN customer ON rental.customer_id = customer.customer_id
INNER JOIN address ON customer.address_id = address.address_id
INNER JOIN inventory ON rental.inventory_id = inventory.inventory_id
INNER JOIN film ON inventory.film_id = film.film_id
WHERE rental.return_date IS NULL
AND rental_date + INTERVAL film.rental_duration DAY < CURRENT_DATE()
LIMIT 5;
Copyright © 2014 Dave Stokes 13
NULL
● Usually used to do designate no value or a lack of data.
– Useful for many applications
– Screws up indexes as you need to also test values for null. Multiple nulls can kill performance
Copyright © 2014 Dave Stokes 14
FULL TABLE SCAN – Try to avoid
● FULL TABLE SCAN means each and every row in your table had to be read. If all the table is not in memory, it is made so.
– Not all ways bad● Sometimes you have to go through all rows
– Use indexes to find exact record(s) needed
Copyright © 2014 Dave Stokes 15
Still have a full table scan
Copyright © 2014 Dave Stokes 16
Note numbers for rows! Cost!!
Estimate that it will take 2,000 reads to get desired data
Copyright © 2014 Dave Stokes 17
Simpler Example
SELECT City.name, Country.Name
FROM City
JOIN Country ON (City.CountryCode = Country.Code)
LIMIT 5;
Copyright © 2014 Dave Stokes 18
But I ONLY want FIVE ROWS!!
SELECT City.name, Country.Name
FROM City
JOIN Country ON (City.CountryCode = Country.Code)
LIMIT 5;
Still needs to do the join BEFORE grabbing just the first FIVE rows.
Copyright © 2014 Dave Stokes 19
Add stuff to WHERE to narrow search
SELECT City.name, Country.Name
FROM City
JOIN Country ON (City.CountryCode = Country.Code)
WHERE City.Population > 500000
AND District = 'Texas'
LIMIT 5;
Should really cut down amount of data read, right?
Copyright © 2014 Dave Stokes 20
Not Really
Copyright © 2014 Dave Stokes 21
Another example
● SELECT *FROM City
WHERE
Population > 400000
Copyright © 2014 Dave Stokes 22
Tossing an index at a problem
● CREATE INDEX CityPopIndex ON City (Population)
● SELECT *FROM City
WHERE
Population > 400000
Net gain zero
Copyright © 2014 Dave Stokes 23
● Optimizer did not use our new index!!
Copyright © 2014 Dave Stokes 24
The best-laid schemes o’ micean’ men Gang aft agley
● SELECT * FROM City USE INDEX (CityPopindex) WHERE Population > 400000
● This is a case where even forcing the index does not help as the Population entries are not unique enough for the optimizer to take advantage of it.
Copyright © 2014 Dave Stokes 25
Another example
● SELECT Name
FROM City
WHERE CountryCode = 'USA'
● 274 rows in world.City have USA as the CountryCode, better than 4K total records in a FULL TABLE SCAN
Copyright © 2014 Dave Stokes 26
Using a Join
● SELECT City.Name, Country.name
FROM City
Join Country ON (Country.code = City.CountryCode)
WHERE Country.Population > 4000000AND City.CountryCode='USA'
Copyright © 2014 Dave Stokes 27
Indexes
● An index on City, State, and Zip can be used to look up
– City, State, Zip
– City, State
– City● Drags Zip along on City, State and saves a read into
the data
Copyright © 2014 Dave Stokes 28
Add a table for zipcodes
SHOW CREATE TABLE zipcodes\G
Table: zipcodes
Create Table: CREATE TABLE `zipcodes` (
`zip` varchar(5) NOT NULL,
`type` char(10) DEFAULT NULL,
`primary_city` varchar(25) DEFAULT NULL,
`accceptable_cities` varchar(50) DEFAULT NULL,
`unacceptable_city` varchar(50) DEFAULT NULL,
`state` char(2) DEFAULT NULL,
`county` varchar(30) DEFAULT NULL,
`tz` char(50) DEFAULT NULL,
`area_code` varchar(20) DEFAULT NULL,
`latitude` decimal(5,0) DEFAULT NULL,
`longitude` decimal(5,0) DEFAULT NULL,
`world_region` char(4) DEFAULT NULL,
`country` char(2) DEFAULT NULL,
`decomissioned` int(2) DEFAULT NULL,
`est_population` int(10) unsigned DEFAULT NULL,
`notes` char(20) DEFAULT NULL,
PRIMARY KEY (`zip`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Copyright © 2014 Dave Stokes 29
Lookup up by city name
● SELECT primary_city, state, zip, accceptable_cities FROM zipcodes WHERE primary_city='Roanoke'
Copyright © 2014 Dave Stokes 30
Two factors
SELECT primary_city, state, zip, accceptable_cities FROM zipcodes WHERE primary_city='Roanoke' AND state='TX'
Copyright © 2014 Dave Stokes 31
What is we specify all three items?
● SELECT primary_city, state, zip, accceptable_citiesFROM zipcodes WHERE primary_city='Roanoke' AND state='TX' AND zip='76262'
Optimizer is smart enough to figure that PRIMARY key gives all we need the cheapest
Copyright © 2014 Dave Stokes 32
Sometimes indexes help a little
● CREATE INDEXcityindex ON zipcodes (primary_city);
An index is added on primary_city to facilitate searching by city name
Copyright © 2014 Dave Stokes 33
Yet another example
SELECT primary_city, state, area_code
FROM zipcodes USE INDEX (primary)
WHERE primary_city='Orlando' ANDstate='FL'
● Default is a full table scan
● CREATE INDEX citystatearea ON zipcodes (primary_city,state,area_code);
Copyright © 2014 Dave Stokes 34
More complex query
SELECT CONCAT(c.city, _utf8',', cy.country) AS store, CONCAT(m.first_name, _utf8' ', m.last_name) AS manager, SUM(p.amount) AS total_salesFROM payment AS p INNER JOIN rental AS r ON p.rental_id = r.rental_id INNER JOIN inventory AS i ON r.inventory_id = i.inventory_id INNER JOIN store AS s ON i.store_id = s.store_id INNER JOIN address AS a ON s.address_id = a.address_id INNER JOIN city AS c ON a.city_id = c.city_id INNER JOIN country AS cy ON c.country_id = cy.country_id INNER JOIN staff AS m ON s.manager_staff_id = m.staff_idGROUP BY s.store_idORDER BY cy.country, c.city;
Copyright © 2014 Dave Stokes 35
Workbench – Mouse Over Blocks
● MySQL Workbench can provide information on queries
– Mouse over blocks for details. In this case the query does not require a filesort (slow) to order the query.
– SELECT Name FROM City WHERE Population > 100000 or District='Florida' Order by Population limit 10;
Note: Full INDEX scan not Full TABLE scan
Copyright © 2014 Dave Stokes 36
Joins – think sets on rows of data
Copyright © 2014 Dave Stokes 37
Books
● Effective MySQL Optimizing SQL Statements
– Ronald Bradford ● High Performance MySQL
– Schwartz, Zaitsev, Tkachenko
Copyright © 2014 Dave Stokes 38
Other resources
● MySQL Manual
● Planet.MySQL.Com
● Forums.MySQL.com
– Forums for Performance, Optimizer & Parser, and many more
● MySQL Central @ OpenWorld
– Part of Oracle Open World
– End of September