IndexesThe second pillar of database wisdom
“MySQL doesn’t support this”
MySQL
What everyone knows about indexes
• Trade insert/update/delete time for query time
• Trade storage space for query time
Other things everyone knows about indexes
• Index all join columns
• Index for every field you search
• Indexes should be rebuilt regularly
• Most selective first
• Dynamic SQL is slow
• Faster computers execute queries faster
Nonsense some folks believe
http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
Howe’s LawBusinesses can’t afford the
servers they really need
Corollary to Howe’s Law
Your database is never fast enough
Query• not a program
• balances:
• random vs sequential I/O
• CPU use
• memory use
• optimization is np hard
“Or” and complex queries
• AND queries, particularly in applications are the most common type of query
• A query involving OR clauses can usually be considered as combining the results of two or more AND queries
• Complex to give general advice
• Can be reason to stick to small, simple indexes
How to use an index
Phone Book Theory of Indexing
WhoCo
Employee Register
Smith Andrew Houston 555-413-6182
Adams Maggie Houston 555-111-8943
Medville Anne Los Angeles 555-413-1183
Samuels Fred New York 555-112-8943
…
p72
Jacob, a programmer
Family Name Pages
Adams 17/3, 44/1, 44/5 72/4
Andrews 1/3, 19/5, 44/5
Ambers 11/2
Amberson19/1, 32/1, 32/2, 32/3, 32/4,
99/2
…
Family name index
Given Name Pages
Adam 13/3, 14/3, 124/4
Ainsley 12/3
Aston 134/2, 135/2
Atley 19/5, 32/1, 32/3, 32/4, 99/1
…
Given name index
Office Pages
Austin, TX
3/1, 4/1, 5/2, 6/5, 7/4, 11/4, 15/5, 18/3, 19/3, 20/2, 21/3, 22/1, 23/1, 25/4, 26/5, 27/2, 28/4, 29/1, 30/1 31, 33, 34…
Bakersville, MD1/1, 2/2, 3/3, 4/3, 7/2, 11/3, 15/5, 19/4, 20/2, 21/2, 23/4,
44/1, …
Boston, MA …
…
Branch index
Gwen, a programmer
Comprehensive index
Branch Family Name Given Name Pages
Austin Jeffries Leslie 11/2
Austin Kew Nelson 124/2
Austin Samuelson Steven 13/4
Austin Simpson Zaphod 98/1
…
Comprehensive index
Family Name Given Name Branch Pages
Adams Leslie Bakersville 44/2
Adams Nelson Boston 17/2
Adams Steven Bakersville 44/1
Adams Zaphod Austin 72/3
…
1. Prefer compound indexes
Full-time index
Full-time Pages
Y 1/7, 1/8 2/1, 2/2, 2/3, 2/4, 2/5…
N 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/6, …
2. Prefer selective predicates
Comprehensive index
Family Name Given Name Branch Pages
Adams Leslie Bakersville 44/2
Adams Nelson Boston 17/2
Adams Steven Bakersville 44/1
Adams Zaphod Austin 72/3
…
3. Consider a covering index
Database index
1979-05-27
Tampa
1979-06-07
Grover
1979-07-08
Austin
…
1979-02-01 Austin
1979-02-12 Houston
1979-02-17
Bakersville
1979-03-08 Tampa
1979-05-27 Tampa
1979-05-29 Newtown
1979-05-29 Seattle
1979-06-04 Seattle
1979-06-07 Grover
1979-06-14
South End
1979-06-15 Newtown
1979-06-16 London
Database index
Austin 1990-01-01
Bakersville
1981-06-01
Bakersville
1985-06-04
Austin1980-01-
01
Austin1981-06-
01
Austin1983-01-
12
Austin1987-11-
31
Austin1990-01-
01
Austin1991-06-
01
Bakersville
1979-08-03
Bakersville
1979-09-07
Bakersville
1981-06-01
Bakersville
1982-12-13
Bakersville
1983-03-28
Bakersville
1985-01-01
…
4. Be subtle with compound index
predicates• Put covering fields you don’t search on at the
end
• Put a field to do range searches on next
• Put other fields you search on at the front in order of use and selectivity
• Unless you’re dealing with different searches, in which case compromise, or use multiple compound indexes
WhoCo
Employee Register
Senior Personnel Directory
DivisionFamily Name
Given Name Branch Pages
Accounting Adams Leslie Bakersville 44/2
Accounting Adams Nelson Boston 17/1
…
R&D Yu Stephen Austin 72/5
CREATE INDEXpersonnel_main_office_idx
ONpersonnel (
division,family_name,given_name)
WHEREdivision IS NOT NULL;
Partial index
SELECT*
FROMpersonnel
WHEREdivision IS NOT NULL
ANDdivision = ‘R&D’;
MySQL
CREATE INDEXpersonnel_name_lc
ONpersonnel (
lower(family_name),lower(given_name));
Functional index
SELECT*
FROMpersonnel
WHERElower(family_name) =
‘smith’ ANDlower(given_name) =
‘mary’;
MySQL
Comprehensive index
Family Name Given NameStreet
Address Pages
Smith Leslie 13 Main St 44/2
Smith Nelson 11a Rails St 17/2
Smith Steven 19 Main St 44/1
Smithers Zaphod 12 Rails St 72/3
…
SELECT*
FROMpersonnel
WHERElower(family_name) like ‘smi
%’;
SELECT*
FROMpersonnel
WHERElower(street_addr) like ‘%
st’;
CREATE INDEXpersonnel_addr_rev
ONpersonnel (
reverse(lower(street_addr)));
Functional index tricks
SELECT*
FROMpersonnel
WHEREreverse(lower(street_addr)) like ‘ts
%’;
MySQL
Sorting• Need same predicates in WHERE and ORDER
BY
• Index can be used in either direction
• But not both…
• not SELECT… ORDER BY family_name DESC, branch ASC
• Use Postgres’ CREATE INDEX… NULLS LAST (or FIRST)
Partial Results• LIMIT n or FETCH FIRST n
• Need pipelined ORDER BY
• Otherwise we have to sort the whole result set
• Database dependent
• Postgres window functions aren’t pipelined
Paginating“SELECT
*FROM
peopleORDER BY
hire_dateLIMIT
50OFFSET
1000"
Paginating: much better
People.where “
(hire_date, creation_date) >(:hire_date, :creation_date)”,
{hire_date: @last_person.hire_date,creation_date:
@last_person.creation_date}.order(:hire_date, :creation_date).limit: 50
.where “(hire_date > :hire_date)OR(hire_date = :hire_date ANDcreation_date > :creation_date)”
Otherdatabases
Possiblyor hire_date IS NULL
Postgres
Clustering
• Based on existing index
• Physically reorganize the heap in index order
• Reduce seek time for blocks of records
• One-off procedure; new records not clustered
• Locks table for duration
Use case: partitioning
• use Postgres table inheritance
• eg log data, very fast, last 30 days searchable
• split table into numerous identical subtables
• cluster a table once we’re done writing to it
Joins
Three Join Methods• Nested Loop
• Query 1 table; loop and query the other
• Hash Join
• Avoid nested loop cost by making hash of one side
• Merge Join
• Query each table, sort and compare
MySQL
MySQL
Nested Loop Join
• Index for each side separately
Hash Join
• Not for range joins; only equality
• No point indexing join fields
• Index independent predicates
• Reduce memory footprint of hash table by minimizing number of fields or rows
MySQL
Merge Join• Less often used because sort is expensive
• Mainly for outer joins
• Or when an index on both sides speeds sorting
• same index for where and order by
• No point indexing join fields, only independent predicates
• Reduce memory footprint by minimizing number of fields or rows
MySQL
Index Data Structures
Index Data Structures
• B-Tree
• “Normal” index type
• equality and range searches
• Hash
• Faster for equality
• Test
Index Data Structures
• GIST
• Variety of data types with complex operators eg spatial contains/near/etc
• hstore
• SP-GIST
• Similar to GIST but better for some data types
• GIN
• more-or-less GIST
• (3x) more space and (3x) slower insert gives faster (3x) search
• Some data types only support one
MySQL
Dynamic SQL• Prepared statement or bind queries lets
server cache query plan
• SQL Server or Oracle
• Save query plan time
• Usually saves time, but doesn’t consider values
• Uneven distributions suffer
MySQL
Postgres
Statistics
ALTER TABLEtable
ALTER COLUMNcolumn
SETSTATISTICS 1000