geek sync i consolidating indexes in sql server
Post on 29-Jul-2015
82 Views
Preview:
TRANSCRIPT
Consolidating Indexes in SQL Server
April 22. 2015
Meet Kendra Little
What you’ll learn today
How to de-dupe indexes
Why index hoarding is bad
How to fight key hoarding
How to design narrow indexes
3
1
2
4
To get the most out of this…
You should already know the definitions of:•Clustered indexes
•Nonclustered indexes
Not an expert?•It’s OK!
•Follow along, then review those concepts and watch
again later
Why index
hoarding
causes
problems
Consider this table
How a delete is executed
How a delete is executed
1) Find the rows to delete2) Delete them!
Where are those other indexes?
1) Find the rows to delete2) Delete them!
Hovering….
That’s a “narrow” plan
This is a simple looking plan•The query optimizer hands the work over to the
storage engine to do: delete this stuff!
•That may require a lot of running around in
those indexes!
The work needed for all those nonclustered
indexes isn’t apparent to you because it’s
hidden in a single operator
What if we need to
delete a lot of rows?
That’s different.
This is the exact same schema
Same table, same indexes
First query compiled:•DELETE FROM dbo.Posts WHERE PostTypeId=8;
Second query compiled:•DELETE FROM dbo.Posts WHERE PostTypeId=2;
SQL Server decided it needed a ‘wide plan’ for the second compilation
That’s a lot of work.
Indexes add overhead
SQL Server has to maintain them on inserts,
updates deletes
Just because you don’t see individual costs
for them in all execution plans doesn’t mean
they are free!
That cost is just hidden from you most of the
time
Too many indexes can be painful
Performance•Space in memory
•Time and lock implications for inserts, updates, deletes
Recoverability•Longer backups
•Longer restores
Maintenance duration•Longer reindexing
•Longer checking for corruption
Goal:
the smallest number of
indexes necessary to support
your workload.
1. Fight key
hoarding
You find these duplicate indexes
/* Id is an INT type (4 bytes) */
CREATE UNIQUE CLUSTERED INDEX cx_Postson dbo.Posts (Id);GO
ALTER TABLE dbo.PostsADD CONSTRAINT PK_PostsPRIMARY KEY NONCLUSTERED (Id);GO
15,979,181 rows; 31.5GB; 28.8GB LOB
15,979,181 rows; 155.0MB
Here they are
Your turn: What could go wrong…
/* If you replaced these with a Clustered
Primary key, like this? */
ALTER TABLE dbo.Posts
ADD CONSTRAINT PK_Posts
PRIMARY KEY CLUSTERED (Id);
GO
Time to write down your answer
Before:•Unique Clustered Index on Id
• 15,979,181 rows; 31.5GB; 28.8GB LOB
•Nonclustered Primary Key on Id• 15,979,181 rows; 155.0MB
Replace with….•Unique Clustered Primary Key on Id
• 15,979,181 rows; 31.5GB; 28.8GB LOB
What could go wrong?
Let’s talk.
Logical reads 19,788
Logical reads 348,079
Before
After
Comparison
There’s some risk
Some queries might take longer if there is no
substitute super-narrow nonclustered index•Example: COUNT(*) queries
Lookup queries against this value will take ever
so slightly longer (but that’s usually fine)
If the system is already live, you need to take
an outage if you’re enforcing referential
integrities
2. De-duplicate
nonclustered
indexes
You have these indexes
ID Key Columns Included Columns Reads Writes
22 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
4,932,101 695,023
42 LastActivityDate,
LastEditDate,
LastEditorDisplayName
Id 2,100,399 695,023
53 LastActivityDate,
LastEditDate
LastEditorDisplayName 5,011,022 695,023
61 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
1,302,230 695,023
Your turn: what do you do?
ID Key Columns Included Columns Reads Writes
22 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
4,932,101 695,023
42 LastActivityDate,
LastEditDate,
LastEditorDisplayName
Id 2,100,399 695,023
53 LastActivityDate,
LastEditDate
LastEditorDisplayName 5,011,022 695,023
61 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
1,302,230 695,023
Let’s de-dupe.
Observation: all are heavily read
ID Key Columns Included Columns Reads Writes
22 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
4,932,101 695,023
42 LastActivityDate,
LastEditDate,
LastEditorDisplayName
Id 2,100,399 695,023
53 LastActivityDate,
LastEditDate
LastEditorDisplayName 5,011,022 695,023
61 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
1,302,230 695,023
#22 and #61 are total duplicates
ID Key Columns Included Columns Reads Writes
22 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
4,932,101 695,023
42 LastActivityDate,
LastEditDate,
LastEditorDisplayName
Id 2,100,399 695,023
53 LastActivityDate,
LastEditDate
LastEditorDisplayName 5,011,022 695,023
61 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
1,302,230 695,023
#42: key is a superset!
ID Key Columns Included Columns Reads Writes
22 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
4,932,101 695,023
42 LastActivityDate,
LastEditDate,
LastEditorDisplayName
Id 2,100,399 695,023
53 LastActivityDate,
LastEditDate
LastEditorDisplayName 5,011,022 695,023
61 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
1,302,230 695,023
Consolidation process
1. Eliminate:• Indexes that aren’t read • Indexes that are total duplicates
2. Combine:• Identify the key superset (key order matters)• Combine the included columns
3. Review to see if it’s a monster
4. Create the new index• Watch performance
5. Drop the rest
Order doesn’t matter for includes
Leaf:
keys and
includes
Root &
intermediate:
key cols only
De-duping the includes
ID Key Columns Included Columns Reads Writes
22 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
4,932,101 695,023
42 LastActivityDate,
LastEditDate,
LastEditorDisplayName
Id 2,100,399 695,023
53 LastActivityDate,
LastEditDate
LastEditorDisplayName 5,011,022 695,023
61 LastActivityDate LastEditDate,
CommentCount,
CreationDate,
LastEditorDisplayName
1,302,230 695,023
One index to replace them all
KEYLastActivityDate, LastEditDate,
LastEditorDisplayName
INCLUDED COLUMNSCommentCount, CreationDate, Id
3. Practice
minimalist
index design
SELECT TOP 10 CommentCount, Score, ViewCount
FROM dbo.Posts
WHERE PostTypeId=$int
ORDER BY CommentCount DESC;
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= $int
AND PostTypeId=$int
ORDER BY Score DESC;
Two frequent queries
SELECT TOP 10 CommentCount, Score, ViewCount
FROM dbo.Posts
WHERE PostTypeId=$int
ORDER BY CommentCount DESC;
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= $int
AND PostTypeId=$int
ORDER BY Score DESC;
Your turn: design one index for
both
Here’s how I
do it
SELECT TOP 10 CommentCount, Score, ViewCount
FROM dbo.Posts
WHERE PostTypeId=$int
ORDER BY CommentCount DESC;
Query 1: look for key columns
An equality
comparison
SELECT TOP 10 CommentCount, Score, ViewCount
FROM dbo.Posts
WHERE PostTypeId=$int
ORDER BY CommentCount DESC;
Query 1: look for more key
columns
TOP / ORDER BY
SELECT TOP 10 CommentCount, Score, ViewCount
FROM dbo.Posts
WHERE PostTypeId=$int
ORDER BY CommentCount DESC;
Query 1: look for possible
includes
Query 1 – scratch paper
Possible keys (order still negotiable)•PostTypeId – an equality predicate
•CommentCount – TOP 10 ordered by this
DESC
Possible includes•Score
•ViewCount
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= $int
AND PostTypeId=$int
ORDER BY Score DESC;
Query 2: look for key columns
An equality
comparison
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= $int
AND PostTypeId=$int
ORDER BY Score DESC;
Query 2: look for key columns
An inequality
comparison
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= $int
AND PostTypeId=$int
ORDER BY Score DESC;
Query 2: look for even more
keys
TOP / ORDER BY
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= $int
AND PostTypeId=$int
ORDER BY Score DESC;
Query 2: look for possible
includes
Query 2 – scratch paper
Possible keys (order still negotiable)•PostTypeId – an equality predicate
•Score – top 10 ordered by this DESC• This is probably most efficient as the second column based
on selectivity
• a lot of low score things may get comments over whatever threshold is passed in
•CommentCount – an inequality >=
Possible includes•Id
•(the other possible includes are already key cols)
Scratch paper
A great index for query 1: •KEY (PostTypeId, CommentCount DESC)
•INCLUDE (Score, ViewCount)
A great index for query 2:•KEY (PostTypeId, Score DESC, CommentCount)
•INCLUDE (Id)
These queries have different ‘ideal’ indexes
But if every query gets an ideal index, we’ve got other problems
A compromise
Combined:•KEY (PostTypeId, CommentCount, Score)
• With Score third, DESC no longer helps
•INCLUDE (Id, ViewCount)
CREATE INDEX ix_Posts_PostTypeId_CommentCount_Score_INCLUDES
on dbo.Posts (PostTypeId, CommentCount, Score)
INCLUDE (Id, ViewCount)
WITH (ONLINE=ON);
Example query plans
logical reads 10,793
logical reads 3
KEY (PostTypeId, CommentCount, Score) INCLUDE (Id, ViewCount)
What if the order was different?
KEY (PostTypeId, Score, CommentCount) INCLUDE (Id, ViewCount)
logical reads 3
logical reads 34,980
What if we tried this?
KEY (PostTypeId) INCLUDE (Score, CommentCount, Id, ViewCount)
logical reads 34,754
logical reads 34,830
This requires a decision
Things to research•Will one of these queries run more often than the other?
•Does one of these queries need to be faster?
•Can either of these queries be cached in the application layer?
•How many other read and write queries run in the workload?
In general, one index that is great for one query and pretty good for the other is usually best
•KEY (PostTypeId, CommentCount, Score) INCLUDE (Id, ViewCount)
Takeaways
Consolidate as you write code
You can’t be psychic about how the
optimizer will work
Be conservative:•Don’t duplicate keys
•Don’t add included columns “just in case”
•Don’t index columns unless you have a reason
•If you add keys or includes to a nonclustered
index, drop the old index
Review for duplicates regularly
These always creep in•Indexes get added in an emergency
•Older indexes are meant to be dropped, but
aren’t
Or someone runs the Database Tuning
Advisor, which loves to create duplicates
and near-duplicates
Here’s How …
Idera Can Help You.
www.idera.com
top related