geek sync i consolidating indexes in sql server

Post on 29-Jul-2015

82 Views

Category:

Software

5 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Consolidating Indexes in SQL Server

April 22. 2015

Meet Kendra Little

What you’ll learn today

How to de-dupe indexes

Why index hoarding is bad

How to fight key hoarding

How to design narrow indexes

3

1

2

4

To get the most out of this…

You should already know the definitions of:•Clustered indexes

•Nonclustered indexes

Not an expert?•It’s OK!

•Follow along, then review those concepts and watch

again later

Why index

hoarding

causes

problems

Consider this table

How a delete is executed

How a delete is executed

1) Find the rows to delete2) Delete them!

Where are those other indexes?

1) Find the rows to delete2) Delete them!

Hovering….

That’s a “narrow” plan

This is a simple looking plan•The query optimizer hands the work over to the

storage engine to do: delete this stuff!

•That may require a lot of running around in

those indexes!

The work needed for all those nonclustered

indexes isn’t apparent to you because it’s

hidden in a single operator

What if we need to

delete a lot of rows?

That’s different.

This is the exact same schema

Same table, same indexes

First query compiled:•DELETE FROM dbo.Posts WHERE PostTypeId=8;

Second query compiled:•DELETE FROM dbo.Posts WHERE PostTypeId=2;

SQL Server decided it needed a ‘wide plan’ for the second compilation

That’s a lot of work.

Indexes add overhead

SQL Server has to maintain them on inserts,

updates deletes

Just because you don’t see individual costs

for them in all execution plans doesn’t mean

they are free!

That cost is just hidden from you most of the

time

Too many indexes can be painful

Performance•Space in memory

•Time and lock implications for inserts, updates, deletes

Recoverability•Longer backups

•Longer restores

Maintenance duration•Longer reindexing

•Longer checking for corruption

Goal:

the smallest number of

indexes necessary to support

your workload.

1. Fight key

hoarding

You find these duplicate indexes

/* Id is an INT type (4 bytes) */

CREATE UNIQUE CLUSTERED INDEX cx_Postson dbo.Posts (Id);GO

ALTER TABLE dbo.PostsADD CONSTRAINT PK_PostsPRIMARY KEY NONCLUSTERED (Id);GO

15,979,181 rows; 31.5GB; 28.8GB LOB

15,979,181 rows; 155.0MB

Here they are

Your turn: What could go wrong…

/* If you replaced these with a Clustered

Primary key, like this? */

ALTER TABLE dbo.Posts

ADD CONSTRAINT PK_Posts

PRIMARY KEY CLUSTERED (Id);

GO

Time to write down your answer

Before:•Unique Clustered Index on Id

• 15,979,181 rows; 31.5GB; 28.8GB LOB

•Nonclustered Primary Key on Id• 15,979,181 rows; 155.0MB

Replace with….•Unique Clustered Primary Key on Id

• 15,979,181 rows; 31.5GB; 28.8GB LOB

What could go wrong?

Let’s talk.

Logical reads 19,788

Logical reads 348,079

Before

After

Comparison

There’s some risk

Some queries might take longer if there is no

substitute super-narrow nonclustered index•Example: COUNT(*) queries

Lookup queries against this value will take ever

so slightly longer (but that’s usually fine)

If the system is already live, you need to take

an outage if you’re enforcing referential

integrities

2. De-duplicate

nonclustered

indexes

You have these indexes

ID Key Columns Included Columns Reads Writes

22 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

4,932,101 695,023

42 LastActivityDate,

LastEditDate,

LastEditorDisplayName

Id 2,100,399 695,023

53 LastActivityDate,

LastEditDate

LastEditorDisplayName 5,011,022 695,023

61 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

1,302,230 695,023

Your turn: what do you do?

ID Key Columns Included Columns Reads Writes

22 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

4,932,101 695,023

42 LastActivityDate,

LastEditDate,

LastEditorDisplayName

Id 2,100,399 695,023

53 LastActivityDate,

LastEditDate

LastEditorDisplayName 5,011,022 695,023

61 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

1,302,230 695,023

Let’s de-dupe.

Observation: all are heavily read

ID Key Columns Included Columns Reads Writes

22 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

4,932,101 695,023

42 LastActivityDate,

LastEditDate,

LastEditorDisplayName

Id 2,100,399 695,023

53 LastActivityDate,

LastEditDate

LastEditorDisplayName 5,011,022 695,023

61 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

1,302,230 695,023

#22 and #61 are total duplicates

ID Key Columns Included Columns Reads Writes

22 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

4,932,101 695,023

42 LastActivityDate,

LastEditDate,

LastEditorDisplayName

Id 2,100,399 695,023

53 LastActivityDate,

LastEditDate

LastEditorDisplayName 5,011,022 695,023

61 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

1,302,230 695,023

#42: key is a superset!

ID Key Columns Included Columns Reads Writes

22 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

4,932,101 695,023

42 LastActivityDate,

LastEditDate,

LastEditorDisplayName

Id 2,100,399 695,023

53 LastActivityDate,

LastEditDate

LastEditorDisplayName 5,011,022 695,023

61 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

1,302,230 695,023

Consolidation process

1. Eliminate:• Indexes that aren’t read • Indexes that are total duplicates

2. Combine:• Identify the key superset (key order matters)• Combine the included columns

3. Review to see if it’s a monster

4. Create the new index• Watch performance

5. Drop the rest

Order doesn’t matter for includes

Leaf:

keys and

includes

Root &

intermediate:

key cols only

De-duping the includes

ID Key Columns Included Columns Reads Writes

22 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

4,932,101 695,023

42 LastActivityDate,

LastEditDate,

LastEditorDisplayName

Id 2,100,399 695,023

53 LastActivityDate,

LastEditDate

LastEditorDisplayName 5,011,022 695,023

61 LastActivityDate LastEditDate,

CommentCount,

CreationDate,

LastEditorDisplayName

1,302,230 695,023

One index to replace them all

KEYLastActivityDate, LastEditDate,

LastEditorDisplayName

INCLUDED COLUMNSCommentCount, CreationDate, Id

3. Practice

minimalist

index design

SELECT TOP 10 CommentCount, Score, ViewCount

FROM dbo.Posts

WHERE PostTypeId=$int

ORDER BY CommentCount DESC;

SELECT TOP 10 Id, CommentCount, Score

FROM dbo.Posts

WHERE CommentCount >= $int

AND PostTypeId=$int

ORDER BY Score DESC;

Two frequent queries

SELECT TOP 10 CommentCount, Score, ViewCount

FROM dbo.Posts

WHERE PostTypeId=$int

ORDER BY CommentCount DESC;

SELECT TOP 10 Id, CommentCount, Score

FROM dbo.Posts

WHERE CommentCount >= $int

AND PostTypeId=$int

ORDER BY Score DESC;

Your turn: design one index for

both

Here’s how I

do it

SELECT TOP 10 CommentCount, Score, ViewCount

FROM dbo.Posts

WHERE PostTypeId=$int

ORDER BY CommentCount DESC;

Query 1: look for key columns

An equality

comparison

SELECT TOP 10 CommentCount, Score, ViewCount

FROM dbo.Posts

WHERE PostTypeId=$int

ORDER BY CommentCount DESC;

Query 1: look for more key

columns

TOP / ORDER BY

SELECT TOP 10 CommentCount, Score, ViewCount

FROM dbo.Posts

WHERE PostTypeId=$int

ORDER BY CommentCount DESC;

Query 1: look for possible

includes

Query 1 – scratch paper

Possible keys (order still negotiable)•PostTypeId – an equality predicate

•CommentCount – TOP 10 ordered by this

DESC

Possible includes•Score

•ViewCount

SELECT TOP 10 Id, CommentCount, Score

FROM dbo.Posts

WHERE CommentCount >= $int

AND PostTypeId=$int

ORDER BY Score DESC;

Query 2: look for key columns

An equality

comparison

SELECT TOP 10 Id, CommentCount, Score

FROM dbo.Posts

WHERE CommentCount >= $int

AND PostTypeId=$int

ORDER BY Score DESC;

Query 2: look for key columns

An inequality

comparison

SELECT TOP 10 Id, CommentCount, Score

FROM dbo.Posts

WHERE CommentCount >= $int

AND PostTypeId=$int

ORDER BY Score DESC;

Query 2: look for even more

keys

TOP / ORDER BY

SELECT TOP 10 Id, CommentCount, Score

FROM dbo.Posts

WHERE CommentCount >= $int

AND PostTypeId=$int

ORDER BY Score DESC;

Query 2: look for possible

includes

Query 2 – scratch paper

Possible keys (order still negotiable)•PostTypeId – an equality predicate

•Score – top 10 ordered by this DESC• This is probably most efficient as the second column based

on selectivity

• a lot of low score things may get comments over whatever threshold is passed in

•CommentCount – an inequality >=

Possible includes•Id

•(the other possible includes are already key cols)

Scratch paper

A great index for query 1: •KEY (PostTypeId, CommentCount DESC)

•INCLUDE (Score, ViewCount)

A great index for query 2:•KEY (PostTypeId, Score DESC, CommentCount)

•INCLUDE (Id)

These queries have different ‘ideal’ indexes

But if every query gets an ideal index, we’ve got other problems

A compromise

Combined:•KEY (PostTypeId, CommentCount, Score)

• With Score third, DESC no longer helps

•INCLUDE (Id, ViewCount)

CREATE INDEX ix_Posts_PostTypeId_CommentCount_Score_INCLUDES

on dbo.Posts (PostTypeId, CommentCount, Score)

INCLUDE (Id, ViewCount)

WITH (ONLINE=ON);

Example query plans

logical reads 10,793

logical reads 3

KEY (PostTypeId, CommentCount, Score) INCLUDE (Id, ViewCount)

What if the order was different?

KEY (PostTypeId, Score, CommentCount) INCLUDE (Id, ViewCount)

logical reads 3

logical reads 34,980

What if we tried this?

KEY (PostTypeId) INCLUDE (Score, CommentCount, Id, ViewCount)

logical reads 34,754

logical reads 34,830

This requires a decision

Things to research•Will one of these queries run more often than the other?

•Does one of these queries need to be faster?

•Can either of these queries be cached in the application layer?

•How many other read and write queries run in the workload?

In general, one index that is great for one query and pretty good for the other is usually best

•KEY (PostTypeId, CommentCount, Score) INCLUDE (Id, ViewCount)

Takeaways

Consolidate as you write code

You can’t be psychic about how the

optimizer will work

Be conservative:•Don’t duplicate keys

•Don’t add included columns “just in case”

•Don’t index columns unless you have a reason

•If you add keys or includes to a nonclustered

index, drop the old index

Review for duplicates regularly

These always creep in•Indexes get added in an emergency

•Older indexes are meant to be dropped, but

aren’t

Or someone runs the Database Tuning

Advisor, which loves to create duplicates

and near-duplicates

Here’s How …

Idera Can Help You.

www.idera.com

top related