sql server storage and index structures physical data organization indexes b-trees sql server data...

SQL Server Storage andIndex Structures

Physical Data Organization

Indexes

B-Trees

SQL Server Data Access

Clustered and Non-Clustered

Creating, Altering, Dropping Indexes

Choosing your Indexes

Maintaining your Indexes

SQL Server Storage Hierarchy• Database

• Extent– 8 contiguous 64K data pages– Once extent full, next record will take up a

whole additional extent. – Pre-allocating space saves time.

SQL Server Storage Hierarchy• Page

– 64K bytes– # of records/page varies with bytes/record– Types of Pages: Data and Index pages– Page Split

• When page becomes full, it splits– New page allocated: ½ data from old page moved to new

• Rows– 8060 bytes and 1024 columns

B-tree Key Points to Remember• Tree portion includes key attributes only

– ordered as in create index statement

• Keys packed in index pages– Fewer bytes per key -> more keys per

page/extent -> fewer page faults per access.

• Clustered indexes have records at leafs– Records are in data pages– Data pages sequentially linked

• Non-Clustered indexes point into heap or tree portion of clustered index

Create Index Statement

• Create [unique] [clustered | nonclustered] index <indexName> on <table or view> (col. Name> [asc|desc] [,…]) include (<col name> [,…]) (with …

pad_index, fillfactor, ignore_dup_key drop_existing, statistics_norecompute sort_in_tempDB, online, allow_row_locks, allow_page_locks, maxdop

Create Index Details

• Asc/Desc– Ascending & descending sort order for index

• Include (cool!)– Includes col in leaf nodes of clustered index

• Allows very fast access to non-key attribute• Useful with very large record – fewer page faults

Create Index “with” Details

• Pad_Index= (on|off)– Initial fill-factor for index’s non-leaf pages

• Fill Factor = <1% – 100%>– Default is index pages are as full as possible minus

two records– Fill factor is how full after index is created

• Once split goes to 50%

• Ignore_dup_key– Circumvent unique key constraint somewhat

• Still get error message, but no rollback• useful for storing unique values but trashing transactions


• Drop_Existing– Any existing index with same name is

dropped with this create statement• More efficient than drop index followed by create

for clustered index as no need to touch non-clustered indexes or data pages

• Statistics_nonrecompute– Default: sql server automates the process of

updating the statistics on tables/ indexes– This option says you will maintain stats

• DON’T USE THIS!


• Sort_In_tempdb– Only useful when tempdb on physically

separate drive – Reads/write for sort compete with read/writes

to write data and index pages• This make sense if and only if you understand disk

writes - discussion

• Online– Keeps table available to users while creating

index – sounds good, but ….!!


• Allow row/page locks– Don’t use unless really good

• MAXDOP– Overrides system setting for max degree of

parallelism while building index• How many processes are used to construct an

index. MAXDOP sets limit on how many processors per operation.

– Compare and contrast these terms


• ON– Can store index separately from data

• Space for index spread across drives• I/O for indexes not compete with physical data

retrieval

XML Indexes

• Indexes into XML data– Xml VERY unstructured– Column can be of type xml in sql server– Create index on xml column – Page 276 for more details

Implied indexes

• created by some constraints – Primary Key– Unique

• Can easily end up with duplicate constraints and not realize it

Deciding what indexes go where?

• Indexes speed access, but costly to maintain– Almost every update to table requires altering

both data pages and every index.• All inserts and deletions affect all indexes• Many updates will affect non-clustered indexes

• Sometimes less is more– Not creating an index sometimes may be best

• Code for tranasaction have where clause? What columns used? Sort requried?

Deciding what indexes go where?

• Selectivity– Indexes, particularly non-clustered indexes,

are primarily beneficial in situations where there is a reasonably HIGH LEVEL of Selectivity within the index.

• % of values in column that are unique• Higher percentage of unique values, the higher the

selectivity– If 80% of parts are either ‘red’ or ‘green’ not very

selective

Choosing Clustered Index

• Only one per table! - Choose wisely• Default, primary key creates clustered index

– Do you really want your prime key to be clustered index?

– Option: create table foo myfooExample(column1 int identify primary key nonclustered column2 ….)

– Changing clustered index can be costly• How long? Do I have enough space?

Clustered Indexes Pros & Cons• Pros

– Clustered indexes best for queries where columns in question will frequently be the subject of

• RANGE query (e.g., between)• Group by with max, min, count

– Search can go straight to particular point in data and just keep reading sequentially from there.

– Clustered indexes helpful with order by based on clustered key

Clustered Indexes Pros & Cons

• The Cons – two situations– Don’t use clustered index on column just

because seems thing to do (e.g., primary key default)

– Lots of inserts in non-sequential order• Constant page splits, include data page as well as

index pages• Choose clustered key that is going to be sequential

inserting• Don’t use a clustered index at all perhaps?

Column Order Matters

• (P#, S#, Qty) – P# S# together are primary key

• One index that includes all columns is not useful in all situations!– Only end up storing data a second time.

• Clustered index of P#S# not same as S#P#– P#S# can lookup P# fairly easily, but looking up S#

requires a linear search.– S#P# can lookup S# fairly easily, but not P#.

• Note that even though key of S#P# means can’t lookup P# quickly, are some advantages in include P# in key.

Dropping Indexes

• Sometimes makes sense to constantly re-analyze situation and add indexes– DON’T FORGET TO DROP INDEXES!!

• Big overhead for inserts and deletes

– Always ask yourself: “Can I get rid of any of these?”

– Drop INDEX <TABLE NAME> <Index name>

Index Tuning Wizard

• Hopefully you will evolve to the point you don’t need to use this gadget– But still can be quite handy

• Uses workload file generated using sql server profiler (ch 19)

• Not ideal to depend on this tool, but it may make some suggestions that you have not thought of.

Maintaining Indexes

• Page Splits – Insert/delete order and rate critical

• Fragmentation– Not OS fragementation – e.g. defrag tool– Happens when database grows, pages split,

and then data eventually deleted.– Btrees great on maintaining balance on

insertions, but with deletes, can end up with many pages containing small # of records.

Fragmentation Problems

• Wasted space– Sql server allocates an extend at a time

• Could end up with an extent, containing single page, with single record.

• Thrashing (way too many disk hits)– Could end up with page 1 of data on one

extend, page 2 on another, page 3 on the first, page 4 on another, ….

– Records all over the place• Bit better for inserts but really bad for reads!

Identifying Fragmentation vs. page splits

• DBCC SHOWCONTIG– Page 283– Demo with northwind

DBREINDEX & Fillfactor

• DBCC DBREINDEX– Can drop index and rebuild

• Usually best to use drop-existing

– Completely rebuilds the index– If supply table name, rebuilds all indexes on

table.• Re-establishes base fillfactors etc.

– Strongly recommend disallow transactions while doing this.

– Rebuilding is probably better.

Summary• Clustered indexes usually faster than non-

clustered• Only place non-clustered indexes on

columns with high selectivity (>95% of rows are unique on that column)

• All data manipulation language statements can benefit, from indexes, but inserts, deletes, and updates are slowed.

• Indexes take up space and require page hits.

Summary• Index used only if first column in index is

relevant to query

• Indexes can hurt as much as they help– Make sure don’t add one by accident.

• Indexes can provided structured data performance to unstructured XML, but overhead involved.

Summary• Is there a high level of selectivity on the

data?– if yes and is frequently target of where clause,

then add index

• Have I dropped indexes I no longer need?– Why not?

• Do I have a maintenance strategy established?– Why not?

Critical Questions• Are there lots of inserts of modifications to

this table?– If yes, keep indexes to minimum

• Is this a reporting table? – E.g. not many inserts but lots of reports run many

different ways– If yes, more indexes are fine.

• Is there a high level of selectivity on the data?– If yes and is frequently target of where clause,

then add index

sql server storage and index structures physical data organization indexes b-trees sql server data...

Documents