2015-12-21 index tuning conventional index. 2015-12-22 secondary index to speed up queries on...

23/4/18 1

Index Tuning

Conventional index

23/4/18 2

Secondary index

• To speed up queries on attributes not within primary key

• Primary index– Determine the placement of records in the data file– Each table has only one primary index

• Secondary– Only give the location of the records– One table may have multiple secondary index– Always dense

23/4/18 3

Secondary indexesSequencefield

5030

7020

4080

10100

6090

23/4/18 4


5030

7020

4080

10100

6090

• Sparse index

302080

100

90...

does not make sense!

23/4/18 5


5030

7020

4080

10100

6090

• Dense index10203040

506070...

105090...

sparsehighlevel

23/4/18 6

With secondary indexes:

• Lowest level is dense

• Other levels are sparse

Also: Pointers are record pointers

(not block pointers; not computed)

23/4/18 7

Application of secondary indexes in clustered file

• Given relations– Movie(title, year, length, incolor, studioName, producerC#)– Studio(name, address, presC#)

• Suppose the following query is typical– SELECT t i t l e , year

FROM Movie, StudioWHERE presC# = zzz AND Movie.studioName = Studio.name;

• Clustered file structure Secondary index on presC# can minimize disk I/Os!

23/4/18 8

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

23/4/18 9


1020

4020

4010

4010

4030

10101020

20304040

4040...

one option...

Problem:excess overhead!

• disk space• search time

23/4/18 10


1020

4020

4010

4010

4030

10

another option...

4030

20Problem:variable sizerecords inindex!

23/4/18 11


1020

4020

4010

4010

4030

10203040

5060...

Another idea (suggested in class):Chain records with same key?

Problems:• Need to add fields to records• Need to follow chain to know records

23/4/18 12


1020

4020

4010

4010

4030

10203040

5060...

buckets

Using Indirection!

23/4/18 13

Why “bucket” idea is useful

Indexes RecordsName: primary EMP (name,dept,floor,...)

Dept: secondaryFloor: secondary

We can use the pointers in the buckets to help answer queries without looking at most of records in the data file!

23/4/18 14

Query: Get employees in

(Toy Dept) ^ (2nd floor)

Dept. index EMP Floor index

Toy 2nd

Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s

23/4/18 15

This idea used in text information retrieval

Documents

...the cat is fat ...

...was raining cats and dogs...

...Fido the dog ...

Inverted lists

cat

dog

23/4/18 16

IR QUERIES

• Find articles with “cat” and “dog”• Find articles with “cat” or “dog”• Find articles with “cat” and not “dog”

• Find articles with “cat” in title

• Find articles with “cat” and “dog” within 5 words

23/4/18 17

Common technique: more info in inverted list

cat Title 5

Title 100

Author 10

Abstract 57

Title 12

d3d2

d1

dog

typeposit

ion

locatio

n

23/4/18 18

Summary so far

• Conventional index– Basic Ideas: sparse, dense, multi-level…– Duplicate Keys– Deletion/Insertion– Secondary indexes

– Buckets of Postings List

23/4/18 19

Conventional indexes

Advantage:

- Simple- Index is sequential file

good for scans

Disadvantage:- Inserts expensive, and/or- Lose sequentiality & balance

23/4/18 20

Example Index (sequential)

continuous

free space

102030

405060

708090

39313536

323834

33

overflow area(not sequential)

summary