2015-12-21 index tuning conventional index. 2015-12-22 secondary index to speed up queries on...

21
22/3/26 1 Index Tuning Conventional index

Upload: stephen-knight

Post on 13-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 1

Index Tuning

Conventional index

Page 2: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 2

Secondary index

• To speed up queries on attributes not within primary key

• Primary index– Determine the placement of records in the data file– Each table has only one primary index

• Secondary– Only give the location of the records– One table may have multiple secondary index– Always dense

Page 3: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 3

Secondary indexesSequencefield

5030

7020

4080

10100

6090

Page 4: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 4

Secondary indexesSequencefield

5030

7020

4080

10100

6090

• Sparse index

302080

100

90...

does not make sense!

Page 5: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 5

Secondary indexesSequencefield

5030

7020

4080

10100

6090

• Dense index10203040

506070...

105090...

sparsehighlevel

Page 6: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 6

With secondary indexes:

• Lowest level is dense

• Other levels are sparse

Also: Pointers are record pointers

(not block pointers; not computed)

Page 7: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 7

Application of secondary indexes in clustered file

• Given relations– Movie(title, year, length, incolor, studioName, producerC#)– Studio(name, address, presC#)

• Suppose the following query is typical– SELECT t i t l e , year

FROM Movie, StudioWHERE presC# = zzz AND Movie.studioName = Studio.name;

• Clustered file structure Secondary index on presC# can minimize disk I/Os!

Page 8: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 8

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

Page 9: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 9

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10101020

20304040

4040...

one option...

Problem:excess overhead!

• disk space• search time

Page 10: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 10

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10

another option...

4030

20Problem:variable sizerecords inindex!

Page 11: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 11

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10203040

5060...

Another idea (suggested in class):Chain records with same key?

Problems:• Need to add fields to records• Need to follow chain to know records

Page 12: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 12

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10203040

5060...

buckets

Using Indirection!

Page 13: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 13

Why “bucket” idea is useful

Indexes RecordsName: primary EMP (name,dept,floor,...)

Dept: secondaryFloor: secondary

We can use the pointers in the buckets to help answer queries without looking at most of records in the data file!

Page 14: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 14

Query: Get employees in

(Toy Dept) ^ (2nd floor)

Dept. index EMP Floor index

Toy 2nd

Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s

Page 15: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 15

This idea used in text information retrieval

Documents

...the cat is fat ...

...was raining cats and dogs...

...Fido the dog ...

Inverted lists

cat

dog

Page 16: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 16

IR QUERIES

• Find articles with “cat” and “dog”• Find articles with “cat” or “dog”• Find articles with “cat” and not “dog”

• Find articles with “cat” in title

• Find articles with “cat” and “dog” within 5 words

Page 17: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 17

Common technique: more info in inverted list

cat Title 5

Title 100

Author 10

Abstract 57

Title 12

d3d2

d1

dog

typeposit

ion

locatio

n

Page 18: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 18

Summary so far

• Conventional index– Basic Ideas: sparse, dense, multi-level…– Duplicate Keys– Deletion/Insertion– Secondary indexes

– Buckets of Postings List

Page 19: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 19

Conventional indexes

Advantage:

- Simple- Index is sequential file

good for scans

Disadvantage:- Inserts expensive, and/or- Lose sequentiality & balance

Page 20: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

23/4/18 20

Example Index (sequential)

continuous

free space

102030

405060

708090

39313536

323834

33

overflow area(not sequential)

Page 21: 2015-12-21 Index Tuning Conventional index. 2015-12-22 Secondary index To speed up queries on attributes not within primary key Primary index –Determine

summary