2015-12-21 index tuning conventional index. 2015-12-22 secondary index to speed up queries on...
TRANSCRIPT
23/4/18 1
Index Tuning
Conventional index
23/4/18 2
Secondary index
• To speed up queries on attributes not within primary key
• Primary index– Determine the placement of records in the data file– Each table has only one primary index
• Secondary– Only give the location of the records– One table may have multiple secondary index– Always dense
23/4/18 3
Secondary indexesSequencefield
5030
7020
4080
10100
6090
23/4/18 4
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Sparse index
302080
100
90...
does not make sense!
23/4/18 5
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Dense index10203040
506070...
105090...
sparsehighlevel
23/4/18 6
With secondary indexes:
• Lowest level is dense
• Other levels are sparse
Also: Pointers are record pointers
(not block pointers; not computed)
23/4/18 7
Application of secondary indexes in clustered file
• Given relations– Movie(title, year, length, incolor, studioName, producerC#)– Studio(name, address, presC#)
• Suppose the following query is typical– SELECT t i t l e , year
FROM Movie, StudioWHERE presC# = zzz AND Movie.studioName = Studio.name;
• Clustered file structure Secondary index on presC# can minimize disk I/Os!
23/4/18 8
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
23/4/18 9
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10101020
20304040
4040...
one option...
Problem:excess overhead!
• disk space• search time
23/4/18 10
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10
another option...
4030
20Problem:variable sizerecords inindex!
23/4/18 11
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10203040
5060...
Another idea (suggested in class):Chain records with same key?
Problems:• Need to add fields to records• Need to follow chain to know records
23/4/18 12
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10203040
5060...
buckets
Using Indirection!
23/4/18 13
Why “bucket” idea is useful
Indexes RecordsName: primary EMP (name,dept,floor,...)
Dept: secondaryFloor: secondary
We can use the pointers in the buckets to help answer queries without looking at most of records in the data file!
23/4/18 14
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index
Toy 2nd
Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s
23/4/18 15
This idea used in text information retrieval
Documents
...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
Inverted lists
cat
dog
23/4/18 16
IR QUERIES
• Find articles with “cat” and “dog”• Find articles with “cat” or “dog”• Find articles with “cat” and not “dog”
• Find articles with “cat” in title
• Find articles with “cat” and “dog” within 5 words
23/4/18 17
Common technique: more info in inverted list
cat Title 5
Title 100
Author 10
Abstract 57
Title 12
d3d2
d1
dog
typeposit
ion
locatio
n
23/4/18 18
Summary so far
• Conventional index– Basic Ideas: sparse, dense, multi-level…– Duplicate Keys– Deletion/Insertion– Secondary indexes
– Buckets of Postings List
23/4/18 19
Conventional indexes
Advantage:
- Simple- Index is sequential file
good for scans
Disadvantage:- Inserts expensive, and/or- Lose sequentiality & balance
23/4/18 20
Example Index (sequential)
continuous
free space
102030
405060
708090
39313536
323834
33
overflow area(not sequential)
summary