index management in shallow depth
DESCRIPTION
These slides afford in shallow depth the index management question. There are some example on how your choice can change your relation in terms of I/O accessesTRANSCRIPT
Index managementin shallow depth
Andrea Giuliano@bit_shark
Architecture of a DBMS
Andrea Giuliano @bit_shark
Disk managerThe disk manager provides the following commands for a page
• allocate a page • deallocate a page • read a page • write a page
The size of a page is chosen to be the size of a disk block and pages are stored as disk blocks so that reading or writing a page can be done in one disk I/O
Andrea Giuliano @bit_shark
Buffer manageris the software layer responsible for bringing pages from disk to main memory as needed and manages the available main memory by partitioning it into a collection of pages
the buffer pool
Andrea Giuliano @bit_shark
Mysql indexes are stored!in B-trees
B-tree structure
data entry: record stored in an index file data record: record stored in a database file
Andrea Giuliano @bit_shark
Clustered indexdata entries and data records have the same (or close) order
Andrea Giuliano @bit_shark
Unclustered indexdata entries and data records have the different order criteria
Andrea Giuliano @bit_shark
Dense indexAn index is dense if every value of the search key that appears in the data file appears also in at least one data entry of the index
Andrea Giuliano @bit_shark
An index is sparse if every value of the search key that appears in the data entry points to a page of the data record
Sparse index
Andrea Giuliano @bit_shark
Search in a b-tree
Cost: logF n
fan-out
n leaves
Andrea Giuliano @bit_shark
Insert and delete
..too deep
Insert and delete operations must keep the tree balanced towards split, redistribution and coalesce techniques.
Andrea Giuliano @bit_shark
How can I compute I/O accesses?
Andrea Giuliano @bit_shark
ask for code, author, publisher af all books with a given cost
how many page accesses do we need to answer to the query?
Book case
Query:
• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on
average) • dense non-clustering B+-tree index with search key cost
BOOK
codeauthorcostpublisher
Andrea Giuliano @bit_shark
Let’s build the index structure !• we know that each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20
Book case• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on
average) • dense non-clustering B+-tree index with search key cost
BOOK
codeauthorcostpublisher
Andrea Giuliano @bit_shark
We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)
How many leaves are there in the tree? 2.000.000/13 = 153.846 leaves
Book case
…
fan-out: 20
Andrea Giuliano @bit_shark
There are 153.846 leavesIn order to go to the leaves we need
Book case
log20 (153.846) = 4 I/O page accesses
…
fan-out: 20
Andrea Giuliano @bit_shark
Book case
Remember, we have on average 200 records with the same value of the attribute cost therefore 200/13 = 15 pages (on average) We need to visit these leaves because the index is dense and for each tuple we have to access the 200 data record in order to obtain the other attributes
…
fan-out: 20
Andrea Giuliano @bit_shark
The total cost is: 4 + 15 + 200 = 219 I/O accesses
Book case
~ 3 sec
…
fan-out: 20
Andrea Giuliano @bit_shark
• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on
average) • sparse clustering B+-tree index with search key cost
ask for code, author, publisher af all books with a given cost
how many page accesses do we need to answer to the query?
BOOK
codeauthorcostpublisher
Book case
Query:
Andrea Giuliano @bit_shark
Let’s build the index structure !• we know that each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20
Book case• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on
average) • sparse clustering B+-tree index with search key cost
BOOK
codeauthorcostpublisher
Andrea Giuliano @bit_shark
We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)BUT each data entry points to a data record page (and not to a tuple)
How many data record pages do we have? 2.000.000/10 = 200.000 data record pages
Book case
…
fan-out: 20
Andrea Giuliano @bit_shark
We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries) BUT each data entry points to a data record (and not to a tuple)
How many leaves there are in the tree? 200.000/13 = 15.384 leaves
Book case
…
fan-out: 20
Andrea Giuliano @bit_shark
There are 15.384 leavesIn order to go to the leaves we need
Book case
log20 (15.384) = 3 I/O pages accessesRemember, we have on average 200 data records with the same value of the attribute cost therefore 200/10 = 20 data record pages to visit
…
fan-out: 20
Andrea Giuliano @bit_shark
The total cost is: 3 + 20 = 23 I/O accesses
Book case
~ 0.3 sec
…
fan-out: 20
Andrea Giuliano @bit_shark
And what if the attributes we wantwere part of the search key?
Book case
Andrea Giuliano @bit_shark
In the worst case we have to visit all the 2.000.000 tuples
Book casewithout index
~ 50 min
Andrea Giuliano @bit_shark
Ο λογος δηλοι οτι
Think before doing
?Thanks!
Andrea Giuliano @bit_shark
References:
https://www.flickr.com/photos/james_wheeler/9340597900/sizes/o/in/photolist-fep1ko-bQByHk-duQ4Qr-82aKA9-82aL6y-8Tn6uc-iPzADZ-99etoQ-cZy6e9-jyqdnW-bxHjLf-8gP59X-cZDq3h-cZDq9d-cZDq8N-cZDq7d-cZDqwy-cZDqym-cZDqCf-cZDqAo-cZDqsS-cZDqnQ-cZDqey-cZDqkA-cZDqkJ-
cZDqLh-7Dg8pp-a7f1QC-a7c8rK-7Dg7n6-gCbBVr-9FZ4J1-e6XCpX-aZnsGv-ecTv5D-atFACM-gjXozL-9LBjtC-knoEf8-8LGGqw-a8Hw3M-gvL3bp-a7gmG6-aju6p2-
brQ76S-7Ckbm1-85XaXe-8JBcwN-9oYU3p-a3VsvR-atFAup/ http://www.woking.gov.uk/images/instances/00004A290FD4.C0A801BA.000079A7.0015.jpg
http://assets.20bits.com/20080513/b-tree.png, http://dblab.cs.toronto.edu/courses/443/2014/basic-index/dense-index.png http://dblab.cs.toronto.edu/courses/443/2014/basic-index/sparse-index.png
http://www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg, http://www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg http://webhostinggeeks.com/blog/wp-content/uploads/2012/07/611157_small.jpg
https://farm7.staticflickr.com/6237/6230474283_50d1f0f4ac_b.jpg, https://www.flickr.com/photos/javiercosio/6230474283/sizes/l/in/photolist-
auyNWp-9GRjnM-9GRjmZ-9GRjFz-9GRjvV-9GUcPq-9GRjz8-9GRjm4-9GUcTb-9GRjCt-9GUcQC-9GUcYm-9GRjZD-9GRk1Z-9GUcMU-9GUcGh-9GRjsg-9GRjYZ-9GRjTF-9GRjGe-9GRjNe-9GRjBz-9GUcNs-9GU
d25-9GUcKS-9GRjPn-9GRjRg-9GUcBh-9GUcVf-9GUcxj-9GUcuu-brLe7G-e8s4Cw-fyi4Rj-83LyYW-83HuFg-83LyLm-83LzUY-83Htrv-83Hv2H-83LBBb-83LAg9-83LBhQ-83Hw8t-83HtKD-83H
sYk-afT6uk-cwVhL1-ceVgGC-8tFezr-8SeW9d/