special chapter multidimensional indexesynucc.yu.ac.kr/~hrcho/courses/fs/special.pdf ·...
TRANSCRIPT
1
Special Chapter
MultidimensionalIndexes
Yeungnam University, Database Lab. Multidimensional Index - 1
Table of Contents
● Applications Needing Multiple Dimensions
● Hash-Like Structures for Multidimensional Data✔ Grid Files✔ Partitioned Hashing
● Tree-Like Structures for Multidimensional Data✔ KD-Trees✔ Quad Trees✔ R-Trees
● Bitmap Indexes
2
Yeungnam University, Database Lab. Multidimensional Index - 2
1. Applications
● Geographic Information Systems✔ Partial match query✔ Range query✔ Nearest-neighbor query✔ Where-am-I query
● Data Cubes✔ Example: Data mining information about sales.
– Sale(store, day, item, color, size): 5 dimensions– Customer(age, salary, zip-code, marital-status)
✔ “Give the sales of pink shirts for each store and eachmonth of 1998.”
Yeungnam University, Database Lab. Multidimensional Index - 3
Multidimensional Queries in SQL
● RDBS�����������
● Nearest-Neighbor Queries✔ Sample Relation: Points(x, y)✔ Sample Query
– (10, 20)������� Point�?✔ Sample SQL:
SELECT * FROM Points pWHERE NOT EXISTS(
SELECT * FROM POINTS qWHERE (q.x - 10) * (q.x - 10) + (q.y -20) * (q.y - 20) <
(p.x - 10) * (p.x - 10) + (p.y -20) * (p.y - 20);
3
Yeungnam University, Database Lab. Multidimensional Index - 4
Multidimensional Queries in SQL
● (10, 20)��������?✔ Sample Relation: Rectangles(id, xll, yll, xur, yur)✔ Sample SQL
SELECT id FROM RectanglesWHERE xll ≤ 10 and yll ≤ 10 and xur ≥ 20 and yur ≥ 20;
● Summarize the sales of pink shirts by day and store.✔ Sample Relation: Sales(store, day, item, color, size)✔ Sample SQL
SELECT day, store, COUNT(*) AS totalSaleFROM SalesWHERE item = ‘shirt’ AND color = ‘pink’GROUP BY day, store ;
Yeungnam University, Database Lab. Multidimensional Index - 5
Executing Queries Using ConventionalIndexes
● Range Queries: 2�����✔ x���� �, y���� �✔ �� �������������,�����✔ Example
–��• 1,000,000 points,� : 0 ~ 1000 (!"#$)• 100 points/block, B-tree leaf: 200 key-pointer pairs
– Sample Query: 450 ≤ x ≤ 550, 450 ≤ y ≤ 550– x�� ���: 501 disk I/O’s (root%&'()*)– y�� ���: 501 disk I/O’s–&'(+, > 1,000-.: 1000 disk I/O/0– Otherwise:����� �123� disk I/O
4
Yeungnam University, Database Lab. Multidimensional Index - 6
Executing Queries Using ConventionalIndexes
● Nearest-Neighbor Queries: 2�����✔ 45(6
–����� d�7�89 range query:;.– Range query�7<=��>?@AB.
✔ Discussion– There is no point within the selected range.– The closest point within the range might not be the
closest point overall.
C
CC
Closest pointin range
Possiblecloser point
Yeungnam University, Database Lab. Multidimensional Index - 7
Limitations of Conventional Indexes
● Range���� Query��������� �!"�#$%&' I/O().
● Multidimension� *��+,-./0.��� Index().
● �1-./0.� Concatenate2 Index?✔ Example: Index on x•y
5
Yeungnam University, Database Lab. Multidimensional Index - 8
2. Hash-Like Structures
● Grid Files
● Partitioned Hashing
● Comparison of Grid Files and Partitioned Hashing
Yeungnam University, Database Lab. Multidimensional Index - 9
2.1 Grid Files
Bi
Bj
Buckets100K50K10K5K2K1K
0 1 2 3 4 5 6
1 2 3 4 5 6
Linear Scale for balance
0
1
2
3
4
Central
Mianus
Perryridge
Towsend
1
4
3
2
Linear Scale forbranch-name
Grid Array
6
Yeungnam University, Database Lab. Multidimensional Index - 10
Example
● Who buys gold jewelry?✔ Sample Relation: Customer(age, salary)
(60, 260)(50, 275)(45, 350)(25, 400)
(30, 260)(85, 140)(70, 110)(50, 120)
(50, 100)(50, 75)(45, 60)(25, 60)
Yeungnam University, Database Lab. Multidimensional Index - 11
Example - A Grid File
* **** *
*
* **
*
*
0 40 55 100
90K
225K
500K
Age
Salary
7
Yeungnam University, Database Lab. Multidimensional Index - 12
Example - The Hash Table
0-90
90-225
225+
55+40-550-40
50, 12050, 100
50, 7545, 60
25, 60
25, 40030, 260
50, 27545, 350
60, 260
85, 14070, 110
Yeungnam University, Database Lab. Multidimensional Index - 13
Operations on Grid Files
● Lookup✔ Grid line�@DE & Bucket arrayFG✔ H: (50, 75)��
● Partial-Match Queries✔ H: Salary� $200K� Customer��
● Range Queries✔ Ranges define a region of buckets.✔ H: 35 < age ≤ 45; 50 < salary ≤ 100
● Nearest-Neighbor Queries✔ Lookup point P. Consider pointes in that bucket.✔ ��� bucket�I�JK pointLM?H: (45, 200)✔ Bucket� point�LM8NOPQR?
8
Yeungnam University, Database Lab. Multidimensional Index - 14
Operations on Grid Files
● Insertion✔ Point P�S�T bucket B�US Lookup.✔ B�S�VWXYLM8Z, No Problem.✔ If B is full:
– Add overflow block.• Overflow chainY[\]QR^_S8
– Grid LineP`,abcdK Grid Linee�.• Which dimension?• What position• Grid Line�AB�fg12 empty bucketh^
Yeungnam University, Database Lab. Multidimensional Index - 15
Example - Insertion
* **** *
*
* **
*
*
0 40 55 100
90K
225K
500K
Age
Salary
*130K
9
Yeungnam University, Database Lab. Multidimensional Index - 16
2.2 Partitioned Hashing
● !345
✔ A1, A2, …, An: Attributes✔ Each attribute is mapped to a fixed number of bi bits by a
hash function hi.✔ Bucket address B = b1 b2 … bn
✔ *�: Partial Match QueryNijk?
● Example: Gold-Jewelry✔ One bit from age (age module 2)✔ Two bits from salary (salary modulo 4)
Yeungnam University, Database Lab. Multidimensional Index - 17
Example
30, 26050, 100
70, 110
50, 7550, 275
25, 6025, 400
45, 350
25, 6025, 400
25, 6025, 400
111110101100011010001000
10
Yeungnam University, Database Lab. Multidimensional Index - 18
Grid File� Partitioned Hashing���
● Grid File is Good.✔ Range queryl Nearest-Neighbor query�m<n
● Partitioned Hashing is Good.✔ Directory Lookupo�Yp/0✔ Grid files suffer from sparseness if dimension is high.
–qr(srX� correlationLM�_ (bY,ot)– Partitioned hashing tends to distribute data evenly
among buckets.
● Both are Good.✔ Partial match query�NiV3uv.
Yeungnam University, Database Lab. Multidimensional Index - 19
3. Tree-Like Structures
● Multiple-Key Indexes
● KD-Trees
● Quad Trees
● R-Trees
11
Yeungnam University, Database Lab. Multidimensional Index - 20
3.1 Multiple-Key Access
● �6✔ Index on one attribute provides pointer to an index on the
other.
I1
I2
I3
Index onfirst attribute
Index onsecond attribute
Yeungnam University, Database Lab. Multidimensional Index - 21
Example
25304550607085
60400
60350
260
75100120275
110
140
260
12
Yeungnam University, Database Lab. Multidimensional Index - 22
Performance of Multiple-Key Indexes
● Partial-Match Queries✔ First attribute�$w9x�fg^_yz
● Range Queries✔ Root indexx{� �$w|% key@}P��✔ H: 35 ≤ age ≤ 55, 100 ≤ salary ≤ 200
● Nearest-Neighbor Queries✔ a( d��� Range Querydy~✔ x0 - d ≤ x ≤ x0 + d, y0 - d ≤ y ≤ y0 + d✔ � ���Y�ab, (x0, y0)���a(� d�v�QR, d���89v���.
Yeungnam University, Database Lab. Multidimensional Index - 23
3.2 KD-Trees
● �6✔ A main-memory data structure based on binary search trees.
– Can be fixed somewhat to adapt to the block model ofstorage.
✔ Levels rotate among the dimensions, partitioning the pointsby comparison with a value for that dimension.
13
Yeungnam University, Database Lab. Multidimensional Index - 24
Example - KD Tree
Salary 150
Age 47Age 60
Salary 80 Salary 300
Age 38
70, 110
85, 140
50, 275
60, 260
25, 400
45, 350
30, 26050, 100
50, 120
45, 60
50, 75
25, 60
Yeungnam University, Database Lab. Multidimensional Index - 25
Example - Partition
* **** *
*
* **
*
*
150K
60
47
80K
300K
38
14
Yeungnam University, Database Lab. Multidimensional Index - 26
Operations on KD-Trees
● Lookup✔ Binary search tree��!����d✔ �,� level�vn�|%qr(sr%yz
● Insert✔ Lookup�S�T block��✔ If block is full, split it.✔ H: (35, 500K)�S�
Yeungnam University, Database Lab. Multidimensional Index - 27
Example - Insert (35, 500K)
Salary 150
Age 47Age 60
Salary 80 Salary 300
Age 38
70, 110
85, 140
50, 275
60, 260
35, 500
45, 350
30, 26050, 100
50, 120
45, 60
50, 75
25, 60
Age 35
25, 400
15
Yeungnam University, Database Lab. Multidimensional Index - 28
Operations on KD-Trees
● Partial-Match Queries✔ Level���8%qr(sr@Y�PQR,'�������
✔ H: age = 50�'� point��
● Range Queries✔ � � internal node�@P$wVQR,'�������
✔ H: 35 ≤ age ≤ 55, 100 ≤ salary ≤ 200
● Nearest-Neighbor Queries✔ Range querydy~89�(.
Yeungnam University, Database Lab. Multidimensional Index - 29
Adapting KD-Trees to SecondaryStorage
● Multiway Branches at Interior Nodes✔ ���� fan-outP B-tree��!�j��d��.✔ ���: Reorganize� Range�=�T�_^.
● Group Interior Nodes into Blocks✔ Fan-out2 2d5�.✔ But,9�����}P8b� block���.✔ Risk
– It may not be possible to group nodes at lower levelsproductively.
16
Yeungnam University, Database Lab. Multidimensional Index - 30
3.3 Quad Trees
● �6✔ Nodes split along all dimensions at once.✔ Division fixed: by quadrants.
– Child pointer���: SW, SE, NW, NE✔ As for KD-Trees, we cannot make the number of levels
uniform.
● Operations on Quad Trees✔ Internal node� key@< search key�)�n ��fg 4�N child pointer�AB89 traverse.
✔ Insert� block fullYZ,�� region��K��=��dv� 4�� subregion�d#V. ←←←← ���?
Yeungnam University, Database Lab. Multidimensional Index - 31
Example - Partition
* **** *
*
* **
*
*
0 100
400K
Age
Salary
17
Yeungnam University, Database Lab. Multidimensional Index - 32
Example - A Quad Tree
50, 200
25, 30075, 100 70, 110
85, 140
50, 275
60, 260
25, 400
45, 350
30, 260
50, 100
50, 120
45, 60
50, 75
25, 60
SWSE NE
NW
Yeungnam University, Database Lab. Multidimensional Index - 33
3.4 R-Trees
● �6✔ Height-balanced tree similar to the B-tree.✔ Index record entries in each leaf node: (I, RID)
– I: n-dimensional rectangle which is the bounding boxof a spatial object.
– RID: A record identifier in the database.✔ Index record entries in each non-leaf node: (I, ChildPTR)
– I: The smallest rectangle that spatially contains therectangle in child node.
– ChildPTR: A pointer to a successor node in the nextlevel.
18
Yeungnam University, Database Lab. Multidimensional Index - 34
Example - 2 Dimensional Space
0 100
100
school
road1road2
house1 pipeline
house2
Yeungnam University, Database Lab. Multidimensional Index - 35
Example - An R-Tree
((0,0), (60, 50)) ((20,20), (100, 80))
road1 road2 house1 school house2 pipeline
19
Yeungnam University, Database Lab. Multidimensional Index - 36
Operations on R-Tree
● Lookup✔ Root nodex{ 5�8% pointer�$w8%
rectangleP recursive8¡��✔ 9��� RectangleLM�_
● Range Query✔ Query region<=¢|%'� rectangle��
● Insert a New Region R✔ RP$w8% rectangleYLMVQR
– Leaf BlockY fullY£¤QR,�¥¦S�– Leaf BlockY full�QR, Region Split.
✔ RP$w8% rectangleYLM8NOPQR– Extending region
Yeungnam University, Database Lab. Multidimensional Index - 37
Example - Extending Region
0 100
100
school
road1road2
house1 pipeline
house2
pop
house3
20
Yeungnam University, Database Lab. Multidimensional Index - 38
Performance of R-Tree
● �6✔ The coverage of a level
– The total area of all the rectangles associated with thenodes of that level.
✔ The overlap of a level– The total area contained within two or more nodes.
● * 78
✔ The minimum coverage reduces the amount of emptyspace covered by the nodes.
✔ The minimum overlap reduces the false drop problemwhich cause unnecessary searches of tree.
Yeungnam University, Database Lab. Multidimensional Index - 39
4. Bitmap Indexes
● Definition of Bitmap Indexes
● Motivation of Bitmap Indexes
● Compressed Bitmaps
● Managing Bitmap Indexes
21
Yeungnam University, Database Lab. Multidimensional Index - 40
4.1 Definition of Bitmap Indexes
● �6✔ ��:����3 = n,��� �%5�✔ Bitmap Index of a Field F
– Bit vectors of length n– F�'�@�����– i§¨���� F�@�d v��©QR, v���ª{�� i§¨ bit = 1.
● Example: Relation of (integer, string)✔ (30, foo), (30, bar), (40, baz), (50, foo), (40, bar), (30, baz)✔ Two types of bitmap indexes
– 30: 110001, 40: 001010, 50: 000100– foo: 100100, bar: 010010, baz: 001001
Yeungnam University, Database Lab. Multidimensional Index - 41
4.2 Motivation of Bitmap Indexes
● Bitmap Index��29�7:;�<✔ Space Overhead
– Field� m × n bit/0 (m:v«@���)– CompressionPY��� space overhead¬?�_
✔ ()����}
–����e�,®�TQR Bitmap Index(j¯– Field@}�e�,®�– Field@���n°� Bitmap Index��<�.
● = Bitmap Index������?✔ Partial-match query�m±n�Ni✔ Range query�²«�(
22
Yeungnam University, Database Lab. Multidimensional Index - 42
Bitmap Index���� Partial-MatchQuery ��
● Example✔ Sample Relation: Movie(title, year, length, studio-Name)✔ Sample SQL
SELECT Title FROM MovieWHERE studio-Name = ‘Disney’ and year = 1995;
✔ ��: year� studio-Name��� Bitmap IndexLM✔ Query�(jk
– year� 1995@��� bitmap B1@��
– studio-Name� ‘Disney’@��� bitmap B2@��
– Result = B1 & B2 (bitwise AND)– Result�� bit� 1����³��
Yeungnam University, Database Lab. Multidimensional Index - 43
Bitmap Index���� Range Query �
�
● Example: Gold-Jewelry1: (25, 60) 2: (45, 60) 3: (50, 75) 4: (50, 100)5: (50, 120) 6: (70, 110) 7: (85, 140) 8: (30, 260)9: (25, 400) 10: (45, 350) 11: (50, 275) 12: (60, 260)
✔ Bitmap Index for Age25: 100000001000, 30: 000000010000, 45: 01000000010050: 001110000010, 60: 000000000001, 70: 00000100000085: 000000100000
✔ Bitmap Index for Salary60: 110000000000, 75: 001000000000, 100: 000100000000110: 000001000000, 120: 000010000000,140: 000000100000, 260: 000000010001,275: 000000000010, 350: 000000000100,400: 000000001000
23
Yeungnam University, Database Lab. Multidimensional Index - 44
Bitmap Index���� Range Query �
�
● Example: Gold-Jewelry>?✔ Sample Query: 45 ≤ age ≤ 55, 100 ≤ salary ≤ 200✔ 45 ≤ age ≤ 55�@: 45, 50
– 45: 010000000100, 50: 001110000010–´@P bitwise ORo�
010000000100 | 001110000010 = 011110000110✔ 100 ≤ salary ≤ 200�@: 100, 110, 120, 140
–µ@P bitwise ORo�000100000000 | 000001000000 | 000010000000 | 000000100000 =
000111100000✔ 7<@P bitwise ANDo�
– 011110000110 & 000111100000 = 000110000000– 4§< 5§����>�7<
Yeungnam University, Database Lab. Multidimensional Index - 45
4.3 Compressed Bitmaps
● !345
✔ Bitmap Index:� Field�v m × n bit/0✔ Bitmap2 0Y1�¶d, 0�3�S�
– Field� key field�QR✔ *�: 0�3��¥¦ binarydS�8%·2��
–H: 000101, 010001, 010101✔ Run-length Encoding
– Determine how many bits of the binary representationof i has.
– j = the number of determined bits–>���¸7<
(j-1)�� 1 + 0 + binary representation of i
24
Yeungnam University, Database Lab. Multidimensional Index - 46
Examples
● Encoding✔ 00000000000001: i = 13 → j = 4 ⇒ 11101101✔ 1: i = 1 → j = 1 ⇒ 01, 0: 00✔ 100000001000: 00110111, 010000000100: 01110111
–*�:�N¹ 1vº� 0}2�|NO%v.–�7jk?
• »¼����3%½)DE!• ¾�N¹ 0�3�4£¿8%�?
● Decoding: 11101101001011✔ �ºx{ÀÁ�d�Â: j = 4 → 0Ã� 4bit = 13, 0, 3
Yeungnam University, Database Lab. Multidimensional Index - 47
4.4 Managing Bitmap Indexes
● Finding Bit-Vectors
● Finding Records
● Handling Modifications to the Data File
25
Yeungnam University, Database Lab. Multidimensional Index - 48
Finding Bit-Vectors
● @AB✔ Field@���n°� Bitmap Index��
● �CDE
✔ Bitmap Index = (Field@, Bit-vector)dÄ^Å Relation– Field@: Relation� Key Attribute
✔ Field@PY�89 Conventional IndexÄ^✔ Bit-vector��� blockingNi�_
Yeungnam University, Database Lab. Multidimensional Index - 49
Finding Records
● @AB✔ Bitwiseo�7<dh^Å Bitmap�Y�89:�������
● �CDE
✔ ���§Æ��� Conventional IndexÄ^✔ IndexÄ^:���§Æ→���S� �✔ Example:7< Bitmap = 000001000010
– 6§����S� �: Index�����
26
Yeungnam University, Database Lab. Multidimensional Index - 50
Handling Modifications to the Data File
● @AB✔ ����e�,®�d�����§ÆyQ✔ ���� Field@yQ
● FGH6I�,JA✔ ���®�:���§Æ�MÇ�ÈN✔ ���e�
–'� bitmap index� bit3 1��–���� Field@�fg bitmap index�ÉÊ– Bitmap index��%cdK Field@?
• cdK bitmap indexh^✔ ���� Field@yQ: old index -®�, new index -e�