9. multi-dimensional access paths - tu kaiserslautern · 2011-06-27 · quad trees basic problems...

22
Realization of DBS 9. Multi-dimensional Access Paths Theo Härder www.haerder.de Goals - Design principles for access paths to the records of a table, which support several search criteria in a symmetric way - Indexing of point objects and expanded objects © 2011 AG DBIS Realization of Database Systems – SS 2011 Main reference: Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 9. Volker Gaede, Oliver Günther: Multidimensional Access Methods, ACM Computing Surveys 30:2, June 1998, pp. 170-231. Realization of DBS Quad trees Basic problems Classification – access paths and queries Access Paths – Forest of Descendents Grid file Multi-key hashing k-d trees R- and R + - tree © 2011 AG DBIS Applications of Geo-DBS 9-2

Upload: others

Post on 27-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

9. Multi-dimensional Access Paths

Theo Härderwww.haerder.de

Goals- Design principles for access paths to the records of a table,

which support several search criteria in a symmetric way- Indexing of point objects and expanded objects

© 2011 AG DBIS

Realization of Database Systems – SS 2011

Main reference:Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 9.

Volker Gaede, Oliver Günther: Multidimensional Access Methods, ACM Computing Surveys 30:2, June 1998, pp. 170-231.

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Access Paths – Forest of Descendents

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

© 2011 AG DBIS

Applications ofGeo-DBS

9-2

Page 2: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Genealogy of Access Paths

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

© 2011 AG DBIS

Applications ofGeo-DBS

9-3

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Multi-dimensional Access Paths

Classification of queries

Basic problems• Indexing new data types – which ones?

Organization of spatial data

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

• Organization of spatial data• Preservation of topology (clustering)• Object representation

Organization of data records• Quadtrees• Multi-key hashing

Organization of embedding data space k d

© 2011 AG DBIS

Applications ofGeo-DBS

9-4

• k-d trees • Grid file

Access paths for expanded objects• R-tree• R+-tree

Page 3: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Classification of Query Types

Definitions• A file is a collection of N records of type R = (A1, ..., An), where each record represents a

point object by an ordered n-tuple t = (a1, a2, ..., an) of values. The attributes A1, ..., Ak(k n) be keys.

• A query q specifies some conditions which must be satisfied by the key values of the records in the result set.

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

• Intersection queries: qualifying objects overlap with the query window• Containment queries, enclosure queries: qualifying objects are entirely contained in

the query window or completely contain the query window

Classification of intersection queries1. Exact match query: specifies a value for each key:

Q = (A1 = a1) (A2 = a2) ... (Ak = ak)2. Partial match query: specifies s < k key values

Q = (Ai1 = ai1) (Ai2 = ai2) ... (Ais = ais)

© 2011 AG DBIS

Applications ofGeo-DBS

9-5

Q ( i1 i1) ( i2 i2) ( is is)with 1 < s < k und 1 < i1 < i2 < ... < is < k

3. Range query: specifies a range ri = [li < ai < ui] for each key Ai

Q = (A1 = r1) ... (Ak = rk) (A1 > l1) (A1 < u1) ... (Ak > lk) (Ak < uk)

4. Partial range query: specifies a range for s < k Q = (Ai1 = ri1) ... (Ais = ris)with 1 < s < k und 1 < i1 < ... < is < k und rij = [lij < aij < uij], 1 < j < s

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Classification of Query Types (2) General range query

• exact range [li = ai = ui]• infinite range [- < ai < ] All 4 types of intersection queries can be expressed as a general range query

Classification of containment queries• Point query: Given a point in data space D find all objects which do contain it

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

• Point query: Given a point in data space D, find all objects which do contain it. • Region query: Given a query window, find all objects intersecting with it

(enclose it, are contained in it).

Examples: search for rectangles

RECTANGLES (x1, y1, x2, y2)

(x1, y1)

(x2, y2)

a) Find all rectangles which contain point (2 5)

© 2011 AG DBIS

Applications ofGeo-DBS

9-6

a) Find all rectangles which contain point (2,5)SELECT x1, y1, x2, y2 FROM RECTANGLESWHERE x1 <= 2 AND x2 >= 2 AND y1 <= 5 AND y2 >= 5

b) Find rectangles with point (1,3) as corner point left belowSELECT x1, y1, x2, y2 FROM RECTANGLESWHERE x1 = 1 AND y1 = 3

These all are harmless queries to be answered without pain even with SQL

Page 4: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

3 Future Research Topics: Semantics, … Goals of DBMS development

• last 20 years: performance, availability, functionality• next 20 years: semantics (but we will always obtain only a bit more!)

From equality to approximation!Concept of similarity: many forms, interpretations, applications

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

p y y , p , pp• Information Retrieval, Geo Databases• Classification, Cluster Analysis (Data Mining)• Pattern Recognition, Case-based Reasoning, …

Search for “politician” as a term in a query

politicia

represent

congress

politicaperson

politicallyrested cit

© 2011 AG DBIS

Applications ofGeo-DBS

9-7Douglas R. Hofstadter: “The Location of Meaning” in : Godel, Escher, Bach: An Eternal Golden Braid, Basic Books; 20 Anv edition, 1999

Often implicit assumption: Standardized (meta-)schema for all data sources is existing andcomplete, correct, consistent with true and trustworthy content

10

Domain of uncertainty for similarity problemssufficientinsufficient don’t know

ntative

sman

alny inte-tizen

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Classification of Query Types (3)

Best-match query, nearest-neighbor query• Desired object does not exist Query for objects as similar as possible

XX

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

existing objectsX

X

desired object

distance

• "best" is determined via different kinds of distance functions

Examples• Object only satisfies 8 of 10 requested properties

© 2011 AG DBIS

Applications ofGeo-DBS

9-8

• Object is described by synonyms

Determination of the nearest neighborD = distance function, B = collection of points in the k-dimensional spaceWanted: nearest neighbor of p (in B)

nearest neighbor is q, if

︶D ︵ D q r B ︶ r ︵ p q,p) (r,

Page 5: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Fundamental Problems1. Preservation of topological structure

X XXXXXX

X XX XX XX XX XX XX X

DBuckets B

X X

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

2. Strongly varying density of objects

XX

XXX

XXX

XX

D

Strong variation of the spatial allocation over time

no equi distant division of D

XX X

XXXX X

X XX X

XXXX X

X

© 2011 AG DBIS

Applications ofGeo-DBS

9-9

Xno equi-distant division of D

but: uniform bucket size (on disk)

3. Object representation- point objects- expanded objects

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Fundamental Problems (2)

4. Dynamic reorganization

X XXX

D Dafter severalinsertions and

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

X

X XX

XXX

XXX

XXX

XXXB B

5. Balanced access structure

- Arbitrary allocations and insertion/deletion sequences

deletions

© 2011 AG DBIS

Applications ofGeo-DBS

9-10

y / q

- Symmetric access via all dimensions!

- Guarantee of uniform and logarithmic access

2 or 3 disk accesses

Page 6: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Use of One-Dimensional Access Paths

So farIndexing (Inversion) of a single dimension, e.g. B*-tree

Decomposition principle of the key space for B*-trees (2-dim.)

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

X XX

X X X

X X

X

X

XX

Key2

Key1

K2m

K21 X XX

X X X

X

X

X

XX

Key2

Key1

K2m

K21

X

© 2011 AG DBIS

Applications ofGeo-DBS

9-11

K11 K1n

B*-tree (Key1)

Partitioning of space according to values of Key1

K11 K1n

B*-tree (Key2)

Partitioning of space according to values of Key2

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Multi-Attribute Search

Access to K2j) (Key2 K1i) (Key1

OR

AND...

• Pointer list for K1i: from B*-tree (Key1)• Pointer list for K2j: from B*-tree (Key2)

Key2

K2j

AND

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

merge + access to result tuples

Key1K1i large pointer lists and intermediate results

Simulation of multi-dimensional access using a B*-tree?

Key1| Key2

Idea:concatenated keys andconcatenated values:

Search Operations:

© 2011 AG DBIS

Applications ofGeo-DBS

9-12

Key1| Key2K11 | K21K11 | K22

…K11 | K2mK12 | K21K12 | K22

…⁞K12 | K2mK13 | K21

… ⁞K1n | K2m

Search Operations:- (Key1 = K1i) AND (Key2 = K2j) ?- Key2 = K2j ?- Key1 = K1i ?- OR predicates ?

Page 7: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Quadtrees (Point Quadtree, Quadrant Tree) Storage structure

• for 2-dimensional multi-attribute access• Decomposition principle of D: recursive partitioning by quadrants

Realization as Generalization of the binary tree• Each node contains a record, out-degree of a node: max. 4

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

, g• Root divides 2-dimensional space in 4 quadrants • Recursive division of each quadrant by the root of a subtree• i-th subtree of a node contains points in the i-th quadrant

Goal: consideration of neighborhood relationships• Example: geographical data with coordinates x and y

N

II = NW I = NE

© 2011 AG DBIS

Applications ofGeo-DBS

9-13

E

S

W

IV = SEIII = SWI + III closedII + IV open

Generalization:• k-dimensional keys => out-degree of each node: 2k

• k=3: octtree, k=4: hextree

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Point Quadtree Node format

x yinfo

NE NW SW SE

Spatial divisionF

SB

DATR

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

NE NW SW SE KL

Properties• Tree structure dependent on

insertion sequence (unbalanced)• Expensive deletions (re-insertion of

the subtree)• No mapping onto pages KL

PS

KA

MALU

HD

© 2011 AG DBIS

Applications ofGeo-DBS

9-14

o app g o to pages

F SB PS KA

DA TR MA LU

HD

Query evaluation?

Page 8: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Region Quadtree

=1

2 3

4

Idea: recursive decomposition of surface areas into grids (having certain properties per grid)e.g., color, height, …, pixel value

Goal:space-saving representationand simple manipulation set-theoretic operations!

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

=

. . .1 2 4 8

12

3

67

8

4

set-theoretic operations!

Same principle for 3-dim. data:decompose space into cubes

© 2011 AG DBIS

Applications ofGeo-DBS

9-15

Some common uses of quadtrees• Image representation • Spatial indexing• Efficient collision detection in two dimensions • Storing sparse data, such as a formatting information for a spreadsheet or for some

matrix calculations • Solution of multidimensional fields (computational fluid dynamics, electromagnetism)

Quadtrees are the two-dimensional analog of octtrees.

7

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Multi-Key Hashing (Partitioned Hashing) Decomposition principle of D

Partitioning by hash functions in each dimension: is realized by division of bucket address in k fragments (k = number of keys)

For each key i (i=1, 2, …, k)l h h f h d f h b k dd

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

a special hash function hi determines part of the bucket address • number of buckets be 2B (address = sequence of B bits)

• each hash function hi delivers bi bits

Idea: separate contributions of k keys to address

k

1i ib B

Key A1 A2 A3Value a a a

A1 | A2 | A3: h(a11 | a22 | a33) only supports point queries!

© 2011 AG DBIS

Applications ofGeo-DBS

9-16

• record t = (a11, a22, ..., aki, ...) is stored in bucket with

addr. = h1(a11) | h2(a22) | ... | hk(aki)

Value a11 a22 a33

Page 9: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Multi-Key Hashing: Example

y

x

x x011

010

001b2=3

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree Application example

bucket with address ‘10010’ contains all records with h1(a1) = ‘10’ and h2(a2) = ‘010’

xx x000

b1=2

00 01 10 11

© 2011 AG DBIS

Applications ofGeo-DBS

9-17

Eno: INT (5), b1 = 4; SSno: INT (9), b2 = 3; DName: CHAR (10), b3 = 2; B=9 (512 buckets)

h1(Eno) = Eno mod 16 h1(58651) = 11 -> 1011h2(SSno) = SSno mod 8 h2(130326734) = 6 -> 110h3(DName) = L(DName) mod 4 h3(XYZ55) = 1 -> 01

B-addr. = 101111001

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Multi-Key Hashing (2)

Number of accessesExact-match queries: access to 1 BucketPartial-match queries:Eno = 58651 NB = 29/24 = 32Eno = 73443 AND SSno = 2332 NB = 29/24+3 = 4

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

Eno 73443 AND SSno 2332 NB 2 /2 4

Pros• No index, little space requirements and update overhead• Exact-match queries: entire address is known access to 1 bucket• Partial-match query : pruning of search space

(Ai = ai): number of buckets to be searched is reduced by

ib2

ii bBbBB 22/2N

© 2011 AG DBIS

Applications ofGeo-DBS

9-18

Cons / problems• Topological structure of data is not preserved• No support of range- and best-match queries• Optimal allocation of bi to Ai dependent on query frequencies

Page 10: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Evaluation of Approaches

Comparison Quadtree Multi-KeyHashing k-d Tree Grid File R-Tree

Dynamicreorganization

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

reorganization

Preservation of topological structure (cluster)

Determination of neighborhood: range queries, best match

© 2011 AG DBIS

Applications ofGeo-DBS

9-19

Separation of accessstructure and data

Exact query(balanced structure)

Object representation

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Multi-Dimensional Binary Search Trees Extension of the binary tree

• Consideration of k keys: k-d tree• All data records are organized using the

tree structure: node-oriented (homogeneous)• Maintenance operations as in the case of the

binary search tree but

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

binary search tree, but

• Key comparison applies for one of the k keys at each level

Discriminator determines key at each level • Cyclic variation of discriminator d: for all nodes of tree level i holds: d = (i mod k) + 1• Left (right) successor to a node contains all records having smaller (larger) values for the

discriminator attribute

(P)K (R)K :HISON(P) R(P)K(Q)K :LOSON(P)Q

dd

dd

© 2011 AG DBIS

Applications ofGeo-DBS

9-20

Tree structure is not balanced – various proposals for extensions• To a record, m-1 succeeding levels of the tree are completely mapped into a page

- page must be able to store up to 2m - 1 records

- each record at level 1, m+1, 2m+1, ... forms root of a subtree stored in a page

Further variants of the k-d tree• Leaf-oriented (heterogeneous, storage of records in buckets)

dd

Page 11: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Example: 3-d Tree (Node-Oriented)

Age Salary Location

352829254029

17K40K15K45K12K16K

KLFDAKLSBB

discr.

35 17K KL 1

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

29304225

16K17K100K14K

BFFB

P i f t ?

Age Salary Loc.

28 40K F 2

29 15K DA 3 25 45K KL 3

40 12K SB 2

42 100K F 3

© 2011 AG DBIS

Applications ofGeo-DBS

9-21

- Processing of query types?

- Tree structure dependent on insertion sequence (unbalanced tree)

- Pruning of search space for partial-match- and range queries

- Deletion is very difficult

29 16K B 1 30 17K F 1

25 14K B 2

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Organization of Embedding Data Space –Divide and Conquer

Decomposition principle of D• D is dynamically partitioned in cells, objects of a cell are stored as records in buckets • In case of bucket overflow: local cell refinement

Divide and Conquer• Piecewise preservation of topology (clustering)• Tree as access structure for the buckets only works as directory

E l H t k d t k 2

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

Example: Heterogeneous k-d tree – k=2

XXX X

60

20

50 70

X

XX

A2=Y

A1=X

100

0 100

D

b = 3X

x = 50 1

X

y = 20 2y = 60 2

x = 70 1

buckets

© 2011 AG DBIS

Applications ofGeo-DBS

9-22

50 70 A1 X0 100

Properties of k-d tree• Clustering by buckets is prerequisite for practical use• Built-in balancing mechanism does not exist• How are modification operations (deletion!) done?• How are the various query types supported?

Page 12: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Organization of Embedding Data Space –Dimension Refinement

Principle• Data space D is dynamically partitioned by an orthogonal grid such that

k-dimensional cells (grid blocks) emerge• The objects contained in the cells are stored in buckets • Therefore, a cell is uniquely allocated to a bucket

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

• Class-building property: principle of dimension refinement which refines a segment in the selected dimension by a complete cut through D

ExampleA3

w2

w1

u1

data space D = A1 x A2 x A3cell partition P = U x V x Wsegments of partition U = (u1, u2, … ul)

V = (v1, v2, …, vm)W = (w1, w2, …, wn)

© 2011 AG DBIS

Applications ofGeo-DBS

9-23

A2u1

u2

u3

A1

v1 v2 v3

v2 v3 v4

Three-dimensional data space D with cell partition P; visualization of a split event in interval v2

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Dimension Refinement

A2

w2

w1

u1

u2

A3

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

Problems of dimension refinement• How many new cells emerge each time?

u2

u3

A1

v1 v2 v3

v2 v3 v4

© 2011 AG DBIS

Applications ofGeo-DBS

9-24

• What is the consequence for bucket allocation?

• Which mapping methods can be chosen?

• Are there restrictions for the determination of dimension refinement?

Page 13: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Grid-File: Idea

DB k t

Goals• Preservation of topology• Efficient support of all query types• Reasonable storage occupancy

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

X

XX X

X

X

GD

Buckets b=3

© 2011 AG DBIS

Applications ofGeo-DBS

9-25

X

X

X

X

XX

X

X

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Grid-File: Idea (2)D

GD

BucketsX

X

X

X

X

X X

X

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

X

X X X

X X

X

XXX

X

X

X

© 2011 AG DBIS

Applications ofGeo-DBS

9-26

Properties• 1:1 relationship between cell Ci and element of GD• Element of GD = Ptr. to bucket B• n:1 relationship between Ci and B

X

XX

XX X

X

X

Ci

Page 14: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Grid-File2: Idea (3)

Decomposition principle of D: dimension refinement

X

Dn

GD

Buckets

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

X

X

X X

X

XC

0

0 mS1

S2

X X

XX

XX X

B

© 2011 AG DBIS

Applications ofGeo-DBS

9-272. Nievergelt, J. et al.: The Grid File: An Adaptable, Symmetric Multikey File Structure, ACM Trans. Database System, 1984, pp. 38-71

Components• k scales (scaling vectors) define cells (grid) in the k-dimensional data space D• Cell- or grid directory GD: dynamic k-dim. matrix for the mapping of D onto the

set of buckets• Bucket (B): storage of objects of one cell (C) or several cells

(bucket can cover a range of cells: bucket region))

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Grid File - Example Stepwise development of a GF9 D

X X

X

X

A B

GD

1

situation aX

XX

X

A

B

Buckets

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

0a t z

A B1

1 2S2

S1

b=3

9

0a t

D

X X

X

X

z

A C

GD

1

1 2S2

S1

X XX

B

3g

situation bX

XX

X

A

B

Buckets

X

X

X

C

© 2011 AG DBIS

Applications ofGeo-DBS

9-28

S1

Buckets9

0a t

D

X X

X

X

zA C

GD

1

1 2S2

S1

X XB

3

A C2 D

X

X

7

g

situation c

X

XX

X

A

B

X

X

X

C

DX

Page 15: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Central Data Structure: Grid Directory Requirements

• Principle of two disk accessesindependent of value distributions, operation frequencies and number of stored records

• Split- and merge operations only on two buckets at a time• Storage occupancy

A f b k t t bit il ll

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

- Average occupancy of buckets not arbitrarily small - Skewed distributions only enlarge G

b=2D

© 2011 AG DBIS

Applications ofGeo-DBS

9-29

Design of a directory structure• Dynamic k-dim. matrix GD (on disk)• k one-dim. vectors Si (in memory)

xx

xx xx

xx

xxxxxxxx

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Central Data Structure: Grid Directory (2)

Operations on GD• Direct access to a GD entry• Relative access (NEXTABOVE, NEXTBELOW)• Merge of 2 neighbored entries of a dimension (renaming of affected entries)• Splitting of an entry of a dimension (with renaming)

B h d i t f ll t b k t

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree ce

llsbu

cket

po

ol

Box-shaped assignment of cells to buckets

© 2011 AG DBIS

Applications ofGeo-DBS

9-30

Page 16: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Grid File – Search Queries

Exact match querySELECT *

FROM EMPWHERE Loc=‘KL‘AND Dno = ‘K55‘

[ . . .]B

S1: AA, DA, FR, MK, ZZ S2: K00, K17, K39, K52, K89, K99

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

[ KL, K55]

[ . . .]

GD pageGD

Range query• Determination of scale values

in each dimension

© 2011 AG DBIS

Applications ofGeo-DBS

9-31

• Computation of qualified GD entries• Access to the GD page(s) and

fetch of the referenced buckets

range query

D

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Bucket Regions (400 Insertions, b = 20)

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

© 2011 AG DBIS

Applications ofGeo-DBS

9-32Uniform distribution Non-uniform distribution

Page 17: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Grid File – Performance Considerations

Growth of GD is super-linear• In case of uniform object distributions:

O(N1+(k-1)/(k*b))

• In case of skewed object distributions up to:

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

Refinement in dimension k• Number of added GD entries: n = (m1*m2*...*mk-1) • Required disk accesses (E = #GD entries):

Storage occupancy for buckets:

Problem cases for the mapping:

© 2011 AG DBIS

Applications ofGeo-DBS

9-33

Problem cases for the mapping:• Low selectivity and unfavorable distribution

S1 : a, d, d, d, f, s, ...S2 : 0, 5, 5, 5, 7, 9, ...

• How many GD entries can be delivered by an exact query ?

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Access Paths for Expanded Spatial Objects Expanded objects own

• General properties such as name, composition, . . .• Location and geometry (curve, polygon, . . .)

Indexing of spatial objects• Exact representation?• Object approximation by box-shaped envelopment – effective!

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

Object approximation by box shaped envelopment effective! however, false drops (mismatches) are possible

Problems• Besides object density, the object expansion has

to be regarded when mapping and refining the object• Objects can contain others or can mutually overlap

Classification of solution approaches• Mutually overlapping regions (R-tree)• Clipping (R+-tree)• Transformation approach

© 2011 AG DBIS

Applications ofGeo-DBS

9-34

Transformation approach

functionally maps expanded objects onto higher-dimensional points

Page 18: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

R-Tree3

Goal: Efficient management of spatial objects (points, polygons, cuboids, ...)

Applications• Cartography: storage of geographic maps, efficient evaluation of

“geometrical” queriesCAD i l ti f f l d lid ( t l i f

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

• CAD: manipulation of faces, volumes and solids (e.g. rectangles in case of VLSI design )

• Computer Vision and Robotics

Key operations• Point queries: Find all objects which contain a given point • Region queries: Find all objects which overlap with a given search window

(... enclose, ... are completely contained in)

© 2011 AG DBIS

Applications ofGeo-DBS

9-35

Approach: Storage and search of axis-parallel rectangles• Objects are represented by data rectangles and must be described by

Cartesian coordinates • Representation in the R-tree happens by minimally enclosing

(k-dimensional) rectangles/regions• Search queries also refer to rectangles/regions

3. Guttman, A.: R-Trees: A Dynamic Index Structure for Spatial Searching, in: Proc. ACM SIGMOD Conf., 1984, pp. 47-57

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

R-Tree (2)

R-tree is height-balanced multi-way tree• Each node corresponds to a page• Maximally M, minimally m (>= M/2) entries per node

leaf node entryleaf node entry

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

l1 l2 . . . lk TID

l1 l2 . . . lk PID

smallest circumlocutory rectangle (data rectangle) for TID

intervals describe smallest circumlocutory data region for all objects contained in PID

ea ode e t y

intermediate node entry

ea ode e t y

© 2011 AG DBIS

Applications ofGeo-DBS

9-36

lj = closed interval regarding dimension j

TID: reference to object PID: reference to son

Properties• Strong overlap of circumlocutory rectangles/regions possible on all tree levels• Search for rectangles/regions has to traverse several subtrees, if necessary• Maintenance operations similar to such in B-trees

Page 19: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Mapping of the R-Tree

Division of data space D D

E

F G

H

K

J

I

B

A

d

datarectangle

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

N M L

C

data region

R-tree A BA C

H I JD E F L M NG K

Example for a

© 2011 AG DBIS

Applications ofGeo-DBS

9-37

Example for a “bad” search window

DE

FG

N M L

H

K

J

I

B

C

A

W

search window W

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

R-Tree: Example Face objects to be stored 8

7

6

5

4

F11 F10

F8

F7

F9 F6

F15

F12

F13

F16

F17

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

3

2

1

0 1 2 3 4 5 6 7 8 9 10

F5

F4

F3F14

F1F2

0-80-4

6-100-8

0-82-8

0-30-4

2-80-4

7-100-5

6-103-8

2-62-8

2-86-8

0-33-8

Related R-tree

© 2011 AG DBIS

Applications ofGeo-DBS

9-38

0-30-2F1

0-31-4F14

2-60-3F2

4-80-4F3

8-101-5F5

7-100-2F4

7-92-5F17

6-73-8F8

7-105-8F6

4-64-8F9

2-62-6F15

3-54-6F16

0-34-7F12

0-36-8F11

0-33-6F13

6-86-8F7

2-66-8F-10

Page 20: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Search Optimization by the R+-Tree

Coverage and overlap for a level of the R-tree• Coverage is the entire area to cover all related rectangles • Overlap is the entire area which is contained in two or more nodes • Minimal coverage reduces the amount of “dead space” (empty area) which

is covered by the nodes of the R-tree.• Minimal overlap reduces the set of search paths to the leaves

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

Minimal overlap reduces the set of search paths to the leaves (even more critical for the access time than minimal coverage)Efficient search requires minimal coverage and overlap

IdeaPartitions are allowed which “cut up” data rectangles (Clipping)

avoidance of overlaps in intermediate nodes

ConsequenceData rectangle is decomposed in a collection of disjoint sub-rectangles, if necessary and is stored in various nodes at the leaf level

© 2011 AG DBIS

Applications ofGeo-DBS

9-39

necessary, and is stored in various nodes at the leaf level

Partitioning option of a long line object in the R+-tree

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

R+-Tree4

Division of data space D

DE

F G

H

K

J

I

B

A P higher flexibility when data rectangles can be partitioned

4. Sellis, T. et al.: The R*-Tree; A Dynamic Index for Multi-Dimensional Objects,in: Proc. 14th Int. Conf. on Very Large Data Bases, Brighton, 1987, pp. 507-518

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

N M L

C

R+-treeA BA C

I J KD E F G HG

P

L M N

ecta g es ca be pa t t o ed

© 2011 AG DBIS

Applications ofGeo-DBS

9-40

Properties• Overlap of data regions is avoided • Coverage problem is essentially eased• Clustering of objects combined in a region may be prevented by clipping • More complex algorithms for certain queries (containment)• More difficult maintenance, no performance advantage as compared to R-tree

Page 21: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Applications of Geo-DBS What are Geo-objects?

Representation of a certain area of earth surface

• Geometric component (location, expansion)• Thematic component

(qualitative or quantitative description)

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

(q q p )

Modeling• Topological base elements: point, line, face• Line-shaped formations: polygon draws• Plane objects: grid model, vector model• Vector model is favored for Geo-DBS Good scalability, lower space requirement• Object class for surface modeling: simple polygons with holes

Queries

© 2011 AG DBIS

Applications ofGeo-DBS

9-41Examples of location-related and range-related selection

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Applications of Geo-DBS Use of object approximations

• Two-level search and test: in index and on object representation• Different conservative approximations

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

• Empirical result: minimally enclosing pentagon gives best trade-off between approximation quality and space consumption

Thematic component• Simple data types contained in an attribute vector (tuple)• Example: surface usage, amount of rainfall

rectangle convex envelope pentagon ellipse

© 2011 AG DBIS

Applications ofGeo-DBS

9-42

Characteristics of objects• Number of objects: up to 109 data records• Data volume: up to 1 TeraByte• Variability of objects and object sets

- Object expansion: 1 : 106

- Object form- Storage space requirement of polygons: 0.5 KB – 1 GB- Distribution of objects in the plane (object density): 1 : 104

There are no upper boundaries for these properties

Page 22: 9. Multi-dimensional Access Paths - TU Kaiserslautern · 2011-06-27 · Quad trees Basic problems Classification – access paths and queries Use of One-Dimensional Access Paths So

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Queries to Spatial Objects –Extensibility Aspects of the DB Language

No space relationship and no spatial operators in the relational model• High complexity even for simple examples• Example: representation of arbitrarily mounted rectangles in the plane

RECTANGLE (RE-No, X1, Y1, X2, Y2, X3, Y3, X4, Y4)

Q Fi d ll t l th t t l l t l ith i t (PA PB PC PD)

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

Query: Find all rectangles that truly enclose rectangle with points (PA, PB, PC, PD)

SELECT RE-No FROM RECTANGLEWHERE XA > X1 AND YA < G(P1, P2, XA)

AND YA > G(P4, P1, XA) ANDXB < Y2 AND YB < G(P1, P2, XB)

AND XB < G(P2, P3, XB) ANDXC < X3 AND YC < G(P2, P3, XC)

AND YC > G(P3, P4, XC) ANDYD > Y4 AND YD > G(P3, P4, XD)

AND YD > G(P1, P4, XD);

© 2011 AG DBIS

Applications ofGeo-DBS

9-43

with abbreviations: e.g. G(P1, P2, XA) for((Y1 - Y2) * XA) / (X1 - X2) + (Y2 * X1 - Y1 * X2) / (X1 - X2)

Object-relational data models – key property• Extensibility with user-defined functions and operators• Queries using operators such as overlap, intersect, distance, containment («) ...

SELECT RE-No FROM RECTANGLE XWHERE [PA, PB, PC, PD] « X;

Realizationof DBS

Quad trees

Basic problems

Classification –access pathsand queries

Summary

Important properties of multi-dimensional access paths• Preservation of topological structure of data space • Adaptation to the object density• Dynamic reorganization• Balanced access structure• Support of different query types (intersection queries best match queries)

Grid file

Multi-keyhashing

k-d trees

R- and R+-tree

• Support of different query types (intersection queries, best match queries)

Representation of spatial objects• Abstraction to point-shaped representation is typical • “Expanded” representation only in coarse approximation:

Processing problems

Pros of newer concepts• Organization of embedding data space – Grid file• Representation of expanded objects – R- and R+-tree

However: There exist very many proposals and concepts which are

© 2011 AG DBIS

Applications ofGeo-DBS

9-44

However: There exist very many proposals and concepts which are extremely specialized and not practically proven

Necessity of DBS integration of “conventional”(one-dimensional) and multi-dimensional access paths • Access options via spatial and time-related dimensions • Support of “Data Warehouse” appls, geographical information systems, . . .

Extension of object-relational DBS by multi-dimensional access paths (primarily Grid file and R-tree)