r-tree: spatial representation on a dynamic-index structure

37
R-Tree: Spatial Representation on a Dynamic-Index Structure Advanced Algorithms & Data Structures Lecture Theme 03 – Part I K. A. Mohamed Summer Semester 2006

Upload: elon

Post on 06-Feb-2016

62 views

Category:

Documents


0 download

DESCRIPTION

R-Tree: Spatial Representation on a Dynamic-Index Structure. Advanced Algorithms & Data Structures Lecture Theme 03 – Part I K. A. Mohamed Summer Semester 2006. Overview. Representing and handling spatial data The R-Tree indexing approach , style and structure Properties and notions - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: R-Tree: Spatial Representation on a  Dynamic-Index Structure

R-Tree: Spatial Representation on a Dynamic-Index Structure

Advanced Algorithms & Data Structures

Lecture Theme 03 – Part I

K. A. Mohamed

Summer Semester 2006

Page 2: R-Tree: Spatial Representation on a  Dynamic-Index Structure

2

Overview

• Representing and handling spatial data

• The R-Tree indexing approach, style and structure

• Properties and notions

• Searching and inserting index Entry-records

• Deleting and updating

• Performance analyses

• Node splitting algorithms

• Derivatives of the R-Trees

• Applications

Page 3: R-Tree: Spatial Representation on a  Dynamic-Index Structure

3

Spatial Database (Ia)

• Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.

Page 4: R-Tree: Spatial Representation on a  Dynamic-Index Structure

4

Spatial Database (Ib)

• Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.

Spatial object: Contour (outline) of the area around the building(s).

Minimum bounding region (MBR) of the object.

Page 5: R-Tree: Spatial Representation on a  Dynamic-Index Structure

5

Spatial Database (Ic)

• Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick relational-topological search.

MBR of the city neighbourhoods.

MBR of the city defining the overall search region.

Page 6: R-Tree: Spatial Representation on a  Dynamic-Index Structure

6

Spatial Database (II)

Notion: To retrieve data items quickly and efficiently according to their spatial locations.

• Involves 2D regions.

• Need to support 2D range queries.

• Multiple return values desired: Answering a query region by reporting all spatial objects that are fully-contained-in or overlapping the query region (Spatial-Access Method – SAM).

In general:

• Spatial data objects often cover areas in multidimensional spaces.

• Spatial data objects are not well-represented by point-location.

• An ‘index’ based on an object’s spatial location is desirable.

Page 7: R-Tree: Spatial Representation on a  Dynamic-Index Structure

7

The Indexing Approach

• A B-Tree (Rosenberg & Snyder, 1981) is an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children).

• The keys and the subtrees are arranged in the fashion of a search tree.

• Each node may contain a large number of keys, and the number of subtrees in each node, then, may also be large.

• The B-Tree is designed (among other objectives):

– to branch out this large number of directions, and

– to contain a lot of keys in each node so that the height of the tree is relatively short.

M

P T X

B D F G K L N O Q S V W Y ZI

E H

Page 8: R-Tree: Spatial Representation on a  Dynamic-Index Structure

8

The R-Tree Index Structure

• An R-Tree is a height-balanced tree, similar to a B-Tree.

• Index records in the leaf nodes contain pointers to the actual spatial-objects they represent.

• Leaves in the structure all appear on the same level.

• Spatial searching requires visiting only a small number of nodes.

• The index is completely dynamic: inserts and deletes can be intermixed with searches.

• No periodic reorganisation is required.

Page 9: R-Tree: Spatial Representation on a  Dynamic-Index Structure

9

The R-Tree Index Structure

• A spatial database consists of a collection of tuples representing spatial objects, known as Entries.

• Each Entry has a unique identifier that points to one spatial object, and its MBR; i.e. Entry = (MBR, pointer).

Page 10: R-Tree: Spatial Representation on a  Dynamic-Index Structure

10

R-Tree Index Structure – Leaf Entries

• An entry E in a leaf node is defined as (Guttman, 1984):

E = (I, tuple-identifier)• Where I refers to the smallest binding n-dimensional region (MBR) that

encompasses the spatial data pointed to by its tuple-identifier.

• I is a series of closed-intervals that make up each dimension of the binding region.

Example. In 2D, I = (Ix, Iy), where Ix = [xa, xb], and Iy = [ya, yb].

Page 11: R-Tree: Spatial Representation on a  Dynamic-Index Structure

11

R-Tree Index Structure – Leaf Entries

• In general I = (I0, I1, …, In-1) for n-dimensions, and that Ik = [ka, kb].

• If either ka or kb (or both) are equal to , this means that the spatial object extends outward indefinitely along that dimension.

Page 12: R-Tree: Spatial Representation on a  Dynamic-Index Structure

12

R-Tree Index Structure – Non-Leaf Entries

• An entry E in a non-leaf node is defined as:

E = (I, child-pointer)• Where the child-pointer points to the child of this node, and I is the MBR

that encompasses all the regions in the child-node’s pointer’s entries.

I(A) I(B) … I(M)

I(a) I(b) I(c) I(d)

B

a

b

c

d

Page 13: R-Tree: Spatial Representation on a  Dynamic-Index Structure

13

Properties

Then an R-Tree must satisfy the following properties:

1. Every leaf node contains between m and M index records, unless it is the root.

2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier.

3. Every non-leaf node has between m and M children, unless it is the root.

4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node.

5. The root has two children unless it is a leaf.

6. All leaves appear on the same level.

Let M be the maximum number of entries that will fit in one node.Let m ≤ M/2 be a parameter specifying the minimum number of entries in one node.

Page 14: R-Tree: Spatial Representation on a  Dynamic-Index Structure

14

Node Overflow and Underflow

• A Node-Overflow happens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper-bound M.

• The ‘overflow’ node must be split, and all its current entries, as well as the new one, consolidated for local optimum arrangement.

• A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower-bound m.

• The underflow node must be condensed, and its entries dispersed for global optimum arrangement.

Page 15: R-Tree: Spatial Representation on a  Dynamic-Index Structure

15

Search

Typical query: Find and report all university building sites that are within 5km of the city centre.

Key:a: FAWb: Uni-Klinikumc: Psychologied: Biotechnologye: Institutsviertelf: Rektoratg: Uni-zentrumh: Biology / Botanischeri: Sportszentrum

i. Build the R-Tree using rectangular regions a, b, … i.ii. Formulate the query range Q.iii. Query the R-Tree and report all regions overlapping Q.

Approach:

Page 16: R-Tree: Spatial Representation on a  Dynamic-Index Structure

16

Search Strategy

Let Q be the query region.

Let T be the root of the R-Tree.

Search all entry-records whose regions overlaps Q.

Search sub-trees:

• If T is not leaf, then apply Search on ever child-node entry E whose I overlaps Q.

Search leaf nodes:

• If T is leaf, then check each entry E in the leaf and return E if E.I overlaps Q.

Page 17: R-Tree: Spatial Representation on a  Dynamic-Index Structure

17

Search

Typical query: Find and report all university buildings/sites that are within 5km of the city centre.

r2

e

r5 r8

r3 r4r1 r7r0

ic gf hba d

@ r6

@ r2 @ r5 @ r8

@ r0 @ r1 @ r7 @ r3 @ r4

R-Tree settings: M = m =

Page 18: R-Tree: Spatial Representation on a  Dynamic-Index Structure

18

Search

• The search algorithm descends the tree from the root in a manner similar to a B-Tree.

• More than one subtree under a node visited may need to be searched.

• Cannot guarantee good worst-case performance.

• Countered by the algorithms during insertion, deletion, and update that maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space.

• So that only data near the search area need to be examined.

• Emphasis is on the optimal placement of spatial objects with respect to the spatial location of other objects in the structure.

Page 19: R-Tree: Spatial Representation on a  Dynamic-Index Structure

19

Insertion

• New index entry-records are added to the leaves.• Nodes that overflow are split, and splits propagate up the tree.• A split-propagation may cause the tree to grow in height.

The main Insert routine• Let E = (I, tuple-identifier) be the new entry to be inserted.• Let T be the root of the R-Tree.

[Ins_1] Locate a leaf L starting from T to insert E. [Ins_2] Add E to L. If L is already full (overflow), split L into L and

L’. [Ins_3] Propagate MBR changes (enlarged or reduced) upwards. [Ins_4] Grow tree taller if node split propagation causes T to split.

Page 20: R-Tree: Spatial Representation on a  Dynamic-Index Structure

20

Insertion – Choose Leaf

[Ins_1] Locate a leaf L starting from T to insert E = (I, tuple-identifier).• Notion (i): Select the path that would require the least enlargement to include

E.I.• Notion (ii): Resolve ties by choosing the child-node with the smallest MBR.

• Invoke: L = ChooseLeaf (E, T).

A B C

@rNA

C

B

E.I

rN

Page 21: R-Tree: Spatial Representation on a  Dynamic-Index Structure

21

Insertion – Choose Leaf

Algorithm: ChooseLeaf (E, N)

Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.

Output: The leaf L where E should be inserted.

• If N is leaf Then Return N

• Let FS be the set of current entries in the node N

• Let F = (I, child-pointer) FS, so that F.I satisfies the Insertion-Notions

• Return ChooseLeaf (E, F.child-pointer)

Page 22: R-Tree: Spatial Representation on a  Dynamic-Index Structure

22

Insertion – Add and Adjust

[Ins_2] Add E to L.

• Notion (i): If L has room for another entry, install E.

• Notion (ii): Otherwise split L to obtain L and L’, which between them, will contain all previous entries in L and the new E (consolidated for local optima).

[Ins_3] Propagate MBR changes upwards by invoking AdjustTree (L, L’).

• Notion (i): Ascend from leaf L to the root T while adjusting the covering rectangles MBR.

• Notion (ii): If L’ exists, propagate node splits as necessary; i.e. attempt to install a new entry in the parent of L to point to L’.

Page 23: R-Tree: Spatial Representation on a  Dynamic-Index Structure

23

Insertion – Propagating Node Splits

Example. Found L = @Y to insert new E = e. R-Tree settings: M = 3, m = 1.

K

@G

a b c

@Y

X Y Z

@K

Page 24: R-Tree: Spatial Representation on a  Dynamic-Index Structure

24

Insertion – Adjust Tree (I)

Algorithm: AdjustTree (N, N’)

Inputs: (i) A node N that has had its contents modified, (ii) The resultant split node N’, if not NULL, that accompanies N.

Outputs: (i) N as above, (ii) N’ as above.

• If N is the root Then Return {(i) N, (ii) N’}

• Let PN be the parent node of N.

• Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points to N.

• Adjust I_N so that it tightly encloses all entry regions in N.

Page 25: R-Tree: Spatial Representation on a  Dynamic-Index Structure

25

Insertion – Adjust Tree (II)

• If N’ is Not NULL Then If number of entries in PN < M-1 Then

• Create a new Entry EN’ = (I_N’, child-pointer_N’)• Install EN’ in PN

• Return AdjustTree (PN, NULL) Else

• Set {PN, PN’} = SplitNode (PN, EN’)

• Return AdjustTree (PN, PN’) End If

• Else Return AdjustTree (PN, NULL)

• End If

Page 26: R-Tree: Spatial Representation on a  Dynamic-Index Structure

26

Insertion – Grow Tree Taller

[Ins_4] Grow Tree taller.

• Notion: If the recursive node split propagation causes the root to split, then create a new root whose children are the two resulting nodes.

A B C

@T (root)

E F

@C

G H

@C’

Page 27: R-Tree: Spatial Representation on a  Dynamic-Index Structure

27

Insertion – Summary

• The height of the R-Tree containing n entry-records is at most logm n – 1, because the branching factor of each node is at least m.

• The maximum number of nodes is:

• Worst case space utilisation for all nodes except the root is:

• Nodes will tend to have more than m entries, and this will:

Page 28: R-Tree: Spatial Representation on a  Dynamic-Index Structure

28

Overview

• Representing and handling spatial data

• The R-Tree indexing approach, style and structure

• Properties and notions

• Searching and inserting index Entry-records

• Deleting and updating

• Performance analyses

• Node splitting algorithms

• Derivatives of the R-Trees

• Applications

Page 29: R-Tree: Spatial Representation on a  Dynamic-Index Structure

29

Deletion

• Current index entry-records are removed from the leaves.

• Nodes that underflow are condensed, and its contents redistributed appropriately throughout the tree.

• A condense propagation may cause the tree to shorten in height.

The main Delete routine

• Let E = (I, tuple-identifier) be a current entry to be removed.

• Let T be the root of the R-Tree. [Del_1] Find the leaf L starting from T that contains E. [Ins_2] Remove E from L, and condense ‘underflow’ nodes. [Ins_3] Propagate MBR changes upwards. [Ins_4] Shorten tree if T contains only 1 entry after condense

propagation.

Page 30: R-Tree: Spatial Representation on a  Dynamic-Index Structure

30

Deletion – Find Leaf

[Del_1] Find the leaf L starting from T that contains E.Algorithm: FindLeaf (E, N)Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.Output: The leaf L containing E.

• If N is leaf Then If N contains E Then Return N Else Return NULL

• Else Let FS be the set of current entries in N. For each F = (I, child-pointer) FS where F.I overlaps E.I Do

• Set L = FindLeaf (E, F.child-pointer)• If L is not NULL Then Return L

Next F Return NULL

• End If

Page 31: R-Tree: Spatial Representation on a  Dynamic-Index Structure

31

Deletion – Remove and Adjust

[Del_2] Remove E from L, and condense ‘underflow’ nodes.

[Del_3] Propagate MBR changes upwards.

• Notion (i): Ascend from leaf L to root T while adjusting covering rectangles MBR.

• Notion (ii): If after removing the entry E in L and the number of entries in L becomes fewer than m, then the node L has to be eliminated and its remaining contents relocated.

Page 32: R-Tree: Spatial Representation on a  Dynamic-Index Structure

32

Deletion – Remove and Adjust

• Propagate these notions upwards by invoking CondenseTree (N, QS), where N is an R-Tree node whose entries have been modified, and QS is the set of eliminated nodes.

• Start the propagation by setting N = L, and QS = .

• Re-insert the entries from the eliminated nodes in QS back into the tree.

• Entries from eliminated leaf nodes are re-inserted as new entries using the Insert routine discussed earlier.

• Entries from higher-level nodes must be placed higher in the tree so that leaves of their dependent subtrees will be on the same level as the leaves on the main tree.

Page 33: R-Tree: Spatial Representation on a  Dynamic-Index Structure

33

Deletion – Propagating Node Condensation

• Example: Delete the index entry-record b. R-Tree settings: M = 4, m = 2.

• Spatial constraint: a.I will form smallest MBR with r4.

r2 r6

@ r7

a b

@ r0

r0 r1

@ r2

r3 r4 r5

@ r6

c d e

@ r1

f g h

@ r3

i j

@ r4

k l m

@ r5

n

Page 34: R-Tree: Spatial Representation on a  Dynamic-Index Structure

34

Deletion – Condense Tree (I)

Algorithm: CondenseTree (N, QS)

Inputs: (i) A node N whose entries have been modified, (ii) A set of eliminated nodes QS.

• If N is NOT the root Then Let PN be the parent node of N. Let EN = (I_N, child-pointer_N) in PN. If N.entries < m Then

• Delete EN from PN

• Add N to QS Else

• Adjust I_N so that it tightly encloses all entry regions in N. End If CondenseTree (PN, QS)

Page 35: R-Tree: Spatial Representation on a  Dynamic-Index Structure

35

Deletion – Condense Tree (II)

• Else If N is root AND Q is NOT Then For each Q QS Do

• For each E Q Do If Q is leaf Then Insert (E) Else Insert (E) as a node entry at the same node level as Q End If

• Next E Next Q

• End If

Page 36: R-Tree: Spatial Representation on a  Dynamic-Index Structure

36

Deletion – Summary

Why ‘re-insert’ orphaned entries?

• Alternatively, like the delete routine in B-Tree (Rosenberg & Snyder, 1981), an ‘underflow’ node can be merged with whichever adjacent sibling that will have its area increased the least, or its entries re-distributed among sibling nodes.

• Both methods can cause the nodes to split.

• Eventually all changes need to be propagated upwards, anyway.

Re-insertion accomplishes the same thing, and:

• It is simpler to implement (and at comparable efficiency).

• It incrementally refines the spatial structure of the tree.

• It prevents gradual deterioration if each entry was located permanently under the same parent node.

Page 37: R-Tree: Spatial Representation on a  Dynamic-Index Structure

37

Performance with respect to Parameter m

• A high value of m, nearer to M, is useful when the underlying database represented by the R-Tree is mostly used for search inquiries with very few updates.

• The height of the tree will be kept to a minimum.

• High search performance is maintained.

• However, the risk of overflow and underflow is also high.

• A small value of m is good when frequent updates and modifications of the underlying database is required.

• The nodes are less dense.

• Maintenance is less costly.