lecture04 range searching

Orthogonal Range Searching Lecture 4, CS 631100 Sheung-Hung Poon [email protected] Fall 2011 National Tsing Hua University (NTHU) Lecture 4, CS 631100 Orthogonal Range Searching 1


  • Orthogonal Range Searching

    Lecture 4, CS 631100

    Sheung-Hung Poon, [email protected]

    Fall 2011, National Tsing Hua University (NTHU)


  • Orthogonal Range Searching

    Outline

    Reference: Textbook chapter 5; Mount's Lectures 17 and 18

    Problem: querying a database
    Solution in one dimension
    Data structure in ℝ²: range trees
    Extension to higher dimensions
    log n factor improvement


  • Orthogonal Range Searching

    An Example of Application on Database

    A database in a bank records transactions. A query: find all the transactions such that

    the amount is between $1000 and $2000, and
    it happened between 10:40am and 11:20am.

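    Geometric interpretation: each transaction is a point (time, amount) in the plane, and the query asks for the points inside the axis-parallel rectangle [10:40am, 11:20am] × [$1000, $2000].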


  • Orthogonal Range Searching

    Query problems

    Assume n is the total number of transactions in the database.
    We will show how to build a data structure in O(n log n) time that allows us to perform this type of query in O(k + log n) time, where k is the size of the output (the number of transactions that are reported).
    The data structure is built only once; then a large number of queries can be answered quickly.
    O(n log n) is the preprocessing time.
    O(k + log n) is the query time.


  • Orthogonal Range Searching

    Boxes

    2D box: also known as a rectangle, with sides parallel to the coordinate axes.

    3D box: [0, 3] × [0, 2.5] × [0, 2]. Boxes generalize to any dimension. Algorithmic problems with boxes are relatively easy.


  • Orthogonal Range Searching

    Problem statement

    Let P be a set of n points in ℝ^d.

    We assume d = O(1). Preprocess P so as to answer queries of the following type:

    Input: (a1, b1, a2, b2, ..., ad, bd)
    Output: P ∩ ([a1, b1] × [a2, b2] × ... × [ad, bd])

    We denote k = |P ∩ ([a1, b1] × [a2, b2] × ... × [ad, bd])|, the number of reported points.
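    The query semantics can be pinned down with a brute-force reference. Below is a minimal Python sketch. The name box_query_naive and the flat-tuple encoding (a1, b1, ..., ad, bd) of the box are assumptions that mirror the Input line. It scans all n points, so each query costs O(dn) with no preprocessing. The data structures in this lecture aim to beat this.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def box_query_naive(points: Sequence[Point], box: Sequence[float]) -> List[Point]:
    """Return P ∩ ([a1, b1] × ... × [ad, bd]) by checking every point.

    `box` is the flat tuple (a1, b1, a2, b2, ..., ad, bd) (hypothetical encoding).
    """
    d = len(box) // 2
    lows, highs = box[0::2], box[1::2]
    return [p for p in points
            if all(lows[i] <= p[i] <= highs[i] for i in range(d))]


P = [(1, 5), (2, 2), (4, 7), (6, 1)]
result = box_query_naive(P, (1, 4, 2, 7))   # the rectangle [1, 4] × [2, 7]
k = len(result)
print(result, k)   # [(1, 5), (2, 2), (4, 7)] 3
```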


  • Orthogonal Range Searching One Dimensional Case (d=1)

    One Dimensional Case (d=1):

    Using a balanced binary search tree (BBST)


  • Orthogonal Range Searching One Dimensional Case (d=1)

    Problem statement

    P is a set of real numbers.
    Queries: find all the points in P that lie between a and b.
    Data structure: a balanced binary search tree (BBST).

    Preprocessing time: Θ(n log n) to build the BBST
    Space usage: Θ(n)

    Query time: Θ(k + log n). How?


  • Orthogonal Range Searching One Dimensional Case (d=1)

    Answering a query

    Algorithm Report(T, a, b)
    Input: a BBST T storing P, an interval [a, b]
    Output: P ∩ [a, b]
    1. if T = NULL
    2.   then return
    3. x ← value stored at the root of T
    4. if a
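    Report can be written as a short recursive procedure. The following is a minimal Python sketch of a standard version, not necessarily step-for-step the slide's own. It assumes a key is stored at every node of the BBST. Node, build_bbst and report are hypothetical names. A subtree is explored only when it can still contain keys in [a, b], which is what keeps the visited part of the tree within O(k + log n) nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_bbst(keys: List[float], lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    """Balanced BST over keys[lo:hi] (keys must be sorted): the median becomes the root."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid], build_bbst(keys, lo, mid), build_bbst(keys, mid + 1, hi))


def report(t: Optional[Node], a: float, b: float, out: List[float]) -> None:
    """Append the keys of T lying in [a, b] to `out`, in sorted order."""
    if t is None:
        return
    x = t.key                      # value stored at the root of T
    if a <= x:                     # the left subtree can only help if x >= a
        report(t.left, a, b, out)
    if a <= x <= b:
        out.append(x)
    if x <= b:                     # the right subtree can only help if x <= b
        report(t.right, a, b, out)


P = sorted([7, 3, 11, 1, 5, 9, 13])   # sorting dominates preprocessing: O(n log n)
T = build_bbst(P)
out: List[float] = []
report(T, 4, 10, out)
print(out)   # [5, 7, 9]
```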

  • Orthogonal Range Searching One Dimensional Case (d=1)

    Analysis of query time

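    Let vsplit be the node where the search paths for a and b diverge. Check the points on the left path (the search path for a below vsplit), on the right path (the search path for b below vsplit) and at vsplit, and report those in [a, b]. Also report all the subtrees hanging in between.

    The path from the root to vsplit, the left path and the right path all have length O(log n).

    Every point in the in-between (red) subtrees is reported, so their sizes sum to at most k. Query time: O(k + log n).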
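    The path-based form of the query can be sketched in Python as follows. It descends to vsplit, then follows the left and right paths and reports whole subtrees in between. This is a minimal sketch assuming a key at every node. The names find_split, report_subtree and range_query are hypothetical, and the output order is not sorted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_bbst(keys: List[float], lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid], build_bbst(keys, lo, mid), build_bbst(keys, mid + 1, hi))


def report_subtree(t: Optional[Node], out: List[float]) -> None:
    """Report a whole subtree in O(size of the subtree) time."""
    if t is not None:
        report_subtree(t.left, out)
        out.append(t.key)
        report_subtree(t.right, out)


def find_split(t: Optional[Node], a: float, b: float) -> Optional[Node]:
    """Descend until the search paths for a and b diverge (the key lies in [a, b])."""
    while t is not None and not (a <= t.key <= b):
        t = t.left if b < t.key else t.right
    return t


def range_query(root: Optional[Node], a: float, b: float) -> List[float]:
    out: List[float] = []
    v = find_split(root, a, b)
    if v is None:
        return out
    out.append(v.key)
    u = v.left                     # left path: search for a below vsplit
    while u is not None:
        if a <= u.key:             # u is in range, and so is its whole right subtree
            out.append(u.key)
            report_subtree(u.right, out)
            u = u.left
        else:
            u = u.right
    u = v.right                    # right path: search for b below vsplit
    while u is not None:
        if u.key <= b:             # u is in range, and so is its whole left subtree
            out.append(u.key)
            report_subtree(u.left, out)
            u = u.right
        else:
            u = u.left
    return out


P = [1, 3, 5, 7, 9, 11, 13]
T = build_bbst(P)
print(sorted(range_query(T, 4, 10)))   # [5, 7, 9]
```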


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Two Dimensional Case (d=2):

    Using range trees


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Introduction

    A set P of n points in ℝ²

    Query: given (a1, b1, a2, b2), find all points (x, y) of P in the rectangle [a1, b1] × [a2, b2]. Results presented in this section:

    Θ(n log n) preprocessing time
    Θ(n log n) space usage
    Θ(k + log² n) query time

    The query time will be improved by a log n factor in the last section.


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Canonical sets

    First store P in a BBST T, using the x-coordinates as keys. We associate each node v of T with a canonical set Cv containing the points of P stored in the subtree rooted at v.


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Range trees in IR2

    Each canonical set Cv is stored in a BBST Tv, using the y-coordinates as keys. Tv is called the canonical tree at node v.

    We answer a query in two steps: first on the x-coordinates, then on the y-coordinates (as shown in the following slides).
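    The structure just described can be sketched in Python, assuming one point is stored per node of T. For brevity, each canonical tree Tv is replaced by a Python list holding Cv sorted by y. A sorted array supports the same O(log n + k) 1-dimensional query through binary search. RTNode and build_range_tree are hypothetical names. Each Cv is sorted from scratch here, which is the simple but slower construction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class RTNode:
    point: Point                  # point stored at this node of T (keyed on x)
    left: Optional["RTNode"]
    right: Optional["RTNode"]
    canon_y: List[Point]          # Cv sorted by y: stand-in for the canonical tree Tv


def build_range_tree(pts_by_x: List[Point]) -> Optional[RTNode]:
    """Build T on x; every node keeps its canonical set Cv sorted by y."""
    if not pts_by_x:
        return None
    mid = len(pts_by_x) // 2
    canon = sorted(pts_by_x, key=lambda p: p[1])   # Cv = all points in this subtree
    return RTNode(pts_by_x[mid],
                  build_range_tree(pts_by_x[:mid]),
                  build_range_tree(pts_by_x[mid + 1:]),
                  canon)


P = [(5, 1), (1, 4), (3, 3), (7, 2), (2, 8), (6, 6), (4, 5)]
T = build_range_tree(sorted(P))   # sorted() orders the points by x first
print(T.point, T.left.canon_y)    # (4, 5) [(3, 3), (1, 4), (2, 8)]
```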


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Step 1: Querying x-coordinates

    First make the query with range [a1, b1] on the x-coordinates. Let P' = P ∩ ([a1, b1] × (−∞, +∞)). Let P'' be the set of points on the left path and the right path (the search paths for a1 and b1). We partition P' \ P'' into c canonical subsets C1, C2, ..., Cc.

    Thus P' ⊆ P'' ∪ C1 ∪ C2 ∪ ... ∪ Cc.


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Partitioning P

    After making the query with range [a1, b1] on the x-coordinates, we take the nodes on the left path and the right path, which gives P''.
    For each node on the left path where the path continues to the left child, select the canonical tree Ti of its right child (this gives some Ci).
    For each node on the right path where the path continues to the right child, select the canonical tree Ti of its left child (this gives some Ci).

    This takes O(log n) time (the height of the BBST). There are c = O(log n) canonical sets in our partition.


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Step 2: Querying y-coordinates

    For each p ∈ P'', check whether p ∈ [a1, b1] × [a2, b2], and report it if it is. For all i, use the interval [a2, b2] to perform a 1-dimensional search query in Ci using the canonical tree Ti. The union of all these results gives P ∩ ([a1, b1] × [a2, b2]).

    Analysis of query time:

    Let ki be the number of points reported from Ti, so that k1 + k2 + ... + kc ≤ k. Since c = O(log n), the query time is

    $$\sum_{i=1}^{c} O(\log n + k_i) = O\Big(c \log n + \sum_{i=1}^{c} k_i\Big) = O(\log^2 n + k)$$
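    Steps 1 and 2 together can be sketched in Python. Sorted y-lists searched with bisect stand in for the canonical trees Ti, and one point is stored per node. RTNode, build_range_tree and query_2d are hypothetical names. The path points (P'') are tested one by one. Every canonical set hanging off a path gets a 1-dimensional query costing O(log n + ki).

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class RTNode:
    point: Point                  # point stored at this node of T (keyed on x)
    left: Optional["RTNode"]
    right: Optional["RTNode"]
    canon: List[Point]            # Cv sorted by y (stand-in for the canonical tree Tv)
    ys: List[float]               # y-coordinates of canon, for binary search


def build_range_tree(pts_by_x: List[Point]) -> Optional[RTNode]:
    if not pts_by_x:
        return None
    mid = len(pts_by_x) // 2
    canon = sorted(pts_by_x, key=lambda p: p[1])
    return RTNode(pts_by_x[mid], build_range_tree(pts_by_x[:mid]),
                  build_range_tree(pts_by_x[mid + 1:]), canon, [p[1] for p in canon])


def query_2d(root: Optional[RTNode], a1: float, b1: float,
             a2: float, b2: float) -> List[Point]:
    out: List[Point] = []

    def check(p: Point) -> None:                  # points of P'' are tested one by one
        if a1 <= p[0] <= b1 and a2 <= p[1] <= b2:
            out.append(p)

    def search_y(w: Optional[RTNode]) -> None:    # 1-D query on Cw: O(log n + ki)
        if w is not None:
            out.extend(w.canon[bisect_left(w.ys, a2):bisect_right(w.ys, b2)])

    v = root                                      # Step 1: descend to vsplit on x
    while v is not None and not (a1 <= v.point[0] <= b1):
        v = v.left if b1 < v.point[0] else v.right
    if v is None:
        return out
    check(v.point)
    u = v.left                                    # left path (search for a1)
    while u is not None:
        if a1 <= u.point[0]:
            check(u.point)
            search_y(u.right)                     # Step 2 on the right child's canonical set
            u = u.left
        else:
            u = u.right
    u = v.right                                   # right path (search for b1)
    while u is not None:
        if u.point[0] <= b1:
            check(u.point)
            search_y(u.left)                      # Step 2 on the left child's canonical set
            u = u.right
        else:
            u = u.left
    return out


P = [(1, 4), (2, 8), (3, 3), (4, 5), (5, 1), (6, 6), (7, 2)]
T = build_range_tree(sorted(P))
print(sorted(query_2d(T, 2, 6, 2, 6)))   # [(3, 3), (4, 5), (6, 6)]
```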


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Analysis of total query time


    Query on the x-coordinates in T: obtain P'' (the points on the left and right paths) and the canonical trees Ti. This takes O(log n) time.

    Query on the y-coordinates in the Ti: this takes O(log² n + k) time (see the previous slide).

    Total query time = O(log n) + O(log² n + k) = O(log² n + k).


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Space complexity (Proof 1)

    A point p belongs to all the canonical sets on the path from the node of T that stores p to the root (and only to these canonical sets). Thus p lies in O(log n) canonical sets. Hence

    $$\sum_{v \in T} |C_v| = O(n \log n),$$

    where Cv is the canonical set at node v. The memory space used is O(n log n). Actually, it is Θ(n log n).

    Why?


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Space complexity (Proof 2)

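    Figure: the tree T, with log n levels, and the canonical tree Ti attached to each node. On each level of T, the canonical sets have total size n:

    $$n + 2\left(\tfrac{n}{2}\right) + 4\left(\tfrac{n}{4}\right) + \dots + n(1) = \underbrace{n + n + \dots + n}_{\log n\ \text{levels}} = \Theta(n \log n)$$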
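    The level-by-level count can be checked numerically with a small Python sketch. It assumes one point per node and median splits, and canonical_total is a hypothetical name. In a complete tree with n = 2^h - 1 points, level j holds 2^j canonical sets of size 2^(h-j) - 1. The total is therefore (h - 1)·2^h + 1, which is Θ(n log n).

```python
import math


def canonical_total(n: int) -> int:
    """Sum of |Cv| over all nodes of a BBST on n points, one point per node.

    The root's canonical set has size n; the median split leaves subtrees
    with n // 2 and n - n // 2 - 1 points.
    """
    if n == 0:
        return 0
    mid = n // 2
    return n + canonical_total(mid) + canonical_total(n - mid - 1)


# The ratio total / (n log2 n) stays bounded as n grows.
for h in range(2, 11):
    n = 2 ** h - 1
    total = canonical_total(n)
    print(n, total, round(total / (n * math.log2(n)), 3))
```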


  • Orthogonal Range Searching Two Dimensional Case (d=2)

    Preprocessing time

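    Tv can be built in O(|Cv| log |Cv|) time. Hence the range tree can be built in time

    $$\sum_{v} |C_v| \log |C_v| \le \log n \sum_{v} |C_v| = \log n \cdot O(n \log n) = O(n \log^2 n).$$

    We can do better: compute the Tv's from the leaves to the root. Computing Tv amounts to merging the two sorted sequences of its children (and adding the point stored at v). A BBST can then be built from the sorted result in linear time, so this takes O(|Cv|) time. Overall, we can build the range tree in time

    $$\sum_{v} O(|C_v|) = O(n \log n).$$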
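    The bottom-up construction can be sketched in Python. Children are built first, and each Cv is obtained by linear-time merging instead of sorting. As a simplification, the sorted list itself stands in for Tv; as noted above, a BBST could be built from it in linear time. One point is stored per node. merge_by_y, RTNode and build_bottom_up are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def merge_by_y(A: List[Point], B: List[Point]) -> List[Point]:
    """Merge two lists already sorted by y in O(|A| + |B|) time."""
    out: List[Point] = []
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i][1] <= B[j][1]:
            out.append(A[i])
            i += 1
        else:
            out.append(B[j])
            j += 1
    out.extend(A[i:])
    out.extend(B[j:])
    return out


@dataclass
class RTNode:
    point: Point
    left: Optional["RTNode"]
    right: Optional["RTNode"]
    canon_y: List[Point]   # Cv sorted by y


def build_bottom_up(pts_by_x: List[Point]) -> Optional[RTNode]:
    """Children first, then Cv = merge(C_left, {v's point}, C_right) in O(|Cv|)."""
    if not pts_by_x:
        return None
    mid = len(pts_by_x) // 2
    left = build_bottom_up(pts_by_x[:mid])
    right = build_bottom_up(pts_by_x[mid + 1:])
    merged = merge_by_y(left.canon_y if left is not None else [], [pts_by_x[mid]])
    merged = merge_by_y(merged, right.canon_y if right is not None else [])
    return RTNode(pts_by_x[mid], left, right, merged)


P = [(5, 1), (1, 4), (3, 3), (7, 2), (2, 8), (6, 6), (4, 5)]
T = build_bottom_up(sorted(P))     # T is keyed on x
print(T.canon_y)   # [(5, 1), (7, 2), (3, 3), (1, 4), (4, 5), (6, 6), (2, 8)]
```

    The slicing of pts_by_x also costs O(|Cv|) per node, so the total stays proportional to the sum of the |Cv|.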


  • Orthogonal Range Searching Range trees in higher dimensions

    Range trees in higher dimensions


  • Orthogonal Range Searching Range trees in higher dimensions

    Idea

    We assume d > 1 and d = O(1). We want to perform range searching in ℝ^d.

    We still build T with respect to the x1-coordinate. For each canonical set of T we build a (d-1)-dimensional range searching data structure using the coordinates (x2, x3, ..., xd).

    To answer a d-dimensional query:
    find the canonical trees of T associated with [a1, b1] (checking the points on the search paths individually, as in the 2-dimensional case);
    make a (d-1)-dimensional query on each canonical tree recursively, using [a2, b2] × [a3, b3] × ... × [ad, bd].
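    The recursion can be sketched in Python. Level dim is a BBST keyed on coordinate dim, and each node carries an associated structure on the remaining coordinates, built over its canonical set. The last coordinate uses a sorted array. Node, build, query, in_box and range_query are hypothetical names, and the box is given as a list of (ai, bi) pairs. For simplicity this sketch re-sorts at every node, so its preprocessing is slower than the bound stated on the slides.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]      # [(a1, b1), (a2, b2), ..., (ad, bd)]


class Node:
    """Node of the BBST at level `dim`, keyed on coordinate `dim`."""

    def __init__(self, point: Point, left: Optional["Node"], right: Optional["Node"], assoc):
        self.point, self.left, self.right = point, left, right
        self.assoc = assoc   # structure on coordinates dim+1..d-1 for this canonical set


def build(pts: List[Point], dim: int, d: int):
    """Range tree on coordinates dim..d-1: a Node tree, or (keys, points) when dim = d-1."""
    pts = sorted(pts, key=lambda p: p[dim])
    if dim == d - 1:                             # 1-D base case: a sorted array
        return ([p[dim] for p in pts], pts)
    return _build_level(pts, dim, d)


def _build_level(pts: List[Point], dim: int, d: int) -> Optional[Node]:
    if not pts:
        return None
    mid = len(pts) // 2
    return Node(pts[mid], _build_level(pts[:mid], dim, d),
                _build_level(pts[mid + 1:], dim, d),
                build(pts, dim + 1, d))          # canonical set = every point in this subtree


def in_box(p: Point, box: Box) -> bool:
    return all(a <= x <= b for x, (a, b) in zip(p, box))


def query(s, box: Box, dim: int, d: int, out: List[Point]) -> None:
    a, b = box[dim]
    if dim == d - 1:                             # 1-D query on the last coordinate
        keys, pts = s
        out.extend(pts[bisect_left(keys, a):bisect_right(keys, b)])
        return
    v = s
    while v is not None and not (a <= v.point[dim] <= b):   # descend to vsplit
        v = v.left if b < v.point[dim] else v.right
    if v is None:
        return
    if in_box(v.point, box):
        out.append(v.point)
    u = v.left                                   # left path (search for a)
    while u is not None:
        if a <= u.point[dim]:
            if in_box(u.point, box):
                out.append(u.point)
            if u.right is not None:              # recurse one dimension down
                query(u.right.assoc, box, dim + 1, d, out)
            u = u.left
        else:
            u = u.right
    u = v.right                                  # right path (search for b)
    while u is not None:
        if u.point[dim] <= b:
            if in_box(u.point, box):
                out.append(u.point)
            if u.left is not None:
                query(u.left.assoc, box, dim + 1, d, out)
            u = u.right
        else:
            u = u.left


def range_query(tree, box: Box, d: int) -> List[Point]:
    out: List[Point] = []
    query(tree, box, 0, d, out)
    return out


P3 = [(1, 2, 3), (4, 5, 6), (2, 2, 2), (7, 1, 4)]
T3 = build(P3, 0, 3)
print(sorted(range_query(T3, [(1, 4), (1, 5), (2, 6)], 3)))   # [(1, 2, 3), (2, 2, 2), (4, 5, 6)]
```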


  • Orthogonal Range Searching Range trees in higher dimensions

    Analysis

    Query time: O(log^d n + k)
    There are d nested levels in a d-dimensional range tree. Each level multiplies the number of canonical structures to search by O(log n), so searching through the d levels takes O(log^d n) time. Reporting all points inside the query range takes O(k) time.

    Space complexity: O(n log^(d-1) n), by induction on d (see next slide).

    Preprocessing time: O(n log^(d-1) n)
    Compute the Tv's from the leaves to the root. As the size of the range tree is O(n log^(d-1) n), building the whole range tree takes O(n log^(d-1) n) time.


  • Orthogonal Range Searching Range trees in higher dimensions

    Space complexity (Proof by Induction)

    Suppose a (d-1)-dimensional range tree on m points has size O(m log^(d-2) m).

    Figure: the tree T (log n levels), with an associated (d-1)-dimensional range tree Ti at each node. On each level of T, the associated trees have total size:
    root: O(n log^(d-2) n)
    depth 1: 2 O((n/2) log^(d-2)(n/2)) = O(n log^(d-2)(n/2))
    depth 2: 4 O((n/4) log^(d-2)(n/4)) = O(n log^(d-2)(n/4))
    ...
    leaves: n O(1) = O(n)

    Then the size of the d-dimensional range tree is

    $$O(n\log^{d-2} n) + O\big(n\log^{d-2}\tfrac{n}{2}\big) + O\big(n\log^{d-2}\tfrac{n}{4}\big) + \dots + O(n) \le \log n \cdot O(n\log^{d-2} n) = O(n\log^{d-1} n).$$

  • Orthogonal Range Searching Improved range trees

    Improved range trees:

    Fractional cascading


  • Orthogonal Range Searching Improved range trees

    Motivation

    In ℝ² the query time of range trees is Θ(k + log² n). For comparison-based algorithms, Ω(k + log n) is a lower bound.

    Can we do better and achieve the lower bound? Yes: we'll now show how to obtain optimal O(k + log n) query time.


  • Orthogonal Range Searching Improved range trees

    Step 1: Querying x-coordinates (same as before)

    Make the query with range [a1, b1] on x-coordinates.


    Take the nodes on the left path and the right path. Select the canonical set Ci at the right child of a node on the left path (where the path goes left). Select the canonical set Cj at the left child of a node on the right path (where the path goes right).

    This takes O(log n) time (the height of the BBST T). Let {C1, C2, ..., Cc} be the canonical sets selected, where c = O(log n).


  • Orthogonal Range Searching Improved range trees

    Step 2: Querying y-coordinates (Modified)

    When processing a query (a1, b1, a2, b2), we search the canonical trees Tv, always with the same two keys a2 and b2. For each such tree, we spend O(log n) searching time.
    Main idea: the canonical sets of v's children, Cv.left and Cv.right, are subsets of Cv. So we keep pointers from each node of Tv to the nodes of Tv.left and Tv.right that hold the same key, or the next larger key.


    Thus, after performing the search for a2 (or b2) in Tv, we can perform the search for a2 (or b2) in Tv.left and Tv.right in O(1) time.


  • Orthogonal Range Searching Improved range trees

    Step 2: Querying y-coordinates (Modified)

    Minor idea: replace each canonical tree Ti by a canonical array Ai, which stores the canonical set Ci sorted by y-coordinate:

    search for key a2 in array Ai;
    starting from a2, walk along array Ai until b2 is exceeded.
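    The search-and-walk on a single canonical array is small enough to show directly. In this Python sketch the array holds only y-keys rather than points, for brevity, and walk_array is a hypothetical name. Without cascading pointers, the initial binary search costs O(log n) for every array.

```python
from bisect import bisect_left
from typing import List


def walk_array(A: List[float], a2: float, b2: float) -> List[float]:
    """Report every y in the sorted canonical array A with a2 <= y <= b2."""
    i = bisect_left(A, a2)             # first position whose value is >= a2
    out: List[float] = []
    while i < len(A) and A[i] <= b2:   # walk right until b2 is exceeded
        out.append(A[i])
        i += 1
    return out


print(walk_array([1, 3, 4, 4, 7, 9], 3, 7))   # [3, 4, 4, 7]
```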



  • Orthogonal Range Searching Improved range trees

    Step 2: Querying y-coordinates (Modified)

    First, perform a binary search for a2 in Aroot, which takes O(log n) time.


    By following the pointer links down T, we can locate a2 in each canonical array Ai in O(1) additional time. Starting from the position of a2, walk along the array Ai (reporting the points) until b2 is exceeded.
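    The 2-dimensional structure with canonical arrays and cascading pointers can be sketched in Python. There is one binary search, at the root; after that, each step down T translates the position of a2 in O(1). FCNode, build_fc and query_fc are hypothetical names, and one point is stored per node. For brevity, the pointers are computed with a binary search per entry, O(|Cv| log n) per node, rather than during a linear-time merge. This affects only preprocessing, not the query.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class FCNode:
    """Node of T (keyed on x) holding its canonical array Av and cascading pointers."""

    def __init__(self, point: Point, left: Optional["FCNode"],
                 right: Optional["FCNode"], pts: List[Point]) -> None:
        self.point, self.left, self.right = point, left, right
        self.A = sorted(pts, key=lambda p: p[1])     # canonical array Av, sorted by y
        self.ys = [p[1] for p in self.A]
        self.ptr_l = self._links(left)                # Av position -> left child's position
        self.ptr_r = self._links(right)               # Av position -> right child's position

    def _links(self, child: Optional["FCNode"]) -> List[int]:
        # Entry i: first index in the child's array whose y >= Av[i].y.
        # Extra last entry: "past the end" of Av maps to "past the end" of the child's array.
        child_ys = child.ys if child is not None else []
        return [bisect_left(child_ys, y) for y in self.ys] + [len(child_ys)]


def build_fc(pts_by_x: List[Point]) -> Optional[FCNode]:
    if not pts_by_x:
        return None
    mid = len(pts_by_x) // 2
    return FCNode(pts_by_x[mid], build_fc(pts_by_x[:mid]),
                  build_fc(pts_by_x[mid + 1:]), pts_by_x)


def query_fc(root: Optional[FCNode], a1: float, b1: float,
             a2: float, b2: float) -> List[Point]:
    out: List[Point] = []
    if root is None:
        return out

    def check(p: Point) -> None:                  # points on the search paths
        if a1 <= p[0] <= b1 and a2 <= p[1] <= b2:
            out.append(p)

    def walk(w: FCNode, j: int) -> None:          # j = first index of Aw with y >= a2
        while j < len(w.A) and w.A[j][1] <= b2:
            out.append(w.A[j])
            j += 1

    v, i = root, bisect_left(root.ys, a2)         # the only binary search: O(log n)
    while v is not None and not (a1 <= v.point[0] <= b1):
        if b1 < v.point[0]:
            v, i = v.left, v.ptr_l[i]
        else:
            v, i = v.right, v.ptr_r[i]
    if v is None:
        return out
    check(v.point)
    u, j = v.left, v.ptr_l[i]                     # left path (search for a1)
    while u is not None:
        if a1 <= u.point[0]:
            check(u.point)
            if u.right is not None:
                walk(u.right, u.ptr_r[j])         # O(1) to locate a2 in the child's array
            u, j = u.left, u.ptr_l[j]
        else:
            u, j = u.right, u.ptr_r[j]
    u, j = v.right, v.ptr_r[i]                    # right path (search for b1)
    while u is not None:
        if u.point[0] <= b1:
            check(u.point)
            if u.left is not None:
                walk(u.left, u.ptr_l[j])
            u, j = u.right, u.ptr_r[j]
        else:
            u, j = u.left, u.ptr_l[j]
    return out


P = [(1, 4), (2, 8), (3, 3), (4, 5), (5, 1), (6, 6), (7, 2)]
T = build_fc(sorted(P))
print(sorted(query_fc(T, 2, 6, 2, 6)))   # [(3, 3), (4, 5), (6, 6)]
```

    The pointer invariant is that i (or j) is always the first position in the current node's array with y ≥ a2. It survives each step down because the child's array is a subset of the parent's.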


  • Orthogonal Range Searching Improved range trees

    Improving d-dim. range trees

    Hence we can answer a 2-dimensional range query in optimal O(log n + k) time. This technique is known as fractional cascading.

    By induction, it also improves the results for d > 2 by an O(log n) factor (using canonical arrays at the last level, together with the linking pointers). Hence range trees with fractional cascading in d ≥ 2 dimensions yield:

    Query time: O(k + log^(d-1) n) (improved by an O(log n) factor)
    Space usage: O(n log^(d-1) n) (same as before)
    Preprocessing time: O(n log^(d-1) n) (same as before)


  • Orthogonal Range Searching Improved range trees

    Remarks on 2-dim. improved range trees

    O(log n + k) query time and O(n log n) preprocessing time are optimal.
    But the space complexity is NOT optimal: O(n log n / log log n) space is possible in 2 dimensions with the same query time, and this is optimal (not covered in this course).


  • Orthogonal Range Searching Improved range trees

    Concluding remarks

    Range trees: simple, nearly optimal.

    Spatial databases mainly use R-trees (not covered in this course): they are good in practice with real data sets, but come with no performance guarantee (no good worst-case bound on the query time).


  • Orthogonal Range Searching Next Lecture

    Summary of this lecture: Orthogonal Range Searching

    2-dim. range trees
    d-dim. range trees
    Fractional cascading

    Next lecture: Segment Trees and Interval Trees

    Segment Trees
    Interval Trees
