computational geometry - search in high dimension and kd-trees · 2018-05-15 · computational...

Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms, National Kapodistrian U. Athens ATHENA Research & Innovation Center, Greece Spring 2018 I.Emiris (Athens, Greece) Computational Geometry Spring 2018 1 / 57


Computational Geometry
Search in High dimension and kd-trees

Ioannis Emiris

Dept Informatics & Telecoms, National Kapodistrian U. Athens
ATHENA Research & Innovation Center, Greece

Spring 2018

I.Emiris (Athens, Greece) Computational Geometry Spring 2018 1 / 57


Contents

1 Orthogonal Range Search
    2D orthogonal range search

2 Nearest Neighbors
    Approximate nearest neighbor

3 Trees
    kd-trees
    Randomized kd-trees
    Balanced Box-Decomposition trees


Range query interpreted geometrically

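[Figure: records plotted as points with axes "date of birth" and "salary"; the query rectangle spans dates 19500000 to 19559999 and salaries 3000 to 4000. Example point: G. Ometer, born Aug 19, 1954, salary $3200.]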


Range query in 3 dimensions

[Figure: the same query as a 3-dimensional box with axes "date of birth" (19500000 to 19559999), "salary" (3000 to 4000) and "children" (2 to 4).]


The geometric approach

We are interested in answering queries on d fields of the records in our database.

Transform the records to points in d-dimensional space.

The transformed range query asks for all points inside a d-dimensional axis-parallel box (may be unbounded).

Such a query is called “rectangular” or “orthogonal” range query.
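The transformation can be illustrated with a minimal sketch. The record fields, the sample data other than the G. Ometer entry, and the helper names `to_point`, `in_box` and `range_query` are assumptions made for illustration; the query is answered by brute force in O(n), which is the baseline the data structures on the following slides improve on.

```python
from datetime import date

# Hypothetical records: (name, date of birth, salary, number of children).
records = [
    ("G. Ometer", date(1954, 8, 19), 3200, 2),
    ("A. Example", date(1948, 3, 2), 3500, 1),
    ("B. Sample", date(1952, 11, 30), 4500, 3),
]

def to_point(record):
    """Map a record to a point; the date becomes 10000*year + 100*month + day."""
    _, dob, salary, children = record
    return (dob.year * 10000 + dob.month * 100 + dob.day, salary, children)

def in_box(point, box):
    """box holds one (low, high) pair per coordinate; None means unbounded."""
    return all(
        (lo is None or lo <= x) and (hi is None or x <= hi)
        for x, (lo, hi) in zip(point, box)
    )

def range_query(records, box):
    """Brute-force orthogonal range query, O(n) per query."""
    return [r[0] for r in records if in_box(to_point(r), box)]

# Born 1950-1955, salary 3000-4000, any number of children (unbounded side).
box = [(19500000, 19559999), (3000, 4000), (None, None)]
matches = range_query(records, box)
```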


1-Dimensional Range Search

Problem

Preprocess a set of points P = {p1, p2, ..., pn} ⊂ R so as to answer queries efficiently: Which points lie inside a query interval [x : x′]?

Arrays

O(n) space, O(n log n) preprocess, O(k + log n) query

But arrays do not generalize to higher dimensions,

and they do not allow efficient updates: O(n) per update.
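A minimal sketch of the array solution, using Python's standard `bisect` module for the two binary searches. The function name `range_query_1d` is an assumption; the sample values are the ones that appear in the search-tree figure later in the deck.

```python
from bisect import bisect_left, bisect_right

def range_query_1d(sorted_pts, x, x_prime):
    """Report all points in [x, x_prime]: O(log n) for the two searches plus O(k) output."""
    lo = bisect_left(sorted_pts, x)          # first index with value >= x
    hi = bisect_right(sorted_pts, x_prime)   # first index with value > x_prime
    return sorted_pts[lo:hi]

# Preprocessing: sort once, O(n log n).
pts = sorted([49, 23, 80, 10, 37, 62, 89, 3, 19, 30, 59, 70, 100, 105])
result = range_query_1d(pts, 18, 77)
```

Inserting a new point into `pts` while keeping it sorted shifts up to n entries, which is the O(n) update cost noted above.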

Balanced Binary Search Trees (BBST)

The leaves of T store the points of P,

internal nodes store splitting values that guide the search.

E.g. red-black trees, AVL trees.


A search with the interval [18 : 77]

[Figure: a balanced binary search tree on the values 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105; the searches for 18 and 77 end at the leaves µ and µ′.]


A search with the interval [x , x ′]

Search for x and x′ in T. The two searches end at leaves µ and µ′.

Report all points stored at leaves between µ and µ′ plus, possibly, the points stored at µ and µ′.

Remark

The leaves to be reported are the ones of subtrees that are rooted at nodes whose parents are on the search paths to µ and µ′.
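The search can be sketched as follows. This is an illustrative version, not the author's code: it builds a static balanced tree from a sorted list instead of a red-black or AVL tree, and it assumes each internal node stores the largest value of its left subtree as splitting value. The names `Node`, `build`, `report_subtree` and `range_query` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

    def is_leaf(self):
        return self.left is None

def build(sorted_pts):
    """Leaves store the points; an internal node stores the max of its left subtree."""
    if len(sorted_pts) == 1:
        return Node(sorted_pts[0])
    mid = (len(sorted_pts) - 1) // 2
    return Node(sorted_pts[mid], build(sorted_pts[:mid + 1]), build(sorted_pts[mid + 1:]))

def report_subtree(v, out):
    if v.is_leaf():
        out.append(v.key)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)

def range_query(root, x, x_prime):
    out = []
    v = root
    # Walk down while the paths to x and x_prime coincide; stop at the split node.
    while not v.is_leaf() and (x_prime <= v.key or x > v.key):
        v = v.left if x_prime <= v.key else v.right
    if v.is_leaf():
        if x <= v.key <= x_prime:
            out.append(v.key)
        return out
    u = v.left                      # path to x: report subtrees hanging to the right
    while not u.is_leaf():
        if x <= u.key:
            report_subtree(u.right, out)
            u = u.left
        else:
            u = u.right
    if x <= u.key <= x_prime:       # leaf mu
        out.append(u.key)
    u = v.right                     # path to x_prime: report subtrees hanging to the left
    while not u.is_leaf():
        if x_prime > u.key:
            report_subtree(u.left, out)
            u = u.right
        else:
            u = u.left
    if x <= u.key <= x_prime:       # leaf mu'
        out.append(u.key)
    return out

values = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105]
root = build(values)
found = sorted(range_query(root, 18, 77))
```

The points come out grouped by subtree rather than in sorted order, hence the final `sorted` call in the usage line.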


The selected subtrees

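[Figure: the search paths from root(T) to µ and µ′ diverge at the node v_split; the selected subtrees hang between the two paths.]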


Correctness and Performance

Any reported point lies in the query range.

Any point in the range is reported.

O(n) storage.

O(n log n) preprocessing.

O(log n) update.

Θ(n) worst-case query cost.

O(k + log n) output-sensitive query cost: O(k) to report the points plus O(log n) to follow the paths to x, x′.


kd-Trees in the plane

Problem

Preprocess points P = {p1, p2, ..., pn} ⊂ R^2, to answer queries efficiently: Which points lie inside a query rectangle [x : x′] × [y : y′]? A point p = (px, py) lies in the rectangle iff px ∈ [x, x′] and py ∈ [y, y′].

kd-trees

Generalize BBST: they split the current pointset at the median value, but use a different coordinate at each level.

The left subtree contains the half (or one fewer) of the points with smaller coordinate and the point with the median value.

Points shall correspond to leaves (and plane regions).


The way the plane is subdivided

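[Figure: the plane subdivided by the splitting lines l1, ..., l9 into cells containing the points p1, ..., p10; the lines are added one at a time, starting with l1.]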


The corresponding binary tree

[Figure: the corresponding binary tree; internal nodes store the splitting lines l1, ..., l9 with l1 at the root, and leaves store the points p1, ..., p10.]


BuildKdTree(P, depth)

if P contains only one point then
    return a leaf storing this point
else
    if depth is even then
        split P with a vertical line ℓ through the median x-coordinate of the points in P
        P1 ← points left of ℓ or on ℓ
        P2 ← points right of ℓ
    else {depth is odd}
        split P with a horizontal line ℓ through the median y-coordinate of the points in P
        P1 ← points below ℓ or on ℓ
        P2 ← points above ℓ
    end if
    vleft ← BuildKdTree(P1, depth + 1); vright ← BuildKdTree(P2, depth + 1)
    create a node v storing ℓ
    lc(v) ← vleft; rc(v) ← vright
    return v
end if
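The pseudocode translates fairly directly into the following sketch. It assumes a non-empty input with distinct x- and y-coordinates, and it finds the median by sorting at every level, which is simpler than linear-time selection but costs O(n log² n) overall. The class and helper names (`Leaf`, `Node`, `leaves`, `height`) and the sample points are illustrative assumptions.

```python
class Leaf:
    def __init__(self, point):
        self.point = point

class Node:
    def __init__(self, axis, value, left, right):
        self.axis = axis      # 0: vertical line x = value, 1: horizontal line y = value
        self.value = value    # median coordinate defining the splitting line
        self.left = left      # subtree for points on the line or on its smaller side
        self.right = right    # subtree for points on the larger side

def build_kd_tree(points, depth=0):
    if len(points) == 1:
        return Leaf(points[0])
    axis = depth % 2                              # even depth: x, odd depth: y
    pts = sorted(points, key=lambda p: p[axis])   # sorting instead of linear-time selection
    mid = (len(pts) - 1) // 2                     # index of the median point
    return Node(axis, pts[mid][axis],
                build_kd_tree(pts[:mid + 1], depth + 1),
                build_kd_tree(pts[mid + 1:], depth + 1))

def leaves(v):
    return [v.point] if isinstance(v, Leaf) else leaves(v.left) + leaves(v.right)

def height(v):
    return 0 if isinstance(v, Leaf) else 1 + max(height(v.left), height(v.right))

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 9), (6, 8), (3, 5), (10, 10)]
root = build_kd_tree(points)
```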


Building time and storage

Remarks

Split at the (n/2)-th smallest (median) coordinate: O(n) time,

or preprocess by sorting both on x- and y-coordinates.

The building time satisfies the recurrence:

\[ T(n) = \begin{cases} O(1) & \text{if } n = 1 \\ O(n) + 2\,T(n/2) & \text{if } n > 1 \end{cases} \]

T(n) = O(n log n), which subsumes sorting.

O(n) storage: points stored at leaves, each leaf contains ≥ 1 points (alternatively, points stored at internal/splitting nodes).


Nodes in a kd-tree and regions in the plane

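[Figure: a kd-tree whose path through the splitting lines l1, l2, l3 leads to a node v, shown next to the plane region region(v) bounded by those lines.]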


Regions and the query algorithm

Internal nodes of a kd-tree correspond to rectangular regions of the plane, which can be unbounded on one or more sides.

Regions of all nodes at a specific level partition the plane.

region(root(T)) is the whole plane.

A point is stored at (a leaf of) the subtree rooted at v iff it lies in region(v).

Search the subtree of v only if the query rectangle intersects region(v).


A query on a kd-tree

[Figure: a query rectangle drawn over the subdivision (lines l1, ..., l9, points p1, ..., p10), shown next to the corresponding kd-tree.]


Algorithm

SearchKdTree(v, R)

Input: root v, range R.

if v is a leaf then
    report the point stored at v if it lies in R
else
    if region(lc(v)) is fully contained in R then
        ReportSubtree(lc(v))
    else
        if region(lc(v)) intersects R then
            SearchKdTree(lc(v), R)
        end if
    end if
    if region(rc(v)) is fully contained in R then
        ReportSubtree(rc(v))
    else
        if region(rc(v)) intersects R then
            SearchKdTree(rc(v), R)
        end if
    end if
end if

Works for any query range R, e.g. disk, triangle.

O(k) to report k points.

How many other nodes v are visited? I.e., for how many v does the query range intersect region(v)?
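A possible rendering of the query for axis-parallel rectangles is sketched below. To stay self-contained it includes a compact builder in which a leaf is the point tuple itself and an internal node is the tuple (axis, value, left, right); that representation, the function names, and the choice to pass region(v) down the recursion rather than store it in the nodes are all assumptions for illustration. Regions are treated as closed boxes, so a point lying exactly on a splitting line is still found.

```python
import math

def build(points, depth=0):
    """Compact 2D kd-tree: leaf = point tuple, internal node = (axis, value, left, right)."""
    if len(points) == 1:
        return points[0]
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2
    return (axis, pts[mid][axis], build(pts[:mid + 1], depth + 1), build(pts[mid + 1:], depth + 1))

def is_leaf(v):
    return len(v) == 2

def contained(region, R):
    return all(R[a][0] <= region[a][0] and region[a][1] <= R[a][1] for a in (0, 1))

def intersects(region, R):
    return all(region[a][0] <= R[a][1] and R[a][0] <= region[a][1] for a in (0, 1))

def report_subtree(v, out):
    if is_leaf(v):
        out.append(v)
    else:
        report_subtree(v[2], out)
        report_subtree(v[3], out)

def search_kd_tree(v, R, region, out):
    """R and region are ((x_lo, x_hi), (y_lo, y_hi)); region is region(v)."""
    if is_leaf(v):
        if all(R[a][0] <= v[a] <= R[a][1] for a in (0, 1)):
            out.append(v)
        return
    axis, value, left, right = v
    lo, hi = region[axis]
    for child, bounds in ((left, (lo, value)), (right, (value, hi))):
        child_region = list(region)
        child_region[axis] = bounds              # clip region(v) by the splitting line
        if contained(child_region, R):
            report_subtree(child, out)
        elif intersects(child_region, R):
            search_kd_tree(child, R, child_region, out)

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 9), (6, 8), (3, 5), (10, 10)]
tree = build(points)
whole_plane = ((-math.inf, math.inf), (-math.inf, math.inf))
found = []
search_kd_tree(tree, ((3, 8), (2, 7)), whole_plane, found)
```

For a disk or triangle query, only `contained`, `intersects` and the leaf test would need to change.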


Query time analysis

Any vertical line intersects region(lc(root(T))) or region(rc(root(T))) but not both.

If a vertical line intersects region(lc(root(T))) it always intersects the regions corresponding to both children of lc(root(T)).


The number of regions intersected by a vertical line in a kd-tree storing n points satisfies the recurrence:

\[ Q(n) = \begin{cases} O(1) & \text{if } n = 1 \\ 2 + 2\,Q(n/4) & \text{if } n > 1 \end{cases} \]

Q(n) = O(√n) ⇒ time = O(√n + k) for a rectangular query.

The analysis is rather pessimistic: in many practical situations the query range is small and will intersect far fewer regions.
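The closed form can be checked by unrolling the recurrence. The derivation below assumes, for simplicity, that n is a power of 4 and writes the O(1) base case as a constant Q(1); neither assumption is stated on the slide.

```latex
\begin{align*}
Q(n) &= 2 + 2\,Q(n/4) \\
     &= 2 + 4 + 4\,Q(n/16) \\
     &= \sum_{i=1}^{j} 2^{i} + 2^{j}\,Q\!\left(n/4^{j}\right) \\
     &= \left(2^{j+1} - 2\right) + 2^{j}\,Q(1) \qquad \text{for } j = \log_4 n \\
     &= 2\sqrt{n} - 2 + \sqrt{n}\,Q(1) \;=\; O(\sqrt{n}),
\end{align*}
```

using 2^{\log_4 n} = \sqrt{n}. The same bound holds for horizontal lines. A region that intersects the query rectangle without being contained in it must cross one of the rectangle's four sides, so O(√n) such regions are visited in total.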


Introduction

Given a distance function/metric:

Preprocess: set of points/objects P = {p1, . . . , pn} in d dimensions.

Query: Given a d-dimensional query point/object q, report the closest p ∈ P to q.


Motivation

Points model general objects (e.g. handwritten digits)

Distance between points is inversely related to a similarity measure.


Several applications

Machine Learning: clustering/classification.


Pattern Recognition and Classification


Searching multimedia databases.


NN in R

Sort/store the n points, use binary search for queries, then:

Preprocessing in O(n log n) time

Data structure requiring O(n) space

Answer the query in O(log n) time
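A short sketch of the one-dimensional case with the standard `bisect` module. The function name `nearest_1d` and the sample values are assumptions; the input is assumed to be non-empty.

```python
from bisect import bisect_left

def nearest_1d(sorted_pts, q):
    """Binary search for q, then compare its two neighbours in sorted order: O(log n)."""
    i = bisect_left(sorted_pts, q)                 # first index with value >= q
    candidates = sorted_pts[max(i - 1, 0): i + 1]  # predecessor and successor, if they exist
    return min(candidates, key=lambda p: abs(p - q))

pts = sorted([30, 3, 23, 10, 19])   # preprocessing: O(n log n)
answer = nearest_1d(pts, 22)
```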


NN in R^2

Preprocessing: Voronoi Diagram in O(n log n).

Storage = O(n).

Given query q, find the cell it belongs to (point location) in O(log n). NN = site of the cell containing q.


Exact NN in R^d

Is it faster than linear-time?

Curse of Dimensionality:

Complexity of the Voronoi diagram grows rapidly: $O(n^{\lceil d/2 \rceil})$.

Planar point location methods do not extend to higher dimensions.

The volume of the space increases so fast that data becomes sparse.

State of the art:

kd-trees: Sp = O(n), Query = $O(d \cdot n^{1-1/d})$. Most practical for $d \ll \log n$: O(log n) expected for “random” points.

Randomized [Clarkson’88]: Sp = $O(n^{\lceil d/2 \rceil + \delta})$, Q $\simeq \log n \cdot \exp(d)$.

n hyperplanes: point location $O(d^5 \log n)$, Sp = $O(n^{d+\delta})$ [Meiser’93].


Nearest Neighbor in high dimension

Exact NN

Given set P in d dimensions, and query point q, its NN is point p0 ∈ P:

dist(p0, q) ≤ dist(p, q), ∀p ∈ P.

Approximate NN

Given set P in d dimensions, approximation factor 1 > ε > 0, and query point q, an ε-NN, or ANN, is any point p0 ∈ P:

dist(p0, q) ≤ (1 + ε) dist(p, q), ∀p ∈ P.

[Figure: query point q, its NN at distance r, the ball of radius (1 + ε)r around q, and candidate points x*1 and x*2.]
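The definition can be expressed as a small brute-force check, assuming the Euclidean metric. The helper name `is_eps_nn` and the sample points are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math

def is_eps_nn(candidate, points, q, eps):
    """True if candidate is within a factor (1 + eps) of the exact NN distance."""
    r = min(math.dist(p, q) for p in points)   # exact NN distance, by brute force
    return math.dist(candidate, q) <= (1 + eps) * r

P = [(0.0, 1.0), (0.0, 1.05), (3.0, 4.0)]
q = (0.0, 0.0)
ok = is_eps_nn((0.0, 1.05), P, q, eps=0.1)     # 1.05 <= 1.1 * 1.0
```

The exact NN always passes the check, and with ε = 0 only points at the exact NN distance do.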


Approximate NN in R^d

BBD tree [Arya,Mount et al.’98]: optimal query for d = O(1).

In practice like kd-trees:

cgal offers “lazy” kd-trees
ann [Mount] for d ≤ 60
flann [Lowe-Muja], kd-geraf [E-Samaras]: randomized

Locality sensitive hashing (LSH) for ε-NN: Sp = $O(dn^{1+\rho})$, Q = $O(dn^{\rho})$, $\rho = 1/(1+\varepsilon)^2$.
[Indyk,Motwani’98] [Panigrahy’06] [Andoni,Indyk’06]

Dimensionality reduction [Anagnostopoulos,E,Psarros’15’17]: Sp = $O^*(dn)$, Q = $O^*(dn^{\rho})$, $\rho = 1 + \varepsilon^2/\log \varepsilon < 1$.


NN formulations

Standard computational geometry: the space is Euclidean R^d, for constant d.

Complex data: treat d as an asymptotic quantity and seek solutions having no exponential dependence on d.

Wish to treat arbitrary metric (nonvector) spaces.

Structure, especially for metric spaces: may assume a growth-limiting property, e.g. constant doubling dimension: a ball of twice the radius is covered by a constant number of balls (true for Euclidean space of constant dimension).


Grid for Uniform points

n uniformly distributed points in $[0, 1]^d$

Cell structure (array) using parameter c = O(1):
– Expected #points per box = c (high c increases box search).
– Expected n/c boxes (high c reduces array size).
– Each box has volume c/n and edge length $(c/n)^{1/d} < 1$.

Query lands in a box in O(1), checks points in box:
– Given current best distance, check the $3^d - 1$ adjacent boxes.
– In expectation, O(1) boxes visited, in time O(c).
– Expected query time = $O(c + 3^d)$.
[Bentley-W.-Yao’80,Bentley’90].
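The grid idea can be sketched as below. This is not the cited authors' implementation: the cells live in a dictionary rather than an array, and the search visits rings of cells of increasing radius around the query's cell (ring 1 being the 3^d − 1 adjacent boxes) until no unvisited cell can hold a closer point. The names `build_grid`, `cell_of` and `nearest` are assumptions, and both data and query are assumed to lie in [0, 1]^d.

```python
import math
import random
from collections import defaultdict
from itertools import product

def cell_of(p, m):
    """Index of the cell containing p when each axis is cut into m equal parts."""
    return tuple(min(int(x * m), m - 1) for x in p)

def build_grid(points, c=2.0):
    n, d = len(points), len(points[0])
    m = max(1, int((n / c) ** (1.0 / d)))   # about n/c cells, each of edge length 1/m
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, m)].append(p)
    return grid, m

def nearest(grid, m, q):
    home = cell_of(q, m)
    best, best_dist = None, math.inf
    for ring in range(m + 1):
        # Visit the cells at Chebyshev distance exactly `ring` from the home cell.
        for offset in product(range(-ring, ring + 1), repeat=len(q)):
            if max(abs(o) for o in offset) != ring:
                continue
            cell = tuple(h + o for h, o in zip(home, offset))
            for p in grid.get(cell, ()):
                dist = math.dist(p, q)
                if dist < best_dist:
                    best, best_dist = p, dist
        # Every point in a cell not yet visited is at distance >= ring / m from q.
        if best_dist <= ring / m:
            break
    return best

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
grid, m = build_grid(pts)
query = (0.5, 0.5)
answer = nearest(grid, m, query)
```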


kd-trees

Assuming d > 2 but small.

Iterate through splitting coordinates; various strategies to pick them

Leaves contain ≥ 1 points; bound #levels.


NN search

Procedure NN(node), given query q

if node is leaf then
    search all points in node, update best-dist
else {internal node}
    if split-coor(q) ≤ node’s split-value then
        NN(left-child) // standard branch
        if split-coor(q) + best-dist > node’s split-value then
            NN(right-child) // recurse to check
        end if
    else {split-coor(q) > node’s split-value}
        NN(right-child)
        if split-coor(q) − best-dist ≤ node’s split-value then
            NN(left-child)
        end if
    end if {left/right}
end if {internal node}

Overall top-down algorithm: NN(root).
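The procedure can be sketched in Python as follows. The dictionary-based node layout, the cyclic choice of split coordinate, the `leaf_size` parameter and the function names are assumptions for illustration. The best distance is threaded through the recursion as a (distance, point) pair instead of being kept in a global variable.

```python
import math
import random

def build(points, depth=0, leaf_size=2):
    """kd-tree in d dimensions; leaves are buckets holding up to leaf_size points."""
    if len(points) <= leaf_size:
        return {"points": points}
    axis = depth % len(points[0])                 # cycle through the coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2
    return {"axis": axis, "value": pts[mid][axis],
            "left": build(pts[:mid + 1], depth + 1, leaf_size),
            "right": build(pts[mid + 1:], depth + 1, leaf_size)}

def nn(node, q, best=(math.inf, None)):
    """Return (best-dist, best point) after searching the subtree rooted at node."""
    if "points" in node:                          # leaf: scan the bucket
        for p in node["points"]:
            dist = math.dist(p, q)
            if dist < best[0]:
                best = (dist, p)
        return best
    axis, value = node["axis"], node["value"]
    if q[axis] <= value:
        best = nn(node["left"], q, best)          # standard branch
        if q[axis] + best[0] > value:             # current ball crosses the split
            best = nn(node["right"], q, best)
    else:
        best = nn(node["right"], q, best)
        if q[axis] - best[0] <= value:
            best = nn(node["left"], q, best)
    return best

random.seed(1)
pts = [tuple(random.random() for _ in range(3)) for _ in range(300)]
tree = build(pts)
query = (0.2, 0.7, 0.5)
dist, point = nn(tree, query)
```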


Complexity

Sp = O(d · n).

Construction of balanced tree: O(d · n log n) by sorting per dimension, O(n log n) by linear-time median computation.

(Few) Insert/delete operations in balanced kd-tree = O(log n).

Exact range query = $O(d \cdot n^{1-1/d} + k)$.

In practice, ANN $\simeq$ O(log n) when d = O(1), since O(1) expected neighbors for random (e.g. uniform) distribution. See also BBD-trees: $O((d/\varepsilon)^d \log n)$.


Extension

k Nearest Neighbors

Store k current best points.

Current ball encloses k current best points.

Eliminate the sibling if none of its points can be closer than any of the k current best points, i.e. if the sibling region lies outside the current ball.
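A sketch of the k-NN extension, keeping the k current best points in a max-heap keyed on distance (Python's `heapq` is a min-heap, so distances are stored negated). The tuple-based node layout and the function names are assumptions. The pruning test compares the ball radius with the distance from q to the splitting hyperplane, which is a simpler and slightly weaker test than checking the full sibling region against the ball.

```python
import heapq
import math
import random

def build(points, depth=0, leaf_size=4):
    """Nodes are ("leaf", points) or ("node", axis, value, left, right)."""
    if len(points) <= leaf_size:
        return ("leaf", points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid + 1], depth + 1, leaf_size),
            build(pts[mid + 1:], depth + 1, leaf_size))

def knn(node, q, k, heap=None):
    """heap holds (-distance, point) for the k current best points."""
    if heap is None:
        heap = []
    if node[0] == "leaf":
        for p in node[1]:
            dist = math.dist(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-dist, p))
            elif dist < -heap[0][0]:              # closer than the current k-th best
                heapq.heapreplace(heap, (-dist, p))
        return heap
    _, axis, value, left, right = node
    near, far = (left, right) if q[axis] <= value else (right, left)
    knn(near, q, k, heap)
    radius = -heap[0][0] if len(heap) == k else math.inf   # radius of the current ball
    if abs(q[axis] - value) <= radius:            # sibling may intersect the ball
        knn(far, q, k, heap)
    return heap

random.seed(2)
pts = [tuple(random.random() for _ in range(3)) for _ in range(300)]
tree = build(pts)
query = (0.4, 0.4, 0.9)
neighbours = sorted((p for _, p in knn(tree, query, 5)), key=lambda p: math.dist(p, query))
```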


Splitting at max spread

median of set closest to box centre


Randomization

Construct:

Create r kd-trees s.t. searches are largely independent.

Find O(1) coordinates maximizing variance: pick one randomly.

May sample the data; split it about the mean of the sample.

Use bounded #levels; buckets contain several points.

Principal Component Analysis finds moment axes: rotate to align them with the coordinate axes. Or, random rotation.

Search:

Upper bound on total #nodes to be searched.

Priority queue stores candidates across r trees.

Similar effect to r lower-dim projections [Silpa-Anan,Hartley’08]


FLANN: Fast Library for ANN

Typically r ≤ 6 independent trees.

Target d = 128, n > $10^4$ (SIFT encoding of images).

Given data: Automatic choice of configuration, and algorithm (Randomized kd-trees, Hierarchical k-means trees)

[Lowe:IJCV04], software [Lowe,Muja]


kd-GeRaF

Implement k-ANN

Simultaneous search, no backtracking.

Quickselect algorithm to find median in O(n)

Accelerated distance computations (dot product, see below)

Public domain C++: https://github.com/gsamaras/kd_GeRaF

(WebApp: 195.134.67.90:8080)

[Avrithis,E,Samaras’15]


Randomization

Rotation: Every tree uses a randomly rotated pointset, thus using a different set of dimensions/coordinates.

Split dimension: Pick t dimensions of highest variance. Choose one randomly at every node while building the tree.

Split value: The pointset’s median in split dimension plus uniformly distributed $\delta \in [-3\Delta/\sqrt{d},\ 3\Delta/\sqrt{d}]$, Δ = diameter of pointset.

Shuffling: The split value may be witnessed in several points; instead of always picking the same point, shuffle them to break ties.
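The split-dimension and split-value rules might look as follows in code. This is a sketch under assumptions, not the kd-GeRaF implementation: the function name `choose_split` is hypothetical, the interval for δ is read as [−3Δ/√d, 3Δ/√d], and the diameter Δ is computed exactly by brute force in O(n²), which is only reasonable for small examples.

```python
import math
import random
import statistics

def choose_split(points, t, rng):
    """Pick a random axis among the t of highest variance and a perturbed median on it."""
    d = len(points[0])
    variance = [statistics.pvariance([p[j] for p in points]) for j in range(d)]
    top = sorted(range(d), key=lambda j: variance[j], reverse=True)[:t]
    axis = rng.choice(top)                                            # random among the top t
    median = statistics.median_low([p[axis] for p in points])
    diameter = max(math.dist(a, b) for a in points for b in points)   # brute force, O(n^2)
    bound = 3 * diameter / math.sqrt(d)
    delta = rng.uniform(-bound, bound)                                # random perturbation
    return axis, median + delta, bound

rng = random.Random(0)
# Synthetic data: coordinates 1 and 3 have much higher variance than 0 and 2.
points = [(rng.gauss(0, 1), rng.gauss(0, 10), rng.gauss(0, 1), rng.gauss(0, 10))
          for _ in range(50)]
axis, value, bound = choose_split(points, t=2, rng=rng)
```

Calling `choose_split` with differently seeded generators gives each tree of the forest its own sequence of splits.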


Performance

Parameters (auto or manual)

r Number of trees in forest (points are stored once)

t Number of hi-variance dimensions used for splits

Maximum number of points-per-leaf

c Maximum number of leaves to be checked during search

ε Determine search accuracy

Practical complexity

Automatic parameter configuration yields fastest preprocessing, successful trade-off between accuracy and speed.

Most competing methods suffer from slow parameter configuration,running out of memory, unstable search behaviour.


Implementation

Search

Descend every tree to leaf, store unvisited branch nodes inmin-priority queue Q.

Examine nodes in Q, until c/(1 + ε) leaves are checked.

On descending a tree:
– at leaf: update currently best distance.
– at node: if query in the left half-space, insert right child to Q and descend to left child; or vice versa.
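The search procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not the kd_GeRaF implementation: `build` and `search` are hypothetical names, each tree picks its split dimension uniformly at random and splits at the median, and the queue priority is taken to be the distance from the query to the splitting hyperplane (the slides do not specify the key).

```python
import heapq
import itertools
import random

def build(points, ids, leaf_size, rng):
    """Build a kd-tree over points[i], i in ids, with random split dimensions."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    dim = rng.randrange(len(points[0]))
    ids = sorted(ids, key=lambda i: points[i][dim])
    mid = len(ids) // 2
    return ("node", dim, points[ids[mid]][dim],
            build(points, ids[:mid], leaf_size, rng),
            build(points, ids[mid:], leaf_size, rng))

def search(points, trees, q, c, eps):
    """Approximate NN of q: check about c / (1 + eps) leaves over all trees."""
    budget = max(1, int(c / (1 + eps)))
    best_d2, best_i, checked = float("inf"), -1, 0
    queue, tie = [], itertools.count()   # tie counter keeps heap entries comparable

    def descend(node):
        nonlocal best_d2, best_i, checked
        while node[0] == "node":
            _, dim, value, left, right = node
            diff = q[dim] - value
            near, far = (left, right) if diff < 0 else (right, left)
            # Store the unvisited branch; key = distance to the hyperplane.
            heapq.heappush(queue, (abs(diff), next(tie), far))
            node = near
        for i in node[1]:                # at leaf: update the best distance
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], q))
            if d2 < best_d2:
                best_d2, best_i = d2, i
        checked += 1

    for root in trees:                   # descend every tree to a leaf
        descend(root)
    while queue and checked < budget:    # then examine queued branches
        descend(heapq.heappop(queue)[2])
    return best_i, best_d2

rng = random.Random(1)
pts = [tuple(rng.random() for _ in range(5)) for _ in range(200)]
forest = [build(pts, list(range(len(pts))), 4, rng) for _ in range(3)]
index, dist2 = search(pts, forest, pts[17], c=32, eps=0.5)
```

A single queue shared by all trees means the leaf budget is spent on the most promising branches of the whole forest. With a budget larger than the total number of leaves the search becomes exhaustive and returns the exact nearest neighbor.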

Distance computation

‖x − q‖² = ‖x‖² + ‖q‖² − 2q · x, where the first two terms can be stored. Offers up to 10% speedup.

Project idea: ‖x − q‖² − ‖y − q‖² reduces to ‖x‖² − ‖y‖² + 2q · (y − x), so with stored norms only the inner product 2q · (y − x) depends on q.
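A minimal sketch of the stored-norm trick, assuming points are plain tuples and that the squared norms are precomputed at build time. The helper names (`sq_norm`, `dot`, `nearest`) are illustrative only.

```python
def sq_norm(v):
    """Squared Euclidean norm of v."""
    return sum(c * c for c in v)

def dot(u, v):
    """Inner product of u and v."""
    return sum(a * b for a, b in zip(u, v))

def nearest(points, norms, q):
    """Closest point to q via ||x-q||^2 = ||x||^2 + ||q||^2 - 2 q.x."""
    q_norm = sq_norm(q)                  # computed once per query
    best_i, best_d2 = -1, float("inf")
    for i, (x, x_norm) in enumerate(zip(points, norms)):
        d2 = x_norm + q_norm - 2 * dot(q, x)   # one inner product per point
        if d2 < best_d2:
            best_i, best_d2 = i, d2
    return best_i, best_d2

points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
norms = [sq_norm(p) for p in points]     # stored at build time
print(nearest(points, norms, (1.0, 2.0)))
```

Since ‖q‖² is the same for every candidate, it can even be dropped when only the ranking of candidates matters.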


Experiments

Faster than ann/bbd, flann for d ≥ 1,000 (up to 10,000), n ≤ 10⁶.

(i) SIFT images: n = 10⁶, d = 128, BBD out of memory. GIST: d = 960, (ii) n = 10⁵, (iii) n = 10⁶; query < 1s, 90% exact.


Experiments on Oxford set, CroW features (neural nets)

n = 5062 images, d = 512. Brute force: 5.22 sec. Build takes 2 sec for kd-Geraf.

points per leaf | trees no | t | max leaf check | miss(%) | time(ms)
1 | 1 | 4 | 2 | 4 | 0.2
1 | 1 | 4 | 4 | 0 | 0.3
1 | 4 | 4 | 4 | 0 | 0.5

Search with "Noisy" queries.

points per leaf | trees no | t | max leaf check | miss(%) | time(s)
16 | 8 | 32 | 32 | 3.6 | 0.01
16 | 32 | 64 | 64 | 0 | 0.03
16 | 64 | 64 | 4 | 0 | 0.02

Search with Oxford queries.


BBD-trees

Box: set-theoretic difference of two boxes, one enclosed in the other (inner is optional)

"Empirical runtimes for most distributions show little/no practical advantage over kd-trees" [Arya, Mount, et al. '94, '98].

Complexity:

O(1) points per leaf, space = O(dn).

Height = O(log n): every 4 levels of descent reduce #points to at most 2/3 of their number.

Construct = O(dn log n).

k ε-NNs in time O(((d/ε)^d + k) log n).

Dynamic: point insertion/deletion = O(log n).



Construction

Tree constructed by applying 2 operations while a cell contains more than one point:

(Fair) Split:
– by hyperplane parallel to a coordinate plane, through midpoint.
– If inner box exists, do not intersect it.
– Exponential decrease of region size (quadtree).

Shrink:
– partitions box into inner and outer boxes.
– If inner box exists, it lies inside new inner box.
– Exponential decrease in number of points per cell (kd-tree).

Two strategies:
– Splits and Shrinks alternate.
– Split until both children have < 2/3 of parent's points, then Shrink.


ANN search

Algorithm

1 Find leaf that contains query q; min-dist δ from q to points in leaf

2 Order leaves in increasing distance from q (priority search).

3 Find closest leaf to q, compute min-distance δ from q to points in cell

4 While distance of next closest leaf < δ/(1 + ε), compute min-distance between q and points in cell: if < δ, this distance becomes δ.
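The priority search can be sketched as below, with several simplifying assumptions: leaf cells are plain axis-parallel boxes (no inner boxes), they are given as a flat list and ordered by sorting rather than by traversing the tree, and the loop stops once the next cell is at distance at least δ/(1 + ε), the termination rule of Arya and Mount's priority search. The names `box_dist2` and `ann_search` are hypothetical.

```python
def box_dist2(q, lo, hi):
    """Squared distance from q to the axis-parallel box [lo, hi]."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))

def ann_search(leaves, q, eps):
    """leaves: list of (lo, hi, points). Returns (point, squared distance)."""
    # Order leaves by increasing distance from q; the leaf containing q comes first.
    order = sorted(leaves, key=lambda leaf: box_dist2(q, leaf[0], leaf[1]))
    best, best_d2 = None, float("inf")
    for lo, hi, pts in order:
        # Stop when dist(q, cell) >= delta / (1 + eps), compared on squares.
        if box_dist2(q, lo, hi) * (1 + eps) ** 2 >= best_d2:
            break
        for p in pts:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if d2 < best_d2:
                best, best_d2 = p, d2
    return best, best_d2

leaves = [
    ((0.0, 0.0), (1.0, 1.0), [(0.9, 0.9)]),
    ((1.0, 0.0), (2.0, 1.0), [(1.05, 0.5)]),
    ((5.0, 5.0), (6.0, 6.0), [(5.5, 5.5)]),
]
exact = ann_search(leaves, (0.5, 0.5), eps=0.0)
approx = ann_search(leaves, (0.5, 0.5), eps=0.5)
```

In this toy input the exact search (ε = 0) opens the second cell and finds (1.05, 0.5), while ε = 0.5 stops after the first cell and returns (0.9, 0.9), which is within a factor 1 + ε of the true nearest distance.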

Time Complexity

Point location = O(log n).

#cells explored = c < (1 + 6d/ε)^d.

ANN query = O(cd log n).
