Rethinking Choices for Multi-dimensional Point Indexing You Jung Kim and Jignesh M. Patel University of Michigan

Post on 03-Jan-2016

Page 1: Rethinking Choices for  Multi-dimensional Point Indexing

Rethinking Choices for Multi-dimensional Point Indexing

You Jung Kim and Jignesh M. Patel

University of Michigan

Page 2: Rethinking Choices for  Multi-dimensional Point Indexing

Outline

Motivation
Index structures
Experimental evaluation
Conclusion

Page 3: Rethinking Choices for  Multi-dimensional Point Indexing

Motivation

Need for multi-dimensional point indexing in low- to medium-dimensional space
  Inherent nature of problems
  Use of dimensionality reduction techniques, e.g. PCA

Examples
  Spectral/image search (in feature space)
  Similarity search in sequence and structure databases
  Subsequence matching in time-series databases

Frequent choice: R*-tree

Is this the right choice?

Page 4: Rethinking Choices for  Multi-dimensional Point Indexing

Index Structures

R*-tree
  Data partition
  Balanced tree

Quadtree
  Balanced/disjoint space partition
  Unbalanced tree

Pyramid-Technique
  Unbalanced/disjoint space partition
  Balanced tree
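The slides do not spell out how the Pyramid-Technique partitions space. As a minimal sketch, assuming points normalized to the unit cube [0, 1]^d, the usual formulation splits the cube into 2d pyramids that meet at the center and maps each point to a single number (pyramid number plus height), which can then be stored in a balanced one-dimensional index such as a B+-tree. The function name `pyramid_value` is illustrative only.

```python
def pyramid_value(point):
    """Map a point in [0, 1]^d to a one-dimensional pyramid value.

    Assumes coordinates are already normalized to the unit cube.
    """
    d = len(point)
    # Dimension in which the point lies farthest from the center 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    # Pyramids 0..d-1 sit on the low side of the center,
    # pyramids d..2d-1 on the high side.
    pyramid = j_max if point[j_max] < 0.5 else j_max + d
    # Height: distance from the center along the dominant dimension.
    height = abs(0.5 - point[j_max])
    return pyramid + height


# A point near the low edge of dimension 0 lands in pyramid 0.
print(pyramid_value((0.25, 0.5)))
# A point near the high edge of dimension 1 lands in pyramid 1 + d = 3.
print(pyramid_value((0.5, 0.875)))
```

Because the pyramids are fixed by the cube and not by the data, a dataset concentrated in one region maps into a narrow band of pyramid values, which is one way to read the "unbalanced space partition" label above.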

Page 5: Rethinking Choices for  Multi-dimensional Point Indexing

Packed Quadtree

Reduced disk footprint for the index
Clustering sibling nodes

[Figure: regular Quadtree vs. packed Quadtree]
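The slides do not describe the packed layout itself. The sketch below shows one hypothetical reading of "clustering sibling nodes": a bucket Quadtree over [0, 1)^d in which the 2^d children of a split node are allocated in adjacent slots of a flat list, standing in for siblings that share a disk page. The class name, the `capacity` and `min_size` parameters, and the node fields are all assumptions made for illustration.

```python
class Quadtree:
    """Bucket point Quadtree over [0, 1)^d with siblings stored contiguously."""

    def __init__(self, dims, capacity=4, min_size=1e-6):
        self.dims = dims
        self.capacity = capacity  # max points in a leaf before it splits
        self.min_size = min_size  # stop splitting below this cell size
        # nodes[0] is the root; children of a node occupy 2**dims
        # adjacent slots starting at node["first_child"].
        self.nodes = [self._new_node([0.0] * dims, 1.0)]

    def _new_node(self, origin, size):
        return {"origin": origin, "size": size, "points": [], "first_child": None}

    def _child_offset(self, node, point):
        # One bit per axis: set when the point is in the upper half.
        half = node["size"] / 2
        offset = 0
        for axis in range(self.dims):
            if point[axis] >= node["origin"][axis] + half:
                offset |= 1 << axis
        return offset

    def _split(self, idx):
        node = self.nodes[idx]
        half = node["size"] / 2
        node["first_child"] = len(self.nodes)
        # Allocate all siblings together so they stay adjacent.
        for offset in range(2 ** self.dims):
            origin = [
                node["origin"][a] + (half if (offset >> a) & 1 else 0.0)
                for a in range(self.dims)
            ]
            self.nodes.append(self._new_node(origin, half))
        points, node["points"] = node["points"], []
        for p in points:
            self.insert(p, idx)

    def insert(self, point, idx=0):
        # Descend to the leaf whose cell contains the point.
        while self.nodes[idx]["first_child"] is not None:
            node = self.nodes[idx]
            idx = node["first_child"] + self._child_offset(node, point)
        leaf = self.nodes[idx]
        leaf["points"].append(point)
        if len(leaf["points"]) > self.capacity and leaf["size"] > self.min_size:
            self._split(idx)


qt = Quadtree(dims=2, capacity=2)
for p in [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9)]:
    qt.insert(p)
print(len(qt.nodes), qt.nodes[0]["first_child"])
```

The cells are regular and disjoint, so the tree can become unbalanced: dense regions split deeper than sparse ones.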

Page 6: Rethinking Choices for  Multi-dimensional Point Indexing

Experimental Setup

Three indices and a file scan in SHORE
Synthetic and real datasets
  Uniformly distributed point data
  MAPS Catalog data
Query workload
  Random and skewed queries following the underlying data distribution
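The slides give no detail on how the workloads were generated. As a rough sketch of the distinction between the two query types, assume axis-aligned range queries of a fixed side length over the unit cube: a random query picks its center uniformly in the space, while a skewed query picks its center from the data itself, so queries concentrate where the data does. All function names, the query shape, and the `side` parameter are assumptions.

```python
import random


def uniform_points(n, dims, rng):
    """Generate n points uniformly distributed in [0, 1)^dims."""
    return [tuple(rng.random() for _ in range(dims)) for _ in range(n)]


def range_query(center, side):
    """Axis-aligned box of the given side length, clipped to [0, 1]."""
    low = tuple(max(0.0, c - side / 2) for c in center)
    high = tuple(min(1.0, c + side / 2) for c in center)
    return low, high


def make_workload(data, n_queries, side, rng, skewed):
    """Build range queries; skewed queries follow the data distribution."""
    dims = len(data[0])
    queries = []
    for _ in range(n_queries):
        if skewed:
            # Center on an existing data point.
            center = rng.choice(data)
        else:
            # Center anywhere in the space.
            center = tuple(rng.random() for _ in range(dims))
        queries.append(range_query(center, side))
    return queries


rng = random.Random(0)
data = uniform_points(1000, 4, rng)
random_queries = make_workload(data, 100, 0.1, rng, skewed=False)
skewed_queries = make_workload(data, 100, 0.1, rng, skewed=True)
```

On uniform data the two workloads look alike; the difference only matters for clustered data such as the catalog dataset.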

Page 7: Rethinking Choices for  Multi-dimensional Point Indexing

Experiments with uniform data

[Figure: total execution time for varying data dimensionality; panels Uniform-2D, Uniform-4D, Uniform-8D]

Page 8: Rethinking Choices for  Multi-dimensional Point Indexing

Experiments with skewed data

[Figure: total execution time for varying data dimensionality; panels MAPS-2D, MAPS-4D, MAPS-8D]

Page 9: Rethinking Choices for  Multi-dimensional Point Indexing

Analysis with skewed data

The (relatively) poor performance of R*-tree
  High overlap amongst MBRs
  Skewed data points are spread under several non-leaf nodes
The (relatively) poor performance of Pyramid-Technique
  The unbalanced space split is adversarial for skewed data
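To make "overlap amongst MBRs" concrete, here is a small sketch that computes the volume shared by two minimum bounding rectangles, each given as a (low corner, high corner) pair. When sibling MBRs overlap, a query falling in the shared region must descend into every overlapping subtree. The representation and function name are assumptions, not taken from the slides.

```python
def overlap_volume(mbr_a, mbr_b):
    """Volume of the intersection of two axis-aligned MBRs (0 if disjoint)."""
    (a_low, a_high), (b_low, b_high) = mbr_a, mbr_b
    volume = 1.0
    for axis in range(len(a_low)):
        # Extent of the intersection along this axis.
        extent = min(a_high[axis], b_high[axis]) - max(a_low[axis], b_low[axis])
        if extent <= 0:
            return 0.0
        volume *= extent
    return volume


a = ((0.0, 0.0), (2.0, 2.0))
b = ((1.0, 1.0), (3.0, 3.0))
c = ((5.0, 5.0), (6.0, 6.0))
print(overlap_volume(a, b))  # the unit square shared by a and b
print(overlap_volume(a, c))  # disjoint rectangles share nothing
```

A disjoint space partition, as in the Quadtree, makes this quantity zero for every pair of sibling cells.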

Page 10: Rethinking Choices for  Multi-dimensional Point Indexing

Quadtree

Uses the buffer pool very efficiently
Better spatial locality with skewed queries

[Figure: R*-tree vs. Quadtree]

Page 11: Rethinking Choices for  Multi-dimensional Point Indexing

Effect of packing in Quadtree

[Figure: total execution time of packed and unpacked Quadtree; panels MAPS-2D, MAPS-4D, MAPS-8D]

Page 12: Rethinking Choices for  Multi-dimensional Point Indexing

Conclusion

Quadtree outperforms R*-tree and Pyramid-Technique, especially for skewed (real) datasets
Efficiency of the Quadtree comes from
  Packing technique
  Regular and disjoint partitioning
  Better spatial locality and an efficient use of buffer
Analytical cost model agrees with experimental results, i.e. our claims are not due to implementation differences or dataset peculiarities

Page 13: Rethinking Choices for  Multi-dimensional Point Indexing

Questions?