CMU SCS
15-826: Multimedia Databases and Data Mining
Lecture #7: Spatial Access Methods - IV
Grid files, dim. curse
C. Faloutsos
15-826 Copyright: C. Faloutsos (2012) #2
Must-read material
• Textbook, Chapter 5.3
• Ramakrishnan+Gehrke, Chapter 28.5
Outline
Goal: ‘Find similar / interesting things’
• Intro to DB
• Indexing - similarity search
• Data Mining
Indexing - Detailed outline
• primary key indexing
• secondary key / multi-key indexing
• spatial access methods
  – problem dfn
  – z-ordering
  – R-trees
  – misc
• text
• ...
SAMs - Detailed outline
• spatial access methods
  – problem dfn
  – z-ordering
  – R-trees
  – misc topics
    • grid files
    • dimensionality curse; dim. reduction
    • metric trees
    • other nn methods
• text, ...
Grid files
• problem: spatial queries in k-d point-sets
• Main idea: try to generalize hashing to k-d
• (how?)
Grid files
• A: put a grid
• specs: [Nievergelt+, 84]
  – symmetric to all attributes
  – 2 disk accesses for exact match queries
  – adaptive to non-uniform distr.
• Q: details?
Jurg Nievergelt
Grid files
• cuts: all the way through
• cuts: at ½, ¾, ¼ etc; but on demand
• each cell -> disk page
Grid files
Thus, we only need:
- cut-points for each axis
- k-d directory
[Figure: 2-d grid with its x-cuts and y-cuts lists; cut points at ¼ and ½]
Grid files
Search (for exact match) – e.g., (0.3, 0.3)
[Figure: the 2-d grid with its x-cuts and y-cuts lists, used to locate the cell of the query point]
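The exact-match search can be sketched as follows. This is a minimal illustration, not the implementation of [Nievergelt+, 84]: the cut values (x-cuts at ¼ and ½, y-cut at ½) are read off the figure as an assumption, and the `directory` contents and page names are hypothetical.

```python
from bisect import bisect_right

# Assumed cut points per axis (sorted); kept in main memory.
x_cuts = [0.25, 0.5]
y_cuts = [0.5]

# k-d directory: directory[i][j] names the (hypothetical) disk page of the
# cell with x-interval i and y-interval j.
directory = [
    ["page_A", "page_B"],   # 0    <= x < 0.25
    ["page_C", "page_D"],   # 0.25 <= x < 0.5
    ["page_E", "page_F"],   # 0.5  <= x <= 1
]

def cell_of(x, y):
    """Locate the cell by binary search on the in-memory cut points."""
    return bisect_right(x_cuts, x), bisect_right(y_cuts, y)

def exact_match_page(x, y):
    """Access 1: read the directory entry; access 2: read the page it names."""
    i, j = cell_of(x, y)
    return directory[i][j]

print(cell_of(0.3, 0.3), exact_match_page(0.3, 0.3))   # (1, 0) page_C
```

The two lookups in `exact_match_page` correspond to the "2 disk accesses" in the specs, assuming the cut lists fit in memory and the directory is on disk.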
Grid files
• specs: [Nievergelt+, 84]
  – symmetric to all attributes
  – 2 disk accesses for exact match queries ✓
  – adaptive to non-uniform distr.
Grid files
partial match – e.g., 0 < x < 0.3
[Figure: the grid with its x-cuts and y-cuts lists, for the partial-match query]
Grid files
exactly the symmetric algo for, e.g., 0 < y < 0.3
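A sketch of the partial-match query, showing that the same routine serves either axis. The cut values are the assumed ones from the figure; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

x_cuts = [0.25, 0.5]   # assumed cut points, as in the figure
y_cuts = [0.5]

def intervals_hit(cuts, lo, hi):
    """Indices of the intervals along one axis that overlap the open range (lo, hi)."""
    first = bisect_right(cuts, lo)
    last = bisect_left(cuts, hi)
    return range(first, last + 1)

def partial_match_cells(x_range=None, y_range=None):
    """Cells to visit; an unspecified axis means all intervals on that axis."""
    xs = intervals_hit(x_cuts, *x_range) if x_range else range(len(x_cuts) + 1)
    ys = intervals_hit(y_cuts, *y_range) if y_range else range(len(y_cuts) + 1)
    return [(i, j) for i in xs for j in ys]

print(partial_match_cells(x_range=(0, 0.3)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(partial_match_cells(y_range=(0, 0.3)))   # [(0, 0), (1, 0), (2, 0)]
```

Neither axis is privileged: the x-query and the y-query go through the same code path, which is the symmetry the specs ask for.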
Grid files
• specs: [Nievergelt+, 84]
  – symmetric to all attributes ✓
  – 2 disk accesses for exact match queries ✓
  – adaptive to non-uniform distr.
Grid files
Q: How to split an overflowing page?
[Figure: the grid with its x-cuts and y-cuts lists, before the split]
Grid files
A: pick the ‘best’ axis, and cut all the way through...
[Figure: the grid after a new cut at 3/8 has been added to the cut lists]
Grid files
... updating the directory appropriately (ouch!)
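The directory update behind the "ouch" can be sketched like this. It is an illustration under assumptions: the 3/8 cut is placed on the x axis (the figure does not survive to confirm which axis), the page names are hypothetical, and the choice of the 'best' axis is left out.

```python
from bisect import bisect_right

def add_cut(cuts, directory, axis, value):
    """Insert a new cut on one axis and duplicate the affected directory slice.

    directory[i][j] is indexed by x-interval i and y-interval j. Right after
    the cut, both halves of every split cell still name the old page; only
    the overflowing page is then actually divided.
    """
    k = bisect_right(cuts[axis], value)      # index of the interval being split
    cuts[axis].insert(k, value)
    if axis == 0:                            # new x-cut: duplicate row k
        directory.insert(k + 1, list(directory[k]))
    else:                                    # new y-cut: duplicate column k
        for row in directory:
            row.insert(k + 1, row[k])

cuts = [[0.25, 0.5], [0.5]]
directory = [["A", "B"], ["C", "D"], ["E", "F"]]
add_cut(cuts, directory, axis=0, value=0.375)
directory[2][0] = "C2"   # hypothetical: page C overflowed and was split in two
print(cuts[0])           # [0.25, 0.375, 0.5]
print(directory)         # [['A', 'B'], ['C', 'D'], ['C2', 'D'], ['E', 'F']]
```

Because the cut goes all the way through, a whole row (in k-d, a whole (k-1)-d slice) of directory entries is touched even though only one page overflowed.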
Grid files
• specs: [Nievergelt+, 84]
  – symmetric to all attributes ✓
  – 2 disk accesses for exact match queries ✓
  – adaptive to non-uniform distr. ✓
Grid files
• it meets the three goals
• had follow-up work [twin grid files, multi-level; etc]
• BUT: has some disadvantages (which ones?)
Grid files - disadvantages
• #1: problems in high-d: directory splits can be expensive
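A back-of-the-envelope sketch of why directory splits get expensive as d grows. The assumption of 3 cuts on every axis is arbitrary and only for illustration.

```python
def directory_cells(cuts_per_axis):
    """Number of directory entries: product of interval counts over all axes."""
    total = 1
    for c in cuts_per_axis:
        total *= c + 1
    return total

def entries_added_by_one_cut(cuts_per_axis, axis):
    """A new cut on `axis` duplicates a whole (d-1)-dimensional slice."""
    others = cuts_per_axis[:axis] + cuts_per_axis[axis + 1:]
    return directory_cells(others)

for d in (2, 5, 10, 20):
    cuts = [3] * d          # assumption: 3 cuts (4 intervals) on every axis
    print(d, directory_cells(cuts), entries_added_by_one_cut(cuts, 0))
```

With 4 intervals per axis the directory has 4^d entries, and a single new cut adds 4^(d-1) of them, so both the directory size and the cost of one split grow exponentially with d.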
Grid files - disadvantages
• #2: even in low-d, suffers on correlated attributes:
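A small sketch of the correlated-attributes problem, assuming perfectly correlated points (y = x) and an equi-width g x g grid; a real grid file places cuts on demand, so this only illustrates the trend.

```python
def occupied_cells(points, g):
    """Distinct cells of a g x g equi-width grid on [0, 1)^2 that hold a point."""
    return {(int(x * g), int(y * g)) for x, y in points}

n, g = 1000, 8
diagonal = [(i / n, i / n) for i in range(n)]        # perfectly correlated: y = x
occ = occupied_cells(diagonal, g)
print(len(occ), "of", g * g, "cells are non-empty")   # 8 of 64 cells are non-empty
```

Only the g diagonal cells hold data, yet the directory still needs g² entries, so most of it points at empty space.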
Grid files - disadvantages
• (Q: how to fix, for 2-d, linearly correlated points?)
Grid files - disadvantages
• (A1: rotate [Hinrichs+]; A2: triangular cells [Rego+])
Grid files - disadvantages
• #3: how about region data?
• if we ‘cut’ them, then we have O(volume) pieces (while z-ordering: O(surface))
• what to do?
• Translation to 2k-d points! (clever, BUT still has subtle problems). E.g., 1-d ‘regions’:
[Figure: three 1-d segments A, B, C on the interval [0, 1], and the corresponding 2-d points plotted on x-start / x-end axes]
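The translation of 1-d ‘regions’ to 2-d points can be sketched as below. The segment endpoints are hypothetical (the figure's exact values are not recoverable), and `intersecting` is an illustrative name.

```python
# Hypothetical 1-d segments on [0, 1]; endpoints are illustrative only.
segments = {"A": (0.0, 0.25), "B": (0.25, 0.75), "C": (0.5, 1.0)}

# Each 1-d segment becomes one 2-d point (x_start, x_end).
points = {name: (s, e) for name, (s, e) in segments.items()}

def intersecting(points, q_start, q_end):
    """Segments overlapping [q_start, q_end]: x_start <= q_end and x_end >= q_start.

    In the transformed space this is the rectangle [0, q_end] x [q_start, 1],
    which can be large even when the original query is tiny.
    """
    return sorted(n for n, (s, e) in points.items() if s <= q_end and e >= q_start)

print(intersecting(points, 0.3, 0.4))   # ['B']
print(intersecting(points, 0.5, 0.5))   # ['B', 'C']
```

Note that the point query at 0.5 turns into the rectangle [0, 0.5] x [0.5, 1] in (x-start, x-end) space, a quarter of the unit square, which hints at the large-query-region problem.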
Grid files - disadvantages
• what is the problem, then?
• A: dimensionality curse; large query regions
Grid files – conclusions
• works OK in low-d un-correlated points
• but z-ordering/R-trees seem to work better for higher-d
• smart idea to translate k-d rectangles into 2k-d points (but: dim. curse)
Dimensionality ‘curse’
• Q: What is the problem in high-d?
• A: indices do not seem to help, for many queries (e.g., k-nn)
  – in high-d (& uniform distributions), most points are equidistant -> k-nn retrieves too many near-neighbors
  – [Yao & Yao, ’85]: search effort ~ O(N^(1-1/d))
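Both bullets can be checked numerically. The sketch below is only an illustration: `relative_contrast` and `yao_effort` are hypothetical helper names, the sample size and seed are arbitrary, and constants hidden by the O() are ignored.

```python
import math
import random

def relative_contrast(d, n=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to n uniform points."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(d)]
    dists = [math.dist(q, [rng.random() for _ in range(d)]) for _ in range(n)]
    return (max(dists) - min(dists)) / min(dists)

def yao_effort(n, d):
    """The growth term N^(1 - 1/d) from the bound quoted above."""
    return n ** (1 - 1 / d)

for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 2), round(yao_effort(10**6, d)))
```

As d grows, the contrast between the nearest and the farthest point shrinks toward 0 (points look equidistant), while N^(1-1/d) approaches N, i.e. the cost of a sequential scan.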
Dimensionality ‘curse’
• (counter-intuitive, for db mentality)
• Q: What to do, then?
Dimensionality ‘curse’
• A1: switch to seq. scanning
• A2: dim. reduction
• A3: consider the ‘intrinsic’/fractal dimensionality
• A4: find approximate nn
Dimensionality ‘curse’
• A1: switch to seq. scanning
  – X-trees [Kriegel+, VLDB 96]
  – VA-files [Schek+, VLDB 98], ‘test of time’ award
Dim. reduction
a.k.a. feature selection/extraction:
• SVD (optimal, to preserve Euclidean distances)
• random projections
• using the fractal dimension [Traina+ SBBD2000]
Singular Value Decomposition (SVD)
• SVD (~LSI ~ KL ~ PCA ~ spectral analysis...)
LSI: S. Dumais; M. Berry
KL: eg, Duda+Hart
PCA: eg., Jolliffe
MANY more details: soon
Random projections
• random projections (Johnson-Lindenstrauss thm [Papadimitriou+ PODS98])
• pick ‘enough’ random directions (will be ~orthogonal, in high-d!!)
• distances are preserved probabilistically, within epsilon
• (also, use as a pre-processing step for SVD [Papadimitriou+ PODS98])
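A minimal sketch of a random projection, assuming Gaussian random directions scaled by 1/sqrt(k) (one common construction, not necessarily the one used in the cited paper). The dimensions, seeds, and the function name are arbitrary choices for illustration.

```python
import math
import random

def random_projection(vectors, k, seed=0):
    """Project d-dim vectors onto k random Gaussian directions, scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    d = len(vectors[0])
    R = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(r_i * v_i for r_i, v_i in zip(row, v)) for row in R]
            for v in vectors]

rng = random.Random(1)
a = [rng.random() for _ in range(500)]
b = [rng.random() for _ in range(500)]
pa, pb = random_projection([a, b], k=200)
ratio = math.dist(pa, pb) / math.dist(a, b)
print(round(ratio, 3))   # close to 1 with high probability
```

The ratio of projected to original distance concentrates around 1, and the spread shrinks as k grows; this is the "preserved probabilistically, within epsilon" behaviour.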
Dim. reduction - w/ fractals
• Main idea: drop those attributes that don’t affect the intrinsic (‘fractal’) dimensionality [Traina+, SBBD 2000]
[Figure: example point set with global FD = 1]
Intrinsic dimensionality
• before we give up, compute the intrinsic dim.:
• the lower, the better... [Pagel+, ICDE 2000]
• more details: under ‘fractals’
[Figure: two example point sets, one with intr. d = 2 and one with intr. d = 1]
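One way to estimate the intrinsic dimensionality is box counting, sketched below as a stand-in; the estimator actually used in the course is covered under ‘fractals’ and may differ. The two synthetic point sets and the box sizes are assumptions chosen so the result is easy to verify.

```python
import math

def box_count_dimension(points, r_big=0.5, r_small=1 / 16):
    """Slope of log(#occupied boxes) vs log(1/r) between two box sizes."""
    def boxes(r):
        return len({tuple(int(c / r) for c in p) for p in points})
    return (math.log(boxes(r_small)) - math.log(boxes(r_big))) / math.log(r_big / r_small)

m = 64
# Points filling the unit square, and points on the diagonal y = x.
plane = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]
line = [((i + 0.5) / m**2, (i + 0.5) / m**2) for i in range(m * m)]

print(round(box_count_dimension(plane), 2))  # 2.0
print(round(box_count_dimension(line), 2))   # 1.0
```

Both sets are embedded in 2-d, but the diagonal set behaves like a 1-d object, which is the case where an index still has hope.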
Approximate nn
• [Arya + Mount, SODA93], [Patella+ ICDE 2000]
• Idea: find k neighbors, such that the distance of the k-th one is guaranteed to be within epsilon of the actual.
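The guarantee can be written as a check. This sketch assumes the relative-error reading of "within epsilon", i.e. the returned k-th distance is at most (1 + eps) times the true k-th nearest-neighbor distance; `is_eps_approximate` is a hypothetical helper and uses a brute-force scan only to verify an answer, not to find one.

```python
import math

def is_eps_approximate(query, data, answer, k, eps):
    """True if the k-th distance in `answer` is at most (1 + eps) times the true k-th nn distance."""
    true_kth = sorted(math.dist(query, p) for p in data)[k - 1]
    got_kth = sorted(math.dist(query, p) for p in answer)[k - 1]
    return got_kth <= (1 + eps) * true_kth

data = [(0.0, 1.0), (0.0, 1.05), (3.0, 0.0)]
q = (0.0, 0.0)
print(is_eps_approximate(q, data, [(0.0, 1.05)], k=1, eps=0.1))   # True
print(is_eps_approximate(q, data, [(3.0, 0.0)], k=1, eps=0.1))    # False
```

Returning (0.0, 1.05) instead of the true nearest neighbor (0.0, 1.0) is acceptable at eps = 0.1; that slack is what lets an approximate method skip parts of the search.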
Conclusions
• Dimensionality ‘curse’:
  – for high-d, indices slow down to ~O(N)
• If the intrinsic dim. is low, there is hope
• otherwise, do seq. scan, or sacrifice accuracy (approximate nn)
References
• Arya, S. and D. M. Mount (1993). Approximate Nearest Neighbor Queries in Fixed Dimensions. SODA: 271-280. ANN library: http://www.cs.umd.edu/~mount/ANN/
• Berchtold, S., D. A. Keim, et al. (1996). The X-tree : An Index Structure for High-Dimensional Data. VLDB, Mumbai (Bombay), India.
• Faloutsos, C. and W. Rego (1989). “Tri-cell: A Data Structure for Spatial Objects.” Information Systems 14(2): 131-139.
• Hinrichs, K. and J. Nievergelt (1983). The Grid File: A Data Structure to Support Proximity Queries on Spatial Objects. Proc. of the WG'83 (Intern. Workshop on Graph Theoretic Concepts in Computer Science), Linz, Austria, Trauner Verlag.
• Nievergelt, J., H. Hinterberger, et al. (March 1984). “The Grid File: An Adaptable, Symmetric Multikey File Structure.” ACM TODS 9(1): 38-71.
• Papadimitriou, C. H., P. Raghavan, et al. (1998). Latent Semantic Indexing: A Probabilistic Analysis. PODS, Seattle, WA.
• Weber, R., H.-J. Schek, et al. (1998). A Quantitative Analysis and Performance Study for Similarity-Search Methods in high-dimensional spaces. VLDB, New York, NY.
• Yao, A. C. and F. F. Yao (May 6-8, 1985). A General Approach to d-Dimensional Geometric Queries. Proc. of the 17th Annual ACM Symposium on Theory of Computing (STOC), Providence, RI.