Access Structures for Angular Similarity Queries. Tan Apaydin and Hakan Ferhatosmanoglu. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 18, NO. 11, NOVEMBER 2006.
![Page 1: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/1.jpg)
Access Structures for Angular Similarity Queries
Tan Apaydin and Hakan Ferhatosmanoglu
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 18, NO. 11, NOVEMBER 2006
![Page 2: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/2.jpg)
Motivation
Angular similarity measures have been utilized by several database applications to define semantic similarity between various data types such as text documents, time-series, images, and scientific data.
Because of a geometry mismatch, current techniques are either inapplicable or perform poorly.
This brings up the need for effective indexing methods for angular similarity queries.
![Page 3: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/3.jpg)
We propose access structures to enable efficient execution of queries seeking angular similarity.
We explore quantization-based indexing, which scales well with dimensionality, and propose techniques that are better suited to angular measures than conventional techniques.
![Page 4: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/4.jpg)
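Vector Approximation file (VA-file)

| 2-bit code | Range |
|---|---|
| 00 | 0~0.25 |
| 01 | 0.26~0.5 |
| 10 | 0.51~0.75 |
| 11 | 0.76~1 |

| 3-bit code | Range |
|---|---|
| 000 | 0~0.125 |
| 001 | 0.126~0.25 |
| 010 | 0.26~0.375 |
| 011 | 0.376~0.5 |
| 100 | 0.51~0.625 |
| 101 | 0.626~0.75 |
| 110 | 0.76~0.875 |
| 111 | 0.876~1 |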
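As a rough illustration of the bit codes listed above, the following Python sketch maps each coordinate in [0, 1] to a fixed-width cell code. The function name `va_approximation` is hypothetical, and the half-open cell boundaries are an assumption: the slide lists closed ranges such as 0~0.25, so a value lying exactly on a boundary may land in the neighbouring cell here.

```python
def va_approximation(point, bits):
    """Return one bit-string code per coordinate of a point in [0, 1]^d.

    Assumption: cells are equal-width and half-open, [k / 2^bits, (k + 1) / 2^bits),
    with the value 1.0 clamped into the last cell.
    """
    cells = 1 << bits  # number of cells per dimension
    codes = []
    for x in point:
        index = min(int(x * cells), cells - 1)  # clamp x == 1.0 into the top cell
        codes.append(format(index, f"0{bits}b"))
    return codes


if __name__ == "__main__":
    print(va_approximation([0.1, 0.6, 1.0], bits=2))
    print(va_approximation([0.3], bits=3))
```

Using more bits per dimension gives a finer grid and tighter approximations, at the cost of a larger approximation file.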
![Page 5: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/5.jpg)
![Page 6: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/6.jpg)
![Page 7: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/7.jpg)
Round-robin manner
The approach slices the major pyramids in a round-robin manner. For instance, the $x_1 = 1$ pyramid is sliced according to $x_2$, $x_2 = 1$ according to $x_3$, and $x_3 = 1$ according to $x_1$ (in a cyclic manner).
![Page 8: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/8.jpg)
[Figure panels: Equi-volumed, Equi-populated; axes $x_1$, $x_2$, $x_3$]
![Page 9: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/9.jpg)
A particular point $P(x_1, \ldots, x_d)$ is contained in the major pyramid $x_i = 1$, where $i$ is the dimension with the greatest corresponding value, i.e., $x_i \ge x_j$ for all $j$.
For instance, in three dimensions, P(0.7, 0.3, 0.2) will be in the "$x_1 = 1$ major pyramid" since 0.7 ($x_1$) is greater than both 0.3 and 0.2.
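The rule above amounts to an argmax over the coordinates. A minimal Python sketch follows; the function name `major_pyramid` is hypothetical, and breaking ties in favour of the lowest dimension is an assumption the slides do not address.

```python
def major_pyramid(point):
    """Return the 1-based index i of the 'x_i = 1' major pyramid containing point.

    Assumption: ties between equal coordinates go to the lowest dimension.
    """
    best = max(range(len(point)), key=lambda i: point[i])
    return best + 1


if __name__ == "__main__":
    # The slide's example point lies in the "x1 = 1" major pyramid.
    print(major_pyramid((0.7, 0.3, 0.2)))
```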
![Page 10: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/10.jpg)
Filtering Step
The easiest way to decide whether an approximation intersects the range query space is to look at the boundaries of the unit square that do not pass through the origin.
![Page 11: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/11.jpg)
[Figure: query Q with min and max marked; axes $x_1$, $x_2$, $x_3$]
![Page 12: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/12.jpg)
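If a feature vector is represented as $x = (x_1, x_2, \ldots, x_d)$ and the query as $Q = (q_1, q_2, \ldots, q_d)$, the cosine angle is defined by the following formula:
$$\cos(x, Q) = \frac{\sum_{i=1}^{d} x_i q_i}{\sqrt{\sum_{i=1}^{d} x_i^2}\,\sqrt{\sum_{i=1}^{d} q_i^2}}$$
If we assume the query point to be normalized, then $\cos(x, Q)$ can be simplified to
$$\cos(x, Q) = \frac{\sum_{i=1}^{d} x_i u_i}{\sqrt{\sum_{i=1}^{d} x_i^2}}$$
where $U(Q) = (u_1, u_2, \ldots, u_d)$ is the unit normalized query.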
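The simplification for a normalized query can be sketched in Python as follows. The helper names `normalize` and `cosine_to_unit_query` are hypothetical; the point is only that once the query has unit length, its norm drops out of the denominator.

```python
import math


def normalize(q):
    """Scale the query vector q to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def cosine_to_unit_query(x, u):
    """Cosine of the angle between x and an already-normalized query u."""
    dot = sum(a * b for a, b in zip(x, u))
    return dot / math.sqrt(sum(a * a for a in x))


if __name__ == "__main__":
    u = normalize([3.0, 4.0])
    print(cosine_to_unit_query([1.0, 0.0], u))
    print(cosine_to_unit_query([6.0, 8.0], u))
```

Normalizing the query once and reusing it avoids recomputing its norm for every feature vector.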
![Page 13: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/13.jpg)
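Let Q be a three-dimensional query point and $u = (u_1, u_2, u_3)$ be the unit vector which is the normalization of the query vector.
The expression for an equivalence conic surface in angular space is the following equation, where $\theta$ is the angle to the query:
$$\frac{u_1 x_1 + u_2 x_2 + u_3 x_3}{\sqrt{x_1^2 + x_2^2 + x_3^2}} = \cos\theta$$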
![Page 14: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/14.jpg)
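Lagrange's multipliers approach
For $x_3 = 1$, the closed form of the ellipse equation is
$$(u_1 x_1 + u_2 x_2 + u_3)^2 = \cos^2\theta \,(x_1^2 + x_2^2 + 1)$$
To maximize or minimize $f(x_1, x_2)$ subject to the constraint $g(x_1, x_2) = 0$, the following system of equations is solved:
$$\nabla f(x_1, x_2) = \lambda \nabla g(x_1, x_2), \qquad g(x_1, x_2) = 0$$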
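As a numerical cross-check of such extreme values, the Python sketch below samples the boundary of the query cone and projects it onto a face of the unit cube. This is a sampling approximation and not the Lagrange multiplier solution itself. It assumes the face is $x_3 = 1$, that `u` is a unit vector, that the cone of half-angle `theta` meets the face in a closed ellipse, and it does not clip the result to the unit cube. The function name `cone_face_extremes` is hypothetical.

```python
import math


def cone_face_extremes(u, theta, samples=3600):
    """Approximate the min/max of x1 and x2 on the curve where the cone of
    half-angle theta around the unit vector u cuts the plane x3 = 1."""
    # Build an orthonormal basis (a, b) of the plane perpendicular to u.
    k = min(range(3), key=lambda i: abs(u[i]))
    helper = [1.0 if i == k else 0.0 for i in range(3)]
    d = u[k]  # dot product of helper and u
    a = [h - d * c for h, c in zip(helper, u)]
    norm_a = math.sqrt(sum(c * c for c in a))
    a = [c / norm_a for c in a]
    b = [u[1] * a[2] - u[2] * a[1],
         u[2] * a[0] - u[0] * a[2],
         u[0] * a[1] - u[1] * a[0]]  # b = u x a

    points = []
    for s in range(samples):
        phi = 2.0 * math.pi * s / samples
        # Unit direction at angle theta from u.
        p = [math.cos(theta) * u[i]
             + math.sin(theta) * (math.cos(phi) * a[i] + math.sin(phi) * b[i])
             for i in range(3)]
        if p[2] <= 0.0:
            raise ValueError("cone does not cut x3 = 1 in a closed ellipse")
        # Scale the direction so that it lands on the plane x3 = 1.
        points.append((p[0] / p[2], p[1] / p[2]))

    x1 = [p[0] for p in points]
    x2 = [p[1] for p in points]
    return (min(x1), max(x1)), (min(x2), max(x2)), points


if __name__ == "__main__":
    x1_range, x2_range, _ = cone_face_extremes((0.0, 0.0, 1.0), 0.3)
    print(x1_range, x2_range)
```

Every sampled point satisfies the squared cone equation on the face, so the sampled extremes converge to the exact ones as `samples` grows.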
![Page 15: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/15.jpg)
To compute the extreme values for on , take f( ) =
To compute the extreme values for on , take f( ) =
To compute the extreme values for on , take f( ) =
, ) = ( + )
, ) = ( + )
, ) = ( + )
![Page 16: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/16.jpg)
Filter Approximations
Once we have the min-max values, we can use them to retrieve the relevant approximations.
These are the approximations in the specified range neighborhood of the query.
![Page 17: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/17.jpg)
Identifying feature vectors
In the pruning step, we compute the angular distance of every candidate point to the query point; if a point is within the given range $\theta$, we output it in the result set.
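The pruning step can be sketched in Python as a linear scan over the candidates that survived filtering. The names `angular_distance` and `prune` are hypothetical, and the range is assumed to be an angle in radians.

```python
import math


def angular_distance(x, q):
    """Angle in radians between the vectors x and q."""
    dot = sum(a * b for a, b in zip(x, q))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_q = math.sqrt(sum(b * b for b in q))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_q)))
    return math.acos(cosine)


def prune(candidates, query, theta):
    """Keep only the candidates whose angle to the query is at most theta."""
    return [p for p in candidates if angular_distance(p, query) <= theta]


if __name__ == "__main__":
    print(prune([(1, 0), (1, 1), (0, 1)], (1, 0), theta=1.0))
```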
![Page 18: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/18.jpg)
CONE-SHELL QUANTIZER (CS-Q)
CS-Q uses cone partitions rather than pyramids, and is organized as shells instead of the sweep approach followed by AS-Q.
![Page 19: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/19.jpg)
Angular Approximations based on Equal Populations
$N$ is the number of data points, $O$ is the reference point, $S_{ae}$ is the set of all approximations, and $S_{ae}^{i}$ is the $i$th approximation.
1) For each data point $P_k$, $1 \le k \le N$, calculate the angular distance between $P_k$ and $O$.
2) Sort the data points in nondecreasing order based on their angular distances to $O$.
3) Assume $t$ is the given population for each approximation. Assign the first $t$ points in sorted order to $S_{ae}^{1}$, the second $t$ points to $S_{ae}^{2}$, and so on.
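The three steps above can be sketched in Python as a sort followed by fixed-size chunking. The function names are hypothetical, and letting the last approximation hold fewer than `t` points when the data size is not a multiple of `t` is an assumption the slide does not state.

```python
import math


def angular_distance(p, ref):
    """Angle in radians between the vectors p and ref."""
    dot = sum(a * b for a, b in zip(p, ref))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_r = math.sqrt(sum(b * b for b in ref))
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)


def equal_population_approximations(points, reference, t):
    """Group points into approximations of population t, ordered by angle."""
    # Steps 1 and 2: angular distance to the reference point, nondecreasing.
    ordered = sorted(points, key=lambda p: angular_distance(p, reference))
    # Step 3: the first t points form the first approximation, and so on.
    return [ordered[i:i + t] for i in range(0, len(ordered), t)]


if __name__ == "__main__":
    data = [(0, 1), (1, 0), (1, 1), (2, 1)]
    for shell in equal_population_approximations(data, (1, 0), t=2):
        print(shell)
```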
![Page 20: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/20.jpg)
![Page 21: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/21.jpg)
EXPERIMENTAL RESULTS
![Page 22: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/22.jpg)
AS-Q
![Page 23: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/23.jpg)
CS-Q
![Page 24: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/24.jpg)
![Page 25: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/25.jpg)
Scalability test results
![Page 26: Access Structures for Angular Similarity Queries](https://reader035.vdocuments.us/reader035/viewer/2022062501/568161ac550346895dd16ba2/html5/thumbnails/26.jpg)