Supporting Ranking in Queries: Score-based Paradigm
Russell Greenspan, CS 411, Spring 2004
Supporting Ranking in Queries: Talk Outline
What
Why
How
– “Out-of-the-box” support
– “Smart” top-k processing
Ranking in Queries: What is ranking in queries?
A mechanism to return only the top-k results
– Closest matches to user-specified boolean criteria
– Scoring results based on user-specified predicates
    SELECT Address
    FROM HousesForSale
    ORDER BY Best(Size, Price)
Express similarity, relevance, or preference to a given query
What is ranking in queries? Definitions
Intuitive
– Output an ordered list of k items such that every item in the list scores higher than every item left out
Formal
– “Given retrieval size k and scoring function F, a ranked query returns a list K of k objects (i.e. |K| = k) with query scores, sorted in a descending order, such that F(t1, ..., tn)[u] > F(t1, ..., tn)[v] for all u in K and all v not in K.” [Chang, Hwang, 2002]
What is ranking in queries? Differences from traditional queries
How does this differ from traditional queries?
– Traditional queries:
    Do not stop processing until all results are computed
    Do not focus on ranking tuples to best match the input query
– Traditional boolean queries:
    Do not return “close” matches
    Can “over” or “under” match, producing too few or too many results
Ranking in Queries: Why use ranking in queries?
Exact matches not required
– Often something “close enough” satisfies a user’s demands
Fuzzy matches desired
– Multimedia/image matching, where the very nature of the query does not involve an exact match
Avoid unnecessary computations
– Find the “best” answers quickly as opposed to all answers
Ranking in Queries: How do we execute ranked queries?
“Out-of-the-box” support
– Perform query as any other, then perform sort and return only first k rows
– Why is this bad?
    Lots of unnecessary processing
    Waste of resources in intermediate results
    If scoring function is expensive, could result in computation of unneeded scores
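The “out-of-the-box” plan can be sketched in a few lines. This is only an illustration: the function name `naive_top_k`, the call counter, and the house data are hypothetical and not taken from the talk. It shows that the scoring function runs once per row regardless of k.

```python
from typing import Callable, Dict, List, Tuple


def naive_top_k(rows: Dict[str, Tuple[float, float]],
                score: Callable[[float, float], float],
                k: int) -> Tuple[List[str], int]:
    """Score every row, sort all of them, and keep only the first k."""
    calls = 0
    scored = []
    for key, (x, y) in rows.items():
        scored.append((score(x, y), key))
        calls += 1  # one scoring call per row, whether or not it can make the top k
    scored.sort(reverse=True)  # full sort of every scored row
    return [key for _, key in scored[:k]], calls


# Hypothetical data: (size grade, price grade) for each house.
houses = {"h1": (0.9, 0.2), "h2": (0.6, 0.7), "h3": (0.8, 0.8), "h4": (0.1, 0.9)}
top, calls = naive_top_k(houses, lambda x, y: x * y, k=2)
print(top, calls)
```

With four rows the scoring function is called four times even though only two rows are returned; the techniques in the rest of the talk aim to shrink that gap.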
Can we do better?
How do we execute ranked queries? “Smart” Ranked Query Execution
Query Processing
– Try to achieve significant reduction in query execution time
– Use mid-query (i.e. as query executes) techniques to optimize query plan for top-k results
– Consider minimal number of tuples necessary to return k results
Scoring Predicate
– Consider expense of scoring function in determining optimal query plan
“Smart” Ranked Query Execution: Two Areas of Research Focus
Top-k processing
– Reducing number of tuples considered at each intermediate step
    Assume minimal work necessary to retrieve items sorted by score (i.e. indexes on simple attributes)
Rank function
– Reducing number of calls to ranking function
    Assume rank calculation is expensive
– Implementing unusual ranking function
“Smart” Ranked Query Execution: Research and Techniques
Reducing number of tuples considered
– Middleware/Multimedia
    Garlic [Fagin, 1999]
    CHITRA [Nepal, Ramakrishna, 1999]
– Relational
    STOP Operator [Carey, Kossmann, 1997]
    Probabilistic [Donjerkovic, Ramakrishnan, 1999]
    Statistical [Chaudhuri, Gravano, 1999]
Reducing number of calls to ranking function
– MPro [Chang, Hwang, 2002]
Implementing unusual ranking function
– AutoRank [Agrawal, Chaudhuri, Das, Gionis, 2003]
“Smart” Ranked Querying (Middleware): Garlic [Fagin, 1999]
Integrates data from different database systems or non-database data servers
– Relational Query Set vs. “Sorted List”
Example: “Return the reddest covers of Beatles’ albums”, i.e. (Artist = ‘Beatles’) AND (AlbumColor LIKE ‘red’), where Artists are stored relationally and Album colors in a multimedia database
Assign grade to each object
– Boolean grade either 0 or 1
– Fuzzy value 0 <= x <= 1 indicating closeness
Garlic [Fagin, 1999]: Rank Processing Methods
How to combine two fuzzy values to retrieve top-k objects?
– Inefficient
    Consider graded sets of all objects by color and shape
    Compute combined score for every object, then output top k objects
– Efficient
    Retrieve objects (sorted by grade) from each subsystem until there are at least k of the same objects in each set
    Compute combined score for each of these k objects
Garlic [Fagin, 1999]: Example Query
Example (use combined scoring function = x * y): Return Top 2 Color = ‘red’ AND Shape = ‘round’
Object  Roundness  Redness
A       .6         .2
B       .8         .6
C       .3         .1
D       .2         .8
E       .9         .3
F       .1         .5
G       .7         .9
H       .4         .3
Garlic [Fagin, 1999]: Inefficient vs. Efficient Processing
Inefficient
– Calculate combined score for every object
– Sort by score
– Return top k objects: {G, B}
Object  Score
A       .12
B       .48
C       .03
D       .16
E       .27
F       .05
G       .63
H       .12
Garlic [Fagin, 1999]: Inefficient vs. Efficient Processing
Efficient (Fagin’s A0 algorithm)
– Consider ordered members from each set until there are k of the same object in each set
    A1 = {G(.9), D(.8), B(.6)}
    A2 = {E(.9), B(.8), G(.7)}
– Calculate combined score for each of the k objects
    G = .9 * .7 = .63
    B = .6 * .8 = .48
– Return these objects ordered by combined score: {G, B}
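The efficient strategy can be sketched as follows, using the redness and roundness grades from the example. The function name `fagin_a0` and the dictionary layout are assumptions made for illustration. Note that Fagin's published algorithm scores every object seen during sorted access (here D and E as well), not only the k matched ones; the sketch follows that more conservative reading, and on this data it returns the same answer as the slide.

```python
from typing import Callable, Dict, List


def fagin_a0(lists: List[Dict[str, float]],
             combine: Callable[..., float],
             k: int) -> List[str]:
    # Sorted-access order for each subsystem: best grade first.
    orders = [sorted(g, key=g.get, reverse=True) for g in lists]
    seen = [set() for _ in lists]
    depth = 0
    # Phase 1: read the lists in lockstep until k objects appear in every list.
    while len(set.intersection(*seen)) < k and depth < len(orders[0]):
        for i, order in enumerate(orders):
            seen[i].add(order[depth])
        depth += 1
    # Phase 2: random access fills in the missing grades of every object seen.
    candidates = set.union(*seen)
    scores = {obj: combine(*(g[obj] for g in lists)) for obj in candidates}
    # Phase 3: rank the candidates by combined score and keep k.
    return sorted(scores, key=scores.get, reverse=True)[:k]


redness = {"A": .2, "B": .6, "C": .1, "D": .8, "E": .3, "F": .5, "G": .9, "H": .3}
roundness = {"A": .6, "B": .8, "C": .3, "D": .2, "E": .9, "F": .1, "G": .7, "H": .4}
result = fagin_a0([redness, roundness], lambda x, y: x * y, k=2)
print(result)
```

Sorted access stops at depth 3 in each list, so only four of the eight objects are ever scored.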
Garlic [Fagin, 1999]: Conclusions
Why is this more efficient?
– Incur expense of scoring function k times, as opposed to n times (where n is the total number of items)
– Access each subsystem at least k and at most n times, as opposed to n times (again, where n is the total number of items)
“Smart” Ranked Querying (Middleware): CHITRA [Nepal, Ramakrishna, 1999]
Expands on Fagin’s Garlic system by proposing a new “multi-step” processing algorithm
Experimental results show 50% improvement
CHITRA [Nepal, Ramakrishna, 1999]: “Multi-step” Algorithm
Consider next sorted item x from each subsystem i
Perform random access into every other subsystem to obtain other rankings of x
Add object to result set if its combined grade is at least the threshold grade; quit when we have k objects
– Threshold is the combined score of the grades read at the current sorted position in each subsystem, recomputed each iteration
CHITRA [Nepal, Ramakrishna, 1999]: Example Query
Back to our example... Return Top 2 Color = ‘red’ AND Shape = ‘round’
Object  Roundness  Redness
A       .6         .2
B       .8         .6
C       .3         .1
D       .2         .8
E       .9         .3
F       .1         .5
G       .7         .9
H       .4         .3
CHITRA [Nepal, Ramakrishna, 1999]: Example Scoring Functions Results
Consider two scoring functions as examples:
– min[x, y]
– [x * y]
Using min[x, y]:
Iter.  Items                       Grade                                        Threshold          Resultset
1      i1 = {G(.9)}, i2 = {E(.9)}  G = min[.9, .7] = .7, E = min[.9, .3] = .3   min[.9, .9] = .9
2      i1 = {D(.8)}, i2 = {B(.8)}  D = min[.8, .2] = .2, B = min[.8, .6] = .6   min[.8, .8] = .8
3      i1 = {B(.6)}, i2 = {G(.7)}  B = min[.6, .8] = .6, G = min[.7, .9] = .7   min[.6, .7] = .6   {G, B}
Using [x * y]:
Iter.  Items                       Grade                                        Threshold          Resultset
1      i1 = {G(.9)}, i2 = {E(.9)}  G = [.9 * .7] = .63, E = [.9 * .3] = .27     [.9 * .9] = .81
2      i1 = {D(.8)}, i2 = {B(.8)}  D = [.8 * .2] = .16, B = [.8 * .6] = .48     [.8 * .8] = .64
3      i1 = {B(.6)}, i2 = {G(.7)}  B = [.6 * .8] = .48, G = [.7 * .9] = .63     [.6 * .7] = .42    {G, B}
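The iteration traces for the two scoring functions can be reproduced with a short sketch. It is written from the slide's description rather than from the CHITRA paper itself, so the function name `multi_step_top_k` and the choice to treat "at least the threshold" as the qualifying test are assumptions.

```python
from typing import Callable, Dict, List, Tuple


def multi_step_top_k(lists: List[Dict[str, float]],
                     combine: Callable[..., float],
                     k: int) -> Tuple[List[str], int]:
    # Sorted-access order for each subsystem: best grade first.
    orders = [sorted(g, key=g.get, reverse=True) for g in lists]
    grades: Dict[str, float] = {}
    iterations = 0
    for depth in range(len(orders[0])):
        iterations += 1
        current = [order[depth] for order in orders]
        # Random access: full combined grade for each object read this step.
        for obj in current:
            grades[obj] = combine(*(g[obj] for g in lists))
        # Threshold: combine the grades at the current sorted position.
        threshold = combine(*(g[obj] for g, obj in zip(lists, current)))
        qualified = sum(1 for s in grades.values() if s >= threshold)
        if qualified >= k:
            break
    top = sorted(grades, key=grades.get, reverse=True)[:k]
    return top, iterations


redness = {"A": .2, "B": .6, "C": .1, "D": .8, "E": .3, "F": .5, "G": .9, "H": .3}
roundness = {"A": .6, "B": .8, "C": .3, "D": .2, "E": .9, "F": .1, "G": .7, "H": .4}
by_min = multi_step_top_k([redness, roundness], min, 2)
by_product = multi_step_top_k([redness, roundness], lambda x, y: x * y, 2)
print(by_min, by_product)
```

Both scoring functions stop after the third iteration, when G and B are the first two objects whose combined grade reaches the threshold.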
CHITRA [Nepal, Ramakrishna, 1999]: Conclusions
Why is this more efficient?
– Requires fewer accesses to each subsystem
How do we know this algorithm is correct?
– Proof by contradiction
    Assume an object z that should have been included in place of some returned object y
    If Rank(z) > Rank(y), either:
    – y must have at least one subsystem rank smaller than all subsystem ranks of z
    – z must have at least one subsystem rank greater than all subsystem ranks of y
    However, since Rank(z) < Threshold and Rank(y) >= Threshold, Rank(z) cannot be greater than Rank(y)
“Smart” Ranked Querying (Relational): STOP Operator [Carey, et al, 1997]
Specifies extension to SQL-92 standard to allow limit on cardinality of result
– STOP AFTER
Return subset of results from each section of query plan
Implement with STOP operator
– STOP(N, D, E) where N is the number of desired tuples, D is the Sort Directive [asc, desc, none], and E is the Sort Expression
– Heuristically determine when and how to apply
STOP Operator [Carey, et al, 1997]: Example query plans
Fig a shows traditional JOIN
– Join all EMP to DEPT, sort, output top k
Fig b shows implementation of STOP operators
– Based on cardinality estimates, only 20 rows of EMP need be joined with 30 rows of DEPT to produce top-k of 10
STOP Operator [Carey, et al, 1997]: Conservative Heuristic
Ensures that every tuple in each intermediate result is guaranteed to generate at least one tuple of the overall query result
Advantages
– No restarts from intermediate processing returning fewer than k results
– Intermediate STOP operators take their N value from overall query k value
Disadvantages
– Only inserts STOP operators where all remaining predicates are non-reductive (cannot use with multi-way joins)
STOP Operator [Carey, et al, 1997]: Aggressive Heuristic
Applies STOP operator wherever it may be beneficial, thus reducing intermediate results to a greater degree
Choose N value using cardinality estimates
Requires RESTART operator when intermediate processing returns too few results
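A rough sketch of the aggressive heuristic with a restart, under several assumptions that do not come from the paper: rows are plain dictionaries, the join is reduced to a membership test on `dept`, the sort expression is `salary` descending, and a restart simply doubles N. The helper names `stop` and `top_k_aggressive` are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


def stop(rows: List[Dict], n: int) -> List[Dict]:
    """STOP(N, desc, salary): keep the N rows with the largest salary."""
    return heapq.nlargest(n, rows, key=lambda r: r["salary"])


def top_k_aggressive(emps: List[Dict], dept_ids: Set[int],
                     k: int, n_estimate: int) -> Tuple[List[Dict], int]:
    n, restarts = n_estimate, 0
    while True:
        candidates = stop(emps, n)  # STOP applied below the join
        # The join is reductive: employees without a matching DEPT row drop out.
        joined = [e for e in candidates if e["dept"] in dept_ids]
        if len(joined) >= k or n >= len(emps):
            return stop(joined, k), restarts
        n = min(len(emps), n * 2)  # RESTART with a larger N (doubling is an assumption)
        restarts += 1


emps = [
    {"name": "ann", "salary": 100, "dept": 9},
    {"name": "bob", "salary": 90, "dept": 1},
    {"name": "cat", "salary": 80, "dept": 2},
    {"name": "dan", "salary": 70, "dept": 9},
    {"name": "eve", "salary": 60, "dept": 1},
]
rows, restarts = top_k_aggressive(emps, dept_ids={1, 2}, k=2, n_estimate=2)
names = [r["name"] for r in rows]
print(names, restarts)
```

Here the initial estimate N = 2 is too low because one of the two best-paid employees has no matching department, so one restart is needed before the top 2 can be returned.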
STOP Operator [Carey, et al, 1997]: Experimental Results
Which heuristic is better?
– Depends on cardinality, expense of processing intermediate results, accuracy of prediction, etc.
– With low expense of processing intermediate results, experimental results show aggressive overestimation the best:
Strategy                          Elapsed time
Traditional                       128.3 sec
Conservative                      63.9 sec
Aggressive, Underestimate (1/10)  63.1 sec
Aggressive, Overestimate (10)     18.5 sec
STOP Operator [Carey, et al, 1997]: Experimental Results
Performance vs. Traditional (“out-of-the-box”) processing shows benefits in both indexed and non-indexed situations
“Smart” Ranked Querying (Relational): Probabilistic [Donjerkovic, et al, 1999]
Introduces idea of ‘selection cutoff’ to produce top k results without requiring SORT
Quantifies the risk of fewer than k results being generated using inherent database statistics
– “List the top 10 paid employees” becomes “List the employees whose salary is greater than x”, where x is determined by the distribution of employees’ salaries
Probabilistic [Donjerkovic, et al, 1999]: Comparison with STOP Operator
In theory, likely to be cheaper to simply ‘select’ the necessary intermediate rows using cutoff (fig b) rather than performing sort and returning top-k (fig a)
Probabilistic [Donjerkovic, et al, 1999]: Implementation
Leverage same statistics used by traditional query optimizer to guess cutoff
– Histograms
– Selectivity factors
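A minimal sketch of guessing a cutoff from a histogram. The bucket layout `(low, high, count)`, the function names, and the rule "take the lower bound of the bucket where the running count reaches k" are all assumptions chosen for simplicity; the paper's actual probabilistic choice of cutoff, which weighs the risk of a restart, is not reproduced here.

```python
from typing import List, Tuple

Bucket = Tuple[int, int, int]  # (low, high, count), a hypothetical histogram layout


def cutoff_from_histogram(buckets: List[Bucket], k: int) -> float:
    """Walk buckets from the highest values down until about k rows are covered."""
    running = 0
    for low, _high, count in sorted(buckets, reverse=True):
        running += count
        if running >= k:
            return low
    return float("-inf")  # histogram claims fewer than k rows in total


def top_k_by_cutoff(salaries: List[int], buckets: List[Bucket],
                    k: int) -> Tuple[List[int], float, bool]:
    cutoff = cutoff_from_histogram(buckets, k)
    # One selection pass; only the survivors are ordered, not the whole table.
    selected = [s for s in salaries if s >= cutoff]
    restarted = False
    if len(selected) < k:  # statistics were too optimistic: restart without a cutoff
        selected, restarted = list(salaries), True
    return sorted(selected, reverse=True)[:k], cutoff, restarted


salaries = [30, 45, 52, 58, 61, 67, 73, 78, 85, 95]
buckets = [(0, 50, 2), (50, 75, 5), (75, 100, 3)]
result = top_k_by_cutoff(salaries, buckets, k=4)
print(result)
```

With k = 4 the top bucket covers only three rows, so the cutoff drops to 50 and eight of the ten salaries pass the selection; a finer histogram would give a tighter cutoff.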
Probabilistic [Donjerkovic, et al, 1999]: Performance
For simple query using no indexes (return k highest paid employees, no index on ‘Salary’ attribute), easily outperforms traditional (scan, sort, return top k)
Also provides benefit to JOIN queries due to complexity of estimating join selectivity
“Smart” Ranked Querying (Relational): Statistical [Chaudhuri, Gravano, 1999]
Expansion of probabilistic model
Maps rank queries into boolean range queries
Works with a variety of scoring functions, including Min, Euclidean, and Sum
Statistical [Chaudhuri, Gravano, 1999]: Expansion of probabilistic model
Consider multiple levels of ‘selection cutoff’, here referred to as ‘search score’ (Sq)
– NoRestarts: score low enough to guarantee no restarts are ever needed
– Restarts: score high enough that restarts might result
– Intermediate: score between NoRestarts and Restarts
Statistical [Chaudhuri, Gravano, 1999]: Implementation
Determine Sq from histograms
– Choose bounding tuples in each bucket to ensure NoRestarts (fig a) or tight tuples to minimize selection but potentially require Restarts (fig b)
Statistical [Chaudhuri, Gravano, 1999]: Implementation
Determine relational query to retrieve all tuples that score above Sq
– Compute n-rectangle bounding such tuples
– SELECT *
  FROM R
  WHERE (a1 <= A1 <= b1) AND ... AND (an <= An <= bn)
Compute score for all returned tuples
Output top-k tuples with score > Sq or rerun query with lower search score
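The mapping from a ranked query to a range query can be sketched with Python's built-in `sqlite3` module. The assumptions are stated up front: the scoring function is Min, for which "score at least Sq" is exactly the rectangle A1 >= Sq AND A2 >= Sq; grades are stored as integers from 0 to 100; and a restart lowers the search score by a fixed, arbitrary step. The table contents and the name `top_k_min` are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def top_k_min(conn: sqlite3.Connection, k: int,
              search_score: int, step: int = 20) -> Tuple[List[str], int]:
    sq = search_score
    while True:
        # Range query over the rectangle [sq, 100] x [sq, 100].
        rows = conn.execute(
            "SELECT id, a1, a2 FROM R "
            "WHERE a1 BETWEEN ? AND 100 AND a2 BETWEEN ? AND 100",
            (sq, sq),
        ).fetchall()
        # Score only the tuples the range query returned.
        scored = sorted(((min(a1, a2), rid) for rid, a1, a2 in rows), reverse=True)
        if len(scored) >= k or sq <= 0:
            return [rid for _, rid in scored[:k]], sq
        sq = max(0, sq - step)  # restart with a lower search score


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (id TEXT, a1 INTEGER, a2 INTEGER)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [("A", 20, 60), ("B", 60, 80), ("G", 90, 70), ("E", 30, 90), ("D", 80, 20)],
)
result = top_k_min(conn, k=2, search_score=80)
print(result)
```

Starting at Sq = 80 returns no rows, which forces one restart at Sq = 60; a NoRestarts-style choice would have started low enough to avoid that second query at the cost of retrieving more tuples.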
Statistical [Chaudhuri, Gravano, 1999]: Expansion of Fagin’s model
Expands Fagin’s ideas to relational queries
– Substitute ‘search score’ query to determine top tuples for each subsystem
– Use NoRestarts strategy to ensure that expensive re-querying is avoided
“Smart” Ranked Querying (Rank): MPro [Chang, Hwang, 2002]
Extends consideration of top-k querying to expensive predicates (monotonic only)
– As opposed to other work, which assumes the expense of score calculation to be minimal
Attempt to minimize the number of scores calculated
– Consider only Necessary Probes, i.e. only those calculations without which the top-k results cannot be found
MPro [Chang, Hwang, 2002]: Determining if probe is necessary
An object’s lowest calculated score represents its “ceiling score” (i.e. with a function like Min, no further score for that object can raise its combined score above this value)
If “ceiling score” falls below top-k object’s complete score, object is ruled out and no further calculations on the object need be performed
Simple Example:
– Consider scoring function like Min and top-1 results desired
– If we know object A’s combined score with respect to F(x) and F(y) is .8, and we calculate object B’s score with respect to F(x) to be .7, B’s score with respect to F(y) need not be calculated (its Min value cannot be higher than .7)
MPro [Chang, Hwang, 2002]: Determining all necessary probes
Only objects with ceiling scores in the top-k need be further evaluated
If objects are kept in sorted order by current ceiling scores:
– For any object u in the top-k slots, its next probe is necessary
MPro [Chang, Hwang, 2002]: Minimal Probes Algorithm (MPro)
Priority queue initialization
– Evaluate each object over first predicate (same as sequentially accessing objects sorted by x)
Necessary probing
– Request from queue the object with highest ceiling score
– Evaluate object over next predicate y
– Update ceiling score and reinsert into queue
Stop when the k objects with the highest ceiling scores have all been completely scored (and output these objects)
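The queue-driven loop can be sketched with Python's `heapq`. This is an illustration under assumptions, not the authors' implementation: grades lie in [0, 1], unevaluated predicates are optimistically assumed to be 1.0 when computing a ceiling, only predicates after the first count as probes, and the data reuses the redness and roundness grades from the Garlic example with Min as the scoring function.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def mpro_top_k(objects: Sequence[str],
               predicates: List[Callable[[str], float]],
               combine: Callable[[List[float]], float],
               k: int) -> Tuple[List[str], int]:
    def ceiling(grades: List[float]) -> float:
        # Best score still possible: assume every unevaluated predicate is 1.0.
        return combine(grades + [1.0] * (len(predicates) - len(grades)))

    probes = 0
    heap = []
    # Queue initialization: evaluate every object over the first predicate.
    for obj in objects:
        grades = [predicates[0](obj)]
        heapq.heappush(heap, (-ceiling(grades), obj, grades))
    result = []
    while heap and len(result) < k:
        _neg, obj, grades = heapq.heappop(heap)  # highest ceiling score first
        if len(grades) == len(predicates):
            result.append(obj)  # fully scored and still on top: an answer
            continue
        grades = grades + [predicates[len(grades)](obj)]  # a necessary probe
        probes += 1
        heapq.heappush(heap, (-ceiling(grades), obj, grades))
    return result, probes


redness = {"A": .2, "B": .6, "C": .1, "D": .8, "E": .3, "F": .5, "G": .9, "H": .3}
roundness = {"A": .6, "B": .8, "C": .3, "D": .2, "E": .9, "F": .1, "G": .7, "H": .4}
top, probes = mpro_top_k(list(redness), [redness.get, roundness.get], min, k=2)
print(top, probes)
```

Only G, D and B are probed on the second predicate, three probes instead of eight. D is fully scored early but its ceiling drops to .2, so it sinks in the queue and is never output.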
MPro [Chang, Hwang, 2002]: Further Applications
Incremental results
– Output top k, resume processing where it left off for next k as user requests
Fuzzy joins
– Consider join predicate in same manner
Parallel processing
– Distribute necessary probes across multiple servers
– Distribute data, calculate top-n over each chunk, merge results
MPro [Chang, Hwang, 2002]: Experimental Results
On experimental dataset, over 96% of complete probes found to be unnecessary
Elapsed time significantly improved (see below), from 21009 to 408 seconds for k = 10
“Smart” Ranked Querying (Rank): AutoRank [Agrawal, et al, 2003]
Consider ranking of relational attributes in similar way to Information Retrieval (IR)
– IDF Similarity
    Extend TF-IDF based on frequency of occurrence of attribute values
– QF Similarity
    Use database workload to determine frequency with which attributes and attribute values are referenced
    “Poor man’s relevance feedback”
– ITA
    Index-based top-k algorithm that exploits above ranking functions
AutoRank [Agrawal, et al, 2003]: IDF Similarity
Extend TF (term frequency)
– IR: frequency of terms in a document
– Relational: frequency of values for an attribute
Extend IDF (inverse document frequency)
– IR: total documents / documents containing term
– Relational: tuples / tuples where attribute = value
For all tuples matching the queried value, IDF Similarity is the attribute’s IDF (for the queried value), and 0 otherwise
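A small sketch of the relational IDF described above. The slide gives IDF as the ratio tuples / tuples where attribute = value; taking the logarithm of that ratio is the usual IR convention and is an assumption here, as are the function names and the sample column of car makes.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_table(values: List[str]) -> Dict[str, float]:
    """IDF per attribute value: log(total tuples / tuples holding that value)."""
    n = len(values)
    return {v: math.log(n / count) for v, count in Counter(values).items()}


def idf_similarity(tuple_value: str, query_value: str, idf: Dict[str, float]) -> float:
    # The queried value's IDF when the tuple matches it, and 0 otherwise.
    return idf.get(query_value, 0.0) if tuple_value == query_value else 0.0


# Hypothetical 'make' column of a car table.
makes = ["Honda", "Honda", "Honda", "Toyota", "Toyota", "Ferrari"]
idf = idf_table(makes)
rare = idf_similarity("Ferrari", "Ferrari", idf)
common = idf_similarity("Honda", "Honda", idf)
mismatch = idf_similarity("Honda", "Toyota", idf)
print(rare, common, mismatch)
```

A match on a rare value (Ferrari) scores higher than a match on a common one (Honda), which is the behavior QF Similarity is later introduced to correct when the common value is also the desired one.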
AutoRank [Agrawal, et al, 2003]: QF Similarity
Consider problem of IDF where desired result is also the most frequent
– Realty database where homes built in the last three years are most desired, but the few entries existing for old homes (with higher IDF) will be considered “top”
Instead, use frequency of occurrence of attribute values in executed queries to determine ranking (by examining workload)
Can extend workload analysis to draw comparative conclusions from attribute values queried together
– Assume similarity between ‘Honda’ and ‘Toyota’ if users frequently look for cars by either of these manufacturers
AutoRank [Agrawal, et al, 2003]: Implementation
Store approximate representations of IDF and QF values using smooth function
– Minimal storage required
– IDF and QF values can be quickly retrieved at runtime
ITA (Index-based Threshold Algorithm)
– Use available, existing indexes (B+ trees)
– Define threshold by computing best tuple in data not yet examined
– Stop processing when similarity of this tuple is no greater than similarity of lowest ranking tuple in top-k buffer
AutoRank [Agrawal, et al, 2003]: Experimental Results
Used large realtor database from http://homeadvisor.microsoft.com and MS SQL Server
Measured result-quality via user studies
– For each test query, asked users to identify relevant and irrelevant tuples and compared results of QF and IDF queries to users’ responses
ITA judged to be more efficient than SQL Server’s Top-k operator when indexes exist
Conclusions
Clearly, an exciting and worthwhile field
Research has gone in several directions, but all of it shares roots in Fagin’s and Carey’s work
Combines many areas of computer science
– Artificial Intelligence (Fuzzy Logic)
– Information Retrieval
The Future
Implementation in major RDBMS vendors
– Microsoft should be among the first to revamp their Top-K operator, as in-house research [Agrawal, et al, 2003] has provided a smarter, faster technique
Explore more complex ranking functions that cannot be easily mapped to range queries or used with indexes
References
M. J. Carey, D. Kossmann. On Saying “Enough Already!” in SQL. SIGMOD Conference 1997: 219-230, 1997.
D. Donjerkovic, R. Ramakrishnan. Probabilistic Optimization of Top N Queries. VLDB 1999: 411-422, 1999.
R. Fagin. Combining Fuzzy Information from Multiple Systems. PODS 1996: 216-226, 1996.
S. Nepal, M. V. Ramakrishna. Query Processing Issues in Image (Multimedia) Databases. ICDE 1999: 22-29, 1999.
S. Chaudhuri, L. Gravano. Evaluating Top-k Selection Queries. VLDB 1999: 397-410, 1999.
K. C. Chang, S. Hwang. Minimal Probing: Supporting Expensive Predicates for Top-k Queries. SIGMOD Conference 2002: 346-357, 2002.
S. Agrawal, S. Chaudhuri, G. Das, A. Gionis. Automated Ranking of Database Query Results. CIDR 2003, 2003.