Ranking
Instructor: Gautam Das
Class notes
Prepared by Sushanth Sivaram Vallath
Rate Restaurants
Select top 10 from restaurants
where location = 'Arlington'             (selection condition)
order by 3.5 * price + 1.2 * ambience    (ranking function)
Ranking possibilities
1. Do selection first, then ranking: TA cannot be applied, because the
selection result offers no sorted access on price and ambience
2. Rank first, then select: may not work, because the top 10 may not contain
any Arlington restaurant at all
3. Rank and select together: run the scan, but ignore tuples whose location is not Arlington
• Pretend there are 3 columns: location, price, ambience, where location is 1 if Arlington and 0 otherwise
• Select top 10 from restaurants order by location * (3.5 * price + 1.2 * ambience)
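The third possibility (rank and select in one scan) can be sketched as below. This is a minimal illustration, not the method from the lecture: the sample rows, the names `combined_score` and `top_k`, and the assumption that a higher score ranks first are all hypothetical.

```python
import heapq

# Hypothetical sample rows: (name, location, price, ambience).
restaurants = [
    ("A", "Arlington", 4.0, 3.0),
    ("B", "Dallas", 5.0, 5.0),
    ("C", "Arlington", 2.0, 5.0),
    ("D", "Fort Worth", 5.0, 4.0),
    ("E", "Arlington", 3.0, 1.0),
]

def combined_score(row, city="Arlington"):
    """location * (3.5 * price + 1.2 * ambience), with location as a 0/1 indicator."""
    _, location, price, ambience = row
    indicator = 1 if location == city else 0
    return indicator * (3.5 * price + 1.2 * ambience)

def top_k(rows, k, city="Arlington"):
    """One scan over the rows; tuples outside `city` are ignored.
    Assumption: a higher score is better."""
    candidates = (
        (combined_score(row, city), row[0])
        for row in rows
        if row[1] == city
    )
    return heapq.nlargest(k, candidates)

best_names = [name for _, name in top_k(restaurants, 2)]
```

Rows outside Arlington get a combined score of 0, so they can never outrank a matching row with a positive score; the explicit filter simply skips them during the scan.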
• Q: 0 0 1 1 0
• Find the similarity with the query (count the number of bits that match)
• Hamming Distance (t1, t2) = # of mismatched bits
A1 A2 A3 A4
1 0 1 1
0 1 0 0
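As a small sketch of the Hamming distance defined above, the two rows of the table can be compared bit by bit. The 4-bit `query` below is a hypothetical example chosen to match the table width, and `similarity` (the number of matching bits) is an illustrative helper name.

```python
def hamming_distance(t1, t2):
    """Number of positions where the two bit vectors differ."""
    if len(t1) != len(t2):
        raise ValueError("tuples must have the same number of bits")
    return sum(a != b for a, b in zip(t1, t2))

def similarity(query, t):
    """Number of matching bits = total bits minus Hamming distance."""
    return len(query) - hamming_distance(query, t)

t1 = (1, 0, 1, 1)     # first row of the table (A1..A4)
t2 = (0, 1, 0, 0)     # second row of the table
query = (1, 0, 0, 1)  # hypothetical 4-bit query

d12 = hamming_distance(t1, t2)   # the rows differ in every bit
s1 = similarity(query, t1)
s2 = similarity(query, t2)
```

Ranking by similarity in descending order is the same as ranking by Hamming distance in ascending order, since the two always sum to the number of bits.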
How to do it using Threshold Algorithm
• Add a tid column to each attribute, and order each attribute list by its value
• A query where only 2 conditions apply; the other fields do not matter (Note: IR people used similar functions to rank documents)
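The idea above can be sketched with a basic version of the Threshold Algorithm over one sorted (tid, score) list per query condition. This is a sketch under stated assumptions, not the lecture's exact procedure: the per-condition score is 1 if the bit matches and 0 otherwise, the aggregate is the sum of scores, tuples 3 and 4 are hypothetical additions to the two rows in the table, and the names `build_lists` and `ta_top_k` are illustrative.

```python
import heapq

# tid -> bits for A1..A4. Tuples 1 and 2 come from the table; 3 and 4 are hypothetical.
tuples = {1: (1, 0, 1, 1), 2: (0, 1, 0, 0), 3: (1, 1, 1, 0), 4: (0, 0, 1, 1)}
# Only 2 conditions apply: A1 = 1 and A3 = 1 (column index -> wanted bit).
query = {0: 1, 2: 1}

def build_lists(tuples, query):
    """One list per condition: (tid, match score), sorted by score descending."""
    lists = []
    for col, wanted in query.items():
        scored = [(tid, 1 if bits[col] == wanted else 0) for tid, bits in tuples.items()]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        lists.append(scored)
    return lists

def ta_top_k(sorted_lists, k):
    """Threshold Algorithm with sum as the (monotone) aggregate.
    Returns the top-k (tid, total) pairs and the depth of sorted access used."""
    lookup = [dict(lst) for lst in sorted_lists]  # random access: tid -> score
    totals = {}                                   # aggregate score of every tid seen
    depth = 0
    for depth, row in enumerate(zip(*sorted_lists), start=1):
        for tid, _ in row:                        # sorted access, one entry per list
            if tid not in totals:
                totals[tid] = sum(table[tid] for table in lookup)
        # Best total any unseen tuple could still reach.
        threshold = sum(score for _, score in row)
        best = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best, depth
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1]), depth

result, depth = ta_top_k(build_lists(tuples, query), k=2)
```

With this data the scan stops after reading two entries from each list, because tuples 1 and 3 already match both conditions and no unseen tuple can score higher than the threshold.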
Document Ranking: IR
Definitions
• Database = collection of documents = {d1, d2, … , dn}
• Document = bag of words: di = {w1, w2, … , wn}
• Vocabulary = set of all possible words
• Represented as rows (documents) and columns (words)
• For each word W, maintain a list of all the documents that contain W
• When searching for “Information Retrieval”, should only the documents that contain both words be returned?
If so, the intersection of the document lists of the two words is to be retrieved
     A1  A2
D1   1   0
D2   0   1
D3   1   1
D4   1   0
D5   1   1
D6   0   1

A1   A2
D1   D2
D3   D3
D4   D5
D5   D6
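The two tables above can be reproduced with a short sketch: build one list of document ids per word from the 0/1 matrix, then intersect two sorted lists with a single merge-style pass. The function names `build_inverted_lists` and `intersect_sorted` are illustrative, and documents are represented by integer ids (1 for D1, and so on) as an assumption.

```python
# Rows of the matrix above: document id -> (A1, A2) bits.
incidence = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 0), 5: (1, 1), 6: (0, 1)}
words = ("A1", "A2")

def build_inverted_lists(incidence, words):
    """For each word, the sorted list of documents that contain it."""
    lists = {word: [] for word in words}
    for doc_id in sorted(incidence):
        for word, bit in zip(words, incidence[doc_id]):
            if bit:
                lists[word].append(doc_id)
    return lists

def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists in one linear pass."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common

inverted = build_inverted_lists(incidence, words)
both = intersect_sorted(inverted["A1"], inverted["A2"])
```

Because both lists are kept sorted by document id, the intersection needs only one pass over each list rather than a lookup per document.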
• Merge the two sorted lists (as in the merge step of merge-sort) to find the intersection
• Inverted list is used to store the dictionary
• Geographic Information Systems: R-tree is used.
• K-nearest neighbor problem: R-tree is used.