Ranking Instructor: Gautam Das Class notes Prepared by Sushanth Sivaram Vallath

Upload: derick-pierce

Post on 18-Jan-2016


Ranking

Instructor: Gautam Das

Class notes

Prepared by Sushanth Sivaram Vallath

Rate Restaurants

Select top 10 from restaurants where location = 'Arlington' order by 3.5 * price + 1.2 * ambience

Selection condition: location = 'Arlington'

Ranking function: 3.5 * price + 1.2 * ambience

Ranking possibilities

1. Do selection first, then ranking: TA (the Threshold Algorithm) cannot be applied, because the result of the selection offers no sorted access on price and ambience.

2. Rank first, then select: may not work, because the top 10 may not contain any Arlington restaurants at all.

3. Rank and select together: run the scan, but ignore tuples whose location is not Arlington.
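Option 3 can be sketched as a single scan that skips tuples failing the selection while keeping the best k seen so far. This is only a minimal sketch, not the method from the lecture: the function name top_k_filtered, the dictionary field names, the sample rows, and the assumption that a higher score ranks better are all illustrative.

```python
import heapq

def score(row):
    # Ranking function from the example query.
    return 3.5 * row["price"] + 1.2 * row["ambience"]

def top_k_filtered(rows, k, location):
    """Scan once, ignore tuples that fail the selection, keep the k best scores."""
    heap = []  # min-heap of (score, name); the root is the weakest of the current top k
    for row in rows:
        if row["location"] != location:
            continue  # selection condition fails: ignore this tuple
        item = (score(row), row["name"])
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the weakest, keep the new tuple
    return sorted(heap, reverse=True)  # best score first (assumed: higher is better)

# Hypothetical sample data.
restaurants = [
    {"name": "A", "location": "Arlington", "price": 2, "ambience": 5},
    {"name": "B", "location": "Dallas", "price": 4, "ambience": 5},
    {"name": "C", "location": "Arlington", "price": 3, "ambience": 1},
    {"name": "D", "location": "Arlington", "price": 1, "ambience": 2},
]
result = top_k_filtered(restaurants, 2, "Arlington")
print([name for _, name in result])  # ['A', 'C']
```

Restaurant B has the highest score overall but is never considered, because it fails the selection condition during the scan.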

• Pretend there are 3 columns: location, price, ambience

  – location is 1 if Arlington

  – location is 0 otherwise

• Select top 10 from restaurants order by location * (3.5 * price + 1.2 * ambience)

• Q: 0 0 1 1 0

• Find the similarity with the query (count the number of bits that match)

• Hamming Distance (t1, t2) = # of mismatched bits

A1  A2  A3  A4
1   0   1   1
0   1   0   0
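The Hamming distance and the matching-bit similarity can be sketched as follows. The function names are illustrative; row1 and row2 are the two rows of the table above, and the third tuple is a made-up example.

```python
def hamming_distance(t1, t2):
    """Number of positions where the two bit vectors differ."""
    if len(t1) != len(t2):
        raise ValueError("tuples must have the same number of bits")
    return sum(a != b for a, b in zip(t1, t2))

def matching_bits(t1, t2):
    """Similarity as the count of matching bits."""
    return len(t1) - hamming_distance(t1, t2)

row1 = (1, 0, 1, 1)
row2 = (0, 1, 0, 0)
print(hamming_distance(row1, row2))       # 4: every bit differs
print(matching_bits(row1, (1, 0, 0, 1)))  # 3: only the third bit differs
```

Ranking by most matching bits is the same as ranking by smallest Hamming distance, since the two always add up to the number of bits.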

How to do it using Threshold Algorithm

• Add a tid column. Order according to the value.

• Consider a query where only 2 conditions apply; the other fields don't matter. (Note: IR people use similar functions to rank documents.)
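A minimal sketch of the Threshold Algorithm over per-attribute lists of (value, tid) pairs, each ordered by value. It assumes a weighted-sum score with non-negative weights (so the score is monotone) and that higher scores rank better; the function name, the weights, and the sample table are illustrative and not taken from the notes.

```python
import heapq

def threshold_algorithm(table, weights, k):
    """Top-k (score, tid) pairs by weighted sum.

    table: dict tid -> tuple of attribute values (used for random access).
    weights: non-negative weights, one per attribute.
    """
    m = len(weights)

    def score(values):
        return sum(w * v for w, v in zip(weights, values))

    # One list per attribute: (value, tid) pairs in descending order of value.
    sorted_lists = [
        sorted(((vals[i], tid) for tid, vals in table.items()), reverse=True)
        for i in range(m)
    ]
    seen = set()
    top = []  # min-heap of (score, tid) holding the best k seen so far
    for depth in range(len(table)):
        last = []  # values read at this depth, one per list
        for i in range(m):
            value, tid = sorted_lists[i][depth]      # sorted access
            last.append(value)
            if tid not in seen:
                seen.add(tid)
                item = (score(table[tid]), tid)      # random access for full tuple
                if len(top) < k:
                    heapq.heappush(top, item)
                elif item > top[0]:
                    heapq.heapreplace(top, item)
        threshold = score(last)  # no unseen tuple can score higher than this
        if len(top) == k and top[0][0] >= threshold:
            break  # the current top k cannot be beaten: stop early
    return sorted(top, reverse=True)

# Hypothetical table: tid -> (attribute 1, attribute 2)
table = {1: (9, 2), 2: (8, 8), 3: (1, 9), 4: (3, 3)}
print(threshold_algorithm(table, (1, 1), 1))  # [(16, 2)]
```

In this example the scan stops after two rounds of sorted access: the threshold drops to 16, which tuple 2 already reaches, so tuple 4 is never read.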

Document Ranking: IR

Definitions

• Database = collection of documents = {d1, d2, … , dn}

• Document = bag of words, di = {w1, w2, … , wm}

• Vocabulary = set of all possible words
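These definitions can be illustrated with a small sketch. The document texts and ids are made up, and the vocabulary here is restricted to the words that actually occur in the sample collection, which is a simplification of "all possible words".

```python
from collections import Counter

# Hypothetical database: document id -> text
documents = {
    "d1": "information retrieval ranks documents",
    "d2": "ranking documents by ranking functions",
}

# A bag of words keeps duplicates, so each document becomes a word -> count map.
bags = {doc_id: Counter(text.split()) for doc_id, text in documents.items()}

# Vocabulary (simplified): every distinct word seen in the collection.
vocabulary = set().union(*bags.values())

print(bags["d2"]["ranking"])  # 2
print(len(vocabulary))        # 7
```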

• The database is represented as rows (documents) and columns (words)

• For each word W, maintain a list of all the documents that contain W

• When searching for “Information Retrieval”, the question is whether to return only the documents that contain both words.
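A minimal sketch of the per-word document lists, assuming whitespace tokenization and lowercasing. The function name build_inverted_index and the three sample documents are illustrative.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Hypothetical collection: document id -> text
documents = {
    1: "information systems",
    2: "information retrieval",
    3: "retrieval of information",
}
index = build_inverted_index(documents)
print(index["information"])  # [1, 2, 3]
print(index["retrieval"])    # [2, 3]

# Documents containing both query words.
both = sorted(set(index["information"]) & set(index["retrieval"]))
print(both)                  # [2, 3]
```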

The intersection, i.e. the set of documents that contain both words, is to be retrieved.

Documents as rows, words A1 and A2 as columns:

      A1  A2
D1    1   0
D2    0   1
D3    1   1
D4    1   0
D5    1   1
D6    0   1

List of documents per word:

A1  A2
D1  D2
D3  D3
D4  D5
D5  D6

• Do a merge (as in merge-sort) over the sorted lists to find the intersection

• An inverted list is used to store the dictionary
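The merge step can be sketched as a two-pointer walk over the two sorted lists. The function name is illustrative, and the integers 1 to 6 stand in for D1 to D6 from the lists for A1 and A2.

```python
def intersect_sorted(list_a, list_b):
    """Intersect two sorted lists in one pass, as in the merge step of merge-sort."""
    i = j = 0
    result = []
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            result.append(list_a[i])  # document appears in both lists
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1  # advance the list with the smaller id
        else:
            j += 1
    return result

a1 = [1, 3, 4, 5]  # documents containing A1: D1, D3, D4, D5
a2 = [2, 3, 5, 6]  # documents containing A2: D2, D3, D5, D6
print(intersect_sorted(a1, a2))  # [3, 5], i.e. D3 and D5
```

Each list is read once, so the cost is linear in the combined length of the two lists.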

• Geographic information systems: an R-tree is used.

• K-nearest-neighbor problem: an R-tree is used.