venkatesh vinayakarao


Post on 03-Feb-2022


TRANSCRIPT

Page 1: Venkatesh Vinayakarao

அட பாடல் போல தேடல் கூட ஒரு சுகமே

Search, like a song, is also a joy.

- From the movie, Thulladha Manamum Thullum. Lyrics by Vaali.

Venkatesh Vinayakarao (Vv)

Information Retrieval

Venkatesh Vinayakarao

Term: Aug – Sep, 2019

Chennai Mathematical Institute

https://vvtesh.sarahah.com/

Page 2: Venkatesh Vinayakarao

Evaluation

Page 3: Venkatesh Vinayakarao

How Good is Our System?

• A collection having the following contents
  • d1: IIIT ALLAHABAD
  • d2: IIIT DELHI
  • d3: IIIT GUWAHATI
  • d4: ISI
  • d5: IIIT SRI CITY
  • d6: KREA SRI CITY

• Query is
  • SRI CITY

• Result is
  • IIIT SRI CITY
  • KREA SRI CITY

Very Good!

Page 4: Venkatesh Vinayakarao

Evaluation

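[Diagram: the collection is indexed with an inverted index and queried through the retrieval system. Query = “IIIT Sri City”, Results = ??]

[Diagram: human judges are given the same documents and the same query. Query = “IIIT Sri City”, Results = ??]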

Page 5: Venkatesh Vinayakarao

How Good is Our System?

• A collection having the following contents
  • d1: IIIT ALLAHABAD
  • d2: IIIT DELHI
  • d3: IIIT GUWAHATI
  • d4: ISI
  • d5: IIIT SRI CITY
  • d6: KREA SRI CITY

• Query is
  • IIIT

• Result is
  • IIIT SRI CITY
  • KREA SRI CITY

Not so Good!

Page 6: Venkatesh Vinayakarao

Objective

We want all relevant documents and only relevant documents

Page 7: Venkatesh Vinayakarao

Relevance

• How many relevant documents?
  • Four (IIIT SRI CITY, IIIT ALLAHABAD, IIIT DELHI, IIIT GUWAHATI)

• How many retrieved documents?
  • Two (IIIT SRI CITY, KREA SRI CITY)

How to quantify the “goodness” of our system?

Page 8: Venkatesh Vinayakarao

Terminology

• Documents we see in results are “positive”

• Positive
  • + IIIT SRI CITY
  • + KREA SRI CITY

• Negative
  • - IIIT ALLAHABAD
  • - IIIT DELHI
  • - IIIT GUWAHATI
  • - ISI

Page 9: Venkatesh Vinayakarao

Terminology

• Documents that we correctly classify are “true”

• Positive
  • + IIIT SRI CITY (true)
  • + KREA SRI CITY

• Negative
  • - IIIT ALLAHABAD
  • - IIIT DELHI
  • - IIIT GUWAHATI
  • - ISI (true)

Here, query is “IIIT”
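The positive/negative and true/false labels above can be worked out mechanically. The following is a minimal sketch, assuming that a document is relevant to the query “IIIT” exactly when it contains that term (an assumption matching the four relevant documents counted earlier); the names `collection`, `relevant`, `retrieved`, and `classify` are illustrative and not from the slides.

```python
# The six-document collection from the slides.
collection = {
    "d1": "IIIT ALLAHABAD",
    "d2": "IIIT DELHI",
    "d3": "IIIT GUWAHATI",
    "d4": "ISI",
    "d5": "IIIT SRI CITY",
    "d6": "KREA SRI CITY",
}

# Assumption: relevant to the query "IIIT" means the term appears in the document.
relevant = {doc_id for doc_id, text in collection.items() if "IIIT" in text.split()}

# What the system actually returned for the query "IIIT".
retrieved = {"d5", "d6"}


def classify(doc_id):
    """Label a document tp, fp, fn, or tn."""
    is_retrieved = doc_id in retrieved   # positive vs. negative
    is_relevant = doc_id in relevant     # decides whether that call was correct
    if is_retrieved:
        return "tp" if is_relevant else "fp"
    return "fn" if is_relevant else "tn"


labels = {doc_id: classify(doc_id) for doc_id in collection}
for doc_id in sorted(collection):
    print(doc_id, collection[doc_id], labels[doc_id])
```

Under these assumptions only IIIT SRI CITY (tp) and ISI (tn) are classified correctly, which agrees with the two entries marked “(true)” on the slide.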

Page 10: Venkatesh Vinayakarao

Quiz

• All retrieved results =
  1. tp + fp
  2. tp + fn
  3. tn + fp
  4. tn + fn

Legend
tp = true positive
tn = true negative
fp = false positive
fn = false negative

Page 11: Venkatesh Vinayakarao

Quiz

• All retrieved results =
  1. tp + fp (answer: every retrieved document is a positive, whether true or false)
  2. tp + fn
  3. tn + fp
  4. tn + fn

Legend
tp = true positive
tn = true negative
fp = false positive
fn = false negative

Page 12: Venkatesh Vinayakarao

Quiz

• All relevant results =
  1. tp + fp
  2. tp + fn
  3. tn + fp
  4. tn + fn

Legend
tp = true positive
tn = true negative
fp = false positive
fn = false negative

Page 13: Venkatesh Vinayakarao

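Quiz

• All relevant results =
  1. tp + fp
  2. tp + fn (answer: a relevant document is either retrieved, a true positive, or missed, a false negative)
  3. tn + fp
  4. tn + fn

Legend
tp = true positive
tn = true negative
fp = false positive
fn = false negative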

Page 14: Venkatesh Vinayakarao

You have 100% Precision when

• Everything you retrieved was relevant.
• tp + fp = tp
• fp = 0

Page 15: Venkatesh Vinayakarao

You have 100% Recall when

• You retrieved everything that was relevant. (Note: You could have retrieved more.)
• fn = 0
• tp = all relevant documents
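The two extremes can be seen on the six-document collection used earlier. This is a small sketch, assuming the query “IIIT” with d1, d2, d3, and d5 as the relevant documents; the result sets `cautious` and `greedy` are hypothetical systems invented for illustration, and both functions assume non-empty sets.

```python
relevant = {"d1", "d2", "d3", "d5"}                  # relevant to the query "IIIT"
everything = {"d1", "d2", "d3", "d4", "d5", "d6"}    # the whole collection


def precision(retrieved, relevant):
    # Fraction of retrieved documents that are relevant: tp / (tp + fp).
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    # Fraction of relevant documents that are retrieved: tp / (tp + fn).
    return len(retrieved & relevant) / len(relevant)


# Hypothetical system 1: returns a single, safe document, so fp = 0.
cautious = {"d5"}
print(precision(cautious, relevant), recall(cautious, relevant))

# Hypothetical system 2: returns the whole collection, so fn = 0.
greedy = everything
print(precision(greedy, relevant), recall(greedy, relevant))
```

The cautious system has perfect precision but misses three relevant documents; the greedy one has perfect recall but also returns ISI and KREA SRI CITY. Neither number alone says how good a system is.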

Page 16: Venkatesh Vinayakarao

Quiz

• R refers to Relevant Document

• N refers to Nonrelevant Document.

• Collection has 10,000 documents.

• Assume that there are 8 relevant documents in total in the collection. Calculate Precision and Recall.

• Retrieved Documents:

RRNNN NNNRN RNNNR NNNNR

Page 17: Venkatesh Vinayakarao

Precision and Recall

• Precision = 6/20

• Recall = 6/8
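The arithmetic behind these two answers can be checked with a few lines of Python. This is only a sketch of the counting; the variable names are illustrative. Note that the collection size (10,000) is not needed for either measure.

```python
# Relevance judgements for the 20 retrieved documents, in rank order.
judgements = "RRNNN NNNRN RNNNR NNNNR".replace(" ", "")

total_relevant = 8               # relevant documents in the whole collection (given)

retrieved = len(judgements)      # tp + fp
tp = judgements.count("R")       # relevant documents that were retrieved
fp = retrieved - tp              # nonrelevant documents that were retrieved
fn = total_relevant - tp         # relevant documents that were missed

precision = tp / retrieved       # tp / (tp + fp)
recall = tp / total_relevant     # tp / (tp + fn)

print(f"retrieved={retrieved} tp={tp} fp={fp} fn={fn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```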

Page 18: Venkatesh Vinayakarao

Precision and Recall

Precision: fraction of retrieved docs that are relevant = P(relevant|retrieved)

Recall: fraction of relevant docs that are retrieved = P(retrieved|relevant)

• Precision P = tp/(tp + fp)

• Recall R = tp/(tp + fn)

               Relevant   Nonrelevant
Retrieved      tp         fp
Not Retrieved  fn         tn

Sec. 8.3
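The two formulas translate directly into code. Below is a minimal sketch; the function name `precision_recall` is illustrative, and returning 0.0 when a denominator is zero is a convention assumed here, not something the slides specify.

```python
def precision_recall(tp, fp, fn):
    """Return (P, R) from contingency-table counts.

    tn is deliberately not a parameter: neither formula uses it.
    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# The earlier "IIIT" query: one relevant document retrieved (IIIT SRI CITY),
# one nonrelevant document retrieved (KREA SRI CITY), three relevant documents missed.
p, r = precision_recall(tp=1, fp=1, fn=3)
print(p, r)
```

For that query the sketch gives a precision of 0.5 and a recall of 0.25, which puts a number on the “Not so Good!” verdict.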

Page 19: Venkatesh Vinayakarao

Exercise

Document ID      Judge 1  Judge 2  Our System
d1 = Bru         0        0        Retrieved
d2 = 3Roses      0        0        No
d3 = Taj         1        1        Retrieved
d4 = Taj Tea     1        1        No
d5 = Taj Mahal   1        0        No

Query = “Taj”

Suppose a document is relevant only if both judges agree that it is relevant (0 = nonrelevant, 1 = relevant). What are the precision and recall?

Page 20: Venkatesh Vinayakarao

Exercise

Document ID      Judge 1  Judge 2  Our System  Outcome
d1 = Bru         0        0        Retrieved   False positive
d2 = 3Roses      0        0        No
d3 = Taj         1        1        Retrieved   True positive
d4 = Taj Tea     1        1        No
d5 = Taj Mahal   1        0        No

Query = “Taj”

Suppose a document is relevant only if both judges agree that it is relevant (0 = nonrelevant, 1 = relevant). What are the precision and recall?

Page 21: Venkatesh Vinayakarao

Exercise

Document ID      Judge 1  Judge 2  Our System  Outcome
d1 = Bru         0        0        Retrieved
d2 = 3Roses      0        0        No          True negative
d3 = Taj         1        1        Retrieved
d4 = Taj Tea     1        1        No          False negative
d5 = Taj Mahal   1        0        No          True negative

Query = “Taj”

Suppose a document is relevant only if both judges agree that it is relevant (0 = nonrelevant, 1 = relevant). What are the precision and recall?

Page 22: Venkatesh Vinayakarao

Answer

• Precision = 1/2

• Recall = 1/2
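The answer can be reproduced by applying the both-judges-agree rule to each row of the exercise table. This is a sketch only; the tuple layout and variable names are illustrative.

```python
# (document, judge 1, judge 2, retrieved by our system)
rows = [
    ("d1 = Bru",       0, 0, True),
    ("d2 = 3Roses",    0, 0, False),
    ("d3 = Taj",       1, 1, True),
    ("d4 = Taj Tea",   1, 1, False),
    ("d5 = Taj Mahal", 1, 0, False),
]

counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
for name, judge1, judge2, was_retrieved in rows:
    # Relevant only if both judges say 1, so d5 (1, 0) counts as nonrelevant.
    is_relevant = judge1 == 1 and judge2 == 1
    if was_retrieved:
        outcome = "tp" if is_relevant else "fp"
    else:
        outcome = "fn" if is_relevant else "tn"
    counts[outcome] += 1
    print(name, outcome)

precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
print(counts, precision, recall)
```

With a looser rule (relevant if either judge says 1), d5 would become a false negative and recall would drop to 1/3, so the agreement rule is part of the answer.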

Page 23: Venkatesh Vinayakarao

How does a Search Engine Work?

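[Diagram: the collection is indexed with an inverted positional index and queried through the retrieval system. Query = “IIIT Sri City, Chittoor”, Results = ??]

[Diagram: the same query is posed over the documents. Query = “IIIT Sri City, Chittoor”, Results = ??]

Query processing: Tokenize, Case-fold, Stem, Stop, Lemmatize

Processed query: IIIT \3 Chittoor

P = ?, R = ?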

Page 24: Venkatesh Vinayakarao

Thank You