Evaluation Metrics
Presented by Dawn Lawrie
Some Possibilities

- Precision
- Recall
- F-measure
- Mean Average Precision
- Mean Reciprocal Rank
Precision
The proportion of a set that is made up of the things of interest.

Example: I'm interested in apples. The set (pictured on the slide) holds 5 pieces of fruit, 3 of which are apples.

Precision = 3 apples / 5 pieces of fruit = 0.6
Recall
The proportion of all the things of interest that made it into the set.

Example: I'm looking for apples. The set contains 3 of the 6 apples that exist.

Recall = 3 apples / 6 total apples = 0.5
F-measure
Harmonic mean of precision and recall: a combined measure that values each equally.

F1 = (2 × precision × recall) / (precision + recall)
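The three set-based metrics can be sketched in Python. This uses the slides' fruit example (the item names like `apple1` and the function name are made up for illustration):

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 (harmonic mean)."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Slide example: 5 retrieved pieces of fruit, 3 of them apples;
# 6 apples exist in total.
retrieved = ["apple1", "apple2", "apple3", "pear1", "orange1"]
relevant = ["apple1", "apple2", "apple3", "apple4", "apple5", "apple6"]
p, r, f = precision_recall_f1(retrieved, relevant)
# p = 0.6, r = 0.5, f1 ≈ 0.545
```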
Where to use
- The set is well defined
- The order of things in the set doesn't matter
But with a Ranked List

(Slide figure: two ranked lists of results, positions 1 through 10.)
Mean Average Precision
Also known as MAP. The favored IR metric for ranked retrieval.
Computing Average Precision

Let Relevant = the set of apples. In the ranked list, apples appear at ranks 2, 3, 6, 10, 11, and 12.

$$\mathrm{AP}(\mathrm{Relevant}) = \frac{\sum_{r \in \mathrm{Relevant}} \mathrm{Precision}(\mathrm{Rank}(r))}{\left|\mathrm{Relevant}\right|}$$

The slides build the numerator one relevant document at a time:

1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12

so AP = (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.504.
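The computation above can be sketched in Python; `average_precision` is a name chosen here, and the rank list matches the slides' apple example:

```python
def average_precision(relevant_ranks):
    """AP = mean of Precision(Rank(r)) over the relevant documents.

    When ranks are sorted, the i-th relevant document (1-based)
    found at rank k contributes precision i / k.
    """
    ranks = sorted(relevant_ranks)
    return sum((i + 1) / rank for i, rank in enumerate(ranks)) / len(ranks)

# Apples at ranks 2, 3, 6, 10, 11, 12 in the ranked list.
ap = average_precision([2, 3, 6, 10, 11, 12])
# (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.504
```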
Compute MAP

Compute the average of AP over a query set, e.g. an apple query, a blueberry query, a pineapple query, and a banana query:

$$\mathrm{MAP}(\mathrm{Query}) = \frac{\sum_{q \in \mathrm{Query}} \mathrm{AP}(\mathrm{Relevant}_q)}{\left|\mathrm{Query}\right|}$$
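A sketch of MAP over a query set. The apple query's relevant ranks come from the earlier example; the rank lists for the other three fruit queries are invented here purely for illustration:

```python
def average_precision(relevant_ranks):
    """AP for one query, given the ranks of its relevant documents."""
    ranks = sorted(relevant_ranks)
    return sum((i + 1) / rank for i, rank in enumerate(ranks)) / len(ranks)

# Relevant-document ranks per query; only "apple" comes from the slides,
# the other lists are hypothetical.
queries = {
    "apple": [2, 3, 6, 10, 11, 12],
    "blueberry": [1, 4],
    "pineapple": [3],
    "banana": [1, 2, 5],
}

# MAP = mean of the per-query AP values.
map_score = sum(average_precision(r) for r in queries.values()) / len(queries)
```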
Limitation of MAP
Results can be biased for query sets that include queries with few relevant documents
Mean Reciprocal Rank
$$\mathrm{RR}(q) = \begin{cases} 0 & \text{if } q \text{ retrieves no relevant documents} \\[4pt] \dfrac{1}{\mathrm{TopRank}(q)} & \text{otherwise} \end{cases}$$

$$\mathrm{MRR}(\mathrm{Query}) = \frac{\sum_{q \in \mathrm{Query}} \mathrm{RR}(q)}{\left|\mathrm{Query}\right|}$$

where TopRank(q) is the rank of the first relevant document retrieved for q.
Understanding MRR

Four queries retrieve their first relevant document at ranks 5, 15, 205, and 215.

| Rank | RR value |
|------|----------|
| 5    | 0.2      |
| 15   | 0.067    |
| 205  | 0.0049   |
| 215  | 0.0047   |

Average rank: 110. MRR: 0.069.
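The slide's numbers can be reproduced with a small sketch (the function name is chosen here, not taken from the slides):

```python
def reciprocal_rank(top_rank):
    """RR(q): 1 / rank of the first relevant doc, or 0 if none retrieved."""
    return 1.0 / top_rank if top_rank else 0.0

# Ranks of the first relevant document for the four example queries.
top_ranks = [5, 15, 205, 215]
rr_values = [reciprocal_rank(r) for r in top_ranks]  # 0.2, 0.067, 0.0049, 0.0047
mrr = sum(rr_values) / len(rr_values)                # ≈ 0.069
avg_rank = sum(top_ranks) / len(top_ranks)           # 110
```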
MRR vs. Average Rank

- MRR = MAP when there is exactly one relevant document
- The result is bounded between 0 and 1, where 1 is perfect retrieval
- Average rank is greatly influenced by documents retrieved at very large ranks, but such high ranks do not reflect the importance of those documents in practice
- MRR minimizes the difference between, say, ranks 750 and 900
Take Home Message

- Precision/recall and F-measure are good for well-defined sets
- MAP is good for ranked results when you're looking for 5+ things
- MRR is good for ranked results when you're looking for fewer than 5 things, and is best when you're looking for just 1 thing