![Page 1: SANMI KOYEJO CS @ ILLINOIS€¦ · recommender algorithms on top-n recommendation tasks." Recsys, 2010. Metric choice has a large impact on real- ... Fit a logistic regression model](https://reader034.vdocuments.us/reader034/viewer/2022042123/5e9f145da205ec5fd919abbc/html5/thumbnails/1.jpg)
How effective is your classifier? Revisiting the role of metrics in machine learning
SANMI KOYEJO
CS @ ILLINOIS
Joint work with Ran Li, Xiaoyan Wang, Gaurush Hiranandani, Shant Boodaghians, and Ruta Mehta
Image Source: https://davepannell.com/public/2016/03/Email-marketing-vs-spam.jpg
■ Users complain that most real emails are labelled spam
■ ~90% of all email is spam*
■ Suggests that accuracy is the wrong metric as it gives equal weight to all errors
■ Accuracy = 95%
■ $$$
Image Source: https://becominghuman.ai/deep-learning-made-easy-with-deep-cognition-403fbe445351
*Source: Symantec circa 2008; https://www.theatlas.com/charts/NJipnKmq
Error analysis
■ Accuracy

| | Ground truth: Spam | Ground truth: Not Spam |
|---|---|---|
| Predicted: Spam | TP | FP |
| Predicted: Not Spam | FN | TN |

■ To improve user calibration, try evaluating and/or optimizing weighted accuracy e.g.
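
The weighted-accuracy formula from this slide did not survive extraction, so the following is only a minimal sketch of one common form, assuming per-class weights `w_pos` and `w_neg` (both hypothetical names). The counts are invented to be consistent with the earlier spam example (about 90% spam, 95% accuracy); they are not taken from the talk.

```python
def weighted_accuracy(tp, fp, fn, tn, w_pos=1.0, w_neg=1.0):
    """Class-weighted accuracy computed from confusion-matrix counts.

    Assumed form: (w_pos*TP + w_neg*TN) / (w_pos*(TP+FN) + w_neg*(TN+FP)).
    With w_pos == w_neg this reduces to ordinary accuracy.
    """
    num = w_pos * tp + w_neg * tn
    den = w_pos * (tp + fn) + w_neg * (tn + fp)
    return num / den if den else 0.0


# Hypothetical counts: 900 spam emails (all caught), 100 real emails (half flagged).
tp, fp, fn, tn = 900, 50, 0, 50

plain = weighted_accuracy(tp, fp, fn, tn)                # equal weights
balanced = weighted_accuracy(tp, fp, fn, tn, w_neg=9.0)  # up-weight the rare class

print(f"accuracy          = {plain:.2f}")
print(f"weighted accuracy = {balanced:.2f}")
```

With equal weights the classifier looks strong even though half of the real emails are flagged; up-weighting the rare "Not Spam" class makes that failure visible in the score.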
The confusion matrix
Beyond Accuracy, more general metrics are nested functions

| | Ground truth: Y = 1 | Ground truth: Y = 0 |
|---|---|---|
| Predicted: h(x) = 1 | TP | FP |
| Predicted: h(x) = 0 | FN | TN |

■ Metrics are used to compare classifiers, or can be optimized directly
■ The classifier performance metric can be approximated from data.
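
To make the last two bullets concrete, here is a small sketch of how a confusion matrix can be estimated from labelled data and how metrics can then be written as functions of it. The helper names (`confusion`, `accuracy`, `f1`) and the toy labels are illustrative assumptions, not part of the talk.

```python
def confusion(y_true, y_pred):
    """Empirical confusion matrix, normalised so the four entries sum to 1."""
    n = len(y_true)
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, h in zip(y_true, y_pred):
        if h == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return {k: v / n for k, v in counts.items()}


def accuracy(c):
    """Accuracy is a linear function of the confusion matrix."""
    return c["TP"] + c["TN"]


def f1(c):
    """F1 is a ratio of linear functions of the confusion matrix."""
    denom = 2 * c["TP"] + c["FP"] + c["FN"]
    return 2 * c["TP"] / denom if denom else 0.0


# Toy labels and predictions (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

c = confusion(y_true, y_pred)
print(c, accuracy(c), f1(c))
```

Two classifiers can then be compared simply by evaluating the same metric function on their respective estimated confusion matrices.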
Lots of real world examples
Metrics in ranking and recommendation
“Results show that improvements in RMSE often do not translate into [top-N ranking] accuracy improvements. In particular, a naive non-personalized algorithm can outperform some common recommendation approaches and almost match the accuracy of sophisticated algorithms”
P. Cremonesi, Y. Koren, and R. Turrin. "Performance of recommender algorithms on top-n recommendation tasks." Recsys, 2010.
Metric choice has a large impact on real-world machine learning performance.
1. Given a complex metric, how can we efficiently construct classifiers that (approximately) optimize it?
2. Given a new classification problem, which metric should you use to measure performance?
One simple trick…
A RE-WEIGHTING STRATEGY
Multiclass classification
Standard metric is Accuracy
Standard Prediction Strategy
e.g. logistic regression, RF, DNN, …
Proposed Postprocessing Strategy
e.g. logistic regression, RF, DNN, …
Narasimhan, H., et al. "Consistent multiclass algorithms for complex performance measures." ICML. 2015.
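
The slide's formula for the post-processing rule is not recoverable from the extracted text. As a hedged illustration only, the sketch below assumes the simplest form of re-weighting: multiply each estimated class probability by a per-class weight and predict the argmax. The function name `reweighted_predict` and the example weights are hypothetical, and this is not claimed to be the exact rule from the cited paper.

```python
def reweighted_predict(probs, weights):
    """Predict argmax_k weights[k] * probs[k].

    probs   : estimated class probabilities from any probabilistic model
              (e.g. logistic regression, RF, DNN)
    weights : one non-negative weight per class (assumed diagonal re-weighting)
    """
    scores = [w * p for w, p in zip(weights, probs)]
    return max(range(len(scores)), key=scores.__getitem__)


probs = [0.6, 0.3, 0.1]  # made-up model output for one example

standard = reweighted_predict(probs, [1.0, 1.0, 1.0])     # plain argmax
postprocessed = reweighted_predict(probs, [1.0, 3.0, 1.0])  # favour class 1

print(standard, postprocessed)
```

With uniform weights this is the standard prediction strategy; changing the weights shifts decisions toward classes that the target metric values more, without retraining the underlying model.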
A small experiment
1. Generate random data from model
2. Fit a logistic regression model
3. Post-process predictions
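
The three steps above can be sketched end to end in plain Python. Everything specific here is an assumption made for illustration: the data model (one Gaussian feature, 10% positives), the gradient-ascent fit, the choice of F1 as the target metric, and the threshold grid. In the binary case, re-weighting the two class probabilities is equivalent to moving the decision threshold, which is what the last step does.

```python
import math
import random


def simulate(n, seed=0):
    """Step 1: draw (x, y) with P(y=1)=0.1 and x ~ Normal(1.5*y, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.1 else 0
        data.append((rng.gauss(1.5 * y, 1.0), y))
    return data


def fit_logistic(data, steps=300, lr=0.5):
    """Step 2: fit p(y=1|x) = sigmoid(a*x + b) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(data)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (y - p) * x
            grad_b += y - p
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def f1_at(data, probs, threshold):
    """F1 score when predicting 1 whenever the estimated probability exceeds threshold."""
    tp = fp = fn = 0
    for (_, y), p in zip(data, probs):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


data = simulate(1000)
a, b = fit_logistic(data)
probs = [1.0 / (1.0 + math.exp(-(a * x + b))) for x, _ in data]

# Step 3: post-process by searching over thresholds (the grid includes 0.5).
grid = [i / 100 for i in range(1, 100)]
best_t = max(grid, key=lambda t: f1_at(data, probs, t))
baseline_f1 = f1_at(data, probs, 0.5)
tuned_f1 = f1_at(data, probs, best_t)

print(f"F1 at 0.5: {baseline_f1:.3f}; F1 at {best_t:.2f}: {tuned_f1:.3f}")
```

For brevity the threshold is selected and evaluated on the same sample; a held-out split would be needed for an honest estimate of the improvement.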
Simple re-weighting can have a huge effect!
Same strategy works for more complex metrics
any calibrated classifier
Applies to more general settings
NIPS 2014, ICML 2016/2017/2018 ICML 2016
NIPS 2015 In prep.
An application to recommender systems
User assigns rating to each item.
Solve this as a simultaneous (over items) multiclass classification problem, i.e. multioutput classification
Postprocessed OrdRec
Koren, Yehuda, and Joe Sill. "OrdRec: an ordinal model for predicting personalized item rating distributions." Recsys, 2011.
When & Why does re-weighting work?
THE GEOMETRY OF CONFUSION
■ Set of feasible confusion matrices is a bounded convex set
■ Optimization properties will depend on how the gradient field of the metric interacts with the feasible set
■ Any monotonic metric will be optimized at the boundary
■ All points on the boundary are determined by the support function
■ This characterization is exhaustive i.e. characterizes ALL metrics that are consistently optimizable via linear post-processing
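
To spell out why boundary points correspond to re-weighted (thresholded) classifiers, here is a short worked derivation for the binary case. The notation is an assumption introduced for this sketch: $\eta(x) = P(Y = 1 \mid x)$, the confusion matrix is summarised by $(\mathrm{TP}, \mathrm{TN})$, and $a = (a_1, a_2)$ is a direction with $a_1 + a_2 > 0$.

$$
\begin{aligned}
\mathrm{TP}(h) &= \mathbb{E}\big[\eta(x)\,h(x)\big], \qquad
\mathrm{TN}(h) = \mathbb{E}\big[(1-\eta(x))\,(1-h(x))\big], \\
a_1\,\mathrm{TP}(h) + a_2\,\mathrm{TN}(h)
 &= \mathbb{E}\Big[h(x)\,\big(a_1\,\eta(x) - a_2\,(1-\eta(x))\big)\Big] + a_2\,\mathbb{E}\big[1-\eta(x)\big], \\
h_a^{*}(x) &= \mathbf{1}\!\left[\eta(x) \ge \frac{a_2}{a_1 + a_2}\right].
\end{aligned}
$$

The second line is maximised pointwise by setting $h(x) = 1$ exactly where the bracketed term is non-negative, which gives the threshold rule in the third line. The value of the maximum, viewed as a function of $a$, is the support function of the feasible set, so sweeping the direction $a$ traces out the boundary using only thresholded versions of $\eta$.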
This classification strategy is consistent
Binary classification with general metrics
Logistic regression w/ MLE
Hölder densities w/ kernel approx.
Threshold search
Plug-in classifier
Yan, K., Zhong, Ravikumar (2018)
Which metric should you use?
THE BINARY CLASSIFICATION CASE
Recall: Lots of real world examples
Limited formal guidance
Academia: Use the standard metric in your application area
◦ Accuracy
◦ Top-K accuracy
◦ F1 measure
Industry: Hire a consultant or economist
◦ User survey
◦ A/B tests
Image Sources: http://all-free-download.com, https://financesonline.com
Our Approach
Query an “expert” to determine the real-world value of a classifier, i.e. the ideal evaluation metric
Pairwise queries
Experts give inaccurate results for value queries
More accurate results for comparison queries
Speed Matters!
THE “ORACLE” CARES ABOUT WORST CASE QUERY COMPLEXITY
Exploiting the geometry…
■ Only need to query classifiers on the boundary, since we already know the optimum lies within this subset
■ Boundary is one-dimensional, parameterized by “angle”
Using binary search
■ Under weak conditions, the metric is unimodal with respect to the boundary
■ Thus, can use simple binary search to find the optimal confusion matrix
■ Simultaneously recovers gradient of the optimal metric
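
The search idea above can be illustrated with a toy sketch. Several things here are assumptions and not the talk's algorithm: the boundary is taken to be a quarter circle, so the confusion point at angle `theta` is `(cos theta, sin theta)`; the hidden metric is linear; the "expert" is simulated by `make_oracle`; and the search shrinks the interval by thirds (a ternary-style variant that needs only pairwise comparisons), rather than reproducing the exact binary search from the slides.

```python
import math


def make_oracle(theta_star):
    """Simulated expert with a hidden linear metric (hypothetical stand-in)."""
    m = (math.cos(theta_star), math.sin(theta_star))

    def value(theta):
        # Toy boundary: the confusion point at angle theta is (cos theta, sin theta).
        return m[0] * math.cos(theta) + m[1] * math.sin(theta)

    def prefers_second(theta_a, theta_b):
        """Pairwise query: True if the expert prefers the classifier at theta_b."""
        return value(theta_b) > value(theta_a)

    return prefers_second


def elicit_angle(prefers_second, lo=0.0, hi=math.pi / 2, tol=1e-4):
    """Locate the maximiser of a unimodal metric using comparison queries only."""
    queries = 0
    while hi - lo > tol:
        third = (hi - lo) / 3.0
        a, b = lo + third, hi - third
        queries += 1
        if prefers_second(a, b):
            lo = a  # the optimum cannot lie in [lo, a)
        else:
            hi = b  # the optimum cannot lie in (b, hi]
    return (lo + hi) / 2.0, queries


theta_hat, n_queries = elicit_angle(make_oracle(0.7))

# On this toy circular boundary the normal at the optimum is (cos, sin) of the angle,
# which recovers the direction of the hidden linear metric.
gradient_direction = (math.cos(theta_hat), math.sin(theta_hat))

print(f"estimated angle {theta_hat:.4f} after {n_queries} pairwise queries")
```

Each query shrinks the search interval by a constant factor, so the number of expert comparisons grows only logarithmically in the desired precision.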
Guaranteed recovery with finite queries
For the linear case, when algorithm terminates, we recover
Guaranteed to be accurate after steps
With no additional assumptions, this matches the lower bound
Stable to system noise e.g. noisy responses from the “expert”
Conclusion
Metric choice has a large impact on real-world machine learning performance.
1. Re-weighted post-processing is efficient for optimizing complex metrics.
2. Can reduce metric elicitation for binary classifiers to binary search with bounded query complexity.
Measurement is at the core of empirical research
Extensions to other machine learning problems e.g. ranking, regression, …
Faster elicitation using alternative query mechanisms
Noise tolerance, robust elicitation