TRANSCRIPT
[Slide 1]
Kiri Wagstaff
Jet Propulsion Laboratory, California Institute of Technology
July 25, 2012
Association for the Advancement of Artificial Intelligence

CHALLENGES FOR MACHINE LEARNING IMPACT ON THE REAL WORLD

© 2012, California Institute of Technology. Government sponsorship acknowledged. This talk was prepared at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with NASA.
[Slide 2]
MACHINE LEARNING IS GOOD FOR:
Photo: Matthew W. Jackson
[Nguyen et al., 2008] Photo: Eugene Fratkin
[Slide 3]
WHAT IS ITS IMPACT?

HOW OFTEN ARE WE DOING MACHINE LEARNING FOR MACHINE LEARNING'S SAKE? (i.e., publishing results to impress other ML researchers)

[Diagram: Data → Machine Learning → world?, annotated with reported accuracies: 76%, 83%, 89%, 91%]
[Slide 4]
ML RESEARCH TRENDS THAT LIMIT IMPACT
• Data sets disconnected from meaning
• Metrics disconnected from impact
• Lack of follow-through
[Slide 5]
UCI DATA SETS
“The standard Irvine data sets are used to determine percent accuracy of concept classification, without regard to performance on a larger external task.”
Jaime Carbonell
But that was way back in 1992, right?
UCI: Online archive of data sets provided by the University of California, Irvine
[Frank & Asuncion, 2010]
[Slide 6]
UCI DATA SETS TODAY
ICML 2011 papers:
- No experiments: 7%
- Synthetic data: 39%
- UCI data: 37%
- Only UCI/synthetic data: 23%
[Slide 7]
DATA SETS DISCONNECTED FROM MEANING
[Illustration: two panels of anonymous numeric feature vectors (e.g., 3.2 1.5 2.9; 2.6 1.8 3.1; …), labeled "UCI initially" and "UCI today"]
“Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.” – UCI Mushroom data set page
Did you know that the mushroom data set has 3 classes, not 2? Have you ever used this knowledge to interpret your results on this data set?
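A minimal sketch of the class merge the slide describes (the labels below are illustrative stand-ins, not the actual UCI file, which ships with the merge already applied):

```python
# Hypothetical label sample mirroring the three original mushroom classes.
original_labels = ["edible", "poisonous", "unknown-not-recommended",
                   "edible", "unknown-not-recommended"]

# The UCI distribution folds "unknown edibility, not recommended"
# into the poisonous class, leaving a binary problem:
collapsed = ["edible" if lab == "edible" else "poisonous"
             for lab in original_labels]

print(collapsed)
# An "error" on a collapsed-poisonous example may therefore involve a
# mushroom of unknown edibility rather than a confirmed poisonous one,
# which changes how a misclassification should be interpreted.
```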
[Slide 8]
DATA SETS CAN BE USEFUL BENCHMARKS
1. Enable direct empirical comparisons with other techniques, and reproducing others' results
   - But: there is no standard for reproducibility.
2. Results are easier to interpret, since data set properties are well understood
   - But: we don't actually understand these data sets, and the field doesn't require any interpretation.

Too often, we fail at both goals.
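One way to narrow the reproducibility gap, sketched minimally here (all names and values are illustrative, not a standard the talk endorses), is to record every source of variation alongside the result:

```python
import json
import random

# Minimal experiment record: fix the random seed and log everything a
# reader would need to rerun the comparison.
config = {
    "dataset": "uci-mushroom",          # which benchmark was used
    "split": {"train_frac": 0.8, "seed": 42},
    "model": "decision-tree",
    "hyperparams": {"max_depth": 5},
}

random.seed(config["split"]["seed"])    # deterministic shuffling/splits

# Tie the reported number to the exact configuration that produced it.
record = {"config": config, "accuracy": 0.96}
print(json.dumps(record, indent=2))
```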
[Slide 9]
BENCHMARK RESULTS THAT MATTER

Show me:
- Data set properties that permit generalization of results
  - Does your method work on binary data sets? Real-valued features? Specific covariance structures? Overlapping classes?

OR

- How your improvement matters to the originating field
  - "4.6% improvement in detecting cardiac arrhythmia? We could save lives!"
  - "96% accuracy in separating poisonous and edible mushrooms? Not good enough for me to trust it!"
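The mushroom example above can be made concrete with a back-of-the-envelope translation (the basket size and class balance below are illustrative, not from the talk):

```python
# Illustrative numbers: a basket of mushrooms, roughly half poisonous
# (the UCI Mushroom set is close to balanced).
n_mushrooms = 1000

accuracy = 0.96
# Worst case for the eater: assume every error is a poisonous mushroom
# labeled edible.
max_missed_poisonous = round(n_mushrooms * (1 - accuracy))

print(max_missed_poisonous)  # up to 40 poisonous mushrooms called edible
```

Framed this way, "96% accuracy" stops being an abstraction: it is dozens of potentially fatal mistakes per thousand mushrooms, which is why the slide says it is not good enough to trust.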
[Slide 10]
2. METRICS DISCONNECTED FROM IMPACT

- Accuracy, RMSE, precision, recall, F-measure, AUC, …
- Deliberately ignore problem-specific details
- Cannot tell us:
  - WHICH items were classified correctly or incorrectly?
  - What impact does a 1% change have? (What does it mean?)
  - How do we compare across problem domains?
"The approach we proposed in this paper detected correctly half of the pathological cases, with acceptable false positive rates (7.5%), early enough to permit clinical intervention."
from "A Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery," Warrick et al., 2010

This doesn't mean accuracy, etc. are bad measures, just that they should not remain abstractions.
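The Warrick et al. phrasing illustrates how abstract metrics become concrete counts. A hedged sketch with a made-up cohort (the 2000-delivery numbers are not from the paper; only the 50% recall and 7.5% false-positive rate are quoted):

```python
# Hypothetical cohort: 2000 deliveries, 50 of them pathological.
n_normal, n_pathological = 1950, 50

recall = 0.50   # "detected correctly half of the pathological cases"
fpr = 0.075     # "acceptable false positive rates (7.5%)"

true_positives = round(n_pathological * recall)   # interventions enabled
false_positives = round(n_normal * fpr)           # unnecessary alarms

# 25 early warnings that could permit clinical intervention,
# at the cost of 146 false alarms across the cohort.
print(true_positives, false_positives)
```

Stated as counts of babies helped and alarms raised, the same two numbers carry the clinical meaning that a bare precision/recall table hides.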
[Slide 11]
3. LACK OF FOLLOW-THROUGH
[Diagram: the ML research program ("This is hard!") set against ML publishing incentives]
[Slide 12]
CHALLENGES FOR INCREASING IMPACT
• Increase the impact of your work
1. Employ meaningful evaluation methods
• Direct measurement of impact when possible
• Translate abstract metrics into domain context
2. Involve the world outside of ML
3. Choose which problems to tackle based on their expected impact
• Increase the impact of the field
1. Evaluate impact in your reviews
2. Contribute to the upcoming MLJ Special Issue (Machine Learning for Science and Society)
3. More ideas? Contribute to http://mlimpact.com/
[Slide 13]
MLIMPACT.COM
http://mlimpact.com/