Evaluation of Mixed Initiative Systems
Michael J. Pazzani, University of California, Irvine
National Science Foundation
Overview
• Evaluation
  – Micro-level: Modules
  – Macro-level: Behavior of system users
  – Caution: Don't lose sight of the goal in evaluation
• National Science Foundation
  – CISE (re)organization
  – Funding for mixed initiative systems
  – Tip on writing better proposals: Evaluate
Evaluation
• Micro level
  – Does the module (machine learning, user modeling, information retrieval, visualization, etc.) work properly?
  – Has been responsible for measurable progress in most specialized domains of intelligent systems
  – Relatively easy to do using well-known metrics: error rate, precision, recall, time and space complexity, goodness of fit, ROC curves
  – Builds upon a long history in "hard" sciences and engineering
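A minimal sketch of the first three metrics named above, computed for a binary relevance classifier. The labels and predictions here are made-up illustration data, not results from any system in this talk:

```python
# Micro-level metrics for a binary classifier: error rate, precision, recall.
# y_true / y_pred are hypothetical illustration data.

def micro_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return {
        "error_rate": errors / len(y_true),
        "precision": tp / (tp + fp),  # of items flagged relevant, fraction truly relevant
        "recall": tp / (tp + fn),     # of truly relevant items, fraction flagged
    }

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(micro_metrics(y_true, y_pred))  # error_rate 0.25, precision 0.75, recall 0.75
```

Sweeping the classifier's decision threshold and plotting recall against false-positive rate at each setting is what produces the ROC curve also listed above.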
Evaluation
• Macro level
  – Does the complex system, involving a user and a machine, work as desired?
  – Builds upon history in human (and animal) experimentation, not always taught in (or respected by) engineering schools
  – Allows controlled experiments comparing two systems (or one system with two variations)
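One common shape for such a controlled comparison is a two-proportion z-test on a binary user outcome (e.g., "the user read a story this session"). This is a sketch under hypothetical counts, not data from the studies later in this talk:

```python
# Compare the success rates of two system variants (A vs. B) with a
# two-proportion z-test. The session counts below are hypothetical.
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    return (p_b - p_a) / se                                 # z statistic

# Variant A: 80 of 200 sessions succeed; variant B: 128 of 200.
z = two_proportion_z(80, 200, 128, 200)
print(f"z = {z:.2f}, significant at the 5% level: {abs(z) > 1.96}")
```

With these counts z is roughly 4.8, well past the 1.96 two-sided cutoff, so the variants would differ significantly.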
Adaptive Personalization
Micro: Evaluating the Hybrid User Model
Micro: Speed to Effectiveness
[Chart: Benefit (0–100) vs. Number of Sessions (1–10)]
Initially, AIS is only as effective as a static system at finding relevant content. After only one session, the benefits of AdaptiveInfo's intelligent wireless-specific personalization are clear; after three sessions, even more so; and after 10 sessions the full benefits of Adaptive Personalization are realized.
Macro: Probability a Story is Read
There is a 40% probability that a user will read one of the top 4 stories selected by an editor, but a 64% chance they'll read one of the top 4 personalized stories: the AIS user is 60% more likely to select a story than a non-AIS user.
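The "60% more likely" figure is the relative lift of the personalized rate over the editor-selected rate, which can be checked directly:

```python
# Relative lift of the personalized story-read probability over the
# editor-selected baseline (figures taken from the slide above).
p_editor, p_personalized = 0.40, 0.64
lift = p_personalized / p_editor - 1  # relative increase over baseline
print(f"{lift:.0%}")  # 60%
```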
Macro: Increased Page Views
[Chart: page views (0–150), Original Order vs. Adaptive Order]
After looking at 3 or more screens of headlines, users read 43% more of the personally selected news stories, clearly showing AIS's ability to dramatically increase the stickiness of a wireless web application.
Macro: Readership and Stickiness
[Chart: return rate, Static vs. Personalized]
20% more LA Times users who receive personalized news return to the wireless site 6 weeks after the first usage.
Cautions
– Optimizing a micro-level evaluation may have little impact on the macro level. It may even have a counter-intuitive effect:
  • If personalization causes a noticeable delay, it may decrease readership
– Don't lose sight of the goal.
  • The metrics are just approximations of the goal.
  • Optimizing the metric may not optimize the goal.
R&D within the NSF Organization
Directorate for Biological Sciences
Directorate for Computer and Information Sciences and Engineering
Directorate for Education and Human Resources
Directorate for Engineering
Directorate for Geosciences
Directorate for Mathematical and Physical Sciences
Directorate for Social, Behavioral and Economic Sciences
Office of the Director
CISE Directorate: 2004
• Computing & Communications Foundations
• Computer Networks & Systems
• Information and Intelligent Systems (IIS)
• Deployed Infrastructure
Information and Intelligent Systems Programs
• Information and Data Management
• Artificial Intelligence and Cognitive Science
• Human Language and Communication
• Robotics and Computer Vision
• Digital Society and Technologies
• Human Computer Interaction
• Universal Access
• Digital Libraries
• Science and Engineering Informatics
Types of Proposals/Awards
• IIS regular proposals: $250–600K, 3 years, deadline 12/12
• CAREER program: $400–500K, 5 years, deadline late July
• REU & RET supplements: $10–30K, 1 year, deadline 3/1
• Information Technology Research (ITR): probably February
NSF Merit Review Criteria
• Looking for important, innovative, achievable projects
  – Criterion 1: What is the intellectual merit and quality of the proposed activity?
  – Criterion 2: What are the broader impacts of the proposed activity?
• NSF will return a proposal without review if the single-page proposal summary does not address each criterion in a separate statement
• An evaluation plan at both the micro and macro levels is essential, using metrics that you propose (and that your peers believe are appropriate)