Evaluation of Mixed Initiative Systems
Michael J. Pazzani, University of California, Irvine
National Science Foundation
Overview
• Evaluation
  – Micro-level: Modules
  – Macro-level: Behavior of system users
  – Caution: Don't lose sight of the goal in evaluation
• National Science Foundation
  – CISE (re)organization
  – Funding for mixed initiative systems
  – Tip on writing better proposals: Evaluate
Evaluation
• Micro level
  – Does the module (machine learning, user modeling, information retrieval, visualization, etc.) work properly?
  – Has been responsible for measurable progress in most specialized domains of intelligent systems
  – Relatively easy to do using well-known metrics: error rate, precision, recall, time and space complexity, goodness of fit, ROC curves
  – Builds upon a long history in "hard" sciences and engineering
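A minimal sketch of the first three metrics named above, computed for a binary relevance classifier. The labels and predictions here are made-up illustration data, not results from any system in this talk:

```python
# Micro-level metrics for a binary classifier: error rate, precision, recall.
# y_true / y_pred are hypothetical illustration data.

def micro_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return {
        "error_rate": errors / len(y_true),
        "precision": tp / (tp + fp),  # of items flagged relevant, fraction truly relevant
        "recall": tp / (tp + fn),     # of truly relevant items, fraction flagged
    }

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(micro_metrics(y_true, y_pred))  # error_rate 0.25, precision 0.75, recall 0.75
```

Sweeping the classifier's decision threshold and plotting recall against false-positive rate at each setting is what produces the ROC curve also listed above.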
Evaluation
• Macro level
  – Does the complex system, involving a user and a machine, work as desired?
  – Builds upon history in human (and animal) experimentation, not always taught in (or respected by) engineering schools
  – Allows controlled experiments comparing two systems (or one system with two variations)
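One common shape for such a controlled comparison is a two-proportion z-test on a binary user outcome (e.g., "the user read a story this session"). This is a sketch under hypothetical counts, not data from the studies later in this talk:

```python
# Compare the success rates of two system variants (A vs. B) with a
# two-proportion z-test. The session counts below are hypothetical.
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    return (p_b - p_a) / se                                 # z statistic

# Variant A: 80 of 200 sessions succeed; variant B: 128 of 200.
z = two_proportion_z(80, 200, 128, 200)
print(f"z = {z:.2f}, significant at the 5% level: {abs(z) > 1.96}")
```

With these counts z is roughly 4.8, well past the 1.96 two-sided cutoff, so the variants would differ significantly.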
Adaptive Personalization
Micro: Evaluating the Hybrid User Model
Micro: Speed to Effectiveness
[Chart: Benefit (0–100) vs. Number of Sessions (1–10)]
Initially, AIS is only as effective as a static system at finding relevant content. After only one session, the benefits of AdaptiveInfo's intelligent wireless-specific personalization are clear; after three sessions, even more so; and after 10 sessions the full benefits of Adaptive Personalization are realized.
Macro: Probability a Story is Read
There is a 40% probability that a user will read one of the top 4 stories selected by an editor, but a 64% chance they'll read one of the top 4 personalized stories: the AIS user is 60% more likely to select a story than a non-AIS user.
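The "60% more likely" figure is the relative lift of the personalized rate over the editor-selected rate, which can be checked directly:

```python
# Relative lift of the personalized story-read probability over the
# editor-selected baseline (figures taken from the slide above).
p_editor, p_personalized = 0.40, 0.64
lift = p_personalized / p_editor - 1  # relative increase over baseline
print(f"{lift:.0%}")  # 60%
```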
Macro: Increased Page Views
[Chart: page views (0–150), Original Order vs. Adaptive Order]
After looking at 3 or more screens of headlines, users read 43% more of the personally selected news stories, clearly showing AIS's ability to dramatically increase the stickiness of a wireless web application.
Macro: Readership and Stickiness
[Chart: return rate, Static vs. Personalized]
20% more LA Times users who receive personalized news return to the wireless site 6 weeks after the first usage.
Cautions
– Optimizing a micro-level evaluation may have little impact on the macro level. It may even have a counter-intuitive effect:
  • If personalization causes a noticeable delay, it may decrease readership
– Don't lose sight of the goal.
  • The metrics are just approximations of the goal.
  • Optimizing the metric may not optimize the goal.
R&D within the NSF Organization
Directorate for Biological Sciences
Directorate for Computer and Information Sciences and Engineering
Directorate for Education and Human Resources
Directorate for Engineering
Directorate for Geosciences
Directorate for Mathematical and Physical Sciences
Directorate for Social, Behavioral and Economic Sciences
Office of the Director
CISE Directorate: 2004
• Computing & Communications Foundations
• Computer Networks & Systems
• Information and Intelligent Systems (IIS)
• Deployed Infrastructure
Information and Intelligent Systems Programs
• Information and Data Management
• Artificial Intelligence and Cognitive Science
• Human Language and Communication
• Robotics and Computer Vision
• Digital Society and Technologies
• Human Computer Interaction
• Universal Access
• Digital Libraries
• Science and Engineering Informatics
Types of Proposals/Awards
• IIS regular proposals: $250–600K, 3 years, deadline 12/12
• CAREER program: $400–500K, 5 years, deadline late July
• REU & RET supplements: $10–30K, 1 year, deadline 3/1
• Information Technology Research (ITR): probably February
NSF Merit Review Criteria
• Looking for important, innovative, achievable projects
  – Criterion 1: What is the intellectual merit and quality of the proposed activity?
  – Criterion 2: What are the broader impacts of the proposed activity?
• NSF will return a proposal without review if the single-page proposal summary does not address each criterion in a separate statement
• An evaluation plan at both the micro and macro levels is essential, using metrics that you propose (and that your peers believe are appropriate)