
Towards a Testbed for Evaluating Learning Systems in Gaming Simulators
21 March 2004, Symposium on Reasoning and Learning

Outline:
1. Objective
2. Specification
3. Intended functionality
4. Example of use
5. Status and Goals

David W. Aha (working with Matt Molineaux)
Intelligent Decision Aids Group
Naval Research Laboratory, Code 5515, Washington, DC
home.earthlink.net/~dwaha
[email protected]

TIELT


TRANSCRIPT

Page 1

Towards a Testbed for Evaluating Learning Systems in Gaming Simulators
21 March 2004, Symposium on Reasoning and Learning

Outline:
1. Objective
2. Specification
3. Intended functionality
4. Example of use
5. Status and Goals

David W. Aha (working with Matt Molineaux)
Intelligent Decision Aids Group
Naval Research Laboratory, Code 5515, Washington, DC
home.earthlink.net/~dwaha
[email protected]

TIELT

Page 2

Objective & Expected Benefits

Objective

Support the machine learning community by providing an API for a set of gaming engines, the ability to select a wide variety of learning and performance tasks, and an editor for specifying and conducting an evaluation methodology.


Benefits

1. Reduce costs (time, $) to create these integrations
   • Costs are often prohibitive, encouraging isolated studies
2. Encourage research on learning in cognitive systems:
   • Embedded (e.g., process-aware)
   • Rapid (i.e., few trials)
   • Knowledge-intensive
   • Enduring
3. Support analysis of alternative learning systems for a given task

Page 3

DARPA’s Cognitive Systems Thrust

IPTO: Information Processing Technology Office
• Director: Ron Brachman
• Assistant Director: Barbara Yoon

History: IPTO has many impressive contributions
• e.g., time-sharing, interactive computing, the Internet
• Long-term goal: human-computer symbiosis

Current goal: Develop "cognitive" systems that know what they're doing:
– can reason, using substantial amounts of appropriately represented knowledge
– can learn from their experience to perform better tomorrow than they did today
– can explain themselves and be told what to do
– can be aware of their own capabilities and reflect on their own behavior
– can respond robustly to surprise

Page 4

IPTO’s View of a Cognitive Agent

[Diagram: a Cognitive Agent situated in the External Environment, sensing via Perception (Sensors) and acting via Effectors (Action). Internally it layers Reactive, Deliberative, and Reflective Processes over STM and LTM (KB) holding Concepts and Sentences, with modules for Communication (language, gesture, image), Prediction/planning, Other reasoning, Affect, Attention, and Learning.]

Page 5

Learning Foci in Cognitive Systems (Langley & Laird, 2002)

Capability: Knowledge Container(s)
• Recognition & Categorization: Patterns, Pattern recognizer; Categories, Pattern categorizer
• Decision Making & Choice: Space of possible decisions; Decision selector; Conflict resolver; Decision application procedure
• Perception & Situation Assessment: Situation categories; Situation categorization; Information fuser
• Prediction & Monitoring: Environment model; Monitoring focus
• Problem Solving & Planning: Plans; Plan generator (e.g., search method); Plan adaptor; Action preconditions; Action effects; Resource allocator
• Reasoning & Belief Maintenance: Beliefs & belief relations; Inferencing knowledge and procedures; Explanation generation
• Execution & Action: Action executer; Action utility
• Interaction & Communication: NL interpretation; Dialogue coordination
• Remembering & Reflection: Recall procedure

Many opportunities for learning

Page 6

Typical Current Practice

Comparatively few machine learning (ML) efforts on cognitive systems:
• Isolated studies of learning algorithms
• Single-step, non-interactive tasks
• Knowledge-poor learning contexts

Few of today's cognitive systems support realistic learning capabilities.

[Diagram: today's ML systems are typically coupled to a Database rather than to a Cognitive System.]

Page 7

Wanted: A New Interface (thanks to W. Cohen)

[Diagram: at present, each ML System connects directly to Databases (e.g., the UCI Repository). Wanted: a shared Interface between ML systems and databases, and, by analogy, between ML systems and Cognitive Systems (e.g., TIELT?).]

Curmudgeon's viewpoint:
• This might encourage research on more challenging problems
• But don't count on it

Page 8

Your Potential Uses of TIELT (Hastily Considered…and Reaching)

1. Randy Jones: Smart way to update rule conditions
   • Use: Updating game model's tasks
2. Doug Pearson: Changing conditions on operators
   • Use: Controlling game agents
3. Prasad Tadepalli: Learning hierarchies in the game model
   • Use: Active development of a game model's task hierarchy
4. Jim Blythe: Knowledge acquisition
   • Use: Acquiring game model constraints
5. Gheorghe Tecuci: Mixed-initiative learning for knowledge acquisition
   • Use: Active learning of task models, etc.
6. Karen Myers: Incorporating guidance from humans for agent control
   • Use: Learning agent controls (assuming players can provide direct feedback)
7. Barney Pell: Learning to play any of a category of games given their rules
   • Use: Hmm…agent control, if collaborating with a game model-updating system
8. Afzal Upal: Updating plan quality
   • Use: Induce task-specific control rules
9. Susan Epstein: Learning to solve (large) CSPs
   • Use: Reasoning with game model's constraints
10. Frank Ritter: Recognition tasks (e.g., for strategies?)
   • Use: Learning opponent strategies
11. Dan Roth: Using multiple classifiers to solve problems
   • Use: Set of (coordinated) learning systems for problem solving
12. Ken Forbus: Analogical reasoning and companion cognitive systems
   • Use: Qualitative representation for game model, predicting human/agent intentions
13. Daniel Borrajo: Learning control knowledge for planning
   • Use: Incremental learning for agent control tasks
14. Niels Taatgen: Learning for real-time tasks
   • Use: Agent control, several RTS applications

Page 9

Outline (cont)

1. Objective
2. Specification of a testbed
   • Select a category of cognitive systems
   • Category-specific challenges
3. Intended functionality
4. Example of use
5. Status of project
6. Goals for future work

Page 10

Interface Comparison

Each characteristic is listed as (a) for an ML system integrated with a database (e.g., a supervised learning system) vs. (b) for an ML system integrated with a cognitive system (e.g., one involving planning):

1. Performance API: (a) input in a common data format, classification out; (b) state in, decision out
2. Learning API: (a) input matches the performance input format; (b) effects and state in
3. Integration: (a) data input module; (b) message passing
4. Performance task: (a) classification; (b) achieve a goal(s)
5. Learning task: (a) set weights, create a tree, etc.; (b) create a plan
6. Domain knowledge (and reasoning over it): (a) little or none; (b) significant (temporal, qualitative, …)
7. Evaluation methodology: (a) accuracy, ROC curves, etc.; (b) plan execution measures
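The two integration styles compared above can share one interface with two entry points. The sketch below is hypothetical (neither `LearningSystem` nor `MajorityClassifier` is part of TIELT): it shows a performance API (`decide`) and a learning API (`learn`) implemented by a toy database-style supervised learner; a planner-style system would implement the same two methods over states and plans.

```python
from abc import ABC, abstractmethod

class LearningSystem(ABC):
    """Hypothetical split of the two APIs compared above."""

    @abstractmethod
    def decide(self, state):
        """Performance API: map an input (data instance or game state) to an output."""

    @abstractmethod
    def learn(self, state, feedback):
        """Learning API: update internal knowledge from feedback."""

class MajorityClassifier(LearningSystem):
    """Toy database-style learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def decide(self, state):
        # Classification: ignore the instance and return the majority label.
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, state, feedback):
        # feedback is the true label for this training instance.
        self.counts[feedback] = self.counts.get(feedback, 0) + 1

clf = MajorityClassifier()
for label in ["orc", "orc", "knight"]:
    clf.learn(None, label)
print(clf.decide(None))  # prints "orc"
```

The point of the shared base class is that an evaluation harness can drive either kind of learner without knowing which it has.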

Page 11

What type of Cognitive System?

Desiderata:
1. Available implementations
   • Inexpensive to acquire and run
2. Pushes ML research boundaries
   • Challenging embedded learning tasks
3. Significant interest/excitement
   • Military, industry, academia, funding

Candidate: Interactive Gaming Simulators

Page 12

Gaming Genres (Laird & van Lent, 2001)

Genre (Example): Description. Sub-genres. AI roles.
• Action (Quake, Unreal): control a character. Sub-genres: 1st vs. 3rd person, solo vs. team play. AI roles: control enemies.
• Role-Playing (Diablo): be a character. Sub-genres: solo vs. (massively) multi-player. AI roles: control enemies, partners, and supporting characters.
• Adventure (King's Quest, Blade Runner): player solves puzzles, interacting w/ others. Sub-genres: linear vs. dynamic scripting. AI roles: control supporting characters.
• Strategy (Age of Empires, Warcraft, Civilization): god's eye view, controls many units (e.g., tactical warfare). AI roles: control all units and strategic enemies.
• God (SimCity, The Sims): control a simulated world & its units. AI roles: control unit goals and goal-achievement strategies.
• Individual Sports (many, e.g., driving games): individual competition. Sub-genres: 1st vs. 3rd person. AI roles: control enemy.
• Team Sports (Madden NFL Football): act as coach and a key player. AI roles: control units and strategic enemy (i.e., other coach), commentator.

Unfortunately,…reaction time and aiming skill are the most important factors in success in a first-person shooter game. Deep reasoning about tactics and strategy don’t end up playing a big role as might be expected. (van Lent et al., 2004)

Page 13

Real-Time Strategy (RTS) Games(Buro & Furtak, 2003)

Fundamental AI research problems

1. Adversarial real-time planning
   • Motivates need for abstractions of world state
2. Decision making under uncertainty
   • e.g., opponent intentions
3. Opponent modeling, learning
   • "One of the biggest shortcomings of current RTS game AI systems is their inability to learn quickly… Current ML approaches…are inadequate in this area."
4. Spatial and temporal reasoning
5. Resource management
6. Collaboration
7. Pathfinding

Page 14

Military: Learning in Simulators for Computer Generated Forces (CGF)

Purpose: Training (present) & planning (future)

• Simulators: JWARS, OneSAF, Full Spectrum Command, etc.
• Target: Control strategic opponent or own units

Evidence of commitment: Some Claims

• "Learning is an essential ability of intelligent systems" (NRC, 1998)
• "To realize the full benefit of a human behavior model within an intelligent simulator,…the model should incorporate learning" (Hunter et al., CCGBR'00)
• "Successful employment of human behavior models…requires that [they] possess the ability to integrate learning" (Banks & Stytz, CCGBR'00)

Status:

• No CGF simulator has been deployed with learning (D. Reece, 2003)
• Problems: performance (costly training), overtraining, behavioral accuracy (e.g., learned behaviors may become unpredictable), constraint violations (learned behaviors do not follow doctrine), difficult to isolate the utility of learning (Petty, CGFBR'01)

Page 15

Industry: Learning in Video and Computer Games

Focus: Increase sales via enhanced gaming experience

• Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
• Target: Control avatars, unit behaviors

Status

• Few deployed systems have used learning (Kirby, 2004), e.g.:
   1. Black & White: on-line, explicit (player immediately reinforces behavior)
   2. C&C Renegade: on-line, implicit (agent updates set of legal paths)
   3. Re-Volt: off-line, implicit (GA tunes racecar behaviors prior to shipping)
• Problems: performance, constraints (preventing learning "something dumb"), trust in the learning system

Evidence of commitment

• Developers: "keenly interested in building AIs that might learn, both from the player & environment around them." (GDC'03 Roundtable Report)
• Middleware products that support learning (e.g., MASA, SHAI, LearningMachine)
• Long-term investments in learning (e.g., iKuni, Inc.)

"A computer that learns is worth 10 Microsofts." (B. Gates, 2004)

Page 16

Academia: Learning in Interactive Computer Games

Focus: Several research thrusts

• Game engines (e.g., GameBots, ORTS, RoboCup Soccer Server); use (other) open-source engines (e.g., FreeCiv, Stratagus)
• Representation (e.g., Forbus et al., 2001; Houk, 2004; Munoz-Avila & Fisher, 2004)
• Knowledge acquisition (e.g., Hieb et al., 1995)
• Supervised learning of lower-level behaviors (e.g., Geisler, 2002)
• Learning plans (e.g., Fasciano, 1996)
• Learning opponent unit models (e.g., Laird, 2001; Hill et al., 2002)
• Learning to provide advice (e.g., Sweetser & Dennis, 2003)
• Learning hierarchical knowledge (e.g., van Lent & Laird, 1998)
• Learning rule preferences (e.g., Ponsen, 2004)

Status: Publication options (specific to AI & gaming)
• AAAI symposia and workshops (several), e.g., AAAI'04 Workshop on Challenges in Game AI
• International Conference on Computers and Games
• Journals: J. of Game Development, Int. Computer Games J.

Page 17

Academia: Learning in Interactive Computer Games (cont.)

Example integrations:

Name + Reference: Game Engine; Learning Approach; Tasks
• (Goodman, AAAI'93): Bilestoad; projective visualization; fighting maneuvers
• CAPTAIN (Hieb et al., CCGFBR'95): ModSAF; multistrategy (e.g., version spaces); platoon placement
• MAYOR (Fasciano, 1996; U. of Chicago Dept. of CS TR 96-05): SimCity; case-based reasoning; city development
• (Fogel et al., CCGFBR'96): ModSAF; genetic programming; tank movements
• KnoMic (van Lent & Laird, ICML'98): ModSAF; rule condition learning in SOAR; aircraft maneuvers
• (Geisler, 2002): Soldier of Fortune; multiple (e.g., boosting, backprop); FPS action selection
• (Sweetser & Dennis, 2003): Tubby Terror; regression; advice generation
• (Chia & Williams, BRIMS'03): TankSoar; naïve Bayes classification; tank behaviors
• (Ponsen, 2004): Wargus/Stratagus; genetic algorithms (dynamic scripting); strategic rule selection

Page 18

Summary: Some Additional Challenges with Embedding Learning in Gaming Simulators

1. Low CPU requirements (e.g., in real-time games)
2. Constraining learned knowledge
   • Must not violate expectations
3. Learning & reasoning (e.g., planning)
4. Isolating learning contributions (for evaluation)

Page 19

Specification for Integrating Learning Systems with Gaming Simulators

1. Simplifies integration!
   • Interests ML researchers
   • Interests game developers
2. Learning focus concerns at least three types of models:
   • Task (e.g., learn how to perform, or advise on, a task)
   • Player (e.g., learn a human player's strategies)
   • Game (e.g., learn its objects, their relations & functions)
   • State interpretation/abstraction
3. Learning methods: a wide variety
   • They should be able to output their learned behaviors for inspection (e.g., by game developers)
4. Game engines: those with challenging learning tasks
   • i.e., large hypothesis spaces, knowledge-intensive
5. Supports reuse via modularity (to be at all feasible)
   • Abstracts interface definitions from game & task models
6. Free (unlike some similar commercial tools)
   • Preferably, open source

Page 20

Outline (cont)

1. Objective
2. Specification of a testbed
3. Intended functionality
   • Interaction
   • Types of use
   • Open issues
4. Example of use
5. Status and Goals

Page 21

TIELT: Testbed for Integrating and Evaluating Learning Techniques

[Architecture diagram: the TIELT User works through Editors to create five Knowledge Bases (Game Model Description, Game Interface Description, Learning Interface Description, Task Descriptions, Evaluation Methodology Description). TIELT sits between a Game Engine (e.g., Stratagus, FreeCiv) with its Game Player(s) and one or more Reasoning & Learning Systems (Learning System #1 … #n), accumulates Learned Knowledge, and reports through Prediction, Advice, and Evaluation Displays.]

Page 22

TIELT Knowledge Bases

• Game Model Description: defines the interpretation of the game (e.g., objects, operators, behaviors model, tasks, initial state)
• Game Interface Description: defines communication processes with the game engine
• Learning Interface Description: defines communication processes with the learning system
• Task Descriptions: define the selected learning and performance tasks (selected from the game model description)
• Evaluation Methodology Description: defines the empirical evaluation to conduct
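To make the division of labor concrete, here is a minimal sketch of how these five knowledge bases might be captured as plain data. Every field name and value in it is an assumption for illustration (FreeCiv aside), not TIELT's actual schema.

```python
# Hypothetical sketch of the five TIELT knowledge bases as plain data.
# All field names and values below are illustrative, not TIELT's schema.
experiment = {
    "game_model": {                       # interpretation of the game
        "objects": ["city", "settler", "tile"],
        "operators": ["build_city", "move"],
        "initial_state": {"turn": 0},
        "tasks": ["city placement"],
    },
    "game_interface": {                   # communication with the game engine
        "engine": "FreeCiv",
        "messages": ["state_update", "action"],
    },
    "learning_interface": {               # communication with the learning system
        "inputs": ["processed_state"],
        "outputs": ["decision"],
    },
    "task": {                             # learning/performance tasks, drawn
        "performance": "city placement",  # from the game model description
        "learning": "learn a placement policy",
    },
    "evaluation": {                       # empirical evaluation to conduct
        "trials": 10,
        "metric": "average score",
    },
}

# Sanity check: the selected task must come from the game model description.
assert experiment["task"]["performance"] in experiment["game_model"]["tasks"]
print(sorted(experiment))
# prints ['evaluation', 'game_interface', 'game_model', 'learning_interface', 'task']
```

The sanity check mirrors the slide's note that task descriptions are selected from the game model description, not defined independently.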

Page 23

Data Sources and Targeted Functionalities

Example learning functionalities supported:
1. Learning from observations (e.g., behavioral cloning)
2. Active learning
3. Learning from advice (requires inputs from user)
4. Learning to advise
5. …

Data sources:
1. Game (world) model (possibly incomplete, incorrect)
2. Simulator
   • Passive state observations (e.g., behavioral cloning)
   • Active testing (e.g., apply an action in a state)
3. Humans
   • Advice

Page 24

Example TIELT Usage: Controlling a Game Character

[Architecture diagram as on page 21: the Game Engine sends a Raw State to TIELT, which forwards a Processed State to the learning system; the learning system returns a Decision, which TIELT translates into an Action for the engine.]
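The character-control loop on this slide can be sketched end-to-end. Everything below is illustrative: `abstract_state`, `LearnerStub`, and the toy grid `engine_step` stand in for the game model, the learning system, and the game engine, respectively.

```python
import random

def abstract_state(raw_state):
    # Game-model step: reduce the raw engine state to features the learner sees.
    x, y = raw_state["pos"]
    return "near_goal" if abs(x - 3) + abs(y - 3) <= 1 else "far"

class LearnerStub:
    """Toy learning system: remembers which action last earned a reward
    in each processed state, otherwise explores at random."""

    def __init__(self, rng):
        self.rng = rng
        self.policy = {}

    def decide(self, processed):
        return self.policy.get(processed) or self.rng.choice("NSEW")

    def learn(self, processed, action, reward):
        if reward > 0:
            self.policy[processed] = action
        elif self.policy.get(processed) == action:
            del self.policy[processed]  # drop an action that stopped helping

def engine_step(raw_state, action):
    # Toy game engine: move on a grid; reward progress toward the goal (3, 3).
    dx, dy = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}[action]
    x, y = raw_state["pos"]
    new_state = {"pos": (x + dx, y + dy)}
    closer = (abs(x + dx - 3) + abs(y + dy - 3)) < (abs(x - 3) + abs(y - 3))
    return new_state, (1 if closer else -1)

learner = LearnerStub(random.Random(0))
raw = {"pos": (0, 0)}
for _ in range(20):
    processed = abstract_state(raw)         # raw state -> processed state
    action = learner.decide(processed)      # decision from the learning system
    raw, reward = engine_step(raw, action)  # TIELT relays the action
    learner.learn(processed, action, reward)
print(learner.policy)
```

The loop body is the slide's four arrows in order: raw state, processed state, decision, action, with the reward closing the learning feedback path.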

Page 25

Example TIELT Usage: Advising a Game Player

[Architecture diagram as on page 21: the Game Engine sends a Raw State; TIELT forwards a Processed State to the learning system, which returns a Decision and Reason; TIELT presents these to the Game Player(s) as Advice and an Explanation via the Advice Display.]

Page 26

Example TIELT Usage: Predicting a Game Player's Actions

[Architecture diagram as on page 21: from the Processed State, the learning system returns a Prediction and Reason; TIELT shows a Prediction and Explanation via the Prediction Display.]

Page 27

Example TIELT Usage: Updating a Game Model

[Architecture diagram as on page 21: from the Processed State, the learning system issues Edits that revise the Game Model Description.]

Page 28

Example TIELT Usage: Building a Task/Player Model

[Architecture diagram as on page 21: from the Processed State, the learning system induces a Model that TIELT stores as Learned Knowledge.]

Page 29

Intended Use Cases

Game Developer (drawing on a repository of learning systems and learning interface descriptions):
1. Define/store game engine interface
2. Define/store game model
3. Select learning system & interface
4. Select learning and performance tasks
5. Define (or select) evaluation methodology
6. Run experiments
7. Analyze displayed results

ML Researcher (drawing on a repository of game engines and game interface descriptions):
1. Define/store learning system interface
2. Select game engine & interface
3. Select game model
4. Select learning and performance tasks
5. Define (or select) evaluation methodology
6. Run experiments
7. Analyze displayed results
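Steps 5 through 7 in both use cases (define an evaluation methodology, run experiments, analyze results) could be scripted as below. This is a sketch under assumptions: `run_trial` is a hypothetical stand-in for a full TIELT-mediated game episode, and the methodology is a simple repeated-trials comparison with and without the learning system, which is one way to isolate the contribution of learning.

```python
import random
import statistics

def run_trial(use_learning, rng):
    # Stand-in for one TIELT-mediated episode; returns a game score.
    score = rng.gauss(50, 5)
    return score + (10 if use_learning else 0)  # pretend learning adds skill

def run_experiment(trials=30, seed=42):
    # Evaluation methodology: equal numbers of trials with and without the
    # learning system, so its contribution can be compared directly.
    rng = random.Random(seed)
    with_learning = [run_trial(True, rng) for _ in range(trials)]
    without = [run_trial(False, rng) for _ in range(trials)]
    return statistics.mean(with_learning), statistics.mean(without)

mean_learn, mean_base = run_experiment()
print(round(mean_learn - mean_base, 1))  # observed advantage of learning
```

Fixing the seed keeps runs repeatable, which matters when comparing alternative learning systems on the same task.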

Page 30

Some Open Questions

1. Game model:
   • What representation? STRIPS operators? Hierarchical task networks? Explicit constraints?
   • How to communicate it to the learning system?
   • Should it instead be maintained in the learning system?
2. What standards for:
   • Game engine message passing
   • Learning system message passing
   • Output format for learned knowledge
3. Support both on-line and off-line studies?
4. What representations for advice and explanations?
5. How to explicitly represent & apply constraints on learned knowledge?
6. How to evaluate TIELT's utility?


Outline (cont.)

1. Objective
2. Specification of a testbed
3. Intended functionality
   • Interaction
   • Types of use
   • Open issues
4. Example of use
   • Demonstration of initial GUI
   • Simple “city placement” task
5. Status and Goals


Outline (cont.)

1. Objective
2. Specification of a testbed
3. Intended functionality
   • Interaction
   • Types of use
   • Open issues
4. Example of use
5. Status and Goals


Status and Goals

TIELT Specification

TIELT (Initial GUI)

Matt Molineaux


Status and Goals: Recent Influences

1. Full Spectrum Command (van Lent et al., 2004)
   • Multiple AI systems, one game engine

2. ORTS (Open Real-Time Strategy) project: open source RTS game engine
   • Free
   • Flexible game specification (via scripts)
   • Hack-free server-side simulation
   • Open message protocol: players have total control
   • Prefer ORTS to Stratagus?

3. Collaboration with Lehigh University (Asst. Prof. H. Muñoz-Avila)
   • Extended Hierarchical Task Network (HTN) process representation for the Game Model’s tasks?
   • Fall 2004 PhD candidate: first to integrate ML with Stratagus
   • Fall 2004 student: will develop Game Models for us


Conclusion

Objective

Support the machine learning community by providing an API for a set of gaming engines, the ability to select a wide variety of learning and performance tasks, and an editor for specifying and conducting an evaluation methodology.

Status

• Started 12/03, effectively
• Initial GUI implementation
• Many open research questions

Goals

• 9/04: First complete implementation
• Incrementally integrate with game engines, learning systems
• Document & publicize for use to gain ML interest
• Subsequently, seek military/industry interest

And the game-developer community? Other research communities?


Backup Slides


TIELT: Initial Vision (DARPA, 11/13/03)

Goal: Wargaming testbed for the machine learning community
– Explore learning techniques in the context of today’s latest simulations & video games
– Facilitate exploration of strategies and “what if” scenarios
– Provide common platform for evaluating different learning techniques

[Diagram: New Learning Techniques and a Development Environment connect via an API to a Video Wargaming Testbed.]

Technical Approach: Enable insertion of learning/KA techniques into state-of-the-art video combat & strategy games
– Create API for integrating learning into selected video games
  • e.g., comm. module, socket interface, client-server comms protocol & language
– Create API that enables learning in computer generated forces (CGF) tools


API for Isolated Studies

Functionality: Supervised learning using a passive dataset

Performance (Classifier):
• Task: Classification
• Interface:
  – Input: None
  – Output: Common access format (across all tasks & datasets)

Learning:
• Task: Varies (e.g., tree, weight settings)
• Interface:
  – Input: Data instance or set
    • Common format (across all tasks & systems)
  – Output: Classification decision
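The interface pair above could be sketched as follows. This is a minimal illustration, not TIELT’s actual API: the class names, the `Instance` format, and the majority-class stand-in learner are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical common instance format (shared across all tasks & datasets).
@dataclass
class Instance:
    features: List[float]
    label: Optional[str] = None

class IsolatedLearner:
    """Isolated-study interface: train on a passive dataset, then classify."""
    def train(self, data: List[Instance]) -> None:
        # A trivial majority-class rule stands in for a real learning algorithm.
        labels = [ex.label for ex in data if ex.label is not None]
        self.majority = max(set(labels), key=labels.count)

    def classify(self, instance: Instance) -> str:
        # Output: a classification decision in the common format.
        return self.majority

learner = IsolatedLearner()
learner.train([Instance([0.1], "win"), Instance([0.2], "win"), Instance([0.9], "loss")])
decision = learner.classify(Instance([0.5]))  # → "win"
```

Any real learner (tree induction, weight tuning) would plug in behind the same `train`/`classify` pair, which is the point of the common access format.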


API for Cognitive Learning

Functionality: Learning by doing/being told/observation/etc.

Performance (Cognitive System):
• Task: Varied (e.g., planning, design, diagnosis, …, classification)
• Interface:
  – Input: Action
  – Output: Current state

Learning:
• Task: Varies (e.g., rule application conditions)
• Interface:
  – Input: Processed current state
  – Output: Decision
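The action-in/state-out loop implied by these interfaces can be sketched as below. Everything here is illustrative: the counter environment and threshold rule are stand-ins, not TIELT components.

```python
# Hypothetical cognitive-learning loop: the performance system maps
# actions to states; the learner maps processed states to decisions.
class PerformanceSystem:
    """Tiny stand-in environment: the state is a counter moved by actions."""
    def __init__(self) -> None:
        self.state = 0

    def apply(self, action: int) -> int:
        self.state += action      # Input: action; Output: current state
        return self.state

class Learner:
    def decide(self, processed_state: int) -> int:
        # Input: processed current state; Output: decision.
        return 1 if processed_state < 3 else 0

env, agent = PerformanceSystem(), Learner()
trace = []
for _ in range(5):
    decision = agent.decide(env.state)
    trace.append(env.apply(decision))
# trace is [1, 2, 3, 3, 3]: the agent advances until its rule says stop.
```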


Commercial Game Roles for AI (Laird & van Lent, 2001)

Tactical roles:

• Enemies
  – Focus: Challenge human player
  – State of the art: Cheats; scripts using FSMs; path planning; expert systems
  – AI needs: Situation assessment, user modeling, spatial & temporal reasoning, planning, plan recognition, learning
• Partners
  – Focus: Cooperation & coordination w/ human
  – State of the art: Scripted responses to specific commands
  – AI needs: Speech recognition, NLP, gesture recognition, user modeling, adaptation
• Support Characters
  – Focus: Guide/interact with human
  – State of the art: Canned responses
  – AI needs: NL understanding & generation, path planning, coordination

Strategic roles:

• Opponents
  – Focus: Develop high-level strategy, allocate resources, & issue unit-level commands
  – State of the art: Cheating, etc.
  – AI needs: Integrated planning, commonsense reasoning, spatial reasoning, plan recognition, resource allocation
• Units
  – Focus: Carry out high-level commands autonomously
  – State of the art: FSMs and path planning
  – AI needs: Commonsense reasoning & coordination
• Commentators
  – Focus: Observe and comment on game play
  – AI needs: NL generation, plan recognition


TIELT Architecture

[Diagram: TIELT mediates between Game Engines (e.g., Stratagus, FreeCiv) and one or more Learning Systems (System #1 … System #n). Editors (Game Interface, Learning Interface, Game Model, Task, Evaluation Methodology) let the user author the Game Interface Description, Learning Interface Description, Game Model Description, and Task Descriptions (learning and performance tasks). At run time, the Model Updater maintains the Current State from engine-state percepts; the Learning Translator (Mapper) passes a translated model subset to the Learning Systems; the Action Translator (Mapper) turns learning outputs into game actions; the Controller coordinates, the Evaluator drives the Evaluation Display per the Evaluation Settings, a Database Engine manages Stored States in a Database, and an Advice Display presents advice to the user.]


1. Sensing the Game State

[Diagram: Game Engine → Model Updater → Game Interface Description / Game Model Description → Current State → Controller]

1. In the Game Engine, the game begins and the colony pod is created and placed.
2. The Game Engine sends a “See” sensor message stating where the pod is.
3. The Model Updater receives the sensor message and finds the corresponding message template in the Game Interface Description.
4. The message template provides updates to the Game Model Description, which tell the Current State that there is a pod at the location See describes.
5. The Model Updater notifies the Controller that the See event has occurred.
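The template-lookup-and-update portion of this flow can be sketched minimally. The message shape, the dictionary-based interface description, and the update function are illustrative assumptions, not TIELT’s actual formats.

```python
# Hypothetical sketch of sensing: a "See" message arrives, the matching
# template is found in the interface description, the model is updated,
# and the Controller is notified.
game_interface_description = {
    # Template: message name -> how it updates the current state.
    "See": lambda params, state: state.update({"pod_location": params["location"]}),
}

current_state = {}
controller_events = []

def model_updater(message):
    name, params = message["name"], message["params"]
    template = game_interface_description[name]   # find the message template
    template(params, current_state)               # update the current state
    controller_events.append(name)                # notify the Controller

# The Game Engine reports where the colony pod was placed.
model_updater({"name": "See", "params": {"location": (4, 7)}})
```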


2. Fetching Decisions from the Learning System

[Diagram: Controller → Learning Translator → Learning System(s) (System #1 … System #n) → Action Translator]

1. The Controller notifies the Learning Translator that it has received a See message.
2. The Learning Translator finds a city-location task that is triggered by the See message. It queries the Controller for the learning mode, then creates a TestInput message to send to the learning system, with information on the pod’s location and the map from the Current State.
3. The Learning Translator transmits the TestInput message to the appropriate Learning System(s).
4. The Learning System(s) transmit output to the Action Translator.
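The TestInput round trip might look like the sketch below. The task table, message fields, and “one tile east” stand-in learner are hypothetical, chosen only to make the flow concrete.

```python
# Hypothetical sketch of decision fetching: a See event triggers the
# city-location task, a TestInput is built from the Current State, and
# the learning system returns an output for the Action Translator.
current_state = {"pod_location": (4, 7), "map": "tiny_map"}
tasks = {"See": "city-location"}   # which task each message type triggers

def learning_translator(event):
    task = tasks[event]                       # find the triggered task
    return {                                  # build the TestInput message
        "task": task,
        "pod_location": current_state["pod_location"],
        "map": current_state["map"],
    }

def learning_system(test_input):
    # Stand-in learner: propose founding the city one tile east of the pod.
    x, y = test_input["pod_location"]
    return {"name": "TestOutput", "destination": (x + 1, y)}

test_input = learning_translator("See")
learning_output = learning_system(test_input)  # goes to the Action Translator
```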


3. Acting in the Game World

[Diagram: Learning System → Action Translator → Game Engine / Advice Display]

1. The Action Translator receives a TestOutput message from a Learning System.
2. The Action Translator finds the TestOutput message template, determines that it is associated with the city-location task, and builds a MovePod operator (defined by the Current State) with the parameters of TestOutput.
3. The Action Translator determines that the Move action from the Game Interface Description is triggered by the MovePod operator, binds Move using information from MovePod, then sends Move to the Game Engine.
4. The Game Engine receives Move and updates the game to move the pod toward its destination, or
5. the Advice Display receives Move and displays advice to a human player on what to do next.
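The TestOutput-to-action mapping can be sketched as below. The operator and action dictionaries are illustrative stand-ins for TIELT’s message templates, and the engine teleports the pod rather than pathing toward the destination.

```python
# Hypothetical sketch of acting: a TestOutput message becomes a MovePod
# operator, which triggers a Move action sent to the game engine.
def action_translator(test_output):
    # Bind the MovePod operator with TestOutput's parameters.
    move_pod = {"operator": "MovePod", "destination": test_output["destination"]}
    # MovePod triggers the engine-level Move action from the interface description.
    return {"action": "Move", "to": move_pod["destination"]}

class GameEngine:
    def __init__(self, pod_at):
        self.pod_at = pod_at

    def receive(self, action):
        # Update the game: move the pod to its destination (teleport, for brevity).
        if action["action"] == "Move":
            self.pod_at = action["to"]

engine = GameEngine(pod_at=(4, 7))
engine.receive(action_translator({"name": "TestOutput", "destination": (5, 7)}))
```

In advice mode the same Move message would instead be routed to the Advice Display rather than executed.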


4. Displaying an Operation to the User

[Diagram: Controller → Evaluator → Task Descriptions / Current State → Evaluation Display]

1. The Evaluator is triggered by the Controller, according to a trigger from the Evaluation Settings.
2. The Evaluator obtains performance metrics from each Task and calculates them on the Current State.
3. The Evaluator sends the new metric values to the Evaluation Display, which updates with the new information.
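Steps 2–3 amount to mapping each task’s metric over the current state and pushing the results to a display. The metric itself (cities founded per turn) and the state fields are invented for illustration.

```python
# Hypothetical sketch of evaluation: each task exposes a metric function,
# the Evaluator computes it on the Current State, and the new values go
# to the Evaluation Display.
current_state = {"cities": 3, "turns": 60}

task_metrics = {
    "city-location": lambda s: s["cities"] / s["turns"],  # cities per turn
}

evaluation_display = {}

def evaluator():
    for task, metric in task_metrics.items():             # metric from each task
        evaluation_display[task] = metric(current_state)  # update the display

evaluator()
```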


5. Retrieving States from a Database

[Diagram: Controller → Database Engine → Database / Stored State → Learning Translator]

1. When in Record mode, the Controller triggers the Database Engine when the state updates.
2. The Database Engine records the Current State in a Database for later use.
3. Later, in Playback mode, the Controller triggers the Database Engine after the Learning System indicates readiness.
4. The Database Engine then queries a Database and retrieves a Stored State.
5. Finally, the Controller notifies the Learning System that an update has arrived and to query the Stored State for message info.
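The record/playback cycle reduces to snapshotting states on update and replaying them on demand. The in-memory list standing in for the Database, and the state fields, are illustrative assumptions.

```python
# Hypothetical sketch of record/playback: in Record mode each state update
# is snapshotted; in Playback mode stored states are retrieved for the
# learning system.
database = []

def on_state_update(mode, current_state):
    if mode == "record":
        database.append(dict(current_state))   # record a Current State snapshot

def playback(step):
    return database[step]                      # retrieve a Stored State

# Record two state updates, then replay the first for the learning system.
on_state_update("record", {"turn": 1, "pod_location": (4, 7)})
on_state_update("record", {"turn": 2, "pod_location": (5, 7)})
stored_state = playback(0)
```

Replaying recorded episodes this way is what lets off-line studies reuse a single game run across many learning systems.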