
Curriculum Vitae

Name: Dean Earl Wright III

Permanent Address: 422 Shannon Court, Frederick, MD 21701

Degree and date to be conferred: Doctor of Philosophy, May 2009.

Date of Birth: 27 November 1954.

Place of Birth: La Rochelle, France.

Secondary Education: McCluer High School, Florissant, Missouri.

Previous Degrees:

Hood College, Master of Business Administration, 2005

Hood College, Master of Science (Computer Science), 2001

Hood College, Bachelor of Science (Computer Science), 1998

Frederick Community College, Associate in Arts (Business Administration), 1993

Professional publications:

Michael L. Anderson, Matt Schmill, Tim Oates, Don Perlis, Darsana Josyula, Dean Wright and Shomir Wilson. Toward Domain-Neutral Human-Level Metacognition. In Proceedings of the 8th International Symposium on Logical Formalizations of Commonsense Reasoning, pages 1–6, 2007.

M. Schmill, D. Josyula, M. Anderson, S. Wilson, T. Oates, D. Perlis, D. Wright and S. Fults. Ontologies for Reasoning about Failures in AI Systems. In Proceedings of the Workshop on Metareasoning in Agent-Based Systems, May 2007.

Michael L. Anderson, Scott Fults, Darsana P. Josyula, Tim Oates, Don Perlis, Matthew D. Schmill, Shomir Wilson, and Dean Wright. A Self-Help Guide For Autonomous Systems. To appear in AI Magazine. 2007.

Professional positions held:

CRW/Logicon/Northrop Grumman, January 1985–July 2007

Scientific Time Sharing Corporation (STSC), January 1977–January 1985


ABSTRACT

Title of Thesis: Reengineering the Metacognitive Loop

Dean Earl Wright III, Ph.D. Computer Science, 200x

Dissertation directed by: Tim Oates, Professor, Department of Computer Science and Electrical Engineering

The field of Artificial Intelligence has seen steady advances in cognitive systems. But

many of these systems perform poorly when faced with situations outside of their training

or in a dynamic environment. This brittleness is a major problem in the field today.

Adding metacognition to such systems can improve their operation in the face of perturbations. The Metacognitive Loop (MCL) (Anderson et al. 2006) works with a host

system, monitoring its sensors and expectations. When a failure is indicated, MCL advises

the host system on corrective actions.

Past implementations of MCL have been hand-crafted and tightly integrated into their

host systems. MCL is being reengineered to provide a C language API and to do Bayesian

inference over a set of indication, failure, and response ontologies. These changes will

allow MCL to be used with a wide variety of systems.

To prevent brittleness within MCL itself, several items need to be addressed. MCL

must be able to resolve host system failures when there is more than one indication of the

failure or when a second indication occurs while MCL is attempting to help the host system

recover from the failure. MCL also needs the ability to monitor itself and improve

its own operation.

A twenty-month plan is proposed to enhance MCL as described and to measure (1) the effectiveness of MCL in improving the operation of the host system; (2) MCL's operational efficiency in terms of additional computational resources required; and (3) the effort needed

to incorporate MCL into the host system.


Reengineering the Metacognitive Loop

by

Dean Earl Wright III

Thesis submitted to the Faculty of the Graduate School of the University of Maryland in partial fulfillment

of the requirements for the degree of Doctor of Philosophy

2009


© Copyright Dean Earl Wright III 2007


TABLE OF CONTENTS

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

Chapter 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 2 BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.1 Metacognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Metacognitive Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2.1 Note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.2 Assess . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.3 Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Chapter 3 TECHNICAL APPROACH . . . . . . . . . . . . . . . . . . . 12

3.1 MCL Ontologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.1.1 Indications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.1.2 Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.1.3 Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1.4 Inter-ontology linkages . . . . . . . . . . . . . . . . . . . . . . . . 22


3.2 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.3 Support for Reasoning Over Time . . . . . . . . . . . . . . . . . . . . . . 24

3.4 Recursive Invocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.5 C Language Application Program Interface . . . . . . . . . . . . . . . . . 29

3.6 Extending the API to Multiple Languages . . . . . . . . . . . . . . . . . . 29

Chapter 4 METHODOLOGY . . . . . . . . . . . . . . . . . . . . . . . . 30

4.1 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.1.1 Effectiveness in Improving Host System Operation . . . . . . . . . 31

4.1.2 Additional Computational Resources Required . . . . . . . . . . . 31

4.1.3 Implementation Effort . . . . . . . . . . . . . . . . . . . . . . . . 31

4.1.4 Breadth of Deployment . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2 Evaluation Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.1 Chippy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.2 Windy Grid World . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.2.3 WinBolo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Chapter 5 PRELIMINARY RESULTS . . . . . . . . . . . . . . . . . . . 45

5.1 Grid World Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.2 BoloSoar Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5.3 Chippy Agent With Ontology-Based MCL . . . . . . . . . . . . . . . . . . 51

Chapter 6 RELATED WORK . . . . . . . . . . . . . . . . . . . . . . . . 57

6.1 Pre-ontology Metacognitive Loop . . . . . . . . . . . . . . . . . . . . . . 57

6.2 Early Ontology Metacognitive Loop . . . . . . . . . . . . . . . . . . . . . 58

6.3 Model-Based Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


6.4 Multi-agent Metacognition . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Chapter 7 FUTURE WORK . . . . . . . . . . . . . . . . . . . . . . . . . 62

7.1 Automatic Expectation Generation . . . . . . . . . . . . . . . . . . . . . . 62

7.2 Automatic Ontology Expansion/Linking . . . . . . . . . . . . . . . . . . . 63

7.3 Application to Multi-agent systems . . . . . . . . . . . . . . . . . . . . . . 64

7.4 Transferring Learning with MCL Networks . . . . . . . . . . . . . . . . . 64

7.5 Modeling dynamic environments . . . . . . . . . . . . . . . . . . . . . . . 64

Chapter 8 TIMETABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

8.1 Activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

8.2 Monthly Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68


LIST OF FIGURES

2.1 Metacognitive Monitoring and Control . . . . . . . . . . . . . . . . . . . . 7

2.2 Software agent interactions with the environment . . . . . . . . . . . . . . 7

2.3 Software agent with metacognition . . . . . . . . . . . . . . . . . . . . . . 8

2.4 Host systems with MCL support . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Ontologies as additional metaknowledge . . . . . . . . . . . . . . . . . . . 14

3.2 Ontological Linkages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.3 Divergence Nodes in the Indications Ontology . . . . . . . . . . . . . . . . 17

3.4 Failure Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.5 Response Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.6 Example Ontology Connections . . . . . . . . . . . . . . . . . . . . . . . 23

3.7 Conditional probability tables for portion of MCL response ontology . . . . 24

3.8 Reentrant MCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.9 Expectations arranged in single (a) and multiple groups (b) . . . . . . . . . 26

3.10 MCL providing metacognitive services to MCL . . . . . . . . . . . . . . . 28

4.1 A 8x8 “Chippy” grid world with two rewards . . . . . . . . . . . . . . . . 33

4.2 A Chippy policy after 1,000 moves . . . . . . . . . . . . . . . . . . . . . . 34


4.3 A Chippy policy after 5,000 moves . . . . . . . . . . . . . . . . . . . . . . 34

4.4 A Chippy policy after 1,000,000 moves . . . . . . . . . . . . . . . . . . . 35

4.5 Q-Learning in the Chippy Grid World . . . . . . . . . . . . . . . . . . . . 36

4.6 Q-Learning before and after perturbation . . . . . . . . . . . . . . . . . . . 36

4.7 Q-Learning with exploration rate set to 0 after policy learned . . . . . . . . 37

4.8 The windy grid world with both single and double offsetting columns . . . 38

4.9 A 15 step path from the start to the goal . . . . . . . . . . . . . . . . . . . 39

4.10 The two moves that lead to the goal and the seven squares that can not be

entered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.11 SARSA exploration of the Windy Grid World showing cumulative moves

over multiple episodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.12 A Windy grid world policy learned by SARSA after 170 episodes . . . . . 40

4.13 Windy grid world optimum policy with Q values. The best path from the

start to the goal is underlined. . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.14 A WinBolo console with menus, actions, primary and secondary displays. . 42

5.1 Effect on Chippy perturbation recovery of varying the learning rate . . . . . 47

5.2 Effect on Chippy perturbation recovery of varying the exploration rate . . . 48

5.3 Effect on Chippy perturbation recovery of varying the discount rate . . . . . 49

5.4 WinBolo tank outside small maze . . . . . . . . . . . . . . . . . . . . . . 50


5.5 C code to add WinBolo status information to Soar’s input structure . . . . . 51

5.6 Soar rules to land tank . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.7 C++ code to initialize MCL interface for Chippy . . . . . . . . . . . . . . 53

5.8 C++ code to define the sensors for Chippy using the MCL API . . . . . . . 54

5.9 C++ code to define the expectations for Chippy using the MCL API . . . . 55

5.10 C++ code for Chippy to implement suggestions from MCL . . . . . . . . . 55

5.11 Average Rewards per Step for Chippy with and without MCL monitoring . 56

6.1 Monitoring a multi-agent system with a sentinel is isomorphic to using

metacognition with a single cognitive agent. . . . . . . . . . . . . . . . . . 61


LIST OF TABLES

1.1 Research contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.1 Ontologies used with NAG cycle . . . . . . . . . . . . . . . . . . . . . . . 13

3.2 Indication Ontology Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.3 Sensor, Divergence and Core Node Indications Ontology Linkages . . . . . 18

3.4 Concrete Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.1 MCL Platforms and Languages . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2 Wind Speed and Directions in the Seasonal Windy Grid World . . . . . . . 41

5.1 Rewards for Chippy after perturbation with varying Q-Learning rates . . . . 47

6.1 Monitor and control in MCL and Sentinels . . . . . . . . . . . . . . . . . . 60

8.1 Monthly Timetable for MCL Reengineering . . . . . . . . . . . . . . . . . 67


Chapter 1

INTRODUCTION

The field of Artificial Intelligence has seen steady advances in cognitive systems. AI

has acquitted itself well in one area after another: theorem proving, game playing, machine

learning, and more. While advances in computer speeds and memory sizes have certainly

helped, the majority of the achievements have come from new algorithms and experience

in applying them.

Without detracting anything from AI’s accomplishments, many of the cognitive systems perform poorly when faced with situations outside of their training or in a dynamic

environment. A robot trained to search out a dark blue goal may or may not detect a light

blue one. A robotic car trained to drive on American roads will likely be a danger to itself and others if transported to a country with left-hand driving rules. A switch between instruments reading in miles and ones reading in kilometers may doom a spacecraft.

The transition between laboratory training and successful real-world operation remains a major challenge. To cope with possible future encounters, additional rules and/or more training can be used, but this increases the cost and lengthens the time between conception and deployment. Alternatively, greater ability can be given to the agent to explore, learn, and reason about its environment, but this too raises the manufacturing and operational costs.


Adding more capabilities to an agent to cope with possible perturbations increases the complexity of the agent and, except during periods of perturbation, may decrease performance. Adding metacognition to such systems can improve their operation in the face of such perturbations without sacrificing performance during normal operations. Only when a problem has been noticed does the problem-correction code need to be active. Metacognition can monitor an agent’s performance and invoke corrective action only when needed.

The Metacognitive Loop (MCL) is a metacognitive approach that has been applied to

a variety of systems (Anderson & Perlis 2005). It works with a host system, monitoring its

sensors to ensure that they are within expected levels. As long as all the expectations are

met, MCL remains purely in a monitoring state allowing the agent to function unimpeded

by extraneous problem-handling code.

When a failure is indicated by a violation of one or more expectations, MCL examines the violations, determines possible failures, and advises the host system on potential

corrective actions. The agent implements the corrective action (e.g., removing a recent rule

from the KB, adjusting the exploration parameter, re-evoking a previous procedure). The

agent then continues to operate, leaving the performance evaluation tasks to MCL.

Past implementations of MCL have been hand-crafted and tightly integrated into their

host systems. MCL had direct access to the agent’s sensors. The failure of expectations was

tied directly to corrective actions. The logic of the agent and MCL were separate but their

implementations were intertwined. These implementations were successful in showing that

an MCL-enhanced agent performed better in a dynamic, perturbed situation than an agent

without such support.

To allow MCL to be implemented with a larger number of systems, MCL is being

reengineered to provide an Application Programming Interface (API). This will provide a

clean separation between MCL and the host system. The API will be developed for the


C language and then (with the use of the open source SWIG package) extended to other

programming languages.

But, more than just having a new facade, MCL is being reengineered internally to do

Bayesian inference over a set of indication, failure, and response ontologies. The three

ontologies and the links between them capture the knowledge of how problems manifest

themselves and what the appropriate corrective actions are. Using Bayesian inference over

probabilities on inter- and intra-ontology links, the most likely failure and the most effective

corrective action can be determined.

By having an API that can be used regardless of the implementation language of the

host (and internal processes that are generalized and probabilistic per above), a wide variety

of systems will be able to incorporate MCL for metacognitive monitoring and control.

MCL will no longer have to be implemented in the agent’s programming language and intertwined with the agent’s processing, with every new use of MCL requiring MCL to be re-implemented. Now a single implementation can be created, leveraging the MCL/host-system division made possible by the API and the computational power inherent in the Bayesian-augmented ontologies.

The reengineering of MCL will also allow some brittleness problems within MCL

itself to be addressed. When a system fails to achieve its goal, one or more expectations

may have been violated. Multiple expectation violations may indicate that there is more

than one problem or only a single problem. Likewise, multiple problems can manifest

themselves in a single expectation violation so a single violation is not a guarantee of a

single problem. Being able to disambiguate problem failures from expectation violations is

expected to come as a benefit from the new ontologies and the use of Bayesian inference.

A second problem that will be addressed within the new MCL framework is that of

reentrant invocations. That is the ability to resolve host system failure when a second

exception occurs while MCL is attempting to help the host system recover from an earlier


exception. To effectively guide the host system we need to know if this second exception

is an indication of the same problem as the first failure or a new problem. The ontologies

and Bayesian inference will provide part of the answer but MCL must also be able to

maintain (create, update and eventually discard) a context record that allows it to reason

about failures over time.

Just as MCL can improve the operation of a host system, MCL should also be able

to assist itself through a recursive invocation. The meta-MCL would provide the same

services and facilities that MCL provides to a host system: that of monitoring sensors

for exceptions and suggesting corrective action when an expectation has been violated. In

the case of meta-MCL, the sensors would be MCL’s internal performance metrics and the

corrective actions would be to adjust MCL’s ontologies and conditional probability tables.

While it may not be possible to create an AI agent that can function optimally in all

situations, metacognition in the form of the Metacognitive Loop can be efficiently applied

to many types of AI architectures to improve their performance in dynamic environments.

The new MCL will be used with several systems (some of which have been used with the older version of MCL and some new domains). The performance of the MCL-enhanced systems

versus the base versions will be compared to show the effectiveness of MCL. Source lines

of code will be used as a measure of the implementation cost of MCL. CPU, elapsed time,

and memory usage will be used to show that the cost of adding MCL is nominal.

In summary, I propose to improve the operation of AI agents by enhancing and extending the existing MCL. Changes will be made to improve the generality and robustness

of MCL. The contributions of this research are listed in Table 1.1.


Table 1.1. Research contributions

Category      Changes
Generality    Ontologies
              Bayesian inference
              Cross platform/language API
Robustness    Reentrant invocation
              Recursive invocation
Other         Metrics and evaluation


Chapter 2

BACKGROUND

This chapter describes metacognition for use with computer systems and the particular

implementation, the Metacognitive Loop (MCL), whose extensions will be the basis for the

contributions of the proposed research.

2.1 Metacognition

The philosophical origins of metacognition may be traced to the dictum of “know thyself.” Metacognition is studied as part of developmental and other branches of psychology.

While there are several different approaches, one common model is a cognitive process

which is monitored and controlled by a metacognitive process as in Figure 2.1. Metacognition can be studied in conjunction with metaknowledge (knowledge about knowledge) and

metamemory (memory about memory) (Cox 2005).

The canonical depiction of a software agent (figure 2.2) has sensors to perceive the

environment and actuators with which the agent tries to control it. Metacognition can

be layered onto a software agent so that the metacognitive process monitors and controls

the cognitive process of the software agent as shown in figure 2.3, with metamemory and

metaknowledge.

Metacognition improves the performance of the agent in the environment by providing


FIG. 2.1. Metacognitive Monitoring and Control

FIG. 2.2. Software agent interactions with the environment


FIG. 2.3. Software agent with metacognition

two control functions. The first is to inform the agent when a cognitive task (e.g. the

selection of the next action to perform) has been satisfactorily achieved so that the agent

can move on to another task (such as performing the selected action). For some agents, the

cognitive sufficiency test is built into the cognitive process itself. For example, the cognitive

task may be limited to selecting (based on a specified estimated utility) between a fixed

number of choices. In such a case there is no opportunity for metacognitive intervention.

The second metacognitive control function is to reflect on the performance of the

agent. This can be done at the completion of a successful task, but it is most often performed after a failure. The metacognitive process evaluates the decisions made by the agent and determines where an alternative selection would have been more appropriate, or it may suggest a change to the agent’s current cognitive state, such as invoking a learning

module. Reflection can also be done on a continuous basis, allowing small deviations from


the expected to trigger controls to prevent future failures.

2.2 Metacognitive Loop

The purpose of the metacognitive loop (MCL) is to improve the operation of the host

system by dealing with unexpected events (Anderson & Perlis 2005). It does this by adding

a metacognitive layer to the host that is concerned with monitoring the operation of the

host system, and taking corrective action when it detects a problem. Figure 2.4 shows a

cognitive host system with MCL.

FIG. 2.4. Host systems with MCL support

MCL consists of three phases that implement its metacognitive knowledge about

problem detection, fault isolation, and corrective action for cognitive agents. These three


phases correspond to the process often used by humans, where we (1) notice that something is not working, (2) make decisions about it (whether the problem is important, how likely it is to get worse in the future, if it is fixable, etc.) and then (3) implement a response based on the decisions that were made (ignore the problem, ask for help, attempt to fix the problem using trial-and-error, etc.).

2.2.1 Note

The MCL process starts with the Note phase, which provides the host system with a "self-awareness" component. MCL monitors the host system to detect a difference between

expectations and observations. An anomaly occurs when an expectation is violated. An

expectation is a statement about the allowable values for a sensor. Statements such as “the

mixing vat temperature will not exceed 170 degrees” and “the flow in the coolant pipe will

be between 80 and 90 gallons per second” are expectations made about external sensors.

Anomalies can also be about internal host processes such as “a new plan will be generated

no more than 5 seconds after a new subgoal has been made the current subgoal.” When

the sensor information is at odds with the expected values, an anomaly is noted and MCL

moves to the assess state.
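
As a concrete illustration, an expectation of this kind can be represented as an allowable range over a named sensor. The small sketch below uses hypothetical types and names for illustration only; it is not the actual MCL API.

    #include <iostream>
    #include <string>

    // Hypothetical sketch: an expectation as an allowable-value range for a sensor.
    // The struct and field names are illustrative assumptions, not the MCL API.
    struct Expectation {
        std::string sensor;   // name of the monitored sensor
        double low, high;     // allowable range for the sensor's value
        bool violated(double value) const { return value < low || value > high; }
    };

    int main() {
        // "the mixing vat temperature will not exceed 170 degrees" (as a range)
        Expectation vatTemp{"mixing_vat_temperature", 0.0, 170.0};
        // "the flow in the coolant pipe will be between 80 and 90 gallons per second"
        Expectation coolantFlow{"coolant_flow", 80.0, 90.0};

        double reading = 95.0;  // a coolant flow reading outside the expected range
        if (coolantFlow.violated(reading) || vatTemp.violated(150.0))
            std::cout << "anomaly noted: MCL moves to the Assess phase\n";
        return 0;
    }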

2.2.2 Assess

In the Assess state, MCL attempts to determine the cause of the problem that led to the

anomaly and the severity of the problem. The computation done in this phase need not be

excessive. Indeed, it is the philosophy of MCL that lightweight, efficient problem analysis is better than ignoring the problem, attempting to design out every conceivable problem, or attempting to model and monitor large portions of the world. In some implementations of MCL this phase is almost nonexistent, with a direct connection between the exception and

the corrective action.


2.2.3 Guide

The third state of MCL is Guide, where MCL attempts to guide the host system back to proper operation by offering a suggestion as to what action(s) will return the sensor values to within the limits set by the expectations. The suggestions available in this

phase vary depending on the features of the host system.

Once the suggestion has been made, MCL returns to the exception monitoring state.

Any new exceptions will cause MCL to again enter the Note, Assess, and Guide phases of

the NAG cycle.
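
One pass through the NAG cycle can be sketched as a simple monitoring step, as below; the types and stub bodies are illustrative assumptions, not the MCL implementation.

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical sketch of one pass through the Note-Assess-Guide (NAG) cycle.
    // The types and stub logic are illustrative assumptions, not the MCL code.
    struct Violation  { std::string expectation; };
    struct Failure    { std::string cause; };
    struct Suggestion { std::string action; };

    // Note: compare sensor readings to expectations (stubbed with one violation).
    std::vector<Violation> note() { return { {"reward expected in square (0,0)"} }; }

    // Assess: map the violations to the most likely failure (stubbed).
    Failure assess(const std::vector<Violation>&) { return {"predictive model error"}; }

    // Guide: map the failure to a corrective suggestion for the host (stubbed).
    Suggestion guide(const Failure&) { return {"rebuild models"}; }

    int main() {
        std::vector<Violation> violations = note();
        if (violations.empty()) return 0;         // no anomaly: stay in monitoring
        Suggestion s = guide(assess(violations)); // anomaly: assess, then guide
        std::cout << "MCL suggests: " << s.action << "\n";
        return 0;
    }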

2.2.4 Example

A Mars rover tasked with exploring geological formations on the red planet also has

to manage power consumption. When a low battery alarm causes the rover to initiate a return to the recharging station, it plans a path avoiding known obstacles. The path leads it over a dust field which, while not an obstacle as such, requires additional motive power. The additional

power consumption drains the rover’s battery, ending its mission.

If the rover had an MCL component, the additional power consumption would have been noted as an indication of a problem. An assessment would have been made, with a response of re-planning the path to the recharging station with the dust field now classified

as an obstacle.


Chapter 3

TECHNICAL APPROACH

The Metacognitive Loop has been shown to be effective in lessening the problem of

brittleness in cognitive systems. MCL was added to those systems as a customized enhancement, each one slightly different to correspond to the implementation language and target machine of the cognitive system. The Note, Assess, and Guide steps were also tailored to the domain and host system. While this approach works fine on a small scale, making MCL available for cognitive systems in general will require an implementation that works for a variety of host systems in different domains, implemented in different languages for different machines.

Several areas of MCL will need enhancing to provide the benefits of metacognition

as a general service. This work will require both research and system engineering efforts.

The research areas include:

• Using Bayesian inference over Indications, Failure and Response ontologies for

metacognitive reflection;

• Reasoning over time about multiple error indications;

• Using metacognition to improve the metacognitive process itself.

The system engineering effort will be to provide MCL services in a package that can be


easily used across many different implementation environments.

3.1 MCL Ontologies

For MCL to serve as a general purpose tool for the brittleness problem for cognitive

systems, it must be able to perform its Note, Assess, and Guide phases without needing

extensive tailoring for each domain. MCL should be able to reason using mainly abstract,

domain-neutral concepts to determine why a system is failing and how to cope with the

problem. To support this, three ontologies were created (one for each phase of the NAG

cycle). These ontologies are additional metaknowledge for MCL as shown in figure 3.1.

Each of the three ontologies is used by a different phase of MCL (see Table 3.1). The

Indications ontology is used in the Note phase when sensor input shows that an expectation has been violated. The Assess phase uses the Failure ontology to determine likely causes of the violated expectations. Once likely causes of the failure have been identified, the Guide

phase uses the Response ontology to determine appropriate responses to the failure.

Table 3.1. Ontologies used with NAG cycle

Phase    Ontology
Note     Indications
Assess   Failure
Guide    Responses

Elements within each ontology are linked to others in the ontology to show an “is-a” relationship. For example, the “sensor not responding” node in the Failure ontology is

connected to the “sensor failure” node to show that “sensor not responding” is a type of


FIG. 3.1. Ontologies as additional metaknowledge

“sensor failure.” Elements in one ontology may also be linked to elements in a different

ontology to show a possible “cause-and-effect” or “problem-solution” relationship. The

general pattern of ontology linkage is shown in Figure 3.2. This figure also shows how

the expectations are linked to the indications ontology elements and the elements of the

response ontology lead to suggestions that MCL gives to the host system.

The sensors and expectations are part of the “concrete” realm of the host system.

Processing by MCL moves from the concrete expectations to the abstract indication, failure,

and response ontologies, and then back to the concrete suggestions implemented by the host

system. Figure 3.2 shows the division between the concrete and abstract processing. The

next sections expand on this process, going into each of the three ontologies in greater

detail.


FIG. 3.2. Ontological Linkages

3.1.1 Indications

The Indications ontology comprises three types of nodes (see Table 3.2). The

purely abstract indication nodes support concepts that cross multiple domains. These make

up the core of the indications ontology. These nodes represent concepts such as “deadline

missed”, “failed to change state”, and “reward not received”.

The sensor nodes of the indications ontology model the sensors of the host system and

their attributes. When the sensors of the host system are defined, sensor nodes are added to

the indications ontology. Additional nodes are added to the ontology for the expectations

for the values of the sensors.

The third set of nodes in the indications ontology forms a linkage between the concrete sensor and expectation nodes and the abstract core nodes of the ontology. The divergence nodes define how expectations can be violated. Figure 3.3 shows these nodes and the relationships between them.


Table 3.2. Indication Ontology Nodes

Type        Node
Core        deadlineMissed, rewardNotReceived, resourceOverflow, resourceDeficit,
            failedStateChange, unanticipatedStateChange, assertedControlUnchanged
Sensor      state, control, temporal, resource, reward, message, ambient, objectProp,
            spatial, critical, noncritical, discrete, ordinal, maintenance, effect
Divergence  divergence, aberration, cwa-violation, cwa-decrease, cwa-increase,
            breakout-low, breakout-high, missed-target, missed-unchanged,
            short-of-target, long-of-target, over, under, late


The three free-standing nodes (over, late, and under) are not part

of the divergence tree structure but like the other divergence nodes can be used to further

define the exception. In the lower left nodes “cwa” stands for Closed World Assumption.

FIG. 3.3. Divergence Nodes in the Indications Ontology

It is the violation of expectations that starts the MCL NAG cycle. The type of violation

and the type of sensor are linked together to a core indications ontology node. Table 3.3


shows sensor and divergence nodes linked to core nodes.

Table 3.3. Sensor, Divergence and Core Node Indications Ontology Linkages

Core Node                    Sensor Node   Divergence Node
Deadline missed              temporal      late
Reward not received          reward        under
Resource overflow            resource      over
Resource deficit             resource      under
Failed state change          state         missed-unchanged
Unanticipated state change   state         aberration
Asserted control unchanged   control       missed-unchanged
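
A minimal sketch of the Table 3.3 linkage is shown below: a (sensor node, divergence node) pair selects a core indication node. The enum and map names are illustrative assumptions, not part of the MCL implementation.

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    // Hypothetical sketch of the Table 3.3 linkage: a (sensor node, divergence node)
    // pair selects a core node in the Indications ontology. Names are assumptions.
    enum class SensorNode     { temporal, reward, resource, state, control };
    enum class DivergenceNode { late, under, over, missed_unchanged, aberration };

    const std::map<std::pair<SensorNode, DivergenceNode>, std::string> kCoreNode = {
        {{SensorNode::temporal, DivergenceNode::late},             "deadlineMissed"},
        {{SensorNode::reward,   DivergenceNode::under},            "rewardNotReceived"},
        {{SensorNode::resource, DivergenceNode::over},             "resourceOverflow"},
        {{SensorNode::resource, DivergenceNode::under},            "resourceDeficit"},
        {{SensorNode::state,    DivergenceNode::missed_unchanged}, "failedStateChange"},
        {{SensorNode::state,    DivergenceNode::aberration},       "unanticipatedStateChange"},
        {{SensorNode::control,  DivergenceNode::missed_unchanged}, "assertedControlUnchanged"},
    };

    int main() {
        // A reward sensor whose value came in under expectation activates the
        // rewardNotReceived core indication node.
        std::cout << kCoreNode.at({SensorNode::reward, DivergenceNode::under}) << "\n";
        return 0;
    }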

3.1.2 Failures

Once the violated expectations have been evaluated in the Note phase, MCL proceeds

to evaluate the problem indications to determine the cause in the Assess phase. The Failure

ontology is used in the problem determination. This phase is used (rather than mapping

indications directly to responses) because of the ambiguous nature of failures and their indications: two different failures which need different responses might have the same initial

problem indications and a single problem might manifest itself with multiple indications.

The Failure ontology (Figure 3.4) is a catalog of the various problems that befall cognitive systems. This includes problems with sensors, effectors, resources, and the domain

model (or models). The links in the failure ontology are all of the is-a variety. Thus a “sensor malfunction” is-a “sensor error” is-a “knowledge error” is-a “failure”. The “failure”

node is the root of the Failure ontology and all Failure ontology nodes eventually lead to it.


FIG. 3.4. Failure Ontology


3.1.3 Responses

Just as the Failure ontology is an itemized list of everything that can go wrong with a

cognitive system, the Response ontology (Figure 3.5) is a list of everything that can be done

about it. There are two types of nodes in the Response ontology: abstract and concrete.

The abstract nodes represent general problem-solving techniques and the concrete nodes

represent specific suggestions that MCL can send to the host system. Table 3.4 lists the

concrete responses and the abstract nodes that directly link to them. The remaining links

within the response ontology are for “is-a” relationships. For example, “Strategic Change”

is-a “System Response” is-a “Internal Response” which is-a “Response”. The “Response”

node is the root of the Response ontology.

Table 3.4. Concrete Responses

Concrete Node             Abstract Node
Solicit suggestion        Ask for help
Relinquish control        Ask for help
Run sensor diagnostic     Run diagnostic
Run effector diagnostic   Run diagnostic
Activate Learning         Modify Predictive Model
Rebuild Models            Modify Predictive Model
Adjust Parameters         Modify Procedure Model
Revisit Assumptions       Modify Procedure Model
Revise Expectations       Modify Avoid
Algorithm Swap            Strategic Change
Change HLC                Strategic Change
Try Again                 System


FIG. 3.5. Response Ontology


3.1.4 Inter-ontology linkages

The three ontologies (Indications, Failure, and Response) are connected by inter-ontology links. Core nodes in the Indications ontology connect to nodes in the Failure

ontology. Many nodes of the Failure ontology are connected to nodes in the Response

ontology. The linkages form a chain of reasoning from the violated expectation to a suggestion that may correct the problem.

Figure 3.6 shows such a path for a Q-learner faced with a dynamic grid world domain

where the rewards have been moved. Note that this is a very simplified diagram with most

of the nodes and links removed. When the Q-learner moves to the grid square that no longer

contains the expected reward, the expectation of getting the reward in that square is violated. This activates the “Reward not received” node in the Indication ontology. That node is connected (via an inter-ontology link) to the “Model error” node of the Failure ontology.

The “Model error” node has two children, “Procedure model error” and “Predictive model

error”. The “Predictive model error” node has an inter-ontology link to the “Modify predictive response” node of the Response ontology. The “Modify predictive response” has a

child node of “Rebuild model response” that is a concrete node for generating the “Rebuild

model” suggestion. This set of inter- and intra-ontology linkages allows reasoning from

the failed expectation of obtaining a reward to rebuilding the Q-learner’s Q-table.
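
The chain just described can be pictured as a small linked structure. The sketch below walks one simplified path from the “Reward not received” indication to the “Rebuild models” suggestion; the node representation is an assumption for illustration, not the actual MCL ontology code.

    #include <iostream>
    #include <string>
    #include <vector>

    // Simplified sketch of the reasoning chain in Figure 3.6. Each node records its
    // ontology and the nodes it links to; the representation is an assumption.
    struct Node {
        std::string ontology;            // "Indications", "Failure", or "Response"
        std::string name;
        std::vector<const Node*> links;  // is-a and inter-ontology links
    };

    int main() {
        Node rebuildModels {"Response",    "Rebuild models (concrete suggestion)", {}};
        Node modifyPredict {"Response",    "Modify predictive model", {&rebuildModels}};
        Node predictiveErr {"Failure",     "Predictive model error",  {&modifyPredict}};
        Node modelError    {"Failure",     "Model error",             {&predictiveErr}};
        Node rewardNotRecd {"Indications", "Reward not received",     {&modelError}};

        // Walk from the activated core indication node down to a concrete suggestion.
        for (const Node* n = &rewardNotRecd; n != nullptr;
             n = n->links.empty() ? nullptr : n->links.front()) {
            std::cout << n->ontology << ": " << n->name << "\n";
        }
        return 0;
    }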

3.2 Bayesian Inference

For each problem indication we want to be able to determine the most likely cause or

causes of the failure. For each failure we want to be able to determine which responses are

the most likely to correct the failure. Thus given the sensors and the violated expectations

we would like to find the responses with the highest probability of working. (Actually

we want to find the response with the highest utility, but for this discussion we will


FIG. 3.6. Example Ontology Connections

let all the costs be the same and focus just on the probabilities). The three ontologies

and their inter-ontology linkages (which form a directed graph) can be viewed as a Bayes

net. Direct observation can be made of the sensors. By associating conditional probability

tables (CPTs) with each node in the three MCL ontologies we can use Bayesian inference

to compute the needed probabilities for the responses. Figure 3.7 shows the addition of

CPTs to a small section of the Response ontology.

The Bayesian inference will be implemented using the Intel contributed open source

Probabilistic Network Library (PNL) available at https://sourceforge.net/projects/openpnl/.


FIG. 3.7. Conditional probability tables for portion of MCL response ontology
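
To make the inference step concrete, the following sketch computes response probabilities by direct enumeration over a toy two-failure model; the probability numbers and node names are made-up illustrations, and the planned implementation will instead use the PNL library mentioned above.

    #include <cstdio>

    // Toy Bayesian inference over a tiny indication -> failure -> response chain.
    // All probabilities below are made-up illustrative numbers (assumptions).
    int main() {
        // Two candidate failures with prior probabilities.
        const double pFail[2]     = {0.30, 0.70};  // P(sensor error), P(model error)
        // Likelihood of the observed indication ("reward not received") given each failure.
        const double pIndGiven[2] = {0.20, 0.90};  // P(I | F)
        // Probability that each candidate response fixes each failure.
        // rows: response (run diagnostic, rebuild models); cols: failure
        const double pFixGiven[2][2] = {{0.80, 0.05},
                                        {0.10, 0.85}};

        // Posterior P(F | I) by Bayes' rule.
        double post[2], evidence = 0.0;
        for (int f = 0; f < 2; ++f) evidence += pIndGiven[f] * pFail[f];
        for (int f = 0; f < 2; ++f) post[f] = pIndGiven[f] * pFail[f] / evidence;

        // Expected probability that each response corrects the (unknown) failure.
        const char* names[2] = {"run sensor diagnostic", "rebuild models"};
        for (int r = 0; r < 2; ++r) {
            double pWorks = 0.0;
            for (int f = 0; f < 2; ++f) pWorks += pFixGiven[r][f] * post[f];
            std::printf("P(%s works | indication) = %.3f\n", names[r], pWorks);
        }
        return 0;
    }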

3.3 Support for Reasoning Over Time

Errors can occur either once or multiple times. When the reward for a Q-Learner is

moved, the learner’s policy will drive it to repeatedly enter the square that used to contain

the reward. If there is an expectation that the square will give a positive reward, that expectation will be repeatedly violated. But all of these violations are an indication of the same

problem: that the reward has been moved.

Even if MCL was invoked on the first occurrence of the unexpected reward and the Q-Learner immediately adjusted the learning and/or exploration rates in response, the old

reward square would still be visited several times while learning the new policy. Thus,

MCL needs to be able to associate multiple exceptions with the same error even while

recovering from the initial exceptions.


The recovery process itself may give rise to additional errors. Following MCL’s suggestions to increase the exploration rate, a Q-learner may experience a longer interval between rewards. Assigning additional resources to recover from a problem in one area may cause a scarcity of resources in another, triggering a resource deficiency exception. This exception and the original one should both be considered by MCL when determining further

corrective suggestions for the host cognitive system.

To correctly assess multiple indications, MCL needs to remember what exception violations it has seen in the past and what suggestions were provided as recommended responses to those exceptions. Figure 3.8 shows the addition of previous exception violation information as part of the meta-knowledge of the enhanced Metacognitive Loop.

FIG. 3.8. Reentrant MCL

The mclFrame is the data structure for holding the context information that will be

used by the enhanced MCL to allow reasoning over time. It consists of the MCL ontologies


with the calculated probabilities (plus a few other pieces of information).

One mclFrame will be created for each exception violation. Multiple frames can be

merged in the Guide phase of MCL if the frames are determined to represent the same

problem. This will be done by comparing the probabilities associated with each of the

nodes in the Failure ontology.

To provide an organized method of retaining and using mclFrames, each expectation

will be associated with an expectation group. An expectation group can hold zero, one,

or more expectations. Expectation groups can have a parent group so that hierarchies can

be created. Figure 3.9 shows expectations arranged in a single expectation group and then

in a hierarchy. Grouping expectations by the host systems’ functional categories should

provide better problem resolution.


FIG. 3.9. Expectations arranged in single (a) and multiple groups (b)

Page 38: Curriculum Vitae - Inspiring Innovationdean3/dwprelim.pdf · Curriculum Vitae Name: Dean Earl Wright III ... 2.2.4 Example ... 5.1 Effect on Chippy perturbation recovery of varying

27

An mclFrame can be associated with each expectation group to provide a memory of

past violations for reasoning about errors over time. This is done by including the probabilities retained in the mclFrame of an expectation group when calculating the probabilities

of any of the group’s violated expectations.
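
A rough sketch of how expectation groups and their associated mclFrames might be organized is shown below; the field names, the parent pointer, and the stored probabilities are assumptions made for illustration, not the actual mclFrame definition.

    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical sketch of expectation groups with per-group mclFrames.
    // Field names and structure are illustrative assumptions, not the MCL code.
    struct MclFrame {
        // Probabilities over Failure-ontology nodes retained from past violations,
        // reused when new violations occur in the same group.
        std::vector<std::pair<std::string, double>> failureProbabilities;
    };

    struct Expectation {
        std::string sensor;
        double low, high;                       // allowable range
    };

    struct ExpectationGroup {
        std::string name;
        ExpectationGroup* parent;               // allows hierarchies of groups
        std::vector<Expectation> expectations;  // zero, one, or more expectations
        std::unique_ptr<MclFrame> frame;        // memory of past violations, if any
    };

    int main() {
        // A hierarchy as in Figure 3.9(b): a root group with a functional subgroup.
        ExpectationGroup root{"host system", nullptr, {}, nullptr};
        ExpectationGroup power{"power", &root,
                               {{"battery_level_pct", 20.0, 100.0}}, nullptr};
        // A violation of the battery expectation would create (or update) the
        // group's frame so later violations can be assessed in that context.
        power.frame = std::make_unique<MclFrame>();
        power.frame->failureProbabilities.push_back({"resource error", 0.6});
        return 0;
    }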

3.4 Recursive Invocation

MCL is a cognitive AI system like the host system it monitors. Like that host system

it receives perceptions from the environment and acts upon those perceptions to attempt

to change the environment. But, while the host system is situated in the real world (or a

simulation of it), MCL’s environment is the host system. Whatever constitutes the host

system’s environment, MCL is only aware of the shadows of that environment as projected

by the exceptions and sensors of the host system. MCL’s actuators are the suggestions it

passes to the host system.

Like any other AI system, MCL is susceptible to perturbation when its environment

(the host system) changes unexpectedly. And like any other AI system, we should be able to improve the performance of MCL in times of perturbation by invoking MCL to note the

problem indications, assess those indications, and guide MCL to a solution.

Figure 3.10 shows a meta-MCL monitoring the operation of an MCL that is monitoring a cognitive agent. The environment of the meta-MCL is the MCL agent and the

environment of the cognitive agent forms the meta-environment of the meta-MCL.

mclFrames provide the mechanism for handling multiple exceptions in MCL and they

also serve the same function in the recursive use of MCL. The mclFrames of the meta-MCL are separate and distinct from the mclFrames used with the exceptions and expectation groups of the MCL monitoring the cognitive agent.

While allowing MCL to monitor itself should improve its operation over time, there


FIG. 3.10. MCL providing metacognitive services to MCL

is also the possibility of introducing major problems.

Excessive resource consumption If the recursive MCL monitors many expectations, or these expectations are written so that they are often violated, the recursive MCL could use a large amount of resources (i.e., memory and CPU). Recursive MCL should be limited or run only as a low-priority task when resources are not needed elsewhere.

Infinite regress If MCL can be used to improve MCL, then MCL can be used to improve

that MCL and so on. It is expected (but by no means proved) that layering on more

and more MCLs will be subject to diminishing returns. MCL recursion will be

capped at one level.

Destructive changes Having MCL make changes to MCL could be described as letting a surgeon cut on his own brain. Changes should be limited in scope to prevent


catastrophic failure.

3.5 C Language Application Program Interface

The original MCL implementations were hand-crafted and tightly bound to the host

system. To provide metacognitive support for a variety of systems, a clean delineation is

needed between the host system and the metacognitive monitor. An initial API has been

created for C++ (the language in which the new MCL is being constructed). This interface

will be used as the basis for a C language API. This will actually require only a few changes

to the C++ API but will allow more applications to use MCL.

The C language API (as well as the C++ API) will be documented with examples of

its use.
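
To make the intended shape of such an interface concrete, the sketch below shows a C-compatible API with stub bodies so that it compiles and runs; every function name, type, and signature here is a hypothetical placeholder rather than the actual MCL API.

    /* Hypothetical sketch of the general shape a C-language MCL interface might
       take. Every name, type, and signature below is an illustrative assumption
       with stub bodies so the sketch compiles; it is not the actual MCL API. */
    #include <stdio.h>

    typedef int mcl_handle;

    /* Create an MCL instance for a named host system. */
    static mcl_handle mcl_initialize(const char *host_name) {
        printf("initializing MCL for %s\n", host_name);
        return 1;  /* stub handle */
    }

    /* Declare a sensor and a range expectation over its values. */
    static void mcl_declare_sensor(mcl_handle h, const char *sensor) {
        (void)h; (void)sensor;
    }
    static void mcl_expect_range(mcl_handle h, const char *sensor,
                                 double lo, double hi) {
        (void)h; (void)sensor; (void)lo; (void)hi;
    }

    /* Report a sensor reading; returns 1 if the reading violates an expectation. */
    static int mcl_update_sensor(mcl_handle h, const char *sensor, double value) {
        (void)h; (void)sensor;
        return value > 170.0;  /* stub: pretend the only limit is 170 degrees */
    }

    int main(void) {
        mcl_handle h = mcl_initialize("mixing-vat controller");
        mcl_declare_sensor(h, "vat_temperature");
        mcl_expect_range(h, "vat_temperature", 0.0, 170.0);
        if (mcl_update_sensor(h, "vat_temperature", 180.0))
            printf("expectation violated: MCL would run its NAG cycle\n");
        return 0;
    }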

3.6 Extending the API to Multiple Languages

The open source Simplified Wrapper and Interface Generator project (www.swig.org)

provides API generators that take a C (or C++) language API as input. These generators create

APIs for more than a dozen languages (Allegro CL, C#, Chicken, Guile, Java, Modula-3,

Mzscheme, OCAML, Perl, PHP, Python, Ruby, and Tcl).
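
For illustration, a minimal SWIG interface file for a hypothetical mcl.h header might look like the sketch below; the module name and header name are assumptions.

    /* mcl.i -- hypothetical SWIG interface for the MCL C/C++ API (sketch only). */
    %module mcl

    %{
    /* Header pulled into the generated wrapper; the name is an assumption. */
    #include "mcl.h"
    %}

    /* Expose every declaration in the header to the target language. */
    %include "mcl.h"

Running, for example, "swig -python mcl.i" would then generate the Python wrapper sources to be compiled and linked against the MCL library.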

In addition to the APIs created using SWIG, an API will be created that allows use of MCL with the SOAR general cognitive architecture (sitemaker.umich.edu/soar/home). SOAR has a

large community of interest with its own newsletters and conferences. Giving them easy

access to MCL will allow many applications to receive the benefits of metacognitive mon-

itoring and control.

These APIs will be documented with examples of their use.


Chapter 4

METHODOLOGY

The primary measure for the success of the proposed research is how well the enhanced MCL improves the performance of the host system. This chapter describes the

method of testing for that and other criteria, as well as the problem domains that will be

used in the testing.

4.1 Evaluation Criteria

The hypotheses driving this research proposal are (1) that MCL augmented by ontologies and Bayesian inference provides cognitive systems with a solution to the brittleness

problem when the environment changes in unexpected ways; (2) that MCL is efficient,

both in terms of operating costs and in the effort to add MCL to the host system; and (3)

that the MCL solution is broadly applicable across a variety of domains, implementation

languages, and operating systems.

In this section the evaluation criteria for each of the hypotheses are presented. The

next section will describe the problem domains that will be used in testing.


4.1.1 Effectiveness in Improving Host System Operation

To evaluate the effectiveness of MCL in improving the operation of the host system, base, optimized, and MCL-enhanced versions of the host systems will be compared. The host systems used in the evaluation are grid world reinforcement learners (described in sections 4.2.1 and 4.2.2) and the Bolo tank simulation (section 4.2.3). Average reward will be used on periodic grid worlds while the number of steps to goal will be used with episodic grid worlds. For the Bolo domain the time to complete the task will be used.

The base measurement will be done without any tuning of the cognitive system to the

domain. This base value will be used to measure the improvement when the host system

is optimized (e.g., for Q-learning, selecting the best alpha and epsilon values). The MCL-enhanced system will also be compared to the base system to make sure that it does indeed improve performance. It will then be compared to the optimized system to see if MCL improves the system beyond what would normally be done for a system. To determine if

any improvement is statistically significant, the unpaired t test will be used.
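
For reference, the unpaired (pooled-variance) t statistic for two independent samples can be computed as in the sketch below; the reward values shown are placeholders, not experimental results.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Unpaired (pooled-variance) t statistic for two independent samples.
    // Sample values below are illustrative placeholders, not experimental data.
    double mean(const std::vector<double>& x) {
        double s = 0.0;
        for (double v : x) s += v;
        return s / x.size();
    }

    double sample_variance(const std::vector<double>& x, double m) {
        double s = 0.0;
        for (double v : x) s += (v - m) * (v - m);
        return s / (x.size() - 1);
    }

    int main() {
        std::vector<double> base{0.52, 0.48, 0.50, 0.47, 0.51};  // base system rewards
        std::vector<double> mcl {0.61, 0.58, 0.63, 0.60, 0.59};  // MCL-enhanced rewards

        double m1 = mean(base), m2 = mean(mcl);
        double v1 = sample_variance(base, m1), v2 = sample_variance(mcl, m2);
        double n1 = base.size(), n2 = mcl.size();

        // Pooled variance and t statistic with n1 + n2 - 2 degrees of freedom.
        double pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
        double t = (m2 - m1) / std::sqrt(pooled * (1.0 / n1 + 1.0 / n2));
        std::printf("t = %.3f with %.0f degrees of freedom\n", t, n1 + n2 - 2);
        return 0;
    }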

4.1.2 Additional Computational Resources Required

The CPU, wall clock, working set size, and partition size will be measured for the

base, optimized, and MCL-enhanced versions of the host systems. The unpaired t test will

be used to determine if any additional resource usage of the MCL system is nominal or

significant.
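A minimal sketch of how the wall clock, CPU, and working set measurements could be collected on the Linux platform, assuming the POSIX gettimeofday() and getrusage() calls are available (run_host_system() is a placeholder for one base, optimized, or MCL-enhanced run):

#include <sys/resource.h>   // getrusage()
#include <sys/time.h>       // gettimeofday()
#include <cstdio>

void run_host_system() { /* one evaluation run of the host system goes here */ }

int main() {
    timeval t0, t1;
    gettimeofday(&t0, 0);
    run_host_system();
    gettimeofday(&t1, 0);

    rusage ru;
    getrusage(RUSAGE_SELF, &ru);    // CPU time and peak working set of this process
    double wall = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    double cpu  = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
                + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    // ru_maxrss is reported in kilobytes on Linux.
    std::printf("wall %.2fs  cpu %.2fs  maxrss %ld kB\n", wall, cpu, ru.ru_maxrss);
    return 0;
}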

4.1.3 Implementation Effort

The number of source lines of code will be counted for the base, optimized, and MCL-

enhanced versions of the host systems. Many of the lines that needed to be added to support

MCL are the same (or nearly the same) in most implementations. The number of these “boilerplate” lines will be counted separately from the custom code.

4.1.4 Breadth of Deployment

The metacognitive loop is to be available on multiple platforms and for multiple computer

languages. Table 4.1 shows the platforms and languages that MCL will be tested on to

ensure that MCL can be widely used.

The TBD entries for OS X and Solaris reflect the dependency on the Intel-originated

Bayesian Inference Library: PNL. While it is hoped that the PNL library will work on

non-Intel processors, this has not yet been attempted.

Table 4.1. MCL Platforms and Languages

platform    C     C++   Java   Python   Soar
Windows     Yes   Yes   Yes    Yes      Yes
Linux       Yes   Yes   Yes    Yes
Mac OS X    TBD   TBD   TBD    TBD      TBD
Solaris     TBD   TBD   TBD    TBD

4.2 Evaluation Domains

Several cognitive systems will be augmented with the enhanced MCL to demonstrate

increased resilience to brittleness due to changes in the environment. These include do-

mains investigated in the initial MCL literature and new domains.

4.2.1 Chippy

The Chippy grid world (Anderson et al. 2006) is an 8 by 8 square matrix as shown in

figure 4.1. An agent (in this case a chipmunk) can move in the four cardinal directions from


square to square. An attempt to move off the board from one of the edge squares leaves

you in the same square. The lower left (R1) and upper right (R2) squares provide rewards

and transport the agent (chipmunk) to the opposite corner. The agent starts in one of the

center squares and continues to move (and occasionally transport) until the simulation is

stopped.


FIG. 4.1. An 8×8 “Chippy” grid world with two rewards

With R1 = 10 and R2 = -10, Figure 4.2 shows the policy learned by a Q-Learner in a Chippy grid world after 1,000 moves. Since only 2 of the 64 squares contain a reward, the

Q-Learner makes many moves (on average about 98) before even seeing the first reward

so that learning can begin. By about 5,000 moves, an optimal policy has been learned

(Figure 4.3) which achieves a reward of 10 every 14 moves. Note that a large portion of the

grid remains unexplored. Even after a million moves (Figure 4.4), it is possible that some

squares may never be visited.
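For reference, a minimal sketch of the tabular Q-learning update the Chippy agent relies on is given below; the flat 64-state encoding and function names are illustrative, and the actual agent is the C++ implementation described in chapter 5.

#include <algorithm>
#include <cstdlib>

const int STATES = 64;      // the 8x8 grid, flattened
const int ACTIONS = 4;      // the four cardinal moves
double Q[STATES][ACTIONS];  // action values, zero-initialized at startup

// Epsilon-greedy action selection.
int choose_action(int s, double epsilon) {
    if (std::rand() / (double)RAND_MAX < epsilon)
        return std::rand() % ACTIONS;                       // explore
    return std::max_element(Q[s], Q[s] + ACTIONS) - Q[s];   // exploit
}

// One Q-learning update for the transition (s, a) -> (s2, reward).
void q_update(int s, int a, double reward, int s2,
              double alpha, double gamma) {
    double best_next = *std::max_element(Q[s2], Q[s2] + ACTIONS);
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a]);
}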



FIG. 4.2. A Chippy policy after 1,000 moves


FIG. 4.3. A Chippy policy after 5,000 moves



FIG. 4.4. A Chippy policy after 1,000,000 moves

Perturbation Perturbation is introduced into the Chippy Grid World by swapping

the values of the two goal squares (R1 and R2). Using R1 = 10 and R2 = -10 as above,

Figure 4.5 shows the average rewards earned by a Q-Learner in the Chippy grid world.

With the standard parameters (α = 0.5, γ = 0.9, ε = 0.05), Q-Learning produces a policy

that converges in about 5000 steps as can be seen by the flattening of the curve. From that

point onward it gets a reward of 10 every 14 steps plus an occasional exploratory move.

If this were to remain a static world, the exploration rate could be lowered to zero and

we could achieve a slightly higher average reward (7.1). Keeping some exploration proves

very useful if the world changes. In figure 4.6 the rewards of (10, -10) are changed to (-10,

10) at step 10,000. The Q-Learner continues to adjust its policy based on

the rewards received and eventually achieves a new policy with a reward of 10 every 14+

moves. If, however, we turn off exploration to collect the extra reward once the optimum policy is learned, then after the perturbation the best we can do is learn a very sub-optimal


[Plot omitted: average reward (y axis) versus steps (x axis, 0 to 20,000); learning rate = 0.5, discount rate = 0.9, exploration = 0.05.]

FIG. 4.5. Q-Learning in the Chippy Grid World

(but at least positive) policy, as shown in figure 4.7.
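The perturbation itself is a one-line change; a sketch with illustrative variable names is:

#include <algorithm>   // std::swap

double reward_R1 = 10.0;    // lower-left corner
double reward_R2 = -10.0;   // upper-right corner

// Called once per step; swaps the two corner rewards at the perturbation point.
void maybe_perturb(long step) {
    if (step == 10000)
        std::swap(reward_R1, reward_R2);   // (10, -10) becomes (-10, 10)
}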

[Plot omitted: average reward versus steps (0 to 20,000), before and after the perturbation; annotations: “What we learned in 5,000 steps” and “Takes almost 10,000 to relearn.”]

FIG. 4.6. Q-Learning before and after perturbation

4.2.2 Windy Grid World

The windy grid world (Sutton & Barto 1998) is a 7×10 matrix as shown in Figure 4.8.

An agent can move in the four cardinal directions from square to square. Attempting to

move off the board from one of the edge squares leaves you in the same square. The

starting (0,3) and goal (7,3) squares are labeled “S” and “G” respectively. Each move

receives a reward of −1 until the goal is reached. Movement is affected by a “northerly


[Plot omitted: average reward versus steps (0 to 20,000); annotations: “Turning off exploration increases rewards once the optimum policy has been learned” and “But not after a perturbation; no exploration means a very long time to learn the new optimum policy.”]

FIG. 4.7. Q-Learning with exploration rate set to 0 after policy learned

wind” that offsets movement one or two squares upward (as indicated by the single and

double arrows).
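A minimal sketch of one windy-grid-world transition is given below; the per-column wind strengths shown are those of the Sutton & Barto example cited above, and the coordinate convention (y increasing northward) is an assumption of the sketch.

#include <algorithm>

const int WIDTH = 10, HEIGHT = 7;
const int WIND[WIDTH] = {0, 0, 0, 1, 1, 1, 2, 2, 1, 0};  // upward push per column

struct Pos { int x, y; };

// Apply one cardinal move (dx, dy in {-1, 0, +1}), then the wind of the column
// the agent started in, clipping the result to the edges of the board.
Pos step(Pos p, int dx, int dy) {
    Pos next;
    next.x = std::min(WIDTH - 1, std::max(0, p.x + dx));
    next.y = std::min(HEIGHT - 1, std::max(0, p.y + dy + WIND[p.x]));
    return next;
}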

Figure 4.9 shows a path from the start to the goal and demonstrates how the winds

offset movement. Movement right (east) from the start is unaffected until point a is reached.

If there were no wind, another east move would go to b, but instead the movement is to c. Another eastern action (with a northerly shift) moves to d. Here the wind is even

stronger, causing a two space shift. Moving east from d goes to e which is only a single

upward shift but it is limited by the edge of the world. The path to the goal continues east

to the upper right square f . Now, unopposed by the wind, four southern actions lead to

square g. No wind alters the westward movement to h. A second westward action (with a

northern offset) gets to the goal. This path (which is the shortest possible) takes 15 moves.

Unless placed on an edge, the goal in a grid world should be accessible from the four

cardinal directions. The wind offset alters the spaces that lead to the goal. Figure 4.10

shows the two squares that lead to the goal and the direction that leads there. It also marks

(with an “X”) the seven squares that cannot be reached.

Figure 4.11 shows the number of completed episodes increasing (the length of the path taken from S to G decreasing) as a SARSA reinforcement learner trains. Figure 4.12 shows the


policy at the end of the 170 episodes. Figure 4.13 shows the optimum policy for the Windy

Grid World. Goal, near-goal, and inaccessible squares should all be discernible in any

reasonably complete policy learned for the windy grid world.


FIG. 4.8. The windy grid world with both single and double offsetting columns

Perturbation Perturbation can be added to the Windy Grid World by changing the

strength and direction of the “winds”. The Seasonal Windy Grid World varies the winds

according to a fixed repeating pattern given in Table 4.2. The normal Windy Grid World is

the “summer” season with strong winds from the south. The “winter” season reverses the

direction of those winds. The “spring” and “fall” seasons have only strength-one winds where the Windy Grid World has its strength-two winds. The two “equinox” seasons have no winds.

The number of steps that the world is in each season is called the rotational speed.
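A sketch of how the season in effect could be derived from the step counter and the rotational speed (the function and variable names are illustrative):

#include <string>

// The fixed repeating pattern of Table 4.2.
const std::string SEASONS[6] = {"Summer", "Equinox", "Fall",
                                "Winter", "Equinox", "Spring"};

// Returns the season in effect at the given step, where rotational_speed is
// the number of steps the world spends in each season.
std::string season_at(long step, long rotational_speed) {
    return SEASONS[(step / rotational_speed) % 6];
}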

4.2.3 WinBolo

WinBolo (Morrison 2006) is a networked simulation game that has multiple players,

alone or in teams, driving tanks on an island world (Figure 4.14). The players explore the



FIG. 4.9. A 15 step path from the start to the goal


FIG. 4.10. The two moves that lead to the goal and the seven squares that cannot be entered


[Plot omitted: “Windy World SARSA Learning”; episodes (y axis, 0 to 180) versus time steps (x axis, 0 to 10,000).]

FIG. 4.11. SARSA exploration of the Windy Grid World showing cumulative moves over multiple episodes


FIG. 4.12. A Windy grid world policy learned by SARSA after 170 episodes



FIG. 4.13. Windy grid world optimum policy with Q values. The best path from the start to the goal is underlined.

Table 4.2. Wind Speed and Directions in the Seasonal Windy Grid World

Season     Strength   Direction
Summer     Strong     South
Equinox    None
Fall       Weak       North
Winter     Strong     North
Equinox    None
Spring     Weak       South


FIG. 4.14. A WinBolo console with menus, actions, primary and secondary displays.

island, capture resources, and attack other players’ tanks. In tournament play, the goal is to

have the tank capture and hold the most resources in a time-limited contest. WinBolo was

derived from Bolo, a Mac 68K game that was, in turn, inspired by an older (two-player)

video game. Each WinBolo player runs a copy of WinBolo that connects to a WinBolo

server (running on either Windows or Linux). In single-player games, the same Windows

machine is both the client and the server.

A player uses the keyboard to drive the tank, turning left (O) and right (P), speeding

up (Q), slowing down (A), shooting (space), and laying mines (tab). The player drives over

refueling bases to capture them (as well as pillboxes once they have been shot sufficiently

to reduce their armor to zero). The object of the game is to capture all of the refueling


bases. Captured pillboxes can be relocated and used to defend your bases. The player can

also build roads, bridges, and buildings provided that enough trees have been harvested to

obtain the raw materials.

It is the complexities of deciding whether to attack or defend, to harvest or build, and

to use speed or stealth, in combination with complex terrain and multiple agents, that make

WinBolo a suitably rich environment for AI research. Version 1.15 of WinBolo (the latest

version as of March 2007) will be used for this research.

WinBolo’s API WinBolo calls a program using its C language API a “brain.” The

API is defined in a single file, “brain.h”, and described in a short text document (Morrison

& Cheshire 2000). The API defines a single function, BrainMain(), that WinBolo calls,

giving control (briefly) to the brain. It is called once for initialization (BRAIN_OPEN),

multiple times during the course of a game (BRAIN_THINK), and then once more at ter-

mination (BRAIN_CLOSE). The brain code is given the state of the world from a large

C structure called BrainInfo. From this sensor information, the brain should decide on a

plan of action and then modify certain elements of the interface structure to implement its

actions. The brain code is compiled into a Windows DLL with a file type of “BRN”. The

brain code is activated by choosing it from the “Brains” menu of the WinBolo user console.

The BrainInfo structure contains information about the player’s tank (location on a

65536 × 65536 plane, speed (0-64), direction (0-255), and others), the terrain¹ near the tank (mapped onto a 256 × 256 grid), and the location and status of nearby tanks, pillboxes, and

bases. There is also static information such as the interface version number.

To indicate an action, the brain makes changes to the BrainInfo structure. Only

certain changes are allowed. The most important variables are holdkeys and tapkeys

¹WinBolo comes with a built-in map, “Everard Island.” There are also other maps included in the download. More importantly, map editors are available for custom terrain maps.


which are used to request single or multiple changes in direction and speed, or to shoot

the gun. Setting items in the BuildInfo structure allows harvesting or building at the

specified map coordinates. You can also send text messages to one or more players. This is

used for coordination in multi-player (agent) tournaments and can be useful as a debugging

tool.
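The skeleton below illustrates the open/think/close calling pattern just described. It is not the real brain.h: the structure shown is a heavily simplified stand-in, the operation field and the constants' values are assumptions, and only the field names quoted in the text and in Figure 5.5 are taken from the actual interface.

// Hypothetical stand-in for the declarations in brain.h (illustration only).
enum { BRAIN_OPEN, BRAIN_THINK, BRAIN_CLOSE };

struct BrainInfo {            // drastically simplified
    int  operation;           // which of the three calls this is (assumed field)
    int  tankx, tanky;        // a few of the sensor fields named in Figure 5.5
    long holdkeys, tapkeys;   // the action-request fields named in the text
};

extern "C" int BrainMain(BrainInfo* info) {
    switch (info->operation) {
    case BRAIN_OPEN:   /* one-time initialization                   */ break;
    case BRAIN_THINK:  /* read sensors, then set holdkeys / tapkeys */ break;
    case BRAIN_CLOSE:  /* one-time cleanup                          */ break;
    }
    return 0;   // the meaning of the return value is also an assumption
}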

Perturbation There is a small amount of randomness inherent in the WinBolo world

introduced by network and processing speeds. Larger scale perturbation will be added by

training an agent to perform on one Bolo map and then testing it on another. A WinBolo map

allows (almost) complete customization of the WinBolo world in terms of tank start points,

terrain features (forests, rivers, buildings, etc.), and the location and strength of pillboxes. An

example of such a perturbation is to train a WinBolo tank on a map where all the pillboxes

are strength 0 and then to deploy the tank on a map where the pillboxes are strength 1.


Chapter 5

PRELIMINARY RESULTS

I joined the UMBC/UMCP MCL working group in the spring of 2006. This group

included Tim Oates and Matt Schmill of UMBC and Don Perlis, Mike Anderson, Darsana

Josyula and others from UMCP. My initial assignment was porting an MCL-enhanced Bolo agent from Linux to Windows and constructing WinBolo maps that would test the agent’s response to perturbation.

In the summer and fall of 2006, I joined the others in producing the Indications, Failure, and Response ontologies and in preparing papers that discussed using them

with MCL. These are the ontologies presented in the Technical Approach chapter. As Matt

Schmill’s implementation of an ontology-based MCL progressed, I provided the Windows

port and assisted in the detection and correction of problems and the implementation of

missing features.

Along with assisting in the group efforts on the ontologies and the WinBolo domain,

I also worked on creating programs for evaluating MCL in the Chippy and Windy World

domains.

The next three sections discuss the progress made in constructing the test domains and

the initial testing of MCL with Bayesian inference for the Chippy grid world.


5.1 Grid World Implementation

In the Methodology chapter of this proposal, the test domains that will be used are

described. The Chippy and Windy grid worlds have been implemented in C++. Both

the baseline and non-MCL optimized versions of these agents have been created. The

optimized versions were found by varying the learning parameters across a wide range and

then using the parameters that produced the highest reward after perturbation.

The optimization effort was carried out by varying the learning rate (Figure 5.1), the

exploration rate (Figure 5.2), and the discount rate (Figure 5.3). In each case a single

parameter was changed, leaving the other two parameters at their nominal values (α =

0.5, γ = 0.9, ε = 0.05).
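A sketch of the one-parameter-at-a-time sweep follows; run_chippy() is a placeholder standing in for a full Q-learning run that returns the total post-perturbation reward.

#include <cstdio>

// Placeholder: run the Chippy Q-learner with the given parameters and return
// the total reward received after the perturbation.
double run_chippy(double alpha, double gamma, double epsilon) { return 0.0; }

int main() {
    const double ALPHA = 0.5, GAMMA = 0.9, EPSILON = 0.05;   // nominal values
    for (int i = 1; i <= 9; ++i) {                           // vary alpha only
        double a = i * 0.1;
        std::printf("alpha=%.1f reward=%.2f\n", a, run_chippy(a, GAMMA, EPSILON));
    }
    for (int i = 1; i <= 9; ++i) {                           // vary epsilon only
        double e = i * 0.01;
        std::printf("epsilon=%.2f reward=%.2f\n", e, run_chippy(ALPHA, GAMMA, e));
    }
    for (int i = 1; i <= 9; ++i) {                           // vary gamma only
        double g = i * 0.1;
        std::printf("gamma=%.1f reward=%.2f\n", g, run_chippy(ALPHA, g, EPSILON));
    }
    return 0;
}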

The total reward received after perturbation is given in Table 5.1. The value for the best

parameter is highlighted. The learning rate (α) was best at 0.9 with all of the high values

being better than the low ones. A higher learning rate allowed quicker replacement of the

old Q values. The exploration rate (ε) that returned the highest reward was 0.06 which is not

too far from the nominal value of 0.05. The best path in Chippy is the same before and after

perturbation, although the direction of travel changes. Too low an exploration rate keeps the agent from learning the new direction quickly, and too high a value prevents exploiting the optimum

path once it is learned. The discount rate (γ) that achieved the best reward, 0.6, was much lower than the normal 0.9. The reward earned with (α = 0.5, γ = 0.6, ε = 0.05), at 5,333, was the best of the post-perturbation rewards. For Chippy, lowering the discount rate improved post-perturbation performance more than changing the learning or exploration rates.


FIG. 5.1. Effect on Chippy perturbation recovery of varying the learning rate

Table 5.1. Rewards for Chippy after perturbation with varying Q-Learning rates

alpha   rewards     epsilon   rewards     gamma   rewards
.1      619.64      .01       3320.31     .1      4580.51
.2      3079.54     .02       3938.82     .2      4933.39
.3      3916.66     .03       4225.87     .3      5227.81
.4      4427.12     .04       4353.92     .4      5199.27
.5      4587.84     .05       4486.38     .5      5256.53
.6      4624.60     .06       4629.67*    .6      5333.90*
.7      4809.90     .07       4595.64     .7      5099.67
.8      4776.69     .08       4590.09     .8      4929.31
.9      5123.45*    .09       4618.18     .9      4589.14
(* highest reward in each column pair)


FIG. 5.2. Effect on Chippy perturbation recovery of varying the exploration rate


FIG. 5.3. Effect on Chippy perturbation recovery of varying the discount rate


5.2 BoloSoar Implementation

An initial implementation of the WinBolo/Soar interface was done as part of course

work for a class on Agent Architecture and Multi-Agent Systems. In a demonstration, a

WinBolo tank controlled by a set of Soar rules found the solution to a small maze using a

random walk (see Figure 5.4).

FIG. 5.4. WinBolo tank outside small maze

Connecting WinBolo to Soar required putting the WinBolo tank’s sensor information

onto Soar’s input-link. A portion of the code to accomplish this is shown in Figure 5.5. Once the sensor information has been set, control is turned over to Soar, which applies the rules in its knowledge base and then puts actions on the output-link. Fig-

ure 5.6 shows the three Soar rules used to land the tank. There were a total of 36 rules

defined in the random walk demonstration.


/* 3. Put tank information on ^input-link */
integer_to_input_link(pInputLink, &pspeed,   "speed",     brainInfo->speed);
integer_to_input_link(pInputLink, &pdir,     "direction", brainInfo->direction);
integer_to_input_link(pInputLink, &ptankx,   "tankx",     brainInfo->tankx);
integer_to_input_link(pInputLink, &ptanky,   "tanky",     brainInfo->tanky);
integer_to_input_link(pInputLink, &pinboat,  "inboat",    brainInfo->inboat);
integer_to_input_link(pInputLink, &pnewtank, "newtank",   brainInfo->newtank);

FIG. 5.5. C code to add WinBolo status information to Soar’s input structure

5.3 Chippy Agent With Ontology-Based MCL

The Indications, Failure, and Response ontologies have been created but are still being

revised as are the linkages between the ontologies and the conditional probability tables for

inter- and intra-ontology links. These have progressed far enough for initial testing. A

Q-Learning agent for the Chippy Grid World was enhanced (via the C++ MCL API) to

set expectations and receive suggestions from MCL. Figures 5.7, 5.8, 5.9, and 5.10 show

the code added to the agent for initialization, defining sensors, setting expectations, and

implementing MCL’s suggestions.

Figure 5.11 shows the results for Chippy with and without MCL.


# ------------------------------------------------------
# landing phase
# 1. If still in the boat, speed up to get to shore
# 2. Once we reach shore, slow down
# 3. Once on shore and stopped, landing phase is done
# ------------------------------------------------------

sp {propose*lp-speed-up
   (state <s> ^io.input-link <i>)
   (<s> ^phase landing)
   (<i> ^inboat 1)
-->
   (<s> ^operator <o> +)
   (<o> ^name speed
        ^value increase) }

sp {propose*lp-slow-down
   (state <s> ^io.input-link <i>)
   (<s> ^phase landing)
   (<i> ^inboat 0
       -^speed 0)
-->
   (<s> ^operator <o> +)
   (<o> ^name speed
        ^value decrease) }

sp {propose*lp-end-landing
   (state <s> ^io.input-link <i>)
   (<s> ^phase landing)
   (<i> ^inboat 0
        ^speed 0)
-->
   (<s> ^operator <o> +)
   (<o> ^name end-landing) }

FIG. 5.6. Soar rules to land tank


// 1. Introduce ourselves to MCL
mclAPI::initializeMCL("Chippy2", 0);

// 2. Define properties
mclAPI::setPV(PCI_INTENTIONAL, PC_NO);
mclAPI::setPV(PCI_EFFECTORS_CAN_FAIL, PC_NO);
mclAPI::setPV(PCI_SENSORS_CAN_FAIL, PC_NO);
mclAPI::setPV(PCI_PARAMETERIZED, PC_YES);
mclAPI::setPV(PCI_DECLARATIVE, PC_NO);
mclAPI::setPV(PCI_RETRAINABLE, PC_YES);
mclAPI::setPV(PCI_HLC_CONTROLLING, PC_NO);
mclAPI::setPV(PCI_HTN_IN_PLAY, PC_NO);
mclAPI::setPV(PCI_PLAN_IN_PLAY, PC_NO);
mclAPI::setPV(PCI_ACTION_IN_PLAY, PC_NO);

mclAPI::setPV(CRC_IGNORE, PC_YES);
mclAPI::setPV(CRC_NOOP, PC_YES);
mclAPI::setPV(CRC_TRY_AGAIN, PC_YES);
mclAPI::setPV(CRC_SOLICIT_HELP, PC_NO);
mclAPI::setPV(CRC_RELINQUISH_CONTROL, PC_NO);
mclAPI::setPV(CRC_SENSOR_DIAG, PC_NO);
mclAPI::setPV(CRC_EFFECTOR_DIAG, PC_NO);
mclAPI::setPV(CRC_ACTIVATE_LEARNING, PC_YES);
mclAPI::setPV(CRC_ADJ_PARAMS, PC_NO);
mclAPI::setPV(CRC_REBUILD_MODELS, PC_YES);
mclAPI::setPV(CRC_REVISIT_ASSUMPTIONS, PC_NO);
mclAPI::setPV(CRC_AMEND_CONTROLLER, PC_NO);
mclAPI::setPV(CRC_REVISE_EXPECTATIONS, PC_YES);
mclAPI::setPV(CRC_ALG_SWAP, PC_NO);
mclAPI::setPV(CRC_CHANGE_HLC, PC_NO);

FIG. 5.7. C++ code to initialize MCL interface for Chippy


// 3. Define the sensors
mclAPI::registerSensor("step");     // [0]
mclAPI::registerSensor("old_x");    // [1]
mclAPI::registerSensor("old_y");    // [2]
mclAPI::registerSensor("new_x");    // [3]
mclAPI::registerSensor("new_y");    // [4]
mclAPI::registerSensor("reward");   // [5]
mclAPI::registerSensor("reward0");  // [6]
mclAPI::registerSensor("reward1");  // [7]

// 4. Define the property values for the sensors
mclAPI::setSensorProp("step",    PROP_DT, DT_INTEGER);  // [0]
mclAPI::setSensorProp("old_x",   PROP_DT, DT_INTEGER);  // [1]
mclAPI::setSensorProp("old_y",   PROP_DT, DT_INTEGER);  // [2]
mclAPI::setSensorProp("new_x",   PROP_DT, DT_INTEGER);  // [3]
mclAPI::setSensorProp("new_y",   PROP_DT, DT_INTEGER);  // [4]
mclAPI::setSensorProp("reward",  PROP_DT, DT_INTEGER);  // [5]
mclAPI::setSensorProp("reward0", PROP_DT, DT_INTEGER);  // [6]
mclAPI::setSensorProp("reward1", PROP_DT, DT_INTEGER);  // [7]

mclAPI::setSensorProp("step",    PROP_SCLASS, SC_TEMPORAL);  // [0]
mclAPI::setSensorProp("old_x",   PROP_SCLASS, SC_SPATIAL);   // [1]
mclAPI::setSensorProp("old_y",   PROP_SCLASS, SC_SPATIAL);   // [2]
mclAPI::setSensorProp("new_x",   PROP_SCLASS, SC_SPATIAL);   // [3]
mclAPI::setSensorProp("new_y",   PROP_SCLASS, SC_SPATIAL);   // [4]
mclAPI::setSensorProp("reward",  PROP_SCLASS, SC_REWARD);    // [5]
mclAPI::setSensorProp("reward0", PROP_SCLASS, SC_REWARD);    // [6]
mclAPI::setSensorProp("reward1", PROP_SCLASS, SC_REWARD);    // [7]

FIG. 5.8. C++ code to define the sensors for Chippy using the MCL API


// 5. Define the expectation group.
// We will add the expectations when we get the rewards.
mclAPI::declareExpectationGroup((void *)this);

// Set reward expectation 0 or 1
char sensor_name[15];
sprintf(sensor_name, "reward%d", index);
expected[index] = reward;
mclAPI::declareExpectation((void *)this,
                           sensor_name,
                           EC_MAINTAINVALUE,
                           (float) reward);

FIG. 5.9. C++ code to define the expectations for Chippy using the MCL API

// 5. Tell MCL what we know
responseVector m = mclAPI::monitor(sensors, 8);

// 6. Evaluate the suggestions from MCL
if (m.size() > 0) {
    int q = 1;
    for (responseVector::iterator rvi = m.begin();
         rvi != m.end();
         rvi++) {
        cout << "response[ref" << hex
             << (*rvi)->referenceCode()
             << "] #" << q++ << ": "
             << (*rvi)->responseText() << endl;
    }
}

FIG. 5.10. C++ code for Chippy to implement suggestions from MCL


FIG. 5.11. Average Rewards per Step for Chippy with and without MCL monitoring


Chapter 6

RELATED WORK

In (Cox 2005), Michael Cox provides a survey of selected AI metacognition research

areas through 2000 (and a little beyond). Newer research is surveyed in (Anderson &

Oates 2007). This chapter will contrast several of the projects from the two surveys with

the ontology-based Metacognitive Loop. The chapter starts with a little reflection, looking

at pre-ontology, and early-ontology MCL papers. It ends with a section on a topic not

covered in the survey papers, monitoring multi-agent systems.

6.1 Pre-ontology Metacognitive Loop

Both surveys reference (Anderson & Perlis 2005) when discussing the Metacognitive

Loop. The paper describes the problem of brittleness in AI systems due to the lack of

perturbation tolerance. The Metacognitive Loop with its notice, assess, and guide phases

is offered as a solution. A trio of problem domains (reinforcement learning, navigation, and

human-computer dialogue) are shown to benefit from adding MCL. Three research areas

are proposed corresponding to the three MCL phases:

1. How should expectations be formulated to best track the performance of the systems?

2. How should the reasoning about the exceptions be organized?


3. What are the best strategies for guiding a system back to proper operation?

The first and third questions remain open issues. Bayesian inference over three sets of ontologies is thought to be the answer to the second question, and the proposed research should

show it to be an effective approach.

In (Anderson et al. 2006), three alternatives to the incorporation of MCL to deal with

perturbations are offered:

1. Do nothing,

2. Incorporate a recovery strategy for every possible problem, and

3. Create an extensive world model and continually compare the actual and predicted

performance.

The first of these approaches offers nothing except ease of implementation while the last

two are too expensive to use. MCL is offered as a cost-effective alternative, as it has only a moderate cost and can greatly improve a system’s tolerance to perturbation. This is demon-

strated with the Chippy grid world (see section 4.2.1). Perturbation in Chippy was used to

explore different expectations (average reward and steps between rewards), different assess-

ment techniques (immediate and cumulative), and different recovery strategies (increasing

the exploration rate and resetting the Q values). All of this was tailored for Q-learning

and would not be applicable to other types of cognitive systems. The approach outlined in this proposal should produce the same perturbation tolerance as observed in the MCL-

enhanced Chippy but will be generally applicable.

6.2 Early Ontology Metacognitive Loop

In 2007 a series of papers were published giving a preview of an ontology-based MCL.

The use of ontologies is introduced in (Anderson et al. 2007b). It describes the three


ontologies and how they are linked together. The tank game, Bolo, is used as an

example domain. However, instead of using Bayesian inference, the paper discusses how

reasoning is done from expectations to response by spreading activation. An expanded

version of the paper (Schmill et al. 2007) includes a human-computer dialogue example

as well as Bolo. The Chippy reinforcement learner is used as the example in (Anderson et

al. 2007a).

The above three papers introduced the idea of generalizing the Metacognitive Loop by

using domain-neutral ontologies in the Note, Assess, and Guide phases along with domain-specific expectations and corrective actions. This proposal replaces spreading activation with Bayesian inference, adds an application program interface, and addresses the problem of

reentrant and recursive invocation.

6.3 Model-Based Reflection

Model-based reflection (MBR) (Stroulia & Goel 1995; Stroulia & Goel 1996) also uses a three-phase approach to provide metacognition. The “monitor” phase checks

expectations, the “assign blame” phase determines the cause of the failure, and the “re-

design” phase makes the necessary corrections. But rather than using a general model of

cognitive systems and their failures, MBR uses a detailed model of the problem solver built from

structure-behavior-function (SBF) models. Having these models allows the expectations to

be automatically generated.

6.4 Multi-agent Metacognition

One approach to handling problems (or perturbations) in multi-agent systems is to task an agent with monitoring and controlling the other agents. These sentinel (Hägg 2000) agents act as a metacognitive control on the collection of task-solving agents (Fig-


ure 6.1). The sentinel monitors the communication between agents. If an agent is acting outside the model of the application-specific interaction plan, the sentinel can take action to correct the situation, such as killing the agent or informing other agents to ignore it.

Sentinels are also proposed in (Dellarocas & Klein 2000). But rather than an application-specific model of the agent interactions, a general-purpose three-phase monitoring

scheme augmented by a knowledge base of error conditions, causes, and responses is used.

Table 6.1 contrasts this approach with the MCL phases and ontologies.

Table 6.1. Monitor and control in MCL and Sentinels

MCL                       Sentinels
Phase     Ontology        Phase             KB
Notice    Indications     Instrumentation   Failure
Assess    Failure         Diagnosis         Exception
Guide     Response        Resolution        Resolution

The similarities between metacognition for single agents and sentinels for multi-agent systems allow the transfer of ideas between the two. This is particularly apparent when the fault-handling approach is abstracted using ontologies (or knowledge bases) to hold domain-specific information.


FIG. 6.1. Monitoring a multi-agent system with a sentinel is isomorphic to using metacognition with a single cognitive agent.


Chapter 7

FUTURE WORK

The work described in the Technical Approach and Methodology sections is considerable, and while it will greatly advance the utility of the Metacognitive Loop as an augmenta-

tion to cognitive systems, there are several areas for further enhancement. Applying MCL

to different domains and evaluating its effectiveness can be an ongoing effort, especially

with comparisons to other approaches in the domain. The following are suggestions for larger-scale extensions to the Metacognitive Loop that are beyond the scope of the proposed

effort.

7.1 Automatic Expectation Generation

The MCL NAG cycle starts when an exception has occurred. It is required that the

designer of the system specify the exception conditions. For many maintenance condi-

tions, these are fairly obvious and easy to create: internal temperature will not exceed 150

degrees, battery power will not drop below 5 percent.

Perhaps a better solution would be to specify a minimal set of absolute expectations

and then have MCL learn and incorporate additional expectations. This has several advan-

tages:

Minimizing human effort Since only a limited set of expectations would have to be given


to MCL, the domain programmer’s task would be limited to specifying that limited

set and not a broad range of expectations.

Minimizing the exception-checking overhead A system can have a great many mainte-

nance expectations - most of which will never be violated. Such expectations degrade

the system as they need to be checked against the sensor values just as often as ex-

pectations that may be violated. By generating expectations through learning, only

expectations that have the potential for violation would be created.

Improving problem detection As problems are identified, expectations could be created to detect them earlier. This would allow identification of a recurrence of a problem

early enough to avoid or lessen the consequences.

The automatic generation of expectations would have to be tempered with common-

sense reasoning or other heuristics to prevent generating expectations that do not improve

the efficiency of the host system.

7.2 Automatic Ontology Expansion/Linking

The structure (nodes and linkages) of the MCL ontologies was created based on experience with the initial problem domains and modified as additional domains and analyses were undertaken. It is, however, a static model and is certainly not optimal for all situations.

As with the automatic generation of expectations, the automatic generation of ontology

nodes and linkages has the potential to improve the operation of MCL, particularly in new

domains. The same caveat applies that any changes must improve the efficiency of the host

system and may require extensive heuristics to implement.


7.3 Application to Multi-agent systems

All of the problem domains discussed and evaluated in this proposal involve single agents. MCL can be directly applied to individual agents in a multi-agent system. These can be either normal agents or sentinel agents.

When using MCL to monitor and control multiple agents, additional concrete re-

sponses would be needed with the corresponding augmentations to the Response ontology.

The Failure ontology would need to be expanded to incorporate nodes for agent communi-

cation and coordination failures. Agents can be treated as both sensors and effectors of the

sentinel agents.

7.4 Transferring Learning with MCL Networks

As MCL works with a host system, the conditional probabilities on the intra-

ontological and inter-ontological links change to reflect the experience of what suggestions

were effective strategies in coping with the failed expectations. Each host system is differ-

ent with different expectations and available concrete responses, but there should be a way

to apply a tuned set of conditional probabilities from one system to another.

7.5 Modeling dynamic environments

A basic tenet of this paper is that MCL provides agents with a mechanism for coping

with dynamic environments so that such mechanisms do not need to be crafted into the

agents themselves. But what exactly is a dynamic environment, and how is it quantified? Is it possible to create quantitative or predictive models of dynamic environments, and how could these models be used to improve the operation of the NAG cycle within MCL?


Chapter 8

TIMETABLE

This section details a twenty-month timetable to enhance MCL. It covers research

activities, code development and testing, evaluation, and completion of the dissertation

materials with the final defense. The work concludes with a May 2009 commencement.

8.1 Activities

There are several activities that need to be accomplished, but they can be divided into

five major segments.

Bayesian Ontologies The NAG cycle of MCL will be reworked to use Bayesian inference

over Indications, Failure, and Response ontologies.

HTN Bolo and BoloSoar Implementation of a task planner in the Bolo tank domain both

in C++ and in Soar.

Reentrant and Recursive MCL Investigate and implement knowledge structures that

support reentrant and recursive invocations of MCL to deal with errors over time

and self-improvement.

Evaluation Evaluate the enhanced MCL in terms of performance, ease and breadth of implementation, and additional computational resources required.


Dissertation and Defense Complete, revise, and defend the dissertation.

Intertwined with these activities will be many support and minor research projects,

including

• conference and journal papers with partial results

• C++, C, and additional APIs for MCL

• porting MCL to various machines and operating systems

• documentation and demonstration programs for the MCL and BoloSoar APIs

8.2 Monthly Schedule

Table 8.1 gives a month-by-month breakdown of the activities described above.


Table 8.1. Monthly Timetable for MCL Reengineering

Year   Month   Number   Activity
2007   Aug              Proposal Defense
       Sep     1        Bayesian Ontology
       Oct     2        Bayesian Ontology
       Nov     3        Bayesian Ontology
       Dec     4        HTN Bolo
2008   Jan     5        HTN Bolo
       Feb     6        Bolo Soar
       March   7        Bolo Soar
       April   8        Reentrancy
       May     9        Reentrancy
       June    10       Reentrancy
       July    11       Recursive MCL
       Aug     12       Recursive MCL
       Sep     13       Recursive MCL
       Oct     14       Performance measurements
       Nov     15       Implementation measurements
       Dec     16       Computational Requirements
2009   Jan     17       Revise Dissertation
       Feb     18       Revise Dissertation
       Mar     19       Committee Review
       Apr     20       Dissertation Defense
       May              Commencement


REFERENCES

[1] Anderson, M. L., and Oates, T. 2007. A review of recent research in metareasoning

and metalearning. AI Magazine 28(1):7–16.

[2] Anderson, M. L., and Perlis, D. R. 2005. Logic, self-awareness and self-improvement:

the metacognitive loop and the problem of brittleness. Journal of Logic and Computa-

tion 15(1):21–40.

[3] Anderson, M. L.; Oates, T.; Chong, W.; and Perlis, D. 2006. The metacognitive

loop: Enhancing reinforcement learning with metacognitive monitoring and control for

improved perturbation tolerance. Journal of Experimental and Theoretical Artificial

Intelligence 18(3):387–411.

[4] Anderson, M. L.; Fults, S.; Josyula, D. P.; Oates, T.; Perlis, D.; Schmill, M. D.; Wilson,

S.; and Wright, D. 2007a. A self-help guide for autonomous systems. AI Magazine.

[5] Anderson, M. L.; Schmill, M.; Oates, T.; Perlis, D.; Josyula, D.; Wright, D.; and Wil-

son, S. 2007b. Toward domain-neutral human-level metacognition. In 8th International

Symposium on Logical Formalizations of Commonsense Reasoning, 1–6.

[6] Cox, M. T. 2005. Metacognition in computation: a selected research review. Artificial

Intelligence 169(2):104–141.

[7] Dellarocas, C., and Klein, M. 2000. An experimental evaluation of domain-

independent fault handing services in open multi-agent systems. In Proceedings of the

International Conference on Multi-agent Systems (ICMAS).

[8] Hägg, S. 2000. A sentinel approach to fault handling in multi-agent systems. In

Proceedings of the 2nd Australian Workshop on Distributed AI.


[9] Morrison, J., and Cheshire, S. 2000. How to write plug-in brains. www.winbolo.com.

Included in Winbolo distribution.

[10] Morrison, J. 2006. Lin-Winbolo Manual. www.winbolo.com. Included in Winbolo

distribution.

[11] Schmill, M.; Josyula, D.; Anderson, M. L.; Wilson, S.; Oates, T.; Perlis, D.; Wright,

D.; and Fults, S. 2007. Ontologies for reasoning about failures in ai systems. In

Workshop on Metareasoning in Agent-Based Systems.

[12] Stroulia, E., and Goel, A. K. 1995. Functional representation and reasoning for

reflective systems. Journal of Applied Intelligence 9(1):101–124.

[13] Stroulia, E., and Goel, A. K. 1996. A model-based approach to blame assignment:

Revising the reasoning steps of problem solvers. In Thirteenth Annual Conference on

Artificial Intelligence, 959–965. Portland, Oregon: AAAI Press.

[14] Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning, An Introduction.

Cambridge, MA: MIT Press.