Curriculum Vitae
Name: Dean Earl Wright III
Permanent Address: 422 Shannon Court, Frederick, MD 21701
Degree and date to be conferred: Doctor of Philosophy, May 2009.
Date of Birth: 27 November 1954.
Place of Birth: La Rochelle, France.
Secondary Education: McCluer High School, Florissant, Missouri.
Previous Degrees:
Hood College, Master of Business Administration, 2005
Hood College, Master of Science (Computer Science), 2001
Hood College, Bachelor of Science (Computer Science), 1998
Frederick Community College, Associate in Arts (Business Administration), 1993
Professional publications:
Michael L. Anderson, Matt Schmill, Tim Oates, Don Perlis, Darsana Josyula, Dean Wright and Shomir Wilson. Toward Domain-Neutral Human-Level Metacognition. In Proceedings of the 8th International Symposium on Logical Formalizations of Commonsense Reasoning, pages 1–6, 2007.
M. Schmill, D. Josyula, M. Anderson, S. Wilson, T. Oates, D. Perlis, D. Wright and S. Fults. Ontologies for Reasoning about Failures in AI Systems. In Proceedings of the Workshop on Metareasoning in Agent-Based Systems, May 2007.
Michael L. Anderson, Scott Fults, Darsana P. Josyula, Tim Oates, Don Perlis, Matthew D. Schmill, Shomir Wilson, and Dean Wright. A Self-Help Guide For Autonomous Systems. To appear in AI Magazine, 2007.
Professional positions held:
CRW/Logicon/Northrop Grumman, January 1985–July 2007
Scientific Time Sharing Corporation (STSC), January 1977–January 1985
ABSTRACT
Title of Thesis: Reengineering the Metacognitive Loop
Dean Earl Wright III, Ph.D. Computer Science, 200x
Dissertation directed by: Tim Oates, Professor
Department of Computer Science and Electrical Engineering
The field of Artificial Intelligence has seen steady advances in cognitive systems. But
many of these systems perform poorly when faced with situations outside of their training
or in a dynamic environment. This brittleness is a major problem in the field today.
Adding metacognition to such systems can improve their operation in the face of per-
turbations. The Metacognitive Loop (MCL) (Anderson et al. 2006) works with a host
system, monitoring its sensors and expectations. When a failure is indicated, MCL advises
the host system on corrective actions.
Past implementations of MCL have been hand crafted and tightly integrated into their
host systems. MCL is being reengineered to provide a C language API and to do Bayesian
inference over a set of indication, failure, and response ontologies. These changes will
allow MCL to be used with a wide variety of systems.
To prevent brittleness within MCL itself, several items need to be addressed. MCL
must be able to resolve host system failures when there is more than one indication of the
failure, or when a second indication occurs while MCL is attempting to help the host system
recover from the failure. MCL also needs the ability to monitor itself and improve
its own operation.
A twenty-month plan is proposed to enhance MCL as described and to measure (1) the
effectiveness of MCL in improving the operation of the host system; (2) MCL’s operational
efficiency in terms of additional computational resources required; and (3) the effort needed
to incorporate MCL into the host system.
Reengineering the Metacognitive Loop
by
Dean Earl Wright III
Thesis submitted to the Faculty of the Graduate School
of the University of Maryland in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2009
© Copyright Dean Earl Wright III 2007
TABLE OF CONTENTS
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Chapter 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2 BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Metacognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Metacognitive Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2 Assess . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.3 Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 3 TECHNICAL APPROACH . . . . . . . . . . . . . . . . . . . 12
3.1 MCL Ontologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Indications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.2 Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.3 Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.4 Intra-ontology linkages . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Support for Reasoning Over Time . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Recursive Invocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 C Language Application Program Interface . . . . . . . . . . . . . . . . . 29
3.6 Extending the API to Multiple Languages . . . . . . . . . . . . . . . . . . 29
Chapter 4 METHODOLOGY . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.1 Effectiveness in Improving Host System Operation . . . . . . . . . 31
4.1.2 Additional Computational Resources Required . . . . . . . . . . . 31
4.1.3 Implementation Effort . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.4 Breadth of Deployment . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Evaluation Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.1 Chippy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.2 Windy Grid World . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.3 WinBolo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Chapter 5 PRELIMINARY RESULTS . . . . . . . . . . . . . . . . . . . 45
5.1 Grid World Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 BoloSoar Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3 Chippy Agent With Ontology-Based MCL . . . . . . . . . . . . . . . . . . 51
Chapter 6 RELATED WORK . . . . . . . . . . . . . . . . . . . . . . . . 57
6.1 Pre-ontology Metacognitive Loop . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Early Ontology Metacognitive Loop . . . . . . . . . . . . . . . . . . . . . 58
6.3 Model-Based Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.4 Multi-agent Metacognition . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Chapter 7 FUTURE WORK . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.1 Automatic Expectation Generation . . . . . . . . . . . . . . . . . . . . . . 62
7.2 Automatic Ontology Expansion/Linking . . . . . . . . . . . . . . . . . . . 63
7.3 Application to Multi-agent systems . . . . . . . . . . . . . . . . . . . . . . 64
7.4 Transferring Learning with MCL Networks . . . . . . . . . . . . . . . . . 64
7.5 Modeling dynamic environments . . . . . . . . . . . . . . . . . . . . . . . 64
Chapter 8 TIMETABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.1 Activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.2 Monthly Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
LIST OF FIGURES
2.1 Metacognitive Monitoring and Control . . . . . . . . . . . . . . . . . . . . 7
2.2 Software agent interactions with the environment . . . . . . . . . . . . . . 7
2.3 Software agent with metacognition . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Host systems with MCL support . . . . . . . . . . . . . . . . . . . . . . . 9
3.1 Ontologies as additional metaknowledge . . . . . . . . . . . . . . . . . . . 14
3.2 Ontological Linkages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Divergence Nodes in the Indications Ontology . . . . . . . . . . . . . . . . 17
3.4 Failure Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.5 Response Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.6 Example Ontology Connections . . . . . . . . . . . . . . . . . . . . . . . 23
3.7 Conditional probability tables for portion of MCL response ontology . . . . 24
3.8 Reentrant MCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.9 Expectations arranged in single (a) and multiple groups (b) . . . . . . . . . 26
3.10 MCL providing metacognitive services to MCL . . . . . . . . . . . . . . . 28
4.1 An 8x8 “Chippy” grid world with two rewards . . . . . . . . . . . . . . . . 33
4.2 A Chippy policy after 1,000 moves . . . . . . . . . . . . . . . . . . . . . . 34
4.3 A Chippy policy after 5,000 moves . . . . . . . . . . . . . . . . . . . . . . 34
4.4 A Chippy policy after 1,000,000 moves . . . . . . . . . . . . . . . . . . . 35
4.5 Q-Learning in the Chippy Grid World . . . . . . . . . . . . . . . . . . . . 36
4.6 Q-Learning before and after perturbation . . . . . . . . . . . . . . . . . . . 36
4.7 Q-Learning with exploration rate set to 0 after policy learned . . . . . . . . 37
4.8 The windy grid world with both single and double offsetting columns . . . 38
4.9 A 15 step path from the start to the goal . . . . . . . . . . . . . . . . . . . 39
4.10 The two moves that lead to the goal and the seven squares that can not be
entered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.11 SARSA exploration of the Windy Grid World showing cumulative moves
over multiple episodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.12 A Windy grid world policy learned by SARSA after 170 episodes . . . . . 40
4.13 Windy grid world optimum policy with Q values. The best path from the
start to the goal is underlined. . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.14 A WinBolo console with menus, actions, primary and secondary displays. . 42
5.1 Effect on Chippy perturbation recovery of varying the learning rate . . . . . 47
5.2 Effect on Chippy perturbation recovery of varying the exploration rate . . . 48
5.3 Effect on Chippy perturbation recovery of varying the discount rate . . . . . 49
5.4 WinBolo tank outside small maze . . . . . . . . . . . . . . . . . . . . . . 50
5.5 C code to add WinBolo status information to Soar’s input structure . . . . . 51
5.6 Soar rules to land tank . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.7 C++ code to initialize MCL interface for Chippy . . . . . . . . . . . . . . 53
5.8 C++ code to define the sensors for Chippy using the MCL API . . . . . . . 54
5.9 C++ code to define the expectations for Chippy using the MCL API . . . . 55
5.10 C++ code for Chippy to implement suggestions from MCL . . . . . . . . . 55
5.11 Average Rewards per Step for Chippy with and without MCL monitoring . 56
6.1 Monitoring a multi-agent system with a sentinel is isomorphic to using
metacognition with a single cognitive agent. . . . . . . . . . . . . . . . . . 61
LIST OF TABLES
1.1 Research contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1 Ontologies used with NAG cycle . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Indication Ontology Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 Sensor, Divergence and Core Node Indications Ontology Linkages . . . . . 18
3.4 Concrete Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1 MCL Platforms and Languages . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Wind Speed and Directions in the Seasonal Windy Grid World . . . . . . . 41
5.1 Rewards for Chippy after perturbation with varying Q-Learning rates . . . . 47
6.1 Monitor and control in MCL and Sentinels . . . . . . . . . . . . . . . . . . 60
8.1 Monthly Timetable for MCL Reengineering . . . . . . . . . . . . . . . . . 67
Chapter 1
INTRODUCTION
The field of Artificial Intelligence has seen steady advances in cognitive systems. AI
has acquitted itself well in one area after another: theorem proving, game playing, machine
learning, and more. While advances in computer speeds and memory sizes have certainly
helped, the majority of the achievements have come from new algorithms and experience
in applying them.
Without detracting anything from AI’s accomplishments, many of these cognitive
systems perform poorly when faced with situations outside of their training or in a dynamic
environment. A robot trained to search out a dark blue goal may or may not detect a light
blue one. A robotic car trained to drive on American roads will likely be a danger to itself
and others if transported to a country with left-hand driving rules. A switch from
instruments reading in miles to instruments reading in kilometers may doom a spacecraft.
The transition between laboratory training and successful real-world operation re-
mains a major challenge. To cope with possible future encounters, additional rules
and/or more training can be used, but this increases the cost and lengthens the time be-
tween conception and deployment. Alternatively, the agent can be given greater ability
to explore, learn, and reason about its environment, but this too raises the manufacturing
and operational costs.
Adding more capabilities to an agent to cope with possible perturbations increases the
complexity of the agent and, except during periods of perturbation, may decrease perfor-
mance. Adding metacognition to such systems can improve their operation in the face of
such perturbations without sacrificing performance during normal operations. Only when
a problem has been noticed does the problem-correction code need to be active.
Metacognition can monitor an agent’s performance and invoke corrective action only when
needed.
The Metacognitive Loop (MCL) is a metacognitive approach that has been applied to
a variety of systems (Anderson & Perlis 2005). It works with a host system, monitoring its
sensors to ensure that they are within expected levels. As long as all the expectations are
met, MCL remains purely in a monitoring state allowing the agent to function unimpeded
by extraneous problem-handling code.
When a failure is indicated by a violation of one or more expectations, MCL exam-
ines the violations, determines possible failures, and advises the host system on potential
corrective actions. The agent implements the corrective action (e.g., removing a recent rule
from the KB, adjusting the exploration parameter, re-evoking a previous procedure). The
agent then continues to operate, leaving the performance evaluation tasks to MCL.
Past implementations of MCL have been hand-crafted and tightly integrated into their
host systems. MCL had direct access to the agent’s sensors. The failure of expectations was
tied directly to corrective actions. The logic of the agent and MCL were separate but their
implementations were intertwined. These implementations were successful in showing that
an MCL-enhanced agent performed better in a dynamic, perturbed situation than an agent
without such support.
To allow MCL to be implemented with a larger number of systems, MCL is being
reengineered to provide an Application Programming Interface (API). This will provide a
clean separation between MCL and the host system. The API will be developed for the
C language and then (with the use of the open source SWIG package) extended to other
programming languages.
But, more than just having a new facade, MCL is being reengineered internally to do
Bayesian inference over a set of indication, failure, and response ontologies. The three
ontologies and the links between them capture the knowledge of how problems manifest
themselves and what the appropriate correction actions are. Using Bayesian inference over
probabilities on inter- and intra-ontology links, the most likely failure and the most effective
corrective action can be determined.
By having an API that can be used regardless of the implementation language of the
host (and internal processes that are generalized and probabilistic per above), a wide variety
of systems will be able to incorporate MCL for metacognitive monitoring and control.
MCL will no longer have to be re-implemented in the agent’s programming language and
intertwined with the agent’s processing for every new use. Instead, a single
implementation can be created, leveraging the MCL/host-system division made
possible by the API and the computational power inherent in the
Bayesian augmented ontologies.
The reengineering of MCL will also allow some brittleness problems within MCL
itself to be addressed. When a system fails to achieve its goal, one or more expectations
may have been violated. Multiple expectation violations may indicate that there is more
than one problem or only a single problem. Likewise, multiple problems can manifest
themselves in a single expectation violation so a single violation is not a guarantee of a
single problem. Being able to disambiguate problem failures from expectation violations is
expected to come as a benefit from the new ontologies and the use of Bayesian inference.
A second problem that will be addressed within the new MCL framework is that of
reentrant invocations: the ability to resolve a host system failure when a second
exception occurs while MCL is attempting to help the host system recover from an earlier
exception. To effectively guide the host system we need to know if this second exception
is an indication of the same problem as the first failure or a new problem. The ontologies
and Bayesian inference will provide part of the answer but MCL must also be able to
maintain (create, update and eventually discard) a context record that allows it to reason
about failures over time.
Just as MCL can improve the operation of a host system, MCL should also be able
to assist itself through a recursive invocation. The meta-MCL would provide the same
services and facilities that MCL provides to a host system: that of monitoring sensors
for exceptions and suggesting corrective action when an expectation has been violated. In
the case of meta-MCL, the sensors would be MCL’s internal performance metrics and the
corrective actions would be to adjust MCL’s ontologies and conditional probability tables.
While it may not be possible to create an AI agent that can function optimally in all
situations, metacognition in the form of the Metacognitive Loop can be efficiently applied
to many types of AI architectures to improve their performance in dynamic environments.
The new MCL will be used with several systems (some of which have been used with the older
version of MCL and some new domains). The performance of the MCL-enhanced systems
versus the base versions will be compared to show the effectiveness of MCL. Source lines
of code will be used as a measure of the implementation cost of MCL. CPU, elapsed time,
and memory usage will be used to show that the cost of adding MCL is nominal.
In summary, I propose to improve the operation of AI agents by enhancing and ex-
tending the existing MCL. Changes will be made to improve generality and the robustness
of MCL. The contributions of this research are listed in Table 1.1.
Table 1.1. Research contributions
Category      Changes
Generality    Ontologies
              Bayesian inference
              Cross platform/language API
Robustness    Reentrant invocation
              Recursive invocation
Other         Metrics and evaluation
Chapter 2
BACKGROUND
This section describes metacognition for use with computer systems and a particular
implementation, the Metacognitive Loop (MCL), whose extensions will be the basis for the
contributions of the proposed research.
2.1 Metacognition
The philosophical origins of metacognition may be traced to the dictum of “know thy-
self.” Metacognition is studied as part of developmental and other branches of psychology.
While there are several different approaches, one common model is a cognitive process
which is monitored and controlled by a metacognitive process as in Figure 2.1. Metacogni-
tion can be studied in conjunction with metaknowledge (knowledge about knowledge) and
metamemory (memory about memory) (Cox 2005).
The canonical depiction of a software agent (figure 2.2) has sensors to perceive the
environment and activators with which the agent tries to control it. Metacognition can
be layered onto a software agent so that the metacognitive process monitors and controls
the cognitive process of the software agent as shown in figure 2.3, with metamemory and
metaknowledge.
Metacognition improves the performance of the agent in the environment by providing
FIG. 2.1. Metacognitive Monitoring and Control
FIG. 2.2. Software agent interactions with the environment
FIG. 2.3. Software agent with metacognition
two control functions. The first is to inform the agent when a cognitive task (e.g. the
selection of the next action to perform) has been satisfactorily achieved so that the agent
can move on to another task (such as performing the selected action). For some agents, the
cognitive sufficiency test is built into the cognitive process itself. For example, the cognitive
task may be limited to selecting (based on a specified estimated utility) between a fixed
number of choices. In such a case there is no opportunity for metacognitive intervention.
The second metacognitive control function is to reflect on the performance of the
agent. This can be done at the completion of a successful task, but it is most often per-
formed after a failure. The metacognitive process evaluates the decisions made by the
agent and determines whether an alternative selection would have been more appropriate, or
it may suggest a change to the agent’s current cognitive state such as invoking a learning
module. Reflection can also be done on a continuous basis, allowing small deviations from
the expected to trigger controls to prevent future failures.
2.2 Metacognitive Loop
The purpose of the metacognitive loop (MCL) is to improve the operation of the host
system by dealing with unexpected events (Anderson & Perlis 2005). It does this by adding
a metacognitive layer to the host that is concerned with monitoring the operation of the
host system, and taking corrective action when it detects a problem. Figure 2.4 shows a
cognitive host system with MCL.
FIG. 2.4. Host systems with MCL support
MCL consists of three phases that implement its metacognitive knowledge about
problem detection, fault isolation, and corrective action for cognitive agents. These three
phases correspond to the process often used by humans where (1) we notice that something
is not working, (2) make decisions about it (whether the problem is important, how likely it
is to get worse in the future, if it is fixable, etc.) and then (3) implement a response based on
the decisions that were made (ignore the problem, ask for help, attempt to fix the problem
using trial-and-error, etc.).
2.2.1 Note
The MCL process starts with the Note phase that provides the host system with a "self-
awareness" component. MCL monitors the host system to detect a difference between
expectations and observations. An anomaly occurs when an expectation is violated. An
expectation is a statement about the allowable values for a sensor. Statements such as “the
mixing vat temperature will not exceed 170 degrees” and “the flow in the coolant pipe will
be between 80 and 90 gallons per second” are expectations made about external sensors.
Expectations can also be about internal host processes, such as “a new plan will be generated
no more than 5 seconds after a new subgoal has been made the current subgoal.” When
the sensor information is at odds with the expected values, an anomaly is noted and MCL
moves to the Assess state.
2.2.2 Assess
In the Assess state, MCL attempts to determine the cause of the problem that led to the
anomaly and the severity of the problem. The computation done in this phase need not be
excessive. Indeed it is the philosophy of MCL that lightweight, efficient problem analysis
is better than ignoring the problem, attempting to design out every conceivable problem, or
attempting to model and monitor large portions of the world. In some implementations of
MCL this phase is almost nonexistent with a direct connection between the exception and
the corrective action.
2.2.3 Guide
The third state of MCL is Guide, where MCL attempts to guide the host system back
to proper operation by offering a suggestion as to what action(s) will return the sensor
values to be within the limits set by the expectations. The suggestions available in this
phase vary depending on the features of the host system.
Once the suggestion has been made, MCL returns to the exception monitoring state.
Any new exceptions will cause MCL to again enter the Note, Assess, and Guide phases of
the NAG cycle.
2.2.4 Example
A Mars rover tasked with exploring geological formations on the red planet also has
to manage power consumption. When a low battery alarm causes the rover to initiate a return
to the recharging station, it plans a path avoiding known obstacles. The path leads it over a
dust field which, while not an obstacle as such, requires additional motive power. The additional
power consumption drains the rover’s battery, ending its mission.
If the rover had an MCL component, the additional power consumption would have
been noted as an indication of a problem. An assessment would have been made with
a response of re-planning the path to the recharging station with the dust field now classified
as an obstacle.
Chapter 3
TECHNICAL APPROACH
The Metacognitive Loop has been shown to be effective in lessening the problem of
brittleness in cognitive systems. MCL was added to those systems as a customized en-
hancement, each one slightly different to correspond to the implementation language and
target machine of the cognitive system. The Note, Assess, and Guide steps were also tai-
lored to the domain and host system. While this approach works on a small scale,
making MCL available for cognitive systems in general will require an implementa-
tion that works for a variety of host systems in different domains, implemented in different
languages for different machines.
Several areas of MCL will need enhancing to provide the benefits of metacognition
as a general service. This work will require both research and system engineering efforts.
The research areas include:
• Using Bayesian inference over Indications, Failure and Response ontologies for
metacognitive reflection;
• Reasoning over time about multiple error indications;
• Using metacognition to improve the metacognitive process itself.
The system engineering effort will be to provide MCL services in a package that can be
easily used across many different implementation environments.
3.1 MCL Ontologies
For MCL to serve as a general-purpose tool for the brittleness problem of cognitive
systems, it must be able to perform its Note, Assess, and Guide phases without needing
extensive tailoring for each domain. MCL should be able to reason using mainly abstract,
domain-neutral concepts to determine why a system is failing and how to cope with the
problem. To support this, three ontologies were created (one for each phase of the NAG
cycle). These ontologies are additional metaknowledge for MCL as shown in figure 3.1.
Each of the three ontologies is used by a different phase of MCL (see Table 3.1). The
Indications ontology is used in the Note phase when sensor input shows that an expectation
has been violated. The Assess phase uses the Failure ontology to determine likely causes
of the violated expectations. Once likely causes of the failure have been identified, the Guide
phase uses the Response ontology to determine appropriate responses to the failure.
Table 3.1. Ontologies used with NAG cycle
Phase    Ontology
Note     Indications
Assess   Failure
Guide    Responses
Elements within each ontology are linked to others in the ontology to show an “is-
a” relationship. For example, the “sensor not responding” node in the Failure ontology is
connected to the “sensor failure” node to show that “sensor not responding” is a type of
FIG. 3.1. Ontologies as additional metaknowledge
“sensor failure.” Elements in one ontology may also be linked to elements in a different
ontology to show a possible “cause-and-effect” or “problem-solution” relationship. The
general pattern of ontology linkage is shown in Figure 3.2. This figure also shows how
the expectations are linked to the indications ontology elements and the elements of the
response ontology lead to suggestions that MCL gives to the host system.
The sensors and expectations are part of the “concrete” realm of the host system.
Processing by MCL moves from the concrete expectations to the abstract indication, failure,
and response ontologies, and then back to the concrete suggestions implemented by the host
system. Figure 3.2 shows the division between the concrete and abstract processing. The
next sections expand on this process, going into each of the three ontologies in greater
detail.
FIG. 3.2. Ontological Linkages
3.1.1 Indications
The Indications ontology comprises three types of nodes (see Table 3.2). The
purely abstract indication nodes support concepts that cross multiple domains. These make
up the core of the indications ontology. These nodes represent concepts such as “deadline
missed”, “failed to change state”, and “reward not received”.
The sensor nodes of the indications ontology model the sensors of the host system and
their attributes. When the sensors of the host system are defined, sensor nodes are added to
the indications ontology. Additional nodes are added to the ontology for the expectations
for the values of the sensors.
The third set of nodes in the indications ontology forms a linkage between the concrete
sensor and expectation nodes and the abstract, core nodes of the ontology. The divergence
nodes define how expectations can be violated. Figure 3.3 shows these nodes and the
relationships between them. The three free-standing nodes (over, late, and under) are not part
of the divergence tree structure but, like the other divergence nodes, can be used to further
define the exception. In the lower left nodes, “cwa” stands for Closed World Assumption.

Table 3.2. Indication Ontology Nodes

Type        Nodes
Core        deadlineMissed, rewardNotReceived, resourceOverflow, resourceDeficit, failedStateChange, unanticipatedStateChange, assertedControlUnchanged
Sensor      state, control, temporal, resource, reward, message, ambient, objectProp, spatial, critical, noncritical, discrete, ordinal, maintenance, effect
Divergence  divergence, aberration, cwa-violation, cwa-decrease, cwa-increase, breakout-low, breakout-high, missed-target, missed-unchanged, short-of-target, long-of-target, over, under, late
FIG. 3.3. Divergence Nodes in the Indications Ontology
It is the violation of expectations that starts the MCL NAG cycle. The type of violation
and the type of sensor are linked together to a core indications ontology node. Table 3.3
shows sensor and divergence nodes linked to core nodes.
Table 3.3. Sensor, Divergence and Core Node Indications Ontology Linkages
Core Node                   Sensor Node   Divergence Node
Deadline missed             temporal      late
Reward not received         reward        under
Resource overflow           resource      over
Resource deficit            resource      under
Failed state change         state         missed-unchanged
Unanticipated state change  state         aberration
Asserted control unchanged  control       missed-unchanged
3.1.2 Failures
Once the violated expectations have been evaluated in the Note phase, MCL proceeds to determine the cause of the problem indications in the Assess phase. The Failure ontology is used in this problem determination. This phase is needed (rather than mapping indications directly to responses) because of the ambiguous nature of failures and their indications: two different failures that need different responses might have the same initial problem indications, and a single problem might manifest itself with multiple indications.
The Failure ontology (Figure 3.4) is a catalog of the various problems that can befall cognitive systems. These include problems with sensors, effectors, resources, and the domain model (or models). The links in the failure ontology are all of the is-a variety. Thus a “sensor malfunction” is-a “sensor error”, which is-a “knowledge error”, which is-a “failure”. The “failure” node is the root of the Failure ontology, and all Failure ontology nodes eventually lead to it.
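The is-a structure just described can be sketched as a parent-pointer table over node names, with the chain from any node terminating at the root “failure” node. This is an illustrative sketch using a subset of the Figure 3.4 node names, not the actual MCL data structures:

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative sketch: each Failure ontology node stores its is-a parent.
// Node names follow Figure 3.4; an empty parent marks the root.
std::map<std::string, std::string> isA = {
    {"failure", ""},
    {"knowledgeError", "failure"},
    {"resourceError", "failure"},
    {"sensorError", "knowledgeError"},
    {"effectorError", "knowledgeError"},
    {"modelError", "knowledgeError"},
    {"sensorMalfunction", "sensorError"},
    {"sensorNoise", "sensorError"},
    {"predictiveModelError", "modelError"},
    {"proceduralModelError", "modelError"},
};

// Walk the is-a chain from a node up to the root "failure" node.
std::vector<std::string> chainToRoot(const std::string& node) {
    std::vector<std::string> chain;
    for (std::string n = node; !n.empty(); n = isA.at(n))
        chain.push_back(n);
    return chain;
}
```

For example, the chain for “sensor malfunction” visits sensor error and knowledge error before reaching the failure root.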
[Figure omitted: is-a tree of Failure ontology nodes, with Failure at the root branching into knowledge and resource errors and down to leaves such as sensor malfunction, effector failure, and overfit error.]

FIG. 3.4. Failure Ontology
3.1.3 Responses
As the Failure ontology is an itemized list of everything that can go wrong with a cognitive system, the Response ontology (Figure 3.5) is a list of everything that can be done about it. There are two types of nodes in the Response ontology: abstract and concrete. The abstract nodes represent general problem-solving techniques, and the concrete nodes represent specific suggestions that MCL can send to the host system. Table 3.4 lists the concrete responses and the abstract nodes that directly link to them. The remaining links within the Response ontology are for “is-a” relationships. For example, “Strategic Change” is-a “System Response”, which is-a “Internal Response”, which is-a “Response”. The “Response” node is the root of the Response ontology.
Table 3.4. Concrete Responses

Concrete Node            Abstract Node
Solicit suggestion       Ask for help
Relinquish control       Ask for help
Run sensor diagnostic    Run diagnostic
Run effector diagnostic  Run diagnostic
Activate learning        Modify predictive model
Rebuild models           Modify predictive model
Adjust parameters        Modify procedure model
Revisit assumptions      Modify procedure model
Revise expectations      Modify avoid
Algorithm swap           Strategic change
Change HLC               Strategic change
Try again                System
[Figure omitted: tree of Response ontology nodes, with Response at the root splitting into External (Plant, Ask for help) and Internal (System, Test hypothesis, Run diagnostic, Modify knowledge, Amend controller) branches down to the concrete responses of Table 3.4.]

FIG. 3.5. Response Ontology
3.1.4 Inter-ontology linkages

The three ontologies (Indications, Failure, and Response) are connected by inter-ontology links. Core nodes in the Indications ontology connect to nodes in the Failure ontology, and many nodes of the Failure ontology connect to nodes in the Response ontology. These linkages form a chain of reasoning from a violated expectation to a suggestion that may correct the problem.
Figure 3.6 shows such a path for a Q-learner faced with a dynamic grid world domain in which the rewards have been moved. Note that this is a very simplified diagram with most of the nodes and links removed. When the Q-learner moves to the grid square that no longer contains the expected reward, the expectation of getting the reward in that square is violated. This activates the “Reward not received” node in the Indication ontology. That node is connected (via an inter-ontology link) to the “Model error” node of the Failure ontology. The “Model error” node has two children, “Procedure model error” and “Predictive model error”. The “Predictive model error” node has an inter-ontology link to the “Modify predictive” node of the Response ontology, which in turn has a child, the concrete “Rebuild models” node, that generates the “Rebuild models” suggestion. This set of inter- and intra-ontology linkages allows reasoning from the failed expectation of obtaining a reward to rebuilding the Q-learner’s Q-table.
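The chain of reasoning in Figure 3.6 can be sketched as a sequence of table lookups. The link tables below contain only the handful of nodes named in the example, with hypothetical identifiers; the real MCL link structure is far richer:

```cpp
#include <map>
#include <string>

// Illustrative inter-ontology links for the Figure 3.6 example only
// (hypothetical identifiers; most nodes and links omitted).
std::map<std::string, std::string> indicationToFailure = {
    {"rewardNotReceived", "modelError"}};
std::map<std::string, std::string> failureChild = {  // one is-a child per node
    {"modelError", "predictiveModelError"}};
std::map<std::string, std::string> failureToResponse = {
    {"predictiveModelError", "modifyPredictive"}};
std::map<std::string, std::string> concreteResponse = {  // abstract -> concrete
    {"modifyPredictive", "rebuildModels"}};

// Follow the chain: indication core node -> failure -> abstract response
// -> concrete suggestion for the host system.
std::string suggestFor(const std::string& indication) {
    std::string failure = indicationToFailure.at(indication);
    failure = failureChild.at(failure);            // refine to a specific failure
    std::string abstractResp = failureToResponse.at(failure);
    return concreteResponse.at(abstractResp);      // concrete suggestion
}
```

Calling the sketch with the “Reward not received” indication yields the “Rebuild models” suggestion, mirroring the path traced in the text.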
3.2 Bayesian Inference
For each problem indication we want to be able to determine the most likely cause or causes of the failure. For each failure we want to be able to determine which responses are most likely to correct it. Thus, given the sensors and the violated expectations, we would like to find the responses with the highest probability of working. (Strictly, we want the response with the highest utility, but for this discussion we let all the costs be equal and focus on the probabilities.)

FIG. 3.6. Example Ontology Connections

The three ontologies
and their inter-ontology linkages (which form a directed graph) can be viewed as a Bayes
net. Direct observation can be made of the sensors. By associating conditional probability
tables (CPTs) with each node in the three MCL ontologies we can use Bayesian inference
to compute the needed probabilities for the responses. Figure 3.7 shows the addition of
CPTs to a small section of the Response ontology.
The Bayesian inference will be implemented using the Intel-contributed open source Probabilistic Network Library (PNL), available at https://sourceforge.net/projects/openpnl/.
FIG. 3.7. Conditional probability tables for a portion of the MCL response ontology
3.3 Support for Reasoning Over Time
Errors can occur either once or multiple times. When the reward for a Q-learner is moved, the learner’s policy will drive it to repeatedly enter the square that used to contain the reward. If there is an expectation that the square will give a positive reward, that expectation will be repeatedly violated. But all of these violations are indications of the same problem: that the reward has been moved.

Even if MCL were invoked on the first occurrence of the unexpected reward and the Q-learner immediately adjusted its learning and/or exploration rates in response, the old reward square would still be visited several times while the new policy is learned. Thus, MCL needs to be able to associate multiple exceptions with the same error even while recovering from the initial exceptions.
The recovery process itself may give rise to additional errors. Following MCL’s suggestion to increase the exploration rate, a Q-learner may experience a longer interval between rewards. Assigning additional resources to recover from a problem in one area may cause a scarcity of resources in another, triggering a resource deficiency exception. This exception and the original one should both be considered by MCL when determining further corrective suggestions for the host cognitive system.

To correctly assess multiple indications, MCL needs to remember what expectation violations it has seen in the past and what suggestions were provided as recommended responses to those violations. Figure 3.8 shows the addition of previous violation information as part of the meta-knowledge of the enhanced Metacognitive Loop.
FIG. 3.8. Reentrant MCL
The mclFrame is the data structure that holds the context information used by the enhanced MCL to allow reasoning over time. It consists of the MCL ontologies with the calculated probabilities (plus a few other pieces of information).

One mclFrame will be created for each expectation violation. Multiple frames can be merged in the Guide phase of MCL if they are determined to represent the same problem. This determination will be made by comparing the probabilities associated with each of the nodes in the Failure ontology.
To provide an organized method of retaining and using mclFrames, each expectation will be associated with an expectation group. An expectation group can hold zero, one, or more expectations. Expectation groups can have a parent group, so that hierarchies can be created. Figure 3.9 shows expectations arranged in a single expectation group and then in a hierarchy. Grouping expectations by the host system’s functional categories should provide better problem resolution.
FIG. 3.9. Expectations arranged in a single group (a) and in multiple groups (b)
An mclFrame can be associated with each expectation group to provide a memory of past violations for reasoning about errors over time. This is done by including the probabilities retained in the expectation group’s mclFrame when calculating the probabilities of any of the group’s violated expectations.
3.4 Recursive Invocation
MCL is a cognitive AI system like the host system it monitors. Like that host system
it receives perceptions from the environment and acts upon those perceptions to attempt
to change the environment. But, while the host system is situated in the real world (or a
simulation of it), MCL’s environment is the host system. Whatever constitutes the host
system’s environment, MCL is only aware of the shadows of that environment as projected
by the exceptions and sensors of the host system. MCL’s actuators are the suggestions it
passes to the host system.
Like any other AI system, MCL is susceptible to perturbation when its environment (the host system) changes unexpectedly. And as with any other AI system, we should be able to improve the performance of MCL in times of perturbation by invoking MCL to note the problem indications, assess those indications, and guide MCL to a solution.
Figure 3.10 shows a meta-MCL monitoring the operation of an MCL that is monitoring a cognitive agent. The environment of the meta-MCL is the MCL agent, and the environment of the cognitive agent forms the meta-environment of the meta-MCL.
mclFrames provide the mechanism for handling multiple exceptions in MCL, and they serve the same function in the recursive use of MCL. The mclFrames of the meta-MCL are separate and distinct from the mclFrames used with the exceptions and exception groups of the MCL monitoring the cognitive agent.
While allowing MCL to monitor itself should improve its operation over time, there is also the possibility of introducing major problems.

FIG. 3.10. MCL providing metacognitive services to MCL
Excessive resource consumption If the recursive MCL monitors many expectations, or these expectations are written so that they are often violated, the recursive MCL could use a large amount of resources (i.e., memory and CPU). Recursive MCL should be limited, or run only as a low-priority task when resources are not needed elsewhere.
Infinite regress If MCL can be used to improve MCL, then MCL can be used to improve that MCL, and so on. It is expected (but by no means proved) that layering on more and more MCLs will be subject to diminishing returns. MCL recursion will be capped at one level.
Destructive changes Having MCL make changes to MCL could be described as letting a surgeon operate on his own brain. Changes should be limited in scope to prevent catastrophic failure.
3.5 C Language Application Program Interface
The original MCL implementations were hand-crafted and tightly bound to the host system. To provide metacognitive support for a variety of systems, a clean delineation is needed between the host system and the metacognitive monitor. An initial API has been created for C++ (the language in which the new MCL is being constructed). This interface will be used as the basis for a C language API, which will require only a few changes to the C++ API but will allow many more applications to use MCL.
The C language API (as well as the C++ API) will be documented with examples of
its use.
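One conventional way to derive a C API from a C++ one is an opaque-handle wrapper: the C side sees only a typedef’d pointer and free functions declared extern "C". All names below (Monitor, mcl_create, and so on) are hypothetical illustrations of the technique, not the actual MCL interface:

```cpp
#include <string>

// Hypothetical C++ class standing in for the MCL monitor object.
class Monitor {
public:
    void declareExpectation(const std::string& name) { last = name; }
    std::string lastExpectation() const { return last; }
private:
    std::string last;
};

// C-callable wrapper: an opaque handle plus free functions. A C host system
// would see only these declarations, guarded by extern "C" in a header.
extern "C" {
typedef void* mcl_handle;

mcl_handle mcl_create() { return new Monitor(); }

void mcl_declare_expectation(mcl_handle h, const char* name) {
    static_cast<Monitor*>(h)->declareExpectation(name);
}

void mcl_destroy(mcl_handle h) { delete static_cast<Monitor*>(h); }
}
```

The C++ object lives behind the handle, so the C caller never needs to know its layout, and SWIG-style generators can wrap the flat C functions for other languages.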
3.6 Extending the API to Multiple Languages
The open source Simplified Wrapper and Interface Generator (SWIG) project (www.swig.org) provides generators that take a C (or C++) language API and create bindings for more than a dozen languages (Allegro CL, C#, Chicken, Guile, Java, Modula-3, MzScheme, OCaml, Perl, PHP, Python, Ruby, and Tcl).
In addition to the APIs created using SWIG, an API will be created that allows use of MCL with the Soar general cognitive architecture (sitemaker.umich.edu/soar/home). Soar has a large community of interest with its own newsletters and conferences. Giving that community easy access to MCL will allow many applications to receive the benefits of metacognitive monitoring and control.
These APIs will be documented with examples of their use.
Chapter 4
METHODOLOGY
The primary measure of the success of the proposed research is how well the enhanced MCL improves the performance of the host system. This chapter describes the method of testing for that and other criteria, as well as the problem domains that will be used in the testing.
4.1 Evaluation Criteria
The hypotheses driving this research proposal are (1) that MCL augmented by ontologies and Bayesian inference provides cognitive systems with a solution to the brittleness problem when the environment changes in unexpected ways; (2) that MCL is efficient, both in terms of operating costs and in the effort required to add MCL to the host system; and (3) that the MCL solution is broadly applicable across a variety of domains, implementation languages, and operating systems.
In this section the evaluation criteria for each of the hypotheses are presented. The
next section will describe the problem domains that will be used in testing.
4.1.1 Effectiveness in Improving Host System Operation
To evaluate the effectiveness of MCL in improving the operation of the host system, base, optimized, and MCL-enhanced versions of the host systems will be compared. The host systems used in the evaluation are grid world reinforcement learners (described in sections 4.2.1 and 4.2.2) and the Bolo tank simulation (section 4.2.3). Average reward will be used for periodic grid worlds, while the number of steps to goal will be used for episodic grid worlds. For the Bolo domain, the time to complete the task will be used.
The base measurement will be done without any tuning of the cognitive system to the domain. This base value will be used to measure the improvement when the host system is optimized (e.g., selecting the best alpha and epsilon values for Q-learning). The MCL-enhanced system will also be compared to the base system to make sure that it does indeed improve performance. It will then be compared to the optimized system to see whether MCL improves the system beyond what would normally be done. To determine whether any improvement is statistically significant, the unpaired t test will be used.
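As a sketch of this statistical comparison, the Welch form of the unpaired t statistic (one common unpaired variant; the proposal does not commit to a specific form, and the degrees-of-freedom and p-value lookup are omitted here) can be computed as:

```cpp
#include <cmath>
#include <vector>

// Welch's unpaired t statistic for two independent samples:
//   t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b)
// using the unbiased sample variance for each group.
double welchT(const std::vector<double>& a, const std::vector<double>& b) {
    auto meanVar = [](const std::vector<double>& x, double& m, double& v) {
        m = 0.0;
        for (double xi : x) m += xi;
        m /= x.size();
        v = 0.0;
        for (double xi : x) v += (xi - m) * (xi - m);
        v /= (x.size() - 1);  // unbiased sample variance
    };
    double ma, va, mb, vb;
    meanVar(a, ma, va);
    meanVar(b, mb, vb);
    return (ma - mb) / std::sqrt(va / a.size() + vb / b.size());
}
```

For samples {1, 2, 3} and {4, 5, 6} (each with mean difference 3 and variance 1), the statistic is −3/√(2/3) ≈ −3.674.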
4.1.2 Additional Computational Resources Required
The CPU time, wall clock time, working set size, and partition size will be measured for the base, optimized, and MCL-enhanced versions of the host systems. The unpaired t test will be used to determine whether any additional resource usage of the MCL system is nominal or significant.
4.1.3 Implementation Effort
The number of source lines of code will be counted for the base, optimized, and MCL-enhanced versions of the host systems. Many of the lines that need to be added to support MCL are the same (or nearly the same) in most implementations. The number of these “boilerplate” lines will be counted separately from the custom code.
4.1.4 Breadth of Deployment
The metacognitive loop is to be available on multiple platforms for multiple computer languages. Table 4.1 shows the platforms and languages on which MCL will be tested to ensure that MCL can be widely used.

The TBD entries for Mac OS X and Solaris reflect a dependency on the Intel-originated Bayesian inference library, PNL. While it is hoped that the PNL library will work on non-Intel processors, this has not yet been attempted.
Table 4.1. MCL Platforms and Languages

Platform   C    C++   Java   Python   Soar
Windows    Yes  Yes   Yes    Yes      Yes
Linux      Yes  Yes   Yes    Yes
Mac OS X   TBD  TBD   TBD    TBD      TBD
Solaris    TBD  TBD   TBD    TBD
4.2 Evaluation Domains
Several cognitive systems will be augmented with the enhanced MCL to demonstrate increased resilience to brittleness due to changes in the environment. These include domains investigated in the initial MCL literature as well as new domains.
4.2.1 Chippy
The Chippy grid world (Anderson et al. 2006) is an 8 by 8 square matrix, as shown in Figure 4.1. An agent (in this case a chipmunk) can move in the four cardinal directions from square to square. An attempt to move off the board from one of the edge squares leaves the agent in the same square. The lower left (R1) and upper right (R2) squares provide rewards and transport the agent (chipmunk) to the opposite corner. The agent starts in one of the center squares and continues to move (and occasionally transport) until the simulation is stopped.
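The dynamics just described can be sketched as a single step function. The coordinates, no-op edge moves, and teleport-to-opposite-corner rule follow the description above, while the function and field names are illustrative:

```cpp
// Illustrative dynamics of the 8x8 Chippy world: moves off the edge are
// no-ops; entering a reward corner pays that corner's reward and teleports
// the agent to the opposite corner. (0,0) is the lower-left square.
struct Step { int x, y; double reward; };

Step chippyStep(int x, int y, int dx, int dy, double r1, double r2) {
    int nx = x + dx, ny = y + dy;
    if (nx < 0 || nx > 7 || ny < 0 || ny > 7) { nx = x; ny = y; }  // stay put
    if (nx == 0 && ny == 0) return {7, 7, r1};  // R1: pay, teleport to far corner
    if (nx == 7 && ny == 7) return {0, 0, r2};  // R2: pay, teleport to far corner
    return {nx, ny, 0.0};
}
```

With R1 = 10, stepping from (0,1) into the lower-left corner pays 10 and leaves the agent at (7,7), while stepping west off the board from (0,3) leaves it in place with no reward.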
FIG. 4.1. An 8×8 “Chippy” grid world with two rewards (R1 in the lower left, R2 in the upper right) and central start squares
With R1 = 10 and R2 = −10, Figure 4.2 shows the policy learned by a Q-learner in a Chippy grid world after 1,000 moves. Since only 2 of the 64 squares contain a reward, the Q-learner makes many moves (on average about 98) before even seeing the first reward so that learning can begin. By about 5,000 moves, an optimal policy has been learned (Figure 4.3) that achieves a reward of 10 every 14 moves. Note that a large portion of the grid remains unexplored. Even after a million moves (Figure 4.4), some squares may never have been visited.
[Figure omitted: grid of learned Q values.]

FIG. 4.2. A Chippy policy after 1,000 moves
[Figure omitted: grid of learned Q values.]

FIG. 4.3. A Chippy policy after 5,000 moves
[Figure omitted: grid of learned Q values.]

FIG. 4.4. A Chippy policy after 1,000,000 moves
Perturbation Perturbation is introduced into the Chippy Grid World by swapping the values of the two goal squares (R1 and R2). Using R1 = 10 and R2 = −10 as above, Figure 4.5 shows the average reward earned by a Q-learner in the Chippy grid world. With the standard parameters (α = 0.5, γ = 0.9, ε = 0.05), Q-learning produces a policy that converges in about 5,000 steps, as can be seen by the flattening of the curve. From that point onward it earns a reward of 10 every 14 steps, plus an occasional exploratory move. If this were to remain a static world, the exploration rate could be lowered to zero to achieve a slightly higher average reward (7.1). Keeping some exploration proves very useful if the world changes. In Figure 4.6 the rewards of (10, −10) are changed to (−10, 10) at step 10,000. The Q-learner continues to adjust its policy based on the rewards received and eventually achieves a new policy with a reward of 10 every 14+ moves. If, however, we turn off exploration once the optimum policy is learned in order to gain the extra reward, then once the perturbation occurs the best we can do is learn a very sub-optimal
FIG. 4.5. Q-Learning in the Chippy Grid World: average reward (y-axis) vs. steps (x-axis), with learning rate 0.5, discount rate 0.9, and exploration 0.05
(but at least positive) policy, as shown in Figure 4.7.
FIG. 4.6. Q-Learning before and after perturbation: the policy learned in 5,000 steps takes almost 10,000 steps to relearn
4.2.2 Windy Grid World
The windy grid world (Sutton & Barto 1998) is a 7×10 matrix, as shown in Figure 4.8. An agent can move in the four cardinal directions from square to square. Attempting to move off the board from one of the edge squares leaves the agent in the same square. The starting (0,3) and goal (7,3) squares are labeled “S” and “G” respectively. Each move receives a reward of −1 until the goal is reached. Movement is affected by a “northerly
FIG. 4.7. Q-Learning with the exploration rate set to 0 after the policy is learned: turning off exploration increases reward once the optimum policy has been learned, but after a perturbation the lack of exploration means a very long time to learn the new optimum policy
wind” that offsets movement one or two squares upward (as indicated by the single and double arrows).
Figure 4.9 shows a path from the start to the goal and demonstrates how the winds offset movement. Movement right (east) from the start is unaffected until point a is reached. If there were no wind, another east move would go to b, but instead the movement is to c. Another eastern action (with a northerly shift) moves to d. Here the wind is even stronger, causing a two-square shift. Moving east from d goes to e, which reflects only a single upward shift because movement is limited by the edge of the world. The path to the goal continues east to the upper right square f. Now, unopposed by the wind, four southern actions lead to square g. No wind alters the westward movement to h. A second westward action (with a northern offset) reaches the goal. This path (which is the shortest possible) takes 15 moves.
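The wind-offset movement can be sketched as follows. The per-column wind strengths here follow the standard Sutton and Barto layout (0, 0, 0, 1, 1, 1, 2, 2, 1, 0); the wind of the agent’s current column is applied after its own move, and both are clipped at the edges:

```cpp
#include <algorithm>

// Illustrative transition for the 7x10 windy grid world. x is the column
// (0-9), y the row (0-6) with larger y further "north" (upward). The wind
// of the source column pushes the agent upward; all motion clips at edges.
struct Pos { int x, y; };

Pos windyStep(Pos p, int dx, int dy) {
    static const int wind[10] = {0, 0, 0, 1, 1, 1, 2, 2, 1, 0};
    int x = std::min(9, std::max(0, p.x + dx));
    int y = std::min(6, std::max(0, p.y + dy + wind[p.x]));
    return {x, y};
}
```

An east move from a calm column keeps the row unchanged, an east move from a strength-one column drifts one square upward, and the drift from a strength-two column is clipped when the agent is already at the top edge, matching the path traced above.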
Unless placed on an edge, the goal in a grid world should be accessible from all four cardinal directions. The wind offset alters the squares that lead to the goal. Figure 4.10 shows the two squares that lead to the goal and the directions that lead there. It also marks (with an “X”) the seven squares that cannot be reached.
Figure 4.11 shows the number of episodes completed increasing (i.e., the length of the path taken from S to G decreasing) for a SARSA reinforcement learner. Figure 4.12 shows the policy at the end of 170 episodes. Figure 4.13 shows the optimum policy for the Windy Grid World. Goal, near-goal, and inaccessible squares should all be discernible in any reasonably complete policy learned for the windy grid world.
FIG. 4.8. The windy grid world with both single- and double-strength offsetting columns
Perturbation Perturbation can be added to the Windy Grid World by changing the strength and direction of the “winds”. The Seasonal Windy Grid World varies the winds according to a fixed repeating pattern given in Table 4.2. The normal Windy Grid World is the “summer” season, with strong winds from the south. The “winter” season reverses the direction of those winds. The “spring” and “fall” seasons have strength-one winds where the Windy Grid World has its strength-two winds. The two “equinox” seasons have no winds. The number of steps that the world spends in each season is called the rotational speed.
4.2.3 WinBolo
WinBolo (Morrison 2006) is a networked simulation game that has multiple players,
alone or in teams, driving tanks on an island world (Figure 4.14). The players explore the
FIG. 4.9. A 15-step path from the start to the goal
FIG. 4.10. The two moves that lead to the goal and the seven squares that cannot be entered
FIG. 4.11. SARSA exploration of the Windy Grid World: episodes completed (y-axis) vs. time steps (x-axis), showing cumulative moves over multiple episodes
[Figure omitted: grid of learned Q values.]

FIG. 4.12. A Windy grid world policy learned by SARSA after 170 episodes
[Figure omitted: grid of Q values.]

FIG. 4.13. Windy grid world optimum policy with Q values. The best path from the start to the goal is underlined.
Table 4.2. Wind Speed and Directions in the Seasonal Windy Grid World

Season   Strength  Direction
Summer   Strong    South
Equinox  None
Fall     Weak      North
Winter   Strong    North
Equinox  None
Spring   Weak      South
FIG. 4.14. A WinBolo console with menus, actions, primary and secondary displays.
island, capture resources, and attack other players’ tanks. In tournament play, the goal is to have the tank capture and hold the most resources in a time-limited contest. WinBolo was derived from Bolo, a Mac 68K game that was, in turn, inspired by an older (two-player) video game. Each WinBolo player runs a copy of WinBolo that connects to a WinBolo server (running on either Windows or Linux). In single-player games, the same Windows machine is both the client and the server.
A player uses the keyboard to drive the tank, turning left (O) and right (P), speeding up (Q), slowing down (A), shooting (space), and laying mines (tab). The player drives over refueling bases to capture them (as well as pillboxes, once they have been shot sufficiently to reduce their armor to zero). The object of the game is to capture all of the refueling bases. Captured pillboxes can be relocated and used to defend your bases. The player can also build roads, bridges, and buildings, provided that enough trees have been harvested to obtain the raw materials.
It is the complexity of deciding whether to attack or defend, to harvest or build, and to use speed or stealth, in combination with complex terrain and multiple agents, that makes WinBolo a suitably rich environment for AI research. Version 1.15 of WinBolo (the latest version as of March 2007) will be used for this research.
WinBolo’s API WinBolo calls a program using its C language API a “brain.” The API is defined in a single file, brain.h, and described in a short text document (Morrison & Cheshire 2000). The API defines a single function, BrainMain(), that WinBolo calls, giving control (briefly) to the brain. It is called once for initialization (BRAIN_OPEN), multiple times during the course of a game (BRAIN_THINK), and then once more at termination (BRAIN_CLOSE). The brain code is given the state of the world in a large C structure called BrainInfo. From this sensor information, the brain should decide on a plan of action and then modify certain elements of the interface structure to implement its actions. The brain code is compiled into a Windows DLL with a file type of “BRN” and is activated by choosing it from the “Brains” menu of the WinBolo user console.
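A minimal brain skeleton has the following shape. BrainMain and the BRAIN_OPEN/BRAIN_THINK/BRAIN_CLOSE calling pattern come from the API described above, but the structure fields, the constant values, and the “accelerate” bit used here are simplified stand-ins, not the real definitions from brain.h:

```cpp
// Simplified stand-ins for the WinBolo brain interface. The real
// declarations live in brain.h; these fields and values are illustrative.
enum { BRAIN_OPEN = 0, BRAIN_THINK = 1, BRAIN_CLOSE = 2 };

struct BrainInfo {
    int operation;  // BRAIN_OPEN, BRAIN_THINK, or BRAIN_CLOSE
    int speed;      // current tank speed (read by the brain)
    int holdkeys;   // set by the brain to request sustained actions
};

// Skeleton brain: WinBolo hands control to BrainMain once to open, many
// times to think, and once more to close. Returns 0 on success.
extern "C" int BrainMain(BrainInfo* info) {
    switch (info->operation) {
    case BRAIN_OPEN:   /* allocate brain state here */ break;
    case BRAIN_THINK:
        if (info->speed == 0)
            info->holdkeys = 1;  // hypothetical "accelerate" key bit
        break;
    case BRAIN_CLOSE:  /* release brain state here */ break;
    }
    return 0;
}
```

The think branch is where a real brain would read the full sensor state and write its requested actions back into the structure before returning control to WinBolo.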
The BrainInfo structure contains information about the player’s tank (location on a 65536×65536 plane, speed (0–64), direction (0–255), and more), the terrain near the tank (mapped onto a 256×256 grid), and the location and status of nearby tanks, pillboxes, and bases. There is also static information such as the interface version number. (WinBolo comes with a built-in map, “Everard Island”; other maps are included in the download, and map editors are available for custom terrain maps.)

To indicate an action, the brain makes changes to the BrainInfo structure. Only certain changes are allowed. The most important variables are holdkeys and tapkeys
which are used to request single or multiple changes in direction and speed, or to shoot the gun. Setting items in the BuildInfo structure allows harvesting or building at the specified map coordinates. The brain can also send text messages to one or more players; this is used for coordination in multi-player (agent) tournaments and can be useful as a debugging tool.
Perturbation There is a small amount of randomness inherent in the WinBolo world, introduced by network and processing speeds. Larger-scale perturbation will be added by training an agent on one Bolo map and then testing it on another. A WinBolo map allows (almost) complete customization of the WinBolo world in terms of tank start points, terrain features (forest, rivers, buildings, etc.), and the location and strength of pillboxes. An example of such a perturbation is to train a WinBolo tank on a map where all the pillboxes are strength 0 and then to deploy the tank on a map where the pillboxes are strength 1.
Chapter 5
PRELIMINARY RESULTS
I joined the UMBC/UMCP MCL working group in the spring of 2006. This group included Tim Oates and Matt Schmill of UMBC and Don Perlis, Mike Anderson, Darsana Josyula, and others from UMCP. My initial assignment was porting an MCL-enhanced Bolo agent from Linux to Windows and constructing maps for WinBolo that would test the agent’s response to perturbation.
In the summer and fall of 2006, I joined the others in producing the Indications, Failure, and Response ontologies and in preparing papers that discussed using them with MCL. These are the ontologies presented in the Technical Approach chapter. As Matt Schmill’s implementation of an ontology-based MCL progressed, I provided the Windows port and assisted in the detection and correction of problems and in the implementation of missing features.
Along with assisting in the group efforts on the ontologies and the WinBolo domain,
I also worked on creating programs for evaluating MCL in the Chippy and Windy World
domains.
The next three sections discuss the progress made in constructing the test domains and
the initial testing of MCL with Bayesian inference for the Chippy grid world.
5.1 Grid World Implementation
The Methodology chapter of this proposal describes the test domains that will be used. The Chippy and Windy grid worlds have been implemented in C++, and both the baseline and the non-MCL optimized versions of these agents have been created. The optimized versions were found by varying the learning parameters across a wide range and then using the parameters that produced the highest reward after perturbation.
The optimization effort was carried out by varying the learning rate (Figure 5.1), the
exploration rate (Figure 5.2), and the discount rate (Figure 5.3). In each case a single
parameter was changed, leaving the other two parameters at their nominal values (α =
0.5, γ = 0.9, ε = 0.05).
The total reward received after perturbation is given in Table 5.1, with the value for the best parameter highlighted. The learning rate (α) was best at 0.9, with all of the high values being better than the low ones; a higher learning rate allowed quicker replacement of the old Q values. The exploration rate (ε) that returned the highest reward was 0.06, which is not far from the nominal value of 0.05. The best path in Chippy is the same before and after perturbation, although the direction of travel changes. Too low an exploration rate keeps the learner from finding the new direction quickly, and too high a value prevents exploiting the optimum path once it is learned. The best discount rate (γ), at 0.6, was much lower than the normal 0.9. The reward earned with (α = 0.5, γ = 0.6, ε = 0.05), at 5,333, was the best of the post-perturbation rewards. For Chippy, lowering the discount rate improved post-perturbation performance more than changing the learning or exploration rates.
FIG. 5.1. Effect on Chippy perturbation recovery of varying the learning rate
Table 5.1. Rewards for Chippy after perturbation with varying Q-Learning rates
alpha rewards epsilon rewards gamma rewards.1 619.64 .01 3320.31 .1 4580.51.2 3079.54 .02 3938.82 .2 4933.39.3 3916.66 .03 4225.87 .3 5227.81.4 4427.12 .04 4353.92 .4 5199.27.5 4587.84 .05 4486.38 .5 5256.53.6 4624.60 .06 4629.67 .6 5333.90.7 4809.90 .07 4595.64 .7 5099.67.8 4776.69 .08 4590.09 .8 4929.31.9 5123.45 .09 4618.18 .9 4589.14
FIG. 5.2. Effect on Chippy perturbation recovery of varying the exploration rate
FIG. 5.3. Effect on Chippy perturbation recovery of varying the discount rate
5.2 BoloSoar Implementation
An initial implementation of the WinBolo/Soar interface was done as part of course
work for a class on Agent Architecture and Multi-Agent Systems. In a demonstration, a
WinBolo tank controlled by a set of Soar rules found the solution to a small maze using a
random walk (see Figure 5.4).
FIG. 5.4. WinBolo tank outside small maze
Connecting WinBolo to Soar required putting the WinBolo tank’s sensor information
onto Soar’s input-link. A portion of the code to accomplish this is shown in Figure 5.5.
Once the sensor information has been set, control is turned over to Soar, which
applies the rules in its knowledge base and then puts actions on the output-link.
Figure 5.6 shows the three Soar rules used to land the tank. There were a total of 36 rules
defined in the random walk demonstration.
/* 3. Put tank information on ^input-link */
integer_to_input_link(pInputLink, &pspeed,   "speed",     brainInfo->speed);
integer_to_input_link(pInputLink, &pdir,     "direction", brainInfo->direction);
integer_to_input_link(pInputLink, &ptankx,   "tankx",     brainInfo->tankx);
integer_to_input_link(pInputLink, &ptanky,   "tanky",     brainInfo->tanky);
integer_to_input_link(pInputLink, &pinboat,  "inboat",    brainInfo->inboat);
integer_to_input_link(pInputLink, &pnewtank, "newtank",   brainInfo->newtank);
FIG. 5.5. C code to add WinBolo status information to Soar’s input structure
5.3 Chippy Agent With Ontology-Based MCL
The Indications, Failure, and Response ontologies have been created but are still being
revised, as are the linkages between the ontologies and the conditional probability tables for
inter- and intra-ontology links. These have progressed far enough for initial testing. A
Q-Learning agent for the Chippy Grid World was enhanced (via the C++ MCL API) to
set expectations and receive suggestions from MCL. Figures 5.7, 5.8, 5.9, and 5.10 show
the code added to the agent for initialization, defining sensors, setting expectations, and
implementing MCL’s suggestions.
Figure 5.11 shows the results for Chippy with and without MCL.
# ------------------------------------------------------
# landing phase
# 1. If still in the boat, speed up to get to shore
# 2. Once we reach shore, slow down
# 3. Once on shore and stopped, landing phase is done
# ------------------------------------------------------

sp {propose*lp-speed-up
   (state <s> ^io.input-link <i>)
   (<s> ^phase landing)
   (<i> ^inboat 1)
-->
   (<s> ^operator <o> +)
   (<o> ^name speed
        ^value increase) }

sp {propose*lp-slow-down
   (state <s> ^io.input-link <i>)
   (<s> ^phase landing)
   (<i> ^inboat 0
       -^speed 0)
-->
   (<s> ^operator <o> +)
   (<o> ^name speed
        ^value decrease) }

sp {propose*lp-end-landing
   (state <s> ^io.input-link <i>)
   (<s> ^phase landing)
   (<i> ^inboat 0
        ^speed 0)
-->
   (<s> ^operator <o> +)
   (<o> ^name end-landing) }
FIG. 5.6. Soar rules to land tank
// 1. Introduce ourselves to MCL
mclAPI::initializeMCL("Chippy2", 0);

// 2. Define properties
mclAPI::setPV(PCI_INTENTIONAL, PC_NO);
mclAPI::setPV(PCI_EFFECTORS_CAN_FAIL, PC_NO);
mclAPI::setPV(PCI_SENSORS_CAN_FAIL, PC_NO);
mclAPI::setPV(PCI_PARAMETERIZED, PC_YES);
mclAPI::setPV(PCI_DECLARATIVE, PC_NO);
mclAPI::setPV(PCI_RETRAINABLE, PC_YES);
mclAPI::setPV(PCI_HLC_CONTROLLING, PC_NO);
mclAPI::setPV(PCI_HTN_IN_PLAY, PC_NO);
mclAPI::setPV(PCI_PLAN_IN_PLAY, PC_NO);
mclAPI::setPV(PCI_ACTION_IN_PLAY, PC_NO);

mclAPI::setPV(CRC_IGNORE, PC_YES);
mclAPI::setPV(CRC_NOOP, PC_YES);
mclAPI::setPV(CRC_TRY_AGAIN, PC_YES);
mclAPI::setPV(CRC_SOLICIT_HELP, PC_NO);
mclAPI::setPV(CRC_RELINQUISH_CONTROL, PC_NO);
mclAPI::setPV(CRC_SENSOR_DIAG, PC_NO);
mclAPI::setPV(CRC_EFFECTOR_DIAG, PC_NO);
mclAPI::setPV(CRC_ACTIVATE_LEARNING, PC_YES);
mclAPI::setPV(CRC_ADJ_PARAMS, PC_NO);
mclAPI::setPV(CRC_REBUILD_MODELS, PC_YES);
mclAPI::setPV(CRC_REVISIT_ASSUMPTIONS, PC_NO);
mclAPI::setPV(CRC_AMEND_CONTROLLER, PC_NO);
mclAPI::setPV(CRC_REVISE_EXPECTATIONS, PC_YES);
mclAPI::setPV(CRC_ALG_SWAP, PC_NO);
mclAPI::setPV(CRC_CHANGE_HLC, PC_NO);
FIG. 5.7. C++ code to initialize MCL interface for Chippy
// 3. Define the sensors
mclAPI::registerSensor("step");     // [0]
mclAPI::registerSensor("old_x");    // [1]
mclAPI::registerSensor("old_y");    // [2]
mclAPI::registerSensor("new_x");    // [3]
mclAPI::registerSensor("new_y");    // [4]
mclAPI::registerSensor("reward");   // [5]
mclAPI::registerSensor("reward0");  // [6]
mclAPI::registerSensor("reward1");  // [7]

// 4. Define the property values for the sensors
mclAPI::setSensorProp("step",    PROP_DT, DT_INTEGER);  // [0]
mclAPI::setSensorProp("old_x",   PROP_DT, DT_INTEGER);  // [1]
mclAPI::setSensorProp("old_y",   PROP_DT, DT_INTEGER);  // [2]
mclAPI::setSensorProp("new_x",   PROP_DT, DT_INTEGER);  // [3]
mclAPI::setSensorProp("new_y",   PROP_DT, DT_INTEGER);  // [4]
mclAPI::setSensorProp("reward",  PROP_DT, DT_INTEGER);  // [5]
mclAPI::setSensorProp("reward0", PROP_DT, DT_INTEGER);  // [6]
mclAPI::setSensorProp("reward1", PROP_DT, DT_INTEGER);  // [7]

mclAPI::setSensorProp("step",    PROP_SCLASS, SC_TEMPORAL);  // [0]
mclAPI::setSensorProp("old_x",   PROP_SCLASS, SC_SPATIAL);   // [1]
mclAPI::setSensorProp("old_y",   PROP_SCLASS, SC_SPATIAL);   // [2]
mclAPI::setSensorProp("new_x",   PROP_SCLASS, SC_SPATIAL);   // [3]
mclAPI::setSensorProp("new_y",   PROP_SCLASS, SC_SPATIAL);   // [4]
mclAPI::setSensorProp("reward",  PROP_SCLASS, SC_REWARD);    // [5]
mclAPI::setSensorProp("reward0", PROP_SCLASS, SC_REWARD);    // [6]
mclAPI::setSensorProp("reward1", PROP_SCLASS, SC_REWARD);    // [7]
FIG. 5.8. C++ code to define the sensors for Chippy using the MCL API
// 5. Define the expectation group.
// We will add the expectations when we get the rewards.
mclAPI::declareExpectationGroup((void *)this);

// Set reward expectation 0 or 1
char sensor_name[15];
sprintf(sensor_name, "reward%d", index);
expected[index] = reward;
mclAPI::declareExpectation((void *)this,
                           sensor_name,
                           EC_MAINTAINVALUE,
                           (float) reward);
FIG. 5.9. C++ code to define the expectations for Chippy using the MCL API
// 5. Tell MCL what we know
responseVector m = mclAPI::monitor(sensors, 8);

// 6. Evaluate the suggestions from MCL
if (m.size() > 0) {
    int q = 1;
    for (responseVector::iterator rvi = m.begin();
         rvi != m.end();
         rvi++) {
        cout << "response[ref" << hex
             << (*rvi)->referenceCode()
             << "] #" << q++ << ": "
             << (*rvi)->responseText() << endl;
    }
}
FIG. 5.10. C++ code for Chippy to implement suggestions from MCL
FIG. 5.11. Average Rewards per Step for Chippy with and without MCL monitoring
Chapter 6
RELATED WORK
In (Cox 2005), Michael Cox provides a survey of selected AI metacognition research
areas through 2000 (and a little beyond). Newer research is surveyed in (Anderson &
Oates 2007). This chapter will contrast several of the projects from the two surveys with
the ontology-based Metacognitive Loop. The chapter starts with a little reflection, looking
at pre-ontology and early-ontology MCL papers. It ends with a section on a topic not
covered in the survey papers: monitoring multi-agent systems.
6.1 Pre-ontology Metacognitive Loop
Both surveys reference (Anderson & Perlis 2005) when discussing the Metacognitive
Loop. The paper describes the problem of brittleness in AI systems due to the lack of
perturbation tolerance. The Metacognitive Loop with its notice, assess, and guide phases
is offered as a solution. A trio of problem domains (reinforcement learning, navigation, and
human-computer dialogue) are shown to benefit from adding MCL. Three research areas
are proposed corresponding to the three MCL phases:
1. How should expectations be formulated to best track the performance of the systems?
2. How should the reasoning about the exceptions be organized?
3. What are the best strategies for guiding a system back to proper operation?
The first and third questions remain open issues. Bayesian inference over three sets of
ontologies is thought to be the answer to the second question, and the proposed research
should show it to be an effective approach.
In (Anderson et al. 2006), three alternatives to the incorporation of MCL to deal with
perturbations are offered:
1. Do nothing,
2. Incorporate a recovery strategy for every possible problem, and
3. Create an extensive world model and continually compare the actual and predicted
performance.
The first of these approaches offers nothing except ease of implementation while the last
two are too expensive to use. MCL is offered as a cost-effective alternative: it has only a
moderate cost and can greatly improve a system's tolerance to perturbation. This is
demonstrated with the Chippy grid world (see section 4.2.1). Perturbation in Chippy was used to
explore different expectations (average reward and steps between rewards), different assess-
ment techniques (immediate and cumulative), and different recovery strategies (increasing
the exploration rate and resetting the Q values). All of this was tailored for Q-learning
and would not be applicable to other types of cognitive systems. The approach outlined
in this proposal should produce the same perturbation tolerance as observed in the MCL
enhanced Chippy but will be generally applicable.
6.2 Early Ontology Metacognitive Loop
In 2007 a series of papers were published giving a preview of an ontology-based MCL.
The use of ontologies is introduced in (Anderson et al. 2007b). It describes the three
ontologies and how they are linked together. The tank game, Bolo, is used as an
example domain. However, instead of using Bayesian inference, the paper discusses how
reasoning is done from expectations to response by spreading activation. An expanded
version of the paper (Schmill et al. 2007) includes a human-computer dialogue example
as well as Bolo. The Chippy reinforcement learner is used as the example in (Anderson et
al. 2007a).
The above three papers introduced the idea of generalizing the Metacognitive Loop by
using domain neutral ontologies in the Note, Assess, and Guide phases along with domain
specific expectations and corrective actions. This proposal replaces spreading activation
with Bayesian inference, adds an application program interface, and addresses the problem
of reentrant and recursive invocation.
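The substitution can be illustrated in miniature: across a single indication-to-failure link, Bayesian inference reduces to Bayes' rule over the link's conditional probabilities. The real MCL networks carry full conditional probability tables over all three ontologies, so the sketch below is only a toy illustration, not MCL code.

```cpp
// Posterior probability of a failure node F given an observed
// indication I, across one indication->failure link. The evidence
// term is obtained by marginalizing over F:
//   P(I)   = P(I|F)P(F) + P(I|~F)(1 - P(F))
//   P(F|I) = P(I|F)P(F) / P(I)
double failure_posterior(double p_i_given_f, double p_i_given_not_f,
                         double p_f) {
    double p_i = p_i_given_f * p_f + p_i_given_not_f * (1.0 - p_f);
    return p_i_given_f * p_f / p_i;
}
```

An indication that is much more likely under failure than otherwise sharply raises the failure node's posterior, mirroring how evidence propagates from the Indications to the Failure ontology.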
6.3 Model-Based Reflection
Model-based reflection (MBR) (Stroulia & Goel 1995; Stroulia & Goel 1996)
also uses a three phase approach to provide metacognition. The “monitor” phase checks
expectations, the “assign blame” phase determines the cause of the failure, and the
“redesign” phase makes the necessary corrections. But rather than using a general model of
cognitive systems and their failures, MBR uses a detailed model of the problem solver using
structure-behavior-function (SBF) models. Having these models allows the expectations to
be automatically generated.
6.4 Multi-agent Metacognition
One approach to handling problems (or perturbations) in multi-agent systems is to task
an agent with monitoring and controlling the other agents. These sentinel (Hägg 2000)
agents act as a metacognitive control on the collection of task solving agents (Figure 6.1).
The sentinel monitors the communication between agents. If an agent is acting
outside the model of the application specific interaction plan, the sentinel can take action
to correct the situation, such as killing an agent or informing other agents to ignore it.
Sentinels are also proposed in (Dellarocas & Klein 2000). But rather than an
application-specific model of the agent interactions, a general purpose three phase monitoring
scheme augmented by a knowledge base of error conditions, causes, and responses is used.
Table 6.1 contrasts this approach with the MCL phases and ontologies.
Table 6.1. Monitor and control in MCL and Sentinels
         MCL                        Sentinels
Phase    Ontology        Phase            KB
Notice   Indications     Instrumentation  Failure
Assess   Failure         Diagnosis        Exception
Guide    Response        Resolution       Resolution
The similarities between metacognition for single agents and sentinels for
multi-agent systems allow the transfer of ideas between the two. This is particularly apparent
when the fault handling approach is abstracted using ontologies (or knowledge bases) to
hold domain specific information.
FIG. 6.1. Monitoring a multi-agent system with a sentinel is isomorphic to usingmetacognition with a single cognitive agent.
Chapter 7
FUTURE WORK
The work described in the Technical Approach and Methodology sections is
considerable, and while it will greatly advance the utility of the Metacognitive Loop as an
augmentation to cognitive systems, there are several areas for further enhancement. Applying MCL
to different domains and evaluating its effectiveness can be an ongoing effort, especially
with comparisons to other approaches in the domain. The following are suggestions for
larger scale extensions to the Metacognitive Loop that are beyond the scope of the proposed
effort.
7.1 Automatic Expectation Generation
The MCL NAG cycle starts when an exception has occurred. It is required that the
designer of the system specify the exception conditions. For many maintenance condi-
tions, these are fairly obvious and easy to create: internal temperature will not exceed 150
degrees, battery power will not drop below 5 percent.
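Independent of the MCL API, such maintenance conditions amount to simple bound checks on sensor values. A hypothetical sketch (the type and names below are illustrative, not part of MCL):

```cpp
// A maintenance expectation as a bound on a single sensor value.
// upper_bound == true means the value must not exceed the limit;
// false means it must not drop below it.
struct MaintenanceExpectation {
    double limit;
    bool upper_bound;
    bool violated(double sensor_value) const {
        return upper_bound ? sensor_value > limit : sensor_value < limit;
    }
};

// The two example conditions from the text:
const MaintenanceExpectation max_temperature{150.0, true};  // degrees
const MaintenanceExpectation min_battery{5.0, false};       // percent
```

Each such check must run against fresh sensor values on every monitoring cycle, which is the per-expectation overhead discussed below.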
Perhaps a better solution would be to specify a minimal set of absolute expectations
and then have MCL learn and incorporate additional expectations. This has several advan-
tages:
Minimizing human effort Since only a limited set of expectations would have to be given
to MCL, the domain programmer’s task would be limited to specifying that limited
set and not a broad range of expectations.
Minimizing the exception-checking overhead A system can have a great many mainte-
nance expectations, most of which will never be violated. Such expectations degrade
the system as they need to be checked against the sensor values just as often as ex-
pectations that may be violated. By generating expectations through learning, only
expectations that have the potential for violation would be created.
Improve problem detection As problems are identified, expectations could be created to
detect them earlier. This would allow identification of a recurrence of a problem
early enough to avoid or lessen the consequences.
The automatic generation of expectations would have to be tempered with common-
sense reasoning or other heuristics to prevent generating expectations that do not improve
the efficiency of the host system.
7.2 Automatic Ontology Expansion/Linking
The structure (nodes and linkages) of the MCL ontologies was created based on
experience with the initial problem domains and modified as additional domains and analyses
were undertaken. It is, however, a static model and is certainly not optimal for all situations.
As with the automatic generation of expectations, the automatic generation of ontology
nodes and linkages has the potential to improve the operation of MCL, particularly in new
domains. The same caveat applies that any changes must improve the efficiency of the host
system and may require extensive heuristics to implement.
7.3 Application to Multi-agent systems
All of the problem domains discussed and evaluated in this proposal involve single
agents. MCL can be directly applied to individual agents in a multi-agent system. These can
be either normal agents or sentinel agents.
When using MCL to monitor and control multiple agents, additional concrete re-
sponses would be needed with the corresponding augmentations to the Response ontology.
The Failure ontology would need to be expanded to incorporate nodes for agent communi-
cation and coordination failures. Agents can be treated as both sensors and effectors of the
sentinel agents.
7.4 Transferring Learning with MCL Networks
As MCL works with a host system, the conditional probabilities on the intra-
ontological and inter-ontological links change to reflect which suggestions proved effective
in coping with failed expectations. Each host system is different, with different expectations
and available concrete responses, but there should be a way
to apply a tuned set of conditional probabilities from one system to another.
7.5 Modeling dynamic environments
A basic tenet of this paper is that MCL provides agents with a mechanism for coping
with dynamic environments so that such mechanisms do not need to be crafted into the
agents themselves. But what exactly is a dynamic environment, and how is it quantified?
Is it possible to create quantitative or predictive models of dynamic environments, and how
could these models be used to improve the operation of the NAG cycle within MCL?
Chapter 8
TIMETABLE
This section details a twenty-month timetable to enhance MCL. It covers research
activities, code development and testing, evaluation, and completion of the dissertation
materials with the final defense. The work concludes with a May 2009 commencement.
8.1 Activities
There are several activities that need to be accomplished, but they can be divided into
five major segments.
Bayesian Ontologies The NAG cycle of MCL will be reworked to use Bayesian inference
over Indications, Failure, and Response ontologies.
HTN Bolo and BoloSoar Implementation of a task planner in the Bolo tank domain both
in C++ and in Soar.
Reentrant and Recursive MCL Investigate and implement knowledge structures that
support reentrant and recursive invocations of MCL to deal with errors over time
and self improvement.
Evaluation Evaluate the enhanced MCL in terms of performance, ease and breadth of
implementation, and additional computation resources required.
Dissertation and Defense Complete, revise, and defend the dissertation.
Intertwined with these activities will be many support and minor research projects,
including
• conference and journal papers with partial results
• C++, C, and additional APIs for MCL
• porting MCL to various machines and operating systems
• documentation and demonstration programs for the MCL and BoloSoar APIs
8.2 Monthly Schedule
Table 8.1 gives a month-by-month breakdown of the activities described above.
Table 8.1. Monthly Timetable for MCL Reengineering
Year  Month  Number  Activity
2007  Aug            Proposal Defense
      Sep      1     Bayesian Ontology
      Oct      2     Bayesian Ontology
      Nov      3     Bayesian Ontology
      Dec      4     HTN Bolo
2008  Jan      5     HTN Bolo
      Feb      6     Bolo Soar
      March    7     Bolo Soar
      April    8     Reentrancy
      May      9     Reentrancy
      June    10     Reentrancy
      July    11     Recursive MCL
      Aug     12     Recursive MCL
      Sep     13     Recursive MCL
      Oct     14     Performance measurements
      Nov     15     Implementation measurements
      Dec     16     Computational Requirements
2009  Jan     17     Revise Dissertation
      Feb     18     Revise Dissertation
      Mar     19     Committee Review
      Apr     20     Dissertation Defense
      May           Commencement
REFERENCES
[1] Anderson, M. L., and Oates, T. 2007. A review of recent research in metareasoning
and metalearning. AI Magazine 28(1):7–16.
[2] Anderson, M. L., and Perlis, D. R. 2005. Logic, self-awareness and self-improvement:
the metacognitive loop and the problem of brittleness. Journal of Logic and Computa-
tion 15(1):21–40.
[3] Anderson, M. L.; Oates, T.; Chong, W.; and Perlis, D. 2006. The metacognitive
loop: Enhancing reinforcement learning with metacognitive monitoring and control for
improved perturbation tolerance. Journal of Experimental and Theoretical Artificial
Intelligence 18(3):387–411.
[4] Anderson, M. L.; Fults, S.; Josyula, D. P.; Oates, T.; Perlis, D.; Schmill, M. D.; Wilson,
S.; and Wright, D. 2007a. A self-help guide for autonomous systems. AI Magazine.
[5] Anderson, M. L.; Schmill, M.; Oates, T.; Perlis, D.; Josyula, D.; Wright, D.; and Wil-
son, S. 2007b. Toward domain-neutral human-level metacognition. In 8th International
Symposium on Logical Formalizations of Commonsense Reasoning, 1–6.
[6] Cox, M. T. 2005. Metacognition in computation: a selected research review. Artificial
Intelligence 169(2):104–141.
[7] Dellarocas, C., and Klein, M. 2000. An experimental evaluation of domain-
independent fault handling services in open multi-agent systems. In Proceedings of the
International Conference on Multi-agent Systems (ICMAS).
[8] Hägg, S. 2000. A sentinel approach to fault handling in multi-agent systems. In
Proceedings of the 2nd Australian Workshop on Distributed AI.
[9] Morrison, J., and Cheshire, S. 2000. How to write plug-in brains. www.winbolo.com.
Included in Winbolo distribution.
[10] Morrison, J. 2006. Lin-Winbolo Manual. www.winbolo.com. Included in Winbolo
distribution.
[11] Schmill, M.; Josyula, D.; Anderson, M. L.; Wilson, S.; Oates, T.; Perlis, D.; Wright,
D.; and Fults, S. 2007. Ontologies for reasoning about failures in AI systems. In
Workshop on Metareasoning in Agent-Based Systems.
[12] Stroulia, E., and Goel, A. K. 1995. Functional representation and reasoning for
reflective systems. Journal of Applied Intelligence 9(1):101–124.
[13] Stroulia, E., and Goel, A. K. 1996. A model-based approach to blame assignment:
Revising the reasoning steps of problem solvers. In Thirteenth National Conference on
Artificial Intelligence, 959–965. Portland, Oregon: AAAI Press.
[14] Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning, An Introduction.
Cambridge, MA: MIT Press.