


Nonmonotonic Logic and Temporal Projection*

Steve Hanks and Drew McDermott**

Department of Computer Science, Yale University, New Haven, CT 06520, U.S.A.

Recommended by Daniel G. Bobrow

ABSTRACT

Nonmonotonic formal systems have been proposed as an extension to classical first-order logic that will capture the process of human "default reasoning" or "plausible inference" through their inference mechanisms, just as modus ponens provides a model for deductive reasoning. But although the technical properties of these logics have been studied in detail and many examples of human default reasoning have been identified, for the most part these logics have not actually been applied to practical problems to see whether they produce the expected results.

We provide axioms for a simple problem in temporal reasoning which has long been identified as a case of default reasoning, thus presumably amenable to representation in nonmonotonic logic. Upon examining the resulting nonmonotonic theories, however, we find that the inferences permitted by the logics are not those we had intended when we wrote the axioms, and in fact are much weaker. This problem is shown to be independent of the logic used; nor does it depend on any particular temporal representation. Upon analyzing the failure we find that the nonmonotonic logics we considered are inherently incapable of representing this kind of default reasoning.

The first part of the paper is an expanded version of one that appeared in the 1986 AAAI proceedings. The second part reports on several responses to our result that have appeared since the original paper was published.

1. Introduction

Logic as a representation language for AI theories has always held a particular appeal in the research community (or in some parts of it, anyway): its rigid syntax forces one to be precise about what one is saying, and its semantics provide an agreed-upon and well-understood way of assigning meaning to the symbols. But if logic is to be more than just a concise and convenient notation that helps us in the task of writing programs, we somehow have to validate the axioms we write: are the conclusions we can draw from our representation (i.e., the inferences the logic allows) the same as the ones characteristic of the reasoning process we are trying to model? If so, we've gone a long way toward validating our theory. In fact several research efforts have been based around the idea that logic can describe both ontology and reasoning processes. Most notable among these are the "Naive Physics" program proposed by Hayes (e.g. in [9]) and the work done by McCarthy on circumscription (e.g. in [17]).

* This is a revised version of the paper that won the AI Journal Best Paper Award at AAAI-86, Philadelphia, PA.

** This work was supported in part by ONR grant N00014-85-K0301.

Artificial Intelligence 33 (1987) 379-412 0004-3702/87/$3.50 © 1987, Elsevier Science Publishers B.V. (North-Holland)

The limitation of classical logic as a representation for human reasoning is that its inference rule, modus ponens, is the analogue to human deductive reasoning, but for the most part everyday human reasoning seems to have significant nondeductive components. But while certain aspects of human reasoning (e.g. inductive generalization and abductive explanation) seem to be substantially different from deduction, a certain class of reasoning, dubbed "default reasoning," resembles deduction more closely. Thus it was thought that extensions to first-order logic might produce formal systems capable of representing the process of default reasoning.

While it is still not clear exactly what constitutes default reasoning, the phenomenon commonly manifests itself when we know what conclusions should be drawn about typical situations or objects, but we must jump to the conclusion that an observed situation or object is typical. For example, I may know that I typically meet with my advisor on Thursday afternoons, but I can't deduce that I will actually have a meeting next Thursday because I don't know whether next Thursday is typical. While certain facts may allow me to deduce that next Thursday is not typical (e.g. if I learn he will be out of town all next week), I will not generally be able to deduce that it is typical. What we want to do in cases like this is to jump to the conclusion that next Thursday is typical based on two pieces of information: first that most Thursdays are typical, and second that we have no reason to believe that this one is not. Another way to express the same notion is to say that I know that I have meetings on typical Thursdays, and that the only atypical Thursdays are the ones that I know (can deduce) are atypical. (Thus everything is typical that cannot be proved to be atypical.)

Research on nonmonotonic logics¹, most notably by McCarthy (in [16, 17]), McDermott and Doyle (in [21]) and Reiter (in [24]), attacked the problem of extending first-order logic in a way that captured the intuitive meaning of statements of the form "lacking evidence to the contrary, infer α," or more generally "infer β from the inability to infer α."² But since that first flurry of research the area has developed in a strange way. On one hand the logics have been subjected to intense technical scrutiny (in the papers cited above, and also, for example, by Davis in [3]) and have been shown to produce counterintuitive results under certain circumstances. But these examples always have a certain contrived appearance to them. A favorite example is a theory containing a rule "infer α from the inability to infer α." Presumably there would be no real representation problem in which the inability to prove a formula should be a reason to believe that formula, so we wonder whether the anomalous technical results really should matter to us in representing the real world.

¹ So called because of the property that inferences allowed by the logic may be disallowed as axioms are added. For example I may jump to the conclusion that next Thursday is typical, and thus deduce that I will have a meeting. If I later come to find out that it is atypical, I will have to retract that conclusion. In first-order logic the addition of new knowledge (axioms) to a theory can never diminish the deductions one can make from that theory, so it is never necessary to retract conclusions.

² This sounds much more straightforward than it is: consider that the theorems of a logic are defined in terms of its inference rules, yet here we are trying to define an inference rule in terms of what is or is not a theorem.

At the same time we see in the literature practical representation problems such as story understanding (Charniak [1]), social convention in conversation (Joshi, Webber, and Weischedel [10]), and temporal reasoning (McDermott [19] and McCarthy [17]), in which default rules would seem to be of use, but in these cases technical details of the formal systems are for the most part ignored. At best the researchers admit that they don't understand exactly what they're writing when they resort to the language of default logic (see [19], for example); at worst they just go ahead and use the syntax without further analysis.

The middle ground--whether the technical workings of the logics correctly bear out one's intentions in representing practical default-reasoning problems--is for the most part empty, though the work of Reiter, Etherington, and Criscuolo (in [4, 5] and elsewhere) is a notable exception. Logicians have for the most part ignored practical problems to focus on technical details, and "practitioners" have used the default rules intuitively, with the hope (most often unstated) that the proof theory or semantics of the logics can eventually be shown to support those intuitions.

We explore that middle ground by presenting a problem in temporal reasoning that involves default inference, writing axioms in nonmonotonic logics intended to represent that reasoning process, then analyzing the resulting theory.

Reasoning about time is an interesting application for a couple of reasons. First of all, the problem of representing the tendency of facts to endure over time (the "frame problem" of McCarthy and Hayes [18] or the notion of "persistence" of McDermott [19]) has long been assumed to be one of those practical reasoning problems that nonmonotonic logics would solve. Second, one has strong intuitions about how the problem should be formalized in the logics, and even stronger intuitions about what inferences should then follow, so it will be clear whether the logics have succeeded or failed to represent the domain correctly.

In the first part of this paper we discuss briefly some technical aspects of nonmonotonic logics, then go on to pose formally the problem of temporal projection. We then analyze the inferences allowed by the resulting theory and show that they do not correspond to what we had intended when we wrote the axioms. Finally we point out the (unexpected) characteristics of the domain that the logics were unable to capture, and discuss proposed solutions to the problem.

The paper we originally published on this result, which first appeared as a Yale technical report [7], then in a shortened version in the AAAI-86 proceedings [8], generated a wide range of responses. In the second part of this paper we will discuss these replies and comment on the extent to which they actually address the issues we raised.

2. Nonmonotonic Inference

Since we are considering the question of what inferences can be drawn from a nonmonotonic theory, we should look briefly at how inference is defined in these logics. We will concentrate on Reiter's default logic and on circumscription, but the discussion and subsequent results hold for McDermott's nonmonotonic logic as well.

2.1. Inference in default logic

Reiter in [24] defines a default theory as a pair of sets. The first consists of sentences in first-order logic (and is usually referred to as W), and the second is a set of default rules (referred to as D). Default rules are supposed to indicate what conclusions to jump to, and are of the form

α : Mβ
------
  γ

where α, β, and γ are first-order sentences. The intended interpretation of this rule is "if you believe α, and it's consistent to believe β, then believe γ", or, to phrase the idea more like an inference rule, "from α and the inability to prove ¬β, infer γ." (But recall our note above about the futility of trying to define inference in terms of inference.)

In order to discuss default inference Reiter introduces the concept of an extension--a set of sentences that "extend" the sentences in W according to the dictates of the default rules. A default theory defines zero or more extensions, each of which has the following properties:

(1) any extension E contains W,
(2) E is closed under (monotonic) deduction, and
(3) E is faithful to the default rules.

By the last we mean that if there's a default rule in the theory of the form (α : Mβ)/γ, and if α ∈ E, and ¬β ∉ E, then γ ∈ E. The extensions of a default theory are all the minimal sets E that satisfy these three properties. Extensions can be looked upon as internally consistent and coherent states of the world, and a reasoner should thus "pick" an extension from those generated by the default theory and reason within it. Although an extension is by definition consistent, the union of two extensions may be inconsistent.

Finding a satisfying definition of default inference--what sentences can be said to follow from a default theory--is tricky. Reiter avoids the problem altogether, focusing on the task of defining extensions and exploring their properties. He subscribes to the view of reasoning that we noted in the paragraph above: that default reasoning is really a process of selecting one extension of a theory, then reasoning "within" this extension until new information forces a revision of one's beliefs and hence the selection of a new extension.

This view of default reasoning, while intuitively appealing, is infeasible from a practical standpoint: there is no way of "isolating" a single extension of a theory, thus no procedure for enumerating or testing theoremhood within an extension. So any definition of default reasoning based on discrimination among extensions is actually beyond the expressive power of default logic. Reiter does provide a proof procedure for asking whether a sentence is a member of any extension, but, as he points out, this is not a satisfying definition of inference since both a sentence and its negation may appear in different extensions.

Our view is that some notion of inference is necessary to judge the representational power of the logic. A logic that generates one intuitive extension and one unintuitive extension does not provide an adequate representation of the problem, since there is no way to distinguish between the two interpretations. For that reason we will define inference in the manner of McDermott's logic: a sentence φ can be inferred from a default theory just in case φ is in every extension of that theory. (This definition is also consistent with circumscriptive inference as described in the next section.)

While there is no general procedure for determining how many extensions a given theory has, as a practical matter it has been noted that theories with "conflicting" default rules tend to generate multiple extensions. For example, the following default theory

W = {Q(N), R(N)},

D = { Q(x) : M P(x) / P(x),   R(x) : M ¬P(x) / ¬P(x) }

has two rules that would have us jump to contradictory conclusions. Applying one of the default rules means that the other cannot be applied, since its precondition is not met. This default theory has two extensions: E1 = {Q(N), R(N), P(N)} and E2 = {Q(N), R(N), ¬P(N)}, which correspond to the two choices one has in applying the default rules. (One interpretation of this theory reads Q as "Quaker," R as "Republican," P as "Pacifist," and N as "Nixon.") Thus according to our definition of inference the above theory entails only the sentences in W, plus tautologies (for example P(N) ∨ ¬P(N)).
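To make the multiple-extension phenomenon concrete, here is a small Python sketch (ours, not part of the paper) that enumerates the extensions of this ground theory by generate-and-test. It makes a simplifying assumption: the language contains only literals, so deductive closure is trivial and consistency is just the absence of a complementary pair. The names W, DEFAULTS, and extensions are our own.

from itertools import combinations

# A toy, literal-only enumeration of Reiter-style extensions for the ground
# Nixon theory.  Deductive closure is trivial in a language of literals, so an
# extension is just W plus the conclusions of exactly the applicable defaults.

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def consistent(lits):
    return not any(neg(l) in lits for l in lits)

W = frozenset({"Q", "R"})                    # Q(N), R(N)
DEFAULTS = [("Q", "P", "P"),                 # Q(N) : M P(N)  / P(N)
            ("R", "-P", "-P")]               # R(N) : M -P(N) / -P(N)

def extensions():
    found = []
    for k in range(len(DEFAULTS) + 1):
        for chosen in combinations(DEFAULTS, k):
            E = W | {c for (_, _, c) in chosen}
            if not consistent(E):
                continue
            # Reiter's fixed-point condition, specialised to literals: the
            # defaults applicable with respect to E must be exactly the ones
            # whose conclusions we added.
            applicable = {c for (p, j, c) in DEFAULTS if p in E and neg(j) not in E}
            if E == W | applicable:
                found.append(E)
    return found

for E in extensions():
    print(sorted(E))        # prints ['P', 'Q', 'R'] and ['-P', 'Q', 'R']

The two printed extensions correspond to E1 and E2 above; their intersection is just W, which is why the "in every extension" definition of inference licenses so little here.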


We are not claiming that the admission of multiple extensions is a fault or deficiency of the logic--in this particular example it's hard to imagine how the logic could license any other conclusions. (It would be our responsibility to add additional rules to resolve the conflict, saying, for example, that one's Quaker tendencies always win out over one's Republican tendencies, or vice versa. McCarthy [17] and McDermott [20] both discuss techniques for restructuring the default rules to mediate their application.)

The point is that when a theory generates multiple extensions it's generally going to be the case that only weak conclusions can be drawn. Further, if one extension captures the intended interpretation but there are other different extensions, it will not be possible to make only the intended inferences.

2.2. Inference in circumscription

To describe inference in circumscribed theories we will have to be rather vague: there are several versions of the logic, defined in [14, 16, 17] and elsewhere, and we will not spend time discussing these differences.

We will speak generally of predicate circumscription, in which the intent is to minimize the extension of a predicate (say P) in a set of first-order axioms. Using terms like those we used in describing default logic, we might say that when we circumscribe axioms over P we intend that "the only individuals for which P holds are those individuals for which P must hold," or alternatively we might phrase it as "believe 'not P' by default."

To circumscribe a set of axioms A over a predicate P one adds to A an axiom (the exact form of which is not important for our discussion) that says something like this: "any predicate P′ that satisfies the axioms A, and is at least as strong as P, is exactly as strong as P." The intended effect is (roughly) that for any individual x,

if A ⊬ P(x), then Circum(A, P) ⊢ ¬P(x),

where Circum(A, P) refers to the axioms in A augmented by the circumscription axiom for P. This formulation is oversimplified: in addition to deciding what predicate(s) to circumscribe over, one must also decide on a set of predicates that are allowed to vary as parameters in the circumscription axiom. We will not discuss this complication further--in a previous paper [7] we do so at some length.
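For readers who want the axiom spelled out, one standard second-order formulation of predicate circumscription is the following (our rendering; the exact form varies across the versions cited above):

\[
\mathrm{Circum}(A; P) \;=\; A(P) \;\wedge\; \forall P'\,\bigl[\,A(P') \wedge \forall x\,(P'(x) \supset P(x)) \;\supset\; \forall x\,(P(x) \supset P'(x))\,\bigr].
\]

When other predicates are allowed to vary (as t will be below), the quantification ranges over those predicates as well, so the schema reads roughly ∀P′, t′ [A(P′, t′) ∧ (P′ ≤ P) ⊃ (P ≤ P′)], which is the complication the previous paragraph sets aside.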

To talk about circumscriptive inference we should first note that since Circum(A, P) is a first-order theory we want to know what deductively follows, but we are interested in characterizing those deductions in terms of the original axioms in A. In brief, the results are these: if a formula φ is a theorem of Circum(A, P) then φ is true in all models of A minimal in P (this property is called soundness), and if a formula φ is true in all models of A minimal in P then φ is a theorem of Circum(A, P) (this property is called completeness). Completeness does not hold for all circumscribed theories, but it does hold in certain special cases--see Minker and Perlis [22].

Minimal models, the model-theoretic analogue to default-logic extensions, are defined as follows: a model M is minimal in P just in case there is no model M′ that agrees with M on all predicates except for P, but whose extension of P is a proper subset of M's extension of P.

As with default logic, there is no effective procedure for determining how many minimal models a theory has. And note that the contrapositive of the soundness property says that if φ is not true in all models minimal in P it does not follow from Circum(A, P). So once again, if we have multiple minimal models we can deduce only those formulas true in all of them. Because of the obvious parallels between extensions and minimal models (and "NM fixed points" in McDermott's logic, which we will not discuss here), we will use the terms interchangeably when the exact logic or object doesn't matter.

3. The Temporal Projection Problem

The problem we want to represent is this: given an initial description of the world (some facts that are true), the occurrence of some events, and some notion of causality (that an event occurring can cause a fact to become true), what facts are true once all the events have occurred?

We obviously need some temporal representation to express these concepts, and we will use the situation calculus [18]. We will thus speak about facts holding true in situations. A fact syntactically has the form of a first-order sentence, and is intended to be an assertion about the world, such as SUNNY, LOADED(GUN-35), or ∀x. HAPPY(x). Situations are individuals denoting intervals of time over which facts hold or do not hold, but over which no fact changes its truth value. This latter property allows us to speak unambiguously about what facts are true or false in a situation. To say that a fact f is true in a situation s we assert t(f, s), where t is a predicate and f and s are terms. In fact t and ab, which we will define below, are the only predicates in our logic.

Events are things that happen in the world, and the occurrence of an event may have the effect of changing the truth value of a fact. So we think of an event as a transition from one situation to another--from the situation in which it occurs to one in which its effects are reflected. The function result maps an event and a situation into another situation. So if S0 is a situation and WAKEUP(JOHN) is an event, then result(WAKEUP(JOHN), S0) is also a situation--presumably the one resulting from JOHN waking up in situation S0. We might then want to state that JOHN is awake in this situation:

t(AWAKE(JOHN), result(WAKEUP(JOHN), S0)),


or more generally we might state that

∀p, s. t(AWAKE(p), result(WAKEUP(p), s)).

A problem arises when we try to express the notion that facts tend to stay true from situation to situation as irrelevant events occur. For example, is JOHN still awake in the state S2, where

S2 = result(EAT-BREAKFAST(JOHN), result(WAKEUP(JOHN), S0))?

Intuitively we would like to assume so, because it's typically the case that eating breakfast does not cause one to fall asleep. But given the above axioms there is no way to deduce

t(AWAKE(JOHN), S2).

We could add an axiom to the effect that if one is awake in a situation then one is still awake in the situation resulting from his eating breakfast, but this seems somewhat arbitrary and will occasionally be false. And in any reasonable description of what one might do in the course of a morning there would have to be a staggering number of axioms expressing something like "if fact f is true in a situation s, and e is an event, then f is still true in the situation result(e, s) ." McCarthy and Hayes (in [18]) called axioms of this kind "frame axioms," and identified the "frame problem" as the problem of having to explicitly state many such axioms. Deductive logic forces us into the position of assuming that an event occurrence may potentially change the truth value of all facts, thus if it does not change the value of a particular fact in a particular situation we must explicitly say so. We would like to assume just the opposite: that most events do not affect the truth of most facts under most circumstances.
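To make the flavor of such frame axioms concrete, here is one instance for the breakfast example (our rendering; the paper describes these axioms only in prose):

\[
\forall p, s.\;\; t(\mathrm{AWAKE}(p),\, s) \;\supset\; t(\mathrm{AWAKE}(p),\, \mathit{result}(\mathrm{EAT\text{-}BREAKFAST}(p),\, s)).
\]

One such axiom is needed for essentially every fact/event pair that does not interact, which is exactly the explosion the frame problem names.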

Intuitively we want to solve the frame problem by assuming that in general an event happening in a situation is irrelevant to a fact's truth value in the resulting situation. Or, to make the notion a little more precise, we want to assume "by default" that

t(f, s) ⊃ t(f, result(e, s))

for all facts, situations, and events. But note the quotes: the point of this paper is that formalizing this assumption is not as straightforward as the phrase might lead one to believe.

McCarthy's proposed solution to the frame problem (described in [17]) involves extending the situation calculus a little, to make it what he calls a "simple abnormality theory ." We state that all "normal" facts persist across occurrences of "normal" events:


∀f, e, s. t(f, s) ∧ ¬ab(f, e, s) ⊃ t(f, result(e, s)),

where ab(f, e, s) is taken to mean "fact f is abnormal with respect to event e occurring in state s," or, "there's something about event e occurring in state s that causes fact f to stop being true in result(e, s)." We would expect, for example, that it would be true that

∀p, s. ab(AWAKE(p), GOTOSLEEP(p), s)

and we would have to add a specific axiom to that effect. Of course we still haven't solved the frame problem, since we haven't provided any way to deduce ¬ab(f, e, s) for most facts, events, and situations. As an alternative to providing myriad frame axioms of this form, McCarthy proposes that we circumscribe over the predicate ab, thus "minimizing" the abnormal temporal individuals.³ The question is whether this indeed represents what we intuitively mean by saying that we should "assume the persistence of facts by default," or "once true, facts tend to stay true over time."

As an illustration of what we can infer from a simple situation-calculus abnormality theory, consider the axioms of Fig. 1. For simplicity we have restricted the syntactic form of facts and events to be propositional symbols, so the axioms can be interpreted as referring to a single individual who at any point in time (situation) can be either ALIVE or DEAD and a gun that can be either LOADED or UNLOADED. At some known situation S0 the person is alive (axiom (1)), and the gun becomes loaded any time a LOAD event happens (axiom (2)). Axiom (3) says that any time the person is shot with a loaded gun he becomes dead, and furthermore that being shot with a loaded gun is abnormal with respect to staying alive. Or to express our intent as we did above: there is something about a SHOOT event occurring in a situation in which the gun is loaded that causes the fact ALIVE to stop being true in the situation resulting from the shot. Axiom (4) is just the assertion we made above that "normal" facts persist across the occurrence of "normal" events.

t(ALIVE, S0)   (1)

∀s. t(LOADED, result(LOAD, s))   (2)

∀s. t(LOADED, s) ⊃ ab(ALIVE, SHOOT, s) ∧ t(DEAD, result(SHOOT, s))   (3)

∀f, e, s. t(f, s) ∧ ¬ab(f, e, s) ⊃ t(f, result(e, s))   (4)

Fig. 1. Simple situation-calculus axioms.

Then we circumscribe axioms (1)-(4) over ab, varying t, and recall that what follows from the augmented axioms are those formulas true in all models minimal in ab. We will refer to axioms (1)-(4) as A, and to the circumscribed axioms as Circum(A, ab, t).

³ More precisely, the proposal should be to circumscribe over ab while allowing predicate t to vary. Again, we will ignore this point.

Now consider the problem of projecting what facts will be true in the following situations:

S0,

S1 = result(LOAD, S0),

S2 = result(WAIT, S1),

S3 = result(SHOOT, S2) = result(SHOOT, result(WAIT, result(LOAD, S0))).

In other words, our individual is initially known to be alive, then the gun is loaded, then he waits for a while, then he is shot with the gun. The projection problem is to determine what facts are true at the situations Si. The event WAIT is supposed to signify a period of time when nothing of interest happens. Since according to the axioms no fact is abnormal with respect to the occurrence of a WAIT event, we intend that every fact true before the WAIT event occurs should also be true after it occurs.

One interpretation of the axioms (1)-(4) is shown in Fig. 2(a). This first picture represents facts true in all models of A (thus what we can deduce if we don't circumscribe). From axioms (1) and (2) we can make the following deductions:

t(ALIVE, S0), t(LOADED, S1),

but we can deduce nothing about what is true in S2 or in S3. We also cannot deduce any "abnormalities" or their negations. But this is pretty much as expected: the ALIVE fact did not persist because we could not deduce that it was "not ab" with respect to loading the gun, and the gun being loaded did not persist through the WAIT event because we could not deduce that it was "not ab" with respect to waiting.

[Fig. 2. Three models of axioms (1)-(4) of Fig. 1: (a) what is true in all models of A; (b) the intended minimal model, in which LOADED persists through the WAIT and the SHOOT kills; (c) the anomalous minimal model, in which LOADED is "clipped" by the WAIT and the victim survives.]

Intuitively we would like to reason about "minimizing abnormalities" like this: we know ALIVE must be true in S0, and nothing compels us to believe ab(ALIVE, LOAD, S0), so we assume its negation. From this assumption and from axiom (4) we deduce t(ALIVE, S1). Reasoning along the same lines, we can deduce t(LOADED, S1), and we are free to make the assumptions ¬ab(ALIVE, WAIT, S1) and ¬ab(LOADED, WAIT, S1), so we do so and go on to deduce t(ALIVE, S2) and t(LOADED, S2). Again moving forward in time, we can deduce from axiom (3) that ab(ALIVE, SHOOT, S2), so we cannot assume its negation, but we can assume ¬ab(LOADED, SHOOT, S2). At this point we can deduce t(DEAD, S3) and t(LOADED, S3).

This line of reasoning leads us to the interpretation of Fig. 2(b). We can easily verify that this interpretation is indeed a model of A (that axioms (1) through (4) are satisfied). Furthermore, this model is minimal in ab: any submodel would have to have an empty extension for ab, which cannot be the case.⁴ The interesting question now is whether the model of Fig. 2(b)--our intended model of A--is the only minimal model, or, more to the point, whether t(DEAD, S3) and t(LOADED, S3) are true in all minimal models. Because if they're not true in all minimal models they don't follow from Circum(A, ab, t).

Consider the situation in Fig. 2(c). The picture describes a state of affairs in which the gun ceases to be loaded "as a result of" waiting. Then the individual does not die as a result of the shot, since the gun is not loaded. Of course this state of affairs directly contradicts our stated intention that since nothing is explicitly "ab" with respect to waiting everything should be "not ab" with respect to waiting. Does this interpretation describe a minimal model? First recall that there can be no models having a null extension for ab, so if this interpretation is a model at all it must be minimal in ab.

One can "build" this model in much the same way we constructed the model of Fig. 2(b), except this time instead of starting at S0 and working forward in time, we will start at S3 and work backward. In other words, we will start by noting that t(ALIVE, S3) must be true, then consider what must have been true at S2 and in earlier situations for that to have been the case.

The first abnormality decision to make is whether ab(ALIVE, SHOOT, S2) is true. Since we haven't made a decision to the contrary, we will assume its negation. But then from the contrapositive of axiom (3), we can deduce ¬t(LOADED, S2). But if that is the case, and since it must also be the case that t(LOADED, S1) is true, we can deduce from axiom (4) that ab(LOADED, WAIT, S1) is true. The rest of Fig. 2(c) follows directly, since we can assume that ALIVE is "not ab" with respect to LOAD and WAIT, and thus deduce that ALIVE is true in both S1 and S2.

⁴ To see why this is true, consider that t(ALIVE, S2) must be either true or false in any model of A. If it's false then either ab(ALIVE, LOAD, S0) or ab(ALIVE, WAIT, S1) would have to be true. If it's true, then note that t(LOADED, S2) must also be either true or false: if true, axiom (3) immediately gives ab(ALIVE, SHOOT, S2); if false, axiom (4) (together with t(LOADED, S1)) gives ab(LOADED, WAIT, S1). In every case we must have at least one abnormality.

What, then, can be deduced from the (circumscribed) abnormality theory? It's fairly easy to verify that the two models we have presented are the only two models minimal in ab, so the theorems of Circum(A, ab, t) are those common to those two models. We can therefore deduce that ALIVE and LOADED are true in S1, that ALIVE is true in S2, but we can say nothing about what is true in S3 except for statements like t(ALIVE, S3) ∨ t(DEAD, S3). What we can deduce from Circum(A, ab, t) is therefore considerably weaker than what we had intended.
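Readers who want to check this claim mechanically can do so by brute force. The following Python sketch (ours, not from the paper) grounds axioms (1)-(4) over just the four situations of the story and searches for the extensions of ab realized by some model; real circumscription quantifies over all models, but this finite rendering is enough to exhibit the two minimal models described above. The names FACTS, TRANS, satisfies, and so on are our own encoding.

from itertools import product

# Brute-force, ground rendering of axioms (1)-(4) over the story's situations.

FACTS = ["ALIVE", "DEAD", "LOADED"]
SITS  = ["S0", "S1", "S2", "S3"]
TRANS = [("LOAD", "S0", "S1"), ("WAIT", "S1", "S2"), ("SHOOT", "S2", "S3")]

T_ATOMS  = [(f, s) for f in FACTS for s in SITS]
AB_ATOMS = [(f, e, s) for f in FACTS for (e, s, _) in TRANS]

def satisfies(t, ab):
    if not t[("ALIVE", "S0")]:                       # axiom (1)
        return False
    if not t[("LOADED", "S1")]:                      # axiom (2), instance at S0
        return False
    if t[("LOADED", "S2")] and not (                 # axiom (3), instance at S2
            ab[("ALIVE", "SHOOT", "S2")] and t[("DEAD", "S3")]):
        return False
    for f in FACTS:                                  # axiom (4), all instances
        for (e, s, s2) in TRANS:
            if t[(f, s)] and not ab[(f, e, s)] and not t[(f, s2)]:
                return False
    return True

# Which extensions of ab are realised by at least one model?  (A couple of
# million candidate assignments; takes a few seconds.)
realised = set()
for ab_bits in product([False, True], repeat=len(AB_ATOMS)):
    ab = dict(zip(AB_ATOMS, ab_bits))
    for t_bits in product([False, True], repeat=len(T_ATOMS)):
        t = dict(zip(T_ATOMS, t_bits))
        if satisfies(t, ab):
            realised.add(frozenset(a for a, v in ab.items() if v))
            break                                    # one witness is enough

# A model is minimal in ab iff no realised ab-extension is a proper subset.
minimal = [s for s in realised if not any(o < s for o in realised)]
for s in sorted(minimal, key=sorted):
    print(sorted(s))
# [('ALIVE', 'SHOOT', 'S2')]   -- the intended model of Fig. 2(b)
# [('LOADED', 'WAIT', 'S1')]   -- the anomalous model of Fig. 2(c)

Since t(DEAD, S3) fails in the second minimal model, it is indeed not a consequence of Circum(A, ab, t).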

4. How General Is This Problem?

The question now arises: how dependent is this result on the specific problem and formulation we just presented? Does the same problem arise if we use a different default logic or a different temporal formalism?

We can easily express the theory above in Reiter's logic: we use the same first-order axioms from Fig. 1, but instead of circumscribing over ab we represent "minimizing abnormal individuals" with a set of default rules of the form

: M ¬ab(f, e, s)
----------------
  ¬ab(f, e, s)

(where any individual may be substituted for f, e, and s). Recall that extensions are defined proof-theoretically instead of model-theoretically, so we must translate the minimal models shown in Figs. 2(b) and (c) into sets of sentences. The question becomes whether the following sets are default-logic extensions:

Ea:
(1a) t(ALIVE, S0),
(2a) ¬ab(ALIVE, LOAD, S0),
(3a) t(ALIVE, S1),
(4a) t(LOADED, S1),
(5a) ¬ab(ALIVE, WAIT, S1),
(6a) ¬ab(LOADED, WAIT, S1),
(7a) t(ALIVE, S2),
(8a) t(LOADED, S2),
(9a) ab(ALIVE, SHOOT, S2),
(10a) ¬ab(LOADED, SHOOT, S2),
(11a) t(DEAD, S3),
(12a) t(LOADED, S3).

Eb:
(1b) t(ALIVE, S0),
(2b) ¬ab(ALIVE, LOAD, S0),
(3b) t(ALIVE, S1),
(4b) t(LOADED, S1),
(5b) ¬ab(ALIVE, WAIT, S1),
(6b) ab(LOADED, WAIT, S1),
(7b) t(ALIVE, S2),
(8b) ¬t(LOADED, S2),
(9b) ¬ab(ALIVE, SHOOT, S2),
(10b) ¬ab(LOADED, SHOOT, S2),
(11b) t(ALIVE, S3).


Of course these are partial descriptions of extensions. Each set must also contain A and all tautologies, and in Ea, for example, we must also include all sentences of the form ¬ab(f, e, s) for all individuals (f, e, s) except (ALIVE, SHOOT, S2).

To show that Ea and Eb are both extensions we can use a theorem from Reiter's paper [24, p. 89], which says that a set of sentences E is an extension just in case it is equal in the limit to the sequence of sets E0, E1, ... formed as follows:

(1) E0 = Th(A),
(2) Di = {¬ab(f, e, s) : ab(f, e, s) ∉ E},
(3) Ei+1 = Th(Ei ∪ Di),

where "Th" is deductive closure. Note that this is a way of verifying that a known set of sentences, E, is an extension, not a way of building an extension. Since the set of default instances Di is defined in terms of the proposed extension E, that set must be known ahead of time.

In practical terms, this definition suggests the following algorithm: Start with A. Add all sentences of the form ¬ab(f, e, s) such that their negations do not appear in the proposed extension. Deduce all you can. Continue until you reach a point where no more deductions can be made and no more default instances can be added. The set is an extension if all the sentences in the set have been deduced (and no others).
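This bookkeeping can also be mechanized. The Python sketch below is ours, not Reiter's: it assumes the sympy library is available and uses its propositional satisfiable routine as an entailment oracle, restricts attention to literals over the finite vocabulary of the story (the sets listed above also contain all other deductive consequences, which we ignore here), and does not check minimality. It confirms that both proposed sets pass the test, which is exactly the hand calculation carried out next.

from sympy import Symbol, And, Not, Or
from sympy.logic.inference import satisfiable

# Ground vocabulary (our encoding): t-atoms and ab-atoms for the story.
FACTS, SITS = ["ALIVE", "DEAD", "LOADED"], ["S0", "S1", "S2", "S3"]
TRANS = [("LOAD", "S0", "S1"), ("WAIT", "S1", "S2"), ("SHOOT", "S2", "S3")]
t  = {(f, s): Symbol(f"t_{f}_{s}") for f in FACTS for s in SITS}
ab = {(f, e, s): Symbol(f"ab_{f}_{e}_{s}") for f in FACTS for (e, s, _) in TRANS}

A = [t["ALIVE", "S0"],                                        # axiom (1)
     t["LOADED", "S1"],                                       # axiom (2) at S0
     Or(Not(t["LOADED", "S2"]),                               # axiom (3) at S2
        And(ab["ALIVE", "SHOOT", "S2"], t["DEAD", "S3"]))] + \
    [Or(Not(t[f, s]), ab[f, e, s], t[f, s2])                  # axiom (4)
     for f in FACTS for (e, s, s2) in TRANS]

ATOMS = list(t.values()) + list(ab.values())
LITERALS = ATOMS + [Not(a) for a in ATOMS]

def entails(sentences, lit):
    return satisfiable(And(*sentences, Not(lit))) is False

def closure_literals(sentences):
    return {lit for lit in LITERALS if entails(sentences, lit)}

def is_extension(E):
    # The D_i of the procedure above: assume ~ab for every triple whose
    # ab-atom is not in the proposed extension; then check we get exactly E.
    defaults = [Not(a) for a in ab.values() if a not in E]
    return closure_literals(A + defaults) == set(E)

E_a = {t["ALIVE", "S0"], t["ALIVE", "S1"], t["ALIVE", "S2"],
       t["LOADED", "S1"], t["LOADED", "S2"], t["LOADED", "S3"],
       t["DEAD", "S3"], ab["ALIVE", "SHOOT", "S2"]} | \
      {Not(a) for a in ab.values() if a != ab["ALIVE", "SHOOT", "S2"]}

E_b = {t["ALIVE", "S0"], t["ALIVE", "S1"], t["ALIVE", "S2"], t["ALIVE", "S3"],
       t["LOADED", "S1"], Not(t["LOADED", "S2"]), ab["LOADED", "WAIT", "S1"]} | \
      {Not(a) for a in ab.values() if a != ab["LOADED", "WAIT", "S1"]}

print(is_extension(E_a), is_extension(E_b))                   # True True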

To demonstrate on Ea, start with Th(A), which is the set A ∪ {(1a), (4a)}. Add the defaults D0 = {(2a), (5a), (6a), (10a)}. Now we can make the following deductions:

deduce (3a) from (1a) and (2a) and (4),
       (7a) from (3a) and (5a) and (4),
       (8a) from (4a) and (6a) and (4),
       (9a) from (8a) and (3),
       (11a) from (8a) and (3),
       (12a) from (8a) and (10a) and (4).

At this point there are no more default instances to be added, and no more deductions to be made, so E1 = E2 = ... = Ea. Furthermore, since all the sentences in Ea have been deduced, we can also conclude that Ea is an extension. (By rights we should also verify that Ea is minimal--that only sentences in Ea have been deduced. We will skip this part of the proof, since it would require a more precise definition of Ea than we supplied above. In an earlier paper, [7], we prove the minimality of the two extensions, though using a different technique.)

Next we turn our attention to Eb. The sentence of interest is (6b), since intuitively it's not true, and we would hope that it wouldn't follow from any set of "¬ab" assumptions. We can rearrange a couple of the axioms in A in order to clarify the example:


∀s. ¬ab(ALIVE, SHOOT, s) ∨ ¬t(DEAD, result(SHOOT, s)) ⊃ ¬t(LOADED, s),   (3')

∀f, e, s. t(f, s) ∧ ¬t(f, result(e, s)) ⊃ ab(f, e, s).   (4')

Again we start with Th(A), which includes (1b) and (4b). Add defaults D0 = {(2b), (5b), (9b), (10b)}. And

deduce (3b) from (1b) and (2b) and (4),
       (7b) from (3b) and (5b) and (4),
       (8b) from (9b) and (3'),
       (6b) from (4b) and (8b) and (4'),
       (11b) from (7b) and (9b) and (4).

Again there are no more default instances to add, and nothing more to deduce, and all the sentences in Eb follow from A and D0, therefore Eb is also an extension.

So circumscription is not the culprit here--Reiter's proof-theoretic default logic has the same problem. We can also express the same problem in McDermott's nonmonotonic logic and show that the theory has the same two fixed points.

Nor is the situation calculus to blame: in a previous paper [7] we use a simplified version of McDermott's temporal logic and show that the same problem arises, again for all three default logics. In the next section we will show what characteristics of temporal projection lead to the multiple-extension problem, and why it appears that the three default logics are inherently unable to represent the domain correctly.

5. A Minimality Criterion for Temporal Reasoning

We noted above that default-logic theories often generate multiple extensions. But characteristic of all the usual examples, like the one we used in Section 2, is the fact that the default rules of these theories were mutually exclusive: the application of one rule rendered other rules inapplicable by blocking their preconditions.

Thus it comes as somewhat of a surprise that the temporal projection problem should exhibit several extensions. How can there be conflicting rules in the same way we saw above when our theory has only a single default rule? It turns out that conflict between rules arises in our domain in a different, more subtle, manner. To see how, recall how we built the first minimal model (that of Fig. 2(b)). The idea was that we assumed one "normality," then went on to make all possible deductions, then assumed another "normality," and so on. The picture looks something like this:


¬ab(LOADED, WAIT, S1) ⇒ t(LOADED, S2) ⇒ ⋯ ⇒ ab(ALIVE, SHOOT, S2),

where the conflict to notice is that as a result of assuming a "normality" we could deduce an abnormality. The same thing happened when we built the model in Fig. 2(c), except the picture looks like this instead (reading from right to left):

ab(LOADED, WAIT, S1) ⇐ ⋯ ⇐ ¬t(LOADED, S2) ⇐ ¬ab(ALIVE, SHOOT, S2).

The only difference between the two models is that in the first case we started at the (temporally) earliest situation and worked our way forward in time, and in the second case we started at the latest point and worked our way backward in time. Another way to express the idea is that in the first model we always picked the "earliest possible" (f, e, s) triple to assume "normal" and in the second model we always picked the latest.

So the class of models we want our logic to select is not the "minimal models" in the set-inclusion sense of circumscription, but the "chronologically minimal" models (a term due to Yoav Shoham): those in which normality assumptions are made in chronological order, from earliest to latest, or, equivalently, those in which abnormality occurs as late as possible.
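Chronological minimization is trivial to realize procedurally, which is part of why it feels so natural: sweep forward through the event sequence and assume ¬ab unless an effect axiom forces the abnormality. The sketch below is our own procedural rendering (the names EFFECTS and project are made up); it does exactly that for the shooting story and reconstructs the intended model of Fig. 2(b).

# Chronological minimisation as forward projection: at each event, clip only
# those facts an effect axiom makes abnormal; everything else is assumed
# "not ab" and persists.

INITIAL = {"ALIVE"}
EVENTS = ["LOAD", "WAIT", "SHOOT"]

# event -> list of (preconditions, facts made true, facts made abnormal)
EFFECTS = {
    "LOAD":  [(set(), {"LOADED"}, set())],
    "WAIT":  [],
    "SHOOT": [({"LOADED"}, {"DEAD"}, {"ALIVE"})],
}

def project(initial, events):
    state, history = set(initial), [set(initial)]
    for e in events:
        added, clipped = set(), set()
        for (pre, made_true, made_ab) in EFFECTS[e]:
            if pre <= state:               # the effect axiom fires
                added |= made_true
                clipped |= made_ab         # ab(f, e, s) is forced
        state = (state - clipped) | added  # everything else persists by default
        history.append(set(state))
    return history

for sit, facts in zip(["S0", "S1", "S2", "S3"], project(INITIAL, EVENTS)):
    print(sit, sorted(facts))
# S0 ['ALIVE']
# S1 ['ALIVE', 'LOADED']
# S2 ['ALIVE', 'LOADED']
# S3 ['DEAD', 'LOADED']

Note that the procedure itself is not the issue; the question taken up next is whether the logics can be made to prefer exactly the models it builds.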

Let us make clear what we are claiming here: that chronological minimality is the correct minimality criterion for the temporal projection problem we proposed above, and that neither predicate circumscription, nor Reiter's default logic, nor McDermott's nonmonotonic logic, can represent this criterion. Why they cannot do so is pretty clear: the concept of minimality in circumscription is intimately bound up with the notion of set inclusion, and chronological minimality cannot be expressed in those terms. As far as Reiter and McDermott's logics go, what we need is some way to mediate application of default rules in building extensions or fixed points, which is beyond the expressive power of (Reiter's) default rules or of NML sentences involving the M operator.

One thing we are not claiming (our critics notwithstanding) is that chronological minimality so defined is the only criterion one would ever need in representing problems in temporal reasoning. One can easily imagine the "temporal explanation" problem, for example, in which one is presented with a state of the world and causal rules and asked to speculate on what the world might have been like in the past so as to generate the given state. For this problem one might argue that "reverse chronological minimality"--working backward in time, making normality assumptions from latest to earliest--would be appropriate.


For reasoning about the real world the criterion will be more complicated than either of the criteria we proposed above. Is it more reasonable to assume the gun was loaded and our friend died, or that it was not loaded and he lived? That depends on a lot of different domain-dependent factors: how reliable is the gun, how much time elapsed between the WAIT and the SHOOT. One can easily build realistic scenarios where one might jump to either conclusion (or perhaps neither). The point is that if these formal systems cannot express the simple minimality criterion we presented above, they won't be able to express the more realistic, complex criteria either.

6. Criticisms and Proposed Solutions

A flurry of replies greeted our original paper. The responses can be grouped into five classes, making the following arguments:

(1) that there really is no problem at all,
(2) that other domain-independent formal systems can solve the problem,
(3) that the temporal axioms can be revised so the existing nonmonotonic logics will select the intuitive interpretation,
(4) that circumscription can be extended in expressive power so as to represent chronological minimality and similar criteria,
(5) that the right way to think about default reasoning is directly through the notion of "preferred models" rather than indirectly through devices like circumscription axioms or default rules.

6.1. Arguing that there is no problem

The first argument, put forth by Loui in [15], says basically that we don't want our formal systems to favor the chronologically minimal models, either because other temporal reasoning problems might involve other such criteria, or because the function of a logic is to constrain admissible models rather than to mirror reasoning in any way.

The first interpretation we have already responded to: granted that chronological minimality is not the only criterion one might want to express, it certainly is one such criterion. It seems to express what McDermott was getting at in his temporal logic [19], and what McCarthy has referred to as the "inertial property of facts" (e.g. in [17]). It is hard to imagine any criterion that might arise in temporal reasoning that does not in some way mention temporal ordering, yet we have shown that all of these logics fail to discriminate on the basis of concepts like "temporally earlier" or "temporally later."

A second interpretation of the "no problem" argument takes a stand on what role logic should play in the process of theory development in AI. It goes something like this: Axioms are supposed to describe constraints on the world, and do not need to support reasoning in any direct way. We fix a temporal ontology, but may want to reason about that ontology in a number of different ways: from past to future, from future to past, starting with the most strongly believed premise, etc. The axioms should reflect our ontological choices, but should be neutral about the ways in which we might reason using the resulting theories.

Now it is true that one can take this lofty view of the relation between logic and thought, but from that distance mundane phenomena like the frame problem simply become invisible. The whole point behind the development of nonmonotonic formal systems for reasoning about time has been to augment our logical machinery so that it reflects in a natural way the "inertia of the world." All of this effort is wasted if one is trying merely to express constraints on the physics of the world: we can solve the problem once and for all by saying "once a fact becomes true it stays true until the occurrence of some event causes it to become false." This is in effect our axiom (4) above. As we noted, this principle will allow us to conclude almost nothing, since we can generally not conclude that a terminating event has not occurred. But if trying to conclude such things is not our aim, then there indeed is no problem. Many researchers, however, would not be satisfied with using logic only as a notation for expressing ontological theories.

6.2. Arguing for an alternative minimality criterion

Loui, in [15, Section III.1.1], makes the argument that with a change in our axioms an alternative formal system actually favors the correct extension. The implication is that the minimization we really intended to do can be captured by another domain-independent criterion for preferring one model over another, though set inclusion did not work. He claims that the theory-comparison logic of Poole, as reported in [23], chooses the right interpretation of the axioms. We will explain Poole's system quickly, then move on to Loui's analysis.

To use Poole's logic we divide our axioms into three sets: a set of noncontingent facts, or laws (labelled Fn), a set of contingent, or problem-specific facts (labelled Fc), and a set of default rules (labelled Δ). Fn and Fc are just sets of first-order sentences. The former expresses truths about the world in general, and the latter describes one particular situation. Default rules look a lot like Reiter's default rules; our "persistence of facts" rule would appear as follows:

(f, e, s) ASSUME t(f, s) ⊃ t(f, result(e, s)),

where f, e, and s are variables that can be instantiated in the rule, and ASSUME is just syntactic sugar.


From a triple (Fn, Fc, Δ) we can build theories of the form (Fn, Fc, D), where D is a set of instances of the default rules in Δ (formed by plugging in constants for all the variable names that appear to the left of ASSUME). Note that theories formed in this way are first-order theories, so we can speak of the sentences that follow deductively from them. We say a sentence φ is explained by a theory (Fn, Fc, D) if

Fn ∪ Fc ∪ D ⊨ φ

and

Fn ∪ Fc ∪ D is consistent.

When context makes the identity of Fn and Fc clear, we will refer to a theory just by its default instances D.

Applying this model to our original axioms we might get the following:

Fn = {∀s. t(LOADED, result(LOAD, s)),
      ∀s. t(LOADED, s) ⊃ t(DEAD, result(SHOOT, s))},

Fc = {t(ALIVE, S0)},

Δ = {(f, e, s) ASSUME t(f, s) ⊃ t(f, result(e, s))}.

Now we build two different theories by choosing two different sets of default instances. We'll call the theories Da and Db to identify them with the extensions we built in Section 4:

Da = { t(ALIVE, S0) ⊃ t(ALIVE, S1),
       t(ALIVE, S1) ⊃ t(ALIVE, S2),
       t(LOADED, S1) ⊃ t(LOADED, S2),
       t(LOADED, S2) ⊃ t(LOADED, S3) },

Db = { t(ALIVE, S0) ⊃ t(ALIVE, S1),
       t(ALIVE, S1) ⊃ t(ALIVE, S2),
       t(ALIVE, S2) ⊃ t(ALIVE, S3) }.

Theory Da explains the sentence t(DEAD, S3) (we'll call this sentence ga, following Poole's notation) and theory Db explains t(ALIVE, S3) (which we'll call gb). Poole's system addresses the question of which of these two theories we should prefer.
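Both of these "explains" claims are easy to check mechanically under the definition above. The sketch below is ours: it uses a ground propositional encoding of the relevant instances of Fn, Fc, Da, and Db (not the first-order sentences themselves) and assumes sympy's satisfiable routine for entailment and consistency tests.

from sympy import Symbol, And, Not, Implies
from sympy.logic.inference import satisfiable

def atom(fact, sit):
    return Symbol(f"t_{fact}_{sit}")

# Ground instances of Fn, Fc, Da and Db at the situations of the story.
Fn = [atom("LOADED", "S1"),                                 # result of the LOAD
      Implies(atom("LOADED", "S2"), atom("DEAD", "S3"))]    # the SHOOT law at S2
Fc = [atom("ALIVE", "S0")]
Da = [Implies(atom("ALIVE", "S0"), atom("ALIVE", "S1")),
      Implies(atom("ALIVE", "S1"), atom("ALIVE", "S2")),
      Implies(atom("LOADED", "S1"), atom("LOADED", "S2")),
      Implies(atom("LOADED", "S2"), atom("LOADED", "S3"))]
Db = [Implies(atom("ALIVE", "S0"), atom("ALIVE", "S1")),
      Implies(atom("ALIVE", "S1"), atom("ALIVE", "S2")),
      Implies(atom("ALIVE", "S2"), atom("ALIVE", "S3"))]

def explains(D, goal):
    theory = Fn + Fc + D
    consistent = satisfiable(And(*theory)) is not False
    entailed = satisfiable(And(*theory, Not(goal))) is False
    return consistent and entailed

print(explains(Da, atom("DEAD", "S3")))    # True: D_a explains g_a
print(explains(Db, atom("ALIVE", "S3")))   # True: D_b explains g_b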

Poole's system would have us prefer the "most specific solution," where a solution is a theory plus a sentence it explains, e.g. (Da, ga). Specificity can loosely be thought of as "the extent to which the solution depends on contingent facts." Recall that contingent facts are the sentences in Fc. To illustrate using an example from Poole, assume we have a noncontingent fact of the form "all emus are birds," default rules of the form "birds fly" and "emus do not fly," and a contingent fact of the form "Edna is an emu." One can build a theory that explains "Edna can fly" as well as a theory that explains "Edna cannot fly." According to Poole's specificity criterion the second theory is strictly more specific than the first because it depends on the contingent fact "Edna is an emu" and the first theory depends on no contingent facts.

Specificity is defined formally as follows: given two solutions

S1 = (D1, g1) and S2 = (D2, g2),

we say that S1 is more specific than S2 if the following holds for all sets of sentences Fp:

if Fp ∪ D1 ∪ Fn ⊨ g1 and Fp ∪ D2 ∪ Fn ⊭ g1,

then Fp ∪ D2 ∪ Fn ⊨ g2.

Solution S1 is strictly more specific than S2 if S1 is more specific than S2 but S2 is not more specific than S1. So the analogue of circumscriptive minimal models are those solutions that are strictly more specific than all others. We will write "S1 ≥ S2" to mean "S1 is more specific than S2," and "S1 > S2" to mean "S1 is strictly more specific than S2."

Now returning to the "shooting" theories we built above, we ask which is most specific between

Sa = (Da, ga = t(DEAD, S3))

and

Sb = (Db, gb = t(ALIVE, S3)).

First let's establish whether Sa ≥ Sb. To do so we must show that for any set Fp,

if Fp ∪ Da ∪ Fn ⊨ t(DEAD, S3) and Fp ∪ Db ∪ Fn ⊭ t(DEAD, S3),

then Fp ∪ Db ∪ Fn ⊨ t(ALIVE, S3).

Crucial to determining specificity is figuring out for each solution what facts must be in Fp in order that the solution's conclusion follows. (This is the first condition in the specificity criterion above.) We will call this set the solution's "necessary Fp facts." Some of these facts will be uninteresting. Consider solution Sa, for example: certainly putting t(DEAD, S3) in Fp allows us to draw that conclusion from Fp ∪ Da ∪ Fn, but it also allows any other solution to draw that conclusion too, so the second condition in the specificity criterion will not be satisfied. When we speak about necessary Fp facts, then, we will generally only be concerned with the "interesting" ones: those that satisfy both conditions of the specificity criterion.

Turning our attention now to comparing Sa and Sb, we note that the set of necessary Fp facts for Sa is empty: ga follows from Da ∪ Fn alone. Any set Fp will satisfy the first antecedent. To get gb from Db ∪ Fn, however, we do need additional information: either t(ALIVE, S3) or t(ALIVE, S2) or t(ALIVE, S1) or t(ALIVE, S0). So it's easy to find an Fp that does not satisfy the definition: Fp = ∅. Therefore Sa is not more specific than Sb.

Conversely, to show that Sb ≥ Sa we note that the consequent of the new specificity condition,

Fp ∪ Da ∪ Fn ⊨ t(DEAD, S3),

is true for any choice of Fp, thus the conditional is always true and Sb ≥ Sa; in fact Sb > Sa. The upshot: for our reformulation of the axioms in Poole's default logic, Poole's specificity criterion favors the wrong conclusion.

Loui claims that we can fix the problem by changing the axioms in two ways: by making the "shooting" law (∀s. t(LOADED, s) ⊃ t(DEAD, result(SHOOT, s))) into a default rule, and by strengthening its antecedent. Thus he reformulates our axioms as follows (we are actually inferring his intentions here, since he never explicitly states his reformulation):

Fn = ∅,

Fc = {t(ALIVE, S0), t(LOADED, S1)},

Δ1 = { (s) ASSUME t(LOADED, s) ∧ t(ALIVE, s) ⊃ t(DEAD, result(SHOOT, s)),
       (f, e, s) ASSUME t(f, s) ⊃ t(f, result(e, s)) }.

Loui has actually made three changes here: in addition to changing the "shoot" law into a default rule and adding t(ALIVE, s) to its antecedent, he also chose to express the "loading a gun causes it to be loaded" law as a situation-specific fact. Expressing this axiom as a fact (in Fc) instead of as a law (in Fn) turns out to have an important effect on which theories are more specific than others, though Loui does not mention this in his paper. We will return to that point later.

Now the two theories look like this:

Da = { t(ALIVE, S0) ⊃ t(ALIVE, S1),
       t(ALIVE, S1) ⊃ t(ALIVE, S2),
       t(LOADED, S1) ⊃ t(LOADED, S2),
       t(LOADED, S2) ∧ t(ALIVE, S2) ⊃ t(DEAD, S3),
       t(LOADED, S2) ⊃ t(LOADED, S3) },


Db = { t(ALIVE, S0) ⊃ t(ALIVE, S1),
       t(ALIVE, S1) ⊃ t(ALIVE, S2),
       t(ALIVE, S2) ⊃ t(ALIVE, S3) }.

We can easily verify that Sa is now strictly more specific than Sb: in order to conclude t(DEAD, S3) from Da we need both t(LOADED, S2) and t(ALIVE, S2). To get the former we need in Fp either t(LOADED, S2) or t(LOADED, S1). To get the latter we need in Fp either t(ALIVE, S0) or t(ALIVE, S1) or t(ALIVE, S2). On the other hand, to get t(ALIVE, S3) from Db we need in Fp either t(ALIVE, S2) or t(ALIVE, S1) or t(ALIVE, S0), and one of those has to be in the Fp set for Sa. So the Fp set for Sa must be a superset of the Fp set for Sb and therefore Sa > Sb.

How are we to know that t(ALIVE, s) is the right conjunct to add to the "shoot" rule? Loui says only "[the added conjunct is] a genuine part of what is claimed to be known about the causal law in this domain. The reason Hanks and McDermott want to clip alive-ness instead of loaded-ness is that intuitively, they know that [t(ALIVE, S2)] is one of the assertions that can be in the antecedent of (the defeasible version of) their rule." This hardly amounts to a policy for deciding which conjuncts to put in our causal rules, unless he is suggesting that we qualify every causal rule with every condition that does not interfere with the conclusion. That simple a scheme will not work.

Consider an extended version of our original example. We look a little further forward in time, and consider a situation S4, which is result(LEAVE-HOUSE, S3). We also add a causal rule that says if one is alive, and if it is raining, then one is wet as a result of LEAVE-HOUSE. We will defer to Loui here and couch this as a default rule. (It turns out to be irrelevant to the specificity analysis whether it is a default rule or a law; indeed, it turns out to be irrelevant whether the "shoot" rule is a default or a law.) We will also say that it is raining at S0. Our logic now becomes:

Fn = ∅,
Fc = {t(ALIVE, S0), t(LOADED, S1), t(RAINING, S0)},
Δ = {(s) ASSUME t(LOADED, s) ⊃ t(DEAD, result(SHOOT, s)),
     (s) ASSUME t(ALIVE, s) ∧ t(RAINING, s) ⊃ t(WET, result(LEAVE-HOUSE, s)),
     (f, e, s) ASSUME t(f, s) ⊃ t(f, result(e, s))}.

(We will have to add qualifications to the default rules' antecedents.) The two solutions to compare are:

Da = {t(ALIVE, S0) ⊃ t(ALIVE, S1), t(ALIVE, S1) ⊃ t(ALIVE, S2), t(LOADED, S1) ⊃ t(LOADED, S2), t(LOADED, S2) ∧ t(ALIVE, S2) ⊃ t(DEAD, S3), t(LOADED, S2) ⊃ t(LOADED, S3)},


Db = {t(ALIVE, S0) ⊃ t(ALIVE, S1), t(ALIVE, S1) ⊃ t(ALIVE, S2), t(ALIVE, S2) ⊃ t(ALIVE, S3), t(RAINING, S0) ⊃ t(RAINING, S1), t(RAINING, S1) ⊃ t(RAINING, S2), t(RAINING, S2) ⊃ t(RAINING, S3), t(ALIVE, S3) ∧ t(RAINING, S3) ⊃ t(WET, S4)},

Sa = (Da, t(DEAD, S3)),

Sb = (Db, t(WET, S4)).

What do we add to the antecedents of these rules? Do we add t(ALIVE, s) to the "shoot" rule? Apparently so. Do we add t(RAINING, s) to the "shoot" rule? We have to: Sb depends on a "raining" fact, so if Sa is to be more specific than Sb it must also depend on a "raining" fact. So now the "shoot" rule reads like this: if ALIVE and LOADED and RAINING then DEAD follows from SHOOT. Now what should we qualify the "get wet" rule with? We already qualify it with ALIVE and RAINING; should we qualify it with LOADED too? It would seem at least as reasonable as qualifying the "shooting" rule with RAINING. It doesn't matter whether we do so or not. Since the "get wet" rule is applied temporally later than the "shoot" rule, an Fp set of the form {t(ALIVE, S3), t(RAINING, S3), t(LOADED, S3)} would, with Db, entail t(WET, S4), but this Fp would not entail t(DEAD, S3).
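That last claim is easy to confirm with a few lines of code. Here is a minimal sketch (ours; the same t(FACT, Si) shorthand and ground Horn-rule encoding as before, with a hypothetical closure helper) showing that the late Fp set supports the "get wet" conclusion through Db but not the "dead" conclusion through Da:

    def closure(facts, rules):
        """Deductive closure of `facts` under ground Horn rules (ante, cons)."""
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for ante, cons in rules:
                if ante <= facts and cons not in facts:
                    facts.add(cons)
                    changed = True
        return facts

    D_a = [({"ALIVE@S0"}, "ALIVE@S1"), ({"ALIVE@S1"}, "ALIVE@S2"),
           ({"LOADED@S1"}, "LOADED@S2"),
           ({"LOADED@S2", "ALIVE@S2"}, "DEAD@S3"),
           ({"LOADED@S2"}, "LOADED@S3")]
    D_b = [({"ALIVE@S0"}, "ALIVE@S1"), ({"ALIVE@S1"}, "ALIVE@S2"),
           ({"ALIVE@S2"}, "ALIVE@S3"),
           ({"RAINING@S0"}, "RAINING@S1"), ({"RAINING@S1"}, "RAINING@S2"),
           ({"RAINING@S2"}, "RAINING@S3"),
           ({"ALIVE@S3", "RAINING@S3"}, "WET@S4")]

    F_p = {"ALIVE@S3", "RAINING@S3", "LOADED@S3"}
    print("WET@S4" in closure(F_p, D_b))    # True
    print("DEAD@S3" in closure(F_p, D_a))   # False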

Now we're beginning to see where the temporal aspect of the problem has been hidden: not in the axioms themselves, but in the choices one makes as to how one qualifies the default rules. The way one strengthens the causal rule antecedents depends on the temporal order in which they are to be applied. Now deciding on these qualifications itself becomes an exercise in temporal reasoning, and to formalize this process would presumably require formalizing the temporal-ordering criteria that have proved so difficult. Furthermore, what to add to the rule antecedents depends not just on the syntactic form of the rules themselves, but on the particular theory comparison being made. Compare two new theories and you have to rethink the form of the causal rules.

As far as we're concerned, there is no justification for qualifying the "shooting" rule with a "raining" precondition, except that it makes the theory comparison come out right. Is this sort of decision "a genuine part of what is claimed to be known about the causal law in this domain," as Loui claims? We think not. Domain knowledge to us means knowledge about being alive, about guns being loaded, about shooting, about raining, about getting wet--in short, exactly what we put into our original, unqualified axioms.

What's worse is that under some circumstances no reasonable specialization of the antecedents works right. Consider the following example, pictured in Fig. 3:


[Figure 3 shows three timelines over the situations S0, S1, S2, labeled Fc, Fc ∪ Da, and Fc ∪ Db.]

FIG. 3. Another simple projection example.

Fn = ∅,
Fc = {t(P, S0), t(Q, S0), t(R, S2)},
Δ = {(s) ASSUME t(Q, s) ⊃ ¬t(P, result(E1, s)),
     (s) ASSUME t(P, s) ∧ t(R, s) ⊃ t(S, result(E3, s)),
     (f, e, s) ASSUME t(f, s) ⊃ t(f, result(e, s))},

where the states are

S1 = result(E1, S0), S2 = result(E2, S1), S3 = result(E3, S2).

There is a single chronologically minimal model--one in which the P fact is "clipped" at S1 so that the second causal rule does not apply, and S is not true at S3. A second interpretation has P persisting to S2, thus S becomes true at S3. The corresponding Loui solutions are:

Da = {t(Q, S0) ⊃ ¬t(P, S1)},
Db = {t(P, S0) ⊃ t(P, S1), t(P, S1) ⊃ t(P, S2), t(P, S2) ∧ t(R, S2) ⊃ t(S, S3)},

Sa = (Da, ¬t(P, S1)),
Sb = (Db, t(S, S3)).

Sa describes the chronologically minimal model.


The necessary Fp fact for Sa is t(Q, S0), while Sb must have t(R, S2) and either t(P, S0) or t(P, S1) or t(P, S2). As is, neither solution is more specific than the other. To make Sa more specific we can strengthen the antecedent of the first causal rule, as Loui suggests. Following his example we will specialize the first causal rule so it reads "if P and Q are true at s then P is false at result(E1, s)." That takes care of Sb's P precondition: now Sa needs either t(P, S0) or t(P, S1), and either of these will lead to a deduction of t(P, S2) from Db. But can we specialize the rule further to take care of Sb's R precondition? Apparently not. If we further specialize the first causal rule to include R in the antecedent, not only does it not make Sa more specific than Sb, but Sa is no longer a valid solution: its conclusion, ¬t(P, S1), no longer follows from Fn ∪ Fc ∪ Da. In fact the only specialization that would work would say explicitly that the causal rule being applicable at situation S1 depends on R being true at situation S2. To have the applicability of a causal rule depend on some future state of affairs is absurd.

We should make one last comment about Loui's reformulation of our problem. We mentioned above that he made a subtle, but crucial, decision in building his version of our shooting example. Whereas we expressed the "loaded gun" rule as a general law (thus in Fn) of the form "guns are loaded in all situations resulting from a loading event," Loui expresses this information in a much more specific form--by adding a situation-specific fact of the form "the gun is loaded in situation S1." If this axiom is expressed as a law then the "shot therefore dead" theory no longer depends on a fact in Fp of the form "the gun is loaded," because that fact is already in Fn. The theory is therefore less specific than it was before. In fact even if one strengthens the antecedent of the "shoot" rule as Loui suggests, the "shot therefore dead" theory is still not strictly more specific than the "unloaded therefore alive" theory. Thus Loui's analysis depends on a representational decision that is arguably arbitrary, and certainly left unstated in his paper.
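The effect of that representational decision can also be checked mechanically. In the sketch below (ours; same shorthand and hypothetical helper names as in the earlier sketches) the fact t(LOADED, S1) is supplied as background knowledge, standing in for the consequence of the "loading" law in Fn, and the "shoot" rule carries Loui's strengthened antecedent. The two theories then need exactly the same minimal Fp sets, so the "shot therefore dead" theory is no longer strictly more specific:

    from itertools import combinations

    D_a = [({"ALIVE@S0"}, "ALIVE@S1"), ({"ALIVE@S1"}, "ALIVE@S2"),
           ({"LOADED@S1"}, "LOADED@S2"),
           ({"LOADED@S2", "ALIVE@S2"}, "DEAD@S3")]   # Loui's strengthened rule
    D_b = [({"ALIVE@S0"}, "ALIVE@S1"), ({"ALIVE@S1"}, "ALIVE@S2"),
           ({"ALIVE@S2"}, "ALIVE@S3")]

    def closure(facts, rules):
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for ante, cons in rules:
                if ante <= facts and cons not in facts:
                    facts.add(cons)
                    changed = True
        return facts

    def minimal_fp_sets(universe, background, rules, goal):
        """Subset-minimal F_p sets that, with the background, yield the goal."""
        ok = [set(c) for r in range(len(universe) + 1)
              for c in combinations(universe, r)
              if goal in closure(set(c) | background, rules)]
        return [s for s in ok if not any(t < s for t in ok)]

    universe = ["ALIVE@S0", "ALIVE@S1", "ALIVE@S2", "LOADED@S1", "LOADED@S2"]
    F_n = {"LOADED@S1"}    # "loading causes loaded", treated as a law
    fp_a = minimal_fp_sets(universe, F_n, D_a, "DEAD@S3")
    fp_b = minimal_fp_sets(universe, F_n, D_b, "ALIVE@S3")

    # Both theories now need exactly one ALIVE fact, so neither dominates.
    print(sorted(map(sorted, fp_a)) == sorted(map(sorted, fp_b)))   # True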

To summarize, we find Loui's proposed solution singularly unsatisfying. First of all it is extremely brittle, depending, as it does, on the assignment of axioms to the set of universal laws instead of the set of problem-specific facts. His solution is based on coding temporal information in the causal rules in a way that depends not on issues of causality, but on the specific theory comparison one is undertaking. Further, he provides no general method to do this coding, but it is clear that this method must include exactly the temporal minimality criterion we are trying to formalize. Finally, there are cases where no reasonable specialization of the causal rules does the projection right.

6.3. Arguing for a change in temporal ontology

The proposal here is to revise the axioms so that one of the standard nonmonotonic formal systems actually licenses the right conclusions. The best-developed effort in this direction is that of Vladimir Lifschitz, in his paper "Formal Theories of Action" [12]. He revises the situation calculus so as to make certain notions of causality more specific. First of all, he assumes a predicate of the form

causes(action, fluent1, fluent2)

which means "if action is feasible, then whenever it is executed it causes fluent 1 to take on the value offluentz." A fluent is a function from situations to truth values, so it corresponds to what we called a fact. For example one might say

causes(SHOOT, ALIVE, FALSE),

where FALSE is a fluent that is true in no situations. Similarly TRUE is a fluent true in all situations. We enforce this by adding axioms of the form:

∀s. t(TRUE, s) and ∀s. ¬t(FALSE, s).

The predicate precond determines whether an action is feasible--whether it will succeed in a given situation. Saying precond(fluent, action) means that action can succeed only if fluent is true in the situation in which action is attempted. An example would be precond(LOADED, SHOOT).

An action is successful in a situation just in case all its preconditions in that situation are true:

∀a,s. success(a, s) ⇔ ∀f. precond(f, a) ⊃ t(f, s).

Another predicate, affects, can then be defined for actions, fluents, and situations, to mean "a successful action would affect the value of the fluent in the situation"

∀a,f,s. affects(a, f, s) ⇔ success(a, s) ∧ ∃v. causes(a, f, v).

That is, an action affects the value of a fluent just in case the action is feasible and causes the fluent to take on some truth value.

With these definitions we can define how an action can change a fluent's value. These axioms simplify Lifschitz's treatment somewhat:

∀a,s,f,v. success(a, s) ∧ causes(a, f, v) ⊃ (t(f, result(a, s)) ⇔ t(v, s)),

∀a,s,f. ¬affects(a, f, s) ⊃ (t(f, result(a, s)) ⇔ t(f, s)).


Then for a given problem we list all the known causes and preconditions:

causes(LOAD, LOADED, TRUE),

causes(SHOOT, ALIVE, FALSE),

precond(LOADED, SHOOT),

and then we circumscribe on the predicates causes and precond (letting t vary as a parameter), which has the effect of making the explicit causes and preconditions the only causes and preconditions. So from the circumscribed axiom we get:

∀a,f,v. causes(a, f, v) ⇔ (a = LOAD ∧ f = LOADED ∧ v = TRUE) ∨ (a = SHOOT ∧ f = ALIVE ∧ v = FALSE),

∀a,f. precond(f, a) ⇔ (f = LOADED ∧ a = SHOOT).

(This is called predicate completion, as in Clark [2].) From here conventional deduction yields the desired result: we can infer from the circumscribed axioms

¬t(ALIVE, result(SHOOT, result(WAIT, result(LOAD, S0)))).
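To see concretely what the completed theory sanctions, here is a minimal simulation sketch (ours, not Lifschitz's formalism; situations are modeled as dictionaries of fluent values, TRUE and FALSE become Python booleans, and the function names are our own). Because of the predicate completion, CAUSES and PRECOND contain exactly the tuples listed above and nothing else:

    CAUSES = {                      # causes(action, fluent, value)
        ("LOAD", "LOADED"): True,   # causes(LOAD, LOADED, TRUE)
        ("SHOOT", "ALIVE"): False,  # causes(SHOOT, ALIVE, FALSE)
    }
    PRECOND = {"SHOOT": {"LOADED"}} # precond(LOADED, SHOOT)

    def success(action, situation):
        """An action succeeds iff all its preconditions hold in the situation."""
        return all(situation[f] for f in PRECOND.get(action, set()))

    def result(action, situation):
        """Successor situation: caused fluents change if the action succeeds;
        every unaffected fluent keeps its old value (the frame axioms above)."""
        new = dict(situation)
        if success(action, situation):
            for (a, f), v in CAUSES.items():
                if a == action:
                    new[f] = v
        return new

    s = {"ALIVE": True, "LOADED": False}   # situation S0
    for act in ["LOAD", "WAIT", "SHOOT"]:
        s = result(act, s)

    print(s["ALIVE"])   # False: the intended conclusion pops out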

Loui, in [15, Section III.1] makes a proposal like that of Lifschitz in spirit, though not in detail. He suggests adding a predicate original-event, which apparently refers to event instances that are imagined but not necessarily realized. So we say that a firing, a waiting, a dying, and an unloading are all original-events. Furthermore we say that the loading and the firing are both "real events," (event-instances, to use his notation) and that the dying is a real event only if the gun is loaded, and the gun becomes unloaded only if there's a "real event" unloading. The axioms provide no way to deduce that a "real" unloading occurred, so one never does, the gun remains loaded through the wait, and the individual dies.

We see two problems with proposals like these. The first is that if we adopt one of these solutions we have in effect allowed technical problems in the logic to put too much pressure on our knowledge representation. Part of the presumed appeal of expressing theories in logic is that the language should allow us to explore our intuitions about domains like naive physics (as in [9]). What we should be thinking about at the logical level are issues like whether time is continuous or discrete, how many modalities there are, whether time is made out of points or intervals, etc. But if we follow the example of those who would have us change our ontology, we in effect have to phrase our axioms in one particular way (which seems only slightly different--and slightly more awkward--than the situation calculus) just to get around the inadequacies of the inference mechanism. Why is it that causes and precond are exactly the predicates we should have used to describe our problem? What was it about the domain (not about the technical details of the logic) that should have told us that? Will Lifschitz be around to bail us out when we come up against the next inferential bug?

The second objection we have to solutions like the one Lifschitz proposes is that at the point one takes heroic measures to patch the circumscription machinery, the machinery itself becomes epiphenomenal. We will present this argument in the next section.

6.4. Arguing for an enhanced circumscription

Two proposals, one by Lifschitz (in [13]) and one by Kautz (in [11]), propose a more robust (and hence more complex) circumscription axiom that expresses the chronological minimality criterion directly, and uses our original ontology. Since Lifschitz's proposal is more general, we will present it below.

Lifschitz's logic is called "pointwise circumscription"--the notion being that it may not be adequate to circumscribe "globally" over a set of predicates, that instead one may want to minimize predicates differently at different points. In our example we want to minimize abnormalities, so a point is actually a triple (f, e, s) where f is a fact type, e is an event type, and s is a situation.

We need to beef up our axioms a little bit. We want to minimize over temporally earlier situations first, so in all fairness we need to introduce the notion of temporally earlier into the logic. We will do so in a rather ad hoc manner, just by defining a new predicate < and enumerating the situations over which it holds:

∀s1,s2. s1 < s2 ⇔ (s1 = S0 ∧ s2 = S1) ∨ (s1 = S0 ∧ s2 = S2) ∨ (s1 = S0 ∧ s2 = S3) ∨ (s1 = S1 ∧ s2 = S2) ∨ (s1 = S1 ∧ s2 = S3) ∨ (s1 = S2 ∧ s2 = S3)    (5)

and we can then define an ordering relation over (fact, event, situation) triples:

∀f,e,s,f',e',s'. (f, e, s) <p (f', e', s') ⇔ s < s'.    (6)

As we did with global circumscription, we minimize the ab predicate, allowing t to vary as a parameter. But the pointwise circumscription axiom allows us to minimize according to the temporal ordering defined by <p. This is the axiom Lifschitz proposes:


∀f,e,s,ab',t'. ¬(ab(f, e, s)    (C1)
    ∧ ∀f',e',s'. (f', e', s') <p (f, e, s) ⊃ (ab'(f', e', s') ⇔ ab(f', e', s'))    (C2)
    ∧ A(λf',e',s'. (ab'(f', e', s') ∧ (f, e, s) ≠ (f', e', s')), t')).    (C3)

(For any alternative choice of ab (ab') and of t (t') it cannot be the case that the original ab is true at (f, e, s) and that ab and ab' agree on all points temporally earlier than s and that ab' satisfies A at all points other than (f, e, s).)

Now the question springs to mind: how do we know whether this works? What follows from the circumscribed axiom, which we henceforth call Circum(A; ab/<p, t) using Lifschitz's notation? The time-honored approach to analyzing circumscribed theories, which Kautz carries out in some detail, goes like this: first admit that you already know the right answer. In our case this answer takes the form of choices for ab' and t' that correspond to our preferred model. We'll call them abc and tc to stand for the "correct" ab and t:

abc(f, e, s) ⇔ (f, e, s) ∈ {(ALIVE, SHOOT, S1), (ALIVE, SHOOT, result(SHOOT, S1)), (ALIVE, SHOOT, S2), (ALIVE, SHOOT, S3), (ALIVE, SHOOT, result(SHOOT, S3))},

tc(f, s) ⇔ (f, s) ∈ {(ALIVE, S0), (ALIVE, S1), (LOADED, S1), (ALIVE, S2), (LOADED, S2), (DEAD, S3), (LOADED, S3), (DEAD, result(SHOOT, S1)), (DEAD, result(SHOOT, S3))}.

(You may notice that we refer to some situations we're not really interested in, e.g. result(SHOOT, S3). We include these just to satisfy the original axioms, which say, for example, that if a shot did occur in S3, the ALIVE fact would be abnormal.)

One fact we're very much interested in is whether or not ab(LOADED, WAIT, S1) is true in the augmented axioms. We hope it is not. We will check. To do so we create an instance of the circumscription axiom, using our choices for abc, tc, f, e, and s. That gives us:

¬(ab(LOADED, WAIT, S1)    (C1')
    ∧ ∀f',e',s'. (f', e', s') <p (LOADED, WAIT, S1) ⊃ (abc(f', e', s') ⇔ ab(f', e', s'))    (C2')
    ∧ A(λf',e',s'. (abc(f', e', s') ∧ (LOADED, WAIT, S1) ≠ (f', e', s')), t')),    (C3')


which roughly means that if abc and ab agree on all points temporally prior to S1, and if abc and tc are strong enough to satisfy the original axioms A, then ab(LOADED, WAIT, S1) is false--thus the gun is loaded and the individual dies.

What we have to do, then, is verify that the second two conjuncts, (C2') and (C3'), are true. Start with (C3'). We have to substitute tc for t, and substitute

λf',e',s'. (abc(f', e', s') ∧ (LOADED, WAIT, S1) ≠ (f', e', s'))

for ab into the original axioms (1)-(4) (from Fig. 1). Axioms (1) and (2) involve only t, and become:

tc(ALIVE, S0),    (1')

∀s. tc(LOADED, result(LOAD, s)),    (2')

both of which can be verified by examining the enumeration of tc above. Axioms (3) and (4) become:

∀s. tc(LOADED, s) ⊃ abc(ALIVE, SHOOT, s) ∧ (ALIVE, SHOOT, s) ≠ (LOADED, WAIT, S1) ∧ tc(DEAD, result(SHOOT, s)),    (3')

∀f,e,s. tc(f, s) ∧ ¬(abc(f, e, s) ∧ (f, e, s) ≠ (LOADED, WAIT, S1)) ⊃ tc(f, result(e, s)).    (4')

For instance (3') we note that the only values of s that satisfy the antecedent are S1, S2, and S3, so the conjuncts must be satisfied for each of those situations:

(ALIVE, SHOOT, S1) ∈ abc and (ALIVE, SHOOT, S1) ≠ (LOADED, WAIT, S1) and (DEAD, result(SHOOT, S1)) ∈ tc;

(ALIVE, SHOOT, S2) ∈ abc and (ALIVE, SHOOT, S2) ≠ (LOADED, WAIT, S1) and (DEAD, result(SHOOT, S2)) ∈ tc;

(ALIVE, SHOOT, S3) ∈ abc and (ALIVE, SHOOT, S3) ≠ (LOADED, WAIT, S1) and (DEAD, result(SHOOT, S3)) ∈ tc.

Again we can verify each conjunct by examining the enumeration of abc and of tc.


Finally, axiom instance (4') can be enumerated and checked in the same way, but we will omit this step.

Now we have verified conjunct (C3'), and we must do the same for (C2'):

∀f',e',s'. (f', e', s') <p (LOADED, WAIT, S1) ⊃ (abc(f', e', s') ⇔ ab(f', e', s')).

Note that the only abnormality points that satisfy the precondition are those involving situation S0, so (C2') reduces to

∀f,e. ab(f, e, S0) ⇔ abc(f, e, S0),

but since abc includes no abnormalities at situation S0, (C2') further reduces to

∀f,e. ¬ab(f, e, S0),

so the original circumscription instance becomes

¬ab(LOADED, WAIT, S1) ∨ ∃f,e. ab(f, e, S0).

So we didn't get quite what we wanted, which was just the first disjunct. To eliminate the second disjunct we would next have to prove

∀f,e. ¬ab(f, e, S0).

The proof works the same way: we produce an instance of the circumscription axiom, substituting abc for ab', tc for t', and S0 for s, and go about eliminating all terms in the negation except for the first conjunct, leaving us with ¬ab(f, e, S0). We omit the proof, but note that it works out easily enough: verifying the third conjunct involves essentially the same analysis we did above, and since there are no points temporally earlier than S0, the second conjunct is immediately true.

Thus pointwise circumscription (and Kautz's logic of persistence) can be shown to produce as theorems exactly those formulas true in all chronologically minimal models. But we must ask at this point what purpose the circumscription served in the first place. So far the only way to use circumscribed theories has been to perform analyses such as the one we performed above: you know what conclusion you want to get out of the theory, you use the conclusion to build an instance of the circumscription axiom, and you see if the desired result pops out.

It's the need to know the solution beforehand that is troublesome here. If we really can provide no general theory as to how to instantiate the circumscription axiom, in no sense have we provided a formal inferential theory. We're always going to have to know the right answer ahead of time in order to instantiate the axiom, and at that point the circumscription provides us with no new information. On the other hand, if we can find some way to generate appropriate axiom instances we can just view that process itself as an axiom schema and dispense with circumscription altogether.

There was some hope with earlier versions of circumscription that the axioms could actually be put into an automated theorem prover, and through that mechanism we might learn unexpected things about our theory (see, e.g., [17, Appendix A]). But pointwise circumscription in its full generality is not a first-order system, nor can the chronological minimality criterion be expressed by a first-order axiom, so barring some serious advances in theorem proving for higher-order logics, circumscription seems to be putting us in the position of using our insight into the problem to validate the formal system, rather than the other way around. We can never get more out of circumscription than we already know, and, if we don't do it just right, we may get a good deal less.

6.5. Arguing for model-theoretic minimality criteria

It is probably clear by now that what we're really interested in is the answer to two questions: (1) given a logical theory that admits more than one model, what are the preferred models of that theory (that is, what is the preference criterion) and (2) given a theory and a preference criterion, how do we find the theorems that are true in all "most preferred" models?

The last proposal for solving our representation problem, and the most promising, in our view, involves answering these questions directly: by describing preference relations on models and describing algorithms that generate the theorems true in the preferred models. This is the approach advocated by Shoham (in [25]), though it is also suggested in the writings of Kautz [11], Poole [23], and Goebel and Goodwin [6].

Shoham's theory of causal inference involves a logic for reasoning about time in which knowledge appears as a key modality. We can make statements like "K(f, s)," which is to be interpreted as "fact f is known in situation s." His preference criterion--chronological minimization extended to the notion of "facts being known" instead of "facts being true"--is called chronologically maximal ignorance. Just as above we advocated preferring those models in which abnormalities, or clippings, occurred as (temporally) late as possible, Shoham's criterion directs us to prefer those models in which knowledge is acquired (facts are known) as late as possible.

Frame axioms are then phrased like "if you know that the gun is loaded at t and you don't know that an unloading occurs between t and t + 1, then you know the gun is loaded at t + 1." The "shot therefore dead" model of our shooting axioms is preferred to the "unloading therefore alive" model because the latter model admits knowledge of an "unloading" event unknown in the former model, and the unloading event occurs temporally earlier than the "clipping aliveness" event that is known in the former model but unknown in the latter.

Minimizing knowledge (as opposed to clippings, or gun firings, or abnormalities, or something else) turns out to be an advantage because minimizing knowledge amounts to minimizing all propositions simultaneously. We can't do this by minimizing the truth of propositions because minimizing the points where P holds implies maximizing the points where ¬P holds. On the other hand, we can simultaneously minimize the points where P is known to hold and the points where ¬P is known to hold.

Having defined and justified this preference criterion, Shoham goes on to define theories, which he calls causal theories, that have the property that all maximally chronologically ignorant models agree on what is known. Further, he provides an algorithm that efficiently computes the set of atomic sentences known in all of these models.
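The flavor of such an algorithm can be conveyed with a small sketch (ours, not the algorithm Shoham gives; facts are (fluent, time) pairs, "not-ALIVE" is simply an atom standing for the negated fluent, and the timing of the load, wait, and shoot events is an assumption of the encoding). Knowledge is swept forward in time, and a fact becomes known at t + 1 only when it is forced by what is already known at t:

    # Base (boundary) knowledge: alive at time 0, loaded at time 1.
    # The load happens between 0 and 1, the wait between 1 and 2,
    # and the shot between 2 and 3.
    known = {("ALIVE", 0), ("LOADED", 1)}

    def forced_at(known, t):
        """Knowledge at time t+1 forced by knowledge at time t."""
        new = set()
        # Causal rule: if the gun is known loaded when the shot is fired,
        # the victim is known not to be alive afterwards.
        if t == 2 and ("LOADED", 2) in known:
            new.add(("not-ALIVE", 3))
        # Frame rule: a known fact persists unless the contrary is forced now.
        for fluent, u in known:
            if u == t and ("not-" + fluent, t + 1) not in new:
                new.add((fluent, t + 1))
        return new

    for t in range(3):
        known |= forced_at(known, t)

    print(("not-ALIVE", 3) in known)   # True: known dead after the shot
    print(("LOADED", 3) in known)      # True: the gun is known to stay loaded

Nothing is ever assumed known gratuitously, which is the maximal-ignorance part; sweeping forward in time is the chronological part.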

So Shoham's solution, though it is very specific to the problem of temporal projection, indeed gives us what we were looking for in a logic: a way of expressing what models our theory of projection prefers, and a way of generating the sentences true in the preferred models.

7. Conclusion

Much of the controversy surrounding our first paper, [7], centered around a statement we made at the end:

[If it is indeed the case that] a significant part of defeasible reasoning can't be represented by default logics, and if in the cases where the logics fail we have no better way of describing the reasoning process than by a direct procedural characterization (like our program or its inductive definition), then logic as an AI representation language begins to look less and less attractive.

In retrospect that was an oversimplification--on the one hand, it almost certainly is the case that nonmonotonic inference (or default reasoning, or whatever you want to call it) is idiosyncratic and domain dependent, and that there is no reason to expect that a simple syntax for describing when "leaps of faith" should be made, or a simple minimality criterion such as set inclusion, should handle all the cases. On the other hand, this does not doom to ultimate failure all uses of logic in AI (or even in describing nonmonotonic reasoning).

There are a variety of ways to describe nonmonotonic reasoning within a logical framework. Many of them depend on classifying models more finely than classical logic allows. One's effort in describing nonmonotonic reasoning, then, should be directed toward describing the classification (i.e. developing a preference criterion) and finding out what is true in those preferred models, as Shoham eloquently argues in [25].


Circumscription--especially the new, more expressive versions--seems only obliquely concerned with these goals. The circumscription axiom does not allow a perspicuous statement of minimality criteria. Indeed, in many cases one starts with the notion of minimality and works backward to find a circumscription axiom that expresses it. And circumscription offers no help at all in generating the theorems true in preferred models. As we noted, the only way we know of to find that out is to show ahead of time what follows.

But while our conclusion about the role of logic in theory development was admittedly overstated, our second conclusion, about the acceptable uses of nonmonotonic logics, was not. One thing that our result, and even the responses, make clear is that the relationship between these logics and human reasoning is not well understood. We can no longer engage in the logical "wishful thinking" that led us to claim that circumscription solves the frame problem [17], or that "'consistent' is to be understood in the normal way it is construed in nonmonotonic logic" [1]. From a technical standpoint, there is no "normal way" to understand the M operator, or the Reiter default rules, or a theory circumscribed over some predicate, apart from the proof or model theory of the chosen logic.

The term "consistent" has too often been used informally by researchers (e.g. in [10]) as if it had an intuitive and domain-independent meaning. We have shown that in at least one case a precise definition of the term is much more complex than intuition would have us believe, and that the definition is tightly bound up with the problem domain.

So researchers concerned with issues in nondeductive inference are not going to be able to rely on nonmonotonic logics to solve their problems for them, but perhaps in revealing the inadequacies of the formal systems we have also pointed out fruitful ways to attack these problems.

REFERENCES

1. Charniak, E., Motivation analysis, abductive unification, and non-monotonic equality, Artificial Intelligence, to appear.
2. Clark, K.L., Negation as failure, in: H. Gallaire and J. Minker (Eds.), Logic and Databases (Plenum Press, New York, 1978) 293-322.
3. Davis, M., The mathematics of non-monotonic reasoning, Artificial Intelligence 13 (1980) 73-80.
4. Etherington, D.W., Formalizing non-monotonic reasoning systems, Computer Science Tech. Rept. No. 83-1, University of British Columbia, Vancouver, BC, 1983.
5. Etherington, D.W. and Reiter, R., On inheritance hierarchies with exceptions, in: Proceedings AAAI-83, Washington, DC (1983) 104-108.
6. Goebel, R.G. and Goodwin, S., Applying theory formation to the planning problem, Research Rept. CS-87-02, Logic Programming and AI Group, Department of Computer Science, University of Waterloo, Waterloo, Ont., 1987.
7. Hanks, S. and McDermott, D., Temporal reasoning and default logics, Computer Science Research Rept. No. 430, Yale University, New Haven, CT, 1985.
8. Hanks, S. and McDermott, D., Default reasoning, nonmonotonic logics, and the frame problem, in: Proceedings AAAI-86, Philadelphia, PA (1986) 328-333.
9. Hayes, P.J., The second naive physics manifesto, in: J.R. Hobbs and R.C. Moore (Eds.), Formal Theories of the Commonsense World (Ablex, Norwood, NJ, 1985) 1-36.
10. Joshi, A., Webber, B. and Weischedel, R., Default reasoning in interaction, in: Proceedings of the Non-Monotonic Reasoning Workshop (1984) 141-150.
11. Kautz, H., The logic of persistence, in: Proceedings IJCAI-85, Los Angeles, CA (1985) 401-405.
12. Lifschitz, V., Formal theories of action, in: F. Brown (Ed.), The Frame Problem in Artificial Intelligence: Proceedings of the 1987 Workshop (Morgan Kaufmann, Los Altos, CA, 1987).
13. Lifschitz, V., Pointwise circumscription: Preliminary report, in: Proceedings IJCAI-85, Los Angeles, CA (1985) 406-410.
14. Lifschitz, V., Some results on circumscription, in: Proceedings of the Non-Monotonic Reasoning Workshop (1984) 151-164.
15. Loui, R.P., Response to Hanks and McDermott: Temporal evolution of beliefs and beliefs about temporal evolution, Cognitive Sci., to appear.
16. McCarthy, J., Circumscription--a form of non-monotonic reasoning, Artificial Intelligence 13 (1980) 27-39.
17. McCarthy, J., Applications of circumscription to formalizing common sense knowledge, in: Proceedings of the Non-Monotonic Reasoning Workshop (1984) 295-324.
18. McCarthy, J. and Hayes, P.J., Some philosophical problems from the standpoint of artificial intelligence, in: B. Meltzer and D. Michie (Eds.), Machine Intelligence 4 (Edinburgh University Press, Edinburgh, 1969) 463-502.
19. McDermott, D.V., A temporal logic for reasoning about processes and plans, Cognitive Sci. 6 (1982) 101-155.
20. McDermott, D.V., Nonmonotonic logic II: Nonmonotonic modal theories, J. ACM 29 (1) (1982) 33-57.
21. McDermott, D.V. and Doyle, J., Non-monotonic logic I, Artificial Intelligence 13 (1980) 41-72.
22. Perlis, D. and Minker, J., Completeness results for circumscription, Computer Science Tech. Rept. TR-1517, University of Maryland, College Park, MD, 1985.
23. Poole, D., On the comparison of theories: Preferring the most specific explanation, in: Proceedings IJCAI-85, Los Angeles, CA (1985) 144-147.
24. Reiter, R., A logic for default reasoning, Artificial Intelligence 13 (1980) 81-132.
25. Shoham, Y., Time and causation from the standpoint of artificial intelligence, Computer Science Research Rept. No. 507, Yale University, New Haven, CT, 1986.

Received April 1987