A Lewisian Logic of Causal Counterfactuals
Jiji Zhang
Received: 28 May 2011 / Accepted: 2 November 2011
© Springer Science+Business Media B.V. 2011
Abstract In the artificial intelligence literature a promising approach to counterfactual
reasoning is to interpret counterfactual conditionals based on causal models.
Different logics of such causal counterfactuals have been developed with respect to
different classes of causal models. In this paper I characterize the class of causal
models that are Lewisian in the sense that they validate the principles in Lewis’s
well-known logic of counterfactuals. I then develop a system sound and complete
with respect to this class. The resulting logic is the weakest logic of causal counterfactuals
that respects Lewis’s principles, sits in between the logic developed by
Galles and Pearl and the logic developed by Halpern, and stands to Galles and
Pearl’s logic in the same fashion as Lewis’s stands to Stalnaker’s.
Keywords Causal models · Causal reasoning · Conditional logic · Counterfactual · Intervention
1 Introduction
Counterfactual reasoning is commonplace both in the sciences and in everyday life, and
is an important subject matter for both philosophers and artificial intelligence
researchers. Despite the continuing controversies over the logic of conditionals in
philosophy, it is fair to say that the Stalnaker–Lewis semantics for counterfactual
conditionals (Stalnaker 1968; Stalnaker and Thomason 1970; Lewis 1973) enjoys
the greatest popularity. In the artificial intelligence literature, the general Stalnaker–
Lewis framework has also been employed to model counterfactuals (Ginsberg 1986)
and to develop theories of actions (Ginsberg and Smith 1987; Winslett 1988).
J. Zhang
Department of Philosophy, Lingnan University, Tuen Mun, NT, Hong Kong
Minds & Machines
DOI 10.1007/s11023-011-9261-z
More recently, a causal interpretation of counterfactuals, as statements about
consequences of (hypothetical) interventions, was vigorously developed and
defended, based on Pearl’s seminal and influential work on causal modeling (Pearl
1995, 1998, 2009).¹ Logics of such causal counterfactuals have also been studied
from an axiomatic perspective (Galles and Pearl 1998; Halpern 2000). A natural
question to ask is how these logics are related to the well-known logics of
counterfactuals in the Stalnaker–Lewis framework. Galles and Pearl (henceforth
GP, 1998) compared their theory to Lewis’s, and showed that their logic of causal
counterfactuals respects Lewis’s logical principles (and, in addition, endorses a
principle called reversibility). In fact, by imposing a requirement of “unique
solution” on their models, which is analogous to Stalnaker’s assumption of “unique
closest world”, GP’s logic also incorporates the stronger logic defended by
Stalnaker (1968).
Halpern (2000) relaxed GP’s requirement of “unique solution” and allowed
causal models with multiple solutions (or no solution). It is tempting to view the
difference between Halpern and GP in parallel to that between Lewis and Stalnaker,
and to expect that Halpern’s logic incorporates Lewis’s logic just as GP’s
incorporates Stalnaker’s. However, this expectation is false. As we will see,
allowing models with no solution immediately invalidates some of Lewis’s
principles. Moreover, even if we disallow models with no solution, the resulting
logic is still not Lewisian.
This observation suggests that there is an interesting class of causal models
waiting to be characterized, i.e., the class of models that validates Lewis’s logical
principles for counterfactuals. I shall provide a characterization of this class of
causal models in this paper. The characterization suggests an extension of Halpern’s
elegant axiomatic system, which I will show is sound and complete with respect to
the class of Lewisian causal models. This logic is the weakest logic of causal
counterfactuals that respects Lewis’s principles, and stands to GP’s logic in the
same fashion as Lewis’s stands to Stalnaker’s.
The rest of the paper is organized as follows. In Sect. 2, I briefly review the
relevant material on the logics of counterfactuals in the Stalnaker–Lewis
framework. In Sect. 3, I describe the general setup of causal models as (modifiable)
structural equation models, and the interpretation of counterfactuals based on such
models. I then introduce GP’s logic and Halpern’s logic for such causal
counterfactuals, and show, in Sect. 4, that while GP’s logic is Stalnakerian,
Halpern’s is not Lewisian, despite a superficial analogy between the Stalnaker–Lewis
contrast and the GP–Halpern contrast. The main results are presented in Sect.
5, where I characterize the class of Lewisian causal models (Sect. 5.1), and provide
an extension of Halpern’s system to axiomatize the logic with respect to the class
(Sect. 5.2). I conclude in Sect. 6.
1 The general approach has also been followed by prominent philosophers to illuminate the epistemology
of causation (Spirtes et al. 2000) and the nature of causal explanation (Woodward 2003; Woodward and
Hitchcock 2003).
2 Two Logics of Counterfactuals in the Stalnaker–Lewis Framework
In Lewis’s theory, □→ is used as the (‘would’) counterfactual operator (> is used in
Stalnaker’s). So “φ □→ ψ” symbolizes the conditional statement “if it were the
case that φ, it would be the case that ψ”.² The basic idea in the Stalnaker–Lewis
semantics is that the statement φ □→ ψ is true (at a world w) just in case ψ is true in
the most similar (to w) possible worlds at which φ is true, or, as it is often put, ψ is true
in every closest φ-world.³ For the purpose of this paper, it is not necessary to review
all the formal details of the semantics, but it will be helpful to recall a formalization
of the basic idea in terms of selection functions. Let L be a language and Ant(L) be
the set of sentences in L that can appear as antecedents in counterfactual
conditionals.⁴ Let W be a set of possible worlds. A (Lewisian) selection function is
a function f: Ant(L) × W → 2^W that satisfies the following conditions for all
antecedents φ1, φ2 and every world w:
(S1) If φ1 is true at w, then f(φ1, w) = {w}.
(S2) φ1 is true at every world in f(φ1, w).
(S3) If φ2 is true at every world at which φ1 is true, and f(φ1, w) ≠ ∅, then f(φ2, w) ≠ ∅.
(S4) If φ2 is true at every world at which φ1 is true, and φ1 is true at some world
in f(φ2, w), then f(φ1, w) consists of all and only those worlds in f(φ2, w) at which
φ1 is true.
Intuitively, the selection function specifies, for a world w and an antecedent φ,
the φ-worlds closest to w. Condition (S1) requires that a world be closer to itself
than any other world. Condition (S2) ensures that the selected worlds are indeed
φ-worlds. Condition (S3) requires that if the function returns a non-empty set for some
antecedent (which intuitively means that the antecedent is possible or entertainable
at w), it should also return a non-empty set for any weaker antecedent. Condition
(S4) in effect ensures that there is a consistent similarity or distance ordering
relative to w: if (S4) is violated, then there are two worlds, w1 and w2, such that w1 is
closer to w than w2 relative to one antecedent, but w2 is closer to w than w1 relative
to another antecedent.
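Conditions (S1)–(S4) can be checked mechanically on a toy example. The following Python sketch is only an illustration under stated assumptions: four hypothetical worlds, antecedents represented extensionally as sets of worlds, and an invented distance in which every world is strictly closest to itself while other worlds may tie. Selecting the minimal-distance φ-worlds is one familiar way to obtain a Lewisian selection function, not the only one. The brute-force loop confirms (S1)–(S4) for every world and every pair of antecedents.

```python
from itertools import combinations

# Hypothetical toy setup: four worlds; an antecedent is represented
# extensionally as the frozenset of worlds at which it is true.
WORLDS = (0, 1, 2, 3)


def distance(w, v):
    """Invented similarity metric: 0 for w itself, ties allowed among other worlds."""
    return 0 if w == v else 1 + abs(w - v) // 2


def select(phi, w):
    """f(phi, w): the phi-worlds at minimal distance from w (empty if phi is empty)."""
    if not phi:
        return frozenset()
    best = min(distance(w, v) for v in phi)
    return frozenset(v for v in phi if distance(w, v) == best)


def antecedents():
    """Every antecedent over WORLDS, i.e. every subset of worlds."""
    for r in range(len(WORLDS) + 1):
        for combo in combinations(WORLDS, r):
            yield frozenset(combo)


def satisfies_s1_to_s4():
    """Brute-force check of (S1)-(S4) for every world and pair of antecedents."""
    props = list(antecedents())
    for w in WORLDS:
        for p1 in props:
            f1 = select(p1, w)
            if w in p1 and f1 != {w}:              # (S1)
                return False
            if not f1 <= p1:                       # (S2)
                return False
            for p2 in props:
                if not p1 <= p2:                   # (S3), (S4) assume p1 entails p2
                    continue
                f2 = select(p2, w)
                if f1 and not f2:                  # (S3)
                    return False
                if (p1 & f2) and f1 != (f2 & p1):  # (S4)
                    return False
    return True


print(satisfies_s1_to_s4())                  # True
print(sorted(select(frozenset({2, 3}), 0)))  # [2, 3]: two equally close worlds
```

The second line shows that such a function can return more than one closest world, which Lewis allows.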
Given a selection function f, φ □→ ψ is true at w if and only if ψ is true at every
world in f(φ, w). From □→ we can define ◇→: φ ◇→ ψ =df ¬(φ □→ ¬ψ),
which is true at w if and only if ψ is true at some world in f(φ, w). Intuitively,
φ ◇→ ψ stands for a ‘might’ counterfactual: if it were the case that φ, it might be
the case that ψ. Note that when f(φ, w) is empty, φ □→ ψ is trivially true and φ ◇→ ψ is trivially false at w.
2 As Lewis (1973, p. 3) noted, this slightly ungrammatical reading of φ □→ ψ (instead of ‘if it had been
the case that φ, it would have been the case that ψ’) was deliberate, in order not to interfere with tense.
Moreover, the antecedent is not required to be actually false, so some conditionals under consideration are,
properly speaking, not contrary-to-fact conditionals. Like Lewis, I will tolerate this terminological
inaptness.
3 This formulation assumes that there are closest φ-worlds, which Lewis labels the Limit Assumption
(1973, pp. 19–20). Lewis’s semantics can be formulated without this assumption, but the present, slightly
less general formulation fits our purposes well.
4 In Lewis’s language, there is no restriction on the form of antecedent, so Ant(L) is just the set of all
sentences. But as we will see, the form of antecedent is restricted in the language of causal
counterfactuals. In that language, Ant(L) is a proper subset of the set of all sentences.
Lewis allows all selection functions, while Stalnaker has an additional constraint:
(SS) f(φ, w) is a singleton set or an empty set.⁵
In plain words, Lewis allows multiple closest worlds (for entertainable
antecedents), while Stalnaker requires a single or unique closest world (for
entertainable antecedents).
The logic of counterfactuals endorsed by Lewis (1973, p. 132), named VC, can
be axiomatized with the following axiom schemata (in addition to truth-functional
tautologies):
(VC1) φ □→ φ
(VC2) (¬φ □→ φ) ⊃ (ψ □→ φ)
(VC3) (φ □→ ¬ψ) ∨ (((φ ∧ ψ) □→ χ) ≡ (φ □→ (ψ ⊃ χ)))
(VC4) (φ □→ ψ) ⊃ (φ ⊃ ψ)
(VC5) (φ ∧ ψ) ⊃ (φ □→ ψ)
(VC1) is valid due to condition (S2). (VC2) is valid due to conditions (S2) and
(S3). (VC3) corresponds to condition (S4). (VC4) and (VC5) are guaranteed by
condition (S1).
With the requirement of “unique closest world”, Stalnaker’s logic, in addition to
these principles, contains the principle of conditional excluded middle (which is not
a theorem of VC):
(S) (φ □→ ψ) ∨ (φ □→ ¬ψ)
Indeed adding (S) as an extra axiom to Lewis’s logic yields Stalnaker’s (Lewis,
1973, p. 133).
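The difference between (S) and VC can be seen in a small sketch that relies on the same kind of toy assumptions: hypothetical worlds, antecedents and consequents represented as sets of worlds, and an invented distance with ties. Suppose a Lewisian selection function returns two closest worlds that disagree on ψ. Then φ □→ ψ and φ □→ ¬ψ are both false, while the corresponding ‘might’ counterfactuals are both true. The sketch enforces (SS) by breaking ties with a made-up rule that prefers the lower-numbered world, and this restores conditional excluded middle. The helper names are illustrative only.

```python
from itertools import combinations

WORLDS = frozenset({0, 1, 2, 3})  # hypothetical toy worlds


def distance(w, v):
    """Invented similarity metric with ties (as Lewis allows)."""
    return 0 if w == v else 1 + abs(w - v) // 2


def lewis_select(phi, w):
    """All phi-worlds at minimal distance from w."""
    if not phi:
        return frozenset()
    best = min(distance(w, v) for v in phi)
    return frozenset(v for v in phi if distance(w, v) == best)


def stalnaker_select(phi, w):
    """Break ties by world index, so the result is a singleton or empty, as (SS) requires."""
    if not phi:
        return frozenset()
    return frozenset({min(phi, key=lambda v: (distance(w, v), v))})


def would(select, phi, psi, w):
    """phi []-> psi: psi holds at every selected phi-world."""
    return select(phi, w) <= psi


def might(select, phi, psi, w):
    """phi <>-> psi, defined as not (phi []-> not psi)."""
    return not would(select, phi, WORLDS - psi, w)


def excluded_middle_holds(select, phi, psi, w):
    """Instance of (S): (phi []-> psi) or (phi []-> not psi)."""
    return would(select, phi, psi, w) or would(select, phi, WORLDS - psi, w)


def subsets(worlds):
    for r in range(len(worlds) + 1):
        for combo in combinations(sorted(worlds), r):
            yield frozenset(combo)


phi, psi = frozenset({2, 3}), frozenset({2})
print(excluded_middle_holds(lewis_select, phi, psi, 0))  # False: worlds 2 and 3 tie
print(might(lewis_select, phi, psi, 0), might(lewis_select, phi, WORLDS - psi, 0))  # True True
print(all(excluded_middle_holds(stalnaker_select, p, q, w)
          for w in WORLDS for p in subsets(WORLDS) for q in subsets(WORLDS)))  # True
```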
3 Causal Models and Two Logics of Causal Counterfactuals
I now turn to the logics of causal counterfactuals, that is, counterfactuals interpreted on the
basis of causal models. My description of the framework will closely follow the rigorous
presentation in Halpern (2000). A signature for causal models is a tuple ⟨U, V, R⟩, where
U and V are finite sets of variables, and R associates with each variable
X ∈ U ∪ V a finite set of values R(X). A causal model over a signature S is a tuple ⟨S,
F⟩, where F is a collection of functions such that for each X ∈ V, there is one (and
only one) function f_X: ×_{Y ∈ (U ∪ V)\{X}} R(Y) → R(X); that is, f_X maps each value
configuration of (U ∪ V)\{X} to a unique value of X.
Thus a causal model specifies, for each variable X ∈ V, how the value of X depends
on the values of other variables. Intuitively, the function f_X models a causal
mechanism for X. Note that there are no functions for variables in U; how the values
of U are determined is not modeled. Rather, a value configuration of U is taken to
describe a set of background or boundary conditions for the modeled causal system.
For this reason, variables in U are called exogenous and variables in V are called
endogenous.
5 Stalnaker’s (1968) own formulation is in terms of a world-selection function instead of a set-selection
function, where an absurd world is included to play the role of the empty set.
Such a causal model is also known as a structural equation model, as each f_X
corresponds to a structural equation: X = f_X((U ∪ V)\{X}). In concrete models, some
variables in (U ∪ V)\{X} may be redundant in determining the value of X, and can be
omitted from the structural equation for X.
For illustration, here is a simple causal model borrowed from Pearl (2009,
p. 209). The signature of the model consists of one exogenous variable: U = {U},
and four endogenous variables: V = {X, Y, Z, W}, and all variables are binary:
R(U) = R(X) = R(Y) = R(Z) = R(W) = {0,1}. For each endogenous variable,
there is one (and only one) structural equation, specifying how that variable
depends on other variables: X = U; Y = X; Z = X; W = Y ∨ Z.
The model could, for example, represent the following situation in a firing squad:
The court decides whether to order execution (represented by the variable U). The
decision (yes or no) determines whether the captain on the squad orders shooting
(represented by the variable X). Two riflemen strictly follow the captain’s order
(Y represents whether rifleman 1 shoots and Z represents whether rifleman 2 shoots),
and shooting from either rifleman is sufficient to kill the prisoner (W represents
whether the prisoner dies).
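The firing squad model can be written down directly as structural equations. The Python sketch below is illustrative only. It assumes binary variables, represents each equation f_X as a function of a complete assignment, and finds solutions by brute-force enumeration of the endogenous variables. That is adequate for tiny models but exponential in general. The helper name `solutions` is hypothetical.

```python
from itertools import product

# Firing squad model: X = U; Y = X; Z = X; W = Y ∨ Z.
# Assumed encoding: all variables binary; each equation f_X is a function
# of a complete assignment s (a dict from variable names to 0/1).
FIRING_SQUAD = {
    "X": lambda s: s["U"],            # captain follows the court's decision
    "Y": lambda s: s["X"],            # rifleman 1 follows the captain
    "Z": lambda s: s["X"],            # rifleman 2 follows the captain
    "W": lambda s: s["Y"] | s["Z"],   # either shot kills the prisoner
}


def solutions(equations, u):
    """All assignments to the endogenous variables satisfying every equation, given u.

    Brute force over binary values: fine for toy models, exponential in general.
    """
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


print(solutions(FIRING_SQUAD, {"U": 1}))
# [{'U': 1, 'W': 1, 'X': 1, 'Y': 1, 'Z': 1}]
```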
Given a causal model M = ⟨U, V, R, F⟩, a (possibly empty) subset of
endogenous variables X ⊆ V, and a possible value configuration x of X (i.e.,
x contains one and only one value for each variable in X), we write M_{X=x} to denote
the causal model that results from M by replacing the structural equations for X in
M with X = x; that is, M_{X=x} is the same as M except that for each X ∈ X the equation
for X is modified to be X = x (where x is the component value in x for X), regardless
of the original equations for them in M. Obviously, when X is empty, M_∅ is the same
as M. Because of this operation on models, the framework is also called modifiable
structural equation models.
Intuitively, M_{X=x} models the (counterfactual) situation in which X is intervened on
or forced to take the value x while the mechanisms for other endogenous variables
remain the same as modeled by M.⁶ In this framework, M_{X=x} is the key to evaluating
counterfactuals with the antecedent X = x.
For example, suppose M is the aforementioned model representing the firing
squad scenario. Then M_{X=0} is the model with the structural equations: X = 0;
Y = X; Z = X; W = Y ∨ Z, and M_{Z=1} is the model with the structural equations:
X = U; Y = X; Z = 1; W = Y ∨ Z. Intuitively, M_{X=0} models the situation where
some intervention (say, from the prisoner’s friends) forces the captain not to order
shooting, regardless of the court’s decision, and M_{Z=1} models the situation where
some intervention (say, from the prisoner’s enemies) forces rifleman 2 to shoot,
regardless of the captain’s order. We can also model the situation where both
interventions take place by M_{X=0, Z=1}, which has the following structural equations:
X = 0; Y = X; Z = 1; W = Y ∨ Z.
6 This approach to modeling interventions originated in econometrics (Strotz and Wold 1960; Fisher
1970), and was masterfully articulated and developed by Pearl (2009).
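Interventions can be sketched in the same way. Building M_{X=x} amounts to replacing the function for each intervened variable with a constant and leaving every other equation untouched. The helpers `solutions` and `intervene` below are hypothetical names for this illustration. Variables are assumed binary, and the court’s decision is fixed at U = 1 for concreteness.

```python
from itertools import product

FIRING_SQUAD = {
    "X": lambda s: s["U"],
    "Y": lambda s: s["X"],
    "Z": lambda s: s["X"],
    "W": lambda s: s["Y"] | s["Z"],
}


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """Return M_{X=x}: each variable in `setting` gets a constant equation."""
    modified = dict(equations)  # copy, so M itself and other mechanisms are unchanged
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val  # bind val now, not at call time
    return modified


u = {"U": 1}  # the court orders execution
friends = solutions(intervene(FIRING_SQUAD, {"X": 0}), u)          # M_{X=0}
enemies = solutions(intervene(FIRING_SQUAD, {"Z": 1}), u)          # M_{Z=1}
both = solutions(intervene(FIRING_SQUAD, {"X": 0, "Z": 1}), u)     # M_{X=0, Z=1}
print(friends, enemies, both, sep="\n")
```

Copying the dictionary before overwriting entries mirrors the idea that an intervention changes only the targeted mechanisms.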
A solution to a causal model, relative to a value configuration u of the exogenous
variables U, is a value configuration v of the endogenous variables V such that all
the structural equations in the model are simultaneously satisfied. In general, there
may or may not be a solution to a causal model relative to a value configuration of
the exogenous variables, and when there are, there may be more than one solution.
For example, in the firing squad model, relative to U = 1 (i.e., relative to the fact
that the court decides to order execution), there is a unique solution to the model:
X = 1 (the captain orders execution), Y = 1 (rifleman 1 shoots), Z = 1 (rifleman 2
shoots), and W = 1 (the prisoner dies). However, not every model features a unique
solution. For example, the model with the following structural equations: X = U;
Y = Z; Z = ¬Y; W = Y ∨ Z, has no solution relative to U = 1. By contrast, the
model with the following structural equations: X = U; Y = Z; Z = Y; W = Y ∨ Z,
has two solutions relative to U = 1: (X = 1, Y = 1, Z = 1, W = 1) and (X = 1,
Y = 0, Z = 0, W = 0).
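Both claims can be confirmed by brute force. The sketch below is illustrative only. It assumes binary variables and repeats the hypothetical `solutions` helper so that the block runs on its own. The model names are invented labels. In the first model, the loop Y = Z, Z = ¬Y admits no consistent assignment. In the second, the loop Y = Z, Z = Y admits two.

```python
from itertools import product


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


NO_SOLUTION = {
    "X": lambda s: s["U"],
    "Y": lambda s: s["Z"],
    "Z": lambda s: 1 - s["Y"],        # Z = ¬Y
    "W": lambda s: s["Y"] | s["Z"],
}
TWO_SOLUTIONS = {
    "X": lambda s: s["U"],
    "Y": lambda s: s["Z"],
    "Z": lambda s: s["Y"],
    "W": lambda s: s["Y"] | s["Z"],
}

u = {"U": 1}
print(len(solutions(NO_SOLUTION, u)))   # 0
for sol in solutions(TWO_SOLUTIONS, u):
    print(sol)
```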
The simplest counterfactual conditionals in this framework are of this form: if it
were the case that X = x (or more explicitly, if X were intervened on to take value x), it
would be the case that Y = y, where X and Y are endogenous variables. Such a
statement gets a truth value in a causal model M relative to a value configuration u:
the statement is true just in case every solution to M_{X=x} relative to u has it that
Y = y. More generally, the antecedent can be a conjunction of interventions, or, equivalently,
an intervention on a set of variables, and the consequent can be any Boolean
combination of such statements as Y = y. For example, it makes sense to say: if
X were intervened on to take value x and Z intervened on to take value z, it would be the
case that Y = y or W = w. This statement is true in M relative to u just in case every
solution to M_{X=x, Z=z} relative to u satisfies Y = y or W = w.
To make this more precise, I shall use the following language adapted from Halpern
(2000), defined over a signature S = ⟨U, V, R⟩. The basic counterfactual formulas
are of the form [X1 = x1 ∧ … ∧ Xk = xk]φ,⁷ where X1, …, Xk are distinct variables in
V, and φ is a Boolean combination of formulas of the form Y(u) = y, where u is a
value configuration of the variables in U. It stands for the statement “if X1 were
intervened on to take value x1, …, and Xk were intervened on to take value xk, it would be
the case that φ, relative to u”. For convenience, I will write in bold, X = x (or
X(u) = x), to abbreviate a conjunction of Xi = xi (or Xi(u) = xi). So
[X = x](Y(u) = y) is understood as: if the variables in X were intervened on to take
the value configuration x, the variables in Y would have the value configuration y,
relative to u. In the special case when X is empty, the formula [X = x]φ is written
as [true]φ. The language contains all the Boolean combinations of the basic
counterfactual formulas. I will refer to this language for causal counterfactuals as
L_CC(S).
Notice that the form of antecedents is restricted to a conjunction of interventions.
How to handle disjunctive antecedents properly in this framework is an interesting
open question.⁸ It is worth noting that although disjunctive antecedents are allowed
in the Stalnaker–Lewis framework, the closest-world interpretation of counterfactuals
with disjunctive antecedents also faces serious challenges (Ellis et al. 1977;
Lewis 1977).
7 The notation [ ] (and ⟨ ⟩, to be introduced later) is obviously borrowed from dynamic logic (e.g.,
Harel 1979).
Given any causal model M over S, every formula in L_CC(S) has a truth value in
M. A basic counterfactual formula [X = x]φ(u) is true in M if and only if every
solution to M_{X=x} relative to u satisfies φ. Other formulas are truth functions of the
basic counterfactual formulas.
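This truth condition amounts to a universally quantified check over solutions. In the illustration below, the helper names are hypothetical and the variables are assumed binary. The consequent φ is passed as a Python predicate over a complete assignment, so Boolean combinations are written with `and`, `or` and `not`. Two consequences of the definition show up directly. First, the check is vacuously true when M_{X=x} has no solution. Second, when there are two solutions, a formula and its negation can both fail.

```python
from itertools import product


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """M_{X=x}: constant equations for the intervened variables, others unchanged."""
    modified = dict(equations)
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val
    return modified


def would(equations, setting, phi, u):
    """[X = x]phi(u): phi holds in every solution to M_{X=x} relative to u."""
    return all(phi(s) for s in solutions(intervene(equations, setting), u))


TWO_SOLUTIONS = {"X": lambda s: s["U"], "Y": lambda s: s["Z"],
                 "Z": lambda s: s["Y"], "W": lambda s: s["Y"] | s["Z"]}
NO_SOLUTION = {"X": lambda s: s["U"], "Y": lambda s: s["Z"],
               "Z": lambda s: 1 - s["Y"], "W": lambda s: s["Y"] | s["Z"]}

u = {"U": 1}
# With two solutions, neither [true](W = 1) nor [true](W = 0) holds ...
print(would(TWO_SOLUTIONS, {}, lambda s: s["W"] == 1, u),
      would(TWO_SOLUTIONS, {}, lambda s: s["W"] == 0, u))          # False False
# ... but [true](Y = Z) does, since it holds in both solutions.
print(would(TWO_SOLUTIONS, {}, lambda s: s["Y"] == s["Z"], u))     # True
# With no solution, every [true]phi is vacuously true, even a contradiction.
print(would(NO_SOLUTION, {}, lambda s: s["W"] == 1 and s["W"] == 0, u))  # True
```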
Again, let me use the firing squad example to illustrate the ideas. Suppose M is
the model representing the firing squad scenario. Recall that M has one binary
exogenous variable U, and four binary endogenous variables X, Y, Z, W, with these
structural equations: X = U; Y = X; Z = X; W = Y ∨ Z. Suppose the actual value
of U is 1 (i.e., the court actually decides to order execution).
Consider the conditional: if the captain had not ordered shooting, the prisoner would
not have died. This conditional is expressed by the formula [X = 0](W(1) = 0). To
evaluate this formula, we consider the model M_{X=0}: X = 0; Y = X; Z = X;
W = Y ∨ Z. The solution to M_{X=0} (relative to U = 1) gives W the value 0. So the
conditional is true in this model.
Consider another conditional: if rifleman 2 had not shot, the prisoner would not
have died. This is expressed by the formula [Z = 0](W(1) = 0). To evaluate this
formula, we consider the model M_{Z=0}: X = U; Y = X; Z = 0; W = Y ∨ Z. The
solution to M_{Z=0} (relative to U = 1) gives W the value 1. So the conditional is false
in this model.
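The two evaluations can be reproduced with a self-contained sketch that uses the same hypothetical helpers and assumes binary variables. The consequent W(1) = 0 is encoded as a predicate on the solution computed relative to U = 1.

```python
from itertools import product

FIRING_SQUAD = {
    "X": lambda s: s["U"],
    "Y": lambda s: s["X"],
    "Z": lambda s: s["X"],
    "W": lambda s: s["Y"] | s["Z"],
}


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """M_{X=x}: constant equations for the intervened variables, others unchanged."""
    modified = dict(equations)
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val
    return modified


def would(equations, setting, phi, u):
    """[X = x]phi(u): phi holds in every solution to M_{X=x} relative to u."""
    return all(phi(s) for s in solutions(intervene(equations, setting), u))


u = {"U": 1}                        # the court actually orders execution
prisoner_lives = lambda s: s["W"] == 0
print(would(FIRING_SQUAD, {"X": 0}, prisoner_lives, u))  # True:  [X = 0](W(1) = 0)
print(would(FIRING_SQUAD, {"Z": 0}, prisoner_lives, u))  # False: [Z = 0](W(1) = 0)
```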
A formula is valid with respect to a class of causal models if and only if it is true
in every model in the class. Obviously, different classes of causal models may
validate different sets of formulas and so generate different logics of causal
counterfactuals. GP (Galles and Pearl 1998) considered the class of causal models
with the following “unique-solution” property: for every X ⊆ V, every value
configuration x of X, and every value configuration u of U, M_{X=x} has one and only
one solution relative to u. I will refer to this class of models (over signature S) as
M_un(S), and the corresponding logic as GP’s logic.⁹ Halpern (2000) provided an
elegant axiomatization of GP’s logic, but also considered the logic with respect to
the class of all causal models. I will refer to the class of all causal models (over
signature S) as M_all(S), and the corresponding logic as Halpern’s logic.
8 It might be tempting to simply dismiss the problem, on the ground that counterfactuals with disjunctive
antecedents concern consequences of interventions that are not well defined, and so should not bear truth
values under the causal semantics. But some counterfactuals of this sort seem to have as determinate a
truth value as any counterfactual can. Consider the firing squad example. Suppose, as it happened, the
court decided not to order execution. As a result, neither rifleman shot, and the prisoner survived. The
following counterfactual seems clearly true: if rifleman 1 had shot or rifleman 2 had shot, the prisoner
would have died.
9 GP (as well as Halpern) also considered the class of recursive models, a subclass of M_un(S). That class,
though important for other purposes, does not need special attention for the purpose of this paper.
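GP’s unique-solution property quantifies over every subset X ⊆ V, every x and every u. For tiny binary models, all of these can simply be enumerated. The sketch below does that under the same assumptions as before: binary variables and hypothetical helper and model names. The acyclic firing squad model passes. The model with Y = Z and Z = Y fails even without any intervention.

```python
from itertools import combinations, product


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """M_{X=x}: constant equations for the intervened variables, others unchanged."""
    modified = dict(equations)
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val
    return modified


def assignments(variables):
    """Every value configuration of the given variables (binary domains assumed)."""
    for values in product((0, 1), repeat=len(variables)):
        yield dict(zip(variables, values))


def interventions(endo):
    """Every antecedent X = x, with X a (possibly empty) subset of the endogenous variables."""
    for k in range(len(endo) + 1):
        for chosen in combinations(endo, k):
            yield from assignments(chosen)


def has_unique_solutions(equations, exo):
    """GP's property: M_{X=x} has exactly one solution for every X, x and u."""
    return all(len(solutions(intervene(equations, x), u)) == 1
               for u in assignments(exo)
               for x in interventions(sorted(equations)))


FIRING_SQUAD = {"X": lambda s: s["U"], "Y": lambda s: s["X"],
                "Z": lambda s: s["X"], "W": lambda s: s["Y"] | s["Z"]}
TWO_SOLUTIONS = {"X": lambda s: s["U"], "Y": lambda s: s["Z"],
                 "Z": lambda s: s["Y"], "W": lambda s: s["Y"] | s["Z"]}

print(has_unique_solutions(FIRING_SQUAD, ["U"]))   # True
print(has_unique_solutions(TWO_SOLUTIONS, ["U"]))  # False
```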
4 Halpern’s Logic is Not Lewisian
Galles and Pearl (1998) compared their logic to Lewis’s. They briefly indicated that
their semantics could be recast in Lewis’s terms, with a less elusive and more
principled similarity measure. Their suggestion is that given a causal model, each
value configuration of all variables can be taken as a possible world, and world w1 is
more similar or closer to world w than world w2 if and only if it takes a smaller
number of interventions to transform w to w1 than it does to transform w to w2.
This similarity measure, however, is not quite right for their purpose. Consider,
for example, a causal model M over the signature ⟨{U}, {X, Y, Z}, R⟩, where
R(U) = R(X) = R(Y) = R(Z) = {0, 1} (i.e., all variables are binary), with these
structural equations: X = U; Y = X; Z = X. In this model, the counterfactual
[Y = 1 ∧ Z = 1](X(0) = 0) is true, because in the solution to M_{Y=1, Z=1} relative to
U = 0, the value of X is 0. However, according to the similarity measure suggested
by GP, the world (U = 0, X = 1, Y = 1, Z = 1) is closer to the (actual) world
(U = 0, X = 0, Y = 0, Z = 0) than the world (U = 0, X = 0, Y = 1, Z = 1) is,
because it takes only one local intervention (i.e., forcing X to be 1) to transform
(U = 0, X = 0, Y = 0, Z = 0) to (U = 0, X = 1, Y = 1, Z = 1), but it takes two
local interventions (i.e., forcing Y to be 1 and forcing Z to be 1) to transform
(U = 0, X = 0, Y = 0, Z = 0) to (U = 0, X = 0, Y = 1, Z = 1). Under that measure, the
closest world at which Y = 1 ∧ Z = 1 holds has X = 1, which would make the counterfactual
false rather than true. Thus the suggested similarity measure is not quite right for causal counterfactuals.
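GP’s suggestion does not specify exactly how interventions are counted, so the sketch below adopts one hypothetical reading. On this reading, the distance to a target world is the smallest number of variables that must be forced to their target values for that world to become the unique solution. Under this assumption, the computation reproduces the example. The causal semantics selects the world with X = 0, yet the intervention count ranks the world with X = 1 as closer. Helper names are illustrative and variables are assumed binary.

```python
from itertools import combinations, product

# Model from the text: X = U; Y = X; Z = X (all variables binary).
MODEL = {
    "X": lambda s: s["U"],
    "Y": lambda s: s["X"],
    "Z": lambda s: s["X"],
}


def solutions(equations, u):
    """All endogenous assignments satisfying every equation, given u (brute force)."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """M_{X=x}: constant equations for the intervened variables, others unchanged."""
    modified = dict(equations)
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val
    return modified


def interventions_needed(equations, u, target):
    """Hypothetical reading of GP's measure: the fewest variables that must be
    forced to their target values for `target` to become the unique solution."""
    endo = sorted(equations)
    for k in range(len(endo) + 1):
        for chosen in combinations(endo, k):
            forced = intervene(equations, {v: target[v] for v in chosen})
            if solutions(forced, u) == [{**u, **target}]:
                return k
    return None  # not reached: forcing every variable always yields the target


u = {"U": 0}
# Causal semantics: the solution to M_{Y=1, Z=1} keeps X = 0.
print(solutions(intervene(MODEL, {"Y": 1, "Z": 1}), u))
w1 = {"X": 1, "Y": 1, "Z": 1}
w2 = {"X": 0, "Y": 1, "Z": 1}
print(interventions_needed(MODEL, u, w1), interventions_needed(MODEL, u, w2))  # 1 2
```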
Instead of working out the corresponding similarity measure, it is more
straightforward to see the analogy in terms of selection functions. In GP’s
framework, where only causal models with the “unique-solution” property are
considered, a causal model M induces an obvious “selection function” for each
value configuration u of U: the function selects, for each antecedent X = x, the
unique solution to M_{X=x} relative to u as the antecedent world closest to the “actual”
world (which is taken to be the solution to M). This function is not yet a fully
specified selection function, because it is only defined relative to the “actual” world,
but it suffices for our present purpose. Obviously, a counterfactual is true in M if and
only if it is true according to the “actual” selection function induced by M. It is
straightforward to check that this function, for the “actual” world and every
antecedent, satisfies the conditions for a Stalnakerian selection function: (S1)–(S4)
plus (SS), as explained in Sect. 2. It is thus no accident that the following schemata,
which are translations of the axiom schemata (VC1)–(VC5) and (S) into the
language for causal counterfactuals, are all valid in GP’s logic.
(VC1c) [X = x](X(u) = x)
(VC2c) [X = x](X(u) ≠ x) ⊃ [Y = y](X(u) ≠ x)
(VC3c) [X = x](Y(u) ≠ y) ∨ ([X = x ∧ Y = y]φ(u) ≡ [X = x](Y(u) = y ⊃ φ(u)))
(VC4c) [X = x]φ(u) ⊃ [true](X(u) = x ⊃ φ(u))
(VC5c) [true](X(u) = x ∧ φ(u)) ⊃ [X = x]φ(u)
(Sc) [X = x]φ(u) ∨ [X = x]¬φ(u)
The validity of these schemata with respect to M_un(S) is very easy to verify. GP’s
logic thus incorporates Stalnaker’s logic, and in this sense may be called
Stalnakerian.
On the other hand, Halpern’s logic, by allowing causal models with multiple
solutions, obviously invalidates (Sc). It may be tempting to view the difference
between Halpern and GP as parallel to the difference between Lewis and
Stalnaker, and to expect Halpern’s logic to be Lewisian. This expectation is, in a
way, obviously false. Since Halpern also allows models with no solution under
some intervention, counter-models to (VC2c) are easy to find. In any model
M such that M_{X=x} has no solution for some X = x and relative to some u,
[X = x](X(u) ≠ x) is trivially true, but for Y = V and a value configuration
y consistent with X = x, [Y = y](X(u) ≠ x) is false, since the unique solution to
M_{Y=y} is y itself, in which X = x. The rationale behind (VC2)
in Lewis’s theory is that only an antecedent that is not possible or entertainable
counterfactually implies its negation. This rationale is violated in causal models
with no solution: when M_{X=x} has no solution, X = x is still possible (in the sense
that it is part of a solution to some other intervention) but counterfactually implies
X ≠ x.
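The no-solution model introduced earlier (X = U; Y = Z; Z = ¬Y; W = Y ∨ Z) yields such a counter-model. The sketch below evaluates one instance of (VC2c) relative to U = 1. It takes X = 1 as the antecedent and chooses y for Y = V as W = 0, X = 1, Y = 0, Z = 0, which is one configuration consistent with X = 1 picked for illustration. As before, helper names are hypothetical and variables are assumed binary.

```python
from itertools import product


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """M_{X=x}: constant equations for the intervened variables, others unchanged."""
    modified = dict(equations)
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val
    return modified


def would(equations, setting, phi, u):
    """[X = x]phi(u): phi holds in every solution to M_{X=x} relative to u."""
    return all(phi(s) for s in solutions(intervene(equations, setting), u))


NO_SOLUTION = {"X": lambda s: s["U"], "Y": lambda s: s["Z"],
               "Z": lambda s: 1 - s["Y"], "W": lambda s: s["Y"] | s["Z"]}
u = {"U": 1}
not_x1 = lambda s: s["X"] != 1

# Antecedent of the (VC2c) instance, [X = 1](X(u) != 1): vacuously true,
# because the Y/Z loop leaves M_{X=1} without any solution.
antecedent = would(NO_SOLUTION, {"X": 1}, not_x1, u)
# Consequent with Y = V: every endogenous variable is forced, so y is the unique solution.
y = {"W": 0, "X": 1, "Y": 0, "Z": 0}
consequent = would(NO_SOLUTION, y, not_x1, u)
print(antecedent, consequent, (not antecedent) or consequent)  # True False False
```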
Therefore, causal models that have no solution under some intervention are
not Lewisian. Suppose we disallow such models and consider only (and all)
those causal models that have at least one solution under any intervention. Is the
corresponding logic Lewisian? The answer is still no, though it is less obvious.
In fact, (VC3c) and (VC5c) are still not valid. I will present a counter-model
to (VC3c), the diagnosis and treatment of which will also take care of
(VC5c).
Consider the signature ⟨{}, {X, Y, Z, W}, R⟩, where R(X) =
R(Y) = R(Z) = R(W) = {0, 1} (i.e., all variables are binary), and a model
M over the signature with the following structural equations: X = Y ∧ Z;
Y = X ∧ W; Z = ¬W; W = ¬Z. It is easy to check that for any X ⊆ {X, Y, Z,
W} and any value configuration x of X, M_{X=x} has a solution. In particular, M_{X=1} has
two solutions: (X = 1, Y = 1, Z = 0, W = 1) and (X = 1, Y = 0, Z = 1, W = 0).
(Since there are no exogenous variables, the relativization to u is dropped.)
Thus [X = 1](Y ≠ 1) is false, and [X = 1](Y = 1 ⊃ W = 1) is true. Now
M_{X=1∧Y=1} also has two solutions: (X = 1, Y = 1, Z = 0, W = 1) and (X = 1,
Y = 1, Z = 1, W = 0). Thus [X = 1 ∧ Y = 1](W = 1) is false. Hence,
[X = 1 ∧ Y = 1](W = 1) ≡ [X = 1](Y = 1 ⊃ W = 1) is false. Therefore, since both of
its disjuncts are false, an instance of (VC3c) is false in this model.
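The counter-model is small enough to verify exhaustively. The sketch below makes the same assumptions as before, with binary variables and hypothetical helper names. It first confirms that every intervention leaves at least one solution. It then evaluates the three counterfactuals in the (VC3c) instance. The consequents are Python predicates, and the material conditional Y = 1 ⊃ W = 1 is written as `Y != 1 or W == 1`.

```python
from itertools import combinations, product

# Counter-model to (VC3c): X = Y ∧ Z; Y = X ∧ W; Z = ¬W; W = ¬Z (no exogenous variables).
MODEL = {
    "X": lambda s: s["Y"] & s["Z"],
    "Y": lambda s: s["X"] & s["W"],
    "Z": lambda s: 1 - s["W"],
    "W": lambda s: 1 - s["Z"],
}


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """M_{X=x}: constant equations for the intervened variables, others unchanged."""
    modified = dict(equations)
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val
    return modified


def would(equations, setting, phi, u):
    """[X = x]phi(u): phi holds in every solution to M_{X=x} relative to u."""
    return all(phi(s) for s in solutions(intervene(equations, setting), u))


def interventions(endo):
    """Every X = x with X a (possibly empty) subset of the endogenous variables."""
    for k in range(len(endo) + 1):
        for chosen in combinations(endo, k):
            for values in product((0, 1), repeat=k):
                yield dict(zip(chosen, values))


u = {}  # U is empty, so relativizing to u is trivial
# Every intervention leaves at least one solution.
print(all(solutions(intervene(MODEL, x), u) for x in interventions(sorted(MODEL))))  # True

first = would(MODEL, {"X": 1}, lambda s: s["Y"] != 1, u)                  # [X=1](Y≠1)
lhs = would(MODEL, {"X": 1, "Y": 1}, lambda s: s["W"] == 1, u)            # [X=1∧Y=1](W=1)
rhs = would(MODEL, {"X": 1}, lambda s: s["Y"] != 1 or s["W"] == 1, u)     # [X=1](Y=1⊃W=1)
print(first, lhs, rhs, first or (lhs == rhs))  # False False True False
```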
The validity of (VC3) in Lewis’s logic is due to the constraint that (S4) places
on selection functions. In that light, the failure of (VC3c) here is due to the
following circumstance: M_{X=1} has a solution in which Y = 1, but not every
solution to M_{X=1∧Y=1} is a solution to M_{X=1} (the solution (X = 1, Y = 1, Z = 1, W = 0)
violates the equation Y = X ∧ W of M_{X=1}). If we consider the natural “selection
function” that, for each antecedent X = x, selects the set of solutions to M_{X=x} as
the set of closest worlds, we can see that condition (S4) is violated due to that
circumstance.
This diagnosis suggests a restriction on causal models that yields a Lewisian logic
of causal counterfactuals, to which I now turn.
5 A Lewisian Logic of Causal Counterfactuals
5.1 The Class of Lewisian Causal Models
The restriction I will impose is the following condition:
Definition 1 [solution-conservative] A causal model M = ⟨U, V, R, F⟩ is called
solution-conservative if for every X ⊆ V, Y ∈ V\X, and every value configuration
x of X, y of Y, and u of U, if M_{X=x} has a solution relative to u consistent with Y = y,
then every solution to M_{X=x∧Y=y} relative to u is also a solution to M_{X=x} relative to u.
It should be clear that this condition is motivated by condition (S4) for selection
functions. I name the condition “solution-conservative” because it requires that,
compared to the solutions to M_{X=x}, no new solution should emerge for M_{X=x∧Y=y},
unless no solution to M_{X=x} is consistent with Y = y.
Another restriction mentioned in the previous section is that for every
X ⊆ V, every value configuration x of X, and u of U, M_{X=x} has at least one solution
relative to u. I shall call models that satisfy this condition solution-ful. I now show
that the class of causal models that validate (VC1c)–(VC5c) is precisely the class of
models that are both solution-ful and solution-conservative.
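For small binary models, both conditions can be checked by brute force. The sketch below is illustrative only, and its function and model names are hypothetical. It follows Definition 1 literally, with a single extra variable Y outside X. It classifies three models from the text: the counter-model to (VC3c), which is solution-ful but not solution-conservative; the two-variable model X = Y; Y = X discussed after Theorem 3, which satisfies both; and the no-solution model, which is not solution-ful.

```python
from itertools import combinations, product


def solutions(equations, u):
    """Brute-force all endogenous assignments that satisfy every equation, given u."""
    endo = sorted(equations)
    found = []
    for values in product((0, 1), repeat=len(endo)):
        s = {**u, **dict(zip(endo, values))}
        if all(s[var] == f(s) for var, f in equations.items()):
            found.append(s)
    return found


def intervene(equations, setting):
    """M_{X=x}: constant equations for the intervened variables, others unchanged."""
    modified = dict(equations)
    for var, val in setting.items():
        modified[var] = lambda s, val=val: val
    return modified


def assignments(variables):
    """Every value configuration of `variables` (binary domains assumed)."""
    for values in product((0, 1), repeat=len(variables)):
        yield dict(zip(variables, values))


def interventions(endo):
    """Every antecedent X = x, X ranging over subsets of the endogenous variables."""
    for k in range(len(endo) + 1):
        for chosen in combinations(endo, k):
            yield from assignments(chosen)


def is_solution_ful(equations, exo):
    """Every M_{X=x} has at least one solution relative to every u."""
    return all(solutions(intervene(equations, x), u)
               for u in assignments(exo)
               for x in interventions(sorted(equations)))


def is_solution_conservative(equations, exo):
    """Definition 1, checked over every X, single Y outside X, x, y and u."""
    endo = sorted(equations)
    for u in assignments(exo):
        for x in interventions(endo):
            base = solutions(intervene(equations, x), u)
            for var in endo:
                if var in x:
                    continue
                for val in (0, 1):
                    if not any(s[var] == val for s in base):
                        continue  # no solution to M_{X=x} is consistent with Y = y
                    extended = solutions(intervene(equations, {**x, var: val}), u)
                    if any(s not in base for s in extended):
                        return False  # a new solution emerged
    return True


VC3C_COUNTER_MODEL = {"X": lambda s: s["Y"] & s["Z"], "Y": lambda s: s["X"] & s["W"],
                      "Z": lambda s: 1 - s["W"], "W": lambda s: 1 - s["Z"]}
MUTUAL = {"X": lambda s: s["Y"], "Y": lambda s: s["X"]}
NO_SOLUTION = {"X": lambda s: s["U"], "Y": lambda s: s["Z"],
               "Z": lambda s: 1 - s["Y"], "W": lambda s: s["Y"] | s["Z"]}

print(is_solution_ful(VC3C_COUNTER_MODEL, []),
      is_solution_conservative(VC3C_COUNTER_MODEL, []))              # True False
print(is_solution_ful(MUTUAL, []), is_solution_conservative(MUTUAL, []))  # True True
print(is_solution_ful(NO_SOLUTION, ["U"]))                           # False
```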
For that purpose, we need the following lemma, showing that Definition 1 is
equivalent to a seemingly more general version.
Lemma 2 If a causal model M = ⟨U, V, R, F⟩ is solution-conservative, then for
all disjoint sets X, Y ⊆ V, and every value configuration x of X, y of Y, and u of
U, if M_{X=x} has a solution relative to u consistent with Y = y, then every solution to
M_{X=x∧Y=y} relative to u is also a solution to M_{X=x} relative to u.
Proof We do induction on |Y|, the number of variables in Y. The statement is
trivial when |Y| = 0. When |Y| = 1, the statement is true by Definition 1. Suppose
the statement is true for |Y| = k. Consider the case where Y = Y′ ∪ {Y} such that
|Y′| = k and Y ∉ Y′. Suppose M_{X=x} has a solution relative to u consistent with Y = y,
where y = y′ ∪ {y}. This solution of course is consistent with Y′ = y′. By the
induction hypothesis, every solution to M_{X=x∧Y′=y′} relative to u is also a solution to
M_{X=x} relative to u. Note also that any solution to M_{X=x} relative to u consistent with
Y = y is also a solution to M_{X=x∧Y′=y′} relative to u, because for variables not in Y′,
M_{X=x∧Y′=y′} has the exact same equations as M_{X=x}. Thus there is a solution to
M_{X=x∧Y′=y′} relative to u consistent with Y = y. By Definition 1, every solution to
M_{X=x∧Y′=y′∧Y=y}, which is just M_{X=x∧Y=y}, relative to u is also a solution to
M_{X=x∧Y′=y′}, and hence, by the induction hypothesis, a solution to M_{X=x}. Therefore,
every solution to M_{X=x∧Y=y} relative to u is also a solution to M_{X=x} relative to u. Q.E.D.
Theorem 3 Let M be any causal model over a signature S = ⟨U, V, R⟩. All
instances of (VC1c)–(VC5c) in L_CC(S) are true in M if and only if M is solution-ful
and solution-conservative.
Proof (If) Suppose M is solution-ful and solution-conservative. We show that all
instances of (VC1c)–(VC5c) are true in M.
The case of (VC1c) is trivial. (Indeed, (VC1c) is valid with respect to M_all(S).)
The case of (VC2c) is also trivial given that M is solution-ful: M_{X=x} has at least one
solution relative to u, and every solution to M_{X=x} satisfies X = x, so the antecedent
[X = x](X(u) ≠ x) of (VC2c) is false.
For (VC3c), it is equivalent to show that for all disjoint X, Y ⊆ V, every value
configuration x of X, y of Y, and u of U, and every φ, if [X = x](Y(u) ≠ y) is false in
M, then [X = x ∧ Y = y]φ(u) ≡ [X = x](Y(u) = y ⊃ φ(u)) is true in M. Suppose
[X = x](Y(u) ≠ y) is false in M. This means that some solution to M_{X=x} relative to
u is consistent with Y = y. It follows from Lemma 2 that every solution to M_{X=x∧Y=y}
relative to u is also a solution to M_{X=x} relative to u. On the other hand, every solution to
M_{X=x} relative to u that is consistent with Y = y is also a solution to M_{X=x∧Y=y} relative
to u, simply because for variables not in Y, M_{X=x∧Y=y} has the exact same equations as
M_{X=x}. Thus, φ is satisfied in every solution to M_{X=x∧Y=y} relative to u if and only if φ is
satisfied in every solution to M_{X=x} relative to u that is consistent with Y = y. Therefore,
[X = x ∧ Y = y]φ(u) ≡ [X = x](Y(u) = y ⊃ φ(u)) is true in M.
For (VC4c), suppose [X = x]u(u) is true in M, which means that every solution
to MX=x relative to u satisfies u. Note that every solution to M relative to
u consistent with X = x is also a solution to MX=x relative to u, because for
variables not in X, MX=x has the exact same equations as M. Hence every solution to
M relative to u is either inconsistent with X = x or satisfies u, which means that
[true](X(u) = x . u(u)) is true in M. (Notice that no constraint on causal models is
invoked in the argument: (VC4c) is also valid with respect to Mall(S).)
For (VC5c), suppose [true](X(u) = x ∧ φ(u)) is true in M, which means that every solution to M relative to u satisfies X = x and φ. Since M is solution-ful, there is indeed a solution to M relative to u, and it is therefore consistent with X = x. Since M is also solution-conservative, by Lemma 2, every solution to M_{X=x} relative to u is also a solution to M relative to u, and hence satisfies φ. Therefore, [X = x]φ(u) is also true in M.
(Only if) As already shown in the previous section, if M is not solution-ful, then some instance of (VC2c) is false in M. On the other hand, if M is not solution-conservative, then there exist X ⊆ V, Y ∈ V\X, and some configuration x of X, y of Y, and u of U such that (1) M_{X=x} has a solution relative to u consistent with Y = y, but (2) some solution to M_{X=x∧Y=y} relative to u is not a solution to M_{X=x} relative to u. (1) implies that [X = x](Y(u) ≠ y) is false in M. Let V = v1, …, V = vn be all the solutions to M_{X=x} relative to u that are consistent with Y = y, and let φ(u) be V(u) = v1 ∨ … ∨ V(u) = vn. Then [X = x](Y(u) = y ⊃ φ(u)) is true in M. However, (2) implies that [X = x ∧ Y = y]φ(u) is false in M. Thus [X = x ∧ Y = y]φ(u) ≡ [X = x](Y(u) = y ⊃ φ(u)) is false in M. Hence an instance of (VC3c) is false in M. Q.E.D.
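The (Only if) construction can be replayed mechanically on a small finite model. The following is a brute-force sketch under stated assumptions: finite ranges, no exogenous variables (so there is a single context u), and a hypothetical model W = A, A = B, B = A, which is not solution-conservative because setting W to 0 admits the extra solution A = B = 1 that M itself lacks. The helper names `solutions` and `box` are introduced only for illustration. Taking X to be empty, Y to be W and y = 0, the code builds φ as the disjunction of the solutions of M consistent with W = 0 and evaluates both sides of the (VC3c) biconditional.

```python
from itertools import product

# Hypothetical model chosen to violate solution-conservativeness:
# W = A, A = B, B = A, with range {0, 1} for every variable and no exogenous variables.
RANGES = {"W": (0, 1), "A": (0, 1), "B": (0, 1)}
EQUATIONS = {"W": lambda v: v["A"], "A": lambda v: v["B"], "B": lambda v: v["A"]}


def solutions(intervention):
    """Brute-force all assignments satisfying the submodel M_{intervention}.

    Intervened variables are held at their set values; every other variable
    must agree with its structural equation.
    """
    names = sorted(RANGES)
    found = []
    for values in product(*(RANGES[n] for n in names)):
        v = dict(zip(names, values))
        if all((v[n] == intervention[n]) if n in intervention else (v[n] == EQUATIONS[n](v))
               for n in names):
            found.append(v)
    return found


def box(intervention, phi):
    """[X = x]phi: phi holds in every solution of M_{X=x} (vacuously true if none)."""
    return all(phi(s) for s in solutions(intervention))


# Condition (1): [true](W != 0) is false, i.e. some solution of M has W = 0.
condition = not box({}, lambda s: s["W"] != 0)

# phi is the disjunction of the solutions of M that are consistent with W = 0.
allowed = [s for s in solutions({}) if s["W"] == 0]


def phi(s):
    return s in allowed


lhs = box({"W": 0}, phi)                        # [W = 0]phi
rhs = box({}, lambda s: s["W"] != 0 or phi(s))  # [true](W = 0 implies phi)
print(condition, lhs, rhs)
```

Under these assumptions the script prints `True False True`: the antecedent condition of (VC3c) holds, yet the two sides of the biconditional come apart, exactly as the proof predicts for a model that is not solution-conservative.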
Let Msfsc(S) be the class of causal models over S that are both solution-ful and solution-conservative. It follows from Theorem 3 that Msfsc(S) is the largest class of causal models over S with respect to which Lewis’s principles are valid, and so the logic with respect to this class is the weakest Lewisian logic of causal counterfactuals. Obviously, GP’s class Mun(S) is contained in Msfsc(S). The containment is proper even for a signature as simple as ⟨{}, {X, Y}, R⟩ with R(X) = R(Y) = {0, 1}. Over this signature we can define a model with the following equations: X = Y; Y = X. It is easy to check that this model belongs
to Msfsc(S), though it does not belong to Mun(S): without any intervention it has two solutions, X = Y = 0 and X = Y = 1. Halpern (2000, p. 320) provided
some justification for going beyond Mun(S). I suspect that, besides the formal
consideration given here, there may also be substantive reasons to stop at Msfsc(S).
Perhaps most models of real interest are solution-ful and solution-conservative, but
an investigation into this matter has to await another occasion.
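The claims about the model X = Y; Y = X can be checked by brute force. This is a minimal sketch, assuming finite ranges and no exogenous variables, as in the signature above; the helper names are hypothetical. It encodes solution-conservativeness in the single-variable form used in the proofs of Theorem 3 and Lemma 5 (if M_{X=x} has a solution consistent with Y = y for some Y ∉ X, then every solution of M_{X=x∧Y=y} is a solution of M_{X=x}), and it takes membership in Mun(S) to mean that every submodel has exactly one solution.

```python
from itertools import combinations, product

# The model X = Y; Y = X over R(X) = R(Y) = {0, 1}, with no exogenous variables.
RANGES = {"X": (0, 1), "Y": (0, 1)}
EQUATIONS = {"X": lambda v: v["Y"], "Y": lambda v: v["X"]}
NAMES = sorted(RANGES)


def solutions(intervention):
    """All assignments satisfying M_{intervention}; intervened variables are held fixed."""
    found = []
    for values in product(*(RANGES[n] for n in NAMES)):
        v = dict(zip(NAMES, values))
        if all((v[n] == intervention[n]) if n in intervention else (v[n] == EQUATIONS[n](v))
               for n in NAMES):
            found.append(v)
    return found


def interventions():
    """Every intervention X = x, including the empty one (no variable set)."""
    for k in range(len(NAMES) + 1):
        for subset in combinations(NAMES, k):
            for values in product(*(RANGES[n] for n in subset)):
                yield dict(zip(subset, values))


def solution_ful():
    """Every submodel M_{X=x} has at least one solution."""
    return all(solutions(i) for i in interventions())


def solution_conservative():
    """If M_{X=x} has a solution with Y = y (Y not in X), every solution of
    M_{X=x, Y=y} must also be a solution of M_{X=x}."""
    for i in interventions():
        base = solutions(i)
        for y_var in NAMES:
            if y_var in i:
                continue
            for y in RANGES[y_var]:
                if any(s[y_var] == y for s in base):
                    if any(s not in base for s in solutions({**i, y_var: y})):
                        return False
    return True


def unique_solutions():
    """Every submodel has exactly one solution (the uniqueness assumed here for Mun(S))."""
    return all(len(solutions(i)) == 1 for i in interventions())


print(solution_ful(), solution_conservative(), unique_solutions())
```

Under these assumptions it prints `True True False`. Exhaustive enumeration is only feasible for tiny signatures, but it is enough to confirm a containment claim on a concrete counterexample.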
Axiomatization
The logic with respect to Msfsc(S) is of course a (proper) extension of Halpern’s
logic. Based on Halpern’s elegant axiomatization of his logic, I now develop a
sound and complete system of the logic with respect to Msfsc(S).
To state Halpern’s axioms, it is convenient to use a defined operator ⟨ ⟩:
$$\langle X = x\rangle\varphi \;=_{\mathrm{df}}\; \neg[X = x]\neg\varphi$$
Clearly ⟨X = x⟩φ corresponds to a ‘might’ counterfactual: if X were intervened on to take value x, it might be the case that φ.
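The duality between [ ] and ⟨ ⟩ can be made concrete with a small evaluator. This sketch follows the reading used in the proof of Theorem 3, where [X = x]φ is true when φ holds in every solution of M_{X=x}, so that ⟨X = x⟩φ comes out as "φ holds in some solution". It further assumes finite ranges and no exogenous variables; the model X = Y; Y = X and the helper names are illustrative only.

```python
from itertools import product

# Illustrative model X = Y; Y = X over R(X) = R(Y) = {0, 1}, no exogenous variables.
RANGES = {"X": (0, 1), "Y": (0, 1)}
EQUATIONS = {"X": lambda v: v["Y"], "Y": lambda v: v["X"]}


def solutions(intervention):
    """All assignments satisfying M_{intervention}; intervened variables are held fixed."""
    names = sorted(RANGES)
    found = []
    for values in product(*(RANGES[n] for n in names)):
        v = dict(zip(names, values))
        if all((v[n] == intervention[n]) if n in intervention else (v[n] == EQUATIONS[n](v))
               for n in names):
            found.append(v)
    return found


def box(intervention, phi):
    """[X = x]phi: phi holds in every solution of M_{X=x}."""
    return all(phi(s) for s in solutions(intervention))


def diamond(intervention, phi):
    """<X = x>phi, defined as not [X = x] not phi."""
    return not box(intervention, lambda s: not phi(s))


def x_is_1(s):
    return s["X"] == 1


print(box({}, x_is_1), diamond({}, x_is_1), box({"Y": 1}, x_is_1))
```

Under these assumptions it prints `False True True`: without intervention X = 1 might hold but need not, since the model has two solutions, whereas after setting Y to 1 it would hold. If a submodel had no solution, `box` would be vacuously true and `diamond` false.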
For the logic with respect to Mall(S) over S = ⟨U, V, R⟩, Halpern’s axiomatic
system is given by the rule of modus ponens and a number of axiom schemata. I will
list his schemata in a slightly different order to facilitate subsequent discussions.
H0. All instances of truth-functional tautologies
H1. [X = x](Y(u) = y ⊃ Y(u) ≠ y′), where y ≠ y′
H2. [X = x](⋁_{y∈R(Y)} Y(u) = y)
H3. ⋁_{y∈R(Y)} ⟨X = x⟩(Y(u) = y) ∧ ⋁_{y∈R(Y)} [X = x](Y(u) = y), where X = V\{Y}¹⁰
H4. ⟨X = x⟩(φ1(u1) ∧ … ∧ φk(uk)) ≡ (⟨X = x⟩φ1(u1) ∧ … ∧ ⟨X = x⟩φk(uk)), if ui ≠ uj for all i ≠ j
H5. ([X = x]φ ∧ [X = x](φ ⊃ ψ)) ⊃ [X = x]ψ
H6. [X = x]φ, if φ is a truth-functional tautology
H7. [X = x ∧ Y = y](Y(u) = y)
H8. ⟨X = x⟩(W(u) = w ∧ Y(u) = y) ⊃ ⟨X = x ∧ W = w⟩(Y(u) = y)
H9. (⟨X = x ∧ Y = y⟩(W(u) = w ∧ Z(u) = z) ∧ ⟨X = x ∧ W = w⟩(Y(u) = y ∧ Z(u) = z)) ⊃ ⟨X = x⟩(W(u) = w ∧ Y(u) = y ∧ Z(u) = z), where W and Y are distinct and Z = V\(X ∪ {W, Y})
The schemata H1-H4 are in effect a description of the general setup of causal
models. H1 expresses the idea that different values of a variable are mutually
exclusive. H2 expresses the idea that the values in R(Y) are exhaustive for every
Y. H3 expresses the idea that for each endogenous variable Y, there is a function that
maps each value configuration of other variables in the model to a value of Y, or in
other words, if we set the value for every other variable, there is one and only one
value (solution) for Y. H4 expresses the idea that different value configurations of
the exogenous variables correspond to different background conditions and can be
considered separately.
10 Halpern (2000, p. 326) used a slightly different but equivalent expression in the first conjunct of this
schema (his D9).
H5 and H6 are adapted from familiar principles in modal logic. The remaining
schemata H7-H9 are key principles for causal counterfactuals. H7 is known as the
principle of effectiveness and is clearly a version of (VC1c). H8 is a version of the
principle known as composition. In GP’s logic, the principle can be formulated in terms of [ ] rather than ⟨ ⟩, and the [ ] version is clearly related to (VC5c). In Halpern’s logic, however, the [ ] version is not valid.¹¹ Just as (VC5c) is valid with respect to Msfsc(S), the [ ] version of composition is valid with respect to Msfsc(S), and will be a theorem in our system.
The schema H9 is a version of the principle known as reversibility. This principle
is not validated by the Stalnaker–Lewis semantics, and reveals the extra constraint
imposed by the causal semantics on the logic of counterfactuals. In GP’s logic, this
principle can be formulated in a much simpler way (without involving Z), but the
simplification is not valid in Halpern’s logic. As we will see, however, the
simplification is available in the system I am about to present.
For the logic with respect to Msfsc(S), we need only add two axiom schemata to Halpern’s system, one expressing that models are solution-ful (SF) and the other expressing that models are solution-conservative (SC).
SF ⋁_{y∈R(Y)} ⟨X = x⟩(Y(u) = y)
SC (⟨X = x⟩(W(u) = w) ∧ ⟨X = x ∧ W = w⟩(Y(u) = y)) ⊃ ⟨X = x⟩(W(u) = w ∧ Y(u) = y)
For convenience I will call Halpern’s system ALL, and call the system
ALL + SF + SC the system SFSC. Given Halpern’s proof that the system ALL is
sound and complete (for the language LCC(S)) with respect to Mall(S), it is quite
easy to establish the soundness and completeness of SFSC with respect to Msfsc(S),
with the help of the following two lemmas.
Lemma 4 Let M be any causal model over S = ⟨U, V, R⟩. All instances of SF in LCC(S) are true in M if and only if M is solution-ful.
Proof [If] Suppose M is solution-ful. Then for every X ⊆ V and every value configuration x of X and u of U, M_{X=x} has a solution relative to u. So for every Y ∈ V, there is some y ∈ R(Y) such that ⟨X = x⟩(Y(u) = y) is true in M. Hence ⋁_{y∈R(Y)} ⟨X = x⟩(Y(u) = y) is true in M.
[Only if] Suppose M is not solution-ful. Then there exist X ⊆ V, a value configuration x of X and a value configuration u of U such that M_{X=x} has no solution relative to u. That means that for every Y ∈ V and every y ∈ R(Y), ⟨X = x⟩(Y(u) = y) is false in M. Hence ⋁_{y∈R(Y)} ⟨X = x⟩(Y(u) = y) is false in M. Q.E.D.
Lemma 5 Let M be any causal model over S = ⟨U, V, R⟩. All instances of SC in LCC(S) are true in M if and only if M is solution-conservative.
Proof (If) Suppose M is solution-conservative. We show that SC is true in M for every X, Y ⊆ V, W ∈ V\X, and every value configuration x of X, y of Y, w of W, and u of U. Suppose ⟨X = x⟩(W(u) = w) ∧ ⟨X = x ∧ W = w⟩(Y(u) = y) is true in M. That means M_{X=x} has a solution relative to u in which W = w, and M_{X=x∧W=w} has a solution relative to u in which Y = y. Since M is solution-conservative, this solution to M_{X=x∧W=w} is also a solution to M_{X=x}. Hence there is a solution to M_{X=x} relative to u in which W = w and Y = y. So ⟨X = x⟩(W(u) = w ∧ Y(u) = y) is also true in M. Therefore, all instances of SC are true in M.
11 Halpern (2000, p. 326) seemed to remark in passing that the [ ] version of composition is also valid in his logic, which, if I understood his remark correctly, was a mistake.
(Only if) Suppose M is not solution-conservative. That means there exist X ⊆ V, W ∈ V\X, and some value configuration x of X, w of W, and u of U, such that M_{X=x} has a solution relative to u in which W = w, which implies that ⟨X = x⟩(W(u) = w) is true in M, but some solution to M_{X=x∧W=w} relative to u is not a solution to M_{X=x}. Let V = v be a solution to M_{X=x∧W=w} relative to u which is not a solution to M_{X=x}. Then ⟨X = x ∧ W = w⟩(V(u) = v) is true in M, but ⟨X = x⟩(W(u) = w ∧ V(u) = v) is false in M. We thus have an instance of SC (with Y = V and y = v) that is false in M. Q.E.D.
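The recipe in the (Only if) direction can be followed step by step on a small example. The sketch below assumes finite ranges and no exogenous variables; the model W = A, A = B, B = A and the helper names are hypothetical. Here X is empty, w = 0, and V = v is picked as a solution of M_{W=0} that is not a solution of M, after which the corresponding instance of SC is evaluated.

```python
from itertools import product

# Hypothetical model that is not solution-conservative: W = A, A = B, B = A,
# with range {0, 1} for every variable and no exogenous variables.
RANGES = {"W": (0, 1), "A": (0, 1), "B": (0, 1)}
EQUATIONS = {"W": lambda v: v["A"], "A": lambda v: v["B"], "B": lambda v: v["A"]}


def solutions(intervention):
    """All assignments satisfying M_{intervention}; intervened variables are held fixed."""
    names = sorted(RANGES)
    found = []
    for values in product(*(RANGES[n] for n in names)):
        v = dict(zip(names, values))
        if all((v[n] == intervention[n]) if n in intervention else (v[n] == EQUATIONS[n](v))
               for n in names):
            found.append(v)
    return found


def diamond(intervention, phi):
    """<X = x>phi: phi holds in some solution of M_{X=x}."""
    return any(phi(s) for s in solutions(intervention))


base = solutions({})  # solutions of M itself (X empty)
# Pick a solution V = v of M_{W=0} that is not a solution of M.
v = next(s for s in solutions({"W": 0}) if s not in base)


def w_is_0(s):
    return s["W"] == 0


def is_v(s):
    return s == v


# SC instance: (<true>(W = 0) and <W = 0>(V = v)) implies <true>(W = 0 and V = v).
antecedent = diamond({}, w_is_0) and diamond({"W": 0}, is_v)
consequent = diamond({}, lambda s: w_is_0(s) and is_v(s))
print(v, antecedent, consequent)
```

Under these assumptions it prints `{'A': 1, 'B': 1, 'W': 0} True False`, so this instance of SC fails, mirroring the last step of the proof.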
I now build on Halpern’s proof (of Theorem 3.3 in Halpern 2000) to show that
the system SFSC is sound and complete with respect to Msfsc(S). Since Halpern’s
completeness proof employs the familiar method of canonical models, I follow the
standard strategy to extend his proof (for a textbook demonstration of this strategy,
see e.g., Hughes and Cresswell 1996, ch. 6).
Theorem 6 Let S = ⟨U, V, R⟩ be any signature. The system SFSC for the language LCC(S) is sound and complete with respect to Msfsc(S).
Proof Halpern has proved that the system ALL is sound and complete with respect to Mall(S) (Halpern 2000, Theorem 3.3). The soundness of SFSC = ALL + SF + SC immediately follows from his soundness result, Lemma 4, and Lemma 5.
For completeness, I will build on Halpern’s completeness proof, which employs the method of canonical models. The idea is that for any formula φ in LCC(S) consistent with the system SFSC, {φ} can be extended to a maximally SFSC-consistent set of formulas Γ. From Γ we can construct a causal model M over S such that for every formula ψ in LCC(S), ψ ∈ Γ if and only if ψ is true in M. Halpern’s proof of this fact for the system ALL carries over with no change to the system SFSC. What remains to be shown is just that the constructed model M belongs to the desired class, i.e., that M is solution-ful and solution-conservative. But that follows from Lemmas 4 and 5: since all instances of SF and SC are theorems of SFSC, they must belong to Γ, and hence are true in M, which, by Lemmas 4 and 5, implies that M is solution-ful and solution-conservative. Thus every formula consistent with the system SFSC is satisfiable in some causal model in Msfsc(S). Q.E.D.
As already mentioned, for the system SFSC, the axiom schema H9 can be
replaced by a simpler and more elegant form of reversibility.
H9* (⟨X = x ∧ Y = y⟩(W(u) = w) ∧ ⟨X = x ∧ W = w⟩(Y(u) = y)) ⊃ ⟨X = x⟩(W(u) = w ∧ Y(u) = y)
To see this, first note that H9* is valid with respect to Msfsc(S). Here is a proof. For any X ⊆ V, any two distinct variables Y, W ∈ V\X, and any value configuration x of X, y of Y, w of W, and u of U, suppose ⟨X = x ∧ Y = y⟩(W(u) = w) ∧ ⟨X = x ∧ W = w⟩(Y(u) = y) is true in a model M in Msfsc(S). That means M_{X=x∧Y=y} has a solution relative to u consistent with W = w, and M_{X=x∧W=w} has a solution relative to u consistent with Y = y. Let V = v be a solution to M_{X=x∧Y=y} relative to u consistent with W = w. It is then also a solution to M_{X=x∧Y=y∧W=w} relative to u. Since M is solution-conservative and M_{X=x∧W=w} has a solution relative to u consistent with Y = y, it follows (by Lemma 2) that V = v is also a solution to M_{X=x∧W=w} relative to u. The fact that V = v is both a solution to M_{X=x∧Y=y} and a solution to M_{X=x∧W=w} relative to u implies that it is a solution to M_{X=x} relative to u, because every equation in M_{X=x} appears in M_{X=x∧Y=y} or M_{X=x∧W=w} (or both): the equation for W is retained in M_{X=x∧Y=y}, and the equation for Y in M_{X=x∧W=w}. Since V = v is consistent with both Y = y and W = w, ⟨X = x⟩(W(u) = w ∧ Y(u) = y) is also true in M.
Therefore, if we replace H9 with H9* in the system SFSC, the system is still
sound. To show that it is also complete, it suffices to derive H9 in the new system.
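Using H5, it is easy to derive that
$$\big(\langle X = x \wedge Y = y\rangle (W(u) = w \wedge Z(u) = z) \wedge \langle X = x \wedge W = w\rangle (Y(u) = y \wedge Z(u) = z)\big) \supset \big(\langle X = x \wedge Y = y\rangle (W(u) = w) \wedge \langle X = x \wedge W = w\rangle (Y(u) = y)\big) \qquad (1)$$
Combined with H9*, we can further derive
$$\big(\langle X = x \wedge Y = y\rangle (W(u) = w \wedge Z(u) = z) \wedge \langle X = x \wedge W = w\rangle (Y(u) = y \wedge Z(u) = z)\big) \supset \langle X = x\rangle (W(u) = w \wedge Y(u) = y) \qquad (2)$$
Moreover, it is an instance of H8 that
$$\langle X = x \wedge Y = y\rangle (W(u) = w \wedge Z(u) = z) \supset \langle X = x \wedge Y = y \wedge W = w\rangle (Z(u) = z) \qquad (3)$$
And from SC we can easily derive
$$\big(\langle X = x\rangle (W(u) = w \wedge Y(u) = y) \wedge \langle X = x \wedge Y = y \wedge W = w\rangle (Z(u) = z)\big) \supset \langle X = x\rangle (W(u) = w \wedge Y(u) = y \wedge Z(u) = z) \qquad (4)$$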
H9 obviously follows from (2), (3), and (4).
The [ ] version of reversibility, however, is not a theorem of SFSC. A simple counter-model to the [ ] version is the model with two variables and the structural equations X = Y; Y = X. In this model [Y = 1](X(u) = 1) and [X = 1](Y(u) = 1) are both true, but since without intervention the model has both solutions X = Y = 0 and X = Y = 1, neither [true](X(u) = 1) nor [true](Y(u) = 1) is true. The [ ] version is of course valid in GP’s logic. Just as there is no difference between ‘would’ counterfactuals and ‘might’ counterfactuals (for entertainable antecedents) in Stalnaker’s logic, there is no difference between [ ] and ⟨ ⟩ in GP’s logic.
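The counter-model can be checked by brute force. The sketch below assumes finite ranges and no exogenous variables, takes the intervened set X in H9* to be empty, and writes the [ ] version of reversibility by replacing each ⟨ ⟩ in H9* with [ ]; that exact form and the helper names are assumptions made for illustration. It evaluates both versions for every choice of values of the two variables.

```python
from itertools import product

# The two-variable model X = Y; Y = X over R(X) = R(Y) = {0, 1}, no exogenous variables.
RANGES = {"X": (0, 1), "Y": (0, 1)}
EQUATIONS = {"X": lambda v: v["Y"], "Y": lambda v: v["X"]}


def solutions(intervention):
    """All assignments satisfying M_{intervention}; intervened variables are held fixed."""
    names = sorted(RANGES)
    found = []
    for values in product(*(RANGES[n] for n in names)):
        v = dict(zip(names, values))
        if all((v[n] == intervention[n]) if n in intervention else (v[n] == EQUATIONS[n](v))
               for n in names):
            found.append(v)
    return found


def box(intervention, phi):
    """[X = x]phi: phi holds in every solution of M_{X=x}."""
    return all(phi(s) for s in solutions(intervention))


def diamond(intervention, phi):
    """<X = x>phi: phi holds in some solution of M_{X=x}."""
    return any(phi(s) for s in solutions(intervention))


def reversibility(modal, x_val, y_val):
    """(modal[Y=y](X=x) and modal[X=x](Y=y)) implies modal[true](X=x and Y=y)."""
    antecedent = (modal({"Y": y_val}, lambda s: s["X"] == x_val)
                  and modal({"X": x_val}, lambda s: s["Y"] == y_val))
    consequent = modal({}, lambda s: s["X"] == x_val and s["Y"] == y_val)
    return (not antecedent) or consequent


pairs = list(product((0, 1), repeat=2))
h9_star_holds = all(reversibility(diamond, a, b) for a, b in pairs)
box_version_holds = all(reversibility(box, a, b) for a, b in pairs)
print(h9_star_holds, box_version_holds)
```

Under these assumptions it prints `True False`: every checked instance of H9* holds, while the [ ] version fails whenever the two values agree.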
To reach GP’s logic, we need to add to SFSC a special case of (Sc): [X = x](Y(u) = y) ∨ [X = x](Y(u) ≠ y). GP’s system stands to SFSC in the same way as Stalnaker’s system stands to Lewis’s.
Conclusion
Modifiable structural equation models provide a powerful framework to model
causation and intervention, based on which a conception of counterfactual reasoning
as reasoning about consequences of interventions has been rigorously developed. In
this paper, I aimed to deepen our understanding of the connection between this
semantic framework based on causal models and the popular possible-world
semantics for counterfactuals. In particular, I have provided a precise character-
ization of the causal models that validate Lewis’s well-known logical principles for
counterfactuals. The characterization delineates a class of causal models that yields
the weakest Lewisian logic of causal counterfactuals. A sound and complete
axiomatization of this logic was also given.
The claim of being ‘‘weakest’’ is of course relative to the present definition of
causal models. That definition may be further generalized. In particular, as Halpern
(2000) and Pearl (2009) suggested, the requirement that f_X be a function for each
X may be relaxed. In other words, there may not be a unique solution for some
variable X even when all other variables in the model have been fixed. Halpern
conjectured that his results could be straightforwardly extended to the more general
setup. It also seems to me that my characterization of Lewisian causal models will
survive that generalization without essential change.
In addition, there are also different ways to use causal models to evaluate
counterfactuals. The way I have followed in this paper is natural and especially
influential, but there exists an alternative theory of counterfactuals formulated in a
similar causal model framework (Hiddleston 2005). It is not yet clear what logic of
counterfactuals results from the alternative theory.
The class of Lewisian causal models identified in this paper was motivated and
studied in a purely formal way, as my main purpose was to improve our
understanding of the formal connections between the two approaches to interpreting
counterfactuals. However, it seems probable to me that there may be other,
philosophical or practical, motivations for this class of models. At any rate, for
people who take both Lewis’s logic and the causal model approach seriously, there
is every reason to explore in more detail the philosophical implications and/or
justifications for the Lewisian restrictions on causal models.
Acknowledgments I thank Lam Wai Yin for helpful discussions on issues related to this article, and the
audiences of a seminar at Carnegie Mellon University for useful feedback. My research was supported in
part by the Research Grants Council of Hong Kong under the General Research Fund LU341910.
References
Ellis, B., Jackson, F., & Pargetter, R. (1977). An objection to possible-world semantics for counterfactual
logics. Journal of Philosophical Logic, 6, 355–357.
Fisher, F. M. (1970). A correspondence principle for simultaneous equation models. Econometrica, 38,
73–92.
Galles, D., & Pearl, J. (1998). An axiomatic characterization of causal counterfactuals. Foundations of Science, 3, 151–182.
Ginsberg, M. L. (1986). Counterfactuals. Artificial Intelligence, 30, 35–79.
Ginsberg, M. L., & Smith, D. E. (1987). Reasoning about action I: A possible worlds approach. In F.
M. Brown (Ed.), The frame problem in artificial intelligence (pp. 233–258). Los Altos, CA: Morgan
Kaufmann.
Halpern, J. Y. (2000). Axiomatizing causal reasoning. Journal of Artificial Intelligence Research, 12,
317–337.
Harel, D. (1979). First-order dynamic logic. Berlin & New York: Springer.
Hiddleston, E. (2005). A causal theory of counterfactuals. Noûs, 39, 632–657.
Hughes, G. E., & Cresswell, M. J. (1996). A new introduction to modal logic. London & New York:
Routledge.
Lewis, D. (1973). Counterfactuals. Oxford: Blackwell.
Lewis, D. (1977). Possible-world semantics for counterfactual logics: A rejoinder. Journal of Philosophical Logic, 6, 359–363.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–710.
Pearl, J. (1998). Graphs, causality, and structural equation models. Sociological Methods and Research, 27, 226–284.
Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge, UK: Cambridge
University Press.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). Cambridge,
MA: MIT Press.
Stalnaker, R. (1968). A theory of conditionals. In N. Rescher (Ed.), Studies in logical theory (pp. 98–112).
Oxford: Blackwell.
Stalnaker, R., & Thomason, R. H. (1970). A semantic analysis of conditional logic. Theoria, 36, 23–42.
Strotz, R. H., & Wold, H. O. A. (1960). Recursive versus nonrecursive systems: An attempt at synthesis.
Econometrica, 28, 417–427.
Winslett, M. (1988). Reasoning about action using a possible worlds approach. In Proceedings of the Seventh American Association of Artificial Intelligence Conference (pp. 89–93).
Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford & New York:
Oxford University Press.
Woodward, J., & Hitchcock, C. (2003). Explanatory generalizations, part I: A counterfactual account.
Noûs, 37, 1–24.