A Lewisian Logic of Causal Counterfactuals

Jiji Zhang

Received: 28 May 2011 / Accepted: 2 November 2011

© Springer Science+Business Media B.V. 2011

Abstract In the artificial intelligence literature, a promising approach to counterfactual reasoning is to interpret counterfactual conditionals based on causal models. Different logics of such causal counterfactuals have been developed with respect to different classes of causal models. In this paper I characterize the class of causal models that are Lewisian in the sense that they validate the principles in Lewis’s well-known logic of counterfactuals. I then develop a system sound and complete with respect to this class. The resulting logic is the weakest logic of causal counterfactuals that respects Lewis’s principles, sits in between the logic developed by Galles and Pearl and the logic developed by Halpern, and stands to Galles and Pearl’s logic in the same fashion as Lewis’s stands to Stalnaker’s.

Keywords Causal models · Causal reasoning · Conditional logic · Counterfactual · Intervention

Introduction

Counterfactual reasoning is commonplace both in the sciences and in everyday life, and

is an important subject matter for both philosophers and artificial intelligence

researchers. Despite the continuing controversies over the logic of conditionals in

philosophy, it is fair to say that the Stalnaker–Lewis semantics for counterfactual

conditionals (Stalnaker 1968; Stalnaker and Thomason 1970; Lewis 1973) enjoys

the greatest popularity. In the artificial intelligence literature, the general Stalnaker–

Lewis framework has also been employed to model counterfactuals (Ginsberg 1986)

and to develop theories of actions (Ginsberg and Smith 1987; Winslett 1988).

J. Zhang (✉)

Department of Philosophy, Lingnan University, Tuen Mun, NT, Hong Kong

e-mail: [email protected]

Minds & Machines

DOI 10.1007/s11023-011-9261-z

More recently, a causal interpretation of counterfactuals, as statements about

consequences of (hypothetical) interventions, was vigorously developed and

defended, based on Pearl’s seminal and influential work on causal modeling (Pearl

1995, 1998, 2009).1 Logics of such causal counterfactuals have also been studied

from an axiomatic perspective (Galles and Pearl 1998; Halpern 2000). A natural

question to ask is how these logics are related to the well-known logics of

counterfactuals in the Stalnaker–Lewis framework. Galles and Pearl (henceforth

GP, 1998) compared their theory to Lewis’s, and showed that their logic of causal

counterfactuals respects Lewis’s logical principles (and, in addition, endorses a

principle called reversibility). In fact, by imposing a requirement of “unique solution” on their models, which is analogous to Stalnaker’s assumption of “unique closest world”, GP’s logic also incorporates the stronger logic defended by Stalnaker (1968).

Halpern (2000) relaxed GP’s requirement of “unique solution” and allowed

causal models with multiple solutions (or no solution). It is tempting to view the

difference between Halpern and GP in parallel to that between Lewis and Stalnaker,

and to expect that Halpern’s logic incorporates Lewis’s logic just as GP’s

incorporates Stalnaker’s. However, the expectation is false. As we will see, the

allowance of models with no solution immediately invalidates some of Lewis’s

principles. Moreover, even if we disallow models with no solution, the resulting

logic is still not Lewisian.

This observation suggests that there is an interesting class of causal models

waiting to be characterized, i.e., the class of models that validates Lewis’s logical

principles for counterfactuals. I shall provide a characterization of this class of

causal models in this paper. The characterization suggests an extension of Halpern’s

elegant axiomatic system, which I will show is sound and complete with respect to

the class of Lewisian causal models. This logic is the weakest logic of causal

counterfactuals that respects Lewis’s principles, and stands to GP’s logic in the

same fashion as Lewis’s stands to Stalnaker’s.

The rest of the paper is organized as follows. In Sect. 2, I briefly review the

relevant material on the logics of counterfactuals in the Stalnaker–Lewis

framework. In Sect. 3, I describe the general setup of causal models as (modifiable)

structural equations models, and the interpretation of counterfactuals based on such

models. I then introduce GP’s logic and Halpern’s logic for such causal

counterfactuals, and show, in Sect. 4, that while GP’s logic is Stalnakerian,

Halpern’s is not Lewisian, despite a superficial analogy between the Stalnaker–

Lewis contrast and the GP-Halpern contrast. The main results are presented in Sect.

5, where I characterize the class of Lewisian causal models (Sect. 5.1), and provide

an extension of Halpern’s system to axiomatize the logic with respect to the class

(Sect. 5.2). I conclude in Sect. 6.

1 The general approach has also been followed by prominent philosophers to illuminate the epistemology

of causation (Spirtes et al. 2000) and the nature of causal explanation (Woodward 2003; Woodward and

Hitchcock 2003).

Two Logics of Counterfactuals in the Stalnaker–Lewis Framework

In Lewis’s theory, □→ is used as the (‘would’) counterfactual operator (> is used in Stalnaker’s). So “φ □→ ψ” symbolizes the conditional statement “if it were the case that φ, it would be the case that ψ”.2 The basic idea in the Stalnaker–Lewis semantics is that the statement φ □→ ψ is true (at a world w) just in case ψ is true in the most similar (to w) possible worlds at which φ is true, or as is often put, ψ is true in every closest φ-world.3 For the purpose of this paper, it is not necessary to review all the formal details of the semantics, but it will be helpful to recall a formalization of the basic idea in terms of selection functions. Let L be a language and Ant(L) be the set of sentences in L that can appear as antecedents in counterfactual conditionals.4 Let W be a set of possible worlds. A (Lewisian) selection function is a function f: Ant(L) × W → 2^W that satisfies the following conditions for all antecedents φ1, φ2 and every world w:

(S1) If φ1 is true at w, then f(φ1, w) = {w}.

(S2) φ1 is true at every world in f(φ1, w).

(S3) If φ2 is true at every world at which φ1 is true, and f(φ1, w) ≠ ∅, then f(φ2, w) ≠ ∅.

(S4) If φ2 is true at every world at which φ1 is true, and φ1 is true at some world in f(φ2, w), then f(φ1, w) consists of all and only those worlds in f(φ2, w) at which φ1 is true.

Intuitively, the selection function specifies, for a world w and an antecedent φ, the φ-worlds closest to w. Condition (S1) requires that a world be closer to itself than any other world. Condition (S2) ensures that the selected worlds are indeed φ-worlds. Condition (S3) requires that if the function returns a non-empty set for some

antecedent (which intuitively means that the antecedent is possible or entertainable

at w), it should also return a non-empty set for any weaker antecedent. Condition

(S4) in a way ensures that there is a consistent similarity or distance ordering

relative to w: if (S4) is violated, then there are two worlds, w1 and w2, such that w1 is

closer to w than w2 relative to one antecedent, but w2 is closer to w than w1 relative

to another antecedent.

Given a selection function f, φ □→ ψ is true at w if and only if ψ is true at every world in f(φ, w). From □→ we can define ◇→: φ ◇→ ψ =df ¬(φ □→ ¬ψ), which is true at w if and only if ψ is true at some world in f(φ, w). Intuitively,

2 As Lewis (1973, p. 3) noted, this slightly ungrammatical reading of φ □→ ψ (instead of “if it had been the case that φ, it would have been the case that ψ”) was deliberate, in order not to interfere with tense. Moreover, the antecedent is not required to be actually false, so some conditionals under the radar are, properly speaking, not contrary-to-fact conditionals. Like Lewis, I will tolerate this terminological inaptness.

3 This formulation assumes that there are closest φ-worlds, which Lewis labels the Limit Assumption (1973, pp. 19–20). Lewis’s semantics can be formulated without this assumption, but the present, slightly less general formulation fits our purposes well.

4 In Lewis’s language, there is no restriction on the form of antecedent, so Ant(L) is just the set of all sentences. But as we will see, the form of antecedent is restricted in the language of causal counterfactuals. In that language, Ant(L) is a proper subset of the set of all sentences.


φ ◇→ ψ stands for a ‘might’ counterfactual: if it were the case that φ, it might be the case that ψ. Note that when f(φ, w) is empty, φ □→ ψ is trivially true and φ ◇→ ψ is trivially false at w.
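As an informal illustration of these truth conditions, here is a minimal Python sketch. It assumes a finite set of worlds, represents a proposition as the set of worlds at which it is true, and builds the selection function from a numeric distance between worlds. The distance-based construction, the toy frame, and the names `closest`, `would`, and `might` are assumptions made for the example and are not part of Lewis's or Stalnaker's presentation.

```python
def closest(phi, w, dist):
    """f(phi, w): the phi-worlds at minimal distance from w (empty if phi is empty)."""
    if not phi:
        return set()
    best = min(dist(w, v) for v in phi)
    return {v for v in phi if dist(w, v) == best}


def would(phi, psi, w, dist):
    """'Would' counterfactual at w: psi holds at every selected phi-world."""
    return closest(phi, w, dist) <= psi


def might(phi, psi, w, dist):
    """'Might' counterfactual at w: psi holds at some selected phi-world."""
    return bool(closest(phi, w, dist) & psi)


# Hypothetical toy frame: four worlds on a line, distance = absolute difference.
WORLDS = {0, 1, 2, 3}
dist = lambda w, v: abs(w - v)
phi, psi = {0, 2}, {0}

print(closest(phi, 1, dist))              # worlds 0 and 2 tie as closest to world 1
print(would(phi, psi, 1, dist))           # False
print(would(phi, WORLDS - psi, 1, dist))  # False as well
print(might(phi, psi, 1, dist))           # True
print(would(set(), psi, 1, dist))         # True: vacuous when nothing is selected
```

In this toy frame both the ‘would’ conditional and its counterpart with the negated consequent are false at world 1, which is possible only because two worlds are tied as closest.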

Lewis allows all selection functions, while Stalnaker has an additional constraint:

(SS) f(φ, w) is a singleton set or an empty set.5

In plain words, Lewis allows multiple closest worlds (for entertainable

antecedents), while Stalnaker requires a single or unique closest world (for

entertainable antecedents).

The logic of counterfactuals endorsed by Lewis (1973, p. 132), named VC, can

be axiomatized with the following axiom schemata (in addition to truth-functional

tautologies):

(VC1) φ □→ φ

(VC2) (¬φ □→ φ) ⊃ (ψ □→ φ)

(VC3) (φ □→ ¬ψ) ∨ (((φ ∧ ψ) □→ χ) ≡ (φ □→ (ψ ⊃ χ)))

(VC4) (φ □→ ψ) ⊃ (φ ⊃ ψ)

(VC5) (φ ∧ ψ) ⊃ (φ □→ ψ)

(VC1) is valid due to condition (S2). (VC2) is valid due to conditions (S2) and

(S3). (VC3) corresponds to condition (S4). (VC4) and (VC5) are guaranteed by

condition (S1).

With the requirement of “unique closest world”, Stalnaker’s logic, in addition to these principles, contains the principle of conditional excluded middle (which is not a theorem of VC):

(S) (φ □→ ψ) ∨ (φ □→ ¬ψ)

Indeed adding (S) as an extra axiom to Lewis’s logic yields Stalnaker’s (Lewis,

1973, p. 133).

Causal Models and Two Logics of Causal Counterfactuals

I now turn to the logics of causal counterfactuals, counterfactuals interpreted based on causal models. My description of the framework will closely follow the rigorous presentation in (Halpern 2000). A signature for causal models is a tuple ⟨U, V, R⟩, where U and V are finite sets of variables, and R associates with each variable X ∈ U ∪ V a finite set of values R(X). A causal model over a signature S is a tuple ⟨S, F⟩, where F is a collection of functions such that for each X ∈ V, there is one (and only one) function f_X: ×_{Y ∈ U∪V\{X}} R(Y) → R(X); that is, f_X maps each value configuration of U ∪ V\{X} to a unique value of X.

Thus a causal model specifies, for each variable X ∈ V, how the value of X depends on the values of other variables. Intuitively the function f_X models a causal mechanism for X. Note that there are no functions for variables in U; how the values

5 Stalnaker’s (1968) own formulation is in terms of world-selection function instead of set-selection

function, where an absurd world is included to play the role of the empty set.

of U are determined is not modeled. Rather, a value configuration of U is taken to

describe a set of background or boundary conditions for the modeled causal system.

For this reason, variables in U are called exogenous and variables in V are called

endogenous.

Such a causal model is also known as a structural equation model, as each f_X corresponds to a structural equation: X = f_X(U ∪ V\{X}). In concrete models, some variables in U ∪ V\{X} may be redundant in determining the value of X, and can be omitted from the structural equation for X.

For illustration, here is a simple causal model borrowed from Pearl (2009,

p. 209). The signature of the model consists of one exogenous variable: U = {U},

and four endogenous variables: V = {X, Y, Z, W}, and all variables are binary:

R(U) = R(X) = R(Y) = R(Z) = R(W) = {0,1}. For each endogenous variable,

there is one (and only one) structural equation, specifying how that variable

depends on other variables: X = U; Y = X; Z = X; W = Y ∨ Z.

The model could, for example, represent the following situation in a firing squad:

The court decides whether to order execution (represented by the variable U). The

decision (yes or no) determines whether the captain on the squad orders shooting

(represented by the variable X). Two riflemen strictly follow the captain’s order

(Y represents whether rifleman 1 shoots and Z represents whether rifleman 2 shoots),

and shooting from either rifleman is sufficient to kill the prisoner (W represents

whether the prisoner dies).

Given a causal model M = ⟨U, V, R, F⟩, a (possibly empty) subset of endogenous variables X ⊆ V, and a possible value configuration x of X (i.e., x contains one and only one value for each variable in X), we write M_{X=x} to denote the causal model that results from M by replacing the structural equations for X in M with X = x; that is, M_{X=x} is the same as M except that for each individual variable X in the set X the equation for X is modified to be X = x (where x is the component value in the configuration x for X), regardless of the original equations for them in M. Obviously when X is empty, M_∅ is the same as M. Because of this operation on models, the framework is also called modifiable structural equation models.

Intuitively, M_{X=x} models the (counterfactual) situation in which X is intervened or forced to take the value x while the mechanisms for other endogenous variables remain the same as modeled by M.6 In this framework, M_{X=x} is the key to evaluating counterfactuals with the antecedent X = x.

For example, suppose M is the aforementioned model representing the firing squad scenario. Then M_{X=0} is the model with the structural equations: X = 0; Y = X; Z = X; W = Y ∨ Z, and M_{Z=1} is the model with the structural equations: X = U; Y = X; Z = 1; W = Y ∨ Z. Intuitively, M_{X=0} models the situation where some intervention (say, from the prisoner’s friends) forces the captain not to order shooting, regardless of the court’s decision, and M_{Z=1} models the situation where some intervention (say, from the prisoner’s enemies) forces rifleman 2 to shoot, regardless of the captain’s order. We can also model the situation where both

6 This approach to modeling interventions originated in econometrics (Strotz and Wold 1960; Fisher

1970), and was masterfully articulated and developed by Pearl (2009).


interventions take place by M_{X=0, Z=1}, which has the following structural equations: X = 0; Y = X; Z = 1; W = Y ∨ Z.

A solution to a causal model, relative to a value configuration u of the exogenous

variables U, is a value configuration v of the endogenous variables V such that all

the structural equations in the model are simultaneously satisfied. In general, there may or may not be a solution to a causal model relative to a value configuration of the exogenous variables, and when there is one, there may be more than one.

For example, in the firing squad model, relative to U = 1 (i.e., relative to the fact

that the court decides to order execution), there is a unique solution to the model:

X = 1 (the captain orders execution), Y = 1 (rifleman 1 shoots), Z = 1 (rifleman 2

shoots), and W = 1 (the prisoner dies). However, not every model features a unique

solution. For example, the model with the following structural equations: X = U; Y = Z; Z = ¬Y; W = Y ∨ Z, has no solution relative to U = 1. By contrast, the model with the following structural equations: X = U; Y = Z; Z = Y; W = Y ∨ Z,

has two solutions relative to U = 1: (X = 1, Y = 1, Z = 1, W = 1) and (X = 1,

Y = 0, Z = 0, W = 0).
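The three cases just described can be checked mechanically. The following Python sketch is an illustration only: it assumes binary variables, encodes each structural equation as a function of the full variable assignment, and finds solutions by brute-force enumeration. The function name `solutions` and the dictionary encoding are choices made for this example, not notation from the framework.

```python
from itertools import product


def solutions(equations, u, domain=(0, 1)):
    """Brute force: endogenous assignments that satisfy every structural equation."""
    names = list(equations)
    found = []
    for values in product(domain, repeat=len(names)):
        endo = dict(zip(names, values))
        state = {**u, **endo}
        if all(state[var] == eq(state) for var, eq in equations.items()):
            found.append(endo)
    return found


# X = U; Y = X; Z = X; W = Y or Z
firing_squad = {
    "X": lambda s: s["U"],
    "Y": lambda s: s["X"],
    "Z": lambda s: s["X"],
    "W": lambda s: s["Y"] | s["Z"],
}
# Y = Z; Z = not Y: these two equations can never hold together.
no_solution = {**firing_squad, "Y": lambda s: s["Z"], "Z": lambda s: 1 - s["Y"]}
# Y = Z; Z = Y: any common value of Y and Z works.
two_solutions = {**firing_squad, "Y": lambda s: s["Z"], "Z": lambda s: s["Y"]}

u = {"U": 1}
print(solutions(firing_squad, u))   # exactly one solution
print(solutions(no_solution, u))    # empty list
print(solutions(two_solutions, u))  # two solutions
```

Enumeration is exponential in the number of endogenous variables, which is acceptable for four binary variables but is not meant as a general solving method.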

The simplest counterfactual conditionals in this framework are of this form: if it

were the case that X = x (or more explicitly, if X were intervened to take value x), it

would be the case that Y = y, where X and Y are endogenous variables. Such a

statement gets a truth value in a causal model M relative to a value configuration u:

the statement is true just in case every solution to M_{X=x} relative to u has it that

Y = y. More generally, the antecedent can be a conjunction of interventions, or say

an intervention on a set of variables, and the consequent can be any Boolean

combination of such statements as Y = y. For example, it makes sense to say: if

X were intervened to take value x and Z intervened to take value z, it would be the

case that Y = y or W = w. This statement is true in M relative to u just in case every

solution to M_{X=x, Z=z} relative to u satisfies that Y = y or W = w.

To make it more precise, I shall use the following language adapted from Halpern (2000), defined over a signature S = ⟨U, V, R⟩. The basic counterfactual formulas are of the form [X1 = x1 ∧ … ∧ Xk = xk]φ,7 where X1, …, Xk are distinct variables in V, and φ is a Boolean combination of formulas of the form Y(u) = y, where u is a value configuration of the variables in U. It stands for the statement “if X1 were intervened to take value x1, …, and Xk were intervened to take value xk, it would be the case that φ, relative to u”. For convenience, I will write in bold, X = x (or X(u) = x), to abbreviate a conjunction of Xi = xi (or Xi(u) = xi). So [X = x](Y(u) = y) is understood as: if the variables in X were intervened to take the value configuration x, the variables in Y would have the value configuration y, relative to u. In the special case when X is empty, the formula [X = x]φ is written as [true]φ. The language contains all the Boolean combinations of the basic

counterfactual formulas. I will refer to this language for causal counterfactuals as

LCC(S).

Notice that the form of antecedents is restricted to a conjunction of interventions.

How to handle disjunctive antecedents properly in this framework is an interesting

7 The notation of [ ] (and ⟨ ⟩, to be introduced later) is obviously borrowed from dynamic logic (e.g., Harel 1979).

open question.8 It is worth noting that although disjunctive antecedents are allowed

in the Stalnaker–Lewis framework, the closest-world interpretation of counterfactuals with disjunctive antecedents also faces serious challenges (Ellis et al. 1977;

Lewis 1977).

Given any causal model M over S, every formula in LCC(S) has a truth value in

M. A basic counterfactual formula [X = x]φ(u) is true in M if and only if every solution to M_{X=x} relative to u satisfies φ. Other formulas are truth functions of the

basic counterfactual formulas.

Again, let me use the firing squad example to illustrate the ideas. Suppose M is

the model representing the firing squad scenario. Recall that M has one binary

exogenous variable U, and four binary endogenous variables X, Y, Z, W, with these

structural equations: X = U; Y = X; Z = X; W = Y ∨ Z. Suppose the actual value

of U is 1 (i.e., the court actually decides to order execution).

Consider the conditional: if the captain had not given a signal, the prisoner would not have died. This conditional is expressed by the formula [X = 0](W(1) = 0). To evaluate this formula, we consider the model M_{X=0}: X = 0; Y = X; Z = X; W = Y ∨ Z. The solution to M_{X=0} (relative to U = 1) gives W the value 0. So the conditional is true in this model.

Consider another conditional: if rifleman 2 had not shot, the prisoner would not have died. This is expressed by the formula [Z = 0](W(1) = 0). To evaluate this formula, we consider the model M_{Z=0}: X = U; Y = X; Z = 0; W = Y ∨ Z. The solution to M_{Z=0} (relative to U = 1) gives W the value 1. So the conditional is false

in this model.
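The two evaluations above can be reproduced with a short Python sketch. It is an illustration under stated assumptions: variables are binary, an intervention is passed as a dictionary `setting` from intervened variables to forced values, and solutions are found by enumeration. The helper names `solve` and `would` are hypothetical and chosen for this example.

```python
from itertools import product

# Firing squad model: X = U; Y = X; Z = X; W = Y or Z
MODEL = {
    "X": lambda s: s["U"],
    "Y": lambda s: s["X"],
    "Z": lambda s: s["X"],
    "W": lambda s: s["Y"] | s["Z"],
}


def solve(equations, setting, u):
    """Solutions to the intervened model relative to u (brute force, binary values)."""
    names = list(equations)
    found = []
    for values in product((0, 1), repeat=len(names)):
        endo = dict(zip(names, values))
        state = {**u, **endo}
        # An intervened variable must equal its forced value; others obey their equation.
        if all(state[v] == (setting[v] if v in setting else eq(state))
               for v, eq in equations.items()):
            found.append(endo)
    return found


def would(equations, setting, u, consequent):
    """Basic counterfactual: the consequent holds in every solution after intervening."""
    return all(consequent(s) for s in solve(equations, setting, u))


u = {"U": 1}
print(would(MODEL, {"X": 0}, u, lambda s: s["W"] == 0))  # True
print(would(MODEL, {"Z": 0}, u, lambda s: s["W"] == 0))  # False
```

Note that `would` returns True vacuously when the intervened model has no solution, which matches the truth condition stated for basic counterfactual formulas.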

A formula is valid with respect to a class of causal models if and only if it is true

in every model in the class. Obviously different classes of causal models may

validate different sets of formulas and so generate different logics of causal

counterfactuals. GP (Galles and Pearl 1998) considered the class of causal models with the following “unique-solution” property: for every X ⊆ V, every value configuration x of X, and every value configuration u of U, M_{X=x} has one and only one solution relative to u. I will refer to this class of models (over signature S) as M_un(S), and the corresponding logic as GP’s logic.9 Halpern (2000) provided an elegant axiomatization of GP’s logic, but also considered the logic with respect to the class of all causal models. I will refer to the class of all causal models (over signature S) as M_all(S), and the corresponding logic as Halpern’s logic.

8 It might be tempting to simply dismiss the problem, on the ground that counterfactuals with disjunctive

antecedents concern consequences of interventions that are not well defined, and so should not bear truth

values under the causal semantics. But some counterfactuals of this sort seem to have as determinate a

truth value as any counterfactual can. Consider the firing squad example. Suppose, as it happened, the

court decided not to order execution. As a result, neither rifleman shot, and the prisoner survived. The

following counterfactual seems clearly true: if rifleman 1 had shot or rifleman 2 had shot, the prisoner

would have died.

9 GP (as well as Halpern) also considered the class of recursive models, a subclass of M_un(S). That class, though important for other purposes, does not need special attention for the purpose of this paper.


Halpern’s Logic is Not Lewisian

Galles and Pearl (1998) compared their logic to Lewis’s. They briefly indicated that

their semantics could be recast in Lewis’s terms, with a less elusive and more

principled similarity measure. Their suggestion is that given a causal model, each

value configuration of all variables can be taken as a possible world, and world w1 is

more similar or closer to world w than world w2 if and only if it takes a smaller

number of interventions to transform w to w1 than it does to transform w to w2.

This similarity measure, however, is not quite right for their purpose. Consider,

for example, a causal model M, over the signature ⟨{U}, {X, Y, Z}, R⟩ where R(U) = R(X) = R(Y) = R(Z) = {0, 1} (i.e., all variables are binary), with these structural equations: X = U; Y = X; Z = X. In this model, the counterfactual [Y = 1 ∧ Z = 1](X(0) = 0) is true, because in the solution to M_{Y=1, Z=1} relative to U = 0, the value of X is 0. However, according to the similarity measure suggested

by GP, the world (U = 0, X = 1, Y = 1, Z = 1) is closer to the (actual) world

(U = 0, X = 0, Y = 0, Z = 0) than the world (U = 0, X = 0, Y = 1, Z = 1) is,

because it takes only one local intervention (i.e., forcing X to be 1) to transform

(U = 0, X = 0, Y = 0, Z = 0) to (U = 0, X = 1, Y = 1, Z = 1), but takes two

local interventions (i.e., forcing Y to be 1 and forcing Z to be 1) to transform

(U = 0, X = 0, Y = 0, Z = 0) to (U = 0, X = 0, Y = 1, Z = 1). Thus the

suggested similarity measure is not quite right for causal counterfactuals.
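The intervention counts in this example can be verified with a small Python sketch. It reads “number of interventions” as the smallest number of intervened variables whose forced values agree with the target world and whose intervened model has the target as its unique solution. That reading, and the helper names `solve` and `interventions_needed`, are assumptions made for this illustration.

```python
from itertools import combinations, product

# X = U; Y = X; Z = X
EQS = {"X": lambda s: s["U"], "Y": lambda s: s["X"], "Z": lambda s: s["X"]}


def solve(equations, setting, u):
    """Solutions to the intervened model relative to u (brute force, binary values)."""
    names = list(equations)
    found = []
    for values in product((0, 1), repeat=len(names)):
        endo = dict(zip(names, values))
        state = {**u, **endo}
        if all(state[v] == (setting[v] if v in setting else eq(state))
               for v, eq in equations.items()):
            found.append(endo)
    return found


def interventions_needed(equations, u, target):
    """Fewest intervened variables that make `target` the unique solution."""
    names = list(equations)
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            setting = {v: target[v] for v in chosen}
            if solve(equations, setting, u) == [target]:
                return k
    return None


u = {"U": 0}
print(interventions_needed(EQS, u, {"X": 1, "Y": 1, "Z": 1}))  # 1 (force X = 1)
print(interventions_needed(EQS, u, {"X": 0, "Y": 1, "Z": 1}))  # 2 (force Y = 1 and Z = 1)
print(solve(EQS, {"Y": 1, "Z": 1}, u))  # the world the causal semantics selects has X = 0
```

The world that the causal semantics selects for the antecedent Y = 1 ∧ Z = 1 is thus the one that the counting measure ranks as farther away.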

Instead of working out the corresponding similarity measure, it is more

straightforward to see the analogy in terms of selection functions. In GP’s framework, where only causal models with the “unique-solution” property are considered, a causal model M induces an obvious “selection function” for each value configuration u of U: the function selects, for each antecedent X = x, the unique solution to M_{X=x} relative to u as the antecedent world closest to the “actual” world (which is taken to be the solution to M). This function is not yet a fully specified selection function, because it is only defined relative to the “actual” world, but it suffices for our present purpose. Obviously a counterfactual is true in M if and only if it is true according to the “actual” selection function induced by M. It is straightforward to check that this function, for the “actual” world and every antecedent, satisfies the conditions for a Stalnakerian selection function: (S1)–(S4) plus (SS), as explained in Sect. 2. It is thus no accident that the following schemata, which are translations of the axiom schemata (VC1)–(VC5) and (S) into the language for causal counterfactuals, are all valid in GP’s logic.

(VC1c) [X = x](X(u) = x)

(VC2c) [X = x](X(u) ≠ x) ⊃ [Y = y](X(u) ≠ x)

(VC3c) [X = x](Y(u) ≠ y) ∨ ([X = x ∧ Y = y]φ(u) ≡ [X = x](Y(u) = y ⊃ φ(u)))

(VC4c) [X = x]φ(u) ⊃ [true](X(u) = x ⊃ φ(u))

(VC5c) [true](X(u) = x ∧ φ(u)) ⊃ [X = x]φ(u)

(Sc) [X = x]φ(u) ∨ [X = x]¬φ(u)

The validity of these schemata with respect to M_un(S) is very easy to verify. GP’s

logic thus incorporates Stalnaker’s logic, and in this sense may be called

Stalnakerian.

On the other hand, Halpern’s logic, by allowing causal models with multiple

solutions, obviously invalidates (Sc). It may be tempting to view the difference

between Halpern and GP as parallel to the difference between Lewis and

Stalnaker, and to expect Halpern’s logic to be Lewisian. This expectation is, in a

way, obviously false. Since Halpern also allows models with no solution under

some intervention, counter-models to (VC2c) are easy to find. In any model M such that M_{X=x} has no solution for some X = x and relative to some u, [X = x](X(u) ≠ x) is trivially true, but for Y = V and a value configuration y consistent with X = x, [Y = y](X(u) ≠ x) is false, since y itself is the unique solution to M_{Y=y}. The rationale behind (VC2) in Lewis’s theory is that only an antecedent that is not possible or entertainable counterfactually implies its negation. This rationale is violated in causal models with no solution: when M_{X=x} has no solution, X = x is still possible (in the sense that it is part of a solution to some other intervention) but counterfactually implies X ≠ x.

Therefore, causal models that have no solution under some intervention are

not Lewisian. Suppose we disallow such models and consider only (and all)

those causal models that have at least one solution under any intervention. Is the

corresponding logic Lewisian? The answer is still no, though it is less obvious.

In fact, (VC3c) and (VC5c) are still not valid. I will present a counter-model

to (VC3c), the diagnosis and treatment of which will also take care of

(VC5c).

Consider the signature ⟨{}, {X, Y, Z, W}, R⟩ where R(X) = R(Y) = R(Z) = R(W) = {0, 1} (i.e., all variables are binary), and a model M over the signature with the following structural equations: X = Y ∧ Z; Y = X ∧ W; Z = ¬W; W = ¬Z. It is easy to check that for any X ⊆ {X, Y, Z, W} and any value configuration x of X, M_{X=x} has a solution. In particular, M_{X=1} has two solutions: (X = 1, Y = 1, Z = 0, W = 1) and (X = 1, Y = 0, Z = 1, W = 0). Thus [X = 1](Y ≠ 1) is false, and [X = 1](Y = 1 ⊃ W = 1) is true. Now M_{X=1 ∧ Y=1} also has two solutions: (X = 1, Y = 1, Z = 0, W = 1) and (X = 1, Y = 1, Z = 1, W = 0). Thus [X = 1 ∧ Y = 1](W = 1) is false. Hence, [X = 1 ∧ Y = 1](W = 1) ≡ [X = 1](Y = 1 ⊃ W = 1) is false. Therefore, an instance of (VC3c) is false in this model.
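The counter-model can be checked by enumeration. The Python sketch below is illustrative only: it assumes the binary encoding 0/1 with negation as `1 - v` and conjunction as `&`, and the helper names `solve` and `would` are hypothetical. It evaluates the three components of the relevant (VC3c) instance, reading the first disjunct as "Y differs from 1 in every solution after forcing X = 1".

```python
from itertools import product

# X = Y and Z; Y = X and W; Z = not W; W = not Z (no exogenous variables)
M = {
    "X": lambda s: s["Y"] & s["Z"],
    "Y": lambda s: s["X"] & s["W"],
    "Z": lambda s: 1 - s["W"],
    "W": lambda s: 1 - s["Z"],
}


def solve(equations, setting):
    """Solutions to the intervened model (brute force over binary values)."""
    names = list(equations)
    found = []
    for values in product((0, 1), repeat=len(names)):
        s = dict(zip(names, values))
        if all(s[v] == (setting[v] if v in setting else eq(s))
               for v, eq in equations.items()):
            found.append(s)
    return found


def would(equations, setting, consequent):
    """The consequent holds in every solution to the intervened model."""
    return all(consequent(s) for s in solve(equations, setting))


first_disjunct = would(M, {"X": 1}, lambda s: s["Y"] != 1)
left = would(M, {"X": 1, "Y": 1}, lambda s: s["W"] == 1)
right = would(M, {"X": 1}, lambda s: s["Y"] != 1 or s["W"] == 1)
vc3c_instance = first_disjunct or (left == right)

print(solve(M, {"X": 1}))          # two solutions
print(solve(M, {"X": 1, "Y": 1}))  # two solutions, one of them new
print(first_disjunct, left, right, vc3c_instance)  # False False True False
```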

The validity of (VC3) in Lewis’s logic is due to the constraint that (S4) places

on selection functions. In that light, the failure of (VC3c) here owes to the

following circumstance: M_{X=1} has a solution in which Y = 1, but not every solution to M_{X=1 ∧ Y=1} is a solution to M_{X=1}. If we consider the natural “selection function” that, for each antecedent X = x, selects the set of solutions to M_{X=x} as

the set of closest worlds, we can see that the condition (S4) is violated due to that

circumstance.

This diagnosis suggests a restriction to causal models that yields a Lewisian logic

of causal counterfactuals, to which I now turn.



The Class of Lewisian Causal Models

The restriction I will put down is the following condition:

Definition 1 [solution-conservative] A causal model M = ⟨U, V, R, F⟩ is called solution-conservative if for every X ⊆ V, Y ∈ V\X, and every value configuration x of X, y of Y, and u of U, if M_{X=x} has a solution relative to u consistent with Y = y, then every solution to M_{X=x ∧ Y=y} relative to u is also a solution to M_{X=x} relative to u.

It should be clear that this condition is motivated by a consideration of the condition (S4) for selection functions. I name the condition “solution-conservative” because the condition requires that compared to the solutions to M_{X=x}, no new solution should emerge for M_{X=x ∧ Y=y}, unless no solution to M_{X=x} is consistent with Y = y.
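For small binary models, Definition 1 can be tested directly by enumeration. The following Python sketch is an illustration, not an efficient procedure. It checks the condition for one given value configuration u, so the definition's quantification over u is obtained by calling it for each u. It also checks, in the same way, whether every intervened model has at least one solution. All function names are hypothetical choices for this example.

```python
from itertools import combinations, product


def solve(equations, setting, u):
    """Solutions to the intervened model relative to u (brute force, binary values)."""
    names = list(equations)
    found = []
    for values in product((0, 1), repeat=len(names)):
        endo = dict(zip(names, values))
        state = {**u, **endo}
        if all(state[v] == (setting[v] if v in setting else eq(state))
               for v, eq in equations.items()):
            found.append(endo)
    return found


def interventions(names):
    """Every intervention on every subset of the endogenous variables."""
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            for values in product((0, 1), repeat=k):
                yield dict(zip(chosen, values))


def has_solution_everywhere(equations, u):
    """Every intervened model has at least one solution relative to u."""
    return all(solve(equations, x, u) for x in interventions(list(equations)))


def solution_conservative(equations, u):
    """Definition 1, checked for a single value configuration u."""
    names = list(equations)
    for x in interventions(names):
        base = solve(equations, x, u)
        for var in (n for n in names if n not in x):
            for val in (0, 1):
                if not any(s[var] == val for s in base):
                    continue  # antecedent of the definition fails, nothing to check
                extended = solve(equations, {**x, var: val}, u)
                if any(s not in base for s in extended):
                    return False
    return True


firing_squad = {"X": lambda s: s["U"], "Y": lambda s: s["X"],
                "Z": lambda s: s["X"], "W": lambda s: s["Y"] | s["Z"]}
cyclic = {"X": lambda s: s["Y"] & s["Z"], "Y": lambda s: s["X"] & s["W"],
          "Z": lambda s: 1 - s["W"], "W": lambda s: 1 - s["Z"]}

print(all(solution_conservative(firing_squad, {"U": v}) for v in (0, 1)))  # True
print(has_solution_everywhere(cyclic, {}), solution_conservative(cyclic, {}))  # True False
```

The second model is the counter-model to (VC3c) from the previous section: it has a solution under every intervention but fails Definition 1.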

Another restriction we have mentioned in the previous section is that for every

X ⊆ V, every value configuration x of X, and u of U, M_{X=x} has at least one solution

relative to u. I shall call models that satisfy this condition solution-ful. I now show

that the class of causal models that validate (VC1c)–(VC5c) is precisely the class of

models that are both solution-ful and solution-conservative.

For that purpose, we need the following lemma, showing that Definition 1 is

equivalent to a seemingly more general version.

Lemma 2 If a causal model M = ⟨U, V, R, F⟩ is solution-conservative, then for all disjoint sets X, Y ⊆ V, and every value configuration x of X, y of Y, and u of U, if M_{X=x} has a solution relative to u consistent with Y = y, then every solution to M_{X=x ∧ Y=y} relative to u is also a solution to M_{X=x} relative to u.

Proof We do induction on |Y|, the number of variables in Y. The statement is trivial when |Y| = 0. When |Y| = 1, the statement is true by Definition 1. Suppose the statement is true for |Y| = k. Consider the case where Y = Y′ ∪ {Y} such that |Y′| = k and Y ∉ Y′. Suppose M_{X=x} has a solution relative to u consistent with Y = y, where y = y′ ∪ {y}. This solution of course is consistent with Y′ = y′. By the induction hypothesis, every solution to M_{X=x ∧ Y′=y′} relative to u is also a solution to M_{X=x} relative to u. Note also that any solution to M_{X=x} relative to u consistent with Y = y is also a solution to M_{X=x ∧ Y′=y′} relative to u, because for variables not in Y′, M_{X=x ∧ Y′=y′} has the exact same equations as M_{X=x}. Thus there is a solution to M_{X=x ∧ Y′=y′} relative to u consistent with the single assignment Y = y. By Definition 1, every solution to M_{X=x ∧ Y′=y′ ∧ Y=y}, which is just M_{X=x ∧ Y=y}, relative to u is also a solution to M_{X=x ∧ Y′=y′}. Therefore, every solution to M_{X=x ∧ Y=y} relative to u is also a solution to M_{X=x} relative to u. Q.E.D.

Theorem 3 Let M be any causal model over a signature S = ⟨U, V, R⟩. All

instances of (VC1c)–(VC5c) in LCC(S) are true in M if and only if M is solution-ful

and solution-conservative.

Proof (If) Suppose M is solution-ful and solution-conservative. We show that all

instances of (VC1c)–(VC5c) are true in M.

The case of (VC1c) is trivial. (Indeed (VC1c) is valid with respect to M_all(S).) The case of (VC2c) is also trivial given that M is solution-ful.

For (VC3c), it is equivalent to show that for every disjoint X, Y ⊆ V, every value configuration x of X, y of Y, and u of U, and every φ, if [X = x](Y(u) ≠ y) is false in M, then ([X = x ∧ Y = y]φ(u) ≡ [X = x](Y(u) = y ⊃ φ(u))) is true in M. Suppose [X = x](Y(u) ≠ y) is false in M. This means that some solution to M_{X=x} relative to u is consistent with Y = y. It follows from Lemma 2 that every solution to M_{X=x ∧ Y=y} relative to u is also a solution to M_{X=x} relative to u. On the other hand, every solution to M_{X=x} relative to u that is consistent with Y = y is also a solution to M_{X=x ∧ Y=y} relative to u, simply because for variables not in Y, M_{X=x ∧ Y=y} has the exact same equations as M_{X=x}. Thus, φ is satisfied in every solution to M_{X=x ∧ Y=y} relative to u if and only if φ is satisfied in every solution to M_{X=x} relative to u that is consistent with Y = y. Therefore, [X = x ∧ Y = y]φ(u) ≡ [X = x](Y(u) = y ⊃ φ(u)) is true in M.

For (VC4c), suppose $[X = x]\varphi(u)$ is true in M, which means that every solution to $M_{X=x}$ relative to u satisfies $\varphi$. Note that every solution to M relative to u consistent with X = x is also a solution to $M_{X=x}$ relative to u, because for variables not in X, $M_{X=x}$ has exactly the same equations as M. Hence every solution to M relative to u is either inconsistent with X = x or satisfies $\varphi$, which means that $[\mathit{true}](X(u) = x \supset \varphi(u))$ is true in M. (Notice that no constraint on causal models is invoked in the argument: (VC4c) is also valid with respect to Mall(S).)

For (VC5c), suppose $[\mathit{true}](X(u) = x \wedge \varphi(u))$ is true in M, which means that every solution to M relative to u satisfies X = x and $\varphi$. Since M is solution-ful, there is indeed a solution to M relative to u, which is hence consistent with X = x. Since M is also solution-conservative, by Lemma 2, every solution to $M_{X=x}$ relative to u is also a solution to M relative to u, and hence satisfies $\varphi$. Therefore, $[X = x]\varphi(u)$ is also true in M.

(Only if) As already shown in the previous section, if M is not solution-ful, then some instance of (VC2c) is false in M. On the other hand, if M is not solution-conservative, then there exist $X \subseteq V$, $Y \in V \setminus X$, and some configuration x of X, y of Y, and u of U such that (1) $M_{X=x}$ has a solution relative to u consistent with Y = y, but (2) some solution to $M_{X=x \wedge Y=y}$ relative to u is not a solution to $M_{X=x}$ relative to u. (1) implies that $[X = x]\neg(Y(u) = y)$ is false in M. Let $V = v_1, \ldots, V = v_n$ be all the solutions to $M_{X=x}$ relative to u which are consistent with Y = y, and let $\varphi(u)$ be $V(u) = v_1 \vee \ldots \vee V(u) = v_n$. Then $[X = x](Y(u) = y \supset \varphi(u))$ is true in M. However, (2) implies that $[X = x \wedge Y = y]\varphi(u)$ is false in M. Thus $[X = x \wedge Y = y]\varphi(u) \equiv [X = x](Y(u) = y \supset \varphi(u))$ is false in M. Hence an instance of (VC3c) is false in M. Q.E.D.

Let Msfsc(S) be the class of causal models over S that are both solution-ful and

solution-conservative. It follows from Theorem 3 that Msfsc(S) is the biggest class of

causal models over S with respect to which Lewis’s principles are valid, and so the

logic with respect to the class is the weakest Lewisian logic of causal

counterfactuals. Obviously, GP’s class Mun(S) is contained in Msfsc(S). The

containment is proper even for a signature as simple as $\langle \{\}, \{X, Y\}, R\rangle$, where $R(X) = R(Y) = \{0, 1\}$. Over this signature we can define a model with the

following equations: X = Y; Y = X. It is very easy to check that this model belongs


to Msfsc(S), though it does not belong to Mun(S). Halpern (2000, p. 320) provided

some justification for going beyond Mun(S). I suspect that, besides the formal

consideration given here, there may also be substantive reasons to stop at Msfsc(S).

Perhaps most models of real interest are solution-ful and solution-conservative, but

an investigation into this matter has to await another occasion.
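The membership claim for the two-equation model can be checked by brute force. The following is a minimal sketch in Python and not part of the original development. It assumes a finite model with no exogenous variables (so there is a single context), and it reads the two conditions off the way they are used in the proof of Theorem 3: solution-ful is taken to mean that every intervened model has at least one solution, and solution-conservative is taken to mean that whenever $M_{X=x}$ has a solution consistent with Y = y, every solution to $M_{X=x \wedge Y=y}$ is also a solution to $M_{X=x}$. Membership in Mun(S) is assumed to mean a unique solution under every intervention. All function names are hypothetical.

```python
from itertools import combinations, product


def solutions(equations, ranges, do=None):
    """List every assignment that solves the model after the intervention `do`."""
    do = do or {}
    names = sorted(ranges)
    found = []
    for values in product(*(ranges[n] for n in names)):
        v = dict(zip(names, values))
        # An intervened variable must take its set value; any other variable
        # must satisfy its own structural equation.
        if all(v[n] == (do[n] if n in do else equations[n](v)) for n in names):
            found.append(v)
    return found


def interventions(ranges):
    """Yield every partial assignment X = x, including the empty one."""
    names = sorted(ranges)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            for values in product(*(ranges[n] for n in subset)):
                yield dict(zip(subset, values))


def is_solution_ful(equations, ranges):
    return all(solutions(equations, ranges, do) for do in interventions(ranges))


def is_solution_conservative(equations, ranges):
    for do in interventions(ranges):
        base = solutions(equations, ranges, do)
        for name in ranges:
            if name in do:
                continue
            for value in ranges[name]:
                if not any(s[name] == value for s in base):
                    continue  # the condition only applies when Y = y is possible
                extended = solutions(equations, ranges, {**do, name: value})
                if any(s not in base for s in extended):
                    return False
    return True


def has_unique_solutions(equations, ranges):
    return all(len(solutions(equations, ranges, do)) == 1
               for do in interventions(ranges))


# The model from the text: X = Y; Y = X, with R(X) = R(Y) = {0, 1}.
ranges = {"X": (0, 1), "Y": (0, 1)}
loop = {"X": lambda v: v["Y"], "Y": lambda v: v["X"]}

print(is_solution_ful(loop, ranges))           # True
print(is_solution_conservative(loop, ranges))  # True
print(has_unique_solutions(loop, ranges))      # False: M itself has two solutions
```

The unintervened model has the two solutions (0, 0) and (1, 1), which is what keeps it outside Mun(S); every intervention on X or Y leaves exactly one solution, and that solution already solves the less intervened model.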

Axiomatization

The logic with respect to Msfsc(S) is of course a (proper) extension of Halpern’s

logic. Based on Halpern’s elegant axiomatization of his logic, I now develop a

sound and complete system of the logic with respect to Msfsc(S).

To state Halpern’s axioms, it is convenient to use a defined operator $\langle\,\rangle$:

$$\langle X = x\rangle\varphi =_{df} \neg[X = x]\neg\varphi$$

Clearly $\langle X = x\rangle\varphi$ corresponds to a ‘might’ counterfactual: if X were intervened to take value x, it might be the case that $\varphi$.
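A small sketch may help fix the reading of the two operators. It assumes, purely for illustration, a finite model with no exogenous variables, reads $[X = x]\varphi$ as "$\varphi$ holds in every solution to $M_{X=x}$", and computes the 'might' operator from its definition as the dual of the 'would' operator. The function names are hypothetical, and the example model is the two-variable model X = Y; Y = X discussed earlier.

```python
from itertools import product


def solutions(equations, ranges, do=None):
    """List every assignment that solves the model after the intervention `do`."""
    do = do or {}
    names = sorted(ranges)
    found = []
    for values in product(*(ranges[n] for n in names)):
        v = dict(zip(names, values))
        if all(v[n] == (do[n] if n in do else equations[n](v)) for n in names):
            found.append(v)
    return found


def box(equations, ranges, do, phi):
    """[X = x]phi: phi holds in every solution to the intervened model."""
    return all(phi(s) for s in solutions(equations, ranges, do))


def diamond(equations, ranges, do, phi):
    """<X = x>phi, computed as not [X = x] not phi."""
    return not box(equations, ranges, do, lambda s: not phi(s))


ranges = {"X": (0, 1), "Y": (0, 1)}
loop = {"X": lambda v: v["Y"], "Y": lambda v: v["X"]}


def x_is_0(s):
    return s["X"] == 0


print(box(loop, ranges, {}, x_is_0))          # False: (1, 1) is also a solution
print(diamond(loop, ranges, {}, x_is_0))      # True: (0, 0) is a solution
print(box(loop, ranges, {"Y": 0}, x_is_0))    # True: setting Y = 0 forces X = 0
```

With no intervention, X might be 0 but would not necessarily be 0, so the two operators come apart on this model.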

For the logic with respect to Mall(S) over $S = \langle U, V, R\rangle$, Halpern’s axiomatic

system is given by the rule of modus ponens and a number of axiom schemata. I will

list his schemata in a slightly different order to facilitate subsequent discussions.

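H0. All instances of truth-functional tautologies

H1. $[X = x](Y(u) = y \supset Y(u) \neq y')$, where $y \neq y'$

H2. $[X = x](\bigvee_{y \in R(Y)} Y(u) = y)$

H3. $\bigvee_{y \in R(Y)} \langle X = x\rangle(Y = y) \wedge \bigvee_{y \in R(Y)} [X = x](Y = y)$, where $X = V \setminus \{Y\}$ 10

H4. $\langle X = x\rangle(\varphi_1(u_1) \wedge \ldots \wedge \varphi_k(u_k)) \equiv (\langle X = x\rangle\varphi_1(u_1) \wedge \ldots \wedge \langle X = x\rangle\varphi_k(u_k))$, if $u_i \neq u_j$ for $i \neq j$

H5. $([X = x]\varphi \wedge [X = x](\varphi \supset \psi)) \supset [X = x]\psi$

H6. $[X = x]\varphi$, if $\varphi$ is a truth-functional tautology

H7. $[X = x \wedge Y = y](Y(u) = y)$

H8. $\langle X = x\rangle(W(u) = w \wedge Y(u) = y) \supset \langle X = x \wedge W = w\rangle(Y(u) = y)$

H9. $(\langle X = x \wedge Y = y\rangle(W(u) = w \wedge Z(u) = z) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y \wedge Z(u) = z)) \supset \langle X = x\rangle(W(u) = w \wedge Y(u) = y \wedge Z(u) = z)$, where W and Y are distinct; $Z = V \setminus (X \cup \{W, Y\})$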

The schemata H1-H4 are in effect a description of the general setup of causal

models. H1 expresses the idea that different values of a variable are mutually

exclusive. H2 expresses the idea that the values in R(Y) are exhaustive for every

Y. H3 expresses the idea that for each endogenous variable Y, there is a function that

maps each value configuration of other variables in the model to a value of Y, or in

other words, if we set the value for every other variable, there is one and only one

value (solution) for Y. H4 expresses the idea that different value configurations of

the exogenous variables correspond to different background conditions and can be

considered separately.

10 Halpern (2000, p. 326) used a slightly different but equivalent expression in the first conjunct of this

schema (his D9).


H5 and H6 are adapted from familiar principles in modal logic. The remaining

schemata H7-H9 are key principles for causal counterfactuals. H7 is known as the

principle of effectiveness and is clearly a version of (VC1c). H8 is a version of the

principle known as composition. In GP’s logic, the principle can be formulated in

terms of [] rather than $\langle\,\rangle$, and the [] version is clearly related to (VC5c). In

Halpern’s logic, however, the [] version is not valid.11 Just as (VC5c) is valid with

respect to Msfsc(S), the [] version of composition is valid with respect to Msfsc(S),

and will be a theorem in our system.

The schema H9 is a version of the principle known as reversibility. This principle

is not validated by the Stalnaker–Lewis semantics, and reveals the extra constraint

imposed by the causal semantics on the logic of counterfactuals. In GP’s logic, this

principle can be formulated in a much simpler way (without involving Z), but the

simplification is not valid in Halpern’s logic. As we will see, however, the

simplification is available in the system I am heading to.

For the logic with respect to Msfsc(S), we need only add two axioms to Halpern’s

system, one expressing that models are solution-ful (SF) and the other expressing

that models are solution-conservative (SC).

SF  $\bigvee_{y \in R(Y)} \langle X = x\rangle(Y(u) = y)$

SC  $(\langle X = x\rangle(W(u) = w) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y)) \supset \langle X = x\rangle(W(u) = w \wedge Y(u) = y)$

For convenience I will call Halpern’s system ALL, and call the system

ALL + SF + SC the system SFSC. Given Halpern’s proof that the system ALL is

sound and complete (for the language LCC(S)) with respect to Mall(S), it is quite

easy to establish the soundness and completeness of SFSC with respect to Msfsc(S),

with the help of the following two lemmas.

Lemma 4 Let M be any causal model over $S = \langle U, V, R\rangle$. All instances of SF in

LCC(S) are true in M if and only if M is solution-ful.

Proof (If) Suppose M is solution-ful. Then for every $X \subseteq V$, and every value configuration x of X and u of U, $M_{X=x}$ has a solution relative to u. So for every $Y \in V$, there is some $y \in R(Y)$ such that $\langle X = x\rangle(Y(u) = y)$ is true in M. Hence $\bigvee_{y \in R(Y)} \langle X = x\rangle(Y(u) = y)$ is true in M.

(Only if) Suppose M is not solution-ful. Then there exist $X \subseteq V$, a value configuration x of X and a value configuration u of U such that $M_{X=x}$ has no solution relative to u. That means for any $Y \in V$ and every $y \in R(Y)$, $\langle X = x\rangle(Y(u) = y)$ is false in M. Hence $\bigvee_{y \in R(Y)} \langle X = x\rangle(Y(u) = y)$ is false in M. Q.E.D.

Lemma 5 Let M be any causal model over $S = \langle U, V, R\rangle$. All instances of SC in

LCC(S) are true in M if and only if M is solution-conservative.

Proof (If) Suppose M is solution-conservative. We show that SC is true in M for every $X, Y \subseteq V$, $W \in V \setminus X$, and every value configuration x of X, y of Y, w of W, and u of U. Suppose $\langle X = x\rangle(W(u) = w) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y)$ is true in M. That means $M_{X=x}$ has a solution relative to u in which W = w, and $M_{X=x \wedge W=w}$ has a solution relative to u in which Y = y. Since M is solution-conservative, the solution to $M_{X=x \wedge W=w}$ is also a solution to $M_{X=x}$. Hence there is a solution to $M_{X=x}$ relative to u in which W = w and Y = y. So $\langle X = x\rangle(W(u) = w \wedge Y(u) = y)$ is also true in M. Therefore, all instances of SC are true in M.

11 Halpern (2000, p. 326) seemed to remark in passing that the [] version of composition is also valid in his logic, which, if I understood his remark correctly, was a mistake.

(Only if) Suppose M is not solution-conservative. That means there exist $X, Y \subseteq V$, $W \in V \setminus X$, and some value configuration x of X, y of Y, w of W, and u of U, such that $M_{X=x}$ has a solution relative to u in which W = w, which implies that $\langle X = x\rangle(W(u) = w)$ is true in M, but some solution to $M_{X=x \wedge W=w}$ is not a solution to $M_{X=x}$. Let V = v be a solution to $M_{X=x \wedge W=w}$ which is not a solution to $M_{X=x}$. Then $\langle X = x \wedge W = w\rangle(V(u) = v)$ is true in M, but $\langle X = x\rangle(W(u) = w \wedge V(u) = v)$ is false in M. We have thus an instance of SC that is false in M. Q.E.D.
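Lemmas 4 and 5 can be illustrated on small models. The sketch below is an illustration under stated assumptions rather than anything from the original development: models are finite, there are no exogenous variables, and the two failing models (X = 1 − Y; Y = X, and A = B; B = C; C = B) are hypothetical examples chosen here. The function names are likewise hypothetical.

```python
from itertools import product


def solutions(equations, ranges, do=None):
    """List every assignment that solves the model after the intervention `do`."""
    do = do or {}
    names = sorted(ranges)
    found = []
    for values in product(*(ranges[n] for n in names)):
        v = dict(zip(names, values))
        if all(v[n] == (do[n] if n in do else equations[n](v)) for n in names):
            found.append(v)
    return found


def diamond(equations, ranges, do, phi):
    """<X = x>phi: some solution to the intervened model satisfies phi."""
    return any(phi(s) for s in solutions(equations, ranges, do))


def sf_instance(equations, ranges, do, y_var):
    """SF: the disjunction over y in R(Y) of <X = x>(Y = y)."""
    return any(diamond(equations, ranges, do, lambda s, y=y: s[y_var] == y)
               for y in ranges[y_var])


def sc_instance(equations, ranges, do, w_var, w, y_var, y):
    """SC: (<X=x>(W=w) and <X=x, W=w>(Y=y)) implies <X=x>(W=w and Y=y)."""
    antecedent = (
        diamond(equations, ranges, do, lambda s: s[w_var] == w)
        and diamond(equations, ranges, {**do, w_var: w}, lambda s: s[y_var] == y)
    )
    consequent = diamond(equations, ranges, do,
                         lambda s: s[w_var] == w and s[y_var] == y)
    return (not antecedent) or consequent


two = {"X": (0, 1), "Y": (0, 1)}
three = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}

# Not solution-ful: X = 1 - Y; Y = X has no solution at all.
clash = {"X": lambda v: 1 - v["Y"], "Y": lambda v: v["X"]}
# Not solution-conservative: setting A = 0 admits (0, 1, 1), which does not solve M.
chain = {"A": lambda v: v["B"], "B": lambda v: v["C"], "C": lambda v: v["B"]}
# Solution-ful and solution-conservative: X = Y; Y = X.
loop = {"X": lambda v: v["Y"], "Y": lambda v: v["X"]}

print(sf_instance(clash, two, {}, "Y"))             # False
print(sc_instance(chain, three, {}, "A", 0, "B", 1))  # False
print(sf_instance(loop, two, {}, "Y"))              # True
print(sc_instance(loop, two, {}, "X", 0, "Y", 0))   # True
```

In the second model, A might be 0 with no intervention, and B might be 1 once A is set to 0, but no solution to the unintervened model has A = 0 and B = 1; that is exactly the pattern used in the (Only if) direction of Lemma 5.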

I now build on Halpern’s proof (of Theorem 3.3 in Halpern 2000) to show that

the system SFSC is sound and complete with respect to Msfsc(S). Since Halpern’s

completeness proof employs the familiar method of canonical models, I follow the

standard strategy to extend his proof (for a textbook demonstration of this strategy,

see e.g., Hughes and Cresswell 1996, ch. 6).

Theorem 6 Let $S = \langle U, V, R\rangle$ be any signature. The system SFSC for the

language LCC(S) is sound and complete with respect to Msfsc(S).

Proof Halpern has proved that the system ALL is sound and complete with

respect to Mall(S) (Halpern 2000, Theorem 3.3). The soundness of

SFSC = ALL + SF + SC immediately follows from his soundness result, Lemma

4, and Lemma 5.

For completeness, I will rely on Halpern’s completeness proof, which employs the method of canonical models. The idea is that for any formula $\varphi$ in LCC(S) consistent with the system SFSC, $\{\varphi\}$ can be extended to a maximally SFSC-consistent set of formulas $\Gamma$. From $\Gamma$ we can construct a causal model M over S such that for every formula $\psi$ in LCC(S), $\psi \in \Gamma$ if and only if $\psi$ is true in M. Halpern’s proof of this fact for the system ALL carries over with no change to the system SFSC. What remains to be shown is just that the constructed model M belongs to the desired class, i.e., that M is solution-ful and solution-conservative. But that follows from Lemmas 4 and 5: since all instances of SF and SC are theorems of SFSC, they must belong to $\Gamma$, and hence are true in M, which, by Lemmas 4 and 5, implies that M is solution-ful and solution-conservative. Thus every formula consistent with the system SFSC is satisfiable in some causal model in Msfsc(S). Q.E.D.

As already mentioned, for the system SFSC, the axiom schema H9 can be

replaced by a simpler and more elegant form of reversibility.

H9* $(\langle X = x \wedge Y = y\rangle(W(u) = w) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y)) \supset \langle X = x\rangle(W(u) = w \wedge Y(u) = y)$


To see this, first note that H9* is valid with respect to Msfsc(S). Here is a proof. For any $X \subseteq V$ and two distinct variables $Y, W \in V \setminus X$, and any value configuration x of X, y of Y, w of W, and u of U, suppose $\langle X = x \wedge Y = y\rangle(W(u) = w) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y)$ is true in a model M in Msfsc(S). That means $M_{X=x \wedge Y=y}$ has a solution relative to u consistent with W = w, and $M_{X=x \wedge W=w}$ has a solution relative to u consistent with Y = y. Let V = v be a solution to $M_{X=x \wedge Y=y}$ relative to u consistent with W = w. It is then also a solution to $M_{X=x \wedge Y=y \wedge W=w}$ relative to u. Since M is solution-conservative and $M_{X=x \wedge W=w}$ has a solution consistent with Y = y, it follows that V = v is also a solution to $M_{X=x \wedge W=w}$ relative to u. The fact that V = v is both a solution to $M_{X=x \wedge Y=y}$ and a solution to $M_{X=x \wedge W=w}$ relative to u implies that it is a solution to $M_{X=x}$ relative to u, because every equation in $M_{X=x}$ appears in $M_{X=x \wedge Y=y}$ or $M_{X=x \wedge W=w}$ (or both). Since V = v is consistent with both Y = y and W = w, $\langle X = x\rangle(W(u) = w \wedge Y(u) = y)$ is also true in M.

Therefore, if we replace H9 with H9* in the system SFSC, the system is still

sound. To show that it is also complete, it suffices to derive H9 in the new system.

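Using H5, it is easy to derive that

$$
\begin{aligned}
&(\langle X = x \wedge Y = y\rangle(W(u) = w \wedge Z(u) = z) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y \wedge Z(u) = z)) \\
&\quad \supset (\langle X = x \wedge Y = y\rangle(W(u) = w) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y))
\end{aligned}
\tag{1}
$$

Combined with H9*, we can further derive

$$
\begin{aligned}
&(\langle X = x \wedge Y = y\rangle(W(u) = w \wedge Z(u) = z) \wedge \langle X = x \wedge W = w\rangle(Y(u) = y \wedge Z(u) = z)) \\
&\quad \supset \langle X = x\rangle(W(u) = w \wedge Y(u) = y)
\end{aligned}
\tag{2}
$$

Moreover, it is an instance of H8 that

$$
\langle X = x \wedge Y = y\rangle(W(u) = w \wedge Z(u) = z) \supset \langle X = x \wedge Y = y \wedge W = w\rangle(Z(u) = z)
\tag{3}
$$

And from SC we can easily derive

$$
\begin{aligned}
&(\langle X = x\rangle(W(u) = w \wedge Y(u) = y) \wedge \langle X = x \wedge Y = y \wedge W = w\rangle(Z(u) = z)) \\
&\quad \supset \langle X = x\rangle(W(u) = w \wedge Y(u) = y \wedge Z(u) = z)
\end{aligned}
\tag{4}
$$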

H9 obviously follows from (2), (3), and (4).

The [] version of reversibility, however, is not a theorem of SFSC. A simple

counter-model to the [] version is the model with two variables and the structural

equations: X = Y; Y = X. The [] version is of course valid in GP’s logic. Just as

there is no difference between ‘would’ counterfactuals and ‘might’ counterfactuals

(for entertainable antecedents) in Stalnaker’s logic, there is no difference between []

and $\langle\,\rangle$ in GP’s logic.

To reach GP’s logic, we need to add a special case of (Sc): $[X = x](Y(u) = y) \vee [X = x]\neg(Y(u) = y)$, to SFSC. GP’s system stands to SFSC in the same way as Stalnaker’s system stands to Lewis’s.
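Both remarks can be checked on the two-variable model X = Y; Y = X. The sketch below assumes no exogenous variables and takes the [] version of reversibility to be H9* with each 'might' operator replaced by the corresponding 'would' operator; that reading, like the function names, is an assumption made for illustration. The excluded-middle check uses the instance with an empty intervention and the atom X = 0.

```python
from itertools import product


def solutions(equations, ranges, do=None):
    """List every assignment that solves the model after the intervention `do`."""
    do = do or {}
    names = sorted(ranges)
    found = []
    for values in product(*(ranges[n] for n in names)):
        v = dict(zip(names, values))
        if all(v[n] == (do[n] if n in do else equations[n](v)) for n in names):
            found.append(v)
    return found


def box(equations, ranges, do, phi):
    """'Would': phi holds in every solution to the intervened model."""
    return all(phi(s) for s in solutions(equations, ranges, do))


def diamond(equations, ranges, do, phi):
    """'Might': phi holds in some solution to the intervened model."""
    return any(phi(s) for s in solutions(equations, ranges, do))


def implies(p, q):
    return (not p) or q


ranges = {"X": (0, 1), "Y": (0, 1)}
loop = {"X": lambda v: v["Y"], "Y": lambda v: v["X"]}


def x0(s):
    return s["X"] == 0


def y0(s):
    return s["Y"] == 0


def both(s):
    return x0(s) and y0(s)


# H9* with the empty intervention, W read as X and Y read as Y.
h9_star = implies(
    diamond(loop, ranges, {"Y": 0}, x0) and diamond(loop, ranges, {"X": 0}, y0),
    diamond(loop, ranges, {}, both),
)
# The same instance with 'would' in place of 'might'.
box_reversibility = implies(
    box(loop, ranges, {"Y": 0}, x0) and box(loop, ranges, {"X": 0}, y0),
    box(loop, ranges, {}, both),
)
# The special case of conditional excluded middle for the atom X = 0.
excluded_middle = (box(loop, ranges, {}, x0)
                   or box(loop, ranges, {}, lambda s: not x0(s)))

print(h9_star)            # True
print(box_reversibility)  # False
print(excluded_middle)    # False
```

The 'would' instance fails because the unintervened model also has the solution (1, 1), and the same second solution is what falsifies the excluded-middle instance.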



Conclusion

Modifiable structural equation models provide a powerful framework to model

causation and intervention, based on which a conception of counterfactual reasoning

as reasoning about consequences of interventions has been rigorously developed. In

this paper, I aimed to deepen our understanding of the connection between this

semantic framework based on causal models and the popular possible-world

semantics for counterfactuals. In particular, I have provided a precise character-

ization of the causal models that validate Lewis’s well-known logical principles for

counterfactuals. The characterization delineates a class of causal models that yields

the weakest Lewisian logic of causal counterfactuals. A sound and complete

axiomatization of this logic was also given.

The claim of being ‘‘weakest’’ is of course relative to the present definition of

causal models. That definition may be further generalized. In particular, as Halpern

(2000) and Pearl (2009) suggested, the requirement that $f_X$ be a function for each

X may be relaxed. In other words, there may not be a unique solution for some

variable X even when all other variables in the model have been fixed. Halpern

conjectured that his results could be straightforwardly extended to the more general

setup. It also seems to me that my characterization of Lewisian causal models will

survive that generalization without essential change.

In addition, there are also different ways to use causal models to evaluate

counterfactuals. The way I have followed in this paper is natural and especially

influential, but there exists an alternative theory of counterfactuals formulated in a

similar causal model framework (Hiddleston 2005). It is not yet clear what logic of

counterfactuals results from the alternative theory.

The class of Lewisian causal models identified in this paper was motivated and

studied in a purely formal way, as my main purpose was to improve our

understanding of the formal connections between the two approaches to interpreting

counterfactuals. However, it seems probable to me that there may be other,

philosophical or practical, motivations for this class of models. At any rate, for

people who take both Lewis’s logic and the causal model approach seriously, there

is every reason to explore in more detail the philosophical implications and/or

justifications for the Lewisian restrictions on causal models.

Acknowledgments I thank Lam Wai Yin for helpful discussions on issues related to this article, and the

audiences of a seminar at Carnegie Mellon University for useful feedback. My research was supported in

part by the Research Grants Council of Hong Kong under the General Research Fund LU341910.

References

Ellis, B., Jackson, F., & Pargetter, R. (1977). An objection to possible-world semantics for counterfactual

logics. Journal of Philosophical Logic, 6, 355–357.

Fisher, F. M. (1970). A correspondence principle for simultaneous equation models. Econometrica, 38,

73–92.

Galles, D., & Pearl, J. (1998). An axiomatic characterization of causal counterfactuals. Foundations of Science, 3, 151–182.

Ginsberg, M. L. (1986). Counterfactuals. Artificial Intelligence, 30, 35–79.


Ginsberg, M. L., & Smith, D. E. (1987). Reasoning about action I: A possible worlds approach. In F.

M. Brown (Ed.), The frame problem in artificial intelligence (pp. 233–258). Los Altos, CA: Morgan

Kaufmann.

Halpern, J. Y. (2000). Axiomatizing causal reasoning. Journal of Artificial Intelligence Research, 12,

317–337.

Harel, D. (1979). First-order dynamic logic. Berlin & New York: Springer.

Hiddleston, E. (2005). A causal theory of counterfactuals. Nous, 39, 632–657.

Hughes, G. E., & Cresswell, M. J. (1996). A new introduction to modal logic. London & New York:

Routledge.

Lewis, D. (1973). Counterfactuals. Oxford: Blackwell.

Lewis, D. (1977). Possible-world semantics for counterfactual logics: A rejoinder. Journal of Philosophical Logic, 6, 359–363.

Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–710.

Pearl, J. (1998). Graphs, causality, and structural equation models. Sociological Methods and Research, 27, 226–284.

Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge, UK: Cambridge

University Press.

Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). Cambridge,

MA: MIT Press.

Stalnaker, R. (1968). A theory of conditionals. In N. Rescher (Ed.), Studies in logical theory (pp. 98–112).

Oxford: Blackwell.

Stalnaker, R., & Thomason, R. H. (1970). A semantic analysis of conditional logic. Theoria, 36, 23–42.

Strotz, R. H., & Wold, H. O. A. (1960). Recursive versus nonrecursive systems: An attempt at synthesis.

Econometrica, 28, 417–427.

Winslett, M. (1988). Reasoning about action using a possible worlds approach. In Proceedings of the Seventh American Association of Artificial Intelligence Conference (pp. 89–93).

Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford & New York:

Oxford University Press.

Woodward, J., & Hitchcock, C. (2003). Explanatory generalizations, part I: A counterfactual account.

Nous, 37, 1–24.

