
Critique

Issue 1, Michaelmas 2016

Published by the Durham University Undergraduate Philosophy Society

Editor’s Preface

I would like to welcome you all to the newly launched (rebranded) undergraduate philosophy journal, Critique, published by the Durham University Undergraduate Philosophy Society. It is a great honour (not to mention a pleasure) to get to read and publish some of the finest work our undergraduates have to offer. I have tried to keep this first edition short in order to highlight the main focus of the journal: the publication of student work which deserves to reach a wider audience of students and academics, and might not otherwise do so. Below, you will find discussions of causality, depression and possible worlds. Aside from the spectrum of topics covered, the style reflects another well-entrenched virtue in the Philosophy department at Durham: methodological pluralism. Or, to avoid the jargon, an open-minded approach to philosophising. I was struck by this incisive passage by Stanley Cavell while receiving my first submissions to the journal. It is a long quote, but one well worth reading more than once:

“[I]t seems to me commonly assumed among the serious philosophers I know that when they look into a new article they will find not merely a number of more or less annoying errors, but that they will find the whole effort fundamentally wrong, in sensibility or method or claim. Even when it is good—that is, when it contains one interesting or useful idea—the interest or usefulness cannot simply be taken over as it stands into one’s own thought, but will require independent development or justification from within one’s own procedures. It often happens that what makes an article or passage famous is its enunciation of a thesis which the profession is fully prepared to annihilate. The refuting of Mill on “desirable,” or Moore on “indefinable,” or Wittgenstein on “private language,” have become private industries, established more than one living. These can be disheartening facts, especially among the young who are entering the profession and still deciding whether it can support life—as though the profession as a whole has forgotten how to praise, or forgotten its value.”1

I hope, in the short time I will be in charge of this publication, that this journal can serve as praise for those published and as a demonstration of the value of philosophising for those lucky enough to read it. Looking to the future, (against Cavell’s disheartening facts and in line with the pluralism I spoke about above) I will be interviewing local lecturers to document and disseminate their exciting work to a wider audience, so it is worth keeping an eye on our Twitter for updates on future issues: @DUPS_Critique.

Finally, I will end with some administrative duties. Any and all requests for reproduction of the work contained in this journal should be addressed to [email protected]. Any submissions for future issues should also be sent to the same e-mail address. The criteria for submission of essays are that they are under 5000 words and have received a first; for other contributions (poems, short stories etc.) the word limit is the only criterion.

Thank you for reading; I hope you enjoy it.

Nathan Davies

1 Cavell, Stanley. (1976) ‘Must we mean what we say?’, Cambridge University Press. pp.xx-xxi


Contents

“Why is there something rather than nothing?” Is this a meaningful question? Do you think this question is answerable?

Daniel Foggin

Depression and the Phenomenology of Intersubjectivity: A Gadamerian approach to depression

Constantin Mehmel

Illustrating Pearl’s Approach to Causality by Examples, and Responding to Cartwright’s Criticisms

Kim Tullar


“Why is there something rather than nothing?” Is this a meaningful question? Do you think this question is answerable?

Daniel Foggin

Introduction

In this essay I will begin by arguing that “why is there something rather than nothing?” (WSR) is a meaningful question to ask, as it cannot be dismissed on the grounds of being senseless, dispensable or insoluble.1 The absence of these three defects is what I offer as my definition of meaningfulness (and in the process I forgo the question of answerability). I will then present possible answers to the question, focussing on the egalitarian theories of Robert Nozick and Peter van Inwagen. I will argue that the probability distributions they wish to use are not valid for infinitely many possible worlds in their current formulation, and conclude that if an egalitarian theory is to succeed in convincing us of why there is something rather than nothing, it must work with a large but bounded number of possible worlds.

Is the question meaningful?

There are many ways by which we can determine whether an utterance, such as an utterance of WSR, is meaningful or not, though I do not wish to wade through the theories here. Instead, I will outline A. R. Lacey's argument to show that WSR should not be dismissed at face value, and assume that this is in and of itself an appropriate ground to claim that the question can be asked meaningfully. In Robert Nozick, Lacey claims that WSR is “rarely discussed by philosophers, partly because it is often assumed to be either senseless, or dispensable, or insoluble. It might be thought senseless by someone who thought ... that we cannot understand a question unless we know what would count as an answer to it”.2

While we do not know what would definitively count as an answer to such a question, there has been sufficient investigation into what might count for it to not seem senseless. Indeed, if it were senseless, we would struggle to even comprehend what was being asked (as opposed to feigning confusion or calling for ‘therapeutic’ charges of senselessness). He continues “the question might be thought dispensable if it were a necessary fact that there is something, although this would need to be shown”.3 Since it is not apparent that it is necessary that there is something, the meaningfulness of the question cannot be dismissed on the grounds of dispensability. Or, if it is a necessary fact that there is something, it does not seem immediately apparent, and an explanation of such a fact would then guarantee that the question was meaningful.

1 (Lacey, 2001)

2 (Ibid, p.177)

3 (Ibid)


Finally, “if a problem is known to be insoluble, like that of squaring the circle, or trisecting an angle using only ruler and compasses, we would indeed waste our time trying to solve it, but might still be left philosophically puzzled and unwilling to dismiss it as senseless”.4 In this last case, the question does not seem initially insoluble as it is not paradoxical. Furthermore, it has not been proved that it is impossible for an answer to exist, in the way that the impossibility of Lacey's examples has been proved. Trying to prove that an answer to this question is not possible would still require a level of philosophical investigation that we can understand to be meaningful. The investigation that showed the problem of squaring the circle to be insoluble resulted in the discovery that $\pi$ is not only irrational but transcendental, by no means a trivial or meaningless result. I will therefore maintain that this question fulfils the definition of meaningfulness because it has been shown that it isn’t senseless or dispensable, and if it is insoluble it’s not altogether clear that this means it isn’t worth trying to answer it (which would presumably be the only interesting practical consequence of labelling it ‘unanswerable’). Given this latter claim I will adopt a liberal attitude toward attempted ‘answers’ to the question in order to see what we can learn from discussing them. So I will now move straight to discussing proposed ‘answers’ to the question.

Possible answers

Robert Nozick's5 first discussion of answers to WSR concerns inegalitarian theories: “they hold that one situation or a small number of states $N$ are natural or privileged and in need of no explanation”.6 This concept of holding certain states of affairs as natural is common among theories such as Newtonian mechanics, where rest or uniform motion in a straight line is the natural state.7 Such theories are “especially well geared to answer questions of the form ‘why is there X rather than Y’”.8 It is understandable that, in trying to answer the title question, we would first explore an inegalitarian theory, as the question is of a form that presupposes an inegalitarian state of affairs: embedded in the structure of “why is there something rather than nothing?” is the supposition that it is unusual that there is something, as if we would expect nothing in the absence of any ‘force’ (or reason). Nozick's exploration of how something could be produced from a privileged nothing state9 has been widely dismissed, with Smith describing it as “quite absurd, standing as little more than an imaginative play on words”.10 Inegalitarian theories will always rely on states that require no explanation, which can easily be seen as problematic, given that we would need to be able to explain why certain states require no explanation. It is for this reason that an egalitarian theory is considered.

Nozick creates his egalitarian theory by applying “a version of the principle of indifference from probability theory”11 to ways that might obtain. The principle of indifference, according to Keynes, who coined the term, “asserts that if there is no known reason for predicating of our subject one rather than another of several alternatives, then ... the assertions of each of these alternatives have an equal probability”.12 Nozick implicitly commits himself to the claim that it is unknown to us how it is decided what obtains, as this is a precondition for the application of the indifference principle.

4 (Ibid)

5 (1981)

6 (Nozick, 1981, p.121)

7 (Ibid)

8 (Ibid)

9 (1981, p.122)

10 (1987, p.6)

11 (Nozick, 1981, p.127)

Nozick claims that “there are many ways $w_1, w_2, \ldots$ for there to be something, but there is only one way $w_0$ for there to be nothing”13, and then asks us to “assign equal probability to each alternative possibility $w_i$ assuming it is a completely random matter which one obtains”.14 However, depending on how we are to interpret “many ways $w_1, w_2, \ldots$”, Nozick's proposed probability distribution could be problematic. It is possible that the ‘many ways’ Nozick is referring to number some arbitrarily large $N$, which we may not know, but which we could at least bound. If this is the case, then $P(w_i)$, the probability of way $w_i$ being the way that obtains, is equal to $1/N$, as we are assigning equal probability to $N$ many ways, which is well defined under the axioms of probability. (A simple illustration of uniform probability over $N$ many ways can be given by considering a fair six-sided die, where the probability of a certain face landing face-up is $1/6$, as there are 6 ways.)
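As a minimal sketch of the bounded case, the following Python snippet assigns probability 1/N to each of N ways using exact fractions and checks that the probabilities sum to 1. The helper name `uniform_distribution` is hypothetical and introduced only for illustration; it does not come from Nozick's text.

```python
from fractions import Fraction


def uniform_distribution(n_ways: int) -> list[Fraction]:
    """Assign equal probability 1/N to each of N ways (hypothetical helper)."""
    if n_ways <= 0:
        raise ValueError("the number of ways must be a positive whole number")
    return [Fraction(1, n_ways) for _ in range(n_ways)]


# A fair six-sided die: six ways, each with probability 1/6.
die = uniform_distribution(6)
assert all(p == Fraction(1, 6) for p in die)

# The second axiom holds: the probabilities sum to exactly 1.
assert sum(die) == 1
```

Exact fractions are used rather than floating-point numbers so that the check against 1 is not affected by rounding.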

However, the common mathematical interpretation of ‘$w_1, w_2, \ldots$’ would be that there are infinitely many ways. (Indeed, if Nozick meant to refer to some arbitrarily large number, $N$, of possible worlds, the more conventional notation would be ‘$w_1, w_2, \ldots, w_n$’.) In this instance, assigning equal probability to each alternative is not well defined. Intuitively, using $P(w_i) = 1/N$, we find that the probability of the i-th way-the-world-could-be being the one which actually obtains is equal to zero. This is because we would have

$$\lim_{N \to \infty} P(w_i) = \lim_{N \to \infty} \frac{1}{N} = 0.$$

(This limit notation is necessary to speak meaningfully about infinity. The first equality is true by our definition of the probability of it being the i-th way that obtains, and the second equality is a basic result from analysis.)15

12 (1929, p.42)

13 (1981, p.127)

14 (Ibid)

15 See (Sutcliffe, 2014, p.31) for a discussion of the basic result.

Before offering a more comprehensive proof of why it is not possible to distribute probability in this way, there are some assumptions being made about the nature of ways (that might obtain) that are worth justifying, though I believe that Nozick would have no reason to disagree with these assumptions. Firstly, ways are being treated as discrete. ‘Discrete’ is being used to say that there can only be a whole number of ways, not that different ways have no similar properties (they are numerically distinct, not necessarily distinct in content). This is because it is not sensible to speak of there being ‘eleven-and-a-half ways’ that might obtain (a fractional number of ways), or 4√2 ways that might obtain (an irrational number of ways). Secondly, it is not sensible to speak of $-17$ ways that might obtain (a negative number of ways) despite $-17$ being understandable as a ‘whole’ number of ways. Because of this, the only meaningful value we can assign to the number of ways that might obtain ($n$) is a natural number, i.e. $n \in \mathbb{N}$. (Here I will understand the set of natural numbers as $\{0, 1, 2, \ldots\}$. I adopt the non-negative integers because Nozick uses ‘0’ in his subscript to denote the nothing-way, even though the ‘nothing-way’ counts as one way the world could be, which is different from saying there are no ways the world could be. Nothing hinges on how we assign the symbols to the ways.16)

It is worth making clear my motivation for understanding the number of ways that might obtain as the naturals $\mathbb{N}$, rather than any other set. Uniform probability over any set reaching from $-\infty$ to $\infty$, including the reals $\mathbb{R}$, is not well defined. However, uniform probability over a compact interval $[a, b]$ is well defined, and because the reals are dense, there are infinitely many elements in such an interval whenever $a < b$. It may be possible to map an infinite number of ways that might obtain, or possible worlds, onto such a compact interval where the probabilities at hand are well defined. However, it is clear that Nozick has made no such attempt and so I shall now formulate a more comprehensive proof of why the probability distribution he has actually proposed is not well defined. The axioms of probability are given by Borovkov as follows17:

1. $P(\omega) \geq 0$

2. $P(\Omega) = 1$

3. $P\left(\bigcup_{i=0}^{\infty} \omega_i\right) = \sum_{i=0}^{\infty} P(\omega_i)$

Where $\Omega$ is the entire sample space.

In our case, $\Omega$ is the set of all possible ways $w_0, w_1, w_2, \ldots$ that might obtain. In asking us to assign equal probability to each alternative possibility, Nozick is committing us to a probability distribution that violates the axioms of probability as follows:

• By the principle of indifference, let the probability that the i-th possible world obtains, $P(w_i)$, be such that $P(w_i) = p$, for some $p \geq 0$ (in accordance with the first axiom).

• We know that $\Omega = \{w_0, w_1, w_2, \ldots\} = \bigcup_{i=0}^{\infty} w_i$ (this is true analytically; they are merely different ways of writing the same set).

• By the third axiom, $P\left(\bigcup_{i=0}^{\infty} w_i\right) = \sum_{i=0}^{\infty} P(w_i) = p + p + p + \cdots$ (again, this is true analytically; the sigma notation being used here simply says that we should sum all of the individual probabilities).

• We now have two cases to consider:

If $p > 0$, $p + p + p + \cdots = \infty$

If $p = 0$, $p + p + p + \cdots = 0$

16 This treatment was prompted by a discussion with the editor. The editor would also like to apologise for the pun.

17 (2013, p.13)


Neither situation is able to fulfil the second axiom (that is, there does not exist a $p \geq 0$ such that $p + p + p + \cdots = 1$), meaning the distribution is not valid. It is important to note here that I am not claiming that conventional axioms of probability hold over all possible worlds. However, since Nozick’s egalitarian argument18 relies on conventional probability axioms to arrive at its conclusion, this invalid distribution is problematic.
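The two cases can also be checked numerically. The following is a minimal Python sketch and not part of Nozick's or Borovkov's texts; the function names `terms_until_sum_exceeds_one` and `partial_sum` are hypothetical helpers introduced for illustration. It shows that for any constant p > 0 the partial sums of p + p + p + ⋯ pass 1 after finitely many terms, so the series cannot sum to 1, while for p = 0 every partial sum stays at 0. Finite partial sums only illustrate the argument; they do not replace it.

```python
import math
from fractions import Fraction


def partial_sum(p: Fraction, k: int) -> Fraction:
    """Sum of the first k terms of the constant series p + p + p + ..."""
    return k * p


def terms_until_sum_exceeds_one(p: Fraction) -> int:
    """Smallest k for which k * p > 1, assuming a constant probability p > 0."""
    if p <= 0:
        raise ValueError("p must be strictly positive")
    return math.floor(1 / p) + 1


# Case p > 0: however small p is, finitely many terms already exceed 1.
tiny = Fraction(1, 10**6)
k = terms_until_sum_exceeds_one(tiny)
assert partial_sum(tiny, k) > 1
assert partial_sum(tiny, k - 1) <= 1

# Case p = 0: every partial sum is 0, so the total can never reach 1.
assert partial_sum(Fraction(0), 10**9) == 0
```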

Peter van Inwagen also explores possible answers to the question in ‘Why is there anything at all?’, arriving at an egalitarian theory similar to Nozick’s whose four premises are as follows19:

(i) There are some beings;

(ii) If there is more than one possible world, there are infinitely many;

(iii) There is at most one possible world in which there are no beings;

(iv) For any two possible worlds, the probability of their being actual is equal.

van Inwagen explicitly states that infinitely many possible worlds are part of his argument, leaving his egalitarian theory open to the same probabilistic criticism that Nozick’s has just been subjected to. However, despite this, van Inwagen provides a good response20 to a further criticism from probability theory. Allow it to be assumed that the axioms of probability are not violated as they were before. The sum of the probabilities of all possible ways is such that $\sum_{i=0}^{\infty} P(w_i) = 1$ and the probability of a specific way $w_i$ being the way that obtains is still $P(w_i) = 1/N$. Then the probability that a something-way obtains is $1 - P(w_0)$, where $w_0$ is the single nothing-way. As $N$ grows arbitrarily large, the probability that a something-way obtains tends to 1, as $P(w_0)$ tends to 0. But then do we not have the problem that the probability of any particular way obtaining is also equal to zero? van Inwagen uses a dart board analogy to illustrate this obscure result: “the probability of a dart’s hitting any particular point on a dart board is 0”21, yet it is obvious that it is not impossible for a dart to hit a dart board. This is because an event having probability equal to zero is not equivalent to that event being impossible. Analogously, the probability of any specific possible world being the one that obtains is 0, but this is not to say that it is impossible for anything to obtain. If we were to formulate an egalitarian theory where possible worlds are mapped to an interval of $\mathbb{R}$ as suggested earlier, this could explain how it is that anything can obtain at all.22

18 (1981, p.127)

19 (1996, pp.95-96)

20 (1996, p.99)

21 (Ibid)

22 I will not pursue this line of thought in this criticism.

However, van Inwagen’s egalitarian theory of why there is anything at all is still mathematically invalid because his premises guarantee there being infinitely many worlds with equal probability of their being actual. If there is to be an egalitarian theory that holds water, then, it is one where the number of ways that might obtain is bounded. This bound cannot simply be arbitrary if it is to exist. van Inwagen gives the following defence of his second premise: “it may be pointed out that if there is more than one possible world, then things can vary; and it seems bizarre to suppose, given the kinds of properties had by the things we observe, properties that seem to imply a myriad of dimensions along which these things could vary continuously, that there might be just two or just 17 or just 510 worlds”.23 A priori, it does seem as though infinitely many possible worlds is the more believable option. However, it may be possible to ascertain our bound by other means. Cliff has spoken about24 the input of theoretical physics in the discussion of why there is something rather than nothing: “It has been estimated that there are $10^{500}$ different versions of string theory. Each one would describe a different universe with different laws of physics”.25 Even though $10^{500}$ is an incomprehensibly large number, it would still successfully act as a bound to the number of possible worlds that could then be used in egalitarian probability calculations.
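To make the bounded calculation concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the bound N counts every way that might obtain, including exactly one nothing-way, and that all N ways are equally probable; the function name `probability_of_something` is hypothetical. Python's arbitrary-precision integers allow the figure quoted from Cliff to be handled exactly.

```python
from fractions import Fraction


def probability_of_something(n_ways: int) -> Fraction:
    """Probability that some something-way obtains, i.e. 1 - P(w_0).

    Assumes a uniform distribution over n_ways ways, exactly one of
    which is the nothing-way w_0 (an illustrative assumption).
    """
    if n_ways < 1:
        raise ValueError("there must be at least one way")
    return 1 - Fraction(1, n_ways)


# With a bound of 10^500 ways, 'something' is overwhelmingly probable
# but its probability is still strictly less than 1.
p = probability_of_something(10**500)
assert 0 < p < 1
assert 1 - p == Fraction(1, 10**500)
```

Unlike the infinite case, every individual way here keeps a small but non-zero probability, and the probabilities still sum to 1.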

Conclusion

To conclude, WSR is certainly a meaningful question as far as the definition offered here is concerned. It has been argued that given our lack of knowledge regarding how it is decided which way obtains, an egalitarian theory is more convincing than an inegalitarian one. However, there are some serious probabilistic issues with infinite numbers of possible worlds, an assumption made use of in both Nozick’s and van Inwagen’s answers. Ultimately, if an egalitarian argument is to succeed in convincing us that it is more probable that there is something than nothing, then it is to do so by taking there to be a large, but bounded, number of possibilities.

23 (1996, p.101)

24 (2015)

25 (Found at 8:27 in 2015)


Bibliography

Borovkov, Alexander A. (2013) ‘Probability Theory’, London: Springer.

Cliff, Harry. (2015) ‘Have we reached the end of physics?’, (Online) Available at: http://www.ted.com/talks/harry_cliff_have_we_reached_the_end_of_physics/transcript?language=en (Accessed 2 March 2016)

van Inwagen, Peter, and Lowe, E. J. (1996) ‘Why is there anything at all?’, Proceedings of the Aristotelian Society, Supplementary Volumes, Vol. 70, pp.95-120.

Keynes, J. M. (1929) ‘Chapter IV: The Principle of Indifference’, in A Treatise on Probability. London: Macmillan, pp.41-64.

Lacey, A. R. (2001) ‘Chapter VII: Metaphysics II: Explaining Existence’, in Robert Nozick. Chesham: Acumen, pp.177-187.

Nozick, Robert. (1981) ‘Why is there something rather than nothing?’, in Philosophical Explanations. Oxford: Clarendon Press, pp.115-164.

Smith, Joseph Wayne. (1987) ‘Essays on Ultimate Questions’, Aldershot: Avebury.

Sutcliffe, Paul. (2014) ‘Calculus’, (Online) Available at: http://www.maths.dur.ac.uk/~dma0pms/calc/notes.pdf (Accessed 2 March 2016)


Depression and the Phenomenology of Intersubjectivity: A Gadamerian approach to depression

Constantin Mehmel

Introduction

This paper attempts to sketch a phenomenological account of impaired intersubjectivity in depression. Depression, I propose, can be framed as a ‘dialogical’ illness in that it fundamentally alters the way one relates to other people and the presupposed shared background. I therefore argue that depression entails what I call an altered ‘experience of the Other’. In order to understand how depression alters the phenomenology of intersubjectivity, I draw on Gadamer’s phenomenology of understanding via the fusion of horizons. I begin by sketching a Gadamerian perspective of an intact dialogue between two people. The rest of the paper is then dedicated to understanding the differing forms of dialogue that occur in depression.

Gadamer on Dialogue

In order to portray an intact experience of the Other, and the phenomenology of intersubjectivity more generally, we need to set out how understanding normally takes place between two people. For, I suggest, our experience of the Other (hereafter synonymous with ‘another person’) is inextricable from coming to understand the Other and her claim regarding the mutual subject matter at hand. In other words, a failure in understanding can explain our diminished experience of the Other, something key to depression as I will show later on. To establish such an intact dialogue, we can turn to Hans-Georg Gadamer and his phenomenology of understanding via the fusion of horizons. According to Gadamer, the starting point for any dialogue between two people is that each interlocutor enters the dialogue from within a unique horizon. Denoting “the range of vision that includes everything that can be seen from a particular vantage point”1, a horizon structures one’s experience of the Other. I bring along certain “tacit expectations of meaning and truth”2, through which I perceive the Other and her claim regarding the subject matter. In light of this, it would be wrong to understand a horizon as a necessarily restrictive force. Although it does limit our perception of possibilities, it provides at the same time the very conditions whereby we can experience the Other in the first place.3 In fact, a horizon is not closed off, but rather open towards new experiences. As Gadamer puts it, “[a] horizon is not a rigid boundary but something that moves with one and invites one to advance further”.4 Whenever I experience something new, my horizon expands.

1 (GW1, p.307; TM, p.301). References to primary sources by Gadamer are given according to abbreviations listed in the bibliography.

2 (Garrett, 1978, p.393)

3 see (GW2, p.224; PH, p.9)

4 (GW1, p.250; TM, p.238)


Underlying such openness, we can identify a more far-reaching claim that my horizon does not exist independently from the Other’s horizon, but rather that both belong to a more fundamental, shared horizon.5

Gadamer thus appears to advance the Heideggerian notion that we are always already in relation with others, something that is crucial for our project of a phenomenology of intersubjectivity. Although both interlocutors have a unique horizon and thus experience the subject matter differently, they are nonetheless attuned to each Other. This holds true regardless of whether or not the different perspectives lead to a disagreement regarding the subject matter. Two people might experience things differently – and in that sense ‘disagree’ – and yet such divergence is only possible against the backdrop of a presupposed shared background.6 Any dialogue therefore occurs within what we might call a shared, intersubjective horizon in the sense that both parties are already united by it: “I may say ‘Thou’, and I may refer to myself as over against a Thou, but a common understanding always precedes these situations”.7 Hence, Gadamer concludes, the “formulation ‘I and Thou’ already betrays an enormous alienation”, since “there is neither the I nor the Thou as isolated, substantial realities”.8

We can therefore extract from Gadamer’s work the view that any two people conversing with each Other do not exist as two isolated realities. Rather, they share in a mutually constituted interpersonal reality, which again is constitutive of their respective outlook onto the world. In fact, this wider interpersonal horizon can be understood as a ‘transcendental condition’ in that, without it, the acquisition of propositional knowledge about the Other would be impossible, and would thus leave the structure of experiencing the Other compromised. In other words, such presupposed shared background makes it possible for the two people entering a dialogue to come to an understanding. Both of their horizons can fuse to a third more-encompassing one, the process of which Gadamer calls ‘fusion of horizons’.

However, simply being attuned to each Other is not sufficient for what we might call a ‘successful’ fusion of horizons, where appreciating the Other and her experience is appreciating it as unique and thus hers. Gadamer emphasises a fundamental openness that needs to be present in a dialogue, without which “there is no genuine human bond”.9

Such mutual openness involves a willingness to be transformed by the Other and thus what Gadamer calls the ‘fore-conception of completeness’, namely that both interlocutors assume each Other’s claim to be meaningful and true.10 For only if we deem the Other a possible dialogue partner can we give her enough space to articulate herself, hence acknowledging her as a person with a unique horizon. Otherwise, we risk projecting ourselves onto the Other, whereby we would reduce her to an object-like status and consequently dispense with her as a “moral phenomenon”.11

5 (GW1, p.309; TM, p.303)

6 (Ratcliffe, 2014a, pp.272, 273)

7 (GW2, p.223; PH, p.7)

8 (GW2, p.223; PH, p.7)

9 (GW1, p.367; TM, p.335)

10 (GW1, p.229; TM, p.294)

11 (GW1, p.364; TM, p.352)


From a Gadamerian perspective, we can therefore conclude that an intact dialogue aims at a fusion of horizons with the Other, allowing us to experience and thus recognise the Other as a person. Intersubjectivity and the experience of the Other is not something artificially constructed, contra views “that the Other can first be given only as a perceived thing, and not as a living, as given ‘in the flesh’”.12 The experience of the Other cannot be an act of self-relatedness13, emulating what it is like to be the Other from the self’s viewpoint. For, this would assume a privileged access to the Other’s mind14, whereby the experience of the Other would be diminished and reduced to a projection of the self. Instead of being open towards the Other and immediately recognising her experiences as something distinct and hers, such an encounter of the Other would supersede both the distinction between ‘my’ and ‘your’ experience, and thus between the self and the Other.15

Key to the phenomenology of intersubjectivity, however, is the mutual recognition of each Other as the bearers of unique experiences that can transform us, without which the fusion of horizons will not succeed. In other words, a phenomenology of intersubjectivity, as we have construed it here, involves both the recognition of another person and the resultant fusion of horizons. This fusion changes the way both interlocutors relate to each Other; not only has their knowledge of the subject matter enlarged but so has their knowledge of the Other’s view on it. That is to say, the way one experiences the Other has been altered as one’s horizon has been expanded, enabling an experience of the Other that was impossible prior to the fusion.

However, this fusion should not just be understood in terms of two individual horizons expanding. For, the prime focus is not on each of the interlocutors and their newly extended horizons, but on the event of the fusion itself. Being mutually open towards each Other, they are united by their common aim of understanding the subject matter and thus experiencing the respective Other. This event structure can be linked to what Gadamer captures elsewhere with his concept of ‘play’: “The primacy of the game over the players engaged in it is experienced by the players themselves in a special way, where it is a question of human subjectivity that adopts an attitude of play ... the game itself is a risk for the player. One can only play with serious possibilities. ... The attraction of the game, which it exercises on the player, lies in this risk”.16

Applied to the fusion of horizons, both dialogue partners are guided by the dialogue itself, yielding to an intersubjective dynamic. This is why the fused, third horizon constitutes a shared, intersubjective horizon belonging to both rather than either of them exclusively. However, without the willingness to be challenged, thus putting ourselves “into play ... through being at risk”17, we cannot fuse horizons and experience the Other. Sketching a Gadamerian perspective of an intact dialogue, we can thus infer that it entails both mutual openness and trust towards the Other, without which we cannot appreciate the Other and her experiences as hers.

12 (GW1, p.95; SI, p.283)

13 (GW1, p.365; TM, p.353)

14 (GW1, p.365; TM, p.353)

15 The general difference between approaches open towards the Other and emulating the Other can also be cast in non-Gadamerian terms as one between phenomenological and simulationist approaches to empathy. For an overview and analysis of the extent to which those overlap, see (Ratcliffe, 2012 and 2014a).

16 (GW1, pp.111-112; TM, p.95) Italics are my own.

17 (GW1, p.304; TM, p.266) Italics are my own.

Dialogue in Depression

Drawing on Gadamer’s phenomenology of understanding via the fusion of horizons has allowed us to sketch a phenomenology of intersubjectivity. We have established how understanding normally takes place between two people and thus, more generally, provided an account of an intact dialogue. In light of this, I shall now apply these findings to the phenomenology of depression, elucidating the differing forms of dialogue that occur in depression.18

Although ‘depression’ is used as an umbrella term for a number of diagnoses, I shall focus on a phenomenological change in the experience of the Other that can be found in many autobiographical accounts, all describing an impaired form of intersubjectivity. For instance, consider the following statements19:

“When I’m depressed I feel like my relationships are less stable and I trust others a lot less. I try to avoid people, as they seem angry and irritated at me. ... I feel like a burden.”

“I find other people irritating when depressed, especially those that have never suffered with depression, and find the ‘advice’ often given by these is unempathetic and ridiculous.”

In these accounts, which I take to be representative of the aforementioned phenomenological change, we can identify the two principal themes of isolation and lack of trust. Interpersonal relations seem, at least most of the time, bereft of any positive, warm dimension. Instead, the depressed person experiences the Other as a threat and an alienating force, with whom she cannot enter a genuine bond. One way of construing this change in experiencing the Other is in terms of the fusion of horizons between two people, and thus of how understanding occurs. Whereas a mutual openness lies at the heart of an intact dialogue, a depressed person lacks such openness in virtue of not trusting the Other. As a result, she seems incapable of putting herself ‘into play’ and ‘at risk’. Not yielding to the intersubjective dynamic of completely letting go in the process of the dialogue, the depressed person prevents herself from fusing horizons with the Other, and thus from appreciating the Other as a person. Instead, the Other is reduced to a projection of the depressed person, constituting a threat.20

The lack of trust furthermore explains why other people’s advice is deemed ‘unempathetic and ridiculous’. Key to the experience of the Other in an intact dialogue is the ‘fore-conception of completeness’, as I have outlined in the first section. The depressed person however does not seem to be in a position to presuppose the Other’s claim to be meaningful and true, since she has reduced the Other to an object-like status of embodying (almost) nothing but a threat. That is to say, the possibility of interacting with the Other in a way that could change the depressed person’s horizon is diminished. Hence, she does not feel understood

18 In this context, dialogue is understood broadly enough so as to encompass any communicative interaction between two people. 19 (Ratcliffe, 2014a, p.274) 20 see also (Ratcliffe, 2014b, p.234)


by the Other, which in turn makes her feel even more isolated and like a ‘burden’.21 In fact, even if the depressed person wanted to be understood, “[yearning] for connection”, a fusion of horizons could not take place, as she “[is] rendered incapable of being with others in a comfortable way”.22

It is therefore plausible to infer that depression involves a diminished experience of the Other and, more generally, an impaired form of intersubjectivity. The account sketched so far reveals the inability to connect with, and thus to experience, the Other in a horizon-changing way. Without being in dialogue with the Other, however, the depressed person lacks the possibility “of immersion in a dynamic world that incorporates the potential for meaningful change”.23 Instead, we find the depressed person completely shut off from the world:

“I feel like I am watching the world around me and have no way of participating”.24

An intact dialogue always occurs within a shared, intersubjective horizon that unites both interlocutors. This is why we concluded in the first section that the formulation of ‘I and Thou’ does not do justice to our phenomenology of intersubjectivity, as the two do not constitute completely separate realities. The above quote, however, seems to depart from such an account. Rather than being mutually attuned to the Other, I suggest, the depressed person appears to fall out of such a mutual framework. What has been viewed as a transcendental condition in an intact dialogue, i.e. the interpersonal horizon, is missing. This leaves the structure of experiencing the Other compromised. Such a change that occurs in depression “has a profound effect upon one’s sense of agency”.25 What this involves can best be understood, I propose, when broadly conceptualising the depressed person as what I call a ‘radical Other’.26 As sketched in the first section, in a normal dialogue, two people experience things differently in virtue of each having a unique horizon, and yet both belong to a shared, intersubjective horizon. In a dialogue between a depressed and a non-depressed person, however, the two perspectives at work differ more fundamentally. For the former does not seem to be part of the same framework as the latter, as established before. This is why the depressed person does not feel understood but isolated, feeling completely detached from everyone else without any possibility of taking part in the world. The lack of a mutually shared backdrop equally affects those interacting with the depressed person, in that they struggle to relate to her:

“When I start to get depressed, I only filter through the negative messages from friends and family ... As a result, they soon learn to step on egg shells around me, they become less affectionate because I’m less receptive. ... It’s a very hard thing to do to be able to step back and realize that someone who is depressed is projecting their own thoughts onto others.”27

The seeming impossibility for the depressed person to fuse horizons thus affects the non-depressed person. Exposed to sheer negativity, the non-depressed person is likely to reduce the depressed person to an object-like status, that of being

21 Whether or not the feeling of isolation precedes the feeling of not being understood, in my view, does not have any bearing on the presented Gadamerian reading. 22 (Karp, 1996, p.14) 23 (Ratcliffe, 2014a, p.277) 24 (Ratcliffe, 2014a, p.274) 25 (Ratcliffe, 2013, p.584) 26 This notion and its implications are inspired by Lévinas’s radical alterity (e.g. 1969, p.194). 27 (Ratcliffe, 2014a, p.279)


‘unreachable’. Such reduction however appears problematic in that the depressed person becomes even more out of reach, if not actually being avoided. In other words, through such a reduction and the resultant alienation, we run the risk of dispensing with the depressed person as a moral phenomenon, as another person with unique experiences. In fact, this risk is revealing with respect to the phenomenological account of impaired intersubjectivity in depression. Central to the experience of the Other is “an appreciation of [her] potential to reshape one’s world”28, the potentiality of which the depressed person seems to lack in virtue of being ‘unreachable’. Even though the fusion of horizons thus cannot take place, we should nonetheless attempt to ‘realize that someone who is depressed is projecting their own thoughts onto others’ and avoid reducing the depressed person completely. For, “[much] of depression’s pain arises out of the recognition that what makes me feel better – human connection – seems impossible in the midst of a paralyzing episode of depression”.29

Hence, instead of dispensing with the depressed person as a moral phenomenon, our phenomenological analysis points to the paradoxical situation of the depressed person, who feels like a radical Other herself and yet ultimately does not want to be reduced to one. This is why conclusive reports such as “the psyche of the patient is too well understood”30 have to be treated carefully. On the one hand, they reveal the depressed person’s diminished sense of agency, feeling isolated and lacking any interpersonal possibilities, which again gives rise to an impaired form of intersubjectivity. On the other hand, it does not take much from here to yield to a reductionist experience of the depressed person, perceiving her as nothing more than an object. This again could amount to the loss of the possibility of helping the depressed person, who nonetheless depends on our willingness to engage with her in a transformative manner.

Conclusion

The aim in this paper has been to sketch a phenomenological account of impaired intersubjectivity in depression. The claim has been that drawing on Gadamer’s phenomenology of understanding via the fusion of horizons helps elucidate how depression affects the phenomenology of intersubjectivity. Against the backdrop of an intact dialogue between two people, we have construed the differing forms of dialogue that occur in depression in terms of the seeming impossibility of fusing horizons. No doubt the account given here does not apply to all cases of depression. However, the reader will hopefully realise that such a phenomenological sketch enables an understanding of depression that might otherwise not be possible.

28 (Ratcliffe, 2014b, p.236) 29 (Karp, 1996, p.16) 30 (Minkowski, 1970, p.178)


Bibliography

Ratcliffe, M. (2012) ‘Phenomenology as a Form of Empathy’, in Inquiry: An Interdisciplinary Journal of Philosophy, Vol. 55, No. 5, pp.473-495.

– (2013) ‘Depression and the Phenomenology of Free Will’, in The Oxford Handbook of Philosophy and Psychiatry (ed. K.W.M. Fulford et al.), pp.574-591.

– (2014a) ‘The Phenomenology of Depression and the Nature of Empathy’ in Medicine, Health Care and Philosophy, Vol. 17, No. 2, pp.269-280.

– (2014b) ‘The structure of interpersonal experience’ in Moran, D. and Jensen, R. (ed.) Phenomenology of Embodied Subjectivity, Springer, Dordrecht, pp.221-238.

Gadamer, H.-G. (1993ff) Gesammelte Werke, 7th edition, Mohr Siebeck, Tübingen. (GW)

– (2008) Bd. 1: Hermeneutik I: Wahrheit und Methode. Grundzüge einer philosophischen Hermeneutik.

– (1993) Bd. 2: Hermeneutik II: Wahrheit und Methode. Ergänzungen, Register.

– (1976) Philosophical Hermeneutics (trans. Linge, D.E.), University of California Press, Berkeley. (PH)

– (2004) Truth and Method, 2nd revised edition, (trans. Weinsheimer, J. and Marshall, D. G.), Continuum, London. (TM)

Garrett, J. E. (1978) ‘Hans-Georg Gadamer on “Fusion of Horizons”’, in Man and World, Vol. 11, No. 3/4, pp. 392-400.

Karp, D. (1996) ‘Speaking of Sadness: Depression, Disconnection, and the Meanings of Illness’, Oxford: Oxford University Press.

Lévinas, E. (1969) ‘Totality and Infinity’, (trans. Lingis, A.), Pittsburgh: Duquesne University Press.

Minkowski, E. (1970) ‘Lived Time: Phenomenological and Psychopathological Studies’, (trans. Metzel, N.), Evanston: Northwestern University Press.


Illustrating Pearl’s Approach to Causality by Examples, and Responding to Cartwright’s Criticisms

Kim Tullar

Abstract

In the year 2000, Judea Pearl’s ‘Causality’ was published, providing an expansive account of the Bayesian networks (henceforth ‘bayes nets’) approach to causality, emphasising its practical, mathematical nature. Whilst not all aspects of Pearl's approach are persuasive, I believe its core should be accepted by philosophers, scientists and statisticians alike. Yet Pearl's approach is not widely taught; most students will graduate without even hearing about it. This paper aims to rectify this issue somewhat, and persuade readers of the core of Pearl's approach by giving a few examples implementing it in section 1. In section 2, I address some of Nancy Cartwright's criticisms of the approach, arguing that whilst important, they can be answered.

1. Example Applications of Pearl’s Approach

Pearl's approach to causality is really just one version of the bayes nets approach to causality1 and I will make use of some of the other bayes nets approaches2 despite focussing on the account proposed by Pearl.

1.1 The Lawn Example

We begin with a simple example demonstrating the basics of utilising bayes nets to understand causality, adapted from Pearl3. Say we want to build a causal model for how a lawn can get wet. A simple model is that rain can cause wetness, a sprinkler can cause wetness, and those are the only causal relations. This model is represented by the graph of Figure 1, where 𝑅 represents rain, 𝑆 the sprinkler, and 𝑊 the wetness of the lawn.

1 (Woodward, 2013, Stanford Encyclopedia of Philosophy) 2 (Glymour, 2010) 3 (2000, p.15)

Figure 1


The arrows of Figure 1 represent direct causality with respect to the variables included in the graph. Let us take a moment to negatively define what ‘direct causality’ means in general. 𝑋 is defined as not directly causing 𝑌 iff, if 𝑋 causes 𝑌 at all, it is only via other variables included in the graph. What precisely is meant by ‘via’ another variable depends on one’s account of causation. Bayes nets are commonly taken to suggest a non-reductive, manipulationist account of causation4, on which 𝑋 does not directly cause 𝑌 iff holding all other causes of 𝑌 in the graph constant, or just every other variable in the graph, would render 𝑌 constant no matter the value of 𝑋.5 In any case, we omit an arrow from 𝑋 to 𝑌 whenever we know 𝑋 is not a direct cause of 𝑌.

Returning to the lawn example, the bayes nets approach asks us to find out which variables are conditionally ‘d-separated’, so that we can draw conclusions about their probability distribution. d-separation is defined by Pearl as in definition 1.6

Definition 1. A path from 𝑎 to 𝑏 is a sequence of edges in any direction beginning with 𝑎 and ending with 𝑏, for instance 𝑎 ← 𝑐 → 𝑏. We say a path from 𝑎 to 𝑏 is d-separated by a set of nodes 𝐶 iff

1. It contains a chain 𝑖 → 𝑐 → 𝑗, 𝑖 ← 𝑐 ← 𝑗 or 𝑖 ← 𝑐 → 𝑗, where 𝑐 ∈ 𝐶, or

2. It contains a chain 𝑖 → 𝑐 ← 𝑗, where 𝑐 ∉ 𝐶, and no descendant of 𝑐 is in 𝐶.

If all paths from 𝑎 to 𝑏 are d-separated by 𝐶, we say 𝑎 and 𝑏 are d-separated given 𝐶, which we write 𝑎 ⫫𝑑 𝑏 | 𝐶. If 𝐴 and 𝐵 are sets of nodes, and for all 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵, we have 𝑎 ⫫𝑑 𝑏 | 𝐶, then we say 𝐴 and 𝐵 are d-separated given 𝐶, which is written 𝐴 ⫫𝑑 𝐵 | 𝐶.
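Definition 1 can be checked mechanically on a small graph. The following is a minimal sketch in Python, assuming the lawn graph of Figure 1 is 𝑅 → 𝑊 ← 𝑆 as described in the text; the function names (`descendants`, `simple_paths`, `path_blocked`, `d_separated`) are illustrative and are not taken from Pearl. It enumerates every path and applies the two clauses directly, which is only practical for small graphs.

```python
# Directed edges (parent, child) of the lawn graph: R -> W <- S (assumed from the text).
EDGES = {("R", "W"), ("S", "W")}


def descendants(node, edges):
    """All nodes reachable from `node` by following arrows forwards."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def simple_paths(a, b, edges, visited=None):
    """Yield every path from a to b, ignoring arrow direction, with no repeated node."""
    visited = (visited or []) + [a]
    if a == b:
        yield visited
        return
    neighbours = {y for x, y in edges if x == a} | {x for x, y in edges if y == a}
    for nxt in neighbours:
        if nxt not in visited:
            yield from simple_paths(nxt, b, edges, visited)


def path_blocked(path, given, edges):
    """True iff the path is d-separated by `given` under clause 1 or clause 2."""
    for i, c, j in zip(path, path[1:], path[2:]):
        collider = (i, c) in edges and (j, c) in edges  # i -> c <- j
        if collider:
            # Clause 2: collider outside `given` with no descendant in `given`.
            if c not in given and not (descendants(c, edges) & given):
                return True
        elif c in given:
            # Clause 1: chain or fork whose middle node is in `given`.
            return True
    return False


def d_separated(a, b, given, edges=EDGES):
    """True iff every path from a to b is d-separated by `given`."""
    given = set(given)
    return all(path_blocked(p, given, edges) for p in simple_paths(a, b, edges))
```

On this graph, `d_separated("R", "S", set())` holds, whereas `d_separated("R", "S", {"W"})` does not: conditioning on the common effect opens the path between rain and sprinkler.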

The bayes net approach to causality holds that if the causal graph is ‘correct’ (we will come to what this means in section 2.1), and if 𝐴 ⫫𝑑 𝐵 | 𝐶, then 𝐴 is independent of 𝐵 given 𝐶 in the probability distribution, which we write 𝐴 ⫫ 𝐵 | 𝐶. By observing Figure 1, we see that the only d-separation implied by the graph is 𝑅 ⫫𝑑 𝑆. Therefore, assuming our model is correct, we have 𝑅 ⫫ 𝑆. This is equivalent to the fact that the joint distribution7 can be factorised as:

Pr(𝑅, 𝑆, 𝑊) = Pr(𝑅) Pr(𝑆) Pr(𝑊 | 𝑅, 𝑆), (1)

where each variable is conditioned on its parents. For simplicity (in this example and the next), I will assume all the variables are binary, with 1 representing truth, and 0 falsity. For instance, if we assume the probability of rain is 0.4 (Pr(𝑅 = 1) = 0.4), the probability of the sprinkler being off is 0.8 (Pr(𝑆 = 0) = 0.8), and that

4 (Woodward, 2013, sect. 1) 5 (Glymour, 2010, pp.171,172) presents a similar definition. 6 (2000, pp.16-17) 7 For simplicity, we talk of a ‘joint distribution’ by itself without defining it relative to a sample (the Frequentist approach) or a belief-state (the Bayesian approach). When it is relevant, we will adopt the Frequentist approach for the purposes of this paper. Cartwright (2001, p.244) and Glymour (2010, pp.165, 168) seem to favour it, given their talk of populations, though, confusingly, Glymour seems happy to appeal to prior probabilities (2010, p.194).


the probability of the lawn being wet given rain and the sprinkler being off is 0.6 (Pr(𝑊 = 1 | 𝑅 = 1, 𝑆 = 0) = 0.6), then the probability that it rained, and the sprinkler was off, and the grass was wet, is:

Pr(𝑅 = 1, 𝑆 = 0, 𝑊 = 1) = 0.4 × 0.8 × 0.6 = 0.192. (2)

This is an example of how bayes nets methods help us get probabilities from causes.
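The calculation in equation 2 can be reproduced in a few lines. This is a sketch only: the dictionary names are illustrative, exact fractions are used to avoid rounding, and only the single entry of Pr(𝑊 | 𝑅, 𝑆) that the text supplies is filled in.

```python
from fractions import Fraction as F

# Marginals stated in the text: Pr(R = 1) = 0.4 and Pr(S = 0) = 0.8.
P_RAIN = {1: F(4, 10), 0: F(6, 10)}
P_SPRINKLER = {0: F(8, 10), 1: F(2, 10)}

# Pr(W = 1 | R = r, S = s), keyed by (r, s). Only this entry is given in the text;
# the other three would be needed to evaluate the joint at other (r, s) values.
P_WET_GIVEN = {(1, 0): F(6, 10)}


def joint(r, s, w):
    """Equation 1: Pr(R, S, W) = Pr(R) Pr(S) Pr(W | R, S)."""
    p_wet = P_WET_GIVEN[(r, s)]
    p_w = p_wet if w == 1 else 1 - p_wet
    return P_RAIN[r] * P_SPRINKLER[s] * p_w


print(float(joint(1, 0, 1)))  # 0.192
```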

A sceptical reader may wonder whether equation 1 would still hold if we had used a more sophisticated model for lawn wetness. According to bayes nets methods, a sufficient condition for equation 1 is that rain does not cause the sprinkler, the sprinkler does not cause rain, they have no common cause, and there are no sampling biases. We shall explore why this is sufficient in section 2.1.

1.2 The Smoking Example8

Say we are studying whether smoking causes lung cancer or not. We have observed a correlation between smoking and lung cancer, but it could be due to a common cause, such as a gene which causes people to be both more likely to smoke and more likely to have lung cancer. Say we know that smoking, if it causes cancer at all, only causes it by causing build-up of tar in the lungs, and the gene, if it exists, only affects tar by increasing propensity to smoke. Let 𝑆 stand for smoking, 𝑇 for tar in the lungs, 𝐿 for lung cancer, and 𝑈 for the possible unobserved gene affecting 𝑆 and 𝐿. Figure 2 shows the causal model.

By conducting a randomised survey of 1,000 subjects, let us suppose we collect the data given in table 1.

8 This example is adapted from (Pearl, 2000, pp.83-88).

Figure 2


From this data and the causal model, bayes nets methods tell us how to calculate the causal effect of 𝑆 on 𝑇, of 𝑇 on 𝐿 and of 𝑆 on 𝐿. For any variables 𝑋 and 𝑌, the total causal effect of 𝑋 on 𝑌 is defined to be Pr(𝑌 | do(𝑋)), where ‘do’ is a special operator, which represents intervening to set the value of 𝑋; see definition 2.9

Definition 2. Consider a directed, acyclic graph of the variables 𝑋₁, … , 𝑋ₙ. Assume the graph is a bayes net. Then the joint probability distribution of the variables can be written:

Pr(𝑋₁, … , 𝑋ₙ) = Pr(𝑋₁ | pa(𝑋₁)) … Pr(𝑋ₙ | pa(𝑋ₙ)), (3)

where pa(𝑋ᵢ) is the set of parents of 𝑋ᵢ in the graph. The distribution Pr(𝑋₁, … , 𝑋ᵢ₋₁, 𝑋ᵢ₊₁, … , 𝑋ₙ | do(𝑋ᵢ)) is given by regular Bayesian conditionalisation on the pseudo-joint distribution:

\[
\Pr{}'(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) = \prod_{j \neq i} \Pr(X_j \mid \mathrm{pa}(X_j)). \qquad (4)
\]

This is equivalent to saying:

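\[
\Pr(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n \mid \mathrm{do}(X_i))
= \prod_{j \neq i,\; X_i \notin \mathrm{pa}(X_j)} \Pr(X_j \mid \mathrm{pa}(X_j))
\prod_{j \neq i,\; X_i \in \mathrm{pa}(X_j)} \Pr(X_j \mid \mathrm{pa}(X_j), X_i). \qquad (5)
\]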
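To see what deleting a factor does in practice, here is a minimal sketch of Definition 2 on a three-variable graph 𝑈 → 𝑆, 𝑈 → 𝐿, 𝑆 → 𝐿. This graph and every number in it are hypothetical, chosen only for illustration; they are not the model of Figure 2 and not the data of Table 1. The sketch compares Pr(𝐿 = 1 | do(𝑆)) computed from the pseudo-joint of equation (4) with the ordinary conditional Pr(𝐿 = 1 | 𝑆).

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical conditional probability tables (illustration only).
P_U = {0: F(1, 2), 1: F(1, 2)}                      # Pr(U = u)
P_S1_GIVEN_U = {0: F(1, 5), 1: F(4, 5)}             # Pr(S = 1 | U = u)
P_L1_GIVEN_SU = {(0, 0): F(1, 10), (1, 0): F(1, 5),  # Pr(L = 1 | S = s, U = u),
                 (0, 1): F(1, 2), (1, 1): F(3, 5)}   # keyed by (s, u)


def bern(p_one, value):
    """Probability that a binary variable takes `value`, given Pr(value = 1)."""
    return p_one if value == 1 else 1 - p_one


def joint(u, s, lung):
    """Equation (3): every variable conditioned on its parents."""
    return P_U[u] * bern(P_S1_GIVEN_U[u], s) * bern(P_L1_GIVEN_SU[(s, u)], lung)


def pseudo_joint(u, lung, s):
    """Equation (4) with X_i = S: the factor Pr(S | U) is deleted, S is held at s."""
    return P_U[u] * bern(P_L1_GIVEN_SU[(s, u)], lung)


def p_l1_do_s(s):
    """Pr(L = 1 | do(S = s)), marginalising U out of the pseudo-joint."""
    return sum(pseudo_joint(u, 1, s) for u in (0, 1))


def p_l1_given_s(s):
    """Ordinary conditional Pr(L = 1 | S = s) from the full joint."""
    numerator = sum(joint(u, s, 1) for u in (0, 1))
    denominator = sum(joint(u, s, lung) for u, lung in product((0, 1), repeat=2))
    return numerator / denominator
```

With these made-up numbers the interventional contrast is 2/5 − 3/10 = 1/10, while the observational contrast is 13/25 − 9/50 = 17/50; the gap is the part of the correlation that is due to the common cause 𝑈.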

The causal effect of 𝑆 on 𝑇 is Pr(𝑇 | do(𝑆)). If 𝑆 has no direct causal effect on 𝑇, then it has no causal effect on 𝑇 at all, and so Pr(𝑇 | do(𝑆)) should be uniform with respect to 𝑆; forcing someone to smoke or not should have no effect on their chance of having tar deposits in their lungs. It can be shown that Pr(𝑇 | do(𝑆)) = Pr(𝑇 | 𝑆); to put this another way, we should not control for anything when calculating

9 The semantic interpretation of the do-operator is, in my opinion, one of the fundamental philosophical assumptions of bayes nets methods, along with the causal markov condition. However, Cartwright does not criticise its use in her essay, so we will allow this assumption for the purposes of this paper.

Table 1: Breakdown of Smoking, Tar and Lung Cancer in Subjects


the effect of 𝑆 on 𝑇. Thus we can calculate (from the data in the table):

Pr(𝑇 = 1 | do(𝑆 = 1)) = Pr(𝑇 = 1 | 𝑆 = 1) ≈ 19/21, (6)

and

Pr(𝑇 = 1 | do(𝑆 = 0)) = Pr(𝑇 = 1 | 𝑆 = 0) ≈ 1/19. (7)

So our data strongly suggests that smoking has a direct causal effect on tar, increasing the chance of tar dramatically.

Similarly, the causal effect of 𝑇 on 𝐿 is Pr(𝐿 | do(𝑇)), which should be uniform with respect to 𝑇 if 𝑇 has no direct causal effect on 𝐿. It can be shown that Pr(𝐿 | do(𝑇)) = ∑ₛ Pr(𝐿 | 𝑇, 𝑆 = 𝑠) Pr(𝑆 = 𝑠); when calculating the causal effect of tar on lung cancer, we should control for smoking. Using our data, we obtain:

Pr(𝐿 = 1 | do(𝑇 = 1)) ≈ 1457/3800 ≈ 0.38 (8)

and

Pr(𝐿 = 1 | do(𝑇 = 0)) ≈ 547/6800 ≈ 0.08. (9)

So our data strongly suggests that tar has a direct effect on lung cancer, increasing the chances of cancer considerably.

As we have established that smoking has a positive effect on tar, and tar has a positive effect on lung cancer, we know that smoking has a positive effect on cancer. But how much would someone’s risk of lung cancer go up if they started smoking? The answer is given by:

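\[
\Pr(L = 1 \mid \mathrm{do}(S = 1)) - \Pr(L = 1 \mid \mathrm{do}(S = 0)) \qquad (10)
\]

\[
= \sum_{T} \Pr(T \mid S = 1) \sum_{S} \Pr(L = 1 \mid S, T) \Pr(S)
- \sum_{T} \Pr(T \mid S = 0) \sum_{S} \Pr(L = 1 \mid S, T) \Pr(S)
\approx \frac{6329}{17850} - \frac{16221}{144400} \approx 0.24 \qquad (11)
\]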

So smoking increases one’s risk of lung cancer by about 24 percentage points. In this example, we have seen how bayes nets methods allow us to infer causality from probabilities, identify which variables to control for when estimating effects, and calculate the result of interventions.

1.3 The abstract example


So far, we have utilised prior causal knowledge in combination with data to reach further causal conclusions. In this example10 we consider how to obtain a causal model without using any prior causal knowledge whatsoever. The idea is that by using the data observed, we can conclude what the conditional independencies in the variables are, and hence narrow down the possible causal models to only those which could generate such conditional independencies via the d-separation criterion. Rather than narrowing down models laboriously by checking through each possible model individually, algorithms have been created to perform the function quickly. The problem with this sort of example is that actually finding the conditional independencies from a dataset can be a complicated statistical matter. So we will instead state the real causal structure at play, assume we are able to derive the conditional independencies from a sufficiently large sample, and apply Glymour's PC algorithm (which was the first attempt to make Pearl’s IC algorithm practically implementable) to those independencies.

The real model is given in Figure 3. The model could represent, for instance, a game of chance, where 𝑊 and 𝑋 are the rolls of independent dice, 𝑌 is your score, randomised around 𝑊 and 𝑋, and 𝑍 is your winnings, randomised around 𝑌. In any case, the conditional independencies shown in the graph, using the d-separation criterion, are: 𝑊 ⫫ 𝑋, 𝑊 ⫫ 𝑍 | 𝑌, 𝑊 ⫫ 𝑍 | 𝑋,𝑌, 𝑋 ⫫ 𝑍 | 𝑌, and 𝑋 ⫫ 𝑍 | 𝑊,𝑌. We also assume the causal model is stable, which means that these are the only conditional independencies: so ¬(𝑌 ⫫ 𝑍) for instance.11

As stated before, we assume we observe a large dataset of 𝑊, 𝑋, 𝑌, 𝑍 jointly, and we are able to correctly determine what the conditional independencies are. We will not define the PC algorithm for simplicity,

10 Taken from (Glymour, 2010, pp.181-182) 11 (Pearl, 2000, p.48)

Figure 3: Real Model


but in applying it to our conditional independencies, we are actually able to completely reconstruct the graph of Figure 3. So in this case, bayes nets methods allow us to go from probabilistic knowledge alone, to a complete correct causal model (in general, it will not be possible to determine the entire causal model from the conditional independencies, but at least part of the model can be specified).
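Although the PC algorithm is not defined here, the flavour of the reconstruction can be conveyed by a simplified, PC-style sketch. It is not the PC algorithm itself: it tries every conditioning set rather than PC's staged search over neighbours, and it uses a single orientation rule. It takes the five independencies listed above as an oracle, and it assumes that Figure 3 is the graph 𝑊 → 𝑌 ← 𝑋, 𝑌 → 𝑍, which is what those independencies suggest. All names are illustrative.

```python
from itertools import combinations

NODES = ["W", "X", "Y", "Z"]

# The conditional independencies listed in the text, as (pair, conditioning set).
INDEPENDENCIES = {
    (frozenset("WX"), frozenset()),
    (frozenset("WZ"), frozenset("Y")),
    (frozenset("WZ"), frozenset("XY")),
    (frozenset("XZ"), frozenset("Y")),
    (frozenset("XZ"), frozenset("WY")),
}


def independent(a, b, given):
    """Oracle lookup: is a independent of b given `given`, per the list above?"""
    return (frozenset((a, b)), frozenset(given)) in INDEPENDENCIES


def learn_structure(nodes):
    # Step 1 (skeleton): keep the edge a - b only if no conditioning set separates them.
    skeleton, sepset = set(), {}
    for a, b in combinations(nodes, 2):
        others = [n for n in nodes if n not in (a, b)]
        separators = [set(c) for k in range(len(others) + 1)
                      for c in combinations(others, k) if independent(a, b, c)]
        if separators:
            sepset[frozenset((a, b))] = separators[0]
        else:
            skeleton.add(frozenset((a, b)))

    # Step 2 (colliders): for non-adjacent a, b with a common neighbour c that is
    # not in their separating set, orient a -> c <- b.
    directed = set()
    for a, b in combinations(nodes, 2):
        pair = frozenset((a, b))
        if pair in skeleton:
            continue
        for c in nodes:
            if (frozenset((a, c)) in skeleton and frozenset((b, c)) in skeleton
                    and c not in sepset[pair]):
                directed |= {(a, c), (b, c)}

    # Step 3 (propagation): if a -> c, c - d is still undirected, and a, d are
    # non-adjacent, orient c -> d (otherwise c would be an undetected collider).
    changed = True
    while changed:
        changed = False
        for a, c in list(directed):
            for d in nodes:
                if (d != a and frozenset((c, d)) in skeleton
                        and (c, d) not in directed and (d, c) not in directed
                        and frozenset((a, d)) not in skeleton):
                    directed.add((c, d))
                    changed = True
    return skeleton, directed


skeleton, directed = learn_structure(NODES)
```

Under these assumptions the sketch recovers the edges 𝑊 → 𝑌, 𝑋 → 𝑌 and 𝑌 → 𝑍, with every edge oriented, in line with the claim that the graph can be completely reconstructed in this case.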

The PC algorithm assumes we observe all causally relevant variables. But this assumption is not necessary. Algorithms exist for reconstructing the causal structure, as best as possible, even when one makes no assumptions about what one has failed to observe. I omit such examples for simplicity.

2. Responding to Cartwright’s Criticisms

In this section, we focus on the criticisms Cartwright gives in her 2001 essay ‘What is wrong with Bayes Nets?’12; my responses to Cartwright are built upon the responses Glymour has given.13 In her essay, Cartwright attacks two assumptions of bayes nets methods: the causal markov condition (CMC), and stability in causal models.

2.1 The Causal Markov Condition

Firstly, we address the issue of defining the CMC. The CMC is supposed to provide a link between causality and probability by claiming that causality works like a bayes net, and hence is fundamental to all bayes nets methods. Glymour understands the CMC to hold for a set of variables iff the true causal structure for that set of variables operates like a bayes net.14 A causal model can be defined as operating like a bayes net iff, if 𝑋 ⫫𝑑 𝑌 | 𝑍, then 𝑋 ⫫ 𝑌 | 𝑍 on the probability distribution.15 Pearl implicitly accepts an equivalent definition, and proves that the CMC must hold for deterministic, acyclic causal models with mutually independent errors.16 Furthermore, Pearl argues that if we commit ourselves to including all common causes of variables, and to Reichenbach’s Principle (RP) that dependence between 𝑋 and 𝑌 implies 𝑋 causes 𝑌, 𝑌 causes 𝑋 or 𝑋 and 𝑌 have a common cause, then the error variables in a deterministic graph must be independent, and so if our graph is acyclic, the CMC holds. However, neither Pearl nor Glymour believes the CMC will always hold.17

The problem with the preceding accounts of the CMC is that they are, like the RP, vulnerable to sampling bias. As we will see in more detail below, this is why Cartwright is able to construct counter-examples based on sampling bias.18 To avoid the problem of sampling bias, one should define the CMC in a way equivalent to: ‘if 𝑋 ⫫𝑑 𝑌 | 𝑍,𝑆 on the true causal structure, where 𝑆 indicates inclusion in one’s sample, then 𝑋 ⫫

12 ‘What is Wrong with Bayes Nets?’ was republished in Cartwright's collection ‘Hunting Causes and Using Them’; however, I will focus on the 2001 version. 13 (2010) 14 (2010, p.175) 15 There are many equivalent definitions, but this is most useful for the purposes of this paper. 16 (Pearl, 2000, p.30) 17 (Pearl, 2000, pp.44-45), (Glymour, 2010, pp.200-201) 18 (Cartwright, 2001, p.259)


𝑌 | 𝑍 in one’s sample’. Note that there is still a connection to the RP, as 𝑋 ⫫𝑑 𝑌 | 𝑆 iff 𝑋 does not cause 𝑌 (there is no directed sequence 𝑋 → ⋯ → 𝑌), 𝑌 does not cause 𝑋, there is no common cause of 𝑋 and 𝑌, and if 𝑋 ⫫𝑑 𝑌 then 𝑋 ⫫𝑑 𝑌 | 𝑆 (which can be interpreted as the independence of 𝑋 and 𝑌 not being influenced by sampling bias).

However, our definition is still problematic, because it assumes a ‘true causal structure’, which is assuredly very fine-grained. Ideally, we want a definition of the CMC which allows us to work at a coarser level as well. So I suggest a more refined definition, based on the definition of d-separation.

Definition 3. Let the jointly observed variables in the sample be 𝑋 = (𝑋₁, … , 𝑋ₙ). Let inclusion in the sample be denoted by 𝑆. Let 𝐶 ⊂ 𝑋. If 𝑎 causes 𝑏, not only via some 𝑐 ∈ 𝐶 ∪ 𝑆, we say there is a causal path 𝑎 → 𝑏. If 𝑎 only causes 𝑏 via some 𝑐 ∈ 𝐶 ∪ 𝑆, we say there is a causal path 𝑎 → 𝑐 → 𝑏. Causal paths can be joined: for instance, if there is a path 𝑎₁ → ⋯ → 𝑎ₙ and a path 𝑎ₙ ← ⋯ ← 𝑎ₙ₊ₘ, then there is a path 𝑎₁ → ⋯ → 𝑎ₙ ← ⋯ ← 𝑎ₙ₊ₘ. If a causal path from 𝑎 to 𝑏 contains a chain

1. 𝑖 → 𝑐 → 𝑗, or 𝑖 ← 𝑐 ← 𝑗, or 𝑖 ← 𝑐 → 𝑗 for some 𝑐 ∈ 𝐶 ∪ 𝑆, or

2. 𝑖 → 𝑐 ← 𝑗 for some 𝑐 ∉ 𝐶 ∪ 𝑆, where 𝑐 does not cause anything in 𝐶 ∪ 𝑆,

then we say that the causal path from 𝑎 to 𝑏 is causally d-separated by 𝐶 ∪ 𝑆. If every causal path from 𝑎 to 𝑏 is causally d-separated by 𝐶 ∪ 𝑆, we write 𝑎 ⫫𝑐𝑑 𝑏 | 𝐶,𝑆. If for all 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵, we have 𝑎 ⫫𝑐𝑑 𝑏 | 𝐶,𝑆, then we say 𝐴 ⫫𝑐𝑑 𝐵 | 𝐶,𝑆. The causal markov condition states that if 𝐴 ⫫𝑐𝑑 𝐵 | 𝐶,𝑆, then 𝐴 ⫫ 𝐵 | 𝐶 in our sample.

I believe this adequately defines the CMC in a way which doesn’t require reference to a ‘true’ underlying graph, though I am not certain. Future work should attempt to prove that 𝐴 ⫫𝑐𝑑 𝐵 | 𝐶,𝑆 iff 𝐴 ⫫𝑑 𝐵 | 𝐶,𝑆 for any underlying model. The problems Cartwright poses can always be phrased in terms of an underlying model anyway, so we will work with ⫫𝑑.

Now I will address Cartwright’s alleged counter-examples to the CMC, showing that they are not true counter-examples to my definition of the CMC. The first counter-example Cartwright gives is of two causes cooperating to produce one effect in a population homogeneous with respect to that effect.19 I believe that Cartwright intends the sort of causal model given in Figure 4, where 𝑋 and 𝑌 cooperate to cause 𝑍, but 𝑍 influences inclusion in the sample 𝑆 (which can also be thought of as the sampling population). In this case, Cartwright rightly points out that, whilst 𝑋 ⫫𝑑 𝑌, often ¬(𝑋 ⫫ 𝑌 | 𝑆). Our CMC gets around this conundrum, because it is perfectly possible that ¬(𝑋 ⫫𝑑 𝑌 | 𝑆), hence allowing ¬(𝑋 ⫫ 𝑌 | 𝑆).
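A toy version of this situation can be enumerated exactly. The sketch below rests on assumptions of my own, made only for illustration: 𝑋 and 𝑌 are independent fair binary causes, 𝑍 = 1 exactly when both are present, and the sample contains only units with 𝑍 = 0 (a population homogeneous with respect to the effect).

```python
from fractions import Fraction as F
from itertools import product

# The four equally likely worlds (x, y, z), with z = 1 iff both causes are present.
WORLDS = [(x, y, x & y) for x, y in product((0, 1), repeat=2)]


def prob(event, in_sample=lambda x, y, z: True):
    """Probability of `event` among the worlds admitted by `in_sample`."""
    kept = [w for w in WORLDS if in_sample(*w)]
    return F(sum(1 for w in kept if event(*w)), len(kept))


def only_z0(x, y, z):
    """Sampling rule (an assumption): only units with Z = 0 are included."""
    return z == 0


# Whole population: Pr(X = 1, Y = 1) equals Pr(X = 1) Pr(Y = 1).
p_xy = prob(lambda x, y, z: x == 1 and y == 1)
p_x = prob(lambda x, y, z: x == 1)
p_y = prob(lambda x, y, z: y == 1)

# Within the sample the product rule fails.
s_xy = prob(lambda x, y, z: x == 1 and y == 1, only_z0)
s_x = prob(lambda x, y, z: x == 1, only_z0)
s_y = prob(lambda x, y, z: y == 1, only_z0)
```

In the whole population the joint probability 1/4 equals the product of the marginals, so 𝑋 ⫫ 𝑌. Within the sample the joint probability is 0 while the product of the marginals is 1/9, so the two causes are dependent once inclusion in the sample is taken into account.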

19 (2001, p.259)


The next example Cartwright gives is of different causal effects in different populations being mixed together.20 I found it hard to decipher how her example worked, but I suspect she meant something like the following. In population 1, let 𝐴 → 𝐶 ← 𝐵, with joint probability:

Pr₁(𝐴, 𝐵, 𝐶) = Pr₁(𝐴) Pr₁(𝐵) Pr₁(𝐶 | 𝐴, 𝐵) (12)

In population 2, let the causal graph be the same, but the distribution be:

Pr₂(𝐴, 𝐵, 𝐶) = Pr₂(𝐴) Pr₂(𝐵) Pr₂(𝐶 | 𝐴, 𝐵) (13)

In both populations 1 and 2, 𝐴 ⫫ 𝐵. Now consider the mixture population:

Pr(𝐴, 𝐵, 𝐶) = 𝑤₁ Pr₁(𝐴, 𝐵, 𝐶) + 𝑤₂ Pr₂(𝐴, 𝐵, 𝐶) (14)

20 (2001, p.259)

Figure 4: Cooperating Causes with Sampling Bias


where 𝑤₁ + 𝑤₂ = 1.

In this distribution, we may have that ¬(𝐴 ⫫ 𝐵). But this is fine, as there is in fact an unobserved variable 𝑇, representing which population is chosen, and so the causal graph is in fact as in figure 5.

The arrows in the graph are justified, as the difference in population makes a difference in marginal distribution of 𝐴 (unless Pr₁(𝐴) = Pr₂(𝐴)), a difference in marginal distribution of 𝐵 (unless Pr₁(𝐵) = Pr₂(𝐵)) and a difference in the dependence of 𝐶 on 𝐴, 𝐵 (unless Pr₁(𝐶 | 𝐴, 𝐵) = Pr₂(𝐶 | 𝐴, 𝐵)). Cartwright expresses dismay at having to draw so many arrows, but I don’t see the problem. Glymour also holds that bayes nets methods work on mixture populations.21
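A small numerical instance shows how mixing can create dependence. The marginals and weights below are hypothetical, chosen only to make the effect visible; 𝐶 is marginalised out, since only the relation between 𝐴 and 𝐵 matters here.

```python
from fractions import Fraction as F

# Hypothetical marginals: Pr_t(A = 1) and Pr_t(B = 1) in population t.
# Within each population A and B are independent by construction.
P_A1 = {1: F(9, 10), 2: F(1, 10)}
P_B1 = {1: F(9, 10), 2: F(1, 10)}
WEIGHTS = {1: F(1, 2), 2: F(1, 2)}  # w1 + w2 = 1


def mix(term):
    """Mixture as in equation 14: sum over t of w_t * term(t)."""
    return sum(WEIGHTS[t] * term(t) for t in WEIGHTS)


p_a = mix(lambda t: P_A1[t])             # Pr(A = 1) in the mixture
p_b = mix(lambda t: P_B1[t])             # Pr(B = 1) in the mixture
p_ab = mix(lambda t: P_A1[t] * P_B1[t])  # Pr(A = 1, B = 1) in the mixture
```

Here Pr(𝐴 = 1, 𝐵 = 1) = 41/100 in the mixture, whereas Pr(𝐴 = 1) Pr(𝐵 = 1) = 1/4, so 𝐴 and 𝐵 are dependent in the mixture even though they are independent within each population. Conditioning on the population variable 𝑇 restores the independence.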

Cartwright points out that time-series of variables can be correlated, even if variables have no causal relation. I will not deal with this example, as it requires time-series analysis; readers who are interested should consult the bibliography. In short, Glymour argues that this “correlation” is not indicative of probabilistic dependence, and that bayes nets methods can be applied by transforming the time-series in standard ways.22

Finally, Cartwright asserts that products and by-products, when produced probabilistically, are mutually dependent, even conditional on their cause. That is, she asserts that if 𝐵 ← 𝐴 → 𝐶, where 𝐵 and 𝐶 are caused non-deterministically by 𝐴, then ¬(𝐵 ⫫ 𝐶 | 𝐴). This is in stark contrast to what bayes nets methods say: 𝐵 ⫫ 𝐶 | 𝐴. Indeed, it is easy to create an example in which Cartwright’s claim is violated: if 𝐴 is the roll of two dice, 𝐵 is randomised around the dice’s sum, and 𝐶 is randomised around their difference, then 𝐵 ⫫ 𝐶 | 𝐴. The question is whether Cartwright’s claim is ever true. I am sure it is in cases with sampling bias, and it must be such cases which Cartwright has in mind; in which case, the solution to these cases is the same as earlier: the CMC should condition on inclusion in the sample.

21 (2010, pp.164, 206) 22 (2010, pp.164, 202)

Figure 5: Mixture Model


2.2 Stability

As we explained in section 1.3, stability is the assumption that the conditional independencies given by applying the d-separation criterion to the causal model are the only conditional independencies. This assumption is used as one of several jointly sufficient conditions for proving that algorithms such as PC will always specify the correct causal model as specifically as possible.23 If stability fails to hold of the true causal structure, then applying PC may (but need not) result in an incorrect model.

Cartwright’s argument boils down to asserting that stability is often violated for scientific data, and hence we are not justified in applying algorithms such as PC to such data.24 However, Cartwright also confuses the sufficiency of stability for algorithmic methods with the necessity of stability for all bayes nets methods, leading Cartwright to remark how odd it is that bayes nets methods prohibit the existence of causal structures that violate stability.25

Ignoring Cartwright's conflation (which seems to be due to Pearl's flawed justification of stability, which we come to shortly), Cartwright gives an example in which a drug (birth control pill) has both positive and negative effects on an illness (thrombosis).26 She considers that we may want to develop a new version of the drug, for which these effects cancel out, thus violating stability. This example is confused: bayes nets methods do not assume stability when calculating the effects of interventions, such as changing the drug, nor do they stop us from testing whether the effects cancel out. Cartwright's example would only be a problem if the existing drug's effects already cancelled out, i.e. we had little to no knowledge of the prior causal structure, and we wished to infer the structure by observation, using an algorithm such as PC.

Nevertheless, the fact that Cartwright’s example is confused does not imply that stability always holds. Pearl gives several justifications of stability, but at least one of them is flawed. Pearl considers an example in which two independent fair coins are flipped and a bell is tolled when the coins land the same.27 Pearl notes that, in such an example, any two of the variables are independent of each other, but dependent conditional on the third. A condition called minimality (which we will not define) is not specific enough for inferring the causal model from the observational data. Pearl asserts that the model 𝐶₁ → 𝐵 ← 𝐶₂, where 𝐶₁, 𝐶₂ are the coins and 𝐵 the bell, which is of course the correct causal model, is the only minimal, stable model. Hence Pearl motivates stability as a more precise condition, allowing one to home in on the correct model. But, as Cartwright points out (and appears to become confused by, as above), this model is not stable: each coin is independent of the bell, yet d-separation applied to 𝐶₁ → 𝐵 ← 𝐶₂ yields only the independence of the two coins. In fact, there is no stable minimal model. So this is a legitimate instance of stability violation on the true model.
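The independence pattern in the coins-and-bell example can be verified by enumerating the four equally likely outcomes. This is a minimal sketch: the 0/1 encoding of the variables and the helper names are illustrative choices, not part of Pearl’s presentation.

```python
from fractions import Fraction
from itertools import product

# Outcomes are (coin1, coin2, bell); the bell rings (1) iff the coins match.
joint = {
    (c1, c2, int(c1 == c2)): Fraction(1, 4)
    for c1, c2 in product((0, 1), repeat=2)
}


def prob(event):
    """Total probability of the outcomes satisfying `event`."""
    return sum(p for outcome, p in joint.items() if event(outcome))


def independent(i, j, given=None):
    """Are variables i and j independent, optionally conditional on `given`?

    Variables are indexed 0 (coin 1), 1 (coin 2), 2 (bell).
    """
    contexts = [None] if given is None else [0, 1]
    for k in contexts:
        def in_ctx(o, k=k):
            return k is None or o[given] == k

        p_ctx = prob(in_ctx)
        for a, b in product((0, 1), repeat=2):
            p_ab = prob(lambda o: in_ctx(o) and o[i] == a and o[j] == b) / p_ctx
            p_a = prob(lambda o: in_ctx(o) and o[i] == a) / p_ctx
            p_b = prob(lambda o: in_ctx(o) and o[j] == b) / p_ctx
            if p_ab != p_a * p_b:
                return False
    return True


pairs = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]  # (i, j, remaining third variable)
for i, j, third in pairs:
    print(i, j, independent(i, j), independent(i, j, given=third))
```

Every pair comes out unconditionally independent and dependent given the third variable. Only the independence of the two coins follows from d-separation on the collider 𝐶₁ → 𝐵 ← 𝐶₂; the independence of each coin from the bell is the extra independence that makes the true model unstable.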

Glymour addresses Cartwright’s concern by pointing out that some algorithms can be proved to work without assuming stability.28 I offer that we should normally be able to detect instability by using our knowledge about the variables. For instance, it seems clear in Cartwright’s biological example that it is highly improbable for two independent biological effects to perfectly cancel each other out. If, on the other hand, we are dealing with coins and bells, where someone has deliberately set up the causal system, it may well be that the system was set up to be unstable.

23 (Glymour, 2010, pp.182-184) 24 (2001, pp.251-254) 25 (2001, p.252) 26 (2001, pp.246-253) 27 (2000, p.48) 28 (Glymour, 2010, pp.163-164)

3. Conclusion

In section 1, I gave a few example applications of bayes nets methods. I showed how bayes nets methods allow us to: concisely express causal knowledge; infer probabilistic knowledge from causal knowledge; infer causal knowledge from probabilistic knowledge (with or without prior causal knowledge); and identify which variables to control for in order to calculate the effect of interventions. In section 2, I built on Glymour’s responses to Cartwright’s criticisms of bayes nets methods. In doing so, I argued that Cartwright’s concerns could be rectified if we accept a new definition of the causal Markov condition.


Bibliography

Cartwright, N. (2001) ‘What is Wrong with Bayes Nets?’, The Monist, vol. 84, no. 2, pp.242-264.

Cartwright, N. (2007) ‘Hunting Causes and Using Them: Approaches in Philosophy and Economics’, Cambridge University Press.

Glymour, C. (2010) ‘What is Right with “Bayes Net Methods” and What is Wrong with “Hunting Causes and Using Them”?’, British Journal for the Philosophy of Science, vol. 61, no. 1, pp.161-211.

Pearl, J. (2000) ‘Causality: Models, Reasoning, and Inference’, Cambridge University Press.

Woodward, J. (2013) ‘Causation and Manipulability’, in E. N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Online), Winter 2013 edition. Available at: http://plato.stanford.edu/archives/win2013/entries/causation-mani/
