
DECISION PROBLEMS AND PROCEDURES FOR HANDLING THEM

Per-Erik Malmnäs and Andreas Paulsson

version 06.09.2015

Contents

1 Introduction
  1.1 Decision problems
    1.1.1 Choice of medical treatment (adapted from Weinstein and Fineberg [1980, p. 180])
    1.1.2 A choice between two languages for computer programs
    1.1.3 Monty Hall
  1.2 Procedures for handling decision problems: the decision theoretic approach
    1.2.1 Preliminaries
    1.2.2 Examples

2 Classical Decision Theory
  2.1 Pure Classical Decision Theory
  2.2 Modified Classical Decision Theory
    2.2.1 Introduction to utilities

3 Supersoft Decision Theory
  3.1 Representation of Decision Problems
  3.2 Satisfiable and not Satisfiable Decision Frames
  3.3 Evaluations of Options and Decision Rules in wSSD
    3.3.1 Qualitative Evaluations
    3.3.2 Expected Value
    3.3.3 Decision Rules Based on Extreme Values

4 Other Approaches to Decisions under Uncertainty
  4.1 Hodges and Lehmann (1952)
  4.2 Blum and Rosenblatt (1967)
  4.3 Watson (1974)
  4.4 Levi (1974)
  4.5 Gärdenfors and Sahlin (1982)

5 Evaluations and Choice Rules
  5.1 Fundamental Concepts
  5.2 Miscellaneous remarks
    5.2.1 On Tests
    5.2.2 On Options
    5.2.3 Classification of Evaluations
    5.2.4 On the Relation between Preference Tests and Choice Tests
  5.3 Evaluations and Common Ratio Tests
    5.3.1 Introduction
  5.4 An Examination of Some Proposals
    5.4.1 Hagen (1969; 1972; 1979)
    5.4.2 Fishburn (1983)
    5.4.3 Loomes and Sugden (1986)
    5.4.4 Green and Jullien (1988)
    5.4.5 Quiggin (1982)
    5.4.6 Yaari (1987)
  5.5 On Continuous Evaluations
    5.5.1 Polynomial evaluations of options in two variables
    5.5.2 Polynomial evaluations of options with three outcomes
  5.6 Axiomatic Utility Theory and Expected Utility
    5.6.1 Fundamental Concepts
    5.6.2 Herstein and Milnor (1953)
    5.6.3 Savage (1954; 1972)
    5.6.4 Oddie and Milne (1990)
    5.6.5 Summary
  5.7 Allais' Example and Expected Utility
  5.8 Lotteries with Non-monetary Prizes

Appendices

Appendix 1 Elementary Probability

Appendix 2 Historical Notes
  2.1 The Classical Problem of Points
    2.1.1 Pascal's solution
    2.1.2 Fermat's solution
  2.2 Generalizations
    2.2.1 Pascal's approach
    2.2.2 Fermat's approach
  2.3 Other Problems
  2.4 Exercises
  2.5 Solutions

Appendix 3 Monte Carlo simulations
  3.1 Results

Chapter 1

Introduction

"What will I suffer now?" These words by Odysseus, one of the most successful decision makers of Western literature, express what most people at times experience, both in their private life and as professionals. It is also worth noting that they are not outdated, since, in contrast to other areas, man as a decision maker in general relies more on intuition than on rules when solving decision problems. Since such a procedure can hardly be seen as an ideal one, especially if a proposed solution must be supported by arguments, it is natural to look into the possibility of finding formal methods, like those in logic, for solving decision problems.

    1.1 Decision problems

If you are looking for the right kind of white color for the walls of your drawing-room, or the perfect sour milk, then you are facing an optimization problem. If instead you are pondering which brand of yogurt to buy at your local grocery, or trying to find the ingredients of a decent meal there, then you are facing a problem of choice, or a problem of satisfaction. Decision problems come in one of these forms or as combinations of them. A good example of the latter case is the procedure Swedish authorities follow when selecting tenders under the Act of Public Procurement. First each tender is subjected to a test in order to see if it fulfills certain minimum requirements. Then a selection is made from the tenders that have passed the initial test.

There is also another and perhaps more familiar way of classifying decisions, namely in terms of timescales. In this course only decisions where there is ample time for deliberation are studied. Hence operational decisions made by firefighters, policemen and soldiers in the field are not considered. Moreover, the main focus will be on professional decision making, since such decisions ought to be well founded. However, as an introduction, three simple problems are presented.

1.1.1 Choice of medical treatment (adapted from Weinstein and Fineberg [1980, p. 180])

A 68 year old woman has suffered from bad circulation in the left leg for some time. She is now seeing a doctor because of an injury that has led to an infection that may develop into gangrene in the left foot. There are two available options: either an immediate operation O, leading to the insertion of an artificial limb below the left knee, or a treatment with drugs during three months, M. Such a treatment is successful in seven out of ten cases. But if it fails, a more complicated operation leading to the insertion of an artificial limb above the left knee is needed. The doctor considers the probabilities for a successful operation to be 99 percent if it is done right away, and 90 percent if it is done after an unsuccessful treatment with drugs. The doctor is also willing to let the preferences of the patient influence her proposals. How should she act?

1.1.2 A choice between two languages for computer programs

A result of a cooperation with section TR of ELLEMTEL, a company owned jointly by Telia and Ericsson.

In spring and summer 1987 TR had developed a small prototype of a support system for telephone services, consisting of a simulator and a planner. The prototype was written in Prolog and was presented to various audiences. Since these presentations were quite successful, it was decided that TR should develop a more elaborate prototype containing more tools. Large parts of this prototype were implemented during the autumn, but it turned out that fitting the different parts to one another was harder than expected, since the Prolog system used could not handle large programs. The staff spent a lot of time learning more about Prolog and how to deal with large programs, and in spring 1988 it was decided that the prototype should be ready for presentation after the summer. The available options turned out to be the following ones:

A: The whole system is written in C

B: The section continues searching for a version of Prolog that can handle large programs

C: The project is temporarily abandoned

TR then embarked upon evaluating the options by stating their pros and cons:

Pro A: The number of man hours needed could be estimated without difficulty. The finished product would in all probability be stable and enable swift computations.

Con A: The whole code must be rewritten, and getting hold of proficient C programmers could be a problem. Moreover, the staff would be forced to spend a lot of time programming in C, which in turn would stop them from engaging in more interesting work.

Pro B: If a stable Prolog system is found, then the prototype would be completed with limited effort. Hence the staff could spend time on improving the system by adding more tools.

Con B: Only parts of a prototype might be completed in September.

Pro C: New and better Prolog systems are likely to appear in the not too distant future. Hence TR should for the moment engage in research and development rather than develop a prototype.

Con C: The whole approach of TR might be discredited if a complete prototype isn't completed by September.

At a preliminary meeting with me it turned out that C was not a viable option, and I was asked to demonstrate the possible merits of a decision theoretic approach to make a choice between A and B. The result is shown in the next section.

    1.1.3 Monty Hall

Suppose that you are to select one of three doors in a popular American TV show. You are informed that there is a car behind one of the doors, and a lemon behind each of the other ones. When you have made your choice, the host, Monty Hall, opens one of the doors that you didn't choose, and behind which there is a lemon. You are now invited to make a new choice. Should you accept the invitation and choose the other, still closed, door?

1.2 Procedures for handling decision problems: the decision theoretic approach

    1.2.1 Preliminaries

The salient feature of a decision theoretic approach to decision problems is that an option considered is evaluated in accordance with its consequences. This entails that an evaluation of an option must be preceded by both an enumeration and an evaluation of its consequences. For some decisions, see for instance the Monty Hall example above, this causes no problem, but in the other cases considered above it is far from obvious how this is to be accomplished. I will return to these problems in a moment, after a few useful general steps have been considered.

First of all it must be decided whether the problem considered is a problem of choice or a problem of satisfaction, or even an optimization problem. Then a suitable framework has to be considered. Take as an example the problem of how to handle the nuclear waste from the nuclear power plants in Sweden. This could be seen either as a problem limited to nuclear waste or in the more general context of long-lived toxic waste. In this area feasible strategies clearly depend on the framework chosen. Furthermore, a horizon must be fixed, since it is in practice impossible to enumerate all consequences of a given option. Finally, the evaluation of the consequences must in some cases proceed in stages, starting from a consideration of price or quality and ending in a final evaluation of them.

    1.2.2 Examples

In this section it is shown how a decision theoretic approach can be used to solve the problems mentioned above. The method used will be described in more detail in chapter 3, followed by some theoretical underpinnings in chapters 4 and 5.

Gangrene. First, the doctor should try to find out if there exists a decisive factor. In this case two such factors come to mind: the possibility of a recovery and the risk of serious complications. If one of these is decisive for the patient, then the problem is an easy one. But if this is not the case, then some trade-off between values and probabilities is needed. One way of doing this is to let the probabilities of the various consequences act as weights and consider the weighted arithmetic mean of the values of the consequences of the options. Then the option with the highest mean value is chosen. Sadly, this approach presupposes a mathematical representation of the values of the outcomes, something found strange by some people in the medical profession. However, to my mind this can be done in an uncontroversial way. Let us first label the consequences as follows:

    r: recovery

    b: an artificial limb below the knee

    a: an artificial limb above the knee

    s: serious consequences

(Note that s in reality is a set of consequences, but since these are the same for the options considered, there is no point in enumerating them.)

Let v be a value function, which maps a consequence to a specific value. Consequently, if c is a consequence, then v(c) is the value of that consequence. Returning to our decision problem, and with the help of the value function v, we can now define the values of the consequences: v(r) is the value of a recovery, v(b) is the value of an artificial limb below the knee, v(a) is the value of an artificial limb above the knee, and v(s) is the value of serious complications.

If we let the values of the consequences correspond to some numerical value, and the better the consequence the higher that numerical value would be, most patients are likely to rank the consequences as

    v(r) > v(b) > v(s), (1.1)

    v(r) > v(a) > v(s), (1.2)

meaning that recovery is better than an artificial limb below the knee, which in turn is better than serious consequences. Likewise, recovery is better than an artificial limb above the knee, which is better than serious consequences.

    Most patients would also set

    v(b) > v(a) (1.3)

so I will concentrate on this case.

Let us now calculate the mean consequence value (denoted M) of an immediate operation, M_O, and of a drug treatment, M_M. The decision problem can be modeled as a tree structure with nodes connected by edges (see figures 1.1 and 1.2), where the numbers above the edges represent the probabilities of the respective outcomes. For example, if we choose the alternative O, then the probability of a successful operation b is 0.99. In the tree of M (see figure 1.2), the symbol o stands for an operation, given that the drug treatment failed.

Figure 1.1: The tree of an immediate operation, showing the probability of a successful operation b to be 99 percent, and the probability of serious complications s to be 1 percent.

Figure 1.2: This is the tree of the choice of a drug treatment. The probability for recovery r is 70 percent. Should the drug treatment fail, the probability of a successful operation with an artificial limb above the knee a is 90 percent, and the probability of serious consequences s is 10 percent.

A mean consequence value is the weighted average of the mean consequence values of the subsequent nodes, and the weighted average is the sum of all consequence values multiplied by their respective probabilities. Looking at the tree in figure 1.3, the weighted average of the nodes is calculated as

aP(a) + bP(b) + cP(c)

where a, b and c represent the value of each node and P is the probability function. You can read more about the probability function in appendix 1.

The mean consequence value of M is the weighted average of v(o) and v(r). We have already set v(r) = 1, but we need to calculate the value of o, which in turn is the weighted average of v(s) and v(a), in order to calculate M_M.

Figure 1.3: The weighted average of a, b and c is calculated as aP(a) + bP(b) + cP(c).

Let's start by calculating v(o), and then continue with M_M and M_O:

v(o) = P(a)v(a) + P(s)v(s)
M_M = P(o)v(o) + P(r)v(r)
M_O = P(b)v(b) + P(s)v(s)    (1.4)

From the inequalities in (1.1) and (1.2) we know that r is the best consequence and s is the worst consequence. Setting v(r) = 1 and v(s) = 0 gives us a numerical interval of consequence values ranging from 0 to 1, which enables us to simplify v(o), M_M and M_O to

v(o) = P(a)v(a) + P(s)v(s)
     = P(a)v(a) + P(s)·0
     = P(a)v(a)
     = 0.9 v(a)

M_M = P(o)v(o) + P(r)v(r)
    = P(o)v(o) + P(r)·1
    = P(o)v(o) + P(r)
    = 0.3(0.9 v(a)) + 0.7
    = 0.27 v(a) + 0.7

M_O = P(b)v(b) + P(s)v(s)
    = P(b)v(b) + P(s)·0
    = P(b)v(b)
    = 0.99 v(b)    (1.5)

To determine which of the alternatives O and M has the highest value, we can calculate the difference ΔM between the mean consequence values of the two alternatives as

ΔM = M_M - M_O = 0.27 v(a) + 0.7 - 0.99 v(b)    (1.6)

Consequently, if ΔM is positive, the preferred alternative should be a treatment with drugs, M, and if it is negative the rational choice should be an immediate operation, O.

A three-dimensional plot of the equation in (1.6) can be seen in figure 1.4 (one dimension for each of the variables v(a) and v(b), and one dimension for the result ΔM). The plot's triangular shape is due to the constraint in (1.3), that v(b) > v(a). As v(b) → 0 and v(a) → 0, ΔM approaches its maximum ΔM_max, and as v(b) → 1 and v(a) → 0, ΔM approaches its minimum ΔM_min.

From (1.1) and (1.2) we know that v(b) and v(a) can only take on values in the so called open interval ]0, 1[, i.e. any value in between 0 and 1, but not exactly 0 or exactly 1. Thus

ΔM_max < 0.27·0 + 0.7 - 0.99·0 = 0.7
ΔM_min > 0.27·0 + 0.7 - 0.99·1 = -0.29


Figure 1.4: From the plot we can see that ΔM approaches its maximum when both v(a) and v(b) approach 0, and approaches its minimum when v(a) approaches 0 and v(b) approaches 1.

and hence ΔM must lie somewhere in the interval

-0.29 < ΔM < 0.7.

Since ΔM can take on both negative and positive values, we need more constraints on the values in order to favor one of the options. In particular we must introduce the notion of distance between values. One such constraint that comes to mind is

v(r) - v(b) ≥ v(b) - v(a),

which reflects the opinion that the distance in value between recovery and an artificial limb below the knee is not smaller than the distance between the two different artificial limbs. To employ the inequality we can rearrange things a bit:

v(r) - v(b) ≥ v(b) - v(a)
v(r) ≥ 2v(b) - v(a)
v(r) + v(a) ≥ 2v(b)
v(b) ≤ (v(r) + v(a))/2

and just as above, set v(r) = 1, which in turn gives

v(b) ≤ (1 + v(a))/2.

At the same time we know from (1.3) that

v(a) < v(b)

and thus we have

v(a) < v(b) ≤ (1 + v(a))/2.

Applying the above inequality to ΔM gives us

0.27 v(a) + 0.7 - 0.99(1 + v(a))/2 ≤ ΔM < 0.27 v(a) + 0.7 - 0.99 v(a)
0.205 - 0.225 v(a) ≤ ΔM < 0.7 - 0.72 v(a)    (1.7)

and thus a new interval for ΔM, depending only on v(a), where 0 < v(a) < 1, as in figure 1.5.

Figure 1.5: This plot shows the boundaries of ΔM as functions of v(a) only.

By using (1.7) and solving

0.7 - 0.72 v(a) = 0, i.e. v(a) = 0.7/0.72 ≈ 0.972,

we see that for ΔM to lie completely in a negative interval (i.e. for an immediate operation M_O to be preferred over a treatment with drugs M_M), the value of an artificial limb above the knee, v(a), needs to be greater than 0.972. But then, since v(r) > v(b) > v(a), the values of v(r), v(b) and v(a) would be so close that an operation now can only be recommended if the risk of serious complications is a decisive factor. Hence the doctor should recommend a medical treatment even in this case. This is also a reasonable recommendation even in the absence of the inequality

v(r) - v(b) ≥ v(b) - v(a),

because it seems safe to assume that the value of an artificial limb above the knee, v(a), is at least as high as 0.8. Setting v(a) = 0.8 in the original equation for ΔM gives us

ΔM = 0.27 v(a) + 0.7 - 0.99 v(b)    (1.8)
   = 0.27·0.8 + 0.7 - 0.99 v(b)    (1.9)
   = 0.916 - 0.99 v(b)    (1.10)

which in turn means that v(b) must be higher than approximately 0.925 if ΔM is to be less than zero, and consequently an immediate operation M_O is to be preferred.
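To make the sensitivity analysis easy to reproduce, here is a small Python sketch (added for illustration; it is not part of the original text and the variable names are ad hoc). It evaluates ΔM = 0.27 v(a) + 0.7 - 0.99 v(b) on a grid of value assignments satisfying v(a) < v(b) and v(b) ≤ (1 + v(a))/2, and reports the range of ΔM that is attainable under these constraints.

```python
# Minimal sketch of the sensitivity analysis for the gangrene example (illustrative only).

def delta_M(va, vb):
    # Difference between the mean consequence values of drug treatment and operation,
    # with v(r) = 1 and v(s) = 0; cf. equation (1.6).
    return 0.27 * va + 0.7 - 0.99 * vb

steps = 500
values = []
for i in range(1, steps):
    va = i / steps
    for j in range(1, steps):
        vb = j / steps
        # Constraint (1.3) and the distance assumption v(b) <= (1 + v(a))/2.
        if va < vb <= (1 + va) / 2:
            values.append(delta_M(va, vb))

print(min(values), max(values))   # roughly -0.02 and 0.7: Delta M is negative only for v(a) near 1
```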

Suitable language for a computer program: a decision theoretic approach. As a first step it was agreed upon that the consequences ci of the two options were the following ones:

Option A
c1: Prototype in C ready in September 1988, costs quite high, staff not entirely satisfied
c2: Prototype in C somewhat delayed due to circumstances outside the control of TR
c3: Neither c1 nor c2

Option B
c4: Prototype in Prolog ready in September 1988, low costs, staff pleased
c5: Prototype in Prolog somewhat delayed due to circumstances outside the control of TR
c6: Only fractions of a prototype ready in September 1988, low costs, staff frustrated
c7: None of c4, c5 or c6

It was then agreed upon that the probabilities and values were the following ones:

Probabilities. The probability of c1 is quite high, at least 2/3. It is almost certain that c1 or c2 occurs. The probability of the event c4 or c5 is quite uncertain; it could be as low as 0.1 and as high as 0.6. The event c4 or c5 or c6 is almost certain.

Values. The most desirable consequence is c4, which is slightly better than c5. These two are clearly more desirable than c1, which is slightly better than c2, whereas c6 is much worse than c2.

Comparison of the options. As is to be expected, the staff did not consider any factor as decisive. Therefore it is natural to compare the mean values of the options or, to borrow a term from probability theory, their expected values. As in the previous example this presupposes a mathematical representation. After some discussion TR accepted the following one:

P(c1) ≥ 2/3,  P(c1) + P(c2) = 1    (1.11)

0.1 ≤ P(c4) + P(c5) ≤ 0.6,  P(c4) + P(c5) + P(c6) = 1    (1.12)

v(c6) < v(c2) < v(c1) < v(c5) < v(c4)

v(c1) - v(c2) = v(c4) - v(c5) < v(c5) - v(c1) < v(c2) - v(c6)    (1.13)

Here P(ci) is the probability of ci, and v(ci) is its numerical value. Note that the representation is not to be viewed as an exact one. For instance, the interval [0.1, 0.6] should be viewed as sufficiently large for P(c4) + P(c5). Hence we must be careful not to let a ranking of the options depend on values near the boundaries.

To simplify the comparison between the options we use the variables a, b and c such that

a = P(c1),
b = P(c4),
c = P(c5).

Take note that we now, according to (1.11) and (1.12), can substitute P(c2) with 1 - a, and P(c6) with 1 - (b + c). In addition, we set

v(c4) = 1    (1.14)
v(c6) = 0

to reflect the best and the worst consequence, and use the variables x and y where

x = v(c2)    (1.15)
y = v(c1) - x = v(c1) - v(c2) = v(c4) - v(c5)    (1.16)

From (1.13) we know that

v(c4) - v(c5) = v(c1) - v(c2),

so we can, according to (1.14), (1.15) and (1.16), make the substitution

v(c5) = v(c4) + v(c2) - v(c1)
      = 1 + x - (x + y)
      = 1 - y.

Calculating the expected value of the alternatives A and B gives us

E[A] = P(c1)v(c1) + P(c2)v(c2)
     = a(y + x) + (1 - a)x
     = ay + ax + x - ax
     = x + ay

and

E[B] = P(c4)v(c4) + P(c5)v(c5) + P(c6)v(c6)
     = b + c(1 - y) + (1 - (b + c))·0
     = b + c(1 - y)

where the intervals for a, b, c, where again a = P(c1), b = P(c4) and c = P(c5), are

2/3 ≤ a ≤ 1    (1.17)
0 ≤ b    (1.18)
0 ≤ c    (1.19)
0.1 ≤ b + c ≤ 0.6    (1.20)

The intervals for x and y are a bit more tricky. In (1.15) we set

x = v(c2)

and in (1.16)

y = v(c1) - x = v(c1) - v(c2) = v(c4) - v(c5).

In other words, y represents the difference between v(c1) and v(c2), and between v(c4) and v(c5). Since v(c6) = 0, x represents the difference between v(c2) and v(c6). If we add another variable z = v(c5) - v(c1) and look at the inequalities in (1.13), we see that

0 < y < z < x < 1,
2y + z + x = 1

and thus we have that

0 < y < 0.25 < x < 1,
x + y > 0.5.

(These bounds follow since 1 = 2y + z + x < 4x, 1 = 2y + z + x > 4y, and, because z < x, 1 = 2y + z + x < 2y + 2x.)

We now start the comparison of E[A] and E[B] by computing the maximum, minimum and mean of both, using the domain

D = {(a, b, c) | 2/3 ≤ a ≤ 1, 0 ≤ b, 0 ≤ c, 0.1 ≤ b + c ≤ 0.6}

specified above. Repeating the formulas for the expected values of A and B we have

E[A] = x + ay    (1.21)
E[B] = b + c(1 - y)    (1.22)

We begin by calculating the minimum value of E[A]. From (1.21) it should be clear that we need to set a to its minimum value according to D, and thus we obtain

min(E[A]) = x + (2/3)y.

To get the maximum value of E[A] we set a to its largest value and get

max(E[A]) = x + y.

We then treat E[B] in the same way, which gives

min(E[B]) = 0.1(1 - y),
max(E[B]) = 0.6.

As is customary, the mean of the means is defined as the integral of the means over D, divided by the area of D. This gives

mean(E[A]) = ∫_D (x + ay) dA / ∫_D dA

and

mean(E[B]) = ∫_D (b + c(1 - y)) dA / ∫_D dA.

Starting with ∫_D dA, we notice that the boundaries of a don't pose any particular problems. However, the boundaries of b and c are not as simple. Looking at the region plot in figure 1.6 we see that the region bounded by b and c forms a triangle with one of the corners missing. Thus we can start by integrating over the whole triangle and then subtract the integration over the missing corner.

Figure 1.6: The region bounded by b and c.

The line that makes up the top boundary of the triangle can be expressed as

c = 0.6 - b = 3/5 - b,

where 0 ≤ b ≤ 3/5, and the line making up the top boundary of the missing corner can similarly be expressed as

c = 0.1 - b = 1/10 - b,

where 0 ≤ b ≤ 1/10.

Now we can write the integral over D as

∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} dc db da - ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} dc db da.

Calculating one term at a time results in

∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} dc db da = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [c]_{0}^{3/5-b} db da
  = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (3/5 - b) db da
  = ∫_{a=2/3}^{1} [3b/5 - b²/2]_{0}^{3/5} da
  = ∫_{a=2/3}^{1} (9/25 - 9/50) da
  = [9a/25 - 9a/50]_{2/3}^{1}
  = 9/25 - 9/50 - (6/25 - 6/50)
  = 3/50

and

∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} dc db da = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [c]_{0}^{1/10-b} db da
  = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (1/10 - b) db da
  = ∫_{a=2/3}^{1} [b/10 - b²/2]_{0}^{1/10} da
  = ∫_{a=2/3}^{1} (1/100 - 1/200) da
  = [a/100 - a/200]_{2/3}^{1}
  = 1/100 - 1/200 - (2/300 - 1/300)
  = 1/600,

which finally gives

∫_D dA = 3/50 - 1/600 = 7/120.

Continuing in the same manner with calculating

∫_D (x + ay) dA = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} (x + ay) dc db da - ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} (x + ay) dc db da,

taking one term at a time, gives

∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} (x + ay) dc db da = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [cx + acy]_{0}^{3/5-b} db da
  = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (3x/5 - bx + 3ay/5 - aby) db da
  = ∫_{a=2/3}^{1} [3bx/5 - b²x/2 + 3aby/5 - ab²y/2]_{0}^{3/5} da
  = ∫_{a=2/3}^{1} (9x/25 - 9x/50 + 9ay/25 - 9ay/50) da
  = [9ax/25 - 9ax/50 + 9a²y/50 - 9a²y/100]_{2/3}^{1}
  = [9ax/50 + 9a²y/100]_{2/3}^{1}
  = 9x/50 + 9y/100 - (18x/150 + 36y/900)
  = 3x/50 + y/20

and

∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} (x + ay) dc db da = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [cx + acy]_{0}^{1/10-b} db da
  = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (x/10 - bx + ay/10 - aby) db da
  = ∫_{a=2/3}^{1} [bx/10 - b²x/2 + aby/10 - ab²y/2]_{0}^{1/10} da
  = ∫_{a=2/3}^{1} (x/100 - x/200 + ay/100 - ay/200) da
  = [ax/100 - ax/200 + a²y/200 - a²y/400]_{2/3}^{1}
  = [ax/200 + a²y/400]_{2/3}^{1}
  = x/200 + y/400 - (2x/600 + 4y/3600)
  = x/600 + y/720,

resulting in

∫_D (x + ay) dA = 3x/50 + y/20 - (x/600 + y/720) = 7x/120 + 7y/144,

which in turn gives

mean(E[A]) = ∫_D (x + ay) dA / ∫_D dA = (7x/120 + 7y/144) / (7/120) = x + (5/6)y.    (1.23)

To calculate the mean of E[B], we first solve

∫_D (b + c(1 - y)) dA = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} (b + c(1 - y)) dc db da - ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} (b + c(1 - y)) dc db da.

Starting with the first term we have

∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} (b + c(1 - y)) dc db da = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [bc + c²(1 - y)/2]_{0}^{3/5-b} db da
  = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (9/50 - b²/2 - 9y/50 + 3by/5 - b²y/2) db da
  = ∫_{a=2/3}^{1} [9b/50 - b³/6 - 9by/50 + 3b²y/10 - b³y/6]_{0}^{3/5} da
  = ∫_{a=2/3}^{1} (9/125 - 9y/250) da
  = [9a/125 - 9ay/250]_{2/3}^{1}
  = 9/125 - 9y/250 - (6/125 - 3y/125)
  = 3/125 - 3y/250

and the second term yields

∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} (b + c(1 - y)) dc db da = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [bc + c²(1 - y)/2]_{0}^{1/10-b} db da
  = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (1/200 - b²/2 - y/200 + by/10 - b²y/2) db da
  = ∫_{a=2/3}^{1} [b/200 - b³/6 - by/200 + b²y/20 - b³y/6]_{0}^{1/10} da
  = ∫_{a=2/3}^{1} (1/3000 - y/6000) da
  = [a/3000 - ay/6000]_{2/3}^{1}
  = 1/3000 - y/6000 - (1/4500 - y/9000)
  = 1/9000 - y/18000,

which taken together gives

∫_D (b + c(1 - y)) dA = 3/125 - 3y/250 - (1/9000 - y/18000) = 43/1800 - 43y/3600.

We then have

mean(E[B]) = ∫_D (b + c(1 - y)) dA / ∫_D dA = (43/1800 - 43y/3600) / (7/120) = 43/105 - 43y/210.    (1.24)

A pairwise comparison between these linear forms then yields

Δmax = max(E[A]) - max(E[B]) = x + y - 0.6,
Δmean = mean(E[A]) - mean(E[B]) = x + (109/105)y - 43/105,
Δmin = min(E[A]) - min(E[B]) = x + (2/3)y - 0.1(1 - y),

where again 0.25 < x < 1 and 0 < y < 0.25. Hence only Δmax can have different signs, whilst Δmean and Δmin always will be positive. But if D is replaced by

D' = {(a, b, c) : 2/3 ≤ a ≤ 1, 0 ≤ b, 0 ≤ c, 0.1 ≤ b + c ≤ 0.5}    (1.25)

then max(E[B]) = 0.5 and consequently

Δmax = x + y - 0.5 > 0

since x + y > 0.5.

In order to test the stability of the sign of Δmean, D can be replaced by

D'' = {(a, b, c) : 2/3 ≤ a ≤ 0.7, 0 ≤ b, 0 ≤ c, 0.3 ≤ b + c ≤ 0.6}    (1.26)

which still yields

Δmean = x + (111/124)y - 203/465 > 0    (1.27)

(the calculations are left to the reader in order to save space) since x + y > 0.5 and x > 0.25. The sign of Δmean is remarkably stable, and since other sensitivity tests yield similar results, we can conclude that A is to be preferred to B.
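The algebra above is also easy to check numerically. The sketch below (added for illustration, not part of the original text) samples the domain D uniformly, estimates mean(E[A]) and mean(E[B]) for a fixed admissible pair (x, y), and compares the estimates with the closed forms (1.23) and (1.24).

```python
import random

def sample_D():
    """Draw (a, b, c) uniformly from D = {2/3 <= a <= 1, b >= 0, c >= 0, 0.1 <= b + c <= 0.6}."""
    while True:
        a = random.uniform(2 / 3, 1)
        b = random.uniform(0, 0.6)
        c = random.uniform(0, 0.6)
        if 0.1 <= b + c <= 0.6:
            return a, b, c

def estimate_means(x, y, n=200_000):
    ea = eb = 0.0
    for _ in range(n):
        a, b, c = sample_D()
        ea += x + a * y           # E[A], equation (1.21)
        eb += b + c * (1 - y)     # E[B], equation (1.22)
    return ea / n, eb / n

x, y = 0.4, 0.2                   # any pair with 0.25 < x < 1, 0 < y < 0.25 and x + y > 0.5 will do
mc_A, mc_B = estimate_means(x, y)
print(mc_A, x + 5 * y / 6)            # Monte Carlo estimate vs closed form (1.23)
print(mc_B, 43 / 105 - 43 * y / 210)  # Monte Carlo estimate vs closed form (1.24)
```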

Monty Hall. We would have reasoned as follows. In line with the classical theory of probability, the distribution of objects behind the doors should be viewed as the outcome of a stochastic experiment. Moreover, there are two possibilities with regard to the distributions. Either all distributions are equally likely, or the car is most likely to appear behind the door in the middle, since the show is sponsored by a car maker who wants the best display of the car. These considerations would have led us to select the door in the middle. Assume now that the host opens door number one. Is it then more likely that the car is behind door number three than behind the door in the middle? An answer to this question must await the result of a few calculations depending on the cases mentioned above.

Case 1. All distributions are equally likely. Let Ci be the event that the car has been put behind door i; then

P(C1) = P(C2) = P(C3) = 1/3.

Now, let Si be the event that door i has been selected and Oi the event that door number i has been opened by the host. Then

P(Ci) = P(Ci|Sj)

for i, j = 1, 2, 3. In other words, since the objects were put behind the doors in advance, the probability that the car has been put behind door i remains the same, regardless of which door has been selected; the expression P(Ci|Sj) is the probability that the car has been put behind door i, given that door j has been selected, see Appendix 1. Moreover, Monty Hall will never open the door behind which there is a car, thus

P(Oi|Ci and Sj) = 0

for i, j = 1, 2, 3. However, since we don't know whether Monty Hall favours one of the doors left when there is a possibility of choice (in other words, when the door behind which the car is hidden has been selected), we have to set

P(Oj|Ci and Si) = p

where i, j = 1, 2, 3, i ≠ j and 0 ≤ p ≤ 1. Lastly, we can be certain that Monty Hall won't open the door with the car behind it, and thus

P(Ok|Ci and Sj) = 1

for i, j, k = 1, 2, 3, where i, j and k are pairwise distinct.

Now we can calculate the probability that the car is behind the door we selected initially, given that Monty Hall opened one of the other doors, as

P(Ci|Oj and Si) = P(Ci and Oj and Si) / P(Oj and Si)

where i, j = 1, 2, 3 and i ≠ j. From now on the word "and" will be omitted from the expressions in order to save space.

The numerator can, using the product rule twice, be expressed as

P(CiOjSi) = P(Oj|CiSi)P(CiSi)
          = P(Oj|CiSi)P(Ci|Si)P(Si)
          = p(1/3)P(Si)
          = (P(Si)/3)p.

Passing to the denominator, we note that the events Oj and Si can occur both when the car is behind door i and when it is behind door k, where k = 1, 2, 3 and k ≠ i, j. Remember that Monty never will open the door containing the car, and thus we don't need to consider the case when the car is behind door j. Consequently

P(OjSi) = P(OjSiCi) + P(OjSiCk),

but applying the product rule to the individual terms yields

P(OjSi) = P(Oj|SiCi)P(SiCi) + P(Oj|SiCk)P(SiCk)
        = P(Oj|SiCi)P(Ci|Si)P(Si) + P(Oj|SiCk)P(Ck|Si)P(Si),

and since we've already specified the probabilities of the factors at the beginning of this section, we can finally write the denominator as

P(OjSi) = p(1/3)P(Si) + 1·(1/3)P(Si)
        = (p/3)P(Si) + (1/3)P(Si)
        = (P(Si)/3)(p + 1).

Going back to our original expression, replacing the old expressions for the numerator and denominator, we have

P(Ci|Oj and Si) = P(Ci and Oj and Si) / P(Oj and Si)
               = [(P(Si)/3)p] / [(P(Si)/3)(p + 1)]
               = p/(p + 1).

Now, because 0 ≤ p ≤ 1, we know that

0 ≤ p/(p + 1) ≤ 1/2

and thus we would recommend choosing the other, still closed, door.
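The result p/(p + 1) is easy to confirm by simulation. The following sketch (added here for illustration; it is not taken from the original text) lets the contestant always select door 0 and, when the host is free to choose, lets him open door 1 with probability p. Conditioning on the host actually opening door 1, the frequency with which the car sits behind the selected door should come out close to p/(p + 1).

```python
import random

def stick_wins_given_host_opened(p, door_opened=1, trials=200_000):
    """Estimate P(car behind the selected door | host opened `door_opened`),
    with the contestant always selecting door 0 (doors are numbered 0, 1, 2)."""
    stick_wins = conditioned = 0
    for _ in range(trials):
        car = random.randrange(3)
        if car == 0:                              # host is free to choose between doors 1 and 2
            opened = 1 if random.random() < p else 2
        else:                                     # host must open the remaining lemon door
            opened = 3 - car
        if opened == door_opened:
            conditioned += 1
            stick_wins += (car == 0)
    return stick_wins / conditioned

p = 0.5
print(stick_wins_given_host_opened(p), p / (p + 1))   # both close to 1/3 when p = 0.5
```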

Case 2. The distribution of cars is skewed. In contrast to the previous case, we now consider the case where P(Ci) > P(Cj) = P(Ck) and set

P(Ci) = q

and

P(Cj) = P(Ck) = (1 - q)/2

where 1/3 < q < 1. Calculations similar to those in case 1 yield

P(Ci|OjSi) = pqP(Si) / (pqP(Si) + ((1 - q)/2)P(Si))
           = pqP(Si) / (P(Si)(pq + (1 - q)/2))
           = pq / (pq + (1 - q)/2).

As a matter of curiosity, if we define the domains

D = {(p, q) | 0 ≤ p ≤ 1, 1/3 < q < 1}

and

D' = {(p, q) | pq/(pq + (1 - q)/2) > 1/2},

then

∫_{D'} (pq/(pq + (1 - q)/2)) dA / ∫_D (pq/(pq + (1 - q)/2)) dA ≈ 0.84,

which would lead to the recommendation of not changing doors. However, one could argue that it is unlikely that q > 0.4, and for

0.4p/(0.4p + 0.3) > 1/2

p has to be greater than 3/4. But that Monty Hall would prefer a particular door in three out of four cases seems highly unlikely. Hence it seems reasonable to make a switch even in this case.

Chapter 2

Classical Decision Theory

2.1 Pure Classical Decision Theory

This theory can be viewed as an application of Probability Calculus to games of chance like the following ones:

Game 1
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:

(a) 12 000 SEK if the product of the outcomes is an odd number and their sum a square number.

(b) 4 000 SEK if the product of the outcomes is an odd number and their sum minus one a square number.

(c) 1 000 SEK if the product of the outcomes is an odd number and neither (a) nor (b) holds.

Game 2
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:

(a) 3 000 SEK if the product of the outcomes is a square number, their sum is odd, and the number one does not occur.

(b) 2 000 SEK if the product of the outcomes is a square number, their sum is even, and the number one does not occur.

(c) 550 SEK if the product of the outcomes is a square number and neither (a) nor (b) holds.

Game 3
One throw with four distinguishable dice.
Stake: 200 SEK
Prize: 202 SEK if the product of the outcomes is at least two.

Game 4
One throw with four distinguishable dice.
Stake: 200 SEK
Prize: 250 000 SEK if the product of the outcomes is one.

According to the classical theory all outcomes of these lotteries are equally likely. Moreover, each lottery has a value that is equal to its expected prize, which is defined as the weighted arithmetic mean of the prizes with their probabilities serving as weights. Accordingly, the classical theory assigns the following expected prizes to our lotteries:

Game 1: (11/1296)·12 000 + (16/1296)·4 000 + (54/1296)·1 000 = 192.90

Game 2: (24/1296)·3 000 + (65/1296)·2 000 + (110/1296)·550 = 202.55

Game 3: (1295/1296)·202 = 201.84

Game 4: (1/1296)·250 000 = 192.90
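The counts 11, 16, 54, and so on out of the 1296 equally likely outcomes can be verified by brute force. The sketch below (added for illustration, not part of the original text) enumerates every throw of four distinguishable dice and recomputes the expected prize of each game.

```python
from itertools import product
from math import isqrt, prod

def is_square(n):
    return isqrt(n) ** 2 == n

throws = list(product(range(1, 7), repeat=4))        # all 6^4 = 1296 equally likely outcomes

def expected_prize(prize):
    return sum(prize(t) for t in throws) / len(throws)

def game1(t):
    s, p = sum(t), prod(t)
    if p % 2 == 1:                                   # product odd
        if is_square(s):
            return 12_000
        if is_square(s - 1):
            return 4_000
        return 1_000
    return 0

def game2(t):
    s, p = sum(t), prod(t)
    if is_square(p) and 1 not in t:
        return 3_000 if s % 2 == 1 else 2_000
    if is_square(p):
        return 550
    return 0

def game3(t):
    return 202 if prod(t) >= 2 else 0

def game4(t):
    return 250_000 if prod(t) == 1 else 0

for g in (game1, game2, game3, game4):
    print(g.__name__, round(expected_prize(g), 2))
# game1 192.9, game2 202.55, game3 201.84, game4 192.9
```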

The assignment of expected prizes is to serve as a guide for a choice, and the classical theory in this case recommends game 2 if one were to choose between the different games. Furthermore, it even prescribes that anyone at any time should accept an offer to play game 2 or game 3, because their expected prizes are higher than the stakes. The basis for this prescription is provided by Bernoulli's law of large numbers (see Appendix 1), which shows that the average return in the long run, with a high probability, will be close to the expected prize. Thereby an empirical hypothesis, to the effect that the relative frequency of a given outcome in the long run will equal its probability, is presupposed. To assess the basis for the classical assignment of expected prizes, it may be instructive to estimate the long run in this case. We start by noting the variances of the different games.

Game 1: 1 424 209
Game 2: 351 934
Game 3: 31
Game 4: 48 188 098

If we then demand that the probability of a mean deviation of at most 4 SEK from the expected prize should be at least 0.99, then Chebyshev's inequality yields the following numbers:

Game 1: 8 901 305
Game 2: 2 199 588
Game 3: 197
Game 4: 301 175 611

Now, these estimates are mainly interesting from an historical point of view, since modern sharper estimates yield lower but still forbiddingly high numbers. The Bernstein-Bennett inequality (Bennett, 1962), for instance, yields the following ones:

Game 1: 953 608
Game 2: 235 540
Game 3: 21
Game 4: 32 134 317

Using Talagrand's (1995) inequality we obtain even smaller numbers. Note that it isn't applicable to game 3 due to the size of the interval in comparison to the maximum win:

Game 1: 810 700
Game 2: 200 185
Game 4: 27 332 010

Moreover, extensive simulations (see Appendix 3) yield numbers in the same region. Hence these numbers are likely to be the true ones.

But then the evaluation of games of chance given by the classical theory seems rather unconvincing, and its influence almost a mystery. As a basis for a choice between the lotteries above, the following considerations seem more natural. A risk averse person should choose game 3, whereas one only interested in games where there are prospects for a substantial gain should choose game 4. If, on the other hand, someone badly needs 12 000 SEK, then game 1 ought to be the most attractive one. Only if such considerations are not met with success does a trade-off between prizes and probabilities seem to be needed. Then basing a choice between games on their expected prizes seems attractive mainly because the expected prize takes care of probabilities and prizes in such a simple way.
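Returning to the long-run estimates: the Chebyshev figures quoted above can be reproduced, up to rounding of the variances, from the bound P(|X̄ - μ| ≥ ε) ≤ σ²/(nε²). Demanding that this probability be at most 1 - 0.99 = 0.01 for ε = 4 SEK gives n ≥ σ²/(0.01·ε²). The snippet below (added for illustration, not part of the original text) carries out this arithmetic.

```python
variances = {"Game 1": 1_424_209, "Game 2": 351_934, "Game 3": 31, "Game 4": 48_188_098}
eps, delta = 4, 0.01   # allowed deviation in SEK and allowed failure probability

for game, var in variances.items():
    n = var / (delta * eps ** 2)   # required number of plays by Chebyshev's inequality
    print(game, n)
# roughly 8.9 million, 2.2 million, about 194, and 301 million plays,
# close to the figures quoted in the text above
```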

After this account the salient features of Pure Classical Decision Theory may be summarized as follows:

- it aims at supporting a choice between options which are completely analyzed, i.e. the consequences of each option are determined and have prizes attached to them,

- the probabilities of the consequences are determined by considerations of symmetry, and

- the value of an option equals its expected prize.

    2.2 Modified Classical Decision Theory

    2.2.1 Introduction to utilities

Utilities were introduced by Daniel Bernoulli in 1738 to solve a mathematical problem (see Appendix 2). But it seems reasonable to use them even for handling finite decision problems like the ones described in sections 1.1.1 and 2.1.

Utilities are real numbers defined by functions from the outcomes of options or, as in section 2.1, from the prizes associated with these outcomes. In the example of section 2.1, we can for instance define a utility function by setting

U(x) = x, if x ≤ 1000,
U(x) = (1000/3)·log10(x), if x > 1000.    (2.1)

Then the expected utilities of the games in section 2.1 are as follows:

Game 1: (11/1296)·U(12000) + (16/1296)·U(4000) + (54/1296)·U(1000) = 68.03

Game 2: (24/1296)·U(3000) + (65/1296)·U(2000) + (110/1296)·U(550) = 123.33

Game 3: (1295/1296)·U(202) = 201.84

Game 4: (1/1296)·U(250000) = 1.39

Hence a person who opts for choosing between games based on this utility function should choose game 3, provided she evaluates the games in accordance with the utility principle: the value of a game equals its expected utility.
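For reference, the expected utilities can be recomputed with a few lines of code (an illustrative sketch, not part of the original text), using the utility function (2.1) and the outcome counts from section 2.1.

```python
import math

def U(x):
    # Utility function (2.1): U(x) = x for x <= 1000 and (1000/3)*log10(x) otherwise.
    return x if x <= 1000 else 1000 * math.log10(x) / 3

games = {
    "Game 1": [(11, 12_000), (16, 4_000), (54, 1_000)],
    "Game 2": [(24, 3_000), (65, 2_000), (110, 550)],
    "Game 3": [(1295, 202)],
    "Game 4": [(1, 250_000)],
}
for name, outcomes in games.items():
    eu = sum(count * U(prize) for count, prize in outcomes) / 1296
    print(name, round(eu, 2))
# Game 1 68.03, Game 2 123.33, Game 3 201.84, Game 4 1.39
```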

But what grounds are there for evaluating games in accordance with this principle? In case

U(x) = ax + b with a > 0

you can rank the games via the Law of Large Numbers, since the expected utility of a game g then equals

aE[g] + b.

But otherwise other grounds must be provided. Attempts in this direction will be discussed in chapter 5.

Chapter 3

Supersoft Decision Theory

Supersoft Decision Theory (SSD) is a family of modifications of Classical Decision Theory. The common trait is that vague and numerically imprecise estimates of probabilities and values are allowed. The main advantage of admitting such estimates is that it makes a smooth application of decision theoretic methods to new problems feasible. This much can be seen from the first two examples of chapter 1. The main drawback, of course, is that calculations then become much harder and therefore in many cases require suitable software. In this chapter one version of the theory, called weak SSD (wSSD), is delineated. For other versions, see Danielson (1997), Ekenberg (2005), and Sundgren (2011).

    3.1 Representation of Decision Problems

We assume that the number of options is finite and that each option has a finite number of consequences. Sometimes the consequences of an option are best described as nodes of a tree. Such a tree can in some cases be quite extensive, see e.g. Johansson (2003, 300-315).

Mathematically, a finite tree can be identified with a finite set T of finite sequences of natural numbers such that

i. T contains exactly one sequence of minimal length, called the root of the tree, and

ii. if (s1, . . . , sn-1, sn) is an element of T, then either (s1, . . . , sn-1, sn) is the element of minimal length or (s1, . . . , sn-1) is an element of T.

If (s1, . . . , sn-1, sn) and (s1, . . . , sn-1) both are in T, then (s1, . . . , sn-1, sn) is an immediate successor of (s1, . . . , sn-1) in T.

An element of T without immediate successors in T is called a leaf of T. Hence the first step in the representation of a decision problem consists of the construction of a number of trees. This is followed by an estimate of the probability of each node of each tree save the ones of minimal length.

Figure 3.1: An example of a tree structure.

The tree T in figure 3.1 can be described as the following set of sequences, where each sequence represents the path to a node:

T = {(1), (1, 2), (1, 2, 5), (1, 2, 6), (1, 3), (1, 3, 7), (1, 3, 8), (1, 3, 9), (1, 4)}.

Since the node labeled 1 is the minimal sequence of T, this is the root of the tree. The nodes labeled 5, 6, 7, 8 and 9 are all leaves since they don't have any immediate successors. We can also see that the sequences together represent all possible paths from the root to the different nodes.
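The set representation translates directly into code. The sketch below (illustrative only, not part of the original text) stores the tree of figure 3.1 as a set of tuples and recovers its root and leaves from the definitions above.

```python
# The tree of figure 3.1 as a finite set of finite sequences (tuples) of natural numbers.
T = {(1,), (1, 2), (1, 2, 5), (1, 2, 6), (1, 3), (1, 3, 7), (1, 3, 8), (1, 3, 9), (1, 4)}

root = min(T, key=len)                                       # the unique sequence of minimal length
leaves = {s for s in T if not any(t[:-1] == s for t in T)}   # sequences with no immediate successor in T

print(root)            # (1,)
print(sorted(leaves))  # [(1, 2, 5), (1, 2, 6), (1, 3, 7), (1, 3, 8), (1, 3, 9), (1, 4)]
```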

In wSSD any kind of probability estimate is admitted, but the basic ones have one of the following four forms:

i. P(E) = a,

ii. P(E1) = P(E2),

iii. a < P(E) < b, or

iv. a ≤ P(E) ≤ b.

The first form is the only one that is mandatory, since the immediate successors of a node form a sample space. The third one occurs above all in mathematical representations of vague estimates like "the event E is quite likely" or "it is more probable than not that E occurs", since vague estimates must always be represented by open sets. Note in particular that if an estimate of E is vague then the same must hold of the corresponding estimate of not-E. Hence these two estimates must be represented by overlapping open sets. Finally, the value of each leaf is estimated.

Sometimes such an evaluation is straightforward, but sometimes it takes the form of an aggregation. In public procurement, for instance, tenders typically are first evaluated according to different criteria such as quality and price. Then these evaluations are aggregated in some other way. In wSSD, primary evaluations of almost any form are admitted, but three forms present themselves as the most natural ones:

    i. v(E) = a,

    ii. a < v(E) < b, and

    iii. an ordering combined with an ordering of distances.

For convenience, all numbers are assumed to be non-negative and at most 1. As an end product, the original decision problem is represented by a mathematical structure F called a decision frame. Such a structure typically has the form (o1, . . . , on, T1, . . . , Tn, S[p], U[v]). Here o1, . . . , on are the options considered and T1, . . . , Tn the corresponding decision trees. Moreover, S[p] is the set of probability estimates and U[v] the set of value estimates.

Let's look at a couple of examples that utilize the above premises. The tree structures in figure 3.2 represent two different options o1 and o2. If we let T1 and T2 be the corresponding decision trees, then

T1 = {(E1), (E1, E1,1), (E1, E1,2)}

and

T2 = {(E2), (E2, E2,1), (E2, E2,2), (E2, E2,3)}.

Figure 3.2: Each of the decision trees represents a unique option.

Since the events E1,1 and E1,2 make up the complete sample space of E1, the probability

P(E1) = P(E1,1) + P(E1,2) = 1.

Likewise, the probability

P(E2) = P(E2,1) + P(E2,2) + P(E2,3) = 1,

which reflects our belief that no outcomes except the immediate successors to E1 or E2, which we have defined explicitly, can occur. In its simplest form we can then let P(E1,1) = a and P(E1,2) = 1 - a, where a represents an exact probability estimate. Using vague probability estimates for the outcomes E2,1, E2,2 and E2,3, we could for example set

a < P(E2,1) < b and P(E2,2) < P(E2,3),

which would represent the belief that the probability of E2,1 lies somewhere in between a and b, and that the probability of E2,2 is less than that of E2,3.

When it comes to estimating the values of the different events, it is usually not possible to directly estimate the values of the options, in this case E1 and E2, since these are dependent on their respective successors. However, we can estimate the values of E1,1 and E1,2 exactly as

v(E1,1) = a and v(E1,2) = b

where a and b are real numbers in the interval [0, 1]. Again, using vague estimates for the values of E2,1, E2,2 and E2,3 we can for example set

    a < v(E2,1) < v(E2,2) and v(E2,3) = 2v(E2,1).

Using the definitions we made earlier, we can now define the decision frame

F = (E1, E2, T1, T2, S[p], U[v]),

where

S[p] = {P(E1) = 1, P(E1,1) = a, P(E1,2) = 1 - a, a < P(E2,1) < b, P(E2,2) < P(E2,3)}

and

U[v] = {v(E1,1) = a, v(E1,2) = b, a < v(E2,1) < v(E2,2), v(E2,3) = 2v(E2,1)}.

3.2 Satisfiable and not Satisfiable Decision Frames

F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) is satisfiable if both S[p] and U[v] are solvable. Then the estimates in S[p] can be interpreted as probability estimates and the values expressed in U[v] can be measured on a common scale. Formally, this is all that is needed to apply all valuations of options developed for classical decision theory.

In case F is not satisfiable, S[p] and U[v] may in some cases be modified in the following ways to yield solvable sets:

i. If S[p] is not solvable, the only general way to proceed is to successively increase the size of the intervals until a solvable set is obtained. If this fails, then the estimates in S[p] are not genuine probability estimates and at least some of them must be reconsidered, see Malmnäs (1981).

ii. If U[v] is not solvable, the only general procedure available is the one outlined above, but in some cases more special procedures are to be preferred. This can be illustrated by a case much discussed by moral philosophers in Sweden.

A well-known problem, see e.g. Bergström (1991), is that pairwise comparisons may yield cycles: U[v] may e.g. contain the following inequalities:

v(ui) < v(ui+1) where 0 ≤ i ≤ n - 1, and
v(un) < v(u0),

which for the sake of clarity also can be written as

v(u0) < v(u1) < v(u2) < · · · < v(un) < v(u0).

Cycles may, of course, give rise to problems for decision makers, especially if they, like Bergström, are to choose between u0, . . . , u1000. A simple way out of this predicament is to replace < by ≈ in the inequalities above. Then all options are equally good. But perhaps the following procedures more truly reflect the value estimates in some cases.

Assume for instance that a decision maker B is facing a choice between u0, . . . , u1000 and that her valuations of them are as follows:

v(ui) < v(uj) where 0 ≤ i < j ≤ 999,
v(ui) < v(u1000) where 1 ≤ i ≤ 999, and
v(u0) > v(u1000),

which we also can express as

v(u0) < v(u1) < v(u2) < · · · < v(u998) < v(u999),
v(u1) < v(u2) < v(u3) < · · · < v(u999) < v(u1000), and
v(u1000) < v(u0).

Assume, in addition, that

v(u0) - v(u1000) = v(ui+1) - v(ui) where 0 ≤ i ≤ 999.

In other words, the pairwise distance in value between any neighboring elements is the same.

If, then, N(ui) is the number of pairs where ui is on top, then

N(u0) = 1,
N(ui) = i where 1 ≤ i ≤ 999, and
N(u1000) = 999.

Hence B should choose u1000 if her choice is to be based on N, since v(u999) < v(u1000).
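The counting argument is easy to verify mechanically. The sketch below (added for illustration; the function beats is an ad hoc encoding of B's assessments, not something from the original text) computes N(ui) for all 1001 alternatives.

```python
def beats(i, j):
    """True if u_i is valued above u_j under B's cyclic assessments."""
    if {i, j} == {0, 1000}:
        return i == 0            # v(u_0) > v(u_1000)
    return i > j                 # otherwise the alternative with the higher index wins

n = 1001
N = {i: sum(beats(i, j) for j in range(n) if j != i) for i in range(n)}
print(N[0], N[1], N[500], N[999], N[1000])   # 1 1 500 999 999
```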

If her choice instead is based on a cup without seeding, then only u998, u999 or u1000 can come out as the winner. Should u0 play the first round against anyone but u1000, then u1000 will win, because it can't lose to anyone but u0. Since u0, . . . , u1000 is an uneven number of outcomes, we have to use a so-called bye competitor in the cup. Any ui will always win against the bye. If for simplicity u0, . . . , u1000 are given the 1001 leftmost positions in the first round and the bye the last position, then the probability of u1000 not winning is approximately 10^-3 (see figure 3.3 for an example). Hence a cup yields the same result as a series in this particular case.

An alternative way of dealing with cycles is to introduce a binary function D(ui, uj), expressing the difference in value between ui and uj, instead of a unary utility function V(ui). The value of this function, which satisfies the conditions D(ui, uj) = -D(uj, ui) and -1 ≤ D(ui, uj) ≤ 1, need not be a number but can be a variable with some constraints. A choice can then be determined by the function

M(ui) = Σ_{j=0}^{n} D(ui, uj) = D(ui, u0) + D(ui, u1) + · · · + D(ui, un),

where n = 1000 in the example above.

Hence, formally, cycles may not pose a great problem. However, a decision maker whose value assessments have given rise to a cycle should at least consider the following principles:

Figure 3.3: Example of a cup where u1000 loses.

i. If a < b then a < pa + (1 - p)b < b where 0 < p < 1.

ii. If a < b and b < c then b ≈ pa + (1 - p)c where 0 < p < 1.

iii. If a < b and b ≈ c then a < c.

iv. pa + (1 - p)b =df (1 - q)a + qb where q = 1 - p and 0 < p < 1.

v. If a ≈ b and b =df c then a ≈ c.

Here a, b, and c are lotteries or outcomes, a < b is short for "b is clearly better than a", and b ≈ c stands for "b and c are about equally good". As is customary, =df stands for definitional equality.

From these principles and the cycle

u1 < u2 < u3 < u1

we first look at the partial relation

u3 < u1 < u2

and with the help of (ii.) conclude that

u1 ≈ qu3 + (1 - q)u2.

Replacing u1 in the relation u3 < u1 with the approximation above yields

u3 < qu3 + (1 - q)u2.

Furthermore, applying (i.) to the relation u2 < u3 we have

u2 < pu2 + (1 - p)u3 < u3,

which together with the approximation of u1 yields

pu2 + (1 - p)u3 < u3 < qu3 + (1 - q)u2.

Hence there is some lottery c such that u3 is both better and worse than c; for example when both p and q equal 1/2. Therefore B should either reconsider her initial assessments or rebut one of these principles.

3.3 Evaluations of Options and Decision Rules in wSSD

When the original decision problem is represented by a decision frame, evaluations and decision rules originally formulated for classical decision theory may be used to decide whether a proposed project is acceptable or to make a choice between the options considered. The fact that this representation is not straightforward, however, calls for some circumspection in employing them. In this section the evaluations used in example 1 of chapter 1 will be presented, followed by a discussion of decision rules based on extreme values.

    3.3.1 Qualitative Evaluations

In many evaluations, options that are too risky are eliminated at an early stage. In other cases only options which promise high returns are given a closer examination. To formulate such evaluations in wSSD, take as a point of departure a decision maker B and a satisfiable decision frame

F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]).

Then the probability of the option oi yielding an outcome of value V is sufficiently high if and only if there exists a suitable number r such that P(Li,V) ≥ r. Here Li,V is the set of leaves of Ti with a value at least as high as V (see figure 3.4), and r depends on B. This leaves us with the task of explicating "P(Li,V) ≥ r" and "the outcome u has the value V".


Figure 3.4: Let the tree in the figure be T1 (a root with three leaves of values 2, 5, and 6, reached with probabilities 0.5, 0.2, and 0.3) and set V = 3; then the set L1,V = {5, 6} and consequently P(L1,V) = P({5, 6}) = 0.2 + 0.3 = 0.5.

Explication of P(Li,V) ≥ r

At least the following candidates present themselves:

i. P(Li,V) ≥ r holds for all solutions of S[p],
ii. P(Li,V) ≥ r holds for all regular solutions of S[p],
iii. P(Li,V) ≥ r holds for a great deal of the solutions of S[p],
iv. P(Li,V) ≥ r holds for some regular solution of S[p], and
v. P(Li,V) ≥ r holds for some solution of S[p].

Comment. A solution is regular if and only if it does not contain values that are too close to the endpoints. If we recall that the representation in wSSD uses somewhat wide intervals, then (v) is too weak since it allows values near the endpoints, and perhaps (i) is too strong, even though its occurrence should be noted. Moreover, (iii) should be defined more precisely. One way of doing this is to say that (iii) holds if and only if

volume of A / volume of B

has a certain size. Here B is the set of solutions to S[p] and A the set of solutions to S[p] such that P(Li,V) ≥ r. Another possibility is to use the notion of contraction introduced by Mats Danielson (Danielson, 1997), which in turn is a modification of the notion of proportion due to Love Ekenberg (Ekenberg, 1994). Note that "the outcome u has the value V" can be explicated along the same lines.
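The volume ratio above can be estimated numerically. The following sketch does so for the tree of figure 3.4 when the three leaf probabilities are only constrained to intervals; the interval bounds, the level r, and the rejection-sampling scheme are our own assumptions, not part of the original example.

```python
import random

# Hypothetical interval constraints S[p] on the three leaf probabilities
# of the tree in figure 3.4 (leaf values 2, 5, 6).
BOUNDS = [(0.4, 0.6), (0.1, 0.3), (0.2, 0.4)]
LEAF_VALUES = [2, 5, 6]
V_LEVEL = 3        # the value level V
R_LEVEL = 0.45     # the probability level r

def sample_simplex(rng):
    """Uniform sample from the probability simplex p1 + p2 + p3 = 1."""
    a, b = sorted((rng.random(), rng.random()))
    return (a, b - a, 1 - b)

def in_S(p):
    return all(lo <= pi <= hi for pi, (lo, hi) in zip(p, BOUNDS))

def prob_good_leaves(p):
    """P(L_{1,V}): total probability of leaves with value >= V."""
    return sum(pi for pi, v in zip(p, LEAF_VALUES) if v >= V_LEVEL)

def volume_ratio(samples=200000, seed=1):
    rng = random.Random(seed)
    in_B = in_A = 0
    for _ in range(samples):
        p = sample_simplex(rng)
        if in_S(p):                              # p lies in B
            in_B += 1
            if prob_good_leaves(p) >= R_LEVEL:   # p also lies in A
                in_A += 1
    return in_A / in_B if in_B else float("nan")

print(volume_ratio())  # fraction of S[p]-solutions satisfying P(L_{1,V}) >= r
```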


3.3.2 Expected Value

If F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) is a satisfiable decision frame, then the expected value can be accommodated to wSSD in one of the following ways, where p ranges over the set of solutions to S[p], v ranges over the set of solutions to U[v], and the integrals are taken over these solution sets:

AF(oi) = inf E(p, v, oi)    (3.1)

BF(oi) = sup E(p, v, oi)    (3.2)

CF(oi) = ∫_p ∫_v E(p, v, oi) dp dv / ∫_p ∫_v dp dv    (3.3)

Here

E(p, v, oi) = E[Ti] = Σ_{j=1}^{m} Pj Vj = Σ_{j=1}^{m} Pj E[Ti,j],

where the sum is taken over the components of p = (P1, . . . , Pm) and v = (V1, . . . , Vm) associated with the nodes of Ti. Note that the order of the operations is immaterial and that all combinations of them make sense. Note also that partial evaluations like

∫_p E(p, v, oi) dp / ∫_p dp    (3.4)

are permissible and sometimes quite useful in ordering options, see example 3 of chapter 1.

If one of these evaluations is to be chosen, then the obvious choice is CF(oi), since it is a mean value of mean values, but sometimes a conservative decision maker wants to base a decision on all of them.

Example. Set o1 = (q, 1 − q; x, y) and o2 = (r, 1 − r; x, y) with 0 < r < q < 1 and y < x. Then CF(o1) > CF(o2) and we see that CF induces the correct order.
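A minimal numerical sketch of (3.1)–(3.3) for a one-level tree with two outcomes, where both the probability of the good outcome and the two values are only known up to intervals; all interval bounds below are hypothetical, and the grid average is only an approximation of the integral in (3.3).

```python
import itertools

def expected_value(p, v_high, v_low):
    """E(p, v, o) for a two-outcome option: v_high with probability p, else v_low."""
    return p * v_high + (1 - p) * v_low

def evaluate_frame(p_interval, v_high_interval, v_low_interval, steps=50):
    """Approximate AF (inf), BF (sup) and CF (mean of means) on a grid."""
    def grid(lo, hi):
        return [lo + (hi - lo) * k / steps for k in range(steps + 1)]

    values = [
        expected_value(p, vh, vl)
        for p, vh, vl in itertools.product(
            grid(*p_interval), grid(*v_high_interval), grid(*v_low_interval)
        )
    ]
    return min(values), max(values), sum(values) / len(values)

# Hypothetical frame: probability of the good outcome in [0.6, 0.8],
# the good value in [0.7, 0.9], the bad value in [0.1, 0.3].
AF, BF, CF = evaluate_frame((0.6, 0.8), (0.7, 0.9), (0.1, 0.3))
print(AF, BF, CF)   # AF = 0.46 and BF = 0.78 at the interval endpoints
```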

    3.3.3 Decision Rules Based on Extreme Values

    Pure maximin and maximax

Let F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) be a satisfiable decision frame, let a solution to U[v] be written v = (V1, . . . , Vm), and let Πj be the projection of the set of such solutions on the j:th coordinate, 1 ≤ j ≤ m. Set

min(oi) = min(inf_{V1∈Π1} V1, . . . , inf_{Vj∈Πj} Vj, . . . , inf_{Vm∈Πm} Vm)

and

max(oi) = max(sup_{V1∈Π1} V1, . . . , sup_{Vj∈Πj} Vj, . . . , sup_{Vm∈Πm} Vm),

where each Vj is associated with a leaf of Ti. Then min(oi) is the smallest possible outcome for the alternative oi, and max(oi) is the largest possible outcome for the alternative oi.

The maximin principle prescribes that the alternative with the largest minimum value should be chosen. In other words, if the smallest possible outcome of the alternative o1 is a, the smallest possible outcome of the alternative o2 is b, and a > b, then alternative o1 is preferred over o2. Consequently, the maximin principle can be denoted

mF = max(min(o1), . . . , min(oi), . . . , min(on)).

The maximax principle prescribes that the alternative with the largest maximum value should be chosen. If the largest possible value of alternative o1 is c, the largest possible value of alternative o2 is d, and d > c, then o2 is to be preferred over o1. We can denote the maximax principle

MF = max(max(o1), . . . , max(oi), . . . , max(on)).

Comment. Note that the lotteries considered in section 3.2 are equally good according to these rules. More generally, all lotteries with the same outcomes are equally good according to them as long as the price per ticket is the same.
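A small sketch of the pure maximin and maximax rules when each outcome value is only known up to an interval; the option names and intervals below are hypothetical.

```python
# Each option is described by the value intervals (lo, hi) of its leaves.
options = {
    "o1": [(0.2, 0.4), (0.6, 0.9)],   # hypothetical value intervals
    "o2": [(0.0, 0.1), (0.8, 1.0)],
}

def min_of(leaves):
    """Smallest possible outcome of an option: the least lower bound."""
    return min(lo for lo, _ in leaves)

def max_of(leaves):
    """Largest possible outcome of an option: the greatest upper bound."""
    return max(hi for _, hi in leaves)

maximin_choice = max(options, key=lambda o: min_of(options[o]))
maximax_choice = max(options, key=lambda o: max_of(options[o]))
print(maximin_choice, maximax_choice)   # o1 by maximin, o2 by maximax
```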


Chapter 4

Other Approaches to Decisions under Uncertainty

In this chapter a few well-known decision rules for situations with more than one probability distribution are presented and evaluated. First, decision rules developed within Statistical Decision Theory are discussed; thereafter, two different proposals put forward by philosophers are considered.

    4.1 Hodges and Lehmann (1952)

Let F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) be a satisfiable decision frame and s > 0. Then these two authors propose that first all options oi such that mF − min(oi) < s are selected (mF is the maximin of F, see section 3.3.3). Then, from this set, the option oi with the greatest value of AF(oi), the infimum of the expected value, is to be chosen.
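A minimal sketch of this basic two-step rule (before the refinement described next), under an interval representation of two-outcome options; all names and numbers below are hypothetical.

```python
# Each option: probability interval for its best outcome, and the two outcome values.
options = {
    "o1": {"p": (0.5, 0.7), "v_high": 10.0, "v_low": 2.0},
    "o2": {"p": (0.3, 0.9), "v_high": 14.0, "v_low": 0.0},
    "o3": {"p": (0.6, 0.8), "v_high": 6.0,  "v_low": 3.0},
}

def min_outcome(o):
    return min(o["v_high"], o["v_low"])

def AF(o):
    """Infimum of the expected value over the probability interval;
    since the expected value is linear in p, it is attained at an endpoint."""
    return min(p * o["v_high"] + (1 - p) * o["v_low"] for p in o["p"])

def hodges_lehmann(options, s):
    """Step 1: keep options whose worst case is within s of the maximin.
       Step 2: among those, maximise the infimum of the expected value."""
    m_F = max(min_outcome(o) for o in options.values())
    admissible = {name: o for name, o in options.items()
                  if m_F - min_outcome(o) < s}
    return max(admissible, key=lambda name: AF(admissible[name]))

print(hodges_lehmann(options, s=2.0))   # o2 is screened out; o1 wins on AF
```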

They also propose the following refinement of step one in the selection process. Let

F1, . . . , Fi, . . . , Fm,

where

Fi = (o1, . . . , on, T1, . . . , Tn, Si[p], U[v]),

be a series of satisfiable decision frames such that Sj[p] is a contraction of Si[p] if j > i, and si > 0, 1 ≤ i ≤ m. Here a contraction is obtained by the replacement of an interval by a subinterval. Then all options oi satisfying the condition mFj − min(oi) < sj, where 1 ≤ j ≤ m, are first selected.


Comment. Set o1 = (q, 1 − q; x, y) and o2 = (r, 1 − r; x, y) with 0 < r < q < 1 and 0 < y < x < 1. Here o1 is a lottery which yields x with probability q and y with probability 1 − q. Then o1 dominates o2 stochastically and therefore clearly is the better option. But according to the rule proposed by Hodges and Lehmann they are equally good.

    4.2 Blum and Rosenblatt (1967)

These two authors propose that a selection of options is to be based on the evaluation AF, see section 3.3.2. The same proposal is put forward by Jackson et al. (1970), Randles and Hollander (1971), Solomon (1972) and Kofler and Menges (1976, p. 140). But this proposal makes the options considered in the previous section equally good.

    4.3 Watson (1974)

Let F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) be a satisfiable decision frame and P a solution to S[p]. Set

CF(oi, P) = ∫_v E(P, v, oi) dv / ∫_v dv,

M(P) = max(CF(oi, P), 1 ≤ i ≤ n),

L(P) = sup(M(P′) − M(P), P′ a solution to S[p])

and

m = inf(L(P), P a solution to S[p]).

Then Watson proposes that an option oi such that

CF(oi, P) = M(P)

with L(P) = m is to be chosen.

Comments.

- The rationale behind Watson's proposal is that we should minimize the damage due to a choice based on a wrong solution to S[p].

- Note that Watson's proposal orders the options considered in section 4.1 correctly, since CF(o1, P) > CF(o2, P) for all solutions P.

- Set o1 = (q, 1 − q; x, y) and o2 = (r, 1 − r; x + a, y) with 0 < r < q < 1 and 0 < y < x < 1 − a. Then Watson's proposal favors o2 for all a, 0 < a < 1. To see this, note that L(P) approaches m when both q and p approach 1. Hence it suffices to consider the limiting case q = p = 1. But

∫_v E(P, v, o1) dv … (1 − a)^3/6 … b, with b ≈ 0.13.

Hence CF fares better in this example.

    4.4 Levi (1974)

Levi does not propose an evaluation or ranking of available options in a satisfiable decision frame F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) but instead proposes ways of delimiting the set of permissible options. More precisely, he advocates that this should be done as follows: first select the E-permissible options, then pick out the P-permissible ones among them, and finally select the S-permissible ones. Here

i. An option oi is E-permissible if there exists a solution p to S[p] and a solution v to U[v] such that E(p, v, oi) ≥ E(p, v, oj) for all j, 1 ≤ j ≤ n.

ii. An option oi is P-permissible if it is E-permissible and an optimum one with respect to freedom of choice.

iii. An option oi is S-permissible if it is P-permissible and there exists a solution v to U[v] such that w(oi, v) ≥ w(oj, v) for all j, 1 ≤ j ≤ n. Here w(oi, v) is the worst outcome of oi given v.

Comments. Set o1 = (p, 1 − p; x, y) and o2 = (q, 1 − q; x, y) with 0 < …

5.1 Fundamental Concepts

Let C = {c1, . . . , cn} be a finite set of outcomes with c1 > c2 > · · · > cn. Then the set of finite options O over C is the least set S such that

i. if A is a vector (p1, . . . , pn; c1, . . . , cn) such that 0 ≤ pi and Σ pi = 1, then A ∈ S, and

ii. if A1 ∈ S, . . . , Am ∈ S, 0 ≤ pi and Σ pi = 1, then (p1, . . . , pm; A1, . . . , Am) ∈ S.

A is a normal option if and only if A equals (p1, . . . , pn; c1, . . . , cn) for some p1, . . . , pn such that 0 ≤ pi and Σ pi = 1. Each finite option A can be reduced to a unique normal option N(A), and two options A and B are said to be congruent if and only if N(A) = N(B). An evaluation is a function V : O × R^R → R such that V(A, f) = V(B, f) if A and B are congruent.

Comments. Finite options are to be viewed as lotteries which ultimately yield USD. A fixed finite set C is used in order to simplify the presentation. C is to be thought of as a sufficiently large set that contains all outcomes explicitly mentioned in this chapter.
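A minimal sketch of the reduction of a finite option to its normal form N(A). The representation chosen here (a bare outcome, or a pair of probabilities and sub-options) is our own assumption.

```python
from collections import defaultdict

def normalize(option):
    """Reduce a finite option to its normal form: a map outcome -> probability.

    An option is either a bare outcome (e.g. a number from C) or a tuple
    (probabilities, sub_options), where sub-options may themselves be compound.
    """
    dist = defaultdict(float)
    if not isinstance(option, tuple):
        dist[option] = 1.0            # a sure outcome
        return dict(dist)
    probs, subs = option
    for p, sub in zip(probs, subs):
        for outcome, q in normalize(sub).items():
            dist[outcome] += p * q    # chain the probabilities
    return dict(dist)

def congruent(a, b, tol=1e-12):
    """Two options are congruent iff their normal forms coincide."""
    na, nb = normalize(a), normalize(b)
    return all(abs(na.get(k, 0.0) - nb.get(k, 0.0)) <= tol
               for k in set(na) | set(nb))

# A compound option: 0.5 chance of the lottery (0.2, 0.8; 100, 0), 0.5 chance of 0.
A = ((0.5, 0.5), (((0.2, 0.8), (100, 0)), 0))
B = ((0.1, 0.9), (100, 0))
print(normalize(A))       # {100: 0.1, 0: 0.9}
print(congruent(A, B))    # True
```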


Examples

a1 Expected utility given f.
E(A, f) = p1f(c1) + · · · + pnf(cn). Here and below N(A) = (p1, . . . , pn; c1, . . . , cn).

a2 Qualitative evaluation given f and a risk index (r, s).
S(A, f, r, s) = 1 if Σ_r pi ≤ s and S(A, f, r, s) = 0 if Σ_r pi > s. Here Σ_r indicates that the sum is taken over all i such that f(ci) ≤ r.

a2,c Continuous qualitative evaluation given f and risk index (r, s, ε).
S(A, f, r, s, ε) = 1 if Σ_r pi ≤ s, S(A, f, r, s, ε) = (s + ε − t)/ε if s ≤ t = Σ_r pi ≤ s + ε, and S(A, f, r, s, ε) = 0 if Σ_r pi ≥ s + ε.

a3 Maximin.
m(A, f) = min(f(ci), pi > 0).

a4 Maximax.
M(A, f) = max(f(ci), pi > 0).

a5 Hurwicz (1951).
Hα(A, f) = αM(A, f) + (1 − α)m(A, f).

a6 Maximin regret given K.
RK(A, f) = inf(R(A, B, f), B ≠ A, B ∈ K).
Here R(A, B, f) = min(f(ci) − f(cj), pi, qj > 0) if N(B) = (q1, . . . , qn; c1, . . . , cn).
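A sketch of the evaluations (a1) and (a3)–(a5) for normal options, with outcomes given as numbers and f as an arbitrary Python function; the names and the two sample lotteries are our own.

```python
def expected_utility(probs, outcomes, f):
    """(a1) E(A, f) = p1 f(c1) + ... + pn f(cn)."""
    return sum(p * f(c) for p, c in zip(probs, outcomes))

def maximin(probs, outcomes, f):
    """(a3) m(A, f): worst utility among outcomes with positive probability."""
    return min(f(c) for p, c in zip(probs, outcomes) if p > 0)

def maximax(probs, outcomes, f):
    """(a4) M(A, f): best utility among outcomes with positive probability."""
    return max(f(c) for p, c in zip(probs, outcomes) if p > 0)

def hurwicz(probs, outcomes, f, alpha):
    """(a5) alpha * M(A, f) + (1 - alpha) * m(A, f)."""
    return alpha * maximax(probs, outcomes, f) + (1 - alpha) * maximin(probs, outcomes, f)

# Two simple lotteries with f the identity (cf. example b1 below).
A = ((0.99, 0.01), (1000, 0))
B = ((0.01, 0.99), (2, 1))
ident = lambda x: x
print(expected_utility(*A, ident), expected_utility(*B, ident))  # 990.0 vs about 1.01
print(maximin(*A, ident), maximin(*B, ident))                    # 0 vs 1: maximin prefers B
```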

Given an evaluation V and a function f, we can define a semimetric dV,f and an order ≤V,f on O by setting dV,f(A, B) = |V(A, f) − V(B, f)| and A ≤V,f B if and only if V(A, f) ≤ V(B, f). These notions can in turn serve to define a choice rule RV,f as follows: RV,f : 2^O × R → 2^O such that RV,f(𝒜, ε) = {A ∈ 𝒜 | V(A, f) + ε ≥ V(B, f) for all B ∈ 𝒜}. RV,f(𝒜, ε) is called the set of ε-optimum options given V, f.

Comments. d is a semimetric on a set M if and only if d(x, y) ≥ 0, d(x, y) = d(y, x), and d(x, z) ≤ d(x, y) + d(y, z). Note that the last inequality implies that d(x, x) = 0. As is customary, ε is to be a small non-negative number. ε is introduced because 𝒜 does not always contain an option that is an optimum one given V, f but, at least for evaluations considered in this chapter, always an ε-optimum one. To see this, set

𝒜 = {(p, 1 − p; c1, c2) | 0.5 < p < 0.9},

f(c1) > f(c2), and

V((p, 1 − p; c1, c2), f) = pf(c1) + (1 − p)f(c2).
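For a finite set of options the rule RV,f(𝒜, ε) can be computed directly. A minimal sketch with expected utility as the evaluation; the options and the value of ε are hypothetical.

```python
def expected_utility(option, f):
    probs, outcomes = option
    return sum(p * f(c) for p, c in zip(probs, outcomes))

def choice_rule(options, f, eps, V=expected_utility):
    """R_{V,f}(options, eps): options whose value is within eps of the best."""
    best = max(V(A, f) for A in options)
    return [A for A in options if V(A, f) + eps >= best]

ident = lambda x: x
options = [
    ((0.89, 0.11), (100, 0)),   # expected value 89
    ((0.90, 0.10), (100, 0)),   # expected value 90
    ((0.50, 0.50), (150, 32)),  # expected value 91
]
print(choice_rule(options, ident, eps=0.0))   # only the single best option
print(choice_rule(options, ident, eps=1.5))   # the second-best option is also kept
```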


A choice test is a triple (>, 𝒜, ℬ) where > is a strict partial order on O and 𝒜, ℬ are finite subsets of O such that A > B for all A ∈ 𝒜 and B ∈ ℬ. A preference test is a pair (𝒜, >) where 𝒜 is a finite subset of O and > is a strict partial order on 𝒜. Let V be an evaluation, F a subset of R^R, and ε ≥ 0. Then (>, 𝒜, ℬ) is a weak counterexample at level (F, ε) to V as a choice rule generator if and only if (>, 𝒜, ℬ) is a choice test such that RV,f(𝒜 ∪ ℬ, ε) ∩ ℬ ≠ ∅ for all f ∈ F, and (>, 𝒜, ℬ) is a strong counterexample at level (F, ε) to V as a choice rule generator if and only if (>, 𝒜, ℬ) is a choice test such that ℬ ⊆ RV,f(𝒜 ∪ ℬ, ε) for all f ∈ F. Moreover, (𝒜, >) is a counterexample at level (F, ε) to V as a preference generator if and only if (𝒜, >) is a preference test such that (𝒜, ≤V,f) ≠ (𝒜, C) for each f ∈ F and each linear extension C of >. Finally, V fails strongly (weakly) to degree d as a choice rule generator at level (F, ε) if and only if there exists a d-convincing strong (weak) counterexample at this level to V as a choice rule generator, and V fails to degree d as a preference generator at level (F, ε) if and only if there exists a d-convincing counterexample at this level to V as a preference generator.

Comments. The concepts defined above can serve to assess an evaluation as a choice rule generator in the following two ways: first, counterexamples may be assigned degrees which indicate how convincing they are; secondly, it may, for each proposed counterexample c and each number ε ≥ 0, be possible to determine the largest set F such that c is a counterexample at level (F, ε).

In this chapter, only the following three levels will be considered:

1. (F1, 0), where F1 = R^R,
2. (F2, 0), where F2 = {f ∈ R^R | f is strictly increasing}, and
3. (F3, 0), where F3 = {f ∈ R^R | f is strictly increasing but with a strictly decreasing rate of increase}.

As to degrees, it will only be required that each evaluation be compatible with stochastic dominance; that is, V(A, f) > V(B, f) if A dominates B stochastically. Here A dominates B stochastically if p1 + · · · + pi ≥ q1 + · · · + qi for all i, 1 ≤ i ≤ n − 1, and p1 + · · · + pi > q1 + · · · + qi for some i, 1 ≤ i ≤ n − 1. As before, N(A) = (p1, . . . , pn; c1, . . . , cn) and N(B) = (q1, . . . , qn; c1, . . . , cn).
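A short sketch of the stochastic dominance check just defined, for two normal options over the same outcomes ordered c1 > c2 > · · · > cn; the example probabilities are hypothetical.

```python
def dominates(p, q):
    """A dominates B stochastically: every partial sum of A's probabilities
    (over the best i outcomes) is at least B's, and at least one is strictly larger."""
    assert len(p) == len(q)
    cum_p = cum_q = 0.0
    at_least, strictly = True, False
    for i in range(len(p) - 1):        # i = 1, ..., n-1 in the text
        cum_p += p[i]
        cum_q += q[i]
        if cum_p < cum_q:
            at_least = False
        if cum_p > cum_q:
            strictly = True
    return at_least and strictly

# Outcomes ordered best-first, e.g. (x, y) with x > y.
print(dominates((0.7, 0.3), (0.4, 0.6)))   # True: (0.7, 0.3; x, y) dominates (0.4, 0.6; x, y)
print(dominates((0.4, 0.6), (0.7, 0.3)))   # False
```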

Note that counterexamples primarily fault the order induced by a given evaluation. Hence we will below also speak of counterexamples to orderings not necessarily induced by evaluations.


Examples

b1 Set A = (0.99, 0.01; 1000, 0), B = (0.01, 0.99; 2, 1), and A > B. Then (>, {A}, {B}) is a strong counterexample to maximin as a choice rule generator at level (F2, 0).

b2 Set A = (0.01, 0.99; 1000, 0), B = (0.99, 0.01; 999, 998), and B > A. Then (>, {B}, {A}) is a strong counterexample to maximax as a choice rule generator at level (F2, 0).

b3 Set B = (1 − s, s; t, r), C = (1 − s, s; u, r), t > u ≥ r, and B > C. Then (>, {B}, {C}) is a weak counterexample to S(A, f, r, s) as a choice rule generator at level (F2, 0).

b4 Set B = (0.99, 0.01; 999, 0), C = (0.01, 0.99; 1000, 1), and B > C. Then (>, {B}, {C}) is a strong counterexample to Hα(A, f) as a choice rule generator at level (F2, 0).

b5 Set B = (0.99, 0.01; 999, 0), C = (0.01, 0.99; 1000, 1), K = {B, C}, and B > C. Then (>, {B}, {C}) is a strong counterexample to RK(A, f) as a choice rule generator at level (F2, 0).

b6 (Bernoulli 1738, Menger 1934) Set B = (1; 0), C = (2^{−31}, . . . , 2^{−1}, 2^{−31}; 2^{30} − 15, . . . , 1 − 15, −15), B > C, and i(x) = x for all x in R. Then (>, {B}, {C}) is a weak counterexample to E(A, f) at level ({i}, 0).

b7 (Allais 1953, 1979) Set B = (1; 10^6), C = (0.1, 0.89, 0.01; 5·10^6, 10^6, 0), B > C, and i as in (b6). Then (>, {B}, {C}) is a strong counterexample to E(A, f) as a choice rule generator at level ({i}, 0).

Remarks. As expected, all evaluations ignoring probabilities fail strongly at such a high level as (F2, 0). Hence it is doubtful whether it is good policy to use any of these to a large extent. However, the status of E(A, f) as a choice rule generator remains to be determined. To this end, section 5.5 contains an account of what can be inferred from the Allais example with respect to this problem. Moreover, the status of evaluations as preference generators must also be determined. This will be done in sections 5.3 and 5.4 with the help of common ratio tests. But first a few general remarks to clarify some issues.


5.2 Miscellaneous remarks

    5.2.1 On Tests

A justification of an evaluation as a choice rule generator can either take the form of a demonstration that the given evaluation is the only one satisfying certain desirable properties (arguments from above) or the form of a demonstration that it does not to a large extent produce counterintuitive choices (arguments from below). Arguments from above, in particular for expected utility, abound in the literature. But, for all I can see (Malmnäs, 1994), they can only provide expected utility with a justification that is so weak as to be almost useless. Take as a case in point the axiom system of Herstein and Milnor (1953). According to these two authors (see 5.6.2 for details), an option A is better than an option B if and only if E(A, f) > E(B, f) for all f in F2, if and only if A dominates B stochastically. Set, for example, A = (1 − 10^{−6}, 10^{−6}; 10^6, 0) and B = (10^{−6}, 1 − 10^{−6}; 2, 1). Then the axiom system of Herstein and Milnor does not imply that A is better than B. Hence we can get a weak counterexample to the order induced by this axiom system at a level as close to (F2, 0) as we please. Accordingly, arguments from above offer little as a guide for selecting suitable choice rules. Hence most of the burden must be carried by arguments from below.

    5.2.2 On Options

The present compendium is mainly devoted to normative decision theory. An axiom of this theory is that the value of a course of action a is a function of the ultimate outcomes of a and of the probabilities of these outcomes. Hence the virtues of various choice rules when the ultimate outcomes are prizes in USD can be discussed at the level of options.

    5.2.3 Classification of Evaluations

An evaluation V(A, f) is regular if and only if V(A, f) = f(ci) in case A = (p1, . . . , pn; c1, . . . , cn) and pi = 1. V(A, f) is discriminating if there exists a bijection g in R^R such that g(V(A, f)) is regular. Of the evaluations considered in section 5.1 all except (a2) and (a2,c) are discriminating, and of the remaining ones only (a6) is not regular. Regularity seems to be a desirable property for an evaluation. V(A, f) is a continuous evaluation if V(Bn, f) converges to V(C, f) in case {Bn} converges to C. All evaluations defined above except (a2) are continuous ones. The reader should note that no evaluation that is continuous and regular can put a premium on security.


5.2.4 On the Relation between Preference Tests and Choice Tests

    Take as a starting point the Allais example (Allais, 1953, 1979). Set

B = (1; 10^6),
C = (0.1, 0.89, 0.01; 5·10^6, 10^6, 0),
D = (0.11, 0.89; 10^6, 0),
E = (0.1, 0.9; 5·10^6, 0),

and B > C > E > D. Then ({B, C, D, E}, >) is a counterexample to E(A, f) as a preference generator at level (F1, 0). Indeed,

E(B, f) − E(C, f) = f(10^6) − 0.1f(5·10^6) − 0.89f(10^6) − 0.01f(0)
= 0.11f(10^6) + 0.89f(0) − 0.1f(5·10^6) − 0.9f(0)
= E(D, f) − E(E, f).

Hence ({B, C, D, E}, >) is certainly a counterexample of the kind mentioned above. Moreover, empirical tests, see MacCrimmon and Larsson (1979) and Kahneman and Tversky (1979), indicate that it should be considered a convincing counterexample. Now, what kind of counterexample to E(A, f) as a choice rule generator can be constructed from the Allais example? The immediate consequences are the following ones: (>, {B}, {C}) is a strong counterexample to E(A, f) as a choice rule generator at level (G1, 0) and (>, {E}, {D}) is a strong counterexample to E(A, f) as a choice rule generator at level (G2, 0). Here

G1 = {f | E(E, f) ≥ E(D, f)} and
G2 = {f | E(B, f) ≥ E(C, f)}.

Moreover, G1 ∪ G2 = F1. On the other hand, G1 ∩ F3 ≠ ∅ and G2 ∩ F3 ≠ ∅. So these counterexamples are not really at a high level. Passing to more contrived counterexamples, we can consider the hypothetical choices at the same time or in succession; we can also consider mixtures of the given options. This possibility is, however, not open to those who side with Allais, since they cannot accept the following principle: If A1 > B1, . . . , An > Bn, then (p1, . . . , pn; A1, . . . , An) > (p1, . . . , pn; B1, . . . , Bn) for all p1 ≥ 0, . . . , pn ≥ 0 such that p1 + · · · + pn = 1. Now this principle is not compatible with B > C, E > D since (0.5, 0.5; B, E) = (0.5, 0.5; C, D). Considering the choices in succession cannot give rise to a formal counterexample since the underlying utility functions need not be kept constant. So the only possibility left is to


consider combinations of the given options. This case is discussed in some detail in section 5.5, and it is shown there that combinations do not yield a counterexample to E(A, f) at level (F3, 0). Now these findings are not limited to the Allais example but hold in general. So the substantial recent literature on preference tests has little to offer those who are interested in finding counterexamples to E(A, f) as a choice rule generator.
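A small numerical check of the two facts used in this subsection: the identity E(B, f) − E(C, f) = E(D, f) − E(E, f), here tested for two sample increasing utility functions, and the congruence (0.5, 0.5; B, E) = (0.5, 0.5; C, D). The Python representation of the lotteries is our own.

```python
import math
from collections import defaultdict

# The four Allais lotteries as (probabilities, outcomes).
B = ((1.0,), (10**6,))
C = ((0.1, 0.89, 0.01), (5 * 10**6, 10**6, 0))
D = ((0.11, 0.89), (10**6, 0))
E = ((0.1, 0.9), (5 * 10**6, 0))

def ev(lottery, f):
    probs, outcomes = lottery
    return sum(p * f(c) for p, c in zip(probs, outcomes))

for f in (lambda x: x, lambda x: math.log1p(x)):
    lhs = ev(B, f) - ev(C, f)
    rhs = ev(D, f) - ev(E, f)
    print(abs(lhs - rhs) < 1e-6)     # True: the identity holds regardless of f

def mixture(p, lot1, lot2):
    """Normal form of the mixture (p, 1-p; lot1, lot2)."""
    dist = defaultdict(float)
    for weight, lot in ((p, lot1), (1 - p, lot2)):
        for prob, outcome in zip(*lot):
            dist[outcome] += weight * prob
    return dict(dist)

def congruent(d1, d2, tol=1e-12):
    return all(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) <= tol
               for k in set(d1) | set(d2))

print(congruent(mixture(0.5, B, E), mixture(0.5, C, D)))   # True
```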

    5.3 Evaluations and Common Ratio Tests

    5.3.1 Introduction

    Set

Ap = (p, 1 − p; c1, c3) and Bq = (q, 1 − q; c2, c3)

with c1 > c2 > c3 and 1 > q > p > r > 0. Then ({Bq, Ap, Arp, Brq}, >) is called a common ratio test. In most cases p and q are comparatively large numbers and r a small one. Because of their simple structure, common ratio tests are deemed ideal for testing evaluations as preference generators. The fundamental observation concerning such tests is that

    E(Ap, f) > E(Bq, f) if and only if E(Arp, f) > E(Brq, f),

for all f ∈ F1.

On the other hand, for some c1, c2, c3 and when p and q are large but r small, most people, see Kahneman and Tversky (1979), prefer Bq to Ap and Arp to Brq. So ({Bq, Ap, Arp, Brq}, >) can be a counterexample to E(A, f) as a preference generator at level (F1, 0).
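The fundamental observation can be checked directly: the expected-utility difference for the scaled pair equals r times the difference for the original pair, so its sign cannot change. A minimal sketch with hypothetical prizes and probabilities in the spirit of Kahneman and Tversky's examples:

```python
def diff(p, q, c1, c2, c3, f):
    """E(A_p, f) - E(B_q, f) for A_p = (p, 1-p; c1, c3), B_q = (q, 1-q; c2, c3)."""
    return (p * f(c1) + (1 - p) * f(c3)) - (q * f(c2) + (1 - q) * f(c3))

f = lambda x: x ** 0.5          # some strictly increasing utility
c1, c2, c3 = 4000, 3000, 0      # hypothetical prizes
p, q, r = 0.8, 0.9, 0.05

d_large = diff(p, q, c1, c2, c3, f)
d_small = diff(r * p, r * q, c1, c2, c3, f)
print(d_large, d_small, abs(d_small - r * d_large) < 1e-9)
# The small-probability difference equals r times the large one,
# so E(A_p, f) > E(B_q, f) iff E(A_rp, f) > E(B_rq, f).
```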

Now, a challenge for anyone who wishes to propose an alternative to expected utility, say V(A, f), is to show that "V(Ap, f) > V(Bq, f) if and only if V(Arp, f) > V(Brq, f), for all f in F1" does not hold, and that V(A, f) is compatible with stochastic dominance; see Sugden (1986) for a lucid account. Quite a few attempts have also been successful in these respects.

However, showing this does not entail that ({Bq, Ap, Arp, Brq}, >) cannot be a counterexample to V(A, f) as a preference generator at level (F2, 0). But the latter result is what is needed in order to have an evaluation that is a substantial improvement upon expected utility. The prospects for finding such an evaluation are, however, not particularly bright; this much can be concluded from the following observation: Select a set {Bq, Ap, Arp, Brq}


as above with c1 > c2 > c3 and 1 > q > p > 0.5. Let V be an evaluation that is compatible with stochastic dominance and f ∈ F2. Assume that V(Arp, f) > V(Brq, f). Now V(Aq, f) > V(Bq, f), and hence V(As, f) > V(Bq, f) for some s, p ≤ s < q. Moreover, V(Ars, f) > V(Brq, f). Hence the main advantage that V(A, f) may have over E(A, f) is that of a smaller distance between probabilities. The reader should bear this in mind when considering the following detailed examination.

    5.4 An Examination of Some Proposals

    5.4.1 Hagen (1969; 1972; 1979)

Building on earlier work by Allais, Hagen tentatively proposed the following evaluation in 1969.

Ha,b(A, f) = E(A, f) − aS(A, f) + bM3(A, f)/S^2(A, f)

in case S^2(A, f) > 0. If S^2(A, f) = 0, then

Ha,b(A, f) = E(A, f).

Here

S^2(A, f) = p1(f(c1) − E(A, f))^2 + · · · + pn(f(cn) − E(A, f))^2,
0 ≤ S(A, f) = (S^2(A, f))^{1/2},
M3(A, f) = p1(f(c1) − E(A, f))^3 + · · · + pn(f(cn) − E(A, f))^3,

and a, b > 0.

The rationale behind Ha,b(A, f) is that the value of an option should decrease with increasing dispersion and increase with positive skewness. The reason behind the division of M3 by S^2 is probably that all moments are to have equal weights. As is customary in this field, Ha,b(A, f) is claimed to be compatible with an axiom system that the interested reader can look up.

    To gain some understanding of this evaluation, set

B(p, d) = (p, 1 − p; d, 0)

with p, d > 0 and f(d) > f(0) = 0. Then

Ha,b(B(p, d), f) = pf(d) − af(d)(p(1 − p))^{1/2} + bf(d)(1 − 2p)

and

∂H/∂p = f(d) − af(d)(1 − 2p)(p(1 − p))^{−1/2}/2 − 2bf(d).


Let k be a large natural number and set p = 1/k^2. Then

af(d)(1 − 2p)(p(1 − p))^{−1/2}/2 > akf(d)/4.

Hence ∂H/∂p < 0 if p is a small number. Accordingly, H is not an increasing function in p, and it is therefore hardly a serious rival to expected utility.
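The non-monotonicity in p can be seen numerically; a minimal sketch with f the identity and hypothetical parameter values a, b and d:

```python
import math

def hagen(p, d, a, b, f=lambda x: x):
    """H_{a,b} for the two-outcome option B(p, d) = (p, 1-p; d, 0), with f(0) = 0."""
    e = p * f(d)
    s2 = p * (f(d) - e) ** 2 + (1 - p) * (0 - e) ** 2
    if s2 == 0:
        return e
    s = math.sqrt(s2)
    m3 = p * (f(d) - e) ** 3 + (1 - p) * (0 - e) ** 3
    return e - a * s + b * m3 / s2

a, b, d = 0.5, 0.1, 1.0            # hypothetical parameters
for p in (1e-4, 1e-3, 1e-2, 0.1):
    print(p, hagen(p, d, a, b))
# The value drops as p grows from 1e-4 to 1e-2, although a larger chance of
# the good outcome d should never make the option worse: H is not increasing in p.
```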

    In 1979 Hagen proposed the following modification of Ha,b:

Hg,b(A, f) = E(A, f) − g(S(A, f)) + bM3(A, f)/(S^2(A, f) + ε)

in case S^2(A, f) > 0. If S^2(A, f) = 0, then

Hg,b(A, f) = E(A, f).

Here g(0) = 0, g(x) > 0 if x > 0, g′(x) > 0 and continuous, and b, ε > 0.

To see to what extent Hg,b is compatible with stochastic dominance, we compute ∂H/∂p as above, getting

∂H/∂p = f(d)(1 − g′(f(d)(p(1 − p))^{1/2})(1 − 2p)(p(1 − p))^{−1/2}/2 − 2b).

Hence ∂H/∂p > 0 only if b < 0.5. Assume then that c = 1 − 2b > 0 and that (p(1 − p))^{1/2} ≤ 1/(2k) with k large. Then g′(x) < 1/k if f(d) ≥ 2kx. Hence g′(x) = 0, which yields a contradiction. Hence we must have a uniform upper bound M on f(d), which in turn imposes restrictions on admissible f:s and d:s. Moreover, g must approach 0 as fast as p does. The simplest function satisfying these conditions seems to be x^a with a a large number. But then the contribution of the term g(S(A, f)) will be negligible in most cases. Hence we may neglect this term when determining how Hg,b performs in common ratio tests. To simplify the comparison with E(A, f), set p = sq with 0 < s < 1. Then

q/(qs) − (q + b(1 − 2q))/(qs + b(1 − 2qs)) = 1/s − (q(1 − 2b) + b)/(qs(1 − 2b) + b)
= (qs(1 − 2b) + b − s(q(1 − 2b) + b))/(s(qs(1 − 2b) + b))
= (b − bs)/(s(qs(1 − 2b) + b)) > 0.

Hence Hg,b is less risk averse than E(A, f) in most common ratio tests and therefore hardly a serious rival to it.


5.4.2 Fishburn (1983)

Fishburn there proposes the evaluation

V(A, f) = E(A, g1(f)) / E(A, g2(f)).

Here g2(x) > 0. V is regular if and only if

g1(x)/g2(x) = x.

Assume that V is regular. Then g1(0) = 0. To see how V performs in common ratio tests, set

Ap = (p, 1 − p; c1, 0) and Bp = (p, 1 − p; c2, 0)

with c1 > c2 > 0. Let f be strictly increasing with f(0) = 0. Then

V(Ap, f) − V(Bp, f) = pg1(f(c1))/(pg2(f(c1)) + (1 − p)g2(f(0))) − pg1(f(c2))/(pg2(f(c2)) + (1 − p)g2(f(0))) > 0

if and only if

(1 − p)g2(0)g1(f(c1)) > (1 − p)g2(0)g1(f(c2)),

if and only if g1(x) is strictly increasing for x > 0. Hence the introduction of g1 seems pointless. Now

V(Ap, f) − V(Bq, f) > 0

if and only if

pg1(f(c1))(qg2(f(c2)) + (1 − q)g2(0)) > qg1(f(c2))(pg2(f(c1)) + (1 − p)g2(0)),

if and only if

p(1 − q)/(q(1 − p)) > g1(f(c2))/g1(f(c1)).

Hence V is slightly more risk averse than E. But note that this is due to the introduction of the function g2 with g2(0) > 0. Hence the price seems to be too high.


5.4.3 Loomes and Sugden (1986)

These two authors claim that regret and disappointment (Loomes & Sugden, 1982; 1986) ought to influence decision making, and in their paper of 1986 they propose an evaluation utilizing these ideas. Their work is closely related to that of Bell (1982; 1985). But, for all we know, Bell has contented himself with a discussion of special cases and never presented any evaluation. The proposal of Loomes and Sugden is as follows:

V(A, f) = Σ [pif(ci) + D(f(ci) − E(A, f))],

where 1 ≤ i ≤ n and A = (p1, . . . , pn; c1, . . . , cn). Here D is supposed to measure elation and disappointment. D is non-decreasing and differentiable. Moreover, D(x) is convex if x > 0 and concave if x < 0. Finally, D(−x) = −D(x), which yields D(0) = 0. To see when V is compatible with stochastic dominance, let Ap and f be as in section 5.4.2. Then

V(Ap, f) = pf(c1) + D(f(c1) − pf(c1)) + D(0 − pf(c1))
= pf(c1) + D((1 − p)f(c1)) − D(pf(c1))

and

∂V/∂p = f(c1) − f(c1)D′((1 − p)f(c1)) − f(c1)D′(pf(c1))
= f(c1)(1 − D′((1 − p)f(c1)) − D′(pf(c1))).

Hence 0 ≤ D′(x) < 1 for 0 < x < a, for some a > f(c1). But then x − D(x) is strictly increasing in this interval. To see how V performs in common ratio tests, set

D(pf(c1)) = pf(c1) − d

and

D(qf(c2)) = qf(c2) − e.

Then

V(Ap, f) − V(Bq, f) = d + D((1 − p)f(c1)) − e − D((1 − q)f(c2)).

But

d + D((1 − p)f(c1)) − e − D((1 − q)f(c2)) ≥ 0

if pf(c1) ≥ qf(c2), with equality only if pf(c1) = qf(c2), since x − D(x) is strictly increasing and compatibility with stochastic dominance holds. Hence V is less risk averse than E in common ratio tests and hardly a serious rival to E.


5.4.4 Green and Jullien (1988)

These two authors propose an evaluation that is based on the idea that we should replace the notion of the value of an outcome with the notion of the value of an outcome given a probability. Since this idea contradicts one of the basic presuppositions of normative decision theory, their proposal is presented here only for the sake of completeness.

V(A, f) = Σ pigi(f(ci)),

where 1 ≤ i ≤ n and A = (p1, . . . , pn; c1, . . . , cn). Here

gi(x) = (1/pi) ∫ φ(x, t) dt,

with the integral going from si−1 to si, s0 = 0, si = p1 + · · · + pi, and φ: R × [0, 1] → R such that φ is continuous, non-decreasing in the first variable, and φ(0, p) = 0. Setting φ(x, p) = x we get V(A, f) = E(A, f) as expected. If V is to be regular, then ∫_I φ(x, t) dt = x for I = [0, 1]. Moreover, {p | φ(x, p) is not strictly increasing in x} has measure zero if V is compatible with stochastic dominance. To see how V performs in common ratio tests, we will here only consider the case φ(x, p) = x·h(p), h continuous in [0, 1]. Now, in view of Weierstrass' approximation theorem, it suffices to set h(p) = Pn(p) with Pn a polynomial of degree n > 0. If V is compatible with stochastic dominance, then Pn ≥ 0 in [0, 1], and if Pn is to serve as a risk averse weight, Pn must be strictly increasing. So the most interesting case is Pn(p) = (n + 1)p^n. Set Ap = (p, 1 − p; c1, 0), Bq = (q, 1 − q; c2, 0) with 0 < c2 < c1, 0.5 < p < q, and f strictly increasing with f(0) = 0. Then

V(Ap, f) = p^n f(c1) and V(Bq, f) = q^n f(c2).

Moreover,

V(Arp, f) = r^n p^n f(c1) and V(Brq, f) = r^n q^n f(c2).

Hence, for this choice of φ, V performs very much as E does in common ratio tests.

    5.4.5 Quiggin (1982)

Quiggin proposes that an evaluation of an option should not be based directly upon the values and probabilities of the given outcomes, but that the


probabilities first should be modified. The reason behind this claim is solely empirical: it seems to conform with observed behavior. Any such proposal is, of course, at variance with one of the basic tenets of normative