Lecture on Probability

Course Outline


Pierre Simon de Laplace (1749-1827)
A Philosophical Essay on Probabilities (1814)

Probability is the ratio of the number of favorable cases to that of all cases possible.

Suppose we throw a coin twice. What is the probability that we will throw at least one head? There are four equally possible cases that might arise:

1. One head and one tail.
2. One tail and one head.
3. Two tails.
4. Two heads.

So there are 3 cases that will give us a head. The probability that we seek is 3/4.

Laplace firmly believed that, in reality, every event is fully determined by general laws of the universe. But nature is complex and we are woefully ignorant of her ways; we must therefore calculate probabilities to compensate for our limitations. Events, in other words, are probable only relative to our meager knowledge. In an epigram that has defined strict determinism ever since, Laplace boasted that if anyone could provide a complete account of the position and motion of every particle in the universe at any single moment, then total knowledge of nature's laws would permit a full determination of all future history.

Laplace directly links the need for a theory of probability to human ignorance of nature's deterministic ways. He writes: "So it is that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability." (Analytical Theory of Probabilities, as cited by Stephen J. Gould, Dinosaurs in a Haystack, p. 27.)

Laplace's Classical Definition: The probability of an event A is defined a priori, without actual experimentation, as

$$P(A) = \frac{\text{number of outcomes favorable to } A}{\text{total number of possible outcomes}},$$

provided all these outcomes are equally likely.

Consider a box with n white and m red balls. In this case, there are two elementary outcomes: white ball or red ball. The probability of selecting a white ball is n/(n+m).

We can use the above classical definition to determine the probability that a given number is divisible by a prime p.
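The classical definition can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original lecture; the helper name `classical_probability` is hypothetical, and it assumes all listed outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, is_favorable):
    """Favorable cases divided by all cases (assumes equally likely outcomes)."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(favorable, len(outcomes))


# All ordered results of two coin tosses: HH, HT, TH, TT.
two_tosses = list(product("HT", repeat=2))

p_at_least_one_head = classical_probability(two_tosses, lambda o: "H" in o)
p_exactly_one_head = classical_probability(two_tosses, lambda o: o.count("H") == 1)

print(p_at_least_one_head)  # 3/4
print(p_exactly_one_head)   # 1/2
```

Note that "at least one head" and "exactly one head" are different events, with 3 and 2 favorable cases respectively.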

An advantage of the study of probability is its use in everyday life.

What actions will improve your probability of success? Buying lottery tickets? Investing in junk bonds? Buying gold? Investing in real estate? Smoking?

What events are most probable in my lifetime? What kind of event is likely to kill me? Meteor impact? Earthquake? Tsunami? Terrorism? War? Accident? Heart disease? Cancer?

Should you buy insurance? How much should you pay for insurance?

None of these probabilities are fixed. As knowledge increases and parameters change, so do the probabilities.

Prisoner's dilemma

Three prisoners, A, B, and C, are in jail. One of them is to be executed and the other two will be set free. Prisoner A asked the guard: "One of my partners, B or C, will be set free. Could you please tell me which one of them will be set free?" The guard thought a while and told A: "If I do not tell you, then your chance of death is 1/3. But if I tell you, then there are only two left, and you are one of them. Your chance of death will be 1/2. Do you really want to increase your chance of death?"

Some basics:

Flip a coin 3 times. How many possible outcomes are there? With each flip there are two possible outcomes, and we do this 3 times, so all the possible results are:

Flip 1  Flip 2  Flip 3
H       H       H
H       H       T
H       T       H
H       T       T
T       H       H
T       H       T
T       T       H
T       T       T

There are 3 events, each with two possible outcomes, so there are a total of 2*2*2 = 8 results. In general, the number of possible results of k trials with n_i possible outcomes in the i-th trial is

$$N = n_1 \, n_2 \cdots n_k$$

How many values can a 3-digit binary number have?

Another example: How many possible license plates are there using three letters and three numbers?

N = 26*26*26*10*10*10 = 17,576,000

Permutations: The number of permutations of r objects taken from a set of n distinct objects is the number of ways n things taken r at a time can be arranged.

Example: We have 20 rock samples. How many ways can you select 3 samples from the 20 when the order of selection matters? The first rock can be any one of the 20, the 2nd can be any of 19, and the 3rd can be any of 18. So the answer is 20*19*18 = 6,840. The formulation is:

$${}_nP_r = n(n-1)(n-2)\cdots(n-r+1)$$

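Factorial: The factorial operation is defined as:

$$n! = n(n-1)(n-2)\cdots(2)(1)$$

By definition, 0! is set equal to 1. We can rewrite the permutation equation as:

$${}_nP_r = \frac{n!}{(n-r)!}$$

Example: How many different hands are there in straight poker (no draw)?

$${}_{52}P_5 = \frac{52!}{47!} = 52 \cdot 51 \cdot 50 \cdot 49 \cdot 48 = 311{,}875{,}200$$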

The poker example isn't quite correct, because it assumes that the order in which you received the cards is important, which it isn't. We need another parameter where order isn't important.

COMBINATIONS: When we don't care about the order of the outcomes (ABC = ACB), we talk about the number of COMBINATIONS of n objects taken r at a time. This turns out to be the number of permutations divided by r!.

$${}_nC_r = \frac{n!}{r!\,(n-r)!}$$

So how many different poker hands are there really?

$${}_{52}C_5 = \frac{52!}{5!\,47!} = 2{,}598{,}960$$

How many ways can you pick three marbles from 9 marbles?

$${}_9C_3 = \frac{9!}{3!\,6!} = 84$$
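The counting rules above can be reproduced with Python's standard library. This is a small sketch rather than part of the lecture; the variable names are illustrative, and `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import product

# Multiplication rule: k trials with n_i possible outcomes each.
coin_flip_results = len(list(product("HT", repeat=3)))  # 2 * 2 * 2
license_plates = 26**3 * 10**3                          # three letters, three digits

# Permutations: order matters.
ordered_rock_picks = math.perm(20, 3)    # 20 * 19 * 18
ordered_poker_deals = math.perm(52, 5)   # 52! / 47!

# Combinations: order does not matter, so divide by r!.
poker_hands = math.comb(52, 5)
marble_picks = math.comb(9, 3)

print(coin_flip_results, license_plates)
print(ordered_rock_picks, ordered_poker_deals)
print(poker_hands, marble_picks)
```

Dividing `ordered_poker_deals` by `math.factorial(5)` gives `poker_hands`, which is the "permutations divided by r!" relation stated in the text.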

Probability

Now that we know how to tell what's possible, how do we tell what's probable?

The basic concept is: If there are s possible favorable outcomes of an event and there are n outcomes possible, then the probability of success is s/n.

p = s/n

However, this is only true if all outcomes are equally likely.

Example: What is the probability of drawing an ace from a deck of cards? Since there are 52 cards, there are 52 possible outcomes, and, since there are 4 aces, four of those outcomes are favorable, thus:

P = 4/52 = 1/13 = 7.7%

Example: A cancer surgery patient gets biopsies on 6 lymph nodes. If any one is found to contain cancer, then the cancer will be known to have spread and the patient will receive chemotherapy. If only 1 of the 10 lymph nodes is actually cancerous, what is the probability of all six sampled nodes coming out negative?

Our possible outcomes are 10 nodes taken 6 at a time, or 10C6 = 10!/(6!(10-6)!) = 10*9*8*7/(4*3*2*1) = 10*3*7 = 210.

Favorable outcomes (all six negative) are those that miss the 1 cancerous node, which is the same as picking all 6 samples from the 9 clear nodes: 9C6 = 9!/(6!(9-6)!) = 9*8*7/(3*2*1) = 84. So the probability is 84/210 = 40%. Lesson to surgeons: sample LOTS of nodes!

When the probabilities of some outcomes are greater than those of others, the above calculations don't work. A better definition is:

The probability of an outcome is the fraction of trials where that outcome is observed with a large number of trials.

Example: The probability of sunshine for more than 2 hours per day in June in Murree is 97%. This statistic, valuable to the Tourist Bureau, is based on a large number of samples of sunshine in June.
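The counting answer for the lymph node example and the frequency definition of probability can be compared directly. The sketch below is illustrative only: the function name `simulate_all_negative`, the seed, and the number of trials are assumptions, not part of the lecture.

```python
import math
import random
from fractions import Fraction

# Exact answer by counting: choose 6 of the 9 clear nodes, out of all ways to choose 6 of 10.
exact = Fraction(math.comb(9, 6), math.comb(10, 6))


def simulate_all_negative(trials, seed=0):
    """Fraction of simulated biopsies in which all 6 sampled nodes are clear."""
    rng = random.Random(seed)
    nodes = [True] + [False] * 9  # True marks the single cancerous node
    all_negative = 0
    for _ in range(trials):
        sample = rng.sample(nodes, 6)  # 6 nodes drawn without replacement
        if not any(sample):
            all_negative += 1
    return all_negative / trials


estimate = simulate_all_negative(100_000)
print(exact)     # 2/5
print(estimate)  # close to 0.4, varying slightly with the seed
```

With more trials the simulated fraction settles closer to the exact value of 2/5.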

The Law of Large Numbers: If an experiment is repeated a large number of times, the fraction of times a particular outcome is observed will approach the probability of that outcome.

Rules and definitions

S: Sample Space: All possible outcomes of an experiment

A: Event: a subset of S. An event may contain more than one outcome

Mutually exclusive: Two events that have no common outcomes.

The probability of an event must be greater than or equal to 0 and less than or equal to 1.

0 ≤ P(A) ≤ 1

Also, P(S)=1.

If P(A) = 1, A is a certainty.

If two events are mutually exclusive, then the probability that one or the other will occur is the sum of their probabilities.

∪: the union symbol. It means "or".
∩: the intersection symbol. It means "and".

If A and B are mutually exclusive:
P(A ∪ B) = P(A) + P(B)
P(A ∩ B) = 0

~: the complement symbol. It means "not".
P(A) + P(~A) = 1

Additional Probability Addition Rules

[Venn diagram: two overlapping circles, oil (orange) and gas (pink), inside the sample space. Oil only: 0.18; overlap: 0.12; gas only: 0.24.]

Venn diagrams illustrate the probabilities of non-exclusive events. The circles represent two different events embedded in the sample space. This could be the probabilities of hitting economical oil (orange) or gas (pink).

P(oil) = 0.18 + 0.12 = 0.30
P(gas) = 0.24 + 0.12 = 0.36
P(gas ∪ oil) = 0.18 + 0.12 + 0.24 = 0.54

Note: this is the inclusive OR, in that both events can occur and still be counted.

If we had used our previous addition rule, P(A ∪ B) = P(A) + P(B) = P(oil) + P(gas) = 0.30 + 0.36 = 0.66, we would overestimate the probability of finding gas or oil, because the overlap is counted twice. We fix that by writing:

P(oil ∪ gas) = P(oil) + P(gas) - P(oil ∩ gas) = 0.30 + 0.36 - 0.12 = 0.54

If the events are mutually exclusive, then P(A ∩ B) = 0, and the original rule is recovered.

Conditional Probability

"What if" probabilities are very common: probabilities where an outcome depends on the occurrence of a previous outcome.

If a strength 5 hurricane hits Hawks Bay, what is the probability that KANUPP will fail?
If an earthquake occurs off the south coast, what is the probability that a major tsunami will be generated?
If a disaster occurs, what is the probability that our insurance company will not have sufficient funds?
If oil supply drops below demand, what is the probability that we can make do with alternative energy?

Conditional probability is the probability that an event will occur, given that another event has already occurred.

P(A|B) = P(A ∩ B) / P(B)

The probability of A given that B has occurred is equal to the probability of A and B divided by the probability of B.

In the oil and gas example, what is the probability of finding oil given that gas was found?

P(oil | gas)= 0.12/0.36= 1/3= 33%
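The oil and gas numbers can be worked through in a few lines. This is a minimal sketch using the three region probabilities from the Venn diagram; the variable names are illustrative.

```python
import math

# Region probabilities read off the Venn diagram.
p_oil_only, p_both, p_gas_only = 0.18, 0.12, 0.24

p_oil = p_oil_only + p_both
p_gas = p_gas_only + p_both

# General addition rule: subtract the overlap so it is not counted twice.
p_oil_or_gas = p_oil + p_gas - p_both

# Conditional probability: P(oil | gas) = P(oil and gas) / P(gas).
p_oil_given_gas = p_both / p_gas

print(round(p_oil_or_gas, 2))     # 0.54
print(round(p_oil_given_gas, 2))  # 0.33
```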

Bayes' Basic Theorem

Rewriting the above equation, we get:

P(A ∩ B) = P(B) P(A|B)
and
P(A ∩ B) = P(A) P(B|A).

If A and B are independent, then whether or not B has already occurred does not affect the probability of A:

P(A|B) = P(A).

Substituting into Bayes' theorem:

P(A ∩ B) = P(A) P(B), if A and B are independent.

For n independent events:

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2)\cdots P(A_n)$$

Example: What is the probability of death by meteoroid impact?

The probability of a planet-killer meteoroid impact in a given year is about 10^-8. The average person lives about 60 years, and there are about 5 x 10^9 people. Every 10^8 years, 5 x 10^9 people will be killed by an impact, but every 60 years 5 x 10^9 people will be killed by other causes. So, in 10^8 years, (10^8/60) * 5 x 10^9 people die of other causes, and 5 x 10^9 people will be killed by an impact. Divide the total deaths by impact by the total deaths by other causes to get the probability of death by impact:

P(death by impact) ~ 60/10^8 = 6 x 10^-7, or about 1 in 1.7 million.

This is about the same as death by lightning.

Peak Oil Example

The probability that A) demand for oil will outstrip supply within the next 5 years is ~70%.
The probability that B) we will be able to satisfy demand with other energy sources is ~20%.
The probability of C) global economic chaos if A occurs and B does not is ~60%.

The probability of global economic chaos beginning within the next 5 years:

P(A ∩ ~B) = P(A) P(~B|A) = 0.7*(1-0.2) = 0.56
P(C) = 0.56*0.6 ~ 34%

This is the argument that is getting considerable attention now: Google "peak oil".

If there is more than one event B_i (all mutually exclusive and together exhaustive) conditionally related to event A, then P(A) is the sum of the conditional probabilities of A given the B_i, each weighted by P(B_i):

$$P(A) = \sum_i P(A \mid B_i)\,P(B_i)$$

This yields:

$$P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_i P(A \mid B_i)\,P(B_i)}$$

which is the general Bayes' theorem.

Like much of statistics, the formulas are incomprehensible without examples. Consider:

An unknown marine fossil fragment was found at the fossil site in a stream bed. You want a better fossil, but there are two possible sources upstream. Drainage basin B1 covers 180 km^2 and B2 covers 100 km^2.

Based on the area alone, the probability that the fossil comes from one or the other basin is:

P(B1) = 180/280 = 0.64
P(B2) = 100/280 = 0.36

However, a geological map shows that 35% of the outcrops in B1 are marine, while 80% of the outcrops in B2 are marine. The conditional probabilities are:

P(A|B1) = 0.35, the probability of a marine fossil given B1
P(A|B2) = 0.80, the probability of a marine fossil given B2

We can now use Bayes' theorem to find the probability that the fossil came from B1, given that the fossil is marine:

$$P(B_1 \mid A) = \frac{P(A \mid B_1)\,P(B_1)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2)} = \frac{0.35 \times 0.64}{0.35 \times 0.64 + 0.80 \times 0.36} \approx 0.44$$
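The fossil example is a direct application of the general Bayes' theorem, and the arithmetic can be sketched as follows. The function name `posterior` and the dictionary layout are illustrative assumptions; the areas and marine outcrop fractions are the ones given in the text.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive, exhaustive hypotheses.

    priors[k] is P(B_k); likelihoods[k] is P(A | B_k).
    Returns P(B_k | A) for every k.
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())  # P(A), by the total probability rule
    return {k: value / total for k, value in joint.items()}


# Priors proportional to drainage basin area (km^2).
areas = {"B1": 180.0, "B2": 100.0}
total_area = sum(areas.values())
priors = {k: area / total_area for k, area in areas.items()}

# Fraction of marine outcrops in each basin, P(marine | basin).
marine_fraction = {"B1": 0.35, "B2": 0.80}

post = posterior(priors, marine_fraction)
print(round(post["B1"], 2), round(post["B2"], 2))  # 0.44 0.56
```

Although B1 is the larger basin, the marine evidence shifts the balance toward B2.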

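[Fig. 1.1: Venn diagrams of the sets A and B]

If A ∩ B = ∅, the empty set, then A and B are said to be mutually exclusive (M.E.). A partition of S is a collection of mutually exclusive subsets A_i of S such that their union is S:

$$A_i \cap A_j = \varnothing \ (i \ne j), \qquad \bigcup_i A_i = S. \tag{1-9}$$

[Fig. 1.2: mutually exclusive sets and a partition of S]

De Morgan's Laws: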

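$$\overline{A \cup B} = \overline{A} \cap \overline{B}, \qquad \overline{A \cap B} = \overline{A} \cup \overline{B}. \tag{1-10}$$

[Fig. 1.3: Venn diagrams illustrating De Morgan's laws]

Often it is meaningful to talk about at least some of the subsets of S as events, for which we must have a mechanism to compute their probabilities.

Example 1.1: Consider the experiment where two coins are simultaneously tossed. The various elementary events are

$$(H,H),\ (H,T),\ (T,H),\ (T,T),$$

and S = {(H,H), (H,T), (T,H), (T,T)}. The subset A = {(H,H), (H,T), (T,H)} is the same as "Head has occurred at least once" and qualifies as an event.

Suppose two subsets A and B are both events. Then consider:

"Does an outcome belong to A or B?" (the union A ∪ B)
"Does an outcome belong to A and B?" (the intersection A ∩ B)
"Does an outcome fall outside A?" (the complement of A)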

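The ratio P(A ∩ B)/P(B) is a measure of "the event A given that B has already occurred." We denote this conditional probability by

P(A|B) = Probability of "the event A given that B has occurred."

We define

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \tag{1-35}$$

provided P(B) ≠ 0. As we show below, the above definition satisfies all probability axioms discussed earlier.

We have

(i)
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \ge 0, \tag{1-36}$$

(ii)
$$P(S \mid B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1, \tag{1-37}$$
since S ∩ B = B.

(iii) Suppose A ∩ C = ∅. Then

$$P(A \cup C \mid B) = \frac{P((A \cup C) \cap B)}{P(B)} = \frac{P((A \cap B) \cup (C \cap B))}{P(B)}. \tag{1-38}$$

But (A ∩ B) ∩ (C ∩ B) = ∅, hence P((A ∩ B) ∪ (C ∩ B)) = P(A ∩ B) + P(C ∩ B), and

$$P(A \cup C \mid B) = \frac{P(A \cap B)}{P(B)} + \frac{P(C \cap B)}{P(B)} = P(A \mid B) + P(C \mid B), \tag{1-39}$$

satisfying all probability axioms in (1-13). Thus (1-35) defines a legitimate probability measure.

Properties of Conditional Probability:

a. If B ⊂ A, then A ∩ B = B, and

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1, \tag{1-40}$$

since if B ⊂ A, then occurrence of B implies automatic occurrence of the event A. As an example, let A = {outcome is even} and B = {outcome is 2} in a dice tossing experiment. Then B ⊂ A and P(A|B) = 1.

b. If A ⊂ B, then A ∩ B = A, and

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)} \ge P(A). \tag{1-41}$$

(In a dice experiment, let A = {outcome is 2} and B = {outcome is even}, so that A ⊂ B. The statement that B has occurred (outcome is even) makes the odds for "outcome is 2" greater than without that information.)

c. We can use the conditional probability to express the probability of a complicated event in terms of simpler related events.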

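Let A_1, A_2, ..., A_n be pairwise disjoint events whose union is S. Thus

$$A_i \cap A_j = \varnothing \ (i \ne j), \quad \text{and} \quad \bigcup_{i=1}^{n} A_i = S. \tag{1-42}$$

Thus

$$B = B \cap (A_1 \cup A_2 \cup \cdots \cup A_n) = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n). \tag{1-43}$$

But A_i ∩ A_j = ∅ implies (B ∩ A_i) ∩ (B ∩ A_j) = ∅, so that from (1-43)

$$P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i). \tag{1-44}$$

With the notion of conditional probability, next we introduce the notion of "independence" of events.

Independence: A and B are said to be independent events if

$$P(A \cap B) = P(A)\,P(B). \tag{1-45}$$

Notice that the above definition is a probabilistic statement, not a set-theoretic notion such as mutual exclusiveness.

Suppose A and B are independent. Then

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A). \tag{1-46}$$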

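Thus if A and B are independent, the event that B has occurred does not shed any more light on the event A. It makes no difference to A whether B has occurred or not. An example will clarify the situation.

Example 1.2: A box contains 6 white and 4 black balls. Remove two balls at random without replacement. What is the probability that the first one is white and the second one is black?

Let W1 = "first ball removed is white" and B2 = "second ball removed is black".

We need P(W1 ∩ B2). We have W1 ∩ B2 = B2 ∩ W1. Using the conditional probability rule,

$$P(W_1 \cap B_2) = P(B_2 \cap W_1) = P(B_2 \mid W_1)\,P(W_1). \tag{1-47}$$

But

$$P(W_1) = \frac{6}{6+4} = \frac{6}{10} = \frac{3}{5}$$

and

$$P(B_2 \mid W_1) = \frac{4}{5+4} = \frac{4}{9},$$

and hence

$$P(W_1 \cap B_2) = \frac{3}{5} \cdot \frac{4}{9} = \frac{4}{15} \approx 0.27.$$

Are the events W1 and B2 independent? Our common sense says no. To verify this we need to compute P(B2). Of course the fate of the second ball very much depends on that of the first ball. The first ball has two options: W1 = "first ball is white" or B1 = "first ball is black". Note that W1 ∩ B1 = ∅ and W1 ∪ B1 = S. Hence W1 together with B1 form a partition. Thus (see (1-42)-(1-44))

$$P(B_2) = P(B_2 \mid W_1)\,P(W_1) + P(B_2 \mid B_1)\,P(B_1) = \frac{4}{9} \cdot \frac{3}{5} + \frac{3}{9} \cdot \frac{2}{5} = \frac{4}{15} + \frac{2}{15} = \frac{2}{5},$$

and

$$P(B_2)\,P(W_1) = \frac{2}{5} \cdot \frac{3}{5} = \frac{6}{25} \ne P(W_1 \cap B_2) = \frac{4}{15}.$$

As expected, the events W1 and B2 are dependent.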
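Example 1.2 is small enough to verify by listing every ordered pair of distinct balls. The sketch below is an illustration, not the lecture's method; the helper name `prob` is hypothetical, and each ordered pair is assumed equally likely.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 6 + ["B"] * 4

# Every ordered draw of two distinct balls (by index), without replacement.
draws = list(permutations(range(len(balls)), 2))


def prob(event):
    """Classical probability of an event over the equally likely ordered draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


p_w1 = prob(lambda d: balls[d[0]] == "W")
p_b2 = prob(lambda d: balls[d[1]] == "B")
p_w1_and_b2 = prob(lambda d: balls[d[0]] == "W" and balls[d[1]] == "B")

print(p_w1, p_b2, p_w1_and_b2)     # 3/5 2/5 4/15
print(p_w1 * p_b2 == p_w1_and_b2)  # False, so W1 and B2 are dependent
```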

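From (1-35),

$$P(A \cap B) = P(A \mid B)\,P(B). \tag{1-48}$$

Similarly, from (1-35),

$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)},$$

or

$$P(A \cap B) = P(B \mid A)\,P(A). \tag{1-49}$$

From (1-48)-(1-49), we get

$$P(A \mid B)\,P(B) = P(B \mid A)\,P(A),$$

or

$$P(A \mid B) = \frac{P(B \mid A)}{P(B)}\,P(A). \tag{1-50}$$

Equation (1-50) is known as Bayes' theorem.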

Although simple enough, Bayes' theorem has an interesting interpretation: P(A) represents the a priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule in (1-50) takes into account the new information ("B has occurred") and gives out the a posteriori probability of A given B.

We can also view the event B as new knowledge obtained from a fresh experiment. We know something about A as P(A). The new information is available in terms of B. The new information should be used to improve our knowledge and understanding of A. Bayes' theorem gives the exact mechanism for incorporating such new information.

A more general version of Bayes' theorem involves a partition of S. From (1-50),

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{P(B)} = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\,P(A_j)}, \tag{1-51}$$

where we have made use of (1-44). In (1-51), A_i, i = 1, ..., n, represent a set of mutually exclusive events with associated a priori probabilities P(A_i). With the new information "B has occurred," the information about A_i can be updated by the n conditional probabilities P(B|A_i), i = 1, ..., n.

Example 1.3: Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box (B1) has 15 defective bulbs and the second 5. Suppose a box is selected at random and one bulb is picked out.

(a) What is the probability that it is defective?

Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box B2 has 195 good and 5 defective bulbs. Let D = "Defective bulb is picked out". Then

$$P(D \mid B_1) = \frac{15}{100} = 0.15, \qquad P(D \mid B_2) = \frac{5}{200} = 0.025.$$

Since a box is selected at random, they are equally likely:

$$P(B_1) = P(B_2) = \frac{1}{2}.$$

Thus B1 and B2 form a partition as in (1-43), and using (1-44) we obtain

$$P(D) = P(D \mid B_1)\,P(B_1) + P(D \mid B_2)\,P(B_2) = 0.15 \times 0.5 + 0.025 \times 0.5 = 0.0875.$$

Thus, there is about 9% probability that a bulb picked at random is defective.

(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1?

$$P(B_1 \mid D) = \frac{P(D \mid B_1)\,P(B_1)}{P(D)} = \frac{0.15 \times 0.5}{0.0875} \approx 0.857. \tag{1-52}$$

Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on whether we might have picked box 1? From (1-52), P(B1|D) ≈ 0.857 > 0.5, and indeed it is more likely at this point that we chose box 1 rather than box 2. (Recall that the defect rate in box 1 is six times that in box 2.)
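Both parts of Example 1.3 can be reproduced with exact fractions. The following is a minimal sketch; the dictionary names are illustrative, and the equal box priors are the ones stated in the example.

```python
from fractions import Fraction

# A priori probability of choosing each box.
prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}

# P(D | box): defective bulbs divided by total bulbs in that box.
p_defective_given_box = {"B1": Fraction(15, 100), "B2": Fraction(5, 200)}

# (a) Total probability of drawing a defective bulb.
p_defective = sum(p_defective_given_box[box] * prior[box] for box in prior)

# (b) Bayes' theorem: probability the bulb came from box 1, given it is defective.
posterior_b1 = p_defective_given_box["B1"] * prior["B1"] / p_defective

print(p_defective, float(p_defective))  # 7/80 0.0875
print(posterior_b1)                     # 6/7
```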

CONTINUOUS RANDOM VARIABLES

Definition and Basic Properties

Recall that a random variable X is simply a function from a sample space S into the real numbers. The random variable is discrete if the range of X is finite or countably infinite. This refers to the number of values X can take on, not the size of the values. The random variable is continuous if the range of X is uncountably infinite and X has a suitable pdf (see below). Typically an uncountably infinite range results from an X that makes a physical measurement, e.g., the position, size, time, age, flow, volume, or area of something.

The pdf f of a continuous random variable X must satisfy three conditions.

1. It is a nonnegative function (but unlike in the discrete case it may take on values exceeding 1).

2. Its definite integral over the whole real line equals one. That is,

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

3. Its definite integral over a subset B of the real numbers gives the probability that X takes a value in B. That is,

$$P(X \in B) = \int_B f(x)\,dx$$

for every subset B of the real numbers. As a special case (the usual case), for all real numbers a and b with a ≤ b,

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

Put simply, the probability is the area under the pdf curve over the interval [a, b].

If X has uncountable range and such a pdf, then X is a continuous random variable. In this case we often refer to f as a continuous pdf. Note that this means f is the pdf of a continuous random variable. It does not necessarily mean that f is a continuous function.

Note that by this definition the probability of X taking on a single value a is always 0. This follows from

$$P(X = a) = P(a \le X \le a) = \int_a^a f(x)\,dx = 0,$$

since every definite integral over a degenerate interval is 0. This is, of course, quite different from the situation for discrete random variables.

Consequently we can be sloppy about inequalities. That is,

$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).$$

Remember that this is blatantly false for discrete random variables.

There are random variables that are neither discrete nor continuous, being discrete at some points in their ranges and continuous at others. They are not hard to construct, but they seldom appear in introductory courses and will not concern us.

Mathematicians have defined many generalizations of the Riemann integral of freshman calculus, the Riemann-Stieltjes integral and the Lebesgue integral being common examples. With a suitable generalized integral it is possible to treat discrete and continuous random variables identically (as well as the mixed random variables), but this approach lies far beyond the scope of our course.

Examples

Let X be a random variable with range [0,2] and pdf defined by f(x) = 1/2 for all x between 0 and 2 and f(x) = 0 for all other values of x. Note that since the integral of zero is zero we get

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^2 \frac{1}{2}\,dx = 1.$$

That is, as with all continuous pdfs, the total area under the curve is 1. We might use this random variable to model the position at which a two-meter length of rope breaks when put under tension, assuming every point is equally likely. Then the probability the break occurs in the last half-meter of the rope is

$$P(1.5 \le X \le 2) = \int_{1.5}^{2} \frac{1}{2}\,dx = \frac{1}{4}.$$

Let Y be a random variable whose range is the nonnegative reals and whose pdf is defined by

$$f(x) = \frac{1}{750}\,e^{-x/750}$$

for nonnegative values of x (and 0 for negative values of x). Then

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} \frac{1}{750}\,e^{-x/750}\,dx = 1.$$

The random variable Y might be a reasonable choice to model the lifetime in hours of a standard light bulb with average life 750 hours. To find the probability a bulb lasts under 500 hours, you calculate

$$P(Y < 500) = \int_0^{500} \frac{1}{750}\,e^{-x/750}\,dx = 1 - e^{-2/3} \approx 0.487.$$

Note that in both these examples the pdf is not a continuous function. Also note that in all these cases the pdf behaves as a linear density function in the physical sense: the definite integral of the density of a nonhomogeneous wire or of a lamina gives the mass of the wire or lamina over the specified interval. Here the mass is the probability.

Cumulative Distribution Functions

The cdf of a continuous random variable has the same definition as that for a discrete random variable. That is,

$$F(x) = P(X \le x).$$

In practice this means that F is essentially a particular antiderivative of the pdf, since

$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$

Thus at the points where f is continuous, F'(x) = f(x).

Knowing the cdf of a random variable greatly facilitates computation of probabilities involving that random variable since, by the Fundamental Theorem of Calculus,

$$P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a).$$

In the second example above, F(x) = 0 if x is negative, and for nonnegative x we have

$$F(x) = \int_0^x \frac{1}{750}\,e^{-t/750}\,dt = 1 - e^{-x/750}.$$

Thus the probability of a light bulb lasting between 500 and 1000 hours is

$$F(1000) - F(500) = e^{-2/3} - e^{-4/3} \approx 0.250.$$

In the first example above, F(x) = 0 for negative x, F(x) = 1 for x greater than 2, and F(x) = x/2 for x between 0 and 2, since for such x we have

$$F(x) = \int_0^x \frac{1}{2}\,dt = \frac{x}{2}.$$

Thus to find the probability the rope breaks somewhere in the first meter we calculate F(1) - F(0) = 1/2 - 0 = 1/2, which is intuitively correct.
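Once a cdf is known, interval probabilities reduce to a subtraction. The sketch below illustrates this for the two running examples. It assumes that the light bulb lifetime follows an exponential distribution with mean 750 hours, which is an assumption about the second example rather than something stated explicitly in the surviving text; the function names are hypothetical.

```python
import math


def uniform_cdf(x, low=0.0, high=2.0):
    """cdf of the rope-break position, uniform on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def exponential_cdf(x, mean=750.0):
    """cdf of an exponential lifetime with the given mean (assumed model)."""
    return 0.0 if x < 0 else 1.0 - math.exp(-x / mean)


# P(a <= X <= b) = F(b) - F(a)
p_first_meter = uniform_cdf(1.0) - uniform_cdf(0.0)
p_last_half_meter = uniform_cdf(2.0) - uniform_cdf(1.5)
p_under_500 = exponential_cdf(500.0)
p_500_to_1000 = exponential_cdf(1000.0) - exponential_cdf(500.0)

print(p_first_meter, p_last_half_meter)
print(round(p_under_500, 3), round(p_500_to_1000, 3))
```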

If X is a continuous random variable, then its cdf is a continuous function. Moreover,

$$\lim_{x \to -\infty} F(x) = 0$$

and

$$\lim_{x \to \infty} F(x) = 1.$$

Again these results are intuitive.

Expectation and Variance

Definitions

The expected value of a continuous random variable X is defined by

$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx.$$

Note the similarity to the definition for discrete random variables. Once again we often denote it by μ. As in the discrete case this integral may not converge, in which case the expectation of X is undefined.

As in the discrete case we define the variance by

$$\mathrm{Var}(X) = E\big[(X - \mu)^2\big] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$

Once again the standard deviation is the square root of the variance. Variance and standard deviation do not exist if the expected value by which they are defined does not converge.

Theorems

The Law of the Unconscious Statistician holds in the continuous case. Here it states

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx.$$

Expected value still preserves linearity. That is,

$$E(aX + b) = a\,E(X) + b.$$

The proof depends on the linearity of the definite integral (even an improper Riemann integral).

Similarly, the expected value of a sum of functions of X equals the sum of the expected values of those functions (see theorem 4.3 in the book) by the linearity of the definite integral.

The shortcut formula for the variance holds for continuous random variables, depending only on the two preceding linearity results and a little algebra, just as in the discrete case. The formula states

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2.$$

Variance and standard deviation still act in the same way on linear functions of X. Namely,

$$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) \quad \text{and} \quad \mathrm{SD}(aX + b) = |a|\,\mathrm{SD}(X).$$

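Examples

In the two-meter-rope problem, the expected value should be 1, intuitively. Let us calculate:

$$E(X) = \int_0^2 x \cdot \frac{1}{2}\,dx = \left[\frac{x^2}{4}\right]_0^2 = 1.$$

In the same example the variance is

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \int_0^2 \frac{x^2}{2}\,dx - 1 = \frac{4}{3} - 1 = \frac{1}{3},$$

and consequently

$$\mathrm{SD}(X) = \sqrt{\tfrac{1}{3}} \approx 0.577.$$

This result seems plausible.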
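The mean and variance of the uniform pdf on [0, 2] can also be checked numerically. This is a rough sketch using a midpoint Riemann sum; the `integrate` helper and the step count are illustrative choices, not a recommended numerical method.

```python
def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the definite integral of g over [a, b]."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))


def pdf(x):
    """Uniform density on [0, 2]: f(x) = 1/2 inside the interval, 0 outside."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


total = integrate(pdf, 0.0, 2.0)                             # should be 1
mean = integrate(lambda x: x * pdf(x), 0.0, 2.0)             # E(X)
second_moment = integrate(lambda x: x * x * pdf(x), 0.0, 2.0)  # E(X^2)
variance = second_moment - mean**2                           # shortcut formula
std_dev = variance**0.5

print(round(total, 6), round(mean, 6))
print(round(variance, 6), round(std_dev, 3))
```

The same helper could be pointed at any other pdf, as long as the integration limits cover the region where the density is nonzero.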

It is also possible to compute the expected value and variance in the light bulb example. The integration involves integration by parts.
