

An Introduction to Mathematical Statistics and Its Applications

Richard Larsen and Morris Marx


Contents

1 Introduction
  1.1 An Overview

2 Probability
  2.1 Introduction
  2.2 Sample Spaces and the Algebra of Sets
  2.3 The Probability Function
  2.4 Conditional Probability
  2.5 Independence
  2.6 Combinatorics
  2.7 Combinatorial Probability

3 Random Variables
  3.1 Introduction
  3.2 Binomial and Hypergeometric Probabilities
  3.3 Discrete Random Variables
  3.4 Continuous Random Variables
  3.5 Expected Values
  3.6 The Variance
  3.7 Joint Densities
  3.8 Transforming and Combining Random Variables
  3.9 Further Properties of Mean and Variance
  3.10 Order Statistics (skip)
  3.11 Conditional Densities
  3.12 Moment Generating Functions
  3.13 Taking a Second Look at Statistics (Interpreting Means)


Chapter 1

Introduction

1.1 An Overview

The book covers two broad topics:

• Mathematics of Statistics

• Practice of Statistics

Mathematics of Statistics refers to the probability theory that supports and justifies the various methods used to analyze data. Why are statistical techniques needed? Suppose we want to do some research, with questions like:

• Do stock markets rise and fall randomly?

• Can external forces, such as the phases of the moon, affect admissions to mental hospitals?

• What kind of relationship exists between exposure to radiation and cancer mortality?



Such questions are difficult or impossible to study in the lab. They can be answered by collecting data, making assumptions about the conditions that generated the data, and then drawing inferences about those assumptions.

Case Study 1.2.3 (4th Ed)

In folklore, the full moon is often portrayed as something sinister, a kind of evil force possessing the power to control our behavior. Over the centuries, many prominent writers and philosophers have shared this belief. The possibility of lunar phases influencing human affairs is a theory not without supporters among the scientific community. Studies by reputable medical researchers have attempted to link the "Transylvania effect," as it has come to be known, with higher suicide rates, pyromania, and even epilepsy.

Note: Pyromania, in its more extreme form, is an impulse control disorder in which fires are deliberately started to relieve tension or for gratification. The term comes from the Greek word pyr ("fire").

Table 1.1 shows the admission rates to the emergency room of a Virginia mental health clinic before, during, and after the twelve full moons from August 1971 to July 1972.

Table 1.1: Admission Rates (Patients/Day)

Year   Month    Before Full Moon   During Full Moon   After Full Moon
1971   Aug.     6.4                5.0                5.8
       Sept.    7.1                13.0               9.2
       Oct.     6.5                14.0               7.9
       ...      ...                ...                ...
1972   Jan.     10.4               9.0                13.5
       Feb.     11.5               13.0               13.1
       ...      ...                ...                ...
       July     15.8               20.0               14.5
Averages        10.9               13.3               11.5

For these data, the average admission rate "during" the full moon is higher than the "before" and "after" admission rates: 13.3 versus 10.9 and 11.5. Does that imply the "Transylvania" effect is real? Not necessarily. The question that needs to be addressed is whether the sample means of 13.3, 10.9, and 11.5 are significantly different. After doing a suitable statistical analysis, the conclusion is that these three means are not statistically different, from which we conclude that the "Transylvania" effect is not real. How is that decision made? Based on probability! We will study the theory of probability in this class.


Chapter 2

Probability

2.1 Introduction

Read pages 22 through 23.

What is probability? Consider tossing a coin once. What will be the outcome? The outcome is uncertain: head or tail? What is the probability it will land on its head? What is the probability that it will land on its tail?

A probability is a numerical measure of the likelihood of an event (here, head or tail). It is a number that we attach to an event.

A probability is a number from 0 to 1. If we assign a probability of 0 to an event, this indicates that the event will never occur. A probability of 1 attached to a particular event indicates that the event will always occur. What if we assign a probability of 0.5?



This means that it is just as likely for the event to occur as for it not to occur.

THE PROBABILITY SCALE

+----------------+-----------------+
0               .5                  1
event never     event and "not     event always
will occur      event" equally     will occur
                likely to occur

Three basic methods of assigning a probability to an event:

1) classical approach. Credited to Gerolamo Cardano. Requires that (1) the number of possible outcomes be finite and (2) all outcomes be equally likely. The probability of an event consisting of m outcomes is m/n, where n is the total number of possible outcomes. Example: tossing a fair six-sided die gives n = 6. The probability of getting either 2, 4, or 6 is m/n = 3/6. (Limited in scope.)

2) empirical approach. Credited to Richard von Mises. Requires that identical experiments be repeated many times, say n times. Count the number of times m that the event of interest occurs. The probability of the event is the limit of m/n as n goes to infinity. In practice it is not clear how large n must be for m/n to be a good approximation of lim_{n→∞} m/n.

3) subjective approach: depends on the situation.

Back to the coin toss.

1) Classical approach:

P(head) = (number of head outcomes)/(number of possible outcomes) = 1/2

P(tail) = (number of tail outcomes)/(number of possible outcomes) = 1/2

This approach is based on the assumption that the events head and tail are equally likely to occur.

2) Empirical approach:

Toss the coin 1000 times. Count how many times it lands on head or tail.

P(head) = (number of heads)/1000 = (number of times the event happens)/(number of experiments)

P(tail) = (number of tails)/1000

3) Subjective approach:

Just guess the probability of a head or the probability of a tail.
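As a quick illustration of the empirical approach, here is a minimal simulation sketch (not part of the notes; it uses Python's standard random module, and the helper name is mine) that estimates P(head) as m/n for increasing n:

    import random

    def empirical_p_head(n_tosses, seed=0):
        """Estimate P(head) as m/n by simulating n fair coin tosses."""
        rng = random.Random(seed)
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        return heads / n_tosses

    # m/n should drift toward the classical answer 1/2 as n grows.
    for n in (10, 1000, 100000):
        print(n, empirical_p_head(n))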

2.2 Sample Spaces and the Algebra of Sets

Just as statistics builds on probability theory, probability in turn builds upon set theory. Definitions of key terms:

experiment: a procedure that can be repeated, theoretically an infinite number of times, and that has a well-defined set of possible outcomes.

sample outcome s: each potential outcome of an experiment.

sample space S: the totality of sample outcomes.


event: a collection of sample outcomes.

Example 2.2.1. Consider the experiment of tossing a coin three times.

experiment: tossing a coin three times

sample outcome s: HTH

sample space: S = {HHH, HTH, HHT, THH, HTT, THT, TTH, TTT}

event: let A represent outcomes having exactly 2 heads, so A = {HTH, HHT, THH}

Example 2.2.4. A coin is tossed until the first tail appears.

sample outcome s: HT

sample space: S = {T, HT, HHT, HHHT, · · · }

Note that in Example 2.2.4 the number of sample outcomes is infinite.

Practice 2.2.1. A graduating engineer has signed up for three job interviews. She intends to categorize each one as either a "success" (1) or a "failure" (0), depending on whether it leads to a plant trip. Write out the appropriate sample space. What outcomes are in the event A: Second success occurs on third interview? In B: First success never occurs? Hint: Notice the similarity between this situation and the coin-tossing experiment in Example 2.2.1.

Answer: S = {111, 110, 101, 011, 001, 010, 100, 000}

A = {101, 011}
B = {000}

2.2.2. Three dice are tossed, one red, one blue, and one green. What outcomes make up the event A that the sum of the three faces showing equals five?

Answer: A = {113, 122, 131, 212, 221, 311}

Practice 2.2.3. An urn contains six chips numbered 1 through 6. Three are drawn out. What outcomes are in the event A: "Second smallest chip is a 3"? Assume that the order of the chips is irrelevant.

Answer: A = {134, 135, 136, 234, 235, 236}
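When a sample space is this small, enumerating it mechanically is a useful check. A minimal sketch (not part of the notes) that verifies the answers to 2.2.2 and 2.2.3 with Python's itertools:

    from itertools import product, combinations

    # 2.2.2: ordered triples (red, blue, green) whose faces sum to five.
    A_dice = [t for t in product(range(1, 7), repeat=3) if sum(t) == 5]
    print(A_dice)   # the six outcomes 113, 122, 131, 212, 221, 311

    # 2.2.3: unordered draws of three chips; combinations() yields sorted
    # tuples, so c[1] is the second smallest chip.
    A_chips = [c for c in combinations(range(1, 7), 3) if c[1] == 3]
    print(A_chips)  # (1,3,4), (1,3,5), (1,3,6), (2,3,4), (2,3,5), (2,3,6)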

More practice: 2.2.12

Unions, Intersections, and Complements


Operations performed among events defined on the same sample space are referred to as the algebra of sets.

Definition 2.2.1. Let A and B be any two events defined over the same sample space S. Then

a. The intersection of A and B, written A ∩ B, is the event whose outcomes belong to both A and B.

b. The union of A and B, written A ∪ B, is the event whose outcomes belong to either A or B or both.

Example. A = {1, 2, 3, 4, 5, 6, 7, 8}, B = {2, 4, 6, 8}.

A ∩ B = {2, 4, 6, 8}
A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8}

Example 2.2.7. Let A be the set of x for which x² + 2x = 8; let B be the set for which x² + x = 6. Find A ∩ B and A ∪ B.

Answer: Since the first equation factors into (x + 4)(x − 2) = 0, its solution set is A = {−4, 2}. Similarly, the second equation can be written (x + 3)(x − 2) = 0, making B = {−3, 2}. Therefore

A ∩ B = {2}
A ∪ B = {−4, −3, 2}

Definition 2.2.2. Events A and B defined over the same sample space S are said to be mutually exclusive if they have no outcomes in common; that is, if A ∩ B = ∅, where ∅ is the null set.

Example. A = {1, 3, 5, 7}, B = {2, 4, 6, 8}.

A ∩ B = { } = ∅

Definition 2.2.3. Let A be any event defined on a sample space S. The complement of A, written Aᶜ, is the event consisting of all the outcomes in S other than those contained in A.

Example. S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, A = {2, 4, 6, 8, 10}.

Aᶜ = {1, 3, 5, 7, 9}

Example 2.2.11. Suppose the events A1, A2, · · · , Ak are intervals of real numbers such that Ai = {x : 0 ≤ x ≤ 1/i}. Describe the sets A1 ∪ A2 ∪ · · · ∪ Ak = ∪_{i=1}^{k} Ai and A1 ∩ A2 ∩ · · · ∩ Ak = ∩_{i=1}^{k} Ai.

Notice that the Ai's are telescoping sets. That is, A1 is the interval 0 ≤ x ≤ 1, A2 is the interval 0 ≤ x ≤ 1/2, and so on. It follows, then, that the union of the k Ai's is simply A1, while the intersection of the Ai's (that is, their overlap) is Ak.

Practice. Let A be the set of x for which x² + 2x − 8 ≤ 0; let B be the set for which x² + x − 6 ≤ 0. Find A ∩ B and A ∪ B.

Answer: The solution set for the first inequality is [−4, 2], so A = {x : −4 ≤ x ≤ 2}. Similarly, the second inequality has solution [−3, 2], making B = {x : −3 ≤ x ≤ 2}. Therefore

A ∩ B = {x : −3 ≤ x ≤ 2}
A ∪ B = {x : −4 ≤ x ≤ 2}

2.2.22. Suppose that each of the twelve letters in the word T E S S E L L A T I O N is written on a chip. Define the events F, R, and V as follows:

F: letters in first half of alphabet
R: letters that are repeated
V: letters that are vowels

Which chips make up the following events?

(a) F ∩ R ∩ V
(b) Fᶜ ∩ R ∩ Vᶜ
(c) F ∩ Rᶜ ∩ V

Answer:

(a) F ∩ R ∩ V = {E1, E2}
(b) Fᶜ ∩ R ∩ Vᶜ = {S1, S2, T1, T2}
(c) F ∩ Rᶜ ∩ V = {A, I}

More practice: 2.2.16, 2.2.28, 2.2.29

2.2.16. Sketch the regions in the xy-plane corresponding to A ∪ B and A ∩ B if

A = {(x, y) : 0 < x < 3, 0 < y < 3}

B = {(x, y) : 2 < x < 4, 2 < y < 4}

2.2.28. Let events A and B and sample space S be defined as the following intervals:

S = {x : 0 ≤ x ≤ 10}
A = {x : 0 < x < 5}
B = {x : 3 ≤ x ≤ 7}

Characterize the following events:

(a) Aᶜ

(b) A ∩ B

(c) A ∪ B

(d) A ∩ Bᶜ

(e) Aᶜ ∪ B

(f) Aᶜ ∩ B

2.2.29. A coin is tossed four times and the resulting sequence of heads and/or tails is recorded. Define the events A, B, and C as follows:

A: Exactly two heads appear

B: Heads and tails alternate

C: First two tosses are heads

(a) Which events, if any, are mutually exclusive?

(b) Which events, if any, are subsets of other events?


Expressing Events Graphically: Venn Diagrams

Read the textbook notes on pages 25 through 26.

Relationships based on two or more events can sometimes be difficult to express using only equations or verbal descriptions. An alternative approach that can be used to great effect is to represent the underlying events graphically in a format known as a Venn diagram.

Example 2.2.13 (4th Ed). When Swampwater Tech's class of '64 held its fortieth reunion, one hundred grads attended. Fifteen of those alumni were lawyers, and rumor had it that thirty of the one hundred were psychopaths. If ten alumni were both lawyers and psychopaths, how many suffered from neither affliction?

Let L be the set of lawyers and H the set of psychopaths. If the symbol N(Q) is defined to be the number of members in set Q, then

N(S) = 100


N(L) = 15
N(H) = 30
N(L ∩ H) = 10

Summarize this information in a Venn diagram. Notice that

N(L ∪ H) = number of alumni suffering from at least one affliction = 5 + 10 + 20 = 35

Therefore the number of alumni who were neither lawyers nor psychopaths is 100 − 35 = 65. We can also see that N(L ∪ H) = N(L) + N(H) − N(L ∩ H).

Practice 2.2.31. During orientation week, the latest Spiderman movie was shown twice at State University. Among the entering class of 6000 freshmen, 850 went to see it the first time, 690 the second time, while 4700 failed to see it either time. How many saw it twice?

Answer: Since 6000 − 4700 = 1300 saw it at least once, N(A ∩ B) = 850 + 690 − 1300 = 240.


2.2.32. De Morgan's laws. Let A and B be any two events. Use Venn diagrams to show that

(a) the complement of their intersection is the union of their complements:

(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

(b) the complement of their union is the intersection of their complements:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
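Venn diagrams are the intended approach here; as an independent sanity check (not part of the notes, and the helper name is mine), De Morgan's laws can also be verified exhaustively for events on a small finite sample space:

    from itertools import chain, combinations

    S = set(range(6))  # a small sample space

    def subsets(s):
        """Generate every subset of s."""
        s = list(s)
        return (set(c) for c in
                chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

    # Check both laws for every pair of events A, B defined on S.
    for A in subsets(S):
        for B in subsets(S):
            assert S - (A & B) == (S - A) | (S - B)  # (A ∩ B)^c = A^c ∪ B^c
            assert S - (A | B) == (S - A) & (S - B)  # (A ∪ B)^c = A^c ∩ B^c
    print("De Morgan's laws hold for all pairs of events on S")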

Practice

2.2.36. Use Venn diagrams to suggest an equivalent way of representing the following events:

(a) (A ∩ Bᶜ)ᶜ

(b) B ∪ (A ∩ B)ᶜ

(c) A ∩ (A ∩ B)ᶜ

2.2.37. A total of twelve hundred graduates of State Tech have gotten into medical school in the past several years. Of that number, one thousand earned scores of twenty-seven or higher on the Medical College Admission Test (MCAT) and four hundred had GPAs that were 3.5 or higher. Moreover, three hundred had MCATs that were twenty-seven or higher and GPAs that were 3.5 or higher. What proportion of those twelve hundred graduates got into medical school with an MCAT lower than twenty-seven and a GPA below 3.5?

2.2.38

2.2.40. For two events A and B defined on a sample space S, N(A ∩ Bᶜ) = 15, N(Aᶜ ∩ B) = 50, and N(A ∩ B) = 2. Given that N(S) = 120, how many outcomes belong to neither A nor B?

2.3 The Probability Function

The following definition of probability was entirely a product of the twentieth century. Modern mathematicians have shown a keen interest in developing subjects axiomatically. It was to be expected, then, that probability would come under such scrutiny and be defined not as a ratio (classical approach) or as the limit of a ratio (empirical approach) but simply as a function that behaves in accordance with a prescribed set of axioms.


Denote by P(A) the probability of A, where P is the probability function: a mapping from an event A in a sample space S to a number.

The major breakthrough on this front came in 1933, when Andrey Kolmogorov published Foundations of the Theory of Probability. Kolmogorov's work was a masterpiece of mathematical elegance: it reduced the behavior of the probability function to a set of just three or four simple postulates, three if the sample space is finite and four if S is infinite.

Three axioms (Kolmogorov) that are necessary and sufficient for characterizing the probability function P:

Axiom 1. Let A be any event defined over S. Then P(A) ≥ 0.

Axiom 2. P(S) = 1.

Axiom 3. Let A and B be any two mutually exclusive events defined over S. Then P(A ∪ B) = P(A) + P(B). (Additivity, or finite additivity.)


When S has an infinite number of members, a fourth axiom is needed:

Axiom 4. Let A1, A2, . . . be events defined over S. If Ai ∩ Aj = ∅ for each i ≠ j, then

P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)

Note that Axiom 3 follows from Axiom 4, but in general the converse does not hold. Some basic properties of P that are consequences of the Kolmogorov axioms follow.

Theorem 2.3.1. P(Aᶜ) = 1 − P(A)

Proof. By Axiom 2 and Definition 2.2.3 (complement of an event: S = A ∪ Aᶜ),

P(S) = 1 = P(A ∪ Aᶜ),

but A and Aᶜ are mutually exclusive, so by Axiom 3, P(A ∪ Aᶜ) = P(A) + P(Aᶜ), and the result follows.


Theorem 2.3.2. P(∅) = 0

Proof. Since ∅ = Sᶜ, P(∅) = P(Sᶜ) = 1 − P(S) = 0.

Theorem 2.3.3. If A ⊂ B, then P(A) ≤ P(B).

Proof. Note that the event B may be written in the form

B = A ∪ (B ∩ Aᶜ)

where A and B ∩ Aᶜ are mutually exclusive. Therefore P(B) = P(A) + P(B ∩ Aᶜ), which implies that P(B) ≥ P(A), since P(B ∩ Aᶜ) ≥ 0.

Theorem 2.3.4. For any event A, P(A) ≤ 1.

Proof. The proof follows immediately from Theorem 2.3.3 because A ⊂ S and P(S) = 1.

Theorem 2.3.5. Let A1, A2, · · · , An be events defined over S. If Ai ∩ Aj = ∅ for i ≠ j, then

P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai)


Proof. The proof is a straightforward induction argument, with Axiom 3 as the starting point.

Theorem 2.3.6. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof. The Venn diagram for A ∪ B suggests that the statement of the theorem is true. Formally, we have from Axiom 3 that

P(A) = P(A ∩ Bᶜ) + P(A ∩ B)

and

P(B) = P(B ∩ Aᶜ) + P(A ∩ B)

Adding these two equations gives

P(A) + P(B) = [P(A ∩ Bᶜ) + P(B ∩ Aᶜ) + P(A ∩ B)] + P(A ∩ B).

By Theorem 2.3.5, the sum in the brackets is P(A ∪ B). Subtracting P(A ∩ B) from both sides of the equation gives the result.

Example 2.3.1


Let A and B be two events defined on the sample space S such that P(A) = 0.3, P(B) = 0.5, and P(A ∪ B) = 0.7. Find (a) P(A ∩ B), (b) P(Aᶜ ∪ Bᶜ), and (c) P(Aᶜ ∩ B).

(a) From Theorem 2.3.6 we have

P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.3 + 0.5 − 0.7 = 0.1

(b) From De Morgan's laws, Aᶜ ∪ Bᶜ = (A ∩ B)ᶜ, so P(Aᶜ ∪ Bᶜ) = P((A ∩ B)ᶜ) = 1 − P(A ∩ B) = 1 − 0.1 = 0.9

(c) The event Aᶜ ∩ B can be shown in a Venn diagram. From the diagram,

P(Aᶜ ∩ B) = P(B) − P(A ∩ B) = 0.5 − 0.1 = 0.4
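A small sketch (not from the notes) mirroring these three computations:

    p_A, p_B, p_AuB = 0.3, 0.5, 0.7

    p_AnB = p_A + p_B - p_AuB   # Theorem 2.3.6: P(A ∩ B) = 0.1
    p_AcuBc = 1 - p_AnB         # De Morgan: P(A^c ∪ B^c) = 0.9
    p_AcnB = p_B - p_AnB        # P(A^c ∩ B) = 0.4
    print(round(p_AnB, 10), round(p_AcuBc, 10), round(p_AcnB, 10))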

Example 2.3.2. Show that P(A ∩ B) ≥ 1 − P(Aᶜ) − P(Bᶜ) for any two events A and B defined on a sample space S.

From Theorem 2.3.6 (as used in Example 2.3.1(a)) and Theorem 2.3.1,

P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 1 − P(Aᶜ) + 1 − P(Bᶜ) − P(A ∪ B).

But P(A ∪ B) ≤ 1 from Theorem 2.3.4, so P(A ∩ B) ≥ 1 − P(Aᶜ) − P(Bᶜ).

Read Example 2.3.4

Example 2.3.5. Having endured (and survived) the mental trauma that comes from taking two years of chemistry, a year of physics, and a year of biology, Biff decides to test the medical school waters and sends his MCAT scores to two colleges, X and Y. Based on how his friends have fared, he estimates that his probability of being accepted at X is 0.7, and at Y is 0.4. He also suspects there is a 75% chance that at least one of his applications will be rejected. What is the probability that he gets at least one acceptance?

Answer: Let A be the event "school X accepts him" and B the event "school Y accepts him". We are given that P(A) = 0.7, P(B) = 0.4, and P(Aᶜ ∪ Bᶜ) = 0.75. The question asks for P(A ∪ B). From Theorem 2.3.6,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

From De Morgan's law,

Aᶜ ∪ Bᶜ = (A ∩ B)ᶜ

so P(A ∩ B) = 1 − P((A ∩ B)ᶜ) = 1 − 0.75 = 0.25

It follows that Biff's prospects are not that bleak; he has an 85% chance of getting in somewhere:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.7 + 0.4 − 0.25 = 0.85

Practice

2.3.2. Let A and B be two events defined on S. Suppose that P(A) = 0.4, P(B) = 0.5, and P(A ∩ B) = 0.1. What is the probability that A or B but not both occur?

Answer:

P(A or B but not both) = P(A ∪ B) − P(A ∩ B) = P(A) + P(B) − P(A ∩ B) − P(A ∩ B) = 0.7


2.3.4. Let A and B be two events defined on S. If the probability that at least one of them occurs is 0.3 and the probability that A occurs but B does not occur is 0.1, what is P(B)?

Answer: Given P(A ∪ B) = 0.3 and P(A ∩ Bᶜ) = P(A) − P(A ∩ B) = 0.1, we want P(B). Since P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A ∩ Bᶜ) + P(B), it follows that P(B) = 0.3 − 0.1 = 0.2.

2.3.5. Suppose that three fair dice are tossed. Let Ai be the event that a 6 shows on the ith die, i = 1, 2, 3. Does P(A1 ∪ A2 ∪ A3) = 1/2? Explain.

Answer: No. The Ai's are not mutually exclusive, so their probabilities cannot simply be added. P(A1 ∪ A2 ∪ A3) = P(at least one 6 appears) = 1 − P(no 6 appears) = 1 − (5/6)³.

More practice: 2.3.6, 2.3.10, 2.3.11, 2.3.12, 2.3.14, 2.3.16

2.4 Conditional Probability

In the previous section we were given the two separate probabilities of events A and B. Knowing P(A ∩ B), we can find the probability of A ∪ B. Sometimes, knowing that a certain event A has happened changes the probability that B happens, compared with the two individual probabilities of events A and B. This is called conditional probability.

Consider a fair die being tossed, with A defined as the event "6 appears". Clearly, P(A) = 1/6. But suppose that the die has already been tossed by someone who refuses to tell us whether or not A occurred, but does enlighten us to the extent of confirming that B occurred, where B is the event "even number appears". What are the chances of A now? Here, common sense can help us: there are three equally likely even numbers making up the event B, one of which satisfies the event A, so the updated probability is 1/3.

Notice that the effect of additional information, such as the knowledge that B has occurred, is to revise (indeed, to shrink) the original sample space S to a new set of outcomes S′. In this example, the original S contained six outcomes; the conditional sample space, three (see Figure 2.4.1).

The symbol P(A|B), read "the probability of A given B", is used to denote a conditional probability. Specifically, P(A|B) refers to the probability that A will occur given that B has already occurred.

Definition 2.4.1. Let A and B be any two events defined on S such that P(B) > 0. The conditional probability of A, assuming that B has already occurred, is written P(A|B) and is given by

P(A|B) = P(A ∩ B) / P(B)

Comment: From Definition 2.4.1,

P(A ∩ B) = P(A|B)P(B)


Example 2.4.2. Consider the set of families having two children. Assume that the four possible birth sequences, (younger child is a boy, older child is a boy), (younger child is a boy, older child is a girl), and so on, are equally likely; that is, each of the sequences (b, b), (b, g), (g, b), and (g, g) has a 1/4 probability of occurring. What is the probability that both children are boys, given that at least one is a boy?

Let A be the event that both children are boys, and let B be the event that at least one child is a boy. Then

P(A|B) = P(A ∩ B)/P(B) = P(A)/P(B)

since A is a subset of B (so the overlap between A and B is just A). But A has one outcome, {(b, b)}, and B has three outcomes, {(b, g), (g, b), (b, b)}. Applying Definition 2.4.1, then, gives P(A|B) = (1/4)/(3/4) = 1/3.
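The answer 1/3 (rather than 1/2) surprises many people, so it is worth checking by simulation. A minimal sketch (not part of the notes) that estimates P(both boys | at least one boy) by discarding the outcomes where the conditioning event fails:

    import random

    rng = random.Random(1)
    n, hits, given = 100_000, 0, 0
    for _ in range(n):
        kids = [rng.choice("bg"), rng.choice("bg")]  # (younger, older)
        if "b" in kids:                   # condition on B: at least one boy
            given += 1
            hits += kids == ["b", "b"]    # event A: both boys
    print(hits / given)  # ≈ 1/3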

Example 2.4.3. Two events A and B are defined such that (1) the probability that A occurs but B does not occur is 0.2, (2) the probability that B occurs but A does not occur is 0.1, and (3) the probability that neither occurs is 0.6. What is P(A|B)?

Answer: Given (1) P(A ∩ Bᶜ) = 0.2, (2) P(B ∩ Aᶜ) = 0.1, (3) P((A ∪ B)ᶜ) = 0.6. We want P(A|B) = P(A ∩ B)/P(B), so we need P(B) and P(A ∩ B). From (3) we have P(A ∪ B) = 0.4. Draw the Venn diagram. From the Venn diagram,

P(A ∪ B) = P(A ∩ Bᶜ) + P(A ∩ B) + P(B ∩ Aᶜ)
P(A ∩ B) = 0.4 − 0.2 − 0.1 = 0.1
P(B) = P(A ∩ B) + P(B ∩ Aᶜ) = 0.1 + 0.1 = 0.2
P(A|B) = 0.1/0.2 = 0.5

Example 2.4.5. Max and Muffy are two myopic deer hunters who shoot simultaneously at a nearby sheepdog that they have mistaken for a ten-point buck. Based on years of well-documented ineptitude, it can be assumed that Max has a 20% chance of hitting a stationary target at close range, Muffy has a 30% chance, and the probability is 0.06 that they will both be on target. Suppose that the sheepdog is hit and killed by exactly one bullet. What is the probability that Muffy fired the fatal shot?

(Vocabulary: myopic: not able to see clearly things that are far away; ineptitude: incompetence.)

Let A be the event that Max hit the dog, and let B be the event that Muffy hit the dog. Then P(A) = 0.2, P(B) = 0.3, and P(A ∩ B) = 0.06.

We are trying to find P(B | (Aᶜ ∩ B) ∪ (A ∩ Bᶜ)), where the event (Aᶜ ∩ B) ∪ (A ∩ Bᶜ) is the union of A and B minus the intersection; that is, it represents the event that either A or B, but not both, occurs.

Notice that the intersection of B and (Aᶜ ∩ B) ∪ (A ∩ Bᶜ) is the event Aᶜ ∩ B. Therefore, from Definition 2.4.1,

P(B | (Aᶜ ∩ B) ∪ (A ∩ Bᶜ))
= P(Aᶜ ∩ B) / P((Aᶜ ∩ B) ∪ (A ∩ Bᶜ))
= [P(B) − P(A ∩ B)] / [P(A ∪ B) − P(A ∩ B)]
= [0.3 − 0.06] / [0.2 + 0.3 − 0.06 − 0.06]
= 0.24/0.38 ≈ 0.63


Practice 2.4.1. Suppose that two fair dice are tossed. What is the probability that the sum equals 10 given that it exceeds 8?

Answer: Let A be the event that the sum of the two faces is 10: A = {(5, 5), (4, 6), (6, 4)}. Let B be the event that the sum of the two faces exceeds 8, i.e., equals 9, 10, 11, or 12:

B = {(4, 5), (5, 4), (3, 6), (6, 3), (5, 5), (4, 6), (6, 4), (5, 6), (6, 5), (6, 6)}.

Note that the number of elements in the sample space is 36, and that A ⊂ B. Then P(A|B) = P(A ∩ B)/P(B) = (3/36)/(10/36) = 3/10.

2.4.2. Find P(A ∩ B) if P(A) = 0.2, P(B) = 0.4, and P(A|B) + P(B|A) = 0.75.

Answer:

0.75 = P(A|B) + P(B|A) = P(A ∩ B)/P(B) + P(A ∩ B)/P(A) = P(A ∩ B)(1/0.4 + 1/0.2) = 7.5 · P(A ∩ B)

⇒ P(A ∩ B) = 0.1

Homework: 2.4.6, 2.4.7, 2.4.10, 2.4.11, 2.4.12, 2.4.16


Applying Conditional Probability to Higher-Order Intersections.

What is the formula for P(A ∩ B ∩ C)? If we let D = A ∩ B, then

P(A ∩ B ∩ C) = P(D ∩ C)
= P(C|D)P(D)
= P(C|A ∩ B)P(A ∩ B)
= P(C|A ∩ B)P(B|A)P(A)

Repeating this same argument for n events A1, A2, · · · , An gives the general case:

P(A1 ∩ A2 ∩ · · · ∩ An)
= P(An|A1 ∩ A2 ∩ · · · ∩ An−1) · P(An−1|A1 ∩ A2 ∩ · · · ∩ An−2) · · · P(A2|A1) · P(A1)   (2.4.1)

Example 2.4.7. A box contains 5 white chips, 4 black chips, and 3 red chips. Four chips are drawn sequentially and without replacement. What is the probability of obtaining the sequence (W, R, W, B)? Define the following four events:

A: white chip is drawn on 1st selection;
B: red chip is drawn on 2nd selection;
C: white chip is drawn on 3rd selection;
D: black chip is drawn on 4th selection.

Our objective is to find P(A ∩ B ∩ C ∩ D). By Equation 2.4.1, P(A ∩ B ∩ C ∩ D) = P(D|A ∩ B ∩ C)P(C|A ∩ B)P(B|A)P(A). Counting the chips remaining before each draw gives P(D|A ∩ B ∩ C) = 4/9, P(C|A ∩ B) = 4/10, P(B|A) = 3/11, P(A) = 5/12. Therefore

P(A ∩ B ∩ C ∩ D) = (5/12) · (3/11) · (4/10) · (4/9) = 240/11880 ≈ 0.02
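A quick sketch (not from the notes) that checks this chain-rule value, both exactly and by simulating draws without replacement:

    import random
    from fractions import Fraction

    # Exact chain-rule product, as computed above.
    exact = Fraction(5, 12) * Fraction(3, 11) * Fraction(4, 10) * Fraction(4, 9)

    rng = random.Random(0)
    box = list("W" * 5 + "B" * 4 + "R" * 3)
    n = 200_000
    # rng.sample draws without replacement, in selection order.
    hits = sum(rng.sample(box, 4) == list("WRWB") for _ in range(n))
    print(float(exact), hits / n)  # both ≈ 0.0202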

Homework: 2.4.21, 2.4.24

Calculating "Unconditional" Probability (also called Total Probability)

Partition S into mutually exclusive pieces A1, A2, · · · , An whose union is S. Let B denote an event defined on S. (See the Venn diagram in Figure 2.4.7 in the text.) The next theorem gives a formula for the "unconditional" probability of B.

Theorem 2.4.1 (Total Probability Theorem). Let {Ai}_{i=1}^{n} be a set of events defined over S such that S = ∪_{i=1}^{n} Ai, Ai ∩ Aj = ∅ for i ≠ j, and P(Ai) > 0 for i = 1, 2, · · · , n. For any event B,

P(B) = Σ_{i=1}^{n} P(B|Ai)P(Ai)

Proof. By the conditions imposed on the Ai's,

B = (B ∩ A1) ∪ (B ∩ A2) ∪ · · · ∪ (B ∩ An)

and

P(B) = P(B ∩ A1) + P(B ∩ A2) + · · · + P(B ∩ An).


But each P(B ∩ Ai) can be written as the product P(B|Ai)P(Ai), and the result follows.

Example 2.4.8. Box I contains two red chips and four white chips; box II, three red and one white. A chip is drawn at random from box I and transferred to box II. Then a chip is drawn from box II. What is the probability that the chip drawn from box II is red?

Let B be the event "chip drawn from box II is red"; let A1 and A2 be the events "chip transferred from box I is red" and "chip transferred from box I is white", respectively. Then P(B|A1) = 4/5, P(B|A2) = 3/5, P(A1) = 2/6, P(A2) = 4/6, and

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) = (4/5)(2/6) + (3/5)(4/6) = 2/3

Example 2.4.10. Ashley is hoping to land a summer internship with a public relations firm. If her interview goes well, she has a 70% chance of getting an offer. If the interview is a bust, though, her chances of getting the position drop to 20%. Unfortunately, Ashley tends to babble incoherently when she is under stress, so the likelihood of the interview going well is only 0.10. What is the probability that Ashley gets the internship?

Let B be the event "Ashley is offered the internship," let A1 be the event "interview goes well," and let A2 be the event "interview does not go well." By assumption,

P(B|A1) = 0.70, P(B|A2) = 0.20
P(A1) = 0.10, P(A2) = 1 − P(A1) = 0.90

From the Total Probability Theorem,

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) = (0.70)(0.10) + (0.20)(0.90) = 0.25

Practice 2.4.25. A toy manufacturer buys ball bearings from three different suppliers: 50% of her total order comes from supplier 1, 30% from supplier 2, and the rest from supplier 3. Past experience has shown that the quality control standards of the three suppliers are not all the same. Two percent of the ball bearings produced by supplier 1 are defective, while suppliers 2 and 3 produce defectives 3% and 4% of the time, respectively. What proportion of the ball bearings in the toy manufacturer's inventory are defective?

Let B be the event that a ball bearing is defective, and let Ai be the event that a ball bearing comes from supplier i, i = 1, 2, 3. From the given information, P(B|A1) = 0.02, P(B|A2) = 0.03, P(B|A3) = 0.04, and P(A1) = 0.5, P(A2) = 0.3, P(A3) = 0.2. Then

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + P(B|A3)P(A3)
= (0.02)(0.5) + (0.03)(0.3) + (0.04)(0.2) = 0.027
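The total-probability computation is just a weighted sum of the conditional probabilities; a minimal sketch (not from the notes):

    # P(B) = sum over i of P(B|Ai) * P(Ai)   (Theorem 2.4.1)
    p_B_given_A = [0.02, 0.03, 0.04]  # defect rates for suppliers 1, 2, 3
    p_A = [0.5, 0.3, 0.2]             # share of the order from each supplier

    p_B = sum(b * a for b, a in zip(p_B_given_A, p_A))
    print(p_B)  # 0.027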

Homework: 2.4.26, 2.4.28, 2.4.30

Bayes' Theorem

If we know P(B|Ai) for all i, the theorem below enables us to compute conditional probabilities in the other direction; that is, we can use the P(B|Ai)'s to find P(Ai|B). It is a kind of "inverse" probability.

Theorem 2.4.2 (Bayes). Let {Ai}_{i=1}^{n} be a set of n events, each with positive probability, that partition S in such a way that ∪_{i=1}^{n} Ai = S and Ai ∩ Aj = ∅ for i ≠ j. For any event B (also defined on S) with P(B) > 0,

P(Aj|B) = P(B|Aj)P(Aj) / Σ_{i=1}^{n} P(B|Ai)P(Ai)

for any 1 ≤ j ≤ n.

Proof. From Definition 2.4.1,

P(Aj|B) = P(Aj ∩ B)/P(B) = P(B|Aj)P(Aj)/P(B).

But Theorem 2.4.1 allows the denominator to be written as Σ_{i=1}^{n} P(B|Ai)P(Ai), and the result follows.

PROBLEM-SOLVING HINT (Working with partitioned sample spaces)

Learn to identify which part of the "given" corresponds to B and which parts correspond to the Ai's. The following hints may help.

1) Pay attention to the last one or two sentences. Is the question asking for an unconditional probability or a conditional probability?

2) If an unconditional probability is asked for, denote by B the event whose probability we are trying to find. If a conditional probability is asked for, denote by B the event that has already happened.

3) Once B has been identified, reread the beginning of the question and assign the Ai's.

Example 2.4.13. A biased coin, twice as likely to come up heads as tails, is tossed once. If it shows heads, a chip is drawn from box I, which contains three white and four red chips; if it shows tails, a chip is drawn from box II, which contains six white and three red chips. Given that a white chip was drawn, what is the probability that the coin came up tails? (Figure 2.4.10 shows the situation.)

Since P(H) = 2P(T), P(H) = 2/3 and P(T) = 1/3. Define the events

B: white chip is drawn,
A1: coin comes up heads (i.e., chip came from box I),
A2: coin comes up tails (i.e., chip came from box II).

We are trying to find P(A2|B), where P(B|A1) = 3/7, P(B|A2) = 6/9, P(A1) = 2/3, P(A2) = 1/3, so

P(A2|B) = P(B|A2)P(A2) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
= (6/9)(1/3) / [(3/7)(2/3) + (6/9)(1/3)] = 7/16

Example 2.4.16. According to the manufacturer's specifications, your home burglar alarm has a 95% chance of going off if someone breaks into your house. During the two years you have lived there, the alarm has gone off on five different nights, each time for no apparent reason. Suppose the alarm goes off tomorrow night. What is the probability that someone is trying to break into your house? (Note: Police statistics show that the chances of any particular house in your neighborhood being burglarized on any given night are two in ten thousand.)

Let B be the event "alarm goes off tomorrow night," and let A1 and A2 be the events "house is being burglarized" and "house is not being burglarized," respectively. Then

P(B|A1) = 0.95
P(B|A2) = 5/730 (i.e., five nights in two years)
P(A1) = 2/10000
P(A2) = 1 − P(A1) = 9998/10000

The probability in question is P(A1|B). Intuitively, it might seem that P(A1|B) should be close to 1 because the alarm's performance probabilities look good: P(B|A1) is close to 1 (as it should be) and P(B|A2) is close to 0 (as it should be). Nevertheless, P(A1|B) turns out to be surprisingly small:

P(A1|B) = P(B|A1)P(A1) / [P(B|A1)P(A1) + P(B|A2)P(A2)]
= (0.95)(2/10000) / [(0.95)(2/10000) + (5/730)(9998/10000)]
= 0.027


That is, if you hear the alarm going off, the probability is only 0.027 that your house is being burglarized. Computationally, the reason P(A1|B) is so small is that P(A2) is so large. The latter makes the denominator of P(A1|B) large and, in effect, washes out the numerator. Even if P(B|A1) were substantially increased (by installing a more expensive alarm), P(A1|B) would remain largely unchanged (see Table 2.4.3).

Table 2.4.3

P(B|A1)   0.95    0.97    0.99    0.999
P(A1|B)   0.027   0.028   0.028   0.028
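A short sketch (not from the notes; the function name is mine) that reproduces Table 2.4.3 by evaluating Bayes' theorem for each hypothetical detection rate:

    def posterior(p_b_given_a1, p_a1=2/10000, p_b_given_a2=5/730):
        """P(A1|B) via Bayes' theorem with the two-event partition {A1, A2}."""
        p_a2 = 1 - p_a1
        num = p_b_given_a1 * p_a1
        return num / (num + p_b_given_a2 * p_a2)

    for p in (0.95, 0.97, 0.99, 0.999):
        print(p, round(posterior(p), 3))  # 0.027, 0.028, 0.028, 0.028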

Practice 2.4.40. Box I contains two white chips and one red chip; box II has one white chip and two red chips. One chip is drawn at random from box I and transferred to box II. Then one chip is drawn from box II. Suppose that a red chip is selected from box II. What is the probability that the chip transferred was white?

Let AR be the event "transferred chip is red" and AW the event "transferred chip is white." Let B denote the event that the chip drawn from box II is red. Then

P(AW|B) = P(B|AW)P(AW) / [P(B|AW)P(AW) + P(B|AR)P(AR)]
= (2/4)(2/3) / [(2/4)(2/3) + (3/4)(1/3)] = 4/7

Homework: 2.4.44, 2.4.49, 2.4.52

2.5 Independence

In Section 2.4 we introduced conditional probability. It often is the case, though, that the probability of the given event remains unchanged, regardless of the outcome of the second event; that is, P(A|B) = P(A).

Definition 2.5.1. Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B).


Comment: When two events are independent, P(A ∩ B) = P(A)P(B), and so

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)

Therefore, when events A and B are independent, P(A|B) = P(A).

Example 2.5.2. Suppose that A and B are independent. Does it follow that Aᶜ and Bᶜ are also independent?

Answer: Yes! We need to show that P(Aᶜ ∩ Bᶜ) = P(Aᶜ)P(Bᶜ). We start with

P(Aᶜ ∪ Bᶜ) = P(Aᶜ) + P(Bᶜ) − P(Aᶜ ∩ Bᶜ)

But

P(Aᶜ ∪ Bᶜ) = P((A ∩ B)ᶜ) = 1 − P(A ∩ B).

Since A and B are independent, P(A ∩ B) = P(A)P(B). So

P(Aᶜ ∩ Bᶜ)
= P(Aᶜ) + P(Bᶜ) − P(Aᶜ ∪ Bᶜ)
= P(Aᶜ) + P(Bᶜ) − P((A ∩ B)ᶜ)
= 1 − P(A) + 1 − P(B) − [1 − P(A)P(B)]
= [1 − P(A)][1 − P(B)] = P(Aᶜ)P(Bᶜ)

Example 2.5.4. Suppose that two events A and B, each having nonzero probability, are mutually exclusive. Are they also independent?

No. If A and B are mutually exclusive, then P(A ∩ B) = 0, but P(A) · P(B) > 0 by assumption, so P(A ∩ B) ≠ P(A)P(B).

Deducing Independence.

Sometimes the physical circumstances surrounding two events make it obvious that the occurrence (or nonoccurrence) of one has absolutely no influence on the occurrence (or nonoccurrence) of the other. If that is the case, then the two events will necessarily be independent in the sense of Definition 2.5.1. An example is tossing a coin twice. Clearly, what happens on the first toss has no influence on what happens on the second toss, so P(HH) = P(H ∩ H) = (1/2) · (1/2) = 1/4.

Example 2.5.5. Myra and Carlos are summer interns working as proofreaders for a local newspaper. Based on aptitude tests, Myra has a 50% chance of spotting a hyphenation error, while Carlos picks up on that same kind of mistake 80% of the time. Suppose the copy they are proofing contains a hyphenation error. What is the probability it goes undetected?

Let A and B be the events that Myra and Carlos, respectively, catch the mistake. By assumption, P(A) = 0.50 and P(B) = 0.80. What we are looking for is the probability of the complement of a union. That is,

P(error goes undetected)
= 1 − P(error is detected)
= 1 − P(Myra or Carlos or both see the mistake)
= 1 − P(A ∪ B)
= 1 − [P(A) + P(B) − P(A ∩ B)]   (from Theorem 2.3.6)

Since proofreaders invariably work by themselves, events A and B are necessarily independent, so P(A ∩ B) reduces to the product P(A) · P(B). It follows that such an error would go unnoticed 10% of the time:

P(error goes undetected) = 1 − [0.50 + 0.80 − (0.50)(0.80)] = 1 − 0.90 = 0.10

Example 2.5.7. Emma and Josh have just gotten engaged. What is the probability that they have different blood types? Assume that blood types for both men and women are distributed in the general population according to the following proportions:

Blood Type   Proportion
A            40%
B            10%
AB           5%
O            45%

First, note that the event "Emma and Josh have different blood types" includes more possibilities than the event "Emma and Josh have the same blood type." That being the case, the complement will be easier to work with than the event originally posed. We can start, then, by writing

P(Emma and Josh have different blood types) = 1 − P(Emma and Josh have the same blood type)

Now, if we let E_X and J_X represent the events that Emma and Josh, respectively, have blood type X, then the event "Emma and Josh have the same blood type" is a union of intersections, and we can write

P(Emma and Josh have the same blood type) = P((E_A ∩ J_A) ∪ (E_B ∩ J_B) ∪ (E_AB ∩ J_AB) ∪ (E_O ∩ J_O))

Since the four intersections here are mutually exclusive, the probability of their union becomes the sum of their probabilities. Moreover, blood type is not a factor in the selection of a spouse, so E_X and J_X are independent events and P(E_X ∩ J_X) = P(E_X)P(J_X). It follows, then, that Emma and Josh have a 62.5% chance of having different blood types:

P(Emma and Josh have different blood types)
= 1 − [P(E_A)P(J_A) + P(E_B)P(J_B) + P(E_AB)P(J_AB) + P(E_O)P(J_O)]
= 1 − [(0.40)(0.40) + (0.10)(0.10) + (0.05)(0.05) + (0.45)(0.45)]
= 0.625
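A compact sketch (not from the notes) of the same computation:

    # P(different types) = 1 - sum over types X of P(E_X) * P(J_X),
    # with P(J_X) = P(E_X) by the stated assumption.
    proportions = {"A": 0.40, "B": 0.10, "AB": 0.05, "O": 0.45}
    p_same = sum(p * p for p in proportions.values())
    print(1 - p_same)  # 0.625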


Practice 2.5.1. Suppose P(A ∩ B) = 0.2, P(A) = 0.6, and P(B) = 0.5.

a. Are A and B mutually exclusive?
b. Are A and B independent?
c. Find P(Aᶜ ∪ Bᶜ).

a) No, because P(A ∩ B) > 0.
b) No, because P(A ∩ B) = 0.2 while P(A)P(B) = (0.6)(0.5) = 0.3.
c) P(Aᶜ ∪ Bᶜ) = P((A ∩ B)ᶜ) = 1 − P(A ∩ B) = 1 − 0.2 = 0.8.

Practice 2.5.2. Spike is not a terribly bright student. His chances of passing chemistry are 0.35; mathematics, 0.40; and both, 0.12. Are the events "Spike passes chemistry" and "Spike passes mathematics" independent? What is the probability that he fails both?

Answer: Not independent, since (0.35)(0.40) = 0.14 ≠ 0.12. P(fails both) = 1 − P(passes at least one) = 1 − (0.35 + 0.40 − 0.12) = 0.37.

Homework: 2.5.4, 2.5.7, 2.5.9


Defining the Independence of More Than Two Events

It is not immediately obvious how to extend the definition of independence to, say, three events. To call A, B, and C independent, should we require

P(A ∩ B ∩ C) = P(A) · P(B) · P(C)   (2.5.1)

or should we impose the definition we already have on the three pairs of events (pairwise independence):

P(A ∩ B) = P(A) · P(B)   (2.5.2)
P(B ∩ C) = P(B) · P(C)
P(A ∩ C) = P(A) · P(C)

Neither condition by itself is sufficient. If the three events satisfy both 2.5.1 and 2.5.2, we call them independent (or mutually independent). In general, 2.5.1 does not imply 2.5.2, nor does 2.5.2 imply 2.5.1.

More generally, the independence of n events requires that the probabilities of all possible intersections equal the products of all the corresponding individual probabilities.


Definition 2.5.2. Events A1, A2, · · · , An are said to be independent if for every set of indices i1, i2, · · · , ik between 1 and n, inclusive,

P(A_{i1} ∩ A_{i2} ∩ · · · ∩ A_{ik}) = P(A_{i1})P(A_{i2}) · · · P(A_{ik})

Example 2.5.8. An insurance company plans to assess its future liabilities by sampling the records of its current policyholders. A pilot study has turned up three clients, one living in Alaska, one in Missouri, and one in Vermont, whose estimated chances of surviving to the year 2015 are 0.7, 0.9, and 0.3, respectively. What is the probability that by the end of 2014 the company will have had to pay death benefits to exactly one of the three?

Let A1 be the event "Alaska client survives through 2014." Define A2 and A3 analogously for the Missouri client and Vermont client, respectively. Then the event E: "exactly one dies" can be written as the union of three intersections:

E = (A1 ∩ A2 ∩ A3ᶜ) ∪ (A1 ∩ A2ᶜ ∩ A3) ∪ (A1ᶜ ∩ A2 ∩ A3)

Since each of the intersections is mutually exclusive of the other two,

P(E) = P(A1 ∩ A2 ∩ A3ᶜ) + P(A1 ∩ A2ᶜ ∩ A3) + P(A1ᶜ ∩ A2 ∩ A3)

Furthermore, there is no reason to believe that, for all practical purposes, the fates of the three are not independent. That being the case, each of the intersection probabilities reduces to a product, and we can write

P(E) = P(A1) · P(A2) · P(A3ᶜ) + P(A1) · P(A2ᶜ) · P(A3) + P(A1ᶜ) · P(A2) · P(A3)
= (0.7)(0.9)(0.7) + (0.7)(0.1)(0.3) + (0.3)(0.9)(0.3)
= 0.543


Comment. Declaring events independent for reasons other than those prescribed in Definition 2.5.2 is a necessarily subjective endeavor. Here we might feel fairly certain that a random person dying in Alaska will not affect the survival chances of a random person residing in Missouri (or Vermont). But there may be special circumstances that invalidate that sort of argument. For example, what if the three individuals in question were mercenaries fighting in an African border war and were all crew members assigned to the same helicopter? In practice, all we can do is look at each situation on an individual basis and try to make a reasonable judgment as to whether the occurrence of one event is likely to influence the outcome of another event.

Practice 2.5.11. Suppose that two fair dice (one red and one green) are thrown, with events A, B, and C defined as follows:

A: a 1 or a 2 shows on the red die
B: a 3, 4, or 5 shows on the green die
C: the dice total is 4, 11, or 12

Show that these events satisfy Equation 2.5.1 but not Equation 2.5.2. By listing the sample outcomes, it can be shown that

P(A) = 1/3, P(B) = 1/2, P(C) = 1/6
P(A ∩ B) = 1/6, P(A ∩ C) = 1/18, P(B ∩ C) = 1/18

and P(A ∩ B ∩ C) = 1/36. Note that Equation 2.5.1 is satisfied:

P(A ∩ B ∩ C) = 1/36 = P(A) · P(B) · P(C) = (1/3)(1/2)(1/6).

But Equation 2.5.2 is not satisfied, since P(B ∩ C) = 1/18 ≠ P(B) · P(C) = 1/12.
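A sketch (not from the notes; the helper name is mine) that enumerates all 36 outcomes and checks both conditions exactly:

    from itertools import product
    from fractions import Fraction

    S = list(product(range(1, 7), repeat=2))  # (red, green) outcomes

    def p(event):
        """Exact probability of an event given as a predicate on (r, g)."""
        return Fraction(sum(event(r, g) for r, g in S), 36)

    A = lambda r, g: r in (1, 2)
    B = lambda r, g: g in (3, 4, 5)
    C = lambda r, g: r + g in (4, 11, 12)
    ABC = lambda r, g: A(r, g) and B(r, g) and C(r, g)
    BC = lambda r, g: B(r, g) and C(r, g)

    print(p(ABC) == p(A) * p(B) * p(C))  # True:  Equation 2.5.1 holds
    print(p(BC) == p(B) * p(C))          # False: Equation 2.5.2 fails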

Practice.

1) Suppose that a fair coin is flipped three times. Let A1 be the event of a head on the first flip; A2, a tail on the second flip; and A3, a head on the third flip. Are A1, A2, and A3 independent?

2) Suppose that two events A and B, each having nonzero probability, are mutually exclusive. Are they also independent?

3) Suppose that P(A ∩ B) = 0.2, P(A) = 0.6, and P(B) = 0.5. a) Are A and B mutually exclusive? b) Are A and B independent? c) Find P(Aᶜ ∪ Bᶜ).

Repeated Independent Events

It is not uncommon for an experiment to be the composite of a finite or countably infinite number of subexperiments, each of the latter being performed under essentially the same conditions (e.g., tossing a coin three times). In general, the subexperiments comprising an experiment are referred to as trials. We will restrict our attention here to problems where the trials are independent; that is, for all j, the probability of any given outcome occurring on the jth trial is unaffected by what happened on the preceding j−1 trials. These are also referred to as repeated independent trials.

Example 2.5.10. Suppose a string of decorative lights you just bought has twenty-four bulbs wired in series, so that if one or more bulbs fail, the string will not work. If each bulb has a 99.9% chance of working the first time current is applied, what is the probability that the string itself will not work?

Let Ai be the event that the ith bulb fails, i = 1, 2, · · · , 24. Then

P(string fails)
= P(at least one bulb fails)
= P(A1 ∪ A2 ∪ · · · ∪ A24)
= 1 − P(string works)
= 1 − P(all twenty-four bulbs work)
= 1 − P(A1ᶜ ∩ A2ᶜ ∩ · · · ∩ A24ᶜ)

Since the bulbs are presumably manufactured the same way, P(Aiᶜ) is the same for all i, and the trials are independent, so

P(string fails) = 1 − [P(Aiᶜ)]²⁴ = 1 − (0.999)²⁴ ≈ 1 − 0.976 = 0.024

Therefore the chances are roughly one in forty that the string will not work the first time current is applied.


Practice 2.5.25. If two fair dice are tossed, what is the smallest number of throws, n, for which the probability of getting at least one double 6 exceeds 0.5? (Note that this was one of the first problems that de Méré communicated to Pascal in 1654.)

P(at least one double six in n throws)
= 1 − P(no double sixes in n throws)
= 1 − (35/36)ⁿ.

By trial and error, the smallest n for which P(at least one double six in n throws) exceeds 0.50 is n = 25.

Homework: 2.5.23, 2.5.26, 2.5.27
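The trial-and-error search in Practice 2.5.25 is a two-line loop; a sketch (not from the notes):

    # Find the smallest n with 1 - (35/36)^n > 0.5 (de Méré's problem).
    n = 1
    while 1 - (35 / 36) ** n <= 0.5:
        n += 1
    print(n)  # 25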

2.6 Combinatorics

Recall that P(A) = (number of elements in the event A)/(number of elements in the sample space S).

How do we count the number of elements?

Counting Ordered Sequences (The Multiplication Rule)


The Multiplication Rule. If operation A can be performed in m different ways and operation B in n different ways, the sequence (operation A, operation B) can be performed in m·n different ways.

Proof. Use a tree diagram.

Corollary 2.6.1. If operation Ai can be performed in ni ways, i = 1, 2, · · · , k, respectively, then the ordered sequence (operation A1, operation A2, · · · , operation Ak) can be performed in n1·n2 · · · nk ways.

Example: In how many different ways can parents have three children?

Answer: For each child we will assume there are only two possible outcomes (thus neglecting effects of extra X or Y chromosomes, or any other chromosomal/birth anomalies). The number of ways is 2 · 2 · 2 = 8. These can be listed: BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG, where B = boy, G = girl.


Example (from Question 2.6.9) A restaurant offers a choice of four appetizers (A), fourteen entrees (E), six desserts (D), and five beverages (B). How many different meals are possible if a diner intends to order only three courses? (Consider the beverage to be a "course.") Summing over the four possible sets of courses, AED + ADB + EDB + AEB:

4 · 14 · 6 + 4 · 6 · 5 + 14 · 6 · 5 + 4 · 14 · 5 = 1156

Example 2.6.1 The combination lock on a briefcase has two dials, each marked off with sixteen notches (see Figure 2.6.2). To open the case, a person first turns the left dial in a certain direction for two revolutions and then stops on a particular mark. The right dial is set in a similar fashion, after having been turned in a certain direction for two revolutions. How many different settings are possible?

In the terminology of the multiplication rule, opening the briefcase corresponds to the four-step sequence (A1, A2, A3, A4) detailed in Table 2.6.1 (a direction and a mark for each dial). Applying the previous corollary, we see that 1024 different settings are possible:

number of different settings = n1 · n2 · n3 · n4 = 2 · 16 · 2 · 16 = 1024

Example 2.6.3 In 1824 Louis Braille invented what would eventually become the standard alphabet for the blind. Based on an earlier form of night writing used by the French army for reading battlefield communiques in the dark, Braille's system replaced each written character with a six-dot matrix:

· ·
· ·
· ·

where certain dots were raised, the choice depending on the character being transcribed. The letter e, for example, has two raised dots and is written

• ·
· •
· ·

Punctuation marks, common words, suffixes, and so on, also have specified dot patterns. In all, how many different characters can be enciphered in Braille? (See Figure 2.6.3.)

Think of the dots as six distinct operations, numbered 1 to 6 (see Figure 2.6.3). In forming a Braille letter, we have two options for each dot: we can raise it or not raise it. The letter e, for example, corresponds to the six-step sequence (raise, do not raise, do not raise, do not raise, raise, do not raise). The number of such sequences, with k = 6 and n1 = n2 = ··· = n6 = 2, is 2^6, or 64. One of those sixty-four configurations, though, has no raised dots, making it of no use to a blind person. Figure 2.6.4 shows the entire sixty-three-character Braille alphabet.

Problem-Solving Hints (Doing Combinatorial Problems)


Combinatorial questions sometimes call for problem-solving techniques that are not routinely used in other areas of mathematics. The three listed below are especially helpful.

1. Draw a diagram that shows the structure of the outcomes that are being counted. Be sure to include (or indicate) all relevant variations. A case in point is Figure 2.6.3. Almost invariably, diagrams such as these will suggest the formula, or combination of formulas, that should be applied.

2. Use enumerations to test the appropriateness of a formula. Typically, the answer to a combinatorial problem (that is, the number of ways to do something) will be so large that listing all possible outcomes is not feasible. It often is feasible, though, to construct a simple, but analogous, problem for which the entire set of outcomes can be identified (and counted). If the proposed formula does not agree with the simple-case enumeration, we know that our analysis of the original question is incorrect.

3. If the outcomes to be counted fall into structurally different categories, the total number of outcomes will be the sum (not the product) of the number of outcomes in each category. Recall the restaurant example above (from Question 2.6.9).

Suggested Practice (NOT COLLECTED): 2.6.1, 2.6.3, 2.6.4, 2.6.14, 2.6.16

Counting Permutations (when the objects are all distinct)

Ordered sequences arise in two fundamentally different ways. The first is the scenario addressed by the multiplication rule - a process is comprised of k operations, each allowing ni options, i = 1, 2, ..., k; choosing one version of each operation leads to n1 · n2 ··· nk possibilities.

The second occurs when an ordered arrangement of some specified length k is formed from a finite collection of objects. Any such arrangement is referred to as a permutation of length k. For example, given the three objects A, B, and C, there are six different permutations of length two that can be formed if the objects cannot be repeated: AB, AC, BC, BA, CA, and CB.

Theorem 2.6.1 The number of permutations of length k that can be formed from a set of n distinct elements, repetitions not allowed, is denoted by the symbol nPk, where

nPk = n(n − 1)(n − 2) ··· (n − k + 1) = n!/(n − k)!

Proof Any of the n objects may occupy the first position in the arrangement, any of n − 1 the second, and so on - the number of choices available for filling the kth position will be n − k + 1 (see Figure 2.6.6, a tree diagram). The theorem follows, then, from the multiplication rule: There will be n(n − 1) ··· (n − k + 1) ordered arrangements.

Position in sequence:  1    2    ···  k − 1        k
Choices:               n    n−1  ···  n − (k − 2)  n − (k − 1)

Corollary 2.6.2 The number of ways to permute an entire set of n distinct objects is nPn = n(n − 1)(n − 2) ··· 1 = n!

Example 2.6.7 How many permutations of length k = 3 can be formed from the set of n = 4 distinct elements A, B, C, and D?

According to Theorem 2.6.1, the number should be 24:

n!/(n − k)! = 4!/(4 − 3)! = (4 · 3 · 2 · 1)/1 = 24

Confirming that figure, Table 2.6.2 lists the entire set of 24 permutations and illustrates the argument used in the proof of the theorem.
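Theorem 2.6.1 can also be confirmed by brute force; a minimal sketch using Python's standard library:

    from itertools import permutations
    from math import factorial, perm

    n, k = 4, 3
    # Enumerate all ordered arrangements of length k, repetitions not allowed.
    arrangements = list(permutations("ABCD", k))
    print(len(arrangements))                  # 24
    print(perm(n, k))                         # 24, i.e., nPk
    print(factorial(n) // factorial(n - k))   # 24, i.e., n!/(n-k)!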

Example 2.6.12 A new horror movie, Friday the 13th, Part X, will star Jason's great-grandson (also named Jason) as a psychotic trying to dispatch (as gruesomely as possible) eight camp counselors, four men and four women. (a) How many scenarios (i.e., victim orders) can the screenwriters devise, assuming they want Jason to do away with all the men before going after any of the women? (b) How many scripts are possible if the only restriction imposed on Jason is that he save Muffy for last?


a. Suppose the male counselors are denoted A, B, C, and D and the female counselors W, X, Y, and Z. Among the admissible plots would be the sequence pictured in Figure 2.6.11, where B is done in first, then D, and so on. The men, if they are to be restricted to the first four positions, can still be permuted in 4P4 = 4! ways. The same number of arrangements can be found for the women. Furthermore, the plot in its entirety can be thought of as a two-step sequence: first the men are eliminated, then the women. Since 4! ways are available to do the former and 4! the latter, the total number of different scripts, by the multiplication rule, is 4! · 4!, or 576.

Figure 2.6.11 (order of killing, positions 1 through 8): B D A C | Y Z W X


b. If the only condition to be met is that Muffy be dealt with last, the number of admissible scripts is simply 7P7 = 7!, that being the number of ways to permute the other seven counselors (see Figure 2.6.12, e.g., B W Z C Y A D Muffy in positions 1 through 8).

Example 2.6.13 Consider the set of nine-digit numbers that can be formed by rearranging without repetition the integers 1 through 9. For how many of those permutations will the 1 and the 2 precede the 3 and the 4? That is, we want to count sequences like 7 2 5 1 3 6 9 4 8 but not like 6 8 1 5 4 2 7 3 9.

At first glance, this seems to be a problem well beyond the scope of Theorem 2.6.1. With the help of a symmetry argument, though, its solution is surprisingly simple.

Think of just the digits 1 through 4. By Corollary 2.6.2, those four numbers give rise to 4! (= 24) permutations. Of those twenty-four, only four - (1, 2, 3, 4), (2, 1, 3, 4), (1, 2, 4, 3), and (2, 1, 4, 3) - have the property that the 1 and the 2 come before the 3 and the 4. It follows that 4/24 of the total number of nine-digit permutations should satisfy the condition being imposed on 1, 2, 3, and 4. Therefore,

number of permutations where 1 and 2 precede 3 and 4 = (4/24) · 9! = 60,480
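The symmetry argument can also be checked by exhaustively scanning all 9! = 362,880 permutations (a minimal sketch):

    from itertools import permutations

    count = 0
    for p in permutations(range(1, 10)):
        # Both 1 and 2 must occupy earlier positions than both 3 and 4.
        if max(p.index(1), p.index(2)) < min(p.index(3), p.index(4)):
            count += 1
    print(count)   # 60480, matching (4/24) * 9!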

Comment Computing n! can be quite cumbersome, even for n's that are fairly small: We saw in Example 2.6.9, for instance, that 16! is already in the trillions. Fortunately, an easy-to-use approximation is available. According to Stirling's formula,

n! ≈ √(2π) n^(n+1/2) e^(−n)

In practice, we apply Stirling's formula by writing

log10(n!) ≈ log10(√(2π)) + (n + 1/2) log10(n) − n log10(e)
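A short computation shows how well the approximation tracks the exact value (a minimal sketch):

    from math import e, factorial, log10, pi, sqrt

    for n in (10, 16, 52):
        exact = log10(factorial(n))
        approx = log10(sqrt(2 * pi)) + (n + 0.5) * log10(n) - n * log10(e)
        print(n, round(exact, 3), round(approx, 3))
    # Even for n = 10 the two values agree to within about 0.004.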

Practice

2.6.17. The board of a large corporation has six members willing to be nominated for office. How many different "president/vice president/treasurer" slates could be submitted to the stockholders?

Answer: 6 · 5 · 4 = 120

2.6.18. How many ways can a set of four tires be put on a car if all the tires are interchangeable? How many ways are possible if two of the four are snow tires?

Answer: 4P4 = 24, and 2P2 · 2P2 = 4

More Practice: 2.6.22, 2.6.23, 2.6.27

Counting Permutations (when the objects are not all distinct)

A permutation is an ordered arrangement of the numbers, terms, etc., of a set into specified groups. The corollary to Theorem 2.6.1 gives a formula for the number of ways an entire set of n objects can be permuted if the objects are all distinct. Fewer than n! permutations are possible, though, if some of the objects are identical. For example, there are 3! = 6 ways to permute the three distinct objects A, B, and C:

ABC, ACB, BAC, BCA, CAB, CBA

If the three objects to permute are A, A, and B - that is, if two of the three are identical - the number of permutations decreases to three:

AAB, ABA, BAA

Illustration 2 Suppose you want to order a group of n objects where some of the objects are the same.

Think about the letters in the word EAR. How many different ways can we arrange the letters to form different three-letter words? Easy, right: we have three letters we can write first, two letters next, and then the last letter, so 3 × 2 × 1 = 6 different three-letter words: EAR, ERA, ARE, AER, REA, RAE.

Now think about the letters in the word EYE. How many different ways can we arrange the letters to form different three-letter words? Just like before, 3 × 2 × 1 = 6: EYE, EYE, YEE, YEE, EEY, EEY. But wait a second - three of these are the same as another three. Actually there are only three distinguishable arrangements of the word EYE.

The number of distinguishable permutations of n objects, where n1 are of one type, n2 are of another type, and so on, is n!/(n1! n2! n3! ···).

As we will see, there are many real-world applications where the n objects to be permuted belong to r different categories, each category containing one or more identical objects.

Theorem 2.6.2 The number of ways to arrange n objects, n1 being of one kind, n2 of a second kind, ..., and nr of an rth kind, is

n!/(n1! · n2! ··· nr!), where n1 + n2 + ··· + nr = n.


Comment Ratios like n!/(n1! n2! ··· nr!) are called multinomial coefficients because the general term in the expansion of (x1 + x2 + ··· + xr)^n is

[n!/(n1! n2! ··· nr!)] x1^n1 x2^n2 ··· xr^nr

Example 2.6.14 A pastry in a vending machine costs 85 cents. In how many ways can a customer put in two quarters, three dimes, and one nickel?

Figure 2.6.13 (order in which coins are deposited, positions 1 through 6): Quarter Dime Dime Quarter Nickel Dime

If all coins of a given value are considered identical, then a typical deposit sequence, say QDDQND (see Figure 2.6.13), can be thought of as a permutation of n = 6 objects belonging to r = 3 categories, where

n1 = number of nickels = 1
n2 = number of dimes = 3
n3 = number of quarters = 2

By Theorem 2.6.2, there are sixty such sequences:

n!/(n1! n2! n3!) = 6!/(1! 3! 2!) = 60

Of course, had we assumed the coins were distinct (having been minted at different places and different times), the number of distinct permutations would have been 6!, or 720.

Example 2.6.16 What is the coefficient of x^23 in the expansion of (1 + x^5 + x^9)^100?

First consider (a + b)^2 = (a + b)(a + b) = a^2 + 2ab + b^2. The coefficient 2 on ab arises because ab can be formed by two different multiplications: ab and ba. Similarly, the coefficient of x^23 in the expansion of (1 + x^5 + x^9)^100 will be the number of ways that one term from each of the one hundred factors (1 + x^5 + x^9) can be multiplied together to form x^23. The only way to do that is

x^23 = x^9 · x^9 · x^5 · 1 · 1 ··· 1

It follows that the coefficient of x^23 is the number of ways to permute two x^9's, one x^5, and ninety-seven 1's:

coefficient of x^23 = 100!/(2! 1! 97!) = 485100
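The multinomial count can be cross-checked by expanding the polynomial numerically, discarding everything above degree 23 (a minimal sketch in pure Python; poly_mult is our own helper, not from the text):

    from math import factorial

    MAX_DEG = 23

    def poly_mult(p, q):
        # Multiply two coefficient lists, keeping terms up to MAX_DEG.
        out = [0] * (MAX_DEG + 1)
        for i, a in enumerate(p):
            for j, b in enumerate(q):
                if a and b and i + j <= MAX_DEG:
                    out[i + j] += a * b
        return out

    base = [0] * (MAX_DEG + 1)
    base[0] = base[5] = base[9] = 1   # the factor 1 + x^5 + x^9
    result = [1] + [0] * MAX_DEG      # the constant polynomial 1
    for _ in range(100):
        result = poly_mult(result, base)

    print(result[23])                                        # 485100
    print(factorial(100) // (factorial(2) * factorial(97)))  # 485100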

Practice 2.6.34 Which state name can generate more permutations, TENNESSEE or FLORIDA?
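One way to settle the question, both by the formula of Theorem 2.6.2 and by brute-force enumeration (a minimal sketch; distinguishable is our own helper name):

    from collections import Counter
    from itertools import permutations
    from math import factorial

    def distinguishable(word):
        # n! divided by the factorial of each letter's multiplicity.
        count = factorial(len(word))
        for mult in Counter(word).values():
            count //= factorial(mult)
        return count

    print(distinguishable("TENNESSEE"))          # 3780 = 9!/(4! 2! 2! 1!)
    print(distinguishable("FLORIDA"))            # 5040 = 7!, all letters distinct
    print(len(set(permutations("TENNESSEE"))))   # 3780, confirmed by brute force

So FLORIDA, despite being the shorter word, generates more permutations.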

MORE PRACTICE: 2.6.36, 2.6.40, 2.6.41, 2.6.42

Counting Combinations

We call a collection of k unordered elements a combination of size k. For example, given a set of n = 4 distinct elements - A, B, C, and D - there are six ways to form combinations of size 2:

A and B    B and C
A and C    B and D
A and D    C and D

A general formula for counting combinations can be derived quite easily from what we already know about counting permutations.


Theorem 2.6.3. The number of ways to form combinations of size k from a set of n distinct objects, repetitions not allowed, is denoted by the symbol C(n, k) or nCk, where

C(n, k) = nCk = n!/(k!(n − k)!)

Proof. Let C(n, k) denote the number of combinations satisfying the conditions of the theorem. Since each of those combinations can be ordered in k! ways, the product k! · C(n, k) must equal the number of permutations of length k that can be formed from n distinct elements. But n distinct elements can be formed into permutations of length k in n(n − 1) ··· (n − k + 1) = n!/(n − k)! ways. Therefore,

k! · C(n, k) = n!/(n − k)!

Solving for C(n, k) gives the result.

Comment. It often helps to think of combinations in the context of drawing objects out of an urn. If an urn contains n chips labeled 1 through n, the number of ways we can reach in and draw out different samples of size k is C(n, k). With respect to this sampling interpretation for the formation of combinations, C(n, k) is usually read "n things taken k at a time" or "n choose k".

Comment. The symbol C(n, k) appears in the statement of a familiar theorem from algebra,

(x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^(n−k)

Since the expression being raised to a power involves two terms, x and y, the constants C(n, k), k = 0, 1, ..., n, are commonly referred to as binomial coefficients.

EXAMPLE 2.6.20

Eight politicians meet at a fund-raising dinner. How many greetings can be exchanged if each politician shakes hands with every other politician exactly once?

Imagine the politicians to be eight chips - 1 through 8 - in an urn. A handshake corresponds to an unordered sample of size 2 chosen from that urn. Since repetitions are not allowed (even the most obsequious and overzealous of campaigners would not shake hands with himself!), Theorem 2.6.3 applies, and the total number of handshakes is

C(8, 2) = 8!/(2! 6!), or 28.

Example 2.6.21 A chemist is trying to synthesize part of a straight-chain aliphatic hydrocarbon polymer that consists of twenty-one radicals: ten ethyls (E), six methyls (M), and five propyls (P). Assuming all arrangements of radicals are physically possible, how many different polymers can be formed if no two of the methyl radicals are to be adjacent?

Imagine arranging the Es and the Ps without the Ms. Figure 2.6.15 shows one such possibility. Consider the sixteen spaces between and outside the Es and Ps, as indicated by the arrows in Figure 2.6.15. In order for the Ms to be nonadjacent, they must occupy any six of these locations. But those six spaces can be chosen in C(16, 6) ways. And for each of the positionings of the Ms, the Es and Ps can be permuted in 15!/(10! 5!) ways (Theorem 2.6.2).


Figure 2.6.15: E E P P E E E P P P E E E E E, with arrows marking the sixteen candidate spaces between and outside the letters.

So, by the multiplication rule, the total number of polymers having nonadjacent methyl radicals is 24,048,024:

C(16, 6) · 15!/(10! 5!) = [16!/(10! 6!)] · [15!/(10! 5!)] = (8008)(3003) = 24,048,024

EXAMPLE 2.6.22 Consider the binomial expansion (a + b)^n = (a + b)(a + b) ··· (a + b).

When n = 4, (a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4.

Notice: The literal factors are all the combinations of a and b where the sum of the exponents is 4: a^4, a^3b, a^2b^2, ab^3, b^4. The degree of each term is 4. In the expansion of (a + b)^4, the binomial coefficients are 1 4 6 4 1; the coefficients read the same from left to right as from right to left.

The answer to the question "What are the binomial coefficients?" is called the binomial theorem.


It shows how to calculate the coefficients in the expansion of (a + b)^n.

The symbol for a binomial coefficient is C(n, k). The upper index n is the exponent of the expansion; the lower index k indicates which term. For example, when n = 5, each term in the expansion of (a + b)^5 will look like this: C(5, k) a^(5−k) b^k, where k successively takes on the values 0 through 5. Therefore the binomial theorem is

(a + b)^5 = Σ_{k=0}^{5} C(5, k) a^(5−k) b^k

(See http://www.themathpage.com/aprecalc/binomial-theorem.htm)

Binomial coefficients have many interesting properties. Perhaps the most famous is Pascal's triangle, a numerical array where each entry is equal to the sum of the two numbers appearing diagonally above it (see Figure 2.6.16). Note that each entry in Pascal's triangle can be expressed as a binomial coefficient, and the relationship just described reduces to a simple equation involving those coefficients:

C(n + 1, k) = C(n, k) + C(n, k − 1)    (Equation 2.6.1)

FIGURE 2.6.16

Practice 2.6.50. How many straight lines can be drawn between five points (A, B, C, D, and E), no three of which are collinear?

Since every (unordered) set of two letters describes a different line, the number of possible lines is C(5, 2) = 10.

Example 2.6.23 The answers to combinatorial questions can sometimes be obtained using quite different approaches. What invariably distinguishes one solution from another is the way in which outcomes are characterized.

For example, suppose you have just ordered a roast beef sub at a sandwich shop, and now you need to decide which, if any, of the available toppings (lettuce, tomato, onions, etc.) to add. If the shop has eight extras to choose from, how many different subs can you order?

One way to answer this question is to think of each sub as an ordered sequence of length eight, where each position in the sequence corresponds to one of the toppings. At each of those positions, you have two choices: add or do not add that particular topping. Pictured in Figure 2.6.17 is the sequence corresponding to the sub that has lettuce, tomato, and onion but no other toppings. Since two choices (add or do not add) are available for each of the eight toppings, the multiplication rule tells us that the number of different roast beef subs that could be requested is 2^8, or 256.

An ordered sequence of length eight, though, is not the only model capable of characterizing a roast beef sandwich. We can also distinguish one roast beef sub from another by the particular combination of toppings that each one has. For example, there are C(8, 4) = 70 different subs having exactly four toppings. It follows that the total number of different sandwiches is the total number of different combinations of size k, where k ranges from 0 to 8. Reassuringly, that sum agrees with the ordered-sequence answer:

total number of different roast beef subs
= C(8, 0) + C(8, 1) + C(8, 2) + ··· + C(8, 8)
= 1 + 8 + 28 + ··· + 1
= 256

What we have just illustrated here is another property of binomial coefficients - namely, that

Σ_{k=0}^{n} C(n, k) = 2^n    (2.6.2)

The proof of Equation 2.6.2 is a direct consequence of Newton's binomial expansion (see the second comment following Theorem 2.6.3).
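Both counting models for the sandwich example can be reproduced in a few lines (a minimal sketch):

    from itertools import product
    from math import comb

    n = 8   # available toppings

    # Model 1: ordered add/do-not-add sequences of length 8.
    print(sum(1 for _ in product((0, 1), repeat=n)))   # 256

    # Model 2: sum the number of topping combinations of each size.
    print(sum(comb(n, k) for k in range(n + 1)))       # 256 = 2^8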

More practice: 2.6.51, 2.6.53, 2.6.54, 2.6.55


2.7 Combinatorial Probability

In Section 2.6 our concern focused on counting the number of ways a given operation, or sequence of operations, could be performed. In Section 2.7 we want to couple those enumeration results with the notion of probability. Putting the two together makes a lot of sense - there are many combinatorial problems where an enumeration, by itself, is not particularly relevant.

In a combinatorial setting, making the transition from an enumeration to a probability is easy. If there are n ways to perform a certain operation and a total of m of those satisfy some stated condition - call it A - then P(A) is defined to be the ratio m/n. This assumes, of course, that all possible outcomes are equally likely. Historically, the "m over n" idea is what motivated the early work of Pascal, Fermat, and Huygens (recall Section 1.3). Today we recognize that not all probabilities are so easily characterized. Nevertheless, the m/n model - the so-called classical definition of probability - is entirely appropriate for describing a wide variety of phenomena.


Example 2.7.1 A box contains eight chips, numbered 1 through 8. A sample of three is drawn without replacement. What is the probability that the largest chip in the sample is a 5?

Let A be the event "largest chip in sample is a 5." Figure 2.7.1 shows what must happen in order for A to occur: (1) the 5 chip must be selected, and (2) two chips must be drawn from the subpopulation of chips numbered 1 through 4. By the multiplication rule, the number of samples satisfying event A is the product C(1, 1) · C(4, 2). The sample space S for the experiment of drawing three chips from the box contains C(8, 3) outcomes, all equally likely. In this situation, then, m = C(1, 1) · C(4, 2), n = C(8, 3), and

P(A) = C(1, 1) · C(4, 2) / C(8, 3) = 6/56 = 0.11
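The classical m/n answer can be cross-checked both by full enumeration and by simulation (a minimal sketch):

    import random
    from itertools import combinations
    from math import comb

    # Exact: scan all C(8,3) = 56 equally likely samples.
    samples = list(combinations(range(1, 9), 3))
    exact = sum(max(s) == 5 for s in samples) / len(samples)
    print(exact, comb(4, 2) / comb(8, 3))     # 0.10714... both ways

    # Monte Carlo estimate of the same probability.
    trials = 100_000
    hits = sum(max(random.sample(range(1, 9), 3)) == 5 for _ in range(trials))
    print(hits / trials)                      # approximately 0.107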

Example 2.7.2 A box contains n red chips numbered 1 through n, n white chips numbered 1 through n, and n blue chips numbered 1 through n. Two chips are drawn at random and without replacement. What is the probability that the two drawn are either the same color or the same number?

Let A be the event that the two chips drawn are the same color, and let B be the event that they have the same number. We are looking for P(A ∪ B). Since A and B here are mutually exclusive,

P(A ∪ B) = P(A) + P(B).

With 3n chips in the box, the total number of ways to draw an unordered sample of size 2 is C(3n, 2). Moreover,

P(A) = P(2 reds ∪ 2 whites ∪ 2 blues)
= P(2 reds) + P(2 whites) + P(2 blues)
= 3 C(n, 2) / C(3n, 2)

and

P(B) = P(two 1's ∪ two 2's ∪ ··· ∪ two n's) = n C(3, 2) / C(3n, 2).

Therefore,

P(A ∪ B) = [3 C(n, 2) + n C(3, 2)] / C(3n, 2)
= [3n(n − 1)/2 + 3n] / [3n(3n − 1)/2]
= (n + 1)/(3n − 1)
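The closed form (n + 1)/(3n − 1) is easy to verify by enumerating all pairs for a few small n (a minimal sketch):

    from itertools import combinations

    def p_same_color_or_number(n):
        chips = [(color, num) for color in "RWB" for num in range(1, n + 1)]
        pairs = list(combinations(chips, 2))
        hits = sum(a[0] == b[0] or a[1] == b[1] for a, b in pairs)
        return hits / len(pairs)

    for n in (2, 3, 10):
        print(n, round(p_same_color_or_number(n), 4),
              round((n + 1) / (3 * n - 1), 4))
    # The enumeration matches (n + 1)/(3n - 1) in every case.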

Example 2.7.3 Twelve fair dice are rolled. What is the probability that

a. the first six dice all show one face and the last six dice all show a second face?

b. not all the faces are the same?

c. each face appears exactly twice?

a. The sample space that corresponds to the experiment of rolling twelve dice is the set of ordered sequences of length twelve, where the outcome at every position in the sequence is one of the integers 1 through 6. If the dice are fair, all 6^12 such sequences are equally likely.

Let A be the set of rolls where the first six dice show one face and the second six show another face. Figure 2.7.3 shows one of the sequences in the event A (faces 2 2 2 2 2 2 4 4 4 4 4 4 in positions 1 through 12). Clearly, the face that appears for the first half of the sequence could be any of the six integers from 1 through 6. Five choices would then be available for the last half of the sequence (since the two faces cannot be the same). The number of sequences in the event A, then, is 6P2 = 6 · 5 = 30. Applying the m/n rule gives

P(A) = 30/6^12 = 1.4 × 10^−8

b. Let B be the event that not all the faces are the same. Then P(B) = 1 − P(B^C) = 1 − (6/6^12), since there are six sequences (1, 1, ..., 1), ..., (6, 6, ..., 6) where the twelve faces are all the same.

c. Let C be the event that each face appears exactly twice. From Theorem 2.6.2, the number of ways each face can appear exactly twice is 12!/(2! · 2! · 2! · 2! · 2! · 2!). Therefore,

P(C) = [12!/(2! · 2! · 2! · 2! · 2! · 2!)] / 6^12 = 0.0034
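All three answers follow from a few lines of arithmetic, and part (c) can be spot-checked by simulation (a minimal sketch):

    import random
    from math import factorial

    n_seq = 6 ** 12                 # equally likely ordered sequences

    p_a = 6 * 5 / n_seq             # 1.4e-08
    p_b = 1 - 6 / n_seq             # not all faces the same
    p_c = (factorial(12) // factorial(2) ** 6) / n_seq   # 0.0034
    print(p_a, p_b, p_c)

    trials = 200_000
    hits = 0
    for _ in range(trials):
        roll = [random.randint(1, 6) for _ in range(12)]
        if all(roll.count(face) == 2 for face in range(1, 7)):
            hits += 1
    print(hits / trials)            # approximately 0.0034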

Practice 2.7.1 Ten equally qualified marketing assistants are candidates for promotion to associate buyer; seven are men and three are women. If the company intends to promote four of the ten at random, what is the probability that exactly two of the four are women?

Let A be the event "exactly two of the four are women" and B the event "two of the four are men"; n(S) = C(10, 4). Then

P(A ∩ B) = n(A ∩ B)/n(S) = C(7, 2) · C(3, 2) / C(10, 4) = 63/210 = 0.3

Practice: 2.7.2, 2.7.3, 2.7.7, 2.7.14


Chapter 3

Random Variables

3.1 Introduction

Throughout Chapter 2, probabilities were assigned to events - that is, to sets of sample outcomes. The events we dealt with were composed of either a finite or a countably infinite number of sample outcomes, in which case the event's probability was simply the sum of the probabilities assigned to its outcomes. One particular probability function that came up over and over again in Chapter 2 was the assignment of 1/n as the probability associated with each of the n points in a finite sample space. This is the model that typically describes games of chance (and all of our combinatorial probability problems in Chapter 2).

e.g., roll a die: S = {1, 2, 3, 4, 5, 6}; N(S) = 6 = n



P(getting a 1) = P(getting a 2) = ··· = P(getting a 6) = 1/6 = 1/n

The first objective of this chapter is to look at several other useful ways for assigning probabilities to sample outcomes. In so doing, we confront the desirability of "redefining" sample spaces using functions known as random variables.

How and why these random variables are used - and what their mathematical properties are - becomes the focus of virtually everything covered in Chapter 3.

As a case in point, suppose a medical researcher is testing eight elderly adults for their allergic reaction (yes or no) to a new drug for controlling blood pressure. One of the 2^8 = 256 possible sample points would be the sequence (yes, no, no, yes, no, no, yes, no), signifying that the first subject had an allergic reaction, the second did not, the third did not, and so


on. Typically, in studies of this sort, the particular subjects experiencing reactions are of little interest: what does matter is the number who show a reaction. If that were true here, the outcome's relevant information (i.e., the number of allergic reactions) could be summarized by the number 3.

Suppose X denotes the number of allergic reactions among a set of eight adults. Then X is said to be a random variable, and the number 3 is the value of the random variable for the outcome (yes, no, no, yes, no, no, yes, no).

In general, random variables are functions that associate numbers with some attribute of a sample outcome that is deemed to be especially important.

If X denotes the random variable and s denotes a sample outcome, then X(s) = t, where t is a real number. For the allergy example, s = (yes, no, no, yes, no, no, yes, no) and t = 3.

Random variables can often create a dramatically simpler sample space. That certainly is the case here - the original sample space has 256 (= 2^8) outcomes, each being an ordered sequence of length eight. The random variable X, on the other hand, has only nine possible values, the integers from 0 to 8, inclusive.

In terms of their fundamental structure, all random variables fall into one of two broad categories, the distinction resting on the number of possible values the random variable can equal. If the latter is finite or countably infinite (which would be the case with the allergic-reaction example), the random variable is said to be discrete; if the outcomes can be any real number in a given interval, the number of possibilities is uncountably infinite, and the random variable is said to be continuous. The difference between the two is critically important, as we will learn in the next several sections.

The purpose of Chapter 3 is to introduce the important definitions, concepts, and computational techniques associated with random variables, both discrete and continuous. Taken together, these ideas form the bedrock of modern probability and statistics.

3.2 Binomial and Hypergeometric Probabilities

This section looks at two specific probability scenarios that are especially important, both for their theoretical implications and for their ability to describe real-world problems. What we learn in developing these two models will help us understand random variables in general, the formal discussion of which begins in Section 3.3.

The Binomial Probability Distribution

Binomial probabilities apply to situations involving a series of independent and identical trials (Bernoulli trials), where each trial can have only one of two possible outcomes. Imagine three distinguishable coins being tossed, each having a probability p of coming up heads. The set of possible outcomes are the eight listed in Table 3.2.1. If the probability of any of the coins coming up heads is p, then the probability of the sequence (H, H, H) is p^3, since the coin tosses


qualify as independent trials. Similarly, the probability of (T, H, H) is (1 − p)p^2. The fourth column of Table 3.2.1 shows the probabilities associated with each of the three-coin sequences.

Table 3.2.1

1st  2nd  3rd  Probability    Number of Heads
H    H    H    p^3            3
H    H    T    p^2(1 − p)     2
H    T    H    p^2(1 − p)     2
T    H    H    p^2(1 − p)     2
H    T    T    p(1 − p)^2     1
T    H    T    p(1 − p)^2     1
T    T    H    p(1 − p)^2     1
T    T    T    (1 − p)^3      0

Suppose our main interest in the coin tosses is the number of heads that occur. Whether the actual sequence is, say, (H, H, T) or (H, T, H) is immaterial, since each outcome contains exactly two heads. The last column of Table 3.2.1 shows the number of heads in each of the eight possible outcomes. Notice that there are three outcomes with exactly two heads, each having an individual probability of p^2(1 − p). The probability, then, of the event "two heads" is the sum of those three individual probabilities - that is, 3p^2(1 − p). Table 3.2.2 lists the probabilities of tossing k heads, where k = 0, 1, 2, or 3.

Table 3.2.2

Number of Heads    Probability
0                  (1 − p)^3
1                  3p(1 − p)^2
2                  3p^2(1 − p)
3                  p^3

Now, more generally, suppose that n coins are tossed, in which case the number of heads can equal any integer from 0 through n. By analogy,


P(k heads) = (number of ways to arrange k heads and n − k tails) · (probability of any particular sequence having k heads and n − k tails)
= (number of ways to arrange k heads and n − k tails) · p^k (1 − p)^(n−k)

The number of ways to arrange k H's and n − k T's, though, is n!/(k!(n − k)!), or C(n, k) (recall Theorem 2.6.2).

Theorem 3.2.1 Consider a series of n independent trials, each resulting in one of two possible outcomes, "success" or "failure." Let p = P(success occurs at any given trial) and assume that p remains constant from trial to trial. Then

P(k successes) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, ..., n

(Equivalently, in the variable x: P(x successes) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, ..., n.)

Comment The probability assignment given by the equation in Theorem 3.2.1 is known as the binomial distribution.

Example 3.2.1 An information technology center uses nine aging disk drives for storage. The probability that any one of them is out of service is 0.06. For the center to function properly, at least seven of the drives must be available. What is the probability that the computing center can get its work done?

The probability that a drive is available is p = 1 − 0.06 = 0.94. Assuming the devices operate independently, the number of disk drives available has a binomial distribution with n = 9 and p = 0.94. The probability that at least seven disk drives work is a reassuring 0.986:

C(9, 7)(0.94)^7(0.06)^2 + C(9, 8)(0.94)^8(0.06)^1 + C(9, 9)(0.94)^9(0.06)^0 = 0.986
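Theorem 3.2.1 translates directly into code; a minimal sketch reproducing the disk-drive calculation (binomial_pmf is our own helper name):

    from math import comb

    def binomial_pmf(k, n, p):
        # P(k successes) in n independent trials, success probability p.
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    n, p = 9, 0.94
    print(round(sum(binomial_pmf(k, n, p) for k in (7, 8, 9)), 3))   # 0.986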

Practice: Suppose that since the early 1950s some ten thousand independent UFO sightings have been reported to civil authorities. If the probability that any sighting is genuine is on the order of one in one hundred thousand, what is the probability that at least one of the ten thousand was genuine?

The probability of k genuine sightings is given by the binomial probability model with n = 10,000 and p = 1/100,000. The probability of at least one genuine sighting is the probability that k ≥ 1. The probability of the complementary event, k = 0, is (99,999/100,000)^10,000 = 0.905. Thus, the probability that k ≥ 1 is 1 − 0.905 = 0.095.

Practice: 3.2.2, 3.2.4, 3.2.5, 3.2.8

Practice 3.2.11


If a family has four children, is it more likely they will have two boys and two girls, or three of one sex and one of the other? Assume that the probability of a child being a boy is 1/2 and that the births are independent events.

Answer: P(two boys and two girls) = C(4, 2)(1/2)^4 = 0.375, while P(three of one sex and one of the other) = [C(4, 3) + C(4, 1)](1/2)^4 = 0.5, so the three-one split is more likely.

The Hypergeometric Distribution

The second "special" distribution that we want to look at formalizes the urn problems that frequented Chapter 2. Our solutions to those earlier problems tended to be enumerations in which we listed the entire set of possible samples, and then counted the ones that satisfied the event in question. The inefficiency and redundancy of that approach should now be painfully obvious. What we are seeking here is a general formula that can be applied to any and all such problems, much like the expression in Theorem 3.2.1 can handle the full range of questions arising from the binomial model.

Suppose an urn contains r red chips and w white chips, where r + w = N. Imagine drawing n chips from the urn one at a time without replacing any of the chips selected. At each drawing we record the color of the chip removed. The question is, what is the probability that exactly k red chips are included among the n that are removed?

Notice that the experiment just described is similar in some respects to the binomial model, but the method of sampling creates a critical distinction. If each chip drawn were replaced prior to making another selection, then each drawing would be an independent trial, the chances of drawing a red in any given trial would be a constant r/N, and the probability that exactly k red chips would ultimately be included in the n selections would be a direct application of Theorem 3.2.1:

P(k reds drawn) = C(n, k) (r/N)^k (1 − r/N)^(n−k), k = 0, 1, 2, ..., n

However, if the chips drawn are not replaced, then the probability of drawing a red on any given attempt is not necessarily r/N: Its value would depend on the colors of the chips selected earlier. Since p = P(red is drawn) = P(success) does not remain constant from drawing to drawing, the binomial model of Theorem 3.2.1 does not apply. Instead, probabilities that arise from the "no replacement" scenario just described are said to follow the hypergeometric distribution.

Theorem 3.2.2 Suppose an urn contains r red chips and w white chips, where r + w = N. If n chips are drawn out at random, without replacement, and if k denotes the number of red chips selected, then

P(k red chips are chosen) = C(r, k) · C(w, n − k) / C(N, n)    (3.2.1)

where k varies over all the integers for which C(r, k) and C(w, n − k) are defined. The probabilities appearing on the right-hand side of Equation 3.2.1 are known as the hypergeometric distribution.

Comment The appearance of binomial coefficients suggests a model of selecting unordered subsets. Indeed, one can consider the model of selecting a subset of size n simultaneously, where order doesn't matter. In that case, the question remains: What is the probability of getting k red chips and n − k white chips? A moment's reflection will show that the hypergeometric probabilities given in the statement of the theorem also answer that question. So, if our interest is simply counting the number of red and white chips in the sample, the probabilities are the same whether the drawing of the sample is simultaneous or the chips are drawn in order without repetition.

Example (from 3.2.20) A corporate board contains twelve members. The board decides to create a five-person Committee to Hide Corporation Debt. Suppose four members of the board are accountants. What is the probability that the committee will contain two accountants and three nonaccountants?

P(2 accountants ∩ 3 nonaccountants) = C(4, 2) · C(8, 3) / C(12, 5) = 14/33
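Equation 3.2.1 is just as direct to code; a minimal sketch reproducing the committee example (hypergeometric_pmf is our own helper name):

    from math import comb

    def hypergeometric_pmf(k, N, r, n):
        # P(k red chips) when n chips are drawn without replacement
        # from an urn holding r red and N - r white chips.
        return comb(r, k) * comb(N - r, n - k) / comb(N, n)

    # Twelve board members (N), four accountants (r), committee of five (n).
    print(hypergeometric_pmf(2, N=12, r=4, n=5), 14 / 33)   # 0.4242... both ways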


Practice: 3.2.22, 3.2.23, 3.2.25

3.3 Discrete Random Variables

The binomial and hypergeometric distributions described in Section 3.2 are special cases of some important general concepts that we want to explore more fully in this section. Previously, in Chapter 2, we studied in depth the situation where every point in a sample space is equally likely to occur (recall Section 2.6). How to assign probabilities to outcomes that are not binomial or hypergeometric is one of the major questions investigated in this chapter.

The purpose of this section is to (1) outline the general conditions under which probabilities can be assigned to sample spaces and (2) explore the ways and means of redefining sample spaces through the use of random variables. The notation introduced in this section is especially important and will be used throughout the remainder of the book.


Assigning Probabilities: The Discrete Case

We begin with the general problem of assigning probabilities to sample outcomes, the simplest version of which occurs when the number of points in S is either finite or countably infinite. The probability function, p(s), that we are looking for in those cases satisfies the conditions in Definition 3.3.1.

Definition 3.3.1 Suppose that S is a finite or countably infinite sample space. Let p be a real-valued function defined for each element of S such that

a) 0 ≤ p(s) for each s ∈ S
b) Σ_{all s ∈ S} p(s) = 1

Then p is said to be a discrete probability function.

Comment Once p(s) is defined for all s, it follows that the probability of any event A - that is, P(A) - is the sum of the probabilities of the outcomes comprising A:

P(A) = Σ_{all s ∈ A} p(s)


Defined in this way, the function P(A) satisfies the probability axioms given in Section 2.3. The next several examples illustrate some of the specific forms that p(s) can have and how P(A) is calculated.

Example 3.3.2 Suppose a fair coin is tossed until a head comes up for the first time. What are the chances of that happening on an odd-numbered toss?

Note that the sample space here is countably infinite, and so is the set of outcomes making up the event whose probability we are trying to find. The P(A) that we are looking for, then, will be the sum of an infinite number of terms.

Let p(s) be the probability that the first head appears on the sth toss. Since the coin is presumed to be fair, p(1) = 1/2. Furthermore, we would expect that half the time, when a tail appears first, the next toss would be a head, so p(2) = (1/2)(1/2) = 1/4. In general, p(s) = (1/2)^s, s = 1, 2, ...

Does p(s) satisfy the conditions stated in Definition 3.3.1? Yes. Clearly, p(s) ≥ 0 for all s. To see that the sum of the probabilities is 1, recall the formula for the sum of a geometric series: if 0 < r < 1,

Σ_{s=0}^{∞} r^s = 1/(1 − r)    (3.3.2)

Applying Equation 3.3.2 to the sample space here confirms that P(S) = 1:

P(S) = Σ_{s=1}^{∞} p(s) = Σ_{s=1}^{∞} (1/2)^s = Σ_{s=0}^{∞} (1/2)^s − (1/2)^0 = 1/(1 − 1/2) − 1 = 1

Now, let A be the event that the first head appears on an odd-numbered toss. Then P(A) = p(1) + p(3) + p(5) + ···


= Σ_{s=0}^{∞} p(2s + 1) = Σ_{s=0}^{∞} (1/2)^(2s+1)
= (1/2) Σ_{s=0}^{∞} (1/4)^s
= (1/2) [1/(1 − 1/4)]
= 2/3
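A short simulation makes the 2/3 answer easy to believe (a minimal sketch):

    import random

    trials = 200_000
    odd = 0
    for _ in range(trials):
        toss = 1
        while random.random() < 0.5:   # tails: keep tossing
            toss += 1
        if toss % 2 == 1:              # first head landed on an odd toss
            odd += 1
    print(odd / trials)                # approximately 0.667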

Example 3.3.4 Is

p(s) = [1/(1 + λ)] [λ/(1 + λ)]^s, s = 0, 1, 2, ...; λ > 0

a discrete probability function? Why or why not?

A simple inspection shows p(s) ≥ 0 for all s. Moreover, summing the geometric series with ratio λ/(1 + λ),

Σ_{all s ∈ S} p(s) = Σ_{s=0}^{∞} [1/(1 + λ)] [λ/(1 + λ)]^s
= [1/(1 + λ)] · 1/(1 − λ/(1 + λ))
= [1/(1 + λ)] · (1 + λ)/1 = 1

so it is a discrete probability function.

Defining "New" Sample Spaces

We have seen how the function p(s) associates a probability with each outcome, s, in a sample space. Related is the key idea that outcomes can often be grouped or reconfigured in ways that may facilitate problem solving. The function that replaces the outcome s with a numerical value is called a random variable.

Definition 3.3.2 A function whose domain is a sample space S and whose values form a finite or countably infinite set of real numbers is called a discrete random variable. We denote random variables by uppercase X or Y.

Example 3.3.5. Consider tossing two dice, an experiment for which the sample space is a set of ordered pairs, S = {(i, j) | i = 1, 2, ..., 6; j = 1, 2, ..., 6}. For a variety of games the sum showing is what matters on a given turn. That being the case, the original sample space S of thirty-six ordered pairs would not provide a particularly convenient backdrop for discussing the rules of those games. It would be better to work directly with sums. Of course, the eleven possible sums (from two to twelve) are simply the different values of the random variable X, where X(i, j) = i + j.

Comment: In the above example, suppose we define a random variable X1 that gives the result on the first die and a random variable X2 that gives the result on the second die. Then X = X1 + X2. Note how easily we could extend this idea to the toss of three dice or ten dice. The ability to conveniently express complex events in terms of simple ones is an advantage of the random variable concept that we will see playing out over and over again.

The Probability Density Function

Definition 3.3.3. Associated with every discrete random variable X is a probability density function (or pdf), denoted pX(k), where

pX(k) = P({s ∈ S | X(s) = k})

Note that pX(k) = 0 for any k not in the range of X. For notational simplicity, we will usually delete all references to s and S and write pX(k) = P(X = k).

Comment. We have already discussed at length two examples of the function pX(k). Recall the binomial distribution derived in Section 3.2. If we let the random variable X denote the number of successes in n independent trials, then Theorem 3.2.1 states that

pX(k) = P(X = k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, 2, ..., n

EXAMPLE 3.3.6 Consider rolling two dice as described in Example 3.3.5. Let i and j denote the faces showing on the first and the second die, respectively, and define the random variable X to be the sum of the two faces: X(i, j) = i + j. Find pX(k).

Table 3.3.3

k    pX(k)     k     pX(k)
2    1/36      8     5/36
3    2/36      9     4/36
4    3/36      10    3/36
5    4/36      11    2/36
6    5/36      12    1/36
7    6/36
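Table 3.3.3 can be generated by enumerating the thirty-six equally likely pairs (a minimal sketch):

    from collections import Counter
    from fractions import Fraction

    counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
    for k in sorted(counts):
        print(k, Fraction(counts[k], 36))
    # Fractions print in lowest terms, e.g., 3 -> 1/18 rather than 2/36.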

EXAMPLE 3.3.7 Acme Industries typically produces three electric power generators per day: some pass the company's quality control inspection on their first try and are ready to be shipped; others need to be retooled. The probability of a generator needing further work is 0.05. If a generator is ready to be shipped, the firm earns a profit of $10,000. If it needs to be retooled, it ultimately costs the firm $2,000. Let X be the random variable quantifying the company's daily profit. Find pX(k).

The underlying sample space here is a set of n = 3 independent trials, where p = P(generator passes inspection) = 0.95. If the random variable X is to measure the company's daily profit, then

X = $10,000 × (no. of generators passing inspection) − $2,000 × (no. of generators needing retooling)

What are the possible profits k? They depend on the number of defectives, so the pdf of the profit corresponds to the pdf of the number of defectives, which is binomial. For instance, X(s, f, s) = 2($10,000) − 1($2,000) = $18,000. Moreover, the random variable X equals $18,000 whenever the day's output consists of two successes and one failure. It follows that

P(X = $18,000) = pX(18000) = C(3, 2)(0.95)^2(0.05)^1 = 0.135375

Table 3.3.4

No. Defective    k = Profit    pX(k)
0                $30,000       0.857375
1                $18,000       0.135375
2                $6,000        0.007125
3                −$6,000       0.000125

Linear Transformations

Theorem 3.3.1. Suppose X is a discrete random variable. Let Y = aX + b, where a and b are constants. Then

pY(y) = pX((y − b)/a)


Proof.

pY(y) = P(Y = y) = P(aX + b = y) = P(X = (y − b)/a) = pX((y − b)/a)

Practice 3.3.11 Suppose X is a binomial random variable with n = 4 and p = 2/3. What is the pdf of 2X + 1?

Given pX(x) = C(n, x) p^x (1 − p)^(n−x), let Y = 2X + 1. Then

P(Y = y) = P(2X + 1 = y) = P(X = (y − 1)/2)
= pX((y − 1)/2)
= C(n, (y − 1)/2) p^((y−1)/2) (1 − p)^(n−(y−1)/2)
= C(4, (y − 1)/2) (2/3)^((y−1)/2) (1/3)^(4−(y−1)/2), y = 1, 3, 5, 7, 9


The Cumulative Distribution Function

In working with random variables, we frequently need to calculate the probability that the value of a random variable is somewhere between two numbers. For example, suppose we have an integer-valued random variable. We might want to calculate an expression like P(s ≤ X ≤ t). If we know the pdf for X, then

P(s ≤ X ≤ t) = Σ_{k=s}^{t} pX(k)

but depending on the nature of pX(k) and the number of terms that need to be added, calculating the sum of pX(k) from k = s to k = t may be quite difficult. An alternate strategy is to use the fact that

P(s ≤ X ≤ t) = P(X ≤ t) − P(X ≤ s − 1)

where the two probabilities on the right represent cumulative probabilities of the random variable X. If the latter were available (and they often are), then evaluating P(s ≤ X ≤ t) by one simple subtraction would clearly be easier than doing all the calculations implicit in Σ_{k=s}^{t} pX(k).

Definition 3.3.4. Let X be a discrete random variable. For any real number t, the probability that X takes on a value ≤ t is the cumulative distribution function (cdf) of X, written FX(t). In formal notation, FX(t) = P({s ∈ S | X(s) ≤ t}). As was the case with pdfs, references to s and S are typically deleted, and the cdf is written FX(t) = P(X ≤ t).

EXAMPLE 3.3.10 Suppose we wish to compute P(21 ≤ X ≤ 40) for a binomial random variable X with n = 50 and p = 0.6. From Theorem 3.2.1, we know the formula for pX(k), so P(21 ≤ X ≤ 40) can be written as a simple, although computationally cumbersome, sum:

P(21 ≤ X ≤ 40) = Σ_{k=21}^{40} C(50, k) (0.6)^k (0.4)^(50−k)

Equivalently, the probability we are looking for can be expressed as the difference between two cdfs:

P(21 ≤ X ≤ 40) = P(X ≤ 40) − P(X ≤ 20) = FX(40) − FX(20)

As it turns out, values of the cdf for a binomial random variable are widely available, both in books and in computer software. Here, for example, FX(40) = 0.9992 and FX(20) = 0.0034, so

P(21 ≤ X ≤ 40) = 0.9992 − 0.0034 = 0.9958
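The cdf values quoted above are straightforward to reproduce (a minimal sketch; binomial_cdf is our own helper name):

    from math import comb

    def binomial_cdf(t, n, p):
        # F_X(t) = P(X <= t) for a binomial random variable.
        return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
                   for k in range(t + 1))

    n, p = 50, 0.6
    f40, f20 = binomial_cdf(40, n, p), binomial_cdf(20, n, p)
    print(round(f40, 4), round(f20, 4), round(f40 - f20, 4))
    # 0.9992 0.0034 0.9958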

Practice 3.3.1 An urn contains five balls numbered 1 through 5. Two balls are drawn simultaneously.

a) Let X be the larger of the two numbers drawn. Find pX(k).

b) Let V be the sum of the two numbers drawn. Find pV(k).

a) Each of the ten outcomes has probability 1/10:

Outcome    X = larger no. drawn
1, 2       2
1, 3       3
1, 4       4
1, 5       5
2, 3       3
2, 4       4
2, 5       5
3, 4       4
3, 5       5
4, 5       5

Counting the number of outcomes for each value of the larger of the two and multiplying by 1/10 gives the pdf:

k    pX(k)
2    1/10
3    2/10
4    3/10
5    4/10

b) Adding a sum column to the same table:

Outcome    V = sum
1, 2       3
1, 3       4
1, 4       5
1, 5       6
2, 3       5
2, 4       6
2, 5       7
3, 4       7
3, 5       8
4, 5       9

k    pV(k)
3    1/10
4    1/10
5    2/10
6    2/10
7    2/10
8    1/10
9    1/10

Homework: 3.3.4, 3.3.5, 3.3.13, 3.3.14

3.4 Continuous Random Variables

If a random variable X is defined over a continuous sample space S (one containing an uncountably infinite number of outcomes), then X is said to be a continuous random variable. Rolling a pair of dice and recording the faces that appear is an experiment with a discrete sample space; choosing a number at random from the interval [0, 1] has a continuous sample space.

Examples of continuous random variables include time, temperature, and weight.

How do we assign a probability to this type of sample space?

When S is discrete (countable), we can assign each outcome s a probability p(s). If a random variable X is defined on the sample space, the probabilities associated with its values are assigned by the probability density function p_X(k) = P(X = k).

This will not work when S is continuous. The fact that a continuous sample space has an uncountably infinite number of outcomes eliminates the option of assigning a probability to each point, as we did in the discrete case with the function p(s).

We begin this section with a particular pdf defined on a discrete sample space that suggests how we might define probabilities, in general, on a continuous sample space.

Suppose an electronic surveillance monitor is turned on briefly at the beginning of every hour and has a 0.905 probability of working properly, regardless of how long it has remained in service.

If we let the random variable X denote the hour at which the monitor first fails, then p_X(k) is the product of k individual probabilities:

p_X(k) = P(X = k)
       = P(monitor fails for the first time at the kth hour)
       = P(monitor functions properly for the first k − 1 hours ∩ monitor fails at the kth hour)
       = (0.905)^{k−1}(0.095),  k = 1, 2, 3, …

Figure 3.4.1 shows a probability histogram of p_X(k) for k values ranging from 1 to 21. Here the height of the kth bar is p_X(k), and since the width of each bar is 1, the area of the kth bar is also p_X(k).

Now, look at Figure 3.4.2, where the exponential curve y = 0.1e^{−0.1x} is superimposed on the graph of p_X(k). Notice how closely the area under the curve approximates the area of the bars. It follows that the probability that X lies in some given interval will be numerically similar to the integral of the exponential curve over that same interval.

For example, the probability that the monitor fails sometime during the first four hours is the sum

P(1 ≤ X ≤ 4) = ∑_{k=1}^{4} p_X(k) = ∑_{k=1}^{4} (0.905)^{k−1}(0.095) = 0.3292

To within a few thousandths, the corresponding area under the exponential curve is the same:

∫_0^4 0.1e^{−0.1x} dx = 0.3297

Implicit in the similarity here between p_X(k) and the exponential curve y = 0.1e^{−0.1x} is our sought-after alternative to p(s) for continuous sample spaces.

Instead of defining probabilities for individual points, we will define probabilities for intervals of points, and those probabilities will be areas under the graph of some function (such as y = 0.1e^{−0.1x}), where the shape of the function reflects the desired probability measure to be associated with the sample space.
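The closeness of the discrete sum and the exponential integral is easy to confirm numerically; a small illustrative check:

```python
# Compare the geometric failure-time sum with its exponential approximation.
import math

p_sum = sum(0.095 * 0.905**(k - 1) for k in range(1, 5))  # P(1 <= X <= 4)
p_int = 1 - math.exp(-0.1 * 4)        # integral of 0.1e^{-0.1x} over [0, 4]
print(round(p_sum, 4), round(p_int, 4))  # 0.3292 0.3297
```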

Definition 3.4.1. A probability function P on a set of real numbers S is called continuous if there exists a function f(t) such that for any closed interval [a, b] ⊂ S,

P([a, b]) = ∫_a^b f(t) dt

Comment. If a probability function P satisfies Definition 3.4.1, then P(A) = ∫_A f(t) dt for any set A where the integral is defined.

Conversely, suppose a function f(t) has the two properties

a) f(t) ≥ 0 for all t, and

b) ∫_{−∞}^{∞} f(t) dt = 1.

If P(A) = ∫_A f(t) dt for all A, then P will satisfy the probability axioms given in Section 2.3.

Choosing the Function f(t)

We have seen that the probability structure of any sample space with a finite or countably infinite number of outcomes is defined by the function p(s) = P(outcome is s). The function f(t) serves an analogous purpose: f(t) defines the probability structure of S in the sense that the probability of any interval in the sample space is the integral of f(t) over that interval.

Example 3.4.1. The continuous equivalent of the equiprobable probability model on a discrete sample space is the function f(t) defined by f(t) = 1/(b − a) for all t in the interval [a, b] (and f(t) = 0 otherwise). This particular f(t) places equal probability weighting on every closed interval of the same length contained in [a, b]. For example, suppose a = 0 and b = 10, and let A = [1, 3] and B = [6, 8]. Then f(t) = 1/10 and

P(A) = ∫_1^3 (1/10) dt = 2/10 = ∫_6^8 (1/10) dt

See Figure 3.4.3.

Example 3.4.2. Could f(t) = 3t², 0 ≤ t ≤ 1, be used to define the probability function for a continuous sample space whose outcomes consist of all the real numbers in the interval [0, 1]?

Yes, because (1) f(t) ≥ 0 for all t, and (2)

∫_0^1 f(t) dt = ∫_0^1 3t² dt = t³ |_0^1 = 1

Notice that the shape of f(t) (see Figure 3.4.4) implies that outcomes close to 1 are more likely to occur than outcomes close to 0. For example,

P([0, 1/3]) = ∫_0^{1/3} 3t² dt = t³ |_0^{1/3} = 1/27

while

P([2/3, 1]) = ∫_{2/3}^1 3t² dt = t³ |_{2/3}^1 = 1 − 8/27 = 19/27

Example 3.4.3. By far the most important of all continuous probability functions is the "bell-shaped" curve, known more formally as the normal (or Gaussian) distribution. The sample space for the normal distribution is the entire real line; its probability density is given by

f(t) = (1/(√(2π) σ)) exp[−(1/2)((t − µ)/σ)²]

Fitting f(t) to Data: The Density-Scaled Histogram

How do we determine which f(t) to use for a given data set? Create a histogram for the data; specifically, a probability density histogram, which, unlike a frequency histogram (refer to Figures 3.4.6 and 3.4.7), is scaled so that the total area under the histogram is 1. A candidate pdf can then be superimposed on it directly.
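A minimal sketch of a density-scaled histogram (the sample data here are hypothetical):

```python
# Density-scaled histogram: bar areas sum to 1, so a pdf can be overlaid.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=10.0, size=500)   # hypothetical data set

heights, edges = np.histogram(data, bins=20, density=True)
print(np.sum(heights * np.diff(edges)))        # 1.0 -- total area is 1
```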

Continuous Probability Density Functions (pdf)

Definition 3.4.2. Let Y be a function from a sample space S to the real numbers. The function Y is called a continuous random variable if there exists a function f_Y(y) such that for any real numbers a and b with a < b,

P(a ≤ Y ≤ b) = ∫_a^b f_Y(y) dy

The function f_Y(y) is the probability density function (pdf) for Y. As in the discrete case, the cumulative distribution function (cdf) is defined by F_Y(y) = P(Y ≤ y); in the continuous case the cdf is just an integral of f_Y, that is, F_Y(y) = ∫_{−∞}^y f_Y(t) dt.

Let f(y) be an arbitrary real-valued function defined on some subset S of the real numbers. If

1. f(y) ≥ 0 for all y in S, and
2. ∫_S f(y) dy = 1,

then f(y) = f_Y(y) for all y, where the random variable Y is the identity mapping.

Example 3.4.5. Suppose we would like a continuous random variable Y to "select" a number between 0 and 1 in such a way that intervals near the middle of the range are more likely to be represented than intervals near 0 and 1. One pdf having that property is the function f_Y(y) = 6y(1 − y), 0 ≤ y ≤ 1 (see Figure 3.4.9). Do we know for certain that the function pictured is a legitimate pdf? Yes, because f_Y(y) ≥ 0 for all y, and

∫_0^1 6y(1 − y) dy = 6[y²/2 − y³/3] |_0^1 = 1

Continuous Cumulative Distribution Functions

Definition 3.4.3. The cdf for a continuous random variable Y is the integral of its pdf:

F_Y(y) = ∫_{−∞}^y f_Y(r) dr = P({s ∈ S | Y(s) ≤ y}) = P(Y ≤ y)

Theorem 3.4.1. Let f_Y(y) be the pdf of a continuous random variable with cdf F_Y(y). Then

(d/dy) F_Y(y) = f_Y(y)

Theorem 3.4.2. Let Y be a continuous random variable with cdf F_Y(y). Then

a) P(Y > s) = 1 − F_Y(s)

b) P(r < Y ≤ s) = F_Y(s) − F_Y(r)

c) lim_{y→∞} F_Y(y) = 1

d) lim_{y→−∞} F_Y(y) = 0

Transformation

Theorem 3.4.3. Suppose X is a continuous random variable. Let Y = aX + b, where a ≠ 0 and b are constants. Then

f_Y(y) = (1/|a|) f_X((y − b)/a)

Proof. We begin by writing an expression for the cdf of Y:

F_Y(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(aX ≤ y − b)

At this point we consider two cases. First let a > 0. Then

F_Y(y) = P(X ≤ (y − b)/a) = F_X((y − b)/a)

and differentiating F_Y(y) yields f_Y(y):

f_Y(y) = (d/dy) F_Y(y) = (d/dy) F_X((y − b)/a) = (1/a) f_X((y − b)/a) = (1/|a|) f_X((y − b)/a)

If a < 0, dividing by a reverses the inequality, so

F_Y(y) = P(aX ≤ y − b) = P(X ≥ (y − b)/a) = 1 − P(X ≤ (y − b)/a) = 1 − F_X((y − b)/a)

Differentiation yields

f_Y(y) = (d/dy)[1 − F_X((y − b)/a)] = −(1/a) f_X((y − b)/a) = (1/|a|) f_X((y − b)/a)

Practice 3.4.1. More practice: 3.4.2, 3.4.4, 3.4.5, 3.4.6, 3.4.8, 3.4.11, 3.4.12, 3.4.16, 3.4.17.

Practice 3.4.5. The length of time, Y, that a customer spends in line at a bank teller's window before being served is described by the exponential pdf f_Y(y) = 0.2e^{−0.2y}, y > 0.

(a) What is the probability that a customer will wait more than ten minutes?

P(Y > 10) = ∫_{10}^{∞} 0.2e^{−0.2y} dy = −e^{−0.2y} |_{10}^{∞} = lim_{b→∞} (−e^{−0.2b}) + e^{−2} = e^{−2} ≈ 0.135

(b) Suppose the customer will leave if the wait is more than ten minutes. Assume that the customer goes to the bank twice next month. Let the random variable X be the number of times the customer leaves without being served. Calculate p_X(1).

Let A be the event that the customer leaves on the first trip and B the event that the customer leaves on the second trip, so P(A) = P(B) = 0.135. Then

p_X(1) = P(A)P(Bᶜ) + P(Aᶜ)P(B) = 2(0.135)(0.865) = 0.23355
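A quick numeric check of both parts (illustrative, using SciPy; the text rounds P(Y > 10) to 0.135 before computing part (b)):

```python
# Practice 3.4.5: exponential waiting time, then a two-trial binomial.
from scipy.stats import expon, binom

p_leave = expon(scale=1 / 0.2).sf(10)       # P(Y > 10) = e^{-2} ≈ 0.1353
print(round(p_leave, 3))                    # 0.135
print(round(binom.pmf(1, 2, p_leave), 4))   # 2p(1-p) ≈ 0.2341
```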


3.5 Expected Values

A measure of central tendency: the mean, or expected value, µ_X or µ_Y.

Definition 3.5.1. Let X be a discrete random variable with probability function p_X(x). The expected value of X, denoted E(X) (or µ or µ_X), is given by

E(X) = µ = µ_X = ∑_{all x} x · p_X(x)

Similarly, if Y is a continuous random variable with probability density function f_Y(y),

E(Y) = µ = µ_Y = ∫_{−∞}^{∞} y · f_Y(y) dy

Comment: We assume that both the sum and the integral in Definition 3.5.1 converge absolutely:

∑_{all x} |x| p_X(x) < ∞,   ∫_{−∞}^{∞} |y| · f_Y(y) dy < ∞

Theorem 3.5.1. Suppose X is a binomial random variable with parameters n and p. Then E(X) = np.

Proof. According to Definition 3.5.1, E(X) for a binomial random variable is the sum

E(X) = ∑_{x=0}^{n} x · p_X(x)
     = ∑_{x=0}^{n} x \binom{n}{x} p^x (1 − p)^{n−x}
     = ∑_{x=0}^{n} x · (n!/(x!(n − x)!)) p^x (1 − p)^{n−x}
     = ∑_{x=1}^{n} (n!/((x − 1)!(n − x)!)) p^x (1 − p)^{n−x}

(the x = 0 term vanishes, and x cancels into x!).

At this point, a trick is called for. If E(X) = ∑_{all x} g(x) can be factored in such a way that E(X) = h ∑_{all x} p_{X*}(x), where p_{X*}(x) is the pdf of some random variable X*, then E(X) = h, since the sum of a pdf over its entire range is 1. Here, suppose that np is factored out. Then

E(X) = np ∑_{x=1}^{n} \binom{n − 1}{x − 1} p^{x−1} (1 − p)^{n−x}

Now, let j = x − 1. It follows that

E(X) = np ∑_{j=0}^{n−1} \binom{n − 1}{j} p^j (1 − p)^{(n−1)−j}

Finally, letting m = n − 1 gives

E(X) = np ∑_{j=0}^{m} \binom{m}{j} p^j (1 − p)^{m−j}

and, since the value of the sum is 1 (it sums a binomial pdf over its entire range), E(X) = np.

Comment: The statement of Theorem 3.5.1 makes sense intuitively. For example, if a multiple-choice test has 100 questions, each with five possible answers, we would expect to get twenty correct just by guessing: E(X) = np = 100 · (1/5) = 20.

Theorem 3.5.2. Suppose X is a hypergeometric random variable with parameters r, w, and n. That is, suppose an urn contains r red balls and w white balls, and a sample of size n is drawn simultaneously from the urn. Let X be the number of red balls in the sample. Then

E(X) = n · r/(r + w)

Example 3.5.6. The distance, Y, that a molecule in a gas travels before colliding with another molecule can be modeled by the exponential pdf

f_Y(y) = (1/µ) e^{−y/µ},  y ≥ 0

where µ is a positive constant known as the mean free path. Find E(Y).

Since the random variable here is continuous, its expected value is

E(Y) = ∫_0^∞ y (1/µ) e^{−y/µ} dy

Integrating by parts gives E(Y) = µ.

This shows that µ is aptly named: it does, in fact, represent the average distance a molecule travels free of any collisions. Nitrogen (N₂), for example, at room temperature and standard atmospheric pressure has µ = 0.00005 cm. An N₂ molecule, then, travels that far on the average before colliding with another N₂ molecule.

Example 3.5.7. One continuous distribution that has a number of interesting applications in physics is the Rayleigh distribution, whose pdf is given by

f_Y(y) = (y/a²) e^{−y²/(2a²)},  a > 0, 0 ≤ y < ∞

Calculate E(Y).

E(Y) = ∫_0^∞ y · (y/a²) e^{−y²/(2a²)} dy

Letting ν = y/(√2 a), so that dy = √2 a dν, gives

E(Y) = 2√2 a ∫_0^∞ ν² e^{−ν²} dν = 2√2 a · (√π/4) = a √(π/2)

Here ∫_0^∞ ν² e^{−ν²} dν is the k = 1 case of the general form ∫_0^∞ ν^{2k} e^{−ν²} dν, and its value is √π/4. To see this, let u = ν², du = 2ν dν. Then

∫_0^∞ ν² e^{−ν²} dν = ∫_0^∞ u e^{−u} (du/(2√u)) = (1/2) ∫_0^∞ u^{1/2} e^{−u} du

Define the gamma function Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt. It satisfies Γ(1) = 1, Γ(x + 1) = xΓ(x), and Γ(1/2) = √π, so

Γ(1/2 + 1) = (1/2)Γ(1/2) = (1/2)√π

Therefore (1/2) ∫_0^∞ u^{1/2} e^{−u} du = (1/2)Γ(3/2) = √π/4.

A Second Measure of Central Tendency: The Median

Definition 3.5.2. If X is a discrete random variable, the median, m, is the point for which P(X < m) = P(X > m). In the event that P(X ≤ m) = 0.5 and P(X ≥ m′) = 0.5, the median is the arithmetic average (m + m′)/2. If Y is a continuous random variable, its median is the solution to the integral equation ∫_{−∞}^{m} f_Y(y) dy = 0.5.

Example 3.5.8. If a random variable's pdf is symmetric, µ and m will be equal. Should p_X(k) or f_Y(y) not be symmetric, though, the difference between the expected value and the median can be considerable, especially if the asymmetry takes the form of extreme skewness. The situation described here is a case in point.

Soft Glow makes a 60-watt light bulb that is advertised to have an average life of one thousand hours. Assuming the performance claim is valid, is it reasonable for a consumer to conclude that the Soft Glow bulb they bought will last for approximately one thousand hours?

No! If the average life of a bulb is one thousand hours, the pdf f_Y(y) modeling the length of time Y that it remains lit before burning out is likely to have the form

f_Y(y) = 0.001e^{−0.001y},  y > 0   (3.5.1)

Equation 3.5.1 is a very skewed pdf, and its median lies well to the left of its mean. The median is the solution m to

∫_0^m 0.001e^{−0.001y} dy = 0.5

which gives m = 693. So even though the average life of these bulbs is one thousand hours, there is a 50% chance that the one you buy will burn out within 693 hours.
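The median here is just ln 2 scaled by the failure rate; a one-line check:

```python
# Median of the exponential lifetime model in Example 3.5.8.
import math

m = math.log(2) / 0.001    # solves 1 - exp(-0.001*m) = 0.5
print(round(m))            # 693
```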

Homework: 3.5.8, 3.5.10, 3.5.12, 3.5.16, 3.5.27

The Expected Value of a Function of a Random Variable

Theorem 3.5.3. Suppose X is a discrete random variable with pdf p_X(x). Let g(X) be a function of X. Then the expected value of the random variable g(X) is given by

E[g(X)] = ∑_{all x} g(x) · p_X(x)

provided that ∑_{all x} |g(x)| p_X(x) < ∞.

If Y is a continuous random variable with pdf f_Y(y), and if g(Y) is a continuous function, then the expected value of the random variable g(Y) is

E[g(Y)] = ∫_{−∞}^{∞} g(y) · f_Y(y) dy

provided that ∫_{−∞}^{∞} |g(y)| · f_Y(y) dy < ∞.

Corollary. For any random variable W, E(aW + b) = aE(W) + b, where a and b are constants.

Proof. Suppose W is continuous; the proof for the discrete case is similar. By Theorem 3.5.3,

E(aW + b) = ∫_{−∞}^{∞} (aw + b) f_W(w) dw

but the latter can be written

a ∫_{−∞}^{∞} w · f_W(w) dw + b ∫_{−∞}^{∞} f_W(w) dw = aE(W) + b · 1 = aE(W) + b

Example 3.5.10. Suppose the amount of propellant, Y, put into a can of spray paint is a random variable with pdf

f_Y(y) = 3y²,  0 < y < 1

Experience has shown that the largest surface area that can be painted by a can having Y amount of propellant is twenty times the area of a circle generated by a radius of Y ft. If the Purple Dominoes, a newly formed urban group, have just bought their first can of spray paint, can they expect to have enough to cover a 5′ × 8′ subway panel?

No. By assumption, the maximum area (in ft²) that can be covered by a can of paint is described by the function

area = g(Y) = 20πY²

According to the second statement in Theorem 3.5.3, though, the average value for g(Y) is slightly less than the desired 40 ft²:

E[g(Y)] = ∫_0^1 20πy² · 3y² dy = (60πy⁵/5) |_0^1 = 12π ≈ 37.7 ft²

Example 3.5.11. A fair coin is tossed until a head appears. You will be paid (1/2)^k dollars if the first head occurs on the kth toss. How much money can you expect to be paid?

Let the random variable X denote the toss at which the first head appears. Then

p_X(x) = P(X = x) = P(first x − 1 tosses are tails and the xth toss is a head) = (1/2)^{x−1}(1/2) = (1/2)^x

Moreover,

E(amount won) = E[(1/2)^X] = E[g(X)]
             = ∑_{all x} g(x) · p_X(x)
             = ∑_{x=1}^{∞} (1/2)^x (1/2)^x
             = ∑_{x=1}^{∞} (1/4)^x
             = ∑_{x=0}^{∞} (1/4)^x − (1/4)^0
             = 1/(1 − 1/4) − 1 = 1/3 ≈ $0.33
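The geometric series converges quickly, so a direct partial sum confirms the value:

```python
# Partial sum of Example 3.5.11's payoff series: sum_{x>=1} (1/4)^x = 1/3.
payoff = sum(0.25**x for x in range(1, 60))
print(round(payoff, 6))   # 0.333333
```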

Practice 3.5.28


Homework: 3.5.27, 3.5.29, 3.5.31

3.6 The Variance

A measure of the dispersion of the distribution.

Definition 3.6.1. The variance of a random variable is the expected value of its squared deviation from µ. If X is discrete with pdf p_X(x),

Var(X) = σ² = E[(X − µ)²] = ∑_{all x} (x − µ)² · p_X(x)

If Y is continuous with pdf f_Y(y),

Var(Y) = σ² = E[(Y − µ)²] = ∫_{−∞}^{∞} (y − µ)² · f_Y(y) dy

If E(X²) or E(Y²) is not finite, the variance is not defined.

Comment: The unit of the variance is the square of the unit of the random variable. In applications, the square root of the variance, the standard deviation, can be used to measure dispersion instead; it has the same unit as the random variable:

σ = √(∑_{all x} (x − µ)² · p_X(x))   if X is discrete

σ = √(∫_{−∞}^{∞} (y − µ)² · f_Y(y) dy)   if Y is continuous

Theorem 3.6.1. Let W be any random variable, discrete or continuous, having mean µ and for which E(W²) is finite. Then

Var(W) = σ² = E(W²) − µ²

Proof. We will prove the theorem for the continuous case; the argument for discrete W is similar. In Theorem 3.5.3, let g(W) = (W − µ)². Then

Var(W) = E[(W − µ)²] = ∫_{−∞}^{∞} (w − µ)² f_W(w) dw

Squaring out the term (w − µ)² in the integrand and using the additivity of integrals gives

∫_{−∞}^{∞} (w² − 2µw + µ²) f_W(w) dw = E(W²) − 2µ · µ + µ² = E(W²) − µ²

Read Example 3.6.1 for the variance of the hypergeometric distribution.

Theorem 3.6.2. Let W be any random variable, discrete or continuous, having mean µ and for which E(W²) is finite. Then Var(aW + b) = a²Var(W).

Proof. Using the same approach taken in the proof of Theorem 3.6.1, it can be shown that E[(aW + b)²] = a²E(W²) + 2abµ + b². We also know from the corollary to Theorem 3.5.3 that E(aW + b) = aµ + b. Using Theorem 3.6.1, then, we can write

Var(aW + b) = E[(aW + b)²] − [E(aW + b)]²
            = [a²E(W²) + 2abµ + b²] − (aµ + b)²
            = [a²E(W²) + 2abµ + b²] − [a²µ² + 2abµ + b²]
            = a²[E(W²) − µ²] = a²Var(W)

Example 3.6.2. A random variable Y is described by the pdf

f_Y(y) = 2y,  0 < y < 1

What is the standard deviation of 3Y + 2?

First we need the variance of Y:

E(Y) = ∫_0^1 y · 2y dy = 2/3   and   E(Y²) = ∫_0^1 y² · 2y dy = 1/2

So

Var(Y) = E(Y²) − µ² = 1/2 − (2/3)² = 1/18

Then by Theorem 3.6.2,

Var(3Y + 2) = (3)² · Var(Y) = 9 · (1/18) = 1/2

which makes the standard deviation of 3Y + 2 equal to √(1/2) ≈ 0.71.

Practice 3.6.4. Compute the variance of a uniform random variable defined on the unit interval,

f_Y(y) = 1,  0 ≤ y ≤ 1

µ = ∫_0^1 y · 1 dy = 1/2,   E(Y²) = ∫_0^1 y² · 1 dy = 1/3

Var(Y) = 1/3 − (1/2)² = 1/12

Practice: 3.6.2, 3.6.6, 3.6.8, 3.6.11, 3.6.14

Higher Moments

E(W) (a measure of central tendency, or location) is the first moment about the origin.

σ² = E(W²) − [E(W)]² (a measure of dispersion) is the second moment about the mean.

What about the skewness of the distribution? For that we can use the third moment.

Definition 3.6.2. Let W be any random variable with pdf f_W(w). For any positive integer r,

1. The rth moment of W about the origin, µ_r, is given by µ_r = E(W^r), provided that ∫_{−∞}^{∞} |w|^r · f_W(w) dw < ∞ (or provided the analogous condition on the summation holds if W is discrete). When r = 1, we usually drop the subscript and write E(W) = µ rather than µ_1.

2. The rth moment of W about the mean, µ′_r, is given by µ′_r = E[(W − µ)^r], provided the finiteness conditions of part 1 hold.

Comment. We can express µ′_r in terms of the µ_j, j = 1, 2, …, r, by simply writing out the binomial expansion of (W − µ)^r. Recall the binomial expansion

(x + y)^n = ∑_{k=0}^{n} \binom{n}{k} x^k y^{n−k}

Then

(W − µ)^r = (W + (−µ))^r = ∑_{j=0}^{r} \binom{r}{j} W^j (−µ)^{r−j}

and, taking expectations term by term,

µ′_r = E[(W − µ)^r] = ∑_{j=0}^{r} \binom{r}{j} E(W^j)(−µ)^{r−j}

Thus,

µ′_2 = E[(W − µ)²] = σ² = µ_2 − µ_1²
µ′_3 = E[(W − µ)³] = µ_3 − 3µ_1µ_2 + 2µ_1³
µ′_4 = E[(W − µ)⁴] = µ_4 − 4µ_1µ_3 + 6µ_1²µ_2 − 3µ_1⁴

Example 3.6.3.

a) The coefficient of skewness is γ_1 = E[(W − µ)³]/σ³.

Note that the division by σ³ makes γ_1 dimensionless. Note also that when the pdf is symmetric, E[(W − µ)³] is zero. When γ_1 > 0 the distribution is skewed right; when γ_1 < 0 it is skewed left.

b) The coefficient of kurtosis, γ_2, measures the flatness or peakedness of a pdf; it is a shape parameter judged relative to the bell-shaped pdf:

γ_2 = E[(W − µ)⁴]/σ⁴ − 3

Low kurtosis corresponds to a flat pdf, high kurtosis to a peaked one. For certain pdfs, γ_2 is a useful measure of peakedness: relatively flat pdfs are said to be platykurtic; more peaked pdfs are called leptokurtic.

Sometimes E(W^j) is not finite. How can we determine when the moments exist?

Theorem 3.6.3. If the kth moment of a random variable exists, then all moments of order less than k exist (i.e., are finite).

Example 3.6.4 (fourth edition). The pdf for a Student t random variable is given by

f_Y(y) = c(n)/(1 + y²/n)^{(n+1)/2},  −∞ < y < ∞, n ≥ 1

where n is referred to as the "degrees of freedom" of the distribution and c(n) is a constant. By definition, the (2k)th moment is the integral

E(Y^{2k}) = c(n) · ∫_{−∞}^{∞} y^{2k}/(1 + y²/n)^{(n+1)/2} dy

Is E(Y^{2k}) finite? Not necessarily! Recall from calculus that an integral of the form

∫^{∞} (1/y^α) dy

converges at infinity only if α > 1. The convergence properties of y^{2k}/(1 + y²/n)^{(n+1)/2} for large |y| are the same as those of

y^{2k}/y^{2(n+1)/2} = 1/y^{n+1−2k}

Therefore, for E(Y^{2k}) to be finite we must have

n + 1 − 2k > 1

or equivalently, 2k < n. Thus a Student t random variable with, say, n = 9 degrees of freedom has E(Y⁸) < ∞, but no moment of order higher than eight exists.

Practices: 3.6.19, 3.6.20, 3.6.22

3.7 Joint Densities

This section introduces the concepts, definitions, and mathematical techniques associated with distributions based on two (or more) random variables.

Discrete Joint Pdfs

Definition 3.7.1. Suppose S is a discrete sample space on which two random variables X and Y are defined. The joint probability density function of X and Y (or joint pdf) is denoted p_{X,Y}(x, y), where

p_{X,Y}(x, y) = P({s | X(s) = x and Y(s) = y}) = P(X = x, Y = y)

Example 3.7.1. A supermarket has two express lanes. Let X and Y denote the number of customers in the first and second lane, respectively, at any given time. During nonrush hours, the joint pdf of X and Y is summarized by the following table:

             y
         0      1      2      3
x  0   0.10   0.20   0      0
   1   0.20   0.25   0.05   0
   2   0      0.05   0.05   0.025
   3   0      0      0.025  0.05

Find P(|X − Y| = 1), the probability that X and Y differ by exactly 1.

By definition,

P(|X − Y| = 1) = ∑∑_{|x−y|=1} p_{X,Y}(x, y)
             = p_{X,Y}(0, 1) + p_{X,Y}(1, 0) + p_{X,Y}(1, 2) + p_{X,Y}(2, 1) + p_{X,Y}(2, 3) + p_{X,Y}(3, 2)
             = 0.2 + 0.2 + 0.05 + 0.05 + 0.025 + 0.025 = 0.55
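With the table stored as an array, the same sum is a one-liner (illustrative sketch):

```python
# Example 3.7.1: sum the joint pmf over the event {|x - y| = 1}.
import numpy as np

p = np.array([[0.10, 0.20, 0.00, 0.000],
              [0.20, 0.25, 0.05, 0.000],
              [0.00, 0.05, 0.05, 0.025],
              [0.00, 0.00, 0.025, 0.050]])  # rows x = 0..3, columns y = 0..3

x, y = np.indices(p.shape)
print(p[np.abs(x - y) == 1].sum())  # 0.55
```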

Example 3.7.2. Suppose two fair dice are rolled. Let X be the sum of the numbers showing, and let Y be the larger of the two. So, for example,

p_{X,Y}(2, 3) = P(X = 2, Y = 3) = P(∅) = 0
p_{X,Y}(4, 3) = P(X = 4, Y = 3) = P({(1, 3), (3, 1)}) = 2/36
p_{X,Y}(6, 3) = P(X = 6, Y = 3) = P({(3, 3)}) = 1/36

The entire joint pdf is given in Table 3.7.1:

            y
        1     2     3     4     5     6     Row
x   2  1/36   0     0     0     0     0     1/36
    3   0    2/36   0     0     0     0     2/36
    4   0    1/36  2/36   0     0     0     3/36
    5   0     0    2/36  2/36   0     0     4/36
    6   0     0    1/36  2/36  2/36   0     5/36
    7   0     0     0    2/36  2/36  2/36   6/36
    8   0     0     0    1/36  2/36  2/36   5/36
    9   0     0     0     0    2/36  2/36   4/36
   10   0     0     0     0    1/36  2/36   3/36
   11   0     0     0     0     0    2/36   2/36
   12   0     0     0     0     0    1/36   1/36
  Col  1/36  3/36  5/36  7/36  9/36  11/36

Notice that the row totals in the right-hand margin of the table give the pdf for X; similarly, the column totals along the bottom give the pdf for Y. Those are not coincidences. Theorem 3.7.1 gives a formal statement of the relationship between the joint pdf and the individual pdfs.

Theorem 3.7.1. Suppose that p_{X,Y}(x, y) is the joint pdf of the discrete random variables X and Y. Then

p_X(x) = ∑_{all y} p_{X,Y}(x, y)   and   p_Y(y) = ∑_{all x} p_{X,Y}(x, y)

Definition 3.7.2. An individual pdf obtained by summing a joint pdf over all values of the other random variable is called a marginal pdf.

Continuous Joint Pdfs

If X and Y are both continuous random variables, Definition 3.7.1 does not apply, because P(X = x, Y = y) is identically 0 for all (x, y). As in the single-variable situation, the joint pdf for two continuous variables is defined as a function which, when integrated, yields the probability that (X, Y) lies in a specified region of the xy-plane.

Definition 3.7.3. Two random variables defined on the same set of real numbers are jointly continuous if there exists a function f_{X,Y}(x, y) such that for any region R in the xy-plane,

P((X, Y) ∈ R) = ∫∫_R f_{X,Y}(x, y) dx dy

The function f_{X,Y}(x, y) is the joint pdf of X and Y.

Note: Any function f_{X,Y}(x, y) for which

1. f_{X,Y}(x, y) ≥ 0 for all x and y, and
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1

qualifies as a joint pdf.

Example 3.7.3. Suppose that the variation in two continuous random variables, X and Y, can be modeled by the joint pdf f_{X,Y}(x, y) = cxy for 0 < y < x < 1. Find c.

By inspection, f_{X,Y}(x, y) will be nonnegative as long as c ≥ 0. The particular c that qualifies f_{X,Y}(x, y) as a joint pdf, though, is the one that makes the volume under f_{X,Y}(x, y) equal to 1:

1 = ∫∫_S cxy dy dx = c ∫_0^1 [∫_0^x xy dy] dx
  = c ∫_0^1 x (y²/2 |_0^x) dx
  = c ∫_0^1 (x³/2) dx
  = c (x⁴/8) |_0^1 = (1/8)c

Therefore, c = 8.

Example 3.7.4. A study claims that the daily number of hours, X, a teenager watches television and the daily number of hours, Y, he works on his homework are approximated by the joint pdf

f_{X,Y}(x, y) = xy e^{−(x+y)},  x > 0, y > 0

What is the probability that a student chosen at random spends at least twice as much time watching TV as working on his homework?

We want P(X > 2Y). Let R be the region where x > 2y, x > 0, y > 0. Then P(X > 2Y) is the volume under f_{X,Y}(x, y) above the region R:

P(X > 2Y) = ∫_0^∞ ∫_0^{x/2} xy e^{−(x+y)} dy dx

Separating variables, we can write

P(X > 2Y) = ∫_0^∞ x e^{−x} [∫_0^{x/2} y e^{−y} dy] dx

and the double integral reduces to 7/27:

P(X > 2Y) = ∫_0^∞ x e^{−x} [1 − (x/2 + 1) e^{−x/2}] dx
          = ∫_0^∞ x e^{−x} dx − ∫_0^∞ (x²/2) e^{−3x/2} dx − ∫_0^∞ x e^{−3x/2} dx
          = 1 − 16/54 − 4/9 = 7/27
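A numerical integration provides a quick sanity check on the 7/27 (illustrative, using SciPy):

```python
# Example 3.7.4: P(X > 2Y) by numerical double integration.
import numpy as np
from scipy import integrate

f = lambda y, x: x * y * np.exp(-(x + y))               # joint pdf
val, _ = integrate.dblquad(f, 0, np.inf, 0, lambda x: x / 2)
print(val, 7 / 27)                                      # both ≈ 0.259259
```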

Geometric Probability

One particularly important special case of Definition 3.7.3 is the joint uniform pdf, which is represented by a surface having constant height everywhere above a specified rectangle in the xy-plane. That is,

f_{X,Y}(x, y) = 1/((b − a)(d − c)),  a ≤ x ≤ b, c ≤ y ≤ d

If R is some region in the rectangle where X and Y are defined, then P[(X, Y) ∈ R] reduces to a simple ratio of areas:

P[(X, Y) ∈ R] = (area of R)/((b − a)(d − c))

Practice 3.7.1. If p_{X,Y}(x, y) = cxy at the points (1, 1), (2, 1), (2, 2), and (3, 1), and equals 0 elsewhere, find c.

1 = ∑_{x,y} p_{X,Y}(x, y) = ∑_{x,y} cxy = c(1·1 + 2·1 + 2·2 + 3·1) = 10c

so c = 1/10.

Practices: 3.7.2, 3.7.4, 3.7.8, 3.7.10, 3.7.11, 3.7.13

Marginal Pdfs for Continuous Random Variables

Theorem 3.7.2. Suppose X and Y are jointly continuous with joint pdf f_{X,Y}(x, y). Then the marginal pdfs, f_X(x) and f_Y(y), are given by

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy   and   f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx

Example 3.7.7. Suppose f_{X,Y}(x, y) = 1/6, 0 ≤ x ≤ 3, 0 ≤ y ≤ 2. Applying Theorem 3.7.2 gives

f_X(x) = ∫_0^2 f_{X,Y}(x, y) dy = ∫_0^2 (1/6) dy = 1/3,  0 ≤ x ≤ 3

Example 3.7.8. Consider the case where X and Y are two continuous random variables, jointly distributed over the first quadrant of the xy-plane according to the joint pdf

f_{X,Y}(x, y) = y² e^{−y(x+1)} for x ≥ 0, y ≥ 0, and 0 elsewhere.

Find the two marginal pdfs.

a) f_X(x) = ∫_0^∞ y² e^{−y(x+1)} dy

Use the substitution w = y(x + 1), so that dw = (x + 1) dy. This gives

f_X(x) = (1/(x + 1)) ∫_0^∞ (w²/(x + 1)²) e^{−w} dw = (1/(x + 1)³) ∫_0^∞ w² e^{−w} dw

Integrating by parts twice to evaluate ∫_0^∞ w² e^{−w} dw, we get

f_X(x) = (1/(x + 1)³) [−w²e^{−w} − 2we^{−w} − 2e^{−w}] |_0^∞
       = (1/(x + 1)³) [2 − lim_{w→∞} (w²/e^w + 2w/e^w + 2/e^w)]
       = 2/(x + 1)³,  x ≥ 0

b) Finding f_Y(y) is a bit easier:

f_Y(y) = ∫_0^∞ y² e^{−y(x+1)} dx = y² e^{−y} ∫_0^∞ e^{−yx} dx = y² e^{−y} · (1/y) = y e^{−y},  y ≥ 0

Homework: 3.7.19, 3.7.20, 3.7.21, 3.7.22

Joint Cdf

Definition 3.7.4. Let X and Y be any two random variables. The joint cumulative distribution function of X and Y (or joint cdf) is denoted F_{X,Y}(u, v), where

F_{X,Y}(u, v) = P(X ≤ u and Y ≤ v)

Example 3.7.9. Find the joint cdf, F_{X,Y}(u, v), for two random variables X and Y whose joint pdf is f_{X,Y}(x, y) = (4/3)(x + xy), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

F_{X,Y}(u, v) = P(X ≤ u and Y ≤ v)
             = (4/3) ∫_0^v ∫_0^u (x + xy) dx dy
             = (4/3) ∫_0^v (∫_0^u (x + xy) dx) dy
             = (4/3) ∫_0^v ((x²/2)(1 + y) |_0^u) dy
             = (4/3) ∫_0^v (u²/2)(1 + y) dy
             = (4/3)(u²/2)(y + y²/2) |_0^v
             = (4/3)(u²/2)(v + v²/2)

For what values of u and v is F_{X,Y}(u, v) defined?

Theorem 3.7.3. Let F_{X,Y}(u, v) be the joint cdf associated with the continuous random variables X and Y. Then the joint pdf of X and Y, f_{X,Y}(x, y), is a second partial derivative of the joint cdf; that is,

f_{X,Y}(x, y) = ∂²/∂x∂y F_{X,Y}(x, y)

provided that F_{X,Y}(x, y) has continuous second partial derivatives.

Example 3.7.10. What is the joint pdf of the random variables X and Y whose joint cdf is F_{X,Y}(x, y) = (1/3)x²(2y + y²)?

f_{X,Y}(x, y) = ∂²/∂x∂y F_{X,Y}(x, y)
             = ∂²/∂x∂y (1/3)x²(2y + y²)
             = ∂/∂y (2/3)x(2y + y²)
             = (2/3)x(2 + 2y) = (4/3)(x + xy)

Compare with Example 3.7.9.

Read the material on multivariate densities.

Homework: 3.7.25, 3.7.27, 3.7.29, 3.7.30

Independence of Two Random Variables

Definition 3.7.5. Two random variables X and Y are said to be independent if for every interval A and every interval B, P(X ∈ A and Y ∈ B) = P(X ∈ A)P(Y ∈ B).

Theorem 3.7.4. The random variables X and Y are independent if and only if there are functions g(x) and h(y) such that

f_{X,Y}(x, y) = g(x)h(y)   (3.7.1)

If Equation 3.7.1 holds, there is a constant k such that f_X(x) = kg(x) and f_Y(y) = (1/k)h(y).

Note: If X and Y are independent, then f_{X,Y}(x, y) = f_X(x)f_Y(y).

Example 3.7.11. Suppose that the probabilistic behavior of two random variables X and Y is described by the joint pdf f_{X,Y}(x, y) = 12xy(1 − y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Are X and Y independent? If they are, find f_X(x) and f_Y(y).

From Theorem 3.7.4, if f_{X,Y}(x, y) = g(x)h(y), then X and Y are independent. Let g(x) = 12x and h(y) = y(1 − y).

From Theorem 3.7.4, f_X(x) = kg(x), so

∫_0^1 kg(x) dx = 1 = ∫_0^1 12kx dx = (12k/2) x² |_0^1 = 6k

So k = 1/6, and therefore f_X(x) = 2x, 0 ≤ x ≤ 1, and f_Y(y) = 6y(1 − y), 0 ≤ y ≤ 1.

Practice 3.7.43. Suppose that the random variables X and Y are independent with marginal pdfs f_X(x) = 2x, 0 ≤ x ≤ 1, and f_Y(y) = 3y², 0 ≤ y ≤ 1. Find P(Y < X).

P(Y < X) = ∫_0^1 ∫_0^x f_{X,Y}(x, y) dy dx
         = ∫_0^1 ∫_0^x (2x)(3y²) dy dx
         = ∫_0^1 2x⁴ dx = 2/5

Independence of n (> 2) Random Variables

Definition 3.7.6. The n random variables X₁, X₂, …, X_n are said to be independent if there are functions g₁(x₁), g₂(x₂), …, g_n(x_n) such that for every x₁, x₂, …, x_n,

f_{X₁,X₂,…,X_n}(x₁, x₂, …, x_n) = g₁(x₁) · g₂(x₂) ⋯ g_n(x_n)

Example 3.7.12. Consider k urns, each holding n chips numbered 1 through n. A chip is to be drawn at random from each urn. What is the probability that all k chips bear the same number?

If X₁, X₂, …, X_k denote the numbers on the 1st, 2nd, …, and kth chips, respectively, we are looking for the probability that X₁ = X₂ = ⋯ = X_k. In terms of the joint pdf,

P(X₁ = X₂ = ⋯ = X_k) = ∑_{x₁=x₂=⋯=x_k} p_{X₁,X₂,…,X_k}(x₁, x₂, …, x_k)

Each of the selections here is obviously independent of all the others, so the joint pdf factors according to Definition 3.7.6, and we can write

P(X₁ = X₂ = ⋯ = X_k) = ∑_{i=1}^{n} p_{X₁}(x_i) · p_{X₂}(x_i) ⋯ p_{X_k}(x_i)
                     = n · (1/n · 1/n ⋯ 1/n) = 1/n^{k−1}

Random Samples

Definition 3.7.6 addresses the question of independence as it applies to n random variables having marginal pdfs, say f₁(x₁), f₂(x₂), …, f_n(x_n), that might be quite different. A special case of that definition occurs for virtually every set of data collected for statistical analysis.

Suppose an experimenter takes a set of n measurements, x₁, x₂, …, x_n, under the same conditions. Those X_i's then qualify as a set of independent random variables; moreover, each has the same pdf. The special, but familiar, notation for that scenario is given in Definition 3.7.7. We will encounter it often in the chapters ahead.

Definition 3.7.7. Let X₁, X₂, …, X_n be a set of n independent random variables, all having the same pdf. Then X₁, X₂, …, X_n are said to be a random sample of size n.

Homework: 3.7.39, 3.7.42, 3.7.45, 3.7.46

3.8 Transformation and Combining Random Variables

In Section 3.4 we found the pdf for Y = aX + b given the pdf for X. In this section we find the pdf for a random variable that is a function of X and Y.

Linear Transformation

Theorem 3.3.1. Suppose X is a discrete random variable. Let Y = aX + b, where a and b are constants. Then

p_Y(y) = p_X((y − b)/a)

Proof.

p_Y(y) = P(Y = y) = P(aX + b = y) = P(X = (y − b)/a) = p_X((y − b)/a)

Practice 3.3.11. Suppose X is a binomial random variable with n = 4 and p = 2/3. What is the pdf of 2X + 1?

Given

p_X(x) = \binom{n}{x} p^x (1 − p)^{n−x}

let Y = 2X + 1. Then

P(Y = y) = P(2X + 1 = y) = P(X = (y − 1)/2) = p_X((y − 1)/2)
         = \binom{4}{(y−1)/2} (2/3)^{(y−1)/2} (1/3)^{4−(y−1)/2},  y = 1, 3, 5, 7, 9

Theorem 3.4.3. Suppose X is a continuous random variable. Let Y = aX + b, where a ≠ 0 and b are constants. Then

f_Y(y) = (1/|a|) f_X((y − b)/a)

The proof, which writes the cdf F_Y(y) = P(aX + b ≤ y) and differentiates, treating the cases a > 0 and a < 0 separately, was given in Section 3.4.

Finding the pdf for the sum of random variables

Theorem 3.8.1. Suppose that X and Y are independent random variables. Let W = X + Y. Then

1. If X and Y are discrete random variables with pdfs p_X(x) and p_Y(y), respectively, then

p_W(w) = ∑_{all x} p_X(x) p_Y(w − x)

2. If X and Y are continuous random variables with pdfs f_X(x) and f_Y(y), respectively, then

f_W(w) = ∫_{−∞}^{∞} f_X(x) f_Y(w − x) dx

Proof.

1.

p_W(w) = P(W = w) = P(X + Y = w)
       = P(∪_{all x} (X = x, Y = w − x))
       = ∑_{all x} P(X = x, Y = w − x)
       = ∑_{all x} P(X = x) P(Y = w − x)
       = ∑_{all x} p_X(x) p_Y(w − x)

where the next-to-last equality derives from the independence of X and Y.

2. Since X and Y are continuous random variables, we can find f_W(w) by differentiating the corresponding cdf, F_W(w). Here F_W(w) = P(X + Y ≤ w) is found by integrating f_{X,Y}(x, y) = f_X(x) · f_Y(y) over the shaded region R in Figure 3.8.1. By inspection,

F_W(w) = P(W ≤ w) = P(X + Y ≤ w)
       = ∫_{−∞}^{∞} ∫_{−∞}^{w−x} f_X(x) f_Y(y) dy dx
       = ∫_{−∞}^{∞} f_X(x) (∫_{−∞}^{w−x} f_Y(y) dy) dx
       = ∫_{−∞}^{∞} f_X(x) F_Y(w − x) dx

Assume that the integrand above is sufficiently smooth that differentiation and integration can be interchanged. Then we can write

f_W(w) = (d/dw) F_W(w)
       = (d/dw) ∫_{−∞}^{∞} f_X(x) F_Y(w − x) dx
       = ∫_{−∞}^{∞} f_X(x) ((d/dw) F_Y(w − x)) dx
       = ∫_{−∞}^{∞} f_X(x) f_Y(w − x) dx

and the theorem is proved.

Comment. The integral in part (2) above is referred to as the convolution of the functions f_X and f_Y. Besides their frequent appearances in random-variable problems, convolutions turn up in many areas of mathematics and engineering.

Example 3.8.2. Suppose that X and Y are two independent binomial random variables, each with the same success probability but defined on m and n trials, respectively. Specifically,

p_X(x) = \binom{m}{x} p^x (1 − p)^{m−x},  x = 0, 1, …, m
p_Y(y) = \binom{n}{y} p^y (1 − p)^{n−y},  y = 0, 1, …, n

Find p_W(w), where W = X + Y.

By Theorem 3.8.1, p_W(w) = ∑_{all x} p_X(x) p_Y(w − x), where the summation over "all x" must be interpreted as the set of values of x for which p_X(x) and p_Y(w − x) are both nonzero. Since

p_Y(w − x) = \binom{n}{w − x} p^{w−x} (1 − p)^{n−(w−x)},  w − x = 0, 1, …, n

we need w − x ≥ 0, that is, x ≤ w. Therefore

p_W(w) = ∑_{x=0}^{w} p_X(x) p_Y(w − x)
       = ∑_{x=0}^{w} \binom{m}{x} p^x (1 − p)^{m−x} \binom{n}{w − x} p^{w−x} (1 − p)^{n−(w−x)}
       = [∑_{x=0}^{w} \binom{m}{x} \binom{n}{w − x}] p^w (1 − p)^{m+n−w}
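By Vandermonde's identity the bracketed sum is \binom{m+n}{w}, so W is binomial with parameters m + n and p. A quick numerical confirmation (illustrative, using NumPy/SciPy):

```python
# The convolution of two binomial pmfs with common p is binomial(m + n, p).
import numpy as np
from scipy.stats import binom

m, n, p = 3, 5, 0.4
conv = np.convolve(binom.pmf(np.arange(m + 1), m, p),
                   binom.pmf(np.arange(n + 1), n, p))
print(np.allclose(conv, binom.pmf(np.arange(m + n + 1), m + n, p)))  # True
```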

EXAMPLE 3.8.3. Suppose a radiation monitor relies on an electronic sensor whose lifetime X is modeled by the exponential pdf f_X(x) = λe^{−λx}, x > 0. To improve the reliability of the monitor, the manufacturer has included an identical second sensor that is activated only in the event the first sensor malfunctions. (This is called cold redundancy.) Let the random variable Y denote the operating lifetime of the second sensor, in which case the lifetime of the monitor can be written as the sum W = X + Y. Find f_W(w).

Since X and Y are both continuous random variables,

f_W(w) = ∫_{−∞}^{∞} f_X(x) f_Y(w − x) dx   (3.8.2)

Notice that f_X(x) > 0 only if x > 0, and f_Y(w − x) > 0 only if x < w. Therefore the integral in (3.8.2), which goes from −∞ to ∞, reduces to an integral from 0 to w, and we can write

f_W(w) = ∫_0^w f_X(x) f_Y(w − x) dx
       = ∫_0^w λe^{−λx} λe^{−λ(w−x)} dx
       = λ² ∫_0^w e^{−λx} e^{−λ(w−x)} dx
       = λ² e^{−λw} ∫_0^w dx = λ²we^{−λw},  w ≥ 0

Finding the Pdfs of Quotients and Products

We are interested in finding the pdf for

1) W = Y/X

2) W = XY

Theorem 3.8.4. Let X and Y be independent continuous random variables with pdfs f_X(x) and f_Y(y), respectively. Assume that X is zero for at most a set of isolated points. Let W = Y/X. Then

f_W(w) = ∫_{−∞}^{∞} |x| f_X(x) f_Y(wx) dx

Example 3.8.4. Let X and Y be independent continuous random variables with pdfs f_X(x) = λe^{−λx}, x > 0, and f_Y(y) = λe^{−λy}, y > 0, respectively. Define W = Y/X. Find f_W(w).

f_W(w) = ∫_0^∞ x λe^{−λx} λe^{−λwx} dx
       = λ² ∫_0^∞ x e^{−λ(1+w)x} dx
       = (λ²/(λ(1 + w))) ∫_0^∞ x λ(1 + w) e^{−λ(1+w)x} dx
       = (λ²/(λ(1 + w))) · (1/(λ(1 + w)))
       = 1/(1 + w)²,  w > 0

(The last integral is the mean of an exponential random variable with rate λ(1 + w), namely 1/(λ(1 + w)).)

Theorem 3.8.5. Let X and Y be independent continuous random variables with pdfs f_X(x) and f_Y(y), respectively. Assume that X is zero for at most a set of isolated points. Let W = XY. Then

f_W(w) = ∫_{−∞}^{∞} (1/|x|) f_X(x) f_Y(w/x) dx = ∫_{−∞}^{∞} (1/|y|) f_X(w/y) f_Y(y) dy

Example 3.8.5. Let X and Y be independent continuous random variables with pdfs f_X(x) = 1, 0 ≤ x ≤ 1, and f_Y(y) = 2y, 0 ≤ y ≤ 1, respectively. Define W = XY. Find f_W(w).

From Theorem 3.8.5,

f_W(w) = ∫_{−∞}^{∞} (1/|x|) f_X(x) f_Y(w/x) dx

The region of integration, though, needs to be restricted to values of x for which the integrand is positive. But f_Y(w/x) is positive only if 0 ≤ w/x ≤ 1, which implies x ≥ w; moreover, for f_X(x) to be positive requires 0 ≤ x ≤ 1. Any x from w to 1, then, yields a positive integrand. Therefore,

f_W(w) = ∫_w^1 (1/x)(1)(2w/x) dx = 2w ∫_w^1 (1/x²) dx = 2 − 2w,  0 ≤ w ≤ 1
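A Monte Carlo simulation gives a quick check of this density (illustrative; Y is sampled as √U since its cdf is y²):

```python
# Example 3.8.5: simulate W = XY and compare P(W <= 0.5) with the cdf.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
w = rng.random(n) * np.sqrt(rng.random(n))   # X ~ U(0,1), Y = sqrt(U)

print(np.mean(w <= 0.5))       # ≈ 0.75
print(2 * 0.5 - 0.5**2)        # F_W(0.5) = ∫_0^0.5 (2 - 2t) dt = 0.75
```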


Practice: 3.8.1, 3.8.2, 3.8.5, 3.8.6, 3.8.11

3.9 Further properties of Mean and Variance

Theorem 3.9.1.

1. Suppose X and Y are discrete random variables with joint pdf p_{X,Y}(x, y), and let g(X, Y) be a function of X and Y. Then the expected value of the random variable g(X, Y) is given by

E[g(X, Y)] = ∑_{all x} ∑_{all y} g(x, y) · p_{X,Y}(x, y)

provided ∑_{all x} ∑_{all y} |g(x, y)| · p_{X,Y}(x, y) < ∞.

2. Suppose X and Y are continuous random variables with joint pdf f_{X,Y}(x, y), and let g(X, Y) be a function of X and Y. Then the expected value of the random variable g(X, Y) is given by

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) · f_{X,Y}(x, y) dx dy

provided ∫_{−∞}^{∞} ∫_{−∞}^{∞} |g(x, y)| · f_{X,Y}(x, y) dx dy < ∞.

Theorem 3.9.2. Let X and Y be any two random variables (discrete or continuous, dependent or independent), and let a and b be any two constants. Then

E(aX + bY) = aE(X) + bE(Y)

provided E(X) and E(Y) are both finite.

Corollary. Let W₁, W₂, …, W_n be any random variables for which E(W_i) < ∞, i = 1, 2, …, n, and let a₁, a₂, …, a_n be any set of constants. Then

E(a₁W₁ + a₂W₂ + ⋯ + a_nW_n) = a₁E(W₁) + a₂E(W₂) + ⋯ + a_nE(W_n)

Example 3.9.3. Let X be a binomial random variable defined on n independent trials, each trial resulting in success with probability p. Find E(X).

Note first that X can be thought of as a sum, X = X₁ + X₂ + ⋯ + X_n, where X_i represents the number of successes occurring on the ith trial:

X_i = 1 if the ith trial produces a success, 0 if the ith trial produces a failure.

(Any X_i defined in this way on an individual trial is called a Bernoulli random variable; every binomial random variable, then, can be thought of as the sum of n independent Bernoullis.) By assumption, p_{X_i}(1) = p and p_{X_i}(0) = 1 − p, i = 1, 2, …, n. Using the corollary,

E(X) = E(X₁) + E(X₂) + ⋯ + E(X_n)

But E(X_i) = 1 · p + 0 · (1 − p) = p, so E(X) = np.

Example 3.9.5. Ten fair dice are rolled. Calculate the expected value of the sum of the faces showing.

If the random variable X denotes the sum of the faces showing on the ten dice, then X = X₁ + X₂ + ⋯ + X₁₀, where X_i is the number showing on the ith die, i = 1, 2, …, 10. By assumption, p_{X_i}(k) = 1/6 for k = 1, 2, …, 6, so

E(X_i) = ∑_{k=1}^{6} (1/6)k = (1/6) ∑_{k=1}^{6} k = (1/6) · (6(7)/2) = 3.5

By the corollary to Theorem 3.9.2, E(X) = E(X₁) + E(X₂) + ⋯ + E(X₁₀) = 10(3.5) = 35.

Expected Values of Products: A Special Case

Theorem 3.9.3. If X and Y are independent random variables,

E(XY) = E(X) · E(Y)

provided E(X) and E(Y) both exist.

Proof. Suppose X and Y are discrete random variables. Since X and Y are independent, p_{X,Y}(x, y) = p_X(x) · p_Y(y). Then

E(XY) = ∑_{all x} ∑_{all y} xy · p_{X,Y}(x, y)
      = ∑_{all x} ∑_{all y} xy · p_X(x) · p_Y(y)
      = ∑_{all x} x · p_X(x) ∑_{all y} y · p_Y(y)
      = E(X)E(Y)

Practice 3.9.3

Calculating the Variance of a Sum of Random Variables

When random variables are not independent, the measure of their relationship is called the covariance.

Definition 3.9.1. Given random variables X and Y with finite variances, define the covariance of X and Y, written Cov(X, Y), as

Cov(X, Y) = E(XY) − E(X)E(Y)

Theorem 3.9.4. If X and Y are independent, then Cov(X, Y) = 0.

Proof. If X and Y are independent, then from Theorem 3.9.3, E(XY) = E(X)E(Y). Therefore

Cov(X, Y) = E(XY) − E(X)E(Y) = E(X)E(Y) − E(X)E(Y) = 0

Note: The converse of Theorem 3.9.4 is NOT true. Just because Cov(X, Y) = 0, we cannot conclude that X and Y are independent. Read Example 3.9.7.

Theorem 3.9.5. Suppose X and Y are random variables with finite variances, and a and b are constants. Then

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X, Y)

Proof. For convenience, denote E(X) = µ_X and E(Y) = µ_Y, so that E(aX + bY) = aµ_X + bµ_Y. Then

Var(aX + bY) = E[(aX + bY)²] − [E(aX + bY)]²
             = E[(aX + bY)²] − (aµ_X + bµ_Y)²
             = E(a²X²) + E(2abXY) + E(b²Y²) − (a²µ_X² + 2abµ_Xµ_Y + b²µ_Y²)
             = a²[E(X²) − µ_X²] + b²[E(Y²) − µ_Y²] + 2ab[E(XY) − µ_Xµ_Y]
             = a²Var(X) + b²Var(Y) + 2abCov(X, Y)

Page 201: An Introduction to Mathematical Statistics and Its Applications … · An Introduction to Mathematical Statistics and Its Applications Richard Larsen and Morris Marx

Read Example 3.9.8.Corollary. SupposeW1,W2, · · · ,Wn, are ran-

dom variables with finite variances. Then

Var(

a∑i=1

aiWi) =

a∑i=1

a2iVar(Wi)+2∑i<j

aiajCov(Wi,Wj)

Corollary. Let W1,W2, · · · ,Wn, be a set of in-dependent r.v. for whichE(W 2

i ) < ∞, i = 1, 2, · · · , n.Let a1, a2, · · · , an be any sets of constant. Then

Var(a1W1 + a2W2 + · · · + anWn)

= a21Var(W1) + a22Var(W2) + · · · + a2nVar(Wn)

Example 3.9.9From Example 3.9.3,Let X be a binomial random variable defined on

n independent trials, each trial resulting in successwith probability p. Find E(X). Note first, that Xcan be thought as a sum, X = X1 + X2 + · · · +Xn, where Xi represent the number of successesoccuring at the ith trial:

Page 202: An Introduction to Mathematical Statistics and Its Applications … · An Introduction to Mathematical Statistics and Its Applications Richard Larsen and Morris Marx

Xi =

{1 if the ith trial produces a success0 if the ith trial produces a failure

(Any X_i defined in this way on an individual trial is called a Bernoulli random variable. Every binomial r.v., then, can be thought of as the sum of n independent Bernoullis.) By assumption, p_{X_i}(1) = p and p_{X_i}(0) = 1 − p, i = 1, 2, \ldots, n. Then

E(X) = E(X_1) + E(X_2) + \cdots + E(X_n). But E(X_i) = 1 \cdot p + 0 \cdot (1 − p) = p, so E(X) = np. Also,

E(X_i^2) = (1)^2 \cdot p + (0)^2 \cdot (1 − p) = p

Var(X_i) = E(X_i^2) − [E(X_i)]^2 = p − p^2 = p(1 − p)

It follows, then, that the variance of a binomial random variable is np(1 − p):

Var(X) = \sum_{i=1}^{n} Var(X_i) = np(1 − p)
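The Bernoulli decomposition is easy to see in R. The sketch below (with n = 30 and p = 0.6, the same values used in the appendix examples) builds each binomial draw as a row sum of Bernoulli trials and checks E(X) = np and Var(X) = np(1 − p).

# Binomial(30, 0.6) draws built as sums of 30 independent Bernoulli trials
set.seed(3)
n <- 30; p <- 0.6; reps <- 50000
trials <- matrix(rbinom(n * reps, size = 1, prob = p), ncol = n)
x <- rowSums(trials)   # each row sum is one Binomial(n, p) draw
mean(x)                # close to np = 18
var(x)                 # close to np(1 - p) = 7.2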

Example 3.9.11. In statistics, it is often necessary to draw inferences based on W̄, the average computed from a random sample of n observations. Two properties of W̄ are especially important. First, if the W_i's come from a population whose mean is µ, the corollary from Theorem 3.9.2 implies that E(W̄) = µ. Second, if the W_i's come from a population whose variance is σ^2, then Var(W̄) = σ^2/n.

Proof. Write

W̄ = (1/n) \sum_{i=1}^{n} W_i = (1/n)W_1 + (1/n)W_2 + \cdots + (1/n)W_n

Then E(W̄) = (1/n)[E(W_1) + \cdots + E(W_n)] = (1/n)(nµ) = µ, and since the W_i's in a random sample are independent, the corollary following Theorem 3.9.5 gives Var(W̄) = (1/n)^2 Var(W_1) + \cdots + (1/n)^2 Var(W_n) = n \cdot σ^2/n^2 = σ^2/n.
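Both properties are easy to confirm by simulation; the population below (normal with µ = 10, σ = 4, samples of size n = 25) is an arbitrary choice for illustration.

# E(Wbar) = mu and Var(Wbar) = sigma^2/n, checked by repeated sampling
set.seed(4)
n <- 25
wbar <- replicate(50000, mean(rnorm(n, mean = 10, sd = 4)))
mean(wbar)   # close to mu = 10
var(wbar)    # close to sigma^2/n = 16/25 = 0.64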

Practice: 3.9.14, 3.9.16, 3.9.17, 3.9.20

3.10 Order Statistics —- skip

3.11 Conditional Densities

Finding Conditional Pdfs for Discrete Random Variables

Definition 3.11.1. Let X and Y be discrete random variables. The conditional probability density function of Y given x, that is, the probability that Y takes on the value y given that X is equal to x, is denoted p_{Y|x}(y), where

p_{Y|x}(y) = P(Y = y | X = x) = p_{X,Y}(x, y) / p_X(x)

for p_X(x) ≠ 0.

Example 3.11.2. Assume that the probabilistic behavior of a pair of discrete random variables X and Y is described by the joint pdf

p_{X,Y}(x, y) = xy^2/39

defined over the four points (1, 2), (1, 3), (2, 2), (2, 3). Find the conditional probability that X = 1 given that Y = 2:

p_{X|Y=2}(1) = p_{X,Y}(1, 2) / p_Y(2) = (1 \cdot 2^2/39) / (1 \cdot 2^2/39 + 2 \cdot 2^2/39) = 1/3
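Example 3.11.2 can be reworked numerically in R: tabulate the joint pdf, sum columns to get the marginal of Y, and divide. This is a sketch, not code from the text.

# Conditional pmf from the joint pmf p(x, y) = x * y^2 / 39
joint <- outer(1:2, 2:3, function(x, y) x * y^2 / 39)   # rows x = 1,2; cols y = 2,3
dimnames(joint) <- list(x = c("1", "2"), y = c("2", "3"))
pY <- colSums(joint)        # marginal pmf of Y
joint["1", "2"] / pY["2"]   # P(X = 1 | Y = 2) = 1/3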

When the r.v. X and Y are continuous,

f_{Y|X=x}(y) = f_{Y|x}(y) = f_{X,Y}(x, y) / f_X(x)

See Example 3.11.5.
Practice: 3.11.2, 3.11.11, 3.11.16


3.12 Moment Generating Functions

Recall that E(X^k) is called the kth moment of X. How do we obtain moments? By the definition; but a moment generating function can also be used to find them.

Definition 3.12.1. Let W be a random variable. The moment generating function (mgf) for W is denoted M_W(t) and given by

M_W(t) = E(e^{tW}) = \sum_{all k} e^{tk} p_W(k) if W is discrete
M_W(t) = E(e^{tW}) = \int_{−∞}^{∞} e^{tw} f_W(w) dw if W is continuous

at all values of t for which the expected value exists.

Example 3.12.1. Suppose the r.v. X has a geometric pdf,

p_X(k) = (1 − p)^{k−1} p, k = 1, 2, \cdots

[In practice, this is the pdf that models the occurrence of the first success in a series of independent trials, where each trial has probability p of ending in success (recall Example 3.3.2).] Find M_X(t), the moment generating function for X.

Since X is discrete, the first part of Definition 3.12.1 applies, so

M_X(t) = E(e^{tX}) = \sum_{k=1}^{\infty} e^{tk} (1 − p)^{k−1} p
 = [p/(1 − p)] \sum_{k=1}^{\infty} e^{tk} (1 − p)^k
 = [p/(1 − p)] \sum_{k=1}^{\infty} [(1 − p)e^t]^k

The term t in M_X(t) can be any number in a neighborhood of zero, as long as M_X(t) < ∞. Here M_X(t) is an infinite sum of the terms [(1 − p)e^t]^k; the sum will be finite only if (1 − p)e^t < 1, or equivalently, if t < ln[1/(1 − p)]. It is assumed, then, in what follows that t < ln[1/(1 − p)].

Recall that

\sum_{k=0}^{\infty} r^k = 1/(1 − r)

provided that 0 < r < 1. This formula can be used here with r = (1 − p)e^t, where t < ln[1/(1 − p)]. Specifically,

M_X(t) = [p/(1 − p)] { \sum_{k=0}^{\infty} [(1 − p)e^t]^k − [(1 − p)e^t]^0 }
 = [p/(1 − p)] [ 1/(1 − (1 − p)e^t) − 1 ]
 = pe^t / (1 − (1 − p)e^t)
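A numerical check of this formula (a sketch, with arbitrarily chosen p = 0.3 and t = 0.2, which satisfies t < ln[1/(1 − p)], about 0.357): note that R's rgeom counts failures before the first success, so the variable of Example 3.12.1 is rgeom(...) + 1.

# Monte Carlo estimate of E(e^{tX}) versus the closed-form geometric mgf
set.seed(6)
p <- 0.3; t <- 0.2
x <- rgeom(100000, prob = p) + 1       # geometric on 1, 2, 3, ...
mean(exp(t * x))                       # simulation estimate
p * exp(t) / (1 - (1 - p) * exp(t))    # closed form, about 2.53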

Example 3.12.2. Suppose that X is a binomial random variable with pdf

p_X(x) = \binom{n}{x} p^x (1 − p)^{n−x}, x = 0, 1, 2, \cdots, n

Find M_X(t).


By Definition 3.12.1,

M_X(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1 − p)^{n−x}
 = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1 − p)^{n−x}

We know that (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n−k} for any a and b. Suppose we let a = pe^t and b = 1 − p. Then

M_X(t) = (1 − p + pe^t)^n
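Because a binomial random variable takes only finitely many values, E(e^{tX}) can be computed exactly in R and compared with the closed form. The values n = 30, p = 0.6, t = 0.4 below are arbitrary.

# Exact check of the binomial mgf M_X(t) = (1 - p + p e^t)^n
n <- 30; p <- 0.6; t <- 0.4
sum(exp(t * (0:n)) * dbinom(0:n, size = n, prob = p))   # E(e^{tX}) by direct summation
(1 - p + p * exp(t))^n                                  # closed form; the two agree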

Example 3.12.3. Suppose Y has an exponential pdf, where f_Y(y) = λe^{−λy}, y > 0. Then

M_Y(t) = E(e^{tY}) = \int_0^{\infty} e^{ty} λe^{−λy} dy = \int_0^{\infty} λe^{−(λ−t)y} dy

Using the substitution u = (λ − t)y,

M_Y(t) = \int_{u=0}^{\infty} λe^{−u} du/(λ − t) = λ/(λ − t)

Note: Here M_Y(t) is finite and nonzero only when u = (λ − t)y > 0, which implies that t must be less than λ. For t ≥ λ, M_Y(t) fails to exist.
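The defining integral can also be evaluated numerically with R's integrate(); λ = 2 and t = 0.5 below are arbitrary values satisfying t < λ.

# Numerical check of M_Y(t) = lambda/(lambda - t) for the exponential
lambda <- 2; t <- 0.5
integrate(function(y) exp(t * y) * lambda * exp(-lambda * y), 0, Inf)$value
lambda / (lambda - t)   # both equal 4/3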

Example 3.12.4. Memorize this result: when Y is normally distributed with mean µ and variance σ^2, it has

M_Y(t) = e^{µt + σ^2t^2/2}

Practice: 3.12.3, 3.12.4, 3.12.5, 3.12.6, 3.12.7

Using Moment Generating Functions to Find Moments

Theorem 3.12.1. Let W be a random variable with probability density function f_W(w). [If W is continuous, f_W(w) must be sufficiently smooth to allow the order of differentiation and integration to be interchanged.] Let M_W(t) be the moment-generating function for W. Then, provided the rth moment exists,

M_W^{(r)}(0) = E(W^r)

The proof is straightforward.

Example 3.12.5. For a geometric random variable

X with pdf

p_X(k) = (1 − p)^{k−1} p, k = 1, 2, \cdots

which from Example 3.12.1 has moment-generating function

M_X(t) = pe^t / (1 − (1 − p)e^t)

Find the expected value of X by differentiating its moment-generating function. Using the product rule, the first derivative of M_X(t) is

M_X^{(1)}(t) = pe^t(−1)[1 − (1 − p)e^t]^{−2}(−1)(1 − p)e^t + [1 − (1 − p)e^t]^{−1} pe^t
 = p(1 − p)e^{2t} / [1 − (1 − p)e^t]^2 + pe^t / (1 − (1 − p)e^t)

Setting t = 0 shows that E(X) = 1/p:

M_X^{(1)}(0) = E(X) = p(1 − p)/p^2 + p/p = 1/p
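A quick numerical confirmation of M_X^{(1)}(0) = 1/p, using a central-difference approximation to the derivative (the choice p = 0.25 is arbitrary):

# Central-difference estimate of the derivative of the geometric mgf at t = 0
p <- 0.25; h <- 1e-6
M <- function(t) p * exp(t) / (1 - (1 - p) * exp(t))
(M(h) - M(-h)) / (2 * h)   # close to E(X) = 1/p = 4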

Example 3.12.6. Find the expected value of an exponential random variable with pdf f_Y(y) = λe^{−λy}, y > 0. Use the fact that M_Y(t) = λ/(λ − t). Then

M_Y^{(1)}(t) = λ(−1)(λ − t)^{−2}(−1) = λ/(λ − t)^2

Set t = 0; then

M_Y^{(1)}(0) = E(Y) = 1/λ


Using Moment Generating Functions to Find Variances

Because Var(Y) = E(Y − µ)^2 = E(Y^2) − [E(Y)]^2, when M_Y(t) is available,

Var(Y) = M_Y^{(2)}(0) − [M_Y^{(1)}(0)]^2

Example 3.12.9. A discrete random variable X is said to have a Poisson distribution if

p_X(x) = P(X = x) = e^{−λ}λ^x/x!, x = 0, 1, 2, \cdots

It can be shown (Question 3.12.7) that the moment-generating function for a Poisson random variable is given by

M_X(t) = e^{−λ + λe^t}

Use M_X(t) to find E(X) and Var(X):

M_X^{(1)}(0) = E(X) = λ
M_X^{(2)}(0) = E(X^2) = λ + λ^2
Var(X) = E(X^2) − [E(X)]^2 = λ
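R's symbolic derivative D() can reproduce this calculation. The sketch below differentiates the Poisson mgf twice and evaluates at t = 0; λ = 4 is an arbitrary choice.

# E(X) and Var(X) for the Poisson via derivatives of its mgf
lambda <- 4
M  <- expression(exp(-lambda + lambda * exp(t)))
M1 <- D(M, "t")                                 # first derivative
M2 <- D(M1, "t")                                # second derivative
EX  <- eval(M1, list(t = 0, lambda = lambda))   # E(X) = lambda = 4
EX2 <- eval(M2, list(t = 0, lambda = lambda))   # E(X^2) = lambda + lambda^2 = 20
EX2 - EX^2                                      # Var(X) = lambda = 4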

Homework: 3.12.9, 3.12.11, 3.12.16

Using Moment-Generating Functions to Identify Pdfs

Finding moments is not the only application of moment-generating functions. They are also used to identify the pdf of a sum of random variables, that is, to find f_W(w), where W = W_1 + W_2 + \cdots + W_n.

Theorem 3.12.2. Suppose that W_1 and W_2 are random variables for which M_{W_1}(t) = M_{W_2}(t) for some interval of t's containing 0. Then f_{W_1}(w) = f_{W_2}(w).

Theorem 3.12.3.

a. Let W be a random variable with moment generating function M_W(t). Let V = aW + b. Then

M_V(t) = e^{bt} M_W(at)


b. Let W_1, W_2, \cdots, W_n be independent random variables with moment generating functions M_{W_1}(t), M_{W_2}(t), \cdots, M_{W_n}(t), respectively. Let W = W_1 + W_2 + \cdots + W_n. Then

M_W(t) = M_{W_1}(t) \cdot M_{W_2}(t) \cdots M_{W_n}(t)

Example 3.12.10. Suppose that X_1 and X_2 are two independent Poisson random variables with parameters λ_1 and λ_2, respectively. That is,

p_{X_1}(x) = P(X_1 = x) = e^{−λ_1}λ_1^x/x!, x = 0, 1, 2, \cdots

and

p_{X_2}(x) = P(X_2 = x) = e^{−λ_2}λ_2^x/x!, x = 0, 1, 2, \cdots

Let X = X_1 + X_2. What is the pdf of X?

M_{X_1}(t) = e^{−λ_1 + λ_1e^t}

and

M_{X_2}(t) = e^{−λ_2 + λ_2e^t}

Moreover, when X = X_1 + X_2,


M_X(t) = M_{X_1}(t) \cdot M_{X_2}(t)
 = e^{−λ_1 + λ_1e^t} \cdot e^{−λ_2 + λ_2e^t}
 = e^{−(λ_1 + λ_2) + (λ_1 + λ_2)e^t}    (3.12.1)

By inspection, (3.12.1) is the moment generating function for a Poisson random variable with λ = λ_1 + λ_2.
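A simulation check of Example 3.12.10 (a sketch, with arbitrary λ_1 = 1.5 and λ_2 = 2.5): the empirical distribution of X_1 + X_2 should match the Poisson(4) pmf.

# Sum of independent Poissons compared with a single Poisson(lambda1 + lambda2)
set.seed(11)
x <- rpois(100000, lambda = 1.5) + rpois(100000, lambda = 2.5)
prop.table(table(x))[as.character(0:4)]   # empirical P(X = 0), ..., P(X = 4)
dpois(0:4, lambda = 4)                    # theoretical pmf values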

Example 3.12. The normal random variable Y with mean µ and variance σ^2 has pdf

f_Y(y) = (1/(\sqrt{2π} σ)) exp[−(1/2)((y − µ)/σ)^2], −∞ < y < ∞

and mgf

M_Y(t) = e^{µt + σ^2t^2/2}

By definition, the standard normal distribution, denoted Z, has µ = 0 and σ^2 = 1. Then

f_Z(z) = (1/\sqrt{2π}) exp[−(1/2)z^2], −∞ < z < ∞

M_Z(t) = e^{t^2/2}


Show that the ratio (Y − µ)/σ is a standard normal variable, Z.

Write (Y − µ)/σ as (1/σ)Y − µ/σ. From Theorem 3.12.3(a), with a = 1/σ and b = −µ/σ,

M_{(Y−µ)/σ}(t) = e^{bt} M_Y(at) = e^{−µt/σ} M_Y(t/σ)
 = e^{−µt/σ} \cdot e^{µt/σ + σ^2(t/σ)^2/2}
 = e^{t^2/2}

But M_Z(t) = e^{t^2/2}, so it follows that the pdf for (Y − µ)/σ is the same as the pdf f_Z(z). We call (Y − µ)/σ a Z transformation.
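The Z transformation is easy to see empirically. The sketch below standardizes draws from an arbitrarily chosen N(5, 3^2) population; the standardized values behave like a standard normal.

# Standardized normal draws behave like Z
set.seed(12)
y <- rnorm(100000, mean = 5, sd = 3)
z <- (y - 5) / 3
c(mean(z), sd(z))   # close to 0 and 1
mean(z <= 1.96)     # close to pnorm(1.96) = 0.975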

Practice: 3.12.23

Homework: 3.12.19, 3.12.20, 3.12.21 and 3.12.22


3.13 Taking a second look at statistics (Interpreting Means)

Please Read.

Appendix 3.A.1: R Application

Downloading R: http://www.r-project.org/

For the binomial distribution:
http://www.stat.umn.edu/geyer/old/5101/rlook.html#dbinom

For the exponential distribution and other continuous distributions:
http://msenux.redwoods.edu/math/R/ContinuousDistributions.php
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/Exponential.html

#3.13 R application

#Binomial Distribution

#dbinom - evaluate P(X=k) where X is Binomial(n,p)

dbinom(10, size=30, prob=0.60) #dbinom(x,n,p)

x=seq(0,30,by=1)

y=dbinom(x, size=30, prob=0.60)

plot(x,y,main=’Binomial pdf n=30, p=0.60’)

#plot(dbinom(x, size=30, prob=0.60)) #dbinom(x,n,p)

require(graphics)

# Compute P(5 < X < 18) for X Binomial(30,0.6)

sum(dbinom(6:17, 30, 0.6))

# Compute P(-1 < X < 16) for X Binomial(30,0.6);cdf F_X(15)

sum(dbinom(0:15, 30, 0.6))

#########cdf#############################


pbinom(15, size=30, prob=0.6) # F_X(15)

pbinom(15, 30, 0.6)

cdf.bi=pbinom(x, 30, 0.6) #F_X(x)

cdf.bi

plot(x,cdf.bi)

############inverse probability##########

#qbinom(q, n, p) = smallest x such that P(X <= x) >= q
# so pbinom(qbinom(q, n, p), n, p) >= q
#q is the quantile level (a probability); see ?qbinom

qbinom(0.1, 30, 0.6)

qbinom(0.17, 30, 0.6)

qbinom(0.2, 30, 0.6)

# and so forth, or all at once with

qbinom(seq(0.1, 0.9, 0.1), 10, 1/3)


################################################

#Exponential Distribution

#dexp(x, rate = 1, log = FALSE); f_X(x)

#pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE); F_X(x)

#qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE)

x=seq(0,4,length=200)

y=dexp(x,rate=1) # lambda=1

plot(x,y,type="l",lwd=2,col="red",ylab="p")

x=seq(0,4,length=200)

y=dexp(x,rate=0.5) # lambda=0.5

plot(x,y,type="l",lwd=2,col="red",ylab="p")

#cdf ;F_X(1)

pexp(1, rate = 1, lower.tail = TRUE, log.p = FALSE)

#inverse probability


qexp(p=0.632, rate = 1, lower.tail = TRUE, log.p = FALSE)

#################################################

#Normal Distribution

#REF: http://msenux.redwoods.edu/math/R/normal.php

#

#standard normal Distribution

x=seq(-4,4,length=200)

y=1/sqrt(2*pi)*exp(-x^2/2)

plot(x,y,type="l",lwd=2,col="red")

#Alternative approach

x=seq(-4,4,length=200)

y=dnorm(x,mean=0,sd=1)

plot(x,y,type="l",lwd=2,col="red")


x=seq(-8,8,length=500)

y1=dnorm(x,mean=0,sd=1)

plot(x,y1,type="l",lwd=2,col="red")

y2=dnorm(x,mean=0,sd=2)

lines(x,y2,type="l",lwd=2,col="blue")

legend("topright",c("sigma=1","sigma=2"),lty=c(1,1),col=c("red","blue"))

###########################################

#Poisson

#dpois(x, lambda, log = FALSE)

#ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)

#qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)

#Poisson Distribution

#The Poisson distribution is the probability distribution

# of independent event occurrences in an interval.

#If lambda is the mean occurrence per interval, then the

# probability of having x occurrences within a given interval is:


# f(x) = (lambda)^x e^{-(lambda)} /x! where x = 0,1,2,3,...

#Problem

#If there are twelve cars crossing a bridge per minute on average,

#find the probability of having seventeen or more cars crossing

#the bridge in a particular minute.

#Solution

#The probability of having sixteen or less cars crossing the bridge

#in a particular minute is given by the function ppois.

ppois(16, lambda=12) # lower tail

#[1] 0.89871

#Hence the probability of having seventeen or more cars crossing

#the bridge in a minute is in the upper tail of the probability density function.

ppois(16, lambda=12, lower=FALSE) # upper tail

#[1] 0.10129


#Answer

#If there are twelve cars crossing a bridge per minute on average,

# the probability of having seventeen or more cars crossing the bridge

# in a particular minute is 10.1%.