chapter 5: probabilitycparrish.sewanee.edu/stat204 s2017/notes/part 03...


Page 1: Chapter 5: Probabilitycparrish.sewanee.edu/stat204 S2017/notes/part 03 inference/05_probability.pdf · Distributions of random variables Calculation of probability using a continuous

Stat 204, Part 3 Probability

Chapter 5: Probability

These notes reflect material from our text, Exploring the Practice of Statistics, by Moore, McCabe, and Craig, published by Freeman, 2014.

Probability

Probability quantifies randomness. It is a formal framework with a very specific vocabulary and notation. Imagine an experiment with a specific set of outcomes (say, flipping a fair coin twice). S is the sample space of all possible outcomes. Subsets of S are called events and are denoted with letters like A and B. The empty set, φ, is the event that contains no outcomes. Two events are disjoint if their intersection is empty.

The Russian mathematician Kolmogorov helped to clarify the essential properties of a probability function, P.

• P(S) = 1 for the entire sample space S

• 0 ≤ P(A) ≤ 1 for any event A ⊂ S

• P(∪_{i=1}^{n} A_i) = ∑_{i=1}^{n} P(A_i) for disjoint events A_i

First examples : flip a coin, flip three coins, roll a die, roll two dice

If you roll a die once the result is completely uncertain, because the individual outcomes are equally likely. But now begin to methodically roll the die and after each toss calculate the total number of 6’s observed so far divided by the total number of rolls at this point. Call this a cumulative proportion and graph these cumulative proportions for a large number of rolls of the die, say 100,000 rolls. A computer did this and displayed the following graph. In this particular simulation, the first ten rolls of the die produced the sequence 0001010010, where 1 means a 6 was rolled and 0 means something else appeared. Calculate the first ten cumulative proportions for this short sequence and compare your results to the following chart. What is the height of the dotted red line?

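[Figure: cumulative proportion p̂_n (vertical axis, 0.00 to 0.30) against n, the number of rolls (horizontal axis marked at 1, 10, 100, 1,000, 10,000, 100,000), with a dotted red horizontal line.]

Fig. Cumulative proportions of a 6 in 100,000 rolls of a fair die, from OpenIntro Statistics, chapter 2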
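The experiment described above can be reproduced with a short simulation. The following R sketch is not the code behind the OpenIntro figure; the seed and the variable names are arbitrary choices made here for illustration, so the simulated rolls will differ from the sequence quoted in the notes.

```r
# Simulate rolls of a fair die and track the cumulative proportion of 6's.
set.seed(204)                          # arbitrary seed, only for reproducibility
n <- 100000
rolls <- sample(1:6, size = n, replace = TRUE)
is_six <- as.integer(rolls == 6)       # 1 if a 6 was rolled, 0 otherwise
cum_prop <- cumsum(is_six) / seq_len(n)

# The first ten rolls quoted in the notes, coded the same way.
first_ten <- c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0)
first_ten_sums <- cumsum(first_ten)                        # running count of 6's
first_ten_props <- first_ten_sums / seq_along(first_ten)   # running proportion

print(first_ten_sums)
print(round(first_ten_props, 3))
print(cum_prop[n])                     # proportion of 6's after all n rolls

# Uncomment to draw a graph like the one in the figure (logarithmic x axis):
# plot(seq_len(n), cum_prop, type = "l", log = "x",
#      xlab = "n (number of rolls)", ylab = "cumulative proportion")
```

Changing the seed produces a different path for small n, but the cumulative proportion should settle near the same value for large n.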

Display discrete probabilities in a table

Flip a fair coin

outcome       h     t
probability   0.5   0.5

Spring 2017 Page 1 of 12


Venn diagram

[Figure: Venn diagram of two events, A and B.]

Rules of Probability

Mutually exclusive events. A ∩ B = φ

Unions. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Complements. P(A^c) = 1 − P(A).

Independent events. P(A ∩ B) = P(A)P(B) when A and B are independent.

Conditional probability. P(A | B) = P(A ∩ B)/P(B) when P(B) ≠ 0

Intersections. P(A ∩ B) = P(A | B)P(B)
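These rules can be checked on a small sample space. The sketch below uses one roll of a fair die; the events A (an even number) and B (a number of at least 4) and the helper function P are illustrative choices, not taken from the text.

```r
# Sample space for one roll of a fair die; all outcomes equally likely.
S <- 1:6
P <- function(E) length(E) / length(S)   # hypothetical helper: P(E) = |E| / |S|

A <- c(2, 4, 6)                # event: the roll is even
B <- c(4, 5, 6)                # event: the roll is at least 4
AandB <- intersect(A, B)       # outcomes in both A and B

p_union      <- P(A) + P(B) - P(AandB)   # unions rule
p_complement <- 1 - P(A)                 # complements rule
p_A_given_B  <- P(AandB) / P(B)          # conditional probability

# A and B are independent only if P(A and B) equals P(A) * P(B).
independent <- isTRUE(all.equal(P(AandB), P(A) * P(B)))

print(c(union = p_union, complement = p_complement, A_given_B = p_A_given_B))
print(independent)
```

Here P(A ∩ B) = 2/6 while P(A)P(B) = 1/4, so these two events are not independent.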


Contingency tables and conditional probabilities

Vocabulary for diagnostic testing, where S denotes that the medical state is present and POS denotes a positive test result:

sensitivity P(POS | S), specificity P(NEG | S^c), incidence P(S)

Consider the Triple Blood Test for Down Syndrome (Agresti and Franklin, chapter 5, pp. 232-233).

                        Blood Test
Status               POS     NEG    Total
D (Down)              48       6       54
D^c (unaffected)    1307    3921     5228
Total               1355    3927     5282

Calculate the following probabilities based on the figures in this study:

sensitivity P(POS | D), specificity P(NEG | D^c), incidence P(D)

false positives P(POS | D^c), false negatives P(NEG | D)

An individual being tested would be most concerned about P(D | POS). What is this probability? Why is it so small? Hint: Calculate P(D^c | POS).

Again, an individual being tested would want to know P(D | NEG). How would that probability compare to the a priori P(D)?

[Figure: "Triple Blood Test", a plot of status (Down, unaffected) against blood test (POS, NEG).]


Using R to Compute Conditional Probabilities

Construct a matrix named down to represent the Down Syndrome contingency table, and then use addmargins(down) to compute its row and column totals.

down <- c(48, 1307, 6, 3921)
dim(down) <- c(2, 2)
dimnames(down) <- list(status = c("down", "unaffected"),
                       "blood test" = c("pos", "neg"))
down
#             blood test
# status        pos  neg
#   down         48    6
#   unaffected 1307 3921

addmargins(down)
#             blood test
# status        pos  neg  Sum
#   down         48    6   54
#   unaffected 1307 3921 5228
#   Sum        1355 3927 5282

Then prop.table(down, 1) will divide each row by its row sum. The numbers in each row are conditional probabilities. And prop.table(down, 2) will divide each column by its column sum. The numbers in each column are conditional probabilities. Therefore, each of the eight numbers shown below is a conditional probability of the form P(A | B) for some A and B. Identify the correct A and B for each number.

prop.table(down, 1)
#             blood test
# status             pos       neg
#   down       0.8888889 0.1111111
#   unaffected 0.2500000 0.7500000

prop.table(down, 2)
#             blood test
# status              pos         neg
#   down       0.03542435 0.001527884
#   unaffected 0.96457565 0.998472116

What values do these tables indicate for P(pos | down) and P(down | pos)?
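The column-wise table can be cross-checked by combining the conditional probability and intersection rules (often called Bayes' rule, a name these notes do not use). The sketch below rebuilds the table with matrix() so that it runs on its own; the intermediate variable names are illustrative.

```r
# Rebuild the contingency table so that this block is self-contained.
down <- matrix(c(48, 1307, 6, 3921), nrow = 2,
               dimnames = list(status = c("down", "unaffected"),
                               "blood test" = c("pos", "neg")))
n <- sum(down)

incidence   <- sum(down["down", ]) / n                                # P(D)
sensitivity <- down["down", "pos"] / sum(down["down", ])              # P(POS | D)
specificity <- down["unaffected", "neg"] / sum(down["unaffected", ])  # P(NEG | D^c)

# P(POS) = P(POS | D) P(D) + P(POS | D^c) P(D^c)
p_pos <- sensitivity * incidence + (1 - specificity) * (1 - incidence)

# P(D | POS) = P(POS | D) P(D) / P(POS)
p_down_given_pos <- sensitivity * incidence / p_pos

print(p_down_given_pos)
print(prop.table(down, 2)["down", "pos"])   # same quantity, read from the table
```

Both printed values should agree, which confirms that the entry in the pos column of prop.table(down, 2) is a probability conditional on a positive test.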


Boston Smallpox Epidemic of 1721

The following contingency table (OpenIntro Statistics, pp. 83–87) refers to the Boston smallpox epidemic of 1721. A total of 6224 residents of Boston contracted smallpox in this epidemic and 850 of them died. The epidemic was marked by vigorous public debate of the value (or lack thereof) of a type of inoculation known as variolation (which was dangerous). The Reverend Cotton Mather advocated inoculation but the physician William Douglass was firmly against it. See the article in Harvard’s Contagion for more details. An effective smallpox vaccination procedure was eventually demonstrated by Edward Jenner in England in 1796, and succeeding efforts to eradicate smallpox from the world were finally declared to be successful in 1980 by the World Health Organization. Cotton Mather, on the other hand, lives on in infamy for his role in the Salem witch trials.

             Inoculated
Result      yes      no    Total
lived       238    5136     5374
died          6     844      850
Total       244    5980     6224

[Figure: "Smallpox Epidemic, Boston, 1721", a plot of result (lived, died) against inoculated (yes, no).]


Tree Diagrams

The following tree diagram, generated by OpenIntro software, summarizes the relevant statistics for the Boston smallpox epidemic of 1721. Here Inoculated is a categorical explanatory variable with levels yes and no. In the Inoculated column of the tree diagram are the probabilities P(yes) and P(no). The categorical response variable Result has levels lived and died. The conditional probabilities in the Result column are

P(lived | yes), P(died | yes), P(lived | no), P(died | no).

The probabilities calculated by the software in the third column are

P(lived and yes), P(died and yes), P(lived and no), P(died and no),

because P(A) × P(B | A) = P(A ∩ B).

Inoculated     Result           Product
yes, 0.0392    lived, 0.9754    0.0392*0.9754 = 0.03824
               died, 0.0246     0.0392*0.0246 = 0.00096
no, 0.9608     lived, 0.8589    0.9608*0.8589 = 0.82523
               died, 0.1411     0.9608*0.1411 = 0.13557

Fig. Smallpox in Boston, 1721, from OpenIntro Statistics, chapter 2, pp.83-87
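The three columns of the tree diagram can be recomputed from the contingency table. The sketch below is not the OpenIntro software; the matrix name smallpox and the use of sweep() are choices made here for illustration.

```r
# Counts from the Boston smallpox contingency table.
smallpox <- matrix(c(238, 6, 5136, 844), nrow = 2,
                   dimnames = list(result = c("lived", "died"),
                                   inoculated = c("yes", "no")))

# First column of the tree: P(yes) and P(no).
p_inoc <- colSums(smallpox) / sum(smallpox)

# Second column: P(result | inoculated), each column divided by its column sum.
p_result_given_inoc <- prop.table(smallpox, 2)

# Third column: P(inoculated) * P(result | inoculated) = P(result and inoculated).
joint <- sweep(p_result_given_inoc, 2, p_inoc, `*`)

print(round(p_inoc, 4))
print(round(p_result_given_inoc, 4))
print(round(joint, 5))
```

The joint probabilities computed this way use unrounded factors, so they can differ in the last digits from the products of rounded values shown in the tree diagram.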


Random variables

A random variable is a function from the sample space, S, of an experiment to the real numbers, X : S → R, so we might characterize a random variable as a function which assigns a numerical value to an outcome of an experiment. Random variables can be defined on discrete and on continuous sample spaces.

discrete: flip a coin, flip three coins, roll a die, roll two dice, roulette wheel, spinner

continuous: random number generators: U [0, 1], N(0, 1), N(µ, σ)

Expected value of a random variable, E(X) = µ_X

Variance of a random variable, Var(X) = σ_X^2

Linear combinations of random variables, Y = aX_1 + bX_2

Expected value and variance of a linear combination of random variables. If Y = aX_1 + bX_2, then

E(Y) = aE(X_1) + bE(X_2),

and, when X_1 and X_2 are independent,

Var(Y) = a^2 Var(X_1) + b^2 Var(X_2).
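A small exact calculation can make these formulas concrete. The sketch below takes X_1 and X_2 to be two independent rolls of a fair die and uses the arbitrary coefficients a = 2 and b = 3; all of these choices are illustrative assumptions, and the variance formula is checked only for this independent case.

```r
# One fair die: values and probabilities.
faces <- 1:6
prob  <- rep(1/6, 6)
mu_x  <- sum(faces * prob)               # E(X)
var_x <- sum((faces - mu_x)^2 * prob)    # Var(X)

# Y = a*X1 + b*X2 for two independent dice; a and b are arbitrary examples.
a <- 2
b <- 3
pairs <- expand.grid(x1 = faces, x2 = faces)   # all 36 equally likely pairs
y     <- a * pairs$x1 + b * pairs$x2
p_y   <- rep(1/36, nrow(pairs))

mu_y  <- sum(y * p_y)                    # E(Y) computed directly
var_y <- sum((y - mu_y)^2 * p_y)         # Var(Y) computed directly

print(c(direct = mu_y,  formula = a * mu_x + b * mu_x))
print(c(direct = var_y, formula = a^2 * var_x + b^2 * var_x))
```

The direct and formula values agree in both rows, as the two identities predict for independent X_1 and X_2.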

Distributions of random variables

Calculation of probability using a continuous distribution, P(X ≤ x). The area of the blue region in the following figure is the probability that the random variable X ∼ N(µ, σ) takes on a value less than or equal to 5. That probability is denoted P(X ≤ 5).

[Figure: density curve of X ~ N(µ, σ), x axis from -3 to 9, with the region where X ≤ 5 shaded blue.]

Normal distributions

Normal random variable, X ∼ N(µ, σ). z-Score, z = (x − µ)/σ. If z = (x − µ)/σ, then x = µ + z × σ.


Standardized normal random variable, Z ∼ N(0, 1). If X ∼ N(µ, σ) and Z = (X − µ)/σ, then Z ∼ N(0, 1). This is why our textbook need only contain a table of values for the standard normal distribution.

Areas of regions under a normal distribution curve. Percentiles. The 68-95-99.7% rule. Q-Q plots.

Calculations with X ∼ N(0, 1)

Suppose that the random variable X has a standard normal distribution, X ∼ N(0, 1).

[Figure: density curve of X ~ N(0, 1), x axis from -3 to 3, vertical axis from 0.0 to 0.4.]

There are four useful procedures in R for working with normal distributions:

dnorm, pnorm, qnorm, rnorm.

a. pnorm(2) ⇒ P(X ≤ 2)

b. pnorm(2) - pnorm(-2) ⇒ P(−2 ≤ X ≤ 2)

c. 1 - pnorm(2) ⇒ P(X ≥ 2)

d. qnorm(0.60) ⇒ q60 such that P(X ≤ q60) = 0.60, the 60th percentile

e. rnorm(3) ⇒ three random numbers from the standard normal distribution, for instance

0.3612443 0.1075216 -1.0473477

f. dnorm() used for drawing the graph of the bell curve
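The calls in items a through f can be run directly. In the sketch below the variable names and the seed are illustrative choices; because rnorm() is random, its draws will not match the three numbers shown in item e.

```r
# a. P(X <= 2) for X ~ N(0, 1)
p_le_2 <- pnorm(2)

# b. P(-2 <= X <= 2)
p_between <- pnorm(2) - pnorm(-2)

# c. P(X >= 2)
p_ge_2 <- 1 - pnorm(2)

# d. 60th percentile: the value q60 with P(X <= q60) = 0.60
q60 <- qnorm(0.60)

# e. three random draws from N(0, 1); the seed is arbitrary
set.seed(1)
draws <- rnorm(3)

# f. heights of the bell curve on a grid, ready for plotting
x <- seq(-3, 3, by = 0.01)
y <- dnorm(x)
# plot(x, y, type = "l", main = "X ~ N(0, 1)")

print(round(c(p_le_2 = p_le_2, p_between = p_between, p_ge_2 = p_ge_2, q60 = q60), 4))
print(draws)
```

Item b is the 95 part of the 68-95-99.7% rule, and pnorm(q60) returns 0.60, which shows that qnorm and pnorm are inverses of each other.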


Calculations with X ∼ N(µ, σ)

Agresti and Franklin report that female students at the University of Georgia have an approximately normal height distribution, with mean µ_W = 65 inches and standard deviation σ_W = 3.5 inches. Male students have an approximately normal height distribution, with mean µ_M = 70 inches and standard deviation σ_M = 4.0 inches. Let W ∼ N(µ_W, σ_W) and M ∼ N(µ_M, σ_M), and calculate the following (using R and using Agresti and Franklin, Appendix A, pp. A-1 and A-2):

P(W ≤ 66), P(M ≥ 72), q such that P(W ≤ q) = 0.30, q such that P(M ≥ q) = 0.25

Calculate the z-score of a person with W = 63, of a person with M = 67. How tall is a woman with z-score 0.6? How tall is a man with z-score -0.7? See the Answers section of these notes for R expressions which will calculate the answers to these questions.

[Figure: "Men's and Women's Heights", density curves for men and women against height (in), from 55 to 85.]

Student’s t, Chi-Square, F

Student’s t, Chi-Square, and F distributions play key roles in the sequel. All of them are families of continuous distributions. Student’s t distributions resemble Normal distributions but they have fatter tails. Chi-Square and F distributions are defined on the half line [0, ∞), so neither one is symmetric.

Discrete distributions

For X to be a Bernoulli random variable, and hence have a Bernoulli distribution, X ∼ Bernoulli(p), we require

i. a binary outcome for a single event (generally coded as success, 1, or failure, 0)

ii. a fixed probability of success, P(X = 1) = p, and failure, P(X = 0) = 1− p, for that event

iii. exactly one event

Examples of Bernoulli random variables include the outcome of a coin flip (h or t), or driver was wearing a seat belt (yes or no), or basketball player made a basket (1 or 0).


Expected value and variance of a Bernoulli random variable, X ∼ Bernoulli(p):

Expected value, µ_X = p.

Variance, σ_X^2 = p(1 − p).

[Figure: "Bernoulli distribution, p=1/6", probability density (0.0 to 1.0) against k = 0, 1.]

Binomial random variable, X ∼ Binomial(n, p). The probability of k successes in n trials. Expected value, µ_X = np. Variance, σ_X^2 = np(1 − p). Normal approximation to a binomial distribution.

[Figure: "binomial distribution, p=1/6, n=10", probability density (0.00 to 0.30) against k from 0 to 10.]


Conditions for a binomial distribution

For X to be a binomial random variable, and hence have a binomial distribution, X ∼ Binomial(n, p), we require

i. a binary outcome for each event (coin flip produces h or t)

ii. a single fixed probability of success for each event (p = 0.5)

iii. a fixed number of events (n = 10 coin flips)

iv. independent events (the result of one coin flip does not affect the others)
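When these conditions hold, R's dbinom() gives the probability of each possible number of successes. The sketch below uses n = 10 and p = 1/6, the values shown in the figures of these notes, and checks the stated mean and variance numerically; the variable names are illustrative.

```r
# Binomial(n, p) with the values used in the figures of these notes.
n <- 10
p <- 1/6
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)     # P(X = k) for k = 0, 1, ..., n

mu_x  <- sum(k * pmf)                    # expected value from the definition
var_x <- sum((k - mu_x)^2 * pmf)         # variance from the definition

print(round(pmf, 4))
print(c(mean = mu_x, np = n * p))
print(c(variance = var_x, np_1_minus_p = n * p * (1 - p)))

# A Bernoulli(p) random variable is the special case n = 1.
print(dbinom(0:1, size = 1, prob = p))   # P(X = 0) = 1 - p, P(X = 1) = p
```

The mean and variance computed from the probabilities match np and np(1 − p).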

Normal approximations to binomial distributions

The distribution of a binomial random variable, X ∼ Binomial(n, p), has mean np and standard deviation √(np(1 − p)). It can be approximated by a normal probability distribution with the same mean and standard deviation, Y ∼ N(µ = np, σ = √(np(1 − p))). The fit improves as n gets larger.
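One way to see the fit improve is to compare the binomial probabilities with the matching normal density at each integer k. The sketch below does this for p = 1/6 and n = 10, 30, 50, 100; measuring the fit by the largest absolute gap is an assumption made here, not a method taken from the text.

```r
# Largest gap between Binomial(n, p) probabilities and the matching normal
# density, evaluated at k = 0, 1, ..., n.
p <- 1/6
sizes <- c(10, 30, 50, 100)

max_gap <- sapply(sizes, function(n) {
  k     <- 0:n
  mu    <- n * p
  sigma <- sqrt(n * p * (1 - p))
  max(abs(dbinom(k, size = n, prob = p) - dnorm(k, mean = mu, sd = sigma)))
})
names(max_gap) <- paste0("n=", sizes)

print(round(max_gap, 4))
```

The gap for n = 100 is smaller than the gap for n = 10, in line with the statement that the approximation improves as n grows.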

[Figure: four panels titled "binomial distribution, p=1/6" with n=10, n=30, n=50, and n=100, each showing probability density against k.]

Answers

The following R expressions calculate the answers to the questions about heights of men and women at the University of Georgia posed above. For each calculation, draw a corresponding normal curve and shade the area or mark the measurement in question.

pnorm(66, mean = 65, sd = 3.5)
1 - pnorm(72, mean = 70, sd = 4.0)

qnorm(0.30, mean = 65, sd = 3.5)
qnorm(1 - 0.25, mean = 70, sd = 4.0)

z <- (63 - 65) / 3.5
z <- (67 - 70) / 4.0

x <- 65 + 0.6 * 3.5
x <- 70 - 0.7 * 4.0


Exercises

We will attempt to solve some of the following exercises as a community project in class today. Finish these solutions as homework exercises, write them up carefully and clearly, and hand them in at the beginning of class next Friday.

Homework 5a – probability models

Exercises from Sections 5.1, 5.2:
5.2 (graduation rates), 5.3 (free throws), 5.24 (blood types), 5.26 (Canada)

Homework 5b – random variables

Exercises from Sections 5.3, 5.4:
5.46 (households), 5.54 (foreign-born), 5.65 (fruits and veggies), 5.75 (sums)

Homework 5c – binomial distributions and probability rules

Exercises from Sections 5.5, 5.6 and Chapter 5 exercises:
5.94 (music), 5.102 (die), 5.118 (tree diagram), 5.142 (SAT scores)
