STAT 111 Introductory Statistics

Lecture 8: More on the Binomial Distribution

and Sampling Distributions

June 1, 2004

Today’s Topics

• More on the binomial distribution

– Mean and variance

• Sample proportion

• Normal approximation of the binomial

• Continuity correction

• Sampling distribution of sample means

• Central Limit Theorem

Recall: The Binomial Setting

• There are a fixed number n of trials.

• The n trials are all independent.

• Each trial has one of two possible outcomes, labeled “success” and “failure.”

• The probability of success, p, remains the same for each trial.

Recall: The Binomial Distribution

• The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n and p, where

– n is the number of trials

– p is the probability of a success on any trial

• The count X is a discrete random variable, and we abbreviate its distribution by writing X ~ B(n, p).

• Possible values of X are the whole numbers from 0 to n.

The Binomial Distribution

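• If X ~ B(n, p), then

$$P(X = x) = \binom{n}{x} p^{x}(1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x}(1-p)^{n-x}, \qquad x = 0, 1, \ldots, n.$$

• Examples: Let n = 3.

– x = 0: $\binom{3}{0} = \frac{3!}{0!\,3!} = \frac{3 \cdot 2 \cdot 1}{(1)(3 \cdot 2 \cdot 1)} = 1$

– x = 1: $\binom{3}{1} = \frac{3!}{1!\,2!} = \frac{3 \cdot 2 \cdot 1}{(1)(2 \cdot 1)} = 3$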
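The formula can be checked numerically. This is a minimal sketch in Python; `binomial_pmf` is a hypothetical helper name (not from the lecture), and `math.comb` (Python 3.8+) supplies the coefficient n!/(x!(n − x)!).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # X only takes the whole numbers 0, 1, ..., n
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# The two binomial coefficients worked out above for n = 3
print(comb(3, 0), comb(3, 1))

# The probabilities over x = 0, ..., n add up to 1 (here n = 3, p = 0.3)
print(sum(binomial_pmf(x, 3, 0.3) for x in range(4)))
```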

Developing Binomial Probabilities for n = 3

[Tree diagram: three successive trials, each branching into success S (probability p) or failure F (probability 1 – p)]

$P(SSS) = p^3$

$P(SSF) = p^2(1-p)$

$P(SFS) = p^2(1-p)$

$P(SFF) = p(1-p)^2$

$P(FSS) = p^2(1-p)$

$P(FSF) = p(1-p)^2$

$P(FFS) = p(1-p)^2$

$P(FFF) = (1-p)^3$

Binomial Probabilities for n = 3

• Let X be the number of successes in three trials.

X = 0: outcome FFF, so $P(X = 0) = (1-p)^3$

X = 1: outcomes SFF, FSF, FFS, each with probability $p(1-p)^2$, so $P(X = 1) = 3p(1-p)^2$

X = 2: outcomes SSF, SFS, FSS, each with probability $p^2(1-p)$, so $P(X = 2) = 3p^2(1-p)$

X = 3: outcome SSS, so $P(X = 3) = p^3$
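The grouping of the eight outcomes by the number of successes can be reproduced by brute force. A minimal sketch, assuming an arbitrary illustrative value p = 0.4; `outcome_probabilities` is a hypothetical name. It lists every S/F sequence, multiplies the trial probabilities (using independence), and adds them up by count.

```python
from itertools import product


def outcome_probabilities(p, n=3):
    """Sum P(outcome) over all 2**n S/F sequences, grouped by number of successes."""
    totals = {x: 0.0 for x in range(n + 1)}
    for seq in product("SF", repeat=n):
        k = seq.count("S")
        # Independence: multiply p for each S and (1 - p) for each F
        totals[k] += p**k * (1 - p) ** (n - k)
    return totals


p = 0.4  # illustrative value only
probs = outcome_probabilities(p)
print(probs[1], 3 * p * (1 - p) ** 2)  # both are P(X = 1)
```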

Example: Rolling a Die

• Roll a die 4 times and let X be the number of times the number 5 appears.

• “Success” = get a roll of 5, so P(Success) = 1/6.

$P(X = 0) = \frac{4!}{0!\,(4-0)!}(1/6)^0(1-1/6)^{4-0} = 0.4823$

$P(X = 1) = \frac{4!}{1!\,(4-1)!}(1/6)^1(1-1/6)^{4-1} = 0.3858$

$P(X = 2) = \frac{4!}{2!\,(4-2)!}(1/6)^2(1-1/6)^{4-2} = 0.1157$

$P(X = 3) = \frac{4!}{3!\,(4-3)!}(1/6)^3(1-1/6)^{4-3} = 0.0154$

$P(X = 4) = \frac{4!}{4!\,(4-4)!}(1/6)^4(1-1/6)^{4-4} = 0.0008$
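The whole table can be generated in a few lines. A minimal sketch using Python's `math.comb`; the variable names are illustrative.

```python
from math import comb

n, p = 4, 1 / 6  # four rolls; "success" = rolling a 5

# P(X = x) = C(n, x) p^x (1 - p)^(n - x) for each possible count
table = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}
for x, prob in table.items():
    print(f"P(X = {x}) = {prob:.4f}")
```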

Example: Rolling a Die

• Find the probability that we get at least 2 rolls of 5.
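One way to finish the calculation is to apply the complement rule to the rounded probabilities from the previous slide:

$$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - 0.4823 - 0.3858 = 0.1319$$

Adding the remaining terms directly gives the same value: 0.1157 + 0.0154 + 0.0008 = 0.1319.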

Expected Value and Variance of a Binomial Random Variable

• If X ~ B(n, p), then

$$\mu_X = E(X) = np, \qquad \sigma_X^2 = \operatorname{Var}(X) = np(1-p), \qquad \sigma_X = \sqrt{np(1-p)}.$$

Set-up for Derivation

• Let $X_i$ indicate whether the ith trial is a success or a failure:

$X_i = 1$ if the ith trial is a success, and $X_i = 0$ if the ith trial is a failure, for i = 1, 2, …, n.

• $X_1, \ldots, X_n$ are independent and identically distributed with probability distribution

Outcome: 1, 0

Probability: p, 1 – p
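A sketch of how the set-up leads to the mean and variance formulas: the count of successes is the sum of the indicators, $X = X_1 + X_2 + \cdots + X_n$. Each indicator has

$$E(X_i) = 1 \cdot p + 0 \cdot (1-p) = p, \qquad \operatorname{Var}(X_i) = (1-p)^2\,p + (0-p)^2\,(1-p) = p(1-p).$$

Means always add, and because the trials are independent the variances add too:

$$E(X) = \sum_{i=1}^{n} E(X_i) = np, \qquad \operatorname{Var}(X) = \sum_{i=1}^{n} \operatorname{Var}(X_i) = np(1-p).$$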

Binomial Example: Checkout Lanes

• A grocery store has 10 checkout lanes. During a busy hour the probability that any given lane is occupied (has at least one customer) is 0.75. Assume that the lanes are occupied or not occupied independently of each other.

– What is the probability that a customer will find at least one lane unoccupied?

– What is the expected number of occupied lanes?

– What is the standard deviation of the number of occupied lanes?
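A minimal worked sketch, treating the number of occupied lanes as X ~ B(10, 0.75). "At least one unoccupied" is the complement of "all 10 occupied", whose probability is 0.75^10 under the independence assumption.

```python
from math import sqrt

n, p = 10, 0.75  # 10 lanes, each occupied with probability 0.75

# At least one lane free = 1 - P(all 10 lanes occupied) = 1 - p**n
p_some_free = 1 - p**n
mean_occupied = n * p                # E(X) = np
sd_occupied = sqrt(n * p * (1 - p))  # sigma_X = sqrt(np(1 - p))

print(round(p_some_free, 4), mean_occupied, round(sd_occupied, 4))
```

This gives about 0.9437, an expected 7.5 occupied lanes, and a standard deviation of about 1.37 lanes.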

Sample Proportions

• In statistical sampling we often want to estimate the proportion p of “successes” in a population.

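• The sample proportion is defined as

$$\hat{p} = \frac{X}{n} = \frac{\text{count of successes in sample}}{\text{size of sample}}.$$

• If the count X is B(n, p), then the mean and standard deviation of the sample proportion are

$$\mu_{\hat{p}} = E(\hat{p}) = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}.$$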
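A minimal sketch of these two formulas, evaluated for n = 1000 and p = 0.6 (the setting of the histogram two slides later); `sample_proportion_mean_sd` is a hypothetical helper name.

```python
from math import sqrt


def sample_proportion_mean_sd(n, p):
    """Mean and standard deviation of p-hat = X / n when X ~ B(n, p)."""
    return p, sqrt(p * (1 - p) / n)


mean, sd = sample_proportion_mean_sd(1000, 0.6)
print(mean, round(sd, 4))
```

The standard deviation is only about 0.0155, so nearly all of the probability sits close to 0.6.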

Sample Proportions

• Our sample proportion is an unbiased estimator of the population proportion p.

• The variability of our estimator decreases as sample size increases.

• In particular, we must multiply the sample size by 4 if we want to cut the standard deviation in half.
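The factor of 4 comes from the square root in $\sigma_{\hat{p}}$. Writing $\sigma_{\hat{p}}(n)$ for the standard deviation at sample size n (notation used here only for illustration):

$$\sigma_{\hat{p}}(4n) = \sqrt{\frac{p(1-p)}{4n}} = \frac{1}{2}\sqrt{\frac{p(1-p)}{n}} = \frac{1}{2}\,\sigma_{\hat{p}}(n).$$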

Sample Proportions

• The histogram of the distribution of the sample proportion when n = 1000, p = 0.6

[Figure: probability histogram of the sample proportion; horizontal axis p_hat from 0 to 1, vertical axis P(X)]

Normal Approximation for Counts, Proportions

• Let X be the number of successes in an SRS of size n from a large population having proportion p of successes, and let the sample proportion of successes be denoted by $\hat{p} = X/n$.

• Then for large n,

– X is approximately normal with mean np and variance np(1 – p).

– $\hat{p}$ is approximately normal with mean p and variance p(1 – p)/n.

Normal Approximation: Rule of Thumb

• The accuracy of the approximation generally improves as the sample size n increases.

• For any fixed sample size, the approximation is most accurate when p is close to 0.5, and least accurate when p is near 0 or 1.

• As a general rule of thumb, then, we use the normal approximation for values of n and p such that np ≥ 10 and n(1 – p) ≥ 10.

Example

• The Laurier Company’s brand has a market share of 30%. Suppose that in a survey, 1,000 consumers of the product are asked which brand they prefer. What is the probability that more than 32% of the respondents will say they prefer the Laurier brand?
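A minimal sketch of one solution, assuming the 1,000 respondents behave like an SRS so that the sample proportion is approximately normal with mean 0.30 and standard deviation $\sqrt{0.3 \cdot 0.7 / 1000}$. No continuity correction is applied here. It uses Python's standard `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.30
sd = sqrt(p * (1 - p) / n)  # standard deviation of p-hat, about 0.0145

# Rule of thumb holds: np = 300 and n(1 - p) = 700 are both >= 10
prob = 1 - NormalDist(mu=p, sigma=sd).cdf(0.32)  # P(p-hat > 0.32)
print(round(prob, 4))
```

This corresponds to $P(Z > 1.38) \approx 0.084$.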

Another Example

• A quality engineer selects an SRS of 100 switches from a large shipment for detailed inspection. Unknown to the engineer, 10% of the switches in the shipment fail to meet the specifications. The actual binomial probability that no more than 9 of the switches in the sample fail inspection is P(X ≤ 9) = 0.4513.

• How accurate is the normal approximation for this probability?

Another Example (cont.)

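• Let X be the number of bad switches; then X ~ B(100, 0.1), and the normal approximation gives

$$P(X \le 9) = P\left(\frac{X - np}{\sqrt{np(1-p)}} \le \frac{9 - 100(0.1)}{\sqrt{100(0.1)(1-0.1)}}\right) = P(Z \le -0.33) = 0.3707.$$

• It's not that accurate: 0.3707 versus the exact 0.4513. Note that np = 10, so n and p are on the border of values for which we are willing to use the approximation.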
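Both numbers can be reproduced directly. A minimal sketch: the exact value sums the binomial probabilities for 0 through 9, and the approximation evaluates the N(np, √(np(1 − p))) CDF at 9 with no continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.1

# Exact binomial P(X <= 9): add the pmf for x = 0, ..., 9
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(10))

# Normal approximation without continuity correction
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(9)

print(round(exact, 4), round(approx, 4))
```

Without rounding z to -0.33, the approximation is about 0.3694 rather than 0.3707. The gap from 0.4513 remains.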

Continuity Correction

• While the binomial distribution places probability exactly on X = 9 and X = 10, the normal distribution spreads probability continuously in that interval.

• The bar for X = 9 in a probability histogram goes from 8.5 to 9.5, but calculating P(X ≤ 9) using the normal approximation only includes the area to the left of the center of this bar.

• To improve the accuracy of our approximation, we should let X = 9 extend from 8.5 to 9.5, etc.

Continuity Correction

• Use the continuity correction to approximate the binomial probability P(X = 10) when n = 100 and p = 0.1.

• Using the normal approximation to the binomial distribution, X is approximately distributed as N(10, 3).

Continuity Correction

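[Figure: normal curve with axis marks at 9.5, 10, and 10.5]

The exact binomial probability is $P(X_{\text{binomial}} = 10) = 0.13187$.

The continuity-corrected normal approximation is $P(9.5 < X_{\text{normal}} < 10.5) = 0.13237$.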
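Both values can be checked with a short sketch using `math.comb` for the exact term and `statistics.NormalDist` for the N(10, 3) curve. Under the continuity correction, the bar for X = 10 covers the interval from 9.5 to 10.5.

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.1
exact = comb(n, 10) * p**10 * (1 - p) ** 90      # binomial P(X = 10)

normal = NormalDist(mu=10, sigma=3)              # N(np, sqrt(np(1 - p))) = N(10, 3)
approx = normal.cdf(10.5) - normal.cdf(9.5)      # area over the bar from 9.5 to 10.5

print(round(exact, 5), round(approx, 5))
```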

Continuity Correction

$P(X_{\text{binomial}} \le 8) \approx P(X_{\text{normal}} \le 8.5)$

Q: what about continuity correction for P(X<8)?

Continuity Correction

$P(X_{\text{binomial}} \ge 14) \approx P(X_{\text{normal}} \ge 13.5)$

Q: what about continuity correction for P(X>14)?
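One way to answer both questions is to convert each strict inequality into a non-strict one on whole numbers before adding or subtracting 0.5. P(X < 8) = P(X ≤ 7) ≈ P(X_normal ≤ 7.5), and P(X > 14) = P(X ≥ 15) ≈ P(X_normal ≥ 14.5). A minimal sketch of that bookkeeping; `corrected_bounds` is a hypothetical helper, not part of any library.

```python
def corrected_bounds(op, k):
    """Map the binomial event 'X op k' to a (low, high) interval for the normal curve.

    Each whole number k is treated as a bar running from k - 0.5 to k + 0.5.
    """
    inf = float("inf")
    if op == "<=":
        return (-inf, k + 0.5)
    if op == "<":   # X < k is the same event as X <= k - 1
        return (-inf, k - 0.5)
    if op == ">=":
        return (k - 0.5, inf)
    if op == ">":   # X > k is the same event as X >= k + 1
        return (k + 0.5, inf)
    if op == "==":
        return (k - 0.5, k + 0.5)
    raise ValueError(f"unsupported comparison: {op}")


print(corrected_bounds("<", 8), corrected_bounds(">", 14))
```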

Example Re-visited

• Using the continuity correction, the probability that no more than 9 of the switches in the sample fail inspection is

$$P(X_{\text{binomial}} \le 9) \approx P(X_{\text{normal}} \le 9.5) = P\left(Z \le \frac{9.5 - 100(0.1)}{\sqrt{100(0.1)(1-0.1)}}\right) = P(Z \le -0.1667) = 0.4338$$
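The corrected value can be reproduced with a minimal sketch. The only change from the uncorrected calculation is evaluating the normal CDF at 9.5 instead of 9.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.1
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))  # N(10, 3)

approx_corrected = normal.cdf(9.5)  # continuity correction: include the whole bar for X = 9
print(round(approx_corrected, 4))   # compare with the exact 0.4513
```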

Example: Inspection of Switches

• Find the probability that at least 5 but at most 15 switches fail the inspection.

$P(5 \le X_{\text{binomial}} \le 15) = P(X_{\text{binomial}} \le 15) - P(X_{\text{binomial}} \le 4)$
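A minimal sketch that evaluates this difference exactly and compares it with the continuity-corrected normal approximation $P(4.5 \le X_{\text{normal}} \le 15.5)$. The helper `binom_cdf` is a hypothetical name that sums the B(100, 0.1) probabilities.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.1


def binom_cdf(k):
    """P(X <= k) for X ~ B(100, 0.1), found by summing the pmf."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


exact = binom_cdf(15) - binom_cdf(4)            # P(5 <= X <= 15)

normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = normal.cdf(15.5) - normal.cdf(4.5)     # correction applied at both ends

print(round(exact, 4), round(approx, 4))
```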

Sampling Distributions

• Counts and proportions are discrete random variables used to describe categorical data.

• Statistics used to describe quantitative data are most often continuous random variables.

• Examples: sample mean, percentiles, standard deviation

• Sample means are among the most common statistics.

Sampling Distributions

• Regarding sample means,

– They tend to be less variable than individual observations.

– Their distribution tends to be more normal than that of individual observations.

• We’ll see why later.

Sampling Distributions of Sample Means

• Let $\bar{x}$ be the mean of an SRS of size n from a population having mean µ and standard deviation σ.

• The mean and standard deviation of $\bar{x}$ are

$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

• Why?
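A sketch of the answer, assuming the n observations $X_1, \ldots, X_n$ can be treated as independent, each with mean µ and standard deviation σ (reasonable for an SRS from a large population). Writing $\bar{x} = (X_1 + \cdots + X_n)/n$:

$$E(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu,$$

$$\operatorname{Var}(\bar{x}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}, \quad \text{so} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$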

Sampling Distributions of Sample Means

• The shape of the distribution of the sample mean depends on the shape of the population distribution itself.

• One special case: normal population distribution.

If a population has the $N(\mu, \sigma)$ distribution, then the sample mean $\bar{x}$ of n independent observations has the $N(\mu, \sigma/\sqrt{n})$ distribution.

• Because: any linear combination of independent normal random variables is normally distributed.

Example

• The foreman of a bottling plant has observed that the amount of soda pop in each “32-ounce” bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of 0.3 ounce.

– If a customer buys one bottle, what is the probability that that bottle contains more than 32 ounces?

– If that same customer instead buys a carton of 4 bottles, what is the probability that the mean of those 4 bottles is greater than 32 ounces?
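A minimal sketch of both parts, assuming the 4 bottles' fill amounts are independent. One bottle is N(32.2, 0.3). By the normal special case above, the mean of 4 bottles is N(32.2, 0.3/√4) = N(32.2, 0.15).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # fill amount of one bottle, in ounces

one_bottle = 1 - NormalDist(mu, sigma).cdf(32)               # P(X > 32), z = -0.67
mean_of_four = 1 - NormalDist(mu, sigma / sqrt(4)).cdf(32)   # P(x-bar > 32), z = -1.33

print(round(one_bottle, 4), round(mean_of_four, 4))
```

The mean of four bottles is less variable, so it is more likely (about 0.91 versus about 0.75) to exceed 32 ounces.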

Example

• The starting salaries of M.B.A.s at Wilfrid Laurier University (WLU) are normally distributed with a mean of $62,000 and a standard deviation of $14,500. The starting salaries of M.B.A.s at the University of Western Ontario (UWO) are normally distributed with a mean of $60,000 and a standard deviation of $18,300.

– A random sample of 50 WLU M.B.A.s and a random sample of 60 UWO M.B.A.s are selected.

– What is the probability that the sample mean of WLU graduates will exceed that of the UWO graduates?
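A minimal sketch, assuming the two samples are independent. Each sample mean is normal. Their difference is a linear combination of independent normals, so it is normal with mean 62,000 − 60,000 and variance 14,500²/50 + 18,300²/60.

```python
from math import sqrt
from statistics import NormalDist

# WLU x-bar ~ N(62000, 14500/sqrt(50)); UWO x-bar ~ N(60000, 18300/sqrt(60))
mean_diff = 62000 - 60000
sd_diff = sqrt(14500**2 / 50 + 18300**2 / 60)  # variances add for independent samples

# P(x-bar_WLU - x-bar_UWO > 0)
prob = 1 - NormalDist(mean_diff, sd_diff).cdf(0)
print(round(sd_diff, 1), round(prob, 4))
```

The difference has a standard deviation of about $3,128, which gives $P(Z > -0.64) \approx 0.74$.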

Central Limit Theorem

• When the population distribution is normal, so is the sampling distribution of $\bar{x}$.

• What about when the population distribution is non-normal?

• For large sample sizes, it turns out that the distribution of $\bar{x}$ gets closer to a normal distribution.

• As long as the population has a finite standard deviation, this will be true regardless of the actual shape of the population distribution.

Central Limit Theorem

• Formally, draw an SRS of size n from any population with mean µ and finite standard deviation σ.

• As n approaches infinity (gets very large), $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$.

• This can hold even if the observations are not independent or identically distributed.

• This is why normal distributions are common models for observed data.
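A small simulation can illustrate the theorem. This is a sketch only. It assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), samples of size n = 30, 20,000 repetitions, and a fixed seed for reproducibility. All of these choices are illustrative. The sample means should center on µ, spread like σ/√n, and land within 1.96 standard errors of µ about 95% of the time, as a normal distribution would.

```python
import random
import statistics
from math import sqrt

random.seed(111)  # fixed seed so the run is reproducible

# Skewed population: exponential with mean 1 and standard deviation 1
mu, sigma, n, reps = 1.0, 1.0, 30, 20_000

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(statistics.fmean(means), 3))   # close to mu = 1
print(round(statistics.stdev(means), 3))   # close to sigma / sqrt(n), about 0.183

# Share of sample means within 1.96 standard errors of mu (about 0.95 if x-bar is near normal)
se = sigma / sqrt(n)
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(coverage)
```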