
Agresti/Franklin Statistics, 1e, 1 of 139

Section 6.4

How Likely Are the Possible Values of a Statistic?

The Sampling Distribution


Statistic

Recall: A statistic is a numerical summary of sample data, such as a sample proportion or a sample mean.


Parameter

Recall: A parameter is a numerical summary of a population, such as a population proportion or a population mean.


Statistics and Parameters

In practice, we seldom know the values of parameters.

Parameters are estimated using sample data.

We use statistics to estimate parameters.


Example: 2003 California Recall Election

Prior to counting the votes, the proportion in favor of recalling Governor Gray Davis was an unknown parameter.

An exit poll of 3160 voters reported that the sample proportion in favor of a recall was 0.54.



If a different random sample of about 3000 voters were selected, a different sample proportion would occur.



Imagine all the distinct samples of 3000 voters you could possibly get.

Each such sample has a value for the sample proportion.


Statistics and Parameters

How do we know that a sample statistic is a good estimate of a population parameter?

To answer this, we need to look at a probability distribution called the sampling distribution.


Sampling Distribution

The sampling distribution of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take.


The Sampling Distribution of the Sample Proportion

Look at each possible sample.

Find the sample proportion for each sample.

Construct the frequency distribution of the sample proportion values.

This frequency distribution is the sampling distribution of the sample proportion.


Example: Sampling Distribution

Which Brand of Pizza Do You Prefer?

• Two Choices: A or D.

• Assume that half of the population prefers Brand A and half prefers Brand D.

• Take a random sample of n = 3 tasters.



Sample      No. Prefer Pizza A    Proportion

(A,A,A)     3                     1

(A,A,D)     2                     2/3

(A,D,A)     2                     2/3

(D,A,A)     2                     2/3

(A,D,D)     1                     1/3

(D,A,D)     1                     1/3

(D,D,A)     1                     1/3

(D,D,D)     0                     0



Sample Proportion    Probability

0                    1/8

1/3                  3/8

2/3                  3/8

1                    1/8
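The two tables above can be reproduced by listing every possible sample. The sketch below is one way to do that, assuming each taster independently prefers Brand A with probability 1/2; the function name `sampling_distribution` is made up for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution(n, p_a=Fraction(1, 2)):
    """Exact sampling distribution of the proportion preferring A among n tasters."""
    dist = Counter()
    # Enumerate every ordered sample, e.g. ('A', 'D', 'A') for n = 3.
    for sample in product("AD", repeat=n):
        count_a = sample.count("A")
        # Probability of this particular sample, assuming independent tasters.
        prob = p_a ** count_a * (1 - p_a) ** (n - count_a)
        # Samples with the same proportion pool their probabilities.
        dist[Fraction(count_a, n)] += prob
    return dict(sorted(dist.items()))


pizza = sampling_distribution(3)
for proportion, prob in pizza.items():
    print(proportion, prob)
```

With n = 3 the loop visits the same eight samples listed in the first table, and the printed probabilities match the second table.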


Mean and Standard Deviation of the Sampling Distribution of a Proportion

For a binomial random variable with n trials and probability p of success for each, the sampling distribution of the proportion of successes has:

Mean = $p$ and standard deviation = $\sqrt{p(1-p)/n}$

To obtain these values, take the mean $np$ and standard deviation $\sqrt{np(1-p)}$ for the binomial distribution of the number of successes and divide by n.



Sample: Exit poll of 3160 voters.

Suppose that exactly 50% of the population of all voters voted in favor of the recall.



Describe the mean and standard deviation of the sampling distribution of the number in the sample who voted in favor of the recall.

• µ = np = 3160(0.50) = 1580

• $\sigma = \sqrt{np(1-p)} = \sqrt{3160(0.50)(0.50)} = 28.1$



Describe the mean and standard deviation of the sampling distribution of the proportion in the sample who voted in favor of the recall.

Mean = $p$ = 0.50

Standard Deviation = $\sqrt{p(1-p)/n} = \sqrt{(0.50)(0.50)/3160} = \sqrt{0.000079} = 0.0089$

BE VERY CAREFUL
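The number of successes and the proportion of successes have different means and standard deviations, and the two are easy to confuse. As a quick check of the exit-poll arithmetic, the sketch below computes both side by side; the helper names `count_mean_sd` and `proportion_mean_sd` are invented for this example.

```python
import math


def count_mean_sd(n, p):
    """Mean and standard deviation of the NUMBER of successes (binomial)."""
    return n * p, math.sqrt(n * p * (1 - p))


def proportion_mean_sd(n, p):
    """Mean and standard deviation of the PROPORTION of successes."""
    return p, math.sqrt(p * (1 - p) / n)


n, p = 3160, 0.50
count_mean, count_sd = count_mean_sd(n, p)
prop_mean, prop_sd = proportion_mean_sd(n, p)

print(f"number:     mean = {count_mean:.0f}, sd = {count_sd:.1f}")
print(f"proportion: mean = {prop_mean:.2f}, sd = {prop_sd:.4f}")
```

Dividing the count results by n gives the proportion results, which is the relationship stated in the formula section.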


The Standard Error

To distinguish the standard deviation of a sampling distribution from the standard deviation of an ordinary probability distribution, we refer to it as a standard error.



If the population proportion supporting recall was 0.50, would it have been unlikely to observe the exit-poll sample proportion of 0.54?

Based on your answer, would you be willing to predict that Davis would be recalled from office?



Fact: The sampling distribution of the sample proportion has a bell shape with a mean µ = 0.50 and a standard deviation σ = 0.0089.



Convert the sample proportion value of 0.54 to a z-score:

$z = \frac{0.54 - 0.50}{0.0089} = 4.5$



The sample proportion of 0.54 is more than four standard errors from the expected value of 0.50.

The sample proportion of 0.54 voting for recall would be very unlikely if the population support were p = 0.50.



A sample proportion of 0.54 would be even more unlikely if the population support were less than 0.50.

We therefore have strong evidence that the population support was larger than 0.50.

The exit poll gives strong evidence that Governor Davis would be recalled.
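To put a number on "very unlikely," the z-score can be converted to an upper-tail probability under the bell-shaped (normal) approximation. The sketch below is illustrative only: the helper names are invented, and it uses the unrounded standard error, so z comes out very slightly below 4.5.

```python
import math


def z_score(observed, mean, standard_error):
    """Number of standard errors that `observed` falls from `mean`."""
    return (observed - mean) / standard_error


def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Standard error of the sample proportion if p = 0.50 and n = 3160.
se = math.sqrt(0.50 * 0.50 / 3160)
z = z_score(0.54, 0.50, se)

print(f"z = {z:.1f}")
print(f"P(Z > z) = {upper_tail(z):.1e}")
```

The tail probability is far below 0.001, consistent with the conclusion that a sample proportion of 0.54 would be very unlikely if p were 0.50.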





Summary of the Sampling Distribution of a Proportion

For a random sample of size n from a population with proportion p, the sampling distribution of the sample proportion has

Mean = $p$ and standard error = $\sqrt{p(1-p)/n}$

If n is sufficiently large that the expected numbers of outcomes of the two types, np and n(1-p), are both at least 15, then this sampling distribution has a bell shape.
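The sample-size condition in this summary is simple to check mechanically. A minimal sketch follows, assuming the threshold of 15 stated above; the function name `bell_shape_ok` is hypothetical.

```python
def bell_shape_ok(n, p, threshold=15):
    """True if both expected counts, np and n(1-p), are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Exit poll: np = n(1-p) = 1580, so the condition holds comfortably.
print(bell_shape_ok(3160, 0.50))

# Pizza tasting with n = 3: np = n(1-p) = 1.5, so the condition fails.
print(bell_shape_ok(3, 0.50))
```

The second case agrees with the pizza example, whose sampling distribution has only four possible values and is not bell shaped.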


Section 6.5

How Close Are Sample Means to Population Means?


The Sampling Distribution of the Sample Mean

The sample mean, x̄, is a random variable.

The sample mean varies from sample to sample.

By contrast, the population mean, µ, is a single fixed number.


Mean and Standard Error of the Sampling Distribution of the Sample Mean

For a random sample of size n from a population having mean µ and standard deviation σ, the sampling distribution of the sample mean has:

• Center described by the mean µ (the same as the mean of the population).

• Spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: $\sigma/\sqrt{n}$


Example: How Much Do Mean Sales Vary From Week to Week?

Daily sales at a pizza restaurant vary from day to day.

The sales figures fluctuate around a mean µ = $900 with a standard deviation σ = $300.



The mean sales for the seven days in a week are computed each week.

The weekly means are plotted over time.

These weekly means form a sampling distribution.



What are the center and spread of the sampling distribution?

Center: mean µ = $900

Spread: standard error = $\sigma/\sqrt{n} = 300/\sqrt{7} \approx 113$ (dollars)
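The pizza-sales arithmetic can be checked in a few lines. This is a small sketch using the figures from the example (µ = 900, σ = 300, n = 7); the function name `standard_error_of_mean` is chosen for illustration.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


mu, sigma, days = 900, 300, 7
weekly_se = standard_error_of_mean(sigma, days)

# The center of the sampling distribution equals the population mean.
print(f"center = ${mu}, spread = ${weekly_se:.0f}")
```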


Sampling Distribution vs. Population Distribution


Standard Error

Knowing how to find a standard error gives us a mechanism for understanding how much variability to expect in sample statistics “just by chance.”


Standard Error

The standard error of the sample mean: $\sigma/\sqrt{n}$

As the sample size n increases, the denominator increases, so the standard error decreases.

With larger samples, the sample mean is more likely to fall close to the population mean.
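To see how quickly the standard error shrinks, the sketch below tabulates it for several sample sizes. It reuses σ = 300 from the pizza-sales example; the sample sizes are arbitrary choices for illustration, each four times the previous one.

```python
import math

sigma = 300  # population standard deviation from the pizza-sales example

# Standard error sigma / sqrt(n) for a sequence of growing sample sizes.
sample_sizes = [7, 28, 112, 448]
standard_errors = [sigma / math.sqrt(n) for n in sample_sizes]

for n, se in zip(sample_sizes, standard_errors):
    print(f"n = {n:3d}: standard error = {se:6.1f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.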


Central Limit Theorem

Question: How does the sampling distribution of the sample mean relate, in shape, center, and spread, to the probability distribution from which the samples were taken?


Central Limit Theorem

For random sampling with a large sample size n, the sampling distribution of the sample mean is approximately a normal distribution.

This result applies no matter what the shape of the probability distribution from which the samples are taken.


Central Limit Theorem: How Large a Sample?

The sampling distribution of the sample mean takes more of a bell shape as the random sample size n increases. The more skewed the population distribution, the larger n must be before the shape of the sampling distribution is close to normal. In practice, the sampling distribution is usually close to normal when the sample size n is at least about 30.
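A small simulation can illustrate this behavior. The sketch below is not from the text: it assumes a right-skewed exponential population with mean 1 and standard deviation 1, and compares sample means for n = 2 and n = 30. The function names, the seed, and the number of repetitions are arbitrary choices.

```python
import random
import statistics


def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` random samples of size n from an exponential(1) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


def skewness(values):
    """Average cubed z-score: near 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


results = {n: simulate_sample_means(n) for n in (2, 30)}
for n, means in results.items():
    print(
        f"n = {n:2d}: mean = {statistics.fmean(means):.3f}, "
        f"sd = {statistics.stdev(means):.3f}, "
        f"skewness = {skewness(means):.2f}"
    )
```

In runs of this kind, the simulated means center near the population mean of 1, their standard deviation is close to 1/√n, and the skewness is noticeably smaller for n = 30 than for n = 2.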


A Normal Population Distribution and the Sampling Distribution

If the population distribution is approximately normal, then the sampling distribution is approximately normal for all sample sizes.


How Does the Central Limit Theorem Help Us Make Inferences?

For large n, the sampling distribution is approximately normal even if the population distribution is not.

This enables us to make inferences about population means regardless of the shape of the population distribution.
