AP Statistics Lesson 9 – 1 (Day 1)

SAMPLING DISTRIBUTIONS

ESSENTIAL QUESTION: How often would this method give a correct answer if I used it very many times?

Objectives:

To distinguish between parameters and statistics.

To define and recognize sampling distributions, bias, and variability.

To weigh the advantages and disadvantages of sample size.

Introduction

The reasoning of statistical inference rests on asking, “How often would this method give a correct answer if I used it many, many times?”

If it doesn’t make sense to imagine repeatedly producing your data in the same circumstances, statistical inference is not possible.

Introduction (continued…)

All agree that inference is most secure when we produce data by random sampling or randomized comparative experiments.

The reason is that when we use chance to choose respondents or assign subjects, the laws of probability answer the question “What would happen if we did this many times?”

Parameter, Statistic

A parameter is a number that describes the population. A parameter is a fixed number, but in practice we do not know its value because we cannot examine the entire population.

A statistic is a number that describes a sample. The value of a statistic is known when we have taken a sample, but it can change from sample to sample. We often use a statistic to estimate an unknown parameter.

Example 9.1 Page 488: Making Money

The mean income of the sample of households contacted by the Current Population Survey was x̄ = $57,045. The number $57,045 is a statistic because it describes the Current Population Survey sample.

The parameter of interest is the mean income of all of these households. We don’t know the value of this parameter.

Symbols for Populations and Samples

The symbol for the population proportion is p.

The symbol for the sample proportion is p̂ (“p-hat”).

Since most of the time the actual parameters are not known, the mean and standard deviation of a sample are used to estimate the population mean and standard deviation.

Example 9.1 (continued…)

The symbol for the population mean is the Greek letter μ (“mu”).

The symbol for the sample mean is x̄ (“x-bar”).

The basic fact that every sample’s x̄ will probably be different is called sampling variability. The value of a statistic varies in repeated random samples.

Sampling Variability

To see what happens if we take many samples:

1. Take a large number of samples from the same population.

2. Calculate the sample mean x̄ or sample proportion p̂ for each sample.

3. Make a histogram of the values of x̄ or p̂.

4. Examine the distribution displayed in the histogram for shape, center, and spread, as well as outliers or other deviations.
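
The four steps above can be sketched in Python. This is only an illustration, not part of the lesson: the population (10,000 values with mean near 50), the sample size of 25, and the helper name sample_means are all assumptions made up for the sketch.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population (an assumption): 10,000 values.
# Its mean mu is the parameter we pretend not to know.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)


def sample_means(pop, n, num_samples):
    """Steps 1 and 2: draw many SRSs of size n and record each sample mean."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(num_samples)]


xbars = sample_means(population, n=25, num_samples=1000)

# Step 3: a crude text histogram of the sample means, in bins 1 unit wide.
counts = {}
for xbar in xbars:
    left_edge = int(xbar // 1)
    counts[left_edge] = counts.get(left_edge, 0) + 1
for left_edge in sorted(counts):
    print(f"{left_edge:3d} to {left_edge + 1:3d}: {'*' * (counts[left_edge] // 5)}")

# Step 4: examine center and spread.
print("population mean (parameter):", round(mu, 2))
print("mean of the sample means:   ", round(statistics.mean(xbars), 2))
print("std dev of the sample means:", round(statistics.stdev(xbars), 2))
```

The histogram should pile up near the population mean, and the sample means should vary much less than the individual values in the population.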

Example 9.3 Page 490: Baggage Check!

Simulation is a powerful tool for studying chance.

It is much faster to use Table B than to actually draw repeated SRSs, and much faster yet to use a computer program to produce random digits.

Sampling Distribution

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Strictly speaking, the sampling distribution is the ideal pattern that would emerge if we looked at all possible samples of the same size from the population.

Describing Sampling Distributions

Describe a sampling distribution by finding the center and spread of the distribution of the statistic, not of a single sample.

Example 9.5 Page 494: Are You a Survivor Fan?

Figure 9.5 shows the results of drawing 1000 SRSs of size n = 100 from a population with p = 0.37.

We see that:

The overall shape of the distribution is symmetric and approximately normal.

The center of the distribution is very close to the true value p = 0.37.
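
A simulation like the one behind Figure 9.5 can be sketched as follows. The values p = 0.37, n = 100, and 1000 samples come from the example. Everything else is an assumption for illustration: the function name sample_proportion is made up, and each respondent is treated as an independent yes/no draw, which is reasonable only when the population is much larger than the sample.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

P_TRUE = 0.37       # population proportion from the example
N = 100             # sample size from the example
NUM_SAMPLES = 1000  # number of simulated samples from the example


def sample_proportion(p, n):
    """Simulate one sample of size n and return the sample proportion p-hat."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n


p_hats = [sample_proportion(P_TRUE, N) for _ in range(NUM_SAMPLES)]

center = statistics.mean(p_hats)
spread = statistics.stdev(p_hats)
print("mean of the p-hat values:   ", round(center, 3))
print("std dev of the p-hat values:", round(spread, 3))
```

The mean of the simulated p̂ values should land close to 0.37, matching what the figure shows about the center.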

The Bias of a Statistic

Sampling distributions allow us to describe bias more precisely by speaking of the bias of a statistic rather than bias in a sampling method.

Bias concerns the center of the sampling distribution.

The statistic from the larger sample is less variable.

(a) Sample size 100

(b) Sample size 1000

Unbiased Statistics

A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

An unbiased statistic will sometimes fall above the true value of the parameter and sometimes below if we take many samples.

Because its sampling distribution is centered at the true value, however, there is no systematic tendency to overestimate or underestimate the parameter.
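
To make the idea of “no systematic tendency” concrete, the sketch below compares an unbiased statistic with a biased one. The comparison is not from the lesson. It assumes a hypothetical normal population with variance 4 and uses two estimates of that variance: one dividing by n − 1 (statistics.variance) and one dividing by n (statistics.pvariance applied to the sample).

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

TRUE_SD = 2.0
TRUE_VARIANCE = TRUE_SD ** 2  # the parameter being estimated (assumed population)
N = 5                         # a small sample size makes the bias easy to see
NUM_SAMPLES = 5000

divide_by_n_minus_1 = []
divide_by_n = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(0, TRUE_SD) for _ in range(N)]
    divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1
    divide_by_n.append(statistics.pvariance(sample))         # divides by n

mean_unbiased = statistics.mean(divide_by_n_minus_1)
mean_biased = statistics.mean(divide_by_n)

print("true variance:                  ", TRUE_VARIANCE)
print("mean of estimates using n - 1:  ", round(mean_unbiased, 2))
print("mean of estimates using n:      ", round(mean_biased, 2))
```

Individual estimates from both versions fall above and below the true value. Only the center of the sampling distribution tells them apart: the n − 1 version centers near 4, while the divide-by-n version centers noticeably lower.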

The Variability of a Statistic

A statistic is unbiased when its sampling distribution is centered at the true value of the parameter (here, the true proportion).

The sample proportion p̂ from a random sample of any size is an unbiased estimate of the parameter p.

The Variability of a Statistic (continued…)

The variability of a statistic is described by the spread of its sampling distribution. This spread is determined by the sampling design and the size of the sample. Larger samples give smaller spread.

As long as the population is much larger than the sample (say, at least 10 times as large), the spread of the sampling distribution is approximately the same for any population size.
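
Both claims about spread can be checked with a small simulation. This is a sketch under assumptions, not part of the lesson: the two hypothetical populations (10,000 and 100,000 individuals, each with p = 0.37), the sample sizes, and the helper names make_population and spread_of_p_hat are chosen only for illustration.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable


def make_population(size, p):
    """Hypothetical population of 0/1 values in which a proportion p are 1s."""
    ones = round(size * p)
    return [1] * ones + [0] * (size - ones)


def spread_of_p_hat(population, n, num_samples=500):
    """Standard deviation of p-hat across many SRSs of size n."""
    p_hats = [sum(random.sample(population, n)) / n for _ in range(num_samples)]
    return statistics.stdev(p_hats)


spreads = {}
for pop_size in (10_000, 100_000):
    population = make_population(pop_size, 0.37)
    for n in (100, 1000):
        spreads[(pop_size, n)] = spread_of_p_hat(population, n)
        print(f"population {pop_size:>7,}, n = {n:>4}: "
              f"spread of p-hat = {spreads[(pop_size, n)]:.4f}")
```

For a fixed sample size the two populations should give nearly the same spread, while going from n = 100 to n = 1000 shrinks the spread substantially in both.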

Bias and Variability

Bias means that our aim is off and we consistently miss the bulls-eye in the same direction. Our sample values do not center on the population value.

High variability means that repeated shots are widely scattered on the target.

Notice that low variability (shots are close together) can accompany high bias (shots are consistently away from the bulls-eye in one direction), and low bias (shots centered on the bulls-eye) can accompany high variability (shots that are widely scattered).

Properly chosen statistics computed from random samples of sufficient size will have low bias and low variability.

Figure 9.9 Page 500
