Sampling Distributions & Point Estimation

Upload: dwight-stewart

Post on 22-Dec-2015


TRANSCRIPT

Page 1: Sampling Distributions & Point Estimation. Questions What is a sampling distribution? What is the standard error? What is the principle of maximum likelihood?

Sampling Distributions & Point Estimation

Page 2:

Questions

• What is a sampling distribution?

• What is the standard error?

• What is the principle of maximum likelihood?

• What is bias (in the statistical sense)?

• What is a confidence interval?

• What is the central limit theorem?

• Why is the number 1.96 a big deal?

Page 3:

Population

• Population & Sample Space

• Population vs. sample

• Population parameter, sample statistic

Page 4:

Parameter Estimation

We use statistics to estimate parameters, e.g., effectiveness of pilot training, effectiveness of psychotherapy.

$\bar{X} \rightarrow \mu$, $SD \rightarrow \sigma$

Page 5:

Sampling Distribution (1)

• A sampling distribution is a distribution of a statistic over all possible samples.

• To get a sampling distribution:

– 1. Take a sample of size N (a given number like 5, 10, or 1000) from a population.

– 2. Compute the statistic (e.g., the mean) and record it.

– 3. Repeat 1 and 2 a lot (infinitely for large pops).

– 4. Plot the resulting sampling distribution, a distribution of a statistic over repeated samples.
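The four steps can be sketched as a small simulation. This is only an illustration: the population (the values 1 to 6), the number of repetitions, and the function name `simulate_sampling_distribution` are assumptions, not part of the slides.

```python
import random
from collections import Counter

def simulate_sampling_distribution(population, n, reps, seed=0):
    """Return `reps` sample means, each from a with-replacement sample of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):                                    # step 3: repeat
        sample = [rng.choice(population) for _ in range(n)]  # step 1: draw a sample
        means.append(sum(sample) / n)                        # step 2: compute and record
    return means

population = [1, 2, 3, 4, 5, 6]   # hypothetical population
means = simulate_sampling_distribution(population, n=2, reps=10_000)

# Step 4: a crude text histogram of the recorded means.
counts = Counter(means)
for value in sorted(counts):
    print(f"{value:4.1f} {'#' * (counts[value] // 100)}")
```

A finite number of repetitions only approximates the sampling distribution; more repetitions give a smoother picture.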

Page 6:

Suppose

• Population has 6 elements: 1, 2, 3, 4, 5, 6 (like numbers on dice)

• We want to find the sampling distribution of the mean for N=2

• If we sample with replacement, what can happen?

Page 7:

Possible Outcomes

1st  2nd  M      1st  2nd  M      1st  2nd  M
1    1    1      3    1    2      5    1    3
1    2    1.5    3    2    2.5    5    2    3.5
1    3    2      3    3    3      5    3    4
1    4    2.5    3    4    3.5    5    4    4.5
1    5    3      3    5    4      5    5    5
1    6    3.5    3    6    4.5    5    6    5.5
2    1    1.5    4    1    2.5    6    1    3.5
2    2    2      4    2    3      6    2    4
2    3    2.5    4    3    3.5    6    3    4.5
2    4    3      4    4    4      6    4    5
2    5    3.5    4    5    4.5    6    5    5.5
2    6    4      4    6    5      6    6    6

Page 8:

Histogram: Sampling distribution for mean of 2 dice.

1+2+3+4+5+6 = 21. 21/6 = 3.5

There is only 1 way to get a mean of 1, but 6 ways to get a mean of 3.5.
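The counts behind the histogram can be checked by listing all 36 ordered outcomes. A minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

faces = [1, 2, 3, 4, 5, 6]

# All ordered (1st, 2nd) draws with replacement: 6 * 6 = 36 outcomes.
outcomes = list(product(faces, repeat=2))

# Count how many outcomes produce each possible mean.
mean_counts = Counter((first + second) / 2 for first, second in outcomes)

for m in sorted(mean_counts):
    print(f"M = {m:3.1f}: {mean_counts[m]} way(s), p = {mean_counts[m]}/36")
```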

Page 9:

Sampling Distribution (2)

• The sampling distribution shows the relation between the probability of a statistic and the statistic’s value for all possible samples of size N drawn from a population.

[Figure: Hypothetical Distribution of Sample Means. x-axis: Mean Value; y-axis: f(M)]

Page 10:

Sampling Distribution Mean and SD

• The Mean of the sampling distribution is defined the same way as any other distribution (expected value).

• The SD of the sampling distribution is the Standard Error. Important and useful.

• Variance of sampling distribution is the expected value of the squared difference – a mean square.

• Review: $\sigma_G^2 = E[(G - \mu_G)^2]$
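As a worked instance of these definitions, the mean, variance, and standard error of the sampling distribution of the mean of two dice can be computed exactly as expected values over the 36 equally likely outcomes. The dice population is borrowed from the earlier example; the names `mu_M`, `var_M`, and `se_M` are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
means = [Fraction(a + b, 2) for a, b in product(faces, repeat=2)]
p = Fraction(1, len(means))   # each of the 36 outcomes is equally likely

mu_M = sum(p * m for m in means)                   # mean of the sampling distribution, E(M)
var_M = sum(p * (m - mu_M) ** 2 for m in means)    # mean square, E[(M - mu_M)^2]
se_M = float(var_M) ** 0.5                         # standard error = SD of the sampling distribution

print(mu_M, var_M, round(se_M, 3))
```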

Page 11:

Review

• What is a sampling distribution?

• What is the standard error of a statistic?

Page 12:

Statistics as Estimators

• We use sample data to compute statistics.

• The statistics estimate population values, e.g., $\bar{X} \rightarrow \mu$.

• An estimator is a method for producing a best guess about a population value.

• An estimate is a specific value provided by an estimator.

• We want good estimates. What is a good estimator? What properties should it have?

Page 13:

Maximum Likelihood (1)

• Likelihood is a conditional probability.

• $L = p(x = \text{value} \mid \theta)$

• L is the probability (say) that x has some value given that the parameter theta has some value. L1 is the probability of observing heights of 68 inches and 70 inches [data] given adult males [theta]. L2 is the probability of 68 and 70 inches given adult females.

• Theta ($\theta$) could be continuous or discrete.

Page 14:

Maximum Likelihood (2)

• Suppose we know the function (e.g., binomial, normal) but not the value of theta.

• Maximum likelihood principle says take the estimate of theta that makes the likelihood of the data maximum.

• MLP says: Choose the value of theta that makes this maximum: $L(x_1, x_2, \ldots, x_N \mid \theta)$

Page 15:

Maximum Likelihood (3)

• Suppose we have 2 values hypothesized for the proportion of male grad students at USF, .50 and .40. We randomly sample 15 students and find that 9 are male.

• Calculate likelihood for each using binomial:

$L(x = 9;\ p = .50,\ N = 15) = \binom{15}{9}(.50)^9(.50)^6 = .153$

$L(x = 9;\ p = .40,\ N = 15) = \binom{15}{9}(.40)^9(.60)^6 = .061$

• The .50 estimate is better because the data are more likely under it.
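The two binomial likelihoods can be reproduced in a few lines. A minimal sketch; the helper name `binomial_likelihood` is an assumption made for illustration.

```python
from math import comb

def binomial_likelihood(x, n, p):
    """Probability of x successes in n trials when the success probability is p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

L_50 = binomial_likelihood(9, 15, 0.50)   # hypothesized proportion .50
L_40 = binomial_likelihood(9, 15, 0.40)   # hypothesized proportion .40

print(f"{L_50:.3f} {L_40:.3f}")
```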

Page 16:

Likelihood Function

[Figure: likelihood curve. x-axis: Theta (p value), 0 to 1; y-axis: Likelihood]

The binomial distribution computes probabilities

Page 17:

Maximum Likelihood (4)

• In the example, the best (maximum likelihood) estimate would be 9/15 = .60.

• There is a general class called maximum likelihood estimators that find the value of theta that maximizes the likelihood of a sample result.

• ML is one principle of ‘goodness’ of an estimator.
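One way to see that 9/15 is the maximum is to evaluate the binomial likelihood over a grid of candidate theta values and pick the largest. This is a brute-force sketch, not how maximum likelihood estimates are usually derived; the grid spacing of .01 and the helper name are assumptions.

```python
from math import comb

def binomial_likelihood(x, n, p):
    """Probability of x successes in n trials when the success probability is p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

x, n = 9, 15
grid = [i / 100 for i in range(101)]                      # theta = 0.00, 0.01, ..., 1.00
likelihoods = [binomial_likelihood(x, n, theta) for theta in grid]

# The grid value with the largest likelihood is the (approximate) ML estimate.
theta_hat = grid[likelihoods.index(max(likelihoods))]
print(theta_hat, x / n)
```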

Page 18:

More Goodness (1)

• Bias. If E(statistic)=parameter, the estimator is unbiased. If it’s unbiased, the mean of the sampling distribution equals the parameter. The sample mean has this property: $E(\bar{X}) = \mu$. Sample variance is biased.

Page 19:

More Goodness (2)

• Efficiency – size of the sampling variance.

• Relative Efficiency. Relative efficiency is the ratio of two sampling variances:

$\sigma_H^2 / \sigma_G^2$ = efficiency of G relative to H

• More efficient statistics have smaller sampling variances, smaller standard error, and are preferred because, if both are unbiased, the more efficient one is closer to the parameter on average.
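Relative efficiency can be estimated by simulation. The sketch below compares the sample mean and the sample median as estimators of the center of a normal population; the choice of estimators, the standard normal population, N = 25, and 5000 repetitions are all assumptions for illustration.

```python
import random
import statistics

def sampling_variance(estimator, n, reps, seed):
    """Variance of an estimator over repeated samples from a standard normal population."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
    return statistics.pvariance(estimates)

# The same seed gives both estimators the same samples.
var_mean = sampling_variance(statistics.mean, n=25, reps=5000, seed=1)
var_median = sampling_variance(statistics.median, n=25, reps=5000, seed=1)

# Ratio of the two sampling variances: above 1 means the mean is the more efficient one here.
rel_eff = var_median / var_mean
print(round(var_mean, 4), round(var_median, 4), round(rel_eff, 2))
```

For normal data the ratio comes out above 1, so the mean has the smaller sampling variance; for other population shapes the comparison can go the other way.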

Page 20:

Goodness (3)

• Sometimes we trade off bias and efficiency. A biased estimator is sometimes preferred if it is more efficient, especially if the magnitude of bias is known.

• Resistance. Indicates minimal influence of outliers. Median is more resistant than the mean.
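A small made-up data set shows the difference in resistance: adding one extreme score moves the mean a long way but barely moves the median. The scores are invented for illustration.

```python
import statistics

scores = [2, 3, 3, 4, 5]          # hypothetical scores
with_outlier = scores + [100]     # same scores plus one outlier

mean_before, median_before = statistics.mean(scores), statistics.median(scores)
mean_after, median_after = statistics.mean(with_outlier), statistics.median(with_outlier)

print(mean_before, median_before)   # mean 3.4, median 3
print(mean_after, median_after)     # mean 19.5, median 3.5
```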

Page 21:

Sampling Distribution of the Mean

• Unbiased: $E(\bar{X}) = \mu$

• Variance of sampling distribution of means based on N obs: $V_M = \sigma_M^2 = \sigma^2 / N$

• Standard Error of the Mean: $\sigma_M = \sigma / \sqrt{N}$

• Law of large numbers: Large samples produce sample estimates very close to the parameter.
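The relation between the variance of the sampling distribution of means and the population variance divided by N can be checked exactly for the dice population by listing every possible sample. A sketch using exact fractions; the sample sizes 1 to 4 are an arbitrary choice.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(faces), len(faces))
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)   # population variance, 35/12

def variance_of_mean(n):
    """Exact variance of the sample mean over all 6**n equally likely samples."""
    means = [Fraction(sum(s), n) for s in product(faces, repeat=n)]
    grand_mean = sum(means) / len(means)
    return sum((m - grand_mean) ** 2 for m in means) / len(means)

for n in (1, 2, 3, 4):
    print(n, variance_of_mean(n), sigma2 / n)   # the last two columns agree
```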

Page 22:

Unbiased Estimate of Variance

• It can be shown that: $E(S^2) = \sigma^2 - \frac{\sigma^2}{N} = \frac{N-1}{N}\,\sigma^2$

• The sample variance is too small by a factor of (N-1)/N.

• We fix with $s^2 = \frac{N}{N-1}\,S^2 = \frac{\sum (X - \bar{X})^2}{N-1}$

• Although the variance is unbiased, the SD is still biased, but most inferential work is based on the variance, not SD.
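The bias of the sample variance and its correction can be verified exactly on the dice population with N = 2, by averaging each estimator over all 36 samples. A sketch using exact fractions; `E_S2` and `E_s2` are illustrative names for the expected values of the divide-by-N and divide-by-(N-1) estimators.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
N = 2
mu = Fraction(sum(faces), len(faces))
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)   # population variance

def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean (SS)."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

samples = list(product(faces, repeat=N))
E_S2 = sum(sum_of_squares(s) / N for s in samples) / len(samples)         # SS / N
E_s2 = sum(sum_of_squares(s) / (N - 1) for s in samples) / len(samples)   # SS / (N - 1)

print(E_S2, E_s2, sigma2)
```

The divide-by-N average equals (N-1)/N times the population variance, while the divide-by-(N-1) average equals the population variance itself.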

Page 23:

Review

• What is the principle of maximum likelihood?

• Define

– Bias

– Efficiency

– Resistance

• Is the sample variance (SS divided by N) a biased estimator?

Page 24:

Interval Estimation

• Use the standard error of the mean to create a bracket or confidence interval to show where good estimates of the mean are.

• The sampling distribution of the mean is nice* when N>20. Therefore: $p(\bar{X} - 3\sigma_M \le \mu \le \bar{X} + 3\sigma_M) \ge .95$

• Suppose M=100, SD=14, N=49. Then SEM = 14/sqrt(49) = 14/7 = 2. Bracket = 100-6 = 94 to 100+6 = 106, i.e., 94 to 106. The probability statement is about the sample (the bracket), not about mu.

* Unimodal and symmetric

Page 25:

Review

• What is a confidence interval?

• Suppose M = 50, SD = 10, and N =100. What is the confidence interval?

SEM = 10/sqrt(100) = 10/10 = 1

CI (lower) = M-3SEM = 50-3 = 47

CI (upper) = M+3SEM = 50+3 = 53

CI = 47 to 53
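The same arithmetic as a small function, applied to both the M = 100 example and this review question. The function name `three_sem_bracket` is an assumption made for illustration.

```python
from math import sqrt

def three_sem_bracket(mean, sd, n):
    """Return (lower, upper) for the mean plus or minus 3 standard errors."""
    sem = sd / sqrt(n)   # standard error of the mean
    return mean - 3 * sem, mean + 3 * sem

print(three_sem_bracket(100, 14, 49))   # (94.0, 106.0)
print(three_sem_bracket(50, 10, 100))   # (47.0, 53.0)
```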

Page 26:

Central Limit Theorem

– 1. Sampling distribution of means becomes normal as N increases, regardless of shape of original distribution (provided its variance is finite).

– 2. Binomial becomes normal as N increases.

– 3. Applies to other statistics as well (e.g., variance)
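The first point can be illustrated by simulation: draw sample means from a strongly skewed population and watch the skewness of the sampling distribution shrink toward 0 (the value for a normal distribution) as N grows. The exponential population, the sample sizes, and the number of repetitions are assumptions chosen for illustration.

```python
import random
import statistics

def skewness(values):
    """Standardized third moment; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a skewed (exponential) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(42)
skews = {n: skewness(sample_means(n, 10_000, rng)) for n in (1, 5, 30)}

for n, sk in skews.items():
    print(n, round(sk, 2))   # skewness falls as N increases
```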

Page 27:

Properties of the Normal

• If a distribution is normal, the sampling distribution of the mean is normal regardless of N.

• If a distribution is normal, the sampling distributions of the mean and variance are independent.

Page 28:

Confidence Intervals for the Mean

• Over samples of size N, the probability is .95 for $\mu - 1.96\sigma_M \le \bar{X} \le \mu + 1.96\sigma_M$

• Similarly for sample values of the mean, the probability is .95 that $\bar{X} - 1.96\sigma_M \le \mu \le \bar{X} + 1.96\sigma_M$

• The population mean is likely to be within 2 standard errors of the sample mean.

• Can use the Normal to create any size confidence interval (85, 99, etc.)
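The multiplier 1.96 is the point of the standard normal distribution that leaves 2.5 percent in each tail, and other confidence levels use other multipliers. A sketch using the standard library's `statistics.NormalDist`; the helper names `z_for_confidence` and `normal_ci` are assumptions made for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided normal multiplier for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def normal_ci(mean, sem, level=0.95):
    """Confidence interval: mean plus or minus z standard errors."""
    z = z_for_confidence(level)
    return mean - z * sem, mean + z * sem

for level in (0.85, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))

print(normal_ci(100, 2))   # roughly 96.08 to 103.92
```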

Page 29:

Size of the Confidence Interval

• The size of the confidence interval depends on desired certainty (e.g., 95 vs 99 pct) and the size of std error of mean ($\sigma_M$).

• Std err of mean is controlled by population SD and sample size: $\sigma_M = \sigma / \sqrt{N}$. Can control sample size.

• Suppose SD = 10. If N=25 then SEM = 2 and the 95 pct CI width is about 8. If N=100, then SEM = 1 and CI width is about 4. CI shrinks as N increases. As N gets large, the change in CI gets smaller because of the square root. Less bang for buck as N gets big.
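The diminishing return is easy to tabulate: each fourfold increase in N only halves the width. A sketch assuming a 95 pct interval with multiplier 1.96 and SD = 10; the function name `ci_width` and the sample sizes beyond 25 and 100 are illustrative.

```python
from math import sqrt

def ci_width(sd, n, z=1.96):
    """Full width of a normal-based confidence interval for the mean."""
    sem = sd / sqrt(n)   # standard error of the mean
    return 2 * z * sem

for n in (25, 100, 400, 1600):
    print(n, 10 / sqrt(n), round(ci_width(10, n), 2))
```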

Page 30:

Review

• What is the central limit theorem?

• Why is the number 1.96 a big deal?

• Assume that scores on a curiosity scale are normally distributed. If the sample mean is 50 based on 100 people and the population SD is 10, find an approx 99 pct CI for the population mean.