For 95 out of 100 (large) samples, the interval
$\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
will contain the true population mean.
But we don’t know $\sigma$?!
Inference for the Mean of a Population
To estimate $\mu$, we use a confidence interval around $\bar{x}$.
The confidence interval is built with $\sigma$, which we replace with s (the sample std. dev.) if $\sigma$ is not known.
$\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
t-distributions
The “standard error” of $\bar{x}$:
$SE_{\bar{x}} = s/\sqrt{n}$
The one-sample t statistic:
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
For an SRS of size n, the one-sample t statistic has the t-distribution with n − 1 degrees of freedom.
(see Table D)
t-distributions
t-distributions with k (= n − 1) degrees of freedom
– are labeled t(k),
– are symmetric around 0,
– are bell-shaped,
– but have more variability than Normal distributions, due to the substitution of s in the place of $\sigma$.
Example: Estimating the level of vitamin C
Data:
26 31 23 22 11 22 14 31
Find a 95% confidence interval for $\mu$.
A: ( , )
Write it as “estimate plus margin of error”
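The calculation asked for in this example can be sketched in Python. This is only an illustration, not the course's STATA solution: the critical value t* = 2.365 for 7 degrees of freedom is read from a t table and hard-coded as an assumption, and the variable names are made up for this sketch.

```python
import math
import statistics

# Vitamin C data from the example (n = 8).
data = [26, 31, 23, 22, 11, 22, 14, 31]

n = len(data)
xbar = statistics.mean(data)      # sample mean (the estimate)
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # standard error of the sample mean

# Assumption: t* for 95% confidence with n - 1 = 7 degrees of freedom,
# taken from a t table rather than computed here.
T_STAR_DF7_95 = 2.365

margin = T_STAR_DF7_95 * se       # margin of error
lower, upper = xbar - margin, xbar + margin

print(f"estimate = {xbar:.2f}, margin of error = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The result is reported in the "estimate plus margin of error" form first, and the interval endpoints follow from it.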
STATA Exercise 1
STATA Exercise 2
STATA Exercises 3 and 4
Paired, unpaired tests
“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.
Ho: mean(pretest - posttest) = mean(diff) = 0
STATA Exercise 5
STATA Exercise 6
Robustness of t procedures
t-tests are only appropriate for testing a hypothesis on a single mean in these cases:
– If n < 15: only if the data is Normally distributed (with no outliers or strong skewness)
– If n ≥ 15: only if there are no outliers or strong skewness
– If n ≥ 40: even if clearly skewed (because of the Central Limit Theorem)
Comparing Two Means
Suppose we make a change to the registration procedure. Does this reduce the number of mistakes?
Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)
Is the mean number of mistakes (per student) different? Is $\mu_1 - \mu_2 = 0$ or $\neq 0$?
Comparing Two Means
Notice that we are not matching pairs. We compare two groups.
Comparing Two Means
Population   Variable   Mean       Standard Deviation
1            $x_1$      $\mu_1$    $\sigma_1$
2            $x_2$      $\mu_2$    $\sigma_2$
Comparing Two Means
Population   Sample Size   Sample Mean    Sample Standard Deviation
1            $n_1$         $\bar{x}_1$    $s_1$
2            $n_2$         $\bar{x}_2$    $s_2$
Comparing Two Means
The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow?
We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year.
We estimate $(\mu_1 - \mu_2)$ with $(\bar{x}_1 - \bar{x}_2)$.
Comparing Two Means
Remember $\bar{x}$ is a Random Variable. To estimate $\mu$ we need both $\bar{x}$ and the margin of error around $\bar{x}$, which is $t^* \, s/\sqrt{n}$.
So we need to know $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, or rather, the appropriate standard error for this estimation.
Because we are estimating a difference, we need the standard error of a difference.
Comparing Two Means
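If the standard error for $\bar{x}_1$ is
$\sigma_{\bar{x}_1} = \dfrac{\sigma_1}{\sqrt{n_1}}$
then the standard error for $(\bar{x}_1 - \bar{x}_2)$ is
$\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Replacing $\sigma_1$ and $\sigma_2$ with the sample standard deviations $s_1$ and $s_2$ gives the two-sample t statistic:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$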
With the unequal option, STATA uses the Satterthwaite approximation for the degrees of freedom by default. The two-sample t statistic does not have exactly a t-distribution, because we are replacing two standard deviations by their sample equivalents.
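A minimal Python sketch of the unpooled two-sample t statistic, together with Satterthwaite's approximate degrees of freedom, may make the formula concrete. The function name `welch_t` and the two small samples are hypothetical (they are not course data), and the P-value step is left out because it needs a t table or a statistics package.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return (t, df) for H0: mu1 = mu2 without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1    # s1^2 / n1
    v2 = statistics.variance(sample2) / n2    # s2^2 / n2
    se_diff = math.sqrt(v1 + v2)              # standard error of the difference
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_diff
    # Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical mistake counts before and after a procedure change.
before = [4, 6, 5, 7, 3, 5]
after = [3, 4, 2, 5, 3, 1]

t, df = welch_t(before, after)
print(f"t = {t:.3f}, approximate df = {df:.1f}")
```

In this made-up example the two samples have the same size and the same sample variance, so the approximate degrees of freedom coincide with n1 + n2 − 2; with unequal spreads or sizes the value is usually smaller and not a whole number.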
Two-sample significance test
STATA Exercise 7
Paired, unpaired tests
“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.
Ho: mean(pretest - posttest) = mean(diff) = 0
“Unpaired” tests take the mean of each variable and test whether the difference of the means is zero.
Ho: mean(pretest) - mean(posttest) = diff = 0
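The contrast between the two tests can be seen in a small Python sketch. The pretest and posttest scores are invented for illustration and the variable names are hypothetical. The paired statistic is a one-sample t on the per-individual differences; the unpaired statistic here uses the unequal-variance standard error of a difference and ignores which scores belong to the same individual.

```python
import math
import statistics

# Hypothetical scores for five individuals (not course data).
pretest = [10, 12, 9, 14, 11]
posttest = [12, 13, 11, 15, 14]

# Paired: one difference per individual, then a one-sample t on the differences.
diffs = [pre - post for pre, post in zip(pretest, posttest)]
n = len(diffs)
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Unpaired: compare the two means as if they came from separate groups.
se_unpaired = math.sqrt(
    statistics.variance(pretest) / len(pretest)
    + statistics.variance(posttest) / len(posttest)
)
t_unpaired = (statistics.mean(pretest) - statistics.mean(posttest)) / se_unpaired

print(f"paired t = {t_paired:.3f} (df = {n - 1})")
print(f"unpaired t = {t_unpaired:.3f}")
```

Both statistics share the same numerator, the difference in means. They differ only in the standard error: in these made-up data, pairing removes the person-to-person variation, so the paired statistic is larger in absolute value.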
STATA Exercise 5
STATA Exercise 8
ttest ego, by(group) unequal
Robustness and Small Samples
Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.
STATA assumes that the variances are the same (what the book calls “pooled t procedures”), unless you tell it the opposite, using the unequal option.
Small samples, as always, make the test less robust.
Pooled two-sample t procedures
Suppose the two Normal population distributions have the same standard deviation.
Then the t-statistic that compares the means of samples from those two populations has exactly a t-distribution.
Pooled two-sample t procedures
The common, but unknown standard deviation of both populations is $\sigma$. The sample standard deviations s1 and s2 estimate $\sigma$.
The best way to combine these estimates is to take a “weighted average” of the two sample variances, using the dfs as the weights:
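$s_p^2 = \dfrac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$
(assuming $\sigma$ is the same for both populations)
The standard error of $(\bar{x}_1 - \bar{x}_2)$ is then
$s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
and a level C confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm t^* \, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$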
Here, $t^*$ is the value for the t(n1 + n2 – 2) density curve with area C between $-t^*$ and $t^*$.
To test the hypothesis Ho: $\mu_1 = \mu_2$, compute the pooled two-sample t statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
and use P-values from the t(n1 + n2 – 2) distribution.
THE POOLED TWO-SAMPLE T PROCEDURES
ttest ego, by(group)
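For readers who want to check the pooled arithmetic by hand, here is a minimal Python sketch of the pooled two-sample t statistic. The function name `pooled_t` and the two groups are hypothetical (the `ego` data used in the STATA exercise are not reproduced here), and the sketch stops at the statistic and its degrees of freedom rather than a P-value.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (t, df, s_p) for H0: mu1 = mu2, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: average of the sample variances, weighted by their dfs.
    sp2 = ((n1 - 1) * statistics.variance(sample1)
           + (n2 - 1) * statistics.variance(sample2)) / df
    s_p = math.sqrt(sp2)
    se_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_diff
    return t, df, s_p

# Hypothetical groups of unequal size.
group_a = [4, 6, 5, 7, 3, 5]
group_b = [2, 4, 3, 3]

t, df, s_p = pooled_t(group_a, group_b)
print(f"pooled s_p = {s_p:.3f}, t = {t:.3f}, df = {df}")
```

Because the weights are the degrees of freedom, the larger group contributes more to the pooled estimate of the common standard deviation.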