
For 95 out of 100 (large) samples, the interval

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

will contain the true population mean μ.

But we don’t know σ!

Inference for the Mean of a Population

To estimate μ, we use a confidence interval around x̄.

The confidence interval is built with σ, which we replace with s (the sample standard deviation) if σ is not known.

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
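A minimal Python sketch of the interval above, valid only when σ is known. It uses the standard library alone; the function name `z_interval` is hypothetical. The 1.96 is the Normal value z* with 95% of the area between −z* and z*:

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(data, sigma, conf=0.95):
    """Confidence interval x̄ ± z* σ/√n, assuming the population σ is known."""
    n = len(data)
    xbar = fmean(data)
    # z* leaves (1 - conf)/2 in each tail; about 1.96 when conf = 0.95
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin
```

When σ is unknown and s is plugged in instead, the 1.96 must also be replaced by a t critical value.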

t-distributions

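The “standard error” of x̄:

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$

The one-sample t-statistic divides by this standard error:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

For an SRS of size n from a Normal population, the one-sample t-statistic has the t-distribution with n − 1 degrees of freedom (see Table D).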
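A minimal Python sketch of the one-sample t-statistic for a hypothesized mean μ0. The function name `one_sample_t` and the parameter `mu0` are hypothetical. Note that `statistics.stdev` uses the n − 1 divisor, so it returns the sample standard deviation s:

```python
from math import sqrt
from statistics import fmean, stdev


def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0, using the standard error s / sqrt(n)."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # standard error of x̄
    t = (fmean(data) - mu0) / se
    return t, n - 1  # compare t against t(n - 1), e.g. in Table D
```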

t-distributions

t-distributions with k (= n − 1) degrees of freedom:
– are labeled t(k),
– are symmetric around 0,
– are bell-shaped,
– but have more variability than Normal distributions, due to the substitution of s in place of σ.
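To see the extra variability numerically, here is a sketch that recovers Table D style critical values without a statistics package. It integrates the t(k) density with Simpson’s rule and bisects for t*. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, and the step count and search bracket are arbitrary choices made for illustration:

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    c = exp(lgamma((k + 1) / 2) - lgamma(k / 2)) / sqrt(k * pi)
    return c * (1 + x * x / k) ** (-(k + 1) / 2)


def t_cdf(x, k, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus symmetry around 0."""
    h = x / steps
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + total * h / 3


def t_critical(conf, k):
    """t* with area conf between -t* and t*, found by bisection on [0, 100]."""
    target = 0.5 + conf / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)
for k in (2, 7, 30):
    print(f"df = {k:2d}: t* = {t_critical(0.95, k):.3f}   (z* = {z_star:.3f})")
```

The printed t* values should agree with the 95% column of Table D. They shrink toward z* as k grows, but always stay above it.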

Example: Estimating the level of vitamin C

Data:

26 31 23 22 11 22 14 31

Find a 95% confidence interval for μ.

A: ( , )

Write it as “estimate plus margin of error”.
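One way to check the answer is the Python sketch below, which uses only the standard library. Here n = 8, so there are 7 degrees of freedom. The critical value t* = 2.365 is the Table D entry for t(7) at 95% confidence and is entered by hand:

```python
from math import sqrt
from statistics import fmean, stdev

vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]

n = len(vitamin_c)
xbar = fmean(vitamin_c)          # sample mean x̄
se = stdev(vitamin_c) / sqrt(n)  # standard error s / sqrt(n)
t_star = 2.365                   # Table D, t(7), 95% confidence (looked up by hand)
margin = t_star * se             # margin of error

print(f"{xbar:.2f} ± {margin:.2f} = ({xbar - margin:.2f}, {xbar + margin:.2f})")
# prints: 22.50 ± 6.01 = (16.49, 28.51)
```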

STATA Exercise 1

STATA Exercise 2

STATA Exercises 3 and 4

Paired, unpaired tests

“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.

H0: mean(pretest − posttest) = mean(diff) = 0
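In effect, the paired test is a one-sample t test on the differences. Here is a minimal Python sketch. The function name `paired_t` and the scores in the check below are hypothetical, not from the exercise data:

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(pretest, posttest):
    """Paired t statistic for H0: mean(pretest - posttest) = 0; returns (t, df)."""
    if len(pretest) != len(posttest):
        raise ValueError("paired data need one pretest and one posttest per individual")
    diff = [a - b for a, b in zip(pretest, posttest)]  # one difference per individual
    n = len(diff)
    t = fmean(diff) / (stdev(diff) / sqrt(n))
    return t, n - 1
```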

STATA Exercise 5

STATA Exercise 6

Robustness of t procedures

t-tests are only appropriate for testing a hypothesis on a single mean in these cases:
– If n < 15: only if the data are Normally distributed (with no outliers or strong skewness).
– If n ≥ 15: only if there are no outliers or strong skewness.
– If n ≥ 40: even if clearly skewed (because of the Central Limit Theorem).

Comparing Two Means


Suppose we make a change to the registration procedure. Does this reduce the number of mistakes?

Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)

Is the mean number of mistakes (per student) different? Is μ₁ − μ₂ = 0 or ≠ 0?


Notice that we are not matching pairs. We compare two groups.


Population   Variable   Mean   Standard Deviation
1            x₁         μ₁     σ₁
2            x₂         μ₂     σ₂


Population   Sample Size   Sample Mean   Sample Standard Deviation
1            n₁            x̄₁            s₁
2            n₂            x̄₂            s₂


The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow? We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year.

We estimate (μ₁ − μ₂) with (x̄₁ − x̄₂).


Remember that x̄ is a random variable. To estimate μ we need both x̄ and the margin of error around x̄, which is

$$t^{*}\,\frac{s}{\sqrt{n}}$$

So we need to know σ_x̄ = σ/√n, or rather, the appropriate standard error for this estimation.

Because we are estimating a difference, we need the standard error of a difference.


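If the standard error for x̄ is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

then the standard error for (x̄₁ − x̄₂) is

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Replacing σ₁ and σ₂ by s₁ and s₂ gives the two-sample t statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$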

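With the unequal option, STATA uses the Satterthwaite approximation for the degrees of freedom by default. This t statistic does not have an exact t-distribution, because we are replacing two standard deviations by their sample equivalents. The approximation chooses degrees of freedom for which a t-distribution fits it closely.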
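The Python sketch below computes the unequal-variance two-sample t statistic for H0: μ₁ = μ₂, along with the Satterthwaite degrees of freedom. The function name `unequal_var_t` is hypothetical. For the same data, the t and df should match what `ttest ego, by(group) unequal` reports, apart from the sign, which depends on which group STATA treats as group 1:

```python
from math import sqrt
from statistics import fmean, variance


def unequal_var_t(x1, x2):
    """Two-sample t for H0: mu1 = mu2 without assuming equal variances.

    Returns (t, df), where df is Satterthwaite's approximate degrees of freedom.
    """
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1  # s1^2 / n1
    v2 = variance(x2) / n2  # s2^2 / n2
    se = sqrt(v1 + v2)      # standard error of x̄1 - x̄2
    t = (fmean(x1) - fmean(x2)) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

The df is usually not a whole number. It always lies between the smaller of n₁ − 1 and n₂ − 1 and the pooled value n₁ + n₂ − 2.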

Two-sample significance test


STATA Exercise 7

Paired, unpaired tests

“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.

H0: mean(pretest − posttest) = mean(diff) = 0

“Unpaired” tests take the mean of each variable and test whether the difference of the means is zero.

H0: mean(pretest) − mean(posttest) = diff = 0
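The sketch below runs both versions on the same hypothetical scores for four individuals (made-up numbers, not the exercise data). It shows that the two approaches estimate the same difference but with very different standard errors, because pairing removes the person-to-person variation:

```python
from math import sqrt
from statistics import fmean, stdev, variance

# Hypothetical scores for four individuals, each measured twice.
pretest = [10, 12, 14, 16]
posttest = [11, 14, 15, 18]
n = len(pretest)

# Paired: one difference per individual, then a one-sample t on the differences.
diff = [a - b for a, b in zip(pretest, posttest)]
t_paired = fmean(diff) / (stdev(diff) / sqrt(n))

# Unpaired: treat the two columns as independent samples and compare their means.
mean_gap = fmean(pretest) - fmean(posttest)
t_unpaired = mean_gap / sqrt(variance(pretest) / n + variance(posttest) / n)

print(f"mean(diff) = {fmean(diff)}, diff of means = {mean_gap}")
print(f"paired t = {t_paired:.3f}, unpaired t = {t_unpaired:.3f}")
```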

STATA Exercise 5

STATA Exercise 8

ttest ego, by(group) unequal

Robustness and Small Samples

Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.

STATA assumes that the variances are the same (what the book calls “pooled t procedures”) unless you tell it otherwise, using the unequal option.

Small samples, as always, make the test less robust.

Pooled two-sample t procedures


Suppose the two Normal population distributions have the same standard deviation.

Then the t-statistic that compares the means of samples from those two populations has exactly a t-distribution.


The common but unknown standard deviation of both populations is σ. The sample standard deviations s₁ and s₂ both estimate σ.

The best way to combine these estimates is to take a weighted average of the two sample variances, using the degrees of freedom (n₁ − 1 and n₂ − 1) as the weights:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

(assuming σ is the same for both populations)
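A minimal Python sketch of the pooled standard deviation s_p. The function name `pooled_sd` is hypothetical. Each sample variance is weighted by its degrees of freedom, so the larger sample counts more:

```python
from math import sqrt
from statistics import variance


def pooled_sd(x1, x2):
    """s_p: square root of the df-weighted average of the two sample variances."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return sqrt(sp2)
```

For example, with sample variances 1 (n₁ = 3) and 10 (n₂ = 5), s_p² = (2·1 + 4·10)/6 = 7. The simple average of the variances would give 5.5.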

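A level C confidence interval for μ₁ − μ₂ is

$$(\bar{x}_1 - \bar{x}_2) \pm t^{*}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Here, t* is the value for the t(n₁ + n₂ − 2) density curve with area C between −t* and t*.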

To test the hypothesis H0: μ₁ = μ₂, compute the pooled two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

and use P-values from the t(n₁ + n₂ − 2) distribution.
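The pooled procedures can be sketched in Python as follows. The function name `pooled_two_sample` is hypothetical. It returns the estimate x̄₁ − x̄₂, its pooled standard error, the t statistic for H0: μ₁ = μ₂, and df = n₁ + n₂ − 2. For the same data, these should correspond to what the default `ttest ego, by(group)` reports, up to the sign convention for the groups:

```python
from math import sqrt
from statistics import fmean, variance


def pooled_two_sample(x1, x2):
    """Pooled procedures for mu1 - mu2, assuming equal population sigma.

    Returns (estimate, standard error, t for H0: mu1 = mu2, df).
    """
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df)
    estimate = fmean(x1) - fmean(x2)
    se = sp * sqrt(1 / n1 + 1 / n2)
    return estimate, se, estimate / se, df
```

With t* read from Table D for df = n₁ + n₂ − 2, the level C interval is `estimate ± t* * se`.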

The Pooled Two-Sample t Procedures

ttest ego, by(group)
