Announcements
• Exam 1 key posted on web
• HW 7 (posted on web) Due Oct. 24
• Bonus E Due Oct. 24
• Office Hours this week:
– Wed. 8-11, 3-4
– Fri. 8-11
• For Bonus E look for sample size, margin of error, sampling method, etc.
• Last time in class, we discussed summary statistics and graphs
• Today we will cover the sampling distribution of the mean
Bonus E: Election Coverage
• Give a statistical critique of election coverage of next week’s debate
• If you can’t watch debate, you may use a magazine or newspaper (include copy)
• Clarity: 2 points
• Validity: 2 points
• Brevity: 2 points
• Typed on paper: due Oct. 24
Sample & Population Symbols
• Sample mean = x̄ (x-bar)
• Sample SD = s
• Sample size = n
• Population mean = μ_pop
• Population SD = σ_pop
Standard letters represent sample values (which are known) and Greek letters represent population values (which are unknown and “Greek to me”).
The sample values will be used to estimate the population values.
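As a small illustration of the sample symbols, the sketch below computes n, x-bar, and s with Python's standard `statistics` module. The five data values are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from the lecture.

```python
import statistics

# Hypothetical sample (illustrative numbers, not from the lecture).
sample = [12.0, 15.0, 11.0, 14.0, 13.0]

n = len(sample)                    # sample size
x_bar = statistics.mean(sample)    # sample mean
s = statistics.stdev(sample)       # sample SD (divides by n - 1)

print(f"n = {n}, x-bar = {x_bar:.2f}, s = {s:.2f}")
```

Note that `statistics.stdev` uses the n - 1 divisor, which is the sample SD these slides call s.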
Sampling Distribution of the Mean
• Sampling distribution characterizes all possible sample means and their likelihoods
• Sampling distribution will be used in hypothesis testing and confidence intervals for the population mean
It’s normal!
• For large enough samples (at least 30 observations) the sampling distribution is approximately normal
• If the population is normal, the sampling distribution is normal for any sample size.
• The mean of the normal curve is μ_pop
• The SD of the normal curve is σ_pop/sqrt(n)
• (n = no. obs)
[Figure: normal curve centered at μ_pop with SD σ_pop/sqrt(n)]
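One way to see the claims on this slide is to simulate many samples and look at their means. The sketch below assumes a hypothetical, deliberately non-normal population (uniform on 0 to 100) and samples of 30 observations; the population, seed, and repetition count are all illustrative choices, not part of the lecture.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: uniform on [0, 100], which is not normal.
MU_POP = 50.0
SIGMA_POP = 100.0 / math.sqrt(12)   # SD of a uniform(0, 100) population
N = 30                               # observations per sample
REPS = 5000                          # number of simulated samples

def sample_mean(n):
    """Draw one sample of size n from the population and return its mean."""
    return statistics.fmean([random.uniform(0.0, 100.0) for _ in range(n)])

means = [sample_mean(N) for _ in range(REPS)]

center = statistics.fmean(means)      # should be near MU_POP
spread = statistics.stdev(means)      # should be near SIGMA_POP / sqrt(N)
predicted = SIGMA_POP / math.sqrt(N)

print(f"mean of sample means: {center:.2f} (population mean {MU_POP})")
print(f"SD of sample means:   {spread:.3f} (predicted {predicted:.3f})")
```

A histogram of `means` would show the bell shape; the two printed comparisons check only the center and the spread.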
Benefits of Normality
• The horizontal axis corresponds to possible sample values for the mean
• The height of the curve represents how many samples have a sample mean of that value
• The most likely sample means are also those closest to the true value.
• The width of the curve narrows with larger samples - showing that larger samples get closer to the truth!
Review so far
• We’re dealing with numerical data
• Wish to summarize with mean and SD
• Want to generalize to population
• Look at all possible samples and their corresponding means
• Resulting distribution of sample means is normal
• Think of the distribution as a histogram of the possible sample means
Where to go from here
• We want to use this fact to generalize our results to the population through hypothesis tests and confidence intervals
• We know that the normal curve is centered at the right place - the population mean
• If we can figure out the width of the normal curve, then we know how close the sample mean should be to the population mean
• We need to relate the resulting normal curve to the standard normal to find areas
Converting it to a “Z”
• Recall that to convert our curve to the standard normal, we use Z = (X - μ)/σ
• For the sampling distribution of the mean, μ = μ_pop and σ = σ_pop/sqrt(n)
• Then we can find the areas and the probabilities of certain samples
• There is a minor obstacle - we don’t know σ_pop
• Let’s estimate σ_pop with s, the sample SD
• This will modify the right side of the Z formula
• Left must be modified to balance this change
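Before the modification, it may help to see the plain Z conversion worked once. The sketch below pretends σ_pop is known, which is exactly the assumption the slide says we usually cannot make; every number in it is hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical values for illustration only (sigma_pop is rarely known in practice).
mu_pop = 100.0     # population mean
sigma_pop = 15.0   # population SD, assumed known here
n = 36             # sample size
x_bar = 105.0      # observed sample mean

se = sigma_pop / math.sqrt(n)   # SD of the sampling distribution
z = (x_bar - mu_pop) / se       # Z = (X - mu) / sigma

# Area to the right of z under the standard normal curve.
right_tail = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}, area to the right = {right_tail:.4f}")
```

Replacing `sigma_pop` with a sample SD is what forces the move from the standard normal to a different reference curve.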
How to modify “Z”
• The modification must account for the fact that a sample value was used to replace a true population value
• The sample standard deviation has its own sampling distribution
• With larger samples, the sample standard deviation will be closer to the true population standard deviation
• By accounting for these considerations, the sampling distribution for the modified “Z” was found
The t-distribution
• The correct sampling distribution for the value t = (x-bar - μ_pop)/(s/sqrt(n)) is the t-distribution
• The t-distribution is shaped like the standard normal, but a little shorter with heavier tails
• The heavier tails account for the fact that the sample standard deviation has its own sampling variability
• The t-distribution has a total area of 1
The t-distribution
• Symmetric about zero
• “Spread” determined by the degrees of freedom
• df = n-1
• Higher df means that the sample standard deviation is a better approximation of the population standard deviation
• Therefore, higher df makes the t more like the z
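The heavier tails and the effect of df can be checked by simulation. The sketch below draws samples from a hypothetical standard normal population (true mean 0), computes t for each, and counts how often |t| exceeds 1.96, the cutoff that leaves 5% in the two tails of the standard normal. The sample sizes, seed, and repetition count are illustrative assumptions.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def simulated_t(n):
    """t statistic for one sample of size n from a standard normal population."""
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - 0.0) / (s / math.sqrt(n))

def tail_fraction(n, cutoff=1.96, reps=10000):
    """Fraction of simulated t values falling beyond +/- cutoff."""
    hits = sum(abs(simulated_t(n)) > cutoff for _ in range(reps))
    return hits / reps

small_df = tail_fraction(5)    # df = 4
large_df = tail_fraction(50)   # df = 49

print(f"df = 4:  fraction with |t| > 1.96 = {small_df:.3f}")
print(f"df = 49: fraction with |t| > 1.96 = {large_df:.3f}")
```

With few degrees of freedom the fraction should sit well above 0.05, and with many it should drift down toward 0.05, which is the sense in which the t becomes more like the z.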
History of the t-distribution
• Actually it’s called the “Student’s t-distribution”
• Guinness Brewery in Ireland was trying to use sampling to monitor the quality of its products and hired a mathematician
• This mathematician developed the t-distribution to address the challenge
• Published under the name “Student” to protect company secrets in the early 1900s
Example: Frozen Dinners
• Each dinner is slightly different
• Stated No. of calories per meal is 240
• Wish to test if this is true (H0: μ_pop = 240)
• HA: μ_pop ≠ 240, α = 0.05
• Take a simple random sample of 12 dinners
• Calorie Counts: 255, 244, 239, 242, 265, 245, 259, 248, 225, 226, 251
• x-bar = 244.33, s = 12.38
• t = (244.33-240)/(12.38/sqrt(12)) = 1.21
• df = 11
• p-value = 0.2508
• 95% CI = (236.5, 252.2)
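The arithmetic on this slide can be retraced from the summary statistics alone. The sketch below uses only x-bar, s, and n from the slide; the critical value 2.201 is the standard two-sided 95% t-table entry for 11 degrees of freedom and is hard-coded here as an assumption rather than computed.

```python
import math

# Summary statistics from the frozen-dinner slide.
x_bar, s, n = 244.33, 12.38, 12
mu_0 = 240.0                # value stated in H0

se = s / math.sqrt(n)       # estimated SD of the sample mean
t = (x_bar - mu_0) / se
df = n - 1

# Two-sided 95% critical value for df = 11, read from a t table.
T_CRIT = 2.201
ci = (x_bar - T_CRIT * se, x_bar + T_CRIT * se)

print(f"t = {t:.2f} with {df} d.f.")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval contains 240, which lines up with a two-sided p-value above 0.05.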
Example: Frozen Dinners
• The t-curve to the right represents all possible sample values of t if the true pop is normal with mean 240
• Our sample is fairly reasonable under this null hypothesis
• Fail to reject H0
Example: Pennies
• Wish to determine the average age of pennies in circulation
• Test H0: μ_pop <= 8 against HA: μ_pop > 8
• Set α = 0.05
• Sample 67 pennies (is it simple random?)
• sample mean = 11.40
• sample SD = 8.50
• Sample t = 3.28
• df = 66
• p-value = 0.0008
• Reject the null
• If the true population mean age was 8, we would observe a sample with a mean this high (or higher) only 0.0008 of the time
Example: Pennies
• If the true population has mean 8, then all possible sample means and SDs (and thus sample t’s) are described by this t-dist
• The real sample has t=3.28, quite rare if the null is true
• Reject the null! The real sample doesn’t match the proposed population.
Example: Pennies
• Well, if 8 is not a reasonable value for the population mean, then what is?
• I start by proposing population values until my observed sample falls in the middle.
• This results in the confidence interval.
• The 95% CI in this case is (9.32,13.48)
• Note: The interval does not include 8! The test and CI agree: 8 is not a reasonable population value.
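The idea of proposing population values until the sample stops looking rare can be sketched directly. The code below tries a grid of hypothesized means and keeps those that a two-sided 5% test would not reject. The grid range and step are arbitrary choices, and the critical value 1.997 for 66 degrees of freedom is an approximate t-table value hard-coded as an assumption.

```python
import math

# Summary statistics from the penny example.
x_bar, s, n = 11.40, 8.50, 67
se = s / math.sqrt(n)

# Two-sided 95% critical value for df = 66, approximately 1.997 from a t table.
T_CRIT = 1.997

def is_plausible(mu_0):
    """True when a two-sided 5% test would not reject H0: mean = mu_0."""
    t = (x_bar - mu_0) / se
    return abs(t) <= T_CRIT

# Try proposed population means from 5.00 to 18.00 in steps of 0.01.
candidates = [k / 100 for k in range(500, 1801)]
plausible = [mu for mu in candidates if is_plausible(mu)]

print(f"plausible means run from {plausible[0]:.2f} to {plausible[-1]:.2f}")
print("8 is plausible:", is_plausible(8.0))
```

Up to the grid resolution and rounding of the inputs, the surviving range should match the 95% CI, and 8 should not be in it.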
The Formulas
• Hypothesis Testing
t = (x-bar - μ_H0)/(s/sqrt(n))
df = n-1
use table to find p-value
• Confidence Interval
x-bar ± t_(α/2, n-1) · s/sqrt(n)
• StataQuest can perform calculations
• Not required to memorize, but note:
• Increasing n narrows CIs
• Decreasing α widens CIs
• Increasing n reduces the probability of a Type II error
• Increasing α reduces the probability of a Type II error
• Type I error is fully controlled by α = Prob of Type I error
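The notes about CI width follow from the half-width term t_(α/2, n-1) · s/sqrt(n). The sketch below holds s fixed at the frozen-dinner value purely for illustration (in real data s would change with the sample) and looks up a few standard t-table critical values; the combinations of α and n are hypothetical.

```python
import math

s = 12.38   # sample SD from the frozen-dinner example, held fixed for illustration

# Two-sided critical values from a standard t table, keyed by (alpha, df).
T_TABLE = {
    (0.05, 11): 2.201,
    (0.01, 11): 3.106,
    (0.05, 29): 2.045,
    (0.05, 60): 2.000,
}

def half_width(alpha, n):
    """Half-width of the CI: t_(alpha/2, n-1) * s / sqrt(n)."""
    return T_TABLE[(alpha, n - 1)] * s / math.sqrt(n)

for alpha, n in [(0.05, 12), (0.01, 12), (0.05, 30), (0.05, 61)]:
    print(f"alpha = {alpha}, n = {n}: x-bar +/- {half_width(alpha, n):.2f}")
```

Dropping α from 0.05 to 0.01 at n = 12 widens the interval, while raising n at α = 0.05 narrows it.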
Review and Preview
• We’re dealing with numerical data
• Want to make statements about pop mean
• Sampling distribution of mean is normal
• SD of resulting normal is unknown, depends on pop SD
• Estimating the pop SD with sample SD results in t-distribution
• t distribution is used to make statements about a population mean through hypothesis tests and confidence intervals
Review and Preview
• Both examples done today used the one-sample t-test
• The one sample t-test is used to make statements about a single population mean
• Next time we will discuss the paired t-test
• The paired t-test is used to test the mean change in a population (before-after studies, like Quaker Oats commercial)
Using StataQuest: One Sample t procedures
• Click Editor. Enter data. Click Close.
• Go to Statistics: Parametric Tests: 1-sample t test.
• Select the variable of interest and set confidence level
• For the frozen dinners, this gives:
Variable | Obs Mean Std. Dev.
---------+---------------------------------
calories | 12 244.3333 12.38278
Ho: mean = 240
t = 1.21 with 11 d.f.
Pr > |t| = 0.2508
95% CI = (236.46568,252.20098)
• Since our test is two-tailed and SQ gives two-tailed p-values, this is the p-value for our test as well.
Using StataQuest: One Sample t procedures
• Following the steps on the previous page for the pennies we get:
Variable | Obs Mean Std. Dev.
---------+---------------------------------
age | 67 11.40299 8.501443
Ho: mean = 8
t = 3.28 with 66 d.f.
Pr > |t| = 0.0017
95% CI = (9.3293251,13.476645)
• We want a right tail p-value since our alternative is right-sided
• Since t>0, we take the 2-tailed p-value given by SQ and divide it by two
• If the alternative had been left sided, we would take half of SQ’s p-value and subtract it from one
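The halving rule above can be written as a small helper. The function name and the "right"/"left" labels are hypothetical, and the handling of negative t values is an extension by symmetry of the t-curve rather than something stated on the slide.

```python
def one_sided_p(two_sided_p, t, alternative):
    """Convert a two-tailed p-value into a one-tailed p-value.

    alternative is "right" (HA: mean > hypothesized value)
    or "left" (HA: mean < hypothesized value).
    """
    if alternative not in ("right", "left"):
        raise ValueError("alternative must be 'right' or 'left'")
    half = two_sided_p / 2
    # The sample supports the alternative when t points in its direction.
    supports = (t > 0) if alternative == "right" else (t < 0)
    return half if supports else 1 - half

# Penny example: reported two-tailed p = 0.0017 with t = 3.28.
print(f"right-sided: {one_sided_p(0.0017, 3.28, 'right'):.5f}")
print(f"left-sided:  {one_sided_p(0.0017, 3.28, 'left'):.5f}")
```

For the pennies, the right-sided value is half of 0.0017, in line with the 0.0008 quoted earlier.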
Using StataQuest: One Sample t procedures
• Use this method when you have summary stats: x, s, n
• Go to Calculator: 1-sample t test
• Enter the requested info (hypothesized mean is the mean stated in null hypothesis)
• For the pennies
– No. obs = 67
– Sample mean = 11.40
– Sample SD = 8.50
– Hyp. Mean = 8
– Conf. Level = 95
Variable | Obs Mean Std. Dev.
---------+---------------------------------
x | 67 11.4 8.5
Ho: mean = 8
t = 3.27 with 66 d.f.
Pr > |t| = 0.0017
95% CI = (9.32669,13.47331)
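For readers without StataQuest, the summary-statistics calculation can be approximated in plain Python. This is a sketch, not StataQuest's method: the function names are hypothetical, and the two-tailed p-value is obtained by numerically integrating the Student's t density with Simpson's rule, an assumption made here to avoid any dependency beyond the standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    scale = math.exp(log_const) / math.sqrt(df * math.pi)
    return scale * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Pr > |t|: one minus the area between -|t| and |t|.

    Uses Simpson's rule on [0, |t|]; steps must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

def one_sample_t(n, x_bar, s, mu_0):
    """Return (t, df, two-sided p) from summary statistics."""
    t = (x_bar - mu_0) / (s / math.sqrt(n))
    df = n - 1
    return t, df, two_sided_p(t, df)

# Penny example, entered the same way as in the calculator above.
t, df, p = one_sample_t(n=67, x_bar=11.40, s=8.50, mu_0=8)
print(f"t = {t:.2f} with {df} d.f.")
print(f"Pr > |t| = {p:.4f}")
```

The printed t and two-tailed p-value should land close to the calculator output above; small differences in the last digit would come from rounding of the inputs.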