Post on 21-Dec-2015
Stat 512 – Day 8
Tests of Significance (Ch. 6)
Last Time
Use random sampling to eliminate sampling errors
Use caution to reduce nonsampling errors
Use probability theory to estimate the size of random sampling errors
Normal distribution is often useful
Last Time – Central Limit Theorem
Case 1: Variable of interest is quantitative
E.g., date of pennies, length of words
Parameters: population mean, μ; population standard deviation, σ
Sampling distribution for sample means will be centered at μ with standard deviation σ/√n (assuming random sample)
Sampling distribution will follow a normal distribution if the population is normal; otherwise the sampling distribution will be approximately normal if n is “large” (n > 30)
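The claim on this slide can be checked with a small simulation. The sketch below is only an illustration, not part of the course materials: it assumes a hypothetical right-skewed population (exponential, with mean 1 and standard deviation 1), and the function name, sample size, and number of samples are made-up choices.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw many random samples of size n from a skewed population
    (exponential with mean 1 and SD 1) and return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

# Hypothetical settings: population mean and SD are both 1, samples of size 40
mu, sigma, n = 1.0, 1.0, 40
means = simulate_sample_means(n=n, num_samples=5000)

print("center of sample means:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("SD of sample means:", round(statistics.stdev(means), 3),
      "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 3))
```

Even though the population is not normal, the simulated sample means should cluster around the population mean with a spread close to the population SD divided by the square root of n, because n = 40 is above the n > 30 guideline.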
Last Time – Central Limit Theorem
Case 2: Variable of interest is qualitative (binary)
E.g., orange or not orange candy, did heart transplant patient survive
Parameters: population proportion/probability, π
Sampling distribution for sample proportions will be centered at π with standard deviation √(π(1−π)/n) (assuming random sample)
Sampling distribution will approximately follow a normal distribution if n is “large” (nπ > 10 and n(1−π) > 10)
[Figure: many samples are drawn from a population; each sample gives a sample proportion p̂, and the p̂ values together form the sampling distribution of sample proportions, centered at the population proportion.]
Is the sample proportion that I observed surprising?
Last Time – Central Limit Theorem
Moral: Don’t have to keep running these simulations with means and proportions; we know what the answer is going to be!
Therefore
Knowledge of the sampling distribution allows us to make judgments of whether an observation is surprising
Instead of running simulations, we have the Central Limit Theorem to tell us about the behavior of many (hypothetical) samples
Technical Conditions
Keep in mind that we can’t always apply the Central Limit Theorem
Means: random sample, n > 30 or normal population
Proportions: random sample, nπ > 10 and n(1−π) > 10
Consider both sample size and value of π
Recall Gettysburg Address
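The size conditions on this slide are easy to encode as a quick check. This is a minimal sketch with hypothetical function names and made-up inputs; it cannot check the "random sample" condition, which depends on how the data were collected.

```python
def clt_ok_for_mean(n, population_is_normal=False):
    """Size condition for sample means: normal population, or n > 30.
    A random sample is assumed, not checked."""
    return population_is_normal or n > 30

def clt_ok_for_proportion(n, pi):
    """Size condition for sample proportions: n*pi > 10 and n*(1 - pi) > 10.
    A random sample is assumed, not checked."""
    return n * pi > 10 and n * (1 - pi) > 10

# Illustrative values only: the same n can pass or fail depending on pi
print(clt_ok_for_proportion(50, 0.5))   # 25 and 25 both exceed 10
print(clt_ok_for_proportion(50, 0.1))   # 50 * 0.1 = 5 is too small
print(clt_ok_for_mean(20))              # n = 20 with a non-normal population
print(clt_ok_for_mean(20, population_is_normal=True))
```

The second call shows why both the sample size and the value of the proportion matter: n = 50 is enough when the proportion is near .5 but not when it is near .1.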
Practice Problem 2
Hospital under suspicion for high mortality rate from heart transplantations
Last ten cases had an 80% mortality rate
Researchers determined that 15% was a reasonable benchmark for transplantation.
Parameter of interest: π = the overall mortality rate for heart transplants at this hospital
Give the hospital the benefit of the doubt and assume π is equal to .15
See if that is consistent with the data at the hospital
Practice Problem 2
Observed: 19.7% of the 361 heart transplant patients between 1986 and 2000 died at this hospital
Can this be explained “by chance”?
Central Limit Theorem: If π = .15, we would see a sample proportion at least as large as .197 in only .62% of random samples.
Either a really unlucky coincidence or the π = .15 belief is wrong
More Formal Structure
1. Assumptions
Review data collection plan
Simple random sample?
Quantitative or qualitative data?
Are the conditions for the Central Limit Theorem met?
Check technical conditions
More Formal Structure
2. State competing conjectures about the parameter of interest
Null hypothesis, H0: π = .15 (equals national rate)
Alternative hypothesis, Ha: π > .15 (what we actually suspect)
More Formal Structure
3. Measure the discrepancy between what was observed and what was hypothesized
z = (observed p̂ − hypothesized π) / SD(p̂) = (.197 − .15) / √(.15(.85)/361) = 2.50
“test statistic”
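The test statistic on this slide can be reproduced in a few lines. This is a sketch using the numbers from the hospital example (sample proportion .197, hypothesized value .15, 361 patients); the function and argument names are illustrative choices, not course notation.

```python
import math

def z_statistic(p_hat, pi_0, n):
    """Standardize the observed sample proportion, using the standard
    deviation of the sample proportion under the hypothesized value pi_0."""
    sd = math.sqrt(pi_0 * (1 - pi_0) / n)
    return (p_hat - pi_0) / sd

z = z_statistic(p_hat=0.197, pi_0=0.15, n=361)
print(f"z = {z:.2f}")
```

Note that the standard deviation uses the hypothesized value (.15), not the observed .197, because the calculation assumes the null hypothesis is true.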
More Formal Structure
4. How unusual is this discrepancy, assuming the null hypothesis is true?
P(p̂ > .197) = P(Z > 2.50) = .0062, the “p-value” (assuming π = .15)
Direction of the inequality matches the direction of the alternative hypothesis
The smaller the p-value, the stronger the evidence against the null hypothesis
Would get such an extreme sample proportion in only .6% of random samples if π = .15.
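The tail probability can also be found without a table. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `upper_tail` is made up for this illustration.

```python
from statistics import NormalDist

def upper_tail(z):
    """P(Z > z) for a standard normal Z, matching a 'greater than' alternative."""
    return 1 - NormalDist().cdf(z)

p_value = upper_tail(2.50)
print(f"p-value = {p_value:.4f}")
```

For a "less than" alternative the matching probability would be `NormalDist().cdf(z)` instead, since the tail must point in the direction of the alternative hypothesis.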
More Formal Structure
5. Draw a conclusion
Use same cut-off guidelines
E.g., p-value < .05: statistically significant evidence against the null hypothesis, “Reject the null hypothesis”
If p-value is considered large, we “fail to reject the null hypothesis”
Test of Significance
0. Define parameter of interest
1. Check technical conditions
2. State null and alternative hypotheses about the population parameter of interest (in symbols and in words)
3. Calculate test statistic
4. Determine p-value
Be able to interpret
5. State conclusion (about the null hypothesis)
Sketch a picture of the sampling distribution!
In English!
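Steps 1 and 3 through 5 of this outline can be strung together for a proportion. The sketch below is one possible arrangement, not the course's software: the function name, the `alternative` labels, the default .05 cut-off, and the input numbers are all hypothetical. Defining the parameter and stating the hypotheses in words still have to be done by hand.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(p_hat, n, pi_0, alternative="greater", alpha=0.05):
    """Return (z, p_value, decision) for a test about a population proportion."""
    # Step 1: technical conditions (a random sample is assumed, not checked)
    if not (n * pi_0 > 10 and n * (1 - pi_0) > 10):
        raise ValueError("sample too small for the normal approximation")
    # Step 3: test statistic, using the hypothesized value in the SD
    z = (p_hat - pi_0) / math.sqrt(pi_0 * (1 - pi_0) / n)
    # Step 4: p-value in the direction of the alternative hypothesis
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    # Step 5: conclusion about the null hypothesis
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# Made-up data: 58% successes in a random sample of 200, testing H0: pi = .5
z, p_value, decision = one_proportion_z_test(0.58, 200, 0.5, alternative="greater")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, {decision}")
```

The returned decision is only the last step; the conclusion should still be restated in context.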
Example 1: Cohen v. Brown University
Observational units = Brown University athletes; variable = gender
Population = process of males and females becoming athletes at Brown
Parameter of interest = probability of a Brown University athlete being female, π
H0: π = .51 (proportional to student body)
Ha: π < .51 (smaller prob of an athlete being F)
Example 1: Cohen v. Brown Univ
Give the university the benefit of the doubt
Does the CLT apply? 897(.51) = 457.5 > 10 and 897(.49) = 439.5 > 10
Consider this a “random sample” from this process
Test statistic: z = -7.79
P-value < .000000287 (from Table A)
Reject H0, very unlikely to have only 38% females in sample if randomly selected from population w/ 51%
Strong evidence that π < .51
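As a check on the test statistic, the calculation can be written out using the figures on the slide (38% females in the sample, 897 athletes, hypothesized probability .51). The notation $\pi_0$ for the hypothesized value is an assumption of this sketch, and the 38% is a rounded figure, so the result is approximate.

```latex
z = \frac{\hat{p} - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}
  = \frac{0.38 - 0.51}{\sqrt{0.51(0.49)/897}}
  \approx \frac{-0.13}{0.01669}
  \approx -7.79
```

A value this far below zero is beyond the range of a typical normal table, which is consistent with the slide reporting the p-value only as an upper bound.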
Example 2: Kissing the Right Way
“Human Behavior: Adult Persistence of Head-Turning Asymmetry,” Güntürkün, O. Nature, 421: 771, 2003.
Example 2: Kissing the Right Way
Let π represent the probability of a kissing couple turning to the right
H0: π = .5 (equally likely left or right)
Ha: π > .5 (majority of couples turn right)
CLT? .5(124) = 62 > 10, assuming it’s a representative sample
Test statistic: z = 3.23
P-value = .000619 < .05
Reject H0
Conclude a majority of couples turn to the right, as long as we believe this is a representative sample
Example 3: Body Temperatures
Observational units: healthy adults
Variable: temperature (quantitative)
Parameter: let μ represent the average body temperature of a healthy adult
Sample size is large (n = 130 > 30) and we are assuming this is a representative sample
H0: μ = 98.6°F
Ha….
Test statistic: -6.22…
Behavior of (x̄ − μ)/(s/√n)
[Figure: distribution of (x̄ − μ)/(s/√n) compared with that of (x̄ − μ)/(σ/√n)]
t distribution, Table B
Minitab/applet (p. 5)
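When the population standard deviation is unknown, the sample standard deviation takes its place in the standardization, and the result is compared to a t distribution with n − 1 degrees of freedom. The sketch below shows that calculation on made-up numbers; the data, the hypothesized mean of 10, and the function name are hypothetical and are not taken from the class examples.

```python
import math
import statistics

def t_statistic(sample, mu_0):
    """Return (t, df): the standardized distance between the sample mean
    and the hypothesized mean mu_0, using the sample SD in place of sigma."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    t = (x_bar - mu_0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements, invented for illustration only
sample = [9.1, 10.4, 9.8, 10.9, 9.5, 10.2, 9.9, 10.6]
t, df = t_statistic(sample, mu_0=10)
print(f"t = {t:.3f} with df = {df}")
```

With a sample this small, the technical conditions would also require the population to be roughly normal.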
Example 4: Golden Rectangles
Sample is skewed to the right
Evidence that population is skewed to the right as well
Population is probably not normal, so need large n.
Sample size is 20, which isn’t all that large.
Proceed with caution
Example 4: Golden Rectangles
Let μ represent the mean ratio used by American Indians
H0: μ = .618 (same ratio on average)
Ha: μ ≠ .618 (ratio used by American Indians differs)
t = 2.05 with df = 19
P-value = .054
Weak evidence against H0
Not overwhelmingly convincing that the mean ratio used by American Indians differs from .618.
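The two-sided p-value for t = 2.05 with df = 19 can be approximated without Table B or Minitab. The sketch below is one way to do it using only the Python standard library: it integrates the t density numerically with Simpson's rule. The function names and the number of integration steps are illustrative choices, and statistical software would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_t_p_value(t, df, steps=2000):
    """Two-sided p-value: twice the area beyond |t|.
    The area from 0 to |t| is found by Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)

p_value = two_sided_t_p_value(2.05, 19)
print(f"two-sided p-value = {p_value:.3f}")
```

The p-value is doubled because the alternative hypothesis is two-sided (≠), so results equally extreme in either direction count as evidence against H0.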
For Tuesday
Read Ch. 5 (confidence intervals)
Complete PP 9
Turn in second project report (see syllabus for details)
Start HW 5
Online anonymous Blackboard survey by Thursday