AP Exam Prep: Essential Notes
Chapter 11: Inference for Distributions
11.1 Inference for Means of a Population
11.2 Comparing Two Means
Moving away from z …
In Chapter 10, when we knew σ, we calculated a z-score for a particular mean as follows:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
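Now we do not know σ, so we estimate it from the sample with s and calculate a t-score instead. The t-score builds in something of a "fudge factor" for the extra uncertainty that comes from estimating σ:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
The denominator, s/√n, is the standard error of the mean.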
One-sample t-procedures (p. 622)
Confidence interval:
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
Hypothesis test:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
In both cases, σ is unknown.
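As a minimal sketch of the two one-sample formulas above, the following Python uses only the standard library. The data values, the null value of 4, and the function names are hypothetical. The critical value t* = 2.365 is the usual table value for 95% confidence with n − 1 = 7 degrees of freedom; in practice it would come from a table or calculator.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) for testing H0: mu = mu0 when sigma is unknown."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)             # standard error of the mean
    return (x_bar - mu0) / se, n - 1

def one_sample_t_interval(data, t_star):
    """Return (low, high) for x_bar +/- t* * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(n)
    return x_bar - margin, x_bar + margin

sample = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical measurements
t_stat, df = one_sample_t(sample, mu0=4)
low, high = one_sample_t_interval(sample, t_star=2.365)  # table value, df = 7
print(f"t = {t_stat:.3f} with df = {df}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The P-value for the t statistic is still read from a t table or calculator using the reported degrees of freedom.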
Matched Pairs t Procedures
Matched pairs designs: subjects are matched in pairs, and each treatment is given to one subject in the pair (assigned at random).
One type of matched pairs design is to have a group of subjects serve as their own pair-mate. Each subject then gets both treatments (randomize the order).
Apply one-sample t-procedures to the observed differences.
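A short sketch of that last step, assuming hypothetical before/after scores for five subjects who each received both treatments: the paired data are reduced to one list of differences, and the one-sample t statistic is computed on that list with H0: mean difference = 0.

```python
import math
import statistics

# Hypothetical paired measurements: each position is one subject.
before = [10, 12, 9, 14, 11]
after = [12, 15, 9, 17, 12]

# Reduce the pairs to a single sample of differences.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)       # sample SD of the differences
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
df = n - 1

print(f"differences = {differences}")
print(f"t = {t_stat:.3f} with df = {df}")
```

Note that the degrees of freedom come from the number of pairs, not the total number of measurements.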
Example 11.4, p. 629 Note H0
Look at Figure 11.7, p. 631
Conditions for Inference about a Mean (p. 617)
SRS
Observations from the population have a normal distribution with mean µ and standard deviation σ.
A symmetric, single-peaked distribution is essential.
Using t-procedures
See Box, p. 636
SRS very important!
n<15: do not use t-procedures if the data are clearly non-normal or if outliers are present.
n at least 15: t-procedures can be used except in the presence of outliers or strong skewness.
n at least 40: t-procedures can be used for even clearly skewed distributions (by the CLT).
11.2 Comparing Two Means
The goal of two-sample inference problems is to compare the responses of two treatments or to compare the characteristics of two populations.
We must have a separate sample from each treatment or each population.
Unlike the matched-pairs designs.
A two-sample problem can arise from a randomized comparative experiment that randomly divides subjects into two groups and exposes each group to a different treatment.
Conditions for Significance Tests Comparing Two Means (p. 650)
Two SRSs from distinct populations.
Samples are independent (matching violates this assumption).
We measure the same variable for each sample.
Both populations are normally distributed.
Means and standard deviations of both are unknown.
Two-sample t-test
The appropriate t-statistic is as follows. The degrees of freedom calculation is complex; we will use our calculators to provide this for us (the df are usually not whole numbers for two-sample tests).
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Here (µ1 − µ2) = 0 for H0: µ1 = µ2.
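The sketch below computes the two-sample t statistic from summary statistics under H0: µ1 = µ2. It also shows the Welch-Satterthwaite approximation for the degrees of freedom, which is the formula calculators and software commonly use; these notes do not state it, so treat it as an assumption about what the calculator is doing. The summary values and the function name are hypothetical.

```python
import math

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """Return (t, df) for H0: mu1 = mu2 without pooling the variances."""
    v1 = s1 ** 2 / n1                  # s1^2 / n1
    v2 = s2 ** 2 / n2                  # s2^2 / n2
    t = (x_bar1 - x_bar2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumed; usually not a whole number)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two independent samples
t_stat, df = two_sample_t(x_bar1=10, s1=2, n1=16, x_bar2=8, s2=3, n2=9)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The approximate df always falls between the smaller of n1 − 1 and n2 − 1 and the pooled value n1 + n2 − 2.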
Two-sample confidence interval for µ1-µ2
Draw an SRS of size n1 from a normal population with unknown mean µ1, and draw an independent SRS of size n2 from a normal population with unknown mean µ2. The confidence interval for µ1-µ2 is given by the following:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Again, we need the df for t*, but we will let the calculator do that for us.
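For illustration, the interval can be computed from summary statistics once t* is known. In this sketch the summary values and the function name are hypothetical, and t* = 2.179 (the 95% table value for 12 degrees of freedom) stands in for the value a calculator would supply.

```python
import math

def two_sample_t_interval(x_bar1, s1, n1, x_bar2, s2, n2, t_star):
    """Return (low, high) for (x_bar1 - x_bar2) +/- t* * SE."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of the difference
    diff = x_bar1 - x_bar2
    return diff - t_star * se, diff + t_star * se

# Hypothetical summary statistics; t_star is supplied, not computed here
low, high = two_sample_t_interval(10, 2, 16, 8, 3, 9, t_star=2.179)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

An interval that contains 0, as this one does, is consistent with no difference between the two population means at that confidence level.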
Using t-procedures for two-sample analyses
See Box, p. 636
SRS very important!
n1+n2<15: do not use t-procedures if the data are clearly non-normal or if outliers are present.
n1+n2 at least 15: t-procedures can be used except in the presence of outliers or strong skewness.
n1+n2 at least 40: t-procedures can be used for even clearly skewed distributions (by the CLT).
Chapter 12: Inference for Proportions
12.1 Inference for a Population Proportion
12.2 Comparing Two Proportions
Conditions for Inference about a Proportion (p. 687)
SRS
N at least 10n
For a significance test of H0: p = p0: the sample size n is so large that both np0 and n(1 − p0) are at least 10.
For a confidence interval: n is so large that both the count of successes, n × p-hat, and the count of failures, n(1 − p-hat), are at least 10.
Normal Sampling Distribution
If these conditions are met, the distribution of p-hat is approximately normal, and we can use the z-statistic:
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$
Inference for a Population Proportion
Confidence Interval:
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Significance test of H0: p = p0:
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
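A minimal sketch of both one-proportion procedures, using `NormalDist` from Python's standard `statistics` module for the normal critical value and P-value. The counts (60 successes in 100 trials), the null value 0.5, and the function names are hypothetical. Note that the interval uses p-hat in the standard error while the test uses p0.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (low, high) for p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def proportion_z_test(successes, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

low, high = proportion_interval(60, 100)       # hypothetical counts
z, p_value = proportion_z_test(60, 100, p0=0.5)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

With these counts np0 and n(1 − p0) are both 50, and the observed successes and failures are 60 and 40, so the "at least 10" conditions hold for both procedures.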
Choosing a Sample Size (p. 695)
Our guess p* can be from a pilot study, or we could use the most conservative guess of p*=0.5.
Set the margin of error to be at most m, the desired margin of error:
$$z^* \sqrt{\frac{p^*(1 - p^*)}{n}} \le m$$
Solve for n. Example 12.9, p. 696.
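Solving the margin-of-error expression for n gives n ≥ (z*/m)² p*(1 − p*), rounded up to the next whole number. The sketch below applies that rearrangement; the function name and the 3% margin are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p*(1 - p*)/n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

n_conservative = sample_size_for_proportion(0.03)           # p* = 0.5
n_pilot = sample_size_for_proportion(0.03, p_guess=0.3)     # hypothetical pilot guess
print(n_conservative, n_pilot)
```

Because p*(1 − p*) is largest at p* = 0.5, the conservative guess never gives a smaller n than a pilot-study guess would.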
Conditions: Confidence Intervals for Comparing Two Proportions
SRS from each population
N>10n
All of these are at least 5:
$$n_1\hat{p}_1, \quad n_1(1 - \hat{p}_1), \quad n_2\hat{p}_2, \quad n_2(1 - \hat{p}_2)$$
Calculating a Confidence Interval for Comparing Two Proportions (p. 704)
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
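As an illustration of the interval formula, the following sketch uses hypothetical counts (60 of 100 in the first sample, 45 of 100 in the second) and a hypothetical function name. The critical value z* comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (low, high) for (p1_hat - p2_hat) +/- z* * SE, unpooled."""
    p1 = x1 / n1
    p2 = x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

low, high = two_proportion_interval(60, 100, 45, 100)   # hypothetical counts
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

The interval keeps the two sample proportions separate in the standard error; no pooling is involved.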
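Significance Tests for Comparing Two Proportions
The test statistic is:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$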
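A sketch of the pooled test statistic, again with hypothetical counts and a hypothetical function name. The pooled proportion combines the successes from both samples, which is appropriate because H0 says the two population proportions are equal.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided P-value) for H0: p1 = p2 using the pooled p_hat."""
    p1 = x1 / n1
    p2 = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(60, 100, 45, 100)    # hypothetical counts
print(f"z = {z:.3f}, two-sided P-value = {p_value:.4f}")
```

For a one-sided alternative, the P-value would be the single tail area in the direction of Ha rather than twice the tail.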
Conditions: Significance Test for Comparing Two Proportions
SRS from each population
N>10n
All of these are at least 5:
$$n_1\hat{p}_1, \quad n_1(1 - \hat{p}_1), \quad n_2\hat{p}_2, \quad n_2(1 - \hat{p}_2)$$
Chapter 13: Chi-Square Procedures
13.1 Test for Goodness of Fit
13.2 Inference for Two-Way Tables
M&Ms Example
Sometimes we want to examine the distribution of proportions in a single population.
As opposed to comparing distributions from two populations, as in Chapter 12.
Does the distribution of colors in your bags match up with expected values?
We can use a chi-square (χ²) goodness of fit test.
We would not want to do multiple one-proportion z-tests.
Why?
Performing a χ² Test
1. H0: the color distribution of our M&Ms is as advertised:
Pbrown=0.30, Pyellow=Pred=0.20, and Porange=Pgreen=Pblue=0.10
Ha: the color distribution of our M&Ms is not as advertised.
2. Conditions:
   1. All individual expected counts are at least 1.
   2. No more than 20% of expected counts are less than 5.
3. Chi-square statistic:
$$X^2 = \sum \frac{(O - E)^2}{E}$$
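The three steps above can be sketched in Python. The advertised proportions are the ones in H0; the observed counts are hypothetical (a made-up bag of 60 candies). The degrees of freedom, number of categories minus 1, are the standard choice for a goodness of fit test.

```python
# Proportions claimed under H0 (from the notes)
advertised = {"brown": 0.30, "yellow": 0.20, "red": 0.20,
              "orange": 0.10, "green": 0.10, "blue": 0.10}

# Hypothetical observed counts for one bag
observed = {"brown": 15, "yellow": 14, "red": 10,
            "orange": 8, "green": 7, "blue": 6}

total = sum(observed.values())
expected = {color: p * total for color, p in advertised.items()}

# Conditions: every expected count >= 1, at most 20% of them below 5
all_at_least_1 = all(e >= 1 for e in expected.values())
share_below_5 = sum(e < 5 for e in expected.values()) / len(expected)
conditions_met = all_at_least_1 and share_below_5 <= 0.20

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c]
                 for c in advertised)
df = len(advertised) - 1

print(f"conditions met: {conditions_met}")
print(f"X^2 = {chi_square:.3f} with df = {df}")
```

The P-value is then the area to the right of the statistic under the chi-square distribution with those degrees of freedom, read from a table or calculator.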
Section 13.2 (Two-way tables)
Example 13.4, pp. 744-748
Is there a difference between the proportions of successes?
The two-way table below is used to study this question.
Explanatory Variable: Type of Treatment
Response Variable: Proportion of no relapses
                 Relapse?
Treatment      No    Yes    Total
Desipramine    14     10       24
Lithium         6     18       24
Placebo         4     20       24
Total          24     48       72
Expected Counts and Conditions
All expected counts are at least 1, no more than 20% less than 5.
$$\text{Expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
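Applying the expected-count formula to the relapse table gives the following sketch. The observed counts are the ones in the table; the variable names are illustrative, and the degrees of freedom use the standard (rows − 1)(columns − 1) rule for two-way tables.

```python
# Observed counts from the table: columns are (No relapse, Yes relapse)
observed = {
    "Desipramine": [14, 10],
    "Lithium": [6, 18],
    "Placebo": [4, 20],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
table_total = sum(row_totals.values())

# Expected count = row total * column total / table total
expected = {
    name: [row_totals[name] * col_totals[j] / table_total for j in range(2)]
    for name in observed
}

chi_square = sum(
    (observed[name][j] - expected[name][j]) ** 2 / expected[name][j]
    for name in observed
    for j in range(2)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

for name in observed:
    print(name, expected[name])
print(f"X^2 = {chi_square:.2f} with df = {df}")
```

Every expected count here is 8 or 16, so both conditions on expected counts are satisfied.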
Chapter 14: Inference about the Model
[Figure: scatterplot of Fabric Tenacity, lb/oz/yd^2 (vertical axis) against Fiber Tenacity, g/den (horizontal axis), with fitted line y = 3.9951x + 4.5711 and R² = 0.9454.]
Confidence Intervals for the Regression Slope (p. 788)
If we repeated our sampling and computed another model, would we expect a and b to be exactly the same?
Of course not, given what we’ve learned about random variation and sampling error!
We are interested in the true slope (β), which is unknowable, but we are able to estimate it.
Confidence Interval for the slope β of the true regression line:
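$$\hat{y} = a + bx$$
$$b \pm t^* SE_b$$
Here t* comes from the t distribution with n − 2 degrees of freedom, and SE_b is given in the output from a stats package.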
Is β=0?
H0: β=0 vs. Ha: β ≠0 or β>0 or β<0
Perform a t-test:
$$t = \frac{b - 0}{SE_b} = \frac{b}{SE_b}$$
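To show where the numbers in a stats package's output come from, here is a minimal sketch that computes b, SE_b, the t statistic for H0: β = 0, and a confidence interval for β. The five data points and the function name are hypothetical (they are not the tenacity data), SE_b is computed with the standard formula s / sqrt(Σ(x − x̄)²), and t* = 3.182 is the 95% table value for n − 2 = 3 degrees of freedom.

```python
import math

def slope_inference(x, y, t_star):
    """Least-squares slope, its standard error, t for H0: beta = 0, and a CI."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                        # slope
    a = y_bar - b * x_bar                # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))         # standard error about the line
    se_b = s / math.sqrt(sxx)            # standard error of the slope
    t = (b - 0) / se_b
    return {"a": a, "b": b, "se_b": se_b, "t": t, "df": n - 2,
            "ci": (b - t_star * se_b, b + t_star * se_b)}

# Hypothetical data; t_star is the table value for df = 3 at 95% confidence
result = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
print(result)
```

With so few points the interval for β is wide and includes 0, which matches a t statistic that is not large for 3 degrees of freedom.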