proportion estimation: 1 point and confidence interval estimation of a population proportion,
Post on 20-Dec-2015
Proportion Estimation: 1
Point and Confidence Interval Estimation
of a Population Proportion, π
Proportion Estimation: 2
We are frequently interested in estimating the
proportion of a population with a characteristic of
interest, such as:
• the proportion of smokers
• the proportion of cancer patients who survive at
least 5 years
• the proportion of new HIV patients who are
female
Proportion Estimation: 3
• If we take a random sample from a population
• observe the number of subjects with the characteristic of interest (# of “successes”)
• we are observing a binomial random variable.
Now, however, we will focus on
• estimating the true proportion, π, in the population
• rather than focusing on the count.
Proportion Estimation: 4
Again, one way to deal with this type of data is to define a random variable X that can take two values:
X = 1, if characteristic is present – a “success”
X = 0, if characteristic is absent – a “failure”
Then
• if we sum all values in a population,
• we are summing zeros and ones –
• this will give a count of the number of individuals in the population with the characteristic:
$$\sum_{i=1}^{n} X_i = \#\text{ of successes}$$
Proportion Estimation: 5
The population mean is the proportion of individuals in the population with the characteristic:
$$\mu = \frac{1}{N}\sum_{s=1}^{N} x_s = \frac{\#\text{ successes}}{N} = \text{proportion of successes} = \pi$$
The sample proportion is then:
$$p = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{\#\text{ successes}}{n}$$
Therefore, p is the estimator of π, the proportion with a characteristic of interest.
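As a small illustration of the point above, the sketch below treats a sample as a list of 0/1 values and shows that the sample mean and the sample proportion are the same number. The data are hypothetical, made up only for this example.

```python
# Hypothetical 0/1 sample: 1 = characteristic present, 0 = absent.
sample = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

n = len(sample)
successes = sum(sample)   # summing zeros and ones counts the successes
p_hat = successes / n     # sample proportion = sample mean of X

print(successes, n, p_hat)
```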
Proportion Estimation: 6
By the Central Limit Theorem, we know, for n large,
$$\bar{X} \sim N(\mu, \sigma^2/n)$$
even when X is not normally distributed.
When X is a 0,1 variable, for n large we know from the Central Limit Theorem that, approximately,
$$\bar{X} = P \sim N(\mu, \sigma^2/n)$$
Proportion Estimation: 7
What is the variance, σ², for a 0,1 variable?
We know
$$\sigma^2 = \frac{1}{N}\sum_{s=1}^{N}(x_s - \mu)^2 = \frac{1}{N}\sum_{s=1}^{N}(x_s - \pi)^2$$
By use of algebra, and the fact that $x_s^2 = x_s$ for a 0,1 variable, we can show that
$$\sigma^2 = \pi(1 - \pi)$$
Proportion Estimation: 8
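For those who want the algebra:
$$\sigma^2 = \frac{1}{N}\sum_{s=1}^{N}(x_s - \pi)^2 = \frac{1}{N}\sum_{s=1}^{N}\left(x_s^2 - 2\pi x_s + \pi^2\right) \qquad \text{(expand)}$$
$$= \frac{1}{N}\left[\sum_{s=1}^{N} x_s - 2\pi\sum_{s=1}^{N} x_s + \sum_{s=1}^{N}\pi^2\right] \qquad (x_s^2 = x_s \text{ for 0,1})$$
$$= \frac{1}{N}\sum_{s=1}^{N} x_s - 2\pi\,\frac{1}{N}\sum_{s=1}^{N} x_s + \frac{N\pi^2}{N} \qquad \text{(sum over constant)}$$
$$= \pi - 2\pi^2 + \pi^2 = \pi(1 - \pi)$$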
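The identity for the variance of a 0,1 variable can also be checked numerically. The sketch below uses a hypothetical population of 10 individuals with 3 successes (so π = 0.3), computes the variance directly from the definition with divisor N, and compares it with π(1 − π).

```python
def zero_one_variance(population):
    """Population variance (divisor N) of a list of 0/1 values."""
    N = len(population)
    pi = sum(population) / N
    return sum((x - pi) ** 2 for x in population) / N

# Hypothetical population: 3 successes among 10 individuals, so pi = 0.3.
population = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pi = sum(population) / len(population)

direct = zero_one_variance(population)   # from the definition
shortcut = pi * (1 - pi)                 # from the algebraic result

print(direct, shortcut)
```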
Proportion Estimation: 9
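Hence,
$$\sigma^2 = \pi(1 - \pi), \qquad \sigma_{\bar{X}}^2 = \sigma_P^2 = \frac{\sigma^2}{n} = \frac{\pi(1 - \pi)}{n}$$
The standard error of the sample proportion, P, is
$$\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}}$$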
Proportion Estimation: 10
We also know, by the Central Limit Theorem, that for large n, P is approximately normally distributed:
$$P \sim N\left(\pi, \frac{\pi(1 - \pi)}{n}\right)$$
For estimation of the population proportion, π:
Point Estimate:
$$P = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Confidence Interval Estimate:
$$P \pm z_{1-\alpha/2}\,\sigma_P$$
Proportion Estimation: 11
Example: Suppose that a sample of 1000 voters is taken to determine presidential preference.
In this sample, 585 persons indicated that they would vote for candidate A.
Construct a 95% confidence interval estimate for the true proportion, π, in the population planning to vote for candidate A.
The confidence interval for π takes the form:
$$\bar{X} \pm z_{1-\alpha/2}\,(se) = P \pm z_{1-\alpha/2}\,\sigma_P$$
Proportion Estimation: 12
1. The point estimate of the proportion is: p = 585/1000 = .585
2. The 95% confidence interval estimator of π is
$$P \pm z_{1-\alpha/2}\,\sigma_P = P \pm z_{.975}\sqrt{\frac{\pi(1 - \pi)}{n}}$$
However, we don’t know π, so we will use p in its place to estimate the standard error:
$$p \pm z_{.975}\sqrt{\frac{p(1 - p)}{n}} = .585 \pm (1.96)\sqrt{\frac{.585(.415)}{1000}}$$
$$= .585 \pm (1.96)(.0156)$$
$$= .585 \pm .0305$$
Proportion Estimation: 13
The 95% CI on the proportion preferring Candidate A is (.554, .616).
This does not include the value .50:
Either the true proportion really is .5 and we obtained an unusually large sample proportion (so that the interval estimate does not cover π = .5), or the population proportion is not .5, which suggests that candidate A will win the election.
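The voter example can be reproduced in a few lines. This is a minimal sketch of the large-sample interval p ± z·sqrt(p(1 − p)/n); the function name `proportion_ci` is made up for illustration, and the z value comes from Python's standard-library `statistics.NormalDist` rather than a rounded table value of 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Normal-approximation CI for a proportion (hypothetical helper)."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1-alpha/2}
    se = sqrt(p * (1 - p) / n)                     # p used in place of pi
    return p - z * se, p + z * se

low, high = proportion_ci(585, 1000)
print(round(low, 3), round(high, 3))
```

Because .5 lies below the lower limit, the interval leads to the same reading as the slide: the data are hard to reconcile with a true proportion of .5.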
Proportion Estimation: 14
When is the sample large enough to use the normal approximation to the binomial?
When nπ ≥ 5 and n(1−π) ≥ 5.
That is,
when both the expected number of successes and the expected number of failures are at least 5.
Proportion Estimation: 15
Aside: improving the normal approximation to the binomial
• The binomial distribution is discrete, while the normal distribution is continuous. When the true proportion, π, is known, we can match the binomial distribution better to a normal distribution by including a correction. The correction is called the ‘continuity correction’.
• For example, when π = .5 and n = 10, to approximate
$$P\left(\frac{x_1}{n} \le P \le \frac{x_2}{n}\right),$$
we use instead the normal approximation for the probability
$$P\left(\frac{x_1 - 1/2}{n} \le P \le \frac{x_2 + 1/2}{n}\right)$$
Proportion Estimation: 16
Example of ‘Continuity Correction' to the Normal Approximation to the Binomial.
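Suppose π = .5 and n = 16. Compare the exact, normal-approximation, and continuity-corrected values of P(.4375 ≤ P ≤ .5).
• From Binomial Table:
$$P(.4375 \le P \le .5) = P\left(\tfrac{7}{16} \le P \le \tfrac{8}{16}\right) = .1746 + .1964 = 0.371$$
• Using Normal Approximation, no correction:
$$P\left(\frac{.4375 - .5}{0.125} \le z \le \frac{.5 - .5}{0.125}\right) = P\left(\frac{-.0625}{.125} \le z \le 0\right) = P(-.5 \le z \le 0) = .1915$$
• Using Correction:
$$P\left(\tfrac{6.5}{16} \le P \le \tfrac{8.5}{16}\right) = P(.40625 \le P \le .53125) = P(-0.75 \le z \le 0.25) = .5987 - .2266 = .3721$$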
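The three values in this comparison can be recomputed without tables. The sketch below uses only the Python standard library; the helper names `binom_pmf` and `normal_prob` are made up for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 16, 0.5
x1, x2 = 7, 8                        # P between 7/16 = .4375 and 8/16 = .5
sigma_p = sqrt(pi * (1 - pi) / n)    # standard error of P, here 0.125
std_normal = NormalDist()

def binom_pmf(k):
    """Exact binomial probability of k successes out of n."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def normal_prob(lo, hi):
    """Normal-approximation probability that P falls in [lo, hi]."""
    return (std_normal.cdf((hi - pi) / sigma_p)
            - std_normal.cdf((lo - pi) / sigma_p))

exact = sum(binom_pmf(k) for k in range(x1, x2 + 1))
uncorrected = normal_prob(x1 / n, x2 / n)
corrected = normal_prob((x1 - 0.5) / n, (x2 + 0.5) / n)

print(round(exact, 4), round(uncorrected, 4), round(corrected, 4))
```

The corrected value lands very close to the exact binomial probability, while the uncorrected one is roughly half of it, which is the point of the correction for an interval this narrow.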
Proportion Estimation: 17
Using P in place of π to estimate the standard error σ_P:
1. If nπ ≥ 5 and n(1−π) ≥ 5, use P:
$$se(P) = \sqrt{\frac{P(1 - P)}{n}}$$
2. Otherwise, a) assume π = .5:
$$se(P) = \sqrt{\frac{.5(1 - .5)}{n}}$$
or b) use an ‘exact’ method for the CI.
We do this to avoid underestimating the variance, π(1−π), which is at a maximum when π = .5.
Don’t use Student’s t with proportions, since the assumption of normality of the underlying population elements is not satisfied by a 0,1 variable.
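The choice between the two standard-error formulas can be written as a small function. This is only a sketch under two assumptions that are not spelled out on the slide: the threshold of 5 is treated as inclusive, and because π is unknown the rule is checked with the observed counts of successes and failures. The name `standard_error` is hypothetical.

```python
from math import sqrt

def standard_error(x, n, threshold=5):
    """Estimated se of P: sample-based if counts are large enough,
    otherwise the conservative value obtained with pi = .5."""
    p = x / n
    failures = n - x
    if x >= threshold and failures >= threshold:   # assumed reading of the rule
        return sqrt(p * (1 - p) / n)
    return sqrt(0.5 * (1 - 0.5) / n)               # maximum possible variance

se_large = standard_error(585, 1000)   # rule satisfied, uses p
se_small = standard_error(2, 20)       # only 2 successes, uses .5

print(round(se_large, 4), round(se_small, 4))
```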
Proportion Estimation: 18
What do we use when the normal approximation is not appropriate?
Exact Binomial Confidence Intervals for π can be computed:
Solve for x in the following and then substitute into p = x/n:
Lower Limit:
$$\Pr(X \le x \mid p) = \sum_{k=0}^{x} {}_{n}C_{k}\, p^{k}(1 - p)^{n-k}$$
Upper Limit:
$$\Pr(X \ge x \mid p) = \sum_{k=x}^{n} {}_{n}C_{k}\, p^{k}(1 - p)^{n-k}$$
Clearly, the exact binomial CI is not simple to compute.
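To show what such a computation involves, here is a minimal sketch of one common exact interval, the Clopper-Pearson construction. Treating "exact" as Clopper-Pearson is an assumption, as is setting each binomial tail probability equal to α/2; the sketch fixes the observed count x and solves each tail equation for p by bisection. All function names are hypothetical.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Binomial probability, computed on the log scale to avoid underflow."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def upper_tail(x, n, p):
    """Pr(X >= x | p); increases as p increases."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def lower_tail(x, n, p):
    """Pr(X <= x | p); decreases as p increases."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def solve(f, target, increasing, tol=1e-9):
    """Bisection on p in (0, 1) for f(p) = target, with f monotone in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies to the left of mid
    return (lo + hi) / 2

def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson interval (assumed meaning of 'exact' here)."""
    alpha = 1 - conf
    lower = 0.0 if x == 0 else solve(
        lambda p: upper_tail(x, n, p), alpha / 2, increasing=True)
    upper = 1.0 if x == n else solve(
        lambda p: lower_tail(x, n, p), alpha / 2, increasing=False)
    return lower, upper

low, high = exact_ci(585, 1000)
print(round(low, 4), round(high, 4))
```

For the voter example, the result can be compared with the exact interval in the Minitab output quoted for the same data, (0.553748, 0.615750).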
Proportion Estimation: 19
Go to Minitab or other software:
Stat > Basic Statistics > 1 Proportion
• Enter n and x
• Leave the option blank for the Binomial CI; check it for the Normal approx.
Proportion Estimation: 20
EXACT Binomial: Test and CI for One Proportion
Test of p = 0.5 vs p not = 0.5
Sample    X     N  Sample p  95.0% CI              Exact P-Value
1       585  1000  0.585     (0.553748, 0.615750)  0.000
Normal Approximation: Test and CI for One Proportion
Test of p = 0.5 vs p not = 0.5
Sample    X     N  Sample p  95.0% CI              Z-Value  P-Value
1       585  1000  0.585     (0.554461, 0.615539)  5.38     0.000
Proportion Estimation: 21
Sample Size Estimation when the goal is Estimating a Population Proportion, π
The pattern is the same as when the goal is estimation of a mean:
If we know
• the desired precision (width of interval)
• confidence level
• “guess” of the proportion to get std error
we can estimate the sample size, n.
Proportion Estimation: 22
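The width of a confidence interval for π is:
$$w = 2\left[z_{1-\alpha/2}\,\sigma_P\right],$$
where σ_P is the standard error of P.
The interval runs from P − z_{1−α/2}(σ_P) to P + z_{1−α/2}(σ_P), so its total width is w.
Using
$$\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
we have
$$w = 2\,z_{1-\alpha/2}\sqrt{\frac{\pi(1 - \pi)}{n}}$$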
Proportion Estimation: 23
Solving for n gives us
$$n = \frac{4\,(z_{1-\alpha/2})^2\,\pi(1 - \pi)}{w^2}$$
Note:
• this requires information about π, which is our goal!
• However, π(1−π) is at a maximum when π = .5
• To be conservative
• (over- rather than under-estimate sample size)
• use (.5) in place of π
Proportion Estimation: 24
Substituting in .5 for π gives a conservative sample size estimator of:
$$n = \frac{4\,(z_{1-\alpha/2})^2\,\pi(1 - \pi)}{w^2} = \frac{4\,(z_{1-\alpha/2})^2\,(.5)(.5)}{w^2}$$
$$n = \frac{(z_{1-\alpha/2})^2}{w^2}$$
Proportion Estimation: 25
Example:
For an election poll, how many voters should be surveyed to estimate the proportion, to within 5%, in favor of re-electing the current mayor, with 95% confidence?
1. We have a confidence level, 1 − α = .95, so z_{.975} = 1.96
2. We want the estimate to be within 5% = .05 on either side, so the total width is w = .10
Conservative: n = (z_{1−α/2})²/w² = (1.96)²/(.10)² = 384.16
We should poll 385 voters to achieve a 95% CI of ±5%.
Proportion Estimation: 26
What if we have some information on π?
A previous poll tells us that the current office-holder had ~ 75% of the voter support.
Assuming π = .75:
n = 4π(1−π)(z_{1−α/2})²/w²
= 4(.75)(.25)(1.96)²/(.10)² = 288.12
Using available information
• we get a sample size estimate of 289 voters
• which can save us considerable time and expense, compared to the more conservative estimate.
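Both sample-size calculations follow from the same formula, n = 4 z² π(1 − π)/w², so they can be sketched as one function with π defaulting to the conservative .5. The function name `sample_size` is made up for illustration, and z is taken from `statistics.NormalDist` rather than the rounded 1.96, which changes the raw values slightly but not the rounded-up answers.

```python
from math import ceil
from statistics import NormalDist

def sample_size(width, conf=0.95, pi=0.5):
    """n needed for a CI of total width `width` (hypothetical helper).
    pi = 0.5 gives the conservative (largest) answer."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 4 * z**2 * pi * (1 - pi) / width**2

n_conservative = sample_size(0.10)          # no prior information on pi
n_informed = sample_size(0.10, pi=0.75)     # previous poll suggests pi = .75

# Always round up: a fractional voter cannot be sampled.
print(ceil(n_conservative), ceil(n_informed))
```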
Proportion Estimation: 27
Confidence Interval Calculation for the
Difference between two proportions, π₁ – π₂,
Two independent groups
We are often interested in comparing proportions from 2 populations:
• Is the incidence of disease A the same in two populations?
• Patients are treated with either drug D, or with placebo. Is the proportion “improved” the same in both groups?
Proportion Estimation: 28
Suppose we take independent, random samples from two groups, and estimate a proportion in each.
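For large enough sample size, we know:
$$P_1 \sim N\left(\pi_1, \frac{\pi_1(1 - \pi_1)}{n_1}\right), \qquad P_2 \sim N\left(\pi_2, \frac{\pi_2(1 - \pi_2)}{n_2}\right)$$
Then the standard error of the difference between the sample proportions is the square root of the sum of the variances:
$$\sigma_{P_1 - P_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$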
Proportion Estimation: 29
Or, since we don’t know the true proportions, the sample estimate of the standard error:
$$se(P_1 - P_2) = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$
Thus, for n large, the (1 − α) confidence interval estimator is:
$$(P_1 - P_2) \pm z_{1-\alpha/2}\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$
Proportion Estimation: 30
Example:
In a clinical trial for a new drug to treat hypertension, 50 patients were randomly assigned to receive the new drug, and 50 patients to receive a placebo.
34 of the patients receiving the drug showed improvement, while 15 of those receiving placebo showed improvement.
Compute a 95% confidence interval estimate for the difference between proportions improved.
Proportion Estimation: 31
1. Point Estimate of (π₁ – π₂):
p1 = 34/50 = .68
p2 = 15/50 = .30
(p1 – p2) = .68 – .30 = .38
2. Since we have n1 = n2 = 50, our sample size is large enough to use the sample estimate of standard error:
$$se(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = \sqrt{\frac{.68(.32)}{50} + \frac{.30(.70)}{50}} = .0925$$
Proportion Estimation: 32
3. Confidence coefficient:
For 1 – α = .95, z_{1−α/2} = z_{.975} = 1.96
4. Confidence Interval Estimate:
$$(P_1 - P_2) \pm z_{1-\alpha/2}\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$
$$= (.38) \pm (1.96)(.0925) = .38 \pm .181$$
The 95% CI estimate is:
(.199 , .561) or (19.9% , 56.1%)
The difference between proportions improved is bounded away from zero – it seems that the proportion improved on the drug is clearly greater than the proportion improved on placebo.
Proportion Estimation: 33
Using Minitab: Stat > Basic Statistics > 2 Proportions
Enter sample sizes n1 and n2
Enter # of successes x1 and x2
Proportion Estimation: 34
Test and Confidence Interval for Two Proportions
Sample X N Sample p
1 34 50 0.680000
2 15 50 0.300000
Estimate for p(1) - p(2): 0.38
95% CI for p(1) - p(2): (0.198748, 0.561252)
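The same interval can be computed directly from the counts. This is a minimal sketch of the unpooled large-sample interval for a difference of two independent proportions; the function name `two_proportion_ci` is hypothetical, and z comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, conf=0.95):
    """Unpooled normal-approximation CI for pi1 - pi2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error: square root of the sum of the two estimated variances.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypertension trial: 34 of 50 improved on drug, 15 of 50 on placebo.
low, high = two_proportion_ci(34, 50, 15, 50)
print(round(low, 6), round(high, 6))
```

The result should agree with the Minitab interval shown above to the printed precision, and zero lies well outside it.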
Proportion Estimation: 35
The same cautions apply here, as for estimates for a single proportion
• the sample size should be large enough in each group, so that the normal approximation will hold:
• nπ ≥ 5 and n(1−π) ≥ 5 for each sample
• Otherwise: a) use .5 in place of π when estimating the variance for the confidence interval, or b) use some other method.
• Minitab offers the option to compute a pooled estimate of the standard error
Proportion Estimation: 36
And in summary:
Confidence interval estimates provide
• a range of likely values
• an associated probability, or confidence level.
The width of the confidence interval depends upon:
• The underlying variability in the population
• The sample size
• The confidence level
Proportion Estimation: 37
It is important to keep track of assumptions that we must make about the data:
• Samples should be selected randomly
• selection of any element is independent of selection of any others
• For many cases, we must assume that the underlying population follows a normal distribution
• without this assumption, probabilities computed using the t-distribution, χ²-distribution, or F-distribution may not be correct.
Proportion Estimation: 38
When we speak of “knowing” the population variance, σ²,
• we really mean that we have an outside source of information
• previous research, census data, etc.
• the key is that we are not using the sample estimate, s², based upon the current sample.
Proportion Estimation: 39
The key to confidence interval estimation is to know
• what parameter you are estimating
• the point estimate of the parameter
• the confidence level
• what distributional assumptions are required
• the associated distribution for computing probabilities.
I have started a summary table for you below – completing this table will be a good review exercise.
Proportion Estimation: 40
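Distribution of data | Parameter to Estimate | Point Estimate | (1 – α) Confidence Interval Estimate
N(μ, σ²) | μ | $\bar{X}$ | σ² known: $\bar{X} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$; σ² unknown: $\bar{X} \pm t_{n-1;\,1-\alpha/2}\,S/\sqrt{n}$
Any, n large | μ | $\bar{X}$ | For n large: σ² known: $\bar{X} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$; σ² unknown: $\bar{X} \pm z_{1-\alpha/2}\,S/\sqrt{n}$
Bin(n, π) | π | P |
N(μ, σ²) | σ² | S² |
N(μ₁, σ₁²), N(μ₂, σ₂²) | μ₁ – μ₂ | |
Bin(n₁, π₁), Bin(n₂, π₂) | π₁ – π₂ | |
N(μ₁, σ₁²), N(μ₂, σ₂²) | | |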