Inference about a population proportion.
1
• Paper due March 29
• Last day for consultation with me March 22
2
Who prefers the RAZR?
• http://www.nytimes.com/2009/03/22/business/media/23mostwanted.html?_r=1&ref=media
3
4
Prediction
5
Prediction
6
Probabilistic Reasoning
• “The Achilles’ heel of human cognition.”
7
Probabilistic Reasoning
• “Men are taller than women”
• “All men are taller than all women”
8
Probabilistic Reasoning
• A probabilistic trend is one that holds more often than not but does not always hold true.
9
Probabilistic Reasoning
• Knowledge does not have to be certain to be useful.
• Individual cases cannot be predicted, but trends can.
BPS - 5th Ed. Chapter 19 10
Proportions
• The proportion of a population that has some outcome (“success”) is p.
• The proportion of successes in a sample is measured by the sample proportion:
$\hat{p} = \frac{\text{number of successes in the sample}}{\text{total number of observations in the sample}}$
“p-hat”
11
Inference about a Proportion: Simple Conditions
Confidence Intervals for Proportions
• Social media is poised to become a central player in the 2012
12
Example 19.5 page 508
• What proportion of Euros have cocaine traces?
• Sample 17 out of 20
• 85%
• Plus 4 method
• 79%
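The two percentages in Example 19.5 can be checked with a short Python sketch. It assumes the usual plus four rule of adding two successes and two failures before dividing; the function names are illustrative only and do not come from the slides.

```python
def sample_proportion(successes, n):
    """Ordinary sample proportion: successes divided by sample size."""
    return successes / n


def plus_four_proportion(successes, n):
    """Plus four estimate: add 2 successes and 2 failures, then divide."""
    return (successes + 2) / (n + 4)


# Example 19.5: 17 of 20 sampled notes showed traces.
p_hat = sample_proportion(17, 20)
p_tilde = plus_four_proportion(17, 20)
print(f"p-hat = {p_hat:.2f}, plus four = {p_tilde:.2f}")
```

With a sample this small, the plus four estimate pulls the proportion noticeably toward one half, which is why the slide reports 79% next to 85%.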
13
$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Dealing with sampling error
• Confidence intervals
• Hypothesis testing
Obtaining confidence intervals
• estimate ± margin of error
Determining Critical values of Z
Confidence level   Area in each tail   Critical value z*
90%                .05                 1.645
95%                .025                1.96
99%                .005                2.576
• Critical Values: values that mark off a specified area under the standard normal curve.
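The critical values listed here can be reproduced from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `critical_value` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return z* such that the central `confidence` area lies between -z* and z*."""
    tail_area = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    tail = (1 - level) / 2
    print(f"{level:.0%}  tail area {tail:.3f}  z* = {critical_value(level):.3f}")
```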
19.25 page 517
• Do smokers know it is bad for them?
• Yes 848
• Total 1010
• 84%
• Margin of error .0226
• Lower limit .8170
• Upper limit .8622
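A minimal sketch of the large-sample interval for Exercise 19.25, assuming a 95% confidence level (the level is not stated on the slide, but it matches the limits shown). The function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample interval: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin


# 848 of 1010 smokers answered yes.
p_hat, margin, lower, upper = proportion_ci(848, 1010)
print(f"p-hat {p_hat:.4f}, margin {margin:.4f}, interval ({lower:.4f}, {upper:.4f})")
```

Under that assumption the margin of error comes out near .0226, giving the limits .8170 and .8622.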
17
Problem 19.6 page 507
• What proportion of SAT takers have coaching?
• 427 coaching
• 2733 did not
• 3160 total
• standard error 0.0061
• margin of error 0.0157
• upper 0.1508
• lower 0.1195
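The numbers in Problem 19.6 can be traced step by step. The worked arithmetic below assumes a 99% confidence level (z* = 2.576); that level is inferred from the ratio of the margin of error to the standard error and is not stated on the slide. Intermediate values carry extra digits so the rounded limits match.

```latex
\begin{aligned}
\hat{p} &= \frac{427}{3160} \approx 0.13513 \\
SE &= \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
    = \sqrt{\frac{0.13513 \times 0.86487}{3160}} \approx 0.00608 \\
m &= z^{*} \times SE \approx 2.576 \times 0.00608 \approx 0.0157 \\
\hat{p} \pm m &\approx 0.1351 \pm 0.0157 \;\Rightarrow\; (0.1195,\ 0.1508)
\end{aligned}
```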
18
$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
19
Two-way tables
William P. Wattles, Ph.D.
Chapter 20
20
Categorical Data
• Examples: gender, race, occupation, type of cellphone, and type of trash are categorical.
21
Categorical Data
• Sometimes measurement data is grouped into categories.
heart dis          Freq
less than 219.2    12
219.2-247.9        13
248-282            13
>=282              13
22
Categorical Data
• Expressed in counts or percents
heart dis          Freq   %
less than 219.2    12     24%
219.2-247.9        13     25%
248-282            13     25%
>=282              13     25%
Total              51     100%
[Chart legend: Less than 219.2; 219.2 to 247.9; 248.0 to 282.0; More than 282.0]
23
Population
Parameter
p = population proportion
Sample
p̂ (“p-hat”) = sample proportion
24
$\hat{p} = \text{sample proportion} = \frac{\text{count of successes}}{\text{total count}}$
25
Two-way table
• Organizes data about two categorical variables
                 Column variable
Row variable     column 1     column 2     column 3
row1             #            #            #            row1 total
row2             #            #            #            row2 total
                 col1 total   col2 total   col3 total
BPS - 5th Ed. Chapter 6 26
Categorical Variables
• Now we will study the relationship between two categorical variables (variables whose values fall in groups or categories).
• To analyze categorical data, use the counts or percents of individuals that fall into various categories.
27
Two-Way Table
• When there are two categorical variables, the data are summarized in a two-way table
– each row in the table represents a value of the row variable
– each column of the table represents a value of the column variable
• The number of observations falling into each combination of categories is entered into each cell of the table
Two-way table
28
                  25-34    35-54    55+      Total
No High School    4459     9174     14226    27859
High School       11562    26455    20060    58077
College 1-3       10693    22647    11125    44465
College 4+        11071    23160    10597    44828
Total             37785    81436    56008
29
Marginal Distributions
• A distribution for a categorical variable tells how often each outcome occurred
– totaling the values in each row of the table gives the marginal distribution of the row variable (totals are written in the right margin)
– totaling the values in each column of the table gives the marginal distribution of the column variable (totals are written in the bottom margin)
30
                  25-34    35-54    55+      Total
No High School    4459     9174     14226    27859
High School       11562    26455    20060    58077
College 1-3       10693    22647    11125    44465
College 4+        11071    23160    10597    44828
Total             37785    81436    56008
31
Marginal Distributions
• It is usually more informative to display each marginal distribution in terms of percents rather than counts
– each marginal total is divided by the table total to give the percents
• A bar graph could be used to graphically display marginal distributions for categorical variables
32
                   25-34    35-54    55+      Percent of total
No High School     4459     9174     14226    15.9%
High School        11562    26455    20060    33.1%
College 1-3        10693    22647    11125    25.4%
College 4+         11071    23160    10597    25.6%
Percent of total   21.6%    46.5%    32.0%
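The marginal percents can be recomputed from the cell counts (in thousands) by totaling rows and columns and dividing each total by the table total. The following is a minimal sketch; the variable names are illustrative.

```python
# Cell counts in thousands: rows are education levels, columns are age groups.
ages = ["25-34", "35-54", "55+"]
counts = {
    "No High School": [4459, 9174, 14226],
    "High School": [11562, 26455, 20060],
    "College 1-3": [10693, 22647, 11125],
    "College 4+": [11071, 23160, 10597],
}

# Row totals give the marginal distribution of education;
# column totals give the marginal distribution of age.
row_totals = {level: sum(row) for level, row in counts.items()}
col_totals = [sum(column) for column in zip(*counts.values())]
table_total = sum(row_totals.values())

education_marginal = {
    level: round(100 * total / table_total, 1) for level, total in row_totals.items()
}
age_marginal = {
    age: round(100 * total / table_total, 1) for age, total in zip(ages, col_totals)
}

print(education_marginal)
print(age_marginal)
```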
33
Case Study: Age and Education
Data from the U.S. Census Bureau for the year 2000 on the level of education reached by Americans of different ages.
(Statistical Abstract of the United States, 2001)
34
Case Study: Age and Education
Variables
Marginal distributions
35
Case Study: Age and Education
Variables
Marginal distributions
Age: 21.6% 46.5% 32.0%
Education: 15.9% 33.1% 25.4% 25.6%
36
Case Study: Age and Education
Marginal Distribution for Education Level
Not HS grad 15.9%
HS grad 33.1%
College 1-3 yrs 25.4%
College ≥4 yrs 25.6%
37
Conditional Distributions
• Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table
– prevents misleading comparisons due to unequal sample sizes for different groups
38
Case Study: Age and Education
Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:
Data are in thousands, so we have that 11,071,000 persons in the 25-34 age group have completed at least 4 years of college, compared to 23,160,000 persons in the 35-54 age group.
The groups appear greatly different, but look at the group totals.
39
Case Study: Age and Education
Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:
Change the counts to percents:
$\frac{11{,}071}{37{,}786} = .293$ (29.3%) for 25-34 age group
$\frac{23{,}160}{81{,}435} = .284$ (28.4%) for 35-54 age group
Now, with a fairer comparison using percents, the groups appear very similar.
40
Case Study: Age and Education
If we compute the percent completing at least four years of college within each of the age groups, we get one entry from each conditional distribution of education given age, namely the entry for “completed at least 4 years of college”:
Age: 25-34 35-54 55 and over
Percent with ≥ 4 yrs college: 29.3% 28.4% 18.9%
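The three percents can be checked by dividing each age group's count of four-year college completers by that age group's total. This sketch assumes the column totals from the two-way table (37,785 and 81,436), which differ by one from the totals quoted on the earlier slide; the rounded percents come out the same either way.

```python
# Counts in thousands, taken from the age-by-education table.
college_4_plus = {"25-34": 11071, "35-54": 23160, "55 and over": 10597}
age_totals = {"25-34": 37785, "35-54": 81436, "55 and over": 56008}

# Percent of each age group that completed at least 4 years of college.
percent_college = {
    age: round(100 * college_4_plus[age] / age_totals[age], 1) for age in age_totals
}

for age, pct in percent_college.items():
    print(f"{age}: {pct}%")
```

Note that these percents do not add to 100%, because each one is computed within a different age group.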
41
Conditional Distributions
• The conditional distribution of one variable can be calculated for each category of the other variable.
• These can be displayed using bar graphs.
• If the conditional distributions of the second variable are nearly the same for each category of the first variable, then we say that there is not an association between the two variables.
• If there are significant differences in the conditional distributions for each category, then we say that there is an association between the two variables.
42
Case Study: Age and Education
Conditional Distributions of Age for each level of Education:
Cell phone preference
43
44
Marginal Distribution
• Row and column totals
• Provides counts or percents of one variable
45
Conditional Distribution
• Each cell value as a percent of its marginal (row or column) total
46
Two-way Tables
• Do you think the Bush administration has a clear and well-thought-out policy on Iraq, or not?
             New Yorkers   USA
Yes          42%           59%
No           48%           35%
No opinion   10%           6%
47
Relationships between categorical variables
Risks of Soccer
               Elite   Non-elite   Did not play
Arthritis      10      9           24
No Arthritis   61      206         548
48
Relationships between categorical variables
Risks of Soccer
                   Elite   Non-elite   Did not play   Total
Arthritis          10      9           24             43
No Arthritis       61      206         548            815
Total              71      215         572            858
Percent of total   8%      25%         67%
49
Relationships between categorical variables
• Calculate percent of players who had arthritis
50
Relationships between categorical variables
• Calculate percent of players who had arthritis
Risks of Soccer: Percent with Arthritis
               Elite   Non-elite   Did not play
Arthritis      14.1%   4.2%        4.2%
No Arthritis   85.9%   95.8%       95.8%
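The percent with arthritis in each group is a column percent: the arthritis count divided by that group's total. A minimal sketch with illustrative variable names:

```python
groups = ["Elite", "Non-elite", "Did not play"]
arthritis = [10, 9, 24]
no_arthritis = [61, 206, 548]

# Divide each arthritis count by its own group total (column percent).
percent_with_arthritis = {}
for group, yes, no in zip(groups, arthritis, no_arthritis):
    group_total = yes + no
    percent_with_arthritis[group] = round(100 * yes / group_total, 1)

print(percent_with_arthritis)
```

The raw counts alone would suggest the "did not play" group has the most arthritis (24 cases), but that group is also by far the largest; the percents show the elite players with the highest rate.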
51
Categorical data
• Smoking Data
Smoking
                         Neither parent smokes   One parent smokes   Both parents smoke
Student does not smoke   1168                    1823                1380
Student smokes           188                     416                 400
52
Categorical data
• Smoking Data
Smoking
                         Neither parent smokes   One parent smokes   Both parents smoke   Total
Student does not smoke   1168                    1823                1380                 4371
Student smokes           188                     416                 400                  1004
Total                    1356                    2239                1780                 5375
53
Categorical data
• Smoking Data
Smoking
                         Neither parent smokes   One parent smokes   Both parents smoke
Student does not smoke   86.1%                   81.4%               77.5%
Student smokes           13.9%                   18.6%               22.5%
54
Student Smoking
[Bar graph: percent of students who smoke (vertical axis 0.0% to 25.0%) by parental smoking: neither parent smokes, one parent smokes, both parents smoke]
55
Evaluating Treatment
          better   not better
swim      200      75
no swim   50       15
56
Evaluating Treatment
          better   not better   Total
swim      200      75           275
no swim   50       15           65
Total     250      90           340
57
          better   not better   Total
swim      200      75           275
no swim   50       15           65
Total     250      90           340
Percent improved
          better   not better   Total
swim      73%      27%          100%
no swim   77%      23%          100%
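The "percent improved" table consists of row percents: each count divided by its own row total. The sketch below uses a hypothetical helper, `row_percents`, that works for any two-way table stored as a dictionary of rows.

```python
def row_percents(table):
    """Convert each row of counts to whole-number percents of that row's total."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {
            column: round(100 * count / row_total) for column, count in row.items()
        }
    return result


treatment = {
    "swim": {"better": 200, "not better": 75},
    "no swim": {"better": 50, "not better": 15},
}

percent_improved = row_percents(treatment)
print(percent_improved)
```

Although four times as many swimmers got better (200 versus 50), the row percents show that the swim group improved at a slightly lower rate (73% versus 77%), which is why counts alone can mislead when group sizes differ.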
58
The End