![Page 1: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/1.jpg)
AP Statistics Section 4.2
Relationships Between Categorical Variables
![Page 2: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/2.jpg)
We will now describe relationships between two or more categorical
variables. Some variables, such as race or sex, are categorical by nature. Other categorical variables are created by
grouping values of a quantitative variable into classes. To analyze categorical data, we use the counts or percents of individuals
that fall into various categories.
![Page 3: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/3.jpg)
The table below presents Census Bureau data describing the age and sex of college students.
This is a two-way table because it describes two categorical variables. (Age is categorical
here because the students are grouped into age categories.) Age group is the row variable
because each row in the table describes students in one age group. Sex is the column
variable because each column describes one sex. The entries in the table are the counts of
students in each age-by-sex class.
![Page 4: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/4.jpg)
![Page 5: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/5.jpg)
Discrepancies may appear in tabular data. For example, the sum of the entries in the “25 to 34” row is 1,904 + 1,589 = 3,493. The entry in
the total column is 3,494. The explanation is rounding error.
![Page 6: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/6.jpg)
To best grasp the information contained in the table, first look at the distribution of
each variable separately. The distributions of sex alone and age alone are called marginal distributions because they
appear at the right and bottom margins of the two-way table. The distribution of a categorical variable says how often each
outcome occurred. It is usually more informative to look at percents
than at counts.
![Page 7: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/7.jpg)
Example 1: Determine the percent of college students in each age group.
15 to 17: 150/16,639 = 0.9%
18 to 24: 62.3%
25 to 34: 21.0%
35 or older: 15.8%
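The arithmetic in Example 1 can be sketched in a few lines of Python. This is only an illustration: the row totals are the ones quoted on the surrounding slides (150, 10,365, 3,494, 2,630, and the grand total 16,639), the pairing of each total with an age-group label is inferred from the answers, and `marginal_percents` is a hypothetical helper name rather than anything from the lesson.

```python
# Row totals taken from the slides; the age-group labels attached to them
# are inferred from the worked answers (an assumption).
age_totals = {
    "15 to 17": 150,
    "18 to 24": 10_365,
    "25 to 34": 3_494,
    "35 or older": 2_630,
}
grand_total = 16_639  # table total used as the denominator in Example 1


def marginal_percents(counts, total):
    """Return each category's share of the total as a percent, to one decimal."""
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


age_distribution = marginal_percents(age_totals, grand_total)
for group, pct in age_distribution.items():
    print(f"{group}: {pct}%")
```

Dividing by the grand total, rather than by a row or column total, is what makes this a marginal distribution.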
![Page 8: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/8.jpg)
Each marginal distribution from a two-way table is a distribution for a
single categorical variable. We could use a pie graph or bar graph
to display such a distribution.
![Page 9: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/9.jpg)
The marginal distributions of sex and age do not tell us how the two variables are related. How can we describe the relationship between age and sex of college students? To
describe relationships among categorical variables, calculate appropriate percents from the
counts given.
![Page 10: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/10.jpg)
Example 2: Complete the tables below, which give the conditional distribution of sex, given age.
18 to 24: Female 5,668/10,365 = 54.7%; Male 45.3%
35 or older: Female 1,660/2,630 = 63.1%; Male 36.9%
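A minimal sketch of the Example 2 calculation follows. It assumes the female counts and row totals shown in the answers (5,668 of 10,365 and 1,660 of 2,630) and assumes that these belong to the 18 to 24 and 35 or older rows. The function name `sex_given_age` is hypothetical.

```python
# Female count and row total for two age groups, as quoted in Example 2.
# Which age group each pair belongs to is inferred, not stated in the text.
rows = {
    "18 to 24": {"Female": 5_668, "Total": 10_365},
    "35 or older": {"Female": 1_660, "Total": 2_630},
}


def sex_given_age(row):
    """Conditional distribution of sex within one age group (one table row)."""
    female_pct = round(100 * row["Female"] / row["Total"], 1)
    male_pct = round(100 - female_pct, 1)  # the two percents must sum to 100
    return {"Female": female_pct, "Male": male_pct}


conditional = {age: sex_given_age(row) for age, row in rows.items()}
for age, dist in conditional.items():
    print(age, dist)
```

Each percent is taken out of the row total, not the grand total, because the condition "given age" restricts attention to one row.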
![Page 11: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/11.jpg)
When we compare the percent of women in two age groups, we are comparing conditional distributions. Comparing conditional distributions reveals the nature of the association between the sex and age of college students. Look at the bar graph at the right. The heights do not differ greatly, but women are most common among the 35 or older age group.
![Page 12: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/12.jpg)
Example 3: Complete the tables below, which give the conditional distribution of age, given sex.
Female: 15 to 17, 1.0%; 18 to 24, 60.8%; 25 to 34, 20.4%; 35 or older, 17.8%
Male: 15 to 17, 0.8%; 18 to 24, 64.2%; 25 to 34, 21.7%; 35 or older, 13.3%
Male students are more likely to be 18 to 24 years old and quite a bit less likely to be 35 or older.
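The Example 3 percents can be checked with a sketch like the one below. Several inputs are assumptions: the male counts 4,697 and 970 are obtained by subtracting the female counts from the row totals quoted earlier, and the 15 to 17 counts (89 and 61) do not appear in the text at all; they are illustrative values chosen to sum to the row total of 150. The helper `age_given_sex` is a hypothetical name.

```python
# Two-way table of counts. Only some cells are quoted in the text:
#   - 5,668, 1,904, 1,660 (female) and 1,589 (male) come from the slides
#   - 4,697 and 970 are derived as row total minus female count (assumption)
#   - 89 and 61 are assumed values that sum to the row total of 150
table = {
    "15 to 17": {"Female": 89, "Male": 61},
    "18 to 24": {"Female": 5_668, "Male": 4_697},
    "25 to 34": {"Female": 1_904, "Male": 1_589},
    "35 or older": {"Female": 1_660, "Male": 970},
}


def age_given_sex(table, sex):
    """Conditional distribution of age within one sex (one table column)."""
    column_total = sum(row[sex] for row in table.values())
    return {age: round(100 * row[sex] / column_total, 1)
            for age, row in table.items()}


female = age_given_sex(table, "Female")
male = age_given_sex(table, "Male")
print("Female:", female)
print("Male:  ", male)
```

Here the denominator is the column total, so the percents in each column sum to about 100 (up to rounding).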
![Page 13: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/13.jpg)
CAUTION: No single graph (such as a scatterplot) portrays the form of
the relationship between categorical variables. No single
numerical measure (such as correlation) summarizes the strength of the association.
![Page 14: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/14.jpg)
As is the case with quantitative variables, the effects of lurking variables can change or even
reverse relationships between two categorical variables.
![Page 15: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/15.jpg)
Example 4: Accident victims are sometimes taken by helicopter from the accident scene to a hospital. Helicopters save time. Do they also save lives?
Complete the table at the right.
Died: helicopter 32%, road 24%
Survived: helicopter 68%, road 76%
![Page 16: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/16.jpg)
Notice that a greater percentage of helicopter patients died. How
discouraging.
But wait.
![Page 17: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/17.jpg)
Here’s the data broken down by the seriousness of the accident.
Complete the tables at the right.
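Serious accidents: died, helicopter 48%, road 60%; survived, helicopter 52%, road 40%
Less serious accidents: died, helicopter 16%, road 20%; survived, helicopter 84%, road 80%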
![Page 18: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/18.jpg)
Among serious accident victims, the helicopter saves 52% compared with
40% for road transport. For less serious accidents, 84% of those transported by helicopter survive, versus 80% of those transported by road. Both groups have a higher survival rate when transported
by helicopter.
![Page 19: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/19.jpg)
The reason for the paradox in the data is that the helicopter carries
patients who are more likely to die. The seriousness of the accident
was the lurking variable.
![Page 20: AP Statistics Section 4.2 Relationships Between Categorical Variables](https://reader035.vdocuments.us/reader035/viewer/2022072005/56649ce25503460f949adb6c/html5/thumbnails/20.jpg)
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called
Simpson’s paradox.
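The reversal in the helicopter example can be reproduced with a short sketch. The counts below are hypothetical: the slides' tables are not reproduced in the text, so these values were simply chosen to give the percentages quoted in Example 4 (48% vs. 60%, 16% vs. 20%, and 32% vs. 24% overall). The names `death_rate` and `combine` are illustrative helpers.

```python
# Hypothetical (died, total transported) counts for each transport mode,
# chosen only so that the rates match the percentages quoted in Example 4.
serious = {"Helicopter": (48, 100), "Road": (60, 100)}
less_serious = {"Helicopter": (16, 100), "Road": (200, 1000)}


def death_rate(died, total):
    """Percent of transported victims who died, rounded to a whole percent."""
    return round(100 * died / total)


def combine(*groups):
    """Add up deaths and totals across groups, ignoring accident seriousness."""
    combined = {}
    for group in groups:
        for mode, (died, total) in group.items():
            d, t = combined.get(mode, (0, 0))
            combined[mode] = (d + died, t + total)
    return combined


overall = combine(serious, less_serious)

for label, group in [("Serious", serious),
                     ("Less serious", less_serious),
                     ("Combined", overall)]:
    rates = {mode: death_rate(*counts) for mode, counts in group.items()}
    print(f"{label}: {rates}")
```

Within each seriousness group the helicopter has the lower death rate, yet in the combined table it has the higher one. Under these assumed counts, that happens because half of the helicopter patients come from serious accidents, compared with a small fraction of the road patients.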