Posted on 20-Dec-2015
Section 2.6 Relations in Categorical Variables
So far in chapter two we have dealt with data that is quantitative.
In this section we consider categorical data.
Suppose we measure two variables on each individual, and both variables are categorical. How can we display the association between them, if there is one?
Consider a situation where 400 individuals are classified by whether they received a vaccine and whether they were attacked by the illness the vaccine was intended to prevent.
One method is to display the information in a table, writing the appropriate count in each cell. When two variables are measured, such a table is called a two-way table.
Two-way table (rows: Medical Condition; columns: Treatment)

                 Vaccinated   Not Vaccinated   Total
  Attacked               60               85     145
  Not Attacked          190               65     255
  Total                 250              150     400
The row and the column that contain the totals are called the margins.
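As a minimal Python sketch, the two-way table can be stored as nested dictionaries and its margins obtained by summing the cells. The names `counts`, `row_totals`, and `col_totals` are illustrative choices, not part of the original slides.

```python
# Cell counts from the vaccine example, keyed as counts[medical_condition][treatment]
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}
treatments = ["Vaccinated", "Not Vaccinated"]

# Row margin: total for each medical condition (sum across the treatments)
row_totals = {cond: sum(row.values()) for cond, row in counts.items()}

# Column margin: total for each treatment (sum down the medical conditions)
col_totals = {t: sum(counts[cond][t] for cond in counts) for t in treatments}

# Grand total: every individual is counted exactly once
grand_total = sum(row_totals.values())

print(f"{'':14}{'Vaccinated':>12}{'Not Vaccinated':>16}{'Total':>7}")
for cond, row in counts.items():
    print(f"{cond:14}{row['Vaccinated']:>12}{row['Not Vaccinated']:>16}{row_totals[cond]:>7}")
print(f"{'Total':14}{col_totals['Vaccinated']:>12}{col_totals['Not Vaccinated']:>16}{grand_total:>7}")
```

The row totals (145, 255), column totals (250, 150), and grand total (400) reproduce the margins of the table.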
Bar graphs are a good way to picture this table. However, a single bar graph cannot capture all the information in the table, so we must choose what we want to see.
Marginal Distributions
A bar graph of the values in a margin displays a marginal distribution. Since we have two variables, we need two bar graphs, one for each variable's marginal distribution.
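A small Python sketch of the two marginal distributions: each is just one margin of the table. The helper `text_bar_graph` is hypothetical, and its scale of one `#` per 10 people is an arbitrary choice made only to give a rough text picture of the bar graphs.

```python
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}
treatments = ["Vaccinated", "Not Vaccinated"]

# Marginal distribution of Treatment: the column totals
treatment_margin = {t: sum(counts[c][t] for c in counts) for t in treatments}

# Marginal distribution of Medical Condition: the row totals
condition_margin = {c: sum(counts[c].values()) for c in counts}


def text_bar_graph(title, margin, per_mark=10):
    """Print a rough text bar graph: one '#' per `per_mark` people, rounded down."""
    print(title)
    for label, freq in margin.items():
        print(f"  {label:15}{'#' * (freq // per_mark)} {freq}")


text_bar_graph("Marginal Distribution for Treatment", treatment_margin)
text_bar_graph("Marginal Distribution for Medical Condition", condition_margin)
```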
[Figure: two bar graphs. "Marginal Distribution for Treatment": frequency for Vaccinated and Not Vaccinated. "Marginal Distribution for Medical Condition": frequency for Attacked and Not Attacked.]
Conditional Distributions
If we consider only a particular row or column of the table, the resulting bar graph shows a conditional distribution.
Suppose we wish to consider only the people who have been attacked and find out whether being vaccinated resulted in fewer attacks than not being vaccinated.
[Figure: bar graph "People Who Have Been Attacked": frequency by treatment (Vaccinated, Not Vaccinated).]
This bar graph is conditioned on the person having been attacked: of the 145 people who were attacked, 60 were vaccinated and 85 were not.
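In code, conditioning on "Attacked" amounts to keeping a single row of the table. This Python sketch assumes the same nested-dictionary layout as before (an illustrative choice), and its scale of one `#` per 5 people is arbitrary.

```python
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}

# Conditioning on "Attacked" keeps only that row of the two-way table
attacked_row = counts["Attacked"]
attacked_total = sum(attacked_row.values())

print(f"People Who Have Been Attacked (n = {attacked_total})")
for treatment, freq in attacked_row.items():
    # One '#' per 5 people gives a rough text version of the bar graph
    print(f"  {treatment:15}{'#' * (freq // 5)} {freq}")
```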
[Figure: bar graph "Medical Condition Based on Treatment-No Vaccination": frequency by medical condition (Attacked, Not Attacked).]
This bar graph shows the medical condition of the people who were not vaccinated: of these 150 people, 85 were attacked and 65 were not.
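Conditioning on a treatment instead selects one column. In this Python sketch, `column` is a hypothetical helper introduced only for illustration.

```python
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}


def column(counts, treatment):
    """Hypothetical helper: the counts in one treatment column of the table."""
    return {cond: row[treatment] for cond, row in counts.items()}


# Conditioning on "Not Vaccinated" keeps only that column of the table
not_vaccinated = column(counts, "Not Vaccinated")
print("Medical Condition Based on Treatment-No Vaccination")
for cond, freq in not_vaccinated.items():
    print(f"  {cond:15}{freq}")
print(f"  {'Total':15}{sum(not_vaccinated.values())}")
```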
Percentages often add to our understanding of a distribution. We can build a table of percentages of the grand total, whose margins give the marginal distributions, or a table of percentages of a row or column total, which gives conditional distributions.
Marginal Distributions
Proportions of the grand total (n = 400)

                 Vaccinated          Not Vaccinated       Total
  Attacked       60/400 = 0.15       85/400 = 0.2125      0.3625
  Not Attacked   190/400 = 0.475     65/400 = 0.1625      0.6375
  Total          250/400 = 0.625     150/400 = 0.375      1
[Figure: bar graph "Marginal Distribution for Medical Condition": relative frequency by medical condition (Attacked, Not Attacked).]
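A minimal Python sketch of the table of proportions: divide every cell by the grand total for the joint proportions, and every row or column total by the grand total for the marginal proportions. The dictionary layout is an illustrative assumption.

```python
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}
treatments = ["Vaccinated", "Not Vaccinated"]
grand_total = sum(sum(row.values()) for row in counts.values())

# Joint distribution: each cell as a proportion of the grand total
joint = {c: {t: counts[c][t] / grand_total for t in treatments} for c in counts}

# Marginal distributions: each row or column total as a proportion of the grand total
condition_marginal = {c: sum(counts[c].values()) / grand_total for c in counts}
treatment_marginal = {
    t: sum(counts[c][t] for c in counts) / grand_total for t in treatments
}

for c in counts:
    cells = "  ".join(f"{joint[c][t]:.4f}" for t in treatments)
    print(f"{c:14}{cells}  {condition_marginal[c]:.4f}")
margins = "  ".join(f"{treatment_marginal[t]:.4f}" for t in treatments)
print(f"{'Total':14}{margins}  {sum(treatment_marginal.values()):.4f}")
```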
Conditional Distributions
Proportions of each column total (the Total column gives the marginal proportions)

                 Vaccinated          Not Vaccinated      Total
  Attacked       60/250 = 0.24       85/150 ≈ 0.57       0.3625
  Not Attacked   190/250 = 0.76      65/150 ≈ 0.43       0.6375
  Total          250/250 = 1         150/150 = 1         1
[Figure: bar graph "Conditional Distribution Based on a Person Being Vaccinated": relative frequency by medical condition (Attacked, Not Attacked).]
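The column percentages can be sketched in Python by dividing each count by its column total. The function `conditional_on_column` is a hypothetical name used only for this illustration.

```python
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}
treatments = ["Vaccinated", "Not Vaccinated"]


def conditional_on_column(counts, treatment):
    """Distribution of medical condition among people with the given treatment."""
    column = {c: counts[c][treatment] for c in counts}
    total = sum(column.values())
    return {c: freq / total for c, freq in column.items()}


for t in treatments:
    dist = conditional_on_column(counts, t)
    shares = ", ".join(f"{c}: {p:.2f}" for c, p in dist.items())
    print(f"{t}: {shares}")
```

Each conditional distribution sums to 1, matching the Total row of the column-percentage table.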
What percentage of the people who were attacked were vaccinated?
60/145 ≈ 0.414, or about 41.4%
What percentage of the people were vaccinated?
250/400 = 0.625, or 62.5%
What percentage of the people who were not vaccinated were not attacked?
65/150 ≈ 0.433, or about 43.3%
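The three questions differ only in which total goes in the denominator: a row total, the grand total, or a column total. A short Python sketch, assuming the same illustrative dictionary layout:

```python
counts = {
    "Attacked": {"Vaccinated": 60, "Not Vaccinated": 85},
    "Not Attacked": {"Vaccinated": 190, "Not Vaccinated": 65},
}

# Row, column, and grand totals from the two-way table
attacked_total = sum(counts["Attacked"].values())
vaccinated_total = sum(row["Vaccinated"] for row in counts.values())
not_vaccinated_total = sum(row["Not Vaccinated"] for row in counts.values())
grand_total = sum(sum(row.values()) for row in counts.values())

# Q1: of those attacked, what share were vaccinated? (conditional on a row)
q1 = counts["Attacked"]["Vaccinated"] / attacked_total
# Q2: of everyone, what share were vaccinated? (marginal)
q2 = vaccinated_total / grand_total
# Q3: of those not vaccinated, what share were not attacked? (conditional on a column)
q3 = counts["Not Attacked"]["Not Vaccinated"] / not_vaccinated_total

for label, q in (("Q1", q1), ("Q2", q2), ("Q3", q3)):
    print(f"{label}: {q:.3f} ({q:.1%})")
```

Rounded to three decimals these are 0.414, 0.625, and 0.433, matching the hand calculations.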
Simpson’s Paradox
An association or comparison that holds for several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox (page 200).
Page 207 problem 96.
                  White Defendant          Black Defendant
                  Death Penalty            Death Penalty
                  Yes     No               Yes     No
  White Victim     19    132                11     52
  Black Victim      0      9                 6     97
This is a three-way table because it classifies cases by three categorical variables: race of the defendant, race of the victim, and the death penalty verdict. Showing all three variables takes two two-way tables, one for each race of defendant.
                  White Defendant              Black Defendant
                  Death Penalty                Death Penalty
                  Yes     No    Total          Yes     No    Total
  White Victim     19    132      151           11     52       63
  Black Victim      0      9        9            6     97      103
  Total            19    141      160           17    149      166
Proportion of cases receiving the death penalty (Yes / Total)

                  White Defendant        Black Defendant
  White Victim    19/151 ≈ 0.125828      11/63 ≈ 0.174603
  Black Victim    0/9 = 0                6/103 ≈ 0.058252
Let us look at the proportion of cases in which the death penalty is given, depending on the race of the defendant.
Notice that, for either race of victim, the black defendant receives the death penalty more often than the white defendant: 0.174603 versus 0.125828 when the victim is white, and 0.058252 versus 0 when the victim is black.
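A minimal Python sketch of the within-group comparison. Keying the counts by (defendant's race, victim's race) is an illustrative choice, and `rate` is a hypothetical helper.

```python
# Death-penalty verdict counts (yes, no), keyed by (defendant's race, victim's race)
cases = {
    ("White", "White"): (19, 132),
    ("White", "Black"): (0, 9),
    ("Black", "White"): (11, 52),
    ("Black", "Black"): (6, 97),
}


def rate(yes, no):
    """Proportion of cases in which the death penalty was given."""
    return yes / (yes + no)


# Compare the two defendant groups separately within each race of victim
rates = {}
for victim in ("White", "Black"):
    white_def = rate(*cases[("White", victim)])
    black_def = rate(*cases[("Black", victim)])
    rates[victim] = (white_def, black_def)
    print(f"{victim} victim: white defendant {white_def:.6f}, "
          f"black defendant {black_def:.6f}")
```

Within each victim group the black defendant's rate is the larger one, as in the table of proportions.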
Combined over victim's race

                     Death Penalty
                     Yes     No    Total
  White defendant     19    141      160
  Black defendant     17    149      166
  Total               36    290      326

                     Death Penalty (row proportions)
                     Yes       No        Total
  White defendant    0.1188    0.8813    160
  Black defendant    0.1024    0.8976    166
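When we remove the variable “victim’s race” by combining the tables, the direction reverses: the white defendant receives the death penalty more often (11.88%) than the black defendant (10.24%). The reversal occurs because most white defendants had white victims (151 of 160), where the death penalty was given more often, while most black defendants had black victims (103 of 166), where it was given less often.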
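The reversal can be checked with a short Python sketch that sums over victim's race before computing each defendant group's rate. The `cases` layout and the helper `combined_rate` are illustrative assumptions, not from the original.

```python
# Death-penalty verdict counts (yes, no), keyed by (defendant's race, victim's race)
cases = {
    ("White", "White"): (19, 132),
    ("White", "Black"): (0, 9),
    ("Black", "White"): (11, 52),
    ("Black", "Black"): (6, 97),
}


def combined_rate(defendant):
    """Death-penalty rate for one defendant race after summing over victim's race."""
    yes = sum(y for (d, _), (y, n) in cases.items() if d == defendant)
    total = sum(y + n for (d, _), (y, n) in cases.items() if d == defendant)
    return yes / total


white = combined_rate("White")
black = combined_rate("Black")
print(f"Combined: white defendant {white:.3f}, black defendant {black:.3f}")
# Within each victim group black defendants had the higher rate;
# after combining, white defendants have the higher rate.
```

With these counts the combined rates are 19/160 ≈ 0.119 and 17/166 ≈ 0.102, the opposite direction from the within-group comparison.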