crosstabs & measures of association
DESCRIPTION
Crosstabs & Measures of Association. POL242 October 9 and 11, 2012 Jennifer Hove. Questions of Causality. Recall: Most causal thinking in social sciences is probabilistic, not deterministic: as X increases, the probability of Y increases, not that X invariably produces Y
![Page 1: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/1.jpg)
Crosstabs & Measures of Association
POL242
October 9 and 11, 2012
Jennifer Hove
![Page 2: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/2.jpg)
Questions of Causality
Recall:
Most causal thinking in social sciences is probabilistic, not deterministic: as X increases, the probability of Y increases, not that X invariably produces Y
We can observe only association, per Hume
We must therefore infer causation
Not one, but many possible causes
![Page 3: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/3.jpg)
Inferring Causal Relations
1. There must be association
X → Y; ~X → ~Y
2. Time order must be considered
Presumed cause should precede presumed effect
3. Must rule out possible rival explanations
Sometimes what appears to be a strong relationship between two variables is due to the influence of others
4. Must be able to identify the process by which one factor brings about change in another
Causal linkage
![Page 4: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/4.jpg)
Establishing Association
With nominal or ordinal data, relationships usually presented in tabular or table form
Why? Hypotheses rest on core idea of comparison
Ex: if we compare respondents on basis of their value on the IV, say party identification, they should also differ along DV, say support for gay rights
Crosstabs are a wonderful means of making comparisons
“God speaks to you through crosstabs!”
![Page 5: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/5.jpg)
Using/Interpreting Crosstabs
Data arranged in side-by-side frequency distributions
IV (X) presented across the top of the table – in columns
If ordinal, arrange from low scores (on left) to high scores (on right)
DV (Y) presented down the left hand side of the table – in rows
Again, if ordinal, arrange from low (at top) to high (at bottom)

Table 1: Support for the Afghan Mission by Perceived Impact of Taliban Resurgence, 2007

| Support for Afghan Mission | Fear of Taliban Resurgence: Low | Fear of Taliban Resurgence: High | All Respondents |
|---|---|---|---|
| Low | 86.1% (173) | 52.7% (355) | 60.4% (528) |
| High | 13.9 (28) | 47.3 (318) | 39.6 (346) |
| Total (N) | 100 (201) | 100 (673) | 100 (874) |

Tau-b = .29
Source: Strategic Counsel, CTV/Globe and Mail Survey, July 2007
![Page 6: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/6.jpg)
Using/Interpreting Crosstabs
Data presented so that categories of the IV add to 100%
Percentaging within categories of the IV (down in a table)
Comparisons are made across categories of the IV
From left to right
To see the effect of the IV on the DV
![Page 7: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/7.jpg)
Rules (!) of Crosstabs
1. Make the IV define the columns and the DV define the rows of the table
2. Always percentage down within categories of the IV
3. Interpret the relationship by comparing across columns, within rows of the table
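
As a minimal sketch of rule 2, the snippet below percentages down within each category of the IV, using the cell counts reported in Table 1. The dictionary layout, the labels and the function name `column_percentages` are illustrative assumptions, not part of the course materials.

```python
# Cell counts from Table 1 (rows = DV, columns = IV).
counts = {
    "Low support": {"Low fear": 173, "High fear": 355},
    "High support": {"Low fear": 28, "High fear": 318},
}


def column_percentages(table):
    """Percentage down: every IV column sums to 100."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / totals[c] for c in columns}
        for label, row in table.items()
    }


percentages = column_percentages(counts)
for label, row in percentages.items():
    # Read across each row to compare the IV categories (rule 3).
    print(label, {c: round(p, 1) for c, p in row.items()})
```

Reading across the "High support" row, the share rises from 13.9% to 47.3% as fear moves from low to high, which is the comparison rule 3 asks for.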
![Page 8: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/8.jpg)
Example: 2 x 2 Crosstab
Support for Y Variable by Support for X Variable

| Score on Y Variable | Score on X Variable: Low | Score on X Variable: High | Total |
|---|---|---|---|
| Low | A | B | A + B |
| High | C | D | C + D |
| Total | A + C | B + D | |
![Page 9: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/9.jpg)
Diagonals
Main diagonal: running to the right and down (cells A and D)
When larger proportion of cases fall on main diagonal, relationship is said to be direct or positive
Low values on X associated with low values on Y; high values on X associated with high values on Y
![Page 10: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/10.jpg)
Diagonals
Off diagonal: running to the right and up (cells C and B)
When larger proportion of cases fall on off diagonal, relationship is said to be inverse or negative
Low values on X associated with high values on Y; high values on X associated with low values on Y
![Page 11: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/11.jpg)
Explaining Variation in Y
Relationships between variables in social sciences are rarely, if ever, perfectly predictable
You are unlikely to see something like this:

Support for Y Variable by Support for X Variable

| Score on Y Variable | Score on X Variable: Low | Score on X Variable: High |
|---|---|---|
| Low | 100% | 0 |
| High | 0 | 100% |
| Total | 100 | 100 |
![Page 12: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/12.jpg)
Explaining Variation in Y
There is likely to be more than one explanation or “cause” behind the variation in Y
So we will generally be looking at:
X1 → Y
X2 → Y
To compare, we want to know relative strength of each relationship
A variety of summary terms called measures of association are used
![Page 13: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/13.jpg)
Measures of Association
Compress information that appears in a crosstab into a single number by summarizing:
Magnitude (strength) of the relationship
Direction of the relationship
Magnitude: ranges from 0 (completely unpredictable) to 1 (perfectly predictable)
Direction: positive (+) = cases primarily on main diagonal; negative (-) = cases primarily on off diagonal
![Page 14: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/14.jpg)
Two Cautionary Notes
Direction is not useful with nominal-level variables, since they are not ordered/ranked from low to high
Even with ordinal measurement, interpretation of direction depends entirely on how your variables are coded
Should always code your variables so that high scores indicate “more” of what you want to explain
![Page 15: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/15.jpg)
Direction & Strength
Combining direction & strength, we get a range of possibilities
-1.0 -.8 -.6 -.4 -.2 0 +.2 +.4 +.6 +.8 +1.0
All intermediary values can also occur, e.g. -.2367
Note that equivalent positive and negative scores are equal in strength
Ex: +.4 and -.4 are equal in strength; they differ only in direction
![Page 16: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/16.jpg)
Choosing among Measures
We use different measures of association for 2 main reasons:
1. There are different levels of measurement
Ordinal measurement offers ranking information used to calculate association, which isn’t available with nominal data
2. Some measures are specific to tables of certain sizes and shapes
Specific measures for 2 x 2 tables; others for larger square tables; still others for rectangular tables
![Page 17: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/17.jpg)
Phi Φ
Use with dichotomous variables, 2 x 2 tables
Applies to nominal and ordinal data
Measures the strength of a relationship by taking the product of the # of cases on the main diagonal (AD) minus the product of the # of cases on the off diagonal (BC), adjusting for the marginal distribution of cases, i.e. the sums of the columns and rows

$$\Phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$
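
The Phi calculation can be sketched in a few lines. The example below plugs in the Table 1 cell counts (A = 173, B = 355, C = 28, D = 318); the function name `phi` and its argument order are assumptions made for illustration.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi for a 2 x 2 table laid out as:

                X low   X high
        Y low     a       b
        Y high    c       d
    """
    # Product of the four marginal totals (two rows, two columns).
    marginals = (a + b) * (c + d) * (a + c) * (b + d)
    return (a * d - b * c) / sqrt(marginals)


table1_phi = phi(173, 355, 28, 318)
print(round(table1_phi, 2))  # 0.29
```

For a 2 x 2 table this matches the Tau-b of .29 reported under Table 1.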
![Page 18: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/18.jpg)
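2 Examples: Phi Φ

| Score on Y Variable | Score on X Variable: Low | Score on X Variable: High |
|---|---|---|
| Low | 75% | 10% |
| High | 25% | 90% |
| Total | 100 | 100 |

Φ = .6

| Score on Y Variable | Score on X Variable: Low | Score on X Variable: High |
|---|---|---|
| Low | 50% | 20% |
| High | 50% | 80% |
| Total | 100 | 100 |

Φ = .2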
![Page 19: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/19.jpg)
Cramer’s V
An extension of Phi
Logic of Cramer’s V is based on percentage differences across the columns, not on logic of diagonals
Use with nominal data, when tables are larger than 2 x 2
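
The slides do not give a formula for Cramer’s V. A minimal sketch, assuming the standard chi-square based definition V = sqrt(chi² / (N × min(rows − 1, columns − 1))), is shown below. The function name and the list-of-rows layout are illustrative choices. The Table 1 counts serve as a check (for a 2 x 2 table V reduces to the absolute value of Phi), and the Table 2 counts show a table larger than 2 x 2.

```python
from math import sqrt


def cramers_v(table):
    """Cramer's V from observed counts (list of rows), no continuity correction."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


table1 = [[173, 355], [28, 318]]                     # Table 1 counts (2 x 2)
table2 = [[26, 64, 171, 110], [178, 217, 223, 52]]   # Table 2 counts (2 x 4)

v_table1 = cramers_v(table1)
v_table2 = cramers_v(table2)
print(round(v_table1, 2), round(v_table2, 2))
```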
![Page 20: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/20.jpg)
Lambda
Lambda (λ) is another measure of association for nominal data
Its rationale of “percentage of improvement” or “proportion reduction in error” is relatively easy to explain
Not recommended in this course
When modal category of each column is in same row, λ=0
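
The λ = 0 problem can be seen directly with the Table 1 counts, where the modal category of both columns is the "Low" row. The sketch below assumes the usual proportional-reduction-in-error form of lambda with the row variable as the DV; the function name is hypothetical.

```python
def lambda_rows_dependent(table):
    """Lambda with the row variable (DV) predicted from the column variable (IV)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    columns = list(zip(*table))
    # Errors when always guessing the overall modal row.
    errors_without_iv = n - max(row_totals)
    # Errors when guessing the modal row within each column.
    errors_with_iv = n - sum(max(col) for col in columns)
    return (errors_without_iv - errors_with_iv) / errors_without_iv


table1 = [[173, 355], [28, 318]]                     # both column modes in row 1
table2 = [[26, 64, 171, 110], [178, 217, 223, 52]]   # column modes differ

lambda_table1 = lambda_rows_dependent(table1)
lambda_table2 = lambda_rows_dependent(table2)
print(lambda_table1, round(lambda_table2, 2))
```

Lambda is exactly 0 for Table 1 even though the percentages differ clearly across columns, which illustrates why it can mislead.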
![Page 21: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/21.jpg)
Measures of Association: Ordinal Data
Measures include Tau-b, Tau-c and Gamma
Rely on analysis of diagonals

| Support for Y | Support for X: Low | Support for X: Med | Support for X: High |
|---|---|---|---|
| Low | a | b | c |
| Med | d | e | f |
| High | g | h | i |
![Page 24: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/24.jpg)
Mind your Ps and Qs
The letter P indicates the # of pairs of cases on the main diagonals (from left to right)
The letter Q indicates the # of pairs of cases on the off diagonal (from right to left)
If P > Q, we have a positive association
If P < Q, we have a negative association
The core calculation = P - Q
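
One way to count P and Q is sketched below, assuming rows and columns are both ordered from low to high. Each cell is paired with every cell in a later row: cells further right contribute to P, cells further left to Q. The function name `pair_counts` is an illustrative assumption; the counts come from Table 1 and Table 2.

```python
def pair_counts(table):
    """Return (P, Q) for a table of counts with ordered rows and columns."""
    p = q = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # rows below the current cell
                for m in range(n_cols):
                    if m > j:                   # down and to the right
                        p += table[i][j] * table[k][m]
                    elif m < j:                 # down and to the left
                        q += table[i][j] * table[k][m]
    return p, q


p1, q1 = pair_counts([[173, 355], [28, 318]])                    # Table 1
p2, q2 = pair_counts([[26, 64, 171, 110], [178, 217, 223, 52]])  # Table 2
print(p1 - q1, p2 - q2)
```

P exceeds Q for Table 1 (a positive association), while Q exceeds P for Table 2 (a negative one).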
![Page 25: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/25.jpg)
Gamma
The information of P and Q can be used to calculate Gamma (γ)

$$\gamma = \frac{P - Q}{P + Q} = \frac{P}{P + Q} - \frac{Q}{P + Q}$$

Problems:
Any vacant cell produces a score of 1.0
Tends to overstate strength of a relationship
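
Both problems can be illustrated with a short sketch. The first call uses P and Q for Table 1 (P = 173 × 318 = 55014, Q = 355 × 28 = 9940), where the reported Tau-b is .29. The second uses a hypothetical 2 x 2 table with one vacant cell, invented purely for illustration.

```python
def gamma(p, q):
    """Gamma from the number of concordant (P) and discordant (Q) pairs."""
    return (p - q) / (p + q)


# Table 1: P = A*D, Q = B*C for the 2 x 2 layout.
gamma_table1 = gamma(173 * 318, 355 * 28)
print(round(gamma_table1, 2))  # 0.69, far above the Tau-b of .29

# Hypothetical 2 x 2 table with a vacant cell: A=50, B=20, C=0, D=30.
gamma_vacant = gamma(50 * 30, 20 * 0)
print(gamma_vacant)  # 1.0
```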
![Page 26: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/26.jpg)
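Tau-b and Tau-c
Preferable to Gamma, though built on the same logic of diagonals
Tends to produce results similar to phi (using nominal data) or the most important interval measure (r) – to be discussed later in the year

$$\text{Tau-}b = \frac{P - Q}{\sqrt{(P + Q + X)(P + Q + Y)}}$$

(X and Y in the denominator presumably count the pairs tied on one variable but not the other, as in the standard Tau-b formula)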
![Page 27: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/27.jpg)
Tau-b and Tau-c
Tau-b never quite reaches 1.0 in non-square tables
So Tau-c was developed to use with rectangular tables
In practice, the difference between Tau-b and Tau-c when applied to the same table is not great, but keep the distinction above in mind
![Page 28: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/28.jpg)
Example

Table 2: Approval of President Chavez by Opinion of the United States, 2007

| Approval of Chavez | Opinion of the United States: Very Bad | Opinion of the United States: Bad | Opinion of the United States: Good | Opinion of the United States: Very Good | All Respondents |
|---|---|---|---|---|---|
| Disapprove | 12.7% (26) | 22.8% (64) | 43.4% (171) | 67.9% (110) | 35.6% (371) |
| Approve | 87.3 (178) | 77.2 (217) | 56.6 (223) | 32.1 (52) | 64.4 (670) |
| Total (N) | 100 (204) | 100 (281) | 100 (394) | 100 (162) | 100 (1041) |

Tau-c: -.39 Tau-b: -.35
Source: Latinobarometer, 2007 – Venezuelan respondents only
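
A rough check of these figures can be sketched from the cell counts in Table 2. The Tau-b denominator below is written in terms of tied pairs, which is equivalent to the P and Q form. The Tau-c formula, 2m(P − Q) / (N²(m − 1)) with m the smaller of the number of rows and columns, is the standard Stuart definition and is an assumption here, since the slides do not state it. Function names are illustrative.

```python
from math import sqrt


def pair_counts(table):
    """Concordant (P) and discordant (Q) pairs for ordered rows and columns."""
    p = q = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                p += count * sum(lower_row[j + 1:])   # down and to the right
                q += count * sum(lower_row[:j])       # down and to the left
    return p, q


def tau_b(table):
    p, q = pair_counts(table)
    n = sum(sum(row) for row in table)
    all_pairs = n * (n - 1) // 2
    # Pairs tied on the row variable and on the column variable.
    tied_rows = sum(t * (t - 1) // 2 for t in (sum(r) for r in table))
    tied_cols = sum(t * (t - 1) // 2 for t in (sum(c) for c in zip(*table)))
    return (p - q) / sqrt((all_pairs - tied_rows) * (all_pairs - tied_cols))


def tau_c(table):
    p, q = pair_counts(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (p - q) / (n ** 2 * (m - 1))


table2 = [[26, 64, 171, 110], [178, 217, 223, 52]]  # Disapprove, Approve
print(round(tau_b(table2), 3), round(tau_c(table2), 3))
```

On these counts the sketch gives a Tau-b of about -.35 and a Tau-c of about -.40, close to the reported -.39; the small gap could reflect rounding or survey weights, which is a guess rather than something the source states.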
![Page 29: Crosstabs & Measures of Association](https://reader036.vdocuments.us/reader036/viewer/2022062315/56816238550346895dd26aec/html5/thumbnails/29.jpg)
Summing Up
With nominal data, use Phi or Cramer’s V
Phi used for 2 x 2 tables
Cramer’s V used for any other crosstab involving nominal data
Avoid Lambda
With ordinal data, use Tau-c or Tau-b
Tau-b used for square tables: 3 x 3, 4 x 4, etc.
Tau-c used for rectangular tables
Avoid Gamma
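
The summary can be written as a small lookup, sketched below. The function name `recommended_measure` and the string labels are hypothetical; the logic follows the rules on this slide and nothing more.

```python
def recommended_measure(level, n_rows, n_cols):
    """Suggest a measure of association from the level of measurement and table shape."""
    if level == "nominal":
        # Phi for 2 x 2 tables, Cramer's V for any other nominal crosstab.
        return "Phi" if (n_rows, n_cols) == (2, 2) else "Cramer's V"
    if level == "ordinal":
        # Tau-b for square tables, Tau-c for rectangular ones.
        return "Tau-b" if n_rows == n_cols else "Tau-c"
    raise ValueError("level must be 'nominal' or 'ordinal'")


print(recommended_measure("ordinal", 2, 4))  # Tau-c (the shape of Table 2)
print(recommended_measure("nominal", 2, 2))  # Phi
```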