Cross-Tabs Continued
Andrew Martin
PS 372
University of Kentucky
Statistical Independence
Statistical independence is a property of two variables in which the probability that an observation is in a particular category of one variable and a particular category of the other variable equals the product of the simple (marginal) probabilities of being in those categories.
Unlike the other statistical measures discussed in class, indicators of statistical independence test for the absence of a relationship between two variables.
Let us assume two nominal variables, X and Y. The values for these variables are as follows:
X: a, b, c, ...
Y: r, s, t, ...
P(X=a) stands for the probability that a randomly selected case has property or value a on variable X.
P(Y=r) stands for the probability that a randomly selected case has property or value r on variable Y.
P(X=a, Y=r) stands for the joint probability that a randomly selected observation has both property a and property r simultaneously.
If X and Y are statistically independent:
P(X=a, Y=r) = [P(X=a)][P(Y=r)] for all a and r.
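The product rule can be checked directly on a table of counts. A minimal Python sketch, assuming the cross-tabulation is stored as a list of rows (rows are categories of Y, columns are categories of X) and that each probability is estimated as a proportion of N; the function name `is_independent` and the example counts are hypothetical:

```python
from fractions import Fraction

def is_independent(table: list[list[int]]) -> bool:
    """Check P(X=a, Y=r) == P(X=a) * P(Y=r) for every cell of the table.

    Rows are categories of Y, columns are categories of X.
    Fractions keep the comparison exact (no floating-point rounding).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            joint = Fraction(count, n)          # P(X=a, Y=r)
            p_x = Fraction(col_totals[j], n)    # P(X=a)
            p_y = Fraction(row_totals[i], n)    # P(Y=r)
            if joint != p_x * p_y:
                return False
    return True

# Hypothetical counts: in the first table every row is proportional to the other.
print(is_independent([[20, 40], [30, 60]]))  # True
print(is_independent([[40, 20], [10, 80]]))  # False
```

Exact fractions are used because the product rule is an equality; with floating-point proportions a small tolerance would be needed instead.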
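If gender and turnout are independent, the expected frequency of the cell in column m and row v is N × P(column m) × P(row v). Because P(column m) = (total obs in column m) / N and P(row v) = (total obs in row v) / N, this simplifies to:
$$E_{mv} = \frac{(\text{total obs in column } m) \times (\text{total obs in row } v)}{N}$$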
(210 × 100) / 300 = 70
70 is the expected frequency. Because the observed frequency is also 70, the observed and expected frequencies are the same, and the variables are independent.
(150 × 150) / 300 = 75
Here the variables are not independent (they are dependent), because the expected frequency (75) differs from the observed frequency (100).
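The expected-frequency formula is a single line of arithmetic. A minimal sketch (the function name `expected_frequency` is hypothetical) that reproduces the two worked examples:

```python
def expected_frequency(col_total: int, row_total: int, n: int) -> float:
    """Expected count for one cell if the two variables are independent:
    N * P(column) * P(row) = (column total * row total) / N."""
    return col_total * row_total / n

# Figures from the two examples (N = 300 in both).
print(expected_frequency(210, 100, 300))  # 70.0, same as the observed 70
print(expected_frequency(150, 150, 300))  # 75.0, differs from the observed 100
```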
Testing for Independence
How do we test for independence for an entire cross-tabulation table?
One statistic used to test the statistical significance of a relationship in a cross-tabulation table is the chi-square statistic (χ²).
Chi-Square Statistic
The chi-square statistic essentially compares an observed result—the table produced by the data—with a hypothetical table that would occur if, in the population, the variables were statistically independent.
How is the chi-square statistic calculated?
The chi-square test is set up like any other hypothesis test: the observed chi-square value is compared with the critical value that bounds the critical region for a chosen significance level.
A value is calculated for each cell of the cross-tabulation using the same expected frequency as in the independence check, and the cell values are then summed.
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}$$
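A minimal Python sketch of this calculation, assuming the table is given as a list of rows of observed counts and that every row and column total is nonzero (otherwise some expected frequency would be zero). The function name and both example tables are hypothetical, not the data from the lecture:

```python
def chi_square_statistic(observed: list[list[float]]) -> float:
    """Sum over all cells of (observed - expected)^2 / expected.

    Expected counts come from the margins: (column total * row total) / N.
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = col_totals[j] * row_totals[i] / n
            chi2 += (obs - exp) ** 2 / exp   # this cell's contribution
    return chi2

# Hypothetical 2 x 2 tables.
print(round(chi_square_statistic([[40, 20], [10, 80]]), 2))  # 50.0
print(chi_square_statistic([[20, 40], [30, 60]]))            # 0.0 (independent)
```

When observed and expected frequencies match in every cell, each contribution is zero, so χ² = 0; the further the table departs from independence, the larger χ² grows.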
Chi-Square Test
• The null hypothesis is statistical independence between X and Y.
• H0: X, Y independent
• The alternative hypothesis is that X and Y are not independent.
• HA: X, Y dependent
• The chi-square is a family of distributions, each determined by its degrees of freedom. The degrees of freedom equal the number of rows minus one times the number of columns minus one: (r-1)(c-1).
• Level of significance: The probability (α) of incorrectly rejecting a true null hypothesis.
• Critical value: The chi-square test is always a one-tailed (upper-tail) test. Choose the critical value from a chi-square table so that the area of the critical region (the region of rejection) equals α.
• (JRM: Appendix C, pg. 577)
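Instead of a printed table, the critical value can be computed numerically. A minimal stdlib-only sketch, assuming the upper-tail area is evaluated with the standard series for the regularized incomplete gamma function and inverted by bisection; the function names are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)` does the same job:

```python
import math

def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom >= x).

    Computes the regularized lower incomplete gamma P(df/2, x/2) by its
    power series; the upper tail is 1 minus that value.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_critical_value(alpha: float, df: int) -> float:
    """The x whose upper-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:   # widen until hi is past the cutoff
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def degrees_of_freedom(rows: int, cols: int) -> int:
    return (rows - 1) * (cols - 1)

df = degrees_of_freedom(5, 2)                    # 4
print(round(chi2_critical_value(0.01, df), 2))   # 13.28
```

The subtraction 1 − P loses accuracy for extremely small tail areas, which is acceptable for conventional α levels such as .05 or .01.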
• The observed χ² is the sum, over all cells, of the squared difference between the observed and expected frequencies divided by the expected frequency.
• If χ²obs ≥ χ²crit, reject the null hypothesis. Otherwise, do not reject.
• Let's assume we want to test the relationship at the .01 level.
• The observed χ² is 62.21.
• The degrees of freedom are (5-1)(2-1) = 4.
• The critical χ² is 13.28.
• Since 62.21 > 13.28, we can reject the null hypothesis of independence.
• Y (attitudes toward gun control) is dependent on X (gender).
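The decision in this example can be reproduced from the reported numbers. The underlying 5 × 2 table is not shown, so the observed value 62.21 and the tabled critical value 13.28 are taken as given; `reject_null` is a hypothetical helper name:

```python
def reject_null(chi2_obs: float, chi2_crit: float) -> bool:
    """Decision rule: reject H0 (independence) if the observed chi-square
    is at least as large as the critical value."""
    return chi2_obs >= chi2_crit

rows, cols = 5, 2                 # 5 attitude categories (rows), 2 genders (columns)
df = (rows - 1) * (cols - 1)      # 4
chi2_crit = 13.28                 # table value for df = 4, alpha = .01
chi2_obs = 62.21                  # observed statistic reported in the example
print(df, reject_null(chi2_obs, chi2_crit))  # 4 True
```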
• The χ² statistic works when the dependent variable is a nominal or ordinal measure; for interval- and ratio-level data, another statistic is more appropriate.