more tabs

Cross-Tabs Continued

Andrew Martin

PS 372

University of Kentucky

Statistical Independence

Statistical independence is a property of two variables in which the probability that an observation is in a particular category of one variable and a particular category of the other variable equals the product of the simple (marginal) probabilities of being in each of those categories.

Unlike the other statistical measures discussed in class, indicators of statistical independence test for the lack of a relationship between two variables.


Let us assume two nominal variables, X and Y. The values for these variables are as follows:

X: a, b, c, ...

Y: r, s, t, ...


P(X=a) stands for the probability that a randomly selected case has property or value a on variable X.

P(Y=r) stands for the probability that a randomly selected case has property or value r on variable Y.

P(X=a, Y=r) stands for the joint probability that a randomly selected observation has both property a and property r simultaneously.


If X and Y are statistically independent:

P(X=a, Y=r) = [P(X=a)][P(Y=r)] for all a and r.
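The definition above can be checked mechanically: compute each marginal probability from the joint probabilities, then compare every joint probability with the product of its two marginals. The following is a minimal sketch; the function name `is_independent` and both probability tables are hypothetical and do not come from the lecture.

```python
import math
from itertools import product


def is_independent(joint, tol=1e-9):
    """Return True if P(X=a, Y=r) = P(X=a) * P(Y=r) for every a and r.

    `joint` maps (x_value, y_value) pairs to joint probabilities.
    """
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginal probabilities: sum the joint probabilities over the other variable.
    p_x = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(
        math.isclose(joint.get((x, y), 0.0), p_x[x] * p_y[y], abs_tol=tol)
        for x, y in product(xs, ys)
    )


# Hypothetical joint distributions, invented for illustration only.
independent = {("a", "r"): 0.12, ("a", "s"): 0.28,
               ("b", "r"): 0.18, ("b", "s"): 0.42}
dependent = {("a", "r"): 0.30, ("a", "s"): 0.10,
             ("b", "r"): 0.10, ("b", "s"): 0.50}

print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```

In the first table P(X=a) = 0.4 and P(Y=r) = 0.3, so the joint probability 0.12 equals their product. In the second table P(X=a) = 0.4 and P(Y=r) = 0.4, but the joint probability is 0.30 rather than 0.16.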

If gender and turnout are independent:

Expected frequency for cell mv = (Total obs in column m × Total obs in row v) / N

(210 × 100) / 300 = 70

70 is the expected frequency. Because the observed and expected frequencies are the same, the variables are independent.

(150 × 150) / 300 = 75

Here, the variables are not independent (that is, they are dependent) because the expected frequency (75) differs from the observed frequency (100).
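The two calculations above can be reproduced in a few lines. This is only a sketch of the arithmetic; the function name `expected_frequency` is an illustrative choice, while the totals (210, 100, 150, 300) and the observed counts (70 and 100) are the ones used in the examples.

```python
def expected_frequency(column_total, row_total, n):
    """Expected cell count under independence: (column total * row total) / N."""
    return column_total * row_total / n


first = expected_frequency(210, 100, 300)   # 70.0
second = expected_frequency(150, 150, 300)  # 75.0

# Compare with the observed counts from the two examples.
print(first == 70)    # True: observed equals expected
print(second == 100)  # False: observed (100) differs from expected (75)
```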

Testing for Independence

How do we test for independence for an entire cross-tabulation table?

One statistic used to test the statistical significance of a relationship in a cross-tabulation table is chi-square (χ²).

Chi-Square Statistic

The chi-square statistic essentially compares an observed result—the table produced by the data—with a hypothetical table that would occur if, in the population, the variables were statistically independent.

How is the chi-square statistic calculated?

The chi-square test is set up just like a hypothesis test. The observed chi-square value is compared to the critical value for a certain critical region.

A statistic is calculated for each cell of the cross-tabulation, using the same expected frequency as in the independence check:

(Observed frequency − expected frequency)² / expected frequency

Chi-Square Test

• The null hypothesis is statistical independence between X and Y.

• H0: X, Y Independent

• The alternative hypothesis is that X and Y are not independent.

• HA: X, Y Dependent

Chi-Square Test

• The chi-square is a family of distributions, each of which depends on its degrees of freedom. The degrees of freedom equal the number of rows minus one, times the number of columns minus one: df = (r-1)(c-1).

• Level of significance: The probability (α) of incorrectly rejecting a true null hypothesis.

Chi-Square Test

• Critical value: The chi-square test is always a one-tail test. Choose the critical value of chi-square from a table of the chi-square distribution so that the critical region (the region of rejection) has probability α.

• (JRM: Appendix C, pg. 577)

Chi-Square Test

• The observed χ² is the sum, over all cells of the table, of the squared difference between the observed and expected frequencies divided by the expected frequency.

• If the observed χ² ≥ the critical χ², reject the null hypothesis. Otherwise, do not reject.
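The calculation described in these bullets can be sketched as follows. The function name `chi_square` is illustrative, and the two 2×2 tables are hypothetical: their cell counts were filled in so that they match the totals from the earlier examples (210, 100 and N = 300 with an observed count of 70; 150, 150 and N = 300 with an observed count of 100). The lecture itself does not give the full tables.

```python
def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells.

    `observed` is a list of rows, each a list of cell counts.
    Expected counts come from the row and column totals under independence.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical tables (assumed cell counts, consistent with the earlier totals).
table_independent = [[70, 30], [140, 60]]   # row totals 100, 200; column totals 210, 90
table_dependent = [[100, 50], [50, 100]]    # row totals 150, 150; column totals 150, 150

print(chi_square(table_independent))
print(chi_square(table_dependent))
```

In the first table every observed count equals its expected count, so the statistic is zero. In the second table every cell is 25 away from its expected count of 75, so each cell contributes 625 / 75 and the statistic is about 33.33.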

Chi-Square Test

• Let's assume we want to test the relationship at the .01 level.

• The observed χ² is 62.21.

• The degrees of freedom is (5-1)(2-1) = 4.

• The critical χ² is 13.28.

• Since 62.21 > 13.28, we can reject the null of an independent relationship.

• Y (attitudes toward gun control) is dependent on X (gender).
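The steps of this example can be sketched in code using the figures given above (5 rows, 2 columns, observed χ² of 62.21, critical χ² of 13.28). The function names are illustrative. The helper `upper_tail_even_df` is an addition not found in the lecture: it relies on the standard closed form for the upper-tail probability of a chi-square distribution with an even number of degrees of freedom, and is used here only to confirm that 13.28 corresponds to roughly the .01 level.

```python
import math


def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for a cross-tabulation with r rows and c columns."""
    return (rows - 1) * (cols - 1)


def reject_null(chi2_obs, chi2_crit):
    """Decision rule: reject independence when observed >= critical."""
    return chi2_obs >= chi2_crit


def upper_tail_even_df(x, df):
    """P(chi-square > x) for even df, via the closed-form sum (assumption: df is even)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


df = degrees_of_freedom(5, 2)                 # 4
decision = reject_null(62.21, 13.28)          # True
alpha_check = upper_tail_even_df(13.28, df)   # close to 0.01

print(df, decision, round(alpha_check, 4))
```

The check on the critical value is a convenience only; in practice the critical value is read from a chi-square table such as the one cited above.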

Chi-Square Test

• The χ² statistic works for dependent variables that are ordinal or nominal measures, but another statistic is more appropriate for interval- and ratio-level data.
