Chapter 13: Categorical Data Analysis Statistics

Upload: briana-lee

Post on 02-Jan-2016


Page 1: Chapter 13: Categorical Data Analysis Statistics

Chapter 13: Categorical Data Analysis

Statistics

Page 2: Chapter 13: Categorical Data Analysis Statistics

McClave, Statistics, 11th ed. Chapter 13: Categorical Data Analysis


Where We’ve Been

Presented methods for making inferences about the population proportion associated with a two-level qualitative variable (i.e., a binomial variable)

Presented methods for making inferences about the difference between two binomial proportions

Page 3: Chapter 13: Categorical Data Analysis Statistics

Where We’re Going


Discuss qualitative (categorical) data with more than two outcomes

Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable – called a one-way analysis

Present a chi-square hypothesis test relating two qualitative variables – called a two-way analysis

Page 4: Chapter 13: Categorical Data Analysis Statistics

13.1: Categorical Data and the Multinomial Experiment

Properties of the Multinomial Experiment

1. The experiment consists of n identical trials.

2. There are k possible outcomes (called classes, categories or cells) to each trial.

3. The probabilities of the k outcomes, denoted by p1, p2, …, pk, where p1+ p2+ … + pk = 1, remain the same from trial to trial.

4. The trials are independent.

5. The random variables of interest are the cell counts n1, n2, …, nk of the number of observations that fall into each of the k categories.
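The five properties above can be illustrated with a short simulation. This is a minimal sketch, not part of the original slides: the function name `simulate_multinomial`, the fixed seed, and the equal probabilities of 1/3 are assumptions chosen for illustration.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=0):
    """Run n identical, independent trials with k = len(probs) outcomes.

    Returns the cell counts [n1, ..., nk]. The seed is an illustrative
    assumption so that repeated runs give the same counts.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("the k category probabilities must sum to 1")
    rng = random.Random(seed)
    categories = range(len(probs))
    # Each trial uses the same probabilities, independently of the others.
    outcomes = rng.choices(categories, weights=probs, k=n)
    tally = Counter(outcomes)
    return [tally[c] for c in categories]

counts = simulate_multinomial(150, [1/3, 1/3, 1/3])
print(counts, sum(counts))
```

The cell counts always add up to n, which is why only k - 1 of them are free to vary.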


Page 5: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

Suppose three candidates are running for office, and 150 voters are asked their preferences. Candidate 1 is the choice of 61 voters. Candidate 2 is the choice of 53 voters. Candidate 3 is the choice of 36 voters.

Do these data suggest the population may prefer one candidate over the others?


Page 6: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

Candidate 1 is the choice of 61 voters.

Candidate 2 is the choice of 53 voters.

Candidate 3 is the choice of 36 voters.

n = 150

H0: p1 = p2 = p3 = 1/3 (no preference)

Ha: At least one of the proportions exceeds 1/3

E(Number of votes for each candidate | H0) = 150(1/3) = 50

A chi-square ($\chi^2$) test is used to test H0.

$$\chi^2 = \frac{[n_1 - E_1]^2}{E_1} + \frac{[n_2 - E_2]^2}{E_2} + \frac{[n_3 - E_3]^2}{E_3} = \sum \frac{[n_i - E_i]^2}{E_i}$$

$$\chi^2 = \frac{[61 - 50]^2}{50} + \frac{[53 - 50]^2}{50} + \frac{[36 - 50]^2}{50} = 6.52$$

$$\chi^2_{.05,\ df = 2} = 5.99147$$

Page 7: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table


Because $\chi^2 = 6.52$ exceeds the critical value 5.99147 (α = .05, df = 2), reject the null hypothesis.
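The arithmetic behind this decision can be checked with a few lines of Python. This is only a sketch of the calculation: the observed counts and the critical value 5.99147 are taken from the slides, while the variable names are illustrative.

```python
# Observed votes for candidates 1, 2 and 3 (from the slides).
observed = [61, 53, 36]
n = sum(observed)

# Under H0 (no preference) each candidate is expected to get n/3 votes.
expected = [n / 3] * 3

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for alpha = .05 with df = 2, as quoted on the slide.
critical_value = 5.99147
reject_h0 = chi_square > critical_value

print(round(chi_square, 2), reject_h0)
```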

Page 8: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

Test of a Hypothesis about Multinomial Probabilities: One-Way Table

H0: p1 = p1,0, p2 = p2,0, … , pk = pk,0

where p1,0, p2,0, …, pk,0 represent the hypothesized values of the multinomial probabilities

Ha: At least one of the multinomial probabilities does not equal its hypothesized value

Test statistic: $$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i}$$

where Ei = npi,0 is the expected cell count given the null hypothesis.

Rejection region: $\chi^2 > \chi^2_{\alpha}$, with (k-1) df.

Page 9: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

Conditions Required for a Valid $\chi^2$ Test: One-Way Table

1. A multinomial experiment has been conducted.

2. The sample size n will be large enough so that, for every cell, the expected cell count E(ni) will be equal to 5 or more.
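The second condition is easy to check before running the test. The sketch below is illustrative: the helper names `expected_counts` and `large_enough` are assumptions, and the probabilities are the hypothesized values from Example 13.2 on the slides.

```python
def expected_counts(n, null_probs):
    """Expected cell counts E(ni) = n * pi,0 under the null hypothesis."""
    return [n * p for p in null_probs]

def large_enough(n, null_probs, minimum=5):
    """True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for e in expected_counts(n, null_probs))

null_probs = [0.07, 0.18, 0.65, 0.10]

ok_500 = large_enough(500, null_probs)  # smallest expected count is 500(.07) = 35
ok_50 = large_enough(50, null_probs)    # smallest expected count is 50(.07) = 3.5
print(ok_500, ok_50)
```

With n = 50 the smallest expected count falls below 5, so the chi-square approximation would not be trusted for that sample size.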

Page 10: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

Example 13.2: Distribution of Opinions About Marijuana Possession Before Television Series Has Aired

Legalization   Decriminalization   Existing Law   No Opinion
7%             18%                 65%            10%

Table 13.2: Distribution of Opinions About Marijuana Possession After Television Series Has Aired

Legalization   Decriminalization   Existing Law   No Opinion
39             99                  336            26

Page 11: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table


Page 12: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

Expected Distribution of 500 Opinions About Marijuana Possession After Television Series Has Aired

Legalization   Decriminalization   Existing Law   No Opinion
500(.07)=35    500(.18)=90         500(.65)=325   500(.10)=50

H0: p1 = .07, p2 = .18, p3 = .65, p4 = .10

Ha: At least one of the proportions differs from its null hypothesis value.

Test statistic: $$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i}$$

Rejection region: $\chi^2 > \chi^2_{.01,\ df = 3} = 11.3449$

Page 13: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

$$\chi^2 = \frac{(39 - 35)^2}{35} + \frac{(99 - 90)^2}{90} + \frac{(336 - 325)^2}{325} + \frac{(26 - 50)^2}{50} = 13.249$$

Rejection region: $\chi^2 > \chi^2_{.01,\ df = 3} = 11.3449$

Page 14: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table


Because $\chi^2 = 13.249$ exceeds the critical value 11.3449 (α = .01, df = 3), reject the null hypothesis.
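The test for Example 13.2 can be reproduced with a small helper. This is a sketch rather than the textbook's own software output: the function name `chi_square_one_way` is an assumption, while the counts, hypothesized proportions, and critical value 11.3449 come from the slides.

```python
def chi_square_one_way(observed, null_probs):
    """Return (statistic, expected counts, df) for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1

# Opinions after the series aired: legalization, decriminalization,
# existing law, no opinion.
observed = [39, 99, 336, 26]
# Hypothesized proportions (the distribution before the series aired).
null_probs = [0.07, 0.18, 0.65, 0.10]

chi_square, expected, df = chi_square_one_way(observed, null_probs)

# Critical value for alpha = .01 with df = 3, as quoted on the slide.
critical_value = 11.3449
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

Most of the statistic comes from the "No Opinion" cell, where 26 responses were observed against 50 expected.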

Page 15: Chapter 13: Categorical Data Analysis Statistics

13.2: Testing Categorical Probabilities: One-Way Table

Inferences can be made on any single proportion as well: a 95% confidence interval on the proportion of citizens in the viewing area with no opinion is

$$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4}$$

where $\hat{p}_4 = \frac{n_4}{n} = \frac{26}{500} = .052$

and $\sigma_{\hat{p}_4} \approx \sqrt{\frac{\hat{p}_4(1 - \hat{p}_4)}{n}} = \sqrt{\frac{.052(.948)}{500}} = .0099$

$$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} = .052 \pm 1.96(.0099) = .052 \pm .019$$
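The interval arithmetic can be verified with a short calculation. This is a minimal sketch: the helper name `proportion_ci` is an assumption, and z = 1.96 is the 95% multiplier used on the slide.

```python
import math

def proportion_ci(count, n, z=1.96):
    """Large-sample confidence interval for a single category proportion."""
    p_hat = count / n
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_error
    return p_hat, std_error, (p_hat - margin, p_hat + margin)

# 26 of the 500 respondents had no opinion.
p_hat, std_error, (lower, upper) = proportion_ci(26, 500)
print(f"p_hat = {p_hat:.3f}, se = {std_error:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```

The whole interval lies well below the hypothesized value of .10 for the "No Opinion" category.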

Page 16: Chapter 13: Categorical Data Analysis Statistics

13.3: Testing Categorical Probabilities: Two-Way Table

Chi-square analysis can also be used to investigate studies based on qualitative factors. Does having one characteristic make it more/less likely to exhibit another characteristic?


Page 17: Chapter 13: Categorical Data Analysis Statistics

13.3: Testing Categorical Probabilities: Two-Way Table

                          Column
                1      2      …      c      Row Totals
Row     1       n11    n12    …      n1c    R1
        2       n21    n22    …      n2c    R2
        …       …      …      …      …      …
        r       nr1    nr2    …      nrc    Rr
Column Totals   C1     C2     …      Cc     n

The columns are divided according to the subcategories for one qualitative variable and the rows for the other qualitative variable.

Page 18: Chapter 13: Categorical Data Analysis Statistics

13.3: Testing Categorical Probabilities: Two-Way Table

General Form of a Two-Way (Contingency) Table Analysis: A Test for Independence

H0: The two classifications are independent

Ha: The two classifications are dependent

Test statistic: $$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}$$

where $E_{ij} = \frac{R_i C_j}{n}$

and $R_i$ = total for row i, $C_j$ = total for column j, n = sample size

Rejection region: $\chi^2 > \chi^2_{\alpha}$, df = (r - 1)(c - 1)
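As a rough sketch of how the expected counts and degrees of freedom are obtained from the row and column totals, consider the helper below. The function name `independence_expected` and the 2 x 2 table are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def independence_expected(table):
    """Expected counts E_ij = R_i * C_j / n and df = (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, df

# Hypothetical 2 x 2 table of observed counts (not from the slides).
table = [[20, 30],
         [30, 20]]

expected, df = independence_expected(table)

# Sum (n_ij - E_ij)^2 / E_ij over every cell of the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
print(expected, df, chi_square)
```

Every expected count here is 25 because all row and column totals equal 50, so each cell contributes (5)^2 / 25 = 1 to the statistic.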

Page 19: Chapter 13: Categorical Data Analysis Statistics

13.3: Testing Categorical Probabilities: Two-Way Table

The results of a survey regarding marital status and religious affiliation are reported below (Example 13.3 in the text).

                           Religious Affiliation
Marital Status             A      B      C      D      None   Totals
Divorced                   39     19     12     28     18     116
Married, never divorced    172    61     44     70     37     384
Totals                     211    80     56     98     55     500

H0: Marital status and religious affiliation are independent

Ha: Marital status and religious affiliation are dependent

Page 20: Chapter 13: Categorical Data Analysis Statistics

13.3: Testing Categorical Probabilities: Two-Way Table

The expected frequencies (see Figure 13.4) are included below in parentheses:

                           Religious Affiliation
Marital Status             A              B            C            D            None         Totals
Divorced                   39 (48.95)     19 (18.56)   12 (12.99)   28 (22.74)   18 (12.76)   116
Married, never divorced    172 (162.05)   61 (61.44)   44 (43.01)   70 (75.26)   37 (42.24)   384
Totals                     211            80           56           98           55           500

The chi-square value computed with SAS is 7.1355, with p-value = .1289. Even at the α = .10 level, we cannot reject the null hypothesis.
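The SAS figures can be cross-checked by hand. The sketch below is not the SAS procedure itself. It recomputes the statistic from the observed table and, as an assumption worth flagging, obtains the p-value from the closed-form upper-tail probability of a chi-square distribution with exactly 4 degrees of freedom, exp(-x/2)(1 + x/2), which does not apply to other df values.

```python
import math

# Observed counts: rows are Divorced and Married (never divorced),
# columns are affiliations A, B, C, D, None.
observed = [
    [39, 19, 12, 28, 18],
    [172, 61, 44, 70, 37],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
        chi_square += (count - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form valid only for df = 4 (illustrative shortcut, not general).
assert df == 4
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

reject_at_10_percent = p_value < 0.10
print(f"chi-square = {chi_square:.4f}, df = {df}, p-value = {p_value:.4f}")
```

For tables with other dimensions, a statistics library's chi-square distribution function would be the usual way to get the p-value.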

Page 21: Chapter 13: Categorical Data Analysis Statistics

13.3: Testing Categorical Probabilities: Two-Way Table


Page 22: Chapter 13: Categorical Data Analysis Statistics

13.4: A Word of Caution About Chi-Square Tests


Page 23: Chapter 13: Categorical Data Analysis Statistics

13.4: A Word of Caution About Chi-Square Tests


Be sure