Inference for Categorical Data

William P. Wattles, Ph.D., Francis Marion University


Post on 15-Feb-2016


Page 1: Inference for Categorical Data

1

Inference for Categorical Data

William P. Wattles, Ph.D., Francis Marion University

Page 2: Inference for Categorical Data

2

Continuous vs. Categorical

• Continuous (measurement) variables have many values

• Categorical variables have only certain values representing different categories

• Ordinal: a type of categorical variable with a natural order (e.g., year of college)

• Nominal: a type of categorical variable with no order (e.g., brand of cola)

Page 3: Inference for Categorical Data

3

Categorical Data

• Tells which category an individual is in rather than telling how much.

• Sex, race, and occupation are naturally categorical.

• A quantitative variable can be grouped to form a categorical variable.

• Analyze with counts or percents.
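A minimal sketch of grouping a quantitative variable into categories and then summarizing with counts and percents. The ages and the cut points are made up for illustration; they are not from the slides.

```python
from collections import Counter

# Hypothetical ages (illustrative data, not from the slides)
ages = [18, 19, 19, 20, 21, 22, 22, 23, 25, 31, 34, 40]


def age_group(age):
    """Map a quantitative age onto an assumed set of ordered categories."""
    if age < 20:
        return "under 20"
    if age < 25:
        return "20-24"
    return "25 and over"


# Count how many individuals fall in each category
counts = Counter(age_group(a) for a in ages)
total = sum(counts.values())

# Convert the counts to percents of the whole group
percents = {group: 100 * n / total for group, n in counts.items()}

for group in ("under 20", "20-24", "25 and over"):
    print(f"{group:12s} count={counts[group]:2d} percent={percents[group]:.0f}%")
```

The grouped variable is ordinal here because the age bands keep a natural order.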

Page 4: Inference for Categorical Data

4

Describing relationships in categorical data

• No single graph portrays the relationship

• No single number summarizes the relationship either

• Convert counts to proportions or percents

Page 5: Inference for Categorical Data

5

Prediction

Page 6: Inference for Categorical Data

6

Prediction

Page 7: Inference for Categorical Data

7

Moving from descriptive to Inferential

• Chi Square Inference involves a test of independence.

• If variables are independent, knowledge of one variable tells you nothing about the other.

Page 8: Inference for Categorical Data

8

Moving from descriptive to Inferential

• Inference involves expected counts.

– Expected count = the count that would occur if the variables were independent

Page 9: Inference for Categorical Data

9

Inference for two-way tables

• Chi Square test of independence.

• For more than two groups

• Cannot compare multiple groups one at a time.

Page 10: Inference for Categorical Data

10

To Analyze Categorical Data

• First obtain counts

• In Excel, this can be done with a pivot table

• Put the data in a matrix or two-way table
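The slides do this step with an Excel pivot table. As a rough equivalent, here is a sketch in Python that tallies raw records into a two-way table. The six records are hypothetical and only illustrate the mechanics.

```python
from collections import Counter

# Hypothetical raw records, one (sex, party) pair per respondent
records = [
    ("Male", "Democrat"), ("Female", "Republican"), ("Male", "Democrat"),
    ("Female", "Independent"), ("Male", "Republican"), ("Female", "Republican"),
]

# Tally each (row category, column category) combination
cell_counts = Counter(records)

rows = ["Male", "Female"]
cols = ["Republican", "Democrat", "Independent"]

# Arrange the tallies as a matrix: one list per row category.
# A Counter returns 0 for combinations that never occurred.
table = [[cell_counts[(r, c)] for c in cols] for r in rows]

print(f"{'':8s}" + "".join(f"{c:>13s}" for c in cols))
for r, counts in zip(rows, table):
    print(f"{r:8s}" + "".join(f"{n:13d}" for n in counts))
```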

Page 11: Inference for Categorical Data

11

Matrix or two-way table

        Republican  Democrat  Independent
Male        18         43         14
Female      39         23         18

Page 12: Inference for Categorical Data

12

Inference for two-way tables

• Expected count: the count that would occur if the variables were independent

Page 13: Inference for Categorical Data

13

Matrix or two-way table

• Rows

• Columns

• Distribution: how often each outcome occurred

• Marginal distribution: count for all entries in a row or column

Page 14: Inference for Categorical Data

14

Row and column totals

        Republican  Democrat  Independent  Total
Male        18         43         14         75
Female      39         23         18         80
Total       57         66         32        155

Page 15: Inference for Categorical Data

15

        Republican  Democrat  Independent  Total  Percent
Male                                         75     48%
Female                                       80     52%
Total       57         66         32        155
Percent     37%        43%        21%
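The marginal totals and percents on the last two slides can be reproduced with a short Python sketch. The counts are the ones in the two-way table; the variable names are only illustrative.

```python
# Observed counts from the two-way table (rows: sex, columns: party)
parties = ["Republican", "Democrat", "Independent"]
observed = {
    "Male": [18, 43, 14],
    "Female": [39, 23, 18],
}

# Marginal distribution for rows: total count in each row
row_totals = {sex: sum(counts) for sex, counts in observed.items()}

# Marginal distribution for columns: total count in each column
col_totals = [sum(col) for col in zip(*observed.values())]

grand_total = sum(row_totals.values())

# Express each marginal total as a percent of all subjects
row_pct = {sex: round(100 * t / grand_total) for sex, t in row_totals.items()}
col_pct = [round(100 * t / grand_total) for t in col_totals]

print(row_totals, row_pct)
print(dict(zip(parties, col_totals)), dict(zip(parties, col_pct)))
```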

Page 16: Inference for Categorical Data

16

Expected counts

• 37% of all subjects are Republicans

• If independent, 37% of females should be Republican (expected value)

• 37% of 80 = 29

• 37% of 75 = 28
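The same reasoning applies to every cell: expected count = row total × column total ÷ grand total. A sketch in Python, using the observed counts from the two-way table (the names are illustrative):

```python
# Observed counts: rows are Male, Female; columns are Republican, Democrat, Independent
observed = [
    [18, 43, 14],
    [39, 23, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for each cell:
# (row total * column total) / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Rounded to whole counts for display only
expected_rounded = [[round(e) for e in row] for row in expected]

for row in expected:
    print([f"{e:.2f}" for e in row])
print(expected_rounded)
```

Rounding is for display only. The unrounded expected counts are the ones to carry into the chi-square statistic.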

Page 17: Inference for Categorical Data

17

Expected counts rounded

        Republican  Democrat  Independent  Total
Male        28         32         15         75
Female      29         34         17         80
Total       57         66         32        155

Page 18: Inference for Categorical Data

18

Observed vs. Expected

Observed
        Republican  Democrat  Independent  Total
Male        18         43         14         75
Female      39         23         18         80
Total       57         66         32        155

Expected
        Republican  Democrat  Independent  Total
Male        28         32         15         75
Female      29         34         17         80
Total       57         66         32        155

Page 19: Inference for Categorical Data

19

Chi-Square

• Chi-square: a measure of how far the observed counts are from the expected counts

Page 20: Inference for Categorical Data

20

Chi-square test of independence

$$X^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
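Applied to the party-by-sex table, the formula can be sketched in Python as follows. Here f_o is an observed count and f_e the matching expected count; the expected counts are left unrounded, so the result may differ slightly from a hand calculation with the rounded table.

```python
# Observed counts: rows are Male, Female; columns are Republican, Democrat, Independent
observed = [
    [18, 43, 14],
    [39, 23, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        # Expected count if sex and party were independent
        f_e = row_totals[i] * col_totals[j] / grand_total
        # Each cell contributes (f_o - f_e)^2 / f_e to the statistic
        chi_square += (f_o - f_e) ** 2 / f_e

print(f"chi-square = {chi_square:.2f}")
```

Large values mean the observed counts are far from what independence would predict.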

Page 21: Inference for Categorical Data

21

Chi Square test of independence with SPSS

Page 22: Inference for Categorical Data

22

Chi Square test of independence with SPSS

Page 23: Inference for Categorical Data

23

Chi Square

Page 24: Inference for Categorical Data

24

Chi-square test of independence

• Degrees of freedom

• df = (number of rows − 1) × (number of columns − 1)

• Compare the observed and expected counts.

• The P-value comes from comparing the chi-square statistic with critical values for a chi-square distribution
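A short sketch of the degrees-of-freedom rule in Python, checked against the table shapes used in this deck. The p-value helper is an assumption of this sketch: it relies on the closed form exp(−x/2), which holds only for df = 2. Other df values need chi-square tables or statistical software. The statistic 14.15 is roughly what the party-by-sex table gives with unrounded expected counts.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (number of rows - 1) * (number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df2(chi_square):
    """Upper-tail probability of a chi-square distribution with df = 2 only."""
    return math.exp(-chi_square / 2)


# Table shapes that appear in the slides (rows, columns)
print(degrees_of_freedom(2, 3))  # sex by party
print(degrees_of_freedom(4, 4))  # job grade by marital status
print(degrees_of_freedom(3, 3))  # olive oil
print(degrees_of_freedom(4, 2))  # business majors

# Critical value at alpha = .05 for df = 2, from inverting exp(-x/2) = alpha
critical_value = -2 * math.log(0.05)

chi_square = 14.15  # approximate statistic for the sex-by-party table
p = p_value_df2(chi_square)
print(f"critical value = {critical_value:.3f}, p = {p:.5f}")
```

The statistic exceeds the critical value, so the p-value falls below .05.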

Page 25: Inference for Categorical Data

25

Example

• Has the percentage of majors changed by school?

Page 26: Inference for Categorical Data

26

Data collection

http://www.fmarion.edu/about/FactBook2004/2005 Fall 2004 Graduates by Major

Page 27: Inference for Categorical Data

27

Page 28: Inference for Categorical Data

28

Page 29: Inference for Categorical Data

29

Chi Square

Page 30: Inference for Categorical Data

30

Marital Status, page 543

Job grade  Single  Married  Divorced  Widowed
1              58      874        15        8
2             222     3927        70       20
3              50     2396        34       10
4               7      533         7        4

Page 31: Inference for Categorical Data

31

Marital Status, page 543

Test Statistics      Value   df  p-value
Pearson Chi-Square   67.491   9   0.0000

Page 32: Inference for Categorical Data

32

Olive Oil, page 578

 

Olive oil consumption
               Low  Medium  High
Colon cancer   398     397   430
Rectal         250     241   217
Controls      1368    1377  1409

Page 33: Inference for Categorical Data

33

Olive Oil, page 578

Test Statistics                  Value  df  p-value
Pearson Chi-Square               1.552   4    0.817
Continuity Adjusted Chi-Square   1.396   4    0.845
Likelihood Ratio Chi-Square      1.549   4    0.818

Page 34: Inference for Categorical Data

34

Business Majors, page 563

                Female  Male
Accounting          68    56
Administration      91    40
Economics            5     6
Finance             61    59

Page 35: Inference for Categorical Data

35

Business Majors, page 563

Test Statistics      Value   df  p-value
Pearson Chi-Square   10.827   3    0.013
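The Pearson chi-square and df reported for the business majors table can be reproduced from the counts. The following Python sketch uses an illustrative helper, chi_square_independence, which is not a function from the slides or from any library.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: Accounting, Administration, Economics, Finance; columns: Female, Male
business_majors = [
    [68, 56],
    [91, 40],
    [5, 6],
    [61, 59],
]

statistic, df = chi_square_independence(business_majors)
print(f"chi-square = {statistic:.3f}, df = {df}")
```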

Page 36: Inference for Categorical Data

36

Exam Three

• 37 multiple choice questions, 4 short answer

• T-tests and chi square on Excel

• General questions about analyzing categorical data and t-tests

• Review from earlier this term

Page 37: Inference for Categorical Data

37

Inference as a decision

• We must decide if the null hypothesis is true.

• We cannot know for sure.

• We choose an arbitrary standard that is conservative and set alpha at .05.

• Our decision will be either correct or incorrect.

Page 38: Inference for Categorical Data

38

Type I and Type II errors

                Ho is really True            Ho is really False
We reject Ho    Type I Error (false alarm)   Correct decision
We accept Ho    Correct decision             Type II Error (miss)

Page 39: Inference for Categorical Data

39

Type I error

• If we reject Ho when in fact Ho is true, this is a Type I error.

• Statistical procedures are designed to minimize the probability of a Type I error, because it is more serious for science.

• With a Type I error we erroneously conclude that an independent variable works.

Page 40: Inference for Categorical Data

40

Type II error

• If we accept Ho when in fact Ho is false, this is a Type II error.

• A Type II error is serious to the researcher.

• The power of a test is the probability that Ho will be rejected when it is, in fact, false.

Page 41: Inference for Categorical Data

41

Probability

                Ho is really True    Ho is really False
We reject Ho    p = α                p = 1 − β
We accept Ho    p = 1 − α            p = β

Page 42: Inference for Categorical Data

42

Power

• The goal of any scientific research is to reject Ho when Ho is false.

• To increase power:

– a. increase sample size

– b. increase alpha

– c. decrease sample variability

– d. increase the difference between the means
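A small simulation can make the Type I error rate and power concrete. This sketch swaps in a technique the slides do not use: a two-sided z-test of Ho: mean = 0 with a known standard deviation. The effect size, sample sizes, and number of repetitions are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean, n, alpha=0.05, sigma=1.0, reps=2000, seed=1):
    """Estimate how often a two-sided z-test of Ho: mean = 0 rejects Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        z = sample_mean / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


# Ho is really true: the rejection rate estimates the Type I error rate (alpha)
type_1_rate = rejection_rate(true_mean=0.0, n=25)

# Ho is really false: the rejection rate estimates power (1 - beta)
power_small_n = rejection_rate(true_mean=0.5, n=10)
power_large_n = rejection_rate(true_mean=0.5, n=50)             # a. larger sample
power_alpha_10 = rejection_rate(true_mean=0.5, n=10, alpha=0.10)  # b. larger alpha

print(f"Type I error rate ~ {type_1_rate:.3f}")
print(f"power, n=10: {power_small_n:.3f}; n=50: {power_large_n:.3f}")
print(f"power, n=10, alpha=.10: {power_alpha_10:.3f}")
```

Changing sigma or true_mean in the same way illustrates items c and d.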

Page 43: Inference for Categorical Data

43

Categorical data example

• African-American students more likely to register via the web.

Page 44: Inference for Categorical Data

44

Table

Students University-Wide      White             African-American
Variable                      n      Percent    n      Percent
Register on the Web            447   34%         284   44%
Register with other method     876   66%         356   56%
Total                         1323               640

Page 45: Inference for Categorical Data

45

Web Registration by Race

[Bar chart: percent registering via the web by Year (2000, 2001) for White and African-American students; vertical axis 0% to 60%; data labels shown: 34%, 25%, 44%, 29%]

Page 46: Inference for Categorical Data

46

Categorical Data Example

• African-American students university-wide (44%) were more likely than white students (34%) to use web registration, χ²(1, N = 1963) = 20.7, p < .001.
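The reported statistic can be checked from the counts in the registration table. This is a sketch in Python; the p-value line uses the fact that, for df = 1 only, the upper-tail chi-square probability equals erfc(sqrt(x/2)).

```python
import math

# Rows: web registration, other method; columns: White, African-American
observed = [
    [447, 284],
    [876, 356],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi_square += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid for df = 1 only
p_value = math.erfc(math.sqrt(chi_square / 2))

# Percent of each group registering on the web
pct_web = [round(100 * observed[0][j] / col_totals[j]) for j in range(2)]

print(f"X2({df}, N = {n}) = {chi_square:.1f}, p = {p_value:.2g}")
print(f"web registration: White {pct_web[0]}%, African-American {pct_web[1]}%")
```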

Page 47: Inference for Categorical Data

47

Page 48: Inference for Categorical Data

48

Smoking among French Men

• Do these data show a relationship between education and smoking in French men?

Page 49: Inference for Categorical Data

49

Page 50: Inference for Categorical Data

50

Page 51: Inference for Categorical Data

51

The End

Page 52: Inference for Categorical Data

52

Benford’s Law, page 550

• Faking data?

Page 53: Inference for Categorical Data

53

Problem 20.14

Digit  Ratio  Observed
1      0.301  6
2      0.176  4
3      0.125  6
4      0.097  7
5      0.079  3
6      0.067  5
7      0.058  6
8      0.051  4
9      0.046  4

Page 54: Inference for Categorical Data

54

Digit  Ratio  Expected  Observed
1      0.301  13.545    6
2      0.176  7.92      4
3      0.125  5.625     6
4      0.097  4.365     7
5      0.079  3.555     3
6      0.067  3.015     5
7      0.058  2.61      6
8      0.051  2.295     4
9      0.046  2.07      4

Page 55: Inference for Categorical Data

55

Expected  Observed  (O − E)²/E
13.545    6         4.20280731
7.92      4         1.94020202
5.625     6         0.025
4.365     7         1.59065865
3.555     3         0.08664557
3.015     5         1.30687396
2.61      6         4.40310345
2.295     4         1.26667756
2.07      4         1.7994686
Total               16.6214371

Page 56: Inference for Categorical Data

56

Significance test

chitest p = 0.03430
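The Excel result can be reproduced with a goodness-of-fit calculation. In this Python sketch the expected counts are the Benford ratios times the 45 observations, and df = 9 − 1 = 8. The p-value helper is an assumption of this sketch: it uses a closed form for the chi-square upper tail that is valid only when df is even.

```python
import math

# Benford ratios and observed counts for leading digits 1 to 9 (Problem 20.14)
ratios = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [6, 4, 6, 7, 3, 5, 6, 4, 4]

n = sum(observed)
expected = [n * r for r in ratios]

# Each digit contributes (observed - expected)^2 / expected
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(observed) - 1


def chi_square_p_value_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("this closed form only applies to even df")
    half = x / 2
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


p_value = chi_square_p_value_even_df(chi_square, df)
print(f"chi-square = {chi_square:.4f}, df = {df}, p = {p_value:.5f}")
```

Several expected counts are below 5, so the chi-square approximation is rough for these data.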

Page 57: Inference for Categorical Data

57

Example

• Survey2, Berk & Carey, page 261