Cross Tabulation and Chi-Square Test for Independence

Post on 10-Feb-2016


DESCRIPTION

Cross Tabulation and Chi Square Test for Independence. Cross-tabulation helps answer questions about whether two or more variables of interest are linked, for example: Is the type of mouthwash user (heavy or light) related to gender? - PowerPoint PPT Presentation

TRANSCRIPT

Cross Tabulation and Chi Square Test for Independence

Cross-tabulation
• Helps answer questions about whether two or more variables of interest are linked:
– Is the type of mouthwash user (heavy or light) related to gender?
– Is the preference for a certain flavor (cherry or lemon) related to the geographic region (north, south, east, west)?
– Is income level associated with gender?
• Cross-tabulation determines association, not causality.

Dependent and Independent Variables
• The variable being studied is called the dependent variable or response variable.
• A variable that influences the dependent variable is called an independent variable.

Cross-tabulation
• Cross-tabulation of two or more variables is possible if the variables are discrete:
– The frequency of one variable is subdivided by the categories of the other variable.
• Generally a cross-tabulation table has:
– Row percentages
– Column percentages
– Total percentages
• Which one is better? It DEPENDS on which variable is considered the independent variable.

Cross tabulation: GROUPINC * Gender Crosstabulation

GROUPINC                                   Female    Male      Total
income <= 5         Count                  10        9         19
                    % within GROUPINC      52.6%     47.4%     100.0%
                    % within Gender        55.6%     18.8%     28.8%
                    % of Total             15.2%     13.6%     28.8%
5 < income <= 10    Count                  5         25        30
                    % within GROUPINC      16.7%     83.3%     100.0%
                    % within Gender        27.8%     52.1%     45.5%
                    % of Total             7.6%      37.9%     45.5%
income > 10         Count                  3         14        17
                    % within GROUPINC      17.6%     82.4%     100.0%
                    % within Gender        16.7%     29.2%     25.8%
                    % of Total             4.5%      21.2%     25.8%
Total               Count                  18        48        66
                    % within GROUPINC      27.3%     72.7%     100.0%
                    % within Gender        100.0%    100.0%    100.0%
                    % of Total             27.3%     72.7%     100.0%
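The three kinds of percentages in the GROUPINC * Gender table can be reproduced from the counts alone. The following is a minimal sketch in plain Python; the function name `crosstab_percentages` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Observed counts from the GROUPINC * Gender table.
# Each row lists the counts in column order: [Female, Male].
counts = {
    "income <= 5": [10, 9],
    "5 < income <= 10": [5, 25],
    "income > 10": [3, 14],
}


def crosstab_percentages(table):
    """Return row, column, and total percentages for every cell.

    `table` maps a row label to a list of counts (one per column).
    """
    row_totals = {row: sum(cells) for row, cells in table.items()}
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(cells[j] for cells in table.values()) for j in range(n_cols)]
    grand_total = sum(row_totals.values())

    result = {}
    for row, cells in table.items():
        result[row] = {
            # Share of the row total (here: % within GROUPINC).
            "row_pct": [100 * c / row_totals[row] for c in cells],
            # Share of the column total (here: % within Gender).
            "col_pct": [100 * c / col_totals[j] for j, c in enumerate(cells)],
            # Share of all cases (here: % of Total).
            "total_pct": [100 * c / grand_total for c in cells],
        }
    return result


percentages = crosstab_percentages(counts)
for row, pct in percentages.items():
    rounded = {name: [round(v, 1) for v in values] for name, values in pct.items()}
    print(row, rounded)
```

Reading the row percentages treats income group as the independent variable; reading the column percentages treats gender as the independent variable.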

Contingency Table
• A contingency table shows the joint distribution of two discrete variables.
• This distribution represents the probability of observing a case in each cell.
– Probability is calculated as:

$$P = \frac{\text{Observed cases}}{\text{Total cases}}$$

Chi-square Test for Independence
• The Chi-square test for independence determines whether two variables are associated or not.
H0: Two variables are independent
H1: Two variables are not independent
• Chi-square test results are unstable if the expected count in a cell is lower than 5.

Chi-Square statistic

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

$\chi^2$ = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell

Estimated cell Frequency

$$E_{ij} = \frac{R_i C_j}{n}$$

$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
$n$ = sample size
$E_{ij}$ = estimated cell frequency

Degrees of Freedom

d.f. = (R-1)(C-1), where R is the number of rows and C is the number of columns

Chi-Square Test: Differences Among Groups Example

Awareness of Tire Manufacturer’s Brand (observed/expected frequencies)

            Men      Women    Total
Aware       50/39    10/21    60
Unaware     15/26    25/14    40
Total       65       35       100

$$\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$$

$$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$$

$$d.f. = (R-1)(C-1) = (2-1)(2-1) = 1$$

$\chi^2$ with 1 d.f. at .05 critical value = 3.84
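The tire-brand awareness example can be checked numerically. The sketch below derives the expected frequencies from the row and column totals, sums the chi-square terms, and compares the result with the 3.84 critical value quoted on the slide. The function name `chi_square_independence` is a hypothetical helper, not something defined in the presentation.

```python
def chi_square_independence(observed):
    """Chi-square statistic, degrees of freedom, and expected counts
    for a table given as a list of rows of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # E_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # chi^2 = sum over all cells of (O - E)^2 / E
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Rows: Aware, Unaware. Columns: Men, Women.
observed = [[50, 10], [15, 25]]
statistic, dof, expected = chi_square_independence(observed)

CRITICAL_VALUE_05_1DF = 3.84  # critical value quoted on the slide
reject_h0 = statistic > CRITICAL_VALUE_05_1DF

print("expected frequencies:", expected)
print("chi-square:", round(statistic, 3), "d.f.:", dof)
print("reject H0 at .05:", reject_h0)
```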

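Chi-square Test for Independence
• Under H0, the test statistic approximately follows the Chi-square distribution ($\chi^2$).

[Figure: $\chi^2$ distribution with the "Reject H0" region to the right of the critical value 3.84; the computed statistic, 22.16, falls in that region.]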
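The comparison against the critical value can also be expressed as a p-value. For 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so the upper-tail probability can be written with the complementary error function from Python's standard library. This is a sketch that only holds for 1 d.f.; the function name is illustrative and not from the slides.

```python
import math


def chi_square_p_value_1df(statistic):
    """Upper-tail probability P(chi-square >= statistic) for 1 d.f. only.

    Uses the identity P(Z^2 >= x) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2))


p_at_critical = chi_square_p_value_1df(3.84)   # should be close to .05
p_at_observed = chi_square_p_value_1df(22.16)  # far below .05

print("p-value at 3.84:", p_at_critical)
print("p-value at 22.16:", p_at_observed)
```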

Differences Between Groups when Comparing Means
• Ratio scaled dependent variables
• t-test
– When groups are small
– When population standard deviation is unknown
• z-test
– When groups are large

Null Hypothesis About Mean Differences Between Groups

$$\mu_1 = \mu_2 \quad \text{OR} \quad \mu_1 - \mu_2 = 0$$

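t-Test for Difference of Means

$$t = \frac{\text{mean 1} - \text{mean 2}}{\text{Variability of random means}}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$

$\bar{X}_1$ = mean for Group 1
$\bar{X}_2$ = mean for Group 2
$S_{\bar{X}_1 - \bar{X}_2}$ = the pooled or combined standard error of difference between means.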


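Pooled Estimate of the Standard Error

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$S_1^2$ = the variance of Group 1
$S_2^2$ = the variance of Group 2
$n_1$ = the sample size of Group 1
$n_2$ = the sample size of Group 2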


Degrees of Freedom
• d.f. = n - k
• where:
– n = n1 + n2
– k = number of groups

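t-Test for Difference of Means Example

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(20)(2.1)^2 + (13)(2.6)^2}{33}\right)\left(\frac{1}{21} + \frac{1}{14}\right)} = .797$$

$$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$$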
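The worked t-test can be reproduced from the summary statistics that appear in the example (group sizes 21 and 14, standard deviations 2.1 and 2.6, means 16.5 and 12.2). The sketch below follows the pooled standard error formula from the earlier slide; the function names are illustrative, not from the presentation.

```python
import math


def pooled_standard_error(s1, n1, s2, n2):
    """Pooled standard error of the difference between two means."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))


def t_statistic(mean1, mean2, s1, n1, s2, n2):
    """Return (t, degrees of freedom) for the difference of two means."""
    standard_error = pooled_standard_error(s1, n1, s2, n2)
    dof = n1 + n2 - 2  # d.f. = n - k with n = n1 + n2 and k = 2 groups
    return (mean1 - mean2) / standard_error, dof


standard_error = pooled_standard_error(s1=2.1, n1=21, s2=2.6, n2=14)
t_value, dof = t_statistic(mean1=16.5, mean2=12.2, s1=2.1, n1=21, s2=2.6, n2=14)

print("pooled standard error:", round(standard_error, 3))
print("t:", round(t_value, 3), "d.f.:", dof)
```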

Comparing Two Groups when Comparing Proportions
• Percentage Comparisons
• Sample Proportion - P
• Population Proportion - $\pi$

Differences Between Two Groups when Comparing Proportions

The hypothesis is:

$$H_0: \pi_1 = \pi_2$$

may be restated as:

$$H_0: \pi_1 - \pi_2 = 0$$

Z-Test for Differences of Proportions

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$

$p_1$ = sample proportion of successes in Group 1
$p_2$ = sample proportion of successes in Group 2
$(\pi_1 - \pi_2)$ = hypothesized population proportion 1 minus hypothesized population proportion 2
$S_{p_1 - p_2}$ = pooled estimate of the standard error of the difference of proportions

Z-Test for Differences of Proportions

$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$\bar{p}$ = pooled estimate of proportion of successes in a sample of both groups
$\bar{q}$ = $(1 - \bar{p})$, or a pooled estimate of proportion of failures in a sample of both groups
$n_1$ = sample size for group 1
$n_2$ = sample size for group 2

Z-Test for Differences of Proportions

$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

A Z-Test for Differences of Proportions

$$\bar{p} = \frac{(100)(.35) + (100)(.4)}{100 + 100} = .375$$

$$S_{p_1 - p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$$
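The proportions example stops at the pooled standard error. As a minimal sketch, the code below repeats that arithmetic and carries it one step further to the Z statistic under the null hypothesis of no difference. The Z value and the comparison with 1.96 (the usual two-tailed .05 critical value) are additions for illustration and do not appear in the slides; the function name is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2, hypothesized_diff=0.0):
    """Return (Z, pooled standard error, pooled proportion)."""
    pooled_p = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion of successes
    pooled_q = 1 - pooled_p                      # pooled proportion of failures
    standard_error = math.sqrt(pooled_p * pooled_q * (1 / n1 + 1 / n2))
    z = ((p1 - p2) - hypothesized_diff) / standard_error
    return z, standard_error, pooled_p


# Sample proportions and sizes taken from the example: .35 and .4, 100 per group.
z_value, standard_error, pooled_p = two_proportion_z(p1=0.35, n1=100, p2=0.40, n2=100)

# Assumption: a two-tailed test at the .05 level, critical value 1.96.
reject_h0 = abs(z_value) > 1.96

print("pooled p:", round(pooled_p, 3))
print("standard error:", round(standard_error, 3))
print("Z:", round(z_value, 2), "reject H0:", reject_h0)
```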

Analysis of Variance

Hypothesis when comparing three groups

$$\mu_1 = \mu_2 = \mu_3$$

Analysis of Variance F-Ratio

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$

Analysis of Variance Sum of Squares

$$SS_{total} = SS_{within} + SS_{between}$$

Analysis of Variance Sum of Squares Total

$$SS_{total} = \sum_{i=1}^{n} \sum_{j=1}^{c} (X_{ij} - \bar{\bar{X}})^2$$

$X_{ij}$ = individual scores, i.e., the ith observation or test unit in the jth group
$\bar{\bar{X}}$ = grand mean
$n$ = number of all observations or test units in a group
$c$ = number of groups (or columns)

Analysis of Variance Sum of Squares Within

$$SS_{within} = \sum_{i=1}^{n} \sum_{j=1}^{c} (X_{ij} - \bar{X}_j)^2$$

$X_{ij}$ = individual scores, i.e., the ith observation or test unit in the jth group
$\bar{X}_j$ = mean of the jth group
$n$ = number of all observations or test units in a group
$c$ = number of groups (or columns)

Analysis of Variance Sum of Squares Between

$$SS_{between} = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$

$\bar{X}_j$ = mean of the jth group
$\bar{\bar{X}}$ = grand mean
$n_j$ = number of all observations or test units in the jth group

Analysis of Variance Mean Squares Between

$$MS_{between} = \frac{SS_{between}}{c - 1}$$

Analysis of Variance Mean Square Within

$$MS_{within} = \frac{SS_{within}}{cn - c}$$

Analysis of Variance F-Ratio

$$F = \frac{MS_{between}}{MS_{within}}$$

A Test Market Experiment on Pricing

Sales in Units (thousands)

                           Regular Price    Reduced Price    Cents-Off Coupon
                           $.99             $.89             Regular Price
Test Market A, B, or C     130              145              153
Test Market D, E, or F     118              143              129
Test Market G, H, or I     87               120              96
Test Market J, K, or L     84               131              99
Mean                       X1=104.75        X2=134.75        X3=119.25
Grand Mean                 X=119.58
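The sums of squares and the F-ratio for the pricing experiment can be computed directly from the twelve sales figures. The following sketch applies the sum-of-squares formulas from the earlier slides in plain Python; the function name `one_way_anova` and the result keys are illustrative. The slides do not state the resulting F value, so it is printed rather than quoted.

```python
def one_way_anova(samples):
    """One-way ANOVA for a list of groups (each a list of observations)."""
    all_values = [x for values in samples for x in values]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(values) / len(values) for values in samples]

    # SS_between: group size times squared distance of group mean from grand mean.
    ss_between = sum(
        len(values) * (mean - grand_mean) ** 2
        for values, mean in zip(samples, group_means)
    )
    # SS_within: squared distance of each score from its own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for values, mean in zip(samples, group_means)
        for x in values
    )
    # SS_total: squared distance of each score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = len(samples) - 1               # c - 1
    df_within = len(all_values) - len(samples)  # cn - c for equal group sizes
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "grand_mean": grand_mean,
        "group_means": group_means,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": ms_between / ms_within,
    }


sales = [
    [130, 118, 87, 84],    # Regular Price $.99
    [145, 143, 120, 131],  # Reduced Price $.89
    [153, 129, 96, 99],    # Cents-Off Coupon, Regular Price
]
anova = one_way_anova(sales)
for name, value in anova.items():
    print(name, value)
```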

ANOVA Summary Table Source of Variation
• Between groups
• Sum of squares
– SSbetween
• Degrees of freedom
– c-1 where c=number of groups
• Mean squared-MSbetween
– SSbetween/(c-1)

ANOVA Summary Table Source of Variation
• Within groups
• Sum of squares
– SSwithin
• Degrees of freedom
– cn-c where c=number of groups, n=number of observations in a group
• Mean squared-MSwithin
– SSwithin/(cn-c)

$$F = \frac{MS_{BETWEEN}}{MS_{WITHIN}}$$

ANOVA Summary Table Source of Variation
• Total
• Sum of Squares
– SStotal
• Degrees of Freedom
– cn-1 where c=number of groups, n=number of observations in a group
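As a quick consistency check (added here as an illustration, not taken from the slides), the degrees of freedom in the summary table add up in the same way as the sums of squares. With the pricing experiment's c = 3 groups and n = 4 test markets per group:

```latex
\begin{align*}
d.f._{total} &= d.f._{between} + d.f._{within} \\
cn - 1 &= (c - 1) + (cn - c) \\
(3)(4) - 1 &= (3 - 1) + \bigl((3)(4) - 3\bigr) \\
11 &= 2 + 9
\end{align*}
```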
