introduction to biostatistics (zju 2008)
Post on 03-Jan-2016
Introduction to Biostatistics (ZJU 2008)
Wenjiang Fu, Ph.D.
Associate Professor
Division of Biostatistics, Department of Epidemiology
Michigan State University
East Lansing, Michigan 48824, USA
Email: fuw@msu.edu
www: http://www.msu.edu/~fuw
Categorical Data Analysis
Examples of categorical data:
- Qualitative random variables yield responses that can be put in categories. Example: gender (male, female), disease (+, -).
- The measurement or count reflects the number in each category.
- The scale is nominal (no order) or ordinal (ordered).
- Data can be collected as continuous but recoded to categorical data. Example: systolic blood pressure recoded as hypotension, normal tension, hypertension.
- Counts of episodes of incidence.
Categorical Data Analysis
- Responses are not continuous, so the linear regression model or the ANOVA model cannot be used.
- Transformations do not help either: they cannot make the response variable meaningful on a new scale.
- New methods need to be introduced to analyze such data.
Test for Two Proportions

                 Research Questions
Hypothesis   No Difference /    Pop 1 ≥ Pop 2 /    Pop 1 ≤ Pop 2 /
             Any Difference     Pop 1 < Pop 2      Pop 1 > Pop 2
H0           p1 - p2 = 0        p1 - p2 ≥ 0        p1 - p2 ≤ 0
Ha           p1 - p2 ≠ 0        p1 - p2 < 0        p1 - p2 > 0
Z Test for Two Proportions
1. Assumptions
   - Populations are independent
   - Populations follow the binomial distribution
   - The normal approximation can be used for large samples (all expected counts ≥ 5)
2. Z-Test Statistic for Two Proportions
   \[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \quad \text{where} \quad \hat{p} = \frac{X_1 + X_2}{n_1 + n_2} \]
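Sampling Distribution for the Difference Between Proportions
For large samples, with the second argument of N denoting the variance:
\[ \hat{p}_1 - \hat{p}_2 \sim N\!\left(p_1 - p_2;\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right) \]
\[ \hat{p}_1 - \hat{p}_2 \sim N\!\left(0;\ pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right) \ \text{under } H_0: p_1 = p_2, \qquad p = \frac{x_1 + x_2}{n_1 + n_2},\ q = 1 - p \]
The analogous result for the difference between two sample means:
\[ \bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2;\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \]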
Z Test for Two Proportions: Thinking Challenge
You're an epidemiologist for the US Department of Health and Human Services. You're studying the prevalence of disease X in two states (MA and CA). In MA, 74 of 1500 people surveyed were diseased and in CA, 129 of 1500 were diseased. At the .05 level, does MA have a lower prevalence rate?
Z Test for Two Proportions: Solution
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
α = .05
nMA = 1500, nCA = 1500
Critical Value(s): Z = -1.645 (reject H0 if Z < -1.645; lower-tail area .05)
Z Test for Two Proportions: Solution (computation)
\[ \hat{p}_{MA} = \frac{X_{MA}}{n_{MA}} = \frac{74}{1500} = .0493, \qquad \hat{p}_{CA} = \frac{X_{CA}}{n_{CA}} = \frac{129}{1500} = .0860 \]
\[ \hat{p} = \frac{X_{MA} + X_{CA}}{n_{MA} + n_{CA}} = \frac{74 + 129}{1500 + 1500} = .0677 \]
\[ Z = \frac{(.0493 - .0860) - 0}{\sqrt{.0677\,(1 - .0677)\left(\frac{1}{1500} + \frac{1}{1500}\right)}} = -4.00 \]
Z Test for Two Proportions: Solution (summary)
Test Statistic: Z = -4.00
Decision: Reject H0 at α = .05 (Z = -4.00 < -1.645)
Conclusion: There is evidence that the prevalence in MA is less than in CA
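The MA/CA calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `two_proportion_z` is illustrative and not part of the original slides.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)            # pooled estimate of the common p
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# MA: 74 of 1500 diseased; CA: 129 of 1500 diseased
z = two_proportion_z(74, 1500, 129, 1500)
z_crit = NormalDist().inv_cdf(0.05)           # lower-tail critical value, about -1.645
p_lower = NormalDist().cdf(z)                 # one-sided p-value for Ha: pMA - pCA < 0
reject = z < z_crit

print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The statistic agrees with the slide value of about -4.00, which lies well below the critical value.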
Chi-Square (χ²) Test for k Proportions
1. Tests equality (=) of proportions only. Example: p1 = .2, p2 = .3, p3 = .5
2. One variable with several levels
3. Assumptions
   - Multinomial experiment
   - Large sample size (all expected counts ≥ 5)
4. Uses a one-way contingency table
Multinomial Experiment
1. n identical trials
2. k outcomes to each trial
3. Constant outcome probabilities, p1, ..., pk
4. Independent trials
5. Random variable is count, n1, ..., nk
6. Example: In health services research, ask 100 workers which of 3 health insurance plans (k = 3) they prefer

One-Way Contingency Table
1. Shows # observations in k independent groups (outcomes or variable levels)

                        Health insurance plan (outcomes, k = 3)
                        HMO    PPO    Other    Total
Number of responses     35     20     45       100
χ² Test for k Proportions: Hypotheses & Statistic
1. Hypotheses
   H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0
   Ha: At least one pi differs from its hypothesized value pi,0
2. Test Statistic
   \[ \chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)}, \qquad E(n_i) = n\,p_{i,0} \]
   where n_i is the observed count, E(n_i) is the expected count, p_{i,0} is the hypothesized probability, and k is the number of outcomes.
3. Degrees of Freedom: k - 1
χ² Test Basic Idea
1. Compares the observed count to the expected count under the null hypothesis
2. The closer the observed count is to the expected count, the more consistent the data are with H0
   - Closeness is measured by the squared difference relative to the expected count
   - Reject H0 for large values
Finding Critical Value Example
What is the critical χ² value if k = 3 and α = .05?
df = k - 1 = 2

χ² Table (Portion)
        Upper Tail Area
DF      .995     ...    .95      ...    .05
1       ...      ...    0.004    ...    3.841
2       0.010    ...    0.103    ...    5.991

Critical value: χ² = 5.991 (reject H0 if χ² > 5.991; upper-tail area α = .05).
If ni = E(ni) in every cell, χ² = 0: do not reject H0.
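The table entries for 1 and 2 degrees of freedom can be reproduced without a table lookup. The sketch below relies on two standard facts: a χ² variable with 2 df has upper-tail area exp(-x/2), and a χ² variable with 1 df is the square of a standard normal variable. The function names are illustrative only.

```python
import math
from statistics import NormalDist

def chi2_critical_df2(upper_tail_area):
    # For 2 df, P(chi2 > x) = exp(-x / 2), so x = -2 * ln(area)
    return -2.0 * math.log(upper_tail_area)

def chi2_critical_df1(upper_tail_area):
    # For 1 df, chi2 = Z^2, so the upper-tail area is split across both normal tails
    z = NormalDist().inv_cdf(1 - upper_tail_area / 2)
    return z ** 2

for area in (0.995, 0.95, 0.05):
    print(area, round(chi2_critical_df1(area), 3), round(chi2_critical_df2(area), 3))
```

For other degrees of freedom these closed forms do not apply, and a table or a statistics package is needed.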
χ² Test for k Proportions: Example
As an MD epidemiologist, you want to test the difference in prevention measures for patients with a history (Hx) of cardiovascular disease (CVD). Of 180 patients, 63 were following strict prevention measures regularly (taking beta-blockers, aspirin, etc.), 45 were taking some prevention measures but not on a regular basis, and 72 were not following any prevention measures. At the .05 level, is there a difference in prevention measures?
χ² Test for k Proportions: Solution
H0: p1 = p2 = p3 = 1/3
Ha: At least 1 is different
α = .05
n1 = 63, n2 = 45, n3 = 72
Critical Value(s): χ² = 5.991 with df = 2 (reject H0 if χ² > 5.991)
χ² Test for k Proportions: Solution (computation)
\[ E(n_i) = n\,p_{i,0}, \qquad E(n_1) = E(n_2) = E(n_3) = 180 \cdot \tfrac{1}{3} = 60 \]
\[ \chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{[n_1 - 60]^2}{60} + \frac{[n_2 - 60]^2}{60} + \frac{[n_3 - 60]^2}{60} \]
\[ \chi^2 = \frac{[63 - 60]^2}{60} + \frac{[45 - 60]^2}{60} + \frac{[72 - 60]^2}{60} = 6.3 \]
χ² Test for k Proportions: Solution (summary)
Test Statistic: χ² = 6.3
Decision: Reject H0 at α = .05 (χ² = 6.3 > 5.991)
Conclusion: There is evidence of a difference in proportions
χ² Test for k Proportions: SAS Codes

/* One row per prevention category: category code and observed count */
data CVD;
  input method count;
  datalines;
1 63
2 45
3 72
;
run;

/* testp= lists the hypothesized proportions; 1/3 is entered as 0.333 here */
proc freq data=CVD order=data;
  weight count;
  tables method / nocum testp=(0.333 0.333 0.333);
run;
χ² Test for k Proportions: SAS Output

The FREQ Procedure

                                  Test
method    Frequency    Percent    Percent
-----------------------------------------
1         63           35.00      33.30
2         45           25.00      33.30
3         72           40.00      33.30

Chi-Square Test for Specified Proportions
-----------------------------------------
Chi-Square      6.3065
DF              2
Pr > ChiSq      0.0427

Sample Size = 180
R program for chisq.test

chisq.test(c(63,45,72), p=c(1/3,1/3,1/3))

        Chi-squared test for given probabilities

data:  c(63, 45, 72)
X-squared = 6.3, df = 2, p-value = 0.04285
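For readers without SAS or R, the same goodness-of-fit statistic can be sketched in plain Python. The helper name `chi2_gof` is an assumption for illustration. The p-value line uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math

def chi2_gof(observed, probs):
    """Chi-square statistic and df for hypothesized cell probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

observed = [63, 45, 72]

# Exact thirds, as in the R call
stat, df = chi2_gof(observed, [1 / 3, 1 / 3, 1 / 3])
p_value = math.exp(-stat / 2)            # upper-tail area, valid for df = 2 only

# Proportions rounded to 0.333, as in the SAS testp= option
stat_rounded, _ = chi2_gof(observed, [0.333, 0.333, 0.333])

print(round(stat, 4), df, round(p_value, 5), round(stat_rounded, 4))
```

The small gap between the SAS value (6.3065) and the R value (6.3) comes from entering 1/3 as 0.333, which makes each expected count 59.94 instead of 60.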
Relation to the Z test for comparison of two proportions?
- When k = 2, only two groups are compared, and the Z test and the χ² test for 2 proportions are equivalent tests (p1 = p2 = 1/2).
- Notice that when k = 2, the χ² test for 2 proportions has 1 df.
- In distribution, χ² with 1 df equals Z², the square of a standard normal variable.
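The identity χ² = Z² for k = 2 can be checked numerically. The sketch below uses hypothetical counts (63 and 45, chosen only for illustration) and the hypothesized value p1 = p2 = 1/2 from the slide.

```python
import math

# Hypothetical counts in two categories (illustrative, not from the slides)
n1, n2 = 63, 45
n = n1 + n2
p0 = 0.5                                   # hypothesized probability of category 1

# One-sample Z statistic for the proportion in category 1
z = (n1 - n * p0) / math.sqrt(n * p0 * (1 - p0))

# Chi-square goodness-of-fit statistic with k = 2 cells (1 df)
e1, e2 = n * p0, n * (1 - p0)
chi2 = (n1 - e1) ** 2 / e1 + (n2 - e2) ** 2 / e2

print(round(z ** 2, 6), round(chi2, 6))    # the two values coincide
```

Because the statistics are algebraically identical, the two-sided Z test and the χ² test always reach the same decision in this setting.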
χ² Test of Homogeneity
1. Shows if a relationship exists between 2 qualitative variables, but does not show causality (no specification of dependency)
2. Assumptions
   - Multinomial experiment
   - All expected counts ≥ 5
3. Uses a two-way contingency table

χ² Test of Homogeneity: Contingency Table
1. Shows # observations from 1 sample jointly in 2 qualitative variables (rows: levels of variable 1; columns: levels of variable 2)

                     Residence
Disease Status    Urban    Rural    Total
Disease           63       49       112
No disease        15       33       48
Total             78       82       160
χ² Test of Homogeneity: Hypotheses & Statistic
1. Hypotheses
   H0: Homogeneity or equal proportions
   Ha: Heterogeneity or unequal proportions
2. Test Statistic
   \[ \chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})} \]
   where n_{ij} is the observed count and E(n_{ij}) is the expected count in row i, column j.
   Degrees of Freedom: (r - 1)(c - 1), where r is the number of rows and c is the number of columns
χ² Test of Homogeneity: Expected Counts
1. Statistical independence means the joint probability equals the product of the marginal probabilities
2. Compute the marginal probabilities and multiply them for the joint probability
3. The expected count is the sample size times the joint probability
Expected Count Example

                     Residence
Disease Status    Urban (Obs.)    Rural (Obs.)    Total
Disease           63              49              112
No Disease        15              33              48
Total             78              82              160

Marginal probability (Disease) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160) · (78/160)
Expected count = 160 · (112/160) · (78/160) = 54.6
Expected Count Calculation
Expected count = (Row total × Column total) / Sample size

                     Residence
                  Urban             Rural
Disease Status    Obs.    Exp.      Obs.    Exp.      Total
Disease           63      54.6      49      57.4      112
No Disease        15      23.4      33      24.6      48
Total             78      78        82      82        160

Expected counts by cell:
Disease, Urban: 112×78/160 = 54.6
Disease, Rural: 112×82/160 = 57.4
No Disease, Urban: 48×78/160 = 23.4
No Disease, Rural: 48×82/160 = 24.6
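The row-total-times-column-total rule is easy to express in code. This is a small sketch in plain Python; `expected_counts` is an illustrative name, and the table is the residence by disease status example.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Disease, No Disease; columns: Urban, Rural
observed = [[63, 49],
            [15, 33]]
expected = expected_counts(observed)

for row in expected:
    print([round(e, 1) for e in row])
```

The function works for any r × c table, since it only uses the marginal totals.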
χ² Test of Homogeneity: Example on HIV
You randomly sample 286 sexually active individuals and collect information on their HIV status and history of STDs. At the .05 level, is there evidence of a relationship?

                HIV
STDs Hx     No      Yes     Total
No          84      32      116
Yes         48      122     170
Total       132     154     286
χ² Test of Homogeneity: Solution
H0: No relationship
Ha: Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s): χ² = 3.841 (reject H0 if χ² > 3.841)
χ² Test of Homogeneity: Solution (expected counts)

                HIV
             No                Yes
STDs Hx      Obs.    Exp.      Obs.    Exp.      Total
No           84      53.5      32      62.5      116
Yes          48      78.5      122     91.5      170
Total        132     132       154     154       286

Expected counts by cell:
STDs No, HIV No: 116×132/286 = 53.5
STDs No, HIV Yes: 116×154/286 = 62.5
STDs Yes, HIV No: 170×132/286 = 78.5
STDs Yes, HIV Yes: 170×154/286 = 91.5

E(nij) ≥ 5 in all cells
χ² Test of Homogeneity: Solution (computation)
\[ \chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})} = \frac{[n_{11} - E(n_{11})]^2}{E(n_{11})} + \frac{[n_{12} - E(n_{12})]^2}{E(n_{12})} + \cdots + \frac{[n_{22} - E(n_{22})]^2}{E(n_{22})} \]
\[ \chi^2 = \frac{[84 - 53.5]^2}{53.5} + \frac{[32 - 62.5]^2}{62.5} + \cdots + \frac{[122 - 91.5]^2}{91.5} = 54.29 \]
χ² Test of Homogeneity: Solution (summary)
Test Statistic: χ² = 54.29
Decision: Reject H0 at α = .05 (χ² = 54.29 > 3.841)
Conclusion: There is evidence of a relationship
χ² Test of Homogeneity: SAS CODES

/* One row per cell: STDs (1 = no, 2 = yes), HIV (1 = no, 2 = yes), count */
data dis;
  input STDs HIV count;
  cards;
1 1 84
1 2 32
2 1 48
2 2 122
;
run;

/* chisq requests the chi-square tests for the two-way table */
proc freq data=dis order=data;
  weight count;
  tables STDs*HIV / chisq;
run;
χ² Test of Homogeneity: SAS OUTPUT

Statistics for Table of STDs by HIV

Statistic                        DF    Value      Prob
-------------------------------------------------------
Chi-Square                       1     54.1502    <.0001
Likelihood Ratio Chi-Square      1     55.7826    <.0001
Continuity Adj. Chi-Square       1     52.3871    <.0001
Mantel-Haenszel Chi-Square       1     53.9609    <.0001
Phi Coefficient                        0.4351
Contingency Coefficient                0.3990
Cramer's V                             0.4351
Continuity Correction
\[ \chi^2_{\text{adj}} = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i} \]
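Both the ordinary and the continuity-adjusted statistics for the HIV by STDs table can be reproduced with a short Python sketch. The function name `chi2_2x2` is illustrative. The p-value uses the fact that for 1 df the upper-tail area of χ² equals erfc(sqrt(x/2)).

```python
import math

def chi2_2x2(table, correction=False):
    """Pearson chi-square for a 2x2 table, optionally with the 0.5 continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n                       # expected count under H0
            d = abs(table[i][j] - e)
            if correction:
                d = max(d - 0.5, 0.0)           # continuity correction
            stat += d * d / e
    p_value = math.erfc(math.sqrt(stat / 2))    # upper tail, 1 df
    return stat, p_value

# Rows: STDs Hx (No, Yes); columns: HIV (No, Yes)
hiv = [[84, 32],
       [48, 122]]
stat, p = chi2_2x2(hiv)
stat_adj, p_adj = chi2_2x2(hiv, correction=True)
print(round(stat, 4), round(stat_adj, 4))
```

With unrounded expected counts the statistic is about 54.15, matching the SAS output; the hand calculation gives 54.29 because the expected counts were rounded to one decimal place first.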
Fisher's Exact Test
Fisher's Exact Test is a test for independence in a 2 × 2 table. It is most useful when the total sample size and the expected values are small. The test holds the marginal totals fixed and computes the hypergeometric probability that n11 is at least as large as the observed value.
Fisher's Exact Test Example
Is HIV infection related to Hx of STDs in Sub-Saharan African countries? Test at the 5% level.

                 HIV Infection
Hx of STDs    yes    no    total
yes           3      7     10
no            5      10    15
total         8      17    25
Fisher's Exact Test SAS Codes

/* Character values are case-sensitive in SAS, so the levels are spelled consistently */
data dis;
  input STDs $ HIV $ count;
  cards;
no no 10
no yes 5
yes no 7
yes yes 3
;
run;

/* For a 2x2 table, chisq also prints Fisher's exact test */
proc freq data=dis order=data;
  weight count;
  tables STDs*HIV / chisq;
run;
Fisher's Exact Test SAS Output

Statistics for Table of STDs by HIV

Statistic                        DF    Value      Prob
-------------------------------------------------------
Chi-Square                       1     0.0306     0.8611
Likelihood Ratio Chi-Square      1     0.0308     0.8608
Continuity Adj. Chi-Square       1     0.0000     1.0000
Mantel-Haenszel Chi-Square       1     0.0294     0.8638
Phi Coefficient                        -0.0350
Contingency Coefficient                0.0350
Cramer's V                             -0.0350

WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.6069
Right-sided Pr >= F         0.7263
Table Probability (P)       0.3332
Two-sided Pr <= P           1.0000
Fisher's Exact Test
The output consists of three p-values and one table probability.
- Left: Use this when the alternative to independence is that there is negative association between the variables. That is, the observations tend to lie in the lower left and upper right.
- Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in the upper left and lower right.
- 2-Tail: Use this when there is no prior alternative.
- Table probability: the probability for this table to occur. It is not a tail probability and not a p-value.
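The four numbers in the Fisher output can be reproduced directly from the hypergeometric distribution. The following is a minimal sketch with the standard library; the function name `fisher_exact_2x2` is an assumption, and the table is entered in the same cell order as the SAS data (cell (1,1) = 10).

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Left, right, table and two-sided probabilities for the table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d                  # fixed row totals
    c1 = a + c                             # fixed first-column total
    n = r1 + r2
    denom = comb(n, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible values of cell (1,1)
    probs = {x: comb(r1, x) * comb(r2, c1 - x) / denom for x in range(lo, hi + 1)}
    p_table = probs[a]
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    # Two-sided: all tables no more probable than the observed one (small tolerance for ties)
    two_sided = sum(p for p in probs.values() if p <= p_table * (1 + 1e-9))
    return left, right, p_table, two_sided

left, right, p_table, two_sided = fisher_exact_2x2(10, 5, 7, 3)
print(round(left, 4), round(right, 4), round(p_table, 4), round(two_sided, 4))
```

Here the observed table is the most probable one given the margins, which is why the two-sided p-value is 1.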
McNemar's Test for Correlated (Dependent) Proportions
Basis / Rationale for the Test
- The approximate test previously presented for assessing a difference in proportions is based upon the assumption that the two samples are independent.
- Suppose, however, that we are faced with a situation where this is not true. Suppose we randomly select 100 people, and find that 20% of them have flu. Then, imagine that we apply some type of treatment to all sampled people; and on a post-test, we find that 20% have flu.
- We might be tempted to suppose that no hypothesis test is required under these conditions, in that the 'Before' and 'After' probabilities are identical, and would surely result in a test statistic value of 0.00.
- The problem with this thinking, however, is that the two sample p values are dependent, in that each person was assessed twice. It is possible that the 20 people that had flu originally still had flu. It is also possible that the 20 people that had flu on the second test were a completely different set of 20 people!
It is for precisely this type of situation that McNemar's Test for Correlated (Dependent) Proportions is applicable.
McNemar's Test employs two unique features for testing the two proportions:
* a special fourfold contingency table; with a
* special-purpose chi-square (χ²) test statistic (the approximate test).
Sample Problem
A randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company that specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session, and retake the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.
Sample Problem (summary data)
The summary data for this study appear as follows:

Number of Students    Status Before Session    Status After Session
4                     Fail                     Fail
4                     Pass                     Fail
56                    Fail                     Pass
56                    Pass                     Pass
Nomenclature for the Fourfold (2 x 2) Contingency Table

A          B          (A + B)
C          D          (C + D)
(A + C)    (B + D)    n

where (A+B) + (C+D) = (A+C) + (B+D) = n = number of units evaluated, and where df = 1
Underlying Assumptions of the Test
1. Construct a 2x2 table where the paired observations are the sampling units.
2. Each observation must represent a single joint event possibility; that is, classifiable in only one cell of the contingency table.
3. In its exact form, this test may be conducted as a one-sample binomial test for the B & C cells.
4. The expected frequency (fe) for the B and C cells on the contingency table must be equal to or greater than 5, where fe = (B + C) / 2 from the fourfold table.
The data are then entered into the Fourfold Contingency table:

                Before
After       Pass    Fail    Total
Pass        56      56      112
Fail        4       4       8
Total       60      60      120
Step I: State the Null & Research Hypotheses
H0: π_B = π_C
H1: π_B ≠ π_C
where π_B and π_C relate to the proportion of observations reflecting changes in status (the B & C cells in the table)
Step II: α = 0.05
Step III: State the Associated Test Statistic
\[ \chi^2 = \frac{(|B - C| - 1)^2}{B + C} \]
where the "- 1" in the numerator is the continuity correction.
Step IV: State the Distribution of the Test Statistic When H0 Is True
The test statistic follows a χ² distribution with 1 df when H0 is true.
Step V: Reject H0 if the test statistic χ² > 3.84
[Figure: chi-square curve with 1 df; rejection region to the right of χ² = 3.84]
Step VI: Calculate the Value of the Test Statistic
\[ \chi^2 = \frac{(|56 - 4| - 1)^2}{56 + 4} = 43.35 \]
McNemar's Test for Correlated (Dependent) Proportions: SAS Codes

/* One row per cell: status before, status after, count (as in the summary data) */
data test;
  input Before $ After $ count;
  cards;
pass pass 56
pass fail 4
fail pass 56
fail fail 4
;
run;

/* agree requests McNemar's test for the paired 2x2 table */
proc freq data=test order=data;
  weight count;
  tables Before*After / agree;
run;
McNemar's Test for Correlated (Dependent) Proportions: SAS Output

Statistics for Table of Before by After

McNemar's Test
------------------------
Statistic (S)    45.0667
DF                     1
Pr > S            <.0001

Sample Size = 120

Note: the SAS statistic is computed without the continuity correction.
R program for McNemar's Test

> mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=T)

        McNemar's Chi-squared test with continuity correction

data:  matrix(c(56, 56, 4, 4), nrow = 2, byr = T)
McNemar's chi-squared = 43.35, df = 1, p-value = 4.577e-11

> mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=F)

        McNemar's Chi-squared test

data:  matrix(c(56, 56, 4, 4), nrow = 2, byr = T)
McNemar's chi-squared = 45.0667, df = 1, p-value = 1.904e-11
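The corrected and uncorrected McNemar statistics depend only on the two discordant cells, so they are simple to verify by hand or in code. The sketch below is plain Python; the function name `mcnemar` is illustrative, and the p-value uses erfc(sqrt(x/2)), the upper-tail area of χ² with 1 df.

```python
import math

def mcnemar(b, c, correction=True):
    """McNemar chi-square from the discordant counts b and c, with 1 df."""
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)                # continuity correction
    stat = diff ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))   # upper tail, 1 df
    return stat, p_value

# Discordant pairs: 56 students went from fail to pass, 4 from pass to fail
stat_corr, p_corr = mcnemar(56, 4, correction=True)
stat_plain, p_plain = mcnemar(56, 4, correction=False)
print(round(stat_corr, 2), p_corr)
print(round(stat_plain, 4), p_plain)
```

The corrected value matches the hand calculation and the first R call; the uncorrected value matches the SAS output and the second R call.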