discrete multivariate analysis
![Page 1: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/1.jpg)
Discrete Multivariate Analysis
Analysis of Multivariate Categorical Data
![Page 2: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/2.jpg)
References
1. Fienberg, S. (1980), Analysis of Cross-Classified Data, MIT Press, Cambridge, Mass.
2. Fingleton, B. (1984), Models of Category Counts, Cambridge University Press, Cambridge.
3. Agresti, A. (1990), Categorical Data Analysis, Wiley, New York.
![Page 3: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/3.jpg)
Example 1
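Data Set #1: a two-way frequency table of serum cholesterol (rows) by systolic blood pressure (columns)

| Serum cholesterol | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 117 | 121 | 47 | 22 | 307 |
| 200-219 | 85 | 98 | 43 | 20 | 246 |
| 220-259 | 119 | 209 | 68 | 43 | 439 |
| 260+ | 67 | 99 | 46 | 33 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |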
In this study, n = 1237 individuals were examined, measuring X (systolic blood pressure) and Y (serum cholesterol).
![Page 4: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/4.jpg)
Example 2
The following data were taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).
![Page 5: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/5.jpg)
The study involved a dichotomous response Y:
– Success (no major parole violation), or
– Failure (returned to prison either as a technical violator or with a new conviction),
based on a one-year follow-up.

The predictors of parole success included are:
1. Type of committed offence (Person offense or Other offense),
2. Age (25 or older or Under 25),
3. Prior record (No prior sentence or Prior sentence), and
4. Drug or alcohol dependency (No drug or alcohol dependency or Drug and/or alcohol dependency).
![Page 6: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/6.jpg)
• The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses.
• The second part of the data was set aside for a validation study of the model to be fitted in the first part.
![Page 7: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/7.jpg)
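Table: parole outcome by prior record, dependency, age and type of offence. Counts for the second part of the data are in parentheses. ND = no drug or alcohol dependency; D = drug and/or alcohol dependency; 25+ = 25 or older; U25 = under 25; Person / Other = person offense / other offense.

| Prior record | Outcome | ND, 25+, Person | ND, 25+, Other | ND, U25, Person | ND, U25, Other | D, 25+, Person | D, 25+, Other | D, U25, Person | D, U25, Other |
|---|---|---|---|---|---|---|---|---|---|
| No prior sentence of any kind | Success | 48 (44) | 34 (34) | 37 (29) | 49 (58) | 48 (47) | 28 (38) | 35 (37) | 57 (53) |
| No prior sentence of any kind | Failure | 1 (1) | 5 (7) | 7 (7) | 11 (5) | 3 (1) | 8 (2) | 5 (4) | 18 (24) |
| Prior sentence | Success | 117 (111) | 259 (253) | 131 (131) | 319 (320) | 197 (202) | 435 (392) | 107 (103) | 291 (294) |
| Prior sentence | Failure | 23 (27) | 61 (55) | 20 (25) | 89 (93) | 38 (46) | 194 (215) | 27 (34) | 101 (102) |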
![Page 8: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/8.jpg)
Multiway Frequency Tables
• Two-way: a table cross-classified by two variables, A and B (diagram)
![Page 9: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/9.jpg)
• Three-way: a table cross-classified by three variables, A, B and C (diagrams)
![Page 11: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/11.jpg)
• Four-way: a table cross-classified by four variables, A, B, C and D (diagram)
![Page 12: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/12.jpg)
Analysis of a Two-way Frequency Table:
![Page 13: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/13.jpg)
Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)

| Serum cholesterol | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 117 | 121 | 47 | 22 | 307 |
| 200-219 | 85 | 98 | 43 | 20 | 246 |
| 220-259 | 119 | 209 | 68 | 43 | 439 |
| 260+ | 67 | 99 | 46 | 33 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |
![Page 14: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/14.jpg)
Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure), in percent

| Serum cholesterol | <127 | 127-146 | 147-166 | 167+ | Marginal distn (Serum Chol.) |
|---|---|---|---|---|---|
| <200 | 9.46 | 9.78 | 3.80 | 1.78 | 24.82 |
| 200-219 | 6.87 | 7.92 | 3.48 | 1.62 | 19.89 |
| 220-259 | 9.62 | 16.90 | 5.50 | 3.48 | 35.49 |
| 260+ | 5.42 | 8.00 | 3.72 | 2.67 | 19.81 |
| Marginal distn (BP) | 31.37 | 42.60 | 16.49 | 9.54 | 100.00 |
The Marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.
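As a minimal sketch of how the joint and marginal percentages can be obtained from the frequency table, the following plain-Python snippet divides each count and each row or column total by the grand total. The variable names (`counts`, `joint`, `row_marginal`, `col_marginal`) are illustrative choices, not part of the original slides.

```python
# Observed counts: serum cholesterol (rows) by systolic blood pressure (columns).
counts = [
    [117, 121, 47, 22],   # cholesterol <200
    [85, 98, 43, 20],     # cholesterol 200-219
    [119, 209, 68, 43],   # cholesterol 220-259
    [67, 99, 46, 33],     # cholesterol 260+
]

n = sum(sum(row) for row in counts)  # grand total N

# Joint distribution: every cell as a percentage of N.
joint = [[100.0 * x / n for x in row] for row in counts]

# Marginal distributions: row totals and column totals as percentages of N.
row_marginal = [100.0 * sum(row) / n for row in counts]
col_marginal = [100.0 * sum(col) / n for col in zip(*counts)]

print(n)
print([round(p, 2) for p in row_marginal])
print([round(p, 2) for p in col_marginal])
```

The rounded marginals can be compared against the last column and last row of the table above.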
![Page 15: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/15.jpg)
Conditional Distributions (Systolic Blood Pressure given Serum Cholesterol), in percent

The conditional distribution allows you to look at the effect of one variable when the other variable is held fixed or known.

| Serum cholesterol | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 38.11 | 39.41 | 15.31 | 7.17 | 100.00 |
| 200-219 | 34.55 | 39.84 | 17.48 | 8.13 | 100.00 |
| 220-259 | 27.11 | 47.61 | 15.49 | 9.79 | 100.00 |
| 260+ | 27.35 | 40.41 | 18.78 | 13.47 | 100.00 |
| Marginal distn (BP) | 31.37 | 42.60 | 16.49 | 9.54 | 100.00 |
![Page 16: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/16.jpg)
Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure), in percent

| Serum cholesterol | <127 | 127-146 | 147-166 | 167+ | Marginal distn (Serum Chol.) |
|---|---|---|---|---|---|
| <200 | 30.15 | 22.96 | 23.04 | 18.64 | 24.82 |
| 200-219 | 21.91 | 18.60 | 21.08 | 16.95 | 19.89 |
| 220-259 | 30.67 | 39.66 | 33.33 | 36.44 | 35.49 |
| 260+ | 17.27 | 18.79 | 22.55 | 27.97 | 19.81 |
| Total | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
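The two sets of conditional distributions differ only in which total each count is divided by. A short sketch, assuming the same count layout as the frequency table (cholesterol in rows, blood pressure in columns); the function names are hypothetical:

```python
counts = [
    [117, 121, 47, 22],   # cholesterol <200
    [85, 98, 43, 20],     # cholesterol 200-219
    [119, 209, 68, 43],   # cholesterol 220-259
    [67, 99, 46, 33],     # cholesterol 260+
]

def row_conditional(table):
    """Percent distribution of the column variable within each row."""
    return [[100.0 * x / sum(row) for x in row] for row in table]

def col_conditional(table):
    """Percent distribution of the row variable within each column."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * x / t for x, t in zip(row, col_totals)] for row in table]

bp_given_chol = row_conditional(counts)   # blood pressure given cholesterol
chol_given_bp = col_conditional(counts)   # cholesterol given blood pressure

for row in bp_given_chol:
    print([round(p, 2) for p in row])
```

Each row of `bp_given_chol` sums to 100, as does each column of `chol_given_bp`.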
![Page 17: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/17.jpg)
GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol (one line for each serum cholesterol group, <200, 200-219, 220-259 and 260+, plus the marginal distribution; horizontal axis: systolic blood pressure, <127 to 167+; vertical axis: percent, 10% to 50%)
![Page 18: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/18.jpg)
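Notation:
Let $x_{ij}$ denote the frequency (number of cases) where X (row variable) is i and Y (column variable) is j.

$$x_{i\bullet} = R_i = \sum_{j=1}^{c} x_{ij}$$

$$x_{\bullet j} = C_j = \sum_{i=1}^{r} x_{ij}$$

$$x_{\bullet\bullet} = N = \sum_{i=1}^{r}\sum_{j=1}^{c} x_{ij} = \sum_{i=1}^{r} x_{i\bullet} = \sum_{j=1}^{c} x_{\bullet j}$$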
![Page 19: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/19.jpg)
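Different Models

The Multinomial Model: Here the total number of cases N is fixed and the $x_{ij}$ follow a multinomial distribution with parameters $\pi_{ij} = P[X = i, Y = j]$.

$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \binom{N}{x_{11}\; x_{12} \cdots x_{rc}} \pi_{11}^{x_{11}} \pi_{12}^{x_{12}} \cdots \pi_{rc}^{x_{rc}} = \frac{N!}{x_{11}!\, x_{12}! \cdots x_{rc}!}\, \pi_{11}^{x_{11}} \pi_{12}^{x_{12}} \cdots \pi_{rc}^{x_{rc}}$$

$$\mu_{ij} = E(x_{ij}) = N\pi_{ij}$$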
![Page 20: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/20.jpg)
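The Product Multinomial Model: Here the row (or column) totals $R_i$ are fixed and, for a given row i, the $x_{ij}$ follow a multinomial distribution with parameters $\pi_{j|i}$.

$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \prod_{i=1}^{r} \binom{R_i}{x_{i1}\; x_{i2} \cdots x_{ic}} \pi_{1|i}^{x_{i1}} \pi_{2|i}^{x_{i2}} \cdots \pi_{c|i}^{x_{ic}}$$

$$\mu_{ij} = E(x_{ij}) = R_i\,\pi_{j|i}$$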
![Page 21: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/21.jpg)
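The Poisson Model: In this case we observe over a fixed period of time and all counts in the table (including row, column and overall totals) follow a Poisson distribution. Let $\mu_{ij}$ denote the mean of $x_{ij}$.

$$f(x_{ij}) = \frac{\mu_{ij}^{x_{ij}}}{x_{ij}!}\, e^{-\mu_{ij}}, \qquad \mu_{ij} = E(x_{ij})$$

$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \prod_{i=1}^{r}\prod_{j=1}^{c} \frac{\mu_{ij}^{x_{ij}}}{x_{ij}!}\, e^{-\mu_{ij}}$$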
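The multinomial and Poisson models are closely linked: conditioning independent Poisson counts on their total gives a multinomial distribution with $\pi_{ij} = \mu_{ij} / \sum \mu_{ij}$. This is a standard result rather than something stated on the slides; the sketch below checks it numerically for one vector of counts. The means in `mu` are made-up illustrative values, and the function names are hypothetical.

```python
import math

def log_multinomial(x, pi):
    """Log of the multinomial probability of counts x with cell probabilities pi."""
    n = sum(x)
    return (math.lgamma(n + 1)
            - sum(math.lgamma(k + 1) for k in x)
            + sum(k * math.log(p) for k, p in zip(x, pi) if k > 0))

def log_poisson(x, mu):
    """Log of the joint probability of independent Poisson counts x with means mu."""
    return sum(k * math.log(m) - m - math.lgamma(k + 1) for k, m in zip(x, mu))

x = [117, 121, 47, 22]            # observed counts (first row of the cholesterol table)
mu = [100.0, 130.0, 50.0, 30.0]   # illustrative Poisson means (an assumption)

total_mu = sum(mu)
pi = [m / total_mu for m in mu]

# Poisson probability of the cells, conditional on the observed total.
conditional = log_poisson(x, mu) - log_poisson([sum(x)], [total_mu])

print(conditional, log_multinomial(x, pi))  # the two values agree up to rounding
```

This link is one way to see why the three sampling models lead to the same estimated expected frequencies under independence.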
![Page 22: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/22.jpg)
Independence
![Page 23: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/23.jpg)
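Multinomial Model

If X and Y are independent, then

$$\pi_{ij} = P[X = i, Y = j] = P[X = i]\,P[Y = j] = \pi_{i\bullet}\,\pi_{\bullet j}$$

and

$$\mu_{ij} = N\pi_{ij} = N\,\pi_{i\bullet}\,\pi_{\bullet j}$$

The estimated expected frequency in cell (i,j) in the case of independence is:

$$m_{ij} = \hat{\mu}_{ij} = N\,\hat{\pi}_{i\bullet}\,\hat{\pi}_{\bullet j} = N\,\frac{x_{i\bullet}}{N}\,\frac{x_{\bullet j}}{N} = \frac{x_{i\bullet}\,x_{\bullet j}}{N} = \frac{R_i C_j}{N}$$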
![Page 24: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/24.jpg)
The same can be shown for the other two models, the Product Multinomial model and the Poisson model; namely, the estimated expected frequency in cell (i,j) in the case of independence is:

$$m_{ij} = \hat{\mu}_{ij} = \frac{R_i C_j}{N} = \frac{x_{i\bullet}\,x_{\bullet j}}{x_{\bullet\bullet}}$$

Standardized residuals are defined for each cell:

$$r_{ij} = \frac{x_{ij} - m_{ij}}{\sqrt{m_{ij}}}$$
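A minimal sketch of these two formulas, applied to the serum cholesterol by blood pressure counts from the earlier example. The names `expected` and `residuals` are illustrative.

```python
import math

counts = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

row_totals = [sum(row) for row in counts]          # R_i
col_totals = [sum(col) for col in zip(*counts)]    # C_j
n = sum(row_totals)                                # N

# Expected frequency under independence: m_ij = R_i * C_j / N.
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

# Standardized residual: r_ij = (x_ij - m_ij) / sqrt(m_ij).
residuals = [
    [(x - m) / math.sqrt(m) for x, m in zip(x_row, m_row)]
    for x_row, m_row in zip(counts, expected)
]

for m_row, r_row in zip(expected, residuals):
    print([round(m, 2) for m in m_row], [round(r, 2) for r in r_row])
```

Large positive residuals mark cells with more cases than independence would predict; large negative residuals mark cells with fewer.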
![Page 25: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/25.jpg)
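The Chi-Square Statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - m_{ij})^2}{m_{ij}}$$

The Chi-Square test for independence

Reject H0: independence if

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - m_{ij})^2}{m_{ij}} > \chi^2_{\alpha/2}, \qquad df = (r-1)(c-1)$$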
![Page 26: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/26.jpg)
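Table: Expected frequencies, observed frequencies (in parentheses) and standardized residuals

| Serum cholesterol | | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|---|
| <200 | Expected | 96.29 | 130.79 | 50.63 | 29.29 | 307 |
| | Observed | (117) | (121) | (47) | (22) | |
| | Std. residual | 2.11 | -0.86 | -0.51 | -1.35 | |
| 200-219 | Expected | 77.16 | 104.80 | 40.57 | 23.47 | 246 |
| | Observed | (85) | (98) | (43) | (20) | |
| | Std. residual | 0.89 | -0.66 | 0.38 | -0.72 | |
| 220-259 | Expected | 137.70 | 187.03 | 72.40 | 41.88 | 439 |
| | Observed | (119) | (209) | (68) | (43) | |
| | Std. residual | -1.59 | 1.61 | -0.52 | 0.17 | |
| 260+ | Expected | 76.85 | 104.38 | 40.40 | 23.37 | 245 |
| | Observed | (67) | (99) | (46) | (33) | |
| | Std. residual | -1.12 | -0.53 | 0.88 | 1.99 | |
| Total | | 388 | 527 | 204 | 118 | 1237 |

$\chi^2$ = 20.85 (p = 0.0133)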
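The chi-square statistic reported for this table can be recomputed directly from the counts. The sketch below uses only the formula for the expected frequencies under independence and the definition of the statistic; the p-value is not computed here, because that would need a chi-square distribution function from a statistics library.

```python
counts = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells.
chi2 = 0.0
for i, row in enumerate(counts):
    for j, x in enumerate(row):
        m = row_totals[i] * col_totals[j] / n
        chi2 += (x - m) ** 2 / m

df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1)

print(round(chi2, 2), df)
```

The statistic is then referred to a chi-square distribution with `df` degrees of freedom.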
![Page 27: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/27.jpg)
Example
In this example, N = 57,407 cases in which individuals were victimized twice by crimes were studied.
The crime of the first victimization (X) and the crime of the second victimization (Y) were noted.
The data are tabulated on the following slide.
![Page 28: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/28.jpg)
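Table 1: Frequencies (rows: first victimization in pair; columns: second victimization in pair)

| First \ Second | Ra | A | Ro | PP/PS | PL | B | HL | MV | Total |
|---|---|---|---|---|---|---|---|---|---|
| Ra | 26 | 50 | 11 | 6 | 82 | 39 | 48 | 11 | 273 |
| A | 65 | 2997 | 238 | 85 | 2553 | 1083 | 1349 | 216 | 8586 |
| Ro | 12 | 279 | 197 | 36 | 459 | 197 | 221 | 47 | 1448 |
| PP/PS | 3 | 102 | 40 | 61 | 243 | 115 | 101 | 38 | 703 |
| PL | 75 | 2628 | 413 | 229 | 12137 | 2658 | 3689 | 687 | 22516 |
| B | 52 | 1117 | 191 | 102 | 2649 | 3210 | 1973 | 301 | 9595 |
| HL | 42 | 1251 | 206 | 117 | 3757 | 1962 | 4646 | 391 | 12372 |
| MV | 3 | 221 | 51 | 24 | 678 | 301 | 367 | 269 | 1914 |
| Total | 278 | 8645 | 1347 | 660 | 22558 | 9565 | 12394 | 1960 | 57407 |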
![Page 29: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/29.jpg)
Table 2: Expected Frequencies (assuming independence)

| First \ Second | Ra | A | Ro | PP/PS | PL | B | HL | MV | Total |
|---|---|---|---|---|---|---|---|---|---|
| Ra | 1.32 | 41.11 | 6.41 | 3.14 | 107.27 | 45.49 | 58.94 | 9.32 | 273 |
| A | 41.58 | 1292.98 | 201.46 | 98.71 | 3373.86 | 1430.58 | 1853.69 | 293.14 | 8586 |
| Ro | 7.01 | 218.06 | 33.98 | 16.65 | 568.99 | 241.26 | 312.62 | 49.44 | 1448 |
| PP/PS | 3.40 | 105.87 | 16.50 | 8.08 | 276.24 | 117.13 | 151.78 | 24.00 | 703 |
| PL | 109.04 | 3390.72 | 528.32 | 258.86 | 8847.63 | 3751.56 | 4861.14 | 768.75 | 22516 |
| B | 46.46 | 1444.92 | 225.14 | 110.31 | 3770.34 | 1598.69 | 2071.53 | 327.59 | 9595 |
| HL | 59.91 | 1863.12 | 290.30 | 142.24 | 4861.56 | 2061.39 | 2671.08 | 422.41 | 12372 |
| MV | 9.27 | 288.23 | 44.91 | 22.00 | 752.10 | 318.91 | 413.23 | 65.35 | 1914 |
| Total | 278 | 8645 | 1347 | 660 | 22558 | 9565 | 12394 | 1960 | 57407 |
![Page 30: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/30.jpg)
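Table 3: Standardized residuals (rows: first victimization in pair; columns: second victimization in pair)

| First \ Second | Ra | A | Ro | PP/PS | PL | B | HL | MV |
|---|---|---|---|---|---|---|---|---|
| Ra | 21.5 | 1.4 | 1.8 | 1.6 | -2.4 | -1.0 | -1.9 | 0.6 |
| A | 3.6 | 47.4 | 2.6 | -1.4 | -14.1 | -9.2 | -11.7 | -4.5 |
| Ro | 1.9 | 4.1 | 28.0 | 4.7 | -4.6 | -2.8 | -5.2 | -0.3 |
| PP/PS | -0.2 | -0.4 | 5.8 | 18.6 | -2.0 | -0.2 | -4.1 | 2.9 |
| PL | -3.3 | -13.1 | -5.0 | -1.9 | 35.0 | -17.9 | -16.8 | -2.9 |
| B | 0.8 | -8.6 | -2.3 | -0.8 | -18.3 | 40.3 | -2.2 | -1.5 |
| HL | -2.3 | -14.2 | -4.9 | -2.1 | -15.8 | -2.2 | 38.2 | -1.5 |
| MV | -2.1 | -4.0 | 0.9 | 0.4 | -2.7 | -1.0 | -2.3 | 25.2 |

$\chi^2$ = 11,430 (highly significant)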
![Page 31: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/31.jpg)
Table 4: Conditional distribution of second victimization given the first victimization (%)

| First \ Second | Ra | A | Ro | PP/PS | PL | B | HL | MV | Total |
|---|---|---|---|---|---|---|---|---|---|
| Ra | 9.5 | 18.3 | 4.0 | 2.2 | 30.0 | 14.3 | 17.6 | 4.0 | 100.0 |
| A | 0.8 | 34.9 | 2.8 | 1.0 | 29.7 | 12.6 | 15.7 | 2.5 | 100.0 |
| Ro | 0.8 | 19.3 | 13.6 | 2.5 | 31.7 | 13.6 | 15.3 | 3.2 | 100.0 |
| PP/PS | 0.4 | 14.5 | 5.7 | 8.7 | 34.6 | 16.4 | 14.4 | 5.4 | 100.0 |
| PL | 0.3 | 11.7 | 1.8 | 1.0 | 53.9 | 11.8 | 16.4 | 3.1 | 100.0 |
| B | 0.5 | 11.6 | 2.0 | 1.1 | 27.6 | 33.5 | 20.6 | 3.1 | 100.0 |
| HL | 0.3 | 10.1 | 1.7 | 0.9 | 30.4 | 15.9 | 37.6 | 3.2 | 100.0 |
| MV | 0.2 | 11.5 | 2.7 | 1.3 | 35.4 | 15.7 | 19.2 | 14.1 | 100.0 |
| Marginal | 0.5 | 15.1 | 2.3 | 1.1 | 39.3 | 16.7 | 21.6 | 3.4 | 100.0 |
![Page 32: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/32.jpg)
Log Linear Model
![Page 33: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/33.jpg)
Recall: if the two variables, rows (X) and columns (Y), are independent then

$$\mu_{ij} = N\pi_{ij} = N\,\pi_{i\bullet}\,\pi_{\bullet j}$$

and

$$\ln \mu_{ij} = \ln N + \ln \pi_{i\bullet} + \ln \pi_{\bullet j}$$
![Page 34: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/34.jpg)
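In general let

$$u = \frac{1}{rc}\sum_{i}\sum_{j} \ln \mu_{ij}$$

$$u_{1(i)} = \frac{1}{c}\sum_{j} \ln \mu_{ij} - u$$

$$u_{2(j)} = \frac{1}{r}\sum_{i} \ln \mu_{ij} - u$$

$$u_{12(i,j)} = \ln \mu_{ij} - u - u_{1(i)} - u_{2(j)}$$

then

$$\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)} \tag{1}$$

where

$$\sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = \sum_{i} u_{12(i,j)} = \sum_{j} u_{12(i,j)} = 0$$

Equation (1) is called the log-linear model for the frequencies $x_{ij}$.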
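The u-terms are simply averages and deviations of the log expected frequencies. As a sketch, the function below computes them for any table of positive values; here the observed counts of the cholesterol by blood pressure table stand in for the $\mu_{ij}$ (an assumption that corresponds to the saturated model). The function name `u_terms` is hypothetical.

```python
import math

def u_terms(mu):
    """Decompose ln(mu_ij) into u, u1(i), u2(j) and u12(i,j)."""
    r, c = len(mu), len(mu[0])
    log_mu = [[math.log(m) for m in row] for row in mu]
    u = sum(sum(row) for row in log_mu) / (r * c)                       # grand mean
    u1 = [sum(row) / c - u for row in log_mu]                           # row effects
    u2 = [sum(log_mu[i][j] for i in range(r)) / r - u for j in range(c)]  # column effects
    u12 = [[log_mu[i][j] - u - u1[i] - u2[j] for j in range(c)] for i in range(r)]
    return u, u1, u2, u12

counts = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

u, u1, u2, u12 = u_terms(counts)
print(round(u, 4))
print([round(v, 4) for v in u1])
print([round(v, 4) for v in u2])
```

By construction the main effects sum to zero, and the interaction terms sum to zero over each row and each column.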
![Page 35: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/35.jpg)
Note: X and Y are independent if

$$u_{12(i,j)} = 0 \quad \text{for all } i, j$$

In this case the log-linear model becomes

$$\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)}$$
![Page 36: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/36.jpg)
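Comment: The log-linear model for a two-way frequency table,

$$\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)},$$

is similar to the model for a two-factor experiment,

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} = \mu_{ij} + \varepsilon_{ijk},$$

where

$$\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} = \text{the mean of } y \text{ when } A = i \text{ and } B = j.$$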
![Page 37: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/37.jpg)
Three-way Frequency Tables
![Page 38: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/38.jpg)
Example
Data from the Framingham Longitudinal Study of Coronary Heart Disease (Cornfield [1962])

Variables
1. Systolic Blood Pressure (X)
   – <127, 127-146, 147-166, 167+
2. Serum Cholesterol
   – <200, 200-219, 220-259, 260+
3. Heart Disease
   – Present, Absent

The data are tabulated on the next slide.
![Page 39: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/39.jpg)
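Three-way Frequency Table

| Coronary heart disease | Serum cholesterol (mg/100 cc) | BP <127 | BP 127-146 | BP 147-166 | BP 167+ |
|---|---|---|---|---|---|
| Present | <200 | 2 | 3 | 3 | 4 |
| Present | 200-219 | 3 | 2 | 0 | 3 |
| Present | 220-259 | 8 | 11 | 6 | 6 |
| Present | 260+ | 7 | 12 | 11 | 11 |
| Absent | <200 | 117 | 121 | 47 | 22 |
| Absent | 200-219 | 85 | 98 | 43 | 20 |
| Absent | 220-259 | 119 | 209 | 68 | 43 |
| Absent | 260+ | 67 | 99 | 46 | 33 |

BP = systolic blood pressure (mm Hg).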
![Page 40: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/40.jpg)
Log-Linear model for three-way tables

Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general

$$\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

where

$$0 = \sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = \sum_{k} u_{3(k)} = \sum_{i} u_{12(i,j)} = \sum_{j} u_{12(i,j)}$$

$$\phantom{0} = \sum_{i} u_{13(i,k)} = \sum_{k} u_{13(i,k)} = \sum_{j} u_{23(j,k)} = \sum_{k} u_{23(j,k)}$$

$$\phantom{0} = \sum_{i} u_{123(i,j,k)} = \sum_{j} u_{123(i,j,k)} = \sum_{k} u_{123(i,j,k)}$$
![Page 41: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/41.jpg)
Hierarchical Log-linear models for categorical Data
For three way tables
The hierarchical principle: If an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction.
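The principle can be checked mechanically. In the sketch below a model is described by the list of u-terms it contains, each term written as a string of variable labels (for example `"12"` for $u_{12}$); this encoding and the function name `is_hierarchical` are illustrative assumptions, not notation from the slides.

```python
from itertools import combinations

def is_hierarchical(terms):
    """Return True if every lower-order relative of each term is also in the model."""
    present = {frozenset(term) for term in terms}
    for term in present:
        for size in range(1, len(term)):
            for sub in combinations(term, size):
                if frozenset(sub) not in present:
                    return False
    return True

# Main effects plus the 1-2 interaction: satisfies the principle.
print(is_hierarchical(["1", "2", "3", "12"]))

# u12 is present but the main effect u2 is missing: violates the principle.
print(is_hierarchical(["1", "3", "12"]))
```

The first call prints `True` and the second prints `False`.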
![Page 42: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/42.jpg)
1. Model (all main effects model): ln μijk = u + u1(i) + u2(j) + u3(k)
i.e. u12(i,j) = u13(i,k) = u23(j,k) = u123(i,j,k) = 0.
Notation: [1][2][3]
Description: Mutual independence between all three variables.
![Page 43: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/43.jpg)
2. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j)
i.e. u13(i,k) = u23(j,k) = u123(i,j,k) = 0.
Notation: [12][3]
Description: Independence of Variable 3 with variables 1 and 2.
![Page 44: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/44.jpg)
3. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u13(i,k)
i.e. u12(i,j) = u23(j,k) = u123(i,j,k) = 0.
Notation: [13][2]
Description: Independence of Variable 2 with variables 1 and 3.
![Page 45: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/45.jpg)
4. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u23(j,k)
i.e. u12(i,j) = u13(i,k) = u123(i,j,k) = 0.
Notation: [23][1]
Description: Independence of Variable 1 with variables 2 and 3.
![Page 46: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/46.jpg)
5. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k)
i.e. u23(j,k) = u123(i,j,k) = 0.
Notation: [12][13]
Description: Conditional independence between variables 2 and 3 given variable 1.
![Page 47: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/47.jpg)
6. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u23(j,k)
i.e. u13(i,k) = u123(i,j,k) = 0.
Notation: [12][23]
Description: Conditional independence between variables 1 and 3 given variable 2.
![Page 48: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/48.jpg)
7. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u13(i,k) + u23(j,k)
i.e. u12(i,j) = u123(i,j,k) = 0.
Notation: [13][23]
Description: Conditional independence between variables 1 and 2 given variable 3.
![Page 49: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/49.jpg)
8. Model: ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k)
i.e. u123(i,j,k) = 0.
Notation: [12][13][23]
Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
![Page 50: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/50.jpg)
9. Model (the saturated model): ln μijk = u + u1(i) + u2(j) + u3(k) + u12(i,j) + u13(i,k) + u23(j,k) + u123(i,j,k)
Notation: [123]
Description: No simplifying dependence structure.
![Page 51: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/51.jpg)
Hierarchical Log-linear models for 3-way table

| Model | Description |
|---|---|
| [1][2][3] | Mutual independence between all three variables. |
| [1][23] | Independence of Variable 1 with variables 2 and 3. |
| [2][13] | Independence of Variable 2 with variables 1 and 3. |
| [3][12] | Independence of Variable 3 with variables 1 and 2. |
| [12][13] | Conditional independence between variables 2 and 3 given variable 1. |
| [12][23] | Conditional independence between variables 1 and 3 given variable 2. |
| [13][23] | Conditional independence between variables 1 and 2 given variable 3. |
| [12][13][23] | Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable. |
| [123] | The saturated model |
![Page 52: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/52.jpg)
Maximum Likelihood Estimation
Log-Linear Model
![Page 53: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/53.jpg)
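For any model it is possible to determine the maximum likelihood estimators of the parameters.

Example: two-way table, independence, multinomial model

$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \binom{N}{x_{11}\; x_{12} \cdots x_{rc}} \pi_{11}^{x_{11}} \pi_{12}^{x_{12}} \cdots \pi_{rc}^{x_{rc}} = \frac{N!}{x_{11}!\, x_{12}! \cdots x_{rc}!} \left(\frac{\mu_{11}}{N}\right)^{x_{11}} \left(\frac{\mu_{12}}{N}\right)^{x_{12}} \cdots \left(\frac{\mu_{rc}}{N}\right)^{x_{rc}}$$

$$\mu_{ij} = E(x_{ij}) = N\pi_{ij} \quad \text{or} \quad \pi_{ij} = \frac{\mu_{ij}}{N}$$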
![Page 54: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/54.jpg)
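Log-likelihood

$$l(\mu_{11}, \mu_{12}, \ldots, \mu_{rc}) = \ln N! - \sum_{i}\sum_{j} \ln x_{ij}! + \sum_{i}\sum_{j} x_{ij} \ln \mu_{ij} - \ln N \sum_{i}\sum_{j} x_{ij} = K + \sum_{i}\sum_{j} x_{ij} \ln \mu_{ij}$$

where

$$K = \ln N! - \sum_{i}\sum_{j} \ln x_{ij}! - N \ln N$$

With the model of independence

$$\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)}$$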
![Page 55: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/55.jpg)
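and

$$l(u, u_{1(1)}, \ldots, u_{1(r)}, u_{2(1)}, \ldots, u_{2(c)}) = K + \sum_{i}\sum_{j} x_{ij}\left(u + u_{1(i)} + u_{2(j)}\right) = K + Nu + \sum_{i} x_{i\bullet}\, u_{1(i)} + \sum_{j} x_{\bullet j}\, u_{2(j)}$$

with

$$\sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = 0$$

also

$$\sum_{i}\sum_{j} \mu_{ij} = \sum_{i}\sum_{j} e^{u + u_{1(i)} + u_{2(j)}} = e^{u} \sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = N$$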
![Page 56: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/56.jpg)
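Let

$$g(u, u_{1(1)}, \ldots, u_{1(r)}, u_{2(1)}, \ldots, u_{2(c)}, \lambda_1, \lambda_2, \lambda) = K + Nu + \sum_{i} x_{i\bullet}\, u_{1(i)} + \sum_{j} x_{\bullet j}\, u_{2(j)} + \lambda_1 \sum_{i} u_{1(i)} + \lambda_2 \sum_{j} u_{2(j)} + \lambda\left(e^{u} \sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} - N\right)$$

Now

$$\frac{\partial g}{\partial u} = N + \lambda\, e^{u} \sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = N(1 + \lambda) = 0$$

so that

$$\lambda = -1$$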
![Page 57: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/57.jpg)
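$$\frac{\partial g}{\partial u_{1(i)}} = x_{i\bullet} + \lambda_1 + \lambda\, e^{u} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = x_{i\bullet} + \lambda_1 - N\,\frac{e^{u_{1(i)}}}{\sum_{i'} e^{u_{1(i')}}} = 0$$

using $\lambda = -1$ and $e^{u} \sum_{j} e^{u_{2(j)}} = N / \sum_{i'} e^{u_{1(i')}}$. Hence

$$\frac{e^{u_{1(i)}}}{\sum_{i'} e^{u_{1(i')}}} = \frac{x_{i\bullet} + \lambda_1}{N} = \frac{x_{i\bullet}}{N} + \frac{\lambda_1}{N}$$

Since

$$\sum_{i} \left(\frac{x_{i\bullet}}{N} + \frac{\lambda_1}{N}\right) = 1 + \frac{r\lambda_1}{N} = 1 \quad \text{and} \quad \lambda_1 = 0$$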
![Page 58: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/58.jpg)
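Now

$$e^{u_{1(i)}} = x_{i\bullet}\, K_1$$

where $K_1 = \frac{1}{N}\sum_{i'} e^{u_{1(i')}}$ does not depend on i, or

$$u_{1(i)} = \ln x_{i\bullet} + \ln K_1$$

$$\sum_{i} u_{1(i)} = \sum_{i} \ln x_{i\bullet} + r \ln K_1 = 0$$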
![Page 59: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/59.jpg)
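Hence

$$\ln K_1 = -\frac{1}{r}\sum_{i} \ln x_{i\bullet}$$

and

$$u_{1(i)} = \ln x_{i\bullet} - \frac{1}{r}\sum_{i} \ln x_{i\bullet}$$

Similarly

$$u_{2(j)} = \ln x_{\bullet j} - \frac{1}{c}\sum_{j} \ln x_{\bullet j}$$

Finally

$$\sum_{i}\sum_{j} \mu_{ij} = \sum_{i}\sum_{j} e^{u + u_{1(i)} + u_{2(j)}} = e^{u} \sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = N$$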
![Page 60: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/60.jpg)
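Hence

$$e^{u} = \frac{N}{\sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}}}$$

Now

$$e^{u_{1(i)}} = \frac{x_{i\bullet}}{\left(\prod_{i=1}^{r} x_{i\bullet}\right)^{1/r}} \quad \text{and} \quad e^{u_{2(j)}} = \frac{x_{\bullet j}}{\left(\prod_{j=1}^{c} x_{\bullet j}\right)^{1/c}}$$

so that

$$e^{u} = \frac{N \left(\prod_{i=1}^{r} x_{i\bullet}\right)^{1/r} \left(\prod_{j=1}^{c} x_{\bullet j}\right)^{1/c}}{\sum_{i} x_{i\bullet} \sum_{j} x_{\bullet j}} = \frac{1}{N} \left(\prod_{i=1}^{r} x_{i\bullet}\right)^{1/r} \left(\prod_{j=1}^{c} x_{\bullet j}\right)^{1/c}$$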
![Page 61: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/61.jpg)
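Hence

$$u = \frac{1}{r}\sum_{i} \ln x_{i\bullet} + \frac{1}{c}\sum_{j} \ln x_{\bullet j} - \ln N$$

Note

$$\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)} = \left[\frac{1}{r}\sum_{i} \ln x_{i\bullet} + \frac{1}{c}\sum_{j} \ln x_{\bullet j} - \ln N\right] + \left[\ln x_{i\bullet} - \frac{1}{r}\sum_{i} \ln x_{i\bullet}\right] + \left[\ln x_{\bullet j} - \frac{1}{c}\sum_{j} \ln x_{\bullet j}\right]$$

$$= \ln x_{i\bullet} + \ln x_{\bullet j} - \ln N$$

or

$$\mu_{ij} = \frac{x_{i\bullet}\, x_{\bullet j}}{N}$$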
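The end result of the derivation can be checked numerically: plugging the closed-form estimates of u, u1(i) and u2(j) back into the log-linear model should reproduce the familiar expected frequencies, row total times column total divided by N. A minimal sketch using the cholesterol by blood pressure counts (variable names are illustrative):

```python
import math

counts = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

R = [sum(row) for row in counts]          # row totals x_i.
C = [sum(col) for col in zip(*counts)]    # column totals x_.j
N = sum(R)
r, c = len(R), len(C)

mean_log_R = sum(math.log(v) for v in R) / r
mean_log_C = sum(math.log(v) for v in C) / c

# Closed-form maximum likelihood estimates under independence.
u = mean_log_R + mean_log_C - math.log(N)
u1 = [math.log(v) - mean_log_R for v in R]
u2 = [math.log(v) - mean_log_C for v in C]

# Fitted expected frequencies from the log-linear model.
mu = [[math.exp(u + a + b) for b in u2] for a in u1]

print(round(mu[0][0], 2), round(R[0] * C[0] / N, 2))
```

Both printed values are the expected frequency for the first cell, and the u1 and u2 estimates each sum to zero as the constraints require.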
![Page 62: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/62.jpg)
Comments
• Maximum likelihood estimates can be computed for any hierarchical log-linear model (including models for more than 2 variables).
• In certain situations the equations need to be solved numerically.
• For the saturated model (all interactions and main effects), the estimate of μijk… is xijk….
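One common numerical approach for models without closed-form estimates, such as [12][13][23], is iterative proportional fitting: start from a table of ones and repeatedly rescale it so that each fitted two-way margin matches the observed one. The slides do not name a specific algorithm, so this is offered only as an illustrative sketch. It uses the Framingham counts that appear in this document, arranged as `x[h][c][b]` (heart disease, cholesterol, blood pressure); the function name and the fixed number of sweeps are assumptions.

```python
# Framingham counts x[h][c][b]: h = heart disease (0 present, 1 absent),
# c = serum cholesterol group, b = systolic blood pressure group.
x = [
    [[2, 3, 3, 4], [3, 2, 0, 3], [8, 11, 6, 6], [7, 12, 11, 11]],
    [[117, 121, 47, 22], [85, 98, 43, 20], [119, 209, 68, 43], [67, 99, 46, 33]],
]

def fit_all_two_factor(x, sweeps=100):
    """Fit the model with all two-factor interactions by iterative proportional fitting."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    for _ in range(sweeps):
        # Rescale to match the observed (i, j) margin.
        for i in range(I):
            for j in range(J):
                scale = sum(x[i][j]) / sum(m[i][j])
                m[i][j] = [v * scale for v in m[i][j]]
        # Rescale to match the observed (i, k) margin.
        for i in range(I):
            for k in range(K):
                scale = (sum(x[i][j][k] for j in range(J))
                         / sum(m[i][j][k] for j in range(J)))
                for j in range(J):
                    m[i][j][k] *= scale
        # Rescale to match the observed (j, k) margin.
        for j in range(J):
            for k in range(K):
                scale = (sum(x[i][j][k] for i in range(I))
                         / sum(m[i][j][k] for i in range(I)))
                for i in range(I):
                    m[i][j][k] *= scale
    return m

fitted = fit_all_two_factor(x)
print([round(v, 2) for v in fitted[0][1]])  # fitted values for one row with a zero count
```

A production implementation would stop when the fitted margins change by less than a tolerance rather than after a fixed number of sweeps.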
![Page 66: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/66.jpg)
Log Linear Model
![Page 67: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/67.jpg)
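Two-way table

$$\ln \mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$$

where

$$\sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = \sum_{i} u_{12(i,j)} = \sum_{j} u_{12(i,j)} = 0$$

The multiplicative form:

$$\mu_{ij} = e^{u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}} = e^{u}\, e^{u_{1(i)}}\, e^{u_{2(j)}}\, e^{u_{12(i,j)}} = \theta\, \theta_{1(i)}\, \theta_{2(j)}\, \theta_{12(i,j)}$$

where $\theta = e^{u}$, $\theta_{1(i)} = e^{u_{1(i)}}$, $\theta_{2(j)} = e^{u_{2(j)}}$ and $\theta_{12(i,j)} = e^{u_{12(i,j)}}$.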
![Page 69: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/69.jpg)
Log-Linear model for three-way tables

Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general

$$\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

or, in the multiplicative form,

$$\mu_{ijk} = e^{u}\, e^{u_{1(i)}}\, e^{u_{2(j)}}\, e^{u_{3(k)}}\, e^{u_{12(i,j)}}\, e^{u_{13(i,k)}}\, e^{u_{23(j,k)}}\, e^{u_{123(i,j,k)}} = \theta\, \theta_{1(i)}\, \theta_{2(j)}\, \theta_{3(k)}\, \theta_{12(i,j)}\, \theta_{13(i,k)}\, \theta_{23(j,k)}\, \theta_{123(i,j,k)}$$

where each $\theta$ is the exponential of the corresponding u-term.
![Page 70: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/70.jpg)
Comments
• The log-linear model is similar to the ANOVA models for factorial experiments.
• The ANOVA models are used to understand the effects of categorical independent variables (factors) on a continuous dependent variable (Y).
• The log-linear model is used to understand dependence amongst categorical variables.
• The presence of an interaction indicates dependence between the variables present in that interaction.
![Page 82: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/82.jpg)
Goodness of Fit Statistics
These statistics can be used to check if a log-linear model will fit the
observed frequency table
![Page 83: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/83.jpg)
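Goodness of Fit Statistics
The Chi-squared statistic:
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \sum_{ijk} \frac{(x_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}}$$
The Likelihood Ratio statistic:
$$G^2 = 2\sum \text{Observed}\,\ln\!\left(\frac{\text{Observed}}{\text{Expected}}\right) = 2\sum_{ijk} x_{ijk}\,\ln\!\left(\frac{x_{ijk}}{\hat{\mu}_{ijk}}\right)$$
d.f. = # cells - # parameters fitted
We reject the model if $\chi^2$ or $G^2$ is greater than $\chi^2_{\alpha/2}$.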
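The d.f. rule above can be sketched in code. This is a minimal illustration, not part of the original slides: the function names `implied_terms` and `degrees_of_freedom` are hypothetical, variables are assumed to have single-character names, and a 4 × 4 × 2 table is assumed for the demonstration. Each u-term over a set of variables contributes the product of (levels - 1) independent parameters.

```python
from itertools import combinations
from math import prod


def implied_terms(generators):
    """Return every u-term implied by the generators of a hierarchical model.

    A generator such as "13" implies the constant, u1, u3 and u13.
    """
    terms = {frozenset()}  # the constant term u
    for gen in generators:
        for size in range(1, len(gen) + 1):
            for subset in combinations(gen, size):
                terms.add(frozenset(subset))
    return terms


def degrees_of_freedom(levels, generators):
    """d.f. = number of cells - number of independent parameters fitted."""
    n_cells = prod(levels.values())
    n_params = sum(
        prod(levels[v] - 1 for v in term)  # empty product is 1 (the constant)
        for term in implied_terms(generators)
    )
    return n_cells - n_params


# Assumed table size for illustration: variables 1 and 2 have 4 levels, variable 3 has 2.
levels = {"1": 4, "2": 4, "3": 2}
print(degrees_of_freedom(levels, ["1", "2", "3"]))     # 24
print(degrees_of_freedom(levels, ["13", "23"]))        # 18
print(degrees_of_freedom(levels, ["12", "13", "23"]))  # 9
print(degrees_of_freedom(levels, ["123"]))             # 0
```

The saturated model uses one parameter per cell, so it always has 0 d.f.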
![Page 84: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/84.jpg)
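Example: Variables
1. Systolic Blood Pressure (B)
2. Serum Cholesterol (C)
3. Coronary Heart Disease (H)

Columns are systolic blood pressure (mm Hg).

| Coronary Heart Disease | Serum Cholesterol (mg/100 cc) | <127 | 127-146 | 147-166 | 167+ |
|---|---|---|---|---|---|
| Present | <200 | 2 | 3 | 3 | 4 |
| Present | 200-219 | 3 | 2 | 0 | 3 |
| Present | 220-259 | 8 | 11 | 6 | 6 |
| Present | 260+ | 7 | 12 | 11 | 11 |
| Absent | <200 | 117 | 121 | 47 | 22 |
| Absent | 200-219 | 85 | 98 | 43 | 20 |
| Absent | 220-259 | 119 | 209 | 68 | 43 |
| Absent | 260+ | 67 | 99 | 46 | 33 |
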
![Page 85: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/85.jpg)
Goodness of fit testing of Models

| MODEL | DF | LIKELIHOOD-RATIO CHISQ | PROB. | PEARSON CHISQ | PROB. | |
|---|---|---|---|---|---|---|
| B,C,H. | 24 | 83.15 | 0.0000 | 102.00 | 0.0000 | |
| B,CH. | 21 | 51.23 | 0.0002 | 56.89 | 0.0000 | |
| C,BH. | 21 | 59.59 | 0.0000 | 60.43 | 0.0000 | |
| H,BC. | 15 | 58.73 | 0.0000 | 64.78 | 0.0000 | |
| BC,BH. | 12 | 35.16 | 0.0004 | 33.76 | 0.0007 | |
| BH,CH. | 18 | 27.67 | 0.0673 | 26.58 | 0.0872 | n.s. |
| CH,BC. | 12 | 26.80 | 0.0082 | 33.18 | 0.0009 | |
| BC,BH,CH. | 9 | 8.08 | 0.5265 | 6.56 | 0.6824 | n.s. |

Possible Models:
1. [BH][CH]: B and C independent given H.
2. [BC][BH][CH]: all two factor interaction model
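As an illustration of how a fit statistic for [BH][CH] can be obtained, here is a minimal sketch. It assumes the standard closed-form fitted values for a conditional independence model, $\hat{\mu} = x_{b+h}\,x_{+ch}/x_{++h}$, and treats a zero count as contributing 0 to $G^2$. The function names are hypothetical, and the slides do not say how the software that produced the table above handled these details, so the result need not reproduce the BH,CH row exactly.

```python
from math import log

# counts[h][c][b]: rows are cholesterol levels, columns are blood-pressure levels
counts = {
    "Present": [[2, 3, 3, 4], [3, 2, 0, 3], [8, 11, 6, 6], [7, 12, 11, 11]],
    "Absent": [[117, 121, 47, 22], [85, 98, 43, 20],
               [119, 209, 68, 43], [67, 99, 46, 33]],
}


def fit_bh_ch(counts):
    """Fitted values under [BH][CH]: within each H level, B and C are independent."""
    fitted = {}
    for h, table in counts.items():
        total = sum(map(sum, table))
        chol_tot = [sum(row) for row in table]       # C margin within this H level
        bp_tot = [sum(col) for col in zip(*table)]   # B margin within this H level
        fitted[h] = [[r * c / total for c in bp_tot] for r in chol_tot]
    return fitted


def fit_statistics(counts, fitted):
    """Return (G2, Pearson chi-square) comparing observed and fitted counts."""
    g2 = 0.0
    x2 = 0.0
    for h in counts:
        for obs_row, exp_row in zip(counts[h], fitted[h]):
            for obs, exp in zip(obs_row, exp_row):
                x2 += (obs - exp) ** 2 / exp
                if obs > 0:  # a zero cell adds nothing to G2
                    g2 += 2 * obs * log(obs / exp)
    return g2, x2


fitted = fit_bh_ch(counts)
g2, x2 = fit_statistics(counts, fitted)
print(f"G2 = {g2:.2f}, Pearson chi-square = {x2:.2f}, d.f. = 18")
```

The d.f. of 18 is 32 cells minus the 14 parameters of [BH][CH] (1 + 1 + 3 + 3 + 3 + 3).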
![Page 86: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/86.jpg)
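Model 1: [BH][CH] Log-linear parameters
Heart disease - Blood Pressure Interaction, $u_{HB(i,j)}$

| Hd \ Bp | <127 | 127-146 | 147-166 | 167+ |
|---|---|---|---|---|
| Pres | -0.256 | -0.241 | 0.066 | 0.431 |
| Abs | 0.256 | 0.241 | -0.066 | -0.431 |

Standardized values, $z = \dfrac{u_{HB(i,j)}}{\sigma_{u_{HB(i,j)}}}$

| Hd \ Bp | <127 | 127-146 | 147-166 | 167+ |
|---|---|---|---|---|
| Pres | -2.607 | -2.733 | 0.660 | 4.461 |
| Abs | 2.607 | 2.733 | -0.660 | -4.461 |
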
![Page 87: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/87.jpg)
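Multiplicative effect
$\tau_{HB(i,j)} = \exp\left(u_{HB(i,j)}\right) = e^{u_{HB(i,j)}}$

| Hd \ Bp | <127 | 127-146 | 147-166 | 167+ |
|---|---|---|---|---|
| Pres | 0.774 | 0.786 | 1.068 | 1.538 |
| Abs | 1.291 | 1.272 | 0.936 | 0.65 |

Log-Linear Model
$$\ln \mu_{ijk} = u + u_{H(i)} + u_{B(j)} + u_{C(k)} + u_{HB(i,j)} + u_{HC(i,k)}$$
$$\mu_{ijk} = e^{u}\, e^{u_{H(i)}}\, e^{u_{B(j)}}\, e^{u_{C(k)}}\, e^{u_{HB(i,j)}}\, e^{u_{HC(i,k)}} = \tau\,\tau_{H(i)}\,\tau_{B(j)}\,\tau_{C(k)}\,\tau_{HB(i,j)}\,\tau_{HC(i,k)}$$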
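The multiplicative effects are just the exponentials of the log-linear parameters. A short sketch using the Heart disease - Blood Pressure parameters of Model 1 (the variable names are illustrative). Because the printed u values are rounded to three decimals, the last digit can differ slightly from the slide's table.

```python
from math import exp

# u_HB(i,j) for Model 1, as printed on the slide (rounded to 3 decimals)
u_hb = {
    "Pres": [-0.256, -0.241, 0.066, 0.431],
    "Abs": [0.256, 0.241, -0.066, -0.431],
}

# tau_HB(i,j) = exp(u_HB(i,j))
tau_hb = {hd: [exp(u) for u in row] for hd, row in u_hb.items()}

for hd, row in tau_hb.items():
    print(hd, [round(t, 3) for t in row])
```

A value above 1 means the cell's expected frequency is inflated relative to what the main effects alone would give; a value below 1 means it is deflated.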
![Page 88: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/88.jpg)
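Heart Disease - Cholesterol Interaction, $u_{HC(i,k)}$

| Hd \ Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| Pres | -0.233 | -0.325 | 0.063 | 0.494 |
| Abs | 0.233 | 0.325 | -0.063 | -0.494 |

Standardized values, $z = \dfrac{u_{HC(i,k)}}{\sigma_{u_{HC(i,k)}}}$

| Hd \ Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| Pres | -1.889 | -2.268 | 0.677 | 5.558 |
| Abs | 1.889 | 2.268 | -0.677 | -5.558 |
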
![Page 89: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/89.jpg)
Multiplicative effect
$\tau_{HC(i,k)} = \exp\left(u_{HC(i,k)}\right) = e^{u_{HC(i,k)}}$

| Hd \ Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| Pres | 0.792 | 0.723 | 1.065 | 1.640 |
| Abs | 1.262 | 1.384 | 0.939 | 0.610 |

![Page 90: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/90.jpg)
Model 2: [BC][BH][CH] Log-linear parameters
Blood pressure-Cholesterol interaction: $u_{BC(j,k)}$

| Bp / Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| <200 | 0.222 | -0.019 | -0.034 | -0.169 |
| 200-219 | 0.114 | -0.041 | 0.013 | -0.086 |
| 220-259 | -0.114 | 0.154 | -0.058 | 0.018 |
| 260+ | -0.221 | -0.094 | 0.079 | 0.237 |

![Page 91: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/91.jpg)
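Standardized values, $z = \dfrac{u_{BC(j,k)}}{\sigma_{u_{BC(j,k)}}}$

| Bp / Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| <200 | 2.68 | -0.236 | -0.326 | -1.291 |
| 200-219 | 1.27 | -0.472 | 0.117 | -0.626 |
| 220-259 | -1.502 | 2.253 | -0.636 | 0.167 |
| 260+ | -2.487 | -1.175 | 0.785 | 2.051 |

Multiplicative effect
$\tau_{BC(j,k)} = \exp\left(u_{BC(j,k)}\right) = e^{u_{BC(j,k)}}$

| Bp / Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| <200 | 1.248 | 0.981 | 0.967 | 0.844 |
| 200-219 | 1.120 | 0.960 | 1.013 | 0.918 |
| 220-259 | 0.892 | 1.166 | 0.944 | 1.018 |
| 260+ | 0.802 | 0.910 | 1.082 | 1.267 |
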
![Page 92: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/92.jpg)
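Heart disease - Blood Pressure Interaction, $u_{HB(i,j)}$

| Hd \ Bp | <127 | 127-146 | 147-166 | 167+ |
|---|---|---|---|---|
| Pres | -0.211 | -0.232 | 0.055 | 0.389 |
| Abs | 0.211 | 0.232 | -0.055 | -0.389 |

Standardized values, $z = \dfrac{u_{HB(i,j)}}{\sigma_{u_{HB(i,j)}}}$

| Hd \ Bp | <127 | 127-146 | 147-166 | 167+ |
|---|---|---|---|---|
| Pres | -2.125 | -2.604 | 0.542 | 3.938 |
| Abs | 2.125 | 2.604 | -0.542 | -3.938 |
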
![Page 93: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/93.jpg)
Multiplicative effect
$\tau_{HB(i,j)} = \exp\left(u_{HB(i,j)}\right) = e^{u_{HB(i,j)}}$

| Hd \ Bp | <127 | 127-146 | 147-166 | 167+ |
|---|---|---|---|---|
| Pres | 0.809 | 0.793 | 1.056 | 1.475 |
| Abs | 1.235 | 1.261 | 0.947 | 0.678 |

![Page 94: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/94.jpg)
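Heart Disease - Cholesterol Interaction, $u_{HC(i,k)}$

| Hd \ Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| Pres | -0.212 | -0.316 | 0.069 | 0.460 |
| Abs | 0.212 | 0.316 | -0.069 | -0.460 |

Standardized values, $z = \dfrac{u_{HC(i,k)}}{\sigma_{u_{HC(i,k)}}}$

| Hd \ Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| Pres | -1.712 | -2.199 | 0.732 | 5.095 |
| Abs | 1.712 | 2.199 | -0.732 | -5.095 |
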
![Page 95: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/95.jpg)
Multiplicative effect
$\tau_{HC(i,k)} = \exp\left(u_{HC(i,k)}\right) = e^{u_{HC(i,k)}}$

| Hd \ Chol | <200 | 200-219 | 220-259 | 260+ |
|---|---|---|---|---|
| Pres | 0.809 | 0.729 | 1.071 | 1.584 |
| Abs | 1.237 | 1.372 | 0.933 | 0.631 |

![Page 96: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/96.jpg)
Another Example
In this study it was determined for N = 4353 males
1. Occupation category
2. Educational Level
3. Academic Aptitude
![Page 97: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/97.jpg)
1. Occupation categories
   a. Self-employed Business
   b. Teacher\Education
   c. Self-employed Professional
   d. Salaried Employed
2. Education levels
   a. Low
   b. Low/Med
   c. Med
   d. High/Med
   e. High
![Page 98: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/98.jpg)
3. Academic Aptitude
   a. Low
   b. Low/Med
   c. High/Med
   d. High
![Page 99: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/99.jpg)
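Table: counts by Occupation, Aptitude (rows) and Education (columns)

| Occupation | Aptitude | Low | LMed | HMed | High | Total |
|---|---|---|---|---|---|---|
| Self-employed, Business | Low | 42 | 55 | 22 | 3 | 122 |
| Self-employed, Business | LMed | 72 | 82 | 60 | 12 | 226 |
| Self-employed, Business | Med | 90 | 106 | 85 | 25 | 306 |
| Self-employed, Business | HMed | 27 | 48 | 47 | 8 | 130 |
| Self-employed, Business | High | 8 | 18 | 19 | 5 | 50 |
| Self-employed, Business | Total | 239 | 309 | 233 | 53 | 834 |
| Teacher | Low | 0 | 0 | 1 | 19 | 20 |
| Teacher | LMed | 0 | 3 | 3 | 60 | 66 |
| Teacher | Med | 1 | 4 | 5 | 86 | 96 |
| Teacher | HMed | 0 | 0 | 2 | 36 | 38 |
| Teacher | High | 0 | 0 | 1 | 14 | 15 |
| Teacher | Total | 1 | 7 | 12 | 215 | 235 |
| Self-employed, Professional | Low | 1 | 2 | 8 | 19 | 30 |
| Self-employed, Professional | LMed | 1 | 2 | 15 | 33 | 51 |
| Self-employed, Professional | Med | 2 | 5 | 25 | 83 | 115 |
| Self-employed, Professional | HMed | 2 | 2 | 10 | 45 | 59 |
| Self-employed, Professional | High | 0 | 0 | 12 | 19 | 31 |
| Self-employed, Professional | Total | 6 | 11 | 70 | 199 | 286 |
| Salaried Employed | Low | 172 | 151 | 107 | 42 | 472 |
| Salaried Employed | LMed | 208 | 198 | 206 | 92 | 704 |
| Salaried Employed | Med | 279 | 271 | 331 | 191 | 1072 |
| Salaried Employed | HMed | 99 | 126 | 179 | 97 | 501 |
| Salaried Employed | High | 36 | 35 | 99 | 79 | 249 |
| Salaried Employed | Total | 794 | 781 | 922 | 501 | 2998 |
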
![Page 100: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/100.jpg)
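Two-way Tables (with $\chi^2$)

Education vs Aptitude ($\chi^2$ = 178.6); rows are Aptitude, columns are Education

| | Low | Lmed | HMed | High | Total |
|---|---|---|---|---|---|
| Low | 215 | 208 | 138 | 83 | 644 |
| Lmed | 281 | 285 | 284 | 197 | 1047 |
| Med | 372 | 386 | 446 | 385 | 1589 |
| HMed | 128 | 176 | 238 | 186 | 728 |
| High | 44 | 53 | 131 | 117 | 345 |
| Total | 1040 | 1108 | 1237 | 968 | 4353 |

Education vs Occupation ($\chi^2$ = 1254.1); rows are Occupation, columns are Education

| | Low | Lmed | HMed | High | Total |
|---|---|---|---|---|---|
| SEB | 239 | 309 | 233 | 53 | 834 |
| SEP | 6 | 11 | 70 | 199 | 286 |
| TCHR | 1 | 7 | 12 | 215 | 235 |
| SEM | 794 | 781 | 922 | 501 | 2998 |
| Total | 1040 | 1108 | 1237 | 968 | 4353 |

Aptitude vs Occupation ($\chi^2$ = 35.8); rows are Aptitude, columns are Occupation

| | SEB | SEP | TCHR | SEM | Total |
|---|---|---|---|---|---|
| Low | 122 | 30 | 20 | 472 | 644 |
| Lmed | 226 | 51 | 66 | 704 | 1047 |
| Med | 306 | 115 | 96 | 1072 | 1589 |
| HMed | 130 | 59 | 38 | 501 | 728 |
| High | 50 | 31 | 15 | 249 | 345 |
| Total | 834 | 286 | 235 | 2998 | 4353 |
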
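The $\chi^2$ values quoted for the two-way tables are Pearson statistics for independence. A minimal sketch for the Aptitude vs Occupation table follows; the function name `pearson_chi_square` is illustrative, and expected counts are computed as (row total × column total) / N.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and d.f. for independence in a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for row, r in zip(table, row_tot):
        for obs, c in zip(row, col_tot):
            expected = r * c / n
            stat += (obs - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


# Rows: Aptitude (Low, Lmed, Med, HMed, High); columns: SEB, SEP, TCHR, SEM
aptitude_by_occupation = [
    [122, 30, 20, 472],
    [226, 51, 66, 704],
    [306, 115, 96, 1072],
    [130, 59, 38, 501],
    [50, 31, 15, 249],
]

stat, df = pearson_chi_square(aptitude_by_occupation)
print(f"chi-square = {stat:.1f} on {df} d.f.")
```

With 5 aptitude rows and 4 occupation columns the test has 12 d.f., so a value near 35.8 is well above the 5% critical value of 21.03.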
![Page 101: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/101.jpg)
• It is common to handle a Multiway table by testing for independence in all two way tables.
• This is similar to looking at all the bivariate correlations
• In this example we learn that:
1. Education is related to Aptitude
2. Education is related to Occupational category
3. Aptitude is related to Occupational category
Can we do better than this?
![Page 102: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/102.jpg)
Fitting various log-linear models
Goodness of fit

| Model | Likelihood Ratio | DF | Sig. | Pearson | DF | Sig. |
|---|---|---|---|---|---|---|
| [Occ][Ed][Apt] | 1356.9702 | 69 | 0.0000 | 1519.802 | 69 | 0.0000 |
| [Occ, Ed] [Apt] | 228.2215 | 60 | 0.0000 | 226.6615 | 60 | 0.0000 |
| [Apt, Ed][Occ] | 1179.6403 | 57 | 0.0000 | 1336.765 | 57 | 0.0000 |
| [Apt, Occ][Ed] | 1319.561 | 57 | 0.0000 | 1424.1488 | 57 | 0.0000 |
| [Occ, Ed] [Occ,Apt] | 190.8123 | 48 | 0.0000 | 184.6386 | 48 | 0.0000 |
| [Apt, Ed] [Occ,Apt] | 1142.2311 | 45 | 0.0000 | 1301.1317 | 45 | 0.0000 |
| [Apt, Ed] [Occ, Ed] | 50.8915 | 48 | 0.3605 | 48.0105 | 48 | 0.4724 |
| [Apt, Ed] [Occ, Ed] [Occ, Apt] | 25.1048 | 36 | 0.9134 | 23.6465 | 36 | 0.9436 |

Simplest model that fits is: [Apt,Ed][Occ,Ed]
This model implies conditional independence between Aptitude and Occupation given Education.
![Page 103: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/103.jpg)
Log-linear Parameters
Aptitude – Education Interaction

| Aptitude \ Education | Low | Low-Med | High-Med | High |
|---|---|---|---|---|
| Low | 0.4602 | 0.3225 | -0.2752 | -0.5075 |
| Low-Med | 0.1857 | 0.0953 | -0.0957 | -0.1853 |
| Med | 0.0399 | -0.0277 | -0.0706 | 0.0584 |
| High-Med | -0.2250 | -0.0111 | 0.1032 | 0.1329 |
| High | -0.4607 | -0.3791 | 0.3383 | 0.5015 |

![Page 104: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/104.jpg)
Aptitude – Education Interaction (Multiplicative)

| Aptitude \ Education | Low | Low-Med | High-Med | High |
|---|---|---|---|---|
| Low | 1.584 | 1.381 | 0.759 | 0.602 |
| Low-Med | 1.204 | 1.100 | 0.909 | 0.831 |
| Med | 1.041 | 0.973 | 0.932 | 1.060 |
| High-Med | 0.799 | 0.989 | 1.109 | 1.142 |
| High | 0.631 | 0.684 | 1.403 | 1.651 |

![Page 105: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/105.jpg)
Occupation – Education Interaction

| Education \ Occupation | SEB | T | SEP | SAL |
|---|---|---|---|---|
| Low | 1.241 | -1.528 | -0.718 | 1.005 |
| LowMed | 0.800 | -0.280 | -0.810 | 0.290 |
| HighMed | -0.050 | -0.309 | 0.472 | -0.112 |
| High | -1.991 | 2.117 | 1.057 | -1.182 |

![Page 106: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/106.jpg)
Occupation – Education Interaction (Multiplicative)

| Education \ Occupation | SEB | T | SEP | SAL |
|---|---|---|---|---|
| Low | 3.460 | 0.217 | 0.488 | 2.731 |
| LowMed | 2.226 | 0.756 | 0.445 | 1.336 |
| HighMed | 0.951 | 0.734 | 1.603 | 0.894 |
| High | 0.137 | 8.303 | 2.877 | 0.307 |

![Page 107: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/107.jpg)
Conditional Test Statistics
![Page 108: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/108.jpg)
• Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1.
• That is, the parameters of Model 2 are a subset of the parameters of Model 1.
• Also assume that Model 1 has been shown to adequately fit the data.
![Page 109: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/109.jpg)
In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is:
$$G^2(2 \mid 1) = G^2(2) - G^2(1) = 2\sum \text{Observed}\,\ln\!\left(\frac{\text{Expected}_1}{\text{Expected}_2}\right)$$
with $df = df_2 - df_1$.
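The last equality follows directly from the definition of the likelihood ratio statistic. A short worked step, writing $O$ for the observed counts and $E_1$, $E_2$ for the expected counts under Model 1 and Model 2 (shorthand introduced here, not on the slides):

```latex
\begin{aligned}
G^2(2) - G^2(1)
  &= 2\sum O \ln\frac{O}{E_2} \;-\; 2\sum O \ln\frac{O}{E_1} \\
  &= 2\sum O \left(\ln O - \ln E_2 - \ln O + \ln E_1\right) \\
  &= 2\sum O \ln\frac{E_1}{E_2}.
\end{aligned}
```

The $\ln O$ terms cancel, so the conditional statistic depends on the data only through how the two sets of fitted values differ.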
![Page 110: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/110.jpg)
Example
Table 1: Cross-Classification of a Sample of 1008 consumers according to:
(1) The Softness of the Laundry Water Used
(2) The Previous Use of Detergent Brand M
(3) The Temperature of the Laundry Water Used
(4) The Preference of Detergent Brand X over Brand M in a Consumer Blind Trial

| Water Softness | Brand Preference | Previous user of M, High Temperature | Previous user of M, Low Temperature | Previous nonuser of M, High Temperature | Previous nonuser of M, Low Temperature |
|---|---|---|---|---|---|
| Soft | X | 19 | 57 | 29 | 63 |
| Soft | M | 29 | 49 | 27 | 53 |
| Medium | X | 23 | 47 | 33 | 66 |
| Medium | M | 47 | 55 | 23 | 50 |
| Hard | X | 24 | 37 | 42 | 68 |
| Hard | M | 43 | 52 | 30 | 42 |

![Page 111: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/111.jpg)
Goodness of Fit test for the all k-factor models

| Model | d.f. | $G^2$ | p-value | |
|---|---|---|---|---|
| [1][2][3][4] | 18 | 42.9 | 0.00083 | $G^2(1)$ |
| [12][13][14][23][24][34] | 9 | 9.9 | 0.35864 | $G^2(2)$ |
| [123][124][134][234] | 2 | 0.7 | 0.70469 | $G^2(3)$ |
| [1234] | 0 | 0.0 | | $G^2(4)$ |

Conditional tests for zero k-factor interactions

| Model | d.f. | $G^2$ | p-value | |
|---|---|---|---|---|
| two-factor interactions | 9 | 33.0 | 0.00013 | $G^2(1 \mid 2) = G^2(1) - G^2(2)$ |
| three-factor interactions | 7 | 9.2 | 0.23861 | $G^2(2 \mid 3) = G^2(2) - G^2(3)$ |
| four-factor interaction | 2 | 0.7 | 0.70469 | $G^2(3 \mid 4) = G^2(3) - G^2(4)$ |

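The conditional tests can be reproduced from the goodness-of-fit rows by differencing. The sketch below is an illustration only: `chi2_sf` is a hypothetical helper that evaluates the upper-tail chi-square probability for integer d.f. using the classical finite-series expressions, so that no external library is needed.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df d.f. > x), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even d.f.: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return exp(-x / 2) * total
    # odd d.f.: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    phi = exp(-x / 2) / sqrt(2 * pi)
    term, total = sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(x / 2)) + 2 * phi * total


# (G2, d.f.) for the all k-factor models, k = 1..4, as given in the table
fits = {1: (42.9, 18), 2: (9.9, 9), 3: (0.7, 2), 4: (0.0, 0)}

conditional = {}
for k in (1, 2, 3):
    g2 = fits[k][0] - fits[k + 1][0]
    df = fits[k][1] - fits[k + 1][1]
    conditional[k] = (g2, df, chi2_sf(g2, df))
    print(f"G2({k}|{k + 1}) = {g2:.1f}, d.f. = {df}, p = {conditional[k][2]:.5f}")
```

Each conditional statistic is the drop in $G^2$ from adding all interactions of the next order, referred to a chi-square with the corresponding drop in d.f.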
![Page 112: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/112.jpg)
Conclusions
1. The four factor interaction is not significant: $G^2(3 \mid 4)$ = 0.7 (p = 0.705).
2. The all three factor model provides an adequate fit: $G^2(3)$ = 0.7 (p = 0.705).
3. The three factor interactions are not significantly different from 0: $G^2(2 \mid 3)$ = 9.2 (p = 0.239).
4. The all two factor model provides an adequate fit: $G^2(2)$ = 9.9 (p = 0.359).
5. There are significant 2 factor interactions: $G^2(1 \mid 2)$ = 33.0 (p = 0.00013).
Conclude that the model should contain main effects and some two-factor interactions.
![Page 113: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/113.jpg)
There also may be a natural sequence of progressively complicated models that one might want to identify.
In the laundry detergent example the variables are:
1. Softness of laundry water used
2. Previous use of Brand M
3. Temperature of laundry water used
4. Preference of brand X over brand M
![Page 114: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/114.jpg)
A natural order for increasingly complex models which should be considered might be:
1. [1][2][3][4]: the all-main-effects model, independence amongst all four variables.
2. [1][3][24]: since previous use of brand M may be highly related to preference for brand M, add first the 2-4 interaction.
3. [1][34][24]: Brand M is recommended for hot water, so add second the 3-4 interaction.
4. [13][34][24]: Brand M is also recommended for soft laundry, so add third the 1-3 interaction.
5. [13][234]
6. [134][234]

For models 5 and 6, add finally some possible 3-factor interactions.
![Page 115: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/115.jpg)
Likelihood Ratio $G^2$ for various models

| Models | d.f. | $G^2$ |
|---|---|---|
| [1][3][24] | 17 | 22.4 |
| [1][24][34] | 16 | 18 |
| [13][24][34] | 14 | 11.9 |
| [13][23][24][34] | 13 | 11.2 |
| [12][13][23][24][34] | 11 | 10.1 |
| [1][234] | 14 | 14.5 |
| [134][24] | 10 | 12.2 |
| [13][234] | 12 | 8.4 |
| [24][34][123] | 9 | 8.4 |
| [123][234] | 8 | 5.6 |

![Page 116: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/116.jpg)
Table 2: A Partitioning of the Likelihood Ratio Chi-Square Statistic for Complete Independence
(Model (a) = [1][2][3][4], Model (b) = [1][3][24], Model (c) = [1][24][34], Model (d) = [13][24][34], Model (e) = [13][234], Model (f) = [123][234])

| Model | d.f. | $G^2$ |
|---|---|---|
| Model (a) | 18 | 42.9* |
| Difference between models (b) and (a) | 1 | 20.5* |
| Model (b) | 17 | 22.4 |
| Difference between models (c) and (b) | 1 | 4.4* |
| Model (c) | 16 | 18.0 |
| Difference between models (d) and (c) | 2 | 6.1* |
| Model (d) | 14 | 11.9 |
| Difference between models (e) and (d) | 2 | 3.5 |
| Model (e) | 12 | 8.4 |
| Difference between models (f) and (e) | 4 | 2.8 |
| Model (f) | 8 | 5.6 |

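The partition in Table 2 can be checked by differencing successive models in the sequence. In this sketch the 5% critical values (3.841, 5.991, 9.488 for 1, 2 and 4 d.f.) are standard chi-square table values, and reading the asterisks as "significant at the 5% level" is an assumption, since the slide does not define them.

```python
# (label, d.f., G2) for the nested sequence of models in Table 2
sequence = [
    ("(a)", 18, 42.9),
    ("(b)", 17, 22.4),
    ("(c)", 16, 18.0),
    ("(d)", 14, 11.9),
    ("(e)", 12, 8.4),
    ("(f)", 8, 5.6),
]

# Standard upper 5% points of the chi-square distribution
critical_05 = {1: 3.841, 2: 5.991, 4: 9.488}

steps = []
for simpler, richer in zip(sequence, sequence[1:]):
    df = simpler[1] - richer[1]
    g2 = round(simpler[2] - richer[2], 1)
    steps.append((richer[0], simpler[0], df, g2, g2 > critical_05[df]))

for richer, simpler, df, g2, significant in steps:
    flag = "*" if significant else ""
    print(f"Difference between models {richer} and {simpler}: d.f. = {df}, G2 = {g2}{flag}")

# The differences plus the G2 of the final model add back up to G2 for model (a)
total = round(sum(step[3] for step in steps) + sequence[-1][2], 1)
print("Sum of components:", total)
```

Under this reading, the first three additions improve the fit significantly while the two three-factor additions do not.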
![Page 117: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/117.jpg)
![Page 118: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/118.jpg)
Log-Linear model for three-way tables
Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general
$$\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$
where
$$0 = \sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_k u_{3(k)} = \sum_i u_{12(i,j)} = \sum_j u_{12(i,j)}$$
$$= \sum_i u_{13(i,k)} = \sum_k u_{13(i,k)} = \sum_j u_{23(j,k)} = \sum_k u_{23(j,k)}$$
$$= \sum_i u_{123(i,j,k)} = \sum_j u_{123(i,j,k)} = \sum_k u_{123(i,j,k)}$$
![Page 119: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/119.jpg)
Hierarchical Log-linear models for categorical Data
For three way tables
The hierarchical principle:
If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction.
![Page 120: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/120.jpg)
Models for three-way tables
![Page 121: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/121.jpg)
1. Model (All Main effects model): $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)}$
i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$.
Notation: [1][2][3]
Description: Mutual independence between all three variables.
Comment: For any model the parameters ($u$, $u_{1(i)}$, $u_{2(j)}$, $u_{3(k)}$) can be estimated in addition to the expected frequencies ($\mu_{ijk}$) in each cell.
![Page 122: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/122.jpg)
2. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)}$
i.e. $u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$.
Notation: [12][3]
Description: Independence of Variable 3 with variables 1 and 2.
![Page 123: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/123.jpg)
3. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)}$
i.e. $u_{12(i,j)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$.
Notation: [13][2]
Description: Independence of Variable 2 with variables 1 and 3.
![Page 124: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/124.jpg)
4. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{23(j,k)}$
i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{123(i,j,k)} = 0$.
Notation: [23][1]
Description: Independence of Variable 1 with variables 2 and 3.
![Page 125: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/125.jpg)
5. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)}$
i.e. $u_{23(j,k)} = u_{123(i,j,k)} = 0$.
Notation: [12][13]
Description: Conditional independence between variables 2 and 3 given variable 1.
![Page 126: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/126.jpg)
6. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{23(j,k)}$
i.e. $u_{13(i,k)} = u_{123(i,j,k)} = 0$.
Notation: [12][23]
Description: Conditional independence between variables 1 and 3 given variable 2.
![Page 127: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/127.jpg)
7. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)} + u_{23(j,k)}$
i.e. $u_{12(i,j)} = u_{123(i,j,k)} = 0$.
Notation: [13][23]
Description: Conditional independence between variables 1 and 2 given variable 3.
![Page 128: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/128.jpg)
8. Model: $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)}$
i.e. $u_{123(i,j,k)} = 0$.
Notation: [12][13][23]
Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
![Page 129: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/129.jpg)
9. Model (the saturated model): $\ln \mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$
Notation: [123]
Description: No simplifying dependence structure.
![Page 130: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/130.jpg)
![Page 131: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/131.jpg)
![Page 132: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/132.jpg)
![Page 133: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/133.jpg)
Stepwise selection procedures
Forward Selection
Backward Elimination
![Page 134: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/134.jpg)
Forward Selection: Starting with a model that under fits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model.
To determine the significance of a parameter added we use the statistic:
$G^2(2 \mid 1) = G^2(2) - G^2(1)$
Model 1 contains the parameter.
Model 2 does not contain the parameter.
![Page 135: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/135.jpg)
Backward Elimination: Starting with a model that over fits the data, log-linear parameters that are in the model are deleted step by step until every remaining parameter is significant, giving the smallest model that continues to fit the data. At each step the log-linear parameter that is least significant is deleted from the model.
To determine the significance of a parameter deleted we use the statistic:
$G^2(2 \mid 1) = G^2(2) - G^2(1)$
Model 1 contains the parameter.
Model 2 does not contain the parameter.
![Page 136: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/136.jpg)
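Example: Fitting a Log-linear model – Forward Selection
Table: Dyke-Patterson Data - N = 1729 individuals classified according to five variables
(1) Reading Newspapers
(2) Listen to radio
(3) Do "solid" reading
(4) Attend Lectures
(5) Knowledge regarding cancer

Cell entries are counts by knowledge regarding cancer (Good / Poor).

| Newspapers | Lectures | Radio, Solid Reading: Good | Poor | Radio, No solid Reading: Good | Poor | No Radio, Solid Reading: Good | Poor | No Radio, No solid Reading: Good | Poor |
|---|---|---|---|---|---|---|---|---|---|
| Newspaper | Lectures | 23 | 8 | 8 | 4 | 27 | 18 | 7 | 6 |
| Newspaper | None | 102 | 67 | 35 | 59 | 201 | 177 | 75 | 156 |
| None | Lectures | 1 | 3 | 4 | 3 | 3 | 8 | 2 | 10 |
| None | None | 16 | 16 | 13 | 50 | 67 | 83 | 84 | 393 |
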
![Page 137: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/137.jpg)

| MODEL | D.F. | LIKELIHOOD-RATIO CHI-SQUARE | PROB | PEARSON CHI-SQUARE | PROB |
|---|---|---|---|---|---|
| K,L,N,S,R. | 26 | 596.84 | 0.0000 | 751.31 | 0.0000 |

MODELS FORMED BY ADDING TERMS TO MODEL -- K,L,N,S,R.

| MODEL | D.F. | LIKELIHOOD-RATIO CHI-SQUARE | PROB | PEARSON CHI-SQUARE | PROB | TERM ADDED | DIFF. D.F. | DIFF. CHI-SQUARE | DIFF. PROB |
|---|---|---|---|---|---|---|---|---|---|
| KL,N,S,R. | 25 | 579.68 | 0.0000 | 691.18 | 0.0000 | KL | 1 | 17.16 | 0.0000 |
| KN,L,S,R. | 25 | 491.06 | 0.0000 | 533.89 | 0.0000 | KN | 1 | 105.78 | 0.0000 |
| KS,L,N,R. | 25 | 446.39 | 0.0000 | 497.12 | 0.0000 | KS | 1 | 150.45 | 0.0000 |
| KR,L,N,S. | 25 | 572.59 | 0.0000 | 674.61 | 0.0000 | KR | 1 | 24.25 | 0.0000 |
| K,LN,S,R. | 25 | 575.24 | 0.0000 | 688.89 | 0.0000 | LN | 1 | 21.60 | 0.0000 |
| K,LS,N,R. | 25 | 573.09 | 0.0000 | 692.25 | 0.0000 | LS | 1 | 23.74 | 0.0000 |
| K,LR,N,S. | 25 | 577.89 | 0.0000 | 698.17 | 0.0000 | LR | 1 | 18.95 | 0.0000 |
| K,L,NS,R. | 25 | 343.13 | 0.0000 | 383.90 | 0.0000 | NS | 1 | 253.71 | 0.0000 |
| K,L,NR,S. | 25 | 522.61 | 0.0000 | 615.20 | 0.0000 | NR | 1 | 74.23 | 0.0000 |
| K,L,N,SR. | 25 | 575.76 | 0.0000 | 680.88 | 0.0000 | SR | 1 | 21.08 | 0.0000 |

STEP 1. BEST MODEL FOUND IS -- K,L,NS,R.

K = knowledge
N = Newspaper
R = Radio
S = Reading
L = Lectures
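The choice made at Step 1 can be sketched as: for every candidate two-factor term, compute the drop in the likelihood-ratio chi-square relative to the current model and add the term with the largest drop. The dictionary below copies the likelihood-ratio values from the Step 1 output; the variable names are illustrative.

```python
# Likelihood-ratio chi-square of the current model K,L,N,S,R (26 d.f.)
base_g2 = 596.84

# Likelihood-ratio chi-square after adding each candidate term (25 d.f. each)
candidates = {
    "KL": 579.68, "KN": 491.06, "KS": 446.39, "KR": 572.59, "LN": 575.24,
    "LS": 573.09, "LR": 577.89, "NS": 343.13, "NR": 522.61, "SR": 575.76,
}

# Drop in G2 due to adding each term (each uses 1 d.f.)
drops = {term: round(base_g2 - g2, 2) for term, g2 in candidates.items()}

best = max(drops, key=drops.get)
print(f"Add {best}: G2 drops by {drops[best]}")  # Add NS: G2 drops by 253.71
```

Since every candidate uses the same 1 d.f., the largest drop is also the most significant term; with unequal d.f. the comparison would be made on p-values instead.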
![Page 138: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/138.jpg)

| MODEL | D.F. | LIKELIHOOD-RATIO CHI-SQUARE | PROB | PEARSON CHI-SQUARE | PROB | TERM ADDED | DIFF. D.F. | DIFF. CHI-SQUARE | DIFF. PROB |
|---|---|---|---|---|---|---|---|---|---|
| KL,NS,R. | 24 | 325.97 | 0.0000 | 339.14 | 0.0000 | KL | 1 | 17.16 | 0.0000 |
| KN,L,NS,R. | 24 | 237.35 | 0.0000 | 258.87 | 0.0000 | KN | 1 | 105.78 | 0.0000 |
| KS,L,NS,R. | 24 | 192.68 | 0.0000 | 216.12 | 0.0000 | KS | 1 | 150.45 | 0.0000 |
| KR,L,NS. | 24 | 318.88 | 0.0000 | 329.40 | 0.0000 | KR | 1 | 24.25 | 0.0000 |
| K,LN,NS,R. | 24 | 321.53 | 0.0000 | 341.35 | 0.0000 | LN | 1 | 21.60 | 0.0000 |
| K,LS,NS,R. | 24 | 319.39 | 0.0000 | 348.68 | 0.0000 | LS | 1 | 23.75 | 0.0000 |
| K,LR,NS. | 24 | 324.18 | 0.0000 | 341.62 | 0.0000 | LR | 1 | 18.95 | 0.0000 |
| K,L,NR,NS. | 24 | 268.90 | 0.0000 | 280.86 | 0.0000 | NR | 1 | 74.23 | 0.0000 |
| K,L,SR,NS. | 24 | 322.05 | 0.0000 | 347.33 | 0.0000 | SR | 1 | 21.08 | 0.0000 |

STEP 2. BEST MODEL FOUND IS -- KS,L,NS,R.

![Page 139: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/139.jpg)

| MODEL | D.F. | LIKELIHOOD-RATIO CHI-SQUARE | PROB | PEARSON CHI-SQUARE | PROB | TERM ADDED | DIFF. D.F. | DIFF. CHI-SQUARE | DIFF. PROB |
|---|---|---|---|---|---|---|---|---|---|
| KL,KS,NS,R. | 23 | 175.52 | 0.0000 | 182.86 | 0.0000 | KL | 1 | 17.16 | 0.0000 |
| KN,KS,L,NS,R. | 23 | 152.96 | 0.0000 | 163.87 | 0.0000 | KN | 1 | 39.72 | 0.0000 |
| KR,KS,L,NS. | 23 | 168.43 | 0.0000 | 173.32 | 0.0000 | KR | 1 | 24.25 | 0.0000 |
| KS,LN,NS,R. | 23 | 171.08 | 0.0000 | 184.56 | 0.0000 | LN | 1 | 21.60 | 0.0000 |
| LS,KS,NS,R. | 23 | 168.93 | 0.0000 | 202.28 | 0.0000 | LS | 1 | 23.74 | 0.0000 |
| KS,LR,NS. | 23 | 173.73 | 0.0000 | 178.08 | 0.0000 | LR | 1 | 18.95 | 0.0000 |
| KS,L,NR,NS. | 23 | 118.45 | 0.0000 | 128.83 | 0.0000 | NR | 1 | 74.23 | 0.0000 |
| SR,KS,L,NS. | 23 | 171.60 | 0.0000 | 198.23 | 0.0000 | SR | 1 | 21.08 | 0.0000 |

STEP 3. BEST MODEL FOUND IS -- KS,L,NR,NS.

![Page 140: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/140.jpg)
Continuing after 10 steps

| MODEL | D.F. | LIKELIHOOD-RATIO CHI-SQUARE | PROB | PEARSON CHI-SQUARE | PROB | TERM ADDED | DIFF. D.F. | DIFF. CHI-SQUARE | DIFF. PROB |
|---|---|---|---|---|---|---|---|---|---|
| LN,KL,SR,KR,KN,LR,LS,KS,NR,NS. | 16 | 19.56 | 0.2406 | 21.21 | 0.1706 | SR | 1 | 0.42 | 0.5147 |
| KLN,KR,LR,LS,KS,NR,NS. | 16 | 18.86 | 0.2762 | 21.53 | 0.1589 | KLN | 1 | 1.13 | 0.2878 |
| LN,KLS,KR,KN,LR,NR,NS. | 16 | 15.99 | 0.4538 | 15.63 | 0.4794 | KLS | 1 | 4.00 | 0.0456 |
| LN,KLR,KN,LS,KS,NR,NS. | 16 | 19.28 | 0.2543 | 20.81 | 0.1860 | KLR | 1 | 0.70 | 0.4015 |
| LN,KL,KR,KNS,LR,LS,NR. | 16 | 16.78 | 0.4000 | 18.74 | 0.2821 | KNS | 1 | 3.21 | 0.0733 |
| LN,KL,KNR,LR,LS,KS,NS. | 16 | 19.90 | 0.2247 | 21.27 | 0.1682 | KNR | 1 | 0.09 | 0.7704 |
| LNS,KL,KR,KN,LR,KS,NR. | 16 | 19.58 | 0.2397 | 20.98 | 0.1794 | LNS | 1 | 0.41 | 0.5239 |
| LNR,KL,KR,KN,LS,KS,NS. | 16 | 18.11 | 0.3176 | 18.80 | 0.2790 | LNR | 1 | 1.88 | 0.1706 |

STEP 10. BEST MODEL FOUND IS -- LN,KLS,KR,KN,LR,NR,NS.

![Page 141: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/141.jpg)
The final step

| MODEL | D.F. | LIKELIHOOD-RATIO CHI-SQUARE | PROB | PEARSON CHI-SQUARE | PROB | TERM ADDED | DIFF. D.F. | DIFF. CHI-SQUARE | DIFF. PROB |
|---|---|---|---|---|---|---|---|---|---|
| LN,SR,KLS,KR,KN,LR,NR,NS. | 15 | 15.55 | 0.4127 | 15.15 | 0.4406 | SR | 1 | 0.44 | 0.5072 |
| KLN,KLS,KR,LR,NR,NS. | 15 | 12.98 | 0.6041 | 13.84 | 0.5379 | KLN | 1 | 3.01 | 0.0827 |
| LN,KLR,KLS,KN,NR,NS. | 15 | 15.10 | 0.4446 | 15.06 | 0.4471 | KLR | 1 | 0.89 | 0.3446 |
| LN,KNS,KLS,KR,LR,NR. | 15 | 13.21 | 0.5861 | 13.19 | 0.5878 | KNS | 1 | 2.78 | 0.0955 |
| LN,KLS,KNR,LR,NS. | 15 | 15.93 | 0.3870 | 15.48 | 0.4173 | KNR | 1 | 0.06 | 0.8034 |
| LNS,KLS,KR,KN,LR,NR. | 15 | 15.87 | 0.3905 | 15.60 | 0.4089 | LNS | 1 | 0.12 | 0.7343 |
| LNR,KLS,KR,KN,NS. | 15 | 14.23 | 0.5085 | 13.75 | 0.5446 | LNR | 1 | 1.76 | 0.1842 |

STEP 11. BEST MODEL FOUND IS -- KLN,KLS,KR,LR,NR,NS.

![Page 142: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/142.jpg)
The best model was found at the previous step:
• [LN][KLS][KR][KN][LR][NR][NS]
![Page 143: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/143.jpg)
Modelling of response variables
Independent → Dependent
![Page 144: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/144.jpg)
Logit Models
To date we have not worried whether any of the variables were dependent or independent variables. The logit model is used when we have a single binary dependent variable.
![Page 145: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/145.jpg)
Example: Logit Models
Table: The Effect of planting depth on mortality of Pine seedlings

| Depth of Planting | Longleaf Seedlings: Dead | Alive | Totals | Slash Seedlings: Dead | Alive | Totals |
|---|---|---|---|---|---|---|
| Too High | 41 | 59 | 100 | 12 | 88 | 100 |
| Too Low | 11 | 89 | 100 | 5 | 95 | 100 |
| Totals | 52 | 148 | 200 | 17 | 183 | 200 |

Table: Loglinear Models Fit to Data in Above Table and their Goodness of Fit Statistics

| Model | $\chi^2$ | $G^2$ | df |
|---|---|---|---|
| [12][13][23] | 1.37 | 1.28 | 1 |
| [13][23] | 26.54 | 27.79 | 2 |
| [12][13] | 24.03 | 25.03 | 2 |
| [13][2] | 54.70 | 50.10 | 3 |

![Page 146: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/146.jpg)
The variables
1. Type of seedling (T)
   a. Longleaf seedling
   b. Slash seedling
2. Depth of planting (D)
   a. Too low
   b. Too high
3. Mortality (M) (the dependent variable)
   a. Dead
   b. Alive
![Page 147: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/147.jpg)
The Log-linear Model
$$\ln \mu_{ijk} = u + u_{T(i)} + u_{D(j)} + u_{M(k)} + u_{TD(i,j)} + u_{TM(i,k)} + u_{DM(j,k)} + u_{TDM(i,j,k)}$$
Note: $\mu_{ij1}$ = # dead when T = i and D = j.
$\mu_{ij2}$ = # alive when T = i and D = j.
$$\frac{\mu_{ij1}}{\mu_{ij2}} = \frac{\text{dead}}{\text{alive}} = \text{mortality ratio when T = i and D = j.}$$
![Page 148: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/148.jpg)
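Hence
$$\ln\frac{\mu_{ij1}}{\mu_{ij2}} = \ln\mu_{ij1} - \ln\mu_{ij2} = \text{log-mortality ratio}$$
$$= \left[u + u_{T(i)} + u_{D(j)} + u_{M(1)} + u_{TD(i,j)} + u_{TM(i,1)} + u_{DM(j,1)} + u_{TDM(i,j,1)}\right]$$
$$- \left[u + u_{T(i)} + u_{D(j)} + u_{M(2)} + u_{TD(i,j)} + u_{TM(i,2)} + u_{DM(j,2)} + u_{TDM(i,j,2)}\right]$$
$$= 2u_{M(1)} + 2u_{TM(i,1)} + 2u_{DM(j,1)} + 2u_{TDM(i,j,1)}$$
since
$$u_{M(2)} = -u_{M(1)},\quad u_{TM(i,2)} = -u_{TM(i,1)},\quad u_{DM(j,2)} = -u_{DM(j,1)},\quad u_{TDM(i,j,2)} = -u_{TDM(i,j,1)}$$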
![Page 149: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/149.jpg)
The logit model:
$$\ln\frac{\mu_{ij1}}{\mu_{ij2}} = \ln\mu_{ij1} - \ln\mu_{ij2} = \text{log-mortality ratio} = v + v_{T(i)} + v_{D(j)} + v_{TD(i,j)}$$
where $v = 2u_{M(1)}$, $v_{T(i)} = 2u_{TM(i,1)}$, $v_{D(j)} = 2u_{DM(j,1)}$, and $v_{TD(i,j)} = 2u_{TDM(i,j,1)}$.
![Page 150: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/150.jpg)
Thus, corresponding to a loglinear model there is a logit model predicting the log ratio of the expected frequencies of the two categories of the dependent variable.

Also, (k + 1)-factor interactions with the dependent variable in the loglinear model determine k-factor interactions in the logit model:

k + 1 = 1: constant term in the logit model
k + 1 = 2: main effects in the logit model
![Page 151: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/151.jpg)
Example: Logit Models

Table: The Effect of Planting Depth on Mortality of Pine Seedlings

| Depth of Planting | Longleaf Dead | Longleaf Alive | Longleaf Totals | Slash Dead | Slash Alive | Slash Totals |
|---|---|---|---|---|---|---|
| Too High | 41 | 59 | 100 | 12 | 88 | 100 |
| Too Low | 11 | 89 | 100 | 5 | 95 | 100 |
| Totals | 52 | 148 | 200 | 17 | 183 | 200 |

Table: Loglinear Models Fit to the Data in the Above Table and Their Goodness-of-Fit Statistics (1 = Depth, 2 = Mort, 3 = Type)

| Model | χ² | G² | df |
|---|---|---|---|
| [12][13][23] | 1.37 | 1.28 | 1 |
| [13][23] | 26.54 | 27.79 | 2 |
| [12][13] | 24.03 | 25.03 | 2 |
| [13][2] | 54.70 | 50.10 | 3 |
![Page 152: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/152.jpg)
Log-Linear parameters for Model: [TM][TD][DM]

Main Effects:

| Variable | Level | Estimate | Level | Estimate |
|---|---|---|---|---|
| Mort | Dead | -0.946 | Alive | 0.946 |
| Type | Lleaf | 0.240 | Slash | -0.240 |
| Depth | low | 0.257 | high | -0.257 |

Two-Factor Interactions:

Type-Mort

| Type | Dead | Alive |
|---|---|---|
| Lleaf | 0.354 | -0.354 |
| Slash | -0.354 | 0.354 |

Depth-Mort

| Depth | Dead | Alive |
|---|---|---|
| low | 0.376 | -0.376 |
| high | -0.376 | 0.376 |

Depth-Type

| Depth | Lleaf | Slash |
|---|---|---|
| low | -0.063 | 0.063 |
| high | 0.063 | -0.063 |
![Page 153: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/153.jpg)
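Logit Model for predicting the Mortality

$$\ln(MR) = v + v_{D(i)} + v_{T(k)}$$

or

$$MR = \frac{\text{dead}}{\text{alive}} = e^{v}\, e^{v_{D(i)}}\, e^{v_{T(k)}}$$

| Term | Log-Linear | Logit | Mult |
|---|---|---|---|
| const | -0.946 | -1.892 | 0.151 |
| Depth: High | 0.354 | 0.708 | 2.030 |
| Depth: Low | -0.354 | -0.708 | 0.493 |
| Type: Long | 0.376 | 0.752 | 2.121 |
| Type: Slash | -0.376 | -0.752 | 0.471 |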
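The three columns of the table are linked by simple arithmetic: each logit parameter is twice the log-linear parameter, and each multiplicative effect is the exponential of the logit parameter. The following is a minimal sketch of that arithmetic; the dictionary keys and the helper `mortality_ratio` are illustrative names, not part of the original analysis.

```python
import math

# Log-linear estimates copied from the table above.
loglinear = {
    "const": -0.946,
    "Depth-High": 0.354,
    "Depth-Low": -0.354,
    "Type-Long": 0.376,
    "Type-Slash": -0.376,
}

# Logit parameter = 2 x log-linear parameter.
logit = {name: 2 * u for name, u in loglinear.items()}
# Multiplicative effect = exp(logit parameter).
mult = {name: math.exp(v) for name, v in logit.items()}


def mortality_ratio(depth: str, seedling: str) -> float:
    """Fitted dead/alive ratio for one Depth x Type cell (hypothetical helper)."""
    return mult["const"] * mult[f"Depth-{depth}"] * mult[f"Type-{seedling}"]


for name in loglinear:
    print(f"{name:11s} {loglinear[name]:7.3f} {logit[name]:7.3f} {mult[name]:6.3f}")

# Fitted mortality ratio for longleaf seedlings planted too high.
print(round(mortality_ratio("High", "Long"), 3))
```

Multiplying the three factors is the same as exponentiating the sum of the logit terms, which is why the model can be read either additively (logit scale) or multiplicatively (ratio scale).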
![Page 154: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/154.jpg)
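Example: Fitting a Log-linear model, Forward Selection

Table: Dyke-Patterson Data. N = 1729 individuals classified according to five variables: (1) Reading Newspapers, (2) Listen to radio, (3) Do "solid" reading, (4) Attend Lectures, (5) Knowledge regarding cancer

Column headings give Radio / Reading / Knowledge (Good or Poor).

| Newspaper | Lectures | Radio, Solid Reading, Good | Radio, Solid Reading, Poor | Radio, No Solid Reading, Good | Radio, No Solid Reading, Poor | No Radio, Solid Reading, Good | No Radio, Solid Reading, Poor | No Radio, No Solid Reading, Good | No Radio, No Solid Reading, Poor |
|---|---|---|---|---|---|---|---|---|---|
| Newspaper | Lectures | 23 | 8 | 8 | 4 | 27 | 18 | 7 | 6 |
| Newspaper | None | 102 | 67 | 35 | 59 | 201 | 177 | 75 | 156 |
| None | Lectures | 1 | 3 | 4 | 3 | 3 | 8 | 2 | 10 |
| None | None | 16 | 16 | 13 | 50 | 67 | 83 | 84 | 393 |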
![Page 155: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/155.jpg)
The best model found by forward selection was

[LN][KLS][KR][KN][LR][NR][NS]

To fit a logit model to predict K (Knowledge) we need to fit a loglinear model that keeps the important interactions with K and includes all interactions among the explanatory variables, namely

[LNRS][KLS][KR][KN]

The logit model will contain:

Main effects for L (Lectures), N (Newspapers), R (Radio), and S (Reading)
A two-factor interaction effect for L and S
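The rule used here (a loglinear term containing the response contributes the same term, minus the response, to the logit model) can be sketched in a few lines. The function name `logit_terms` and the one-letter string encoding of the generators are assumptions made for illustration only.

```python
from itertools import combinations


def logit_terms(generators, response):
    """Logit-model terms implied by a hierarchical loglinear model.

    Each generator is a string of one-letter variable names, e.g. "KLS".
    A generator that contains the response contributes every subset of its
    remaining letters; the empty string stands for the constant term.
    """
    terms = set()
    for gen in generators:
        if response not in gen:
            continue  # terms without the response cancel in the log ratio
        others = sorted(set(gen) - {response})
        # Hierarchy: a generator implies all of its lower-order subsets.
        for size in range(len(others) + 1):
            for combo in combinations(others, size):
                terms.add("".join(combo))
    return sorted(terms, key=lambda t: (len(t), t))


terms = logit_terms(["LNRS", "KLS", "KR", "KN"], response="K")
print(terms)
```

For the model above this returns the constant (empty string), the four main effects L, N, R, S, and the single two-factor term LS. The generator [LNRS] contributes nothing because it does not involve K.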
![Page 156: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/156.jpg)
The Logit Parameters for the Model: LNSR, KLS, KR, KN

(Multiplicative effects are given in brackets; Logit Parameters = 2 × Loglinear parameters. The logit is the log of the Knowledge ratio, good / poor.)

The Constant term: -0.226 (0.798)

The Main effects on Knowledge:

| Variable | Level | Logit (Mult.) | Level | Logit (Mult.) |
|---|---|---|---|---|
| Lectures | Lect | 0.268 (1.307) | None | -0.268 (0.765) |
| Newspaper | News | 0.324 (1.383) | None | -0.324 (0.723) |
| Reading | Solid | 0.340 (1.405) | Not | -0.340 (0.712) |
| Radio | Radio | 0.150 (1.162) | None | -0.150 (0.861) |

The Two-factor interaction Effect of Reading and Lectures on Knowledge:

| Lectures | Reading: Solid | Reading: Not |
|---|---|---|
| Lect | -0.180 (0.835) | 0.180 (1.197) |
| None | 0.180 (1.197) | -0.180 (0.835) |
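As an illustration of how these parameters combine, the sketch below adds the constant, the four main effects and the Reading-by-Lectures interaction to obtain the fitted log ratio of good to poor knowledge for one profile. The function names and the level labels used as dictionary keys are illustrative choices; the numbers are the logit parameters listed above.

```python
import math

CONSTANT = -0.226
MAIN = {
    "lectures": {"Lect": 0.268, "None": -0.268},
    "newspaper": {"News": 0.324, "None": -0.324},
    "reading": {"Solid": 0.340, "Not": -0.340},
    "radio": {"Radio": 0.150, "None": -0.150},
}
# Keyed by (lectures level, reading level).
READING_BY_LECTURES = {
    ("Lect", "Solid"): -0.180,
    ("Lect", "Not"): 0.180,
    ("None", "Solid"): 0.180,
    ("None", "Not"): -0.180,
}


def knowledge_logit(lectures, newspaper, reading, radio):
    """Fitted log(good/poor) for one combination of explanatory levels."""
    return (
        CONSTANT
        + MAIN["lectures"][lectures]
        + MAIN["newspaper"][newspaper]
        + MAIN["reading"][reading]
        + MAIN["radio"][radio]
        + READING_BY_LECTURES[(lectures, reading)]
    )


def knowledge_odds(lectures, newspaper, reading, radio):
    """Fitted good/poor ratio: the exponential of the logit."""
    return math.exp(knowledge_logit(lectures, newspaper, reading, radio))


print(knowledge_logit("Lect", "News", "Solid", "Radio"))
print(knowledge_odds("None", "None", "Not", "None"))
```

The same fitted ratio can be obtained by multiplying the bracketed multiplicative effects, up to the rounding in the table.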
![Page 157: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/157.jpg)
Fitting a Logit Model with a Polytomous Response Variable
![Page 158: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/158.jpg)
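Example

Table: Observed Cross-Classification of 2294 Males Who Failed to Pass the Armed Forces Qualification Test

| Race | Age | Father's Education | Respondent's Education: Grammar School | Some HS | HS Graduate |
|---|---|---|---|---|---|
| White | < 22 | GS | 39 | 29 | 8 |
| White | < 22 | Some HS | 4 | 8 | 1 |
| White | < 22 | HS Grad | 11 | 9 | 6 |
| White | < 22 | NA | 48 | 17 | 8 |
| White | ≥ 22 | GS | 231 | 115 | 51 |
| White | ≥ 22 | Some HS | 17 | 21 | 13 |
| White | ≥ 22 | HS Grad | 18 | 28 | 45 |
| White | ≥ 22 | NA | 197 | 111 | 35 |
| Black | < 22 | GS | 19 | 40 | 19 |
| Black | < 22 | Some HS | 5 | 17 | 7 |
| Black | < 22 | HS Grad | 2 | 14 | 3 |
| Black | < 22 | NA | 49 | 79 | 24 |
| Black | ≥ 22 | GS | 110 | 133 | 103 |
| Black | ≥ 22 | Some HS | 18 | 38 | 25 |
| Black | ≥ 22 | HS Grad | 11 | 25 | 18 |
| Black | ≥ 22 | NA | 178 | 206 | 81 |

GS = Grammar School; NA = Not available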
![Page 159: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/159.jpg)
The variables

1. Race: white, black
2. Age: < 22, ≥ 22
3. Father's education: GS, some HS, HS grad, NA
4. Respondent's education: GS, some HS, HS grad (the response (dependent) variable)
![Page 160: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/160.jpg)
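Table: Various Loglinear Models Fit to the 3 × 4 × 2 × 2 Table above

In the model labels, the table dimensions and the fitted effects imply the numbering 1 = respondent's education, 2 = father's education, 3 = age, 4 = race.

| Model | d.f. | G² | p-value |
|---|---|---|---|
| [234][1] | 30 | 254.8 | 0.0000 |
| [234][12] | 24 | 162.6 | 0.0000 |
| [234][13] | 28 | 242.7 | 0.0000 |
| [234][14] | 28 | 152.8 | 0.0000 |
| [234][12][13] | 22 | 151.5 | 0.0000 |
| [234][12][14] | 22 | 46.7 | 0.0016 |
| [234][13][14] | 26 | 142.5 | 0.0000 |
| [234][12][13][14] | 20 | 36.9 | 0.0120 |
| [234][123][14] | 14 | 27.9 | 0.0147 |
| [234][124][13] | 14 | 18.1 | 0.2023 |
| [234][134][12] | 18 | 33.2 | 0.0158 |
| [234][123][124] | 8 | 9.7 | 0.2867 |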
![Page 161: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/161.jpg)
Techniques for handling a Polytomous Response Variable

Approaches

1. Consider the categories 2 at a time. Do this for all possible pairs of the categories.
2. Look at the continuation ratios
   i. 1 vs 2
   ii. 1,2 vs 3
   iii. 1,2,3 vs 4
   iv. etc.
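The two approaches can be sketched directly on a vector of response counts. The example below uses one row of the observed table (white, under 22, father's education GS). The function names are illustrative, and the orientation of each ratio (which category goes in the numerator) is an assumption: reversing it only flips the sign of the logit.

```python
import math


def pairwise_logits(counts):
    """Approach 1: log ratio for every pair of response categories."""
    n = len(counts)
    return {
        (a + 1, b + 1): math.log(counts[a] / counts[b])
        for a in range(n)
        for b in range(a + 1, n)
    }


def continuation_logits(counts):
    """Approach 2: category k+1 versus categories 1..k pooled.

    Returns {k+1: log(count[k+1] / (count[1] + ... + count[k]))}.
    """
    out = {}
    pooled = counts[0]
    for k in range(1, len(counts)):
        out[k + 1] = math.log(counts[k] / pooled)
        pooled += counts[k]
    return out


# Respondent's education (Grammar School, Some HS, HS Graduate)
# for white males under 22 whose father's education was GS.
counts = [39, 29, 8]
pairs = pairwise_logits(counts)
cont = continuation_logits(counts)
print(pairs)
print(cont)
```

With three response categories there are three pairwise logits but only two continuation-ratio logits, which is why the continuation-ratio approach gives a non-redundant decomposition of the response.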
![Page 162: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/162.jpg)
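Table: Estimated Logit Effects for the Three Logit Models Corresponding to the Log Linear Model [234][124][13]

| Effect | Level | Grammar vs Some HS, log(m1jkl/m2jkl) | Grammar vs HS Grad, log(m1jkl/m3jkl) | Some HS vs HS Grad, log(m2jkl/m3jkl) |
|---|---|---|---|---|
| Constant | | -0.289 | 0.451 | 0.740 |
| Race | White | 0.395 | 0.390 | -0.005 |
| Race | Black | -0.395 | -0.390 | 0.005 |
| Age | < 22 | -0.120 | 0.099 | 0.219 |
| Age | ≥ 22 | 0.120 | -0.099 | -0.219 |
| Father's Education | Grammar | 0.380 | 0.406 | 0.026 |
| Father's Education | Some HS | -0.371 | -0.355 | 0.016 |
| Father's Education | HS Grad | -0.441 | -0.918 | -0.477 |
| Father's Education | NA | 0.432 | 0.867 | 0.435 |
| Race by Father's Education | White by Grammar | 0.063 | 0.345 | 0.282 |
| Race by Father's Education | White by Some HS | -0.128 | -0.016 | 0.112 |
| Race by Father's Education | White by HS Grad | 0.030 | -0.429 | -0.459 |
| Race by Father's Education | White by NA | 0.035 | 0.101 | 0.066 |
| Race by Father's Education | Black by Grammar | -0.063 | -0.345 | -0.282 |
| Race by Father's Education | Black by Some HS | 0.128 | 0.016 | -0.112 |
| Race by Father's Education | Black by HS Grad | -0.030 | 0.429 | 0.459 |
| Race by Father's Education | Black by NA | -0.035 | -0.101 | -0.066 |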
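The three pairwise logits are not independent: the third column is the second minus the first, because the ratio of categories 2 and 3 is the quotient of the other two ratios. A short worked check with the constant terms and the White race effects:

$$\log\frac{m_{2jkl}}{m_{3jkl}} = \log\frac{m_{1jkl}}{m_{3jkl}} - \log\frac{m_{1jkl}}{m_{2jkl}}$$

$$\text{Constant: } 0.451 - (-0.289) = 0.740, \qquad \text{Race (White): } 0.390 - 0.395 = -0.005$$

The same relation holds row by row, so any two of the three columns determine the third.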
![Page 163: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/163.jpg)
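Table: Multiplicative Logit Effects for the Three Logit Models Corresponding to the Log Linear Model [234][124][13]

Each entry is the exponential of the corresponding logit effect, so it acts multiplicatively on the ratio of expected frequencies named in the column heading.

| Effect | Level | Grammar vs Some HS, m1jkl/m2jkl | Grammar vs HS Grad, m1jkl/m3jkl | Some HS vs HS Grad, m2jkl/m3jkl |
|---|---|---|---|---|
| Constant | | 0.749 | 1.570 | 2.096 |
| Race | White | 1.484 | 1.477 | 0.995 |
| Race | Black | 0.674 | 0.677 | 1.005 |
| Age | < 22 | 0.887 | 1.104 | 1.245 |
| Age | ≥ 22 | 1.127 | 0.906 | 0.803 |
| Father's Education | Grammar | 1.462 | 1.501 | 1.026 |
| Father's Education | Some HS | 0.690 | 0.701 | 1.016 |
| Father's Education | HS Grad | 0.643 | 0.399 | 0.621 |
| Father's Education | NA | 1.540 | 2.380 | 1.545 |
| Race by Father's Education | White by Grammar | 1.065 | 1.412 | 1.326 |
| Race by Father's Education | White by Some HS | 0.880 | 0.984 | 1.119 |
| Race by Father's Education | White by HS Grad | 1.030 | 0.651 | 0.632 |
| Race by Father's Education | White by NA | 1.036 | 1.106 | 1.068 |
| Race by Father's Education | Black by Grammar | 0.939 | 0.708 | 0.754 |
| Race by Father's Education | Black by Some HS | 1.137 | 1.016 | 0.894 |
| Race by Father's Education | Black by HS Grad | 0.970 | 1.536 | 1.582 |
| Race by Father's Education | Black by NA | 0.966 | 0.904 | 0.936 |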
![Page 164: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/164.jpg)
Table: Various Logit Models for the Log Continuation Ratios in the first Table

(a) log(m2jkl / m1jkl)
(b) log(m3jkl / (m1jkl + m2jkl))

| Model | (a) d.f. | (a) G² | (b) d.f. | (b) G² | Combined Fit d.f. | Combined Fit G² |
|---|---|---|---|---|---|---|
| [234][1] | 15 | 131.5 | 15 | 123.3 | 30 | 254.8 |
| [234][12] | 12 | 97.9 | 12 | 64.7 | 24 | 162.6 |
| [234][13] | 14 | 123.3 | 14 | 119.4 | 28 | 242.7 |
| [234][14] | 14 | 49.0 | 14 | 102.8 | 28 | 152.8 |
| [234][12][13] | 11 | 91.9 | 11 | 60.3 | 22 | 152.2 |
| [234][12][14] | 11 | 16.1 | 11 | 35.6 | 22 | 51.7 |
| [234][13][14] | 13 | 43.7 | 13 | 98.7 | 26 | 142.4 |
| [234][12][13][14] | 10 | 12.4 | 10 | 29.8 | 20 | 42.2 |
| [234][123][14] | 7 | 9.3 | 7 | 23.2 | 14 | 32.5 |
| [234][124][13] | 7 | 9.3 | 7 | 23.2 | 14 | 18.5 |
| [234][134][12] | 9 | 8.6 | 9 | 29.7 | 18 | 38.3 |
| [234][123][124] | 4 | 8.5 | 4 | 1.2 | 8 | 9.7 |
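Because the two continuation-ratio logits are fitted separately, their G² values would be expected to add up to the combined-fit value, just as the degrees of freedom do. The sketch below checks that additivity on the rows as transcribed; the tolerance and variable names are illustrative. Rows it flags may reflect transcription slips in the table and are worth checking against the original source.

```python
# (model, G2 for (a), G2 for (b), reported combined G2), copied from the table.
rows = [
    ("[234][1]", 131.5, 123.3, 254.8),
    ("[234][12]", 97.9, 64.7, 162.6),
    ("[234][13]", 123.3, 119.4, 242.7),
    ("[234][14]", 49.0, 102.8, 152.8),
    ("[234][12][13]", 91.9, 60.3, 152.2),
    ("[234][12][14]", 16.1, 35.6, 51.7),
    ("[234][13][14]", 43.7, 98.7, 142.4),
    ("[234][12][13][14]", 12.4, 29.8, 42.2),
    ("[234][123][14]", 9.3, 23.2, 32.5),
    ("[234][124][13]", 9.3, 23.2, 18.5),
    ("[234][134][12]", 8.6, 29.7, 38.3),
    ("[234][123][124]", 8.5, 1.2, 9.7),
]

TOLERANCE = 0.05  # allows for rounding to one decimal place

# Rows where (a) + (b) does not reproduce the reported combined G2.
mismatches = [
    model
    for model, g2_a, g2_b, combined in rows
    if abs((g2_a + g2_b) - combined) > TOLERANCE
]
print(mismatches)
```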
![Page 165: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/165.jpg)
Causal or Path Analysis for Categorical Data
![Page 166: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/166.jpg)
When the data is continuous, a causal pattern may be assumed to exist amongst the variables.

The path diagram

This is a diagram summarizing causal relationships.

Straight arrows are drawn from a variable to another variable on which it has a causal effect:

X → Y

Curved double-headed arrows are drawn between variables that are simply correlated:

X ↔ Y
![Page 167: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/167.jpg)
Example 1

The variables: Job Stress, Smoking, Heart Disease

The path diagram

[Path diagram with nodes: Job Stress, Smoking, Heart Disease]

In path analysis for continuous variables, one is interested in determining the contribution along each path (the path coefficients).
![Page 168: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/168.jpg)
Example 2

The variables: Job Stress, Alcoholic Drinking, Smoking, Heart Disease

The path diagram

[Path diagram with nodes: Job Stress, Drinking, Smoking, Heart Disease]
![Page 169: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/169.jpg)
In the analysis of categorical data there are no path coefficients, but path diagrams can point to the appropriate logit analysis.

Example

In this example the data consist of two-wave, two-variable panel data for a sample of n = 3398 schoolboys. The study looks at “membership” in and “attitude towards” the leading crowd.
![Page 170: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/170.jpg)
The path diagram:

[Path diagram with nodes: A, B, C, D]

This suggests predicting B from A, then C from A and B, and finally D from A, B and C.

Examples of Causal Analysis Using Recursive Systems of Logit Models

Example 1: Two-Wave Two-Variable Panel Data for 3398 Schoolboys: Membership in and attitude toward the "Leading Crowd".

Columns give the second interview as (Membership, Attitude).

| First Interview: Membership | First Interview: Attitude | (+, +) | (+, -) | (-, +) | (-, -) |
|---|---|---|---|---|---|
| + | + | 458 | 140 | 110 | 49 |
| + | - | 171 | 182 | 56 | 87 |
| - | + | 184 | 75 | 531 | 281 |
| - | - | 85 | 97 | 338 | 554 |

A = Membership at first interview, B = Attitude at first interview, C = Membership at second interview, D = Attitude at second interview
![Page 171: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/171.jpg)
Two-way Analysis for determining the effect of A on B

| Membership (A) | Attitude (B): + | Attitude (B): - |
|---|---|---|
| + | 757 | 496 |
| - | 1071 | 1074 |
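The two-way table is the A-by-B margin of the four-way panel table: each cell is a row total over the four second-interview columns. The sketch below reproduces that collapse and, as one common summary of a 2 × 2 table, computes the cross-product (odds) ratio. The odds ratio is offered as an illustration and is not a statistic reported in the original slides.

```python
# Rows: first interview (Membership, Attitude).
# Values: counts for the four second-interview columns
# (+,+), (+,-), (-,+), (-,-).
panel = {
    ("+", "+"): [458, 140, 110, 49],
    ("+", "-"): [171, 182, 56, 87],
    ("-", "+"): [184, 75, 531, 281],
    ("-", "-"): [85, 97, 338, 554],
}

# Collapse over the second interview to get the A x B table.
ab = {key: sum(counts) for key, counts in panel.items()}
total = sum(ab.values())

# Cross-product ratio: odds of a positive attitude for members
# divided by the same odds for non-members.
odds_ratio = (ab[("+", "+")] * ab[("-", "-")]) / (ab[("+", "-")] * ab[("-", "+")])

print(ab)
print(total)
print(round(odds_ratio, 3))
```

An odds ratio above 1 indicates that membership and a favourable attitude go together at the first interview, which is the A to B effect the path diagram points to.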
![Page 172: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/172.jpg)
Goodness of Fit Statistics for determining the effect of A, B on C

1. [AB][AC][BC] (1 df; G² = 0.0)
2. [AB][BC] (2 df; G² = 1005.1)
3. [AB][AC] (2 df; G² = 27.2)

Identified Logit Model (Model # 1. [AB][AC][BC])

$$\text{logit}^{AB|C}_{ij} = \log\frac{m^{AB|C}_{ij1}}{m^{AB|C}_{ij2}} = w^{AB|C} + w^{AB|C}_{1(i)} + w^{AB|C}_{2(j)}$$
![Page 173: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/173.jpg)
Goodness of Fit Statistics for determining the effect of A, B, C on D

4. [ABC][AD][BD][CD] (4 df; G² = 1.2)
5. [ABC][BD][CD] (5 df; G² = 4.0)
6. [ABC][AD][CD] (5 df; G² = 262.5)
7. [ABC][AD][BD] (5 df; G² = 15.7)

Identified Logit Model (Model # 5. [ABC][BD][CD])

$$\text{logit}^{ABC|D}_{ijk} = w^{ABC|D} + w^{ABC|D}_{2(j)} + w^{ABC|D}_{3(k)}$$
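One way to see why model 5 is singled out is to compare each reduced model with model 4: the difference in G² is referred to a chi-square distribution with 1 degree of freedom. The sketch below does this with the standard library only, using the identity that the upper-tail probability of a 1-df chi-square at x equals erfc(sqrt(x / 2)). The helper name and the labels are illustrative; the slides themselves do not report these p-values.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Model 4: [ABC][AD][BD][CD]
BASE_DF, BASE_G2 = 4, 1.2

# Each reduced model drops one two-factor term from model 4.
reduced = {
    "drop AD (model 5)": (5, 4.0),
    "drop BD (model 6)": (5, 262.5),
    "drop CD (model 7)": (5, 15.7),
}

results = {}
for label, (df, g2) in reduced.items():
    delta_g2 = g2 - BASE_G2          # change in G2 on (df - BASE_DF) = 1 df
    results[label] = (delta_g2, chi2_sf_1df(delta_g2))

for label, (delta_g2, p_value) in results.items():
    print(f"{label}: delta G2 = {delta_g2:.1f}, p = {p_value:.4g}")
```

Only dropping the A-D term leaves the fit essentially unchanged, which is consistent with a logit model for D that contains effects of B and C but not A.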
![Page 174: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/174.jpg)
Example 2

In this example we are looking at

1. Socioeconomic Status (SES)
2. Sex
3. IQ
4. Parental Encouragement for Higher Education (PE)
5. College Plans (CP)
![Page 175: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/175.jpg)
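Social Class, Parental Encouragement, IQ, and Educational Aspirations

| Sex | IQ | College Plans | Parental Encouragement | SES: L | SES: LM | SES: UM | SES: H |
|---|---|---|---|---|---|---|---|
| M | L | Yes | Low | 4 | 2 | 8 | 4 |
| M | L | Yes | High | 13 | 27 | 47 | 39 |
| M | L | No | Low | 349 | 232 | 166 | 48 |
| M | L | No | High | 64 | 84 | 91 | 57 |
| M | LM | Yes | Low | 9 | 7 | 6 | 5 |
| M | LM | Yes | High | 33 | 64 | 74 | 123 |
| M | LM | No | Low | 207 | 201 | 120 | 47 |
| M | LM | No | High | 72 | 95 | 110 | 90 |
| M | UM | Yes | Low | 12 | 12 | 17 | 9 |
| M | UM | Yes | High | 38 | 93 | 148 | 224 |
| M | UM | No | Low | 126 | 115 | 92 | 41 |
| M | UM | No | High | 54 | 92 | 100 | 65 |
| M | H | Yes | Low | 10 | 17 | 6 | 8 |
| M | H | Yes | High | 49 | 119 | 198 | 414 |
| M | H | No | Low | 67 | 79 | 42 | 17 |
| M | H | No | High | 43 | 59 | 73 | 54 |
| F | L | Yes | Low | 5 | 11 | 7 | 6 |
| F | L | Yes | High | 9 | 29 | 36 | 36 |
| F | L | No | Low | 454 | 285 | 163 | 50 |
| F | L | No | High | 44 | 61 | 72 | 58 |
| F | LM | Yes | Low | 5 | 19 | 13 | 5 |
| F | LM | Yes | High | 14 | 47 | 75 | 110 |
| F | LM | No | Low | 312 | 236 | 193 | 70 |
| F | LM | No | High | 47 | 88 | 90 | 76 |
| F | UM | Yes | Low | 8 | 12 | 12 | 12 |
| F | UM | Yes | High | 20 | 62 | 91 | 230 |
| F | UM | No | Low | 216 | 164 | 174 | 48 |
| F | UM | No | High | 35 | 85 | 100 | 81 |
| F | H | Yes | Low | 13 | 15 | 20 | 13 |
| F | H | Yes | High | 28 | 72 | 142 | 360 |
| F | H | No | Low | 96 | 113 | 81 | 49 |
| F | H | No | High | 24 | 50 | 77 | 98 |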
![Page 176: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/176.jpg)
The Path Diagram

[Path diagram with nodes: SES, Sex, IQ, PE, CP]
![Page 177: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/177.jpg)
The path diagram suggests
1. Predicting Parental Encouragement from Sex, Socioeconomic Status, and IQ, then

2. Predicting College Plans from Parental Encouragement, Sex, Socioeconomic Status, and IQ.
![Page 178: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/178.jpg)
Goodness of Fit Statistics for determining the effect of A, B, C on D

(A = Social class, B = IQ, C = Sex, D = Parental Encouragement, E = College Plans)

1. [ABC][AD][BD][CD] (24 df; G² = 55.81)
2. [ABC][ABD][CD] (15 df; G² = 34.60)
3. [ABC][BCD][ACD] (18 df; G² = 31.48)
4. [ABC][ABD][BCD] (12 df; G² = 22.44)
5. [ABC][ABD][ACD] (12 df; G² = 22.45)
6. [ABC][ABD][ACD][BCD] (9 df; G² = 9.22)
![Page 179: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/179.jpg)
Logit Parameters: Model [ABC][ABD][ACD][BCD]

Constant term: $w^{ABC|D}$ = 0.124

Main Effects

Social Class (L, LM, UM, H): $w^{ABC|D}_{1(i)}$ = -1.178, -0.384, 0.222, 1.340
IQ (L, LM, UM, H): $w^{ABC|D}_{2(j)}$ = -0.772, -0.226, 0.210, 0.788
Sex (M, F): $w^{ABC|D}_{3(k)}$ = 0.304, -0.304
![Page 180: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/180.jpg)
Two-factor Interactions

IQ by Social Class

| Social Class | IQ: L | IQ: LM | IQ: UM | IQ: H |
|---|---|---|---|---|
| L | -0.016 | -0.098 | -0.058 | -0.026 |
| LM | 0.066 | 0.032 | 0.144 | -0.244 |
| UM | 0.074 | -0.044 | -0.138 | 0.108 |
| H | -0.126 | -0.086 | 0.048 | 0.164 |
![Page 181: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/181.jpg)
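Social Class by Sex

| Social Class | Sex: M | Sex: F |
|---|---|---|
| L | 0.140 | -0.140 |
| LM | -0.052 | 0.052 |
| UM | 0.018 | -0.018 |
| H | -0.106 | 0.106 |

IQ by Sex

| IQ | Sex: M | Sex: F |
|---|---|---|
| L | -0.126 | 0.126 |
| LM | -0.016 | 0.016 |
| UM | 0.018 | -0.018 |
| H | 0.122 | -0.122 |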
![Page 182: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/182.jpg)
Goodness of Fit Statistics for determining the effect of A, B, C, D on E

(A = Social class, B = IQ, C = Sex, D = Parental Encouragement, E = College Plans)

7. [ABCD][E][CD] (63 df; G² = 4497.51)
8. [ABCD][AE][BE][CE][DE] (55 df; G² = 73.82)
9. [ABCD][BCE][AE][DE] (52 df; G² = 59.55)
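The degrees of freedom quoted for these models can be recovered by counting: there is one logit for each combination of the explanatory variables (4 × 4 × 2 × 2 = 64), and each logit term uses the product of (levels − 1) over its variables as free parameters. The sketch below is a hedged illustration of that count; the function name and the one-letter encoding of the generators are assumptions, and the level counts are read off the data table (four SES classes, four IQ classes, two sexes, two levels of encouragement).

```python
from itertools import combinations
from math import prod

# Number of levels of each explanatory variable.
LEVELS = {"A": 4, "B": 4, "C": 2, "D": 2}


def logit_df(generators, response="E", levels=LEVELS):
    """Residual df of the logit model implied by a loglinear model."""
    terms = set()
    for gen in generators:
        if response not in gen:
            continue  # no contribution to the logit
        others = sorted(set(gen) - {response})
        # Include all lower-order terms implied by the hierarchy;
        # the empty tuple is the constant term.
        for size in range(len(others) + 1):
            terms.update(combinations(others, size))
    n_params = sum(prod(levels[v] - 1 for v in term) for term in terms)
    n_logits = prod(levels.values())
    return n_logits - n_params


print(logit_df(["ABCD", "E", "CD"]))
print(logit_df(["ABCD", "AE", "BE", "CE", "DE"]))
print(logit_df(["ABCD", "BCE", "AE", "DE"]))
```

For example, model 9 adds the B-by-C interaction (3 × 1 = 3 parameters) to the main-effects logit model 8, which accounts for the drop in degrees of freedom between the two.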
![Page 183: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/183.jpg)
Logit Parameters for Predicting College Plans Using Model 9: [ABCD][BCE][AE][DE]

Constant term: $w^{ABCD|E}$ = -1.292

Main Effects

Social Class (L, LM, UM, H): $w^{ABCD|E}_{1(i)}$ = -0.650, -0.200, 0.062, 0.790
IQ (L, LM, UM, H): $w^{ABCD|E}_{2(j)}$ = -0.840, -0.300, 0.266, 0.876
Sex (M, F): $w^{ABCD|E}_{3(k)}$ = 0.082, -0.082
Parental Encouragement (L, H): $w^{ABCD|E}_{4(l)}$ = -1.214, 1.214
![Page 184: Discrete Multivariate Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062301/56815b69550346895dc95e9c/html5/thumbnails/184.jpg)
Two-factor Interactions

IQ by Sex

| IQ | Sex: M | Sex: F |
|---|---|---|
| L | -0.134 | 0.134 |
| LM | -0.078 | 0.078 |
| UM | 0.094 | -0.094 |
| H | 0.118 | -0.118 |