Data Mining through Linear Modeling
Factors associated with the clinical characteristic “Dominance” in data from “Psychosocial Treatments for Cocaine Dependence”
[Arch Gen Psychiatry 56: June 1999]
Tim Hare
STA531 Fall 2009
Cross-sectional (Month = 6) modeling of DOMI (Dominance) as a primary outcome
• The outcome measure under consideration is the clinical attribute/behavior “dominance”.
• A preliminary hypothesis partly directed the search: is Dominance correlated with depression scores and with psychological characteristic scores?
• A focus on personality-trait scores and on assessed alcohol use yielded a subset of candidate predictors strongly correlated with DOMI:
– VIND = vindictive
– ALC_SUB = ASI Alcohol composite
– INTR = intrusive
– COLD = cold
• Can exploratory analysis of these predictors help develop the hypothesis?
Correlation data for raw score VIND vs. DOMI
Pearson Correlation Coefficients, N = 456
Prob > |r| under H0: Rho=0
          DOMI                 VIND
DOMI      1.00000              0.80637 (<.0001)
VIND      0.80637 (<.0001)     1.00000
When the data are binned by score, the trend in the 95% CIs suggests that a categorical variable based on this binning may be significant.
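The binning idea can be sketched as follows. This is a minimal Python sketch, not the author's SAS code. It assumes equal-count bins of the predictor and a normal-approximation 95% interval for the mean DOMI in each bin, because the slides do not say how the bins were formed. The function name `binned_means_ci` and the toy data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def binned_means_ci(x, y, n_bins=4, level=0.95):
    """Sort (x, y) pairs by x, cut them into n_bins equal-count bins, and return
    (mean x, mean y, lower, upper) for each bin. Uses a normal critical value,
    so the interval is approximate; every bin needs at least 2 observations."""
    pairs = sorted(zip(x, y))
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    size = len(pairs) / n_bins
    result = []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        ys = [p[1] for p in chunk]
        m = mean(ys)
        half_width = z * stdev(ys) / len(ys) ** 0.5
        result.append((mean(p[0] for p in chunk), m, m - half_width, m + half_width))
    return result

# Hypothetical toy data standing in for (VIND, DOMI) raw scores
vind = list(range(20))
domi = [0.5 * i + (i % 3) for i in range(20)]
for row in binned_means_ci(vind, domi):
    print(row)
```

A steadily rising sequence of bin means with non-overlapping intervals is the kind of "trend in the 95% CIs" that motivates the later 2-level recoding.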
Correlation data for raw score INTR vs. DOMI
Pearson Correlation Coefficients, N = 456
Prob > |r| under H0: Rho=0
          DOMI                 INTR
DOMI      1.00000              0.79210 (<.0001)
INTR      0.79210 (<.0001)     1.00000
Again, a trend appears in the 95% CIs.
Correlation data for raw score COLD vs. DOMI
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
          DOMI                 COLD
DOMI      1.00000              0.73152 (<.0001)
          456                  456
COLD      0.73152 (<.0001)     1.00000
          456                  457
Again, a trend appears in the 95% CIs.
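The N values in the COLD table are 457 for COLD alone and 456 for the DOMI-COLD pair. This is consistent with missing values being dropped pair by pair. Below is a minimal Python sketch of a Pearson correlation computed that way, together with the t statistic for H0: Rho=0. The helper name and the data are hypothetical. The p-value uses a normal approximation to t(n - 2) rather than the exact t distribution that SAS reports.

```python
from math import sqrt
from statistics import NormalDist

def pearson_pairwise(x, y):
    """Pearson r on the observations where both x and y are present (None = missing).
    Returns (r, n, t, p): t = r * sqrt((n - 2) / (1 - r^2)) tests H0: rho = 0,
    and p is a two-sided normal approximation to the t(n - 2) p-value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    r = sxy / sqrt(sxx * syy)
    t = r * sqrt((n - 2) / (1 - r * r))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return r, n, t, p

# Hypothetical scores: the last patient has COLD but no DOMI, so the pair uses n = 4
cold = [1, 2, 3, 4, 6]
domi = [2, 4, 5, 4, None]
print(pearson_pairwise(domi, cold))
```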
Cross-sectional (Month = 6) modeling of DOMI (Dominance) as a primary outcome
• A number of potential factors were evaluated. However, a focus on related psychological scores and on assessed alcohol use yielded a subset of relatively strong predictors:
– VIND = vindictive
– ALC_SUB = ASI Alcohol composite
– INTR = intrusive
– COLD = cold
• Based on the preliminary exploration, each of these patient scores was recoded as a new 2-LEVEL categorical variable. The level was assigned by whether the score fell below or above its mean (“low” vs. “high”):
– VINDGROUP
– ALC_SUBGROUP
– INTRGROUP
– COLDGROUP
• Additional crossed 2- and 3-way group variables were created from the GROUP variables above (combinations of their “high” and “low” levels):
– VA = vindictive × ASI Alcohol composite
– VI = vindictive × intrusive
– VAI = vindictive × ASI Alcohol composite × intrusive
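Below is a minimal Python sketch of the recoding step. It assumes a split at the mean of the non-missing scores and uses 'low'/'high' labels. The SAS output later uses level names such as 1_Q12VIND and 2_Q34VIND, so the labels, helper names and data here are hypothetical.

```python
from statistics import mean

def dichotomize(scores, prefix):
    """Code each score as '<prefix>_low' or '<prefix>_high' relative to the mean of the
    non-missing scores. Scores exactly at the mean go to 'low' (an assumption)."""
    cut = mean(s for s in scores if s is not None)
    return [None if s is None else f"{prefix}_{'high' if s > cut else 'low'}"
            for s in scores]

def cross(*groups):
    """Join per-patient group codes into one crossed label (e.g. VA, VI, VAI);
    the label is missing if any component is missing."""
    return [None if None in combo else "*".join(combo) for combo in zip(*groups)]

# Hypothetical raw scores for four patients
vind = dichotomize([1, 2, 3, 10], "VIND")
alc = dichotomize([0.1, 0.5, 0.2, None], "ALC_SUB")
print(cross(vind, alc))
```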
• Before we get started, what about any MODELING ASSUMPTIONS?
OUTCOME MEASURE = DOMI raw score: NORMAL PROBABILITY PLOT
Pearson Correlation Coefficients, N = 456
Prob > |r| under H0: Rho=0
           DOMI                 zdomi
DOMI       1.00000              0.94328 (<.0001)
zdomi      0.94328 (<.0001)     1.00000
(zdomi: Rank for Variable DOMI)
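The zdomi check correlates DOMI with its normal scores. A correlation near 1 means the normal probability plot is close to a straight line. Below is a minimal sketch that assumes Blom normal scores with tied values averaged. This is a common choice, but the slides do not state which scoring was used. The function names and data are hypothetical.

```python
from statistics import NormalDist

def blom_scores(values):
    """Blom normal scores Phi^-1((r - 3/8) / (n + 1/4)), where r is the 1-based rank
    and tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical outcome scores; r close to 1 is consistent with approximate normality
domi = [3.0, 1.0, 2.0, 7.5, 4.0, 2.5]
print(round(pearson(domi, blom_scores(domi)), 3))
```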
Potential 2-way & 3-way CROSS FACTOR GROUP VIEWS (plots: VINDGROUP-INTRGROUP, VINDGROUP-ALC_SUBGROUP, VINDGROUP-ALC_SUBGROUP-INTRGROUP)
1) Evidence of non-homogeneous variance.
2) The groups are also unbalanced (dissimilar counts).
Therefore, USE PROC MIXED.
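Non-homogeneous variance across groups can also be checked numerically. Below is a minimal sketch of Levene's W statistic. It becomes the Brown-Forsythe variant when deviations are taken from the group median. The slides judged variance homogeneity graphically, so the test and the example groups here are illustrative assumptions.

```python
from statistics import mean, median

def levene_w(groups, center=median):
    """Levene's W statistic for equality of variances across k groups.
    center=median gives the Brown-Forsythe variant; center=mean gives classic Levene.
    Under H0, W is approximately F(k - 1, N - k); large W suggests unequal variances."""
    k = len(groups)
    centers = [center(g) for g in groups]
    z = [[abs(v - c) for v in g] for g, c in zip(groups, centers)]  # absolute deviations
    n_total = sum(len(zi) for zi in z)
    z_bar = sum(sum(zi) for zi in z) / n_total
    z_means = [mean(zi) for zi in z]
    between = sum(len(zi) * (m - z_bar) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical DOMI scores in two crossed groups of unequal size and spread
groups = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12]]
print([len(g) for g in groups], round(levene_w(groups), 3))
```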
Does any hypothesis suggest itself?
• Let’s take a look at the new CATEGORICAL VARIABLES we created, graphically…
• Let’s also take a look at the new observational CROSS FACTOR data groups...
• Let’s also look for any 2-way or 3-way INTERACTIONS between the variables…
• Will a story emerge?
NEW CATEGORICAL VARIABLES (NOT RAW DATA): MEAN DOMI SCORE by LEVEL (panels: ALC_SUBGROUP, INTRGROUP, VINDGROUP)
Some good evidence that our 2-LEVEL categorization correlates with our OUTCOME measure.
What about combinations of the above CATEGORIES by LEVEL?
2-way & 3-way CROSS FACTOR GROUPS: MEAN DOMI SCORES (panels: VIND*INTR, VIND*ALC_SUB, VIND*ALC_SUB*INTR)
Good story to explore…
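The cell means behind these bar charts can be tabulated with a short sketch like the one below, together with the cell counts that reveal the unbalanced groups. `cell_summary`, the label strings and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cell_summary(labels, outcome):
    """Mean outcome and count per crossed-group label, skipping missing values.
    Unequal counts across cells signal an unbalanced design."""
    cells = defaultdict(list)
    for label, y in zip(labels, outcome):
        if label is not None and y is not None:
            cells[label].append(y)
    return {label: (mean(ys), len(ys)) for label, ys in sorted(cells.items())}

# Hypothetical crossed VIND*INTR labels and DOMI scores
labels = ["L*L", "H*H", "L*L", "H*L", None, "H*H", "H*H"]
domi = [1.0, 11.0, 2.0, 4.0, 3.0, 9.0, 10.0]
for label, (m, n) in cell_summary(labels, domi).items():
    print(label, round(m, 2), n)
```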
What about “interaction”???
Possible interaction between VINDGROUP and ALC_SUBGROUP
Possible interaction between INTRGROUP and ALC_SUBGROUP
Possible interaction between INTRGROUP and VINDGROUP
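Whether two lines in an interaction plot are parallel can be summarized as a difference of differences. Below is a minimal sketch. The cell means plugged in are the cross-sectional (Month=6) LSMEANS reported for the 3-way groups, split by ALC_SUBGROUP. The gap between the two results (about 4.45) is the corresponding 3-way interaction contrast. Significance still has to come from the fitted model (the Type 3 tests), not from these point values.

```python
def interaction_contrast(m_ll, m_lh, m_hl, m_hh):
    """Difference of differences for a 2x2 layout of cell means (factor A level first,
    factor B level second): (m_hh - m_hl) - (m_lh - m_ll).
    0 means parallel lines in the interaction plot; larger |value| means less parallel."""
    return (m_hh - m_hl) - (m_lh - m_ll)

# Month-6 LSMEANS for VINDGROUP (A) x INTRGROUP (B), from the 3-way LSMEANS table
alc_low = interaction_contrast(1.2440, 2.5681, 3.5733, 11.0133)
alc_high = interaction_contrast(1.9119, 5.5909, 4.4362, 9.7837)
print(round(alc_low, 4), round(alc_high, 4))  # 6.1159 1.6685
```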
Graphical analysis to examine a potential 3-WAY interaction VINDGROUP*INTRGROUP*ALC_SUBGROUP, plotted separately for ALC_SUBGROUP = LOW and ALC_SUBGROUP = HIGH
PREVIEW for longitudinal modeling: the correlation noted in the CROSS-SECTIONAL (Month=6) data seems to persist across time…
We therefore suspect that a “repeated measures” (longitudinal) model built from the same variables may also fit well.
Exploration leads to Hypothesis
• The level of the clinical characteristic “Dominance” can likely be explained by a model using the categorical variables derived from the raw scores (VINDGROUP, INTRGROUP, ALC_SUBGROUP, COLDGROUP).
• Interaction terms (VINDGROUP*ALC_SUBGROUP, VINDGROUP*INTRGROUP, INTRGROUP*ALC_SUBGROUP) will likely play a role in the modeling process given the preliminary 2-/3-way interaction plots (COLDGROUP*<other> not compelling, data not shown).
• Finally, several graphs suggest interactions that may be interpretable and, in many cases, significant (we explore these further with contrasts later).
Cross-sectional (Month=6) modeling results support our suspicions
PROC MIXED (α = 0.05)
Type 3 Tests of Fixed Effects
Effect                              Num DF   Den DF   F Value   Pr > F
COLDGROUP                           1        447      13.79     0.0002
VINDGROUP                           1        447      136.66    <.0001
ALC_SUBGROUP                        1        447      6.13      0.0136
INTRGROUP                           1        447      163.16    <.0001
VINDGROUP*INTRGROUP                 1        447      33.72     <.0001
VINDGROUP*ALC_SUBGROUP              1        447      9.11      0.0027
VINDGROUP*ALC_SUBGROUP*INTRGROUP    2        447      5.50      0.0044
CROSS-SECTIONAL (Month=6) DOMI SCORE: LSMEANS for the 3-way crossed observational groups
Least Squares Means (Effect: VINDGROUP*ALC_SUBGROUP*INTRGROUP; DF = 447; Alpha = 0.05)
VINDGROUP   ALC_SUBGROUP   INTRGROUP   Estimate   Std Error   t Value   Pr > |t|   Lower     Upper
1_Q12VIND   1_Q12ALC_SUB   1_Q12INTR   1.2440     0.2747      4.53      <.0001     0.7042    1.7838
1_Q12VIND   1_Q12ALC_SUB   2_Q34INTR   2.5681     0.6553      3.92      0.0001     1.2802    3.8559
1_Q12VIND   2_Q34ALC_SUB   1_Q12INTR   1.9119     0.3445      5.55      <.0001     1.2349    2.5889
1_Q12VIND   2_Q34ALC_SUB   2_Q34INTR   5.5909     0.6210      9.00      <.0001     4.3705    6.8114
2_Q34VIND   1_Q12ALC_SUB   1_Q12INTR   3.5733     0.4871      7.34      <.0001     2.6160    4.5306
2_Q34VIND   1_Q12ALC_SUB   2_Q34INTR   11.0133    0.3890      28.31     <.0001     10.2489   11.7777
2_Q34VIND   2_Q34ALC_SUB   1_Q12INTR   4.4362     0.5418      8.19      <.0001     3.3714    5.5009
2_Q34VIND   2_Q34ALC_SUB   2_Q34INTR   9.7837     0.4263      22.95     <.0001     8.9460    10.6215
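Each confidence limit in this table is the estimate plus or minus t(0.975, 447) times its standard error. Below is a minimal Python sketch that reproduces the first row. The standard library has no t quantile, so the sketch assumes a Cornish-Fisher approximation to t from the normal quantile, which is accurate to a few decimal places at these DF. Small last-digit differences from the table come from the rounded estimate and SE.

```python
from statistics import NormalDist

def t_quantile(p, df):
    """Approximate Student-t quantile from the normal quantile via the
    Cornish-Fisher expansion (Abramowitz & Stegun 26.7.5, first three terms);
    very accurate for df in the hundreds, less so for small df."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3

def lsmean_ci(estimate, se, df, alpha=0.05):
    """Two-sided (1 - alpha) limits: estimate +/- t(1 - alpha/2, df) * standard error."""
    half_width = t_quantile(1 - alpha / 2, df) * se
    return estimate - half_width, estimate + half_width

# First row of the Month-6 table (low/low/low group), 447 denominator DF
lower, upper = lsmean_ci(1.2440, 0.2747, 447)
print(round(lower, 4), round(upper, 4))
```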
If you are Vindictive=Low and Intrusive=Low, you probably don’t have to worry about being overly Dominant after drinking…
Estimates (labels give the VIND, ALC_SUB, INTR levels in that order; L = low, H = high)
Label   Estimate   Standard Error   DF    t Value   Pr > |t|
LLL     1.2440     0.2747           447   4.53      <.0001
LHL     1.9119     0.3445           447   5.55      <.0001
Contrasts
Label     Num DF   Den DF   F Value   Pr > F
LHL-LLL   1        447      2.82      0.0938
Not significant at the 95% confidence level (α = 0.05).
Does intrusiveness trump alcohol in low vindictives?
Estimates
Label          Estimate   Standard Error   DF    t Value   Pr > |t|
LLL            1.24       0.27             447   4.53      <.0001
LHL            1.91       0.34             447   5.55      <.0001
LHH            5.59       0.62             447   9.00      <.0001
LLH            2.57       0.66             447   3.92      0.0001
AVG(LLL,LHL)   1.58       0.24             447   6.58      <.0001
AVG(LHH,LLH)   4.08       0.45             447   9.04      <.0001
Contrasts
Label                        Num DF   Den DF   F Value   Pr > F
LHL-LLL                      1        447      2.82      0.0938
LHH-LLH                      1        447      11.21     0.0009
AVG(LHH,LLH)-AVG(LLL,LHL)    1        447      24.87     <.0001
(Bar chart panels: VIND low, ALC_SUB low/high, INTR low; VIND low, ALC_SUB low/high, INTR high)
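The Estimates and Contrasts above are linear combinations of the LSMEANS. Below is a minimal sketch of how an estimate, its SE and the 1-DF F (= t²) are formed, assuming the LSMEANS and their covariance matrix are available. The covariance used here is a diagonal PLACEHOLDER built from the reported SEs, because the model's off-diagonal covariances are not shown. As a result, the estimates match the output (1.58, 4.08) but the SEs and F values do not. For example, the sketch gives about 0.22 instead of the reported 0.24 for AVG(LLL,LHL).

```python
from math import sqrt

def contrast(coefs, means, cov):
    """Estimate, standard error and 1-DF Wald F for the linear contrast c'b of the
    LSMEANS b with covariance matrix V: est = c'b, SE = sqrt(c'Vc), F = (est / SE)^2."""
    est = sum(c * m for c, m in zip(coefs, means))
    var = sum(ci * cj * cov[i][j]
              for i, ci in enumerate(coefs)
              for j, cj in enumerate(coefs))
    se = sqrt(var)
    return est, se, (est / se) ** 2

# Month-6 LSMEANS in the order LLL, LHL, LHH, LLH
b = [1.2440, 1.9119, 5.5909, 2.5681]
se = [0.2747, 0.3445, 0.6210, 0.6553]
# PLACEHOLDER covariance: diagonal only (the fitted model's covariances are not shown)
V = [[se[i] ** 2 if i == j else 0.0 for j in range(4)] for i in range(4)]

for label, c in [("AVG(LLL,LHL)", [0.5, 0.5, 0, 0]),
                 ("AVG(LHH,LLH)", [0, 0, 0.5, 0.5]),
                 ("AVG(LHH,LLH)-AVG(LLL,LHL)", [-0.5, -0.5, 0.5, 0.5])]:
    est, s, f = contrast(c, b, V)
    print(label, round(est, 2), round(s, 3), round(f, 2))
```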
CROSS-SECTIONAL MODEL (Month=6): Good fit? Residual plots…
Pearson Correlation Coefficients, N = 456
Prob > |r| under H0: Rho=0
           Resid                zresid
Resid      1.00000              0.97126 (<.0001)
zresid     0.97126 (<.0001)     1.00000
(Resid: Residual; zresid: Rank for Variable Resid)
Repeated Measures Analysis of the entire Month 1-Month 6 data set
• The original cross-sectional model was fit to the full data set, and candidate COV/VAR structures were compared to adjust for possible correlation among repeated measurements on the same patient:
– CS, UN, AR(1), TOEP, CSH, ARH(1)
• COV/VAR type “UN” (unstructured) was retained. It had the smallest -2 Res Log Likelihood, significantly smaller than all the others after adjusting for the difference in DF.
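Comparing -2 Res Log Likelihood across structures only makes sense alongside the number of covariance parameters each structure estimates. Below is a minimal sketch of those counts and of the REML likelihood-ratio statistic for nested structures with the same fixed effects. It assumes six measurement occasions (Month 1 to Month 6), and the helper names are hypothetical. Each of the other five structures is a special case of UN, so each can be tested against UN this way. The statistic can be referred to a chi-square distribution, for example with `scipy.stats.chi2.sf(stat, df)`.

```python
def n_cov_params(structure, t):
    """Covariance parameters for a t x t within-patient matrix under common
    repeated-measures structures (t = number of time points)."""
    counts = {
        "CS": 2,                  # one variance + one common covariance
        "AR(1)": 2,               # one variance + one autocorrelation
        "TOEP": t,                # one parameter per lag 0 .. t-1
        "CSH": t + 1,             # t variances + one common correlation
        "ARH(1)": t + 1,          # t variances + one autocorrelation
        "UN": t * (t + 1) // 2,   # every variance and covariance estimated freely
    }
    return counts[structure]

def lrt(neg2rll_simple, neg2rll_complex, q_simple, q_complex):
    """REML likelihood-ratio statistic and its DF for a structure nested in a richer one
    (same fixed effects); compare the statistic with a chi-square on that many DF."""
    return neg2rll_simple - neg2rll_complex, q_complex - q_simple

t = 6  # assumed: one measurement per month, Month 1 .. Month 6
for s in ["CS", "AR(1)", "TOEP", "CSH", "ARH(1)", "UN"]:
    print(s, n_cov_params(s, t))
```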
Type 3 Tests of Fixed Effects
Effect                              Num DF   Den DF   F Value   Pr > F
COLDGROUP                           1        102      22.20     <.0001
VINDGROUP                           1        102      132.23    <.0001
ALC_SUBGROUP                        1        102      0.06      0.8081
INTRGROUP                           1        102      125.09    <.0001
VINDGROUP*INTRGROUP                 1        102      39.29     <.0001
VINDGROUP*ALC_SUBGROUP              1        102      6.93      0.0098
VINDGROUP*ALC_SUBGROUP*INTRGROUP    2        102      7.16      0.0012
ALC_SUBGROUP was retained even though its main effect is not significant (p = 0.8081), because it participates in significant higher-order terms.
LONGITUDINAL MODEL: Good fit? Residual plots…
Pearson Correlation Coefficients, N = 456
Prob > |r| under H0: Rho=0
           Resid                zresid
Resid      1.00000              0.95069 (<.0001)
zresid     0.95069 (<.0001)     1.00000
(Resid: Residual; zresid: Rank for Variable Resid)
Repeated Measures (COV=UN, no terms removed)
Least Squares Means (Effect: VINDGROUP*ALC_SUBGROUP*INTRGROUP; DF = 102; Alpha = 0.05)
VINDGROUP   ALC_SUBGROUP   INTRGROUP   Estimate   Std Error   t Value   Pr > |t|   Lower    Upper
1_Q12VIND   1_Q12ALC_SUB   1_Q12INTR   1.6757     0.2996      5.59      <.0001     1.0815   2.2699
1_Q12VIND   1_Q12ALC_SUB   2_Q34INTR   2.9805     0.5583      5.34      <.0001     1.8730   4.0879
1_Q12VIND   2_Q34ALC_SUB   1_Q12INTR   2.0854     0.3452      6.04      <.0001     1.4006   2.7702
1_Q12VIND   2_Q34ALC_SUB   2_Q34INTR   4.1859     0.5689      7.36      <.0001     3.0575   5.3143
2_Q34VIND   1_Q12ALC_SUB   1_Q12INTR   3.5756     0.4301      8.31      <.0001     2.7224   4.4288
2_Q34VIND   1_Q12ALC_SUB   2_Q34INTR   10.0167    0.3986      25.13     <.0001     9.2261   10.8073
2_Q34VIND   2_Q34ALC_SUB   1_Q12INTR   4.2246     0.5035      8.39      <.0001     3.2259   5.2234
2_Q34VIND   2_Q34ALC_SUB   2_Q34INTR   8.0591     0.4076      19.77     <.0001     7.2506   8.8676
Estimates
Label   Estimate   Standard Error   DF    t Value   Pr > |t|
LLL     1.6757     0.2996           102   5.59      <.0001
LHL     2.0854     0.3452           102   6.04      <.0001
Contrasts
Label     Num DF   Den DF   F Value   Pr > F
LHL-LLL   1        102      1.57      0.2124
Not significant at the 95% confidence level (α = 0.05).
What about our cross-sectional finding that intrusiveness trumps alcohol in low vindictives? Does it still hold in the longitudinal analysis?
Estimates
Label          Estimate   Standard Error   DF    t Value   Pr > |t|
LLL            1.6757     0.2996           102   5.59      <.0001
LHL            2.0854     0.3452           102   6.04      <.0001
LHH            4.1859     0.5689           102   7.36      <.0001
LLH            2.9805     0.5583           102   5.34      <.0001
AVG(LLL,LHL)   1.8806     0.2790           102   6.74      <.0001
AVG(LHH,LLH)   3.5832     0.4247           102   8.44      <.0001
Contrasts
Label                        Num DF   Den DF   F Value   Pr > F
LHL-LLL                      1        102      1.57      0.2124
LHH-LLH                      1        102      2.65      0.1069
AVG(LHH,LLH)-AVG(LLL,LHL)    1        102      15.41     0.0002
The averaged contrast is still significant at the 95% confidence level. LHH-LLH on its own no longer is (p = 0.1069).
Conclusions
• The related attributes “intrusive”, “vindictiveness” and “cold”, together with the ASI Alcohol Composite score, jointly model the clinical attribute “dominance” (DOMI scores) in the clinical data.
• Longitudinal modeling (using the entire data set) and cross-sectional modeling (Month=6) support broadly the same conclusions.
• CONTRASTS can be used profitably to formally test the differences suggested by the BARCHARTS among the crossed levels of the 3 key variables (VINDGROUP, INTRGROUP and ALC_SUBGROUP). These variables correlate with DOMINANCE but have complex INTERACTIONS.
Q&A