Sequential Logistic Regression: Modeling Risk Factors and Child Outcomes Presented to NIC Chapter of ASA
October 21, 2005
Logistic Regression Model
Statistical method for relating explanatory variable(s) to the log odds of a binary outcome measure.
Dependent variable is always a binary outcome.
Independent variables may be categorical or quantitative.
p is the probability associated with the binary outcome measure.
$e^{\beta_1}$ is the odds ratio for independent variable $x_1$: the factor by which the odds are multiplied for each one-unit increase in $x_1$.
Logistic Regression Model
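$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$
The left-hand side is the log of the odds (the logit) of the outcome.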
Statistical Inference for Logistic Regression
The confidence interval for the slope $\beta_1$, estimated by $b_1$, is
$$b_1 \pm z \, SE_{b_1}$$
The confidence interval for the odds ratio is
$$\left(e^{\,b_1 - z \, SE_{b_1}},\; e^{\,b_1 + z \, SE_{b_1}}\right)$$
where $z$ is the critical value from the standard normal distribution.
Statistical Inference for Logistic Regression
To test the hypothesis $H_0: \beta_1 = 0$ we compute the test statistic
$$X^2 = \left(\frac{b_1}{SE_{b_1}}\right)^2$$
which has approximately a chi-square distribution with 1 df.
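The interval and test statistic above can be sketched in a few lines of Python. This is a minimal illustration, not output from the study: the function name `wald_inference` and the inputs `b1 = 0.5`, `se_b1 = 0.2` are hypothetical values chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist


def wald_inference(b1, se_b1, confidence=0.95):
    """Wald confidence intervals and chi-square test for one slope."""
    # Critical value z from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = b1 - z * se_b1, b1 + z * se_b1
    chi_square = (b1 / se_b1) ** 2
    # With 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return {
        "slope_ci": (lower, upper),
        "odds_ratio": math.exp(b1),
        "odds_ratio_ci": (math.exp(lower), math.exp(upper)),
        "chi_square": chi_square,
        "p_value": p_value,
    }


# Hypothetical slope and standard error, for illustration only.
result = wald_inference(b1=0.5, se_b1=0.2)
print(result)
```

Note that the interval for the odds ratio is obtained by exponentiating the endpoints of the interval for the slope, so it is not symmetric around the odds ratio itself.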
Logistic Regression with One Predictor
In a large sample of college students, the proportion who frequently engage in binge drinking is p = 3,314/17,096 = 0.1938.
The odds for this outcome are thus:
$$\frac{p}{1-p} = \frac{0.1938}{0.8062} = 0.24$$
This example is borrowed from Introduction to the Practice of Statistics by Moore and McCabe (2006).
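Is Gender a Predictor?

                    Gender
Binge Drinking?     Male     Female    Total
Yes                 1,630    1,684      3,314
No                  5,550    8,232     13,782
Total               7,180    9,916     17,096

Odds for males:
$$\frac{p}{1-p} = \frac{0.2270}{0.7730} = 0.294$$
Odds for females:
$$\frac{p}{1-p} = \frac{0.1698}{0.8302} = 0.205$$
Log odds for males: $\ln(0.294) = -1.22$
Log odds for females: $\ln(0.205) = -1.58$
Computed from the unrounded counts, the log odds are -1.23 and -1.59.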
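The odds and log odds for each gender can be recomputed directly from the table counts. The sketch below uses only the counts shown in the table; the names `counts` and `odds_summary` are illustrative.

```python
import math

# Counts from the binge drinking by gender table.
counts = {
    "male": {"yes": 1630, "no": 5550},
    "female": {"yes": 1684, "no": 8232},
}


def odds_summary(yes, no):
    """Return the proportion, the odds, and the log odds for one group."""
    p = yes / (yes + no)
    odds = p / (1 - p)
    return p, odds, math.log(odds)


summary = {group: odds_summary(c["yes"], c["no"]) for group, c in counts.items()}
for group, (p, odds, log_odds) in summary.items():
    print(f"{group}: p={p:.4f} odds={odds:.3f} log odds={log_odds:.2f}")
```

Working from the counts rather than from rounded odds gives log odds of about -1.23 for males and -1.59 for females.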
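Interpreting the LogReg Model
The model for this example is:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1$$
For females ($x_1 = 0$) we have:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 (0) = \beta_0$$
Thus the intercept $\beta_0$ is the log odds for females, and its estimate is:
$$b_0 = \ln\left(\frac{p}{1-p}\right) = -1.59$$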
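Interpreting the LogReg Model
The estimate of the slope is the difference between the log odds for males and the log odds for females:
$$b_1 = \ln\left(\frac{p_1}{1-p_1}\right) - \ln\left(\frac{p_0}{1-p_0}\right) = -1.23 - (-1.59) = 0.36$$
where $p_1$ and $p_0$ are the proportions for males ($x_1 = 1$) and females ($x_1 = 0$).
The fitted model is: $\log(\text{ODDS}) = -1.59 + 0.36x$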
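As a check on the fitted model, the coefficients can also be obtained by maximum likelihood. The sketch below is a minimal Newton-Raphson fit written for this one-predictor case using the gender table counts. It is an assumed illustration of how a statistical package might arrive at the estimates, not the procedure used in the presentation; `groups` and `fit_logistic` are hypothetical names.

```python
import math

# Grouped data: (x, number of students, number of frequent binge drinkers).
groups = [
    (0, 9916, 1684),  # females, x = 0
    (1, 7180, 1630),  # males, x = 1
]


def fit_logistic(groups, iterations=25, tol=1e-10):
    """Newton-Raphson fit of log(odds) = b0 + b1 * x to grouped binary data."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, n, y in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p          # observed minus expected events
            w = n * p * (1.0 - p)      # binomial variance weight
            u0 += resid
            u1 += x * resid
            i00 += w
            i01 += x * w
            i11 += x * x * w
        # Solve the 2x2 system (information matrix) * step = score.
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic(groups)
print(f"log(ODDS) = {b0:.2f} + {b1:.2f}x")
```

Because the model has one binary predictor, the maximum likelihood estimates coincide with the log odds for females and the difference in log odds between males and females.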
Meaning of the Odds Ratio
The odds ratio is:
$$\frac{\text{ODDS}_{\text{males}}}{\text{ODDS}_{\text{females}}} = \frac{e^{-1.59 + 0.36}}{e^{-1.59}} = e^{0.36} = 1.43$$
Interpretation: the odds of being a frequent binge drinker for males are 1.43 times the odds for females.
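The fitted model can also be turned back into predicted probabilities, which makes the odds ratio easier to relate to the original table. A small sketch using the rounded coefficients from the fitted model (the names `B0`, `B1`, and `predicted_probability` are illustrative):

```python
import math

# Rounded coefficients of the fitted model log(ODDS) = -1.59 + 0.36x.
B0, B1 = -1.59, 0.36


def predicted_probability(x):
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    log_odds = B0 + B1 * x
    return 1.0 / (1.0 + math.exp(-log_odds))


odds_ratio = math.exp(B1)
p_female = predicted_probability(0)
p_male = predicted_probability(1)
print(f"odds ratio={odds_ratio:.2f} p(female)={p_female:.3f} p(male)={p_male:.3f}")
```

Because the coefficients are rounded to two decimals, the predicted probabilities agree with the observed proportions only to about three decimal places.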
Multivariate Logistic Regression
The multivariate case has the same statistical concepts but the computations are more difficult because of the potential correlation among multiple predictors.
It is easy to conduct the analysis using a statistical software package.
Overview of Study
Children grow up within the context of personality, family, neighborhood, and society.
They grow up with both disadvantages and opportunities, problems and strengths, referred to here as risk and protective factors.
Examples of commonly understood risk factors include low birth weight, child maltreatment, illness, and neighborhood violence.
Examples of commonly understood protective factors include individual verbal communication skills, the capacity for empathy, problem-solving skills, frustration tolerance, the presence of multiple and consistent caregivers, access to health care and social services, and the concrete, social, and affective support of family and friends.
The aim of this study was to empirically measure risk and protective factors at the individual, family, and neighborhood level and to relate them to poor short- and longer-term outcomes such as health problems, behavioral and cognitive development, and maltreatment.
Methods -- Subjects
The 219 mother-infant dyads recruited for this study were part of a larger cohort recruited in waves over four years, beginning in 1990, as part of the Capella Project, a twenty-year longitudinal study funded by NIH.
Data used in the current analysis were collected over a period of approximately 4-5 years.
Infants in the study were all under 18 months of age when they entered the study.
Methods -- Instruments
Extensive information was collected during the primary maternal interview. The main tools were the interview and self-report inventories, a combination of study-developed and standardized instruments.
Maternal information: use of alcohol and drugs; physical and psychological health; personal history of physical, sexual, and emotional abuse; family functioning and daily life stressors; neighborhood conditions.
Child information: behavior; health, accidents, and hospitalizations; cognitive and emotional development.
Child maltreatment: abuse or neglect in the child's first year of life, obtained from an annual review of hotline records of reports and supplemented by case record review.
Caregiver Intra-Personal Functioning
CAGE: 4-item rapid alcoholism screening scale. Subjects were classified as having a possible alcohol problem if they endorsed 2 or more items.
Center for Epidemiologic Studies Depression Scale: 20-item scale to measure depressive symptoms. Clinical cut-off score of 16 used here.
Health Opinion Survey: 20-item scale to assess neurotic or psychosomatic symptoms. Higher scores indicate more symptoms. A binary measure was computed using a median split, to reflect above-average psychosomatic symptoms.
Service Utilization: report of a psychiatric or substance use hospitalization.
Caregiver Inter-Personal Functioning
Family and Neighborhood: the Family APGAR is a 5-item inventory of family function and satisfaction. The Neighborhood Satisfaction Index is a 9-item inventory of neighborhood characteristics.
Domestic Violence: defined by self-report in conjunction with questions regarding childhood physical, sexual, and emotional abuse, and further confirmed as current by the interviewer in the site-specific Trauma and Violence scale.
Lifetime Stressors: an inventory of common stressors such as marriage, divorce, death in the family, moving, and experiencing violence.
Child Short-Term Outcomes
Child Health Status: items to assess general health, specific conditions applying to the child, and other illnesses or problems.
Service Utilization Measures: to assess accidents and hospitalizations of the child.
Child Abuse Neglect Tracking System: abuse or neglect in the child's first year of life, obtained from an annual review of hotline records of reports and supplemented by case record review.
Battelle Developmental Inventory Screening Test: 96 items (out of 341 in the complete battery) to assess five domains: personal-social skills, adaptive behavior, psychomotor ability, communication, and cognition. A child was considered to have delayed development if the (standardized) Battelle total score was more than 1 standard deviation from the mean.
Child Long-Term Outcomes
Child Health – items assessing general health, specific conditions applying to child and other illness or problems through caregiver report.
Child Behavior Checklist – 5 scale scores assessing a child’s behavioral and social development.
PRESS – A measure of intelligence for pre-school children.
Hypotheses
The theoretical model guiding the analyses posited a sequential model of the effect of certain risk factors on child developmental outcomes.
These risk factors were: maternal history of loss and/or victimization; maternal compromised emotional status; domestic violence; and family and/or neighborhood problems.
Hypotheses
Maternal history of loss/victimization would be associated with maternal compromised emotional status.
Maternal compromised emotional status would be associated with problems in the family and neighborhood and/or domestic violence.
Problems in the family and neighborhood and domestic violence would be associated with poor short-term child outcomes.
Poor short-term outcomes would be associated with poor longer-term child outcomes.
Visual Model of the Hypotheses
[Figure: path diagram of the hypothesized sequence. Boxes and their measures:
Maternal History (Victim of Child Abuse; Lost a Parent)
Compromised Emotional Status (CageA; CES-D; Health Opinion Survey; Residential Treatment)
Family & Neighborhood (FAPGAR; Life Experiences; Neighborhood Short Form)
Domestic Violence
Short-Term Outcomes (Child Abuse or Neglect; AOD; Battelle; Child Health)
Long-Term Outcomes (Press; CBCL; Battelle; Child Health)]
Measures used in Analyses
Maternal loss/victimization history was coded yes (1) if the mother reported either a personal history of abuse or losing a parent before the age of 18, and no (0) otherwise.
Maternal compromised emotional status was coded yes (1) if the mother met any of the following:
Score of 2 or higher on a 4-item rapid alcoholism screening inventory (CAGE).
Score above the cutoff of 16 on the depression inventory (CESD).
Score above the median on the inventory of psychosomatic symptoms.
Report of a substance or psychiatric hospitalization.
Measures used in Analyses
Problems in the family or neighborhood was coded yes (1) if the mother scored above the median on two or more of the following inventories: family function and satisfaction, neighborhood characteristics, and lifetime stressors.
Domestic violence was coded yes (1) if the mother reported domestic violence.
Measures used in Analyses
Poor short-term (1-2 year) child outcomes was coded yes (1) if the child had any two of the following:
Health problem(s), accident, or hospitalization.
Delayed development (Battelle).
Presence of alcohol or drugs at birth.
OR there was a report of abuse or neglect.
Measures used in Analyses
Poor long-term (3-4 year) child outcomes was coded yes (1) if the child had any two of the following:
Health problem(s), accident, or hospitalization.
Delayed development (PRESS).
Behavioral problems (CBCL).
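The composite binary measures described in these slides follow two patterns: "any of" a set of criteria, and "at least two of" a set of indicators. The sketch below illustrates both under stated assumptions: the function and argument names are hypothetical, "above cutoff of 16" is read as strictly greater than 16, "any two" is read as two or more, and the sample median for the psychosomatic inventory is passed in because it is not reported in the slides.

```python
def compromised_emotional_status(cage_items_endorsed, cesd_score,
                                 hos_score, hos_sample_median, hospitalized):
    """Code 1 if any one of the four criteria is met, else 0."""
    criteria = [
        cage_items_endorsed >= 2,       # CAGE: 2 or more items endorsed
        cesd_score > 16,                # assumption: strictly above the cutoff
        hos_score > hos_sample_median,  # median split on psychosomatic symptoms
        bool(hospitalized),             # substance or psychiatric hospitalization
    ]
    return int(any(criteria))


def poor_short_term_outcome(health_problem, delayed_development,
                            substance_at_birth, maltreatment_report):
    """Code 1 if two or more indicators are present, or maltreatment was reported."""
    indicators = [health_problem, delayed_development, substance_at_birth]
    n_present = sum(bool(flag) for flag in indicators)
    return int(n_present >= 2 or bool(maltreatment_report))


# Hypothetical cases, for illustration only.
print(compromised_emotional_status(1, 20, 5, 8, False))   # CES-D criterion met
print(poor_short_term_outcome(True, False, False, False)) # only one indicator
```

Collapsing several instruments into one 0/1 variable keeps each regression small, at the cost of discarding information about which criterion was met.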
Logistic Regression #1
Maternal loss/victimization history entered as a single predictor for maternal compromised emotional status.
This analysis was statistically significant (Chi-Square = 13.94, p < .001), and resulted in correct classification of 47% of cases without impaired caregiver status, 77% of cases with caregiver status problems, and 68% of cases overall.
The odds ratio for the predictor (maternal victimization history) was 3.1, with a 95% CI of 1.7 to 5.6.
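The three classification percentages reported for each regression come from a 2x2 classification table of observed versus predicted group. The slides do not give the underlying counts, so the sketch below uses hypothetical counts purely to show how the percentages are formed; the function name and dictionary keys are also illustrative.

```python
def classification_summary(true_neg, false_pos, false_neg, true_pos):
    """Percent correctly classified: without the outcome, with it, and overall."""
    negatives = true_neg + false_pos   # cases observed without the outcome
    positives = false_neg + true_pos   # cases observed with the outcome
    return {
        "pct_correct_without": 100.0 * true_neg / negatives,
        "pct_correct_with": 100.0 * true_pos / positives,
        "pct_correct_overall": 100.0 * (true_neg + true_pos) / (negatives + positives),
    }


# Hypothetical counts, not taken from the study.
example = classification_summary(true_neg=30, false_pos=20, false_neg=10, true_pos=40)
print(example)
```

The overall percentage is a weighted average of the other two, so it is pulled toward whichever group is larger in the sample.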
SPSS Output
The Model so Far
[Figure: the path diagram from the Visual Model of the Hypotheses, annotated with the odds ratio found so far: Maternal History -> Compromised Emotional Status, 3.1.]
Logistic Regression #2
Maternal loss/victimization history and maternal compromised emotional status entered together as predictors for family/neighborhood problems.
This analysis was also statistically significant (Chi-Square = 16.17, p < .001), and resulted in correct classification of 60% of cases without family/neighborhood problems, 65% of cases with family/neighborhood problems, and 63% of cases overall.
The odds ratio for maternal compromised emotional status as a predictor of family/neighborhood problems was 2.5, with a 95% CI of 1.4 to 4.6.
The odds ratio for maternal victimization history was not statistically significant.
SPSS Output
The Model so Far
[Figure: the path diagram from the Visual Model of the Hypotheses, annotated with the odds ratios found so far: Maternal History -> Compromised Emotional Status, 3.1; Compromised Emotional Status -> Family & Neighborhood, 2.5.]
Logistic Regression #3
Maternal loss/victimization history, caregiver status, and family/neighborhood problems entered in one step to predict presence of domestic violence in the home.
This regression was statistically significant (Chi-Square = 16.36, p < .001), and resulted in correct classification of 71% of cases without domestic violence in the home, 51% of cases with domestic violence in the home, and 62% of cases overall.
The odds ratio for maternal compromised emotional status as a predictor of domestic violence was 2.1, with a 95% CI of 1.4 to 4.6.
The odds ratio for family/neighborhood problems as a predictor of domestic violence was 1.8, with a 95% CI of >1.0 to 3.2.
The odds ratio for maternal victimization history was not statistically significant.
SPSS Output
The Model so Far
[Figure: the path diagram from the Visual Model of the Hypotheses, annotated with the odds ratios found so far: Maternal History -> Compromised Emotional Status, 3.1; Compromised Emotional Status -> Family & Neighborhood, 2.5; Compromised Emotional Status -> Domestic Violence, 2.1; Family & Neighborhood -> Domestic Violence, 1.8.]
Logistic Regression #4
Maternal loss/victimization history, caregiver status, family/neighborhood problems, and domestic violence entered in one step to predict presence of poor short-term child outcomes.
The overall regression was not statistically significant (Chi-Square = 8.98, p = .062), and classification was less effective. Under this model, all cases were classified into the poor short-term child outcome group, correctly classifying only those subjects who did in fact have poor short-term child outcomes (66%), and misclassifying all the rest.
The odds ratio for domestic violence as a predictor of poor short-term child outcomes was 2.1, with a 95% CI of 1.2 to 3.9. This was statistically significant.
The odds ratios for the other predictors were not statistically significant.
SPSS Output
The Model so Far
[Figure: the path diagram from the Visual Model of the Hypotheses, annotated with the odds ratios found so far: Maternal History -> Compromised Emotional Status, 3.1; Compromised Emotional Status -> Family & Neighborhood, 2.5; Compromised Emotional Status -> Domestic Violence, 2.1; Family & Neighborhood -> Domestic Violence, 1.8; Domestic Violence -> Short-Term Outcomes, 2.1.]
Logistic Regression #5
Maternal loss/victimization history, caregiver status, family/neighborhood problems, domestic violence, and poor short-term child outcomes entered in one step to predict presence of poor longer-term child outcomes.
The overall regression was statistically significant (Chi-Square = 16.67, p < .005), and resulted in correct classification of 39% of cases without poor long-term child outcomes, 85% of cases having poor long-term child outcomes, and 68% of cases overall.
The odds ratio for family/neighborhood problems as a predictor of poor long-term child outcomes was 2.6, with a 95% CI of 1.1 to 6.1.
The odds ratio for poor short-term outcomes as a predictor of poor long-term child outcomes was 3.2, with a 95% CI of 1.4 to 7.6.
The odds ratios for the other predictors were not statistically significant.
SPSS Output
The Final Model
[Figure: the final path diagram, with the Domestic Violence box labeled Victim of Domestic Violence, annotated with all statistically significant odds ratios: Maternal History -> Compromised Emotional Status, 3.1; Compromised Emotional Status -> Family & Neighborhood, 2.5; Compromised Emotional Status -> Domestic Violence, 2.1; Family & Neighborhood -> Domestic Violence, 1.8; Domestic Violence -> Short-Term Outcomes, 2.1; Family & Neighborhood -> Long-Term Outcomes, 2.6; Short-Term Outcomes -> Long-Term Outcomes, 3.2.]
Goodness of Fit
-2LL (LL = log likelihood) is 0 if the model fits perfectly.
The chi-square statistic tests the change in -2LL from the constant-only model to the model with the set of predictors.
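The relationship between -2LL and the model chi-square can be sketched as follows. The data are hypothetical and the function names are illustrative. The constant-only model is taken to predict the overall proportion of events for every case, which is its maximum likelihood fit.

```python
import math


def neg2_log_likelihood(outcomes, probabilities):
    """-2LL for binary outcomes; each probability must lie strictly in (0, 1)."""
    log_likelihood = 0.0
    for y, p in zip(outcomes, probabilities):
        log_likelihood += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_likelihood


def model_chi_square(outcomes, model_probabilities):
    """Drop in -2LL from the constant-only model to the fitted model."""
    base_rate = sum(outcomes) / len(outcomes)
    null = neg2_log_likelihood(outcomes, [base_rate] * len(outcomes))
    fitted = neg2_log_likelihood(outcomes, model_probabilities)
    return null - fitted


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [1, 1, 1, 0, 0, 0, 1, 0]
p_hat = [0.8, 0.7, 0.6, 0.3, 0.2, 0.4, 0.5, 0.5]
chi_square = model_chi_square(y, p_hat)
print(round(chi_square, 3))
```

The resulting chi-square is compared against a chi-square distribution with degrees of freedom equal to the number of predictors added to the constant-only model.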
Goodness of Fit
Quantification of the proportion of explained variance: Cox & Snell $R^2$ and Nagelkerke $R^2$.
These are similar in intent to $R^2$ in multiple linear regression. For the current model, about 19.5%.
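For reference, the usual definitions of these two measures are given below. The notation is an assumption added here and does not appear on the slides: $n$ is the sample size, $-2LL_0$ is the value for the constant-only model, and $\chi^2_{\text{model}}$ is the drop in $-2LL$ from the constant-only model to the fitted model.

$$R^2_{CS} = 1 - \exp\left(-\frac{\chi^2_{\text{model}}}{n}\right), \qquad R^2_{N} = \frac{R^2_{CS}}{1 - \exp\left(-\frac{-2LL_0}{n}\right)}$$

The Cox & Snell measure cannot reach 1 for a binary outcome, and the Nagelkerke measure rescales it by its maximum attainable value so that the range is 0 to 1.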
Discrimination and Calibration
Model discrimination: the ability of the model to discriminate between observations in the two groups.
Model calibration: how closely the observed and predicted probabilities match.
Model Discrimination
SPSS provides a classification table (shown earlier).
SPSS also provides a histogram of estimated probabilities. Positive cases should be on the right and negative cases on the left.
Model Discrimination
Not so good.
One serious problem is that the sample itself was heavily skewed toward poor outcomes because of poverty, etc.
Calibration
Hosmer-Lemeshow goodness-of-fit test: cases are divided into deciles based on estimated probabilities, and the observed numbers are compared to the expected numbers (contingency table).
The null hypothesis is that there is no difference between the observed and predicted values.
This statistic should be interpreted cautiously because its value depends on the number of groups.
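The grouping and comparison steps can be sketched as below. This is a simplified illustration rather than the SPSS implementation: cases are sorted by estimated probability and split into equal-sized groups by position, ties are not handled specially, and the function returns only the statistic and its degrees of freedom (number of groups minus 2), leaving the p-value lookup to a chi-square table. The function name is hypothetical.

```python
def hosmer_lemeshow(outcomes, probabilities, n_groups=10):
    """Hosmer-Lemeshow statistic; assumes at least n_groups cases and 0 < p < 1."""
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(y for _, y in group)   # observed events in the group
        expected = sum(p for p, _ in group)   # expected events in the group
        # Contribution from events and from non-events.
        statistic += (observed - expected) ** 2 / expected
        statistic += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return statistic, n_groups - 2


# Hypothetical data with two groups, small enough to check by hand.
stat, df = hosmer_lemeshow([0, 1, 1, 1], [0.2, 0.2, 0.8, 0.8], n_groups=2)
print(stat, df)
```

Changing `n_groups` changes both the statistic and its degrees of freedom, which is the dependence on the number of groups noted above.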
Hosmer and Lemeshow for Final Model
The null hypothesis is not rejected, suggesting the model is adequately calibrated.
The c-Statistic
Interpreted as the proportion of pairs of cases with different observed outcomes in which the model assigns a higher probability to the case with the event than to the case without the event.
Ranges in value from 0.5 to 1.0, where 1.0 means the model always assigns higher probability to cases with the event than to those without the event.
In SPSS, to get this you first save the predicted probabilities along with the actual outcome measure into a new file, and then group them into a reasonably large number of distinct groups using an equation like this: probcat = trunc(prob_1/.00005). Next, cross-tabulate probcat with the outcome measure and calculate Somers’ d.
Somers’ d
c-Statistic
The c-statistic is interpreted as the percentage of possible pairs of cases, in which one is positive on the outcome and the other is negative, for which the logistic model assigns a higher probability to the positive case.
$$c = \frac{d}{2} + 0.5 = \frac{0.79}{2} + 0.5 = 0.895$$
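Outside SPSS, the c-statistic can be computed directly by comparing every positive case with every negative case. The sketch below is a minimal illustration with hypothetical data; counting tied pairs as one half is an assumed convention, and the function names are illustrative.

```python
def c_statistic(outcomes, probabilities):
    """Share of positive/negative pairs where the positive case scores higher."""
    positives = [p for y, p in zip(outcomes, probabilities) if y == 1]
    negatives = [p for y, p in zip(outcomes, probabilities) if y == 0]
    concordant = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5   # assumption: ties count as half
    return concordant / (len(positives) * len(negatives))


def c_from_somers_d(d):
    """Convert Somers' d to the c-statistic: c = d / 2 + 0.5."""
    return d / 2 + 0.5


# Hypothetical outcomes and predicted probabilities.
example_c = c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
reported_c = c_from_somers_d(0.79)   # Somers' d reported for the final model
print(example_c, round(reported_c, 3))
```

The pairwise loop grows with the product of the two group sizes, which is not a concern for a sample of a few hundred cases.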
Conclusion
These results provided general support for the model overall.
Subsequent analyses (not reported here) helped further refine the model and explore relationships among risk factors and child outcomes.