Pergamon

Evaluation and Program Planning, Vol. 19, No. 4, pp. 301-308, 1996
Copyright © 1996 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
S0149-7189(96)00028-6 0149-7189/96 $15.00+0.00

ADDRESSING SELF-SELECTION EFFECTS IN EVALUATIONS OF MUTUAL HELP GROUPS AND PROFESSIONAL MENTAL HEALTH SERVICES: AN INTRODUCTION TO TWO-STAGE SAMPLE SELECTION MODELS

KEITH HUMPHREYS, CIARAN S. PHIBBS and RUDOLF H. MOOS

Center for Health Care Evaluation, Palo Alto Veterans Affairs Health Care System and Stanford University School of Medicine

ABSTRACT

Because random assignment to conditions is often neither possible nor desirable in longitudinal evaluations of mutual help organizations, the influence of self-selection effects must be assessed in order to accurately interpret outcome data. One approach to adjusting for self-selection effects is to control for covariates that predict outcome using statistical procedures such as analysis of covariance (ANCOVA), partial correlations, and hierarchical regression. This approach has considerable power, but is less useful when an evaluator is interested in directly modeling the process of entry into a program and incorporating information on the factors affecting self-selection into estimation of program effects. Two-stage sample selection models are designed to address such situations. These models rely on regression procedures in which program participation is modeled in an initial equation, which yields a sample selection correction factor. The correction factor is included with participation in a second equation that predicts outcome. This two-stage procedure allows the evaluator to interpret the observed effects of a professional service or a self-help group in the context of the magnitude and direction of selection effects. We compare and contrast the covariate control and sample selection models in a longitudinal study of the effects of participation in Alcoholics Anonymous on drinking behavior. Copyright © 1996 Elsevier Science Ltd

INTRODUCTION

Addressing self-selection effects is among the more difficult challenges of naturalistic evaluation research. When individuals are not randomly assigned to programs,¹ outcome data are often difficult to interpret because of pre-existing group differences on important variables. In this paper, we describe the scope of the self-selection problem in program evaluation and highlight its particular relevance to research on mutual help organizations. We then contrast two ways of addressing self-selection: covariate control approaches and two-stage sample selection models. Two-stage sample selection models have been developed and utilized in economics and education research for over 20 years, and our purpose here is not to provide a technical account or statistical extension of these approaches. Rather, we offer a non-technical discussion of these models for researchers who have not used these models extensively and may not be aware of their potential to inform evaluations of professional mental health services and mutual help organizations.

This project was supported by NIAAA grants AA02863 and AA06699, and by the Department of Veterans Affairs Mental Health and Behavioral Science Service and Health Services Research and Development Service. John Finney, Ted Joyce, Phil Lavori, Douglas Luke and two anonymous reviewers made helpful comments on earlier versions of this paper. Requests for reprints should be addressed to Keith Humphreys, Ph.D., Center for Health Care Evaluation, VAHCS (152-MPD), 795 Willow Road, Menlo Park, CA 94025, U.S.A.

¹We use the term “programs” generically in this paper to refer to whatever an evaluator is studying, be it a professional mental health treatment, an educational program, or a self-help organization. Although our focus in this paper is on self-help organizations, many of the issues we discuss also arise in evaluations of professional mental health services.




A Brief Overview of the Selection Problem

During the War on Poverty/Great Society period, researchers were called upon to evaluate income support benefits, childhood health clinics, job training programs, and educational approaches from Main Street to Sesame Street. For practical and ethical reasons, it was usually not possible to employ randomized designs when evaluating such social programs, leading economists and other social scientists to develop methods for making stronger causal inferences from non-experimental data (Moffitt, 1991). Some approaches to dealing with self-selection effects, such as case matching and ANCOVA, are frequently employed in mental health services research, whereas others, such as two-stage sample selection models, are not.

The basic structure of the selection problem in evaluation research can best be grasped by contrasting it with randomized designs. When an experimental and a control group are initially equivalent, differences between the groups after the experimental group has participated in a program usually can be attributed to the program. The underlying logic is that the control group serves as a benchmark because it tells us what the experimental group would look like if it had not participated in the program. However, when participants self-select into programs, this logic does not hold because we have no group of “comparable” non-participants to use as a benchmark: if non-participants were truly comparable to program participants, they too would have selected into the program.

Self-selection is not an artifact that can be eliminated through statistical procedures; it is inherent in how individuals interact with programs in the real world. The perspective we take here is that no statistical procedure can fully solve the self-selection problem, but that different approaches to it (e.g. covariate control, sample selection models) may be helpful to program evaluators depending on the context and purpose of the evaluation being conducted.

SPECIAL CONSIDERATIONS OF THE SELECTION PROBLEM IN MUTUAL HELP RESEARCH

Self-help (aka “mutual help”) organizations are voluntary associations of individuals who share some status that causes them suffering, is socially stigmatized, or both (e.g. being alcoholic, having a developmentally disabled child, being gay or lesbian). Although they should not be equated with professional mental health services, mutual help organizations serve as a helping resource for approximately 7.5 million Americans (Lieberman & Snowden, 1993). Mutual help organizations are controlled by members rather than by professionals, and members have widely varying involvement in the mutual help organizations to which they belong.

Self-help group evaluators cannot rely extensively on randomized clinical trials, because this research design tends to change the phenomenon under study (Humphreys & Rappaport, 1994; Levy, in press). When individuals are randomly assigned or coerced into going to a self-help group (e.g. Walsh et al., 1991), the organization loses its voluntary character. Further, outside control of self-help groups seems to lessen their effectiveness (see Toro, Rieschl, Zimmerman, Rappaport, Seidman, Luke & Roberts, 1988), and thus researchers who randomly assign individuals to attend self-help groups may dilute the power of the organization and thereby underestimate its effect. Unlike professional mental health services, self-help groups make no distinction between those who give and receive help, so an evaluator who changes the composition of a group through random assignment necessarily changes the “intervention” (Levy, in press).

Humphreys and Rappaport (1994) argue that evaluators should study mutual help groups as they exist in context, that is, as member controlled organizations that individuals choose to enter or leave at any time. To do this successfully requires an ability to estimate sample selection effects. Understanding and adjusting for self-selection effects is important if we are to determine whether self-help actually has an effect or is merely a case (as has been argued) of the best prognosis individuals getting better as they would without going to a group. If one can show that self-help groups have an effect that is not due to self-selection, it would make sense for policy makers to support self-help clearinghouses (non-profit agencies that provide information and referrals to persons seeking or starting self-help groups), for churches and community centers to allow self-help organizations to use their space, and for professional counsellors to encourage their clients to attend self-help groups. Further, understanding how individuals select into groups may be helpful to those mutual help organizations that wish to increase their accessibility and membership.

Hinrichsen, Revenson and Shinn (1985) provide an example of the difficulties of conducting self-help group evaluations in the absence of information on self-selection effects. They conducted a cross-sectional comparison of 245 individuals who participated in scoliosis self-help clubs with 495 individuals who received information from the clubs but did not attend meetings. Although the individuals who attended meetings reported high satisfaction with the clubs, they were not significantly different than non-attenders on psychosocial functioning. These data might indicate that scoliosis clubs do not help their members. However, people who join self-help groups may be more distressed than those who do not (see Levy & Derby, 1992). Thus, if individuals who selected into the clubs were initially functioning at a lower level than those who did not, the later equivalent functioning for the two groups might indicate that the clubs had a positive effect.

Covariate control approaches to selection effects are familiar to most mental health evaluators. In order to estimate the effect of a program, the evaluator designates certain variables (covariates) as “nuisance” or “control” variables. Because covariates relate to the outcome under study and their mean values are different across groups, the evaluator uses statistical procedures (e.g. ANCOVA, partial correlation, hierarchical regression) to control for the covariates and estimate what the effect of the program theoretically would have been if the groups had been equal on the covariates initially. For example, if one was evaluating the effects of a self-help group for bereaved people on psychological functioning, one might find that initial depression predicted outcome and that persons who elected to participate in the group initially were more depressed. To increase the likelihood that observed differences in outcome were due to the self-help group and not the pre-existent levels of depression, the evaluator could designate depression at baseline as a covariate and then conduct ANCOVA or regression with a control step, which theoretically would estimate the effect of the self-help group as if the covariate (initial depression) had been equal across groups.
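The bereavement example can be sketched with a small regression. The sketch is illustrative only: the six cases are hypothetical, `ols` is a minimal least-squares helper written for this example rather than a routine from any statistical package, and follow-up depression is constructed without noise so that the adjusted effect is easy to see. Participants start out more depressed, so the raw comparison makes the group look harmful, whereas the coefficient adjusted for baseline depression recovers the benefit that was built into the data.

```python
def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical cases: (attended group, baseline depression).
# Follow-up depression is built as 2 + 0.5 * baseline - 1.0 * attended.
people = [(0, 2), (0, 4), (0, 6), (1, 6), (1, 8), (1, 10)]
followup = [2 + 0.5 * b - 1.0 * g for g, b in people]

# Unadjusted comparison of group means at follow-up.
raw_difference = (
    mean([y for (g, _), y in zip(people, followup) if g == 1])
    - mean([y for (g, _), y in zip(people, followup) if g == 0])
)

# Regression with an intercept, the group indicator, and the covariate.
design = [[1.0, float(g), float(b)] for g, b in people]
intercept, adjusted_effect, baseline_slope = ols(design, followup)

print(f"raw difference:  {raw_difference:+.2f}")
print(f"adjusted effect: {adjusted_effect:+.2f}")
```

In these constructed data the raw difference is one point in the wrong direction, while the adjusted effect is one point in favor of the group. Real data contain noise and unmeasured influences, so the adjustment is only as good as the covariates chosen.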

Covariate control approaches have some logical and practical limitations, which have been lucidly described by Cronbach (1982) and Meehl (1970). However, covariate control procedures have utility in many evaluation contexts if their assumptions are borne in mind (see Huitema, 1980). Even if covariate control procedures were above criticism, there are situations where an evaluator may wish to do more than control for variables that predict outcome. For example, it is sometimes of interest to understand the process of program selection itself (Lavori, Dawson & Mueller, 1994).

In naturalistic evaluations of self-help groups and professional mental health services, part of understanding the context of the intervention is determining who selects into it and why. For example, epidemiologic data suggest that Latinos may be less likely than Caucasians to seek help from self-help groups and professional mental health services (Lieberman & Snowden, 1993). Agencies (e.g. crisis hotlines, self-help clearinghouses) that attempt to get individuals in contact with formal and informal helping systems may wish to understand the reasons for this difference and change their referral patterns accordingly.

Relatedly, once the self-selection process is better understood, an evaluator may wish to use the information gained to obtain more precise estimates of treatment effects. That is, if some individual or program characteristic predicts selection into a program, we can use that information when we are trying to determine if the program is effective. Heckman (1979), among many others, has argued that incorporating information on selection derived from an initial regression model into a second stage model that predicts outcome reduces bias in estimates of program effects, and there is evidence that this is the case for certain types of selection problems (e.g. survey non-response, see Grotzinger, Stuart & Ahern, 1994). Hence, this approach may also be useful for addressing self-selection effects in self-help research.

In covariate control approaches, one does not typically employ variables that predict selection because one is using a one-stage modelling approach in which the independent variables directly predict outcome. In contrast, two-stage sample selection procedures are intended to reduce bias in estimates of program effectiveness through the use of self-selection information based on “instruments”. An instrument is a variable that predicts selection into a program, but does not predict outcome. Two common sample selection models that use instruments are two-stage least squares instrumental variables models (see Greene, 1993) and Heckman’s model (1979; generalized by Maddala, 1983). We describe Heckman’s model here because we use it in our empirical example.

Two-stage sample selection models, such as Heckman’s, use a first stage model to predict program participation:

$P = \gamma W + v \quad (1)$

where $W$ is a vector of factors that predict participation, $\gamma$ is the set of parameters associated with $W$, and $v$ is an error term. $W$ must include one or more instruments. For example, we could put distance to the nearest self-help group meeting in this equation because it would probably relate to participation in self-help but not to outcome. In addition to instruments, $W$ can also include variables that relate to outcome. However, as in any other regression equation, one wants to include only variables that make conceptual sense, in this case variables that might relate to participation. If one adds variables indiscriminately to equation (1), this tends to produce multicollinearity and attendant unstable results.

Heckman’s method uses equation (1) to generate a sample selection correction factor, $\lambda$, which is incorporated in equation (2) below. Other than including this sample selection correction factor (“lambda”, also called an “inverse Mills ratio”), equation (2) is identical to a covariate control model.

$A = \beta X + \alpha P + \beta_\lambda \lambda + e \quad (2)$

The inclusion of $\lambda$ is intended to reduce bias in estimates of the participation effect ($\alpha$). Further, the absolute value of $\beta_\lambda$ (the coefficient of the sample selection correction factor, lambda) is an indicator of degree of selection bias: the higher its value the more pronounced the self-selection effects. If $\beta_\lambda$ is 0, we can assume that program participation is “ignorable”, meaning that individuals did not select into the program in such a way as to make a difference with respect to outcome. Heckman’s method requires that $P$ in equation (2) be a binary variable (it is possible to expand this to a limited number of binary choices using an ordered probit model or various forms of discrete choice models, see Maddala, 1983). In cases where an evaluator wishes to use a continuous participation variable, a two-stage, ordinary least squares instrumental variable model can be used (see Greene, 1993). For more details on Heckman’s method and how to calculate $\lambda$, see Heckman (1979).
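Heckman (1979) gives the derivation of the correction factor. As a rough sketch only, the form commonly used when participation enters the outcome equation as a binary regressor evaluates the standard normal density and distribution function at each person's fitted probit index from the participation equation. The function name and the index values below are hypothetical, and the separate branch for non-participants is an assumption about which variant is being estimated, not something specified in the text above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def selection_correction(index, participated):
    """Inverse Mills ratio term for one person.

    index        -- fitted probit index (gamma * W) from the participation model
    participated -- True for participants, False for non-participants
    """
    density = STD_NORMAL.pdf(index)
    probability = STD_NORMAL.cdf(index)
    if participated:
        return density / probability
    # Assumed form for non-participants in the binary-regressor variant.
    return -density / (1.0 - probability)


# Hypothetical fitted indices for three participants and three non-participants.
for index in (-1.0, 0.0, 1.0):
    joined = selection_correction(index, True)
    stayed_out = selection_correction(index, False)
    print(f"index {index:+.1f}: participant {joined:.3f}, non-participant {stayed_out:.3f}")
```

The resulting values would be entered as an additional regressor in the outcome equation. Participants with a low predicted probability of joining receive a large positive term, which is how unexpected participation is flagged to the second stage.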

One could reasonably ask why we cannot just put all our instruments in equation (2) and use covariate control as usual. As we will demonstrate, this approach is not advisable because equation (2) predicts outcome rather than participation, so we would not gain any information by including variables in it that predict program participation but not outcome, and we would be excluding the information that equation (1) yields about the selection process. Further, this approach produces multicollinearity in equation (2) because the instruments predict participation, which is included as an independent variable in equation (2).

EMPIRICAL EXAMPLE

An empirical application of the covariate control and sample selection approaches should help clarify some of the issues we have raised so far. The data are drawn from a longitudinal study of the course of drinking problems. For the sake of simplicity, we discuss only a few of the measures and part of the sample being studied in the larger research program (for more detail on the full study, see Finney & Moos, 1995; Humphreys, Finney & Moos, 1994; Timko, Moos, Finney & Moos, 1994).

Purpose

This study was intended to determine if attendance at AA meetings predicted reductions in the severity of the drinking pattern of individuals with drinking problems. Because individuals chose whether or not to attend AA, self-selection issues are important to answering our question of interest.

Sample

This empirical example uses 218 individuals with drinking problems who were recruited through alcoholism information and referral services and detoxification centers and were followed up 1 year later. No one in the sample received professional alcoholism treatment before or after baseline. The sample was 56% female, 23% married, and 84% Caucasian. Average age was 34.3 years and average education was 12.8 years. At baseline, participants were intoxicated an average of 12.4 days in the past month and consumed 11.7 ounces of alcohol on typical drinking days.

Procedure

At baseline and 1 year follow-up, participants completed the Health and Daily Living Form (Moos, Cronkite & Finney, 1990; more information on sample recruitment and follow-up procedures can be found in Timko et al., 1994). The measures we use here are an item tapping participants’ Perceived Seriousness of Their Drinking Problem (range 0 = not considered a problem to 4 = considered a serious problem), and a scale measuring Information Seeking Coping Responses (range 0 = individual never seeks information/advice when dealing with stressors, 21 = individual frequently uses information seeking coping responses). At baseline and follow-up, a Drinking Pattern score was calculated for the past 6 months. Individuals received a score for their drinking in each of the past 6 months (range 0 = totally abstinent to 5 = very heavy drinking) and these values were summed to create a score ranging from 0-30. We also obtained demographic information (e.g. sex, marital status) and determined if and when respondents attended AA meetings. A dichotomous variable was created reflecting whether participants had attended AA meetings in the first 6 months after baseline (number of AA attenders = 95, range up to 210 meetings, mean = 17). This 6-month window for AA attendance was used so that there would not be overlap between the periods that program attendance and outcome were measured.
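The two derived variables described above are simple to construct. The following sketch is illustrative only: the function names are hypothetical, and treating any attendance of one or more meetings as participation is an assumption, since the exact coding rule used to build the dichotomous variable is not spelled out.

```python
def drinking_pattern_score(monthly_scores):
    """Sum six monthly scores (0 = totally abstinent ... 5 = very heavy drinking)."""
    if len(monthly_scores) != 6:
        raise ValueError("expected one score for each of the past 6 months")
    if any(score < 0 or score > 5 for score in monthly_scores):
        raise ValueError("monthly scores must lie between 0 and 5")
    return sum(monthly_scores)  # ranges from 0 to 30


def attended_aa(meetings_in_first_6_months):
    """Dichotomous participation variable: 1 if any meeting was attended, else 0."""
    # Assumption: a single meeting is enough to count as participation.
    return 1 if meetings_in_first_6_months > 0 else 0


# Hypothetical respondent: heavy drinking early on, tapering later.
print(drinking_pattern_score([5, 4, 4, 3, 2, 1]))
print(attended_aa(17))
```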

Examination of Potential Instruments

Perceived seriousness of drinking may predict participation because individuals who believe their drinking is a problem are probably more likely to go to AA. A tendency to cope with problems by seeking information/advice also seems likely to predict seeking out AA. Finally, women may be more likely to seek out substance abuse self-help programs than are men (Humphreys, Mavis & Stoffelmayr, 1991). However, to determine if these are good instruments, we must examine their correlations with AA participation and with outcome (see Table 1). All three of these variables predict participation but do not predict outcome, making them appropriate for constructing a sample selection model, but less useful in a covariate control model that directly predicts outcome, as will be shown below.
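The screening step behind Table 1 amounts to computing two correlations for each candidate instrument. The sketch below shows the calculation on hypothetical scores for eight respondents (not the study data); `pearson` is a small helper written for this example, and with a 0/1 participation variable it gives the point-biserial correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, constructed so the candidate behaves like an instrument.
seriousness = [0, 1, 1, 2, 3, 3, 4, 4]              # candidate instrument
attended_aa = [0, 0, 0, 0, 1, 1, 1, 1]              # participation (0/1)
followup_drinking = [10, 20, 20, 10, 20, 10, 10, 20]  # outcome

r_participation = pearson(seriousness, attended_aa)
r_outcome = pearson(seriousness, followup_drinking)

print(f"correlation with participation: {r_participation:.2f}")
print(f"correlation with outcome:       {r_outcome:.2f}")
```

A usable instrument shows the pattern built into these toy data: a clear correlation with participation and a correlation near zero with outcome. A simple correlation cannot prove that a variable has no path to outcome other than through participation, so the conceptual argument for each instrument still matters.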

TABLE 1
CORRELATION MATRIX OF POTENTIAL INSTRUMENTS WITH AA PARTICIPATION AND OUTCOME

                                      AA participation in the      Drinking pattern at
                                      6 months after baseline      1 year follow-up
Information seeking coping            0.18**                       -0.06
Sex (0 = male, 1 = female)            0.15*                        -0.09
Perceived seriousness of drinking     0.27**                       -0.07

*p < 0.05; **p < 0.01.

Covariate Control Approach

In most cases, when evaluators are predicting problem severity at follow-up, they want to take the baseline level of problems into account. In this example, this involves designating drinking pattern in the 6 months before baseline as a covariate. In studies of alcoholism, it is also often appropriate to include as covariates demographic variables that relate to alcohol consumption (in this example, being married). Now that we have designated covariates, we can run ANCOVA or regression with controls (statistically, these procedures are identical) to see if participation in AA predicts the outcome (drinking pattern at 1-year follow-up).

The results of two variations of the covariate control regression equation are presented in Table 2. The first version includes only those covariates we expect to relate to outcome, whereas the second also includes our instruments. The results demonstrate two points: (1) even with important control covariates entered in the equation, individuals who attended AA meetings had less severe drinking patterns at follow-up; and (2) adding the instruments to the covariate equation adds virtually nothing to our ability to predict outcome.

TABLE 2
COVARIATE CONTROL REGRESSION EQUATIONS PREDICTING DRINKING PATTERN AT ONE YEAR FOLLOW-UP WITH AND WITHOUT INSTRUMENTS IN THE MODEL

                                        Model with no instruments    Model with instruments
                                        B (SE B)                     B (SE B)
Instruments
  Sex (0 = male, 1 = female)            not included                 -1.16 (1.09)
  Information seeking coping            not included                 -0.04 (0.12)
  Perceived seriousness of drinking     not included                 -0.44 (0.57)
Control variables
  Baseline drinking pattern             0.17 (0.08)*                 0.20 (0.09)*
  Married (0 = no, 1 = yes)             -1.78 (1.23)                 -1.69 (1.25)
Program variable
  AA meetings (0 = no, 1 = yes)         -3.47 (1.05)*                -2.82 (1.15)*
R² for model                            0.079                        0.084

*p < 0.05.

These results are more convincing than a bivariate correlation between more AA meetings and less severe drinking, but one must remember that the obtained estimate of AA’s effectiveness is not adjusted for factors that predicted self-selection into AA. To address this issue, and to make better use of the instruments we have identified, we will conduct a two-stage sample selection model.

Heckman’s Two Stage Sample Selection Approach

The first stage of the sample selection model will be termed the “participation model”. It includes our instruments because they predict participation. The participation model can also include variables that may predict outcome as long as they also are likely to predict participation. However, the participation model must include at least one independent variable that is not in the outcome model.

Heckman’s model can be estimated using several computer packages, including LIMDEP (Greene, 1992). Table 3 presents the participation model generated by this procedure and Table 4 presents the second stage, or outcome model. The participation model is a probit regression model because we are predicting a dichotomous dependent variable (participation in AA). Results suggest that individuals who use information seeking coping responses and perceive their drinking as a more serious problem are more likely to go to AA. The effect for gender approaches but does not attain significance (p = 0.123).

TABLE 3
FIRST STAGE OF HECKMAN MODEL PREDICTING PARTICIPATION (BINOMIAL PROBIT MODEL)

                                      Coefficient    SE      T       p
Sex (0 = male, 1 = female)            0.287          0.19    1.54    0.123
Information seeking coping            0.059          0.02    2.78    0.005
Perceived seriousness of drinking     0.382          0.09    4.14    0.000

R² = 0.129.

TABLE 4
SECOND STAGE OF HECKMAN MODEL PREDICTING OUTCOME

                                      B        SE B    T        p
Baseline drinking pattern             0.20     0.08    2.36     0.018
Married (0 = no, 1 = yes)             -1.68    1.23    -1.37    0.172
AA meetings (0 = no, 1 = yes)         -6.31    3.04    -2.08    0.038
Lambda                                2.10     1.98    1.06     0.289

R² = 0.084.

The participation model also produced “lambda”, which is a mathematical expression of how self-selection differences between participants and non-participants affect outcome estimates. This sample selection correction factor is included in our outcome model (see Table 4). Adjusted for self-selection effects, AA is still statistically significant. More importantly, the effect size for AA in the Heckman model (B = -6.31) is almost double that derived in the covariance model (B = -3.47), which tells us that the direction of the selection effect in this sample is to minimize the estimate of the influence of AA on drinking. The absolute value of the lambda term is non-zero, further supporting the idea that there is a selection effect in the sample, although it is not statistically significant (p = 0.289).
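The comparison just described can be checked with a few lines of arithmetic from the coefficients and standard errors reported in Table 4. The sketch assumes that the printed p values are two-sided and based on the normal approximation; that is an assumption about how the table was produced, although it appears consistent with the printed figures. The helper name `z_and_p` is hypothetical.

```python
from statistics import NormalDist


def z_and_p(b, se):
    """Test statistic B / SE and its two-sided p value under a normal approximation."""
    z = b / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# B and SE B as reported in Table 4.
table4 = {"AA meetings": (-6.31, 3.04), "Lambda": (2.10, 1.98)}
results = {name: z_and_p(b, se) for name, (b, se) in table4.items()}

# Ratio of the selection-adjusted AA effect to the covariate control estimate (Table 2).
ratio = -6.31 / -3.47

for name, (z, p) in results.items():
    print(f"{name}: T = {z:.2f}, p = {p:.3f}")
print(f"Heckman estimate / covariate control estimate = {ratio:.2f}")
```

The ratio of about 1.8 is what the text summarizes as “almost double”. The larger standard error of the AA coefficient in Table 4 (3.04 versus 1.05 in Table 2) also shows the loss of precision that accompanies the selection adjustment.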

DISCUSSION: PRACTICAL PROBLEMS AND FURTHER APPLICATIONS

Self-selection issues are inherent in evaluations of member controlled self-help groups and are also often a concern in evaluations of professional mental health services. No one strategy can work to address selection effects in all cases, but an evaluator who is familiar with both covariate control and two-stage sample selection approaches will be better able to respond appropriately and flexibly to the demands of each program evaluation. Two-stage sample selection procedures may be preferable to covariate control when an evaluator is interested in understanding the selection process, which is often useful. For example, one might conclude from our participation model that AA participation could be increased by heightening individuals’ concern about heavy drinking (such as through an educational campaign). Two-stage sample selection models are also useful when one wishes to incorporate selection data into outcome models.

No statistical procedure is without limitations. Sample selection approaches tend to have less power than covariate control models, particularly when good instruments are not available. Hence, they are less useful in small samples or in cases where the instruments weakly predict participation. When instruments do a poor job of differentiating participants and non-participants, the reliability of the estimated program effects falls. Monte Carlo studies of sample selection models have shown that good instruments will yield the unbiased results that theory predicts, but very weak instruments yield results that are worse than making no effort to correct for selection effects (Bound, Jaeger & Baker, 1993; Staiger & Stock, 1993). In other words, an evaluator cannot correct for selection effects using variables that do not predict selection!

How can an evaluator determine if the participation model is doing a reasonable job of predicting selection? In small samples, the statistical significance of the predictors can be helpful in evaluating the model, but in large samples findings that have little practical import are often statistically significant. Two useful assessment strategies are to examine whether the predictors improve on chance in classifying participants and non-participants (such a “hit rate” table is produced automatically by the logistic regression function of SPSS), and to see if one’s current model predicts selection better (as indicated by hit rate or R²) than other plausible models that can be created from the data set.
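A hit rate of this kind can be computed by hand from predicted participation probabilities. In the sketch below the function names, the 0.5 cutoff, and the four example probabilities are hypothetical, and “chance” is taken to mean always guessing the more common category, which is one of several possible benchmarks. The base rate uses the sample figures reported earlier (95 AA attenders among 218 respondents).

```python
def hit_rate(predicted_probs, actual, cutoff=0.5):
    """Share of cases whose predicted category matches actual participation."""
    hits = sum((p >= cutoff) == bool(a) for p, a in zip(predicted_probs, actual))
    return hits / len(actual)


def majority_baseline(actual):
    """Hit rate obtained by always guessing the more common category."""
    share_participating = sum(actual) / len(actual)
    return max(share_participating, 1.0 - share_participating)


# Benchmark for the study sample: 95 attenders and 123 non-attenders.
sample = [1] * 95 + [0] * 123
print(f"benchmark hit rate: {majority_baseline(sample):.3f}")

# Hypothetical predicted probabilities for four respondents.
print(hit_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1]))
```

A participation model in this sample would need to classify more than about 56% of respondents correctly before it could be said to improve on that benchmark.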

The key to building a useful participation model is locating good instruments. Variables that measure the availability of the program are a potentially rich source of instruments (Moffitt, 1991). One can also use individual level variables, as in our example. A key factor to remember in the search for instruments is to look beyond the individuals studied to program and community features that relate to participation (e.g. transportation availability, cost, number of places a program is offered). Some planning may be necessary at the beginning of the study in order to ensure that such data are available.

In many cases it may be informative to run both covariate control and two-stage sample selection models and use a weight of evidence approach. In the example presented here, we were able to use both approaches to converge on what seems a solid finding: even when baseline functioning, demographics, and the effects of self-selection are taken into account, AA participation predicts less severe drinking patterns. Other selection adjustment procedures that could be used to test convergence of approaches are propensity models (Rubin, 1974, 1977) and FIML latent-variable covariance structure models (Jöreskog & Sörbom, 1993). Reynolds and Temple (1995) provide an accessible example of how to conduct such convergence analyses.

In this paper, our two procedures gave a similar result. However, as noted by an anonymous reviewer, one cannot assume that different methods of adjusting for selection will produce the same substantive finding. Some variation in effect estimates across procedures is to be expected, but what should the evaluator do if one procedure indicates that a program is effective and another indicates that it is not? Heckman, Hotz and Dabos (1987) have argued that the most common reasons for wide variation in results across selection adjustment procedures are model misspecifications and disregard for the statistical assumptions of models. Their paper provides a number of directions to pursue for the evaluator whose models have given a “split decision”.

With regard to other practical problems that may arise in working with sample selection models (e.g. one reviewer brought up the potential problem of intercorrelated predictors), it may be helpful to recognize that Heckman’s procedure draws heavily on the general linear model regression procedures with which most mental health evaluators are already familiar. Indeed, the second stage of the model is simply an OLS regression equation. Hence, much of the expertise mental health evaluators have garnered from working with regression equations (e.g. how to assess and transform skewed variables, dealing with “bouncing betas”) will help address practical problems of this “new” procedure.
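To make the point about the second stage concrete, the following is a minimal sketch of how a correction factor can be computed from a first-stage probit index and then entered, alongside participation, in an ordinary least-squares equation. Several things here are assumptions for illustration and are not taken from the analysis in this paper: the probit index values are hypothetical and treated as already estimated, the correction factor uses the textbook form (the normal density divided by the cumulative probability for participants, and its negative counterpart for non-participants), and the outcome is synthetic, built from known coefficients so that the fit can be checked. Standard errors from a plain second-stage OLS generally need adjustment, which this sketch does not attempt.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def correction_factor(z, participated):
    """Selection correction term computed from the first-stage probit index z."""
    if participated:
        return normal_pdf(z) / normal_cdf(z)
    return -normal_pdf(z) / (1.0 - normal_cdf(z))


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical first-stage output: probit index and observed participation.
z_index = [1.2, 0.4, -0.3, 0.9, -1.1, -0.6]
participated = [1, 1, 0, 1, 0, 0]
lam = [correction_factor(z, d) for z, d in zip(z_index, participated)]

# Synthetic outcome built from known coefficients (1.0, 2.0, 0.5).
outcome = [1.0 + 2.0 * d + 0.5 * l for d, l in zip(participated, lam)]

# Second stage: outcome on intercept, participation, and the correction factor.
design = [[1.0, float(d), l] for d, l in zip(participated, lam)]
intercept, participation_effect, selection_coef = ols(design, outcome)
print(intercept, participation_effect, selection_coef)
```

Because the outcome was generated without noise, the second-stage fit should return the coefficients used to build it, which is only a check that the arithmetic is wired up correctly and says nothing about real data.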

We hope this presentation of sample selection models encourages evaluators of self-help groups and mental health services to apply them in their work. The interested reader will find further information on the underlying statistical logic and applications of sample selection models elsewhere (Greene, 1993; Heckman, 1979; Heckman & Robb, 1985; Manski, 1989, 1990; Newey, Powell & Walker, 1990; Reynolds & Temple, 1995; Willis & Rosen, 1979).

We now turn to some other applications of the sample selection approach to evaluations of mutual help groups and professional services. In the full data set of the study we described (see Timko et al., 1994), a common situation obtained: a subset of the sample sought inpatient or residential professional care, and subsets of those individuals subsequently went to AA only, outpatient treatment only, AA and outpatient treatment, or sought no further help. To isolate the effects of self-help and professional services in situations such as these requires a more complex model than the one we have presented. When individuals make sequential choices and have multiple options at each decision point, sample selection effects are termed “nested”. Although beyond the scope of this paper, we refer the interested reader to Maddala (1983) and McFadden (1974, 1987) for a description of the nested logit choice method that can be used in situations of this sort.

REFERENCES

BOUND, J., JAEGER, D. A., & BAKER, R. (1993). The cure can be worse than the disease: A cautionary tale regarding instrumental variables. (Available from National Bureau of Economic Research, 1050 Massachusetts Avenue, Cambridge, MA 02138.)

CAMERON, A. C., & TRIVEDI, P. K. (1986). Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics, 1, 29-53.

CRONBACH, L. J. (1982). Designing evaluations of educational and social programs. San Francisco: Jossey-Bass.

FINNEY, J. W., & MOOS, R. H. (1995). Entering treatment for alcohol abuse: a stress and coping model. Addiction, 90, 1223-1240.

GREENE, W. H. (1992). LIMDEP User’s Manual and Reference Guide. Bellport, NY: Econometric Software.

Self-help evaluations sometimes focus on a question with two components: Do people in the group do better than people not in the group; and within the group, do people who are more involved do better than those who are less involved? To estimate such a model, one could use the two-stage demand model from the Rand Health Insurance experiment (Manning, Newhouse, Duan, Keeler, Leibowitz & Marquis, 1987) to generate predictions of participation, and, conditional on participation, predictions of the number of meetings attended. This model would use a logit or probit model to predict participation, and then an OLS or count data model (Cameron & Trivedi, 1986) to predict the number of meetings attended for those who participate.
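The two-part structure just described can be sketched as follows. This is a rough illustration under stated assumptions: the coefficients are invented rather than estimated, the first part is a logit, the second part uses a simple log-linear (Poisson-style) mean that ignores the fact that participants attend at least one meeting, and the function names are hypothetical. Multiplying the two parts to obtain an overall expected number of meetings is the usual way such two-part predictions are combined.

```python
import math


def participation_probability(x, coefs):
    """Part 1 (logit): predicted probability of participating at all."""
    index = sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-index))


def meetings_if_participating(x, coefs):
    """Part 2 (log-linear count model): expected meetings among participants."""
    return math.exp(sum(b * v for b, v in zip(coefs, x)))


def expected_meetings(x, part_coefs, count_coefs):
    """Overall expectation: P(participate) times E[meetings | participate]."""
    return (participation_probability(x, part_coefs)
            * meetings_if_participating(x, count_coefs))


# Hypothetical predictors: intercept term and a standardized concern-about-drinking score.
person = [1.0, 0.0]
# Hypothetical coefficients, not estimates from any data set.
part_coefs = [0.0, 1.0]
count_coefs = [math.log(10.0), 0.2]

print(participation_probability(person, part_coefs))
print(meetings_if_participating(person, count_coefs))
print(expected_meetings(person, part_coefs, count_coefs))
```

Keeping the two parts as separate functions mirrors the two questions in the text: whether someone joins, and how involved they become once they have joined.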

GREENE, W. H. (1993). Selection-incidental truncation. In W. H. Greene, Econometric Analysis (pp. 706-715). New York: MacMillan.

GROTZINGER, K. M., STUART, B. C., & AHERN, F. (1994). Assessment and control of nonresponse bias in a survey of medicine use by the elderly. Medical Care, 10, 989-1003.

HECKMAN, J. (1979). Sample selection as a specification error. Econometrica, 47, 153-161.

HECKMAN, J., & ROBB, R. (1985). Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics, 30, 239-267.

An endogenous switching model (Maddala, 1983) is another extension of sample selection modeling that may be useful to program evaluators. This model relaxes and tests the assumption of the Heckman model that the estimated parameters in the model are the same for participants and non-participants. For instance, in the example presented here, the slopes connecting some of the baseline variables and outcome tended toward heterogeneity. In cases where this heterogeneity is pronounced, an endogenous switching model could detect the heterogeneity in slopes and estimate sample selection parameters accordingly. Such a model also yields an estimate of what the observed program effect would be under random assignment.

HECKMAN, J. J., HOTZ, V. J., & DABOS, M. (1987). Do we need experimental data to evaluate the impact of manpower training on earnings? Evaluation Review, 11, 395-427.

HINRICHSEN, G. A., REVENSON, T. A., & SHINN, M. (1985). Does self-help help? An empirical investigation of scoliosis peer support groups. Journal of Social Issues, 41, 65-87.

HUITEMA, B. E. (1980). The analysis of covariance and alternatives. New York: Wiley.

HUMPHREYS, K., & RAPPAPORT, J. (1994). Researching self-help/mutual aid groups and organizations: Many roads, one journey. Applied and Preventive Psychology, 3, 217-231.

HUMPHREYS, K., FINNEY, J. W., & MOOS, R. H. (1994). Applying a stress and coping perspective to research on mutual help organizations. Journal of Community Psychology, 22, 312-327.

HUMPHREYS, K., MAVIS, B. E., & STÖFFELMAYR, B. E. (1991). Factors predicting attendance at self-help groups after substance abuse treatment: Preliminary findings. Journal of Consulting and Clinical Psychology, 59, 591-593.

MOFFITT, R. (1991). Program evaluation with nonexperimental data. Evaluation Review, 15, 291-314.

JÖRESKOG, K. G., & SÖRBOM, D. (1993). LISREL 8: The SIMPLIS command language. Hillsdale, NJ: Lawrence Erlbaum.

MOOS, R. H., CRONKITE, R. C., & FINNEY, J. W. (1990). Health and daily living form manual (2nd ed.). Palo Alto, CA: Consulting Psychologists Press.

LAVORI, P. W., DAWSON, R., & MUELLER, T. B. (1994). Causal estimation of time-varying treatment effects in observational studies: Application to depressive disorder. Statistics in Medicine, 13, 1089-1100.

NEWEY, W. K., POWELL, J. L., & WALKER, J. R. (1990). Semiparametric estimation of selection models: Some empirical results. American Economic Review, 80, 324-328.

LEVY, L. (in press). Self-help groups. In J. Rappaport & E. Seidman (Eds.), Handbook of community psychology. New York: Plenum.

REYNOLDS, A. J., & TEMPLE, J. A. (1995). Quasi-experimental estimates of the effects of preschool intervention. Evaluation Review, 19, 347-373.

LEVY, L. H., & DERBY, J. F. (1992). Bereavement support groups: Who joins, who does not, and why. American Journal of Community Psychology, 20, 649-662.

RUBIN, D. B. (1974). Estimating causal effect of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688-701.

LIEBERMAN, M. A., & SNOWDEN, L. R. (1993). Problems in assessing prevalence and membership characteristics of self-help participants. Journal of Applied Behavioral Science, 29, 166-180.

RUBIN, D. B. (1977). Assignment to treatment group on the basis of a covariance. Journal of the Royal Statistical Society, 2(1), 1-26.

MADDALA, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge: Cambridge University Press.

MANNING, W. G., NEWHOUSE, J. P., DUAN, N., KEELER, E. B., LEIBOWITZ, A., & MARQUIS, M. S. (1987). Health insurance and the demand for medical care. American Economic Review, 77, 251-277.

STAIGER, D., & STOCK, J. H. (1993). Instrumental variables regression with weak instruments. (Available from National Bureau of Economic Research, 1050 Massachusetts Avenue, Cambridge, MA 02138.)

MANSKI, C. (1989). Anatomy of the selection problem. Journal of Human Resources, 24, 343-360.

TIMKO, C., MOOS, R. H., FINNEY, J. W., & MOOS, B. S. (1994). Outcome of treatment for alcohol abuse and involvement in AA among previously untreated problem drinkers. Journal of Mental Health Administration, 21, 145-160.

MANSKI, C. (1990). Nonparametric bounds for treatment effects. American Economic Review, 80, 319-323.

TORO, P. A., REISCHL, T. M., ZIMMERMAN, M. A., RAPPAPORT, J., SEIDMAN, E., LUKE, D. A., & ROBERTS, L. J. (1988). Professionals in mutual help groups: Impact on social climate and members’ behavior. Journal of Consulting and Clinical Psychology, 56, 631-632.

MCFADDEN, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105-142). New York: Academic Press.

MCFADDEN, D. (1987). Regression-based specification tests for the multinomial logit model. Journal of Econometrics, 34, 63-82.

WALSH, D. C., HINGSON, R. W., MERRIGAN, D. M., LEVENSON, S. M., CUPPLES, A., HEEREN, T., COFFMAN, G. A., BECKER, C. A., BARTLER, T. A., HAMILTON, S. K., McGUIRE, T. G., & KELLY, C. A. (1991). A randomized trial of treatment options for alcohol-abusing workers. New England Journal of Medicine, 325, 775-782.

MEEHL, P. (1970). Nuisance variables and the ex post facto design. In M. Radner & S. Vinokur (Eds.), Minnesota studies in the philosophy of science (pp. 373-402). Minneapolis: University of Minnesota Press.

WILLIS, R. J., & ROSEN, S. (1979). Education and self-selection. Journal of Political Economy, 87, S7-S36.