
The Mathematics of Risk Classification:
Changing Data into Valid Instruments for Juvenile Courts

Report

U.S. Department of Justice

Office of Justice Programs

Office of Juvenile Justice and Delinquency Prevention

This report was prepared under a contract with the National Center for Juvenile Justice, the research division of the National Council of Juvenile and Family Court Judges, for their National Juvenile Justice Data Analysis Project, funded by a cooperative agreement with the Office of Juvenile Justice and Delinquency Prevention of the U.S. Department of Justice. The opinions expressed are those of the authors and do not necessarily reflect the views or endorsement of the National Center for Juvenile Justice, the National Council of Juvenile and Family Court Judges, the Office of Juvenile Justice and Delinquency Prevention, the U.S. Department of Justice, the Maricopa County Juvenile Court or its Juvenile Justice Center, or any other agency or person.

U.S. Department of Justice
Office of Justice Programs
810 Seventh Street NW.
Washington, DC 20531

Alberto R. Gonzales
Attorney General

Tracy A. Henke
Acting Assistant Attorney General

J. Robert Flores
Administrator

    Office of Juvenile Justice and Delinquency Prevention

Office of Justice Programs
Partnerships for Safer Communities
www.ojp.usdoj.gov

Office of Juvenile Justice and Delinquency Prevention
www.ojp.usdoj.gov/ojjdp

The Office of Juvenile Justice and Delinquency Prevention is a component of the Office of Justice Programs, which also includes the Bureau of Justice Assistance, the Bureau of Justice Statistics, the National Institute of Justice, and the Office for Victims of Crime.

The Mathematics of Risk Classification: Changing Data into Valid Instruments for Juvenile Courts

Don M. Gottfredson and Howard N. Snyder

    National Center for Juvenile Justice

Report
July 2005
Office of Juvenile Justice and Delinquency Prevention
NCJ 209158


    Prologue

The primary goal of a risk scale is to predict—not to explain, but to predict. Individual items are placed on the risk instrument because they are believed to be better at predicting the criterion (which in the work that follows is recidivism to juvenile court) than are other available measures. Some risk instruments are developed subjectively, by asking knowledgeable professionals to select items their experience tells them are predictive. Other instruments are developed empirically, using statistical analyses to select among a set of available data elements that capture various attributes of the youth and his/her juvenile court history. Statistically, when an item is a significant predictor of the criterion, it improves the power of the instrument to predict and should be used in the scale.

However, problems arise when empirical power is the sole criterion used to select risk scale predictors. Items that predict may also carry with them value judgments or ethical connotations that threaten the face validity of the instrument and may erroneously imply certain motives to its developers or users. These concerns cannot, and should not, be ignored or discounted. The obvious example of such a dilemma for risk scale developers in juvenile justice is the use of race (and/or ethnicity) as a predictor of recidivism (or program failure). The fact is that most juvenile justice research has found a correlation between race and negative outcomes. Knowing nothing else about a youth, one would be on statistically sound footing to predict that minority youth are more likely to recidivate than white youth. However, few today would argue that a child's race is causally related to his likelihood of recidivism. Indeed, such a use of race would violate our sense of equal justice and equal protection under the law. Recidivism differences may be correlated with race, but they are not caused by race. The recidivism differences are caused by attributes that are themselves correlated with race (e.g., poverty, school failure, a high proportion of unsupervised time in a day, the levels of community disruption, and the amount of police surveillance in the community).

Statistically, if these items were available for each child, they could be incorporated into the pool of potential risk scale items and race would "drop out" as a predictor in the statistical analysis (i.e., found not to be a significant predictor of recidivism). However, when they are not available, their predictive power will be partially captured by a race variable that will then show a significant correlation with recidivism. In other words, when the range of available data is limited, race will "stand in for" the absent variables that are causally related to recidivism.

So what are risk scale developers to do? Statistically, including the race variable improves the predictions made by the risk scale. Ethically, excluding the race variable prevents the assignment of risk scores that may inappropriately lead to greater sanctions for minority youth. There is a solution that enables risk scale developers to accommodate both concerns. This solution is central to the work that follows. Briefly, the authors suggest that risk scale developers include a race variable in the early stages of risk scale development, but then remove the race variable from the published instrument. The authors argue that some statistical methods used in the development of risk scales enable the unique (i.e., independent) effect of race to be removed from the prediction process. In fact, the authors argue that unless race is mathematically included in the initial steps of the development of a risk scale when race is found to correlate with the criterion measure, racial bias cannot be removed from the resulting risk scale, but remains unobtrusively beneath the surface, influencing each risk scale score. As discussed on page 31, mathematically removing the race variable from the published risk instrument treats all youth as having the same race. Ethically, statistically removing the influence of race on risk scores goes a long way to assure users that race is not a factor in the assignment of these scores. Practically speaking, in order to avoid having racial biases in the final product, the developer must use the race variable and then remove it.


As you read the material that follows, be assured that while race was found to be a predictive factor when using various statistical techniques to develop a risk scale, there is no claim that race causes recidivism. More importantly, it is the clear intent of this work to demonstrate how the influence of race (and other invidious variables) can and should be systematically removed from risk scales by those tasked to develop these important decisionmaking aids. The risk scales presented in the report include race as a predictive factor. These prototype scales are presented as models of those experimental scales that will be prepared during the development stages and used in the validation phases, to be seen and used only by the technical staff tasked with developing the scales. In the end, when the scales move from development into field-testing and then into day-to-day use, the race variable would not be a part of risk scales, either explicitly as an item on the scale or indirectly as a spurious correlation with other scale items. Part of the real value of this work is to demonstrate the method by which race (and other such variables) may be initially used to create the prediction scale, but then removed so that the vestiges of discrimination are removed from the final scale. As the discussion on page 31 notes, such a process is feasible when multiple regression techniques are used, and the use of such a multiple-step process (first using race as a predictor, then removing it from the scale) can assist in creating prediction scales that do not exacerbate the issues of minority overrepresentation in contact with the justice system.
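To make this include-then-remove procedure concrete, the following sketch illustrates the idea with ordinary least squares. It is a minimal illustration, not the authors' implementation: the data are fabricated, the variable names are invented, and NumPy is assumed. Race enters the development step so that its unique effect is captured by its own coefficient; the published scoring function then holds that term at one constant value for everyone, treating all youth as having the same race.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical construction-sample attributes (0/1); names illustrative only.
prior_referral = rng.integers(0, 2, n)   # any prior referral to the court
school_problem = rng.integers(0, 2, n)   # school-related difficulty on record
race = rng.integers(0, 2, n)             # invidious predictor: development use only
new_referral = rng.integers(0, 2, n)     # criterion: new referral within 2 years

# Step 1: develop the equation WITH race included, so race's unique
# (independent) effect is absorbed by its own coefficient instead of
# leaking into the weights of the correlated items.
X = np.column_stack([np.ones(n), prior_referral, school_problem, race])
beta, *_ = np.linalg.lstsq(X, new_referral.astype(float), rcond=None)

# Step 2: publish a scoring function that holds race at one constant
# value for all youth; only the scale's constant term is affected.
RACE_CONSTANT = race.mean()

def published_risk_score(prior, school):
    """Risk score with the race term frozen; race never enters scoring."""
    return beta[0] + beta[1] * prior + beta[2] * school + beta[3] * RACE_CONSTANT

print(published_risk_score(prior=1, school=0))
```

Because the race term is a constant at scoring time, it shifts every score equally and cannot change any youth's rank or risk group.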


    Acknowledgments

The permission of the Maricopa County Juvenile Court (Phoenix, AZ) and its Juvenile Justice Center was much appreciated. Terrence A. Finnegan prepared the data file from the collection of the National Juvenile Court Data Archive maintained by the National Center for Juvenile Justice. David P. Farrington reviewed the manuscript and provided many helpful suggestions. Nancy Tierney prepared the manuscript for publication. We also valued the support and advice of Joseph Moone of the Office of Juvenile Justice and Delinquency Prevention, who is Program Monitor for the National Juvenile Justice Data Analysis Project (grant number 99–JN–FX–K002).


    Table of Contents

    Page

    List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

    List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

    Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

    Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    Purpose of This Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    Prior Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

    General Limitations of Risk-Screening Instruments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    General Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    Steps Followed in Prediction Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    Simplification Practices and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    Requirements for Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    Probability and Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    Predictor Candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    Methods of Combining Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    Linear Additive Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    Configural or Clustering Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    “Bootstrap” Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    Data Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    Construction and Validation Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    Development and Validation of Operational Tools . . . . . . . . . . . . . . . . . . . . . . 17

    Burgess Method (Equal Weight Linear Model) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    Multiple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    Predictive Attribute Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

    Bootstrap Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    Comparison of Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    Predictive Efficiency and Operational Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    Removing Invidious Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    Which Method? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


    List of Tables

    Table Page

    1. Selected Attributes Included for Study, With Relation to Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . .14

    2. New Referral within 2 Years, Analyzed by Whether There Were Any Prior Referrals . . . . . . . . . . . . .15

    3. Selected Variables Included for Study, With Relation to Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15

    4. Classification by the Burgess 9-Item Scale, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17

    5. Classification by the Burgess 15-Item Scale, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18

    6. Multiple Linear Regression of New Referrals on Selected Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .19

    7. Classification by the Multiple Linear Regression Scale, Validation Sample . . . . . . . . . . . . . . . . . . . . .19

    8. Logistic Regression of New Referrals on Selected Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20

    9. Classification by the Logistic Regression Scale, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .20

    10. Classification by Predictive Attribute Analysis, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . .23

    11. Classification of Youth With No Prior Referrals, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . . .24

    12. Classification of Youth With Prior Referrals, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .25

    13. Classification of All Youth by Bootstrap Method, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . .25

    14. Correlation of Prediction Scores With Outcomes, Construction and Validation Samples . . . . . . . . .26

    15. Criterion Variance Explained, Construction and Validation Samples, Five Classification Methods . . .28

    16. Rankings and Authors’ Ratings of Six Classification Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29

    17. Correlations of Classification Procedures Based on Various Prediction Methods, Validation Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29


    List of Figures

    Figure Page

    1. Burgess 9-Item Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17

    2. Classification Groups From the Burgess 9-Item Scale, Validation Sample . . . . . . . . . . . . . . . . . . . . . .17

    3. Burgess 15-Item Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18

    4. Classification Groups From the Burgess 15-Item Scale, Validation Sample . . . . . . . . . . . . . . . . . . . . .18

    5. Classification of Youth Based on Multiple Linear Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . .19

    6. Classification Groups From the Multiple Linear Regression Scale, Validation Sample . . . . . . . . . . . .19

    7. Classification Groups Based on Logistic Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20

    8. Classification Groups From the Logistic Regression Scale, Validation Sample . . . . . . . . . . . . . . . . . .20

    9. Predictive Attribute Analysis, Construction Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21

    10. Classification Into Five Groups by Predictive Attribute Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . .22

    11. Classification Groups From the Predictive Attribute Analysis, Validation Sample . . . . . . . . . . . . . . .23

    12. Classification of Youth With and Without Prior Referrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24

    13. Classification of Youth With No Prior Referrals Into Five Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24

    14. Classification of Youth With Prior Referrals Into Five Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .25

    15. Classification of All Youth Into Five Groups by Bootstrap Method . . . . . . . . . . . . . . . . . . . . . . . . . . . .25

16. Frequency Distribution of Scores for the Burgess 15-Item Scale Scores for Youth With and Without at Least One New Referral . . . . . . . . . . . . . . . . . . . . . . .27

    17. Percent Distribution of Scores for the Burgess 15-Item Scale Scores for Youth With at Least One New Referral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .27

18. Frequency Distribution of Scores for the Multiple Linear Regression Scale Scores for Youth With and Without at Least One New Referral . . . . . . . . . . . . . .27

19. Percent Distribution of Scores for the Multiple Linear Regression Scale Scores for Youth With at Least One New Referral . . . . . . . . . . . . . . . . . . . . . . . . . .27


    Preface

This report is meant to help juvenile courts develop practical risk-screening instruments. Courts increasingly are using some method of risk classification to assist in assignment of youth to differential service/supervision programs. A comparison of commonly used or advocated risk classification methods may provide courts with guidance in selecting a method for developing a screening instrument. This report assesses the strengths and weaknesses of several prediction methods in the context of juvenile courts' risk classification needs, based on analyses of one court's data that were submitted to the National Juvenile Court Data Archive maintained by the National Center for Juvenile Justice.

Although the authors hope such advice will be useful, they recognize that giving it is problematic. What works best in one situation or with one set of data may work less well in another situation or with another set of data. The authors have sought to offer a basic description of the main statistical and practical problems associated with the methods compared and to provide a fair assessment of the advantages and disadvantages of each method. These assessments are based largely on the predictive validity of classification instruments similar to those that a court might devise by using the methods compared. Because the emphasis is on practice rather than on statistics, the authors often simplified concepts in the interests of helping courts devise clear classification procedures that are easily understood and used. In comparing classification methods, the authors placed major importance on the validity of the measures as a court might actually use them.

In order to focus on comparing methods of combining predictors into a single instrument for risk classification, the authors ignore a number of issues that are of great importance in prediction, including concerns related to sampling, reliability, and discrimination in predictor variables and in the criterion. Selection problems are influenced not only by these concerns and by issues of validity, but also by the proportions of a population to be selected and the shapes of distributions. In addition, the authors devote little attention to the subject of "what to predict" (i.e., the criterion) and do not address the importance of the base rate. Lastly, except for basing conclusions on results derived from a validation sample, the authors do not consider general problems associated with validation studies or with measurement of the accuracy of prediction. This report provides references to discussions of these topics.



    Introduction

The juvenile courts are not alone in their need to classify persons on the basis of predicted future behavior. College administrators and admissions committees, personnel managers, paroling authorities, medical practitioners, and others must attempt to predict future behaviors when making decisions about people. The problem of prediction is central to all behavioral science, and it is at the core of decisionmaking problems in many areas of life.

Because of this fundamental importance, a great deal of effort has been devoted to establishing means of predicting specific behaviors (or events related to these behaviors). The fields of juvenile and criminal justice have been at the forefront of much of this effort and have produced a long history of research and a rich literature on predicting offense behaviors.1

Prediction methods in juvenile or criminal justice have various utilities, and research workers may have different purposes in developing risk classifications. Major benefits from prediction research include such contributions as increasing understanding, testing hypotheses, doing research on effectiveness of treatment, and planning programs for different categories of offenders. Each application raises different problems for comparative assessments of methods. This report focuses only on applications used to classify youth into a small number of groups according to risk as a part of the ordinary operating procedures of a juvenile court. It compares the various statistical methods by assessing the validity of the resulting simple classification instruments, although it also reports the validity of the prediction methods on which they are based.

Because this report emphasizes the comparative validity of the various classification devices, it ignores or mentions only briefly other topics of great importance in comparisons of prediction methods. These include problems associated with the following:

■ The relative efficiency of informal clinical or other subjective ("in-the-head") predictions and more formal ("actuarial" or "statistical") methods of prediction.2

    ■ Base rates.3

    ■ Unreliability.4

    ■ Selection of criteria (outcomes) to be predicted.5

    ■ Measurement of the accuracy of prediction.6

■ Validation issues.

1 For histories and reviews of methodological issues, see H. Mannheim and L. Wilkins, Prediction Methods in Relation to Borstal Training (London, England: Her Majesty's Stationery Office, 1955); D.M. Gottfredson, "Assessment and Prediction Methods in Crime and Delinquency," in Task Force Report: Juvenile Delinquency and Youth Crime (Washington, DC: U.S. Government Printing Office, 1967); H.F. Simon, Prediction Methods in Criminology (London, England: Her Majesty's Stationery Office, 1971); D.M. Gottfredson and M. Tonry (eds.), Prediction and Classification: Criminal Justice Decision Making, vol. 9 of Crime and Justice: A Review of Research (Chicago, IL: The University of Chicago Press, 1987).

2 For a recent review, see P.E. Meehl and W.M. Grove, "Comparative Efficiency of Informal (Subjective, Impressionistic) and Formal (Mechanical, Algorithmic) Prediction Procedures: The Clinical-Statistical Controversy," Psychology, Public Policy, and Law 2, no. 2 (1996).

3 See P.E. Meehl and A. Rosen, "Antecedent Probability and the Efficiency of Psychometric Signs, Patterns, or Cutting Scores," Psychological Bulletin 52 (1955).

4 For discussions, see D.T. Campbell and J.C. Stanley, Experimental and Quasi-Experimental Designs for Research (Chicago, IL: Rand McNally, 1963); L.J. Cronbach, Essentials of Psychological Testing (New York, NY: Harper Brothers, 1960); and E.F. Cureton, "Validity, Reliability, and Baloney," in Problems of Human Assessment, edited by D.M. Jackson and S. Messnick (New York, NY: McGraw-Hill, 1967).

5 See D.M. Gottfredson and M. Tonry (eds.), Prediction and Classification: Criminal Justice Decision Making, vol. 9 of Crime and Justice: A Review of Research (Chicago, IL: The University of Chicago Press, 1987).

6 See S.D. Gottfredson and D.M. Gottfredson, "The Accuracy of Prediction Models," in Research in Criminal Careers and "Career Criminals," vol. 2, edited by A. Blumstein, J. Cohen, J.A. Roth, and C.A. Visher (Washington, DC: National Academy Press, 1986).


These topics are interrelated, and full consideration of any one requires consideration of the others. Such consideration is not within the scope of this report.

    Purpose of This Report

This report is directed to persons who are responsible for developing and using risk classification instruments in juvenile courts and is intended to help them classify youth into a small number of risk groups as an aid to program assignments. These individuals must decide which of several competing, commonly used statistical procedures will work best in selecting and combining variables for prediction. Given a data set that records characteristics of youth at referral, they must decide which characteristics will be most useful for risk classification. They must also decide how to combine the characteristics selected as predictors, usually into a single scale that will be the basis for the classification tool. When risk classification is actually put into practice, the validity of the classification tool is more important than the validity of the prediction instrument on which it is based. Although some predictive efficiency may be lost in the conversion from the prediction instrument to the simpler classification tool, it is the validity of this tool, as it might be used in practice, that is of utmost importance.

    Prior Studies

Empirical comparisons of the predictive efficiency of different statistical methods for selecting and combining predictors have been carried out occasionally; most have been based on adult offender samples.7

Generally, statistical methods that are theoretically less powerful and computationally and procedurally rather simple have been found to be equal or superior in predictive validity to the more complex and theoretically better methods. Debate continues, though, about the most efficient (most valid, least costly, most operationally useful) way to develop a risk classification instrument.

    Procedures

This report compares the validities of several instruments derived by three markedly different types of statistical methods that are the most commonly used for selecting and combining predictor variables to yield risk scores, such as the risk of a new referral to the court. The first procedure is based on the assumption that a linear equation can be found that adds the scores of the predictor variables to provide a total score related to the outcome of interest (such as a new referral to the court). Two instruments based on this procedure are compared—in the first, the values of the predictor variables are simply added; in the second, weights are applied to each variable, and the weighted scores are added. The second procedure uses a more statistically sophisticated model, based not on the risk of failure but rather on the logarithm of the odds of "succeeding" or "failing" associated with each predictor variable. The third procedure is quite different from the others. It uses a "clustering" or group classification method to subdivide a sample into risk subgroups directly rather than on the basis of scores on a single scale. A fourth approach, which combines features from two of these procedures, is also considered. This approach first subdivides the sample into more homogeneous subgroups and then develops linear equations separately for each subgroup.

Throughout this report, the statistical solutions are used to devise classification tools, because that is how the solutions are commonly used in practice. As these tools need to be not only credible but also easy to use, the report introduces some simplifications. For example, in practice, classifications of youth usually define only a small number of groups (no more than five and often only three). Therefore, emphasis is placed herein on comparisons of validity based on classifications into just five groups, although a substantially larger number of classifications could be devised with the statistical methods considered.8

7 D.M. Gottfredson et al., The Utilization of Experience in Parole Decision-Making: Summary Report (Washington, DC: U.S. Government Printing Office, 1974); H.F. Simon, Prediction Methods in Criminology (London, England: Her Majesty's Stationery Office, 1971); S.D. Gottfredson and D.M. Gottfredson, Screening for Risk: A Comparison of Methods (Washington, DC: U.S. Department of Justice, National Institute of Corrections, 1979); D.P. Farrington and R. Tarling, "Criminological Prediction: The Way Forward," in Prediction in Criminology, edited by D.P. Farrington and R. Tarling (Albany, NY: State University of New York Press, 1985); W. Wilbanks and M. Hindelang, "The Comparative Efficiency of Three Predictive Methods," app. B, in D.M. Gottfredson, L.T. Wilkins, and P.B. Hoffman, Summarizing Experience for Parole Decision Making (Davis, CA: National Council on Crime and Delinquency Research Center, 1972).



At the end, the authors provide a few recommendations that focus on practical aspects of risk classification.

    General Limitations of Risk-Screening Instruments

The instruments discussed in this report have been developed and validated with respect to specific criteria, using available data in a specific jurisdiction during a specific time period. As a result, any generalizations—to other outcomes of interest, with modifications of the item definitions used, or to other jurisdictions or time periods—should be regarded as uncertain.

8 The emphasis on a relatively small number of categories limits the utility of the comparisons herein for other applications of prediction methods. When a prediction method results in a larger number of categories (commonly, a continuously distributed score), there are more choices of cutting points—a factor that often is important (particularly for selection problems), in part because the distributions usually are markedly skewed. See L.J. Cronbach and G.C. Gleser, Psychological Tests and Personnel Decisions (Urbana, IL: University of Illinois, 1957). To simplify the comparisons, this problem has been ignored.


    General Method9

9 This section is adapted from D.M. Gottfredson (ed.), Juvenile Justice with Eyes Open (Pittsburgh, PA: National Center for Juvenile Justice, 2000).

10 In the case of dichotomous variables, other indices, such as the odds ratio of "successes" and "failures," may be preferred; see, e.g., D.P. Farrington and R. Loeber, "Some Benefits of Dichotomization in Psychiatric and Criminological Research," Criminal Behavior and Mental Health 10 (2000): 100–122.

Steps Followed in Prediction Studies

Five steps should be followed in developing and using a risk assessment (prediction) measure: (1) defining what is to be predicted (the criterion categories), (2) selecting the predictors, (3) measuring relations between the predictors and the criterion categories, (4) verifying the relations found (validating the method), and (5) applying the method.

    Defining Criterion Categories

The first step is to establish the criterion categories of "favorable" or "unfavorable" performance, or "new offense," or some other event. This involves defining the behavior or event to be predicted and developing procedures for classifying persons on the basis of their performance in regard to that behavior or event. This step is of utmost importance, because it defines the standard for selecting predictors and testing the validity of results. In addition, it sets limits to generalization.

    Selecting Predictors

Second, the attributes or characteristics on which the predictions may be based are selected and defined. These "predictor candidates" are expected to relate significantly to the criterion categories. (Critics of prediction methods often argue that the procedures ignore differences among individuals. However, such differences, often assumed to be a source of error in other problems, are in fact the basis for any prediction effort. If the persons studied are alike with respect to the predictor candidates, no differential prediction can be made. If they are alike with respect to the criterion categories, there is no prediction problem.)

    Measuring Relations

The third step is to determine the relations between the predictor candidates and the criterion categories (and, for some methods, among the predictor variables) in a sample representative of the population for which inferences are to be drawn. These relations ordinarily are measured by the Pearson product moment correlation coefficient (for continuous variables), the point biserial correlation (an estimate of the correlation coefficient for one continuous and one dichotomous variable), or the phi coefficient (an estimate of the correlation for two dichotomous attributes).10 A representative sample usually is sought by random selection from the population to which generalizations are to be made. Any haphazard sample is apt to be biased, and procedures for sample selection should ensure that each individual in the population has an equal chance of inclusion in the sample.
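Conveniently for implementation, both the point biserial correlation and the phi coefficient can be computed as an ordinary Pearson correlation on 0/1-coded data. A minimal sketch, with invented data, illustrative variable names, and NumPy assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
age_at_first_referral = rng.normal(14.0, 2.0, n)   # continuous variable
any_prior_referral = rng.integers(0, 2, n)         # dichotomous attribute
new_referral = rng.integers(0, 2, n)               # dichotomous criterion

# Point biserial: Pearson correlation of a continuous variable with
# a 0/1 criterion.
r_pb = np.corrcoef(age_at_first_referral, new_referral)[0, 1]

# Phi coefficient: Pearson correlation of two 0/1 attributes.
phi = np.corrcoef(any_prior_referral, new_referral)[0, 1]

print(f"point biserial = {r_pb:.3f}, phi = {phi:.3f}")
```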

    Verifying Relations

Fourth, the relations found in the original sample must be verified by testing the prediction procedures in a new sample, or samples, of the population. Although this verification (referred to as cross-validation) often is omitted, it is a critical step. Without it, there can be little confidence in the utility of a prediction method for any practical application.

    Applying the Method

Fifth, the prediction method may be applied in situations for which it was developed, provided that the stability of predictions has been supported in the cross-validation step and appropriate samples have been used.


Simplification Practices and Assumptions

To simplify procedures, prediction studies commonly engage in certain practices and make certain assumptions. Some practices that are not strictly justified on the basis of measurement considerations are followed nevertheless. For example, numerals may be used to represent differences in attributes (variates that are dichotomous)11 because this serves a practical purpose. In addition, qualitative attributes may be assigned numbers, which are then used in statistical computations. These practices no doubt will continue, because the results are useful. The measurement and scaling problems involved in these practices, however, should not be ignored, because improving the measurement of individual differences by examining their hypothesized relations with later delinquency behavior is one means of improving current abilities to predict risk.

A second example of simplification is the assumption that the criterion magnitudes are linear functions of the predictor variables. This assumption is common in prediction studies, including some (but not all) of the studies considered in this report. Improvements of prediction methods, however, may also require study of nonlinear relations.

A third example of simplification is the common assumption that the relations among predictors and between predictors and the criterion hold for subgroups of a heterogeneous population. Prediction methods might be improved, however, by separate study of various subgroups of youth.

In addition, prediction studies commonly use certain statistical methods in a way that ignores generally accepted requirements for the appropriate use of the methods. One example of this practice is the use of multiple linear regression when the criterion is dichotomous rather than continuous. This practice probably also will continue because, although questionable, it produces results that often are useful nevertheless.

These considerations may point to avenues for improving risk classifications, but they do not preclude the use of prediction methods. That is because results of these methods are constantly being judged by their ability to discriminate criterion classifications in new samples. How well a method works is the ultimate test.

    Requirements for Predictors

Two requirements for any predictor candidate (besides an expected relation to the criterion) are discrimination and reliability.

    Discrimination

Discrimination is reflected in the number of categories to which significant numbers of persons are assigned. A classification that assigns all persons to one category does not discriminate. The best procedure is one that provides a "rectangular" distribution of categorizations of persons for the sample—i.e., one with about equal numbers of persons in each category. When only two categories are involved, the most discriminating item is one with half of the persons in the sample in each category.

One consequence of the discrimination requirement is that predictor items found useful in one jurisdiction may not prove useful in another. For example, a drug-involvement item that is a useful predictor in a jurisdiction where many youth have histories of drug abuse may be of no use in another jurisdiction where few youth use drugs. The discrimination requirement points to the need for cross-validation studies in various jurisdictions. It also points to one hazard of the untested acceptance of prediction measures developed outside a given target jurisdiction.

Considerations related to sampling and the concept of validity suggest similar conclusions. An item may effectively discriminate within a sample yet not be a valid predictor for a given criterion; conversely, an item with good discrimination within the sample may prove to be more useful than a valid predictor with little discrimination.

11 This report generally uses the term "attribute" to refer to items or characteristics that are dichotomous (e.g., white and not white) and the term "variable" to items or characteristics that are continuous (e.g., age).


    Reliability

Reliability refers to the consistency or stability of repeated observations, scores, or other classifications. If a procedure is reliable, then repetitions of the procedure lead to similar classifications. Valid prediction is not possible with completely unreliable measures, but all measurement is relatively unreliable. This means that some variation is always to be expected. It also means that if valid prediction is demonstrated, then the predictor attributes are not completely unreliable. Because the main interest in a prediction method is in how well it works, more importance is attached to the concept of validity than to that of reliability. This does not mean, however, that the issue of reliability is unimportant. Making predictor variables more reliable is another means of improving prediction.

    Probability and Validity

Because no prediction of future behavior, achieved by any means, can be made with certainty, a statement of degree of probability is a more appropriate prediction. Predictions are properly applied to groups of persons who are similar in terms of some set of characteristics, rather than to individuals. In any prediction problem, individuals are assigned to classes, and then statements are made about the expected performance of members of the classes. For specific classifications of persons, the expected performance outcomes ought to be those that provide the most probable values for those classes.

The validity of a prediction method refers to the degree to which earlier assessments are related to later criterion classifications. The question of validity asks how well the method works. Any prediction method, however, may be thought of as having or lacking not one but many validities of varying degrees. For example, an intelligence test might provide a valid prediction of high school grades. It might provide a much less valid, but still useful, measure of expected social adjustment, and it might have no validity for the prediction of delinquency. The validity of prediction refers to the relation between an earlier assessment and a specific criterion measure, and it is dependent upon the particular criterion used. A prediction method has as many validities as there are criterion measures to be predicted.

Evidence of validity with respect to a specific criterion of interest in a particular target population obviously is necessary before any practical application of the method in that population. Nevertheless, juvenile justice agencies sometimes use risk measures developed and tested elsewhere (or simply created without following all of the steps described above) without first obtaining this crucial validation. The practice of not validating a method in samples from a given target population should be discouraged.

    Predictor Candidates

Predictor items may be obtained from self-reports of the youth concerned, reports of observers or judges, records of past performance, other elements of the youth's social history, or direct observation by the person making the prediction. Predictor items may include psychological test scores, measures of attitudes or interests, biographical items, ratings—or, indeed, any data about the subject.

In juvenile justice system agencies, predictor items commonly are data coded from social histories, complaints, or other records pertaining to youth. Many studies have found the following items to be related to typical criterion classifications such as new offenses or referrals:

■ Age. The relation of age to later delinquency and crime is well established. The likelihood of offending increases sharply with age in the early teens, reaches a peak in the late teens or early twenties, and then declines continuously with age. Another age variable often found useful in predictions for youth populations is age at first complaint, referral to the courts, or delinquency.

■ Gender. Because girls generally are found to be better risks than boys are (e.g., less likely to commit subsequent offenses), gender often is a useful predictor.

■ Record of complaints or offending. Most measures of prior delinquency may be expected to be related to typical criterion classifications. These measures include number of prior complaints or court referrals, prior petitions, previous adjudications, prior probation violations, and other indicators of previous difficulty with rules violations.


    ■ Classifications of type of complaint.

    ■ History of drug or alcohol abuse.

Although these items are commonly found to be useful predictors, any jurisdiction that is considering using an item as a predictor should test the relation of the item to the jurisdiction's criterion classification in a sample drawn from its own target population. The usual relation may not be found, or the strength of the relation may be different from that found in other jurisdictions.


    Methods of Combining Predictors

12 E.W. Burgess, "Factors Determining Success or Failure on Parole," in The Workings of the Indeterminate Sentence Law and Parole in Illinois, edited by A.A. Bruce (Springfield, IL: Illinois State Parole Board, 1928).

A number of predictor items may be used in establishing a single prediction score or in establishing a predictive classification of persons into groups (i.e., a "risk classification," as often used in juvenile justice). The items may be selected and combined in various ways. All or some of the predictor items may be combined without weighting, some procedure for selecting and weighting the items may be used, or some other type of classification based on the items may be adopted. How to determine which procedure works best is the primary focus of this report and the question addressed in the comparisons that follow.

    Linear Additive Models

    Arbitrary Weights

The most common method of constructing a prediction score is to use many predictor items and to arbitrarily assign each item the same weight. The procedure either uses dichotomous attributes as predictors or reduces variables to attributes by collapsing a continuous distribution to two categories. Each item found to be predictive is assigned one point regardless of the strength of its association with the criterion. The sum of the points assigned in this way is the total score.

    Thus, the model is as follows:

    Y = (a) + x1 + x2 + x3 + … + xi

where Y is the prediction score related to the expected value of the criterion, a is an arbitrary constant that may or may not be included, and xi is some predictor attribute.

This method often is referred to as the Burgess method, after its originator in parole prediction studies.12 It is a simple and popular procedure, but it may be inefficient because it does not take into account the interrelations among the predictor items. Nevertheless, if the number of items is large, there are good arguments besides simplicity for using this procedure. If the number of predictor items is large, the effect of differential weighting that takes into account the overlaps among items tends to become unimportant. If the number of items is relatively small, however, weighting the items by one of the available empirical methods can improve the efficiency of prediction.

If all items are weighted equally—i.e., each is assigned one point—the equation above shows that the total score is simply the sum of all the items. In juvenile justice risk classification studies, however, the procedure often is modified by a more complex (though still arbitrary) weighting of the predictor items. These weights are arrived at subjectively or on the basis of several arbitrary cutting scores on a continuous variable such as "number of prior referrals to the court."

    Then the model is:

Y = (a) + λ1x1 + λ2x2 + λ3x3 + … + λixi

where Y and a are as before, xi is some predictor attribute, and λi is some arbitrary weight. For example, a judgment may be made that some specific offenses predict reoffending, and these offenses may be subjectively assigned more weight than are others. Or, an item such as "number of prior referrals" may be classified into several groups such as "high," "medium," and "low" and then scored as 3, 2, and 1, respectively. The weights are called "arbitrary" because, unlike the procedures described in the next section, they are not based on the data—i.e., they are not derived empirically. Although often used in risk classification studies, these more complex arbitrary weighting procedures cannot be recommended; therefore, the Burgess method as presented in this study uses only unit weights (i.e., one point for each item).
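A minimal sketch of unit-weight (Burgess) scoring, with invented 0/1 attributes for illustration: each predictive item contributes one point, so the total score is a simple row sum.

```python
import numpy as np

# One row per youth; columns are illustrative 0/1 predictor attributes:
# [any prior referral, school problem on record, drug/alcohol history].
attributes = np.array([
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
])

# Burgess score: every item carries a unit weight, so the prediction
# score Y is just the number of adverse attributes present.
burgess_scores = attributes.sum(axis=1)
print(burgess_scores)  # [2 0 3]
```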

The advantages of the Burgess method include its simplicity, its ability to produce an easily understood scale that has face validity, and the fact that the resulting instrument is easy to score and interpret.13 There are, however, several disadvantages: the intercorrelations—i.e., "overlaps" among the items—are ignored; there is no basis for assuming that the arbitrary weight of one will provide optimal discrimination of the criterion groups; and there is no way to tell which items are actually redundant.

    Methods of Weighting Items

The second set of methods in common use is based on some procedure for determining statistically how to weight the predictor items to arrive at more efficient prediction. There are three often-used procedures, each with some advantages and disadvantages. These procedures are called multiple linear regression, discriminant analysis, and logistic regression.

    Multiple Linear Regression

This procedure provides a linear equation (a set of weighted items added together) with weights assigned to minimize the squared deviations of observed and expected criterion values. It takes into account the intercorrelations among predictor variables as well as the relation of each predictor to the criterion. This method is most appropriate when the criterion measure is composed of continuous scores on an interval scale—e.g., scores providing a measure of overall social adjustment. It often is used with a dichotomous criterion such as "success" or "failure," but such use violates some statistical assumptions and has disadvantages as discussed below.

    The model is as follows:

    Y = (a) + β1x1 + β2x2 + β3x3 + … + βixi

where Y is the predicted value of the criterion, a is a constant (the intercept of the regression line), xi is a predictor variable, and βi is its coefficient (weight) estimated from the data.

One advantage of this procedure is that it results in a theoretically optimal weighting of the predictors when they are used in combination with each other. Another advantage of the procedure is that it makes it possible to estimate each predictor item's relative contribution to explaining variability in the outcome criterion—thereby facilitating selection of predictor items. Moreover, the standardized coefficients (weights) serve to indicate the relative importance of each item in the context of all others in the equation. Another possible advantage of this procedure is that a method exists for eliminating the effect of unwanted "invidious" predictors, as discussed later in this report. Some classifications that result from this procedure may have substantial face validity, but others may not (i.e., it may not be apparent why the variables are weighted as they are).14

Disadvantages of this procedure include its assumptions that (1) predictor variables are linearly related to the criterion measure and (2) variables are normally distributed and measured on an interval scale. In addition, multiple linear regression usually develops weights that are derived from the correlation matrix (for the predictor candidates and the criterion) using the total sample of subjects; by doing so, there is an implicit assumption that the relations are the same for all subgroups of youth, which may be false. Another disadvantage arises when, as is usual for risk classification in juvenile justice applications, the criterion classification is not a continuous measure but instead is composed of two or more nominal classes (i.e., groups with numbers assigned solely to give them names). In that case, logistic regression (discussed below), or a related model, would be theoretically superior to multiple linear regression. Another disadvantage, arising from the violation of statistical assumptions underlying multiple linear regression, is that expected values resulting from the analysis may be greater than one or less than zero, although the criterion is scored either zero or one. In addition, it is unreasonable to assume, as the model requires, that the errors are normally distributed.

13 Moreover, a simple, equally weighted linear additive model has been found to be as good as, and in some ways better than, multiple linear regression. See H. Wainer, "Estimating Coefficients in Linear Models: It Don't Make No Nevermind," Psychological Bulletin 83 (1976).

14 Even linear models that use random regression weights have been found to perform substantially better in predictions than humans do. See R.M. Dawes and B. Corrigan, "Linear Models in Decision-Making," Psychological Bulletin 81 (1974).
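As an illustration of how the equation might be estimated and then collapsed into a small number of risk groups (the conversion from prediction instrument to classification tool emphasized in this report), here is a minimal sketch; the data, the quintile cutting points, and the use of NumPy's least-squares routine are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600
X = np.column_stack([
    np.ones(n),                 # constant a (intercept)
    rng.integers(0, 2, n),      # x1: any prior referral
    rng.integers(0, 2, n),      # x2: school problem on record
    rng.normal(14.0, 2.0, n),   # x3: age at referral
])
y = rng.integers(0, 2, n).astype(float)   # dichotomous criterion (0/1)

# Estimate the weights beta that minimize the squared deviations of
# observed and expected criterion values.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
scores = X @ beta   # Y = a + b1*x1 + b2*x2 + b3*x3 for each youth

# Collapse the continuous scores into five classification groups at
# the construction-sample quintiles (one possible choice of cuts).
cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
risk_group = np.digitize(scores, cuts) + 1   # 1 = lowest risk ... 5 = highest
print(np.bincount(risk_group)[1:])           # youth per risk group
```

Cutting points other than quintiles could be chosen; as noted earlier, the choice of cuts lets an administrator adjust group sizes for differential supervision programming.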


    Discriminant Analysis

This procedure is closely related to the multiple linear regression procedure, but the weights are determined in such a way that the criterion classification groups are maximally separated in terms of the average values for the prediction scores in relation to their pooled standard deviations. If the criterion classifications provide only two groups (e.g., scored zero and one), then this method is mathematically equivalent to multiple linear regression.15 It can, however, be used with more than two groups, making it possible to find several equations (up to one less than the number of criterion groups) that optimally separate the groups. The advantages and disadvantages of discriminant analysis parallel those of multiple linear regression.

    Logistic Regression

When the criterion is composed of only two groups, as is usual for risk classification in juvenile justice, logistic regression is the theoretically best approach. The multiple linear regression and discriminant analysis procedures discussed above present statistical difficulties when the criterion can take on only two values (such as "success" or "failure")—the assumptions that underlie the use of these methods are violated in such applications. The logistic regression procedure requires fewer assumptions but provides an estimate of the likelihood that an event will occur. The weights assigned are those that make these estimates most likely. The equation can be written to show how the odds that an event will occur change with changes in each variable included. This study uses logistic regression and bases the resulting classification tool on the weights that generate the probabilities.16

This procedure seems more complex, but the underlying rationale is straightforward. Estimates of the probability of an event occurring are made directly. Whereas in the multiple linear regression model discussed above, weights for predictors are estimated on the basis of "least squares," in logistic regression they are estimated by a method of "maximum likelihood"—i.e., the coefficients selected are those that make the observed results most likely. The weights can be interpreted as a change in the logarithm of the odds ratio associated with a one-unit change in any predictor. It is easier to think in terms of odds rather than log odds, though, and the equation can be written in terms of odds, resulting in an additive model.

    The model is as follows:

Ω = e^z / (1 + e^z)

where Ω is the probability of the event, e is the base of the natural logarithms (about 2.718), and z is the linear combination

z = (a) + B1x1 + B2x2 + … + Bixi

where a is a constant, the Bs are coefficients estimated from the data, and the xs are the values of the predictors.17

The equation may be written in terms of the log of the odds ratio, which is called the logit:

log(Probability of event / Probability of nonevent) = log(p/(1 − p)) = z

The B coefficients show how the odds change with each predictor variable. If expressed in terms of odds rather than log odds, the equation may be written as follows:

p/(1 − p) = e^z = e^((a) + B1x1 + B2x2 + B3x3 + … + Bixi)

Raising e to the power B for any predictor variable shows the factor by which the odds change when that predictor variable is raised by one unit.
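A brief sketch of fitting the model and reading the weights as odds multipliers; the data are invented, and scikit-learn is assumed as one widely available maximum-likelihood implementation (with its default regularization made negligible for this sketch):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 600
X = np.column_stack([
    rng.integers(0, 2, n),   # x1: any prior referral
    rng.integers(0, 2, n),   # x2: school problem on record
])
y = rng.integers(0, 2, n)    # dichotomous criterion (0/1)

# Coefficients B are chosen by maximum likelihood; a large C makes
# scikit-learn's default penalty negligible.
model = LogisticRegression(C=1e6).fit(X, y)

# e^B: factor by which the odds of the event change when the
# corresponding predictor increases by one unit.
print("odds multipliers:", np.exp(model.coef_[0]))

# Predicted probability for one youth, i.e., e^z / (1 + e^z).
print("P(event):", model.predict_proba([[1, 0]])[0, 1])
```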

The main advantage of logistic regression is that few statistical assumptions are required for its use. In addition, it generates probability values that, unlike those generated by regression with a dichotomous criterion, are constrained between zero and one. A disadvantage of logistic regression is that persons unfamiliar with statistics may find it more difficult to interpret. The resulting classification may or may not have much face validity for administrators.

15 For this reason, discriminant analysis is not included in this report’s comparisons. See, for example, O.R. Porebski, “On the Interrelated Nature of the Multivariate Statistics Used in Discrimination Analysis,” British Journal of Mathematical and Statistical Psychology (1966).

16 For the classification proposed, the probabilities are not needed.

17 D.W. Hosmer and S. Lemeshow, Applied Logistic Regression (New York, NY: John Wiley and Sons, 1989).


Another disadvantage is that complete data for each subject are needed to calculate weights; thus, missing data cause great problems.

    Configural or Clustering Methods

A variety of configural or clustering methods for grouping youth according to combinations of predictor items also have been used in delinquency prediction. The most common have been those called “predictive attribute analysis,” “association analysis,” and “configural analysis.”18 This study uses a predictive attribute analysis, which is the clustering method most directly related to the prediction problem and also is the configural procedure that generally has shown the best results in related studies.19

Predictive attribute analysis is a hierarchical, divisive clustering method. First, the single best predictor among all the available (attribute) candidates is found. Then, after dividing the whole sample into two subgroups on the basis of that predictor, the single best predictor is found within each of the subgroups. The groups are again divided, and the process is repeated for four subgroups. This process continues until some “stopping rule” ends it. A common rule is to stop when no further discriminators are found at a predetermined level of confidence, such as .01 or .05. If the sample studied is large, this process may generate a large number of subgroups; thus, the procedure may be stopped when identification of further subgroups does not seem operationally useful (even though this requires a subjective judgment).
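The subdivision logic just described can be sketched compactly. The sketch below assumes each record is a dictionary of 0/1 attributes with a 0/1 outcome field named new_referral (an illustrative name), and it uses the .01 level of confidence (chi-square of 6.635 with 1 degree of freedom) as the stopping rule.

```python
# A sketch of predictive attribute analysis: repeatedly split on the single
# best attribute until no split is significant at the .01 level (df = 1).
def chi_square_2x2(records, attr):
    # Fourfold table of the attribute (0/1) against the criterion (0/1).
    n = [[0, 0], [0, 0]]
    for r in records:
        n[r[attr]][r["new_referral"]] += 1
    total = len(records)
    chi2 = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = sum(n[i]) * (n[0][j] + n[1][j]) / total
            if expected == 0:
                return 0.0
            chi2 += (n[i][j] - expected) ** 2 / expected
    return chi2

def split(records, attrs, depth=0):
    if not records:
        return
    scores = {a: chi_square_2x2(records, a) for a in attrs} if attrs else {}
    if not scores or max(scores.values()) < 6.635:   # stopping rule (p < .01)
        rate = sum(r["new_referral"] for r in records) / len(records)
        print("  " * depth + f"terminal group: n={len(records)}, {rate:.1%} with a new referral")
        return
    best = max(scores, key=scores.get)
    for value in (0, 1):
        print("  " * depth + f"{best} = {value}:")
        split([r for r in records if r[best] == value],
              [a for a in attrs if a != best], depth + 1)
```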

Advantages of predictive attribute analysis are that it requires few statistical assumptions and usually produces results that have considerable face validity. Another possible advantage is the discovery of any interactions of predictors that may affect the outcome of the prediction. Another advantage is that, unlike other methods, predictive attribute analysis takes into account the heterogeneity of the subjects and, in fact, builds on these subgroup differences.

One small disadvantage of predictive attribute analysis is that computer analysis programs for the entire procedure are not readily available (although they are for the other methods discussed).20 Another disadvantage may be that, in predictive attribute analysis (unlike other methods), the final classifications for operational use are found directly rather than by the use of cutting scores for a continuous distribution—a factor that limits the extent to which group sizes for differential supervision programming may be manipulated for administrative advantages. Another issue is that termination of the procedure depends on arbitrary stopping rules, including the level of significance required for continuation of the process. Yet another concern is that attributes may be very close (or even equal) in apparent predictive power except, for example, in the third decimal place of the measure of the relation to the criterion. In such a case, the choice between two attributes is actually quite arbitrary—usually the item thought to be more reliable is selected.

    “Bootstrap” Methods

“You can’t lift yourself up by your own bootstraps” is an adage sometimes referred to by statisticians. They mean that no statistical analysis can improve inadequate data (i.e., “GIGO, garbage in, garbage out”) and also that no amount of extensive statistical manipulation can find results not present in the data (“GISGO, garbage in, sophisticated garbage out”).

Nevertheless, there are sound reasons to expect that a statistical approach sometimes called “bootstrapping” might improve the procedures discussed above. If a population contains subgroups that differ from one another in the way the predictor candidates are correlated or in their relations to the criterion, the prediction method may benefit from a separate analysis of those subgroups. This report explores the use of one such bootstrap procedure.21

18 L.T. Wilkins and P. MacNaughton-Smith, “New Prediction and Classification Methods in Criminology,” Journal of Research in Crime and Delinquency 1(1964):19–32. For a description of other clustering methods, see T. Brennan, “Classification Methods in Criminology,” in D.M. Gottfredson and M. Tonry (1987), supra, note 1, 201–248.

19 See, for example, S.D. Gottfredson and D.M. Gottfredson (1979), supra, note 7.

20 All analyses described in this report were completed using Statistical Package for the Social Sciences (SPSS) 6.0; see M.J. Norusis, SPSS for Windows (Chicago, IL: SPSS, Inc., 1993).

21 The term “bootstrap” is sometimes used in other ways, e.g., in reference to analyses of repeated random samples to determine stable estimates.


    Data Used

22 The age variable usually proves helpful in prediction, but this study’s exclusion of 16- and 17-year-olds limits its variability. Thus, the study may find only a weak relation of age to the criterion, even though the relation would be strong were the full age range considered.

23 It is desirable that all sample members have equivalent exposure to risk of the unfavorable outcome. Although it is technologically feasible to follow youth through any detention, commitment to State confinement, and charges in adult courts, such tracking rarely has been accomplished in the juvenile courts to date. The lack of the kind of data that would result from such tracking reduces the reliability and validity of the criterion classifications, thereby reducing the efficiency of prediction.

    Sample

This study was based on the records of 9,476 youth ages 8–15 who were referred to the Juvenile Court of Maricopa County, AZ, in calendar year 1990. These records were provided by the court to the National Juvenile Court Data Archive, which is maintained by the National Center for Juvenile Justice. Records of 16- and 17-year-olds were excluded because these youth would “age out” of the juvenile court’s jurisdiction (at age 18) before the end of the study’s 2-year followup period.22, 23

Construction and Validation Samples

Approximately a 20-percent random sample of records (1,924) was selected to compose the construction sample—i.e., the sample on which the prediction instruments were developed. The remaining 80 percent (7,552) were reserved for a validation sample—the sample to be used for verification.
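The division itself is straightforward to carry out. The sketch below assumes a list of youth records (a simple stand-in is used here) and fixes the random seed only so that the split is reproducible.

```python
# A sketch of the 20/80 division into construction and validation samples;
# the records list is a stand-in for the court's referral data.
import random

records = list(range(9476))       # placeholder for 9,476 youth records
random.seed(1990)                 # fixed seed for a reproducible split
random.shuffle(records)

cut = int(0.20 * len(records))
construction = records[:cut]      # about 20 percent: develop the instruments
validation = records[cut:]        # about 80 percent: verify them
```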

The selection of the construction sample size was based on several considerations. The larger the construction study sample, the greater the stability to be expected upon validation; however, many smaller juvenile courts have relatively small samples available for study, and this study sought to simulate the application of methods that might be meaningful to all.

Experience shows that when the linear models are used, construction samples of about 500 to 1,000 subjects usually result in adequate stability when tested on validation samples. Larger samples are needed, however, for predictive attribute analysis (in which samples quickly become smaller as the subdivision process advances) and for the bootstrap method. Thus, the selection of the 20-percent construction sample seemed a reasonable compromise for taking into account the need for a sufficiently large sample, the limitations likely to be encountered by smaller courts, and the problem of sample size reduction for some of the methods to be assessed.

Jurisdictions with smaller populations available for sample selection might find it advantageous to split the sample randomly in half to provide construction and validation samples. This is a common procedure. However, because prediction implies looking to the future and because conditions may change over time, a better approach is to use all available cases in one year for the construction sample and cases from a later period for the validation sample.

    Criterion

The criterion for this study was defined as “return to the juvenile court with a new complaint within 2 years.” This criterion was selected because it is commonly used in juvenile courts, typically has adequate discrimination in samples, and is a reasonable indicant of relevant youth behavior subsequent to a referral to the court. In this study, 42.8 percent of youth in the construction sample and 44.7 percent of youth in the validation sample had at least one new referral within 2 years.

    Attributes

Attributes were scored as zero or one. Some items—such as whether there were any prior referrals to the court—are natural attributes. Others result from collapsing continuous variables—such as the number of prior referrals—into only two categories.24 Because some analyses require attribute data, all variables were made into attributes (even though they could be used alternatively as continuous scores when the analytic procedures permitted that). Selected attribute predictor candidates are listed in table 1, which also shows the proportion of the construction sample possessing each attribute, the percentage of new referrals among youth possessing the attribute, and the attribute’s relation to the criterion. (The last is measured by the phi coefficient, an estimate of correlation for fourfold tables—i.e., two rows and two columns.)25 Table 1 lists the items in order of the magnitude of the phi coefficients.

Fourfold tables can, of course, be produced for each listed attribute from the data in table 1. Table 2 shows an example of a fourfold table for the first listed attribute—“any prior referrals.”

    Variables

Table 3 shows selected items that are distributed continuously, along with a measure of their relations to the criterion: the point biserial correlation.26 The distributions tend to be markedly skewed to the right.

24 An example of creating an attribute from a continuously distributed variable is this study’s use of the classification “three or more complaints.” The distribution of the variable “number of complaints” was dichotomized in this way because subjects who had more than two complaints experienced new referrals within 2 years at rates higher than the base rate (i.e., the overall rate of new referrals for the sample). It is common practice to examine the distributions for a continuous variable for the outcome groups (in the construction sample) and then to make a judgment about classifying the variable into an attribute. In doing so, however, the risk of capitalizing on chance variation in selecting the cutting score should be kept in mind.

25 Phi = the square root of the quantity (chi-square divided by the number of cases), when there is only one degree of freedom.

26 The point biserial correlation is an estimate of the correlation coefficient for a continuous interval level variable and a dichotomous dependent one. For simplicity, this study considered all variables listed as having equal intervals.


    Table 1: Selected Attributes Included for Study, With Relation to Criterion (N = 1,924)

                                        Percent of All    Percent With Attribute
                                        Youth in Sample   Who Had New Referrals   Phi
Attribute                               With Attribute    Within 2 Years          Coefficient   Significance

Any prior referrals                     41%               64%                     .35           < .001
At-risk record*                         41                64                      .35           < .001
Any prior delinquent offense            37                65                      .34           < .001
Three or more complaints in history**   24                73                      .34           < .001
Any prior informal dispositions         37                64                      .32           < .001
Any prior theft offense referrals       28                68                      .32           < .001
Any prior formal dispositions           22                71                      .30           < .001
Any prior adjudications                 18                74                      .29           < .001
Petition filed                          23                61                      .20           < .001
Male                                    70                49                      .19           < .001
Any prior status offense referral       15                65                      .19           < .001
Any prior adjustment                    11                69                      .19           < .001
Any prior cases with detention           9                73                      .19           < .001
At-risk youth***                        13                65                      .17           < .001
Not white                               39                52                      .16           < .001
Grade 7 or higher                       70                47                      .14           < .001
Any prior violent offense referrals      4                76                      .13           < .001
Mexican American                        26                53                      .12           < .001
Age 13 or older                         78                46                      .11           < .001
Auto theft or person offense            17                55                      .11           < .001
Age 14 or older                         60                46                      .09           < .001
Referral not from law enforcement        6                60                      .09           < .001
Theft offense referral                  48                38                      –.09          < .001
Present referral for auto theft          5                61                      .09           < .001
Violence offense, present referral       4                62                      .08           < .001
Person offense                          12                53                      .07           < .001
Detained                                 8                54                      .07           < .003
Any prior drug offense referrals         2                67                      .07           < .002
Not in school, but should be             8                55                      .06           < .008
Three or more counts at referral         5                56                      .06           < .004
Referral for delinquency                81                42                      –.03          ns
In school                               86                43                      .02           ns
Status offense referral                 19                45                      .02           ns
Referral for drug offense                2                49                      .02           ns
Parents married                         22                43                      .00           ns

*   At-risk record: any record of referral for a drug offense or violence offense or for delinquency or status.
**  Three or more complaints in history: reflects the sum of present and prior complaints.
*** At-risk youth: not in school, referred by source other than law enforcement agency (e.g., school, parents), or prior informal adjustment of alleged offense.


Table 2: New Referral Within 2 Years, Analyzed by Whether There Were Any Prior Referrals

                       New Referral Within 2 Years
Any Prior      No                  Yes                 Total
Referrals      Number   Percent    Number   Percent    Number   Percent

No             812      71.9%      317      28.1%      1,129    58.7%
Yes            289      36.4       506      63.6       795      41.3
Total          1,101    57.2       823      42.8       1,924    100.0

Phi coefficient = .354; chi-square = 241; df = 1; P < .001
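As a check on these figures, phi for table 2 can be recomputed directly from the four cell counts. The sketch below uses the standard fourfold-table formula, which is equivalent to the chi-square definition given in note 25.

```python
# Recomputing phi for table 2 from its cell counts.
import math

a, b = 812, 317    # no prior referrals: no new referral, new referral
c, d = 289, 506    # prior referrals:    no new referral, new referral
n = a + b + c + d  # 1,924 youth

phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(phi, 3))            # 0.354, as reported with table 2

# Equivalently (note 25), phi squared times n recovers chi-square for df = 1:
print(round(n * phi ** 2))      # about 241
```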


Development and Validation of Operational Tools

27 For discussion of such errors and consequences for predictive decisions, see S.D. Gottfredson and D.M. Gottfredson, “Behavioral Prediction and the Problem of Incapacitation,” Criminology 32, no. 3 (1994):441–474.

28 Validities of the prediction methods (rather than of the classifications) are summarized later in this report (see table 14).

29 Cramér’s statistic is an estimate of correlation calculated from chi-square for any k by l table. This statistic is used because the classification is only rank ordered. Several other measures of the efficiency of prediction have been developed, and the question of which measure to use is complex. Other indices would be preferred if the samples compared differed in base rates or in the number of categories in the classification. For discussions of the limitations of the measures used here and of other measures of predictive efficiency, see S.D. Gottfredson and D.M. Gottfredson, “The Accuracy of Prediction Models,” in Research in Criminal Careers and “Career Criminals,” vol. 2, edited by A. Blumstein et al. (Washington, DC: National Academy Press, 1986). For a briefer discussion, see S.D. Gottfredson, “Prediction: Methodological Issues,” in D.M. Gottfredson and M. Tonry (1987), supra, note 1, 29–33.

Each prediction method under consideration is used to devise a simple form, in a process intended to simulate what might be done in a juvenile court that wishes to develop a classification tool for use in its everyday operations. Courts generally prefer a simple procedure that permits the classification of all youth into a few categories, with each category differing in the proportion of youth expected to return to the court on a new complaint. Thus, this study sought to produce forms that are easily completed and that result in the classification of youth into five groups.

In devising these classification procedures, the distributions of scores in the construction sample were examined in order to determine cutting scores that would allow placement of all cases into five groups. The cutting scores were determined by selecting points that would result in groups with approximately equal numbers of cases. The proportions of cases with each score and the percentages with at least one new referral were also examined. When distributions were markedly skewed, the five groups were far from equal in their portions of total cases classified. When cutting scores are based on apparently optimal discrimination of outcomes in the construction sample, there exists the risk of capitalizing on chance variation; the use of a validation sample, however, provides a check on this.
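This step can be sketched as follows, assuming the construction-sample scores are held in a list; the candidate cuts fall at the 20th, 40th, 60th, and 80th percentiles, and the analyst would then inspect and adjust them. With markedly skewed scores, tied values will leave the five groups unequal, as noted above.

```python
# A sketch of choosing cutting scores for five roughly equal groups.
def candidate_cuts(scores):
    ordered = sorted(scores)
    n = len(ordered)
    # Cut points at the 20th, 40th, 60th, and 80th percentiles.
    return [ordered[int(n * q)] for q in (0.2, 0.4, 0.6, 0.8)]

def classify(score, cuts):
    # Group 1 (lowest risk) through group 5 (highest risk).
    return 1 + sum(score > c for c in cuts)
```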

In an operational situation, cutting scores should be determined with a view to the objectives of the intended use of the classification instrument. If, for example, the objective is to identify a group with low probabilities of new referrals in order to place youth in a minimal supervision program, the decision about the cutting score might take into account a number of concerns: the shape of the distribution, the degree of validity of the instrument, the proportions selected at a given point, the percentage of youth with favorable or unfavorable outcomes, and the judged tolerance for risk.

In any decision of this sort, two types of error must be taken into account: misclassifying a youth as being likely to return to court with a new referral, when the youth actually “succeeds,” and misclassifying a youth as being a likely “success,” when the youth actually experiences a new referral.27 Both types of error have important consequences that should be assessed in program planning when tools such as those discussed in this report are to be used.

For each method, the following sections summarize the results of the analysis in the construction sample and also provide a measure of the predictive validity found when the five-group classification “operational tools” devised from the methods were applied to the validation sample.28 For a simple measure of validity, Cramér’s statistic is presented to describe the relation between the classification model and the criterion.29
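For reference, Cramér’s statistic can be computed from any k-by-l table of classification groups against outcomes, as sketched below. The demonstration counts are taken from table 4 (presented in the next section) and reproduce the reported value of .383.

```python
# A sketch of Cramér's statistic for a k-by-l table of counts.
import math

def cramers_statistic(table):
    total = sum(sum(row) for row in table)
    col_sums = [sum(row[j] for row in table) for j in range(len(table[0]))]
    chi2 = sum((cell - sum(row) * col_sums[j] / total) ** 2
               / (sum(row) * col_sums[j] / total)
               for row in table for j, cell in enumerate(row))
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (total * (k - 1)))

# Rows: groups 1-5 from table 4; columns: (no new referral, new referral).
table_4 = [[2690, 1034], [429, 327], [466, 509], [316, 556], [278, 947]]
print(round(cramers_statistic(table_4), 3))   # 0.383
```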


Figure 2: Classification Groups From the Burgess 9-Item Scale, Validation Sample

[Bar chart. Percent with a new referral, by classification group (1 = low risk to 5 = high risk): 27.8%, 43.3%, 52.2%, 63.8%, 77.3%.]

30 The effect of adding more items was illustrated with the data in this study when Burgess scales made up of 6, 9, 12, 15, and 20 items were devised and tested. The coefficients measuring the relation of scores to the criterion (i.e., the Cramér’s statistics) were as follows: 6 items, .359 (in the construction sample) and .349 (in the validation sample); 9 items, .392 and .387; 12 items, .405 and .394; 15 items, .417 and .399; 20 items, .401 and .389.

31 “Shrinkage” refers to the apparent loss in predictive power from the construction sample to the validation sample. Some shrinkage is normally expected and usually results from overfitting of the device or equation on the construction sample. For the comparisons in this report, shrinkage may be due to such loss, to rounding, or to the reduction of scales for the purpose of providing the five-group classifications.

Figure 1: Burgess 9-Item Scale

Add one point for each item:
    Any prior referral                        ___
    Any prior adjudication                    ___
    Any prior formal disposition              ___
    Any prior informal disposition            ___
    Any prior theft offense referrals         ___
    Three or more complaints in history       ___
    Any prior delinquency referral            ___
    Petition filed                            ___
    Prior referral for drugs, delinquency,
      violence, or status                     ___

    Total score:                              ___

Classify youth:
    Score:   Group:
    0        1  Lowest risk
    1–3      2
    4–5      3
    6–7      4
    8–9      5  Highest risk

Table 4: Classification by the Burgess 9-Item Scale, Validation Sample

Group            Number     Percent     New Referral
Classification   in Group   of Total    Number    Percent

1                3,724      49.3%       1,034     27.8%
2                756        10.0        327       43.3
3                975        12.9        509       52.2
4                872        11.5        556       63.8
5                1,225      16.2        947       77.3
Total            7,552      100.0       3,373     44.7

Cramér’s statistic = .383

Burgess Method (Equal Weight Linear Model)

Various Burgess-type models were devised. All of the models followed the traditional method, with one point added for each predictor variable, but varying in the number of attributes included in the scale. The optimal number of items for this type of scale is not well established, although it may be assumed that as the number of positively correlated variates increases, the correlations between any two sets of weighted scores approach one and the effect of weighting the items tends to disappear. Experience shows that usually little predictive efficiency is gained after 8 or 10 items, and adding more items probably is redundant and may contribute more to face validity than empirical validity.30 The issue of the number of items is not necessarily a trivial one, because the acceptance and use of the instrument may be at stake. On the other hand, adding items has some operational cost and at some point fails to increase (or may actually decrease) predictive efficiency.

This study somewhat arbitrarily presents two scales: one based on only 9 items, the other based on 15. The validity of the five-group classification derived from application of each scale is examined.

Figure 1 shows the Burgess 9-item scale. In the construction sample, the Cramér’s statistic was .39 (for the ungrouped scores). In the validation sample, it was .38 (for the grouped scores), which means that the grouping and testing in the different sample resulted in little shrinkage (loss of predictive efficiency).31 Table 4 summarizes results from application of the Burgess 9-item scale to the validation sample.
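Scoring and grouping with a form of this kind reduces to a few lines of code. In the sketch below, which follows figure 1, the attribute field names are illustrative stand-ins for the nine items.

```python
# A sketch of scoring the Burgess 9-item scale (figure 1); the field names
# are illustrative stand-ins for the nine attributes.
NINE_ITEMS = [
    "any_prior_referral", "any_prior_adjudication",
    "any_prior_formal_disposition", "any_prior_informal_disposition",
    "any_prior_theft_referral", "three_or_more_complaints",
    "any_prior_delinquency_referral", "petition_filed",
    "prior_drug_delinquency_violence_or_status_referral",
]

def burgess_score(youth):
    # One point for each attribute the youth possesses.
    return sum(youth[item] for item in NINE_ITEMS)

def burgess_group(score):
    # Figure 1 grouping: 0 -> 1; 1-3 -> 2; 4-5 -> 3; 6-7 -> 4; 8-9 -> 5.
    if score == 0:
        return 1
    if score <= 3:
        return 2
    if score <= 5:
        return 3
    if score <= 7:
        return 4
    return 5
```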


Figure 3: Burgess 15-Item Scale

Add one point for each item:
    Any prior referral                        ____
    Any prior adjudication                    ____
    Any prior formal disposition              ____
    Any prior informal disposition            ____
    Any prior theft offense referrals         ____
    Three or more complaints in history       ____
    Any prior delinquency referral            ____
    Petition filed                            ____
    Prior referral for drugs, delinquency,
      violence, or status                     ____
    Any prior detention                       ____
    Male                                      ____
    Any prior adjustment                      ____
    Any prior status offense referral         ____
    At-risk youth                             ____
    Not Caucasian                             ____

    Total score:                              ____

Classify youth:
    Score:        Group:
    0             1  Lowest risk
    1             2
    2–5           3
    6–9           4
    10 or more    5  Highest risk

Table 5: Classification by the Burgess 15-Item Scale, Validation Sample

Group            Number     Percent     New Referral
Classification   in Group   of Total    Number    Percent

1                830        11.0%       157       18.9%
2                2,103      27.8        573       27.2
3                1,758      23.3        728       41.4
4                1,489      19.7        872       58.6
5                1,372      18.2        1,043     76.0
Total            7,552      100.0       3,373     44.7

Cramér’s statistic = .390

Figure 4: Classification Groups From the Burgess 15-Item Scale, Validation Sample

[Bar chart. Percent with a new referral, by classification group (1 = low risk to 5 = high risk): 18.9%, 27.2%, 41.4%, 58.6%, 76.0%.]

Figure 2 shows the percentage of youth with at least one new referral to juvenile court during the 2-year followup period, for each of the five classification groups. The group classification, like the distribution of the Burgess scores, is markedly skewed to the right, and the classification places half of the youth in the lowest risk category.32 Groups 2–5 have new referrals above the base rate of 43 percent.

Figure 3 shows the Burgess 15-item scale. Table 5 summarizes validation results from the 15-item scale, and figure 4 shows the percentage of youth in each classification group with at least one new referral during the 2-year followup.

Despite the significantly larger number of items in the 15-item scale, the validity coefficient (Cramér’s statistic) is only slightly larger than in the 9-item scale. Nevertheless, the classification based on the 15-item scale might be more useful administratively, because it provides better discrimination at the “low risk” end of the scale. In addition, the 15-item method offers more possibilities for the choice of cutting scores to establish the classification groups. Depending on the objectives of the classification decision, the greater choice of cutting scores may provide an operational advantage in that the five-group classification could be formulated in many ways.33

    Multiple Linear Regression

Table 6 presents a summary of the multiple linear regression analysis, including the attributes and variables included in the resulting scale. Because the β coefficients show the relative contributions to prediction of each variable in the context of all other variables, it can be seen that the attribute “any prior referrals” is found to be the best predictor. Other variables contributing to the prediction are “number of prior referrals,” “whether male,” “whether Caucasian,” and

32 Because half of the sample had scores of zero (no prior referrals, no more than two complaints, and no petition filed), it was not possible to classify youth into approximately equal groups.

33 This issue is discussed further, with examples from the results of the present study, in a later section (pages 27–28).


Figure 6: Classification Groups From the Multiple Linear Regression Scale, Validation Sample

[Bar chart. Percent with a new referral, by classification group (1 = low risk to 5 = high risk): 22.3%, 29.3%, 43.8%, 54.5%, 76.5%.]

Table 6: Multiple Linear Regression of New Referrals on Selected Items

Item (Attribute      Unstandardized    Standard     Standardized
or Variable)         Coefficient (B)   Error of B   Coefficient (β)   Significance

Male                 .161              .023         .150


Figure 7: Classification of Youth Based on Logistic Regression Analysis

If:
    Any prior referrals                    Add 8.9       ______
    Male                                   Add 5.9       ______
And add:
    2.6 times the number of prior
      delinquency referrals                              ______
                                           Subscore      ______
Then subtract:
    If Caucasian                           5.5           ______
    0.7 times the age at the time
      of the complaint                                   ______
                                           Subscore      ______

                                           Total score   ______

Classify youth:
    Score:            Group:
    9.90 or lower     1  Lowest risk
    9.91–12.30        2
    12.31–17.50       3
    17.51–19.80       4
    19.81 or higher   5  Highest risk
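The form in figure 7 translates directly into code. In the sketch below, the weights and cut points are those printed in the figure; the function signature is an illustrative convenience.

```python
# A sketch of the figure 7 scoring form; weights and cut points are taken
# from the figure itself.
def logistic_form_score(any_prior_referrals, male,
                        prior_delinquency_referrals, caucasian, age):
    score = 0.0
    if any_prior_referrals:
        score += 8.9
    if male:
        score += 5.9
    score += 2.6 * prior_delinquency_referrals
    if caucasian:
        score -= 5.5
    score -= 0.7 * age
    return score

def logistic_form_group(score):
    # 9.90 or lower -> 1; 9.91-12.30 -> 2; 12.31-17.50 -> 3;
    # 17.51-19.80 -> 4; 19.81 or higher -> 5.
    for group, upper in enumerate((9.90, 12.30, 17.50, 19.80), start=1):
        if score <= upper:
            return group
    return 5
```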

Table 9: Classification by the Logistic Regression Scale, Validation Sample

Group            Number     Percent     New Referral
Classification   in Group   of Total    Number    Percent

1                1,596      21.1%       389       24.4%
2                1,343      17.3        393       30.1
3                1,609      21.4        645       39.9
4                1,435      19.3        794       54.5
5                1,569      20.9        1,152     73.1
Total            7,552      100.0       3,373     44.7

Cramér’s statistic = .357

Table 8: Logistic Regression of New Referrals on Selected Items

Item (Attribute          Coefficient   Standard     Exponent
or Variable)             (B)           Error of B   (e^B)      Wald      P

Any prior referrals      .892          .136         2.440      42.907


Figure 9: Predictive Attribute Analysis, Construction Sample

All 1,924 youth (42.8% with at least 1 new referral)
    1,129 with no prior referrals (28.1% with a new referral)
        752 Caucasians (24.6% with a new referral)
            257 females (18.3% with a new referral): Group 1
            495 males (27.9% with a new referral): Group 3
        377 not Caucasians (35.0% with a new referral)
            134 females (27.6% with a new referral): Group 2
            243 males (39.1% with a new referral): Group 4
    795 with prior referrals (63.6% with a new referral)
        193 females (43.0% with a new referral): Group 5
        602 males (70.3% with a new referral)
            234 with 1 or 2 prior complaints (54.7% with a new referral): Group 6
            368 with 3 or more prior complaints (80.2% with a new referral)
                180 Caucasians (73.3% with a new referral): Group 7
                188 not Caucasians (86.7% with a new referral): Group 8

    Predictive Attribute Analysis

The predictive attribute analysis was carried out on the construction sample with the 1-percent level of significance (for chi-square with one degree of freedom) taken as the stopping rule. Candidates for the best predictor for each subgroup identified were all attributes shown in table 1 (not already used in the subdivision process).

This process resulted in the identification of eight groups, as shown in figure 9. The classification is simple, the procedure has considerable face validity, and the eight-group classification probably would work well in operations.

The number of groups, however, was reduced to five, to permit comparisons with the other five-group classification procedures tested. The five groups were obtained by combining several groups with similar rates of new referrals, as shown in figure 10.35 When the classification in the validation sample was carried out, the results, shown in table 10 and figure 11, were obtained.

35 The following groups from figure 9 were combined: 1 and 2, 4 and 5, and 7 and 8.


Figure 10: Classification Into Five Groups by Predictive Attribute Analysis

All youth: Any prior referrals?
    No: Caucasian?
        Yes: Male?
            No:  Group 1
            Yes: Group 2
        No: Male?
            No:  Group 1
            Yes: Group 3
    Yes: Male?
        No:  Group 3
        Yes: 3 or more prior complaints?
            No:  Group 4
            Yes: Group 5


Table 10: Classification by Predictive Attribute Analysis, Validation Sample

Group            Number     Percent     New Referral
Classification   in Group   of Total    Number    Percent

1                1,484      19.7%       332       22.4%
2                1,821      24.1        536       29.4
3                1,765      23.4        813       46.1
4                965        12.8        534       55.3
5                1,517      20.1        1,158     76.3
Total            7,552      100.0       3,373     44.7

Cramér’s statistic = .387

Figure 11: Classification Groups From the Predictive Attribute Analysis, Validation Sample

[Bar chart. Percent with a new referral, by classification group (1 = low risk to 5 = high risk): 22.4%, 29.4%, 46.1%, 55.3%, 76.3%.]

Other grouping procedures might be more useful operationally, depending upon the objectives for the intended use of the classification. For example, going back up the tree of subdivision in figure 9, five groups may be defined. Ignoring the Caucasian attribute among youth with prior referrals results in combining groups 7 and 8. Ignoring the gender attribute among youth with no prior referrals results in combining groups 2 and 4 and 1 and 3.

    Bootstrap Analysis

A potential advantage of a bootstrap procedure is that it takes into account the possibility that the weights developed on the basis of a sample as a whole may be wrong for subgroups if intercorrelations differ for them. Disadvantages may be that sample sizes are reduced, resulting in less stability of the weights and an increased likelihood of capitalizing on chance variation. One result may be that greater shrinkage is found on validation.
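In outline, the strategy looks like the sketch below: divide the sample on a salient attribute and develop a separate equation within each subgroup. Here fit_model stands for whichever estimation routine is in use, and the prior-referral split mirrors the analysis described next.

```python
# A sketch of the subgroup ("bootstrap") strategy: fit separate equations
# for youth with and without prior referrals; fit_model is any of the
# estimation methods discussed in this report.
def fit_by_subgroup(records, fit_model):
    with_priors = [r for r in records if r["any_prior_referral"]]
    without_priors = [r for r in records if not r["any_prior_referral"]]
    # Each subgroup receives its own weights, so the equations can reflect
    # intercorrelations that differ between the subgroups.
    return {
        "prior_referrals": fit_model(with_priors),
        "no_prior_referrals": fit_model(without_priors),
    }
```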

To explore a bootstrap method with the Maricopa County data, the authors conducted separate regression analyses on the data for youth with and without any prior referral to the juvenile court. Tools simulating the o