what works? targeting, testing & tracking (triple-t) amimb, glazier’s hall 29 october 2013...

What Works? Targeting, Testing & Tracking

(Triple-T)

AMIMB, Glazier’s Hall, 29 October 2013

Lawrence W. Sherman
Institute of Criminology
Cambridge University

Forecasting Storms

• 1987: Big False Negative
• 2013: Big True Positive
• Week in advance
• Protective actions taken
• Huge reduction in harm
• Same in Eastern India 2013
• Who was the hero?

Who Was Vice-Admiral Robert Fitzroy?

1819: Weather Was All Words

• Fitzroy changed the business model
• ADDED numbers into it
• Sharpened the words with scales, %
• Improved accuracy over “experience”
• Transformed qualitative forecasts into quantitatively supported statements
• Now we call them “algorithms”
• See things coming much sooner

Meteorological Office of the Board of Trade 1854

• Job: Publish weather data
• Not predict it
• Fitzroy invented term “forecasting”
• Controversial idea
• Not 100% accurate
• Royal Society attacked it
• Budget cut; forecasts stopped
• Fitzroy committed suicide

Prevention Finally Won

• Falling barometer gale warnings
• Fishing ships told to stay in port
• Owners objected; practice discontinued
• Deaths from gales; prevention revived

What’s YOUR Job?

• Measuring Harm?
• Preventing Harm?
• Reducing Harm?
• Correcting Harm?
• Redressing Wrongs?

What is your business model?

Not for making money.

Just for doing the job.

Words--not many Numbers

• Every prison is different
• Every human life is important
• All rights must be protected

BUT...
• Which prisons are worst?
• Which are about to explode in violence?
• Which will have the most suicides?
• How can algorithms help prevent harm?

AMIMB Website Einstein Quote

• Using same method
• Expect different result?
• Is there a new method IMBs could use?
• Are there other business models?
• Could they get better results?
• Could they ADD numbers to add VALUE?

Evidence-Based Practice

1. An Idea:

“...Practices should be based on scientific evidence about what works best.”

(Sherman, 1998)

2. An Analytic Framework

“A standard for all....strategies: using scientific evidence to target, test and track ...[all] practices” (Sherman, 2013)

Words AND Numbers

• AMIMB Practical Guide has many words
• Many numbers requested
• Not clear how they are to be gathered
• Whether comparisons could be reliable
• What is the logic model by which EACH IMB REPORT CAN REDUCE HARM?

Same Problem for HM Inspectors

• Police since 1856
• Prisons? Probation?
• “Efficiency and Effectiveness”
• Case by case
• Word by word
• HMI Police developed a “report card”
• But measures were not scientific
• Not “risk-adjusted”
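One way to read “risk-adjusted” is to compare each prison's observed incidents with the number expected from its mix of prisoners. The sketch below illustrates that idea only; the prison names, baseline rates, and counts are hypothetical and not taken from any inspection report.

```python
# Minimal sketch of a risk-adjusted comparison (all figures hypothetical).
# Idea: compare observed incidents with the number expected from each
# prison's mix of higher- and lower-risk prisoners.

# Assumed baseline incident rates per prisoner per year, by risk group.
BASELINE_RATE = {"high": 0.30, "low": 0.05}


def expected_incidents(population):
    """Expected incidents given a dict of risk group -> number of prisoners."""
    return sum(BASELINE_RATE[group] * count for group, count in population.items())


def risk_adjusted_ratio(observed, population):
    """Observed / expected; above 1.0 means worse than the mix predicts."""
    return observed / expected_incidents(population)


prisons = {
    "Prison A": {"observed": 60, "population": {"high": 150, "low": 300}},
    "Prison B": {"observed": 40, "population": {"high": 50, "low": 400}},
}

for name, data in prisons.items():
    ratio = risk_adjusted_ratio(data["observed"], data["population"])
    print(f"{name}: raw = {data['observed']}, risk-adjusted ratio = {ratio:.2f}")
```

In this made-up example Prison A records more raw incidents, yet Prison B has the higher ratio once the population mix is taken into account.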

Daniel Kahneman

QUICK! What emotion does this person feel?

QUICK! How Much is This Product?

17 X 24 = ???

Thinking, Fast and... Slower

• Reading the face—fast!

• Multiplying two numbers—slow (408)

• Why is that important to monitoring?

• It adds theory about how we think, decide, act

Two Cognitive Systems: Overlapping

System I: FAST
• Intuitive
• Automatic
• Effortless
• Associative
• Rapid
• Opaque Process
• Skilled

System II: SLOW
• Reflective
• Controlled
• Effortful
• Deductive
• Slow
• Self-Aware
• Rule-Following

Good News, Bad News

Good News:
• Most decisions are made with System I
• System I conserves energy, time
• Most System I decisions are right: e.g. driving

Bad News:
• Many important decisions are “wrong”
• Many could be “right” if we used System II
• We resist System II because it is “costly,” tiring

Slow Thinking About Strategy

Targeting: Aiming for biggest impact

Tracking: Measuring BOTH policing and crime

Testing: Deciding what works

Targeting: PREDICT

Focus
• Issues, situations, processes
• Units, managers, problem leaders

Classify
• Concentrations
• Causes

Prioritize
• Greatest impact
• Best chance of success

Clinical vs. Statistical: Kahneman Chapter 21

• Began career with Paul Meehl (1954)
• 200 replications since then
• Formulas always beat (60%) or equal (40%) human judgment; the latter is far more costly
• Crime, parole, air pilot errors, credit risk
• Wine value forecasts, cot death risk

Paul Meehl

3 Ways of Predicting

Qualitative:
1. Clinical (System 1~2?; not transparent)

Statistical:
2. Checklist with validation
3. Supercomputer data mining
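A “checklist with validation” could look roughly like the sketch below: a fixed set of weighted items is scored the same way for every case, and the rule is then checked against past cases with known outcomes. The item names, weights, threshold, and past cases are all hypothetical illustrations.

```python
# Minimal sketch of a statistical "checklist" forecast (item names and
# weights are hypothetical; a real checklist needs validation on past cases).

CHECKLIST_WEIGHTS = {
    "prior_violent_incident": 3,
    "age_under_25": 2,
    "recent_transfer": 1,
}


def checklist_score(case):
    """Sum the weights of every checklist item that is true for this case."""
    return sum(w for item, w in CHECKLIST_WEIGHTS.items() if case.get(item, False))


def validate(cases, threshold):
    """Share of past cases where 'score >= threshold' matched the known outcome."""
    hits = sum((checklist_score(c) >= threshold) == c["had_incident"] for c in cases)
    return hits / len(cases)


# Hypothetical past cases with known outcomes.
past_cases = [
    {"prior_violent_incident": True, "age_under_25": True, "had_incident": True},
    {"recent_transfer": True, "had_incident": False},
    {"age_under_25": True, "recent_transfer": True, "had_incident": False},
    {"prior_violent_incident": True, "had_incident": True},
]

print("accuracy on past cases:", validate(past_cases, threshold=3))
```

Unlike a clinical judgment, every step here is transparent: anyone can see which items fired and how the score was reached.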

High Risk (2%)

Neither High nor Low Risk (38%)

Low Risk (60%)

High Risk 2% vs. Bottom 60%: Two Years From Forecast Date

• Charges for any offence: 8 times more
• Charges for serious offence: 10 times more
• Charges for murder or attempt: 75 times more
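The three risk bands can be sketched as a simple ranking exercise: order cases by forecast score, then label the top 2% high risk, the bottom 60% low risk, and the rest neither. The scores below are randomly generated stand-ins, and `assign_bands` is a hypothetical helper, not the forecasting model behind the slide.

```python
# Minimal sketch: split forecast scores into the slide's three bands
# (top 2% high risk, next 38% neither, bottom 60% low risk).
# The scores are randomly generated stand-ins, not real forecasts.
import random


def assign_bands(scores, high_share=0.02, low_share=0.60):
    """Return a band label for each score, ranked from highest to lowest."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_high = round(n * high_share)
    n_low = round(n * low_share)
    bands = [None] * n
    for rank, i in enumerate(order):
        if rank < n_high:
            bands[i] = "high"
        elif rank >= n - n_low:
            bands[i] = "low"
        else:
            bands[i] = "neither"
    return bands


random.seed(0)
scores = [random.random() for _ in range(1000)]
bands = assign_bands(scores)
print({b: bands.count(b) for b in ("high", "neither", "low")})
```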

Testing

• Comparing two methods
• Same kind of problems

THEN... ASKING:

• Which one works better?
• Which one costs less?
• Which one gets best result for same cost?

Testing Defined

• A fair comparison between two different methods
• E = Experimental Method (new)
• C = Control Method (current)
• All else equal
• Which one is better?
• By what criteria?
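One common way to keep “all else equal” is to let chance decide which units get method E and which get method C. The sketch below assumes that approach; the unit names and outcome figures are hypothetical placeholders.

```python
# Minimal sketch of a fair comparison: random assignment to the
# experimental (E) or control (C) method, then compare average outcomes.
# Unit names and outcome figures are hypothetical placeholders.
import random
from statistics import mean


def randomize(units, seed=None):
    """Shuffle units and split them into two equal-chance groups, E and C."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def compare(outcomes_e, outcomes_c):
    """Difference in mean outcome, E minus C (negative = fewer incidents under E)."""
    return mean(outcomes_e) - mean(outcomes_c)


units = [f"wing-{i}" for i in range(1, 21)]
group_e, group_c = randomize(units, seed=42)
print("E:", sorted(group_e))
print("C:", sorted(group_c))

# Hypothetical incident counts recorded later for three units in each group.
print("E minus C:", compare([3, 2, 4], [5, 6, 4]))
```

The criterion (here, mean incidents) has to be chosen before the test, which is the point of the slide's last question.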

What Kind of Evidence?

Not This Kind... But This Kind

But what is good evidence?

What constitutes a good test?

Many bad tests

More intuitive than quantitative

“Illusion of Validity”

Restorative Justice Experiment: Did Program Cause Crime Drop?

[Chart: DC offenders (n = 62), Yr (-2) through Yr (+2); vertical axis 20% to 140%; data labels shown: 71%, 71%, 49%, 100%]

Inferring Cause From Trend?

• Post Hoc Ergo Propter Hoc?
• After this, then because of this?

Or would it have dropped anyway?

• “Natural” trend
• “History”
• Other factors
• “Spurious” explanations
• That can be eliminated to give confidence

All ruled out by randomized controlled trials (RCTs)

Randomized Controlled Trial (RCT): COMPARISON or NET difference

[Chart: Court offenders (n = 59) vs. DC offenders (n = 62), Yr (-2) through Yr (+2); vertical axis 20% to 140%; data labels shown: 101%, 121%, 71%, 71%, 28%, 49%, 100%]

CONTROL Group

• A sample that measures what would happen at the same time and place without the intervention being introduced to an otherwise identical EXPERIMENTAL Group
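The “net difference” idea can be sketched as simple arithmetic: subtract the change seen in the control group from the change seen in the experimental group. The rates below are hypothetical and chosen only to show the calculation; they are not the figures from the experiment's chart.

```python
# Minimal sketch of the "net difference" in a randomized trial.
# The rates below are hypothetical, chosen only to show the arithmetic.


def percent_change(before, after):
    """Change from before to after, as a percentage of the before value."""
    return 100.0 * (after - before) / before


def net_difference(exp_before, exp_after, ctl_before, ctl_after):
    """Experimental change minus control change, in percentage points."""
    return percent_change(exp_before, exp_after) - percent_change(ctl_before, ctl_after)


# Hypothetical offences per 100 offenders per year, before and after.
experimental = {"before": 100.0, "after": 60.0}  # fell by 40%
control = {"before": 100.0, "after": 90.0}       # fell by 10% anyway

net = net_difference(experimental["before"], experimental["after"],
                     control["before"], control["after"])
print(f"Net difference: {net:.0f} percentage points")
```

Looking at the experimental group alone would credit the program with the whole 40% drop; the control group shows that part of it would have happened anyway.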

AFP/ACT ANU Experiment: Did Program Cause Crime Drop?

[Chart: DC offenders (n = 62), Yr (-2) through Yr (+2); vertical axis 20% to 140%; data labels shown: 71%, 71%, 49%, 100%]

Inferring Cause From Trend?

• Post Hoc Ergo Propter Hoc?

• After this, then because of this?

Or would it have dropped anyway?

• “Natural” trend
• “History”
• Other factors
• “Spurious” explanations
• That can be eliminated to give confidence

All ruled out by randomized controlled trials (RCTs)

Randomized Controlled Trial (RCT): COMPARISON or NET difference

[Chart: Court offenders (n = 59) vs. DC offenders (n = 62), Yr (-2) through Yr (+2); vertical axis 20% to 140%; data labels shown: 101%, 121%, 71%, 71%, 28%, 49%, 100%]

CONTROL Group

• A sample that measures what would happen at the same time and place without the intervention being introduced to an otherwise identical EXPERIMENTAL Group

Tracking

• Crime: where, when
• Policing: patrol, POP, arrests
• Match? Ratios, trends
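The “Match?” question can be sketched as a ratio: each area's share of policing divided by its share of crime. The area names and counts below are hypothetical, and patrol hours stand in for whatever policing measure is actually tracked.

```python
# Minimal sketch of tracking whether policing matches crime.
# Locations and counts are hypothetical.

crime = {"North": 120, "Centre": 300, "South": 80}          # recorded crimes
patrol_hours = {"North": 200, "Centre": 250, "South": 150}  # patrol hours


def shares(counts):
    """Convert raw counts into each area's share of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def match_ratios(policing, crime):
    """Share of policing divided by share of crime; 1.0 is a proportionate match."""
    p, c = shares(policing), shares(crime)
    return {area: p[area] / c[area] for area in crime}


for area, ratio in match_ratios(patrol_hours, crime).items():
    print(f"{area}: {ratio:.2f}")
```

A ratio well below 1.0 flags an area getting less policing than its share of crime would suggest; repeating the calculation each period gives the trend.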

Tracking in Politics

• Obama’s “Cave”
• Tracking volunteer activity: 500 offices
• Contacting voters
• Determining who supports Obama
• Determining when & how they will vote
• Arranging transport to the polls
• Scheduling election day drivers
• “Ground Game” cost $100 Million

3. Tracking in Prisons

• Liebling’s Quality of Prison Life Measures

• What are the trends in

--inputs (resources)

--outputs (activities)

--outcomes (results)

AMIMB-NOMS COMPSTAT?

Tracking With Evidence

• Discuss
• Criticize
• Problem-Solve
• Use and misuse of data
• Refinement through trial and error
• Technology making change easier

Sherman: 1998

The Rise of Evidence?

• More evidence is available
• More evidence is demanded
• Budget cutbacks make evidence relevant
• But there are only early adopters
• A tipping point may come soon
• The pace is quickening

How Can IMBs Be Evidence-Based?

1. Quantify current reports

2. Design new—fewer—measures

3. Test different models of IMB work

What Works? Targeting, Testing & Tracking

(Triple-T)

THANK YOU

Lawrence W. Sherman
Institute of Criminology
Cambridge University
