Thoughts on big data and more for the WA State Legislature

A few thoughts on big data, human services, and legislation. Bill Howe, PhD, Associate Professor, Information School; Director, Urbanalytics Group; Adjunct Associate Professor, Computer Science & Engineering; University of Washington

Upload: university-of-washington

Post on 23-Jan-2018

TRANSCRIPT

A few thoughts on big data, human services, and legislation

Bill Howe, PhD

Associate Professor, Information School

Director, Urbanalytics Group

Adjunct Associate Professor, Computer Science & Engineering

University of Washington

Data Science for Social Good

• Quarter-long, on-site projects, with engagement two days per week

– Simple two-page proposals

– 4-6 concurrent teams: network effects within the cohort, beyond 1:1 interactions

– Each team is ~50% project lead + ~50% eScience FTE

• Capstone and course projects

• Commissioned Research projects

Submit project ideas at http://urbanalytics.uw.edu/

Predictors of Permanent Housing for Homeless Families

Project Leads: Neil Roche & Anjana Sundaram, Gates Foundation

DSSG Fellows: Joan Wang, Jason Portenoy, Fabliha Ibnat, Chris Suberlak

ALVA High School Students: Cameron Holt, Xilalit Sanchez

eScience Data Scientist Mentors: Ariel Rokem, Bryna Hazelton

When homeless families engage in services and programs, what factors are most likely to lead to a successful exit?

The DSSG team:

• developed algorithms to identify ‘families’ and ‘episodes’ of homelessness, including back-to-back or overlapping enrollments in individual programs

• devised innovative ways to visualize and analyze how families transition between programs
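The slides do not spell out the episode-building algorithm. One common approach, sketched here under assumptions, is interval merging: sort each family's enrollments by entry date and merge any enrollment that overlaps the current episode or starts within a short gap of its end. The record layout, the `merge_episodes` name, and the one-day `max_gap_days` threshold are hypothetical, and identifying which records belong to the same family is assumed to have happened already.

```python
from collections import defaultdict
from datetime import date, timedelta


def merge_episodes(enrollments, max_gap_days=1):
    """Group enrollment records into episodes of homelessness, per family.

    enrollments: iterable of (family_id, program, entry_date, exit_date).
    Enrollments that overlap, or begin within max_gap_days of the current
    episode's end (back-to-back), are merged into one episode.
    """
    by_family = defaultdict(list)
    for family_id, program, entry, exit_ in enrollments:
        by_family[family_id].append((entry, exit_, program))

    episodes = {}
    for family_id, rows in by_family.items():
        merged = []
        for entry, exit_, program in sorted(rows):  # chronological by entry date
            if merged and entry <= merged[-1]["end"] + timedelta(days=max_gap_days):
                # Overlapping or back-to-back: extend the current episode.
                merged[-1]["end"] = max(merged[-1]["end"], exit_)
                merged[-1]["programs"].append(program)
            else:
                # Gap is too large: start a new episode.
                merged.append({"start": entry, "end": exit_, "programs": [program]})
        episodes[family_id] = merged
    return episodes


# Made-up records, for illustration only.
records = [
    ("F1", "Emergency shelter", date(2016, 1, 1), date(2016, 1, 20)),
    ("F1", "Transitional housing", date(2016, 1, 21), date(2016, 6, 30)),
    ("F1", "Rapid re-housing", date(2017, 3, 1), date(2017, 5, 1)),
    ("F2", "Emergency shelter", date(2016, 2, 1), date(2016, 2, 28)),
    ("F2", "Rapid re-housing", date(2016, 2, 15), date(2016, 4, 1)),
]

for family_id, eps in merge_episodes(records).items():
    print(family_id, [ep["programs"] for ep in eps])
```

With this toy data, family F1 ends up with two episodes (shelter followed immediately by transitional housing, then a separate rapid re-housing stay the next year), while F2's overlapping enrollments collapse into one episode. The gap threshold is a policy choice: a larger value treats short breaks between programs as part of the same episode.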

The Gates Foundation, together with Building Changes, has partnered with King, Pierce, and Snohomish counties to make homelessness in these counties rare, brief, and one-time.

Homeless families may take many pathways through programs

[Figure: possible pathways between Emergency shelter, Transitional housing, Rapid re-housing, Permanent housing, Housing with services, and Unsuccessful exit]

Relatively simple visualizations…
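Pathway diagrams like the one above are usually drawn from a table of transitions between consecutive states. A minimal sketch, assuming each episode has already been reduced to an ordered list of program types ending in an exit state; the function names and toy trajectories are hypothetical, and this is not necessarily how the DSSG team produced its visualizations.

```python
from collections import Counter


def transition_counts(trajectories):
    """Count consecutive (from_state, to_state) pairs across all trajectories."""
    counts = Counter()
    for states in trajectories:
        for src, dst in zip(states, states[1:]):
            counts[(src, dst)] += 1
    return counts


def transition_probabilities(counts):
    """Normalize counts so the outgoing probabilities of each state sum to 1."""
    outgoing = Counter()
    for (src, _dst), n in counts.items():
        outgoing[src] += n
    return {(src, dst): n / outgoing[src] for (src, dst), n in counts.items()}


# Made-up trajectories: program sequence followed by the exit state.
trajectories = [
    ["Emergency shelter", "Transitional housing", "Permanent housing"],
    ["Emergency shelter", "Rapid re-housing", "Permanent housing"],
    ["Emergency shelter", "Transitional housing", "Unsuccessful exit"],
]

probs = transition_probabilities(transition_counts(trajectories))
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

The raw counts set the widths of the flows in such a diagram; the row-normalized probabilities answer "where do families go next from this program?"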

Preliminary results to understand potential predictors of successful outcomes

[Figure panels: Correlation with successful outcome, by family characteristics; Correlation with successful outcome, by homelessness program]

• Emergency Shelter use tends to be associated with unsuccessful outcomes (unsurprising!)

• Homelessness Prevention programs are more strongly associated with positive outcomes than transitional housing

• Substance abuse is strongly associated with unsuccessful outcomes

• Parent employment is the strongest predictor of successful outcomes
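The charts report how strongly each family characteristic or program correlates with a successful outcome. When both the characteristic and the outcome are yes/no indicators, the Pearson correlation reduces to the phi coefficient of a 2×2 table. A minimal sketch, assuming each characteristic is coded 0/1 per family; the feature names and data are invented, and a correlation computed this way says nothing about cause and effect.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Pearson correlation between two equal-length 0/1 sequences (phi coefficient)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = len(x) - n11 - n10 - n01
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant column has no variance, so the correlation is undefined; report 0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


# Made-up indicators for eight hypothetical families (1 = yes).
success = [1, 1, 0, 0, 0, 1, 1, 0]
characteristics = {
    "parent_employed": [1, 1, 0, 0, 0, 1, 1, 1],
    "substance_abuse": [0, 0, 1, 0, 0, 0, 0, 1],
}

# Rank characteristics by the strength (absolute value) of their correlation.
ranked = sorted(characteristics.items(),
                key=lambda kv: -abs(phi_coefficient(kv[1], success)))
for name, values in ranked:
    print(f"{name}: {phi_coefficient(values, success):+.2f}")
```

The sign carries the direction (positive means the characteristic co-occurs with successful exits), and ranking by absolute value mirrors the "strongest predictor" reading of the charts.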

Common trajectories lead to different outcomes:

• a successful exit from an episode means that the family found a permanent housing solution

• some of these families still receive government subsidies

• other exits lead back into homelessness, or to other, unknown destinations
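Following the definition above (an exit to permanent housing counts as successful, whether or not it is subsidized; anything else does not), comparing trajectories amounts to grouping episodes by their program sequence and computing a success rate per group. A minimal sketch; the destination labels, the `success_rate_by_trajectory` name, and the toy episodes are hypothetical, and real exit-destination data is more detailed.

```python
from collections import defaultdict

# Hypothetical destination labels; subsidized permanent housing still counts as success.
SUCCESSFUL_DESTINATIONS = {"permanent housing", "permanent housing with subsidy"}


def success_rate_by_trajectory(episodes):
    """Map each program sequence to the share of its episodes ending in a successful exit.

    episodes: iterable of (list_of_programs, exit_destination).
    """
    tally = defaultdict(lambda: [0, 0])  # trajectory -> [successful exits, all exits]
    for programs, destination in episodes:
        key = " -> ".join(programs)
        tally[key][1] += 1
        if destination in SUCCESSFUL_DESTINATIONS:
            tally[key][0] += 1
    return {key: ok / total for key, (ok, total) in tally.items()}


# Made-up episodes, for illustration only.
episodes = [
    (["Emergency shelter", "Transitional housing"], "permanent housing"),
    (["Emergency shelter", "Transitional housing"], "homeless"),
    (["Emergency shelter", "Rapid re-housing"], "permanent housing with subsidy"),
    (["Emergency shelter", "Rapid re-housing"], "unknown"),
    (["Homelessness prevention"], "permanent housing"),
]

for trajectory, rate in sorted(success_rate_by_trajectory(episodes).items()):
    print(f"{rate:.0%}  {trajectory}")
```

Treating "unknown" destinations as unsuccessful is a conservative choice made for this sketch; an analysis could instead report them as a separate category.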

Novel Analyses of Family Trajectories through Programs

An example using Pierce County data

How much time do you spend “handling data” as opposed to “doing science”?

Mode answer: “90%”

10/6/2017 Bill Howe, UW 8

My research for 10 years: making it easier to work with large, noisy, heterogeneous datasets

• SQLShare: Easier to use databases

• Myria: Easier to use scalable systems

• Worked great in the physical sciences

• But social, health, and civic colleagues have stricter requirements…

Observation:

Epistemic issues are beginning to dominate the big data / data science discussion in every field:

reproducibility, algorithmic bias, curation, fairness, accountability, transparency, provenance, explanations, persuasion

ProPublica, May 2016

“Should I be afraid of risk assessment tools?”

“No, you gotta tell me a lot more about yourself. At what age were you first arrested? What is the date of your most recent crime?”

“…and what was the culture of policing in the neighborhood in which I grew up in?”

Technical.ly, September 2016

“Philadelphia is grappling with the prospect of a racist computer algorithm”

First decade of Data Science research and practice:

What can we do with massive, noisy, heterogeneous datasets?

Next decade of Data Science research and practice:

What should we do with massive, noisy, heterogeneous datasets?

The way I think about this… (1)

The way I think about this… (2)

Decisions are based on two sources of information:

1. Past examples, e.g., “prior arrests tend to increase likelihood of future arrests”

2. Societal constraints, e.g., “we must avoid racial discrimination”

10/6/2017 Data, Responsibly / SciTech NW 14

We’ve become very good at automating the use of past examples

We’ve only just started to think about incorporating societal constraints

The way I think about this… (3)

How do we apply societal constraints to algorithmic decision-making?

Option 1: Rely on human oversight

Ex: The EU General Data Protection Regulation restricts legally binding decisions based solely on automated processing, which in practice requires a human to be involved

Ex: Wisconsin Supreme Court says a human must review algorithmic decisions made by recidivism models

Issues: scalability and human prejudice

Option 2: Build systems to help enforce these constraints

This is the approach we are exploring
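The slides do not describe what such a system looks like. As an illustration only, here is one simple, widely used constraint check swapped in for this sketch: demographic parity measured by the disparate impact ratio, with the 0.8 threshold of the "four-fifths rule" from US employment-discrimination guidance. The function names, the default threshold, and the idea of blocking decisions that fail the check are assumptions, not the approach presented in the talk.

```python
def selection_rates(decisions, groups):
    """Fraction of favorable decisions (1) within each group."""
    totals, favorable = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        favorable[group] = favorable.get(group, 0) + (1 if decision else 0)
    return {g: favorable[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions, groups):
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    rates = selection_rates(decisions, groups)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


def enforce_four_fifths(decisions, groups, threshold=0.8):
    """Return the ratio, or raise ValueError if it falls below the threshold."""
    ratio = disparate_impact_ratio(decisions, groups)
    if ratio < threshold:
        raise ValueError(f"disparate impact ratio {ratio:.2f} is below {threshold}")
    return ratio


# Made-up decisions (1 = favorable) for two hypothetical groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

try:
    enforce_four_fifths(decisions, groups)
except ValueError as err:
    print("Blocked:", err)
```

A check like this only encodes one narrow notion of fairness; other societal constraints (for example, equal error rates across groups) would need different tests, and the threshold itself is a policy decision rather than a technical one.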

Closing thoughts…

• WA State has an opportunity to play a leadership role in legislation around algorithmic bias, fairness, accountability, and transparency

• We have the private and public tech expertise, the community engagement, and the political will to address this issue directly.

• If we let the technology guide the policy, we’re in trouble.