Thoughts on big data and more for the WA State Legislature
A few thoughts on big data, human
services, and legislation
Bill Howe, PhD
Associate Professor, Information School
Director, Urbanalytics Group
Adjunct Associate Professor, Computer Science & Engineering
University of Washington
Data Science for Social Good
• Quarter-long, on-site projects, engagement two days per
week
– Simple two-page proposals
– 4-6 concurrent teams: Network effects among cohort beyond 1:1
– Each team is ~50% project lead + ~50% eScience FTE
• Capstone and course projects
• Commissioned Research projects
Submit project ideas at http://urbanalytics.uw.edu/
Predictors of Permanent Housing for Homeless Families
Project Leads: Neil Roche & Anjana Sundaram, Gates Foundation
DSSG Fellows: Joan Wang, Jason Portenoy, Fabliha Ibnat, Chris Suberlak
ALVA High School Students: Cameron Holt, Xilalit Sanchez
eScience Data Scientist Mentors: Ariel Rokem, Bryna Hazelton
When homeless families engage in services and programs, what factors are most likely to lead to a successful exit?
The DSSG team
• developed algorithms to identify ‘families’ and ‘episodes’ of homelessness, including back-to-back or overlapping enrollments in individual programs
• devised innovative ways to visualize and analyze how families transition between programs
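One way to picture the ‘episodes’ step is as interval merging: enrollments that overlap, or that start right after the previous one ends, are folded into a single episode. The following is a minimal sketch under that assumption, not the DSSG team's actual algorithm. The function name, the `max_gap_days` parameter, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta

def merge_into_episodes(enrollments, max_gap_days=1):
    """Merge (entry_date, exit_date) pairs into episodes.

    An enrollment joins the current episode if it starts no later than
    max_gap_days after that episode's latest exit date. The 1-day default
    is an assumed definition of "back-to-back".
    """
    episodes = []
    for entry, exit_ in sorted(enrollments):
        if episodes and entry <= episodes[-1][1] + timedelta(days=max_gap_days):
            # Overlapping or back-to-back: extend the current episode.
            episodes[-1][1] = max(episodes[-1][1], exit_)
        else:
            # A gap longer than max_gap_days starts a new episode.
            episodes.append([entry, exit_])
    return [tuple(e) for e in episodes]

# Hypothetical enrollments for one family (not real data).
sample = [
    (date(2015, 1, 1), date(2015, 2, 1)),   # first program
    (date(2015, 1, 20), date(2015, 3, 1)),  # overlaps the first
    (date(2015, 3, 2), date(2015, 4, 1)),   # starts the day after
    (date(2015, 9, 1), date(2015, 10, 1)),  # months later: new episode
]
episodes = merge_into_episodes(sample)
print(episodes)
```

With these sample dates the first three enrollments collapse into one episode and the fourth stands alone. Changing `max_gap_days` changes what counts as back-to-back.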
The Gates Foundation, together with Building Changes, has partnered with King, Pierce, and Snohomish counties to make homelessness in these counties rare, brief, and one-time.
Homeless families may take many pathways through programs
[Figure: pathways among Emergency shelter, Transitional housing, Rapid re-housing, Permanent housing, Housing with services, and Unsuccessful exit]
Preliminary results to understand potential predictors of successful outcomes
Correlation with successful outcome, by family characteristics
Correlation with successful outcome, by homelessness program
• Emergency Shelter use tends to be associated with unsuccessful outcomes (unsurprising!)
• Homelessness Prevention programs more strongly associated with positive outcomes than transitional housing
• Substance abuse strongly associated with unsuccessful outcomes
• Parent employment strongest predictor of successful outcomes
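The slides do not say which correlation measure was used. As an illustration only, a Pearson correlation between a yes/no family characteristic and a yes/no outcome could be computed as below. The function name is hypothetical and the eight records are invented to show the arithmetic. They are not project data and say nothing about the actual results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Invented toy records: 1 = yes, 0 = no (NOT real data).
parent_employed = [1, 1, 1, 0, 0, 0, 1, 0]
successful_exit = [1, 1, 0, 0, 0, 1, 1, 0]

r = pearson(parent_employed, successful_exit)
print(round(r, 3))
```

A value near +1 means the characteristic and a successful exit tend to occur together, and a value near -1 means they tend to occur apart. A correlation of this kind describes association only and does not show cause.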
Common trajectories lead to different outcomes:
• a successful exit from an episode would mean that the family found a permanent housing solution
• a proportion of these still receive government subsidies
• other exits are exits back into homelessness, or to other, unknown destinations
Novel Analyses of Family Trajectories through Programs
An example using Pierce County data
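A simple starting point for trajectory analysis is to count how often families move from one program type to the next. The sketch below assumes each trajectory is an ordered list of program types. The function name and the three trajectories are hypothetical and are not Pierce County data.

```python
from collections import Counter

def count_transitions(trajectories):
    """Count consecutive (from_program, to_program) pairs across families."""
    counts = Counter()
    for steps in trajectories:
        # zip pairs each step with the one that follows it.
        counts.update(zip(steps, steps[1:]))
    return counts

# Hypothetical trajectories, one list per family (not real data).
trajectories = [
    ["Emergency shelter", "Transitional housing", "Permanent housing"],
    ["Emergency shelter", "Rapid re-housing", "Permanent housing"],
    ["Emergency shelter", "Rapid re-housing", "Unsuccessful exit"],
]

transitions = count_transitions(trajectories)
for (src, dst), n in transitions.most_common():
    print(f"{src} -> {dst}: {n}")
```

Counts like these could feed a flow diagram in which the width of each link reflects the number of families making that move.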
How much time do you spend “handling
data” as opposed to “doing science”?
Mode answer: “90%”
10/6/2017 Bill Howe, UW 8
My research for 10 years: Making it easier to work with large, noisy, heterogeneous datasets
• SQLShare: Easier to use databases
• Myria: Easier to use scalable systems
• Worked great in the physical sciences
• But social, health, and civic colleagues have stricter requirements…
Observation:
Epistemic issues are beginning
to dominate the big data / data
science discussion in every field
reproducibility, algorithmic bias, curation, fairness,
accountability, transparency, provenance, explanations,
persuasion
“Should I be afraid of risk assessment tools?”
“No, you gotta tell me a lot more about yourself.
At what age were you first arrested?
What is the date of your most recent crime?”
“…and what was the culture of policing in the
neighborhood in which I grew up in?”
Technical.ly, September 2016
“Philadelphia is grappling with the prospect of a racist computer
algorithm”
First decade of Data Science research and practice:
What can we do with massive, noisy, heterogeneous datasets?
Next decade of Data Science research and practice:
What should we do with massive, noisy, heterogeneous datasets?
The way I think about this… (1)
The way I think about this… (2)
Decisions are based on two sources of information:
1. Past examples, e.g., “prior arrests tend to increase likelihood of future arrests”
2. Societal constraints, e.g., “we must avoid racial discrimination”
10/6/2017 Data, Responsibly / SciTech NW 14
We’ve become very good at automating the use of past examples
We’ve only just started to think about incorporating societal constraints
The way I think about this… (3)
How do we apply societal constraints to algorithmic decision-making?
Option 1: Rely on human oversight
Ex: EU General Data Protection Regulation requires that a human be involved in legally binding algorithmic decision-making
Ex: Wisconsin Supreme Court says a human must review algorithmic decisions made by recidivism models
Issues with scalability, prejudice
Option 2: Build systems to help enforce these constraints
This is the approach we are exploring
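To make Option 2 concrete, here is a minimal sketch of one automated check a system might run. It compares how often an algorithm flags people in each group and rejects the result if the rates differ by more than a tolerance. This "demographic parity" test is only one of many possible fairness definitions and is not necessarily the one being explored. The function names, the 0.1 tolerance, and the sample decisions are all hypothetical.

```python
def positive_rates(decisions, groups):
    """Fraction of positive (flagged) decisions within each group."""
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if decision else 0)
    return {g: positives[g] / totals[g] for g in totals}

def parity_gap(decisions, groups):
    """Largest difference in positive rate between any two groups."""
    rates = positive_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())

def satisfies_parity(decisions, groups, tolerance=0.1):
    """True if no group's positive rate differs by more than tolerance."""
    return parity_gap(decisions, groups) <= tolerance

# Hypothetical decisions (1 = flagged high risk) for two groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(positive_rates(decisions, groups))
print(satisfies_parity(decisions, groups))
```

In this toy example group "a" is flagged three times as often as group "b", so the check fails. What tolerance is acceptable, and which fairness definition applies, are policy questions that the code cannot answer.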
Closing thoughts…
• WA State has an opportunity to play a leadership role in legislation around algorithmic bias, fairness, accountability, and transparency
• We have the private and public tech expertise, the community engagement, and the political will to address this issue directly.
• If we let the technology guide the policy, we’re in trouble.