barocas - presentation - final - berkeley law · barocas - presentation - final.pptx author: sir...

Managing the Inferential Possibilities of Open Government Data. Solon Barocas and Arvind Narayanan, Princeton University


TRANSCRIPT

Page 1: Barocas - Presentation - Final - Berkeley Law · Barocas - Presentation - Final.pptx Author: sir Created Date: 20150509131633Z

Managing the Inferential Possibilities of Open Government Data

Solon Barocas and Arvind Narayanan, Princeton University


Jeff Hammerbacher: Facebook “actually built a system to predict the gender of a user who did not offer their gender.”


“[W]hen someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals they could be getting close to their delivery date.”


“Selected most predictive Likes”

Trait: Selected most predictive Likes

IQ
High: The Godfather, Mozart, Thunderstorms, The Colbert Report, Morgan Freemans Voice, The Daily Show, Lord Of The Rings, To Kill A Mockingbird, Science, Curly Fries
Low: Jason Aldean, Tyler Perry, Sephora, Chiq, Bret Michaels, Clark Griswold, Bebe, I Love Being A Mom, Harley Davidson, Lady Antebellum

Satisfaction With Life
Satisfied: Sarah Palin, Glenn Beck, Proud To Be Christian, Indiana Jones, Swimming, Jesus Christ, Bible, Jesus, Being Conservative, Pride And Prejudice
Dissatisfied: Hawthorne Heights, Kickass, Atreyu (Metal Band), Lamb Of God, Gorillaz, Science, Quote Portal, Stewie Griffin, Killswitch Engage, Ipod

Openness
Liberal & Artistic: Oscar Wilde, Charles Bukowski, Sylvia Plath, Leonardo Da Vinci, Bauhaus, Dmt The Spirit Molecule, American Gods, John Waters, Plato, Leonard Cohen
Conservative: NASCAR, Austin Collie, Monster-In-Law, I don’t read, Justin Moore, ESPN2, Farmlandia, The Bachelor, Oklahoma State University, Teen Mom 2

Conscientiousness
Well Organized: Law Officer, National Law Enforcement, Lowfares.Com, Accounting, Foursquare, Emergency Medical Services, Sunday Best, Kaplan University, Glock Inc, Mycalendar 2010
Spontaneous: Wes Anderson, Bandit Nation, Omegle, Vocaloid, Serial Killer, Screamo, Anime, Vamplets, Join If Ur Fat, Not Dying

Extraversion
Outgoing & Active: Beerpong, Michael Jordan, Dancing, Socializing, Chris Tucker, I Feel Better Tan, Modeling, Cheerleading, Theatre, Flip Cup
Shy & Reserved: RPGs, Fanfiction.Net, Programming, Anime, Manga, Video Games, Role Playing Games, Minecraft, Voltaire, Terry Pratchet


“Computer-based personality judgments are more accurate than those made by humans”


The devil is not just in the details

• Data mining provides a path by which
  – to glean facts that have not been disclosed
  – to guess at characteristics that cannot be observed directly
  – and to avoid the need to obtain that information from others

• By automating inference and extending the frontier of the inferable, data mining introduces new routes to arrive at certain personal details and, with them, a new set of practical and philosophical challenges for privacy


Anecdotes → Systematic Understanding


A Tentative Taxonomy

1. Inferences made automatically and en masse
2. Inferences drawn on the basis of criteria not commonly perceived as relevant
3. Inferences that extend the frontier of the inferable

NB: Elsewhere, I’ve tried to develop a normative theory to explain what counts as privacy-violating inference-making (“Leaps and Bounds”)


Why Open Government Data?

• The state is frequently able to collect information that other institutions cannot
  – Compelled disclosure
  – Sharing is necessary for the provision of basic services

• The state is likely to have far more comprehensive coverage of the population than any other institution


A Tentative Taxonomy

1. Inferences made automatically and en masse
2. Inferences drawn on the basis of criteria not commonly perceived as relevant
3. Inferences that extend the frontier of the inferable


A Fool’s Errand?

• Practically impossible to anticipate and head off the many different inferences that government data might support

• The very point of open government data
  – To facilitate potential uses that government is unlikely to recognize or anticipate
  – Indeed, open government data initiatives will be perceived as successful only to the extent that they support novel applications


A Familiar Problem


Re-Identification
  – Deductive Inference

Machine Learning
  – Inductive Inference (training the model)
  – Deductive Inference (applying the model)
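As a rough illustration of the purely deductive case, re-identification can be sketched as an exact-match join between a "de-identified" release and a public roster. Everything in the sketch is an assumption made for illustration: the records are invented, and the choice of `zip`, `birth_year`, and `sex` as linking fields is hypothetical rather than taken from the talk.

```python
# Hypothetical toy data: a "de-identified" release and a public roster.
released = [
    {"zip": "08540", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "08540", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "08542", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]
roster = [
    {"name": "Alice", "zip": "08540", "birth_year": 1975, "sex": "F"},
    {"name": "Bob", "zip": "08540", "birth_year": 1982, "sex": "M"},
    {"name": "Carol", "zip": "08542", "birth_year": 1975, "sex": "F"},
    {"name": "Dana", "zip": "08542", "birth_year": 1975, "sex": "F"},
]

# Assumed linking fields shared by both datasets.
LINK_FIELDS = ("zip", "birth_year", "sex")


def link_key(record):
    """Build the tuple of linking-field values for one record."""
    return tuple(record[field] for field in LINK_FIELDS)


def link(released_rows, roster_rows):
    """Return {name: diagnosis} for released rows that match exactly one roster entry."""
    names_by_key = {}
    for person in roster_rows:
        names_by_key.setdefault(link_key(person), []).append(person["name"])

    matches = {}
    for row in released_rows:
        candidates = names_by_key.get(link_key(row), [])
        if len(candidates) == 1:  # an ambiguous match is not a re-identification
            matches[candidates[0]] = row["diagnosis"]
    return matches


reidentified = link(released, roster)
```

No model is learned here: the conclusion follows directly from the records once the two datasets are joined. The third released row stays unlinked because two roster entries share its linking fields.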

     


Troy Raeder, Brian Dalessandro, Claudia Perlich, “Considering Privacy in Predictive Modeling Applications,” Proceedings of the Data Ethics Workshop, Conference on Knowledge Discovery and Data Mining (KDD), August 2014


#FAIL

“We believe that the empirical study of machine learning and data mining methods often falls prey to the effects of publication bias that favors positive results over negative ones. Most, if not all, articles in conferences and journals report only positive results. This does not reflect the practice of a field where failures happen regularly.”

Christophe Giraud-Carrier and Margaret H Dunham, “On the Importance of Sharing Negative Results,” SIGKDD Explorations 12, no. 2 (December 2010): 3.


Limiting Factors

Algorithms, Phenomena, Data

Training Data

Labeled Examples

Feature Selection/Engineering

To what extent can we figure out which limit is at play in any given situation, and use that to roll the clock forward to think through future situations?


The Phenomena Themselves

Inferring gender from narrative text

(and yet we’re very good at author identification)


From the Field

• Government data lacks the necessary granularity for effective machine learning
  – Insufficiently precise measurement
  – Purposefully coarse data

• Do anonymity-preserving techniques have the additional effect of limiting what can be learned?
  – (Increasingly) well understood trade-off with utility, but no clear sense of whether this imposes potentially desirable limits on what can be learned


Machine Learning
  – Inductive Inference (training the model)
  – Deductive Inference (applying the model)
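The two steps can be sketched in a few lines. This is a minimal illustration and not anything presented in the talk: the feature names, the toy records, and the smoothed log-odds scoring rule (a simplified naive-Bayes-style score that only looks at features a person has) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def induce(labeled_examples):
    """Inductive step: learn a per-feature log-odds weight from labeled examples.

    Each example is (set_of_features, has_trait). Add-one smoothing keeps
    the weights finite for features seen in only one class.
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for features, has_trait in labeled_examples:
        if has_trait:
            n_pos += 1
            pos.update(features)
        else:
            n_neg += 1
            neg.update(features)

    weights = {
        f: math.log((pos[f] + 1) / (n_pos + 2)) - math.log((neg[f] + 1) / (n_neg + 2))
        for f in set(pos) | set(neg)
    }
    prior = math.log((n_pos + 1) / (n_neg + 1))
    return prior, weights


def deduce(model, features):
    """Deductive step: apply the learned rule to someone who disclosed nothing."""
    prior, weights = model
    score = prior + sum(weights.get(f, 0.0) for f in features)
    return score > 0


# Hypothetical training data: people who disclosed the trait, plus their records.
labeled = [
    ({"dog_license", "library_card"}, True),
    ({"dog_license", "transit_pass"}, True),
    ({"parking_permit", "fishing_license"}, False),
    ({"parking_permit", "transit_pass"}, False),
]

model = induce(labeled)
guess_a = deduce(model, {"dog_license"})
guess_b = deduce(model, {"parking_permit", "transit_pass"})
```

The point of the split is that only the inductive step needs people who have disclosed the trait. Once the rule exists, the deductive step needs only the observable features of the person being judged.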

     


Inductive Step

• How much training data do we need?
  – What proportion of people need to disclose that they possess a certain attribute for data miners to be able to identify all the other members in the population who also have this attribute?

• What is the minimum-sized set of features necessary to induce a reliable rule?
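One way to explore the first question empirically is a small simulation. The sketch below rests on loud assumptions that are not from the talk: a synthetic population with a single observable feature (a group id), a hidden binary attribute whose rate depends on the group, and a random fraction of people who truthfully disclose their status. Real disclosure is unlikely to be random, so this shows only the shape of the experiment.

```python
import random


def simulate(fraction_disclosing, n_people=2000, seed=0):
    """Return accuracy on non-disclosers of a rule learned from disclosers only."""
    rng = random.Random(seed)

    # Hypothetical population: attribute rate depends on an observable group id.
    rates = {0: 0.9, 1: 0.1, 2: 0.8, 3: 0.2}
    people = []
    for _ in range(n_people):
        group = rng.randrange(4)
        people.append((group, rng.random() < rates[group]))

    n_disclosed = max(1, int(fraction_disclosing * n_people))
    disclosed, hidden = people[:n_disclosed], people[n_disclosed:]
    if not hidden:
        raise ValueError("fraction_disclosing must leave someone undisclosed")

    # Inductive step: majority attribute value per group among disclosers.
    counts = {}
    for group, value in disclosed:
        yes, total = counts.get(group, (0, 0))
        counts[group] = (yes + value, total + 1)
    rule = {group: 2 * yes >= total for group, (yes, total) in counts.items()}
    fallback = 2 * sum(value for _, value in disclosed) >= len(disclosed)

    # Deductive step: guess the attribute of everyone who did not disclose.
    correct = sum(rule.get(group, fallback) == value for group, value in hidden)
    return correct / len(hidden)


# Accuracy on non-disclosers as the disclosing fraction grows.
curve = [(p, simulate(p)) for p in (0.005, 0.05, 0.5)]
```

Plotting `curve` for a real dataset would show where extra disclosure stops helping the data miner. In this toy setup the ceiling is set by how strongly the group predicts the attribute, not by the amount of training data.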


Deductive Step

• How accessible are the features that are necessary to apply the rule?
  – The government information that figures in the inductive step may be more or less difficult to observe by those who want to apply the rule

• How easily can we withhold or conceal these features?
  – Forthcoming research suggests that it depends on the learning process
    • Daizhuo Chen, Robert Moakler, Samuel P. Fraiberger, Foster Provost, “Enhancing Transparency and Control when Predicting Private Traits from Digital Indicators of Human Tastes”
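For a rule that is a simple additive score, the second question can be made concrete: which of a person's features would have to be hidden before the rule stops flagging them? The sketch below is an illustrative greedy procedure under that additive-score assumption. It is not the method of the cited paper, and the weights and feature names are invented.

```python
def features_to_withhold(weights, bias, present):
    """Return the features to hide so the score drops to <= 0, or None if hiding cannot help.

    Features are hidden in order of decreasing weight, which for an additive
    score hides as few features as possible.
    """
    score = bias + sum(weights.get(f, 0.0) for f in present)
    hidden = []
    for f in sorted(present, key=lambda f: weights.get(f, 0.0), reverse=True):
        if score <= 0:
            break
        w = weights.get(f, 0.0)
        if w <= 0:
            return None  # only non-incriminating features are left to hide
        hidden.append(f)
        score -= w
    return hidden if score <= 0 else None


# Hypothetical learned weights for an additive rule.
weights = {"dog_license": 2.0, "library_card": 1.5, "transit_pass": 0.5, "parking_permit": -1.0}

easy_case = features_to_withhold(weights, -1.0, {"dog_license", "library_card", "transit_pass"})
hopeless_case = features_to_withhold(weights, 5.0, {"dog_license", "library_card", "transit_pass"})
```

In `easy_case` hiding the two strongest features is enough. In `hopeless_case` the rule's baseline is already above the threshold, so no amount of concealment changes the inference. How hard concealment is therefore depends on the rule that was learned.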


Thank You

Solon Barocas
[email protected]
solon.barocas.org
@s010n

Arvind Narayanan
[email protected]
randomwalker.info
@random_walker