what your tweets tell us about you, speaker notes


Upload: kriskasianovitz

Post on 01-Nov-2014


Category:

Technology



Page 1: What Your Tweets Tell Us About You, Speaker Notes

• Introduce paper title

• Ask people to interact, comment, and respond to our questions during the presentation using #tweetprivacy

• Credits

o Charlesworth, whose Digital Lives Report was one of the only papers that provided any analysis and guidance in the area of social media archiving.

Interest in social media data is multidisciplinary, resulting in conflicting views regarding the ethical management of captured datasets. Curators will be required to navigate these conflicting views as they work to provide appropriate mechanisms for access and reuse of these data.

We hope to encourage researchers and library, archive, or repository staff to engage in a cross-disciplinary conversation about the privacy issues (as well as the host of other issues) inherent in using social media as a primary source for research.

We're going to show you a clip from Laila Sakr's presentation at the Tech@State Data Visualization Conference in Washington, DC. The clip provides a good example of how researchers are using Twitter and other social media data.

[Play Clip]

There are two key things I want to point out:

1. Long-term archiving of this data and other curatorial issues like value, authenticity, and significant properties are absent from this talk, which is not surprising. They were also absent in many of the papers we read that utilized Twitter data. This demonstrates that researchers at this point emphasize collection and analysis rather than preservation.

2. Sakr makes sure to say that she is downloading only the publicly available tweets using the search API, and she notes how this could affect her sample and its validity. She's not talking about it in terms of privacy issues, which further illustrates that the focus is on analysis rather than privacy or ethics.

We'd like to take an informal poll similar to last night's poll of the audience's willingness to have their genome sequenced.

Who among those of you who use Twitter as a communication tool is completely fine with having your tweets, profile information, images, and location data downloaded, analyzed, archived, and preserved? Of those of you with your hands raised, how many of you have tweeted something of a more personal nature that you might not want archived?

And who here is actively involved with the collection of Twitter data? Any social media data? [What do you do with it? Tweet here]

The reason I ask is that we found, through our work with the HyperCities Egypt Twitter data, that the question of whether there are privacy concerns with a data source like Twitter is essentially a research ethics issue, one which varies depending on the role and/or subject background of the researcher and how they view the context of the data creation. (Refer to Confounding Relationships to point out the various roles.)


So, our central thesis is that perceptions of privacy in social media platforms are formed by disciplinary culture, the capabilities and constraints of the platform, and the community norms of the platform itself.

Does analyzing a person's Tweets constitute researching a human subject? Or are Tweets a creative text which requires proper citation and credit to the authors or tweeters? Or are Tweets part of the open public record? Social scientists tend to view the data as human subjects research, while humanists tend to view the data as a form of publication. These very different ways of viewing the data require different methods for dealing with privacy.

We feel it is important to state that social media data are not homogeneous; each platform has its own unique constraints for the creation and inclusion of content, as well as constraints on how users may engage in the space and on their expectations and norms of interaction.

Our case study focuses on Twitter, so while we provide a general framework for assessing privacy issues with social media, it must be understood that, because of the uniqueness of Twitter's Privacy Policy, Terms of Service, and Developer Rules of the Road, the analysis and interpretation are not necessarily generalizable to other platforms, such as Facebook. As with many data curation activities, there will be some facets which can be generalized, while others may be platform or subject specific. Part of determining the curation needs of social media data will be to determine these boundaries.

What can we learn about you from Twitter?

[Show different visualizations, then tweet map, tweet image]

Depending on how the data are visualized, we can learn about you as an individual, about your internet relations, about you as part of a huge collective, or nothing about you as an individual (R-Shief image). Some visualizations will enable better anonymization than others. However, the underlying dataset used to generate the visualizations will still contain, if your account is unprotected, your name, location, photos, and anything else you decide to share in your timeline. So if you include other personal information, like an email address, we can find it out about you.

But what else can we find out about you?

[Show the Alyaa Gad slide, then the Google Search]

Thanks to the power of search engines like Google, we can get a lot more information, which may be collected and archived as well.

Our case study, or what I like to call "we've got tweets, now what?"

Todd Presner, a UCLA faculty member, and two researchers collected a subset of the overall Twitter data available. He asked the library to archive it. Before we could do anything with it, we had to assess what he had collected.

The HyperCities team used the Twitter Search API to pull data based on the location parameter (within 200 km of the center of Cairo), time period (January 30, 2011 through February 24, 2011), AND one of three hashtags (#jan25 OR #egypt OR #tahrir).

They downloaded approximately 420,000 public Tweets during the initial phase of this analysis and continue to feed their site with live feeds.
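The capture criteria just described can be restated as a small filter. The following is a minimal sketch in Python, not the HyperCities team's code: it does not call the Twitter Search API, and it assumes a hypothetical tweet record with `text`, `created_at`, `lat`, and `lon` keys, an approximate centre point for Cairo, and UTC day boundaries, none of which are given in these notes.

```python
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt

# Assumed centre of Cairo as (lat, lon); the notes give no coordinates.
CAIRO = (30.0444, 31.2357)
RADIUS_KM = 200.0
# Assumed UTC boundaries, with the end date treated as inclusive.
START = datetime(2011, 1, 30, tzinfo=timezone.utc)
END = datetime(2011, 2, 24, 23, 59, 59, tzinfo=timezone.utc)
HASHTAGS = {"#jan25", "#egypt", "#tahrir"}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def matches(tweet):
    """True if a (hypothetical) tweet record meets all three criteria."""
    # Collect hashtags case-insensitively, dropping trailing punctuation.
    tags = {w.lower().rstrip(".,!?:;")
            for w in tweet["text"].split() if w.startswith("#")}
    in_range = haversine_km(CAIRO, (tweet["lat"], tweet["lon"])) <= RADIUS_KM
    in_window = START <= tweet["created_at"] <= END
    return in_range and in_window and bool(tags & HASHTAGS)
```

Writing the criteria down this way makes the point of the case study concrete: a tweet without coordinates, without one of the three hashtags, or outside the window never enters the archive, so the parameters themselves belong in the documentation of the data set.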


As with Sakr's work, the data capture was motivated by the fact that significant events were taking place on Twitter, and because Twitter data disappears quickly (10 days), they decided to start downloading it and make it available to as many people as possible for future reference and study.

There wasn't necessarily any research question or overarching thesis behind the collection other than to provide a glimpse back into the Egyptian Revolution Twitterverse. As Dr. Charlesworth pointed out yesterday morning, legal issues with gathering this type of data won't be at the forefront of the researcher's mind.

Based on the search parameters, the data set captured eight out of approximately forty possible Twitter data fields, revealing how the method of capture and the search parameters profoundly shape the resultant data.

The data is sitting on Prof. Presner's personal server as JSON files, but it will soon be converted into XML for ease in depositing and managing the data in Islandora. These facts must be documented in order for future users to have a clear understanding of the data set.

"But the data are already public…"

So if the general understanding is that your Twitter data is open and public, and that people using these platforms want to be seen AND heard, why should we be concerned about privacy?

The Privacy Policy of Twitter stipulates that while you "own" your content, anyone, including Twitter or any third party, is given the right to access your data and re-use it (our reading of the privacy policy).

Those who see Twitter data as data that contains potentially identifying information about human subjects may want to anonymize the data for the authors' protection, and may see displaying user names as unethical. This runs contrary to Twitter's Rules of the Road, which require the display of a user id to give credit to the person who tweeted. Yet Twitter also acknowledges this public/private tension in its own policies by suggesting that, if there is a concern over privacy or security risks in making a user id or other information available, the individual or media should get in touch with them.

The debate about the capture, reuse, and display of Twitter data sits on the line between the legality of collecting this content and the ethics of doing so.

To date there haven't been any formal legal challenges to the downloading, use, and archiving of Twitter data that we are aware of.

Thus ensues a wide-ranging debate among scholars, who characterize privacy issues with social media data in the following ways:

Most researchers take a harm-based view of privacy, in which the goal is to protect users' information from negative actors.

This includes concern for security issues (use by government agencies to track and arrest; use as evidence).

Others recognize that there are loopholes in the data that enable someone to get a lot of information about an individual even if all they have is a username, and that content already captured will remain available even after a user deletes an account or changes it from public to private.


Finally, (buyer beware) those users who have opted to make their accounts public have no grounds for complaint about the collection and reuse of their content, even if they did not anticipate reuse by researchers or commercial firms (Thelwall, 2010; Vieweg, 2010).

Danah boyd still asks: Just because we can collect it, should we?

Michael Zimmer, an Internet privacy scholar, argues instead for a dignity-based view of privacy, in which the act of another person taking one's personal information from the social networking sphere, amassing it into a database, and making it available for use and scrutiny is an affront to the users'/subjects' human dignity and their ability to control the flow of their personal information.

Finally, what are the users' expectations of how their tweets will be used?

How many here have actually read Twitter's privacy policy? Facebook's? Do you understand the implications of re-use?

Schmidt, Trepte, and Reinecke (2011) observe that users develop shared routines and expectations of self-disclosure, noting that privacy management is performed for a specific audience.

Facebook, for example, enables users to select privacy settings on a post-by-post basis, choosing who is able to read, comment on, and interact with specific content, and allowing the user fairly granular control over the flow of their information. Twitter allows only binary control; users can designate their account as "protected" (i.e. Tweets are only visible to approved followers) or "public" (enabled by default), which makes a user's profile and timeline accessible to anyone, even those without a Twitter account.
The ethical jury is going to be out on this for a while, at least until scholarly communities work out parameters and provide guidance for acceptable use of social media data.

In the meantime, what are we to do? Legal and ethical policy related to privacy and social media data is still in flux and almost always lags behind the pace of research. Yet libraries and similar institutions are pressured to act now to archive the data.

Data repositories will be caught in the middle of these divergent viewpoints when trying to determine the best methods of providing access to the data.

The norms of individual research disciplines often provide guidance for curators, but when researchers with divergent norms seek access to the same data, it can be difficult to determine how best to serve the broadest number of users.

Experience with this data set and the literature review led us to the following recommendations.

Libraries or other data repositories will need to decide if archiving social media data fits with their overall institutional mission and goals.

Libraries should determine the overall risks associated with collecting and archiving social media data and design strategies to mitigate those risks.

Because of the significance scholars are placing on the need to collect and now, in our case, archive Twitter data, we are convinced that providing for the collection, preservation, and reuse of social media data requires at the very least conversations among researchers, libraries, archives, institutional review boards, scholarly societies, and other national and international organizations concerned with the production and preservation of scholarship.

Part of the discussion will need to include the context or conditions under which the data have been collected. One important aspect of this, as Dr. Charlesworth mentioned in his talk yesterday, is a way to gather "legal metadata" so that going forward the archive or repository will have the necessary privacy i's dotted and t's crossed, insofar as that is possible.

Libraries should engage researchers as early as possible in the research process.

Curators are presented with a golden opportunity to collaborate with researchers as close to the beginning of the research lifecycle as possible. Through a collaborative process, we can ideally facilitate the creation of collections that balance openness with privacy concerns and encourage broad reuse.

While that early intervention may not happen, we can employ curatorial strategies on the back end of the data gathering that will hopefully push the issue (one of which will be discussed in our next recommendation). Here we start to address, somewhat, the question that was asked yesterday at Dr. Charlesworth's presentation about educating the researchers.

Libraries choosing to archive social media data should develop clear and easy-to-use collection and deposit policies, forms, and tools.

It has been our argument since first working with Twitter data that a way to both educate researchers and create data that can be ingested into a repository and reused is to create a workflow that asks the necessary questions of researchers, which would aid in the creation of a codebook and documentation for the data.
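A workflow of this kind could be backed by a simple structured record. The following is a minimal sketch in Python under stated assumptions: the field names are hypothetical and are not taken from our actual form, and the example values are only those mentioned earlier in these notes. Unanswered questions are left empty so that the gaps in the documentation stay visible.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DepositRecord:
    """Hypothetical descriptive and legal metadata for a social media deposit."""
    platform: str
    collected_by: str
    capture_method: str
    query: str
    date_range: str
    fields_captured: list = field(default_factory=list)
    file_format: str = ""
    terms_consulted: str = ""   # which policy or terms the researcher read
    privacy_notes: str = ""     # e.g. whether user names will be displayed


def unanswered(record):
    """Return the names of questions the researcher has not yet answered."""
    return [name for name, value in asdict(record).items() if not value]


# Example values drawn from the case study; the rest is deliberately blank.
record = DepositRecord(
    platform="Twitter",
    collected_by="HyperCities team",
    capture_method="Twitter Search API",
    query="within 200 km of central Cairo; #jan25 OR #egypt OR #tahrir",
    date_range="2011-01-30 to 2011-02-24",
    file_format="JSON",
)
print(unanswered(record))
```

A curator could refuse or flag a deposit while `unanswered` still returns names, which turns the form into the teaching tool described here rather than a formality.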
We created a Twitter deposit form that is geared toward raising the privacy issues with this platform, educating the researcher, and providing a way to record the basic legal and descriptive metadata necessary for contextualizing the data for re-use.

These are teachable moments for information literacy librarians: understand and know the source of information.

Twitter should add language to its privacy policy that more explicitly states how the data may be used (gain consent, and then the data truly become open).

Ideally, Twitter would take a different approach to releasing the data for research and partner directly with researchers, rather than with a third party like GNIP, which charges for the data and isn't clear about what can be done with it once it's been purchased.

Lastly, thanks to all who have been tweeting during the session; we wanted to let you know that we've archived your tweets in TwapperKeeper.