finding your friends and following them to where you are #wsdm2012

Post on 11-Jun-2015

DESCRIPTION

Presented by Yoh Okuno, WSDM 2012 reading

TRANSCRIPT

Finding Your Friends and Following Them to Where You Are

Adam Sadilek, Henry Kautz, Jeffrey P. Bigham

University of Rochester, New York, USA

Presenter: Yoh Okuno #wsdm2012

About Presenter

• Name: Yoh Okuno

• R&D Engineer at Yahoo! Japan

• Interests: NLP (Natural Language Processing), Machine Learning, and Data Mining

• Skills: C/C++, Java, Python, and Hadoop

• Website: http://yoh.okuno.name/

Overview

1. Introduction

2. Friendship Prediction

3. Location Prediction

4. Evaluation

5. Conclusion

1. Introduction

“Check-in” Services or Posts with Geo-tags

Figure 1: Tweets with Geo-tags in New York City

http://cs.rochester.edu/u/sadilek/research

Summary: Predicting Friendships and Locations

• Tasks: friendship prediction and location prediction

• Approach: model the interaction between the two

• Data: real-world Twitter dataset

• Problem: locations of private users are not provided

• Result: 90% of private locations are revealed

Data: Crawled Twitter Search API for 1 Month

• Focus on users who have >100 geo-tagged tweets

FLAP: Friendship + Location Analysis and Prediction

Crawler

Visualizer

Learning and Inference

2. Friendship Prediction Task

Similarity Features: Text, Location, and Graph

1. Text: inner product of word vectors, with stop words removed

2. Co-location: time overlap in the same place

3. Graph: # of common friends (normalized)
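
The three similarity features above could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the stop-word list, the representation of visits as (place, time slot) pairs, and the normalization of common friends by the smaller friend list are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"the", "a", "is", "to", "and", "in", "at"}


def text_similarity(tweets_u, tweets_v):
    """Inner product of word-count vectors, ignoring stop words."""
    def counts(tweets):
        words = " ".join(tweets).lower().split()
        return Counter(w for w in words if w not in STOP_WORDS)

    cu, cv = counts(tweets_u), counts(tweets_v)
    return sum(cu[w] * cv[w] for w in cu.keys() & cv.keys())


def colocation(visits_u, visits_v):
    """Number of (place, time_slot) pairs that both users share.

    Assumes visits were already discretized into hashable pairs.
    """
    return len(set(visits_u) & set(visits_v))


def graph_similarity(friends_u, friends_v):
    """Common friends, normalized by the smaller friend set (assumed)."""
    if not friends_u or not friends_v:
        return 0.0
    return len(friends_u & friends_v) / min(len(friends_u), len(friends_v))
```

Each function returns a single number per user pair, so the three values can be fed directly to a classifier as a feature vector.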

Learning: Regression Decision Tree (DT)

• Used a DT whose output is a probability

• These 3 features had the maximum information gain for the DT

• Other features, including the Jaccard coefficient, were not useful in this case

• LSH (locality-sensitive hashing) speeds up the O(n^2) pairwise operation
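
To illustrate how a feature can be ranked by information gain, the sketch below scores a single threshold split on binary friend / non-friend labels. The function names, the threshold, and the toy data in the usage note are illustrative assumptions; the slides do not describe how the tree was actually trained.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting user pairs on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder
```

For example, `information_gain([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], 0.5)` gives the full 1 bit because the split separates the classes perfectly, while a feature whose values are unrelated to the labels scores close to 0.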

3. Location Prediction Task

Figure 3: Dynamic Bayesian Network (DBN)

• People move between tweets t and t+1

– u_t: location of user u at tweet t

– fi_t: location of friend i at tweet t

– td_t: time of day at tweet t

– w_t: whether tweet t falls on a work day

All variables are discrete
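
Because every variable in the network is discrete, each tweet has to be mapped to discrete values before learning. The sketch below shows one possible encoding of a single time slice. The hourly time-of-day bins, the Monday-to-Friday definition of a work day, and the integer location index are assumptions; the slides do not give the actual discretization.

```python
from datetime import datetime
from typing import NamedTuple


class Slice(NamedTuple):
    """Observed discrete values for one tweet (one DBN time slice)."""
    location: int     # u_t: index of a discrete location (assumed encoding)
    time_of_day: int  # td_t: hour-of-day bin, 0..23 (assumed granularity)
    work_day: bool    # w_t: True on Monday to Friday (assumed definition)


def to_slice(timestamp: datetime, location_id: int) -> Slice:
    """Discretize one geo-tagged tweet into a DBN time slice."""
    return Slice(
        location=location_id,
        time_of_day=timestamp.hour,
        work_day=timestamp.weekday() < 5,
    )
```

Friend locations (fi_t) would be encoded with the same location index, one value per friend considered.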

Learning: Both Supervised and Unsupervised

• Supervised: learning for each geo-active user

• Unsupervised: simulate “virtual” private users

– EM algorithm with forward-backward

– Simulated annealing to avoid local optima
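
The forward-backward pass used inside EM can be sketched on a plain discrete hidden Markov model. This is a deliberate simplification: it uses a single hidden chain rather than the full DBN with friend, time-of-day, and work-day variables, and the parameter names `init`, `trans`, and `emit` are hypothetical.

```python
def forward_backward(init, trans, emit, observations):
    """Posterior P(state_t | all observations) for a discrete HMM.

    init[s], trans[s][s2] and emit[s][o] are probabilities stored in lists.
    """
    n_states, T = len(init), len(observations)

    # Forward pass, normalized at every step to avoid underflow.
    alpha = []
    for t in range(T):
        if t == 0:
            raw = [init[s] * emit[s][observations[0]] for s in range(n_states)]
        else:
            raw = [
                emit[s][observations[t]]
                * sum(alpha[t - 1][r] * trans[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        z = sum(raw)
        alpha.append([p / z for p in raw])

    # Backward pass, also rescaled at every step.
    beta = [[1.0] * n_states for _ in range(T)]
    for t in range(T - 2, -1, -1):
        raw = [
            sum(trans[s][r] * emit[r][observations[t + 1]] * beta[t + 1][r]
                for r in range(n_states))
            for s in range(n_states)
        ]
        z = sum(raw)
        beta[t] = [b / z for b in raw]

    # Combine both passes and normalize to get smoothed posteriors.
    posteriors = []
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        z = sum(g)
        posteriors.append([x / z for x in g])
    return posteriors
```

In an EM loop these posteriors would serve as the expected counts of the E-step; the M-step and the simulated annealing schedule are not shown here.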

4. Evaluation

Evaluation for Friendship Prediction Task

• Evaluation settings

– Reconstructed friendship graphs with the models

– Randomly selected 0% to 50% of the edges as given input

• Evaluation results

– FLAP outperforms previous work

– FLAP works well even when no edges are given

• Note: texts and locations are provided as usual

Figure 4: Averaged ROC Curve
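
An ROC curve like the one in Figure 4 is obtained by sweeping a threshold over the predicted edge probabilities. The sketch below computes the curve points and the area under them for one set of predictions. It is a generic illustration, not the authors' evaluation code, and it does not show the averaging across runs.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from high to low scores.
    Assumes labels are 0/1 and that both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    i = 0
    while i < len(ranked):
        # Process tied scores together so they yield a single point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

Here `scores` would be the decision tree's output probability for each candidate user pair and `labels` would mark whether the pair is a true friendship edge.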

Evaluation for Location Prediction Task

• Evaluation settings

– Data: first 20 days for learning / later 6 days for testing

– Varied the # of friends that the system considers

• Evaluation results

– Supervised: 77% accuracy with only 2 friends

– Unsupervised: 57% accuracy with 9 friends

– “Locations can be inferred even for private accounts”

Table 6: Accuracy for Location Prediction Task
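
The evaluation protocol (train on the first 20 days, test on the following 6, report accuracy) can be sketched as below. The record layout of `(day, payload)` tuples and the start date passed in by the caller are assumptions made for illustration only.

```python
from datetime import date, timedelta


def split_by_day(records, start, train_days=20):
    """Split (day, payload) records chronologically into train and test."""
    cutoff = start + timedelta(days=train_days)
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def accuracy(predicted, actual):
    """Fraction of predicted locations that match the true locations."""
    if not actual:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```

Splitting by time rather than at random keeps future tweets out of the training set, which matches the chronological setup described on the slide.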

Conclusion

• For the friendship prediction task:

– Combined text, location, and graph features

– Reconstructed the friendship graph with no seed edges

• For the location prediction task:

– Exploited friends’ locations to infer a user’s location

– Unsupervised result shows “private is not safe”

Future Work

• Text features (NER) for location prediction

• Joint model of locations and friendships

• Evaluate semi-supervised learning (hopefully)

• Consider the privacy issue as a tradeoff

Any Questions?

More Precisely: Belief Propagation
