finding your friends and following them to where you are #wsdm2012

Finding Your Friends and Following Them to Where You Are

Adam Sadilek, Henry Kautz, Jeffrey P. Bigham

University of Rochester, New York, USA

Presenter: Yoh Okuno #wsdm2012

About Presenter

•  Name: Yoh Okuno

•  R&D Engineer at Yahoo! Japan

•  Interests: NLP (Natural Language Processing), Machine Learning, and Data Mining.

•  Skills: C/C++, Java, Python, and Hadoop.

•  Website: http://yoh.okuno.name/

Overview

1.  Introduction

2.  Friendship Prediction

3.  Location Prediction

4.  Evaluation

5.  Conclusion

1. Introduction

“Check-in” Services or Posts with Geo-tags

Figure 1: Tweets with Geo-tags at New York City

http://cs.rochester.edu/u/sadilek/research

Summary: Predicting Friendships and Locations

•  Tasks: friendship and location prediction

•  Approach: model interaction between them

•  Data: real-world Twitter dataset

•  Problem: private locations are not provided

•  Result: 90% of private locations are revealed

Data: Crawled Twitter Search API for 1 Month

•  Focus on users who have >100 geo-tagged tweets

FLAP: Friendship + Location Analysis and Prediction

Crawler

Visualizer

Learning and Inference

2. Friendship Prediction Task

Similarity Features: Text, Location, and Graph

1.  Text: inner product of word vectors, with stop words removed

2.  Co-location: time overlapping in the same place

3.  Graph: # of common friends (normalized)
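
The three similarity features can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the tiny stop-word list, the `(place, start, end)` visit format, and the choice to normalize common friends by the smaller friend list (the slide says only "normalized").

```python
from collections import Counter

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "at"}

def text_similarity(tweets_a, tweets_b):
    """Inner product of word-count vectors, ignoring stop words."""
    def counts(tweets):
        return Counter(w for t in tweets for w in t.lower().split()
                       if w not in STOP_WORDS)
    ca, cb = counts(tweets_a), counts(tweets_b)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())

def colocation_overlap(visits_a, visits_b):
    """Total time both users spent in the same place.

    Each visit is a hypothetical (place, start, end) tuple.
    """
    total = 0
    for place_a, start_a, end_a in visits_a:
        for place_b, start_b, end_b in visits_b:
            if place_a == place_b:
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total

def common_friends(friends_a, friends_b):
    """Shared friends divided by the smaller friend list (assumed normalization)."""
    smaller = min(len(friends_a), len(friends_b))
    if smaller == 0:
        return 0.0
    return len(set(friends_a) & set(friends_b)) / smaller
```

Each function returns one number per user pair, so a pair of users is described by a 3-dimensional feature vector.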

Learning: Regression Decision Tree (DT)

•  Used a DT whose output is a probability

•  These 3 features had the maximum information gain for the DT

•  Other features, including the Jaccard coefficient, were useless in this case

•  LSH (locality-sensitive hashing) speeds up the O(n^2) operation
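
To make "information gain" concrete, here is a minimal sketch of how the gain of a single threshold split on one feature could be computed for binary friend / not-friend labels. The function names and the threshold-split form are assumptions for illustration, not the presenters' implementation.

```python
import math

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder
```

A feature whose best split yields a gain close to the entropy of the labels separates friends from non-friends almost perfectly; a gain near zero means the feature carries little signal.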

3. Location Prediction Task

Figure 3: Dynamic Bayesian Network (DBN)

•  People move between tweets t and t+1

–  u_t: location of user u at tweet t

–  fi_t: location of friend i at tweet t

–  td_t: time of day at tweet t

–  w_t: whether it is a work day or not at tweet t

All variables are discrete
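
Since all variables are discrete, raw timestamps and coordinates have to be mapped to a finite set of values. The sketch below shows one possible discretization; the one-hour time bins, the Monday-to-Friday definition of a work day, and the fixed-size grid cell for locations are all assumptions, as the slides do not specify them.

```python
import math
from datetime import datetime

def time_of_day(ts: datetime) -> int:
    """td_t: hour of the day, 0..23 (assumed one-hour bins)."""
    return ts.hour

def is_work_day(ts: datetime) -> bool:
    """w_t: True for Monday..Friday (assumed definition of a work day)."""
    return ts.weekday() < 5

def location_cell(lat: float, lon: float, cell_deg: float = 0.01):
    """u_t or fi_t: index of a hypothetical grid cell containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
```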

Learning: Both Supervised and Unsupervised

•  Supervised learning for each geo-active user

•  Unsupervised: simulate “virtual” private users

–  EM algorithm with forward-backward

–  Simulated annealing to avoid local optima
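
The forward-backward step used inside EM can be sketched on a plain discrete hidden Markov model. This is a simplification of the DBN on the slide: the hidden state stands in for the user's location and a single observation per step stands in for the observed evidence. The function name and the table layout (`init[s]`, `trans[s][s2]`, `emit[s][o]`) are assumptions.

```python
def forward_backward(init, trans, emit, observations):
    """Posterior P(state_t | all observations) for a discrete HMM.

    Assumes every observation has non-zero probability under the model;
    otherwise the normalizers below would be zero.
    """
    n = len(init)
    T = len(observations)

    # Forward pass, normalized at each step to avoid underflow.
    alpha = []
    for t in range(T):
        o = observations[t]
        if t == 0:
            cur = [init[s] * emit[s][o] for s in range(n)]
        else:
            cur = [emit[s][o] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
                   for s in range(n)]
        z = sum(cur)
        alpha.append([c / z for c in cur])

    # Backward pass, also normalized per step.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o_next = observations[t + 1]
        cur = [sum(trans[s][r] * emit[r][o_next] * beta[t + 1][r] for r in range(n))
               for s in range(n)]
        z = sum(cur)
        beta[t] = [c / z for c in cur]

    # Combine and renormalize to get per-step posteriors.
    posteriors = []
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(n)]
        z = sum(g)
        posteriors.append([x / z for x in g])
    return posteriors
```

In an EM loop these posteriors would serve as the expected counts for re-estimating the transition and emission tables.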

4. Evaluation

Evaluation for Friendship Prediction Task

•  Evaluation settings

– Reconstructed friendship graphs via models

–  Provided a randomly selected 0% to 50% of the edges

•  Evaluation results

–  FLAP outperforms previous work

– FLAP works well even if no edges were given

•  Note: tweet text and locations are still provided as usual

Figure 4: Averaged ROC Curve
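
For readers unfamiliar with ROC curves, the sketch below shows how one curve could be traced from predicted edge probabilities and true edge labels, and how its area could be summarized. The function names are assumptions, ties between scores are not treated specially, and the averaging across runs mentioned in the figure caption is not shown.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) points, sweeping the threshold.

    Assumes labels contain at least one 1 and at least one 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```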

Evaluation for Location Prediction Task

•  Evaluation settings

–  Data: first 20 days for learning / later 6 days for test

–  Varied # of friends that the system considers

•  Evaluation results

–  Supervised: 77% accuracy with only 2 friends

–  Unsupervised: 57% accuracy with 9 friends

–  “Locations can be inferred even for private accounts”

Table 6: Accuracy for Location Prediction Task
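
The evaluation protocol (train on the first 20 days, test on the later days, report accuracy) can be sketched as follows. The `(day_index, location)` record format and the function names are assumptions for illustration.

```python
def split_by_day(records, train_days=20):
    """Split hypothetical (day_index, location) records chronologically.

    Days 0..train_days-1 are used for learning, the rest for testing.
    """
    train = [r for r in records if r[0] < train_days]
    test = [r for r in records if r[0] >= train_days]
    return train, test

def accuracy(predicted, actual):
    """Fraction of test points whose predicted location matches exactly."""
    if not actual:
        return 0.0
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Splitting by time rather than at random keeps future tweets out of the training data, which matches the setting described on the slide.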

Conclusion

•  For friendship prediction task:

– Combined text, location and graph features

– Reconstructed friendship graph with no seeds

•  For location prediction task:

–  Exploited friends’ locations to infer a user’s location

– Unsupervised result shows “private is not safe”

Future Work

•  Text features (NER) for location prediction

•  Joint model of locations and friendships

•  Evaluate semi-supervised learning (hopefully)

•  Consider the privacy issue as a tradeoff

Any Questions?

More Precisely: Belief Propagation
