finding your friends and following them to where you are #wsdm2012
Presented by Yoh Okuno, WSDM 2012 reading
Finding Your Friends and Following Them to Where You Are
Adam Sadilek, Henry Kautz, Jeffrey P. Bigham
University of Rochester, New York, USA
Presenter: Yoh Okuno #wsdm2012
About Presenter
• Name: Yoh Okuno
• R&D Engineer at Yahoo! Japan
• Interests: NLP (Natural Language Processing), Machine Learning, and Data Mining.
• Skills: C/C++, Java, Python, and Hadoop.
• Website: http://yoh.okuno.name/
Overview
1. Introduction
2. Friendship Prediction
3. Location Prediction
4. Evaluation
5. Conclusion
1. Introduction
“Check-in” Services or Posts with Geo-tags
Figure 1: Tweets with Geo-tags in New York City
http://cs.rochester.edu/u/sadilek/research
Summary: Predicting Friendships and Locations
• Tasks: friendship and location prediction
• Approach: model interaction between them
• Data: real-world Twitter dataset
• Problem: private locations are not provided
• Result: 90% of private locations are revealed
Data: Crawled Twitter Search API for 1 Month
• Focus on users who have >100 geo-tagged tweets
FLAP: Friendship + Location Analysis and Prediction
Crawler
Visualizer
Learning and Inference
2. Friendship Prediction Task
Similarity Features: Text, Location, and Graph
1. Text: inner product of word vectors, excluding stop words
2. Co-location: time overlap in the same place
3. Graph: # of common friends (normalized)
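The three similarity features above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the discretization of visits into (place, time slot) pairs, and the normalization by the smaller friend set are all assumptions, since the slide does not specify them.

```python
from collections import Counter

# Tiny illustrative stop-word list (assumption; the slide does not give one).
STOP_WORDS = {"a", "an", "the", "is", "at", "in", "to", "and", "of"}


def text_similarity(tweets_a, tweets_b):
    """Inner product of word-count vectors, ignoring stop words."""
    def counts(tweets):
        return Counter(
            w for t in tweets for w in t.lower().split() if w not in STOP_WORDS
        )
    ca, cb = counts(tweets_a), counts(tweets_b)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())


def colocation(visits_a, visits_b):
    """Number of (place, time_slot) pairs in which both users appear.

    Visits are assumed to be pre-discretized into (place, time_slot) tuples.
    """
    return len(set(visits_a) & set(visits_b))


def common_friends(friends_a, friends_b):
    """Common-friend count divided by the smaller friend set (assumed normalization)."""
    smaller = min(len(friends_a), len(friends_b))
    if smaller == 0:
        return 0.0
    return len(set(friends_a) & set(friends_b)) / smaller
```

Each function returns one scalar per user pair, so a pair is described by a 3-dimensional feature vector that a classifier can consume.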
Learning: Regression Decision Tree (DT)
• Used a DT whose output is a probability
• These 3 features had the maximum information gain for the DT
• Other features, including the Jaccard coefficient, were useless in this case
• LSH (locality-sensitive hashing) speeds up the O(n^2) operation
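To illustrate how LSH can avoid comparing all O(n^2) user pairs, here is a minimal MinHash-with-banding sketch. The slide does not say which LSH scheme or which features were hashed, so the choice of MinHash over token sets, the function names, and the parameters `num_hashes` and `band_size` are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def minhash_signature(tokens, num_hashes=8):
    """Return one min-hash value per salted hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt gives a different hash function
        signature.append(min(
            hashlib.blake2b(tok.encode("utf-8"), digest_size=8, salt=salt).digest()
            for tok in tokens
        ))
    return tuple(signature)


def candidate_pairs(user_tokens, num_hashes=8, band_size=2):
    """Return user pairs whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for user, tokens in user_tokens.items():
        if not tokens:
            continue  # nothing to hash for users without tokens
        sig = minhash_signature(tokens, num_hashes)
        for start in range(0, num_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].add(user)
    pairs = set()
    for users in buckets.values():
        pairs.update(combinations(sorted(users), 2))
    return pairs
```

Only the pairs returned by `candidate_pairs` would then be scored with the full similarity features, so users with little overlap are rarely compared at all.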
3. Location Prediction Task
Figure 3: Dynamic Bayesian Network (DBN)
• People move between tweets t and t+1
– u_t: location of user u at tweet t
– fi_t: location of friend i at tweet t
– td_t: time of day at tweet t
– w_t: whether it is a work day or not at tweet t
• All variables are discrete
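Because all variables are discrete, inference over the hidden location u_t can be sketched as a table-based forward (filtering) step. The sketch below assumes an HMM-like factorization in which the observations (friends' locations, time of day, day type) are conditionally independent given u_t and are already folded into a single likelihood per location. The slide does not spell out the edge structure, so that factorization and all names here are assumptions.

```python
def forward_step(belief, transition, likelihoods):
    """One filtering step: P(u_t | obs so far) -> P(u_{t+1} | obs so far, new obs).

    belief:      dict location -> probability at tweet t
    transition:  dict (prev, nxt) -> P(u_{t+1} = nxt | u_t = prev)
    likelihoods: dict location -> P(observations at t+1 | u_{t+1} = location)
    """
    locations = list(belief)
    # Predict: push the current belief through the transition table.
    predicted = {
        nxt: sum(belief[prev] * transition.get((prev, nxt), 0.0) for prev in locations)
        for nxt in locations
    }
    # Update: weight each location by how well it explains the observations.
    unnormalized = {loc: predicted[loc] * likelihoods.get(loc, 0.0) for loc in locations}
    total = sum(unnormalized.values())
    if total == 0.0:
        return predicted  # observations rule out every location; keep the prediction
    return {loc: value / total for loc, value in unnormalized.items()}
```

Taking the most probable location from the returned belief gives a prediction for the user at the next tweet.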
Learning: Both Supervised and Unsupervised
• Supervised: learning for each geo-active user
• Unsupervised: simulate “virtual” private users
– EM algorithm with forward-backward
– Simulated annealing to avoid local optima
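For the supervised case, where a geo-active user's locations are observed, the parameters of a discrete model can be estimated by counting. A minimal sketch for the location-to-location transition table follows. The add-alpha smoothing and the function name are assumptions, not something stated on the slide, and the unsupervised EM procedure is not shown.

```python
from collections import Counter


def estimate_transitions(location_sequence, alpha=1.0):
    """Estimate P(next location | previous location) from an observed sequence.

    Uses add-alpha smoothing so unseen transitions keep a small probability.
    """
    locations = sorted(set(location_sequence))
    pair_counts = Counter(zip(location_sequence, location_sequence[1:]))
    probs = {}
    for prev in locations:
        total = sum(pair_counts[(prev, nxt)] for nxt in locations) + alpha * len(locations)
        for nxt in locations:
            probs[(prev, nxt)] = (pair_counts[(prev, nxt)] + alpha) / total
    return probs
```

The resulting dictionary has the same (prev, nxt) key layout as a transition table used in table-based filtering, so the two pieces can be combined directly.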
4. Evaluation
Evaluation for Friendship Prediction Task
• Evaluation settings
– Reconstructed friendship graphs via models
– Randomly selected 0% to 50% of the edges as given
• Evaluation results
– FLAP outperforms previous work
– FLAP works well even if no edges were given
• Note: texts and locations are provided normally
Figure 4: Averaged ROC Curve
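As a reminder of how a single ROC curve is built from predicted friendship probabilities, here is a minimal sketch. It computes one curve and its area. How the curves in Figure 4 were averaged is not described on the slide, and the function names are illustrative.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for a threshold sweep.

    scores: predicted probabilities; labels: 1 for a true edge, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Examples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```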
Evaluation for Location Prediction Task
• Evaluation settings
– Data: first 20 days for learning / later 6 days for testing
– Varied # of friends that the system considers
• Evaluation results
– Supervised: 77% accuracy with only 2 friends
– Unsupervised: 57% accuracy with 9 friends
– “Locations can be inferred even for private accounts”
Table 6: Accuracy for Location Prediction Task
5. Conclusion
• For friendship prediction task:
– Combined text, location and graph features
– Reconstructed friendship graph with no seeds
• For location prediction task:
– Exploited friends’ locations to infer a user’s location
– Unsupervised result shows “private is not safe”
Future Work
• Text features (NER) for location prediction
• Joint model of locations and friendships
• Evaluate semi-supervised learning (hopefully)
• Consider the privacy issue as a tradeoff
Any Questions?
More Precisely: Belief Propagation