a unified microblog user similarity model for online...

31
NLPCC 2014 A Unified Microblog User Similarity Model for Online Friend Recommendation Shi Feng, Le Zhang, Daling Wang, Yifei Zhang

Upload: others

Post on 14-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

A Unified Microblog User Similarity Model for Online Friend Recommendation

Shi Feng, Le Zhang, Daling Wang, Yifei Zhang

Page 2: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Outline

• Motivation

• The Characteristics of Friend Relationship in

Microblogs

• Friend Recommendation by Combining Multiple

Measures

• Experiments

• Conclusion and Future Work

Page 3: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Outline

• Motivation

• The Characteristics of Friend Relationship in

Microblogs

• Friend Recommendation by Combining Multiple

Measures

• Experiments

• Conclusion and Future Work

Page 4: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Motivation

• Real-life friends from school-mates, colleagues, neighbors

• Extend real-life social relations into online virtual social networks

• Microblogging services – Weibo, Twitter

– 536 million registered users

– More than 100 million tweets are generated per day

Page 5: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Motivation

• Friend relationship in microblog is different – The users can follow someone without his or her

permissions (more casual)

– The users may add a friend link to someone because of similar hobbies, tags, locations or hot topics

• Major contributions – Find critical features for friend recommendation

– Propose a similarity model by linearly combining multiple measures

– Validate the effectiveness of the proposed method on a real-world microblog dataset

Page 6: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Outline

• Motivation

• The Characteristics of Friend Relationship in

Microblogs

• Friend Recommendation by Combining Multiple

Measures

• Experiments

• Conclusion and Future Work

Page 7: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Crawled Dataset

• Who is the target user’s potential good friend in microblogs? It can be determined by many features because the microblog is full of personal and social relation information

Where the

user is from

Where the user

have been

High

coverage

Page 8: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Statistics of the Crawled Dataset

• The average similarity between users of friends and strangers (based on cosine similarity)

• These features are good indicators for friend recommendation

Page 9: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Outline

• Motivation

• The Characteristics of Friend Relationship in

Microblogs

• Friend Recommendation by Combining Multiple

Measures

• Experiments

• Conclusion and Future Work

Page 10: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Overall Framework of the Proposed Model

Page 11: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Candidate Friend Set Generation

• Select users that are friends’ friends

– Rank the users by their common friends

– Extract the top k users in fr(u)

• Select the most popular k users in microblogs – Assumption: The celebrities are usually good candidate

friends

• Combine these two set together to form the final candidate friend set fc(u) that has 2k users

1

( ) ( ) ( )n

r i

i

f u f u f u

Page 12: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Overall Framework of the Proposed Model

Page 13: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

User Tag Similarity

• Challenges – User tag vectors are very

sparse

– Many OOV words for WordNet

• Build tag tree – hierarchical clustering

– Recalculate the similarity based on the tree

{tag1}

{tag5} {tag6}

{tag2}

{tag4}

{tag3}

{tag2,tag3}

{tag5,tag6}

{tag1,tag2,tag3}

{tag1,tag2,tag3,tag4}

{tag1,tag2,tag3,tag4,tag5,tag6} Depth 0

Depth 1

Depth 2

Depth 3

Depth 4

Page 14: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

User Tag Similarity

• Similarity between two tags

• Similarity between two users

{tag1}

{tag5} {tag6}

{tag2}

{tag4}

{tag3}

{tag2,tag3}

{tag5,tag6}

{tag1,tag2,tag3}

{tag1,tag2,tag3,tag4}

{tag1,tag2,tag3,tag4,tag5,tag6} Depth 0

Depth 1

Depth 2

Depth 3

Depth 4

, 1 ( , )

1( , )

2 /( ( ) ( 1))tt a b

k k SP a b

sim t td k d k

( ) ( )

1( , ) ( , )

| ( ) || ( ) |a i b j

ts i j tt a b

t T u t T ui j

sim u u sim t tT u T u

Page 15: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Overall Framework of the Proposed Model

Page 16: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Overall Framework of the Proposed Model

Page 17: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

User Geography Similarity

• Location Similarity – simct(ui,uj)=1, if the two users have the same location

– simct(ui,uj)=0, if the two user do not have same location

• Check-in Similarity – Divide the check-in information into 12 categories

– Represent check-in using a vector with 12 dimensions

– chk(u)={cp1,cp2,…,cp12}

– simchk(ui,uj)=cos(chk(ui),chk(uj))

• Geography similarity

( , ) ( , ) (1 ) ( , )loc i j ct i j chk i jsim u u sim u u sim u u

Page 18: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Overall Framework of the Proposed Model

Page 19: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

User Hot Topic Similarity

• The hot topic discussion that user takes part in could reflect his/her interests and hobbies

• If two users have discussed more hot topics in common, they will have bigger similarity

| ( ) ( ) |( , ) ( ( ), ( ))

| ( ) ( ) |

i j

tp i j i j

i j

TP u TP usim u u Jaccard TP u TP u

TP u TP u

Page 20: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Overall Framework of the Proposed Model

Page 21: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Unified User Similarity

• Tag, Geography, Hot topic information

• Rank the users in candidate friend set by sim(u,ui)

( , ) ( , ) ( , ) (1 ) ( , )i ts i loc i tp isim u u sim u u sim u u sim u u

Page 22: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Outline

• Motivation

• The Characteristics of Friend Relationship in

Microblogs

• Friend Recommendation by Combining Multiple

Measures

• Experiments

• Conclusion and Future Work

Page 23: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Experiment Setup

• We conduct the 5-fold cross validation on the crawled dataset

• We randomly partition user’s current friends and non-friends into 5 groups respectively

• We randomly put one group of friends and one group of non-friends together to form a subset of the crawled data

• For each run, four of the five subsets are used for training and the remaining one subset is used for testing

• Precision, Recall and F-Measure are used for evaluation

Page 24: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Parameter Tuning for Location Similarity

( , ) ( , ) (1 ) ( , )loc i j ct i j chk i jsim u u sim u u sim u u

Page 25: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Parameter Tuning for Location Similarity

( , ) ( , ) (1 ) ( , )loc i j ct i j chk i jsim u u sim u u sim u u

Page 26: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Parameter Tuning for Friend Recommendation

( , ) ( , ) ( , ) (1 ) ( , )i ts i loc i tp isim u u sim u u sim u u sim u u

Page 27: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Recommendation Results

Page 28: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Recommendation Results

Page 29: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Outline

• Motivation

• Related Work

• Learning Sentiment Lexicon from Massive

Microblog Data

• Sentiment Lexicon Optimization

• Experiments

• Conclusion and Future Work

Page 30: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Conclusion and Future Work

• The friend relationship in microblogs are quite different from other traditional social media – More casual/Unidirectional friendship

– Tags, Locations, Check-ins, Hot topics are good indicators for friend recommendation

• A linearly combination model of multiple measurements are proposed for calculating similarity in microblogs

• Future work – More measures, such as time factors

– New similarity measurement

Page 31: A Unified Microblog User Similarity Model for Online ...tcci.ccf.org.cn/conference/2014/ppts/nlpcc/ppt98.pdf · –Propose a similarity model by linearly combining multiple measures

NLPCC 2014

Thank you for your attention!