
Dynamic Embeddings for User Profiling in Twitter

Shangsong Liang 1, Xiangliang Zhang 1, Zhaochun Ren 2, Evangelos Kanoulas 3

1 KAUST, Saudi Arabia
2 JD.com, China
3 University of Amsterdam, The Netherlands


Overview

• The Task, Background and Related Work
• Our Method
  • Dynamic User and Word Embedding Model (DUWE)
  • Streaming Keyword Diversification Model (SKDM)
• Experiments
• Conclusion


The Task

Input: A stream of tweets generated over time

Output: A set of keywords that profile the user at different points in time

[Figure: Twitter users and their tweets over time. Labels: "Given a user at time t"; example interest keywords "Sport" and "Food".]


The profile keywords should be:
• Relevant
• Diversified
• Dynamic


Background of the User Profiling Problem

• Expert finding task at the TREC 2005 Enterprise track
  • Given documents that describe expert candidates, answer a query with a ranked list of names in a specific domain
  ☛ uncovering associations between people and topics
• A generative language modeling approach in Balog et al. (2007)
  • Works on a static document collection
  • Assumes users’ profiles do not change over time

Need Dynamic User Profiling


Dynamic User Profiling Approaches

• ExperTime (Rybak et al. 2014)

• A probabilistic model for learning how personal research interests evolve (Fang and Godavarthy 2014)


Limitations of Current User Profiling Methods

• Treat words as atomic units, leading to a vocabulary mismatch that harms performance

• Represent words and users in disjoint vocabulary spaces, making it difficult to measure the similarity between users and words when constructing the profile

Can words and users be embedded in the same semantic space?

Can their embeddings be modeled in a dynamic environment?


Related Work in Dynamic Topic Models and Dynamic Embedding

• Dynamic Topic Models: modeling dynamic user interests
  • Topic over time model (Wang et al. KDD 2006)
  • Topic tracking model (Iwata et al. IJCAI 2009)
  • Dynamic user clustering topic model (Liang et al. KDD 2016), etc.
  • None of them is for user profiling

• Dynamic Word Embedding
  • Separate the data into time bins and apply word2vec within each bin (Kim et al. 2014, Hamilton et al. 2016)
  • Or build on a Bayesian skip-gram model (Bamler and Mandt, 2017)
  • All of them embed words only, not users
  • None of them is for user profiling
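
The time-bin approach mentioned above can be illustrated with a minimal sketch of its first step: grouping a timestamped tweet stream into bins. Monthly bins, the `bin_by_month` helper and the sample tweets are assumptions made for illustration only. A static model such as word2vec would then be trained separately on each bin, which is not shown here.

```python
from collections import defaultdict
from datetime import datetime

def bin_by_month(tweets):
    """Group (timestamp, text) pairs into chronologically ordered monthly bins."""
    bins = defaultdict(list)
    for timestamp, text in tweets:
        bins[(timestamp.year, timestamp.month)].append(text)
    # Sort the keys so the bins can be processed in time order.
    return dict(sorted(bins.items()))

# Hypothetical sample data, not taken from the dataset used in the talk.
tweets = [
    (datetime(2015, 4, 3), "great football match tonight"),
    (datetime(2015, 5, 20), "trying a new pasta recipe"),
    (datetime(2015, 4, 28), "gym session before work"),
]
monthly = bin_by_month(tweets)
```

Because each bin is embedded independently in this scheme, vectors from different bins generally need an extra alignment step before they can be compared across time.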




Our Approach

• Dynamic User and Word Embedding Model (DUWE)
  • Infers both users’ and words’ embeddings over time in the same semantic space
  • Makes it possible to measure the similarity between users’ and words’ embeddings

• Streaming Keyword Diversification Model (SKDM)
  • Retrieves relevant keywords to profile users’ current interests over time
  • Diversifies the returned keywords so that they cover all aspects of the users’ interests
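
Once users and words share one embedding space, candidate keywords for a user can be scored directly against that user's vector. The sketch below assumes cosine similarity as the measure and uses made-up two-dimensional vectors; `cosine`, `top_k_words` and the vocabulary are hypothetical names for illustration, not part of the presented models.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_words(user_vec, word_vecs, k=3):
    """Return the k (word, score) pairs most similar to the user embedding."""
    scored = [(word, cosine(user_vec, vec)) for word, vec in word_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings at a single time step (assumed values).
word_vecs = {
    "football": [1.0, 0.0],
    "recipe": [0.0, 1.0],
    "gym": [0.8, 0.2],
    "exam": [-1.0, 0.0],
}
user_vec = [1.0, 0.1]
ranked = top_k_words(user_vec, word_vecs, k=2)
```

With these toy vectors the two highest-scoring words are "football" and "gym". Ranking by relevance alone does nothing to avoid near-duplicate keywords, which is the gap a diversification step is meant to close.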


Dynamic User and Word Embedding


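[Figure: graphical model of DUWE over two consecutive time steps, t−1 and t. Variables shown at each step: v, z, β, u, y, α, together with the observations n⁺ and m⁺; plate sizes V, V, |U_{t−1}| and |U_t|. Callouts: "Word representation at t−1", "User representation at t", "Observed co-occurrence of words at t−1", "Observed user-word pairs at t−1".]

User Diffusion

$$p(U_t \mid U_{t-1}) \propto \mathcal{N}(U_{t-1}, \boldsymbol{\alpha}_{t-1}^2 I) \cdot \mathcal{N}(0, \boldsymbol{\alpha}_0^2 I)$$

Word Diffusion

$$p(V_t \mid V_{t-1}) \propto \mathcal{N}(V_{t-1}, \boldsymbol{\beta}_{t-1}^2 I) \cdot \mathcal{N}(0, \boldsymbol{\beta}_0^2 I)$$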


Diffusion of user representation

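According to Kalman filtering, we define the variance of the transition kernel for a user embedding from t−1 to t. [The variance formula is not preserved in this transcript.]

$$p(U_t \mid U_{t-1}) \propto \mathcal{N}(U_{t-1}, \boldsymbol{\alpha}_{t-1}^2 I) \cdot \mathcal{N}(0, \boldsymbol{\alpha}_0^2 I)$$

Slide callouts: "Gaussian prior" (on the transition kernel), and a term in the variance definition "measuring the word distribution changes from previous time step t−1 to the current time step t for user u".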


Diffusion of word representation

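According to Kalman filtering, we define the variance of the transition kernel for a word embedding from t−1 to t. [The variance formula is not preserved in this transcript.]

$$p(V_t \mid V_{t-1}) \propto \mathcal{N}(V_{t-1}, \boldsymbol{\beta}_{t-1}^2 I) \cdot \mathcal{N}(0, \boldsymbol{\beta}_0^2 I)$$

Slide callouts: "Gaussian prior" (on the transition kernel), and a term in the variance definition "measuring the word distribution changes from t−1 to the current time step t".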
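
Read as a density over the new embedding, the product of the two Gaussian factors in each diffusion kernel is again Gaussian: with a transition variance σ_t² and a prior variance σ_0², its mean is the previous embedding scaled by σ_0² / (σ_t² + σ_0²) and its variance is σ_t² σ_0² / (σ_t² + σ_0²). The sketch below samples one step under that reading. It assumes a single scalar variance per step rather than the model's own variance definition, which is not preserved here; `kernel_params` and `diffusion_step` are hypothetical helper names.

```python
import math
import random

def kernel_params(sigma_t, sigma_0):
    """Mean scaling and standard deviation of N(prev, sigma_t^2 I) * N(0, sigma_0^2 I)."""
    var_t, var_0 = sigma_t ** 2, sigma_0 ** 2
    shrink = var_0 / (var_t + var_0)              # pulls the mean towards zero
    std = math.sqrt(var_t * var_0 / (var_t + var_0))
    return shrink, std

def diffusion_step(prev, sigma_t, sigma_0, rng):
    """Sample an embedding at time t given its value at time t-1."""
    shrink, std = kernel_params(sigma_t, sigma_0)
    return [rng.gauss(shrink * x, std) for x in prev]

rng = random.Random(0)              # fixed seed so the sketch is reproducible
u_prev = [0.5, -1.0]                # assumed embedding at t-1
u_next = diffusion_step(u_prev, sigma_t=0.1, sigma_0=1.0, rng=rng)
```

A small transition variance keeps the new embedding close to the previous one, while the zero-mean prior factor stops embeddings from drifting without bound over many steps. The same step applies to user embeddings (with α) and word embeddings (with β).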


DUWE model inference

• Apply skip-gram filtering for inference (Bamler and Mandt, 2017), together with a variational inference algorithm, to obtain the embeddings

• Posterior distribution over the embeddings, conditional on the observed statistics. [The symbols and the formula itself are not preserved in this transcript.] The formula combines:
  • a skip-gram model for words
  • a skip-gram model for users and words
  • a model transition for users
  • a model transition for words

• The observed statistics are:
  • positive and negative indicator matrices for all word-to-word pairs
  • positive and negative indicator matrices for all user-to-word pairs
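
To make the role of the positive and negative matrices concrete, here is a minimal sketch of a skip-gram log-likelihood of the kind used in Bayesian skip-gram models: each pair contributes its positive count times log σ(score) plus its negative count times log σ(−score), where the score is a dot product of two embeddings. This form is an assumption borrowed from that family of models; the exact DUWE posterior is not preserved in the transcript, and all names below are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skipgram_log_likelihood(left, right, pos, neg):
    """Sum over pairs (i, j) of
    pos[i][j] * log sigmoid(left_i . right_j) + neg[i][j] * log sigmoid(-left_i . right_j).
    """
    total = 0.0
    for i, l_vec in enumerate(left):
        for j, r_vec in enumerate(right):
            score = dot(l_vec, r_vec)
            total += pos[i][j] * log_sigmoid(score) + neg[i][j] * log_sigmoid(-score)
    return total

# Toy user-to-word example: one user, two words (assumed values).
users = [[1.0, 0.0]]
words = [[2.0, 0.0], [-2.0, 0.0]]
pos = [[3, 0]]   # the user used word 0 three times
neg = [[0, 3]]   # word 1 was drawn three times as a negative sample
ll_user_word = skipgram_log_likelihood(users, words, pos, neg)
```

The same function covers the word-to-word term by passing word embeddings on the left and context embeddings on the right. In a full model these likelihood terms would be added to the log transition terms for users and words before optimising.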


Streaming Keyword Diversification Model

• Generates the top-K relevant and diversified keywords for profiling users’ interests at time t.
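
The SKDM algorithm itself is not described in this transcript. As a stand-in, the sketch below uses Maximal Marginal Relevance (MMR), a generic greedy heuristic that trades relevance to the user against redundancy with keywords already chosen. It is not the authors' method; the `trade_off` parameter, the function names and the toy vectors are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_keywords(user_vec, word_vecs, k, trade_off=0.7):
    """Greedily pick k keywords, balancing relevance against redundancy."""
    selected = []
    remaining = set(word_vecs)
    while remaining and len(selected) < k:
        best_word, best_score = None, -math.inf
        for word in sorted(remaining):   # sorted for deterministic tie-breaking
            relevance = cosine(user_vec, word_vecs[word])
            # Redundancy: similarity to the closest keyword already selected.
            redundancy = max(
                (cosine(word_vecs[word], word_vecs[s]) for s in selected),
                default=0.0,
            )
            score = trade_off * relevance - (1.0 - trade_off) * redundancy
            if score > best_score:
                best_word, best_score = word, score
        selected.append(best_word)
        remaining.remove(best_word)
    return selected

# Toy embeddings (assumed): two near-synonyms and one word from another interest.
word_vecs = {"football": [1.0, 0.2], "soccer": [1.0, 0.25], "recipe": [0.1, 1.0]}
user_vec = [1.0, 1.0]
diverse = mmr_keywords(user_vec, word_vecs, k=2)
relevance_only = mmr_keywords(user_vec, word_vecs, k=2, trade_off=1.0)
```

With these toy vectors, relevance-only ranking returns the two near-synonyms "soccer" and "football", whereas the diversified selection returns "soccer" and "recipe", covering a second interest.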




Experimental Setup

• Datasets
  • 1,375 users randomly sampled from Twitter
  • 3.78 million tweets posted by the users from the beginning of their registrations up to May 31, 2015
  • Two types of ground truth: one for evaluating relevance-oriented (RGT) performance and another for evaluating diversity-oriented (DGT) performance

• Evaluation Metrics
  • Relevance: Pre (Precision), NDCG, MRR, MAP
  • Semantic versions of these metrics, denoted Pre-S, NDCG-S, MRR-S, MAP-S
  • Diversity: Pre-IA (Intent-Aware Precision), α-NDCG, MRR-IA, MAP-IA
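
For reference, the standard definitions of three of the relevance metrics can be sketched for a single user's ranked keyword list as follows. The cut-off k, the binary gains and the sample data are assumptions; the semantic (-S) and intent-aware (-IA, α-NDCG) variants used in the evaluation are not covered by this sketch.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k keywords that are in the ground truth."""
    return sum(1 for word in ranked[:k] if word in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant keyword, or 0.0 if none is retrieved."""
    for rank, word in enumerate(ranked, start=1):
        if word in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, gains, k):
    """Discounted cumulative gain at k, normalised by the ideal ordering."""
    dcg = sum(gains.get(word, 0.0) / math.log2(i + 2)
              for i, word in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(i + 2) for i, gain in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

# Hypothetical system output and ground truth for one user at one time step.
ranked = ["gym", "recipe", "football"]
relevant = {"football", "gym"}
gains = {"football": 1.0, "gym": 1.0}

p_at_2 = precision_at_k(ranked, relevant, k=2)
rr = reciprocal_rank(ranked, relevant)
ndcg = ndcg_at_k(ranked, gains, k=3)
```

MRR is the mean of the reciprocal rank over all users (and time steps), and MAP averages per-user average precision in the same way.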


Experimental Setup

• Baselines
  • Non-dynamic Embedding Models
    • Skip-Gram Model, i.e., word2vec Model (SGM)
    • Distributed Representations of Documents (DRD)
  • Dynamic Traditional Profiling Model
    • Predictive Language Model (PLM)
  • Dynamic Topic Model
    • User Clustering Topic model (UCT)
  • Dynamic Embedding Models
    • Dynamic Independent Skip-Gram model (DISG)
    • Dynamic Pre-initialized Skip-Gram model (DPSG)
    • Dynamic Independent Distributed Representations of documents (DIDR)
    • Dynamic Pre-initialized Distributed Representations of documents (DPDR)


Overall Performance

• Average relevance performance over monthly time periods


Overall Performance

• Diversity performance over monthly time periods


An Example User’s Dynamic Profiling Results over Time


Top-6 keywords in the dynamic profile of an example user whose interests cover several aspects and change dramatically over time: from sport, fitness, kitchen and exercise to education.


Relevance and diversity performance over time


[Figure: two panels, "Relevance performance over time" and "Diversity performance over time".]


Performance w.r.t. embedding dimensionality




Conclusions

• Study the problem of dynamic user profiling in Twitter

• Propose a Dynamic User and Word Embedding model (DUWE)

• Propose a Streaming Keyword Diversification Model (SKDM)

• Evaluate the performance of the proposed models on a real-world Twitter dataset


Thank you for your attention!

Our paper: http://www.kdd.org/kdd2018/accepted-papers/view/dynamic-embeddings-for-user-profiling-in-twitter

Lab of Machine Intelligence and kNowledge Engineering (MINE): http://mine.kaust.edu.sa/