
Plans on “Latent Topic Model”

Upload: denis-west

Posted on 14-Jan-2016


TRANSCRIPT

Page 1: Plans on “Latent Topic Model”. High-Level Architecture Users Ads User Encoding eCTR / FB Prediction User Clustering User Encoding Prediction

Plans on “Latent Topic Model”

Page 2:

High-Level Architecture

[Diagram: Users and Ads flow into User Encoding; User Encoding feeds eCTR / FB Prediction and User Clustering, leading to Prediction.]

Page 3:

Existing Pipeline

• Encoding
  – Auto-encoder for dimension reduction
  – Political affiliation clustering
  – Output: Hive table (user id + low-dim representation)

• eCTR prediction
  – Optional: user clustering stage
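The encoding stage above can be illustrated with a toy sketch. All sizes and names here are illustrative, and a tied-weight *linear* auto-encoder stands in for the production model, since its optimum has a closed form (the top-k right singular vectors of the data, i.e., PCA), which keeps the sketch short:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the user features: 200 "users" with 50 raw features
# generated from a rank-5 latent structure plus a little noise.
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(200, 50))
X -= X.mean(axis=0)

# Tied-weight linear auto-encoder: the optimal encoder is spanned by the
# top-k right singular vectors of X, so we use the SVD directly instead
# of gradient descent for this sketch.
k = 5
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k].T                 # d x k encoder
Z = X @ W                    # n x k low-dim user representation
X_hat = Z @ W.T              # reconstruction

# The pipeline would then write (user_id, Z[i]) rows to the Hive table.
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Since the data is (nearly) rank 5, the 5-dimensional representation reconstructs it almost exactly.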

Page 4:

Approaches to use encoding in eCTR prediction
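One simple approach, shown here as an illustrative sketch rather than the deck's committed design: concatenate the low-dimensional user encoding with ad features and fit a logistic-regression eCTR model on click labels. All data and sizes below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-ins: user encodings (as produced by the pipeline's
# encoding stage) plus ad features, with clicks drawn from a logistic model.
n, k_user, k_ad = 1000, 8, 4
U = rng.normal(size=(n, k_user))        # low-dim user encodings
A = rng.normal(size=(n, k_ad))          # ad features
X = np.hstack([U, A])
w_true = rng.normal(size=X.shape[1])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

# Logistic regression for eCTR via plain gradient descent.
w = np.zeros(X.shape[1])
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

ectr = 1 / (1 + np.exp(-X @ w))         # predicted click probabilities
```

The learned weights recover the direction of the generating weights, and `ectr` gives per-impression click probabilities that a ranking stage could consume.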

Page 5:

Social Networks

Page 6:

Information on a social network

• Social graph
  – Friendship networks
  – User-ads network …
• Text
  – News feed
  – Messages
  – Ads text …
• Images
  – Album
  – Random posts
  – Ads figures …
• Demographics
  – Age, occupation …

Challenges for using this data in eCTR:
• Very high-dimensional
• Non-independent
• Insufficient training data (this is true even if we use the whole web)
• Hard to optimize and interpret

Page 7:

Essentials of a good user-ads representation

• Distilling all local attribute semantics
  – Social roles
  – Topical contents
  – Ideology/sentiment
• Capture relational information
  – Long-range indirect influence
  – Social environments and contexts
• Capture dynamic trends
  – e.g., change of strength of interest
  – New/dying interests
• Discriminative
  – Optimize against a well-defined predictive task rather than vague intermediate goals such as clustering
• Low-dimensional and (perhaps) interpretable

Page 8:

Example:

Page 9:

Proposed Models

Page 10:

Dynamic tomography

• How to model dynamics in a simplex?

[Figures: projecting an individual/stock in a network into a "tomographic" space, and the trajectory of an individual/stock through that space]
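One standard answer to the question above, offered here as a hedged sketch rather than necessarily the model the deck proposes, is the logistic-normal construction used in dynamic and correlated topic models: run a Gaussian random walk in unconstrained space and map each state onto the simplex through a softmax. All dimensions and noise scales below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    """Map an unconstrained vector onto the probability simplex."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Gaussian random walk in R^k, squashed onto the simplex at each step:
# each row of `trajectory` is a role/topic mixture that drifts over time.
k, T, sigma = 4, 10, 0.5
eta = rng.normal(size=k)                     # initial unconstrained state
trajectory = []
for t in range(T):
    eta = eta + sigma * rng.normal(size=k)   # random-walk dynamics
    trajectory.append(softmax(eta))          # point on the simplex

trajectory = np.array(trajectory)            # T x k, rows sum to 1
```

Because the dynamics live in the unconstrained space, the simplex constraint is satisfied by construction at every step, which is exactly what makes this parameterization convenient for modeling trajectories like the ones on the next slide.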

Page 11:

Senate Network: role trajectories

Jon Corzine's seat (#28, Democrat, New Jersey) was taken over by Bob Menendez from t=5 onwards. Corzine was especially left-wing, so much so that his views did not align with the majority of Democrats (t=1 to 4). Once Menendez took over, the latent space vector for senator #28 shifted towards role 4, corresponding to the main Democratic voting clique.

Ben Nelson (#75) is a right-wing Democrat (Nebraska) whose views are more consistent with the Republican party. Observe that as the 109th Congress proceeds into 2006, Nelson's latent space vector includes more of role 3, corresponding to the main Republican voting clique. This coincides with Nelson's re-election as the Senator from Nebraska in late 2006, during which a high proportion of Republicans voted for him.

Page 12:

Visualization

Page 13:

Visualization

Page 14:

Algorithm Details

Page 15:

Data

Page 16:

Learning System

Given: a network of users/documents

1. Perform the E-step (Gibbs sampling) in parallel; collect sufficient statistics
2. Perform the M-step in parallel
3. Repeat until convergence

[Diagram: a single program replicated across workers, each holding the model parameters α, β, η, μ and sampling its shard's topic assignments z]
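The E-step/M-step loop on this slide can be sketched on a toy model. Below, stochastic EM for a two-component 1-D Gaussian mixture stands in for the real sampler: the E-step samples assignments z (a stand-in for Gibbs sampling) and reduces them to sufficient statistics, which are exactly the quantities a parallel implementation would aggregate across workers, and the M-step is a closed-form update from those statistics. All values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: two well-separated clusters at 0 and 10.
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(10, 1, 500)])
mu = np.array([x.min(), x.max()])    # crude initialization of the means

for it in range(50):
    # E-step: sample z_i given current parameters. This per-item work is
    # embarrassingly parallel; each worker would handle a shard of x.
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = (rng.random(len(x)) < p[:, 1]).astype(int)

    # Sufficient statistics: per-component counts and sums. In the
    # parallel setting these are what workers send back for aggregation.
    counts = np.bincount(z, minlength=2)
    sums = np.bincount(z, weights=x, minlength=2)

    # M-step: closed-form update of the means from the aggregated stats.
    mu = sums / np.maximum(counts, 1)

mu = np.sort(mu)
```

The key structural point is that only the small sufficient-statistics vectors cross machine boundaries, not the per-item assignments, which is what makes the "single program, parallel E-step" layout scale.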

Page 17:

Project Plans and Milestones

• Scalable implementation of baseline user text model (M1)

• Discriminative M1

• M1 + network model (M2)

• M2 + history + time (M3)

• Parallel work on downstream utility
  – eCTR prediction
  – Visualization
  – User/ads clustering

Page 18:

Resources

• CMU:
  – First intern, Keisuke, will come in mid-October, implementing M1
  – Second intern, Qirong Hu, will come in late December, implementing M2 and M3

• FB:
  – Rajat Raina
  – Rong Yang
  – System support