Multiple Domain User Personalization



Page 1: Multiple Domain User Personalization

Yucheng Low Multiple Domain User Personalization

Multiple Domain User Personalization

Deepak Agarwal, Yahoo! Research

Yucheng Low, Carnegie Mellon University

Alexander J. Smola, Yahoo! Research

Page 2: Multiple Domain User Personalization


Information Flood

Page 3: Multiple Domain User Personalization


Personalization

[Figure: a golf reader and a tech reader]

Can we provide personalization to new users?

Page 4: Multiple Domain User Personalization


One Domain Cold-Start

[Figure: Movies; User 1, User 2]

Personalizing a new user is impossible when you have only one domain. The best you can do is to have a good baseline.

Page 5: Multiple Domain User Personalization


Multiple Domains Cold-Start

[Figure: Movies, News, Music]

Possible when you have many domains: a user who is new to one domain may already have history in another.

Page 6: Multiple Domain User Personalization


Personalization across all domains

User: reads Golf News, watches MTV.

Combine tokens from all spaces, ignoring the source domain: Golf, Tiger, Music, Song

Expand the token space to include the source domain: Golf:1, Tiger:1, Music:2, Song:2

Either token set is then fed to your favorite personalization algorithm.
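The two naive ways of merging domains can be sketched in a few lines. The domain keys, the numeric domain ids and the function names are illustrative assumptions built from the slide's example user, not part of the authors' system.

```python
from collections import Counter

# Hypothetical per-domain token streams for the slide's example user.
user_tokens = {
    "golf_news": ["Golf", "Tiger"],  # domain 1: reads Golf News
    "mtv": ["Music", "Song"],        # domain 2: watches MTV
}
domain_ids = {"golf_news": 1, "mtv": 2}


def pool_ignoring_domain(tokens_by_domain):
    """Merge all tokens into one bag; the source domain is lost."""
    bag = Counter()
    for tokens in tokens_by_domain.values():
        bag.update(tokens)
    return bag


def pool_with_domain_tag(tokens_by_domain, ids):
    """Tag every token with its domain id (e.g. 'Golf:1') before merging."""
    bag = Counter()
    for domain, tokens in tokens_by_domain.items():
        bag.update(f"{t}:{ids[domain]}" for t in tokens)
    return bag


print(sorted(pool_ignoring_domain(user_tokens)))
# ['Golf', 'Music', 'Song', 'Tiger']
print(sorted(pool_with_domain_tag(user_tokens, domain_ids)))
# ['Golf:1', 'Music:2', 'Song:2', 'Tiger:1']
```

Either bag can be handed to any single-domain model, but in both cases every domain's tokens compete for the same profile.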

Page 7: Multiple Domain User Personalization


Personalization across all domains

User: reads Golf News, watches MTV.

Combine tokens from all spaces, ignoring the source domain: Golf, Tiger, Music, Song

Expand the token space to include the source domain: Golf:1, Tiger:1, Music:2, Song:2

Either token set is then fed to your favorite personalization algorithm.

Domains with more observations will swamp out all other domains.

What is a good personalization algorithm that will work for all domains?
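A quick calculation shows the swamping effect. The counts are invented for illustration: if a user's pooled bag holds 990 News tokens and 10 Music tokens, a profile built from normalized token frequencies gives Music only 1% of the mass.

```python
def domain_shares(counts):
    """Fraction of a pooled bag of words contributed by each domain."""
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


# Hypothetical token counts for one user (not from the paper's data).
shares = domain_shares({"news": 990, "music": 10})
print(shares)  # {'news': 0.99, 'music': 0.01}
```

A model fit to this pooled bag spends almost all of its capacity explaining News, so the Music part of the profile is barely estimated.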

Page 8: Multiple Domain User Personalization


Solution: Meta-Profile

[Diagram: a User Meta-Profile linked to a User Music Profile and a User News Profile, which produce Personalized Music and Personalized News]

Isolates each domain: prevents larger domains from swamping out smaller domains.

Page 9: Multiple Domain User Personalization


Solution: Meta-Profile

[Diagram: a User Meta-Profile linked to a User Music Profile, a User News Profile and a User Movie Profile]

Extensible: domains can be added or removed easily.
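One way to read "extensible" is as a registry in which each domain owns its own parameters while every user keeps one shared meta-profile. The sketch assumes each domain is represented by a latent matrix with one row per topic and one column per meta-profile dimension; the class and method names, the dimensions and the random initialization are hypothetical.

```python
import random


class MetaProfileModel:
    """Shared user meta-profiles plus an independent parameter block per domain."""

    def __init__(self, meta_dim, seed=0):
        self.meta_dim = meta_dim
        self.rng = random.Random(seed)
        self.user_meta = {}  # user id -> meta-profile vector of length meta_dim
        self.domains = {}    # domain name -> latent matrix (num_topics x meta_dim)

    def add_user(self, user):
        self.user_meta[user] = [self.rng.gauss(0.0, 1.0) for _ in range(self.meta_dim)]

    def add_domain(self, name, num_topics):
        # A new domain only adds its own matrix; other domains are untouched.
        self.domains[name] = [
            [self.rng.gauss(0.0, 0.1) for _ in range(self.meta_dim)]
            for _ in range(num_topics)
        ]

    def remove_domain(self, name):
        # Dropping a domain leaves user meta-profiles and other domains intact.
        self.domains.pop(name, None)


model = MetaProfileModel(meta_dim=4)
model.add_user("u1")
model.add_domain("music", num_topics=5)
model.add_domain("news", num_topics=8)
model.remove_domain("music")
print(sorted(model.domains), len(model.domains["news"]))  # ['news'] 8
```

Domains may use different numbers of topics because only the meta-profile dimension has to be shared.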

Page 10: Multiple Domain User Personalization


Latent Dirichlet Allocation

Topic 1: Basketball, NBA, hoop, Train, 3-point
Topic 2: Golf, Tiger, Woods, Club, Green, Hole-in-one
Topic 3: Machine, Learning, Neural, Network, Train

Document: "Michael I. Jordan trains a Neural Network to play golf"

[Figure: bar chart of the document's weights on Topic 1, Topic 2 and Topic 3; example word assignments: Golf → Topic 2, Network → Topic 3]

Page 11: Multiple Domain User Personalization


Latent Dirichlet Allocation

[Plate diagram: N, Document]

1. Each document has a mixture over topics.
2. For each word in each document:
   a) Draw a topic.
   b) Draw a word from the topic.

A document is a bag of words. A topic is a distribution over words.
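The generative process above can be sketched with the Python standard library alone. The vocabulary, topic-word tables and the symmetric `alpha` are toy assumptions; the Dirichlet draw is built by normalizing independent Gamma samples, which is a standard construction.

```python
import random


def dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def generate_document(topics, vocab, alpha, num_words, rng):
    """LDA generative story for a single document."""
    theta = dirichlet(alpha, rng)  # 1. the document's mixture over topics
    words, assignments = [], []
    for _ in range(num_words):     # 2. for each word in the document
        k = rng.choices(range(len(topics)), weights=theta)[0]  # a) draw a topic
        w = rng.choices(vocab, weights=topics[k])[0]           # b) draw a word from it
        assignments.append(k)
        words.append(w)
    return theta, assignments, words


vocab = ["basketball", "nba", "golf", "tiger", "neural", "network"]
# Toy topic-word tables: each row is a distribution over vocab.
topics = [
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],  # Topic 1: basketball
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],  # Topic 2: golf
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],  # Topic 3: machine learning
]
rng = random.Random(42)
theta, z, doc = generate_document(topics, vocab, alpha=[0.5, 0.5, 0.5], num_words=10, rng=rng)
print([round(t, 2) for t in theta], doc)
```

Because each toy topic puts mass on only two words, every generated word can be traced back to the topic drawn for it.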

Page 12: Multiple Domain User Personalization


Latent Dirichlet Allocation

[Plate diagram: N, Document]

1. Each document has a mixture over topics.
2. For each word in each document:
   a) Draw a topic.
   b) Draw a word from the topic.

A document is a bag of words. A topic is a distribution over words.

[Figure: Document]

Page 13: Multiple Domain User Personalization


Latent Dirichlet Allocation

[Plate diagram: N, Document]

1. Each document has a mixture over topics.
2. For each word in each document:
   a) Draw a topic.
   b) Draw a word from the topic.

A document is a bag of words. A topic is a distribution over words.

[Figure: Document, labeled "Sample From:"]

Page 14: Multiple Domain User Personalization


Latent Dirichlet Allocation

[Plate diagram: N, Document]

1. Each document has a mixture over topics.
2. For each word in each document:
   a) Draw a topic.
   b) Draw a word from the topic.

A document is a bag of words. A topic is a distribution over words.

Topic 1: Basketball, Michael, Jordan
Topic 2: Golf, Tiger, Woods, Club, Green
Topic 3: Machine, Learning, Neural

Page 15: Multiple Domain User Personalization


Latent Dirichlet Allocation

[Plate diagram: N, Document]

1. Each document has a mixture over topics.
2. For each word in each document:
   a) Draw a topic.
   b) Draw a word from the topic.

A document is a bag of words. A topic is a distribution over words.

[Figure labels: topics which make up each document; words which make up each topic]

Page 16: Multiple Domain User Personalization


Single Domain Personalization

[Plate diagram: N, User]

1. Each user has a mixture over topics.
2. For each word in the user's interactions:
   a) Draw a topic.
   b) Draw a word from the topic.

A user's interaction with a domain is a bag of words. A topic is a distribution over words.

[Figure labels: topics each user is interested in; words which make up each topic]
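With a user's topic mixture and the topic-word table in hand, the model's predicted word distribution for that user is the mixture of the topic distributions, p(w | u) = Σ_k θ_{u,k} φ_{k,w}. The numbers below are toy values chosen for illustration.

```python
def predict_words(theta_u, phi):
    """p(w | u) = sum_k theta_u[k] * phi[k][w] for every word index w."""
    num_words = len(phi[0])
    return [
        sum(theta_u[k] * phi[k][w] for k in range(len(phi)))
        for w in range(num_words)
    ]


phi = [
    [0.7, 0.3, 0.0],  # topic 0 over a 3-word vocabulary
    [0.0, 0.2, 0.8],  # topic 1
]
theta_u = [0.25, 0.75]  # hypothetical interests of one user
print(predict_words(theta_u, phi))
```

Since θ_u and each row of φ sum to one, the prediction is itself a distribution over the vocabulary.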

Page 17: Multiple Domain User Personalization


Multiple Domain Personalization

[Plate diagram: N, user u's interaction with domain d, User]

A user's interaction with a domain is a bag of words. A topic is a distribution over words.

Each user has a meta-profile:

Each domain has a latent matrix:

A user's prior interest in a domain is

Page 18: Multiple Domain User Personalization


Solution: Meta-Profile

[Diagram: a User Meta-Profile linked to a User Music Profile, a User News Profile and a User Movie Profile]

Page 19: Multiple Domain User Personalization


[Diagram: Users connected to the Music, News and Movies domains; each domain has its own topic->word table]
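The slides state that each user has a meta-profile and each domain has a latent matrix and its own topic->word table, but the formula for a user's prior interest in a domain is missing from this transcript. The sketch below is therefore only an illustration under an explicit assumption: the domain's latent matrix maps the meta-profile to one score per topic, and a softmax turns those scores into prior topic weights. This is not claimed to be the paper's definition; all matrices and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def domain_prior(A_d, meta_profile):
    """ASSUMED link: prior topic weights for domain d = softmax(A_d @ meta_profile)."""
    scores = [sum(a * m for a, m in zip(row, meta_profile)) for row in A_d]
    return softmax(scores)


meta_profile = [1.0, -0.5]                      # hypothetical user meta-profile
A_music = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # hypothetical 3-topic music domain
A_news = [[1.0, 1.0], [-1.0, 0.0]]              # hypothetical 2-topic news domain
print(domain_prior(A_music, meta_profile))
print(domain_prior(A_news, meta_profile))
```

Under this assumption, the same meta-profile yields a prior for every domain, so a user with history only in News would still get a non-uniform Music prior once A_music has been learned.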

Page 20: Multiple Domain User Personalization


Gibbs Sampling

[Plate diagram: N, user u's interaction with domain p; LDA]

Page 21: Multiple Domain User Personalization


Gibbs Sampling

[Plate diagram: N, user u's interaction with domain p]

1: Sample, using the LDA sampler, while holding the other variables constant.
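Step 1 uses an LDA sampler. A standard choice is collapsed Gibbs sampling, which resamples one word's topic from p(z = k | rest) ∝ (n_{u,k} + α_k)(n_{k,w} + β) / (n_k + Vβ), with all counts excluding the current word. The slide does not name the sampled variable, so the sketch assumes it is the per-word topic assignment and that the user's prior `alpha` comes from the meta-profile; the count-table names are hypothetical.

```python
import random


def resample_token(w, old_k, n_uk, n_kw, n_k, alpha, beta, vocab_size, rng):
    """One collapsed Gibbs update for a single word w of one user in one domain.

    n_uk: this user's topic counts; n_kw: per-topic word counts (list of dicts);
    n_k: total words per topic; alpha: this user's prior topic weights.
    """
    # Remove the word's current assignment from the counts.
    n_uk[old_k] -= 1
    n_kw[old_k][w] -= 1
    n_k[old_k] -= 1
    # Unnormalized conditional probability of each topic.
    weights = [
        (n_uk[k] + alpha[k]) * (n_kw[k].get(w, 0) + beta) / (n_k[k] + vocab_size * beta)
        for k in range(len(alpha))
    ]
    new_k = rng.choices(range(len(alpha)), weights=weights)[0]
    # Add the word back under its newly sampled topic.
    n_uk[new_k] += 1
    n_kw[new_k][w] = n_kw[new_k].get(w, 0) + 1
    n_k[new_k] += 1
    return new_k


rng = random.Random(0)
n_uk = [1, 0]
n_kw = [{"golf": 1}, {"golf": 5, "tiger": 3}]
n_k = [1, 8]
new_k = resample_token("golf", 0, n_uk, n_kw, n_k, alpha=[0.1, 0.1], beta=0.01, vocab_size=2, rng=rng)
print(new_k, n_uk, n_k)
```

The update only moves one word between topics, so all count totals are preserved.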

Page 22: Multiple Domain User Personalization


Gibbs Sampling

[Plate diagram: N, user u's interaction with domain p]

1: Sample
2: Sample, using Langevin diffusion, while holding the other variables constant.
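Step 2 samples with Langevin diffusion. Below is a minimal sketch of the unadjusted Langevin update x ← x + (ε/2) ∇log p(x) + √ε ξ with ξ ~ N(0, 1). The target is a toy 1-D Gaussian whose gradient is known in closed form. In the model, the gradient would come from the log-posterior of the sampled variable, which the slide does not spell out (we assume a continuous parameter such as the user meta-profile).

```python
import math
import random


def langevin_chain(grad_log_p, x0, step, num_steps, rng):
    """Unadjusted Langevin dynamics: x <- x + (step/2) * grad log p(x) + sqrt(step) * noise."""
    x = x0
    samples = []
    for _ in range(num_steps):
        x = x + 0.5 * step * grad_log_p(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Toy target: N(mu=2, sigma=1), so grad log p(x) = -(x - mu).
mu = 2.0
rng = random.Random(0)
samples = langevin_chain(lambda x: -(x - mu), x0=0.0, step=0.1, num_steps=20000, rng=rng)
burned = samples[2000:]  # discard burn-in
print(sum(burned) / len(burned))  # should be near 2.0
```

The step size trades mixing speed against discretization bias; with step 0.1 the stationary variance is slightly above 1.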

Page 23: Multiple Domain User Personalization


Gibbs Sampling

[Plate diagram: N, user u's interaction with domain p]

1: Sample
2: Sample
3: Optimize, using LBFGS, while holding the other variables constant.
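Step 3 optimizes with LBFGS. To show what that optimizer does, here is a compact, self-contained L-BFGS (two-loop recursion with a backtracking Armijo line search) minimizing a toy quadratic. The objective only stands in for the model's actual one (assumed to be a negative log-posterior in the parameters being optimized), and the memory size `m` and tolerances are arbitrary choices; in practice a library implementation would be used.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs(f, grad, x0, m=5, max_iter=100, tol=1e-8):
    """Minimize f using L-BFGS (two-loop recursion) with a backtracking Armijo line search."""
    x = list(x0)
    g = grad(x)
    s_hist, y_hist = [], []  # last m parameter and gradient differences
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        # First loop: newest to oldest correction pair.
        q = list(g)
        saved = []
        for s, y in reversed(list(zip(s_hist, y_hist))):
            rho = 1.0 / dot(y, s)
            a = rho * dot(s, q)
            q = [qi - a * yi for qi, yi in zip(q, y)]
            saved.append((rho, a, s, y))
        # Initial Hessian scaling gamma = s'y / y'y from the newest pair.
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        # Second loop: oldest to newest correction pair.
        for rho, a, s, y in reversed(saved):
            b = rho * dot(y, r)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # search direction -H g
        # Backtrack until the Armijo sufficient-decrease condition holds.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-12 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s_new = [xn - xo for xn, xo in zip(x_new, x)]
        y_new = [gn - go for gn, go in zip(g_new, g)]
        if dot(s_new, y_new) > 1e-12:  # keep only pairs with positive curvature
            s_hist.append(s_new)
            y_hist.append(y_new)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_star = lbfgs(f, grad, [0.0, 0.0])
print(x_star)  # approximately [1.0, -2.0]
```

L-BFGS stores only `m` vector pairs instead of a full Hessian, which is what makes it practical for large parameter matrices.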

Page 24: Multiple Domain User Personalization


Experiments

Page 25: Multiple Domain User Personalization


Experiments @ Yahoo!

2-domain dataset: Frontpage and News clicks of 5.6 million users. Frontpage/News: article text for each click.

3-domain dataset: Frontpage, News and MyYahoo clicks of 5.6 million users. MyYahoo: only article IDs for each click, with no text, so its tokens are not semantically meaningful.

All user information was anonymized.

Page 26: Multiple Domain User Personalization


Test Protocol

Hold out a proportion of the users who see more than one domain. For each held-out user, hide one of those domains and try to predict its words.

The prediction metric is cosine similarity. The baseline is “mean prediction”.
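The evaluation metric can be sketched on sparse word-count vectors. We assume "mean prediction" means predicting, for every held-out user, the average normalized word distribution of the training users in the hidden domain; the toy bags and the model output are invented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0


def mean_prediction(train_bags):
    """Baseline (assumed): average of the training users' normalized word distributions."""
    total = Counter()
    for bag in train_bags:
        n = sum(bag.values())
        for w, c in bag.items():
            total[w] += c / n / len(train_bags)
    return dict(total)


train = [Counter(golf=3, tiger=1), Counter(nba=2, golf=2)]  # toy training users
held_out = Counter(golf=2, tiger=2)                          # hidden domain of a test user
model_pred = {"golf": 0.5, "tiger": 0.4, "nba": 0.1}         # hypothetical model output
print(cosine(model_pred, held_out), cosine(mean_prediction(train), held_out))
```

Cosine similarity ignores the overall scale of the vectors, so a predicted distribution can be compared directly with raw held-out counts.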

Page 27: Multiple Domain User Personalization


Implementation

Distributed implementation in C++ using Memcached for communication.

Alex Smola, Shravan Narayanamurthy, “An Architecture for Parallel Topic Models”, VLDB 2010.

Distributed LBFGS line search: implement standard MPI-like primitives (Broadcast, Reduce, Barrier) in Memcached.

Takes 2-3 days for 500 iterations on 30 machines.
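The three MPI-like primitives can be imitated in a single process to show how they fit together. In this sketch, threads play the machines and a shared dictionary stands in for the key-value store; it is not the authors' Memcached code, and the key names, worker count and toy objective are assumptions. The workers use Broadcast, Reduce and Barrier to compute one data-parallel gradient, the kind of evaluation a distributed line search needs.

```python
import threading

NUM_WORKERS = 4
store = {}  # stand-in for a shared key-value store such as Memcached
lock = threading.Lock()
barrier = threading.Barrier(NUM_WORKERS)


def broadcast(key, value, rank, root=0):
    """Root writes the value; every worker reads it after a barrier."""
    if rank == root:
        with lock:
            store[key] = value
    barrier.wait()
    return store[key]


def reduce_sum(key, value, rank):
    """Each worker writes its partial result; after a barrier, all read the sum."""
    with lock:
        store[f"{key}/{rank}"] = value
    barrier.wait()
    return sum(store[f"{key}/{r}"] for r in range(NUM_WORKERS))


# Toy objective: f(w) = sum_i (w - x_i)^2, with the data sharded across workers.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
results = [None] * NUM_WORKERS


def worker(rank):
    w = broadcast("w", 2.5 if rank == 0 else None, rank)    # current parameter
    shard = data[rank::NUM_WORKERS]
    partial_grad = sum(2.0 * (w - x) for x in shard)
    results[rank] = reduce_sum("grad", partial_grad, rank)  # global gradient
    barrier.wait()  # keep workers in lockstep before the keys are reused


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [-32.0, -32.0, -32.0, -32.0]
```

Every worker ends up with the same global gradient, so all of them can take identical line-search decisions without a central coordinator.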

Page 28: Multiple Domain User Personalization


2 Property Sanity Check

Page 29: Multiple Domain User Personalization


2 Property

Page 30: Multiple Domain User Personalization


3 Property

Page 31: Multiple Domain User Personalization


3 Property

Page 32: Multiple Domain User Personalization


Frontpage -> News

Celebrity: sandra, oscar, oscars, red, carpet, bullock, golden, gown, bullocks, nominee, bestactress, sparkles, stunning

Entertainment: vienna, bachelor, jake, pavelka, giraldi, finale, show, stars, dancing, love, season, time, abc

Science: bacteria, fight, super, struggling, developed, doctors, resistant, lethal, virtually, drugs, antibiotic, competitors, chad

Science Fiction: film, movie, movies, films, director, story, avatar, james, time, hollywood, big, make, hes, star

Page 33: Multiple Domain User Personalization


News -> Frontpage

Devices: iphone, apple, app, apps, ipod, google, store, apples, android, mac, mobile, touch, ipad, device, phone

College: college, year, earn, years, 000, bestpaid, average, 129, colleges, graduates, ten, alums, schools, actor, likes

Politics: health, care, bill, obama, president, rep, house, republican, senate, news, sen, democrats, fox, congress, reform

drafts, player, nfl, scouts, team, riskiest, peril, bryant, dez, pick, talented, nba, james, news

home, bank, facing, ceo, gomez, eviction, rosalina, bought, cleaning, foreclosed, client, janitor, offices, surprising, video

captured, inside, mountain, terrorist, observers, impresses, alqaidas, complexity, base, features, hideout, size, special, secret, struck

Page 34: Multiple Domain User Personalization


Extension

[Diagram: a User Meta-Profile linked to a User Music Profile, a User News Profile and a User Movie Profile, each modeled with Latent Dirichlet Allocation]

Page 35: Multiple Domain User Personalization


Extension

[Diagram: a User Meta-Profile linked to a User Music Profile, a User News Profile and a User Movie Profile; the domain models shown are a linear model, matrix factorization, and fLDA (movies)]

Flexible: allows a different algorithm for each domain.

Page 36: Multiple Domain User Personalization


It Is How You Use It

[Diagram: a User Meta-Profile feeding a User Music Profile that is personalized with Algorithm X]

Use the meta-profile for initialization.

Page 37: Multiple Domain User Personalization


It Is How You Use It

[Diagram: a User Meta-Profile feeding a User Music Profile that is personalized with Algorithm X]

Periodically update the meta-profile and the domain latent matrix.

Page 38: Multiple Domain User Personalization


Conclusion

A generic, extensible model for combining domain personalization schemes.

A scalable inference procedure that extends to millions of users.

Strong predictive performance demonstrated on a large real-world dataset.

Page 39: Multiple Domain User Personalization


Questions?