Graphical Models for the Internet
Amr Ahmed and Alexander Smola
Yahoo Research, Santa Clara, CA
Thus far ...
• Motivation
• Basic tools
  – Clustering
  – Topic Models
• Distributed batch inference
  – Local and global states
  – Star synchronization
Up next
• Inference
  – Online Distributed Sampling
  – Single machine multi-threaded inference
  – Online EM and Submodular Selection
• Applications
  – User tracking for behavioral targeting
  – Content understanding
  – User modeling for content recommendation
4. Online Model
Scenarios
• Batch large-scale
  – Covered in part 1
• Mini-batches
  – We already have a model
  – Data arrives in batches
  – We would like to keep the model up to date
• Time-sensitive
  – Data arrives one item at a time
  – The model should be up to date
4.1 Dynamic Clustering
The Chinese Restaurant Process
• Allows the number of mixture components to grow with the data
• Such models are called non-parametric
  – Meaning the number of effective parameters grows with the data
  – They still have hyper-parameters that control the rate of growth
    • α: how fast a new cluster/mixture is born
    • G0: prior over mixture component parameters
The Chinese Restaurant Process
Generative Process
- For data point x_i
  - Choose an existing table j with probability ∝ m_j and sample x_i ~ f(φ_j)
  - Choose a new table K+1 with probability ∝ α
    - Sample φ_{K+1} ~ G0 and sample x_i ~ f(φ_{K+1})
[Figure: three tables with parameters φ1, φ2, φ3 and the seating probabilities for the next customer; the numeric labels were garbled in extraction]
The rich-get-richer effect
CANNOT handle sequential data
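The seating rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `crp_assign` and `simulate_crp` are hypothetical, `alpha` is assumed to be strictly positive, and the draw of the component parameters from G0 is only marked by a comment.

```python
import random

def crp_assign(counts, alpha, rng):
    """Return a table index for the next customer.

    counts[j] is m_j, the number of customers already at table j.
    Returning len(counts) means "open a new table", which happens
    with probability proportional to alpha (assumed > 0).
    """
    weights = list(counts) + [alpha]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def simulate_crp(n_customers, alpha, seed=0):
    """Seat n_customers one at a time and return the final table sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        j = crp_assign(counts, alpha, rng)
        if j == len(counts):
            counts.append(1)   # new table; phi_{K+1} ~ G0 would be drawn here
        else:
            counts[j] += 1     # rich get richer: large tables attract more customers
    return counts

table_sizes = simulate_crp(n_customers=100, alpha=1.0)
```

Larger values of `alpha` open new tables more often, so the number of clusters grows faster with the data.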
Recurrent CRP (RCRP) [Ahmed and Xing 2008]
• Adapts the number of mixture components over time
  – Mixture components can die out
  – New mixture components can be born at any time
  – The parameters of retained mixture components evolve according to Markovian dynamics
The Recurrent Chinese Restaurant Process
[Figure: tables φ1,1, φ2,1, φ3,1 at epoch T=1]
φ3,1: the dish eaten at table 3 at time epoch 1, OR the parameters of cluster 3 at time epoch 1
Generative Process
- Customers at time T=1 are seated as before:
  - Choose an existing table j with probability ∝ m_{j,1} and sample x_i ~ f(φ_{j,1})
  - Choose a new table K+1 with probability ∝ α
    - Sample φ_{K+1,1} ~ G0 and sample x_i ~ f(φ_{K+1,1})
The Recurrent Chinese Restaurant Process
[Figure sequence, slides 11-18: the tables of epoch T=1 (φ1,1, φ2,1, φ3,1) are carried over to epoch T=2, then to T=3; the probability labels in the figures were garbled in extraction]
- The tables of epoch T=1 act as a prior at T=2, with pseudo-counts m'1,1=2, m'2,1=3, m'3,1=1
- Each customer at T=2 chooses among the carried-over tables and a new table
- When a carried-over table is first chosen at T=2, its parameters evolve: sample φ1,2 ~ P(· | φ1,1)
- And so on ...
- At the end of epoch 2: φ4,2 is a newly born cluster, and cluster 3 (still φ3,1) has died out
- At T=3 the carried-over tables are φ1,2, φ2,2, φ4,2, with pseudo-counts m'1,2=2, m'2,2=2, m'4,2=1
Recurrent Chinese Restaurant Process
• Can be extended to model higher-order dependencies
• Can decay dependencies over time
  – The pseudo-count for table k at time t is
    $m'_{k,t} = \sum_{h=1}^{H} e^{-h/\rho}\, m_{k,t-h}$
    where H is the history size, ρ is the decay factor, and m_{k,t-h} is the number of customers sitting at table k at time epoch t-h
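As a rough sketch of how such decayed pseudo-counts could be computed, the snippet below sums the per-epoch table counts with exponentially decaying weights. The function name `pseudo_counts` and the list-of-dicts layout of `history` are illustrative assumptions; the example counts reuse the table sizes shown on the slides for epochs 1 and 2, while `H=2` and `rho=1.0` are arbitrary.

```python
import math

def pseudo_counts(history, H, rho):
    """Decayed pseudo-counts m'_{k,t} for the upcoming epoch t.

    history[-1] holds the counts of epoch t-1, history[-2] those of t-2,
    and so on. Each entry maps a table id k to m_{k,t-h}.
    """
    prior = {}
    for h in range(1, min(H, len(history)) + 1):
        weight = math.exp(-h / rho)          # older epochs count less
        for k, m in history[-h].items():
            prior[k] = prior.get(k, 0.0) + weight * m
    return prior

history = [
    {1: 2, 2: 3, 3: 1},   # epoch 1
    {1: 2, 2: 2, 4: 1},   # epoch 2 (table 3 unused, table 4 newly born)
]
prior_t3 = pseudo_counts(history, H=2, rho=1.0)
```

With a history size of 2, table 3 still carries a small prior mass at T=3; with `H=1` it would disappear from the prior entirely.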
[Figure: the tables at T=1, T=2, and T=3, illustrating the pseudo-count of table 2 at T=3, $m'_{2,3} = \sum_{h=1}^{H} e^{-h/\rho}\, m_{2,3-h}$]
4.2 Online Distributed Inference
Tracking Users' Interests
Characterizing User Interests
• Short-term vs long-term
[Figure: timeline (Jan, April, July, Oct) of one user's interests: Music, Housing, Furniture, Buying a car, Travel plans]
Characterizing User Interests
• Short-term vs long-term
• Latent
[Figure: the same timeline (Jan, April, July, Oct) showing only the observed words: millage, fast, mortgage, Gaga, seafood, Barcelona, used]
Problem formulation
Input
• Queries issued by the user or tags of watched content
• Snippet of page examined by user
• Time stamp of each action (day resolution)
Output
• Users’ daily distribution over interests
• Dynamic interest representation
• Online and scalable inference
• Language independent
[Figure: example user actions over time: "Flight, London, Hotel, weather"; "School, Supplies, Loan, semester"; "classes, registration, housing, rent"]
Problem formulation
[Figure: the example user actions ("Flight, London, Hotel, weather"; "School, Supplies, Loan, semester"; "classes, registration, housing, rent") labeled with the interests Travel, Back to school, and finance]
Problem formulation
[Figure sequence, slides 26-30: the same user actions and interest labels (Travel, Back to school, finance), used to pose two questions]
When to show a financing ad?
When to show a hotel ad?
Mixed-Membership Formulation
[Figure: objects linked to mixtures through degrees of membership]
Mixtures: Diet, Cars, Job, Finance
• Job: job, Career, Business, Assistant, Hiring, Part-time, Receptionist
• Cars: Car, Blue, Book, Kelley, Prices, Small, Speed, large
• Finance: Bank, Online, Credit, Card, debt, portfolio, Finance, Chase
• Diet: Recipe, Chocolate, Pizza, Food, Chicken, Milk, Butter, Powder
Degree of membership: the weight of each object on each mixture
Objects: word collections that mix several topics; the words shown are Job, Hiring, speed, price, part-time, Camry, Career, opening, bonus, package, card, diet, calories, loan, recipe, milk, Weight, lb, kg
In Graphical Notation
In Polya-Urn Representation
• Collapse multinomial variables
• Fixed-dimensional hierarchical Polya-urn representation
  – Chinese restaurant franchise
[Figure: Polya-urn view of the model, with four labeled parts]
• Global topic trends
• Topic word-distributions (four topics, with top words such as job/Career/Hiring, Car/Kelley/Prices, Bank/Credit/debt, Recipe/Pizza/Food)
• User-specific topic trends (mixing vector)
• User interactions: queries and keywords from pages viewed (e.g. "Food, Chicken" for one user and "Car, speed, offer, camry, accord, career" for another)
Generative Process
• For each user interaction
  • Choose an existing intent from the local distribution
    • Sample a word from the topic’s word-distribution
  • Or choose a new intent with probability ∝ λ
    • Sample a new intent from the global distribution
    • Sample a word from the new topic’s word-distribution
[Figure sequence, slides 36-40: words are generated one at a time against the four topic word-distributions. One user's list grows from "Food, Chicken" to "Food, Chicken, pizza" and then gains a further word (shown as "hiring" on slide 39 and "millage" on slide 40); a second user's list reads "Car, speed, offer, camry, accord, career"]
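The generative process above can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the model from the talk: the topic word lists are a hypothetical subset of the words on the slides with uniform probabilities, the global counts start at a pseudo-count of 1 per topic, and `lam` stands for the weight λ of choosing a new intent.

```python
import random

# Hypothetical topics with uniform word probabilities (words taken from the slides).
TOPICS = [
    {"job": 0.25, "career": 0.25, "hiring": 0.25, "part-time": 0.25},
    {"car": 0.25, "speed": 0.25, "prices": 0.25, "camry": 0.25},
    {"recipe": 0.25, "pizza": 0.25, "food": 0.25, "chicken": 0.25},
]

def sample_interaction(user_counts, global_counts, lam, rng):
    """Generate one (intent, word) pair and update the urn counts in place."""
    num_topics = len(TOPICS)
    # Existing intents are weighted by the user's counts; the last slot is "new intent".
    weights = list(user_counts) + [lam]
    choice = rng.choices(range(num_topics + 1), weights=weights, k=1)[0]
    if choice == num_topics:
        # New intent: visit the global distribution and record the visit.
        k = rng.choices(range(num_topics), weights=global_counts, k=1)[0]
        global_counts[k] += 1
    else:
        k = choice
    user_counts[k] += 1
    words = list(TOPICS[k])
    probs = [TOPICS[k][w] for w in words]
    word = rng.choices(words, weights=probs, k=1)[0]
    return k, word

rng = random.Random(0)
user_counts = [0, 0, 0]
global_counts = [1.0, 1.0, 1.0]   # assumed pseudo-counts so every topic is reachable
interactions = [sample_interaction(user_counts, global_counts, 0.5, rng)
                for _ in range(50)]
```

The first interaction always goes to the global distribution because the user has no counts yet; afterwards the user's own history dominates unless `lam` is large.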
[Figure: the same user word lists and topic word-distributions]
Problems
- Static model
- Does not evolve the user’s interests
- Does not evolve the global trend of interests
- Does not evolve an interest’s distribution over terms
[Figure: the user word lists and topic word-distributions at time t and at time t+1]
Build a dynamic model
Connect each level using an RCRP
[Figure: a global process and three user processes (User 1, User 2, User 3) unrolled over times t, t+1, t+2, t+3, annotated with m, m' and n, n']
Which time kernel to use at each level?
Observation 1
- Popular topics at time t are likely to be popular at time t+1
- φ_{k,t+1} is likely to evolve smoothly from φ_{k,t}
Pseudo-counts = [term lost in extraction] × decay factor
[Figure: the user word lists and topic word-distributions at time t and at time t+1]
Observation 1
- Popular topics at time t are likely to be popular at time t+1
- φ_{k,t+1} is likely to evolve smoothly from φ_{k,t}
φ_{k,t+1} ~ Dir(β̃_{k,t+1})
[Figure: the Cars topic at time t, φ_{k,t} (Car, Blue, Book, Kelley, Prices, Small, Speed, large), and at time t+1 (Car, Altima, Accord, Book, Kelley, Prices, Small, Speed)]
Intuition
Captures the current trend of the car industry (e.g. new releases)
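One way to picture the smooth evolution of a topic is to draw the new word-distribution from a Dirichlet whose parameters are built from decayed word counts of the previous epoch. The construction below (a base value `beta` plus `decay` times the old counts) is an assumption made for illustration; the slides do not spell out the exact form of the prior. The vocabulary and counts are hypothetical, and the Dirichlet draw uses normalized gamma variates from the standard library.

```python
import random

def sample_dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def evolve_topic(prev_word_counts, beta, decay, rng):
    """Sample phi_{k,t+1} from a prior centered on the previous epoch's counts."""
    prior = [beta + decay * c for c in prev_word_counts]   # assumed form of the prior
    return sample_dirichlet(prior, rng), prior

vocab = ["Car", "Blue", "Book", "Kelley", "Prices", "Altima", "Accord"]
counts_t = [40, 10, 10, 15, 20, 0, 0]      # hypothetical word counts at time t
rng = random.Random(0)
phi_next, prior_next = evolve_topic(counts_t, beta=0.1, decay=0.5, rng=rng)
```

Words that were frequent at time t keep a high prior weight at t+1, while the small `beta` still leaves room for new words such as "Altima" or "Accord" to enter the topic.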
Observation 2
- The user prior at time t+1 is a mixture of the user's short-term and long-term interests
How do we get a prior that captures both long-term and short-term interests?
[Figure: the user word lists at time t and the topic word-distributions at time t+1, where the Cars topic now includes Altima and Accord]
Prior for user actions at time t
[Figure: the user's activity over the topics Diet, Cars, Job, Finance is aggregated over three windows (week, month, all history) and combined with the weights μ, μ2, μ3, spanning short-term to long-term interests; example words per window include "Food, Chicken, pizza", "recipe, job, hiring, Part-time, Opening, salary", and "food, chicken, Pizza, millage, Kelly, recipe, cuisine"]
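A minimal sketch of such a mixed prior follows, assuming the prior is a weighted sum of per-topic counts over the three windows. The function name `user_prior`, the example counts, and the weight values are all hypothetical; in particular, which weight belongs to which window is not recoverable from the slide, so the assignment below is only one possibility.

```python
def user_prior(week, month, all_time, mu_week, mu_month, mu_all):
    """Combine per-topic counts from three time windows into one prior."""
    topics = set(week) | set(month) | set(all_time)
    return {
        k: mu_week * week.get(k, 0)
           + mu_month * month.get(k, 0)
           + mu_all * all_time.get(k, 0)
        for k in topics
    }

# Hypothetical counts of one user's actions per topic.
week = {"Diet": 4, "Cars": 2}
month = {"Diet": 8, "Job": 4}
all_time = {"Diet": 16, "Cars": 8, "Job": 8, "Finance": 8}

prior = user_prior(week, month, all_time,
                   mu_week=0.5, mu_month=0.25, mu_all=0.125)
```

Giving the shortest window the largest weight lets a few recent actions compete with a long history, which is the trade-off the slide's question points at.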
Generative Process
• For each user interaction
  • Choose an existing intent from the local distribution
    • Sample a word from the topic’s word-distribution
  • Or choose a new intent with probability ∝ λ
    • Sample a new intent from the global distribution
    • Sample a word from the new topic’s word-distribution
[Figure: the user word lists at time t act as short-term priors for time t+1, alongside the evolved topic word-distributions]
Polya-Urn RCRF Process
?
Simplified Graphical Model
[Figure sequence, slides 50-54: the simplified graphical model at time t and time t+1, built up step by step with the Cars topic word lists and a user's word list as examples; the tildes over six variables were detached in extraction]
Topics evolve over time?
User’s intent evolves over time?
Capture long- and short-term interests of users?
[Figure: the global process and the user processes (User 1, User 2, User 3) unrolled over times t, t+1, t+2, t+3, annotated with m, m' and n, n']
4.2 Online Distributed Inference
Work Flow
[Figure: daily work flow. Today's user interactions (hundreds of millions), the current users’ models, and the system state feed the daily update (inference), which produces the new users’ models]
Online Scalable Inference
• Online algorithm
  – Greedy 1-particle filtering algorithm
  – Works well in practice
  – Collapse all multinomials except Ωt
    • This makes distributed inference easier
  – At each time t: [expression lost in extraction]
• Distributed scalable implementation
  – Uses the architecture from the first part as a subroutine
  – Adds synchronous sampling capabilities
Distributed Inference (at time t)
[Figure: graphical model with the global variables Ω and φ, and on each client the per-user variables θ, z, w]
Collapse all multinomials
After collapsing
[Figure: two clients, each holding user word lists, connected to the four topic word-distributions]
Use Star-Synchronization
Fully Collapsed
[Figure: the clients access the topic word-distributions through shared memory]
Distributed Inference (at time t)
[Figure: the sampling equation used on each client, with its three factors labeled Local trend, Global trend, and Topic factor]
Semi-Collapsed
[Figure sequence, slides 64-66: as in the fully collapsed case the clients access the topic word-distributions through shared memory, but Ωt is kept explicit and each client holds its own copy of Ωt]
Distributed Sampling Cycle
[Figure: four clients running the same cycle in lockstep]
• Every client: sample Z for its users
• Barrier
• Every client: write counts to memcached
• One client: collect counts and sample Ω (the other clients do nothing)
• Barrier
• Every client: read Ω from memcached
Sampling Ωt requires a reduction step
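The cycle can be mimicked on a single machine with threads standing in for clients. Everything here is a simplified stand-in chosen for illustration: a plain dictionary replaces memcached, the "Sample Z" step just draws random topic assignments, and Ω is drawn from a Dirichlet over the aggregated counts instead of the auxiliary-variable scheme used in the talk. The sketch also adds one extra barrier after the writes so the collecting client cannot read incomplete counts.

```python
import random
import threading

NUM_CLIENTS = 4
NUM_TOPICS = 3
store = {}                                   # stand-in for memcached
store_lock = threading.Lock()
barrier = threading.Barrier(NUM_CLIENTS)
results = [None] * NUM_CLIENTS

def sample_dirichlet(params, rng):
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def client(cid):
    rng = random.Random(cid)
    # Sample Z for this client's users (placeholder: random assignments).
    z = [rng.randrange(NUM_TOPICS) for _ in range(100)]
    counts = [z.count(k) for k in range(NUM_TOPICS)]
    barrier.wait()
    # Write local counts to the shared store.
    with store_lock:
        store[f"counts/{cid}"] = counts
    barrier.wait()                           # extra barrier (assumption of this sketch)
    if cid == 0:
        # Reduction step: collect all counts, then sample the global variable.
        total = [sum(store[f"counts/{c}"][k] for c in range(NUM_CLIENTS))
                 for k in range(NUM_TOPICS)]
        store["omega"] = sample_dirichlet([1.0 + t for t in total], rng)
    # All other clients do nothing until the global sample is available.
    barrier.wait()
    results[cid] = store["omega"]            # every client reads the same Omega

threads = [threading.Thread(target=client, args=(c,)) for c in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The two original barriers are what make this sampling synchronous: no client starts the next round with a stale copy of Ω.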
Distributed Sampling Cycle
[Figure: the same cycle again: sample Z for users, barrier, write counts, collect counts and sample Ω on one client while the others do nothing, barrier, read Ω]
Sampling Ω
• Introduce auxiliary variable m_kt
  – How many times the global distribution was visited
  – m_kt ~ Antoniak distribution
[Figure: graphical model fragment with Ω and m]
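The Antoniak distribution describes the number of occupied tables after seating n customers in a Chinese restaurant process, so one standard way to draw from it is to simulate the seating and count how many customers open a new table. The sketch below does that; the function name and the example values for `n` and `concentration` are hypothetical, and how these map onto the model's counts and hyper-parameters is left as an assumption.

```python
import random

def sample_num_tables(n, concentration, rng):
    """Draw the number of tables opened by n customers in a CRP.

    Customer i (0-based) opens a new table with probability
    concentration / (concentration + i); the first customer always does.
    """
    return sum(
        1 for i in range(n)
        if rng.random() < concentration / (concentration + i)
    )

rng = random.Random(0)
m_kt = sample_num_tables(n=50, concentration=2.0, rng=rng)
```

The draw costs O(n) Bernoulli trials per variable, which is cheap compared with the pass over the data needed to obtain the counts in the first place.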
4.2 Online Distributed Inference
Behavioral Targeting
Experimental Results
• The task is predicting conversions in display advertising
• Use two datasets
  – 6 weeks of user history
  – The last week's responses to ads are used for testing
• Baselines:
  – User raw data as features
  – Static topic model
![Page 73: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/73.jpg)
Interpretability
[Figure: panels for User-1 and User-2]
![Page 74: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/74.jpg)
Performance in Display Advertising
[Figure labels: ROC, number of conversions]
![Page 75: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/75.jpg)
Performance in Display Advertising
Weighted ROC measure
[Figure labels: static, batch models; effect of number of topics]
![Page 76: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/76.jpg)
How Does It Scale?
2 billion instances with a 5M vocabulary using 1000 machines
One iteration took ~3.8 minutes
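As a rough back-of-the-envelope reading of these numbers, the per-machine throughput can be computed as below. It assumes the instances are spread evenly over the machines, which the slide does not state; the variable names are illustrative.

```python
instances = 2_000_000_000          # from the slide
machines = 1000                    # from the slide
seconds_per_iteration = 3.8 * 60   # ~3.8 minutes per iteration

# Assumption: instances are split evenly across machines.
instances_per_machine = instances / machines
rate_per_machine = instances_per_machine / seconds_per_iteration
print(f"{rate_per_machine:.0f} instances per second per machine")
```

Under that assumption each machine handles about two million instances per iteration, on the order of several thousand instances per second.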
![Page 77: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/77.jpg)
Distributed Inference Revisited
![Page 78: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/78.jpg)
To collapse or not to collapse?
• Not collapsing
  – Keeps conditional independence
    • Good for parallelization
    • Requires synchronous sampling
  – Might mix slowly
• Collapsing
  – Mixes faster
  – Hinders parallelism
  – Use star-synchronization
    • Works well if siblings depend on each other via aggregates
    • Requires asynchronous communication
![Page 79: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/79.jpg)
Inference Primitive
• Collapse a variable
  – Star synchronization for the sufficient statistics
• Sampling a variable
  – Local
    • Sample it locally (possibly using the synchronized statistics)
  – Shared
    • Synchronous sampling using a barrier
• Optimizing a variable
  – Same as in the shared variable case
  – Ex. Conditional topic models
![Page 80: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/80.jpg)
Online Models
• Batch Large-Scale
  – Covered in part 1
• Mini-batches
  – We already have a model
  – Data arrives in batches
  – We would like to keep the model up-to-date
• Time-sensitive
  – Data arrives one item at a time
  – Model should be up-to-date
![Page 81: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/81.jpg)
What Is Coming?
• Inference
  – Online Distributed Sampling
  – Single machine multi-threaded inference
  – Online EM and Submodular Selection
• Applications
  – User tracking for behavioral targeting
  – Content understanding
  – User modeling for content recommendation
![Page 82: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/82.jpg)
4.2 Scalable SMC Inference
Storylines
![Page 83: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/83.jpg)
News Stream
![Page 84: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/84.jpg)
News Stream
• Realtime news stream
  – Multiple sources (Reuters, AP, CNN, ...)
  – Same story from multiple sources
  – Stories are related
• Goals
  – Aggregate articles into a storyline
  – Analyze the storyline (topics, entities)
    • How does the story develop over time?
    • Who are the main entities?
    • What topics are addressed?
![Page 85: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/85.jpg)
A Unified Model
• Jointly solves the three main tasks
  – Clustering
  – Classification
  – Analysis
• Building blocks
  – A topic model
    • High-level concepts (unsupervised classification)
  – Dynamic clustering (RCRP)
    • Discover tightly-focused concepts
  – Named entities
  – Story developments
![Page 86: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/86.jpg)
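Infinite Dynamic Cluster-Topic Hybrid
[Figure: shared topics on top, stories with their own words and named entities below]
Topics
• Sports: games, Won, Team, Final, Season, League, held
• Politics: Government, Minister, Authorities, Opposition, Officials, Leaders, group
• Unrest: Police, Attack, run, man, group, arrested, move
Stories
• UEFA-soccer
  – Words: Champions, Goal, Coach, Striker, Midfield, penalty
  – Entities: Juventus, AC Milan, Lazio, Ronaldo, Lyon
• Tax-Bill
  – Words: Tax, Billion, Cut, Plan, Budget, Economy
  – Entities: Bush, Senate, Fleischer, White House, Republican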
![Page 87: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/87.jpg)
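Infinite Dynamic Cluster-Topic Hybrid
[Figure: shared topics on top, stories with their own words and named entities below]
Topics
• Sports: games, Won, Team, Final, Season, League, held
• Politics: Government, Minister, Authorities, Opposition, Officials, Leaders, group
• Unrest: Police, Attack, run, man, group, arrested, move
Stories
• UEFA-soccer
  – Words: Champions, Goal, Coach, Striker, Midfield, penalty
  – Entities: Juventus, AC Milan, Lazio, Ronaldo, Lyon
• Tax-Bill
  – Words: Tax, Billion, Cut, Plan, Budget, Economy
  – Entities: Bush, Senate, Fleischer, White House, Republican
• Border-Tension
  – Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee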
![Page 88: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/88.jpg)
The Graphical Model
• Topic model
• Topics per cluster
• RCRP for cluster
• Hierarchical DP for article
• Separate model for named entities
• Story specific correction
![Page 89: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/89.jpg)
4.2 Fast SMC Inference
Inference via SMC
![Page 90: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/90.jpg)
Online Inference Algorithm
• A particle filtering algorithm
• Each particle maintains a hypothesis
  – What are the stories
  – Document-story associations
  – Topic-word distributions
• Collapsed sampling
  – Sample (z_d, s_d) only for each document
![Page 91: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/91.jpg)
Particle Filter Representation
![Page 92: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/92.jpg)
Fold the document into the structure of each filter
- s and z are tightly coupled
- Alternatives
  - Sample s then sample z (high variance)
[Figure: document (t_d) with words w and entities, per-word topic indicators z, story indicator s]
![Page 93: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/93.jpg)
Fold the document into the structure of each filter
- s and z are tightly coupled
- Alternatives
  - Sample s then sample z (high variance)
  - Sample z then sample s (doesn't make sense)
[Figure: document (t_d) with words w and entities, per-word topic indicators z, story indicator s]
![Page 94: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/94.jpg)
Fold the document into the structure of each filter
- s and z are tightly coupled
- Alternatives
  - Sample s then sample z (high variance)
  - Sample z then sample s (doesn't make sense)
- Idea
  - Run a few iterations of MCMC over s and z
  - Take last sample as the proposed value
![Page 95: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/95.jpg)
How good does each filter look now?
![Page 96: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/96.jpg)
Get rid of bad filters
Replicate good ones
![Page 97: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/97.jpg)
Get rid of bad filters
Replicate good ones
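Dropping bad filters and replicating good ones is the resampling step of a particle filter. The slides do not say which resampling scheme is used, so the sketch below assumes plain multinomial resampling; `resample` and `effective_sample_size` are hypothetical helper names.

```python
import random


def effective_sample_size(weights):
    """Common diagnostic: equals len(weights) for uniform weights, 1 if one particle dominates."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles with probability proportional to weight."""
    total = sum(weights)
    probs = [w / total for w in weights]
    chosen = rng.choices(particles, weights=probs, k=len(particles))
    uniform = [1.0 / len(particles)] * len(particles)  # weights reset after resampling
    return chosen, uniform
```

Here a replicated particle is just a repeated reference; a real filter needs each copy to evolve independently, which is what motivates the storage scheme on the following slides.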
![Page 98: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/98.jpg)
Efficient Computation and Storage
• Particles get replicated
[Figure: particles 1, 2, 3 with states P1, P2, P3]
![Page 99: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/99.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree
[Figure: particles 1, 2, 3 with states P1, P2, P3]
![Page 100: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/100.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
[Figure: inheritance tree with a single particle 1 and its state]
![Page 101: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/101.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
[Figure: inheritance tree with a root state and particles 1 and 2, each with its own state]
![Page 102: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/102.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
[Figure: inheritance tree with a root state and particles 1, 2 and 3, each with its own state]
![Page 103: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/103.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
  – Inverted representation for fast lookup
[Figure: inheritance tree with a root and particles 1, 2 and 3; each node holds an inverted index from entity to stories, with entries such as "India: story 1, US: story 2, story 3", "Bush: story 2, India: story 3", "Congress: story 2", "India: story 5, Pakistan: story 1", or (empty)]
![Page 104: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/104.jpg)
Efficient Computation and Storage
• Why is this useful?
• Only focus on stories that mention at least one entity
  – Otherwise pre-compute and reuse
• We can use fast samplers for z as well [Yao et al., KDD 09]
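The combination of an inheritance tree and an inverted entity-to-story index can be sketched as follows. This is a simplified, single-threaded illustration, not the thread-safe structure from the talk or from Canini et al.: the class `Node`, its methods, and the helper `candidate_stories` are hypothetical, and entries can only be added, never overridden or deleted.

```python
class Node:
    """A particle's state: only local additions are stored, the rest is inherited."""

    def __init__(self, parent=None):
        self.parent = parent
        self.local = {}  # entity -> set of story ids added at this node

    def add(self, entity, story):
        self.local.setdefault(entity, set()).add(story)

    def stories(self, entity):
        # Walk up to the root, merging what each ancestor recorded for the entity.
        found = set()
        node = self
        while node is not None:
            found |= node.local.get(entity, set())
            node = node.parent
        return found

    def branch(self):
        # Replicating a particle is cheap: the copy starts empty and inherits.
        return Node(parent=self)


def candidate_stories(node, entities):
    """Stories that mention at least one of the document's entities."""
    result = set()
    for entity in entities:
        result |= node.stories(entity)
    return result
```

For example, two particles branched from the same root share the root's entries but keep their own additions separate, and `candidate_stories` limits the stories worth scoring for a new document to those sharing an entity with it.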
![Page 105: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/105.jpg)
Experiments
• Yahoo! News datasets over two months
  – Three sub-sampled sets with different characteristics
• Editorially-labeled documents
  – Cannot-link and must-link pairs
• Performance measured using clustering accuracy
• Baseline
  – A strong offline correlation clustering algorithm [WSDM 11]
    • Scaled with LSH to compute the neighborhood graph (similar to Petrovic 2010)
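One simple way to score a clustering against must-link and cannot-link pairs is the fraction of labeled pairs it respects. The slides do not give the exact accuracy definition, so the sketch below is an assumption; `pairwise_accuracy` and the document ids are illustrative.

```python
def pairwise_accuracy(assignment, must_link, cannot_link):
    """Fraction of labeled pairs that the clustering respects.

    assignment maps a document id to its story (cluster) id.
    """
    correct = 0
    for a, b in must_link:
        correct += assignment[a] == assignment[b]   # pair should share a story
    for a, b in cannot_link:
        correct += assignment[a] != assignment[b]   # pair should be separated
    total = len(must_link) + len(cannot_link)
    return correct / total
```

For example, with documents d1 and d2 in one story and d3 in another, the must-link pair (d1, d2) and the cannot-link pair (d1, d3) are both satisfied.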
![Page 106: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/106.jpg)
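Structured Browsing
[Figure: shared topics on top, stories with their own words and named entities below]
Topics
• Sports: games, Won, Team, Final, Season, League, held
• Politics: Government, Minister, Authorities, Opposition, Officials, Leaders, group
• Unrest: Police, Attack, run, man, group, arrested, move
Stories
• Border-Tension
  – Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
• UEFA-soccer
  – Words: Champions, Goal, Leg, Coach, Striker, Midfield, penalty
  – Entities: Juventus, AC Milan, Real Madrid, Milan, Lazio, Ronaldo, Lyon
• Tax-bills
  – Words: Tax, Billion, Cut, Plan, Budget, Economy, lawmakers
  – Entities: Bush, Senate, US, Congress, Fleischer, White House, Republican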
![Page 107: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/107.jpg)
Structured Browsing
Query story
• Border-Tension
  – Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
More like India-Pakistan story (based on topics)
• Middle-east-conflict
  – Words: Peace, Roadmap, Suicide, Violence, Settlements, bombing
  – Entities: Israel, Palestinian, West bank, Sharon, Hamas, Arafat
More like India-Pakistan story (Nuclear + topics [politics])
• Nuclear programs
  – Words: Nuclear, summit, warning, policy, missile, program
  – Entities: North Korea, South Korea, U.S, Bush, Pyongyang
![Page 108: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/108.jpg)
Structured Browsing
[Figure: shared topics on top, stories with their own words and named entities below]
Topics
• Sports: games, Won, Team, Final, Season, League, held
• Politics: Government, Minister, Authorities, Opposition, Officials, Leaders, group
• Unrest: Police, Attack, run, man, group, arrested, move
Stories
• Border-Tension
  – Words: Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
• UEFA-soccer
  – Words: Champions, Goal, Leg, Coach, Striker, Midfield, penalty
  – Entities: Juventus, AC Milan, Real Madrid, Milan, Lazio, Ronaldo, Lyon
• Tax-bills
  – Words: Tax, Billion, Cut, Plan, Budget, Economy, lawmakers
  – Entities: Bush, Senate, US, Congress, Fleischer, White House, Republican
More on personalization later in the talk
![Page 109: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/109.jpg)
Quantitative Evaluation
Number of topics = 100
Effect of number of topics
![Page 110: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/110.jpg)
Scalability
![Page 111: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/111.jpg)
Model Contribution
• Named entities are very important
• Removing time increases processing time, up to 2 seconds per document
![Page 112: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/112.jpg)
Putting Things Together
![Page 113: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/113.jpg)
Time vs. Machines
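• Data arrives dynamically
• How to keep models up to date?

|                | Batch              | Mini-batches                   | Truly online |
|----------------|--------------------|--------------------------------|--------------|
| Single-Machine | Gibbs, Variational | Online-LDA                     | SMC          |
| Multi-Machine  | Star-Synch.        | Star-Synch. + Synchronous step | ?            |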
![Page 114: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/114.jpg)
4.3 User Preference
Online EM and Submodularity
![Page 115: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/115.jpg)
Storyline Summarization
• How to summarize a storyline with few articles?
[Figure: storyline over time, with phases Earthquake & Tsunami, Nuclear Power, Rescue & Relief, Economy]
![Page 116: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/116.jpg)
Storyline Summarization
• How to summarize a storyline with few articles?
• How to personalize the summary?
[Figure: storyline over time, with phases Earthquake & Tsunami, Nuclear Power, Rescue & Relief, Economy]
![Page 117: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/117.jpg)
Storyline Summarization
• How to summarize a storyline with few articles?
• How to personalize the summary?
[Figure: storyline over time, with phases Earthquake & Tsunami, Nuclear Power, Rescue & Relief, Economy]
![Page 118: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/118.jpg)
User Interaction
• Passive
  – We observe user-generated content
  – Model the user based on that content using unsupervised techniques
• Explicit
  – We present users with content
  – Users give explicit feedback
    • Like/dislike
  – Learn user preference using supervised techniques
• Implicit
  – Mixture between the two
  – Present the user with items
  – Observe which items the user interacts with
  – Learn user preference using semi-supervised models
![Page 119: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/119.jpg)
User Satisfaction
• Modular
  – Present the user with items she prefers
    • Regardless of the context
  – Targets relevance
  – Ex: vector space models
• Submodular
  – More of the same thing is not always better
    • Diminishing returns
  – Targets diversity
  – Ex: TDN [El-Arini et al., KDD 09]
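Diminishing returns can be illustrated with a plain set-coverage objective and greedy selection. This is a generic stand-in, not the TDN objective or the learned coverage function from the talk; the article ids and topic labels are made up for the example.

```python
def coverage(selected, topics_of):
    """Number of distinct topics covered by the selected articles (a submodular function)."""
    covered = set()
    for item in selected:
        covered |= topics_of[item]
    return len(covered)


def greedy_select(topics_of, k):
    """Pick up to k articles, each time taking the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        base = coverage(selected, topics_of)
        best, best_gain = None, 0
        for item in sorted(topics_of):
            if item in selected:
                continue
            gain = coverage(selected + [item], topics_of) - base
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:  # nothing adds new coverage: more of the same is not better
            break
        selected.append(best)
    return selected


# Hypothetical articles labeled with the storyline phases they cover.
articles = {
    "a1": {"earthquake"},
    "a2": {"earthquake"},
    "a3": {"nuclear"},
    "a4": {"earthquake", "relief"},
}
summary = greedy_select(articles, 3)
```

A second earthquake-only article adds nothing once the topic is covered, so the greedy summary stops after two articles even though three were allowed.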
![Page 120: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/120.jpg)
Sequential Click-View Model
Modeling Views based on position
![Page 121: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/121.jpg)
Sequential Click-View Model
Modeling clicks using position and information gain
Coverage function's weights are learnt
[Figure labels: threshold, information gain, #clicks]
![Page 122: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/122.jpg)
Sequential Click-View Model
[Figure labels: selected summary, story, features, modular, submodular]
![Page 123: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/123.jpg)
Online Inference
• Treat missing views as hidden variables
  – Realistic interaction model
• Use the online EM algorithm
  – Infer the value of hidden variables
• Optimize parameters using SGD
  – Use additive weights
    • Background + story + category + user
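The additive-weights idea can be sketched as a weight vector that is the sum of background, story, category and user components, all updated by the same SGD step. The sketch assumes a logistic click model and leaves out the EM step over hidden views; the class `AdditiveWeights`, the key format and the learning rate are hypothetical.

```python
import math
from collections import defaultdict


class AdditiveWeights:
    """Effective weights = background + story + category + user components."""

    def __init__(self, dim, lr=0.1):
        self.dim = dim
        self.lr = lr
        self.parts = defaultdict(lambda: [0.0] * dim)  # one vector per component key

    def effective(self, keys):
        return [sum(self.parts[k][j] for k in keys) for j in range(self.dim)]

    def predict(self, keys, x):
        # Assumed logistic model of the click probability.
        score = sum(w * xj for w, xj in zip(self.effective(keys), x))
        return 1.0 / (1.0 + math.exp(-score))

    def sgd_step(self, keys, x, clicked):
        # Gradient of the logistic loss w.r.t. the summed weights is (p - y) * x;
        # because the weights add up, every component receives the same gradient.
        err = self.predict(keys, x) - clicked
        for k in keys:
            part = self.parts[k]
            for j in range(self.dim):
                part[j] -= self.lr * err * x[j]
```

Sharing the background, story and category components lets a new user start from what was learned from others, while the user component captures individual deviations.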
![Page 124: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/124.jpg)
Online Inference
![Page 125: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/125.jpg)
How Does it Work?
![Page 126: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/126.jpg)
How Does It Work?
![Page 127: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/127.jpg)
5. Summary and Future Directions
![Page 128: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/128.jpg)
Summary
• Tools
  – Load distribution, balancing and synchronization
  – Clustering, Topic Models
• Models
  – Dynamic non-parametric models
  – Sequential latent variable models
• Inference Algorithms
  – Distributed batch
  – Sequential Monte Carlo
• Applications
  – User profiling
  – News content analysis & recommendation
![Page 129: Graphical Models for the Internet Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA](https://reader036.vdocuments.us/reader036/viewer/2022062511/5519495155034679738b46ec/html5/thumbnails/129.jpg)
Future Directions
• Theoretical bounds and guarantees
• Network data
  – Graph partitioning
• Non-parametric models
  – Learning structure from data
• Working under communication constraints
• Data distribution for particle filters