graphical models for the internet
DESCRIPTION
Graphical Models for the Internet. Amr Ahmed and Alexander Smola Yahoo Research, Santa Clara, CA. Thus far . Motivation Basic tools Clustering Topic Models Distributed batch inference Local and global states Star synchronization. Up next. Inference Online Distributed Sampling - PowerPoint PPT PresentationTRANSCRIPT
Graphical Models for the Internet
Amr Ahmed and Alexander SmolaYahoo Research, Santa Clara, CA
Thus far ...• Motivation• Basic tools– Clustering– Topic Models
• Distributed batch inference– Local and global states– Star synchronization
Up next• Inference – Online Distributed Sampling– Single machine multi-threaded inference– Online EM and Submodular Selection
• Applications– User tracking for behavioral Targeting– Content understanding– User modeling for content recommendation
4. Online Model
Scenarios• Batch Large-Scale
– Covered in part 1
• Mini-batches– We already have a model– Data arrives in batches– We would like to keep model up-to-data
• Time-sensitive– Data arrives one item at a time– Model should be up-to-data
Time
Time
4.1 Dynamic Clustering
The Chinese Restaurant Process• Allows the number of mixtures to grow with the data• They are called non-parametric models– Means the number of effective parameters grow with data– Still have hyper-parameters that control the rate of growth
• a: how fast a new cluster/mixture is born?• G0: Prior over mixture component parameters
The Chinese Restaurant Process
-For data point xi - Choose table j mj and Sample xi ~ f(fj)- Choose a new table K+1 a
- Sample fK+1 ~ G0 and Sample xi ~ f(fK+1)
Generative Process
f2f1 f3
a62
a63 a6
1
aa6
The rich gets richer effectCANNOT handle sequential data
Recurrent CRP (RCRP) [Ahmed and Xing 2008]
• Adapts the number of mixture components over time– Mixture components can die out– New mixture components are born at any time– Retained mixture components parameters evolve according
to a Markovian dynamics
The Recurrent Chinese Restaurant Process
f2,1f1,1 f3,1 T=1
Dish eaten at table 3 at time epoch 1OR the parameters of cluster 3 at time epoch 1
-Customers at time T=1 are seated as before:- Choose table j mj,1 and Sample xi ~ f(fj,1)- Choose a new table K+1 a
- Sample fK+1,1 ~ G0 and Sample xi ~ f(fK+1,1)
Generative Process
The Recurrent Chinese Restaurant Process
f2,1f1,1 f3,1 T=1
T=2
f2,1f1,1 f3,1
m'1,1=2 m'2,1=3 m'3,1=1
f2,1f1,1 f3,1 T=1
T=2f2,1f1,1 f3,1
m'1,1=2 m'2,1=3 m'3,1=1
f2,1f1,1 f3,1 T=1
T=2f2,1f1,1 f3,1
a62 a6
3 a61
aa6
m'1,1=2 m'2,1=3 m'3,1=1
f2,1f1,1 f3,1 T=1
T=2f2,1f1,1 f3,1
a62
m'1,1=2 m'2,1=3 m'3,1=1
f2,1f1,1 f3,1 T=1
T=2f2,1f1,2 f3,1
Sample f1,2 ~ P(.| f1,1)
m'1,1=2 m'2,1=3 m'3,1=1
f2,1f1,1 f3,1 T=1
T=2f2,1f1,2 f3,1
a1621 a16
3 a161
aa16
And so on ……
m'1,1=2 m'2,1=3 m'3,1=1
f2,1f1,1 f3,1 T=1
T=2f2,2f1,2 f3,1
At the end of epoch 2
f4,2
Newly born clusterDied out cluster
m'1,1=2 m'2,1=3 m'3,1=1
f2,1f1,1 f3,1 T=1
T=2f2,2f1,2 f3,1
N1,1=2 N2,1=3 m'3,1=1
f4,2
T=3
f2,2f1,2 f4,2
m'4,2=1m'2,2=2m'1,2=2
Recurrent Chinese Restaurant Process• Can be extended to model higher-order
dependencies• Can decay dependencies over time– Pseudo-counts for table k at time t is
History size
Decay factory Number of customers sitting at table K at time epoch t-h
å=
-
-
÷øö
çèæH
hhtk
h
me1
,r
f2,1f1,1 f3,1
T=1
T=2f2,2f1,2 f3,1
m'1,1=2 m'2,1=3 m'3,1=1f4,2
T=3f2,2f1,2 f4,2
å=
-
-
÷øö
çèæH
hhtk
h
me1
,r
m'2,3
m'2,3 =
4.2 Online Distributed Inference
Tracking Users Interest
Characterizing User Interests• Short term vs long-term
Jan April July Oct
Music
Housing
Furniture
Buying a car
Travel plans
Characterizing User Interests• Short term vs long-term• Latent
Jan April July Oct
millage
fast
mortgageGaga
seafood
Barcelona
used
Problem formulationInput
• Queries issued by the user or tags of watched content• Snippet of page examined by user• Time stamp of each action (day resolution)
Output
• Users’ daily distribution over interests• Dynamic interest representation• Online and scalable inference• Language independent
FlightLondonHotelweather
SchoolSuppliesLoansemester
classesregistrationhousingrent
Problem formulation
FlightLondonHotelweather
Travel
BackTo school
financeSchoolSuppliesLoansemester
classesregistrationhousingrent
Input
• Queries issued by the user or tags of watched content• Snippet of page examined by user• Time stamp of each action (day resolution)
Output
• Users’ daily distribution over interests• Dynamic interest representation• Online and scalable inference• Language independent
Problem formulation
FlightLondonHotelweather
Travel
BackTo school
financeSchoolSuppliesLoansemester
classesregistrationhousingrent
When to show a financing ad?
Problem formulation
FlightLondonHotelweather
Travel
BackTo school
financeSchoolSuppliesLoansemester
classesregistrationhousingrent
When to show a financing ad?
Problem formulation
FlightLondonHotelweather
Travel
BackTo school
financeSchoolSuppliesLoansemester
classesregistrationhousingrent
When to show a financing ad?
Problem formulation
FlightLondonHotelweather
Travel
BackTo school
financeSchoolSuppliesLoansemester
classesregistrationhousingrent
When to show a hotel ad?
Problem formulation
FlightLondonHotelweather
Travel
BackTo school
financeSchoolSuppliesLoansemester
classesregistrationhousingrent
When to show a hotel ad?
Problem formulation
FlightLondonHotelweather
Travel
BackTo school
financeSchoolSuppliesLoansemester
classesregistrationhousingrent
Input
• Queries issued by the user or tags of watched content• Snippet of page examined by user• Time stamp of each action (day resolution)
Output
• Users’ daily distribution over interests• Dynamic interest representation• Online and scalable inference• Language independent
Mixed-Membership Formulation
job Career
BusinessAssistant
HiringPart-time
Receptionist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Diet Cars Job Finance
Objects
Mixtures
Degree ofmembership
Job Hiring speed price
part-time CamryCareer opening bonus package
card diet caloriesloan recipe milk
Weight lb kg
In Graphical Notation
In Polya-Urn Representation
• Collapse multinomial variables:• Fixed-dimensional Hierarchal Polya-Urn
representation– Chinese restaurant franchise
xx
Food Chicken
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Car speed offercamry accord career
Global topics trends
Topic word-distributions
User-specific topics trends(mixing-vector)
User interactions: queries, keyword from pages viewed
Food Chicken
Car speed offercamry accord career
• For each user interaction• Choose an intent from local distribution
• Sample word from the topic’s word-distribution
• Choose a new intent l • Sample a new intent from the global distribution• Sample word from the new topic word-
distribution
Generative Process
………
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Food Chickenpizza
Car speed offercamry accord career
• For each user interaction• Choose an intent from local distribution
• Sample word from the topic’s word-distribution
• Choose a new intent l • Sample a new intent from the global distribution• Sample word from the new topic word-
distribution
Generative Process
………
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Food Chickenpizza
Car speed offercamry accord career
• For each user interaction• Choose an intent from local distribution
• Sample word from the topic’s word-distribution
• Choose a new intent l • Sample a new intent from the global distribution• Sample word from the new topic word-
distribution
Generative Process
………
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Food Chickenpizza hiring
Car speed offercamry accord career
• For each user interaction• Choose an intent from local distribution
• Sample word from the topic’s word-distribution
• Choose a new intent l • Sample a new intent from the global distribution• Sample word from the new topic word-
distribution
Generative Process
………
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Food Chickenpizza millage
Car speed offercamry accord career
• For each user interaction• Choose an intent from local distribution
• Sample word from topic’s word-distribution
• Choose a new intent l • Sample a new intent from the global distribution• Sample from word the new topic word-
distribution
Generative Process
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Food Chickenpizza millage
Car speed offercamry accord career
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Problems- Static Model- Does not evolve user’s interests- Does not evolve the global trend of interests- Does not evolve interest’s distribution over terms
Food Chickenpizza millage
Car speed offercamry accord career
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
At time t At time t+1
Build a dynamic model
Connect each level using a RCRP
At time t At time t+1 At time t+2 At time t+3
User 1process
User 2process
User 3process
Globalprocess
mm'
nn'
Which time kernel to use at each level?
Pseudo counts = *
Decay factor
Food Chickenpizza millage
Car speed offercamry accord career
At time t
Observation 1-Popular topics at time t are likely to be popular at time t+1- fk,t+1 is likely to smoothly evolve from fk,t
At time t+1job
CareerBusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Food Chickenpizza millage
Car speed offercamry accord career
At time t At time t+1job
CareerBusinessAssistant
HiringPart-timeReceptio
nist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Observation 1-Popular topics at time t are likely to be popular at time t+1- fk,t+1 is likely to smoothly evolve from fk,t
CarAltimaAccordBookKelleyPricesSmallSpeed
fk,t+1 ~ Dir(bk,t+1)fk,t
CarBlueBookKelleyPricesSmallSpeedlarge
Intuition
Captures current trend of the car industry
(new release for e.g.)
~
Food Chickenpizza millage
Car speed offercamry accord career
At time t At time t+1
Observation 2- User prior at time t+1 is a mixture of the user short and long term interest
How do we get a prior that captures both long
and short term interest?
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarAltimaAccord
BlueBookKelleyPricesSmallSpeed
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
All
week
job Career
BusinessAssistant
HiringPart-time
Receptionist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
month
Time t t+1
FoodChickenpizza
recipejobhiring
Part-timeOpeningsalary
foodchickenPizzamillage
Kellyrecipecuisine
Diet Cars Job Finance
Prior for user actions at time t
μ
μ2
μ3
Long-termshort-term
Food ChickenPizza millage
Car speed offercamry accord career
At time t At time t+1
• For each user interaction• Choose an intent from local distribution
• Sample word from the topic’s word-distribution
• Choose a new intent l • Sample a new intent from the global distribution• Sample word from the new topic word-
distribution
Generative Process
short-termpriors
job Career
BusinessAssistant
HiringPart-timeReceptio
nist
CarAltimaAccord
BlueBookKelleyPricesSmallSpeed
BankOnlineCreditCarddebt
portfolioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilk
ButterPowder
Polya-UrnRCRF Process
?
Simplified Graphical Model
~
~
~
~
~
~
At time t At time t+1
Simplified Graphical Model
~
~
~
~
~
~
CarAltimaAccordBookKelleyPricesSmallSpeed
CarBlueBookKelleyPricesSmallSpeedlarge
Simplified Graphical Model
~
~
~
~
~
~
Food ChickenPizza millage
Simplified Graphical Model
~
~
~
~
~
~
~
~
~
~
~
~
Simplified Graphical Model
Topics evolve over time?
User’s intent evolve over time?
Capture long and term interests of users?
User 1process
User 2process
User 3process
Globalprocess
At time t At time t+1 At time t+2 At time t+3
mm'
nn'
4.2 Online Distributed Inference
Work Flow
Work Flowtoday
CurrentUsers’ models
Systemstate
newUsers’ models
User interactions
User interactions
User interactions
User interactions
User interactions
Daily Update
(inference)
Hundred of millions
~
~
~
~
~
~
Online Scalable Inference• Online algorithm– Greedy 1-particle filtering algorithm– Works well in practice – Collapse all multinomials except Ωt
• This makes distributed inference easier– At each time t:
• Distributed scalable implementation– Used first part architecture as a subroutine– Added synchronous sampling capabilities
Distributed Inference (at time t)
W
f
q
z
w
clientclient
Distributed Inference (at time t)
q
z
w
q
z
w
W
f
Collapse all multinomial
After collapsing
Food ChickenPizza millage
job CareerBusines
sAssistan
tHiringPart-time
Receptionist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolio
Finance
Chase
RecipeChocol
atePizzaFood
Chicken
MilkButterPowde
r
client client
Car speed offercamry accord career
Use Star-Synchronization
Fully Collapsed
Food ChickenPizza millage
job CareerBusines
sAssistan
tHiringPart-time
Receptionist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolio
Finance
Chase
RecipeChocol
atePizzaFood
Chicken
MilkButterPowde
r
Shared memory
client client
Car speed offercamry accord career
Distributed Inference (at time t)
Food ChickenPizza millage
client client
Car speed offercamry accord career
Local trendGlobal trend Topic factor
Semi-Collapsed
Food ChickenPizza millage
job CareerBusines
sAssistan
tHiringPart-time
Receptionist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolio
Finance
Chase
RecipeChocol
atePizzaFood
Chicken
MilkButterPowde
r
Shared memory
client client
Car speed offercamry accord career
Semi-Collapsed
Food ChickenPizza millage
job CareerBusines
sAssistan
tHiringPart-time
Receptionist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
portfolio
Finance
Chase
RecipeChocol
atePizzaFood
Chicken
MilkButterPowde
r
Shared memory
client client
ΩtΩtΩt
Car speed offercamry accord career
Semi-Collapsed
Food ChickenPizza millage
client client
ΩtΩt
Car speed offercamry accord career
Distributed Sampling Cycle
Sample ZFor users
Sample ZFor users
Sample ZFor users
Sample ZFor users
Barrier
Write counts to
memcached
Write counts to
memcached
Write counts to
memcached
Write counts to
memcached
Collect counts and sample W Do nothing Do nothing Do nothing
Barrier
Read W from memcached
Read W from
memcached
Read W from
memcached
Read W from
memcached
Sample Ωt
Requires a reduction step
Distributed Sampling Cycle
Sample ZFor users
Sample ZFor users
Sample ZFor users
Sample ZFor users
Barrier
Write counts Write counts Write counts Write counts
Collect counts and sample W Do nothing Do nothing Do nothing
Barrier
Read W from Read W from
Read W from
Read W from
Sampling W
• Introduce auxiliary variable mkt
– How many times the global distribution was visited – ~ AnotniaK
W
m
Distributed Sampling Cycle
Sample ZFor users
Sample ZFor users
Sample ZFor users
Sample ZFor users
Barrier
Write counts Write counts Write counts Write counts
Collect counts and sample W Do nothing Do nothing Do nothing
Barrier
Read W from Read W from
Read W from
Read W from
4.2 Online Distributed Inference
Behavioral Targeting
• Tasks is predicting convergence in display advertising• Use two datasets– 6 weeks of user history– Last week responses to Ads are used for testing
• Baseline:– User raw data as features– Static topic model
Experimental Results
InterpretabilityUser-1 User-2
Performance in Display Advertising
ROC
Number of conversions
Performance in Display Advertising
Weighted ROC measure
StaticBatch modelsEffect of number of topics
How Does It Scale?
2 Billion instances with 5M vocabularyusing 1000 machines
one iteration took ~ 3.8 minutes
Distributed Inference Revisited
To collapse or not to collapse?• Not collapsing– Keeps conditional independence
• Good for parallelization• Requires synchronous sampling
– Might mix slowly
• Collapsing – Mixes faster– Hinder parallelism– Use star-synchronization
• Works well if sibling depends on each others via aggregates
• Requires asynchronous communication
Inference Primitive• Collapse a variable– Star synchronization for the sufficient statistics
• Sampling a variable– Local • Sample it locally (possibly using the synchronized statistics)
– Shared• Synchronous sampling using a barrier
• Optimizing a variable– Same as in the shared variable case– Ex. Conditional topic models
Online Models• Batch Large-Scale
– Covered in part 1
• Mini-batches– We already have a model– Data arrives in batches– We would like to keep model up-to-data
• Time-sensitive– Data arrives one item at a time– Model should be up-to-data
Time
Time
What Is Coming?• Inference – Online Distributed Sampling– Single machine multi-threaded inference– Online EM and Submodular Selection
• Applications– User tracking for behavioral Targeting– Content understanding– User modeling for content recommendation
4.2 Scalable SMC Inference
Storylines
News Stream
News Stream• Realtime news stream – Multiple sources (Reuters, AP, CNN, ...)– Same story from multiple sources– Stories are related
• Goals– Aggregate articles into a storyline– Analyze the storyline (topics, entities)
• How does the story develop over time?• Who are the main entities?• What topics are addressed?
A Unified Model• Jointly solves the three main tasks– Clustering, – Classification – Analysis
• Building blocks– A Topic model
• High-level concepts (unsupervised classification)– Dynamic clustering (RCRP)
• Discover tightly-focused concepts– Named entities– Story developments
Infinite Dynamic Cluster-Topic HybridSports
gamesWonTeamFinal
SeasonLeague
held
Politics
GovernmentMinister
AuthoritiesOpposition
OfficialsLeadersgroup
Unrest
PoliceAttack
runman
grouparrested
move
UEFA-soccer
ChampionsGoalCoachStrikerMidfieldpenalty
Juventus AC Milan Lazio RonaldoLyon
g
Tax-Bill
TaxBillionCutPlanBudgetEconomy
BushSenateFleischerWhite HouseRepublican
Infinite Dynamic Cluster-Topic HybridSports
gamesWonTeamFinal
SeasonLeague
held
Politics
GovernmentMinister
AuthoritiesOpposition
OfficialsLeadersgroup
Unrest
PoliceAttack
runman
grouparrested
move
UEFA-soccer
ChampionsGoalCoachStrikerMidfieldpenalty
Juventus AC Milan Lazio RonaldoLyon
g
Tax-Bill
TaxBillionCutPlanBudgetEconomy
BushSenateFleischerWhite HouseRepublican
Border-Tension
NuclearBorderDialogueDiplomaticmilitantInsurgencymissile
PakistanIndiaKashmirNew DelhiIslamabadMusharrafVajpayee
The Graphical Model
• Topic model• Topics per cluster• RCRP for cluster• Hierarchical DP for
article• Separate model for
named entities• Story specific
correction
4.2 Fast SMC Inference
Inference via SMC
Online Inference Algorithm• A Particle filtering algorithm• Each particle maintains a hypothesis
– What are the stories– Document-story associations– Topic-word distributions
• Collapsed sampling– Sample (zd,sd) only for each document
Particle Filter Representation
Fold the document into the structure of each filter
- s and z are tightly coupled- Alternatives
- Sample s then sample z (high variance)
w w w w w wentitiesDocument td
z z z z z zs
Fold the document into the structure of each filter
- s and z are tightly coupled- Alternatives
- Sample s then sample z (high variance)- Sample z then sample s (doesn’t make sense)
w w w w w wentitiesDocument td
z z z z z z s
Fold the document into the structure of each filter
- s and z are tightly coupled- Alternatives
- Sample s then sample z (high variance)- Sample z then sample s (doesn’t make sense)
- Idea - Run a few iterations of MCMC over s and z- Take last sample as the proposed value
How good each filter look now?
Get rid of bad filterReplicate good one
Get rid of bad filterReplicate good one
Efficient Computation and Storage• Particles get replicated
1
State P1
1 2
State P2State P1
1 23
State P2State P3 State P1
Efficient Computation and Storage• Particles get replicated– Use thread-safe Inheritance tree
1
State P1
1 2
State P2State P1
1 23
State P2State P3 State P1
Efficient Computation and Storage• Particles get replicated– Use thread-safe Inheritance tree [extends Canini et. Al 2009]
1
state
1
Efficient Computation and Storage• Particles get replicated– Use thread-safe Inheritance tree [extends Canini et. Al 2009]
Root
state
2state
1
State1
1 2
Efficient Computation and Storage• Particles get replicated– Use thread-safe Inheritance tree [extends Canini et. Al 2009]
Root
3
state
1
2
State State
state
State1
1 2
1 23
Efficient Computation and Storage• Particles get replicated– Use thread-safe Inheritance tree [extends Canini et. Al 2009]
– Inverted representation for fast lookup
Root
3
India: story 5Pakistan: story 1
1
2
(empty) Congress: story 2
Bush: story 2India: story 3
India: story 1, US: story 2, story 3
1
1 2
1 23
Efficient Computation and Storage• Why this is useful?
• Only focus on stories that mention at least one entity– Otherwise pre-compute and reuse
• We can use fast samplers for z as well [Yao et. Al. KDD09]
Experiments
• Yahoo! News datasets over two months– Three sub-sampled sets with different characteristics
• Editorially-labeled documents– Cannot-like and must-link pairs
• Performance measures using clustering accuracy• Baseline– A strong offline Correlation clustering algorithm [WSDM 11]
• Scaled with LSH to compute neighborhood graph (similar to Petrovic 2010)
Structured BrowsingSports
gamesWonTeamFinal
SeasonLeague
held
Politics
GovernmentMinister
AuthoritiesOpposition
OfficialsLeadersgroup
Unrest
PoliceAttach
runman
grouparrested
move
Border-Tension
NuclearBorderDialogueDiplomaticmilitantInsurgencymissile
PakistanIndiaKashmirNew DelhiIslamabadMusharrafVajpayee
UEFA-soccer
ChampionsGoalLegCoachStrikerMidfieldpenalty
Juventus AC Milan Real Madrid Milan Lazio RonaldoLyon
Tax-bills
TaxBillionCutPlanBudgetEconomylawmakers
BushSenateUSCongressFleischerWhite HouseRepublican
Structured Browsing
More Like India-Pakistan story
Middle-east-conflict
PeaceRoadmapSuicideViolenceSettlementsbombing
Israel PalestinianWest bankSharonHamasArafat
Based on topics
Nuclear programs
Nuclearsummitwarningpolicymissileprogram
North KoreaSouth KoreaU.SBushPyongyang
Nuclear+ topics [politics]
Border-Tension
NuclearBorderDialogueDiplomaticmilitantInsurgencymissile
PakistanIndiaKashmirNew DelhiIslamabadMusharrafVajpayee
Structured BrowsingSports
gamesWonTeamFinal
SeasonLeague
held
Politics
GovernmentMinister
AuthoritiesOpposition
OfficialsLeadersgroup
Unrest
PoliceAttach
runman
grouparrested
move
Border-Tension
NuclearBorderDialogueDiplomaticmilitantInsurgencymissile
PakistanIndiaKashmirNew DelhiIslamabadMusharrafVajpayee
UEFA-soccer
ChampionsGoalLegCoachStrikerMidfieldpenalty
Juventus AC Milan Real Madrid Milan Lazio RonaldoLyon
Tax-bills
TaxBillionCutPlanBudgetEconomylawmakers
BushSenateUSCongressFleischerWhite HouseRepublican
More on Personalization later on the talk
Quantitative Evaluation
Number of topics = 100
Effect of number of topics
Scalability
Model Contribution
• Named entities are very important• Removing time increase processing up to 2
seconds per document
Putting Things Together
Time vs. Machines• Data arrives dynamically• How to keep models up to date?
Batch Mini-batches Truly online
Single-Machine GibbsVariational
Online-LDA SMC
Multi-Machine Star-Synch. Star-Synch + Synchronous step
?
4.3 User Preference
Online EM and Submodularity
Storyline Summarization
• How to summarize a storyline with few articles?
Earthquake & Tsunami Nuclear PowerRescue & Relief Economy
Time
Storyline SummarizationEarthquake & Tsunami Nuclear PowerRescue & Relief Economy
Time
• How to summarize a storyline with few articles?• How to personalize the summary?
Storyline SummarizationEarthquake & Tsunami Nuclear PowerRescue & Relief Economy
Time
• How to summarize a storyline with few articles?• How to personalize the summary?
User Interaction
• Passive– We observe the user generated contents– Model user based on those content using unsupervised techniques
• Explicit– We present users with content– User give explicit feedback
• Like/dislike
– Learn user preference using supervised techniques• Implicit
– Mixture between the two– Present the user with items– Observe which items the user interact with– Learning user preference using semi-supervised models
User Satisfaction• Modular– Present users with items she prefers• Regardless of the context
– Targets relevance– Ex: vector space models
• Submodular– More of the same thing is not always better• Dimensioning return
– Targets diversity – Ex: TDN [ElArini et. Al. KDD 09]
Sequential Click-View Model
Modeling Views based on position
Sequential Click-View Model
Threshold Information gain#clicks
Modeling clicks using position and information gain
Coverage function’s weights are learnt
Sequential Click-View Model
Selected Summary
Story
Features
Modular
Submodular
Online Inference• Treat missing views as hidden variables– Realistic interaction model
• Use the online EM algorithm– Infer the value of hidden variables
• Optimize parameters using SGD– Use additive weights• Background + story + category + user
Online Inference
How Does it Work?
How Does It Work?
5. Summary Future Directions
Summary• Tools
– Load distribution, balancing and synchronization– Clustering, Topic Models
• Models– Dynamic non-parametric models– Sequential latent variable models
• Inference Algorithms– Distributed batch– Sequential Monte Carlo
• Applications– User profiling– News content analysis & recommendation
Future Directions• Theoretical bounds and guarantees• Network data– Graph partitioning
• Non-parametric models– Learning structure from data
• Working under communication constraints• Data distribution for particle filters