Post on 05-Jan-2016
Anant Pradhan
PET: A Statistical Model for Popular Events Tracking in Social Communities
Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han (UIUC)
Introduction
Challenge: Tracking the evolution of a popular topic
Introduction
• Observing and tracking:
– Popular events
– Topics that evolve over time
• Existing approaches focus on:
– Burstiness
– Evolution of networks
• They ignore the interplay between textual topics and network structures.
Introduction
• Propose a novel statistical method (PET) that:
– Models the popularity of events over time
– Considers the burstiness of user interest
– Considers information diffusion on the network structure
– Considers the evolution of textual topics
Introduction
• Gibbs Random Field used to model:
– Influence of historical status
– Dependency relationships in the graph
• Topic Model:
– Designed to explain the generation of text data
• The two models interplay by regularizing each other.
Problem Definition
• Set of vertices: Vk
• Set of edges: Ek
• Network Stream: G = {G1, G2, · · ·, GT}
• Snapshot of network: Gk = {Vk, Ek}
• Document Stream: D = {D1,D2, · · ·, DT}
• Topic: θ
• Event: Θ^E = {θ^E_0, θ^E_1, θ^E_2, · · ·, θ^E_T}
• Interest: Hk = {hk(1), hk(2), · · ·, hk(N)}
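The definitions above can be sketched as a small data structure. This is only an illustration: the class name `Snapshot`, the method `is_consistent`, the per-user layout of the documents, and the sample tweets are all assumptions made here, not part of the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Snapshot:
    """One time step k: network G_k = {V_k, E_k}, documents D_k, interests H_k."""
    vertices: Set[int]                      # V_k
    edges: Set[Tuple[int, int]]             # E_k
    # D_k, stored per user id (this per-user layout is an assumption)
    documents: Dict[int, List[str]]
    # H_k: latent interest h_k(i) per user, empty until inferred
    interest: Dict[int, float] = field(default_factory=dict)

    def is_consistent(self) -> bool:
        """Check that every edge connects two vertices of this snapshot."""
        return all(u in self.vertices and v in self.vertices
                   for u, v in self.edges)


# The streams G = {G_1, ..., G_T} and D = {D_1, ..., D_T} as one list
# (made-up sample data).
stream: List[Snapshot] = [
    Snapshot({1, 2, 3}, {(1, 2)}, {1: ["avatar opens friday"]}),
    Snapshot({1, 2, 3}, {(1, 2), (2, 3)},
             {2: ["saw avatar"], 3: ["avatar again"]}),
]
```

Keeping the observed parts (network, documents) and the latent part (interest) in one record per time step mirrors how the slides index everything by k.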
Problem Definition
• Event-related information in a social community:
– An observed stream of network structures
– An observed stream of text documents
– A latent stream of topics about the event
– A latent stream of interests
The General Model
• Task is cast as inferring the current Hk and Θk given Gk, Dk, and the previous interest status Hk−1: P(Hk,Θk|Gk, Dk, Hk−1)
• Assumption 1: Given Gk and Hk−1, the current interest status Hk is independent of the document collection Dk
• Assumption 2: Given Hk and Dk, the current topic model Θk is independent of the network structure Gk and the previous interest status Hk−1
The General Model
• From the assumptions:
P(Hk,Θk|Gk,Dk,Hk−1) = P(Hk|Gk,Hk−1) · P(Θk|Hk,Dk)
– Interest Model: P(Hk|Gk,Hk−1)
– Topic Model: P(Θk|Hk,Dk)
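The factorization follows from the chain rule plus the two assumptions. The intermediate steps below are a reconstruction, reading each assumption as a conditional independence; the slides state only the final result.

```latex
\begin{align*}
P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})
  &= P(H_k \mid G_k, D_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(chain rule)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(Assumption 1)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, D_k)
     && \text{(Assumption 2)}
\end{align*}
```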
The Interest Model
• Modelled as a Gibbs Random Field on the network Gk
• Uses specially designed potential functions
• Uses weighting scheme motivated by real world networks
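A Gibbs Random Field assigns each configuration H a probability proportional to exp(−energy(H)). The sketch below shows that mechanism on a tiny graph by brute-force enumeration. The quadratic potentials, the discrete interest levels, and the function names are assumptions chosen for illustration; the slides do not give PET's actual potential functions or weighting scheme.

```python
import itertools
import math


def energy(h, h_prev, edges, lam_t=1.0, lam_a=3.0):
    """Toy energy: a temporal term plus a structural term (both assumed)."""
    # Temporal potential: penalize moving away from the previous interest.
    temporal = sum((h[i] - h_prev[i]) ** 2 for i in range(len(h)))
    # Structural potential: penalize disagreement across weighted edges.
    structural = sum(w * (h[i] - h[j]) ** 2 for (i, j), w in edges.items())
    return lam_t * temporal + lam_a * structural


def gibbs_distribution(h_prev, edges, levels=(0.0, 0.5, 1.0),
                       lam_t=1.0, lam_a=3.0):
    """Enumerate every configuration and normalize exp(-energy)."""
    configs = list(itertools.product(levels, repeat=len(h_prev)))
    weights = [math.exp(-energy(h, h_prev, edges, lam_t, lam_a))
               for h in configs]
    z = sum(weights)  # partition function
    return {h: w / z for h, w in zip(configs, weights)}


# Three users on a path graph 0 - 1 - 2 with unit edge weights (made up).
h_prev = (1.0, 1.0, 0.0)
edges = {(0, 1): 1.0, (1, 2): 1.0}
dist = gibbs_distribution(h_prev, edges)
```

Enumeration is exponential in the number of users, so it only works for toy graphs; it is used here to make the normalization explicit. Setting `lam_a=0` removes the structural term, in which case the most probable configuration is simply `h_prev`.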
The Topic Model
• Models historical interest status and relationships on the network.
• Allows the topics and popularity of the events to mutually influence each other over time.
• P(Θk|Hk,Dk) ∝ P(Dk|Hk,Θk) P(Θk|Hk)
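The posterior over topics is the document likelihood times a prior, normalized. A minimal sketch of that Bayes-rule step over a finite set of candidate topics follows. Everything concrete here is assumed for illustration: the candidate word distributions, the uniform prior, and in particular the choice to let interest enter as a per-document weight on the log-likelihood, which is not taken from the slides.

```python
import math


def log_likelihood(docs, interest, topic):
    """Interest-weighted log P(D | H, theta) under a unigram topic (assumed form)."""
    total = 0.0
    for words, h in zip(docs, interest):
        total += h * sum(math.log(topic[w]) for w in words)
    return total


def posterior(docs, interest, candidates, prior):
    """P(theta | H, D) proportional to P(D | H, theta) * P(theta | H)."""
    log_post = {name: math.log(prior[name])
                + log_likelihood(docs, interest, topic)
                for name, topic in candidates.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post.values())
    unnorm = {name: math.exp(v - m) for name, v in log_post.items()}
    z = sum(unnorm.values())
    return {name: u / z for name, u in unnorm.items()}


# Two hypothetical candidate topics over a two-word vocabulary.
candidates = {
    "movie": {"avatar": 0.7, "golf": 0.3},
    "sports": {"avatar": 0.2, "golf": 0.8},
}
prior = {"movie": 0.5, "sports": 0.5}
docs = [["avatar", "avatar"], ["golf"]]
interest = [1.0, 0.2]  # made-up interest of each document's author
post = posterior(docs, interest, candidates, prior)
```

With these made-up inputs the "movie" candidate receives the larger posterior mass, because the highly interested user's document is dominated by the word it favors.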
Connection to Existing Models
• Two existing models are special cases of PET under certain conditions.
• The State Automaton Model:
– When the network effect is omitted
• The Contagion Model:
– When the topic effect is omitted
Complexity Analysis
• PLSA (Probabilistic Latent Semantic Analysis): O((N + M)mt)
• PET: O(NMmT)
– For N documents involving t topics with M words, over m rounds and time T.
• PET's complexity is considered reasonable.
Experiments
• JonK: State automaton model. First baseline.
• Cont: The contagion model. Second baseline.
• PET-: PET minus network structures.
• BOM: Box office earnings. Gold standard for movie-related events.
• GInt: Google Insights. Gold standard for news-related events.
Experiments
• Twitter
– 5000 users
– 1,438,826 tweets
– From Oct 2009 to Jan 2010
– Events: 2 movies (Avatar, Twilight) and 2 news events (Tiger Woods affair, Copenhagen Climate Conference)
Experiments
• Setup:
– λT: Interest model. Weight for historical information.
– λA: Interest model. Weight for structural information.
– μE: Topic model.
– Values used: λT = 1, λA = 3, μE = 1
Result Analysis
• PET has the best performance.
• Cont has the worst performance.
• JonK generally performs well, but is less accurate than PET.
Network Diffusion Analysis
• Cont can’t tell the difference between interest levels.
• Both PET and PET- are able to capture the rising trend of popularity.
• PET is still superior.
Events Analysis on DBLP
• For popular events, PET generates:
– More accurate trends
– Smoother diffusion
– Meaningful content evolution
Future Work
• Apply this model to track the evolution of ideas and scientific innovation.
• Real-time event search system.
Conclusion
• A novel approach.
• Experimental evidence is convincing.
• Complexity might be a cause for concern.
Thank you.
Questions?